This still uses only a single cpu, but we can now have 'parallelization'
This seems to work fine in qemu and bochs, but my own computer did not
like this when I last tried.
I have absolutely no idea how multithreading should actually be
implmemented and I just thought and implemented the most simple one I
could think of. This might not be in any way correct :D