I removed the intermediate function when calling syscalls. Now syscall
handler calls the current process automatically. Only exception is
sys_fork, since it needs a assembly trampoline for the new thread.
Now after each interrupt we will ask the scheduler to reschedule
if the current thread is the idle thread. This allows semaphore
unblocking to be practically instant when there is only one thread
executing.
Now disk reading is back to ~3 MB/s for single threaded process
Performance of the old kmalloc implementation was terrible.
We now use fixed-width linked list allocations for sizes <= 60 bytes.
This is much faster than variable size allocation.
We don't use bitmap scanning anymore since it was probably the slow
part. Instead we use headers that tell allocations size and aligment.
I removed the kmalloc_eternal, even though it was very fast, there is
not really any need for it, since the only place it was used in was IDT.
These changes allowed my psf (font) parsing to go from ~500 ms to ~20 ms.
(coming soon :D)