This moves locking to the inodes themselves which allows reducing lock
times significantly. Main inodes (ext2 and tmpfs) still do contain a
single big mutex that gets locked during operations but now we have the
architecture to optimize these.
I was modifying the `vaddr` argument while invalidating which lead to
the smp message containing wrong virtual address. This showed up in
really weird bugs from invalid TLB
Initial step of paging now just prepares fast page for heap, actual page
table initialization happens after heap is initialized which allows
x86_64 to never depend on kmalloc for pages.
Processor's stacks are now also spawned with PMM/VMM allocated stacks
instead of kmalloc identity mapped.
There is no need for them to be uncached. Having them as uncached killed
the networking performance, over 90% time was spent in kernel out of
which 80% was in checksum calculation and memcpy, half each (measured in
qemu with e1000e)
I don't know why I though the block chain had to be stored fully in the
ThreadBlocker, that did not even fix the problem I was trying to fix
when I last rewrote it. Roll back to doubly linked list of block chain
and now just check that the node is contained within the ThreadBlocker
before removing and after acquiring the ThreadBlocker's lock. Also there
is no need to have a separate lock the node's blocker field. We can just
perform an atomic reads and writes to it. We can still get a blocker
that the node is no longer part of, but this can be resolved with a
simple check. This patch reduces ThreadBlocker's size from over 200
bytes to just 12 bytes +4 bytes padding
Also add custom load addresses for x86_64 target. This allows qemu to
load the kernel with -kernel argument. Without these addresses qemu
would refuse to load as it only supports 32 bit ELFs, but as our kernel
starts in 32 bit mode anyway, we can just load it!
This was just RefPtr<OpenFileDescription> and descriptor flags.
Descriptor flags only define O_CLOEXEC, so we can just store fd's
cloexec status in a bitmap rather than separate fields. This cuts down
the size of OpenFileDescriptorSet to basically half!
Kernel syscall API no longer zeros all unused argument registers and
libc now uses inlined syscall macro internally. This significantly
cleans up generated code for basic syscall wrapper functions.