Remove buffering from network layer and rework loopback interface.
Loopback now has a separate receive thread to allow concurrent sends and
prevent deadlocks.
We now only send enough data to fill the other end's window, not past
that. Previous logic had a bug that allowed sending too much data,
leading to
retransmissions.
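
A minimal sketch of the window clamp described above. The helper name and
its parameters are hypothetical and only illustrate the arithmetic; they
are not taken from the actual network stack.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: how many bytes may be sent right now.
//   remote_window: window size last advertised by the other end
//   in_flight:     bytes already sent but not yet acknowledged
//   queued:        bytes waiting in the send buffer
uint32_t bytes_sendable(uint32_t remote_window, uint32_t in_flight, uint32_t queued)
{
    // The window is already full (or shrank below what is in flight),
    // so sending more would only cause retransmissions.
    if (in_flight >= remote_window)
        return 0;
    return std::min(queued, remote_window - in_flight);
}
```

With a zero window this returns 0, so nothing is sent until the remote
advertises a larger window again.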
When the target sends zero window and later updates window size,
immediately retransmit non-acknowledged bytes.
Don't validate packets through listening socket twice. The actual socket
will already verify the checksum so the listening socket does not have
to.
We now report actually available window size when sending packets. If
the available window size grows significantly we send an ACK to reflect
this to the remote.
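
A rough sketch of the window-update decision. What counts as
"significantly" is not stated here, so the threshold is left as a
parameter; all names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical check: should an ACK be sent purely to advertise a larger
// receive window?
//   last_advertised: window size reported in the last packet we sent
//   available:       free space in the receive buffer right now
//   threshold:       growth considered "significant" (assumption; the
//                    real value is up to the implementation)
bool should_send_window_update(uint32_t last_advertised, uint32_t available, uint32_t threshold)
{
    if (available <= last_advertised)
        return false;
    return available - last_advertised >= threshold;
}
```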
Invalidations are not done when mapping or unmapping a previously
unmapped page. TLB invalidate IPIs are now ignored if they don't affect
the currently mapped address space.
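
The two checks could look roughly like this. The request struct and the
kernel_mapping flag are assumptions made for illustration only; the
sketch assumes kernel mappings are shared by every address space.

```cpp
#include <cstdint>

// Hypothetical payload of a TLB-invalidate IPI.
struct TlbInvalidateRequest {
    uint64_t page_table_root; // address space the mapping was changed in
    bool kernel_mapping;      // assumption: visible in every address space
};

// A page that had no mapping before the change has nothing stale to
// flush, so only previously mapped pages need an invalidation.
bool needs_invalidation(bool was_mapped)
{
    return was_mapped;
}

// A processor can ignore the IPI when the change was made in an address
// space it does not currently have loaded.
bool ipi_affects_this_processor(const TlbInvalidateRequest& req, uint64_t current_root)
{
    return req.kernel_mapping || req.page_table_root == current_root;
}
```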
Process's memory regions are now behind an rwlock instead of using the
full process lock. This allows most pointer validations to not block as
write operations to memory regions are rare.
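
As an illustration of the idea, here is a userspace sketch that uses
std::shared_mutex in place of the kernel's rwlock. RegionList and its
members are hypothetical names, not the kernel's actual types.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

struct Region {
    uintptr_t start;
    size_t size;
};

class RegionList {
public:
    // Rare path: adding a region takes the lock exclusively.
    void add(uintptr_t start, size_t size)
    {
        std::unique_lock<std::shared_mutex> lock(m_lock);
        m_regions.push_back({ start, size });
    }

    // Common path: pointer validation only needs shared access, so many
    // threads can validate at the same time.
    bool contains(uintptr_t addr, size_t len) const
    {
        std::shared_lock<std::shared_mutex> lock(m_lock);
        for (const Region& r : m_regions)
            if (addr >= r.start && len <= r.size && addr - r.start <= r.size - len)
                return true;
        return false;
    }

private:
    mutable std::shared_mutex m_lock;
    std::vector<Region> m_regions;
};
```

Readers only block while a writer holds the lock, which matches the
observation that writes to the region list are rare.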
Thread's userspace stack is now part of process's memory regions. This
simplifies code that explicitly looped over threads to see if the
accessed address was inside a thread's stack.
The only drawback is that MemoryRegions don't support guard pages, so
a userspace stack overflow will not be handled as cleanly as it was
prior to this.
This patch also fixes some unnecessary locking of the process lock and
moves locking to the internal helper functions instead of asserting that
the lock is held. Also we now make sure loaded ELF regions are in sorted
order as we previously expected.
Each futex object now has its own mutex to prevent unnecessary locking
of the process/global futex lock. This basically removes sys_futex from
profiles when running software with llvmpipe.
Hashmap insertions and deletions made futex very slow to use. When
running SuperTuxKart, ~15% of cpu time was spent doing these.
Never freeing objects is not great either but at least the performance
is usable now :)
AFAICS non-extended commands are supposed to support 28-bit LBAs but
qemu seems to ignore bits 27:24. Maybe I'm just doing something wrong
but this seems to fix this.
This fixes using big disks :D ATM using disks >= 8 GiB (with 512
byte LBAs) returned wrong data on reads, failing the boot :D
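
The 8 GiB limit follows from the arithmetic: if bits 27:24 are dropped,
only 24 LBA bits remain. A small sketch (function names are made up for
illustration):

```cpp
#include <cstdint>

constexpr uint64_t GiB = uint64_t{1} << 30;

// Bytes reachable when only `lba_bits` bits of the LBA are honoured.
constexpr uint64_t addressable_bytes(unsigned lba_bits, uint64_t sector_size)
{
    return (uint64_t{1} << lba_bits) * sector_size;
}

// LBA that would actually be accessed if bits 27:24 are ignored.
constexpr uint64_t lba_with_bits_27_24_ignored(uint64_t lba)
{
    return lba & 0x00FFFFFF;
}
```

With 512 byte sectors, 24 bits cover exactly 8 GiB, and a read of the
first sector past that point wraps around to sector 0, which explains
the wrong data.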
There is no need to save and load sse state on every interrupt. Instead
we can use CR0.TS to make threads trigger an interrupt when they use sse
instructions. This can be used to only save and load sse state when
needed.
Processor now keeps track of its current "sse thread" and the scheduler
either enables or disables sse based on which thread it is starting up.
When a thread dies, it checks if it was the current sse thread to avoid
use after free bugs. When load balancing, processor has to save the
thread's sse state before sending it to a new processor (if it was the
current sse thread). This ensures thread's sse state will be correct
when the new processor ends up loading it.
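
The bookkeeping can be modelled in plain C++ as below. This is a
simulation, not kernel code: ts_flag stands in for CR0.TS, a single
integer stands in for the sse register file, and every name is
hypothetical.

```cpp
#include <cstdint>

struct Thread {
    int id;
    uint64_t saved_sse; // stands in for the thread's saved sse state
};

struct Processor {
    Thread* sse_thread = nullptr; // thread whose state is live in registers
    uint64_t sse_registers = 0;   // stands in for the real registers
    bool ts_flag = false;         // stands in for CR0.TS
    int state_switches = 0;       // how many save/load cycles happened

    // Scheduler: only arm the trap if another thread owns the registers.
    void start_thread(Thread& t) { ts_flag = (&t != sse_thread); }

    // Trap handler: save the previous owner's state, load ours.
    void on_sse_trap(Thread& current)
    {
        if (sse_thread)
            sse_thread->saved_sse = sse_registers;
        sse_registers = current.saved_sse;
        sse_thread = &current;
        ts_flag = false;
        ++state_switches;
    }

    // Simulated sse instruction executed by the running thread.
    void use_sse(Thread& current, uint64_t value)
    {
        if (ts_flag)
            on_sse_trap(current);
        sse_registers = value;
    }

    // Thread death: drop the pointer so it is never dereferenced later.
    void on_thread_exit(Thread& t)
    {
        if (sse_thread == &t)
            sse_thread = nullptr;
    }
};
```

A thread that never touches sse never triggers the trap, so switching to
it and back costs no save or load at all.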
Doing a yield no longer raises a software interrupt. Instead it just
saves all the callee saved registers, ip, sp and return value. Because
yield is only called in the kernel, it can just restore registers and
jump to the target address. There is never a need to use iret :)
My code to find least loaded processor used processor index instead of
processor id to index the array. Most of the time this led to the wrong
processor returned as the least loaded, leaving some processors
basically idle.
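
A sketch of the corrected lookup, assuming the load array is indexed by
processor id and that ids are not necessarily contiguous. The function
and parameter names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// ids:        ids of the online processors, in index order (must not be
//             empty)
// load_by_id: load value for each processor, indexed by processor id
uint32_t least_loaded_processor(const std::vector<uint32_t>& ids,
                                const std::vector<uint64_t>& load_by_id)
{
    uint32_t best = ids.at(0);
    for (uint32_t id : ids) {
        // Index with the id, not with the loop position.
        if (load_by_id.at(id) < load_by_id.at(best))
            best = id;
    }
    return best;
}
```

Indexing with the loop position instead would read the load of whichever
processor happens to have that number as its id, or of no processor at
all.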
If multiple threads were waiting for more block buffers without anyone
releasing them, they ended up in a deadlock.
Now we store 6 blocks for 8 threads. If a thread already has a block
buffer, it will not have to wait for a new one. Only if there are more
than 8 threads using blocks, will it block until there are free slots
for a thread available.
Add support for process local futexes. These work the exact same way
as global ones, but only lock a process specific lock and use a process
specific hash map.
Also reduce the time futex lock is held. There was no need to hold the
global lock while validating addresses in the process' address space.