We now appreciate sa_mask and SA_NODEFER and change the signal mask for
the duration of signal handler. This is done by making a sigprocmask
syscall at the end of the signal handler. Back-to-back signals will
still grow stack as original registers are popped AFTER the block mask
is updated. I guess this is why linux has sigreturn(?).
We need to have interrupts enabled when signal kills the process as
process does mutex locking. Also signals are now only checked when
returning to userspace in the same place where userspace segments are
loaded.
Stack pointer was pointing to value of return address on return instead
of past it. This did not affect anything as ig Processor::yield() didn't
use stack after calling the trampoline
Invalidations are not done if mapping or unmapping previously unmapped
page. TLB invalidate IPIs are now ignored if they don't affect the
currently mapped address space
There is no need to save and load sse state on every interrupt. Instead
we can use CR0.TS to make threads trigger an interrupt when they use sse
instructions. This can be used to only save and load sse state when
needed.
Processor now keeps track of its current "sse thread" and the scheduler
either enabled or disabled sse based on which thread it is starting up.
When a thread dies, it checks if it was the current sse thread to avoid
use after free bugs. When load balancing, processor has to save the
thread's sse state before sending it to a new processor (if it was the
current sse thread). This ensures thread's sse state will be correct
when the new processor ends up loading it.
Doing a yield no longer raises a software interrupt. Instead it just
saves all the callee saved registers, ip, sp and return value. Because
yield is only called in the kernel, it can just restore registers and
jump to the target address. There is never a need to use iret :)
If the processor has invariant TSC it can be used to measure time. We
keep track of the last nanosecond and TSC values and offset them based
on the current TSC. This allows getting current time in userspace.
The implementation maps a single RO page to every processes' address
space. The page contains the TSC info which gets updated every 100 ms.
If the processor does not have invariant TSC, this page will not
indicate the capability for TSC based timing.
There was the problem about how does a processor know which cpu it is
running without doing syscall. TSC counters may or may not be
synchronized between cores, so we need a separate TSC info for each
processor. I ended up adding sequence of bytes 0..255 at the start of
the shared page. When a scheduler gets a new thread, it updates the
threads gs/fs segment to point to the byte corresponding to the current
cpu.
This TSC based timing is also used in kernel. With 64 bit HPET this
probably does not bring much of a benefit, but on PIT or 32 bit HPET
this removes the need to aquire a spinlock to get the current time.
This change does force the userspace to not use gs/fs themselves and
they are both now reserved. Other one is used for TLS (this can be
technically used if user does not call libc code) and the other for
the current processor index (cannot be used as kernel unconditionally
resets it after each load balance).
I was looking at how many times timer's current time was polled
(userspace and kernel combined). When idling in window manager, it was
around 8k times/s. When running doom it peaked at over 1 million times
per second when loading and settled at ~30k times/s.
This was causing some kernel panic because processors ran out of smp
message storage when reserving large areas.
Also most of the time there is no need to actually send the SMP message.
If process is mapping something to just its own address space, there is
no need for a TLB shootdown. Maybe this should be only limited to kernel
memory and threads across the same process. I'm not sure what the best
approach here and it is better to send too many invalidations that too
few!
I had not understood how MSIs work and I was unnecessarily routing them
through IOAPIC. This is not necessary and should not be done :D
Also MSIs were reserving interrupts that IOAPIC was capable of
generating. Now IOAPIC and MSIs use different set of interrupts so
IOAPIC can use more interrupts if needed.
Before I assumed that bootloaders loaded the kernel at physical address
0, but this patch kinda allows loading to different addresses. This
still doesn't fully work as kernel bootstrap paging relies on kernel
being loaded at 0
This cleans up the kernel executable as bootloaders don't have to
load AP init code straight to 0xF000, but it will be moved there once
kernel is doing the AP initialization.
This makes scheduler preemption much cleaner as bsb does not have to
send smp messages to notify other processes about timer interrupt.
Also PIT percision is now "full" 0.8 us instead of 1 ms that I was using
before.