Hashmap insertions and deletions made futex very slow to use. When
running SuperTuxKart, ~15% of cpu time was spent doing these.
Never freeing objects is not great either but at least the performance
is usable now :)
AFAICS non extended commands are supposed to support 27 bit LBAs but
qemu seems to ignore bits 27:24. Maybe I'm just doing something wrong
but this seems to fix this.
This fixes using big disks :D ATM using using disks >= 8 GiB (with 512
byte LBAs) returned wrong data on reads, failing the boot :D
There is no need to save and load sse state on every interrupt. Instead
we can use CR0.TS to make threads trigger an interrupt when they use sse
instructions. This can be used to only save and load sse state when
needed.
Processor now keeps track of its current "sse thread" and the scheduler
either enabled or disabled sse based on which thread it is starting up.
When a thread dies, it checks if it was the current sse thread to avoid
use after free bugs. When load balancing, processor has to save the
thread's sse state before sending it to a new processor (if it was the
current sse thread). This ensures thread's sse state will be correct
when the new processor ends up loading it.
Doing a yield no longer raises a software interrupt. Instead it just
saves all the callee saved registers, ip, sp and return value. Because
yield is only called in the kernel, it can just restore registers and
jump to the target address. There is never a need to use iret :)
If user's bash does not have bultin stat, updating image perms was
terribly slow. This patch adds a simple c program that does the job
without exec overhead
My code to find least loaded processor used processor index instead of
processor id to index the array. Most of the time this lead to wrong
processor returned as the least loaded, leaving some processors
basically idle.
If multiple threads were waiting for more block buffers without anyone
releasing them, they ended up in a deadlock.
Now we store 6 blocks for 8 threads. If a thread already has a block
buffer, it will not have to wait for a new one. Only if there are more
than 8 threads using blocks, will it block until there are free slots
for a thread available.
Add support for processor local futexes. These work the exact same way
as global ones, but only lock a process specific lock and use a process
specific hash map.
Also reduce the time futex lock is held. There was no need to hold the
global lock while validating addresses in the process' address space.
If the processor has invariant TSC it can be used to measure time. We
keep track of the last nanosecond and TSC values and offset them based
on the current TSC. This allows getting current time in userspace.
The implementation maps a single RO page to every processes' address
space. The page contains the TSC info which gets updated every 100 ms.
If the processor does not have invariant TSC, this page will not
indicate the capability for TSC based timing.
There was the problem about how does a processor know which cpu it is
running without doing syscall. TSC counters may or may not be
synchronized between cores, so we need a separate TSC info for each
processor. I ended up adding sequence of bytes 0..255 at the start of
the shared page. When a scheduler gets a new thread, it updates the
threads gs/fs segment to point to the byte corresponding to the current
cpu.
This TSC based timing is also used in kernel. With 64 bit HPET this
probably does not bring much of a benefit, but on PIT or 32 bit HPET
this removes the need to aquire a spinlock to get the current time.
This change does force the userspace to not use gs/fs themselves and
they are both now reserved. Other one is used for TLS (this can be
technically used if user does not call libc code) and the other for
the current processor index (cannot be used as kernel unconditionally
resets it after each load balance).
I was looking at how many times timer's current time was polled
(userspace and kernel combined). When idling in window manager, it was
around 8k times/s. When running doom it peaked at over 1 million times
per second when loading and settled at ~30k times/s.
Joystick axis and buttons are now named to standard values, this allows
interfacing multiple different controllers (only DS3 is supported)
Add ioctl calls for userspace to set joystick player leds and rumble
Only use DS3 code paths when we detect that the attached device is
actually an DS3 controller
update test-joystick program to the new interface and add support to
control rumble and player leds
I was creating a local variable shadowing the global one. This prevented
cleanup to close it. (this is not really necessary as the program dies
anyway)
I'm not sure if these are used by anything but I would assume so as I
have added them :D
functions added:
- getprotobyname
- open_memstream
- munlock
- lockf
- nice
- crypt
- getsid
- wcstoul