Instead of calculating bit-by-bit crc32, we now calculate a lookup table
during compile time. The old crc32 calculation was taking almost 50% of
the decompression time.
Also handle multiple symbols at once without outputting to user. It is
much more efficient to output many bytes instead of the up to 258 that a
single symbol can decode to :^)
We no longer require the user to pass full compressed data in one go,
instead the decompressor reports to the user if it needs more input or
output space.
Instead of immediately doing rerender of client data and syncing 60 Hz,
we now only keep track of the damaged regions and also do the rerender
step 60 Hz.
Instead of sending while serializing (what even was that), we serialize
the whole packet into a buffer which can be sent in one go. First of all
this reduces the number of sends by a lot. This also fixes WindowServer
ending up sending partial packets when client is not responsive.
Previously we would just try sending once, if any send failed the send
was aborted while partial packet was already transmitted. This lead to
packet stream being out of sync leading to the client killing itself.
Now we allow 64 KiB outgoing buffer per client. If this buffer ever fills
up, we will not send partial packets.
Kernel syscall API no longer zeros all unused argument registers and
libc now uses inlined syscall macro internally. This significantly
cleans up generated code for basic syscall wrapper functions.
Install SIGCANCEL handler for all threads.
Remove unneeded atomic stores and loads. States are only changed within
the thread itself.
Define pthread_testcancel as a macro so it gets inlined inside
cancellation points
If the window does not have an alpha channel, we now set every pixel's
alpha to 0xFF. This is needed by the WindowServer when it does alpha
blending, there used to be some weird stuff happening on overlapping
windows.
Also when we are invalidating a region with width of the whole window,
we can do a single memcpy instead of a memcpy for each row separately.