Compare commits

...

156 Commits

Author SHA1 Message Date
1bf5e6a051 WindowServer: Fix xbanan access check 2026-04-15 16:40:30 +03:00
394719a909 userspace: Fix some includes found when compiling on Linux 2026-04-15 16:39:36 +03:00
3ebadc5c74 LibDEFLATE: Optimize decompression
Instead of calculating crc32 bit by bit, we now calculate a lookup table
at compile time. The old crc32 calculation was taking almost 50% of the
decompression time.

Also handle multiple symbols at once without outputting to the user. It
is much more efficient to output many bytes at once instead of the at
most 258 bytes that a single symbol can decode to :^)
2026-04-14 01:50:30 +03:00
d471bbf856 Kernel: Cleanup bootloader headers
Also add custom load addresses for the x86_64 target. This allows qemu to
load the kernel with the -kernel argument. Without these addresses qemu
would refuse to load it, as it only supports 32 bit ELFs; but since our
kernel starts in 32 bit mode anyway, we can just load it!
2026-04-13 16:48:57 +03:00
c849293f3d Kernel: Add support for loading gzip compressed initrd 2026-04-13 16:48:57 +03:00
0156d06cdc LibDEFLATE: Support decompressing to/from partial buffer
We no longer require the user to pass the full compressed data in one go;
instead the decompressor reports to the user whether it needs more input
or more output space.
2026-04-13 03:04:55 +03:00
ad12bf3e1d LibC: Cleanup environment variable code 2026-04-13 00:36:13 +03:00
42964ad0b4 Kernel: Remove concept of OpenFile
This was just a RefPtr<OpenFileDescription> and descriptor flags.
Descriptor flags only define O_CLOEXEC, so we can just store each fd's
cloexec status in a bitmap rather than in separate fields. This cuts the
size of OpenFileDescriptorSet roughly in half!
2026-04-12 04:42:08 +03:00
87979b1627 LibImage: Don't allocate zlib stream to a contiguous buffer
We can now pass multiple buffers to the decoder!
2026-04-11 19:48:46 +03:00
fed9dbefdf LibDEFLATE: Allow decompression from multiple byte spans
Before this we required the compressed data to live in a single
contiguous chunk of memory.
2026-04-11 19:47:44 +03:00
2984927be5 WindowServer: Block without timeout when there are no damaged regions 2026-04-11 08:41:21 +03:00
2e654b53fa WindowServer: Use rectangular framebuffer syncs 2026-04-11 08:30:15 +03:00
ac6e6f3ec1 Kernel: Add ioctl to sync rectangular areas in framebuffer
msync is not really the best API for framebuffer synchronization
2026-04-11 08:29:10 +03:00
2b97587e9f WindowServer: Rewrite damaged region tracking
Instead of immediately re-rendering client data and syncing at 60 Hz, we
now only keep track of the damaged regions and do the re-render step at
60 Hz as well.
2026-04-11 08:26:22 +03:00
4bde088b28 WindowServer: Store rectangles as min and max bounds
This makes some of the math easier than with x,y and w,h
2026-04-11 06:35:45 +03:00
2a9dad2dd8 LibC: Add SSE2 non-temporal memset and memcpy
Also clean up other assembly by using local labels, so they are omitted
from the assembled program.
2026-04-11 03:30:52 +03:00
d11160d2f7 Kernel: Fix si_addr reporting
The meaning of si_addr is signal specific and is not always the instruction pointer
2026-04-11 03:30:52 +03:00
7333008f40 LibC: Use IP instead of si_addr for faulting instruction
si_addr means the faulting instruction only for SIGILL. For SIGSEGV it is
the faulting memory address.
2026-04-11 03:30:52 +03:00
cd7d309fd1 Kernel: Push missing IP and SP to mcontext in signal handler
I was missing these two registers, which messed up the whole siginfo_t
structure. This fixes libc's stack trace dump crashing :D
2026-04-11 03:30:52 +03:00
a4ba1da65a LibGUI/WindowServer: Rework packet serialization
Instead of sending while serializing (what even was that), we serialize
the whole packet into a buffer which can be sent in one go. First of all
this reduces the number of sends by a lot. It also fixes WindowServer
sending partial packets when a client is not responsive. Previously we
would just try sending once; if any send failed, the operation was
aborted while a partial packet had already been transmitted. This led to
the packet stream going out of sync, causing the client to kill itself.
Now we allow a 64 KiB outgoing buffer per client. If this buffer ever
fills up, we will not send partial packets.
2026-04-11 03:30:52 +03:00
2f9b8b6fc9 Kernel/LibC: Rework userspace syscall interface
The kernel syscall API no longer zeros all unused argument registers, and
libc now uses an inlined syscall macro internally. This significantly
cleans up the generated code for basic syscall wrapper functions.
2026-04-11 03:30:52 +03:00
279ac6b2b6 BAN: Implement some macro utilities
This contains utilities to count arguments, stringify, concatenate, and for_each
2026-04-11 03:30:52 +03:00
9084d9305c Kernel: Change preemption condition
Instead of keeping track of the current time and rescheduling when the
interval has passed, keep track of the next expected reschedule time.
This prevents theoretically missing every second preemption when the
scheduler's timer interrupts at the same rate as the interval.
2026-04-11 03:30:52 +03:00
80c4213501 LibC: Make errno macro directly access uthread
This allows inlining errno usages

This breaks libc ABI and requires toolchain rebuild
2026-04-11 03:30:32 +03:00
e0af23a924 LibC: Move uthread definition to its own header
Use `__asm__` instead of `asm` to allow compilation with --std=c99 and
earlier
2026-04-11 03:30:32 +03:00
7e907b70f6 Kernel: Store memory region size as uint64_t
On 32 bit targets we were storing 32 bit physical region sizes, which
would truncate regions larger than 4 GiB
2026-04-07 03:41:25 +03:00
7fb27b16e8 LibC: Fix pthread cancellation
Install SIGCANCEL handler for all threads.

Remove unneeded atomic stores and loads. States are only changed within
the thread itself.

Define pthread_testcancel as a macro so it gets inlined at cancellation
points
2026-04-07 03:41:25 +03:00
3fb903d991 LibGUI: Optimize invalidate and set alpha channel
If the window does not have an alpha channel, we now set every pixel's
alpha to 0xFF. This is needed by the WindowServer when it does alpha
blending; there used to be some weird stuff happening on overlapping
windows.

Also, when we are invalidating a region with the width of the whole
window, we can do a single memcpy instead of a memcpy for each row
separately.
2026-04-06 19:29:34 +03:00
2a4a688c2d WindowServer: Optimize rendering
We now use SSE2 to do alpha blending on 4 pixels at a time where
possible, and use memcpy instead of manual loops for non-blended regions.
2026-04-06 19:29:34 +03:00
1487c86262 Kernel: Resolve \\_S5 package elements on poweroff 2026-04-06 19:29:34 +03:00
4d3751028b LibInput: Honor chroot and credentials when loading keymap 2026-04-06 19:29:34 +03:00
e4c6539964 Kernel: Be more clever with physical memory
Initially allocate all physical memory except kernel memory and boot
modules. Before, we just skipped all memory before the kernel boot
modules. Also release the memory used by boot modules after the kernel is
up and running; once the boot modules are loaded, there is no need to
keep them in memory.
2026-04-06 19:29:34 +03:00
34b59f062b LibC: Implement blocking pthread_rwlock
pthread_rwlock now uses a mutex and a condition variable internally, so
it doesn't need to yield while waiting!
2026-04-06 19:29:34 +03:00
ec4aa8d0b6 LibC: Fix shared pthread_barrier init
Initialize the internal lock and cond as shared when the barrier is shared
2026-04-05 12:06:18 +03:00
1eebe85071 LibC: Fix pthread_cond_timedwait
If a timeout occurred, I was not removing the entry from the block list
2026-04-05 11:31:16 +03:00
db0507e670 LibC: Mark pthread_exit noreturn 2026-04-05 11:30:45 +03:00
1e3ca7dc18 Kernel: Fix signal related syscalls
There were missing locks, an out-of-order sigprocmask, incorrect signal
masking...
2026-04-05 02:31:30 +03:00
8ca3c5d778 Kernel: Clean up signal handling
We now respect sa_mask and SA_NODEFER and change the signal mask for
the duration of the signal handler. This is done by making a sigprocmask
syscall at the end of the signal handler. Back-to-back signals will
still grow the stack, as the original registers are popped AFTER the
block mask is updated. I guess this is why linux has sigreturn(?).
2026-04-05 02:25:59 +03:00
df257755f7 Kernel: If userspace sets fs or gs, don't overwrite it
The current cpu index is stored in one of these segments. If userspace
sets that segment, the kernel will no longer overwrite it on every
reschedule. This is fine as long as the user program does not use
anything that relies on it :)
2026-04-04 23:48:43 +03:00
d7e292a9f8 Kernel: Drop 32 bit userspace stack to 4 MiB
32 bit userspace only has 256 MiB reserved for stacks, so with 32 MiB
stacks it allowed a total of only 7 threads. Now we can have up to 62
threads
2026-04-04 23:48:43 +03:00
9fce114e8e Kernel: Don't clone entire kernel stack on fork
We only need to copy area between [ret_sp, stack_end]. This range is
always very small compared to the whole stack (64 KiB).
2026-04-04 23:48:43 +03:00
9d83424346 Kernel: Remove unnecessary stack pointer loading
Any time I started a thread I was loading the stack pointer, which is
already correctly passed :D
2026-04-04 23:48:43 +03:00
a29681a524 Kernel: Fix signal generation
We need to have interrupts enabled when a signal kills the process, as
the process does mutex locking. Also, signals are now only checked when
returning to userspace, in the same place where userspace segments are
loaded.
2026-04-04 23:48:43 +03:00
47d85eb281 Kernel: Pass the actual vaddr range to reserve pages 2026-04-04 23:48:43 +03:00
85f676c30a DynamicLoader: Calculate max loaded file count based on dtv size
dtv should be dynamic but I don't care right now :)
2026-04-04 23:48:43 +03:00
8c5fa1c0b8 DynamicLoader: Fix R_386_PC32 relocation
I was not accounting for the ELF base in the offset
2026-04-04 23:48:43 +03:00
c7690053ae LibC: Don't crash on 32 bit pthread_create 2026-04-04 23:48:43 +03:00
3f55be638d Kernel: Allow reserve_free_page{,s} to fail
Apparently I was asserting here before :D
2026-04-04 23:48:43 +03:00
664c824bc0 Kernel: Keep fast page always reserved
There was a bug where 32 bit target's reserve_free_page was allocating
the fast page address
2026-04-04 23:48:43 +03:00
e239d9ca55 ports/SDL2: Use 48 kHz floats instead of 44.1 kHz PCM16 2026-04-03 16:17:16 +03:00
bf1d9662d7 LibAudio: Use floats instead of doubles for samples 2026-04-03 16:15:02 +03:00
675c215e6a Kernel: Add CoW support to MemoryBackedRegion
This speeds up fork by A LOT. Forking WindowServer took ~90 ms before
this and now it's ~5 ms.
2026-04-03 01:54:59 +03:00
c09bca56f9 Kernel: Add fast write perm remove to page tables 2026-04-03 01:54:22 +03:00
7d8f7753d5 Kernel: Cleanup and fix page tables and better TLB shootdown 2026-04-03 01:53:30 +03:00
f77aa65dc5 Kernel: Cleanup accessing userspace memory
Instead of doing page validation and loading manually, we just do a
simple memcpy and handle the possible page faults
2026-04-02 16:36:33 +03:00
9589b5984d Kernel: Move USERSPACE_END to lower half
This allows calculating the distance to USERSPACE_END from a lower half
address
2026-04-02 16:34:47 +03:00
32806a5af3 LibC: Allow "t" in stdio mode 2026-04-02 15:44:50 +03:00
876fbe3d7c LibC: Fix sem_{,timed}wait 2026-04-02 15:43:34 +03:00
c1b8f5e475 LibC: Add and cleanup network definitions 2026-04-02 15:42:00 +03:00
cf31ea9cbe LibC: Add _SC_PHYS_PAGES and _SC_AVPHYS_PAGES 2026-04-02 15:41:26 +03:00
7e6b8c93b4 LibC: Implement strsep 2026-04-02 15:40:23 +03:00
dd2bbe4588 LibC: Implement sched_getcpu 2026-04-02 15:39:36 +03:00
e01e35713b LibC: Allow including assert.h multiple times
Some shit seems to depend on this
2026-04-02 15:38:06 +03:00
82d5d9ba58 LibC: Write memchr, memcmp and strlen with sse 2026-04-02 15:35:03 +03:00
d168492462 WindowServer: Bind volume up/down to volume control 2026-04-02 15:24:02 +03:00
6f2e8320a9 TaskBar: Show current volume level 2026-04-02 15:22:42 +03:00
bf4831f468 AudioServer: Add support for volume control 2026-04-02 15:21:38 +03:00
5647cf24d2 Kernel: Implement volume control in audio drivers 2026-04-02 15:14:27 +03:00
85f61aded5 BAN: Use builtins for math overflow 2026-04-02 14:49:12 +03:00
21639071c2 kill: Allow killing with process name 2026-04-02 05:02:05 +03:00
68506a789a Kernel: Add support for volume control keys 2026-04-02 05:02:05 +03:00
d9ca25b796 LibC: Add FNM_CASEFOLD and FNM_IGNORECASE
These are part of POSIX Issue 8
2026-03-25 04:27:00 +02:00
e9c81477d7 BAN/LibC: Implement remainder
This is basically just fmod but with fprem1 instead of fprem
2026-03-25 01:06:45 +02:00
5c20d5e291 Kernel: HDAudio hide unusable pins and cleanup path finding 2026-03-24 01:16:47 +02:00
f89d690716 Kernel: HDAudio only probe codecs in STATESTS
This removes unnecessary probing that led to timeouts. Also cap the codec
address at 14 instead of 15; my test laptop was duplicating codec 0 at
address 15, leading to duplicate devices.
2026-03-24 00:49:47 +02:00
c563efcd1c AudioServer: Query pins of the asked device and not the current one 2026-03-23 22:57:49 +02:00
dedeebbfbe Kernel: Use ByteRingBuffer with audio buffers 2026-03-23 22:12:40 +02:00
35e2a70de0 AudioServer: Handle client data before disconnecting clients 2026-03-23 20:41:13 +02:00
81d5c86a7a WindowServer: Automatically launch xbanan if installed 2026-03-23 19:39:08 +02:00
db6644bae9 BuildSystem: Set glib-compile- binaries in meson cross file 2026-03-23 19:34:00 +02:00
14f1c1a358 LibC: Implement vsyslog 2026-03-23 19:13:38 +02:00
5be9bc64a2 ports/libxml2: Configure with -shared-libgcc
Otherwise it doesn't seem to find libiconv due to __divdc3
2026-03-23 19:09:33 +02:00
64d3a5c8b7 ports: Update zlib 1.3.1->1.3.2
1.3.1 is no longer available at zlib.net
2026-03-23 18:54:57 +02:00
ccb4d13a82 Kernel: Compile EventFD file 2026-03-23 18:25:18 +02:00
cbe835a2c8 DynamicLoader: Add missing strlen definition 2026-03-23 18:23:31 +02:00
6a77754adf LibC: Don't link against libstdc++
This prevented building the toolchain
2026-03-23 18:22:42 +02:00
7d7d5ba734 LibC: Compile eventfd file 2026-03-23 18:22:04 +02:00
684fa1c4b0 ports: Add pixman port
This fixes cairo dependencies
2026-03-23 17:58:39 +02:00
a98d851fde ports: Add gtk3 port 2026-03-23 17:58:39 +02:00
9c3e2dab40 ports: Add pango port 2026-03-23 17:58:39 +02:00
eddb68f2fa ports/mesa: Build with x support 2026-03-23 17:55:57 +02:00
791091174a ports/cairo: Build with x support 2026-03-23 17:50:35 +02:00
dd9280c6ea ports/expat: Add support for shared libraries 2026-03-23 17:48:19 +02:00
a4d83f9fdb ports: Add xbanan port
This allows running x apps on top of my own GUI interface!
2026-03-23 17:47:11 +02:00
f42c5c4a5b ports: Add a lot of x library ports + xeyes/xclock 2026-03-23 17:45:59 +02:00
186fa4f1a1 ports: Update git 2.52.0->2.53.0 2026-03-23 17:35:08 +02:00
09292bb87e BAN: Cleanup math code and add SSE sqrt
We should prefer SSE instructions when they are easily available. For
other functions x87 is just simpler; it's hard to write faster and
comparably accurate approximations with SSE.

This does not use xmmintrin.h, as clangd does not like that file and
starts throwing errors in every file that includes it :)
2026-03-22 22:07:48 +02:00
d18a0de879 Kernel: Fix mprotect for partial regions
If the mprotected area did not contain the start of the region, mprotect
would exit early
2026-03-17 23:33:05 +02:00
cdc45935b5 Kernel: Don't allow chdir into non-directories 2026-03-17 22:57:17 +02:00
43e18148a6 LibC: Define SSP things 2026-03-17 20:30:25 +02:00
b0db645248 LibC: Add basic elf.h 2026-03-17 20:25:38 +02:00
07712758a7 BAN: Add default constructor to ipv4address 2026-03-17 20:24:48 +02:00
c1a424a635 Kernel: Implement linux's eventfd 2026-03-17 20:24:06 +02:00
a49588dbc7 DynamicLoader: Fix library lookup for already loaded files 2026-03-17 20:05:05 +02:00
1f22b9b982 DynamicLinker: Implement RTLD_NOLOAD 2026-03-17 20:04:48 +02:00
1d07d8e08e LibC/DynamicLoader: Add support for dynamically loaded TLS
Previously I failed to dlopen if any of the objects contained a TLS
section
2026-03-17 20:01:51 +02:00
05b2424fca LibC: Implement more proper random number generator 2026-03-17 19:53:43 +02:00
07201c711e LibC: Set endp on string to float conversion error 2026-03-17 19:50:12 +02:00
8fac88c9a6 LibC: Add sincos{,f,l} 2026-03-17 19:42:53 +02:00
c9aafa78ec DynamicLoader: Fix RO section mprotect arguments 2026-03-05 17:57:03 +02:00
e1c337a483 LibC: Fix compile and link flags
We were linking with -nostdlib and manually linked against libgcc. This
does not link in crtbegin and crtend, which provide __dso_handle,
preventing the use of some global C++ constructors inside libc.

Now we just don't link against libc, fixing this issue
2026-03-05 16:25:06 +02:00
acebe68dfa DynamicLoader: Fix copy relocation and TLS initialization 2026-03-04 23:04:19 +02:00
eeef945c25 Kernel: Make tty use the new byte ring buffer 2026-02-28 14:53:15 +02:00
a602753bda Kernel: Add front/back/pop_back to ByteRingBuffer 2026-02-28 14:51:35 +02:00
812ae77cd7 Kernel: Make TCP sockets use the new ring buffer
Also fix a race condition that sometimes prevented window updates from
being sent after a zero window, effectively hanging the whole socket
2026-02-28 14:22:08 +02:00
8b8af1a9d9 Kernel: Rewrite pipes using the new ring buffer 2026-02-28 14:20:52 +02:00
493b5cb9b1 Kernel: Implement byte ring buffer
This maps the ring twice, right next to itself, so we don't have to
care about wrapping around when doing memcpy or accessing the data
2026-02-28 14:18:23 +02:00
1ecd7cc2fe Kernel: Allow protocol specific socket options
I had forgotten to remove this condition from the syscall
2026-02-27 19:20:22 +02:00
5c38832456 Kernel: Use wake_with_waketime in epoll
We already have the wake time, so there is no reason to calculate the
timeout
2026-02-27 19:14:35 +02:00
d16f07a547 Kernel: Print thread id when writing to /dev/debug 2026-02-27 19:12:35 +02:00
54acb05131 Kernel: Don't print "./" prefix with debug functions 2026-02-27 19:10:51 +02:00
9ddf19f605 Kernel: Optimize networking code
Remove buffering from the network layer and rework the loopback
interface. Loopback now has a separate receive thread to allow concurrent
sends and prevent deadlocks
2026-02-27 19:08:08 +02:00
ff378e4538 Kernel: Cleanup and optimize TCP
We now only send enough data to fill the other end's window, not past it.
The previous logic had a bug that allowed sending too much data, leading
to retransmissions.

When the target sends a zero window and later updates the window size,
immediately retransmit non-acknowledged bytes.

Don't validate packets through the listening socket twice. The actual
socket will already verify the checksum, so the listening socket does not
have to.
2026-02-24 16:20:23 +02:00
2ea0a24795 Kernel: Fix TCP SYN option propagation
Listening socket now forwards TCP options to the newly created socket
2026-02-23 23:00:47 +02:00
acf28d8170 Kernel: Use ring buffers for TCP windows
This speeds up TCP networking a ton, as it doesn't have to do unnecessary
memmoves on every send/receive
2026-02-23 21:10:13 +02:00
666a7bb826 Kernel: Rework TCP window size reporting
We now report the actually available window size when sending packets. If
the available window size grows significantly, we send an ACK to reflect
this to the remote.
2026-02-23 21:10:13 +02:00
1ac20251cf Kernel: Fix TCP stack crash on retransmission 2026-02-23 17:48:16 +02:00
a318a19fe2 LibGUI/WindowServer: Add fullscreen events
When window's fullscreen state changes we now generate events!
2026-02-23 16:06:48 +02:00
f4a7aec167 LibGUI/WindowServer: Add support for custom cursor origin 2026-02-23 16:06:48 +02:00
9445332499 Kernel: Remove unnecessary interface lookup
This prevented connecting to local sockets listening on INADDR_ANY
2026-02-23 16:06:48 +02:00
8edd63d115 Kernel: Cleanup {set,get}sockopt debug prints 2026-02-23 16:06:48 +02:00
304ace1172 LibInput: Export keyboard layout keymaps 2026-02-23 16:06:48 +02:00
a5cdf0640f BAN: Add value_type to String{,View} 2026-02-23 16:06:48 +02:00
1fc2e43881 BAN: Add support for string format padding 2026-02-23 16:06:48 +02:00
0964c9f928 BAN: Remove unnecessary assert from span 2026-02-23 16:06:48 +02:00
8b1e820869 BAN: Add reallocator support to Vector 2026-02-21 04:03:11 +02:00
9edc6966db BAN: Add reallocator definition
For the moment this does not exist in the kernel; kmalloc rewrite soon™️
2026-02-21 04:03:11 +02:00
12207dcb77 BAN: Add is_trivially_copyable trait 2026-02-21 04:03:11 +02:00
2255e36810 LibDEFLATE: Add GZip support
This allows compressing and decompressing data using GZip headers and
footers
2026-02-21 04:03:11 +02:00
5abddd448e LibC: Fix typo/bug in fnmatch
* would stop matching at '0' instead of at the end of the string
2026-02-19 22:12:59 +02:00
f022a1b08f Shell: Fix crash when executing semicolon
This fixes #4
2026-02-13 17:52:54 +02:00
b3bbfaeff0 LibC: Fix posix_spawnattr_t definition 2026-02-10 01:22:25 +02:00
679a3d4209 LibGUI: Add Texture::clear{,_rect} 2026-02-08 19:45:01 +02:00
a0211d88e7 Kernel: Don't include TCP header in MSS 2026-02-08 19:44:30 +02:00
e216fc7798 Kernel: Fix port allocation endianness 2026-02-08 19:43:08 +02:00
c648ea12f2 Kernel: Cleanup and fix UNIX sockets
EPOLLOUT is now sent to the correct socket, and the buffer is now a ring
buffer to avoid an unnecessary memmove on every packet
2026-02-08 19:38:28 +02:00
2e59373a1e Kernel: Fix non blocking sockets blocking :D 2026-02-08 19:33:28 +02:00
a51a81b6cd Kernel: Move {set,get}sockopt to sockets
Sockets can now actually implement socket options :D
2026-02-08 19:27:16 +02:00
9809f87010 LibC: Fix {read,write}v return value for partial actions 2026-02-08 18:45:29 +02:00
8794122c2d BAN: Variant allow copy/move from empty 2026-02-07 18:54:31 +02:00
8fb2270ecf DynamicLoader: Map RO sections actually read-only
I was mapping everything RW, as I did not have mprotect when I
implemented the dynamic loader.
2026-02-04 23:21:06 +02:00
c304133224 LibC: Indicate regex support in unistd.h 2026-01-25 01:47:30 +02:00
7843d3de62 LibC: Support attrs and file actions in posix spawn
Apparently GCC wants to use posix_spawn now that it is available; this
patch adds support for the missing fields. POSIX Issue 8 added some
fields that are not supported here
2026-01-25 01:45:47 +02:00
aef536fff3 Kernel: Fix SharedMemoryObject cloning on deleted keys 2026-01-25 01:42:17 +02:00
d472e1ac0e Kernel: Remove obsolete FIXMEs and null pointer checks 2026-01-24 22:42:18 +02:00
120c08fb75 Kernel: Implement fcntl based locks 2026-01-24 22:38:34 +02:00
264 changed files with 9405 additions and 3628 deletions

View File

@@ -9,6 +9,10 @@ namespace BAN
 	struct IPv4Address
 	{
+		constexpr IPv4Address()
+			: IPv4Address(0)
+		{ }
+
 		constexpr IPv4Address(uint32_t u32_address)
 		{
 			raw = u32_address;

View File

@@ -0,0 +1,46 @@
+#pragma once
+
+#define _ban_count_args_impl(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, ...) _9
+#define _ban_count_args(...) _ban_count_args_impl(__VA_ARGS__ __VA_OPT__(,) 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+#define _ban_concat_impl(a, b) a##b
+#define _ban_concat(a, b) _ban_concat_impl(a, b)
+
+#define _ban_stringify_impl(x) #x
+#define _ban_stringify(x) _ban_stringify_impl(x)
+
+#define _ban_fe_0(f)
+#define _ban_fe_1(f, _0) f(0, _0)
+#define _ban_fe_2(f, _0, _1) f(0, _0) f(1, _1)
+#define _ban_fe_3(f, _0, _1, _2) f(0, _0) f(1, _1) f(2, _2)
+#define _ban_fe_4(f, _0, _1, _2, _3) f(0, _0) f(1, _1) f(2, _2) f(3, _3)
+#define _ban_fe_5(f, _0, _1, _2, _3, _4) f(0, _0) f(1, _1) f(2, _2) f(3, _3) f(4, _4)
+#define _ban_fe_6(f, _0, _1, _2, _3, _4, _5) f(0, _0) f(1, _1) f(2, _2) f(3, _3) f(4, _4) f(5, _5)
+#define _ban_fe_7(f, _0, _1, _2, _3, _4, _5, _6) f(0, _0) f(1, _1) f(2, _2) f(3, _3) f(4, _4) f(5, _5) f(6, _6)
+#define _ban_fe_8(f, _0, _1, _2, _3, _4, _5, _6, _7) f(0, _0) f(1, _1) f(2, _2) f(3, _3) f(4, _4) f(5, _5) f(6, _6) f(7, _7)
+#define _ban_fe_9(f, _0, _1, _2, _3, _4, _5, _6, _7, _8) f(0, _0) f(1, _1) f(2, _2) f(3, _3) f(4, _4) f(5, _5) f(6, _6) f(7, _7) f(8, _8)
+#define _ban_for_each(f, ...) _ban_concat(_ban_fe_, _ban_count_args(__VA_ARGS__))(f __VA_OPT__(,) __VA_ARGS__)
+
+#define _ban_fe_comma_0(f)
+#define _ban_fe_comma_1(f, _0) f(0, _0)
+#define _ban_fe_comma_2(f, _0, _1) f(0, _0), f(1, _1)
+#define _ban_fe_comma_3(f, _0, _1, _2) f(0, _0), f(1, _1), f(2, _2)
+#define _ban_fe_comma_4(f, _0, _1, _2, _3) f(0, _0), f(1, _1), f(2, _2), f(3, _3)
+#define _ban_fe_comma_5(f, _0, _1, _2, _3, _4) f(0, _0), f(1, _1), f(2, _2), f(3, _3), f(4, _4)
+#define _ban_fe_comma_6(f, _0, _1, _2, _3, _4, _5) f(0, _0), f(1, _1), f(2, _2), f(3, _3), f(4, _4), f(5, _5)
+#define _ban_fe_comma_7(f, _0, _1, _2, _3, _4, _5, _6) f(0, _0), f(1, _1), f(2, _2), f(3, _3), f(4, _4), f(5, _5), f(6, _6)
+#define _ban_fe_comma_8(f, _0, _1, _2, _3, _4, _5, _6, _7) f(0, _0), f(1, _1), f(2, _2), f(3, _3), f(4, _4), f(5, _5), f(6, _6), f(7, _7)
+#define _ban_fe_comma_9(f, _0, _1, _2, _3, _4, _5, _6, _7, _8) f(0, _0), f(1, _1), f(2, _2), f(3, _3), f(4, _4), f(5, _5), f(6, _6), f(7, _7), f(8, _8)
+#define _ban_for_each_comma(f, ...) _ban_concat(_ban_fe_comma_, _ban_count_args(__VA_ARGS__))(f __VA_OPT__(,) __VA_ARGS__)
+
+#define _ban_get_0(a0, ...) a0
+#define _ban_get_1(a0, a1, ...) a1
+#define _ban_get_2(a0, a1, a2, ...) a2
+#define _ban_get_3(a0, a1, a2, a3, ...) a3
+#define _ban_get_4(a0, a1, a2, a3, a4, ...) a4
+#define _ban_get_5(a0, a1, a2, a3, a4, a5, ...) a5
+#define _ban_get_6(a0, a1, a2, a3, a4, a5, a6, ...) a6
+#define _ban_get_7(a0, a1, a2, a3, a4, a5, a6, a7, ...) a7
+#define _ban_get_8(a0, a1, a2, a3, a4, a5, a6, a7, a8, ...) a8
+#define _ban_get_9(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, ...) a9
+#define _ban_get(n, ...) _ban_concat(_ban_get_, n)(__VA_ARGS__)

View File

@@ -36,12 +36,11 @@ namespace BAN::Math
 	template<integral T>
 	inline constexpr T gcd(T a, T b)
 	{
-		T t;
 		while (b)
 		{
-			t = b;
+			T temp = b;
 			b = a % b;
-			a = t;
+			a = temp;
 		}
 		return a;
 	}
@@ -66,25 +65,20 @@ namespace BAN::Math
 		return (x & (x - 1)) == 0;
 	}
 
-	template<BAN::integral T>
-	static constexpr bool will_multiplication_overflow(T a, T b)
+	template<integral T>
+	__attribute__((always_inline))
+	inline constexpr bool will_multiplication_overflow(T a, T b)
 	{
-		if (a == 0 || b == 0)
-			return false;
-		if ((a > 0) == (b > 0))
-			return a > BAN::numeric_limits<T>::max() / b;
-		else
-			return a < BAN::numeric_limits<T>::min() / b;
+		T dummy;
+		return __builtin_mul_overflow(a, b, &dummy);
 	}
 
-	template<BAN::integral T>
-	static constexpr bool will_addition_overflow(T a, T b)
+	template<integral T>
+	__attribute__((always_inline))
+	inline constexpr bool will_addition_overflow(T a, T b)
 	{
-		if (a > 0 && b > 0)
-			return a > BAN::numeric_limits<T>::max() - b;
-		if (a < 0 && b < 0)
-			return a < BAN::numeric_limits<T>::min() - b;
-		return false;
+		T dummy;
+		return __builtin_add_overflow(a, b, &dummy);
 	}
 
 	template<typename T>
@@ -98,6 +92,19 @@ namespace BAN::Math
 		return sizeof(T) * 8 - __builtin_clzll(x) - 1;
 	}
 
+	// This is ugly but my clangd does not like including
+	// intrinsic headers at all
+#if !defined(__SSE__) || !defined(__SSE2__)
+#pragma GCC push_options
+#ifndef __SSE__
+#pragma GCC target("sse")
+#endif
+#ifndef __SSE2__
+#pragma GCC target("sse2")
+#endif
+#define BAN_MATH_POP_OPTIONS
+#endif
+
 	template<floating_point T>
 	inline constexpr T floor(T x)
 	{
@@ -159,7 +166,23 @@ namespace BAN::Math
 			"jne 1b;"
 			: "+t"(a)
 			: "u"(b)
-			: "ax"
+			: "ax", "cc"
+		);
+		return a;
+	}
+
+	template<floating_point T>
+	inline constexpr T remainder(T a, T b)
+	{
+		asm(
+			"1:"
+			"fprem1;"
+			"fnstsw %%ax;"
+			"testb $4, %%ah;"
+			"jne 1b;"
+			: "+t"(a)
+			: "u"(b)
+			: "ax", "cc"
 		);
 		return a;
 	}
@@ -167,7 +190,7 @@ namespace BAN::Math
 	template<floating_point T>
 	static T modf(T x, T* iptr)
 	{
-		const T frac = BAN::Math::fmod<T>(x, 1);
+		const T frac = BAN::Math::fmod<T>(x, (T)1.0);
 		*iptr = x - frac;
 		return frac;
 	}
@@ -175,15 +198,15 @@ namespace BAN::Math
 	template<floating_point T>
 	inline constexpr T frexp(T num, int* exp)
 	{
-		if (num == 0.0)
+		if (num == (T)0.0)
 		{
 			*exp = 0;
-			return 0.0;
+			return (T)0.0;
 		}
 
-		T _exp;
-		asm("fxtract" : "+t"(num), "=u"(_exp));
-		*exp = (int)_exp + 1;
+		T e;
+		asm("fxtract" : "+t"(num), "=u"(e));
+		*exp = (int)e + 1;
 		return num / (T)2.0;
 	}
@@ -251,6 +274,7 @@ namespace BAN::Math
 			"fstp %%st(1);"
 			: "+t"(x)
 		);
+
 		return x;
 	}
@@ -263,18 +287,9 @@ namespace BAN::Math
 	template<floating_point T>
 	inline constexpr T pow(T x, T y)
 	{
-		asm(
-			"fyl2x;"
-			"fld1;"
-			"fld %%st(1);"
-			"fprem;"
-			"f2xm1;"
-			"faddp;"
-			"fscale;"
-			: "+t"(x), "+u"(y)
-		);
-		return x;
+		if (x == (T)0.0)
+			return (T)0.0;
+		return exp2<T>(y * log2<T>(x));
 	}
 
 	template<floating_point T>
@@ -310,16 +325,27 @@ namespace BAN::Math
 	template<floating_point T>
 	inline constexpr T sqrt(T x)
 	{
-		asm("fsqrt" : "+t"(x));
-		return x;
+		if constexpr(BAN::is_same_v<T, float>)
+		{
+			using v4sf = float __attribute__((vector_size(16)));
+			return __builtin_ia32_sqrtss((v4sf) { x, 0.0f, 0.0f, 0.0f })[0];
+		}
+		else if constexpr(BAN::is_same_v<T, double>)
+		{
+			using v2df = double __attribute__((vector_size(16)));
+			return __builtin_ia32_sqrtsd((v2df) { x, 0.0 })[0];
+		}
+		else if constexpr(BAN::is_same_v<T, long double>)
+		{
+			asm("fsqrt" : "+t"(x));
+			return x;
+		}
 	}
 
 	template<floating_point T>
 	inline constexpr T cbrt(T value)
 	{
-		if (value == 0.0)
-			return 0.0;
-		return pow<T>(value, 1.0 / 3.0);
+		return pow<T>(value, (T)1.0 / (T)3.0);
 	}
 
 	template<floating_point T>
@@ -346,30 +372,21 @@ namespace BAN::Math
 	inline constexpr T tan(T x)
 	{
 		T one, ret;
-		asm(
-			"fptan"
-			: "=t"(one), "=u"(ret)
-			: "0"(x)
-		);
+		asm("fptan" : "=t"(one), "=u"(ret) : "0"(x));
 		return ret;
 	}
 
 	template<floating_point T>
 	inline constexpr T atan2(T y, T x)
 	{
-		asm(
-			"fpatan"
-			: "+t"(x)
-			: "u"(y)
-			: "st(1)"
-		);
+		asm("fpatan" : "+t"(x) : "u"(y) : "st(1)");
 		return x;
 	}
 
 	template<floating_point T>
 	inline constexpr T atan(T x)
 	{
-		return atan2<T>(x, 1.0);
+		return atan2<T>(x, (T)1.0);
 	}
 
 	template<floating_point T>
@@ -378,10 +395,10 @@ namespace BAN::Math
 		if (x == (T)0.0)
 			return (T)0.0;
 		if (x == (T)1.0)
-			return numbers::pi_v<T> / (T)2.0;
+			return +numbers::pi_v<T> / (T)2.0;
 		if (x == (T)-1.0)
 			return -numbers::pi_v<T> / (T)2.0;
-		return (T)2.0 * atan<T>(x / (T(1.0) + sqrt<T>((T)1.0 - x * x)));
+		return (T)2.0 * atan<T>(x / ((T)1.0 + sqrt<T>((T)1.0 - x * x)));
 	}
 
 	template<floating_point T>
@@ -411,7 +428,7 @@ namespace BAN::Math
 	template<floating_point T>
 	inline constexpr T tanh(T x)
 	{
-		const T exp_px = exp<T>(x);
+		const T exp_px = exp<T>(+x);
 		const T exp_nx = exp<T>(-x);
 		return (exp_px - exp_nx) / (exp_px + exp_nx);
 	}
@@ -440,4 +457,9 @@ namespace BAN::Math
 		return sqrt<T>(x * x + y * y);
 	}
 
+#ifdef BAN_MATH_POP_OPTIONS
+#undef BAN_MATH_POP_OPTIONS
+#pragma GCC pop_options
+#endif
+
 }

View File

@@ -9,10 +9,12 @@
namespace BAN namespace BAN
{ {
#if defined(__is_kernel) #if defined(__is_kernel)
static constexpr void*(&allocator)(size_t) = kmalloc; static constexpr void*(*allocator)(size_t) = kmalloc;
static constexpr void(&deallocator)(void*) = kfree; static constexpr void*(*reallocator)(void*, size_t) = nullptr;
static constexpr void(*deallocator)(void*) = kfree;
#else #else
static constexpr void*(&allocator)(size_t) = malloc; static constexpr void*(*allocator)(size_t) = malloc;
static constexpr void(&deallocator)(void*) = free; static constexpr void*(*reallocator)(void*, size_t) = realloc;
static constexpr void(*deallocator)(void*) = free;
#endif #endif
} }
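The hunk above switches the allocator globals from function references to function pointers and adds a `reallocator` that is `nullptr` in kernel builds. A minimal sketch of the pattern (illustrative names, not the real BAN declarations): a pointer, unlike a reference, can be null, so callers can branch on whether a reallocator exists at all.

```cpp
#include <cstdlib>

// Userspace side of the pattern; a kernel build would set reallocator to
// nullptr, since kmalloc has no realloc counterpart.
static void* (*const allocator)(std::size_t) = std::malloc;
static void* (*const reallocator)(void*, std::size_t) = std::realloc;
static void (*const deallocator)(void*) = std::free;

// Grow an allocation, falling back to allocate+free when no reallocator
// exists. (A real fallback would copy the old contents before freeing.)
void* grow(void* ptr, std::size_t new_size)
{
    if (reallocator) // impossible to express with a function *reference*
        return reallocator(ptr, new_size);
    void* fresh = allocator(new_size);
    deallocator(ptr);
    return fresh;
}
```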


@@ -69,7 +69,6 @@ namespace BAN
value_type* data() const value_type* data() const
{ {
ASSERT(m_data);
return m_data; return m_data;
} }
@@ -84,7 +83,6 @@ namespace BAN
Span slice(size_type start, size_type length = ~size_type(0)) const Span slice(size_type start, size_type length = ~size_type(0)) const
{ {
ASSERT(m_data);
ASSERT(start <= m_size); ASSERT(start <= m_size);
if (length == ~size_type(0)) if (length == ~size_type(0))
length = m_size - start; length = m_size - start;


@@ -14,6 +14,7 @@ namespace BAN
{ {
public: public:
using size_type = size_t; using size_type = size_t;
using value_type = char;
using iterator = IteratorSimple<char, String>; using iterator = IteratorSimple<char, String>;
using const_iterator = ConstIteratorSimple<char, String>; using const_iterator = ConstIteratorSimple<char, String>;
static constexpr size_type sso_capacity = 15; static constexpr size_type sso_capacity = 15;
@@ -352,10 +353,9 @@ namespace BAN::Formatter
{ {
template<typename F> template<typename F>
void print_argument(F putc, const String& string, const ValueFormat&) void print_argument(F putc, const String& string, const ValueFormat& format)
{ {
for (String::size_type i = 0; i < string.size(); i++) print_argument(putc, string.sv(), format);
putc(string[i]);
} }
} }


@@ -14,6 +14,7 @@ namespace BAN
{ {
public: public:
using size_type = size_t; using size_type = size_t;
using value_type = char;
using const_iterator = ConstIteratorSimple<char, StringView>; using const_iterator = ConstIteratorSimple<char, StringView>;
public: public:
@@ -246,10 +247,12 @@ namespace BAN::Formatter
{ {
template<typename F> template<typename F>
void print_argument(F putc, const StringView& sv, const ValueFormat&) void print_argument(F putc, const StringView& sv, const ValueFormat& format)
{ {
for (StringView::size_type i = 0; i < sv.size(); i++) for (StringView::size_type i = 0; i < sv.size(); i++)
putc(sv[i]); putc(sv[i]);
for (int i = sv.size(); i < format.fill; i++)
putc(' ');
} }
} }


@@ -61,6 +61,9 @@ namespace BAN
template<typename T> struct is_move_constructible { static constexpr bool value = is_constructible_v<T, T&&>; }; template<typename T> struct is_move_constructible { static constexpr bool value = is_constructible_v<T, T&&>; };
template<typename T> inline constexpr bool is_move_constructible_v = is_move_constructible<T>::value; template<typename T> inline constexpr bool is_move_constructible_v = is_move_constructible<T>::value;
template<typename T> struct is_trivially_copyable { static constexpr bool value = __is_trivially_copyable(T); };
template<typename T> inline constexpr bool is_trivially_copyable_v = is_trivially_copyable<T>::value;
template<typename T> struct is_integral { static constexpr bool value = requires (T t, T* p, void (*f)(T)) { reinterpret_cast<T>(t); f(0); p + t; }; }; template<typename T> struct is_integral { static constexpr bool value = requires (T t, T* p, void (*f)(T)) { reinterpret_cast<T>(t); f(0); p + t; }; };
template<typename T> inline constexpr bool is_integral_v = is_integral<T>::value; template<typename T> inline constexpr bool is_integral_v = is_integral<T>::value;
template<typename T> concept integral = is_integral_v<T>; template<typename T> concept integral = is_integral_v<T>;
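The new `is_trivially_copyable` trait above wraps the `__is_trivially_copyable` compiler builtin (available in GCC and Clang), which is the same primitive the standard library trait is built on. A standalone sketch:

```cpp
#include <string>
#include <type_traits>

// Same shape as the trait added in the hunk above; it should always agree
// with std::is_trivially_copyable, since both query the compiler builtin.
template<typename T>
struct is_trivially_copyable { static constexpr bool value = __is_trivially_copyable(T); };
template<typename T>
inline constexpr bool is_trivially_copyable_v = is_trivially_copyable<T>::value;
```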


@@ -126,14 +126,16 @@ namespace BAN
Variant(Variant&& other) Variant(Variant&& other)
: m_index(other.m_index) : m_index(other.m_index)
{ {
detail::move_construct<Ts...>(other.m_index, other.m_storage, m_storage); if (other.has_value())
detail::move_construct<Ts...>(other.m_index, other.m_storage, m_storage);
other.clear(); other.clear();
} }
Variant(const Variant& other) Variant(const Variant& other)
: m_index(other.m_index) : m_index(other.m_index)
{ {
detail::copy_construct<Ts...>(other.m_index, other.m_storage, m_storage); if (other.has_value())
detail::copy_construct<Ts...>(other.m_index, other.m_storage, m_storage);
} }
template<typename T> template<typename T>
@@ -157,12 +159,13 @@ namespace BAN
Variant& operator=(Variant&& other) Variant& operator=(Variant&& other)
{ {
if (m_index == other.m_index) if (m_index == other.m_index && m_index != invalid_index())
detail::move_assign<Ts...>(m_index, other.m_storage, m_storage); detail::move_assign<Ts...>(m_index, other.m_storage, m_storage);
else else
{ {
clear(); clear();
detail::move_construct<Ts...>(other.m_index, other.m_storage, m_storage); if (other.has_value())
detail::move_construct<Ts...>(other.m_index, other.m_storage, m_storage);
m_index = other.m_index; m_index = other.m_index;
} }
other.clear(); other.clear();
@@ -171,12 +174,13 @@ namespace BAN
Variant& operator=(const Variant& other) Variant& operator=(const Variant& other)
{ {
if (m_index == other.m_index) if (m_index == other.m_index && m_index != invalid_index())
detail::copy_assign<Ts...>(m_index, other.m_storage, m_storage); detail::copy_assign<Ts...>(m_index, other.m_storage, m_storage);
else else
{ {
clear(); clear();
detail::copy_construct<Ts...>(other.m_index, other.m_storage, m_storage); if (other.has_value())
detail::copy_construct<Ts...>(other.m_index, other.m_storage, m_storage);
m_index = other.m_index; m_index = other.m_index;
} }
return *this; return *this;
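The Variant hunks above all add the same guard: after a move (or default construction) the variant holds `invalid_index`, and the per-type `move_construct`/`copy_construct` dispatch must not run in that state. A minimal model of the fix (`TinyVariant` is illustrative, not the real `BAN::Variant`):

```cpp
#include <cstddef>
#include <utility>

// One-alternative model: m_index == invalid_index means "no value", which
// is exactly the state a moved-from variant is left in.
struct TinyVariant
{
    static constexpr std::size_t invalid_index = static_cast<std::size_t>(-1);
    std::size_t m_index = invalid_index;
    int m_storage = 0;

    bool has_value() const { return m_index != invalid_index; }

    TinyVariant() = default;
    explicit TinyVariant(int v) : m_index(0), m_storage(v) {}

    TinyVariant(TinyVariant&& other) : m_index(other.m_index)
    {
        if (other.has_value())           // the guard added in the hunk above
            m_storage = other.m_storage; // stands in for detail::move_construct
        other.m_index = invalid_index;   // other.clear()
    }
};
```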


@@ -381,19 +381,46 @@ namespace BAN
template<typename T> template<typename T>
ErrorOr<void> Vector<T>::ensure_capacity(size_type size) ErrorOr<void> Vector<T>::ensure_capacity(size_type size)
{ {
static_assert(alignof(T) <= alignof(max_align_t), "over aligned types not supported");
if (m_capacity >= size) if (m_capacity >= size)
return {}; return {};
size_type new_cap = BAN::Math::max<size_type>(size, m_capacity * 2);
T* new_data = (T*)BAN::allocator(new_cap * sizeof(T)); const size_type new_cap = BAN::Math::max<size_type>(size, m_capacity * 2);
if (new_data == nullptr)
return Error::from_errno(ENOMEM); if constexpr (BAN::is_trivially_copyable_v<T>)
for (size_type i = 0; i < m_size; i++)
{ {
new (new_data + i) T(move(m_data[i])); if constexpr (BAN::reallocator)
m_data[i].~T(); {
auto* new_data = static_cast<T*>(BAN::reallocator(m_data, new_cap * sizeof(T)));
if (new_data == nullptr)
return Error::from_errno(ENOMEM);
m_data = new_data;
}
else
{
auto* new_data = static_cast<T*>(BAN::allocator(new_cap * sizeof(T)));
if (new_data == nullptr)
return Error::from_errno(ENOMEM);
memcpy(new_data, m_data, m_size * sizeof(T));
BAN::deallocator(m_data);
m_data = new_data;
}
} }
BAN::deallocator(m_data); else
m_data = new_data; {
auto* new_data = static_cast<T*>(BAN::allocator(new_cap * sizeof(T)));
if (new_data == nullptr)
return Error::from_errno(ENOMEM);
for (size_type i = 0; i < m_size; i++)
{
new (new_data + i) T(move(m_data[i]));
m_data[i].~T();
}
BAN::deallocator(m_data);
m_data = new_data;
}
m_capacity = new_cap; m_capacity = new_cap;
return {}; return {};
} }
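The `ensure_capacity` rewrite above splits growth into two paths: trivially copyable elements are relocated with `realloc` (or a raw `memcpy` when no reallocator exists), everything else gets the classic per-element move-construct + destroy loop. A simplified sketch of the same strategy, with no ENOMEM handling and using the standard allocator for illustration:

```cpp
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Grow a buffer of `size` constructed T's to `new_cap` capacity.
template<typename T>
T* grow_storage(T* data, std::size_t size, std::size_t new_cap)
{
    if constexpr (std::is_trivially_copyable_v<T>)
    {
        // realloc may extend in place; either way the bytes move correctly
        // for trivially copyable types.
        return static_cast<T*>(std::realloc(data, new_cap * sizeof(T)));
    }
    else
    {
        T* new_data = static_cast<T*>(std::malloc(new_cap * sizeof(T)));
        for (std::size_t i = 0; i < size; i++)
        {
            new (new_data + i) T(std::move(data[i])); // placement move-construct
            data[i].~T();
        }
        std::free(data);
        return new_data;
    }
}
```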

Binary file not shown.


@@ -25,6 +25,7 @@ set(KERNEL_SOURCES
kernel/Epoll.cpp kernel/Epoll.cpp
kernel/Errors.cpp kernel/Errors.cpp
kernel/FS/DevFS/FileSystem.cpp kernel/FS/DevFS/FileSystem.cpp
kernel/FS/EventFD.cpp
kernel/FS/Ext2/FileSystem.cpp kernel/FS/Ext2/FileSystem.cpp
kernel/FS/Ext2/Inode.cpp kernel/FS/Ext2/Inode.cpp
kernel/FS/FAT/FileSystem.cpp kernel/FS/FAT/FileSystem.cpp
@@ -50,6 +51,7 @@ set(KERNEL_SOURCES
kernel/InterruptController.cpp kernel/InterruptController.cpp
kernel/kernel.cpp kernel/kernel.cpp
kernel/Lock/SpinLock.cpp kernel/Lock/SpinLock.cpp
kernel/Memory/ByteRingBuffer.cpp
kernel/Memory/DMARegion.cpp kernel/Memory/DMARegion.cpp
kernel/Memory/FileBackedRegion.cpp kernel/Memory/FileBackedRegion.cpp
kernel/Memory/Heap.cpp kernel/Memory/Heap.cpp
@@ -137,6 +139,7 @@ if("${BANAN_ARCH}" STREQUAL "x86_64")
arch/x86_64/Signal.S arch/x86_64/Signal.S
arch/x86_64/Syscall.S arch/x86_64/Syscall.S
arch/x86_64/Thread.S arch/x86_64/Thread.S
arch/x86_64/User.S
arch/x86_64/Yield.S arch/x86_64/Yield.S
) )
elseif("${BANAN_ARCH}" STREQUAL "i686") elseif("${BANAN_ARCH}" STREQUAL "i686")
@@ -148,6 +151,7 @@ elseif("${BANAN_ARCH}" STREQUAL "i686")
arch/i686/Signal.S arch/i686/Signal.S
arch/i686/Syscall.S arch/i686/Syscall.S
arch/i686/Thread.S arch/i686/Thread.S
arch/i686/User.S
arch/i686/Yield.S arch/i686/Yield.S
) )
else() else()
@@ -164,10 +168,7 @@ set(BAN_SOURCES
set(KLIBC_SOURCES set(KLIBC_SOURCES
klibc/ctype.cpp klibc/ctype.cpp
klibc/string.cpp klibc/string.cpp
klibc/arch/${BANAN_ARCH}/string.S
# Ehhh don't do this but for now libc uses the same stuff kernel can use
# This won't work after libc starts using sse implemetations tho
../userspace/libraries/LibC/arch/${BANAN_ARCH}/string.S
) )
set(LIBDEFLATE_SOURCE set(LIBDEFLATE_SOURCE


@@ -21,6 +21,11 @@ namespace Kernel
SpinLock PageTable::s_fast_page_lock; SpinLock PageTable::s_fast_page_lock;
constexpr uint64_t s_page_flag_mask = 0x8000000000000FFF;
constexpr uint64_t s_page_addr_mask = ~s_page_flag_mask;
static bool s_is_post_heap_done = false;
static PageTable* s_kernel = nullptr; static PageTable* s_kernel = nullptr;
static bool s_has_nxe = false; static bool s_has_nxe = false;
static bool s_has_pge = false; static bool s_has_pge = false;
@@ -67,7 +72,7 @@ namespace Kernel
void PageTable::initialize_post_heap() void PageTable::initialize_post_heap()
{ {
// NOTE: this is no-op as our 32 bit target does not use hhdm s_is_post_heap_done = true;
} }
void PageTable::initial_load() void PageTable::initial_load()
@@ -141,9 +146,9 @@ namespace Kernel
} }
template<typename T> template<typename T>
static vaddr_t P2V(const T paddr) static uint64_t* P2V(const T paddr)
{ {
return (paddr_t)paddr - g_boot_info.kernel_paddr + KERNEL_OFFSET; return reinterpret_cast<uint64_t*>(reinterpret_cast<paddr_t>(paddr) - g_boot_info.kernel_paddr + KERNEL_OFFSET);
} }
void PageTable::initialize_kernel() void PageTable::initialize_kernel()
@@ -193,13 +198,18 @@ namespace Kernel
{ {
constexpr uint64_t pdpte = (fast_page() >> 30) & 0x1FF; constexpr uint64_t pdpte = (fast_page() >> 30) & 0x1FF;
constexpr uint64_t pde = (fast_page() >> 21) & 0x1FF; constexpr uint64_t pde = (fast_page() >> 21) & 0x1FF;
constexpr uint64_t pte = (fast_page() >> 12) & 0x1FF;
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); const uint64_t* pdpt = P2V(m_highest_paging_struct);
ASSERT(pdpt[pdpte] & Flags::Present); ASSERT(pdpt[pdpte] & Flags::Present);
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte]) & PAGE_ADDR_MASK); uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
ASSERT(!(pd[pde] & Flags::Present)); ASSERT(!(pd[pde] & Flags::Present));
pd[pde] = V2P(allocate_zeroed_page_aligned_page()) | Flags::ReadWrite | Flags::Present; pd[pde] = V2P(allocate_zeroed_page_aligned_page()) | Flags::ReadWrite | Flags::Present;
uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
ASSERT(pt[pte] == 0);
pt[pte] = Flags::Reserved;
} }
void PageTable::map_fast_page(paddr_t paddr) void PageTable::map_fast_page(paddr_t paddr)
@@ -214,9 +224,9 @@ namespace Kernel
constexpr uint64_t pde = (fast_page() >> 21) & 0x1FF; constexpr uint64_t pde = (fast_page() >> 21) & 0x1FF;
constexpr uint64_t pte = (fast_page() >> 12) & 0x1FF; constexpr uint64_t pte = (fast_page() >> 12) & 0x1FF;
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(s_kernel->m_highest_paging_struct)); uint64_t* pdpt = P2V(s_kernel->m_highest_paging_struct);
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte] & PAGE_ADDR_MASK)); uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
uint64_t* pt = reinterpret_cast<uint64_t*>(P2V(pd[pde] & PAGE_ADDR_MASK)); uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
ASSERT(!(pt[pte] & Flags::Present)); ASSERT(!(pt[pte] & Flags::Present));
pt[pte] = paddr | Flags::ReadWrite | Flags::Present; pt[pte] = paddr | Flags::ReadWrite | Flags::Present;
@@ -234,12 +244,12 @@ namespace Kernel
constexpr uint64_t pde = (fast_page() >> 21) & 0x1FF; constexpr uint64_t pde = (fast_page() >> 21) & 0x1FF;
constexpr uint64_t pte = (fast_page() >> 12) & 0x1FF; constexpr uint64_t pte = (fast_page() >> 12) & 0x1FF;
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(s_kernel->m_highest_paging_struct)); uint64_t* pdpt = P2V(s_kernel->m_highest_paging_struct);
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte] & PAGE_ADDR_MASK)); uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
uint64_t* pt = reinterpret_cast<uint64_t*>(P2V(pd[pde] & PAGE_ADDR_MASK)); uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
ASSERT(pt[pte] & Flags::Present); ASSERT(pt[pte] & Flags::Present);
pt[pte] = 0; pt[pte] = Flags::Reserved;
asm volatile("invlpg (%0)" :: "r"(fast_page()) : "memory"); asm volatile("invlpg (%0)" :: "r"(fast_page()) : "memory");
} }
@@ -263,7 +273,7 @@ namespace Kernel
m_highest_paging_struct = V2P(kmalloc(32, 32, true)); m_highest_paging_struct = V2P(kmalloc(32, 32, true));
ASSERT(m_highest_paging_struct); ASSERT(m_highest_paging_struct);
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); uint64_t* pdpt = P2V(m_highest_paging_struct);
pdpt[0] = 0; pdpt[0] = 0;
pdpt[1] = 0; pdpt[1] = 0;
pdpt[2] = 0; pdpt[2] = 0;
@@ -276,18 +286,17 @@ namespace Kernel
if (m_highest_paging_struct == 0) if (m_highest_paging_struct == 0)
return; return;
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); uint64_t* pdpt = P2V(m_highest_paging_struct);
for (uint32_t pdpte = 0; pdpte < 3; pdpte++) for (uint32_t pdpte = 0; pdpte < 3; pdpte++)
{ {
if (!(pdpt[pdpte] & Flags::Present)) if (!(pdpt[pdpte] & Flags::Present))
continue; continue;
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte] & PAGE_ADDR_MASK)); uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
for (uint32_t pde = 0; pde < 512; pde++) for (uint32_t pde = 0; pde < 512; pde++)
{ {
if (!(pd[pde] & Flags::Present)) if (!(pd[pde] & Flags::Present))
continue; continue;
kfree(reinterpret_cast<uint64_t*>(P2V(pd[pde] & PAGE_ADDR_MASK))); kfree(P2V(pd[pde] & s_page_addr_mask));
} }
kfree(pd); kfree(pd);
} }
@@ -298,15 +307,43 @@ namespace Kernel
{ {
SpinLockGuard _(m_lock); SpinLockGuard _(m_lock);
ASSERT(m_highest_paging_struct < 0x100000000); ASSERT(m_highest_paging_struct < 0x100000000);
const uint32_t pdpt_lo = m_highest_paging_struct; asm volatile("movl %0, %%cr3" :: "r"(static_cast<uint32_t>(m_highest_paging_struct)));
asm volatile("movl %0, %%cr3" :: "r"(pdpt_lo));
Processor::set_current_page_table(this); Processor::set_current_page_table(this);
} }
void PageTable::invalidate(vaddr_t vaddr, bool send_smp_message) void PageTable::invalidate_range(vaddr_t vaddr, size_t pages, bool send_smp_message)
{ {
ASSERT(vaddr % PAGE_SIZE == 0); ASSERT(vaddr % PAGE_SIZE == 0);
asm volatile("invlpg (%0)" :: "r"(vaddr) : "memory");
const bool is_userspace = (vaddr < KERNEL_OFFSET);
if (is_userspace && this != &PageTable::current())
;
else if (pages <= 32 || !s_is_post_heap_done)
{
for (size_t i = 0; i < pages; i++, vaddr += PAGE_SIZE)
asm volatile("invlpg (%0)" :: "r"(vaddr));
}
else if (is_userspace || !s_has_pge)
{
asm volatile("movl %0, %%cr3" :: "r"(static_cast<uint32_t>(m_highest_paging_struct)));
}
else
{
asm volatile(
"movl %%cr4, %%eax;"
"andl $~0x80, %%eax;"
"movl %%eax, %%cr4;"
"movl %0, %%cr3;"
"orl $0x80, %%eax;"
"movl %%eax, %%cr4;"
:
: "r"(static_cast<uint32_t>(m_highest_paging_struct))
: "eax"
);
}
if (send_smp_message) if (send_smp_message)
{ {
@@ -314,14 +351,14 @@ namespace Kernel
.type = Processor::SMPMessage::Type::FlushTLB, .type = Processor::SMPMessage::Type::FlushTLB,
.flush_tlb = { .flush_tlb = {
.vaddr = vaddr, .vaddr = vaddr,
.page_count = 1, .page_count = pages,
.page_table = vaddr < KERNEL_OFFSET ? this : nullptr, .page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
} }
}); });
} }
} }
void PageTable::unmap_page(vaddr_t vaddr, bool send_smp_message) void PageTable::unmap_page(vaddr_t vaddr, bool invalidate)
{ {
ASSERT(vaddr); ASSERT(vaddr);
ASSERT(vaddr % PAGE_SIZE == 0); ASSERT(vaddr % PAGE_SIZE == 0);
@@ -340,16 +377,16 @@ namespace Kernel
if (is_page_free(vaddr)) if (is_page_free(vaddr))
Kernel::panic("trying to unmap unmapped page 0x{H}", vaddr); Kernel::panic("trying to unmap unmapped page 0x{H}", vaddr);
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); uint64_t* pdpt = P2V(m_highest_paging_struct);
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte] & PAGE_ADDR_MASK)); uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
uint64_t* pt = reinterpret_cast<uint64_t*>(P2V(pd[pde] & PAGE_ADDR_MASK)); uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
const paddr_t old_paddr = pt[pte] & PAGE_ADDR_MASK; const paddr_t old_paddr = pt[pte] & s_page_addr_mask;
pt[pte] = 0; pt[pte] = 0;
if (old_paddr != 0) if (invalidate && old_paddr != 0)
invalidate(vaddr, send_smp_message); invalidate_page(vaddr, true);
} }
void PageTable::unmap_range(vaddr_t vaddr, size_t size) void PageTable::unmap_range(vaddr_t vaddr, size_t size)
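The new `invalidate_range` above picks one of four flush strategies. A side-effect-free model of that decision (illustrative; the early-boot `s_is_post_heap_done` case is folded out for brevity): few pages get per-page `invlpg`; large userspace ranges, or CPUs without global pages, get a CR3 reload; large kernel ranges need the CR4.PGE toggle so that global TLB entries are flushed too.

```cpp
#include <cstddef>

enum class FlushKind { None, PerPage, ReloadCR3, TogglePGE };

FlushKind choose_flush(bool is_current_table, bool is_userspace, std::size_t pages, bool has_pge)
{
    if (is_userspace && !is_current_table)
        return FlushKind::None;      // another table's user mappings aren't cached here
    if (pages <= 32)
        return FlushKind::PerPage;   // per-page invlpg beats a full TLB flush
    if (is_userspace || !has_pge)
        return FlushKind::ReloadCR3; // reload flushes all non-global entries
    return FlushKind::TogglePGE;     // global (kernel) pages survive a CR3 reload
}
```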
@@ -361,18 +398,10 @@ namespace Kernel
SpinLockGuard _(m_lock); SpinLockGuard _(m_lock);
for (vaddr_t page = 0; page < page_count; page++) for (vaddr_t page = 0; page < page_count; page++)
unmap_page(vaddr + page * PAGE_SIZE, false); unmap_page(vaddr + page * PAGE_SIZE, false);
invalidate_range(vaddr, page_count, true);
Processor::broadcast_smp_message({
.type = Processor::SMPMessage::Type::FlushTLB,
.flush_tlb = {
.vaddr = vaddr,
.page_count = page_count,
.page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
}
});
} }
void PageTable::map_page_at(paddr_t paddr, vaddr_t vaddr, flags_t flags, MemoryType memory_type, bool send_smp_message) void PageTable::map_page_at(paddr_t paddr, vaddr_t vaddr, flags_t flags, MemoryType memory_type, bool invalidate)
{ {
ASSERT(vaddr); ASSERT(vaddr);
ASSERT(vaddr != fast_page()); ASSERT(vaddr != fast_page());
@@ -407,11 +436,11 @@ namespace Kernel
SpinLockGuard _(m_lock); SpinLockGuard _(m_lock);
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); uint64_t* pdpt = P2V(m_highest_paging_struct);
if (!(pdpt[pdpte] & Flags::Present)) if (!(pdpt[pdpte] & Flags::Present))
pdpt[pdpte] = V2P(allocate_zeroed_page_aligned_page()) | Flags::Present; pdpt[pdpte] = V2P(allocate_zeroed_page_aligned_page()) | Flags::Present;
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte] & PAGE_ADDR_MASK)); uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
if ((pd[pde] & uwr_flags) != uwr_flags) if ((pd[pde] & uwr_flags) != uwr_flags)
{ {
if (!(pd[pde] & Flags::Present)) if (!(pd[pde] & Flags::Present))
@@ -422,14 +451,14 @@ namespace Kernel
if (!(flags & Flags::Present)) if (!(flags & Flags::Present))
uwr_flags &= ~Flags::Present; uwr_flags &= ~Flags::Present;
uint64_t* pt = reinterpret_cast<uint64_t*>(P2V(pd[pde] & PAGE_ADDR_MASK)); uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
const paddr_t old_paddr = pt[pte] & PAGE_ADDR_MASK; const paddr_t old_paddr = pt[pte] & s_page_addr_mask;
pt[pte] = paddr | uwr_flags | extra_flags; pt[pte] = paddr | uwr_flags | extra_flags;
if (old_paddr != 0) if (invalidate && old_paddr != 0)
invalidate(vaddr, send_smp_message); invalidate_page(vaddr, true);
} }
void PageTable::map_range_at(paddr_t paddr, vaddr_t vaddr, size_t size, flags_t flags, MemoryType memory_type) void PageTable::map_range_at(paddr_t paddr, vaddr_t vaddr, size_t size, flags_t flags, MemoryType memory_type)
@@ -443,15 +472,49 @@ namespace Kernel
SpinLockGuard _(m_lock); SpinLockGuard _(m_lock);
for (size_t page = 0; page < page_count; page++) for (size_t page = 0; page < page_count; page++)
map_page_at(paddr + page * PAGE_SIZE, vaddr + page * PAGE_SIZE, flags, memory_type, false); map_page_at(paddr + page * PAGE_SIZE, vaddr + page * PAGE_SIZE, flags, memory_type, false);
invalidate_range(vaddr, page_count, true);
}
Processor::broadcast_smp_message({ void PageTable::remove_writable_from_range(vaddr_t vaddr, size_t size)
.type = Processor::SMPMessage::Type::FlushTLB, {
.flush_tlb = { ASSERT(vaddr);
.vaddr = vaddr, ASSERT(vaddr % PAGE_SIZE == 0);
.page_count = page_count,
.page_table = vaddr < KERNEL_OFFSET ? this : nullptr, uint32_t pdpte = (vaddr >> 30) & 0x1FF;
uint32_t pde = (vaddr >> 21) & 0x1FF;
uint32_t pte = (vaddr >> 12) & 0x1FF;
const uint32_t e_pdpte = ((vaddr + size - 1) >> 30) & 0x1FF;
const uint32_t e_pde = ((vaddr + size - 1) >> 21) & 0x1FF;
const uint32_t e_pte = ((vaddr + size - 1) >> 12) & 0x1FF;
SpinLockGuard _(m_lock);
const uint64_t* pdpt = P2V(m_highest_paging_struct);
for (; pdpte <= e_pdpte; pdpte++)
{
if (!(pdpt[pdpte] & Flags::Present))
continue;
const uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
for (; pde < 512; pde++)
{
if (pdpte == e_pdpte && pde > e_pde)
break;
if (!(pd[pde] & Flags::ReadWrite))
continue;
uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
for (; pte < 512; pte++)
{
if (pdpte == e_pdpte && pde == e_pde && pte > e_pte)
break;
pt[pte] &= ~static_cast<uint64_t>(Flags::ReadWrite);
}
pte = 0;
} }
}); pde = 0;
}
invalidate_range(vaddr, size / PAGE_SIZE, true);
} }
uint64_t PageTable::get_page_data(vaddr_t vaddr) const uint64_t PageTable::get_page_data(vaddr_t vaddr) const
@@ -464,15 +527,15 @@ namespace Kernel
SpinLockGuard _(m_lock); SpinLockGuard _(m_lock);
uint64_t* pdpt = (uint64_t*)P2V(m_highest_paging_struct); const uint64_t* pdpt = P2V(m_highest_paging_struct);
if (!(pdpt[pdpte] & Flags::Present)) if (!(pdpt[pdpte] & Flags::Present))
return 0; return 0;
uint64_t* pd = (uint64_t*)P2V(pdpt[pdpte] & PAGE_ADDR_MASK); const uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
if (!(pd[pde] & Flags::Present)) if (!(pd[pde] & Flags::Present))
return 0; return 0;
uint64_t* pt = (uint64_t*)P2V(pd[pde] & PAGE_ADDR_MASK); const uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
if (!(pt[pte] & Flags::Used)) if (!(pt[pte] & Flags::Used))
return 0; return 0;
@@ -486,8 +549,7 @@ namespace Kernel
paddr_t PageTable::physical_address_of(vaddr_t vaddr) const paddr_t PageTable::physical_address_of(vaddr_t vaddr) const
{ {
uint64_t page_data = get_page_data(vaddr); return get_page_data(vaddr) & s_page_addr_mask;
return (page_data & PAGE_ADDR_MASK) & ~(1ull << 63);
} }
bool PageTable::is_page_free(vaddr_t vaddr) const bool PageTable::is_page_free(vaddr_t vaddr) const
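The `physical_address_of` simplification above works because the new `s_page_addr_mask` clears both the low 12 flag bits and the top NX bit (bit 63) in one mask, making the old extra `& ~(1ull << 63)` step redundant. A quick standalone check:

```cpp
#include <cstdint>

// Same masks as the hunk near the top of this file's diff.
constexpr uint64_t page_flag_mask = 0x8000000000000FFF;
constexpr uint64_t page_addr_mask = ~page_flag_mask;

constexpr uint64_t physical_address_of(uint64_t page_data)
{
    return page_data & page_addr_mask;
}
```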
@@ -529,14 +591,8 @@ namespace Kernel
return false; return false;
for (size_t offset = 0; offset < bytes; offset += PAGE_SIZE) for (size_t offset = 0; offset < bytes; offset += PAGE_SIZE)
reserve_page(vaddr + offset, true, false); reserve_page(vaddr + offset, true, false);
Processor::broadcast_smp_message({ invalidate_range(vaddr, bytes / PAGE_SIZE, true);
.type = Processor::SMPMessage::Type::FlushTLB,
.flush_tlb = {
.vaddr = vaddr,
.page_count = bytes / PAGE_SIZE,
.page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
}
});
return true; return true;
} }
@@ -549,48 +605,47 @@ namespace Kernel
if (size_t rem = last_address % PAGE_SIZE) if (size_t rem = last_address % PAGE_SIZE)
last_address -= rem; last_address -= rem;
const uint32_t s_pdpte = (first_address >> 30) & 0x1FF; uint32_t pdpte = (first_address >> 30) & 0x1FF;
const uint32_t s_pde = (first_address >> 21) & 0x1FF; uint32_t pde = (first_address >> 21) & 0x1FF;
const uint32_t s_pte = (first_address >> 12) & 0x1FF; uint32_t pte = (first_address >> 12) & 0x1FF;
const uint32_t e_pdpte = (last_address >> 30) & 0x1FF; const uint32_t e_pdpte = ((last_address - 1) >> 30) & 0x1FF;
const uint32_t e_pde = (last_address >> 21) & 0x1FF; const uint32_t e_pde = ((last_address - 1) >> 21) & 0x1FF;
const uint32_t e_pte = (last_address >> 12) & 0x1FF; const uint32_t e_pte = ((last_address - 1) >> 12) & 0x1FF;
SpinLockGuard _(m_lock); SpinLockGuard _(m_lock);
// Try to find free page that can be mapped without // Try to find free page that can be mapped without
// allocations (page table with unused entries) // allocations (page table with unused entries)
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); const uint64_t* pdpt = P2V(m_highest_paging_struct);
for (uint32_t pdpte = s_pdpte; pdpte < 4; pdpte++) for (; pdpte <= e_pdpte; pdpte++)
{ {
if (pdpte > e_pdpte)
break;
if (!(pdpt[pdpte] & Flags::Present)) if (!(pdpt[pdpte] & Flags::Present))
continue; continue;
uint64_t* pd = reinterpret_cast<uint64_t*>(P2V(pdpt[pdpte] & PAGE_ADDR_MASK)); const uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
for (uint32_t pde = s_pde; pde < 512; pde++) for (; pde < 512; pde++)
{ {
if (pdpte == e_pdpte && pde > e_pde) if (pdpte == e_pdpte && pde > e_pde)
break; break;
if (!(pd[pde] & Flags::Present)) if (!(pd[pde] & Flags::Present))
continue; continue;
uint64_t* pt = (uint64_t*)P2V(pd[pde] & PAGE_ADDR_MASK); const uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
for (uint32_t pte = s_pte; pte < 512; pte++) for (; pte < 512; pte++)
{ {
if (pdpte == e_pdpte && pde == e_pde && pte >= e_pte) if (pdpte == e_pdpte && pde == e_pde && pte > e_pte)
break; break;
if (!(pt[pte] & Flags::Used)) if (pt[pte] & Flags::Used)
{ continue;
vaddr_t vaddr = 0; vaddr_t vaddr = 0;
vaddr |= (vaddr_t)pdpte << 30; vaddr |= (vaddr_t)pdpte << 30;
vaddr |= (vaddr_t)pde << 21; vaddr |= (vaddr_t)pde << 21;
vaddr |= (vaddr_t)pte << 12; vaddr |= (vaddr_t)pte << 12;
ASSERT(reserve_page(vaddr)); ASSERT(reserve_page(vaddr));
return vaddr; return vaddr;
}
} }
pte = 0;
} }
pde = 0;
} }
// Find any free page // Find any free page
@@ -603,7 +658,7 @@ namespace Kernel
} }
} }
ASSERT_NOT_REACHED(); return 0;
} }
vaddr_t PageTable::reserve_free_contiguous_pages(size_t page_count, vaddr_t first_address, vaddr_t last_address) vaddr_t PageTable::reserve_free_contiguous_pages(size_t page_count, vaddr_t first_address, vaddr_t last_address)
@@ -636,7 +691,7 @@ namespace Kernel
} }
} }
ASSERT_NOT_REACHED(); return 0;
} }
static void dump_range(vaddr_t start, vaddr_t end, PageTable::flags_t flags) static void dump_range(vaddr_t start, vaddr_t end, PageTable::flags_t flags)
@@ -659,7 +714,7 @@ namespace Kernel
flags_t flags = 0; flags_t flags = 0;
vaddr_t start = 0; vaddr_t start = 0;
uint64_t* pdpt = reinterpret_cast<uint64_t*>(P2V(m_highest_paging_struct)); const uint64_t* pdpt = P2V(m_highest_paging_struct);
for (uint32_t pdpte = 0; pdpte < 4; pdpte++) for (uint32_t pdpte = 0; pdpte < 4; pdpte++)
{ {
if (!(pdpt[pdpte] & Flags::Present)) if (!(pdpt[pdpte] & Flags::Present))
@@ -668,7 +723,7 @@ namespace Kernel
start = 0; start = 0;
continue; continue;
} }
uint64_t* pd = (uint64_t*)P2V(pdpt[pdpte] & PAGE_ADDR_MASK); const uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
for (uint64_t pde = 0; pde < 512; pde++) for (uint64_t pde = 0; pde < 512; pde++)
{ {
if (!(pd[pde] & Flags::Present)) if (!(pd[pde] & Flags::Present))
@@ -677,7 +732,7 @@ namespace Kernel
start = 0; start = 0;
continue; continue;
} }
uint64_t* pt = (uint64_t*)P2V(pd[pde] & PAGE_ADDR_MASK); const uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
for (uint64_t pte = 0; pte < 512; pte++) for (uint64_t pte = 0; pte < 512; pte++)
{ {
if (parse_flags(pt[pte]) != flags) if (parse_flags(pt[pte]) != flags)


@@ -1,12 +1,13 @@
.section .userspace, "ax" .section .userspace, "ax"
// stack contains // stack contains
// return address // (4 bytes) return address (on return stack)
// return stack // (4 bytes) return stack
// return rflags // (4 bytes) return rflags
// siginfo_t // (8 bytes) restore sigmask
// signal number // (36 bytes) siginfo_t
// signal handler // (4 bytes) signal number
// (4 bytes) signal handler
.global signal_trampoline .global signal_trampoline
signal_trampoline: signal_trampoline:
@@ -18,6 +19,10 @@ signal_trampoline:
pushl %eax pushl %eax
pushl %ebp pushl %ebp
movl 80(%esp), %eax
pushl %eax; addl $4, (%esp)
pushl (%eax)
// FIXME: populate these // FIXME: populate these
xorl %eax, %eax xorl %eax, %eax
pushl %eax // stack pushl %eax // stack
@@ -28,9 +33,9 @@ signal_trampoline:
pushl %eax // link pushl %eax // link
movl %esp, %edx // ucontext movl %esp, %edx // ucontext
leal 60(%esp), %esi // siginfo leal 68(%esp), %esi // siginfo
movl 56(%esp), %edi // signal number movl 64(%esp), %edi // signal number
movl 52(%esp), %eax // handlers movl 60(%esp), %eax // handlers
// align stack to 16 bytes // align stack to 16 bytes
movl %esp, %ebp movl %esp, %ebp
@@ -53,7 +58,15 @@ signal_trampoline:
movl %ebp, %esp movl %ebp, %esp
addl $24, %esp addl $24, %esp
// restore sigmask
movl $83, %eax // SYS_SIGPROCMASK
movl $3, %ebx // SIG_SETMASK
leal 72(%esp), %ecx // set
xorl %edx, %edx // oset
int $0xF0
// restore registers // restore registers
addl $8, %esp
popl %ebp popl %ebp
popl %eax popl %eax
popl %ebx popl %ebx
@@ -62,8 +75,8 @@ signal_trampoline:
popl %edi popl %edi
popl %esi popl %esi
// skip handler, number, siginfo_t // skip handler, number, siginfo_t, sigmask
addl $44, %esp addl $52, %esp
// restore flags // restore flags
popf popf


@@ -63,7 +63,7 @@ sys_fork_trampoline:
call read_ip call read_ip
testl %eax, %eax testl %eax, %eax
jz .reload_stack jz .done
movl %esp, %ebx movl %esp, %ebx
@@ -79,9 +79,3 @@ sys_fork_trampoline:
popl %ebx popl %ebx
popl %ebp popl %ebp
ret ret
.reload_stack:
call get_thread_start_sp
movl %eax, %esp
xorl %eax, %eax
jmp .done


@@ -7,9 +7,6 @@ read_ip:
# void start_kernel_thread() # void start_kernel_thread()
.global start_kernel_thread .global start_kernel_thread
start_kernel_thread: start_kernel_thread:
call get_thread_start_sp
movl %eax, %esp
# STACK LAYOUT # STACK LAYOUT
# on_exit arg # on_exit arg
# on_exit func # on_exit func
@@ -34,9 +31,6 @@ start_kernel_thread:
.global start_userspace_thread .global start_userspace_thread
start_userspace_thread: start_userspace_thread:
call get_thread_start_sp
movl %eax, %esp
movw $(0x20 | 3), %bx movw $(0x20 | 3), %bx
movw %bx, %ds movw %bx, %ds
movw %bx, %es movw %bx, %es

kernel/arch/i686/User.S (new file, 54 lines)

@@ -0,0 +1,54 @@
# bool safe_user_memcpy(void*, const void*, size_t)
.global safe_user_memcpy
.global safe_user_memcpy_end
.global safe_user_memcpy_fault
safe_user_memcpy:
xorl %eax, %eax
xchgl 4(%esp), %edi
xchgl 8(%esp), %esi
movl 12(%esp), %ecx
movl %edi, %edx
rep movsb
movl 4(%esp), %edi
movl 8(%esp), %esi
incl %eax
safe_user_memcpy_fault:
ret
safe_user_memcpy_end:
# bool safe_user_strncpy(void*, const void*, size_t)
.global safe_user_strncpy
.global safe_user_strncpy_end
.global safe_user_strncpy_fault
safe_user_strncpy:
xchgl 4(%esp), %edi
xchgl 8(%esp), %esi
movl 12(%esp), %ecx
testl %ecx, %ecx
jz safe_user_strncpy_fault
.safe_user_strncpy_loop:
movb (%esi), %al
movb %al, (%edi)
testb %al, %al
jz .safe_user_strncpy_done
incl %edi
incl %esi
decl %ecx
jnz .safe_user_strncpy_loop
safe_user_strncpy_fault:
xorl %eax, %eax
jmp .safe_user_strncpy_return
.safe_user_strncpy_done:
movl $1, %eax
.safe_user_strncpy_return:
movl 4(%esp), %edi
movl 8(%esp), %esi
ret
safe_user_strncpy_end:


@@ -11,13 +11,14 @@
.code32 .code32
# multiboot2 header // video mode info, page align modules
.set multiboot_flags, (1 << 2) | (1 << 0)
.section .multiboot, "aw" .section .multiboot, "aw"
.align 8
multiboot_start: multiboot_start:
.long 0x1BADB002 .long 0x1BADB002
.long (1 << 2) # page align modules .long multiboot_flags
.long -(0x1BADB002 + (1 << 2)) .long -(0x1BADB002 + multiboot_flags)
.long 0 .long 0
.long 0 .long 0
@@ -30,7 +31,8 @@ multiboot_start:
.long FB_HEIGHT .long FB_HEIGHT
.long FB_BPP .long FB_BPP
multiboot_end: multiboot_end:
.align 8
.section .multiboot2, "aw"
multiboot2_start: multiboot2_start:
.long 0xE85250D6 .long 0xE85250D6
.long 0 .long 0
@@ -66,7 +68,6 @@ multiboot2_start:
multiboot2_end: multiboot2_end:
.section .bananboot, "aw" .section .bananboot, "aw"
.align 8
bananboot_start: bananboot_start:
.long 0xBABAB007 .long 0xBABAB007
.long -(0xBABAB007 + FB_WIDTH + FB_HEIGHT + FB_BPP) .long -(0xBABAB007 + FB_WIDTH + FB_HEIGHT + FB_BPP)


@@ -1,20 +1,20 @@
-.macro maybe_load_kernel_segments, n
-testb $3, \n(%esp)
-jz 1f; jnp 1f
+.macro intr_header, n
+pushal
+testb $3, \n+8*4(%esp)
+jz 1f
 movw $0x10, %ax
 movw %ax, %ds
 movw %ax, %es
 movw %ax, %fs
 movw $0x28, %ax
 movw %ax, %gs
-1:
+1: cld
 .endm

-.macro maybe_load_userspace_segments, n
-testb $3, \n(%esp)
-jz 1f; jnp 1f
+.macro intr_footer, n
+testb $3, \n+8*4(%esp)
+jz 1f
+call cpp_check_signal
 movw $(0x20 | 3), %bx
 movw %bx, %ds
 movw %bx, %es

@@ -22,14 +22,11 @@
 movw %bx, %fs
 movw $(0x38 | 3), %bx
 movw %bx, %gs
-1:
+1: popal
 .endm

 isr_stub:
-pushal
-maybe_load_kernel_segments 44
-cld
+intr_header 12
 movl %cr0, %eax; pushl %eax
 movl %cr2, %eax; pushl %eax
 movl %cr3, %eax; pushl %eax

@@ -57,15 +54,12 @@ isr_stub:
 movl %ebp, %esp
 addl $24, %esp
-maybe_load_userspace_segments 44
-popal
+intr_footer 12
 addl $8, %esp
 iret

 irq_stub:
-pushal
-maybe_load_kernel_segments 44
-cld
+intr_header 12
 movl 32(%esp), %edi # interrupt number

@@ -78,16 +72,13 @@ irq_stub:
 movl %ebp, %esp
-maybe_load_userspace_segments 44
-popal
+intr_footer 12
 addl $8, %esp
 iret

 .global asm_ipi_handler
 asm_ipi_handler:
-pushal
-maybe_load_kernel_segments 36
-cld
+intr_header 4
 movl %esp, %ebp
 andl $-16, %esp

@@ -96,15 +87,12 @@ asm_ipi_handler:
 movl %ebp, %esp
-maybe_load_userspace_segments 36
-popal
+intr_footer 4
 iret

 .global asm_timer_handler
 asm_timer_handler:
-pushal
-maybe_load_kernel_segments 36
-cld
+intr_header 4
 movl %esp, %ebp
 andl $-16, %esp

@@ -113,8 +101,7 @@ asm_timer_handler:
 movl %ebp, %esp
-maybe_load_userspace_segments 36
-popal
+intr_footer 4
 iret

 .macro isr n


@@ -11,6 +11,7 @@ SECTIONS
 {
 g_kernel_execute_start = .;
 *(.multiboot)
+*(.multiboot2)
 *(.bananboot)
 *(.text.*)
 }


@@ -23,7 +23,7 @@ namespace Kernel
 SpinLock PageTable::s_fast_page_lock;
 static constexpr vaddr_t s_hhdm_offset = 0xFFFF800000000000;
-static bool s_is_hddm_initialized = false;
+static bool s_is_post_heap_done = false;
 constexpr uint64_t s_page_flag_mask = 0x8000000000000FFF;
 constexpr uint64_t s_page_addr_mask = ~s_page_flag_mask;

@@ -376,7 +376,7 @@ namespace Kernel
 V2P = &FuncsHHDM::V2P;
 P2V = &FuncsHHDM::P2V;
-s_is_hddm_initialized = true;
+s_is_post_heap_done = true;
 // This is a hack to unmap fast page. fast page pt is copied
 // while it is mapped, so we need to manually unmap it

@@ -485,6 +485,7 @@ namespace Kernel
 constexpr uint64_t pml4e = (uc_vaddr >> 39) & 0x1FF;
 constexpr uint64_t pdpte = (uc_vaddr >> 30) & 0x1FF;
 constexpr uint64_t pde = (uc_vaddr >> 21) & 0x1FF;
+constexpr uint64_t pte = (uc_vaddr >> 12) & 0x1FF;
 uint64_t* pml4 = P2V(m_highest_paging_struct);
 ASSERT(!(pml4[pml4e] & Flags::Present));

@@ -497,6 +498,10 @@ namespace Kernel
 uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
 ASSERT(!(pd[pde] & Flags::Present));
 pd[pde] = allocate_zeroed_page_aligned_page() | Flags::ReadWrite | Flags::Present;
+uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
+ASSERT(pt[pte] == 0);
+pt[pte] = Flags::Reserved;
 }

 void PageTable::map_fast_page(paddr_t paddr)

@@ -542,7 +547,7 @@ namespace Kernel
 uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
 ASSERT(pt[pte] & Flags::Present);
-pt[pte] = 0;
+pt[pte] = Flags::Reserved;
 asm volatile("invlpg (%0)" :: "r"(fast_page()) : "memory");
 }

@@ -612,10 +617,39 @@ namespace Kernel
 Processor::set_current_page_table(this);
 }

-void PageTable::invalidate(vaddr_t vaddr, bool send_smp_message)
+void PageTable::invalidate_range(vaddr_t vaddr, size_t pages, bool send_smp_message)
 {
 ASSERT(vaddr % PAGE_SIZE == 0);
-asm volatile("invlpg (%0)" :: "r"(vaddr) : "memory");
+
+const bool is_userspace = (vaddr < KERNEL_OFFSET);
+if (is_userspace && this != &PageTable::current())
+;
+else if (pages <= 32 || !s_is_post_heap_done)
+{
+for (size_t i = 0; i < pages; i++, vaddr += PAGE_SIZE)
+asm volatile("invlpg (%0)" :: "r"(vaddr));
+}
+else if (is_userspace || !s_has_pge)
+{
+asm volatile("movq %0, %%cr3" :: "r"(m_highest_paging_struct));
+}
+else
+{
+asm volatile(
+"movq %%cr4, %%rax;"
+"andq $~0x80, %%rax;"
+"movq %%rax, %%cr4;"
+"movq %0, %%cr3;"
+"orq $0x80, %%rax;"
+"movq %%rax, %%cr4;"
+:
+: "r"(m_highest_paging_struct)
+: "rax"
+);
+}

 if (send_smp_message)
 {

@@ -623,14 +657,14 @@ namespace Kernel
 .type = Processor::SMPMessage::Type::FlushTLB,
 .flush_tlb = {
 .vaddr = vaddr,
-.page_count = 1,
+.page_count = pages,
 .page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
 }
 });
 }
 }

-void PageTable::unmap_page(vaddr_t vaddr, bool send_smp_message)
+void PageTable::unmap_page(vaddr_t vaddr, bool invalidate)
 {
 ASSERT(vaddr);
 ASSERT(vaddr != fast_page());

@@ -663,31 +697,23 @@ namespace Kernel
 pt[pte] = 0;

-if (old_paddr != 0)
-invalidate(vaddr, send_smp_message);
+if (invalidate && old_paddr != 0)
+invalidate_page(vaddr, true);
 }

 void PageTable::unmap_range(vaddr_t vaddr, size_t size)
 {
 ASSERT(vaddr % PAGE_SIZE == 0);
-size_t page_count = range_page_count(vaddr, size);
+const size_t page_count = range_page_count(vaddr, size);

 SpinLockGuard _(m_lock);
 for (vaddr_t page = 0; page < page_count; page++)
 unmap_page(vaddr + page * PAGE_SIZE, false);
-
-Processor::broadcast_smp_message({
-.type = Processor::SMPMessage::Type::FlushTLB,
-.flush_tlb = {
-.vaddr = vaddr,
-.page_count = page_count,
-.page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
-}
-});
+invalidate_range(vaddr, page_count, true);
 }

-void PageTable::map_page_at(paddr_t paddr, vaddr_t vaddr, flags_t flags, MemoryType memory_type, bool send_smp_message)
+void PageTable::map_page_at(paddr_t paddr, vaddr_t vaddr, flags_t flags, MemoryType memory_type, bool invalidate)
 {
 ASSERT(vaddr);
 ASSERT(vaddr != fast_page());

@@ -752,8 +778,8 @@ namespace Kernel
 pt[pte] = paddr | uwr_flags | extra_flags;

-if (old_paddr != 0)
-invalidate(vaddr, send_smp_message);
+if (invalidate && old_paddr != 0)
+invalidate_page(vaddr, true);
 }

 void PageTable::map_range_at(paddr_t paddr, vaddr_t vaddr, size_t size, flags_t flags, MemoryType memory_type)

@@ -769,15 +795,66 @@ namespace Kernel
 SpinLockGuard _(m_lock);
 for (size_t page = 0; page < page_count; page++)
 map_page_at(paddr + page * PAGE_SIZE, vaddr + page * PAGE_SIZE, flags, memory_type, false);
-
-Processor::broadcast_smp_message({
-.type = Processor::SMPMessage::Type::FlushTLB,
-.flush_tlb = {
-.vaddr = vaddr,
-.page_count = page_count,
-.page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
-}
-});
+invalidate_range(vaddr, page_count, true);
 }
+
+void PageTable::remove_writable_from_range(vaddr_t vaddr, size_t size)
+{
+ASSERT(vaddr);
+ASSERT(vaddr % PAGE_SIZE == 0);
+
+ASSERT(is_canonical(vaddr));
+ASSERT(is_canonical(vaddr + size - 1));
+
+const vaddr_t uc_vaddr_start = uncanonicalize(vaddr);
+const vaddr_t uc_vaddr_end = uncanonicalize(vaddr + size - 1);
+
+uint16_t pml4e = (uc_vaddr_start >> 39) & 0x1FF;
+uint16_t pdpte = (uc_vaddr_start >> 30) & 0x1FF;
+uint16_t pde = (uc_vaddr_start >> 21) & 0x1FF;
+uint16_t pte = (uc_vaddr_start >> 12) & 0x1FF;
+
+const uint16_t e_pml4e = (uc_vaddr_end >> 39) & 0x1FF;
+const uint16_t e_pdpte = (uc_vaddr_end >> 30) & 0x1FF;
+const uint16_t e_pde = (uc_vaddr_end >> 21) & 0x1FF;
+const uint16_t e_pte = (uc_vaddr_end >> 12) & 0x1FF;
+
+SpinLockGuard _(m_lock);
+
+const uint64_t* pml4 = P2V(m_highest_paging_struct);
+for (; pml4e <= e_pml4e; pml4e++)
+{
+if (!(pml4[pml4e] & Flags::ReadWrite))
+continue;
+const uint64_t* pdpt = P2V(pml4[pml4e] & s_page_addr_mask);
+for (; pdpte < 512; pdpte++)
+{
+if (pml4e == e_pml4e && pdpte > e_pdpte)
+break;
+if (!(pdpt[pdpte] & Flags::ReadWrite))
+continue;
+const uint64_t* pd = P2V(pdpt[pdpte] & s_page_addr_mask);
+for (; pde < 512; pde++)
+{
+if (pml4e == e_pml4e && pdpte == e_pdpte && pde > e_pde)
+break;
+if (!(pd[pde] & Flags::ReadWrite))
+continue;
+uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
+for (; pte < 512; pte++)
+{
+if (pml4e == e_pml4e && pdpte == e_pdpte && pde == e_pde && pte > e_pte)
+break;
+pt[pte] &= ~static_cast<uint64_t>(Flags::ReadWrite);
+}
+pte = 0;
+}
+pde = 0;
+}
+pdpte = 0;
+}
+
+invalidate_range(vaddr, size / PAGE_SIZE, true);
+}

 uint64_t PageTable::get_page_data(vaddr_t vaddr) const

@@ -824,13 +901,13 @@ namespace Kernel
 return page_data & s_page_addr_mask;
 }

-bool PageTable::reserve_page(vaddr_t vaddr, bool only_free, bool send_smp_message)
+bool PageTable::reserve_page(vaddr_t vaddr, bool only_free, bool invalidate)
 {
 SpinLockGuard _(m_lock);
 ASSERT(vaddr % PAGE_SIZE == 0);
 if (only_free && !is_page_free(vaddr))
 return false;
-map_page_at(0, vaddr, Flags::Reserved, MemoryType::Normal, send_smp_message);
+map_page_at(0, vaddr, Flags::Reserved, MemoryType::Normal, invalidate);
 return true;
 }

@@ -845,14 +922,7 @@ namespace Kernel
 return false;
 for (size_t offset = 0; offset < bytes; offset += PAGE_SIZE)
 reserve_page(vaddr + offset, true, false);
-Processor::broadcast_smp_message({
-.type = Processor::SMPMessage::Type::FlushTLB,
-.flush_tlb = {
-.vaddr = vaddr,
-.page_count = bytes / PAGE_SIZE,
-.page_table = vaddr < KERNEL_OFFSET ? this : nullptr,
-}
-});
+invalidate_range(vaddr, bytes / PAGE_SIZE, true);
 return true;
 }

@@ -866,9 +936,9 @@ namespace Kernel
 last_address -= rem;

 ASSERT(is_canonical(first_address));
-ASSERT(is_canonical(last_address));
+ASSERT(is_canonical(last_address - 1));

 const vaddr_t uc_vaddr_start = uncanonicalize(first_address);
-const vaddr_t uc_vaddr_end = uncanonicalize(last_address);
+const vaddr_t uc_vaddr_end = uncanonicalize(last_address - 1);

 uint16_t pml4e = (uc_vaddr_start >> 39) & 0x1FF;
 uint16_t pdpte = (uc_vaddr_start >> 30) & 0x1FF;

@@ -885,10 +955,8 @@ namespace Kernel
 // Try to find free page that can be mapped without
 // allocations (page table with unused entries)
 const uint64_t* pml4 = P2V(m_highest_paging_struct);
-for (; pml4e < 512; pml4e++)
+for (; pml4e <= e_pml4e; pml4e++)
 {
-if (pml4e > e_pml4e)
-break;
 if (!(pml4[pml4e] & Flags::Present))
 continue;
 const uint64_t* pdpt = P2V(pml4[pml4e] & s_page_addr_mask);

@@ -908,22 +976,24 @@ namespace Kernel
 const uint64_t* pt = P2V(pd[pde] & s_page_addr_mask);
 for (; pte < 512; pte++)
 {
-if (pml4e == e_pml4e && pdpte == e_pdpte && pde == e_pde && pte >= e_pte)
+if (pml4e == e_pml4e && pdpte == e_pdpte && pde == e_pde && pte > e_pte)
 break;
-if (!(pt[pte] & Flags::Used))
-{
+if (pt[pte] & Flags::Used)
+continue;
 vaddr_t vaddr = 0;
 vaddr |= static_cast<uint64_t>(pml4e) << 39;
 vaddr |= static_cast<uint64_t>(pdpte) << 30;
 vaddr |= static_cast<uint64_t>(pde) << 21;
 vaddr |= static_cast<uint64_t>(pte) << 12;
 vaddr = canonicalize(vaddr);
 ASSERT(reserve_page(vaddr));
 return vaddr;
-}
 }
+pte = 0;
 }
+pde = 0;
 }
+pdpte = 0;
 }

 for (vaddr_t uc_vaddr = uc_vaddr_start; uc_vaddr < uc_vaddr_end; uc_vaddr += PAGE_SIZE)

@@ -935,7 +1005,7 @@ namespace Kernel
 }
 }

-ASSERT_NOT_REACHED();
+return 0;
 }

 vaddr_t PageTable::reserve_free_contiguous_pages(size_t page_count, vaddr_t first_address, vaddr_t last_address)

@@ -948,7 +1018,7 @@ namespace Kernel
 last_address -= rem;

 ASSERT(is_canonical(first_address));
-ASSERT(is_canonical(last_address));
+ASSERT(is_canonical(last_address - 1));

 SpinLockGuard _(m_lock);

@@ -977,7 +1047,7 @@ namespace Kernel
 }
 }

-ASSERT_NOT_REACHED();
+return 0;
 }

 bool PageTable::is_page_free(vaddr_t page) const


@@ -1,12 +1,13 @@
 .section .userspace, "ax"

 // stack contains
-// return address
-// return stack
-// return rflags
-// siginfo_t
-// signal number
-// signal handler
+// (8 bytes) return address (on return stack)
+// (8 bytes) return stack
+// (8 bytes) return rflags
+// (8 bytes) restore sigmask
+// (56 bytes) siginfo_t
+// (8 bytes) signal number
+// (8 bytes) signal handler
 .global signal_trampoline
 signal_trampoline:

@@ -26,6 +27,10 @@ signal_trampoline:
 pushq %rax
 pushq %rbp

+movq 208(%rsp), %rax
+pushq %rax; addq $(128 + 8), (%rsp)
+pushq (%rax)
+
 // FIXME: populate these
 xorq %rax, %rax
 pushq %rax // stack

@@ -35,9 +40,9 @@ signal_trampoline:
 pushq %rax // link

 movq %rsp, %rdx // ucontext
-leaq 176(%rsp), %rsi // siginfo
-movq 168(%rsp), %rdi // signal number
-movq 160(%rsp), %rax // handler
+leaq 192(%rsp), %rsi // siginfo
+movq 184(%rsp), %rdi // signal number
+movq 176(%rsp), %rax // handler

 // align stack to 16 bytes
 movq %rsp, %rbp

@@ -55,7 +60,15 @@ signal_trampoline:
 movq %rbp, %rsp
 addq $40, %rsp

+// restore sigmask
+movq $83, %rdi // SYS_SIGPROCMASK
+movq $3, %rsi // SIG_SETMASK
+leaq 192(%rsp), %rdx // set
+xorq %r10, %r10 // oset
+syscall
+
 // restore registers
+addq $16, %rsp
 popq %rbp
 popq %rax
 popq %rbx

@@ -72,13 +85,13 @@ signal_trampoline:
 popq %r14
 popq %r15

-// skip handler, number, siginfo_t
-addq $72, %rsp
+// skip handler, number, siginfo_t, sigmask
+addq $80, %rsp

 // restore flags
 popfq

 movq (%rsp), %rsp

-// return over red-zone and siginfo_t
+// return over red-zone
 ret $128


@@ -33,7 +33,7 @@ sys_fork_trampoline:
 call read_ip
 testq %rax, %rax
-je .done
+jz .done
 movq %rax, %rsi
 movq %rsp, %rdi


@@ -7,9 +7,6 @@ read_ip:
 # void start_kernel_thread()
 .global start_kernel_thread
 start_kernel_thread:
-call get_thread_start_sp
-movq %rax, %rsp
 # STACK LAYOUT
 # on_exit arg
 # on_exit func

@@ -27,9 +24,5 @@ start_kernel_thread:
 .global start_userspace_thread
 start_userspace_thread:
-call get_thread_start_sp
-movq %rax, %rsp
 swapgs
 iretq

kernel/arch/x86_64/User.S (new file, +87)

@@ -0,0 +1,87 @@
# bool safe_user_memcpy(void*, const void*, size_t)
.global safe_user_memcpy
.global safe_user_memcpy_end
.global safe_user_memcpy_fault
safe_user_memcpy:
xorq %rax, %rax
movq %rdx, %rcx
rep movsb
incq %rax
safe_user_memcpy_fault:
ret
safe_user_memcpy_end:
# bool safe_user_strncpy(void*, const void*, size_t)
.global safe_user_strncpy
.global safe_user_strncpy_end
.global safe_user_strncpy_fault
safe_user_strncpy:
movq %rdx, %rcx
testq %rcx, %rcx
jz safe_user_strncpy_fault
.safe_user_strncpy_align_loop:
testb $0x7, %sil
jz .safe_user_strncpy_align_done
movb (%rsi), %al
movb %al, (%rdi)
testb %al, %al
jz .safe_user_strncpy_done
incq %rdi
incq %rsi
decq %rcx
jnz .safe_user_strncpy_align_loop
jmp safe_user_strncpy_fault
.safe_user_strncpy_align_done:
movq $0x0101010101010101, %r8
movq $0x8080808080808080, %r9
.safe_user_strncpy_qword_loop:
cmpq $8, %rcx
jb .safe_user_strncpy_qword_done
movq (%rsi), %rax
movq %rax, %r10
movq %rax, %r11
# https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
subq %r8, %r10
notq %r11
andq %r11, %r10
andq %r9, %r10
jnz .safe_user_strncpy_byte_loop
movq %rax, (%rdi)
addq $8, %rdi
addq $8, %rsi
subq $8, %rcx
jnz .safe_user_strncpy_qword_loop
jmp safe_user_strncpy_fault
.safe_user_strncpy_qword_done:
testq %rcx, %rcx
jz safe_user_strncpy_fault
.safe_user_strncpy_byte_loop:
movb (%rsi), %al
movb %al, (%rdi)
testb %al, %al
jz .safe_user_strncpy_done
incq %rdi
incq %rsi
decq %rcx
jnz .safe_user_strncpy_byte_loop
safe_user_strncpy_fault:
xorq %rax, %rax
ret
.safe_user_strncpy_done:
movb $1, %al
ret
safe_user_strncpy_end:
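The qword loop above relies on the "determine if a word has a zero byte" trick from the linked bithacks page. As a sanity check, here is the same predicate in portable C++ (illustrative only; the kernel applies it in registers to detect the NUL terminator eight bytes at a time):

```cpp
#include <cstdint>

// Nonzero iff some byte of v is 0x00: subtracting 0x01 from each
// byte borrows into bit 7 only for bytes that were zero, and the
// ~v mask filters out bytes that already had bit 7 set.
bool has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}
```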


@@ -11,26 +11,28 @@
 .code32
-# multiboot2 header
+// custom addresses, video mode info, page align modules
+.set multiboot_flags, (1 << 16) | (1 << 2) | (1 << 0)
 .section .multiboot, "aw"
+.align 8
 multiboot_start:
 .long 0x1BADB002
-.long (1 << 2) # page align modules
-.long -(0x1BADB002 + (1 << 2))
-.long 0
-.long 0
-.long 0
-.long 0
-.long 0
+.long multiboot_flags
+.long -(0x1BADB002 + multiboot_flags)
+.long V2P(multiboot_start)
+.long V2P(g_kernel_start)
+.long V2P(g_kernel_bss_start)
+.long V2P(g_kernel_end)
+.long V2P(_start)
 .long 0
 .long FB_WIDTH
 .long FB_HEIGHT
 .long FB_BPP
 multiboot_end:

+.align 8
+.section .multiboot2, "aw"
 multiboot2_start:
 .long 0xE85250D6
 .long 0

@@ -66,7 +68,6 @@ multiboot2_start:
 multiboot2_end:

 .section .bananboot, "aw"
-.align 8
 bananboot_start:
 .long 0xBABAB007
 .long -(0xBABAB007 + FB_WIDTH + FB_HEIGHT + FB_BPP)


@@ -1,12 +1,4 @@
-.macro swapgs_if_necessary, n
-testb $3, \n(%rsp)
-jz 1f; jnp 1f
-swapgs
-1:
-.endm
-
-.macro pushaq, n
-swapgs_if_necessary \n
+.macro intr_header, n
 pushq %rax
 pushq %rcx
 pushq %rdx

@@ -22,10 +14,18 @@
 pushq %r13
 pushq %r14
 pushq %r15
+testb $3, \n+15*8(%rsp)
+jz 1f
+swapgs
+1: cld
 .endm

-.macro popaq, n
-popq %r15
+.macro intr_footer, n
+testb $3, \n+15*8(%rsp)
+jz 1f
+call cpp_check_signal
+swapgs
+1: popq %r15
 popq %r14
 popq %r13
 popq %r12

@@ -40,12 +40,10 @@
 popq %rdx
 popq %rcx
 popq %rax
-swapgs_if_necessary \n
 .endm

 isr_stub:
-pushaq 24
-cld
+intr_header 24
 movq %cr0, %rax; pushq %rax
 movq %cr2, %rax; pushq %rax
 movq %cr3, %rax; pushq %rax

@@ -58,33 +56,33 @@ isr_stub:
 call cpp_isr_handler
 addq $32, %rsp
-popaq 24
+intr_footer 24
 addq $16, %rsp
 iretq

 irq_stub:
-pushaq 24
-cld
+intr_header 24
+xorq %rbp, %rbp
 movq 120(%rsp), %rdi # irq number
 call cpp_irq_handler
-popaq 24
+intr_footer 24
 addq $16, %rsp
 iretq

 .global asm_ipi_handler
 asm_ipi_handler:
-pushaq 8
-cld
+intr_header 8
+xorq %rbp, %rbp
 call cpp_ipi_handler
-popaq 8
+intr_footer 8
 iretq

 .global asm_timer_handler
 asm_timer_handler:
-pushaq 8
-cld
+intr_header 8
+xorq %rbp, %rbp
 call cpp_timer_handler
-popaq 8
+intr_footer 8
 iretq

 .macro isr n


@@ -11,6 +11,7 @@ SECTIONS
 {
 g_kernel_execute_start = .;
 *(.multiboot)
+*(.multiboot2)
 *(.bananboot)
 *(.text.*)
 }

@@ -43,6 +44,7 @@ SECTIONS
 }
 .bss ALIGN(4K) : AT(ADDR(.bss) - KERNEL_OFFSET)
 {
+g_kernel_bss_start = .;
 *(COMMON)
 *(.bss)
 g_kernel_writable_end = .;


@@ -1,44 +1,34 @@
 #pragma once

-#include <kernel/Attributes.h>
-#include <kernel/IDT.h>
-#include <stdint.h>
-#include <sys/syscall.h>
+#include <BAN/MacroUtils.h>

-namespace Kernel
-{
-
-ALWAYS_INLINE long syscall(int syscall, uintptr_t arg1 = 0, uintptr_t arg2 = 0, uintptr_t arg3 = 0, uintptr_t arg4 = 0, uintptr_t arg5 = 0)
-{
-long ret;
-#if ARCH(x86_64)
-register uintptr_t r10 asm("r10") = arg3;
-register uintptr_t r8 asm( "r8") = arg4;
-register uintptr_t r9 asm( "r9") = arg5;
-asm volatile(
-"syscall"
-: "=a"(ret)
-, "+D"(syscall)
-, "+S"(arg1)
-, "+d"(arg2)
-, "+r"(r10)
-, "+r"(r8)
-, "+r"(r9)
-:: "rcx", "r11", "memory");
-#elif ARCH(i686)
-asm volatile(
-"int %[irq]"
-: "=a"(ret)
-: [irq]"i"(static_cast<int>(IRQ_SYSCALL)) // WTF GCC 15
-, "a"(syscall)
-, "b"(arg1)
-, "c"(arg2)
-, "d"(arg3)
-, "S"(arg4)
-, "D"(arg5)
-: "memory");
+#if defined(__x86_64__)
+#define _kas_instruction "syscall"
+#define _kas_result rax
+#define _kas_arguments rdi, rsi, rdx, r10, r8, r9
+#define _kas_globbers rcx, rdx, rdi, rsi, r8, r9, r10, r11
+#elif defined(__i686__)
+#define _kas_instruction "int $0xF0"
+#define _kas_result eax
+#define _kas_arguments eax, ebx, ecx, edx, esi, edi
+#define _kas_globbers
 #endif
-return ret;
-}
-
-}
+
+#define _kas_argument_var(index, value) register long _kas_a##index asm(_ban_stringify(_ban_get(index, _kas_arguments))) = (long)value;
+#define _kas_dummy_var(index, value) register long _kas_d##index asm(#value);
+#define _kas_input(index, _) "r"(_kas_a##index)
+#define _kas_output(index, _) , "=r"(_kas_d##index)
+#define _kas_globber(_, value) #value
+
+#define _kas_syscall(...) ({ \
+register long _kas_ret asm(_ban_stringify(_kas_result)); \
+_ban_for_each(_kas_argument_var, __VA_ARGS__) \
+_ban_for_each(_kas_dummy_var, _kas_globbers) \
+asm volatile( \
+_kas_instruction \
+: "=r"(_kas_ret) _ban_for_each(_kas_output, _kas_globbers) \
+: _ban_for_each_comma(_kas_input, __VA_ARGS__) \
+: "cc", "memory"); \
+(void)_kas_a0; /* require 1 argument */ \
+_kas_ret; \
+})
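The new `_kas_syscall` depends on `_ban_for_each`-style macro iteration from `BAN/MacroUtils.h`. The general technique (not the actual BAN implementation, whose names and arity limits may differ) can be sketched with an argument-counting dispatch macro that applies a callback to each variadic argument together with its index:

```cpp
// Illustrative FOR_EACH over up to three arguments; the callback F
// receives (index, value), mirroring _kas_argument_var's signature.
#define FE_1(F, a)       F(0, a)
#define FE_2(F, a, b)    F(0, a) F(1, b)
#define FE_3(F, a, b, c) F(0, a) F(1, b) F(2, c)
#define FE_GET(_1, _2, _3, NAME, ...) NAME
#define FOR_EACH(F, ...) FE_GET(__VA_ARGS__, FE_3, FE_2, FE_1)(F, __VA_ARGS__)

// Example callback: declare int variables arg0, arg1, ... much like
// _kas_argument_var declares the register-bound syscall arguments.
#define DECL_ARG(index, value) int arg##index = (value);

int sum3()
{
    FOR_EACH(DECL_ARG, 1, 2, 3) // int arg0 = 1; int arg1 = 2; int arg2 = 3;
    return arg0 + arg1 + arg2;
}
```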


@@ -6,7 +6,7 @@
 namespace Kernel
 {

-class AC97AudioController : public AudioController, public Interruptable
+class AC97AudioController final : public AudioController, public Interruptable
 {
 public:
 static BAN::ErrorOr<void> create(PCI::Device& pci_device);

@@ -23,11 +23,15 @@ namespace Kernel
 uint32_t get_current_pin() const override { return 0; }
 BAN::ErrorOr<void> set_current_pin(uint32_t pin) override { if (pin != 0) return BAN::Error::from_errno(EINVAL); return {}; }

+BAN::ErrorOr<void> set_volume_mdB(int32_t) override;
+
 private:
 AC97AudioController(PCI::Device& pci_device)
 : m_pci_device(pci_device)
 { }

+uint32_t get_volume_data() const;
+
 BAN::ErrorOr<void> initialize();
 BAN::ErrorOr<void> initialize_bld();
 BAN::ErrorOr<void> initialize_interrupts();


@@ -1,8 +1,11 @@
 #pragma once

 #include <kernel/Device/Device.h>
+#include <kernel/Memory/ByteRingBuffer.h>
 #include <kernel/PCI.h>

+#include <sys/ioctl.h>
+
 namespace Kernel
 {

@@ -16,6 +19,7 @@ namespace Kernel
 protected:
 AudioController();
+BAN::ErrorOr<void> initialize();

 virtual void handle_new_data() = 0;

@@ -26,8 +30,10 @@ namespace Kernel
 virtual uint32_t get_current_pin() const = 0;
 virtual BAN::ErrorOr<void> set_current_pin(uint32_t) = 0;

+virtual BAN::ErrorOr<void> set_volume_mdB(int32_t) = 0;
+
 bool can_read_impl() const override { return false; }
-bool can_write_impl() const override { SpinLockGuard _(m_spinlock); return m_sample_data_size < m_sample_data_capacity; }
+bool can_write_impl() const override { SpinLockGuard _(m_spinlock); return !m_sample_data->full(); }
 bool has_error_impl() const override { return false; }
 bool has_hungup_impl() const override { return false; }

@@ -40,9 +46,9 @@ namespace Kernel
 mutable SpinLock m_spinlock;

 static constexpr size_t m_sample_data_capacity = 1 << 20;
-uint8_t m_sample_data[m_sample_data_capacity];
-size_t m_sample_data_head { 0 };
-size_t m_sample_data_size { 0 };
+BAN::UniqPtr<ByteRingBuffer> m_sample_data;
+
+snd_volume_info m_volume_info {};

 private:
 const dev_t m_rdev;


@@ -8,7 +8,7 @@ namespace Kernel
 class HDAudioController;

-class HDAudioFunctionGroup : public AudioController
+class HDAudioFunctionGroup final : public AudioController
 {
 public:
 static BAN::ErrorOr<BAN::RefPtr<HDAudioFunctionGroup>> create(BAN::RefPtr<HDAudioController>, uint8_t cid, HDAudio::AFGNode&&);

@@ -24,6 +24,8 @@ namespace Kernel
 uint32_t get_current_pin() const override;
 BAN::ErrorOr<void> set_current_pin(uint32_t) override;

+BAN::ErrorOr<void> set_volume_mdB(int32_t) override;
+
 void handle_new_data() override;

 private:

@@ -46,15 +48,12 @@ namespace Kernel
 BAN::ErrorOr<void> recurse_output_paths(const HDAudio::AFGWidget& widget, BAN::Vector<const HDAudio::AFGWidget*>& path);

 uint16_t get_format_data() const;
-uint16_t get_volume_data() const;

 size_t bdl_offset() const;
 void queue_bdl_data();

 private:
-static constexpr size_t m_max_path_length = 16;
-
 // use 6 512 sample BDL entries
 // each entry is ~10.7 ms at 48 kHz
 // -> total buffered audio is 64 ms

@@ -66,6 +65,7 @@ namespace Kernel
 const uint8_t m_cid;

 BAN::Vector<BAN::Vector<const HDAudio::AFGWidget*>> m_output_paths;
+BAN::Vector<const HDAudio::AFGWidget*> m_output_pins;
 size_t m_output_path_index { SIZE_MAX };
 uint8_t m_stream_id { 0xFF };


@@ -50,9 +50,21 @@ namespace Kernel::HDAudio
 {
 bool input;
 bool output;
+bool display; // HDMI or DP
+uint32_t config;
 } pin_complex;
 };

+struct Amplifier
+{
+uint8_t offset;
+uint8_t num_steps;
+uint8_t step_size;
+bool mute;
+};
+BAN::Optional<Amplifier> output_amplifier;
+
 BAN::Vector<uint16_t> connections;
 };


@@ -11,6 +11,7 @@ namespace Kernel::HDAudio
 VMIN = 0x02,
 VMAJ = 0x03,
 GCTL = 0x08,
+STATESTS = 0x0E,
 INTCTL = 0x20,
 INTSTS = 0x24,


@@ -44,7 +44,7 @@ namespace Kernel
 struct BootModule
 {
 paddr_t start;
-size_t size;
+uint64_t size;
 };

 struct BootInfo


@@ -37,6 +37,8 @@ namespace Kernel
 virtual BAN::ErrorOr<size_t> read_impl(off_t, BAN::ByteSpan) override;
 virtual BAN::ErrorOr<size_t> write_impl(off_t, BAN::ConstByteSpan) override;
+BAN::ErrorOr<long> ioctl_impl(int cmd, void* arg) override;
+
 virtual bool can_read_impl() const override { return true; }
 virtual bool can_write_impl() const override { return true; }
 virtual bool has_error_impl() const override { return false; }


@@ -0,0 +1,52 @@
#pragma once
#include <kernel/FS/Inode.h>
namespace Kernel
{
class EventFD final : public Inode
{
public:
static BAN::ErrorOr<BAN::RefPtr<Inode>> create(uint64_t initval, bool semaphore);
ino_t ino() const override { return 0; }
Mode mode() const override { return { Mode::IFCHR | Mode::IRUSR | Mode::IWUSR }; }
nlink_t nlink() const override { return ref_count(); }
uid_t uid() const override { return 0; }
gid_t gid() const override { return 0; }
off_t size() const override { return 0; }
timespec atime() const override { return {}; }
timespec mtime() const override { return {}; }
timespec ctime() const override { return {}; }
blksize_t blksize() const override { return 8; }
blkcnt_t blocks() const override { return 0; }
dev_t dev() const override { return 0; }
dev_t rdev() const override { return 0; }
const FileSystem* filesystem() const override { return nullptr; }
protected:
BAN::ErrorOr<size_t> read_impl(off_t, BAN::ByteSpan) override;
BAN::ErrorOr<size_t> write_impl(off_t, BAN::ConstByteSpan) override;
BAN::ErrorOr<void> fsync_impl() final override { return {}; }
bool can_read_impl() const override { return m_value > 0; }
bool can_write_impl() const override { return m_value < UINT64_MAX - 1; }
bool has_error_impl() const override { return false; }
bool has_hungup_impl() const override { return false; }
private:
EventFD(uint64_t initval, bool is_semaphore)
: m_is_semaphore(is_semaphore)
, m_value(initval)
{ }
private:
const bool m_is_semaphore;
uint64_t m_value;
ThreadBlocker m_thread_blocker;
};
}
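The EventFD header above encodes the usual eventfd semantics: writes add to a 64-bit counter, reads return (and clear) the whole counter, or decrement it by one in semaphore mode. A minimal userspace sketch of that counter logic, assuming standard eventfd(2)-style behavior (the `EventCounter` name and non-blocking `bool` returns are illustrative, not from the kernel):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the EventFD counter above: write() adds to the
// counter, read() drains it (normal mode) or takes one unit (semaphore
// mode). A false return means the caller would have to block.
struct EventCounter
{
    bool is_semaphore;
    uint64_t value { 0 };

    // blocks (returns false) if the add would push the counter past
    // UINT64_MAX - 1, mirroring can_write_impl() above
    bool write(uint64_t add)
    {
        if (add > UINT64_MAX - 1 - value)
            return false;
        value += add;
        return true;
    }

    // blocks (returns false) while the counter is zero,
    // mirroring can_read_impl() above
    bool read(uint64_t& out)
    {
        if (value == 0)
            return false;
        out = is_semaphore ? 1 : value;
        value -= out;
        return true;
    }
};
```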


@@ -113,6 +113,8 @@ namespace Kernel
BAN::ErrorOr<size_t> recvmsg(msghdr& message, int flags);
BAN::ErrorOr<void> getsockname(sockaddr* address, socklen_t* address_len);
BAN::ErrorOr<void> getpeername(sockaddr* address, socklen_t* address_len);
+ BAN::ErrorOr<void> getsockopt(int level, int option, void* value, socklen_t* value_len);
+ BAN::ErrorOr<void> setsockopt(int level, int option, const void* value, socklen_t value_len);
// General API
BAN::ErrorOr<size_t> read(off_t, BAN::ByteSpan buffer);
@@ -161,6 +163,8 @@ namespace Kernel
virtual BAN::ErrorOr<size_t> sendmsg_impl(const msghdr&, int) { return BAN::Error::from_errno(ENOTSUP); }
virtual BAN::ErrorOr<void> getsockname_impl(sockaddr*, socklen_t*) { return BAN::Error::from_errno(ENOTSUP); }
virtual BAN::ErrorOr<void> getpeername_impl(sockaddr*, socklen_t*) { return BAN::Error::from_errno(ENOTSUP); }
+ virtual BAN::ErrorOr<void> getsockopt_impl(int, int, void*, socklen_t*) { return BAN::Error::from_errno(ENOTSUP); }
+ virtual BAN::ErrorOr<void> setsockopt_impl(int, int, const void*, socklen_t) { return BAN::Error::from_errno(ENOTSUP); }
// General API
virtual BAN::ErrorOr<size_t> read_impl(off_t, BAN::ByteSpan) { return BAN::Error::from_errno(ENOTSUP); }


@@ -2,6 +2,7 @@
#include <BAN/Array.h>
#include <kernel/FS/Inode.h>
+ #include <kernel/Memory/ByteRingBuffer.h>
#include <kernel/ThreadBlocker.h>
namespace Kernel
@@ -38,7 +39,7 @@ namespace Kernel
virtual BAN::ErrorOr<size_t> write_impl(off_t, BAN::ConstByteSpan) override;
virtual BAN::ErrorOr<void> fsync_impl() final override { return {}; }
- virtual bool can_read_impl() const override { return m_buffer_size > 0; }
+ virtual bool can_read_impl() const override { return !m_buffer->empty(); }
virtual bool can_write_impl() const override { return true; }
virtual bool has_error_impl() const override { return m_reading_count == 0; }
virtual bool has_hungup_impl() const override { return m_writing_count == 0; }
@@ -54,9 +55,7 @@ namespace Kernel
timespec m_ctime {};
ThreadBlocker m_thread_blocker;
- BAN::Array<uint8_t, PAGE_SIZE> m_buffer;
- BAN::Atomic<size_t> m_buffer_size { 0 };
- size_t m_buffer_tail { 0 };
+ BAN::UniqPtr<ByteRingBuffer> m_buffer;
BAN::Atomic<uint32_t> m_writing_count { 1 };
BAN::Atomic<uint32_t> m_reading_count { 1 };


@@ -1,12 +1,11 @@
#pragma once
#include <kernel/BootInfo.h>
- #include <kernel/FS/FileSystem.h>
+ #include <kernel/FS/Inode.h>
namespace Kernel
{
- bool is_ustar_boot_module(const BootModule&);
- BAN::ErrorOr<void> unpack_boot_module_into_filesystem(BAN::RefPtr<FileSystem>, const BootModule&);
+ BAN::ErrorOr<bool> unpack_boot_module_into_directory(BAN::RefPtr<Inode>, const BootModule&);
}


@@ -20,7 +20,7 @@ namespace Kernel
constexpr uint8_t IRQ_MSI_BASE = 0x80;
constexpr uint8_t IRQ_MSI_END = 0xF0;
#if ARCH(i686)
- constexpr uint8_t IRQ_SYSCALL = 0xF0;
+ constexpr uint8_t IRQ_SYSCALL = 0xF0; // hard coded in kernel/API/Syscall.h
#endif
constexpr uint8_t IRQ_IPI = 0xF1;
constexpr uint8_t IRQ_TIMER = 0xF2;


@@ -0,0 +1,76 @@
#pragma once
#include <BAN/ByteSpan.h>
#include <BAN/UniqPtr.h>
#include <BAN/Vector.h>
#include <kernel/Memory/Types.h>
namespace Kernel
{
class ByteRingBuffer
{
public:
static BAN::ErrorOr<BAN::UniqPtr<ByteRingBuffer>> create(size_t size);
~ByteRingBuffer();
void push(BAN::ConstByteSpan data)
{
ASSERT(data.size() + m_size <= m_capacity);
uint8_t* buffer_head = reinterpret_cast<uint8_t*>(m_vaddr) + (m_tail + m_size) % m_capacity;
memcpy(buffer_head, data.data(), data.size());
m_size += data.size();
}
void pop(size_t size)
{
ASSERT(size <= m_size);
m_tail = (m_tail + size) % m_capacity;
m_size -= size;
}
void pop_back(size_t size)
{
ASSERT(size <= m_size);
m_size -= size;
}
BAN::ConstByteSpan get_data() const
{
const uint8_t* base = reinterpret_cast<const uint8_t*>(m_vaddr);
return { base + m_tail, m_size };
}
uint8_t front() const
{
ASSERT(!empty());
return reinterpret_cast<const uint8_t*>(m_vaddr)[m_tail];
}
uint8_t back() const
{
ASSERT(!empty());
return reinterpret_cast<const uint8_t*>(m_vaddr)[m_tail + m_size - 1];
}
bool empty() const { return m_size == 0; }
bool full() const { return m_size == m_capacity; }
size_t free() const { return m_capacity - m_size; }
size_t size() const { return m_size; }
size_t capacity() const { return m_capacity; }
private:
ByteRingBuffer(size_t capacity)
: m_capacity(capacity)
{ }
private:
size_t m_size { 0 };
size_t m_tail { 0 };
const size_t m_capacity;
vaddr_t m_vaddr { 0 };
};
}
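Note that `push()` and `get_data()` above copy and read contiguously from `m_tail`, which only works if `create()` maps the backing pages twice back-to-back (a common ring-buffer trick); that is an assumption, so this standalone sketch of the same index math handles wrap-around explicitly instead. The `RingBuffer` name and `std::vector` backing are illustrative, not from the kernel:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch of the ByteRingBuffer index math with explicit wrap-around:
// m_tail marks the oldest byte, (m_tail + m_size) % capacity the write head.
class RingBuffer
{
public:
    explicit RingBuffer(size_t capacity) : m_data(capacity) {}

    void push(const uint8_t* data, size_t size)
    {
        assert(size <= m_data.size() - m_size);
        size_t head = (m_tail + m_size) % m_data.size();
        // copy up to the end of the backing storage, then wrap to the start
        size_t first = std::min(size, m_data.size() - head);
        std::memcpy(m_data.data() + head, data, first);
        std::memcpy(m_data.data(), data + first, size - first);
        m_size += size;
    }

    void pop(size_t size)
    {
        assert(size <= m_size);
        m_tail = (m_tail + size) % m_data.size();
        m_size -= size;
    }

    uint8_t front() const { assert(m_size); return m_data[m_tail]; }
    uint8_t back() const { assert(m_size); return m_data[(m_tail + m_size - 1) % m_data.size()]; }
    size_t size() const { return m_size; }

private:
    std::vector<uint8_t> m_data;
    size_t m_tail { 0 };
    size_t m_size { 0 };
};
```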


@@ -18,6 +18,8 @@ namespace Kernel
static void initialize();
static Heap& get();
+ void release_boot_modules();
paddr_t take_free_page();
void release_page(paddr_t);


@@ -28,6 +28,20 @@ namespace Kernel
private:
MemoryBackedRegion(PageTable&, size_t size, Type, PageTable::flags_t, int status_flags);
+ private:
+ struct PhysicalPage
+ {
+ PhysicalPage(paddr_t paddr)
+ : paddr(paddr)
+ { }
+ ~PhysicalPage();
+ BAN::Atomic<uint32_t> ref_count { 1 };
+ const paddr_t paddr;
+ };
+ BAN::Vector<PhysicalPage*> m_physical_pages;
+ Mutex m_mutex;
};
}
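The refcounted `PhysicalPage` struct above implies physical frames shared between regions (e.g. after fork, for copy-on-write): each sharer bumps the count, and the frame is freed only when the last reference drops. A minimal sketch of that ownership rule, assuming this is the intended use (the `Page`/`ref`/`unref` names are illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Shared physical page: starts owned by one region (ref_count 1).
struct Page
{
    std::atomic<uint32_t> ref_count { 1 };
};

void ref(Page& p) { p.ref_count.fetch_add(1, std::memory_order_relaxed); }

// Returns true when the caller dropped the last reference and must
// release the underlying frame.
bool unref(Page& p)
{
    return p.ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```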


@@ -100,19 +100,21 @@ namespace Kernel
static BAN::ErrorOr<PageTable*> create_userspace();
~PageTable();
- void unmap_page(vaddr_t, bool send_smp_message = true);
+ void unmap_page(vaddr_t, bool invalidate = true);
void unmap_range(vaddr_t, size_t bytes);
- void map_page_at(paddr_t, vaddr_t, flags_t, MemoryType = MemoryType::Normal, bool send_smp_message = true);
+ void map_page_at(paddr_t, vaddr_t, flags_t, MemoryType = MemoryType::Normal, bool invalidate = true);
void map_range_at(paddr_t, vaddr_t, size_t bytes, flags_t, MemoryType = MemoryType::Normal);
+ void remove_writable_from_range(vaddr_t, size_t);
paddr_t physical_address_of(vaddr_t) const;
flags_t get_page_flags(vaddr_t) const;
bool is_page_free(vaddr_t) const;
bool is_range_free(vaddr_t, size_t bytes) const;
- bool reserve_page(vaddr_t, bool only_free = true, bool send_smp_message = true);
+ bool reserve_page(vaddr_t, bool only_free = true, bool invalidate = true);
bool reserve_range(vaddr_t, size_t bytes, bool only_free = true);
vaddr_t reserve_free_page(vaddr_t first_address, vaddr_t last_address = UINTPTR_MAX);
@@ -121,6 +123,9 @@ namespace Kernel
void load();
void initial_load();
+ void invalidate_page(vaddr_t addr, bool send_smp_message) { invalidate_range(addr, 1, send_smp_message); }
+ void invalidate_range(vaddr_t addr, size_t pages, bool send_smp_message);
InterruptState lock() const { return m_lock.lock(); }
void unlock(InterruptState state) const { m_lock.unlock(state); }
@@ -133,8 +138,6 @@ namespace Kernel
void map_kernel_memory();
void prepare_fast_page();
- void invalidate(vaddr_t, bool send_smp_message);
static void map_fast_page(paddr_t);
static void unmap_fast_page();


@@ -10,7 +10,7 @@ namespace Kernel
class PhysicalRange
{
public:
- PhysicalRange(paddr_t, size_t);
+ PhysicalRange(paddr_t, uint64_t);
paddr_t reserve_page();
void release_page(paddr_t);


@@ -4,7 +4,7 @@
#if ARCH(x86_64)
#define KERNEL_OFFSET 0xFFFFFFFF80000000
- #define USERSPACE_END 0xFFFF800000000000
+ #define USERSPACE_END 0x800000000000
#elif ARCH(i686)
#define KERNEL_OFFSET 0xC0000000
#define USERSPACE_END 0xC0000000


@@ -20,8 +20,6 @@ namespace Kernel
static BAN::ErrorOr<BAN::UniqPtr<VirtualRange>> create_to_vaddr_range(PageTable&, vaddr_t vaddr_start, vaddr_t vaddr_end, size_t, PageTable::flags_t flags, bool preallocate_pages, bool add_guard_pages);
~VirtualRange();
- BAN::ErrorOr<BAN::UniqPtr<VirtualRange>> clone(PageTable&);
vaddr_t vaddr() const { return m_vaddr + (m_has_guard_pages ? PAGE_SIZE : 0); }
size_t size() const { return m_size - (m_has_guard_pages ? 2 * PAGE_SIZE : 0); }
PageTable::flags_t flags() const { return m_flags; }


@@ -31,35 +31,18 @@ namespace Kernel
public:
static BAN::ErrorOr<BAN::UniqPtr<ARPTable>> create();
- ~ARPTable();
BAN::ErrorOr<BAN::MACAddress> get_mac_from_ipv4(NetworkInterface&, BAN::IPv4Address);
- void add_arp_packet(NetworkInterface&, BAN::ConstByteSpan);
+ BAN::ErrorOr<void> handle_arp_packet(NetworkInterface&, BAN::ConstByteSpan);
private:
- ARPTable();
- void packet_handle_task();
- BAN::ErrorOr<void> handle_arp_packet(NetworkInterface&, const ARPPacket&);
+ ARPTable() = default;
private:
- struct PendingArpPacket
- {
- NetworkInterface& interface;
- ARPPacket packet;
- };
- private:
- SpinLock m_table_lock;
- SpinLock m_pending_lock;
+ SpinLock m_arp_table_lock;
BAN::HashMap<BAN::IPv4Address, BAN::MACAddress> m_arp_table;
- Thread* m_thread { nullptr };
- BAN::CircularQueue<PendingArpPacket, 128> m_pending_packets;
- ThreadBlocker m_pending_thread_blocker;
friend class BAN::UniqPtr<ARPTable>;
};


@@ -23,14 +23,14 @@ namespace Kernel
static BAN::ErrorOr<BAN::RefPtr<E1000>> create(PCI::Device&);
~E1000();
- virtual BAN::MACAddress get_mac_address() const override { return m_mac_address; }
+ BAN::MACAddress get_mac_address() const override { return m_mac_address; }
- virtual bool link_up() override { return m_link_up; }
+ bool link_up() override { return m_link_up; }
- virtual int link_speed() override;
+ int link_speed() override;
- virtual size_t payload_mtu() const override { return E1000_RX_BUFFER_SIZE - sizeof(EthernetHeader); }
+ size_t payload_mtu() const override { return E1000_RX_BUFFER_SIZE - sizeof(EthernetHeader); }
- virtual void handle_irq() final override;
+ void handle_irq() final override;
protected:
E1000(PCI::Device& pci_device)
@@ -45,12 +45,12 @@ namespace Kernel
uint32_t read32(uint16_t reg);
void write32(uint16_t reg, uint32_t value);
- virtual BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan) override;
+ BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan> payload) override;
- virtual bool can_read_impl() const override { return false; }
+ bool can_read_impl() const override { return false; }
- virtual bool can_write_impl() const override { return false; }
+ bool can_write_impl() const override { return false; }
- virtual bool has_error_impl() const override { return false; }
+ bool has_error_impl() const override { return false; }
- virtual bool has_hungup_impl() const override { return false; }
+ bool has_hungup_impl() const override { return false; }
private:
BAN::ErrorOr<void> read_mac_address();
@@ -61,7 +61,7 @@ namespace Kernel
void enable_link();
BAN::ErrorOr<void> enable_interrupt();
- void handle_receive();
+ void receive_thread();
protected:
PCI::Device& m_pci_device;
@@ -75,6 +75,10 @@ namespace Kernel
BAN::UniqPtr<DMARegion> m_tx_descriptor_region;
SpinLock m_lock;
+ bool m_thread_should_die { false };
+ BAN::Atomic<bool> m_thread_is_dead { true };
+ ThreadBlocker m_thread_blocker;
BAN::MACAddress m_mac_address {};
bool m_link_up { false };


@@ -12,8 +12,8 @@ namespace Kernel
static BAN::ErrorOr<BAN::RefPtr<E1000E>> create(PCI::Device&);
protected:
- virtual void detect_eeprom() override;
+ void detect_eeprom() override;
- virtual uint32_t eeprom_read(uint8_t addr) override;
+ uint32_t eeprom_read(uint8_t addr) override;
private:
E1000E(PCI::Device& pci_device)


@@ -38,11 +38,10 @@ namespace Kernel
public:
static BAN::ErrorOr<BAN::UniqPtr<IPv4Layer>> create();
- ~IPv4Layer();
ARPTable& arp_table() { return *m_arp_table; }
- void add_ipv4_packet(NetworkInterface&, BAN::ConstByteSpan);
+ BAN::ErrorOr<void> handle_ipv4_packet(NetworkInterface&, BAN::ConstByteSpan);
virtual void unbind_socket(uint16_t port) override;
virtual BAN::ErrorOr<void> bind_socket_with_target(BAN::RefPtr<NetworkSocket>, const sockaddr* target_address, socklen_t target_address_len) override;
@@ -55,35 +54,15 @@ namespace Kernel
virtual size_t header_size() const override { return sizeof(IPv4Header); }
private:
- IPv4Layer();
+ IPv4Layer() = default;
- void add_ipv4_header(BAN::ByteSpan packet, BAN::IPv4Address src_ipv4, BAN::IPv4Address dst_ipv4, uint8_t protocol) const;
BAN::ErrorOr<in_port_t> find_free_port();
- void packet_handle_task();
- BAN::ErrorOr<void> handle_ipv4_packet(NetworkInterface&, BAN::ByteSpan);
private:
- struct PendingIPv4Packet
- {
- NetworkInterface& interface;
- };
- private:
- RecursiveSpinLock m_bound_socket_lock;
- BAN::UniqPtr<ARPTable> m_arp_table;
- Thread* m_thread { nullptr };
- static constexpr size_t pending_packet_buffer_size = 128 * PAGE_SIZE;
- BAN::UniqPtr<VirtualRange> m_pending_packet_buffer;
- BAN::CircularQueue<PendingIPv4Packet, 128> m_pending_packets;
- ThreadBlocker m_pending_thread_blocker;
- SpinLock m_pending_lock;
- size_t m_pending_total_size { 0 };
- BAN::HashMap<int, BAN::WeakPtr<NetworkSocket>> m_bound_sockets;
+ BAN::UniqPtr<ARPTable> m_arp_table;
+ RecursiveSpinLock m_bound_socket_lock;
+ BAN::HashMap<int, BAN::WeakPtr<NetworkSocket>> m_bound_sockets;
friend class BAN::UniqPtr<IPv4Layer>;
};


@@ -9,6 +9,7 @@ namespace Kernel
{
public:
static constexpr size_t buffer_size = BAN::numeric_limits<uint16_t>::max() + 1;
+ static constexpr size_t buffer_count = 32;
public:
static BAN::ErrorOr<BAN::RefPtr<LoopbackInterface>> create();
@@ -24,8 +25,9 @@ namespace Kernel
LoopbackInterface()
: NetworkInterface(Type::Loopback)
{}
+ ~LoopbackInterface();
- BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan) override;
+ BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan> payload) override;
bool can_read_impl() const override { return false; }
bool can_write_impl() const override { return false; }
@@ -33,8 +35,27 @@ namespace Kernel
bool has_hungup_impl() const override { return false; }
private:
- SpinLock m_buffer_lock;
+ void receive_thread();
+ private:
+ struct Descriptor
+ {
+ uint8_t* addr;
+ uint32_t size;
+ uint8_t state;
+ };
+ private:
+ Mutex m_buffer_lock;
BAN::UniqPtr<VirtualRange> m_buffer;
+ uint32_t m_buffer_tail { 0 };
+ uint32_t m_buffer_head { 0 };
+ Descriptor m_descriptors[buffer_count] {};
+ bool m_thread_should_die { false };
+ BAN::Atomic<bool> m_thread_is_dead { true };
+ ThreadBlocker m_thread_blocker;
};
}
} }


@@ -60,7 +60,11 @@ namespace Kernel
virtual dev_t rdev() const override { return m_rdev; }
virtual BAN::StringView name() const override { return m_name; }
- virtual BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan) = 0;
+ BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan payload)
+ {
+ return send_bytes(destination, protocol, { &payload, 1 });
+ }
+ virtual BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan> payload) = 0;
private:
const Type m_type;


@@ -11,7 +11,7 @@ namespace Kernel
BAN::IPv4Address src_ipv4 { 0 };
BAN::IPv4Address dst_ipv4 { 0 };
BAN::NetworkEndian<uint16_t> protocol { 0 };
- BAN::NetworkEndian<uint16_t> extra { 0 };
+ BAN::NetworkEndian<uint16_t> length { 0 };
};
static_assert(sizeof(PseudoHeader) == 12);
@@ -36,6 +36,7 @@ namespace Kernel
NetworkLayer() = default;
};
- uint16_t calculate_internet_checksum(BAN::ConstByteSpan packet, const PseudoHeader& pseudo_header);
+ uint16_t calculate_internet_checksum(BAN::ConstByteSpan buffer);
+ uint16_t calculate_internet_checksum(BAN::Span<const BAN::ConstByteSpan> buffers);
}
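The multi-buffer overload of `calculate_internet_checksum` above fits the new scatter-gather `send_bytes` API. A hedged sketch of the RFC 1071 one's-complement checksum over multiple spans (the `Span` struct and function name are illustrative, not the kernel's): the subtle part is that an odd-length span makes the next span's first byte fall into the low half of a 16-bit word, so byte parity must carry across spans.

```cpp
#include <cstddef>
#include <cstdint>

struct Span { const uint8_t* data; size_t size; };

// RFC 1071 internet checksum over a sequence of byte spans. The running
// high_byte flag carries word alignment across span boundaries, so the
// result matches the checksum of the concatenated bytes.
uint16_t internet_checksum(const Span* spans, size_t count)
{
    uint32_t sum = 0;
    bool high_byte = true; // next byte fills the high half of a 16-bit word
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < spans[i].size; j++)
        {
            sum += high_byte ? (uint32_t)spans[i].data[j] << 8 : spans[i].data[j];
            high_byte = !high_byte;
        }
    // fold the carries back into 16 bits, then take the one's complement
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return ~sum & 0xFFFF;
}
```

Splitting the input at any offset, even an odd one, must not change the result; that is exactly the property the two-overload API relies on.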


@@ -32,7 +32,7 @@ namespace Kernel
BAN::ErrorOr<BAN::RefPtr<NetworkInterface>> interface(const sockaddr* target, socklen_t target_len);
virtual size_t protocol_header_size() const = 0;
- virtual void add_protocol_header(BAN::ByteSpan packet, uint16_t dst_port, PseudoHeader) = 0;
+ virtual void get_protocol_header(BAN::ByteSpan header, BAN::ConstByteSpan payload, uint16_t dst_port, PseudoHeader) = 0;
virtual NetworkProtocol protocol() const = 0;
virtual void receive_packet(BAN::ConstByteSpan, const sockaddr* sender, socklen_t sender_len) = 0;


@@ -29,9 +29,11 @@ namespace Kernel
: NetworkInterface(Type::Ethernet)
, m_pci_device(pci_device)
{ }
+ ~RTL8169();
BAN::ErrorOr<void> initialize();
- virtual BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan) override;
+ virtual BAN::ErrorOr<void> send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan>) override;
virtual bool can_read_impl() const override { return false; }
virtual bool can_write_impl() const override { return false; }
@@ -47,7 +49,7 @@ namespace Kernel
void enable_link();
BAN::ErrorOr<void> enable_interrupt();
- void handle_receive();
+ void receive_thread();
protected:
PCI::Device& m_pci_device;
@@ -63,6 +65,9 @@ namespace Kernel
BAN::UniqPtr<DMARegion> m_tx_descriptor_region;
SpinLock m_lock;
+ bool m_thread_should_die { false };
+ BAN::Atomic<bool> m_thread_is_dead { true };
ThreadBlocker m_thread_blocker;
uint32_t m_rx_current { 0 };


@@ -4,7 +4,7 @@
#include <BAN/Endianness.h>
#include <BAN/Queue.h>
#include <kernel/Lock/Mutex.h>
- #include <kernel/Memory/VirtualRange.h>
+ #include <kernel/Memory/ByteRingBuffer.h>
#include <kernel/Networking/NetworkInterface.h>
#include <kernel/Networking/NetworkSocket.h>
#include <kernel/Thread.h>
@@ -50,28 +50,30 @@ namespace Kernel
static BAN::ErrorOr<BAN::RefPtr<TCPSocket>> create(NetworkLayer&, const Info&);
~TCPSocket();
- virtual NetworkProtocol protocol() const override { return NetworkProtocol::TCP; }
+ NetworkProtocol protocol() const override { return NetworkProtocol::TCP; }
- virtual size_t protocol_header_size() const override { return sizeof(TCPHeader) + m_tcp_options_bytes; }
+ size_t protocol_header_size() const override { return sizeof(TCPHeader) + m_tcp_options_bytes; }
- virtual void add_protocol_header(BAN::ByteSpan packet, uint16_t dst_port, PseudoHeader) override;
+ void get_protocol_header(BAN::ByteSpan header, BAN::ConstByteSpan payload, uint16_t dst_port, PseudoHeader) override;
protected:
- virtual BAN::ErrorOr<long> accept_impl(sockaddr*, socklen_t*, int) override;
+ BAN::ErrorOr<long> accept_impl(sockaddr*, socklen_t*, int) override;
- virtual BAN::ErrorOr<void> connect_impl(const sockaddr*, socklen_t) override;
+ BAN::ErrorOr<void> connect_impl(const sockaddr*, socklen_t) override;
- virtual BAN::ErrorOr<void> listen_impl(int) override;
+ BAN::ErrorOr<void> listen_impl(int) override;
- virtual BAN::ErrorOr<void> bind_impl(const sockaddr*, socklen_t) override;
+ BAN::ErrorOr<void> bind_impl(const sockaddr*, socklen_t) override;
- virtual BAN::ErrorOr<size_t> recvmsg_impl(msghdr& message, int flags) override;
+ BAN::ErrorOr<size_t> recvmsg_impl(msghdr& message, int flags) override;
- virtual BAN::ErrorOr<size_t> sendmsg_impl(const msghdr& message, int flags) override;
+ BAN::ErrorOr<size_t> sendmsg_impl(const msghdr& message, int flags) override;
- virtual BAN::ErrorOr<void> getpeername_impl(sockaddr*, socklen_t*) override;
+ BAN::ErrorOr<void> getpeername_impl(sockaddr*, socklen_t*) override;
+ BAN::ErrorOr<void> getsockopt_impl(int, int, void*, socklen_t*) override;
+ BAN::ErrorOr<void> setsockopt_impl(int, int, const void*, socklen_t) override;
- virtual BAN::ErrorOr<long> ioctl_impl(int, void*) override;
+ BAN::ErrorOr<long> ioctl_impl(int, void*) override;
- virtual void receive_packet(BAN::ConstByteSpan, const sockaddr* sender, socklen_t sender_len) override;
+ void receive_packet(BAN::ConstByteSpan, const sockaddr* sender, socklen_t sender_len) override;
- virtual bool can_read_impl() const override;
+ bool can_read_impl() const override;
- virtual bool can_write_impl() const override;
+ bool can_write_impl() const override;
- virtual bool has_error_impl() const override { return false; }
+ bool has_error_impl() const override { return false; }
- virtual bool has_hungup_impl() const override;
+ bool has_hungup_impl() const override;
private:
enum class State
@@ -91,33 +93,32 @@ namespace Kernel
struct RecvWindowInfo
{
uint32_t start_seq { 0 }; // sequence number of first byte in buffer
bool has_ghost_byte { false };
- uint32_t data_size { 0 }; // number of bytes in this buffer
uint8_t scale_shift { 0 }; // window scale
- BAN::UniqPtr<VirtualRange> buffer;
+ BAN::UniqPtr<ByteRingBuffer> buffer;
};
struct SendWindowInfo
{
uint32_t mss { 0 }; // maximum segment size
uint16_t non_scaled_size { 0 }; // window size without scaling
uint8_t scale_shift { 0 }; // window scale
uint32_t scaled_size() const { return (uint32_t)non_scaled_size << scale_shift; }
uint32_t start_seq { 0 }; // sequence number of first byte in buffer
uint32_t current_seq { 0 }; // sequence number of next send
uint32_t current_ack { 0 }; // sequence number acknowledged by connection
uint64_t last_send_ms { 0 }; // last send time, used for retransmission timeout
bool has_ghost_byte { false };
+ bool had_zero_window { false };
- uint32_t data_size { 0 }; // number of bytes in this buffer
uint32_t sent_size { 0 }; // number of bytes in this buffer that have been sent
- BAN::UniqPtr<VirtualRange> buffer;
+ BAN::UniqPtr<ByteRingBuffer> buffer;
};
struct ConnectionInfo
@@ -131,6 +132,8 @@ namespace Kernel
{
ConnectionInfo target;
uint32_t target_start_seq;
+ uint16_t maximum_seqment_size;
+ uint8_t window_scale;
};
struct ListenKey
@@ -165,8 +168,17 @@ namespace Kernel
State m_next_state { State::Closed };
uint8_t m_next_flags { 0 };
+ size_t m_last_sent_window_size { 0 };
Thread* m_thread { nullptr };
+ // TODO: actually support these
+ bool m_keep_alive { false };
+ bool m_no_delay { false };
+ bool m_should_send_zero_window { false };
+ bool m_should_send_window_update { false };
uint64_t m_time_wait_start_ms { 0 };
ThreadBlocker m_thread_blocker;
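`SendWindowInfo::scaled_size()` above applies the RFC 7323 window-scale option: the 16-bit window field from the peer's segment is shifted left by the scale negotiated in the SYN options (at most 14). A worked sketch of that arithmetic (the free function form is illustrative):

```cpp
#include <cstdint>

// RFC 7323 window scaling: the advertised 16-bit window is multiplied
// by 2^scale_shift, allowing effective windows up to ~1 GiB.
uint32_t scaled_window(uint16_t non_scaled_size, uint8_t scale_shift)
{
    return (uint32_t)non_scaled_size << scale_shift;
}
```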


@@ -25,26 +25,28 @@ namespace Kernel
public:
static BAN::ErrorOr<BAN::RefPtr<UDPSocket>> create(NetworkLayer&, const Socket::Info&);
- virtual NetworkProtocol protocol() const override { return NetworkProtocol::UDP; }
+ NetworkProtocol protocol() const override { return NetworkProtocol::UDP; }
- virtual size_t protocol_header_size() const override { return sizeof(UDPHeader); }
+ size_t protocol_header_size() const override { return sizeof(UDPHeader); }
- virtual void add_protocol_header(BAN::ByteSpan packet, uint16_t dst_port, PseudoHeader) override;
+ void get_protocol_header(BAN::ByteSpan header, BAN::ConstByteSpan payload, uint16_t dst_port, PseudoHeader) override;
protected:
- virtual void receive_packet(BAN::ConstByteSpan, const sockaddr* sender, socklen_t sender_len) override;
+ void receive_packet(BAN::ConstByteSpan, const sockaddr* sender, socklen_t sender_len) override;
- virtual BAN::ErrorOr<void> connect_impl(const sockaddr*, socklen_t) override;
+ BAN::ErrorOr<void> connect_impl(const sockaddr*, socklen_t) override;
- virtual BAN::ErrorOr<void> bind_impl(const sockaddr* address, socklen_t address_len) override;
+ BAN::ErrorOr<void> bind_impl(const sockaddr* address, socklen_t address_len) override;
- virtual BAN::ErrorOr<size_t> recvmsg_impl(msghdr& message, int flags) override;
+ BAN::ErrorOr<size_t> recvmsg_impl(msghdr& message, int flags) override;
- virtual BAN::ErrorOr<size_t> sendmsg_impl(const msghdr& message, int flags) override;
+ BAN::ErrorOr<size_t> sendmsg_impl(const msghdr& message, int flags) override;
- virtual BAN::ErrorOr<void> getpeername_impl(sockaddr*, socklen_t*) override { return BAN::Error::from_errno(ENOTCONN); }
+ BAN::ErrorOr<void> getpeername_impl(sockaddr*, socklen_t*) override { return BAN::Error::from_errno(ENOTCONN); }
+ BAN::ErrorOr<void> getsockopt_impl(int, int, void*, socklen_t*) override;
+ BAN::ErrorOr<void> setsockopt_impl(int, int, const void*, socklen_t) override;
- virtual BAN::ErrorOr<long> ioctl_impl(int, void*) override;
+ BAN::ErrorOr<long> ioctl_impl(int, void*) override;
- virtual bool can_read_impl() const override { return !m_packets.empty(); }
+ bool can_read_impl() const override { return !m_packets.empty(); }
- virtual bool can_write_impl() const override { return true; }
+ bool can_write_impl() const override { return true; }
- virtual bool has_error_impl() const override { return false; }
+ bool has_error_impl() const override { return false; }
- virtual bool has_hungup_impl() const override { return false; }
+ bool has_hungup_impl() const override { return false; }
private:
UDPSocket(NetworkLayer&, const Socket::Info&);


@@ -6,7 +6,6 @@
 #include <kernel/FS/Socket.h>
 #include <kernel/FS/TmpFS/Inode.h>
 #include <kernel/FS/VirtualFileSystem.h>
-#include <kernel/Lock/SpinLock.h>
 #include <kernel/OpenFileDescriptorSet.h>

 namespace Kernel
@@ -32,6 +31,8 @@ namespace Kernel
 	virtual BAN::ErrorOr<size_t> recvmsg_impl(msghdr& message, int flags) override;
 	virtual BAN::ErrorOr<size_t> sendmsg_impl(const msghdr& message, int flags) override;
 	virtual BAN::ErrorOr<void> getpeername_impl(sockaddr*, socklen_t*) override;
+	virtual BAN::ErrorOr<void> getsockopt_impl(int, int, void*, socklen_t*) override;
+	virtual BAN::ErrorOr<void> setsockopt_impl(int, int, const void*, socklen_t) override;
 	virtual bool can_read_impl() const override;
 	virtual bool can_write_impl() const override;
@@ -69,9 +70,10 @@ namespace Kernel
 		size_t size;
 		BAN::Vector<FDWrapper> fds;
 		BAN::Optional<struct ucred> ucred;
+		BAN::WeakPtr<UnixDomainSocket> sender;
 	};

-	BAN::ErrorOr<void> add_packet(const msghdr&, PacketInfo&&);
+	BAN::ErrorOr<size_t> add_packet(const msghdr&, PacketInfo&&, bool dont_block);

 private:
 	const Socket::Type m_socket_type;
@@ -81,10 +83,14 @@ namespace Kernel
 	BAN::CircularQueue<PacketInfo, 512> m_packet_infos;
 	size_t m_packet_size_total { 0 };
+	size_t m_packet_buffer_tail { 0 };
 	BAN::UniqPtr<VirtualRange> m_packet_buffer;

-	Mutex m_packet_lock;
+	mutable Mutex m_packet_lock;
 	ThreadBlocker m_packet_thread_blocker;

+	BAN::Atomic<size_t> m_sndbuf { 0 };
+	BAN::Atomic<size_t> m_bytes_sent { 0 };
+
 	friend class BAN::RefPtr<UnixDomainSocket>;
 };


@@ -44,6 +44,10 @@ namespace Kernel
 	void close_all();
 	void close_cloexec();

+	bool is_cloexec(int fd);
+	void add_cloexec(int fd);
+	void remove_cloexec(int fd);
+
 	BAN::ErrorOr<void> flock(int fd, int op);
 	BAN::ErrorOr<size_t> read(int fd, BAN::ByteSpan);
@@ -84,27 +88,6 @@ namespace Kernel
 		friend class BAN::RefPtr<OpenFileDescription>;
 	};

-	struct OpenFile
-	{
-		OpenFile() = default;
-		OpenFile(BAN::RefPtr<OpenFileDescription> description, int descriptor_flags)
-			: description(BAN::move(description))
-			, descriptor_flags(descriptor_flags)
-		{ }
-
-		BAN::RefPtr<Inode> inode() const { ASSERT(description); return description->file.inode; }
-		BAN::StringView path() const { ASSERT(description); return description->file.canonical_path.sv(); }
-
-		int& status_flags() { ASSERT(description); return description->status_flags; }
-		const int& status_flags() const { ASSERT(description); return description->status_flags; }
-
-		off_t& offset() { ASSERT(description); return description->offset; }
-		const off_t& offset() const { ASSERT(description); return description->offset; }
-
-		BAN::RefPtr<OpenFileDescription> description;
-		int descriptor_flags { 0 };
-	};
-
 	BAN::ErrorOr<void> validate_fd(int) const;
 	BAN::ErrorOr<int> get_free_fd() const;
 	BAN::ErrorOr<void> get_free_fd_pair(int fds[2]) const;
@@ -139,7 +122,8 @@ namespace Kernel
 	const Credentials& m_credentials;

 	mutable Mutex m_mutex;
-	BAN::Array<OpenFile, OPEN_MAX> m_open_files;
+	BAN::Array<BAN::RefPtr<OpenFileDescription>, OPEN_MAX> m_open_files;
+	BAN::Array<uint32_t, (OPEN_MAX + 31) / 32> m_cloexec_files {};
 };

}


@@ -147,6 +147,8 @@ namespace Kernel
 	BAN::ErrorOr<long> sys_epoll_ctl(int epfd, int op, int fd, struct epoll_event* event);
 	BAN::ErrorOr<long> sys_epoll_pwait2(int epfd, struct epoll_event* events, int maxevents, const struct timespec* timeout, const sigset_t* sigmask);

+	BAN::ErrorOr<long> sys_eventfd(unsigned int initval_hi, int flags);
+
 	BAN::ErrorOr<long> sys_pipe(int fildes[2]);
 	BAN::ErrorOr<long> sys_dup2(int fildes, int fildes2);
@@ -396,6 +398,7 @@ namespace Kernel
 	BAN::UniqPtr<PageTable> m_page_table;
 	BAN::RefPtr<TTY> m_controlling_terminal;

+	friend class OpenFileDescriptorSet;
 	friend class Thread;
 };


@@ -101,7 +101,7 @@ namespace Kernel
 	InterruptStack* m_interrupt_stack { nullptr };
 	InterruptRegisters* m_interrupt_registers { nullptr };

-	uint64_t m_last_reschedule_ns { 0 };
+	uint64_t m_next_reschedule_ns { 0 };
 	uint64_t m_last_load_balance_ns { 0 };

 	struct ThreadInfo


@@ -3,6 +3,7 @@
 #include <BAN/Array.h>
 #include <kernel/Device/Device.h>
 #include <kernel/Lock/SpinLock.h>
+#include <kernel/Memory/ByteRingBuffer.h>
 #include <kernel/Terminal/TerminalDriver.h>
 #include <kernel/ThreadBlocker.h>
 #include <LibInput/KeyEvent.h>
@@ -102,8 +103,7 @@ namespace Kernel
 	struct Buffer
 	{
-		BAN::Array<uint8_t, 1024> buffer;
-		size_t bytes { 0 };
+		BAN::UniqPtr<ByteRingBuffer> buffer;
 		bool flush { false };
 		ThreadBlocker thread_blocker;
 	};


@@ -38,8 +38,12 @@ namespace Kernel
 	// stack overflows on some machines with 8 page stack
 	static constexpr size_t kernel_stack_size { PAGE_SIZE * 16 };
-	// TODO: userspace stack is hard limited to 32 MiB, maybe make this dynamic?
+	// TODO: userspace stack size is hard limited, maybe make this dynamic?
+#if ARCH(x86_64)
 	static constexpr size_t userspace_stack_size { 32 << 20 };
+#elif ARCH(i686)
+	static constexpr size_t userspace_stack_size { 4 << 20 };
+#endif

 public:
 	static BAN::ErrorOr<Thread*> create_kernel(entry_t, void*);
@@ -56,12 +60,10 @@ namespace Kernel
 	// Returns true, if thread is going to trigger signal
 	bool is_interrupted_by_signal(bool skip_stop_and_cont = false) const;

-	// Returns true if pending signal can be added to thread
-	bool can_add_signal_to_execute() const;
-	bool will_execute_signal() const;
-
 	// Returns true if handled signal had SA_RESTART
-	bool handle_signal(int signal = 0, const siginfo_t& signal_info = {});
-	void add_signal(int signal, const siginfo_t& info);
+	bool handle_signal_if_interrupted();
+	bool handle_signal(int signal, const siginfo_t&);
+	void add_signal(int signal, const siginfo_t&);

 	void set_suspend_signal_mask(uint64_t sigmask);
 	static bool is_stopping_signal(int signal);
@@ -153,6 +155,16 @@ namespace Kernel
 	bool currently_on_alternate_stack() const;

+	struct signal_handle_info_t
+	{
+		vaddr_t handler;
+		vaddr_t stack_top;
+		uint64_t restore_sigmask;
+		bool has_sa_restart;
+	};
+	signal_handle_info_t remove_signal_and_get_info(int signal);
+	void handle_signal_impl(int signal, const siginfo_t&, const signal_handle_info_t&);
+
 private:
 	// NOTE: this is the first member to force it being last destructed
 	// {kernel,userspace}_stack has to be destroyed before page table
@@ -166,7 +178,9 @@ namespace Kernel
 	bool m_is_userspace { false };
 	bool m_delete_process { false };

+	bool m_has_custom_fsbase { false };
 	vaddr_t m_fsbase { 0 };
+	bool m_has_custom_gsbase { false };
 	vaddr_t m_gsbase { 0 };

 	SchedulerQueue::Node* m_scheduler_node { nullptr };


@@ -531,11 +531,8 @@ acpi_release_global_lock:
 		return BAN::Error::from_errno(EFAULT);
 	}

-	if (!s5_node.as.package->elements[0].resolved || !s5_node.as.package->elements[1].resolved)
-	{
-		dwarnln("TODO: lazy evaluate package \\_S5 elements");
-		return BAN::Error::from_errno(ENOTSUP);
-	}
+	TRY(AML::resolve_package_element(s5_node.as.package->elements[0], true));
+	TRY(AML::resolve_package_element(s5_node.as.package->elements[1], true));

 	auto slp_typa_node = TRY(AML::convert_node(TRY(s5_node.as.package->elements[0].value.node->copy()), AML::ConvInteger, sizeof(uint64_t)));
 	auto slp_typb_node = TRY(AML::convert_node(TRY(s5_node.as.package->elements[1].value.node->copy()), AML::ConvInteger, sizeof(uint64_t)));


@@ -118,6 +118,8 @@ namespace Kernel
 	BAN::ErrorOr<void> AC97AudioController::initialize()
 	{
+		TRY(AudioController::initialize());
+
 		m_pci_device.enable_bus_mastering();
 		m_mixer = TRY(m_pci_device.allocate_bar_region(0));
@@ -135,8 +137,27 @@ namespace Kernel
 		// Reset mixer to default values
 		m_mixer->write16(AudioMixerRegister::Reset, 0);

-		// Master volume 100%, no mute
-		m_mixer->write16(AudioMixerRegister::MasterVolume, 0x0000);
+		// Master volumes
+		m_mixer->write16(AudioMixerRegister::MasterVolume, 0x2020);
+		if (m_mixer->read16(AudioMixerRegister::MasterVolume) == 0x2020)
+		{
+			m_volume_info = {
+				.min_mdB = -94500,
+				.max_mdB = 0,
+				.step_mdB = 1500,
+				.mdB = 0,
+			};
+		}
+		else
+		{
+			m_volume_info = {
+				.min_mdB = -46500,
+				.max_mdB = 0,
+				.step_mdB = 1500,
+				.mdB = 0,
+			};
+		}
+		m_mixer->write16(AudioMixerRegister::MasterVolume, get_volume_data());

 		// PCM output volume left/right +0 db, no mute
 		m_mixer->write16(AudioMixerRegister::PCMOutVolume, 0x0808);
@@ -185,6 +206,19 @@ namespace Kernel
 		return {};
 	}

+	uint32_t AC97AudioController::get_volume_data() const
+	{
+		const uint32_t steps = (-m_volume_info.mdB + m_volume_info.step_mdB / 2) / m_volume_info.step_mdB;
+		return (steps << 8) | steps;
+	}
+
+	BAN::ErrorOr<void> AC97AudioController::set_volume_mdB(int32_t mdB)
+	{
+		m_volume_info.mdB = BAN::Math::clamp(mdB, m_volume_info.min_mdB, m_volume_info.max_mdB);
+		m_mixer->write16(AudioMixerRegister::MasterVolume, get_volume_data());
+		return {};
+	}
+
 	void AC97AudioController::handle_new_data()
 	{
 		ASSERT(m_spinlock.current_processor_has_lock());
@@ -203,23 +237,22 @@ namespace Kernel
 			if (next_bld_head == m_bdl_tail)
 				break;

-			const size_t sample_data_tail = (m_sample_data_head + m_sample_data_capacity - m_sample_data_size) % m_sample_data_capacity;
-			const size_t max_memcpy = BAN::Math::min(m_sample_data_size, m_sample_data_capacity - sample_data_tail);
-			const size_t samples = BAN::Math::min(max_memcpy / 2, m_samples_per_entry);
-			if (samples == 0)
+			const size_t sample_frames = BAN::Math::min(m_sample_data->size() / get_channels() / sizeof(uint16_t), m_samples_per_entry / get_channels());
+			if (sample_frames == 0)
 				break;

+			const size_t copy_total_bytes = sample_frames * get_channels() * sizeof(uint16_t);
+
 			auto& entry = reinterpret_cast<AC97::BufferDescriptorListEntry*>(m_bdl_region->vaddr())[m_bdl_head];
-			entry.samples = samples;
+			entry.samples = sample_frames * get_channels();
 			entry.flags = (1 << 15);

 			memcpy(
 				reinterpret_cast<void*>(m_bdl_region->paddr_to_vaddr(entry.address)),
-				&m_sample_data[sample_data_tail],
-				samples * 2
+				m_sample_data->get_data().data(),
+				copy_total_bytes
 			);

-			m_sample_data_size -= samples * 2;
+			m_sample_data->pop(copy_total_bytes);

 			lvi = m_bdl_head;
 			m_bdl_head = next_bld_head;


@@ -53,34 +53,28 @@ namespace Kernel
 		return {};
 	}

+	BAN::ErrorOr<void> AudioController::initialize()
+	{
+		m_sample_data = TRY(ByteRingBuffer::create(m_sample_data_capacity));
+		return {};
+	}
+
 	BAN::ErrorOr<size_t> AudioController::write_impl(off_t, BAN::ConstByteSpan buffer)
 	{
 		SpinLockGuard lock_guard(m_spinlock);

-		while (m_sample_data_size >= m_sample_data_capacity)
+		while (m_sample_data->full())
 		{
 			SpinLockGuardAsMutex smutex(lock_guard);
 			TRY(Thread::current().block_or_eintr_indefinite(m_sample_data_blocker, &smutex));
 		}

-		size_t nwritten = 0;
-		while (nwritten < buffer.size())
-		{
-			if (m_sample_data_size >= m_sample_data_capacity)
-				break;
-			const size_t max_memcpy = BAN::Math::min(m_sample_data_capacity - m_sample_data_size, m_sample_data_capacity - m_sample_data_head);
-			const size_t to_copy = BAN::Math::min(buffer.size() - nwritten, max_memcpy);
-			memcpy(m_sample_data + m_sample_data_head, buffer.data() + nwritten, to_copy);
-			nwritten += to_copy;
-			m_sample_data_head = (m_sample_data_head + to_copy) % m_sample_data_capacity;
-			m_sample_data_size += to_copy;
-		}
+		const size_t to_copy = BAN::Math::min(buffer.size(), m_sample_data->free());
+		m_sample_data->push(buffer.slice(0, to_copy));

 		handle_new_data();

-		return nwritten;
+		return to_copy;
 	}

 	BAN::ErrorOr<long> AudioController::ioctl_impl(int cmd, void* arg)
@@ -97,9 +91,9 @@ namespace Kernel
 		case SND_GET_BUFFERSZ:
 		{
 			SpinLockGuard _(m_spinlock);
-			*static_cast<uint32_t*>(arg) = m_sample_data_size;
+			*static_cast<uint32_t*>(arg) = m_sample_data->size();
 			if (cmd == SND_RESET_BUFFER)
-				m_sample_data_size = 0;
+				m_sample_data->pop(m_sample_data->size());
 			return 0;
 		}
 		case SND_GET_TOTAL_PINS:
@@ -111,6 +105,12 @@ namespace Kernel
 		case SND_SET_PIN:
 			TRY(set_current_pin(*static_cast<uint32_t*>(arg)));
 			return 0;
+		case SND_GET_VOLUME_INFO:
+			*static_cast<snd_volume_info*>(arg) = m_volume_info;
+			return 0;
+		case SND_SET_VOLUME_MDB:
+			TRY(set_volume_mdB(*static_cast<int32_t*>(arg)));
+			return 0;
 	}

 	return CharacterDevice::ioctl_impl(cmd, arg);


@@ -2,6 +2,8 @@
 #include <kernel/Audio/HDAudio/Registers.h>
 #include <kernel/FS/DevFS/FileSystem.h>

+#include <BAN/Sort.h>
+
 namespace Kernel
 {
@@ -25,6 +27,8 @@ namespace Kernel
 	BAN::ErrorOr<void> HDAudioFunctionGroup::initialize()
 	{
+		TRY(AudioController::initialize());
+
 		if constexpr(DEBUG_HDAUDIO)
 		{
 			const auto widget_to_string =
@@ -50,19 +54,13 @@ namespace Kernel
 			{
 				if (widget.type == HDAudio::AFGWidget::Type::PinComplex)
 				{
-					const uint32_t config = TRY(m_controller->send_command({
-						.data = 0x00,
-						.command = 0xF1C,
-						.node_index = widget.id,
-						.codec_address = m_cid,
-					}));
-					dprintln("  widget {}: {} ({}, {}), {32b}",
+					dprintln("  widget {}: {} ({}, {}, {}), {32b}",
 						widget.id,
 						widget_to_string(widget.type),
 						(int)widget.pin_complex.output,
 						(int)widget.pin_complex.input,
-						config
+						(int)widget.pin_complex.display,
+						widget.pin_complex.config
 					);
 				}
 				else
@@ -79,59 +77,49 @@ namespace Kernel
 		}

 		TRY(initialize_stream());
-		TRY(initialize_output());
+
+		if (auto ret = initialize_output(); ret.is_error())
+		{
+			// No usable pins, not really an error
+			if (ret.error().get_error_code() == ENODEV)
+				return {};
+			return ret.release_error();
+		}

 		DevFileSystem::get().add_device(this);
 		return {};
 	}

 	uint32_t HDAudioFunctionGroup::get_total_pins() const
 	{
-		uint32_t count = 0;
-		for (const auto& widget : m_afg_node.widgets)
-			if (widget.type == HDAudio::AFGWidget::Type::PinComplex && widget.pin_complex.output)
-				count++;
-		return count;
+		return m_output_pins.size();
 	}

 	uint32_t HDAudioFunctionGroup::get_current_pin() const
 	{
 		const auto current_id = m_output_paths[m_output_path_index].front()->id;
-
-		uint32_t pin = 0;
-		for (const auto& widget : m_afg_node.widgets)
-		{
-			if (widget.type != HDAudio::AFGWidget::Type::PinComplex || !widget.pin_complex.output)
-				continue;
-			if (widget.id == current_id)
-				return pin;
-			pin++;
-		}
+		for (size_t i = 0; i < m_output_pins.size(); i++)
+			if (m_output_pins[i]->id == current_id)
+				return i;
 		ASSERT_NOT_REACHED();
 	}

 	BAN::ErrorOr<void> HDAudioFunctionGroup::set_current_pin(uint32_t pin)
 	{
-		uint32_t pin_id = 0;
-		for (const auto& widget : m_afg_node.widgets)
-		{
-			if (widget.type != HDAudio::AFGWidget::Type::PinComplex || !widget.pin_complex.output)
-				continue;
-			if (pin-- > 0)
-				continue;
-			pin_id = widget.id;
-			break;
-		}
+		if (pin >= m_output_pins.size())
+			return BAN::Error::from_errno(EINVAL);

 		if (auto ret = disable_output_path(m_output_path_index); ret.is_error())
 			dwarnln("failed to disable old output path {}", ret.error());

+		const uint32_t pin_id = m_output_pins[pin]->id;
 		for (size_t i = 0; i < m_output_paths.size(); i++)
 		{
 			if (m_output_paths[i].front()->id != pin_id)
 				continue;

-			if (auto ret = enable_output_path(i); !ret.is_error())
+			if (auto ret = enable_output_path(i); ret.is_error())
 			{
 				if (ret.error().get_error_code() == ENOTSUP)
 					continue;
@@ -139,7 +127,6 @@ namespace Kernel
 				return ret.release_error();
 			}

-			dprintln("set output widget to {}", pin_id);
 			m_output_path_index = i;
 			return {};
 		}
@@ -149,6 +136,37 @@ namespace Kernel
 		return BAN::Error::from_errno(ENOTSUP);
 	}

+	BAN::ErrorOr<void> HDAudioFunctionGroup::set_volume_mdB(int32_t mdB)
+	{
+		mdB = BAN::Math::clamp(mdB, m_volume_info.min_mdB, m_volume_info.max_mdB);
+
+		const auto& path = m_output_paths[m_output_path_index];
+		for (size_t i = 0; i < path.size(); i++)
+		{
+			if (!path[i]->output_amplifier.has_value())
+				continue;
+
+			const int32_t step_round = (mdB >= 0)
+				? +m_volume_info.step_mdB / 2
+				: -m_volume_info.step_mdB / 2;
+			const uint32_t step = (mdB + step_round) / m_volume_info.step_mdB + path[i]->output_amplifier->offset;
+			const uint32_t volume = 0b1'0'1'1'0000'0'0000000 | step;
+
+			TRY(m_controller->send_command({
+				.data = static_cast<uint8_t>(volume & 0xFF),
+				.command = static_cast<uint16_t>(0x300 | (volume >> 8)),
+				.node_index = path[i]->id,
+				.codec_address = m_cid,
+			}));
+			break;
+		}
+
+		m_volume_info.mdB = mdB;
+		return {};
+	}
+
 	size_t HDAudioFunctionGroup::bdl_offset() const
 	{
 		const size_t bdl_entry_bytes = m_bdl_entry_sample_frames * get_channels() * sizeof(uint16_t);
@@ -230,18 +248,39 @@ namespace Kernel
 	BAN::ErrorOr<void> HDAudioFunctionGroup::initialize_output()
 	{
-		BAN::Vector<const HDAudio::AFGWidget*> path;
-		TRY(path.reserve(m_max_path_length));
-
 		for (const auto& widget : m_afg_node.widgets)
 		{
 			if (widget.type != HDAudio::AFGWidget::Type::PinComplex || !widget.pin_complex.output)
 				continue;

+			// no physical connection
+			if ((widget.pin_complex.config >> 30) == 0b01)
+				continue;
+
+			// needs a GPU
+			if (widget.pin_complex.display)
+				continue;
+
+			BAN::Vector<const HDAudio::AFGWidget*> path;
 			TRY(path.push_back(&widget));
 			TRY(recurse_output_paths(widget, path));
-			path.pop_back();
+
+			if (!m_output_paths.empty() && m_output_paths.back().front()->id == widget.id)
+				TRY(m_output_pins.push_back(&widget));
 		}

+		if (m_output_pins.empty())
+			return BAN::Error::from_errno(ENODEV);
+
+		// prefer short paths
+		BAN::sort::sort(m_output_paths.begin(), m_output_paths.end(),
+			[](const auto& a, const auto& b) {
+				if (a.front()->id != b.front()->id)
+					return a.front()->id < b.front()->id;
+				return a.size() < b.size();
+			}
+		);
+
 		dprintln_if(DEBUG_HDAUDIO, "found {} paths from output to DAC", m_output_paths.size());

 		// select first supported path
@@ -283,13 +322,6 @@ namespace Kernel
 		return 0b0'0'000'000'0'001'0001;
 	}

-	uint16_t HDAudioFunctionGroup::get_volume_data() const
-	{
-		// TODO: don't hardcode this
-		// left and right output, no mute, max gain
-		return 0b1'0'1'1'0000'0'1111111;
-	}
-
 	BAN::ErrorOr<void> HDAudioFunctionGroup::enable_output_path(uint8_t index)
 	{
 		ASSERT(index < m_output_paths.size());
@@ -310,7 +342,6 @@ namespace Kernel
 		}

 		const auto format = get_format_data();
-		const auto volume = get_volume_data();

 		for (size_t i = 0; i < path.size(); i++)
 		{
@@ -339,13 +370,17 @@ namespace Kernel
 				}));
 			}

-			// set volume
-			TRY(m_controller->send_command({
-				.data = static_cast<uint8_t>(volume & 0xFF),
-				.command = static_cast<uint16_t>(0x300 | (volume >> 8)),
-				.node_index = path[i]->id,
-				.codec_address = m_cid,
-			}));
+			// set volume to 0 dB, no mute
+			if (path[i]->output_amplifier.has_value())
+			{
+				const uint32_t volume = 0b1'0'1'1'0000'0'0000000 | path[i]->output_amplifier->offset;
+				TRY(m_controller->send_command({
+					.data = static_cast<uint8_t>(volume & 0xFF),
+					.command = static_cast<uint16_t>(0x300 | (volume >> 8)),
+					.node_index = path[i]->id,
+					.codec_address = m_cid,
+				}));
+			}

 			switch (path[i]->type)
 			{
@@ -390,6 +425,41 @@ namespace Kernel
 			}
 		}

+		// update volume info to this path
+		m_volume_info.min_mdB = 0;
+		m_volume_info.max_mdB = 0;
+		m_volume_info.step_mdB = 0;
+		for (size_t i = 0; i < path.size(); i++)
+		{
+			if (!path[i]->output_amplifier.has_value())
+				continue;
+
+			const auto& amp = path[i]->output_amplifier.value();
+			const int32_t step_mdB = amp.step_size * 250;
+
+			m_volume_info.step_mdB = step_mdB;
+			m_volume_info.min_mdB = -amp.offset * step_mdB;
+			m_volume_info.max_mdB = (amp.num_steps - amp.offset) * step_mdB;
+			m_volume_info.mdB = BAN::Math::clamp(m_volume_info.mdB, m_volume_info.min_mdB, m_volume_info.max_mdB);
+
+			const int32_t step_round = (m_volume_info.mdB >= 0)
+				? +step_mdB / 2
+				: -step_mdB / 2;
+			const uint32_t step = (m_volume_info.mdB + step_round) / step_mdB + amp.offset;
+			const uint32_t volume = 0b1'0'1'1'0000'0'0000000 | step;
+
+			TRY(m_controller->send_command({
+				.data = static_cast<uint8_t>(volume & 0xFF),
+				.command = static_cast<uint16_t>(0x300 | (volume >> 8)),
+				.node_index = path[i]->id,
+				.codec_address = m_cid,
+			}));
+			break;
+		}
+
+		if (m_volume_info.min_mdB == 0 && m_volume_info.max_mdB == 0)
+			m_volume_info.mdB = 0;
+
 		return {};
 	}
@@ -442,10 +512,6 @@ namespace Kernel
 	BAN::ErrorOr<void> HDAudioFunctionGroup::recurse_output_paths(const HDAudio::AFGWidget& widget, BAN::Vector<const HDAudio::AFGWidget*>& path)
 	{
-		// cycle "detection"
-		if (path.size() >= m_max_path_length)
-			return {};
-
 		// we've reached a DAC
 		if (widget.type == HDAudio::AFGWidget::Type::OutputConverter)
 		{
@@ -462,9 +528,18 @@ namespace Kernel
 		{
 			if (!widget.connections.contains(connection.id))
 				continue;

+			// cycle detection
+			for (const auto* w : path)
+				if (w == &connection)
+					goto already_visited;
+
 			TRY(path.push_back(&connection));
 			TRY(recurse_output_paths(connection, path));
 			path.pop_back();
+
+		already_visited:
+			continue;
 		}

 		return {};
@@ -483,30 +558,18 @@ namespace Kernel
 		while ((m_bdl_head + 1) % m_bdl_entry_count != m_bdl_tail)
 		{
-			const size_t sample_data_tail = (m_sample_data_head + m_sample_data_capacity - m_sample_data_size) % m_sample_data_capacity;
-			const size_t sample_frames = BAN::Math::min(m_sample_data_size / get_channels() / sizeof(uint16_t), m_bdl_entry_sample_frames);
+			const size_t sample_frames = BAN::Math::min(m_sample_data->size() / get_channels() / sizeof(uint16_t), m_bdl_entry_sample_frames);
 			if (sample_frames == 0)
 				break;

 			const size_t copy_total_bytes = sample_frames * get_channels() * sizeof(uint16_t);
-			const size_t copy_before_wrap = BAN::Math::min(copy_total_bytes, m_sample_data_capacity - sample_data_tail);

 			memcpy(
 				reinterpret_cast<void*>(m_bdl_region->vaddr() + m_bdl_head * bdl_entry_bytes),
-				&m_sample_data[sample_data_tail],
-				copy_before_wrap
+				m_sample_data->get_data().data(),
+				copy_total_bytes
 			);
-			if (copy_before_wrap < copy_total_bytes)
-			{
-				memcpy(
-					reinterpret_cast<void*>(m_bdl_region->vaddr() + m_bdl_head * bdl_entry_bytes + copy_before_wrap),
-					&m_sample_data[0],
-					copy_total_bytes - copy_before_wrap
-				);
-			}

 			if (copy_total_bytes < bdl_entry_bytes)
 			{
 				memset(
@@ -516,8 +579,7 @@ namespace Kernel
 				);
 			}

-			m_sample_data_size -= copy_total_bytes;
+			m_sample_data->pop(copy_total_bytes);
 			m_bdl_head = (m_bdl_head + 1) % m_bdl_entry_count;
 		}


@@ -65,8 +65,12 @@ namespace Kernel
 		m_pci_device.enable_interrupt(0, *this);
 		m_bar0->write32(Regs::INTCTL, UINT32_MAX);

-		for (uint8_t codec_id = 0; codec_id < 0x10; codec_id++)
+		const uint16_t state_sts = m_bar0->read16(Regs::STATESTS);
+		for (uint8_t codec_id = 0; codec_id < 15; codec_id++)
 		{
+			if (!(state_sts & (1 << codec_id)))
+				continue;
+
 			auto codec_or_error = initialize_codec(codec_id);
 			if (codec_or_error.is_error())
 				continue;
@@ -307,8 +311,26 @@ namespace Kernel
 		if (result.type == AFGWidget::Type::PinComplex)
 		{
 			const uint32_t cap = send_command_or_zero(0xF00, 0x0C);
-			result.pin_complex.output = !!(cap & (1 << 4));
-			result.pin_complex.input = !!(cap & (1 << 5));
+			result.pin_complex = {
+				.input = !!(cap & (1 << 5)),
+				.output = !!(cap & (1 << 4)),
+				.display = !!(cap & ((1 << 7) | (1 << 24))),
+				.config = send_command_or_zero(0xF1C, 0x00),
+			};
+		}
+
+		if (const uint32_t out_amp_cap = send_command_or_zero(0xF00, 0x12))
+		{
+			const uint8_t offset = (out_amp_cap >> 0) & 0x7F;
+			const uint8_t num_steps = (out_amp_cap >> 8) & 0x7F;
+			const uint8_t step_size = (out_amp_cap >> 16) & 0x7F;
+			const bool mute = (out_amp_cap >> 31);
+
+			result.output_amplifier = HDAudio::AFGWidget::Amplifier {
+				.offset = offset,
+				.num_steps = num_steps,
+				.step_size = step_size,
+				.mute = mute,
+			};
 		}

 		const uint8_t connection_info = send_command_or_zero(0xF00, 0x0E);


@@ -335,6 +335,8 @@ namespace Debug
 	void print_prefix(const char* file, int line)
 	{
+		if (file[0] == '.' && file[1] == '/')
+			file += 2;
 		auto ms_since_boot = Kernel::SystemTimer::is_initialized() ? Kernel::SystemTimer::get().ms_since_boot() : 0;
 		BAN::Formatter::print(Debug::putchar, "[{5}.{3}] {}:{}: ", ms_since_boot / 1000, ms_since_boot % 1000, file, line);
 	}


@@ -21,10 +21,11 @@ namespace Kernel
 	{
 		auto ms_since_boot = SystemTimer::get().ms_since_boot();
 		SpinLockGuard _(Debug::s_debug_lock);
-		BAN::Formatter::print(Debug::putchar, "[{5}.{3}] {} {}: ",
+		BAN::Formatter::print(Debug::putchar, "[{5}.{3}] {}:{} {}: ",
 			ms_since_boot / 1000,
 			ms_since_boot % 1000,
 			Kernel::Process::current().pid(),
+			Thread::current().tid(),
 			Kernel::Process::current().name()
 		);
 		for (size_t i = 0; i < buffer.size(); i++)


@@ -6,6 +6,7 @@
 #include <kernel/Terminal/FramebufferTerminal.h>

 #include <sys/framebuffer.h>
+#include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <sys/sysmacros.h>
@@ -133,6 +134,26 @@ namespace Kernel
 		return bytes_to_copy;
 	}

+	BAN::ErrorOr<long> FramebufferDevice::ioctl_impl(int cmd, void* arg)
+	{
+		switch (cmd)
+		{
+			case FB_MSYNC_RECTANGLE:
+			{
+				auto& rectangle = *static_cast<fb_msync_region*>(arg);
+				sync_pixels_rectangle(
+					rectangle.min_x,
+					rectangle.min_y,
+					rectangle.max_x - rectangle.min_x,
+					rectangle.max_y - rectangle.min_y
+				);
+				return 0;
+			}
+		}
+		return CharacterDevice::ioctl(cmd, arg);
+	}
+
 	uint32_t FramebufferDevice::get_pixel(uint32_t x, uint32_t y) const
 	{
 		ASSERT(x < m_width && y < m_height);


@@ -209,7 +209,7 @@ namespace Kernel
            continue;
        SpinLockGuardAsMutex smutex(guard);
-       TRY(Thread::current().block_or_eintr_or_timeout_ns(m_thread_blocker, waketime_ns - current_ns, false, &smutex));
+       TRY(Thread::current().block_or_eintr_or_waketime_ns(m_thread_blocker, waketime_ns, false, &smutex));
    }
    return event_count;


@@ -0,0 +1,54 @@
#include <kernel/FS/EventFD.h>
#include <sys/epoll.h>

namespace Kernel
{

    BAN::ErrorOr<BAN::RefPtr<Inode>> EventFD::create(uint64_t initval, bool semaphore)
    {
        auto* eventfd_ptr = new EventFD(initval, semaphore);
        if (eventfd_ptr == nullptr)
            return BAN::Error::from_errno(ENOMEM);
        return BAN::RefPtr<Inode>(BAN::RefPtr<EventFD>::adopt(eventfd_ptr));
    }

    BAN::ErrorOr<size_t> EventFD::read_impl(off_t, BAN::ByteSpan buffer)
    {
        if (buffer.size() < sizeof(uint64_t))
            return BAN::Error::from_errno(EINVAL);
        while (m_value == 0)
            TRY(Thread::current().block_or_eintr_indefinite(m_thread_blocker, &m_mutex));
        const uint64_t read_value = m_is_semaphore ? 1 : m_value;
        m_value -= read_value;
        buffer.as<uint64_t>() = read_value;
        epoll_notify(EPOLLOUT);
        return sizeof(uint64_t);
    }

    BAN::ErrorOr<size_t> EventFD::write_impl(off_t, BAN::ConstByteSpan buffer)
    {
        if (buffer.size() < sizeof(uint64_t))
            return BAN::Error::from_errno(EINVAL);
        const uint64_t write_value = buffer.as<const uint64_t>();
        if (write_value == UINT64_MAX)
            return BAN::Error::from_errno(EINVAL);
        while (m_value + write_value < m_value)
            TRY(Thread::current().block_or_eintr_indefinite(m_thread_blocker, &m_mutex));
        m_value += write_value;
        if (m_value > 0)
            epoll_notify(EPOLLIN);
        return sizeof(uint64_t);
    }

}
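The eventfd semantics above (a read drains the whole counter, or takes 1 in semaphore mode; a write adds to it and must never overflow) can be modeled in a few lines. This is a hypothetical single-threaded sketch of the counter arithmetic only, with `try_*` return values standing in for where the kernel blocks; none of these names come from the kernel source:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an eventfd-style counter. Reads drain the counter
// (or take exactly 1 in semaphore mode); writes add, and a write that would
// wrap the counter is rejected (the kernel blocks in that case instead).
struct EventCounter {
    uint64_t value = 0;
    bool semaphore = false;

    // Returns false when a read would block (counter is zero).
    bool try_read(uint64_t& out) {
        if (value == 0)
            return false;
        out = semaphore ? 1 : value;
        value -= out;
        return true;
    }

    // Returns false when the write is invalid or would overflow.
    bool try_write(uint64_t add) {
        if (add == UINT64_MAX)
            return false;          // UINT64_MAX itself is an invalid value
        if (value + add < value)
            return false;          // unsigned wrap-around check, as in write_impl
        value += add;
        return true;
    }
};
```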


@@ -222,6 +222,22 @@ namespace Kernel
        return getpeername_impl(address, address_len);
    }
+   BAN::ErrorOr<void> Inode::getsockopt(int level, int option, void* value, socklen_t* value_len)
+   {
+       LockGuard _(m_mutex);
+       if (!mode().ifsock())
+           return BAN::Error::from_errno(ENOTSOCK);
+       return getsockopt_impl(level, option, value, value_len);
+   }
+   BAN::ErrorOr<void> Inode::setsockopt(int level, int option, const void* value, socklen_t value_len)
+   {
+       LockGuard _(m_mutex);
+       if (!mode().ifsock())
+           return BAN::Error::from_errno(ENOTSOCK);
+       return setsockopt_impl(level, option, value, value_len);
+   }
    BAN::ErrorOr<size_t> Inode::read(off_t offset, BAN::ByteSpan buffer)
    {
        LockGuard _(m_mutex);


@@ -9,12 +9,16 @@
namespace Kernel
{
+   static constexpr size_t s_pipe_buffer_size = 0x10000;
    BAN::ErrorOr<BAN::RefPtr<Inode>> Pipe::create(const Credentials& credentials)
    {
-       Pipe* pipe = new Pipe(credentials);
-       if (pipe == nullptr)
+       auto* pipe_ptr = new Pipe(credentials);
+       if (pipe_ptr == nullptr)
            return BAN::Error::from_errno(ENOMEM);
-       return BAN::RefPtr<Inode>::adopt(pipe);
+       auto pipe = BAN::RefPtr<Pipe>::adopt(pipe_ptr);
+       pipe->m_buffer = TRY(ByteRingBuffer::create(s_pipe_buffer_size));
+       return BAN::RefPtr<Inode>(pipe);
    }
    Pipe::Pipe(const Credentials& credentials)
@@ -69,27 +73,16 @@ namespace Kernel
    BAN::ErrorOr<size_t> Pipe::read_impl(off_t, BAN::ByteSpan buffer)
    {
-       while (m_buffer_size == 0)
+       while (m_buffer->empty())
        {
            if (m_writing_count == 0)
                return 0;
            TRY(Thread::current().block_or_eintr_indefinite(m_thread_blocker, &m_mutex));
        }
-       const size_t to_copy = BAN::Math::min<size_t>(buffer.size(), m_buffer_size);
-       if (m_buffer_tail + to_copy <= m_buffer.size())
-           memcpy(buffer.data(), m_buffer.data() + m_buffer_tail, to_copy);
-       else
-       {
-           const size_t before_wrap = m_buffer.size() - m_buffer_tail;
-           const size_t after_wrap = to_copy - before_wrap;
-           memcpy(buffer.data(), m_buffer.data() + m_buffer_tail, before_wrap);
-           memcpy(buffer.data() + before_wrap, m_buffer.data(), after_wrap);
-       }
-       m_buffer_tail = (m_buffer_tail + to_copy) % m_buffer.size();
-       m_buffer_size -= to_copy;
+       const size_t to_copy = BAN::Math::min<size_t>(buffer.size(), m_buffer->size());
+       memcpy(buffer.data(), m_buffer->get_data().data(), to_copy);
+       m_buffer->pop(to_copy);
        m_atime = SystemTimer::get().real_time();
@@ -102,7 +95,7 @@ namespace Kernel
    BAN::ErrorOr<size_t> Pipe::write_impl(off_t, BAN::ConstByteSpan buffer)
    {
-       while (m_buffer_size >= m_buffer.size())
+       while (m_buffer->full())
        {
            if (m_reading_count == 0)
            {
@@ -112,20 +105,8 @@ namespace Kernel
            TRY(Thread::current().block_or_eintr_indefinite(m_thread_blocker, &m_mutex));
        }
-       const size_t to_copy = BAN::Math::min(buffer.size(), m_buffer.size() - m_buffer_size);
-       const size_t buffer_head = (m_buffer_tail + m_buffer_size) % m_buffer.size();
-       if (buffer_head + to_copy <= m_buffer.size())
-           memcpy(m_buffer.data() + buffer_head, buffer.data(), to_copy);
-       else
-       {
-           const size_t before_wrap = m_buffer.size() - buffer_head;
-           const size_t after_wrap = to_copy - before_wrap;
-           memcpy(m_buffer.data() + buffer_head, buffer.data(), before_wrap);
-           memcpy(m_buffer.data(), buffer.data() + before_wrap, after_wrap);
-       }
-       m_buffer_size += to_copy;
+       const size_t to_copy = BAN::Math::min(buffer.size(), m_buffer->free());
+       m_buffer->push(buffer.slice(0, to_copy));
        timespec current_time = SystemTimer::get().real_time();
        m_mtime = current_time;


@@ -1,94 +1,262 @@
#include <BAN/ScopeGuard.h>
#include <kernel/FS/USTARModule.h>
+#include <kernel/Timer/Timer.h>
+#include <LibDEFLATE/Decompressor.h>
#include <tar.h>
namespace Kernel
{
-   bool is_ustar_boot_module(const BootModule& module)
-   {
-       if (module.start % PAGE_SIZE)
-       {
-           dprintln("ignoring non-page-aligned module");
-           return false;
-       }
-       if (module.size < 512)
-           return false;
-       bool has_ustar_signature;
-       PageTable::with_fast_page(module.start, [&] {
-           has_ustar_signature = memcmp(PageTable::fast_page_as_ptr(257), "ustar", 5) == 0;
-       });
-       return has_ustar_signature;
-   }
-   BAN::ErrorOr<void> unpack_boot_module_into_filesystem(BAN::RefPtr<FileSystem> filesystem, const BootModule& module)
-   {
-       ASSERT(is_ustar_boot_module(module));
-       auto root_inode = filesystem->root_inode();
-       uint8_t* temp_page = static_cast<uint8_t*>(kmalloc(PAGE_SIZE));
-       if (temp_page == nullptr)
-           return BAN::Error::from_errno(ENOMEM);
-       BAN::ScopeGuard _([temp_page] { kfree(temp_page); });
-       BAN::String next_file_name;
-       BAN::String next_link_name;
-       size_t offset = 0;
-       while (offset + 512 <= module.size)
-       {
-           size_t file_size = 0;
-           mode_t file_mode = 0;
-           uid_t file_uid = 0;
-           gid_t file_gid = 0;
-           uint8_t file_type = 0;
-           char file_path[100 + 1 + 155 + 1] {};
-           PageTable::with_fast_page((module.start + offset) & PAGE_ADDR_MASK, [&] {
-               const size_t page_off = offset % PAGE_SIZE;
-               const auto parse_octal =
-                   [page_off](size_t offset, size_t length) -> size_t
-                   {
-                       size_t result = 0;
-                       for (size_t i = 0; i < length; i++)
-                       {
-                           const char ch = PageTable::fast_page_as<char>(page_off + offset + i);
-                           if (ch == '\0')
-                               break;
-                           result = (result * 8) + (ch - '0');
-                       }
-                       return result;
-                   };
-               if (memcmp(PageTable::fast_page_as_ptr(page_off + 257), "ustar", 5)) {
-                   file_size = SIZE_MAX;
-                   return;
-               }
-               memcpy(file_path, PageTable::fast_page_as_ptr(page_off + 345), 155);
-               const size_t prefix_len = strlen(file_path);
-               file_path[prefix_len] = '/';
-               memcpy(file_path + prefix_len + 1, PageTable::fast_page_as_ptr(page_off), 100);
-               file_mode = parse_octal(100, 8);
-               file_uid = parse_octal(108, 8);
-               file_gid = parse_octal(116, 8);
-               file_size = parse_octal(124, 12);
-               file_type = PageTable::fast_page_as<char>(page_off + 156);
-           });
-           if (file_size == SIZE_MAX)
-               break;
-           if (offset + 512 + file_size > module.size)
-               break;
-           auto parent_inode = filesystem->root_inode();
+   class DataSource
+   {
+   public:
+       DataSource() = default;
+       virtual ~DataSource() = default;
+       size_t data_size() const
+       {
+           return m_data_size;
+       }
+       BAN::ConstByteSpan data()
+       {
+           return { m_data_buffer, m_data_size };
+       }
+       void pop_data(size_t size)
+       {
+           ASSERT(size <= m_data_size);
+           if (size > 0 && size < m_data_size)
+               memmove(m_data_buffer, m_data_buffer + size, m_data_size - size);
+           m_data_size -= size;
+           m_bytes_produced += size;
+       }
+       virtual BAN::ErrorOr<bool> produce_data() = 0;
+       uint64_t bytes_produced() const
+       {
+           return m_bytes_produced;
+       }
+       virtual uint64_t bytes_consumed() const = 0;
+   protected:
+       uint8_t m_data_buffer[4096];
+       size_t m_data_size { 0 };
+   private:
+       uint64_t m_bytes_produced { 0 };
+   };
+   class DataSourceRaw final : public DataSource
+   {
+   public:
+       DataSourceRaw(const BootModule& module)
+           : m_module(module)
+       { }
+       BAN::ErrorOr<bool> produce_data() override
+       {
+           if (m_offset >= m_module.size || m_data_size >= sizeof(m_data_buffer))
+               return false;
+           while (m_offset < m_module.size && m_data_size < sizeof(m_data_buffer))
+           {
+               const size_t to_copy = BAN::Math::min(
+                   sizeof(m_data_buffer) - m_data_size,
+                   PAGE_SIZE - (m_offset % PAGE_SIZE)
+               );
+               PageTable::with_fast_page((m_module.start + m_offset) & PAGE_ADDR_MASK, [&] {
+                   memcpy(m_data_buffer + m_data_size, PageTable::fast_page_as_ptr(m_offset % PAGE_SIZE), to_copy);
+               });
+               m_data_size += to_copy;
+               m_offset += to_copy;
+           }
+           return true;
+       }
+       uint64_t bytes_consumed() const override
+       {
+           return bytes_produced();
+       }
+   private:
+       const BootModule& m_module;
+       size_t m_offset { 0 };
+   };
+   class DataSourceGZip final : public DataSource
+   {
+   public:
+       DataSourceGZip(BAN::UniqPtr<DataSource>&& data_source)
+           : m_data_source(BAN::move(data_source))
+           , m_decompressor(LibDEFLATE::StreamType::GZip)
+       { }
+       BAN::ErrorOr<bool> produce_data() override
+       {
+           if (m_is_done)
+               return false;
+           bool did_produce_data { false };
+           for (;;)
+           {
+               TRY(m_data_source->produce_data());
+               size_t input_consumed, output_produced;
+               const auto status = TRY(m_decompressor.decompress(
+                   m_data_source->data(),
+                   input_consumed,
+                   { m_data_buffer + m_data_size, sizeof(m_data_buffer) - m_data_size },
+                   output_produced
+               ));
+               m_data_source->pop_data(input_consumed);
+               m_data_size += output_produced;
+               if (output_produced)
+                   did_produce_data = true;
+               switch (status)
+               {
+                   using DecompStatus = LibDEFLATE::Decompressor::Status;
+                   case DecompStatus::Done:
+                       m_is_done = true;
+                       return did_produce_data;
+                   case DecompStatus::NeedMoreInput:
+                       break;
+                   case DecompStatus::NeedMoreOutput:
+                       return did_produce_data;
+               }
+           }
+       }
+       uint64_t bytes_consumed() const override
+       {
+           return m_data_source->bytes_consumed();
+       }
+   private:
+       BAN::UniqPtr<DataSource> m_data_source;
+       LibDEFLATE::Decompressor m_decompressor;
+       bool m_is_done { false };
+   };
+   static BAN::ErrorOr<void> unpack_boot_module_into_directory(BAN::RefPtr<Inode> root_inode, DataSource& data_source);
+   BAN::ErrorOr<bool> unpack_boot_module_into_directory(BAN::RefPtr<Inode> root_inode, const BootModule& module)
+   {
+       ASSERT(root_inode->mode().ifdir());
+       BAN::UniqPtr<DataSource> data_source = TRY(BAN::UniqPtr<DataSourceRaw>::create(module));
+       bool is_compressed = false;
+       TRY(data_source->produce_data());
+       if (data_source->data_size() >= 2 && memcmp(&data_source->data()[0], "\x1F\x8B", 2) == 0)
+       {
+           data_source = TRY(BAN::UniqPtr<DataSourceGZip>::create(BAN::move(data_source)));
+           is_compressed = true;
+       }
+       TRY(data_source->produce_data());
+       if (data_source->data_size() < 512 || memcmp(&data_source->data()[257], "ustar", 5) != 0)
+       {
+           dwarnln("Unrecognized initrd format");
+           return false;
+       }
+       const auto module_size_kib = module.size / 1024;
+       dprintln("unpacking {}.{3} MiB{} initrd",
+           module_size_kib / 1024, (module_size_kib % 1024) * 1000 / 1024,
+           is_compressed ? " compressed" : ""
+       );
+       const auto unpack_ms1 = SystemTimer::get().ms_since_boot();
+       TRY(unpack_boot_module_into_directory(root_inode, *data_source));
+       const auto unpack_ms2 = SystemTimer::get().ms_since_boot();
+       const auto duration_ms = unpack_ms2 - unpack_ms1;
+       dprintln("unpacking {}.{3} MiB{} initrd took {}.{3} s",
+           module_size_kib / 1024, (module_size_kib % 1024) * 1000 / 1024,
+           is_compressed ? " compressed" : "",
+           duration_ms / 1000, duration_ms % 1000
+       );
+       if (is_compressed)
+       {
+           const auto uncompressed_kib = data_source->bytes_produced() / 1024;
+           dprintln("uncompressed size {}.{3} MiB",
+               uncompressed_kib / 1024, (uncompressed_kib % 1024) * 1000 / 1024
+           );
+       }
+       return true;
+   }
+   BAN::ErrorOr<void> unpack_boot_module_into_directory(BAN::RefPtr<Inode> root_inode, DataSource& data_source)
+   {
+       BAN::String next_file_name;
+       BAN::String next_link_name;
+       constexpr uint32_t print_interval_ms = 1000;
+       auto next_print_ms = SystemTimer::get().ms_since_boot() + print_interval_ms;
+       while (TRY(data_source.produce_data()), data_source.data_size() >= 512)
+       {
+           if (SystemTimer::get().ms_since_boot() >= next_print_ms)
+           {
+               const auto kib_consumed = data_source.bytes_consumed() / 1024;
+               const auto kib_produced = data_source.bytes_produced() / 1024;
+               if (kib_consumed == kib_produced)
+               {
+                   dprintln(" ... {}.{3} MiB",
+                       kib_consumed / 1024, (kib_consumed % 1024) * 1000 / 1024
+                   );
+               }
+               else
+               {
+                   dprintln(" ... {}.{3} MiB ({}.{3} MiB)",
+                       kib_consumed / 1024, (kib_consumed % 1024) * 1000 / 1024,
+                       kib_produced / 1024, (kib_produced % 1024) * 1000 / 1024
+                   );
+               }
+               next_print_ms = SystemTimer::get().ms_since_boot() + print_interval_ms;
+           }
+           const auto parse_octal =
+               [&data_source](size_t offset, size_t length) -> size_t
+               {
+                   size_t result = 0;
+                   for (size_t i = 0; i < length; i++)
+                   {
+                       const char ch = data_source.data()[offset + i];
+                       if (ch == '\0')
+                           break;
+                       result = (result * 8) + (ch - '0');
+                   }
+                   return result;
+               };
+           if (memcmp(&data_source.data()[257], "ustar", 5) != 0)
+               break;
+           char file_path[100 + 1 + 155 + 1];
+           memcpy(file_path, &data_source.data()[345], 155);
+           const size_t prefix_len = strlen(file_path);
+           file_path[prefix_len] = '/';
+           memcpy(file_path + prefix_len + 1, &data_source.data()[0], 100);
+           mode_t file_mode = parse_octal(100, 8);
+           const uid_t file_uid = parse_octal(108, 8);
+           const gid_t file_gid = parse_octal(116, 8);
+           const size_t file_size = parse_octal(124, 12);
+           const uint8_t file_type = data_source.data()[156];
+           auto parent_inode = root_inode;
            auto file_path_parts = TRY(BAN::StringView(next_file_name.empty() ? file_path : next_file_name.sv()).split('/'));
            for (size_t i = 0; i < file_path_parts.size() - 1; i++)
@@ -111,27 +279,33 @@ namespace Kernel
            auto file_name_sv = file_path_parts.back();
+           bool did_consume_data = false;
            if (file_type == 'L' || file_type == 'K')
            {
-               auto& target = (file_type == 'L') ? next_file_name : next_link_name;
-               TRY(target.resize(file_size));
+               auto& target_str = (file_type == 'L') ? next_file_name : next_link_name;
+               TRY(target_str.resize(file_size));
+               data_source.pop_data(512);
                size_t nwritten = 0;
                while (nwritten < file_size)
                {
-                   const paddr_t paddr = module.start + offset + 512 + nwritten;
-                   PageTable::with_fast_page(paddr & PAGE_ADDR_MASK, [&] {
-                       memcpy(temp_page, PageTable::fast_page_as_ptr(), PAGE_SIZE);
-                   });
-                   const size_t page_off = paddr % PAGE_SIZE;
-                   const size_t to_write = BAN::Math::min(file_size - nwritten, PAGE_SIZE - page_off);
-                   memcpy(target.data() + nwritten, temp_page + page_off, to_write);
-                   nwritten += to_write;
+                   TRY(data_source.produce_data());
+                   if (data_source.data_size() == 0)
+                       return {};
+                   const size_t to_copy = BAN::Math::min(data_source.data_size(), file_size - nwritten);
+                   memcpy(target_str.data() + nwritten, data_source.data().data(), to_copy);
+                   nwritten += to_copy;
+                   data_source.pop_data(to_copy);
                }
-               while (!target.empty() && target.back() == '\0')
-                   target.pop_back();
+               did_consume_data = true;
+               while (!target_str.empty() && target_str.back() == '\0')
+                   target_str.pop_back();
            }
            else if (file_type == DIRTYPE)
            {
@@ -149,14 +323,11 @@ namespace Kernel
                    link_name = next_link_name.sv();
                else
                {
-                   const paddr_t paddr = module.start + offset;
-                   PageTable::with_fast_page(paddr & PAGE_ADDR_MASK, [&] {
-                       memcpy(link_buffer, PageTable::fast_page_as_ptr((paddr % PAGE_SIZE) + 157), 100);
-                   });
+                   memcpy(link_buffer, &data_source.data()[157], 100);
                    link_name = link_buffer;
                }
-               auto target_inode = filesystem->root_inode();
+               auto target_inode = root_inode;
                auto link_path_parts = TRY(link_name.split('/'));
                for (const auto part : link_path_parts)
@@ -188,10 +359,7 @@ namespace Kernel
                    link_name = next_link_name.sv();
                else
                {
-                   const paddr_t paddr = module.start + offset;
-                   PageTable::with_fast_page(paddr & PAGE_ADDR_MASK, [&] {
-                       memcpy(link_buffer, PageTable::fast_page_as_ptr((paddr % PAGE_SIZE) + 157), 100);
-                   });
+                   memcpy(link_buffer, &data_source.data()[157], 100);
                    link_name = link_buffer;
                }
@@ -203,26 +371,26 @@ namespace Kernel
            {
                if (auto ret = parent_inode->create_file(file_name_sv, file_mode, file_uid, file_gid); ret.is_error())
                    dwarnln("failed to create file '{}': {}", file_name_sv, ret.error());
-               else
+               else if (file_size)
                {
-                   if (file_size)
-                   {
-                       auto inode = TRY(parent_inode->find_inode(file_name_sv));
-                       size_t nwritten = 0;
-                       while (nwritten < file_size)
-                       {
-                           const paddr_t paddr = module.start + offset + 512 + nwritten;
-                           PageTable::with_fast_page(paddr & PAGE_ADDR_MASK, [&] {
-                               memcpy(temp_page, PageTable::fast_page_as_ptr(), PAGE_SIZE);
-                           });
-                           const size_t page_off = paddr % PAGE_SIZE;
-                           const size_t to_write = BAN::Math::min(file_size - nwritten, PAGE_SIZE - page_off);
-                           TRY(inode->write(nwritten, { temp_page + page_off, to_write }));
-                           nwritten += to_write;
-                       }
-                   }
+                   auto inode = TRY(parent_inode->find_inode(file_name_sv));
+                   data_source.pop_data(512);
+                   size_t nwritten = 0;
+                   while (nwritten < file_size)
+                   {
+                       TRY(data_source.produce_data());
+                       ASSERT(data_source.data_size() > 0); // what to do?
+                       const size_t to_write = BAN::Math::min(file_size - nwritten, data_source.data_size());
+                       TRY(inode->write(nwritten, data_source.data().slice(0, to_write)));
+                       nwritten += to_write;
+                       data_source.pop_data(to_write);
+                   }
+                   did_consume_data = true;
                }
            }
@@ -232,9 +400,27 @@ namespace Kernel
                next_link_name.clear();
            }
-           offset += 512 + file_size;
-           if (auto rem = offset % 512)
-               offset += 512 - rem;
+           if (!did_consume_data)
+           {
+               data_source.pop_data(512);
+               size_t consumed = 0;
+               while (consumed < file_size)
+               {
+                   TRY(data_source.produce_data());
+                   if (data_source.data_size() == 0)
+                       return {};
+                   data_source.pop_data(BAN::Math::min(file_size - consumed, data_source.data_size()));
+               }
+           }
+           if (const auto rem = file_size % 512)
+           {
+               TRY(data_source.produce_data());
+               if (data_source.data_size() < rem)
+                   return {};
+               data_source.pop_data(512 - rem);
+           }
        }
        return {};
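The `parse_octal` lambda above is the heart of TAR header parsing: USTAR numeric fields (mode, uid, gid, size) are NUL-terminated ASCII octal strings. A standalone sketch of the same digit loop, as a hypothetical free function rather than the kernel's lambda:

```cpp
#include <cstddef>

// TAR header numeric fields are ASCII octal, terminated by NUL (the loop
// stops at the first '\0' or after `length` characters, like parse_octal above).
size_t parse_tar_octal(const char* field, size_t length) {
    size_t result = 0;
    for (size_t i = 0; i < length; i++) {
        const char ch = field[i];
        if (ch == '\0')
            break;
        result = (result * 8) + (ch - '0');
    }
    return result;
}
```

For example, a mode field of `"0000644"` decodes to octal 644 (rw-r--r--).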


@@ -61,18 +61,22 @@ namespace Kernel
        if (filesystem_or_error.is_error())
            panic("Failed to create fallback filesystem: {}", filesystem_or_error.error());
-       dprintln("Loading fallback filesystem from {} modules", g_boot_info.modules.size());
+       dprintln("Trying to load fallback filesystem from {} modules", g_boot_info.modules.size());
        auto filesystem = BAN::RefPtr<FileSystem>::adopt(filesystem_or_error.release_value());
+       bool loaded_initrd = false;
        for (const auto& module : g_boot_info.modules)
        {
-           if (!is_ustar_boot_module(module))
-               continue;
-           if (auto ret = unpack_boot_module_into_filesystem(filesystem, module); ret.is_error())
+           if (auto ret = unpack_boot_module_into_directory(filesystem->root_inode(), module); ret.is_error())
                dwarnln("Failed to unpack boot module: {}", ret.error());
+           else
+               loaded_initrd |= ret.value();
        }
+       if (!loaded_initrd)
+           panic("Could not load initrd from any boot module :(");
        return filesystem;
    }


@@ -164,6 +164,33 @@ namespace Kernel
        "Unkown Exception 0x1F",
    };
+   extern "C" uint8_t safe_user_memcpy[];
+   extern "C" uint8_t safe_user_memcpy_end[];
+   extern "C" uint8_t safe_user_memcpy_fault[];
+   extern "C" uint8_t safe_user_strncpy[];
+   extern "C" uint8_t safe_user_strncpy_end[];
+   extern "C" uint8_t safe_user_strncpy_fault[];
+   struct safe_user_page_fault
+   {
+       const uint8_t* ip_start;
+       const uint8_t* ip_end;
+       const uint8_t* ip_fault;
+   };
+   static constexpr safe_user_page_fault s_safe_user_page_faults[] {
+       {
+           .ip_start = safe_user_memcpy,
+           .ip_end = safe_user_memcpy_end,
+           .ip_fault = safe_user_memcpy_fault,
+       },
+       {
+           .ip_start = safe_user_strncpy,
+           .ip_end = safe_user_strncpy_end,
+           .ip_fault = safe_user_strncpy_fault,
+       },
+   };
    extern "C" void cpp_isr_handler(uint32_t isr, uint32_t error, InterruptStack* interrupt_stack, const Registers* regs)
    {
        if (g_paniced)
@@ -194,13 +221,28 @@ namespace Kernel
                if (result.is_error())
                {
                    dwarnln("Demand paging: {}", result.error());
+                   // TODO: this is too strict, we should maybe do SIGBUS and
+                   //       SIGKILL only on recursive exceptions
+                   Processor::set_interrupt_state(InterruptState::Enabled);
                    Thread::current().handle_signal(SIGKILL, {});
+                   Processor::set_interrupt_state(InterruptState::Disabled);
                    return;
                }
                if (result.value())
                    return;
+               const uint8_t* ip = reinterpret_cast<const uint8_t*>(interrupt_stack->ip);
+               for (const auto& safe_user : s_safe_user_page_faults)
+               {
+                   if (ip < safe_user.ip_start || ip >= safe_user.ip_end)
+                       continue;
+                   interrupt_stack->ip = reinterpret_cast<vaddr_t>(safe_user.ip_fault);
+                   return;
+               }
                break;
            }
            case ISR::DeviceNotAvailable:
@@ -285,7 +327,7 @@ namespace Kernel
        Debug::s_debug_lock.unlock(InterruptState::Disabled);
-       if (tid && Thread::current().is_userspace())
+       if (tid && GDT::is_user_segment(interrupt_stack->cs))
        {
            // TODO: Confirm and fix the exception to signal mappings
@@ -316,6 +358,7 @@ namespace Kernel
            case ISR::InvalidOpcode:
                signal_info.si_signo = SIGILL;
                signal_info.si_code = ILL_ILLOPC;
+               signal_info.si_addr = reinterpret_cast<void*>(interrupt_stack->ip);
                break;
            case ISR::PageFault:
                signal_info.si_signo = SIGSEGV;
@@ -323,6 +366,7 @@ namespace Kernel
                    signal_info.si_code = SEGV_ACCERR;
                else
                    signal_info.si_code = SEGV_MAPERR;
+               signal_info.si_addr = reinterpret_cast<void*>(regs->cr2);
                break;
            default:
                dwarnln("Unhandled exception");
@@ -330,7 +374,9 @@ namespace Kernel
                break;
        }
+       Processor::set_interrupt_state(InterruptState::Enabled);
        Thread::current().handle_signal(signal_info.si_signo, signal_info);
+       Processor::set_interrupt_state(InterruptState::Disabled);
    }
    else
    {
@@ -365,10 +411,6 @@ namespace Kernel
        Process::update_alarm_queue();
        Processor::scheduler().timer_interrupt();
-       auto& current_thread = Thread::current();
-       if (current_thread.can_add_signal_to_execute())
-           current_thread.handle_signal();
    }
    extern "C" void cpp_irq_handler(uint32_t irq)
@@ -392,15 +434,18 @@ namespace Kernel
        else
            dprintln("no handler for irq 0x{2H}", irq);
-       auto& current_thread = Thread::current();
-       if (current_thread.can_add_signal_to_execute())
-           current_thread.handle_signal();
        Processor::scheduler().reschedule_if_idle();
        ASSERT(Thread::current().state() != Thread::State::Terminated);
    }
+   extern "C" void cpp_check_signal()
+   {
+       Processor::set_interrupt_state(InterruptState::Enabled);
+       Thread::current().handle_signal_if_interrupted();
+       Processor::set_interrupt_state(InterruptState::Disabled);
+   }
    void IDT::register_interrupt_handler(uint8_t index, void (*handler)(), uint8_t ist)
    {
        auto& desc = m_idt[index];
@@ -440,7 +485,6 @@ namespace Kernel
    IRQ_LIST_X
#undef X
-   extern "C" void asm_yield_handler();
    extern "C" void asm_ipi_handler();
    extern "C" void asm_timer_handler();
#if ARCH(i686)
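The `s_safe_user_page_faults` table above implements a common fault-recovery pattern: when a page fault's instruction pointer falls inside a registered safe-copy routine, the handler rewrites the ip to that routine's fault label instead of signalling the thread. A minimal model of the lookup, with hypothetical types in place of the kernel's:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of a safe-copy fault table entry: a half-open code range
// [start, end) and the address execution should resume at on a fault inside it.
struct SafeRange {
    uintptr_t start;
    uintptr_t end;
    uintptr_t fault;
};

// Returns the recovery ip, or 0 when the faulting ip is not inside any
// registered range (in the kernel that path falls through to signalling).
uintptr_t recover_ip(uintptr_t ip, const SafeRange* ranges, size_t count) {
    for (size_t i = 0; i < count; i++)
        if (ip >= ranges[i].start && ip < ranges[i].end)
            return ranges[i].fault;
    return 0;
}
```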


@@ -136,6 +136,9 @@ namespace Kernel::Input
        m_scancode_to_keycode_extended[0x49] = keycode_function(18);
        m_scancode_to_keycode_extended[0x51] = keycode_function(19);
        m_scancode_to_keycode_normal[0x46] = keycode_function(20);
+       m_scancode_to_keycode_extended[0x20] = keycode_function(21);
+       m_scancode_to_keycode_extended[0x2E] = keycode_function(22);
+       m_scancode_to_keycode_extended[0x30] = keycode_function(23);
        // Arrow keys
        m_scancode_to_keycode_extended[0x48] = keycode_normal(5, 0);
@@ -246,6 +249,9 @@ namespace Kernel::Input
        m_scancode_to_keycode_extended[0x7D] = keycode_function(18);
        m_scancode_to_keycode_extended[0x7A] = keycode_function(19);
        m_scancode_to_keycode_normal[0x7E] = keycode_function(20);
+       m_scancode_to_keycode_extended[0x23] = keycode_function(21);
+       m_scancode_to_keycode_extended[0x21] = keycode_function(22);
+       m_scancode_to_keycode_extended[0x32] = keycode_function(23);
        // Arrow keys
        m_scancode_to_keycode_extended[0x75] = keycode_normal(5, 0);
@@ -356,6 +362,9 @@ namespace Kernel::Input
        m_scancode_to_keycode_normal[0x6F] = keycode_function(18);
        m_scancode_to_keycode_normal[0x6D] = keycode_function(19);
        m_scancode_to_keycode_normal[0x5F] = keycode_function(20);
+       m_scancode_to_keycode_normal[0x9C] = keycode_function(21);
+       m_scancode_to_keycode_normal[0x9D] = keycode_function(22);
+       m_scancode_to_keycode_normal[0x95] = keycode_function(23);
        // Arrow keys
        m_scancode_to_keycode_normal[0x63] = keycode_normal(5, 0);


@@ -0,0 +1,49 @@
#include <kernel/Memory/ByteRingBuffer.h>
#include <kernel/Memory/Heap.h>
#include <kernel/Memory/PageTable.h>

namespace Kernel
{

    BAN::ErrorOr<BAN::UniqPtr<ByteRingBuffer>> ByteRingBuffer::create(size_t size)
    {
        ASSERT(size % PAGE_SIZE == 0);
        const size_t page_count = size / PAGE_SIZE;
        auto* buffer_ptr = new ByteRingBuffer(size);
        if (buffer_ptr == nullptr)
            return BAN::Error::from_errno(ENOMEM);
        auto buffer = BAN::UniqPtr<ByteRingBuffer>::adopt(buffer_ptr);
        buffer->m_vaddr = PageTable::kernel().reserve_free_contiguous_pages(page_count * 2, KERNEL_OFFSET);
        if (buffer->m_vaddr == 0)
            return BAN::Error::from_errno(ENOMEM);
        for (size_t i = 0; i < page_count; i++)
        {
            const paddr_t paddr = Heap::get().take_free_page();
            if (paddr == 0)
                return BAN::Error::from_errno(ENOMEM);
            PageTable::kernel().map_page_at(paddr, buffer->m_vaddr + i * PAGE_SIZE, PageTable::ReadWrite | PageTable::Present);
            PageTable::kernel().map_page_at(paddr, buffer->m_vaddr + size + i * PAGE_SIZE, PageTable::ReadWrite | PageTable::Present);
        }
        return buffer;
    }

    ByteRingBuffer::~ByteRingBuffer()
    {
        if (m_vaddr == 0)
            return;
        for (size_t i = 0; i < m_capacity / PAGE_SIZE; i++)
        {
            const paddr_t paddr = PageTable::kernel().physical_address_of(m_vaddr + i * PAGE_SIZE);
            if (paddr == 0)
                break;
            Heap::get().release_page(paddr);
        }
        PageTable::kernel().unmap_range(m_vaddr, m_capacity * 2);
    }

}
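Mapping each physical page at two consecutive virtual addresses is the classic "magic ring buffer" trick: any window of up to `capacity` bytes starting at the tail is linear in virtual memory, so reads and writes never have to be split at the wrap point (compare the two-`memcpy` wrap handling removed from `Pipe` above). A sketch of the index arithmetic this enables; this is a hypothetical model of the idea, not the kernel's `ByteRingBuffer` interface:

```cpp
#include <cassert>
#include <cstddef>

// With the buffer's pages mapped twice back-to-back, any offset in
// [0, 2*capacity) addresses valid memory, so a contiguous span of up to
// `capacity` bytes starting at `tail` never needs a wrap-around split.
// Index arithmetic only; the real buffer owns the double mapping.
struct DoubleMappedIndex {
    size_t capacity;
    size_t tail = 0;  // kept in [0, capacity)
    size_t size = 0;

    // Linear offset where `size` readable bytes start.
    size_t read_offset() const { return tail; }
    // Linear offset where `capacity - size` writable bytes start;
    // may exceed `capacity`, which the second mapping makes legal.
    size_t write_offset() const { return tail + size; }

    void push(size_t n) { assert(size + n <= capacity); size += n; }
    void pop(size_t n)  { assert(n <= size); tail = (tail + n) % capacity; size -= n; }
};
```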


@@ -2,11 +2,62 @@
#include <kernel/Memory/Heap.h>
#include <kernel/Memory/PageTable.h>
+#include <BAN/Sort.h>
extern uint8_t g_kernel_end[];
namespace Kernel
{
+   struct ReservedRegion
+   {
+       paddr_t paddr;
+       uint64_t size;
+   };
+   static BAN::Vector<ReservedRegion> get_reserved_regions()
+   {
+       BAN::Vector<ReservedRegion> reserved_regions;
+       MUST(reserved_regions.reserve(2 + g_boot_info.modules.size()));
+       MUST(reserved_regions.emplace_back(0, 0x100000));
+       MUST(reserved_regions.emplace_back(g_boot_info.kernel_paddr, reinterpret_cast<size_t>(g_kernel_end - KERNEL_OFFSET)));
+       for (const auto& module : g_boot_info.modules)
+           MUST(reserved_regions.emplace_back(module.start, module.size));
+       // page align regions
+       for (auto& region : reserved_regions)
+       {
+           const auto rem = region.paddr % PAGE_SIZE;
+           region.paddr -= rem;
+           region.size += rem;
+           if (const auto rem = region.size % PAGE_SIZE)
+               region.size += PAGE_SIZE - rem;
+       }
+       // sort regions
+       BAN::sort::sort(reserved_regions.begin(), reserved_regions.end(),
+           [](const auto& lhs, const auto& rhs) -> bool {
+               if (lhs.paddr == rhs.paddr)
+                   return lhs.size < rhs.size;
+               return lhs.paddr < rhs.paddr;
+           }
+       );
+       // combine overlapping regions
+       for (size_t i = 1; i < reserved_regions.size(); i++)
+       {
+           auto& prev = reserved_regions[i - 1];
+           auto& curr = reserved_regions[i - 0];
+           if (prev.paddr > curr.paddr + curr.size || curr.paddr > prev.paddr + prev.size)
+               continue;
+           prev.size = BAN::Math::max(prev.size, curr.paddr + curr.size - prev.paddr);
+           reserved_regions.remove(i--);
+       }
+       return reserved_regions;
+   }
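The sort-then-merge step in `get_reserved_regions` above is standard interval coalescing: sort by start address, then fold every overlapping or touching pair into one region. A minimal model using `std::vector` (the kernel uses BAN containers; this sketch only mirrors the merge logic):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of the reserved-region merge: half-open intervals
// [start, start + size), sorted by start, then coalesced when they overlap
// or touch (r.start <= previous end), exactly like the kernel loop above.
struct Region {
    uint64_t start;
    uint64_t size;
};

std::vector<Region> merge_regions(std::vector<Region> regions) {
    std::sort(regions.begin(), regions.end(),
        [](const Region& a, const Region& b) {
            return a.start != b.start ? a.start < b.start : a.size < b.size;
        });
    std::vector<Region> merged;
    for (const auto& r : regions) {
        if (!merged.empty() && r.start <= merged.back().start + merged.back().size) {
            auto& prev = merged.back();
            // Extend the previous region to cover this one (it may be nested).
            prev.size = std::max(prev.size, r.start + r.size - prev.start);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```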
    static Heap* s_instance = nullptr;
    void Heap::initialize()
@@ -28,6 +79,7 @@ namespace Kernel
        if (g_boot_info.memory_map_entries.empty())
            panic("Bootloader did not provide a memory map");
+       auto reserved_regions = get_reserved_regions();
        for (const auto& entry : g_boot_info.memory_map_entries)
        {
            const char* entry_type_string = nullptr;
@@ -58,33 +110,71 @@ namespace Kernel
            if (entry.type != MemoryMapEntry::Type::Available)
                continue;
-           // FIXME: only reserve kernel area and modules, not everything from 0 -> kernel end
-           paddr_t start = entry.address;
-           if (start < (vaddr_t)g_kernel_end - KERNEL_OFFSET + g_boot_info.kernel_paddr)
-               start = (vaddr_t)g_kernel_end - KERNEL_OFFSET + g_boot_info.kernel_paddr;
-           for (const auto& module : g_boot_info.modules)
-               if (start < module.start + module.size)
-                   start = module.start + module.size;
-           if (auto rem = start % PAGE_SIZE)
-               start += PAGE_SIZE - rem;
-           paddr_t end = entry.address + entry.length;
-           if (auto rem = end % PAGE_SIZE)
-               end -= rem;
-           // Physical pages needs atleast 2 pages
-           if (end > start + PAGE_SIZE)
-               MUST(m_physical_ranges.emplace_back(start, end - start));
+           paddr_t e_start = entry.address;
+           if (auto rem = e_start % PAGE_SIZE)
+               e_start = PAGE_SIZE - rem;
+           paddr_t e_end = entry.address + entry.length;
+           if (auto rem = e_end % PAGE_SIZE)
+               e_end -= rem;
+           for (const auto& reserved_region : reserved_regions)
+           {
+               const paddr_t r_start = reserved_region.paddr;
+               const paddr_t r_end = reserved_region.paddr + reserved_region.size;
+               if (r_end < e_start)
+                   continue;
+               if (r_start > e_end)
+                   break;
+               const paddr_t end = BAN::Math::max(e_start, r_start);
+               if (e_start + 2 * PAGE_SIZE <= end)
+                   MUST(m_physical_ranges.emplace_back(e_start, end - e_start));
+               e_start = BAN::Math::max(e_start, BAN::Math::min(e_end, r_end));
+           }
+           if (e_start + 2 * PAGE_SIZE <= e_end)
+               MUST(m_physical_ranges.emplace_back(e_start, e_end - e_start));
        }
-       size_t total = 0;
-       for (auto& range : m_physical_ranges)
-       {
-           size_t bytes = range.usable_memory();
-           dprintln("RAM {8H}->{8H} ({}.{} MB)", range.start(), range.end(), bytes / (1 << 20), bytes % (1 << 20) * 1000 / (1 << 20));
+       uint64_t total_kibi_bytes = 0;
+       for (auto& range : m_physical_ranges)
+       {
+           const uint64_t kibi_bytes = range.usable_memory() / 1024;
+           dprintln("RAM {8H}->{8H} ({}.{3} MiB)", range.start(), range.end(), kibi_bytes / 1024, kibi_bytes % 1024);
+           total_kibi_bytes += kibi_bytes;
+       }
+       dprintln("Total RAM {}.{3} MiB", total_kibi_bytes / 1024, total_kibi_bytes % 1024);
+   }
+   void Heap::release_boot_modules()
+   {
+       const auto modules = BAN::move(g_boot_info.modules);
+       uint64_t kibi_bytes = 0;
+       for (const auto& module : modules)
+       {
+           vaddr_t start = module.start;
+           if (auto rem = start % PAGE_SIZE)
+               start += PAGE_SIZE - rem;
+           vaddr_t end = module.start + module.size;
+           if (auto rem = end % PAGE_SIZE)
+               end -= rem;
+           const size_t size = end - start;
+           if (size < 2 * PAGE_SIZE)
+               continue;
+           SpinLockGuard _(m_lock);
+           MUST(m_physical_ranges.emplace_back(start, size));
+           kibi_bytes += m_physical_ranges.back().usable_memory() / 1024;
+       }
+       if (kibi_bytes)
+           dprintln("Released {}.{3} MiB of RAM from boot modules", kibi_bytes / 1024, kibi_bytes % 1024);
total += bytes;
}
dprintln("Total RAM {}.{} MB", total / (1 << 20), total % (1 << 20) * 1000 / (1 << 20));
} }
paddr_t Heap::take_free_page() paddr_t Heap::take_free_page()
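Both versions above round a range's start up and its end down to page boundaries before using it. A minimal sketch of that alignment arithmetic, assuming a 4 KiB page size:

```cpp
#include <cstdint>

constexpr uint64_t PAGE_SIZE = 4096; // assumed page size

// Round start addresses up and end addresses down to page boundaries,
// mirroring the `% PAGE_SIZE` remainder handling above.
constexpr uint64_t page_align_up(uint64_t addr)
{
	if (const auto rem = addr % PAGE_SIZE)
		return addr + PAGE_SIZE - rem;
	return addr;
}

constexpr uint64_t page_align_down(uint64_t addr)
{
	return addr - addr % PAGE_SIZE;
}
```

Aligning inward like this can only shrink a range, so a range smaller than a page can collapse to zero size; the code above additionally requires at least two pages before adding a physical range.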


@@ -1,5 +1,6 @@
#include <kernel/Memory/Heap.h>
#include <kernel/Memory/MemoryBackedRegion.h>
+#include <kernel/Lock/LockGuard.h>

namespace Kernel
{
@@ -14,6 +15,9 @@ namespace Kernel
		return BAN::Error::from_errno(ENOMEM);
	auto region = BAN::UniqPtr<MemoryBackedRegion>::adopt(region_ptr);

+	const size_t page_count = (size + PAGE_SIZE - 1) / PAGE_SIZE;
+	TRY(region->m_physical_pages.resize(page_count, nullptr));
+
	TRY(region->initialize(address_range));

	return region;
@@ -28,38 +32,75 @@ namespace Kernel
{
	ASSERT(m_type == Type::PRIVATE);

-	size_t needed_pages = BAN::Math::div_round_up<size_t>(m_size, PAGE_SIZE);
-	for (size_t i = 0; i < needed_pages; i++)
-	{
-		paddr_t paddr = m_page_table.physical_address_of(m_vaddr + i * PAGE_SIZE);
-		if (paddr != 0)
-			Heap::get().release_page(paddr);
-	}
+	for (auto* page : m_physical_pages)
+		if (page && --page->ref_count == 0)
+			delete page;
+}
+
+MemoryBackedRegion::PhysicalPage::~PhysicalPage()
+{
+	Heap::get().release_page(paddr);
}

BAN::ErrorOr<bool> MemoryBackedRegion::allocate_page_containing_impl(vaddr_t address, bool wants_write)
{
	ASSERT(m_type == Type::PRIVATE);
	ASSERT(contains(address));

-	(void)wants_write;
-
-	// Check if address is already mapped
-	vaddr_t vaddr = address & PAGE_ADDR_MASK;
-	if (m_page_table.physical_address_of(vaddr) != 0)
-		return false;
-
-	// Map new physcial page to address
-	paddr_t paddr = Heap::get().take_free_page();
+	const vaddr_t vaddr = address & PAGE_ADDR_MASK;
+
+	LockGuard _(m_mutex);
+
+	auto& physical_page = m_physical_pages[(vaddr - m_vaddr) / PAGE_SIZE];
+	if (physical_page == nullptr)
+	{
+		const paddr_t paddr = Heap::get().take_free_page();
+		if (paddr == 0)
+			return BAN::Error::from_errno(ENOMEM);
+		physical_page = new PhysicalPage(paddr);
+		if (physical_page == nullptr)
+			return BAN::Error::from_errno(ENOMEM);
+		m_page_table.map_page_at(paddr, vaddr, m_flags);
+		PageTable::with_fast_page(paddr, [] {
+			memset(PageTable::fast_page_as_ptr(), 0x00, PAGE_SIZE);
+		});
+		return true;
+	}
+
+	if (auto is_only_ref = (physical_page->ref_count == 1); is_only_ref || !wants_write)
+	{
+		auto flags = m_flags;
+		if (!is_only_ref)
+			flags &= ~PageTable::ReadWrite;
+		m_page_table.map_page_at(physical_page->paddr, vaddr, flags);
+		return true;
+	}
+
+	const paddr_t paddr = Heap::get().take_free_page();
	if (paddr == 0)
		return BAN::Error::from_errno(ENOMEM);
+	auto* new_physical_page = new PhysicalPage(paddr);
+	if (new_physical_page == nullptr)
+		return BAN::Error::from_errno(ENOMEM);

	m_page_table.map_page_at(paddr, vaddr, m_flags);

-	// Zero out the new page
-	PageTable::with_fast_page(paddr, [&] {
-		memset(PageTable::fast_page_as_ptr(), 0x00, PAGE_SIZE);
+	ASSERT(&m_page_table == &PageTable::current());
+	PageTable::with_fast_page(physical_page->paddr, [vaddr] {
+		memcpy(reinterpret_cast<void*>(vaddr), PageTable::fast_page_as_ptr(), PAGE_SIZE);
	});

+	if (--physical_page->ref_count == 0)
+		delete physical_page;
+	physical_page = new_physical_page;
+
	return true;
}
@@ -67,16 +108,20 @@ namespace Kernel
{
	ASSERT(&PageTable::current() == &m_page_table);

+	LockGuard _(m_mutex);
+
	const size_t aligned_size = (m_size + PAGE_SIZE - 1) & PAGE_ADDR_MASK;
	auto result = TRY(MemoryBackedRegion::create(new_page_table, m_size, { .start = m_vaddr, .end = m_vaddr + aligned_size }, m_type, m_flags, m_status_flags));

-	for (size_t offset = 0; offset < m_size; offset += PAGE_SIZE)
+	if (writable())
+		m_page_table.remove_writable_from_range(m_vaddr, m_size);
+
+	for (size_t i = 0; i < m_physical_pages.size(); i++)
	{
-		paddr_t paddr = m_page_table.physical_address_of(m_vaddr + offset);
-		if (paddr == 0)
+		if (m_physical_pages[i] == nullptr)
			continue;
-		const size_t to_copy = BAN::Math::min<size_t>(PAGE_SIZE, m_size - offset);
-		TRY(result->copy_data_to_region(offset, (const uint8_t*)(m_vaddr + offset), to_copy));
+		result->m_physical_pages[i] = m_physical_pages[i];
+		result->m_physical_pages[i]->ref_count++;
	}

	return BAN::UniqPtr<MemoryRegion>(BAN::move(result));
@@ -87,20 +132,35 @@ namespace Kernel
	ASSERT(offset && offset < m_size);
	ASSERT(offset % PAGE_SIZE == 0);

-	auto* new_region = new MemoryBackedRegion(m_page_table, m_size - offset, m_type, m_flags, m_status_flags);
-	if (new_region == nullptr)
+	LockGuard _(m_mutex);
+
+	auto* new_region_ptr = new MemoryBackedRegion(m_page_table, m_size - offset, m_type, m_flags, m_status_flags);
+	if (new_region_ptr == nullptr)
		return BAN::Error::from_errno(ENOMEM);
+	auto new_region = BAN::UniqPtr<MemoryBackedRegion>::adopt(new_region_ptr);
	new_region->m_vaddr = m_vaddr + offset;

+	const size_t moved_pages = (m_size - offset + PAGE_SIZE - 1) / PAGE_SIZE;
+	TRY(new_region->m_physical_pages.resize(moved_pages));
+
+	const size_t remaining_pages = m_physical_pages.size() - moved_pages;
+	for (size_t i = 0; i < moved_pages; i++)
+		new_region->m_physical_pages[i] = m_physical_pages[remaining_pages + i];
+	MUST(m_physical_pages.resize(remaining_pages));
+
	m_size = offset;

-	return BAN::UniqPtr<MemoryRegion>::adopt(new_region);
+	return BAN::UniqPtr<MemoryRegion>(BAN::move(new_region));
}

BAN::ErrorOr<void> MemoryBackedRegion::copy_data_to_region(size_t offset_into_region, const uint8_t* buffer, size_t buffer_size)
{
	ASSERT(offset_into_region + buffer_size <= m_size);

+	LockGuard _(m_mutex);
+
	size_t written = 0;
	while (written < buffer_size)
	{
@@ -108,18 +168,18 @@ namespace Kernel
		vaddr_t page_offset = write_vaddr % PAGE_SIZE;
		size_t bytes = BAN::Math::min<size_t>(buffer_size - written, PAGE_SIZE - page_offset);

-		paddr_t paddr = m_page_table.physical_address_of(write_vaddr & PAGE_ADDR_MASK);
-		if (paddr == 0)
+		if (!(m_page_table.get_page_flags(write_vaddr & PAGE_ADDR_MASK) & PageTable::ReadWrite))
		{
-			if (!TRY(allocate_page_containing(write_vaddr, false)))
+			if (!TRY(allocate_page_containing(write_vaddr, true)))
			{
				dwarnln("Could not allocate page for data copying");
				return BAN::Error::from_errno(EFAULT);
			}
-			paddr = m_page_table.physical_address_of(write_vaddr & PAGE_ADDR_MASK);
-			ASSERT(paddr);
		}

+		const paddr_t paddr = m_page_table.physical_address_of(write_vaddr & PAGE_ADDR_MASK);
+		ASSERT(paddr);
+
		PageTable::with_fast_page(paddr, [&] {
			memcpy(PageTable::fast_page_as_ptr(page_offset), (void*)(buffer + written), bytes);
		});
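The reworked `allocate_page_containing_impl` and `clone` implement copy-on-write: cloning shares ref-counted physical pages, and a write to a shared page first replaces it with a private copy. A self-contained model of that idea (the types here are illustrative, not the kernel's):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One ref-counted "physical page" shared between cloned regions.
struct PhysicalPage
{
	std::vector<uint8_t> data;
	int ref_count;
};

struct Region
{
	std::vector<PhysicalPage*> pages;
};

// Clone shares every page and bumps its ref_count; nothing is copied yet.
Region clone(Region& source)
{
	Region result;
	for (auto* page : source.pages)
	{
		page->ref_count++;
		result.pages.push_back(page);
	}
	return result;
}

// A write to a shared page first replaces it with a private copy.
void write_byte(Region& region, size_t page_index, size_t offset, uint8_t value)
{
	auto*& page = region.pages[page_index];
	if (page->ref_count > 1)
	{
		auto* copy = new PhysicalPage { page->data, 1 };
		page->ref_count--;
		page = copy;
	}
	page->data[offset] = value;
}
```

After a clone, both regions see the same bytes; the first write through either region splits the shared page, which is why the kernel's `clone` also strips write permission from the source mapping.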


@@ -22,8 +22,10 @@ namespace Kernel
BAN::ErrorOr<void> MemoryRegion::initialize(AddressRange address_range)
{
-	size_t needed_pages = BAN::Math::div_round_up<size_t>(m_size, PAGE_SIZE);
-	m_vaddr = m_page_table.reserve_free_contiguous_pages(needed_pages, address_range.start);
+	if (auto rem = address_range.end % PAGE_SIZE)
+		address_range.end += PAGE_SIZE - rem;
+	const size_t needed_pages = BAN::Math::div_round_up<size_t>(m_size, PAGE_SIZE);
+	m_vaddr = m_page_table.reserve_free_contiguous_pages(needed_pages, address_range.start, address_range.end);
	if (m_vaddr == 0)
		return BAN::Error::from_errno(ENOMEM);
	if (m_vaddr + m_size > address_range.end)


@@ -10,7 +10,7 @@ namespace Kernel
static constexpr size_t bits_per_page = PAGE_SIZE * 8;

-PhysicalRange::PhysicalRange(paddr_t paddr, size_t size)
+PhysicalRange::PhysicalRange(paddr_t paddr, uint64_t size)
	: m_paddr(paddr)
	, m_page_count(size / PAGE_SIZE)
	, m_free_pages(m_page_count)


@@ -83,7 +83,7 @@ namespace Kernel
BAN::ErrorOr<BAN::UniqPtr<MemoryRegion>> SharedMemoryObject::clone(PageTable& new_page_table)
{
-	auto region = TRY(SharedMemoryObjectManager::get().map_object(m_object->key, new_page_table, { .start = vaddr(), .end = vaddr() + size() }));
+	auto region = TRY(SharedMemoryObject::create(m_object, new_page_table, { .start = vaddr(), .end = vaddr() + size() }));
	return BAN::UniqPtr<MemoryRegion>(BAN::move(region));
}
} }

View File

@@ -110,36 +110,6 @@ namespace Kernel
	return {};
}

-BAN::ErrorOr<BAN::UniqPtr<VirtualRange>> VirtualRange::clone(PageTable& page_table)
-{
-	ASSERT(&PageTable::current() == &m_page_table);
-	ASSERT(&m_page_table != &page_table);
-
-	SpinLockGuard _(m_lock);
-
-	auto result = TRY(create_to_vaddr(page_table, vaddr(), size(), m_flags, m_preallocated, m_has_guard_pages));
-
-	const size_t page_count = size() / PAGE_SIZE;
-	for (size_t i = 0; i < page_count; i++)
-	{
-		if (m_paddrs[i] == 0)
-			continue;
-		if (!result->m_preallocated)
-		{
-			result->m_paddrs[i] = Heap::get().take_free_page();
-			if (result->m_paddrs[i] == 0)
-				return BAN::Error::from_errno(ENOMEM);
-			result->m_page_table.map_page_at(result->m_paddrs[i], vaddr() + i * PAGE_SIZE, m_flags);
-		}
-		PageTable::with_fast_page(result->m_paddrs[i], [&] {
-			memcpy(PageTable::fast_page_as_ptr(), reinterpret_cast<void*>(vaddr() + i * PAGE_SIZE), PAGE_SIZE);
-		});
-	}
-
-	return result;
-}
-
BAN::ErrorOr<bool> VirtualRange::allocate_page_for_demand_paging(vaddr_t vaddr)
{
	ASSERT(contains(vaddr));


@@ -17,27 +17,7 @@ namespace Kernel
BAN::ErrorOr<BAN::UniqPtr<ARPTable>> ARPTable::create()
{
-	auto arp_table = TRY(BAN::UniqPtr<ARPTable>::create());
-	arp_table->m_thread = TRY(Thread::create_kernel(
-		[](void* arp_table_ptr)
-		{
-			auto& arp_table = *reinterpret_cast<ARPTable*>(arp_table_ptr);
-			arp_table.packet_handle_task();
-		}, arp_table.ptr()
-	));
-	TRY(Processor::scheduler().add_thread(arp_table->m_thread));
-	return arp_table;
-}
-
-ARPTable::ARPTable()
-{
-}
-
-ARPTable::~ARPTable()
-{
-	if (m_thread)
-		m_thread->add_signal(SIGKILL, {});
-	m_thread = nullptr;
+	return TRY(BAN::UniqPtr<ARPTable>::create());
}

BAN::ErrorOr<BAN::MACAddress> ARPTable::get_mac_from_ipv4(NetworkInterface& interface, BAN::IPv4Address ipv4_address)
@@ -64,7 +44,7 @@ namespace Kernel
		ipv4_address = interface.get_gateway();

	{
-		SpinLockGuard _(m_table_lock);
+		SpinLockGuard _(m_arp_table_lock);
		auto it = m_arp_table.find(ipv4_address);
		if (it != m_arp_table.end())
			return it->value;
@@ -87,7 +67,7 @@ namespace Kernel
	while (SystemTimer::get().ms_since_boot() < timeout)
	{
		{
-			SpinLockGuard _(m_table_lock);
+			SpinLockGuard _(m_arp_table_lock);
			auto it = m_arp_table.find(ipv4_address);
			if (it != m_arp_table.end())
				return it->value;
@@ -98,8 +78,16 @@ namespace Kernel
	return BAN::Error::from_errno(ETIMEDOUT);
}

-BAN::ErrorOr<void> ARPTable::handle_arp_packet(NetworkInterface& interface, const ARPPacket& packet)
+BAN::ErrorOr<void> ARPTable::handle_arp_packet(NetworkInterface& interface, BAN::ConstByteSpan buffer)
{
+	if (buffer.size() < sizeof(ARPPacket))
+	{
+		dwarnln_if(DEBUG_ARP, "Too small ARP packet");
+		return {};
+	}
+	const auto& packet = buffer.as<const ARPPacket>();
+
	if (packet.ptype != EtherType::IPv4)
	{
		dprintln("Non IPv4 arp packet?");
@@ -112,23 +100,24 @@ namespace Kernel
		{
			if (packet.tpa == interface.get_ipv4_address())
			{
-				ARPPacket arp_reply;
-				arp_reply.htype = 0x0001;
-				arp_reply.ptype = EtherType::IPv4;
-				arp_reply.hlen = 0x06;
-				arp_reply.plen = 0x04;
-				arp_reply.oper = ARPOperation::Reply;
-				arp_reply.sha = interface.get_mac_address();
-				arp_reply.spa = interface.get_ipv4_address();
-				arp_reply.tha = packet.sha;
-				arp_reply.tpa = packet.spa;
+				const ARPPacket arp_reply {
+					.htype = 0x0001,
+					.ptype = EtherType::IPv4,
+					.hlen = 0x06,
+					.plen = 0x04,
+					.oper = ARPOperation::Reply,
+					.sha = interface.get_mac_address(),
+					.spa = interface.get_ipv4_address(),
+					.tha = packet.sha,
+					.tpa = packet.spa,
+				};
				TRY(interface.send_bytes(packet.sha, EtherType::ARP, BAN::ConstByteSpan::from(arp_reply)));
			}
			break;
		}
		case ARPOperation::Reply:
		{
-			SpinLockGuard _(m_table_lock);
+			SpinLockGuard _(m_arp_table_lock);
			auto it = m_arp_table.find(packet.spa);
			if (it != m_arp_table.end())
@@ -154,48 +143,4 @@ namespace Kernel
	return {};
}

-void ARPTable::packet_handle_task()
-{
-	for (;;)
-	{
-		PendingArpPacket pending = ({
-			SpinLockGuard guard(m_pending_lock);
-			while (m_pending_packets.empty())
-			{
-				SpinLockGuardAsMutex smutex(guard);
-				m_pending_thread_blocker.block_indefinite(&smutex);
-			}
-			auto packet = m_pending_packets.front();
-			m_pending_packets.pop();
-			packet;
-		});
-
-		if (auto ret = handle_arp_packet(pending.interface, pending.packet); ret.is_error())
-			dwarnln("{}", ret.error());
-	}
-}
-
-void ARPTable::add_arp_packet(NetworkInterface& interface, BAN::ConstByteSpan buffer)
-{
-	if (buffer.size() < sizeof(ARPPacket))
-	{
-		dwarnln_if(DEBUG_ARP, "ARP packet too small");
-		return;
-	}
-	auto& arp_packet = buffer.as<const ARPPacket>();
-
-	SpinLockGuard _(m_pending_lock);
-	if (m_pending_packets.full())
-	{
-		dwarnln_if(DEBUG_ARP, "ARP packet queue full");
-		return;
-	}
-	m_pending_packets.push({ .interface = interface, .packet = arp_packet });
-	m_pending_thread_blocker.unblock();
-}

}


@@ -1,6 +1,7 @@
#include <kernel/IDT.h>
#include <kernel/InterruptController.h>
#include <kernel/IO.h>
+#include <kernel/Lock/SpinLockAsMutex.h>
#include <kernel/Memory/PageTable.h>
#include <kernel/MMIO.h>
#include <kernel/Networking/E1000/E1000.h>
@@ -57,6 +58,11 @@ namespace Kernel
E1000::~E1000()
{
+	m_thread_should_die = true;
+	m_thread_blocker.unblock();
+	while (!m_thread_is_dead)
+		Processor::yield();
}

BAN::ErrorOr<void> E1000::initialize()
@@ -84,6 +90,16 @@ namespace Kernel
		dprintln("  link speed: {} Mbps", speed);
	}

+	auto* thread = TRY(Thread::create_kernel([](void* e1000_ptr) {
+		static_cast<E1000*>(e1000_ptr)->receive_thread();
+	}, this));
+	if (auto ret = Processor::scheduler().add_thread(thread); ret.is_error())
+	{
+		delete thread;
+		return ret.release_error();
+	}
+	m_thread_is_dead = false;
+
	return {};
}
@@ -259,10 +275,8 @@ namespace Kernel
	return {};
}

-BAN::ErrorOr<void> E1000::send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan buffer)
+BAN::ErrorOr<void> E1000::send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan> payload)
{
-	ASSERT(buffer.size() + sizeof(EthernetHeader) <= E1000_TX_BUFFER_SIZE);
-
	SpinLockGuard _(m_lock);

	size_t tx_current = read32(REG_TDT) % E1000_TX_DESCRIPTOR_COUNT;
@@ -274,48 +288,75 @@ namespace Kernel
	ethernet_header.src_mac = get_mac_address();
	ethernet_header.ether_type = protocol;

-	memcpy(tx_buffer + sizeof(EthernetHeader), buffer.data(), buffer.size());
+	size_t packet_size = sizeof(EthernetHeader);
+	for (const auto& buffer : payload)
+	{
+		ASSERT(packet_size + buffer.size() < E1000_TX_BUFFER_SIZE);
+		memcpy(tx_buffer + packet_size, buffer.data(), buffer.size());
+		packet_size += buffer.size();
+	}

	auto& descriptor = reinterpret_cast<volatile e1000_tx_desc*>(m_tx_descriptor_region->vaddr())[tx_current];
-	descriptor.length = sizeof(EthernetHeader) + buffer.size();
+	descriptor.length = packet_size;
	descriptor.status = 0;
	descriptor.cmd = CMD_EOP | CMD_IFCS | CMD_RS;

-	// FIXME: there isnt really any reason to wait for transmission
	write32(REG_TDT, (tx_current + 1) % E1000_TX_DESCRIPTOR_COUNT);
	while (descriptor.status == 0)
		continue;

-	dprintln_if(DEBUG_E1000, "sent {} bytes", sizeof(EthernetHeader) + buffer.size());
+	dprintln_if(DEBUG_E1000, "sent {} bytes", packet_size);

	return {};
}

+void E1000::receive_thread()
+{
+	SpinLockGuard _(m_lock);
+	while (!m_thread_should_die)
+	{
+		for (;;)
+		{
+			const uint32_t rx_current = (read32(REG_RDT0) + 1) % E1000_RX_DESCRIPTOR_COUNT;
+			auto& descriptor = reinterpret_cast<volatile e1000_rx_desc*>(m_rx_descriptor_region->vaddr())[rx_current];
+			if (!(descriptor.status & 1))
+				break;
+			ASSERT(descriptor.length <= E1000_RX_BUFFER_SIZE);
+			dprintln_if(DEBUG_E1000, "got {} bytes", (uint16_t)descriptor.length);
+
+			m_lock.unlock(InterruptState::Enabled);
+			NetworkManager::get().on_receive(*this, BAN::ConstByteSpan {
+				reinterpret_cast<const uint8_t*>(m_rx_buffer_region->vaddr() + rx_current * E1000_RX_BUFFER_SIZE),
+				descriptor.length
+			});
+			m_lock.lock();
+
+			descriptor.status = 0;
+			write32(REG_RDT0, rx_current);
+		}
+
+		SpinLockAsMutex smutex(m_lock, InterruptState::Enabled);
+		m_thread_blocker.block_indefinite(&smutex);
+	}
+	m_thread_is_dead = true;
+}
+
void E1000::handle_irq()
{
	const uint32_t icr = read32(REG_ICR);
-	if (!(icr & (ICR_RxQ0 | ICR_RXT0)))
-		return;
	write32(REG_ICR, icr);

-	SpinLockGuard _(m_lock);
-
-	for (;;) {
-		uint32_t rx_current = (read32(REG_RDT0) + 1) % E1000_RX_DESCRIPTOR_COUNT;
-		auto& descriptor = reinterpret_cast<volatile e1000_rx_desc*>(m_rx_descriptor_region->vaddr())[rx_current];
-		if (!(descriptor.status & 1))
-			break;
-		ASSERT(descriptor.length <= E1000_RX_BUFFER_SIZE);
-		dprintln_if(DEBUG_E1000, "got {} bytes", (uint16_t)descriptor.length);
-		NetworkManager::get().on_receive(*this, BAN::ConstByteSpan {
-			reinterpret_cast<const uint8_t*>(m_rx_buffer_region->vaddr() + rx_current * E1000_RX_BUFFER_SIZE),
-			descriptor.length
-		});
-		descriptor.status = 0;
-		write32(REG_RDT0, rx_current);
+	if (icr & (ICR_RxQ0 | ICR_RXT0))
+	{
+		SpinLockGuard _(m_lock);
+		m_thread_blocker.unblock();
	}
}
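The new `send_bytes` signature takes several payload spans and copies them back to back into one TX buffer behind the Ethernet header. A standalone sketch of that scatter-gather copy, with `Span` and `gather` as hypothetical stand-ins for the kernel's `BAN::ConstByteSpan` plumbing:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Span
{
	const uint8_t* data;
	size_t size;
};

// Copy each payload span into tx_buffer directly after the header bytes,
// returning the total packet size used for the descriptor length above.
size_t gather(uint8_t* tx_buffer, size_t capacity, size_t header_size, const std::vector<Span>& payload)
{
	size_t packet_size = header_size;
	for (const auto& span : payload)
	{
		assert(packet_size + span.size < capacity);
		std::memcpy(tx_buffer + packet_size, span.data, span.size);
		packet_size += span.size;
	}
	return packet_size;
}
```

This lets callers like the IPv4 layer hand over header and payload as separate spans without first concatenating them into a temporary buffer.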


@@ -21,50 +21,26 @@ namespace Kernel
BAN::ErrorOr<BAN::UniqPtr<IPv4Layer>> IPv4Layer::create() BAN::ErrorOr<BAN::UniqPtr<IPv4Layer>> IPv4Layer::create()
{ {
auto ipv4_manager = TRY(BAN::UniqPtr<IPv4Layer>::create()); auto ipv4_manager = TRY(BAN::UniqPtr<IPv4Layer>::create());
ipv4_manager->m_thread = TRY(Thread::create_kernel(
[](void* ipv4_manager_ptr)
{
auto& ipv4_manager = *reinterpret_cast<IPv4Layer*>(ipv4_manager_ptr);
ipv4_manager.packet_handle_task();
}, ipv4_manager.ptr()
));
TRY(Processor::scheduler().add_thread(ipv4_manager->m_thread));
ipv4_manager->m_pending_packet_buffer = TRY(VirtualRange::create_to_vaddr_range(
PageTable::kernel(),
KERNEL_OFFSET,
~(uintptr_t)0,
pending_packet_buffer_size,
PageTable::Flags::ReadWrite | PageTable::Flags::Present,
true, false
));
ipv4_manager->m_arp_table = TRY(ARPTable::create()); ipv4_manager->m_arp_table = TRY(ARPTable::create());
return ipv4_manager; return ipv4_manager;
} }
IPv4Layer::IPv4Layer() static IPv4Header get_ipv4_header(size_t packet_size, BAN::IPv4Address src_ipv4, BAN::IPv4Address dst_ipv4, uint8_t protocol)
{ }
IPv4Layer::~IPv4Layer()
{ {
if (m_thread) IPv4Header header {
m_thread->add_signal(SIGKILL, {}); .version_IHL = 0x45,
m_thread = nullptr; .DSCP_ECN = 0x00,
} .total_length = packet_size,
.identification = 1,
void IPv4Layer::add_ipv4_header(BAN::ByteSpan packet, BAN::IPv4Address src_ipv4, BAN::IPv4Address dst_ipv4, uint8_t protocol) const .flags_frament = 0x00,
{ .time_to_live = 0x40,
auto& header = packet.as<IPv4Header>(); .protocol = protocol,
header.version_IHL = 0x45; .checksum = 0,
header.DSCP_ECN = 0x00; .src_address = src_ipv4,
header.total_length = packet.size(); .dst_address = dst_ipv4,
header.identification = 1; };
header.flags_frament = 0x00; header.checksum = calculate_internet_checksum(BAN::ConstByteSpan::from(header));
header.time_to_live = 0x40; return header;
header.protocol = protocol;
header.src_address = src_ipv4;
header.dst_address = dst_ipv4;
header.checksum = 0;
header.checksum = calculate_internet_checksum(BAN::ConstByteSpan::from(header), {});
} }
void IPv4Layer::unbind_socket(uint16_t port) void IPv4Layer::unbind_socket(uint16_t port)
@@ -170,8 +146,8 @@ namespace Kernel
SpinLockGuard _(m_bound_socket_lock); SpinLockGuard _(m_bound_socket_lock);
if (bind_address.sin_port == 0) if (bind_address.sin_port == 0)
bind_address.sin_port = TRY(find_free_port()); bind_address.sin_port = BAN::host_to_network_endian(TRY(find_free_port()));
const uint16_t port = BAN::host_to_network_endian(bind_address.sin_port); const uint16_t port = BAN::network_endian_to_host(bind_address.sin_port);
if (m_bound_sockets.contains(port)) if (m_bound_sockets.contains(port))
return BAN::Error::from_errno(EADDRINUSE); return BAN::Error::from_errno(EADDRINUSE);
@@ -204,7 +180,7 @@ namespace Kernel
return {}; return {};
} }
BAN::ErrorOr<size_t> IPv4Layer::sendto(NetworkSocket& socket, BAN::ConstByteSpan buffer, const sockaddr* address, socklen_t address_len) BAN::ErrorOr<size_t> IPv4Layer::sendto(NetworkSocket& socket, BAN::ConstByteSpan payload, const sockaddr* address, socklen_t address_len)
{ {
if (address->sa_family != AF_INET) if (address->sa_family != AF_INET)
return BAN::Error::from_errno(EINVAL); return BAN::Error::from_errno(EINVAL);
@@ -231,46 +207,63 @@ namespace Kernel
if (!receiver) if (!receiver)
return BAN::Error::from_errno(EADDRNOTAVAIL); return BAN::Error::from_errno(EADDRNOTAVAIL);
TRY(socket.interface(receiver->address(), receiver->address_len()));
} }
BAN::Vector<uint8_t> packet_buffer; const auto ipv4_header = get_ipv4_header(
TRY(packet_buffer.resize(buffer.size() + sizeof(IPv4Header) + socket.protocol_header_size())); sizeof(IPv4Header) + socket.protocol_header_size() + payload.size(),
auto packet = BAN::ByteSpan { packet_buffer.span() };
auto pseudo_header = PseudoHeader {
.src_ipv4 = interface->get_ipv4_address(),
.dst_ipv4 = dst_ipv4,
.protocol = socket.protocol()
};
memcpy(
packet.slice(sizeof(IPv4Header)).slice(socket.protocol_header_size()).data(),
buffer.data(),
buffer.size()
);
socket.add_protocol_header(
packet.slice(sizeof(IPv4Header)),
dst_port,
pseudo_header
);
add_ipv4_header(
packet,
interface->get_ipv4_address(), interface->get_ipv4_address(),
dst_ipv4, dst_ipv4,
socket.protocol() socket.protocol()
); );
TRY(interface->send_bytes(dst_mac, EtherType::IPv4, packet)); const auto pseudo_header = PseudoHeader {
.src_ipv4 = interface->get_ipv4_address(),
.dst_ipv4 = dst_ipv4,
.protocol = socket.protocol(),
.length = socket.protocol_header_size() + payload.size()
};
return buffer.size(); uint8_t protocol_header_buffer[32];
ASSERT(socket.protocol_header_size() < sizeof(protocol_header_buffer));
auto protocol_header = BAN::ByteSpan::from(protocol_header_buffer).slice(0, socket.protocol_header_size());
socket.get_protocol_header(protocol_header, payload, dst_port, pseudo_header);
BAN::ConstByteSpan buffers[] {
BAN::ConstByteSpan::from(ipv4_header),
protocol_header,
payload,
};
TRY(interface->send_bytes(dst_mac, EtherType::IPv4, { buffers, sizeof(buffers) / sizeof(*buffers) }));
return payload.size();
} }
BAN::ErrorOr<void> IPv4Layer::handle_ipv4_packet(NetworkInterface& interface, BAN::ByteSpan packet) BAN::ErrorOr<void> IPv4Layer::handle_ipv4_packet(NetworkInterface& interface, BAN::ConstByteSpan packet)
{ {
ASSERT(packet.size() >= sizeof(IPv4Header)); if (packet.size() < sizeof(IPv4Header))
{
dwarnln_if(DEBUG_IPV4, "Too small IPv4 packet");
return {};
}
auto& ipv4_header = packet.as<const IPv4Header>(); auto& ipv4_header = packet.as<const IPv4Header>();
auto ipv4_data = packet.slice(sizeof(IPv4Header)); if (calculate_internet_checksum(BAN::ConstByteSpan::from(ipv4_header)) != 0)
{
dwarnln_if(DEBUG_IPV4, "IPv4 packet checksum failed");
return {};
}
if (ipv4_header.total_length > packet.size() || ipv4_header.total_length > interface.payload_mtu() || ipv4_header.total_length < sizeof(IPv4Header))
{
if (ipv4_header.flags_frament & IPv4Flags::DF)
dwarnln_if(DEBUG_IPV4, "Invalid IPv4 packet");
else
dwarnln_if(DEBUG_IPV4, "IPv4 fragmentation not supported");
return {};
}
auto ipv4_data = packet.slice(0, ipv4_header.total_length).slice(sizeof(IPv4Header));
auto src_ipv4 = ipv4_header.src_address; auto src_ipv4 = ipv4_header.src_address;
@@ -293,14 +286,33 @@ namespace Kernel
{ {
auto dst_mac = TRY(m_arp_table->get_mac_from_ipv4(interface, src_ipv4)); auto dst_mac = TRY(m_arp_table->get_mac_from_ipv4(interface, src_ipv4));
auto& reply_icmp_header = ipv4_data.as<ICMPHeader>(); auto send_ipv4_header = get_ipv4_header(
reply_icmp_header.type = ICMPType::EchoReply; ipv4_data.size(),
reply_icmp_header.checksum = 0; interface.get_ipv4_address(),
reply_icmp_header.checksum = calculate_internet_checksum(ipv4_data, {}); src_ipv4,
NetworkProtocol::ICMP
);
add_ipv4_header(packet, interface.get_ipv4_address(), src_ipv4, NetworkProtocol::ICMP); ICMPHeader send_icmp_header {
.type = ICMPType::EchoReply,
.code = icmp_header.code,
.checksum = 0,
.rest = icmp_header.rest,
};
auto send_payload = ipv4_data.slice(sizeof(ICMPHeader));
const BAN::ConstByteSpan send_buffers[] {
BAN::ConstByteSpan::from(send_ipv4_header),
BAN::ConstByteSpan::from(send_icmp_header),
send_payload
};
auto send_buffers_span = BAN::Span { send_buffers, sizeof(send_buffers) / sizeof(*send_buffers) };
send_icmp_header.checksum = calculate_internet_checksum(send_buffers_span.slice(1));
TRY(interface.send_bytes(dst_mac, EtherType::IPv4, send_buffers_span));
TRY(interface.send_bytes(dst_mac, EtherType::IPv4, packet));
break; break;
} }
case ICMPType::DestinationUnreachable: case ICMPType::DestinationUnreachable:
@@ -382,80 +394,4 @@ namespace Kernel
return {}; return {};
} }
void IPv4Layer::packet_handle_task()
{
for (;;)
{
PendingIPv4Packet pending = ({
SpinLockGuard guard(m_pending_lock);
while (m_pending_packets.empty())
{
SpinLockGuardAsMutex smutex(guard);
m_pending_thread_blocker.block_indefinite(&smutex);
}
auto packet = m_pending_packets.front();
m_pending_packets.pop();
packet;
});
uint8_t* buffer_start = reinterpret_cast<uint8_t*>(m_pending_packet_buffer->vaddr());
const size_t ipv4_packet_size = reinterpret_cast<const IPv4Header*>(buffer_start)->total_length;
if (auto ret = handle_ipv4_packet(pending.interface, BAN::ByteSpan(buffer_start, ipv4_packet_size)); ret.is_error())
dwarnln_if(DEBUG_IPV4, "{}", ret.error());
SpinLockGuard _(m_pending_lock);
m_pending_total_size -= ipv4_packet_size;
if (m_pending_total_size)
memmove(buffer_start, buffer_start + ipv4_packet_size, m_pending_total_size);
}
}
void IPv4Layer::add_ipv4_packet(NetworkInterface& interface, BAN::ConstByteSpan buffer)
{
if (buffer.size() < sizeof(IPv4Header))
{
dwarnln_if(DEBUG_IPV4, "IPv4 packet too small");
return;
}
SpinLockGuard _(m_pending_lock);
if (m_pending_packets.full())
{
dwarnln_if(DEBUG_IPV4, "IPv4 packet queue full");
return;
}
if (m_pending_total_size + buffer.size() > m_pending_packet_buffer->size())
{
dwarnln_if(DEBUG_IPV4, "IPv4 packet queue full");
return;
}
auto& ipv4_header = buffer.as<const IPv4Header>();
if (calculate_internet_checksum(BAN::ConstByteSpan::from(ipv4_header), {}) != 0)
{
dwarnln_if(DEBUG_IPV4, "Invalid IPv4 packet");
return;
}
if (ipv4_header.total_length > buffer.size() || ipv4_header.total_length > interface.payload_mtu())
{
if (ipv4_header.flags_frament & IPv4Flags::DF)
dwarnln_if(DEBUG_IPV4, "Invalid IPv4 packet");
else
dwarnln_if(DEBUG_IPV4, "IPv4 fragmentation not supported");
return;
}
uint8_t* buffer_start = reinterpret_cast<uint8_t*>(m_pending_packet_buffer->vaddr());
memcpy(buffer_start + m_pending_total_size, buffer.data(), ipv4_header.total_length);
m_pending_total_size += ipv4_header.total_length;
m_pending_packets.push({ .interface = interface });
m_pending_thread_blocker.unblock();
}
} }
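The pending-packet path above appends raw IPv4 packets back-to-back into one buffer and, after the front packet is consumed, compacts the remainder with `memmove`. A minimal user-space sketch of the same technique (types and names are illustrative, not the kernel API):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <queue>
#include <vector>

// Toy version of the kernel's pending-packet buffer: packets are stored
// back-to-back; after the front packet is consumed the remaining bytes are
// shifted to the start of the buffer with memmove.
struct PacketQueue
{
    std::vector<uint8_t> buffer;
    std::queue<size_t> sizes; // size of each queued packet, in order
    size_t total = 0;

    bool push(const uint8_t* data, size_t size)
    {
        if (total + size > buffer.size())
            return false; // queue full
        std::memcpy(buffer.data() + total, data, size);
        total += size;
        sizes.push(size);
        return true;
    }

    size_t pop_front(uint8_t* out)
    {
        const size_t size = sizes.front();
        sizes.pop();
        std::memcpy(out, buffer.data(), size);
        total -= size;
        if (total) // compact remaining packets to the buffer start
            std::memmove(buffer.data(), buffer.data() + size, total);
        return size;
    }
};
```

The compaction keeps the front packet always at the buffer start, at the cost of an O(n) move per pop; a ring buffer would avoid the move.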


@@ -1,3 +1,4 @@
#include <kernel/Lock/LockGuard.h>
#include <kernel/Networking/Loopback.h>
#include <kernel/Networking/NetworkManager.h>
@@ -10,40 +11,121 @@ namespace Kernel
if (loopback_ptr == nullptr)
return BAN::Error::from_errno(ENOMEM);
auto loopback = BAN::RefPtr<LoopbackInterface>::adopt(loopback_ptr);
loopback->m_buffer = TRY(VirtualRange::create_to_vaddr_range(
PageTable::kernel(),
KERNEL_OFFSET,
BAN::numeric_limits<vaddr_t>::max(),
buffer_size, buffer_size * buffer_count,
PageTable::Flags::ReadWrite | PageTable::Flags::Present,
true, false
));
auto* thread = TRY(Thread::create_kernel([](void* loopback_ptr) {
static_cast<LoopbackInterface*>(loopback_ptr)->receive_thread();
}, loopback_ptr));
if (auto ret = Processor::scheduler().add_thread(thread); ret.is_error())
{
delete thread;
return ret.release_error();
}
loopback->m_thread_is_dead = false;
loopback->set_ipv4_address({ 127, 0, 0, 1 });
loopback->set_netmask({ 255, 0, 0, 0 });
for (size_t i = 0; i < buffer_count; i++)
{
loopback->m_descriptors[i] = {
.addr = reinterpret_cast<uint8_t*>(loopback->m_buffer->vaddr()) + i * buffer_size,
.size = 0,
.state = 0,
};
}
return loopback;
}
BAN::ErrorOr<void> LoopbackInterface::send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan buffer) LoopbackInterface::~LoopbackInterface()
{ {
ASSERT(buffer.size() + sizeof(EthernetHeader) <= buffer_size); m_thread_should_die = true;
m_thread_blocker.unblock();
SpinLockGuard _(m_buffer_lock); while (!m_thread_is_dead)
Processor::yield();
}
uint8_t* buffer_vaddr = reinterpret_cast<uint8_t*>(m_buffer->vaddr()); BAN::ErrorOr<void> LoopbackInterface::send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan> payload)
{
auto& descriptor =
[&]() -> Descriptor&
{
LockGuard _(m_buffer_lock);
for (;;)
{
auto& descriptor = m_descriptors[m_buffer_head];
if (descriptor.state == 0)
{
m_buffer_head = (m_buffer_head + 1) % buffer_count;
descriptor.state = 1;
return descriptor;
}
m_thread_blocker.block_indefinite(&m_buffer_lock);
}
}();
auto& ethernet_header = *reinterpret_cast<EthernetHeader*>(buffer_vaddr); auto& ethernet_header = *reinterpret_cast<EthernetHeader*>(descriptor.addr);
ethernet_header.dst_mac = destination; ethernet_header.dst_mac = destination;
ethernet_header.src_mac = get_mac_address(); ethernet_header.src_mac = get_mac_address();
ethernet_header.ether_type = protocol; ethernet_header.ether_type = protocol;
memcpy(buffer_vaddr + sizeof(EthernetHeader), buffer.data(), buffer.size()); size_t packet_size = sizeof(EthernetHeader);
for (const auto& buffer : payload)
{
ASSERT(packet_size + buffer.size() <= buffer_size);
memcpy(descriptor.addr + packet_size, buffer.data(), buffer.size());
packet_size += buffer.size();
}
NetworkManager::get().on_receive(*this, BAN::ConstByteSpan { LockGuard _(m_buffer_lock);
buffer_vaddr, descriptor.size = packet_size;
buffer.size() + sizeof(EthernetHeader) descriptor.state = 2;
}); m_thread_blocker.unblock();
return {}; return {};
} }
void LoopbackInterface::receive_thread()
{
LockGuard _(m_buffer_lock);
while (!m_thread_should_die)
{
for (;;)
{
auto& descriptor = m_descriptors[m_buffer_tail];
if (descriptor.state != 2)
break;
m_buffer_tail = (m_buffer_tail + 1) % buffer_count;
m_buffer_lock.unlock();
NetworkManager::get().on_receive(*this, {
descriptor.addr,
descriptor.size,
});
m_buffer_lock.lock();
descriptor.size = 0;
descriptor.state = 0;
m_thread_blocker.unblock();
}
m_thread_blocker.block_indefinite(&m_buffer_lock);
}
m_thread_is_dead = true;
}
} }
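The loopback path above hands packets between `send_bytes` and `receive_thread` through a ring of descriptors with three states: 0 = free, 1 = claimed by a sender, 2 = ready for the receive thread. A single-threaded sketch of that state machine (illustrative names, no locking shown):

```cpp
#include <cassert>
#include <cstddef>

// Descriptor states as used by the loopback ring above:
// 0 = free, 1 = claimed by a sender, 2 = ready for the receive thread.
constexpr size_t ring_size = 4;

struct Ring
{
    int state[ring_size] = {};
    size_t head = 0; // next descriptor a sender tries to claim
    size_t tail = 0; // next descriptor the receiver consumes

    // Sender side: claim a free descriptor, or fail if none.
    // (The real code blocks on a thread blocker instead of returning -1.)
    int claim()
    {
        if (state[head] != 0)
            return -1;
        int idx = head;
        state[idx] = 1;
        head = (head + 1) % ring_size;
        return idx;
    }

    void publish(int idx) { state[idx] = 2; }

    // Receiver side: consume the next ready descriptor in order.
    int consume()
    {
        if (state[tail] != 2)
            return -1;
        int idx = tail;
        state[idx] = 0;
        tail = (tail + 1) % ring_size;
        return idx;
    }
};
```

Because claim/publish and consume advance `head` and `tail` independently, a sender can fill a new descriptor while the receive thread is still draining earlier ones.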


@@ -3,15 +3,28 @@
namespace Kernel
{
uint16_t calculate_internet_checksum(BAN::ConstByteSpan packet, const PseudoHeader& pseudo_header) uint16_t calculate_internet_checksum(BAN::ConstByteSpan buffer)
{
return calculate_internet_checksum({ &buffer, 1 });
}
uint16_t calculate_internet_checksum(BAN::Span<const BAN::ConstByteSpan> buffers)
{ {
uint32_t checksum = 0;
for (size_t i = 0; i < sizeof(pseudo_header) / sizeof(uint16_t); i++)
checksum += BAN::host_to_network_endian(reinterpret_cast<const uint16_t*>(&pseudo_header)[i]); for (size_t i = 0; i < buffers.size(); i++)
for (size_t i = 0; i < packet.size() / sizeof(uint16_t); i++) {
checksum += BAN::host_to_network_endian(reinterpret_cast<const uint16_t*>(packet.data())[i]); auto buffer = buffers[i];
if (packet.size() % 2)
checksum += (uint16_t)packet[packet.size() - 1] << 8; const uint16_t* buffer_u16 = reinterpret_cast<const uint16_t*>(buffer.data());
for (size_t j = 0; j < buffer.size() / 2; j++)
checksum += BAN::host_to_network_endian(buffer_u16[j]);
if (buffer.size() % 2 == 0)
continue;
ASSERT(i == buffers.size() - 1);
checksum += buffer[buffer.size() - 1] << 8;
}
while (checksum >> 16)
checksum = (checksum >> 16) + (checksum & 0xFFFF);
return ~(uint16_t)checksum;
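The multi-span variant above is the standard RFC 1071 internet checksum: sum 16-bit big-endian words, fold the carries back in, and invert. A portable sketch of the same algorithm (a simplified model, not the kernel's `BAN` types; unlike the kernel, it pads any odd-length span rather than asserting only the last span may be odd):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// RFC 1071 internet checksum over a list of byte spans.
struct Span { const uint8_t* data; size_t size; };

uint16_t internet_checksum(const Span* spans, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
    {
        const auto& s = spans[i];
        // sum 16-bit words, interpreted big-endian (network byte order)
        for (size_t j = 0; j + 1 < s.size; j += 2)
            sum += (uint32_t(s.data[j]) << 8) | s.data[j + 1];
        if (s.size % 2)
            sum += uint32_t(s.data[s.size - 1]) << 8; // pad odd byte with zero
    }
    while (sum >> 16) // fold carries back into the low 16 bits
        sum = (sum >> 16) + (sum & 0xFFFF);
    return uint16_t(~sum);
}
```

Splitting the input across spans does not change the result as long as span boundaries fall on even offsets, which is what lets the TCP code below checksum pseudo-header, header, and payload without copying them into one buffer.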


@@ -154,18 +154,18 @@ namespace Kernel
return;
auto ethernet_header = packet.as<const EthernetHeader>();
auto packet_data = packet.slice(sizeof(EthernetHeader));
switch (ethernet_header.ether_type)
{
case EtherType::ARP:
{ if (auto ret = m_ipv4_layer->arp_table().handle_arp_packet(interface, packet_data); ret.is_error())
m_ipv4_layer->arp_table().add_arp_packet(interface, packet.slice(sizeof(EthernetHeader))); dwarnln("ARP: {}", ret.error());
break; break;
}
case EtherType::IPv4:
{ if (auto ret = m_ipv4_layer->handle_ipv4_packet(interface, packet_data); ret.is_error())
m_ipv4_layer->add_ipv4_packet(interface, packet.slice(sizeof(EthernetHeader))); dwarnln("IPv4: {}", ret.error());
break;
}
default:
dprintln_if(DEBUG_ETHERTYPE, "Unknown EtherType 0x{4H}", (uint16_t)ethernet_header.ether_type);
break;


@@ -7,6 +7,9 @@
namespace Kernel
{
// each buffer is 7440 bytes + padding = 8192
constexpr size_t s_buffer_size = 8192;
bool RTL8169::probe(PCI::Device& pci_device)
{
if (pci_device.vendor_id() != 0x10ec)
@@ -68,9 +71,28 @@ namespace Kernel
// lock config registers
m_io_bar_region->write8(RTL8169_IO_9346CR, RTL8169_9346CR_MODE_NORMAL);
auto* thread = TRY(Thread::create_kernel([](void* rtl8169_ptr) {
static_cast<RTL8169*>(rtl8169_ptr)->receive_thread();
}, this));
if (auto ret = Processor::scheduler().add_thread(thread); ret.is_error())
{
delete thread;
return ret.release_error();
}
m_thread_is_dead = false;
return {};
}
RTL8169::~RTL8169()
{
m_thread_should_die = true;
m_thread_blocker.unblock();
while (!m_thread_is_dead)
Processor::yield();
}
BAN::ErrorOr<void> RTL8169::reset()
{
m_io_bar_region->write8(RTL8169_IO_CR, RTL8169_CR_RST);
@@ -85,15 +107,12 @@ namespace Kernel
BAN::ErrorOr<void> RTL8169::initialize_rx() BAN::ErrorOr<void> RTL8169::initialize_rx()
{ {
// each buffer is 7440 bytes + padding = 8192 m_rx_buffer_region = TRY(DMARegion::create(m_rx_descriptor_count * s_buffer_size));
constexpr size_t buffer_size = 2 * PAGE_SIZE;
m_rx_buffer_region = TRY(DMARegion::create(m_rx_descriptor_count * buffer_size));
m_rx_descriptor_region = TRY(DMARegion::create(m_rx_descriptor_count * sizeof(RTL8169Descriptor)));
for (size_t i = 0; i < m_rx_descriptor_count; i++)
{
const paddr_t rx_buffer_paddr = m_rx_buffer_region->paddr() + i * buffer_size; const paddr_t rx_buffer_paddr = m_rx_buffer_region->paddr() + i * s_buffer_size;
uint32_t command = 0x1FF8 | RTL8169_DESC_CMD_OWN;
if (i == m_rx_descriptor_count - 1)
@@ -120,21 +139,17 @@ namespace Kernel
// configure max rx packet size
m_io_bar_region->write16(RTL8169_IO_RMS, RTL8169_RMS_MAX);
return {};
}
BAN::ErrorOr<void> RTL8169::initialize_tx() BAN::ErrorOr<void> RTL8169::initialize_tx()
{ {
// each buffer is 7440 bytes + padding = 8192 m_tx_buffer_region = TRY(DMARegion::create(m_tx_descriptor_count * s_buffer_size));
constexpr size_t buffer_size = 2 * PAGE_SIZE;
m_tx_buffer_region = TRY(DMARegion::create(m_tx_descriptor_count * buffer_size));
m_tx_descriptor_region = TRY(DMARegion::create(m_tx_descriptor_count * sizeof(RTL8169Descriptor)));
for (size_t i = 0; i < m_tx_descriptor_count; i++)
{
const paddr_t tx_buffer_paddr = m_tx_buffer_region->paddr() + i * buffer_size; const paddr_t tx_buffer_paddr = m_tx_buffer_region->paddr() + i * s_buffer_size;
uint32_t command = 0;
if (i == m_tx_descriptor_count - 1)
@@ -194,14 +209,8 @@ namespace Kernel
return 0;
}
BAN::ErrorOr<void> RTL8169::send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::ConstByteSpan buffer) BAN::ErrorOr<void> RTL8169::send_bytes(BAN::MACAddress destination, EtherType protocol, BAN::Span<const BAN::ConstByteSpan> payload)
{ {
constexpr size_t buffer_size = 8192;
const uint16_t packet_size = sizeof(EthernetHeader) + buffer.size();
if (packet_size > buffer_size)
return BAN::Error::from_errno(EINVAL);
if (!link_up())
return BAN::Error::from_errno(EADDRNOTAVAIL);
@@ -219,14 +228,20 @@ namespace Kernel
m_lock.unlock(state);
auto* tx_buffer = reinterpret_cast<uint8_t*>(m_tx_buffer_region->vaddr() + tx_current * buffer_size); auto* tx_buffer = reinterpret_cast<uint8_t*>(m_tx_buffer_region->vaddr() + tx_current * s_buffer_size);
// write packet
auto& ethernet_header = *reinterpret_cast<EthernetHeader*>(tx_buffer);
ethernet_header.dst_mac = destination;
ethernet_header.src_mac = get_mac_address();
ethernet_header.ether_type = protocol;
memcpy(tx_buffer + sizeof(EthernetHeader), buffer.data(), buffer.size());
size_t packet_size = sizeof(EthernetHeader);
for (const auto& buffer : payload)
{
memcpy(tx_buffer + packet_size, buffer.data(), buffer.size());
packet_size += buffer.size();
}
// give packet ownership to NIC
uint32_t command = packet_size | RTL8169_DESC_CMD_OWN | RTL8169_DESC_CMD_LS | RTL8169_DESC_CMD_FS;
@@ -240,6 +255,50 @@ namespace Kernel
return {};
}
void RTL8169::receive_thread()
{
SpinLockGuard _(m_lock);
while (!m_thread_should_die)
{
for (;;)
{
auto& descriptor = reinterpret_cast<volatile RTL8169Descriptor*>(m_rx_descriptor_region->vaddr())[m_rx_current];
if (descriptor.command & RTL8169_DESC_CMD_OWN)
break;
// packet buffer can only hold single packet, so we should not receive any multi-descriptor packets
ASSERT((descriptor.command & RTL8169_DESC_CMD_LS) && (descriptor.command & RTL8169_DESC_CMD_FS));
const uint16_t packet_length = descriptor.command & 0x3FFF;
if (packet_length > s_buffer_size)
dwarnln("Got {} bytes to {} byte buffer", packet_length, s_buffer_size);
else if (descriptor.command & (1u << 21))
; // descriptor has an error
else
{
m_lock.unlock(InterruptState::Enabled);
NetworkManager::get().on_receive(*this, BAN::ConstByteSpan {
reinterpret_cast<const uint8_t*>(m_rx_buffer_region->vaddr() + m_rx_current * s_buffer_size),
packet_length
});
m_lock.lock();
}
m_rx_current = (m_rx_current + 1) % m_rx_descriptor_count;
descriptor.command = descriptor.command | RTL8169_DESC_CMD_OWN;
}
SpinLockAsMutex smutex(m_lock, InterruptState::Enabled);
m_thread_blocker.block_indefinite(&smutex);
}
m_thread_is_dead = true;
}
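`receive_thread` above drains the rx ring by walking descriptors until it reaches one still owned by the NIC, then returns each processed descriptor by setting the OWN bit again. A toy model of that handoff (illustrative names and bit layout, not the real RTL8169 descriptor format):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of the OWN-bit handoff: the NIC clears OWN on a descriptor once
// it holds a received packet; the driver processes the packet and sets OWN
// again to hand the descriptor back to the NIC.
constexpr uint32_t CMD_OWN = 1u << 31;
constexpr size_t ring_count = 4;

// Processes completed descriptors starting at `current` (advanced in place)
// and returns how many packets were handled.
size_t drain_rx(uint32_t (&commands)[ring_count], size_t& current)
{
    size_t processed = 0;
    while (!(commands[current] & CMD_OWN))
    {
        // ... hand the packet to the network stack here ...
        commands[current] |= CMD_OWN; // give descriptor back to the NIC
        current = (current + 1) % ring_count;
        processed++;
    }
    return processed;
}
```

Moving this loop from `handle_irq` into a dedicated thread, as the diff does, keeps packet processing out of interrupt context; the IRQ handler only unblocks the thread.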
void RTL8169::handle_irq()
{
const uint16_t interrupt_status = m_io_bar_region->read16(RTL8169_IO_ISR);
@@ -251,7 +310,7 @@ namespace Kernel
dprintln("link status -> {}", m_link_up.load());
}
if (interrupt_status & RTL8169_IR_TOK) if (interrupt_status & (RTL8169_IR_TOK | RTL8169_IR_ROK))
{
SpinLockGuard _(m_lock);
m_thread_blocker.unblock();
@@ -266,38 +325,6 @@ namespace Kernel
if (interrupt_status & RTL8169_IR_FVOW)
dwarnln("Rx FIFO overflow");
// don't log TDU; it is sent after each sent packet
if (!(interrupt_status & RTL8169_IR_ROK))
return;
constexpr size_t buffer_size = 8192;
for (;;)
{
auto& descriptor = reinterpret_cast<volatile RTL8169Descriptor*>(m_rx_descriptor_region->vaddr())[m_rx_current];
if (descriptor.command & RTL8169_DESC_CMD_OWN)
break;
// packet buffer can only hold single packet, so we should not receive any multi-descriptor packets
ASSERT((descriptor.command & RTL8169_DESC_CMD_LS) && (descriptor.command & RTL8169_DESC_CMD_FS));
const uint16_t packet_length = descriptor.command & 0x3FFF;
if (packet_length > buffer_size)
dwarnln("Got {} bytes to {} byte buffer", packet_length, buffer_size);
else if (descriptor.command & (1u << 21))
; // descriptor has an error
else
{
NetworkManager::get().on_receive(*this, BAN::ConstByteSpan {
reinterpret_cast<const uint8_t*>(m_rx_buffer_region->vaddr() + m_rx_current * buffer_size),
packet_length
});
}
m_rx_current = (m_rx_current + 1) % m_rx_descriptor_count;
descriptor.command = descriptor.command | RTL8169_DESC_CMD_OWN;
}
} }
}


@@ -7,6 +7,7 @@
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
@@ -24,26 +25,19 @@ namespace Kernel
static constexpr size_t s_recv_window_buffer_size = 16 * PAGE_SIZE;
static constexpr size_t s_send_window_buffer_size = 16 * PAGE_SIZE;
// allows up to 1 MiB windows
static constexpr uint8_t s_window_shift = 4;
// https://www.rfc-editor.org/rfc/rfc1122 4.2.2.6
static constexpr uint16_t s_default_mss = 536;
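With a scale shift of 4, the 16-bit TCP window field can advertise at most 0xFFFF << 4 = 1,048,560 bytes, just under 1 MiB, which is what the "up to 1 MiB windows" comment refers to. The arithmetic, per RFC 7323 window scaling:

```cpp
#include <cassert>
#include <cstdint>

// TCP window scaling (RFC 7323): the real window in bytes is the advertised
// 16-bit window shifted left by the negotiated scale.
uint32_t scaled_window(uint16_t advertised, uint8_t shift)
{
    return uint32_t(advertised) << shift;
}
```

Conversely, when sending, the code below shifts the buffer's free space right by `scale_shift` and clamps to 0xFFFF before writing it into the header's `window_size` field.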
BAN::ErrorOr<BAN::RefPtr<TCPSocket>> TCPSocket::create(NetworkLayer& network_layer, const Info& info)
{
auto socket = TRY(BAN::RefPtr<TCPSocket>::create(network_layer, info));
socket->m_recv_window.buffer = TRY(VirtualRange::create_to_vaddr_range( socket->m_last_sent_window_size = s_recv_window_buffer_size;
PageTable::kernel(), socket->m_recv_window.buffer = TRY(ByteRingBuffer::create(s_recv_window_buffer_size));
KERNEL_OFFSET, socket->m_recv_window.scale_shift = s_window_shift;
~(vaddr_t)0, socket->m_send_window.buffer = TRY(ByteRingBuffer::create(s_send_window_buffer_size));
s_recv_window_buffer_size,
PageTable::Flags::ReadWrite | PageTable::Flags::Present,
true, false
));
socket->m_recv_window.scale_shift = PAGE_SIZE_SHIFT; // use PAGE_SIZE windows
socket->m_send_window.buffer = TRY(VirtualRange::create_to_vaddr_range(
PageTable::kernel(),
KERNEL_OFFSET,
~(vaddr_t)0,
s_send_window_buffer_size,
PageTable::Flags::ReadWrite | PageTable::Flags::Present,
true, false
));
socket->m_thread = TRY(Thread::create_kernel( socket->m_thread = TRY(Thread::create_kernel(
[](void* socket_ptr) [](void* socket_ptr)
{ {
@@ -100,12 +94,13 @@ namespace Kernel
return_inode->m_listen_parent = this; return_inode->m_listen_parent = this;
return_inode->m_connection_info.emplace(connection.target); return_inode->m_connection_info.emplace(connection.target);
return_inode->m_recv_window.start_seq = connection.target_start_seq; return_inode->m_recv_window.start_seq = connection.target_start_seq;
return_inode->m_send_window.scale_shift = connection.window_scale;
return_inode->m_send_window.mss = connection.maximum_seqment_size;
return_inode->m_next_flags = SYN | ACK; return_inode->m_next_flags = SYN | ACK;
return_inode->m_next_state = State::SynReceived; return_inode->m_next_state = State::SynReceived;
return_inode->m_mutex.unlock();
if (!return_inode->m_connection_info->has_window_scale) if (!return_inode->m_connection_info->has_window_scale)
return_inode->m_recv_window.scale_shift = 0; return_inode->m_recv_window.scale_shift = 0;
return_inode->m_mutex.unlock();
TRY(m_listen_children.emplace(listen_key, return_inode)); TRY(m_listen_children.emplace(listen_key, return_inode));
@@ -197,8 +192,7 @@ namespace Kernel
BAN::ErrorOr<size_t> TCPSocket::recvmsg_impl(msghdr& message, int flags) BAN::ErrorOr<size_t> TCPSocket::recvmsg_impl(msghdr& message, int flags)
{ {
flags &= (MSG_OOB | MSG_PEEK | MSG_WAITALL); if (flags & ~(MSG_PEEK))
if (flags != 0)
{ {
dwarnln("TODO: recvmsg with flags 0x{H}", flags); dwarnln("TODO: recvmsg with flags 0x{H}", flags);
return BAN::Error::from_errno(ENOTSUP); return BAN::Error::from_errno(ENOTSUP);
@@ -206,14 +200,14 @@ namespace Kernel
if (CMSG_FIRSTHDR(&message)) if (CMSG_FIRSTHDR(&message))
{ {
dwarnln("ignoring recvmsg control message"); dprintln_if(DEBUG_TCP, "ignoring recvmsg control message");
message.msg_controllen = 0; message.msg_controllen = 0;
} }
if (!m_has_connected) if (!m_has_connected)
return BAN::Error::from_errno(ENOTCONN); return BAN::Error::from_errno(ENOTCONN);
while (m_recv_window.data_size == 0) while (m_recv_window.buffer->empty())
{ {
if (m_state != State::Established) if (m_state != State::Established)
return return_with_maybe_zero(); return return_with_maybe_zero();
@@ -223,21 +217,34 @@ namespace Kernel
message.msg_flags = 0; message.msg_flags = 0;
size_t total_recv = 0; size_t total_recv = 0;
for (int i = 0; i < message.msg_iovlen; i++) for (int i = 0; i < message.msg_iovlen && total_recv < m_recv_window.buffer->size(); i++)
{ {
auto* recv_buffer = reinterpret_cast<uint8_t*>(m_recv_window.buffer->vaddr()); auto& iov = message.msg_iov[i];
const size_t nrecv = BAN::Math::min<size_t>(message.msg_iov[i].iov_len, m_recv_window.data_size); const size_t nrecv = BAN::Math::min(iov.iov_len, m_recv_window.buffer->size() - total_recv);
memcpy(message.msg_iov[i].iov_base, recv_buffer, nrecv); memcpy(iov.iov_base, m_recv_window.buffer->get_data().data() + total_recv, nrecv);
total_recv += nrecv; total_recv += nrecv;
m_recv_window.data_size -= nrecv; }
m_recv_window.start_seq += nrecv;
if (m_recv_window.data_size == 0)
break;
// TODO: use circular buffer to avoid this if (!(flags & MSG_PEEK))
memmove(recv_buffer, recv_buffer + nrecv, m_recv_window.data_size); {
m_recv_window.buffer->pop(total_recv);
m_recv_window.start_seq += total_recv;
}
const size_t update_window_threshold = m_recv_window.buffer->capacity() / 8;
const bool should_update_window_size =
m_last_sent_window_size != m_recv_window.buffer->capacity() && (
(m_last_sent_window_size == 0) ||
(m_recv_window.buffer->empty()) ||
(m_last_sent_window_size + update_window_threshold <= m_recv_window.buffer->free())
);
if (should_update_window_size || m_should_send_zero_window)
{
m_should_send_window_update = true;
m_thread_blocker.unblock();
} }
return total_recv; return total_recv;
@@ -245,10 +252,7 @@ namespace Kernel
BAN::ErrorOr<size_t> TCPSocket::sendmsg_impl(const msghdr& message, int flags) BAN::ErrorOr<size_t> TCPSocket::sendmsg_impl(const msghdr& message, int flags)
{ {
if (flags & MSG_NOSIGNAL) if (flags & ~(MSG_NOSIGNAL | MSG_DONTWAIT))
dwarnln("sendmsg ignoring MSG_NOSIGNAL");
flags &= (MSG_EOR | MSG_OOB /* | MSG_NOSIGNAL */);
if (flags != 0)
{ {
dwarnln("TODO: sendmsg with flags 0x{H}", flags); dwarnln("TODO: sendmsg with flags 0x{H}", flags);
return BAN::Error::from_errno(ENOTSUP); return BAN::Error::from_errno(ENOTSUP);
@@ -260,25 +264,24 @@ namespace Kernel
if (!m_has_connected) if (!m_has_connected)
return BAN::Error::from_errno(ENOTCONN); return BAN::Error::from_errno(ENOTCONN);
while (m_send_window.data_size == m_send_window.buffer->size()) while (m_send_window.buffer->full())
{ {
if (m_state != State::Established) if (m_state != State::Established)
return return_with_maybe_zero(); return return_with_maybe_zero();
if (flags & MSG_DONTWAIT)
return BAN::Error::from_errno(EAGAIN);
TRY(Thread::current().block_or_eintr_indefinite(m_thread_blocker, &m_mutex)); TRY(Thread::current().block_or_eintr_indefinite(m_thread_blocker, &m_mutex));
} }
size_t total_sent = 0; size_t total_sent = 0;
for (int i = 0; i < message.msg_iovlen; i++) for (int i = 0; i < message.msg_iovlen && !m_send_window.buffer->full(); i++)
{ {
auto* send_buffer = reinterpret_cast<uint8_t*>(m_send_window.buffer->vaddr()); const auto& iov = message.msg_iov[i];
const size_t nsend = BAN::Math::min<size_t>(message.msg_iov[i].iov_len, m_send_window.buffer->size() - m_send_window.data_size); const size_t nsend = BAN::Math::min(iov.iov_len, m_send_window.buffer->free());
memcpy(send_buffer + m_send_window.data_size, message.msg_iov[i].iov_base, nsend); m_send_window.buffer->push({ static_cast<const uint8_t*>(iov.iov_base), nsend });
total_sent += nsend; total_sent += nsend;
m_send_window.data_size += nsend;
if (m_send_window.data_size == m_send_window.buffer->size())
break;
} }
m_thread_blocker.unblock(); m_thread_blocker.unblock();
@@ -297,12 +300,99 @@ namespace Kernel
return {}; return {};
} }
BAN::ErrorOr<void> TCPSocket::getsockopt_impl(int level, int option, void* value, socklen_t* value_len)
{
int result;
switch (level)
{
case SOL_SOCKET:
switch (option)
{
case SO_KEEPALIVE:
result = m_keep_alive;
break;
case SO_ERROR:
result = 0;
break;
case SO_SNDBUF:
result = m_send_window.scaled_size();
break;
case SO_RCVBUF:
result = m_recv_window.buffer->capacity();
break;
default:
dwarnln("getsockopt(SOL_SOCKET, {})", option);
return BAN::Error::from_errno(ENOPROTOOPT);
}
break;
case IPPROTO_TCP:
switch (option)
{
case TCP_NODELAY:
result = m_no_delay;
break;
default:
dwarnln("getsockopt(IPPROTO_TCP, {})", option);
return BAN::Error::from_errno(ENOPROTOOPT);
}
break;
default:
dwarnln("getsockopt({}, {})", level, option);
return BAN::Error::from_errno(EINVAL);
}
const size_t len = BAN::Math::min<size_t>(sizeof(result), *value_len);
memcpy(value, &result, len);
*value_len = sizeof(int);
return {};
}
BAN::ErrorOr<void> TCPSocket::setsockopt_impl(int level, int option, const void* value, socklen_t value_len)
{
switch (level)
{
case SOL_SOCKET:
switch (option)
{
case SO_KEEPALIVE:
if (value_len != sizeof(int))
return BAN::Error::from_errno(EINVAL);
m_keep_alive = *static_cast<const int*>(value);
break;
default:
dwarnln("setsockopt(SOL_SOCKET, {})", option);
return BAN::Error::from_errno(ENOPROTOOPT);
}
break;
case IPPROTO_TCP:
switch (option)
{
case TCP_NODELAY:
if (value_len != sizeof(int))
return BAN::Error::from_errno(EINVAL);
m_no_delay = *static_cast<const int*>(value);
break;
default:
dwarnln("setsockopt(IPPROTO_TCP, {})", option);
return BAN::Error::from_errno(ENOPROTOOPT);
}
break;
default:
dwarnln("setsockopt({}, {})", level, option);
return BAN::Error::from_errno(EINVAL);
}
return {};
}
BAN::ErrorOr<long> TCPSocket::ioctl_impl(int request, void* argument) BAN::ErrorOr<long> TCPSocket::ioctl_impl(int request, void* argument)
{ {
switch (request) switch (request)
{ {
case FIONREAD: case FIONREAD:
*static_cast<int*>(argument) = m_recv_window.data_size; *static_cast<int*>(argument) = m_recv_window.buffer->size();
return 0; return 0;
} }
@@ -315,14 +405,14 @@ namespace Kernel
return true; return true;
if (m_state == State::Listen) if (m_state == State::Listen)
return !m_pending_connections.empty(); return !m_pending_connections.empty();
return m_recv_window.data_size > 0; return !m_recv_window.buffer->empty();
} }
bool TCPSocket::can_write_impl() const bool TCPSocket::can_write_impl() const
{ {
if (m_state != State::Established) if (m_state != State::Established)
return false; return false;
return m_send_window.data_size < m_send_window.buffer->size(); return !m_send_window.buffer->full();
} }
bool TCPSocket::has_hungup_impl() const bool TCPSocket::has_hungup_impl() const
@@ -395,7 +485,7 @@ namespace Kernel
if (header.options[i] == TCPOption::NOP) if (header.options[i] == TCPOption::NOP)
continue; continue;
if (header.options[i] == TCPOption::MaximumSeqmentSize) if (header.options[i] == TCPOption::MaximumSeqmentSize)
result.maximum_seqment_size = BAN::host_to_network_endian(*reinterpret_cast<const uint16_t*>(&header.options[i + 2])); result.maximum_seqment_size = BAN::network_endian_to_host(*reinterpret_cast<const uint16_t*>(&header.options[i + 2]));
if (header.options[i] == TCPOption::WindowScale) if (header.options[i] == TCPOption::WindowScale)
result.window_scale = header.options[i + 2]; result.window_scale = header.options[i + 2];
if (header.options[i + 1] == 0) if (header.options[i + 1] == 0)
@@ -406,22 +496,28 @@ namespace Kernel
return result; return result;
} }
void TCPSocket::add_protocol_header(BAN::ByteSpan packet, uint16_t dst_port, PseudoHeader pseudo_header) void TCPSocket::get_protocol_header(BAN::ByteSpan header_buffer, BAN::ConstByteSpan payload, uint16_t dst_port, PseudoHeader pseudo_header)
{ {
ASSERT(m_next_flags); ASSERT(m_next_flags);
ASSERT(m_mutex.locker() == Thread::current().tid()); ASSERT(m_mutex.locker() == Thread::current().tid());
ASSERT(header_buffer.size() == protocol_header_size());
auto& header = packet.as<TCPHeader>(); m_last_sent_window_size = m_should_send_zero_window ? 0 : m_recv_window.buffer->free();
memset(&header, 0, sizeof(TCPHeader));
memset(header.options, TCPOption::End, m_tcp_options_bytes); auto& header = header_buffer.as<TCPHeader>();
header = {
.src_port = bound_port(),
.dst_port = dst_port,
.seq_number = m_send_window.current_seq + m_send_window.has_ghost_byte,
.ack_number = m_recv_window.start_seq + m_recv_window.buffer->size() + m_recv_window.has_ghost_byte,
.data_offset = (sizeof(TCPHeader) + m_tcp_options_bytes) / sizeof(uint32_t),
.flags = m_next_flags,
.window_size = BAN::Math::min<size_t>(0xFFFF, m_last_sent_window_size >> m_recv_window.scale_shift),
.checksum = 0,
.urgent_pointer = 0,
};
memset(header.options, 0, m_tcp_options_bytes);
header.src_port = bound_port();
header.dst_port = dst_port;
header.seq_number = m_send_window.current_seq + m_send_window.has_ghost_byte;
header.ack_number = m_recv_window.start_seq + m_recv_window.data_size + m_recv_window.has_ghost_byte;
header.data_offset = (sizeof(TCPHeader) + m_tcp_options_bytes) / sizeof(uint32_t);
header.window_size = BAN::Math::min<size_t>(0xFFFF, m_recv_window.buffer->size() >> m_recv_window.scale_shift);
header.flags = m_next_flags;
if (header.flags & FIN) if (header.flags & FIN)
m_send_window.has_ghost_byte = true; m_send_window.has_ghost_byte = true;
m_next_flags = 0; m_next_flags = 0;
@@ -436,28 +532,43 @@ namespace Kernel
}; };
auto interface = MUST(this->interface(reinterpret_cast<const sockaddr*>(&target), sizeof(target))); auto interface = MUST(this->interface(reinterpret_cast<const sockaddr*>(&target), sizeof(target)));
add_tcp_header_option<0, TCPOption::MaximumSeqmentSize>(header, interface->payload_mtu() - m_network_layer.header_size()); add_tcp_header_option<0, TCPOption::MaximumSeqmentSize>(header, interface->payload_mtu() - m_network_layer.header_size() - protocol_header_size());
if (m_connection_info->has_window_scale) if (m_connection_info->has_window_scale)
add_tcp_header_option<4, TCPOption::WindowScale>(header, m_recv_window.scale_shift); add_tcp_header_option<4, TCPOption::WindowScale>(header, m_recv_window.scale_shift);
header.window_size = BAN::Math::min<size_t>(0xFFFF, m_recv_window.buffer->size()); header.window_size = BAN::Math::min<size_t>(0xFFFF, m_recv_window.buffer->capacity());
m_send_window.mss = 1440;
m_send_window.start_seq++; m_send_window.start_seq++;
m_send_window.current_seq = m_send_window.start_seq; m_send_window.current_seq = m_send_window.start_seq;
} }
pseudo_header.extra = packet.size(); const BAN::ConstByteSpan buffers[] {
header.checksum = calculate_internet_checksum(packet, pseudo_header); BAN::ConstByteSpan::from(pseudo_header),
header_buffer,
payload,
};
header.checksum = calculate_internet_checksum({ buffers, sizeof(buffers) / sizeof(*buffers) });
dprintln_if(DEBUG_TCP, "sending {} {8b}", (uint8_t)m_state, header.flags); dprintln_if(DEBUG_TCP, "sending {} {8b}", (uint8_t)m_state, header.flags);
dprintln_if(DEBUG_TCP, " {}", (uint32_t)header.ack_number); dprintln_if(DEBUG_TCP, " ack {}", (uint32_t)header.ack_number);
dprintln_if(DEBUG_TCP, " {}", (uint32_t)header.seq_number); dprintln_if(DEBUG_TCP, " seq {}", (uint32_t)header.seq_number);
} }
void TCPSocket::receive_packet(BAN::ConstByteSpan buffer, const sockaddr* sender, socklen_t sender_len) void TCPSocket::receive_packet(BAN::ConstByteSpan buffer, const sockaddr* sender, socklen_t sender_len)
{ {
(void)sender_len; if (m_state == State::Listen)
{
auto socket =
[&]() -> BAN::RefPtr<TCPSocket> {
LockGuard _(m_mutex);
if (auto it = m_listen_children.find(ListenKey(sender, sender_len)); it != m_listen_children.end())
return it->value;
return {};
}();
if (socket)
return socket->receive_packet(buffer, sender, sender_len);
}
{ {
uint16_t checksum = 0; uint16_t checksum = 0;
@@ -470,14 +581,17 @@ namespace Kernel
auto interface = interface_or_error.release_value(); auto interface = interface_or_error.release_value();
auto& addr_in = *reinterpret_cast<const sockaddr_in*>(sender); auto& addr_in = *reinterpret_cast<const sockaddr_in*>(sender);
checksum = calculate_internet_checksum(buffer, const PseudoHeader pseudo_header {
PseudoHeader { .src_ipv4 = BAN::IPv4Address(addr_in.sin_addr.s_addr),
.src_ipv4 = BAN::IPv4Address(addr_in.sin_addr.s_addr), .dst_ipv4 = interface->get_ipv4_address(),
.dst_ipv4 = interface->get_ipv4_address(), .protocol = NetworkProtocol::TCP,
.protocol = NetworkProtocol::TCP, .length = buffer.size(),
.extra = buffer.size() };
} const BAN::ConstByteSpan buffers[] {
); BAN::ConstByteSpan::from(pseudo_header),
buffer
};
checksum = calculate_internet_checksum({ buffers, sizeof(buffers) / sizeof(*buffers) });
} }
else else
{ {
@@ -497,11 +611,14 @@ namespace Kernel
 		const bool hungup_before = has_hungup_impl();

 		auto& header = buffer.as<const TCPHeader>();
 		dprintln_if(DEBUG_TCP, "receiving {} {8b}", (uint8_t)m_state, header.flags);
-		dprintln_if(DEBUG_TCP, "  {}", (uint32_t)header.ack_number);
-		dprintln_if(DEBUG_TCP, "  {}", (uint32_t)header.seq_number);
+		dprintln_if(DEBUG_TCP, "  ack {}", (uint32_t)header.ack_number);
+		dprintln_if(DEBUG_TCP, "  seq {}", (uint32_t)header.seq_number);

 		m_send_window.non_scaled_size = header.window_size;
+		if (m_send_window.scaled_size() == 0)
+			m_send_window.had_zero_window = true;

 		bool check_payload = false;
 		switch (m_state)
@@ -520,8 +637,7 @@ namespace Kernel
 			}
 			auto options = parse_tcp_options(header);
-			if (options.maximum_seqment_size.has_value())
-				m_send_window.mss = *options.maximum_seqment_size;
+			m_send_window.mss = options.maximum_seqment_size.value_or(s_default_mss);
 			if (options.window_scale.has_value())
 				m_send_window.scale_shift = *options.window_scale;
 			else
@@ -546,44 +662,34 @@ namespace Kernel
 			m_has_connected = true;
 			break;
 		case State::Listen:
-			if (header.flags == SYN)
-			{
-				if (m_pending_connections.size() == m_pending_connections.capacity())
-					dprintln_if(DEBUG_TCP, "No storage to store pending connection");
-				else
-				{
-					ConnectionInfo connection_info;
-					memcpy(&connection_info.address, sender, sender_len);
-					connection_info.address_len = sender_len;
-					connection_info.has_window_scale = parse_tcp_options(header).window_scale.has_value();
-					MUST(m_pending_connections.emplace(
-						connection_info,
-						header.seq_number + 1
-					));
-					epoll_notify(EPOLLIN);
-					m_thread_blocker.unblock();
-				}
-			}
+			if (header.flags != SYN)
+				dprintln_if(DEBUG_TCP, "Unexpected packet to listening socket");
+			else if (m_pending_connections.size() == m_pending_connections.capacity())
+				dprintln_if(DEBUG_TCP, "No storage to store pending connection");
 			else
 			{
-				auto it = m_listen_children.find(ListenKey(sender, sender_len));
-				if (it == m_listen_children.end())
-				{
-					dprintln_if(DEBUG_TCP, "Unexpected packet to listening socket");
-					break;
-				}
-				auto socket = it->value;
-				m_mutex.unlock();
-				socket->receive_packet(buffer, sender, sender_len);
-				m_mutex.lock();
+				const auto options = parse_tcp_options(header);
+
+				ConnectionInfo connection_info;
+				memcpy(&connection_info.address, sender, sender_len);
+				connection_info.address_len = sender_len;
+				connection_info.has_window_scale = options.window_scale.has_value();
+				MUST(m_pending_connections.emplace(
+					connection_info,
+					header.seq_number + 1,
+					options.maximum_seqment_size.value_or(s_default_mss),
+					options.window_scale.value_or(0)
+				));
+				epoll_notify(EPOLLIN);
+				m_thread_blocker.unblock();
 			}
 			return;
 		case State::Established:
 			check_payload = true;
 			if (!(header.flags & FIN))
 				break;
-			if (m_recv_window.start_seq + m_recv_window.data_size != header.seq_number)
+			if (m_recv_window.start_seq + m_recv_window.buffer->size() != header.seq_number)
 				break;
 			m_next_flags = FIN | ACK;
 			m_next_state = State::LastAck;
@@ -632,7 +738,9 @@ namespace Kernel
 			break;
 		}

-		if (header.seq_number != m_recv_window.start_seq + m_recv_window.data_size + m_recv_window.has_ghost_byte)
+		const uint32_t expected_seq = m_recv_window.start_seq + m_recv_window.buffer->size() + m_recv_window.has_ghost_byte;
+		if (header.seq_number > expected_seq)
 			dprintln_if(DEBUG_TCP, "Missing packets");
 		else if (check_payload)
 		{
@@ -643,26 +751,35 @@ namespace Kernel
 			m_send_window.current_ack = header.ack_number;

 			auto payload = buffer.slice(header.data_offset * sizeof(uint32_t));
-			if (payload.size() > 0)
+
+			if (header.seq_number < expected_seq)
 			{
-				if (m_recv_window.data_size + payload.size() > m_recv_window.buffer->size())
-					dprintln_if(DEBUG_TCP, "Cannot fit received bytes to window, waiting for retransmission");
+				const uint32_t already_received_bytes = expected_seq - header.seq_number;
+				if (already_received_bytes <= payload.size())
+					payload = payload.slice(already_received_bytes);
 				else
-				{
-					auto* buffer = reinterpret_cast<uint8_t*>(m_recv_window.buffer->vaddr());
-					memcpy(buffer + m_recv_window.data_size, payload.data(), payload.size());
-					m_recv_window.data_size += payload.size();
+					payload = {};
+			}

-					epoll_notify(EPOLLIN);
-					dprintln_if(DEBUG_TCP, "Received {} bytes", payload.size());
+			const bool can_receive_new_data = (payload.size() > 0 && !m_recv_window.buffer->full());
+			if (can_receive_new_data)
+			{
+				const size_t nrecv = BAN::Math::min(payload.size(), m_recv_window.buffer->free());
+				m_recv_window.buffer->push(payload.slice(0, nrecv));

-					if (m_next_flags == 0)
-					{
-						m_next_flags = ACK;
-						m_next_state = m_state;
-					}
-				}
+				epoll_notify(EPOLLIN);
+				dprintln_if(DEBUG_TCP, "Received {} bytes", nrecv);
+			}
+
+			// make sure zero window is reported
+			if (m_last_sent_window_size > 0 && m_recv_window.buffer->full())
+				m_should_send_zero_window = true;
+			else if (can_receive_new_data)
+			{
+				m_next_flags = ACK;
+				m_next_state = m_state;
 			}
 		}
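The hunk above trims the prefix of a retransmitted segment that was already received before pushing the remainder into the receive window. That trimming step can be sketched in isolation; the function name and std containers here are illustrative stand-ins, not the kernel's API (and wraparound-safe sequence comparison is deliberately omitted):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Drop the prefix of `payload` that was already received: `expected_seq`
// is the next sequence number the receiver wants, `seq_number` is where
// this segment actually starts. Mirrors the already_received_bytes logic
// in the diff above.
static std::vector<uint8_t> trim_already_received(uint32_t expected_seq, uint32_t seq_number, std::vector<uint8_t> payload)
{
	if (seq_number < expected_seq)
	{
		const uint32_t already_received = expected_seq - seq_number;
		if (already_received <= payload.size())
			payload.erase(payload.begin(), payload.begin() + already_received);
		else
			payload.clear(); // the whole segment is old data
	}
	return payload;
}
```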
@@ -766,21 +883,14 @@ namespace Kernel
 				continue;
 			}

-			if (m_send_window.data_size > 0 && m_send_window.current_ack - m_send_window.has_ghost_byte > m_send_window.start_seq)
+			if (m_send_window.current_ack - m_send_window.has_ghost_byte > m_send_window.start_seq)
 			{
-				uint32_t acknowledged_bytes = m_send_window.current_ack - m_send_window.start_seq - m_send_window.has_ghost_byte;
-				ASSERT(acknowledged_bytes <= m_send_window.data_size);
-				m_send_window.data_size -= acknowledged_bytes;
+				const uint32_t acknowledged_bytes = m_send_window.current_ack - m_send_window.start_seq - m_send_window.has_ghost_byte;
+				ASSERT(acknowledged_bytes <= m_send_window.buffer->size());
 				m_send_window.start_seq += acknowledged_bytes;
-				if (m_send_window.data_size > 0)
-				{
-					auto* send_buffer = reinterpret_cast<uint8_t*>(m_send_window.buffer->vaddr());
-					memmove(send_buffer, send_buffer + acknowledged_bytes, m_send_window.data_size);
-				}
 				m_send_window.sent_size -= acknowledged_bytes;
+				m_send_window.buffer->pop(acknowledged_bytes);

 				epoll_notify(EPOLLOUT);
@@ -789,26 +899,32 @@ namespace Kernel
 				continue;
 			}

-			const bool should_retransmit = m_send_window.data_size > 0 && current_ms >= m_send_window.last_send_ms + retransmit_timeout_ms;
-			if (m_send_window.data_size > m_send_window.sent_size || should_retransmit)
+			const bool should_retransmit = m_send_window.had_zero_window || (m_send_window.sent_size > 0 && current_ms >= m_send_window.last_send_ms + retransmit_timeout_ms);
+			const bool can_send_new_data = (m_send_window.buffer->size() > m_send_window.sent_size && m_send_window.sent_size < m_send_window.scaled_size());
+			if (m_send_window.scaled_size() > 0 && (should_retransmit || can_send_new_data))
 			{
+				m_send_window.had_zero_window = false;
+
 				ASSERT(m_connection_info.has_value());
 				auto* target_address = reinterpret_cast<const sockaddr*>(&m_connection_info->address);
 				auto target_address_len = m_connection_info->address_len;

-				const uint32_t send_base = should_retransmit ? 0 : m_send_window.sent_size;
-				const uint32_t total_send = BAN::Math::min<uint32_t>(m_send_window.data_size - send_base, m_send_window.scaled_size());
-				m_send_window.current_seq = m_send_window.start_seq + m_send_window.sent_size;
+				const size_t send_offset = should_retransmit ? 0 : m_send_window.sent_size;
+				const size_t total_send = BAN::Math::min<size_t>(
+					m_send_window.buffer->size() - send_offset,
+					m_send_window.scaled_size() - send_offset
+				);
+				m_send_window.current_seq = m_send_window.start_seq + send_offset;

-				auto* send_buffer = reinterpret_cast<const uint8_t*>(m_send_window.buffer->vaddr() + send_base);
-				for (uint32_t i = 0; i < total_send;)
+				for (size_t i = 0; i < total_send;)
 				{
-					const uint32_t to_send = BAN::Math::min(total_send - i, m_send_window.mss);
-					auto message = BAN::ConstByteSpan(send_buffer + i, to_send);
+					const size_t to_send = BAN::Math::min<size_t>(total_send - i, m_send_window.mss);
+					auto message = m_send_window.buffer->get_data().slice(send_offset + i, to_send);

 					m_next_flags = ACK;
 					if (auto ret = m_network_layer.sendto(*this, message, target_address, target_address_len); ret.is_error())
@@ -819,9 +935,10 @@ namespace Kernel
 					dprintln_if(DEBUG_TCP, "Sent {} bytes", to_send);

-					m_send_window.sent_size += to_send;
-					m_send_window.current_seq += to_send;
 					i += to_send;
+					m_send_window.current_seq += to_send;
+					if (send_offset + i > m_send_window.sent_size)
+						m_send_window.sent_size = send_offset + i;
 				}

 				m_send_window.last_send_ms = current_ms;
@@ -829,6 +946,30 @@ namespace Kernel
 				continue;
 			}

+			if (m_last_sent_window_size == 0)
+				m_should_send_zero_window = false;
+
+			if (m_should_send_zero_window || m_should_send_window_update)
+			{
+				ASSERT(m_connection_info.has_value());
+				auto* target_address = reinterpret_cast<const sockaddr*>(&m_connection_info->address);
+				auto target_address_len = m_connection_info->address_len;
+
+				m_next_flags = ACK;
+				if (auto ret = m_network_layer.sendto(*this, {}, target_address, target_address_len); ret.is_error())
+					dwarnln("{}", ret.error());
+
+				m_should_send_zero_window = false;
+				m_should_send_window_update = false;
+
+				if (m_last_sent_window_size == 0 && !m_recv_window.buffer->full())
+				{
+					m_next_flags = ACK;
+					if (auto ret = m_network_layer.sendto(*this, {}, target_address, target_address_len); ret.is_error())
+						dwarnln("{}", ret.error());
+				}
+			}
+
 			m_thread_blocker.unblock();
 			m_thread_blocker.block_with_wake_time_ms(current_ms + retransmit_timeout_ms, &m_mutex);
 		}
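The send loop in the hunks above carves one window's worth of data into at-most-MSS segments starting at `send_offset`. The chunking arithmetic can be sketched on its own; the helper name and the (offset, length) pair representation are ours, not the kernel's:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split `total_send` bytes starting at buffer offset `send_offset` into
// chunks of at most `mss` bytes, returned as (offset, length) pairs in
// send order; this is the shape of the segmentation for-loop above.
static std::vector<std::pair<size_t, size_t>> segment_by_mss(size_t send_offset, size_t total_send, size_t mss)
{
	std::vector<std::pair<size_t, size_t>> chunks;
	for (size_t i = 0; i < total_send;)
	{
		const size_t to_send = std::min(total_send - i, mss);
		chunks.emplace_back(send_offset + i, to_send);
		i += to_send;
	}
	return chunks;
}
```

The last chunk is simply whatever remains, so it may be shorter than the MSS.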

View File

@@ -35,13 +35,26 @@ namespace Kernel
 		m_address_len = 0;
 	}

-	void UDPSocket::add_protocol_header(BAN::ByteSpan packet, uint16_t dst_port, PseudoHeader)
+	void UDPSocket::get_protocol_header(BAN::ByteSpan header_buffer, BAN::ConstByteSpan payload, uint16_t dst_port, PseudoHeader pseudo_header)
 	{
-		auto& header = packet.as<UDPHeader>();
-		header.src_port = bound_port();
-		header.dst_port = dst_port;
-		header.length = packet.size();
-		header.checksum = 0;
+		ASSERT(header_buffer.size() == protocol_header_size());
+
+		auto& header = header_buffer.as<UDPHeader>();
+		header = {
+			.src_port = bound_port(),
+			.dst_port = dst_port,
+			.length = protocol_header_size() + payload.size(),
+			.checksum = 0,
+		};
+
+		const BAN::ConstByteSpan buffers[] {
+			BAN::ConstByteSpan::from(pseudo_header),
+			header_buffer,
+			payload,
+		};
+		header.checksum = calculate_internet_checksum({ buffers, sizeof(buffers) / sizeof(*buffers) });
+		if (header.checksum == 0)
+			header.checksum = 0xFFFF;
 	}

 	void UDPSocket::receive_packet(BAN::ConstByteSpan packet, const sockaddr* sender, socklen_t sender_len)
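Both the TCP and UDP hunks above feed a pseudo header plus header/payload spans into `calculate_internet_checksum`, and UDP replaces a computed checksum of 0 with 0xFFFF, since 0 is reserved in UDP to mean "no checksum" (RFC 768). A standalone sketch of the RFC 1071 one's-complement sum over multiple buffers; the function name and std containers are illustrative, not the kernel's types:

```cpp
#include <cstdint>
#include <vector>

// RFC 1071 one's-complement checksum folded over a list of buffers, so a
// pseudo header, a protocol header and a payload can be summed without
// first being concatenated. Byte parity carries across buffer
// boundaries, which is what makes the multi-span form equivalent.
static uint16_t internet_checksum(const std::vector<std::vector<uint8_t>>& buffers)
{
	uint32_t sum = 0;
	bool high_byte = true; // position within the current 16-bit word
	for (const auto& buffer : buffers)
	{
		for (uint8_t byte : buffer)
		{
			sum += high_byte ? (uint32_t(byte) << 8) : byte;
			high_byte = !high_byte;
		}
	}
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16); // fold the carries back in
	return uint16_t(~sum);
}
```

On the RFC 1071 worked example (bytes `00 01 f2 03 f4 f5 f6 f7`) this yields 0x220D, regardless of how the bytes are split across buffers.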
@@ -162,10 +175,7 @@ namespace Kernel
 	BAN::ErrorOr<size_t> UDPSocket::sendmsg_impl(const msghdr& message, int flags)
 	{
-		if (flags & MSG_NOSIGNAL)
-			dwarnln("sendmsg ignoring MSG_NOSIGNAL");
-		flags &= (MSG_EOR | MSG_OOB /* | MSG_NOSIGNAL */);
-		if (flags != 0)
+		if (flags & ~(MSG_NOSIGNAL | MSG_DONTWAIT))
 		{
 			dwarnln("TODO: sendmsg with flags 0x{H}", flags);
 			return BAN::Error::from_errno(ENOTSUP);
@@ -213,6 +223,65 @@ namespace Kernel
 		return TRY(m_network_layer.sendto(*this, buffer.span(), address, address_len));
 	}

+	BAN::ErrorOr<void> UDPSocket::getsockopt_impl(int level, int option, void* value, socklen_t* value_len)
+	{
+		int result;
+		switch (level)
+		{
+			case SOL_SOCKET:
+				switch (option)
+				{
+					case SO_ERROR:
+						result = 0;
+						break;
+					case SO_SNDBUF:
+						result = m_packet_buffer->size();
+						break;
+					case SO_RCVBUF:
+						result = m_packet_buffer->size();
+						break;
+					default:
+						dwarnln("getsockopt(SOL_SOCKET, {})", option);
+						return BAN::Error::from_errno(ENOPROTOOPT);
+				}
+				break;
+			case IPPROTO_UDP:
+				dwarnln("getsockopt(IPPROTO_UDP, {})", option);
+				return BAN::Error::from_errno(ENOPROTOOPT);
+			default:
+				dwarnln("getsockopt({}, {})", level, option);
+				return BAN::Error::from_errno(EINVAL);
+		}
+
+		const size_t len = BAN::Math::min<size_t>(sizeof(result), *value_len);
+		memcpy(value, &result, len);
+		*value_len = sizeof(int);
+
+		return {};
+	}
+
+	BAN::ErrorOr<void> UDPSocket::setsockopt_impl(int level, int option, const void* value, socklen_t value_len)
+	{
+		(void)value;
+		(void)value_len;
+		switch (level)
+		{
+			case SOL_SOCKET:
+				dwarnln("setsockopt(SOL_SOCKET, {})", option);
+				return BAN::Error::from_errno(ENOPROTOOPT);
+			case IPPROTO_UDP:
+				dwarnln("setsockopt(IPPROTO_UDP, {})", option);
+				return BAN::Error::from_errno(ENOPROTOOPT);
+			default:
+				dwarnln("setsockopt({}, {})", level, option);
+				return BAN::Error::from_errno(EINVAL);
+		}
+
+		return {};
+	}
+
 	BAN::ErrorOr<long> UDPSocket::ioctl_impl(int request, void* argument)
 	{
 		switch (request)

View File

@@ -25,7 +25,7 @@ namespace Kernel
 	static BAN::HashMap<BAN::RefPtr<Inode>, BAN::WeakPtr<UnixDomainSocket>, UnixSocketHash> s_bound_sockets;
 	static Mutex s_bound_socket_lock;

-	static constexpr size_t s_packet_buffer_size = 10 * PAGE_SIZE;
+	static constexpr size_t s_packet_buffer_size = 0x10000;

 	static BAN::ErrorOr<BAN::StringView> validate_sockaddr_un(const sockaddr* address, socklen_t address_len)
 	{
@@ -45,8 +45,6 @@ namespace Kernel
 		return BAN::StringView { sockaddr_un.sun_path, length };
 	}

-	// FIXME: why is this using spinlocks instead of mutexes??
 	BAN::ErrorOr<BAN::RefPtr<UnixDomainSocket>> UnixDomainSocket::create(Socket::Type socket_type, const Socket::Info& info)
 	{
 		auto socket = TRY(BAN::RefPtr<UnixDomainSocket>::create(socket_type, info));
@@ -64,6 +62,7 @@ namespace Kernel
 	UnixDomainSocket::UnixDomainSocket(Socket::Type socket_type, const Socket::Info& info)
 		: Socket(info)
 		, m_socket_type(socket_type)
+		, m_sndbuf(s_packet_buffer_size)
 	{
 		switch (socket_type)
 		{
@@ -289,28 +288,56 @@ namespace Kernel
 		case Socket::Type::SEQPACKET:
 		case Socket::Type::DGRAM:
 			return false;
+		default:
+			ASSERT_NOT_REACHED();
 		}
-		ASSERT_NOT_REACHED();
 	}

-	BAN::ErrorOr<void> UnixDomainSocket::add_packet(const msghdr& packet, PacketInfo&& packet_info)
+	BAN::ErrorOr<size_t> UnixDomainSocket::add_packet(const msghdr& packet, PacketInfo&& packet_info, bool dont_block)
 	{
 		LockGuard _(m_packet_lock);
-		while (m_packet_infos.full() || m_packet_size_total + packet_info.size > s_packet_buffer_size)
-			TRY(Thread::current().block_or_eintr_indefinite(m_packet_thread_blocker, &m_packet_lock));

-		uint8_t* packet_buffer = reinterpret_cast<uint8_t*>(m_packet_buffer->vaddr() + m_packet_size_total);
+		const auto has_space =
+			[&]() -> bool
+			{
+				if (m_packet_infos.full())
+					return false;
+				if (is_streaming())
+					return m_packet_size_total < m_packet_buffer->size();
+				return m_packet_size_total + packet_info.size <= m_packet_buffer->size();
+			};

-		size_t offset = 0;
-		for (int i = 0; i < packet.msg_iovlen; i++)
+		while (!has_space())
 		{
-			memcpy(packet_buffer + offset, packet.msg_iov[i].iov_base, packet.msg_iov[i].iov_len);
-			offset += packet.msg_iov[i].iov_len;
+			if (dont_block)
+				return BAN::Error::from_errno(EAGAIN);
+			TRY(Thread::current().block_or_eintr_indefinite(m_packet_thread_blocker, &m_packet_lock));
 		}
-		ASSERT(offset == packet_info.size);
+
+		if (auto available = m_packet_buffer->size() - m_packet_size_total; available < packet_info.size)
+		{
+			ASSERT(is_streaming());
+			packet_info.size = available;
+		}
+
+		uint8_t* packet_buffer_base_u8 = reinterpret_cast<uint8_t*>(m_packet_buffer->vaddr());
+
+		size_t bytes_copied = 0;
+		for (int i = 0; i < packet.msg_iovlen && bytes_copied < packet_info.size; i++)
+		{
+			const uint8_t* iov_base_u8 = static_cast<const uint8_t*>(packet.msg_iov[i].iov_base);
+			const size_t to_copy = BAN::Math::min(packet.msg_iov[i].iov_len, packet_info.size - bytes_copied);
+
+			const size_t copy_offset = (m_packet_buffer_tail + m_packet_size_total + bytes_copied) % m_packet_buffer->size();
+			const size_t before_wrap = BAN::Math::min(to_copy, m_packet_buffer->size() - copy_offset);
+			memcpy(packet_buffer_base_u8 + copy_offset, iov_base_u8, before_wrap);
+			if (const size_t after_wrap = to_copy - before_wrap)
+				memcpy(packet_buffer_base_u8, iov_base_u8 + before_wrap, after_wrap);
+
+			bytes_copied += to_copy;
+		}
+		ASSERT(bytes_copied == packet_info.size);

 		m_packet_size_total += packet_info.size;
 		m_packet_infos.emplace(BAN::move(packet_info));
@@ -318,7 +345,7 @@ namespace Kernel
 		epoll_notify(EPOLLIN);

-		return {};
+		return bytes_copied;
 	}
 	bool UnixDomainSocket::can_read_impl() const
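`add_packet` above now writes into a circular buffer, splitting each copy where the storage physically wraps. The wrap-around write can be sketched in isolation; the helper name is hypothetical and `std::vector` stands in for the kernel's buffer object:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Write `len` bytes at logical position tail + used of a circular
// buffer, splitting the memcpy where the storage physically ends;
// mirrors the copy_offset / before_wrap / after_wrap arithmetic above.
static void ring_write(std::vector<uint8_t>& ring, size_t tail, size_t used, const uint8_t* src, size_t len)
{
	const size_t copy_offset = (tail + used) % ring.size();
	const size_t before_wrap = std::min(len, ring.size() - copy_offset);
	std::memcpy(ring.data() + copy_offset, src, before_wrap);
	if (const size_t after_wrap = len - before_wrap)
		std::memcpy(ring.data(), src + before_wrap, after_wrap); // continue at the physical start
}
```

The read path in `recvmsg_impl` below does the mirror-image split, and advancing the tail replaces the old `memmove` compaction.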
@@ -334,24 +361,13 @@ namespace Kernel
 			return false;
 		}

-		LockGuard _(m_packet_lock);
 		return m_packet_size_total > 0;
 	}

 	bool UnixDomainSocket::can_write_impl() const
 	{
-		if (m_info.has<ConnectionInfo>())
-		{
-			auto& connection_info = m_info.get<ConnectionInfo>();
-			auto connection = connection_info.connection.lock();
-			if (!connection)
-				return false;
-			if (connection->m_packet_infos.full())
-				return false;
-			if (connection->m_packet_size_total >= s_packet_buffer_size)
-				return false;
-		}
-		return true;
+		return m_bytes_sent < m_sndbuf;
 	}
 	bool UnixDomainSocket::has_hungup_impl() const
@@ -375,6 +391,7 @@ namespace Kernel
 	}

 		LockGuard _(m_packet_lock);
+
 		while (m_packet_size_total == 0)
 		{
 			if (m_info.has<ConnectionInfo>())
@@ -395,7 +412,7 @@ namespace Kernel
 			cheader->cmsg_len = message.msg_controllen;
 		size_t cheader_len = 0;

-		uint8_t* packet_buffer = reinterpret_cast<uint8_t*>(m_packet_buffer->vaddr());
+		uint8_t* packet_buffer_base_u8 = reinterpret_cast<uint8_t*>(m_packet_buffer->vaddr());

 		message.msg_flags = 0;
@@ -469,7 +486,12 @@ namespace Kernel
 			uint8_t* iov_base = static_cast<uint8_t*>(iov.iov_base);
 			const size_t nrecv = BAN::Math::min<size_t>(iov.iov_len - iov_offset, packet_info.size - packet_received);
-			memcpy(iov_base + iov_offset, packet_buffer + packet_received, nrecv);
+
+			const size_t copy_offset = (m_packet_buffer_tail + packet_received) % m_packet_buffer->size();
+			const size_t before_wrap = BAN::Math::min<size_t>(nrecv, m_packet_buffer->size() - copy_offset);
+			memcpy(iov_base + iov_offset, packet_buffer_base_u8 + copy_offset, before_wrap);
+			if (const size_t after_wrap = nrecv - before_wrap)
+				memcpy(iov_base + iov_offset + before_wrap, packet_buffer_base_u8, after_wrap);

 			packet_received += nrecv;
@@ -490,12 +512,17 @@ namespace Kernel
 			if (packet_info.size == 0)
 				m_packet_infos.pop();

-			// FIXME: get rid of this memmove :)
-			memmove(packet_buffer, packet_buffer + to_discard, m_packet_size_total - to_discard);
+			m_packet_buffer_tail = (m_packet_buffer_tail + to_discard) % m_packet_buffer->size();
 			m_packet_size_total -= to_discard;

 			total_recv += packet_received;

+			if (auto sender = packet_info.sender.lock())
+			{
+				sender->m_bytes_sent -= to_discard;
+				sender->epoll_notify(EPOLLOUT);
+			}
+
 			// on linux ancillary data is a barrier on stream sockets, lets do the same
 			if (!is_streaming() || had_ancillary_data)
 				break;
@@ -505,17 +532,12 @@ namespace Kernel
 		m_packet_thread_blocker.unblock();
-		epoll_notify(EPOLLOUT);

 		return total_recv;
 	}

 	BAN::ErrorOr<size_t> UnixDomainSocket::sendmsg_impl(const msghdr& message, int flags)
 	{
-		if (flags & MSG_NOSIGNAL)
-			dwarnln("sendmsg ignoring MSG_NOSIGNAL");
-		flags &= (MSG_EOR | MSG_OOB /* | MSG_NOSIGNAL */);
-		if (flags != 0)
+		if (flags & ~(MSG_NOSIGNAL | MSG_DONTWAIT))
 		{
 			dwarnln("TODO: sendmsg with flags 0x{H}", flags);
 			return BAN::Error::from_errno(ENOTSUP);
@@ -529,13 +551,14 @@ namespace Kernel
 			return result;
 		}();

-		if (total_message_size > s_packet_buffer_size)
-			return BAN::Error::from_errno(ENOBUFS);
+		if (!is_streaming() && total_message_size > m_packet_buffer->size())
+			return BAN::Error::from_errno(EMSGSIZE);

 		PacketInfo packet_info {
 			.size = total_message_size,
 			.fds = {},
 			.ucred = {},
+			.sender = TRY(get_weak_ptr()),
 		};
 		for (const auto* header = CMSG_FIRSTHDR(&message); header; header = CMSG_NXTHDR(&message, header))
@@ -608,8 +631,9 @@ namespace Kernel
 			auto target = connection_info.connection.lock();
 			if (!target)
 				return BAN::Error::from_errno(ENOTCONN);
-			TRY(target->add_packet(message, BAN::move(packet_info)));
-			return total_message_size;
+			const size_t bytes_sent = TRY(target->add_packet(message, BAN::move(packet_info), flags & MSG_DONTWAIT));
+			m_bytes_sent += bytes_sent;
+			return bytes_sent;
 		}
 		else
 		{
@@ -626,7 +650,7 @@ namespace Kernel
 					Process::current().root_file().inode,
 					Process::current().credentials(),
 					absolute_path,
-					O_RDWR
+					O_WRONLY
 				)).inode;
 			}
 			else
@@ -652,9 +676,11 @@ namespace Kernel
 			if (!target)
 				return BAN::Error::from_errno(EDESTADDRREQ);
-			TRY(target->add_packet(message, BAN::move(packet_info)));
-			return total_message_size;
+			if (target->m_socket_type != m_socket_type)
+				return BAN::Error::from_errno(EPROTOTYPE);
+			const auto bytes_sent = TRY(target->add_packet(message, BAN::move(packet_info), flags & MSG_DONTWAIT));
+			m_bytes_sent += bytes_sent;
+			return bytes_sent;
 		}
 	}
@@ -678,4 +704,64 @@ namespace Kernel
 		return {};
 	}

+	BAN::ErrorOr<void> UnixDomainSocket::getsockopt_impl(int level, int option, void* value, socklen_t* value_len)
+	{
+		if (level != SOL_SOCKET)
+		{
+			dwarnln("getsockopt({}, {})", level, option);
+			return BAN::Error::from_errno(EINVAL);
+		}
+
+		int result;
+		switch (option)
+		{
+			case SO_ERROR:
+				result = 0;
+				break;
+			case SO_SNDBUF:
+				result = m_sndbuf;
+				break;
+			case SO_RCVBUF:
+				result = m_packet_buffer->size();
+				break;
+			default:
+				dwarnln("getsockopt(SOL_SOCKET, {})", option);
+				return BAN::Error::from_errno(ENOTSUP);
+		}
+
+		const size_t len = BAN::Math::min<size_t>(sizeof(result), *value_len);
+		memcpy(value, &result, len);
+		*value_len = sizeof(int);
+
+		return {};
+	}
+
+	BAN::ErrorOr<void> UnixDomainSocket::setsockopt_impl(int level, int option, const void* value, socklen_t value_len)
+	{
+		if (level != SOL_SOCKET)
+		{
+			dwarnln("setsockopt({}, {})", level, option);
+			return BAN::Error::from_errno(EINVAL);
+		}
+
+		switch (option)
+		{
+			case SO_SNDBUF:
+			{
+				if (value_len != sizeof(int))
+					return BAN::Error::from_errno(EINVAL);
+				const int new_sndbuf = *static_cast<const int*>(value);
+				if (new_sndbuf < 0)
+					return BAN::Error::from_errno(EINVAL);
+				m_sndbuf = new_sndbuf;
+				break;
+			}
+			default:
+				dwarnln("setsockopt(SOL_SOCKET, {})", option);
+				return BAN::Error::from_errno(ENOTSUP);
+		}
+
+		return {};
+	}
+
 }

View File

@@ -12,6 +12,20 @@
 namespace Kernel
 {

+	struct InodeRefPtrHash
+	{
+		BAN::hash_t operator()(const BAN::RefPtr<Inode>& inode)
+		{
+			return BAN::hash<const Inode*>()(inode.ptr());
+		}
+	};
+
+	static Mutex s_fcntl_lock_mutex;
+	static ThreadBlocker s_fcntl_lock_thread_blocker;
+	static BAN::HashMap<BAN::RefPtr<Inode>, BAN::Vector<struct flock>, InodeRefPtrHash> s_fcntl_locks;
+
+	static BAN::ErrorOr<void> fcntl_lock(BAN::RefPtr<Inode> inode, int cmd, struct flock& flock);
+
 	OpenFileDescriptorSet::OpenFileDescriptorSet(const Credentials& credentials)
 		: m_credentials(credentials)
 	{
@@ -37,17 +51,19 @@ namespace Kernel
 		close_all();

-		for (int fd = 0; fd < (int)other.m_open_files.size(); fd++)
+		for (int fd = 0; fd < static_cast<int>(other.m_open_files.size()); fd++)
 		{
 			if (other.validate_fd(fd).is_error())
 				continue;

 			auto& open_file = m_open_files[fd];
-			open_file.description = other.m_open_files[fd].description;
-			open_file.descriptor_flags = other.m_open_files[fd].descriptor_flags;
-			open_file.inode()->on_clone(open_file.status_flags());
+			open_file = other.m_open_files[fd];
+			open_file->file.inode->on_clone(open_file->status_flags);
 		}

+		for (size_t i = 0; i < m_cloexec_files.size(); i++)
+			m_cloexec_files[i] = other.m_cloexec_files[i];
+
 		return {};
 	}
@@ -67,11 +83,15 @@ namespace Kernel
 		if ((flags & O_TRUNC) && (flags & O_WRONLY) && file.inode->mode().ifreg())
 			TRY(file.inode->truncate(0));

-		LockGuard _(m_mutex);
-
 		constexpr int status_mask = O_APPEND | O_DSYNC | O_NONBLOCK | O_RSYNC | O_SYNC | O_ACCMODE;

-		int fd = TRY(get_free_fd());
-		m_open_files[fd].description = TRY(BAN::RefPtr<OpenFileDescription>::create(BAN::move(file), 0, flags & status_mask));
-		m_open_files[fd].descriptor_flags = flags & O_CLOEXEC;
+		LockGuard _(m_mutex);
+
+		const int fd = TRY(get_free_fd());
+		m_open_files[fd] = TRY(BAN::RefPtr<OpenFileDescription>::create(BAN::move(file), 0, flags & status_mask));
+		if (flags & O_CLOEXEC)
+			add_cloexec(fd);

 		return fd;
 	}
@@ -85,7 +105,7 @@ namespace Kernel
 		Socket::Domain domain;
 		Socket::Type type;
 		int status_flags;
-		int descriptor_flags;
+		bool cloexec;
 	};

 	static BAN::ErrorOr<SocketInfo> parse_socket_info(int domain, int type, int protocol)
@@ -110,11 +130,11 @@ namespace Kernel
 		}

 		info.status_flags = 0;
-		info.descriptor_flags = 0;
+		info.cloexec = false;
 		if (type & SOCK_NONBLOCK)
 			info.status_flags |= O_NONBLOCK;
 		if (type & SOCK_CLOEXEC)
-			info.descriptor_flags |= O_CLOEXEC;
+			info.cloexec = true;
 		type &= ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

 		switch (type)
@@ -157,9 +177,12 @@ namespace Kernel
 			socket_sv = "<udp socket>";

 		LockGuard _(m_mutex);
-		int fd = TRY(get_free_fd());
-		m_open_files[fd].description = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(socket, socket_sv), 0, O_RDWR | sock_info.status_flags));
-		m_open_files[fd].descriptor_flags = sock_info.descriptor_flags;
+
+		const int fd = TRY(get_free_fd());
+		m_open_files[fd] = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(socket, socket_sv), 0, O_RDWR | sock_info.status_flags));
+		if (sock_info.cloexec)
+			add_cloexec(fd);

 		return fd;
 	}
@@ -174,10 +197,14 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		TRY(get_free_fd_pair(socket_vector));

-		m_open_files[socket_vector[0]].description = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(socket1, "<socketpair>"_sv), 0, O_RDWR | sock_info.status_flags));
-		m_open_files[socket_vector[0]].descriptor_flags = sock_info.descriptor_flags;
-		m_open_files[socket_vector[1]].description = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(socket2, "<socketpair>"_sv), 0, O_RDWR | sock_info.status_flags));
-		m_open_files[socket_vector[1]].descriptor_flags = sock_info.descriptor_flags;
+		m_open_files[socket_vector[0]] = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(socket1, "<socketpair>"_sv), 0, O_RDWR | sock_info.status_flags));
+		m_open_files[socket_vector[1]] = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(socket2, "<socketpair>"_sv), 0, O_RDWR | sock_info.status_flags));
+		if (sock_info.cloexec)
+		{
+			add_cloexec(socket_vector[0]);
+			add_cloexec(socket_vector[1]);
+		}

 		return {};
 	}
@@ -188,10 +215,11 @@ namespace Kernel
 		TRY(get_free_fd_pair(fds));

 		auto pipe = TRY(Pipe::create(m_credentials));
-		m_open_files[fds[0]].description = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(pipe, "<pipe rd>"_sv), 0, O_RDONLY));
-		m_open_files[fds[0]].descriptor_flags = 0;
-		m_open_files[fds[1]].description = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(pipe, "<pipe wr>"_sv), 0, O_WRONLY));
-		m_open_files[fds[1]].descriptor_flags = 0;
+		m_open_files[fds[0]] = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(pipe, "<pipe rd>"_sv), 0, O_RDONLY));
+		m_open_files[fds[1]] = TRY(BAN::RefPtr<OpenFileDescription>::create(VirtualFileSystem::File(pipe, "<pipe wr>"_sv), 0, O_WRONLY));
+
+		ASSERT(!is_cloexec(fds[0]));
+		ASSERT(!is_cloexec(fds[1]));

 		return {};
 	}
@@ -210,15 +238,76 @@ namespace Kernel
 		(void)close(fildes2);

 		auto& open_file = m_open_files[fildes2];
-		open_file.description = m_open_files[fildes].description;
-		open_file.descriptor_flags = 0;
-		open_file.inode()->on_clone(open_file.status_flags());
+		open_file = m_open_files[fildes];
+		open_file->file.inode->on_clone(open_file->status_flags);
+
+		ASSERT(!is_cloexec(fildes2));

 		return fildes2;
 	}
 	BAN::ErrorOr<int> OpenFileDescriptorSet::fcntl(int fd, int cmd, uintptr_t extra)
 	{
+	if (cmd == F_SETLK || cmd == F_SETLKW || cmd == F_GETLK)
+	{
+		struct flock flock;
+		TRY(Process::current().read_from_user(reinterpret_cast<void*>(extra), &flock, sizeof(struct flock)));
+		flock.l_pid = Process::current().pid();
+
+		BAN::RefPtr<Inode> inode;
+		{
+			LockGuard _(m_mutex);
+			TRY(validate_fd(fd));
+			inode = m_open_files[fd]->file.inode;
+
+			switch (flock.l_whence)
+			{
+				case SEEK_SET:
+					break;
+				case SEEK_CUR:
+					if (BAN::Math::will_addition_overflow(flock.l_start, m_open_files[fd]->offset))
+						return BAN::Error::from_errno(EOVERFLOW);
+					flock.l_start += m_open_files[fd]->offset;
+					break;
+				case SEEK_END:
+					if (BAN::Math::will_addition_overflow(flock.l_start, inode->size()))
+						return BAN::Error::from_errno(EOVERFLOW);
+					flock.l_start += inode->size();
+					break;
+				default:
+					return BAN::Error::from_errno(EINVAL);
+			}
+
+			if (BAN::Math::will_addition_overflow(flock.l_start, flock.l_len))
+				return BAN::Error::from_errno(EOVERFLOW);
+			if (flock.l_len < 0)
+			{
+				flock.l_start += flock.l_len;
+				if (flock.l_len == BAN::numeric_limits<decltype(flock.l_len)>::min())
+					return BAN::Error::from_errno(EOVERFLOW);
+				flock.l_len = -flock.l_len;
+			}
+			flock.l_whence = SEEK_SET;
+		}
+
+		auto lock_ret = fcntl_lock(inode, cmd, flock);
+		if (lock_ret.is_error())
+		{
+			if (lock_ret.error().get_error_code() == ENOMEM)
+				return BAN::Error::from_errno(ENOLCK);
+			return lock_ret.release_error();
+		}
+
+		if (cmd == F_GETLK)
+			TRY(Process::current().write_to_user(reinterpret_cast<void*>(extra), &flock, sizeof(struct flock)));
+		return 0;
+	}
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
@@ -231,67 +320,28 @@ namespace Kernel
 			const int new_fd = TRY(get_free_fd());
 			auto& open_file = m_open_files[new_fd];
-			open_file.description = m_open_files[fd].description;
-			open_file.descriptor_flags = (cmd == F_DUPFD_CLOEXEC) ? O_CLOEXEC : 0;
-			open_file.inode()->on_clone(open_file.status_flags());
+			open_file = m_open_files[fd];
+			open_file->file.inode->on_clone(open_file->status_flags);
+			if (cmd == F_DUPFD_CLOEXEC)
+				add_cloexec(new_fd);
 			return new_fd;
 		}
 		case F_GETFD:
-			return m_open_files[fd].descriptor_flags;
+			return is_cloexec(fd) ? O_CLOEXEC : 0;
 		case F_SETFD:
 			if (extra & FD_CLOEXEC)
-				m_open_files[fd].descriptor_flags |= O_CLOEXEC;
+				add_cloexec(fd);
 			else
-				m_open_files[fd].descriptor_flags &= ~O_CLOEXEC;
+				remove_cloexec(fd);
 			return 0;
 		case F_GETFL:
-			return m_open_files[fd].status_flags();
+			return m_open_files[fd]->status_flags;
 		case F_SETFL:
 			extra &= O_APPEND | O_DSYNC | O_NONBLOCK | O_RSYNC | O_SYNC;
-			m_open_files[fd].status_flags() &= O_ACCMODE;
-			m_open_files[fd].status_flags() |= extra;
+			m_open_files[fd]->status_flags &= O_ACCMODE;
+			m_open_files[fd]->status_flags |= extra;
 			return 0;
-		case F_GETLK:
-		{
-			dwarnln("TODO: proper fcntl F_GETLK");
-			auto* param = reinterpret_cast<struct flock*>(extra);
-			const auto& flock = m_open_files[fd].description->flock;
-			if (flock.lockers.empty())
-				param->l_type = F_UNLCK;
-			else
-			{
-				*param = {
-					.l_type = static_cast<short>(flock.shared ? F_RDLCK : F_WRLCK),
-					.l_whence = SEEK_SET,
-					.l_start = 0,
-					.l_len = 1,
-					.l_pid = *flock.lockers.begin(),
-				};
-			}
-			return 0;
-		}
-		case F_SETLK:
-		case F_SETLKW:
-		{
-			dwarnln("TODO: proper fcntl F_SETLK(W)");
-			int op = cmd == F_SETLKW ? LOCK_NB : 0;
-			switch (reinterpret_cast<const struct flock*>(extra)->l_type)
-			{
-				case F_UNLCK: op |= LOCK_UN; break;
-				case F_RDLCK: op |= LOCK_SH; break;
-				case F_WRLCK: op |= LOCK_EX; break;
-				default:
-					return BAN::Error::from_errno(EINVAL);
-			}
-			TRY(flock(fd, op));
-			return 0;
-		}
 		default:
 			break;
 	}
@@ -313,10 +363,10 @@ namespace Kernel
 			base_offset = 0;
 			break;
 		case SEEK_CUR:
-			base_offset = m_open_files[fd].offset();
+			base_offset = m_open_files[fd]->offset;
 			break;
 		case SEEK_END:
-			base_offset = m_open_files[fd].inode()->size();
+			base_offset = m_open_files[fd]->file.inode->size();
 			break;
 		default:
 			return BAN::Error::from_errno(EINVAL);
@@ -326,7 +376,7 @@ namespace Kernel
 	if (new_offset < 0)
 		return BAN::Error::from_errno(EINVAL);
-	m_open_files[fd].offset() = new_offset;
+	m_open_files[fd]->offset = new_offset;
 	return new_offset;
 }
@@ -335,14 +385,14 @@ namespace Kernel
 {
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
-	return m_open_files[fd].offset();
+	return m_open_files[fd]->offset;
 }

 BAN::ErrorOr<void> OpenFileDescriptorSet::truncate(int fd, off_t length)
 {
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
-	return m_open_files[fd].inode()->truncate(length);
+	return m_open_files[fd]->file.inode->truncate(length);
 }

 BAN::ErrorOr<void> OpenFileDescriptorSet::close(int fd)
@@ -353,7 +403,7 @@ namespace Kernel
 	auto& open_file = m_open_files[fd];
-	if (auto& flock = open_file.description->flock; Thread::current().has_process() && flock.lockers.contains(Process::current().pid()))
+	if (auto& flock = open_file->flock; Thread::current().has_process() && flock.lockers.contains(Process::current().pid()))
 	{
 		flock.lockers.remove(Process::current().pid());
 		if (flock.lockers.empty())
@@ -363,9 +413,21 @@ namespace Kernel
 		}
 	}

-	open_file.inode()->on_close(open_file.status_flags());
-	open_file.description.clear();
-	open_file.descriptor_flags = 0;
+	{
+		LockGuard _(s_fcntl_lock_mutex);
+		if (auto it = s_fcntl_locks.find(open_file->file.inode); it != s_fcntl_locks.end())
+		{
+			auto& flocks = it->value;
+			for (size_t i = 0; i < flocks.size(); i++)
+				if (flocks[i].l_pid == Process::current().pid())
+					flocks.remove(i--);
+			s_fcntl_lock_thread_blocker.unblock();
+		}
+	}
+
+	open_file->file.inode->on_close(open_file->status_flags);
+	open_file = {};
+	remove_cloexec(fd);
 	return {};
 }
@@ -373,19 +435,171 @@ namespace Kernel
 void OpenFileDescriptorSet::close_all()
 {
 	LockGuard _(m_mutex);
-	for (int fd = 0; fd < (int)m_open_files.size(); fd++)
+	for (int fd = 0; fd < static_cast<int>(m_open_files.size()); fd++)
 		(void)close(fd);
 }

 void OpenFileDescriptorSet::close_cloexec()
 {
 	LockGuard _(m_mutex);
-	for (int fd = 0; fd < (int)m_open_files.size(); fd++)
-	{
-		if (validate_fd(fd).is_error())
-			continue;
-		if (m_open_files[fd].descriptor_flags & O_CLOEXEC)
-			(void)close(fd);
-	}
+	for (int fd = 0; fd < static_cast<int>(m_open_files.size()); fd++)
+		if (is_cloexec(fd))
+			(void)close(fd);
 }

+bool OpenFileDescriptorSet::is_cloexec(int fd)
+{
+	return m_cloexec_files[fd / 32] & (1u << (fd % 32));
+}
+
+void OpenFileDescriptorSet::add_cloexec(int fd)
+{
+	m_cloexec_files[fd / 32] |= 1u << (fd % 32);
+}
+
+void OpenFileDescriptorSet::remove_cloexec(int fd)
+{
+	m_cloexec_files[fd / 32] &= ~(1u << (fd % 32));
+}
+static BAN::ErrorOr<void> fcntl_lock(BAN::RefPtr<Inode> inode, int cmd, struct flock& flock)
+{
+	if (!inode->mode().ifreg())
+		return BAN::Error::from_errno(EINVAL);
+
+	LockGuard _(s_fcntl_lock_mutex);
+
+	static constexpr auto locks_overlap =
+		[](struct flock a, struct flock b) -> bool
+		{
+			if (a.l_len == 0 && b.l_len == 0)
+				return true;
+			if (a.l_len == 0 && a.l_start < b.l_start + b.l_len)
+				return true;
+			if (b.l_len == 0 && b.l_start < a.l_start + a.l_len)
+				return true;
+			if (a.l_start + a.l_len <= b.l_start)
+				return false;
+			if (b.l_start + b.l_len <= a.l_start)
+				return false;
+			return true;
+		};
+
+	const auto lock_preventer =
+		[flock](BAN::Span<struct flock> locks) -> struct flock*
+		{
+			for (auto& lock : locks)
+			{
+				if (lock.l_pid == flock.l_pid)
+					continue;
+				if (!locks_overlap(lock, flock))
+					continue;
+				if (lock.l_type == F_RDLCK && flock.l_type == F_RDLCK)
+					continue;
+				return &lock;
+			}
+			return nullptr;
+		};
+
+	switch (cmd)
+	{
+		case F_GETLK:
+		{
+			auto it = s_fcntl_locks.find(inode);
+			if (it == s_fcntl_locks.end())
+				flock.l_type = F_UNLCK;
+			else if (auto* preventer = lock_preventer(it->value.span()))
+				flock = *preventer;
+			else
+				flock.l_type = F_UNLCK;
+			return {};
+		}
+		case F_SETLK:
+		case F_SETLKW:
+		{
+			if (flock.l_type == F_UNLCK)
+			{
+				auto it = s_fcntl_locks.find(inode);
+				if (it == s_fcntl_locks.end())
+					return {};
+
+				auto& flocks = it->value;
+				for (size_t i = 0; i < flocks.size(); i++)
+				{
+					if (flocks[i].l_pid != flock.l_pid)
+						continue;
+					if (!locks_overlap(flocks[i], flock))
+						continue;
+
+					struct flock new_flocks[2];
+					size_t new_flock_count { 0 };
+
+					if (flocks[i].l_start < flock.l_start)
+					{
+						const off_t flock_len = flocks[i].l_len ? flocks[i].l_len : inode->size();
+						new_flocks[new_flock_count] = flock;
+						new_flocks[new_flock_count].l_len = flocks[i].l_start + flock_len - flock.l_start;
+						new_flock_count++;
+					}
+
+					if (flock.l_len == 0)
+						;
+					else if (flocks[i].l_len == 0)
+					{
+						new_flocks[new_flock_count] = flock;
+						new_flocks[new_flock_count].l_start = flock.l_start + flock.l_len;
+						new_flocks[new_flock_count].l_len = 0;
+						new_flock_count++;
+					}
+					else if (flocks[i].l_start + flocks[i].l_len > flock.l_start + flock.l_len)
+					{
+						new_flocks[new_flock_count] = flock;
+						new_flocks[new_flock_count].l_start = flock.l_start + flock.l_len;
+						new_flocks[new_flock_count].l_len = (flocks[i].l_start + flocks[i].l_len) - (flock.l_start + flock.l_len);
+						new_flock_count++;
+					}
+
+					switch (new_flock_count)
+					{
+						case 0:
+							flocks.remove(i--);
+							break;
+						case 1:
+							flocks[i] = new_flocks[0];
+							break;
+						case 2:
+							TRY(flocks.insert(i + 1, new_flocks[1]));
+							flocks[i++] = new_flocks[0];
+							break;
+					}
+				}
+
+				if (flocks.empty())
+					s_fcntl_locks.remove(it);
+				s_fcntl_lock_thread_blocker.unblock();
+				return {};
+			}
+			else for (;;)
+			{
+				auto it = s_fcntl_locks.find(inode);
+				if (it == s_fcntl_locks.end())
+					it = TRY(s_fcntl_locks.emplace(inode));
+				if (lock_preventer(it->value.span()) == nullptr)
+				{
+					TRY(it->value.push_back(flock));
+					return {};
+				}
+				if (cmd == F_SETLK)
+					return BAN::Error::from_errno(EAGAIN);
+				TRY(Thread::current().block_or_eintr_indefinite(s_fcntl_lock_thread_blocker, &s_fcntl_lock_mutex));
+			}
+		}
+		default:
+			ASSERT_NOT_REACHED();
 	}
 }
@@ -399,7 +613,7 @@ namespace Kernel
 {
 	TRY(validate_fd(fd));
-	auto& flock = m_open_files[fd].description->flock;
+	auto& flock = m_open_files[fd]->flock;
 	switch (op & ~LOCK_NB)
 	{
 		case LOCK_UN:
@@ -456,11 +670,11 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		TRY(validate_fd(fd));
 		auto& open_file = m_open_files[fd];
-		if (!(open_file.status_flags() & O_RDONLY))
+		if (!(open_file->status_flags & O_RDONLY))
 			return BAN::Error::from_errno(EBADF);
-		inode = open_file.inode();
-		is_nonblock = !!(open_file.status_flags() & O_NONBLOCK);
-		offset = open_file.offset();
+		inode = open_file->file.inode;
+		is_nonblock = !!(open_file->status_flags & O_NONBLOCK);
+		offset = open_file->offset;
 	}

 	if (inode->mode().ifsock())
@@ -496,7 +710,7 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		// NOTE: race condition with offset, its UB per POSIX
 		if (!validate_fd(fd).is_error())
-			m_open_files[fd].offset() = offset + nread;
+			m_open_files[fd]->offset = offset + nread;
 		return nread;
 	}
@@ -510,11 +724,11 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		TRY(validate_fd(fd));
 		auto& open_file = m_open_files[fd];
-		if (!(open_file.status_flags() & O_WRONLY))
+		if (!(open_file->status_flags & O_WRONLY))
 			return BAN::Error::from_errno(EBADF);
-		inode = open_file.inode();
-		is_nonblock = !!(open_file.status_flags() & O_NONBLOCK);
-		offset = (open_file.status_flags() & O_APPEND) ? inode->size() : open_file.offset();
+		inode = open_file->file.inode;
+		is_nonblock = !!(open_file->status_flags & O_NONBLOCK);
+		offset = (open_file->status_flags & O_APPEND) ? inode->size() : open_file->offset;
 	}

 	if (inode->mode().ifsock())
@@ -553,7 +767,7 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		// NOTE: race condition with offset, its UB per POSIX
 		if (!validate_fd(fd).is_error())
-			m_open_files[fd].offset() = offset + nwrite;
+			m_open_files[fd]->offset = offset + nwrite;
 		return nwrite;
 	}
@@ -566,10 +780,10 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		TRY(validate_fd(fd));
 		auto& open_file = m_open_files[fd];
-		if (!(open_file.status_flags() & O_RDONLY))
+		if (!(open_file->status_flags & O_RDONLY))
 			return BAN::Error::from_errno(EACCES);
-		inode = open_file.inode();
-		offset = open_file.offset();
+		inode = open_file->file.inode;
+		offset = open_file->offset;
 	}

 	for (;;)
@@ -584,7 +798,7 @@ namespace Kernel
 			LockGuard _(m_mutex);
 			// NOTE: race condition with offset, its UB per POSIX
 			if (!validate_fd(fd).is_error())
-				m_open_files[fd].offset() = offset;
+				m_open_files[fd]->offset = offset;
 			return ret;
 		}
 	}
@@ -598,15 +812,15 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		TRY(validate_fd(fd));
 		auto& open_file = m_open_files[fd];
-		if (!open_file.inode()->mode().ifsock())
+		if (!open_file->file.inode->mode().ifsock())
 			return BAN::Error::from_errno(ENOTSOCK);
-		inode = open_file.inode();
-		is_nonblock = !!(open_file.status_flags() & O_NONBLOCK);
+		inode = open_file->file.inode;
+		is_nonblock = !!(open_file->status_flags & O_NONBLOCK);
 	}

 	LockGuard _(inode->m_mutex);
 	if (is_nonblock && !inode->can_read())
-		return BAN::Error::from_errno(EWOULDBLOCK);
+		return BAN::Error::from_errno(EAGAIN);
 	return inode->recvmsg(message, flags);
 }
@@ -619,28 +833,29 @@ namespace Kernel
 		LockGuard _(m_mutex);
 		TRY(validate_fd(fd));
 		auto& open_file = m_open_files[fd];
-		if (!open_file.inode()->mode().ifsock())
+		if (!open_file->file.inode->mode().ifsock())
 			return BAN::Error::from_errno(ENOTSOCK);
-		inode = open_file.inode();
-		is_nonblock = !!(open_file.status_flags() & O_NONBLOCK);
+		inode = open_file->file.inode;
+		is_nonblock = !!(open_file->status_flags & O_NONBLOCK);
 	}

 	LockGuard _(inode->m_mutex);
 	if (inode->has_hungup())
 	{
-		Thread::current().add_signal(SIGPIPE, {});
+		if (!(flags & MSG_NOSIGNAL))
+			Thread::current().add_signal(SIGPIPE, {});
 		return BAN::Error::from_errno(EPIPE);
 	}
 	if (is_nonblock && !inode->can_write())
-		return BAN::Error::from_errno(EWOULDBLOCK);
-	return inode->sendmsg(message, flags);
+		return BAN::Error::from_errno(EAGAIN);
+	return inode->sendmsg(message, flags | (is_nonblock ? MSG_DONTWAIT : 0));
 }

 BAN::ErrorOr<VirtualFileSystem::File> OpenFileDescriptorSet::file_of(int fd) const
 {
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
-	return TRY(m_open_files[fd].description->file.clone());
+	return TRY(m_open_files[fd]->file.clone());
 }

 BAN::ErrorOr<BAN::String> OpenFileDescriptorSet::path_of(int fd) const
@@ -648,7 +863,7 @@ namespace Kernel
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
 	BAN::String path;
-	TRY(path.append(m_open_files[fd].path()));
+	TRY(path.append(m_open_files[fd]->file.canonical_path));
 	return path;
 }
@@ -656,14 +871,14 @@ namespace Kernel
 {
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
-	return m_open_files[fd].inode();
+	return m_open_files[fd]->file.inode;
 }

 BAN::ErrorOr<int> OpenFileDescriptorSet::status_flags_of(int fd) const
 {
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
-	return m_open_files[fd].status_flags();
+	return m_open_files[fd]->status_flags;
 }

 BAN::ErrorOr<void> OpenFileDescriptorSet::validate_fd(int fd) const
@@ -671,7 +886,7 @@ namespace Kernel
 	LockGuard _(m_mutex);
 	if (fd < 0 || fd >= (int)m_open_files.size())
 		return BAN::Error::from_errno(EBADF);
-	if (!m_open_files[fd].description)
+	if (!m_open_files[fd])
 		return BAN::Error::from_errno(EBADF);
 	return {};
 }
@@ -680,7 +895,7 @@ namespace Kernel
 {
 	LockGuard _(m_mutex);
 	for (int fd = 0; fd < (int)m_open_files.size(); fd++)
-		if (!m_open_files[fd].description)
+		if (!m_open_files[fd])
 			return fd;
 	return BAN::Error::from_errno(EMFILE);
 }
@@ -691,7 +906,7 @@ namespace Kernel
 	size_t found = 0;
 	for (int fd = 0; fd < (int)m_open_files.size(); fd++)
 	{
-		if (!m_open_files[fd].description)
+		if (!m_open_files[fd])
 			fds[found++] = fd;
 		if (found == 2)
 			return {};
@@ -739,7 +954,7 @@ namespace Kernel
 {
 	LockGuard _(m_mutex);
 	TRY(validate_fd(fd));
-	return FDWrapper { m_open_files[fd].description };
+	return FDWrapper { m_open_files[fd] };
 }

 size_t OpenFileDescriptorSet::open_all_fd_wrappers(BAN::Span<FDWrapper> fd_wrappers)
@@ -753,8 +968,7 @@ namespace Kernel
 			return i;
 		const int fd = fd_or_error.release_value();
-		m_open_files[fd].description = BAN::move(fd_wrappers[i].m_description);
-		m_open_files[fd].descriptor_flags = 0;
+		m_open_files[fd] = BAN::move(fd_wrappers[i].m_description);
 		fd_wrappers[i].m_fd = fd;
 	}


@@ -5,6 +5,7 @@
 #include <kernel/ELF.h>
 #include <kernel/Epoll.h>
 #include <kernel/FS/DevFS/FileSystem.h>
+#include <kernel/FS/EventFD.h>
 #include <kernel/FS/ProcFS/FileSystem.h>
 #include <kernel/FS/VirtualFileSystem.h>
 #include <kernel/IDT.h>
@@ -28,6 +29,7 @@
 #include <pthread.h>
 #include <stdio.h>
 #include <sys/banan-os.h>
+#include <sys/eventfd.h>
 #include <sys/futex.h>
 #include <sys/sysmacros.h>
 #include <sys/wait.h>
@@ -301,6 +303,8 @@ namespace Kernel
 void Process::exit(int status, int signal)
 {
+	ASSERT(Processor::get_interrupt_state() == InterruptState::Enabled);
+
 	bool expected = false;
 	if (!m_is_exiting.compare_exchange(expected, true))
 	{
@@ -308,9 +312,6 @@ namespace Kernel
 		ASSERT_NOT_REACHED();
 	}

-	const auto state = Processor::get_interrupt_state();
-	Processor::set_interrupt_state(InterruptState::Enabled);
-
 	if (m_parent)
 	{
 		Process* parent_process = nullptr;
@@ -367,8 +368,6 @@ namespace Kernel
 	while (m_threads.size() > 1)
 		Processor::yield();

-	Processor::set_interrupt_state(state);
-
 	Thread::current().on_exit();
 	ASSERT_NOT_REACHED();
@@ -379,7 +378,7 @@ namespace Kernel
 	const auto [master_addr, master_size] = master_tls;
 	ASSERT(master_size % alignof(uthread) == 0);

-	const size_t tls_size = master_size + PAGE_SIZE;
+	const size_t tls_size = master_size + sizeof(uthread);

 	auto region = TRY(MemoryBackedRegion::create(
 		page_table,
@@ -408,28 +407,26 @@ namespace Kernel
 		bytes_copied += to_copy;
 	}

-	const uthread uthread {
+	auto uthread = TRY(BAN::UniqPtr<struct uthread>::create());
+	*uthread = {
 		.self = reinterpret_cast<struct uthread*>(region->vaddr() + master_size),
 		.master_tls_addr = reinterpret_cast<void*>(master_addr),
 		.master_tls_size = master_size,
+		.master_tls_module_count = 1,
+		.dynamic_tls = nullptr,
 		.cleanup_stack = nullptr,
 		.id = 0,
 		.errno_ = 0,
 		.cancel_type = 0,
 		.cancel_state = 0,
 		.canceled = 0,
-		.dtv = { 0, region->vaddr() }
 	};

+	const uintptr_t dtv[2] { 1, region->vaddr() };
+
 	TRY(region->copy_data_to_region(
 		master_size,
-		reinterpret_cast<const uint8_t*>(&uthread),
-		sizeof(uthread)
+		reinterpret_cast<const uint8_t*>(uthread.ptr()),
+		sizeof(struct uthread)
+	));
+	TRY(region->copy_data_to_region(
+		master_size + sizeof(uthread),
+		reinterpret_cast<const uint8_t*>(&dtv),
+		sizeof(dtv)
 	));

 	TLSResult result;
@@ -1271,9 +1268,8 @@ namespace Kernel
 		return 0;
 	}

-	// FIXME: buffer_region can be null as stack is not MemoryRegion
 	auto* buffer_region = TRY(validate_and_pin_pointer_access(buffer, count, true));
-	BAN::ScopeGuard _([buffer_region] { if (buffer_region) buffer_region->unpin(); });
+	BAN::ScopeGuard _([buffer_region] { buffer_region->unpin(); });
 	return TRY(m_open_file_descriptors.read(fd, BAN::ByteSpan(static_cast<uint8_t*>(buffer), count)));
 }
@@ -1285,9 +1281,8 @@ namespace Kernel
 		return 0;
 	}

-	// FIXME: buffer_region can be null as stack is not MemoryRegion
 	auto* buffer_region = TRY(validate_and_pin_pointer_access(buffer, count, false));
-	BAN::ScopeGuard _([buffer_region] { if (buffer_region) buffer_region->unpin(); });
+	BAN::ScopeGuard _([buffer_region] { buffer_region->unpin(); });
 	return TRY(m_open_file_descriptors.write(fd, BAN::ConstByteSpan(static_cast<const uint8_t*>(buffer), count)));
 }
@@ -1465,7 +1460,7 @@ namespace Kernel
 	auto inode = TRY(m_open_file_descriptors.inode_of(fd));

 	auto* buffer_region = TRY(validate_and_pin_pointer_access(buffer, count, true));
-	BAN::ScopeGuard _([buffer_region] { if (buffer_region) buffer_region->unpin(); });
+	BAN::ScopeGuard _([buffer_region] { buffer_region->unpin(); });
 	return TRY(inode->read(offset, { reinterpret_cast<uint8_t*>(buffer), count }));
 }
@@ -1475,7 +1470,7 @@ namespace Kernel
 	auto inode = TRY(m_open_file_descriptors.inode_of(fd));
 	auto* buffer_region = TRY(validate_and_pin_pointer_access(buffer, count, false));
-	BAN::ScopeGuard _([buffer_region] { if (buffer_region) buffer_region->unpin(); });
+	BAN::ScopeGuard _([buffer_region] { buffer_region->unpin(); });
 	return TRY(inode->write(offset, { reinterpret_cast<const uint8_t*>(buffer), count }));
 }
@@ -1660,12 +1655,6 @@ namespace Kernel
 BAN::ErrorOr<long> Process::sys_getsockopt(int socket, int level, int option_name, void* user_option_value, socklen_t* user_option_len)
 {
-	if (level != SOL_SOCKET)
-	{
-		dwarnln("{}: getsockopt level {}", name(), level);
-		return BAN::Error::from_errno(EINVAL);
-	}
-
 	socklen_t option_len;
 	TRY(read_from_user(user_option_len, &option_len, sizeof(socklen_t)));
@@ -1676,30 +1665,17 @@ namespace Kernel
 	if (!inode->mode().ifsock())
 		return BAN::Error::from_errno(ENOTSOCK);

-	switch (option_name)
-	{
-		case SO_ERROR:
-		{
-			option_len = BAN::Math::min<socklen_t>(option_len, sizeof(int));
-			const int zero { 0 };
-			TRY(write_to_user(user_option_value, &zero, option_len));
-			TRY(write_to_user(user_option_len, &option_len, sizeof(socklen_t)));
-			return 0;
-		}
-	}
-
-	dwarnln("getsockopt(SOL_SOCKET, {})", option_name);
-	return BAN::Error::from_errno(ENOTSUP);
+	auto* buffer = TRY(validate_and_pin_pointer_access(user_option_value, option_len, true));
+	BAN::ScopeGuard _([buffer] { buffer->unpin(); });
+
+	TRY(inode->getsockopt(level, option_name, user_option_value, &option_len));
+	TRY(write_to_user(user_option_len, &option_len, sizeof(socklen_t)));
+	return 0;
 }
 BAN::ErrorOr<long> Process::sys_setsockopt(int socket, int level, int option_name, const void* user_option_value, socklen_t option_len)
 {
-	if (level != SOL_SOCKET)
-	{
-		dwarnln("{}: setsockopt level {}", name(), level);
-		return BAN::Error::from_errno(EINVAL);
-	}
-
 	if (option_len < 0)
 		return BAN::Error::from_errno(EINVAL);
@@ -1707,10 +1683,12 @@ namespace Kernel
 	if (!inode->mode().ifsock())
 		return BAN::Error::from_errno(ENOTSOCK);

-	(void)user_option_value;
-	dwarnln("setsockopt(SOL_SOCKET, {})", option_name);
-	return BAN::Error::from_errno(ENOTSUP);
+	auto* buffer = TRY(validate_and_pin_pointer_access(user_option_value, option_len, false));
+	BAN::ScopeGuard _([buffer] { buffer->unpin(); });
+
+	TRY(inode->setsockopt(level, option_name, user_option_value, option_len));
+	return 0;
 }
 BAN::ErrorOr<long> Process::sys_accept(int socket, sockaddr* address, socklen_t* address_len, int flags)
@@ -1802,10 +1780,10 @@ namespace Kernel
 	BAN::Vector<MemoryRegion*> regions;
 	BAN::ScopeGuard _([&regions] {
 		for (auto* region : regions)
-			if (region != nullptr)
-				region->unpin();
+			region->unpin();
 	});

+	// FIXME: this can leak memory if push to regions fails but pinning succeeded
 	if (message.msg_name)
 		TRY(regions.push_back(TRY(validate_and_pin_pointer_access(message.msg_name, message.msg_namelen, true))));
 	if (message.msg_control)
@@ -1832,8 +1810,7 @@ namespace Kernel
 	BAN::Vector<MemoryRegion*> regions;
 	BAN::ScopeGuard _([&regions] {
 		for (auto* region : regions)
-			if (region != nullptr)
-				region->unpin();
+			region->unpin();
 	});

 	if (message.msg_name)
@@ -1985,7 +1962,7 @@ namespace Kernel
 BAN::ErrorOr<long> Process::sys_ppoll(pollfd* fds, nfds_t nfds, const timespec* user_timeout, const sigset_t* user_sigmask)
 {
 	auto* fds_region = TRY(validate_and_pin_pointer_access(fds, nfds * sizeof(pollfd), true));
-	BAN::ScopeGuard _([fds_region] { if (fds_region) fds_region->unpin(); });
+	BAN::ScopeGuard _([fds_region] { fds_region->unpin(); });

 	const auto old_sigmask = Thread::current().m_signal_block_mask;
 	if (user_sigmask != nullptr)
@@ -2168,10 +2145,7 @@ namespace Kernel
 	}

 	auto* events_region = TRY(validate_and_pin_pointer_access(events, maxevents * sizeof(epoll_event), true));
-	BAN::ScopeGuard _([events_region] {
-		if (events_region)
-			events_region->unpin();
-	});
+	BAN::ScopeGuard _([events_region] { events_region->unpin(); });

 	const auto old_sigmask = Thread::current().m_signal_block_mask;
 	if (user_sigmask)
@@ -2185,6 +2159,27 @@ namespace Kernel
 	return TRY(static_cast<Epoll*>(epoll_inode.ptr())->wait(BAN::Span<epoll_event>(events, maxevents), waketime_ns));
 }
+BAN::ErrorOr<long> Process::sys_eventfd(unsigned int initval, int flags)
+{
+	if (flags & ~(EFD_CLOEXEC | EFD_NONBLOCK | EFD_SEMAPHORE))
+		return BAN::Error::from_errno(EINVAL);
+
+	int oflags = 0;
+	if (flags & EFD_CLOEXEC)
+		oflags |= O_CLOEXEC;
+	if (flags & EFD_NONBLOCK)
+		oflags |= O_NONBLOCK;
+	const bool is_semaphore = !!(flags & EFD_SEMAPHORE);
+
+	return TRY(m_open_file_descriptors.open(
+		VirtualFileSystem::File {
+			TRY(EventFD::create(initval, is_semaphore)),
+			"<eventfd>"_sv
+		}, oflags
+	));
+}
 BAN::ErrorOr<long> Process::sys_pipe(int user_fildes[2])
 {
 	int fildes[2];
@@ -2364,7 +2359,7 @@ namespace Kernel
 		return BAN::Error::from_errno(EOVERFLOW);

 	auto* list_region = TRY(validate_and_pin_pointer_access(list, list_len * sizeof(struct dirent), true));
-	BAN::ScopeGuard _([list_region] { if (list_region) list_region->unpin(); });
+	BAN::ScopeGuard _([list_region] { list_region->unpin(); });
 	return TRY(m_open_file_descriptors.read_dir_entries(fd, list, list_len));
 }
@@ -2382,6 +2377,8 @@ namespace Kernel
 	TRY(read_string_from_user(user_path, path, PATH_MAX));
 	auto new_cwd = TRY(find_file(AT_FDCWD, path, O_SEARCH));
+	if (!new_cwd.inode->mode().ifdir())
+		return BAN::Error::from_errno(ENOTDIR);

 	LockGuard _(m_process_lock);
 	m_working_directory = BAN::move(new_cwd);
@@ -2665,6 +2662,9 @@ namespace Kernel
 			for (size_t j = 0; j < new_regions.size(); j++)
 				TRY(m_mapped_regions.insert(i + j + 1, BAN::move(new_regions[j])));
+			while (i + 1 < m_mapped_regions.size() && !m_mapped_regions[i + 1]->overlaps(vaddr, len))
+				i++;
 			continue;
 		}
@@ -2948,7 +2948,7 @@ namespace Kernel
 		for (;;)
 		{
 			while (Thread::current().will_exit_because_of_signal())
-				Thread::current().handle_signal();
+				Thread::current().handle_signal_if_interrupted();

 			SpinLockGuard guard(m_signal_lock);
 			if (!m_stopped)
@@ -3047,7 +3047,7 @@ namespace Kernel
 		TRY(read_from_user(user_act, &new_act, sizeof(struct sigaction)));

 	{
-		SpinLockGuard signal_lock_guard(m_signal_lock);
+		SpinLockGuard _(m_signal_lock);
 		old_act = m_signal_handlers[signal];
 		if (user_act != nullptr)
 			m_signal_handlers[signal] = new_act;
@@ -3061,7 +3061,14 @@ namespace Kernel
 	BAN::ErrorOr<long> Process::sys_sigpending(sigset_t* user_sigset)
 	{
-		const sigset_t sigset = (signal_pending_mask() | Thread::current().m_signal_pending_mask) & Thread::current().m_signal_block_mask;
+		sigset_t sigset;
+
+		{
+			auto& thread = Thread::current();
+			SpinLockGuard _(thread.m_signal_lock);
+			sigset = (signal_pending_mask() | thread.m_signal_pending_mask) & thread.m_signal_block_mask;
+		}
+
 		TRY(write_to_user(user_sigset, &sigset, sizeof(sigset_t)));
 		return 0;
 	}
@@ -3070,34 +3077,36 @@ namespace Kernel
 	{
 		LockGuard _(m_process_lock);
-		if (user_oset != nullptr)
-		{
-			const sigset_t current = Thread::current().m_signal_block_mask;
-			TRY(write_to_user(user_oset, &current, sizeof(sigset_t)));
-		}
+		auto& thread = Thread::current();
+
+		const sigset_t old_sigset = thread.m_signal_block_mask;
 		if (user_set != nullptr)
 		{
-			sigset_t set;
-			TRY(read_from_user(user_set, &set, sizeof(sigset_t)));
-
-			const sigset_t mask = set & ~(SIGKILL | SIGSTOP);
+			sigset_t mask;
+			TRY(read_from_user(user_set, &mask, sizeof(sigset_t)));
+			mask &= ~((1ull << SIGKILL) | (1ull << SIGSTOP));
+
+			SpinLockGuard _(thread.m_signal_lock);
 			switch (how)
 			{
 				case SIG_BLOCK:
-					Thread::current().m_signal_block_mask |= mask;
-					break;
-				case SIG_SETMASK:
-					Thread::current().m_signal_block_mask = mask;
+					thread.m_signal_block_mask |= mask;
 					break;
 				case SIG_UNBLOCK:
-					Thread::current().m_signal_block_mask &= ~mask;
+					thread.m_signal_block_mask &= ~mask;
+					break;
+				case SIG_SETMASK:
+					thread.m_signal_block_mask = mask;
 					break;
 				default:
 					return BAN::Error::from_errno(EINVAL);
 			}
 		}
+		if (user_oset != nullptr)
+			TRY(write_to_user(user_oset, &old_sigset, sizeof(sigset_t)));
 		return 0;
 	}
@@ -3109,7 +3118,7 @@ namespace Kernel
 		LockGuard _(m_process_lock);
 		auto& thread = Thread::current();
-		thread.set_suspend_signal_mask(set & ~(SIGKILL | SIGSTOP));
+		thread.set_suspend_signal_mask(set & ~((1ull << SIGKILL) | (1ull << SIGSTOP)));
 		// FIXME: i *think* here is a race condition as kill doesnt hold process lock
 		while (!thread.is_interrupted_by_signal())
@@ -3188,7 +3197,7 @@ namespace Kernel
 		op &= ~(FUTEX_PRIVATE | FUTEX_REALTIME);
 		auto* buffer_region = TRY(validate_and_pin_pointer_access(addr, sizeof(uint32_t), false));
-		BAN::ScopeGuard pin_guard([&] { if (buffer_region) buffer_region->unpin(); });
+		BAN::ScopeGuard pin_guard([buffer_region] { buffer_region->unpin(); });
 		const paddr_t paddr = m_page_table->physical_address_of(vaddr & PAGE_ADDR_MASK) | (vaddr & ~PAGE_ADDR_MASK);
 		ASSERT(paddr != 0);
@@ -3302,7 +3311,9 @@ namespace Kernel
 	BAN::ErrorOr<long> Process::sys_set_fsbase(void* addr)
 	{
-		Thread::current().set_fsbase(reinterpret_cast<vaddr_t>(addr));
+		auto& thread = Thread::current();
+		thread.m_has_custom_fsbase = true;
+		thread.set_fsbase(reinterpret_cast<vaddr_t>(addr));
 		Processor::load_fsbase();
 		return 0;
 	}
@@ -3314,7 +3325,9 @@ namespace Kernel
 	BAN::ErrorOr<long> Process::sys_set_gsbase(void* addr)
 	{
-		Thread::current().set_gsbase(reinterpret_cast<vaddr_t>(addr));
+		auto& thread = Thread::current();
+		thread.m_has_custom_gsbase = true;
+		thread.set_gsbase(reinterpret_cast<vaddr_t>(addr));
 		Processor::load_gsbase();
 		return 0;
 	}
@@ -3819,7 +3832,7 @@ namespace Kernel
 			return BAN::Error::from_errno(EPERM);
 		auto* region = TRY(validate_and_pin_pointer_access(groups, count * sizeof(gid_t), false));
-		BAN::ScopeGuard pin_guard([region] { if (region) region->unpin(); });
+		BAN::ScopeGuard pin_guard([region] { region->unpin(); });
 		TRY(m_credentials.set_groups({ groups, count }));
@@ -3884,237 +3897,45 @@ namespace Kernel
 		return region->allocate_page_containing(address, wants_write);
 	}
-	// TODO: The following 3 functions could be simplified into one generic helper function
+	extern "C" bool safe_user_memcpy(void*, const void*, size_t);
+	extern "C" bool safe_user_strncpy(void*, const void*, size_t);
+
+	static inline bool is_valid_user_address(const void* user_addr, size_t size)
+	{
+		const vaddr_t user_vaddr = reinterpret_cast<vaddr_t>(user_addr);
+		if (BAN::Math::will_addition_overflow<vaddr_t>(user_vaddr, size))
+			return false;
+		if (user_vaddr + size > USERSPACE_END)
+			return false;
+		return true;
+	}
 	BAN::ErrorOr<void> Process::read_from_user(const void* user_addr, void* out, size_t size)
 	{
-		const vaddr_t user_vaddr = reinterpret_cast<vaddr_t>(user_addr);
-
-		auto* out_u8 = static_cast<uint8_t*>(out);
-		size_t ncopied = 0;
-
-		{
-			RWLockRDGuard _(m_memory_region_lock);
-			const size_t first_index = find_mapped_region(user_vaddr);
-			for (size_t i = first_index; ncopied < size && i < m_mapped_regions.size(); i++)
-			{
-				auto& region = m_mapped_regions[i];
-				if (!region->contains(user_vaddr + ncopied))
-					return BAN::Error::from_errno(EFAULT);
-				const size_t ncopy = BAN::Math::min<size_t>(
-					(region->vaddr() + region->size()) - (user_vaddr + ncopied),
-					size - ncopied
-				);
-				const size_t page_count = range_page_count(user_vaddr + ncopied, ncopy);
-				const vaddr_t page_base = (user_vaddr + ncopied) & PAGE_ADDR_MASK;
-				for (size_t p = 0; p < page_count; p++)
-				{
-					const auto flags = PageTable::UserSupervisor | PageTable::Present;
-					if ((m_page_table->get_page_flags(page_base + p * PAGE_SIZE) & flags) != flags)
-						goto read_from_user_with_allocation;
-				}
-				memcpy(out_u8 + ncopied, reinterpret_cast<void*>(user_vaddr + ncopied), ncopy);
-				ncopied += ncopy;
-			}
-			if (ncopied >= size)
-				return {};
-			if (ncopied > 0)
-				return BAN::Error::from_errno(EFAULT);
-		}
-
-	read_from_user_with_allocation:
-		RWLockWRGuard _(m_memory_region_lock);
-		const size_t first_index = find_mapped_region(user_vaddr + ncopied);
-		for (size_t i = first_index; ncopied < size && i < m_mapped_regions.size(); i++)
-		{
-			auto& region = m_mapped_regions[i];
-			if (!region->contains(user_vaddr + ncopied))
-				return BAN::Error::from_errno(EFAULT);
-			const size_t ncopy = BAN::Math::min<size_t>(
-				(region->vaddr() + region->size()) - (user_vaddr + ncopied),
-				size - ncopied
-			);
-			const size_t page_count = range_page_count(user_vaddr + ncopied, ncopy);
-			const vaddr_t page_base = (user_vaddr + ncopied) & PAGE_ADDR_MASK;
-			for (size_t p = 0; p < page_count; p++)
-			{
-				const auto flags = PageTable::UserSupervisor | PageTable::Present;
-				if ((m_page_table->get_page_flags(page_base + p * PAGE_SIZE) & flags) == flags)
-					continue;
-				if (!TRY(region->allocate_page_containing(page_base + p * PAGE_SIZE, false)))
-					return BAN::Error::from_errno(EFAULT);
-			}
-			memcpy(out_u8 + ncopied, reinterpret_cast<void*>(user_vaddr + ncopied), ncopy);
-			ncopied += ncopy;
-		}
-		if (ncopied >= size)
-			return {};
-		return BAN::Error::from_errno(EFAULT);
+		if (!is_valid_user_address(user_addr, size))
+			return BAN::Error::from_errno(EFAULT);
+		if (!safe_user_memcpy(out, user_addr, size))
+			return BAN::Error::from_errno(EFAULT);
+		return {};
 	}
 	BAN::ErrorOr<void> Process::read_string_from_user(const char* user_addr, char* out, size_t max_size)
 	{
-		const vaddr_t user_vaddr = reinterpret_cast<vaddr_t>(user_addr);
-
-		size_t ncopied = 0;
-
-		{
-			RWLockRDGuard _(m_memory_region_lock);
-			const size_t first_index = find_mapped_region(user_vaddr);
-			for (size_t i = first_index; ncopied < max_size && i < m_mapped_regions.size(); i++)
-			{
-				auto& region = m_mapped_regions[i];
-				if (!region->contains(user_vaddr + ncopied))
-					return BAN::Error::from_errno(EFAULT);
-				vaddr_t last_page = 0;
-				for (; ncopied < max_size; ncopied++)
-				{
-					const vaddr_t curr_page = (user_vaddr + ncopied) & PAGE_ADDR_MASK;
-					if (curr_page != last_page)
-					{
-						const auto flags = PageTable::UserSupervisor | PageTable::Present;
-						if ((m_page_table->get_page_flags(curr_page) & flags) != flags)
-							goto read_string_from_user_with_allocation;
-					}
-					out[ncopied] = user_addr[ncopied];
-					if (out[ncopied] == '\0')
-						return {};
-					last_page = curr_page;
-				}
-			}
-			if (ncopied >= max_size)
-				return BAN::Error::from_errno(ENAMETOOLONG);
-			if (ncopied > 0)
-				return BAN::Error::from_errno(EFAULT);
-		}
-
-	read_string_from_user_with_allocation:
-		RWLockWRGuard _(m_memory_region_lock);
-		const size_t first_index = find_mapped_region(user_vaddr + ncopied);
-		for (size_t i = first_index; ncopied < max_size && i < m_mapped_regions.size(); i++)
-		{
-			auto& region = m_mapped_regions[i];
-			if (!region->contains(user_vaddr + ncopied))
-				return BAN::Error::from_errno(EFAULT);
-			vaddr_t last_page = 0;
-			for (; ncopied < max_size; ncopied++)
-			{
-				const vaddr_t curr_page = (user_vaddr + ncopied) & PAGE_ADDR_MASK;
-				if (curr_page != last_page)
-				{
-					const auto flags = PageTable::UserSupervisor | PageTable::Present;
-					if ((m_page_table->get_page_flags(curr_page) & flags) == flags)
-						;
-					else if (!TRY(region->allocate_page_containing(curr_page, false)))
-						return BAN::Error::from_errno(EFAULT);
-				}
-				out[ncopied] = user_addr[ncopied];
-				if (out[ncopied] == '\0')
-					return {};
-				last_page = curr_page;
-			}
-		}
-		if (ncopied >= max_size)
-			return BAN::Error::from_errno(ENAMETOOLONG);
-		return BAN::Error::from_errno(EFAULT);
+		max_size = BAN::Math::min<size_t>(max_size, USERSPACE_END - reinterpret_cast<vaddr_t>(user_addr));
+		if (!is_valid_user_address(user_addr, max_size))
+			return BAN::Error::from_errno(EFAULT);
+		if (!safe_user_strncpy(out, user_addr, max_size))
+			return BAN::Error::from_errno(EFAULT);
+		return {};
 	}
 	BAN::ErrorOr<void> Process::write_to_user(void* user_addr, const void* in, size_t size)
 	{
-		const vaddr_t user_vaddr = reinterpret_cast<vaddr_t>(user_addr);
-
-		const auto* in_u8 = static_cast<const uint8_t*>(in);
-		size_t ncopied = 0;
-
-		{
-			RWLockRDGuard _(m_memory_region_lock);
-			const size_t first_index = find_mapped_region(user_vaddr);
-			for (size_t i = first_index; ncopied < size && i < m_mapped_regions.size(); i++)
-			{
-				auto& region = m_mapped_regions[i];
-				if (!region->contains(user_vaddr + ncopied))
-					return BAN::Error::from_errno(EFAULT);
-				const size_t ncopy = BAN::Math::min<size_t>(
-					(region->vaddr() + region->size()) - (user_vaddr + ncopied),
-					size - ncopied
-				);
-				const size_t page_count = range_page_count(user_vaddr + ncopied, ncopy);
-				const vaddr_t page_base = (user_vaddr + ncopied) & PAGE_ADDR_MASK;
-				for (size_t i = 0; i < page_count; i++)
-				{
-					const auto flags = PageTable::UserSupervisor | PageTable::ReadWrite | PageTable::Present;
-					if ((m_page_table->get_page_flags(page_base + i * PAGE_SIZE) & flags) != flags)
-						goto write_to_user_with_allocation;
-				}
-				memcpy(reinterpret_cast<void*>(user_vaddr + ncopied), in_u8 + ncopied, ncopy);
-				ncopied += ncopy;
-			}
-			if (ncopied >= size)
-				return {};
-			if (ncopied > 0)
-				return BAN::Error::from_errno(EFAULT);
-		}
-
-	write_to_user_with_allocation:
-		RWLockWRGuard _(m_memory_region_lock);
-		const size_t first_index = find_mapped_region(user_vaddr + ncopied);
-		for (size_t i = first_index; ncopied < size && i < m_mapped_regions.size(); i++)
-		{
-			auto& region = m_mapped_regions[i];
-			if (!region->contains(user_vaddr + ncopied))
-				return BAN::Error::from_errno(EFAULT);
-			const size_t ncopy = BAN::Math::min<size_t>(
-				(region->vaddr() + region->size()) - (user_vaddr + ncopied),
-				size - ncopied
-			);
-			const size_t page_count = range_page_count(user_vaddr + ncopied, ncopy);
-			const vaddr_t page_base = (user_vaddr + ncopied) & PAGE_ADDR_MASK;
-			for (size_t p = 0; p < page_count; p++)
-			{
-				const auto flags = PageTable::UserSupervisor | PageTable::ReadWrite | PageTable::Present;
-				if ((m_page_table->get_page_flags(page_base + p * PAGE_SIZE) & flags) == flags)
-					continue;
-				if (!TRY(region->allocate_page_containing(page_base + p * PAGE_SIZE, true)))
-					return BAN::Error::from_errno(EFAULT);
-			}
-			memcpy(reinterpret_cast<void*>(user_vaddr + ncopied), in_u8 + ncopied, ncopy);
-			ncopied += ncopy;
-		}
-		if (ncopied >= size)
-			return {};
-		return BAN::Error::from_errno(EFAULT);
+		if (!is_valid_user_address(user_addr, size))
+			return BAN::Error::from_errno(EFAULT);
+		if (!safe_user_memcpy(user_addr, in, size))
+			return BAN::Error::from_errno(EFAULT);
+		return {};
 	}
BAN::ErrorOr<MemoryRegion*> Process::validate_and_pin_pointer_access(const void* ptr, size_t size, bool needs_write) BAN::ErrorOr<MemoryRegion*> Process::validate_and_pin_pointer_access(const void* ptr, size_t size, bool needs_write)

Some files were not shown because too many files have changed in this diff