deanoburrito / northport
Monolithic kernel, drivers and support libraries for x86_64, riscv64 and m68k.
License: MIT License
Currently we have a single global scheduler, that all cores get their next thread from.
We're well overdue for a rewrite: I'd like to split the scheduler into smaller, per-core thread lists.
Occasionally cores will need to re-balance the number of threads in each work queue, probably with quieter cores stealing from more-loaded cores.
This may also be a good time to separate the idea of a process/thread manager from that of a scheduler. Currently our scheduler is also responsible for managing processes and threads, and it would be nice to clean this up.
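A minimal sketch of the per-core queue idea with work stealing, assuming hypothetical names (CoreQueue, NextThread) that are not part of the existing scheduler. Locking is deliberately left out to keep the stealing policy visible; a real version would need a lock or lock-free deque per queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical thread handle; the kernel would use its own thread type.
using ThreadId = std::size_t;

// One run queue per core. No locking here: this only illustrates the policy.
struct CoreQueue
{
    std::deque<ThreadId> threads;
};

// Returns the index of the core with the most queued threads.
inline std::size_t BusiestCore(const std::vector<CoreQueue>& cores)
{
    assert(!cores.empty());
    std::size_t busiest = 0;
    for (std::size_t i = 1; i < cores.size(); i++)
    {
        if (cores[i].threads.size() > cores[busiest].threads.size())
            busiest = i;
    }
    return busiest;
}

// Picks the next thread for core `self`. If the local queue is empty, half
// (rounded up) of the busiest core's threads are stolen from the back of its
// queue. Returns false if no core has anything to run.
inline bool NextThread(std::vector<CoreQueue>& cores, std::size_t self, ThreadId& out)
{
    assert(self < cores.size());
    CoreQueue& local = cores[self];
    if (local.threads.empty())
    {
        CoreQueue& victim = cores[BusiestCore(cores)];
        const std::size_t steal = (victim.threads.size() + 1) / 2;
        for (std::size_t i = 0; i < steal; i++)
        {
            local.threads.push_front(victim.threads.back());
            victim.threads.pop_back();
        }
    }
    if (local.threads.empty())
        return false;

    out = local.threads.front();
    local.threads.pop_front();
    return true;
}
```

Stealing only when the local queue runs dry keeps the common path (pop from your own queue) free of cross-core traffic; periodic re-balancing could be layered on top of the same helper.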
A lot of the kernel and libraries have received significant updates since the last documentation push.
Things that are missing:
Also, with the build system refactor, a lot of the links to source files have broken. Let's fix these.
It would also be nice to include some code guidelines/style guides too.
A lot of what's in PlatformInit() (KernelMain.cpp) is quite general. Things like initializing PCI, potentially drivers and other devices like PS2 controllers don't really need to be in the pre-scheduled environment.
It'd be a good test of the scheduler (and potentially faster?) to have this done as part of tasks that run under the scheduler.
The HHDM maps all usable physical memory (or up to 4G, whichever is higher), however we also use it for accessing other things by their physical address. So far this hasn't caused any issues, but it's still a bug.
Current known accesses outside of HHDM:
This file contains memory related functions, but also things like EnumHasFlag() and move/swap.
Let's move some of the utility functions out into Utility.h.
While we're at it, let's move Forward() (from CppStd.h) and Launder there as well.
This would be a good time to fix capitalization on move/swap to Move/Swap as well.
Low hanging fruit here! IdAllocator sees a lot of use, however it immediately reuses any freed IDs. It'd be nice to have a queue of IDs that have been freed, but won't be reallocated until a certain condition is met. I'm thinking we keep a steady queue length of some amount (maybe 64-128 IDs), and flush the oldest entry whenever another ID is freed.
Perhaps we add an option in the constructor for whether to use this feature or not, since it will slow down allocating/freeing ids.
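A rough sketch of the delayed-reuse queue, assuming a hypothetical DelayedIdAllocator class rather than the real IdAllocator interface. The constructor argument doubles as the opt-out: a length of zero gives back the current immediate-reuse behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for IdAllocator; names and layout are assumptions.
class DelayedIdAllocator
{
public:
    // A length of 0 disables the quarantine, so freed ids are reusable at once.
    explicit DelayedIdAllocator(std::size_t length) : quarantineLength(length)
    {}

    std::size_t Alloc()
    {
        if (!freeIds.empty())
        {
            const std::size_t id = freeIds.back();
            freeIds.pop_back();
            return id;
        }
        return nextId++;
    }

    void Free(std::size_t id)
    {
        assert(id < nextId);
        quarantine.push_back(id);
        // Keep the queue at a steady length: the oldest id only becomes
        // reusable once enough newer ids have been freed after it.
        while (quarantine.size() > quarantineLength)
        {
            freeIds.push_back(quarantine.front());
            quarantine.pop_front();
        }
    }

private:
    std::size_t quarantineLength;
    std::size_t nextId = 0;
    std::deque<std::size_t> quarantine; // freed, not yet reusable
    std::vector<std::size_t> freeIds;   // freed and reusable
};
```

With a length of 2, an id is handed out again only after two later frees have pushed it out of the queue.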
This looks like a heisenbug. So far I've been able to reproduce it on my desktop (16 cores); my laptop (also 16 cores) seems to be fine, but this may be timing related as the mobile processor is a lot slower. The kernel does work on the same machines when only using a single core, and increasing the number of cores in use seems to make the crash happen sooner.
This can happen in KVM too, although it's a lot rarer, and adding print statements to the scheduling logic seems to outright remove the issue.
For reference there's only a handful of places multiple cores interact with each other:
I've made use of echo -e in the build system to make the output a bit friendlier, but this is not POSIX-compliant usage of echo. Some shells may (and do) break with this usage, including the shell used for GitHub CI. This results in stray characters being emitted everywhere.
This brings up an important question of what other assumptions are made within the build system when it comes to being portable.
Initially I had decided not to use the global pointer, but since it's saved and restored at every entry point into the kernel, it'd be a waste not to use it.
Currently a number of the system calls take in C-style strings as arguments; let's update them to use k-strings instead.
FormatPrinter (the class that handles printf-style formatting) currently outputs 'TOKEN' in place of all tokens. We should replace this so it's actually functional.
This can be done in stages:
Referring to the non-integer registers of any platform, it would be nice to store these alongside each runnable trap frame.
We probably want to define multiple sets of registers (say one program uses vector regs, but not FPU or the opposite), that way we only need to save and restore what's used (or what's available).
We also should track the state of each set: has it been used at all (uninit), has it been used but not updated (clean), used and has modified state (dirty).
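The three states could be tracked with something like the following. This is only a sketch: ExtendedRegSet and the helper names are made up for illustration, and the actual save area for each set is left out.

```cpp
#include <cassert>
#include <cstdint>

// State of one set of extended (non-integer) registers for a thread.
enum class RegSetState : std::uint8_t
{
    Uninit, // never used by this thread
    Clean,  // used, but unchanged since the last save
    Dirty,  // used and modified since the last save
};

// Hypothetical per-thread record for one register set (FPU, vector, ...).
// A real implementation would also own the save area for the set.
struct ExtendedRegSet
{
    RegSetState state = RegSetState::Uninit;
};

// Called when the thread touches the set (e.g. from a "unit disabled" trap).
inline void MarkUsed(ExtendedRegSet& set)
{
    set.state = RegSetState::Dirty;
}

// Only dirty sets need writing back when switching away from the thread.
inline bool NeedsSave(const ExtendedRegSet& set)
{
    return set.state == RegSetState::Dirty;
}

// Sets the thread never touched do not need restoring when switching to it.
inline bool NeedsRestore(const ExtendedRegSet& set)
{
    return set.state != RegSetState::Uninit;
}

// Called after the save area has been written.
inline void MarkSaved(ExtendedRegSet& set)
{
    assert(set.state != RegSetState::Uninit);
    set.state = RegSetState::Clean;
}
```

Keeping one record per register set means a program that only uses vector registers never pays for FPU save/restore, and vice versa.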
Elf64HeaderParser can provide symbol names, but their usefulness varies depending on the encoding of the symbol.
It would be nice to provide a function sl::String Unmangle(), to display more useful information.
This function was introduced as a nice way of working with strongly typed enum classes, without breaking the illusion of C++'s enum class (which this function ironically undoes).
There are likely a few places this function could be used to improve readability, and we should mark it as FORCE_INLINE to eliminate that slight overhead.
A simple regex or manual search should do the trick.
Currently, all page tables are cloned from the kernel page tables, and the bottom half is used for programs. If kernel code modifies any level of the paging structure except the top, this is fine, as these are shared structures.
However, if paging data is modified at the top level of the structure (pml4, or pml5 if la57 is enabled), this won't be updated in other kernel page tables (those of other processes).
One solution is to simply map the entire higher half of the top-level table up front, so that page faults then occur at top_level - 1. This would use slightly more initial memory, as the top_level - 1 pages would need to be allocated (~1MB of memory reserved for these). This is a one-time cost, as these pages would be referenced by all top-level paging structures. No extra memory cost would occur for cloned tables, as a full top-level table is required anyway, and would simply point to these already created structures.
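For reference, the ~1MB figure can be checked with a few lines of arithmetic, assuming the usual x86_64 layout of 4KiB tables with 512 entries each, half of which cover the higher half. The constant and function names are only illustrative.

```cpp
#include <cstddef>

constexpr std::size_t PageSize = 4096;       // one page table occupies one 4KiB page
constexpr std::size_t EntriesPerTable = 512; // 8-byte entries in a 4KiB table
constexpr std::size_t HigherHalfEntries = EntriesPerTable / 2;

// Memory needed to pre-allocate one (top_level - 1) table per top-level entry.
constexpr std::size_t ReservedBytes(std::size_t entries, std::size_t pageSize)
{
    return entries * pageSize;
}

static_assert(ReservedBytes(HigherHalfEntries, PageSize) == 1024 * 1024,
    "256 tables of 4KiB each come to exactly 1MiB");
```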
Building the kernel with clang causes the kernel to break at InitPlatform().
I'd like to support both GCC and clang ideally.
This needs an implementation!
Currently unable to reproduce this bug myself, known to be present in 04de59a and older. OS was run using a standard ubuntu LTS install (qemu 4.2).
Output of readelf -e startup.elf here: https://pastebin.com/NqKK3hsx
Output of addr2line on faulting address:
$ addr2line -e initdisk/apps/startup.elf -a 0x2025f9
0x00000000002025f9
/home/xyz/Documents/Github/northport/libs/np-syscall/SyscallFunctions.cpp:11
A lot has been learnt since first implementing most of the system calls, and I think it's time to do a pass over the api. Things like enable/disable device events don't really need to be 2 separate calls, and could be combined.
Also artificially limiting ourselves to 16 calls per group doesn't allow a lot of room for future expansion, and leads to possibly having 2 or more groups of the same type, but fragmented across the api. Let's split the id into 2 parts: upper 32 bits is the group, lower 32 bits is the call id within that group. This does completely remove the jump table option (we could still use a table of tables).
There's also currently no version management; I'd like to support this at a group level.
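A sketch of the proposed 32/32 split with a table-of-tables dispatch. Everything here (SyscallGroup, Dispatch, the single-argument handler signature) is an assumption for illustration and not the current np-syscall API; the version field is just where per-group versioning could live.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct SyscallId
{
    std::uint32_t group; // upper 32 bits
    std::uint32_t call;  // lower 32 bits
};

constexpr std::uint64_t PackSyscallId(std::uint32_t group, std::uint32_t call)
{
    return (static_cast<std::uint64_t>(group) << 32) | call;
}

constexpr SyscallId UnpackSyscallId(std::uint64_t raw)
{
    return SyscallId{ static_cast<std::uint32_t>(raw >> 32), static_cast<std::uint32_t>(raw) };
}

// Hypothetical handler signature, reduced to a single argument.
using SyscallHandler = std::uint64_t (*)(std::uint64_t arg);

// One table per group; a version per group allows group-level versioning.
struct SyscallGroup
{
    const SyscallHandler* handlers;
    std::uint32_t handlerCount;
    std::uint32_t version;
};

// Table-of-tables lookup: first index by group, then by call id.
// Returns false for unknown groups or calls instead of jumping blindly.
inline bool Dispatch(const SyscallGroup* groups, std::uint32_t groupCount,
    std::uint64_t raw, std::uint64_t arg, std::uint64_t& result)
{
    assert(groups != nullptr || groupCount == 0);
    const SyscallId id = UnpackSyscallId(raw);
    if (id.group >= groupCount)
        return false;

    const SyscallGroup& group = groups[id.group];
    if (id.call >= group.handlerCount || group.handlers[id.call] == nullptr)
        return false;

    result = group.handlers[id.call](arg);
    return true;
}
```

Both lookups are bounds-checked array indexes, so the cost stays close to a single jump table while each group can grow independently.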
Currently the system clock allows adding events with an expiry time longer than the period of the backing clock.
This could lead to undefined results (depending on how the underlying hardware handles the write; likely it's just an overflow).
One possible solution is to allow the sys timer to report its maximum period, and compare this against the next value to be used for a timer interrupt. If it's too long we inject a fake event at the head of the list, as many times as necessary until the target time is reached.
This might also be a good time to look at adding new events to the front of the queue, which would require cancelling and then resetting the sys timer. This should be doable in most cases.
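The clamping step could look roughly like this. TimerArm, NextArm and FakeEventsNeeded are hypothetical names, and times are plain tick counts rather than whatever unit the clock code actually uses.

```cpp
#include <cassert>
#include <cstdint>

// What to program into the sys timer next.
struct TimerArm
{
    std::uint64_t ticks; // value written to the hardware
    bool isFakeEvent;    // true if this expiry only exists to bridge a long wait
};

// Clamps the next expiry to what the backing timer can represent.
inline TimerArm NextArm(std::uint64_t remainingTicks, std::uint64_t maxPeriodTicks)
{
    assert(maxPeriodTicks > 0);
    if (remainingTicks > maxPeriodTicks)
        return { maxPeriodTicks, true };
    return { remainingTicks, false };
}

// Number of fake events injected before the real event can be armed.
inline std::uint64_t FakeEventsNeeded(std::uint64_t expiryTicks, std::uint64_t maxPeriodTicks)
{
    assert(maxPeriodTicks > 0);
    if (expiryTicks == 0)
        return 0;
    return (expiryTicks - 1) / maxPeriodTicks;
}
```

For an expiry of 10 ticks on a timer with a maximum period of 4, the timer is armed for 4, 4 and then 2 ticks, so two fake events precede the real one.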
847f221 fixes the problem of accidentally overwriting NMI redirect entries, but now we can never write them at all.
My proposed solution is to just construct the redirect entry manually, and then use IoApic::WriteReg(entryNum, redirect); to skip over the protection logic. This is fine since it's coming from inside of the ioapic init code (where we can try to assume we know what we're doing).
A quick investigation suggests a heap node is overwriting part of the scheduler structure in the AP core-local variables. This then causes the vector to think it has more entries than it does, overwriting the next header, and so it continues.
A better solution might be to write a slab allocator for the kernel, and use that instead of the linked-list one. This avoids the issues with heap nodes. We could even do custom allocation like with the IDT setup; this would lead to some wasted space, as we don't have many cores, and their local variables aren't very big.
During the panic sequence a stack trace is printed, including symbol names. These names come from the symbol store for the kernel which also includes loaded drivers.
Rarely an exception will trigger while printing the stack trace, which has been narrowed down to a page fault while trying to read a symbol name.
It'd be good to figure out where these are coming from.
With a proper userspace on the horizon it'd be nice to have more of the usual templates available. In particular:
The existing vector and circular queue implementations could use improvements as well, mainly in their utility functions and iterators.
Unable to reliably reproduce this bug (may hint at a race condition). When building for riscv64 the final core to call Scheduler::RegisterCore() fails to switch task immediately, resulting in the ASSERT_UNREACHABLE() following that function call to fail (see the file CommonInit.cpp). This bug has been observed at all optimization levels and both compilers.
The local apic timer calibration has a race condition where many cores can reset the io apic, interfering with each other's calibration.
A solution is in the works for this, however there may be more bugs lurking. Good thing to tackle after the scheduler refactor!
This one is low priority.
Most things mapped within the kernel are either within the hhdm, or mapped as relative to it (the kernel heaps). Kernel stacks begin at the fixed address of 0xffff'd555'0000'0000, which gives us a good ~1.3TiB of space for the HHDM before it would collide with the stacks. It's actually less, as there needs to be room for the heap, which occupies a few GiBs of its own (multiple segments with guard pages in between).
This is unlikely to ever be a real issue, but I'd like to address it anyway. Possible solutions?
For an address x, we would access it at hhdm_base + (x % hhdm_limit), and if x / hhdm_limit > 1 we would modify the mapped pages so that they allow access to x. This is memory efficient, but would potentially require interaction with the vmm every time we tried to access hhdm memory.
We've assumed that all ACPI tables begin on a page boundary, as we're using the addresses returned from VMM::Kernel().Alloc(), which are aligned down to the nearest page.
This appears to be working fine under UEFI, but can break on some BIOS systems (namely qemu + SeaBIOS).
This hints at a deeper issue with the KernelVmDriver, where mmio mappings are assumed to be page-aligned, which is not always the case. We probably need to involve the virtual memory drivers in deciding how many pages are required for a vm range (this would also make it easy to transparently implement things like guard pages or page-heap).
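To illustrate the page-count problem: a range that does not start on a page boundary can need one more page than its length alone suggests. The helper below is a sketch with made-up names (PageRange, PagesFor) and assumes 4KiB pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t PageSize = 0x1000; // assuming 4KiB pages

struct PageRange
{
    std::uintptr_t base; // address aligned down to a page boundary
    std::size_t pages;   // pages needed to cover the whole range
    std::size_t offset;  // offset of the original address within the first page
};

// Works out which pages must be mapped to cover [addr, addr + length).
inline PageRange PagesFor(std::uintptr_t addr, std::size_t length)
{
    assert(length > 0);
    const std::uintptr_t base = addr & ~(PageSize - 1);
    const std::size_t offset = addr - base;
    // The offset is part of the span: it can push the end into another page.
    const std::size_t pages = (offset + length + PageSize - 1) / PageSize;
    return { base, pages, offset };
}
```

A 36-byte table at 0x1FF0, for example, straddles two pages even though it is far smaller than one; the caller then accesses it at the mapped base plus the returned offset.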
I didn't do enough testing on this one, as it'll occasionally spit out some nonsense.
I think a big part of the problem is treating each scoped name entry as its own token, which can cause pointers to be displayed in the wrong place.
There are 2 paths forward with this:
The title says it all. The original implementation of sl::String is quite old and was written before I added spans to syslib. It'd be nice to flip this paradigm upside down and have all of string's operations that require other strings take in stringspans instead. To preserve existing code we could add an implicit cast from string -> stringspan.
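A small sketch of how the implicit cast keeps existing call sites working. The StringSpan and String classes below are simplified stand-ins, not the syslib types, and std::string is only used as backing storage to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for sl::StringSpan: a non-owning view of characters.
class StringSpan
{
public:
    StringSpan(const char* d, std::size_t n) : data(d), size(n)
    {
        assert(data != nullptr || size == 0);
    }

    const char* Data() const { return data; }
    std::size_t Size() const { return size; }

private:
    const char* data;
    std::size_t size;
};

// Hypothetical stand-in for sl::String.
class String
{
public:
    String(const char* cstr) : storage(cstr) {}

    // Implicit conversion: code that passes a String where a span is
    // expected keeps compiling unchanged.
    operator StringSpan() const
    {
        return StringSpan(storage.data(), storage.size());
    }

    // Operations take spans, so they accept Strings and sub-ranges alike.
    bool StartsWith(StringSpan prefix) const
    {
        if (prefix.Size() > storage.size())
            return false;
        return std::memcmp(storage.data(), prefix.Data(), prefix.Size()) == 0;
    }

private:
    std::string storage;
};
```

A caller can pass either another String or a span over part of a buffer to StartsWith without any copy being made.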
I believe it's coming from mapping physical memory during PageTableManager::InitKernel(), likely during the 1GiB page fill.
This really doesn't have to be like this; the bootloader doesn't have this problem. Perhaps there's a better solution.
These functions would speed up future development, and would be useful to have.
Their prototypes are already defined, but currently commented out, in kernel/memory/Paging.h.
At the time of writing, MapRanges will likely need PMM::AllocPages(), which is also currently unimplemented.
It does what it needs to, however it locks on every draw call. So if we're using DrawPixel to render complex shapes, we'll be locking every call.
It'd be good to have overloads of the smaller functions that take an existing lock - that way we can lock once before the loop of DrawPixel calls and unlock after it.
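One way the overload could look, as a sketch only: SpinLock, ScopedLock and this cut-down Framebuffer are illustrative stand-ins, not the existing classes. The lock-taking overload accepts a reference to a held ScopedLock as proof that the caller already owns the framebuffer lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SpinLock
{
public:
    void Lock() { while (flag.test_and_set(std::memory_order_acquire)) {} }
    void Unlock() { flag.clear(std::memory_order_release); }

private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// RAII guard; holding a reference to one proves the lock is taken.
class ScopedLock
{
public:
    explicit ScopedLock(SpinLock& l) : lock(l) { lock.Lock(); }
    ~ScopedLock() { lock.Unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

    bool Guards(const SpinLock& other) const { return &lock == &other; }

private:
    SpinLock& lock;
};

class Framebuffer
{
public:
    Framebuffer(std::size_t w, std::size_t h) : width(w), height(h), pixels(w * h, 0) {}

    // Existing behaviour: lock on every call.
    void DrawPixel(std::size_t x, std::size_t y, std::uint32_t colour)
    {
        ScopedLock held(lock);
        DrawPixel(x, y, colour, held);
    }

    // Overload for callers that already hold the lock.
    void DrawPixel(std::size_t x, std::size_t y, std::uint32_t colour, const ScopedLock& held)
    {
        assert(held.Guards(lock));
        if (x < width && y < height)
            pixels[y * width + x] = colour;
    }

    // Complex shapes lock once around the whole loop.
    void DrawHLine(std::size_t x, std::size_t y, std::size_t length, std::uint32_t colour)
    {
        ScopedLock held(lock);
        for (std::size_t i = 0; i < length; i++)
            DrawPixel(x + i, y, colour, held);
    }

    std::uint32_t PixelAt(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }

private:
    SpinLock lock;
    std::size_t width;
    std::size_t height;
    std::vector<std::uint32_t> pixels;
};
```

Passing the guard by reference costs nothing at runtime in release builds, and the assert catches a caller handing in a guard for the wrong lock.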
A lot of the groundwork has been done. SSE is currently enabled, but no state is saved on context switches. We would also want to stub out a handler for SIMD errors should they occur.
Once this is enabled we can enable SSE instructions in some of the userspace libraries, or all of them if we do 2 builds (1 kernel - no sse, 1 userspace - all features).
MapMemory() can allocate physical frames if needed, however it won't free them. I don't really want to store this info; I'd rather it was explicitly specified, with a reasonable default (yes, free physical page).
UnmapRange also has the problem that when freeing page frames that are non-consecutive, we have no way to return this info to the caller. Therefore we should always free physical frames, unless explicitly told not to (maybe it's shared memory).
Currently these methods determine the base of the incoming number via bool isHex. It would be nice to use a similar system to IntToString(), which takes a number to use as the base.
Low priority for now, would be nice.
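A sketch of what a base-taking parser could look like. TryGetUInt and DigitValue are hypothetical names rather than the existing methods, and overflow checking is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Converts one character to its digit value, or -1 if it is not a digit
// in any base up to 36.
inline int DigitValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'z')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z')
        return c - 'A' + 10;
    return -1;
}

// Parses an unsigned integer in the given base (2 to 36).
// Returns false on an empty string, bad base or invalid digit.
// Note: values that overflow 64 bits are not detected in this sketch.
inline bool TryGetUInt(const char* str, std::size_t base, std::uint64_t& out)
{
    assert(str != nullptr);
    if (base < 2 || base > 36 || *str == '\0')
        return false;

    std::uint64_t value = 0;
    for (; *str != '\0'; str++)
    {
        const int digit = DigitValue(*str);
        if (digit < 0 || static_cast<std::size_t>(digit) >= base)
            return false;
        value = value * base + static_cast<std::uint64_t>(digit);
    }

    out = value;
    return true;
}
```

The old isHex behaviour maps onto passing 16 or 10 as the base, so existing callers could be migrated mechanically.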
This is not going to replace a lot of the local config variables around the place, but it would be nice to have a centralized place for all kernel settings. I'd like to have all compile-time and run-time config end up in a central place, with each overriding the other in a reasonable way. This would also be a good excuse to start accepting command line arguments, and have those contribute to the config.
Since this is likely going to introduce several breaking changes, let's also set some rules on how the various config names are generated, with the intent being that if you know the compile-time name of something, you could easily guess the command-line name of the same option.
We could also include things like the build [id|time], compiler and some other fun info.
It's time. For the first stage I'd like to have it integrated into the build system, and have copies built (and published in the releases section) by GitHub CI on new commits.
Topics to cover:
Drivers and the device API are still WIP so they are notably left out.
It'd be nice to include a separate section on the rationale behind certain design decisions (in a less formal style).
Most of the code uses the newer [[attribute_here]] syntax, however there are still a few uses of __attribute__((attribute_here)). It'd be good to do a pass over everything and make it all consistent!
QueueClockEvent can be called inside of an interrupt handler (in response to a timer interrupt firing), and if the event is periodic it will be re-queued. If the interrupted code was inside of one of the heap locks we may deadlock. This deadlock is pretty hard to come by, as it would use the smallest slab, which is mostly unused by the rest of the kernel, but it is still possible.
The line in question is: https://github.com/DeanoBurrito/northport/blob/master/kernel/tasking/Clock.cpp#L91
One possible solution would be to not free the memory when dequeuing the event initially (which is another possible deadlock), and then reuse this memory when re-adding later. This seems fine but is not scalable if we have the issue again in the future, maybe we add some sort of magazine allocator which caches extra entries?
As cool as stivale is, it would be nice to remove references to it from random parts of the init code, perhaps moving them into some functions that parse the tags given to us by limine.
A nice side effect of this is that we could potentially support multiple boot protocols (multiboot 2 perhaps) in the future, although these would require dedicated entry stubs to get into the stivale2 defined environment.
For some reason I applied the flags to every level of the page table (only when we generate a new entry however). This can lead to random areas being marked as forever readonly or no-execute.
Kernel code breaks with any sort of optimization enabled, on both GCC and clang. This is something I remember coming across back in the IDT handler rewrite. I believe it's an offset issue with how the relative jump is being written to memory.
The system reset seems to occur immediately after an unhandled interrupt vector is received (that is already concerning, possibly worth looking into), at some point during InitAllCores(). Possibly local APIC timer calibration?