deanoburrito / northport



Monolithic kernel, drivers and support libraries for x86_64, riscv64 and m68k.

License: MIT License

Makefile 2.80% C++ 86.63% C 9.56% Assembly 1.01%

c-plus-plus kernel operating-system os osdev

northport's People

dreamos82 ahnjihwan passw rifat87 nomadarchitect fengjixuchui

northport's Issues

Scheduler Rewrite: per-core focus

Currently we have a single global scheduler that all cores take their next thread from.
We're well overdue for a rewrite: I'd like to split the scheduler into smaller, per-core thread lists.
Occasionally cores will need to re-balance the number of threads in each work queue, probably with quieter cores stealing from more-loaded cores.

This may also be a good time to separate the ideas of a process/thread manager and a scheduler. Currently our scheduler is also responsible for managing processes and threads; it would be nice to clean this up.
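A minimal sketch of per-core run queues with work stealing might look like the following. `Thread`, `RunQueue` and `PerCoreScheduler` are hypothetical types, not northport's API, and `std::mutex` stands in for a kernel spinlock. Each core pops from the front of its own queue; when that is empty it steals from the back of the busiest other queue, which is just one simple rebalancing policy.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Thread { int id; }; // hypothetical thread handle

struct RunQueue
{
    std::mutex lock;              // stand-in for a kernel spinlock
    std::deque<Thread*> threads;
};

class PerCoreScheduler
{
    std::vector<RunQueue> queues; // one queue per core

public:
    explicit PerCoreScheduler(size_t coreCount) : queues(coreCount) {}

    void Enqueue(size_t core, Thread* t)
    {
        std::lock_guard<std::mutex> guard(queues[core].lock);
        queues[core].threads.push_back(t);
    }

    // Prefer this core's own work; fall back to stealing if it has none.
    Thread* Next(size_t core)
    {
        {
            std::lock_guard<std::mutex> guard(queues[core].lock);
            if (!queues[core].threads.empty())
            {
                Thread* t = queues[core].threads.front();
                queues[core].threads.pop_front();
                return t;
            }
        }
        return Steal(core);
    }

private:
    Thread* Steal(size_t thief)
    {
        size_t victim = thief;
        size_t busiest = 0;
        for (size_t i = 0; i < queues.size(); i++)
        {
            if (i == thief)
                continue;
            std::lock_guard<std::mutex> guard(queues[i].lock);
            if (queues[i].threads.size() > busiest)
            {
                busiest = queues[i].threads.size();
                victim = i;
            }
        }
        if (victim == thief)
            return nullptr; // every other queue was empty

        std::lock_guard<std::mutex> guard(queues[victim].lock);
        if (queues[victim].threads.empty())
            return nullptr; // drained since we looked
        Thread* t = queues[victim].threads.back();
        queues[victim].threads.pop_back();
        return t;
    }
};
```

Stealing from the opposite end to the owner keeps the victim's most recently queued work furthest from its own next pick, which reduces contention on the same entries.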

Documentation: part 2

A lot of the kernel and libraries have received significant updates since the last documentation push.
Things that are missing:

VFS
Driver management subsystem (and how to contribute drivers)
High level overview of the kernel systems
SMP docs
PCI

Also, with the build system refactor, a lot of the links to source files have broken. Let's fix these.
It would also be nice to include some code and style guidelines.

Cleanup PlatformInit() into schedulable tasks.

A lot of what's in PlatformInit() (KernelMain.cpp) is quite general. Things like initializing PCI, potentially drivers, and other devices such as PS2 controllers don't really need to happen in the pre-scheduler environment.

It'd be a good test of the scheduler (and potentially faster?) to have this done as part of tasks that run under the scheduler.

Remove accidental HHDM usage

The HHDM maps all usable physical memory (or up to 4G, whichever is higher); however, we also use it for accessing other things by their physical address. So far this hasn't caused any issues, but it's still a bug.

Current known accesses outside of HHDM:

ACPI tables
PCI config space when using ECAM.
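One way to catch these accesses is to route physical-to-virtual translation through a check against the HHDM's extent, so ACPI tables or ECAM regions outside it fail loudly instead of silently working. This is a minimal sketch: `HhdmInfo` and `HhdmTranslate()` are hypothetical names, and the base/length would come from the bootloader.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical description of the HHDM, filled in from bootloader info.
struct HhdmInfo
{
    uintptr_t base;   // virtual address where physical 0 is mapped
    uint64_t length;  // bytes of physical memory covered by the mapping
};

// Returns a virtual address only if the whole physical range lies inside the
// HHDM; otherwise the caller must create a dedicated mapping instead.
std::optional<uintptr_t> HhdmTranslate(const HhdmInfo& hhdm, uint64_t phys, uint64_t size)
{
    // Written as a subtraction so phys + size can't overflow.
    if (phys >= hhdm.length || size > hhdm.length - phys)
        return std::nullopt;
    return hhdm.base + phys;
}
```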

Cleanup syslib/include/Memory.h

This file contains memory-related functions, but also things like EnumHasFlag() and move/swap.
Let's move some of the utility functions out into Utility.h.
While we're at it, let's move Forward() (from CppStd.h) and Launder() there as well.

This would also be a good time to fix the capitalization of move/swap to Move/Swap.
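A freestanding Utility.h could look roughly like this. It's a sketch only: the `RemoveRef` trait is a hypothetical helper, and `Launder()` assumes the `__builtin_launder` intrinsic that both GCC and clang provide.

```cpp
// Hypothetical sketch of syslib/include/Utility.h (no standard library needed).
namespace sl
{
    template<typename T> struct RemoveRef { using Type = T; };
    template<typename T> struct RemoveRef<T&> { using Type = T; };
    template<typename T> struct RemoveRef<T&&> { using Type = T; };

    // Casts to an rvalue reference so move constructors/assignments are chosen.
    template<typename T>
    constexpr typename RemoveRef<T>::Type&& Move(T&& t) noexcept
    { return static_cast<typename RemoveRef<T>::Type&&>(t); }

    // Perfect forwarding: preserves the value category deduced by the caller.
    template<typename T>
    constexpr T&& Forward(typename RemoveRef<T>::Type& t) noexcept
    { return static_cast<T&&>(t); }

    template<typename T>
    constexpr T&& Forward(typename RemoveRef<T>::Type&& t) noexcept
    { return static_cast<T&&>(t); }

    template<typename T>
    constexpr void Swap(T& a, T& b)
    {
        T temp = Move(a);
        a = Move(b);
        b = Move(temp);
    }

    template<typename T>
    constexpr T* Launder(T* ptr) noexcept
    { return __builtin_launder(ptr); }
}
```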

IdAllocator: Add a retired IDs queue

Low-hanging fruit here! IdAllocator sees a lot of use, however it immediately reuses any freed IDs. It'd be nice to have a queue of IDs that have been freed but won't be reallocated until a certain condition is met. I'm thinking we keep a steady queue length of some amount (maybe 64-128?) of IDs, and release the oldest retired ID back to the allocator whenever another ID is freed.
Perhaps we add a constructor option for whether to use this feature, since it will slow down allocating/freeing IDs.
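A sketch of that idea, assuming "flush" means releasing the oldest retired ID once the queue exceeds its steady length. This class is illustrative and is not northport's IdAllocator; a queue length of 0 disables the feature, matching the constructor option suggested above.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

class IdAllocator
{
    size_t next = 0;
    std::vector<size_t> freeIds;  // IDs that may be handed out again
    std::deque<size_t> retired;   // recently freed IDs, not yet reusable
    size_t retireLength;          // 0 disables the retired queue

public:
    explicit IdAllocator(size_t queueLength = 0) : retireLength(queueLength) {}

    size_t Alloc()
    {
        if (!freeIds.empty())
        {
            size_t id = freeIds.back();
            freeIds.pop_back();
            return id;
        }
        return next++;
    }

    void Free(size_t id)
    {
        if (retireLength == 0)
        {
            freeIds.push_back(id); // old behaviour: immediate reuse
            return;
        }
        retired.push_back(id);
        // Once the queue grows past its steady length, the oldest ID is released.
        if (retired.size() > retireLength)
        {
            freeIds.push_back(retired.front());
            retired.pop_front();
        }
    }
};
```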

Kernel crashes with SMP enabled at high core counts

This looks like a heisenbug. So far I've been able to reproduce it on my desktop (16 cores), while my laptop (also 16 cores) seems to be fine, but this may be timing-related as the mobile processor is a lot slower. The kernel does work on the same machines when only using a single core, and increasing the number of cores in use seems to make the crash happen sooner.
This can happen in KVM too, although it's a lot rarer, and adding print statements to the scheduling logic seems to remove the issue outright.

For reference, there are only a handful of places where multiple cores interact with each other:

The clock: when an AP adds a new clock event, it accesses the global event list.
The scheduler: each core operates mostly independently, but [en/de]queuing threads can happen across cores, and work stealing will access (atomically, so unlikely to be the issue) another core's work queue. Idle threads are unique to each core, so they shouldn't be a problem.
IPI mailboxes: any core can access any core's mailbox (including its own).

Build system portability issues

I've made use of echo -e in the build system to make the output a bit friendlier, but this is not POSIX-compliant usage of echo. Some shells may (and do) break with this usage, including the shell used for GitHub CI. This results in stray characters being emitted everywhere.

This brings up an important question of what other assumptions are made within the build system when it comes to being portable.

Make use of global pointer on riscv

Initially I had decided not to use the global pointer, but since it's saved and restored at every entry point into the kernel, it'd be a waste not to use it.

Deprecate C-Strings in syscalls, use K-Strings instead.

Currently a number of the system calls take C-style strings as arguments; let's update them to use k-strings instead.

FormatPrinter has unhandled cases

FormatPrinter (the class that handles printf-style formatting) currently outputs 'TOKEN' in place of all tokens. We should replace this so it's actually functional.
This can be done in stages:

Integers (signed and unsigned)
Floating point numbers (single, double and extended precision)
Misc specifiers (string, single char, output char count)
Custom specifiers (bool).
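For the first stage (integers), a small digit-conversion helper may be enough. This is a sketch with hypothetical function names; it writes digits without a null terminator and returns 0 if the buffer is too small. Signed values are negated in unsigned arithmetic so `INT64_MIN` doesn't overflow.

```cpp
#include <cstddef>
#include <cstdint>

// Writes `value` in `base` (2..36) into `buffer`; returns characters written.
size_t FormatUnsigned(uint64_t value, unsigned base, char* buffer, size_t bufferLen)
{
    if (base < 2 || base > 36)
        return 0;
    const char* digits = "0123456789abcdefghijklmnopqrstuvwxyz";
    char scratch[64]; // 64 binary digits is the longest possible output
    size_t count = 0;
    do
    {
        scratch[count++] = digits[value % base];
        value /= base;
    } while (value != 0);

    if (count > bufferLen)
        return 0;
    for (size_t i = 0; i < count; i++)
        buffer[i] = scratch[count - 1 - i]; // digits were produced in reverse
    return count;
}

size_t FormatSigned(int64_t value, unsigned base, char* buffer, size_t bufferLen)
{
    if (value >= 0)
        return FormatUnsigned(static_cast<uint64_t>(value), base, buffer, bufferLen);
    if (bufferLen == 0)
        return 0;
    buffer[0] = '-';
    const uint64_t magnitude = 0 - static_cast<uint64_t>(value);
    const size_t written = FormatUnsigned(magnitude, base, buffer + 1, bufferLen - 1);
    return written == 0 ? 0 : written + 1;
}
```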

Support load/save of floating point & vector registers.

This refers to the non-integer registers of any platform: it would be nice to store these alongside each runnable trap frame.
We probably want to define multiple sets of registers (say one program uses vector registers but not the FPU, or the opposite); that way we only need to save and restore what's used (or what's available).
We should also track the state of each set: never used (uninit), used but not modified since it was last saved (clean), or used with modified state (dirty).
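The per-set state tracking could be sketched as below. The register-set names and the `SaveIfNeeded()` helper are assumptions for illustration; the save callback stands in for whatever architecture-specific save instruction or routine would be used.

```cpp
#include <cstdint>

enum class ExtRegState : uint8_t
{
    Uninit, // never touched by this thread: nothing to save or restore
    Clean,  // saved copy in the trap frame matches the hardware
    Dirty,  // hardware holds newer values than the saved copy
};

enum class ExtRegSet : uint8_t { Fpu = 0, Vector = 1, Count = 2 };

struct ExtendedState
{
    ExtRegState state[static_cast<int>(ExtRegSet::Count)] {};

    // Called when the thread first uses (or modifies) a register set.
    void MarkDirty(ExtRegSet set) { state[static_cast<int>(set)] = ExtRegState::Dirty; }

    // On a context switch only dirty sets are saved; afterwards they are clean.
    template<typename SaveFn>
    void SaveIfNeeded(SaveFn save)
    {
        for (int i = 0; i < static_cast<int>(ExtRegSet::Count); i++)
        {
            if (state[i] != ExtRegState::Dirty)
                continue;
            save(static_cast<ExtRegSet>(i));
            state[i] = ExtRegState::Clean;
        }
    }
};
```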

Fix Timer Calibration

I'm not sure when this started, but occasionally the calibration time for a core will be way off.

The LAPIC timers seem to be calibrated correctly on each core, according to some quick tests, so it's likely some kind of PIT or conversion error.
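When checking the conversion, it may help to write the arithmetic down explicitly. The PIT's input clock runs at 1193182 Hz, so the LAPIC frequency is the LAPIC ticks counted during the window, multiplied by that rate and divided by the PIT ticks waited. Multiplying before dividing keeps precision. The helper names here are hypothetical. Note that a 10 ms window truncates to 11931 PIT ticks (from 11931.82), an error of roughly 0.007%, so truncation alone shouldn't produce a result that's way off.

```cpp
#include <cstdint>

// The PIT's input clock is fixed at 1193182 Hz.
constexpr uint64_t PitFrequencyHz = 1193182;

// PIT ticks needed for a calibration window of `ms` milliseconds (truncated).
constexpr uint64_t PitTicksForMs(uint64_t ms)
{
    return PitFrequencyHz * ms / 1000;
}

// Estimate the LAPIC timer frequency from the LAPIC ticks that elapsed while
// the PIT counted `pitTicks`. 64-bit intermediates avoid overflow for any
// realistic tick counts.
constexpr uint64_t LapicFrequencyHz(uint64_t lapicTicks, uint64_t pitTicks)
{
    return pitTicks == 0 ? 0 : lapicTicks * PitFrequencyHz / pitTicks;
}
```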

C++ name demangling

Elf64HeaderParser can provide symbol names, but their usefulness varies depending on the encoding of the symbol.
It would be nice to provide a function sl::String Unmangle(), to display more useful information.

Better use of sl::EnumHasFlag()

This function was introduced as a nice way of working with strongly typed enum classes without breaking the illusion of C++'s enum class (which this function ironically undoes).
There are likely a few places this function could be used to improve readability, and we should mark it as FORCE_INLINE to eliminate the slight call overhead.
A simple regex or manual search should do the trick.
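For reference, a force-inlined version might look like this. The exact signature of `sl::EnumHasFlag()` and the definition of `FORCE_INLINE` are assumptions; `[[gnu::always_inline]]` is accepted by both GCC and clang. `PageFlags` is a made-up enum for the example.

```cpp
#include <type_traits>

// Hypothetical definition; the kernel's own macro may differ.
#define FORCE_INLINE [[gnu::always_inline]] inline

namespace sl
{
    // True if every bit set in `flag` is also set in `value`.
    template<typename E>
    FORCE_INLINE constexpr bool EnumHasFlag(E value, E flag)
    {
        using U = std::underlying_type_t<E>;
        return (static_cast<U>(value) & static_cast<U>(flag)) == static_cast<U>(flag);
    }
}

enum class PageFlags : unsigned { None = 0, Write = 1 << 0, User = 1 << 1, Execute = 1 << 2 };
```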

Kernel page tables can fall out of sync

Currently all page tables are cloned from the kernel's page tables, and the bottom half is used for programs. If kernel code modifies mappings in any level except the top, this is fine, as those are shared structures.
However, if paging data is modified at the top level of the structure (PML4, or PML5 if LA57 is enabled), the change won't be propagated to the other kernel page tables (those belonging to other processes).

One solution is to populate the entire higher half of the top-level table up front, and then leave page faults to occur at top_level - 1. This would use slightly more initial memory, as the top_level - 1 tables would need to be allocated (256 tables of 4 KiB, ~1MB reserved). This is a one-time cost, since these tables would be referenced by every top-level paging structure. Cloned tables incur no extra memory cost, as a full top-level table is required anyway and would simply point to these already-created structures.
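The idea can be modelled with a simplified structure. This is not a real page table format: `Table` just holds 512 next-level pointers, and `KernelTables`/`CloneKernelHalf()` are hypothetical names. Because the higher-half top-level entries are created once and never change, later kernel mappings made in the shared lower levels are visible through every process's table.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified model: each table has 512 entries pointing at next-level tables.
struct Table { std::array<Table*, 512> entries {}; };

constexpr size_t HigherHalfStart = 256; // entries 256..511 cover the higher half

struct KernelTables
{
    Table top;
    std::vector<std::unique_ptr<Table>> owned; // pre-allocated top_level - 1 tables

    // One-time cost: 256 tables of 4 KiB each, 1 MiB in total.
    KernelTables()
    {
        for (size_t i = HigherHalfStart; i < 512; i++)
        {
            owned.push_back(std::make_unique<Table>());
            top.entries[i] = owned.back().get();
        }
    }
};

// A process's top-level table simply copies the fixed higher-half entries.
void CloneKernelHalf(const KernelTables& kernel, Table& process)
{
    for (size_t i = HigherHalfStart; i < 512; i++)
        process.entries[i] = kernel.top.entries[i];
}
```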

Clang builds are extremely broken.

Building the kernel with clang causes it to break at InitPlatform().
Ideally I'd like to support both GCC and clang.

SimpleFramebuffer::DrawLine() implementation

This needs an implementation!
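One option is Bresenham's line algorithm, which only needs integer arithmetic. In this sketch the pixel callback stands in for SimpleFramebuffer::DrawPixel(); that signature is an assumption. The error-term form below handles all octants, including vertical and horizontal lines.

```cpp
#include <cstdlib>

// Bresenham's integer line algorithm, calling drawPixel(x, y) for each point
// from (x0, y0) to (x1, y1) inclusive.
template<typename PixelFn>
void DrawLine(int x0, int y0, int x1, int y1, PixelFn drawPixel)
{
    const int dx = std::abs(x1 - x0);
    const int dy = -std::abs(y1 - y0);
    const int stepX = x0 < x1 ? 1 : -1;
    const int stepY = y0 < y1 ? 1 : -1;
    int error = dx + dy;

    while (true)
    {
        drawPixel(x0, y0);
        if (x0 == x1 && y0 == y1)
            break;
        const int e2 = 2 * error;
        if (e2 >= dy) { error += dy; x0 += stepX; } // step along x
        if (e2 <= dx) { error += dx; y0 += stepY; } // step along y
    }
}
```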

Mysterious stack corruption

Currently unable to reproduce this bug myself; known to be present in 04de59a and older. The OS was run using a standard Ubuntu LTS install (QEMU 4.2).

Output of readelf -e startup.elf here: https://pastebin.com/NqKK3hsx
Output of addr2line on faulting address:

$ addr2line -e initdisk/apps/startup.elf -a 0x2025f9
0x00000000002025f9
/home/xyz/Documents/Github/northport/libs/np-syscall/SyscallFunctions.cpp:11

Logging output of the kernel from the crash:

System call tidying

A lot has been learnt since first implementing most of the system calls, and I think it's time to do a pass over the API. Things like enabling/disabling device events don't really need to be 2 separate calls and could be combined.
Also, artificially limiting ourselves to 16 calls per group doesn't leave much room for future expansion, and can lead to having 2 or more groups of the same type fragmented across the API. Let's split the ID into 2 parts: the upper 32 bits are the group, and the lower 32 bits are the call ID within that group. This completely removes the jump table option (although we could still use a table of tables).
There's also currently no version management; I'd like to support this at the group level.
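The proposed ID split and a table-of-tables dispatch could be sketched as follows. All names (`MakeSyscallId`, `SyscallGroupTable`, `Dispatch`) and the handler signature are hypothetical; the per-group `version` field reflects the versioning idea but isn't checked here.

```cpp
#include <cstddef>
#include <cstdint>

// Upper 32 bits select the group, lower 32 bits the call within it.
constexpr uint64_t MakeSyscallId(uint32_t group, uint32_t call)
{
    return (static_cast<uint64_t>(group) << 32) | call;
}

constexpr uint32_t SyscallGroup(uint64_t id) { return static_cast<uint32_t>(id >> 32); }
constexpr uint32_t SyscallCall(uint64_t id) { return static_cast<uint32_t>(id); }

using SyscallHandler = int64_t (*)(uint64_t arg0);

// Each group owns its own handler array, so groups can grow independently.
struct SyscallGroupTable
{
    uint32_t version;
    const SyscallHandler* calls;
    size_t callCount;
};

int64_t Dispatch(const SyscallGroupTable* groups, size_t groupCount, uint64_t id, uint64_t arg0)
{
    const uint32_t group = SyscallGroup(id);
    const uint32_t call = SyscallCall(id);
    if (group >= groupCount || call >= groups[group].callCount)
        return -1; // unknown syscall
    return groups[group].calls[call](arg0);
}
```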

Support clock events with expiry longer than clock period

Currently the system clock allows adding events with an expiry time longer than the period of the backing clock.
This could lead to undefined results, depending on how the underlying hardware handles the write (it may simply overflow).

One possible solution is to allow the system timer to report its maximum period, and compare this against the next value to be used for a timer interrupt. If it's too long, we inject fake events at the head of the list, as many as necessary until the target time is reached.

This might also be a good time to look at adding new events to the front of the queue, which would require cancelling and then resetting the system timer. This should be doable in most cases.
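The fake-event approach amounts to splitting a long expiry into hops no longer than the timer's maximum period. This sketch assumes the maximum period is reported in the same units as the expiry; `SplitExpiry()` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Every hop except the last is a fake event that just re-arms the timer;
// the last hop fires the real event.
std::vector<uint64_t> SplitExpiry(uint64_t expiry, uint64_t maxPeriod)
{
    std::vector<uint64_t> hops;
    if (maxPeriod == 0)
        return hops;
    while (expiry > maxPeriod)
    {
        hops.push_back(maxPeriod);
        expiry -= maxPeriod;
    }
    hops.push_back(expiry);
    return hops;
}
```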

IO APIC will never write NMIs

847f221 fixes the problem of accidentally overwriting NMI redirect entries, but now we can never write them at all.

My proposed solution is to construct the redirect entry manually, and then use IoApic::WriteReg(entryNum, redirect) to skip over the protection logic. This is fine since it's coming from inside the I/O APIC init code, where we can reasonably assume we know what we're doing.
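Building the entry by hand could look like this. The field layout follows the 82093AA I/O APIC datasheet: vector in bits 7:0, delivery mode in 10:8 (0b100 is NMI), destination mode in bit 11, polarity in bit 13, trigger mode in bit 15, mask in bit 16, and destination in 63:56. `MakeNmiRedirect()` is a hypothetical helper; its result is what would be handed to the unprotected write path described above.

```cpp
#include <cstdint>

constexpr uint64_t DeliveryModeNmi = 0b100;

constexpr uint64_t MakeNmiRedirect(uint8_t destApicId, bool activeLow)
{
    uint64_t entry = 0;
    entry |= DeliveryModeNmi << 8;              // NMI delivery; vector field is ignored
    entry |= (activeLow ? 1ull : 0ull) << 13;   // pin polarity
    // Bit 11 = 0: physical destination mode.
    // Bit 15 = 0: edge triggered (required for NMI delivery).
    // Bit 16 = 0: unmasked.
    entry |= static_cast<uint64_t>(destApicId) << 56;
    return entry;
}
```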

Fix booting with multiple cores corrupting heap.

A quick investigation suggests a heap node is overwriting part of the scheduler structure in the AP's core-local variables. This then causes the vector to think it has more entries than it does, overwriting the next header, and the corruption continues from there.

A better solution might be to write a slab allocator for the kernel and use that instead of the linked-list one, which avoids the issues with heap nodes. We could even do custom allocation like with the IDT setup; this would waste some space, but that's acceptable as we don't have many cores and their local variables aren't very big.
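A minimal fixed-size slab over a caller-provided buffer might look like the sketch below; it's illustrative, not the kernel's allocator. Free slots form an intrusive singly-linked list stored inside the slots themselves, so there are no separate per-allocation headers sitting next to user data. It assumes the buffer is suitably aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

class Slab
{
    struct FreeSlot { FreeSlot* next; };
    FreeSlot* head = nullptr;

public:
    Slab(void* buffer, size_t bufferSize, size_t slotSize)
    {
        // Every slot must be able to hold a FreeSlot and stay aligned.
        const size_t align = alignof(FreeSlot);
        if (slotSize < sizeof(FreeSlot))
            slotSize = sizeof(FreeSlot);
        slotSize = (slotSize + align - 1) / align * align;

        // Push slots in reverse so the first slot is handed out first.
        auto* base = static_cast<uint8_t*>(buffer);
        for (size_t i = bufferSize / slotSize; i > 0; i--)
            head = new (base + (i - 1) * slotSize) FreeSlot { head };
    }

    void* Alloc()
    {
        if (head == nullptr)
            return nullptr;
        FreeSlot* slot = head;
        head = slot->next;
        return slot;
    }

    void Free(void* ptr) { head = new (ptr) FreeSlot { head }; }
};
```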

Occasional bad symbol name inside of symbol store.

During the panic sequence a stack trace is printed, including symbol names. These names come from the kernel's symbol store, which also includes loaded drivers.
Rarely, an exception will trigger while printing the stack trace; this has been narrowed down to a page fault while trying to read a symbol name.
It'd be good to figure out where these are coming from.

Syslib: Expanded Template Library

With a proper userspace on the horizon it'd be nice to have more of the usual templates available. In particular:

map/unordered map
deque/queue
hashtable/hashmap

The existing vector and circular queue implementations could use improvements as well, mainly in their utility functions and iterators.

Scheduler fails to initialize entry task.

Unable to reliably reproduce this bug (which may hint at a race condition). When building for riscv64, the final core to call Scheduler::RegisterCore() fails to switch tasks immediately, causing the ASSERT_UNREACHABLE() that follows the call to fail (see CommonInit.cpp). This bug has been observed at all optimization levels and with both compilers.

Audit for race conditions within early kernel init.

The local APIC timer calibration has a race condition where multiple cores can reset the I/O APIC, interfering with each other's calibration.

A solution is in the works for this, however there may be more bugs lurking. Good thing to tackle after the scheduler refactor!

Enforce HHDM Upper Limit

This one is low priority.

Most things mapped within the kernel are either within the HHDM or mapped relative to it (the kernel heaps). Kernel stacks begin at the fixed address 0xffff'd555'0000'0000, which gives us a good ~1.3TiB of space for the HHDM before it would collide with the stacks. It's actually less, as there needs to be room for the heap, which occupies a few GiB of its own (multiple segments with guard pages in between).

This is unlikely to ever be a real issue, but I'd like to address it anyway. Possible solutions?

Ignore anything in the memory map above a fixed address. Ugly, but it would work.
Some kind of 'paged' HHDM, where to access memory address x we would access it at hhdm_base + (x % hhdm_limit), and if x / hhdm_limit > 1 we would modify the mapped pages so that they allow access to x. This is memory efficient, but would potentially require interaction with the VMM every time we tried to access HHDM memory.
Have two areas of the HHDM: the normal lower region, and a fixed-size upper region. I already don't like this one, haha. The upper region is only active when the lower region is full, and would function similarly to the previous option, where its actual mapping varies. In this case we could place the burden of managing this on whatever software is using it, i.e. drivers with insane MMIO mappings would need to handle that themselves.

`SetRsdp()` can truncate ACPI table accesses

We've assumed that all ACPI tables begin on a page boundary, as we're using the addresses returned from VMM::Kernel().Alloc(), which are aligned down to the nearest page.
This appears to be working fine under UEFI, but can break on some BIOS systems (namely QEMU + SeaBIOS).

This hints at a deeper issue with the KernelVmDriver, where MMIO mappings are assumed to be page-aligned, which is not always the case. We probably need to involve the virtual memory drivers in deciding how many pages are required for a VM range (this would also make it easy to transparently implement things like guard pages or page-heap).
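The core of the fix is computing the page span from aligned bounds rather than from the length alone: a range that starts mid-page can straddle one extra page. This sketch uses hypothetical names (`PageSpan`, `SpanForRange`) and assumes 4 KiB pages.

```cpp
#include <cstdint>

constexpr uint64_t PageSize = 0x1000;

constexpr uint64_t AlignDown(uint64_t value, uint64_t align) { return value & ~(align - 1); }
constexpr uint64_t AlignUp(uint64_t value, uint64_t align) { return AlignDown(value + align - 1, align); }

struct PageSpan
{
    uint64_t firstPage; // page-aligned base to map
    uint64_t pageCount; // pages needed to cover [addr, addr + length)
    uint64_t offset;    // add to the mapped base to reach addr
};

constexpr PageSpan SpanForRange(uint64_t addr, uint64_t length)
{
    const uint64_t base = AlignDown(addr, PageSize);
    const uint64_t top = AlignUp(addr + length, PageSize);
    return { base, (top - base) / PageSize, addr - base };
}
```

For example, a one-page table starting at 0xF5A40 needs two pages mapped from 0xF5000, and the returned pointer must be offset by 0xA40.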

C++ demangler has some strange edge cases.

I didn't do enough testing on this one, and it'll occasionally spit out some nonsense.
I think a big part of the problem is treating each scoped name entry as its own token, which can cause pointers to be displayed in the wrong place.

There are 2 paths forward with this:

We keep track of appended text (like a pointer), waiting until a scoped name has ended before printing it. I like this the most, as there's only 1 case of appending text; the rest are prepended. However, this isn't super maintainable, and more state means more room for unexpected errors.
We could also handle this in the input parser. When parsing a scoped name, we output the shorthand tokens as needed, but only output a single scoped name token, which could maintain a list of text items (start, len, all from the original source). This keeps the output simple, but adds complexity on the input side.

Change `sl::String` to accept `sl::StringSpan` for comparisons and other functions, instead of other strings

The title says it all. The original implementation of sl::String is quite old and predates the addition of spans to syslib. It'd be nice to flip this paradigm and have all of string's operations that require other strings take in string spans instead. To preserve existing code, we could add an implicit conversion from string to string span.
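A sketch of how that could fit together, using simplified stand-ins for `sl::StringSpan` and `sl::String` (the real classes' members and signatures are assumptions here). String operations accept a span, and an implicit `String` to `StringSpan` conversion keeps call sites that pass a `String` compiling.

```cpp
#include <cstddef>
#include <cstring>

namespace sl
{
    struct StringSpan
    {
        const char* data;
        size_t length;

        StringSpan(const char* str) : data(str), length(std::strlen(str)) {}
        StringSpan(const char* str, size_t len) : data(str), length(len) {}

        bool operator==(StringSpan other) const
        {
            return length == other.length && std::memcmp(data, other.data, length) == 0;
        }
    };

    class String
    {
        char* buffer;
        size_t length;

    public:
        String(StringSpan source) : buffer(new char[source.length + 1]), length(source.length)
        {
            std::memcpy(buffer, source.data, length);
            buffer[length] = '\0';
        }
        ~String() { delete[] buffer; }
        String(const String&) = delete;
        String& operator=(const String&) = delete;

        // Implicit conversion so existing code passing a String still works.
        operator StringSpan() const { return { buffer, length }; }

        bool operator==(StringSpan other) const
        {
            StringSpan self = *this;
            return self == other;
        }

        bool StartsWith(StringSpan prefix) const
        {
            return prefix.length <= length && std::memcmp(buffer, prefix.data, prefix.length) == 0;
        }
    };
}
```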

Fix increased boot times

I believe it's coming from mapping physical memory during PageTableManager::InitKernel(), likely during the 1GB page fill.

This really doesn't have to be like this; the bootloader doesn't have this problem. Perhaps there's a better solution.

Implement PageTableManager::MapRange/UnmapRange

These functions would speed up future development, and would be useful to have.
Their prototypes are already defined, but currently commented out, in kernel/memory/Paging.h

At the time of writing, MapRange will likely need PMM::AllocPages(), which is also currently unimplemented.

Better SimpleFramebuffer implementation

It does what it needs to, however it locks on every draw call. So if we're using DrawPixel to render complex shapes, we'll be locking on every call.
It'd be good to have overloads of the smaller functions that accept an already-held lock; that way we can lock once before the loop of DrawPixel calls and unlock after it.
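One way to shape such overloads is to require the held lock as a parameter, which makes it hard to call the unlocked path by accident. In this sketch `std::mutex`/`std::unique_lock` stand in for the kernel's lock types, and `Framebuffer` is a toy class rather than SimpleFramebuffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class Framebuffer
{
    std::mutex lock;
    size_t width, height;
    std::vector<uint32_t> pixels;

public:
    using Guard = std::unique_lock<std::mutex>;

    Framebuffer(size_t w, size_t h) : width(w), height(h), pixels(w * h) {}

    // Convenience overload: takes the lock for a single pixel.
    void DrawPixel(size_t x, size_t y, uint32_t color)
    {
        Guard guard(lock);
        DrawPixel(x, y, color, guard);
    }

    // Caller already holds the lock (proven by passing the guard).
    void DrawPixel(size_t x, size_t y, uint32_t color, const Guard&)
    {
        if (x < width && y < height)
            pixels[y * width + x] = color;
    }

    Guard Lock() { return Guard(lock); }

    uint32_t Read(size_t x, size_t y)
    {
        Guard guard(lock);
        return pixels[y * width + x];
    }
};
```

A caller drawing a shape would take `auto guard = fb.Lock();` once, then pass `guard` to each `DrawPixel` call inside the loop.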

Finish support for x86 vector registers

A lot of the groundwork has been done. SSE is currently enabled, but no state is saved on context switches. We would also want to stub out a handler for SIMD exceptions should they occur.

Once this is enabled we can enable SSE instructions in some of the userspace libraries, or all of them if we do 2 builds (1 kernel - no sse, 1 userspace - all features).

Have option for page manager to free physical pages

MapMemory() can allocate physical frames if needed, however it won't free them. I don't really want to store this info; I'd rather it be explicitly specified, with a reasonable default (yes, free the physical page).

UnmapRange also has the problem of freeing page frames that are non-consecutive: we have no way to return this info to the caller. Therefore we should always free physical frames unless explicitly told not to (maybe it's shared memory).

TryGetUInt/TryGetInt should support all reasonable number bases

Currently these methods determine the base of the incoming number via bool isHex. It would be nice to use a system similar to IntToString(), which takes the number base as a parameter.

Low priority for now, would be nice.
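A base-parameterised parser could look like this sketch. The signature (pointer plus length, returning `std::optional`) is an assumption rather than the existing TryGetUInt API. It rejects empty input, invalid digits and overflow; the overflow test `result > (UINT64_MAX - digit) / base` is equivalent to `result * base + digit > UINT64_MAX` without overflowing itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

std::optional<uint64_t> TryGetUInt(const char* str, size_t len, unsigned base)
{
    if (len == 0 || base < 2 || base > 36)
        return std::nullopt;

    uint64_t result = 0;
    for (size_t i = 0; i < len; i++)
    {
        const char c = str[i];
        unsigned digit;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'z')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'Z')
            digit = c - 'A' + 10;
        else
            return std::nullopt;

        if (digit >= base)
            return std::nullopt;
        if (result > (UINT64_MAX - digit) / base)
            return std::nullopt; // result * base + digit would overflow
        result = result * base + digit;
    }
    return result;
}
```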

Global Kernel Config

This is not going to replace a lot of the local config variables around the place, but it would be nice to have a centralized place for all kernel settings. I'd like all compile-time and run-time config to end up in one central place, with each overriding the other in a reasonable way. This would also be a good excuse to start accepting command-line arguments, and have those contribute to the config.

Since this is likely going to introduce several breaking changes, let's also set some rules on how the various config names are generated, the intent being that if you know the compile-time name of something, you could easily guess the command-line name of the same option.

We could also include things like the build [id|time], compiler and some other fun info.

Documentation

It's time. For the first stage I'd like to have it integrated into the build system, and have copies built (and published in the releases section) by GitHub CI on new commits.

Topics to cover:

Project build system, how to use it, how to extend it.
Source code layout
High level overview of kernel
Logging subsystem
Init sequence
PMM
VMM (in its current state)
Kernel heap
Supported platforms and their quirks
Scheduler
Clocks/timers
VFS

Drivers and the device API are still WIP so they are notably left out.
It'd be nice to include a separate section on the rationale behind certain design decisions (in a less formal style).

Retire use of __attribute__(())

Most of the code uses the newer [[attribute_here]] syntax, however there are still a few uses of __attribute__((attribute_here)). It'd be good to do a pass over everything, make it all consistent!
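As an illustration of the conversion, here is the same packed struct in both spellings; the struct names are made up for the example. GCC and clang accept GNU attributes through the `gnu::` namespace in the standard syntax, so the layout is unchanged.

```cpp
#include <cstdint>

// Older GNU spelling, to be retired:
struct __attribute__((packed)) LegacyHeader { uint8_t type; uint32_t length; };

// Standard attribute syntax with the gnu:: namespace:
struct [[gnu::packed]] Header { uint8_t type; uint32_t length; };

static_assert(sizeof(LegacyHeader) == sizeof(Header), "both spellings produce the same layout");
static_assert(sizeof(Header) == 5, "no padding between fields");
```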

Deadlock in QueueClockEvent

QueueClockEvent can be called inside an interrupt handler (in response to a timer interrupt firing), and if the event is periodic it will be re-queued. If the interrupted code was holding one of the heap locks, we may deadlock. This deadlock is pretty hard to hit, as the allocation would use the smallest slab, which is mostly unused by the rest of the kernel, but it's still possible.

The line in question is: https://github.com/DeanoBurrito/northport/blob/master/kernel/tasking/Clock.cpp#L91

One possible solution would be to not free the memory when initially dequeuing the event (which is another possible deadlock), and then reuse this memory when re-adding it later. This seems fine but isn't scalable if we hit the same issue again in the future; maybe we add some sort of magazine allocator which caches extra entries?