Comments (11)
@jhogberg I will be back to this soon.
I think yeah, theoretically CPU can prefetch pages so I guess it would not be safe to do that, which is a pity as with block flushing it would be theoretically pretty fast even if there is a need to do a syscall to propagate the flushing to all threads.
But still, I'm not 100% sure, maybe CPUs would only prefetch i-cache lines that are reachable from code - but this is not something I would want to do if there is uncertainty as I really don't like random hard to repro crashes due to this.
from asmjit.
Flush instruction cache after the code has been written (done by
ProtectJitReadWriteScope
destructor).
To add another wrinkle to this, it's not enough to do this on the thread that modified the code in a multi-threaded environment. For it to work properly one needs to issue an instruction synchronization barrier on all threads that will execute the code (either by using a serializing instruction like cpuid
on x86
or isb
on aarch64
).
We ran into this recently on aarch64
, doing it wrong worked for an embarrassingly long time because the way we synchronized other code-related things made us hit implicit barriers all the time (mainly syscalls and being scheduled out).
Is the current API sufficient? How to support environments where it's not sufficient?
We find it sufficient, but if JitAllocator
gave us less freedom to cross-modify running code we'd either start using VirtMem
directly or roll that on our own too. We're using it out of convenience and wouldn't lose much if you started making drastic changes in this area, it would be a bit of extra work but we've been thinking about doing that anyway to get more control over the memory layout.
Is the new API design usable from user's perspective?
I think so, nearly all uses of JitAllocator
are followed by a copy and instruction cache flush anyway so wrapping that in a write()
function is just more convenient.
Which new techniques could be added to the new API to make it fit more use cases?
To go back to the instruction cache stuff, I think people would appreciate something to make multi-threaded instruction synchronization easier to handle. You don't have to do anything as funky as cross-modifying running code to run into issues, it's enough to free code and then generate code at the same address.
from asmjit.
@jhogberg That's the point of this discussion to actually design the new API in a way so you would not need to create your own abstraction on top of
VirtMem
- although it's already great that you would do it on top ofVirtMem
and not creating one from scratch :)
Sure, I just wanted you to know that if our requirements place too many strange constraints on the new design, it wouldn't be too much effort to change things on our end.
I think the allocator should track removed code in each block and if adding code to a block in which the code has been removed we should issue this barrier
The tricky bit is issuing it on all (other) threads that might execute code. MacOS and Windows settle this internally behind their instruction cache flushing APIs, on Linux you can use sys_membarrier(2)
, but other platforms require help from the user.
from asmjit.
@jhogberg I think one solution to have a cross-platform barrier could be having something like a counter that would be incremented every time something is removed from executable memory and a place where each thread would call something like VirtMem::maybeSynchronizeBarrier()
(or something nicer) that would compare a global counter with a thread local one and issue that barrier if there is a mismatch (and then update the thread local counter) - however, it's true that this would be on the user to call, not sure if it's user friendly though, however, if there is no way to ask the OS to do that I'm unsure how else this could potentially be solved.
Additionally, if this cannot be solved, we can just forbid removing already allocated code by just marking that place as a waste and releasing the whole mapped block once all functions in that block are removed (but this could be a waste in cases that code is removed, but also stays there).
I think that reviewing the jit_alloc
branch would be nice, as the only missing thing there is a multiple write scope.
from asmjit.
however, it's true that this would be on the user to call, not sure if it's user friendly though, however, if there is no way to ask the OS to do that I'm unsure how else this could potentially be solved.
There's no better alternative that I know of, so it's probably best just to bite the bullet. I think having the write functions return an error code to flag this condition would be enough. Thankfully the most popular platforms have syscalls to handle it without manual intervention so most people won't ever see it.
I think that reviewing the
jit_alloc
branch would be nice, as the only missing thing there is a multiple write scope.
It looks good, though people might want a cheaper write variant that skips the instruction barrier when it's known that it isn't necessary, e.g. changing a b
or b.cond
branch target on aarch64
, which is one of a few alterations that the architecture explicitly allows without barriers (see section B2.2.5 Concurrent modification ...
in the reference manual if you're curious).
from asmjit.
I have updated the https://github.com/asmjit/asmjit/tree/jit_alloc branch.
Still not sure whether the API is okay, basically it's now possible to control the cache flush via a parameter, and there is now a WriteScope, which is not really doing anything extra - just about designing a sane API at the moment.
I'm still thinking how to model better the flushInstructionCache()
and CachePolicy
- in the end this should be configurable whether to flush caches in all threads or whether local flush is okay.
I also thought about an idea of flushing cache before writing to executable memory, in batches. For example, JitAllocator allocates a new block, marks all segments as dirty, and then when allocating memory it would flush in batches, like 4kB, 8kB, 16kB - and then mark these regions as flushed. Then when you again use JitAllocator::alloc
and it gives you a span which region was already flushed, it would be marked as NoFlush
(or something smarter) and the writer would know to not flush that one. In theory, if a region of executable memory was flushed, no thread ever executed it afterwards, then it should be safe to write code into it without flushing, right?
from asmjit.
@jhogberg That's the point of this discussion to actually design the new API in a way so you would not need to create your own abstraction on top of VirtMem
- although it's already great that you would do it on top of VirtMem
and not creating one from scratch :)
I don't plan to remove features from JitAllocator
actually, I just think that a bit different design could solve a lot of issues while offering a very similar set of features. If we settle on spans and offer a write()
API and write scopes, then JitAllocator can be refactored at any time to offer something new without having to touch any code that uses it.
For example I have lately found a function pthread_jit_write_with_callback_np()
- if a new API is added to JitAllocator
that would allow the callback (as already drafted in the issue) then we would be able to support pthread_jit_write_with_callback_np()
out of box and the application would be able to even use PTHREAD_JIT_WRITE_ALLOW_CALLBACKS_NP()
to even tighten the security, etc...
BTW I have already implemented some bits yesterday, you can check out this branch - impact on user code is minimal:
BTW that "instruction synchronization barrier" is a good point. I think the allocator should track removed code in each block and if adding code to a block in which the code has been removed we should issue this barrier. I will look into this, but most likely I will start adding this feature to jit_alloc branch. BTW testing this would be fun as well.
from asmjit.
Still not sure whether the API is okay, basically it's now possible to control the cache flush via a parameter, and there is now a WriteScope, which is not really doing anything extra - just about designing a sane API at the moment.
I think it looks good, I can't think of anything we'd need beyond that.
In theory, if a region of executable memory was flushed, no thread ever executed it afterwards, then it should be safe to write code into it without flushing, right?
I'm not so sure, couldn't the processor speculatively pre-fetch some of those cache lines in between flushing and reading (for whatever reason)?
from asmjit.
But still, I'm not 100% sure, maybe CPUs would only prefetch i-cache lines that are reachable from code - but this is not something I would want to do if there is uncertainty as I really don't like random hard to repro crashes due to this.
I don't think it's a good idea either, even if the processor would restrict instruction prefetching to code that's reachable, there's often data alongside the code (e.g. dispatch/lookup tables) which could potentially cause it to pre-fetch out of bounds. If some important data is then present in the pre-fetched region, we'd go off the rails just the same.
from asmjit.
I'm ready to merge jit_alloc
branch. It's an ABI break, but not source break, so we should slowly move in this direction and see whether this would be a sufficient design or not.
from asmjit.
Jit allocator changes were already merged: 8e2f4de
from asmjit.
Related Issues (20)
- CI Service for RISC-V port HOT 2
- How to call a external function with x86::Compiler?
- How can I convert a byte array to assembly code? HOT 3
- parse "faddp st1, st0" and "fsubp st1, st0" instruction failed HOT 1
- Unexpected error when load label address to register in AArch64 HOT 3
- InvalidDisplacement error when use loop instruction HOT 3
- Calling void function on a64 (android) not working HOT 2
- LAHF issue with x86_64 compiler HOT 2
- Return "Segmentation fault" when use std instruction HOT 4
- Jmp to label size HOT 3
- How to load the address of a label to a register? HOT 1
- "memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set" HOT 9
- No matter what I do. I am not able to get it working in my C++ CLI project. HOT 3
- Build Problem: Android Ndk build failed HOT 1
- Invalid movzx instruction generated by x86 EmitHelper::emitArgMove HOT 8
- Liveness analysis differs in Release mode vs Debug mode HOT 19
- calling names when use call instruction HOT 2
- Are the generated ISA databases still in the public domain? HOT 3
- Ports Meta Issue
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from asmjit.