hermannloose / cfcss

Futzing around with LLVM trying to implement compiler support for control-flow checking with software signatures (CFCSS) for the Fiasco micro-kernel. This is the topic of my "Großer Beleg" thesis in operating systems.

Shell 2.92% C 2.89% C++ 94.19%

cfcss's Issues

Refactor to make SplitAfterCall obsolete.

Since call instructions in LLVM don't terminate basic blocks, the decision was originally made to split blocks after call instructions in order to treat all blocks uniformly. That uniformity is not achieved in practice: the two categories of basic blocks draw their predecessors from different locations (normal predecessors within the function vs. return blocks of called functions), and this difference is not hidden from InstrumentBasicBlocks. The result is a lot of branching, which hurts readability.

Call instructions are already handled separately, all in one go, after basic block instrumentation. It would be easier and more readable to not split after call instructions but instead instrument both before and after them, with a clear notion of "this is control flow between functions".

InstrumentBasicBlocks fails for basic blocks other than the entry block that have no predecessors.

The assertion in InstrumentBasicBlocks.cpp:220 that checks for the presence of an authoritative predecessor fails for basic blocks other than the entry block that do not have predecessors.

I'm not sure what causes these blocks to be generated in the first place, but I stumbled upon this when instrumenting libgit2. A workaround is to run opt with -simplifycfg before instrumentation, which could probably be requested in InstrumentBasicBlocks::getAnalysisUsage().
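To make the failing case concrete, here is a minimal sketch that finds such blocks in a toy CFG. It does not use the LLVM API; `ToyCFG` and `blocksWithoutPredecessors` are hypothetical names introduced only for illustration, and block 0 is assumed to be the entry block.

```cpp
#include <cstddef>
#include <vector>

// Toy CFG (not LLVM): block 0 is the entry block,
// succs[i] lists the successors of block i.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Returns the non-entry blocks that no edge points to. These are the
// blocks for which no authoritative predecessor can be found.
std::vector<std::size_t> blocksWithoutPredecessors(const ToyCFG &succs) {
  std::vector<bool> hasPred(succs.size(), false);
  for (const auto &targets : succs)
    for (std::size_t t : targets)
      hasPred[t] = true;

  std::vector<std::size_t> orphans;
  for (std::size_t b = 1; b < succs.size(); ++b) // skip the entry block
    if (!hasPred[b])
      orphans.push_back(b);
  return orphans;
}
```

A check like this could be used to skip or report such blocks instead of tripping the assertion, assuming they really are dead code.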

Add support for function pointers.

Calls without an associated function are currently ignored. The possibility of supporting function pointers should be investigated.

One possible (pessimistic?) assumption would be that all calls through pointers with a given signature can go to all functions with that signature that have had their addresses taken, although that might not rule out situations where calls through function pointers leave an instrumented module, causing the signature check upon return to fail.
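The pessimistic assumption could be sketched roughly as follows. This is a toy model rather than LLVM code: `ToyFunction`, its fields, and `possibleIndirectTargets` are hypothetical, and the type is represented as a plain string.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-function facts a pass would query.
struct ToyFunction {
  std::string name;
  std::string type;  // function type rendered as text, e.g. "i32 (i8*)"
  bool addressTaken; // true if the function's address escapes somewhere
};

// Maps each function type to every address-taken function of that type.
// A call through a pointer of type T is then assumed to reach any of them.
std::map<std::string, std::vector<std::string>>
possibleIndirectTargets(const std::vector<ToyFunction> &functions) {
  std::map<std::string, std::vector<std::string>> targets;
  for (const ToyFunction &f : functions)
    if (f.addressTaken)
      targets[f.type].push_back(f.name);
  return targets;
}
```

Functions outside the instrumented module never show up in this map, which is exactly the case where the signature check upon return would still fail.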

Split externally visible functions into a gateway and the actual implementation.

Externally visible functions might be called from code unaware of CFCSS and therefore receive garbage when reading either GSR or D. Not doing a signature check to work around this throws away some of the benefits of CFCSS, especially when the function is recursive and a lot of CFCSS-aware calls follow after the initial one.

We can provide for both cases with reasonably small overhead by moving the function body to a new function only visible internally and leaving the original function as a gateway that only sets GSR and D to defined values and does a signature check before returning. Calls to the original function within the module are updated to use the internal function directly.
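A drastically simplified sketch of the gateway idea, written as ordinary C++ instead of an IR transformation. `GSR` and `D` are plain globals here, `kFactorialSig` is a made-up signature, and the checks are simple comparisons rather than the XOR-based updates CFCSS actually performs; all of this is assumed for illustration.

```cpp
#include <cstdint>

// Toy stand-ins for the CFCSS run-time state (names from the issue text).
std::uint32_t GSR = 0;
std::uint32_t D = 0;

// Hypothetical signature, chosen for illustration only.
constexpr std::uint32_t kFactorialSig = 0x2A;

// Set when a (toy) signature check fails.
bool signatureFault = false;

// Internal implementation: only reachable from CFCSS-aware code,
// so it may assume GSR holds a valid signature on entry.
static unsigned factorialInternal(unsigned n) {
  if (GSR != kFactorialSig)
    signatureFault = true; // simplified entry check
  return (n <= 1) ? 1 : n * factorialInternal(n - 1);
}

// Gateway: keeps the original, externally visible name.
unsigned factorial(unsigned n) {
  GSR = kFactorialSig; // defined values, whatever the caller left behind
  D = 0;
  unsigned result = factorialInternal(n);
  if (GSR != kFactorialSig)
    signatureFault = true; // signature check before returning
  return result;
}
```

The recursive calls go straight to `factorialInternal`, so they stay checked even when the first call came from code unaware of CFCSS.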

LLVM fails to mark some functions as not returning, trips assertion in instrumentation.

This example in libgit2 results in checkout_deferred_remove() ending on an unreachable instruction in LLVM IR, yet Function::doesNotReturn() still returns false. __assert_fail is correctly labelled as not returning.

Currently basic blocks containing a call to such a function will be split after that call. The instrumentation later queries InstructionIndex for the primary return instruction of the called function, triggering the assertion in InstructionIndex.cpp:91.

More research is needed to figure out why LLVM does not mark functions like these as not returning where applicable. There is currently no workaround apart from digging into the callee when running SplitAfterCall.

CFG aliasing detection slightly broken regarding superset vs. subset aliasing

When checking the predecessors of fanin nodes that can be reached from predecessors of the current node for overlap with the current node's predecessors, the case of the fanin node's predecessors being a subset of the current node's predecessors goes undetected.

See RemoveCFGAliasing.cpp for reference.

Once the fanin node is itself processed as the current node, the aliasing with the former current node will be detected correctly. However, this asymmetric behaviour is confusing and should either be clarified in a comment or changed to detect the aliasing right away.
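A symmetric check might look like the following sketch. It is not taken from RemoveCFGAliasing.cpp: predecessor sets are modelled as `std::set<int>`, the names are hypothetical, and it assumes that two nodes alias whenever their predecessor sets overlap without being identical.

```cpp
#include <algorithm>
#include <set>

enum class PredRelation { Disjoint, Equal, Subset, Superset, PartialOverlap };

// Classifies the fan-in node's predecessors relative to the current node's.
// Subset means: fanin's predecessors are a proper subset of current's.
PredRelation classify(const std::set<int> &current, const std::set<int> &fanin) {
  bool shared = std::any_of(fanin.begin(), fanin.end(),
                            [&](int p) { return current.count(p) != 0; });
  if (!shared)
    return PredRelation::Disjoint;

  bool faninInCurrent = std::includes(current.begin(), current.end(),
                                      fanin.begin(), fanin.end());
  bool currentInFanin = std::includes(fanin.begin(), fanin.end(),
                                      current.begin(), current.end());
  if (faninInCurrent && currentInFanin)
    return PredRelation::Equal;
  if (faninInCurrent)
    return PredRelation::Subset;
  if (currentInFanin)
    return PredRelation::Superset;
  return PredRelation::PartialOverlap;
}

// Assumption: any overlap short of equality counts as aliasing,
// regardless of which of the two nodes is being processed.
bool aliases(const std::set<int> &current, const std::set<int> &fanin) {
  PredRelation r = classify(current, fanin);
  return r != PredRelation::Disjoint && r != PredRelation::Equal;
}
```

With this formulation the subset and superset cases give the same answer, so the result no longer depends on processing order.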

Tracking: state of SPEC CPU2006 support.

Working

Fortran benchmarks via DragonEgg

  • 401.bzip2
  • 403.gcc
  • 410.bwaves
  • 429.mcf
  • 433.milc
  • 434.zeusmp
  • 435.gromacs
  • 436.cactusADM
  • 437.leslie3d
  • 445.gobmk
  • 454.calculix
  • 456.hmmer
  • 458.sjeng
  • 459.GemsFDTD
  • 462.libquantum
  • 464.h264ref
  • 465.tonto
  • 470.lbm
  • 473.astar
  • 481.wrf
  • 482.sphinx3

Broken

  • 444.namd (trips assertion about authoritative call site in InstrumentBasicBlocks)
  • 447.dealII (segfault when running InstrumentBasicBlocks)
  • 450.soplex (trips same assertion as 444.namd)
  • 453.povray (trips same assertion as 444.namd)
  • 471.omnetpp (segfault when running InstrumentBasicBlocks)
  • 483.xalancbmk (trips assertion in GatewayFunctions)

Use larger ID space to ensure there are no collisions.

Currently block signatures within a module just count upwards from zero, with function signatures doing the same. Once entry blocks use the signature of their enclosing function (no bug filed yet), it could become clumsy to devise a scheme for still keeping the IDs sufficiently separate to catch most control flow errors. Hence random IDs.

Using the facilities provided in <random> is probably good enough, otherwise one could look into UUIDs, where speed might or might not be an issue.
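A minimal sketch of what drawing IDs from `<random>` could look like. The function name `makeUniqueIDs`, the 64-bit ID width, and the explicit seed parameter are assumptions made for illustration; a fixed seed keeps builds reproducible.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Draws `count` distinct 64-bit IDs from a seeded Mersenne Twister.
// Collisions are unlikely in a 64-bit space, but they are rejected
// explicitly so that uniqueness is guaranteed rather than probable.
std::vector<std::uint64_t> makeUniqueIDs(std::size_t count, std::uint64_t seed) {
  std::mt19937_64 engine(seed);
  std::unordered_set<std::uint64_t> seen;
  std::vector<std::uint64_t> ids;
  ids.reserve(count);
  while (ids.size() < count) {
    std::uint64_t candidate = engine();
    if (seen.insert(candidate).second) // true only for a new value
      ids.push_back(candidate);
  }
  return ids;
}
```

Blocks and functions could then share one ID space, since both would draw from the same generator.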

Dealiasing some switch statements leads to invalid phi nodes.

When multiple cases of a switch statement have the same basic block as their target, RemoveCFGAliasing might insert a proxy block for these edges. This will reduce the number of incoming edges at the original basic block without touching corresponding phi nodes, leading to an invalid module and failing with the following error message:

PHINode should have one entry for each predecessor of its parent basic block!

RemoveCFGAliasing should be changed to only substitute the source basic block with the proxy block once per phi node, removing any other references to the source basic block.
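The proposed fix can be sketched on a toy phi node, modelled here as a list of (incoming block, value) pairs with one entry per incoming edge. `ToyPhi` and `redirectToProxy` are hypothetical names, and the sketch assumes that every edge from the source block is rerouted through the proxy and that all entries for the source block carry the same value.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy phi node: one (incoming block, value) pair per incoming edge.
using ToyPhi = std::vector<std::pair<std::string, std::string>>;

// Replaces the first entry for `source` with an entry for `proxy`
// and drops every further entry for `source`.
void redirectToProxy(ToyPhi &phi, const std::string &source,
                     const std::string &proxy) {
  bool replaced = false;
  ToyPhi updated;
  for (const auto &incoming : phi) {
    if (incoming.first != source) {
      updated.push_back(incoming); // unrelated predecessor, keep as is
    } else if (!replaced) {
      updated.emplace_back(proxy, incoming.second);
      replaced = true;
    }
    // any later entry for `source` is dropped
  }
  phi = std::move(updated);
}
```

After the call, the phi node has exactly one entry for the proxy block, matching the single edge from the proxy to the original target.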

Question about hasAddressTaken

I'm sorry to ask you a question here.
My question is about hasAddressTaken. I searched online but couldn't solve my problem. I saw your GitHub repository and thought you might know something about it, so I took the liberty of asking you.
My question is as follows:

In LLVM, the BasicBlock member function hasAddressTaken is documented as follows:

bool hasAddressTaken() const
Returns true if there are any uses of this basic block other than direct branches, switches, etc.

So I tried to implement an indirect call branch, and I inserted a call instruction into each basic block.

```llvm
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @print() #0 {
entry:
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([9 x i8], [9 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare dso_local i32 @printf(i8*, ...) #1

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %i = alloca i32, align 4
  %a = alloca [10 x i32], align 16
  store i32 0, i32* %retval, align 4
  store i32 0, i32* %i, align 4
  %0 = bitcast [10 x i32]* %a to i8*
  call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 0, i64 40, i1 false)
  store i32 0, i32* %i, align 4
  %1 = call i32 @print()
  br label %for.cond

for.cond: ; preds = %for.inc, %entry
  %2 = load i32, i32* %i, align 4
  %cmp = icmp slt i32 %2, 20
  %3 = call i32 @print()
  br i1 %cmp, label %for.body, label %for.end
```

You can see that the print function defined at the beginning is called once in each basic block. But even then, hasAddressTaken returns false. Can anyone help me understand the definition of this function, or what I should do to make it return true?
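For what it's worth, calling a function from inside a block does not take the address of that block. As far as I understand it, BasicBlock::hasAddressTaken only becomes true when a `blockaddress` constant refers to the block, which typically goes together with an `indirectbr`. One way to get there from source code might be the GNU "labels as values" extension, as in the sketch below. The function name `dispatch` is made up, and the assumption is that the code is compiled with clang without optimizations.

```cpp
// Relies on the GNU "labels as values" extension (GCC and Clang).
int dispatch(int which) {
  // Taking a label's address with && is what should show up as a
  // `blockaddress` constant in the IR (assumption: clang at -O0).
  static void *targets[] = {&&first, &&second};

  // Computed goto; expected to be lowered to an `indirectbr`.
  goto *targets[which & 1];

first:
  return 1;
second:
  return 2;
}
```

In the resulting IR, the blocks for `first` and `second` would be the ones where hasAddressTaken is expected to return true, while blocks that merely contain calls stay false.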
