Comments (5)

ttsiodras commented on June 14, 2024

@nhasabni @jgottschlich-intel Thanks for the clarifications, guys - I expected this was the situation, but wanted to hear it from... the horse's mouth, as the saying goes. I did grep for anomaly, @nhasabni - to no avail. None of the issues were detected. I think it is clear that control-flag - at least with the current database of errors - can only act in a supplementary role - not as a replacement for the strong static analysis engines that have evolved over decades to hunt down a large number of categories of bugs.

I will, however, keep an eye on control-flag - periodically trying it out on the kinds of bugs we find in space missions (and if you ever want me to do a test, don't hesitate to reach out). It looks like a technology that has potential - looking forward to a point in time where it will point out things that are missed by established static analysers.

Cheers,
Thanassis.

from control-flag.

ttsiodras commented on June 14, 2024

Addendum: the run completed, emitting this output for the 42 input files I placed in src_ttsiod:

Training: start.
Trie L1 build took: 696.567s
Trie L2 build took: 354.597s
Training: complete.
Storing logs in test1_scan_output/
Scan progress:4/42 ... in progress
Scan progress:8/42 ... in progress
Scan progress:12/42 ... in progress
Scan progress:16/42 ... in progress
Scan progress:20/42 ... in progress
Scan progress:24/42 ... in progress
Scan progress:28/42 ... in progress
Scan progress:32/42 ... in progress
Scan progress:36/42 ... in progress
Scan progress:40/42 ... in progress

Based on the output, I now theorize that the .ts file actually stores training inputs, not a pre-computed, trained set of weights for a neural network.

But anyway, grep didn't reveal any potential anomaly...

 $ grep otentia test1_scan_output/thread_*
 $

...even from files in src_ttsiod doing things like this:

$ cat src_ttsiod/doublefree.c
#include <stdlib.h>

main()
{
    char *p = (char *) malloc(100);
    free(p);
    free(p);
}

To make sure, I checked the logs, and the files are definitely processed:

[TID=139798927312640] Scanning File: src_ttsiod/sprintf.c
[TID=139798927312640] Scanning File: src_ttsiod/case.c
[TID=139798927312640] Scanning File: src_ttsiod/oob.c
[TID=139798927312640] Scanning File: src_ttsiod/opendir_leak.c
[TID=139798927312640] Scanning File: src_ttsiod/atoi.c
[TID=139798927312640] Scanning File: src_ttsiod/shortcircuit.c
[TID=139798927312640] Scanning File: src_ttsiod/mixing_signed_unsigned.c
[TID=139798927312640] Scanning File: src_ttsiod/infinite.c
[TID=139798927312640] Scanning File: src_ttsiod/sizes.c

Which means I either...

(a) did something wrong in using the tool - most likely - or...
(b) the training set from the 6000 repos never saw a double free. Or a NULL dereference - or an out of bounds access - or...

Any help/advice most welcome.

nhasabni commented on June 14, 2024

Hi @ttsiodras,

First of all, thanks for trying out ControlFlag. We really appreciate the details you provided; they help us understand the exact build and test environment.

We agree that the "Training: start" message could be misleading. Note, however, that ControlFlag uses a simple machine learning model that is quick to train with modest compute power. So, indeed, the scan_for_anomalies script trains that model at the start, before scanning the target repository for anomalies. That being said, we see that being able to use pre-trained models could be useful. We plan to add this feature soon.

For your question about the type of bugs found by ControlFlag: Note that ControlFlag is not a typical bug detection system in that it is not designed to catch specific types of bugs (e.g., double free). In fact, ControlFlag does not even know what a bug looks like. All it knows are typical programming patterns, and it flags any deviation from those patterns as an anomaly. More often than not, these anomalies lead to bugs. Feel free to refer to our research paper (link) for more details.

For your results - did you try grep anomaly test1_scan_output/thread_*? Anomalies found by ControlFlag can be obtained with this command.

Looking forward to hearing what you found!

Cheers,
The ControlFlag team

jgottschlich-intel commented on June 14, 2024

Hi @ttsiodras -

In addition to what @nhasabni said, right now ControlFlag only looks at control structure anomalies. These are things like "if (x = y)" where the programmer perhaps meant "if (x == y)", or other programmatic misuses like "if (x | y)" rather than "if (x || y)". The logical form is usually what the programmer wants, and it may also be more efficient, because logical OR and AND short-circuit in C/C++, whereas bitwise OR and AND always evaluate both operands.

Eventually, we plan to add support for the types of defects you've described. It's on a roadmap, but not something ControlFlag can currently do.

Thanks a bunch for the questions -- we look forward to adding support for all the points you raised.

Best,
The ControlFlag team

jgottschlich-intel commented on June 14, 2024

Hi @ttsiodras -

Sorry, just one other thing I forgot to mention. One of our goals with CF is to complement the existing linters / static analyzers. Decades of work have gone into them, and we're not trying to replace them. Instead, we want to identify things they might not currently be capable of finding (like the defects Niranjan and I discuss above).

Does that make sense?

Thanks again for the interest and the great questions! Please keep them coming; you are helping us a lot!

Justin
