Comments (5)
@nhasabni @jgottschlich-intel Thanks for the clarifications, guys - I expected this was the situation, but wanted to hear it from ... the horse's mouth, as the saying goes. I did grep for anomaly, @nhasabni - to no avail. None of the issues was detected. I think it is clear that control-flag - at least with the current database of errors - can only act in an addendum role, not as a replacement for the strong static analysis engines that have evolved over decades to hunt down a large number of categories of bugs.
I will, however, keep an eye on control-flag - periodically trying it out on the kinds of bugs we find in space missions (and if you ever want me to do a test, don't hesitate to reach out). It looks like a technology that has potential - looking forward to a point in time where it will point out things that are missed by established static analysers.
Cheers,
Thanassis.
from control-flag.
Addendum: the run completed, emitting this output from the 42 input files I placed in src_ttsiod:
Training: start.
Trie L1 build took: 696.567s
Trie L2 build took: 354.597s
Training: complete.
Storing logs in test1_scan_output/
Scan progress:4/42 ... in progress
Scan progress:8/42 ... in progress
Scan progress:12/42 ... in progress
Scan progress:16/42 ... in progress
Scan progress:20/42 ... in progress
Scan progress:24/42 ... in progress
Scan progress:28/42 ... in progress
Scan progress:32/42 ... in progress
Scan progress:36/42 ... in progress
Scan progress:40/42 ... in progress
Based on the output, I theorize now that the .ts file actually stores training inputs, not a pre-computed, trained set of weights on a NN.
But anyway, grep didn't reveal any potential anomaly...
$ grep otentia test1_scan_output/thread_*
$
...even from files in src_ttsiod doing things like this:
$ cat src_ttsiod/doublefree.c
#include <stdlib.h>
main()
{
char *p = (char *) malloc(100);
free(p);
free(p);
}
To make sure, I checked the logs and the files are definitely processed:
[TID=139798927312640] Scanning File: src_ttsiod/sprintf.c
[TID=139798927312640] Scanning File: src_ttsiod/case.c
[TID=139798927312640] Scanning File: src_ttsiod/oob.c
[TID=139798927312640] Scanning File: src_ttsiod/opendir_leak.c
[TID=139798927312640] Scanning File: src_ttsiod/atoi.c
[TID=139798927312640] Scanning File: src_ttsiod/shortcircuit.c
[TID=139798927312640] Scanning File: src_ttsiod/mixing_signed_unsigned.c
[TID=139798927312640] Scanning File: src_ttsiod/infinite.c
[TID=139798927312640] Scanning File: src_ttsiod/sizes.c
Which means I either...
(a) did something wrong in using the tool - most likely - or...
(b) the training set from the 6000 repos never saw a double free. Or a NULL dereference - or an out of bounds access - or...
Any help/advice most welcome.
Hi @ttsiodras,
First of all, thanks for trying out ControlFlag. We really appreciate your details - helps us understand the exact build and test environment.
We agree that the "Training: start" message could be misleading. Note, however, that ControlFlag is using a simple machine learning model that is quick to train with modest compute power. So, indeed, the scan_for_anomalies script ends up training that model at the start, before scanning the target repository for anomalies. That being said, we see that being able to use pre-trained models could be useful. We plan to add this feature soon.
For your question about the type of bugs found by ControlFlag: Note that ControlFlag is not a typical bug detection system in that it is not designed to catch specific types of bugs (e.g., double free). In fact, ControlFlag does not even know what a bug looks like. All that it knows are typical programming patterns and it then flags any deviations with respect to those patterns as anomalies. It is more often the case that these anomalies lead to bugs. Feel free to refer to our research paper (link) for more details.
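The idea of flagging deviations from typical patterns, rather than looking for known bug types, can be illustrated with a toy sketch. This is not ControlFlag's actual algorithm or data: the struct, the function names, the abstracted patterns, the counts, and the threshold are all hypothetical and invented for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy data: abstracted condition patterns and how often each
 * was seen in "typical" code. The patterns and counts are invented for
 * illustration; they are not taken from ControlFlag or its training set. */
struct pattern_count {
    const char *pattern;
    unsigned count;
};

static const struct pattern_count corpus[] = {
    { "if (VAR == VAR)", 9000 },
    { "if (VAR || VAR)", 4000 },
    { "if (VAR | VAR)",    30 },
    { "if (VAR = VAR)",    12 },
};

static const size_t corpus_len = sizeof corpus / sizeof corpus[0];

/* Returns how often a pattern was seen, or 0 if it never appeared. */
unsigned pattern_frequency(const char *pattern)
{
    for (size_t i = 0; i < corpus_len; i++) {
        if (strcmp(corpus[i].pattern, pattern) == 0)
            return corpus[i].count;
    }
    return 0;
}

/* Returns 1 when the pattern's share of all observations falls below
 * min_share, 0 otherwise. Nothing here knows what a "bug" is: the only
 * signal is rarity relative to the rest of the corpus. */
int is_anomalous(const char *pattern, double min_share)
{
    unsigned total = 0;
    for (size_t i = 0; i < corpus_len; i++)
        total += corpus[i].count;

    double share = (double)pattern_frequency(pattern) / (double)total;
    return share < min_share;
}
```

With these made-up counts and a threshold of 0.01, is_anomalous("if (VAR = VAR)", 0.01) returns 1 while is_anomalous("if (VAR == VAR)", 0.01) returns 0. A detector of this kind only reacts to patterns it can represent, which is consistent with defects such as a double free going unreported.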
For your results - did you try the command grep anomaly test1_scan_output/thread_* on the scan output? Anomalies found by ControlFlag can be obtained with it.
Looking forward to hearing what you found!
Cheers,
The ControlFlag team
Hi @ttsiodras -
In addition to what @nhasabni said, right now ControlFlag only looks at control structure anomalies. These are things like "if (x = y)" where the programmer perhaps meant "if (x == y)", or other programmatic misuses like "if (x | y)" rather than "if (x || y)". The latter is not only usually what the programmer wants, it may also be more efficient, because logical OR and AND short-circuit in C/C++, whereas bitwise OR and AND always evaluate both operands.
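The two operator mix-ups mentioned above can be checked with a small C sketch. All function names here are hypothetical helpers written for this illustration; none of them come from ControlFlag. The assignment is wrapped in an extra pair of parentheses only because GCC and Clang otherwise warn about an assignment used as a truth value.

```c
/* Hypothetical helpers for illustration; none of these names come from
 * ControlFlag. */
static int calls = 0;

/* Has a visible side effect, so we can tell whether an operand was
 * evaluated. Always returns 1 (true). */
static int bump(void)
{
    calls++;
    return 1;
}

/* Logical OR short-circuits: the left operand is already true, so the
 * right-hand bump() is never called. */
int calls_with_logical_or(void)
{
    calls = 0;
    int result = bump() || bump();
    (void)result;
    return calls;
}

/* Bitwise OR evaluates both operands, so bump() runs twice. */
int calls_with_bitwise_or(void)
{
    calls = 0;
    int result = bump() | bump();
    (void)result;
    return calls;
}

/* "if (x = y)": y is stored into *x and the branch depends only on
 * whether y is nonzero. The previous value of *x is lost. */
int branch_taken_with_assignment(int *x, int y)
{
    if ((*x = y))
        return 1;
    return 0;
}

/* "if (x == y)": compares the two values and modifies nothing. */
int branch_taken_with_comparison(int x, int y)
{
    if (x == y)
        return 1;
    return 0;
}
```

Calling calls_with_logical_or() reports one evaluation and calls_with_bitwise_or() reports two. With x = 5 and y = 7, the assignment form takes the branch and leaves x equal to 7, while the comparison form does not take the branch.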
Eventually, we plan to add support for the types of defects you've described. It's on a roadmap, but not something ControlFlag can currently do.
Thanks a bunch for the questions -- we look forward to adding support for all the points you raised.
Best,
The ControlFlag team
Hi @ttsiodras -
Sorry, just one other thing I forgot to mention. One of our goals with CF is to complement the existing linters / static analyzers. Decades of work have gone into them and we're not necessarily trying to replace them. Instead, we are looking to complement them by trying to identify things they might not currently be capable of finding (like the defects Niranjan and I discuss above).
Does that make sense?
Thanks again for the interest and the great questions! Please keep them coming; you are helping us a lot!
Justin