trishullab / salento
Statistical bug-finding framework for API-using code
License: Apache License 2.0
Salento is currently stuck on TensorFlow 0.12. One important maintenance milestone is to bring its use of the TensorFlow API up to date with the latest version, 1.4 as of now.
I am currently working on this.
Salento expects as input a sequence of packages.
The problem is that the file containing the sequence of packages is a single JSON object, so all packages must fit in memory before they can be read. We currently have use cases where the datasets do not fit in memory, so this architecture is a scalability bottleneck.
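Python's standard json module can only parse a complete document, so a single top-level JSON object cannot be streamed with it. One way to load data lazily is a line-delimited layout (JSON Lines, one package object per line) read through generators. This is a minimal sketch under that assumption: the JSON Lines layout, iter_packages and iter_sequences are hypothetical and not Salento's current format or API; the package fields ("name", "data", "sequence") follow the input format shown further down.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, List, TextIO


def iter_packages(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield packages one at a time from a JSON Lines stream (hypothetical layout:
    one package object per line), so at most one package is decoded at a time."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


def iter_sequences(packages: Iterable[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    """Lazily flatten packages into their call sequences, without building a list."""
    for pack in packages:
        for data in pack["data"]:
            yield data["sequence"]


# Usage: an in-memory stream stands in for open("packages.jsonl") (hypothetical file name).
example = io.StringIO(
    '{"name": "foo.c", "data": [{"sequence": [{"call": "pthread_mutex_lock", "states": [], "location": "foo.c:2"}]}]}\n'
    '{"name": "bar.c", "data": [{"sequence": []}, {"sequence": []}]}\n'
)
n_sequences = sum(1 for _ in iter_sequences(iter_packages(example)))
```

If the single-object format must be kept, an incremental third-party parser such as ijson is an alternative to changing the file layout.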
We need to:
… train.py) such that data is loaded lazily, using generators wherever possible instead of creating lists upfront.
As far as I understand, the same code extractors can be used for multiple tools (Salento and Bayou).
Maybe it makes more sense to move code extractors to their own repository?
This would simplify repository maintenance and packaging.
The problem appears to be that Salento's internals do not expect unknown vocabulary entries. I am wondering if we should simply filter out unknown vocabulary entries when iterating over, say, Aggregator.events.
@vijay-murali, thoughts?
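The filtering idea can be sketched as follows. This is a minimal sketch, assuming the model's vocabulary is available as a set of call names; known_events, sequence_log_likelihood and next_call_dist are hypothetical names, where next_call_dist stands in for Aggregator.distribution_next_call(spec, prefix) called with call=None, which returns the whole distribution.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Event = Dict[str, object]
NextCallDist = Callable[[List[Event]], Dict[str, float]]


def known_events(events: Sequence[Event], vocab: Set[str]) -> List[Event]:
    """Drop events whose call the model never saw (hypothetical helper)."""
    return [e for e in events if e["call"] in vocab]


def sequence_log_likelihood(events: Sequence[Event], vocab: Set[str],
                            next_call_dist: NextCallDist) -> float:
    """Sum of log P(call_i | preceding known events), skipping unknown calls.

    next_call_dist(prefix) must return a dict mapping call names to probabilities.
    """
    filtered = known_events(events, vocab)
    llh = 0.0
    for i, event in enumerate(filtered):
        dist = next_call_dist(filtered[:i])
        llh += math.log(dist[event["call"]])
    return llh


# Usage with a toy uniform model over a two-call vocabulary.
vocab = {"pthread_mutex_lock", "pthread_mutex_unlock"}
uniform = lambda prefix: {c: 0.5 for c in vocab}
events = [{"call": "pthread_mutex_lock"},
          {"call": "cogl_pipeline_set_layer_filters"},  # unknown: skipped
          {"call": "pthread_mutex_unlock"}]
llh = sequence_log_likelihood(events, vocab, uniform)  # == 2 * math.log(0.5)
```

Note that filtering also removes the unknown call from the prefix the model conditions on, so scores for such sequences are computed over a shortened history.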
I'm getting this error when running the sequence aggregator:
Package 1----
Traceback (most recent call last):
File "/home/tgc/salento/src/main/python/salento/aggregators/sequence_aggregator.py", line 52, in <module>
aggregator.run()
File "/home/tgc/salento/src/main/python/salento/aggregators/sequence_aggregator.py", line 38, in run
llh += math.log(self.distribution_next_call(spec, events[:i], call=self.call(event)))
File "/home/tgc/salento/src/main/python/salento/aggregators/base.py", line 73, in distribution_next_call
return dist if call is None else dist[call]
KeyError: 'cogl_pipeline_set_layer_filters'
Traceback (most recent call last):
File "/home/tgc/salento/src/main/python/salento/aggregators/kld_aggregator.py", line 94, in <module>
aggregator.run()
File "/home/tgc/salento/src/main/python/salento/aggregators/kld_aggregator.py", line 81, in run
kld_score = self.compute_kld(spec, seqs_l)
File "/home/tgc/salento/src/main/python/salento/aggregators/kld_aggregator.py", line 59, in compute_kld
log_q = self.log_likelihood(spec, sequence)
File "/home/tgc/salento/src/main/python/salento/aggregators/kld_aggregator.py", line 44, in log_likelihood
llh += math.log(self.distribution_next_state(spec, events[:i] + [partial_event], state=state))
File "/home/tgc/salento/src/main/python/salento/aggregators/base.py", line 91, in distribution_next_state
return dist[state]
KeyError: '4#5'
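The second traceback is the same kind of failure for states: the key '4#5' is missing from the state distribution. Instead of filtering, another option, not proposed in this thread, is to give unseen keys a small floor probability. This is a minimal sketch; log_prob_or_floor and the default floor of 1e-8 are hypothetical choices, and the floor value directly affects the resulting likelihoods and KLD scores.

```python
import math
from typing import Mapping


def log_prob_or_floor(dist: Mapping[str, float], key: str, floor: float = 1e-8) -> float:
    """Return log(dist[key]), falling back to log(floor) when the key is unseen
    or has zero probability (which would otherwise make math.log raise)."""
    p = dist.get(key, 0.0)
    return math.log(max(p, floor))


# Usage with a toy state distribution (made-up keys and values).
dist = {"4#4": 0.75, "4#6": 0.25}
print(log_prob_or_floor(dist, "4#5"))  # log(1e-8) instead of KeyError
```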
Hi, @vijay-murali,
I am trying to debug the error below, and to do so I was looking at the implementation of kld.py.
Started at 2017-11-27 16:42:05.398390
2017-11-27 16:42:05.398588: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
### foo.c
Traceback (most recent call last):
File "../salento/statistical/kld.py", line 150, in <module>
main()
File "../salento/statistical/kld.py", line 65, in main
klds = [(l, kld.compute(l, pack)) for l in locations]
File "../salento/statistical/kld.py", line 65, in <listcomp>
klds = [(l, kld.compute(l, pack)) for l in locations]
File "../salento/statistical/kld.py", line 131, in compute
samples = [sample(seqs_l, nsamples=1) for i in range(self.args.num_iters)]
File "../salento/statistical/kld.py", line 131, in <listcomp>
samples = [sample(seqs_l, nsamples=1) for i in range(self.args.num_iters)]
File "/home/tiago/Work/salento/statistical/utils.py", line 20, in sample
samples = [random.choice(s) for i in range(nsamples)] if nsamples > 1 else random.choice(s)
File "/usr/lib/python3.6/random.py", line 257, in choice
raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
{"packages": [
{"data": [
{"sequence": [
{
"call": "pthread_mutex_lock",
"states": [],
"location": "foo.c:2"
},
{
"call": "pthread_mutex_unlock",
"states": [],
"location": "foo.c:1"
}
]}
],
"name": "foo.c"
}
]}
In function main() we find the following code:
for pack in parser.packages:
    locations = parser.locations(pack)
    # ...
    klds = [(l, kld.compute(l, pack)) for l in locations]
For this input there is only one package, for which locations = ['foo.c:1', 'foo.c:2'].
Then compute(self, l, pack) is called, and its first line is:
seqs_l = self.parser.sequences(pack, l)
According to the documentation of sequences:
If location is given, then get all sequences in package that end at location.
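Hence, for foo.c:1 we get the only sequence in the input (its last call is at foo.c:1), while for foo.c:2 we get seqs_l = [], which then triggers the error: sample() calls random.choice on the empty list, and random.choice raises IndexError.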
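The failure can be reproduced outside kld.py with the input above. This is a minimal sketch, assuming that skipping locations where no sequence ends (or sampling defensively) is acceptable; sequences_ending_at, locations_with_sequences and safe_sample are hypothetical helpers that only mimic the documented behaviour of parser.sequences, not Salento's actual code.

```python
import random
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def sequences_ending_at(pack: Dict[str, Any], location: str) -> List[List[Event]]:
    """All sequences in the package whose last event is at `location`
    (mirrors the documented behaviour of parser.sequences(pack, l))."""
    return [d["sequence"] for d in pack["data"]
            if d["sequence"] and d["sequence"][-1]["location"] == location]


def locations_with_sequences(pack: Dict[str, Any], locations: List[str]) -> List[str]:
    """Keep only the locations at which at least one sequence ends."""
    return [l for l in locations if sequences_ending_at(pack, l)]


def safe_sample(seqs: List[List[Event]], rng: random.Random) -> Optional[List[Event]]:
    """Like random.choice, but returns None instead of raising on an empty list."""
    return rng.choice(seqs) if seqs else None


# The package from the input above.
pack = {"name": "foo.c", "data": [{"sequence": [
    {"call": "pthread_mutex_lock", "states": [], "location": "foo.c:2"},
    {"call": "pthread_mutex_unlock", "states": [], "location": "foo.c:1"}]}]}
locations = ["foo.c:1", "foo.c:2"]

print(locations_with_sequences(pack, locations))  # ['foo.c:1']
print(safe_sample(sequences_ending_at(pack, "foo.c:2"), random.Random(0)))  # None
```

Skipping is only one option: it silently drops foo.c:2 from the KLD ranking, so how such locations should be scored remains a design decision for kld.py.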