commitanalyzingservice / CAS_CodeRepoAnalyzer
Ingests and analyzes a code repository.
License: GNU General Public License v2.0
Can I download the metrics values as a .csv file for newly listed projects? I created an account in the tool, signed in, and entered the URL of a git repository. It ran successfully and displayed the metric values. However, I still can't download the data as a .csv file. I noticed that we can download already existing data as .csv, but not the data I have added, even though it is listed as public.
Hi!
I followed the README of the CAS_CodeRepoAnalyzer project step by step, but it's not working!
can anyone help?
My best regards,
Rubson Lima
When repo d6e977d4-e3da-4d04-8604-56a8c0473f9d starts analyzing, "list index out of range" gets printed to the console. This also leaves the status stuck at "Analyzing", which is misleading since the analysis has stopped, and it means the repo will never get analyzed in future passes.
There seems to be an error in the calculation of the SEXP metric during ingestion. More specifically:
in ingester/git.py:
sexp = experiences[subsystem]
sets SEXP to the developer's experience in whichever subsystem happens to be processed last, overwriting the values from the subsystems seen earlier.
However, SEXP should rather be an aggregate of subsystem experiences if the commit changes more than one subsystem.
The solution seems to be to sum the values into sexp and divide by nf afterwards (similar to how exp and rexp are handled).
What do you think?
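A minimal sketch of the proposed fix, assuming `experiences` maps a subsystem name to the developer's change count there and `nf` is the number of files the commit touches (the variable names come from ingester/git.py, but the helper itself is hypothetical):

```python
def aggregate_sexp(experiences, subsystems, nf):
    """Sum the developer's experience over every subsystem the commit
    touches, then normalize by the number of files (nf), mirroring how
    EXP and REXP are averaged instead of keeping only the last value."""
    if nf == 0:
        return 0.0
    total = sum(experiences.get(subsystem, 0) for subsystem in subsystems)
    return total / nf

# Example: a commit touching two subsystems across four files
experiences = {"drivers": 10, "net": 2}
print(aggregate_sexp(experiences, ["drivers", "net"], nf=4))  # 3.0
```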
Currently, we simply skip the repository. It would be good to react to this instead.
We are getting very low probability values because we multiply insignificant probabilities by zero instead of rebuilding the model.
The issue seems to be caused by multiple git processes running concurrently.
It still needs to fetch all commits, as we do not know when a particular line of code was changed (it could be very far back); however, we shouldn't re-link corrective commits that are already linked.
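One way to sketch the skip, assuming each corrective commit record carries a `linked` flag (a hypothetical field name) once its link has been recorded:

```python
def commits_to_link(corrective_commits):
    """Keep the full history available for blame lookups, but only
    re-run the linking step for corrective commits not yet linked."""
    return [c for c in corrective_commits if not c.get("linked", False)]

corrective = [
    {"sha": "abc123", "linked": True},
    {"sha": "def456"},  # not yet linked
]
print([c["sha"] for c in commits_to_link(corrective)])  # ['def456']
```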
Instead of doing the historical analysis on each request on the web frontend, do it directly after re-ingesting and either store it in memcached, or directly in postgres in a json field. This could also be done on the web side if deemed too complicated.
If you look at commit 79f59c2144 on http://commit.guru/repo/jquery, you will see the person's email with failed Unicode chars in it (the u7352 stuff). We need to figure out where the backslash is lost. It could be anywhere from:
Or on the front end:
We need to rule out the CodeRepoAnalyzer -> database path before I start digging into the front-end side of things.
My guess is that it's lost during the first ingestion from the git log output, but I might be wrong.
If a commit has all metrics equal to 0, then it must be a merge commit. You could also scan the commit message for "merge", as it very reliably indicates a merge commit in association with the zeroed metrics.
When the analysis is taking place, ignore any commit marked as a merge, as this will almost certainly have a negative impact on the quality of the model.
What does this tool do? Key features? Why should I bother installing this over some other tool?
Some projects do not commit many changes to master; hence, it would be nice to have an advanced option where users can specify a branch to be analyzed.
Proposed by: Yasutaka Kamei
From issue #2:
You could also scan the commit message for "merge" as it is very reliably going to indicate a merge commit in association
This is not a good way to detect merge commits. Particularly in the Gerrit project, the word "merge" is very often used in commit messages for commits that are not actually merges.
In git, a merge commit can be detected by counting its parent commits. A merge commit has at least two parents, while a regular commit has only one.
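A sketch of the parent-count check, assuming `repo_path` points at a local clone; git's `%P` format prints a commit's parent hashes separated by spaces:

```python
import subprocess

def parent_count(parents_line):
    """Count parent hashes in git's %P output (space-separated)."""
    return len(parents_line.split())

def is_merge_commit(sha, repo_path):
    """A commit is a merge iff it has two or more parents."""
    out = subprocess.check_output(
        ["git", "log", "-1", "--format=%P", sha],
        cwd=repo_path,
    ).decode().strip()
    return parent_count(out) >= 2
```

An initial (root) commit has zero parents, so `parent_count` returns 0 for it and it is correctly treated as a non-merge.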
Sessions are not being properly closed.
I can't parse the fixes field easily because it's not JSON encoded, and therefore I would have to write my own list parser to read the elements.
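If the fixes field were stored as a JSON array instead of a raw list string, any consumer could read it with a standard JSON parser and no hand-rolled list parsing would be needed; a minimal sketch (the example hashes are placeholders):

```python
import json

# Writing: serialize the list of linked fix hashes as JSON
fixes = ["c8c2dd9", "79f59c2"]
stored = json.dumps(fixes)   # '["c8c2dd9", "79f59c2"]'

# Reading: round-trips back to the original list
print(json.loads(stored))   # ['c8c2dd9', '79f59c2']
```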
The commits store the date as a Unix timestamp, NOT UTC time, so our comparison doesn't work.
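A sketch of normalizing the stored value to a timezone-aware UTC datetime before comparing (assuming the stored value is a Unix timestamp in seconds; the helper name is hypothetical):

```python
from datetime import datetime, timezone

def commit_time_utc(unix_ts):
    """Convert a commit's Unix timestamp (seconds since the epoch)
    to a timezone-aware UTC datetime so comparisons against UTC
    datetimes are meaningful."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc)

cutoff = datetime(2014, 4, 19, tzinfo=timezone.utc)
print(commit_time_utc(1397900434) >= cutoff)  # True
```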
2014-04-19 11:20:34,236 ERROR: Got an exception linking bug fixing changes to bug inducing changes for repo 65db096f-b5fb-465a-8724-455b03ba0b2b
Traceback (most recent call last):
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/analyzer.py", line 81, in analyzeRepo
    git_commit_linker.linkCorrectiveCommits(corrective_commits, all_commits)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 43, in linkCorrectiveCommits
    buggy_commits = self._linkCorrectiveCommit(corrective_commit)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 78, in _linkCorrectiveCommit
    bug_introducing_changes = self.gitAnnotate(region_chunks, commit)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 235, in gitAnnotate
    + file + "'", shell=True, cwd= self.repo_path )).split(" ")[0][2:]
  File "/usr/lib/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command 'git blame -L864,+1 c8c2dd95182289eb6eab140ec7964d346bc93601^ -l -- 'mysqltuner.pl'' returned n
When the CAS manager adds an ingest or analyze task to the queue, it should change the repo's status to signify that it is waiting in the queue to be ingested or analyzed. Otherwise, the same repo may get added to the thread pool task queue multiple times.
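A sketch of the guard, with hypothetical status names ("Waiting", etc.) and an in-memory dict standing in for the database row:

```python
def enqueue_repo(repo, queue):
    """Mark the repo as waiting *before* enqueueing it, and refuse to
    enqueue a repo that is already waiting or in progress, so the same
    repo can't be added to the task queue twice."""
    if repo["status"] in ("Waiting", "Ingesting", "Analyzing"):
        return False
    repo["status"] = "Waiting"
    queue.append(repo)
    return True

repo = {"id": "65db096f", "status": "Idle"}
queue = []
print(enqueue_repo(repo, queue))  # True
print(enqueue_repo(repo, queue))  # False: already waiting
```

Setting the status in the same transaction that enqueues the task (rather than after the worker picks it up) is what closes the double-add window.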
For instance, in the linux repository, drivers/net/usb/qmi_wwan.c does not have 2,210 developers.