commitanalyzingservice / cas_coderepoanalyzer

Ingests and analyzes a code repository.

License: GNU General Public License v2.0

Python 99.97% Batchfile 0.03%

cas_coderepoanalyzer's Issues

String index issue error when linking commits on linux

Sometimes we get a negative code churn for some projects.

Can we download the results as a .csv file for newly listed projects?

Can I download the metric values as a .csv file for newly listed projects? I created an account in the tool, signed in, and entered the URL of a git repository. The analysis ran successfully and displayed the metric values. However, I still can't download the data as a .csv file. I noticed that we can download the data for already existing projects as .csv, but not for the one I added, even though it is listed as public.

I can't run the project

Hi!

I followed the 'README' step by step of the 'CAS_CodeRepoAnalyzer' project, but it's not working!

can anyone help?

My best regards,
Rubson Lima

"list index out of range" while analyzing

When repo ID d6e977d4-e3da-4d04-8604-56a8c0473f9d starts analyzing, "list index out of range" gets printed to the console. This leaves the status at "Analyzing", which is misleading because the analysis has stopped. It also means that the repo will never be analyzed in future passes.
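
One way to avoid the stuck status is to wrap the analysis step so that any exception moves the repo to an explicit failure state. The following is a minimal sketch, not the project's code: the `repo` object, the `analyze` callable, and the status names "Error" and "Analyzed" are all assumptions made for illustration.

```python
import logging
from types import SimpleNamespace


def analyze_with_status(repo, analyze):
    """Run analyze(repo) and never leave the status stuck at 'Analyzing'."""
    repo.status = "Analyzing"
    try:
        analyze(repo)
    except Exception:
        # Record the failure so the status no longer claims work is ongoing.
        logging.exception("Analysis failed for repo %s", repo.id)
        repo.status = "Error"  # hypothetical status name
        return False
    repo.status = "Analyzed"  # hypothetical status name
    return True


# Usage with a stand-in repo object and a deliberately failing analysis step.
repo = SimpleNamespace(id="example-repo", status="Ingested")
analyze_with_status(repo, lambda r: [][0])  # raises IndexError internally
print(repo.status)
```

With a distinct failure status, a later pass can decide to retry the repo instead of skipping it forever.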

SEXP metric error

There seems to be an error in the calculation of the SEXP metric during ingestion. More specifically:

in ingester/git.py:

sexp = experiences[subsystem] sets the SEXP metric to the developer's experience in a single subsystem: the last one in the commit that is encountered for the first time.

However, SEXP should rather be an aggregate of subsystem experiences if the commit changes more than one subsystem.

The solution seems to be to sum the values into sexp and divide by nf afterwards (similar to exp and rexp).

What do you think?
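
The proposed aggregation can be sketched as follows. This is an illustration under stated assumptions, not the code in ingester/git.py: the function name and both parameters are hypothetical, and the sketch adds one term per changed file so that dividing by nf gives a mean. The issue does not say whether the sum should instead run over distinct subsystems.

```python
def average_subsystem_experience(file_subsystems, experiences):
    """Average the developer's subsystem experience over a commit's files.

    file_subsystems: one subsystem name per changed file, so its length is nf.
    experiences: maps subsystem name -> the developer's prior change count.
    """
    nf = len(file_subsystems)
    if nf == 0:
        return 0.0
    # Sum the experience for every changed file, then divide by nf.
    total = sum(experiences.get(subsystem, 0) for subsystem in file_subsystems)
    return total / nf


# A commit touching two files in "drivers" and one in "net".
print(average_subsystem_experience(["drivers", "drivers", "net"],
                                   {"drivers": 4, "net": 1}))
```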

Singularities when building the glm model in R will drop the metric, changing the matrix.

Currently, we simply skip the repository. It would be good to react to this instead.

When building the linear regression model, rebuild model without the insignificant coefficients

We are getting very low values for the probabilities because we are just multiplying the insignificant coefficients by 0 instead of rebuilding the model.

Concurrency issue when analyzing multiple repos using the new SZZ algorithm.

The issue seems to be caused by multiple git processes running concurrently.

The linker between corrective commits and bug-inducing commits should only process new corrective commits when re-analyzing, for performance reasons.

It still needs to get all commits as we do not know when a particular line of code was changed (could be very far back); however, we shouldn't re-link already linked corrective commits.

Cache commit threshold and historical analysis

Instead of doing the historical analysis on each request on the web frontend, do it directly after re-ingesting and either store it in memcached, or directly in postgres in a json field. This could also be done on the web side if deemed too complicated.

Unicode problem in text parsing

If you look at commit 79f59c2144 on http://commit.guru/repo/jquery you will see the person's email with failed Unicode chars in it (the u7352 stuff). We need to figure out where the slash is lost. It could be anywhere from:

The first ingestion
Python itself
SqlAlchemy parsing/coercing
Database collation (content type),

Or on the front end:

Node maybe can't handle it?
The waterline ORM is incorrectly parsing it
The socket connection might be losing it
Angular.js might be losing it
The actual html page might have the incorrect character encoding.

We need to rule out the CodeRepoAnalyzer -> database path as the cause before I start digging into the front-end side of things.

My guess is that it's lost from the first ingestion from the git log output, but I might be wrong.
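
The "u7352" residue is what a `\u7352` escape sequence looks like once its backslash has been dropped. The sketch below only reproduces that symptom in isolation, using Python's `json` module as a stand-in for whichever layer produces the escape. It does not identify the layer at fault, and `strip_backslashes` is a hypothetical helper that simulates the loss.

```python
import json


def strip_backslashes(text):
    """Simulate a layer that drops backslashes from already-escaped text."""
    return text.replace("\\", "")


original = chr(0x7352)            # the non-ASCII character itself
escaped = json.dumps(original)    # ASCII-safe form containing \u7352
damaged = strip_backslashes(escaped)

# The intact escape decodes back to the original character.
print(json.loads(escaped) == original)
# The damaged form decodes to the literal text seen on the page.
print(json.loads(damaged))
```

Comparing the stored value against both forms at each stage (git log output, database row, API response) would show where the intact escape first turns into the literal text.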

Flag commits as MERGE

If a commit has all metrics being 0, then it must be a merge commit. You could also scan the commit message for "merge" as it is very reliably going to indicate a merge commit in association with the zeroed metrics.

When the analysis is taking place, ignore any commit marked as merge as this will almost definitely have a negative impact on the quality of the model.

Please improve the readme

What does this tool do? Key features? Why should I bother installing this over some other tool?

Add functionality to analyze specific branch

Some projects do not commit many changes to master, hence, it would be nice to have an advanced option where users can specify a branch to be analyzed.

Proposed by: Yasutaka Kamei

Incorrect detection of merge commits.

From issue #2:

You could also scan the commit message for "merge" as it is very reliably going to indicate a merge commit in association

This is not a good way to detect merge commits. Particularly in the Gerrit project, the word "merge" is very often used in commit messages for commits that are not actually merges.

In git, a merge commit can be detected by counting its parent commits. A merge commit always has at least 2 parents, while a regular commit has only 1 (or none, if it is the root commit).
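
Counting parents can be sketched as follows, assuming the input is one line of `git rev-list --parents` output, which prints each commit hash followed by the hashes of its parents. The function names are hypothetical. The sketch treats two or more parents as a merge, which also covers octopus merges.

```python
def parent_count(rev_list_line):
    """Count the parents listed on one line of `git rev-list --parents`.

    Each line has the form '<commit> <parent1> <parent2> ...'.
    """
    fields = rev_list_line.split()
    return max(len(fields) - 1, 0)


def is_merge_commit(rev_list_line):
    """A commit with two or more parents is a merge commit."""
    return parent_count(rev_list_line) >= 2


# Shortened placeholder hashes, for illustration only.
print(is_merge_commit("aaa111 bbb222 ccc333"))
print(is_merge_commit("aaa111 bbb222"))
```

Unlike scanning the commit message, this check does not depend on how a project words its messages.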

Sql Alchemy connection timeout

Sessions are not being properly closed.
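
A common way to guarantee that every session is closed is to wrap its use in a context manager. The sketch below follows that general pattern and assumes only that the session object exposes `commit()`, `rollback()`, and `close()`, as SQLAlchemy sessions do. `session_scope` and `session_factory` are names chosen for illustration, not taken from the project.

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session_factory):
    """Provide a session that is always closed, even when an error occurs."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        # Undo partial work before propagating the error.
        session.rollback()
        raise
    finally:
        # Return the connection to the pool in every case.
        session.close()
```

Callers would then write `with session_scope(Session) as session:` around each unit of work, where `Session` stands for whatever session factory the application already has.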

Concurrent access to 'R' is not allowed

commit 'fixes' field should be a JSON list

I can't parse the fixes field easily because it's not JSON encoded and therefore I would have to write my own list parser to read the elements.
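
Storing the field as JSON would let any consumer read it with a standard parser. A minimal sketch using Python's `json` module follows; the helper names are hypothetical, and treating an empty field as an empty list is an assumption.

```python
import json


def encode_fixes(commit_hashes):
    """Serialize the list of fixed commit hashes as a JSON array string."""
    return json.dumps(list(commit_hashes))


def decode_fixes(field):
    """Parse the stored field back into a list; empty input means no fixes."""
    return json.loads(field) if field else []


stored = encode_fixes(["abc123", "def456"])
print(stored)
print(decode_fixes(stored))
```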

Currently getting all commits when modeling and not just all prior to 3 months ago.

The commits store the date in unix timestamp NOT utc time, so our comparison doesn't work.
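
One way to make the comparison work is to convert the cutoff date to a Unix timestamp and compare numbers with numbers. The sketch below assumes each commit is a dict whose hypothetical `author_date_unix_timestamp` key holds the stored timestamp, and it approximates "3 months" as 90 days.

```python
from datetime import datetime, timedelta, timezone


def cutoff_timestamp(now, days=90):
    """Unix timestamp for `days` before `now` (a timezone-aware datetime)."""
    return (now - timedelta(days=days)).timestamp()


def commits_before_cutoff(commits, now, days=90):
    """Keep only commits authored before the cutoff."""
    cutoff = cutoff_timestamp(now, days)
    return [c for c in commits
            if int(c["author_date_unix_timestamp"]) < cutoff]


now = datetime(2014, 4, 19, tzinfo=timezone.utc)
commits = [
    {"id": "old", "author_date_unix_timestamp": str(int((now - timedelta(days=100)).timestamp()))},
    {"id": "new", "author_date_unix_timestamp": str(int((now - timedelta(days=10)).timestamp()))},
]
print([c["id"] for c in commits_before_cutoff(commits, now)])
```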

Updating repositories seems to be broken

Error when linking a commit for MySQLTuner - tries to annotate a line that doesn't exist.

2014-04-19 11:20:34,236 ERROR: Got an exception linking bug fixing changes to bug inducing changes for repo 65db096f-b5fb-465a-8724-455b03ba0b2b
Traceback (most recent call last):
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/analyzer.py", line 81, in analyzeRepo
    git_commit_linker.linkCorrectiveCommits(corrective_commits, all_commits)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 43, in linkCorrectiveCommits
    buggy_commits = self._linkCorrectiveCommit(corrective_commit)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 78, in _linkCorrectiveCommit
    bug_introducing_changes = self.gitAnnotate(region_chunks, commit)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 235, in gitAnnotate
    + file + "'", shell=True, cwd= self.repo_path )).split(" ")[0][2:]
  File "/usr/lib/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command 'git blame -L864,+1 c8c2dd95182289eb6eab140ec7964d346bc93601^ -l -- 'mysqltuner.pl'' returned n
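
A linker could tolerate this case by treating a failed `git blame` as "no blamed commit" rather than aborting the whole repo. The following is a sketch under assumptions, not a patch for git_commit_linker.py: `blame_commit` and its parameters are hypothetical, the `run` argument exists only so the function can be exercised without a repository, and the command is passed as an argument list instead of a shell string.

```python
import subprocess


def blame_commit(repo_path, rev, path, line_no, run=subprocess.check_output):
    """Return the commit that last touched `line_no` of `path` at `rev`.

    Returns None when git blame fails, for example because the line
    does not exist in that revision.
    """
    cmd = ["git", "blame", "-l", "-L", f"{line_no},+1", rev, "--", path]
    try:
        out = run(cmd, cwd=repo_path, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError:
        return None
    # The first field of a blame line is the commit hash; boundary
    # commits are prefixed with '^'.
    return out.decode("utf-8", errors="replace").split(" ")[0].lstrip("^")
```

The caller would skip any region for which the function returns None and continue with the remaining regions.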

Repos may be added to the work queue multiple times.

When the CAS manager adds an ingest or analyze task to the queue, it should change the repo's status to signify that it is waiting in the queue to be ingested or analyzed. Otherwise, a repo may be added to the thread pool task queue multiple times.
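
The guard described above can be sketched as a check-and-set on the status before queuing. Everything here is hypothetical: the repo is modeled as a dict, the queue as a list, and the status strings are placeholders rather than the values the CAS manager uses.

```python
import threading

_status_lock = threading.Lock()


def enqueue_once(repo, queue, ready_status, waiting_status):
    """Queue the repo only if it is ready, marking it as waiting first."""
    with _status_lock:
        if repo["status"] != ready_status:
            # Already queued or being processed; do not add it again.
            return False
        repo["status"] = waiting_status
    queue.append(repo)
    return True


repo = {"id": "example-repo", "status": "Received"}
queue = []
print(enqueue_once(repo, queue, "Received", "Waiting to be Ingested"))
print(enqueue_once(repo, queue, "Received", "Waiting to be Ingested"))
print(len(queue))
```

In the real service the status lives in the database, so the check and the update would need to happen in one transaction to give the same guarantee.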

number of developers metric is way off for repositories like linux

For instance, in the linux repository, drivers/net/usb/qmi_wwan.c does not have 2,210 developers.