cas_coderepoanalyzer's People

Contributors

bgrawi, tofferrosen, xsultan

cas_coderepoanalyzer's Issues

Can we download the results as a .csv file for newly listed projects?

Can I download the metric values as a .csv file for newly listed projects? I created an account in the tool, signed in, and entered the URL of a git repository. The analysis ran successfully and displayed the metric values. However, I still can't download the data as a .csv file. I noticed that data for already existing projects can be downloaded as .csv, but not the data for the project I added, even though it is listed as public.

I can't run the project

Hi!

I followed the README of the 'CAS_CodeRepoAnalyzer' project step by step, but it's not working!

Can anyone help?

My best regards,
Rubson Lima

"list index out of range" while analyzing

When repo ID d6e977d4-e3da-4d04-8604-56a8c0473f9d starts analyzing, "list index out of range" is printed to the console. This also leaves the status at "Analyzing", which is misleading because the repo is no longer being analyzed. It also means that the repo will never be analyzed in future passes.
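
One way to avoid the stuck "Analyzing" status might be to catch the failure and record it. This is only a minimal sketch: the function name, the `statuses` dictionary and the status strings "Analyzed" and "Error" are assumptions for illustration, not names taken from the project.

```python
from typing import Callable, Dict


def analyze_with_status(repo_id: str,
                        statuses: Dict[str, str],
                        analyze: Callable[[str], None]) -> None:
    """Run analyze(repo_id) and always leave a truthful status behind.

    `statuses` stands in for wherever the real status is stored
    (hypothetical; the project keeps it in its database).
    """
    statuses[repo_id] = "Analyzing"
    try:
        analyze(repo_id)
    except Exception as exc:  # e.g. IndexError: list index out of range
        # Record the failure instead of leaving the repo in "Analyzing".
        statuses[repo_id] = "Error"
        print(f"analysis of {repo_id} failed: {exc}")
    else:
        statuses[repo_id] = "Analyzed"
```

With a distinct failure status, a later pass could decide whether to retry the repo instead of skipping it forever.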

SEXP metric error

It seems an error is being made in the calculation of the SEXP metric during ingestion. More specifically:

in ingester/git.py:

sexp = experiences[subsystem] sets the SEXP metric to the developer's experience in whichever subsystem was most recently seen for the first time.

However, if the commit changes more than one subsystem, SEXP should instead be an aggregate of the subsystem experiences.

The solution seems to be to sum the values into sexp and divide by nf afterwards (similar to exp and rexp).

What do you think?
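
The proposal above could look roughly like the following. This is a sketch under assumptions: the function name and the `file_subsystems` argument (one subsystem name per changed file) are made up for illustration, and the real code in ingester/git.py may organize its loop differently.

```python
from typing import Dict, Sequence


def average_sexp(experiences: Dict[str, int],
                 file_subsystems: Sequence[str]) -> float:
    """Average the developer's subsystem experience over all changed files.

    experiences     -- prior change count per subsystem for this developer
    file_subsystems -- subsystem of each file touched by the commit
    """
    nf = len(file_subsystems)  # number of files changed, as in the issue
    if nf == 0:
        return 0.0
    # Sum the experience for every changed file, then divide by nf.
    total = sum(experiences.get(subsystem, 0) for subsystem in file_subsystems)
    return total / nf
```

For example, with experiences of 10 in "core" and 2 in "docs", a commit touching two "core" files and one "docs" file would get (10 + 10 + 2) / 3 rather than just the value of the last subsystem seen.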

Cache commit threshold and historical analysis

Instead of doing the historical analysis on each request on the web frontend, do it directly after re-ingesting and either store it in memcached, or directly in postgres in a json field. This could also be done on the web side if deemed too complicated.
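
A rough sketch of the compute-once idea, assuming a plain dictionary as a stand-in for memcached or a Postgres JSON column. The function names and the shape of the result are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional


def refresh_cache(repo_id: str,
                  compute: Callable[[str], Any],
                  store: Dict[str, str]) -> None:
    """Run the expensive analysis once, right after (re-)ingestion,
    and store the result as a JSON string keyed by repo id."""
    store[repo_id] = json.dumps(compute(repo_id))


def cached_analysis(repo_id: str, store: Dict[str, str]) -> Optional[Any]:
    """What the web frontend would call: a cheap lookup, no recomputation."""
    raw = store.get(repo_id)
    return None if raw is None else json.loads(raw)
```

Serializing to JSON at write time keeps the stored value usable from both the Python side and the Node frontend.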

Unicode problem in text parsing

If you look at commit 79f59c2144 on http://commit.guru/repo/jquery you will see the person's email with failed Unicode chars in it (the u7352 stuff). We need to figure out where the slash is lost. It could be anywhere from:

  • The first ingestion
  • Python itself
  • SqlAlchemy parsing/coercing
  • Database collation (content type),

Or on the front end:

  • Node maybe can't handle it?
  • The waterline ORM is incorrectly parsing it
  • The socket connection might be losing it
  • Angular.js might be losing it
  • The actual html page might have the incorrect character encoding.

We need to rule out the CodeRepoAnalyzer -> database path as the cause before I start digging into the front-end side of things.

My guess is that it's lost from the first ingestion from the git log output, but I might be wrong.
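
To narrow down which stage drops the slash, a small check could be run on the text at each hop (git log output, database row, API response). The sketch below is a heuristic only: the regular expression and the function name are assumptions, and ordinary words such as "ubeef" would also match.

```python
import re

# A "u" followed by four hex digits that is NOT preceded by a backslash,
# i.e. what is left of an escape like \u7352 once the backslash is gone.
SUSPECT = re.compile(r"(?<!\\)u[0-9a-fA-F]{4}")


def find_lost_escapes(text: str) -> list:
    """Return substrings that look like Unicode escapes missing their backslash."""
    return SUSPECT.findall(text)


# Reproducing the suspected failure mode:
original = "\u7352"  # a single non-ASCII character
escaped = original.encode("unicode_escape").decode("ascii")  # backslash + u7352
damaged = escaped.replace("\\", "")  # backslash lost somewhere in the pipeline
```

Running `find_lost_escapes` on the value read straight from the database would show whether the damage already exists before the front end touches it.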

Flag commits as MERGE

If all of a commit's metrics are 0, then it must be a merge commit. You could also scan the commit message for "merge", as together with the zeroed metrics it is a very reliable indicator of a merge commit.

When the analysis is taking place, ignore any commit marked as merge as this will almost definitely have a negative impact on the quality of the model.

Add functionality to analyze specific branch

Some projects do not commit many changes to master, hence, it would be nice to have an advanced option where users can specify a branch to be analyzed.

Proposed by: Yasutaka Kamei

Incorrect detection of merge commits.

From issue #2:

You could also scan the commit message for "merge" as it is very reliably going to indicate a merge commit in association

This is not a good way to detect merge commits. Particularly in the Gerrit project, the word "merge" is very often used in commit messages for commits that are not actually merges.

In git, a merge commit can be detected by counting its parent commits. A merge commit always has two or more parents, while a regular commit has only one (or none, for the initial commit).
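
A minimal sketch of the parent-count check, assuming the input is the output of `git rev-list --parents <branch>`, where each line holds a commit hash followed by its parent hashes. The function names are illustrative and not part of the project.

```python
from typing import Iterable, List


def parent_count(rev_list_line: str) -> int:
    """Count parents on one '<commit> <parent> <parent> ...' line."""
    return max(len(rev_list_line.split()) - 1, 0)


def merge_commits(rev_list_lines: Iterable[str]) -> List[str]:
    """Return the hashes of all commits that have two or more parents."""
    merges = []
    for line in rev_list_lines:
        if parent_count(line) >= 2:
            merges.append(line.split()[0])
    return merges
```

This ignores the commit message entirely, so commits that merely mention "merge" are not misclassified.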

Error when linking a commit for MySQLTuner: tries to annotate a line that doesn't exist

2014-04-19 11:20:34,236 ERROR: Got an exception linking bug fixing changes to bug inducing changes for repo 65db096f-b5fb-465a-8724-455b03ba0b2b
Traceback (most recent call last):
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/analyzer.py", line 81, in analyzeRepo
    git_commit_linker.linkCorrectiveCommits(corrective_commits, all_commits)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 43, in linkCorrectiveCommits
    buggy_commits = self._linkCorrectiveCommit(corrective_commit)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 78, in _linkCorrectiveCommit
    bug_introducing_changes = self.gitAnnotate(region_chunks, commit)
  File "/home/cas_user/cas/CAS_CodeRepoAnalyzer/analyzer/git_commit_linker.py", line 235, in gitAnnotate
    + file + "'", shell=True, cwd= self.repo_path )).split(" ")[0][2:]
  File "/usr/lib/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command 'git blame -L864,+1 c8c2dd95182289eb6eab140ec7964d346bc93601^ -l -- 'mysqltuner.pl'' returned n
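
One possible mitigation, sketched under assumptions, is to let a failing `git blame` skip that single line instead of aborting the whole linking step. The helper names below are hypothetical; the argument list mirrors the command shown in the traceback but is passed as a list rather than a shell string, which also avoids quoting problems with file names.

```python
import subprocess
from typing import List, Optional


def blame_args(commit: str, line: int, path: str) -> List[str]:
    """Build the blame command for one line of the parent of `commit`."""
    return ["git", "blame", f"-L{line},+1", f"{commit}^", "-l", "--", path]


def run_or_none(args: List[str], cwd: Optional[str] = None) -> Optional[str]:
    """Run a command and return its output, or None if it exits non-zero
    (for example when the requested line does not exist in that revision)."""
    try:
        out = subprocess.check_output(args, cwd=cwd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError:
        return None
    return out.decode("utf-8", errors="replace")
```

A caller could then treat `None` as "no bug-introducing commit found for this line" and continue with the remaining regions.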

Repos may be added to the work queue multiple times.

When the CAS manager adds an ingest or analyze task to the queue, it should change the repo's status to signify that it is waiting in the queue to be ingested or analyzed. Otherwise, a repo may be added multiple times to the thread pool task queue.
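
A minimal sketch of the idea, assuming an in-memory set of queued repos guarded by a lock. In the real system the "waiting" marker would be the repo status in the database; the class and method names here are made up for illustration.

```python
import queue
import threading


class RepoQueue:
    """Task queue that refuses to enqueue a repo that is already waiting."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = set()      # repo ids queued or being processed
        self.tasks = queue.Queue()   # (repo_id, task_name) pairs

    def enqueue(self, repo_id: str, task_name: str) -> bool:
        """Mark the repo as waiting and queue it; return False if already queued."""
        with self._lock:
            if repo_id in self._in_flight:
                return False
            self._in_flight.add(repo_id)
            self.tasks.put((repo_id, task_name))
            return True

    def finish(self, repo_id: str) -> None:
        """Call when a worker is done so the repo may be queued again."""
        with self._lock:
            self._in_flight.discard(repo_id)
```

Checking and marking under the same lock is what prevents two scheduler passes from both seeing the repo as not yet queued.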
