wogscpar / szzunleashed
An implementation of the SZZ algorithm, i.e., an approach to identify bug-introducing commits.
License: MIT License
The Docker image name is 'ssz', but the Readme runs 'szz'.
It seems to me that in the following code snippet (SZZUnleashed/szz/src/main/java/heuristics/SimpleBugIntroducerFinder.java, lines 180 to 207 in 79f369f), pair[0] is a bug-introducing commit and pair[1] is a bug-fixing commit as defined in the issue list.
However, in line 193 (as well as in line 224), I think the order of the pair should be (bug-fixing commit, bug-introducing commit), so (pair[1], pair[0]).
Is this correct or am I missing something?
The fetch script results in a UnicodeEncodeError, see below. An empty file "res0.json" is created in the issues subdirectory.
Environment: Python 3.6.6 on Win10.
python fetch.py
Total issue matches: 2435
Progress: | = 1000 issues
Traceback (most recent call last):
File "fetch.py", line 53, in <module>
fetch()
File "fetch.py", line 46, in fetch
f.write(conn.read().decode('utf-8'))
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2193' in position 320915: character maps to <undefined>
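The traceback suggests that the output file is opened with the platform default encoding (cp1252 here), which cannot represent '\u2193'. Below is a minimal sketch of one possible workaround, assuming the file is opened in text mode. The helper name write_response and the example path are hypothetical and are not taken from fetch.py.

```python
import os
import tempfile


def write_response(path, payload_bytes):
    """Decode a UTF-8 response body and write it with an explicit encoding.

    Hypothetical helper; fetch.py may structure this differently.
    """
    text = payload_bytes.decode('utf-8')
    # Passing encoding='utf-8' avoids falling back to the platform default
    # (cp1252 on many Windows setups), which cannot encode '\u2193'.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return text


if __name__ == '__main__':
    sample = '{"summary": "arrow \u2193"}'.encode('utf-8')
    target = os.path.join(tempfile.gettempdir(), 'res0_example.json')
    write_response(target, sample)
```

Any later reader of the file would then also need to open it with encoding='utf-8'.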
Now fetch.py only works with the Jira repository of the JENKINS project. It could be interesting to allow the script to work with any Jira project.
I would suggest to pass two parameters when the script is called from command line:
The code of the project issues in Jira: e.g., JENKINS
The name of the Jira page of the project: e.g., issues.jenkins-ci.org
Moreover, a third parameter may allow to configure the end date of the query: currently, it is set to "2018-02-20 10:34"
If you agree, I will generate a pull request with the above-mentioned changes.
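The proposal could be sketched roughly as follows. This is only an illustration under assumptions: the flag names --issue-code and --jira-project mirror the ones used in another report on this page, --end-date is a hypothetical name for the third parameter, and the JQL string is a simplified variant of the Jenkins query without the Jenkins-specific component filter.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line parameters suggested in the report."""
    parser = argparse.ArgumentParser(
        description='Fetch resolved bug issues from a Jira instance.')
    parser.add_argument('--issue-code', default='JENKINS',
                        help='Jira project key, e.g. JENKINS')
    parser.add_argument('--jira-project', default='issues.jenkins-ci.org',
                        help='Host name of the Jira instance')
    # Hypothetical flag: the report only says a third parameter "may"
    # configure the end date of the query.
    parser.add_argument('--end-date', default='2018-02-20 10:34',
                        help='Only fetch issues created up to this date')
    return parser.parse_args(argv)


def build_jql(issue_code, end_date):
    """Build a simplified JQL query for fixed bugs of one project."""
    return ('project = {} AND issuetype = Bug '
            'AND status in (Resolved, Closed) AND resolution = Fixed '
            'AND created <= "{}" ORDER BY created DESC'
            ).format(issue_code, end_date)


if __name__ == '__main__':
    args = parse_args()
    print(build_jql(args.issue_code, args.end_date))
```

Keeping the current Jenkins values as defaults would leave the existing behaviour unchanged when no arguments are given.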
I'm looking at Figure 3 in your paper (BTW, nice graphics!) and I don't understand why line 3 in Commit 3 is not blamed to either Commit 2 or Commit 1.
Could you please clarify?
The short description of the repo should be corrected. Instead of
A complete implementation of the SZZ algorithm as described by Zeller et al's.
I suggest
An implementation of the SZZ algorithm, i.e., an approach to identify bug-introducing commits.
The reference to the work by the SZZ authors is anyway first in the README.md... on top of that "Zeller et al." is wrong, since it's actually "Śliwerski et al."
Also, I suggest adding a few more tags to support findability of this repo from the software engineering research community. I suggest adding: "defect-prediction", "mining-software-repositories", and "software-engineering-research".
I don't know how to use the file assemble_code_churns.py.
The "issues" subfolder is created, but it's empty.
Here is the stack trace:
[main] INFO Main - Checking available processors...
[main] INFO Main - Found 8 processes!
Exception in thread "main" java.lang.ClassCastException: org.json.simple.JSONArray cannot be cast to org.json.simple.JSONObject
at diff.SimplePartition.splitJSON(SimplePartition.java:52)
at diff.SimplePartition.splitFile(SimplePartition.java:121)
at Main.main(Main.java:44)
How can it be used for a Public GitHub repository to find bug-fixing commits and the modified code [for any release]?
How do I use the olt parameter that you mentioned in #28?
I want to try it to avoid MemoryError.
Thanks
FileNotFoundError exception in git_log_to_array.py, see below.
Cloned jenkins to C:\Code\jenkins and provided absolute path to script. Tried a few variations of separators in file path. Not sure which file the system is looking for.
C:\Code\SZZUnleashed\fetch_jira_bugs>python git_log_to_array.py --repo-path C:/Code/jenkins
Traceback (most recent call last):
File "git_log_to_array.py", line 43, in <module>
git_log_to_json(init_hash, path_to_repo)
File "git_log_to_array.py", line 14, in git_log_to_json
stdout=subprocess.PIPE).stdout.decode('ascii').split()
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
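When [WinError 2] is raised from inside subprocess, it can mean that the executable being launched (here presumably git) was not found on PATH, rather than that the repository path is wrong. That is only a guess at the cause. A minimal sketch of a check that could be run first; the function names are hypothetical and the git invocation is an assumption, not the command used by git_log_to_array.py.

```python
import shutil
import subprocess


def find_executable(name='git'):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


def root_commits(repo_path):
    """List the root commit hashes of a repository (illustrative only)."""
    git = find_executable('git')
    if git is None:
        raise RuntimeError(
            'git was not found on PATH; install it or add it to PATH')
    # 'git -C <path>' runs the command as if started inside <path>.
    result = subprocess.run(
        [git, '-C', repo_path, 'rev-list', '--max-parents=0', 'HEAD'],
        stdout=subprocess.PIPE, check=True)
    return result.stdout.decode('ascii').split()


if __name__ == '__main__':
    print(find_executable('git'))
```

If find_executable('git') prints None, adding the Git installation directory to PATH would be the first thing to try.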
The analysis of any project in the Jira of the Apache Software Foundation fails.
As example, we executed
git clone https://github.com/apache/accumulo
python3 fetch.py --issue-code ACCUMULO --jira-project issues.apache.org/jira
python3 git_log_to_array.py --repo-path ./accumulo --from-commit 31b54248176f320cc15f432ece29f998a0d3a363
python3 find_bug_fixes.py --gitlog gitlog.json --issue-list ./issues
The last script cannot find any matching bug-fixing commit, although several commit messages mention the Jira issue ID and clearly state "fixed".
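To check independently whether commit messages really contain issue keys, something like the following could be used. It is a sketch under assumptions and does not reproduce the matching logic of find_bug_fixes.py; the function name and the sample messages are made up for illustration.

```python
import re


def find_issue_keys(message, issue_code):
    """Return all keys such as ACCUMULO-1234 mentioned in a commit message."""
    pattern = re.compile(
        r'\b' + re.escape(issue_code) + r'-\d+\b', re.IGNORECASE)
    return [m.group(0).upper() for m in pattern.finditer(message)]


if __name__ == '__main__':
    # Made-up commit messages, for illustration only.
    print(find_issue_keys('ACCUMULO-1234 fixed race condition', 'ACCUMULO'))
    print(find_issue_keys('Update docs', 'ACCUMULO'))
```

If this finds keys that the pipeline misses, comparing them with the keys stored in the downloaded issue files may help narrow down where the mismatch occurs.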
First of all, I really appreciate your work on making SZZ algorithm public. This is truly helpful for researchers and practitioners.
Secondly, I am not using it with Docker, and I am a Windows user.
Questions:
[1] I noticed that annotation.json is created quickly; however, the command line still shows "trying to find potential bug introducing commit" and stalls for a very long time. According to the documentation, "annotation.json" has the same information as "fix_and_introducers_pairs.json" but shows details about the bug-introducing file rather than the commit, so I do not understand why it stalls for so long to get the commits.
As soon as I run the program following happens
A.
B. inside each one I already have
C. however, it stalls for a very long time here.
[2] I was wondering how I would be able to get the file introducing the bug rather than the commit. Can I traverse annotation.json and look for filePath?
thank you!
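Regarding question [2], one way to experiment is a generic recursive walk over the JSON that collects every value stored under a given key. This is a sketch only: the key name filePath is taken from the question above, while the nested sample structure is invented and may not match the real layout of the annotations file.

```python
import json


def collect_values(node, key='filePath'):
    """Recursively collect all values stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


if __name__ == '__main__':
    # Invented sample; the real file layout may differ.
    sample = json.loads(
        '{"a": {"filePath": "x.java", "sub": [{"filePath": "y.java"}]}}')
    print(collect_values(sample))
```

For a real run, the parsed content would come from json.load on the annotations file instead of the inline sample.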
Successfully built szz_find_bug_introducers-0.1.jar with gradle. The res1000.json file appears to be populated with correct data.
Not sure if it is related, and also not sure what the purpose of the file results\result0\commits.json is, but it is missing and a FileNotFoundException is thrown.
If this is not a bug, then the instructions need to be updated to explain what needs to be present before running the jar file.
Output and stack trace:
java -jar szz_find_bug_introducers-0.1.jar -i C:\Code\SZZUnleashed\fetch_jira_bugs\issues\res1000.json -r C:\Code\jenkins
[main] INFO Main - Checking available processors...
[main] INFO Main - Found 8 processes!
[Thread-0] INFO parser.GitParserThread - Started process...
Exception in thread "Thread-0" java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
at parser.GitParser.readBugFixCommits(GitParser.java:342)
at parser.GitParserThread.run(GitParserThread.java:94)
[Thread-1] INFO parser.GitParserThread - Started process...
Exception in thread "Thread-1" java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.Map
at parser.GitParser.readBugFixCommits(GitParser.java:342)
at parser.GitParserThread.run(GitParserThread.java:94)
[Thread-2] INFO parser.GitParserThread - Started process...
Exception in thread "Thread-2" java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.Map
at parser.GitParser.readBugFixCommits(GitParser.java:342)
at parser.GitParserThread.run(GitParserThread.java:94)
[Thread-3] INFO parser.GitParserThread - Started process...
[Thread-4] INFO parser.GitParserThread - Started process...
[Thread-5] INFO parser.GitParserThread - Started process...
[Thread-5] INFO parser.GitParserThread - Found 0 number of commits.
[Thread-5] INFO parser.GitParserThread - Checking each commits diff...
[Thread-5] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-5] INFO parser.GitParserThread - Saving parsed commits to file
Exception in thread "Thread-4" java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.Map
at parser.GitParser.readBugFixCommits(GitParser.java:342)
at parser.GitParserThread.run(GitParserThread.java:94)
[Thread-5] INFO parser.GitParserThread - Building line mapping graph.
[Thread-6] INFO parser.GitParserThread - Started process...
[Thread-5] INFO parser.GitParserThread - Saving results to file
[Thread-6] INFO parser.GitParserThread - Found 0 number of commits.
[Thread-6] INFO parser.GitParserThread - Checking each commits diff...
[Thread-5] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-6] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-6] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-5] INFO parser.GitParserThread - Saving found bug introducing commits...
[Thread-7] INFO parser.GitParserThread - Started process...
[Thread-6] INFO parser.GitParserThread - Building line mapping graph.
[Thread-7] INFO parser.GitParserThread - Found 0 number of commits.
[Thread-6] INFO parser.GitParserThread - Saving results to file
[Thread-7] INFO parser.GitParserThread - Checking each commits diff...
[Thread-7] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-6] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-7] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-6] INFO parser.GitParserThread - Saving found bug introducing commits...
[Thread-7] INFO parser.GitParserThread - Building line mapping graph.
[Thread-7] INFO parser.GitParserThread - Saving results to file
[Thread-7] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-7] INFO parser.GitParserThread - Saving found bug introducing commits...
Exception in thread "Thread-3" java.lang.ClassCastException: org.json.simple.JSONArray cannot be cast to java.util.Map
at parser.GitParser.readBugFixCommits(GitParser.java:342)
at parser.GitParserThread.run(GitParserThread.java:94)
java.io.FileNotFoundException: results\result0\commits.json (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileReader.<init>(Unknown Source)
at diff.SimplePartition.mergeFiles(SimplePartition.java:140)
at Main.main(Main.java:65)
java.io.FileNotFoundException: results\result1\commits.json (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileReader.<init>(Unknown Source)
at diff.SimplePartition.mergeFiles(SimplePartition.java:140)
at Main.main(Main.java:65)
java.io.FileNotFoundException: results\result2\commits.json (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileReader.<init>(Unknown Source)
at diff.SimplePartition.mergeFiles(SimplePartition.java:140)
at Main.main(Main.java:65)
java.io.FileNotFoundException: results\result3\commits.json (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileReader.<init>(Unknown Source)
at diff.SimplePartition.mergeFiles(SimplePartition.java:140)
at Main.main(Main.java:65)
java.io.FileNotFoundException: results\result4\commits.json (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileReader.<init>(Unknown Source)
at diff.SimplePartition.mergeFiles(SimplePartition.java:140)
at Main.main(Main.java:65)
I believe we found some explicit calls for a complete open source SZZ implementation in the literature. I don't remember which papers raised this need. Does anyone know? It would make sense to add references to these in the Readme, to state that we "respond to the call by [REF] and [REF]".
I may be totally wrong here, but it seems to me that Configuration.java supports more parameters than the two specified in Readme.md for the szz_find_bug_introducers-<version_number>.jar file.
Maybe an explanation about them should be added in the Readme.md file.
In particular, I think that it may be important to explain the parameter that sets the number of cores during the execution. Currently, the Readme.md file states:
The algorithm tries to use as many cores as possible during runtime.
However, the option to enable the user to set the number of cores seems to have been implemented in Configuration.java. I think this possibility should be explained in Readme.md since it may be relevant for the users.
What do you think?
I noticed that the contents of your res0.json, res1000.json and res2000.json are exactly the same.
Regardless of what start_at and max_results are, it will get all the data (no difference).
It always returns:
{"expand":"schema,names","startAt":0,"maxResults":50,"total":2445,"issues":
[{"expand":"operations,versionedRepresentations,editmeta,changelog,renderedFields",
"id":"188567","self":"https://issues.jenkins-ci.org/rest/api/2/issue/188567","key":
"JENKINS-49642","fields":{"issuetype":
{"self":"https://issues.jenkins-ci.org/rest/api/2/issuetype/1","id":"1",
"description":"A problem which impairs or prevents the functions of the product.","iconUrl":
...
You can paste the following URL into your browser:
https://issues.jenkins-ci.org/rest/api/2/search?jql=project%20%3D%20JENKINS%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20resolution%20%3D%20Fixed%20AND%20component%20%3D%20core%20AND%20created%20%3C%3D%20%222018-02-20%2010%3A34%22%20ORDER%20BY%20created%20DESC&start_at=0&max_results=1
https://issues.jenkins-ci.org/rest/api/2/search?jql= \
project = JENKINS AND issuetype = Bug AND status in (Resolved, Closed) \
AND resolution = Fixed AND component = core \
AND created <= "2018-02-20 10:34" \
ORDER BY created DESC&start_at=0&max_results=1
line number such as,
{"0": 0}
Jenkins or Jira issues are not downloaded correctly. The function fetch in fetch.py downloads only 50 issues at a time, and pagination does not work either, as the first 50 issues are downloaded every time.
I tested the code with several Jira projects and the Jenkins project provided as an example. All tested projects had the same problem.
Apparently the syntax of the Jira REST API has changed, which causes the problem. I fixed the issue by changing start_at to startAt and max_results to maxResults. An example of my fix is below:
request = 'https://' + jira_project_name + '/rest/api/2/search?'\
+ 'jql={}&startAt={}&maxResults={}'
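A minimal sketch of how paginated request URLs could be built with the startAt and maxResults parameters. The helper names build_search_url and page_offsets are hypothetical, and the sketch only constructs URLs without performing any network request.

```python
from urllib.parse import quote


def build_search_url(jira_host, jql, start_at, max_results):
    """Build one Jira search URL using the startAt/maxResults parameters."""
    return ('https://{}/rest/api/2/search?jql={}&startAt={}&maxResults={}'
            .format(jira_host, quote(jql), start_at, max_results))


def page_offsets(total, page_size):
    """Return the startAt offsets needed to cover `total` issues."""
    return list(range(0, total, page_size))


if __name__ == '__main__':
    jql = 'project = JENKINS AND issuetype = Bug'
    for offset in page_offsets(120, 50):
        print(build_search_url('issues.jenkins-ci.org', jql, offset, 50))
```

Since the server may cap maxResults (the response shown in another report says "maxResults":50), it seems safer to step through the offsets using the page size reported in the response rather than the requested one.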
I refer to it and ran the SZZ program for 5 days without getting any results.
java -Xmx64g -jar ${JAR_PATH}/szz_find_bug_introducers-0.1.jar -d 1 -i ${ISSUE_LIST_PATH}/hadoop.json -r ${REPOS_PATH}/hadoop/
$ tree hadoop/
hadoop/
├── issues
│ ├── fix_and_introducers_pairs_0.json
│ ├── fix_and_introducers_pairs_1.json
│ ├── fix_and_introducers_pairs_2.json
│ ├── fix_and_introducers_pairs_3.json
│ ├── fix_and_introducers_pairs_4.json
│ ├── fix_and_introducers_pairs_5.json
│ ├── fix_and_introducers_pairs_6.json
│ └── fix_and_introducers_pairs_7.json
└── results
├── result0
│ ├── annotations.json
│ └── commits.json
├── result1
│ ├── annotations.json
│ └── commits.json
├── result2
│ └── commits.json
├── result3
│ └── commits.json
├── result4
│ ├── annotations.json
│ └── commits.json
├── result5
│ ├── annotations.json
│ └── commits.json
├── result6
│ ├── annotations.json
│ └── commits.json
└── result7
├── annotations.json
└── commits.json
10 directories, 22 files
$ free -g
total used free shared buff/cache available
Mem: 32 32 0 0 0 0
Swap: 41 28 13
and nohup.out
[main] INFO Main - Checking available processors...
[main] INFO Main - Found 8 processes!
[Thread-0] INFO parser.GitParserThread - Started process...
[Thread-3] INFO parser.GitParserThread - Started process...
[Thread-4] INFO parser.GitParserThread - Started process...
[Thread-5] INFO parser.GitParserThread - Started process...
[Thread-6] INFO parser.GitParserThread - Started process...
[Thread-7] INFO parser.GitParserThread - Started process...
[Thread-8] INFO parser.GitParserThread - Started process...
[Thread-9] INFO parser.GitParserThread - Started process...
[Thread-8] INFO parser.GitParserThread - Found 1839 number of commits.
[Thread-8] INFO parser.GitParserThread - Checking each commits diff...
[Thread-8] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-9] INFO parser.GitParserThread - Found 1926 number of commits.
[Thread-9] INFO parser.GitParserThread - Checking each commits diff...
[Thread-9] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-3] INFO parser.GitParserThread - Found 1949 number of commits.
[Thread-3] INFO parser.GitParserThread - Checking each commits diff...
[Thread-3] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-6] INFO parser.GitParserThread - Found 1977 number of commits.
[Thread-6] INFO parser.GitParserThread - Checking each commits diff...
[Thread-6] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-4] INFO parser.GitParserThread - Found 2015 number of commits.
[Thread-4] INFO parser.GitParserThread - Checking each commits diff...
[Thread-4] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-7] INFO parser.GitParserThread - Found 1941 number of commits.
[Thread-7] INFO parser.GitParserThread - Checking each commits diff...
[Thread-7] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-5] INFO parser.GitParserThread - Found 2047 number of commits.
[Thread-5] INFO parser.GitParserThread - Checking each commits diff...
[Thread-5] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-0] INFO parser.GitParserThread - Found 1973 number of commits.
[Thread-0] INFO parser.GitParserThread - Checking each commits diff...
[Thread-0] INFO parser.GitParserThread - Parsing difflines for all found commits.
[Thread-3] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-8] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-9] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-0] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-4] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-6] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-7] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-5] INFO parser.GitParserThread - Saving parsed commits to file
[Thread-3] INFO parser.GitParserThread - Building line mapping graph.
[Thread-7] INFO parser.GitParserThread - Building line mapping graph.
[Thread-4] INFO parser.GitParserThread - Building line mapping graph.
[Thread-8] INFO parser.GitParserThread - Building line mapping graph.
[Thread-6] INFO parser.GitParserThread - Building line mapping graph.
[Thread-9] INFO parser.GitParserThread - Building line mapping graph.
[Thread-5] INFO parser.GitParserThread - Building line mapping graph.
[Thread-0] INFO parser.GitParserThread - Building line mapping graph.
[Thread-3] INFO parser.GitParserThread - Saving results to file
[Thread-3] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-6] INFO parser.GitParserThread - Saving results to file
[Thread-6] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-8] INFO parser.GitParserThread - Saving results to file
[Thread-8] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-0] INFO parser.GitParserThread - Saving results to file
[Thread-0] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-9] INFO parser.GitParserThread - Saving results to file
[Thread-9] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
[Thread-7] INFO parser.GitParserThread - Saving results to file
[Thread-7] INFO parser.GitParserThread - Trying to find potential bug introducing commits...
In the commits.json file, the line numbers in the diff dict do not seem right: they should be incremented by 1 to give the actual line numbers.
Running git blame -l b831acd9854b525d680ca72fd218c848121b9d3f^ -- core/src/test/java/hudson/model/ViewTest.java shows that the deleted line is actually line 101, not 100. The same holds for the add dict: 1 should be added when the result is written to file.
In addition, the annotation graph result confuses me. For line 101 in this file, git blame shows that the bug-introducing commit hash should be 67827c7eaac821aa22a2f26bd4dbe7d44470b6c9, not 05b46659e451c316fb5f1a5243c49b9a84a50702, which is the result in annotations.json.
In SZZUnleashed/szz/src/main/java/parser/GitParser.java, line 212, should the variable i be the index?
When running fetch.py, how can I use it with another project?
A lot of open-source projects rely on Github issues for their issue tracker. Since researchers work a lot with open source repositories, I think it's extremely valuable to have a support for retrieving issues from Github issues and using it with SZZUnleashed.
Could this feature be added?
I have two questions:
First, in the fix_and_bug_introducers.json, is the order [fixing, buggy] or the opposite? I am confused because commits.json which is supposed to contain buggy files has the commit numbers which are at location 0 in the pairs in fix_and_bug_introducers.json?
Second, when SZZUnleashed finds a commit as buggy would it label the entire commit as buggy or just a few files are labelled as buggy? Apparently commits.json contains the entire commit(as it appears in GitHub), and not just a few files that might have caused the bug.
If someone can kindly answer these I'd be grateful.