ai-se / process-product
Process Vs Product Metrics
License: Apache License 2.0
Page 2:
Content: "unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large"
Comment: OK, but why? The current motivation is not that strong.
Page 3:
Content: "Now we can access data on hundreds to thousands of projects. How does this change software analytics?"
Comment: In what respect? Does this refer to pooling data of multiple projects together, or to building models for more projects? The motivation is rather vague at this point.
Page 3:
Content: "For example, for 722,471 commits studied in this paper, data collected required 500 days of CPU (using five machines, 16 cores, 7 days)."
Comment: OK, but actual companies building a model for themselves would not need to do this on thousands of projects, hence the effort would be lower for them?
Page 6:
Content: " in both released based and JIT based setting. A"
Comment: These study settings should be mentioned more explicitly before the RQs.
Page 6:
Content: "process metrics have significantly lower correlation than product metrics in both released based and JIT based setting"
Comment: Is this a good or a bad thing?
Table 2: Do metrics like "age" only apply to JIT models? What are "neighbors"? What is "recent"?
Table 3: It might be better to order the papers by year to prove the point about recent studies including relatively few projects in their data set.
Page 11:
Content: "The papers in the intersection are [60, 48, 24, 6] explore both process and product metrics."
Comment: Where is the 5th one?
Page 12:
Content: " more than 8 issues."
Comment: Why 8?
Page 12:
Content: " eight contributors."
Comment: Why 8?
Page 12:
Content: " modified version of Commit Guru [65] "
Comment: What modifications were made?
Table 4: Are these metrics calculated on the last version of each repo, for java files only, or across all commits of all repos? The latter could bias the results to older, larger projects?
Page 13:
Content: " using a keyword based search."
Comment: What keywords were used?
Page 14:
Content: " uses SZZ algorithm"
Comment: Which SZZ implementation? Is the bug report date-heuristic used?
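To illustrate why the implementation matters: some SZZ variants apply a date heuristic that discards candidate bug-inducing commits made after the bug was reported, while others do not. A minimal sketch of that filter (shas and dates are hypothetical, not from the paper):

```python
from datetime import datetime

def plausible_inducers(candidates, bug_report_date):
    # SZZ date heuristic: a commit dated after the bug was reported
    # cannot have induced that bug, so it is filtered out of the
    # blame-derived candidate set.
    return [c for c in candidates if c["date"] <= bug_report_date]

report = datetime(2020, 3, 1)
candidates = [
    {"sha": "a1", "date": datetime(2020, 1, 10)},
    {"sha": "b2", "date": datetime(2020, 4, 5)},  # later than the report
]
print([c["sha"] for c in plausible_inducers(candidates, report)])  # -> ['a1']
```

Whether this filter is applied changes which commits get labeled defective, so the paper should name the exact implementation.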
Page 14:
Content: "use the release number, release date information supplied from the API to group commits into releases and thus dividing each project the into multiple releases for each of the metrics."
Comment: Did all projects have releases?
Page 14:
Content: " or was changed in a defective child commit."
Comment: Why?
Page 16:
Content: "But by reporting on results from both methods, it is more likely that other researchers will be able to compare their results against ours."
Comment: Nice.
Page 20:
Content: "see any significant benefit when accessing the performance in regards to the Popt20, which is another effort aware evaluation criteria used by Kamei et al. and this study."
Comment: Somehow, even product metrics seem to perform equally well on this metric.
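For reference, a simple member of this family of effort-aware criteria is recall within the first 20% of LOC when files are inspected in descending predicted risk; the paper's Popt20 is typically an area-based normalization of the same lift chart, so this sketch (all values made up) only illustrates the idea:

```python
def recall_at_20_percent_loc(files):
    """files: list of (predicted_risk, loc, is_defective 0/1).
    Inspect files in descending predicted risk until 20% of the
    total LOC is spent; return the fraction of defects caught."""
    budget = 0.2 * sum(loc for _, loc, _ in files)
    total_defects = sum(d for _, _, d in files)
    caught = spent = 0
    for risk, loc, defective in sorted(files, key=lambda f: -f[0]):
        if spent + loc > budget:
            break
        spent += loc
        caught += defective
    return caught / total_defects if total_defects else 0.0

files = [(0.9, 100, 1), (0.8, 50, 1), (0.1, 850, 1)]
print(recall_at_20_percent_loc(files))  # -> 0.666... (2 of 3 defects)
```

Since the ranking is weighted by LOC, any learner that tends to flag small files can score well here, which may explain why product metrics look equally good on this criterion.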
Page 21:
Content: "With the exception of AUC"
Comment: Popt20?
Page 23:
Content: "evident from the results, that file level prediction shows statistically significant improvement"
Comment: Supported by statistical test results?
Page 24:
Content: " then check each of the 3 subsequent releases"
Comment: In terms of what?
Page 24:
Content: "see in both process based and product based models the Popt20 does significantly better in the third release"
Comment: Perhaps many projects have only few releases?
Page 24:
Content: "This basically means if either process or product metrics can capture such differences, then the metric values for a file between release R and R+1 should not be highly correlated."
Comment: Since process metrics capture the development process, would a low correlation imply changes in the process?
Page 24:
Content: " Spearman correlation values for every file between two consecutivereleases for all the projects explored as a violin plot for each type of metrics."
Comment: Basically, for each file there is one spearman correlation across all its metrics, then those correlations are aggregated across all files, all commits and all projects into one violin plot?
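To make the question concrete, here is the aggregation I am assuming (release and metric names are hypothetical): one Spearman rho per file per consecutive release pair, computed across that file's metric vector, with all rhos pooled into the distribution behind one violin.

```python
from scipy.stats import spearmanr

def per_file_stability(releases):
    """releases: {release_id: {file: {metric: value}}}.
    For each file present in two consecutive releases, compute one
    Spearman rho across its metric vector; pool the rhos over all
    files and release pairs into one distribution (one violin)."""
    rhos = []
    ids = sorted(releases)
    for r, r_next in zip(ids, ids[1:]):
        for f in releases[r].keys() & releases[r_next].keys():
            names = sorted(releases[r][f])
            rho, _ = spearmanr([releases[r][f][m] for m in names],
                               [releases[r_next][f][m] for m in names])
            rhos.append(rho)
    return rhos

releases = {"R1": {"A.java": {"loc": 100, "la": 5, "ndev": 2}},
            "R2": {"A.java": {"loc": 120, "la": 9, "ndev": 3}}}
print(per_file_stability(releases))  # one rho per surviving file
```

If that reading is right, the text should say so explicitly; if the correlation is instead per metric across files, the interpretation changes considerably.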
Page 27:
Content: "indicate the models are probability learning to predict the same set of files defective and finding the same defect percentage in the test set as training set and it is not able to properly differentiate between defective and non-defective files."
Comment: What is the difference with RQ5?
Page 27:
Content: "Spearman rank correlation between the learned and predicted probability for models built using process and product metrics."
Comment: Is this analysis per file, then aggregated across all files and all projects?
Page 27:
Content: "part 3 only contains files which are defective in training and not in test set,"
Comment: The other way around?
Page 28:
Content: " using both process and product metrics "
Comment: "both" or "either"?
Page 30:
Content: "sorted by the absolute value of their β-coefficients within the learned regression equation."
Comment: Are coefficients comparable across the different metrics in a logistic regression model? Why not use odds ratios?
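On comparability: raw β's are only comparable across metrics if the features share a scale (e.g. standardized), and odds ratios exp(β) would make the effect sizes interpretable either way. A small sketch (coefficient values and metric names are made up, not from the paper):

```python
import math

# Hypothetical coefficients from a logistic regression fit on
# standardized features; names and values are for illustration only.
coefs = {"la": 0.9, "ndev": -0.4, "loc": 1.2}

# exp(beta) is the multiplicative change in the odds of "defective"
# per one-standard-deviation increase in the metric.
odds_ratios = {m: math.exp(b) for m, b in coefs.items()}

# Ranking by distance of the odds ratio from 1 (on the log scale)
# coincides with ranking by |beta| -- but only once the features
# share a scale, which is exactly the comparability concern.
ranked = sorted(coefs, key=lambda m: abs(coefs[m]), reverse=True)
print(ranked)  # -> ['loc', 'la', 'ndev']
```

Stating whether the features were standardized before fitting would resolve the question either way.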
Page 31:
Content: "have relied on issues marked as a 'bug' or 'enhancement' to count bugs or enhancements"
Comment: Which metrics leverage information about enhancements?
Page 32:
Content: "took precaution to remove any pull merge requests from the commits to remove any extra contributions added to the hero programmer."
Comment: Any details about this?
Page 32:
Content: " process metrics generate better predictors than process metrics"
Comment: Something seems wrong in this sentence.