Comments (6)
Hello @dmgerman! We appreciate your feedback and I apologize for the delayed response. I absolutely agree that GHTorrent is the largest hurdle for using GHData. We are exploring solutions and will include your SQLite suggestion in our discussions :)
from augur.
@dmgerman: there is no coupling to GHTorrent as a data source. If you mimic the schema from another source, that will work. The technological challenge is that operationalizing the metrics against the GHTorrent schema requires queries more complex than SQLite will support.
We could refactor some of that with pandas.
Is the main aim here to have a developer version on lighter-weight technology?
Can you give me an example of a query that SQLite will not be able to handle (that MySQL can)? In my experience it is the other way around (MySQL does not support merge joins, for example).
The goal is to replace the need for GHTorrent with a script that does the equivalent mining, but only for one project. That will be significantly easier than having to install GHTorrent.
@dmgerman, GHTorrent appears to already support SQLite and can be configured to only load data for a set list of repositories. If you would like to add SQLite support to ghdata, you could either modify ghtorrent.py to work with both MySQL and SQLite, or add another data source focused on SQLite.
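As a rough sketch of the first option, the SQL could stay in one place while the parameter placeholder is chosen from the driver's DB-API `paramstyle`. The class name `RepoDataSource`, the method `commit_count`, and the `commits` table layout are illustrative assumptions, not ghdata's actual code.

```python
import sqlite3


class RepoDataSource:
    """Hypothetical data source that works with any DB-API 2.0 connection."""

    def __init__(self, connection, paramstyle):
        self.connection = connection
        # sqlite3 uses "?" (qmark); common MySQL drivers use "%s" (format/pyformat).
        self.placeholder = "?" if paramstyle == "qmark" else "%s"

    def commit_count(self, project_id):
        # Assumed schema: a `commits` table with a `project_id` column.
        sql = "SELECT COUNT(*) FROM commits WHERE project_id = " + self.placeholder
        cursor = self.connection.cursor()
        cursor.execute(sql, (project_id,))
        return cursor.fetchone()[0]


def demo():
    # In-memory SQLite database standing in for a real GHTorrent-style database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (id INTEGER PRIMARY KEY, project_id INTEGER)")
    conn.executemany("INSERT INTO commits (project_id) VALUES (?)", [(1,), (1,), (2,)])
    source = RepoDataSource(conn, sqlite3.paramstyle)
    return source.commit_count(1)
```

With a MySQL driver, the same class would be constructed with that driver's connection and its module-level `paramstyle`; queries that rely on MySQL-only syntax would still need separate handling.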
I think there are specific challenges being addressed in this issue:
- Wanting to run GHData against a smaller dataset, or against a dataset that is not captured in GHTorrent (i.e., not a GitHub repository). We think this will be accomplished by using Perceval as a mapper/aggregator. This is something Jesus and I are discussing as part of the CHAOSS project.
- Actual support for SQLite: Since GHTorrent now provides .csv files by default, I think a subset could be mapped into SQLite, as @howderek notes above.
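Mapping a subset of .csv files into SQLite could look roughly like the sketch below, which uses only the Python standard library. The helper `load_csv`, the `projects` table, and its two columns are hypothetical, and the sketch assumes headerless CSV files whose columns are supplied by the caller.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, columns, csv_file):
    """Create `table` with the given columns and bulk-insert rows from a headerless CSV.

    `table` and `columns` are interpolated into SQL, so they must come from
    trusted code, never from user input.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", csv.reader(csv_file))
    conn.commit()


def demo():
    # Stand-in for a dump file; a real file would be opened with open(path, newline="").
    sample = io.StringIO("1,alice/project\n2,bob/tool\n")
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "projects", ["id", "name"], sample)
    return conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
```

Because the columns are created without declared types, every value is stored as text; a fuller loader would declare types and indexes to match the schema the metrics queries expect.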
Hello @dmgerman,
I just wanted to follow up on your feedback about GHTorrent being burdensome. We have discussed a new architecture in which Augur does not rely on GHTorrent (we agree that the dependency is an issue), and we will work on implementing it in the coming months.
Thank you again for your feedback!