Comments (6)

howderek avatar howderek commented on May 21, 2024

Hello @dmgerman! We appreciate your feedback and I apologize for the delayed response. I absolutely agree that GHTorrent is the largest hurdle for using GHData. We are exploring solutions and will include your SQLite suggestion in our discussions :)

from augur.

sgoggins avatar sgoggins commented on May 21, 2024

@dmgerman: there is no hard coupling to GHTorrent as a data source. If you reproduce its schema from another source, that will work. The technical challenge is that operationalizing the metrics against the GHTorrent schema requires queries more complex than SQLite will support.

We could refactor some of that with pandas.
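A minimal sketch of what such a refactor might look like, assuming a hypothetical `commits` table with `project_id` and `created_at` columns (these names are illustrative, not taken from the actual schema). The database only runs a plain `SELECT`, and the grouping that would otherwise need engine-specific SQL is done in pandas:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in data; the table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (project_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO commits VALUES (?, ?)",
    [(1, "2017-01-03"), (1, "2017-01-20"), (1, "2017-02-11"), (2, "2017-01-05")],
)

# Keep the SQL trivial so it runs on any backend.
commits = pd.read_sql_query(
    "SELECT project_id, created_at FROM commits",
    conn,
    parse_dates=["created_at"],
)

# Do the aggregation in pandas instead of in the database.
commits["month"] = commits["created_at"].dt.strftime("%Y-%m")
commits_per_month = (
    commits.groupby(["project_id", "month"]).size().reset_index(name="commits")
)
```

The trade-off is that the raw rows are pulled into memory, so this suits a single-project dataset better than a full dump.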

Is the main aim here to have a developer version that runs on lighter-weight technology?

dmgerman avatar dmgerman commented on May 21, 2024

Can you give me an example of a query that SQLite cannot handle but MySQL can? In my experience it is the other way around (MySQL does not support merge joins, for example).

The goal is to replace the need for GHTorrent with a script that does the equivalent mining, but only for one project. That will be significantly easier than having to install GHTorrent.

howderek avatar howderek commented on May 21, 2024

@dmgerman, GHTorrent appears to already support SQLite and can be configured to only load data for a set list of repositories. If you would like to add SQLite support to ghdata, you could either modify ghtorrent.py to work with both MySQL and SQLite, or add another data source focused on SQLite.
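As a rough illustration of the first option, and assuming the data source connects through a SQLAlchemy-style database URL (an assumption about how `ghtorrent.py` is wired, not a confirmed detail), the backend choice could be isolated in one helper. The function name `build_db_url` and its parameters are hypothetical:

```python
from urllib.parse import quote_plus


def build_db_url(backend, database, user=None, password=None,
                 host="localhost", port=3306):
    """Return a SQLAlchemy-style URL for the chosen backend (hypothetical helper)."""
    if backend == "sqlite":
        # SQLite needs only a file path (or ":memory:").
        return f"sqlite:///{database}"
    if backend == "mysql":
        # Quote the credentials so special characters do not break the URL.
        credentials = f"{quote_plus(user)}:{quote_plus(password)}"
        return f"mysql+pymysql://{credentials}@{host}:{port}/{database}"
    raise ValueError(f"unsupported backend: {backend}")


sqlite_url = build_db_url("sqlite", "ghtorrent_subset.db")
mysql_url = build_db_url("mysql", "ghtorrent", user="reader", password="p@ss")
```

Any queries that rely on MySQL-only syntax would still need to be adjusted separately.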

sgoggins avatar sgoggins commented on May 21, 2024

I think there are specific challenges being addressed in this issue:

  1. Wanting to run GHData against a smaller dataset, or against a dataset that is not captured in GHTorrent (i.e., not a GitHub repository). We think this can be accomplished by using Perceval as a mapper/aggregator. This is something Jesus and I are discussing as part of the CHAOSS project.
  2. Actual support for SQLite: since GHTorrent now provides .csv files by default, I think a subset could be mapped into SQLite, as @howderek notes above.
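A minimal sketch of point 2, using only the Python standard library. The file layout, table name, and column names here are assumptions for illustration (a headerless CSV with `id`, `name`, and `language` columns), not the actual dump format:

```python
import csv
import io
import sqlite3


def load_csv_subset(conn, csv_file, table, columns, keep_row):
    """Copy rows of a headerless CSV into a SQLite table, keeping only
    the rows for which keep_row(row_dict) is true."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    rows = [
        row for row in csv.reader(csv_file)
        if keep_row(dict(zip(columns, row)))
    ]
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# Hypothetical sample standing in for one CSV file from a dump.
sample = io.StringIO(
    "1,chaoss/augur,Python\n"
    "2,rails/rails,Ruby\n"
    "3,chaoss/grimoirelab,Python\n"
)
conn = sqlite3.connect(":memory:")
loaded = load_csv_subset(
    conn, sample, "projects", ["id", "name", "language"],
    keep_row=lambda r: r["name"].startswith("chaoss/"),
)
```

For a real dump, the same filter would need to be applied consistently across related tables so that foreign keys still resolve.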

howderek avatar howderek commented on May 21, 2024

Hello @dmgerman,

I just wanted to follow up on your feedback about GHTorrent being burdensome. We agree that the dependency is an issue, and we have discussed a new architecture in which Augur does not rely on GHTorrent. We will work on implementing it in the coming months.

Thank you again for your feedback!