- What temporal resolution do we have?
- Both re-read the abstract
- Both need to read about tensor decomposition eventually
- For each question we're asking, what data do we need?
Questions from abstract
- What are the differences between different software ecosystems?
- What do communities of commonly-used-together software packages look like?
- How do these communities change with time?
- Do authors collaborate differently in different languages?
- How do communities of software authors change with time?
- Which packages do authors choose to contribute to?
Which platforms
PyPI
CRAN
npm
Julia
Cargo
Elm
TODO
Does the bigger dump have the data from the API?
We'll need to crawl version dependencies and so on from their API.
Build a crawler and crawl Elm versions, getting 100 at a time.
Want dependencies for each version of each project; in future, only at a temporal resolution that is still to be decided.
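A minimal sketch of the "100 at a time" crawl loop, assuming the registry offers some kind of paginated listing. `fetch_page(offset, limit)` is a hypothetical stand-in for the real API request (no actual Elm endpoint is assumed here), and the record fields in the usage lines are made up for illustration.

```python
from typing import Callable, Iterator, List


def crawl_in_batches(
    fetch_page: Callable[[int, int], List[dict]],
    batch_size: int = 100,
) -> Iterator[dict]:
    """Yield every record from a paginated source, batch_size at a time.

    fetch_page is a hypothetical callable: given (offset, limit) it returns
    a list of at most `limit` records, or an empty list when exhausted.
    """
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            break  # nothing left to fetch
        yield from page
        if len(page) < batch_size:
            break  # a short page means this was the last one
        offset += len(page)


# Usage with an in-memory fake instead of a real registry.
fake_records = [{"project": f"p{i}", "version": "1.0.0"} for i in range(250)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return fake_records[offset:offset + limit]


records = list(crawl_in_batches(fake_fetch))
```

Keeping the request behind a callable means the same loop can be reused per platform once the real endpoints are known.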
About software packages
- What do communities of commonly-used-together software packages look like?
- How do these communities change with time?
Method
At a variety of points in time (complicated, return later) extract a matrix of version-version links and resolve that to project-project links. Factorise to find communities.
Complications:
- What resolution on t?
- Resolving to project-project links ignores how version pinning changes the dependency graph, but without it we lose comparability between time slices.
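A rough sketch of the method for a single time slice, under stated assumptions: version-version links arrive as `(src_project, src_version, dst_project, dst_version)` tuples (a hypothetical record shape), and the factorisation is non-negative matrix factorisation with multiplicative updates. The notes only say "factorise", so NMF is a stand-in choice here, not a decision.

```python
import numpy as np


def project_adjacency(version_edges, projects):
    """Collapse version-version links into a binary project-project matrix."""
    index = {name: i for i, name in enumerate(projects)}
    A = np.zeros((len(projects), len(projects)))
    for src_proj, _src_ver, dst_proj, _dst_ver in version_edges:
        # Version information is discarded: any link between two versions
        # counts as one link between their projects.
        A[index[src_proj], index[dst_proj]] = 1.0
    return A


def nmf(A, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate A as W @ H with non-negative factors (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))
    H = rng.random((rank, A.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: two versions of "a" depend on "b"; one version of "c" depends on "d".
projects = ["a", "b", "c", "d"]
edges = [
    ("a", "1.0", "b", "1.0"),
    ("a", "1.1", "b", "1.2"),
    ("c", "0.1", "d", "0.3"),
]
A = project_adjacency(edges, projects)

# Symmetrise so that "depends on" and "is depended on by" both count as
# being used together (an assumption about what a community should mean).
W, H = nmf(A + A.T, rank=2)

# One candidate community label per project: its strongest factor.
labels = W.argmax(axis=1)
```

Using the same `projects` ordering for every time slice is what keeps the slices comparable.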
Data required
Dependencies of project versions throughout time. We have this for
Data about authors required
- Do authors collaborate differently in different languages?
- How do communities of software authors change with time?
- Which packages do authors choose to contribute to?
Method
Need the tensor from above + a tensor
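One possible reading of "the tensor from above" is the project-project matrices stacked along a time axis. The sketch below assumes that reading and a hypothetical `(time_point, src_project, dst_project)` record shape; the second tensor mentioned in the note is left unspecified, so it is not sketched.

```python
import numpy as np


def dependency_tensor(timed_edges, projects, time_points):
    """Build T[t, i, j] = 1 if project i depends on project j in slice t."""
    p_index = {p: i for i, p in enumerate(projects)}
    t_index = {t: k for k, t in enumerate(time_points)}
    T = np.zeros((len(time_points), len(projects), len(projects)))
    for t, src, dst in timed_edges:
        T[t_index[t], p_index[src], p_index[dst]] = 1.0
    return T


# Toy data: "a" depends on "b" in both slices; "c" starts depending on "a" in 2021.
projects = ["a", "b", "c"]
time_points = [2020, 2021]
timed_edges = [
    (2020, "a", "b"),
    (2021, "a", "b"),
    (2021, "c", "a"),
]
T = dependency_tensor(timed_edges, projects, time_points)
```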