Comments (7)
@laurentS - I have no objection to adding the `dependencies`, but what do you think about creating it as a dedicated stream, as a child stream of `repository`?
from tap-github.
Just to clarify, I'm talking about `dependents`, i.e. the packages/repos that depend on the currently fetched one. So going "up" the dependency tree, as opposed to "down" with `dependencies` (for which there seem to be API endpoints, at least in GraphQL).
Happy to do this as a child stream if it makes more sense. As far as I can see, it would be a single request/record per repo, with 2 data fields to start with (but potentially more in the future).
@laurentS - Thanks for clarifying `dependents` vs `dependencies`. I think the bigger clarification, though, is whether you want just the count of `dependents` or if you'll also (now or in the future) want the listing. I first thought we wanted the list of them, which is why I suggested the child stream. If you do think you'll want the list of repos that depend on the active one, then I think this would be correctly modeled as a child stream of `repository`, since it neatly generates a one-to-many mapping of child records (even though you are correct to say they are technically 'upstream').
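To illustrate the one-to-many modeling described above, here is a minimal sketch in plain Python. It does not use the tap's real stream classes; the function names and the in-memory `DEPENDENTS_BY_REPO` data are hypothetical stand-ins for API responses.

```python
from typing import Dict, Iterator, List

# Hypothetical in-memory data standing in for API responses.
DEPENDENTS_BY_REPO: Dict[str, List[str]] = {
    "org/lib": ["org/app-a", "org/app-b"],
    "org/tool": [],
}


def repository_records() -> Iterator[dict]:
    """Yield one parent record per repository."""
    for full_name in DEPENDENTS_BY_REPO:
        yield {"full_name": full_name}


def dependent_records(parent: dict) -> Iterator[dict]:
    """Yield zero or more child records, each carrying the parent's key."""
    for dependent in DEPENDENTS_BY_REPO[parent["full_name"]]:
        yield {"repo": parent["full_name"], "dependent": dependent}


def sync() -> List[dict]:
    """Walk every parent record and collect its child records."""
    rows: List[dict] = []
    for parent in repository_records():
        rows.extend(dependent_records(parent))
    return rows
```

Each child row keeps the parent's `full_name`, so the list of dependents can be joined back to its repository downstream. A repository with no dependents simply produces no child rows.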
If you only want the count of `dependents`, I could see this being a property of `repositories` as you suggest. Two considerations come to mind if adding as a property:
- For stability and performance, you probably would want to check that the field is selected before making the extra request. (I don't know if we have a pattern for this, but it should be feasible, and I see it being common enough that we'd want to have a pattern available.)
- I don't think the addition/subtraction of `dependents` will bump the incremental key for `repositories`. I don't know how important this is, but I want to call it out as something to consider.
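The first consideration (only making the extra request when the field is selected) could look roughly like the sketch below. This is not an existing pattern from the tap or the SDK: the `SelectionMask` shape, the `dependents_count` field name, and the `fetch_count` callable are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical selection mask: maps a property breadcrumb to a selected flag.
SelectionMask = Dict[Tuple[str, ...], bool]


def is_selected(mask: SelectionMask, field: str) -> bool:
    """Return True only if the field is explicitly marked as selected."""
    return mask.get(("properties", field), False)


def enrich_repository(
    record: dict,
    mask: SelectionMask,
    fetch_count: Callable[[str], int],
) -> dict:
    """Add the (hypothetical) dependents_count field only when it is selected.

    The extra request, represented here by fetch_count, is skipped entirely
    when the field is deselected or missing from the mask.
    """
    if is_selected(mask, "dependents_count"):
        count = fetch_count(record["full_name"])
        return {**record, "dependents_count": count}
    return record
```

Passing the fetcher in as a callable keeps the selection check separate from the HTTP logic, which also makes it easy to verify that no request is made for deselected fields.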
These are all great points!
- For now, we only need the count of `dependents`, so I think I'll go with making it a property of `repositories`. I've not considered getting the full list of them, as it would start becoming fairly painful (and looking more like web scraping than API consumption). Also, we don't really use this info at the moment ;)
- Good point about checking if the field is selected. I had not really considered this, but it seems important.
- Again, I hadn't thought of this, thanks for calling it out. I'd agree with you, and leave the incremental key untouched. If nothing else, just because the values on GitHub seem to be cached, and therefore probably not 100% up to date.
@laurentS I'd love to revive this now that we have the GraphQL endpoints. I think we should aim to grab both `dependents` and `dependencies`.
For the dependencies, you can use this - https://docs.github.com/en/graphql/overview/schema-previews#access-to-a-repositories-dependency-graph-preview, see how it can be used in https://github.com/simonw/til/blob/master/github/dependencies-graphql-api.md
Or by scraping: see dogsheep/github-to-sqlite#70 and the associated functions.
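As a rough sketch of the GraphQL route for dependencies, the snippet below only builds the request and does not send it. It assumes the dependency graph is still served behind the `hawkgirl-preview` media type described in the linked preview docs, and that the field names match that schema; the helper name `build_dependencies_request` and the token value are hypothetical.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"
# Assumption: media type for the dependency graph schema preview linked above.
PREVIEW_ACCEPT = "application/vnd.github.hawkgirl-preview+json"

DEPENDENCIES_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    dependencyGraphManifests {
      nodes {
        filename
        dependencies {
          nodes {
            packageName
            requirements
          }
        }
      }
    }
  }
}
"""


def build_dependencies_request(
    owner: str, name: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a repo's dependencies."""
    payload = json.dumps(
        {"query": DEPENDENCIES_QUERY, "variables": {"owner": owner, "name": name}}
    ).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={
            "Authorization": f"bearer {token}",
            "Accept": PREVIEW_ACCEPT,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the returned object to `urllib.request.urlopen` with a real token. Pagination of the manifest and dependency connections is left out of this sketch.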