The single source of truth for all Meltano plugins, including all available Singer Taps and Targets: https://hub.meltano.com

Home Page: https://hub.meltano.com

HTML 0.22% Python 12.64% JavaScript 9.42% Shell 0.40% CSS 2.64% Vue 74.67%

hub's Introduction


MeltanoHub

Source of MeltanoHub: hub.meltano.com. The central place for any Meltano plugin.

Not familiar with Meltano? Meltano is your CLI for ELT+ that:

  • Starts simple: Meltano is pip-installable and comes in a prepackaged Docker container, so you can have your first ELT pipeline running within minutes.
  • Has DataOps out-of-the-box: Meltano provides tools that make DataOps best practices easy to use in every project.
  • Integrates with everything: 600+ data sources and targets are natively supported, as are additional plugins like Great Expectations and dbt.
  • Is easily customizable: Meltano isn't just extensible, it's built to be extended! The SDK for Singer Connectors and the EDK for Meltano Components are easy to use. Meltano Hub helps you find all of the connectors and components created across the data community.
  • Is a mature system: In development since 2018, it runs in production at large companies like GitLab and currently powers over a million pipeline runs monthly.
  • Has first-class ELT tooling built in: Extract data from any data source, load into any target, use inline maps to transform data on the fly, and test the incoming data, all in one package.

If you want to get started with Meltano, we suggest you:


Development

To work locally with this project, see CONTRIBUTING.md

Meltano API

In addition to the Singer API that serves metadata about Singer taps and targets, there's also a Meltano API which serves all plugin types that are discoverable by Meltano. This API will serve the front-end client for the MeltanoHub 2.0 site, and Meltano itself will use the API as a resource for discovering and installing plugin definitions, replacing the legacy discovery.yml file.

The API has the following endpoints:

hub's People

Contributors

aaronsteers, albert-marrero, alexmarple, aphethean1, btheunissen, cjohnhanson, danielpdwalker, dependabot[bot], douwem, edgarrmondragon, gthesheep, hsyyid, leag, mattarderne, meltybot, niallrees, pnadolny13, prratek, rawwar, reubenfrankel, rwfeather, s7clarke10, sbalnojan, sburwash, tayloramurphy, tobiascadee, visch, willdasilva, yujoy, z3z1ma

hub's Issues

Key Data for Hub Singer Taps and Targets

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/3

Originally created by @tayloramurphy on 2021-04-14 15:26:19


On Singer.io when you look at a tap or target you see this information:

  • Name
  • Logo
  • Brief description of the tap source
  • Link to API sometimes
  • Language (Python)
  • Data Catalog (not really sure what this does, most say "Supports the extraction of individual data streams and fields")
  • "Example Data" which is just a list of words
  • Link to Singer Spec
  • pip install tap-x code block
  • Link to repo
  • Links to PRs, Issues, and the Singer Slack
  • List of available targets
  • Pitch for Stitch


What is our future state for the Hub?

  • Much of the above
  • Prominent Singer Spec page
  • Cross-links to https://hub.getdbt.com
  • License of Tap clearly displayed
  • Metrics on taps/targets:
    • number of open PRs
    • number of open issues
    • Last activity on repo (latest commit)
    • Number of downloads / uses
      • anonymous Meltano runs
      • how is this tap used in combination with targets
    • CI Badges
  • Tap/target Capabilities:
    • replication types
    • configuration listings
    • latest catalog output in a human readable format (downloadable)
      • mark columns with available metadata (PII, etc.)
    • Built with SDK yes/no
  • Ownership / Governance:
    • Who's the current sponsor/maintainer
    • Is this actively maintained or potentially abandoned / up for adoption (can be manual at first but then can programmatically update)
  • Variants:
    • What variants are available under the same namespace (tap-gitlab) ?

Additional points brought up in office hours:

  • What's going to be the Meltano "semver"? How will people know if something is production ready or not?
  • API coverage of the tap: is it missing two-thirds of the endpoints? <- can get people to contribute this
    • Mark items in the catalog as unsupported
  • SDK can have a demo mode with some stored JSON data - that can be added as well
  • What's the decision process for upgrading a variant to the "blessed" version <- should be documented on the website
  • "Most wanted" section for taps that don't exist yet - maybe we can make placeholders? should these be ranked?

Add `includes` to individual Meltano yaml files to ref clean Singer yaml

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/37

Originally created by @tayloramurphy on 2021-04-16 19:13:37


For each tap and target, we will create a syntax in its Meltano YAML file that references the clean Singer YAML spec for that plugin. The Meltano YAML file will then hold only the overrides and additions for each setting, plus the other Meltano-specific items that discovery.yml cares about.


As the spec is defined for Singer Taps and Targets for the Hub and for the SDK, we'll have to translate from those specs into what's required and useful for Meltano.

For example, one of the key things we document in target-bigquery is how the load_schema is set based on their configuration in the extractor. https://meltano.com/plugins/loaders/bigquery.html#dataset-id That information shouldn't be on the Hub, but it will need to be synced with the hub to work well with Meltano.

2021-05-04 - We're bumping this up in priority to make syncing the clean YAML spec in the hub easier for Meltano. We'll have an includes: syntax plus the Meltano-specific items.
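
A minimal sketch of how an includes:-style reference might be resolved, assuming both the clean Singer definition and the Meltano-specific file have already been parsed into dictionaries. The key names (`includes`, `settings`, `name`), the reference string, and the merge rules are assumptions made for illustration, not the syntax the Hub settled on.

```python
from copy import deepcopy


def resolve_includes(meltano_def: dict, singer_defs: dict) -> dict:
    """Merge a Meltano-specific definition over the clean Singer definition it includes.

    `singer_defs` maps a hypothetical reference string to a parsed Singer
    definition. Settings are matched by `name`, and Meltano values win.
    """
    base = deepcopy(singer_defs[meltano_def["includes"]])

    # Top-level Meltano-only keys are copied over as-is.
    for key, value in meltano_def.items():
        if key not in ("includes", "settings"):
            base[key] = value

    # Settings are merged one by one so Singer descriptions are preserved.
    merged = {s["name"]: dict(s) for s in base.get("settings", [])}
    for setting in meltano_def.get("settings", []):
        merged.setdefault(setting["name"], {}).update(setting)
    base["settings"] = list(merged.values())
    return base


# Illustrative data only; the values are not taken from a real definition file.
singer_defs = {
    "targets/target-bigquery": {
        "name": "target-bigquery",
        "settings": [{"name": "dataset_id", "description": "BigQuery dataset"}],
    }
}
meltano_def = {
    "includes": "targets/target-bigquery",
    "namespace": "target_bigquery",
    "settings": [{"name": "dataset_id", "value": "$MELTANO_EXTRACT__LOAD_SCHEMA"}],
}
resolved = resolve_includes(meltano_def, singer_defs)
print(resolved)
```

Keeping the merge keyed on the setting name means the Singer file stays free of Meltano-specific values while the resolved definition still carries both.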

Exported Meltano usage stats, aggregated per plugin

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/2

Originally created by @aaronsteers on 2021-04-16 17:26:35


If a user has opted in to usage tracking, Meltano captures anonymized metrics on elt and invoke. Can we add this as a per-plugin metric within SingerHub?

Those metrics appear to be tracked in Google Analytics, as per this code snippet:

It would be helpful also to capture 'success/fail' indicator, but even without that, this would still be a big help during the SingerHub discovery process.

One challenge might be securely extracting this data and landing it in a form that SingerHub can consume.

Learnings and observations from comparable decentralized registries

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/1

Originally created by @aaronsteers on 2021-04-16 15:16:14


Not an action item here as of yet, but I wanted to draw attention to some features of comparable sites that I think we can aim for with SingerHub, and to discuss which of these we want to incorporate or improve on in our own hub.

Similar decentralized registries:

  1. Chocolatey Community Package Registry - https://community.chocolatey.org/
    • Description: A library of installable packages for the Windows OS.
    • Sample page for WSL2: https://community.chocolatey.org/packages/wsl2#collapsing-right-sidebar
    • Features:
      • Maintainers are listed on the site, along with a means to (1) contact the maintainer directly or (2) comment in a per-package Disqus thread.
      • Users can report a package abandoned.
      • The Chocolatey website can broker a handoff process for packages from an unresponsive maintainer to a volunteer willing to become maintainer.
      • Site shows CI/CD or "Validation" test status.
      • Site shows number of downloads, last update, latest version number, etc.
  2. Terraform Registry - https://registry.terraform.io/
    • Description: Allows user-contributed and partner-contributed modules to be easily discovered and used by Terraform users.
    • Sample page for RDS: https://registry.terraform.io/modules/terraform-aws-modules/rds/aws/latest
    • Features:
      • Documented config options with sample usage.
      • List of versions, latest version, link to github
      • "Total provisions" metric, indicating number of successful invocations.
      • Link to report an issue.
      • Documents which license applies (Apache 2.0, for the sample module link)
      • Official, Verified, and Community badges
      • Distinction between Providers and Modules
  3. PyPI Registry
    • Description: Find, install, and publish Python packages
    • Sample page for requests: https://pypi.org/project/requests/
    • Features:
      • Does not allow private packages, recommendation is to self-host
      • Has GitHub Statistics
      • License, Author, Requirements, latest version, release history
      • README of project is main thing on page
      • Data is available on libraries.io or in public BigQuery project
  4. npmjs registry
    • Description: For managing node.js packages
    • Sample page for express: https://www.npmjs.com/package/express
    • Features:
      • Similar to PyPI in terms of content
      • Shows dependencies and dependents
  5. DockerHub
    • Description: For managing and finding Containers (primarily) but also editions of Docker and different plugins
    • Sample page for ubuntu: https://hub.docker.com/_/ubuntu
    • Features:
      • Verified / Official tags
      • download counts, tags, reviews
      • Primarily shows the README on the page
  6. OctoPrint

Use Meltano to extract projects & metrics from GitHub and GitLab

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/15

Originally created by @tayloramurphy on 2021-04-20 21:57:21


@aaronsteers and I discussed today one of the next iterations of SingerHub. We want to use Meltano to pull data from GitHub and GitLab (and potentially wherever else taps/targets are hosted). The goals of this would be to:

  • get the majority of the tap and target listings organized
  • get basic metrics for each of the projects
  • dogfood meltano
  • use dbt to transform the data
  • push the data from the DB into a format that's usable by Jekyll (JSON)

We're thinking about this like so:

Using GCP, run Meltano to pull data about the available taps and targets into BigQuery. We would also pull in metrics about each of these repositories. dbt would then be used to clean up the data. We see a combination of seed files and SQL code as a good way to handle some of the manual process.

With dbt we can then format the data into JSON so it can be used by Jekyll to interleave the data with the YAML spec.

From https://gitlab.com/meltano/singerhub/-/issues/3, the MVC of metrics would be:

  • number of open PRs
  • number of open issues
  • Last activity on repo (latest commit)
  • number of stars

We'd have to do some deduplication and parsing, but that shouldn't be too bad with dbt!

For a first iteration, we can export the JSON once a week and push it to the SingerHub project (or maybe it can be queried at build time?)
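
A rough sketch of the deduplication and JSON export step for the MVC metrics. In the plan above dbt would do this work in SQL; the Python below only illustrates the shape of the output. The record field names (`repo_url`, `open_prs`, `open_issues`, `last_commit_at`, `stars`) are hypothetical.

```python
import json


def build_metrics(records: list) -> list:
    """Deduplicate repo records by normalized URL, keeping the most recently active one."""
    latest = {}
    for rec in records:
        key = rec["repo_url"].lower().rstrip("/")
        seen = latest.get(key)
        # UTC timestamps in one fixed ISO 8601 format compare correctly as strings.
        if seen is None or rec["last_commit_at"] > seen["last_commit_at"]:
            latest[key] = rec
    return [
        {
            "repo_url": key,
            "open_prs": rec["open_prs"],
            "open_issues": rec["open_issues"],
            "last_commit_at": rec["last_commit_at"],
            "stars": rec["stars"],
        }
        for key, rec in sorted(latest.items())
    ]


# Made-up sample rows, including one duplicate that differs only in URL casing.
records = [
    {"repo_url": "https://github.com/example/tap-foo", "open_prs": 2, "open_issues": 5,
     "last_commit_at": "2021-04-01T00:00:00Z", "stars": 10},
    {"repo_url": "https://github.com/Example/tap-foo/", "open_prs": 3, "open_issues": 4,
     "last_commit_at": "2021-04-15T00:00:00Z", "stars": 11},
    {"repo_url": "https://gitlab.com/example/target-bar", "open_prs": 0, "open_issues": 1,
     "last_commit_at": "2021-03-20T00:00:00Z", "stars": 2},
]
metrics = build_metrics(records)
print(json.dumps(metrics, indent=2))
```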

Add pages for taps, targets, and singer on the hub

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/21

Originally created by @tayloramurphy on 2021-05-04 23:41:26


We want pages for the following:

  • hub.meltano.com/taps
  • hub.meltano.com/targets
  • hub.meltano.com/singer
    • hub.meltano.com/singer/spec

The /singer page will focus on our commitment to the Singer ecosystem and community. We will detail:

  • The Open API we intend to maintain for broader community access to the clean yaml definitions of every tap and target we find
  • Link to the spec
  • Link to the JSON schemas
  • Discuss the Tap and Target SDK with links to resources on those
  • Link to the /taps and /targets pages

The /taps and /targets page will:

  • list all of the available taps/targets we're aware of
  • detail how to do a basic pip install
  • give easy links to the YAML definition and how to contribute to it
  • Link to JSON schema
  • Have a Table of Contents to make Settings more discoverable: https://gitlab.com/meltano/hub/-/issues/30

Adding Testing/Validation status badges on plugin pages

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/10

Originally created by @tayloramurphy on 2021-03-03 15:56:41


Following from https://gitlab.com/meltano/meltano/-/issues/2612#note_518395941, it would be good to indicate to users the validity of a given tap or target in a programmatic way. We've started to do this manually by adding Repository, Maintainer, and Maintenance Status on plugin pages (see https://meltano.com/plugins/extractors/chargebee.html) but we should be able to programmatically pull from repositories.

Document capabilities supported by extractors (`discover`: entity selection, `state`: incremental replication)

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/7

Originally created by @DouweM on 2021-01-12 22:55:46


We already track whether discoverable taps have the discover and state capabilities in discovery.yml (e.g. https://gitlab.com/meltano/meltano/blob/master/src/meltano/core/bundle/discovery.yml#L16-17), but don't currently explicitly document this support on their pages under https://meltano.com/plugins/extractors/
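
As a small illustration, the two capabilities named in the issue title could be turned into user-facing text on a plugin page roughly like this. The `capabilities` list mirrors the field mentioned above; the plugin dictionary and function name are hypothetical.

```python
# Wording taken from the issue title: discover -> entity selection,
# state -> incremental replication.
CAPABILITY_DOCS = {
    "discover": "entity selection",
    "state": "incremental replication",
}


def describe_capabilities(plugin: dict) -> list:
    """Return one documentation line per known capability of a plugin."""
    return [
        f"{cap}: supports {CAPABILITY_DOCS[cap]}"
        for cap in plugin.get("capabilities", [])
        if cap in CAPABILITY_DOCS
    ]


# Hypothetical plugin entry for demonstration.
plugin = {"name": "tap-example", "capabilities": ["discover", "state", "catalog"]}
lines = describe_capabilities(plugin)
print("\n".join(lines))
```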

Create script that generates a targets.yml and a taps.yml file at build time

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/13

Originally created by @tayloramurphy on 2021-04-16 21:42:40


Every tap and target is going to live under _taps and _targets for the time being. Each of these will be a markdown file with front matter containing the valid YAML.

A pre-req for https://gitlab.com/meltano/meltano/-/issues/2714 and the follow-on issue of https://gitlab.com/meltano/meltano/-/issues/2206 will be to have a single file to include from in the Meltano discovery.yml file.
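
One possible shape for such a build script, sketched with the standard library only. It assumes each markdown file starts with a `---` delimited front matter block whose content is a top-level YAML mapping, and it re-indents those blocks into a single YAML sequence without parsing them. The directory and file names in the demo are examples.

```python
import tempfile
from pathlib import Path


def read_front_matter(markdown: str) -> str:
    """Return the text between the leading '---' delimiters, or '' if absent."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # no closing delimiter found


def as_list_item(yaml_text: str) -> str:
    """Indent a YAML mapping so it becomes one item of a YAML sequence."""
    out = []
    for i, line in enumerate(yaml_text.strip("\n").splitlines()):
        prefix = "- " if i == 0 else "  "
        out.append(prefix + line if line.strip() else "")
    return "\n".join(out)


def build_index(source_dir: Path, output_file: Path) -> int:
    """Write every front matter block in `source_dir` to one YAML file."""
    items = []
    for path in sorted(source_dir.glob("*.md")):
        front_matter = read_front_matter(path.read_text(encoding="utf-8"))
        if front_matter.strip():
            items.append(as_list_item(front_matter))
    output_file.write_text("\n".join(items) + "\n", encoding="utf-8")
    return len(items)


# Demo against a temporary directory standing in for the _taps folder.
with tempfile.TemporaryDirectory() as tmp:
    taps_dir = Path(tmp) / "_taps"
    taps_dir.mkdir()
    (taps_dir / "tap-gitlab.md").write_text(
        "---\nname: tap-gitlab\nsettings:\n  - name: api_url\n---\nPage body\n",
        encoding="utf-8",
    )
    output = Path(tmp) / "taps.yml"
    count = build_index(taps_dir, output)
    generated = output.read_text(encoding="utf-8")

print(generated)
```

Running the same function once over _taps and once over _targets would give the two files; a real script would more likely parse and validate the YAML rather than re-indent it as text.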

Port index of discoverable taps and targets to dedicated MeltanoHub website

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/4

Originally created by @DouweM on 2021-02-24 16:17:33


This will be the Singer equivalent of PyPI, Docker Hub, or dbt Hub.

Part of this is defining the first draft of the spec for taps and targets https://gitlab.com/meltano/singerhub/-/issues/14

  • Convert all targets to YAML
  • Convert all taps to YAML
  • Validate existing YAML is clean w/ no meltano-specific items https://gitlab.com/meltano/hub/-/merge_requests/1
    • Generate JSON schema to validate against (ideally single schema for both taps / targets) #14
  • Make HTML more sensible (no uls with h3s and ps without li) @DouweM
  • Index
    • Large Banner welcoming Singer community and linking to hub.meltano.com/singer (#21)
    • 3 Plugin categories - Extractors / Loaders / Utilities
      • Highlight a few that we support? Link to docs on what those are?
  • Add pages for /singer, /taps, /targets #21
  • Tap and target lists
  • Details pages
    • Break discovery.yml into separate meltano-specific files for generation of
    • Add "| MeltanoHub" to title
    • Add description
    • Add permalink?
    • Move over from subpages under https://meltano.com/plugins/extractors/ and https://meltano.com/plugins/loaders/ to hub.meltano.com/plugins/extractors and hub.meltano.com/plugins/loaders:
      • Introduction
        • Link to source API docs
      • Maintenance
        • Suggestion to find a fork/variant when status is responsive
        • Get maintainer data from maintainers.yml
      • Installation
        • pip install <pip_url>
        • meltano add X --variant Y
      • Settings
        • Default value
        • Descriptions, which in some cases is more complete than what's in discovery.yml
        • On Meltano pages:
          • Env var
          • meltano config...
      • Troubleshooting

Scrape Github repo index for metrics on singer-esque forks and repos

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/11

Originally created by @aaronsteers on 2021-03-10 20:17:51


As noted here, it should be possible to automatically scrape a full index of repos: PyGithub/PyGithub#723

According to the API docs, it should be possible to also capture the repo creation date, commits, etc. We might run into rate limiting but there's an optional "since date" in their API we could use for our replication key.

For team metrics and for presentations, the number of repos and forks would be a good metric to track. Singer Hub could similarly auto-discover new forks in this way, except we would not want to auto-promote new taps, for obvious reasons.

Once the set of repos we need data from is identified, we could also scrape vitality metrics such as "last commit date", or "last PR merge", "average PR age", etc.
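
A sketch of how such vitality metrics could be computed once commit and pull request timestamps have been collected. It assumes timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form used by the GitHub REST API; the function and field names are hypothetical, and fetching the data is out of scope here.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as '2021-03-10T20:17:51Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def vitality(commit_dates: list, open_pr_created: list, now: datetime) -> dict:
    """Summarize repo activity from commit dates and open PR creation dates."""
    last_commit = max(map(parse_ts, commit_dates)) if commit_dates else None
    ages = [(now - parse_ts(ts)).total_seconds() / 86400 for ts in open_pr_created]
    return {
        "last_commit_at": last_commit.isoformat() if last_commit else None,
        "days_since_last_commit": (now - last_commit).days if last_commit else None,
        "average_open_pr_age_days": round(sum(ages) / len(ages), 1) if ages else None,
    }


# Made-up sample data with a fixed "now" so the result is reproducible.
now = datetime(2021, 3, 10, tzinfo=timezone.utc)
summary = vitality(
    commit_dates=["2021-03-01T12:00:00Z", "2021-02-01T00:00:00Z"],
    open_pr_created=["2021-03-08T00:00:00Z", "2021-03-04T00:00:00Z"],
    now=now,
)
print(summary)
```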

Add Superset as a plugin

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/255

Originally created by @DouweM on 2021-02-24 16:52:39


Previous Issue Body

  • https://superset.apache.org/
  • https://github.com/apache/superset
  • https://pypi.org/project/apache-superset

meltano add analyzer superset

meltano invoke superset db upgrade # Run automatically?
meltano invoke superset fab create-admin
meltano invoke superset init # Run automatically?
meltano invoke superset run

Meltano can manage Superset configuration (https://superset.apache.org/docs/installation/configuring-superset) by allowing values to be set for the keys in https://github.com/apache/superset/blob/master/superset/config.py, automatically generating superset_config.py, and pointing Superset there by using the SUPERSET_CONFIG_PATH env var. Users should also be able to set SUPERSET_CONFIG_PATH (or meltano config superset set config_path <path>) themselves to use their own config file.
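
A minimal sketch of the config generation idea, assuming the configured values are simple Python literals (strings, numbers, booleans, lists, dicts). The function name is hypothetical and this is not how Meltano implements it; it only shows writing a superset_config.py and exposing its location through SUPERSET_CONFIG_PATH.

```python
import os
import tempfile
from pathlib import Path


def write_superset_config(settings: dict, path: Path) -> dict:
    """Write `settings` as Python assignments and return an environment for Superset."""
    lines = ["# Generated file: manual edits will be overwritten."]
    for key, value in sorted(settings.items()):
        # Superset config keys are upper-case module-level names.
        if not key.isidentifier() or not key.isupper():
            raise ValueError(f"not a valid config key: {key!r}")
        lines.append(f"{key} = {value!r}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # The returned mapping could be passed as `env=` when launching Superset.
    return {**os.environ, "SUPERSET_CONFIG_PATH": str(path.resolve())}


# Demo in a temporary directory; the setting values are examples only.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "superset_config.py"
    env = write_superset_config(
        {"ROW_LIMIT": 5000, "SQLALCHEMY_DATABASE_URI": "sqlite:////tmp/superset.db"},
        config_path,
    )
    generated = config_path.read_text(encoding="utf-8")

print(generated)
```

A user who sets SUPERSET_CONFIG_PATH themselves would simply bypass this generation step.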

Ideally, Meltano would also be able to inject database connection strings corresponding to loaders directly into Superset so that these don't need to be managed in two places: https://superset.apache.org/docs/databases/installing-database-drivers, https://superset.apache.org/docs/databases/postgres. Possibly through the DB_CONNECTION_MUTATOR setting? apache/superset#9045

Meltano 2.0 Scope

Using https://github.com/pnadolny13/meltano_example_implementations/blob/superset_single_container/meltano_projects/jaffle_superset/meltano.yml#L52 (with possibly some https://gitlab.com/meltano/files-superset) we will get Superset to a place where users can install and spin it up and down using Docker.

  • Entry on the Hub for Superset
  • Possible Files bundle if necessary

Add capability to parse readme files and auto-generate a plugin's discovery.yml text.

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/8

Originally created by @aaronsteers on 2020-07-29 18:11:25


In theory, at least, we should be able to parse the contents of markdown tables and capture the relevant settings config info for externally-managed plugins.

This would greatly streamline the onboarding of new plugins, and could likely reduce support costs for existing plugins.

Depends on: meltano#2206
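
To make the idea concrete, here is a small sketch that pulls setting rows out of the first pipe table in a README. The column names (`Setting`, `Description`) and the output keys are assumptions; real READMEs vary widely, and escaped pipes inside cells are not handled.

```python
def parse_markdown_table(markdown: str) -> list:
    """Return the rows of the first pipe table in `markdown` as dicts keyed by header."""

    def cells(line: str) -> list:
        return [c.strip() for c in line.strip().strip("|").split("|")]

    lines = markdown.splitlines()
    for i in range(len(lines) - 1):
        header, divider = lines[i], lines[i + 1]
        is_divider = "-" in divider and set(divider.strip()) <= set("|-: ")
        if "|" in header and is_divider:
            keys = [k.lower() for k in cells(header)]
            rows = []
            for line in lines[i + 2:]:
                if "|" not in line:
                    break  # the table ends at the first non-table line
                rows.append(dict(zip(keys, cells(line))))
            return rows
    return []


# Made-up README fragment for demonstration.
readme = """## Settings

| Setting | Required | Description |
|---------|----------|-------------|
| `api_url` | No | Base URL of the API |
| `token` | Yes | Personal access token |

More text.
"""

settings = [
    {"name": row["setting"].strip("`"), "description": row["description"]}
    for row in parse_markdown_table(readme)
]
print(settings)
```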

Central validation infrastructure for tap/target integration testing

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/9

Originally created by @tayloramurphy on 2021-03-03 15:59:33


Following from https://gitlab.com/meltano/meltano/-/issues/2612#note_518395941 it would be useful to have some centralized, redundant validation that a tap/target does what it says it can do.

@DouweM mentioned that realistically we may not be able to do more than validating that <executable> --help works, but even having basic documentation checks would be useful to encourage usability across the ecosystem.
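
The baseline check mentioned above could look roughly like the following. This is a sketch, not an agreed design: the function name and the 30-second timeout are assumptions, and the Python interpreter stands in for a real tap or target executable in the demo.

```python
import subprocess
import sys


def help_works(command: list, timeout: float = 30.0) -> bool:
    """Return True if `<command> --help` exits with status 0 and prints something."""
    try:
        result = subprocess.run(
            [*command, "--help"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Covers a missing or non-executable file and a hanging process.
        return False
    return result.returncode == 0 and bool((result.stdout + result.stderr).strip())


ok = help_works([sys.executable])
missing = help_works(["definitely-not-a-real-tap-executable"])
print(ok, missing)
```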

Streamline process to add new discoverable taps/targets and variants

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/5

Originally created by @DouweM on 2021-03-04 18:52:53


We can do much better than https://meltano.com/docs/contributor-guide.html#discoverable-plugins and can go beyond https://gitlab.com/meltano/meltano/-/issues/2608.

On https://gitlab.com/meltano/meltano/-/issues/2600, users should be able to sign in and submit new taps/targets through the UI, even if this would still result in merge requests to some repo behind the scenes.

There should also be a way to do this from the SDK. This issue will be broken up into specific initiatives beyond #6.

Consider how to document and manage OSS vs proprietary taps

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/12

Originally created by @tayloramurphy on 2021-04-16 19:25:56


It came up in discussion that people might want to use the hub for proprietary and unlisted use cases. The spec and the base design of the SingerHub should be built with this in mind from the start.

I'd estimate the following % for each type:

Code type    Listed  Unlisted
OSS          90%     <1%
Proprietary  5%      <5%

We're going to focus most of our effort on the open source community, but will build with related use cases in mind.

Express Tap/Target settings as JSON schema within Singer YAML

Migrated from GitLab: https://gitlab.com/meltano/hub/-/issues/23

Originally created by @tayloramurphy on 2021-05-05 15:08:18


To match what will be generated by the SDK, the settings in a tap/target definition should be expressed as JSON schema https://gitlab.com/meltano/singer-sdk/-/blob/main/singer_sdk/helpers/_typing.py

As part of this, we should better define the setting kinds and consider whether password is really the best way to describe a secret value. This is tracked in https://gitlab.com/meltano/hub/-/issues/25

I've validated that we can dump JSON Schema into YAML.
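
A sketch of what converting a list of settings into a JSON Schema object might look like. The kind-to-schema mapping, the per-setting `required` flag, and the use of the standard `writeOnly` keyword to mark secrets are all assumptions for illustration; they are not confirmed to match what the SDK generates.

```python
import json

# Hypothetical mapping from setting kinds to JSON Schema fragments.
KIND_TO_SCHEMA = {
    "string": {"type": "string"},
    "password": {"type": "string", "writeOnly": True},
    "integer": {"type": "integer"},
    "boolean": {"type": "boolean"},
    "date_iso8601": {"type": "string", "format": "date-time"},
}


def settings_to_json_schema(settings: list) -> dict:
    """Build a JSON Schema object describing a plugin's settings."""
    properties = {}
    required = []
    for setting in settings:
        prop = dict(KIND_TO_SCHEMA[setting.get("kind", "string")])
        if "description" in setting:
            prop["description"] = setting["description"]
        properties[setting["name"]] = prop
        if setting.get("required"):
            required.append(setting["name"])
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


# Made-up settings for demonstration.
schema = settings_to_json_schema([
    {"name": "api_url", "description": "Base URL of the API"},
    {"name": "token", "kind": "password", "required": True},
    {"name": "start_date", "kind": "date_iso8601"},
])
print(json.dumps(schema, indent=2))
```

Since YAML 1.2 is designed to accept JSON, the printed text can in principle be embedded in a YAML definition as-is, although a block-style YAML dump would be easier to read.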
