singer-tools's Introduction

singer-tools

Tools for working with Singer Taps and Targets

  • singer-check-tap - validates Tap output
  • singer-infer-schema - infers a JSON Schema from Tap output
  • singer-release - for Singer projects that are written in Python, deploys packages to PyPI
  • diff-jsonl - diffs two JSONL files, such as those produced by a tap

Installation

singer-tools should be installed into and run from a dedicated virtualenv to avoid version conflicts with the tap it's operating on.

  1. Create and activate a virtualenv
  2. pip install -e . (for developing singer-tools) or pip install singer-tools (for running singer-tools)
  3. Run via <virtualenv>/bin/<singer-tool>, e.g. <virtualenv>/bin/singer-check-tap.

Tools

singer-check-tap

You can use singer-check-tap to check whether a Tap conforms to the Singer specification.

Checking a tap

If you run singer-check-tap and provide a path to a Tap (with the --tap option) and a configuration for that Tap, it will do the following:

  1. Run the tap with the specified config and no --state option
  2. Validate the output the tap produces
  3. Check the exit status of the tap
  4. Capture the final state produced by the Tap and save it to a file
  5. Run the tap again, this time with a --state arg pointing to the final state produced by the first invocation
  6. Validate the output the tap produces
  7. Check the exit status

If all of the invocations of the Tap succeed (exit with status 0) and if the output of the Tap conforms to the specification, this program will exit with status 0. If any of the invocations of the Tap fail (exit non-zero) or produce output that does not conform to the specification, this program will print an error message and exit with a non-zero status.
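
The sequence above can be sketched in Python roughly as follows. This is not the actual singer-check-tap implementation: the helper names run_tap and check_tap are hypothetical, and the sketch assumes that the tap accepts --config and --state options and that the value of the last STATE message is the final state. It only checks that each line is JSON, not that it fully conforms to the specification.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_tap(tap_cmd, config_path, state_path=None):
    """Run the tap once and return (exit_status, final_state_or_None)."""
    cmd = list(tap_cmd) + ["--config", config_path]
    if state_path is not None:
        cmd += ["--state", state_path]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    final_state = None
    for line in proc.stdout.splitlines():
        if not line.strip():
            continue
        message = json.loads(line)  # raises ValueError if a line is not JSON
        if message.get("type") == "STATE":
            final_state = message.get("value")  # keep the most recent state
    return proc.returncode, final_state


def check_tap(tap_cmd, config_path):
    """Run the tap without state, then again with the state it produced."""
    status, state = run_tap(tap_cmd, config_path)
    if status == 0:
        # Save the final state of the first run to a temporary file.
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump(state if state is not None else {}, f)
        try:
            status, _ = run_tap(tap_cmd, config_path, state_path=f.name)
        finally:
            os.unlink(f.name)
    if status != 0:
        print("ERROR: tap exited with status %d" % status, file=sys.stderr)
    return status
```

Here tap_cmd is a list such as ["my-tap"], and check_tap returns the first non-zero exit status it encounters, or 0 if both runs succeed.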

Checking output of a tap

Sometimes it's convenient to validate the output of a tap directly, rather than have singer-check-tap actually run the tap. You can do that by omitting the --tap argument and providing the Tap output on STDIN. For example:

my-tap --config config.json | singer-check-tap

In this mode of operation, singer-check-tap will just validate the data on stdin and exit with a status of zero if it's valid or non-zero otherwise.
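
As a rough illustration of what validating a stream of messages involves, here is a minimal Python sketch. It is an assumption-laden simplification, not the code singer-check-tap runs: summarize_lines and REQUIRED_KEYS are hypothetical names, and the check covers only the message type and the presence of required top-level keys.

```python
import json
from collections import Counter

# Required top-level keys per message type (a simplified reading of the
# Singer specification; this table is an illustrative assumption).
REQUIRED_KEYS = {
    "SCHEMA": ("stream", "schema", "key_properties"),
    "RECORD": ("stream", "record"),
    "STATE": ("value",),
}


def summarize_lines(lines):
    """Count messages by type, raising ValueError on the first invalid line."""
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        message = json.loads(line)  # JSONDecodeError is a ValueError
        if not isinstance(message, dict):
            raise ValueError("line %d: message is not a JSON object" % number)
        msg_type = message.get("type")
        if msg_type not in REQUIRED_KEYS:
            raise ValueError("line %d: unknown message type %r" % (number, msg_type))
        for key in REQUIRED_KEYS[msg_type]:
            if key not in message:
                raise ValueError(
                    "line %d: %s message is missing required key %r"
                    % (number, msg_type, key)
                )
        counts[msg_type] += 1
    return counts
```

Passing an open file or sys.stdin as lines works, since both iterate line by line.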

Sample data

You can try singer-check-tap out on the data in the samples directory.

A good run:

$ singer-check-tap < samples/fixerio-valid-initial.json
Checking stdin for valid Singer-formatted data
The output is valid.
It contained 17 messages for 1 streams.

      1 schema messages
     15 record messages
      1 state messages

Details by stream:
+---------------+---------+---------+
| stream        | records | schemas |
+---------------+---------+---------+
| exchange_rate | 15      | 1       |
+---------------+---------+---------+

A bad run:

$ singer-check-tap < samples/fixerio-invalid-no-key-properties.json
Checking stdin for valid Singer-formatted data
Traceback (most recent call last):
  File "/opt/code/singer-tools/venv/bin/singer-check-tap", line 11, in <module>
    load_entry_point('singer-tools', 'console_scripts', 'singer-check-tap')()
  File "/opt/code/singer-tools/singertools/check_tap.py", line 195, in main
    summary = summarize_output(sys.stdin)
  File "/opt/code/singer-tools/singertools/check_tap.py", line 90, in summarize_output
    summary.add(singer.parse_message(line))
  File "/opt/code/singer-tools/venv/lib/python3.4/site-packages/singer_python-0.2.1-py3.4.egg/singer/__init__.py", line 117, in parse_message
    key_properties=_required_key(o, 'key_properties'))
  File "/opt/code/singer-tools/venv/lib/python3.4/site-packages/singer_python-0.2.1-py3.4.egg/singer/__init__.py", line 101, in _required_key
    k, msg))
Exception: Message is missing required key 'key_properties': {'stream': 'exchange_rate', 'schema': {'properties': {'date': {'format': 'date-time', 'type': 'string'}}, 'additionalProperties': True, 'type': 'object'}, 'type': 'SCHEMA'}

singer-infer-schema

If the data source you're using does not publish a schema, you can use singer-infer-schema to parse a sample of JSON-formatted data and produce a basic schema.

Infer single stream

$ singer-infer-schema < data.json > schema.json

Infer multiple streams

$ cat /tmp/tap-square-output.out | /usr/local/share/virtualenvs/singer-tools/bin/singer-infer-schema --out-dir /tmp/tap-square-schemas

The above will produce a directory like this one, with one schema file per stream:

$ ls /tmp/tap-square-schemas/
categories.inferred.json  inventories.inferred.json  modifier_lists.inferred.json  payments.inferred.json  shifts.inferred.json
discounts.inferred.json   items.inferred.json        orders.inferred.json          refunds.inferred.json   taxes.inferred.json

Note

You should not consider the resulting schema to be complete. It's only intended to be a starting point, and will likely require manual editing. But it's probably easier than writing a schema from scratch.
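
To show why an inferred schema is only a starting point, here is a deliberately naive Python sketch of type inference over RECORD messages. It is not the singer-infer-schema implementation; infer_type and infer_stream_schemas are hypothetical names, and the sketch simply keeps the type of the first value it sees for each property.

```python
import json


def infer_type(value):
    """Map a value decoded from JSON to a JSON Schema fragment."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        return {"type": "array", "items": infer_type(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_type(v) for k, v in value.items()},
        }
    raise TypeError("unsupported value: %r" % (value,))


def infer_stream_schemas(lines):
    """Return {stream: schema}, using the first value seen for each property."""
    schemas = {}
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") != "RECORD":
            continue
        schema = schemas.setdefault(
            message["stream"], {"type": "object", "properties": {}}
        )
        for key, value in message["record"].items():
            schema["properties"].setdefault(key, infer_type(value))
    return schemas
```

Because the first value wins, a property that is null in the first record and a string later ends up typed as null only, which is exactly the kind of result that needs manual editing.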

singer-release

For Singer projects that are written in Python, you should use singer-release to deploy packages to PyPI. This script confirms that your changes are up-to-date with origin/master, tags the release, and then deploys it to PyPI. To use it, run singer-release from the root directory of a Singer project that has a setup.py file. The script does the following:

  1. Parses the version number from setup.py
  2. Confirms that you are on the master branch
  3. Confirms that your git working directory and index are clean
  4. Does a git push
  5. Tags the repo with the version number
  6. Pushes the tags with git push --tags
  7. Runs python setup.py sdist upload

Note that singer-release does not change the version number. You must edit setup.py and set the version number manually and commit the change before running singer-release.
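
For orientation, the version parsing and the command sequence can be sketched in Python as below. The names parse_version and release_commands are hypothetical, the regular expression is only one plausible way to read a version out of setup.py, and using the bare version number as the tag name is an assumption. The sketch only builds the commands for steps 4 to 7; it does not run them or perform the branch and cleanliness checks.

```python
import re


def parse_version(setup_py_text):
    """Extract the version string from the text of a setup.py file."""
    match = re.search(r"version\s*=\s*['\"]([^'\"]+)['\"]", setup_py_text)
    if match is None:
        raise ValueError("no version found in setup.py")
    return match.group(1)


def release_commands(version):
    """Return the commands for steps 4-7, in order, without running them."""
    return [
        ["git", "push"],
        ["git", "tag", version],  # assumed tag format: the bare version number
        ["git", "push", "--tags"],
        ["python", "setup.py", "sdist", "upload"],
    ]
```

Each command list could then be passed to subprocess.check_call so that the release stops at the first failing step.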

diff-jsonl

When you make a change to a tap, you want some confidence that you're not introducing a regression. So, it's helpful to be able to diff the output of tap jobs. The diff-jsonl tool diffs two JSONL files, such as those produced by a tap:

$ diff-jsonl data-on-master.jsonl data-on-branch.jsonl
*** data-on-master.jsonl

--- data-on-branch.jsonl

***************

*** 833,839 ****

          "billingStreet": null,
          "city": null,
          "company": "Corgis Ltd",
!         "contactCompany": 7,
          "cookies": null,
          "country": null,
          "createdAt": "2016-03-10T18:47:20Z",
--- 833,839 ----

          "billingStreet": null,
          "city": null,
          "company": "Corgis Ltd",
!         "contactCompany": "7",
          "cookies": null,
          "country": null,
          "createdAt": "2016-03-10T18:47:20Z",
***************

*** 870,877 ****

          "lastName": "Karstendick",
          "lastReferredEnrollment": null,
          "lastReferredVisit": null,
!         "leadPartitionId": 1,
!         "leadPerson": 7,
          "leadRevenueCycleModelId": null,
          "leadRevenueStageId": null,
          "leadRole": null,
--- 870,877 ----

          "lastName": "Karstendick",
          "lastReferredEnrollment": null,
          "lastReferredVisit": null,
!         "leadPartitionId": "1",
!         "leadPerson": "7",
          "leadRevenueCycleModelId": null,
          "leadRevenueStageId": null,
          "leadRole": null,
***************
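
A similar comparison can be approximated with the standard library. The sketch below is an assumption about how such a tool might work, not the diff-jsonl source: pretty_lines and diff_jsonl are hypothetical names, and the approach is to pretty-print every JSON line with sorted keys and then run difflib.context_diff, so that a change in one field shows up as one changed line.

```python
import difflib
import json


def pretty_lines(jsonl_lines):
    """Pretty-print each JSON line with sorted keys, one field per line."""
    out = []
    for line in jsonl_lines:
        if line.strip():
            text = json.dumps(json.loads(line), indent=2, sort_keys=True)
            out.extend(text.splitlines())
    return out


def diff_jsonl(lines_a, lines_b, name_a="a.jsonl", name_b="b.jsonl"):
    """Return a context diff of two JSONL inputs as a list of lines."""
    return list(
        difflib.context_diff(
            pretty_lines(lines_a),
            pretty_lines(lines_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

Sorting keys before diffing means that a tap which merely emits fields in a different order produces no diff, while a type change such as 7 becoming "7" is reported.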

License

Copyright © 2017 Stitch

Distributed under the Apache License Version 2.0

singer-tools's People

Contributors

asaf-erlich, briansloane, iterati, kallan357, karstendick, mdelaurentis, mplovepop

singer-tools's Issues

singer-check-tap ignores missing properties

Problem:
singer-check-tap ignores RECORDs with missing properties, even though their SCHEMA does not allow said properties to be null

Expected behavior:
singer-check-tap declares output invalid, or at least warns about records being at odds with the schema

Example:

{"type": "SCHEMA", "stream": "stream", "schema": {"type": "object", "properties": {"id": {"type": "number"}}}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "stream", "record": {}, "time_extracted": "2021-02-17T22:15:35.591070Z" } 

In this simple example, the schema declares id's type to be number (and only number). In fact, it declares id to be a key property. The record, however, is empty: it contains neither id nor any other property. singer-check-tap considers this valid (returns success), as evidenced by this output:

Checking stdin for valid Singer-formatted data
The output is valid.
It contained 2 messages for 1 streams.

      1 schema messages
      1 record messages
      0 state messages

Details by stream:
+--------+---------+---------+
| stream | records | schemas |
+--------+---------+---------+
| stream | 1       | 1       |
+--------+---------+---------+
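
The kind of check this issue asks for could look roughly like the Python sketch below. It is not part of singer-check-tap; find_missing_key_properties is a hypothetical name, and treating a null key property the same as a missing one is an assumption.

```python
import json


def find_missing_key_properties(lines):
    """Return (line_number, stream, missing_keys) for each offending RECORD."""
    key_properties = {}  # stream name -> list of key property names
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") == "SCHEMA":
            key_properties[message["stream"]] = message.get("key_properties") or []
        elif message.get("type") == "RECORD":
            record = message.get("record", {})
            # A key property counts as missing if absent or null (assumption).
            missing = [
                key
                for key in key_properties.get(message["stream"], [])
                if record.get(key) is None
            ]
            if missing:
                problems.append((number, message["stream"], missing))
    return problems
```

Run against the two example messages above, this reports the RECORD on the second line as missing id.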

Bug: False positive date detection in `infer-schema`

I came here to report a bug and start working on a PR but I see there may already be two open PRs that relate to this issue.

The issue is that integers and numerics are sometimes/often inferred to be date-time data types, which causes issues downstream if the resulting schema is not manually overridden.

See #16 for what appears to be a valid fix.
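
For illustration only, one conservative way to avoid this class of false positive is to treat a value as a date-time only when it is a string shaped like an ISO 8601 timestamp. The Python sketch below does not reproduce the current singer-infer-schema logic or the fix in #16; looks_like_datetime and ISO_DATETIME are hypothetical names.

```python
import re
from datetime import datetime

# Date, "T" or space, time, optional fractional seconds, optional zone.
ISO_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?$"
)


def looks_like_datetime(value):
    """Return True only for strings shaped like an ISO 8601 date-time."""
    if not isinstance(value, str):
        return False  # integers and floats are never date-times here
    if not ISO_DATETIME.match(value):
        return False
    try:
        # Validate the calendar fields (rejects month 13, hour 25, and so on).
        datetime.strptime(value[:19].replace(" ", "T"), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

With this rule, 20160310 and "7" are not reported as date-times, while "2016-03-10T18:47:20Z" is.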

singer-infer-schema is missing a main method

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2299, in resolve
    return functools.reduce(getattr, self.attrs, module)
AttributeError: module 'singertools.infer_schema' has no attribute 'main'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/singer-infer-schema", line 11, in <module>
    load_entry_point('singer-tools==0.0.1', 'console_scripts', 'singer-infer-schema')()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 561, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2291, in load
    return self.resolve()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2301, in resolve
    raise ImportError(str(exc))
ImportError: module 'singertools.infer_schema' has no attribute 'main'

Problems while testing tap mods

Hello, I'm developing a new data structure for the fulfil io tap (https://github.com/fulfilio/tap-fulfil), because we need to send some more data to the Stitch integration. While trying to test the modifications I made to the repository (locally) with singer-check-tap from the singer-tools repo (https://github.com/singer-io/singer-tools), I always get the same status at the end (ERROR: tap exited with status 1), even when testing with the original fulfil tap repo. I was wondering if someone could help me with this, because maybe I'm not testing it right.

Thanks in advance,

Roberto.
