
tap-spotify's Introduction

tap-spotify

tap-spotify is a Singer tap for Spotify.

Built with the Meltano Tap SDK for Singer Taps.


Overview

tap-spotify extracts raw data from the Spotify Web API for the following resources:

Installation

# pip
pip install git+https://github.com/Matatika/tap-spotify

# pipx
pipx install git+https://github.com/Matatika/tap-spotify

# poetry
poetry add git+https://github.com/Matatika/tap-spotify

Configuration

Accepted Config Options

| Name | Required | Default | Description |
| --- | --- | --- | --- |
| client_id | Yes | | Your tap-spotify app client ID |
| client_secret | Yes | | Your tap-spotify app client secret |
| refresh_token | Yes | | Your tap-spotify app refresh token |

A full list of supported settings and capabilities for this tap is available by running:

tap-spotify --about
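
The three required settings can be collected into a JSON document and passed to the tap with `--config`. The following is a minimal sketch, assuming a plain JSON config file; `build_config` is a hypothetical helper for illustration and is not part of tap-spotify.

```python
import json

# Setting names taken from the "Accepted Config Options" table.
REQUIRED_SETTINGS = ("client_id", "client_secret", "refresh_token")


def build_config(**settings: str) -> str:
    """Return a JSON config document, failing early if a required setting is missing.

    Hypothetical helper for illustration only.
    """
    missing = [name for name in REQUIRED_SETTINGS if not settings.get(name)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return json.dumps({name: settings[name] for name in REQUIRED_SETTINGS}, indent=2)
```

Writing the returned string to a file (for example `config.json`) gives a path that can be supplied wherever `CONFIG` appears in the usage commands.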

Source Authentication and Authorization

Before using tap-spotify, you will need to create an app from your Spotify developer dashboard. We recommend restricting your use of this app to tap-spotify only. Provide a name, description and a redirect URI of https://matatika.github.io/spotify-refresh-token (explained below).

Get a Refresh Token

Use this web app to get a refresh token with your Spotify app credentials:

  • Provide your app client ID and secret in the appropriate fields
  • Click 'Submit' and follow the Spotify login flow
  • Copy the refresh token

Credit to Alec Chen for the original project

Each stream requires certain token scopes. By default (all streams selected), the following token scopes are required: user-top-read and user-library-read.

When specific streams are selected, the required token scopes may change.

| Stream | Required scope(s) |
| --- | --- |
| user_top_tracks_st_stream | user-top-read |
| user_top_tracks_mt_stream | user-top-read |
| user_top_tracks_lt_stream | user-top-read |
| user_top_artists_st_stream | user-top-read |
| user_top_artists_mt_stream | user-top-read |
| user_top_artists_lt_stream | user-top-read |
| global_top_tracks_daily_stream | |
| global_top_tracks_weekly_stream | |
| global_viral_tracks_daily_stream | |
| user_saved_tracks_stream | user-library-read |

If a required scope is not set, tap-spotify will encounter a 403 Forbidden response from the Spotify Web API and fail. You must set all required scopes for the selected streams.

Any other scopes not listed here are not required by tap-spotify. Setting these will allow applications using the same Spotify app credentials to read more specific and possibly sensitive resource data, so do this at your own risk.
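
To work out which scopes a given stream selection needs, the stream table above can be encoded as a lookup. This is a minimal sketch; `STREAM_SCOPES` mirrors the table, and `required_scopes` is a hypothetical helper, not a function provided by tap-spotify.

```python
from typing import Iterable, List, Optional

# Stream name -> scopes, copied from the stream table above.
STREAM_SCOPES = {
    "user_top_tracks_st_stream": {"user-top-read"},
    "user_top_tracks_mt_stream": {"user-top-read"},
    "user_top_tracks_lt_stream": {"user-top-read"},
    "user_top_artists_st_stream": {"user-top-read"},
    "user_top_artists_mt_stream": {"user-top-read"},
    "user_top_artists_lt_stream": {"user-top-read"},
    "global_top_tracks_daily_stream": set(),
    "global_top_tracks_weekly_stream": set(),
    "global_viral_tracks_daily_stream": set(),
    "user_saved_tracks_stream": {"user-library-read"},
}


def required_scopes(selected_streams: Optional[Iterable[str]] = None) -> List[str]:
    """Return the sorted union of scopes for the selected streams.

    With no selection, every stream in the table is assumed to be selected.
    """
    streams = STREAM_SCOPES.keys() if selected_streams is None else selected_streams
    scopes = set()
    for name in streams:
        scopes |= STREAM_SCOPES[name]
    return sorted(scopes)
```

For example, selecting only `user_saved_tracks_stream` needs just `user-library-read`, while selecting only the global streams needs no scopes at all.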

Usage

You can easily run tap-spotify by itself or in a pipeline using Meltano.

Executing the Tap Directly

tap-spotify --version
tap-spotify --help
tap-spotify --config CONFIG --discover > ./catalog.json

Developer Resources

Initialize your Development Environment

pipx install poetry
make init

Lint your Code

make lint

Create and Run Tests

Create tests within the tap_spotify/tests subfolder and then run:

make test

You can also test the tap-spotify CLI directly using poetry run:

poetry run tap-spotify --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Your project comes with a custom meltano.yml project file already created. Open the meltano.yml and follow any "TODO" items listed in the file.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-spotify
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-spotify --version
# OR run a test `elt` pipeline:
meltano elt tap-spotify target-jsonl

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.

tap-spotify's People

Contributors

danielpdwalker, dependabot[bot], pre-commit-ci[bot], reubenfrankel


tap-spotify's Issues

`502 Bad Gateway` from top items endpoints

It looks like something changed on Spotify's side that started causing 502 Bad Gateway errors when accessing top tracks/artists whenever the sum of the limit and offset parameters exceeds the total number of items (100):

{
    "error": {
        "status": 502,
        "message": ""
    }
}

Working

https://api.spotify.com/v1/me/top/tracks?offset=99&limit=1
https://api.spotify.com/v1/me/top/tracks?offset=50&limit=50
https://api.spotify.com/v1/me/top/tracks?offset=49&limit=49

Broken

https://api.spotify.com/v1/me/top/tracks?offset=100&limit=1
https://api.spotify.com/v1/me/top/tracks?offset=51&limit=50
https://api.spotify.com/v1/me/top/tracks?offset=98&limit=49
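
The working and broken requests above are consistent with a simple rule: a request fails when `offset + limit` is greater than the total. The sketch below encodes that observation only; `exceeds_total` is a hypothetical name, and the rule is inferred from these six URLs rather than from any documented API behaviour.

```python
def exceeds_total(offset: int, limit: int, total: int = 100) -> bool:
    """Return True if the requested window runs past the reported total.

    Based on the observations in this issue, such requests currently
    receive a 502 Bad Gateway response.
    """
    return offset + limit > total


# (offset, limit) pairs from the "Working" and "Broken" lists above.
WORKING = [(99, 1), (50, 50), (49, 49)]
BROKEN = [(100, 1), (51, 50), (98, 49)]
```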

We originally defined a limit of 49 for top items streams in order to maximise the amount of data returned (i.e. over 100 items) by exploiting a previous quirk of the API, where paginating with next links yielded an "extra" third page of 49 items (147 total):

1st request

{
    "items": [...
    ],
    "total": 100,
    "limit": 49,
    "offset": 0,
    "href": "https://api.spotify.com/v1/me/top/tracks?limit=49",
    "next": "https://api.spotify.com/v1/me/top/tracks?offset=49&limit=49",
    "previous": null
}

2nd request

{
    "items": [...
    ],
    "total": 100,
    "limit": 49,
    "offset": 49,
    "href": "https://api.spotify.com/v1/me/top/tracks?offset=49&limit=49",
    "next": "https://api.spotify.com/v1/me/top/tracks?offset=98&limit=49",
    "previous": "https://api.spotify.com/v1/me/top/tracks?offset=0&limit=49"
}

3rd request (now 502 Bad Gateway) - no next link, pagination stops

{
    "items": [...
    ],
    "total": 100,
    "limit": 49,
    "offset": 98,
    "href": "https://api.spotify.com/v1/me/top/tracks?offset=98&limit=49",
    "next": null,
    "previous": "https://api.spotify.com/v1/me/top/tracks?offset=49&limit=49"
}

If a limit of 50 were used instead, pagination would stop at ?limit=50&offset=50 (100 total):

1st request

{
    "items": [...
    ],
    "total": 100,
    "limit": 50,
    "offset": 0,
    "href": "https://api.spotify.com/v1/me/top/tracks?limit=50",
    "next": "https://api.spotify.com/v1/me/top/tracks?offset=50&limit=50",
    "previous": null
}

2nd request - no next link, pagination stops

{
    "items": [...
    ],
    "total": 100,
    "limit": 50,
    "offset": 50,
    "href": "https://api.spotify.com/v1/me/top/tracks?offset=50&limit=50",
    "next": null,
    "previous": "https://api.spotify.com/v1/me/top/tracks?offset=0&limit=50"
}
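
One way to avoid requesting past the reported total is to shrink the last page so that `offset + limit` never exceeds it. This is a sketch of that idea under the assumption that the total (100 here) is known up front; `page_params` is a hypothetical helper and is not necessarily how the tap resolves the issue.

```python
from typing import Iterator, Tuple


def page_params(total: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that cover `total` items without overshooting."""
    offset = 0
    while offset < total:
        # Clamp the final page so the window ends exactly at the total.
        limit = min(page_size, total - offset)
        yield offset, limit
        offset += limit
```

With a page size of 49 this gives `(0, 49)`, `(49, 49)`, `(98, 2)`, and with a page size of 50 it gives `(0, 50)`, `(50, 50)`. Either way only 100 items are requested, so the "extra" items from the old quirk are not recovered.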

Unable to parse singer METRICs

Background

Taps can output Singer METRIC messages. These are written to stderr, while RECORD messages are written to stdout and are intended to be consumed by a target (i.e. tap -> target).

Monitoring on stderr can parse these METRIC messages and turn them into useful pipeline monitoring.

Additional information

Error parsing json {:source=>"raw_metric", :raw=>"{'type': 'counter', 'metric': 'record_count', 'value': 0, 'tags': {'stream': 'user_top_tracks_st_stream'}}", :exception=>#<LogStash::Json::ParserError: Unexpected character (''' (code 39)): was expecting double-quote to start field name

Proposed fix

For this tap the fix is simple. Bump the SDK version to a release with this change: meltano/sdk@9d6a48a
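
The raw metric in the error above uses single quotes, which is what Python produces when a dict is formatted with `str()` rather than serialised with `json.dumps`. The sketch below reproduces that difference. It illustrates the symptom only and does not claim to show the content of the referenced SDK change.

```python
import json

# The metric payload from the error message above.
metric = {
    "type": "counter",
    "metric": "record_count",
    "value": 0,
    "tags": {"stream": "user_top_tracks_st_stream"},
}

as_repr = str(metric)         # Python dict repr: single-quoted keys and values
as_json = json.dumps(metric)  # valid JSON: double-quoted keys and values


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Here `is_valid_json(as_repr)` is false and `is_valid_json(as_json)` is true, which matches the parser complaining about a `'` where it expected a double quote.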

How to obtain refresh token?

I attempted to set up tap-spotify today but was unable to locate the needed refresh_token config value. From the Spotify Developer interface I see client_secret and client_id but not refresh_token.

Any help is much appreciated. Thanks!

Tap not working with transferwise `target-postgres`

Running tap-spotify with the transferwise variant of target-postgres produces the following error:

target-postgres--transferwise | loader | time=2022-03-31 10:49:23 name=target_postgres level=CRITICAL message=Primary key is set to mandatory but not defined in the [global_top_tracks_daily_stream] stream
target-postgres--transferwise | loader | Traceback (most recent call last):
target-postgres--transferwise | loader | File "/tmp/shelltaskscripts16237457880279188084/workspace/.meltano/loaders/target-postgres--transferwise/venv/bin/target-postgres", line 8, in <module>
target-postgres--transferwise | loader | sys.exit(main())
target-postgres--transferwise | loader | File "/tmp/shelltaskscripts16237457880279188084/workspace/.meltano/loaders/target-postgres--transferwise/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 373, in main
target-postgres--transferwise | loader | persist_lines(config, singer_messages)
target-postgres--transferwise | loader | File "/tmp/shelltaskscripts16237457880279188084/workspace/.meltano/loaders/target-postgres--transferwise/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 209, in persist_lines
target-postgres--transferwise | loader | raise Exception("key_properties field is required")
target-postgres--transferwise | loader | Exception: key_properties field is required
meltano | elt | Loading failed (1): Exception: key_properties field is required
meltano | elt | ELT could not be completed: Loader failed

This can be fixed by adding primary keys to each of the streams.
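
In Singer terms, the target is rejecting a SCHEMA message whose `key_properties` is empty. The sketch below builds a SCHEMA message and refuses to emit one without a primary key. The helper `schema_message` and the `id` field used in the usage note are hypothetical; the actual key for each stream would need to be chosen from its real schema.

```python
import json
from typing import Any, Dict, Sequence


def schema_message(stream: str, schema: Dict[str, Any], key_properties: Sequence[str]) -> str:
    """Serialise a Singer SCHEMA message, requiring a non-empty primary key.

    Hypothetical helper for illustration only.
    """
    if not key_properties:
        raise ValueError(f"No primary key defined for stream '{stream}'")
    properties = schema.get("properties", {})
    unknown = [key for key in key_properties if key not in properties]
    if unknown:
        raise ValueError(f"Primary key fields not in schema: {', '.join(unknown)}")
    return json.dumps(
        {
            "type": "SCHEMA",
            "stream": stream,
            "schema": schema,
            "key_properties": list(key_properties),
        }
    )
```

For example, `schema_message("global_top_tracks_daily_stream", {"properties": {"id": {"type": "string"}}}, ["id"])` produces a message with `"key_properties": ["id"]`, while passing an empty list raises before anything reaches the target.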

Please license

It would be awesome if there was a license for this tap.

ci: Warn when top items streams return no records, rather than fail

From the Web API docs:

Over what time frame the affinities are computed. Valid values: long_term (calculated from ~1 year of data and including all new data as it becomes available), medium_term (approximately last 6 months), short_term (approximately last 4 weeks).

At the moment, I am manually streaming some songs on the test account in order to generate some data that the API will return. Recently, streams with a short_term time range started failing because I hadn't done this in a while. I imagine leaving it longer would have meant medium_term and eventually long_term would also have failed. For the tests, it would be better to warn about no records for these streams rather than fail, as a failure gives a false impression at a glance that the tap is broken.

Affected streams:

streams.UserTopTracksShortTermStream,
streams.UserTopTracksMediumTermStream,
streams.UserTopTracksLongTermStream,
streams.UserTopArtistsShortTermStream,
streams.UserTopArtistsMediumTermStream,
streams.UserTopArtistsLongTermStream,
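
The proposed behaviour could look roughly like the following. This is a minimal sketch, assuming the test harness can see a per-stream record count; `check_record_count` is a hypothetical helper, and the set of stream names is taken from the list above.

```python
import warnings

# Streams that may legitimately return nothing on an idle test account.
ALLOW_EMPTY = {
    "UserTopTracksShortTermStream",
    "UserTopTracksMediumTermStream",
    "UserTopTracksLongTermStream",
    "UserTopArtistsShortTermStream",
    "UserTopArtistsMediumTermStream",
    "UserTopArtistsLongTermStream",
}


def check_record_count(stream_name: str, record_count: int) -> None:
    """Warn for empty top items streams and fail for any other empty stream."""
    if record_count > 0:
        return
    message = f"No records returned for stream '{stream_name}'"
    if stream_name in ALLOW_EMPTY:
        warnings.warn(message)
    else:
        raise AssertionError(message)
```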

`413 Content Too Large` for `/v1/audio-features`

For some reason, the Web API now allows the top tracks endpoint to return over 100 tracks for a user (observed for the long_term range). This has exposed a bug in the tap where it makes a request for the audio features of multiple tracks, the number of which now can be greater than the maximum allowed (100) - this either results in a 400 Client Error or 413 Content Too Large depending on how much this limit is exceeded by.
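
A straightforward way to stay within the limit is to split the track IDs into batches of at most 100 and request audio features once per batch. The sketch below shows only the batching step; `chunk_ids` is a hypothetical helper rather than the tap's actual fix.

```python
from typing import Iterator, List, Sequence

# Maximum number of tracks per audio features request, as stated above.
MAX_IDS_PER_REQUEST = 100


def chunk_ids(track_ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` track IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(track_ids), size):
        yield list(track_ids[start:start + size])
```

For 147 track IDs this yields one batch of 100 followed by one batch of 47, so no single request exceeds the limit.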
