simonw/airtable-export

Export Airtable data to YAML, JSON or SQLite files on disk

Home Page: https://datasette.io/tools/airtable-export

License: Apache License 2.0


airtable-export's Introduction

airtable-export


Export Airtable data to files on disk

Installation

Install this tool using pip:

$ pip install airtable-export

Usage

You will need the following information:

  • Your Airtable base ID - this is a string starting with app...
  • Your Airtable personal access token - this is a string starting with pat...

If you just want to export a subset of your tables, you will also need to know the names of those tables.

You can export all of your data to a folder called export/ by running the following:

airtable-export export base_id --key=key

This will create one file for each of your tables, for example export/table1.yml and export/table2.yml.

Rather than passing the token using the --key option, you can set it as an environment variable called AIRTABLE_KEY.
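
For example, in a shell (substitute your own token and base ID):

export AIRTABLE_KEY=patXXXXXXXXXXXXXX
airtable-export export base_id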

To export only specified tables, pass their names as additional arguments:

airtable-export export base_id table1 table2 --key=key

Export options

By default the tool exports your data as YAML.
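
Each record becomes one list item in the exported file. Illustratively, with hypothetical field names (the airtable_id key holds the Airtable record ID):

- airtable_id: rec1234567890abcd
  name: An example record
  size: 441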

You can also export as JSON or as newline-delimited JSON using the --json or --ndjson options:

airtable-export export base_id --key=key --ndjson

You can pass multiple format options at once. This command will create a .json, .yml and .ndjson file for each exported table:

airtable-export export base_id \
    --key=key --ndjson --yaml --json

If you export all tables, or if you add the --schema option, a JSON schema for the base will be written to output-dir/_schema.json.
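
For example, to export two tables and also write out the base schema:

airtable-export export base_id table1 table2 --key=key --schema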

SQLite database export

You can export tables to a SQLite database file using the --sqlite database.db option:

airtable-export export base_id \
    --key=key --sqlite database.db

This can be combined with other format options. If you only specify --sqlite, the export directory argument will be ignored.

The SQLite database will have a table created for each table you export. Those tables will have a primary key column called airtable_id.

If you run this command against an existing SQLite database, records with matching primary keys will be overwritten by new records from the export.
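
For example, you can spot-check the exported rows with the sqlite3 command-line shell (table1 here is a hypothetical table name):

sqlite3 database.db "select airtable_id from table1 limit 3"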

Request options

By default the tool uses python-httpx's default configuration.

You can override the user-agent using the --user-agent option:

airtable-export export base_id table1 table2 --key=key --user-agent "Airtable Export Robot"

You can override the timeout for network read operations using the --http-read-timeout option. If not set, this defaults to 5 seconds:

airtable-export export base_id table1 table2 --key=key --http-read-timeout 60

Running this using GitHub Actions

GitHub Actions is GitHub's workflow automation product. You can use it to run airtable-export in order to back up your Airtable data to a GitHub repository. Doing this gives you a visible commit history of changes you make to your Airtable data - like this one.

To run this for your own Airtable database you'll first need to add the following secrets to your GitHub repository:

AIRTABLE_BASE_ID
The base ID, a string beginning `app...`
AIRTABLE_KEY
Your Airtable API key
AIRTABLE_TABLES
A space separated list of the Airtable tables that you want to backup. If any of these contain spaces you will need to enclose them in single quotes, e.g. 'My table with spaces in the name' OtherTableWithNoSpaces
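
If you use the GitHub CLI, one way to set these is from your terminal (a sketch; gh prompts for each value):

gh secret set AIRTABLE_BASE_ID
gh secret set AIRTABLE_KEY
gh secret set AIRTABLE_TABLES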

Once you have set those secrets, add the following as a file called .github/workflows/backup-airtable.yml:

name: Backup Airtable

on:
  workflow_dispatch:
  schedule:
  - cron: '32 0 * * *'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - name: Check out repo
      uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-
        restore-keys: |
          ${{ runner.os }}-pip-
    - name: Install airtable-export
      run: |
        pip install airtable-export
    - name: Backup Airtable to backups/
      env:
        AIRTABLE_BASE_ID: ${{ secrets.AIRTABLE_BASE_ID }}
        AIRTABLE_KEY: ${{ secrets.AIRTABLE_KEY }}
        AIRTABLE_TABLES: ${{ secrets.AIRTABLE_TABLES }}
      run: |-
        airtable-export backups $AIRTABLE_BASE_ID $AIRTABLE_TABLES -v
    - name: Commit and push if it changed
      run: |-
        git config user.name "Automated"
        git config user.email "[email protected]"
        git add -A
        timestamp=$(date -u)
        git commit -m "Latest data: ${timestamp}" || exit 0
        git push

This will run once a day (at 32 minutes past midnight UTC) and will also run if you manually click the "Run workflow" button; see GitHub Actions: Manual triggers with workflow_dispatch.

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd airtable-export
python -m venv venv
source venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

airtable-export's People

Contributors

eliblock, simonw, trevormunoz


airtable-export's Issues

Airtable is changing its authentication system

Allow multiple output formats

Running this should create each table file three times, with three different extensions:

airtable-export ... --yaml --json --ndjson

The new --yaml option is only needed if you specify at least one other format, since it's the default.

Make stuff easier to view in Datasette

The exported SQLite database opened in Datasette looks like this:

[Screenshot: the exported SQLite database opened in Datasette]

Originally I was going to try to turn these into many-to-many table records, but Datasette isn't actually great at displaying those yet (and Airtable really does favour many-to-many over one-to-many).

But I think there are some tricks I can pull off with the render_cell plugin hook which can get most of what I want here with a minimal amount of work.

  • If a column has an array of strings all starting with rec, attempt to resolve those rows against the other tables and show links to them.
  • If a column has JSON that matches the Airtable photos/attachments format, turn that into thumbnail images (which link to the full-sized image).

I'm going to play around with this and see if I can get it to work.

Restore?

Just curious if there is any interest or best practice for restoring an Airtable base from the exported files created by airtable-export? I realize that this is not a backup tool, but it does get tantalizingly close.

Sensible error handling

If the Airtable API returns an error - due to an incorrect API key for example - we should handle it nicely.

returnFieldsByFieldId

It would be nice to have a built-in option to enable "returnFieldsByFieldId=true", since there are many use cases where there is an advantage to having field names not change if they are renamed in Airtable. Thank you for your work!

Authentication problem

Hi,

I'm trying to export tables from a base of mine, but I am always getting 406 errors, although the very same call with the same API key through curl works just fine. Any idea what could cause this? (I'm running Manjaro Linux, if that helps with troubleshooting.)

$ airtable-export --version
airtable-export, version 0.6

$ airtable-export export --key=keyXXX export appXXX prizes -v
Error: 406 Client Error: Not Acceptable for url: https://api.airtable.com/v0/export/appXXX
For more information check: https://httpstatuses.com/406

$ curl https://api.airtable.com/v0/appXXX/prizes\?api_key\=keyXXX
{"records":[{"id":"rec5EuisiankLtNJg","fields":...

No tables downloaded if I only specify base_id

Congrats on this great package!

I followed the instructions and I am not able to download tables if I only supply the base_id parameter. This is my command (the API key is set as an environment variable):

airtable-export export appXXXX --sqlite my_export.db

The output is a 0 byte empty my_export.db file.

However, if I supply the table names, it does work:

airtable-export export appXXXX tblYYYYY --sqlite my_export.db

I have verified with my own code that I am able to retrieve everything, from schema to all table names, via the API with these URLs:

https://api.airtable.com/v0/meta/bases/{base_id}/tables
https://api.airtable.com/v0/{base_id}/{table_name}

What could be the problem? Thanks in advance

Add JSON support

Some people may prefer to use JSON, or newline-delimited JSON, over YAML.

Ability to export all tables in a base

This was blocked for ages on the need for a schema API, but they appear to have one now:

Docs: https://airtable.com/developers/web/api/get-base-schema

curl "https://api.airtable.com/v0/meta/bases/{baseId}/tables"
{
  "tables": [
    {
      "description": "Apartments to track.",
      "fields": [
        {
          "description": "Name of the apartment",
          "id": "fld1VnoyuotSTyxW1",
          "name": "Name",
          "type": "singleLineText"
        },
        {
          "id": "fldoaIqdn5szURHpw",
          "name": "Pictures",
          "type": "multipleAttachments"
        },
        {
          "id": "fldumZe00w09RYTW6",
          "name": "District",
          "options": {
            "inverseLinkFieldId": "fldWnCJlo2z6ttT8Y",
            "isReversed": false,
            "linkedTableId": "tblK6MZHez0ZvBChZ",
            "prefersSingleRecordLink": true
          },
          "type": "multipleRecordLinks"
        }
      ],
      ...
    }
  ]
}

Option to export and save images and attachments

https://support.airtable.com/docs/airtable-attachment-url-behavior

The attachment viewer URL requires a viewer to be logged in via Airtable, whereas the download URL can be publicly accessed via a web browser. This presents a potential security risk for any files that contain sensitive information since a permanently accessible download URL would still be viewable even after a viewer has been removed from an Airtable workspace. For that reason, we introduced expiring attachment download URLs starting November 8th, 2022.

[...]

API - Similar to the previously mentioned CSV downloads, download any attachments from the attachment URLs obtained via Airtable's API before those links expire.

This tool should grow a feature to download and save attachments, since the URLs are no longer a useful long-term reference.

How long until they expire? The docs say:

If you, or a third-party integration, are using API processes to obtain attachment URLs from Airtable, then it’s important to note that the URLs obtained will only be active for a short period of time (~2 hours). Using Airtable as a CDN has been and continues to be discouraged in line with this change.
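
Until such a feature exists, one possible stopgap is to pull the attachment URLs out of an ndjson export with jq and download them with curl before they expire. A sketch, assuming a hypothetical Attachments field in table1 (adjust the names to your base):

jq -r '.Attachments[]?.url' export/table1.ndjson | while read -r url; do
  curl -sOJ "$url"
done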

Error handling / retry

Per the Airtable API docs:

RATE LIMITS
The API is limited to 5 requests per second per base. If you exceed this rate, you will receive a 429 status code and will need to wait 30 seconds before subsequent requests will succeed. If you anticipate a higher read volume, we recommend using a caching proxy. This rate limit is the same for all plans and increased limits are not currently available.

The official JavaScript client has built-in retry logic.

Possibly related, but possibly unrelated, I've been seeing occasional Error: The read operation timed out in my GitHub Actions airtable backup jobs.

I don't know if there is already internal retry logic and we're just blowing past the retry count or if we're running into something not currently contemplated by the code.
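
As a stopgap until the tool grows built-in retry logic, one option is a wrapper loop in the Actions step (a sketch; three attempts, waiting the 30 seconds the docs mention between them):

for attempt in 1 2 3; do
  airtable-export backups $AIRTABLE_BASE_ID $AIRTABLE_TABLES -v && break
  sleep 30
done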

Support for `joins` or `foreign keys` when exporting to sqlite

Hi,

When exporting an Airtable base to SQLite, is there a way to query the resulting dataset with joins?
I'm not an expert in SQLite, but I'm guessing it is possible to join two tables without having foreign keys defined? Given the format of those linked-record values (e.g. ["recqyhKQGTj1HU8Gi"]), I'm not sure whether it is possible.

Thanks for possible hints!
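
One approach that may work (a sketch, not a built-in feature): SQLite's json_each() function can expand those JSON arrays of record IDs into joinable rows. Assuming hypothetical apartments and districts tables, where apartments.district holds values like ["recqyhKQGTj1HU8Gi"]:

sqlite3 database.db "
select apartments.airtable_id, districts.name
from apartments
join json_each(apartments.district) as link
join districts on districts.airtable_id = link.value;
"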
