simonw / datasette-upload-csvs

Datasette plugin for uploading CSV files and converting them to database tables

Home Page: https://datasette.io/plugins/datasette-upload-csvs

License: Apache License 2.0

Python 73.59% HTML 26.41%
datasette-plugin datasette csvs datasette-io

datasette-upload-csvs's Introduction

datasette-upload-csvs

Datasette plugin for uploading CSV files and converting them to database tables

Installation

datasette install datasette-upload-csvs

Usage

The plugin adds an interface at /-/upload-csvs for uploading a CSV file and using it to create a new database table.

By default only the root actor can access the page - so you'll need to run Datasette with the --root option and click on the link shown in the terminal to sign in and access the page.

The upload-csvs permission governs access. You can use permission plugins such as datasette-permissions-sql to grant additional access to the write interface.

datasette-upload-csvs's People

Contributors

asg017, simonw

datasette-upload-csvs's Issues

Forbidden - No mutable databases available

I just installed datasette with pipx (Ubuntu 22.04), and I injected datasette-upload-csvs with:
pipx inject datasette datasette-upload-csvs

Then I launch datasette with root permissions
datasette --root
and click on the link with the token.

If I try to navigate to the /-/upload-csvs link I get the 403 error:
Forbidden - No mutable databases available

What am I doing wrong? Is there any setting I should change to upload csvs?

It is a fresh Datasette instance; it has no database except _memory.

Thanks,
Luigi

add primary key to (possibly) enable drill-down

After importing a small CSV, I used datasette enrichments to generate a full_address field, and an opencage_json field. The problem is that I can't see the full opencage_json data. I thought I could click on the rowid to drill into detail, but it is not linked. I'm guessing that that's because there is no primary key. If that's the case, it might be helpful to offer settings in datasette-upload-csvs, so that the primary key defaults to rowid unless another column is selected.

Ability to set permissions on a per-database basis

Need to consider permissions here too. Currently:

async def upload_csvs(scope, receive, datasette, request):
    if not await datasette.permission_allowed(
        request.actor, "upload-csvs", default=False
    ):
        raise Forbidden("Permission denied for upload-csvs")

That needs to take the database into consideration as well.

Good opportunity for me to revise (and maybe document) how Datasette permissions deal with changing an existing permission to now depend on a resource.
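
A rough sketch of what a per-database check might look like, assuming the database name is passed as the resource argument of datasette.permission_allowed(). The helper can_upload_to and the FakeDatasette stand-in are hypothetical and exist only so the example runs without Datasette installed; the plugin itself may end up doing this differently.

```python
import asyncio


async def can_upload_to(datasette, actor, database_name):
    """Return True if the actor may upload CSVs into database_name.

    Assumption: the database name is passed as the resource. The check
    quoted in this issue passes no resource at all.
    """
    return await datasette.permission_allowed(
        actor, "upload-csvs", resource=database_name, default=False
    )


class FakeDatasette:
    """Hypothetical stand-in for a Datasette instance, for illustration only."""

    def __init__(self, grants):
        self.grants = grants  # set of (actor_id, action, resource) tuples

    async def permission_allowed(self, actor, action, resource=None, default=False):
        if actor is None:
            return default
        return (actor.get("id"), action, resource) in self.grants or default


fake = FakeDatasette({("alice", "upload-csvs", "data")})
allowed = asyncio.run(can_upload_to(fake, {"id": "alice"}, "data"))
denied = asyncio.run(can_upload_to(fake, {"id": "alice"}, "other"))
```

With a check of this shape, an actor granted upload-csvs on one database would still be refused on every other database.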

Originally posted by @simonw in #28 (comment)

Redesign how this plugin handles parsing CSV and writing to the DB

The plugin currently works by kicking off a long-running operation in the single "write" thread and parsing the CSV file entirely within that operation:

await db.execute_write_fn(insert_docs_catch_errors, block=False)

I'm having trouble getting the tests to pass against Datasette 1.0 - see #36 - which has made me think that this might not be the best approach.

I'd rather not tie up the write connection for so long - ideally I'd like the parsing to happen in a separate thread with rows written to the database 100 at a time or so.

I'm not entirely sure how to do that, so I'll likely get a good TIL out of it.
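
A minimal sketch of the batching half of that idea, using only the standard library csv and sqlite3 modules rather than Datasette's write API. The names batches and import_csv_in_batches are hypothetical, and the sketch assumes every row has the same number of fields as the header. It does not show the separate parsing thread; in the plugin, each batch could presumably be handed to the write connection as its own short operation.

```python
import csv
import io
import itertools
import sqlite3


def batches(iterable, size=100):
    """Yield lists of up to `size` items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_csv_in_batches(conn, table, text_stream, batch_size=100):
    """Parse CSV lazily and insert it in short transactions; return the row count."""
    reader = csv.reader(text_stream)
    headers = next(reader)
    # Quote identifiers so unusual column or table names do not break the SQL.
    columns = ", ".join('"{}"'.format(h.replace('"', '""')) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(quoted_table, columns))
    total = 0
    for chunk in batches(reader, batch_size):
        with conn:  # one transaction per batch, committed on exit
            conn.executemany(
                "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders),
                chunk,
            )
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
data = "id,name\n" + "".join("{},row{}\n".format(i, i) for i in range(250))
row_count = import_csv_in_batches(conn, "demo", io.StringIO(data))
```

Because each batch commits on its own, the write connection is free between batches instead of being held for the whole file.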

Tests fail against latest dependencies

FAILED tests/test_datasette_upload_csvs.py::test_redirect - TypeError: AsyncClient.get() got an unexpected keyword argument 'allow_redirects'
FAILED tests/test_datasette_upload_csvs.py::test_upload - TypeError: Expected bytes or bytes-like object got: <class 'str'>
FAILED tests/test_datasette_upload_csvs.py::test_permissions - TypeError: AsyncClient.get() got an unexpected keyword argument 'allow_redirects'

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 105: invalid start byte

Got this uploading this CSV: https://static.simonwillison.net/static/2022/Animal%20Rescue%20incidents%20attended%20by%20LFB%20from%20Jan%202009.csv

Relevant TODO:

def insert_docs(conn):
    database = sqlite_utils.Database(conn)
    # TODO: Support other encodings:
    reader = csv_std.reader(codecs.iterdecode(csv.file, "utf-8"))
    headers = next(reader)
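
One possible way to approach that TODO is to try a short list of encodings in order. The sketch below is an assumption, not the plugin's code: decode_csv_bytes is a hypothetical helper, the candidate list is a guess, and unlike codecs.iterdecode it decodes the whole upload in memory. Byte 0xa3 is the pound sign in both cp1252 and latin-1, which would explain the error above.

```python
import csv
import io


def decode_csv_bytes(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Unreachable with the default list, because latin-1 accepts every byte,
    # but a caller-supplied list may not include such a catch-all.
    raise ValueError("none of the candidate encodings matched")


raw = b"animal,cost\ncat,\xa3255\n"
text, used = decode_csv_bytes(raw)
rows = list(csv.reader(io.StringIO(text)))
```

A fallback like this can pick the wrong encoding silently, so it probably works best alongside a way for the user to confirm or override the guess.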

Preview first 100 lines before importing

Importing CSV files is tricky when there's a chance they might actually be TSV, or may use an unexpected encoding or strange escaping rules.

I can address many of these issues by providing a "preview" of the CSV file before running the conversion to SQLite. The preview could operate against the first 100 rows (which can be efficiently loaded without consuming the whole file). The user can then confirm that the import looks correct, or can switch to TSV or some other encoding/escaping mechanism and preview again.

This also provides the opportunity to allow the user to select the primary key, select columns that should be extracted into a separate table and select columns for indexing with FTS.
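
A small sketch of how such a preview could be built with the standard library, assuming the upload is available as a seekable text stream. preview_rows is a hypothetical name, and csv.Sniffer is only a heuristic, so the guessed delimiter would still need confirming by the user.

```python
import csv
import io
import itertools


def preview_rows(text_stream, limit=100, sample_size=4096):
    """Return (delimiter, rows): the guessed delimiter plus the header and up to `limit` rows."""
    sample = text_stream.read(sample_size)
    text_stream.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=",\t;")
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated
    reader = csv.reader(text_stream, dialect)
    # islice stops early, so the rest of the file is never parsed.
    return dialect.delimiter, list(itertools.islice(reader, limit + 1))


delimiter, rows = preview_rows(io.StringIO("name\tage\nCleo\t5\nPancakes\t4\n"))
```

Here the file is really tab-separated, so the preview reports a tab delimiter and splits each line into two columns.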

ascii codec Error uploading CSV

Got this uploading CSV of the Squirrel Census:

{
	"0": {
		"id": "f029f3d9-1d9c-43f7-9eaa-297d3e3e3f9d",
		"filename": "2018_Central_Park_Squirrel_Census_-_Squirrel_Data",
		"bytes_todo": 747760,
		"bytes_done": 7604,
		"rows_done": 30,
		"started": "2022-07-03 20:10:18.452435",
		"completed": null,
		"error": "'ascii' codec can't decode byte 0xe2 in position 170: ordinal not in range(128)"
	}
}
