I'm getting this error when trying to copy editions. The next step in the readme is t

Error loading editions about openlibrary-search HOT 5 CLOSED

thedug commented on August 26, 2024

Error loading editions

from openlibrary-search.

Comments (5)

DaveBathnes commented on August 26, 2024

Hi @thedug thanks for raising this!

I was slightly surprised by the sudden interest in these notes but realised OpenLibrary had seen them and linked to the repository. They were written quite a while ago without really expecting anyone to look at them, so I imagine quite a few things have changed now - but it looks like you've done a good job of tackling that problem!

I've got some time booked in to look at this repository properly and make sure there are some decent scripts, so thanks for your notes on this.

thedug commented on August 26, 2024

I ended up dropping the column and will re-add it.

I also ran into an issue with null chars. I used this to remove them.

tr < ol_dump_editions_2021-11-30_processed.csv -d '\000' > ol_dump_editions_2021-11-30_processed_nonulls.csv
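For anyone without `tr` to hand, a rough Python equivalent might look like the sketch below. It is only an illustration: the function name `strip_nul_bytes` and the chunk size are assumptions, not something from the repository. It streams the file in binary chunks so a very large dump never has to fit in memory.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping every NUL (0x00) byte.

    Hypothetical helper; returns the number of NUL bytes removed.
    """
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # NUL is a single byte, so removing it per chunk is safe
            # even when a chunk boundary falls mid-record.
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

Calling `strip_nul_bytes("ol_dump_editions_2021-11-30_processed.csv", "ol_dump_editions_2021-11-30_processed_nonulls.csv")` would mirror the command above.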

DaveBathnes commented on August 26, 2024

Just a quick update on this. I'm almost there with a significant refactor which will properly script the database creation (all currently in this branch https://github.com/LibrariesHacked/openlibrary-search/tree/1-error-loading-editions). I've been testing today but due to the sheer size of data it's been going all day. The first attempt failed when disk space ran out!

@thedug On your original question: you were right, of course, that work_key was causing the error, so dropping it and then recreating it once the copy import is done would have fixed it. I think my notes must have omitted the fact that I start off with a table containing only the columns needed for the copy command to work, then add the column and populate it. The editions table just needs work_key added to link it with the works table. The authorship table then links the authors and works tables.
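The load-then-link order described above can be sketched roughly as follows. This uses Python's built-in sqlite3 module as a stand-in rather than the PostgreSQL copy command, and the two-column table, the `load_then_link` name and the JSON shape (`{"works": [{"key": ...}]}`) are all assumptions made for illustration, not the repository's actual scripts.

```python
import json
import sqlite3


def load_then_link(rows):
    """Load (key, json_text) rows into a minimal table, then add work_key.

    Hypothetical helper illustrating the order of operations only.
    """
    conn = sqlite3.connect(":memory:")
    # Step 1: the table holds only the columns present in the dump,
    # so the bulk load has no extra column to trip over.
    conn.execute("CREATE TABLE editions (key TEXT PRIMARY KEY, data TEXT)")
    conn.executemany("INSERT INTO editions (key, data) VALUES (?, ?)", rows)

    # Step 2: add the linking column only after the load has finished.
    conn.execute("ALTER TABLE editions ADD COLUMN work_key TEXT")

    # Step 3: populate it from the JSON payload (assumed layout).
    updates = []
    for key, data in conn.execute("SELECT key, data FROM editions").fetchall():
        works = json.loads(data).get("works") or []
        work_key = works[0].get("key") if works else None
        updates.append((work_key, key))
    conn.executemany("UPDATE editions SET work_key = ? WHERE key = ?", updates)
    conn.commit()
    return conn
```

Editions with no works entry simply end up with a NULL work_key in this sketch.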

@gennaios I think making it database agnostic would definitely be a good enhancement. Once I have the database scripts, I'd like to refactor them to allow for multiple database engines. There are plenty of complexities to that: indexing the JSON column, for example, uses a PostgreSQL-specific command, and the copy commands are by far the quickest way of getting the data into a PostgreSQL DB. Something more general would work across DBs, though.

Thanks for the feedback and apologies for very late replies!

gennaios commented on August 26, 2024

By chance, I also came across this recently and am interested. I'll be importing into SQLite. I'm not sure what the ideal approach would be, but perhaps the dumps could be reformatted so that they can be imported into any database? If that's possible and you'd consider it, that'd be great.

Xaneets commented on August 26, 2024

@DaveBathnes

Just a quick update on this. I'm almost there with a significant refactor which will properly script the database creation (all currently in this branch https://github.com/LibrariesHacked/openlibrary-search/tree/1-error-loading-editions). I've been testing today but due to the sheer size of data it's been going all day. The first attempt failed when disk space ran out!

To speed up data cleaning, you can use Fast-Open-Library, which is written in Rust. It cleans the data more than 6 times faster.
