Comments (2)
I definitely like the idea, and running a quick query to get the row count of every table should be pretty straightforward.
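Here is a minimal sketch of the row-count query. It assumes a SQLite install and Python's standard `sqlite3` module, and `table_row_counts` is a hypothetical helper, not part of the retriever API. For MySQL or Postgres the same loop would list tables from `information_schema.tables` instead of `sqlite_master`.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in tables:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE species (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO species VALUES (?, ?)", [(1, "a"), (2, "b")])
    print(table_row_counts(conn))
```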
Ben and I have also talked about doing something more comprehensive: dump each table to a file and compute a checksum for each file, to confirm that it contains exactly the same data as a version we believe to be a valid import (based on having looked at the files after import). This is easy and efficient in both MySQL (SELECT ... INTO OUTFILE) and Postgres (COPY), and it looks like it may work for SQLite as well (.mode csv).
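A sketch of the dump-and-checksum idea follows, again assuming SQLite and the Python standard library (`table_checksum` is hypothetical). One detail matters if checksums are to be comparable: SQL does not guarantee row order without ORDER BY, so the dump below sorts by every column before hashing. The same applies to a MySQL `INTO OUTFILE` or Postgres `COPY (SELECT ... ORDER BY ...) TO` dump.

```python
import csv
import hashlib
import io
import sqlite3


def table_checksum(conn: sqlite3.Connection, table: str) -> str:
    """SHA-256 of a table dumped to CSV with rows in a deterministic order."""
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
    # Sort by every column position so the dump does not depend on insertion order.
    order_by = ", ".join(str(i) for i in range(1, len(columns) + 1))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)  # include the header so renamed columns also change the hash
    for row in conn.execute(f"SELECT * FROM {quoted} ORDER BY {order_by}"):
        writer.writerow(row)
    return hashlib.sha256(buf.getvalue().encode("utf-8")).hexdigest()
```

Two imports containing the same rows in a different order then produce the same checksum, while any changed value produces a different one.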
The one issue here is that if the source data gets updated, the import will be flagged as different. I think we can handle that with a combination of a clear description to the user (the data isn't necessarily wrong, just different, so you should probably take a look) and a cron job on a server that periodically checks the databases for changes and pings us when something changes, so that we can update the checksums.
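The comparison step could look roughly like the sketch below. It assumes reference checksums are stored as a {table: checksum} mapping; that storage format and `compare_checksums` itself are hypothetical. Mismatches are worded as "different" rather than "wrong", matching the user-facing message described above, and a periodic job could call the same function and notify maintainers whenever it returns anything.

```python
def compare_checksums(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return human-readable notes for every table whose checksum differs.

    A mismatch is reported as "different", not "wrong": the upstream data
    may simply have been updated since the reference checksums were recorded.
    """
    notes = []
    for table in sorted(expected.keys() | actual.keys()):
        if table not in actual:
            notes.append(f"{table}: expected but not found in this install")
        elif table not in expected:
            notes.append(f"{table}: no reference checksum recorded")
        elif expected[table] != actual[table]:
            notes.append(
                f"{table}: differs from the reference import; this is not "
                "necessarily an error (the source data may have changed), "
                "but it is worth taking a look"
            )
    return notes
```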
On the other hand, this might get unnecessarily complicated... Thoughts?
from retriever.
Closing since this clearly hasn't risen to the top of the list. We do have some plans for implementing checksums as mentioned above.
Related Issues (20)
- API research for API integration in Data Retriever (GSoC '21)
- Add a default bounding box for usgs-elevation
- Retriever doesn't detect new python scripts
- Add RDatasets
- Tidycensus dataset doesn't work with the download and install csv commands.
- Make sure that the the R api dataset are run on the retrieverdash
- Add new functions to rdataretriever and Retriever.jl
- Excel xlsx file; not supported
- Update codecov to action stage in workflows
- not able to use gdal==3.3.2 while working with ".shp" files
- Improve test coverage
- display_all_rdatasets_names in rdatasets takes a list of package_name
- Create breeding bird survey for all releases.
- Downloading fails for files with no Content-Disposition
- Retriever should gracefully fail if there is no internet.
- hacktoberfest guide
- Installation from source fails due to missing configuration
- Installation failing on Python 3.12 due to removal of imp package
- Test and update Bioclim data
- GSoC 2024 - Getting started.