Giter Site home page Giter Site logo

(Python) database tools / chapter? about guide HOT 13 OPEN

egpbos avatar egpbos commented on September 22, 2024
(Python) database tools / chapter?

from guide.

Comments (13)

egpbos avatar egpbos commented on September 22, 2024 1

@LourensVeen so what you are saying is that sqlite3 actually is a useful one to keep there?

@suvayu ok, this is an interesting direction. It's sounding more like a chapter on databases, though. Perhaps actually something like that is better than listing random libraries that nobody may ever use? I also remember from my brief time doing MySQL + SQLAlchemy that we ended up spending a lot of time just doing raw SQL and indeed interfacing directly with the MySQL CLI tool or e.g. through good old phpMyAdmin.

I would say then that the database interfaces list can simply be removed from this chapter. If people know the general way of working with databases and they need either an ORM or some direct interface between the DB they chose and the language they use, then Google will deliver it to them. Ok, maybe it makes sense to list popular ORMs in such a chapter, among which would be SQLAlchemy, but that would be the extent of the Python-/language-specific needs then.

Agree? Disagree? Things I missed?

from guide.

f-hafner avatar f-hafner commented on September 22, 2024 1

I think I tried to replace the entire sqlite db with DuckDB in the past, but then abandoned the idea because I could not persist the row indices, but perhaps I should have tried a little more.

so yes, @suvayu , let's have a look together. I can't share the data publicly, but can give you access or prepare an extract that I can share.

from guide.

egpbos avatar egpbos commented on September 22, 2024

Btw, for the historians among you, it seems like the original contributor of this list is lost in the mists of time ;) https://github.com/NLeSC/guide/blame/52cf8b2fe97f571cff745f3d8a60d915deb5b4dd/software/language_specific_info.md

from guide.

HannoSpreeuw avatar HannoSpreeuw commented on September 22, 2024

Sorry, I do have some experience with SQLAlchemy from PADRE, but that was mainly about getting it to work after a Python 2 --> 3 conversion. What you need is an opinion about whether SA is the preferred Object Relational Mapper. I truly don't know.

from guide.

egpbos avatar egpbos commented on September 22, 2024

How about @suvayu?

I think @LourensVeen has the theoretical background on ORMs, but is not familiar with SA, or at least that's the last I heard... Perhaps still enough to make a sensible recommendation on this?

from guide.

LourensVeen avatar LourensVeen commented on September 22, 2024

I have extensive but outdated experience with PostgreSQL, and I'm using sqlite3 from Python directly through its SQL-based API. I could probably do some research and write a reasonable recommendation, but I'm also swamped again, so at the earliest in 2024...

from guide.

suvayu avatar suvayu commented on September 22, 2024

I'm guessing this is the section you are referring to? My experience with both databases and SQLAlchemy is only at the level "I know enough to debug, and spot obvious mistakes".

That said, one (somewhat) low effort approach could be someone moderately experienced go through related awesome lists and select a few recommendations.

Another point would be, not to stop at just interfaces, but mention other related tools like db documentation, query optimisers/explainers, database modelling hints, and convenience utilities like DBCli.

I'll be back next week. I can help as long as someone else also joins me.

from guide.

egpbos avatar egpbos commented on September 22, 2024

@wrvhage What does the Data SIG think, is this something that makes sense?

Note btw that new Guide contribution guidelines will soon be online, so please don't write a database chapter yet ;)

from guide.

LourensVeen avatar LourensVeen commented on September 22, 2024

sqlite is excellent, and definitely recommended if you have a use case that fits it. For lots of data or complex queries or if you need a separate server, then PostgreSQL is better, but it does have a separate server so it makes your application context more complex. And then of course there are column stores and NoSQL databases that may or may not be a better match than SQL. SQL often gets overlooked these days, which is a shame because there are definitely still lots of use cases where it's just the right solution.

from guide.

suvayu avatar suvayu commented on September 22, 2024

My impression is for research use cases, these days duckdb is probably a good choice, since the data is usually static and it's more about analytics.

from guide.

f-hafner avatar f-hafner commented on September 22, 2024

I came across this thread by chance, and thought I added my two cents from one of my previous research projects. There, we've been using sqlite extensively and over time built a database of almost 1TB with many tables and relatively complex queries.

What we have been struggling is aggregation queries -- they are quite slow in sqlite for this size of db. I recently experimented with DuckDB, and I could connect to the sqlite existing database and perform some aggregation queries 10x faster than through sqlite.

In short, I think combining DuckDB and sqlite could be promising in many use-cases. I guess @suvayu is right that the more static the data, the more useful is DuckDB. But if you are building a db from scratch or adding a lot of tables (which at least in the social sciences is not uncommon), I think sqlite is better because it can persist row indices which I think DuckDB cannot. For analytics, it seems one can then still use DuckDB, but perhaps one could explore the combination of the two a bit more.

from guide.

suvayu avatar suvayu commented on September 22, 2024

from guide.

f-hafner avatar f-hafner commented on September 22, 2024

@suvayu and I will work on the following

Issues to address for db comparisons duckdb vs sqlite

  • when resources are constrained, it should not fail -> set memory limit; spill to temp directory
  • how to control resource use (cores, threads), with an eye on personal laptop vs server
  • compare speed on large queries (aggregation) between sqlite and duckdb -- is there one area where we prefer sqlite??
  • wider use cases, like querying files (json, parquet, csv, excel)
  • lazy evaluation? in sqlite vs duckdb?

from guide.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.