Comments (13)
It's not a bug though I agree it is confusing. The reason it's not a bug is that get_db_alias_for_partitioned_doc
is only intended for use when we are doing direct queries with partitioned models while db_for_read_write
will direct partitioned model queries to the proxy DB for routing.
Note that if USE_PARTITIONED_DATABASE == False
then db_for_read_write
will also return default
(though the logic there is rather suspect).
I think the only reason to call db_for_read_write
from get_db_alias_for_partitioned_doc
would be if we expect the partitioned tables to be in a database other than default
(which we don't).
Happy to have this logic clean up though if you can see a good way to do that.
from commcare-hq.
get_db_alias_for_partitioned_doc
is only intended for use when we are doing direct queries with partitioned models whiledb_for_read_write
will direct partitioned model queries to the proxy DB for routing.
This seems like it could become the source of hard-to-find bugs since the db router (MultiDBRouter
) uses db_for_read_write
for some queries, and in other places we use get_db_alias_for_partitioned_doc
& friends. Knowing which is used in which places and how that impacts model loading and saving has been hard for me to understand.
I suppose we're not using PartitionedModel
or get_db_alias_for_partitioned_doc
, etc. on warehouse or synclog models? Those would return default
when db_for_read_write
would return some other db name unrelated to partitioning. Does that sound right to you?
I think the only reason to call
db_for_read_write
fromget_db_alias_for_partitioned_doc
would be if we expect the partitioned tables to be in a database other thandefault
(which we don't).
This is what I'm working on. I'm looking into to using a separate database connection for saving blob metadata so it is not lost if an exception is thrown resulting in transaction rollback on the default
connection.
I think on partitioned dbs transaction rollback does not happen since those connections are in autocommit mode (maybe that's only the proxy?), but when running in non-sharded environments it can happen. Maybe I should just forget this idea since it seems to be adding a fair bit of complexity, and we probably don't care about lost blobs on non-sharded environments since they are usually small scale. Edit: I really want to know if we're accumulating orphaned blobs. I'm not sure how else to do that.
from commcare-hq.
I suppose we're not using
PartitionedModel
orget_db_alias_for_partitioned_doc
, etc. on warehouse or synclog models? Those would returndefault
whendb_for_read_write
would return some other db name unrelated to partitioning. Does that sound right to you?
Yes. If you called get_db_alias_for_partitioned_doc
with the ID of a synclog
you would certainly get the wrong database.
I'm a little confused about what you were saying about using separate connections for the blob metadata. Are you saying you want to store the in a new database? Or you just want an isolated connection pool?
I thought they would be stored in the shard databases.
from commcare-hq.
I just want an isolated connection pool. They will be stored in the shard databases.
from commcare-hq.
The way that I've found to add an isolated connection pool is to add a blobdb
database to settings.DATABASES
with the same connection info as the default
database in non-partitioned environments. In partitioned environments it just uses the shard db connections (although I still need to research if those are in AUTOCOMMIT mode or if they use per-request transactions).
from commcare-hq.
Pretty sure none of the connections are in autocommit mode.
I agree that I don't think it's worth the effort to go down this route. Why do you think it's necessary in the first place? Is it because you don't want to end up with data in riak that's not represented in SQL?
from commcare-hq.
Is it because you don't want to end up with data in riak that's not represented in SQL?
Yes. Anywhere a blobdb.put(...)
operation occurs inside a transaction.atomic()
block where the transaction gets rolled back will result in an orphaned blob in the blob db.
from commcare-hq.
Pretty sure none of the connections are in autocommit mode.
Interesting. Django's transaction documentation says "Django’s default behavior is to run in autocommit mode." As far as I can tell we are not setting AUTOCOMMIT
or ATOMIC_REQUESTS
connection parameters anywhere in our codebase, so I would assume we're using the default. Are we wrapping each request in a transaction somewhere?
from commcare-hq.
from commcare-hq.
What if you did something like what we during form submission where you create the metadata first and then delete it if there is an error (though in form processing we do the reverse - we delete if there was no error)
I think the critical difference there is SubmissionProcessTracker
deletes if there was no error, where I want to do the opposite, but I don't have control of the transaction context inside blobdb.put(...)
. In other words, I want to keep metadata even if there is an error, but it's automatically deleted on error (transaction rollback), and I can't control that.
Maybe you meant to suggest that we delete the blob if there is an error? This would be great, but sounds hard. I'd need to somehow detect when a transaction rollback occurs involving blob metadata. Django has an on_commit()
hook, but not an on_rollback()
hook.
Maybe we should take this discussion to voice? I think for now I'm just going to punt and forget about it. Hopefully we don't orphan many blobs.
from commcare-hq.
from commcare-hq.
@snopoke do you know if xform.unsaved_attachments
are always saved eventually?
commcare-hq/corehq/form_processor/backends/sql/processor.py
Lines 43 to 52 in 118d88d
This looks like a place where blob content is written to the blob db and then later could be orphaned if unsaved_attachments
are not persisted to the database (for example, if an error occurs during form processing).
from commcare-hq.
Depending on where the error happened that's definitely a possibility.
from commcare-hq.
Related Issues (20)
- [CEP] runtime configurable settings HOT 4
- [CEP] "Single-use" data links HOT 11
- API Endpoints - Add Lookup Table to App Structure API
- [CEP] Cache formplayer's validate_form's result on HQ to improve App Preview Refreshes HOT 5
- [CEP] Table partitioning for remote API logs HOT 1
- [CEP] Allow filtering by privileges on Exchange HOT 4
- [CEP] Allow deprecating unused case properties HOT 1
- [CEP] Generic HQ auditing model HOT 8
- [CEP] expanded parent xpath filtering for case search HOT 1
- [CEP] The use of type hints HOT 33
- [CEP] CommCare Query Language for Case Search HOT 12
- [CEP] date_add function for case search HOT 9
- [CEP] Case Search Query support for creating calendar based dates HOT 4
- [CEP] Custom repeater URLs with UCR expressions HOT 2
- [CEP] System Forms Metadata Standardization HOT 6
- HELPME: Refactor problematic methods
- [CEP] Supporting domains without privilege while GAing feature-flags to paid privileges HOT 26
- [CEP] Do not preindex_everything on every deploy HOT 7
- Error building commcarehq_incl to run HQ in Docker
- Blobdb error bug after restore on Ubuntu 22.04
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from commcare-hq.