
Comments (13)

snopoke commented on June 26, 2024

It's not a bug though I agree it is confusing. The reason it's not a bug is that get_db_alias_for_partitioned_doc is only intended for use when we are doing direct queries with partitioned models while db_for_read_write will direct partitioned model queries to the proxy DB for routing.

Note that if USE_PARTITIONED_DATABASE == False then db_for_read_write will also return default (though the logic there is rather suspect).

I think the only reason to call db_for_read_write from get_db_alias_for_partitioned_doc would be if we expect the partitioned tables to be in a database other than default (which we don't).

Happy to have this logic cleaned up though if you can see a good way to do that.
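The split described above might be sketched roughly like this (a simplified illustration, not the real commcare-hq code; the alias names, shard list, and hash function are stand-ins):

```python
# Sketch of the two lookup paths discussed above. Alias names and the
# hash are illustrative, not the actual commcare-hq implementation.

PROXY_DB = 'proxy'
SHARD_DBS = ['p0', 'p1', 'p2', 'p3']
USE_PARTITIONED_DATABASE = True


def get_db_alias_for_partitioned_doc(doc_id):
    """Direct-query path: hash the doc id straight to a shard alias."""
    if not USE_PARTITIONED_DATABASE:
        return 'default'
    return SHARD_DBS[hash(doc_id) % len(SHARD_DBS)]


def db_for_read_write(model_is_partitioned):
    """Router path: send partitioned-model queries to the proxy DB,
    which performs the shard routing itself."""
    if not USE_PARTITIONED_DATABASE:
        return 'default'
    return PROXY_DB if model_is_partitioned else 'default'
```

This makes the confusion concrete: for the same partitioned document, the direct path returns a shard alias while the router path returns the proxy, and both return `default` when partitioning is off.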

from commcare-hq.

millerdev commented on June 26, 2024

get_db_alias_for_partitioned_doc is only intended for use when we are doing direct queries with partitioned models while db_for_read_write will direct partitioned model queries to the proxy DB for routing.

This seems like it could become the source of hard-to-find bugs since the db router (MultiDBRouter) uses db_for_read_write for some queries, and in other places we use get_db_alias_for_partitioned_doc & friends. Knowing which is used in which places and how that impacts model loading and saving has been hard for me to understand.

I suppose we're not using PartitionedModel or get_db_alias_for_partitioned_doc, etc. on warehouse or synclog models? Those would return default when db_for_read_write would return some other db name unrelated to partitioning. Does that sound right to you?

I think the only reason to call db_for_read_write from get_db_alias_for_partitioned_doc would be if we expect the partitioned tables to be in a database other than default (which we don't).

This is what I'm working on. I'm looking into using a separate database connection for saving blob metadata so it is not lost if an exception is thrown resulting in transaction rollback on the default connection.

I think on partitioned dbs transaction rollback does not happen since those connections are in autocommit mode (maybe that's only the proxy?), but when running in non-sharded environments it can happen. Maybe I should just forget this idea since it seems to be adding a fair bit of complexity, and we probably don't care about lost blobs on non-sharded environments since they are usually small scale. Edit: I really want to know if we're accumulating orphaned blobs. I'm not sure how else to do that.


snopoke commented on June 26, 2024

I suppose we're not using PartitionedModel or get_db_alias_for_partitioned_doc, etc. on warehouse or synclog models? Those would return default when db_for_read_write would return some other db name unrelated to partitioning. Does that sound right to you?

Yes. If you called get_db_alias_for_partitioned_doc with the ID of a synclog you would certainly get the wrong database.

I'm a little confused about what you were saying about using separate connections for the blob metadata. Are you saying you want to store the metadata in a new database? Or do you just want an isolated connection pool?

I thought they would be stored in the shard databases.


millerdev commented on June 26, 2024

I just want an isolated connection pool. They will be stored in the shard databases.


millerdev commented on June 26, 2024

The way that I've found to add an isolated connection pool is to add a blobdb database to settings.DATABASES with the same connection info as the default database in non-partitioned environments. In partitioned environments it just uses the shard db connections (although I still need to research if those are in AUTOCOMMIT mode or if they use per-request transactions).
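The settings change described above might look something like this (a sketch only; the `blobdb` alias name comes from the comment, while the engine and connection values are illustrative placeholders):

```python
# Sketch: give blob metadata its own connection pool by registering a
# second alias pointing at the same database as 'default'. The specific
# settings values below are illustrative.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'commcarehq',
        'HOST': 'localhost',
        'PORT': '5432',
    },
}

# Same connection info under a separate alias: Django opens a separate
# connection for it, so a rollback on 'default' does not affect writes
# made through 'blobdb'.
DATABASES['blobdb'] = dict(DATABASES['default'])
```

Queries would then be directed to the isolated pool with `Model.objects.using('blobdb')` or `obj.save(using='blobdb')`.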


snopoke commented on June 26, 2024

Pretty sure none of the connections are in autocommit mode.

I agree that I don't think it's worth the effort to go down this route. Why do you think it's necessary in the first place? Is it because you don't want to end up with data in riak that's not represented in SQL?


millerdev commented on June 26, 2024

Is it because you don't want to end up with data in riak that's not represented in SQL?

Yes. Anywhere a blobdb.put(...) operation occurs inside a transaction.atomic() block where the transaction gets rolled back will result in an orphaned blob in the blob db.
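A toy illustration of that failure mode (pure Python; the dicts stand in for the blob store and the SQL metadata table, and the snapshot/restore stands in for transaction rollback):

```python
# The external blob store is not transactional, so a rollback on the
# SQL side leaves the blob behind with no metadata row pointing at it.

blob_store = {}      # stands in for riak / the blob db
metadata_rows = {}   # stands in for the SQL blob metadata table


def save_attachment(key, content):
    blob_store[key] = content          # blobdb.put(...) succeeds immediately
    metadata_rows[key] = len(content)  # written inside transaction.atomic()
    raise RuntimeError('form processing failed')  # triggers rollback


try:
    snapshot = dict(metadata_rows)     # emulate transaction start
    save_attachment('blob-1', b'payload')
except RuntimeError:
    metadata_rows = snapshot           # rollback discards the metadata row

# The blob exists but nothing references it.
orphaned = set(blob_store) - set(metadata_rows)
```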


millerdev commented on June 26, 2024

Pretty sure none of the connections are in autocommit mode.

Interesting. Django's transaction documentation says "Django’s default behavior is to run in autocommit mode." As far as I can tell we are not setting AUTOCOMMIT or ATOMIC_REQUESTS connection parameters anywhere in our codebase, so I would assume we're using the default. Are we wrapping each request in a transaction somewhere?
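For reference, Django's documented per-database defaults are `AUTOCOMMIT: True` and `ATOMIC_REQUESTS: False`, so a settings entry that sets neither key runs in autocommit mode. A small sketch of that resolution (the settings entry is illustrative):

```python
# Django's documented defaults: if a DATABASES entry sets neither
# AUTOCOMMIT nor ATOMIC_REQUESTS, the connection runs in autocommit
# mode and requests are not wrapped in transactions.
def effective_transaction_config(db_settings):
    return {
        'AUTOCOMMIT': db_settings.get('AUTOCOMMIT', True),
        'ATOMIC_REQUESTS': db_settings.get('ATOMIC_REQUESTS', False),
    }

# An entry like the one described in the comment, with neither key set:
cfg = effective_transaction_config({'ENGINE': 'django.db.backends.postgresql'})
```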


snopoke commented on June 26, 2024


millerdev commented on June 26, 2024

What if you did something like what we do during form submission where you create the metadata first and then delete it if there is an error (though in form processing we do the reverse - we delete if there was no error)

I think the critical difference there is SubmissionProcessTracker deletes if there was no error, where I want to do the opposite, but I don't have control of the transaction context inside blobdb.put(...). In other words, I want to keep metadata even if there is an error, but it's automatically deleted on error (transaction rollback), and I can't control that.

Maybe you meant to suggest that we delete the blob if there is an error? This would be great, but sounds hard. I'd need to somehow detect when a transaction rollback occurs involving blob metadata. Django has an on_commit() hook, but not an on_rollback() hook.

Maybe we should take this discussion to voice? I think for now I'm just going to punt and forget about it. Hopefully we don't orphan many blobs.
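One workaround in this direction (purely a sketch, not commcare-hq code): since `on_commit` callbacks run only when the transaction commits and are discarded on rollback, a blob key marked "pending" before the put and cleared in an `on_commit` callback would stay pending after a rollback, flagging it as an orphan candidate. Emulated here without Django, using a minimal stand-in for the transaction machinery:

```python
# Toy emulation of the on_commit idea: callbacks registered during a
# transaction run only on commit, so pending markers survive rollback
# and identify orphan candidates.

pending_blobs = set()


class FakeTransaction:
    """Minimal stand-in for transaction.atomic() plus on_commit()."""
    def __init__(self):
        self.callbacks = []

    def on_commit(self, fn):
        self.callbacks.append(fn)

    def commit(self):
        for fn in self.callbacks:
            fn()

    def rollback(self):
        self.callbacks.clear()  # Django discards on_commit hooks on rollback


def put_blob(txn, key):
    pending_blobs.add(key)                              # mark before the put
    txn.on_commit(lambda: pending_blobs.discard(key))   # clear only on commit


ok = FakeTransaction()
put_blob(ok, 'blob-saved')
ok.commit()

bad = FakeTransaction()
put_blob(bad, 'blob-orphaned')
bad.rollback()
```

Anything still in `pending_blobs` after the transaction finishes corresponds to a blob whose metadata was rolled back, which could feed a cleanup or at least a metric for how many blobs are being orphaned.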


snopoke commented on June 26, 2024


millerdev commented on June 26, 2024

@snopoke do you know if xform.unsaved_attachments are always saved eventually?

```python
xform_attachment.write_content(attachment.content)
if xform_attachment.is_image:
    try:
        img_size = Image.open(attachment.content_as_file()).size
        xform_attachment.properties = dict(width=img_size[0], height=img_size[1])
    except IOError:
        xform_attachment.content_type = 'application/octet-stream'
xform_attachments.append(xform_attachment)
xform.unsaved_attachments = xform_attachments
```

This looks like a place where blob content is written to the blob db and then later could be orphaned if unsaved_attachments are not persisted to the database (for example, if an error occurs during form processing).


snopoke commented on June 26, 2024

Depending on where the error happened that's definitely a possibility.

