
Comments (13)

snopoke commented on June 26, 2024

It's not a bug though I agree it is confusing. The reason it's not a bug is that get_db_alias_for_partitioned_doc is only intended for use when we are doing direct queries with partitioned models while db_for_read_write will direct partitioned model queries to the proxy DB for routing.

Note that if USE_PARTITIONED_DATABASE == False then db_for_read_write will also return default (though the logic there is rather suspect).

I think the only reason to call db_for_read_write from get_db_alias_for_partitioned_doc would be if we expect the partitioned tables to be in a database other than default (which we don't).

Happy to have this logic cleaned up though if you can see a good way to do that.
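The split described above might be sketched roughly like this (a simplified illustration, not the real commcare-hq code; the alias names, shard list, and hash function are stand-ins):

```python
# Sketch of the two lookup paths discussed above. Alias names and the
# hash are illustrative, not the actual commcare-hq implementation.

PROXY_DB = 'proxy'
SHARD_DBS = ['p0', 'p1', 'p2', 'p3']
USE_PARTITIONED_DATABASE = True


def get_db_alias_for_partitioned_doc(doc_id):
    """Direct-query path: hash the doc id straight to a shard alias."""
    if not USE_PARTITIONED_DATABASE:
        return 'default'
    return SHARD_DBS[hash(doc_id) % len(SHARD_DBS)]


def db_for_read_write(model_is_partitioned):
    """Router path: send partitioned-model queries to the proxy DB,
    which performs the shard routing itself."""
    if not USE_PARTITIONED_DATABASE:
        return 'default'
    return PROXY_DB if model_is_partitioned else 'default'
```

This makes the confusion concrete: for the same partitioned document, the direct path returns a shard alias while the router path returns the proxy, and both return `default` when partitioning is off.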

from commcare-hq.

millerdev commented on June 26, 2024

get_db_alias_for_partitioned_doc is only intended for use when we are doing direct queries with partitioned models while db_for_read_write will direct partitioned model queries to the proxy DB for routing.

This seems like it could become the source of hard-to-find bugs since the db router (MultiDBRouter) uses db_for_read_write for some queries, and in other places we use get_db_alias_for_partitioned_doc & friends. Knowing which is used in which places and how that impacts model loading and saving has been hard for me to understand.

I suppose we're not using PartitionedModel or get_db_alias_for_partitioned_doc, etc. on warehouse or synclog models? Those would return default when db_for_read_write would return some other db name unrelated to partitioning. Does that sound right to you?

I think the only reason to call db_for_read_write from get_db_alias_for_partitioned_doc would be if we expect the partitioned tables to be in a database other than default (which we don't).

This is what I'm working on. I'm looking into using a separate database connection for saving blob metadata so it is not lost if an exception is thrown resulting in transaction rollback on the default connection.

I think on partitioned dbs transaction rollback does not happen since those connections are in autocommit mode (maybe that's only the proxy?), but when running in non-sharded environments it can happen. Maybe I should just forget this idea since it seems to be adding a fair bit of complexity, and we probably don't care about lost blobs on non-sharded environments since they are usually small scale. Edit: I really want to know if we're accumulating orphaned blobs. I'm not sure how else to do that.


snopoke commented on June 26, 2024

I suppose we're not using PartitionedModel or get_db_alias_for_partitioned_doc, etc. on warehouse or synclog models? Those would return default when db_for_read_write would return some other db name unrelated to partitioning. Does that sound right to you?

Yes. If you called get_db_alias_for_partitioned_doc with the ID of a synclog you would certainly get the wrong database.

I'm a little confused about what you were saying about using separate connections for the blob metadata. Are you saying you want to store the metadata in a new database? Or do you just want an isolated connection pool?

I thought they would be stored in the shard databases.


millerdev commented on June 26, 2024

I just want an isolated connection pool. They will be stored in the shard databases.


millerdev commented on June 26, 2024

The way that I've found to add an isolated connection pool is to add a blobdb database to settings.DATABASES with the same connection info as the default database in non-partitioned environments. In partitioned environments it just uses the shard db connections (although I still need to research if those are in AUTOCOMMIT mode or if they use per-request transactions).
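The settings change described above might look something like this (a sketch only; the `blobdb` alias name comes from the comment, while the engine and connection values are illustrative placeholders):

```python
# Sketch: give blob metadata its own connection pool by registering a
# second alias pointing at the same database as 'default'. The specific
# settings values below are illustrative.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'commcarehq',
        'HOST': 'localhost',
        'PORT': '5432',
    },
}

# Same connection info under a separate alias: Django opens a separate
# connection for it, so a rollback on 'default' does not affect writes
# made through 'blobdb'.
DATABASES['blobdb'] = dict(DATABASES['default'])
```

Queries would then be directed to the isolated pool with `Model.objects.using('blobdb')` or `obj.save(using='blobdb')`.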


snopoke commented on June 26, 2024

Pretty sure none of the connections are in autocommit mode.

I agree that I don't think it's worth the effort to go down this route. Why do you think it's necessary in the first place? Is it because you don't want to end up with data in riak that's not represented in SQL?


millerdev commented on June 26, 2024

Is it because you don't want to end up with data in riak that's not represented in SQL?

Yes. Anywhere a blobdb.put(...) operation occurs inside a transaction.atomic() block where the transaction gets rolled back will result in an orphaned blob in the blob db.
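A toy illustration of that failure mode (pure Python; the dicts stand in for the blob store and the SQL metadata table, and the snapshot/restore stands in for transaction rollback):

```python
# The external blob store is not transactional, so a rollback on the
# SQL side leaves the blob behind with no metadata row pointing at it.

blob_store = {}      # stands in for riak / the blob db
metadata_rows = {}   # stands in for the SQL blob metadata table


def save_attachment(key, content):
    blob_store[key] = content          # blobdb.put(...) succeeds immediately
    metadata_rows[key] = len(content)  # written inside transaction.atomic()
    raise RuntimeError('form processing failed')  # triggers rollback


try:
    snapshot = dict(metadata_rows)     # emulate transaction start
    save_attachment('blob-1', b'payload')
except RuntimeError:
    metadata_rows = snapshot           # rollback discards the metadata row

# The blob exists but nothing references it.
orphaned = set(blob_store) - set(metadata_rows)
```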


millerdev commented on June 26, 2024

Pretty sure none of the connections are in autocommit mode.

Interesting. Django's transaction documentation says "Django’s default behavior is to run in autocommit mode." As far as I can tell we are not setting AUTOCOMMIT or ATOMIC_REQUESTS connection parameters anywhere in our codebase, so I would assume we're using the default. Are we wrapping each request in a transaction somewhere?
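For reference, Django's documented per-database defaults are `AUTOCOMMIT: True` and `ATOMIC_REQUESTS: False`, so a settings entry that sets neither key runs in autocommit mode. A small sketch of that resolution (the settings entry is illustrative):

```python
# Django's documented defaults: if a DATABASES entry sets neither
# AUTOCOMMIT nor ATOMIC_REQUESTS, the connection runs in autocommit
# mode and requests are not wrapped in transactions.
def effective_transaction_config(db_settings):
    return {
        'AUTOCOMMIT': db_settings.get('AUTOCOMMIT', True),
        'ATOMIC_REQUESTS': db_settings.get('ATOMIC_REQUESTS', False),
    }

# An entry like the one described in the comment, with neither key set:
cfg = effective_transaction_config({'ENGINE': 'django.db.backends.postgresql'})
```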


snopoke commented on June 26, 2024


millerdev commented on June 26, 2024

What if you did something like what we do during form submission where you create the metadata first and then delete it if there is an error (though in form processing we do the reverse - we delete if there was no error)

I think the critical difference there is SubmissionProcessTracker deletes if there was no error, where I want to do the opposite, but I don't have control of the transaction context inside blobdb.put(...). In other words, I want to keep metadata even if there is an error, but it's automatically deleted on error (transaction rollback), and I can't control that.

Maybe you meant to suggest that we delete the blob if there is an error? This would be great, but sounds hard. I'd need to somehow detect when a transaction rollback occurs involving blob metadata. Django has an on_commit() hook, but not an on_rollback() hook.

Maybe we should take this discussion to voice? I think for now I'm just going to punt and forget about it. Hopefully we don't orphan many blobs.
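One workaround in this direction (purely a sketch, not commcare-hq code): since `on_commit` callbacks run only when the transaction commits and are discarded on rollback, a blob key marked "pending" before the put and cleared in an `on_commit` callback would stay pending after a rollback, flagging it as an orphan candidate. Emulated here without Django, using a minimal stand-in for the transaction machinery:

```python
# Toy emulation of the on_commit idea: callbacks registered during a
# transaction run only on commit, so pending markers survive rollback
# and identify orphan candidates.

pending_blobs = set()


class FakeTransaction:
    """Minimal stand-in for transaction.atomic() plus on_commit()."""
    def __init__(self):
        self.callbacks = []

    def on_commit(self, fn):
        self.callbacks.append(fn)

    def commit(self):
        for fn in self.callbacks:
            fn()

    def rollback(self):
        self.callbacks.clear()  # Django discards on_commit hooks on rollback


def put_blob(txn, key):
    pending_blobs.add(key)                              # mark before the put
    txn.on_commit(lambda: pending_blobs.discard(key))   # clear only on commit


ok = FakeTransaction()
put_blob(ok, 'blob-saved')
ok.commit()

bad = FakeTransaction()
put_blob(bad, 'blob-orphaned')
bad.rollback()
```

Anything still in `pending_blobs` after the transaction finishes corresponds to a blob whose metadata was rolled back, which could feed a cleanup or at least a metric for how many blobs are being orphaned.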


snopoke commented on June 26, 2024


millerdev commented on June 26, 2024

@snopoke do you know if xform.unsaved_attachments are always saved eventually?

```python
xform_attachment.write_content(attachment.content)
if xform_attachment.is_image:
    try:
        img_size = Image.open(attachment.content_as_file()).size
        xform_attachment.properties = dict(width=img_size[0], height=img_size[1])
    except IOError:
        xform_attachment.content_type = 'application/octet-stream'
xform_attachments.append(xform_attachment)
xform.unsaved_attachments = xform_attachments
```

This looks like a place where blob content is written to the blob db and then later could be orphaned if unsaved_attachments are not persisted to the database (for example, if an error occurs during form processing).


snopoke commented on June 26, 2024

Depending on where the error happened that's definitely a possibility.

