Comments (11)

ses4j commented on September 1, 2024

@orf @Ehco1996 Django 2.2 is now released, and docs are no longer draft:

https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_update

from django-bulk-update.

orf commented on September 1, 2024

I'd really like to thank you for creating a reproduction repository @mikicz. It's incredibly helpful, and I wish everyone's reproductions were as detailed as this!

I've created a ticket on the Django bugtracker (https://code.djangoproject.com/ticket/31202) and assigned it to myself. I'm somewhat busy right now but I promise that I will spend some time and see if I can dig more into this. I'll update the ticket rather than this issue.

Thanks again for creating this test case.

Ehco1996 commented on September 1, 2024

It's great, but I found that the docs are still in dev, and the newest stable Django version is 2.1.2, so bulk_update is not available yet.

wahello commented on September 1, 2024

Thanks, this is great. It's helpful for versions before Django 2.2.

mikicz commented on September 1, 2024

It would also be really helpful to document the speed difference. In my experience (updating tens of thousands of rows), this package does the update way faster.

orf commented on September 1, 2024

Can you elaborate? It uses the same method, so the speed difference should be negligible. If you find Django is way slower then please open a ticket with some details!
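
As a rough illustration of what "the same method" means here: as I understand it, both Django's bulk_update and this package issue one UPDATE per batch, with a CASE/WHEN expression per column keyed on the primary key. The sketch below builds such a statement by hand and runs it against an in-memory SQLite database. The helper name build_case_update, the item table, and its columns are hypothetical, and this is not the code either library actually runs.

```python
import sqlite3


def build_case_update(table, pk_column, rows, columns):
    """Return (sql, params) for a single UPDATE that sets `columns` on all `rows`.

    `rows` is a list of dicts, each holding `pk_column` and every name in `columns`.
    Table and column names are interpolated directly, so they must be trusted.
    """
    params = []
    set_clauses = []
    for column in columns:
        whens = []
        for row in rows:
            # One WHEN branch per row: two bound parameters (pk and new value).
            whens.append(f"WHEN {pk_column} = ? THEN ?")
            params.extend([row[pk_column], row[column]])
        set_clauses.append(f"{column} = CASE {' '.join(whens)} ELSE {column} END")
    # Restrict the UPDATE to the rows being changed.
    placeholders = ", ".join("?" for _ in rows)
    params.extend(row[pk_column] for row in rows)
    sql = (
        f"UPDATE {table} SET {', '.join(set_clauses)} "
        f"WHERE {pk_column} IN ({placeholders})"
    )
    return sql, params


# Hypothetical table and data, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "b", 2), (3, "c", 3)],
)

rows = [{"id": 1, "name": "x", "qty": 10}, {"id": 3, "name": "z", "qty": 30}]
sql, params = build_case_update("item", "id", rows, ["name", "qty"])
conn.execute(sql, params)
result = conn.execute("SELECT id, name, qty FROM item ORDER BY id").fetchall()
print(result)
```

Note that the number of bound parameters grows with rows times columns, which is why batch size matters for both implementations.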

mikicz commented on September 1, 2024

I'm on Django 2.2.9. Might be doing something wrong, but can't see what.

# Django's built-in QuerySet.bulk_update
objects = list(Model.objects.all()[:10000])
Model.objects.bulk_update(
    objects,
    [],  # about 10 fields
    batch_size=1000,
)

# The django-bulk-update package
from bulk_update.helper import bulk_update

objects = list(Model.objects.all()[:10000])
bulk_update(
    objects,
    update_fields=[],  # about 10 fields
    batch_size=1000,
)

Below is profiling done with pyinstrument. Django's built-in solution takes 144 seconds on my PC. The package one takes 2.5 seconds.

inbuilt_pyinstrument.txt
package_pyinstrument.txt

mikicz commented on September 1, 2024

@orf Hi, did you manage to take a look at that? I could provide a more comprehensive example if necessary.

orf commented on September 1, 2024

Not yet, but those samples are invaluable. The two big differences between this package and the Django inbuilt one are:

  1. Django uses the Expressions API. This is where the overhead is coming from, but it's a pretty ridiculous amount. Perhaps you're hitting an edge case here.
  2. Django works around SQL parameter limitations in some databases (Oracle/SQLite).
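
To make the second point concrete, here is a minimal sketch of how a parameter limit can cap the batch size. It assumes a simplified cost model in which each row needs two bound parameters per updated column (primary key and new value) plus one for the WHERE clause. The limit of 999 is an assumption taken from the default in older SQLite builds. The function names are hypothetical, and Django's real calculation may differ.

```python
def max_rows_per_batch(num_fields, max_params=999):
    """Largest row count whose UPDATE stays within `max_params` bound parameters.

    Assumed cost model (illustrative only): 2 parameters per field per row,
    plus 1 parameter per row for the primary key in the WHERE clause.
    """
    params_per_row = 2 * num_fields + 1
    return max(1, max_params // params_per_row)


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# With about 10 fields, as in the report above, a requested batch_size of 1000
# would have to shrink considerably under this model.
rows_per_batch = max_rows_per_batch(10)
batches = list(batched(list(range(10000)), rows_per_batch))
print(rows_per_batch, len(batches), len(batches[-1]))
```

Under this model, a backend with such a limit ends up running many more, smaller queries than the requested batch size suggests.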

Knowing the types of the fields would be really useful, as well as the kinds of values those fields contain (lots of large strings or arrays?).

I would love it if you could post the timings here for 1 to 10 columns, with the same number of rows each time. It would be interesting to see how the times grow.
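
One way to collect those numbers is a small harness that times the same update with a growing list of columns. This is only a sketch: time_by_column_count, fake_update, and the column names are hypothetical, and the stand-in update function does no database work.

```python
import time


def time_by_column_count(update_fn, field_names, make_objects):
    """Time update_fn(objects, fields) for the first 1..N names in `field_names`.

    `make_objects` is called before every run so each run starts from the
    same set of objects. Returns a list of (column_count, seconds) tuples.
    """
    timings = []
    for n in range(1, len(field_names) + 1):
        fields = list(field_names[:n])
        objects = make_objects()
        start = time.perf_counter()
        update_fn(objects, fields)
        timings.append((n, time.perf_counter() - start))
    return timings


# Stand-in for a real bulk update; it only records what it was asked to do.
calls = []


def fake_update(objects, fields):
    calls.append((len(objects), tuple(fields)))


timings = time_by_column_count(
    fake_update,
    ["col_a", "col_b", "col_c"],  # hypothetical column names
    make_objects=lambda: list(range(1000)),
)
for n, seconds in timings:
    print(f"{n} column(s): {seconds:.6f}s")
```

In a real project, update_fn could wrap either call from the snippets earlier in this thread, for example `lambda objs, fields: Model.objects.bulk_update(objs, fields, batch_size=1000)` for Django or `lambda objs, fields: bulk_update(objs, update_fields=fields, batch_size=1000)` for the package, with make_objects loading a fixed slice of rows.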

mikicz commented on September 1, 2024

I created an example project. Please excuse the unimaginative naming of my classes and columns. The column types match the types of my model in the project where I encountered the problem. That project has a bunch of other columns which aren't updated. I didn't include them, but they obviously aren't what's causing the slow performance, as the results below show. https://github.com/mikicz/bulk-update-tests

The most important bits:
Models: https://github.com/mikicz/bulk-update-tests/blob/master/apps/something/models.py
Update code: https://github.com/mikicz/bulk-update-tests/blob/master/apps/something/test_bulk_update.py

Results on my new, quite powerful Dell with local PostgreSQL 11.5:

In [1]: Something.objects.count()
Out[1]: 1008895

In [2]: from apps.something.test_bulk_update import *

In [3]: %time inbuilt()
CPU times: user 2min 9s, sys: 653 ms, total: 2min 10s
Wall time: 2min 24s

In [4]: %time bulk_update_package()
CPU times: user 11.2 s, sys: 41.6 ms, total: 11.3 s
Wall time: 24.2 s

I would say that the difference is quite significant. I am planning to do some more analysis, maybe dropping some of the different column types from the update etc.

mikicz commented on September 1, 2024

I'm only happy to be of help, to be a tiny part of making Django such a great resource for everybody using it. Thank you for your work and actioning this!
