Giter Site home page Giter Site logo

Comments (15)

TWeidi avatar TWeidi commented on August 17, 2024 1

Good news, I just moved my project from strawberry-django-plus to strawberry-django and the query resolution time immediately dropped to approx. one second although still having the nested connections in place, which is in accordance with the benchmark repo you linked above.

Nested connections are fine in places where the n+1 issue is not a big issue. But in this case, considering you had 4 nested connection, you would have a worst case scenario of ~800 db queries (1 for the connection itself + 1 for the totalCount + (4 nested connections * 100 results) + (4 nested totalCount * 100 results))

Hm, that sounds plausible, but the django-debug-toolbar was only showing 6 queries to the database (c.f. my linked project) and it is still showing 6 queries after moving to strawberry-django. Maybe the toolbar does not track the extra queries?

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024 1

Good news, I just moved my project from strawberry-django-plus to strawberry-django and the query resolution time immediately dropped to approx. one second although still having the nested connections in place, which is in accordance with the benchmark repo you linked above.

That is indeed good news! :)

Hm, that sounds plausible, but the django-debug-toolbar was only showing 6 queries to the database (c.f. my linked project) and it is still showing 6 queries after moving to strawberry-django. Maybe the toolbar does not track the extra queries?

It should track everything. I'll need to take a look at your project during the weekend to understand what might be going on there

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024

Hey @TWeidi ,

I would expect the performance to be similar to graphene. I have some large projects with similar queries and they perform the same or better than in comparison to when I was using graphene. Wondering what might be going on here.

Could you share more details about your environment? For example:

  1. Your strawberry version and schema configuration (e.g. enabled extensions other than the optimizer)
  2. Your python version
  3. Your django version
  4. Are you running WSGI or ASGI?

Also, would you be able to share that code (or a simplified version of it) that I can run locally to debug?

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

I just implemented a comparable endpoint with the Django REST-Framework (using select_related and prefetch_related optimizations) which gives me 100 items per page in 500-700 milliseconds, which is approx another factor of six faster than my current graphene-django implementation and about fast enough to have a decent user experience when (re-)loading data to the UI. I was hoping to get to such a response time with GraphQL as well...

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

About my environment:
1.From my pyproject.toml:

  • strawberry-django-plus = "^3.0.3"
  • strawberry-graphql = "^0.187.5"
  • strawberry-graphql-django = "^0.9.4"
  1. Python version: python = "3.9"
  2. Django version: 4.2
  3. Running currently as WSGI

I guess i can adapt my current model slightly so i can share it. The adaptions will only affect the fields naming.

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024

I just implemented a comparable endpoint with the Django REST-Framework (using select_related and prefetch_related optimizations) which gives me 100 items per page in 500-700 milliseconds, which is approx another factor of six faster than my current graphene-django implementation and about fast enough to have a decent user experience when (re-)loading data to the UI. I was hoping to get to such a response time with GraphQL as well...

REST will always be faster than Strawberry given the same amount of data being returned. That because the GraphQL spec dictates that the server should validate its return value to ensure that the object is really of the type it is defined in the schema, and it is done for each and every value in there. i.e. if you have a list of 100 objects, and each one have 10 fields, you will be doing 1000 validations at the end.

We've been talking internally on Strawberry to try to find a way to improve that, specially for types that comes from ORMs. But currently that is an overhead of the spec itself, which will happen for all servers, be it implemented in Python, Node, Rust, etc.

You can see some benchmarks in here: https://github.com/flbraun/django-graphql-benchmarks#latest-results

1.From my pyproject.toml:

Hrm, all of those I fine. My projects are all on Django 4.2 as well, the only major difference is that I'm running Python 3.11 and ASGI. I know that 3.11 has a lot of performance improvements over 3.9/3.10, but it shouldn't be that much. Also, WSGI should actually be faster than ASGI when comparing a single operation (ASGI will be better for multiple operations, due to improved concurrency handling)

I probably really need an example to test locally, maybe your code is hitting a corner case which is not yet improved.

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024

Ok, I think I found something...

I'm currently merging this lib into https://github.com/strawberry-graphql/strawberry-graphql-django and I run a test in both v3.0.3 and my latest PR on it

I tried to load a list of 5000 items with a lot of attributes in those. It took ~5sec there for ASGI and ~7sec for WSGI. On strawberry-django-plus the same test took ~26sec and ~15sec. So around 4x slower on ASGI and 2x slower on WSGI

That is probably due to the fact that not only the integration there has one less layer, but also I optimized it a lot.

I'm probably going to release the new version of strawberry-django this week (just waiting for a review to finish) and deprecate this lib. If you want and can try it, you can install it from this branch (https://github.com/strawberry-graphql/strawberry-graphql-django/tree/permissions), which contains basically the version that is going to be released, and follow these steps to migrate to it: https://github.com/blb-ventures/strawberry-django-plus/blob/strawberry_django/docs/migration-guide.md#migrating-to-strawberry-django

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024

I'm releasing the new version of strawberry-django today! I did some more benchmarks, this is crazy!

query TestQuery {
  fruits {
    id
    name
    description
    weight
    color {
      id
      name
    }
  }
}

Benchmark query with 10k fruits on Python 3.11:

  • strawberry (vanilla) + sync + fruits in memory: 1.6391847133636475
  • strawberry (vanilla) + async + fruits in memory: 1.1013643741607666
  • strawberry-django (old) + sync: 18.736565351486206
  • strawberry-django (old) + async: 25.834290027618408
  • strawberry-django (old) + sync + optimizer: (N/A)
  • strawberry-django (old) + async + optimizer: (N/A)
  • strawberry-django-plus + sync: 23.372273683547974
  • strawberry-django-plus + async: 20.71276068687439
  • strawberry-django-plus + sync + optimizer: 17.553191423416138
  • strawberry-django-plus + async + optimizer: 14.064252138137817
  • strawberry-django (new) + sync: 7.950979471206665
  • strawberry-django (new) + async: 9.197803020477295
  • strawberry-django (new) + sync + optimizer: 2.4940571784973145
  • strawberry-django (new) + async + optimizer: 2.217967987060547

So, considering that, I'm really interested to know how much difference you will see with the new version

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

Hey,

yesterday i was preparing a repository to share my current project removing private data, so that you can (hopefully) reproduce the query times i observed. I'm currently adding a small readme and hope to share it with you soon.

Alongside, i started to move to strawberry-django, following your guide and the documentation of strawberry-django.
When looking for a way to replace the gql.django.ListConnectionWithTotalCount with strawberry_django.ListConnectionWithTotalCount as described here i found that neither strawberry_django nor strawberry_django.relay do provide such a class and that the mentioned docs still contain a reference to the gql.django (line 18 in type.py) which might be a copy-paste error.
Could you please share, where the ListConnectionWithTotalCount class lives in the strawberry-django package and if it is save to use strawberry.relay.connection() in place of gql.django.connection()

Thank you very much for you time and effort!

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

REST will always be faster than Strawberry given the same amount of data being returned. That because the GraphQL spec dictates that the server should validate its return value to ensure that the object is really of the type it is defined in the schema, and it is done for each and every value in there. i.e. if you have a list of 100 objects, and each one have 10 fields, you will be doing 1000 validations at the end.

We've been talking internally on Strawberry to try to find a way to improve that, specially for types that comes from ORMs. But currently that is an overhead of the spec itself, which will happen for all servers, be it implemented in Python, Node, Rust, etc.

OK, i see that validations take extra time that (currently) cannot be avoided and that a comparison to REST must be drawn with care. Nevertheless, i am wondering, if the validations are really that costly to account for a factor of 6 in response time when performing a query of 100 items with 35 fields = 3500 validations as in my concrete use case. If that holds true, i would really appreciate an option to turn validation off or limit it somehow in a future release of strawberry(-django) at the cost of not conforming to the GraphQL spec anymore.

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

Was just experimenting with my query and found, that if i omit all fields (i had four of them in my query) that are connection to another type, the query resolution speeds up from ~18 seconds to approx. 2 seconds. Since i was only querying for the totalCount of each connection (no other field of the respective Node or Type) i assume that the huge query resolution time is not caused by the validation overhead - since removing 11% of the fields should reduce the response time by the same percentage, while observing a reduction of 89% instead - but from the relay / connection implementation.

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

You can find my strawberry-django example here

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024

OK, i see that validations take extra time that (currently) cannot be avoided and that a comparison to REST must be drawn with care. Nevertheless, i am wondering, if the validations are really that costly to account for a factor of 6 in response time when performing a query of 100 items with 35 fields = 3500 validations as in my concrete use case. If that holds true, i would really appreciate an option to turn validation off or limit it somehow in a future release of strawberry(-django) at the cost of not conforming to the GraphQL spec anymore.

An option like that would need to be added on graphql-core. Strawberry is basically a wrapper on top of it, the query parsing/execution/validation is done by it.

But as I mentioned, the validation cost is not a factor of 6x. If you look at the benchmark I linked, strawberry slower than DRF by a factor of ~0.5x, and DRF itself is slower than pure JSON response by a factor of ~0.75x

strawberry_django.ListConnectionWithTotalCount as described here i found that neither strawberry_django nor strawberry_django.relay do provide such a class and that the mentioned docs still contain a reference to the gql.django (line 18 in type.py) which might be a copy-paste error.

OMG thanks for pointing it out. Will try to fix that ASAP

Could you please share, where the ListConnectionWithTotalCount class lives in the strawberry-django package and if it is save to use strawberry.relay.connection() in place of gql.django.connection()

The connection class itself can be import with from strawberry_django.relay import ListConnectionWithTotalCount. And gql.django.connection() can be replaced by strawberry_django.connection()

Was just experimenting with my query and found, that if i omit all fields (i had four of them in my query) that are connection to another type, the query resolution speeds up from ~18 seconds to approx. 2 seconds. Since i was only querying for the totalCount of each connection (no other field of the respective Node or Type) i assume that the huge query resolution time is not caused by the validation overhead - since removing 11% of the fields should reduce the response time by the same percentage, while observing a reduction of 89% instead - but from the relay / connection implementation.

Ok, we might have found the culprit here! When you use nested connections you are introducing n+1 issues. The query optimizer can't prefetch nested connections the same way as it would do for lists.

Nested connections are fine in places where the n+1 issue is not a big issue. But in this case, considering you had 4 nested connection, you would have a worst case scenario of ~800 db queries (1 for the connection itself + 1 for the totalCount + (4 nested connections * 100 results) + (4 nested totalCount * 100 results))

In this case it would be better to define those fields as lists, since the query optimizer can prefetch those for all results objects in the connection, and you would end up with 6 db queries (1 for the connection + 1 for the totalCount + 4 for each prefetched list)

But if those nested results are too much and you would actually need to paginate them, maybe creating a separated connection and doing another query to retrieve them would be better?

You can find my strawberry-django example here

Thank you! I'll take a look during the weekend.

from strawberry-django-plus.

TWeidi avatar TWeidi commented on August 17, 2024

I'll consider this issue as solved. Thank you very much for your help!

from strawberry-django-plus.

bellini666 avatar bellini666 commented on August 17, 2024

I'll consider this issue as solved. Thank you very much for your help!

Glad to be helpful! :)

Also, thank you for opening this issue. Without this I would not have wondered about the performance issues we had before (mostly on strawberry-django end, which also affected plus which was built on top of it) and how the new version of strawberry-django basically solved it due to all the refactoring I've done there.

from strawberry-django-plus.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.