Giter Site home page Giter Site logo

Comments (11)

acelaya avatar acelaya commented on July 18, 2024 1

Thanks!

An index on the expiry date column will drastically increase the speed

Yep, I think you are right there. I'll do some testing.

from shlink.

marijnvandevoorde avatar marijnvandevoorde commented on July 18, 2024

I took the liberty of adding it, hope I didn't overlook any guidelines you have while doing so. If not, it's not the biggest change and the PR can maybe provide you some ready-to-copy-paste material ;-)

from shlink.

acelaya avatar acelaya commented on July 18, 2024

Yep, I saw it, thanks a lot 🙂. I was definitely planning to use that migration to test this.

from shlink.

acelaya avatar acelaya commented on July 18, 2024

I have tried inserting 1M short URLs, all with a valid_until date in the past. Then when trying to delete them, it takes around 20 seconds (in my laptop) regardless of the presence of the index, so it doesn't seem to have much effect.

I have tried with Postgres only, I will try at least with MySQL as well.

from shlink.

marijnvandevoorde avatar marijnvandevoorde commented on July 18, 2024

I was super surprised by your message at first, but I think I have an explanation:
Since all links are expired, the command has to delete all of them, so it has to go over all links, regardless of the presence of an index.

The real test would be to have 1M links but only 1k expired, and then run it? It should be almost instand with an index but will probably take 20 seconds without the index?

from shlink.

acelaya avatar acelaya commented on July 18, 2024

Hmm, yeah, that actually makes sense. I'll try that and come back with the results.

I also want to experiment a bit with short urls which are "expired" due to having reached the max amount of visits. In that case, the query requires a sub query, so it'll require a different optimization approach.

from shlink.

acelaya avatar acelaya commented on July 18, 2024

I was super surprised by your message at first, but I think I have an explanation: Since all links are expired, the command has to delete all of them, so it has to go over all links, regardless of the presence of an index.

The real test would be to have 1M links but only 1k expired, and then run it? It should be almost instand with an index but will probably take 20 seconds without the index?

Well, the execution times are again very similar 😅

To be honest, with just 1k expired URLs and the rest having valid_since in the future, the query takes ~30ms with the index, and ~150ms without it, so it somewhat has an effect, but it is not terribly bad without it.

And then, the more expired short URLs there are, the less the index makes a difference. I then tried with 100k expired short URLs from the 1M total URLs, and it took ~2s in both cases.

Again, I just tested with Postgres, and it could be this difference is bigger with MySQL, specially with >=8.0 versions.

I feel tempted to merge the PR anyway, with a couple changes, as it definitely doesn't make things worst, but I feel like I should test other database engines first.

In the meantime, feel free to do some tests yourself, as I might be making some mistakes.

from shlink.

marijnvandevoorde avatar marijnvandevoorde commented on July 18, 2024

We were planning to do some performance testing so we can do that as well. To be honest, the PR also comes from "this should have an impact" rather than from experiencing issues.

Unrelated, we also noticed there's no index on the long urls. Shouldn't that be there for people who enable the "findIfExists" feature? I can imagine that becoming slow when the number of urls starts to rise. Again, just what "I would expect", haven't done any testing. We just anticipate serving several millions of links over time and we're checking if there are any reasons NOT to use shlink :-)

from shlink.

acelaya avatar acelaya commented on July 18, 2024

We were planning to do some performance testing so we can do that as well. To be honest, the PR also comes from "this should have an impact" rather than from experiencing issues.

That's fine. These are sometimes "easy" to spot when you have experienced it a couple of times. Your original report did not strike as something crazy and impossible to happen.

Unrelated, we also noticed there's no index on the long urls. Shouldn't that be there for people who enable the "findIfExists" feature?

Could be helpful as well, although I suspect that feature is not extensively used.

However, I did notice a bit of slowness just by listing short URLs while testing this, so that's something I want to dig into and improve, and adding more indexes might be needed.

In general, I have focused on improving performance around visits more than around short URLs, as for most users, the list of visits will be exponentially larger than the short URLs one.

We just anticipate serving several millions of links over time and we're checking if there are any reasons NOT to use shlink :-)

I cannot guarantee there's absolutely no bottlenecks, as there's always code branches which are less frequently hit and could be skipped.

What I can guarantee is that I will work on improving any rough edges as soon as they are found.

from shlink.

marijnvandevoorde avatar marijnvandevoorde commented on July 18, 2024

That last part is what we were looking for. Also happy to contribute. I think we'll have a slightly different usecase (high volume links, each link only visited a few times), so it might be interesting for shlink to see how it holds up under that usecase ;-).

Either way, thanks for the reactiveness, much appreciated.

from shlink.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.