
Comments (21)

SoftCreatR commented on May 20, 2024

Umami is hosted on a small Hetzner Cloud machine (CX21). I'm using MySQL as the database.

mikecao commented on May 20, 2024

That sounds really long. Those queries should be using indexed columns so it shouldn't take that long. What kind of server is it running on?

mikecao commented on May 20, 2024

I'm running on a much smaller server (although fewer users) and I haven't had queries that long. Check your database and make sure the indexes were created.

SoftCreatR commented on May 20, 2024

Indexes are there.

[screenshot: indexes on the pageview table]

mikecao commented on May 20, 2024

You can try doing an EXPLAIN on the query to see what's going on. This is the query for metrics for MySQL:

select sum(t.c) as "pageviews",
  count(distinct t.session_id) as "uniques",
  sum(case when t.c = 1 then 1 else 0 end) as "bounces",
  sum(t.time) as "totaltime"
from (
  select session_id,
    date_trunc('hour', created_at),
    count(*) c,
    floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
  from pageview
  where website_id=?
    and created_at between ? and ?
  group by 1, 2
) t

Replace the ? with parameters.
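
Note that date_trunc() is a PostgreSQL function with no MySQL equivalent; the queries later in this thread bucket by hour with DATE_FORMAT instead. A MySQL-compatible form of the inner select would look roughly like this (a sketch based on the DATE_FORMAT variant used further down, not the exact shipped query):

select session_id,
  DATE_FORMAT(created_at, '%Y-%m-%d %H:00:00'),
  count(*) c,
  floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
from pageview
where website_id=?
  and created_at between ? and ?
group by 1, 2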

mikecao commented on May 20, 2024

Also, what is the date range on that query?

SoftCreatR commented on May 20, 2024

Running the query itself, it takes "just" 7.125s:

select sum(t.c) as "pageviews",
  count(distinct t.session_id) as "uniques",
  sum(case when t.c = 1 then 1 else 0 end) as "bounces",
  sum(t.time) as "totaltime"
from (
  select session_id,
    date_trunc('hour', created_at),
    count(*) c,
    floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
  from pageview
  where website_id=2
    and created_at between from_unixtime(1597788000) and from_unixtime(1598392799)
  group by 1, 2
) t

Explained:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | &lt;derived2&gt; | ALL | | | | | 242784 | |
| 2 | DERIVED | pageview | ref | pageview_created_at_idx, pageview_website_id_idx | pageview_website_id_idx | 4 | const | 242784 | Using where; Using temporary; Using filesort |

> Also, what is the date range on that query?

7 days is set in the UI. However, the first entry was created on 2020-08-20 03:14:25.

mikecao commented on May 20, 2024

Try creating a new composite index and see if that helps:

create index pageview_website_id_created_at_idx on pageview(website_id, created_at);
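
If the timings don't change, it's worth checking whether the optimizer actually picks the new index before digging further. A quick check, using the table and index names from this thread:

show index from pageview;

explain
select session_id, count(*) c
from pageview
where website_id=2
  and created_at between from_unixtime(1597788000) and from_unixtime(1598392799)
group by 1;

If the key column still shows one of the single-column indexes, the new composite index isn't being used for this query.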

SoftCreatR commented on May 20, 2024

Still no change.

metrics: 10.96s
pageviews: 19.03s

mikecao commented on May 20, 2024

@SoftCreatR do you think you could send me a MySQL dump of your data? I only have PostgreSQL data. I ran a test with 500K records and it returned in 30ms. I want to see if there is something MySQL-specific.

mikecao commented on May 20, 2024

Ok, I mocked up a database in MySQL with a million records. Even this simple query takes 5 seconds:

select session_id,
  count(*) c
from pageview
where website_id=1
group by 1

The same query in Postgres takes 13ms. It seems MySQL has issues with group by and count in this query. Someone with more MySQL experience may need to chime in here.
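
A possible explanation (an assumption, not something verified in the thread): without an index that covers both website_id and session_id, MySQL has to read the actual table rows for every group and sort them through a temporary table, whereas Postgres can hash-aggregate in memory. A covering index should let MySQL answer the query from the index alone; the index name here is illustrative:

create index pageview_website_id_session_id_idx
  on pageview(website_id, session_id);

explain
select session_id, count(*) c
from pageview
where website_id=1
group by 1;

With the covering index in place, the Extra column should read "Using index" instead of "Using temporary; Using filesort".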

SoftCreatR commented on May 20, 2024

I guess the problem here is the numeric grouping. I'm not sure, but it could be that indexes don't have much effect in this case.

mikecao commented on May 20, 2024

I'll keep trying different queries, but it seems really odd that such a simple query is so slow.

mikecao commented on May 20, 2024

@SoftCreatR try adding this index:

create index pageview_website_id_session_id_created_at_idx on pageview(website_id, session_id, created_at);
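
This index covers every column the main queries touch (website_id for the filter, session_id for the grouping, created_at for the range and the min/max), so MySQL can in principle satisfy them without reading table rows at all. Whether it actually helps can be checked with MariaDB's ANALYZE (or EXPLAIN ANALYZE on MySQL 8.0.18+), which reports actual row counts instead of estimates, for example:

analyze
select session_id, count(*) c
from pageview
where website_id=2
group by 1;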

SoftCreatR commented on May 20, 2024

No change. I'll send you a dump of my database. Maybe that helps.

mikecao commented on May 20, 2024

@SoftCreatR Getting closer to a solution. I've got the main queries down to less than 2 seconds.

mikecao commented on May 20, 2024

@SoftCreatR try pulling the latest build. I made some improvements. Also, make sure you create this index:

create index pageview_website_id_session_id_created_at_idx on pageview(website_id, session_id, created_at);

SoftCreatR commented on May 20, 2024

Created the index, pulled the latest commits, rebuilt it. Result:

[screenshot: resulting query timings]

SoftCreatR commented on May 20, 2024

It's even worse: it consumes all available RAM and CPU, so it's completely useless for me at the moment.

SoftCreatR commented on May 20, 2024

So, after some analysis, 3 queries are fired:

select sum(t.c) as "pageviews",
  count(distinct t.session_id) as "uniques",
  sum(case when t.c = 1 then 1 else 0 end) as "bounces",
  sum(t.time) as "totaltime"
from (
  select session_id,
    DATE_FORMAT(created_at, '%Y-%m-%d %H:00:00'),
    count(*) c,
    floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
  from pageview
  where website_id=2
    and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
  group by 1, 2
) t

Execution time (Avg): 10.375s
Explanation:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|--------|----------|------------|-------|
| 1 | PRIMARY | &lt;derived2&gt; | ALL | | | | | 460892 | 192252.00 | 100.00 | 100.00 | |
| 2 | DERIVED | pageview | ref | pageview_created_at_idx, pageview_website_id_idx, pageview_website_id_session_id_created_at_idx | pageview_website_id_session_id_created_at_idx | 4 | const | 460892 | 950738.00 | 100.00 | 88.77 | Using where; Using index; Using temporary; Using filesort |

select DATE_FORMAT(convert_tz(created_at,'+00:00','+02:00'), '%Y-%m-%d') t,
    count(distinct session_id) y
  from pageview
  where website_id=2
  and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
  group by 1
  order by 1

Execution time (Avg): 2.750s
Explanation:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|--------|----------|------------|-------|
| 1 | SIMPLE | pageview | ref | pageview_created_at_idx, pageview_website_id_idx, pageview_website_id_session_id_created_at_idx | pageview_website_id_session_id_created_at_idx | 4 | const | 460860 | 843947.00 | 100.00 | 100.00 | Using where; Using index; Using filesort |

select distinct url x, count(*) y
  from pageview
  where website_id=2
  and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
  group by 1
  order by 2 desc

Execution time (Avg): 18.004s
Explanation:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|--------|----------|------------|-------|
| 1 | SIMPLE | pageview | ref | pageview_created_at_idx, pageview_website_id_idx, pageview_website_id_session_id_created_at_idx | pageview_website_id_idx | 4 | const | 460709 | 950401.00 | 100.00 | 88.77 | Using where; Using temporary; Using filesort |
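
One observation about that third query (an aside, not raised in the thread): the distinct is redundant, because group by url already yields one row per URL. And since url isn't part of any index, the query can't be index-covered, which is why this EXPLAIN falls back to pageview_website_id_idx with a temporary table and filesort. The equivalent query without the redundant keyword:

select url x, count(*) y
from pageview
where website_id=2
  and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
group by 1
order by 2 desc

The optimizer usually ignores a distinct that duplicates the group by, so any win from dropping it is likely small; an index that includes url (possibly with a prefix length, since URLs can be long) would be the more promising lever.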

I've updated the dump that I sent you earlier.

mikecao commented on May 20, 2024

@SoftCreatR The data dump was very useful; I was able to fix a couple of bugs with it. For example, the domain field for the website should be just the domain, not including http://. So thanks for that.

Using the new dump, I made a few small improvements. Here are the numbers from the details page:

[screenshot: query timings from the details page]

A little higher than I'd like, but not terrible. I think I understand what is going on with your server, though. You are running it on a live site, while I am just using a local MySQL instance with no incoming traffic. With the amount of hits you are getting, these queries are probably consuming all your resources. You might need a much stronger server, or set up a read replica.

Originally I cached some data in localStorage, which saved a few queries, but I removed it due to GDPR concerns. It would probably help a lot in your case. I can add it back and make it configurable. What do you think? Since you're EU-based, it's probably more of a concern for you.
