Giter Site home page Giter Site logo

Comments (24)

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

If I run meteor mongo and then db.currentOp({secs_running: {$gte: 300}}) I get two queries. Strangely both of them have query: {} so it's hard to say what they're running.

{
        "inprog" : [
                {
                        "desc" : "rsSync",
                        "threadId" : "140578758936320",
                        "active" : true,
                        "opid" : 5,
                        "secs_running" : 468,
                        "microsecs_running" : NumberLong(468308691),
                        "op" : "none",
                        "ns" : "local.oplog.rs",
                        "query" : {

                        },
                        "numYields" : 0,
                        "locks" : {

                        },
                        "waitingForLock" : false,
                        "lockStats" : {
                                "Global" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(19),
                                                "w" : NumberLong(3),
                                                "R" : NumberLong(1),
                                                "W" : NumberLong(1)
                                        }
                                },
                                "Database" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(7),
                                                "w" : NumberLong(2),
                                                "W" : NumberLong(1)
                                        }
                                },
                                "Collection" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(6)
                                        }
                                },
                                "Metadata" : {
                                        "acquireCount" : {
                                                "w" : NumberLong(1)
                                        }
                                },
                                "oplog" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(1),
                                                "w" : NumberLong(2)
                                        }
                                }
                        }
                },
                {
                        "desc" : "ReplBatcher",
                        "threadId" : "140578607867648",
                        "active" : true,
                        "opid" : 9,
                        "secs_running" : 468,
                        "microsecs_running" : NumberLong(468305992),
                        "op" : "none",
                        "ns" : "local.oplog.rs",
                        "query" : {

                        },
                        "numYields" : 0,
                        "locks" : {

                        },
                        "waitingForLock" : false,
                        "lockStats" : {
                                "Global" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(2)
                                        }
                                },
                                "Database" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(1)
                                        }
                                },
                                "oplog" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(1)
                                        }
                                }
                        }
                }
        ],
        "ok" : 1
}

from meteor-apm-server.

jehartzog avatar jehartzog commented on September 9, 2024

Are you using the modifications included with PR #16 & #17? I saw the same insane mongo container CPU usage, I didn't quite fix it but reduced it to manageable levels by reducing processing intervals.

Before those changes it would fail due to running out of CPU resources after a couple days. Now I can keep 7 days worth of data on a t2.micro unlimited container, and CPU is usage sits around ~90%, so my web UI stays responsive enough for me. This is still kind of expensive ($13/month for excess CPU credits in addition to $8/month for the instance) but works well enough for now.

from meteor-apm-server.

lmachens avatar lmachens commented on September 9, 2024

I have high CPU usage too but the UI response fine, so I didn't spend to much time on investigation.
Maybe some mongodb indexes are missing and the queries are taking to long?

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

Hi @jehartzog and @lmachens thanks for getting back. After reporting this I did a reboot, and for the last 24 hours it's been hanging out around 100% (of a possible 200%), so things are running fairly smoothly. Perhaps some indexes just needed some time to work out; or, similarly, redis-oplog had to kick into gear. The good news is that it seems to be working fairly stable on a $15/month Digital Ocean droplet (2gb ram and 2 CPU). Will close this issue for now, and re-open if it crashes again.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

Okay so about 5 hours ago it looks like mongodb might have crashed because of excessive CPU usage? Not sure... this is all I see in the logs:

(node:17383) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): MongoError: connection 0 to 127.0.0.1:3001 timed out

Grr.

from meteor-apm-server.

jehartzog avatar jehartzog commented on September 9, 2024

Double check that you are using code from the two PR's I added above, they helped manage (but not completely fix) this exact issue for me.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

@jehartzog Yep, I most certainly am running those -- I only set this server up last week. Here's what I tried:

  1. A mup deploy, which we've discussed in another thread. This worked for a bit, but then stopped
  2. So, I decided to set up an Ubuntu server and run Meteor on there directly rather than via mup. This worked swell for ~3 days, but stopped because the mongodb instance was using 100% CPU.
  3. I upgraded the server to have 2 cores and made sure I had oplog working well. This seemed to peg mongodb at 100%, and again it worked for a few days.
  4. Then yesterday it stopped, and the logs showed that MongoError above. So, I rebooted everything 13 hours ago when I made that post, and it seems to have worked since then

screen shot 2018-02-20 at 10 07 25 am

Stumped for now.

from meteor-apm-server.

lmachens avatar lmachens commented on September 9, 2024

You can adjust metricsLifetime. If you limit it to like 3 days, the cpu usage should be better.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

I believe my settings.json already has it set to 3 days:

{
        "metricsLifetime": 259200000
}

My Digital Ocean droplet has 60gb of space -- of which it says 6gb has been used so far. The Meteor node process is at ~1% CPU and the mongodb one consistently at 100%. I'm guessing that mongodb is the culprit -- especially because the period where Kadira went down syncs perfectly with an error message related to mongodb:
(node:17383) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): MongoError: connection 0 to 127.0.0.1:3001 timed out

from meteor-apm-server.

lmachens avatar lmachens commented on September 9, 2024

My CPU usage is always at 100% too, but it is fine for me because of 2 cores. No issues in the UI. The aggregations are the reason for the high CPU.
Maybe you can try out what happens when you deactivate/comment out some aggregations in rma server?

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

I went ahead and changed the first contsant from 60 * 1000 to 5 * 60 * 1000 and when my server restarted mongodb went from 100% to 200% (so using max of both cores). I'm guessing that's because of this:

Meteor.startup(() => {
  runShort();
  runMedium();
  runLong();
});

Which gets it to run when Meteor first reboots. So I gave it a few minutes to settle down, but... 30 minutes later and it hasn't died down yet.

Edit -- completely shut down and restarted Meteor so that mongodb would reset. Launched with just runShort() at 5 minute intervals. CPU usage on mongodb spikes to 100% when the server first starts; 3 minutes later it's still pegged at 100%.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

Looks like my mongodb is now at 200% again -- both processors completely hogged up and the UI pretty much unresponsive. I'm going to expand to 3 CPUs and see what happens.

Edit -- that didn't work, as their package was 3 CPUs and 1gb of ram, which isn't enough ram to boot Meteor properly.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

Been running it with 4 cores and 8gb of ram and she's finally happy. But I'm not super happy about the $40/month that's setting me back! Going to shop around for another host at some point.

from meteor-apm-server.

jehartzog avatar jehartzog commented on September 9, 2024

Unless you're experiencing some massively different customer usage than I'm used to (~200 peak active simultaneous users), you may want to dig more into fixing the issue causing the mongo container CPU usage. I don't think you'll be able to find a significantly cheaper hosting service that can do the CPU processing that is currently required.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

Okay so here's an observation -- in the server/rma/server.js file, each of runShort, runMedium and runLong get fired at the same time when Meteor starts. I put a little timer on each. At startup, this is how long each one took to run:

  • runMedium = 196 seconds
  • runShort = 216 seconds
  • runLong = 250 seconds

Once they were done, runShort was able to run on its own again and took a total of 348 seconds! As soon as it was done, it started instantly again -- this time taking 480 seconds. Sure enough, the instant it was done another runShort fired, this time taking 579 seconds. So... getting longer each time, heading in the direction of 10+ minutes per runShort(). Not so short, hey!

Each one of them takes up at least 100% of a CPU -- and in my case, with 4 cores, the CPU was at 400% when all 3 functions were running at the same time. It also looks like whenever runMedium fires (30 minutes), runShort will also fire. And whenever runLong fires (3 hours), both runShort and runMedium will fire. So I've constantly got my CPU maxed out at 100% with runShort(), then every 30 minutes it bumps up to 200%+, and then every 3 hours it gets hammered at a full 400%.

Some steps to take to figure this out and mitigate the issue:

  1. Make sure none of them run at exactly the same time
  2. Figure out why the runShort(), which is set to run every 60 seconds, takes more like 600 seconds

I'll report back if I make any progress -- this is my first time digging into what actually happens with each aggregation, so if you've got some thoughts in the meanwhile that'd be great :)

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

Even with 4 cores, my mongodb keeps crashing and the whole site becomes unusable :(

This could be because I've got so many methods / publications? For example, there'll often be 25k to 35k methods per hour that it has to aggregate. Are all these queries indexed?

Edit -- I see in lib/collections.js that there's MethodTraces.rawCollection().createIndex({ appId: 1, host: 1, startTime: 1 }), but if I run db.methodTraces.getIndexes() the only index that exists is for _id. So... these queries might not be indexed.

Edit 2 -- apparently db.methodTraces and db.methodsTraces are different (one with "s", one without. Not sure what's going on there).

Edit 3 -- indeed, there's a lack of indexing going on here. Check that COLLSCAN below.

db.methodsMetrics.find({"id" : {"appId" : "5MiwLbzwmiauYGecP","host" : "b8ae8d37d76e","name" : "getBlogNames","time" : ISODate("2018-02-24T20:02:00Z"), "res" : "1min"}}).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "meteor.methodsMetrics",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "id" : {
                                "$eq" : {
                                        "appId" : "5MiwLbzwmiauYGecP",
                                        "host" : "b8ae8d37d76e",
                                        "name" : "getBlogNames",
                                        "time" : ISODate("2018-02-24T20:02:00Z"),
                                        "res" : "1min"
                                }
                        }
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "id" : {
                                        "$eq" : {
                                                "appId" : "5MiwLbzwmiauYGecP",
                                                "host" : "b8ae8d37d76e",
                                                "name" : "getBlogNames",
                                                "time" : ISODate("2018-02-24T20:02:00Z"),
                                                "res" : "1min"
                                        }
                                }
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "ok" : 1
}

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

WOW. Okay so I added some indexes and it's like night and day. Instead of runShort() taking 2+ minutes, it now takes ~500 milliseconds. Meanwhile my mongodb CPU is at almost 0%. I'll put in a pull request for the changes.

from meteor-apm-server.

jehartzog avatar jehartzog commented on September 9, 2024

@jasongrishkoff This looks great, thanks for keeping up the hunt to figure this out!

Are you 100% sure those indexes are the only things you changed to fix the CPU usage? For one thing, number of those indexes already exist. I also added the indexes you recommend and I'm not seeing a change in CPU usage.

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

@jehartzog you added the indexes in this pull request, right? #23

I'm 99.99999999% positive that was the only change, and it was like night and day.

When you fire db.methodsTraces.getIndexes() do you get one that looks like this?

        {
                "v" : 1,
                "key" : {
                        "id" : 1
                },
                "name" : "id_1",
                "ns" : "meteor.methodsTraces"
        }

from meteor-apm-server.

jehartzog avatar jehartzog commented on September 9, 2024

I did use the indexes you added in #23, the only thing I did different was add them in the background.

These are the only 4 collections that you edited in the PR, but methodsTraces is not one of those changed. I'm not sure why I would check indexes on that collection to ensure I applied the PR correctly :D.

  ErrorMetrics.rawCollection().createIndex({id:1});                                                                    
  SystemMetrics.rawCollection().createIndex({id:1});                                                                   
  MethodsMetrics.rawCollection().createIndex({id:1});                                                                  
  PubMetrics.rawCollection().createIndex({id:1}); 

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

@jehartzog my apologies, I definitely should have said db.methodsMetrics.getIndexes(), which should give you a response as such:

        {
                "v" : 1,
                "key" : {
                        "id" : 1
                },
                "name" : "id_1",
                "ns" : "meteor.methodsMetrics"
        }

I tested it by doing db.methodsMetrics.dropIndex('id_1') and the negative impact was instant:

I20180226-06:49:30.941(0)? finished runShort 444ms <-- with the index
I20180226-06:52:21.866(0)? finished runShort 111369ms <-- without the index

from meteor-apm-server.

jehartzog avatar jehartzog commented on September 9, 2024

I went back and reverified everything, don't know what I was smoking before but your solution is 100% effective. I had somehow missed the adding the index for pubMetrics, even though I had a history of the correct mongo command and had verified it. I probably had switched db at some point and not realized it. Thanks for working with me on it!

from meteor-apm-server.

jasongrishkoff avatar jasongrishkoff commented on September 9, 2024

@jehartzog Nice! Hopefully it saves you some money on that extra CPU you were paying for :)

from meteor-apm-server.

lmachens avatar lmachens commented on September 9, 2024

Good job guys! I created a new version with your changes. https://github.com/lmachens/meteor-apm-server/blob/master/CHANGELOG.md#108---2018-02-27

from meteor-apm-server.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.