Comments (9)

salimane avatar salimane commented on June 24, 2024

@RVN-BR I ran into high CPU and memory usage when I was using count = 1. On a 16-core machine running FreeBSD, when I ran more than 20 processes with count = 1 (20 workers), CPU usage went above 50% and the machine became a little slow to print resque logs. Those jobs were just doing some MySQL transactions or sending emails.
But as soon as I changed to 10 processes with count = 10 (100 workers), the CPU load dropped straight to 0-3% and memory usage was tiny. (Notice that I didn't use 1 process with count = 100.)
Also, in monit I added a memory threshold so that monit restarts the workers if they are eating memory.
So first make sure that what you're doing in those jobs is not leaking memory, then run more processes with a lower count.
I haven't run into anything crazy myself, but it might be happening. I need to find some time to check it properly.

from php-resque.

roynasser avatar roynasser commented on June 24, 2024

@salimane Thanks for your ideas... In my case I'm pretty sure it IS the job that is leaking and not resque. I have often read that the phpclamav lib is buggy but didn't find any other options. I think I will just use a clamdscan daemon and have the jobs connect to it by socket.

What happened was that with only 1 process with count 10 (10 workers total) it was eating up all the memory. It got so bad that at one point monit seems to have gotten lost: it thought that resque had died, was perhaps unable to execute the kill, and launched an additional process. So I ended up with an unresponsive machine and 20 workers. (How I ended up with 20 workers is still a mystery, but that hadn't happened since I upgraded to your pull, so for now I'm saying it has something to do with the memory problem on this test machine. By the way, I saw the 20 workers on a Ruby web resque.) The only way to get the machine back was to hit stop on monit and run killall -9 php to force everything down.

Tomorrow I'll disable the clamphp extension and start testing another method for AV scans.

About the 10x10 arrangement vs 1x100, is there any particular reason for that? Is there something that persists in the function that we should maybe unset()? Is unset() useful, or will resque just tear down the process after it executes a job anyway? I just want to have the best planning in place. Thinking about it, 1x100 should be better because it is 1 parent process with 100 forks, one for each worker, while 10x10 is 100 worker forks (10x10) + 10 parent processes, so 9 extra processes? I assume the parent process is smaller in this case, so it is better to have 10 smaller ones?

all the best!
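
To make the process counting in the 1x100 vs 10x10 question concrete, here is a tiny shell sketch. It only follows the model described in the question (one parent per process plus COUNT forked workers) and does not verify how php-resque actually lays out its processes. total_processes is a hypothetical helper.

```shell
# Hypothetical helper: total process count for a given layout.
# Assumed model (from the question, not checked against php-resque):
# each parent process forks exactly COUNT workers and stays alive.
total_processes() {
    parents=$1
    count=$2
    echo $(( parents + parents * count ))
}

total_processes 1 100    # 1 parent + 100 workers   -> 101
total_processes 10 10    # 10 parents + 100 workers -> 110
```

Under that model the difference is the 9 extra parent processes mentioned in the question.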

salimane avatar salimane commented on June 24, 2024

@RVN-BR about the 10x10 arrangement vs 1x100, I just thought smaller numbers are easier to manage. And with count > 1, the parent footprint is smaller, so having more of them doesn't hurt anymore.
About unset: unset() does just what its name says: it unsets a variable. It does not force memory to be freed immediately. PHP's garbage collector will do that when it sees fit. I'm running 5.3 (5.3.10), which has a better garbage collector, so that helps a little. But I still need to check whether we can help the garbage collector a bit.

chrisboulton avatar chrisboulton commented on June 24, 2024

> @salimane Thanks for your ideas... In my case I'm pretty sure it IS the job that is leaking and not resque. I have often read that the phpclamav lib is buggy but didn't find any other options. I think I will just use a clamdscan daemon and have the jobs connect to it by socket.

How long does your job run for? The idea behind forking per job is so that memory leaks like this shouldn't be a problem, as the forked child is killed at the end of execution.

Which version of PHP are you running? As @salimane mentioned, the GC in 5.3 is better.

Memory leaks are a pain in the butt to deal with, so if we can somehow figure out whether it's php-resque, I'd like to try and get it fixed.
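
The fork-per-job idea described above can be illustrated by analogy in plain shell. This is only a sketch of the isolation principle, not php-resque's code: leaky_job and run_isolated are hypothetical names, and the "leak" is just a variable that is never cleaned up. A subshell runs as a child process in common shells such as bash and dash, so whatever the job accumulates disappears when the child exits.

```shell
# Parent-side state that should survive every job.
jobs_done=0

# Hypothetical job that builds up state and never cleans it up.
leaky_job() {
    leaked="$(printf '%01000d' 0)"    # 1000 characters held in a variable
    echo "job $1 holds ${#leaked} characters"
}

# Run the job in a subshell (a child process); its state dies with it.
run_isolated() {
    ( "$@" )
    jobs_done=$(( jobs_done + 1 ))
}

run_isolated leaky_job 1
run_isolated leaky_job 2

# The parent never sees the variable the jobs created.
echo "parent finished $jobs_done jobs; leaked is '${leaked-unset}'"
```

The parent only keeps its own counter. Memory held by an extension that is loaded into every process before the fork would not be helped by this.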

roynasser avatar roynasser commented on June 24, 2024

Hi @salimane and @chrisboulton I'm running 5.3 also, so hopefully good there...

@chrisboulton in my particular case (which mostly made me take notice of the issue, not point a finger at php-resque, let me make this clear) phpclamav is the culprit, and since it is loaded as an extension I think it is causing an increased footprint in all PHP processes, not only the ones where it's called... (there is a new virus DB preloading which is not playing nice with process forking, if I'm not mistaken). The job itself runs for only a few seconds, depending on the file size.

I implemented a delay by adding a simple while time < certaintime: sleep(5), to make sure that the files are synced across filesystems in Gluster. Would that while/sleep inside the job itself be the best way? I saw some talk of delayed jobs but didn't want something too complex. Or do you foresee a problem with the job maybe stalling 30 or so seconds in a while/sleep loop?
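
One variation on the while/sleep loop described above is to poll for the condition itself, with an upper bound, instead of sleeping until a fixed time. The sketch below is in shell purely for illustration (the real job is PHP), and wait_for_file with its arguments is a hypothetical helper, not something from php-resque.

```shell
# Hypothetical helper: poll until a file exists, or give up after
# max_wait seconds. interval is the pause between checks, in seconds.
wait_for_file() {
    path=$1
    max_wait=$2
    interval=$3
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$max_wait" ]; then
            return 1    # never appeared; the caller decides what to do next
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 0            # file is visible on this filesystem
}
```

A call such as wait_for_file "$file" 30 5 would wait at most about 30 seconds in 5-second steps, and return as soon as the file shows up.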

roynasser avatar roynasser commented on June 24, 2024

@salimane sorry to pick your brain some more... I recently added a memory threshold as per your suggestion and monit seems to be doing its job pretty well. I get an e-mail when it sees the php-resque "family" of processes going over the limit, etc.

I have one issue: monit isn't killing php-resque correctly for some reason. If I hit STOP in monit, then go into web-resque (Ruby version, on the same redis, used to monitor) and watch the workers, they all quit. All the PHP processes also quit, and everything works fine. If I use "restart" instead, it doesn't work: it just "adds" more php-resque processes without stopping/killing the initial ones.

I notice that when monit sees php-resque using too much RAM, the second case happens: instead of stopping the first X workers and then starting X workers, it just starts more. It seems like for some reason it is unable to stop the workers?

Can you double-check the commands to make sure I haven't missed anything? (Weird thing is, if I hit STOP, wait, then START, it works; if I hit "restart" it just piles on. I don't see a separate field for the commands used when restarting vs. stopping/starting.)

START command:

'/bin/sh -c INTERVAL=5 COUNT=10 REDIS_BACKEND=192.168.15.62:6379 APP_INCLUDE=/var/CodeRep/trunk/Jobs/jobs.php QUEUE=default VERBOSE=0 PIDFILE=/var/run/resque/worker_default.pid nohup php -f /var/CodeRep/trunk/Jobs/resque.php > /var/log/resque/worker_default.log &' timeout 30 second(s)

STOP command:

'/bin/sh -c /bin/kill -s QUIT -`cat /var/run/resque/worker_default.pidg` && rm -f /var/run/resque/worker_default.pidg && rm -f /var/run/resque/worker_default.pid; exit 0;' timeout 30 second(s)
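
If the pile-up comes from new workers starting while the old ones are still shutting down (which nothing in this thread confirms), one thing to try is a stop step that blocks until the old process group is really gone. The sketch below assumes the .pidg file holds the process group ID, as the negative argument to kill in the STOP command suggests. wait_for_exit and stop_workers are hypothetical helpers, not part of monit or php-resque.

```shell
# Wait until a kill target is gone, or give up after limit seconds.
# target is a PID, or "-PGID" for a whole process group.
wait_for_exit() {
    target=$1
    limit=$2
    waited=0
    # Signal 0 delivers nothing; it only tests whether the target exists.
    while kill -s 0 -- "$target" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            return 1    # still running after the limit
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0            # target has exited
}

# Signal the process group recorded in pgid_file, then block until it exits.
stop_workers() {
    pgid_file=$1
    limit=$2
    [ -r "$pgid_file" ] || return 0             # nothing recorded, nothing to stop
    pgid=$(cat "$pgid_file")
    [ -n "$pgid" ] || return 0
    kill -s QUIT -- "-$pgid" 2>/dev/null        # negative ID targets the group
    if ! wait_for_exit "-$pgid" "$limit"; then
        kill -s KILL -- "-$pgid" 2>/dev/null    # last resort after the graceful try
    fi
    rm -f "$pgid_file"
}
```

A monit stop program could call a small script wrapping stop_workers with the .pidg path and a limit in seconds, so that the stop only returns once the group has exited or been killed.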

salimane avatar salimane commented on June 24, 2024

@RVN-BR
Something to be aware of is how to start, stop, and restart monit. Also, the kill command behaves a bit differently depending on your OS and shell, so I have different kill options for FreeBSD and Ubuntu.
That said, on Ubuntu for example:

  • sudo /etc/init.d/monit restart
    • restart the whole monit process including all configurations under /etc/monit/conf.d/
  • sudo /etc/init.d/monit reload
    • reload the whole monit process including all configurations under /etc/monit/conf.d/

An example of my resque monit configuration is:

check process resque_worker_SNS_high
    with pidfile /var/log/resque/worker_SNS_high.pid
    start program = "/bin/sh -c 'cd /home/salimane/htdocs/project/trunk; PIDFILE=/var/log/resque/worker_SNS_high.pid nohup php -f scripts/resque/resque.php > /var/log/resque/worker_SNS_high.log &'" as uid resque and gid resque
    stop program = "/bin/sh -c '/bin/kill -s INT -`cat /var/log/resque/worker_SNS_high.pidg` && rm -f /var/log/resque/worker_SNS_high.pidg && rm -f /var/log/resque/worker_SNS_high.pid; exit 0;'"
    if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
    group resque_workers

Notice the "group resque_workers": because I have monit monitoring many processes not related to resque, when I want to stop or start the resque processes I use:

sudo /usr/sbin/monit -g resque_workers start
sudo /usr/sbin/monit -g resque_workers stop

Notice that if you use "sudo /usr/sbin/monit -g resque_workers stop", the monit restart command "sudo /etc/init.d/monit restart" will not restart that group; monit will just put it back in its previous state, which is "stop", so you need "sudo /usr/sbin/monit -g resque_workers start" to start that group again.
Also notice that I'm not using a timeout on the start and stop commands in the configuration as you are. That may explain the restart "start more" problem.

roynasser avatar roynasser commented on June 24, 2024

Hi @salimane, thanks for your input... I will look it all over in the morning (when I'm less prone to mistakes :s)...

When I said start/stop/restart, I meant using the monit web interface, not restarting/stopping monit itself, just the service within monit.

We are using CentOS. I seem to recall not having any trouble when it was running on another test server with everything running as root, so I may have a problem somewhere with users/permissions. I'll look into that more. I notice that you have an "as xxx" on your start command but not on the stop. I don't have it on mine for some reason (although in the old setup I think I had "as root", which obviously isn't ideal).

I'll try to investigate more tomorrow; unfortunately it isn't as obvious as I'd have liked. Anyway, I'll post back if I find anything useful. Thanks again for your time.

danhunsaker avatar danhunsaker commented on June 24, 2024

Has this progressed any? It's been a couple of years, so if this can be closed, it would be good to do so.
