Clean up --cleanup about mrjob HOT 8 CLOSED

coyotemarin commented on August 20, 2024
Clean up --cleanup

from mrjob.

Comments (8)

coyotemarin commented on August 20, 2024

I think the best way to handle this is to have cleanup refer to what we do when the job succeeds, and to have a separate option, cleanup_on_failure, for when the job fails.

pbharrin commented on August 20, 2024

Would these options be directories/files to remove, or more complex actions?

coyotemarin commented on August 20, 2024

Yeah, cleanup just removes files. Here's what I was imagining:

The default behavior would stay the same: on success (cleanup), clean up everything, and on failure (cleanup_on_failure), clean up nothing.

cleanup and cleanup_on_failure would both take lists of strings describing what to clean up, with these possible values:

  • ALL
  • NONE
  • SCRATCH
  • LOCAL_SCRATCH
  • REMOTE_SCRATCH
  • LOGS

On the command line, you could specify something like --cleanup-on-failure=LOCAL_SCRATCH,LOGS or --cleanup=NONE.
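
As a rough illustration of how such a comma-separated value might be turned into a validated list, here is a minimal sketch. The names VALID_CLEANUP_VALUES and parse_cleanup_option are hypothetical and are not part of mrjob, and the rule that NONE cannot be combined with other values is an assumption rather than something decided in this thread.

```python
# Hypothetical sketch; none of these names come from mrjob itself.
VALID_CLEANUP_VALUES = (
    'ALL', 'NONE', 'SCRATCH', 'LOCAL_SCRATCH', 'REMOTE_SCRATCH', 'LOGS')


def parse_cleanup_option(raw):
    """Split a comma-separated option value into a validated list."""
    values = [v.strip() for v in raw.split(',') if v.strip()]
    if not values:
        raise ValueError('cleanup option must name at least one value')
    for value in values:
        if value not in VALID_CLEANUP_VALUES:
            raise ValueError('unknown cleanup value: %r' % value)
    # Assumption: NONE only makes sense on its own.
    if 'NONE' in values and len(values) > 1:
        raise ValueError('NONE cannot be combined with other values')
    return values


print(parse_cleanup_option('LOCAL_SCRATCH,LOGS'))  # ['LOCAL_SCRATCH', 'LOGS']
print(parse_cleanup_option('NONE'))                # ['NONE']
```

Rejecting unknown values up front means a typo on the command line fails before the job starts instead of silently skipping cleanup.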

Does this seem like a good design? Is it something you'd be interested in working on? :)

pbharrin commented on August 20, 2024

Yeah, I would be interested in working on that.
Is the default behavior NONE for both --cleanup and --cleanup-on-failure?

I had in mind a more functional approach where you could pass in function pointers that would do whatever you liked in the event of a success or failure. Perhaps you would like to be notified via text message if a job failed. My approach would not work when executing jobs from the command line, so we can go with your approach.

coyotemarin commented on August 20, 2024

Cool, it's yours (though GitHub 500s when I try to assign it to you, darn!)

The default behavior is ALL for cleanup and NONE for cleanup_on_failure (which is the same as the current behavior, which we call IF_SUCCESSFUL).
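
To make those defaults concrete, here is a small sketch of picking the cleanup list based on the job's outcome. The function and constant names are hypothetical, not mrjob's actual API; only the default values (ALL on success, NONE on failure) come from the discussion above.

```python
# Hypothetical sketch; these names are illustrative, not mrjob's API.
DEFAULT_CLEANUP = ['ALL']              # applied when the job succeeds
DEFAULT_CLEANUP_ON_FAILURE = ['NONE']  # applied when the job fails


def choose_cleanup(job_succeeded, cleanup=None, cleanup_on_failure=None):
    """Return the list of cleanup values to apply for this outcome."""
    if job_succeeded:
        chosen = DEFAULT_CLEANUP if cleanup is None else cleanup
    else:
        chosen = (DEFAULT_CLEANUP_ON_FAILURE
                  if cleanup_on_failure is None else cleanup_on_failure)
    return list(chosen)  # copy so callers cannot mutate the defaults


print(choose_cleanup(True))   # ['ALL']
print(choose_cleanup(False))  # ['NONE']
print(choose_cleanup(False, cleanup_on_failure=['LOCAL_SCRATCH', 'LOGS']))
```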

The main point of cleanup is just to make sure temporary files don't pile up over time, so a simple approach ought to work. For example, if you want to be notified of a job's success or failure, just wrap it in something that looks for a nonzero return code or an exception (depending on how you run it).
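
For instance, a wrapper along these lines would cover the notification case when the job is run as a subprocess. This is only a sketch: run_and_notify and print_notification are hypothetical names, and the command shown is a stand-in that exits with status 3 rather than a real job.

```python
import subprocess
import sys


def run_and_notify(command, notify):
    """Run a command, then report success or failure via a callback."""
    result = subprocess.run(command)
    succeeded = (result.returncode == 0)
    notify(succeeded, result.returncode)
    return result.returncode


def print_notification(succeeded, returncode):
    # Stand-in for a real notification (email, text message, etc.).
    if succeeded:
        print('job succeeded')
    else:
        print('job failed with return code %d' % returncode)


# Stand-in for a real job: a child Python process that exits with status 3.
fake_job = [sys.executable, '-c', 'import sys; sys.exit(3)']
run_and_notify(fake_job, print_notification)
```

If the job is run in-process instead, the same idea applies with a try/except around the call in place of the return-code check.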

Having said that, can you think of other use cases where more flexible cleanup actions would be helpful?

coyotemarin commented on August 20, 2024

Oh, and please work from the development branch; it's different enough that merging in changes made against master might be tricky.

coyotemarin commented on August 20, 2024

Ah, had to add you as a team member. Congrats, you are the official owner of this Issue. :)

irskep commented on August 20, 2024

It would also be nice if the cleanup functions could be called on their own w.r.t. a job ID from the command line. Typing s3cmd del --recursive can get tedious when debugging. python mrjob/tools/emr/cleanup.py --behavior=ALL j-MY_JOB_ID would be much better.
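
A rough sketch of how the argument parsing for such a tool could look is below. It only parses the command line; the actual cleanup logic is left out. The make_parser name and the ALL default for --behavior are assumptions, and no such script is confirmed to exist in mrjob.

```python
import argparse


def make_parser():
    """Build a parser for a hypothetical standalone cleanup tool."""
    parser = argparse.ArgumentParser(
        description='Clean up files left behind by a job (sketch only).')
    parser.add_argument(
        '--behavior', default='ALL',  # assumed default, not from the thread
        help='comma-separated cleanup values, e.g. ALL or LOCAL_SCRATCH,LOGS')
    parser.add_argument('job_id', help='ID of the job to clean up after')
    return parser


# Parse the example command line from the comment above.
args = make_parser().parse_args(['--behavior=ALL', 'j-MY_JOB_ID'])
print(args.behavior, args.job_id)  # ALL j-MY_JOB_ID
```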
