ood_core's Introduction

OodCore

OnDemand core library with adapters for each batch scheduler.

Installation

Add this line to your application's Gemfile:

gem 'ood_core'

And then execute:

bundle

Or install it yourself as:

gem install ood_core

Usage

TODO: Write usage instructions here
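
In the meantime, here is a minimal sketch of typical usage, based on the adapter calls exercised in the issues below. The exact factory call is an assumption, and the config keys mirror the example Torque cluster config further down this page:

require "ood_core"

# Build a job adapter from the same keys used in a cluster config's job: section
# (Torque shown here as an illustration).
adapter = OodCore::Job::Factory.build(
  adapter: "torque",
  host: "ruby-batch.osc.edu",
  lib:  "/opt/torque/lib64",
  bin:  "/opt/torque/bin"
)

# Submit a script and query its status.
script = OodCore::Job::Script.new(content: "echo 'Hello World'")
id     = adapter.submit(script: script)
info   = adapter.info(id)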

Development

After checking out the repo, run bin/setup to install dependencies. Then, run bundle exec rspec spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update and commit the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/OSC/ood_core.

License

The gem is available as open source under the terms of the MIT License.

ood_core's People

Contributors

ashton22305, brianmcmichael, dependabot-preview[bot], dependabot[bot], ericfranz, georgiastuart, gerald-byrket, haroon26, hazelgrant, johrstrom, lukew3, matthu017, mjbludwig, mnakao, morganrodgers, nickjer, oglopf, plazonic, robinkar, scratchings, treydock, twavv, utkarshayachit

ood_core's Issues

LSF Adapter: add NodeInfo to Info using #procs for slots

Notes from an email:

Technically, LSF uses a construct called “job slots” which typically, as in our system, is configured to correspond to a core (although it needn’t necessarily). If a job runs on one job slot on one node, the execution hosts would be reported like “compute010”; two slots on a single node would be reported as “2*compute010”; and on up to multiple nodes, where a 48-slot job running on all slots of three nodes would be reported as “16*compute010 16*compute011 16*compute012”. Mixed numbers like “5*compute010 11*compute011 16*compute012” are valid as well.
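
A rough sketch (not the adapter's code) of turning such an exec host string into NodeInfo objects, treating a bare host name as one slot and assuming NodeInfo accepts name: and procs: keywords:

def parse_exec_host(exec_host)
  exec_host.split(/\s+/).map do |entry|
    # "16*compute010" => 16 slots on compute010; "compute010" => 1 slot
    procs, name = entry.include?("*") ? entry.split("*", 2) : ["1", entry]
    OodCore::Job::NodeInfo.new(name: name, procs: procs.to_i)
  end
end

parse_exec_host("16*compute010 16*compute011 16*compute012")
# => three NodeInfo objects, each with procs=16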

Strip leading and trailing whitespace for some `Job::Script` attributes

This was brought to our attention in the Service Now ticket INC0319296.

Basically a user was trying to submit a job with an account string that had a leading whitespace. So it was being submitted as:

qsub -A ' ACCOUNT'

The Job::Script#accounting_id should probably not return a string with leading or trailing whitespace. The same argument could probably be made for a few other attributes, such as #job_name and #reservation_id.
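
A minimal sketch of the proposed coercion (illustrative only, not the gem's current constructor code):

# Strip surrounding whitespace when the attribute is set, so a value like
# " ACCOUNT" is submitted as "ACCOUNT".
@accounting_id = accounting_id.nil? ? nil : accounting_id.to_s.strip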

Torque adapter should support an array for native

Right now the Torque adapter is the ood-ball for the native arguments. Every other adapter accepts an array of command line arguments, and the translation from job script headers to this array is easy to do once you understand the convention.

But with Torque it is a hash, and for each PBS header I have to find the corresponding key that the pbs-ruby gem uses and then use that in the hash.

A simple solution would be to extend the Torque adapter to accept a Hash or an Array and respond accordingly. I'm not sure, however, if it is easy in pbs-ruby to support arbitrary command line arguments to qsub.
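
A rough sketch of the proposed dispatch (headers and qsub_args are hypothetical locals, not the adapter's current variables):

case script.native
when Hash  then headers.merge!(script.native)                # current hash behavior
when Array then qsub_args.concat(script.native.map(&:to_s))  # raw CLI arguments
when nil   then nil                                          # nothing requested
else raise ArgumentError, "native must be a Hash or an Array"
end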

Add job name and account to info object

We are currently building LSF and Slurm adapters. These (and probably future adapters) should support the ability to get:

  • job name
  • account that a job is charged to

Duplication between Info#procs and Info#allocated_nodes?

      # Set of machines that is utilized for job execution
      # @return [Array<NodeInfo>] allocated nodes
      attr_reader :allocated_nodes

and

      # Number of procs allocated for job
      # @return [Fixnum, nil] allocated total number of procs
      attr_reader :procs

In what case will this be true:

info.allocated_nodes.map(&:procs).reduce(0, :+) != info.procs

Script#workdir as a Pathname object may be problematic for LSF 9+

Documentation for LSF 9.1.2: https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.2/lsf_command_ref/bsub.1.html

-cwd "current_working_directory"
Specifies the current working directory for job execution. The system creates the CWD if the path for the CWD includes dynamic patterns for both absolute and relative paths. LSF cleans the created CWD based on the time to live value set in the JOB_CWD_TTL parameter of the application profile or in lsb.params.
The path can include the following dynamic patterns:

  • %J - job ID
  • %JG - job group (if not specified, it will be ignored)
  • %I - index (default value is 0)
  • %EJ - execution job ID
  • %EI - execution index
  • %P - project name
  • %U - user name
  • %G - user group

For example, the following command creates /scratch/jobcwd/user1/_0/ for the job CWD:

bsub -cwd "/scratch/jobcwd/%U/%J_%I" myjob

Of course, this is an LSF specific feature for job submission.

We should change the documentation of Script#workdir to return a String, not a Pathname object. We would stop coercing workdir into a Pathname and just keep it as a String. The adapters would need to be updated accordingly, but since no one uses the accessors on Script outside of the adapters it should be a safe change.

The other option is the adapter should just use -cwd to the provided path if workdir is set, and then for LSF users if they want a path with "dynamic patterns" they can add that to the script headers. We would just need to be careful in apps like My Jobs to not "always set" the workdir to the job directory using Script#workdir because then we would remove the ability to customize the script via the headers in this regard. In that case, we would instead be sure to always cd to the desired job directory and then execute the script from there.

Note: LSF 8.3 doesn't support these dynamic parameters in cwd

LSF Adapter: Add support for LSF9+

In LSF 9+, bjobs offers more flags for requesting additional fields and regularly formatted output. With those we should be able to get an accurate runtime and other attributes.

Accounting for multiple queues available

In the cluster config, for jobs, we have this: (see original discussion OSC/ood_appkit#36)

jobs:
  adapter: torque
  host: "ruby-batch.osc.edu"
  lib: "/opt/torque/lib64"
  bin: "/opt/torque/bin"

Notice, there is no queue information listed. In particular, what default queue should be used, and what is the list of all queues available? Each resource manager offers a way to get a queue list (qstat -Q, squeue, bqueues, etc.). Perhaps ood_job adapters should have a method to return a list of queues (and information about them?) available for each resource manager; see the sketch after the list below.

At OSC we can currently submit jobs without specifying the queue and the job ends up in the appropriate queue. At other centers this may not be the case. For example, TSC's documentation asks users to specify the queue they want to submit to as a header in their batch scripts.

So the following issues exist:

  1. whether or not queue is a required argument when submitting a job
  2. if queue is required, how to get a list of available queues
  3. (optional) a default queue to use out of that list, if one must be set
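
A hypothetical shape for such a method (the name #queues and the parsing shown are illustrative and not part of the current API):

class OodCore::Job::Adapters::Torque
  # List the queues the resource manager knows about by shelling out to
  # `qstat -Q` and taking the first column after the two header lines.
  def queues
    `qstat -Q`.lines.drop(2).map { |line| line.split.first }.compact
  end
end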

Change definition of `Script#min_phys_memory`

Currently the definition of Script#min_phys_memory is:

The minimum amount of physical memory in kilobyte that should be available for the job

This is not possible for Slurm. Slurm only has:

The minimum amount of physical memory in kilobyte per node that should be available for the job

So one possible change in the definition is:

The minimum amount of physical memory in kilobyte across all nodes or per node (dependent upon the resource manager) that should be available for the job

Note: We can't compute the memory per node or memory across all nodes because the Script object may not know the number of nodes being requested.

Add feature to wait for web server port

The iHPC app's panel shows as "Running" in the Dashboard when the connection file is found in the Interactive Session's working directory. This file is generated right after the iHPC app's script is forked off, but not necessarily after the web server within that script is fully loaded. This leads to those "unable to connect" errors very early on for the user.

One option is to wait until the web server is fully loaded before providing the user with the "Connect To Server" button in the Dashboard panel. So we would need a Bash helper method that waits until the specified port that the web server listens on is used. Then we use this method right before we generate the connection file in the after.sh script.

An example being...

# after.sh

# Wait for the Jupyter Notebook server to start
echo "Waiting for Jupyter Notebook server to open port ${port}..."
if wait_until_port_used "${host}:${port}" 60; then
  echo "Discovered Jupyter Notebook listening on port ${port}!"
else
  echo "Timed out waiting for Jupyter Notebook to open port ${port}!" ; exit 1
fi
sleep 2

LSF Adapter: Add support for all job submission options

Most important:

  • 1. Script#native
  • 2. job_environment

Other Script parameters that look like they are supported:

  • start_time -b [[year:][month:]day:]hour:minute

  • submit_as_hold -H

  • rerunnable -r

  • email -u mail_user

    -u mail_user
    Sends mail to the specified email destination. To
    specify a Windows user account, include the domain
    name in uppercase letters and use a single
    backslash (DOMAIN_NAME\user_name) in
    a Windows command line or a double backslash
    (DOMAIN_NAME\\user_name) in a UNIX
    command line.

  • email_on_started -B

  • email_on_terminated -N

  • wall_time

    -W [hour:]minute[/host_name |
    /host_model]
    Sets the runtime limit of the batch job. If a UNIX
    job runs longer than the specified run limit, the
    job is sent a SIGUSR2 signal, and is killed if it
    does not terminate within ten minutes. If a
    Windows job runs longer than the specified run
    limit, it is killed immediately. (For a detailed
    description of how these jobs are killed, see
    bkill.)

  • error_path -e

  • output_path -o

  • input_path -i input_file (supports %I and %J in input file name) or -is

  • priority -sp priority (integer 1 - MAX_USER_PRIORITY)

  • queue_name

  • reservation_id: -U reservation_ID

These don't look like they are supported

  • args
  • join_files
  • min_phys_memory: -M [MB] according to the rosetta; the LSF 8.3 man page says:

    -M mem_limit
    Sets a per-process (soft) memory limit for all the
    processes that belong to this batch job (see
    getrlimit(2)).

    By default, the limit is specified in KB. Use
    LSF_UNIT_FOR_LIMITS in lsf.conf to specify a
    larger unit for the limit (MB, GB, TB, PB, or EB).
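
A rough sketch (not the adapter's code) of how some of the options listed above might be translated into bsub arguments, assuming the usual Script readers and that wall_time is given in seconds:

args = []
args.concat ["-q", script.queue_name]              unless script.queue_name.nil?
args.concat ["-U", script.reservation_id]          unless script.reservation_id.nil?
args.concat ["-W", (script.wall_time / 60).to_s]   unless script.wall_time.nil?
args.concat ["-u", Array(script.email).join(",")]  unless Array(script.email).empty?
args << "-H" if script.submit_as_hold
args << "-r" if script.rerunnable
args << "-B" if script.email_on_started
args << "-N" if script.email_on_terminated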

LSF Adapter: add "estimated runtime" by subtracting current time from start time

It is an estimate. If the job is never suspended after starting, this will be accurate.

LSF 9+ offers the ability to modify the output of bjobs and include the runtime in that output, so we will be able to provide a more accurate runtime for later versions at that time.

This will let a currently empty column in Active Jobs be populated with a value.
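
A minimal sketch of the estimate (assuming start_time is a Unix timestamp and the job is currently running):

# Estimated wallclock time: seconds elapsed since the job started. Accurate
# only if the job was never suspended after starting.
wallclock_time = start_time && (Time.now.to_i - start_time)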

The used_port helper fails if no host specified

The used_port bash helper in batch connect fails if no host is specified. You get:

$ expr "22" : '\(.*\):' 2>/dev/null || echo "localhost"

localhost

Notice the blank line that should not be present. But it succeeds if you pass in a host:

$ expr "host:22" : '\(.*\):' 2>/dev/null || echo "localhost"
host

Slurm node list

After using TACC I noticed a new format that the node list can come in:

c427-032,c429-002

I do not believe this is covered by the current Slurm adapter. In fact, I need to test the following formats:

c457-[011-012]
c439-021,c450-033
c439-[121-122]
c438-[062,104]
c433-[011,013]
c438-[052-053]
c431-[012,072]
c427-032,c429-002
c410-102,c414-004
c457-[001-002]
c474-[004,022]
c452-[054,121]
c453-[101,112]
c454-[021,064]
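
A rough sketch of expanding these expressions into individual node names (assuming the bracketed forms above are the only ones in play; this is not the adapter's actual parser):

def expand_nodelist(nodelist)
  # Split on commas that are not inside brackets, keeping any bracket group
  # attached to its prefix.
  entries = nodelist.scan(/[^,\[]+(?:\[[^\]]*\])?/)
  entries.flat_map do |entry|
    if entry =~ /\A(.+)\[(.+)\]\z/
      prefix, ranges = $1, $2
      ranges.split(",").flat_map do |range|
        if range =~ /\A(\d+)-(\d+)\z/
          width = $1.length
          ($1.to_i..$2.to_i).map { |n| "#{prefix}#{n.to_s.rjust(width, '0')}" }
        else
          ["#{prefix}#{range}"]
        end
      end
    else
      [entry]
    end
  end
end

expand_nodelist("c457-[011-012]")     # => ["c457-011", "c457-012"]
expand_nodelist("c427-032,c429-002")  # => ["c427-032", "c429-002"]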

Support `headers.sh` for adding directives

This would be loaded above the script_wrapper, if it exists in the template.

The goal of this would be to make it easier to add custom arguments using header directives that people are used to working with, instead of the only option being to modify submit.yml, which can be more challenging because you have to translate the header directives into either an array or a hash depending on which adapter you are using.

Also, in the future we may support adapters that use the C library for LSF, Slurm, PBSPro, etc., like we do for Torque. The headers.sh would remain the same for the specific resource manager, regardless of the adapter type used.

All of our example apps could have a headers.sh with a single comment # add custom resource manager directives here.

Remove Job::NodeRequest

Due to the complexity in requesting nodes, tasks, cores, gpus, and other properties on a node on PBS, Slurm, and LSF it may be best to remove support for OodCore::Job::NodeRequest for the time being.

If an app wants to request node-like options then it will use the #native feature for the corresponding resource manager library.

Inconsistency in slurm spec tests?

In slurm_spec we have this:

describe "#submit" do
  def build_script(opts = {})
    OodCore::Job::Script.new(
      {
        content: content
      }.merge opts
    )
  end
  # ...
  subject { adapter.submit(script: build_script) }

  it "returns job id" do
    is_expected.to eq("job.123")
    expect(slurm).to have_received(:submit_string).with(content, args: [], env: {})
  end

  context "with :queue_name" do
    before { adapter.submit(script: build_script(queue_name: "queue")) }

    it { expect(slurm).to have_received(:submit_string).with(content, args: ["-p", "queue"], env: {}) }
  end

We specify subject to be the return value of the method call adapter.submit(script: build_script). But this subject is only used for one test (it "returns job id" do), as what follows are multiple contexts where the "subject" of the context is actually the call made in the before block.

Would it be more appropriate for the initial test to work the same way? According to http://betterspecs.org/#subject the use of subject is for multiple tests sharing the same subject, but we don't seem to have that here.

Just trying to understand, see if I'm missing something.

LSF job not ending if batch script exits

If the batch script exits but the forked off template/script.sh is still running then LSF keeps the batch job alive.

This is problematic as I have the batch script exit if it times out waiting for the forked server to open its assigned port. The user will then see their Session in a perpetual "Starting..." state.

OodCluster needs documentation

OodCore is pretty much only documented at the code-level at this point.

The OodCluster object in particular is in need of README-level documentation of its public methods to really be usable for app development by outsiders.
https://github.com/OSC/ood_core/blob/master/lib/ood_core/cluster.rb

At 0.0.4, this repo probably isn't stable enough to undertake a full documentation workup, but I wanted to put it out there as a pain point.

Deprecate `v1` backwards compatibility?

I feel we can safely deprecate the following code:

# Parse a list of clusters from a 'v1' config
# NB: Makes minimum assumptions about config
def parse_v1(id:, cluster:)
  c = {
    id: id,
    metadata: {},
    login: {},
    job: {},
    acls: [],
    custom: {}
  }
  c[:metadata][:title] = cluster["title"] if cluster.key?("title")
  c[:metadata][:url] = cluster["url"] if cluster.key?("url")
  c[:metadata][:private] = true if cluster["cluster"]["data"]["hpc_cluster"] == false
  if l = cluster["cluster"]["data"]["servers"]["login"]
    c[:login][:host] = l["data"]["host"]
  end
  if rm = cluster["cluster"]["data"]["servers"]["resource_mgr"]
    c[:job][:adapter] = "torque"
    c[:job][:host] = rm["data"]["host"]
    c[:job][:lib] = rm["data"]["lib"]
    c[:job][:bin] = rm["data"]["bin"]
    c[:job][:acls] = []
  end
  if v = cluster["validators"]
    if vc = v["cluster"]
      c[:acls] = vc.map do |h|
        {
          adapter: "group",
          groups: h["data"]["groups"],
          type: h["data"]["allow"] ? "whitelist" : "blacklist"
        }
      end
    end
  end
  c
end

as all HPC centers that I have worked with on installing OOD use the new v2 cluster config.

Also, the v1 backwards compatibility wouldn't support My Jobs and Active Jobs.

Hostname doesn't give correct host all the time

This line:

"host=$(hostname)\n[[ -e \"#{before_file}\" ]] && source \"#{before_file}\""

uses hostname to get the host of the machine.

At Arizona, this gives:

┌─[jnicklas@i1n5][~]
└─▪ hostname
i1n5

which we are unable to SSH to from the OnDemand node. Maybe this can be fixed by the sys admins, but an alternative solution may need to be looked into. For example:

┌─[jnicklas@i1n5][~]
└─▪ hostname -A
i1n5.ocelote.hpc.arizona.edu i1n5.cm.cluster i1n5.ib.cluster 

where I am able to successfully SSH to i1n5.ocelote.hpc.arizona.edu from the OnDemand node.

Define Adapter#tr for localization support

In Qt, localization is done by wrapping a tr method around every string. In Rails, it is the I18n.translate method (see http://guides.rubyonrails.org/i18n.html), which has the shorthand t; i.e. instead of

  def index
    flash[:notice] = "Hello flash!"
  end

you would do

  def index
    flash[:notice] = t(:hello_flash)
  end

And then in config/locales/en.yml you would have:

en:
  hello_flash: Hello flash!

and in config/locales/fr.yml you would have:

fr:
  hello_flash: Bonjour Flash

Typically this works because there is some global value like I18n.locale == 'en' or I18n.locale == 'fr' specifying the locale to use, so when the translate or t method is called it knows what value to return for the key :hello_flash.

In OnDemand's case, our "locale" is the adapter subclass type being used (Slurm, PBS Pro, Torque), and an example of a word that needs translating is "queue" (for Torque) versus "partition" (for Slurm).

So instead we should use something like:

Adapter#tr(:queue), for which the base class returns "queue" (just :queue.to_s), the Slurm adapter returns "partition", and the Torque adapter returns "queue". I.e. the base class implementation:

def tr(word)
  word.to_s
end

and subclass:

def tr(word)
  { queue: "partition" }.fetch(word, super(word))
end

Export host and port

The host and port env vars defined in before.sh should be made available to the forked script.sh file. This can be done by exporting them.

Do not export the passwd env var though.

Should we keep maintaining separate gems for resource manager adapters?

This issue is to capture a discussion around the merits of continuing to maintain separate gems for resource manager adapters, now that we have ood_core.

Original suggestion from @nickjer on the LSF performance issues sparked this discussion:

@nickjer:

I feel if you go this route, it may be best to break this off into a separate gem much like pbs-ruby.

@ericfranz:

I don't think we will realize any benefits by breaking this off into a separate gem.

Add an Job::Adapter#info_where as alternative to info_all

An example: Adapter#info_where(user: "efranz") or maybe better is Adapter#info_where_user("efranz"). We could also add Adapter#info_where_queue("debug") etc.

The Adapter superclass can offer a default implementation, which does a select on the results from info_all.

Individual adapters can optionally override with an optimized implementation. For example, bjobs by default shows only the user's jobs.

It could also be an Adapter#info_where filter that accepts a hash whose keys are methods on Job::Info.
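
The default could look something like this (a sketch only; it assumes Info responds to the given attribute names):

# Possible default on the Adapter superclass: filter the results of info_all by
# comparing each requested attribute against the corresponding Info reader.
def info_where(attrs = {})
  info_all.select do |info|
    attrs.all? { |attr, value| info.respond_to?(attr) && info.send(attr) == value }
  end
end

# adapter.info_where(user: "efranz")       # only jobs owned by efranz
# adapter.info_where(queue_name: "debug")  # only jobs in the debug queue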

LSF Adapter: Address performance issues

Need to do a little performance analysis. A better algorithm might do. However, there is a lot of parsing and string manipulation going on. Using the C library might make it faster (via Fiddle http://ruby-doc.org/stdlib-2.2.0/libdoc/fiddle/rdoc/Fiddle.html).

It was observed when testing that 30 jobs can take 1-2 seconds and 4000 jobs could take 28 seconds. This is way too long.

Many Fiddle tutorials online. One example: http://blog.honeybadger.io/use-any-c-library-from-ruby-via-fiddle-the-ruby-standard-librarys-best-kept-secret/

See http://publibfp.dhe.ibm.com/epubs/pdf/c2753121.pdf and lsb_openjobinfo() and lsb_readjobinfo() and jobInfoEnt structure. Also, here is some example code that uses the C library: https://github.com/PlatformLSF/lsf-drmaa/blob/025f9c49af48e410dc0ab0b9c611c42935dc09eb/lsf_drmaa/job.c

Does not detect listening port if on specific ip

If a server is listening on a specific ip and port combination, then the bash helpers do not properly detect if the port is open. That is because the bash helpers just check for open ports on localhost.

The helper should allow the user to specify the ip and port combo when checking if the port is being used.

LSF adapter is treating cores as nodes

Requesting a job on a cluster whose nodes have 20 cores each, like so:

bsub -n 10 -R "span[ptile=10]"

will give you a single node with access to 10 out of the 20 cores on it.

This is what I currently see in the LSF adapter when viewing info for that job:

OodAppkit.clusters['ada'].job_adapter.info("7168816").allocated_nodes
=> [
     #<OodCore::Job::NodeInfo:0x000000021fbc18 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000021fb920 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000021fb538 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000021fafc0 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000021fa1b0 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000021f9f08 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000058cbf68 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000058cbec8 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000058cbe28 @name="sx6036-1202", @procs=1>,
     #<OodCore::Job::NodeInfo:0x000000058cbd88 @name="sx6036-1202", @procs=1>
   ]

This should instead be:

OodAppkit.clusters['ada'].job_adapter.info("7168816").allocated_nodes
=> [
     #<OodCore::Job::NodeInfo:0x000000021fbc18 @name="sx6036-1202", @procs=10>
   ]

Debugging info:

$ bjobs -a -w -W 7168816
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME  PROJ_NAME CPU_USED MEM SWAP PIDS START_TIME FINISH_TIME SLOTS
7168816 jnicklas RUN   sn_short   login7      sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202:sx6036-1202 sys/dashboard/dev/jupyter 01/24-14:53:50 082810563939 000:00:07.00 75     0      31211,31395,31399,31419,31761 01/24-14:53:51 -  10
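
A sketch of the intended collapsing (assuming NodeInfo takes name: and procs: keywords; not the adapter's current code):

# Collapse the colon-delimited EXEC_HOST field (one entry per slot) into one
# NodeInfo per distinct host, with procs set to the number of slots on it.
exec_host = "sx6036-1202:sx6036-1202:sx6036-1202"  # abbreviated example
allocated_nodes = exec_host.split(":").group_by(&:itself).map do |name, slots|
  OodCore::Job::NodeInfo.new(name: name, procs: slots.size)
end
# => [#<OodCore::Job::NodeInfo @name="sx6036-1202", @procs=3>]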

Test fails when local time is not Eastern Standard Time

Test fails because it is testing the conversion of a timestamp into a "year month day ..." string using a local time conversion. The expected result is hardcoded assuming Eastern Standard Time.

context "with :start_time" do
before { adapter.submit(script: build_script(start_time: 1478631234)) }
it { expect(pbs).to have_received(:submit_string).with(content, queue: nil, headers: {Execution_Time: "201611081353.54"}, resources: {}, envvars: {}) }
end

[efranz@gwdev02 ood_core]$ bundle exec rspec ./spec/job/adapters/torque_spec.rb:198
Run options: include {:locations=>{"./spec/job/adapters/torque_spec.rb"=>[198]}}
F

Failures:

  1) OodCore::Job::Adapters::Torque#submit with :start_time should have received submit_string("my batch script", {:queue=>nil, :headers=>{:Execution_Time=>"201611081353.54"}, :resources=>{}, :envvars=>{}}) 1 time
     Failure/Error: it { expect(pbs).to have_received(:submit_string).with(content, queue: nil, headers: {Execution_Time: "201611081353.54"}, resources: {}, envvars: {}) }

       #<Double (anonymous)> received :submit_string with unexpected arguments
         expected: ("my batch script", {:queue=>nil, :headers=>{:Execution_Time=>"201611081353.54"}, :resources=>{}, :envvars=>{}})
              got: ("my batch script", {:queue=>nil, :headers=>{:Execution_Time=>"201611081253.54"}, :resources=>{}, :envvars=>{}})
       Diff:
       @@ -1,6 +1,6 @@
        ["my batch script",
         {:queue=>nil,
       -  :headers=>{:Execution_Time=>"201611081353.54"},
       +  :headers=>{:Execution_Time=>"201611081253.54"},
          :resources=>{},
          :envvars=>{}}]

     # ./spec/job/adapters/torque_spec.rb:198:in `block (4 levels) in <top (required)>'

Finished in 0.02129 seconds (files took 0.22371 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/job/adapters/torque_spec.rb:198 # OodCore::Job::Adapters::Torque#submit with :start_time should have received submit_string("my batch script", {:queue=>nil, :headers=>{:Execution_Time=>"201611081353.54"}, :resources=>{}, :envvars=>{}}) 1 time
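
One possible way to make the expectation independent of the local zone (a sketch; it assumes the adapter formats Execution_Time with the local zone, as the failure above suggests):

context "with :start_time" do
  # Derive the expected Execution_Time from the same timestamp instead of
  # hardcoding an Eastern-time string.
  let(:execution_time) { Time.at(1478631234).strftime("%Y%m%d%H%M.%S") }

  before { adapter.submit(script: build_script(start_time: 1478631234)) }

  it { expect(pbs).to have_received(:submit_string).with(content, queue: nil, headers: {Execution_Time: execution_time}, resources: {}, envvars: {}) }
end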

Make some `Info` and `NodeInfo` attributes optional

Make some Info and NodeInfo attributes optional. For example:

job_info.cpu_time
# => nil

Some of these attributes (in particular Info#submit_host, Info#cpu_time, and NodeInfo#procs for Slurm) cannot be retrieved for a given resource manager. Setting them to nil will make them easy to check for existence and lets the app display a default value if it wants.

For example:

<li class="job-info">
  <%= content_tag :ul, "Job Id = #{info.job_id}" %>
  <%# Don't display list item if it doesn't exist %>
  <%= content_tag :ul, "Submit Host = #{info.submit_host}" if info.submit_host %>
  <%# Set a default value for list item if it doesn't exist %>
  <%= content_tag :ul, "CPU Time = #{info.cpu_time || "Not Supported"}" %>
</li>

Split Adapter#info into separate methods?

We have one method that has 2 different ways of executing (one with id specified and one without) and two different return types:

  1. if id is specified, implement algorithm optimized for getting the info of one job, and return a hash
  2. if id is not specified, implement an algorithm optimized for getting the info of all the jobs, and return an array

It seems like we should split these into two separate methods. I think we overlooked this because the underlying implementation uses a single command (qstat, bjobs, squeue).

Ideas: could be info_all and info_find(id:) or just info_all and info(id:). That said, it's rather late in the game to make this change, so it might be expensive.
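
A sketch of what the split might look like, with the superclass providing a slow default for the single-job variant (it assumes Info exposes the job's id):

# Possible default for the single-job variant: reuse the full listing. Each
# adapter can override this with a query that only asks the scheduler about
# the one id.
def info(id:)
  info_all.find { |job| job.id.to_s == id.to_s }
end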

Provide way to source Bash helper methods to remote hosts

The Bash helper methods:

  • create_passwd
  • find_port
  • ...

may need to be used on the host machines assigned to the batch job aside from the master node. One example is to start servers on the worker nodes using pbsdsh .... In order to start the servers we need to choose an available port to listen on using the find_port helper function.

One simple way to make the Bash helper methods more portable is to wrap them up in another Bash function such as...

source_helpers () {
  find_port () {
    # ...
  }
  create_passwd () {
    # ...
  }
  # ...
}
export -f source_helpers

By calling source_helpers in the main script, all of those functions become available to it.

To make it available to pbsdsh scripts we could do...

pbsdsh bash -c "
  $(declare -f source_helpers)
  source_helpers

  ./start_server --port \$(find_port)
" &

The declare statement basically dumps the code for the helper functions in-place. Then we call that function to make the helpers available.

Submitting with native arguments for all adapters that accept arrays should also accept hash

Currently, for all adapters except Torque, we can set Script#native to an array of custom arguments, i.e.

["-n", "5"]

Arrays don't work well for merging different sets of submission arguments. We could update these adapters to accept either an array or a hash. One issue with a hash is how to express flags that take no argument. A solution could be that a nil value is simply omitted, leaving the bare flag. Example:

native = { "-n" => "5", "-R" => "span[ptile=2]", "-B" => nil, "-N" => nil }
native.to_a.flatten.compact
# => ["-n", "5", "-R", "span[ptile=2]", "-B", "-N"]
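
A hash also merges cleanly when several sources contribute arguments (illustrative values only):

defaults  = { "-n" => "5", "-q" => "general" }
overrides = { "-q" => "debug", "-B" => nil }   # -B is a bare flag
defaults.merge(overrides).to_a.flatten.compact
# => ["-n", "5", "-q", "debug", "-B"]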
