nimbo-sh / nimbo
Run compute jobs on AWS as if you were running them locally.
Home Page: https://nimbo.sh
License: GNU General Public License v3.0
AWS recommends AWS CLI v2 for general use.
Right now testing is minimal, and the quality of the tests is not fantastic.
--dry-run tests for every CLI option. It would be desirable to aim for 100% test coverage.
Right now awscli is only useful for s3 because of aws sync, but the usefulness of this command is limited.
By replacing awscli usage with our own implementation for interacting with s3, we will be able to:
To make the experience of using Nimbo more seamless, we should support Bash and Zsh autocompletion.
If you push a subdirectory of your datasets folder, then rename it locally and push again, all files are re-uploaded. It seems to me that it should be possible to avoid this with aws mv.
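One possible way to avoid the re-upload, sketched under assumptions: compare the MD5 of each local file against the ETags of objects already in the bucket and turn content matches into server-side moves (what aws s3 mv does, or copy_object followed by delete_object in boto3). This only works because an S3 ETag equals the object's MD5 for single-part uploads that are not encrypted with SSE-KMS; multipart objects never match and fall back to a normal upload. The function name and input shapes are hypothetical, not part of Nimbo.

```python
from __future__ import annotations


def plan_server_side_moves(local_md5: dict[str, str], remote_etag: dict[str, str]) -> dict[str, str]:
    """Return {new_key: existing_remote_key} for files that can be moved instead of re-uploaded.

    local_md5:   S3 key each local file would be pushed to -> hex MD5 of its bytes.
    remote_etag: S3 key already in the bucket -> ETag with surrounding quotes removed.
    """
    # Index remote objects by content; the first key seen for each digest wins.
    remote_by_digest: dict[str, str] = {}
    for key, etag in remote_etag.items():
        remote_by_digest.setdefault(etag, key)

    moves: dict[str, str] = {}
    used_sources: set[str] = set()
    for key, digest in sorted(local_md5.items()):
        if key in remote_etag:
            continue  # already uploaded under the same name; a normal sync handles it
        source = remote_by_digest.get(digest)
        # Only reuse a remote object that no longer exists locally (i.e. it was renamed)
        # and that has not already been claimed by another new key.
        if source is not None and source not in local_md5 and source not in used_sources:
            moves[key] = source
            used_sources.add(source)
    return moves
```

Keys left out of the returned mapping still need a regular upload.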
nimbo list-active outputs a nice list of active instances. However, instance IDs are pretty cryptic, so it is hard to know which is which. It would be really nice to be able to name instances, or perhaps to attach some sort of "description" field (which could, for example, contain hyperparameters, datasets used, etc.).
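EC2 already supports free-form tags, so naming could be a thin layer over them. A minimal sketch, assuming names are stored in the standard Name tag and descriptions in a hypothetical Description tag: the first helper builds the TagSpecifications argument accepted by boto3's run_instances, and the second picks a readable label from an instance returned by describe_instances. For spot instances, the same tags could be attached after launch with create_tags.

```python
from typing import Dict, List, Optional

MAX_TAG_VALUE_LEN = 256  # EC2 rejects tag values longer than 256 characters


def instance_tag_specifications(name: str, description: Optional[str] = None) -> List[Dict]:
    """Build the TagSpecifications argument for EC2 run_instances."""
    tags = [{"Key": "Name", "Value": name}]
    if description:
        # "Description" is a hypothetical tag key chosen for this sketch.
        tags.append({"Key": "Description", "Value": description})
    for tag in tags:
        if len(tag["Value"]) > MAX_TAG_VALUE_LEN:
            raise ValueError(f"tag {tag['Key']!r} exceeds {MAX_TAG_VALUE_LEN} characters")
    return [{"ResourceType": "instance", "Tags": tags}]


def instance_label(instance: Dict) -> str:
    """Pick a human-readable label from one instance dict returned by describe_instances."""
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    return tags.get("Name", instance["InstanceId"])  # fall back to the cryptic ID
```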
I found some strange files in my S3 bucket, such as some_results.pkl.H23ad4, with a hex string appended. I believe this happens because remote_setup.sh syncs the results folder so frequently that an object is sometimes only partially written to disk when we try to sync it.
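If the partial-write explanation is right, one common mitigation on the job side is to write results atomically: write to a temporary file in the same directory, then rename it over the target with os.replace, so a reader only ever sees the old file or the complete new one. This is a sketch of that general technique, not something Nimbo currently does; the function name is hypothetical.

```python
import os
import pickle
import tempfile
from pathlib import Path


def atomic_pickle_dump(obj, path) -> None:
    """Write obj to path so that readers (e.g. a sync loop) never see a half-written file."""
    path = Path(path)
    # The temporary file must live on the same filesystem as the target for
    # os.replace to be atomic, so create it in the same directory.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(obj, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_name, path)  # atomic rename over any existing file
    except BaseException:
        # Clean up the partial temp file; the target is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The sync loop would still need to skip the temporary files themselves, for example by passing --exclude "*.tmp" to aws s3 sync.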
Running nimbo create-bucket BUCKET_NAME produces this stacktrace.
The final lines of the stacktrace:
File "/nix/store/k8azdbzwsyklkm344271wacjjsi4mkp5-python3.8-botocore-1.20.52/lib/python3.8/site-packages/botocore/validate.py", line 293, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in input: "DryRun", must be one of: ACL, Bucket, CreateBucketConfiguration, GrantFullControl, GrantRead, GrantReadACP, GrantWrite, GrantWriteACP, ObjectLockEnabledForBucket
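The error shows DryRun being forwarded to S3's CreateBucket, which does not accept it. A minimal sketch of one fix, assuming a hypothetical call_aws wrapper and a hand-maintained set of services whose APIs accept DryRun: EC2 gets a server-side dry run, everything else is skipped and reported locally.

```python
from typing import Any, Callable

# Assumption for this sketch: EC2 API calls accept DryRun, while S3 calls such as
# CreateBucket do not (that is exactly what the ParamValidationError reports).
SERVICES_WITH_DRY_RUN = {"ec2"}


def call_aws(service: str, operation: Callable[..., Any], dry_run: bool, **params: Any) -> Any:
    """Invoke a boto3 client method, passing DryRun only where the API supports it."""
    if not dry_run:
        return operation(**params)
    if service in SERVICES_WITH_DRY_RUN:
        # EC2 checks permissions and raises a ClientError with code
        # "DryRunOperation" instead of performing the action.
        return operation(DryRun=True, **params)
    # No server-side dry run: report what would happen and skip the call.
    name = getattr(operation, "__name__", repr(operation))
    print(f"[dry-run] skipping {service}.{name}({params})")
    return None
```

Instead of a hard-coded set, botocore exposes each operation's accepted parameters through client.meta.service_model.operation_model("CreateBucket").input_shape.members, which could be checked for "DryRun" at runtime.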
At the moment, auxiliary commands such as list-gpu-prices are mixed with main commands such as ssh in the CLI help. Group CLI help options by category.
For example:
Utilities:
list-gpu-prices
list-spot-gpu-prices
allow-current-ip
s3:
ls
pull
push
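The grouping itself is a small rendering step. A sketch in plain Python, using the example categories above and a hypothetical catch-all "Commands" section for anything not listed; it does not reproduce Nimbo's real help text or command descriptions.

```python
from typing import Dict, Iterable, List

# Hypothetical grouping copied from the proposal above; anything not listed
# ends up in a catch-all "Commands" section.
CATEGORIES: Dict[str, List[str]] = {
    "Utilities": ["list-gpu-prices", "list-spot-gpu-prices", "allow-current-ip"],
    "s3": ["ls", "pull", "push"],
}


def format_grouped_help(all_commands: Iterable[str], categories: Dict[str, List[str]] = CATEGORIES) -> str:
    """Render command names grouped under category headings."""
    available = set(all_commands)
    lines: List[str] = []
    grouped = set()
    for title, names in categories.items():
        present = [name for name in names if name in available]
        if not present:
            continue  # skip empty sections
        lines.append(f"{title}:")
        lines.extend(f"  {name}" for name in present)
        grouped.update(present)
    ungrouped = sorted(available - grouped)
    if ungrouped:
        lines.append("Commands:")
        lines.extend(f"  {name}" for name in ungrouped)
    return "\n".join(lines)
```

Since Nimbo uses click, the same loop would most likely live in a click.Group subclass that overrides format_commands(ctx, formatter), writing each category with formatter.section(title) and formatter.write_dl(rows).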
Packaging Nimbo for apt and brew will cover the majority of Nimbo use cases.
Cleanup DryRun
Describe the bug
I'm using nimbo run "python script.py" and the script fails with "Something went wrong while connecting to the instance."
In the AWS EC2 cloud console I can see the instance up and running, and I can connect from a separate terminal session using ssh -i {SAME_SSH_KEY_AS_IN_NIMBO_CONFIG}.pem ubuntu@{IP}.
Expected behavior
Nimbo should succeed or fail depending on the actual state of the ssh / port 22 connection.
Configuration
cloud_provider: AWS
# Data paths
local_datasets_path: data # relative to project root
local_results_path: results # relative to project root
s3_datasets_path: s3://xyz/data
s3_results_path: s3://xyz/results
# Device, environment and regions
aws_profile: default
region_name: eu-west-1
instance_type: g4dn.2xlarge
spot: yes
image: ubuntu18-latest-drivers
disk_size: 80 # In GB
conda_env: conda-env.yml # denotes project root
# Job options
run_in_background: no
persist: no # whether instance persists when the job finishes or on error
# Permissions and credentials
security_group: default
instance_key: key.pem # can be an absolute path
role: NimboFullS3AccessRole
ssh_timeout: 450
To Reproduce
Try running a Python script using nimbo. Connect to the IP address from a separate terminal session.
nimbo ssh <instance_id> connects to the instance as well.
Any hints on how to fix this, or how to disable the initial up check?
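As a diagnostic, it can help to separate "port 22 is not reachable" (for example, a security group that does not allow the current IP) from failures at the ssh level. A sketch of a plain TCP reachability check, assuming a hypothetical wait_for_port helper and reusing ssh_timeout: 450 from the configuration above as the default; this is not Nimbo's actual up check.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 450.0, interval: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # A completed TCP handshake means something (normally sshd) is listening.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True
        except OSError:
            # Refused, unreachable or timed out: wait a bit and retry.
            time.sleep(max(0.0, min(interval, deadline - time.monotonic())))
```

If this returns True for the instance IP while Nimbo still reports a connection failure, the problem is more likely in how the ssh command is invoked (key path, user, host-key prompts) than in networking.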
Right now, information is output as follows:
In addition, Nimbo depends on click and awscli, which in turn depend on colorama. Right now we are using rich; investigate whether it is worth reducing our dependencies.
Hi! Thanks for this nice tool.
I'm having an issue with bringing up spot instances.
In nimbo-config.yml, if spot is set to no, it works as expected, but setting it to yes gives me this error:
Exception: {'Code': 'bad-parameters', 'Message': 'Your Spot request failed due to bad parameters.
Spot request cannot be fulfilled due to invalid availability zone.',
'UpdateTime': datetime.datetime(2021, 5, 12, 19, 8, 50, tzinfo=tzutc())}
I have tried on both Ubuntu 20.04 and macOS and get the same error. On my Linux box I tried installing nimbo both with pip and from source and hit this availability zone error; on macOS I've only tried installing nimbo with pip.
Is this a known issue? How can it be resolved?
Thanks
There is no reason not to run the tests that do not require spinning up instances on each PR and on master.
Is your feature request related to a problem? Please describe.
Conda env setup takes more than 45 minutes every time I start a new instance. This gets 10x more annoying when relying on spot instances.
Describe the solution you'd like
Instead of creating the same conda environment on each new instance, why not have a Docker image that is ready to start much more quickly?
Additional context
Would love any pointers on how to add this to nimbo. I'd be up for creating a PR, depending on the capacity needed.
Is this on the feature roadmap for nimbo anyway? (Is there a feature roadmap? :) )
Thanks, would love to hear your thoughts on this
It would be useful to be able to pass certain parameters as environment variables. For example, suppose I want to commit a config file, but each user of the repo has a different key. With the current setup this wouldn't work, because everyone has different entries for this field:
instance_key: /home/andre/andre-key.pem
A similar issue would arise when running a CI job, for example.
One possible solution is to be able to pass this parameter (and perhaps others) as an environment variable. So something like this:
instance_key: $NIMBO_KEY
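A minimal sketch of how such values could be resolved, assuming the config has already been parsed into a dict (for example with PyYAML's yaml.safe_load). It uses os.path.expandvars for $VAR and ${VAR} and os.path.expanduser for ~; because expandvars silently leaves unknown variables in place, the sketch raises an error instead. The function name is hypothetical.

```python
import os
import re
from typing import Any, Dict

_UNRESOLVED = re.compile(r"\$(\w+|\{\w+\})")


def expand_env_vars(config: Dict[str, Any]) -> Dict[str, Any]:
    """Expand $VAR / ${VAR} and ~ in string values of an already-parsed config dict."""
    expanded: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, str):
            value = os.path.expanduser(os.path.expandvars(value))
            # os.path.expandvars leaves unknown variables untouched, so fail loudly instead.
            match = _UNRESOLVED.search(value)
            if match:
                raise ValueError(f"{key}: environment variable {match.group(0)} is not set")
        expanded[key] = value
    return expanded
```

With NIMBO_KEY exported differently on each machine (or set as a CI secret), the committed config can stay identical for everyone.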
Cheers and keep up the great work.
We should have a command-line option to create and download instance keys. Careful: the user might not have the right permissions to create them programmatically.
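boto3's EC2 client provides create_key_pair(KeyName=...), whose response includes the private key in KeyMaterial; AWS returns it only once, so it has to be saved immediately. A sketch, assuming a hypothetical create_and_save_key helper that receives an already-built EC2 client. The file is created with owner-only read permissions, which ssh requires for private keys.

```python
import os
from pathlib import Path


def create_and_save_key(ec2_client, key_name: str, directory: str = ".") -> Path:
    """Create an EC2 key pair and write its private key to <directory>/<key_name>.pem."""
    path = Path(directory) / f"{key_name}.pem"
    if path.exists():
        # Refuse before touching AWS, so we never create a key we cannot save.
        raise FileExistsError(path)
    response = ec2_client.create_key_pair(KeyName=key_name)
    # O_EXCL guards against races; mode 0o400 makes the key readable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as fh:
        fh.write(response["KeyMaterial"])
    return path
```

A user without the ec2:CreateKeyPair permission would get a botocore ClientError (code "UnauthorizedOperation"), which the CLI could catch and turn into a hint to create the key in the AWS console instead.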
Really like the concept of nimbo, it is quite lightweight for launching instances. However, I have not been able to give it a try due to AWS key errors. Would love to chat more about using this tool though.
Enable users to setup email alerts for when they forget to terminate instances. This could be done by deploying a lambda function that executes at some time of the day and sends an email to the user.
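A sketch of what such a Lambda function could look like, assuming it is triggered once a day by a scheduled rule, that a hypothetical ALERT_EMAIL environment variable holds an SES-verified address, and that 12 hours is a reasonable (hypothetical) threshold. The filtering logic is kept separate from the AWS calls so it can be tested without credentials; describe_instances returns LaunchTime as a timezone-aware datetime.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List

MAX_AGE = timedelta(hours=12)  # hypothetical threshold


def long_running_instances(reservations: Iterable[Dict], now: datetime, max_age: timedelta = MAX_AGE) -> List[str]:
    """Return IDs of running instances launched more than max_age before now."""
    stale = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            if instance["State"]["Name"] == "running" and now - instance["LaunchTime"] > max_age:
                stale.append(instance["InstanceId"])
    return stale


def handler(event, context):
    """Lambda entry point, meant to be invoked by a daily scheduled rule."""
    import boto3  # bundled with the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    reservations = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=[{"Name": "instance-state-name", "Values": ["running"]}]):
        reservations.extend(page["Reservations"])

    stale = long_running_instances(reservations, datetime.now(timezone.utc))
    if not stale:
        return {"alerted": []}

    recipient = os.environ["ALERT_EMAIL"]  # hypothetical env var; must be verified in SES
    boto3.client("ses").send_email(
        Source=recipient,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": f"{len(stale)} EC2 instance(s) still running"},
            "Body": {"Text": {"Data": "Still running: " + ", ".join(stale)}},
        },
    )
    return {"alerted": stale}
```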
Could you please tag the source? This allows distributions to get the complete source from GitHub if they want.
Thanks
At the moment, when adding the current IP to a security group, a /16 CIDR block is used. This should be a config option, and it should probably default to /32.
Nimbo should also attempt to add the current IP to the current security group automatically, without the user having to run allow-current-ip group_name, to reduce the number of actions the user has to perform.
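A sketch of the CIDR handling, assuming the prefix length comes from a hypothetical config option and defaults to a single host (/32 for IPv4, /128 for IPv6). It uses Python's ipaddress module, AWS's checkip.amazonaws.com service to discover the public IP, and builds the IpPermissions entry that boto3's authorize_security_group_ingress expects for SSH on TCP port 22.

```python
import ipaddress
import urllib.request
from typing import Dict, Optional


def fetch_public_ip() -> str:
    """Ask AWS's checkip service for the caller's public IP address."""
    with urllib.request.urlopen("https://checkip.amazonaws.com", timeout=10) as resp:
        return resp.read().decode().strip()


def ip_to_cidr(ip: str, prefix: Optional[int] = None) -> str:
    """Turn an address into a CIDR block; defaults to a single host (/32 or /128)."""
    addr = ipaddress.ip_address(ip.strip())
    if prefix is None:
        prefix = addr.max_prefixlen
    # strict=False zeroes the host bits, e.g. 203.0.113.57/16 -> 203.0.0.0/16
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def ssh_ingress_permission(cidr: str) -> Dict:
    """IpPermissions entry for EC2 authorize_security_group_ingress (TCP port 22)."""
    network = ipaddress.ip_network(cidr)
    if network.version == 4:
        ranges = {"IpRanges": [{"CidrIp": cidr}]}
    else:
        ranges = {"Ipv6Ranges": [{"CidrIpv6": cidr}]}
    return {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, **ranges}
```

To add the rule automatically on every run, Nimbo could call authorize_security_group_ingress with [ssh_ingress_permission(ip_to_cidr(fetch_public_ip()))] and treat a ClientError with code "InvalidPermission.Duplicate" as "already allowed".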
Reproduced by:
Creating mykey.pem in the AWS console and downloading it
Putting mykey.pem in the project dir
Setting instance_key: mykey.pem in nimbo-config.yml
Running nimbo test-access
This produces load pubkey "mykey.pem": invalid format errors.
Inspecting the key shows that it looks fine, with the correct header/footer, and it can be used to SSH onto EC2 instances manually. Is there something Nimbo does differently when it comes to keys?
Hello,
We have a Docker container that runs as a microservice and uses nimbo to train a model on EC2. However, if two users in the same container were to run a job at the same time, one of them might use the other person's nimbo-config.yml file.
Is there a way to change the nimbo-config.yml file name so that each user can have their own Nimbo configuration file?
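One way this could work, sketched under assumptions: resolve the config path with a clear precedence of an explicit CLI value (for example a hypothetical --config flag), then a hypothetical NIMBO_CONFIG environment variable, then the default nimbo-config.yml in the project root. Neither the flag nor the variable is confirmed to exist in Nimbo.

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_CONFIG_NAME = "nimbo-config.yml"
CONFIG_ENV_VAR = "NIMBO_CONFIG"  # hypothetical environment variable


def resolve_config_path(cli_value: Optional[str] = None, project_root: str = ".") -> Path:
    """Pick the config file: CLI value > environment variable > default name."""
    candidate = cli_value or os.environ.get(CONFIG_ENV_VAR) or DEFAULT_CONFIG_NAME
    path = Path(candidate).expanduser()
    if not path.is_absolute():
        # Relative names are interpreted against the project root, like the default.
        path = Path(project_root) / path
    return path
```

Because environment variables are per process, two users in the same container could each export their own NIMBO_CONFIG without interfering with each other.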