
azavea / raster-vision


An open source library and framework for deep learning on satellite and aerial imagery.

Home Page: https://docs.rastervision.io

License: Other

Shell 1.46% Python 96.73% Dockerfile 0.64% Jupyter Notebook 1.17%
deep-learning computer-vision remote-sensing geospatial object-detection semantic-segmentation classification machine-learning pytorch

raster-vision's Introduction

Raster Vision Logo  


Raster Vision is an open source Python library and framework for building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery).

It has built-in support for chip classification, object detection, and semantic segmentation with backends using PyTorch.

Examples of chip classification, object detection and semantic segmentation

As a library, Raster Vision provides a full suite of utilities for dealing with all aspects of a geospatial deep learning workflow: reading geo-referenced data, training models, making predictions, and writing out predictions in geo-referenced formats.

As a low-code framework, Raster Vision allows users (who don't need to be experts in deep learning!) to quickly and repeatably configure experiments that execute a machine learning pipeline including: analyzing training data, creating training chips, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment.

Overview of Raster Vision workflow

Raster Vision also has built-in support for running experiments in the cloud using AWS Batch.

See the documentation for more details.

Installation

For more details, see the Setup documentation.

Install via pip

You can install Raster Vision directly via pip.

pip install rastervision

Use Pre-built Docker Image

Alternatively, you may use a Docker image. Docker images are published to quay.io (see the tags tab).

We publish a new tag per merge into master, tagged with the first 7 characters of the commit hash. To use the most recent build, pull the image with the latest suffix, e.g. raster-vision:pytorch-latest. Git tags are also published, with the GitHub tag name as the Docker tag suffix.

Build Docker Image

You can also build a Docker image from scratch yourself. After cloning this repo, run docker/build, and then run the container using docker/run.

Usage Examples and Tutorials

Non-developers may find it easiest to use Raster Vision as a low-code framework where Raster Vision handles all the complexities and the user only has to configure a few parameters. The Quickstart guide is a good entry-point into this. More advanced examples can be found on the Examples page.

For developers and those looking to dive deeper or combine Raster Vision with their own code, the best starting point is Usage Overview, followed by Basic Concepts and Tutorials.

Contact and Support

You can ask questions and talk to developers (let us know what you're working on!) at:

Contributing

For more information, see the Contribution page.

We are happy to take contributions! It is best to get in touch with the maintainers about larger features or design changes before starting the work, as it will make the process of accepting changes smoother.

Everyone who contributes code to Raster Vision will be asked to sign the Azavea CLA, which is based on the Apache CLA.

  1. Download a copy of the Raster Vision Individual Contributor License Agreement or the Raster Vision Corporate Contributor License Agreement

  2. Print out the CLAs and sign them, or use PDF software that allows placement of a signature image.

  3. Send the CLAs to Azavea by one of:

  • Scanning and emailing the document to [email protected]
  • Faxing a copy to +1-215-925-2600.
  • Mailing a hardcopy to: Azavea, 990 Spring Garden Street, 5th Floor, Philadelphia, PA 19107 USA

Licenses

Raster Vision is licensed under the Apache 2 license. See license here.

3rd party licenses for all dependencies used by Raster Vision can be found here.

raster-vision's People

Contributors

adeelh, ameier3, ammarsdc, citerana, colekettler, dependabot[bot], dustymugs, echeipesh, giswqs, jamesmcclain, jeromemaleski, jisantuc, jmorrison1847, jpolchlo, lewfish, lmbak, lossyrob, mbertrand, mccalluc, mmcs-work, nholeman, notthatbreezy, nripeshn, perliedman, pomadchin, rbreslow, simonkassel, theoway, tnation14, uribo


raster-vision's Issues

TypeError: super() takes at least 1 argument (0 given)

Hello! I've been trying to deploy and run the code, but I've been running into this type error -

I'm currently trying to run the code from the VM using Python 2.7.11

Traceback (most recent call last):
  File "/opt/conda/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/opt/conda/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/src/rastervision/semseg/data/factory.py", line 72, in <module>
    factory = SemsegDataGeneratorFactory()
  File "/opt/src/rastervision/semseg/data/factory.py", line 16, in __init__
    super().__init__([POTSDAM, VAIHINGEN], [IMAGE, NUMPY])
TypeError: super() takes at least 1 argument (0 given)

From what I've found on Stack Overflow, it's a syntax difference between Python 2 and 3. I tried a workaround by altering the super().__init__ call, but it led to more complications down the road.
Does anybody know a solution to this? Should the script be running conda with Python 3?

Thanks!
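
The traceback indicates the code targets Python 3, where the zero-argument super() is valid; under Python 2 the class and instance must be passed explicitly. A minimal sketch of the two forms (the constants and base class below are stand-ins for the real ones in factory.py, not the project's actual code):

# Placeholders standing in for the real constants and base class in factory.py.
POTSDAM, VAIHINGEN, IMAGE, NUMPY = 'potsdam', 'vaihingen', 'image', 'numpy'

class DataGeneratorFactory(object):
    def __init__(self, datasets, formats):
        self.datasets = datasets
        self.formats = formats

class SemsegDataGeneratorFactory(DataGeneratorFactory):
    def __init__(self):
        # Python 3 only: super().__init__(...)
        # Works on both Python 2 and 3: pass the class and instance explicitly.
        super(SemsegDataGeneratorFactory, self).__init__(
            [POTSDAM, VAIHINGEN], [IMAGE, NUMPY])

That said, the simplest fix is likely to run the scripts under Python 3 rather than 2.7.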

Preprocessing problem: TIFF reading error

After running "rastervision.semseg.data.factory isprs/vaihingen all preprocess" for a while, the size of the files in "gts_for_participants" and "dsm" increased a lot. After reaching the 4 GB size limit of TIFF images, I got the following error:

rasterio._err.CPLE_AppDefined: TIFFReadDirectory:Failed to read directory at offset 4294655492

It seems to be a similar problem to:
raster-foundry/raster-foundry#209

But the solution there is not working for this issue.
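
One possible workaround is to write the preprocessed rasters as BigTIFF, which is not subject to the 4 GB limit of classic TIFF. A minimal sketch using rasterio (the paths are illustrative, not the project's actual preprocessing code):

import rasterio

# Copy a raster, asking GDAL's GTiff driver to write BigTIFF so the output
# is not constrained by the 4 GB classic-TIFF limit.
with rasterio.open('input.tif') as src:
    profile = src.profile
    data = src.read()

profile.update(BIGTIFF='YES')
with rasterio.open('output.tif', 'w', **profile) as dst:
    dst.write(data)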

Generate negative chips

The tiff_chipper.py script only generates chips that contain at least one object. We should make it so it attempts to generate some number of negative chips that contain no objects. I don't think this is typically needed, but it seems like it might help with the ships dataset, since the ships always have the same surroundings (sea or docks) and so the network never sees land.
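
A rough sketch of how negative chips could be sampled, assuming the image is a NumPy array and object boxes are (ymin, xmin, ymax, xmax) tuples (function and argument names are illustrative, not the existing tiff_chipper.py API):

import random

def boxes_intersect(a, b):
    ay0, ax0, ay1, ax1 = a
    by0, bx0, by1, bx1 = b
    return not (ax1 <= bx0 or bx1 <= ax0 or ay1 <= by0 or by1 <= ay0)

def sample_negative_chips(img, object_boxes, chip_size, n_chips, max_tries=1000):
    # Randomly place windows and keep only those overlapping no object box.
    h, w = img.shape[:2]
    chips = []
    tries = 0
    while len(chips) < n_chips and tries < max_tries:
        tries += 1
        y = random.randint(0, h - chip_size)
        x = random.randint(0, w - chip_size)
        window = (y, x, y + chip_size, x + chip_size)
        if not any(boxes_intersect(window, b) for b in object_boxes):
            chips.append(img[y:y + chip_size, x:x + chip_size])
    return chips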

Out of memory when restarting training process

In theory, if you run train_ec2.sh and exit before training completes, and then restart the job, it should pick up where it left off. But this doesn't actually work because on the second run, TF emits an out of memory error. We should isolate the exact conditions when this occurs and file an issue in the repo for TF Object Detection. We should also check to see if there's an issue already there.

Terminating AWS Batch jobs broken

If you terminate an AWS Batch job via the console or AWS CLI, it doesn't work. This happens when running the train_ec2.sh script. The only way to kill it is to kill the underlying spot instance. We think this is due to a bug in Batch, and should submit a bug report to AWS.

Cannot make train_ratio 1.0

To maximize performance, sometimes one would like to train a model using the entire development dataset, in other words, using a train_ratio of 1.0. This causes the program to crash. As a workaround, we have been using a train_ratio of 0.99.

FileNotFoundError: [Errno 2] No such file or directory: 'aws'

I am running the code locally on my machine

python3 -m rastervision.run experiments/semseg/4_20_17/fcn_0.json

but got an error saying

  File "/home/sizhexi/keras/raster-vision/src/rastervision/common/utils.py", line 145, in s3_sync
    call(['aws', 's3', 'sync', src_path, dst_path])
  File "/usr/lib/python3.4/subprocess.py", line 537, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.4/subprocess.py", line 1457, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'aws'

how can I resolve this issue?
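
The FileNotFoundError means the aws executable is not on the PATH, so subprocess cannot launch it; installing the AWS CLI (for example with pip install awscli) or skipping S3 syncing for local runs should resolve it. A defensive sketch of the check (the function mirrors the s3_sync in the traceback, but the guard itself is illustrative):

import shutil
from subprocess import call

def s3_sync(src_path, dst_path):
    # shutil.which returns None when 'aws' is not on the PATH (Python 3.3+).
    if shutil.which('aws') is None:
        raise RuntimeError(
            'The AWS CLI was not found; install it (e.g. pip install awscli) '
            'or run without S3 syncing.')
    call(['aws', 's3', 'sync', src_path, dst_path])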

Prediction problem: eval_target_size

When the "eval_target_size" is set to [2000, 2000], the testing images are evaluated for the full resolution [6000, 6000], but only the results of the last tile [4001:6000, 4001:6000] are saved. Can somebody please fix it?

Not using best_model.h5

The train_model task saves the best model (according to the validation loss) as best_model.h5. However, this model is not used by tasks that run subsequently during the same invocation of the program. Instead, the model as trained by the final epoch is used by subsequent tasks, unless the program is run again, at which point the best_model.h5 file will be loaded.

Use same validation set each epoch

Currently the validation set is shuffled and randomly augmented, which adds to the variance of the validation loss each epoch. It makes more sense to use the same validation set for each epoch so that epochs are more directly comparable.
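
A minimal sketch of one way to do this: draw the split once with a fixed seed and reuse the same indices every epoch, with augmentation disabled for the validation generator (the sizes below are just examples):

import numpy as np

num_samples, num_val = 1000, 200  # example sizes
rng = np.random.RandomState(42)   # fixed seed -> same split every run
indices = rng.permutation(num_samples)
val_indices = indices[:num_val]    # identical validation set each epoch
train_indices = indices[num_val:]  # the rest can still be shuffled per epoch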

Experiment running problem

I am trying to test the code with a JSON file from 4_20_17. The variable "epochs" is defined within "train_stages" in this JSON, but rastervision/common/options.py uses the epochs variable without looking inside train_stages. So when I run "python -m rastervision.run experiments/....", it gives the following error:

root@0a7cd1dcc9ef:/opt/src# python -m rastervision.run experiments/semseg/4_20_17/fcn_0.json
Using TensorFlow backend.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/src/rastervision/run.py", line 47, in <module>
    run_tasks()
  File "/opt/src/rastervision/run.py", line 34, in run_tasks
    options = make_options(options_dict)
  File "/opt/src/rastervision/options.py", line 15, in make_options
    options = SemsegOptions(options_dict)
  File "/opt/src/rastervision/semseg/options.py", line 16, in __init__
    super().__init__(options)
  File "/opt/src/rastervision/common/options.py", line 18, in __init__
    self.epochs = options['epochs']
KeyError: 'epochs'

Can somebody please help to look at it?
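
The JSON files under 4_20_17 nest "epochs" inside "train_stages", while common/options.py expects a top-level "epochs" key. Either add a top-level value to the JSON, or make the options code fall back to the stages. A sketch of the latter (the fallback of summing per-stage epochs is an assumption about the intended semantics):

def get_epochs(options_dict):
    # Prefer a top-level 'epochs'; otherwise sum the values declared per stage.
    if 'epochs' in options_dict:
        return options_dict['epochs']
    stages = options_dict.get('train_stages', [])
    return sum(stage.get('epochs', 0) for stage in stages)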

./scripts/infra destroy does not terminate instances

After running ./scripts/infra destroy, the spot fleet and associated instances should be terminated. However, it only seems to terminate the spot fleet, leaving the instances running. The following error message is printed:

aws_spot_fleet_request.gpu_worker: Still destroying... (5m0s elapsed)
Error applying plan:

1 error(s) occurred:

* aws_spot_fleet_request.gpu_worker: fleet still has (1) running instances

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Ensemble run error

I was trying to run an ensemble experiment and got the following error. "load_options" is missing. Can somebody please help?

Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/src/rastervision/run.py", line 47, in <module>
    run_tasks()
  File "/opt/src/rastervision/run.py", line 37, in run_tasks
    runner.run_tasks(options, args.tasks)
  File "/opt/src/rastervision/common/run.py", line 67, in run_tasks
    self.run_path, self.options, self.generator, use_best=True)
  File "/opt/src/rastervision/common/models/factory.py", line 40, in get_model
    model = self.make_model(options, generator)
  File "/opt/src/rastervision/semseg/models/factory.py", line 88, in make_model
    models, active_input_inds_list = self.load_ensemble_models(options)
  File "/opt/src/rastervision/semseg/models/factory.py", line 28, in load_ensemble_models
    from ..options import load_options
ImportError: cannot import name 'load_options'

Validation tasks fail on ensemble_avg when using validation folds of different sizes

In an ensemble experiment that uses folds to get full coverage of the data set by using different validation sets, if the validation sets are not all of the same expected size then all the validation tasks will fail in the ensemble's aggregation job. This means that we cannot complete the following tasks: validation_probs, train_thresholds, train_predict, validation_predict or test_predict.

Failing unittest for test_predict compute_predictions2

Running ./scripts/test on develop yields this test failure:

----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/src/rastervision/tagging/tasks/test/test_predict.py", line 48, in test_compute_predictions2
    self.assertTrue(np.array_equal(y_pred, y))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 11 tests in 0.012s

FAILED (failures=1)

Switch to windowed reading in make_windows.py

This script generates a bunch of windows of a TIFF file by loading the whole image into memory and then slicing it. This won't work for very large images, so we should use rasterio's ability to do windowed reading.
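
A minimal sketch of windowed reading with rasterio, which reads one chip-sized window at a time instead of the whole image (the chip size and path are illustrative):

import rasterio
from rasterio.windows import Window

chip_size = 256
with rasterio.open('image.tif') as src:
    for row in range(0, src.height, chip_size):
        for col in range(0, src.width, chip_size):
            window = Window(col, row,
                            min(chip_size, src.width - col),
                            min(chip_size, src.height - row))
            # Only this window is read into memory.
            chip = src.read(window=window)
            # ... save the chip or hand it to the chipping logic ...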

Make predict.py script use >1 image per batch

If running on GPU, we can probably get a big speedup by feeding in > 1 image per batch when making predictions. However, I'm not convinced we'll want to use GPUs for prediction in batch mode considering how fast it runs on CPU and the overhead for booting up the instance.
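
A sketch of what batched prediction might look like, assuming a Keras-style model.predict and a list of same-sized window arrays (the batch size is illustrative):

import numpy as np

def predict_in_batches(model, windows, batch_size=16):
    preds = []
    for i in range(0, len(windows), batch_size):
        # One forward pass per batch of windows instead of one per window.
        batch = np.stack(windows[i:i + batch_size])
        preds.extend(model.predict(batch))
    return preds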

Jobs stuck in Runnable

We've noticed that sometimes jobs get stuck in a runnable state on Batch. I just logged into the instance for such a job and found that the ecs-agent is not running as it is supposed to. (See http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-introspection.html)

[ec2-user@ip-172-31-45-73 ecs]$ curl http://localhost:51678/v1/metadata
curl: (7) Failed to connect to localhost port 51678: Connection refused

I also looked at the ecs-agent log, which contains error messages which I don't currently understand.

[ec2-user@ip-172-31-45-73 ecs]$ pwd
/var/log/ecs
[ec2-user@ip-172-31-45-73 ecs]$ cat ecs-init.log.2017-07-06-20
2017-07-06T20:21:28Z [INFO] Network error connecting to docker, backing off for '1.14777941s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:29Z [INFO] Network error connecting to docker, backing off for '2.282153551s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:31Z [INFO] Network error connecting to docker, backing off for '4.466145821s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:36Z [INFO] Network error connecting to docker, backing off for '5.235010051s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:41Z [INFO] Network error connecting to docker, backing off for '5.287113937s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:46Z [ERROR] dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:46Z [INFO] Network error connecting to docker, backing off for '1.14777941s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:21:48Z [INFO] Network error connecting to docker, backing off for '2.282153551s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-07-06T20:22:17Z [INFO] post-stop
2017-07-06T20:22:17Z [INFO] Cleaning up the credentials endpoint setup for Amazon EC2 Container Service Agent
2017-07-06T20:22:17Z [ERROR] Error performing action 'delete' for credentials proxy endpoint route: exit status 1; raw output: iptables: No chain/target/match by that name.

2017-07-06T20:22:17Z [ERROR] Error performing action 'delete' for credentials proxy endpoint route: exit status 1; raw output: iptables: No chain/target/match by that name.

pre-trained weights

Hi --

I am looking to do semantic segmentation (no need for tagging, detection, or object recognition at this stage) and I was wondering if anyone has the pre-trained weights available for download? It can be on any of the models, just want to test it out for now.

Thanks vm

Add inception/xception model

At the end of the Planet Kaggle competition, we found that adding Inception to the ensemble improved the score. Unfortunately, the code we were using was problematic because it doesn't assign a unique name to each layer, so we can't use model.load_weights with it. The automatically generated layer names aren't consistent each time you create a new model (the names contain a counter that is globally incremented), so they can't be used after the "best" model is loaded from disk once training finishes. We can fix this in a few ways: fix the underlying problem in Keras or report an issue, add unique layer names to the inception code, or use the Xception model built into Keras, which appears to be an improved version of Inception and has unique layer names.
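
If we go the Keras-builtin route, a minimal sketch of pulling in Xception as a backbone (the arguments are the standard keras.applications ones; the input shape is just an example):

from keras.applications import Xception

# include_top=False drops the ImageNet classification head so the network can
# be used as a backbone; every layer has a stable, unique name, so weights can
# be reloaded with model.load_weights after training.
model = Xception(include_top=False, weights='imagenet',
                 input_shape=(256, 256, 3))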

Run experiments in parallel on AWS

Currently, we can run experiments in parallel by spinning up some instances and then manually SSHing into each one, and running a command for each experiment. This doesn't scale well, so we would like to find a way of automating this. Some ideas include:

OpenAI has a blog post about their infrastructure setup which we should mine for ideas https://openai.com/blog/infrastructure-for-deep-learning/

Specify order of channels in TIFF

Currently, we assume that the channels in TIFFs are ordered as BGR-IR. To make this more general, we should be able to specify the order of the channels as a command line argument. That is, unless BGR-IR is standard and the assumption is safe.
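
A sketch of what the command-line option might look like, reordering bands after reading (the flag name and default are illustrative; the default below maps BGR-IR files to R, G, B, IR):

import argparse
import rasterio

parser = argparse.ArgumentParser()
parser.add_argument('--channel-order', type=int, nargs='+', default=[2, 1, 0, 3],
                    help='Band indices mapping file order to R, G, B, IR '
                         '(default assumes BGR-IR files).')
args = parser.parse_args()

with rasterio.open('image.tif') as src:
    bands = src.read()                  # shape: (bands, height, width)
    bands = bands[args.channel_order]   # reorder to R, G, B, IR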

The dataset required in README.txt may be incorrect?

Hi, the dataset required in /keras-semantic-segmentation-develop/src/model_training/README.txt may be wrong. I tried to run your code and got this error: /opt/data/datasets/potsdam/4_Ortho_RGBIR/top_potsdam_2_10_RGBIR.tif: No such file or directory. It seems that the program needs the 4_Ortho_RGBIR dataset instead of 3_Ortho_IRRG as written in the README file.

Bad behavior when generators try to download files on local machines

Related to #66

The way we use "done.txt" can get really out of sorts if you download the data and place it in your local directory outside of the process.
Also if a download fails, it writes the done.txt as if it succeeded.

We need to refactor this part of the code to be more robust.

Validation accuracy

Validation accuracy in score.json is different from the one in log.txt/stdout.txt.

BTW, the avg_accuracy in the validation_eval.py should be called overall_accuracy.

Allow easily changing size of training and validation sets

Currently, if you want to change the size of the training and validation sets, you need to run the preprocess.py script again, which puts files into train and validation directories. Now that we aren't using the Keras data generator there's no need to keep the files in separate directories. Instead, each data generator could be given a list of files it can use.

Debug plot not showing all detections

There are more detections that show up when viewing the GeoJSON file in QGIS than show up in the debug plot generated by aggregate_predictions.py.

Fine Tuning ...

Hello,

Can you guide me on how to fine-tune the model for a different dataset and, naturally, a different number of classes?

I suppose I need to create the dataset in your format first.

Downloading of files for different runs can not happen when needed

In download_dataset, the code checks if the data directory is there, and if not makes it and downloads data. If it is, it considers the data already downloaded. For machines that are being reused in the ECS cluster, it could run multiple trainings. If one run needs different files from another, it can skip downloading important data. We should check for the existence of the files at a per-file level (which presents a slight challenge because zip files are deleted for good reason).
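
A sketch of the per-file check (the manifest of expected files would have to come from each dataset's configuration; the names here are illustrative):

import os

def missing_files(data_dir, required_files):
    # Check each required file instead of only the directory or done.txt,
    # so a run needing different files still triggers a download.
    return [f for f in required_files
            if not os.path.isfile(os.path.join(data_dir, f))]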

Setup script fails

Running ./scripts/setup fails with the following. Running vagrant provision works, though.

fatal: [raster_vision]: FAILED! => {"changed": false, "failed": true, "msg": "dpkg --force-confdef --force-confold -i /tmp/nvidia-docker.deb failed", "stderr": "start: Job failed to start\ninvoke-rc.d: initscript nvidia-docker, action \"start\" failed.\ndpkg: error processing package nvidia-docker (--install):\n subprocess installed post-installation script returned error exit status 1\nErrors were encountered while processing:\n nvidia-docker\n", "stdout": "Selecting previously unselected package nvidia-docker.\n(Reading database ... 65558 files and directories currently installed.)\nPreparing to unpack /tmp/nvidia-docker.deb ...\nUnpacking nvidia-docker (1.0.0~rc.3-1) ...\nSetting up nvidia-docker (1.0.0~rc.3-1) ...\nConfiguring user\nSetting up permissions\nProcessing triggers for ureadahead (0.100.0-16) ...\n", "stdout_lines": ["Selecting previously unselected package nvidia-docker.", "(Reading database ... 65558 files and directories currently installed.)", "Preparing to unpack /tmp/nvidia-docker.deb ...", "Unpacking nvidia-docker (1.0.0~rc.3-1) ...", "Setting up nvidia-docker (1.0.0~rc.3-1) ...", "Configuring user", "Setting up permissions", "Processing triggers for ureadahead (0.100.0-16) ..."]}

Infrastructure improvements

@lewfish and I discussed some potential areas for infrastructure improvements:

Reducing EC2 Boot times

  • Build docker-images locally and push them to quay.io/ECR, rather than copying the entire local workspace up to EC2 for builds.
  • Get latest source code onto the EC2 instance by cloning this repository using cloud-init or a command run over SSH
  • Replace cloud-config installation of nvidia-docker with our own AMI, based on ami-50b4f047, that has nvidia-docker installed.

Optimizations for multi-user collaborations

  • Identify a user's EC2 instance via key-pair name: scripts/run uses aws ec2 wait to determine when Spot Fleet requests are complete. However, if multiple users are running the script at the same time, the script may wait for the wrong Spot Fleet request to finish. One way to avoid this is to allow users to use their own (named) key-pairs, and add key-name as an additional filter to aws ec2 wait instance-running.

  • Use the AWS CLI to terminate instances once the jobs have finished. Ideally we'd be able to run this from inside the container, but that would either require stored credentials in the container (a security risk), or access to the EC2 metadata service.

Concurrent processing across instances

We want to be able to run the same command with different parameters, simultaneously, across all available workers. We settled on the following:

  • Add instance name/index $INSTANCE_ID as an environment variable via cloud-config
  • Store the command parameters in files namespaced by instance ID. Instances would access a file like $INSTANCE_ID-command-options.json.

Cloud init fails occasionally

Intermittently, the cloud-init fails due to a problem with installing packages. The following is from the log file. The result is that the data and docker image aren't downloaded to the instance.

Get:17 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 libwebp5 amd64 0.4.4-1 [165 kB]
Get:18 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 libwebpmux1 amd64 0.4.4-1 [14.2 kB]
Get:19 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 python3-pil amd64 3.1.2-0ubuntu1 [312 kB]
Get:20 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 python3-pygments all 2.1+dfsg-1 [520 kB]
Get:21 http://security.ubuntu.com/ubuntu xenial-security/main amd64 libtiff5 amd64 4.0.6-1ubuntu0.1 [146 kB]
Get:22 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 unzip amd64 6.0-20ubuntu1 [158 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 3,703 kB in 0s (9,560 kB/s)
dpkg: error: dpkg status database is locked by another process
E: Sub-process /usr/bin/dpkg returned an error code (2)
/var/lib/cloud/instance/scripts/part-001: line 9: aws: command not found
/var/lib/cloud/instance/scripts/part-001: line 10: pushd: data/datasets: No such file or directory
/var/lib/cloud/instance/scripts/part-001: line 11: unzip: command not found
/var/lib/cloud/instance/scripts/part-001: line 12: popd: directory stack empty
Cloning into 'keras-semantic-segmentation'...
/var/lib/cloud/instance/scripts/part-001: line 18: aws: command not found
Using default tag: latest
Pulling repository 002496907356.dkr.ecr.us-east-1.amazonaws.com/keras-semantic-segmentation-gpu
unauthorized: authentication required
Cloud-init v. 0.7.8 running 'modules:final' at Mon, 27 Feb 2017 19:26:15 +0000. Up 65.95 seconds.
2017-02-27 19:26:48,013 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [1]
2017-02-27 19:26:48,015 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2017-02-27 19:26:48,016 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed

Problem starting `./scripts/run --gpu`

After starting up 10 EC2 instances, I ran ./scripts/run --gpu on each of them. On 4 of the instances, it hung for a bit and then produced the following error message.

docker: Error response from daemon: create nvidia_driver_375.51: Post http://%2Frun%2Fdocker%2Fplugins%2Fnvidia-docker.sock/VolumeDriver.Create: http: ContentLength=44 with Body length 0.
See 'docker run --help'.

Invoking the command a second time worked.

Model improvements

Right now we get 85.8% accuracy on the Potsdam dataset using a single U-Net-like model trained from scratch on RGBIRD. From reading papers, it seems that this is about the best you can do with a single off-the-shelf model trained from scratch. To improve accuracy, we might explore the following ideas:

  • Do some hyperparameter tuning once we have the ability to run lots of experiments in parallel.
  • Implement a more proper version of U-Net. The version we have takes some shortcuts for ease of implementation that might result in lower accuracy.
  • Use a more state-of-the-art model like 100 Layer Tiramisu, which is like a U-Net but uses DenseNets as its base network. https://arxiv.org/abs/1611.09326
  • Train a bunch of models and combine them as an ensemble (a minimal averaging sketch follows this list). Top entries in the contest do this.
  • Use a pre-trained model on the RGB channels and then fuse with a trained-from-scratch model on the IR and D channels. How and where you do the fusing probably matters. Top entries in the contest do this.
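
For the ensemble idea above, the combination step can be as simple as averaging per-pixel class probabilities; a minimal sketch assuming Keras-style models:

import numpy as np

def ensemble_predict(models, batch):
    # Average each model's per-pixel class probabilities; the argmax of the
    # mean is the ensemble's prediction.
    probs = np.mean([m.predict(batch) for m in models], axis=0)
    return np.argmax(probs, axis=-1)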
