
mbari-org / deepsea-ai


DeepSea-AI is a Python package that simplifies processing deep-sea video in AWS (https://aws.amazon.com) from the command line.

Home Page: http://docs.mbari.org/deepsea-ai/

License: Apache License 2.0

Python 88.62% Dockerfile 1.59% JavaScript 0.96% TypeScript 5.83% Shell 3.00%
aws tracking-by-detection video-processing video-processing-pipeline aws-cdk fathomnet object-detection


deepsea-ai's Issues

deepsea-ai requests access to directory /data

I have tried running deepsea-ai in a conda environment per the installation documentation, and also in a dedicated venv with a clean install from requirements.txt (Python 3.10.9, pip install deepsea-ai).

deepsea-ai ecsprocess
--config 902005_vaa.ini
--clean
--upload
--cluster y5x315k
--job "Ventana-Dive-V4488-one"
--args "--agnostic-nms --iou-thres=0.5 --conf-thres=0.1 --imgsz=640"
--input /mnt/M3/mezzanine/Ventana/2023/08/4488/V4488_20230803T163130Z_h265.mp4
--exclude trashm
--dry-run

throws the error
FileNotFoundError: [Errno 2] No such file or directory: '/data/database'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/bin/deepsea-ai", line 8, in <module>
    sys.exit(cli())
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/deepsea_ai/main.py", line 122, in ecs_process
    custom_config = init(log_prefix="dsai_ecsprocess", config=config)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/deepsea_ai/main.py", line 49, in init
    custom_config = cfg.Config(config)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/deepsea_ai/config/config.py", line 55, in __init__
    self.job_db_path.mkdir(parents=True, exist_ok=True)
  File "/Users/duane/.pyenv/versions/3.10.9/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/Users/duane/.pyenv/versions/3.10.9/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: '/data'

I cannot make a /data/database directory.
How do I redirect the output to a different directory?
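A minimal sketch of one possible fix (the function name and fallback location are hypothetical, not the actual deepsea-ai API): try the configured database directory first and fall back to a user-writable one when the filesystem is read-only, as in the traceback above.

```python
from pathlib import Path


def resolve_job_db_path(preferred: Path, fallback: Path) -> Path:
    """Return a writable job-database directory.

    Hypothetical fix sketch: try the configured location first and fall
    back to a per-user directory when it cannot be created (e.g. the
    read-only /data seen in the traceback above).
    """
    try:
        preferred.mkdir(parents=True, exist_ok=True)
        return preferred
    except OSError:
        fallback.mkdir(parents=True, exist_ok=True)
        return fallback
```

With something along these lines in config.py, a user could point the fallback at, say, ~/.deepsea-ai when /data is not writable.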

Difference between the number of tracks fetched and the number of possible annotations reported

@danellecline I'm fetching the events from M3_EVENT_TRACKS database and grouping them as tracks in my app.

I'm seeing a difference between the number of tracks and the number of possible annotations reported in your dashboard. For example, media 93 (D1371_20210801T154045Z_h264.mp4) has 66 tracks, but the dashboard says there are 38 possible annotations, and I'm wondering why they're different. Are you applying a filter based on confidence or some other criterion? Thanks

ref: mbari-org/vars-feedback#37

Add check for docker setup

When running

deepsea-ai setup --mirror

on an account where Docker is not running, or the user is not logged in, a really ugly error occurs (see below).
Check the Docker status before allowing this command to run.

"/Users/duane/opt/anaconda3/lib/python3.7/subprocess.py", line 363, in check_call
  raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'login', '--username', 'AWS', '--password',

‘eyJwYXlsb2FkIjoidFBiVFk3QmU3b1FwQW9UZnR6WjhWUFlBZHJIaFFnc0NkYnNrLzZrcFAxSWJZS2luNF
k5czVIaDQ1OFBrejJjMzAzMElMR0lyK1RZREwzQlpRaWJlWDVWWDNKQ1oySXpiTmNRYXVja2l5amhqRH
I2YW0vamgydXpqTGRFR3pYMlBsMFp6YzNna0FITENwTllIYW9RaWNvTUtMQ245TW4zOTF4OWRxaGxhNFAwak9IKzVKNGU4WVBHWHRRZlhqTEhxaElYWjN4YTArVyt5dy9GTEFNeW9MdjhsN2FKN0RkUVc2M
HNQT1dPbEE5b0QzREo2Sk8rMkQ2dGNpMGJ6QW5kOVNpZFlzbUgrNjdWU0RuRW5oWkVObHh2aUVQMm15RWFSRVFTMkdBdVRQcmdsNklKUT0iLCJ2ZXJzaW9uIjoiMiIsInR5cGUiOiJEQVRBX0tFWSIsImV4cGlyYXRpb24iOjE2NjgyNDk5NDh9', 'https://548531997526.dkr.ecr.us-west-2.amazonaws.com/']' returned non-zero exit status 1.
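A minimal pre-flight check along the requested lines (a sketch, not the actual deepsea-ai code) could verify that the docker CLI exists and the daemon answers before attempting docker login:

```python
import shutil
import subprocess


def docker_ready() -> bool:
    """Return True when the docker CLI is installed and the daemon responds.

    Sketch of the requested pre-flight check; `docker info` exits non-zero
    when the daemon is not running or the user cannot reach it.
    """
    if shutil.which("docker") is None:
        return False
    try:
        subprocess.run(["docker", "info"], check=True,
                       capture_output=True, timeout=30)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
```

The setup command could then fail early with a readable message instead of the raw CalledProcessError above.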

Add configurable ecs mbari:stage tag

The ECS stack tag is currently hard-coded as mbari:stage=prod in `cdk/app/bin/deploy_stack.ts`.

We need to make it configurable through a command-line argument or an environment variable.
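One option is an environment override with the current value as the default; a sketch assuming a hypothetical DEEPSEA_AI_STAGE variable (the variable name is illustrative, not an existing deepsea-ai setting):

```python
import os

# Hypothetical environment override for the ECS stack tag, falling back
# to the value that is currently hard-coded in the stack definition.
stage = os.environ.get("DEEPSEA_AI_STAGE", "prod")
tags = {"mbari:stage": stage}
```

The same value could equally be read in deploy_stack.ts via process.env before the stack is synthesized.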

Add config.ini support

Add support for a config.ini file:

  • Support a custom company name instead of MBARI
  • Allow replacement of the docker image URIs
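The standard-library configparser covers both items; a sketch with hypothetical section and key names (the layout below is illustrative, not the shipped format):

```python
import configparser

# Hypothetical config.ini layout; section and key names are illustrative.
sample = """
[organization]
name = MBARI

[docker]
strongsort_uri = mbari/strongsort-yolov5:latest
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)
org = cfg.get("organization", "name", fallback="MBARI")
image_uri = cfg.get("docker", "strongsort_uri")
```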

Add support for more hyperparameters

This refers to the process and ecsprocess commands.

Per the request in the Google sheet https://docs.google.com/spreadsheets/d/1YLM649lFmUq5k9pZVVy9aNK-Rf2YWD95iiTi1MWcznM/edit#gid=0, line item 11.

When kicking off a job, make sure that all of the hyperparameters below are available and that they work; for example, ensure that we can turn on agnostic-nms and adjust conf-thres and iou-thres.
Here's a list of flags we might want to use or modify per run:
parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path or triton URL')
parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
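To confirm that the flags above can be forwarded through an --args string like the one in the ecsprocess example earlier on this page, a minimal parser mirroring those definitions (--weights omitted) round-trips the values:

```python
import argparse
import shlex

# Parser mirroring the yolov5 flags listed above.
parser = argparse.ArgumentParser()
parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640])
parser.add_argument('--conf-thres', type=float, default=0.25)
parser.add_argument('--iou-thres', type=float, default=0.45)
parser.add_argument('--max-det', type=int, default=1000)
parser.add_argument('--classes', nargs='+', type=int)
parser.add_argument('--agnostic-nms', action='store_true')

# The --args string as passed to process/ecsprocess, split shell-style.
opts = parser.parse_args(
    shlex.split('--agnostic-nms --iou-thres=0.5 --conf-thres=0.1 --imgsz=640'))
```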

Tasks

Set up object detection ML server externally.

This task is to set up an object detection server externally.

This is relatively simple to do in AWS.

Reference:

yolov5 inference

IMPORTANT. This was done in our new AWS sub-account by setting up credentials through:

https://mbari.awsapps.com/start

using an MBARI username (e.g. duane) and an MBARI password.

Keys can be set up either through environment variables (e.g. AWS_REGION, AWS_SECRET_ACCESS_KEY, etc.) or in a ~/.aws/credentials file.

Improved reports

Add one report per job. Currently, job .txt reports are generated per day, per job, which is more verbose than needed.
Kris suggested generating a single .txt covering the first day of submission and the last day of processing. Start and end dates are captured in the reports, so a single file can record both the time submitted and the time finished.
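The collapse described above amounts to keeping only the earliest submission timestamp and the latest completion timestamp across the per-day reports; a sketch with made-up timestamps:

```python
from datetime import datetime

# Per-day (submitted, finished) pairs as they would appear across the
# current daily reports; the timestamps here are illustrative.
daily = [
    ("2023-08-01T10:00:00", "2023-08-01T23:59:00"),
    ("2023-08-02T00:00:00", "2023-08-02T14:30:00"),
]

submitted = min(datetime.fromisoformat(s) for s, _ in daily)
finished = max(datetime.fromisoformat(f) for _, f in daily)
```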

Tasks

Documentation on running mbari/ecs-autoscale

We need documentation on how to run the ECS stack with the docker image mbari/ecs-autoscale.

Here is the plan

Tasks

Need an updated mbari/strongsort-yolov5 in the ECR repositories

Because of an update in a dependency, we need an updated mbari/strongsort-yolov5 image in the ECR repositories for the AWS accounts where this package is run, e.g. the 901103-bio AWS ECR repo.

https://hub.docker.com/repository/docker/mbari/strongsort-yolov5/general is currently at 1.10.0, which fails when run from doris.shore.mbari.org:

Traceback (most recent call last):
  File "track.py", line 322, in <module>
    main(opt)
  File "track.py", line 317, in main
    run(**vars(opt))
  File "/env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "track.py", line 121, in run
    StrongSORT(
  File "/app/yolo_tracking/strong_sort/strong_sort.py", line 46, in __init__
    gdown.download(model_url, str(model_weights), quiet=False)
  File "/env/lib/python3.8/site-packages/gdown/download.py", line 259, in download
    filename_from_url = m.groups()[0]

Add ecsshutdown command

This is a new feature request: add a shutdown command to halt any data processing started with the ecsprocess command, e.g.

deepsea-ai ecsshutdown -u \
        --job "DocRicketts 2021/08 with benthic model" \
        --cluster benthic33k 

Tasks

  • Purge SQS VIDEO_QUEUE queue to halt any autoscaling
  • Force shutdown of any running ECS tasks
  • Remove any completed video tracks from S3 as queued in the TRACK_QUEUE
  • Update job status in sqlite

Improved AWS session token handling

@duane-edgington can you please investigate the best practice here for handling session tokens?

In the new MBARI VAA AWS sub-account, I ran into a session timeout error when training, which was confusing: I thought the training had failed, but it had not, and I submitted the training job twice; this was an expensive mistake 😞

I'm unclear on how to deal with the timeout through this pip module. In our previous AWS account, this was not an issue, as we didn't have a session token.

Appreciate any help with this.

Verify that having a "." (dot) in the video file name is OK for process jobs

We received an error when running a process job with the video file name
D232_20110526T093251.130Z_alt_h264.mp4

on the mounted MBARI file server
smb://titan.shore.mbari.org/m3/Projects/VAA/Benchmarks/mezzanine/DocRicketts/2011/05/232/D232_20110526T093251.130Z_alt_h264.mp4

the message was
Found s3://902005-benchmark//M3/Projects/VAA/Benchmarks/mezzanine/DocRicketts/2011/05/232/D232_20110526T093251.130Z_alt_h264.mp4 ...skipping upload

It is not clear why the upload was skipped, since a file of that name was not in the target AWS S3 bucket.

Is it because having a "." (dot) in the file name before the file extension caused confusion, or something else?
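One plausible failure mode, assuming the upload check derives an S3 key by splitting the file name on its first dot rather than using the real extension (this is a guess at the cause, not confirmed from the code):

```python
from pathlib import Path

name = "D232_20110526T093251.130Z_alt_h264.mp4"

# pathlib treats only the final dot as the extension separator:
p = Path(name)
suffix = p.suffix   # the true extension
stem = p.stem       # everything before the final dot

# A naive split on the first dot truncates the name, which could make an
# existence check match the wrong (or a nonexistent) S3 key:
naive = name.split('.', 1)[0]
```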

Skip flag for subfolders

@lonnylundsten (request): Add a flag that allows us to skip some subfolders, because there will be times when we want to process most of the dives (subfolders) for a given month but not all of them.

This would be useful for both the process and ecsprocess commands, as both recursively walk the input folder.
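A sketch of the flag's core, assuming a simple name-based exclude set (the function and flag semantics are illustrative, not the actual implementation):

```python
import os


def walk_videos(root, exclude=frozenset(), extensions=('.mp4', '.mov')):
    """Yield video files under root, skipping subfolders named in exclude.

    Sketch of the requested skip/exclude behavior; pruning dirnames in
    place stops os.walk from descending into excluded dives.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for f in filenames:
            if f.lower().endswith(extensions):
                yield os.path.join(dirpath, f)
```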

Remove . from upload bucket prefix during training

@duane-edgington ran this command

deepsea-ai train --model yolov5x --instance-type ml.p3.16xlarge --labels labels.tar.gz --images images.tar.gz --label-map /Users/duane/amazon-studio-demos/benthic2017/voc/yolo.names --config /Users/duane/amazon-studio-demos/902005config.txt --input-s3 s3://902005-training-dev/ --output-s3 s3://902005-checkpoints-dev/ --epochs 1 --batch-size 80

which uploaded the data with the dot included, i.e. s3://902005-training-dev/./training/images.tar.gz

This is presumably because the data was in the same directory the command was run from. It is probably functionally acceptable, but better to remove the dot. This is a request to change that.
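A one-line normalization would address this; PurePosixPath drops single-dot components, so the sketch below assumes the uploader builds its keys as POSIX-style paths:

```python
from pathlib import PurePosixPath

# Key as currently produced when the data sits in the working directory:
raw_key = "./training/images.tar.gz"

# PurePosixPath normalizes away the leading './' during parsing:
clean_key = str(PurePosixPath(raw_key))
```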

confirm behavior of --input-s3 parameter

On my recent run of deepsea-ai train, with the parameters
--input-s3 s3://901103-models-deploy/megadetector/megafish_ROV_weights.pt
--resume True

the log file indicates that, instead of using the indicated weights for transfer learning, training resumed from the model's last training run. That is not terrible in this case, since the input-s3 .pt file indicated was the original starting point, but it could be unexpected.

@danellecline @duane-edgington
