
mbari-org / deepsea-ai


DeepSea-AI is a Python package that simplifies processing deep-sea video in AWS (https://aws.amazon.com) from the command line.

Home Page: http://docs.mbari.org/deepsea-ai/

License: Apache License 2.0

Python 88.62% Dockerfile 1.59% JavaScript 0.96% TypeScript 5.83% Shell 3.00%
aws tracking-by-detection video-processing video-processing-pipeline aws-cdk fathomnet object-detection


deepsea-ai's Issues

deepsea-ai requests access to directory /data

I have tried running deepsea-ai in a conda environment per the installation documentation, and also in a dedicated venv with a clean install from requirements.txt (Python 3.10.9, pip install deepsea-ai).

deepsea-ai ecsprocess
--config 902005_vaa.ini
--clean
--upload
--cluster y5x315k
--job "Ventana-Dive-V4488-one"
--args "--agnostic-nms --iou-thres=0.5 --conf-thres=0.1 --imgsz=640"
--input /mnt/M3/mezzanine/Ventana/2023/08/4488/V4488_20230803T163130Z_h265.mp4
--exclude trashm
--dry-run

throws the error
FileNotFoundError: [Errno 2] No such file or directory: '/data/database'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/bin/deepsea-ai", line 8, in <module>
    sys.exit(cli())
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/deepsea_ai/main.py", line 122, in ecs_process
    custom_config = init(log_prefix="dsai_ecsprocess", config=config)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/deepsea_ai/main.py", line 49, in init
    custom_config = cfg.Config(config)
  File "/Users/duane/new-deepsea-ai/deepsea-ai/.venv/lib/python3.10/site-packages/deepsea_ai/config/config.py", line 55, in __init__
    self.job_db_path.mkdir(parents=True, exist_ok=True)
  File "/Users/duane/.pyenv/versions/3.10.9/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/Users/duane/.pyenv/versions/3.10.9/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: '/data'

I cannot make a /data/database directory.
How do I redirect the output to a different directory?
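A minimal sketch of one possible fix (the function name and fallback location are hypothetical, not the actual deepsea-ai API): try the configured database directory first and fall back to a user-writable one when the filesystem is read-only, as in the traceback above.

```python
from pathlib import Path


def resolve_job_db_path(preferred: Path, fallback: Path) -> Path:
    """Return a writable job-database directory.

    Hypothetical fix sketch: try the configured location first and fall
    back to a per-user directory when it cannot be created (e.g. the
    read-only /data seen in the traceback above).
    """
    try:
        preferred.mkdir(parents=True, exist_ok=True)
        return preferred
    except OSError:
        fallback.mkdir(parents=True, exist_ok=True)
        return fallback
```

With something along these lines in config.py, a user could point the fallback at, say, ~/.deepsea-ai when /data is not writable.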

Difference between the number of tracks fetched and the number of possible annotations reported

@danellecline I'm fetching the events from M3_EVENT_TRACKS database and grouping them as tracks in my app.

I'm seeing a difference between the number of tracks and the number of possible annotations reported in your dashboard. For example, media 93 (D1371_20210801T154045Z_h264.mp4) has 66 tracks, but the dashboard says there are 38 possible annotations, and I'm wondering why they're different. Are you applying a filter based on confidence or some other criterion? Thanks

ref: mbari-org/vars-feedback#37

Add check for docker setup

When running

deepsea-ai setup --mirror

on an account where Docker is not running, or the user is not logged in, a really ugly error occurs (see below).
Check the Docker status before allowing this command to run.

"/Users/duane/opt/anaconda3/lib/python3.7/subprocess.py", line 363, in check_call
  raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'login', '--username', 'AWS', '--password',

‘eyJwYXlsb2FkIjoidFBiVFk3QmU3b1FwQW9UZnR6WjhWUFlBZHJIaFFnc0NkYnNrLzZrcFAxSWJZS2luNF
k5czVIaDQ1OFBrejJjMzAzMElMR0lyK1RZREwzQlpRaWJlWDVWWDNKQ1oySXpiTmNRYXVja2l5amhqRH
I2YW0vamgydXpqTGRFR3pYMlBsMFp6YzNna0FITENwTllIYW9RaWNvTUtMQ245TW4zOTF4OWRxaGxhNFAwak9IKzVKNGU4WVBHWHRRZlhqTEhxaElYWjN4YTArVyt5dy9GTEFNeW9MdjhsN2FKN0RkUVc2M
HNQT1dPbEE5b0QzREo2Sk8rMkQ2dGNpMGJ6QW5kOVNpZFlzbUgrNjdWU0RuRW5oWkVObHh2aUVQMm15RWFSRVFTMkdBdVRQcmdsNklKUT0iLCJ2ZXJzaW9uIjoiMiIsInR5cGUiOiJEQVRBX0tFWSIsImV4cGlyYXRpb24iOjE2NjgyNDk5NDh9', 'https://548531997526.dkr.ecr.us-west-2.amazonaws.com/']' returned non-zero exit status 1.
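A minimal pre-flight check along the requested lines (a sketch, not the actual deepsea-ai code) could verify that the docker CLI exists and the daemon answers before attempting docker login:

```python
import shutil
import subprocess


def docker_ready() -> bool:
    """Return True when the docker CLI is installed and the daemon responds.

    Sketch of the requested pre-flight check; `docker info` exits non-zero
    when the daemon is not running or the user cannot reach it.
    """
    if shutil.which("docker") is None:
        return False
    try:
        subprocess.run(["docker", "info"], check=True,
                       capture_output=True, timeout=30)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
```

The setup command could then fail early with a readable message instead of the raw CalledProcessError above.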

Add configurable ecs mbari:stage tag

The ECS stack tag is currently hard-coded as mbari:stage=prod in `cdk/app/bin/deploy_stack.ts`.

We need to make it configurable through a command-line argument or an environment variable.
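One option is an environment override with the current value as the default; a sketch assuming a hypothetical DEEPSEA_AI_STAGE variable (the variable name is illustrative, not an existing deepsea-ai setting):

```python
import os

# Hypothetical environment override for the ECS stack tag, falling back
# to the value that is currently hard-coded in the stack definition.
stage = os.environ.get("DEEPSEA_AI_STAGE", "prod")
tags = {"mbari:stage": stage}
```

The same value could equally be read in deploy_stack.ts via process.env before the stack is synthesized.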

Add config.ini support

Add support for a config.ini file:

  • Support a custom company name instead of MBARI
  • Allow replacement of the docker image URIs
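The standard-library configparser covers both items; a sketch with hypothetical section and key names (the layout below is illustrative, not the shipped format):

```python
import configparser

# Hypothetical config.ini layout; section and key names are illustrative.
sample = """
[organization]
name = MBARI

[docker]
strongsort_uri = mbari/strongsort-yolov5:latest
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)
org = cfg.get("organization", "name", fallback="MBARI")
image_uri = cfg.get("docker", "strongsort_uri")
```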

Add support for more hyperparameters

This refers to the process and ecsprocess commands.

Per the request in the Google sheet https://docs.google.com/spreadsheets/d/1YLM649lFmUq5k9pZVVy9aNK-Rf2YWD95iiTi1MWcznM/edit#gid=0, line item 11.

When kicking off a job, make sure that all of the hyperparameters below are available and that they work; for example, ensure that we can turn on agnostic-nms and adjust conf-thres and iou-thres.
Here's a list of flags we might want to use or modify per run:
parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path or triton URL')
parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
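To confirm that the flags above can be forwarded through an --args string like the one in the ecsprocess example earlier on this page, a minimal parser mirroring those definitions (--weights omitted) round-trips the values:

```python
import argparse
import shlex

# Parser mirroring the yolov5 flags listed above.
parser = argparse.ArgumentParser()
parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640])
parser.add_argument('--conf-thres', type=float, default=0.25)
parser.add_argument('--iou-thres', type=float, default=0.45)
parser.add_argument('--max-det', type=int, default=1000)
parser.add_argument('--classes', nargs='+', type=int)
parser.add_argument('--agnostic-nms', action='store_true')

# The --args string as passed to process/ecsprocess, split shell-style.
opts = parser.parse_args(
    shlex.split('--agnostic-nms --iou-thres=0.5 --conf-thres=0.1 --imgsz=640'))
```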

Tasks

Set up object detection ML server externally.

This task is to set up an object detection server externally.

This is relatively simple to do in AWS.

Reference:

yolov5 inference

IMPORTANT. This was done in our new AWS sub-account by setting up credentials through:

https://mbari.awsapps.com/start

using an MBARI username (e.g. duane) and an MBARI password.

Keys can be set up either through environment variables (e.g. AWS_REGION, AWS_SECRET_ACCESS_KEY, etc.) or in a ~/.aws/credentials file.

Improved reports

Add one report per job. Currently, job .txt reports are generated per day, per job, which is more verbose than needed.
Kris suggested generating a single .txt covering the first day of submission and the last day of processing. Start and end dates are captured in the reports, so a single file can record both the time submitted and the time finished.
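The collapse described above amounts to keeping only the earliest submission timestamp and the latest completion timestamp across the per-day reports; a sketch with made-up timestamps:

```python
from datetime import datetime

# Per-day (submitted, finished) pairs as they would appear across the
# current daily reports; the timestamps here are illustrative.
daily = [
    ("2023-08-01T10:00:00", "2023-08-01T23:59:00"),
    ("2023-08-02T00:00:00", "2023-08-02T14:30:00"),
]

submitted = min(datetime.fromisoformat(s) for s, _ in daily)
finished = max(datetime.fromisoformat(f) for _, f in daily)
```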

Tasks

Documentation on running mbari/ecs-autoscale

We need documentation on how to run the ECS stack with the docker image mbari/ecs-autoscale.

Here is the plan

Tasks

Need an updated mbari/strongsort-yolov5 in the ECR repositories

Because of an update in a dependency, we need an updated mbari/strongsort-yolov5 image in the ECR repositories for the AWS accounts where this package is run, e.g. the 901103-bio AWS ECR repo.

https://hub.docker.com/repository/docker/mbari/strongsort-yolov5/general is currently at 1.10.0, which fails when run from doris.shore.mbari.org:

Traceback (most recent call last):
  File "track.py", line 322, in <module>
    main(opt)
  File "track.py", line 317, in main
    run(**vars(opt))
  File "/env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "track.py", line 121, in run
    StrongSORT(
  File "/app/yolo_tracking/strong_sort/strong_sort.py", line 46, in __init__
    gdown.download(model_url, str(model_weights), quiet=False)
  File "/env/lib/python3.8/site-packages/gdown/download.py", line 259, in download
    filename_from_url = m.groups()[0]

Add ecsshutdown command

This is a new feature request: add a shutdown command to halt any data processing started with the ecsprocess command, e.g.

deepsea-ai ecsshutdown -u \
        --job "DocRicketts 2021/08 with benthic model" \
        --cluster benthic33k 

Tasks

  • Purge SQS VIDEO_QUEUE queue to halt any autoscaling
  • Force shutdown of any running ECS tasks
  • Remove any completed video tracks from S3 as queued in the TRACK_QUEUE
  • Update job status in sqlite

Improved AWS session token handling

@duane-edgington can you please investigate the best practice here for handling session tokens?

In the new MBARI VAA AWS sub-account, I ran into a session timeout error when training, which was confusing: I thought the training had failed, but it had not, and I submitted the training job twice; this was an expensive mistake 😞

I'm unclear on how to deal with the timeout through this pip module. In our previous AWS account, this was not an issue, as we didn't have a session token.

Appreciate any help with this.

Verify that having a "." (dot) in the video file name is OK for process jobs

We received an error when running a process job with the video file name
D232_20110526T093251.130Z_alt_h264.mp4

on the mounted MBARI file server
smb://titan.shore.mbari.org/m3/Projects/VAA/Benchmarks/mezzanine/DocRicketts/2011/05/232/D232_20110526T093251.130Z_alt_h264.mp4

the message was
Found s3://902005-benchmark//M3/Projects/VAA/Benchmarks/mezzanine/DocRicketts/2011/05/232/D232_20110526T093251.130Z_alt_h264.mp4 ...skipping upload

It is not clear why the upload was skipped, since a file of that name was not in the target AWS S3 bucket.

Is it because having a "." (dot) in the file name before the file extension caused confusion, or something else?
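One plausible failure mode, assuming the upload check derives an S3 key by splitting the file name on its first dot rather than using the real extension (this is a guess at the cause, not confirmed from the code):

```python
from pathlib import Path

name = "D232_20110526T093251.130Z_alt_h264.mp4"

# pathlib treats only the final dot as the extension separator:
p = Path(name)
suffix = p.suffix   # the true extension
stem = p.stem       # everything before the final dot

# A naive split on the first dot truncates the name, which could make an
# existence check match the wrong (or a nonexistent) S3 key:
naive = name.split('.', 1)[0]
```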

Skip flag for subfolders

@lonnylundsten (request): Add a flag that allows us to skip some subfolders, because there will be times when we want to process most of the dives (subfolders) for a given month but not all of them.

This would be useful for both the process and ecsprocess commands, as both recursively walk the input folder.
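A sketch of the flag's core, assuming a simple name-based exclude set (the function and flag semantics are illustrative, not the actual implementation):

```python
import os


def walk_videos(root, exclude=frozenset(), extensions=('.mp4', '.mov')):
    """Yield video files under root, skipping subfolders named in exclude.

    Sketch of the requested skip/exclude behavior; pruning dirnames in
    place stops os.walk from descending into excluded dives.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for f in filenames:
            if f.lower().endswith(extensions):
                yield os.path.join(dirpath, f)
```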

Remove . from upload bucket prefix during training

@duane-edgington ran this command

deepsea-ai train --model yolov5x --instance-type ml.p3.16xlarge --labels labels.tar.gz --images images.tar.gz --label-map /Users/duane/amazon-studio-demos/benthic2017/voc/yolo.names --config /Users/duane/amazon-studio-demos/902005config.txt --input-s3 s3://902005-training-dev/ --output-s3 s3://902005-checkpoints-dev/ --epochs 1 --batch-size 80

which uploaded the data with the dot included, i.e. s3://902005-training-dev/./training/images.tar.gz

This is presumably because the data was in the same directory the command was run from. It is probably functionally acceptable, but better to remove the dot. This is a request to change that.
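A one-line normalization would address this; PurePosixPath drops single-dot components, so the sketch below assumes the uploader builds its keys as POSIX-style paths:

```python
from pathlib import PurePosixPath

# Key as currently produced when the data sits in the working directory:
raw_key = "./training/images.tar.gz"

# PurePosixPath normalizes away the leading './' during parsing:
clean_key = str(PurePosixPath(raw_key))
```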

confirm behavior of --input-s3 parameter

On my recent run of deepsea-ai train, with the parameters
--input-s3 s3://901103-models-deploy/megadetector/megafish_ROV_weights.pt
--resume True

the log file indicates that, instead of using the indicated weights for transfer learning, training resumed from the model's last training run. That is not terrible in this case, since the input-s3 .pt file indicated was the original starting point, but it could be unexpected.

@danellecline @duane-edgington
