missing image data for training about open-sora-plan (OPEN)

quantumiracle commented on August 16, 2024

missing image data for training

from open-sora-plan.

Comments (20)

LinB203 commented on August 16, 2024

I checked the program that was uploading data in the background, and it had been interrupted, possibly by some unknown error. I'm trying to fix it. Maybe we should upload zip files instead of individual video files.

quantumiracle commented on August 16, 2024

Another error I got from t2v training is:

/opensora/dataset/t2v_datasets.py", line 76, in get_video
    frame_idx = self.vid_cap_list[idx]['frame_idx']
KeyError: 'frame_idx'

where `frame_idx` does not exist in the JSON file.

LinB203 commented on August 16, 2024

Hi,

When launching the t2v training, it also requires specifying an image data path, as here. However, the HuggingFace dataset repo contains no image-text dataset, which leads to an error when launching training:
FileNotFoundError: [Errno 2] No such file or directory: '/dxyl_data02/anno_jsons/human_images_162094.json'
How can I fix this?

https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/blob/main/anno_jsons/human_images_162094.json

LinB203 commented on August 16, 2024

Another error I got from t2v training is:
/opensora/dataset/t2v_datasets.py", line 76, in get_video
    frame_idx = self.vid_cap_list[idx]['frame_idx']
KeyError: 'frame_idx'
where `frame_idx` does not exist in the JSON file.

Are you using the v1.1 code? The v1.1 code should use the annotations from here.
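The `KeyError` above is consistent with feeding v1.0-style annotations to the v1.1 loader. One way to catch such a mismatch before launching training is to scan the annotation JSON for the key the loader reads. This is a minimal sketch, assuming the file is a JSON list of per-clip dicts (as `self.vid_cap_list[idx]['frame_idx']` suggests); `find_missing_keys` and `check_annotation_file` are hypothetical helpers, not part of Open-Sora-Plan, and only `frame_idx` is confirmed as a required key by the traceback.

```python
import json
from pathlib import Path

# Keys the v1.1 dataset code reads per entry. Only 'frame_idx' is confirmed by
# the traceback in this thread; extend this tuple as an assumption if needed.
REQUIRED_KEYS = ("frame_idx",)


def find_missing_keys(entries, required=REQUIRED_KEYS):
    """Return (index, missing_keys) for every entry that lacks a required key."""
    problems = []
    for i, entry in enumerate(entries):
        missing = [k for k in required if k not in entry]
        if missing:
            problems.append((i, missing))
    return problems


def check_annotation_file(path):
    """Load an annotation file (assumed: a JSON list of dicts) and check its keys."""
    with Path(path).open(encoding="utf-8") as f:
        entries = json.load(f)
    return find_missing_keys(entries)


if __name__ == "__main__":
    import sys

    # Usage: python check_annos.py anno1.json anno2.json ...
    for p in sys.argv[1:]:
        problems = check_annotation_file(p)
        if problems:
            print(f"{p}: {len(problems)} entries missing keys, e.g. {problems[0]}")
        else:
            print(f"{p}: OK")
```

An empty result means every entry has the keys the loader indexes; a non-empty one usually points to annotations from a different dataset version.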

quantumiracle commented on August 16, 2024

Thanks for the quick reply.

It seems I'm using the v1.0 dataset.

quantumiracle commented on August 16, 2024

Hi,

when I try to download the v1.1 dataset with:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="LanguageBind/Open-Sora-Plan-v1.1.0", repo_type="dataset", local_dir=data_dir)

I got this error:

...
Fetching 117685 files:  12%|████████████▉                                                                                               | 14039/117685 [14:29<1:46:57, 16.15it/s]
4173980_resize1080p.mp4: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.55M/4.55M [00:00<00:00, 247MB/s]
4173972_resize1080p.mp4: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.49M/5.49M [00:00<00:00, 40.7MB/s]
4173976_resize1080p.mp4: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8.94M/8.94M [00:00<00:00, 31.9MB/s]
4173975_resize1080p.mp4: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10.7M/10.7M [00:00<00:00, 45.5MB/s]
4173977_resize1080p.mp4: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12.3M/12.3M [00:00<00:00, 66.7MB/s]
4173973_resize1080p.mp4: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.1M/26.1M [00:00<00:00, 83.2MB/s]
4173981_resize1080p.mp4: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18.8M/18.8M [00:00<00:00, 126MB/s]
4173982_resize1080p.mp4: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24.6M/24.6M [00:00<00:00, 298MB/s]
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
    raise HfHubHTTPError(message, response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError:

403 Forbidden: None.
Cannot access content at: https://cdn-lfs-us-1.huggingface.co/repos/d1/a4/d1a47faaa1475f32c7e503cebcd6029bdf94c4a148ceb23e2f5e052d50d3f02a/dc4d652445209b5ad6ad292bc6755cc067abf187c739ee2bf8e8b75b3b2a9d90?response-content-disposition=inline%3B+filename*%3DUTF-8%27%274173971_resize1080p.mp4%3B+filename%3D%224173971_resize1080p.mp4%22%3B&response-content-type=video%2Fmp4&Expires=1718648001&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxODY0ODAwMX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2QxL2E0L2QxYTQ3ZmFhYTE0NzVmMzJjN2U1MDNjZWJjZDYwMjliZGY5NGM0YTE0OGNlYjIzZTJmNWUwNTJkNTBkM2YwMmEvZGM0ZDY1MjQ0NTIwOWI1YWQ2YWQyOTJiYzY3NTVjYzA2N2FiZjE4N2M3MzllZTJiZjhlOGI3NWIzYjJhOWQ5MD9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=dnYc6UEjLcqCarh~mWlzSybLf505FdK8ClHTvIKnnY4Pc2nMLnsp5fxAUSLz3u24xOSQoykAxOG2h2kKCgMG-yKe4bUGRkLrLNJwn75Xl1C5L2iza3-wE6LlnDAre6Ju81QWolv1Wy6fIK0OHWJVMhIHquUqKyMHiaOXl7CLktLQg0POb-wga8HB9HFLDdsUm~1a2uH2mSAOcdAQz9teTMOJ4HCIOfwuuPaJiYK0g0NPeiddWMP4U~8R3cgghVLzq67YFrmmdcpT6Rv-K1F4LE4nLIo9LwQmATHzbI2y1Xgmzs9wFN4U7aGJ6Hq7avfaFplKLOK7nvV-enaJ-t0EOA__&Key-Pair-Id=K2FPYV99P2N66Q.
If you are trying to create or update content,make sure you have a token with the `write` role.

LinB203 commented on August 16, 2024

Could it be a network error? By the way, the full Pexels dataset has not been completely uploaded yet.
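If the failures are transient (the thread has not established this, and a genuine permissions problem will fail on every attempt), wrapping `snapshot_download` in a retry loop avoids restarting the download by hand. This is a minimal sketch; `retry_with_backoff` is a hypothetical helper, the delays are arbitrary, and `data_dir` is a placeholder path. Recent `huggingface_hub` releases should skip files already present in `local_dir`, so each retry mostly resumes rather than restarts; verify this against your installed version.

```python
import random
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0, max_delay=60.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` calls have failed.

    Between attempts, waits base_delay * 2**k seconds (capped at max_delay)
    plus up to 10% random jitter. Re-raises the last exception if every
    attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    # Requires `pip install huggingface_hub`; data_dir is a placeholder.
    from huggingface_hub import snapshot_download

    data_dir = "./Open-Sora-Plan-v1.1.0"
    retry_with_backoff(lambda: snapshot_download(
        repo_id="LanguageBind/Open-Sora-Plan-v1.1.0",
        repo_type="dataset",
        local_dir=data_dir,
    ))
```

Narrowing `retry_on` to the HTTP or connection errors you actually see keeps real bugs from being retried silently.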

quantumiracle commented on August 16, 2024

Hi,

I think this is an access issue rather than a network problem, since it reports:

If you are trying to create or update content,make sure you have a token with the `write` role.

I tried both snapshot_download and git clone directly, and both give this error. Any idea why this happens?

quantumiracle commented on August 16, 2024

Yes, compressed tar.gz files would be good.

It may also be good to host each dataset at a separate URL and provide a download script. Downloading the entire dataset and getting interrupted midway costs a lot of time.
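Until the data is split across separate URLs, `huggingface_hub` can already fetch part of a dataset repo: `snapshot_download` accepts `allow_patterns`, and `list_repo_files` shows what is available. This is a minimal sketch of such a download script; only the `anno_jsons` folder is confirmed by this thread, so read any other folder names from the listing rather than guessing them. `top_level_folders` and `allow_patterns_for` are hypothetical helpers, and the `local_dir` value is a placeholder.

```python
def top_level_folders(paths):
    """Return the sorted top-level folder names found in a list of repo file paths."""
    return sorted({p.split("/", 1)[0] for p in paths if "/" in p})


def allow_patterns_for(folders):
    """Build glob patterns that restrict snapshot_download to the given folders."""
    # huggingface_hub matches these with fnmatch, where '*' also crosses '/',
    # so "folder/*" covers nested files as well.
    return [f"{f.strip('/')}/*" for f in folders]


if __name__ == "__main__":
    # Requires `pip install huggingface_hub`.
    from huggingface_hub import list_repo_files, snapshot_download

    repo = "LanguageBind/Open-Sora-Plan-v1.1.0"
    files = list_repo_files(repo, repo_type="dataset")
    print(top_level_folders(files))  # inspect, then choose what to fetch

    snapshot_download(
        repo_id=repo,
        repo_type="dataset",
        local_dir="./Open-Sora-Plan-v1.1.0",  # placeholder path
        allow_patterns=allow_patterns_for(["anno_jsons"]),
    )
```

Fetching one folder at a time also bounds how much work an interruption can throw away.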

physercoe commented on August 16, 2024

Hi,

I think this is an access issue rather than a network problem, since it reports:
If you are trying to create or update content,make sure you have a token with the `write` role.
I tried both snapshot_download and git clone directly, and both give this error. Any idea why this happens?

I ran into the same problem. Please provide compressed tar.gz files instead of many separate small files.

quantumiracle commented on August 16, 2024

@LinB203 Hi, when will the dataset be ready? I could help curate the data if you need.

LinB203 commented on August 16, 2024

Hi all, because of an exception while uploading the Pexels data, we decided to package it and upload it again. This process will take about a week.
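Packaging many small video files into a few archives, as requested above, could look like the following: a minimal sketch that groups files into size-bounded `.tar.gz` shards, so a failed transfer costs one shard rather than the whole upload or download. The 5 GiB default, the `shard_00000.tar.gz` naming, and `pack_into_shards` itself are assumptions for illustration, not what the maintainers actually used.

```python
import tarfile
from pathlib import Path


def pack_into_shards(src_dir, out_dir, max_shard_bytes=5 * 1024**3, prefix="shard"):
    """Pack every file under src_dir into .tar.gz shards of about max_shard_bytes.

    Sizes are measured before compression (MP4 video barely compresses further),
    and a single file larger than the limit gets a shard of its own.
    Returns the list of shard paths written.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.rglob("*") if p.is_file())

    shards, batch, batch_bytes = [], [], 0

    def flush():
        nonlocal batch, batch_bytes
        if not batch:
            return
        path = out / f"{prefix}_{len(shards):05d}.tar.gz"
        with tarfile.open(path, "w:gz") as tar:
            for f in batch:
                # Store paths relative to src_dir so archives extract cleanly.
                tar.add(f, arcname=str(f.relative_to(src)))
        shards.append(path)
        batch, batch_bytes = [], 0

    for f in files:
        size = f.stat().st_size
        if batch and batch_bytes + size > max_shard_bytes:
            flush()
        batch.append(f)
        batch_bytes += size
    flush()
    return shards


if __name__ == "__main__":
    import sys

    # Usage: python pack.py <video_dir> <output_dir>
    for shard in pack_into_shards(sys.argv[1], sys.argv[2]):
        print(shard)
```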
