Issue Description
Encountered the following error when attempting to train a Precision Level 3 MaskRCNN model using EPD. The error appeared after the .yaml parser was integrated into P3Trainer.py.
Traceback (most recent call last):
  File "tools/train_net.py", line 201, in <module>
    main()
  File "tools/train_net.py", line 194, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 72, in train
    start_iter=arguments["iteration"],
  File "/home/cardboardvoice/anaconda3/envs/p3_trainer/lib/python3.6/site-packages/maskrcnn_benchmark-0.1-py3.6-linux-x86_64.egg/maskrcnn_benchmark/data/build.py", line 164, in make_data_loader
    sampler = make_data_sampler(dataset, shuffle, is_distributed)
  File "/home/cardboardvoice/anaconda3/envs/p3_trainer/lib/python3.6/site-packages/maskrcnn_benchmark-0.1-py3.6-linux-x86_64.egg/maskrcnn_benchmark/data/build.py", line 64, in make_data_sampler
    sampler = torch.utils.data.sampler.RandomSampler(dataset)
  File "/home/cardboardvoice/anaconda3/envs/p3_trainer/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
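The final line of the traceback indicates that RandomSampler was handed a dataset whose length is 0, so the loader found no training samples. A minimal sketch of a pre-flight check that reports this earlier and more readably is shown below. The helper name require_non_empty is hypothetical and not part of maskrcnn_benchmark or EPD, and the hint about annotation and image paths in the message is only an assumption about one possible cause.

```python
def require_non_empty(dataset, name="dataset"):
    """Return the dataset size, or raise a descriptive error if it is empty.

    Hypothetical helper for illustration; works with any object that
    supports len(), such as a torch Dataset or a plain list.
    """
    size = len(dataset)
    if size == 0:
        # Assumption: an empty dataset often points at wrong annotation or
        # image paths, but other causes are possible.
        raise ValueError(
            "{} contains 0 samples; check the annotation file and image "
            "directory paths in the training config".format(name)
        )
    return size
```

Calling such a check on the dataset just before the sampler is built would show whether the dataset itself is empty, independent of the sampler code.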
Expected Behaviour
The training is supposed to proceed without any errors.
Actual Behaviour
Training fails with the aforementioned error in the terminal.
Error Source
Currently, the integration of the .yaml parser in P3Trainer.py seems to be the root cause.
[ Update as of 20220812 ]: The integration of the parser is not the root cause, since the EPD v0.2.2 P3 training workflow fails as well. The cause is therefore more likely an as-yet-unidentified dependency conflict.
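To narrow down a suspected dependency conflict, it may help to record the installed package versions in a working environment and in the failing one, then compare them. The sketch below is one way to do that under stated assumptions: the function names installed_versions and diff_versions are hypothetical, and the package names passed in are up to whoever runs it. It falls back to pkg_resources on Python 3.6, the version shown in the traceback paths.

```python
def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    try:
        # Available on Python 3.8 and later.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for Python 3.6/3.7 (requires setuptools).
        from pkg_resources import DistributionNotFound as PackageNotFoundError
        from pkg_resources import get_distribution

        def version(name):
            return get_distribution(name).version

    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def diff_versions(before, after):
    """Return {name: (before_version, after_version)} for entries that differ."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

For example, installed_versions(["torch", "maskrcnn_benchmark"]) could be run in each environment and the two results passed to diff_versions. Those two package names are taken from the traceback; any others would need to be added by hand.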