michigancog / gaze-attention
Integrating Human Gaze into Attention for Egocentric Activity Recognition (WACV 2021)
License: MIT License
Hi, and thanks for the great work.
I'm having difficulty reproducing the result reported on the EGTEA Gaze+ dataset. Using your provided trained weights and following the code-usage guide, I get these numbers on the different splits:
test_split1.txt: acc: 36.85, 49.21 / 0:22:27
test_split2.txt: acc: 47.65, 57.44 / 0:15:22
test_split3.txt: acc: 50.41, 60.14 / 0:15:07
How can I reproduce the reported 69.73%?
I'm using the default parameters:
parser.add_argument('--mode', default='test', help='train | test')
parser.add_argument('--crop', type=int, default=224, help='for spatial cropping')
parser.add_argument('--trange', type=int, default=24, help='temporal range')
parser.add_argument('--stride', type=int, default=8, help='pooling stride for gaze prediction')
parser.add_argument('--b', type=int, default=1, help='batch size')
parser.add_argument('--wd', type=float, default=4e-5, help='weight decay')
parser.add_argument('--it1', type=int, default=8000, help='first decay point')
parser.add_argument('--it2', type=int, default=15000, help='second decay point')
parser.add_argument('--iters', type=int, default=18000, help='number of max iterations for training')
parser.add_argument('--lr', type=float, default=0.032, help='learning rate')
parser.add_argument('--ngpu', type=int, default=1, help='number of GPUs to use')
parser.add_argument('--eps', type=float, default=1000, help='epsilon for the gradient estimator')
parser.add_argument('--anneal', type=float, default=1e-3, help='anneal rate for epsilon')
parser.add_argument('--datapath', default='dataset', help='path to dataset')
parser.add_argument('--datasplit', type=int, default=1, help='data split for the cross validation')
parser.add_argument('--weight', default='weights/i3d_iga_best1_base.pt', help='path to the weight file for the base network')
parser.add_argument('--seed', type=int, default=1, help='random seed')
parser.add_argument('--test_sparse', action='store_true', help='whether to test sparsely for fast evaluation')
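One thing worth checking: the repo ships split-specific weight files (the default is `weights/i3d_iga_best1_base.pt`, i.e. split 1), so each split may need to be evaluated with its matching weights before comparing against the paper's number. A dry-run sketch that only prints the commands; the `best2`/`best3` file naming is an assumption extrapolated from the split-1 name:

```shell
# Dry run: print one evaluation command per cross-validation split.
# Assumption: weights for split s are named i3d_iga_best{s}_base.pt,
# mirroring the split-1 file shipped with the repo.
for s in 1 2 3; do
  echo "python3 main.py --mode test --datasplit $s" \
       "--weight weights/i3d_iga_best${s}_base.pt"
done
```

Drop the `echo` to actually run the evaluations once the file names are confirmed.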
Hello, I'm trying to reproduce your research results.
When I downloaded the EGTEA Gaze+ dataset, I found that it doesn't contain the RGB or flow images.
So I created them using denseflow, which is recommended in another discussion.
After creating the images in dataset/images_flow and dataset/images_rgb, I ran the code below as instructed.
python3 main.py --mode test
However, an error occurred and I failed to reproduce your result.
datasplit: 1
weight: weights/i3d_iga_best1_base.pt
mode: test
test_sparse: False
loading weight file: weights/i3d_iga_best1_base.pt
loading weight file: weights/i3d_iga_best1_gaze.pt
loading weight file: weights/i3d_iga_best1_attn.pt
run on cuda
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-160540-162131-F003848-F003896/u/0048.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-160540-162131-F003848-F003896/v/0048.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-169741-171463-F004069-F004120/u/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-169741-171463-F004069-F004120/v/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-682170-683940-F016368-F016419/u/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-682170-683940-F016368-F016419/v/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-767250-769130-F018410-F018464/u/0054.jpg'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-767250-769130-F018410-F018464/v/0054.jpg'): can't open/read file: check file path/integrity
Traceback (most recent call last):
File "main.py", line 261, in <module>
main()
File "main.py", line 85, in main
test(test_loader, model_base, model_gaze, model_attn, num_action)
File "main.py", line 222, in test
for i, (rgb, flow, label) in enumerate(test_loader, 1):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 461, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/hayashide/catkin_ws/src/third_party/Gaze-Attention/dataset.py", line 67, in __getitem__
fimg = np.concatenate((fimgu[..., np.newaxis], fimgv[..., np.newaxis]), -1)
TypeError: 'NoneType' object is not subscriptable
Please give me some advice on how to resolve this issue.
Thanks,
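The `TypeError: 'NoneType' object is not subscriptable` comes from `cv2.imread` returning `None` for the missing files flagged in the warnings, so the root cause is an incomplete flow extraction rather than the model code. A small sketch (my own helper, not part of the repo) to scan a clip directory for frames present in only one of the u/v directions before starting the DataLoader, following the path layout visible in the warnings (`dataset/images_flow/<clip>/u/NNNN.jpg`):

```python
from pathlib import Path

def find_missing_flow_frames(clip_dir):
    """Return frame filenames present in only one of the clip's u/ and v/
    flow directories. cv2.imread returns None for a missing file, which is
    what crashes the dataloader, so finding gaps up front avoids failing
    mid-evaluation."""
    clip = Path(clip_dir)
    u = {p.name for p in (clip / "u").glob("*.jpg")}
    v = {p.name for p in (clip / "v").glob("*.jpg")}
    # Symmetric difference: frames in one direction but not the other.
    return sorted(u ^ v)
```

Running this over every clip listed in the test split should pinpoint which clips need their flow re-extracted.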
In the dataloader, I noticed the gaze data is read from .npy files. There must be an intermediate step where you preprocessed the gaze data from the text file in the original labels. Are there any instructions on how to do that? I read in the paper that you use a one-hot encoding where the value "1" is stored in the x-y grid cell the gaze points at. I just want to confirm whether my understanding is correct.
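For concreteness, here is a minimal sketch of the one-hot encoding described above, assuming a normalized gaze point in [0, 1]² and a square spatial grid; the grid size and normalization convention are my assumptions, not values taken from the repository:

```python
import numpy as np

def gaze_to_onehot(gx, gy, grid=7):
    """Encode a normalized gaze point (gx, gy in [0, 1]) as a one-hot map
    on a grid x grid layout: a single 1 at the cell the gaze falls in.
    The grid size is a placeholder, not the repo's actual value."""
    onehot = np.zeros((grid, grid), dtype=np.float32)
    # Clamp to the last cell so gaze points on the right/bottom border
    # still map to a valid index.
    ix = min(int(gx * grid), grid - 1)
    iy = min(int(gy * grid), grid - 1)
    onehot[iy, ix] = 1.0
    return onehot
```

Saving one such array per frame with `np.save` would produce .npy files of the shape the dataloader appears to expect, if this reading of the paper is right.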
Hi, I have a problem with the dataset preparation. In particular, your code takes single RGB images as input, but the EGTEA dataset provides only videos (which they call "cropped clips"), as you can see in the README of the EGTEA Gaze+ dataset.
Can you explain how you obtained those images? Or can you provide a link to them?
Thanks
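Until the authors confirm their exact extraction settings, one way to get per-frame JPEGs from the cropped clips is ffmpeg. A sketch; the clip location, the output layout (`dataset/images_rgb/<clip>/NNNN.jpg`, mirroring the flow paths seen in the error logs above), and the quality setting are all assumptions:

```shell
# Sketch: extract RGB frames from each cropped clip with ffmpeg.
# -qscale:v 2 is a near-lossless JPEG quality; adjust as needed.
for clip in dataset/cropped_clips/*.mp4; do
  name=$(basename "$clip" .mp4)
  mkdir -p "dataset/images_rgb/$name"
  ffmpeg -i "$clip" -qscale:v 2 "dataset/images_rgb/$name/%04d.jpg"
done
```

Note the `%04d` numbering starts at 0001; if the dataloader expects 0-based frame indices, add `-start_number 0`.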
Hi,
I went through the dataset.py code and I can't find exactly where you make use of the gaze information provided by EGTEA Gaze+.
What are pmaps, and how do I generate these numpy files?
dataset.py, line 50 (commit 1e80952)
Thanks,
From where do we obtain the pmaps (as used in training)?
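If "pmaps" stands for gaze probability maps (my guess; the repo does not define the term), a common way to build them is a 2-D Gaussian centered on the gaze point, normalized to sum to 1. A sketch under that assumption; the map size and sigma are placeholders:

```python
import numpy as np

def gaussian_pmap(gx, gy, size=64, sigma=3.0):
    """Build a probability map for a normalized gaze point (gx, gy in
    [0, 1]): a 2-D Gaussian centered on the gaze location, normalized so
    the map sums to 1. size and sigma are illustrative defaults only."""
    ys, xs = np.mgrid[0:size, 0:size]
    cx, cy = gx * (size - 1), gy * (size - 1)
    pmap = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return pmap / pmap.sum()
```

Saving one map per frame with `np.save` would give .npy files; whether the repo uses a Gaussian or the one-hot encoding from the paper would need confirmation from the authors.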
Hi,
Could you share the util functions you used to generate the Grad-CAM visualizations?
Thanks.
How can I prepare the EGTEA dataset?
The original dataset provides cropped video clips, but your code expects image files.
Is there a recommended way to convert them?
Hello,
I'm just wondering if you have uploaded the optical flow for this dataset somewhere, so I don't have to compute it again?