
michigancog / gaze-attention

22 stars · 3 watchers · 5 forks · 5.46 MB

Integrating Human Gaze into Attention for Egocentric Activity Recognition (WACV 2021)

License: MIT License

Python 100.00%
wacv wacv2021 egocentric-action-recognition direct-optimization gaze-input gtea-gaze-dataset

gaze-attention's People

Contributors: kylemin

gaze-attention's Issues

Reproducing results

Hi and thanks for the great work.

I'm having difficulty reproducing the results reported on the EGTEA Gaze+ dataset. Using your provided trained weights and following the code-usage guide, I get these numbers on the different splits:

test_split1.txt: acc: 36.85, 49.21 / 0:22:27
test_split2.txt: acc: 47.65, 57.44 / 0:15:22
test_split3.txt: acc: 50.41, 60.14 / 0:15:07

How should I reproduce 69.73%?

I'm using the default parameters:

parser.add_argument('--mode', default='test', help='train | test')
parser.add_argument('--crop', type=int, default=224, help='for spatial cropping')
parser.add_argument('--trange', type=int, default=24, help='temporal range')
parser.add_argument('--stride', type=int, default=8, help='pooling stride for gaze prediction')
parser.add_argument('--b', type=int, default=1, help='batch size')
parser.add_argument('--wd', type=float, default=4e-5, help='weight decay')
parser.add_argument('--it1', type=int, default=8000, help='first decay point')
parser.add_argument('--it2', type=int, default=15000, help='second decay point')
parser.add_argument('--iters', type=int, default=18000, help='number of max iterations for training')
parser.add_argument('--lr', type=float, default=0.032, help='learning rate')
parser.add_argument('--ngpu', type=int, default=1, help='number of GPUs to use')
parser.add_argument('--eps', type=float, default=1000, help='epsilon for the gradient estimator')
parser.add_argument('--anneal', type=float, default=1e-3, help='anneal rate for epsilon')

parser.add_argument('--datapath', default='dataset', help='path to dataset')
parser.add_argument('--datasplit', type=int, default=1, help='data split for the cross validation')
parser.add_argument('--weight', default='weights/i3d_iga_best1_base.pt', help='path to the weight file for the base network')
parser.add_argument('--seed', type=int, default=1, help='random seed')
parser.add_argument('--test_sparse', action='store_true', help='whether to test sparsely for fast evaluation')

Reproduction trouble: how to prepare images_flow

Hello, I'm trying to reproduce your research results.

When I downloaded the EGTEA Gaze+ dataset, I found that it doesn't contain RGB or flow images, so I created them with denseflow, which was recommended in another discussion.

After creating the images in dataset/images_flow and dataset/images_rgb, I ran the command below as instructed.
python3 main.py --mode test

However, it failed with the following error:

datasplit:     1
weight:        weights/i3d_iga_best1_base.pt
mode:          test
test_sparse:   False
loading weight file: weights/i3d_iga_best1_base.pt
loading weight file: weights/i3d_iga_best1_gaze.pt
loading weight file: weights/i3d_iga_best1_attn.pt
run on cuda
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-160540-162131-F003848-F003896/u/0048.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-160540-162131-F003848-F003896/v/0048.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-169741-171463-F004069-F004120/u/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P05-R01-PastaSalad-169741-171463-F004069-F004120/v/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-682170-683940-F016368-F016419/u/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-682170-683940-F016368-F016419/v/0051.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-767250-769130-F018410-F018464/u/0054.jpg'): can't open/read file: check file path/integrity
[ WARN:0@…] global loadsave.cpp:244 findDecoder imread_('dataset/images_flow/P04-R06-GreekSalad-767250-769130-F018410-F018464/v/0054.jpg'): can't open/read file: check file path/integrity
Traceback (most recent call last):
  File "main.py", line 261, in <module>
    main()
  File "main.py", line 85, in main
    test(test_loader, model_base, model_gaze, model_attn, num_action)
  File "main.py", line 222, in test
    for i, (rgb, flow, label) in enumerate(test_loader, 1):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hayashide/catkin_ws/src/third_party/Gaze-Attention/dataset.py", line 67, in __getitem__
    fimg = np.concatenate((fimgu[..., np.newaxis], fimgv[..., np.newaxis]), -1)
TypeError: 'NoneType' object is not subscriptable

Please give me some advice to resolve this issue.
Thanks,
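A likely cause of missing final flow frames like the ones in the warnings above: optical-flow extractors compute flow between consecutive frames, so a clip with N RGB frames typically yields only N-1 flow frames. A quick sketch to find clips whose frame counts disagree (the directory layout is inferred from the warning paths; whether the dataloader expects the last flow frame to be duplicated is an assumption):

```python
import os

def check_flow_counts(rgb_root, flow_root):
    """Return {clip: (num_rgb, num_u, num_v)} for every clip whose RGB and
    u/v flow frame counts disagree. Layout assumed from the warning paths:
    images_rgb/<clip>/NNNN.jpg and images_flow/<clip>/{u,v}/NNNN.jpg.
    """
    bad = {}
    for clip in sorted(os.listdir(rgb_root)):
        n_rgb = len(os.listdir(os.path.join(rgb_root, clip)))
        u_dir = os.path.join(flow_root, clip, "u")
        v_dir = os.path.join(flow_root, clip, "v")
        n_u = len(os.listdir(u_dir)) if os.path.isdir(u_dir) else 0
        n_v = len(os.listdir(v_dir)) if os.path.isdir(v_dir) else 0
        if not (n_rgb == n_u == n_v):
            bad[clip] = (n_rgb, n_u, n_v)
    return bad
```

If the mismatch is exactly one frame per clip, duplicating the last u/v image is a common workaround, though whether that matches the authors' preprocessing is unconfirmed.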

Gaze data preparation

In the dataloader, I noticed gaze data is read from .npy files. There must be an intermediate step where you preprocessed the gaze data from the text files in the original labels. Are there any instructions on how to do that? I read in the paper that you use a one-hot encoding where the value 1 is stored at the x-y grid cell the gaze points to. I'd just like to confirm that my understanding is correct.
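If the encoding is indeed one-hot over a spatial grid, a minimal sketch would look like the following (the grid size and the assumption that gaze coordinates are normalized to [0, 1] are mine, not taken from the repo's preprocessing):

```python
import numpy as np

def gaze_to_onehot(x, y, grid_h=7, grid_w=7):
    """Encode a normalized gaze point (x, y in [0, 1]) as a one-hot grid.
    Grid size and normalization convention are assumptions for illustration.
    """
    gmap = np.zeros((grid_h, grid_w), dtype=np.float32)
    col = min(int(x * grid_w), grid_w - 1)  # clamp x == 1.0 to the last column
    row = min(int(y * grid_h), grid_h - 1)  # clamp y == 1.0 to the last row
    gmap[row, col] = 1.0
    return gmap
```

Frames with no valid gaze label (e.g. blinks in the original annotations) would presumably need special handling, such as an all-zero map.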

EGTEA preparation

Hi, I have a problem with the dataset preparation. In particular, your code takes single RGB images as input, but the EGTEA dataset provides only videos (which they call cropped clips), as you can see in the README of the EGTEA Gaze+ dataset:
[EGTEA Gaze+ dataset README screenshot]
Can you explain how you obtained those images? Or can you provide a link to them?
Thanks

What's pmap?

Where do we obtain the pmaps used in training?

grad-cam generation

Hi,
Could you share the util functions you used to generate the Grad-CAM visualizations?

Thanks.

Data preparation

How can I prepare the EGTEA dataset?
The original dataset has cropped video clips, but your code is based on image files.
Is there any way to convert them?
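One common way to turn the cropped clips into numbered frame images is ffmpeg. A sketch that builds the command (the 24 fps rate, JPEG quality, and 4-digit naming are assumptions chosen to match the frame names seen in the dataloader warnings, not the authors' confirmed settings):

```python
def ffmpeg_frames_cmd(clip_path, out_dir, fps=24):
    """Build an ffmpeg argv list that dumps a cropped clip to numbered
    JPEGs (0001.jpg, 0002.jpg, ...). fps and naming are assumptions.
    """
    return [
        "ffmpeg", "-i", clip_path,
        "-vf", "fps={}".format(fps),  # resample to a fixed frame rate
        "-qscale:v", "2",             # high-quality JPEG output
        "{}/%04d.jpg".format(out_dir),
    ]

print(" ".join(ffmpeg_frames_cmd("OP01-R01-PastaSalad.mp4", "dataset/images_rgb/OP01-R01-PastaSalad")))
```

Whether the repo expects frames at the clips' native rate or a fixed one would need confirming against the dataloader's frame indexing.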
