ajabri / videowalk

Repository for "Space-Time Correspondence as a Contrastive Random Walk" (NeurIPS 2020)

Home Page: http://ajabri.github.io/videowalk

License: MIT License

HTML 9.07% Python 90.93%

videowalk's Introduction

Space-Time Correspondence as a Contrastive Random Walk

This is the repository for Space-Time Correspondence as a Contrastive Random Walk, published at NeurIPS 2020.

[Paper] [Project Page] [Slides] [Poster] [Talk]

@inproceedings{jabri2020walk,
    Author = {Allan Jabri and Andrew Owens and Alexei A. Efros},
    Title = {Space-Time Correspondence as a Contrastive Random Walk},
    Booktitle = {Advances in Neural Information Processing Systems},
    Year = {2020},
}

Consider citing our work or acknowledging this repository if you found this code to be helpful :)

Requirements

  • pytorch (>1.3)
  • torchvision (0.6.0)
  • cv2
  • matplotlib
  • skimage
  • imageio

For visualization (--visualize):

  • wandb
  • visdom
  • sklearn

Train

An example training command is:

python -W ignore train.py --data-path /path/to/kinetics/ \
--frame-aug grid --dropout 0.1 --clip-len 4 --temp 0.05 \
--model-type scratch --workers 16 --batch-size 20  \
--cache-dataset --data-parallel --visualize --lr 0.0001

This yields a model with performance on DAVIS as follows (see below for evaluation instructions), provided as pretrained.pth:

 J&F-Mean    J-Mean  J-Recall  J-Decay    F-Mean  F-Recall   F-Decay
  0.67606  0.645902  0.758043   0.2031  0.706219   0.83221  0.246789

Arguments of interest:

  • --dropout: The rate of edge dropout (default 0.1).
  • --clip-len: Length of video sequence.
  • --temp: Softmax temperature (illustrated in the sketch below the argument list).
  • --model-type: Type of encoder. Use scratch or scratch_zeropad if training from scratch. Use imagenet18 to load an Imagenet-pretrained network. Use scratch with --resume if reloading a checkpoint.
  • --batch-size: I've managed to train models with batch sizes between 6 and 24. If you can afford a larger batch size, consider increasing the --lr from 0.0001 to 0.0003.
  • --frame-aug: grid samples a grid of patches to get nodes; none will just use a single image and use embeddings in the feature map as nodes.
  • --visualize: Log diagnostics to wandb and data visualizations to visdom.
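
To make --temp and --dropout concrete, here is a minimal sketch of a single walk step (a rough illustration with simplified names and shapes, not the actual training code): raw affinities are divided by the temperature, a random fraction of edges is dropped by setting them to a large negative value, and a row-wise softmax turns the result into a stochastic transition matrix.

import torch

def walk_step(feat_a, feat_b, temp=0.05, edge_dropout=0.1):
    # feat_a, feat_b: [num_nodes, dim] node embeddings of two adjacent frames (made-up shapes)
    A = feat_a @ feat_b.t()                     # raw node-to-node affinities
    drop = torch.rand_like(A) < edge_dropout    # randomly drop a fraction of edges
    A = A.masked_fill(drop, -1e20)              # dropped edges get ~zero probability
    return torch.softmax(A / temp, dim=-1)      # row-stochastic transition matrix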

Data

We use the official torchvision.datasets.Kinetics400 class for training. You can find directions for downloading Kinetics here. In particular, the code expects the path given for kinetics to contain a train_256 subdirectory.

You can also point --data-path to a file containing a list of directories of images, or to a directory of directories of images. In this case, clips are randomly subsampled from each directory.

Visualization

By default, the training script will log diagnostics to wandb and data visualizations to visdom.

Pretrained Model

You can find the model resulting from the training command above at pretrained.pth. We are still training updated ablation models and will post them when ready.


Evaluation: Label Propagation

The label propagation algorithm is described in test.py. The output of test.py (predicted label maps) must be post-processed for evaluation.
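
At a high level, each node in the target frame attends over nodes from the preceding context frames and takes a weighted average of their labels, restricted to the top-k most similar context nodes. A minimal illustrative sketch (simplified names and shapes, not the exact code in test.py):

import torch

def propagate_labels(feat_ctx, lbl_ctx, feat_tgt, topk=10, temperature=0.05):
    # feat_ctx: [N, C] features of context-frame nodes
    # lbl_ctx:  [N, K] soft labels of context-frame nodes
    # feat_tgt: [M, C] features of target-frame nodes
    sim = feat_tgt @ feat_ctx.t() / temperature    # [M, N] affinities
    val, idx = sim.topk(topk, dim=-1)              # keep the top-k context nodes per target node
    weight = torch.softmax(val, dim=-1)            # [M, topk] attention weights
    return (weight.unsqueeze(-1) * lbl_ctx[idx]).sum(dim=1)   # [M, K] propagated labels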

DAVIS

To evaluate a trained model on the DAVIS task, clone the davis2017-evaluation repository, and prepare the data by downloading the 2017 dataset and modifying the paths provided in eval/davis_vallist.txt. Then, run:

Label Propagation:

python test.py --filelist /path/to/davis/vallist.txt \
--model-type scratch --resume ../pretrained.pth --save-path /save/path \
--topk 10 --videoLen 20 --radius 12  --temperature 0.05  --cropSize -1

Though test.py expects a model file created with train.py, it can easily be modified to be used with other networks. Note that we simply use the same temperature used at training time.

You can also run the ImageNet baseline with the command below.

python test.py --filelist /path/to/davis/vallist.txt \
--model-type imagenet18 --save-path /save/path \
--topk 10 --videoLen 20 --radius 12  --temperature 0.05  --cropSize -1

Post-Process:

# Convert
python eval/convert_davis.py --in_folder /save/path/ --out_folder /converted/path --dataset /davis/path/

# Compute metrics
python /path/to/davis2017-evaluation/evaluation_method.py \
--task semi-supervised   --results_path /converted/path --set val \
--davis_path /path/to/davis/

You can generate the above commands with the script below, where removing --dryrun will actually run them in sequence.

python eval/run_test.py --model-path /path/to/model --L 20 --K 10  --T 0.05 --cropSize -1 --dryrun

Test-time Adaptation

To do.

videowalk's People

Contributors

ajabri, trevorpburke


videowalk's Issues

evaluation on the VIP and JHMDB datasets

Hi Allan,

Happy new year! And many thanks for releasing the code of this great work!

I used the codebase and the pretrained model provided in the repo to evaluate the VIP and JHMDB datasets, the results are:
VIP: 37.12 (mIoU); JHMDB: 57.62 (PCK@0.1) and 79.59 (PCK@0.2).

They are noticeably lower than the results in your paper:
VIP: 38.6 (mIoU); JHMDB: 59.3 (PCK@0.1) and 84.9 (PCK@0.2).

Could you please help me check whether I evaluated them in the right way?
For VIP, I used the command:
python test.py --filelist eval/VIP_vallist.txt --model-type scratch --resume ../pretrained.pth --save-path vip_results --topk 10 --videoLen 4 --radius 12 --temperature 0.05 --cropSize 560

For JHMDB, I used the command:
python test.py --filelist eval/jhmdb_vallist.txt --model-type scratch --resume ../pretrained.pth --save-path jhmdb_results --topk 10 --videoLen 7 --radius 12 --temperature 0.05 --cropSize 320

The hyperparameters above were selected based on your paper except temperature (I've also tried 0.07 but found 0.05 is better).

BTW, there are two bugs in the JHMDB evaluation:

  1. https://github.com/ajabri/videowalk/blob/master/code/data/jhmdb.py#L231
    the "sio" should be imported in this python file

  2. https://github.com/ajabri/videowalk/blob/master/code/test.py#L161
    it should be "test_utils" rather than "utils"

Best feature

Hi Allan,
Great work!
I see that in the test code, layer4 of the ResNet is removed by default.
May I know if this is also the case during training?
Or is it better to train with layer4 but test with layer3?

Multiple cycle lengths

@ajabri Say we're considering a cycle I_1 -> I_t -> I_1. Do I understand correctly that the only cycle of length 1 would be I_1 -> I_2 -> I_1, and that I_2 -> I_3 -> I_2 is not considered?

Thank you!

Great work

Hi, @ajabri

Very great work. Your talk at CVPR20 about this work is also impressive. Looking forward to the code.

patch_grid(...): effective stride is always 32?

def patch_grid(transform, shape=(64, 64, 3), stride=[0.5, 0.5]):
    stride = np.random.random() * (stride[1] - stride[0]) + stride[0]
    stride = [int(shape[0]*stride), int(shape[1]*stride), shape[2]]

@ajabri Do I understand correctly that after L58 stride is always equal to [64, 64, 3], and that the random number is unused, since the parenthesized expression on L57 evaluates to (0.5 - 0.5) == 0?

Q. Get affinity matrix for random walk.

Hello. Thanks for your work!

I've referred to your code and have a question.

Please see this line of your code: As = self.affinity(q[:, :, :-1], q[:, :, 1:]) (code/model.py, line 140).

We can define the affinity matrix for the walk from frame 1 to frame 2 as torch.matmul(frame2, frame1),
and then the walk from frame 1 to frame 3 could be obtained as matmul( matmul(frame3, frame2), matmul(frame2, frame1) ).

As a result, I think your code should be changed from As = self.affinity(q[:, :, :-1], q[:, :, 1:]) to As = self.affinity(q[:, :, 1:], q[:, :, :-1]).

But you got good performance in your experiments, so it seems I am missing something.
Could you explain it?

Thanks. :)

Low performance with pretrained.pth

Hi,

I recently ran your pre-trained model on DAVIS 2017 with the exact same command you listed in the README:

python test.py --filelist /path/to/davis/vallist.txt \
--model-type scratch --resume ../pretrained.pth --save-path /save/path \
--topk 10 --videoLen 20 --radius 12 --temperature 0.05 --cropSize -1

However, the final performance based on the official DAVIS evaluation script is not as good as the one claimed in the paper. What I got is around 61 for J&F-Mean. Specifically, the detailed performance is listed below:

J&F-Mean   J-Mean  J-Recall  J-Decay   F-Mean  F-Recall  F-Decay
 0.614429 0.584634  0.686656 0.225137 0.644223  0.763603 0.256438

---------- Per sequence results for val ----------
            Sequence   J-Mean   F-Mean
      bike-packing_1 0.496049 0.711096
      bike-packing_2 0.685996 0.752332
         blackswan_1 0.934492 0.973339
         bmx-trees_1 0.301675 0.770057
         bmx-trees_2 0.644392 0.845591
        breakdance_1 0.666383 0.676260
             camel_1 0.747073 0.855923
    car-roundabout_1 0.852337 0.714172
        car-shadow_1 0.807822 0.778809
              cows_1 0.920527 0.956957
       dance-twirl_1 0.549648 0.593753
               dog_1 0.851405 0.867017
         dogs-jump_1 0.302670 0.435166
         dogs-jump_2 0.536664 0.599638
         dogs-jump_3 0.788082 0.822245
     drift-chicane_1 0.729466 0.786235
    drift-straight_1 0.526541 0.528944
              goat_1 0.800556 0.734920
         gold-fish_1 0.721810 0.717445
         gold-fish_2 0.659471 0.700005
         gold-fish_3 0.820182 0.845394
         gold-fish_4 0.848312 0.915238
         gold-fish_5 0.879084 0.878996
    horsejump-high_1 0.773536 0.888244
    horsejump-high_2 0.723407 0.944909
             india_1 0.631993 0.592968
             india_2 0.567645 0.560544
             india_3 0.629983 0.627841
              judo_1 0.760509 0.765048
              judo_2 0.749010 0.756075
         kite-surf_1 0.270090 0.267305
         kite-surf_2 0.004306 0.062131
         kite-surf_3 0.093566 0.127047
          lab-coat_1 0.000000 0.000000
          lab-coat_2 0.000000 0.000300
          lab-coat_3 0.000000 0.000000
          lab-coat_4 0.000000 0.000000
          lab-coat_5 0.000000 0.000000
             libby_1 0.803691 0.920149
           loading_1 0.900133 0.875399
           loading_2 0.383891 0.567959
           loading_3 0.682442 0.716217
       mbike-trick_1 0.571612 0.743456
       mbike-trick_2 0.639744 0.669962
    motocross-jump_1 0.340788 0.395740
    motocross-jump_2 0.519756 0.554731
paragliding-launch_1 0.819913 0.923513
paragliding-launch_2 0.645564 0.885479
paragliding-launch_3 0.034370 0.137811
           parkour_1 0.805982 0.893970
              pigs_1 0.812613 0.764461
              pigs_2 0.617975 0.750136
              pigs_3 0.906452 0.882834
     scooter-black_1 0.389385 0.669319
     scooter-black_2 0.722495 0.675855
          shooting_1 0.270579 0.454346
          shooting_2 0.747166 0.661882
          shooting_3 0.753406 0.872043
           soapbox_1 0.785921 0.778360
           soapbox_2 0.647941 0.710407
           soapbox_3 0.586195 0.741657

I am wondering whether this is the expected performance without test-time adaptation. Or could you list a detailed step-by-step procedure so we can reproduce the results more easily?

Thanks.

Cross-entropy loss computation question

@ajabri The paper specifies that the loss is the cross-entropy between the row-normalized cycle transition matrix and the identity matrix.

However, the code seems to compute something slightly different:
https://github.com/ajabri/videowalk/blob/0834ff9/code/model.py#L175-L176:

# self.xent = nn.CrossEntropyLoss(reduction="none")
logits = torch.log(A+EPS).flatten(0, -2)
loss = self.xent(logits, target).mean()

where matrix A is row-stochastic.

The CrossEntropyLoss module expects unnormalized logits and applies log-softmax internally. This is like computing log_softmax(log(P[i]))[i], which is not the regular cross-entropy, log(P[i])[i]. Should nn.NLLLoss have been used instead?
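
To illustrate the difference I mean, a toy comparison (made-up shapes, not the repository's code):

import torch
import torch.nn as nn

A = torch.softmax(torch.randn(4, 4), dim=-1)   # toy row-stochastic matrix
target = torch.arange(4)
EPS = 1e-20

log_probs = torch.log(A + EPS)

# What the repo computes: an extra log-softmax is applied inside CrossEntropyLoss.
loss_repo = nn.CrossEntropyLoss()(log_probs, target)

# Plain cross-entropy on the already-normalized rows, i.e. -log(A[i, i]).
loss_nll = nn.NLLLoss()(log_probs, target)

print(loss_repo.item(), loss_nll.item())   # the two values differ in general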

The code seems to use log-probs in place of logits (by logits I mean raw unnormalized scores). Is this intentional? If not, it might be a bug. @ajabri, could you please comment on this?

Thank you!

How many GPUs did you use for training?

Hi, thank you for making the code public.

I used the training and testing commands you provided. However, the final test result of the model from the last epoch is slightly lower than the number you provided: J&F-Mean 67.6 (yours) vs. 66.9 (ours).

I'm guessing the problem might be that you didn't use sync_bn, so the batch norm statistics are computed per GPU, and maybe I'm using a different number of GPUs than you.

So how many GPUs do you use during training?

Question about details in the inference time

Hi, thank you for sharing such an awesome project. I have some questions about the details in your paper, and I sincerely hope you can help me solve them.

  1. In your appendix, you say that the radius on the feature map considered for a source node is 12; what does this mean? There seems to be no definition of the radius in the main paper. Is it similar to the restricted attention area in MAST and CorrFlow (my current guess is sketched after this list)? If so, do you still use the same radius for long-term correspondence, e.g. the correspondence between the 1st frame and the 100th frame?
  2. In your appendix, the number of neighbors is 10, while in the main paper you use K=5 for the k-nearest neighbors (3.1.1). I would like to know whether these refer to the same thing.
  3. Could you please tell us when you will release the code?
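
For question 1, this is what I currently assume the radius means (a rough sketch of my own guess, in the spirit of restricted attention; please correct me if this is wrong):

import torch

H, W, radius, temp = 32, 32, 12, 0.05                  # made-up feature-map size
ys = torch.arange(H).repeat_interleave(W).float()
xs = torch.arange(W).repeat(H).float()
coords = torch.stack([ys, xs], dim=-1)                 # [H*W, 2] node coordinates
in_radius = torch.cdist(coords, coords) <= radius      # [H*W, H*W] spatial mask

A = torch.randn(H * W, H * W)                          # made-up raw affinities
A = A.masked_fill(~in_radius, float('-inf'))           # targets outside the radius are excluded
A = torch.softmax(A / temp, dim=-1)                    # only in-radius targets get probability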

I would really appreciate it if you could help me. Look forward to your reply, thanks!

Reproducing with pretrained.pth

Hi @ajabri,

Thanks for sharing the code and model.

However, I am having trouble reproducing your results with the provided pretrained.pth.
It only yields J&F-Mean 0.407953.

Could you please check on that?

Thx!

Questions about single feature map training

Thanks for the great work and for sharing the code.

I have some questions about single-feature-map training. I would appreciate it if you could share the answers.

  1. To get rid of the boundary artifacts, did you try the "reflect" or "circular" padding modes? Did they help?
  2. From my point of view, the shortcut is caused not only by the boundary artifacts but also by the shared computation among the output features. What do you think is the primary cause of the shortcut?

Can you share your hardware specifications?

Hi, thanks for sharing the code.

I'm trying to run your testing code, but my machine freezes whenever it computes the affinity matrices.

Can you share your hardware specifications?

Thanks

Label propagation: predictions before context has burned in

@ajabri Could you please explain how results are filled in for the first n_context = 20 frames? Are they copied from the ground truth? The paper suggests that the ground truth is only used for the 1st frame, but I can't find where predictions for the 2nd-20th frames are filled in. Are they filled in as background?

From what I could see, predictions affect lbls only after n_context frames https://github.com/ajabri/videowalk/blob/0834ff9/code/test.py#L144-L148:

if t > 0:
    lbls[t + n_context] = pred
else:
    pred = lbls[0]
    lbls[t + n_context] = pred

For DAVIS evaluation, the frames are saved at index t and not t + n_context https://github.com/ajabri/videowalk/blob/0834ff9/code/test.py#L168:

outpath = os.path.join(args.save_path, str(vid_idx) + '_' + str(t))

Are these 2nd-20th frames included in the error metric evaluation, and what predictions are used for these frames?

Thanks, @ajabri !

ResNet50 performance

Hi Allan,

Have you tried ResNet50 with your approach? I recently tried it myself, but I get very low performance for the first epochs, ~30 J&F-Mean. Any ideas?

Are Kornia augs and PyTorch Unfold used in practice?

@ajabri From what I can see in the code, skimage-based augmentations are used (random crop and spatial jitter) and kornia_augs.py is not used at all (so Appendix I is slightly inaccurate, since it refers to Kornia). Am I right?

If so, what were the reasons for choosing one over the other? Thanks!

Landmark propagation

Hi!
Thanks for sharing your code, it worked extremely well for my segmentation task!
I suppose landmark detection should also work very well with your method; did you try something like this?
It should be very close to the pose estimation task.
Thanks!

low JHMDB PCK

Hi, thanks for the great work!

I just use the following command to run CRW on JHMDB:

python code/test.py \
    --filelist jhmdb_testlist_split1.txt \
    --model-type scratch \
    --resume ../pretrained.pth \
    --save-path jhmdb_results \
    --topk 10 \
    --videoLen 7 \
    --radius 5 \
    --temperature 0.05 \
    --cropSize 320

Here are the PCK numbers I got:

0.1: [53.41509394]
0.2: [75.56534539]
0.3: [84.21850545]
0.4: [89.00846776]
0.5: [92.19968561]

These seem to be much lower than the numbers reported in the paper.

I basically borrowed the command from another related issue. The filelist I used is from the original UVC repo and it contains 268 lines.

Could you please take a look and see if there's anything I did wrong here?

Thanks!

Using selfsim_fc layer for label propagation

@ajabri By chance, have you tried using the selfsim_fc head for label propagation? In Appendix G you mention that res4 features perform worse than res3. But what about selfsim_fc? It is located even closer to the loss function; does it perform even worse than res4?

Thanks!

Different image normalization mean/std in different code paths

@ajabri I noticed that different code paths use different image normalization parameters.

Training Kinetics400 path: https://github.com/ajabri/videowalk/blob/0834ff9/code/utils/augs.py#L10-L11 :

IMG_MEAN = (0.4914, 0.4822, 0.4465)
IMG_STD  = (0.2023, 0.1994, 0.2010)

Evaluation DAVIS2017 path: https://github.com/ajabri/videowalk/blob/0834ff9/code/data/vos.py#L173:

mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

Both seem to be in RGB format. Is that correct?

Why are they different? Does this lead to better accuracy? Thanks!

The effects of left multiply and right multiply of stochastic matrix (i.e. the parameter --flip)

Thank you so much for your great work and for sharing the code.

In Eq. 1 of the paper, the affinities are normalized by a row-wise softmax, so I think the multiplication in Eq. 2 should be a right multiplication. More specifically, the right multiplication A_{t}^{t+1} * A_{t+1}^{t+2} makes sense, while the left multiplication A_{t+1}^{t+2} * A_{t}^{t+1} does not. Eq. 4 also suggests that the right multiplication is correct.
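
For concreteness, a toy sketch of what we mean (node counts and features are made up):

import torch

f1, f2, f3 = torch.randn(4, 16), torch.randn(5, 16), torch.randn(6, 16)

def transition(a, b, temp=0.05):
    # Row-stochastic A_a^b: row i is a distribution over the nodes of b for node i of a.
    return torch.softmax(a @ b.t() / temp, dim=-1)

A12 = transition(f1, f2)   # [4, 5]
A23 = transition(f2, f3)   # [5, 6]

A13 = A12 @ A23            # right multiplication in chronological order: [4, 6], rows sum to 1
# A23 @ A12 is not even shape-compatible here, which is why we expect the
# right multiplication to be the correct composition.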

However, when we reproduce the experiments, the result is the opposite. The left multiplication (--flip True) works, with performance above 0.67; the right multiplication (--flip False, the default) does not, with performance only around 0.2, which is lower than a randomly initialized model (around 0.4).

I cannot explain why this happens; could you help me find the reason?

Looking forward to your reply, thank you!

Problems with expansion

Hi, at first thanks for sharing your code, it worked like a charm!
I am using your work to track the process of contraction and expansion for various processes.
Tracking an object which contracts itself (e.g. a balloon losing its air) works perfectly!
In contrast, however, trying to track the expansion when filling it with air doesn't work as well: only half of the object is captured at maximum expansion.
I already tried increasing the radius, but the problem is that it somehow selects features next to the object as the most similar.
Do you have any idea how to circumvent this problem? (E.g. training with smaller patches?)
Thank you!

Efficient way to download Kinetics-400

@ajabri Would downloading it from AcademicTorrents give the right size/directory structure?

Or did you download it using https://github.com/Showmax/kinetics-downloader (recommended at https://github.com/pytorch/vision/tree/master/references/video_classification#data-preparation), which runs youtube-dl and then converts everything to mp4 (and, I guess, h264)? I tried it, and in 2 hours it downloaded only ~500 MB out of 400 GB.

Do you know if clips must be converted to mp4? Or would VideoClips just use ffmpeg once for sampling frames (in which case re-encoding to the same format is not needed)?

Did you use some other way?

What is the expected Kinetics400 dataset directory structure? (It is not explained at https://pytorch.org/docs/stable/torchvision/datasets.html#kinetics-400 or in the dataset metadata.) Is it /path/to/dataset/<split>/<classlabel>/<youtubeid>.avi?

If yes, then what is the origin of train_256? From what I understood, the only splits are train, val, and test.

Thanks a lot!

Is it possible to use ILSVRC-VID dataset to train ?

Hi, thanks for sharing your work! I was wondering if it is possible to train the model on the ILSVRC-VID or YTB-VOS datasets instead?

I have tried creating ILSVRC-VID and YTB-VOS datasets that return a Tensor[F, H, W, C], where F is the number of frames, without any transformation. However, after passing through the training transformation, each sample becomes a tuple instead.

This tuple in turn gave me an error in train.py under train_one_epoch, at video = video.to(device): 'list' object has no attribute 'to'. How can I rectify this issue? Thanks.

How many epochs for training?

Hi Allan,

Thanks for your great work. I am wondering how many epochs are needed for training.
Seems like it takes more than 10 hours for one epoch.
Btw, is it necessary to use Flip=True for training? I don't quite understand the use of --Flip.

Thanks so much.

Could it be possible to train image sequences?

Thank you so much for sharing the code! This idea is elegant and super cool! I am wondering whether train.py can train on the image sequences in each folder. I've tried using both ImageFolder and the dataloader from test.py (because the eval dataset consists of image sequences) to read image sequences, but both raise the error "XXX object does not support indexing". Alternatively, is there a way to transform the image sequences into videos that fit the training pipeline?

Thank you very much!

Loss becomes "NAN" with Mixed Precision Training

Really impressive work!

We have tried to add DDP and mixed-precision training support on top of your implementation, but we are running into a NaN issue related to:

A[torch.rand_like(A) < self.edgedrop_rate] = -1e20

Since we use FP16, we modified the above line to:

A[torch.rand_like(A) < self.edgedrop_rate] = -65504

However, we find that the loss soon becomes NaN. It would be great if you could give us any suggestions.
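
For reference, this is roughly what we are experimenting with now: a dtype-aware fill value and computing the softmax of the masked affinities in fp32 (a sketch of our workaround attempt, not the repository's code and not a confirmed fix):

import torch

def masked_softmax(A, edgedrop_rate, temp=0.05):
    # Use the minimum of A's own dtype instead of a hard-coded constant,
    # and do the softmax in fp32 before casting back.
    drop = torch.rand_like(A) < edgedrop_rate
    A = A.masked_fill(drop, torch.finfo(A.dtype).min)
    return torch.softmax(A.float() / temp, dim=-1).to(A.dtype)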

Cycle length bounds

@ajabri I have a question about the cycle building code https://github.com/ajabri/videowalk/blob/master/code/model.py#L147-L148:

for i in list(range(1, len(A12s))):
    g = A12s[:i+1] + A21s[:i+1][::-1]

What are the semantics of the variable i? Is it the cycle half-length? Then why is the upper bound len(A12s) and not len(A12s) + 1? If it is the cycle half-length minus 1, then why is the starting index 1 and not 0?

For i == 1, the code seems to be building a cycle over 3 frames with 4 edges:

# For i == 1
g = [A12[0], A12[1], A21[1], A21[0]] 
# corresponding F0 -> F1 -> F2 -> F1 -> F0

Is this correct?

E.g. for the boundary case where we have just two frames and want to build a cycle of two edges (frame1 -> frame2 -> frame1), len(A12s) == len(A21s) == 1, but in the existing code this seems impossible.

I propose changing the starting bound to 0 so that the loop reads for i in range(len(A12s)):
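
That is, if I read the slicing correctly, the proposed bound would build:

for i in range(len(A12s)):
    g = A12s[:i+1] + A21s[:i+1][::-1]
    # i == 0 -> [A12[0], A21[0]]                  i.e. F0 -> F1 -> F0
    # i == 1 -> [A12[0], A12[1], A21[1], A21[0]]  i.e. F0 -> F1 -> F2 -> F1 -> F0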

Thank you!

Potential typo when creating Kinetics400 dataset

The Kinetics400 dataset docs specify that the extensions argument is a tuple, extensions=('avi',), while in train.py it's a string, extensions=('mp4'): https://github.com/ajabri/videowalk/blob/master/code/train.py#L95. Even if this works for some reason, it would be clearer to remove the ambiguity.

UPD: I found out why it works under the hood: Python's str.endswith, which is used for verifying extensions, accepts either a tuple or a plain string, and PyTorch doesn't check the type of the passed object, so it works by luck.
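
A quick illustration of that behavior:

# str.endswith accepts either a tuple of suffixes or a single string,
# which is why both spellings happen to work.
print("video.mp4".endswith(('mp4',)))   # True
print("video.mp4".endswith('mp4'))      # True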

Loading Kinetics400 takes 10h

Hi, thank you for sharing your code. I'm trying to train on Kinetics400 with your code. From the progress bar, it seems it needs about 10 hours to load the whole dataset. Was that the case in your experiments as well?


DAVIS evaluation resolution

Hi,
In the paper, you mention that you evaluate on 480p images for DAVIS. Did you mean that the original image size is used instead of 480x480 px, i.e., crop_size = -1 in the args?
Best,

Label propagation problem

First of all, thanks for your great work!

When I do label propagation, this error happens in test.py (I followed the 'Evaluation: Label Propagation' section of the README):

******* Vid 0 TOOK 63.87427091598511 *******
******* Vid 1 (70 frames) *******
computed features 0.48213911056518555
Killed

Why is the process killed after processing only video 0?
How can I solve this problem?

More details about the experiments to avoid the trivial shortcut solution

Thank you so much for your great work and for sharing the code.

In Section C of the supplementary material, "Using a Single Feature Map for Training", you designed four experiments to avoid the trivial solution in which the network learns a shortcut relying on boundary artifacts.
I want to know more details about the first two experiments, i.e.

  1. removing padding altogether;
  2. reducing the receptive field of the network to the extent that entries in the center crop of the spatial feature map do not see the boundary; we then cropped the feature map to only see this region.

My questions are:
a) How do you remove the padding? Does it mean setting the values at the padded locations to zero, or setting the padding length (or area) to zero?
b) If you set the padding length to zero, what is the shape of the network's output features? I think the shape will be much smaller than the original one; how do you compare them?
c) Also, if you set the padding length to zero, what is the difference between experiments 1) and 2)?

Looking forward to your reply, thank you!

Test time training code

Hi Allan,

Many thanks again for releasing the code! Could you tell us when you plan to release the test-time training code? Or would it be possible to give us some suggestions on how to implement it based on the current codebase?

Many thanks!

Handling total occlusions

I'm trying to reproduce some of the results in the paper, and I'm interested in how the model deals with total occlusions.

For example, I notice in the extra qualitative results you provide that there is a moment where the person being tracked is fully occluded as someone else on a bike passes by (specifically here: https://youtu.be/R_Zae5N_hKw), and the occluded nodes no longer have labels. I'm unsure how all of the labels disappeared. What happens to a node when it is entirely occluded and goes out of sight?

In some initial results of running the model, it appears to (incorrectly) predict that entirely occluded nodes transition to neighbouring nodes, or thereafter start tracking the occlusion, as opposed to not being predicted at all.

Thanks for any help in advance!

Other downstream tasks

@ajabri Have you tried evaluating on other downstream tasks? E.g. how do learned representations perform on object detection or instance segmentation?

Thank you!
