
cmhungsteve / sstda

Stars: 154 · Watchers: 10 · Forks: 22 · Size: 1.17 MB

[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)

Home Page: https://arxiv.org/abs/2003.02824

License: MIT License

Python 81.90% Shell 18.10%
cvpr2020 pytorch domain-adaptation domain-discrepancy temporal-dynamics video action-segmentation self-supervised-learning video-understanding

sstda's Introduction

Hi there 👋

My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. I received my Ph.D. degree from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on Biometric Research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and on Edge-AI Research as a Senior AI Engineer at MediaTek.

My research interests center on Multi-Modal AI, including Vision-Language, Video Understanding, Cross-Modal Learning, Efficient Tuning, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, X-supervised learning, etc.

[Update] I released a comprehensive paper list for Vision Transformer & Attention to facilitate related research. Feel free to check it out (a ★STAR would be appreciated)!

[Personal Website][LinkedIn][Twitter][Google Scholar][Resume]


sstda's People

Contributors

cmhungsteve


sstda's Issues

Is there a demo?

Hi @cmhungsteve, I recently heard about action segmentation and became interested in it, so I searched GitHub and found this great repo. Is there a demo, or some other way to get a quick, intuitive sense of action segmentation? Thanks!

Question about the feature shape

Thanks for your great work!
I downloaded the features for every video and loaded the .npy files. The shape of each feature array is (2048, frame_count). Since the feature dimension of each clip is 2048, why isn't the shape (frame_count, 2048)?
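For reference, the released layout is simply channel-major, and a single transpose converts it to frame-major. A minimal sketch (a zero array stands in for the loaded .npy file; `some_video.npy` is a hypothetical file name):

```python
import numpy as np

# Stand-in for `feat = np.load("some_video.npy")` -- a zero array with
# the same layout as the released features: one 2048-D column per frame.
frame_count = 300
feat = np.zeros((2048, frame_count), dtype=np.float32)

feat_t = feat.T                    # frame-major view, no data copy
print(feat.shape, feat_t.shape)    # (2048, 300) (300, 2048)
```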

Pretrained model

Hi, would you share the pretrained model for testing purposes? I imagine training would take several days even on a machine equipped with more than four high-end GPU cards.

Qualitative results

Hi Steve,

Can you share the code or script that you use to produce those visualization results?

Incomplete dataset Download

Hello,
Thank you so much for the files and code you uploaded. However, every time I download the Dataset folder, it is incomplete. Could you please split the Dataset into several smaller compressed files and upload them? Thank you very much for your cooperation.
Best regards

I3D Feature

Each video feature has dimension (2048, X). Is 2048 the feature dimension of each extracted frame? Or is the video divided into X frames whose features are then stacked into (2048, X)?
I3D does not seem to be able to extract features from a single frame, so I would like to know how you extract per-frame features from a video. Could you provide the feature-extraction code? Thank you very much!
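On the single-frame point: a common way to get one 2048-D vector per frame from a clip-based model is to run it on a temporal window centred at each frame and stack the outputs column-wise. The sketch below illustrates this with a placeholder `clip_model` (an assumption for illustration, not this repo's actual extraction code; a real pretrained I3D backbone would replace it):

```python
import numpy as np

def clip_model(clip):
    # Placeholder for a pretrained I3D backbone that maps a
    # (T, H, W, C) clip to a single 2048-D feature vector.
    return np.zeros(2048, dtype=np.float32)

def per_frame_features(video, window=16):
    # video: (num_frames, H, W, C). Pad temporally so every frame can
    # sit at the centre of a `window`-frame clip, then stack the
    # per-frame outputs column-wise into a (2048, num_frames) array.
    half = window // 2
    padded = np.concatenate([
        np.repeat(video[:1], half, axis=0),                 # repeat first frame
        video,
        np.repeat(video[-1:], window - half - 1, axis=0),   # repeat last frame
    ])
    feats = [clip_model(padded[i:i + window]) for i in range(len(video))]
    return np.stack(feats, axis=1)

video = np.zeros((40, 8, 8, 3), dtype=np.float32)   # tiny dummy video
print(per_frame_features(video).shape)              # (2048, 40)
```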

Change of test set

Hi there, great work and thanks for sharing the code.

I understand that you use the test set as the target domain. Does that mean any change to the test set would require retraining to get better results? What are your thoughts on possible solutions to this?

Thanks and looking forward to your reply.

Results of fewer labeled training data

Hi, I would like to thank you for the refreshing paper.
I have a question regarding the experiments with fewer labeled training data (Table 4 in the main paper and Table 8 in the Appendix). I wonder whether the results with 65% of labeled training data were obtained by setting ratio_source or ratio_label_source to 65%.
To my understanding:
(1) ratio_source: drops both frame features and labels.
(2) ratio_label_source: drops labels only. The dropped labels won't be used in the TCN cross-entropy loss; however, the frame features will still be used in the adversarial loss for domain prediction.
I thought the results of Table 4 were obtained with ratio_source=65%, as the paper says "we drop labeled frames from source domains with uniform sampling for training".
However, the appendix also mentions "The additional trained data are all unlabeled, so they cannot be directly trained with standard prediction loss. There we propose SSTDA to exploit unlabeled data" and "achieve performance with this strong baseline using only 65% of labels for training", which instead suggests that the results were obtained with ratio_label_source=65%.
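The distinction between the two flags can be sketched with a uniform-sampling keep mask (a toy illustration under assumed semantics, not the repository's actual implementation):

```python
import numpy as np

def uniform_keep_mask(num_frames, ratio):
    # Keep `ratio` of the frames, uniformly spaced across the video.
    kept = np.linspace(0, num_frames - 1,
                       int(round(num_frames * ratio))).astype(int)
    mask = np.zeros(num_frames, dtype=bool)
    mask[kept] = True
    return mask

frames = np.zeros((100, 2048))              # dummy features, one row per frame
mask = uniform_keep_mask(len(frames), 0.65)

# ratio_source=0.65: both the features and the labels are dropped.
src_frames = frames[mask]                   # only 65 frames survive at all

# ratio_label_source=0.65: all 100 feature rows stay (still usable by the
# adversarial/SSTDA losses); only the cross-entropy loss is restricted to
# the frames where `labelled` is True.
labelled = mask
print(src_frames.shape[0], labelled.sum())  # 65 65
```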
Thank you in advance and please correct me if there is any misunderstanding.
Regards

Video Feature Extraction

How are the I3D features processed into dimension (XXX, 2048)? Does XXX represent the number of video frames? Do these I3D features use RGB only, or both RGB and flow?
