Comments (8)
Hi, if you want to use the optimized flash attention code, you can check out the code here; this document may also be helpful. Hope this helps.
from uni-fold.
I run into NaNs if I enable flash attention.
unicore.nan_detector | NaN detected in output of model.evoformer.blocks.47.tri_att_end.mha.linear_v, shape: torch.Size([1, 292, 292, 128]), backward
WARNING | unicore.nan_detector | NaN detected in output of model.evoformer.blocks.21.msa_att_row.mha.linear_v, shape: torch.Size([1, 256, 184, 256]), backward
I also get lots of new warnings:
UserWarning: Using non-full backward hooks on a Module that does not return a single Tensor or a tuple of Tensors is deprecated and will be removed in future versions. This hook will be missing some of the grad_output. Please use register_full_backward_hook to get the documented behavior.
Is it working for you @Xreki?
I am running on an A100 with bfloat16 enabled.
from uni-fold.
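The deprecation warning quoted above points to register_full_backward_hook. As a rough illustration of what a NaN check built on full backward hooks could look like, here is a minimal sketch. It is not the unicore.nan_detector implementation; the helper name attach_nan_hooks and the toy model are hypothetical, and the NaN is injected by hand to simulate a faulty backward kernel.

```python
import torch
from torch import nn


def attach_nan_hooks(model: nn.Module, found: list) -> list:
    """Register a full backward hook on every leaf module (hypothetical helper).

    Each hook appends (module_name, grad_shape) to `found` whenever a
    gradient flowing into the module's output contains a NaN.
    """
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) > 0:
            continue  # skip containers, hook leaf modules only

        def hook(mod, grad_input, grad_output, name=name):
            for g in grad_output:
                if g is not None and torch.isnan(g).any():
                    found.append((name, tuple(g.shape)))

        handles.append(module.register_full_backward_hook(hook))
    return handles


# Toy model standing in for the real network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
found = []
handles = attach_nan_hooks(model, found)

x = torch.randn(3, 4, requires_grad=True)
out = model(x)

# Inject a NaN into the upstream gradient to simulate a broken backward pass.
grad = torch.ones_like(out)
grad[0, 0] = float("nan")
out.backward(grad)

for h in handles:
    h.remove()

print(found)  # names and gradient shapes of the modules that saw a NaN
```

The first entry reported is the module closest to the loss, which is usually the most useful place to start looking.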
Can you provide some details about how you installed flash attention? It seems that the backward pass did not work correctly.
from uni-fold.
@lhatsk It seems OK for me. I used the Docker image dptechnology/unicore:latest-pytorch1.12.1-cuda11.6-flashattn and tested the monomer model with the demo data on a single A100 GPU using bfloat16, and saw no NaNs.
from uni-fold.
I installed flash attention from source according to the README, with torch 1.12.1 + CUDA 11.2.
I tested it with multimer on 4 GPUs distributed over two nodes (finetuning). The NaNs don't appear right away. Interestingly, I also get NaNs with OpenFold when I enable flash attention (different data, different cluster, different software setup, monomer), but there they occur in the pTM computation.
from uni-fold.
Can you write a standalone test for the flash_attn interface with an input shape like [1, 292, 292, 128], so that we can check whether the function works properly?
from uni-fold.
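A standalone check along the lines requested above might look like the following sketch. The flash attention call signature is not shown in this thread, so the code uses a plain PyTorch softmax attention (reference_attention, a hypothetical stand-in) and takes the function under test as a parameter. It also assumes the last two axes of the input are (sequence, channels) with no head splitting, which may not match the real kernel's layout.

```python
import math
import torch


def reference_attention(q, k, v):
    """Plain softmax attention; a stand-in for the fused kernel under test."""
    scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(q.shape[-1])
    return torch.matmul(torch.softmax(scores, dim=-1), v)


def check_no_nans(attn_fn, shape, dtype=torch.float32, device="cpu",
                  n_iters=3, seed=0):
    """Run forward and backward n_iters times; return True if no NaN appears.

    Several iterations are used because the NaNs reported in this thread
    only showed up after a few samples.
    """
    gen = torch.Generator(device="cpu").manual_seed(seed)
    for _ in range(n_iters):
        q, k, v = (
            torch.randn(shape, generator=gen)
            .to(device=device, dtype=dtype)
            .requires_grad_()
            for _ in range(3)
        )
        out = attn_fn(q, k, v)
        out.float().sum().backward()
        for t in (out, q.grad, k.grad, v.grad):
            if torch.isnan(t).any():
                return False
    return True


# Small shape so the sketch runs quickly on CPU; use (1, 292, 292, 128)
# to match the shape from the NaN report.
SMALL_SHAPE = (1, 16, 16, 8)
ok = check_no_nans(reference_attention, SMALL_SHAPE)
print("no NaNs" if ok else "NaN detected")
```

To exercise the actual kernel, pass a wrapper around it as attn_fn together with device="cuda" and dtype=torch.bfloat16. Since a single call was reported to pass, raising n_iters or feeding real activations is probably more informative than random inputs.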
Just running _flash_attn(q,k,v) works without NaNs. I have now also tested it with the pre-compiled package and Uni-Fold monomer, and I get NaNs there too. It seems to happen after two or three samples.
from uni-fold.
You can now use this branch to try flash attention: https://github.com/dptech-corp/Uni-Fold/tree/flash-attn
from uni-fold.