Comments (16)
I was able to fix my problem by manually removing all sequences containing invalid characters according to this:
google-deepmind/alphafold#569
tac pdb_seqres.txt | sed '/^CT05/,+1d' | tac > pdb_seqres_fixed.txt
Dominik
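The `tac`/`sed` command above removes each line starting with `CT05` together with the header line just before it. A more general alternative, sketched below in awk, drops every FASTA record whose sequence contains any character other than a letter. This is a hypothetical helper (`filter_invalid_records` is not part of Uni-Fold or AlphaFold), and it assumes that discarding such nucleic-acid entries is acceptable for your runs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy FASTA records from "$1" to stdout, dropping any
# record whose sequence (all lines after its '>' header) contains a character
# other than A-Z or a-z, such as the '0' in "CT05ATCTTTG".
filter_invalid_records() {
  awk '
    # Print the buffered record unless its sequence has a non-letter character.
    function flush() {
      if (header != "" && seq !~ /[^A-Za-z]/) {
        printf "%s\n%s", header, body
      }
    }
    /^>/ { flush(); header = $0; body = ""; seq = ""; next }
    { body = body $0 "\n"; seq = seq $0 }
    END { flush() }
  ' "$1"
}

# Usage (file names are assumptions):
#   filter_invalid_records pdb_seqres.txt > pdb_seqres_fixed.txt
```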
from uni-fold.
Copy that. It looks like the script mishandles DNA chains when reading them as FASTA, so the '0' character is treated as invalid. We are looking into a solution.
Glad to see this solved, and thank you for the update. I believe they do not include DNA & RNA in pdb_seqres.txt now, which is the cause of the problem.
It seems that parsing of the file pdb_seqres.txt failed:
Parse failed (sequence file /home/data/pdb_seqres/pdb_seqres.txt):
Line 1360234: illegal character 0
Would you please upload the file you used? I understand that the file is big, so perhaps you can locate line 1360234 and show the surrounding context.
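To show the context around the reported line without opening the whole file, something like the helper below could be used. `show_context` is a hypothetical name; `sed -n '1360232,1360236p' pdb_seqres.txt` gives a similar result without line numbers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print lines LINE-RADIUS..LINE+RADIUS of FILE,
# each prefixed with its line number, then stop reading the file.
show_context() {
  local file="$1" line="$2" radius="${3:-2}"
  local start=$(( line > radius ? line - radius : 1 ))
  local end=$(( line + radius ))
  awk -v s="$start" -v e="$end" '
    NR > e { exit }
    NR >= s { printf "%d: %s\n", NR, $0 }
  ' "$file"
}

# Usage, with the path from the error message:
#   show_context /home/data/pdb_seqres/pdb_seqres.txt 1360234
```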
Also @BaozCWJ may look into this.
7ooo_B mol:na length:11 DNA (5'-D(*CP*TP*(RWQ)P*TP*CP*TP*TP*TP*G)-3')
CT05ATCTTTG
7ooo_E mol:na length:11 DNA (5'-D(*CP*TP*(RWQ)P*TP*CP*TP*TP*TP*G)-3')
CT05ATCTTTG
The second line is line 1360234, and, as the error says, it contains the '0' character.
I used the databases from the AlphaFold2 reference site, and it works with their code.
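The parse error appears to report only the first bad line it reaches. A sketch for listing every sequence line that contains a non-letter character, together with the header above it, assuming the usual `>` header lines; `find_invalid_sequences` is a hypothetical helper:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: for every non-header line of FASTA file "$1" that
# contains a character other than A-Z or a-z, print the line number, the
# preceding header, and the offending line, separated by tabs.
find_invalid_sequences() {
  awk '
    /^>/ { header = $0; next }
    /[^A-Za-z]/ { printf "%d\t%s\t%s\n", NR, header, $0 }
  ' "$1"
}

# Usage:
#   find_invalid_sequences pdb_seqres.txt | head
```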
It looks like the error comes from the hmmsearch command:
/usr/bin/hmmsearch --noali --cpu 8 --F1 0.1 --F2 0.1 --F3 0.1 --incE 100 -E 100 --domE 100 --incdomE 100 -A /tmp/tmpil8uqp24/output.sto /tmp/tmpil8uqp24/query.hmm /home/data/pdb_seqres/pdb_seqres.txt
Could you please try the same FASTA with AlphaFold's code and upload the corresponding hmmsearch command?
I checked the AlphaFold code, and it appears hmmsearch is only used in the multimer model. When I ran the multimer model, the same error occurred.
The command seems to be the same (see below):
I0809 13:25:16.533803 140494127470400 run_docker.py:255] I0809 04:25:16.531341 139681435416384 hmmsearch.py:103] Launching sub-process ['/usr/bin/hmmsearch', '--noali', '--cpu', '8', '--F1', '0.1', '--F2', '0.1', '--F3', '0.1', '--incE', '100', '-E', '100', '--domE', '100', '--incdomE', '100', '-A', '/tmp/tmp69wsql5x/output.sto', '/tmp/tmp69wsql5x/query.hmm', '/mnt/pdb_seqres_database_path/pdb_seqres.txt']
....(omitted)
I0809 13:25:28.802923 140494127470400 run_docker.py:255] stderr:
I0809 13:25:28.803056 140494127470400 run_docker.py:255] Parse failed (sequence file /mnt/pdb_seqres_database_path/pdb_seqres.txt):
I0809 13:25:28.803190 140494127470400 run_docker.py:255] Line 1360234: illegal character 0
Sorry for the inaccurate information that the AlphaFold2 reference works.
So, is my pdb_seqres.txt file the problem?
The file at ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt seems to have been modified. I compared it with my old pdb_seqres.txt and found differences.
I re-ran with the old one and the error did not occur!
Thanks for your reply.
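To narrow down those differences, one option is to compare the two files record by record. The sketch below assumes both files use `>` headers whose first word is the chain ID (e.g. `7ooo_B`) and that the first file is non-empty; `changed_records` is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list record IDs (first word of each header, without
# the leading '>') whose sequence differs between OLD ("$1") and NEW ("$2"),
# or that appear in only one of the two files. Output is sorted.
changed_records() {
  awk '
    /^>/ { id = substr($1, 2); next }
    NR == FNR { old_seq[id] = old_seq[id] $0; next }  # still reading OLD
    { new_seq[id] = new_seq[id] $0 }                  # reading NEW
    END {
      for (k in old_seq) if (!(k in new_seq) || old_seq[k] != new_seq[k]) print k
      for (k in new_seq) if (!(k in old_seq)) print k
    }
  ' "$1" "$2" | sort
}

# Usage (file names are assumptions):
#   changed_records pdb_seqres_old.txt pdb_seqres.txt | head
```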
Hi, I ran into the same problem. Is there a fix that works with the new pdb_seqres version? Or, alternatively, do you know whether the old file is still available somewhere?
Thank you,
Dominik
@dominik-handler Thank you for the report and the hot-fix for this issue. Since it has been reported twice, we are now looking into a solution that fixes this issue automatically.
Thank you!
By the way, the database file structure created by download_all does not match the file naming expected by run_unifold.sh.
I had to manually fix the structure of the UniProt database.
Maybe @BaozCWJ can look into this?
Thanks for raising this. Could you please confirm that the correct path of the UniProt database is the following?
--uniprot_database_path=$database_dir/uniprot/uniprot.fasta
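Before launching a run, a small pre-flight check along these lines could confirm that the UniProt file sits where the path above expects it. `check_uniprot_path` is hypothetical and only tests for `$database_dir/uniprot/uniprot.fasta`; it does not know the layout that download_all produces.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check: succeed only if the UniProt FASTA exists at
# the location given to --uniprot_database_path above, i.e.
# $database_dir/uniprot/uniprot.fasta.
check_uniprot_path() {
  local database_dir="$1"
  local uniprot="$database_dir/uniprot/uniprot.fasta"
  if [ -f "$uniprot" ]; then
    echo "found: $uniprot"
  else
    echo "missing: $uniprot (move or rename the downloaded UniProt file here)" >&2
    return 1
  fi
}

# Usage:
#   check_uniprot_path "$database_dir"
```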
That should be correct. Thank you!
Thanks! It's already fixed in #42.
Related Issues (20)
- Can unifold multimer be trained with batch_size higher than 1?
- Entries in eval_multi_label.json and eval_sample_weight.json do not exist in pdb_uniprots
- pdb_assembly.json does not agree with train_multi_label.json
- Missing import in Colab
- colab error
- Is total_step fixed?
- import_jax_weights_ failed on AlphaFold-Multimer 2.3.0
- parameters are missing in the pretrained weights
- Multi node training
- Could not find path to the "hhblits" binary
- Run Uni-Fold with Bohrium Apps
- FileNotFoundError: No such file or directory: '/C.feature.pkl.gz'
- questions on installing on Ubuntu Linux 22.04
- recreating homo_search.py output -- minimal version
- competition multimer analysis -- does chain order matter?
- model name for all alphafold parameters
- multi-gpu inference
- convert_unifold_to_alphafold.py?
- UniFold crash: unable to find SCOPdata (a bug that has popped up in ColabFold, & there is a straightforward reason and patch)
- Training with linkers