hillerlab / GenomeAlignmentTools
Tools for improving the sensitivity and specificity of genome alignments
License: MIT License
Hi,
When I use chainCleaner, -linearGap must be specified:
loose is chicken/human linear gap costs.
medium is mouse/human linear gap costs.
In the example with mm10 and hg38, -linearGap is set to loose. I don't understand why medium is not used.
In my research, however, we only have zoker species. In this case, how should I set -linearGap? Should I use medium, or specify a piecewise linearGap tab-delimited file?
Best,
Yinjia
Dear all,
I followed your pipeline published in GigaScience (2020, 120-way mammal alignment) and aligned a new rodent genome to hg38 using LASTZ.
To limit the computational resources required, I split the hg38 genome into hundreds of chunks by chromosome/scaffold name and ran LASTZ and axtChain on each chunk.
After axtChain, some other pipelines suggest merging all split chains into one large chain file before chainPreNet (e.g. http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto).
So, in your pipeline, do we need to merge all per-chromosome chains before RepeatFiller? If yes, could you please briefly explain why? If no, at which step do I need to merge the per-chromosome files? Or can I run your whole pipeline in split-run style?
Best Wishes!
Tao
Hi,
The program failed in the last step (chainSort). Is there a way to rerun from the chainSort step?
Here are some lines from my log:
1.2 get fills/gaps from tmp.chainCleaner.X6tzroP.net ...
DONE
1.3 get aligning regions from tmp.chainCleaner.X6tzroP.net ...
DONE
1.4 get valid breaks ...
DONE
Remove temporary netfile tmp.chainCleaner.X6tzroP.net
DONE (parsing fills/gaps and getting valid breaks)
reading breaking and broken chains from 01.test/test_out/temp_chain_run/zbf.Ha.before_cleaning.chain.gz and write irrelevant chains to 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted ...
DONE
reading target and query DNA sequences for breaking and broken chains ...
DONE
loop over all breaks. Remove suspects if they pass our filters and write out deleted suspects to 01.test/test_out/temp_chain_run/removed_suspects.bed ...
DONE
write the (new) breaking and the broken chains to 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted ...
DONE
chainSort 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp ...
ERROR: chainSort failed. Command: chainSort 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp
slurmstepd: error: Detected 1 oom-kill event(s) in step 1509017.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
Hi,
I see that GenomeAlignmentTools and make_lastz_chains both generate clean chains through lastz and the Kent utilities. If I want to run multiz or Pecan as the next step, are all of these results suitable, or which results are best for the next step?
Thank you so much
Hi, respected Sir,
When I executed NetFilterNonNested.perl, I got the following error:
$ NetFilterNonNested.perl ARS-UCD.net -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000
ERROR: parameter -keepSynNetsWithScore/-keepInvNetsWithScore-doUCSCSynFilter is given, but I cannot parse the net type from this fill line: fill 54369 150 NW_017212635.1 + 13997 150 id 4533855 score 3186 ali 150
So I need to run netClass before NetFilterNonNested, but I don't know what tDb and qDb are or how to obtain them. Please let me know.
Thank you in advance for your answers!
netClass - Add classification info to net
usage:
netClass [options] in.net tDb qDb out.net
tDb - database to fetch target repeat masker table information
qDb - database to fetch query repeat masker table information
Hi,
I tried the steps below:
git clone https://github.com/hillerlab/GenomeAlignmentTools.git
cd GenomeAlignmentTools/kent/src
make
cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64 -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib -o pipeline.o -c pipeline.c
cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64 -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib -o portimpl.o -c portimpl.c
cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64 -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib -o pngwrite.o -c pngwrite.c
pngwrite.c:7:10: fatal error: png.h: No such file or directory
7 | #include "png.h" // MUST come before common.h, due to setjmp checking in pngconf.h
| ^~~~~~~
compilation terminated.
make[1]: *** [../inc/common.mk:420: pngwrite.o] Error 1
make[1]: Leaving directory '/data/install/genomicallingments/GenomeAlignmentTools/kent/src/lib'
make: *** [makefile:9: topLibs] Error 2
Please advise what I'm missing. Are there any prerequisites I need to install?
Thanks
Jay
Hi team,
Recently I have run into a lot of problems when using chainCleaner. Could you please help me?
My command was as follows:
chainCleaner out.chain.gz -tSizes=../target.sizes -qSizes=../query.sizes ../target.2bit ../query.2bit out.clean.chain removedSuspects.bed -linearGap=medium
But I got some warnings:
Thank you. @MichaelHiller
Dear all,
I'm now running RepeatFiller.py on my genomes (target: mm10; query: mink, sequenced by ourselves), but I got the following error when running it on the chain file of the mm10 chromosome chr12_GL456349_alt:
Traceback (most recent call last):
File "/gpfs/home/yangrui/softs/GenomeAlignmentTools-master/src/RepeatFiller.py", line 396, in <module>
curMiniBlockLines = GetChainBlockFromLastzOutput(allMiniChainLines, next_pos)
File "/gpfs/home/yangrui/softs/GenomeAlignmentTools-master/src/RepeatFiller.py", line 224, in GetChainBlockFromLastzOutput
raise ValueError('ERROR! allMiniChainsSplit start separator line at position ' + str(position) + ' does not start with LINE...')
ValueError: ERROR! allMiniChainsSplit start separator line at position 0 does not start with LINE...
It seems that this chain file, generated from mm10 chr12_GL456349_alt vs. all mink chromosomes, is smaller and simpler than the chains generated from mm10 chr1/2/3/...
Can you tell me whether I can simply exclude this problematic chain file from the following steps? chr12_GL456349_alt does not seem to be very important.
In addition, can you tell me which situations raise this error?
Thank you!
I followed the installation instructions, but when I get to the compile step for chainCleaner, chainNet and scoreChain, I get the following error:
gcc -I/inc -I/hg/inc -O4 -static -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -c -o chainNet.o chainNet.c
chainNet.c:10:20: fatal error: common.h: No such file or directory
#include "common.h"
^
compilation terminated.
make: *** [chainNet.o] Error 1
Am I missing some steps?
Dear Michael,
I gave FilterChains_Net_FilterNets.perl a try:
sh: line 1: ExtractSynInvChainsFromNet.perl: command not found
Uncaught exception from user code:
ERROR: ExtractSynInvChainsFromNet.perl command failed
Thanks
AndreiR
Dear Michael,
I followed your pipeline published in GigaScience (2020, 120-way mammal alignment) and aligned a new rodent genome to hg38.
After netClass, my net file looks like the following:
net chr1 248956422
fill 11502 17549 Scaffold124 + 86810 13269 id 3869 score 204610 ali 8054 tN 0 qN 18 tR 3301 qR 2443 tTrf 173 qTrf 148
gap 12112 64 Scaffold124 + 87448 992 tN 0 qN 0 tR 0 qR 63 tTrf 0 qTrf 0
fill 12112 52 Scaffold124 - 48726 55 id 59165 score 85 ali 52 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
gap 12262 26 Scaffold124 + 88510 0 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
gap 12733 486 Scaffold124 + 88954 851 tN 0 qN 0 tR 0 qR 479 tTrf 0 qTrf 0
fill 12734 485 Scaffold419 + 65479 905 id 6168 score 4813 ali 452 tN 0 qN 0 tR 0 qR 368 tTrf 0 qTrf 0
gap 13803 25 Scaffold124 + 90351 0 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
fill 13803 25 Scaffold704 - 144389 25 id 341141 score 231 ali 25 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
gap 14933 29 Scaffold124 + 91633 0 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
......
Then I ran NetFilterNonNested.perl with parameters: NetFilterNonNested.perl -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000 human_ssp_all_classed.net >human_ssp_all_classed_filtered.net
I got the following error:
ERROR: parameter -keepSynNetsWithScore/-keepInvNetsWithScore-doUCSCSynFilter is given, but I cannot parse the net type from this fill line: fill 11502 17549 Scaffold124 + 86810 13269 id 3869 score 204610 ali 8054 tN 0 qN 18 tR 3301 qR 2443 tTrf 173 qTrf 148
Could you please help me to figure out the possible reasons?
Best Wishes,
Tao
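Not an authoritative answer, but a small check that may help narrow this down. The error says the script cannot parse the net type from the fill line. As far as I know, in the UCSC net format the `type` field (top, syn, inv, nonSyn) is written by netSyntenic rather than netClass, which would explain why fill lines carrying tN/qN/tR/qR still fail; please treat that as an assumption to verify against the UCSC documentation. The sketch below only reports fill lines that lack a `type` key. The function names are made up for this example.

```python
def parse_fill_fields(line):
    """Return the key/value pairs that follow the 7 fixed columns of a fill line.

    Assumed layout: fill tStart tSize qName qStrand qStart qSize key value key value ...
    """
    tokens = line.split()
    extras = tokens[7:]
    return dict(zip(extras[0::2], extras[1::2]))


def fills_missing_type(lines):
    """Collect fill lines that carry no 'type' field (hypothetical helper)."""
    missing = []
    for line in lines:
        if line.lstrip().startswith("fill "):
            if "type" not in parse_fill_fields(line):
                missing.append(line.strip())
    return missing


if __name__ == "__main__":
    example = [
        "net chr1 248956422",
        " fill 11502 17549 Scaffold124 + 86810 13269 id 3869 score 204610 ali 8054 tN 0 qN 18",
        "  gap 12112 64 Scaffold124 + 87448 992 tN 0 qN 0",
    ]
    for bad in fills_missing_type(example):
        print("no type field:", bad)
```

If every fill line is reported, the net file most likely never went through a step that adds the type annotation.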
Hello,
I am trying to install the tools using the following commands:
git clone https://github.com/hillerlab/GenomeAlignmentTools.git
cd GenomeAlignmentTools/kent/src
make
I get the following error:
hmac.c:11:26: fatal error: openssl/hmac.h: No such file or directory
#include "openssl/hmac.h"
^
compilation terminated.
make[1]: *** [hmac.o] Error 1
make[1]: Leaving directory `/data/gpfs/projects/punim0586/lecook/chipseq-pipeline/cross_species/data/genomes/chain/GenomeAlignmentTools/kent/src/lib'
make: *** [topLibs] Error 2
Any assistance with this error would be greatly appreciated!
Thank you.
Laura Cook
Hi Dr. Hiller,
I am working on generating a pairwise whole genome alignment. I used the command below to run RepeatFiller on a chain file that was merged from several smaller chains and sorted, with chains generated using axtChain. RepeatFiller runs to completion, but when I use the .chain file output from RepeatFiller as input to subsequent programs I get the following error:
"q end mismatch 45149542 vs 45240145 line 565956 of Zonotrichia_albicollis_repeatfiller.chain"
RepeatFiller.py --chain Zonotrichia_albicollis.chain --T2bit ../galGal6.2bit --Q2bit ../Zonotrichia_albicollis_GCF_000385455.1_genomic_simple_filtered_masked.2bit -o Zonotrichia_albicollis_repeat_filler.chain
I do not get the same error message when I use the original .chain file or the output .chain file from patchChain.perl as input into other chain file manipulation programs. I'm wondering if RepeatFiller is not running to completion. I've gone to the specified line in the error message, but I'm not seeing anything out of the ordinary.
Thank you for your time.
John
Cannot find the file png.h:
pngwrite.c:7:10: fatal error: png.h: No such file or directory
7 | #include "png.h" // MUST come before common.h, due to setjmp checking in pngconf.h
| ^~~~~~~
compilation terminated.
make[1]: *** [../inc/common.mk:420: pngwrite.o] Error 1
make[1]: Leaving directory '/public_data/softwares/GenomeAlignmentTools/kent/src/lib'
make: *** [makefile:9: topLibs] Error 2
Hi
I am trying RepeatFiller on multiple chain files for hg38->T2T and I am encountering some inconsistent errors.
The official chain file from T2T runs through without anything being reported; it seems no repeats are left to fill.
With self-generated chain files (from minimap2 and GSAlign) I run into a strange situation: output chain files are produced, but they are faulty.
python3 /usr/local/GenomeAlignmentTools/src/RepeatFiller.py \
--chain Minimap2_liftover.chain --T2bit hg38_p8_primaryContigs.2bit \
--Q2bit chm13v2.0.2bit -o Minimap2_liftover.repeatFiltered.chain
If I then run any kind of command on the output, e.g.
chainPreNet Minimap2_liftover.repeatFiltered.chain hg38_p8_primaryContigs.sizes chm13v2.0.sizes stdout
q end mismatch 242669717 vs 242693499 line 54824 of Minimap2_liftover.repeatFiltered.chain
it fails with that error, and other tools such as chainSorter fail as well. If I use my original file Minimap2_liftover.chain instead,
everything goes smoothly. I also tried swapping target and query in the RepeatFiller
command, as I was not sure about the definition, and astonishingly (and worryingly) it actually ran through.
But I still end up with a similarly incompatible chain file.
I used the latest release of your tools.
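In case it helps anyone hitting the same "q end mismatch" message: the sketch below walks a chain file and checks that the start coordinate plus all block sizes and gaps adds up to the end coordinate given in the header, which is my understanding of what the Kent chain reader verifies. It assumes the standard UCSC chain layout (header `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, block lines `size dt dq`, final block `size`). The function name is made up, and this is not the Kent code itself.

```python
def check_chain_ends(lines):
    """Return (line_number, side, computed_end, header_end) for each mismatch."""
    problems = []
    in_chain = False
    t = q = t_end = q_end = 0
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields or raw.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        if fields[0] == "chain":
            # Header: start and end coordinates for target and query
            t, t_end = int(fields[5]), int(fields[6])
            q, q_end = int(fields[10]), int(fields[11])
            in_chain = True
        elif in_chain:
            size = int(fields[0])
            t += size
            q += size
            if len(fields) == 3:
                # Gap after this block: dt on the target, dq on the query
                t += int(fields[1])
                q += int(fields[2])
            else:
                # Single-column line closes the chain; compare with the header
                if t != t_end:
                    problems.append((lineno, "t", t, t_end))
                if q != q_end:
                    problems.append((lineno, "q", q, q_end))
                in_chain = False
    return problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for lineno, side, computed, expected in check_chain_ends(handle):
            print(f"{side} end mismatch {computed} vs {expected} at line {lineno}")
```

Running it on the chain before and after RepeatFiller should show whether the inconsistency is introduced by the filling step or was already present in the input.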
Hi everyone! I am new to comparative genomics, so excuse me if this is an obvious question.
I am working with two distantly related species (181 MYA) and some of your tools look very interesting. I have checked "Supplementary Material for a genome alignment of 120 mammals highlights ultraconserved element variability and placenta associated enhancers" and it seems that you did not use chainPreNet before netting (chainNet), after chainCleaner.
Thank you in advance! 👍
Hello,
I am trying to compile the libraries
These are the steps I followed:
git clone https://github.com/hillerlab/GenomeAlignmentTools.git
cd GenomeAlignmentTools/kent/src
make
And I have the following error:
cd hg/mouseStuff/ && make
make[1]: Entering directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff'
cd axtChain && echo axtChain && make axtChain
make[2]: Entering directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff/axtChain'
cc -O -g -o ../../../../bin//axtChain axtChain.o ../../../lib/x86_64/jkhgap.a ../../../lib/x86_64/jkweb.a -lpthread -lssl -lcrypto ../../../htslib/libhts.a -lm -lz
/usr/bin/ld: ../../../lib/x86_64/jkweb.a(pipeline.o): unrecognized relocation (0x29) in section `.text.checkOpts'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [../../../../bin//axtChain] Error 1
make[2]: Leaving directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff/axtChain'
make[1]: *** [axtChain.all] Error 2
make[1]: Leaving directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff'
make: *** [userApps] Error 2
Can you help me with this issue please?
Thanks.
Respected Sir,
I have generated a chain file using axtChain, but now that I am trying to make a net file with chainNet, I am confused about the input files
target.sizes, query.sizes
The chain was made by aligning a de novo assembled genome against a reference genome. How do I create these two files? Please share your view; I would be grateful.
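For what it's worth, a sizes file is usually just a two-column, tab-separated list of sequence name and length, one line per chromosome or scaffold. The Kent tools `faSize -detailed` (for FASTA) and `twoBitInfo` (for 2bit) are the usual way to produce one. If those are not at hand, a minimal Python sketch like the one below gives the same kind of table from a FASTA file. The function names and file names are only illustrative.

```python
def fasta_sizes(lines):
    """Map each sequence name (first word of the '>' header) to its length."""
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def write_sizes(sizes, path):
    """Write 'name<TAB>size' lines, the layout expected for *.sizes files."""
    with open(path, "w") as out:
        for name, size in sizes.items():
            out.write(f"{name}\t{size}\n")


if __name__ == "__main__":
    import sys
    # Example: python fasta_sizes.py query.fa query.sizes
    with open(sys.argv[1]) as handle:
        write_sizes(fasta_sizes(handle), sys.argv[2])
```

Run it once on the reference FASTA to get target.sizes and once on the de novo assembly to get query.sizes, using the same sequence names that appear in the chain file.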
Dear Michael,
I recently tried to use RepeatFiller in my analysis, but I ran into some problems.
Could you please help me?
Traceback (most recent call last):
File "/public4/home/sc56340/genome/bin/RepeatFiller.py", line 396, in <module>
curMiniBlockLines = GetChainBlockFromLastzOutput(allMiniChainLines, next_pos)
File "/public4/home/sc56340/genome/bin/RepeatFiller.py", line 224, in GetChainBlockFromLastzOutput
raise ValueError('ERROR! allMiniChainsSplit start separator line at position ' + str(position) + ' does not start with LINE...')
ValueError: ERROR! allMiniChainsSplit start separator line at position 0 does not start with LINE...
Thank you. @MichaelHiller
Hello
I have generated the target.net and query.net files using chainNet, but now that I am trying to filter them with NetFilterNonNested.perl, I am confused about the input file ref.query.net.gz. Could you give me any suggestions?
NetFilterNonNested.perl -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000 ref.query.net.gz > ref.query.filtered.net
The 'git clone http://genome-source.cse.ucsc.edu/kent.git' is not working for me.
Hey all,
You may want to consider updating axtChain, depending on how you call it. I think a few more bugs may also have been addressed since the one below (judging only by the compiled file sizes):
I'm working to make Repeatfiller/chainfiller available on Bioconda.
Please refer to: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository#creating-a-release
I need your help creating a versioned release to use for the Bioconda recipe. Once this is added to Bioconda, it'll also be made available as a Docker container from Biocontainers, and as a Singularity image from the Galaxy Project. The Bioconda bot will also recognize future releases and automatically update the recipe.
Please let me know
Thanks
Jay
Hi Michael,
I have aligned the mouse genome (mm10) with an unpublished marsupial genome. I have run chainCleaner and chainNet, and I am now trying to run NetFilterNonNested, but I'm receiving an error.
This is the command I've run:
NetFilterNonNested.perl -doScoreFilter -minScore1 10000 -keepSynNetsWithScore 3000 -keepInvNetsWithScore 3000 mm10.smiCra1.net > mm10.smiCra1_filtered.net
And this is the error:
ERROR: parameter -keepSynNetsWithScore/-keepInvNetsWithScore-doUCSCSynFilter is given, but I cannot parse the net type from this fill line: fill 3051761 112 scaffold00029_pilon_pilon - 11057999 112 id 4113312 score 5313 ali 112
This is what my net file looks like:
##matrix=axtChain 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91
##gapPenalties=axtChain O=400 E=30
net chr1 195471971
fill 3051761 112 scaffold00029_pilon_pilon - 11057999 112 id 4113312 score 5313 ali 112
fill 3054673 508 scaffold00002_pilon_pilon + 385464750 554 id 2546634 score 8323 ali 471
fill 3061871 46 scaffold00029_pilon_pilon + 24281035 46 id 7073966 score 3087 ali 46
fill 3083164 548 scaffold00004_pilon_pilon - 33312094 340 id 1779203 score 11319 ali 224
gap 3083328 301 scaffold00004_pilon_pilon - 33312154 114
fill 3094693 20671 scaffold00012_pilon_pilon - 6162007 19098 id 241035 score 32777 ali 1550
gap 3095443 32 scaffold00012_pilon_pilon - 6180332 0
Any advice on troubleshooting this error would be greatly appreciated.
Thanks so much!
Laura