hillerlab / genomealignmenttools

Tools for improving the sensitivity and specificity of genome alignments

License: MIT License

C 94.50% Makefile 0.71% Perl 0.51% Python 0.24% q 0.01% Shell 0.22% C++ 0.52% HTML 0.08% AngelScript 2.77% TSQL 0.02% PostScript 0.02% Gnuplot 0.23% Roff 0.13% M4 0.05% ActionScript 0.01% CAP CDS 0.01%

genomealignmenttools's People

Contributors

michaelhiller, osipovarev

genomealignmenttools's Issues

How to specify -linearGap

Hi,

When I use chainCleaner, -linearGap must be specified:
loose is chicken/human linear gap costs.
medium is mouse/human linear gap costs.
In the example with mm10 and hg38, -linearGap is set to loose. I don't understand why medium is not used there.
In my research, we only have zoker species. In this case, how should I set -linearGap: medium, or a piecewise linearGap tab-delimited file?

Best,
Yinjia

merge per-chromosome chains

Dear all,

I followed your pipeline published in GigaScience (2020, 120-way mammal alignment) and aligned a new rodent genome to hg38 using LASTZ.
Considering the computational resources required, I split the hg38 genome into hundreds of chunks according to chromosome/scaffold name and ran LASTZ and axtChain on each chunk.
After axtChain, some other pipelines suggest merging all split chains into one large chain file before chainPreNet (e.g. http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto).

So, in your pipeline, do we need to merge all per-chromosome chains before RepeatFiller? If yes, could you please briefly explain why? If no, at which step do I need to merge the per-chromosome files? Or can I follow your pipeline in split-run style all the way through?

Best Wishes!

Tao

rerun from chainSort

Hi,

The program failed in the last step (chainSort). Is there a way to rerun from the chainSort step?

Here are some lines from my log:

1.2 get fills/gaps from tmp.chainCleaner.X6tzroP.net ...
DONE

1.3 get aligning regions from tmp.chainCleaner.X6tzroP.net ...
DONE

1.4 get valid breaks ...
DONE
Remove temporary netfile tmp.chainCleaner.X6tzroP.net
DONE (parsing fills/gaps and getting valid breaks)

  1. reading breaking and broken chains from 01.test/test_out/temp_chain_run/zbf.Ha.before_cleaning.chain.gz and write irrelevant chains to 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted ...
    DONE

  2. reading target and query DNA sequences for breaking and broken chains ...
    DONE

  3. loop over all breaks. Remove suspects if they pass our filters and write out deleted suspects to 01.test/test_out/temp_chain_run/removed_suspects.bed ...
    DONE

  4. write the (new) breaking and the broken chains to 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted ...
    DONE

  5. chainSort 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp ...
    ERROR: chainSort failed. Command: chainSort 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp.unsorted 01.test/test_out/temp_chain_run/zbf.Ha.filled.chain__temp

slurmstepd: error: Detected 1 oom-kill event(s) in step 1509017.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

What are tDb and qDb of netClass input data ?

Hi, respected Sir,
When I executed NetFilterNonNested.perl, I was told that

$ NetFilterNonNested.perl ARS-UCD.net -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000
ERROR: parameter -keepSynNetsWithScore/-keepInvNetsWithScore-doUCSCSynFilter is given, but I cannot parse the net type from this fill line: fill 54369 150 NW_017212635.1 + 13997 150 id 4533855 score 3186 ali 150

So I need to run netClass before NetFilterNonNested, but I don't know what tDb and qDb are or how to obtain them. Please let me know.
Thank you in advance for your answers!

netClass - Add classification info to net
usage:
   netClass [options] in.net tDb qDb out.net
       tDb - database to fetch target repeat masker table information
       qDb - database to fetch query repeat masker table information

Install error

Hi,

I tried the steps below:

git clone https://github.com/hillerlab/GenomeAlignmentTools.git
cd GenomeAlignmentTools/kent/src
make

cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64   -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib   -o pipeline.o -c pipeline.c
cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64   -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib   -o portimpl.o -c portimpl.c
cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64   -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib   -o pngwrite.o -c pngwrite.c
pngwrite.c:7:10: fatal error: png.h: No such file or directory
    7 | #include "png.h"   // MUST come before common.h, due to setjmp checking  in pngconf.h
      |          ^~~~~~~
compilation terminated.
make[1]: *** [../inc/common.mk:420: pngwrite.o] Error 1
make[1]: Leaving directory '/data/install/genomicallingments/GenomeAlignmentTools/kent/src/lib'
make: *** [makefile:9: topLibs] Error 2

Please advise what I'm missing. Are there any prerequisites to install?

Thanks
Jay

chainCleaner failed

Hi team,
In recent days I have run into several problems when using chainCleaner. Could you please help me?
My command was as follows:
chainCleaner out.chain.gz -tSizes=../target.sizes -qSizes=../query.sizes ../target.2bit ../query.2bit out.clean.chain removedSuspects.bed -linearGap=medium
But I got some warnings:
[screenshot of the warnings not preserved]

Thank you. @MichaelHiller

error when running repeatfiller on mm10 chromosome chr12_GL456349_alt

Dear all,

I'm running RepeatFiller.py on my genomes (target: mm10; query: mink, sequenced by ourselves), but I get the following error when I run it on the chain file of mm10 chromosome chr12_GL456349_alt:

Traceback (most recent call last):
  File "/gpfs/home/yangrui/softs/GenomeAlignmentTools-master/src/RepeatFiller.py", line 396, in <module>
    curMiniBlockLines = GetChainBlockFromLastzOutput(allMiniChainLines, next_pos)
  File "/gpfs/home/yangrui/softs/GenomeAlignmentTools-master/src/RepeatFiller.py", line 224, in GetChainBlockFromLastzOutput
    raise ValueError('ERROR! allMiniChainsSplit start separator line at position ' + str(position) + ' does not start with LINE...')
ValueError: ERROR! allMiniChainsSplit start separator line at position 0 does not start with LINE...

It seems that the chain file generated from mm10 chr12_GL456349_alt vs. all mink chromosomes is smaller and simpler than the chains generated from mm10 chr1/2/3/...

Can you tell me whether I can simply exclude this problematic chain file from the subsequent steps? The chromosome chr12_GL456349_alt does not seem very important.

In addition, can you tell me which situations raise this error?

Thank you!

Error when compiling chainNet and other binaries

I followed the installation instructions, but when I get to the compilation step for chainCleaner, chainNet and scoreChain, I get the following error:

gcc -I/inc -I/hg/inc -O4 -static -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -c -o chainNet.o chainNet.c
chainNet.c:10:20: fatal error: common.h: No such file or directory
#include "common.h"
^
compilation terminated.
make: *** [chainNet.o] Error 1

Am I missing some steps?

Missing ExtractSynInvChainsFromNet.perl

Dear Michael,

I gave FilterChains_Net_FilterNets.perl a try:

sh: line 1: ExtractSynInvChainsFromNet.perl: command not found     
Uncaught exception from user code:    
        ERROR: ExtractSynInvChainsFromNet.perl command failed   

Thanks
AndreiR

NetFilterNonNested error

Dear Michael,

I followed your pipeline published in GigaScience (2020, 120-way mammal alignment) and aligned a new rodent genome to hg38.

After netClass, my net file looks like the following:
net chr1 248956422
fill 11502 17549 Scaffold124 + 86810 13269 id 3869 score 204610 ali 8054 tN 0 qN 18 tR 3301 qR 2443 tTrf 173 qTrf 148
gap 12112 64 Scaffold124 + 87448 992 tN 0 qN 0 tR 0 qR 63 tTrf 0 qTrf 0
fill 12112 52 Scaffold124 - 48726 55 id 59165 score 85 ali 52 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
gap 12262 26 Scaffold124 + 88510 0 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
gap 12733 486 Scaffold124 + 88954 851 tN 0 qN 0 tR 0 qR 479 tTrf 0 qTrf 0
fill 12734 485 Scaffold419 + 65479 905 id 6168 score 4813 ali 452 tN 0 qN 0 tR 0 qR 368 tTrf 0 qTrf 0
gap 13803 25 Scaffold124 + 90351 0 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
fill 13803 25 Scaffold704 - 144389 25 id 341141 score 231 ali 25 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
gap 14933 29 Scaffold124 + 91633 0 tN 0 qN 0 tR 0 qR 0 tTrf 0 qTrf 0
......

Then I ran NetFilterNonNested.perl with parameters: NetFilterNonNested.perl -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000 human_ssp_all_classed.net >human_ssp_all_classed_filtered.net

I got the following error:
ERROR: parameter -keepSynNetsWithScore/-keepInvNetsWithScore-doUCSCSynFilter is given, but I cannot parse the net type from this fill line: fill 11502 17549 Scaffold124 + 86810 13269 id 3869 score 204610 ali 8054 tN 0 qN 18 tR 3301 qR 2443 tTrf 173 qTrf 148

Could you please help me to figure out the possible reasons?

Best Wishes,

Tao

issue compiling

Hello,

I am trying to install the tools using the following commands:

git clone https://github.com/hillerlab/GenomeAlignmentTools.git
cd GenomeAlignmentTools/kent/src
make

I get the following error:

hmac.c:11:26: fatal error: openssl/hmac.h: No such file or directory
#include "openssl/hmac.h"
^
compilation terminated.
make[1]: *** [hmac.o] Error 1
make[1]: Leaving directory `/data/gpfs/projects/punim0586/lecook/chipseq-pipeline/cross_species/data/genomes/chain/GenomeAlignmentTools/kent/src/lib'
make: *** [topLibs] Error 2

Any assistance with this error would be greatly appreciated!
Thank you.

Laura Cook

RepeatFiller.py output .chain file reports error when used as input in other chain file programs

Hi Dr. Hiller,

I am working on generating a pairwise whole genome alignment. I used the command below to run RepeatFiller on a chain file that was merged from several smaller chains and sorted, with chains generated using axtChain. RepeatFiller runs to completion, but when I use the .chain file output from RepeatFiller as input to subsequent programs I get the following error:

"q end mismatch 45149542 vs 45240145 line 565956 of Zonotrichia_albicollis_repeatfiller.chain"

RepeatFiller.py --chain Zonotrichia_albicollis.chain --T2bit ../galGal6.2bit --Q2bit ../Zonotrichia_albicollis_GCF_000385455.1_genomic_simple_filtered_masked.2bit -o Zonotrichia_albicollis_repeat_filler.chain

I do not get the same error message when I use the original .chain file or the output .chain file from patchChain.perl as input into other chain file manipulation programs. I'm wondering if RepeatFiller is not running to completion. I've gone to the specified line in the error message, but I'm not seeing anything out of the ordinary.

Thank you for your time.
John

Compiling error (fatal error: png.h: No such file or directory)

The compiler cannot find the file png.h:

pngwrite.c:7:10: fatal error: png.h: No such file or directory
7 | #include "png.h" // MUST come before common.h, due to setjmp checking in pngconf.h
| ^~~~~~~
compilation terminated.
make[1]: *** [../inc/common.mk:420: pngwrite.o] Error 1
make[1]: Leaving directory '/public_data/softwares/GenomeAlignmentTools/kent/src/lib'
make: *** [makefile:9: topLibs] Error 2

Chain after RepeatFiller with wrong cordinates

Hi
I am trying RepeatFiller on multiple chain files for hg38->T2T and I am encountering some inconsistent errors.
The official chain file from T2T runs through without anything being reported; it seems no repeats are present anymore.
Using self-generated ones (from minimap2 and GSAlign), I encounter a weird situation: chain files are generated, but they are faulty.

python3 /usr/local/GenomeAlignmentTools/src/RepeatFiller.py  \
  --chain Minimap2_liftover.chain --T2bit hg38_p8_primaryContigs.2bit \
  --Q2bit chm13v2.0.2bit -o Minimap2_liftover.repeatFiltered.chain 

If I then run any command on the output, e.g.

chainPreNet Minimap2_liftover.repeatFiltered.chain  hg38_p8_primaryContigs.sizes chm13v2.0.sizes stdout 

q end mismatch 242669717 vs 242693499 line 54824 of Minimap2_liftover.repeatFiltered.chain

It fails with that error, and other tools such as chainSorter fail as well. If I use my original file Minimap2_liftover.chain instead, everything goes smoothly. I also tried flipping target and query in the RepeatFiller command, as I was not sure about the definition, and astonishingly (and worryingly) it actually ran through.
But I then similarly get an incompatible chain file at the end.

I used the latest release of your tools.
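
For anyone debugging the same "q end mismatch" message: it suggests that, for some chain, the block lines do not add up to the query end stated in the chain header. Below is a minimal sketch of that consistency check, assuming the standard UCSC chain layout (header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id; block lines: size dt dq, with a final line holding only size). The function name check_chain and the two demo chains are made up for illustration and are not part of GenomeAlignmentTools. The check can help locate the first inconsistent chain, but it does not explain why the file ended up that way.

```python
import io


def check_chain(handle):
    """Return (header_line, chain_id, message) for each inconsistent chain."""
    problems = []
    header = None            # fields of the chain currently being read
    header_line = 0
    t_pos = q_pos = 0
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue         # blank separator or comment line
        if fields[0] == "chain":
            header, header_line = fields, line_no
            t_pos, q_pos = int(fields[5]), int(fields[10])  # tStart, qStart
            continue
        if header is None:
            continue         # block line without a header; ignored in this sketch
        size = int(fields[0])
        t_pos += size
        q_pos += size
        if len(fields) == 3:
            # "size dt dq": an aligned block followed by gaps on both sides
            t_pos += int(fields[1])
            q_pos += int(fields[2])
            continue
        # A line holding only "size" closes the chain: compare with tEnd/qEnd.
        t_end, q_end, chain_id = int(header[6]), int(header[11]), header[12]
        if t_pos != t_end:
            problems.append((header_line, chain_id,
                             f"t end mismatch {t_pos} vs {t_end}"))
        if q_pos != q_end:
            problems.append((header_line, chain_id,
                             f"q end mismatch {q_pos} vs {q_end}"))
        header = None
    return problems


if __name__ == "__main__":
    # Hypothetical demo data: the first chain is consistent, the second is not.
    demo = (
        "chain 1000 chr1 100 + 0 30 scafA 50 + 5 33 1\n10 5 3\n15\n\n"
        "chain 500 chr1 100 + 40 60 scafB 50 - 0 25 2\n20\n\n"
    )
    for header_line, chain_id, message in check_chain(io.StringIO(demo)):
        print(f"chain id {chain_id} (header at line {header_line}): {message}")
```

Running the same check on the input chain and on the RepeatFiller output would show whether the inconsistency is already present before filling.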

Distal species alignment | chainCleaner & chainPreNet

Hi everyone! I am new to comparative genomics, so excuse me if this is an obvious question.

I am working with two distantly related species (181 MYA), and some of your tools look very interesting. I have checked "Supplementary Material for a genome alignment of 120 mammals highlights ultraconserved element variability and placenta associated enhancers", and it seems that you did not use chainPreNet after chainCleaner and before netting (chainNet).

  • Are these two tools mutually exclusive?
  • Is it a bad idea to apply chain filtering (chainPreNet) after chain improvement (chainCleaner) to obtain better nets?
  • Should I always use chainPreNet before chainNet?

Thank you in advance! 👍

compiling error

Hello,
I am trying to compile the libraries
These are the steps I followed:

git clone https://github.com/hillerlab/GenomeAlignmentTools.git
cd GenomeAlignmentTools/kent/src
make

And I have the following error:
cd hg/mouseStuff/ && make
make[1]: Entering directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff'
cd axtChain && echo axtChain && make axtChain
make[2]: Entering directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff/axtChain'
cc -O -g -o ../../../../bin//axtChain axtChain.o ../../../lib/x86_64/jkhgap.a ../../../lib/x86_64/jkweb.a -lpthread -lssl -lcrypto ../../../htslib/libhts.a -lm -lz
/usr/bin/ld: ../../../lib/x86_64/jkweb.a(pipeline.o): unrecognized relocation (0x29) in section `.text.checkOpts'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [../../../../bin//axtChain] Error 1
make[2]: Leaving directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff/axtChain'
make[1]: *** [axtChain.all] Error 2
make[1]: Leaving directory `/home/dmorenos/software/GenomeAlignmentTools/kent/src/hg/mouseStuff'
make: *** [userApps] Error 2

Can you help me with this issue please?

Thanks.

chainNet input data query

Respected Sir,
I have generated a chain file using axtChain, but now that I am trying to make a net file with chainNet, I am confused about two of the input files:
target.sizes, query.sizes
The chain was made by aligning a de novo assembled genome (query) to a reference genome (target). How do I create these two files? I would be grateful for your advice.
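
For readers with the same question: target.sizes and query.sizes are plain two-column files (sequence name, tab, length in bases) with one line per chromosome or scaffold. The sketch below shows one way to derive such a file from a FASTA assembly. The function names fasta_sizes and write_sizes and the demo sequences are made up for illustration; this is not part of GenomeAlignmentTools. UCSC utilities such as faSize -detailed or twoBitInfo are commonly used for the same purpose.

```python
import io


def fasta_sizes(handle):
    """Yield (name, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the sequence name.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def write_sizes(fasta_handle, out_handle):
    """Write one 'name<TAB>length' line per FASTA record."""
    for name, length in fasta_sizes(fasta_handle):
        out_handle.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    # Hypothetical two-scaffold assembly used only to demonstrate the format.
    demo = io.StringIO(">scaffold1 demo\nACGTACGT\nACG\n>scaffold2\nNNNN\n")
    out = io.StringIO()
    write_sizes(demo, out)
    print(out.getvalue(), end="")
```

The sequence names in each sizes file have to match the names used in the chain file, so the sizes should be computed from the same assemblies that were aligned.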

RepeatFiller error

Dear Michael,

I would like to use RepeatFiller in my analysis, but I have run into a problem.
Could you please help me?

Traceback (most recent call last):
  File "/public4/home/sc56340/genome/bin/RepeatFiller.py", line 396, in <module>
    curMiniBlockLines = GetChainBlockFromLastzOutput(allMiniChainLines, next_pos)
  File "/public4/home/sc56340/genome/bin/RepeatFiller.py", line 224, in GetChainBlockFromLastzOutput
    raise ValueError('ERROR! allMiniChainsSplit start separator line at position ' + str(position) + ' does not start with LINE...')
ValueError: ERROR! allMiniChainsSplit start separator line at position 0 does not start with LINE...

Thank you. @MichaelHiller

NetFilterNonNested.perl input

Hello
I have generated the target.net and query.net files using chainNet, but now that I am trying to filter them with NetFilterNonNested.perl, I am confused about the input file ref.query.net.gz. Could you give me any suggestions?
NetFilterNonNested.perl -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000 ref.query.net.gz > ref.query.filtered.net

Versioned package release request

I'm working to make RepeatFiller/chainfiller available on Bioconda.

Please refer to: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository#creating-a-release

I need your help creating a versioned release to use for the Bioconda recipe. Once this is added to Bioconda, it'll also be made available as a Docker container from Biocontainers, and as a Singularity image from the Galaxy Project. The Bioconda bot will also recognize future releases and automatically update the recipe.

Please let me know

Thanks
Jay

NetFilterNonNested error

Hi Michael,

I have aligned the mouse genome (mm10) with an unpublished marsupial genome. I have run chainCleaner and chainNet, and I am now trying to run NetFilterNonNested.perl, but I'm receiving an error.

This is the command I've run:

NetFilterNonNested.perl -doScoreFilter -minScore1 10000 -keepSynNetsWithScore 3000 -keepInvNetsWithScore 3000 mm10.smiCra1.net > mm10.smiCra1_filtered.net

And this is the error:

ERROR: parameter -keepSynNetsWithScore/-keepInvNetsWithScore-doUCSCSynFilter is given, but I cannot parse the net type from this fill line: fill 3051761 112 scaffold00029_pilon_pilon - 11057999 112 id 4113312 score 5313 ali 112

This is what my net file looks like:

##matrix=axtChain 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91
##gapPenalties=axtChain O=400 E=30
net chr1 195471971
 fill 3051761 112 scaffold00029_pilon_pilon - 11057999 112 id 4113312 score 5313 ali 112
 fill 3054673 508 scaffold00002_pilon_pilon + 385464750 554 id 2546634 score 8323 ali 471
 fill 3061871 46 scaffold00029_pilon_pilon + 24281035 46 id 7073966 score 3087 ali 46
 fill 3083164 548 scaffold00004_pilon_pilon - 33312094 340 id 1779203 score 11319 ali 224
  gap 3083328 301 scaffold00004_pilon_pilon - 33312154 114
 fill 3094693 20671 scaffold00012_pilon_pilon - 6162007 19098 id 241035 score 32777 ali 1550
  gap 3095443 32 scaffold00012_pilon_pilon - 6180332 0

Any advice on troubleshooting this error would be greatly appreciated.

Thanks so much!
Laura
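
A note for readers who hit the same message: the error says the script could not parse a net type from the fill line, and the fill lines quoted above carry only the id, score and ali fields. The sketch below lists fill lines that lack a type key, assuming the usual net layout in which each fill line has seven leading columns followed by key/value pairs. The function name fills_without_type and the line ending in "type top" are hypothetical and serve only as illustration. In the UCSC toolchain such type annotations are normally added by netSyntenic; whether that is the step missing here is an assumption, not something confirmed in this thread.

```python
import io


def fills_without_type(handle):
    """Return (line_number, chain_id) for fill lines that have no 'type' key."""
    missing = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields or fields[0] != "fill":
            continue
        # Columns 0-6: fill tStart tSize qName strand qStart qSize.
        # Everything after that is read as alternating key/value pairs.
        attrs = dict(zip(fields[7::2], fields[8::2]))
        if "type" not in attrs:
            missing.append((line_no, attrs.get("id")))
    return missing


if __name__ == "__main__":
    demo = io.StringIO(
        "net chr1 195471971\n"
        " fill 3051761 112 scaffold00029_pilon_pilon - 11057999 112"
        " id 4113312 score 5313 ali 112\n"
        # Hypothetical annotated line, shown only for contrast:
        " fill 10 5 scafX + 1 5 id 7 score 9000 ali 5 type top\n"
    )
    for line_no, chain_id in fills_without_type(demo):
        print(f"line {line_no}: fill for chain id {chain_id} has no type field")
```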
