
moshi4 / fastdtlmapper

12 stars, 3 watchers, 5 forks, 39.82 MB

Genome-wide gene gain/loss mapping tool using the DTL (Duplication-Transfer-Loss) reconciliation method

License: GNU General Public License v3.0

Python 84.13% Roff 2.01% Shell 11.12% Perl 2.71% Dockerfile 0.03%
python bioinformatics dtl-reconciliation phylogenomics phylogenetics functional-analysis genomics pipeline comparative-genomics molecular-evolution

fastdtlmapper's People

Contributors

moshi4

fastdtlmapper's Issues

IQ-TREE randomly resolved multifurcation is not appropriate for DTL reconciliation

IQ-TREE randomly resolves multifurcations in the bootstrap trees; the resulting branches have length 0.0000000000.
However, such a randomly resolved topology is expected to cause undesirable results, such as a significant increase in DTL cost.
Resolving multifurcations based on the species tree topology, instead of randomly, may therefore reduce the DTL cost.

When a multifurcation is detected in a bootstrap tree, I'd like to implement a method that resolves it based on the species tree topology.

Multifurcation gene tree sample (screenshot from 2021-10-01)
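A minimal detection sketch, assuming the bootstrap trees are plain Newick strings without quoted labels and that a randomly resolved multifurcation shows up as an internal branch of length zero, as described above. The function name and the `eps` threshold are hypothetical; a species-tree-guided resolution step would only need to run on trees where this returns a non-empty list.

```python
import re
from typing import List

# An internal branch in Newick is ")" + optional node label (e.g. bootstrap
# support) + ":" + branch length. Leaf branches never follow ")".
# Assumes no quoted labels (a quoted label could contain ":" or ")").
INTERNAL_BRANCH = re.compile(r"\)([^:,();]*):([0-9.eE+-]+)")


def find_zero_length_internal_branches(newick: str, eps: float = 1e-10) -> List[str]:
    """Return the labels of internal branches whose length is <= eps."""
    hits = []
    for match in INTERNAL_BRANCH.finditer(newick):
        label, length = match.group(1), float(match.group(2))
        if length <= eps:
            hits.append(label)
    return hits


tree = "((A:0.1,B:0.2)95:0.0000000000,(C:0.1,D:0.1)80:0.3,E:0.5);"
print(find_zero_length_internal_branches(tree))
```

A regex is enough for detection; actually re-resolving the topology would need a proper tree library.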

Provide a list of predicted results for horizontal gene transfer for each gene

FastDTLmapper does not provide a list of predicted horizontal gene transfers for each gene.
To improve usability, we would like to consider outputting each predicted horizontal gene transfer in the following TSV format.

Format example (tab-separated):

OG_ID      GENE_ID          TransferPath
OG000000   Species0_GENE0   N00X -> Species0
OG000000   Species0_GENE1   SpeciesX -> Species0
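A minimal sketch of writing that format with Python's standard `csv` module, assuming the transfers are already available as (OG_ID, GENE_ID, donor, recipient) tuples. `write_transfer_tsv` and the record layout are hypothetical, not existing FastDTLmapper code.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Hypothetical record: (OG_ID, GENE_ID, donor node or species, recipient species)
TransferRecord = Tuple[str, str, str, str]


def write_transfer_tsv(records: Iterable[TransferRecord], handle: TextIO) -> None:
    """Write a header plus one row per predicted horizontal gene transfer."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["OG_ID", "GENE_ID", "TransferPath"])
    for og_id, gene_id, donor, recipient in records:
        writer.writerow([og_id, gene_id, f"{donor} -> {recipient}"])


buf = io.StringIO()
write_transfer_tsv(
    [
        ("OG000000", "Species0_GENE0", "N00X", "Species0"),
        ("OG000000", "Species0_GENE1", "SpeciesX", "Species0"),
    ],
    buf,
)
print(buf.getvalue(), end="")
```

In practice `handle` would be a file opened with `newline=""`, as the `csv` documentation recommends.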

Refactor: config.py is complex and has several responsibilities

config.py has a number of responsibilities.
It handles argument processing, output path storage, command generation, and so on.
Therefore, I'd like to split it into separate classes to make the code more concise.

Candidate classes

  • Args class
  • RunCmd class
  • OutPath class
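A minimal sketch of that split, assuming dataclasses. Every field, property and method name below, including the mafft command line, is hypothetical and only illustrates which responsibility would live in which class.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Args:
    """Holds parsed command-line arguments only (hypothetical fields)."""
    indir: Path
    outdir: Path
    process_num: int = 1


@dataclass(frozen=True)
class OutPath:
    """Knows the output directory layout and nothing else."""
    rootdir: Path

    @property
    def dtl_rec_dir(self) -> Path:
        return self.rootdir / "02_dtl_reconciliation"


class RunCmd:
    """Builds external tool command lines from Args and OutPath."""

    def __init__(self, args: Args, outpath: OutPath) -> None:
        self.args = args
        self.outpath = outpath

    def mafft_cmd(self, fasta: Path, aln: Path) -> str:
        return f"mafft --auto --thread {self.args.process_num} {fasta} > {aln}"


args = Args(Path("in"), Path("out"), process_num=4)
outpath = OutPath(args.outdir)
print(RunCmd(args, outpath).mafft_cmd(Path("OG0000000.fa"), Path("OG0000000.aln")))
```

With this layout, changing an output path touches only `OutPath`, and adding a tool touches only `RunCmd`.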

diamond error

Hi,
While using FastDTLmapper, I got the error below.
I then installed diamond, but I still get the same error. Could you please look into it?

Thank you

Checking required programs are installed

ERROR: Cannot run diamond
Format of make database command:
diamond makedb --in INPUT -d OUTPUT
ERROR: Cannot run diamond
Format of search database command:
diamond blastp -d DATABASE -q INPUT -o OUTPUT --more-sensitive -p 1 --quiet -e 0.001 --compress 1
Please check diamond is installed and that the executables are in the system path

ERROR: An error occurred, please review the error messages they may contain useful information about the problem.

02. Align each OG(Ortholog Group) sequences using mafft

0% 0:0=0s
Finished: 2022/09/22 19:28:50 (Elapsed time = 0.000[h])

03. Trim each OG alignment using trimal

0% 0:0=0s
Finished: 2022/09/22 19:28:50 (Elapsed time = 0.000[h])

04. Reconstruct each OG gene tree using iqtree

0% 0:0=0s
Finished: 2022/09/22 19:28:51 (Elapsed time = 0.000[h])

05. Correct gene tree multifurcation using treerecs

0% 0:0=0s
Finished: 2022/09/22 19:28:51 (Elapsed time = 0.000[h])

06. DTL reconciliation using AnGST

0% 0:0=0s
Finished: 2022/09/22 19:28:51 (Elapsed time = 0.000[h])

07. Aggregate and map DTL reconciliation result

Traceback (most recent call last):
  File "/home/pbl2-ec/miniconda3/bin/FastDTLmapper", line 8, in <module>
    sys.exit(main())
  File "/home/pbl2-ec/miniconda3/lib/python3.9/site-packages/fastdtlmapper/scripts/FastDTLmapper.py", line 57, in main
    aggregate_and_map(args, outpath)
  File "/home/pbl2-ec/miniconda3/lib/python3.9/site-packages/fastdtlmapper/util/time.py", line 12, in wrap
    ret = func(*args, **kargs)
  File "/home/pbl2-ec/miniconda3/lib/python3.9/site-packages/fastdtlmapper/scripts/FastDTLmapper.py", line 245, in aggregate_and_map
    output_aggregate_map_results(outpath, group_id2all_node_event)
  File "/home/pbl2-ec/miniconda3/lib/python3.9/site-packages/fastdtlmapper/scripts/FastDTLmapper.py", line 332, in output_aggregate_map_results
    UtilTree.map_node_event(
  File "/home/pbl2-ec/miniconda3/lib/python3.9/site-packages/fastdtlmapper/util/tree.py", line 96, in map_node_event
    node_event = node_id2node_event[node.name]
KeyError: 'N001'
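The first error in the log says `diamond` could not be run, and the message itself suggests checking that the executable is on the system path. Because steps 02 to 06 then processed nothing (0%), the later `KeyError: 'N001'` looks like a downstream symptom of that earlier failure rather than a separate bug. A quick check with Python's standard `shutil.which` shows which tools resolve on `PATH`; the program names listed are an assumption based on the tools named in the log.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_executables(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each program name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


# Assumed executable names, taken from tools named in the log; adjust as needed.
required = ["diamond", "mafft", "trimal"]
missing = [name for name, path in locate_executables(required).items() if path is None]
print("Not found on PATH:", ", ".join(missing) if missing else "none")
```

Run it from the same shell and environment used to launch FastDTLmapper, so it sees the same `PATH`.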

Add publication-ready gain/loss map plot function

Gain/loss map visualization using Seaview is useful enough for data analysis.
However, the Seaview gain/loss map figure is not of publication-ready quality.
For users' convenience, I'd like to add a program that generates
polished gain/loss map figures.

Plot only GO terms of a specified depth in GOEA results

Upper-level GO terms are too general to be meaningful and are not useful to plot.
Exclude GO terms at the top (first) and second levels of the hierarchy.

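import pandas as pd

# goea_result_file, extract_type, pvalue_column and pvalue_thr are supplied by the caller
df = pd.read_table(goea_result_file)
if extract_type == "enrichment":
    extract_df = df[(df["enrichment"] == "e") & (df[pvalue_column] < pvalue_thr)]
elif extract_type == "purified":
    extract_df = df[(df["enrichment"] == "p") & (df[pvalue_column] < pvalue_thr)]
else:
    raise ValueError(f"Unknown extract_type: {extract_type!r}")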
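The snippet above filters only by direction and P-value. A minimal sketch of the requested depth filter follows, assuming the GOEA table has a `depth` column (goatools can write one) in which the root term has depth 0, so that `min_depth=2` drops the first and second hierarchy levels. `extract_goea_terms`, `min_depth` and the `p_fdr_bh` column name are assumptions, and the table values are made up.

```python
import io

import pandas as pd


def extract_goea_terms(
    df: pd.DataFrame,
    extract_type: str,
    pvalue_column: str,
    pvalue_thr: float,
    min_depth: int = 2,
) -> pd.DataFrame:
    """Filter GOEA rows by direction, P-value and GO depth (hypothetical helper)."""
    codes = {"enrichment": "e", "purified": "p"}
    if extract_type not in codes:
        raise ValueError(f"Unknown extract_type: {extract_type!r}")
    mask = (
        (df["enrichment"] == codes[extract_type])
        & (df[pvalue_column] < pvalue_thr)
        & (df["depth"] >= min_depth)  # drop the shallowest, least specific terms
    )
    return df[mask]


# Tiny illustrative table with made-up values, not real GOEA output.
tsv = (
    "GO\tenrichment\tp_fdr_bh\tdepth\n"
    "GO_A\te\t0.01\t1\n"
    "GO_B\te\t0.01\t3\n"
    "GO_C\tp\t0.01\t4\n"
)
df = pd.read_table(io.StringIO(tsv))
print(extract_goea_terms(df, "enrichment", "p_fdr_bh", 0.05)["GO"].tolist())
```

Making `min_depth` a parameter lets users choose how many upper levels to drop instead of hard-coding two.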

FileNotFoundError

When I run the test example, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'output_minimum/02_dtl_reconciliation/OG0000000/treerecs/OG0000000_multifurcate.ufboot_recs.nwk'
How can I solve this?

Use the original FASTA annotation ID in the unique serial ID

Currently, user FASTA annotations are reformatted into completely new unique serial IDs.
This makes it difficult to match a unique serial ID to the original FASTA annotation ID.
It would be preferable to derive the unique serial ID from the original FASTA annotation ID.
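One possible scheme, sketched under stated assumptions and not the tool's current behavior: prefix the original ID (the first whitespace-delimited token of the header) with the species name, replace characters that are unsafe in Newick labels, append a numeric suffix on collisions, and keep a map back to the original ID. `make_unique_ids` and the naming pattern are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def make_unique_ids(species: str, headers: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Derive unique IDs from original FASTA header IDs (hypothetical scheme)."""
    new_ids: List[str] = []
    id_map: Dict[str, str] = {}
    seen = set()
    for header in headers:
        original = header.lstrip(">").split()[0]
        # Keep only characters that are safe in unquoted Newick labels.
        base = f"{species}_{re.sub(r'[^A-Za-z0-9_.-]', '_', original)}"
        new_id, n = base, 1
        while new_id in seen:  # resolve collisions with a numeric suffix
            new_id = f"{base}_{n}"
            n += 1
        seen.add(new_id)
        new_ids.append(new_id)
        id_map[new_id] = original
    return new_ids, id_map


ids, id_map = make_unique_ids("Species0", [">GENE0 desc", ">GENE0 dup", ">gene|1"])
print(ids)
```

The returned map could be written out alongside the results so every ID can be traced back to the user's annotation.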

Add test code

  • prepare test data files
  • add all test code
  • reach at least 75% code coverage

GOEA result summary report is desirable

The GOEA result directory structure is deeply nested, making it difficult to get an overview.
I'd like to consider outputting a summary report file that gives an overview of all results, along with a single directory that collects all plots.
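A minimal sketch, assuming plots are saved as `.png`/`.svg` and result tables as `.tsv` somewhere under one GOEA root directory. `summarize_goea_dir`, the `plots` subdirectory and `summary.txt` are hypothetical names. Plots are flattened into one folder with their relative path encoded in the file name, and the summary file lists every result table.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def summarize_goea_dir(goea_root: Path, summary_dir: Path) -> List[Path]:
    """Copy nested plots into one folder and list all result tables."""
    plot_dir = summary_dir / "plots"
    plot_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for plot in sorted(goea_root.rglob("*")):
        if plot.is_file() and plot.suffix in {".png", ".svg"}:
            # e.g. N001/gain/BP.png -> N001__gain__BP.png keeps names unique
            flat_name = "__".join(plot.relative_to(goea_root).parts)
            dest = plot_dir / flat_name
            shutil.copy2(plot, dest)
            copied.append(dest)
    tables = sorted(p.relative_to(goea_root) for p in goea_root.rglob("*.tsv"))
    (summary_dir / "summary.txt").write_text("".join(p.as_posix() + "\n" for p in tables))
    return copied


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "goea"
    (root / "N001" / "gain").mkdir(parents=True)
    (root / "N001" / "gain" / "BP.png").write_bytes(b"placeholder")
    (root / "N001" / "gain" / "result.tsv").write_text("GO\n")
    copied = summarize_goea_dir(root, Path(tmp) / "summary")
    summary_text = (Path(tmp) / "summary" / "summary.txt").read_text()
    print([p.name for p in copied], summary_text)
```

Keep `summary_dir` outside `goea_root` so the copied plots are not picked up on a second run.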

AnGST folder is not created

I need help with this, please. Why isn't the AnGST folder created?
I don't understand what might have caused this, as I only ran the default command.

Thank you!


Plot parameters in goea.py should be instance variables

Currently, the plot parameters in goea.py are controlled by passing them as arguments to each function.
However, as the function calls become deeper, passing these arguments becomes increasingly complicated.
Therefore, we would like to define the plot parameters as instance variables of a class instead.
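A minimal sketch of the proposed refactor, assuming a dataclass holds the shared parameters. `GOEAPlotter`, its fields and its methods are hypothetical, and drawing is omitted; the point is that helper methods at any call depth read `self.pvalue_thr` and `self.max_terms` instead of receiving them as arguments.

```python
from dataclasses import dataclass
from typing import List, Tuple

Term = Tuple[str, float]  # (GO term name, P-value)


@dataclass
class GOEAPlotter:
    """Hypothetical container for plot parameters shared by all plot methods."""

    pvalue_thr: float = 0.05
    max_terms: int = 20
    figsize: Tuple[float, float] = (8.0, 6.0)
    dpi: int = 300

    def _is_significant(self, term: Term) -> bool:
        # Reads the threshold from the instance, not from a function argument.
        return term[1] < self.pvalue_thr

    def select_terms(self, terms: List[Term]) -> List[Term]:
        """Keep significant terms, most significant first, up to max_terms."""
        kept = [t for t in terms if self._is_significant(t)]
        return sorted(kept, key=lambda t: t[1])[: self.max_terms]


plotter = GOEAPlotter(max_terms=1)
print(plotter.select_terms([("a", 0.01), ("b", 0.001), ("c", 0.5)]))
```

A drawing method would call `self.select_terms(...)` and then use `self.figsize` and `self.dpi`, so adding a parameter changes only the class definition.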

FastDTLmapper error caused by Python 2.7 dependency

Dear moshi4,

I am Yuchun, an assistant professor at Sun Yat-sen University. I am trying to use FastDTLmapper for my data analysis, but an error occurred after I installed the software, even when analyzing the provided example data. I have tried installing different versions of FastDTLmapper, but none of them worked. Could you please help me solve this problem? Thanks

Screenshot: fastdtlmapper_error

Best wishes,

Yuchun

Received email on 2022/09/03 16:16

Hi, Yuchun (@Yuchunyang2022)

Thank you for your interest in FastDTLmapper.

The error occurs because Python 2.7 is not installed, and AnGST, the tool used for DTL reconciliation, requires Python 2.7.
To run FastDTLmapper successfully, you need an environment in which Python 2.X and Python 3.X coexist (e.g., Ubuntu 20.04), although setting one up takes considerable work.

To save users the trouble of building such an environment, I created a Docker image for running FastDTLmapper (see here).
Although some knowledge of Docker is required, I recommend running FastDTLmapper in this Docker image.

Add a GOEA plot option for user-selected colors

Currently, GOEA results are plotted with a yellow-to-red color gradient according to the P-value.
In addition to the gradient plot, we would like to add an option to plot in a user-specified color.

Fig. Current GOEA result plot (gain_goea_BP_purified)
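A minimal sketch of both coloring modes, assuming the gradient maps smaller P-values (larger -log10 P) toward red and interpolates linearly in RGB between the smallest and largest -log10 P in the plot. `pvalue_colors` and its `user_color` parameter are hypothetical; the returned hex strings could be passed as bar colors to a plotting library.

```python
import math
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]
YELLOW: RGB = (1.0, 1.0, 0.0)
RED: RGB = (1.0, 0.0, 0.0)


def to_hex(rgb: RGB) -> str:
    """Convert an RGB triple in [0, 1] to a '#rrggbb' string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def pvalue_colors(pvalues: Sequence[float], user_color: Optional[str] = None) -> List[str]:
    """Return one color per GO term: a single user color, or a yellow-to-red gradient."""
    if user_color is not None:
        return [user_color] * len(pvalues)
    if not pvalues:
        return []
    # Clamp to avoid log10(0) for P-values reported as exactly 0.
    scores = [-math.log10(max(p, 1e-300)) for p in pvalues]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    colors = []
    for s in scores:
        t = (s - lo) / span if span > 0 else 1.0  # 0 = yellow, 1 = red
        rgb = tuple(y + (r - y) * t for y, r in zip(YELLOW, RED))
        colors.append(to_hex(rgb))
    return colors


print(pvalue_colors([0.01, 0.0001]))
print(pvalue_colors([0.01, 0.0001], user_color="#1f77b4"))
```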

Add: GO enrichment analysis function

FastDTLmapper generates gene gain/loss information for each node of the phylogenetic tree.
To clarify the functional trends of the gained/lost genes, GO enrichment analysis of those genes is useful.

Add the following:

  • GO annotation function using InterProScan
  • GO enrichment analysis function using goatools
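goatools would perform the actual test, including GO DAG propagation and multiple-testing correction. For intuition only, the core question for one GO term at one tree node, whether gained genes carry the term more often than expected, can be sketched as a one-sided hypergeometric tail probability. `enrichment_pvalue` is a hypothetical name, and this is a simplification rather than goatools' exact procedure.

```python
from math import comb


def enrichment_pvalue(study_hits: int, study_size: int, pop_hits: int, pop_size: int) -> float:
    """One-sided hypergeometric P(X >= study_hits).

    study_hits: gained genes at the node annotated with the GO term
    study_size: all gained genes at the node
    pop_hits:   population genes annotated with the GO term
    pop_size:   all genes in the population
    """
    upper = min(study_size, pop_hits)
    total = comb(pop_size, study_size)
    # math.comb(n, k) returns 0 when k > n, so impossible splits contribute nothing.
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


print(enrichment_pvalue(study_hits=4, study_size=10, pop_hits=20, pop_size=1000))
```

Purification (depletion) would use the opposite tail, P(X <= study_hits).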
