sung-huan / annogesic

ANNOgesic - A Swiss army knife for the RNA-Seq based annotation of bacterial/archaeal genomes

Home Page: http://annogesic.readthedocs.io/en/latest/index.html

License: Other

Python 97.39% Shell 2.19% Makefile 0.06% Dockerfile 0.35%

annogesic's Issues

Multiple data set

Hello Sung-Huan,

I have multiple datasets (4, thus 8 TEX treated files) of the same bacterium tested under different conditions. I am interested in identifying the expression of sRNAs under these different conditions. Can I use ANNOgesic to run all the datasets at once and get separate output files for each dataset in one run, or do I have to run each dataset individually?

too many warnings in target prediction

In the latest version of srna_target, I get thousands of warnings like these:

WARNING: File ' ANNOgesic_analysis/output/sRNA_targets/RNAplex_results/NC_018660.1/RNAplfold/3_O3K_RS26175_cds-WP_000869859.1_11934-12962_-openen ' open error
WARNING: File ' ANNOgesic_analysis/output/sRNA_targets/RNAplex_results/NC_018660.1/RNAplfold/3_O3K_RS26170_cds-WP_001387337.1_11391-11930-openen ' open error
WARNING: File ' ANNOgesic_analysis/output/sRNA_targets/RNAplex_results/NC_018660.1/RNAplfold/3_O3K_RS26165_cds-WP_000019440.1-2_9596-10576-openen ' open error
WARNING: File ' ANNOgesic_analysis/output/sRNA_targets/RNAplex_results/NC_018660.1/RNAplfold/3_O3K_RS26125_cds-WP_001189111.1_3211-4719-openen ' open error
WARNING: File ' ANNOgesic_analysis/output/sRNA_targets/RNAplex_results/NC_018660.1/RNAplfold/3_O3K_RS26110_cds-WP_014971748.1_757-1737-_openen ' open error

Is there any explanation for why this is happening?

Installation directory

Thanks for the nice package.

I'm looking to install ANNOgesic on a computing cluster; however, annogesic create --project_path $PATH seems to require installation in my home directory.

Is there a way to install in another location?

Path to install segemehl in get_package_database.sh is outdated

Currently:

VERSION=0_2_0
wget -cP $TOOL_PATH http://www.bioinf.uni-leipzig.de/Software/segemehl/segemehl_${VERSION}.tar.gz

It should be the following if keeping version 0_2_0:

wget -cP $TOOL_PATH http://www.bioinf.uni-leipzig.de/Software/segemehl/old/segemehl_${VERSION}.tar.gz

Or try a newer version:

VERSION=0.3.4
wget -cP $TOOL_PATH http://www.bioinf.uni-leipzig.de/Software/segemehl/downloads/segemehl-${VERSION}.tar.gz
cd $TOOL_PATH
tar -zxvf segemehl-${VERSION}.tar.gz
cd segemehl-${VERSION}/
make all
cd $TOOL_PATH
rm segemehl-${VERSION}.tar.gz

annogesic: error: unrecognized arguments: --tsspredator_jar TSSpredator-1.1beta.jar

I am using ANNOgesic for TSS identification and for other purposes. I installed ANNOgesic, updated Java to version 8, and installed the dependencies. I checked that the jar file is executable, and I passed it with -t, with --tsspredator, and with the full command, but ANNOgesic does not recognize the jar file. I also tried /usr/local/bin/jar.file and placing the jar file in the working directory, but in every case ANNOgesic throws the error shown in the title.
Any suggestions would be appreciated.

Thanks

Use normalized or raw coverage files for ANNOgesic?

Hello,

I prepared my .wig/coverage files using READemption2 (multi-species). I see that the tutorial recommends using the coverage files in the READemption subdir READemption_analysis/output/coverage. However, unlike in the tutorial, there is no single coverage dir but multiple dirs: one per species for the raw data (e.g. species1_coverage-raw, species2_coverage-raw), and then some for normalized data (e.g. species1_coverage-tnoar_mil_normalized and species1_coverage-tnoar_min_normalized).

Does ANNOgesic perform better with raw coverage data? Or is it important to use normalized data?

Thank you very much!

sRNA prediction from RNA-seq data

Hi Sung-Huan,

I am looking into the possibility of performing sRNA prediction from my RNA-seq data. Currently my data are in fastq and BAM formats, which seem to differ from the input file format required by the sRNA detection mode.

From what I have read so far, it seems that I will have to convert my BAM files into wig files, which can then become the input for generating a transcript file using the transcript detection mode. The transcript file can then be the input for the sRNA detection mode. However, I am not sure if that's the right way to go. Or am I over-complicating things? Are there other ways to perform sRNA detection with ANNOgesic using either fastq or BAM files?

thanks a lot in advance,

Stephen

The use of parameters produced by the optimize TSS/PS feature

The optimize TSS/PS feature produces the best parameters for the subsequent TSS/PS prediction step. Once these parameters have been produced, the user has to pass them manually to the TSS/PS prediction function. This raises two issues:
1- The user has to wait for the optimization to finish and then pass the parameters manually, although almost nobody runs the optimization without using the resulting parameters in later steps.
2- With multi-accession files, for example species that have plasmids or more than one chromosome, the optimization produces one parameter set per accession, but the user can pass only one of these sets, which is then applied to all the other accessions.
This might seem minor for organisms with plasmids, but it is a major problem for multi-chromosome species such as Vibrio cholerae, which has two chromosomes of similar size.
My suggestion is:
The prediction tool should look for optimized parameters automatically (or when a boolean parameter is set) and load them if found. Then, for each accession, the tool should use the appropriate parameter set.
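
The suggestion above could be sketched roughly as follows. This is only an illustration, not ANNOgesic code: the parameter names (`height`, `factor`), the default values, the accession labels, and the function `params_for_accession` are all hypothetical.

```python
# Hypothetical defaults; the real TSSpredator parameter names may differ.
DEFAULT_PARAMS = {"height": 0.3, "factor": 2.0}


def params_for_accession(accession, optimized, use_optimized=True):
    """Return the parameter set to use for one accession.

    `optimized` maps accession IDs to the parameters found by an
    optimization run. Accessions without an entry fall back to defaults.
    """
    if use_optimized and accession in optimized:
        # Merge so that parameters missing from the optimized set keep defaults.
        return {**DEFAULT_PARAMS, **optimized[accession]}
    return dict(DEFAULT_PARAMS)


# Two chromosomes with their own optimized sets, plus a plasmid without one.
optimized = {
    "chr_1": {"height": 0.4, "factor": 1.8},
    "chr_2": {"height": 0.2},
}
for acc in ("chr_1", "chr_2", "plasmid_1"):
    print(acc, params_for_accession(acc, optimized))
```

The `use_optimized` switch corresponds to the boolean parameter mentioned in the suggestion, so users who want a single manual parameter set can still have one.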

Read-only file system

Hi,

I'm getting the following error when I try to run the docker image on a computing cluster I have root access to:

sternesp@cl5n008:~/lustre-microbiome/users/sternesp/annogesic$ singularity exec -B /lustre/work-lustre/microbiome/users/sternesp/annogesic/ANNOgesic annogesic.img \
> annogesic \
> transcript \
> --annotation_files /home/sternesp/lustre-microbiome/users/sternesp/annogesic/ANNOgesic/input/references/annotations/GUT_GENOME000001.gff \
> --project_path /lustre/work-lustre/microbiome/users/sternesp/annogesic/ANNOgesic \
> --frag_lib /lustre/work-lustre/microbiome/users/sternesp/annogesic/ANNOgesic/input/wigs/fragment/rev2.wig:frag:1:a:- \
> /lustre/work-lustre/microbiome/users/sternesp/annogesic/ANNOgesic/input/wigs/fragment/fwd2.wig:frag:1:a:+ \
> --replicate_frag all_1

       ___    _   ___   ______                  _     
      /   |  / | / / | / / __ \____ ____  _____(_)____ \
  __ / /| | /  |/ /  |/ / / / / __ `/ _ \/ ___/ / ___/__\
 |  / ___ |/ /|  / /|  / /_/ / /_/ /  __(__  ) / /__    /
 | /_/  |_/_/ |_/_/ |_/\____/\__, /\___/____/_/\___/   /
 |                          /____/ 
 |__________________
 |_____________________
 |________________________________________________
 |                                                \
 |________________________________________________/

Running transcript detection
Parsing GUT_GENOME000001.gff
Parsing rev2.wig
Parsing fwd2.wig
Merging wig files of GUT_GENOME000001
Importing fragment wig files
Computing transcripts for GUT_GENOME000001
Traceback (most recent call last):
  File "/usr/local/bin/annogesic", line 2238, in <module>
    main()
  File "/usr/local/bin/annogesic", line 2154, in main
    args.func(controller)
  File "/usr/local/bin/annogesic", line 2191, in run_Transcript_Assembly
    controller.transcript()
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/controller.py", line 464, in transcript
    transcript.run_transcript(args_tran, log)
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/transcript.py", line 314, in run_transcript
    self.tmps["gff"])
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/helper.py", line 224, in sort_gff
    out = open(out_file, "w")
OSError: [Errno 30] Read-only file system: 'tmp.gff'
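
The traceback shows `tmp.gff` being opened with a relative path, so the file is created in whatever directory the command was started from. Inside a container image that directory may be read-only unless it is bind-mounted. The sketch below illustrates the general idea of switching to a writable working directory before writing relative paths. The helper `writable_workdir` and the `annogesic_work_` prefix are made up for this example and are not part of ANNOgesic.

```python
import os
import tempfile


def writable_workdir(preferred):
    """Return `preferred` if files can be created there, else a fresh temp dir."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.mkdtemp(prefix="annogesic_work_")


workdir = writable_workdir(os.getcwd())
os.chdir(workdir)

# A relative path such as "tmp.gff" now resolves inside a writable directory.
with open("tmp.gff", "w") as out:
    out.write("##gff-version 3\n")
os.remove("tmp.gff")
```

In practice, the equivalent step would probably be to start the command from the bind-mounted project directory, though that has not been verified for this setup.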

Transcript command, Comparison step, division by zero error

Dear ANNOgesic developer,

Thank you for releasing this useful tool. I have been running ANNOgesic (v 1.0.6) transcript command and it gave me the following error:

File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/stat_TA_comparison.py", line 335, in print_tag_stat
str(float(express_gene) / float(stats["gene"]))))
ZeroDivisionError: float division by zero

The full log file is copied below. Could you please let me know the likely source of the error (e.g. the GFF file is not properly formatted)? That way I can try a few changes. I also thought it would be a good idea to report this runtime error, which is caused by a division by zero (e.g. it would be good to have more intuitive error messages that tell users what went wrong). I think this is more of a suggestion than a bug report.

The tex and notex data is private, but the RNA-seq fragment data is public. Please let me know if you require the fragment files for debugging and if so the best way to send the files to you (I can send some sample .wig files and GFF file to you via a downloadable link.)

Yours sincerely,

Ignatius Pang

Running transcript detection
Parsing Sa_JKD6009_ratt.gff
Parsing ctrl_notex_pos.wig
Parsing ctrl_notex_neg.wig
Parsing ctrl_tex_pos.wig
Parsing ctrl_tex_neg.wig
Parsing vanco_notex_pos.wig
Parsing vanco_notex_neg.wig
Parsing vanco_tex_pos.wig
Parsing vanco_tex_neg.wig
Merging wig files of Sa_JKD6009_ratt
Parsing SRR568061_plus_strand.wig
Parsing SRR568061_minus_strand.wig
etc... etc...
Parsing SRR568120_plus_strand.wig
Parsing SRR568120_minus_strand.wig
Merging wig files of Sa_JKD6009_ratt
Importing fragment wig files
Computing transcripts for Sa_JKD6009_ratt
/srv/scratch/z3371724/JKD6009_v1/output/transcripts/Sa_JKD6009_ratt_fragment
Importing tex_notex wig files
Computing transcripts for Sa_JKD6009_ratt
/srv/scratch/z3371724/JKD6009_v1/output/transcripts/Sa_JKD6009_ratt_tex_notex
Merging fragmented and tex treated ones
Merging gff files of Sa_JKD6009_ratt
Parsing Sa_JKD6009_ratt_transcript.gff
Parsing Sa_JKD6009_ratt_transcript.gfftmp
Merging gff files of Sa_JKD6009_ratt_transcript
Modifying Sa_JKD6009_ratt by refering to Sa_JKD6009_ratt.gff
Parsing Sa_JKD6009_ratt_transcript.gff
Parsing Sa_JKD6009_ratt.gff
Merging gff files of Sa_JKD6009_ratt
Comaring of transcripts and genome annotations
Traceback (most recent call last):
File "/apps/python/3.7.3/bin/annogesic", line 2178, in
main()
File "/apps/python/3.7.3/bin/annogesic", line 2094, in main
args.func(controller)
File "/apps/python/3.7.3/bin/annogesic", line 2131, in run_Transcript_Assembly
controller.transcript()
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/controller.py", line 462, in transcript
transcript.run_transcript(args_tran, log)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/transcript.py", line 332, in run_transcript
self._compare_tss_cds(tas, args_tran, log)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/transcript.py", line 128, in _compare_tss_cds
self._compare_cds(tas, args_tran, log)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/transcript.py", line 113, in _compare_cds
args_tran.c_feature)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/stat_TA_comparison.py", line 405, in stat_ta_gff
print_tag_stat(stats["All"], out_stat, express_gene, feature)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/stat_TA_comparison.py", line 335, in print_tag_stat
str(float(express_gene) / float(stats["gene"]))))
ZeroDivisionError: float division by zero
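
The last frame of the traceback divides by `stats["gene"]`, so that count was zero in this run. One plausible cause, which is only an assumption here, is that the annotation contains no features of the type being counted. A more informative message could be produced along these lines. The functions `safe_ratio` and `format_gene_stat` are hypothetical and only mirror the variable names visible in the traceback.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if float(denominator) == 0.0:
        return None
    return float(numerator) / float(denominator)


def format_gene_stat(express_gene, stats):
    """Build a report line; explain the problem instead of raising."""
    ratio = safe_ratio(express_gene, stats.get("gene", 0))
    if ratio is None:
        return ("No features of type 'gene' were counted; "
                "check the feature column of the GFF file.")
    return "Expressed genes: {0} ({1:.3f})".format(express_gene, ratio)


print(format_gene_stat(5, {"gene": 20}))
print(format_gene_stat(5, {"gene": 0}))
```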

--tex_notex_libs was assinged incorrectly

Hi,

I'm trying to execute the command shown below however I keep getting the following error:

Error: The --tex_notex_libs was assinged incorrectly. Please check it again.

Command:

WIG_FOLDER="ANNOgesic/input/wigs/fragment"
LIBS="$WIG_FOLDER/rev2.wig:frag:1:a:- $WIG_FOLDER/fwd2.wig:frag:1:a:+"

#Run TSS
singularity exec -B ANNOgesic annogesic.img \
annogesic \
tss_ps \
--fasta_files MGYG-HGUT-00001.fna \
--annotation_files GUT_GENOME000001.gff \
--project_path ANNOgesic \
--program TSS \
--tex_notex_libs $LIBS \
--condition_names fileName \
--replicate_tex all_1

Example wiggle file:

head $WIG_FOLDER/rev2.wig

track type=wiggle_0 name=fileName
variableStep chrom=GUT_GENOME000001_1 span=1
34	1
35	1
36	1
37	1
38	1
39	1
40	1
41	1

I've been scratching my head about what the issue is. Can you please help?

Thanks!.
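
One thing to check is the library type field. The strings above use `frag`, which elsewhere on this page is passed to --frag_libs, while the option used here is --tex_notex_libs. Assuming the library string layout `file:type:condition:replicate:strand` and assuming that --tex_notex_libs accepts only the types `tex` and `notex` (both worth confirming against the documentation of the installed version), a quick sanity check could look like this. The function `parse_lib` is a hypothetical helper, not ANNOgesic's own validation.

```python
def parse_lib(lib, allowed_types):
    """Split 'file:type:condition:replicate:strand' and validate each field."""
    fields = lib.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: " + lib)
    filename, lib_type, condition, replicate, strand = fields
    if lib_type not in allowed_types:
        raise ValueError("library type '{0}' not in {1}: {2}".format(
            lib_type, sorted(allowed_types), lib))
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-': " + lib)
    return {"file": filename, "type": lib_type, "condition": condition,
            "replicate": replicate, "strand": strand}


libs = ["ANNOgesic/input/wigs/fragment/rev2.wig:frag:1:a:-",
        "ANNOgesic/input/wigs/fragment/fwd2.wig:frag:1:a:+"]
for lib in libs:
    try:
        parse_lib(lib, allowed_types={"tex", "notex"})
    except ValueError as err:
        print("rejected:", err)
```

With `allowed_types={"frag"}` the same two strings parse without error, which is consistent with them being fragment libraries.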

Promoter detection, problem with FASTA file

Dear Sung-Huan

I'm trying to get the promoter module to work, but I got the error shown below. I will send you the input files via e-mail. Would you please help me get this to work?

Yours sincerely,

Igy

annogesic promoter
--tss_files /tss_part_10.gff
--fasta_files Sa_JKD6009.fasta
--motif_width 45 2-10
--parallels 10
--project_path JKD6009_v1

       ___    _   ___   ______                  _
      /   |  / | / / | / / __ \____ ____  _____(_)____ \
  __ / /| | /  |/ /  |/ / / / / __ `/ _ \/ ___/ / ___/__\
 |  / ___ |/ /|  / /|  / /_/ / /_/ /  __(__  ) / /__    /
 | /_/  |_/_/ |_/_/ |_/\____/\__, /\___/____/_/\___/   /
 |                          /____/
 |__________________
 |_____________________
 |________________________________________________
 |                                                \
 |________________________________________________/

Running promoter detection
Parsing Sa_JKD6009.fasta
Parsing tss_part_10.gff
Checking gff file of tss_part_10.gff
Merging gff files of Sa_JKD6009_TSS
Generating fasta file of Sa_JKD6009
Traceback (most recent call last):
File "/apps/python/3.7.3/bin/annogesic", line 2178, in
main()
File "/apps/python/3.7.3/bin/annogesic", line 2094, in main
args.func(controller)
File "/apps/python/3.7.3/bin/annogesic", line 2143, in run_MEME
controller.meme()
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/controller.py", line 701, in meme
meme.run_meme(args_pro, log)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/meme.py", line 397, in run_meme
self._split_fasta_by_strain(input_path)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/meme.py", line 198, in _split_fasta_by_strain
"".join([filename[0], strain, filename[-1]])))
UnboundLocalError: local variable 'filename' referenced before assignment

Missing file error for a temporary file generated by the pipeline in the sRNA detection subcommand

The pipeline reports a missing file error for a file that it generates temporarily.
The file is actually there, but one directory level up.

Wig file creation and normalization

Hello Sung-Huan,

I've run into the following issue when trying to run ANNOgesic on my own data (the tutorial worked OK). I'm trying to predict TSSs using dRNAseq and RNAseq under the same conditions.

However, I made my wig files from bigWig files, which were in turn made from bedGraph files. Furthermore, I applied a normalization so that libraries run at different scales would look comparable:

BPS=`samtools view $TAG.bam | cut -f 6 | perl -ne 'm/(\d+)M/; print "$1\n";' | awk '{sum+=$1} END {print sum}'`
SCF=`echo $BPS | awk '{printf "%.3f\n",1000000000/$1}'`
bedtools genomecov -scale  $SCF -ibam $TAG.bam -bg -strand + > $TAG.plus.bedGraph
bedtools genomecov -scale -$SCF -ibam $TAG.bam -bg -strand - > $TAG.minus.bedGraph

bedGraphToBigWig $TAG.plus.bedGraph  $CHROM $TAG.plus.bw
bedGraphToBigWig $TAG.minus.bedGraph $CHROM $TAG.minus.bw

However, when I convert the resulting bw files to wig, the output looks different and makes the wig file parser used for TSSpredator fail:

==> A1636-dRNA-Pool-BPS_S37.minus.wig <==
#bedGraph section A1636_v1_chr:0-10009
A1636_v1_chr 0 42 -0.937
A1636_v1_chr 42 43 -1.874
A1636_v1_chr 43 46 -2.811

Here's the output from TSS prediction run:

Running TSS prediction
Checking gff file of A1636_v1_2018.gff
Warning: Repeat locus_tag A1636_02668 in gff file!!!
Parsing A1636_v1_2018.fa
Parsing A1636_v1_2018.gff
Parsing A1636-dRNA-Pool-BPS_S37.minus.wig
Traceback (most recent call last):
File "/usr/local/bin/annogesic", line 2180, in
main()
File "/usr/local/bin/annogesic", line 2096, in main
args.func(controller)
File "/usr/local/bin/annogesic", line 2121, in run_TSSpredator
controller.tsspredator()
File "/usr/local/lib/python3.5/dist-packages/annogesiclib/controller.py", line 327, in tsspredator
tsspredator.run_tsspredator(args_tss, log)
File "/usr/local/lib/python3.5/dist-packages/annogesiclib/tsspredator.py", line 491, in run_tsspredator
self.multiparser.parser_wig(args_tss.wig_folder)
File "/usr/local/lib/python3.5/dist-packages/annogesiclib/multiparser.py", line 359, in parser_wig
out.write(" ".join(line))
AttributeError: 'NoneType' object has no attribute 'write'

I would appreciate any suggestions about how to fix this. Do I need to re-make my wig files? Thank you!
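
The converted file shown above is in bedGraph layout (chrom, start, end, value) rather than the `variableStep` wiggle layout that appears in the other wig examples on this page. A minimal conversion sketch is given below. It assumes standard bedGraph semantics (0-based, half-open intervals) and writes one 1-based position per line. The function name `bedgraph_to_wig` and the track name are illustrative only, and whether ANNOgesic accepts the result has not been tested.

```python
def bedgraph_to_wig(lines, track_name):
    """Convert bedGraph records to variableStep wiggle lines.

    bedGraph intervals are 0-based and half-open; wiggle positions are
    1-based, so the interval (start, end) covers positions start+1 .. end.
    """
    out = ['track type=wiggle_0 name="{0}"'.format(track_name)]
    current_chrom = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip comments such as "#bedGraph section ..."
        chrom, start, end, value = line.split()[:4]
        if chrom != current_chrom:
            # Start a new block whenever the sequence name changes.
            out.append("variableStep chrom={0} span=1".format(chrom))
            current_chrom = chrom
        for pos in range(int(start) + 1, int(end) + 1):
            out.append("{0} {1}".format(pos, value))
    return out


bedgraph = [
    "#bedGraph section A1636_v1_chr:0-10009",
    "A1636_v1_chr 0 42 -0.937",
    "A1636_v1_chr 42 43 -1.874",
    "A1636_v1_chr 43 46 -2.811",
]
wig = bedgraph_to_wig(bedgraph, "A1636-dRNA-Pool-BPS_S37.minus")
print("\n".join(wig[:4]))
```

Expanding every interval to single positions makes the file larger, but it keeps the output in the same per-position form as the tutorial-style wig files.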

Conda recipe?

Hello Sung-Huan,

I've had quite a few problems installing the program on the cluster, where I don't have root privileges. There's a bioconda version of singularity, which I installed successfully, but when I tried to run the image, it generates the following error:

ERROR : Failed invoking the NEWUSER namespace runtime: Invalid argument
ABORT : Retval = 255

I've installed singularity and built annogesic image on my local workstation, and that worked all right.

I think you should consider making a bioconda recipe (once you already went through the trouble of making a docker container), that would really help people install and use it.

Thank you for the software!

TSS prediction fails in some cases where the gff file has more than one accession number

TSS prediction fails in some cases where the gff file has more than one accession number (for chromosomes or plasmids).
It happens when checking for overlaps in the merging function.
I have seen three different variants of this multi-accession issue in GFF files.

Acinetobacter: has 3 accession numbers, passes without errors,
but gives empty results for the last 2 accession numbers
Vibrio: exits with an error when merging GFFs
line 495, in check_overlap num_strain[tss.seq_id]["overlap"] += 1
KeyError: 'NC_002505.1'
Klebsiella: exits a step earlier with a different error
Checking gff file of GCF_000019565.1_ASM1956v1_genomic.gff
Error: The end point of gene:90974-91231_+ is longer than the length of whole genome.

For this error, I checked the gff file and found that the start site belongs to another accession number that is listed later in the file.
That is why it is reported as exceeding the genome length.

It works only if I split the annotation files by accession number and run a separate analysis for each, which will be hard to do for shorter genomic parts like plasmids.
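
The manual workaround described above (splitting the annotation by accession number) can be scripted. The sketch below groups GFF feature lines by the sequence ID in the first column. The function `split_gff_by_accession` is a hypothetical helper, and the example features (coordinates, IDs, and the second accession `ACC_B`) are made up for illustration.

```python
from collections import defaultdict


def split_gff_by_accession(lines):
    """Group GFF feature lines by the sequence ID in column 1."""
    per_accession = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # directives and blank lines are not features
        seq_id = line.split("\t", 1)[0]
        per_accession[seq_id].append(line.rstrip("\n"))
    return dict(per_accession)


# Illustrative records only; the coordinates and IDs are invented.
gff = [
    "##gff-version 3\n",
    "NC_002505.1\tRefSeq\tgene\t100\t500\t.\t-\t.\tID=gene0\n",
    "ACC_B\tRefSeq\tgene\t200\t900\t.\t+\t.\tID=gene1\n",
    "NC_002505.1\tRefSeq\tCDS\t100\t500\t.\t-\t0\tID=cds0\n",
]
for accession, features in split_gff_by_accession(gff).items():
    print(accession, len(features))
```

Each group could then be written to its own file, for example `<accession>.gff`, and analysed separately.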

Manually annotated GFF

Hello Sung-Huan,

Hopefully you are not tired of all the questions. The ANNOgesic paper didn't seem to address this issue in much detail. I was wondering how exactly you select "manually" validated TSSs for the training set.

What sort of requirements do you impose on dRNAseq and RNAseq coverage?
How do you assign a TSS to be primary or secondary? For a few "secondary" TSSs from the training set provided with the tutorial, I could not identify a "primary" one.

Thank you!

sRNA target prediction error

Hi, when running sRNA target prediction I got the warnings below:

Running RNAplex with NZ_AP019730.1_target_11
Running RNAplex with NZ_AP019730.1_target_7
WARNING: File ' annogesic_analysis/output/sRNA_targets/RNAplex_results/NZ_AP019730.1/RNAplfold/NZ_AP019730.1_srna2_openen ' open error
WARNING: File ' annogesic_analysis/output/sRNA_targets/RNAplex_results/NZ_AP019730.1/RNAplfold/NZ_AP019730.1_srna47_openen ' open error
WARNING: File ' annogesic_analysis/output/sRNA_targets/RNAplex_results/NZ_AP019730.1/RNAplfold/NZ_AP019730.1_srna81_openen ' open error

Any idea what went wrong?
Thank you.

Transcript detection error

Hi,

I am trying to predict sRNAs from RNA-seq data using ANNOgesic 1.0.16, and according to the tutorial I should perform transcript detection first. However, I got errors when performing transcript detection. The command I used is as follows:

annogesic transcript --project_path /public/guanjiahao/rna_seq/ANNOgesic --frag_libs /public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_1.Forward.wig:frag:1:a:+ \
/public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_1.Reverse.wig:frag:1:a:- \
/public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_2.Forward.wig:frag:1:b:+ \
/public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_2.Reverse.wig:frag:1:b:- \
/public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_3.Forward.wig:frag:1:c:+ \
/public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_3.Reverse.wig:frag:1:c:- --replicate_frag all_2

And I got the following error messages:

Running transcript detection
Parsing WT_1.Forward.wig
Parsing WT_1.Reverse.wig
Parsing WT_2.Forward.wig
Parsing WT_2.Reverse.wig
Parsing WT_3.Forward.wig
Parsing WT_3.Reverse.wig
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_1.Forward_STRAIN_NC_016845.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_1.Reverse_STRAIN_NC_016845.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_2.Forward_STRAIN_NC_016845.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_2.Reverse_STRAIN_NC_016845.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_3.Forward_STRAIN_NC_016845.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_3.Reverse_STRAIN_NC_016845.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_1.Forward_STRAIN_NC_016838.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_1.Reverse_STRAIN_NC_016838.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_2.Forward_STRAIN_NC_016838.1: No such file or directory
sh: line 1: .wig: command not found
......
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_1.Forward_STRAIN_NC_016841.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_1.Reverse_STRAIN_NC_016841.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_2.Forward_STRAIN_NC_016841.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_2.Reverse_STRAIN_NC_016841.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_3.Forward_STRAIN_NC_016841.1: No such file or directory
sh: line 1: .wig: command not found
cat: /public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/tmp_wig/frag/tmp/WT_3.Reverse_STRAIN_NC_016841.1: No such file or directory
sh: line 1: .wig: command not found
Importing fragment wig files
Computing transcripts for NC_016839.1

Computing transcripts for NC_016840.1

Computing transcripts for NC_016847.1

Computing transcripts for NC_016841.1

Computing transcripts for NC_016845.1

Computing transcripts for NC_016838.1

Computing transcripts for NC_016846.1

/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016839.1
_fragment
/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016840.1
_fragment
/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016847.1
_fragment
/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016841.1
_fragment
/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016845.1
_fragment
/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016838.1
_fragment
/public/guanjiahao/rna_seq/ANNOgesic/output/transcripts/NC_016846.1
_fragment
Traceback (most recent call last):
  File "/public/guanjiahao/biosoft/ANNOgesic/bin/annogesic", line 2230, in <module>
    main()
  File "/public/guanjiahao/biosoft/ANNOgesic/bin/annogesic", line 2146, in main
    args.func(controller)
  File "/public/guanjiahao/biosoft/ANNOgesic/bin/annogesic", line 2183, in run_Transcript_Assembly
    controller.transcript()
  File "/public/guanjiahao/biosoft/ANNOgesic/bin/annogesiclib/controller.py", line 463, in transcript
    transcript.run_transcript(args_tran, log)
  File "/public/guanjiahao/biosoft/ANNOgesic/bin/annogesiclib/transcript.py", line 320, in run_transcript
    args_tran.gffs, "tmp"), None, None)
  File "/home/guanjiahao/miniconda3/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

It seemed that there was something wrong with the tmp directory, so I checked the ANNOgesic directory:

$ tree ANNOgesic
ANNOgesic
├── input
│   ├── BAMs
│   │   ├── BAMs_map_reference_genomes
│   │   │   ├── fragment
│   │   │   └── tex_notex
│   │   └── BAMs_map_related_genomes
│   │       ├── fragment
│   │       └── tex_notex
│   ├── databases
│   ├── manual_processing_sites
│   ├── manual_TSSs
│   ├── mutation_tables
│   ├── reads
│   ├── references
│   │   ├── annotations
│   │   └── fasta_files
│   ├── riboswitch_ID_file
│   ├── RNA_thermometer_ID_file
│   └── wigs
│       ├── fragment
│       │   ├── WT_1.Forward.wig
│       │   ├── WT_1.Reverse.wig
│       │   ├── WT_2.Forward.wig
│       │   ├── WT_2.Reverse.wig
│       │   ├── WT_3.Forward.wig
│       │   └── WT_3.Reverse.wig
│       └── tex_notex
├── output
│   └── transcripts
│       ├── gffs
│       │   ├── NC_016838.1\012_transcript.gff
│       │   ├── NC_016839.1\012_transcript.gff
│       │   ├── NC_016840.1\012_transcript.gff
│       │   ├── NC_016841.1\012_transcript.gff
│       │   ├── NC_016845.1\012_transcript.gff
│       │   ├── NC_016846.1\012_transcript.gff
│       │   └── NC_016847.1\012_transcript.gff
│       ├── log.txt
│       ├── statistics
│       ├── tables
│       └── tmp_wig
│           └── frag
│               ├── tmp
│               │   ├── NC_016838.1\012_forward.wig
│               │   ├── NC_016838.1\012_reverse.wig
│               │   ├── NC_016839.1\012_forward.wig
│               │   ├── NC_016839.1\012_reverse.wig
│               │   ├── NC_016840.1\012_forward.wig
│               │   ├── NC_016840.1\012_reverse.wig
│               │   ├── NC_016841.1\012_forward.wig
│               │   ├── NC_016841.1\012_reverse.wig
│               │   ├── NC_016845.1\012_forward.wig
│               │   ├── NC_016845.1\012_reverse.wig
│               │   ├── NC_016846.1\012_forward.wig
│               │   ├── NC_016846.1\012_reverse.wig
│               │   ├── NC_016847.1\012_forward.wig
│               │   └── NC_016847.1\012_reverse.wig
│               ├── WT_1.Forward.wig
│               ├── WT_1.Forward.wig_folder
│               │   ├── WT_1.Forward_STRAIN_NC_016838.1\012.wig
│               │   ├── WT_1.Forward_STRAIN_NC_016839.1\012.wig
│               │   ├── WT_1.Forward_STRAIN_NC_016840.1\012.wig
│               │   ├── WT_1.Forward_STRAIN_NC_016841.1\012.wig
│               │   ├── WT_1.Forward_STRAIN_NC_016845.1\012.wig
│               │   ├── WT_1.Forward_STRAIN_NC_016846.1\012.wig
│               │   └── WT_1.Forward_STRAIN_NC_016847.1\012.wig
│               ├── WT_1.Reverse.wig
│               ├── WT_1.Reverse.wig_folder
│               │   ├── WT_1.Reverse_STRAIN_NC_016838.1\012.wig
│               │   ├── WT_1.Reverse_STRAIN_NC_016839.1\012.wig
│               │   ├── WT_1.Reverse_STRAIN_NC_016840.1\012.wig
│               │   ├── WT_1.Reverse_STRAIN_NC_016841.1\012.wig
│               │   ├── WT_1.Reverse_STRAIN_NC_016845.1\012.wig
│               │   ├── WT_1.Reverse_STRAIN_NC_016846.1\012.wig
│               │   └── WT_1.Reverse_STRAIN_NC_016847.1\012.wig
│               ├── WT_2.Forward.wig
│               ├── WT_2.Forward.wig_folder
│               │   ├── WT_2.Forward_STRAIN_NC_016838.1\012.wig
│               │   ├── WT_2.Forward_STRAIN_NC_016839.1\012.wig
│               │   ├── WT_2.Forward_STRAIN_NC_016840.1\012.wig
│               │   ├── WT_2.Forward_STRAIN_NC_016841.1\012.wig
│               │   ├── WT_2.Forward_STRAIN_NC_016845.1\012.wig
│               │   ├── WT_2.Forward_STRAIN_NC_016846.1\012.wig
│               │   └── WT_2.Forward_STRAIN_NC_016847.1\012.wig
│               ├── WT_2.Reverse.wig
│               ├── WT_2.Reverse.wig_folder
│               │   ├── WT_2.Reverse_STRAIN_NC_016838.1\012.wig
│               │   ├── WT_2.Reverse_STRAIN_NC_016839.1\012.wig
│               │   ├── WT_2.Reverse_STRAIN_NC_016840.1\012.wig
│               │   ├── WT_2.Reverse_STRAIN_NC_016841.1\012.wig
│               │   ├── WT_2.Reverse_STRAIN_NC_016845.1\012.wig
│               │   ├── WT_2.Reverse_STRAIN_NC_016846.1\012.wig
│               │   └── WT_2.Reverse_STRAIN_NC_016847.1\012.wig
│               ├── WT_3.Forward.wig
│               ├── WT_3.Forward.wig_folder
│               │   ├── WT_3.Forward_STRAIN_NC_016838.1\012.wig
│               │   ├── WT_3.Forward_STRAIN_NC_016839.1\012.wig
│               │   ├── WT_3.Forward_STRAIN_NC_016840.1\012.wig
│               │   ├── WT_3.Forward_STRAIN_NC_016841.1\012.wig
│               │   ├── WT_3.Forward_STRAIN_NC_016845.1\012.wig
│               │   ├── WT_3.Forward_STRAIN_NC_016846.1\012.wig
│               │   └── WT_3.Forward_STRAIN_NC_016847.1\012.wig
│               ├── WT_3.Reverse.wig
│               └── WT_3.Reverse.wig_folder
│                   ├── WT_3.Reverse_STRAIN_NC_016838.1\012.wig
│                   ├── WT_3.Reverse_STRAIN_NC_016839.1\012.wig
│                   ├── WT_3.Reverse_STRAIN_NC_016840.1\012.wig
│                   ├── WT_3.Reverse_STRAIN_NC_016841.1\012.wig
│                   ├── WT_3.Reverse_STRAIN_NC_016845.1\012.wig
│                   ├── WT_3.Reverse_STRAIN_NC_016846.1\012.wig
│                   └── WT_3.Reverse_STRAIN_NC_016847.1\012.wig
└── used_annogesic_version.txt

It seemed that the directories were not identical to those in the "cat" commands.

In addition, my wiggle files are as follows, having 7 chrom IDs, including 1 chromosome and 6 plasmids:

$ head /public/guanjiahao/rna_seq/ANNOgesic/input/wigs/fragment/WT_1.Forward.wig
track type=wiggle_0 name="WT_1"
variableStep chrom=NC_016845.1
130	3.00
131	3.00
132	3.00
133	3.00
134	3.00
135	3.00
136	3.00
137	7.00
...
variableStep chrom=NC_016839.1
...
variableStep chrom=NC_016840.1
...
variableStep chrom=NC_016847.1
...
variableStep chrom=NC_016841.1
...
variableStep chrom=NC_016838.1
...
variableStep chrom=NC_016846.1

I have also tried installing ANNOgesic from GitHub and from pip, but both reported the same errors. I am not sure what may have caused the problem; it would be truly helpful if you could suggest possible solutions.

best wishes,

Guan Jiahao
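
In the `tree` listing above, `\012` is the octal escape for a newline character, so the generated file names contain a newline right after the accession. That would explain why `cat` looks for `..._STRAIN_NC_016845.1` and the shell then tries to run `.wig` as a separate command. One possible explanation, which is an assumption and has not been checked against the ANNOgesic source, is that the `variableStep` lines in these wiggle files have no `span=1` field, so the chrom value is the last token on the line and keeps its trailing newline. The toy parser below (hypothetical, not ANNOgesic's) reproduces that effect.

```python
def chrom_from_header(line, strip_newline):
    """Extract the chrom value from a 'variableStep chrom=...' line."""
    for field in line.split(" "):
        if field.startswith("chrom="):
            value = field[len("chrom="):]
            return value.strip() if strip_newline else value
    return None


with_span = "variableStep chrom=NC_016845.1 span=1\n"
without_span = "variableStep chrom=NC_016845.1\n"

# With span=1 the newline ends up on the span field, not on the chrom value.
print(repr(chrom_from_header(with_span, strip_newline=False)))
# Without span=1 the chrom value is the last token and keeps the newline.
print(repr(chrom_from_header(without_span, strip_newline=False)))
# Stripping whitespace removes the difference.
print(repr(chrom_from_header(without_span, strip_newline=True)))
```

If this is indeed the cause, adding `span=1` to each `variableStep` line of the input wig files might be worth trying as a workaround.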

ERROR: int is not iterable, in promoter prediction subcommand

Merging results srna_target, wrong column names

Hi!

I think I found a small bug when merging results from different methods of srna_target:

ANNOgesic/annogesiclib/merge_rnaplex_rnaup.py

Line 462 in 7b136ae

"Energy_" + methods[0], "Rank_" + methods[1],

ANNOgesic/annogesiclib/merge_rnaplex_rnaup.py

Line 463 in 7b136ae

"Energy_" + methods[0], "Rank_" + methods[1]]) + "\n")

I guess these two lines should instead be:

 "Energy_" + methods[0], "Rank_" + methods[0], 
 "Energy_" + methods[1], "Rank_" + methods[1]]) + "\n")

(please note the indexes of methods)
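As an illustration of the fix proposed above, the header columns can be generated in a loop so that the method index cannot get out of step. This is only a sketch: `rank_header` is a hypothetical name, and the tab separator is an assumption, since the quoted lines do not show what the columns are joined with.

```python
def rank_header(methods):
    """Build one Energy/Rank column pair per method, in order."""
    columns = []
    for method in methods:
        columns.extend(["Energy_" + method, "Rank_" + method])
    # Assumption: columns are tab-separated.
    return "\t".join(columns) + "\n"


header = rank_header(["RNAplex", "RNAup"])
print(header, end="")
```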

Error in sRNA_target prediction

See traceback
Ranking for IntaRNA
Traceback (most recent call last):
File "/usr/local/bin/annogesic", line 2230, in
main()
File "/usr/local/bin/annogesic", line 2146, in main
args.func(controller)
File "/usr/local/bin/annogesic", line 2207, in sRNA_target
controller.srna_target()
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/controller.py", line 831, in srna_target
srnatarget.run_srna_target_prediction(args_tar, log)
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/srna_target.py", line 599, in run_srna_target_prediction
self._merge_rnaplex_rnaup(prefixs, args_tar, log)
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/srna_target.py", line 462, in _merge_rnaplex_rnaup
os.path.join(self.gff_path, prefix + ".gff"))
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/merge_rnaplex_rnaup.py", line 674, in merge_srna_target
length)
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/merge_rnaplex_rnaup.py", line 108, in print_rank_one
length)
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/merge_rnaplex_rnaup.py", line 50, in mod_srna_tar_pos
start = int(pos.split(",")[0])
ValueError: invalid literal for int() with base 10: 'NZ_CP024992.1_srna322'

sRNA prediction error

Hi Sung-Huan,
I am running sRNA prediction on a bacterium with 3 plasmids. Unfortunately I got the error below. Hopefully you will be able to help me.
Secondly, I want to ask whether it is necessary to list all the --promoter_names (MOTIF_1 etc.) if one wants to identify all possible sRNAs, or whether it can be left at the default (none)?

Building a new DB, current time: 07/23/2020 17:45:00
New DB name: /Rex/uli/eco83/annogesic/input/databases/sRNA_database_BSRD
New DB title: /Rex/uli/eco83/annogesic/input/databases/sRNA_database_BSRD.fa
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /Rex/uli/eco83/annogesic/input/databases/sRNA_database_BSRD
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 897 sequences in 0.025398 seconds.
Running Blast of GCF_000299455.1_ASM29945v1_genomic in /Rex/uli/eco83/annogesic/input/databases/sRNA_database_BSRD
Warning: [blastn] Query is Empty!
Generating table for GCF_000299455.1_ASM29945v1_genomic
Classifying sRNA of GCF_000299455.1_ASM29945v1_genomic
Traceback (most recent call last):
File "/usr/local/bin/annogesic", line 2230, in
main()
File "/usr/local/bin/annogesic", line 2146, in main
args.func(controller)
File "/usr/local/bin/annogesic", line 2189, in run_sRNA_detection
controller.srna_detection()
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/controller.py", line 641, in srna_detection
srna.run_srna_detection(args_srna, log)
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/srna.py", line 1039, in run_srna_detection
self._class_srna(prefixs, args_srna, log)
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/srna.py", line 773, in _class_srna
classify_srna(os.path.join(self.all_best["all_gff"],
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/sRNA_class.py", line 286, in classify_srna
class_num, index = print_stat_title(
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/sRNA_class.py", line 170, in print_stat_title
"2d_energy", srna_datas[strain][0].attributes.keys(),
IndexError: list index out of range

Thanks

TAP-only (no TEX) dRNA-seq datasets

Hi,

Thanks for developing and maintaining this great tool.
I was wondering whether it would be possible to analyze dRNA-seq datasets which were generated using only TAP, without TEX. An enrichment of reads at 5′ ends should still be detectable, although the signal would probably be weaker.
If you have already tested ANNOgesic on such datasets, do you have any recommendation about the optimization of its parameters?

Thank you!
Simone

TSSpredator parameter defaults

Hello Sung-Huan,

I'm trying to run some stats on predicted TSS with different TSSpredator parameters, and having trouble locating them from the logs. Right now I'm doing regular tss_ps runs without any "manual" files. I think the defaults have changed since I've updated the version (1.0.1 -> 1.10) but I'm having a bit of trouble finding the actual parameters in the files.

So, the header of gff files in /MasterTable folder has the following line:

##parameters 0.3 2.0 0 0.0 2.0 0.9 3 HIGHEST 1 1

What are these numbers? I can't find which log the TSSpredator parameters are printed in...

Thank you in advance!

tex_notex_libs formatting

Hello! First of all, thank you for making this program available for everyone to use.

I'm trying to use tss_ps on my wig files from RNA-seq data but I'm having trouble with the formatting. This is the format I was using to test out 1 wig file:

--tex_notex_libs /home/gcabebe/rnaseq/SRP157937/TestAnalysis/output/pseudomonas_coverage-tnoar_min_normalized/SRR7693534.trimmomatic_out_div_by_25687973.0_multi_by_9485079.0_forward.wig:frag:1:a:+

When I run it, I get the following error:

Running TSS prediction
Error: The --tex_notex_libs was assinged incorrectly. Please check it again.

Here's the shell script I was using for testing 1 wig file:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem=0
#SBATCH -p long
#SBATCH -t 150:00:00

script=$0
project_path=$1 # /home/gcabebe/rnaseq/SRP157937/annogesic_tss
fasta_files=$2 # /home/gcabebe/rnaseq/SRP157937/TestAnalysis/input/pseudomonas_reference_sequences/Pputida_KT2440_MOD.fna
annotation_files=$3 # /home/gcabebe/rnaseq/SRP157937/TestAnalysis/input/pseudomonas_annotations/Pseudomonas_putida_KT2440_110.gff
tex_notex_libs=$4 # /home/gcabebe/rnaseq/SRP157937/TestAnalysis/output/pseudomonas_coverage-tnoar_min_normalized/SRR7693534.trimmomatic_out_div_by_25687973.0_multi_by_9485079.0_forward.wig:frag:1:a:+
condition_names=$5 # citrate1
replicate_tex=$6 # 1_6

annogesic tss_ps --project_path $project_path --program TSS --tsspredator_path /home/gcabebe/trx_tools/TSSpredator-1.1beta.jar --fasta_files $fasta_files --annotation_files $annotation_files --tex_notex_libs $tex_notex_libs --condition_names $condition_names --replicate_tex $replicate_tex

Additionally, I would like to run this program with 48 wig files, and I keep having issues with formatting all of them on a shell script. Is there an efficient way of running all 48 wig file assignments at once?
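For many libraries, the assignment strings can be generated rather than typed by hand. The sketch below assumes the `path:type:condition:replicate:strand` layout seen in the commands on this page, where `--tex_notex_libs` is paired with `tex`/`notex` and `--frag_libs` with `frag`. The helper names and the sample prefixes are hypothetical, and the `_forward.wig`/`_reverse.wig` suffixes are taken from the file name above.

```python
def lib_string(path, lib_type, condition, replicate, strand):
    """Join one library as path:type:condition:replicate:strand."""
    if lib_type not in ("tex", "notex", "frag"):
        raise ValueError("unknown library type: " + lib_type)
    if strand not in ("+", "-"):
        raise ValueError("strand must be + or -")
    return ":".join([path, lib_type, str(condition), replicate, strand])


def libs_for_sample(prefix, lib_type, condition, replicate):
    """Return the forward and reverse assignments for one sample prefix."""
    return [
        lib_string(prefix + "_forward.wig", lib_type, condition, replicate, "+"),
        lib_string(prefix + "_reverse.wig", lib_type, condition, replicate, "-"),
    ]


# Hypothetical sample table: (file prefix, type, condition number, replicate)
samples = [
    ("wigs/cond1_A_TEX", "tex", 1, "a"),
    ("wigs/cond1_A", "notex", 1, "a"),
]
libs = [lib for sample in samples for lib in libs_for_sample(*sample)]
print(" ".join(libs))
```

The printed string can be captured in a shell variable and passed to the command, as the `$TEX_LIBS` examples elsewhere on this page do.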

annogesic from nanopore transcriptomic output

Hi, I was wondering whether ANNOgesic can handle nanopore RNA-seq data. Can I map the reads outside the pipeline and provide the wig files directly? And if the data is not stranded (as in direct cDNA sequencing), how should I do it?

Thanks!

TSSpredator error

I am having issues with the TSSpredator in the Annogesic pipeline. There seems to be an issue with the configuration file. I experienced this error after updating my version of Annogesic both in singularity and docker. Please find below my error message for your perusal:
Singularity error:
Running TSSpredator for NC_018658.1
Error: There is not MasterTable file in /home/rna-seq/Rex/DFG/new_data/Annogesic_noblast/output/TSSs/MasterTables/MasterTable_NC_018660.1
Please check configuration file.
Error: There is not MasterTable file in /home/rna-seq/Rex/DFG/new_data/Annogesic_noblast/output/TSSs/MasterTables/MasterTable_NC_018659.1
Please check configuration file.
Error: There is not MasterTable file in /home/rna-seq/Rex/DFG/new_data/Annogesic_noblast/output/TSSs/MasterTables/MasterTable_NC_018666.1
Please check configuration file.
Error: There is not MasterTable file in /home/rna-seq/Rex/DFG/new_data/Annogesic_noblast/output/TSSs/MasterTables/MasterTable_NC_018658.1
Please check configuration file.
Merging gff files of GCF_000299455.1_ASM29945v1_genomic_new_TSS
Running statistaics
Validating TSSs with genome annotations
Remove temperary files and folders

Docker error:
Traceback (most recent call last):
File "/usr/local/bin/annogesic", line 2238, in
main()
File "/usr/local/bin/annogesic", line 2154, in main
args.func(controller)
File "/usr/local/bin/annogesic", line 2179, in run_TSSpredator
controller.tsspredator()
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/controller.py", line 329, in tsspredator
tsspredator.run_tsspredator(args_tss, log)
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/tsspredator.py", line 621, in run_tsspredator
self._convert_gff(prefixs, args_tss, log)
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/tsspredator.py", line 338, in _convert_gff
self.converter.convert_mastertable2gff(
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/converter.py", line 371, in convert_mastertable2gff
tss_libs = self._get_libs(tss_file)
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/converter.py", line 350, in _get_libs
for tss in self.tssparser.entries(tss_fh):
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/TSSpredator_parser.py", line 10, in entries
yield TSSPredatorEntry(row)
File "/usr/local/lib/python3.8/dist-packages/annogesiclib/TSSpredator_parser.py", line 16, in init
assert(len(row) == 30)
AssertionError

This error doesn’t pop up with the older versions of annogesic. Can you please have a look at this error? Thanks
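The assertion in the Docker traceback expects 30 columns per MasterTable row. A quick way to see what a given MasterTable actually contains is to count columns per row, as in this sketch. It assumes the table is tab-separated, and `column_counts` is a hypothetical helper, not an ANNOgesic function.

```python
import csv
import io
from collections import Counter


def column_counts(handle):
    """Count how many rows have each number of tab-separated columns."""
    return Counter(len(row) for row in csv.reader(handle, delimiter="\t"))


# In-memory example; for a real file use: with open("MasterTable.tsv", newline="")
example = io.StringIO("a\tb\tc\n1\t2\t3\n1\t2\n")
counts = column_counts(example)
for width, rows in sorted(counts.items()):
    print(width, "columns:", rows, "rows")
```

A table whose rows do not all have 30 columns would be consistent with a TSSpredator version that writes a different layout, though that is a guess.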

Predicted sRNAs seem too long

Hi Sung-Huan,

I have previously predicted a list of sRNAs in a Campylobacter genome using ANNOgesic. To evaluate the quality of the prediction, I compared it with previously annotated sRNAs. While the predictions successfully detect most annotated sRNAs, I am alarmed that the predicted sRNAs appear too long.

In the attached screenshot below, my prediction (Li_ANNOgesic_154) successfully detected the annotated ncRNA rnpB, but the prediction is over double the length of the annotated ncRNA, with the 3' end particularly far from where the read coverage drops off (I presume ANNOgesic has included some background noise after the annotated 3' end in the prediction). A similar pattern has been observed in almost all of the predictions.

While it is possible to refine the predictions by manually adjusting the cutoff threshold, that will be tricky in my case as I am working with nearly 100 sets of RNA-Seq data, and each dataset has a different level of background noise. Are there any other parameters in ANNOgesic that can be adjusted to improve the sRNA boundary prediction?

best wishes,

Stephen

Error in annotation transfer subcommand

Traceback (most recent call last):
File "/usr/local/bin/annogesic", line 2227, in
main()
File "/usr/local/bin/annogesic", line 2143, in main
args.func(controller)
File "/usr/local/bin/annogesic", line 2159, in run_RATT
controller.ratt()
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/controller.py", line 277, in ratt
ratt.annotation_transfer(args_ratt, log)
File "/usr/local/lib/python3.6/dist-packages/annogesiclib/ratt.py", line 197, in annotation_transfer
out_gbk = self._convert_embl(args_ratt.ref_gbki, log)
AttributeError: 'ArgsContainer' object has no attribute 'ref_gbki'

Transcript detection

Hi,

I have been trying to run transcript detection of my wiggle files using the following command:

annogesic transcript --project_path annogesic/ --annotation_files annogesic/input/references/annotation/11168_genome.gff3 --modify_transcript merge_overlap --frag_libs annogesic/input/wigs/fragment/iron_replete_limited_mRNA_3_forward.wig:frag:1:a:+ annogesic/input/wigs/fragment/iron_replete_limited_mRNA_3_reverse.wig:frag:1:a:- --replicate_frag 1_1
#==================================================
However, the following error message has been returned:

Parsing 11168_genome.gff3
Parsing iron_replete_limited_mRNA_3_forward.wig
Parsing iron_replete_limited_mRNA_3_reverse.wig
Merging wig files of 11168_genome.
Error: comparing input files of 11168_genome. failed. Please check the seq IDs of all gff and fasta files, they should be the same.
Please also check the wiggle files which should contain forward and reverse files.

#==================================================
As the seq IDs in the gff and fasta files are the same (see the attached files), I am not sure what would have caused that error message. Is there any other issue in my command line?

best wishes,

Stephen

11168_genome.gff3.txt

NCTC11168_genome.fasta.txt
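Since the error message above asks for matching seq IDs, it can help to compare the IDs in the fasta, gff and wiggle files side by side, because the wiggle `chrom=` value is easy to overlook. The following is a minimal sketch with hypothetical helper names and made-up IDs, not ANNOgesic code.

```python
def fasta_ids(lines):
    """IDs are the first word after '>' on each FASTA header line."""
    return {l[1:].split()[0] for l in lines if l.startswith(">") and l[1:].strip()}


def gff_ids(lines):
    """IDs are the first tab-separated column of each non-comment GFF line."""
    return {l.split("\t")[0] for l in lines if l.strip() and not l.startswith("#")}


def wig_ids(lines):
    """IDs are the chrom= values of variableStep/fixedStep header lines."""
    ids = set()
    for l in lines:
        if l.startswith(("variableStep", "fixedStep")):
            for field in l.split():
                if field.startswith("chrom="):
                    ids.add(field[len("chrom="):])
    return ids


def mismatched(**sources):
    """Return, per source, the IDs that are not shared by every source."""
    common = set.intersection(*sources.values())
    return {name: ids - common for name, ids in sources.items() if ids - common}


# Made-up example in which the gff lacks the version suffix.
fasta = [">chrA.1 example genome\n", "ACGT\n"]
gff = ["##gff-version 3\n", "chrA\tRefSeq\tgene\t1\t4\t.\t+\t.\tID=gene0\n"]
wig = ['track type=wiggle_0 name="x"\n', "variableStep chrom=chrA.1 span=1\n", "1 2.0\n"]
result = mismatched(fasta=fasta_ids(fasta), gff=gff_ids(gff), wig=wig_ids(wig))
print(result)
```

An empty dictionary means all three files agree. For real files, pass open file handles instead of the lists.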

optimize_tss_ps error: TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Hi,

I am trying to run optimize_tss_ps by using:

singularity exec -B /home/ubuntu/anaconda3/bin/annogesic annogesic.img \
annogesic optimize_tss_ps \
    --fasta_files /home/ubuntu/annogesic_2/input/references/fasta_files/NC_009839.fa \
    --annotation_files /home/ubuntu/annogesic_2/input/references/annotations/NC_009839.gff \
    --tex_notex_libs $TEX_LIBS \
    --condition_names TSS --steps 25 \
    --manual_files /home/ubuntu/annogesic_2/input/manual_TSSs/NC_009839_manual_TSS.gff \
    --curated_sequence_length NC_009839:200000 \
    --replicate_tex all_1 \
    --project_path /home/ubuntu/annogesic_2

with $TEX_LIBS being

TEX_LIBS="$WIG_FOLDER/GSM951380_Log_81116_R1_minus_TEX_in_NC_009839_minus.wig:notex:1:a:- \
$WIG_FOLDER/GSM951380_Log_81116_R1_minus_TEX_in_NC_009839_plus.wig:notex:1:a:+ \
$WIG_FOLDER/GSM951381_Log_81116_R1_plus_TEX_in_NC_009839_minus.wig:tex:1:a:- \
$WIG_FOLDER/GSM951381_Log_81116_R1_plus_TEX_in_NC_009839_plus.wig:tex:1:a:+ \
$WIG_FOLDER/GSM951382_Log_81116_R2_minus_TEX_in_NC_009839_minus.wig:notex:1:b:- \
$WIG_FOLDER/GSM951382_Log_81116_R2_minus_TEX_in_NC_009839_plus.wig:notex:1:b:+ \
$WIG_FOLDER/GSM951383_Log_81116_R2_plus_TEX_in_NC_009839_minus.wig:tex:1:b:- \
$WIG_FOLDER/GSM951383_Log_81116_R2_plus_TEX_in_NC_009839_plus.wig:tex:1:b:+"

I get the error

Running optimization of TSS prediction
Parsing GSM951381_Log_81116_R1_plus_TEX_in_NC_009839_plus.wig
Parsing GSM951383_Log_81116_R2_plus_TEX_in_NC_009839_minus.wig
Parsing GSM951381_Log_81116_R1_plus_TEX_in_NC_009839_minus.wig
Parsing GSM951380_Log_81116_R1_minus_TEX_in_NC_009839_minus.wig
Parsing GSM951383_Log_81116_R2_plus_TEX_in_NC_009839_plus.wig
Parsing GSM951382_Log_81116_R2_minus_TEX_in_NC_009839_minus.wig
Parsing GSM951380_Log_81116_R1_minus_TEX_in_NC_009839_plus.wig
Parsing GSM951382_Log_81116_R2_minus_TEX_in_NC_009839_plus.wig
Parsing NC_009839.gff
Parsing NC_009839.fa
Parsing NC_009839_manual_TSS.gff
Traceback (most recent call last):
  File "/usr/local/bin/annogesic", line 2238, in <module>
    main()
  File "/usr/local/bin/annogesic", line 2154, in main
    args.func(controller)
  File "/usr/local/bin/annogesic", line 2182, in optimize_TSSpredator
    controller.optimize()
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/controller.py", line 379, in optimize
    optimize_tss(args_ops, log)
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/optimize.py", line 31, in optimize_tss
    Multiparser().parser_gff(args_ops.manuals, None)
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/multiparser.py", line 283, in parser_gff
    os.path.join(gff_folder, "tmp.gff"))
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/helper.py", line 219, in sort_gff
    for entry in self.gff3parser.entries(g_f):
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/gff3.py", line 26, in entries
    yield self._dict_to_entry(entry_dict)
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/gff3.py", line 29, in _dict_to_entry
    return Gff3Entry(entry_dict)
  File "/usr/local/lib/python3.6/dist-packages/annogesiclib/gff3.py", line 56, in __init__
    start, end = sorted([int(entry_dict["start"]), int(entry_dict["end"])])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Please also find the wiggle files and fasta file attached. I obtained the files from GEO under accession number GSE38883.
GSM951380_Log_81116_R1_minus_TEX_in_NC_009839_plus.wig.txt
NC_009839.fa.txt
GSM951380_Log_81116_R1_minus_TEX_in_NC_009839_minus.wig.txt
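The traceback above ends with a start or end coordinate that is `None` while the manual TSS gff is being parsed. One possible cause, stated here only as an assumption, is a record that does not have nine tab-separated columns, for example because tabs were replaced by spaces. This sketch flags such lines; `bad_gff_lines` is a hypothetical helper and the records are made up.

```python
def bad_gff_lines(lines):
    """Return (line_number, reason) for records that are not 9-column GFF3."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append((number, "%d columns instead of 9" % len(columns)))
        elif not (columns[3].isdigit() and columns[4].isdigit()):
            problems.append((number, "start/end are not integers"))
    return problems


# Made-up records: the second one uses spaces instead of tabs.
records = [
    "##gff-version 3\n",
    "chr1\tmanual\tTSS\t1702\t1702\t.\t+\t.\tID=manual_tss0\n",
    "chr1 manual TSS 1800 1800 . + . ID=manual_tss1\n",
]
problems = bad_gff_lines(records)
print(problems)
```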

annogesic tss_ps: understanding the "--condition_names" error

Hello,

I am working through the tutorial on a test dataset that I have run through READemption. I am now running ANNOgesic within the latest docker image.

I have about 32 files/libraries that I would like to run through optimize_tss_ps. These are divided into condition 1 and condition 2. A sample of my library string fed into --tex_notex_libs:

path/path/sp1_ctrl_A_TEX_trimmed_forward.wig:tex:1:a:+ path/path/sp1_ctrl_A_TEX_trimmed_reverse.wig:tex:1:a:- path/path/sp1_ctrl_A_trimmed_forward.wig:notex:1:a:+ path/path/sp1_ctrl_A_trimmed_reverse.wig:notex:1:a:- path/path/sp1_exp_A_TEX_trimmed_forward.wig:tex:2:a:+

...etc.

I don't have enough known sites to run optimize_tss_ps, and I have a reference genome/gff, so I do not need to run annogesic annotation_transfer or annogesic update_genome_fasta.

Here is the command triggering the issue:

annogesic tss_ps \
--fasta_files ${ANNOgesic_input}/references/fasta_files/NC_000000.1.fna \
--annotation_files ${ANNOgesic_input}/references/annotations/NC_000000.1.gff \
--tex_notex_libs $LIB_STRING \
--project_path data/ANNOgesic \
--program TSS \
--replicate_tex all_1 \
--condition_names $COND_STRING

When I try to run ANNOgesic tss_ps, I get this error:
Running TSS prediction
Error: The number of --condition_names should be the same to the condition of input libraries

I've tried the following for $COND_STRING but I get the same error every time:
test
1,2
1,1,1,1,2... (a string with 32 numbers and commas in between, each number corresponding to the condition of a library in --tex_notex_libs)

What should I put in this parameter?
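The error message suggests that the number of names has to match the number of distinct conditions in the library string, not the number of libraries. A small sketch for counting those conditions follows. `condition_ids` is a hypothetical helper, and how the names should be separated on the command line is not stated in this thread.

```python
def condition_ids(lib_string):
    """Return the sorted distinct condition fields of a library string.

    Each library is path:type:condition:replicate:strand; splitting from the
    right keeps any colons inside the path intact.
    """
    return sorted({lib.rsplit(":", 4)[2] for lib in lib_string.split()})


libs = (
    "path/path/sp1_ctrl_A_TEX_trimmed_forward.wig:tex:1:a:+ "
    "path/path/sp1_ctrl_A_TEX_trimmed_reverse.wig:tex:1:a:- "
    "path/path/sp1_exp_A_TEX_trimmed_forward.wig:tex:2:a:+"
)
conditions = condition_ids(libs)
print(len(conditions), "condition names expected for conditions", conditions)
```

For the sample string above this reports two conditions, so two names would be expected.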

Thank you again for preparing this very helpful tool!

gff3 error

Hello, I have been trying to use the transcript module of ANNOgesic all week. I keep getting the following error:

root@93fee60e0cf0:/Petya/Rex/petya_paper/ecopAA/output/TSSs/gffs# annogesic transcript -pj /Petya/Rex/petya_paper/ecopAA/ -g /Petya/Rex/petya_paper/ecopAA/input/references/annotations/NC_018666.1.gff -tl $TEX_LIBS -rt all_1 -cf gene CDS -ct /Petya/Rex/petya_paper/ecopAA/output/TSSs/gffs/NC_018666_TSS.gff

   ___    _   ___   ______                  _
  /   |  / | / / | / / __ \____ ____  _____(_)____ \

Running transcript detection
Parsing NC_018666.1.gff
Parsing TEX_plus_div_by_96542.0_multi_by_96542.0_reverse.wig
Parsing TEX_minus_div_by_104934.0_multi_by_96542.0_reverse.wig
Parsing TEX_plus_div_by_96542.0_multi_by_96542.0_forward.wig
Parsing TEX_minus_div_by_104934.0_multi_by_96542.0_forward.wig
Merging wig files of NC_018666.1
Importing tex_notex wig files
Computing transcripts for NC_018666.1
/Petya/Rex/petya_paper/ecopAA//output/transcripts/NC_018666.1_tex_notex
Merging gff files of NC_018666.1
Parsing NC_018666.1_transcript.gff
Merging gff files of NC_018666.1_transcript
Error: There are folders which contain no gff3 files! The files should end with .gff!

PS: all folders contain gff files that are in gff3 format, and the same input gff was used for the TSS and PS prediction. I would be glad if you could advise me on what to do to make it work.

UTR detection error, do not know how to resolve.

Dear Sung-Huan,

I have tried to run the UTR function and got an error that I cannot resolve (please see the error message below). Would you please have a look at this for me? I have the input files but would prefer sending them to you in private over e-mail. Would you please let me know which e-mail address would be best to send files to you?

Yours Sincerely,

Ignatius

annogesic utr --annotation_files Sa_JKD6009_ratt.gff --transcript_files Sa_JKD6009_ratt_transcript.gff --tss_files Sa_JKD6009_ratt_TSS.gff --terminator_files jdk6009_gff_table_win8_sd4_cpm0_gw5.gff --project_path JKD6009_v1

   ___    _   ___   ______                  _     
  /   |  / | / / | / / __ \____ ____  _____(_)____ \

Running UTR detection
Checking gff file of Sa_JKD6009_ratt_TSS.gff
Checking gff file of Sa_JKD6009_ratt.gff
Checking gff file of Sa_JKD6009_ratt_transcript.gff
Checking gff file of jdk6009_gff_table_win8_sd4_cpm0_gw5.gff
Parsing Sa_JKD6009_ratt.gff
Parsing Sa_JKD6009_ratt_TSS.gff
Merging gff files of Sa_JKD6009_ratt_TSS
Parsing Sa_JKD6009_ratt_transcript.gff
Merging gff files of Sa_JKD6009_ratt_transcript
Parsing jdk6009_gff_table_win8_sd4_cpm0_gw5.gff
Merging gff files of Sa_JKD6009_ratt_term
Computing 5'UTRs of Sa_JKD6009_ratt
Traceback (most recent call last):
File "/apps/python/3.7.3/bin/annogesic", line 2178, in
main()
File "/apps/python/3.7.3/bin/annogesic", line 2094, in main
args.func(controller)
File "/apps/python/3.7.3/bin/annogesic", line 2134, in run_UTR_detection
controller.utr_detection()
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/controller.py", line 481, in utr_detection
utr.run_utr_detection(args_utr, log)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/utr.py", line 84, in run_utr_detection
self._compute_utr(args_utr, log)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/utr.py", line 49, in compute_utr
"".join([prefix, "5UTR.gff"])), args_utr)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/detect_utr.py", line 517, in detect_5utr
args_utr.length)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/detect_utr.py", line 345, in get_5utr_from_other
cdss, tss, cdss, length, False)
File "/apps/python/3.7.3/lib/python3.7/site-packages/annogesiclib/detect_utr.py", line 326, in detect_feature_5utr
locus_tag = fea.attributes["ID"]
KeyError: 'ID'
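The `KeyError: 'ID'` above indicates that at least one feature in the inputs has no `ID` attribute. The sketch below lists such records so they can be inspected or fixed before rerunning. `records_missing_attribute` is a hypothetical helper and the example records are made up.

```python
def records_missing_attribute(lines, key="ID"):
    """Return (line_number, feature_type) for GFF records lacking `key`."""
    missing = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # malformed records are outside the scope of this check
        # Column 9 holds key=value pairs separated by semicolons.
        keys = {
            pair.strip().split("=", 1)[0]
            for pair in columns[8].split(";")
            if "=" in pair
        }
        if key not in keys:
            missing.append((number, columns[2]))
    return missing


# Made-up records: the second CDS has a locus_tag but no ID.
records = [
    "chr1\tRefSeq\tCDS\t10\t90\t.\t+\t0\tID=cds0;locus_tag=X_0001\n",
    "chr1\tRefSeq\tCDS\t100\t190\t.\t+\t0\tlocus_tag=X_0002\n",
]
missing = records_missing_attribute(records)
print(missing)
```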

TSSpredator-1.06.jar not found during Docker build

I'm trying to rebuild the docker container with ubuntu:focal
However I run into a dependency that is no longer available.
It seems that the repository does not exist anymore. Could you point me to a replacement?

Removing intermediate container 9c9d0d613ae6
---> 0024190f61df
Step 17/60 : RUN wget https://lambda.informatik.uni-tuebingen.de/nexus/content/repositories/releases/org/uni-tuebingen/it/TSSpredator/1.06/TSSpredator-1.06.jar && cp TSSpredator-1.06.jar /usr/local/bin/TSSpredator.jar
---> Running in 6d3831448c98
--2022-11-03 08:47:48-- https://lambda.informatik.uni-tuebingen.de/nexus/content/repositories/releases/org/uni-tuebingen/it/TSSpredator/1.06/TSSpredator-1.06.jar
Resolving lambda.informatik.uni-tuebingen.de (lambda.informatik.uni-tuebingen.de)... 134.2.9.13
Connecting to lambda.informatik.uni-tuebingen.de (lambda.informatik.uni-tuebingen.de)|134.2.9.13|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-11-03 08:47:50 ERROR 404: Not Found.

optimization of TSS error

Hi,

I am trying to run optimize_tss_ps on dRNA-seq data using ANNOgesic 1.0.22. However, I got errors when I ran optimize_tss_ps. The command I used is as follows:

And I got the following error messages:

Parsing NC_010655.gff
Parsing NC_010655.fa
Parsing NC_010655_manual_TSS.gff
Checking gff file of NC_010655.1.gff
Warninng: Repeat ID cds-WP_197736719.1 in gff file!!!
Exception in thread "main" java.lang.Exception: A common identifier has to be present in all files! Check for identifiers in files!
at main.Main.alignMode(Main.java:429)
at gui.Mainwin.main(Mainwin.java:114)
at gui.Starter.main(Starter.java:40)
Exception in thread "main" java.lang.Exception: A common identifier has to be present in all files! Check for identifiers in files!
at main.Main.alignMode(Main.java:429)
at gui.Mainwin.main(Mainwin.java:114)
at gui.Starter.main(Starter.java:40)
Exception in thread "main" java.lang.Exception: A common identifier has to be present in all files! Check for identifiers in files!
at main.Main.alignMode(Main.java:429)
at gui.Mainwin.main(Mainwin.java:114)
at gui.Starter.main(Starter.java:40)
Exception in thread "main" java.lang.Exception: A common identifier has to be present in all files! Check for identifiers in files!
at main.Main.alignMode(Main.java:429)
at gui.Mainwin.main(Mainwin.java:114)
at gui.Starter.main(Starter.java:40)
Traceback (most recent call last):
File "/home/why/.local/bin/annogesic", line 2230, in
main()
File "/home/why/.local/bin/annogesic", line 2146, in main
args.func(controller)
File "/home/why/.local/bin/annogesic", line 2174, in optimize_TSSpredator
controller.optimize()
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/controller.py", line 378, in optimize
optimize_tss(args_ops, log)
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/optimize.py", line 84, in optimize_tss
optimization(wig_path, fasta_file, gff_file, args_ops,
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/optimize_TSSpredator.py", line 1032, in optimization
optimization_process(indexs, current_para, list_num, max_num, best_para,
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/optimize_TSSpredator.py", line 821, in optimization_process
datas, set_config, run_tss = run_tss_and_stat(
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/optimize_TSSpredator.py", line 463, in run_tss_and_stat
convert2gff(out_path, gff_files, args_ops, strain)
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/optimize_TSSpredator.py", line 218, in convert2gff
Converter().convert_mastertable2gff(
File "/home/why/.local/lib/python3.8/site-packages/annogesiclib/converter.py", line 368, in convert_mastertable2gff
tss_fh = open(tss_file, "r")
FileNotFoundError: [Errno 2] No such file or directory: '/home/why/Desktop/AKK_sRNA/drna/annogesic//output/TSSs/optimized_TSSpredator/MasterTable_1/MasterTable.tsv'

It seemed that there was something wrong with the gff file, so I checked the head of the gff, wig, and fasta files:

$ head NC_010655.gff

##sequence-region NC_010655.1 1 2664102
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=349741
NC_010655.1 RefSeq gene 508 582 . - . ID=gene-AMUC_RS00010;Name=AMUC_RS00010;gbkey=Gene;gene_biotype=tRNA;locus_tag=AMUC_RS00010;old_locus_tag=Amuc_R0001
NC_010655.1 tRNAscan-SE tRNA 508 582 . - . ID=rna-AMUC_RS00010;Parent=gene-AMUC_RS00010;anticodon=(pos:complement(548..550));gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:2.0.7;locus_tag=AMUC_RS00010;product=tRNA-Met

$ head NC_010655.fa

>NC_010655.1
AAATCTTATAAAATAACCACATAACTTAAAAAGAATTATGCGGATTTTTAATCCGTCCCCGGCTACGGCTTCTCCGAGCC
TTTCGGGTTTAGCTGCTCTGATGATGCGCCGCAGCATCTAAAACCCATTTCCCCCCCCTCCGCCGGATTTCCAAAAACAA
TGCGGTTTTTTATTTTCCAGTACCTGTCCGGTATCGGAGCCTTCACTTCATGCGGCATCCTGAACGGCAAAATCCTGAGA
AAAGAAAACGCAGCTTCTCACAAGCGAGATTCCAATCCAACATTCATATCCCCGTGTGGATATGTGCACCGGAATAACAG

$ head dseq_01_enriched_NC_010655_minus.wig

track type=wiggle_0 name="dseq_01_enriched_NC_010655_minus"
variableStep chrom=NC_010655.1 span=1
135 -1.323091842009318
136 -1.323091842009318

$ head NC_010655_manual_TSS.gff

##sequence-region NC_010655.1
NC_010655.1 manual TSS 1702 1702 . + . ID=manual_tss0;Name=manual_TSS_0

My ANNOgesic version is 1.0.22 and my TSSpredator version is 1.1beta. I want to try different versions of TSSpredator, but I cannot download them from the TSSpredator website.
