Dear @clemgoub,
I'm trying to set up my dnaPipeTE installation, using the test dataset for the first analysis.
Trinity and RepeatMasker seem to work properly, but I still have a problem during the
repeat estimation phase, after the third Blast run. It seems that some files were not created,
so they cannot be removed. Since my test folder is named "prova1", I ran:
python3 ./dnaPipeTE.py -input test_dataset.fastq -output prova1/ -genome_size 10000000 -genome_coverage 0.1 -sample_number 1
The results differ from those provided in the test directory, and I get this error:
rm: unable to remove "prova1//blast_contigs_1_fmtd": File or directory does not exist
This is a short version of the log file:
`Start time: Mon May 15 17:49:27 2017
sampling file found, skipping sampling...
Trinity files found, skipping assembly...
prova1/Annotation/Best_RM_annot_80-80
#######################################
REPEATMASKER to anotate contigs
#######################################
RepeatMasker version open-4.0.6
Search Engine: NCBI/RMBLAST [ 2.2.27+ ]
Master RepeatMasker Database: ./bin/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: 20150807 )
analyzing file prova1/Trinity.fasta
Some previous RepeatMasker output files were moved to the directory
prova1//Trinity.fasta.preMonMay151749272017.RMoutput
in order not to overwrite them.
Checking for E. coli insertion elements
identifying Simple Repeats in batch 1 of 1
identifying matches to root sequences in batch 1 of 1
identifying Simple Repeats in batch 1 of 1
processing output:
cycle 1
cycle 2
cycle 3
cycle 4
cycle 5
cycle 6
cycle 7
cycle 8
cycle 9
cycle 10
Generating output...
masking
done
24 line read, sorting...
sort done, filtering...
15 lines in one_RM_hit_per_Trinity_contigs
0 lines in Best_RM_annot_80
12 lines in Best_RM_annot_partial
Done
#########################################
Making contigs annotation from RM
#########################################
Done
Making blast sample...
sampling file found, skipping sampling...
total number of reads: 100125
maximum number of reads to sample: 12048
fastq : test_dataset.fastq
sampling 1 samples of max 12048 reads to reach coverage...
999984 bases sampled in 12048 reads
s_test_dataset.fastq_blast done.
#######################################################
Blast 1 : raw reads against all repeats contigs
#######################################################
Blast 1 files found, skipping Blast 1 ...
###################################################
Blast 2 : raw reads against annoted repeats
###################################################
Blast 2 files found, skipping Blast 2 ...
#####################################################
Blast 3 : raw reads against unannoted repeats
#####################################################
Blast 3 files found, skipping Blast 3 ...
#######################################################
Estimation of Repeat content from blast outputs
#######################################################
parsing blastout and adding RM annotations for each read...
awk: cmd. line:1: warning: escape sequence \$' treated as plain
$'
rm: cannot remove "prova1/blast_contigs_1_fmtd": No such file or directory
Done, results in: blast_out/blastout_final_fmtd_annoted
#########################################
OK, lets build some pretty graphs
#########################################
Drawing graphs...
null device
1
null device
1
null device
1
null device
1
Warning message:
Removed 3 rows containing missing values (geom_bar).
Warning message:
Removed 3 rows containing missing values (geom_bar).
Done
Removing Trinity runs files...
find: "prova1/Trinity_run*": No such file or directory
done
Finishin time: Mon May 15 17:49:46 2017
########################
see you soon !!!
########################`
In my test analysis, LTR/Pao is absent from the landscape.pdf output, whereas Counts.txt looks just like yours. What is the problem? Any help will be greatly appreciated.
Thank you in advance,
Massimiliano.