virfinder's People

Contributors

jessieren, mlangill

virfinder's Issues

Bit of trouble installing

All pre-reqs installed. Here is where the trouble begins:
VF_cppfunction.cpp:16:1: error: reference to ‘unordered_map’ is ambiguous
unordered_map<unsigned long,unsigned long> HashTable;
^~~~~~~~~~~~~
In file included from /usr/include/c++/6.3.1/unordered_map:48:0,
from /usr/lib64/R/library/Rcpp/include/Rcpp/platform/compiler.h:153,
from /usr/lib64/R/library/Rcpp/include/Rcpp/r/headers.h:48,
from /usr/lib64/R/library/Rcpp/include/RcppCommon.h:29,
from /usr/lib64/R/library/Rcpp/include/Rcpp.h:27,
from VF_cppfunction.cpp:1:
/usr/include/c++/6.3.1/bits/unordered_map.h:98:11: note: candidates are: template<class _Key, class _Tp, class _Hash, class _Pred, class _Alloc> class std::unordered_map
class unordered_map
^~~~~~~~~~~~~
In file included from /usr/include/c++/6.3.1/tr1/unordered_map:42:0,
from VF_cppfunction.cpp:2:
/usr/include/c++/6.3.1/tr1/unordered_map.h:180:11: note: template<class _Key, class _Tp, class _Hash, class _Pred, class _Alloc> class std::tr1::unordered_map
class unordered_map
^~~~~~~~~~~~~
VF_cppfunction.cpp: In function ‘std::vector reverseFour(std::vector)’:
VF_cppfunction.cpp:72:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for(int revPos = 0; revPos < Four.size(); revPos++)
~~~~~~~^~~~~~~~~~~~~
VF_cppfunction.cpp: In function ‘long unsigned int SeqKmerCountSingle(Rcpp::CharacterVector, int, long unsigned int)’:
VF_cppfunction.cpp:107:4: error: ‘HashTable’ was not declared in this scope
HashTable[index]++;
^~~~~~~~~
VF_cppfunction.cpp: In function ‘void loadToVector(int, long unsigned int, std::vector&)’:
VF_cppfunction.cpp:145:25: error: ‘HashTable’ was not declared in this scope
kmerCount.push_back((HashTable[currentKmerTen] + HashTable[currentKmerRevTen])/double(2 * total));
^~~~~~~~~
VF_cppfunction.cpp: In function ‘Rcpp::List countSeqFeatureCpp(Rcpp::CharacterVector, int)’:
VF_cppfunction.cpp:174:2: error: ‘HashTable’ was not declared in this scope
HashTable.clear();
^~~~~~~~~
/usr/lib64/R/etc/Makeconf:166: recipe for target 'VF_cppfunction.o' failed
make: *** [VF_cppfunction.o] Error 1
ERROR: compilation failed for package ‘VirFinder’

* removing ‘/usr/lib64/R/library/VirFinder’
Warning message:
In install.packages("/home/fetz/genome/VirFinder/linux/VirFinder_1.1.tar.gz", :
installation of package ‘/home/fetz/genome/VirFinder/linux/VirFinder_1.1.tar.gz’ had non-zero exit status

GPL vs "non-commercial only" conflicting license information

Hello,

In the root of your repository you distribute https://github.com/jessieren/VirFinder/blob/master/licence.md that declares the software to only be used for non-commercial purposes. The R package itself states GPL-2 or later in https://github.com/jessieren/VirFinder/blob/master/linux/VirFinder/DESCRIPTION .

The two licenses are not compatible with each other. Can you please clarify that situation?

Many thanks and kind regards,
Steffen

Command line usage?

I'd like to add VirFinder into some command line pipelines. Is it possible to run this from the command line instead of in R?
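
One possible way to drive VirFinder from a pipeline is to call an R wrapper script through `Rscript`. The sketch below only builds the command line. It assumes a wrapper named `VirFinder_wrapper.R` that accepts `-f`, `-o` and `--qvalue`, as in the invocation quoted in the "Wrapper error" issue on this page; the helper `build_virfinder_command` is hypothetical and not part of VirFinder.

```python
import shutil


def build_virfinder_command(fasta_path, out_path, with_qvalue=True,
                            wrapper="VirFinder_wrapper.R"):
    """Build an Rscript command line for an assumed VirFinder wrapper script."""
    cmd = ["Rscript", wrapper, "-f", fasta_path, "-o", out_path]
    if with_qvalue:
        cmd.append("--qvalue")
    return cmd


cmd = build_virfinder_command("genome.fna", "out.virfinder")
print(" ".join(cmd))

# Rscript ships with R; warn early if it is not on PATH.
if shutil.which("Rscript") is None:
    print("Rscript not found on PATH; install R before running the command")

# To actually run it from a pipeline: subprocess.run(cmd, check=True)
```

The command is returned as a list so it can be passed to `subprocess.run` without shell quoting issues.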

Train VirFinder with new genomes

Hi @jessieren

I tried to train VirFinder with new genomes but it shows me this error.

Error in save(seqTrainKmerCount, file = file.path(seqTrainKmerCountDir, :
  error writing to connection
Calls: VF.train.user -> trainDataCollect -> save
In addition: Warning message:
'rBind' is deprecated.
Since R version 3.2.0, base's rbind() should work fine with S4 objects
Execution halted

What should I do?

Thanks,
Aly

Q value smaller than P value

Hello,

I just ran the program for the first time on a set of just over 5000 contigs, most (or all) of which should be viral. However, after attaching q values with the VF.qvalue function, many q values are smaller than the original p values; for example, a p value of 0.98 became a q value of 0.04. This seems like a bug, unless the q value adjustment does something other than adjusting p values for multiple testing?

Thanks
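
For readers wondering how a q value can end up below its p value: the call chain shown in other issues on this page (VF.qvalue -> qvalue -> pi0est) suggests a Storey-style q value, which scales by an estimated fraction of true nulls (pi0). The sketch below is a simplified illustration, not the `qvalue` package's implementation; it assumes pi0 is already known, and the toy p values and `storey_qvalues` helper are made up for the example.

```python
def storey_qvalues(pvalues, pi0):
    """Storey-style q values for a fixed, assumed null proportion pi0."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        estimate = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, estimate)
        qvalues[i] = running_min
    return qvalues


# Toy data: 96 small p values (mostly signal) and 4 large ones.
pvalues = [0.001 * (k + 1) for k in range(96)] + [0.55, 0.70, 0.85, 0.98]

q_small_pi0 = storey_qvalues(pvalues, pi0=0.04)  # assumed pi0 estimate
q_pi0_one = storey_qvalues(pvalues, pi0=1.0)     # Benjamini-Hochberg-like case

print(pvalues[-1], q_small_pi0[-1], q_pi0_one[-1])
```

With pi0 assumed to be 0.04, the largest p value (0.98) gets a q value of about 0.039, while with pi0 = 1 it stays at about 0.98. When nearly all contigs look viral, a small pi0 estimate alone can therefore produce q values below the p values.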

defining q value threshold

Hi,

I'm using VirFinder for the first time and am trying to figure out where to set the threshold for categorizing predictions as "good" or "bad". It's not clear to me from the corresponding publications how you selected the threshold there. Could you help, @jessieren?

Thanks!

Use output of Prodigal for VirFinder

Dear developers, I have a brief question. After running MEGAHIT, I ran Prodigal on all my metagenomic samples (using the -meta flag) so I could use the output in other analyses.
Should I use the Prodigal output instead of the initial MEGAHIT output? Would this improve or accelerate the estimation? I noticed in the publication that you used Prodigal to corroborate the results (and I will use blastn to assess the taxonomy of the contigs identified by VirFinder).

Thanks for your help.

Error: 'rBind' is defunct. Since R version 3.2.0, base's rbind() should work fine with S4 objects

Hello,
I encountered the following problem on Windows 10 with R 3.6.3. I then switched to macOS with R 4.2.0 to run VirFinder, and the problem was still not resolved.

> VF.trainModUser <- VF.train.user(trainFaFileHost, trainFaFileVirus, userModDir,
+                              userModName, w, equalSize=TRUE)

Error: 'rBind' is defunct.
Since R version 3.2.0, base's rbind() should work fine with S4 objects
How can this error be resolved?
I hope to get some help. Thank you.

Training with new viral genomes

Dear all,
I'm very interested in this tool. I'm actually trying to understand whether it's possible to expand the number of viral genomes to produce another training dataset by adding metagenomically-identified contigs representing putatively complete phage genomes from environmental datasets (e.g. Pacific Ocean Virome, Tara Oceans). Is the host gene sequence mandatory for creating the new model?
Best regards

Theano and MKL error on server

Hi there,

I have used this tool locally and it works great! But for larger and more numerous samples I tried to install it on a server (where conda is not permitted) as follows:

$ module load python/3.6
$ virtualenv /home/USER/deepvirfinder_env
$ source /home/USER/deepvirfinder_env/bin/activate
$ cd bin
$ pip install numpy theano keras scikit-learn
$ pip install biopython
$ git clone https://github.com/jessieren/DeepVirFinder

But now I get the following error:

Using Theano backend.
Traceback (most recent call last):
File "/cvmfs/soft.server.com/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/configparser.py", line 1138, in _unify_values
sectiondict = self._sections[section]
KeyError: 'blas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 168, in fetch_val_for_key
return theano_cfg.get(section, option)
File "/cvmfs/soft.server.com/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/configparser.py", line 781, in get
d = self._unify_values(section, vars)
File "/cvmfs/soft.server.com/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/configparser.py", line 1141, in _unify_values
raise NoSectionError(section)
configparser.NoSectionError: No section: 'blas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 328, in get
delete_key=delete_key)
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 172, in fetch_val_for_key
raise KeyError(key)
KeyError: 'blas.ldflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configdefaults.py", line 1250, in check_mkl_openmp
import mkl
ModuleNotFoundError: No module named 'mkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/deepvirfinder_env/bin/DeepVirFinder/dvf.py", line 53, in <module>
import keras
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/keras/__init__.py", line 3, in <module>
from . import utils
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/keras/utils/__init__.py", line 6, in <module>
from . import conv_utils
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/keras/utils/conv_utils.py", line 9, in <module>
from .. import backend as K
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/keras/backend/__init__.py", line 1, in <module>
from .load_backend import epsilon
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/keras/backend/load_backend.py", line 87, in <module>
from .theano_backend import *
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/keras/backend/theano_backend.py", line 7, in <module>
import theano
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/__init__.py", line 124, in <module>
from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/scan_module/__init__.py", line 41, in <module>
from theano.scan_module import scan_opt
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/scan_module/scan_opt.py", line 60, in <module>
from theano import tensor, scalar
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/tensor/__init__.py", line 17, in <module>
from theano.tensor import blas
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/tensor/blas.py", line 155, in <module>
from theano.tensor.blas_headers import blas_header_text
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/tensor/blas_headers.py", line 987, in <module>
if not config.blas.ldflags:
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configparser.py", line 332, in get
val_str = self.default()
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configdefaults.py", line 1430, in default_blas_ldflags
check_mkl_openmp()
File "/home/user/deepvirfinder_env/lib/python3.6/site-packages/theano/configdefaults.py", line 1262, in check_mkl_openmp
""")
RuntimeError:
Could not import 'mkl'. Either install mkl-service with conda or set
MKL_THREADING_LAYER=GNU in your environment for MKL 2018.

If you have MKL 2017 install and are not in a conda environment you
can set the Theano flag blas.check_openmp to False. Be warned that if
you set this flag and don't set the appropriate environment or make
sure you have the right version you will get wrong results.

Any thoughts?
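
The RuntimeError text itself suggests setting MKL_THREADING_LAYER=GNU. A minimal sketch of doing that from Python, assuming the variable has to be in place before Theano (or Keras with the Theano backend) is first imported; the helper name `configure_mkl_threading` is hypothetical, and whether this resolves the problem on a given server also depends on the installed MKL/BLAS, which is not known here.

```python
import os


def configure_mkl_threading(layer="GNU"):
    """Set MKL_THREADING_LAYER for this process and return the value now in effect."""
    os.environ["MKL_THREADING_LAYER"] = layer
    return os.environ["MKL_THREADING_LAYER"]


active_layer = configure_mkl_threading()
print("MKL_THREADING_LAYER =", active_layer)

# Import the backend only after the variable is set, for example:
# import keras
```

The same effect can be had by exporting the variable in the shell before launching dvf.py, which avoids editing any Python code.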

Installation error

Hi,

I am trying to install VirFinder on Windows, but I get an error when executing library(VirFinder):

Error: package or namespace load failed for ‘VirFinder’:
package ‘VirFinder’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

Can you help me with that?

Question about your protocol

While reading through the article about VirFinder, I noted that you set your p-value threshold to < .01. Why is that? Normally a p value of < .05 is considered significant. Would you not consider a contig run through your program to be viral if it had a p value equal to .05? Thanks for any clarification.
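
As general background on why a stricter cutoff can matter when many contigs are tested at once (this is not a statement of the authors' reasoning): under the null hypothesis, roughly alpha times the number of non-viral contigs are expected to pass the threshold by chance. A small worked example, with a made-up count of 5000 non-viral contigs:

```python
def expected_false_positives(n_null_sequences, alpha):
    """Expected number of null (non-viral) sequences with p value below alpha."""
    return n_null_sequences * alpha


n_null = 5000  # hypothetical number of non-viral contigs in a sample
for alpha in (0.05, 0.01):
    count = round(expected_false_positives(n_null, alpha))
    print(f"alpha={alpha}: about {count} false positives expected")
```

Under these assumptions the stricter threshold cuts the expected number of chance hits fivefold.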

‘virFinder’ is not available (for R version 3.5.2)

Hi Jessieren,
I use R 3.5.2 "Eggshell Igloo". I am trying to install virfinder but it gives me the following warning:

install.packages("virfinder")
Warning in install.packages :
package ‘virfinder’ is not available (for R version 3.5.2)

Can you help with that, please?
Cheers

Multi-threaded operation

Because metagenomic analyses involve large datasets, I wonder whether VirFinder can use multiple threads to predict viral sequences.
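
Independent of whether VirFinder itself is multi-threaded, one generic workaround is to split the input FASTA into chunks and run one prediction process per chunk. The sketch below only does the splitting; `split_fasta` is a hypothetical helper, it does not call VirFinder, and it assumes a plain FASTA text with `>` header lines.

```python
def split_fasta(text, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists of records."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif current and line.strip():
            current.append(line.strip())
    if current:
        records.append("\n".join(current))

    chunks = [[] for _ in range(n_chunks)]
    for idx, record in enumerate(records):
        chunks[idx % n_chunks].append(record)
    return chunks


example = ">c1\nACGT\nAC\n>c2\nGGCC\n>c3\nTTAA\n"
chunks = split_fasta(example, 2)
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number}: {len(chunk)} record(s)")
```

Each chunk can then be written to its own file and processed in a separate R session, with the per-chunk result tables concatenated afterwards.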

Where is the file VirFinder_1.1.tar.gz located?

I'm sorry, but I cannot find the tar.gz file anywhere on your GitHub or with a Google search. I would like to install your software but cannot proceed until I know where it is. Thank you.

Error while running VF.pred on my own data

Hi! I followed the instructions to get the viral prediction on my own contigs but the following error popped up:

Error in countSeqFeatureCpp(seqFa, w) :
Not compatible with STRSXP: [type=NULL].
In addition: Warning message:
In file(inFaFile, open = "r") :
file("") only supports open = "w+" and open = "w+b": using the former

I have no idea what this means.
My contigs have headers like this:

NODE_1_length_98149_cov_21.408159_magnitude=32301

Thank you in advance for your help

defining VF.train.user.R subLengthAll

Hi @jessieren
I trained VirFinder with 300 bp fragments and it shows this error:

Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file '/seqTrainKmerCount/VF.trainKmer.tara_bacteria.fa.subLen500.k8.file1.RData', probable reason 'No such file or directory'

I noticed that the VF.train.user.R script defines subLengthAll as 0.5 kb, 1 kb and 3 kb.
Does that mean all training sequences need to be >3000 bp?

Another question:
I prepared 10000 bp sequences for the training set and test set (as in Table 1 and Fig. 1A of your article, 10000 bp). Does the training set need to be split into 0.5 kb, 1 kb and 3 kb fragments?

The R Version with different internals

I am using R 3.5.3.
While loading VirFinder, I get the error below:

package or namespace load failed for ‘VirFinder’:
package ‘VirFinder’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

Parallelize VF.trainModUser

Hi,
I am training my own model with a large sequence dataset (<600,000) and I expect it to take a really long time. I am wondering whether VF.trainModUser can take advantage of a multi-core platform, like the parallel script parVF.pred.R.

Thanks!

Error in smooth.spline(lambda, pi0, df = smooth.df) :

Hi Jessie,
Thanks for your work on VirFinder!
When I test with my single sequence, following your GitHub instructions, it returns

Error in smooth.spline(lambda, pi0, df = smooth.df) :
  missing or infinite values in inputs are not allowed
Calls: VF.qvalue -> qvalue -> pi0est -> smooth.spline
Execution halted

So, how could this happen and how can I solve it?
Thanks a lot

problem with smooth.spline and VF.qvalue

Dear Jessie,

I wonder if you could help me solve this. Some months ago I ran VirFinder with no issues. I used the script below to loop over several .fas files, obtaining one tsv with the results for each 'sample'. Now I am analyzing extra data and I get this error:

Error in smooth.spline(lambda, pi0, df = smooth.df) :
  missing or infinite values in inputs are not allowed
Calls: %>% ... <Anonymous> -> VF.qvalue -> qvalue -> pi0est -> smooth.spline

Maybe this is similar to this issue

Maybe the routine didn't find enough viruses in the data and so it fails. I need to combine these results with those already calculated for close to 300 samples, so I wonder how I can fix this. Thanks for your help.

library(VirFinder)
library(purrr)
library(fs)
library(readr)
library(dplyr)
library(stringr)
library(tibble)
library(tidyr)

# read all *.fas
my_files <- list.files( pattern = "*fas")
neonames <- my_files %>% str_replace(".fas", "")
qnames <- my_files %>% str_replace(".fas", "Q")

bigPred <- map(my_files, VF.pred) %>% setNames(neonames)
saveRDS(bigPred, "bigPred.RDS")

#########  #############  ######  #######  #####  #############
# bind as a single tibble
bigPRED <- bind_rows(bigPred, .id = "sample") %>% relocate(sample, .after = last_col())
# get Q value from bigPRED
bigQ <- bigPRED %>% group_by(sample) %>%
        group_map(~ VF.qvalue(.x$pvalue)) %>% setNames(neonames)
# bigQ list of dataframes into a single tibble
Qvalue <- bigQ %>% enframe() %>%
  unnest() %>% rename(sample = name, qvalue = value)

# send to environment
# Qvalue %>%  group_split(sample) %>% set_names(qnames) %>% 
#        map(., tibble::as_tibble) %>% list2env(., envir = .GlobalEnv)
# bigPRED %>%  group_split(sample) %>% set_names(neonames) %>% 
#        map(., tibble::as_tibble) %>% list2env(., envir = .GlobalEnv)

# just a col bind both PRED and qvalue
FULL <- bind_cols(bigPRED, Qvalue) %>%
  rename(sample = sample...5) %>% select(-sample...6) %>%  # rename
  group_by(sample) %>% arrange(desc(score), .by_group = TRUE ) %>%  # score
  group_by(sample) %>% filter(score > 0.9)                          # greater than 0.9  

# nest by sample, export each one
FULL %>% nest(-sample) %>%
  pwalk(~write_tsv(x = .y, file = paste0(., ".tsv")))
# rm(list=setdiff(ls(), "bigPred")) # delete everything sans bigPred

R package issue

Hi,

I am trying to install the VirFinder package, but I am getting the following error:

package or namespace load failed for ‘VirFinder’:
package ‘VirFinder’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

I am currently running RStudio 3.64 on Windows. Any advice?

training by adding to the existing database

Hi !
Thank you for this tool. I wonder if it is possible to train VirFinder by just adding to the existing database, so that I don't need to create everything from scratch.
Thank you in advance,
Sofia

VirFinder application in transcriptomes

Hello, Jessieren!

I'm trying to run VirFinder 1.1 on Ubuntu to discover viruses in transcriptomes. Is the package's methodology able to detect them? The following error occurs when I try to do that:

Error in countSeqFeatureCpp(seqFa, w) :
Not compatible with STRSXP: [type=NULL].
In addition: Warning message:
In file(inFaFile, open = "r") :
file("") only supports open = "w+" and open = "w+b": using the former

Can you help with that, please?
Cheers

VirFinder cannot run in R 4.1.1

Hi jessieren,

I'm using R version 4.1.1 and find that VirFinder cannot be installed. It said "Error: package or namespace load failed for ‘VirFinder’:
package ‘VirFinder’ was installed before R 4.0.0: please re-install it".

Is the only option to download an R version earlier than 4.0.0?
Or can you please update the VirFinder package so that it runs in R 4.1.1 or later?

Hoping for your response,
Jiaxiong

Original training data

Hey Jie and Nathan,
Maybe I entirely missed it, in which case I am very sorry to bother you, but could you make publicly available the exact data subset from "RefSeq virus and prokaryotic genomes sequenced from before and after 1 January 2014" that was used to train and test the model? I'd love to try to match your results! Thanks in advance.

Wrapper error

I get an error when I run the VirFinder_wrapper.R script. If I open R and run the exact same commands, it runs fine and I get the desired output. Any ideas what's wrong? Here's the error:

$ Rscript VirFinder_wrapper.R \
> -f genome.fna \
> -o out.virfinder \
> --qvalue
Loading required package: glmnet
Loading required package: Matrix

Attaching package: 'Matrix'

The following object is masked from 'package:base':

    crossprod, tcrossprod

Loading required package: foreach
Loaded glmnet 2.0-13

Loading required package: qvalue
Warning message:
In fun(libname, pkgname) : no DISPLAY variable so Tk is not available
[1] "Running VirFinder on /global/projectb/scratch/snayfach/projects/dc4/0_input_data/genomes/DC416SNEG_AHWWO/DC416SNEG_AHWWO.fsystem.file                package:base                R Documentation

Find Names of R System Files

Description:

     Finds the full file names of files in packages etc.

na"
Error in rbind2(.Call(dense_to_Csparse, x), y) :
  error in evaluating the argument 'x' in selecting a method for function 'rbind2': Error: could not find function "getClass"
Calls: VF.pred ... predict.glmnet -> <Anonymous> -> <Anonymous> -> rbind2
Execution halted
