
ramclustr's People

Contributors

cbroeckl, hechth, maximskorik, rickhelmus, sneumann, xtrojak, zargham-ahmad

ramclustr's Issues

How to handle missing `MSMS` data

Currently, when the data is read from a CSV file, the MS1 data is copied into the MS2 slot if no MS2 data is present, whereas the slot is left empty when the data is read from xcms. Should this be made the general case?
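
The two behaviours can be contrasted in a small sketch. It assumes the object is a plain list whose MS1 and MS2 intensity matrices live in fields named `MSdata` and `MSMSdata`; the helper `fill_missing_msms` and its `copy_ms1` switch are hypothetical and not part of RAMClustR.

```r
# Hypothetical helper: decide in one place what happens when MS2 data is absent.
fill_missing_msms <- function(obj, copy_ms1 = TRUE) {
  if (is.null(obj$MSMSdata) && copy_ms1) {
    # CSV-style behaviour: reuse the MS1 intensities as MS2
    obj$MSMSdata <- obj$MSdata
  }
  # xcms-style behaviour (copy_ms1 = FALSE): leave the slot empty
  obj
}

# Toy MS1 matrix: samples in rows, features in columns
ms1 <- matrix(c(10, 20, 30, 40), nrow = 2,
              dimnames = list(c("s1", "s2"), c("100.1_12.5", "200.2_30.0")))
obj <- list(MSdata = ms1, MSMSdata = NULL)

csv_style  <- fill_missing_msms(obj, copy_ms1 = TRUE)
xcms_style <- fill_missing_msms(obj, copy_ms1 = FALSE)
```

Routing both readers through one such switch would make the choice explicit instead of depending on the input path.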

Error running the vignette

R CMD check RAMClustR fails with:

Quitting from lines 19-77 (ramclustR.Rmd) 
Error: processing vignette 'ramclustR.Rmd' failed with diagnostics:
argument is of length zero
Execution halted

I guess a package dependency issue is the reason it fails on my laptop. Can you spot which package might be the culprit?

> library(RAMClustR)
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RAMClustR_0.4

loaded via a namespace (and not attached):
[1] tools_3.3.2

suspected write problem in RAMClustR::write.msp

As part of our wrapping of RAMClustR in Galaxy, we are routinely running a fairly basic test described below. After switching from 1.0.9 to 1.2.1 we are experiencing the following error:

Error in round(ramclustObj$clri[i], 2) : 
  non-numeric argument to mathematical function
Calls: store_output -> <Anonymous> -> paste0
Execution halted

Our store_output function calls the RAMClustR::write.msp here

@hechth suspects that the new version of RAMClustR does not expect a parameter to be missing.

The parameters of the test are here
The input file is here
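
The error message itself can be reproduced in base R when the field passed to `round()` does not exist. The sketch below assumes, without having confirmed it against version 1.2.1, that `clri` is simply absent from the object; the list `obj` and the helper `safe_round` are illustrative only.

```r
# Hypothetical guard: round only when there is something numeric to round.
safe_round <- function(x, digits = 2) {
  if (is.numeric(x) && length(x) > 0) round(x, digits) else NA_real_
}

# Toy object in which `clri` is deliberately absent
obj <- list(clrt = c(12.345, 30.1))

# obj$clri is NULL, so obj$clri[1] is NULL as well, and round(NULL, 2)
# stops with "non-numeric argument to mathematical function".
unguarded <- tryCatch(round(obj$clri[1], 2), error = function(e) e)

# The guarded call degrades to NA instead of stopping
ri_label <- paste0("RI: ", safe_round(obj$clri[1]))
```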

Connecting RAMclustR with MetFamily

Hi, it would be great to have a working example snippet
on how to connect RAMclustR output to MetFamily.

What we need is an MSP file that looks like this:
https://raw.githubusercontent.com/ipb-halle/MetFamily/master/files/MSMS_library_showcase.msp

NAME: Unknown
RETENTIONTIME: 8.4209
PRECURSORMZ: 85.00465
METABOLITENAME: 
ADDUCTIONNAME: [M-H]-
Num Peaks: 3
75.00941	44.5240135192871
79.95998	54.568229675293
85.00496	297.418823242188

i.e. where we have RT and Precursor that can be matched to the MS1 precursor.
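
A record in the layout shown above can be assembled with base R alone. This is a minimal sketch: `format_msp_record` is a hypothetical helper, not a RAMClustR or MetFamily function, and it only mirrors the fields visible in the example record.

```r
# Hypothetical helper: build the text lines of one MSP record.
format_msp_record <- function(name, rt, precursor_mz, adduct, mz, intensity) {
  stopifnot(length(mz) == length(intensity))
  c(
    paste0("NAME: ", name),
    paste0("RETENTIONTIME: ", rt),
    paste0("PRECURSORMZ: ", precursor_mz),
    "METABOLITENAME: ",
    paste0("ADDUCTIONNAME: ", adduct),
    paste0("Num Peaks: ", length(mz)),
    paste(mz, intensity, sep = "\t"),  # one tab-separated peak per line
    ""                                 # blank line terminates the record
  )
}

rec <- format_msp_record(
  name = "Unknown", rt = 8.4209, precursor_mz = 85.00465, adduct = "[M-H]-",
  mz = c(75.00941, 79.95998, 85.00496),
  intensity = c(44.5, 54.6, 297.4)
)
# writeLines(rec, "library.msp") would write the record to disk
```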

Then we need the MS1 quantification as it comes from XCMS, looking like
https://github.com/ipb-halle/MetFamily/blob/master/files/Metabolite_profile_showcase.txt
The full spec of the MetFamily input formats is in
https://github.com/ipb-halle/MetFamily/blob/master/files/MetFamily_Input_Specification.pdf

@Treutler and @cbroeckl, is there a way to create a small HowTo
(some people at UC Riverside would love to see that ...)
so we can include it in the RAMclustR documentation?

@cbroeckl , is the file https://github.com/cbroeckl/RAMClustR/blob/master/vignettes/spectra/test.mspLib
what the RAMclustR output looks like ? It has the RT in a comment tag.

Yours,
Steffen

Can't find my sample names

Hi,

I'm quickly trying out some workflows using ramclustR and I can't find my sample names in the output...
After taking a look at the script of the ramclustR function, I can see that there are a lot of result tables containing everything needed (rt, intensity, cluster, etc...). But I can't find the sample names in the resulting MSP file, whereas the row names of the tables are my sample names.

Can someone help me, please?

Thanks a lot !

Where would real msms.csv data come from for MSe/bbCID?

I have a user asking how to get their bbCID data into ramclustr and looking at the vignette it's not clear to me how that CSV file should be created.

I suggest adding a flow chart of software to the README to show a few usage scenarios at a glance. Something as simple as this:
[image: example flow chart, not preserved]

RAMClustR fails if run on large feature table.

If a feature table with more than 55k entries is supplied, RAMClustR fails due to this issue in the ff package (ref). I doubt this issue in ff will be fixed.

Since the allocated matrix is symmetric (at least I assume so) and only the upper triangle is computed anyway, I think the computation could be optimized so that the full matrix never has to be stored in memory.

@cbroeckl if you are currently busy and don't have the time to address this issue I'd be happy to support and we will come up with an implementation to solve this.

ffmat<-ff::ff(vmode="double", dim=c(n, n), initdata = 0) ##reset to 1 if necessary
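
For scale: a full 55,000 x 55,000 matrix has 3,025,000,000 cells, which is more than R's largest integer (2,147,483,647), while the strict upper triangle needs only 1,512,472,500. Whether that limit is what triggers the ff failure is an assumption here. The sketch below shows packed upper-triangle storage with the same pair ordering that `stats::dist` uses; `tri_length` and `tri_index` are hypothetical helpers.

```r
# Number of entries in the strict upper triangle of an n x n matrix
tri_length <- function(n) n * (n - 1) / 2

# Position of the pair (i, j), i < j, in the packed vector
# (same ordering as the vector behind a stats::dist object)
tri_index <- function(i, j, n) {
  stopifnot(all(i < j), all(j <= n))
  n * (i - 1) - i * (i - 1) / 2 + j - i
}

n <- 4
packed <- numeric(tri_length(n))       # 6 values instead of 16
packed[tri_index(2, 4, n)] <- 0.8      # store similarity of features 2 and 4

# Cross-check the ordering against stats::dist on toy data
d <- dist(matrix(c(1, 2, 4, 8), ncol = 1))
d_2_4 <- as.vector(d)[tri_index(2, 4, n)]   # |2 - 8|
```

Block-wise computation could then write each block directly into the packed vector, at the cost of slightly more index arithmetic.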

Possible future developments

@hechth - absolutely could be done. A few items to consider:
  1. Are there enough samples in each group for correlation-based clustering to be meaningful? If not, it would be best to develop a peak shape based clustering as well. I had actually started down this path, lost steam, and ultimately abandoned it for lack of time to validate it. There is a clear path forward for it though. You can simultaneously use all the similarity metrics (retention time of the feature, correlation, and peak shape) by expanding the existing similarity product score. In theory IMS data could also be incorporated, if available.
  2. If you perform RAMClustR by sample groups and then cluster spectra, how do you deal with feature assignments which are in conflict?
  3. How do you deal with missing spectra in the blanks (NA values are a bit of a nuisance...)?
  4. If you are going to be performing clustering by sample type, would it be best to perform XCMS by sample type as well?
  5. If two spectra from two groups align pretty well but imperfectly, what set of features should be used in the quantitative assignments - only the overlapping features or all features?
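
The idea in item 1 of expanding the similarity product score can be sketched as follows. This assumes each term is a Gaussian-shaped similarity and that the terms are multiplied; the function names, the `shape_r` input and the `ss` width are hypothetical, and the defaults for `st` and `sr` are placeholders rather than values taken from the package source.

```r
# Gaussian-shaped similarity: 1 at zero distance, decaying with width sigma
gauss_sim <- function(d, sigma) exp(-(d^2) / (2 * sigma^2))

# Hypothetical product score: retention time and correlation terms,
# optionally multiplied by a peak-shape term.
product_score <- function(rt_diff, cor_r, shape_r = NULL,
                          st = 20, sr = 0.5, ss = 0.5) {
  score <- gauss_sim(rt_diff, st) * gauss_sim(1 - cor_r, sr)
  if (!is.null(shape_r)) {
    # shape_r: assumed peak-shape correlation in [-1, 1]
    score <- score * gauss_sim(1 - shape_r, ss)
  }
  score
}

two_terms   <- product_score(rt_diff = 5, cor_r = 0.9)
three_terms <- product_score(rt_diff = 5, cor_r = 0.9, shape_r = 0.7)
```

Because every term lies between 0 and 1, adding a term can only lower the score, so an extra metric acts as an additional filter on feature pairs.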

Originally posted by @cbroeckl in #31 (comment)

No disk space while calling "ff" in ramclustR.R:263

Hi Steffen,
RAMClustR uses ffmat<-ff(vmode="double", dim=c(n, n), initdata = 0) to call ff in ramclustR.R:263. ff obviously uses the default temp folder to store temporary files. In cases of limited space in the user profile, the call of ff may fail if no more empty disk space could be allocated. In a server based (HPC) environment this could happen, because user profile space is maybe limited and the working directory is allocated on scratch devices. We had a problem on Windows Server 2012 with the user profile located on C:/ and a quota of 10 GB and a working directory on scratch S:/.

I suggest changing the call of ff so that the temp files generated by ff are allocated in the working directory and not in the system temp space.

A workaround is to set the user's temp directory environment variables to a folder with enough space.

Yours,
Tobias
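
A possible session-level variant of the workaround, assuming ff takes its temporary directory from the `fftempdir` option (check the ff documentation for the installed version). `use_ff_scratch` is a hypothetical helper, and the path used here is only a stand-in for a real scratch location such as a folder on S:/.

```r
# Hypothetical helper: point ff's temporary files at a directory with enough space.
use_ff_scratch <- function(dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  old <- options(fftempdir = dir)   # assumed to be read by ff when it creates files
  invisible(old)                    # previous setting, so it can be restored
}

# Stand-in path; in practice use the scratch device or the working directory,
# e.g. file.path(getwd(), "ff_tmp")
scratch <- file.path(tempdir(), "ff_scratch")
previous <- use_ff_scratch(scratch)
```

Setting the option before RAMClustR is called keeps the change local to the R session and avoids editing system environment variables.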

add license

I just noticed that RAMClustR doesn't seem to have a license O.o

Implement `rc.get.csv.data` function

In the ramclustR.R function it was possible to provide a CSV file as the input. It would be great to have this code available in a separate function which loads the data from a feature CSV and initializes the pheno object from a metadata CSV.

The function would then return a ramClustObj which has the data fields initiated in the same way as the rc.get.xcms.data function.
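
A rough sketch of what such a loader might do, not the package's `rc.get.csv.data`. It assumes samples are in rows, the first column holds sample names, and feature columns are named `mz_rt`; the parameter names `featdelim`, `timepos` and `sampNameCol` follow the `ramclustR()` call quoted elsewhere in this tracker, while `read_feature_csv` and the returned field names are assumptions.

```r
# Hypothetical loader for a feature CSV (samples in rows, "mz_rt" column names).
read_feature_csv <- function(path, featdelim = "_", timepos = 2, sampNameCol = 1) {
  raw <- read.csv(path, header = TRUE, check.names = FALSE,
                  stringsAsFactors = FALSE)
  sample_names <- as.character(raw[[sampNameCol]])
  data <- as.matrix(raw[, -sampNameCol, drop = FALSE])
  rownames(data) <- sample_names

  # Split "mz_rt" feature names into their two parts
  parts <- strsplit(colnames(data), featdelim, fixed = TRUE)
  mzpos <- if (timepos == 2) 1 else 2
  list(
    MSdata = data,
    fmz = as.numeric(vapply(parts, `[`, character(1), mzpos)),
    frt = as.numeric(vapply(parts, `[`, character(1), timepos)),
    phenoData = data.frame(sample.names = sample_names,
                           stringsAsFactors = FALSE)
  )
}

# Toy input written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("sample,100.1_12.5,200.2_30", "s1,10,30", "s2,20,40"), tmp)
res <- read_feature_csv(tmp)
```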

Refactor main ramclust.R function

The ramclust.R file contains a function covering the whole workflow, but the rc.*.R files actually contain the same functionality in multiple steps, which is more convenient to test and maintain.

  • #30
  • Replace the sections in ramclust.R with the respective sub-steps of the workflow
  • Implement unit tests for all functions
  • Include a data-flow diagram and step-wise procedure in the documentation
  • Group lower-level functions into higher top-level functions


Moving over from xcms issue #430

Thanks @cbroeckl for all the help so far,
RC <- ramclustR(xcmsObj = xdata, ExpDes=experiment)

Error in if (!is.null(xcmsObj) & mslev == 2 & any(is.null(MStag), is.null(idMSMStag), :
missing value where TRUE/FALSE needed
In addition: Warning message:
In ramclustR(xcmsObj = xdata, ExpDes = experiment) :
NAs introduced by coercion

Is this because I filled in something incorrectly at
experiment <- defineExperiment(csv = FALSE)?
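
One way this kind of error can arise in R is a condition that evaluates to NA. The sketch below assumes `mslev` became NA through a failed numeric coercion, as the warning hints; the actual cause in this report is not confirmed, and the variable values are stand-ins.

```r
# A non-numeric value coerced to numeric yields NA
# (with the warning "NAs introduced by coercion")
mslev <- suppressWarnings(as.numeric("MS1"))

xcmsObj <- list()                          # stand-in for a non-NULL xcms object
cond <- !is.null(xcmsObj) & mslev == 2     # TRUE & NA gives NA

# if (NA) stops with "missing value where TRUE/FALSE needed"
outcome <- tryCatch(
  if (cond) "ms2 branch" else "ms1 branch",
  error = function(e) "error"
)

# isTRUE() treats NA as FALSE, so the check no longer stops
guarded <- if (isTRUE(cond)) "ms2 branch" else "ms1 branch"
```

Checking the inputs to `defineExperiment()` for non-numeric entries in numeric fields would be a reasonable first step under this assumption.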

Question about why MSMS spectra don't read in

Hello @cbroeckl and all,

I know I already asked about it, but want to double check.

I am working with Waters MS/MSe data, which I converted into mzML format keeping all three channels in, so my MS and MSMS data are contained in a single mzML file. Then I processed that with XCMS3, which runs RT alignment and grouping on MS data, but then applies it to the MSMS layer as well.

Now I am trying to prepare my data for annotation using RAMClustR. I read the data in using rc.get.xcms.data, which asks for a name tag to the MSMS files, which I don't have since MSe data is in the same mzML files as the MS data. So I just skipped this parameter, which I think resulted in RAMClustR using only MS data for processing and preparing the spectra.

Is there a way around this? Or should I re-convert my data separating MS and MSe into different mzML files?

Thanks for all your help!
Best,
Lisa

PrecursorMz

Hi,
I use the function writemsp to import the full dataset into a spectra object. I notice that the precursor given to each compound differs from the precursor calculated with do.findmain.
In general the precursor has a higher m/z, from the same MS1 group but with lower intensity.

Run error in RAMClustR

Hello,
Excellent work on RAMClustR. I've been running into an error when trying to use RAMClustR. Both data from apLCMS and the data provided with the package have caused the same error. I was attempting to use MS1 data only to deconvolute isotopes, in-source fragments and additional adducts. I've been running the following:

res_1 <- ramclustR(xcmsObj = NULL, ms = "MSdata.csv",
                   idmsms = NULL,
                   taglocation = "filepaths",
                   MStag = NULL, idMSMStag = NULL, featdelim = "_", timepos = 2,
                   st = 20, sr = 0.5, maxt = 20, deepSplit = FALSE,
                   blocksize = 2000, mult = 5, hmax = 0.3, sampNameCol = 1,
                   collapse = TRUE, mspout = FALSE, mslev = 1, ExpDes = NULL,
                   normalize = "TIC", minModuleSize = 2, linkage = "average")

The function will run through the following steps:
calculating ramclustR similarity: nblocks = 6
finished:1 2 3 4 5 6
RAMClust feature similarity matrix calculated and stored: 0.3 minutes
RAMClust distances converted to distance object: 0.1 minutes
fastcluster based clustering complete: 0 minutes

And then produce the following error:
Error in .subset2(x, i, exact = exact) : subscript out of bounds

My R session information is below.
Thank you in advance for your help.

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] ff_2.2-13 bit_1.1-12 CAMERA_1.22.0
[4] igraph_1.0.1 BiocInstaller_1.16.5 dynamicTreeCut_1.62
[7] fastcluster_1.1.16 xcms_1.42.0 Biobase_2.26.0
[10] BiocGenerics_0.12.1 mzR_2.0.0 Rcpp_0.12.0
[13] RAMClustR_0.2 devtools_1.9.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 cluster_2.0.3 codetools_0.2-14
[4] colorspace_1.2-6 curl_0.9.1 digest_0.6.8
[7] foreign_0.8-65 Formula_1.2-1 ggplot2_1.0.1
[10] graph_1.44.1 grid_3.1.2 gridExtra_2.0.0
[13] gtable_0.1.2 Hmisc_3.16-0 httr_1.0.0
[16] lattice_0.20-33 latticeExtra_0.6-26 magrittr_1.5
[19] MASS_7.3-43 memoise_0.2.1 munsell_0.4.2
[22] nnet_7.3-10 plyr_1.8.3 proto_0.3-10
[25] R6_2.1.0 RBGL_1.42.0 RColorBrewer_1.1-2
[28] reshape2_1.4.1 rpart_4.1-10 scales_0.2.5
[31] splines_3.1.2 stats4_3.1.2 stringi_0.5-5
[34] stringr_1.0.0 survival_2.38-3 tcltk_3.1.2
[37] tools_3.1.2

general function organization

@zargham-ahmad - can we put all the general functions into one file? For example, create_ramclustObj() is a function created in rc.get.df.data, but is called from rc.get.csv.data and rc.get.xcms.data. I am sure there are other functions which are central and used by many files, and I would like to be able to find those central functions more easily.

Implement a test case using individual steps to build the whole `ramclust.R` workflow

Since the main ramclustR.R file has become somewhat obsolete with the new individual components, it would be good to mark it as deprecated or even remove the functionality from the package after making sure that everything is kept where it actually should be. Another option would be to have this function as a default workflow running the fundamental steps of RAMClustR, so keeping it intact as a main wrapper @cbroeckl ?

To make sure that the functionality is kept or equivalent, we need a test case which runs the individual steps and then we can make a comparison to the results created by the old ramclustR function.
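
The comparison step of such a test could look like the sketch below. It assumes both workflows return list-like objects; `compare_results` is a hypothetical helper, and the field names in the toy objects are illustrative only.

```r
# Hypothetical helper: compare selected fields of two result objects.
compare_results <- function(old, new, fields, tolerance = 1e-8) {
  vapply(fields, function(f) {
    isTRUE(all.equal(old[[f]], new[[f]], tolerance = tolerance))
  }, logical(1))
}

# Toy stand-ins for the outputs of the old wrapper and the step-wise workflow
old_res <- list(clrt = c(12.5, 30.0), featclus = c(1L, 1L, 2L))
new_res <- list(clrt = c(12.5, 30.0), featclus = c(1L, 2L, 2L))

report <- compare_results(old_res, new_res, c("clrt", "featclus"))
```

Reporting per field rather than with a single `identical()` call shows which step of the workflow diverged.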

Organize dependencies

The InterpretMSSpectrum package is not available on OS X, so it should move to Suggests and be imported via @concept in the respective functions so that it is referenced. The code in do.findmain.R should then only be executed if the package is present.

The BiocManager, stringi and xml2 packages are not imported and should therefore also be under Suggests.

The package actually depends on R > 3.5.0, which should also be noted in the package DESCRIPTION.
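
Running code only when a suggested package is present is usually done with `requireNamespace()`. A minimal sketch follows; `call_if_installed` is a hypothetical wrapper, and `stats::median` stands in for a function from an optional package such as InterpretMSSpectrum.

```r
# Hypothetical wrapper: call pkg::fun_name(...) only if pkg is installed.
call_if_installed <- function(pkg, fun_name, ...) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping ", fun_name, "().")
    return(invisible(NULL))
  }
  getExportedValue(pkg, fun_name)(...)
}

# 'stats' ships with R, so this call runs
present <- call_if_installed("stats", "median", c(1, 5, 9))

# A package name that is not installed yields NULL instead of an error
absent <- call_if_installed("notARealPackage123", "f")
```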
