cbroeckl / ramclustr

Assigning precursor-product ion relationships in indiscriminant MS/MS data

License: MIT License

R 100.00%

ramclustr's People

meowcat huansi jpgroup yufree sneumann cphyt7 inambioinfo huans asheflin recetox martenson hechth maximskorik rickhelmus zargham-ahmad

ramclustr's Issues

How to handle missing `MSMS` data

Currently, the MS1 data is copied into the slot for MS2 data if it is not present in the version that reads data from a csv, while it is kept empty when reading it from xcms - should this be made the general case?

Error running the vignette

R CMD check RAMClustR fails with:

Quitting from lines 19-77 (ramclustR.Rmd) 
Error: processing vignette 'ramclustR.Rmd' failed with diagnostics:
argument is of length zero
Execution halted

I guess a package dependency issue is the reason
it fails on my laptop. Can you spot which package
might be the culprit?

> library(RAMClustR)
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RAMClustR_0.4

loaded via a namespace (and not attached):
[1] tools_3.3.2

suspected write problem in RAMClustR::write.msp

As part of our wrapping of RAMClustR in Galaxy, we are routinely running a fairly basic test described below. After switching from 1.0.9 to 1.2.1 we are experiencing the following error:

Error in round(ramclustObj$clri[i], 2) : 
  non-numeric argument to mathematical function
Calls: store_output -> <Anonymous> -> paste0
Execution halted

Our store_output function calls the RAMClustR::write.msp here

@hechth suspects that the new version of RAMClustR does not expect a parameter to be missing

The parameters of the test are here
The input file is here
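
The error message is what R produces when round() receives something that is not numeric, for example NULL from a field that is absent. Below is a minimal sketch of that failure and of a possible guard. It assumes the `clri` field is simply missing; `obj` and `safe_round` are hypothetical names and not part of RAMClustR.

```r
# Hypothetical stand-in for a ramclustObj that has no `clri` field.
obj <- list(clrt = c(12.3456, 67.891))

# obj$clri is NULL, so obj$clri[1] is NULL too, and round() rejects it.
res <- tryCatch(
  round(obj$clri[1], 2),
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical guard: return NA when the value is absent or not numeric.
safe_round <- function(x, digits = 2) {
  if (is.null(x) || !is.numeric(x)) {
    return(NA_real_)
  }
  round(x, digits)
}

print(safe_round(obj$clri[1]))
print(safe_round(obj$clrt[1]))
```

Whether write.msp should skip the field or write NA in this case is a design decision for the maintainers.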

extract function to check input arguments

https://github.com/RECETOX/RAMClustR/blob/36fce5b34d2e57cd70c6d2b9ac53ea52250d94a6/R/findmass.R#L32-L41

exportDataset function fails due to `getData` function not being found

Trying to use the exportDataset function from the library, I run into an error where the getData function mentioned below is not found.

RAMClustR/R/exportDataset.R

Lines 27 to 32 in eb15f37

    
    d <- getData(
      ramclustObj = ramclustObj,
      which.data = which.data,
      filter = filter,
      cmpdlabel = label.by
    )

Which function is this or is the exportDataset function deprecated?

I'm very thankful for any advice.

Create a vignette

We should have a tutorial style vignette

Make RAMClustR available on CRAN

Hi, is there any intent to revive the CRAN package for this tool? The official page says it has been removed from CRAN and only an archive of old versions is available.

https://cran.r-project.org/web/packages/RAMClustR/index.html

Provide paramsets as function arguments to ramclustR()

There should be a non-interactive mode, where the parameter sets
are not edited via the R data editor, but instead given e.g. as
ramclustR(..., paramset=paramsets$C8Serum, ...)
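
A rough sketch of how such an argument could be resolved without opening the data editor. Everything here is hypothetical: `resolve_params`, `default_params` and the numbers in `paramsets` are placeholders for illustration, not RAMClustR's real defaults or API.

```r
# Placeholder values for illustration only, not RAMClustR defaults.
default_params <- list(st = 20, sr = 0.5, maxt = 20)

# Hypothetical named parameter sets; the numbers are made up.
paramsets <- list(
  C8Serum = list(st = 5, maxt = 10)
)

# Merge a user-supplied parameter set over the defaults, non-interactively.
resolve_params <- function(paramset = NULL, defaults = default_params) {
  if (is.null(paramset)) {
    return(defaults)
  }
  unknown <- setdiff(names(paramset), names(defaults))
  if (length(unknown) > 0) {
    stop("Unknown parameter(s): ", paste(unknown, collapse = ", "))
  }
  utils::modifyList(defaults, paramset)
}

params <- resolve_params(paramsets$C8Serum)
str(params)
```

Rejecting unknown names early keeps a typo in a parameter set from being silently ignored.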

Connecting RAMclustR with MetFamily

Hi, it would be great to have a working example snippet
on how to connect RAMclustR output to MetFamily.

What we need is an MSP file that looks like this:
https://raw.githubusercontent.com/ipb-halle/MetFamily/master/files/MSMS_library_showcase.msp

NAME: Unknown
RETENTIONTIME: 8.4209
PRECURSORMZ: 85.00465
METABOLITENAME: 
ADDUCTIONNAME: [M-H]-
Num Peaks: 3
75.00941	44.5240135192871
79.95998	54.568229675293
85.00496	297.418823242188

i.e. where we have RT and Precursor that can be matched to the MS1 precursor.

Then we need the MS1 quantification as it comes from XCMS, looking like
https://github.com/ipb-halle/MetFamily/blob/master/files/Metabolite_profile_showcase.txt
Full spec of the MetFamily input formats are in
https://github.com/ipb-halle/MetFamily/blob/master/files/MetFamily_Input_Specification.pdf

@Treutler and @cbroeckl, is there a way to create a small HowTo
(some people at UC Riverside would love to see that ...)
so we can include that in the RAMclustR documentation?

@cbroeckl, is the file https://github.com/cbroeckl/RAMClustR/blob/master/vignettes/spectra/test.mspLib
what the RAMclustR output looks like? It has the RT in a comment tag.

Yours,
Steffen
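
As a starting point, here is a small sketch that formats one record in the layout shown above. It only reproduces the fields visible in the example; `format_msp_record` is a hypothetical helper, and how retention time, precursor m/z and peaks are pulled out of a RAMClustR object is left open.

```r
# Format one spectrum as an MSP record with the fields from the example above.
# `peaks` is a two-column matrix or data frame: m/z, intensity.
format_msp_record <- function(rt, precursor_mz, peaks,
                              name = "Unknown",
                              metabolite = "",
                              adduct = "[M-H]-") {
  stopifnot(is.matrix(peaks) || is.data.frame(peaks), ncol(peaks) == 2)
  c(
    paste0("NAME: ", name),
    paste0("RETENTIONTIME: ", rt),
    paste0("PRECURSORMZ: ", precursor_mz),
    paste0("METABOLITENAME: ", metabolite),
    paste0("ADDUCTIONNAME: ", adduct),
    paste0("Num Peaks: ", nrow(peaks)),
    paste(peaks[, 1], peaks[, 2], sep = "\t"),
    ""  # blank line separates records
  )
}

# The peak values are the ones from the example record above.
peaks <- matrix(
  c(75.00941, 44.5240135192871,
    79.95998, 54.568229675293,
    85.00496, 297.418823242188),
  ncol = 2, byrow = TRUE
)
rec <- format_msp_record(rt = 8.4209, precursor_mz = 85.00465, peaks = peaks)
writeLines(rec)
```

Several records can be concatenated with c() and written in one writeLines() call to a file connection.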

Remove duplicate code in de

https://github.com/hechth/RAMClustR/blob/36fce5b34d2e57cd70c6d2b9ac53ea52250d94a6/R/defineExperiment.R#L157-L174

Can't find my sample names

Hi,

I'm quickly trying some workflows using ramclustR and I can't find my sample names in the output.
After taking a look at the source of the ramclustR function, I can see that there are a lot of result tables containing everything needed (rt, intensity, cluster, etc.), but I can't find the sample names in the resulting MSP file (whereas the row names of the table are my sample names).

Can someone help me, please?

Thanks a lot!

mzpos and timepos are unused

These parameters are not used in ramclustR()

Where would real msms.csv data come from for MSe/bbCID?

I have a user asking how to get their bbCID data into ramclustr and looking at the vignette it's not clear to me how that CSV file should be created.

I suggest adding a flow chart of software to the README to show a few usage scenarios at a glance. Something as simple as this:

RAMClustR fails if run on large feature table.

If supplying a feature table with more than 55k entries, RAMClustR fails due to this issue in the ff package ref. I doubt this issue in ff will be fixed.

Since this allocated matrix is symmetric (I assume at least), and only the upper triangle is computed anyway, I think this computation could maybe be optimized in order to never have to store the actual full matrix in memory.

@cbroeckl if you are currently busy and don't have the time to address this issue I'd be happy to support and we will come up with an implementation to solve this.

RAMClustR/R/ramclustR.R

Line 667 in 351243d

    
           ffmat<-ff::ff(vmode="double", dim=c(n, n), initdata = 0) ##reset to 1 if necessary
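
To illustrate the upper-triangle idea: a symmetric n x n matrix can be kept as a vector of n*(n-1)/2 values, the layout that stats::dist uses. The sketch below assumes the ff failure comes from the element count exceeding what a 32-bit integer can index, which is my reading and not something confirmed in this issue. `tri_index` is a hypothetical helper.

```r
# Position of pair (i, j), i < j, in a dist-style vector of length n*(n-1)/2.
# This is the index formula given in the stats::dist documentation.
tri_index <- function(i, j, n) {
  stopifnot(all(i < j), all(j <= n))
  n * (i - 1) - i * (i - 1) / 2 + j - i
}

# Element counts for 55k features, full matrix versus upper triangle.
n_feat <- 55000
full_too_big <- n_feat^2 > .Machine$integer.max
tri_too_big <- n_feat * (n_feat - 1) / 2 > .Machine$integer.max
print(c(full = full_too_big, triangle = tri_too_big))

# Check the index mapping on a small example against as.matrix(dist).
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
d <- stats::dist(x)
full <- as.matrix(d)
pairs <- utils::combn(5, 2)
idx <- tri_index(pairs[1, ], pairs[2, ], 5)
print(all.equal(as.numeric(d)[idx], unname(full[t(pairs)])))
```

Under that assumption the triangular layout roughly halves storage and keeps the element count below the integer limit for 55k features, though not for arbitrarily large tables.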

Possible future developments

@hechth - absolutely could be done. A few items to consider:

Are there enough samples in each group for correlation-based clustering to be meaningful? If not, it would be best to develop a peak shape based clustering as well. I had actually started down this path, lost steam, and ultimately abandoned it for lack of time to validate it. There is a clear path forward for it, though. You can simultaneously use all the similarity metrics (retention time of the feature, correlation, and peak shape) by expanding the existing similarity product score. In theory IMS data could also be incorporated, if available.
If you perform RAMClustR by sample groups and then cluster spectra, how do you deal with feature assignments which are in conflict?
How do you deal with missing spectra in the blanks (NA values are a bit of a nuisance...)?
If you are going to be performing clustering by sample type, would it be best to perform XCMS by sample type as well?
If two spectra from two groups align pretty well but imperfectly, what set of features should be used in the quantitative assignments - only the overlapping features or all features?

Originally posted by @cbroeckl in #31 (comment)

No disk space while calling "ff" in ramclustR.R:263

Hi Steffen,
RAMClustR uses ffmat<-ff(vmode="double", dim=c(n, n), initdata = 0) to call ff in ramclustR.R:263. ff obviously uses the default temp folder to store temporary files. When space in the user profile is limited, the call to ff may fail if no more disk space can be allocated. This can happen in a server-based (HPC) environment, because user profile space may be limited while the working directory is allocated on scratch devices. We had a problem on Windows Server 2012 with the user profile located on C:/ with a quota of 10 GB and the working directory on scratch S:/.

I suggest changing the call to ff so that the temp files generated by ff are allocated in the working space and not in the system temp space.

A workaround is to set the user's temp directory environment variables to a folder with enough space.

Yours,
Tobias
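
A possible workaround from within R, as a sketch: to my knowledge ff reads the `fftempdir` option to decide where its backing files go, but please check the ff documentation for the installed version. The folder name `ff_tmp` is a hypothetical choice.

```r
# Put ff backing files in a folder under the working directory (hypothetical name).
scratch <- file.path(getwd(), "ff_tmp")
dir.create(scratch, showWarnings = FALSE, recursive = TRUE)

# Set the option after library(ff), so that loading the package cannot reset it.
old <- options(fftempdir = scratch)
print(getOption("fftempdir"))

# ... run the ramclustR() call here ...

# options(old) restores the previous setting afterwards.
```

This leaves the system temp directory untouched and only redirects what ff allocates.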

object 'tmpnames1' not found

Hello,

Seems a recent commit (2438b68) is resulting in an error if normalize != "quantile" (ramclustR.R@461).

Thanks,
Rick

add license

I just noticed that RAMClustR doesn't seem to have a license O.o

Implement `rc.get.csv.data` function

In the ramclustR.R function it was possible to provide a csv file as the input. It would be great to have this code now available in a separate function which loads the data from a feature csv and inits the pheno object from a metadata csv.

The function would then return a ramClustObj which has the data fields initiated in the same way as the rc.get.xcms.data function.
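
A minimal sketch of what such a loader could look like. The function name `read_csv_pair`, the returned field names `MSdata` and `phenoData`, and the assumption that the first column of both files holds the sample name are illustrative guesses, not the actual rc.get.csv.data design.

```r
# Read a feature table (samples in rows) and a metadata table, aligned by sample name.
read_csv_pair <- function(feature_csv, pheno_csv, samp_name_col = 1) {
  feats <- utils::read.csv(feature_csv, check.names = FALSE,
                           stringsAsFactors = FALSE)
  pheno <- utils::read.csv(pheno_csv, stringsAsFactors = FALSE)

  sample_names <- as.character(feats[[samp_name_col]])
  ms <- as.matrix(feats[, -samp_name_col, drop = FALSE])
  rownames(ms) <- sample_names

  # Every sample must appear in both files.
  stopifnot(setequal(sample_names, pheno[[1]]))
  pheno <- pheno[match(sample_names, pheno[[1]]), , drop = FALSE]

  list(MSdata = ms, phenoData = pheno)
}

# Tiny made-up input files, with feature names in mz_rt form.
feat_file <- tempfile(fileext = ".csv")
pheno_file <- tempfile(fileext = ".csv")
writeLines(c("sample,100.05_35.2,200.10_61.7", "s1,10,20", "s2,30,40"), feat_file)
writeLines(c("sample,group", "s2,B", "s1,A"), pheno_file)

obj <- read_csv_pair(feat_file, pheno_file)
print(obj$MSdata)
print(obj$phenoData)
```

Reordering the metadata to match the feature table up front avoids silent misalignment later on.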

URL in the DESCRIPTION file doesn't exist

Refactor main ramclust.R function

The ramclust.R file contains a function covering the whole workflow, but the rc.*.R files actually contain the same functionality in multiple steps, which is more convenient to test and maintain.

#30
Replace the sections in ramclust.R with the respective sub-steps of the workflow
Implement unit tests for all functions
Include a data-flow diagram and step-wise procedure in the documentation
Group lower-level functions into higher top-level functions

Moving over from xcms issue #430

Thanks @cbroeckl for all the help so far,
RC <- ramclustR(xcmsObj = xdata, ExpDes=experiment)

Error in if (!is.null(xcmsObj) & mslev == 2 & any(is.null(MStag), is.null(idMSMStag), :
missing value where TRUE/FALSE needed
In addition: Warning message:
In ramclustR(xcmsObj = xdata, ExpDes = experiment) :
NAs introduced by coercion

Is this because I filled in something incorrectly at
experiment <- defineExperiment(csv = FALSE)?
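
For reference, here is one way these two messages can appear together in R: a value that cannot be coerced to a number becomes NA, and an NA inside a condition built with `&` makes `if` fail. Whether `mslev` is really the value at fault here is an assumption; the sketch only reproduces the mechanism.

```r
# A non-numeric entry coerced to a number yields NA ("NAs introduced by coercion").
mslev <- suppressWarnings(as.numeric("MS1"))

# TRUE & NA & TRUE evaluates to NA rather than FALSE.
cond <- TRUE & mslev == 2 & TRUE
print(cond)

# if() cannot branch on NA and raises an error.
res <- tryCatch(
  if (cond) "ms2 branch" else "other branch",
  error = function(e) conditionMessage(e)
)
print(res)

# isTRUE() treats NA as "not true", so the condition stays well defined.
guarded <- if (isTRUE(cond)) "ms2 branch" else "other branch"
print(guarded)
```

Checking the values entered in defineExperiment for fields that must be numeric would be a reasonable first step.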

Question about why MSMS spectra don't read in

Hello @cbroeckl and all,

I know I already asked about it, but want to double check.

I am working with Waters MS/MSe data, which I converted into mzML format keeping all three channels in, so my MS and MSMS data are contained in a single mzML file. Then I processed that with XCMS3, which runs RT alignment and grouping on MS data, but then applies it to the MSMS layer as well.

Now I am trying to prepare my data for annotation using RAMClustR. I read the data in using rc.get.xcms.data, which asks for a name tag to the MSMS files, which I don't have since MSe data is in the same mzML files as the MS data. So I just skipped this parameter, which I think resulted in RAMClustR using only MS data for processing and preparing the spectra.

Is there a way around this? Or should I re-convert my data separating MS and MSe into different mzML files?

Thanks for all your help!
Best,
Lisa

Where are the collapse option variables used?

Hi,

Sorry, it's me again!
I'm wondering where the results of the collapse option operations are used afterwards: https://github.com/cbroeckl/RAMClustR/blob/master/R/ramclustR.R#L784

The ramclustObj$SpecAbund object does not appear to be used afterwards (nor does ramclustObj$SpecAbundAve).

Can you enlighten me, please?

Fix edge case where number of features coincides with block size

@arpita-007 - I think this is a rare event coupled with imperfect code. The file that fails has exactly 2000 features, which happens to be the default blocksize setting. Try setting the option in the ramclustR function: blocksize = 1200. I suspect it will run fine. Let me know if this fixes it, please!

Originally posted by @cbroeckl in #29 (comment)
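
I have not traced where the existing code goes wrong, so the following is only a sketch of a block-boundary computation that stays well defined when the number of features equals the block size. `block_ranges` is a hypothetical helper, not a function in the package.

```r
# Start and end indices of consecutive blocks covering 1..n.
block_ranges <- function(n, blocksize) {
  stopifnot(n >= 1, blocksize >= 1)
  starts <- seq(1, n, by = blocksize)
  ends <- pmin(starts + blocksize - 1, n)
  data.frame(start = starts, end = ends)
}

print(block_ranges(2000, 2000))  # feature count equal to the block size
print(block_ranges(2001, 2000))  # one feature more than the block size
print(block_ranges(4500, 2000))  # a partial final block
```

A unit test with n equal to blocksize, one below and one above would cover this edge case.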

PrecursorMz

Hi,
I use the function writemsp to import the full dataset into a spectra object. I notice that the precursor given to each compound differs from the precursor calculated with do.findmain.
In general the precursor is at a higher mz, from the same ms1 group but with lower intensity.

Run error in RAMClustR

Hello,
Excellent work on RAMClustR. I've been running into an error when trying to use RAMClustR. Both data from apLCMS and the data provided with the package have caused the same error. I was attempting to use MS1 data only to deconvolute isotopes, in-source fragments and additional adducts. I've been running the following:

res_1<- ramclustR (xcmsObj = NULL, ms = "MSdata.csv",
idmsms = NULL,
taglocation = "filepaths",
MStag = NULL, idMSMStag = NULL, featdelim = "_", timepos = 2,
st = 20, sr = 0.5, maxt = 20, deepSplit = FALSE,
blocksize = 2000, mult = 5, hmax = 0.3, sampNameCol = 1,
collapse = TRUE, mspout = FALSE, mslev = 1, ExpDes = NULL,
normalize = "TIC", minModuleSize = 2, linkage="average")

The function will run through the following steps:
calculating ramclustR similarity: nblocks = 6
finished:1 2 3 4 5 6
RAMClust feature similarity matrix calculated and stored: 0.3 minutes
RAMClust distances converted to distance object: 0.1 minutes
fastcluster based clustering complete: 0 minutes

And then produce the following error:
Error in .subset2(x, i, exact = exact) : subscript out of bounds

My R session information is below.
Thank you in advance for your help.

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] ff_2.2-13 bit_1.1-12 CAMERA_1.22.0
[4] igraph_1.0.1 BiocInstaller_1.16.5 dynamicTreeCut_1.62
[7] fastcluster_1.1.16 xcms_1.42.0 Biobase_2.26.0
[10] BiocGenerics_0.12.1 mzR_2.0.0 Rcpp_0.12.0
[13] RAMClustR_0.2 devtools_1.9.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 cluster_2.0.3 codetools_0.2-14
[4] colorspace_1.2-6 curl_0.9.1 digest_0.6.8
[7] foreign_0.8-65 Formula_1.2-1 ggplot2_1.0.1
[10] graph_1.44.1 grid_3.1.2 gridExtra_2.0.0
[13] gtable_0.1.2 Hmisc_3.16-0 httr_1.0.0
[16] lattice_0.20-33 latticeExtra_0.6-26 magrittr_1.5
[19] MASS_7.3-43 memoise_0.2.1 munsell_0.4.2
[22] nnet_7.3-10 plyr_1.8.3 proto_0.3-10
[25] R6_2.1.0 RBGL_1.42.0 RColorBrewer_1.1-2
[28] reshape2_1.4.1 rpart_4.1-10 scales_0.2.5
[31] splines_3.1.2 stats4_3.1.2 stringi_0.5-5
[34] stringr_1.0.0 survival_2.38-3 tcltk_3.1.2
[37] tools_3.1.2

extract duplicate code in defineExperiment.R

https://github.com/hechth/RAMClustR/blob/36fce5b34d2e57cd70c6d2b9ac53ea52250d94a6/R/defineExperiment.R#L157-L174

RAMClustR is moving house

Hi @meowcat and @Huansi, you both have forks of RAMClustR.
Please note that @cbroeckl and I are currently transferring
the RAMClustR repository to Corey where it belongs:
https://github.com/cbroeckl/RAMClustR/
You might have to fork afresh after the move.
Yours, Steffen

extract function to init replacing with NA functionality

https://github.com/hechth/RAMClustR/blob/36fce5b34d2e57cd70c6d2b9ac53ea52250d94a6/R/rc.feature.replace.na.R#L24-L62

general function organization

@zargham-ahmad - can we put all the general functions into one file? For example, create_ramclustObj() is a function created in rc.get.df.data, but is called from rc.get.csv.data and rc.get.xcms.data. I am sure there are other functions which are central and used by many files, and i would like to be able to find those central functions more easily.

temp2 object not found

Hi Corey,

Seems a tiny bug was recently introduced at

RAMClustR/R/ramclustR.R

Line 567 in 47620bf

tmp2 <- matrix(data = 0, nrow = nrow(temp1), ncol = ncol(temp2))

Which errors with object 'temp2' not found.

Thanks,
Rick

Implement a test case using individual steps to build the whole `ramclust.R` workflow

Since the main ramclustR.R file has become somewhat obsolete with the new individual components, it would be good to mark it as deprecated or even remove the functionality from the package after making sure that everything is kept where it actually should be. Another option would be to have this function as a default workflow running the fundamental steps of RAMClustR, so keeping it intact as a main wrapper @cbroeckl ?

To make sure that the functionality is kept or equivalent, we need a test case which runs the individual steps and then we can make a comparison to the results created by the old ramclustR function.

Check list of dependencies and possibly remove outdated ones

RAMClustR has collected quite some dependencies and maybe some of them are by now outdated or no longer needed - I think it would make sense to go through the list of imported packages and see whether they are actually still required by the project @cbroeckl ?

Organize dependencies

The InterpretMSSpectrum package is not available on OS X, therefore it should move to Suggests and be imported via @concept in the respective functions so that it is referenced - the code in do.findmain.R should then only be executed if the package is present.

The BiocManager, stringi and xml2 packages are not imported, therefore they should also be under Suggests.

The package actually depends on R > 3.5.0, so it should also be noted in the package description.
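
The usual pattern for code that relies on a package listed under Suggests is a requireNamespace() check. A minimal sketch follows; `with_optional_pkg` is a hypothetical wrapper and the package names in the demo are stand-ins.

```r
# Run `fun` only when the optional package is installed, otherwise skip with a message.
with_optional_pkg <- function(pkg, fun) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping this step.")
    return(invisible(NULL))
  }
  fun()
}

# Demo with a name that should not exist and with a base package that always does.
skipped <- with_optional_pkg("notARealPackage12345", function() "ran")
ran <- with_optional_pkg("stats", function() "ran")
print(is.null(skipped))
print(ran)
```

In do.findmain.R the check would be against "InterpretMSSpectrum", with calls written as InterpretMSSpectrum::some_function() so that nothing needs to be imported.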
