mzrtsim

R-CMD-check Lifecycle: stable

The goal of mzrtsim is to simulate raw data and feature tables for LC-MS and GC-MS based data.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("yufree/mzrtsim")

Raw Data simulation

You can use simmzml to generate a single mzML file.

library(mzrtsim)
data("monams1")
simmzml(db=monams1, name = 'test')

You will find test.mzML and a corresponding test.csv listing the m/z, retention time and compound name of the peaks. The monams1 and monahrms1 datasets are derived from the MS1 data of MassBank of North America (MoNA), which can be downloaded from their website. You can also use hmdbcms to simulate EI source data extracted from HMDB. Only MS1 full scan data are used for simulation.

Multiple files with an experimental design

You can simulate two groups of raw data with different peak widths for the same compounds. Retention times are spread evenly (uniformly) from 10 to 590, 100 compounds are selected randomly from the first 4000 database entries, and the base peaks' signal-to-noise ratios are sampled from 100 to 10000. Each group contains 10 samples, and 30% of the compounds differ between the case and control groups.

# load the high-resolution MS1 database used below
data("monahrms1")
dir.create('case')
dir.create('control')
# set different peak widths for the 100 compounds
pw1 <- c(rep(5,30),rep(10,40),rep(15,30))
pw2 <- c(rep(5,20),rep(10,30),rep(15,50))
# set retention times for the 100 compounds
rt <- seq(10,590,length.out=100)
set.seed(1)
# select compounds from the database
compound <- sample(c(1:4000),100)
set.seed(2)
# select signal-to-noise ratios
sn <- sample(c(100:10000),100)
for(i in c(1:10)){
  simmzml(name=paste0('case/case',i),db=monahrms1,pwidth = pw1,compound=compound,rtime = rt, sn=sn)
}

for(i in c(1:10)){
  simmzml(name=paste0('control/control',i),db=monahrms1,pwidth = pw2,compound=compound,rtime = rt, sn=sn)
}

You will then find 10 mzML files in the case subfolder and another 10 mzML files in the control subfolder, along with the corresponding csv files listing the m/z, retention time and compound name of the peaks.
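In the loops above, only pwidth differs between the two groups (compound, rtime and sn are shared), so the "30% changed" compounds are the ones whose peak width differs between pw1 and pw2. A quick base R check, reusing the same two vectors:

```r
# Peak-width vectors copied from the example above
pw1 <- c(rep(5, 30), rep(10, 40), rep(15, 30))
pw2 <- c(rep(5, 20), rep(10, 30), rep(15, 50))

changed <- which(pw1 != pw2)   # compounds whose peak width differs
length(changed) / length(pw1)  # 0.3, i.e. 30% of the 100 compounds
# how the widths change: 10 compounds go 5 -> 10, 20 compounds go 10 -> 15
table(case = pw1[changed], control = pw2[changed])
```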

Chromatography peaks

You can also use simmzml to simulate tailing or leading peaks by setting the tailing factor of the peaks (tailingfactor). When the tailing factor is lower than 1, the peaks are leading peaks; when it is larger than 1, the peaks are tailing peaks.

# leading peaks (tailing factor < 1); distinct file names keep both outputs
simmzml(name='leading',db=monahrms1,pwidth = 10,compound=1,rtime = 100, sn=10,tailingfactor = 0.8)
# tailing peaks (tailing factor > 1)
simmzml(name='tailing',db=monahrms1,pwidth = 10,compound=1,rtime = 100, sn=10,tailingfactor = 1.5)
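A common definition of the tailing factor (the USP convention) is T = (a + b) / (2a), where a and b are the front and back half-widths measured at 5% of the peak height, so T < 1 means fronting (leading) and T > 1 means tailing. The R sketch below is only an illustration of that definition, assuming a bi-Gaussian peak shape; bigauss() and usp_tailing() are hypothetical helpers, and this is not necessarily how simmzml shapes its peaks internally.

```r
# Hypothetical bi-Gaussian peak: the left half uses `sigma`, the right half is
# scaled so that the USP tailing factor at 5% height equals `tf`.
bigauss <- function(rt, rt0, sigma, tf) {
  stopifnot(tf > 0.5)                  # right-hand sigma must stay positive
  sigma_right <- (2 * tf - 1) * sigma  # from T = (a + b) / (2a): b / a = 2T - 1
  s <- ifelse(rt < rt0, sigma, sigma_right)
  exp(-(rt - rt0)^2 / (2 * s^2))
}

# USP tailing factor T = (a + b) / (2a), with a and b measured at 5% height
usp_tailing <- function(rt, y) {
  apex <- which.max(y)
  h <- 0.05 * y[apex]
  # interpolate the retention times where each flank crosses 5% height
  front <- approx(y[1:apex], rt[1:apex], xout = h, ties = mean)$y
  back  <- approx(y[apex:length(y)], rt[apex:length(rt)], xout = h, ties = mean)$y
  a <- rt[apex] - front
  b <- back - rt[apex]
  (a + b) / (2 * a)
}

rt <- seq(0, 200, by = 0.01)
leading <- bigauss(rt, rt0 = 100, sigma = 5, tf = 0.8)
tailing <- bigauss(rt, rt0 = 100, sigma = 5, tf = 1.5)
round(usp_tailing(rt, leading), 3)  # 0.8
round(usp_tailing(rt, tailing), 3)  # 1.5
```

For a bi-Gaussian, the half-widths at any fixed fraction of the height are proportional to the two sigmas, which is why a single scaling of the right-hand sigma is enough to hit a target tailing factor.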

Matrix simulation

You can also supply a vector of m/z values as matrix masses (matrixmz). These masses generate background baseline signals. By default, the mass vector comes from previously published matrix samples.

data(mzm)
simmzml(name='test',db=monahrms1,pwidth = 10,compound=1,rtime = 100, sn=10,matrixmz = mzm,matrix = TRUE)
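Conceptually, a matrix mass that produces "background baseline signals" is one that shows up at a low, noisy intensity in every scan rather than as a chromatographic peak. The base R sketch below only illustrates that idea; matrix_mz is a hypothetical stand-in for the packaged mzm vector, and baseline_scan(), its level and cv parameters are illustrative assumptions, not simmzml's actual noise model.

```r
# Hypothetical matrix masses (stand-in for the packaged mzm vector)
matrix_mz <- c(100.5, 150.2, 279.2, 391.3)

# One scan: every matrix mass gets a low baseline intensity with Gaussian noise,
# truncated at zero so intensities stay non-negative
baseline_scan <- function(mz, level = 100, cv = 0.2) {
  intensity <- pmax(rnorm(length(mz), mean = level, sd = level * cv), 0)
  data.frame(mz = mz, intensity = intensity)
}

set.seed(1)
# five consecutive scans: same masses, fresh noise in each scan
scans <- lapply(1:5, function(i) baseline_scan(matrix_mz))
scans[[1]]
```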

Peaks list simulation

You can also use mzrtsim to simulate peak lists.

Here we simulate 100 compounds from the selected database with two conditions and three batches. 5% of the peaks are influenced by conditions and 10% of the peaks are influenced by batch effects. Three types of batch effect can be simulated: monotonic, random and block. Batch types can also be combined; for example, ‘mb’ means the simulation contains both monotonic and block batch effects. ‘db’ is the spectra database used for simulation, as mentioned in the raw data simulation section.

library(mzrtsim)
data("monams1")
sim <- mzrtsim(ncomp = 100, ncond = 2, ncpeaks = 0.05,
  nbatch = 3, nbpeaks = 0.1, npercond = 10, nperbatch = c(8, 5, 7), seed = 42, batchtype = 'mb', db = monams1)
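In this example the 20 samples (2 conditions with 10 samples each) are split into three batches of 8, 5 and 7, so nperbatch presumably needs to add up to ncond times npercond. The base R sketch below checks that arithmetic and illustrates what monotonic, block and combined ('mb') batch effects could look like for a single peak. The effect sizes and formulas (block_shift, the 2% drift per injection) are made-up assumptions for illustration, not mzrtsim's internal model, and the random batch type is not shown.

```r
ncond <- 2
npercond <- 10
nperbatch <- c(8, 5, 7)
# every sample belongs to exactly one batch: 8 + 5 + 7 = 2 * 10 = 20
stopifnot(sum(nperbatch) == ncond * npercond)

batch <- rep(seq_along(nperbatch), times = nperbatch)  # batch label per sample
inj_order <- seq_along(batch)                          # injection order 1..20
base <- rep(1000, length(batch))                       # hypothetical true intensity

# Illustrative (hypothetical) batch-effect shapes for one peak
block_shift <- c(1, 1.3, 0.8)                  # one multiplier per batch
monotonic <- base * (1 + 0.02 * inj_order)     # drift along injection order
block     <- base * block_shift[batch]         # step change between batches
mb        <- monotonic * block_shift[batch]    # 'mb': both effects combined

round(tapply(mb, batch, mean))  # batch means: 1090, 1586, 1072
```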

You can save the simulated data into multiple csv files with the simdata function:

  • simraw.csv can be used for MetaboAnalyst.
  • simraw2.csv shows the raw peak list.
  • simcon.csv shows peaks influenced by conditions only.
  • simbatchmatrix.csv shows peaks influenced by batch effects only.
  • simbat.csv shows peaks influenced by both batch effects and conditions.
  • simcomp.csv shows independent peaks influenced by conditions and batch effects.
  • simcompchange.csv shows the condition changes of each group.
  • simblockbatchange.csv shows the block batch changes of each group.
  • simmonobatchange.csv shows the monotonic batch changes of each group.

simdata(sim,name = "sim")
