The popcycle pipeline performs three analyses:
- Filtration of optimally positioned particles (opp)
- Manual gating of cytometric populations
- Aggregate statistics for the different populations
The output of each step is saved into a SQL database using sqlite3. To run popcycle and analyze SeaFlow data in real time, you need to set the filter and gating parameters and press play. That's it!
- Clone the popcycle git repository onto your computer, into a directory called, e.g., `popcycle-master` (avoid calling it just `popcycle`). To do so, open the terminal and type:

```sh
$ git clone https://github.com/uwescience/popcycle.git path/to/popcycle-master
```

(replace `path/to/popcycle-master` with the actual path)
- Install the popcycle package and its dependencies, such as the `RSQLite` and `splancs` packages, if they are not already installed:

```sh
$ cd path/to/popcycle-master
$ Rscript setup.R
```

WARNING: You need to be in the popcycle repository to execute the `setup.R` script. The setup process creates a popcycle directory in `~/popcycle`, which is different from the popcycle repository.
- The first step is to indicate where the raw files (`evt`) and database (`popcycle.db`) are located and where to save the project:

```r
library(popcycle)

# Required
set.cruise.id("foo")
set.evt.location("/path/to/evt/files") # e.g., "/Volumes/cruise.id/evt"

# Optional (defaults to ~/popcycle)
set.project.location("/path/to/project") # e.g., "~/Cruise.id_project"
```

NOTE: `set.project.location` will create a new database if the project location (`"/path/to/project"`) does not already exist.
- The second step is to set the parameters for the filtration method, i.e., the `width` (to adjust the alignment of the instrument) and the `notch` (to adjust the focus of the instrument). The `notch` represents the ratio D/fsc_small and depends on how the PMTs of D1/D2 and fsc_small were set up. The `width` represents the acceptable difference between D1 and D2 for a particle to be considered 'aligned'; it is usually set between 0.1 and 0.5. For this example, we are going to choose the `notch` using the latest evt file collected by the instrument (but you can choose any evt file you want, of course). The `width` is set to 0.2. Open an R session and type:

```r
# SELECT AN EVT FILE
evt.list <- get.evt.list() # to get the entire list of evt files
evt.name <- evt.list[10] # then select the evt file (e.g., the 10th evt file in the list)
# OR
evt.name <- get.latest.evt.with.day() # to get the last evt file of the list

# LOAD THE EVT FILE
evt <- readSeaflow(evt.name)

# SET THE WIDTH AND NOTCH PARAMETERS
width <- 0.2 # usually between 0.1 and 0.5
notch <- 1 # usually between 0.5 and 1

plot.filter.cytogram(evt, notch=notch, width=width) # to plot the filtration steps
```

NOTE: If you have trouble finding the optimal `notch`, you can use the function `find.filter.notch()`:

```r
width <- 0.2
notch <- find.filter.notch(evt, notch=seq(0.5, 1.5, by=0.1), width=width, do.plot=TRUE)
plot.filter.cytogram(evt, notch=notch, width=width)
```
Once you are satisfied with the filter parameters, you can filter `evt` to get `opp` by typing:
```r
opp <- filter.notch(evt, notch=notch, width=width)
```
IMPORTANT: To save the filter parameters so that they are applied to all new evt files, you need to call the function:
```r
setFilterParams(width, notch)
```
This function saves the parameters in `~/popcycle/params/filter/filter.csv`. Note that every change in the filter parameters is automatically saved in the logs (`~/popcycle/logs/filter/filter.csv`).
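To make the roles of `width` and `notch` concrete, here is a minimal base-R sketch of the two criteria described in this step, applied to a made-up data frame. It is not popcycle's `filter.notch()` implementation: the helper `sketch.filter`, the demo values, and the assumption that `width` is compared directly to the difference between D1 and D2 are all illustrative.

```r
# Hypothetical helper (NOT popcycle's filter.notch): keep particles that are
# 'aligned' (D1 close to D2) and 'focused' (D/fsc_small below the notch).
sketch.filter <- function(evt, width, notch) {
  aligned <- abs(evt$D1 - evt$D2) <= width
  focused <- (evt$D1 / evt$fsc_small < notch) & (evt$D2 / evt$fsc_small < notch)
  evt[aligned & focused, , drop = FALSE]
}

# Made-up particles, for illustration only
evt.demo <- data.frame(
  D1        = c(1.0, 1.1, 2.0, 0.5),
  D2        = c(1.1, 1.9, 2.1, 0.5),
  fsc_small = c(2.0, 2.0, 1.0, 1.0)
)

opp.demo <- sketch.filter(evt.demo, width = 0.2, notch = 1)
print(opp.demo) # particle 2 is misaligned, particle 3 is out of focus
```

In this toy data set, a larger `width` would let the second particle through, and a larger `notch` would let the third one through, which is the trade-off you tune visually with `plot.filter.cytogram`.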
- The third step is to set the gating for the different populations. WARNING: The order in which you gate the different populations is very important, so choose it wisely. The gating has to be performed over optimally positioned particles only, not over an evt file. In this example, you are going to first gate the `beads` (this is always the first population to be gated). Then you will gate the `Synechococcus` population (this population needs to be gated before you gate `Prochlorococcus` or `picoeukaryote`), and finally the `Prochlorococcus` and `picoeukaryote` populations. After drawing your gate on the plot, right-click to finalize. In the R session, type:

```r
# OPTION 1: SELECT OPP data by FILES
opp.list <- get.opp.files()
opp.name <- opp.list[10] # to select the opp file (e.g., the 10th opp file in the list, corresponding to 9 minutes of data)
opp <- get.opp.by.file(opp.name)

# OPTION 2: SELECT OPP data by DATE
sfl <- get.sfl.table()
sfl$date <- as.POSIXct(sfl$date, format="%FT%T", tz='GMT')
opp <- get.opp.by.date(sfl$date[1], sfl$date[1]+60*60, pop=NULL, channel=NULL) # e.g., select 1 h of data

# SET THE MANUAL GATING SCHEME
setGateParams(opp, popname='beads', para.x='fsc_small', para.y='pe')
setGateParams(opp, popname='synecho', para.x='fsc_small', para.y='pe')
setGateParams(opp, popname='prochloro', para.x='fsc_small', para.y='chl_small')
setGateParams(opp, popname='picoeuk', para.x='fsc_small', para.y='chl_small')
```
Similar to the `setFilterParams` function, `setGateParams` saves the gating parameters and the order in which the gating was performed in `~/popcycle/params/params.RData`. Parameters for each population are also saved separately as a `.csv` file. Note that every change in the gating parameters is automatically saved in the logs (`~/popcycle/logs/params/`).

Note: If you want to change the order of the gating, delete a population, or simply start over, use the function `resetGateParams()`.
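One plausible way to see why the gating order matters is the base-R sketch below: each gate is a polygon, gates are applied in sequence, and a particle already claimed by an earlier gate is not reassigned. This is an illustration under stated assumptions, not a description of how popcycle classifies particles internally; the helpers `in.polygon` and `classify.in.order`, the rectangular gates, and the demo particles are all hypothetical.

```r
# Ray-casting point-in-polygon test in base R.
# x, y: particle coordinates; px, py: polygon vertices in drawing order.
in.polygon <- function(x, y, px, py) {
  inside <- rep(FALSE, length(x))
  j <- length(px)
  for (i in seq_along(px)) {
    # Does the edge (j -> i) cross the horizontal ray going right from the point?
    crosses <- ((py[i] > y) != (py[j] > y)) &
      (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
    inside <- xor(inside, crosses)
    j <- i
  }
  inside
}

# Apply the gates in list order; earlier gates take precedence.
classify.in.order <- function(opp, gates) {
  pop <- rep("unknown", nrow(opp))
  for (name in names(gates)) {
    g <- gates[[name]]
    hit <- in.polygon(opp[[g$para.x]], opp[[g$para.y]], g$poly.x, g$poly.y)
    pop[hit & pop == "unknown"] <- name
  }
  pop
}

# Made-up particles and rectangular gates, for illustration only
opp.demo <- data.frame(fsc_small = c(2, 0.5, 5),
                       pe        = c(3, 0.5, 0.5),
                       chl_small = c(1, 1, 5))
gates <- list(
  synecho   = list(para.x = "fsc_small", para.y = "pe",
                   poly.x = c(1, 3, 3, 1), poly.y = c(2, 2, 4, 4)),
  prochloro = list(para.x = "fsc_small", para.y = "chl_small",
                   poly.x = c(0, 3, 3, 0), poly.y = c(0, 0, 2, 2))
)

pop.a <- classify.in.order(opp.demo, gates)      # synecho gated first
pop.b <- classify.in.order(opp.demo, rev(gates)) # prochloro gated first
print(rbind(pop.a, pop.b))
```

The first demo particle falls inside both gates, so its label depends only on which gate is applied first.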
- To cluster the different populations according to your manual gating, type:

```r
vct <- classify.opp(opp, ManualGating)
```
- To plot the cytogram with clustered populations, use the following functions:

```r
opp$pop <- vct
par(mfrow=c(1,2))
plot.vct.cytogram(opp, para.x='fsc_small', para.y='chl_small')
plot.vct.cytogram(opp, para.x='fsc_small', para.y='pe')
```
- To filter the evt files according to the saved filter parameters, use the following function:

```r
evt.list <- get.evt.list()
filter.evt.files(evt.list)
```

NOTE: You can run `filter.evt.files` in parallel. If your computer has 4 cores, then type `filter.evt.files(evt.list, cores=4)`.
- To apply the gating parameters and analyze the opp files accordingly, use the following function:

```r
opp.list <- get.opp.files()
run.gating(opp.list) # if you want to apply the gating scheme to ALL the available OPP data from the cruise
```

This function will create/update the `vct` and `stats` tables.
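As a rough illustration of what per-population aggregate statistics look like, the base-R sketch below computes the particle count and the mean channel values for each population in a small made-up table. The column names (such as `n.count`) and the values are assumptions for illustration; they do not necessarily match the tables written by `run.gating`.

```r
# Made-up classified particles, for illustration only
opp.demo <- data.frame(
  pop       = c("synecho", "synecho", "prochloro", "prochloro", "prochloro", "beads"),
  fsc_small = c(10, 12, 4, 5, 6, 30),
  chl_small = c(8, 10, 3, 3, 6, 1)
)

# Mean of each channel per population
stats.demo <- aggregate(cbind(fsc_small, chl_small) ~ pop, data = opp.demo, FUN = mean)

# Number of particles per population, matched by population name
counts <- table(opp.demo$pop)
stats.demo$n.count <- as.vector(counts[as.character(stats.demo$pop)])

print(stats.demo)
```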
Data can be plotted using a set of functions:
- To plot the filter steps:

```r
set.evt.location("/path/to/evt/files")
evt.list <- get.evt.list()
evt.name <- evt.list[10] # to select the 10th evt file of the list
plot.filter.cytogram.by.file(evt.name, width=0.2, notch=1)
```
- To plot an evt cytogram. WARNING: The number of particles in an evt file can be high (>10,000), which can be a problem for some computers. We advise limiting the display to 10,000 particles or fewer.

```r
set.evt.location("/path/to/evt/files")
evt.list <- get.evt.list()
evt.name <- evt.list[10] # to select the 10th evt file of the list
evt <- readSeaflow(evt.name)

# TO LIMIT the number of displayed particles to 10,000
if(nrow(evt) > 10000) evt <- evt[round(seq(1, nrow(evt), length.out=10000)),]

plot.cytogram(evt, "fsc_small", "chl_small")
```
- To plot an opp cytogram:

```r
set.project.location("/path/to/project") # e.g., "~/Cruise.id_project"

# OPTION 1: SELECT OPP data by FILES
opp.list <- get.opp.files()
opp.name <- opp.list[10] # to select the opp file (e.g., the 10th opp file in the list, corresponding to 9 minutes of data)
opp <- get.opp.by.file(opp.name)
plot.cytogram(opp, "fsc_small", "chl_small")
# OR DIRECTLY
plot.cytogram.by.file(opp.name, "fsc_small", "chl_small")

# OPTION 2: SELECT OPP data by DATE
sfl <- get.sfl.table()
sfl$date <- as.POSIXct(sfl$date, format="%FT%T", tz='GMT')
opp <- get.opp.by.date(sfl$date[1], sfl$date[1]+60*60, pop=NULL, channel=NULL) # e.g., select 1 h of data
plot.cytogram(opp, "fsc_small", "chl_small")
```
- To plot an opp cytogram with clustered populations:

```r
set.project.location("/path/to/project") # e.g., "~/Cruise.id_project"
opp.list <- get.opp.files()
opp.name <- opp.list[10] # to select the opp file (e.g., the 10th opp file in the list)
plot.vct.cytogram.by.file(opp.name)
```
- To plot aggregate statistics, e.g., the cell abundance of the cyanobacteria `Synechococcus` population on a map or over time:

```r
set.project.location("/path/to/project") # e.g., "~/Cruise.id_project"
stat <- get.stat.table() # to load the aggregate statistics
plot.map(stat, pop='synecho', param='abundance')
plot.time(stat, pop='synecho', param='abundance')
```

You can plot any parameter/population; just make sure the names match those in the `stat` table. FYI, type `colnames(stat)` to see which parameters are available in the `stat` table, and `unique(stat$pop)` to see the names of the different populations.
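Data stored in `popcycle.db` can be visualized directly in R. Here is an example that displays the first 10 rows of the opp table in `popcycle.db`:

```r
library(RSQLite)

set.project.location("/path/to/project") # e.g., "~/Cruise.id_project"
# db.name is not defined in this snippet; it is assumed to hold the path to popcycle.db
conn <- dbConnect(SQLite(), dbname = db.name)
dbGetQuery(conn, "SELECT * FROM opp LIMIT 10;")
dbDisconnect(conn) # close the connection when done
```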