seaflow-uw / popcycle

Popcycle is an R package that offers reproducible analytical tools to analyze flow cytometry data collected by SeaFlow

Home Page: https://seaflow.netlify.com

License: MIT License

Languages: R 99.23%, HTML 0.33%, Dockerfile 0.29%, Shell 0.15%
Topics: flow-cytometry, oceanography, phytoplankton, reproducible-research, seaflow, workflow

popcycle's People

Contributors

anettow, ctberthiaume, dhalperi, fribalet, jhyrkas, johnmacmillan96, mmh1133

popcycle's Issues

Errors in functions for specifying new project location

plot.cytogram.by.file() has no argument for the new project location

setGateParams() does not send the gate parameters to the right folder

set.evt.location("/Volumes/seaflow/DeepDOM")
set.project.location("/Volumes/gwennm/popcycle/sqlite/popcycle_DeepDOM.db")
set.cruise.id("march2013")
setGateParams(opp, popname='beads', para.x='fsc_small', para.y='pe')
[1] "No Gating parameters yet!"
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/Volumes/gwennm/popcycle/sqlite/popcycle_DeepDOM.db/logs/gates/2014-11-10T19-51-21+0000_beads.csv': Not a directory

the gate params should go in /Volumes/gwennm/popcycle/logs/gates/

Additionally, the filter logs in /Volumes/gwennm/popcycle/logs/filter were not updated when I filtered the files recently with the parallel code. Not sure which function this is part of.

track all attempts to filter EVT files

Currently, attempts to filter EVT files that don't produce focused particles, for whatever reason, are not tracked. This makes it hard to tell whether an EVT file has been processed yet. Realtime analysis is most affected: in the worst case you will reprocess a completely out-of-focus EVT file on every analysis iteration, and if there are many such files this can cause a backlog in realtime analysis.

Every attempt to filter an EVT file should produce an OPP entry in the database, and these entries can be filtered out in downstream steps.
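
One way to picture the proposal is the minimal Python sketch below. It is only an illustration under stated assumptions: the `opp_attempts` table, its columns, and the helper names are hypothetical and are not part of the popcycle schema. Every attempt is recorded, including those that yield zero focused particles, so already-attempted files can be skipped and empty results filtered out downstream.

```python
import sqlite3

# Hypothetical, simplified table: one row per filtering attempt.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE opp_attempts (file TEXT PRIMARY KEY, opp_count INTEGER NOT NULL)"
)

def record_attempt(con, file_id, opp_count):
    """Record an attempt even when no focused particles were found (opp_count == 0)."""
    con.execute("INSERT OR REPLACE INTO opp_attempts VALUES (?, ?)", (file_id, opp_count))

def unprocessed(con, evt_files):
    """Return EVT files that have never been attempted, preserving input order."""
    done = {row[0] for row in con.execute("SELECT file FROM opp_attempts")}
    return [f for f in evt_files if f not in done]

def files_with_particles(con):
    """Downstream steps can ignore attempts that produced no particles."""
    rows = con.execute("SELECT file FROM opp_attempts WHERE opp_count > 0 ORDER BY file")
    return [row[0] for row in rows]

record_attempt(con, "f1", 120)  # made-up file IDs and counts
record_attempt(con, "f2", 0)    # out-of-focus file: tracked, but empty
todo = unprocessed(con, ["f1", "f2", "f3"])
usable = files_with_particles(con)
```

With this kind of bookkeeping, a realtime loop would only pick up `f3` on the next iteration instead of refiltering `f2` each time.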

error importing SFL files for SCOPE_7_DLM cruise

Looks like an error parsing dates in the SCOPE 7 DLM cruise SFL files. Stack trace:

Traceback (most recent call last):
  File "import_sfl.py", line 172, in <module>
    insert_all_files(args.db, args.evt_dir, args.cruise)
  File "import_sfl.py", line 127, in insert_all_files
    insert_files_bulk(find_sfl_files(evt_path), dbpath, cruise)
  File "import_sfl.py", line 106, in insert_files_bulk
    fix_and_insert_sfl(data, header, dbpath, cruise)
  File "import_sfl.py", line 73, in fix_and_insert_sfl
    if (dbcolumn_to_fixed_data[DATE][-3] != ":"):
KeyError: 'DATE'
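
The KeyError suggests the parsed row has no 'DATE' entry at all, rather than a badly formatted date. A possible guard is sketched below; it is not the code in import_sfl.py. The helper name `check_date_field` and the example row are hypothetical, and only the `DATE` key and the `dbcolumn_to_fixed_data` name come from the traceback.

```python
DATE = "DATE"  # the key named in the KeyError above

def check_date_field(dbcolumn_to_fixed_data, source="<unknown>"):
    """Return the DATE value, or raise a descriptive error if the column is absent."""
    if DATE not in dbcolumn_to_fixed_data:
        found = ", ".join(sorted(dbcolumn_to_fixed_data))
        raise ValueError(f"{source}: no {DATE} column; columns found: {found}")
    return dbcolumn_to_fixed_data[DATE]

# Made-up row that lacks a DATE entry, to show the clearer failure message.
row = {"FILE": "2015-01-01T00-00-00+00-00", "LAT": "21.3"}
try:
    check_date_field(row, source="example.sfl")
except ValueError as err:
    message = str(err)
    print(message)
```

Reporting the columns that were actually found makes it easier to see whether the SFL header uses a different name for the date column.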

warning message about "max(abs(i))" and "-Inf" when auto classifying

Sometimes this warning message appears when performing auto classification:

Warning message:
In max(abs(i)) : no non-missing arguments to max; returning -Inf

It's possible to reproduce this error message with this simple test case, based on real data from HOT-295/2017_170/2017-06-19T22-38-43+00-00.

library(flowDensity)
library(flowCore)
df <- data.frame(
  fsc_small=c(609.24229, 88.24268, 356.75344, 66.07975),
  chl_small=c(20.107942, 9.117206, 6.641842, 3.659126)
)
fframe <- flowFrame(as.matrix(log10(df)))
labels <- flowDensity(
  obj=fframe,
  channels=c("fsc_small", "chl_small"),
  position=c(F, T), gates=c(1.25, 0.25),
  ellip.gate=T, scale=0.975
)

This looks like flowDensity or flowCore doesn't check for some empty data structures that occur when there are a very small number of cells to classify, in this case 4 cells. As such this probably has minimal impact on results.

deleting sfl from database and replacing with new values

I came across a problem where I found some errant lat and lon values in my sfl file and I need to replace them, but I cannot delete sfl entries from the popcycle.db and they cannot be overwritten.

I think this requires a new function, preferably a hidden function like .delete.opp.

gating functions

The function described on the popcycle front page: run.gating() does not exist.

I would like to run gates for specified ranges, as in the rerun.gating() function, but since I have an older cruise the file names are not in the correct format. Is there a workaround or another function I could use to rerun specific dates? Gating the whole cruise with run.gating.v1 won't work because I need gates for very different stations.

Another suggestion that would be extremely useful for gating functions in general: allow non-exact timestamp matching by finding files with timestamps in a range (i.e., files between 2013-03-28 0:00 and 2013-03-30 0:00).
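
The range-matching idea could look roughly like the Python sketch below. It assumes the newer timestamp-style file IDs seen elsewhere on this page (for example `2017_170/2017-06-19T22-38-43+00-00`), so it would not help with older cruises as written. The function names and the example file IDs are made up for illustration.

```python
from datetime import datetime, timezone
import posixpath

def parse_file_timestamp(file_id):
    """Parse an ID such as '2017_170/2017-06-19T22-38-43+00-00' into an aware datetime."""
    name = posixpath.basename(file_id)
    stamp, offset = name[:19], name[19:]   # '2017-06-19T22-38-43' and '+00-00'
    offset = offset[:3] + offset[4:]       # '+00-00' -> '+0000', the form %z accepts
    return datetime.strptime(stamp + offset, "%Y-%m-%dT%H-%M-%S%z")

def files_in_range(file_ids, start, end):
    """Keep IDs whose timestamp t satisfies start <= t < end."""
    return [f for f in file_ids if start <= parse_file_timestamp(f) < end]

# Made-up file IDs in the timestamp naming style.
files = [
    "2014_162/2014-06-11T23-59-59+00-00",
    "2014_163/2014-06-12T01-09-07+00-00",
    "2014_164/2014-06-13T00-00-00+00-00",
]
start = datetime(2014, 6, 12, tzinfo=timezone.utc)
end = datetime(2014, 6, 13, tzinfo=timezone.utc)
selected = files_in_range(files, start, end)
```

Using a half-open interval (start included, end excluded) means consecutive ranges never select the same file twice.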

Real-time problem

The Evaluate.latest.evt function stops if there is an error (e.g., when SeaFlow gets a fault). We need to add a tryCatch() somewhere.

setup.R issue

I pulled the latest popcycle code and when I tried to re-install the package, I got the following error message:

-bash-3.2$ Rscript popcycle-master/setup.R
Installing package into ‘/Users/francois/Library/R/3.1/library’
(as ‘lib’ is unspecified)
trying URL 'http://cran.us.r-project.org/bin/macosx/contrib/3.1/testthat_0.9.1.tgz'
Content type 'application/x-gzip' length 240166 bytes (234 Kb)
opened URL
downloaded 234 Kb
The downloaded binary packages are in
/var/folders/kj/c5j5wf5n7m530ky9vmhn4vs80000gn/T//RtmpZXHONb/downloaded_packages
Installing package into ‘/Users/francois/Library/R/3.1/library’
(as ‘lib’ is unspecified)
Warning: invalid package ‘.’
Error: ERROR: no packages specified
Warning message:
In install.packages(".", repos = NULL, type = "source") :
installation of package ‘.’ had non-zero exit status

bug in new function get.opp.by.date

get.opp.by.date mostly works except it is failing to return some opp files for date ranges which do exist in my database. Is this because my database is not indexed by time?
Not sure how to provide you with a reproducible example of this without giving you my db

Better way to upgrade dependencies

Dependencies don't get upgraded at every install. There should be an easy way for users to upgrade dependencies, maybe a second install script, or parameterize the existing one.

Spurious errors when gating

For some files there are no "unknown" particles left when gating later populations. This causes an error. We should add checks to gating functions to ensure there are particles left to gate.

classify.opp

There is no help info available for this function (contained in gating.R).

db overwrite

Tried to refilter in parallel, db1 and db2 were created, but then there was an error:
Error in checkForRemoteErrors(val) : one node produced an error: RS-DBI driver: (RS_SQLite_exec: could not execute1: database is locked)

`get.opp.files` doesn't properly remove outliers

Using HOT-294 as an example:

# Get the outlier table
outlier_table <- get.outlier.table(db)
# Get all unique file IDs in the opp table
all_opp_files <- unique(get.opp.table(db)$file)
# Get OPP files with `get.opp.files`, which should remove outliers by default
opp_files <- get.opp.files(db)
# Confirm that the first file in the outlier table is an outlier
outlier_table[1,]
#                                file flag
#1 2017_170/2017-06-19T22-38-43+00-00    1
# This file should not be in `opp_files`, but it is
opp_files[1]
#[1] "2017_170/2017-06-19T22-38-43+00-00"

I think this is because the outlier matching in get.opp.files isn't using the match function correctly.

get.opp.list function

New error message from the MV1405 cruise. Any idea?

 set.evt.location("/Volumes/seaflow/MV1405")
 opp.list <- get.opp.list()
Error in sqliteFetch(rs, n = -1, ...) : 
  RSQLite driver: (RS_SQLite_fetch: failed: database disk image is malformed)

fix_sfl.py adds quotes to my headers and file names

I tried to transform an sds file, which had been manipulated in R to fix some data gaps, into an sfl file. The fix_sfl.py script successfully removed the '.' and added a space, but it also added quotes around the headers and around the year day in the file name (ex: "2013_85"/1.evt).

plot.filter.cytogram plots missing

Tried plot.filter.cytogram on two files (3980, 3700) with notch=0.8 and width=0.2.
Only the alignment plot showed up, with no focus or opp plots.
Error message:
Error in `[.data.frame`(subset(aligned, D1.fsc_small < 2 & D2.fsc_small < :
object 'display' not found

using multiple files for gating

is it possible to read in multiple opp files for gating?

e.g. rather than setting the gates on a single opp file using this command:
opp.name <- opp.list[600]

could you set a time range to get all of the data from?

error in plot.filter.cytogram

Following the readme I get an error with plot.filter.cytogram. It looks like the var opp is referenced before it's created in the function

plot.filter.cytogram(evt, notch=notch, width=width)
Error in is.data.frame(x) : object 'opp' not found

unable to open database file for new project

i'm getting set up to analyse the data from MV1405, and have run into the following error:

> set.project.location("~/Documents/MV1405/seaflow")
Error: unable to open database "/Users/sclayton/Documents/MV1405/seaflow/sqlite/popcycle.db": unable to open database file

any thoughts on why this might be coming up? the MV1405/seaflow folder already exists, and I have read/write permissions for it...

Add Instrument info to apply the appropriate ratio_evt_stream

From fix_sfl.py, line 86:

  stream_pressure = dbcolumn_to_fixed_data['STREAM_PRESSURE']
  ratio_evt_stream = 0.14756
  flow_rate = 1000 * (-9*10**-5 * stream_pressure**4 + 0.0066 * stream_pressure**3 - 0.173 * stream_pressure**2 + 2.5013 * stream_pressure + 2.1059) * ratio_evt_stream
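
A sketch of how the hard-coded ratio might become instrument-specific is given below. The polynomial and the 0.14756 constant are copied from the snippet above; the lookup by instrument serial, the function names, and any ratio other than 0.14756 are assumptions made only for illustration.

```python
DEFAULT_RATIO_EVT_STREAM = 0.14756  # the constant currently hard-coded above

def flow_rate(stream_pressure, ratio_evt_stream=DEFAULT_RATIO_EVT_STREAM):
    """Same polynomial as the snippet above, with the ratio passed in as a parameter."""
    p = stream_pressure
    stream = -9e-5 * p**4 + 0.0066 * p**3 - 0.173 * p**2 + 2.5013 * p + 2.1059
    return 1000 * stream * ratio_evt_stream

def ratio_for_instrument(serial, ratios):
    """Look up a per-instrument ratio; fall back to the current constant if unknown."""
    return ratios.get(serial, DEFAULT_RATIO_EVT_STREAM)

# Placeholder table: the serial and its value are invented, not measured ratios.
ratios = {"hypothetical-serial": 0.2}
rate_default = flow_rate(12.0, ratio_for_instrument("not-listed", ratios))
rate_custom = flow_rate(12.0, ratio_for_instrument("hypothetical-serial", ratios))
```

Falling back to the existing constant keeps results unchanged for any cruise whose instrument is not yet listed.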

Functions calling "file.path"

Any function that calls "file.path" requires a quantile input. In the README file, this input needs to be updated in examples using "get.opp.by.file." Currently everything in "visualization.R" needs it, too. The error I get is:

Error in file.path(opp.dir, quantile, f) :
argument "quantile" is missing, with no default

seaflowpy_filter not accepting NA

seaflowpy_filter(db, cruise, evt.dir, opp.dir, process.count=4, notch1=NA, notch2=NA, width=0.5, offset=0)
seaflowpy_filter: error: argument --notch1: invalid float value: 'NA'
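
The "invalid float value" wording looks like an argparse message for an option declared with `type=float`. If so, one possible fix on the command-line side is a custom type that maps 'NA' to `None`, as in the sketch below. The parser here is a hypothetical stand-in, not the real seaflowpy_filter parser, and what `None` should mean (for example, "estimate the notch automatically") is left to the tool.

```python
import argparse

def float_or_na(text):
    """argparse type: 'NA' (any case) becomes None; anything else must parse as a float."""
    if text.strip().upper() == "NA":
        return None
    return float(text)

# Hypothetical stand-in parser with the option names from the call above.
parser = argparse.ArgumentParser(prog="filter_sketch")
parser.add_argument("--notch1", type=float_or_na, default=None)
parser.add_argument("--notch2", type=float_or_na, default=None)
parser.add_argument("--width", type=float, default=0.5)

args = parser.parse_args(["--notch1", "NA", "--notch2", "0.7"])
```

The alternative is for the R wrapper to omit `--notch1` and `--notch2` entirely when their values are NA, which needs no change to the command-line tool.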

zombie VCT results

When regating with a new set of OPP files, it's possible to leave behind VCT db entries and files from the previous analysis if that OPP file no longer exists. This is because classify.opp.files only erases VCT db entries and files for OPP files that it's about to classify, but obviously if there was an OPP file in the previous run that no longer exists because it was filtered out then that delete will never happen.

It should be easy enough to detect these zombies by the presence of old filter IDs. Maybe we should do a scan after classifying and notify the user of these zombies results, which they can then manually erase. I'd also be happy to devise some way to automatically clean up these cases if we can guarantee that we'll never erase data inappropriately.
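
A post-classification scan for zombies could be as simple as the query sketched below. The two tables are deliberately stripped-down, hypothetical versions of the OPP and VCT tables (the real schema has more columns, and matching on old filter IDs as suggested above would need those). The sketch only reports candidates and deletes nothing.

```python
import sqlite3

# Hypothetical, simplified tables with made-up file IDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE opp (file TEXT PRIMARY KEY);
    CREATE TABLE vct (file TEXT, pop TEXT);
    INSERT INTO opp VALUES ('a'), ('b');
    INSERT INTO vct VALUES ('a', 'beads'), ('b', 'beads'), ('c', 'beads');
""")

def find_zombie_vct_files(con):
    """Return VCT file IDs that no longer have a matching OPP entry."""
    rows = con.execute(
        "SELECT DISTINCT file FROM vct "
        "WHERE file NOT IN (SELECT file FROM opp) ORDER BY file"
    )
    return [row[0] for row in rows]

zombies = find_zombie_vct_files(con)
for file_id in zombies:
    print(f"VCT results without an OPP entry: {file_id}")
```

Keeping detection separate from deletion matches the cautious approach described above: the user sees the list first and decides what to erase.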

additional features

  • Add default values for Filter and Gating parameters
  • Send command to terminate the real-time analysis (stop cronjob, etc...)
  • A command to select an opp dataframe based on time and number of cells.

run.gating

CMOP_6 gating populations using pop from the popcycle.db (following wiki instructions)

Tried run.gating(list.files) after filtering and completing initialization steps 3 and 4. This is the error: Error in paste(start.day, start.timestamp, sep = "/") : argument "start.timestamp" is missing, with no default

Here were some of the other settings I used (they worked when I did the filtering):
set.evt.location("/Volumes/seaflow/CMOP_6")
set.project.location("~/CMOP_2013_f2")

Should I have tried run.gating(opp.list)?
(since I already created opp.list when setting the gating parameters)

trouble importing cruise.sfl into sqlite3

The wiki section on importing cruise.sfl into sqlite3 needs an update. As is, it says to use fix_sfl, but this function was deleted a few months ago. Update please!

opp.name error

(CMOP_6)
Was able to get opp files and create opp.list
Then tried opp.name <- opp.list[247] # to select the opp file (e.g., the 10th opp file in the list)
and got Error in `[.data.frame`(opp.list, 247) : undefined columns selected

Problem with filter.evt.files.parallel

# LOAD
library(popcycle)
set.project.location("~/KM_1")
set.cruise.id("KM_1")
set.evt.location("/Volumes/seaflow/KiloMoana_1")

# SELECT AN EVT FILE
evt.list <- get.evt.list() # to get the entire list of evt files
width <- 0.3 # usually between 0.1 and 0.5
notch <- 0.7 # usually between 0.5 and 1

# FILTER
evt.files.without.opp <- filter.evt.files.parallel(evt.list, notch=notch, width=width, cruise=cruise.id, db=db.name, evt.loc=evt.location, cores=4)

No OPP files were uploaded into the DB, but it works with filter.evt.files.serial.

best.filter.notch 1

From Gwenn Hennon, using DeepDOM dataset

file.name = file.list[4600]
plot.filter.cytogram.by.file(evt.location, file.name, width=0.2, notch=0.9)
evt <- readSeaflow(paste(evt.location, file.name, sep='/')) # load the evt file
notch <- best.filter.notch(evt, notch=seq(0.1, 1.4, by=0.1),width=0.2, do.plot=TRUE)
[1] "filtering notch= 0.1"
[1] "filtering notch= 0.2"
[1] "filtering notch= 0.3"
[1] "filtering notch= 0.4"
[1] "filtering notch= 0.5"
[1] "filtering notch= 0.6"
[1] "filtering notch= 0.7"
[1] "filtering notch= 0.8"
[1] "filtering notch= 0.9"
[1] "filtering notch= 1"
[1] "filtering notch= 1.1"
[1] "filtering notch= 1.2"
[1] "filtering notch= 1.3"
[1] "filtering notch= 1.4"
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In min(which(DF$fsc.max == max(DF$fsc.max) & DF$id == max(DF$id))) :
no non-missing arguments to min; returning Inf

stat table empty(?) after gating

CMOP_6 post gating

It appeared as if run.gating(opp.list) was working last night, but then my stats table came up empty?
Here is what is happening:
[screenshot: stat_problem]

readSeaflow and get.opp.by.file have different signatures for similar tasks

It may be a good idea to think about keeping readSeaflow as the generic low-level labview binary file reader, and creating a new function to specifically read evt files, e.g. get.evt.by.file that has a similar signature to get.opp.by.file.

The problem this would solve is that a user may use readSeaflow to read an OPP file, which appears to work but probably isn't what was intended, since it will return all OPP quantiles as though they were part of one set.

suggest adding this function to retrieve opp.evt.ratio

new function to retrieve opp.evt.table from sqlite database, suggest adding to popcycle

get.opp.evt.ratio.table <- function(db=db.name) {
  sql <- paste0("SELECT * FROM ", opp.evt.ratio.table.name)
  con <- dbConnect(SQLite(), dbname=db)
  table <- dbGetQuery(con, sql)
  dbDisconnect(con)
  return(table)
}

plot_map documentation

Documentation for plot_map still lists popname as an input variable:

Usage

plot_time(stat, popname,param)

implementing timestamps for PMT changes

@ctberthiaume and I were just talking about adding some kind of functionality that will read the log files from the instrument and identify when PMTs were changed. Then, when you are analysing data after the cruise, you could set filters and gates for each segment of the cruise data and do the analysis all in one go.

insert.stats.for.file() not working properly

The stats table in the small database gets updated after every new file, but not after rerunning previous files.
As of today, the stats table contains only data from day "2014-06-13", and I can't insert "2014-06-11" or "2014-06-12".

insert.stats.for.file("2014-06-12T01-09-07+00-00")
[1] TRUE

but nothing gets inserted in the db

readSeaflow doesn't check for too much data

readSeaflow checks that it can read the number of data bytes reported in the uint32 header of a LabVIEW binary file, but it doesn't check whether there is additional, unexpected data beyond this amount. Such a check would be useful to catch cases where a user reads an OPP file, which has a final bitflag column encoding three quantile booleans, as an EVT file.

We should add this check and a test for this kind of error.
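
The proposed check can be illustrated with the Python sketch below, which validates a byte string rather than reproducing readSeaflow. It assumes, purely for illustration, that the uint32 header is little-endian and holds a row count, and that every row has a fixed byte width supplied by the caller. The 24-byte width and the function name are invented.

```python
import struct

HEADER_BYTES = 4  # one uint32 header, as described above

def check_payload_size(raw, bytes_per_row):
    """Return the declared row count, or raise if the payload is short or has extra bytes."""
    if len(raw) < HEADER_BYTES:
        raise ValueError("file too short to contain a uint32 header")
    (n_rows,) = struct.unpack("<I", raw[:HEADER_BYTES])  # little-endian is an assumption
    expected = n_rows * bytes_per_row
    actual = len(raw) - HEADER_BYTES
    if actual < expected:
        raise ValueError(f"truncated file: expected {expected} data bytes, found {actual}")
    if actual > expected:
        raise ValueError(f"{actual - expected} unexpected bytes after the declared data")
    return n_rows

good = struct.pack("<I", 2) + bytes(2 * 24)  # 2 rows of a made-up 24-byte width
bad = good + bytes(2)                        # 2 stray bytes, like an extra column would leave
print(check_payload_size(good, 24))
try:
    check_payload_size(bad, 24)
except ValueError as err:
    print(err)
```

A unit test for this kind of error can be built the same way: write a valid payload, append a few bytes, and assert that the reader refuses it.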

bug in plot.gate.cytogram

Working through the README, encountered this error. Looks like poly.log has not been set by the time it's used. Or is set somewhere else (maybe setGateParams) and is expected to be available to plot.gate.cytogram?

setGateParams(opp, popname='beads', para.x='fsc_small', para.y='pe')
[1] "No Gating parameters yet"
Error in plot.gate.cytogram(opp, para.x, para.y) : 
  object 'poly.log' not found

unable to set project location and popcycle.db-journal file created

i've just pulled and installed the latest version of popcycle and have been trying to continue my analysis. unfortunately, when i issue the following commands in R:

library(popcycle)
set.cruise.id("MV1405")
set.project.location("Documents/MV1405/seaflow")

R just hangs, and a popcycle.db-journal file is created in the project directory.

the popcycle.db-journal file looks like this:

’…˛Öù€���SQLite format 3���@  ��€ù€�����€-Ê����ˆ��˚�ˆ�    ��’…˛Ö�
�ÌÌ�ï�º¬¬%���9��indexsqlite_auB���Ö%������ä1tableoppopp�CREATE TABLE opp (
  -- First three columns are the EVT, OPP, VCT composite key
  cruise TEXT NOT NULL,
  file TEXT NOT NULL,  -- in old files, File+Day. in new files, Timestamp.
  particle INTEGER NOT NULL,
  -- Next we have the measurements. For these, see
  -- https://github.com/fribalet/flowPhyto/blob/master/R/Globals.R and look
  -- at version 3 of the evt header
  time INTEGER NOT NULL,
  pulse_width INTEGER NOT NULL,
  D1 REAL NOT NULL,
  D2 REAL NOT NULL,
  fsc_small REAL NOT NULL,
  fsc_perp REAL NOT NULL,
  fsc_big REAL NOT NULL,
  pe REAL NOT NULL,
  chl_small REAL NOT NULL,
  chl_big REAL NOT NULL,
  PRIMARY KEY (cruise, file, particle)
)%���9��indexsqlite_autoindex_opp_1opp�B���%��[indexoppFileIndexopp�CREATE INDEX oppFileIndex
ON opp (file)’…ˇg�
����Ä�ß�Î�ï�º�Çb������Ö+tablevctvct�CREATE TABLE vct (
  -- First three columns are the EVT, OPP, VCT, SDS composite key
  cruise TEXT NOT NULL,
  file TEXT NOT NULL,  -- in old files, File+Day. in new files, Timestamp.
  particle INTEGER NOT NULL,
  -- Next we have the classification
  pop TEXT NOT NULL,
  method TEXT NOT NULL,
  PRIMARY KEY (cruise, file, particle)
)%���9��indexsqlite_autoindex_vct_1vct�B���%��[indexvctFileIndexvct CREATE INDEX vctFileIndex
ON vct (file)É'������Ü5tablesflsfl
CREATE TABLE sfl (
  --First two columns are the SDS composite key
  cruise TEXT NOT NULL,
  file TEXT NOT NULL,  -- in old files, File+Day. in new files, Timestamp.
  date TEXT,
  file_duration REAL,
  lat REAL,
  lon REAL,
  conductivity REAL,
  salinity REAL,
  ocean_tmp REAL,
  par REAL,
  bulk_red REAL,
  stream_pressure REAL,
  flow_rate REAL,
  event_rate INTEGER,
  PRIMARY KEY (cruise, file)
)%���9��indexsqlite_autoindex_sfl_1sfl�B    ��%��[indexsflDateIndexsflCREATE INDEX sflDateIndex
ON sfl (date)’…ˇD�
�0p���L�ò�√�~�´0>���!��SindexoppPeIndexoppl�ACREATE INDEX oppPeIndex ON opp (pe)Å�
��''�Å{tableopp_evt_ratioopp_evt_ratio
CREATE TABLE opp_evt_ratio (
  cruise TEXT NOT NULL,
  file TEXT NOT NULL,
  ratio REAL,
  PRIMARY KEY (cruise, file)
)9���M'�indexsqlite_autoindex_opp_evt_ratio_1opp_evt_ratio�ÇI�����Ñqtablestatsstats�CREATE TABLE stats (
  cruise TEXT NOT NULL,
  file TEXT NOT NULL,
  time TEXT,
  lat REAL,
  lon REAL,
  opp_evt_ratio REAL,
  flow_rate REAL,
  file_duration REAL,
  pop TEXT NOT NULL,
  n_count INTEGER,
  abundance REAL,
  fsc_small REAL,
  chl_small REAL,
  pe REAL,
  PRIMARY KEY (cruise, file, pop)
))
��=��indexsqlite_autoindex_stats_1stats�Å8������ÇKtablecytdivcytdiv�CREATE TABLE cytdiv (
  cruise TEXT NOT NULL,
  file TEXT NOT NULL,
  N0 INTEGER,
  N1 REAL,
  H REAL,
  J REAL,
  opp_red REAL,
  PRIMARY KEY (cruise, file)
)+���?��indexsqlite_autoindex_cytdiv_1cytdiv�S���/��oindexoppFsc_smallIndexopp%–;CREATE INDEX oppFsc_smallIndex ON opp (fsc_small)’…ˇj

?!

best.filter.notch 2

From Maria Hamilton, looking at CMOP_6 dataset
"best.filter.notch will not work for file 3960 but works for 3961, 3959, 4000, 4400, 3900 (all the others I've tried today so far)."
October 24, 2014
