isoverse / isoprocessor
isoprocessor package (in development)
Home Page: http://isoprocessor.isoverse.org
License: GNU General Public License v2.0
carefully implement alternate ways of passing calibration information to functions that currently expect the calibrations to still be stored in the peak table (in case people need to drop the calibration columns in iso_get_calibration_data because of oversized data frames).
what else?
from @danote
This is not an issue with isoreader, but rather an idea for new functionality, which you may or may not consider worth the time. I was just thinking that it would be awesome if isoreader allowed the user to re-evaluate a dxf trace with new peak detection parameters. In my experience, changing the peak detection parameters in Isodat can have important effects on delta values, but it is cumbersome to re-evaluate whole datasets in Isodat while varying peak detection parameters. If isoreader could evaluate the peaks, it would be easy for the user to test a range of peak detection parameters and see what results in the best data quality. I think a lot of people choose peak detection parameters arbitrarily just because it is hard to test them. This could help. Just a thought. I know you don't have time to re-write all of Isodat though.
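The re-evaluation idea above could start from something as simple as threshold-based peak detection over a raw trace. A minimal base-R sketch; the `detect_peaks` helper and its `rt_start`/`rt`/`rt_end` columns are hypothetical illustrations, not part of isoreader or isoprocessor:

```r
# minimal sketch: find contiguous runs of signal above a threshold and
# report each run's start, apex, and end retention times
detect_peaks <- function(time, signal, threshold = 0.5) {
  above <- signal > threshold
  runs <- rle(above)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  peaks <- data.frame(rt_start = numeric(0), rt = numeric(0), rt_end = numeric(0))
  for (i in seq_along(runs$values)) {
    if (!runs$values[i]) next
    idx <- starts[i]:ends[i]
    apex <- idx[which.max(signal[idx])]
    peaks <- rbind(peaks, data.frame(
      rt_start = time[starts[i]], rt = time[apex], rt_end = time[ends[i]]
    ))
  }
  peaks
}

# example trace: one gaussian-shaped peak centered at t = 5
t <- seq(0, 10, by = 0.1)
v <- exp(-(t - 5)^2)
detect_peaks(t, v, threshold = 0.5)
```

Varying `threshold` (or a smoothing step before detection) is exactly the kind of parameter sweep the suggestion describes.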
to support data frame operations for ratio calculations as well as iso files
Note: still missing: automatic ratio calculation for ratios that are not present yet. The ratio columns will automatically be present in the data frame but will not be retained in the iso_files unless they are separately calculated using iso_calculate_ratios.
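For pure data frame operations the ratio math itself is just column arithmetic. A hand-rolled sketch; the `v44.mV` / `v45.mV` column names are hypothetical stand-ins for whatever the vendor data table provides:

```r
# sketch: compute an ion ratio column by hand on a peak table data frame
peak_table <- data.frame(
  peak   = 1:3,
  v44.mV = c(2000, 1500, 1800),  # hypothetical mass 44 intensities
  v45.mV = c(23.4, 17.2, 21.1)   # hypothetical mass 45 intensities
)

# the ratio column that an automatic calculation would add:
peak_table$r45.44 <- peak_table$v45.mV / peak_table$v44.mV
peak_table
```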
Hi @TanjaWald, thanks for reporting this issue, can you upload the .dxf file that causes this problem?
Transferred from sebkopf/isorunN2O#8
Hi,
I have problems when importing the data; the first file always fails to process. Here is an example:

```
Warning: encountered 1 problem.
| FILE | PROBLEM | OCCURRED IN | DETAILS
1 | 44942__N2O 20nmol.dxf | error | extract_isodat_sequence_line_info | cannot process s...
Use iso_get_problems(...) for more details.
```

Here I have attached the outputs from R:
The following attachment contains information about the conditioner (N2O 20 nmol) from Isodat:
Thanks for your help!
iso_summarize_data_table groups need to be namespaced: iso_generate_summary_table(rename(mpg, `d 13C/12C` = cty), raw = `d 13C/12C`)
iso_identify_outliers(n_sigma = 5)
exclude_outliers = TRUE flag (default FALSE) parameter in iso_summarize_data_table that, if set, checks whether an outliers column exists in the data table (if not, points to running iso_identify_outliers() first), and then calculates the stats based only on those points that are not outliers. Will also have to include a col_x n for each column to indicate how many of the total n were used for the calculation.
don't use grid for single panel (= single y) iso_plot_data, to get a simpler plot with a y label
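The exclude_outliers behavior could be sketched in base R as follows; the `is_outlier` column name is an assumption about what iso_identify_outliers() would produce, and `d13C_n` stands in for the proposed per-column n:

```r
# sketch: summary stats that skip flagged outliers and report how many
# of the total n went into the calculation
dt <- data.frame(
  d13C       = c(-25.1, -25.3, -24.9, -40.0, -25.2),
  is_outlier = c(FALSE, FALSE, FALSE, TRUE, FALSE)  # assumed flag column
)
kept <- dt[!dt$is_outlier, ]
summary_row <- data.frame(
  d13C_mean = mean(kept$d13C),
  d13C_sd   = sd(kept$d13C),
  d13C_n    = nrow(kept),  # how many of the total n were used
  total_n   = nrow(dt)
)
summary_row
```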
if a y or x value is renamed during iso_plot_data, then iso_mark_calibration_range will no longer recognize the range constraint, and it needs to be possible to tell the function which variable is actually which
one way to potentially remedy this in a more automated way is to have all transformations that are applied to x and y during iso_plot_data ALSO applied to the calibration ranges that are stored inside the data frame! this is probably easiest by first nesting the dataset based on all calib_params fields (from all calibrations) and then applying the modifications to each set of evaluation ranges (keeping the original var names and ranges and adding the additional ones based on rename / mutate statements)
--> the iso_add_standards function should report the names of the standards that have been added (based on the unique combinations of match_by) and how many of each standard were identified. This would help catch errors in standard identification (often an issue at the stage of generating calibrations) early on
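The proposed reporting could amount to tabulating matched standards. A sketch; the `compound` and `is_standard` columns are hypothetical placeholders for the actual match_by fields:

```r
# sketch: count how many peaks were matched to each standard
peaks <- data.frame(
  compound    = c("std A", "std A", "std B", "std A", "std B"),
  is_standard = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)
std_counts <- as.data.frame(table(peaks$compound[peaks$is_standard]))
names(std_counts) <- c("standard", "n_matched")
std_counts
```

An unexpectedly low (or zero) count for one standard would flag an identification problem right away.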
the units of the residual should be inferred automatically during regression (if there are any units) from the y variable of the regression
the help for S3 methods like iso_plot_continuous_flow is not great because the default is defined as ..., so that's the only help available when plotting - is there any way to make this better?
hopefully, this RStudio release https://support.rstudio.com/hc/en-us/articles/205273297-Code-Completion will help with this but make sure to test it
this happens when panel = ... is set to something other than file_id, since at the time of the zoom grouping the data frame does not yet have access to the file information!
iso_select_file_info gives details on each column; often those columns are all selected for the exact same number of files - could this be consolidated a bit more?
currently only the raw data is scaled but the backgrounds should be scaled the same way
provide simple export function for calibration and all data contained in it including coefficients, summary, data, evaluation ranges
tibble update breaks some functionality (e.g. #85)
to keep the functionality packages small and comply with CRAN guidelines for package sizes, it will be necessary to move some or all of the example data to a separate data-only package; this can probably be one that works for the entire isoverse and is just isodata. An example of how this was done in the past is here:
adjust = "Bonferroni" or adjust = "Scheffe" needs to happen (see https://journal.r-project.org/archive/2014-1/greenwell-kabban.pdf on simultaneous calibration intervals for details) and could make the calculations faster
by default, select from all calibrations; warn if there are conflicting constraints from the calibrations on the range depicted in the plot
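The Bonferroni adjustment amounts to widening each interval by using alpha / k when computing k simultaneous intervals. A minimal sketch; the `simultaneous_t` helper is hypothetical (a Scheffé version would use an F-based critical value instead of the t quantile):

```r
# sketch: Bonferroni-adjusted t critical value for k simultaneous
# calibration intervals at overall level alpha
simultaneous_t <- function(alpha, k, df, adjust = c("none", "Bonferroni")) {
  adjust <- match.arg(adjust)
  a <- if (adjust == "Bonferroni") alpha / k else alpha
  qt(1 - a / 2, df)
}

simultaneous_t(0.05, k = 10, df = 20, adjust = "Bonferroni")  # wider intervals
simultaneous_t(0.05, k = 10, df = 20, adjust = "none")
```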
re-arrange the zoom and other calculations in such a way that iso_prepare_continuous_flow_plot_data can be implemented as an S3 method that can take a raw_data (+/- peak_table) trace and work with it directly, to ensure compatibility with pipelines that move away from iso_files and work with the raw and peak table data directly.
note that this might mean it will be easier to do the gathering of the data trace data at a later point also in the iso_files version of this function (i.e. iso_get_raw_data without direct gathering at this early point), and then we could deprecate the data_trace parameter in iso_combine_raw_data_with_peak_table and just have a separate gather function for raw data that works outside of iso_get_raw_data as well.
col_x n for each column to indicate how many of the total n were used for the calculation (if no cutoff is provided, these columns can be left out). Note that this may be useful in a separate function that identifies outliers first, and for when there are additional grouping variables used for calibration purposes.
this will make it easier for users to figure out what's been created
move shared parameters about plotting features into the generic instead of the data frame / iso_object specific S3 implementations - that makes them available in the quick help (much more useful)
this might also mean that the time window parameter would be useful to have in both, rather than just the iso_object implementation (in reality this means moving the time math to the data frame method)
this functionality is already implemented by iso_add_file_info from isoreader (1.0.7) and should be soft-deprecated here along with its auxiliary functions (explain how to do the filtering based on whether something has metadata or not, since that function will no longer exist)
consider taking the same approach as with the regressions to standardize how these kinds of operations work, i.e. introduce a prepare_for_peak_mapping function and then an unnest_peaks or something like that (since iso_add_metadata will be replaced by iso_add_file_info from isoreader already); this should be the only big data-adding operation.
maybe even a unified command along the lines of iso_prepare_for_grouped_operation?
note: not set in stone either way, something to think about carefully as it might be overkill to do it this way
where background info is available, implement background markers option
for multiple sequential calibrations, make it easier to refer to the right calibration by implementing a default function that selects the last calibration created
Hi Seb,
I'm working through Brett's code looking for problems, and I'm having trouble getting through one of the early chunks. (The exact rmd I'm working on is called rmarkdown_5_ea_irms_example_Gasbench_greatness)
when I run:
```r
# process peak table
iso_files_w_peak_table <- iso_files_raw %>%
  # set peak table from vendor data table
  iso_set_peak_table_automatically_from_vendor_data_table() %>%
  # convert units from mV to V for amplitudes and area
  iso_convert_peak_table_units(V = mV, Vs = mVs)
```

I get the error:

```
Error in apply_software_files_ids[[i, "func"]](iso_files[apply_software_files_ids[[i, :
  attempt to apply non-function
```
find a good way to deal with the types and how to best panel them while also preserving flexibility in data trace and file coloring / paneling
add info about how many metadata records could not be matched as well
create all permutations of the model
and use_in_calib
parameters to allow easier evaluation of fits across different calibration ranges. Probably requires a new column to capture what the range is (either the expression or a name, similar to how the model_name works). Having this additional column would be a significant addition at the basic level of the regression machinery. Consider carefully what this means for regressions, range predictions and in_range plotting.
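Generating the permutations themselves is straightforward with expand.grid; the hard part is the proposed new range column. A sketch with placeholder model formulas and range expressions (none of these names come from the package):

```r
# sketch: all model x calibration-range permutations for fit evaluation
models <- c("d13C ~ d13C_true", "d13C ~ d13C_true + area")  # placeholder formulas
ranges <- c("all", "area > 5", "area > 10")                 # placeholder range exprs

calib_grid <- expand.grid(
  model       = models,
  calib_range = ranges,   # the proposed extra column naming the range
  stringsAsFactors = FALSE
)
calib_grid  # one row per model/range combination to fit and compare
```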
same as with mark_value_range, provide an easy way to highlight outliers based on standard deviations
it would be great to be able to change which reference peaks are used for the calculation and (re)calculate resulting ratios and deltas on the fly
right now, regression inversion happens for each analysis individually but it may make more sense to allow this to happen for replicate analyses right away to get a better constraint on errors
should allow named and unnamed (then use Sheet1, Sheet2, etc., or if a single variable maybe the parsed variable name?) data set export
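The sheet-naming fallback could look like this; `name_sheets` is a hypothetical helper that keeps user-supplied names and fills in Sheet1, Sheet2, ... for the rest:

```r
# sketch: default sheet names for an export of a list of data frames
name_sheets <- function(data_sets) {
  nms <- names(data_sets)
  if (is.null(nms)) nms <- rep("", length(data_sets))
  missing <- nms == ""
  # fall back to SheetN for entries the user did not name
  nms[missing] <- paste0("Sheet", seq_along(data_sets))[missing]
  setNames(data_sets, nms)
}

name_sheets(list(data.frame(x = 1), results = data.frame(y = 2)))
```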
```
devtools::install_github("isoverse/isoprocessor")
Downloading GitHub repo isoverse/isoprocessor@master
✔ checking for file ‘/private/var/folders/pw/9fk_chms0jl85nfldy5rgfyc0000gn/T/RtmpGRq76b/remotesd2f97f3ca6a2/isoverse-isoprocessor-d728877/DESCRIPTION’ (338ms)
─ preparing ‘isoprocessor’: (410ms)
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘isoprocessor_0.4.0.1.tar.gz’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/example_chromatograms_with_peaks-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/first_look_sample_and_standard_peaks-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/global_calibration_coefficients-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/global_calibration_residuals-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/single_analysis_calibration_coefficients-1.png’
installing source package ‘isoprocessor’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Error: package or namespace load failed for ‘isoreader’:
 .onLoad failed in loadNamespace() for 'isoreader', details:
  call: NULL
  error: (converted from warning) Unquoting language objects with !!! is deprecated as of rlang 0.4.0.
Please use !! instead.

  # Bad:
  dplyr::select(data, !!!enquo(x))

  # Good:
  dplyr::select(data, !!enquo(x))    # Unquote single quosure
  dplyr::select(data, !!!enquos(x))  # Splice list of quosures

This warning is displayed once per session.
Error: package ‘isoreader’ could not be loaded
Execution halted
ERROR: lazy loading failed for package ‘isoprocessor’
```
to simplify isoprocessor continuous flow calculations that may benefit from staying in isofile space for continuous flow plotting and other operations, it could be useful to introduce a peak_table field that could be modified (e.g. if a calibrated peak table data set should be added back into the isofiles at a later point) via iso_set_data/peak_table and could adopt the vendor data table to begin with using iso_set_peak_table_from_vendor_data_table (or something along those lines) and rename fields to names that are compatible with isoprocessor defaults (rt, rt_start, rt_end, etc., and likewise the areas and backgrounds, + units via iso_double_with_units).
Calculation operations then operate on raw_data and peak_table (or often a combination of the two) as needed. I would advocate for keeping them separate rather than a hashed combined one, even if that makes those operations a bit slower. I.e. an iso_calculate_peak_area combines the peak table and chromatogram and calculates areas, then spits out the peak table again into its own field and leaves the raw data unchanged. This would probably help keep things from getting messy in terms of where which data is stored.
the rough overview of functions would be:
- iso_detect_peaks: peak detection that operates on the raw data and generates the peak_table from scratch (not implemented for now)
- iso_set_peak_table_from_vendor_data_table: start with the vendor data table
- iso_set_peak_table: set peak table from external information (or after calibration)
- iso_mutate_peak_table: add new columns with useful information
- iso_map_peaks: modify the peak mapping function to work within the isofiles (to keep plotting easy)
- iso_integrate_peaks: combine raw data with peak data to calculate peak areas, max peak height, bgrd areas, bgrd_start height and bgrd_end height (depending on bgrd calculation modes), and store the result in the peak_data field
- iso_calculate_peak_ratios: calculate ratios in the peaks table; shouldn't need the raw data in any way
- iso_calculate_peak_deltas: calculate peak deltas based on an expression that identifies the reference peaks to use, a delta value for the reference peaks (again an expression so it can be a fixed value or a column introduced from other information, by default 0 - somehow figure out how to do this taking the standard values into consideration too if provided by methods info), and a method for how to extrapolate raw ratios from the ref peaks ("linear", "bracket", "average", etc.), e.g.:
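A hypothetical end-to-end pipeline under the proposed API could read as follows (none of these functions exist yet in this form, and all arguments are illustrative placeholders):

```r
# hypothetical pipeline -- function names follow the proposal above
iso_files %>%
  iso_set_peak_table_from_vendor_data_table() %>%
  iso_mutate_peak_table(is_standard = compound %in% standards) %>%
  iso_map_peaks(peak_map) %>%
  iso_calculate_peak_ratios(`45/44`) %>%
  iso_calculate_peak_deltas(
    ref_peaks = peak_type == "ref",  # expression identifying reference peaks
    ref_delta = 0,                   # fixed value or column
    method    = "bracket"            # how to extrapolate raw ratios
  )
```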