

isoprocessor's People

Contributors

japhir, sebkopf


isoprocessor's Issues

support peak table operations if calibrations are no longer in the peak table

carefully implement alternate ways of passing calibration information to functions that currently require the calibrations to still be stored in the peak table (in case people need to drop the calibration columns via iso_get_calibration_data because of oversized data frames).

  • iso_mark_calibration_range
  • iso_export_calibration_to_excel

what else?

peak detection in continuous flow files

from @danote
This is not an issue with isoreader, but rather an idea for new functionality, which you may or may not consider worth the time. I was just thinking that it would be awesome if isoreader allowed the user to re-evaluate a dxf trace with new peak detection parameters. In my experience, changing the peak detection parameters in Isodat can have important effects on delta values, but it is cumbersome to re-evaluate whole datasets in Isodat while varying peak detection parameters. If isoreader could evaluate the peaks, it would be easy for the user to test a range of peak detection parameters and see what results in the best data quality. I think a lot of people arbitrarily choose peak detection parameters just because it is hard to test them. This could help. Just a thought. I know you don't have time to re-write all of Isodat though.
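Purely as an illustration of the idea (this is not Isodat's detection algorithm, and the column names time.s and v44.mV as well as the threshold are assumptions), a minimal threshold-based sketch on a single raw trace as returned by isoreader::iso_get_raw_data():

library(dplyr)

# illustrative only: flag points above a signal threshold and group
# consecutive above-threshold points into peaks
detect_peaks_sketch <- function(trace, time = time.s, signal = v44.mV, threshold = 100) {
  trace %>%
    arrange({{ time }}) %>%
    mutate(
      above = {{ signal }} > threshold,
      peak_nr = cumsum(above & !lag(above, default = FALSE))
    ) %>%
    filter(above) %>%
    group_by(peak_nr) %>%
    summarize(
      rt_start = min({{ time }}),
      rt_end = max({{ time }}),
      rt = {{ time }}[which.max({{ signal }})],
      height = max({{ signal }}),
      .groups = "drop"
    )
}

# raw_data <- isoreader::iso_get_raw_data(iso_files)   # traces of all files
# raw_data %>% filter(file_id == file_id[1]) %>% detect_peaks_sketch(threshold = 200)

Varying the threshold (or a smarter criterion) across a whole dataset would then show directly how sensitive the resulting peak table is to the detection parameters.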

issue reading termination sequence of a dxf file

Hi @TanjaWald, thanks for reporting this issue, can you upload the .dxf file that causes this problem?

Transferred from sebkopf/isorunN2O#8

Hi,
I have problems when importing the data; the first file always fails to process. Here is an example:

Warning: encountered 1 problem.

| FILE | PROBLEM | OCCURRED IN | DETAILS
1 | 44942__N2O 20nmol.dxf | error | extract_isodat_sequence_line_info | cannot process s...

Use iso_get_problems(...) for more details.

Here I have attached the outputs from R:

[attached screenshots: iso_get_problems df raw]

The following attachment contains information about the conditioner (N2O 20 nmol) from Isodat:

2021-03-05 15_23_30-Isodat Workspace - 44942__N2O 20nmol dxf
2021-03-05 15_24_34-Isodat Workspace - 44942__N2O 20nmol dxf
2021-03-05 15_24_51-Isodat Workspace - 44942__N2O 20nmol dxf

44942__N2O 20nmol.xlsx

Thanks for your help!

update iso_generate_summary_table and support outliers

  • rename to more generic iso_summarize_data_table
  • the groups argument needs to be namespaced
  • renamed special-character columns don't work; example to reproduce: iso_generate_summary_table(rename(mpg, `d 13C/12C` = cty), raw = `d 13C/12C`)
  • allow simplified use that takes all columns by default (if no ... is specified) except grouping columns, to generate the data table
  • allow flexible summary functions
  • implement an outlier identification function based on distance from the mean in standard deviations, e.g. outside a 1, 2, or 3 sigma range (see nu_tests in isoreader), maybe iso_identify_outliers(n_sigma = 5); see the sketch after this list
  • implement an exclude_outliers = TRUE flag (default FALSE) in iso_summarize_data_table that, if set, checks whether an outliers column exists in the data table (if not, points to running iso_identify_outliers() first) and then calculates the stats based only on the non-outliers. Will also have to include a per-column n (e.g. col_x_n) to indicate how many of the total n were used for the calculation
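A minimal sketch of what the proposed outlier identification could look like (the function name, signature, and the is_outlier column are all hypothetical at this point):

library(dplyr)

# hypothetical: flag values more than n_sigma standard deviations away from
# the group mean within the supplied grouping columns
iso_identify_outliers <- function(dt, ..., column, n_sigma = 3) {
  dt %>%
    group_by(...) %>%
    mutate(
      is_outlier = abs({{ column }} - mean({{ column }}, na.rm = TRUE)) >
        n_sigma * sd({{ column }}, na.rm = TRUE)
    ) %>%
    ungroup()
}

# usage sketch: flag peaks whose area is > 3 sigma from their standard's mean
# peak_table %>% iso_identify_outliers(standard, column = area44, n_sigma = 3)

iso_summarize_data_table(exclude_outliers = TRUE) could then simply filter out is_outlier rows before computing the summary statistics and report the per-column n.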

make iso_mark_calibration_range work with x/y variable renames

if an x or y variable is renamed during iso_plot_data, iso_mark_calibration_range no longer recognizes the range constraint, so it needs to be possible to tell the function which variable is actually which

One way to remedy this in a more automated way is to have all transformations that are applied to x and y during iso_plot_data ALSO applied to the calibration ranges that are stored inside the data frame! This is probably easiest by first nesting the dataset based on all calib_params fields (from all calibrations) and then applying the modifications to each set of evaluation ranges (keeping the original variable names and ranges and adding the additional ones based on the rename / mutate statements).

improve info message when adding standards

The iso_add_standards function should report the names of the standards that have been added (based on the unique combinations of match_by) and how many peaks of each standard have been identified. This would help catch errors in standard identification early on (they are otherwise often only noticed at the stage of generating calibrations).
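A sketch of the kind of message this could produce, assuming standards have already been matched into the peak table with an is_std_peak flag (all column and argument names below are illustrative):

library(dplyr)

# illustrative: report which standards were matched and how many peaks each matched
report_added_standards <- function(peak_table, match_by = c("compound")) {
  counts <- peak_table %>%
    filter(is_std_peak) %>%
    group_by(across(all_of(match_by))) %>%
    summarize(n_matched = n(), .groups = "drop")
  message(
    "Info: matched ", sum(counts$n_matched), " standard peaks across ",
    nrow(counts), " standards:\n",
    paste0(" - ", do.call(paste, c(counts[match_by], sep = " / ")),
           ": ", counts$n_matched, " peaks", collapse = "\n")
  )
  invisible(peak_table)
}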

calibration export

provide a simple export function for a calibration and all data contained in it, including coefficients, summary, data, and evaluation ranges
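A hedged sketch of the export side of this, taking the already-extracted pieces as data frames and writing one Excel sheet per piece (the sheet layout and the writexl dependency are just suggestions; how the pieces are pulled out of the calibrated dataset is the open question):

# sketch: write the pieces of a calibration to one Excel sheet each
export_calibration_sketch <- function(coefs, summary, data, ranges, filepath = "calibration.xlsx") {
  writexl::write_xlsx(
    list(coefficients = coefs, summary = summary, data = data, ranges = ranges),
    path = filepath
  )
  invisible(filepath)
}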

move example data for iso* packages to an isodata package dedicated to just data

To keep the functionality packages small and comply with CRAN guidelines for package sizes, it will be necessary to move some or all of the example data to a separate data-only package. This could probably be one package that serves the entire isoverse and is simply called isodata. An example of how this was done in the past is here:

document error estimates in inversions better (+adjust?)

  • better documentation for what happens with the error estimates in the inversion (what is a Wald interval?)
  • allow bootstrap inversion estimates? (will be super slow)
  • should non-symmetrical error interval calculations be allowed?
  • figure out whether adjust = "Bonferroni" or adjust = "Scheffe" needs to happen (see https://journal.r-project.org/archive/2014-1/greenwell-kabban.pdf on simultaneous calibration intervals for details) and whether that could make the calculations faster; see the sketch below
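The linked paper is the one behind the investr package, which implements these inversion intervals directly; a toy sketch of what the (adjusted) Wald inversion looks like there, on made-up calibration data:

library(investr)

# toy calibration: measured signal vs. known concentration
set.seed(42)
calib <- data.frame(conc = 1:10, signal = 2.1 * (1:10) + rnorm(10, sd = 0.3))
fit <- lm(signal ~ conc, data = calib)

# invert a new measured signal back to a concentration with a Wald-type interval
invest(fit, y0 = 12.5, interval = "Wald", level = 0.95)

# simultaneous (adjusted) intervals when k unknowns will be inverted from the same curve
invest(fit, y0 = 12.5, interval = "Wald", adjust = "Bonferroni", k = 3)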

allow raw data with peak tables in iso_prepare_continuous_flow_plot_data

re-arrange the zoom and other calculations in such a way that iso_prepare_continuous_flow_plot_data can be implemented as an S3 method that can take a raw_data (+/- peak_table) trace and work with it directly, to ensure compatibility with pipelines that move away from iso_files and work with the raw and peak table data directly.

Note that this might mean it will be easier to also do the gathering of the trace data at a later point in the iso_files version of this function (i.e. iso_get_raw_data without direct gathering at this early point). Then we could deprecate the data_trace parameter in iso_combine_raw_data_with_peak_table and just have a separate gather function for raw data that also works outside of iso_get_raw_data.
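A rough sketch of the dispatch this could take (the class names and the exact split of responsibilities between methods are assumptions):

# hypothetical S3 generic: dispatch on a collection of iso files or directly
# on a raw data frame (with or without an accompanying peak table)
iso_prepare_continuous_flow_plot_data <- function(x, ...) {
  UseMethod("iso_prepare_continuous_flow_plot_data")
}

# iso file list method: extract the raw data, then delegate to the data frame method
iso_prepare_continuous_flow_plot_data.iso_file_list <- function(x, ..., peak_table = NULL) {
  raw_data <- isoreader::iso_get_raw_data(x)
  iso_prepare_continuous_flow_plot_data(raw_data, ..., peak_table = peak_table)
}

# data frame method: zooming, gathering, and peak table joins happen here
iso_prepare_continuous_flow_plot_data.data.frame <- function(x, ..., peak_table = NULL) {
  # ... zoom / gather / combine with peak_table ...
  x
}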

upgrade iso_generate_summary_table

  • allow simplified use that takes all columns by default (if no ... is specified) except grouping columns, to generate the data table
  • allow exclusion of data that is outside a 1, 2, or 3 sigma range (see nu_tests in isoreader) - maybe will have to include a per-column n (e.g. col_x_n) for each column to indicate how many of the total n were used for the calculation (if no cutoff is provided, these columns can be left out). Note that this may be better handled in a separate function that identifies outliers first

improve documentation for plotting functions

move shared parameters about plotting features into the generic instead of the data frame / iso object specific S3 implementations - that makes them available in the quick help (much more useful)

this might also mean that the time window parameter would be useful to have in both rather than just the iso_object implementation (in reality this means moving the time math to the data frame method)

deprecate metadata functions

this functionality is already implemented by iso_add_file_info from isoreader (1.0.7) and should be soft-deprecated here along with its auxiliary functions (explain how to do the filtering based on whether something has metadata or not since that function will no longer exist)
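A minimal sketch of the soft deprecation (whether to use base R's .Deprecated() as below or the lifecycle package is an open choice, and the straight argument pass-through is an assumption since the two signatures may differ):

# hypothetical shim: keep the old function around but point users to isoreader
iso_add_metadata <- function(...) {
  .Deprecated("isoreader::iso_add_file_info")
  isoreader::iso_add_file_info(...)
}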

consider doing peak mapping on nested files

consider taking the same approach as with the regressions to standardize how these kinds of operations work, i.e. introduce a prepare_for_peak_mapping function and then an unnest_peaks or something like that (since iso_add_metadata will be replaced by iso_add_file_info from isoreader already); this should be the only big data-adding operation.

maybe even a unified command that is something along the lines of iso_prepare_for_grouped_operation?

note: not set in stone either way, something to think about carefully as it might be overkill to do it this way

introduce easier calibration references

for multiple sequential calibrations, make it easier to refer to the right calibration by implementing a default function that selects the last calibration created
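One way the default could work, as a hedged sketch that assumes calibration parameter columns can be recognized by a naming pattern (the pattern below is made up):

# hypothetical helper: pick the most recently added calibration by default
default_calibration <- function(dt, pattern = "calib_params$") {
  calib_cols <- grep(pattern, names(dt), value = TRUE)
  if (length(calib_cols) == 0) stop("no calibrations found", call. = FALSE)
  # columns are added left to right, so the last match is the newest calibration
  calib_cols[length(calib_cols)]
}

# functions with a `calibration` argument could then default to
# calibration = default_calibration(dt)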

iso_set_peak_table_automatically_from_vendor_data_table not working on mac

Hi Seb,

I'm working through Brett's code looking for problems, and I'm having trouble getting through one of the early chunks. (The exact Rmd I'm working on is called rmarkdown_5_ea_irms_example_Gasbench_greatness.)

when I run:

 # process peak table
iso_files_w_peak_table <- iso_files_raw %>% 
  # set peak table from vendor data table
  iso_set_peak_table_automatically_from_vendor_data_table() %>% 
  # convert units from mV to V for amplitudes and area
  iso_convert_peak_table_units(V = mV, Vs = mVs) 

I get the error:

Error in apply_software_files_ids[[i, "func"]](iso_files[apply_software_files_ids[[i, :
  attempt to apply non-function

plotting function for scan files

find a good way to deal with the scan types and how to best panel them while also preserving flexibility in data trace and file coloring / paneling

allow for multiple `use_in_calib` parameters

create all permutations of the model and use_in_calib parameters to allow easier evaluation of fits across different calibration ranges. Probably requires a new column to capture what the range is (either the expression or a name, similar to how the model_name works). Having this additional column would be a significant addition at the basic level of the regression machinery. Consider carefully what this means for regressions, range predictions and in_range plotting.
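A small sketch of what generating those permutations could look like (the model formulas, range expressions, and the range_name column are all made up; cross_join() needs dplyr >= 1.1.0):

library(dplyr)

# hypothetical: every candidate model crossed with every calibration range filter
models <- tibble(
  model_name = c("linear", "with_area"),
  model = c("d13C_true ~ d13C_meas", "d13C_true ~ d13C_meas + area44")
)
ranges <- tibble(
  range_name = c("all_stds", "small_peaks"),
  use_in_calib = c("is_std_peak", "is_std_peak & area44 < 20")
)

calib_permutations <- cross_join(models, ranges)
# each row is one (model, calibration range) combination to fit and evaluate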

simplify mark_outlier

same as with mark_value_range, provide an easy way to highlight outliers simply based on standard deviations

Trouble installing/loading isoprocessor in R version 3.6.0

devtools::install_github("isoverse/isoprocessor")
Downloading GitHub repo isoverse/isoprocessor@master
✔ checking for file ‘/private/var/folders/pw/9fk_chms0jl85nfldy5rgfyc0000gn/T/RtmpGRq76b/remotesd2f97f3ca6a2/isoverse-isoprocessor-d728877/DESCRIPTION’ (338ms)
─ preparing ‘isoprocessor’: (410ms)
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘isoprocessor_0.4.0.1.tar.gz’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/example_chromatograms_with_peaks-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/first_look_sample_and_standard_peaks-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/global_calibration_coefficients-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/global_calibration_residuals-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/single_analysis_calibration_coefficients-1.png’

  • installing source package ‘isoprocessor’ ...
    ** using staged installation
    ** R
    ** inst
    ** byte-compile and prepare package for lazy loading
    Error: package or namespace load failed for ‘isoreader’:
    .onLoad failed in loadNamespace() for 'isoreader', details:
    call: NULL
    error: (converted from warning) Unquoting language objects with !!! is deprecated as of rlang 0.4.0.
    Please use !! instead.

    Bad:

    dplyr::select(data, !!!enquo(x))

    Good:

    dplyr::select(data, !!enquo(x)) # Unquote single quosure
    dplyr::select(data, !!!enquos(x)) # Splice list of quosures

This warning is displayed once per session.
Error: package ‘isoreader’ could not be loaded
Execution halted
ERROR: lazy loading failed for package ‘isoprocessor’

  • removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/isoprocessor’
    Error in i.p(...) :
    (converted from warning) installation of package ‘/var/folders/pw/9fk_chms0jl85nfldy5rgfyc0000gn/T//RtmpGRq76b/filed2f966e016f0/isoprocessor_0.4.0.1.tar.gz’ had non-zero exit status

implement peak_table for continuous flow files

To simplify isoprocessor continuous flow calculations that may benefit from staying in isofile space for continuous flow plotting and other operations, it could be useful to introduce a peak_table field that can be modified (e.g. if a calibrated peak table data set should be added back into the isofiles at a later point) via iso_set_data/peak_table. It could adopt the vendor data table to begin with, using iso_set_peak_table_from_vendor_data_table (or something along those lines), renaming fields to names that are compatible with isoprocessor defaults (rt, rt_start, rt_end, etc., and likewise the areas and backgrounds, + units via iso_double_with_units).

Calculation operations then operate on raw_data and peak_table (or often a combination of the two) as needed. I would advocate for keeping them separate rather than merging them into one combined structure, even if that makes those operations a bit slower. I.e. an iso_calculate_peak_area combines the peak table and chromatogram, calculates areas, then writes the peak table back into its own field and leaves the raw data unchanged. This would probably help keep it clear where which data is stored.

the rough overview of functions would be (a usage sketch follows the list):

  • iso_detect_peaks: peak detection that operates on the raw data and generates the peak_table from scratch (not implemented for now)
  • iso_set_peak_table_from_vendor_data_table: start with vendor data table
  • iso_set_peak_table: set peak table from external information (or after calibration)
  • iso_mutate_peak_table: add new columns with useful information
  • iso_map_peaks: modify the peak mapping function to work within the isofiles (to keep plotting easy)
  • iso_integrate_peaks: combine raw data with peak data to calculate peak areas, max peak height, bgrd areas, bgrd_start height and bgrd_end height (depending on bgrd calculation modes), and store the result in the peak_data field
  • iso_calculate_peak_ratios: calculate ratios in the peaks table, shouldn't need the raw data in any way
  • iso_calculate_peak_deltas: calculate peak deltas based on an expression that identifies the reference peaks to use, a delta value for the reference peaks (again an expression so it can be a fixed value or a column introduced from other information, by default 0 - somehow figure out how to do this taking the standard values into consideration too if provided by methods info), and a method on how to extrapolate raw ratios from the ref peaks ("linear", "bracket", "average", etc.)
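Strung together, the proposed functions might read roughly like this; everything below is hypothetical until implemented, and peak_map stands in for whatever peak mapping information the user supplies:

# hypothetical pipeline combining the proposed peak table functions
iso_files_w_peaks <- iso_files_raw %>%
  # start from the vendor-computed peak table
  iso_set_peak_table_from_vendor_data_table() %>%
  # map peaks to expected compounds / retention times
  iso_map_peaks(peak_map) %>%
  # recompute areas, peak heights and backgrounds from the raw chromatogram
  iso_integrate_peaks() %>%
  # ion ratios only need the peak table
  iso_calculate_peak_ratios() %>%
  # raw deltas relative to reference peaks, bracketed between them
  iso_calculate_peak_deltas(
    ref_peaks = compound == "ref gas",
    ref_delta = 0,
    method = "bracket"
  )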
