isoverse / isoprocessor
isoprocessor package (in development)
Home Page: http://isoprocessor.isoverse.org
License: GNU General Public License v2.0
carefully implement alternate ways of passing calibration information to functions that currently expect the calibrations to still be stored in the peak table (in case people need to drop the calibration columns in iso_get_calibration_data because of oversized data frames).
what else?
from @danote
This is not an issue with isoreader, but rather an idea for new functionality, which you may or may not consider worth the time. I was just thinking that it would be awesome if isoreader allowed the user to re-evaluate a dxf trace with new peak detection parameters. In my experience, changing the peak detection parameters in Isodat can have important effects on delta values, but it is cumbersome to re-evaluate whole datasets in Isodat while varying peak detection parameters. If isoreader could evaluate the peaks, it would be easy for the user to test a range of peak detection parameters and see what results in the best data quality. I think a lot of people choose peak detection parameters arbitrarily just because it is hard to test them. This could help. Just a thought. I know you don't have time to re-write all of Isodat though.
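The re-evaluation idea above could start from something as simple as threshold-based peak detection over a raw trace. A minimal base-R sketch; the `detect_peaks` helper and its `rt_start`/`rt`/`rt_end` columns are hypothetical illustrations, not part of isoreader or isoprocessor:

```r
# minimal sketch: find contiguous runs of signal above a threshold and
# report each run's start, apex, and end retention times
detect_peaks <- function(time, signal, threshold = 0.5) {
  above <- signal > threshold
  runs <- rle(above)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  peaks <- data.frame(rt_start = numeric(0), rt = numeric(0), rt_end = numeric(0))
  for (i in seq_along(runs$values)) {
    if (!runs$values[i]) next
    idx <- starts[i]:ends[i]
    apex <- idx[which.max(signal[idx])]
    peaks <- rbind(peaks, data.frame(
      rt_start = time[starts[i]], rt = time[apex], rt_end = time[ends[i]]
    ))
  }
  peaks
}

# example trace: one gaussian-shaped peak centered at t = 5
t <- seq(0, 10, by = 0.1)
v <- exp(-(t - 5)^2)
detect_peaks(t, v, threshold = 0.5)
```

Varying `threshold` (or a smoothing step before detection) is exactly the kind of parameter sweep the suggestion describes.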
to support data frame operations for ratio calculations as well as iso files
Note: still missing: automatic ratio calculation for ratios that are not present yet. The ratio columns will automatically be present in the data frame but will not be retained in the iso_files unless they are separately calculated using iso_calculate_ratios.
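For pure data frame operations the ratio math itself is just column arithmetic. A hand-rolled sketch; the `v44.mV` / `v45.mV` column names are hypothetical stand-ins for whatever the vendor data table provides:

```r
# sketch: compute an ion ratio column by hand on a peak table data frame
peak_table <- data.frame(
  peak   = 1:3,
  v44.mV = c(2000, 1500, 1800),  # hypothetical mass 44 intensities
  v45.mV = c(23.4, 17.2, 21.1)   # hypothetical mass 45 intensities
)

# the ratio column that an automatic calculation would add:
peak_table$r45.44 <- peak_table$v45.mV / peak_table$v44.mV
peak_table
```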
Hi @TanjaWald, thanks for reporting this issue, can you upload the .dxf file that causes this problem?
Transferred from sebkopf/isorunN2O#8
Hi,
I have problems when importing the data; the first file always fails to process. Here is an example:

```
Warning: encountered 1 problem.
| FILE | PROBLEM | OCCURRED IN | DETAILS
1 | 44942__N2O 20nmol.dxf | error | extract_isodat_sequence_line_info | cannot process s...
Use iso_get_problems(...) for more details.
```

Here I have attached the outputs from R:
The following attachment contains information about the conditioner (N2O 20 nmol) from Isodat:
Thanks for your help!
iso_summarize_data_table groups need to be namespaced: iso_generate_summary_table(rename(mpg, `d 13C/12C` = cty), raw = `d 13C/12C`)
iso_identify_outliers(n_sigma = 5)
exclude_outliers = TRUE flag (default FALSE) parameter in iso_summarize_data_table that, if set, checks whether an outliers column exists in the data table (if not, points to running iso_identify_outliers() first), and then calculates the stats based only on those points that are not outliers. Will also have to include a col_x n for each column to indicate how many of the total n were used for the calculation.
don't use grid for single panel (= single y) iso_plot_data, to get a simpler plot with a y label
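The exclude_outliers behavior could be sketched in base R as follows; the `is_outlier` column name is an assumption about what iso_identify_outliers() would produce, and `d13C_n` stands in for the proposed per-column n:

```r
# sketch: summary stats that skip flagged outliers and report how many
# of the total n went into the calculation
dt <- data.frame(
  d13C       = c(-25.1, -25.3, -24.9, -40.0, -25.2),
  is_outlier = c(FALSE, FALSE, FALSE, TRUE, FALSE)  # assumed flag column
)
kept <- dt[!dt$is_outlier, ]
summary_row <- data.frame(
  d13C_mean = mean(kept$d13C),
  d13C_sd   = sd(kept$d13C),
  d13C_n    = nrow(kept),  # how many of the total n were used
  total_n   = nrow(dt)
)
summary_row
```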
if a y or x value is renamed during iso_plot_data, then iso_mark_calibration_range will no longer recognize the range constraint, and it needs to be possible to tell the function which variable is actually which
one way to potentially remedy this in a more automated way is to have all transformations that are applied to x and y during iso_plot_data ALSO applied to the calibration ranges that are stored inside the data frame! this is probably easiest by first nesting the dataset based on all calib_params fields (from all calibrations) and then applying the modifications to each set of evaluation ranges (keeping the original var names and ranges and adding the additional ones based on rename / mutate statements)
--> the iso_add_standards function should report the names of the standards that have been added (based on the unique combinations of match_by) and how many of each standard were identified. This would help catch errors in standard identification (often an issue at the stage of generating calibrations) early on
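The proposed reporting could amount to tabulating matched standards. A sketch; the `compound` and `is_standard` columns are hypothetical placeholders for the actual match_by fields:

```r
# sketch: count how many peaks were matched to each standard
peaks <- data.frame(
  compound    = c("std A", "std A", "std B", "std A", "std B"),
  is_standard = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)
std_counts <- as.data.frame(table(peaks$compound[peaks$is_standard]))
names(std_counts) <- c("standard", "n_matched")
std_counts
```

An unexpectedly low (or zero) count for one standard would flag an identification problem right away.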
the units of the residual should be inferred automatically during regression (if there are any units) from the y variable of the regression
the help for S3 methods like iso_plot_continuous_flow is not great because the default is defined as ..., so that's the only help available when plotting - is there any way to make this better?
hopefully, this RStudio release https://support.rstudio.com/hc/en-us/articles/205273297-Code-Completion will help with this but make sure to test it
this happens when panel = ... is set to something other than file_id, since at the time of the zoom grouping the data frame does not yet have access to the file information!
iso_select_file_info gives details on each column; often those columns are all selected for the exact same number of files - could this be consolidated a bit more?
currently only the raw data is scaled but the backgrounds should be scaled the same way
provide simple export function for calibration and all data contained in it including coefficients, summary, data, evaluation ranges
tibble update breaks some functionality (e.g. #85)
to keep the functionality packages small and comply with CRAN guidelines for package sizes, it will be necessary to move some or all of the example data to a separate data-only package; this can probably be one that works for the entire isoverse and is just isodata. An example of how this was done in the past is here:
adjust = "Bonferroni" or adjust = "Scheffe" needs to happen (see https://journal.r-project.org/archive/2014-1/greenwell-kabban.pdf on simultaneous calibration intervals for details) and could make the calculations faster
by default, select from all calibrations; warn if there are conflicting constraints from the calibrations on the range depicted in the plot
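The Bonferroni adjustment amounts to widening each interval by using alpha / k when computing k simultaneous intervals. A minimal sketch; the `simultaneous_t` helper is hypothetical (a Scheffé version would use an F-based critical value instead of the t quantile):

```r
# sketch: Bonferroni-adjusted t critical value for k simultaneous
# calibration intervals at overall level alpha
simultaneous_t <- function(alpha, k, df, adjust = c("none", "Bonferroni")) {
  adjust <- match.arg(adjust)
  a <- if (adjust == "Bonferroni") alpha / k else alpha
  qt(1 - a / 2, df)
}

simultaneous_t(0.05, k = 10, df = 20, adjust = "Bonferroni")  # wider intervals
simultaneous_t(0.05, k = 10, df = 20, adjust = "none")
```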
re-arrange the zoom and other calculations in such a way that iso_prepare_continuous_flow_plot_data can be implemented as an S3 method that can take a raw_data (+/- peak_table) trace and work with it directly, to ensure compatibility with pipelines that move away from iso_files and work with the raw and peak table data directly.
note that this might mean it will be easier to do the gathering of the data trace data at a later point also in the iso_files version of this function (i.e. iso_get_raw_data without direct gathering at this early point), and then we could deprecate the data_trace parameter in iso_combine_raw_data_with_peak_table and just have a separate gather function for raw data that works outside of iso_get_raw_data as well.
col_x n for each column to indicate how many of the total n were used for the calculation (if no cutoff is provided, these columns can be left out). Note that this may be useful in a separate function that identifies outliers first, and for when there are additional grouping variables used for calibration purposes.
this will make it easier for users to figure out what's been created
move shared parameters about plotting features into the generic instead of the data frame / iso_object specific S3 implementations - that makes them available in the quick help (much more useful)
this might also mean that the time window parameter would be useful to have in both, rather than just the iso_object implementation (in reality this means moving the time math to the data frame method)
this functionality is already implemented by iso_add_file_info from isoreader (1.0.7) and should be soft-deprecated here along with its auxiliary functions (explain how to do the filtering based on whether something has metadata or not, since that function will no longer exist)
consider taking the same approach as with the regressions to standardize how these kinds of operations work, i.e. introduce a prepare_for_peak_mapping function and then an unnest_peaks or something like that (since iso_add_metadata will be replaced by iso_add_file_info from isoreader already); this should be the only big data-adding operation.
maybe even a unified command along the lines of iso_prepare_for_grouped_operation?
note: not set in stone either way, something to think about carefully as it might be overkill to do it this way
where background info is available, implement background markers option
for multiple sequential calibrations, make it easier to refer to the right calibration by implementing a default function that selects the last calibration created
Hi Seb,
I'm working through Brett's code looking for problems, and I'm having trouble getting through one of the early chunks. (The exact rmd I'm working on is called rmarkdown_5_ea_irms_example_Gasbench_greatness)
when I run:
```r
# process peak table
iso_files_w_peak_table <- iso_files_raw %>%
  # set peak table from vendor data table
  iso_set_peak_table_automatically_from_vendor_data_table() %>%
  # convert units from mV to V for amplitudes and area
  iso_convert_peak_table_units(V = mV, Vs = mVs)
```

I get the error:

```
Error in apply_software_files_ids[[i, "func"]](iso_files[apply_software_files_ids[[i, :
  attempt to apply non-function
```
find a good way to deal with the types and how to best panel them while also preserving flexibility in data trace and file coloring / paneling
add info about how many metadata records could not be matched as well
create all permutations of the model
and use_in_calib
parameters to allow easier evaluation of fits across different calibration ranges. Probably requires a new column to capture what the range is (either the expression or a name, similar to how the model_name works). Having this additional column would be a significant addition at the basic level of the regression machinery. Consider carefully what this means for regressions, range predictions and in_range plotting.
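Generating the permutations themselves is straightforward with expand.grid; the hard part is the proposed new range column. A sketch with placeholder model formulas and range expressions (none of these names come from the package):

```r
# sketch: all model x calibration-range permutations for fit evaluation
models <- c("d13C ~ d13C_true", "d13C ~ d13C_true + area")  # placeholder formulas
ranges <- c("all", "area > 5", "area > 10")                 # placeholder range exprs

calib_grid <- expand.grid(
  model       = models,
  calib_range = ranges,   # the proposed extra column naming the range
  stringsAsFactors = FALSE
)
calib_grid  # one row per model/range combination to fit and compare
```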
same as with mark_value_range, provide an easy way to highlight outliers based on standard deviations
it would be great to be able to change which reference peaks are used for the calculation and (re)calculate resulting ratios and deltas on the fly
right now, regression inversion happens for each analysis individually but it may make more sense to allow this to happen for replicate analyses right away to get a better constraint on errors
should allow named and unnamed (then use Sheet1, Sheet2, etc., or if a single variable maybe the parsed variable name?) data set export
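The sheet-naming fallback could look like this; `name_sheets` is a hypothetical helper that keeps user-supplied names and fills in Sheet1, Sheet2, ... for the rest:

```r
# sketch: default sheet names for an export of a list of data frames
name_sheets <- function(data_sets) {
  nms <- names(data_sets)
  if (is.null(nms)) nms <- rep("", length(data_sets))
  missing <- nms == ""
  # fall back to SheetN for entries the user did not name
  nms[missing] <- paste0("Sheet", seq_along(data_sets))[missing]
  setNames(data_sets, nms)
}

name_sheets(list(data.frame(x = 1), results = data.frame(y = 2)))
```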
```
devtools::install_github("isoverse/isoprocessor")
Downloading GitHub repo isoverse/isoprocessor@master
✔ checking for file ‘/private/var/folders/pw/9fk_chms0jl85nfldy5rgfyc0000gn/T/RtmpGRq76b/remotesd2f97f3ca6a2/isoverse-isoprocessor-d728877/DESCRIPTION’ (338ms)
─ preparing ‘isoprocessor’: (410ms)
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘isoprocessor_0.4.0.1.tar.gz’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/example_chromatograms_with_peaks-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/first_look_sample_and_standard_peaks-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/global_calibration_coefficients-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/global_calibration_residuals-1.png’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
  storing paths of more than 100 bytes is not portable:
  ‘isoprocessor/docs/articles/gc_irms_example_carbon_files/figure-html/single_analysis_calibration_coefficients-1.png’
installing source package ‘isoprocessor’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Error: package or namespace load failed for ‘isoreader’:
 .onLoad failed in loadNamespace() for 'isoreader', details:
  call: NULL
  error: (converted from warning) Unquoting language objects with !!! is deprecated as of rlang 0.4.0.
Please use !! instead.

  # Bad:
  dplyr::select(data, !!!enquo(x))

  # Good:
  dplyr::select(data, !!enquo(x))    # Unquote single quosure
  dplyr::select(data, !!!enquos(x))  # Splice list of quosures

This warning is displayed once per session.
Error: package ‘isoreader’ could not be loaded
Execution halted
ERROR: lazy loading failed for package ‘isoprocessor’
```
to simplify isoprocessor continuous flow calculations that may benefit from staying in isofile space for continuous flow plotting and other operations, it could be useful to introduce a peak_table field that could be modified (e.g. if a calibrated peak table data set should be added back into the isofiles at a later point) via iso_set_data/peak_table and could adopt the vendor data table to begin with using iso_set_peak_table_from_vendor_data_table (or something along those lines) and rename fields to names that are compatible with isoprocessor defaults (rt, rt_start, rt_end, etc., and likewise the areas and backgrounds, + units via iso_double_with_units).
Calculation operations then operate on raw_data and peak_table (or often a combination of the two) as needed. I would advocate for keeping them separate rather than a hashed combined one, even if that makes those operations a bit slower. I.e. an iso_calculate_peak_area combines the peak table and chromatogram and calculates areas, then spits out the peak table again into its own field and leaves the raw data unchanged. This would probably help keep things from getting messy in terms of where which data is stored.
the rough overview of functions would be:
- iso_detect_peaks: peak detection that operates on the raw data and generates the peak_table from scratch (not implemented for now)
- iso_set_peak_table_from_vendor_data_table: start with the vendor data table
- iso_set_peak_table: set peak table from external information (or after calibration)
- iso_mutate_peak_table: add new columns with useful information
- iso_map_peaks: modify the peak mapping function to work within the isofiles (to keep plotting easy)
- iso_integrate_peaks: combine raw data with peak data to calculate peak areas, max peak height, bgrd areas, bgrd_start height and bgrd_end height (depending on bgrd calculation modes), and store the result in the peak_data field
- iso_calculate_peak_ratios: calculate ratios in the peaks table; shouldn't need the raw data in any way
- iso_calculate_peak_deltas: calculate peak deltas based on an expression that identifies the reference peaks to use, a delta value for the reference peaks (again an expression so it can be a fixed value or a column introduced from other information, by default 0 - somehow figure out how to do this taking the standard values into consideration too if provided by methods info), and a method for how to extrapolate raw ratios from the ref peaks ("linear", "bracket", "average", etc.), e.g.:
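A hypothetical end-to-end pipeline under the proposed API could read as follows (none of these functions exist yet in this form, and all arguments are illustrative placeholders):

```r
# hypothetical pipeline -- function names follow the proposal above
iso_files %>%
  iso_set_peak_table_from_vendor_data_table() %>%
  iso_mutate_peak_table(is_standard = compound %in% standards) %>%
  iso_map_peaks(peak_map) %>%
  iso_calculate_peak_ratios(`45/44`) %>%
  iso_calculate_peak_deltas(
    ref_peaks = peak_type == "ref",  # expression identifying reference peaks
    ref_delta = 0,                   # fixed value or column
    method    = "bracket"            # how to extrapolate raw ratios
  )
```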