palaeoverse / palaeoverse
palaeoverse: an R package developed by palaeobiologists, for palaeobiologists
Home Page: https://palaeoverse.palaeoverse.org/
License: GNU General Public License v3.0
Great job, Lewis! I did some tests, and it all seems to work except for the binning at large bin sizes (see below). These are the changes I propose:
equal argument: size is sufficient. Set the default of size to FALSE; if the user specifies a number, then use equal-length time bins.
plot: default to FALSE. I think the user usually doesn't want to plot these intervals when generating time bins.
assign: GTS2020$interval_name returns only stages. It should return a table with stage, epoch, ... columns.
While playing around with other things, I realized that time_bins(scale = "GTS2012") (i.e., using the old timescale with all other defaults) currently fails. This is because the default value for interval is c("Fortunian", "Meghalayan"), but the "Meghalayan" stage didn't exist in 2012. To fix/enhance this, I propose that we make the interval argument optional (with a default value of NULL) and, if it is not specified, return all intervals of the desired rank from the specified timescale.
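A minimal sketch of how an optional interval argument could behave, using a toy timescale. The helper name select_intervals() and the two-column table are assumptions for illustration only, not the package's actual code or data:

```r
# Toy timescale (hypothetical); the real GTS tables have more columns.
gts <- data.frame(
  interval_name = c("Fortunian", "Stage 2", "Cambrian", "Ordovician"),
  rank = c("stage", "stage", "period", "period"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every interval of the requested rank unless
# the user supplies `interval` as the two end members of a range.
select_intervals <- function(gts, rank = "stage", interval = NULL) {
  out <- gts[gts$rank == rank, , drop = FALSE]
  if (is.null(interval)) {
    return(out)
  }
  idx <- match(interval, out$interval_name)
  if (anyNA(idx)) {
    stop("Interval(s) not found in this timescale: ",
         paste(interval[is.na(idx)], collapse = ", "))
  }
  out[seq(min(idx), max(idx)), , drop = FALSE]
}

all_stages <- select_intervals(gts)
one_stage <- select_intervals(gts, interval = c("Stage 2", "Stage 2"))
```

With a NULL default, an older timescale no longer trips over a stage name it does not contain, and a user-supplied name that is absent gives an informative error.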
It would be great to give the user more flexibility over which timescale is used for time_bins() (like axis_geo()). This could include incorporating other timescales through deeptime::getTimeScale() or by letting the user supply their own dataframe. I imagine this would be most useful for the equal-length binning aspect of time_bins().
CRAN checks note that we have some non-ASCII characters in our package. I'm guessing they are within our built-in datasets (maybe characters with accents?)? It's not super urgent, but we should convert them to ASCII at some point.
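A quick sketch of how the offending strings could be located with base R. The helper name has_non_ascii() and the example vector are made up for illustration; it relies on iconv() returning NA when a string cannot be converted to ASCII:

```r
# Hypothetical helper: flag strings that contain non-ASCII characters.
# iconv() returns NA for elements that cannot be represented in ASCII.
has_non_ascii <- function(x) {
  is.na(iconv(x, from = "UTF-8", to = "ASCII"))
}

# Example values only; the second one contains an accented character.
authors <- c("Smith", "Sof\u00eda")
flagged <- has_non_ascii(authors)
authors[flagged]
```

tools::showNonASCII() does a similar job interactively by printing the elements that contain non-ASCII characters.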
The tax_range_time example uses orders, but does not exclude "NO_ORDER_SPECIFIED", resulting in weird groupings. This should be updated to use genera or exclude non-specified occurrences.
Describe the bug
The look_up function tries to look up intervals from the GTS tables and assign stages. This fails for intervals older than the Phanerozoic, as older stages are not defined in the GTS tables, resulting in an error.
To reproduce
look_up(reefs,
        int_key = FALSE,
        early_interval = "interval",
        late_interval = "interval",
        assign_with_GTS = "GTS2020",
        return_unassigned = FALSE)
Expected behavior
Those intervals should be left unassigned rather than causing an error.
Resolution for the next release
Add a line of code specifying that pre-Phanerozoic intervals will not be looked up in the GTS tables.
Right now bin_time() uses a uniform distribution to assign point estimates for ages. However, I can see situations where users would prefer a normal distribution, logistic distribution, etc. It would be quite cool (although possibly complicated) to allow for this type of flexibility.
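A rough sketch of what a distribution option could look like. The function name sample_ages(), its arguments, and the choice of mean (range midpoint) and standard deviation (a quarter of the range) are all assumptions for illustration, not how bin_time() works:

```r
# Hypothetical helper: draw n point age estimates within an age range.
sample_ages <- function(min_ma, max_ma, n = 100,
                        dist = c("uniform", "normal")) {
  dist <- match.arg(dist)
  if (dist == "uniform") {
    return(runif(n, min = min_ma, max = max_ma))
  }
  # Truncated normal via the inverse-CDF method.
  # Assumed parameters: mean at the midpoint, sd of a quarter of the range.
  mu <- (min_ma + max_ma) / 2
  sigma <- (max_ma - min_ma) / 4
  p <- runif(n, pnorm(min_ma, mu, sigma), pnorm(max_ma, mu, sigma))
  ages <- qnorm(p, mu, sigma)
  # Guard against floating-point overshoot at the range edges.
  pmin(pmax(ages, min_ma), max_ma)
}

set.seed(1)
ages_unif <- sample_ages(60, 70, dist = "uniform")
ages_norm <- sample_ages(60, 70, dist = "normal")
```

Truncating the normal keeps every draw inside the occurrence's age range, so the downstream binning logic would not need to change.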
It would be nice to have a way to ensure that we have the MOST equal-length bins possible. This could include some sort of algorithm that checks a bunch of different sets of bins and then compares their sds. I'm not sure on the return on investment here, both in terms of developer time and computation time within the function, but I think it might be useful for more statistically-inclined users.
Is your feature request related to a problem? Please describe.
tax_unique() currently verifies repetition based on one taxonomic level at a time. This means that, for example, genera with the same name but in different orders, would currently be collapsed into a single genus.
Describe the solution you'd like
The 'if' statements which cross-check taxon names need to be made more nuanced to allow checking across multiple taxon levels.
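To illustrate the problem with a toy example (the data frame and its taxon names are made up): checking uniqueness on the genus column alone collapses homonyms, whereas checking the combination of order and genus keeps them apart.

```r
# Made-up occurrences: the same genus name appears in two different orders.
occdf <- data.frame(
  order = c("OrderA", "OrderB", "OrderA"),
  genus = c("Genus1", "Genus1", "Genus1"),
  stringsAsFactors = FALSE
)

# One level at a time: the two homonymous genera are collapsed.
n_single <- length(unique(occdf$genus))

# Across levels: unique combinations of order and genus are kept apart.
n_multi <- nrow(unique(occdf[, c("order", "genus")]))
```

Here n_single is 1 and n_multi is 2.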
rot_age, rot_lng and rot_lat columns are generated by default in the palaeorotate function. However, these are not used by the "point" method and should therefore only be generated if the "grid" method is specified. Small bug, but annoying.
Is your feature request related to a problem? Please describe.
tax_unique() currently provides a list of 'unique' taxon names, and does not utilise occurrence information.
Describe the solution you'd like
An alternative output format which provides occurrence information as nested lists within each 'unique' taxon might be useful.
A function for checking the taxonomy of occurrence data. This might not be necessary with all of the taxonomic packages out there (e.g., taxize) and the fossilbrush package which seems to be devoted specifically to this problem.
Right now tax_expand_time() only supports GTS 2012 and GTS 2020, but we should allow it to support a custom user time bin dataframe like bin_time() does.
The lat_bins() function is now ready for auditing. Sofía and Lucas could you please now audit the code/documentation and check it is behaving as expected?
Please also document any tests you throw at it and bugs you find. This will help with developing the automated testing for later.
For now, I would suggest creating your own branch and proofing the code there. However, if you prefer we can also meet online to discuss any issues that need resolving.
Thank you!!!
When calling multiple GPMs at once, there seems to be a binding issue with the palaeocoordinates. Note, this is not an issue if palaeorotations are generated iteratively and must be related to chunk handling. This should be resolved quickly.
Documentation for the MULLER2019 model has been updated on GPlates Web Service stating that the MULLER2019 model covers 0–250 Ma (as the paper also states). However, the API service allows points to be reconstructed up to 540 Ma for this model. A little digging required...
This should be addressed for v1.1.1.
As a side note, the MULLER2022 model is also now available and perhaps should be incorporated down the line.
Is your feature request related to a problem? Please describe.
This problem was raised by Meghan Jenkinson (via X): an R-based way to plot stratigraphic range and occurrence data for an individual section.
Describe the solution you'd like
An extension of tax_range_time which plots ranges across beds within a section, including points indicating specific sampled levels, and ideally with open points for uncertain identifications.
Describe alternatives you've considered
These figures are common but are typically made by hand for manuscripts. It should be possible to generate them automatically from the input data.
Additional context
I already have some code from Alex Dunhill; I will be working on a full draft over the next couple of weeks.
The palaeorotate() function is now at a place I am happy with. Bethany, Emma, and Chris could you please now audit the code/documentation and check it is behaving as expected?
Please also document any tests you throw at it and bugs you find. This will help with developing the automated testing for later.
Three might seem overkill for checking this function, but I think this could perhaps be one of the most used functions. As such, I would like to ensure I haven't made a mistake somewhere.
For now, I would suggest creating your own branch and proofing the code there. However, if you prefer we can also meet online to discuss any issues that need resolving.
Thank you!!!
Currently, the reconstruction files are based on a 1° x 1° spatial grid. This should perhaps be updated to use a discrete equal-area grid. As implemented, points at high latitudes will be linked to reconstruction files at a higher geographic resolution than those at low latitudes, because grid cells of fixed angular size cover less area towards the poles.
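As a back-of-the-envelope check on how much the cells shrink, here is a small sketch that treats the Earth as a sphere. The helper name cell_area() and the radius of 6371 km are assumptions for illustration:

```r
# Area (km^2) of a cell bounded by two latitudes and spanning `dlon` degrees
# of longitude on a sphere. The radius is an approximate mean Earth radius.
cell_area <- function(lat1, lat2, dlon = 1, radius = 6371) {
  to_rad <- pi / 180
  radius^2 * (dlon * to_rad) * abs(sin(lat2 * to_rad) - sin(lat1 * to_rad))
}

area_equator <- cell_area(0, 1)
area_polar <- cell_area(80, 81)
ratio <- area_polar / area_equator
```

Under these assumptions, a 1° x 1° cell at 80 to 81 degrees latitude covers roughly a sixth of the area of a cell at the equator, which is the resolution mismatch an equal-area grid would remove.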
Currently edge polygons are warped when they wrap around the antimeridian (-180/180). It would be nice to have better functionality for handling this. This is a known issue when representing a spheroid in 2D and is generally only a problem for visualisation purposes. However, it should be resolved at some point.
Functionality of the function
A function to bin occurrence data into spatial bins. I know @LewisAJones and @bethany-j-allen have their own ways of doing this. Perhaps it would also be useful to incorporate other methods, despite potential reservations (e.g., rectangular bins, the Close et al. MST method, etc).
We should go through the package and check all of the function and argument names for consistency before the CRAN release.
Multi-model call in palaeorotate for the "point" method does not return all requested model coordinates, only the last called model.
Develop time_binning() function
Functionality:
In deeptime::coord_geo(), there is the option to use ggfittext to scale the labels such that they don't overlap with one another and fit within their boxes. I'm not aware of something similar for base R, but we could try to emulate it for axis_geo(). @KEichenseer looked into it during development but couldn't figure out a solution, so it might not even be possible in base R?
With the devel version of sf and terra, GDAL 3.6.0, this:
00check.log
testthat.Rout.zip
The test makes no sense to me, and setup-data.R also fails:
> library(palaeoverse)
> require(divDyn, quietly = TRUE)
> stages <- deeptime::stages
> periods <- deeptime::periods
> data(corals)
Warning message:
In data(corals) : data set 'corals' not found
> corals_stages_clean <- subset(corals, stage != "")
Error in subset(corals, stage != "") : object 'corals' not found
> coral_div <- aggregate(cbind(n = genus) ~ stage,
+ data = corals_stages_clean,
+ FUN = function(x) length(x))
Error in eval(m$data, parent.frame()) :
object 'corals_stages_clean' not found
This probably explains the object 'coral_div' not found error. In my case, there is no package called 'divDyn', but you do not condition on success. Either rewrite the test framework to respect settings where _R_CHECK_FORCE_SUGGESTS_=FALSE, or elevate packages needed for testing.
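One way the setup code could condition on the suggested package, sketched here with the same objects as the transcript above. This is only an illustration of the pattern, not the project's actual setup-data.R:

```r
# Only build the divDyn-based fixtures when divDyn is actually installed.
has_divdyn <- requireNamespace("divDyn", quietly = TRUE)

coral_div <- NULL
if (has_divdyn) {
  # Load the dataset from divDyn explicitly instead of relying on attach.
  data("corals", package = "divDyn", envir = environment())
  corals_stages_clean <- subset(corals, stage != "")
  coral_div <- aggregate(cbind(n = genus) ~ stage,
                         data = corals_stages_clean,
                         FUN = function(x) length(x))
}
```

Tests that need coral_div could then start with testthat::skip_if_not_installed("divDyn"), so they are skipped rather than failing when the suggested package is absent.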
Right now, tax_unique() only allows for family, class, and order, but I imagine lots of datasets out there have other higher taxonomic levels (e.g., subfamily). I think we should leave genus/species/binomial/name the way it is, but collapse the higher level arguments to a single argument that takes a vector, then just loop over those similar to how we do now with order/class/family. Since you would use indet. for any of those, we should theoretically be able to take any arbitrary set of ordered higher levels of taxonomy.
The time_bins() function is now ready for auditing. Kilian and Alessandro could you please now audit the code/documentation and check it is behaving as expected?
Please also document any tests you throw at it and bugs you find. This will help with developing the automated testing for later.
For now, I would suggest creating your own branch and proofing the code there. However, if you prefer we can also meet online to discuss any issues that need resolving.
Thank you!!!
Copied from this guide:
Loading the package now results in the following message:
The legacy packages maptools, rgdal, and rgeos, underpinning this package
will retire shortly. Please refer to R-spatial evolution reports on
https://r-spatial.org/r/2023/05/15/evolution4.html for details.
This package is now running under evolution status 0
My best guess is that this is due to our dependency on geosphere which uses sp? It doesn't look like geosphere has any intent on moving to sf, so maybe we need to find another package to use?
More details: https://r-spatial.org/r/2023/05/15/evolution4.html
Edit: Looks like this functionality is built into sf: https://cran.r-project.org/web/packages/sf/vignettes/sf7.html
Is your feature request related to a problem? Please describe.
Quantifying ghost ranges is useful for thinking about the sampling completeness of a dataset during exploration.
Describe the solution you'd like
This could include two facets:
Describe alternatives you've considered
I'm not sure if anything like this already exists - perhaps there might be something in a package more geared towards biostratigraphy?
Additional context
None.
Develop function for generating quick (palaeo-)latitudinal plots. The basis of the function is almost there but it still requires:
If you have any other suggestions/additions Sofía, please feel free to go ahead and implement them.
Tasks to be complete for palaeorotation function
A function for removing duplicate taxa from a dataset. I believe this is already performed by the fossilbrush package, but I'll leave it up to the assignees to determine if there are any gaps that still need to be filled.
The original idea of this function was to provide a conversion table for several regional and international time scales (similar to GeoWhen). This seems like a pretty lofty goal, and maybe focusing on a smaller number of time scales would be more tractable (for now). Alternatively, using interval ages to correlate across time scales instead of (bio)stratigraphic correlations might be more manageable and easily updatable?
Is your feature request related to a problem? Please describe.
Kateryn Pino on our Google Group requested additional customisability of plots in tax_range_time. This might be desirable for some users to refine plots for publications.
Describe the solution you'd like
Basically, we should allow greater customisability. I guess we could pass standard plot arguments using ... since it seems it would be the cleanest way.
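A minimal sketch of the ... approach with base graphics. The wrapper plot_ranges(), its arguments, and the default labels are hypothetical and do not reflect how tax_range_time is written:

```r
# Hypothetical wrapper: defaults are set internally and anything the user
# supplies through `...` (as named arguments) overrides them.
plot_ranges <- function(max_ma, min_ma, ...) {
  n <- length(max_ma)
  defaults <- list(
    x = 0, y = 0, type = "n",
    xlim = c(max(max_ma), min(min_ma)),  # reversed axis: older on the left
    ylim = c(0.5, n + 0.5),
    xlab = "Time (Ma)", ylab = "Taxon", main = ""
  )
  args <- utils::modifyList(defaults, list(...))
  do.call(plot, args)
  # Draw one horizontal range line per taxon.
  segments(x0 = max_ma, y0 = seq_len(n), x1 = min_ma, y1 = seq_len(n))
  invisible(args)
}

used <- plot_ranges(c(100, 80), c(60, 20), main = "Ranges")
```

Merging the user's arguments over a list of defaults avoids the "matched by multiple actual arguments" error that appears when the same argument is both hard-coded and passed through ...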
Describe alternatives you've considered
An alternative would be to set up a number of expected customisable arguments (e.g. colour, title, etc), but this seems unnecessary and a lot more work.
Additional context
Nothing to add.
@KEichenseer @ChristopherDavidDean happy to review if I put a PR together?
Describe the bug
For the interval name Atdabanian/Botomian in interval_key, the early stage and late stage are incorrect. Currently, the early stage is "Stage 3" and the late stage "Stage 2". This should be the opposite.
To Reproduce
interval_key[which(interval_key$interval_name == "Atdabanian/Botomian"), ]
I think the palaeorotate() function is more or less there now!
Lucas, could you take a look through the code and documentation? Given your past 6 months of work, you're probably in the best position to give this an initial check before we pass it onto the team for formal review! We also need to think about the comparison (using all PBDB data) between rotations generated via the function, and actually using GPlates (essentially, point rotations vs. grid rotations). Although, I think this is something more for the eventual manuscript.
Merci!
According to this app found by @willgearty, our CONTRIBUTING.md is not up to scratch. This should be updated going forward.
It would be great to have a function that takes interval names, checks for spelling mistakes (similar to tax_check), and cleans up interval names. For example, interval names are sometimes provided as time1/time2 or time1–time2 in a single column, or contain information that the user might want to discard (e.g. early Maastrichtian might want reducing to Maastrichtian).
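A rough sketch of the clean-up part only (no spell checking). The helper name clean_intervals() is hypothetical, and it assumes that lower-case qualifiers ("early", "middle", "late") mark informal subdivisions, whereas capitalised ones such as "Late Cretaceous" are formal names that should be kept:

```r
# Hypothetical helper: split combined interval names and drop informal,
# lower-case qualifiers. Returns a list with one character vector per input.
clean_intervals <- function(x) {
  # Split on "/" or an en dash, allowing surrounding spaces.
  parts <- strsplit(x, "[[:space:]]*(/|\u2013)[[:space:]]*")
  lapply(parts, function(p) {
    p <- trimws(p)
    # Case-sensitive on purpose, so "Late Cretaceous" is left untouched.
    sub("^(early|middle|late)[[:space:]]+", "", p)
  })
}

raw <- c("early Maastrichtian", "Campanian/Maastrichtian",
         "Albian\u2013Cenomanian", "Late Cretaceous")
cleaned <- clean_intervals(raw)
```

The cleaned names could then be matched against a reference table of interval names to catch spelling mistakes.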
A function for calculating the temporal, latitudinal, and/or spatial range of taxa (based on occurrence data).
I can envision a function that is like tax_time_expand but for latitudinal bins, although I'm not sure range-through is a great assumption for spatial studies. (Also, both functions should probably have their names changed to tax_expand_{spat/time} to mirror our verb construction of other function names.)
We should add a vignette to showcase an example workflow using all/most of the functions in the package.
At some point, we should establish a website for the package (after we get the first version up and running). This can be done very easily using pkgdown (https://pkgdown.r-lib.org/articles/pkgdown.html). An example of how this can look: https://tidyverse.tidyverse.org/.
The plot that is output by bin_space appears to have some dateline problems.
Here is the output of the current documented example:
# Get internal data
data("reefs")
# Reduce data for plotting
occdf <- reefs[1:250, ]
# Bin data using a hexagonal equal-area grid
ex1 <- bin_space(occdf = occdf, spacing = 500, plot = TRUE)
# Bin data using a hexagonal equal-area grid and sub-grid
ex2 <- bin_space(occdf = occdf, spacing = 1000, sub_grid = 250, plot = TRUE)
Is your feature request related to a problem? Please describe.
For checking the tip names in a phylogeny against a list of taxa and trimming the phylogeny if desired.
Describe the solution you'd like
A table cross-matching a list of taxon names with the list of tip names, with an additional option to trim the phylogeny.
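A minimal sketch of the cross-matching table using plain character vectors. The tip and taxon names below are example values only, and no tree object is involved:

```r
# Example names only: tip labels from a tree and a separate taxon list.
tips <- c("Tyrannosaurus_rex", "Triceratops_horridus",
          "Velociraptor_mongoliensis")
taxa <- c("Tyrannosaurus_rex", "Velociraptor_mongoliensis",
          "Stegosaurus_stenops")

# One row per name, flagging where each name occurs.
all_names <- sort(union(tips, taxa))
cross <- data.frame(
  name = all_names,
  in_tree = all_names %in% tips,
  in_list = all_names %in% taxa,
  stringsAsFactors = FALSE
)

# Tips that are in the tree but absent from the taxon list.
to_drop <- cross$name[cross$in_tree & !cross$in_list]
```

For the optional trimming step, the to_drop vector could be passed to ape::drop.tip() if the phylogeny is stored as an ape phylo object.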
Describe alternatives you've considered
I don't know of other functions that do this - it might be available in e.g. paleotree - but I think would make a nice complement to the rest of the functions here anyway.
Additional context
None
Is your feature request related to a problem? Please describe.
Understanding the abundance distribution within a set of occurrences is a fundamental way of exploring a dataset, but we currently don't have any functions that do this.
Describe the solution you'd like
It would be great to have a function that summarises the abundance distribution in an occurrence dataset and outputs:
Describe alternatives you've considered
iNEXT has functions to create strings but it would be nice to have a more flexible function specifically designed to do this from a wider range of input formats.
Additional context
It would be great to expand this to model fitting on the distribution, but this is straying into analysis rather than data exploration.