
binomen's People

Contributors

graceli8, maelle, sckott


binomen's Issues

to jsonld (xml too?)

Function started, but not working yet:

  • try to follow Darwin Core terms
  • meh, for XML we'd need the XML pkg; maybe just stick with JSON-LD
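A minimal base-R sketch of what a JSON-LD export keyed by Darwin Core terms could look like. It assumes a taxon is a named character vector. The helper name to_jsonld_sketch and the field-to-term mapping are hypothetical, not binomen's API. Strings are built by hand only to keep the example dependency-free. A real implementation would more likely use a JSON package such as jsonlite.

```r
# Hypothetical helper, not part of binomen: serialize one taxon (a named
# character vector) to a JSON-LD string keyed by Darwin Core terms.
to_jsonld_sketch <- function(taxon) {
  # Assumed mapping from binomen-style field names to Darwin Core terms
  dwc <- c(kingdom = "kingdom", phylum = "phylum", clazz = "class",
           order = "order", family = "family", genus = "genus",
           epithet = "specificEpithet",
           authority = "scientificNameAuthorship")
  # Keep only mapped fields that actually have a value
  fields <- intersect(names(dwc), names(taxon))
  fields <- fields[!is.na(taxon[fields])]
  # encodeString() wraps each value in double quotes and escapes embedded
  # quotes; fine for plain ASCII names, but not a full JSON encoder
  props <- sprintf('  "dwc:%s": %s', dwc[fields],
                   encodeString(taxon[fields], quote = '"'))
  context <- '  "@context": {"dwc": "http://rs.tdwg.org/dwc/terms/"}'
  paste0("{\n", paste(c(context, props), collapse = ",\n"), "\n}")
}

cat(to_jsonld_sketch(c(family = "Poaceae", genus = "Poa", epithet = "annua")))
```

Missing ranks (NA) are dropped rather than emitted as null. That is one possible choice for a format where most taxa are incomplete.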

Thoughts on binomen

Hi @sckott, you asked for my thoughts on binomen on issue #9 and I thought a new issue might be a better place to respond.

I like the package overall. It has a clean and straightforward style that I think biologists will like.
Below is a list of intermixed comments, suggestions, and potential bugs.

  • The classes assume explicit ranks. I have mixed feelings about this. On one hand, this is intuitive for biologists who might want to characterize data by a familiar rank, like family or genus. Having defined ranks means that you know the order (e.g. does 'aberration' come before 'subform'?). This also gives you a way to force all kinds of data into a table or equal-depth taxonomy, however incomplete, since you know the dimension corresponding to rank. On the other hand, data from some sources, such as database FASTA headers, might have taxon names without their ranks, and some people might be fine not thinking about ranks at all; forcing them to look up ranks via database searches might be too much of a hassle. Or, ranks might be available but their names/order might differ from what is defined in the package, forcing the user to figure out how to convert their raw data before forming it into binomen classes. Also, genus must be defined, yet in some cases you might not know the genus, such as with ambiguously classified OTUs. Overall, it seems to me that explicit ranks are more intuitive but less flexible. I lean towards arbitrary/optional ranks.
  • When there is no species epithet, "none" is printed. How about printing this as "sp." instead?
  • When no species is defined, "none" is printed twice in the species field of the binomial. Is this supposed to happen?
```r
> obj <- make_taxon(genus="Poa", family='Poaceae')
> obj$binomial
<binomial>
  genus: Poa
  epithet: none
  canonical: Poa none
  species: Poa none none
  authority: none
```
  • For span, it would be nice to be able to leave off one end of the range (e.g. df2 %>% span(family, NA) or df2 %>% span(family, )), meaning that all ranks above/below the specified rank are included.
  • You have to specify genus to make a taxon, but not when making a taxondf. This can cause an error when converting a taxondf to taxa with scatter():
```r
> df <- data.frame(order = c('Asterales','Asterales','Fagales','Poales','Poales','Poales'),
+                  family = c('Asteraceae',NA,'Fagaceae','Poaceae','Poaceae','Poaceae'),
+                  genus = c('Helianthus', NA,'Quercus','Poa','Festuca','Holodiscus'),
+                  stringsAsFactors = FALSE)
> df2 <- taxon_df(df)
> scatter(df2)
Error in if (genus == "none") stop("You must supply at least genus") : 
  missing value where TRUE/FALSE needed
```
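The scatter() error arises because the second row has genus NA. In R, NA == "none" evaluates to NA, and if() cannot branch on NA. Below is a minimal sketch of a genus check that treats NA and the "none" placeholder the same way, plus a formatter that prints "sp." instead of "none none". The helpers is_missing, check_genus, and format_species are hypothetical and not binomen's actual functions.

```r
# Hypothetical helpers, not binomen's code.
# TRUE for NULL/empty, NA, or the "none" placeholder; never returns NA,
# because || short-circuits before comparing an NA with "none".
is_missing <- function(x) {
  length(x) == 0 || is.na(x[[1]]) || x[[1]] == "none"
}

check_genus <- function(genus) {
  # Safe inside if(): is_missing() always yields TRUE or FALSE
  if (is_missing(genus)) stop("You must supply at least genus", call. = FALSE)
  invisible(genus)
}

format_species <- function(genus, epithet = NA, authority = NA) {
  check_genus(genus)
  # Fall back to the conventional "sp." when the epithet is unknown
  sp <- if (is_missing(epithet)) "sp." else epithet
  parts <- c(genus, sp)
  # Only append an authority when one is actually known
  if (!is_missing(authority)) parts <- c(parts, authority)
  paste(parts, collapse = " ")
}

format_species("Poa")
format_species("Poa", "annua", "L.")
```

An alternative would be to let taxon_df rows without a genus through as higher-rank-only taxa, in line with the optional-ranks suggestion above.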

Allow filtering with operators <, >, etc.

We have span, e.g., df2 %>% span(order, genus), which takes two rank names and returns the range of data between them.

Would be nice to allow searches like

  • obj %>% select( > family)
  • obj %>% select( < genus)
  • obj %>% select( != tribe)
  • obj %>% select( == tribe)

though this would already work with dplyr for data.frames, but of course not on taxon/etc. objects... hmmmm
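One way to get both rank comparisons and an open-ended span is to store ranks as an R ordered factor. Base R already defines <, >, ==, and != for ordered factors, and it matches a character operand against the factor's levels. The sketch below assumes a hypothetical rank list and a long-format table with one row per rank. The real binomen rank list and object layout may differ, and span_sketch takes quoted rank names rather than the bare names span() uses.

```r
# Hypothetical rank order, highest to lowest; binomen's real list may differ.
rank_levels <- c("kingdom", "phylum", "clazz", "order", "family",
                 "tribe", "genus", "species")

# Toy taxonomy in long form: one row per (rank, name)
tax <- data.frame(
  rank = factor(c("order", "family", "tribe", "genus"),
                levels = rank_levels, ordered = TRUE),
  name = c("Poales", "Poaceae", "Poeae", "Poa"),
  stringsAsFactors = FALSE
)

# Levels run from kingdom down, so "> family" means "below family"
tax[tax$rank > "family", ]   # rows for tribe (Poeae) and genus (Poa)
tax[tax$rank != "tribe", ]   # every row except tribe

# Open-ended span: NA on either side means "no bound on that side"
span_sketch <- function(df, from = NA, to = NA) {
  lo <- if (is.na(from)) rank_levels[1] else from
  hi <- if (is.na(to)) rank_levels[length(rank_levels)] else to
  df[df$rank >= lo & df$rank <= hi, , drop = FALSE]
}

span_sketch(tax, "family", NA)  # family and everything below it
span_sketch(tax, NA, "family")  # family and everything above it
```

Because the ordering lives in the factor levels, the same comparisons would also work inside dplyr::filter() on a data.frame with such a rank column. Supporting them on taxon objects would still need methods written for those classes.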

clazz?

Minor question. There are multiple occurrences of clazz in the README.md. Should this be "class", or are you trying to avoid using the reserved word "class"?
