openintrostat / openintro

📦 R package for data and supplemental functions for OpenIntro resources

Home Page: http://openintrostat.github.io/openintro/

License: GNU General Public License v3.0

R 100.00%
openintro rstats rstats-package data

openintro's Introduction

Supplemental functions and data for OpenIntro resources, which include open-source textbooks and resources for introductory statistics at openintro.org. The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. The package also contains the datasets used in OpenIntro labs. Note that many functions and examples use color transparency; some plotting elements may not show up properly (or at all) in some versions of the Windows operating system.

Installation

You can install the released version of openintro from CRAN with:

install.packages("openintro")

You can install the development version of openintro from GitHub with:

# install.packages("devtools")
library(devtools)
install_github("OpenIntroStat/openintro")

This package was produced as part of the OpenIntro project. For the accompanying textbook, visit openintro.org. A PDF of the textbook is free and paperbacks can be purchased online (royalty-free).

Questions, bugs, feature requests

You can file an issue to get help, report a bug, or make a feature request.

When filing an issue to get help or report a bug, please make a minimal reproducible example using the reprex package. If you haven’t heard of or used reprex before, you’re in for a treat! Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it’ll take you to learn what it’s all about). For additional reprex pointers, check out the Get help! section of the tidyverse site.

Before opening a new issue, be sure to search issues and pull requests to make sure the bug hasn’t been reported and/or already fixed in the development version. By default, the search will be pre-populated with is:issue is:open. You can edit the qualifiers (e.g. is:pr, is:closed) as needed. For example, you’d simply remove is:open to search all issues in the repo, open or closed.

Contributing

Process for adding new data to the package

The following steps rely on the devtools and usethis packages. We recommend using this process when suggesting new datasets to be added to the package. If the dataset is large (>500MB) or you’d like to add a function, please open an issue for discussion before making the pull request.

  1. Fork and clone the repo with usethis::create_from_github("OpenIntroStat/openintro")
    • Note: If you have write access to the repo, you can skip this step.
  2. Start a new pull request with usethis::pr_init("BRANCH-NAME"), where BRANCH-NAME is an informative branch name.
  3. If adding a file that is not an .rda file to begin with (Excel, csv, etc.), create a folder in the data-raw folder with the name of the dataset (how you’d like it to show up in the package). Please use snake_case for naming, e.g. name_of_dataset.
  4. Place your dataset in its raw form in the folder.
  5. Also in the data-raw folder, create a new R script called name_of_dataset-dataprep.R and write the code needed to read in the file, make any modifications to the data that are needed (if any), and end with usethis::use_data() to save the data in the package as an .rda file with the ideal compression. See examples from other folders in data-raw for sample code. The contents of this folder do not end up in the package (the entire folder is ignored in the .Rbuildignore) so you don’t need to worry about adding package dependencies etc.
  6. In the R folder, create an R script called data-name_of_dataset.R and add documentation using Roxygen style. See other documentation files for help with style. In the examples, use tidyverse syntax, but do not use library(tidyverse); load only the relevant packages, e.g. library(dplyr), library(ggplot2).
  7. Restart R and run devtools::load_all() to make sure the data loads and run your examples to confirm they all work.
  8. Run devtools::document(), restart R, and then devtools::load_all(). Then, check out ?name_of_dataset to make sure the documentation looks as expected.
  9. Run devtools::check(). The only NOTE you should see as a result of the check should be about the package size. If any other ERRORs, NOTEs, or WARNINGs are generated, resolve them or open an issue for help.
  10. In the pkgdown.yml file, add the name of the dataset under reference, in the correct alphabetical order.
  11. Add a note in the NEWS.md with the new dataset you’ve added with a link to your GitHub username so we can acknowledge your contribution, e.g. “added by @mine-cetinkaya-rundel”.
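
As a rough illustration of steps 5 and 6, the two files might look something like the sketch below. Everything specific here is an assumption made for the example: the file paths, the column names, and the three-row CSV written to a temporary file as a stand-in for a real raw dataset. A real dataprep script would read the file placed in data-raw in step 4 and call usethis::use_data() unconditionally; the guard is only there so the sketch can also be run outside a package directory.

```r
# --- data-raw/name_of_dataset/name_of_dataset-dataprep.R (hypothetical) ---

# Stand-in for the raw file from step 4. In practice the raw file already
# sits in data-raw/name_of_dataset/ and is read from there.
raw_path <- tempfile(fileext = ".csv")
writeLines(c("ID,Group,Score", "1,a,3.5", "2,b,4.0", "3,a,NA"), raw_path)

# Read the raw data and make any modifications that are needed.
name_of_dataset <- read.csv(raw_path, stringsAsFactors = FALSE)
names(name_of_dataset) <- tolower(names(name_of_dataset))  # lower-case names
name_of_dataset$group <- factor(name_of_dataset$group)     # labels as factor

# Save the object as an .rda file in the package's data folder.
# Guarded so that the sketch does nothing here when run outside a package.
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(name_of_dataset, overwrite = TRUE)
}

# --- R/data-name_of_dataset.R (hypothetical) ---

#' Short title for the dataset
#'
#' One or two sentences describing what the data are.
#'
#' @format A data frame with 3 rows and 3 variables:
#' \describe{
#'   \item{id}{Row identifier.}
#'   \item{group}{Group label, a factor with levels a and b.}
#'   \item{score}{Numeric score; may be missing.}
#' }
#' @source Where the data came from.
#' @keywords datasets
#' @examples
#' library(dplyr)
#'
#' name_of_dataset %>%
#'   count(group)
"name_of_dataset"
```

The existing scripts in data-raw and the existing documentation files in the R folder remain the authoritative reference for style.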

Code of Conduct

Please note that the openintro project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

openintro's People

Contributors

ameliamn, andrewpbray, beanumber, daviddiez, hardin47, jtr13, mine-cetinkaya-rundel, ngoguened, npaterno, openintroorg, rudeboybert, sjvrensburg, suriyaa

openintro's Issues

dotPlot() collides with mosaic::dotPlot()

From looking at your examples, I'm not exactly sure what the purpose of your dotPlot() is supposed to be, but it is unfortunate that you have chosen a name that conflicts with the version in the mosaic package, which makes the kind of dot plot often seen in introductory statistics courses.

mosaic::dotPlot( ~ rnorm(500), width = 0.1)

As per the Korean font error

Hi,

When I render an image that contains Korean characters, the characters come out broken (see myPDF in variable.R).

However, for instance, when I test the CairoPDF, the Korean characters are rendered correctly, but there are width and height issues.

I think other Asian characters will have similar issues when using the openintro package.

Thank you.

rosling_responses mentioned in text but not present in package

On page 191 the Fourth Edition of the textbook mentions the rosling_responses data set:

"We will use the rosling_responses data set to evaluate the hypothesis test ..."

Use of the texttt font for "rosling_responses" suggests that such a data set exists in the package, but it doesn't.

yrbss documentation

Do we know which year's survey is included in this dataset? Also, do we know if the variable called gender is what's identified in the 2017 data documentation as sex?

I'm happy to do a PR to clarify those things if we can track them down.

Why mask data sets in datasets?

This seems unnecessary and confusing:

library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: ‘openintro’
## 
## The following objects are masked from ‘package:datasets’:
## 
##     cars, chickwts, trees
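
For readers who hit this masking, one workaround (a minimal sketch using only base R, not a change to the package) is to refer to the base version through its namespace and to ask R which names are defined in more than one attached package:

```r
# The copy that ships with base R stays reachable through its namespace,
# no matter which other packages are attached.
base_cars <- datasets::cars
names(base_cars)

# List the names that appear in more than one place on the search path,
# grouped by the search path entry that defines them.
conflicts(detail = TRUE)
```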

code by chapter

Is there a place where I can find the R code by chapter for the openintro book ?

Email data corrections

  • In both email and email50 there are variables in the docs that don't exist in the data: period_mess and signoff -- should be removed from docs
  • email50 example code yields FALSE (random sampling change might be the cause?)
  • In both datasets indicator variables should be factors
  • cc is numeric, not indicator

Add a page with csv download for all data

This would be helpful for non-R users of the datasets.

@DavidDiez I know you host these on openintro.org but keeping synced seems a challenge. I could automate it here and post on the package websites and openintro.org could point to them. Or I suppose you could build the page on your end based on the automatically generated files in this repo as well. We should discuss which approach is preferable, but at least automatically generating files as we update the package seems like a good idea.

Remove message that appears when package loads

Referring to the text that says "Please visit openintro.org for free statistics". It shows up in the compiled markdown documents (as shown below), and yes, it's possible to mute that with the message = FALSE option in the chunk, but I think we want to be careful about teaching those to students who are new to R.

Leaving the issue here to be considered before the next version of the package...

[Bug]: fastfood data has incorrect salad variable

Bug

The fastfood data set has a salad variable with all 515 values "Other". Looking at the item descriptions, it does appear that there are actual salads in the data set.

Reproducible Example

library(openintro)
#> Loading required package: airports
#> Loading required package: cherryblossom
#> Loading required package: usdata
table(fastfood$salad)
#>
#> Other
#> 515

Expected Behavior

I expected to see some foods classified as salads and others not.
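
One possible way to derive such a variable, sketched here on a toy data frame rather than on fastfood itself, is to search the item descriptions for the word "salad". The column name item and the four item names below are assumptions made for illustration, and this is not necessarily how the variable was originally meant to be coded.

```r
# Toy stand-in for fastfood; the item names are invented for illustration.
menu <- data.frame(
  item = c("Grilled Chicken Salad", "Cheeseburger", "Side salad", "Fries"),
  stringsAsFactors = FALSE
)

# Flag an item as a salad when its description mentions "salad" (any case).
menu$salad <- ifelse(grepl("salad", menu$item, ignore.case = TRUE),
                     "Salad", "Other")
table(menu$salad)
```

A plain text match like this would also flag items such as salad dressings, so the result would need a manual check before replacing the existing variable.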

yrbss isn't in the OpenIntro packages

Hi,
the yrbss data is used in the OpenIntro text.
The yrbss data is available to download on the Github site.
So far as I can tell, the yrbss data hasn't been added to the OpenIntro packages.
Should it be?

qqnormsim() ideas

  1. Use scales = "free" or, better, add a scales argument that defaults to "free". [Otherwise a sample with an outlier will cause the other plots to look quite different from how they would look if they were generated in isolation.]

  2. Don't hard code the number of simulations. Let 8 be the default if you like.

  3. Rename the first argument? It's a bit of an odd name. But I'm guessing it will typically be used without naming, so this is not such a big deal.

  4. Consider a version that doesn't label the original data but makes it one of the sample (randomly selecting which location). Not sure the best way to do the "reveal".

  5. Perhaps add a seed argument that sets the seed used. That would solve the reveal issue in one way, since the plot could be generated again with the original data set distinguished.

  6. Complete the documentation and include examples.
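
To make ideas 2, 4 and 5 concrete, here is a minimal base R sketch of the data-generation half of such a function. It is not the package's qqnormsim(): the name sim_lineup, its arguments, and the "real_panel" attribute are all hypothetical. It hides the real sample at a random position among n_sim simulated normal samples, and a seed makes the lineup reproducible so the position can be revealed later.

```r
# Hypothetical helper, not the package's qqnormsim(): builds the data behind
# a set of normal Q-Q panels with the real sample hidden among simulations.
sim_lineup <- function(x, n_sim = 8, seed = NULL) {
  x <- x[!is.na(x)]
  if (!is.null(seed)) set.seed(seed)   # idea 5: reproducible lineup

  n_panels <- n_sim + 1                # idea 2: number of simulations is an argument
  real_panel <- sample.int(n_panels, 1)  # idea 4: random position for the real data

  panels <- lapply(seq_len(n_panels), function(i) {
    values <- if (i == real_panel) {
      x
    } else {
      rnorm(length(x), mean = mean(x), sd = sd(x))
    }
    data.frame(panel = i, value = values)
  })

  out <- do.call(rbind, panels)
  attr(out, "real_panel") <- real_panel  # kept aside for the "reveal"
  out
}

eruptions <- datasets::faithful$eruptions
lineup_a <- sim_lineup(eruptions, n_sim = 8, seed = 2024)
lineup_b <- sim_lineup(eruptions, n_sim = 8, seed = 2024)
identical(lineup_a, lineup_b)  # the same seed reproduces the same lineup
```

For idea 1, the resulting data frame could then be plotted with ggplot2, for example with stat_qq() mapped to aes(sample = value) and facet_wrap(~ panel, scales = "free"), so that an outlier in one panel does not stretch the axes of the others.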

Add additional citation to BAC

#' @source J. Malkevitch and L.M. Lesser. For All Practical Purposes:

From Jack Miller:

The blood alcohol data set has been around since 1992 and appeared in the Electronic Encyclopedia of Statistical Examples and Exercises. I worked on EESEE and used the data sets at OSU, so I am very familiar with that particular citation. :-) Here is a URL for that particular "story" in EESEE: http://bcs.whfreeman.com/WebPub/Statistics/shared_resources/EESEE/BloodAlcoholContent/index.html.

This change will need to propagate to IMS and other books that reference this dataset as well.
