librarycarpentry / lc-data-intro

Library Carpentry: Introduction to Working with Data (Regular Expressions)

Home Page: https://librarycarpentry.org/lc-data-intro/

License: Other

carpentries library-carpentry lesson data-management regular-expressions regex english stable

lc-data-intro's People

Contributors

bkmgit, ccronje, danmichaelo, emcaulay, erinbecker, ernstki, fdsayre, gvwilson, jcszamosi, jt14den, kevintfrench, kmiller621, libcce, lsult, marwahaha, mbkerr, miku, mkuzak, mlandryacenet, naupaka, philreeddata, pitviper6, ppival, sharilaster, tobyhodges, varachkina, weaverbel, wking, yvonnemery, zkamvar

lc-data-intro's Issues

Adding more context to organi[sz]e example

As part of the checkout process I propose a rephrasing in the Regular Expressions (lesson 1) segment (https://librarycarpentry.org/lc-data-intro/01-regular-expressions/index.html).

The existing training states:

But it would also match reorganise, reorganize, organises, organizes, organised, organized, etc.

However, it does not explain why it would match all those words instead of just organize and organise.

I propose a change to:

But because it is looking specifically for all the characters in the document that match that pattern, not just for that word, it would also match reorganise, reorganize, organises, organizes, organised, organized, etc.
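The behaviour described in the proposed wording can be checked with a minimal sketch. It assumes Python's `re` module, which the lesson itself does not prescribe, and the word list (including the non-matching "organic") is made up for illustration.

```python
import re

# Pattern from the lesson: "organi", then either "s" or "z", then "e".
pattern = re.compile(r"organi[sz]e")

# Illustrative word list (not taken from the lesson).
words = ["organise", "organize", "reorganise", "organizes", "organised", "organic"]

# search() looks for the pattern anywhere inside each string,
# so longer words that merely contain it also count as matches.
matches = [w for w in words if pattern.search(w)]
print(matches)
# ['organise', 'organize', 'reorganise', 'organizes', 'organised']

# Adding word boundaries (\b) restricts the match to the whole word.
whole_word = re.compile(r"\borgani[sz]e\b")
exact = [w for w in words if whole_word.search(w)]
print(exact)
# ['organise', 'organize']
```

The second pattern is only there for contrast: it shows what would be needed to match just the two words a learner might expect.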

Review 02-jargon-busting.md

Consider referencing NNLM's data thesaurus as a place to look up jargon in post-carpentry situations. This could also expose learners to terms that did not come up during group chats. Link to the data thesaurus here: https://nnlm.gov/data/thesaurus

Problem working through final exercise in 04-regular-expressions

I ran into a problem when working through the exercise "Extracting a substring in Google Sheets using regex".
I followed the 2014 Public Library Survey link in step 1 and downloaded a zip file, which contains 3 CSV files. The exercise does not say which file to import into Google Sheets. I checked all three of them but cannot find a LOCATION column in any of the files. I'm not sure whether I have downloaded what was intended to be used in this exercise.

Concern about geek/non-geek comic

In 03-foundations there is a comic: https://twitter.com/visualisingdata/status/621957383464599552/photo/1. I think this comic is a bad fit for the lesson/library carpentry in general for a few reasons, and suggest that we remove it and find a different way of making the point that automation is not always efficient.

My concerns:

(1) The Carpentries are committed to the idea that anyone can code; many of our learners don't identify as geeks, and they shouldn't feel like they have to. This also isolates folks who don't feel like geeks and teaches the implicit lesson that they don't belong.
(2) It talks about "winning" and "losing", which I think is the wrong language around learning and experimenting. It's not a competition.
(3) It's a false binary!


Clarifying first regex example

In the first episode on Regular Expressions, the example of the organi[sz]e regular expression at the end of the first section is a bit confusing.

There's some inconsistency with the last sentence of the first section where reorganise, reorganize, organises, organizes, organised, organized are formatted as code, which looks like the regex in the previous sentence, but I believe in this example they are supposed to be other strings that would match.

I would also be interested in helping to restructure the lesson to clarify some simpler concepts first, e.g. that capitalizing a word (or not) makes a difference because upper case and lower case are treated as different characters in a string, and that ASCII has special characters for tab, space and newline. More complicated concepts like escape characters can then be discussed after that. Generally, I think it would help to space out the introduction of the different types of metacharacters throughout the episode, since it's a lot to take in all at once.
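As a rough illustration of the simpler concepts mentioned above, here is a small sketch assuming Python's `re` module (an assumption, since the lesson does not name an engine). The sample text is invented.

```python
import re

# Invented sample text containing a tab, a newline and a space.
text = "Organise\torganise\nORGANISE organise"

# Upper- and lower-case letters are different characters, so this
# pattern only finds the all-lower-case spellings.
lower_only = re.findall(r"organise", text)
print(len(lower_only))  # 2

# A character class accepts either case for the first letter.
either_initial = re.findall(r"[Oo]rganise", text)
print(len(either_initial))  # 3

# Tab, newline and space are characters with their own ASCII codes.
codes = {name: ord(ch) for name, ch in [("tab", "\t"), ("newline", "\n"), ("space", " ")]}
print(codes)  # {'tab': 9, 'newline': 10, 'space': 32}
```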

June 2019 Lesson Release checklist

If your Maintainer team has decided not to participate in the June 2019 lesson release, please close this issue.

To have this lesson included in the 18 June 2019 release, please confirm that the following items are true:

  • Example code chunks run as expected
  • Challenges / exercises run as expected
  • Challenge / exercise solutions are correct
  • Call out boxes (exercises, discussions, tips, etc) render correctly
  • A schedule appears on the lesson homepage (e.g. not “00:00”)
  • Each episode includes learning objectives
  • Each episode includes questions
  • Each episode includes key points
  • Setup instructions are up-to-date, correct, clear, and complete
  • File structure is clean (e.g. delete deprecated files, ensure filenames are consistent)
  • Some Instructor notes are provided
  • Lesson links work as expected

When all checkboxes above are completed, this lesson will be added to the 18 June lesson release. Please leave a comment on carpentries/lesson-infrastructure#26 or contact Erin Becker with questions ([email protected]).

swcCoC.md is missing again in 04-regular-expressions

There is a dead link in _episodes/04-regular-expressions.md: the reference to 'swcCoC.md file' still exists, but the file was deleted in commit "Remove CoC regex exercise specific to Calgary workshop" 01e1681.

Based on a related closed ticket from 2018 (#35), I assume the file is needed and was deleted by mistake. I'm going to create a pull request that restores the file.

Repurpose lesson to regular expressions, move intro, jargon, foundations to overview

"Can you guess..." language for exercises

I would suggest that this language be changed to something more straightforward, for example from "Can you guess what the regular expression ^[Oo]rgani.e\b will match?" to "What will the regular expression ^[Oo]rgani.e\b match?"

The current language seems to imply that they'll be guessing answers instead of reasoning them out (possibly implying that they couldn't reason them out and would have to guess).

This language is throughout the 04_regular_expressions module. I haven't checked the others.
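To show that the answer can be reasoned out and then verified, here is a minimal sketch that tests the quoted expression against a few candidate strings. It assumes Python's `re` module, and the candidate strings are made up for illustration.

```python
import re

# The expression quoted in the exercise.
pattern = re.compile(r"^[Oo]rgani.e\b")

# Made-up candidate strings to reason about, then verify.
candidates = ["organise", "Organize", "organife", "Organi2e",
              "reorganise", "organises", "ORGANISE"]

results = {s: bool(pattern.search(s)) for s in candidates}
for s, matched in results.items():
    print(f"{s:12} {matched}")

# ^     : the match must start at the beginning of the string
# [Oo]  : an upper- or lower-case "o"
# rgani : these literal characters
# .     : any single character
# e\b   : an "e" that ends the word
```

Under these assumptions the first four strings match, while "reorganise" (does not start with o), "organises" (the e does not end the word) and "ORGANISE" (wrong case) do not.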

Give this lesson a clear purpose

As we've removed the 'history' stuff from this lesson and as we know that many people teach this a different way, this year is a good time to think about what we really want to achieve with this lesson. For me the keys are:

  • an ice breaker that helps attendees feel comfortable about what they do/don't know.
  • a simple intro to 'programming', programmatic thinking, and how an LC workshop works ('regex' works well)
  • an opportunity to think about the connection between library work and programmatic tasks.

Include information and simple example of using backreferences

I've found that once people learn about backreferences and their potential use, they begin to see some of the incredibly time-saving things that can be accomplished with regular expressions.

I think a very basic example of using backreferences would be a good addition. Granted this addition might not be feasible within the constraints of 20 minutes of teaching time and 25 minutes for exercises and practice. However, I do think that the topic of regular expressions could warrant a longer, more involved lesson.
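A very basic example of the kind suggested might look like the sketch below. It assumes Python's `re` module, where `\1` and `\2` refer back to the first and second parenthesised groups; other tools sometimes write these as `$1` and `$2`. The names and the sentence are invented sample data.

```python
import re

# Invented sample data in "Surname, Forename" order.
names = ["Lovelace, Ada", "Hopper, Grace"]

# Each pair of parentheses captures a group; \2 and \1 in the
# replacement put the captured text back in swapped order.
swapped = [re.sub(r"^(\w+), (\w+)$", r"\2 \1", n) for n in names]
print(swapped)  # ['Ada Lovelace', 'Grace Hopper']

# A backreference can also be used inside the pattern itself,
# here to find a word that is immediately repeated.
doubled = re.findall(r"\b(\w+) \1\b", "the the catalogue of of records")
print(doubled)  # ['the', 'of']
```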

Broken formatting in the Regex lesson

The markdown formatting needs to be fixed in the Challenge/Solution section.
Ex:
Or, any other string that starts a line, begins with a letter o in lower or capital case, proceeds with rgani, has any character in the 7th position, follows with letter e and zero or more characters from the range [A-Za-z0-9]. {: .solution} {: .challenge}

Rename "Data Intro for Librarians" to "for Libraries"?

The title "Data Intro for Librarians" leaves out a significant group of people who work in libraries but who aren't librarians. This can be a point of contention at some libraries and can leave people feeling unwelcome. I don't know if "For Libraries" is the best wording, but for Librarians could be alienating for library staff and other specialists without ML(I)Ss.

URLs and filenaming discussion in 03_foundations

In reading through the 03_Foundations section, I found the filenaming discussion hard to understand. I deal constantly with filenaming in my work, but I didn't understand the principles being presented here well enough to know how to revise. As a librarian reading that section, I also had no idea of how to name a file after reading it. I'll try to come back to this topic after reviewing the entire lc_data_intro episodes.

Replace Foundations w/ Why automate? and File naming

As discussed @ccronje @sharilaster @ppival, Foundations is now two episodes:

The File naming episode is a PR that needs to be approved/merged by one/all of you. If we are fine with this, I can update the episodes to reflect these changes. @sharilaster I included a very basic challenge of working with Markdown in File naming. What do you think? @ccronje I added a link to the UNIX Shell lesson at the end of File naming. Is this what you were thinking?

Rework first lesson 01-introduction.md - move info about library carpentry to website or workshop template

As we develop Library Carpentry into a lesson organization, we should move information about the program out of the intro lesson and into the website or workshop template. This first episode should be rewritten to include an intro to the first lesson, "data intro". One suggestion is to supply an intro to what will be covered in the lesson:

  • breaking down jargon tech terms that confuse
  • why we'd want to automate things thru computation
  • how using flat files and naming conventions enables automation
  • why pattern matching is powerful

present Markdown as a useful tool

This is a reframing of #51 for #mozsprint, building on suggestions from @drjwbaker @ccronje @libcce and others in that thread to improve the presentation of Markdown as a useful tool for Library Carpentry audiences.

The Foundations episode (https://librarycarpentry.org/lc-data-intro/03-foundations/index.html)
already includes a section titled "Use machine readable plain text notation for formatting."

This section should be reworked to briefly explain why Markdown in particular is useful for library settings, point to additional resources, and possibly could include a brief illustration/example.

A paragraph that lacks context

In 02-match-extract-strings.md the paragraph below is under the heading "Exercise finding phone numbers..." but I do not see how it relates to that heading. My suggestion is simply to add a new heading above the paragraph along the lines of "Using regular expressions when working with files and directories"

This is the paragraph in question:
"One of the reasons we stress the value of consistent and predictable directory and filenaming conventions is that working in this way enables you to use the computer to select files based on the characteristics of their file names. For example, if you have a bunch of files where the first four digits are the year and you only want to do something with files from '2017', then you can. Or if you have 'journal' somewhere in a filename when you have data about journals, you can use the computer to select just those files. Equally, using plain text formats means that you can go further and select files or elements of files based on characteristics of the data within those files."
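The selection described in that paragraph could be sketched as follows, assuming Python's `re` module and a hypothetical set of file names that begin with a four-digit year. In practice the list might come from something like `os.listdir()`.

```python
import re

# Hypothetical file names following a "YYYY-description" convention.
filenames = [
    "2016-journal-usage.csv",
    "2017-journal-usage.csv",
    "2017-loans.tsv",
    "2018-journal-titles.csv",
    "notes-2017.txt",
]

# re.match only matches at the start of the name, so "notes-2017.txt"
# is not selected even though it contains "2017".
from_2017 = [f for f in filenames if re.match(r"2017", f)]
print(from_2017)  # ['2017-journal-usage.csv', '2017-loans.tsv']

# re.search finds "journal" anywhere in the name.
journals = [f for f in filenames if re.search(r"journal", f)]
print(journals)
# ['2016-journal-usage.csv', '2017-journal-usage.csv', '2018-journal-titles.csv']
```

The example only works because the names are consistent: a file that breaks the convention, like "notes-2017.txt", is silently left out.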

No mention of the many regular expression engines in use nor the engine used in the lesson's examples

I'd recommend mentioning, at least in passing, that there are many different regular expression engines in common use and that each engine has features and syntax that, while often quite similar, do differ from each other in meaningful ways.

Additionally, there is no mention of the specific engine used for the lesson's examples. Granted, the basic examples used in the lesson will work with any Perl-like engine (the lesson's suggested online tools mostly employ PCRE and JavaScript), but it would be good to plant the idea in students' heads that they might need to learn a particular tool's or language's regular expression implementation before using more advanced regex features.

A couple links that describe the varying features of some of the many regular expression engines in use:
https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines (basic overview as one would expect on wikipedia)
https://www.regular-expressions.info/refflavors.html (nice, detailed reference)
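One small, hedged illustration of such a difference is the spelling of named groups. Python's `re` module writes them as `(?P<name>...)`, while JavaScript and PCRE also accept `(?<name>...)`. The sketch below does not assume how a given Python version treats the second spelling; it simply tries it and reports the outcome.

```python
import re

date = "2019-06-18"

# Python's `re` module spells a named group as (?P<name>...).
python_style = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", date)
year = python_style.group("year")
month = python_style.group("month")
print(year, month)  # 2019 06

# JavaScript and PCRE accept (?<name>...). Whether this Python
# accepts that spelling is checked here rather than assumed.
try:
    re.compile(r"(?<year>\d{4})")
    other_style_accepted = True
except re.error:
    other_style_accepted = False
print("(?<name>...) accepted by this engine:", other_style_accepted)
```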

typo: dentifying rather than identifying

In the Where To Go for Help section:
Begin by dentifying people on your table who can help: you will all be working from the same material, so someone around you may have mastered the point you are stuck at.

dentifying should be identifying

Add info about Markdown and perhaps a demo of its use?

When we talk about plain text formats, I suggest we introduce Markdown. We can point to Markdown being the tool that builds Library Carpentry lessons while still being a plain text format.

I often create (or open) a small Markdown doc on the fly where I format a bulleted list, a numbered list, add an image, add a Web link, and make some text bold and italic and show three sizes of headings. Then I run the file through pandoc and output that single file as a PDF, a Web page and also open it as a formatted Word document. This generally gets people excited about one file, many purposes. What do people think? Too much?

Error in 04-regular-expressions intro section

There is a missing character that will cause confusion to learners in the last paragraph of the opening section of Regular Expressions:

"For example, the period (.) means “match any character”, but if you want to match a period (.) then you will need to use a “" in front of it to signal to the regular expression processor that you want to use the period as a plain old period and not a metacharacter.

There should be a \ between the empty quotation marks in the middle of that sentence.
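The difference the backslash makes can be seen in a short sketch, assuming Python's `re` module and an invented sample string:

```python
import re

# Invented sample text.
text = "lesson.md lesson-md lessonXmd"

# Unescaped, the period is a metacharacter: any single character matches.
unescaped = re.findall(r"lesson.md", text)
print(unescaped)  # ['lesson.md', 'lesson-md', 'lessonXmd']

# Escaped with a backslash, it matches only a literal period.
escaped = re.findall(r"lesson\.md", text)
print(escaped)  # ['lesson.md']
```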

Address accessibility/universal design in the lesson

Watch Carli Spina talk on Universal Design for Learning and determine if there are common issues/improvements to address in the lesson.

Universal Design for Learning
In this Carpentries in Libraries community call, Carli Spina talks about Universal Design for Learning (UDL). She provides an overview of UDL and examples of how it can be applied to Carpentries lessons.
Slides: https://github.com/LibraryCarpentry/governance/blob/master/community-calls/2019-03-07-Spina-UDL.pdf
Video: https://www.youtube.com/watch?v=56rhFeU5-Ig&feature=youtu.be
