librarycarpentry / lc-data-intro

29.0 16.0 84.0 11.29 MB

Library Carpentry: Introduction to Working with Data (Regular Expressions)

Home Page: https://librarycarpentry.org/lc-data-intro/

License: Other

carpentries library-carpentry lesson data-management regular-expressions regex english stable

lc-data-intro's People

Contributors

Stargazers

Watchers

Forkers

cmacdonell lsult yvonnemery emcaulay bertrandcaron allisongofman kylemonahan jduckles dagimy wsshaw pitviper6 scriptotek rosegmartin nano-jag katrinleinweber fdsayre marwahaha statkclee jodischneider jaguillette echolover mbkerr binxiepeterson bagnacan sarasrking hemmefelix miku avolkov ualibraries icecjan kostler ppival jmcgranahan ernstki mariapraetzellis cnancarrow btovar skip2mylou1 andreabrand kmiller621 bwilli07 adyork cherylhughey kylamj madwarner99 libraryan-prog tobyhodges morskyjezek bkmgit ldko lyndamk spswanz annajiat kristindawn mightylibrarian carmenlouw benguimbis siemiestar nwu-eresearch cclack zandilechansa lib-eric davidfkane cazzapinky weaverbel chrbknudsen umn-dash mlandryacenet abigailsparling sharilaster philreeddata jcszamosi kaitlinnewson kevintfrench erinbecker mausams2 dossantoss shinyen0112 dikshasahay gauri24-08-00 doujoudc cyrilbois aforestsomewhere

lc-data-intro's Issues

Adding more context to organi[sz]e example

As part of the checkout process I propose a rephrasing in the Regular Expressions (lesson 1) segment (https://librarycarpentry.org/lc-data-intro/01-regular-expressions/index.html).

The existing training states:

But it would also match reorganise, reorganize, organises, organizes, organised, organized, etc.

However it does not explain why it would match all those words, instead of just organize and organise.

I propose a change to:

But because it is looking specifically for all the characters in the document that match that pattern, not just for that word, it would also match reorganise, reorganize, organises, organizes, organised, organized, etc.
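
The behaviour described in this proposal can be checked with a minimal sketch in Python's `re` module. The word list is a hypothetical sample, and `re.search` is assumed to stand in for the "find anywhere in the document" behaviour of the tools used in the lesson.

```python
import re

# The pattern from the lesson: "organi", then "s" or "z", then "e".
pattern = re.compile(r"organi[sz]e")

# Hypothetical sample words for illustration.
words = ["organise", "organize", "reorganise", "organizes", "organised", "organ"]

# search() looks for the pattern anywhere inside each string,
# so longer words that merely contain the pattern also match.
matches = [w for w in words if pattern.search(w)]

# fullmatch() succeeds only when the whole string fits the pattern.
exact = [w for w in words if pattern.fullmatch(w)]

print(matches)
print(exact)
```

Comparing the two lists shows why the longer words are reported: the pattern describes a run of characters, not a whole word.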

Review 02-jargon-busting.md

Consider referencing NNLM's data thesaurus as a place to look up jargon in post-carpentry situations. This could also serve the purpose of exposing learners to terms that may not have come up during group chats as well. Link to the data thesaurus here: https://nnlm.gov/data/thesaurus

Modifying language in LC:Intro:Foundations

I'd recommend finding another way to describe the "The computer is stupid" entry in the Foundations lesson (https://librarycarpentry.org/lc-data-intro/03-foundations/index.html). Based on the Carpentries emphasis on inclusive language, this phrasing choice seems to be in conflict.

in 04_regular expressions a file is linked that returns a 404

The file is referred to as: " swCoC.md file". Not sure if this is an error -- it seems like it to me, but I might just not understand how everything's supposed to work.

Move References

@jt14den suggested we move the References listed at the bottom of https://librarycarpentry.github.io/lc-data-intro/04-regular-expressions/

Where should we move them to?

A separate episode at the end of lesson?
Under Extras > Reference? https://librarycarpentry.github.io/lc-data-intro/reference/

Problem working through final exercise in 04-regular-expressions

I ran into problem when working through the exercise "Extracting a substring in Google Sheets using regex".
I followed the 2014 Public Library Survey link in step 1 and downloaded a zip file, which contains 3 CSV files. There's no mention in the exercise which file to import to Google Sheets. I checked all three of them but cannot find a LOCATION column in any of the files. I'm not sure if I have downloaded what was intended to be used in this exercise?

Update keyboard shortcuts to Carpentries style

See lesson-example/06-style-guide.

@ maintainer: Please label this as good first issue.

Review 03-foundations.md against Carpentries style guide

Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html

Review 03-foundations.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/03-foundations.md) to ensure it follows the style guide

Concern about geek/non-geek comic

In 03-foundations there is a comic: https://twitter.com/visualisingdata/status/621957383464599552/photo/1. I think this comic is a bad fit for the lesson/library carpentry in general for a few reasons, and suggest that we remove it and find a different way of making the point that automation is not always efficient.

My concerns:

(1) the carpentries are committed to the idea that anyone can code; many of our learners don't identify as geeks, and they shouldn't feel like they have to. this also isolates folks who don't feel like geeks and makes the implicit lesson that they don't belong.
(2) it talks about "winning" and "losing" which I think is the wrong language around learning and experimenting - it's not a competition.
(3) it's a false binary!

Clarifying first regex example

In the first episode on Regular Expressions, the example of the organi[sz]e regular expression at the end of the first section is a bit confusing.

There's some inconsistency with the last sentence of the first section where reorganise, reorganize, organises, organizes, organised, organized are formatted as code, which looks like the regex in the previous sentence, but I believe in this example they are supposed to be other strings that would match.

I would also be interested in helping to restructure the lesson to clarify some simpler concepts first, e.g. capitalizing a word (or not) makes a difference because upper case and lower case are treated as different characters in a string, and that ASCII has special characters for tab, space and newline. Then more complicated concepts like escape characters can be discussed after that. Generally I think it would help to space out introducing the different types of the metacharacters throughout the episode, since it's a lot to take in all at once.
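
As a rough illustration of the simpler concepts mentioned above, the following Python sketch compares two strings that differ only in case and prints the ASCII codes of tab, space and newline. The sample strings are assumptions chosen for illustration.

```python
# Upper and lower case letters are different characters,
# so these two strings are not equal.
same = "Organise" == "organise"

# Tab, space and newline each have their own ASCII code.
codes = {repr(ch): ord(ch) for ch in ["\t", " ", "\n"]}

print(same)
print(codes)
```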

June 2019 Lesson Release checklist

If your Maintainer team has decided not to participate in the June 2019 lesson release, please close this issue.

To have this lesson included in the 18 June 2019 release, please confirm that the following items are true:

When all checkboxes above are completed, this lesson will be added to the 18 June lesson release. Please leave a comment on carpentries/lesson-infrastructure#26 or contact Erin Becker with questions.

Find issues to resolve for this lesson

This lesson has now been migrated to the Library Carpentry organisation.

All the issues raised for this lesson are still on the old repo at https://github.com/data-lessons/library-data-intro-DEPRECATED/issues

Find issues to resolve there and fix them here.

Review 02-jargon-busting.md against Carpentries style guide

Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html

Review 02-jargon-busting.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/02-jargon-busting.md) to ensure it follows the style guide

Maintainers do not appear to have edit access that I think they should have

@libcce (cc @ccronje) - Chris, I just opened a PR #73 and tried to add @ppival @sharilaster and @antonangelo as reviewers (because they are listed as reviewer at https://github.com/LibraryCarpentry/lc-data-intro) but I couldn't. I'm presuming this means they don't have the correct edit access but that they should do? I've not changed anything because I'm out of touch with how LC is managing repo permissions.

swcCoC.md is missing again in 04-regular-expressions

There is a dead link in _episodes/04-regular-expressions.md: the reference to 'swcCoC.md file' exists, but the file was deleted in commit 01e1681, "Remove CoC regex exercise specific to Calgary workshop".

Based on a related closed ticket from 2018 #35 I assume the file is needed and was deleted by mistake. I'm going to create a pull request that restores the file.

Repurpose lesson to regular expressions, move intro, jargon, foundations to overview

Intro to LC and Foundations are used optionally by instructors per feedback collected by the maintainers
Review Workshop Overview to A) see if using the overview is how we want to proceed moving forward B) if there are errors/edits that need to be fixed
Intro to LC has been revamped, is now optional, and moved to Workshop Overview:
https://github.com/LibraryCarpentry/lc-overview/blob/gh-pages/_episodes/02-intro-to-library-carpentry.md --> Recommend deleting Intro to LC in Intro to Data
Foundations has been revamped, is now optional, and moved to Workshop Overview:
See https://github.com/LibraryCarpentry/lc-overview/tree/gh-pages/_episodes and Keyboard Shortcuts, File Naming, Computational Approach, Further Reading episodes --> Recommend deleting Foundations in Intro to Data
Jargon Busting has been moved to Workshop Overview:
https://github.com/LibraryCarpentry/lc-overview/blob/gh-pages/_episodes/03-jargon-busting.md --> Recommend deleting Jargon Busting from Intro to Data
Recommend splitting the Regular Expressions episode into two episodes, the new episode will start at Exercise Using Regex101.com
Recommend renaming the lesson to Regular Expressions

"Can you guess..." language for exercises

I would suggest that this language be changed to something more straightforward, for example from "Can you guess what the regular expression ^[Oo]rgani.e\b will match?" to "What will the regular expression ^[Oo]rgani.e\b match?"

The current language seems to imply that they'll be guessing answers instead of reasoning them out (possibly implying that they couldn't reason them out and would have to guess).

This language is throughout the 04_regular_expressions module. I haven't checked the others.
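
One way for learners to reason the answer out rather than guess is to test the pattern against a few strings. This is a minimal sketch using Python's `re` module; the candidate strings are hypothetical and the lesson's own tools may use a different engine.

```python
import re

# Start of line, "O" or "o", "rgani", any one character, "e", then a word boundary.
pattern = re.compile(r"^[Oo]rgani.e\b")

# Hypothetical candidate strings for illustration.
candidates = ["organise", "Organize", "organife", "Organi2e",
              "organises", "reorganise", "organizer"]

# Record whether each candidate matches the pattern.
results = {text: bool(pattern.search(text)) for text in candidates}

for text, matched in results.items():
    print(f"{text}: {matched}")
```

The first four candidates match because `.` accepts any single character. The last three fail because of the `^` anchor or because a word character follows the final `e`, so there is no word boundary.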

Recommend Atom as editor

We currently recommend Notepad++ to Windows users and mention Atom only briefly. Because of its Teletype capability and integration with GitHub Desktop, how about we recommend it specifically?

PS: GitHub Desktop could also be a somewhat central goal for Library Carpentry to move towards, because it brings the Git Bash with it as well, and there is a derivative lesson.

Setup page style and navigation header don't match other lesson pages

The setup page for this lesson (and, actually, all the lessons I've looked at) is using the default browser style instead of the Carpentries style. See https://librarycarpentry.github.io/lc-data-intro/setup/

Also, the top navigation bar is pointing to directories that don't exist - it looks as if a subdirectory "setup" has been hard coded into the page:
https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/setup.md

Review 05-quiz.md against Carpentries style guide

Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html

Review 05-quiz.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/05-quiz.md) to ensure it follows the style guide

04-regular-expressions link is broken to 2014 Public Library Survey dataset

The current link to the IMLS 2014 Public Library Survey points to a page that no longer exists.

I opened a pull request to an updated location with the file compressed in zip format.

Give this lesson a clear purpose

As we've removed the 'history' stuff from this lesson and as we know that many people teach this a different way, this year is a good time to think about what we really want to achieve with this lesson. For me the keys are:

an ice breaker that helps attendees feel comfortable about what they do/don't know.
a simple intro to 'programming', programmatic thinking, and how an LC workshop works ('regex' works well)
an opportunity to think about the connection between library work and programmatic tasks.

Add section on why use a text editor

Instead of recommending specific text editors in "Applications for writing, reading and outputting plain text files", address why a text editor is helpful. List additional tools that might be missing.

For further background, see #67.

Include information and simple example of using backreferences

I've found that once people learn about backreferences and their potential use, they begin to see some of the incredibly time-saving things that can be accomplished with regular expressions.

I think a very basic example of using backreferences would be a good addition. Granted this addition might not be feasible within the constraints of 20 minutes of teaching time and 25 minutes for exercises and practice. However, I do think that the topic of regular expressions could warrant a longer, more involved lesson.
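
A very basic example of the kind suggested here might look like the following sketch in Python's `re` module. The names and the sample sentence are hypothetical, and the `\1` and `\2` syntax shown is Python's; other tools may write backreferences as `$1` and `$2`.

```python
import re

# Hypothetical sample data: names written as "Surname, Forename".
names = ["Lovelace, Ada", "Hopper, Grace"]

# Two capture groups; \2 and \1 in the replacement refer back to them.
pattern = re.compile(r"^(\w+), (\w+)$")
flipped = [pattern.sub(r"\2 \1", name) for name in names]

# A backreference inside the pattern itself: find a word typed twice in a row.
doubled = re.search(r"\b(\w+) \1\b", "the the library catalogue")

print(flipped)
print(doubled.group(1))
```

Reordering captured text in a single substitution is the sort of time-saving task the issue refers to.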

Regex episode: "case-sensitive" incorrect?

Please see these 2 examples with the note this is case-sensitive. Aren't those actually case-_in_sensitive?

I feel that the "note" is misleading, because only the RE tokens A-Z & a-z are case-sensitive in themselves, but A-Za-z makes the entire RE _in_sensitive.
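
The distinction raised here can be sketched in Python's `re` module. The test string is an assumption for illustration; the point is that each range is case-sensitive on its own, while combining both ranges accepts either case.

```python
import re

text = "Organise"

# [A-Z] alone accepts only upper case letters, so the lower case letters fail.
upper_only = re.fullmatch(r"[A-Z]+", text)

# Listing both ranges accepts letters of either case.
both_cases = re.fullmatch(r"[A-Za-z]+", text)

# A lower case range with the IGNORECASE flag has the same effect.
flagged = re.fullmatch(r"[a-z]+", text, flags=re.IGNORECASE)

print(upper_only, both_cases, flagged)
```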

Incorporate parts of tidy spreadsheets into data-intro

Chatted with @jezcope a bit at carpentrycon and we thought parts of lc-spreadsheets can be utilized in data-intro. For example, Episode 1 can be a stand-alone module and used in data-intro.

Broken formatting in the Regex lesson

The markdown formatting needs to be fixed in the Challenge/Solution section.
Ex:
Or, any other string that starts a line, begins with a letter o in lower or capital case, proceeds with rgani, has any character in the 7th position, follows with letter e and zero or more characters from the range [A-Za-z0-9]. {: .solution} {: .challenge}

Rename "Data Intro for Librarians" to "for Libraries"?

The title "Data Intro for Librarians" leaves out a significant group of people who work in libraries but who aren't librarians. This can be a point of contention at some libraries and can leave people feeling unwelcome. I don't know if "For Libraries" is the best wording, but for Librarians could be alienating for library staff and other specialists without ML(I)Ss.

Incorporate real Library examples in text & interactive exercise

Migrated from data-lessons/library-data-intro-DEPRECATED#27. See discussions there.

Replace multiple choice/exercises with Carpentries challenge/solution format

Hi @ccronje @ppival @sharilaster @drjwbaker Before I submit a PR to replace:

https://librarycarpentry.org/lc-data-intro/05-quiz/index.html
https://librarycarpentry.org/lc-data-intro/06-quiz-answers/index.html

with The Carpentries challenge/solution format:

https://librarycarpentry.org/lc-data-intro/07-quiz/index.html
https://librarycarpentry.org/lc-data-intro/08-exercises/index.html

do you agree? Just double checking!

04-regular-expressions - consider using something other than "foobar"?

"foobar" is kind of a specific reference to have in these lessons. Is there another real word that could be used as an example?

URLs and filenaming discussion in 03_foundations

In reading through the 03_Foundations section, I found the filenaming discussion hard to understand. I deal constantly with filenaming in my work, but I didn't understand the principles being presented here well enough to know how to revise. As a librarian reading that section, I also had no idea of how to name a file after reading it. I'll try to come back to this topic after reviewing the entire lc_data_intro episodes.

Switch applications shortcut for Windows should be Alt+Tab?

In the keyboard shortcuts table in episode 03 (foundations), it lists Ctrl+Tab as the shortcut to switch applications in Windows. However, I think it is actually Alt+Tab.

Replace Foundations w/ Why automate? and File naming

As discussed @ccronje @sharilaster @ppival, Foundations is now two episodes:

The File naming episode is a PR that needs to be approved/merged by one/all of you. If we are fine with this, I can update the episodes to reflect these changes. @sharilaster I included a very basic challenge of working with Markdown in File naming. What do you think? @ccronje I added a link to the UNIX Shell lesson at the end of File naming. Is this what you were thinking?

Rework first lesson 01-introduction.md - move info about library carpentry to website or workshop template

As we develop library carpentry into a lesson organization, we should move information about the program out of the intro lesson and into the website or workshop template. This first episode should be re-written to include an intro to the first lesson "data intro". One suggestion would be to supply an intro to what will be covered in the lesson:

breaking down jargon tech terms that confuse
why we'd want to automate things thru computation
how using flat files and naming conventions enables automation
why pattern matching is powerful

present Markdown as a useful tool

This is a reframing of #51 for #mozsprint, building on suggestions from @drjwbaker @ccronje @libcce and others in that thread to improve the presentation of Markdown as a useful tool for Library Carpentry audiences.

The Foundations episode (https://librarycarpentry.org/lc-data-intro/03-foundations/index.html)
already includes a section titled "Use machine readable plain text notation for formatting."

This section should be reworked to briefly explain why Markdown in particular is useful for library settings, point to additional resources, and possibly could include a brief illustration/example.

change file naming convention section to use concepts introduced in dc rr-organization

I recently used http://www.datacarpentry.org/rr-organization1/01-file-naming/ instead of the Naming files sensible things section in https://librarycarpentry.github.io/lc-data-intro/03-foundations/. I liked it b/c it introduces principles for why you would name things well. I suggest we incorporate this into the foundations section.

Review 01-introduction.md against Carpentries style guide

Carpentries now has a style guide https://docs.carpentries.org/topic_folders/communications/style-guide.html

Review 01-introduction.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/01-introduction.md) to ensure it follows the style guide

A paragraph that lacks context

In 02-match-extract-strings.md the paragraph below is under the heading "Exercise finding phone numbers..." but I do not see how it relates to that heading. My suggestion is simply to add a new heading above the paragraph along the lines of "Using regular expressions when working with files and directories"

This is the paragraph in question:
"One of the reasons we stress the value of consistent and predictable directory and filenaming conventions is that working in this way enables you to use the computer to select files based on the characteristics of their file names. For example, if you have a bunch of files where the first four digits are the year and you only want to do something with files from '2017', then you can. Or if you have 'journal' somewhere in a filename when you have data about journals, you can use the computer to select just those files. Equally, using plain text formats means that you can go further and select files or elements of files based on characteristics of the data within those files."

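The file selection described in the quoted paragraph can be sketched in Python as follows. The filenames are hypothetical, and a plain list stands in for a real directory listing.

```python
import re

# Hypothetical filenames that follow a "year first" naming convention.
filenames = [
    "2017-01-journal-loans.csv",
    "2016-12-journal-loans.csv",
    "2017-03-ebook-usage.csv",
    "notes.txt",
]

# match() anchors at the start of the string: keep files whose name begins with 2017.
from_2017 = [name for name in filenames if re.match(r"2017", name)]

# search() looks anywhere in the string: keep files with "journal" in the name.
journals = [name for name in filenames if re.search(r"journal", name)]

print(from_2017)
print(journals)
```

Both selections only work because the names are consistent and predictable, which is the point the paragraph makes.
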
No mention of the many regular expression engines in use nor the engine used in the lesson's examples

I'd recommend mentioning, at least in passing, that there are many different regular expression engines in common use and that each engine has features and syntax that, while often quite similar, do differ from each other in meaningful ways.

Additionally, there is no mention of the specific engine used for the lesson's examples. Granted, the basic examples used in the lesson will work with any Perl-like engine (the lesson's suggested online tools mostly employ PCRE and JavaScript). Still, it would be good to plant the idea in students' heads that they might need to learn a particular tool's or language's regular expression implementation before using more advanced regex features.

A couple links that describe the varying features of some of the many regular expression engines in use:
https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines (basic overview as one would expect on wikipedia)
https://www.regular-expressions.info/refflavors.html (nice, detailed reference)
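
One small illustration of such a difference is named capture groups: Python's `re` module writes them as `(?P<name>...)`, while JavaScript and .NET write `(?<name>...)`. The sketch below is an assumption-laden example, not part of the lesson. It uses the Python form and then simply reports whether the running Python version accepts the other form.

```python
import re

# Python's syntax for named groups.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Published 2019-06")
year = m.group("year")
month = m.group("month")

# Try the syntax used by some other engines and record the outcome
# instead of assuming it.
try:
    re.compile(r"(?<year>\d{4})")
    accepts_other_syntax = True
except re.error:
    accepts_other_syntax = False

print(year, month, accepts_other_syntax)
```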

Re-write to remove use of "I" in the 03-foundations.md lesson.

Move away from "I" statements in favor of "we" and "let's".

typo: dentifying rather than identifying

In the Where To Go for Help section:
Begin by dentifying people on your table who can help: you will all be working from the same material, so someone around you may have mastered the point you are stuck at.

dentifying should be identifying

Review guide.md against Carpentries style guide

Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html

Review guide.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_extras/guide.md) to ensure it follows the style guide

Add info about Markdown and perhaps a demo of its use?

When we talk about plain text formats, I suggest we introduce Markdown. We can point to Markdown being the tool that builds Library Carpentry lessons while still being a plain text format.

I often create (or open) a small Markdown doc on the fly where I format a bulleted list, a numbered list, add an image, add a Web link, and make some text bold and italic and show three sizes of headings. Then I run the file through pandoc and output that single file as a PDF, a Web page and also open it as a formatted Word document. This generally gets people excited about one file, many purposes. What do people think? Too much?

Error in 04-regular-expressions intro section

There is a missing character that will cause confusion to learners in the last paragraph of the opening section of Regular Expressions:

"For example, the period (.) means “match any character”, but if you want to match a period (.) then you will need to use a “" in front of it to signal to the regular expression processor that you want to use the period as a plain old period and not a metacharacter.

There should be a \ between the empty quotation marks in the middle of that sentence.
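
The difference the backslash makes can be shown with a short sketch in Python's `re` module. The sample text is hypothetical.

```python
import re

# Hypothetical sample text for illustration.
text = "lc-data-intro v1.0 and v1x0"

# Unescaped, the period is a metacharacter that matches any character.
unescaped = re.findall(r"v1.0", text)

# Escaped with a backslash, it matches only a literal period.
escaped = re.findall(r"v1\.0", text)

print(unescaped)
print(escaped)
```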

Recommend remove 01 "Introduction to Library Carpentry"

@jt14den and I discussed, and we feel that this section is not in line with other lessons. Introduction to Library Carpentry should be covered outside of a lesson. But before removing, I'd prefer to get some comments in favor / not in favor. Even if it's just @jt14den

Address accessibility/universal design in the lesson

Watch Carli Spina's talk on Universal Design for Learning and determine if there are common issues/improvements to address in the lesson.

Universal Design for Learning
In this Carpentries in Libraries community call, Carli Spina talks about Universal Design for Learning (UDL). She provides an overview of UDL and examples of how it can be applied to Carpentries lessons.
Slides: https://github.com/LibraryCarpentry/governance/blob/master/community-calls/2019-03-07-Spina-UDL.pdf
Video: https://www.youtube.com/watch?v=56rhFeU5-Ig&feature=youtu.be
