librarycarpentry / lc-data-intro
Library Carpentry: Introduction to Working with Data (Regular Expressions)
Home Page: https://librarycarpentry.org/lc-data-intro/
License: Other
As part of the checkout process I propose a rephrasing in the Regular Expressions (lesson 1) segment (https://librarycarpentry.org/lc-data-intro/01-regular-expressions/index.html).
The existing training states:
But it would also match reorganise, reorganize, organises, organizes, organised, organized, etc.
However, it does not explain why it would match all those words instead of just organize and organise.
I propose a change to:
But because it is looking specifically for all the characters in the document that match that pattern, not just for that word, it would also match reorganise, reorganize, organises, organizes, organised, organized, etc.
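A minimal sketch of the proposed explanation, using Python's standard `re` module (the word list is hypothetical sample data): `re.search` looks for `organi[sz]e` anywhere inside each string, so any longer word containing those characters also matches.

```python
import re

pattern = re.compile(r"organi[sz]e")

# Hypothetical sample words.
words = ["organise", "organize", "reorganise", "organises",
         "organized", "Organize", "organ"]

# search() finds the pattern anywhere in a string, not only as a whole word,
# so "reorganise" and "organised"-style words match too.
matching = [w for w in words if pattern.search(w)]
print(matching)  # ['organise', 'organize', 'reorganise', 'organises', 'organized']
```

Note that "Organize" is not in the result: the pattern starts with a lower-case o, and upper and lower case are different characters.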
Consider referencing NNLM's data thesaurus as a place to look up jargon in post-carpentry situations. This could also expose learners to terms that may not have come up during group chats. Link to the data thesaurus here: https://nnlm.gov/data/thesaurus
I'd recommend finding another way to describe the "The computer is stupid" entry in the Foundations lesson (https://librarycarpentry.org/lc-data-intro/03-foundations/index.html). Based on the Carpentries emphasis on inclusive language, this phrasing choice seems to be in conflict.
The file is referred to as "swCoC.md file". Not sure if this is an error; it seems like one to me, but I might just not understand how everything's supposed to work.
@jt14den suggested we move the References listed at the bottom of https://librarycarpentry.github.io/lc-data-intro/04-regular-expressions/
Where should we move them to?
I ran into a problem when working through the exercise "Extracting a substring in Google Sheets using regex".
I followed the 2014 Public Library Survey link in step 1 and downloaded a zip file, which contains 3 CSV files. There's no mention in the exercise which file to import to Google Sheets. I checked all three of them but cannot find a LOCATION column in any of the files. I'm not sure if I have downloaded what was intended to be used in this exercise?
See lesson-example/06-style-guide.
@ maintainer: Please label this as good first issue
Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review 03-foundations.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/03-foundations.md) to ensure it follows the style guide
In 03-foundations there is a comic: https://twitter.com/visualisingdata/status/621957383464599552/photo/1. I think this comic is a bad fit for the lesson/library carpentry in general for a few reasons, and suggest that we remove it and find a different way of making the point that automation is not always efficient.
My concerns:
(1) The Carpentries are committed to the idea that anyone can code; many of our learners don't identify as geeks, and they shouldn't feel like they have to. This also isolates folks who don't feel like geeks and sends the implicit message that they don't belong.
(2) it talks about "winning" and "losing" which I think is the wrong language around learning and experimenting - it's not a competition.
(3) it's a false binary!
In the first episode on Regular Expressions, the example of the organi[sz]e regular expression at the end of the first section is a bit confusing.
There's some inconsistency with the last sentence of the first section, where reorganise, reorganize, organises, organizes, organised, organized are formatted as code, which makes them look like the regex in the previous sentence, but I believe in this example they are supposed to be other strings that would match.
I would also be interested in helping to restructure the lesson to clarify some simpler concepts first, e.g. capitalizing a word (or not) makes a difference because upper case and lower case are treated as different characters in a string, and that ASCII has special characters for tab, space and newline. Then more complicated concepts like escape characters can be discussed after that. Generally I think it would help to space out introducing the different types of the metacharacters throughout the episode, since it's a lot to take in all at once.
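To illustrate the simpler concepts suggested above, here is a small sketch with Python's standard `re` module (the tab-separated line is hypothetical): `\t` and `\n` in a pattern match a tab and a newline, and upper and lower case are treated as different characters.

```python
import re

# Hypothetical tab-separated line of data, ending with a newline.
line = "Smith\tJane\t2017\n"

# In the pattern, \t matches a tab character and \n matches a newline.
has_newline = re.search(r"\n$", line) is not None
fields = re.split(r"\t", line.rstrip("\n"))
print(has_newline, fields)  # True ['Smith', 'Jane', '2017']

# Upper and lower case are different characters, so "smith" does not match "Smith".
print(re.search(r"smith", line))              # None
print(re.search(r"Smith", line) is not None)  # True
```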
If your Maintainer team has decided not to participate in the June 2019 lesson release, please close this issue.
To have this lesson included in the 18 June 2019 release, please confirm that the following items are true:
When all checkboxes above are completed, this lesson will be added to the 18 June lesson release. Please leave a comment on carpentries/lesson-infrastructure#26 or contact Erin Becker with questions ([email protected]).
This lesson has now been migrated to the Library Carpentry organisation.
All the issues raised for this lesson are still on the old repo at https://github.com/data-lessons/library-data-intro-DEPRECATED/issues
Find issues to resolve there and fix them here.
Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review 02-jargon-busting.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/02-jargon-busting.md) to ensure it follows the style guide
@libcce (cc @ccronje) - Chris, I just opened a PR #73 and tried to add @ppival @sharilaster and @antonangelo as reviewers (because they are listed as reviewer at https://github.com/LibraryCarpentry/lc-data-intro) but I couldn't. I'm presuming this means they don't have the correct edit access but that they should do? I've not changed anything because I'm out of touch with how LC is managing repo permissions.
There is a dead link in _episodes/04-regular-expressions.md: the reference to 'swcCoC.md file' exists, but the file was deleted in commit "Remove CoC regex exercise specific to Calgary workshop" 01e1681.
Based on a related closed ticket from 2018 #35 I assume the file is needed and was deleted by mistake. I'm going to create a pull request that restores the file.
I would suggest that this language be changed to something more straightforward, for example from "Can you guess what the regular expression ^[Oo]rgani.e\b will match?" to "What will the regular expression ^[Oo]rgani.e\b match?"
The current language seems to imply that they'll be guessing answers instead of reasoning them out (possibly implying that they couldn't reason them out and would have to guess).
This language is throughout the 04_regular_expressions module. I haven't checked the others.
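One way learners could reason the answer out rather than guess is to test the pattern directly. A minimal sketch with Python's standard `re` module (the candidate strings are hypothetical); without the MULTILINE flag, `^` anchors to the start of the whole string, which may differ from how an online tester treats each line.

```python
import re

pattern = re.compile(r"^[Oo]rgani.e\b")

# Hypothetical test strings.
candidates = ["organise", "Organize your files", "organised",
              "reorganise", "organixe", "Organ"]

def first_match(text):
    """Return the matched text, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group() if match else None

for text in candidates:
    # "organised" fails because \b needs a word boundary right after the "e".
    # "reorganise" fails because ^ requires the o/O at the very start.
    print(repr(text), "->", first_match(text))
```

"organixe" matching is a reminder that `.` accepts any character, not only s or z.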
We currently recommend Notepad++ to Windows users and mention Atom only briefly. Because of its Teletype capability and integration with GitHub Desktop, how about we recommend it specifically?
PS: GitHub Desktop could also be a somewhat central goal for Library Carpentry to move towards, because it brings Git Bash with it as well, and there is a derivative lesson.
The setup page for this lesson (and, actually, all the lessons I've looked at) is using the default browser style instead of the Carpentries style. See https://librarycarpentry.github.io/lc-data-intro/setup/
Also, the top navigation bar is pointing to directories that don't exist - it looks as if a subdirectory "setup" has been hard coded into the page:
https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/setup.md
Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review 05-quiz.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/05-quiz.md) to ensure it follows the style guide
The current link to the IMLS 2014 Public Library Survey points to a page that no longer exists.
I opened a pull request that points to an updated location, where the file is compressed in zip format.
As we've removed the 'history' stuff from this lesson and as we know that many people teach this a different way, this year is a good time to think about what we really want to achieve with this lesson. For me the keys are:
Instead of recommending specific text editors in "Applications for writing, reading and outputting plain text files", address why a text editor is helpful. List additional tools that might be missing.
For further background, see #67.
I've found that once people learn about backreferences and their potential use, they begin to see some of the incredibly time-saving things that can be accomplished with regular expressions.
I think a very basic example of using backreferences would be a good addition. Granted this addition might not be feasible within the constraints of 20 minutes of teaching time and 25 minutes for exercises and practice. However, I do think that the topic of regular expressions could warrant a longer, more involved lesson.
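A very basic backreference example might look like the sketch below, written with Python's standard `re` module (the sentence and names are hypothetical). It shows both uses: `\1` inside a pattern to find doubled words, and `\2 \1` in a replacement to reorder captured groups.

```python
import re

# 1) Backreference in the pattern: \1 must repeat whatever group 1 matched.
#    This finds accidentally doubled words such as "the the".
doubled = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
text = "Check the the catalogue record for for errors."
duplicates = [m.group(1) for m in doubled.finditer(text)]
print(duplicates)  # ['the', 'for']

# 2) Backreferences in the replacement: \2 and \1 reuse the captured groups,
#    turning "Surname, Forename" into "Forename Surname".
names = ["Smith, Jane", "Doe, John"]
swapped = [re.sub(r"^(\w+),\s*(\w+)$", r"\2 \1", name) for name in names]
print(swapped)  # ['Jane Smith', 'John Doe']
```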
Please see these two examples with the note "this is case-sensitive". Aren't those actually case-insensitive?
I feel that the "note" is misleading, because only the RE tokens A-Z and a-z are case-sensitive in themselves, but A-Za-z makes the entire RE insensitive.
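The point can be checked with a short sketch using Python's standard `re` module (the test words are hypothetical): for ASCII letters, `[A-Za-z]` behaves like `[a-z]` with the IGNORECASE flag, while a single-case range is genuinely case-sensitive.

```python
import re

words = ["Apple", "apple", "APPLE", "42"]

# Listing both ranges matches a letter whatever its case...
both_ranges = re.compile(r"^[A-Za-z]+$")
# ...giving the same results here as one range plus IGNORECASE.
ignore_case = re.compile(r"^[a-z]+$", re.IGNORECASE)
# A single-case range on its own is case-sensitive.
lower_only = re.compile(r"^[a-z]+$")

for w in words:
    print(w, bool(both_ranges.match(w)), bool(ignore_case.match(w)),
          bool(lower_only.match(w)))
```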
Chatted with @jezcope a bit at carpentrycon and we thought parts of lc-spreadsheets can be utilized in data-intro. For example, Episode 1 can be a stand-alone module and used in data-intro.
The markdown formatting needs to be fixed in the Challenge/Solution section.
Ex:
Or, any other string that starts a line, begins with a letter o in lower or capital case, proceeds with rgani, has any character in the 7th position, follows with letter e and zero or more characters from the range [A-Za-z0-9]. {: .solution} {: .challenge}
The title "Data Intro for Librarians" leaves out a significant group of people who work in libraries but aren't librarians. This can be a point of contention at some libraries and can leave people feeling unwelcome. I don't know if "For Libraries" is the best wording, but "For Librarians" could be alienating for library staff and other specialists without ML(I)Ss.
Migrated from data-lessons/library-data-intro-DEPRECATED#27. See discussions there.
Hi @ccronje @ppival @sharilaster @drjwbaker Before I submit a PR to replace:
https://librarycarpentry.org/lc-data-intro/05-quiz/index.html
https://librarycarpentry.org/lc-data-intro/06-quiz-answers/index.html
with The Carpentries challenge/solution format:
https://librarycarpentry.org/lc-data-intro/07-quiz/index.html
https://librarycarpentry.org/lc-data-intro/08-exercises/index.html
do you agree? Just double checking!
"foobar" is kind of a specific reference to have in these lessons. Is there another real word that could be used as an example?
Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review 04-regular-expressions.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/04-regular-expressions.md) to ensure it follows the style guide
Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review 06-quiz-answers.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/06-quiz-answers.md) to ensure it follows the style guide
In reading through the 03_Foundations section, I found the filenaming discussion hard to understand. I deal constantly with filenaming in my work, but I didn't understand the principles being presented here well enough to know how to revise them. As a librarian reading that section, I also had no idea of how to name a file after reading it. I'll try to come back to this topic after reviewing the entire lc_data_intro episodes.
In the keyboard shortcuts table in episode 03 (foundations), it lists Ctrl+Tab as the shortcut to switch applications in Windows. However, I think it is actually Alt+Tab.
As discussed @ccronje @sharilaster @ppival, Foundations is now two episodes:
The File naming episode is a PR that needs to be approved/merged by one/all of you. If we are fine with this, I can update the episodes to reflect these changes. @sharilaster I included a very basic challenge of working with Markdown in File naming. What do you think? @ccronje I added a link to the UNIX Shell lesson at the end of File naming. Is this what you were thinking?
As we develop Library Carpentry into a lesson organization, we should move information about the program out of the intro lesson and into the website or workshop template. This first episode should be rewritten to include an intro to the first lesson, "data intro". One suggestion would be to supply an intro to what will be covered in the lesson:
This is a reframing of #51 for #mozsprint, building on suggestions from @drjwbaker @ccronje @libcce and others in that thread to improve the presentation of Markdown as a useful tool for Library Carpentry audiences.
The Foundations episode (https://librarycarpentry.org/lc-data-intro/03-foundations/index.html) already includes a section titled "Use machine readable plain text notation for formatting."
This section should be reworked to briefly explain why Markdown in particular is useful for library settings, point to additional resources, and possibly could include a brief illustration/example.
I recently used http://www.datacarpentry.org/rr-organization1/01-file-naming/ instead of the Naming files sensible things section in https://librarycarpentry.github.io/lc-data-intro/03-foundations/. I liked it b/c it introduces principles for why you would name things well. I suggest we incorporate this into the foundations section.
Carpentries now has a style guide https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review 01-introduction.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_episodes/01-introduction.md) to ensure it follows the style guide
In 02-match-extract-strings.md the paragraph below is under the heading "Exercise finding phone numbers..." but I do not see how it relates to that heading. My suggestion is simply to add a new heading above the paragraph along the lines of "Using regular expressions when working with files and directories"
This is the paragraph in question:
"One of the reasons we stress the value of consistent and predictable directory and filenaming conventions is that working in this way enables you to use the computer to select files based on the characteristics of their file names. For example, if you have a bunch of files where the first four digits are the year and you only want to do something with files from '2017', then you can. Or if you have 'journal' somewhere in a filename when you have data about journals, you can use the computer to select just those files. Equally, using plain text formats means that you can go further and select files or elements of files based on characteristics of the data within those files."
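The paragraph's two examples could be sketched with Python's standard `re` module. The file names below are hypothetical; in practice they might come from something like `os.listdir()`.

```python
import re

# Hypothetical file names following a "YYYY-description" convention.
filenames = [
    "2017-journal-subscriptions.csv",
    "2017-book-loans.csv",
    "2018-journal-subscriptions.csv",
    "notes.txt",
]

# Files whose first four characters are the year 2017 (^ anchors to the start).
from_2017 = [f for f in filenames if re.search(r"^2017", f)]

# Files with "journal" anywhere in the name.
journals = [f for f in filenames if re.search(r"journal", f)]

print(from_2017)
print(journals)
```

This only works because the names are consistent, which is the paragraph's argument for predictable naming conventions.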
I'd recommend mentioning, at least in passing, that there are many different regular expression engines in common use and that each engine has features and syntax that, while often quite similar, do differ from each other in meaningful ways.
Additionally, there is no mention of the specific engine used for the lesson's examples. Granted, the basic examples used in the lesson will work with any Perl-like engine (the lesson's suggested online tools mostly employ PCRE and JavaScript), but it would be good to plant the idea in students' heads that they might need to learn a particular tool's or language's regular expression implementation before using more advanced regex features.
A couple links that describe the varying features of some of the many regular expression engines in use:
https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines (basic overview as one would expect on wikipedia)
https://www.regular-expressions.info/refflavors.html (nice, detailed reference)
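A small illustration of engines differing: `\p{L}` ("any Unicode letter") is accepted by PCRE and by JavaScript with the `u` flag, but Python's built-in `re` module rejects it (the third-party `regex` package does support it). The helper function below is purely illustrative.

```python
import re

def compiles(pattern):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"[A-Za-z]+"))  # True: basic syntax shared by most engines
print(compiles(r"\p{L}+"))     # False: Unicode property escapes are not in re
```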
Move away from "I" statements in favor of "we" and "let's".
In the Where To Go for Help section:
Begin by dentifying people on your table who can help: you will all be working from the same material, so someone around you may have mastered the point you are stuck at.
dentifying should be identifying
Carpentries now has a style guide
https://docs.carpentries.org/topic_folders/communications/style-guide.html
Review guide.md (https://github.com/LibraryCarpentry/lc-data-intro/blob/gh-pages/_extras/guide.md) to ensure it follows the style guide
When we talk about plain text formats, I suggest we introduce Markdown. We can point to Markdown being the tool that builds Library Carpentry lessons while still being a plain text format.
I often create (or open) a small Markdown doc on the fly where I format a bulleted list, a numbered list, add an image, add a Web link, and make some text bold and italic and show three sizes of headings. Then I run the file through pandoc and output that single file as a PDF, a Web page and also open it as a formatted Word document. This generally gets people excited about one file, many purposes. What do people think? Too much?
There is a missing character that will cause confusion to learners in the last paragraph of the opening section of Regular Expressions:
"For example, the period (.) means “match any character”, but if you want to match a period (.) then you will need to use a “" in front of it to signal to the regular expression processor that you want to use the period as a plain old period and not a metacharacter.
There should be a \ between the empty quotation marks in the middle of that sentence.
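The corrected sentence can be demonstrated with Python's standard `re` module (the call numbers are hypothetical). Raw strings (`r"..."`) keep the backslash intact so it reaches the regex engine.

```python
import re

text = "Call no. 025.04 and 025x04"

# Unescaped, "." is a metacharacter that matches any single character.
unescaped = re.findall(r"025.04", text)
print(unescaped)  # ['025.04', '025x04']

# Escaped with a backslash, "\." matches only a literal period.
escaped = re.findall(r"025\.04", text)
print(escaped)    # ['025.04']
```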
Watch Carli Spina talk on Universal Design for Learning and determine if there are common issues/improvements to address in the lesson.
Universal Design for Learning
In this Carpentries in Libraries community call, Carli Spina talks about Universal Design for Learning (UDL). She provides an overview of UDL and examples of how it can be applied to Carpentries lessons.
Slides: https://github.com/LibraryCarpentry/governance/blob/master/community-calls/2019-03-07-Spina-UDL.pdf
Video: https://www.youtube.com/watch?v=56rhFeU5-Ig&feature=youtu.be