
learnbyexample / learn_gnuawk


Example based guide to mastering GNU awk

Home Page: https://learnbyexample.github.io/learn_gnuawk/

License: MIT License

Shell 99.51% Awk 0.49%
ebooks gnu awk regex command-line learn-by-doing exercises linux one-liners

learn_gnuawk's Introduction

CLI text processing with GNU awk

Example based guide to mastering GNU awk. Visit https://youtu.be/KIa_EaYwGDI for a short video about the book.


The book also includes exercises to test your understanding, which are presented together as a single file in this repo — Exercises.md.

For solutions to the exercises, see Exercise_solutions.md.

You can also use this interactive TUI app to practice some of the exercises from the book.

See Version_changes.md to keep track of changes made to the book.


E-book

For a preview of the book, see sample chapters.

The book can also be viewed as a single markdown file in this repo. See my blogpost on generating pdfs from markdown using pandoc if you are interested in the ebook creation process.

For the web version of the book, visit https://learnbyexample.github.io/learn_gnuawk/


Testimonials

Step up your cli fu with this fabulous intro & deep dive into awk. I learned a ton of tricks!

feedback on twitter

I consider myself pretty experienced at shell-fu and capable of doing most things I set out to achieve in either bash scripts or fearless one-liners. However, my awk is rudimentary at best, I think mostly because it's such an unforgiving environment to experiment in.

These books you've written are great for a bit of first principles insight and then quickly building up to functional usage. I will have no hesitation in referring colleagues to them!

feedback on Hacker News


Feedback and Contributing

⚠️ Please DO NOT submit pull requests, mainly because any modification requires changes in multiple places.

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors.

You can reach me via:


Table of Contents

  1. Preface
  2. Installation and Documentation
  3. awk introduction
  4. Regular Expressions
  5. Field separators
  6. Record separators
  7. In-place file editing
  8. Using shell variables
  9. Control Structures
  10. Built-in functions
  11. Multiple file input
  12. Processing multiple records
  13. Two file processing
  14. Dealing with duplicates
  15. awk scripts
  16. Gotchas and Tips
  17. Further Reading

Acknowledgements

Special thanks to all my friends and online acquaintances for their help, support and encouragement, especially during these difficult times.


License

The book is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The code snippets are licensed under MIT, see LICENSE file.


learn_gnuawk's Issues

Semantic & grammatical issues with preface

The preface says,

As an analogy, consider learning to drive a bike or a car.

But, we don't drive bikes; we ride them.

Later, under Feedback and Errata it says,

I would highly appreciate if you'd let me know how you felt about this book, it would help to improve this book as well as my future attempts.

This is a comma-spliced run-on sentence. The comma should be a period.

Wrong explanation in code for optional meta character

The file gnu_awk.md says that '<' is replaced with '\<' only if it is not preceded by '\'. That is never the case, because the optional metacharacter matches zero or one occurrence. The explanation should be something like: '<', optionally preceded by '\', is replaced with '\<'. One way to verify this is to change the code and run it:

$ echo 'blah \< foo bar < blah baz <' | awk '{gsub(/\\?</, "X")} 1'
blah X foo bar X blah baz X

Exercises 'awk introduction' b

The request is

For the input file addr.txt, display first field of lines not containing y. Consider space as the field separator for this file.

Answer should be

$ awk '$1 !~ /y/' addr.txt
Hello World
How are you
This game is good
12345
You are funny

The check will be applied to the first field.
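
The difference between testing the whole line and testing only the first field can be seen with a small sketch. The three sample lines below are an assumption made for illustration, not the actual contents of addr.txt, and the variable names are hypothetical.

```bash
# Hypothetical sample input (not the real addr.txt)
sample='Hello World
Today is sunny
You are funny'

# Test applied to the whole line: skip lines containing y anywhere,
# then print only the first field of the remaining lines
whole_line=$(printf '%s\n' "$sample" | awk '!/y/{print $1}')

# Test applied to the first field only: print whole lines whose $1 has no y
first_field=$(printf '%s\n' "$sample" | awk '$1 !~ /y/')

printf 'whole line test:\n%s\n' "$whole_line"
printf 'first field test:\n%s\n' "$first_field"
```

With this sample, the whole-line test keeps only the first line, while the first-field test also keeps the third line because `You` has an uppercase `Y` and the match is case sensitive.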

Regexp exercise 15): solution improvement

First of all, thanks for the great tutorial and the CLI app for exercising!

Sorry, I accidentally pressed Submit before finishing my issue. Here's the rest.

In question 21/88, i.e. exercise 15) of the Regular Expressions section, the proposed solution is this:

**15)** For the input file `patterns.txt`, filter lines containing three or more occurrences of `ar` and replace the last but second `ar` with `X`.
```bash
$ awk 'BEGIN{r = @/(.*)ar((.*ar){2})/} $0~r{print gensub(r, "\\1X\\2", 1)}' patterns.txt
par car tX far Cart
pXt cart mart
```

Isn't this a shorter solution:

awk '/(.*ar.*){3,}/{print gensub(/ar/, "X", NF-2)}' patterns.txt

Also, is the phrasing "last but second" correct? I was rather confused. Is the intended meaning the same as that of the word "antepenultimate"?
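
Note that `NF-2` counts fields rather than occurrences of `ar`, so the shorter version gives the intended result only when every field contains exactly one `ar`. A minimal sketch that counts the occurrences directly is given below. It avoids `gensub` so that it also runs on awk implementations other than GNU awk. The first two input lines are reconstructed from the output shown above and the third is a made-up line with fewer than three matches; all three are assumptions, not the actual contents of patterns.txt.

```bash
result=$(printf '%s\n' 'par car tar far Cart' 'part cart mart' 'two spare computers' |
awk '{
    copy = $0
    n = gsub(/ar/, "&", copy)    # "&" puts the match back, so this only counts
    if (n < 3) next              # need three or more occurrences
    target = n - 2               # the third occurrence counting from the end
    rest = $0
    out = ""
    for (i = 1; i <= target; i++) {
        match(rest, /ar/)
        if (i < target)
            out = out substr(rest, 1, RSTART + RLENGTH - 1)   # keep this match
        else
            out = out substr(rest, 1, RSTART - 1) "X"         # replace this match
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
}')

printf '%s\n' "$result"
# par car tX far Cart
# pXt cart mart
```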

suggestion for FPAT section: csvquote

Hi Sundeep- thanks, this is a great resource!

In the section about Field Separators-FPAT there is a warning note saying that FPAT will not work for csv files. That's true, and it is why csvquote exists. In contrast to xsv (which you mention) and other csv-processing suites such as miller and csvtools, the goal of csvquote is to provide just enough csv awareness to allow any text processing tools to work effectively with problematic csv data. There are still two caveats to be aware of, although they are uncommon: the data is assumed to not already contain 2 specific nonprinting ASCII characters (0x1e and 0x1f), and within the awk script there should not be any matching based on the embedded commas and newlines.

Also works for TSV and other data that follow RFC-4180.
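
The masking idea behind that first caveat can be sketched in plain awk. This is only an illustration, not csvquote's implementation: it handles embedded commas but not embedded newlines, the sample rows are made up, the choice of 0x1f as the stand-in for a comma is an assumption, and `hide_commas` is a hypothetical helper name.

```bash
# Hypothetical CSV sample: the second field of the last row contains a comma
result=$(printf '%s\n' 'id,name,city' '1,"Doe, Jane",Paris' |
awk -F, '
BEGIN { mask = "\037" }    # 0x1f, assumed to be absent from the data

# Replace commas that sit inside double quotes with the mask character
function hide_commas(line,    out, i, c, inq) {
    out = ""; inq = 0
    for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"") inq = !inq
        else if (c == "," && inq) c = mask
        out = out c
    }
    return out
}

{
    $0 = hide_commas($0)     # assigning $0 re-splits the fields on plain commas
    name = $2
    gsub(mask, ",", name)    # restore the embedded comma before printing
    print name
}')

printf '%s\n' "$result"
# name
# "Doe, Jane"
```

The sketch also shows why the data must not already contain the mask character: any 0x1f present in the input would be turned into a comma by the restore step.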

suggestion : counting chars vs. counting bytes

awk 'length($1) < 6' table.txt
echo 'αλεπού' | awk '{print length()}'
echo 'αλεπού' | awk -b '{print length()}'
echo 'αλεπού' | LC_ALL=C awk '{print length()}'

One doesn't need to use LC_ALL=C or activate byte mode with -b just to count the exact bytes of the input.

Even in gawk unicode mode, use

- length(str)

  to count UTF-8 characters, and

- match(str, /$/) - 1

  to count bytes.

This works because the code requests a match of the empty string at the tail, and since no other characters were matched along the way, gawk defaults to reporting a byte count. The minus 1 is essential because otherwise RSTART would be 1 virtual byte beyond the input string.

You can directly throw binary files like .MP3, .MP4, .XZ and .PNG at it, and gawk unicode mode will give you the byte count without any error messages.

That said, only the match() form stays quiet when you throw binary data at gawk in unicode mode. length() will DEFINITELY scream, as will match(str, /.$/). (Note the dot . right before $: on valid UTF-8 input this call is equivalent to length(), but on random bytes it will DEFINITELY give you the locale error message.)

So it can't be used to circumvent length()'s error message if the input is pure binary; one needs to code up an alternative approach to count it, e.g. via gsub().

It took me a while to code it up myself, but now I can get byte mode to count UTF-8 characters, get unicode mode to take in binary data directly, and have it report a count identical to GNU wc.
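
A minimal sketch for trying this out is given below. It writes a two-byte UTF-8 character using octal escapes, checks the byte count in the C locale against `wc -c`, and then runs the empty-match expression from this report in the current locale. No output is claimed for that last command, since the result in a UTF-8 locale depends on the awk implementation and version. The variable names are hypothetical.

```bash
# Two-byte UTF-8 encoding of the Greek letter alpha, written as octal escapes
# so the example does not depend on how this text itself is encoded
alpha=$(printf '\316\261')

# In the C locale every byte is one character, so length() reports bytes
c_len=$(printf '%s\n' "$alpha" | LC_ALL=C awk '{print length()}')

# Reference byte count from wc (tr strips padding that some wc versions add)
wc_len=$(printf '%s' "$alpha" | wc -c | tr -d ' ')

# The empty-match expression from the report, run in the current locale
m_len=$(printf '%s\n' "$alpha" | awk '{print match($0, /$/) - 1}')

printf 'C locale length(): %s\n' "$c_len"
printf 'wc -c:             %s\n' "$wc_len"
printf 'match(/$/) - 1:    %s\n' "$m_len"
```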
