nightmachinery / r_rational

r/rational archived in plain-text org-mode (good for, e.g., doing offline full-text searches)

Home Page: https://sourcegraph.com/search?q=context%3Aglobal+file%3Aindices%2F.*.org+repo%3A%5Egithub%5C.com%2FNightMachinery%2Fr_rational%24+&patternType=regexp&groupBy=path

org org-mode rational-fiction archive archiving plaintext plain-text reddit subreddit full-text-extraction

r/rational in org-mode

This is an archive of all the posts in the r/rational subreddit in plain-text org-mode.

I personally use it to do fast, offline full-text searches on the whole subreddit.

Reddit does not map cleanly to org-mode, so I am open to ideas on changing the template used to create the org-mode files.

GitHub renders org headings as HTML headers, which doesn’t work at all for these files. Use an org-mode viewer, or just open them as plain text.

Full-text search guides

Online full-text search via Sourcegraph

Sourcegraph is optimized for searching code, so it is not ideal for our purposes, but it’s still much better than Reddit’s own search.

See Sourcegraph’s query syntax documentation for details. The important points are that it supports regular expressions and expects the words to appear in the order given, unless you combine them with boolean operators, as in japanese AND horror.

Note that the link above searches the indices directory, where each file contains only a single comment. This is usually what you want. (Its only drawback is that it’s tedious to find the comments surrounding a result.) To search per submission (instead of per comment), use this link, which searches the posts directory instead.

Searching via ugrep

Install ugrep (https://github.com/Genivia/ugrep), e.g., with Homebrew:

brew install ugrep

Now paste this function into your shell:

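ugc () {
    # Recursive, boolean, smart-case search with 3 lines of context, paged through less.
    # -R lets less render the color codes forced by --color=always.
    ugrep --heading --color=always --pretty --context=3 --recursive --bool --smart-case '--sort=best' --no-confirm --perl-regexp --hidden '--binary-files=without-match' "$@" | less -nR
}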

Now you can do:

git clone --recursive https://github.com/NightMachinary/r_rational
cd r_rational/posts
ugc 'japanese horror'

ugrep also supports an interactive, incremental search mode:

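# zsh only: ${@[-1]} and ${@[1,-2]} are zsh array subscripts.
function ugci {
    # The last argument is the initial pattern; everything before it is passed to ugrep as options.
    local r="${@[-1]}" opts=("${@[1,-2]}")

    ugrep --heading --color=always --pretty --context=3 --recursive --bool --smart-case '--sort=best' --no-confirm --perl-regexp --hidden '--binary-files=without-match' "$opts[@]" --query=1 --regexp="$r"
}
ugci 'japanese horror'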
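
The ugci function above uses zsh array subscripts, so it will not work if pasted into bash or sh. Below is a rough sketch of a portable variant that separates the last argument from the rest using only positional parameters. The name ugci_portable and the UGREP_BIN variable are my own inventions for illustration (UGREP_BIN only exists so the argument handling can be checked without launching ugrep); the ugrep flags are copied unchanged from the function above.

```shell
# Portable sketch of ugci: the last argument is the initial pattern,
# everything before it is passed to ugrep as extra options.
# UGREP_BIN is a hypothetical override, not something ugrep itself reads.
ugci_portable() {
    if [ "$#" -lt 1 ]; then
        echo 'usage: ugci_portable [ugrep options...] PATTERN' >&2
        return 2
    fi
    local last='' i=0 n="$#" arg
    # Re-append every argument except the last one, then drop the originals.
    for arg in "$@"; do
        i=$((i + 1))
        if [ "$i" -eq "$n" ]; then
            last=$arg
        else
            set -- "$@" "$arg"
        fi
    done
    shift "$n"
    "${UGREP_BIN:-ugrep}" --heading --color=always --pretty --context=3 \
        --recursive --bool --smart-case --sort=best --no-confirm \
        --perl-regexp --hidden --binary-files=without-match \
        "$@" --query=1 --regexp="$last"
}
```

Usage is the same as before, e.g. ugci_portable 'japanese horror', run from the posts directory.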

FAQ

Reduce storage costs by deleting indices

The indices directory stores each comment in its own file, which is very space-inefficient on filesystems with a 4KB block size, because even a tiny file occupies at least one full block. If you don’t use these files, deleting the directory will shrink the repo considerably (as of this writing, the posts directory is only 163MB). You can also delete the .git directory, but then you lose access to git features such as pulling new updates.
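
If you want to check the numbers on your own checkout before deleting anything, a small helper like the one below can report how many files a directory holds and how much disk space it actually occupies. This is only a sketch: dir_stats is a name I made up, and it relies on nothing beyond the standard find, wc, du and cut tools.

```shell
# Print the number of regular files under a directory and the space it
# occupies on disk in KiB (du -k counts allocated blocks, not byte length).
# The body runs in a subshell so its variables do not leak into your session.
dir_stats() (
    dir=$1
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    kib=$(du -sk "$dir" | cut -f1)
    printf '%s: %s files, %s KiB on disk\n' "$dir" "$files" "$kib"
)

# From the root of the repo you could then compare the two directories:
#   dir_stats indices
#   dir_stats posts
```

A directory with many tiny files will show a large KiB figure relative to its file count, which is the inefficiency described above.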

Search excluding the authors’ names

The easiest way to achieve this is to delete the authors’ names from the data using a search-and-replace tool such as ms-jpq/sad:

fd . | sad '\s*:author:.*' ''

fd . | sad 'u/\S+' 'u/redacted'

How was this repo made?

This repo was created using this script, which needs some refactoring to be decoupled from my environment.

I plan to keep the repo up-to-date as new posts are added to the subreddit.
