feediron / feediron-recipes

Recipes for the TT-RSS Plugin Feediron

License: MIT License

feediron-recipes's People

de-es scheistyscheid jake-penguins tuxcoder uqs overzealous hugoideler vco1 bfly75

feediron-recipes's Issues

Standard format

I noticed that some recipe file names end with a TLD (e.g. arstechnica.com) and some do not (e.g. zdnet).

Can we make this a standard for file names?
domain.tld

As for different recipes on the same domain (e.g. heise.de), maybe restructure the config like this?

    "config": {
        "match": "heise.de\/tr",
        ...
    },
    {
        "match": "heise.de\/tp",
        ...
    }
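
A naming convention like this is easy to check mechanically. The sketch below is only an illustration: the helper name `follows_domain_tld`, the pattern, and the assumption that recipe files may carry a `.json` extension are hypothetical and not taken from the repository.

```python
import re

# Hypothetical rule: one or more dot-separated labels followed by an
# alphabetic TLD of at least two letters, e.g. "arstechnica.com".
DOMAIN_TLD = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$")


def follows_domain_tld(file_name: str) -> bool:
    """Return True if the recipe file name looks like domain.tld."""
    # Assumption: recipe files may end in ".json"; strip it before checking.
    stem = file_name[:-5] if file_name.endswith(".json") else file_name
    return DOMAIN_TLD.match(stem.lower()) is not None


# File names taken from the examples in this issue.
names = ["arstechnica.com", "zdnet", "heise.de.json"]
nonconforming = [n for n in names if not follows_domain_tld(n)]
print(nonconforming)
```

Running something like this over the recipes folder would list the files that need renaming before the convention is enforced.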

Site config - reddit.com link threads

It's been a while since I created one of these, and this one is very bare-bones.

I got a little tired of some of my Reddit feeds showing just a link with no context, so I wrote this to convert the link to content.

{
    "name": "www.reddit.com",
    "url": "www.reddit.com",
    "stamp": 1686163767,
    "author": "JRem",
    "match": "www.reddit.com",
    "config": {
        "type": "xpath",
        "multipage": {
            "xpath": "a[@data-post-click-location='post-media-content']",
            "append": true,
            "recursive": false,
            "tidy-source": true
        }
    }
}
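
The recipe hinges on a single attribute predicate that picks out the outbound link. Here is a minimal sketch of what that predicate selects, using Python's standard library rather than Feediron itself. The sample markup is invented and far simpler than a real Reddit page, and searching at any depth (`.//`) is an assumption about how the expression is applied.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup for a link post; the real page
# is much larger and may differ.
SAMPLE = """
<html><body>
  <a href="/r/example/comments/abc">comments</a>
  <a data-post-click-location="post-media-content"
     href="https://example.com/story">Linked story</a>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Same predicate as the recipe's multipage xpath, searched at any depth.
links = root.findall(".//a[@data-post-click-location='post-media-content']")
targets = [a.get("href") for a in links]
print(targets)
```

Only the anchor carrying the `data-post-click-location` attribute is selected; the comments link is ignored.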

History not preserved from old repo

The history of the recipes wasn't preserved when moving to this new repo. Since a lot of the recipes are large regular expressions, any context in the commit history is valuable for future contributors trying to fix something.

For example, I made the following commit in the old repo: feediron/ttrss_plugin-feediron@04c74bf. Not all commit messages are that detailed, but I think there's still value to them.

My current fork of this repo has all the history preserved: https://github.com/pR0Ps/feediron-recipes. Would you consider replacing this repo with the content of that one?

Like so:

cd <your recipes repo>
git remote add pR0Ps https://github.com/pR0Ps/feediron-recipes.git
git fetch pR0Ps
git checkout master
git reset --hard pR0Ps/master
git push --force origin master

Alternatively, you can do the same thing I did from a fresh copy of your repo using the bash script below. This way you don't have to trust that I haven't manipulated the code in some way.

It will clone a fresh copy of the repo and rewrite the history to preserve the recipes folder and nothing else. It then cherry-picks your commits that added the readme, license, and github templates. There's probably a more optimal way of doing this but meh.

The last commented out line force-pushes the new tree up to the master branch of this repo.

# Grab a fresh copy of the repo
git clone https://github.com/feediron/ttrss_plugin-feediron.git
cd ttrss_plugin-feediron

# Check out the last commit with recipes in it
git checkout e22c0dfbdd30cdb8247657e2de19b59ccae2bcf4~1

# Split the history of the recipes folder into its own branch
git subtree split --prefix=recipes/ -b recipes
git checkout recipes

# Rewrite history to move all the recipes to a folder called "recipes"
git filter-branch --tree-filter "mkdir recipes; git mv -k * recipes" HEAD

# Add the refs from the current recipes repo
# git remote add recipes https://github.com/feediron/feediron-recipes.git # HTTPS
git remote add recipes git@github.com:feediron/feediron-recipes.git
git fetch recipes

# Cherry pick commits that add `LICENCE`, `README.md`, and `.github/*_TEMPLATE.md`
git cherry-pick 150c0a3b58d5df29e3412b550af4e3ebd4963e6e 3910db566e14ba666a0b5d847bb0c2bce153de40 9911bdffda833003f18c5b589aaeee319d3f1518

# Explore the history and file tree to make sure everything worked properly.
# When you're satisfied, use the following command to push the current HEAD up
# to the master branch of this repo:
#     git push --force recipes HEAD:master

Note that I changed the folder of the recipes from "general" to "recipes".

I'm assuming "general" is the beginning of a categorization effort? I was going to make another issue about this, but figured I would lump it in here: I don't think categorization is the way to go for this. Not everyone's categories are the same and loads of sites would be considered to be in multiple categories. Personally, I think it would actually make finding a recipe harder, not easier. Right now you can select the dropdown, type the first letter and get pretty close to the site you want. With categories this probably won't be the case. Also, unless they're localized, the experience will get strictly worse from a non-English perspective.

IMO a much better way of solving the problem of too many options in the dropdown would be to limit them to the ones you actually want, something like feediron/ttrss_plugin-feediron#19.

Either way, your repo, your rules. It's pretty easy to change the script if you want the folder named "general".

Trouble with XPath expression, how to get the 2nd element?

Expected Behavior

The arstechnica.com recipe is broken and/or they changed their site layout so that the extracted element is often just half of the content.

Recipe Code

Please help provide information about the recipe.

{
    "name": "arstechnica.com",
    "url": "arstechnica.com",
    "stamp": 1470889961,
    "author": "cwmke",
    "match": "arstechnica.com",
    "config": {
        "type": "xpath",
        "xpath": "div[contains(@class, 'article-content')]",
        "multipage": {
            "xpath": "nav[contains(@class, 'page-numbers')]\/span\/a[last()]",
            "append": true,
            "recursive": true
        },  
        "modify": [
            {
                "type": "regex",
                "pattern": "\/<li.*? data-src=\"(.*?)\".*?>\\s*<figure.*?>.*?(?:<figcaption.*?<div class=\"caption\">(.*?)<\\\/div>.*?<\\\/figcaption>)?\\s*<\\\/figure>\\s*<\\\/li>\/s",
                "replace": "<figure><img src=\"$1\"\/><figcaption>$2<\/figcaption><\/figure>"
            }   
        ],  
        "cleanup": [
            "aside",
            "div[contains(@class, 'sidebar')]"
        ]   
    }   
}

Context

Ignore the modify regex; that is not the problem. I only have this example article at hand, and it is not supposed to be a political statement or anything (I'm just curious what all this Impostor stuff is actually about).

https://arstechnica.com/gaming/2020/10/aocs-twitch-streaming-debut-attracts-over-435000-among-us-viewers/

Run that article through the filter, and you'll notice that the bottom half of the article is missing.

The article structure is roughly like so:

<article>
  <div> <div> <section class="article-guts"> <div class="article-content post-page"> </div> </section> </div> </div>
  <some ad stuff in here>
  <div> <div> <section class="article-guts"> rest of article in here </section> </div> </div>
</article>

The filter grabs the first article-content and runs with it. So I changed it to:

    "xpath": [
        "div[contains(@class, 'article-content')]",
        "(//section[@class='article-guts'])[1]"
    ],

I did this because in Chrome I can select it in the console using: $x("//section[@class='article-guts']")[1]
In Feediron, however, this results in all content getting dropped (followed by the fallback to displaying the full HTML).

I'm confused as to how XPath works and how it works in Feediron and whether it would concatenate 2 expressions or whatever. Just running with the single filter of: "section[@class='article-guts'][last()]" results in, you guessed it, the first article-guts content getting displayed, not the 2nd or last one.
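
Two details of XPath may explain the behaviour described above. First, XPath positions are 1-based, so `(...)[1]` is the first node, whereas `$x(...)[1]` in Chrome indexes a 0-based JavaScript array and returns the second. Second, a positional predicate such as `[last()]` attached to a step is evaluated per parent, not across the whole document. The sketch below shows both with Python's standard library on a well-formed stand-in for the article structure. It illustrates XPath semantics only; whether Feediron accepts the parenthesised form is not established in this thread.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the rough article structure described above.
ARTICLE = """
<article>
  <div><div><section class="article-guts">
    <div class="article-content post-page">first half</div>
  </section></div></div>
  <div class="ad">ad</div>
  <div><div><section class="article-guts">second half</section></div></div>
</article>
"""

root = ET.fromstring(ARTICLE)
guts = ".//section[@class='article-guts']"

# A positional predicate is evaluated among siblings under each parent.
# Each section is the only section in its own <div>, so every one of them
# is both "first" and "last", and none of them is "second".
last_per_parent = root.findall(guts + "[last()]")
second_per_parent = root.findall(guts + "[2]")

# Position in document order needs the whole result set. Full XPath 1.0
# writes that as (//section[@class='article-guts'])[2]; ElementTree has no
# parenthesised form, so index the Python list instead (0-based, like the
# JavaScript array returned by $x in Chrome).
all_guts = root.findall(guts)
second_in_document = "".join(all_guts[1].itertext()).strip()

print(len(last_per_parent), len(second_per_parent), second_in_document)
```

Because `[last()]` matches both sections here, a consumer that keeps only the first match would still end up with the first half of the article.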

Help? Does Feediron extract both XPaths and concatenate them? How can I get it to extract both article-guts sections? Why does it think the forward slashes need to be escaped, and why does it rewrite them?
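
On the last question: JSON permits `\/` as an escape for `/`, and some serializers (PHP's `json_encode`, by default) emit it, so a saved recipe can come back with escaped slashes without its meaning changing. That this is what happens when Feediron rewrites the config is an assumption here. The equivalence itself is easy to confirm:

```python
import json

# "\/" is a legal JSON escape for "/", so both spellings decode identically.
escaped = json.loads(r'"nav\/span\/a[last()]"')
plain = json.loads('"nav/span/a[last()]"')
print(escaped == plain, escaped)
```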
