
feediron-recipes's People

Contributors

bryanlyon, cwmke, dugite-code, gregthib, hugoideler, jreming85, m42e, mcbyte-it, narga, overzealous, pr0ps, radlerandi, spinda, symphorien, teroshan, uqs

feediron-recipes's Issues

Standard format

I noticed that some recipe file names end with a TLD (e.g. arstechnica.com) and some do not (e.g. zdnet).

Can we make this the standard format for file names?
domain.tld
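A naming rule like this is easy to check automatically. The following is a hypothetical Python sketch, not something feediron ships: it accepts a name if it ends in a dot-separated alphabetic TLD, and it assumes an optional `.json` extension should be ignored. The pattern is deliberately loose and does not consult a real TLD list.

```python
import re

# Assumed rule: dot-separated labels ending in an alphabetic TLD of 2+ letters.
DOMAIN_TLD = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}")


def follows_convention(filename: str) -> bool:
    """Return True if a recipe file name looks like domain.tld.

    Assumption: a trailing ".json" extension is not part of the name,
    so "zdnet.json" is checked as "zdnet" and rejected.
    """
    name = filename.lower()
    if name.endswith(".json"):
        name = name[: -len(".json")]
    return DOMAIN_TLD.fullmatch(name) is not None


for name in ["arstechnica.com", "zdnet", "heise.de.json", "zdnet.json"]:
    print(name, follows_convention(name))
```

Stripping the extension first matters: without it, "zdnet.json" would itself look like a domain with the TLD "json".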

For different recipes on the same domain (e.g. heise.de), maybe the config could be restructured like this:

    "config": [
        {
            "match": "heise.de\/tr",
            ...
        },
        {
            "match": "heise.de\/tp",
            ...
        }
    ]
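One way a list-valued config could be consumed: walk the entries in order and use the first one whose `match` string occurs in the article URL. This is a minimal Python sketch, not feediron's code; the list-valued `config`, plain substring matching, first-match-wins order, and the `pick_config` helper are all assumptions, and the `"..."` values stand in for the elided per-section settings. It also shows that `\/` in a JSON string decodes to a plain `/`.

```python
import json
from typing import Optional

# Hypothetical recipe using the proposed list-valued "config".
# "..." values are placeholders for the per-section settings.
RECIPE = r"""
{
    "name": "heise.de",
    "config": [
        {"match": "heise.de\/tr", "xpath": "..."},
        {"match": "heise.de\/tp", "xpath": "..."}
    ]
}
"""


def pick_config(recipe: dict, url: str) -> Optional[dict]:
    """Return the first config entry whose "match" occurs in url (assumed rule)."""
    for entry in recipe["config"]:
        if entry["match"] in url:  # json.loads already turned "\/" into "/"
            return entry
    return None


recipe = json.loads(RECIPE)
print(pick_config(recipe, "https://www.heise.de/tp/features/example.html"))
```

Ordering matters under a first-match rule: a broader pattern such as a bare `heise.de` would have to come last, or it would shadow the more specific entries.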

Site config - reddit.com link threads

It's been a while since I created some of these, and this one is very barebones.

I got a little tired of some of my reddit feeds just showing a link with no context, so I wrote this to convert the link into content.

{
    "name": "www.reddit.com",
    "url": "www.reddit.com",
    "stamp": 1686163767,
    "author": "JRem",
    "match": "www.reddit.com",
    "config": {
        "type": "xpath",
        "multipage": {
            "xpath": "a[@data-post-click-location='post-media-content']",
            "append": true,
            "recursive": false,
            "tidy-source": true
        }
    }
}
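To see which link the `multipage` xpath targets, here is a stdlib-only Python sketch that evaluates the same attribute predicate against a made-up fragment of reddit markup. Assumptions: feediron applies the recipe's xpath anywhere in the page (modelled here with ElementTree's `.//` prefix), and the sample is hypothetical, well-formed markup so `xml.etree` can parse it; real pages would need an HTML parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a reddit link post.
SAMPLE = """
<div>
  <a href="/r/example/comments/abc/title/">comments</a>
  <a data-post-click-location="post-media-content"
     href="https://example.com/linked-article">linked article</a>
</div>
"""

# Same predicate as the recipe's multipage xpath; ".//" searches all descendants.
XPATH = ".//a[@data-post-click-location='post-media-content']"


def media_links(html: str) -> list[str]:
    """Return the href of every <a> the multipage xpath would select."""
    root = ET.fromstring(html.strip())
    return [a.get("href") for a in root.findall(XPATH)]


print(media_links(SAMPLE))
```

The selected href is presumably the page that multipage then fetches and, with `"append": true`, adds to the item content.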

History not preserved from old repo

The history of the recipes wasn't preserved when moving to this new repo. Since a lot of the recipes are large regular expressions, any context in the commit history is valuable for future contributors trying to fix something.

For example, I made the following commit in the old repo: feediron/ttrss_plugin-feediron@04c74bf. Not all commit messages are that detailed, but I think there's still value in them.

My current fork of this repo has all the history preserved: https://github.com/pR0Ps/feediron-recipes. Would you consider replacing this repo with the content of that one?

Like so:

cd <your recipes repo>
git remote add pR0Ps https://github.com/pR0Ps/feediron-recipes.git
git fetch pR0Ps
git checkout master
git reset --hard pR0Ps/master
git push --force origin master

Alternatively, you can do the same thing I did from a fresh copy of your repo using the bash script below. This way you don't have to trust that I haven't manipulated the code in some way.

It will clone a fresh copy of the repo and rewrite the history to preserve the recipes folder and nothing else. It then cherry-picks your commits that added the readme, license, and GitHub templates. There's probably a better way of doing this, but meh.

The last line, which is commented out, force-pushes the new tree to the master branch of this repo.

# Grab a fresh copy of the repo
git clone https://github.com/feediron/ttrss_plugin-feediron.git
cd ttrss_plugin-feediron

# Check out the last commit with recipes in it
git checkout e22c0dfbdd30cdb8247657e2de19b59ccae2bcf4~1

# Split the history of the recipes folder into its own branch
git subtree split --prefix=recipes/ -b recipes
git checkout recipes

# Rewrite history to move all the recipes to a folder called "recipes"
git filter-branch --tree-filter "mkdir recipes; git mv -k * recipes" HEAD

# Add the refs from the current recipes repo
# git remote add recipes https://github.com/feediron/feediron-recipes.git # HTTPS
git remote add recipes git@github.com:feediron/feediron-recipes.git
git fetch recipes

# Cherry pick commits that add `LICENCE`, `README.md`, and `.github/*_TEMPLATE.md`
git cherry-pick 150c0a3b58d5df29e3412b550af4e3ebd4963e6e 3910db566e14ba666a0b5d847bb0c2bce153de40 9911bdffda833003f18c5b589aaeee319d3f1518

# Explore the history and file tree to make sure everything worked properly.
# When you're satisfied, use the following command to push the current HEAD up
# to the master branch of this repo:
#     git push --force recipes HEAD:master
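Before force-pushing, one quick check is that the rewritten HEAD contains only `recipes/` plus the cherry-picked files. A hedged Python sketch: it lists tracked paths with `git ls-tree -r -z --name-only` and reports anything outside an allow-list. The allowed names (`LICENCE`, `README.md`, `.github/*_TEMPLATE.md`) are copied from the script's comment and may not match the real file names exactly; `unexpected_paths` and `tracked_paths` are hypothetical helpers.

```python
import fnmatch
import subprocess

# Assumed allow-list, taken from the cherry-pick comment in the script.
ALLOWED_FILES = {"LICENCE", "README.md"}
ALLOWED_PATTERNS = [".github/*_TEMPLATE.md"]


def unexpected_paths(paths: list[str]) -> list[str]:
    """Return tracked paths that are neither under recipes/ nor allow-listed."""
    bad = []
    for path in paths:
        if path.startswith("recipes/") or path in ALLOWED_FILES:
            continue
        if any(fnmatch.fnmatchcase(path, pat) for pat in ALLOWED_PATTERNS):
            continue
        bad.append(path)
    return bad


def tracked_paths(rev: str = "HEAD") -> list[str]:
    """List every file tracked at rev; must run inside the rewritten checkout."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", "-z", "--name-only", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    # -z separates names with NUL, so unusual file names are not quoted.
    return [p for p in out.split("\0") if p]
```

Usage, from inside the rewritten repo: `print(unexpected_paths(tracked_paths()))` should print an empty list before you run the push.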

Note that I changed the folder of the recipes from "general" to "recipes".

I'm assuming "general" is the beginning of a categorization effort? I was going to make another issue about this, but figured I would lump it in here: I don't think categorization is the way to go for this. Not everyone's categories are the same and loads of sites would be considered to be in multiple categories. Personally, I think it would actually make finding a recipe harder, not easier. Right now you can select the dropdown, type the first letter and get pretty close to the site you want. With categories this probably won't be the case. Also, unless they're localized, the experience will get strictly worse from a non-English perspective.

IMO a much better way of solving the problem of too many options in the dropdown would be to limit them to the ones you actually want, something like feediron/ttrss_plugin-feediron#19.

Either way, your repo, your rules. It's pretty easy to change the script if you want the folder named "general".

Trouble with XPath expression, how to get the 2nd element?

Expected Behavior

Either the arstechnica.com recipe is broken or the site layout has changed, so the extracted element is often only half of the content.

Recipe Code

Please help provide information about the recipe.

{
    "name": "arstechnica.com",
    "url": "arstechnica.com",
    "stamp": 1470889961,
    "author": "cwmke",
    "match": "arstechnica.com",
    "config": {
        "type": "xpath",
        "xpath": "div[contains(@class, 'article-content')]",
        "multipage": {
            "xpath": "nav[contains(@class, 'page-numbers')]\/span\/a[last()]",
            "append": true,
            "recursive": true
        },
        "modify": [
            {
                "type": "regex",
                "pattern": "\/<li.*? data-src=\"(.*?)\".*?>\\s*<figure.*?>.*?(?:<figcaption.*?<div class=\"caption\">(.*?)<\\\/div>.*?<\\\/figcaption>)?\\s*<\\\/figure>\\s*<\\\/li>\/s",
                "replace": "<figure><img src=\"$1\"\/><figcaption>$2<\/figcaption><\/figure>"
            }
        ],
        "cleanup": [
            "aside",
            "div[contains(@class, 'sidebar')]"
        ]
    }
}
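For reference, the `modify` step rewrites gallery list items into plain `<figure>` elements. Below is a rough Python translation of that regex (a sketch, not feediron's PHP code): the PHP `/.../s` delimiters become `re.DOTALL`, `$1`/`$2` become `\1`/`\2`, and the JSON escapes (`\"`, `\\`, `\/`) are undone. The sample markup is made up.

```python
import re

# The recipe's pattern after JSON-decoding and dropping the PHP "/.../s" delimiters.
GALLERY_ITEM = re.compile(
    r'<li.*? data-src="(.*?)".*?>\s*<figure.*?>.*?'
    r'(?:<figcaption.*?<div class="caption">(.*?)<\/div>.*?<\/figcaption>)?'
    r'\s*<\/figure>\s*<\/li>',
    re.DOTALL,  # PHP's "s" modifier: "." also matches newlines
)
REPLACEMENT = r'<figure><img src="\1"/><figcaption>\2</figcaption></figure>'

sample = (
    '<li class="slide" data-src="https://example.com/full.jpg">'
    '<figure class="thumb"><img src="thumb.jpg">'
    '<figcaption><div class="caption">A caption</div></figcaption>'
    '</figure></li>'
)

print(GALLERY_ITEM.sub(REPLACEMENT, sample))
```

When an item has no caption, group 2 does not participate and Python (3.5+) substitutes an empty string, leaving an empty `<figcaption>`.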

Context

Ignore the modify regex; that is not the problem. This is the only example article I have at hand, and it's not supposed to be a political statement or anything (I'm just curious what all this Impostor stuff is actually about).

https://arstechnica.com/gaming/2020/10/aocs-twitch-streaming-debut-attracts-over-435000-among-us-viewers/

Run that article through the filter, and you'll notice that the bottom half of the article is missing.

The article structure is roughly like so:

<article>
  <div> <div> <section class="article-guts"> <div class="article-content post-page"> </div></section></div></div>
  <some ad stuff in here>
  <div> <div> <section class="article-guts"> rest of article in here </section></div></div>
</article>

The filter grabs the first article-content and runs with it. So I changed it to:

    "xpath": [
        "div[contains(@class, 'article-content')]",
        "(//section[@class='article-guts'])[1]"
    ],

I did this because in Chrome I can select it in the console using: $x("//section[@class='article-guts']")[1]
In feediron, however, this drops all content (and then falls back to displaying the full HTML).

I'm confused about how XPath works in general, how it works in feediron, and whether feediron concatenates the results of two expressions. Using just the single filter "section[@class='article-guts'][last()]" results in, you guessed it, the first article-guts content being displayed, not the second (last) one.

Help? Does feediron extract both XPaths and concatenate them? How can I get it to extract both article-guts sections? Why does it think the forward slashes need to be escaped, and why does it rewrite them?
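Some general XPath 1.0 facts bear on this (they describe XPath, not feediron's internals). Positions are 1-based, while Chrome's `$x()` returns a JavaScript array, so `$x(...)[1]` is the second match but XPath `(//section[@class='article-guts'])[1]` is the first; the second would be `(...)[2]`. Also, `//section[...][last()]` applies `last()` per parent, and since each article-guts section sits in its own parent, both sections satisfy it; if only the first is shown, feediron is presumably keeping just the first node of the result. If feediron prepends `//` to the configured expression (an unconfirmed assumption, suggested by recipes omitting it), an expression that already starts with `(//` would no longer be valid. The slash rewriting is ordinary JSON escaping: `\/` and `/` decode to the same string, and PHP's `json_encode` escapes `/` by default. The stdlib sketch below does not evaluate positional predicates (ElementTree's support for them is limited); it collects every match in document order and joins them, using made-up markup modelled on the structure above.

```python
import json
import xml.etree.ElementTree as ET

# Made-up, simplified version of the two-part article structure.
ARTICLE = """
<article>
  <div><div><section class="article-guts">
    <div class="article-content post-page"><p>First half.</p></div>
  </section></div></div>
  <aside>ad stuff</aside>
  <div><div><section class="article-guts">
    <p>Second half.</p>
  </section></div></div>
</article>
"""


def all_article_guts(html: str) -> list[ET.Element]:
    """Every <section class="article-guts">, in document order.

    Same node set as XPath //section[@class='article-guts'].
    """
    root = ET.fromstring(html.strip())
    return root.findall(".//section[@class='article-guts']")


sections = all_article_guts(ARTICLE)
# Python lists are 0-based like the array $x() returns:
# sections[1] is the second match, i.e. XPath (//section[...])[2].
second = sections[1]
# Joining every match keeps both halves of the article.
joined = "".join(ET.tostring(s, encoding="unicode") for s in sections)
print("".join(second.itertext()).strip())

# "\/" is just an escaped "/" in JSON; both spellings decode identically.
print(json.loads(r'"nav\/span\/a"'))
```

Matching on `contains(@class, 'article-guts')` rather than `@class='article-guts'` is more robust if the element carries additional class names.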
