Comments (22)

markpritchard avatar markpritchard commented on June 1, 2024 3

Hi,

Quick update on this - I have the library compiling after the switch to quick-xml and most of the test cases fixed. Just tracking down the last couple of test failures.

from feed-rs.

jangernert avatar jangernert commented on June 1, 2024 2

Used in production, but 0 open issues and 0 open pull requests. THE DREAM!


markpritchard avatar markpritchard commented on June 1, 2024

Hi @ayrat555, could this be another example of a redirection? The feed works fine for me, e.g. using the testurls program included with feed-rs:

$ echo 'https://www.tjrs.jus.br/site_php/noticias/news_rss.php' | cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `/home/mpritcha/devel/projects/external/rust/feed-rs/target/debug/testurls`
https://www.tjrs.jus.br/site_php/noticias/news_rss.php  ... ok


ayrat555 avatar ayrat555 commented on June 1, 2024

@markpritchard It doesn't seem like it.
It seems like an encoding issue. I'm passing the raw bytes to feed-rs, which it cannot handle:

    // Build an HTTP client with a 5 second timeout that follows up to 10 redirects.
    let client = match HttpClient::builder()
        .timeout(Duration::from_secs(5))
        .default_header("User-Agent", "el_monitorro/0.1.0")
        .redirect_policy(RedirectPolicy::Limit(10))
        .build()
    {
        Ok(cl) => cl,
        Err(er) => {
            let msg = format!("{:?}", er);

            return Err(FeedReaderError { msg });
        }
    };

    match client.get(url) {
        Ok(mut response) => {
            // Debug output: response status and headers.
            println!("{:?}", response);

            // Copy the body into a byte buffer without decoding it.
            let mut writer: Vec<u8> = vec![];

            if let Err(err) = io::copy(response.body_mut(), &mut writer) {
                let msg = format!("{:?}", err);

                return Err(FeedReaderError { msg });
            }
            // Debug output: the raw, still encoded bytes that are handed to the parser.
            println!("{:?}", writer);
            Ok(writer)
        }
        Err(error) => {
            let msg = format!("{:?}", error);

            Err(FeedReaderError { msg })
        }
    }

The same issue occurs with https://pikabu.ru/xmlfeeds.php?cmd=popular.


ayrat555 avatar ayrat555 commented on June 1, 2024

Converting to UTF-8 and passing the result to feed-rs works. I will use this workaround for now. But still, it's not the right way.

So maybe this behaviour should be documented. Also, I think it would be nice to make it configurable.
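
For reference, a minimal sketch of that kind of workaround, assuming the feed is encoded as ISO-8859-1 (Latin-1). That encoding is an assumption and not something confirmed for the feeds above; other encodings such as windows-1251 need a proper decoding table, for example from a crate like encoding_rs. The function names are made up for illustration and are not part of feed-rs.

```rust
/// Decode ISO-8859-1 (Latin-1) bytes into a UTF-8 `String`.
/// Every Latin-1 byte maps to the Unicode code point with the same value,
/// so this conversion cannot fail.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Hypothetical helper: keep the bytes as they are when they are already
/// valid UTF-8, otherwise fall back to interpreting them as Latin-1
/// (the assumed source encoding).
fn to_utf8_lossless(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        Ok(text) => text,
        // `as_bytes` gives back the original, undecodable input.
        Err(err) => latin1_to_utf8(err.as_bytes()),
    }
}
```

The resulting `String` can then be handed to the parser as bytes via `as_bytes()`. The fallback only makes sense when the source encoding really is Latin-1; for anything else it will silently produce wrong characters.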


jangernert avatar jangernert commented on June 1, 2024

Smells a bit like #10


ayrat555 avatar ayrat555 commented on June 1, 2024

The rss and atom-syndication crates can handle such cases. You can check out how they solved it. I think they use the encoding declared in the XML.
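
As a rough illustration of reading the encoding from the XML declaration, here is a sketch in plain Rust. It is not how rss or atom-syndication actually do it (I have not checked their code); `declared_encoding` is a hypothetical helper, and it only covers ASCII-compatible encodings, so UTF-16 documents are not handled.

```rust
/// Return the lowercased value of the `encoding` pseudo-attribute from the
/// XML declaration (`<?xml ... encoding="..."?>`), if there is one.
fn declared_encoding(bytes: &[u8]) -> Option<String> {
    // The declaration must be at the very start, so a short prefix is enough.
    let prefix = &bytes[..bytes.len().min(256)];
    // The declaration itself is ASCII in every ASCII-compatible encoding,
    // so a lossy conversion is good enough for sniffing.
    let lossy = String::from_utf8_lossy(prefix);
    // Skip a UTF-8 byte order mark if one is present.
    let head: &str = lossy.trim_start_matches('\u{feff}');
    if !head.starts_with("<?xml") {
        return None;
    }
    // Restrict the search to the declaration itself.
    let decl = &head[..head.find("?>")?];
    let rest = &decl[decl.find("encoding")? + "encoding".len()..];
    let rest = rest.trim_start().strip_prefix('=')?.trim_start();
    // The value is wrapped in single or double quotes.
    let quote = rest.chars().next().filter(|c| *c == '"' || *c == '\'')?;
    let value = &rest[1..];
    let end = value.find(quote)?;
    Some(value[..end].to_ascii_lowercase())
}
```

A caller could use the returned label to pick a decoder before parsing, and fall back to UTF-8 when it returns `None`.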


markpritchard avatar markpritchard commented on June 1, 2024

@ayrat555 this was actually solved nicely by @jangernert with a patch to xml-rs. Unfortunately that crate doesn't seem to be maintained any more so I forked it and merged Jan's patch into my version. I'll write up some test cases and release a new version of feed-rs with this fix in it soon (just been busy with other things ... I live in Victoria, Australia and we are back into lockdown now... urgh).


markpritchard avatar markpritchard commented on June 1, 2024

I tested with the xml-rs patch, but decoding the Characters event after it is created fails because it's expecting to decode the accumulated buffer as a valid UTF-8 string.

I'll try switching to quick-xml (thanks for the tip @jangernert )


ayrat555 avatar ayrat555 commented on June 1, 2024

rust-syndication/rss#87 (comment) - rss also uses quick-xml


jangernert avatar jangernert commented on June 1, 2024

@markpritchard before you start working on this: I have a WIP branch that switches to quick-xml. I'll try to get it into a working state and propose an MR for discussion.


jangernert avatar jangernert commented on June 1, 2024

WIP is here: https://github.com/jangernert/feed-rs/tree/quick-xml
It's still messy. I'm trying to get all the tests to pass before opening the MR.


markpritchard avatar markpritchard commented on June 1, 2024

That is cool, thanks @jangernert ... I got a fair way on my changes too, but will take a look at your WIP as well.


jangernert avatar jangernert commented on June 1, 2024

Oh, how far along are you? I have it compiling again and am at 8/28 tests passing. But I did probably do more copying than necessary to get out of lifetime hell with elements also possessing a reference to the element source.
If you are at a similar stage I don't think it makes sense for both of us to write more or less the same code.


markpritchard avatar markpritchard commented on June 1, 2024


jangernert avatar jangernert commented on June 1, 2024

Okay then I'll stop working on it and let you do your thing.


kkszysiu avatar kkszysiu commented on June 1, 2024

Hey guys,

How is it going with this issue?


markpritchard avatar markpritchard commented on June 1, 2024

Hey @kkszysiu ,

Apologies for the delay ... I've been pretty distracted with COVID stuff lately (live in Melbourne, AU ... we are now in stage 4 lockdown).

Hoping to get back on this later this week.


kkszysiu avatar kkszysiu commented on June 1, 2024

@markpritchard thank you, no rush. I'm just trying to understand if you're planning to still maintain this library :)

I've prepared a feed parser library for Python based on this lib, here: https://github.com/kkszysiu/ultrafeedparser
As of now it is the fastest library for parsing feeds in Python: https://github.com/kkszysiu/ultrafeedparser/tree/master/bench#statistics

Thanks!


markpritchard avatar markpritchard commented on June 1, 2024

Out of curiosity, @kkszysiu: do you have a Python project that needs to parse various feeds? Or was it just interest that led you to prepare the benchmark?

I'm interested to see how things change with the switch to quick-xml as the parser backend. The crate documentation mentions it's 50x faster, and there are some more stats here: https://github.com/RazrFalcon/choose-your-xml-rs


kkszysiu avatar kkszysiu commented on June 1, 2024

I'm planning to start using it in production to parse various feeds, yeah. I'm working on a project where we have a few hundred of them to parse as periodic tasks, and the faster we can do that, the better of course.
But here the performance is just an added value, not a requirement.

I will definitely run new benchmarks once a new version of feed-rs that uses quick-xml is released :)


markpritchard avatar markpritchard commented on June 1, 2024

Cool, thanks for the background @kkszysiu :)

I'll fix up #60 and #61 too, then cut another release (likely 0.4.0 given the internal library change)
