Comments (22)
Hi,
Quick update on this - I have the library compiling after the switch to quick-xml and most of the test cases fixed. Just tracking down the last couple of test failures.
from feed-rs.
Used in production, but 0 open issues and 0 open pull requests. THE DREAM!
Hi @ayrat555, could this be another example of a redirection? The feed works fine for me, e.g. using the `testurls` program included with feed-rs:
```
$ echo 'https://www.tjrs.jus.br/site_php/noticias/news_rss.php' | cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.05s
Running `/home/mpritcha/devel/projects/external/rust/feed-rs/target/debug/testurls`
https://www.tjrs.jus.br/site_php/noticias/news_rss.php ... ok
```
@markpritchard It doesn't seem like it.
It looks like an encoding issue instead. I'm passing raw bytes to feed-rs, which it cannot handle:
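```rust
use std::io;
use std::time::Duration;

// isahc 0.x-era API, assumed from the original snippet; the prelude brings
// the builder/request traits into scope.
use isahc::config::RedirectPolicy;
use isahc::prelude::*;
use isahc::HttpClient;

// Minimal definition assumed for the error type used below.
#[derive(Debug)]
struct FeedReaderError {
    msg: String,
}

// Hypothetical wrapper: the original snippet was only the function body.
fn fetch_feed(url: &str) -> Result<Vec<u8>, FeedReaderError> {
    let client = match HttpClient::builder()
        .timeout(Duration::from_secs(5))
        .default_header("User-Agent", "el_monitorro/0.1.0")
        .redirect_policy(RedirectPolicy::Limit(10))
        .build()
    {
        Ok(cl) => cl,
        Err(er) => {
            let msg = format!("{:?}", er);
            return Err(FeedReaderError { msg });
        }
    };

    match client.get(url) {
        Ok(mut response) => {
            println!("{:?}", response);
            // The body is copied verbatim: raw bytes in whatever encoding
            // the server used, which is not necessarily UTF-8.
            let mut writer: Vec<u8> = vec![];
            if let Err(err) = io::copy(response.body_mut(), &mut writer) {
                let msg = format!("{:?}", err);
                return Err(FeedReaderError { msg });
            }
            println!("{:?}", writer);
            Ok(writer)
        }
        Err(error) => {
            let msg = format!("{:?}", error);
            Err(FeedReaderError { msg })
        }
    }
}
```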
The same issue occurs with https://pikabu.ru/xmlfeeds.php?cmd=popular.
Converting the bytes to UTF-8 and then passing them to feed-rs works. I'll use this workaround for now, but it's still not the right way.
So maybe this behaviour should be documented. I also think it would be nice to make it configurable.
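A minimal sketch of that kind of workaround using only the standard library, under the assumption that the feed is ISO-8859-1 (Latin-1) encoded. The thread does not say which encoding these feeds use or how the conversion was done, and `latin1_to_utf8` is a hypothetical helper. Latin-1 is the one encoding where every byte value equals its Unicode code point, so a byte-to-`char` cast is a complete decoder; other single-byte encodings such as windows-1252 or windows-1251 map some or all bytes above 0x7F differently, so a real implementation would more likely use a crate like `encoding_rs`.

```rust
/// Hypothetical helper: decode ISO-8859-1 (Latin-1) bytes into a UTF-8 `String`.
/// In Latin-1 each byte value is exactly the Unicode code point it encodes,
/// so casting every byte to `char` is lossless for this encoding only.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "notícias" as raw Latin-1 bytes: 'í' is the single byte 0xED,
    // which on its own is not valid UTF-8.
    let raw: &[u8] = b"not\xEDcias";
    let text = latin1_to_utf8(raw);
    println!("{}", text); // prints "notícias"
    // `text.as_bytes()` is now valid UTF-8 and could be handed to the feed parser.
}
```

One caveat, stated as an assumption to check against the parser in use: if the document's XML declaration still names `ISO-8859-1` after conversion, a parser that honours the declaration could try to decode the text a second time.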
Smells a bit like #10
The `rss` and `atom-syndication` crates can handle such cases; you could check out how they solved them. I think they use the encoding declared in the XML.
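A minimal standard-library sketch of that idea: read the `encoding` pseudo-attribute from the XML declaration before choosing a decoder. `declared_encoding` is a hypothetical helper, and this is not the code the `rss` or `atom-syndication` crates actually use. It only handles an ASCII-compatible document whose declaration sits at the very start; the XML 1.0 specification's non-normative appendix on autodetecting character encodings also covers byte-order marks and UTF-16.

```rust
/// Hypothetical helper: return the `encoding` value from an XML declaration
/// such as `<?xml version="1.0" encoding="ISO-8859-1"?>`.
/// `None` means no declared encoding, in which case XML 1.0 expects UTF-8
/// (or UTF-16 with a byte-order mark).
fn declared_encoding(bytes: &[u8]) -> Option<String> {
    // Skip a UTF-8 byte-order mark if present.
    let bytes = bytes.strip_prefix(&b"\xEF\xBB\xBF"[..]).unwrap_or(bytes);
    if !bytes.starts_with(b"<?xml") {
        return None; // no XML declaration at the start of the document
    }
    // The declaration is ASCII and ends at the first "?>", so the raw bytes
    // can be inspected even if the rest of the document is not UTF-8.
    let end = bytes.windows(2).position(|w| w == b"?>")?;
    let decl = &bytes[..end];
    let key: &[u8] = b"encoding";
    let pos = decl.windows(key.len()).position(|w| w == key)?;

    // After the key: optional whitespace, '=', optional whitespace, a quote.
    let mut rest = decl[pos + key.len()..]
        .iter()
        .copied()
        .skip_while(|b| b.is_ascii_whitespace());
    if rest.next()? != b'=' {
        return None;
    }
    let mut rest = rest.skip_while(|b| b.is_ascii_whitespace());
    let quote = rest.next()?;
    if quote != b'"' && quote != b'\'' {
        return None;
    }
    // The value runs until the matching closing quote.
    let value: Vec<u8> = rest.take_while(|&b| b != quote).collect();
    String::from_utf8(value).ok()
}

fn main() {
    let feed = b"<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><rss version=\"2.0\"></rss>";
    println!("{:?}", declared_encoding(feed)); // prints Some("ISO-8859-1")
}
```

The returned label would then select a decoder (for example a Latin-1 conversion for `ISO-8859-1`) before the bytes reach the XML parser.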
@ayrat555 this was actually solved nicely by @jangernert with a patch to xml-rs. Unfortunately, that crate doesn't seem to be maintained any more, so I forked it and merged Jan's patch into my version. I'll write up some test cases and release a new version of feed-rs with this fix in it soon (I've just been busy with other things ... I live in Victoria, Australia, and we are back in lockdown now... urgh).
I tested with the xml-rs patch, but decoding the `Characters` event after it is created fails, because it expects the accumulated buffer to be a valid UTF-8 string.
I'll try switching to quick-xml (thanks for the tip, @jangernert).
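A small standard-library illustration of that failure mode. It calls `std::str::from_utf8` directly rather than going through xml-rs, and the input bytes are a made-up ISO-8859-1 example: decoding stops at the first byte that cannot start or continue a UTF-8 sequence, and `Utf8Error::valid_up_to` reports how many bytes were valid before it.

```rust
use std::str;

fn main() {
    // "Notícias" encoded as ISO-8859-1: 'í' is the single byte 0xED.
    let latin1: &[u8] = b"Not\xEDcias";

    match str::from_utf8(latin1) {
        Ok(s) => println!("valid UTF-8: {}", s),
        Err(e) => {
            // 0xED starts a 3-byte UTF-8 sequence, but the next byte ('c')
            // is not a continuation byte, so decoding fails after "Not".
            println!("invalid UTF-8 after {} bytes", e.valid_up_to()); // prints "invalid UTF-8 after 3 bytes"
        }
    }
}
```

The same text encoded as UTF-8 (`b"Not\xC3\xADcias"`) decodes without error, which is why the problem only shows up for feeds in other encodings.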
rust-syndication/rss#87 (comment): `rss` also uses quick-xml.
@markpritchard, before you start working on this: I have a WIP branch switching to quick-xml. I'll try to get it into a working state and propose an MR for discussion.
WIP is here: https://github.com/jangernert/feed-rs/tree/quick-xml
It's still messy. I'm trying to get all the tests to pass before opening the MR.
That is cool, thanks @jangernert ... I got a fair way on my changes too, but will take a look at your WIP as well.
Oh, how far along are you? I have it compiling again and 8/28 tests are passing. But I probably did more copying than necessary to get out of lifetime hell, since elements also hold a reference to the element source.
If you are at a similar stage, I don't think it makes sense for both of us to write more or less the same code.
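A rough illustration of the trade-off described above, using hypothetical types (`ElementSource`, `Element`, `OwnedElement`) rather than feed-rs's real ones: when each element borrows from the source that produced it, the source's lifetime spreads to everything holding an element, and the simplest escape is to copy the data out into owned values.

```rust
/// Hypothetical stand-in for the object that owns the raw document buffer.
struct ElementSource {
    buffer: String,
}

/// Borrowing element: cheap to create, but it cannot outlive `source`,
/// so the lifetime `'a` has to appear on every type and function that holds one.
struct Element<'a> {
    source: &'a ElementSource,
    name: &'a str,
}

/// Owned element: copies the name, so it has no lifetime parameter and can be
/// stored or returned freely, at the cost of one allocation per element.
struct OwnedElement {
    name: String,
}

impl<'a> Element<'a> {
    fn to_owned_element(&self) -> OwnedElement {
        OwnedElement { name: self.name.to_string() }
    }
}

/// Borrow the first whitespace-separated word of the buffer as the "element name".
fn first_element(source: &ElementSource) -> Element<'_> {
    let name = source.buffer.split_whitespace().next().unwrap_or("");
    Element { source, name }
}

fn main() {
    let owned;
    {
        let source = ElementSource { buffer: String::from("rss channel item") };
        let element = first_element(&source);
        println!("borrowed from a {}-byte source", element.source.buffer.len());
        // Keeping `element` past this block would not compile, because it
        // borrows `source`, which is dropped at the closing brace. Copying works.
        owned = element.to_owned_element();
    }
    println!("{}", owned.name); // prints "rss"
}
```

Copying everything is the easy way out; the borrowing form avoids allocations but forces the lifetime through the whole API, which is the tension described above.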
Okay, then I'll stop working on it and let you do your thing.
Hey guys,
How is it going with this issue?
Hey @kkszysiu ,
Apologies for the delay ... I've been pretty distracted with COVID stuff lately (I live in Melbourne, AU ... we are now in stage 4 lockdown).
Hoping to get back on this later this week.
@markpritchard thank you, no rush. I'm just trying to understand whether you're still planning to maintain this library :)
I've prepared a feedparser-style library for Python based on this lib, here: https://github.com/kkszysiu/ultrafeedparser
As of now, it is the fastest library for parsing feeds in Python: https://github.com/kkszysiu/ultrafeedparser/tree/master/bench#statistics
Thanks!
Out of curiosity, @kkszysiu, do you have a Python project that needs to parse various feeds, or was it just interest that led you to prepare the benchmark?
I'm interested to see how things change with the switch to quick-xml as the parser backend; the crate documentation mentions it's 50x faster, and there are some more stats here: https://github.com/RazrFalcon/choose-your-xml-rs
Yeah, I'm planning to start using it in production to parse various feeds. I'm working on a project where we have a few hundred of them to parse as periodic tasks, and of course the faster we can do that, the better.
But here performance is just added value, not a requirement.
I will definitely run new benchmarks when the new version of feed-rs that uses quick-xml is released :)
Cool, thanks for the background @kkszysiu :)
I'll fix up #60 and #61 too, then cut another release (likely 0.4.0 given the internal library change)