w3stling / rssreader
A simple Java library for RSS and Atom feeds
License: MIT License
Would be nice to have a read() method capable of parsing a String containing a feed directly. At this moment you need to convert the String to byte[] and then to ByteArrayInputStream.
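The workaround described above can be sketched with the JDK alone. The class and method names below are made up for illustration, and UTF-8 is an assumption: if the feed's XML declaration names another encoding, the bytes should be produced with that charset instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class FeedStrings {
    // Wraps feed XML held in memory as an InputStream (UTF-8 is assumed here).
    static InputStream toInputStream(String feedXml) {
        return new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String xml = "<rss version=\"2.0\"><channel><title>Demo</title></channel></rss>";
        try (InputStream in = toInputStream(xml)) {
            // The stream could be handed to any reader method that accepts an InputStream;
            // here it is only read back to show the round trip.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```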
I get an IllegalArgumentException when a feed provides the item date in the format shown below. I have not found a simple way to supply a custom date-time format to handle it.
It would be nice to have extension points in the API for handling additional formats.
Exception in thread "main" java.lang.IllegalArgumentException: Unknown date time format 2023-02-28T17:37:08.823050+00:00
at com.apptasticsoftware.rssreader.DateTime.toZonedDateTime(DateTime.java:174)
at com.apptasticsoftware.rssreader.DateTime.toEpochMilli(DateTime.java:353)
at java.base/java.util.Optional.map(Optional.java:260)
at com.apptasticsoftware.rssreader.util.ItemComparator.lambda$newestItemFirst$1(ItemComparator.java:32)
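Until such an extension point exists, one option is to parse the raw date string in application code. The sketch below is not the library's implementation; `LenientDates` is a hypothetical helper that tries a short, illustrative list of JDK formatters in order and returns an empty Optional instead of throwing.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

final class LenientDates {
    // Candidate formats, tried in order. The list is illustrative, not exhaustive.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ISO_OFFSET_DATE_TIME, // e.g. 2023-02-28T17:37:08.823050+00:00
            DateTimeFormatter.RFC_1123_DATE_TIME    // e.g. Sat, 21 Jan 2023 11:12:30 GMT
    );

    // Returns the first successful parse, or Optional.empty() if no format matches.
    static Optional<ZonedDateTime> parse(String text) {
        if (text == null || text.isBlank()) {
            return Optional.empty();
        }
        String trimmed = text.trim();
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(ZonedDateTime.parse(trimmed, format));
            } catch (DateTimeParseException ignored) {
                // Fall through to the next candidate format.
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional keeps a single malformed date from aborting a sort over many items.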
Hey, I would like to be able to read XML fields that the library does not know about, such as image
(e.g. https://worldoftanks.eu/en/rss/news/)
thx
Hi,
I am getting the exception below on this line: rssReader.read(URL)
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Response http status code: 401
How can we pass the credentials?
Thanks!
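If the server uses HTTP Basic authentication (an assumption; a 401 can also mean token or cookie based schemes), the credentials travel in an Authorization header. The sketch below only builds that header value with the JDK; `BasicAuth` is a hypothetical helper. How the header is attached depends on the client; another report on this page uses an `addHeader(...)` call on the reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {
    // Builds the value of an "Authorization" header for HTTP Basic authentication.
    static String headerValue(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        System.out.println("Authorization: " + headerValue("user", "pass"));
    }
}
```

Basic credentials are only encoded, not encrypted, so this should be used over HTTPS.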
I am getting an exception after using the RssReader on my server. It seems to happen only after a while. From a little research, it seems that something is not being closed somewhere. I copied the code and tried to debug it to find out where, but I couldn't.
Sep 17 12:37:16 ip-172-31-46-83 web: java.io.IOException: java.util.concurrent.ExecutionException: java.lang.InternalError: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.read(RssReader.kt:60)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.updateArticlesForCategory(ApplicationViewModel.kt:98)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.refreshArticles(ApplicationViewModel.kt:70)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationKt$module$$inlined$scheduleAtFixedRate$1$lambda$1.invokeSuspend(Application.kt:38)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.util.concurrent.ExecutionException: java.lang.InternalError: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.read(RssReader.kt:47)
Sep 17 12:37:16 ip-172-31-46-83 web: ... 9 common frames omitted
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.lang.InternalError: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.PlainHttpConnection.<init>(PlainHttpConnection.java:224)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.AsyncSSLConnection.<init>(AsyncSSLConnection.java:49)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpConnection.getSSLConnection(HttpConnection.java:239)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpConnection.getConnection(HttpConnection.java:225)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Http2Connection.createAsync(Http2Connection.java:360)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Http2ClientImpl.getConnectionFor(Http2ClientImpl.java:127)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.ExchangeImpl.get(ExchangeImpl.java:89)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.establishExchange(Exchange.java:299)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.responseAsyncImpl0(Exchange.java:431)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.responseAsyncImpl(Exchange.java:336)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.responseAsync(Exchange.java:328)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.MultiExchange.responseAsyncImpl(MultiExchange.java:346)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.MultiExchange.lambda$responseAsync0$2(MultiExchange.java:292)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.lang.Thread.run(Thread.java:834)
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.Net.socket0(Native Method)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.Net.socket(Net.java:433)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.Net.socket(Net.java:426)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:121)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:150)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.PlainHttpConnection.<init>(PlainHttpConnection.java:213)
Sep 17 12:37:16 ip-172-31-46-83 web: ... 18 common frames omitted
Sep 17 12:37:16 ip-172-31-46-83 web: Exception in thread "DefaultDispatcher-worker-1" java.lang.InternalError: java.io.IOException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl.<init>(HttpClientImpl.java:311)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl.create(HttpClientImpl.java:253)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientBuilderImpl.build(HttpClientBuilderImpl.java:135)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.createHttpClient(RssReader.kt:391)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.sendAsyncRequest(RssReader.kt:84)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.readAsync(RssReader.kt:73)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.read(RssReader.kt:47)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.updateArticlesForCategory(ApplicationViewModel.kt:98)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.refreshArticles(ApplicationViewModel.kt:70)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationKt$module$$inlined$scheduleAtFixedRate$1$lambda$1.invokeSuspend(Application.kt:38)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.io.IOException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.EPoll.create(Native Method)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:79)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.nio.channels.Selector.open(Selector.java:295)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.<init>(HttpClientImpl.java:699)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl.<init>(HttpClientImpl.java:308)
Sep 17 12:37:16 ip-172-31-46-83 web: ... 15 more
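The last trace shows an HttpClient being built inside the read path (createHttpClient called from sendAsyncRequest), and each client opens its own selector. One plausible cause, not confirmed by the report, is that a new client is created on every read and never released. A minimal sketch of sharing one client across reads follows; `SharedHttpClient` and the timeout value are illustrative assumptions.

```java
import java.net.http.HttpClient;
import java.time.Duration;

final class SharedHttpClient {
    // One client for the whole application; it owns one selector and one connection pool.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10)) // illustrative value
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    private SharedHttpClient() {
    }

    // Every caller gets the same instance instead of building a new client per request.
    static HttpClient get() {
        return CLIENT;
    }
}
```

Any response body stream or item Stream obtained through the client should still be closed by the caller, for example with try-with-resources.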
After reading a feed from the remote server successfully, I would like to store it (for example, on the local file system) so that the next read is faster.
There is no method to read from a String or from bytes :-(
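The storage half of this request can be sketched with the JDK alone. `FeedFileCache` is a hypothetical helper, and fetching the bytes is left to the application; the sketch only writes them to a file and reopens them later as an InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class FeedFileCache {
    // Stores the raw feed bytes exactly as downloaded, so the original encoding is preserved.
    static void save(Path file, byte[] feedBytes) throws IOException {
        Files.write(file, feedBytes);
    }

    // Reopens the cached feed; the caller is responsible for closing the stream.
    static InputStream open(Path file) throws IOException {
        return Files.newInputStream(file);
    }
}
```

The returned stream can then be passed to whatever parsing method accepts an InputStream.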
I have about 700 different RSS sources and lots of different image formats. How do I detect all images in these?
When trying to sort Items I get an exception "Unknown date time format" thrown from https://github.com/w3stling/rssreader/blob/master/src/main/java/com/apptastic/rssreader/DateTime.java#L87 .
The date being parsed is 2021-11-17T13:21:21Z. It isn't recognized, although it is compliant with DateTimeFormatter.ISO_OFFSET_DATE_TIME (Atom feed RFC).
An example of a feed like this can be the GitHub personal feed (found at the bottom of https://github.com when logged in).
Currently trying to read http://na.leagueoflegends.com/en/rss.html but get this error
WARNING: Failed to parse XML.
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Premature end of file.
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:652)
at com.apptastic.rssreader.RssReader$RssItemIterator.next(RssReader.java:202)
at com.apptastic.rssreader.RssReader$RssItemIterator.peekNext(RssReader.java:177)
at com.apptastic.rssreader.RssReader$RssItemIterator.hasNext(RssReader.java:187)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at Commands.CustomCommands.Subscribers.RssLeagueThread.run(RssLeagueThread.java:43)
at java.base/java.lang.Thread.run(Thread.java:834)
I suspect the website is blocking the connection, since the URL and the XML are perfectly valid.
When reading the url https://www.nrdc.org/rss.xml I get the error:
java.time.format.DateTimeParseException: Text '2023-08-07T10:06:05-0400' could not be parsed, unparsed text found at index 22
When reading the url https://www.sciencedaily.com/rss/top.xml I get the error:
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Response http status code: 403
Both of them work fine with Thunderbird.
Is this known or just my fate?
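For the first error, the offset in 2023-08-07T10:06:05-0400 is written without a colon, which the ISO offset formatters do not accept. A sketch of a formatter for that shape, using only java.time; `CompactOffsetDates` is a hypothetical helper and not part of the library.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class CompactOffsetDates {
    // Pattern letter Z (one to three letters) matches offsets written as +HHmm, e.g. -0400.
    private static final DateTimeFormatter COMPACT_OFFSET =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssZ");

    // Throws DateTimeParseException if the text does not match the pattern.
    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, COMPACT_OFFSET);
    }

    public static void main(String[] args) {
        System.out.println(parse("2023-08-07T10:06:05-0400").toInstant());
    }
}
```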
Hi,
Could you please provide a constructor that accepts an HttpClient? It would help us optimize resource usage.
Thanks in advance
The method getPubDateZonedDateTime() returns an Optional<ZonedDateTime>. For some reason it cannot parse the date shown in the example below:
Item item = items.get(0);
item.setPubDate("Sat, 21 Jan 2023 11:12:30 GMT");
// Throws an Exception
Optional<ZonedDateTime> pubDate = item.getPubDateZonedDateTime();
An exception is thrown:
Exception in thread "main" java.time.format.DateTimeParseException: Text 'Sat, 21 Jan 2023 11:12:30 GMT' could not be parsed at index 0
at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2052)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1954)
at java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:600)
at com.apptasticsoftware.rssreader/com.apptasticsoftware.rssreader.DateTime.toZonedDateTime(DateTime.java:179)
at java.base/java.util.Optional.map(Optional.java:260)
at com.apptasticsoftware.rssreader/com.apptasticsoftware.rssreader.Item.getPubDateZonedDateTime(Item.java:235)
I am surprised that the pubDate cannot be parsed.
Expected behavior: Returning an empty Optional if parsing fails.
Tested with version 3.4.1.
Is it on purpose that there is only support for one category?
When a feed is setting those fields, they are not recognized:
<dc:date>2022-09-14T10:10:10+00:00</dc:date>
<dc:creator>creator</dc:creator>
Example: https://lwn.net/headlines/rss
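As a stopgap, Dublin Core elements can be read directly with the JDK's StAX parser. This is a sketch under assumptions: `DublinCoreFields` is a hypothetical helper, and the feed is assumed to bind the dc prefix to the standard Dublin Core namespace URI used below.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DublinCoreFields {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns local name -> text for every Dublin Core element found in the XML.
    // If an element repeats, the last value wins.
    static Map<String, String> extract(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // feeds do not need DTDs
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && DC_NS.equals(reader.getNamespaceURI())) {
                    String name = reader.getLocalName(); // e.g. "date" or "creator"
                    fields.put(name, reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return fields;
    }
}
```

Matching on the namespace URI rather than the literal dc prefix keeps the lookup working when a feed chooses a different prefix.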
How do I get the text of a news item?
Hi !
I am trying to read a specific RSS feed, but I get a 403 error back. When I try to access the same website with the JSoup parser, I get the same error (in the end I had to use Selenium to be able to parse the pages). Have you ever had a similar problem?
Andy
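Some servers answer 403 to requests that lack a browser-like User-Agent header. Whether that is the cause here is an assumption, and it will not help against JavaScript-based bot protection. A sketch of building such a request with java.net.http; `BrowserLikeRequest`, the Accept value, and the timeout are illustrative choices.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class BrowserLikeRequest {
    // Builds a GET request that identifies itself and states which feed formats it accepts.
    static HttpRequest forFeed(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .header("Accept", "application/rss+xml, application/atom+xml, application/xml;q=0.9, */*;q=0.8")
                .timeout(Duration.ofSeconds(20)) // illustrative value
                .GET()
                .build();
    }
}
```

The request can be sent with any HttpClient, and the response body can then be parsed as a feed.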
How to extend the class Item?
At this moment the content tag is being treated as description, but it should be treated separately (adding the content variable along with setContent(), getContent() methods, etc).
Many feeds have the entire article there (most of WordPress sites, in fact).
I'm not sure if this would be out of scope or not but torznab feeds have the ability to contain multiple enclosures.
The spec for that is here.
I would greatly appreciate if this was possible to add.
I'm not quite understanding how this repository does what it promises. You state the following in the description:
Subscribing to a website RSS removes the need for the user to manually check the website for new content.
This is closely followed by:
This Java RSS parser library makes it easier to automate data extraction from RSS or Atom feeds via Java stream API.
To me, and I would like to believe most people, this suggests that this repository has the capabilities of "subscribing" to an "RSS" feed (henceforth "automating data extraction"). But unless I'm being silly and just completely missing something, this is not possible with this library?
If this could be added, that would be great because otherwise I don't believe it fulfils the description of this repository. Don't get me wrong, this library is very handy nonetheless, but what I would consider the more important part of the library is being able to subscribe to a feed. Otherwise it verges on the edge of not being so useful.
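On the application side, a subscription can be approximated by polling the feed on a schedule and remembering which items were already seen. The sketch below covers only the second half; `NewItemTracker` is a hypothetical class, and item identifiers (for example a guid or link) are assumed to be stable between polls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NewItemTracker {
    // Identifiers seen in earlier polls. Kept in memory only; a restart forgets them.
    private final Set<String> seenIds = new HashSet<>();

    // Returns only the ids not seen in earlier polls, preserving input order.
    synchronized List<String> onlyNew(List<String> idsFromLatestPoll) {
        List<String> fresh = new ArrayList<>();
        for (String id : idsFromLatestPoll) {
            if (seenIds.add(id)) { // add() returns false when the id was already present
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

A task scheduled with a ScheduledExecutorService could read the feed, pass the item ids to onlyNew, and notify listeners about whatever comes back.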
I also cannot help but point out that advertising the use of the Java Stream API is a weird selling point for this repository. Is that really such a big thing? It's not any different from just giving us a List and us calling List#stream. Generally it feels like a bad idea to pass streams around, since obviously they do need to be closed.
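The closing concern can be handled with try-with-resources, since java.util.stream.Stream is AutoCloseable. A sketch using only the JDK; `StreamClosing` is a hypothetical helper that copies a resource-backed stream into a List and closes it in one place.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamClosing {
    // Collects the items and closes the stream, running any onClose handlers,
    // even if collecting throws.
    static <T> List<T> drain(Stream<T> items) {
        try (Stream<T> stream = items) {
            return stream.collect(Collectors.toList());
        }
    }
}
```

Callers that want a List can wrap the stream-returning call in drain and never hold an open stream.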
I want to apply some logic in registerItemTags() based on several conditions.
For example, when creating the reader, I pass a parameter through a header:
List<Item> items = new RssReader().addHeader("Param","1").read("http://google.com").toList();
Then I want to read that parameter back:
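public class MovieRssReader extends AbstractRssReader<Channel, Item> {
    @Override
    protected void registerItemTags() {
        // Pseudocode: "parent.getHeader" stands for the header lookup being asked about.
        // Strings are compared with equals(); == compares object references.
        if ("1".equals(parent.getHeader("Param"))) {
            // do this
        }
    }
}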
Any help will be appreciated, thanks
Hello,
I've just noticed that my RSS feed source has changed content but the data I get from RssReader is still the old data. Does it have a cache? If so, is that configurable? I'm on version 3.4.0.
Thanks a lot!
Hi, I just wanted to let you know that JCenter is in the process of being sunset, and quite soon: https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/
You'll need to move to some other repository service.
First of all many thanks for this lib :) I started experimenting with it today and it's been great.
One thing I came across: one of the feeds I am trying to consume has the enclosure length set to blank (length=""). Sadly this breaks here:
According to the spec the value must be set - so the feed is doing it wrong. Nonetheless I was wondering if there is a way for the code to do something reasonable and came up with this locally:
enclosureAttributes.put("length", (i, v) -> i.getEnclosure().ifPresent(e -> {
    if (!v.isBlank()) {
        e.setLength(Long.parseLong(v));
    }
}));
In accordance with the Robustness principle it doesn't seem like the worst idea to me.
What do you think?
If you think it's reasonable, I could create a PR.
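The same idea can be taken one step further so that non-numeric values are tolerated as well as blank ones. The sketch below is a standalone illustration, not the library's code; `EnclosureLength` is a hypothetical helper.

```java
import java.util.OptionalLong;

final class EnclosureLength {
    // Parses an enclosure length attribute, treating null, blank, or non-numeric input as absent.
    static OptionalLong parse(String raw) {
        if (raw == null || raw.isBlank()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```

A caller could then set the length only when a value is present, for example with parse(v).ifPresent(...).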
How to get the raw string of the RSS xml response?