
alabeduarte / feedparser-promised

[DEPRECATED] Wrapper around feedparser with promises

License: MIT License

JavaScript 71.33% Shell 28.67%
Topics: promise, feedparser, article, wrapper, feed, feeds, parser, promises, rss

feedparser-promised's Introduction

[DEPRECATED] feedparser-promised


⛔️ DEPRECATED

I've decided to deprecate this library, as feedparser already does a pretty good job of parsing content from an RSS feed.

This repo, as a wrapper, will always lag behind new features and security updates. If you have used this library until now, or even considered it, thank you, and I'm sorry.

Wrapper around feedparser with promises.

Install

  $ npm install --save feedparser-promised

Usage

  const feedparser = require('feedparser-promised');

  const url = 'http://feeds.feedwrench.com/JavaScriptJabber.rss';

  feedparser.parse(url).then(items => {
    items.forEach(item => console.log('title:', item.title));
  }).catch(console.error);

Using Node.js HTTP options

const feedparser = require('feedparser-promised');

const httpOptions = {
  uri: 'http://feeds.feedwrench.com/JavaScriptJabber.rss',
  timeout: 3000,
  gzip: true,
  // ...
};

feedparser.parse(httpOptions).then(items => { /* do your magic here */ });

Using Feedparser options

const feedparser = require('feedparser-promised');

const httpOptions = {
  uri: 'http://feeds.feedwrench.com/JavaScriptJabber.rss',
  // ...
};

const feedparserOptions = {
  feedurl: 'http://feeds.feedwrench.com/JavaScriptJabber.rss',
  normalize: false,
  addmeta: false,
  resume_saxerror: true
};

feedparser.parse(httpOptions, feedparserOptions).then(items => { /* do your magic here */ });

List of article properties

  • title: title
  • description: frequently, the full article content
  • summary: frequently, an excerpt of the article content
  • link: link
  • origlink: when FeedBurner or Pheedo puts a special tracking url in the link property, origlink contains the original link
  • permalink: when an RSS feed has a guid field and the isPermalink attribute is not set to false, permalink contains the value of guid
  • date: most recent update
  • pubdate: original published date
  • author: author
  • guid: a unique identifier for the article
  • comments: a link to the article's comments section
  • image: an Object containing `url` and `title` properties
  • categories: an Array of Strings
  • source: an Object containing url and title properties pointing to the original source for an article; see the RSS Spec for an explanation of this element
  • enclosures: an Array of Objects, each representing a podcast or other enclosure and having a url property and possibly type and length properties
  • meta: an Object containing all the feed meta properties; especially handy when using the EventEmitter interface to listen to article emissions
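These properties can be read directly off each parsed item. A minimal sketch of consuming them, using a hypothetical hand-built item (real items would come from feedparser.parse); the summarize helper and the sample values are illustrative, not part of the library:

```javascript
// Sketch: summarising a parsed article via the properties listed above.
// `item` below is a hypothetical sample; real items come from feedparser.parse().
const summarize = (item) => ({
  title: item.title,
  // Prefer origlink when FeedBurner/Pheedo rewrote the link property:
  link: item.origlink || item.link,
  published: item.pubdate,
  categories: (item.categories || []).join(', ')
});

const item = {
  title: 'Example episode',
  link: 'http://feedproxy.example.com/abc',
  origlink: 'http://example.com/episode',
  pubdate: '2017-04-28T18:30:09Z',
  categories: ['javascript', 'podcast']
};

console.log(summarize(item));
// { title: 'Example episode',
//   link: 'http://example.com/episode',
//   published: '2017-04-28T18:30:09Z',
//   categories: 'javascript, podcast' }
```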

Contributing

There are many ways to contribute, such as fixing open issues, opening new ones, or suggesting ideas. Any of these is very much appreciated.

If an issue is open, I recommend you follow these steps:

  • Create a branch feedparser-promised#{issue_number}; eg: feedparser-promised#42
  • Please, remember to write unit tests.
  • Send a pull request!

Running Tests

$ npm test

License

feedparser-promised is released under the MIT License.

feedparser-promised's People

Contributors

alabeduarte, joseph1125, sdrobov


feedparser-promised's Issues

Errors aren't handled in the `catch`

The "Not a feed" error coming from feedparser isn't being handled with the example code. Instead of handling this gracefully, it crashes the server.

Parser crashes on this valid RSS feed

I originally raised this on the feedparser project, as I'd presumed it was something there. But @danmactough said it looks like this feed is being served gzipped, and the wrapper needs to unzip that before passing it on. This is the feed URL which breaks and the curl response looks like this:

curl -i "http://www.arctic-council.org/index.php/en/?format=feed&type=rss"

HTTP/1.1 200 OK
Server: nginx
Date: Fri, 28 Apr 2017 18:30:09 GMT
Content-Type: application/rss+xml; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Content-Encoding: gzip
Expires: Wed, 17 Aug 2005 00:00:00 GMT
ETag: http://www.arctic-council.org/index.php/en/?format=feed&type=rss
Cache-Control: no-cache
Pragma: no-cache
Set-Cookie: 9b25b2ceec56fcde5bd7ce6373ce6f76=bp2mueorpmsfdn3kmvucgkqof4; path=/; HttpOnly
Last-Modified: Fri, 28 Apr 2017 18:27:09 GMT
Host-Header: 192fc2e7e50945beb8231a492d6a8024
X-Proxy-Cache: MISS

Note the Content-Encoding header. I believe all that's required is adding gzip: true to the request() call... See here for why this isn't the default option.

Getting ETIMEDOUT on Amazon AWS Lambda Node 8.10

When I test the function below locally with the same node version as on AWS Lambda, the response is instant and I get my result.
But when I run it on Lambda, the first time, I see "about to hit test feedparser with...." and then nothing else, at all.

If I then invoke the same function 10 seconds later, I see

"Caught an error in checking for https://test.com/valid.xml Error: ETIMEDOUT"

and then the same "about to hit....", but what never shows in the logs is "within test feedparser"

Note in the code below, I have tried changing gzip to true and false; should I be doing something with pools? Also, I did test using plain old "request" and it worked, so it's not Lambda that's blocking my connection. I realise this probably isn't a problem with this library, but is there anything else I can do to debug? Thanks!

var httpOptions = {
  'uri': url,
  'gzip': false,
  'timeout': 6500,
  'pool': false,
  'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.66 Safari/537.36"
};

const feedparserOptions = {
  feedurl: url,
  normalize: false,
  addmeta: false,
  resume_saxerror: true
};

console.log(`about to hit test feedparser with httpOptions ${httpOptions} and feedparserOptions ${feedparserOptions}`);
feedparser.parse(httpOptions, feedparserOptions).then(items => {
  console.log(`within test feedparser and got items of ${JSON.stringify(items)}`);
}).catch(err => {
  console.log(`Caught an error in checking for ${httpOptions.uri} ${err}`);
});

It's doing my head in, having spent almost two full days now, tearing my hair out, googling, StackOverflowing etc. Any ideas? Many thanks

Timeout Error Error: ESOCKETTIMEDOUT parsing https://www.nasdaq.com/feed/rssoutbound?category=Stocks

I'd like to parse https://www.nasdaq.com/feed/rssoutbound?category=Stocks. I am using the sample from this page. When I run the code I receive an Error: ESOCKETTIMEDOUT. More of the error is below the code. I've used this code to parse many other RSS feeds without a problem.
The XML file appears fine. I've pasted the head portion below the error message.

const httpOptions = {
  uri: 'https://www.nasdaq.com/feed/rssoutbound?category=Stocks',
  timeout: 9000,
  gzip: true,
};

feedparser.parse(httpOptions).then(items => {
  items.forEach(item => {
    console.log('title:', item.title);
  });
});

(node:1937) UnhandledPromiseRejectionWarning: Error: ESOCKETTIMEDOUT
at ClientRequest.
at Object.onceWrapper (events.js:420:28)
at ClientRequest.emit (events.js:314:20)
at TLSSocket.emitRequestTimeout (_http_client.js:769:9)
at Object.onceWrapper (events.js:420:28)
at TLSSocket.emit (events.js:326:22)
at TLSSocket.Socket._onTimeout (net.js:483:8)
at listOnTimeout (internal/timers.js:551:17)
at processTimers (internal/timers.js:494:7)
(Use node --trace-warnings ... to show where the warning was created)
(node:1937) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:1937) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

"use strict";

Hi,
I guess it's because I use an old Node version (v4.7.3); I get this error:

Block-scoped declarations (let, const, function, class) not yet supported outside strict mode

Could you add "use strict"; to feedParserPromised.js?
Thanks in advance

Recent update broke passing options to underlying feedparser

Ref my old ticket, after updating the code, it seems the underlying feedparser is no longer getting the options object passed in. Indeed, this looks to be the case in the source code, as it is instantiated as const feedparser = new FeedParser(); rather than const feedparser = new FeedParser(options);.

I couldn't exactly follow the change that led to this, but is it possible to put options support back in?

Many thanks!

Duplicate items in feed

I've been having real trouble pinning down the source of a problem with a feed aggregator I'm building, and I've boiled it down to such a simple example that I can't see what I'm doing wrong.

I've captured a real feed and stored the XML locally so that I can be sure I'm getting the same stuff every time, but FeedParserPromised is returning a variable number of items, and almost always more than the feed contains, resulting in duplicates being emitted by my code. Can you tell me where I'm going wrong - or is it a problem with FPP? Note the code below uses the original source URL, but I get the same behaviour if I capture that and serve it locally as a static XML file. The feed itself has 10 items currently, but the code below outputs between 14 and 19 items:

var request = require('request');
var FeedParser = require('feedparser-promised');

var req = request({
    uri: 'http://www.europlanet-eu.org/feed/',
    timeout: 3000
});

FeedParser.parse(req).then(function (items) {   
    console.log(items.length);
});

Note I did recode this to use the underlying feedparser library, and always got 10 items returned, so I'm assuming it's something to do with the promisification... I'm running this on Debian 7 with node 1.8.1 if that makes any difference.

encoding problems

I'm working with Spanish feeds and I always get the � character in place of special characters. I tried a decoder but I always get the same character, so I think it is an internal issue.

Any idea?

App crashes in IE 11

Hi,
After adding your lib it works perfectly and thank you for that :)
The only problem is that it crashes on IE11 with this error:
SCRIPT1002: Syntax error
File: bundle.js, Line: 117906, Column: 19

Error is pointing on this code

module.exports = class FeedParserPromised {
  static parse (options) {
    return new Promise( (resolve, reject) => {
      const items = [];
      const feedparser = new FeedParser();

      feedparser.on('error', (err) => { reject(err); });

      feedparser.on('readable', () => {
        let item;

        while (item = feedparser.read()) { items.push(item); }

        return items;
      });

Also this WARNING is displayed after webpack build

`WARNING in ./~/ajv/dist/ajv.bundle.js
Critical dependencies:
1:476-483 This seems to be a pre-built javascript file. Though this is possible,
it's not recommended. Try to require the original source to get better results.

@ ./~/ajv/dist/ajv.bundle.js 1:476-483`

Any ideas how to solve this would be appreciated

thx in advance

Update the @types/request = 0.0.45

When compiling with TypeScript 2.4, @types/request fails with a compatibility error; this is fixed in version 0.0.45.

How to avoid import stopping on error?

I noticed that when there's an error importing the RSS feed, the whole import stops.

I tried to not include the .catch but it's still not helping. Any idea how to ignore problematic items in feeds and make the parser go on no matter what?
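One way to keep an import running when individual feeds fail is to settle every parse independently and keep only the fulfilled results, e.g. with Promise.allSettled (Node >= 12.9). A sketch of the pattern; the `parse` stub below stands in for feedparser.parse(url) and the URLs are made up:

```javascript
// Sketch: aggregating many feeds without one failure aborting the rest.
// `parse` is a stub standing in for feedparser.parse(url).
const parse = (url) =>
  url.includes('bad')
    ? Promise.reject(new Error(`failed: ${url}`))
    : Promise.resolve([{ title: `item from ${url}` }]);

const urls = ['http://a.example/feed', 'http://bad.example/feed', 'http://c.example/feed'];

const parseAll = async (urls) => {
  const results = await Promise.allSettled(urls.map(parse));
  return results
    .filter((r) => r.status === 'fulfilled') // drop feeds that rejected
    .flatMap((r) => r.value);                // merge the rest into one list
};

parseAll(urls).then((items) => console.log(items.length)); // 2
```

On older Node versions without allSettled, `urls.map(u => parse(u).catch(() => []))` with Promise.all achieves the same effect.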

The ES6 class syntax makes this unusable in ES5 projects

I cannot use this library in a project that's not setup to transpile ES6 sources, because of the class syntax.

Most notably, the create-react-app boilerplate used by thousands of people, expects all dependencies to be ES5 when building. See facebook/create-react-app#2433 (comment) for an explanation.

I recommend adding a build step that uses babel to transpile the sources to ES5, and puts them in a /lib folder.

A quick try on Try Babel showed that the code was plug and play, and everything worked when pasting the transpiled code into the feedparser-promised module.

It looks like you tried to convert everything to ES5 in commit f8256d4, but I don't think you succeeded; maybe I'm reading the commit wrong.

I have a fork of this for my personal usage that fixes the issue, however I haven't tested it extensively, only with my own React project.
Let me know if you want a PR to this repo. If you see other solutions, please tell.

Parsing XML attributes

Hi,
Is there a way to get custom xml attributes when promise is resolved ?
example

<item segment="home">
<title>blah</title>
<link>
blabla.com
</link>
</item>

I want to parse segment="home" part

thx in advance

Can't load feedparser-promised - broken package?

I'm converting a load of old callback-based code to nice shiny promises, and was hoping this module would make things really easy. But I'm stuck at the first step. I've run npm i --save feedparser-promised and put var FeedParser = require('feedparser-promised'); at the top of the code. But I get this:

[nodemon] starting `node app.js`
module.js:336
    throw err;
          ^
Error: Cannot find module './lib/feedParserPromised'
    at Function.Module._resolveFilename (module.js:334:15)
    at Function.Module._load (module.js:276:25)
    at Module.require (module.js:363:17)
    at require (module.js:382:17)
    at Object.<anonymous> (/node/classifier/node_modules/feedparser-promised/index.js:1:88)
    at Module._compile (module.js:428:26)
    at Object.Module._extensions..js (module.js:446:10)
    at Module.load (module.js:353:32)
    at Function.Module._load (module.js:308:12)
    at Module.require (module.js:363:17)

and indeed there is no /lib folder there. I tried running make in the node_modules/feedparser-promised folder, but apart from a couple of warnings, it didn't seem to change anything. In case you're interested, the warnings were:

$ make
npm install
npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
npm WARN deprecated [email protected]: lodash@<3.0.0 is no longer maintained. Upgrade to lodash@^4.0.0.
npm WARN cannot run in wd [email protected] babel -d lib src/ (wd=/node/classifier/node_modules/feedparser-promised)

and then a bunch of modules with dependencies. But still no /lib folder... Am I'm missing something? I re-read the README several times...

Is it possible to remove unwanted characters?

I'm currently working on parsing RSS Feeds. I find this package to work great with namespaces which have been pitfalls in other packages. Currently I'm making use of the option normalize: true. I wonder if it would be possible to remove some of the characters in the response.

For example to clean the response a bit

'rss:link':
   { '@': {},
     '#':
      'http://www.nycgovparks.org/events/2019/06/08/astoria-park-carnival1' },
  'rss:description':
   { '@': {},
     '#':
      '<p>Date: June 8, 2019</p><p><p>Join the Central Astoria Local Development Coalition for its annual Astoria Park at the Astoria Park Parking Lot. The Carnival will feature games, food, rides, and fun for the entire family. We look forward to seeing you there!</p>\n\n<p>The Carnival will take place on Wednesday, June 5 through Sunday, June 9. </p>\n\n<p><strong>Carnival Hours</strong></p>\n\n<ul>\n\t<li>Wednesday, June 5 and Thursday, June 6: 4:00 p.m. - 10:00 p.m.</li>\n\t<li>Friday, June 7: 4:00 p.m. - 11:00 p.m.</li>\n\t<li>Saturday, June 8: Noon to midnight</li>\n\t<li>Sunday, June 9: Noon to 11:00 p.m.</li>\n</ul></p><p>Start time: 12:00 pm</p><p>End time: 11:59 pm</p><p>Contact phone: (718) 728-7820</p><p>Location: Parking Lot (in Astoria Park)</p>' },
  'event:parkids': { '@': {}, '#': 'Q004' },
  'event:parknames': { '@': {}, '#': 'Astoria Park' },
  'event:startdate': { '@': {}, '#': '2019-06-08' },
  'event:enddate': { '@': {}, '#': '2019-06-08' },
  'event:starttime': { '@': {}, '#': '12:00 pm' },
  'event:endtime': { '@': {}, '#': '11:59 pm' },
  'event:contact_phone': { '@': {}, '#': '(718) 728-7820' },
  'event:location': { '@': {}, '#': 'Parking Lot (in Astoria Park)' }

Would it be possible to remove '@', {} thus cleaning the output of the parsing? Is there an option that will accomplish this? So it can look something like 'event:parkids': 'Q004'
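As far as I know there is no built-in feedparser option for this, but the output can be cleaned in a post-processing step. A sketch of a recursive helper that collapses `{ '@': {}, '#': value }` nodes into plain values when the attribute object is empty (names and behavior here are my own, not the library's):

```javascript
// Sketch of a post-processing step: collapse { '@': {...}, '#': value }
// nodes into plain values, leaving nodes with real attributes untouched.
const flatten = (node) => {
  if (Array.isArray(node)) return node.map(flatten);
  if (node && typeof node === 'object') {
    const keys = Object.keys(node);
    const attrs = node['@'] || {};
    // Leaf like { '@': {}, '#': 'Q004' } with no attributes -> 'Q004'
    if (keys.every((k) => k === '@' || k === '#') && Object.keys(attrs).length === 0) {
      return node['#'];
    }
    return Object.fromEntries(keys.map((k) => [k, flatten(node[k])]));
  }
  return node;
};

const raw = {
  'event:parkids': { '@': {}, '#': 'Q004' },
  'event:parknames': { '@': {}, '#': 'Astoria Park' }
};

console.log(flatten(raw));
// { 'event:parkids': 'Q004', 'event:parknames': 'Astoria Park' }
```

Applied to each parsed item, this turns the structure shown above into the flat `'event:parkids': 'Q004'` shape the question asks for.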
