Comments (10)
I tried to reproduce your example code, but I'm not sure why your code isn't scraping. Try logging errors with .error and see if that reveals anything.
My test code:
var osmosis = require('osmosis');

osmosis
  .get('http://www.yelp.com/search?cflt=hvac&find_loc=Charlotte%2C+NC', { start: 0 })
  .paginate({ start: +10 }, 5)
  .set({
    'companies': [
      osmosis.find('.regular-search-result').set({
        'name': 'a.biz-name',
        'link': 'a.biz-name@href',
        'address': 'address'
      })
    ]
  })
  .data(console.log)
  .log(console.log)
  .error(console.log)
  .debug(console.log)
from node-osmosis.
@rc0x03 Here are a few suggestions to improve the documentation:
- Put it all in the main Readme.md file. That seems to be the convention, since most other libraries do the same.
- For each method, we want an explanation of the arguments and a full working example. Something like https://github.com/caolan/async
- I didn't know how to write the selectors and what syntax to use, so I used the reference on http://www.w3.org/TR/CSS21/selector.html#pattern-matching. So maybe add something about that.
Great library by the way. I am using it instead of x-ray because it supports cookies and advanced pagination. Thanks.
It might be that you put .paginate after .find. The .paginate command should be placed immediately after the request command (.get, .follow, etc.). If you put .paginate after the .find command, the next page will be loaded but .find won't be called. If that's not the case, you may have to post some example code so I can try to reproduce the issue.
Documentation needs some work. I still can't get how paginate works!
OK. I figured it out. Specify the query string in the first get. Then paginate should be the next line, and the rest follows.
.get('http://www.example.com/', { pagingQryStr: 1 }) //start with pagingQryStr=1
.paginate({ pagingQryStr: +1 }, 5) //limit of 5 pages
.find('whatever')
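To make the incrementing idea concrete, here is a minimal sketch in plain JavaScript of the query values such a chain would walk through. It does not call osmosis: `pageQueries` is a hypothetical helper written only for illustration, and it assumes that each numeric value passed to paginate is added to the current query parameter and that the limit counts follow-up requests after the first page. Both are assumptions, not confirmed library behavior.

```javascript
// Hypothetical helper, not part of osmosis: lists the query objects a
// paginated crawl would request under the assumptions stated above.
function pageQueries(initial, increments, limit) {
  var current = Object.assign({}, initial);
  var pages = [Object.assign({}, current)]; // the first request
  for (var i = 0; i < limit; i++) {
    Object.keys(increments).forEach(function (key) {
      // Add the increment to the current value of this parameter.
      current[key] = Number(current[key]) + increments[key];
    });
    pages.push(Object.assign({}, current));
  }
  return pages;
}

// Note: +1 is simply the number 1 in JavaScript; the plus sign is cosmetic.
var pages = pageQueries({ pagingQryStr: 1 }, { pagingQryStr: +1 }, 5);
console.log(pages.map(function (p) { return p.pagingQryStr; }));
// → [ 1, 2, 3, 4, 5, 6 ]
```

If the limit turns out to include the first page, the last entry would simply not be requested.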
@rc0x03 I updated my code to what @anasqadrei mentioned, and now I am experiencing a different bug. It doesn't seem to scrape at all: it says it loaded the pages, but nothing happens. How exactly does the pagination work? Is it setting a query param, or something else?
Example Code
o
  .get(url, { page: 1 })
  .paginate({ page: +1 }, 5)
  .set({
    'companies': [
      o.find('.business-info').set({
        'name': 'h4 a',
        'link': 'h4 a@href',
        'address': 'address'
      })
    ]
  })
  .data(Meteor.bindEnvironment(function (data) {
    _.each(data.companies, function (company, index) {
      company.scraped = false;
      company.category = config.category;
      try {
        var id = Companies.insert(company);
        console.log(company.name + " : " + id);
      } catch ( e ) {
        console.log('Duplicate Found');
      }
    });
    Meteor.call('collectScrape');
  }))
  .log(function (msg) {
    console.log(msg);
  });
Hmmm, I'm not too sure what is going on then; maybe it's just the site I am using. I was able to get it to kind of work by adding all of the query params to the args, as seen below. However, that changed the result, so I'm wondering whether it has to do with how the query params are added. Is it possible they are being re-encoded, which could mess up certain characters like pluses?
.get(url, {
  type: 'category',
  input: type,
  location: location,
  tobid: '',
  filter: 'business',
  source: 'bbbse',
  radius: '80',
  accredited: 'accredited',
  country: 'USA%2CCAN',
  language: 'en',
  page: 50,
  codeType: 'YPPA'
})
Yes, needle will automatically urlencode your query parameters. So country should be set to 'USA,CAN' and not 'USA%2CCAN'.
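The double-encoding effect can be reproduced without any HTTP library. The sketch below uses the built-in `encodeURIComponent` as a stand-in for the encoding step; `toQueryString` is a hypothetical helper and is not needle's actual code, so treat it only as an illustration of why a pre-encoded value comes out wrong.

```javascript
// Hypothetical stand-in for the urlencoding an HTTP client applies to
// query parameters. Uses the built-in encodeURIComponent, not needle itself.
function toQueryString(params) {
  return Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    })
    .join('&');
}

// Already-encoded value: the '%' is encoded again as '%25'.
var preEncoded = toQueryString({ country: 'USA%2CCAN' });
console.log(preEncoded); // → country=USA%252CCAN

// Raw value: the comma is encoded exactly once.
var raw = toQueryString({ country: 'USA,CAN' });
console.log(raw); // → country=USA%2CCAN
```

The same applies to a literal plus sign: with this encoder, passing 'a+b' yields 'a%2Bb', so values should be given in their raw, unencoded form.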
@anasqadrei can you give me some recommendations as far as improving the documentation? What would you like to see in it? How can I make things more clear?
@rc0x03 Thank you