I am currently working on trying to get my scraper to paginate and it doesnt seem to d

Pagination loads but skips previous commands about node-osmosis HOT 10 CLOSED

rchipka commented on September 24, 2024

Pagination loads but skips previous commands

from node-osmosis.

Comments (10)

rchipka commented on September 24, 2024 1

I tried to reproduce your example code, but I'm not sure why your code isn't scraping. Try logging errors with .error and see if that reveals anything.

My test code:

var osmosis = require('osmosis');

osmosis
.get('http://www.yelp.com/search?cflt=hvac&find_loc=Charlotte%2C+NC', { start: 0 })
.paginate({ start: +10 },  5)
.set({
  'companies': [
    osmosis.find('.regular-search-result').set({
      'name': 'a.biz-name',
      'link': 'a.biz-name@href',
      'address': 'address'
    })
  ]
})
.data(console.log)
.log(console.log)
.error(console.log)
.debug(console.log)

anasqadrei commented on September 24, 2024 1

@rc0x03 Here are a few suggestions to improve the documentation:

Put it all in the main Readme.md file. Seems it's the convention since most other libraries are doing the same.
For each method, we want an explanation of the arguments and a full working example. Something like https://github.com/caolan/async
I didn't know how to write the selectors and what syntax to use, so I used the reference on http://www.w3.org/TR/CSS21/selector.html#pattern-matching. So maybe add something about that.

Great library by the way. I am using it instead of x-ray because it supports cookies and advanced pagination. Thanks.

rchipka commented on September 24, 2024

It might be that you put .paginate after .find. The .paginate command should be placed immediately after the request command (.get, .follow, etc.). If you put .paginate after the .find command the next page will be loaded but .find won't be called. If that's not the case, you may have to post some example code so I can try to reproduce the issue.

anasqadrei commented on September 24, 2024

Documentation needs some work. I still can't get how paginate works!

anasqadrei commented on September 24, 2024

OK. I figured it out. Specify the query string in the first get. Then paginate should be the next line, and the rest follows.

.get('http://www.example.com/', { pagingQryStr: 1 }) //start with pagingQryStr=1
.paginate({ pagingQryStr: +1 },  5) //limit of 5 pages
.find('whatever')
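
To make the query-string arithmetic concrete, here is a rough sketch in plain Node.js. It does not use osmosis at all: `expectedQueries` is a hypothetical helper, and it assumes that numeric values passed to .paginate are added to the current param value and that the limit counts the follow-up requests after the first page. Treat it as an illustration of the idea, not as the library's implementation.

```javascript
// Hypothetical helper, not part of osmosis. It lists the query strings one
// would expect from .get(url, params).paginate(increments, limit), ASSUMING:
//   - numeric values in `increments` are added to the current param value
//   - `limit` counts the follow-up requests made after the first page
function expectedQueries(params, increments, limit) {
  const queries = [];
  let current = { ...params };
  for (let i = 0; i <= limit; i++) {
    // Serialize the current params the way a query string would look.
    queries.push(new URLSearchParams(current).toString());
    // Apply each increment to build the params for the next page.
    const next = { ...current };
    for (const [key, step] of Object.entries(increments)) {
      next[key] = Number(current[key]) + step;
    }
    current = next;
  }
  return queries;
}

console.log(expectedQueries({ start: 0 }, { start: +10 }, 5));
```

Note that `+10` is just the number 10 in JavaScript; the plus sign only makes the intent (an increment) easier to read.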

patrickml commented on September 24, 2024

@rc0x03 I updated my code to what @anasqadrei mentioned, and now I am experiencing a different bug: it doesn't seem to scrape at all. It says it loaded the pages, but nothing happens. How exactly does the pagination work? Is it setting a query param, or something else?

Example Code

o
  .get(url, { page: 1 })
  .paginate({ page: +1 }, 5)
  .set({
    'companies': [
      o.find('.business-info').set({
        'name': 'h4 a',
        'link': 'h4 a@href',
        'address': 'address'
      })
    ]
  })
  .data(Meteor.bindEnvironment(function (data) {
    _.each(data.companies, function (company, index) {
      company.scraped = false;
      company.category = config.category;
      try {
        var id = Companies.insert(company);
        console.log(company.name + " : " + id);
      } catch ( e ) {
        console.log('Duplicate Found');
      }
    });
    Meteor.call('collectScrape');
  }))
  .log(function (msg) {
    console.log(msg);
  });

patrickml commented on September 24, 2024

Hmm, I'm not sure what is going on then; maybe it's just the site I am using. I was able to get it to partly work by adding all of the query params to the args, as seen below. However, that changed the result, so I'm wondering whether it has to do with how the query params are added. Is it possible they are being re-encoded, which could mess up certain characters like pluses?


.get(url, {
  type: 'category',
  input: type,
  location: location,
  tobid: '',
  filter: 'business',
  source: 'bbbse',
  radius: '80',
  accredited: 'accredited',
  country: 'USA%2CCAN',
  language: 'en',
  page: 50,
  codeType: 'YPPA'
})

rchipka commented on September 24, 2024

Yes, needle will automatically URL-encode your query parameters. So country should be set to 'USA,CAN' and not 'USA%2CCAN'.
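
A quick way to see the double-encoding effect is with plain Node.js, no osmosis or needle involved. This sketch uses the built-in `encodeURIComponent` as a stand-in; needle's exact encoder may differ in details, so take it only as an illustration of what happens when an already-encoded value is encoded again.

```javascript
// Plain Node.js illustration of double encoding. encodeURIComponent is used
// as a stand-in here; it is an ASSUMPTION that needle behaves similarly.
const raw = 'USA,CAN';          // the value as you mean it
const preEncoded = 'USA%2CCAN'; // the same value, already percent-encoded

// Encoding the raw value once gives the form the server expects.
const encodedOnce = encodeURIComponent(raw);

// Encoding the pre-encoded value escapes the '%' itself, so the server
// would decode it back to the literal text 'USA%2CCAN', not 'USA,CAN'.
const encodedTwice = encodeURIComponent(preEncoded);

console.log(encodedOnce);
console.log(encodedTwice);
console.log(decodeURIComponent(encodedTwice));
```

So passing the raw value and letting the request layer do the encoding avoids the mismatch.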

rchipka commented on September 24, 2024

@anasqadrei can you give me some recommendations as far as improving the documentation? What would you like to see in it? How can I make things more clear?

patrickml commented on September 24, 2024

@rc0x03 Thank you
