
Comments (10)

rchipka commented on September 24, 2024

I tried to reproduce your example, but I'm not sure why your code isn't scraping. Try logging errors with .error and see if that reveals anything.

My test code:

var osmosis = require('osmosis');

osmosis
.get('http://www.yelp.com/search?cflt=hvac&find_loc=Charlotte%2C+NC', { start: 0 })
.paginate({ start: +10 },  5)
.set({
  'companies': [
    osmosis.find('.regular-search-result').set({
      'name': 'a.biz-name',
      'link': 'a.biz-name@href',
      'address': 'address'
    })
  ]
})
.data(console.log)
.log(console.log)
.error(console.log)
.debug(console.log)

from node-osmosis.

anasqadrei commented on September 24, 2024

@rc0x03 Here are a few suggestions to improve the documentation:

Great library by the way. I am using it instead of x-ray because it supports cookies and advanced pagination. Thanks.


rchipka commented on September 24, 2024

It might be that you put .paginate after .find. The .paginate command should be placed immediately after the request command (.get, .follow, etc.). If you put .paginate after .find, the next page will be loaded but .find won't be called. If that's not the case, you may have to post some example code so I can try to reproduce the issue.


anasqadrei commented on September 24, 2024

Documentation needs some work. I still can't get how paginate works!


anasqadrei commented on September 24, 2024

OK. I figured it out. Specify the query string in the first get. Then paginate should be the next line, and the rest follows.

.get('http://www.example.com/', { pagingQryStr: 1 }) //start with pagingQryStr=1
.paginate({ pagingQryStr: +1 },  5) //limit of 5 pages
.find('whatever')
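
One rough way to picture which pages the chain above is expected to request, assuming (as the comments in this thread suggest) that .paginate adds the given increment to the named query parameter for each new page. The helper `paginatedUrls` below is hypothetical and not part of osmosis; it only builds URLs and makes no requests. Whether osmosis counts the first page toward its limit is not confirmed here, so `pages` simply means the total number of URLs generated.

```javascript
// Hypothetical helper, not part of osmosis: lists the URLs that a
// query-parameter pagination would visit under the assumption above.
function paginatedUrls(baseUrl, param, start, step, pages) {
  const urls = [];
  for (let i = 0; i < pages; i++) {
    const url = new URL(baseUrl);
    // Each page adds `step` to the previous value of the parameter.
    url.searchParams.set(param, String(start + i * step));
    urls.push(url.toString());
  }
  return urls;
}

// Start at pagingQryStr=1 and step by 1, as in the snippet above.
console.log(paginatedUrls('http://www.example.com/', 'pagingQryStr', 1, 1, 3));
```

If the pages logged by .log do not follow a sequence like this, the starting value in .get or the increment in .paginate is a likely place to look.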


patrickml commented on September 24, 2024

@rc0x03 I updated my code to what @anasqadrei mentioned, and now I am experiencing a different bug: it doesn't seem to scrape at all. It says it loaded the pages, but nothing happens. How exactly does the pagination work? Is it setting a query parameter?

Example Code

o
  .get(url, { page: 1 })
  .paginate({ page: +1 },  5)
  .set({
    'companies': [
      o.find('.business-info').set({
        'name': 'h4 a',
        'link': 'h4 a@href',
        'address': 'address'
      })
    ]
  })
  .data(Meteor.bindEnvironment(function (data) {
    _.each(data.companies, function (company, index) {
      company.scraped = false;
      company.category = config.category;
      try {
        var id = Companies.insert(company);
        console.log(company.name + " : " + id);
      } catch ( e ) {
        console.log('Duplicate Found');
      }
    });
    Meteor.call('collectScrape');
  }))
  .log(function (msg) {
    console.log(msg);
  });


patrickml commented on September 24, 2024

Hmm, I'm not sure what is going on then; maybe it's just the site I am using. I was able to get it to partly work by adding all of the query params to the args, as seen below. However, that changed the result, so I'm wondering whether it has to do with how the query params are added. Is it possible they are being re-encoded, which could mess up certain characters like pluses?


.get(url, {
  type: 'category',
  input: type,
  location: location,
  tobid: '',
  filter: 'business',
  source: 'bbbse',
  radius: '80',
  accredited: 'accredited',
  country: 'USA%2CCAN',
  language: 'en',
  page: 50,
  codeType: 'YPPA'
})


rchipka commented on September 24, 2024

Yes, needle will automatically URL-encode your query parameters. So country should be set to 'USA,CAN' and not 'USA%2CCAN'.
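
The double encoding can be reproduced without osmosis or needle. A minimal sketch, using Node's built-in URLSearchParams as a stand-in for whatever encoder the HTTP client applies (needle's exact encoding routine is an assumption here, and `toQuery` is a hypothetical helper name):

```javascript
// URLSearchParams stands in for the client's automatic encoding step.
function toQuery(params) {
  return new URLSearchParams(params).toString();
}

// Already-encoded value: the '%' is itself encoded to '%25'.
console.log(toQuery({ country: 'USA%2CCAN' })); // country=USA%252CCAN

// Raw value: the comma is encoded exactly once.
console.log(toQuery({ country: 'USA,CAN' }));   // country=USA%2CCAN
```

The first form reaches the server as the literal text USA%2CCAN rather than USA,CAN, which would explain a changed result.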


rchipka commented on September 24, 2024

@anasqadrei Can you give me some recommendations for improving the documentation? What would you like to see in it? How can I make things clearer?


patrickml commented on September 24, 2024

@rc0x03 Thank you

