node-horseman's People

Contributors

awlayton, dzoba, easyrider, efernandesng, fabiocorneti, fay-jai, framerate, ganrmit, grahamkennery, jeprojects, johntitus, kirillrogovoy, mvalipour, petecoop, piercus, robertpallas, shesmu, tmerse, w33ble, wong2

node-horseman's Issues

Feature Request: ".do"

In order to run custom functions during a chain, I would like to propose a "do" function. It could be added to actions.js and be as simple as:

exports.do = function(fn){
    return fn();
} 

In the chain the user can use do like this:

horseman
.open('http://example.com')
.screenshot('test.png')
.do(function(){
    updateProgress();
    console.log('took a screenshot')
})
.type('input[name="Email"]', '[email protected]')
.click('input[name="signIn"]')
.do(function(){
    console.log("signed in")
})
.finally(function(){
   console.log('finished');
   horseman.close(); 
});

The function might need some polishing, but you get the idea. Extra functionality could be to call a next() function within the do function, to continue the chain.
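
A minimal sketch of the idea, assuming the chain is backed by promises. `Chain`, `step`, and `do` below are hypothetical stand-ins written for illustration, not node-horseman's actual internals. Here `do` runs a Node-side function and then passes the previous value along unchanged, so the rest of the chain is not affected.

```javascript
// Hypothetical stand-in for a promise-backed action chain (not node-horseman itself).
class Chain {
  constructor(promise) {
    this.promise = promise || Promise.resolve();
  }

  // Queue an action whose result becomes the chain's current value.
  step(fn) {
    return new Chain(this.promise.then(fn));
  }

  // Queue a side effect: wait for fn (it may return a promise), then
  // restore the previous value so later steps are unaffected.
  do(fn) {
    return new Chain(
      this.promise.then(value => Promise.resolve(fn(value)).then(() => value))
    );
  }

  // Make the chain usable wherever a promise is expected.
  then(onFulfilled, onRejected) {
    return this.promise.then(onFulfilled, onRejected);
  }
}

const events = [];
const result = new Chain()
  .step(() => 'test.png')                    // pretend this took a screenshot
  .do(file => events.push('saved ' + file))  // side effect only
  .step(file => file.toUpperCase());         // still receives 'test.png'
```

Because `fn` may return a promise in this sketch, a separate `next()` callback is not strictly needed to continue the chain.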

Horseman hanging while starting new instance

Hi johntitus,

First of all thanks for the wonderful work !!!

I tried to create more than one horseman instance, each on a different PhantomJS port. However, the second instance does not finish instantiating until the first instance has been closed.

Is it possible to create more than one horseman instance from the same process using different PhantomJS ports?

I have attached sample code block below

main.js

/**
 * Module Imports
 */

var create = require("./headless.js");

/**
 * Global Variables
 */

var urlList = [
    "http://www.google.com/",
    "http://www.bing.com/",
    "http://www.yahoo.com/",
    "https://github.com/",
    "https://wordpress.com/"
];
var port = 12401;

/**
 * Looping Through Url List And Creating New Horseman Instance For Each Url
 */

for (var url in urlList) {
    (function (port) {
        console.log(port,"Looping Once Again");
        create.createInstance(urlList[url],port)
            .then(function (horseman) {
                console.log(port,horseman.url());
                console.log(port,"closing node horseman");
                horseman.close();
            });
    })(port);
    port++;
} 

headless.js

/**
 * Global Imports
 */

var Promise = require('bluebird');
var Horseman = require('node-horseman');

/**
 * Creates A Horseman Instance For Each Url In A Given Port
 */

exports.createInstance = function (link,port) {
    return new Promise(function (resolve,reject) {
        console.log(port,"creating new horseman instance");
        console.log("Port For New Horseman Instance",port);
        var horseman = new Horseman({
            port : port
        });
        console.log(port,"created Instance");
        horseman
            .open(link);
        resolve(horseman);
    });
};

Sample Output

12401 'Looping Once Again'
12401 'creating new horseman instance'
Port For New Horseman Instance 12401
12401 'created Instance'
12402 'Looping Once Again'
12402 'creating new horseman instance'
Port For New Horseman Instance 12402
http://www.google.com/
12401 'closing node horseman'
12402 'created Instance'
12403 'Looping Once Again'
12403 'creating new horseman instance'
Port For New Horseman Instance 12403
http://www.bing.com/
12402 'closing node horseman'
12403 'created Instance'
12404 'Looping Once Again'
12404 'creating new horseman instance'
Port For New Horseman Instance 12404
https://www.yahoo.com/
12403 'closing node horseman'
12404 'created Instance'

As the sample output shows, each new instance is created only after the previous horseman instance has been closed.

Is there a way to create parallel horseman instances from the same process?

Assertion failed error

I keep getting the following error when using Internet Explorer

Assertion failed: !current_buffer, file src\node_http_parser.cc, line 387

I have an Express website that uses AJAX requests to pull in data using node-horseman. Whenever Internet Explorer is used I get the above error. Google Chrome, however, does not have any issues.

I am using node v0.10.36

While troubleshooting, I have even removed everything but the var x = new horseman initialization and I still get the error. As soon as I remove that, the issue goes away.

Please help. Thanks.

Horseman Won't Install on iojs (> v2.5.0) or node (v4.0.0)

It seems that I can't install node-horseman on iojs greater than version 2.5.0 or node 4.0.0.

Is node-horseman still being supported? It looks like the trouble may be with 'deasync'? I just figured I would start a thread for others trying to upgrade.


npm ERR! Linux 3.13.0-57-generic
npm ERR! argv "/home/doppleruser/.nvm/versions/io.js/v3.0.0/bin/iojs" "/home/doppleruser/.nvm/versions/io.js/v3.0.0/bin/npm" "install"
npm ERR! node v3.0.0
npm ERR! npm v2.13.3
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: node ./build.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script 'node ./build.js'.
npm ERR! This is most likely a problem with the deasync package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node ./build.js
npm ERR! You can get their info via:
npm ERR! npm owner ls deasync
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR! /home/package-trials/npm-debug.log


Search Example, Not showing the Results?

Thanks for this awesome pack. I have been searching for a way to use phantomjs from node, and horseman is exactly what I needed!

I have been going through the api and examples.

Here is one problem I am facing: When I run the quickstart example, google search,

  1. It opens the google page.
  2. Enters my keyword, eg, github
  3. It doesn't show the results. Output - [Number of links: null]

I have attached the screenshot....

Search on Google:

var Horseman = require('node-horseman');
var horseman = new Horseman();

var numLinks = horseman
.open('http://www.google.com')
.type('input[name="q"]', 'github')
.click("button:contains('Google Search')")
.waitForNextPage()
.count("li.g");

// I have added this code to take screenshot.
horseman.screenshot('g3.jpeg');

console.log("Number of links: " + numLinks);

horseman.close();

// Output - Number of links: null

[Screenshot: g3]

Consider making injecting jQuery an option

For sites that already include jQuery, the injected jQuery can cause problems, clobbering the site's jQuery and removing plugins, etc., that it has added to the jQuery.fn namespace. It would be nice to be able to turn off this feature with something like var horseman = new Horseman({injectjQuery: false}). If you'd like me to send a pull request for this, let me know.

is "crop" possible in v2?

Hi - I find horseman really useful, and I used to use the .crop feature - is this no longer available in v2? I can't see it in the docs.

Horseman does not start on heroku

I am trying to start a horseman instance on heroku.
Example:
var phantomjs = require("phantomjs");
var Horseman = require("node-horseman");
var config = {phantomPath: phantomjs.path};
new Horseman(config);

Debug Output: Sat, 28 Mar 2015 18:06:49 GMT horseman .setup() creating phantom instance on

npm Error: npm ERR! code ELIFECYCLE

Add support for Tabs

Version 1.x had support for tabs, but 2.x does not. Should be straightforward - just store an array of phantom pages. That's how 1.x worked.
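
A rough sketch of that bookkeeping, under stated assumptions: `TabSet`, `openTab`, `switchToTab`, and `current` are hypothetical names chosen for illustration, and `createPage` stands in for whatever actually creates a phantom page. It only shows the "array of pages plus an active index" idea.

```javascript
// Hypothetical tab bookkeeping; `createPage` stands in for whatever creates a phantom page.
class TabSet {
  constructor(createPage) {
    this.createPage = createPage;
    this.pages = [];
    this.active = -1;
  }

  // Create a page, append it, and make it the active tab.
  openTab(url) {
    this.pages.push(this.createPage(url));
    this.active = this.pages.length - 1;
    return this.active;
  }

  // Make an existing tab active; reject out-of-range indexes.
  switchToTab(index) {
    if (index < 0 || index >= this.pages.length) {
      throw new RangeError('No tab at index ' + index);
    }
    this.active = index;
    return this.pages[index];
  }

  // The page that chained actions should currently target.
  current() {
    return this.pages[this.active];
  }
}

// Usage with plain objects in place of real pages.
const tabs = new TabSet(url => ({ url }));
tabs.openTab('http://www.google.com');
tabs.openTab('http://www.yahoo.com');
tabs.switchToTab(0);
```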

process.exit() from cmd

Hi! After require('node-horseman') I lose control of the process in cmd (Windows 8).
I have to add something like process.on('SIGINT', function() { process.exit(); }) myself to be able to close the process from the terminal with Ctrl+C.

Feature Request: .log()

It would be nice if you could spit out the result of an evaluate-type function, or a string, without breaking the chain. Seems like it shouldn't be too hard, but my brain isn't coming up with the answer this morning.

Currently, you need to do something like this:

horseman
  .open('http://www.google.com')
  .count('a')
  .then(function(count){
     console.log(count);
     horseman
        .open('http://www.yahoo.com')
        .count('a')
        .then(function(count){
             console.log(count);
         });
  });

Desired:

horseman
  .open('http://www.google.com')
  .count('a')
  .log() // spits out the value returned from count
  .open('http://www.yahoo.com')
  .count('a')
  .log() // spits out the value returned from count
  .log('here i am') // spits out the string provided.

Confirm and Alert not working

Here's the code I used:

yield horseman.on('alert', co.wrap(function*(msg) {
console.log(':\nALERT: ' + msg);
return true;
}));

I also set an event for urlChanged and that works perfectly fine with the same format above. (using the co.wrap)

Any idea what might be the cause?

The rest of the code works fine if a confirm/alert is not on the site. Also, not sure if this is important but I used an android device user agent... so this was for mobile.

Thanks

Unable to retrieve a list of cookies.

So I have tried the following code on node 0.10.x, 0.12.x and iojs 2.3.x with phantomjs 1.9.8 and 2.0.0 on OSX Yosemite. All I'm getting is an error that getCookies is not a function on the phantom object.

TypeError: this.phantom.getCookies is not a function
    at Horseman.exports.cookies (/Users/Josh/Desktop/Projects/horse-test/node_modules/node-horseman/lib/actions.js:233:18)
    at Object.<anonymous> (/Users/Josh/Desktop...

var Horseman = require('node-horseman');

var horseman = new Horseman({
 cookiesFile: 'cookies.txt'
 });

var result = horseman
    .open('http://google.com')
    .cookies();

console.log(result);

I know it's getting the cookies because it's saving to cookies.txt that I set in the options object.

Any idea what's going on?

"Unhandled rejection HeadlessError: Phantom Process died" when doing multiple open()

Hello! We've been using horseman to access a certain feature in the youtube cms. It was working well before but throws these errors now.

Here's the general logic:

  1. We need to log in to the YouTube CMS (no issues whatsoever)
  2. Loop to different pages and perform some actions (this is where things get bumpy, the error is shown below)
horseman .open https://accounts.google.com/ServiceLogin?continue=https%3A%2F%2Fwww.youtube.com%2Fsignin%3Faction_handle_signin%3Dtrue%26next%3D%252F%26app%3Ddesktop%26feature%3Dsign_in_button%26hl%3Den&passive=true&hl=en&service=youtube&uilel=3#identifier +2s
  horseman injected jQuery +13ms
  horseman .click() +2ms
  horseman .type() +10ms input[name="Email"] [email protected] undefined
  horseman .click() +3ms
  horseman .wait() +19ms 2000
  horseman .type() +2s input[name="Passwd"] password123 undefined
  horseman .click() +2ms
  horseman .waitForNextPage() +24ms
  horseman Timeout during waitForNextPage() +5s
  horseman .waitForNextPage() +0ms
  horseman Timeout during waitForNextPage() +5s
  horseman .close(). +1ms
  horseman .open https://www.youtube.com/my_channels?o=<page1> +6ms
  horseman .wait() +0ms 2000
  horseman .open https://www.youtube.com/my_channels?o=<page2> +1ms
Unhandled rejection HeadlessError: Phantom Process died
    at poll_func (/Users/ninz/proj/frnky/cms_interface/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:563:10)
    at /Users/ninz/proj/frnky/cms_interface/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:61:7
    at ClientRequest.<anonymous> (/Users/ninz/proj/frnky/cms_interface/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:484:11)
    at emitOne (events.js:77:13)
    at ClientRequest.emit (events.js:169:7)
    at Socket.socketErrorListener (_http_client.js:259:9)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at emitErrorNT (net.js:1250:8)
    at doNTCallback2 (node.js:429:9)
    at process._tickCallback (node.js:343:17)
  horseman .open https://www.youtube.com/my_channels?o=<page3> +1ms
Unhandled rejection HeadlessError: Phantom Process died
    at poll_func (/Users/ninz/proj/frnky/cms_interface/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:563:10)
    at /Users/ninz/proj/frnky/cms_interface/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:61:7
    at ClientRequest.<anonymous> (/Users/ninz/proj/frnky/cms_interface/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:484:11)
    at emitOne (events.js:77:13)
    at ClientRequest.emit (events.js:169:7)
    at Socket.socketErrorListener (_http_client.js:259:9)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at emitErrorNT (net.js:1250:8)
    at doNTCallback2 (node.js:429:9)
    at process._tickCallback (node.js:343:17)
  horseman .open https://www.youtube.com/my_channels?o=<pag4> +0ms

Horseman crashes when more than 11 instances are running

I'm writing code that uses the async library that does web spidering. I've tried opening just 1 site at a time and the code runs without a hitch. I ran into this while trying to use node-horseman:

(node) warning: possible EventEmitter memory leak detected. 11 SIGINT listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
at process.addListener (events.js:179:15)
at process.on.process.addListener (node.js:668:26)
at /app/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:87:21
at Array.forEach (native)
at spawnPhantom (/app/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:86:31)
at Object.exports.create (/app/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:201:5)
at new Horseman (/app/node_modules/node-horseman/lib/index.js:135:11)
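
For context, the warning comes from Node's default cap of 10 listeners per event on a single emitter. The trace suggests each new instance adds its own SIGINT listener to `process`, so the 11th instance crosses that cap. As the message itself says, the cap can be raised with `setMaxListeners()`. A minimal sketch, assuming a ceiling of 20 concurrent instances (that number is an assumption, and raising the cap only silences the warning; it does not limit how many PhantomJS processes get spawned):

```javascript
// Node warns once more listeners than the cap (10 by default) are attached to one event.
const previousLimit = process.getMaxListeners();

// Assumed ceiling on concurrent instances; pick a value that fits the workload.
const MAX_INSTANCES = 20;
process.setMaxListeners(MAX_INSTANCES);

// Simulate what the trace shows: one SIGINT listener per instance.
const listeners = [];
for (let i = 0; i < 11; i++) {
  const onSigint = () => {};
  process.on('SIGINT', onSigint);
  listeners.push(onSigint);
}
const attached = process.listenerCount('SIGINT');

// Clean up the simulated listeners so Ctrl+C behaves normally again.
listeners.forEach(fn => process.removeListener('SIGINT', fn));
```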

can't run on windows

$ phantomjs 1.js
Error: Cannot find module 'http'

phantomjs://bootstrap.js:289
phantomjs://bootstrap.js:254 in require
C:/Users/n37r06u3/Desktop/node/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:3
C:/Users/n37r06u3/Desktop/node/node_modules/node-horseman/node_modules/node-phantom-simple/node-phantom-simple.js:474
Error: Cannot find module 'path'

phantomjs://bootstrap.js:289
phantomjs://bootstrap.js:254 in require
C:/Users/n37r06u3/Desktop/node/node_modules/node-horseman/lib/index.js:4
C:/Users/n37r06u3/Desktop/node/node_modules/node-horseman/lib/index.js:290
TypeError: '[object Object]' is not a constructor (evaluating 'new Horseman()')

1.js:2

Request() error evaluating open() call: Error: connect ECONNREFUSED

var pattern = '0 */'+ 5 +' * * * *';
function job(){
    console.log('Start to run every ' + 5 + ' mins');
        var price = horseman
        .open('http://viettelstore.vn/dien-thoai/apple-iphone-6-plus-64gb-ban-quoc-te--pid975.html')
        .evaluate(function () {
            return  $('#_price_new436').text()
        })
        console.log(price);
        horseman.close();
}
new CronJob(pattern, job, null, true, 'America/Los_Angeles');

I use node-cron to schedule the crawling job shown above. Horseman works very well without node-cron, but when I run it inside node-cron the first call succeeds and every call after that fails with the error below:

Request() error evaluating open() call: Error: connect ECONNREFUSED
Request() error evaluating evaluate() call: Error: connect ECONNREFUSED
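
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the job reuses a single `horseman` object and calls `horseman.close()` at the end of the first run, so later runs talk to a PhantomJS process that is no longer listening. The sketch below illustrates that lifecycle with a hypothetical `FakeBrowser` class (a stand-in, not node-horseman) and contrasts it with creating and closing one instance per run.

```javascript
// Hypothetical stand-in that mimics "a closed instance cannot be reused".
class FakeBrowser {
  constructor() {
    this.closed = false;
  }
  open(url) {
    if (this.closed) {
      throw new Error('connect ECONNREFUSED');
    }
    return 'opened ' + url;
  }
  close() {
    this.closed = true;
  }
}

// Problem pattern: one shared instance, closed at the end of every run.
const shared = new FakeBrowser();
function sharedJob() {
  const result = shared.open('http://example.com');
  shared.close();
  return result;
}

// Per-run pattern: each run owns its instance and always closes it.
function perRunJob() {
  const browser = new FakeBrowser();
  try {
    return browser.open('http://example.com');
  } finally {
    browser.close();
  }
}
```

In the cron example this would correspond to moving the `new Horseman()` call inside `job()`, if the assumption about the cause holds.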

Travis-CI Compatibility

Hello,

First of all, thank you for the great work!

I was trying to run node-horseman with mocha on travis but it failed with:

> node test/horseman.js

undefined
npm ERR! Test failed.  See above for more details.

The script is almost exactly the same as the example:

var Horseman = require('node-horseman');
var horseman = new Horseman();

var numLinks = horseman
  .open('http://www.google.com')
  .type('input[name="q"]', 'github')
  .click("button:contains('Search')")
  .waitForNextPage()
  .title();

console.log("Title: " + numLinks);

horseman.close();

Please let me know if this is a known issue or if I have missed anything.

Thanks,

Ben

.crop affects all subsequent .screenshot calls

In a script, I was cropping some areas, then wanting to take full screenshots later on.

However, the clipRect set by .crop affects all subsequent .screenshot calls - is this by design?

Also, not sure why a viewport of 1200 x 800 is set after a crop? For me, altering this code so that the clipRect is removed after a crop works:

https://github.com/johntitus/node-horseman/blob/master/lib/actions.js#L267

  this.page.set('clipRect', rect, function(){
    self.pause.unpause('clipRect');
    self.screenshot( path );
    // self.page.set("viewportSize",{
    //   width: 1200,
    //   height: 800
    // });
    self.page.set('clipRect', {});
    return this;
  });

When calling "screenshot" from loadFinished callback, no screenshot is made

When I try to take a screenshot in the "loadFinished" callback, the screenshot is nowhere to be found. No error occurs (the success callback runs and "done" is printed).

When I use "screenshot" as a separate step in the chain, it does work, but the screenshot might be incomplete because the page has not finished loading.

  var horseman = new Horseman();
  horseman
  .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
  .viewport(1280, 1024)
  .on("loadFinished", function() {
    horseman.screenshot(path.resolve(__dirname, '123.png')).then(function(){
      console.log("done");
    }, function(e){
      console.error(e);
    });
  })
  .on("error", function(e) {
    console.log("error", e);
  })
  .on("timeout", function(e) {
    console.log("timeout", e);
  })
  .open("http://google.com")
  .finally(function() {
     horseman.close();
  });

Am I doing something wrong here?

My environment:
Windows 8
Nodejs 4.1.1
PhantomJS 2.0.0

persistent cookies

It doesn't appear like you have an option to save persistent cookies to a file. Does your api possibly persist them in memory though for each unique object?

For example, if I login to a website and don't log out, can a subsequent connection that uses the same object use the same website session (stored in a cookie) even after horseman.close(); has been called? Or is this a limitation with the current api?

'On confirm' event is not responding to callback return value

After writing some Mocha tests utilising Horseman I've come across an issue with Javascript confirmation dialogues.

Following the PhantomJS documentation on dealing with confirmation events, I'm returning true from the on confirm callback. That page is referenced by the Horseman docs.

As explained on the PhantomJs page:

true === pressing the "OK" button, false === pressing the "Cancel" button

No matter what I return in my Horseman code, it says Cancel was pressed.

I've created some test files to illustrate this issue.

My test HTML:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Confirm check</title>
    <script>
        function confirmation() {
            var answer = confirm('OK or Cancel?');
            if (answer === true) {
                console.log('You pressed OK');
            } else {
                console.log('You pressed Cancel');
            }
        }
    </script>
</head>
<body>
<a id="confirm" href="#" onclick="confirmation()" >Confirm?</a>
</body>
</html>

My test Horseman script:

var Horseman = require('node-horseman');

var browser = new Horseman();
var url = 'file:///Users/user/Desktop/confirm.html';

browser
    .viewport(1920, 1080)
    .on('loadFinished', function(status) {
        console.log('%s Status %s', url, status);
    })
    .on('consoleMessage', function( msg, lineNumber, sourceId ) {
        console.log('Console: %s, %s, %s', msg, lineNumber, sourceId);
    })
    .on('confirm', function( msg ) {
        console.log('Confirm message: %s', msg);
        return false;
    })
    .open(url)
    .click('a#confirm');

browser.close();

The terminal output with DEBUG=horseman enabled:

  horseman .setup() creating phantom instance on +0ms
  horseman .setup() phantom instance created ok. +448ms
  horseman .setup() creating phantom page. +1ms
  horseman .setup() phantom page created ok. +21ms
  horseman .setup() setting viewport. +0ms
  horseman .setup() viewport set ok. +4ms
  horseman .viewport() set +60ms
  horseman .on loadFinished set. +0ms
  horseman .on consoleMessage set. +0ms
  horseman .on confirm set. +0ms
  horseman .opening: file:///Users/user/Desktop/confirm.html +0ms
file:///Users/user/Desktop/confirm.html Status success
  horseman .open: file:///Users/user/Desktop/confirm.html - status: success +22ms
Confirm message: OK or Cancel?
Console: You pressed Cancel, undefined, undefined
  horseman .click() a#confirm +81ms
  horseman .close(). +0ms

I may be doing things wrong but I've not seen any indication that I should be doing things differently to respond to confirmation events.

Any help with this would be appreciated.

Cheers.

horseman.close() not closing

After executing a file that uses horseman to load a page, the file never completes executing. This is the code I'm using:

var Horseman = require("node-horseman");
var horseman = new Horseman();

horseman
  .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
  .open('http://www.google.com')
  .close();

I then run node horsemantest.js and it just hangs.

I am using node 0.12.5 and phantomjs 2.0.0, and the latest node-horseman.

typo in Readme.md

.waitForSelector(selector)
Wait until the element selector is present e.g. .wait('#pay-button')

In the above text from the waitForSelector section of Readme.md, the example uses a different method than the one being documented.

Is this a typo? Shouldn't it read:

Wait until the element selector is present e.g. .waitForSelector('#pay-button')

While we're on the topic, what is the logic behind using different methods and not just using .wait and detecting for number or selector? Is it for speed?
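
For what it's worth, a single `.wait` that inspects its argument is easy to sketch. The `planWait` helper below is hypothetical and only shows dispatch by argument type; it is not how node-horseman is implemented.

```javascript
// Hypothetical dispatcher: decide what kind of wait an argument asks for.
function planWait(arg) {
  if (typeof arg === 'number') {
    return { kind: 'timeout', ms: arg };
  }
  if (typeof arg === 'string') {
    return { kind: 'selector', selector: arg };
  }
  if (typeof arg === 'function') {
    return { kind: 'predicate', fn: arg };
  }
  throw new TypeError('wait() expects a number, a selector string, or a function');
}

// Usage: a number means a delay, a string means a selector.
const byTime = planWait(1000);
const bySelector = planWait('#pay-button');
```

Separate method names keep each call self-describing and avoid ambiguous inputs such as a numeric string, which may be one reason to keep them apart.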

`waitForNextPage` doesn't work

I've been trying to use waitForNextPage, but it doesn't work as expected.

In your links example, if you change this line to

return horseman.waitForNextPage();

the screenshot will give you an unfinished page. The same occurs with the pizza example.

So far I've been able to hack it out with waitForSelector, but that's highly prone to error.

Horseman ends my long search for an up-to-date Phantom driver, but this bug renders it almost unusable.

I'm on Phantom 2.0.0.

How to make horseman chain synchronous?

I am doing something like the code below, where I have wrapped a horseman operation in a function that I want to return a value. The problem is, the chain gets started asynchronously so the function returns undefined immediately. Later on when horseman finishes, it console logs the correct value.

How can I cause the chain to block or return the value I want from the .then() / .evaluate() section?

var Horseman = require('node-horseman');
var horseman = new Horseman();

console.log("outside val: " + request("a"));

function request(query) {
  horseman
    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
    .open('http://example.com/'+query)
    .evaluate(function() {
      return $('input[name="fieldName"]').val();
    })
    .then(function(value) {
      console.log("Running then");
      horseman.close();
      if (value !== undefined) {
        console.log("inside val: " + value);
        return value;
      }
    });
}

Output:

outside val: undefined
Running then
inside val: correctValue
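
Assuming the chain returned by `.then()` is a promise, as the output above suggests, the usual approach is to return it from the function and read the value in a callback rather than trying to block. A minimal sketch, with a plain `Promise` standing in for the horseman chain (`fakeChain` is hypothetical and exists only so the example runs without PhantomJS):

```javascript
// Hypothetical stand-in for horseman.open(...).evaluate(...): resolves later with a value.
function fakeChain(query) {
  return new Promise(resolve => {
    setTimeout(() => resolve('value for ' + query), 10);
  });
}

// Return the promise so the caller can wait on it.
function request(query) {
  return fakeChain(query).then(value => {
    if (value === undefined) {
      throw new Error('field not found');
    }
    return value; // becomes the resolved value of request()
  });
}

// The caller reads the value inside .then() instead of from the return value.
const pending = request('a');
pending.then(value => {
  console.log('outside val: ' + value);
});
```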

accepting & opening file downloads?

I want to test a website that pushes a CSV into my browser download folder. I need to be able to open the csv file and review the contents inside my phantom. Do you have anything to assist with this process?

phantom.getCookies is undefined

line 233 in lib/actions.js:

this.phantom.getCookies(function( result ){

I can't find this method in the phantom API. Maybe the phantom object is extended, but I get this error:
undefined is not a function at Horseman.exports.cookies(/Users/kristian/WebstormProjects/HorseManScraper/node_modules/node-horseman/lib/actions.js:233:18)

goto()?

Was there a reason for removing goto()? I was using that API in a previous nightmare script. I see you have open(), which I suppose is better? I think it would be nice to have both, for legacy scripts.

Crop should allow something other than a path

Providing a path is fine for storing the image locally, but we should also be able to get something like a buffer for further manipulation.

I would be happy to be allowed to upload it, serve it with HTTP, etc... It's endless.
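
PhantomJS itself can render a page to a base64 string via `page.renderBase64(format)`, so one way this could work is to decode that string into a Buffer on the Node side. A minimal sketch, assuming the base64 string has already been obtained; `toImageBuffer` is a hypothetical helper, and the sample string is just the 8-byte PNG file signature, not a real screenshot:

```javascript
// Decode a base64-encoded image (e.g. from a render-to-base64 call) into a Buffer.
function toImageBuffer(base64) {
  return Buffer.from(base64, 'base64');
}

// Base64 of the 8-byte PNG file signature, used here as placeholder data.
const sample = 'iVBORw0KGgo=';
const buffer = toImageBuffer(sample);

// The Buffer can now be uploaded, sent in an HTTP response, or written to disk.
console.log(buffer.length, buffer.slice(1, 4).toString('ascii'));
```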

Support header footer in pdf

Please support headers and footers in the PDF rendering. Something like the following:

page.paperSize = {
  format: 'A4',
  margin: "1cm",
  /* default header/footer for pages that don't have custom overwrites (see below) */
  header: {
    height: "1cm",
    contents: phantom.callback(function(pageNum, numPages) {
      if (pageNum == 1) {
        return "";
      }
      return "<h1>Header " + pageNum + " / " + numPages + "</h1>";
    })
  },
  footer: {
    height: "1cm",
    contents: phantom.callback(function(pageNum, numPages) {
      if (pageNum == numPages) {
        return "";
      }
      return "<h1>Footer " + pageNum + " / " + numPages + "</h1>";
    })
  }
};

Run horseman in asynchronous way with callback

Is there any way to run horseman asynchronously with a callback? In the example below, I have to wait for horseman to finish before 'myFunction' can run. Is there any way to run 'myFunction' without waiting for horseman?

var Horseman = require('node-horseman');
var horseman = new Horseman();

var numLinks = horseman
  .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
  .open('http://www.google.com')
  .type('input[name="q"]', 'github')
  .click("button:contains('Google Search')")
  .waitForNextPage()
  .count("li.g");

console.log("Number of links: " + numLinks);
horseman.close();

// call myFunction()

Failed to exec install script

thomas@workstation:shopify-endeavor$ npm i node-horseman --save
npm info it worked if it ends with ok
npm info using [email protected]
npm info using [email protected]
npm info package.json [email protected] license should be a valid SPDX license expression
npm info package.json [email protected] No license field.
npm info package.json [email protected] No license field.
npm info package.json [email protected] No license field.
npm info attempt registry request try #1 at 8:55:43 PM
npm http request GET http://registry.npmjs.org/node-horseman
npm http 304 http://registry.npmjs.org/node-horseman
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor
npm info installOne [email protected]
npm info preinstall [email protected]
npm info attempt registry request try #1 at 8:55:43 PM
npm http request GET http://registry.npmjs.org/deasync
npm info attempt registry request try #1 at 8:55:43 PM
npm http request GET http://registry.npmjs.org/clone
npm info attempt registry request try #1 at 8:55:43 PM
npm http request GET http://registry.npmjs.org/defaults
npm info attempt registry request try #1 at 8:55:43 PM
npm http request GET http://registry.npmjs.org/node-phantom-simple
npm http 304 http://registry.npmjs.org/clone
npm http 304 http://registry.npmjs.org/defaults
npm http 304 http://registry.npmjs.org/node-phantom-simple
npm http 304 http://registry.npmjs.org/deasync
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman
npm info installOne [email protected]
npm info installOne [email protected]
npm info installOne [email protected]
npm info installOne [email protected]
npm info preinstall [email protected]
npm info build /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/defaults
npm info linkStuff [email protected]
npm info preinstall [email protected]
npm info install [email protected]
npm info postinstall [email protected]
npm info build /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/clone
npm info linkStuff [email protected]
npm info install [email protected]
npm info postinstall [email protected]
npm info preinstall [email protected]
npm info build /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/node-phantom-simple
npm info linkStuff [email protected]
npm info install [email protected]
npm info postinstall [email protected]
npm info preinstall [email protected]
npm info attempt registry request try #1 at 8:55:44 PM
npm http request GET http://registry.npmjs.org/bindings
npm info attempt registry request try #1 at 8:55:44 PM
npm http request GET http://registry.npmjs.org/nan
npm http 304 http://registry.npmjs.org/nan
npm http 304 http://registry.npmjs.org/bindings
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/deasync
npm info install [email protected] into /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/deasync
npm info installOne [email protected]
npm info installOne [email protected]
npm info preinstall [email protected]
npm info build /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/deasync/node_modules/bindings
npm info linkStuff [email protected]
npm info install [email protected]
npm info postinstall [email protected]
npm info preinstall [email protected]
npm info build /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/deasync/node_modules/nan
npm info linkStuff [email protected]
npm info install [email protected]
npm info postinstall [email protected]
npm info build /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/deasync
npm info linkStuff [email protected]
npm info install [email protected]

> [email protected] install /Users/thomas/Desktop/shopify-endeavor/node_modules/node-horseman/node_modules/deasync
> node ./build.js

Build failed
npm info [email protected] Failed to exec install script
npm ERR! Darwin 14.4.0
npm ERR! argv "/Users/thomas/.nvm/versions/io.js/v3.2.0/bin/iojs" "/Users/thomas/.nvm/versions/io.js/v3.2.0/bin/npm" "i" "node-horseman" "--save"
npm ERR! node v3.2.0
npm ERR! npm  v2.13.3
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: `node ./build.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] install script 'node ./build.js'.
npm ERR! This is most likely a problem with the deasync package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node ./build.js
npm ERR! You can get their info via:
npm ERR!     npm owner ls deasync
npm ERR! There is likely additional logging output above.
npm info preuninstall [email protected]
npm info uninstall [email protected]
npm info postuninstall [email protected]
npm info preuninstall [email protected]
npm info uninstall [email protected]
npm info postuninstall [email protected]

npm ERR! Please include the following file with any support request:
npm ERR!     /Users/thomas/Desktop/shopify-endeavor/npm-debug.log
Failed to exec install script

run arbitrary code in a chain

Hi - great lib :)

Is it possible to run arbitrary code in a chain?

In the node context, not the browser context (which is .evaluate).

for example:

horseman
  .open("www.google.com")
  .wait(1000)
  .then(function(){
    //do stuff here
  });
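A minimal sketch of the idea, assuming a hypothetical `Chain` object in place of Horseman (the `Chain` constructor, its `log` array, and the stubbed `open` and `wait` are invented for illustration and are not Horseman's API). It shows one way a chainable method could run Node-side code between steps and then hand the chain back:

```javascript
// Hypothetical stand-in for a chainable browser object; not Horseman's real API.
function Chain() {
  this.log = [];
}

// Each stubbed action only records itself and returns `this` so calls can be chained.
Chain.prototype.open = function (url) {
  this.log.push('open ' + url);
  return this;
};

Chain.prototype.wait = function (ms) {
  this.log.push('wait ' + ms);
  return this;
};

// Runs fn in the Node context (not in the page), then continues the chain.
Chain.prototype.do = function (fn) {
  fn.call(this);
  return this;
};

var seen = [];
var chain = new Chain()
  .open('http://www.google.com')
  .wait(1000)
  .do(function () {
    // Arbitrary Node-side code: here we just snapshot the steps recorded so far.
    seen = this.log.slice();
  });
```

Returning `this` from `do` is what keeps the chain usable after the custom function has run.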

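Error when running the example code

1. Running the example code throws an error. What is the likely cause?
Error: spawn ENOENT

npm install node-horseman
2. Installing the module reports an error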
npm WARN package.json [email protected] No description
npm WARN package.json [email protected] No repository field.
npm WARN package.json [email protected] No README data
\

[email protected] install /Users/qinmudi/ericqin/test/horseman/node_modules/node-horseman/node_modules/deasync
node ./build.js

darwin-x64-node-0.10 exists; testing
Binary is fine; exiting
[email protected] node_modules/node-horseman
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
└── [email protected] ([email protected], [email protected])
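For context on item 1: `spawn ENOENT` is the error Node reports when `child_process.spawn` cannot find the executable it was asked to start. Whether a missing PhantomJS binary is the cause here is only a guess. The sketch below reproduces the error code with a deliberately nonexistent command name (the name is made up for the demonstration):

```javascript
var spawn = require('child_process').spawn;

// Hypothetical binary name, chosen so that it will not exist on PATH.
var child = spawn('definitely-not-a-real-binary-xyz');

// The failure arrives asynchronously as an 'error' event on the child process.
var result = new Promise(function (resolve) {
  child.on('error', function (err) {
    resolve(err.code);
  });
});
```

Checking that the executable a library spawns is installed and on `PATH` is a reasonable first step when this error appears.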

Error with a dependency of horseman

Hello, I can't run a test using horseman. The apparent cause is an error in one of its npm package dependencies, as follows:

/home/fguedes/projetos/tripper/node_modules/horseman/node_modules/jsdom/lib/jsdom.js:3
`jsdom 4.x onward only works on io.js, not Node.js™: https://github.com/tmpvar
^
SyntaxError: Unexpected token ILLEGAL
at Module._compile (module.js:439:25)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at Object.<anonymous> (/home/fguedes/projetos/tripper/node_modules/horseman/lib/horseman.js:3:13)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)

Sorry, but I can't find a solution for this bug.

Any suggestions?

Issue in instantiating node-horseman

Within an Express app, creating an instance of Horseman globally works correctly without any errors. But when I try the same thing inside a function, it throws the following error:

node: ../src/node_http_parser.cc:387: static v8::Handle<v8::Value> node::Parser::Execute
(const v8::Arguments&): Assertion `!current_buffer' failed.

Aborted (core dumped)

When I do the same thing within Sails, my code hangs at the line where I instantiate Horseman.

The same code works fine when I run it on its own, without Sails or Express.

The following code creates a simple Express app that requires node_horseman_test_2.js, where I instantiate Horseman.

node_horseman_test_1.js

/*
 |--------------------------------------------------------------------------
 | Global Imports
 |--------------------------------------------------------------------------
 */

var express = require('express');
var Promise = require('bluebird');
var path = require('path');
var bodyParser = require('body-parser');
var nodeHorseman = require('./node_horseman_test_2.js');

/*
 |--------------------------------------------------------------------------
 | Create our Express application
 |--------------------------------------------------------------------------
 */

var app = express();

app.use(bodyParser.urlencoded({
    extended: true
}));

app.all('/*', function(req, res, next) {
    // CORS headers
    res.header("Access-Control-Allow-Origin", "*"); // restrict it to the required domain
    res.header('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS');
    // Set custom headers for CORS
    res.header('Access-Control-Allow-Headers', 'Content-type,Accept,X-Access-Token,X-Key');
    if (req.method == 'OPTIONS') {
        res.status(200).end();
    } else {
        next();
    }
});

// Use environment defined port or 8888
var port = process.env.PORT || 8888;

// Create our Express router
var router = express.Router();

// Register all our routes with /api
app.use('/api/v1/', router);

// Initial dummy route for testing
router.get('/', function(req, res) {
    res.json({ message: 'Test message' });
});

//Add productMatching router here
var test = router.route('/test');

//Get request here calls scraper to find matches
test.get(function(req, res) {
    nodeHorseman.instantiate(req.query).then(function(results) {
        res.send(results);
    }).catch(function(err) {
        if(err) res.send(err);
        //LOG ERROR HERE
        else res.send('Match failed. Check connection.');
    });
});

// If no route is matched by now, it must be a 404
app.use(function(req, res, next) {
    var err = new Error('Not Found');
    err.status = 404;
    //LOG ERROR HERE
    next(err);
});

// Start the server
app.listen(port);
console.log('Send match searches to ' + port);

node_horseman_test_2.js

var Horseman = require('node-horseman');
var userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36';
var Promise = require("bluebird");


exports.instantiate = function() {
    return new Promise(function(resolve, reject) {
        var horse = new Horseman();
        horse
        .userAgent(userAgent)
        .open("http://www.google.com");
        console.log(horse.url());
        horse.close();
    });
};
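Separately from the crash, note that the promise returned by `instantiate` above never calls `resolve` or `reject`, so the `.then()` in the route would never fire. This may or may not be related to the reported failure. A minimal sketch of settling the promise, using a hypothetical `FakeHorseman` stub in place of the real library (the stub and its behaviour are assumptions made so the example runs on its own):

```javascript
// Hypothetical stub standing in for the synchronous Horseman API used above.
function FakeHorseman() {
  this.current = null;
  this.closed = false;
}
FakeHorseman.prototype.userAgent = function (ua) { this.ua = ua; return this; };
FakeHorseman.prototype.open = function (url) { this.current = url; return this; };
FakeHorseman.prototype.url = function () { return this.current; };
FakeHorseman.prototype.close = function () { this.closed = true; };

function instantiate() {
  return new Promise(function (resolve, reject) {
    var horse = new FakeHorseman();
    try {
      horse.userAgent('test-agent').open('http://www.google.com');
      resolve(horse.url()); // settle the promise so the caller's .then() runs
    } catch (err) {
      reject(err); // surface failures to the caller's .catch()
    } finally {
      horse.close(); // always release the browser instance
    }
  });
}

var pending = instantiate();
```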

horseman/nightmare/phantom API...

Could you mark, next to each API item (for example, title()), which project introduced it? I believe you added new items such as text(), which is fantastic. I'd like to be able to see which project introduced which API command.

Perhaps like this?

.text(selector)
Gets the text inside of an element. (added Horseman)

Upgrade to node-phantom-simple v2.0?

First off, thanks for the awesome package! I have gotten a lot of use out of it, and it has proven very helpful.

I am curious if you plan to upgrade Horseman to use version ^2.0 of node-phantom-simple anytime soon? I ask because I am using FreeBSD, which is not supported in versions < 2.0 of node-phantom-simple. As a result of that, I added support for FreeBSD to node-phantom-simple and submitted a pull request which was accepted some time ago, but which has only just recently made it into a release. In the interim, I have been having to manually replace Horseman's node-phantom-simple package with the custom package, which I am still having to do since Horseman's dependency is for node-phantom-simple version ^1.2.0. It would be nice if I no longer had to do this, but obviously it requires that Horseman support version ^2.0.
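To illustrate why a `^1.2.0` dependency cannot pick up a 2.0 release: for versions with a major number of 1 or higher, npm's caret range accepts only releases with the same major version. The sketch below is a simplified check written for this explanation (the `parse` and `satisfiesCaret` helpers are hypothetical, ignore prerelease tags, and are not npm's implementation):

```javascript
// Split "1.2.3" into [1, 2, 3].
function parse(v) {
  return v.split('.').map(Number);
}

// Simplified caret-range check, valid only for full x.y.z versions with major >= 1.
function satisfiesCaret(version, range) {
  var base = parse(range.replace(/^\^/, ''));
  var v = parse(version);
  if (v[0] !== base[0]) return false;           // major version must match
  if (v[1] !== base[1]) return v[1] > base[1];  // a newer minor is accepted
  return v[2] >= base[2];                       // same minor: patch must not be older
}

var accepts13 = satisfiesCaret('1.3.1', '^1.2.0');
var accepts20 = satisfiesCaret('2.0.0', '^1.2.0');
```

Here `accepts13` is `true` and `accepts20` is `false`, so the dependency range itself would need to change before a 2.x release could be installed.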

Change API to use Promises

Opening for comments. I've been playing with promises and would like to move the entire API to be promise based. Using Horseman would now look something like:

horseman
    .headers( headers )
    .then(function(){
        return horseman.open( 'http://httpbin.org/headers' )
    })
    .then( function(){
        return horseman.evaluate(function(){
            return document.body.children[0].innerHTML;
        });
    })
    .then( function( data ){
        var response = JSON.parse( data );                  
        response.should.have.property( 'headers' );
        response.headers.should.have.property( 'X-Horseman-Header' );
        response.headers[ 'X-Horseman-Header' ].should.equal( 'test header' );
    })
    .catch( function( e ){
        console.log( e );
    });

This would remove the dependency on deasync, and make it easy to run multiple horsemen at the same time, without blocking the entire process.

Downside: the API gets more convoluted and is not as easy to read.
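A rough sketch of the non-blocking point, assuming a runtime with a global `Promise` and using a hypothetical `fakeHorseman` function (a timer standing in for one browser run, not Horseman itself). Two runs are started together and the shorter one finishes first, since neither blocks the process:

```javascript
// Hypothetical async task standing in for one Horseman run; resolves when its timer fires.
function fakeHorseman(name, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(name + ' done');
    }, ms);
  });
}

var order = []; // completion order, filled in as each run finishes

var all = Promise.all([
  fakeHorseman('a', 30).then(function (r) { order.push(r); return r; }),
  fakeHorseman('b', 10).then(function (r) { order.push(r); return r; })
]);
```

`Promise.all` keeps the results in the order the runs were listed, while `order` records the order in which they actually completed.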
