This package is a wrapper around Puppeteer that works by running an array of steps.
It lets you do basic scraping in an easy way.
Get a new instance of Scrapper
Scrapper(<dimensions>, <showBrowser>, <steps>, <objectData>, <customChromium>)
dimensions: {
height: <number>,
width: <number>
}
showBrowser: <boolean>
steps: [{
type: <string>,
...propsAccordingToType
}]
objectData: {
yourProp1: 'prop1',
yourProp2: 'prop2',
yourProp3: 'prop3'
}
customChromium: <string>
// If customChromium is '', the Chromium installed by Puppeteer is used; otherwise, the Chromium passed in is used.
const Scrapper = require('puppeteer-by-steps');
const s = new Scrapper({ width: 1366, height: 768 }, true, newSteps, transformedData);
await s.init(); // Init setting required by Puppeteer
await s.scrap(); // Start the scrapping process
This method is required after you instantiate the Scrapper. It sets up your first page, which is called main.
Use:
await s.init();
This method returns the Puppeteer browser. You can call any Puppeteer browser method on it.
s.getBrowser();
This method returns the Puppeteer page currently in use. You can call any Puppeteer page method on it.
s.getCurrentPage();
This method will create a new page in the browser. It receives a name for the new page.
Use:
await s.createPage(name);
Example:
await s.createPage('secondPage');
This method switches between pages by name. If no page exists with the given name, the browser switches to the main page.
Use:
await s.selectPage(name);
Example:
await s.selectPage('secondPage');
await s.selectPage('main');
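As a rough sketch of how named pages might be used together, the helper below opens a second page, runs some work on it, and then returns to main. It assumes `s` is a Scrapper instance on which `init()` has already been awaited. `withPage` and its `work` callback are hypothetical names introduced for this illustration and are not part of the package.

```javascript
// Hypothetical helper: run some work on a named page, then switch back to 'main'.
// `s` is assumed to be an initialized Scrapper instance.
async function withPage(s, name, work) {
  await s.createPage(name); // create the page under the given name
  await s.selectPage(name); // make sure it is the current page
  try {
    // getCurrentPage() returns the Puppeteer page, so `work` can call any page method
    return await work(s.getCurrentPage());
  } finally {
    await s.selectPage('main'); // always return to the first page
  }
}
```

For example, `await withPage(s, 'secondPage', (page) => page.title())` would read the title of the second page and leave main selected afterwards.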
This method switches between pages using the index of the page in the browser's array of page instances.
Use:
await s.selectPageByIndex(index);
Example:
await s.selectPageByIndex(0);
await s.selectPageByIndex(1);
await s.selectPageByIndex(5);
This method runs the steps that were passed to the Scrapper.
Use:
await s.scrap();
This method returns the data collected across the different steps.
- If a string parameter is passed, the function returns the property stored under that name.
- If no parameter is passed, an array with all the collected data is returned.
Use:
const fullData = s.getCollectedData();
const specificData = s.getCollectedData('propertyStored');
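When a step did not match anything, it can be convenient to read a property with a fallback. The sketch below is one way to do that. `getCollectedOr` is a hypothetical helper, and it assumes that a property that was never stored comes back as `undefined` or `null`, which this README does not confirm.

```javascript
// Hypothetical helper: read one collected property, falling back when nothing was stored.
// Assumption: a missing property is returned as undefined or null.
function getCollectedOr(s, prop, fallback) {
  const value = s.getCollectedData(prop);
  return value === undefined || value === null ? fallback : value;
}
```

For example, `getCollectedOr(s, 'title', '(no title)')`.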
This method returns the array of current steps.
Use:
const data = s.getSteps();
This method sets the steps.
Use:
const data = s.setSteps([<Step>]);
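Before setting new steps, it may help to check them against the step definitions in this README. The sketch below does a shallow check of the `type` and the fields marked as required. `checkSteps` and `REQUIRED_FIELDS` are hypothetical names, and treating `data` as required for `fill-data` is an assumption, since the definition does not mark it.

```javascript
// Required fields per step type, taken from the step definitions in this README.
// Assumption: 'data' is treated as required for 'fill-data'.
const REQUIRED_FIELDS = new Map([
  ['click', ['selector']],
  ['collect-data', ['prop', 'selector']],
  ['fill-data', ['data']],
  ['go-to', ['link']],
  ['press-key', ['key']],
  ['screenshot', []],
  ['wait-for-selector', ['selector']],
]);

// Hypothetical helper: return a list of problems found in a steps array (empty when none).
function checkSteps(steps) {
  const problems = [];
  steps.forEach((step, i) => {
    const required = REQUIRED_FIELDS.get(step.type);
    if (!required) {
      problems.push(`step ${i}: unknown type '${step.type}'`);
      return;
    }
    for (const field of required) {
      if (step[field] === undefined) {
        problems.push(`step ${i} (${step.type}): missing '${field}'`);
      }
    }
  });
  return problems;
}
```

A caller could then run `checkSteps(newSteps)` and only call `s.setSteps(newSteps)` when the returned array is empty.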
This method closes the browser.
Use:
await s.closeBrowser();
You can invoke it on its own or use it within steps.
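Since a failing step would otherwise leave the browser open, one option is to close it in a `finally` block. The following is a minimal sketch of that pattern, assuming `s` is a Scrapper instance created with the steps to run. `scrapAndClose` is a hypothetical name, not a method of the package.

```javascript
// Hypothetical helper: run all steps and close the browser even if a step throws.
async function scrapAndClose(s) {
  await s.init();
  try {
    await s.scrap();
    return s.getCollectedData();
  } finally {
    await s.closeBrowser(); // runs on success and on failure
  }
}
```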
Definition:
{
type: 'click',
selector: <string:required>, // Selector used to complete the action
waitFor: <number:default=0> // Milliseconds to wait after completing the action
}
Example:
{
type: 'click',
selector: 'a.mylink',
waitFor: 10
}
Definition:
You can store data inside an object of your Scrapper instance. This data can be retrieved with the getCollectedData method.
{
type: 'collect-data',
prop: <string:required>, // Name of the prop inside *collectedData*
selector: <string:required>, // Selector used to complete the action
contentType: <string:default=innerText:options=innerText,outerHTML>, // Type of information to extract from the selector
multiple: <boolean:default=false> // By default only one value is returned; if true and more than one element matches the selector, an array is returned
}
Example not-multiple:
{
type: 'collect-data',
prop: 'title',
selector: 'h1',
contentType: 'innerText',
multiple: false
}
Example multiple:
{
type: 'collect-data',
prop: 'subtitles',
selector: 'h3',
contentType: 'innerText',
multiple: true
}
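When many properties are collected the same way, the steps can be generated instead of written by hand. The sketch below builds one collect-data step per entry of a `{ prop: selector }` map. `collectSteps` is a hypothetical helper and the selectors are only examples.

```javascript
// Hypothetical helper: build one 'collect-data' step per entry of a { prop: selector } map.
function collectSteps(selectorsByProp, multiple = false) {
  return Object.entries(selectorsByProp).map(([prop, selector]) => ({
    type: 'collect-data',
    prop,
    selector,
    contentType: 'innerText',
    multiple,
  }));
}

// One single-value step for the title, one multi-value step for the subtitles.
const steps = [
  ...collectSteps({ title: 'h1' }),
  ...collectSteps({ subtitles: 'h3' }, true),
];
```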
Definition: You can set values on inputs, selects, or radio buttons.
{
type: 'fill-data',
data: [<Data>],
waitFor: <number:default=0> // Milliseconds to wait after completing the action
}
<Data>: {
type: <string:required:options=input,select,radio>,
selector: <string:required>, // Selector used to complete the action
origin: <string:options=static,dynamic>, // 'static' uses the prop *value* as-is; 'dynamic' uses *value* as the name of a property in the *objectData* passed when the instance was created
value: <string>, // For radio buttons, this is the index of the option. Example: '0' or '2'
waitFor: <number:default=0> // Milliseconds to wait after completing the action
}
Example static value:
{
type: 'fill-data',
data: [{
type: 'input',
selector: '#homeaddress',
origin: 'static',
value: 'street 32', // *street 32* will be the value set
waitFor: 1000
},{
type: 'input',
selector: '#phonenumber',
origin: 'static',
value: '18601234567' //*18601234567* will be the value set
}],
waitFor: 0 // Milliseconds to wait after completing the action
}
Example dynamic value:
{
type: 'fill-data',
data: [{
type: 'input',
selector: '#homeaddress',
origin: 'dynamic',
value: 'user_home_address' // The value set will be objectData.user_home_address (passed when the Scrapper instance was created)
},{
type: 'input',
selector: '#phonenumber',
origin: 'dynamic',
value: 'user_phone' // The value set will be objectData.user_phone (passed when the Scrapper instance was created)
}],
waitFor: 2000
}
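To make the difference between the two origins concrete, the sketch below shows one way the value for a <Data> entry could be resolved. `resolveFillValue` is a hypothetical function written for this illustration and is not the package's internal code. It also assumes that any origin other than 'dynamic' behaves like 'static'.

```javascript
// Illustration only: resolve the value to set for one <Data> entry.
// 'dynamic' -> use entry.value as a key into objectData
// otherwise -> use entry.value as-is (assumed to match 'static')
function resolveFillValue(entry, objectData) {
  if (entry.origin === 'dynamic') {
    return objectData[entry.value];
  }
  return entry.value;
}

const objectData = { user_home_address: 'street 32', user_phone: '18601234567' };

resolveFillValue({ origin: 'static', value: 'street 32' }, objectData);   // 'street 32'
resolveFillValue({ origin: 'dynamic', value: 'user_phone' }, objectData); // '18601234567'
```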
Definition:
Definition of Puppeteer options for goTo
{
type: 'go-to',
link: <string:url:required>, // URL which will be visited
waitUntil: <string:default=load:options=load,domcontentloaded,networkidle0,networkidle2>,
timeout: <number>, // milliseconds
waitFor: <number:default=0> // Milliseconds to wait after completing the action
}
Example:
{
type: 'go-to',
link: 'https://github.com/zetogk',
waitFor: 0
}
Definition:
List of keys - US Keyboard Layout
{
type: 'press-key',
key: <string:required>, // See the list of keys above for valid values
waitFor: <number:default=0> // Milliseconds to wait after completing the action
}
Example:
{
type: 'press-key',
key: 'Escape',
waitFor: 1000
}
Definition:
{
type: 'screenshot'
}
Example:
{
type: 'screenshot'
}
Definition:
{
type: 'wait-for-selector',
selector: <string:required>, // Selector used to complete the action
timeout: <number> // milliseconds
}
Example:
{
type: 'wait-for-selector',
selector: '.mydiv',
timeout: 10000
}
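Putting the step types together, a steps array for a simple form flow might look like the sketch below. The URL, the selectors, and the values in `objectData` are hypothetical and only there to show the shape. Replace them with the ones for your target page.

```javascript
// Hypothetical data; the keys are referenced by the 'dynamic' fill-data entries below.
const objectData = { user_home_address: 'street 32', user_phone: '18601234567' };

// Hypothetical URL and selectors, shown only to illustrate how steps are combined.
const formSteps = [
  { type: 'go-to', link: 'https://example.com/form', waitUntil: 'networkidle2', timeout: 30000 },
  { type: 'wait-for-selector', selector: '#homeaddress', timeout: 10000 },
  {
    type: 'fill-data',
    data: [
      { type: 'input', selector: '#homeaddress', origin: 'dynamic', value: 'user_home_address' },
      { type: 'input', selector: '#phonenumber', origin: 'dynamic', value: 'user_phone' },
    ],
    waitFor: 500,
  },
  { type: 'click', selector: 'button[type="submit"]', waitFor: 1000 },
  { type: 'press-key', key: 'Escape', waitFor: 500 },
  { type: 'collect-data', prop: 'title', selector: 'h1', contentType: 'innerText', multiple: false },
  { type: 'screenshot' },
];
```

These could then be passed to the constructor as `new Scrapper({ width: 1366, height: 768 }, false, formSteps, objectData)`.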
- zetogk
- Maximo-Miranda