Giter Site home page Giter Site logo

casperjs-mvp-scraper's Introduction

CasperJS MVP Web Scraper

CasperJS web scraping that interact with tab and form

Modify index variables & run

Invoke
npm run casper ${nCurrent} ${nStep} ${rndBase} ${rndTop} ${isSubSeq}

Random
npm run casper 20000000 10 1000000 9999999 0

Sequence
npm run casper 20000000 10 0 0 0

Multiples
node multi.js

If need to parse local htmls use parsers where it fit
node parser.group.regex.csv.js

AWS EC2's Deploy Instructions

if you already have the custom AMI go to first boot instructions

Deploy Ubuntu AMI

Login To instance base storage 16GB cause tmps on caspers on npm folders

ssh -i aws_key.pem [email protected]

AWS AMIs Compatible

sudo apt-get update -y
sudo apt-get install libc6 libstdc++6 libgcc1 libgtk2.0-0 libasound2 libxrender1 libdbus-glib-1-2 -y
sudo apt-get install python -y

install max casperjs firefox version

wget https://ftp.mozilla.org/pub/firefox/releases/59.0/linux-x86_64/en-US/firefox-59.0.tar.bz2
tar xvjf firefox-59.0.tar.bz2
sudo apt-mark hold firefox

Create app folder

mkdir -p app/00

Install NodeJS in this case using nvm package

wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash

Reload terminal to update and install to required version

nvm install node 

Add ENV vars to .bashrc at the end of file

vi ~/.bashrc
export SLIMERJSLAUNCHER="$HOME/firefox/firefox"
export PHANTOMJS_EXECUTABLE="./node_modules/.bin/slimerjs"

Volumes Setup EFS libs to be availabe to mount later

sudo apt-get -y install nfs-common

mounting point

sudo mkdir /mnt/efs
sudo chown ubuntu:ubuntu /mnt/efs
sudo chmod 777 /mnt/efs

NVM SSD , EBS or volumes

create mounting point

sudo mkdir /mnt/nvm
sudo chown ubuntu:ubuntu /mnt/nvm
sudo chmod 777 /mnt/nvm

sudo apt-get -y install linux-aws  
sudo reboot

upload app code from local /dont include packages

scp -i aws_key.pem -r ~/Documents/telexplorer [email protected]:~/app

login back & customize packinstall

mv app/base app/00-replicas
cd app/00-replicas
npm install

exit Make AMI image or standalone install continue

First run after AMI deployed

Add GUI to the Installation (if you do this before creating AWS AMI image for EC2s wont deploy)

sudo apt-get upgrade -y

Install new grub on EBS

sudo apt-get install ubuntu-desktop gnome-panel gnome-settings-daemon metacity nautilus gnome-terminal xfce4 vnc4server -y

install and set pass for vnc and replace config

vncserver
vi ~/.vnc/xstartup

#!/bin/sh
# Uncomment the following two lines for normal desktop:
unset SESSION_MANAGER
# exec /etc/X11/xinit/xinitrc
unset DBUS_SESSION_BUS_ADDRESS
startxfce4 &
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
gnome-panel &
gnome-settings-daemon &
metacity &
nautilus &
gnome-terminal &

mount EFS & check

sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-id.efs.region.amazonaws.com:/ /mnt/efs
ls /mnt/efs/

define proper disk to use, depending on the type of ec2 could be efimeral or ebs use the proper one not partitioned i.e. nvme0n1 259:1 0 139.7G 0 disk nvme1n1 259:2 0 279.4G 0 disk

lsblk

Configure disk USE proper name file sys and mount

sudo mkfs.ext4 -E nodiscard /dev/nvme0n1
sudo mount -o discard /dev/nvme0n1 /mnt/nvm
// lets check all is mounted
df -h

want to add to boot sequence lets extract ids

sudo blkid

i.e. /dev/nvme0n1: UUID="8d1d3d75-3626-40e3-970a-40d8f6936c29" TYPE="ext4"

add to /etc/stab

sudo vi /etc/fstab

UUID=<UUID> /mnt/nvm ext4 defaults 0 0

UUID=8d1d3d75-3626-40e3-970a-40d8f6936c29 /mnt/nvm ext4 defaults 0 0

// permissions
sudo chown ubuntu:ubuntu /mnt/nvm & sudo chmod 777 /mnt/nvm

//Create Specific app folders

mkdir /mnt/nvm/raw /mnt/nvm/captures /mnt/nvm/results /mnt/nvm/tpl1 /mnt/nvm/error /mnt/nvm/uk

Reload ssh to run VNC if your Security group allows can connect direct to ip, or set ssh port fowarding

ssh -L 5902:localhost:5902 -i aws_key.pem [email protected]
vncserver
vncserver -geometry 1340x750

try connecting with vncviewer

Done!

ON EC2 Reboots

On each restart need to mount back EFS ssh machine and start vnc and or ephimeral disks

vncserver
vncserver -geometry 1340x750
sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-id.efs.region.amazonaws.com:/ /mnt/efs

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.