ail-project / ail-splash-manager
Deprecated: AIL crawler has been upgraded to https://github.com/ail-project/lacus
License: GNU General Public License v3.0
Hi!
I installed the AIL-Splash-Manager on the same machine that AIL itself is running on (we only have this single machine).
But I'm not able to get the crawlers running because of an error:
Screen of the ail-splash-manager:
Launching all Splash dockers ...
* Serving Flask app 'Flask_server'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on https://127.0.0.1:7001
* Running on https://192.168.158.2:7001
Press CTRL+C to quit
127.0.0.1 - - [19/Aug/2022 14:28:33] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/get/session_uuid HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/get/proxies/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/get/splash/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:42] "GET /api/v1/ping HTTP/1.1" 200 -
Here is the output of the LAUNCH.sh -t
./LAUNCH.sh -t
#### containers config: ####
# proxy_name: proxy name (defined in proxies_profiles.cfg)
# port: single port or port range (ex: 8050 or 8050-8052)
# cpu: max number of cpu allocated
# memory: RAM (G) allocated
# maxrss: max unbound in-memory cache (Mb, Restart Splash when full)
# description: docker description
[default_splash_tor]
proxy_name=default_tor
port=8050-8052
cpu=1
memory=1
maxrss=2000
description= default splash tor
net=bridge
# Splash with SQUID proxy
[web_splash]
proxy_name=web_proxy
port=8060
cpu=1
memory=1
maxrss=2000
description= web splash
net=bridge
# Splash with I2P proxy
#[default_splash_i2p] # section name: splash name
#proxy_name=default_i2p
#port=8053-8055
#cpu=1
#memory=1
#maxrss=2000
#description=default splash i2p
#net=host
#### proxies config: ####
# Tor: torrc default proxy
# use The torproject proxy https://2019.www.torproject.org/docs/debian
# (up to date, solve issues with v3 onion addresses)
# proxy name
[default_tor]
# proxy host
host=172.17.0.1
# proxy port
port=9050
# proxy type
type=SOCKS5
# proxy description
description=tor default proxy
# crawler type (tor or i2p or web)
crawler_type=tor
# SQUID proxy
[web_proxy]
host=172.17.0.1
port=3128
type=HTTP
description=web proxy
crawler_type=web
# I2P proxy
#[default_i2p]
#host=127.0.0.1
#port=4444
#type=HTTP
#description=i2p default proxy
#crawler_type=i2p
#### #### ####
Launching Tests ...
Splash List:
b'6a30543d58f9 scrapinghub/splash "python3 /app/bin/sp" 58 seconds ago Up 57 seconds 0.0.0.0:8060->8050/tcp gallant_moser\n719b9459bb82 scrapinghub/splash "python3 /app/bin/sp" 58 seconds ago Up 57 seconds 0.0.0.0:8052->8050/tcp hardcore_nightingale\ne237c3d9a36d scrapinghub/splash "python3 /app/bin/sp" 59 seconds ago Up 58 seconds 0.0.0.0:8051->8050/tcp pensive_greider\nc8c36770f590 scrapinghub/splash "python3 /app/bin/sp" 59 seconds ago Up 58 seconds 0.0.0.0:8050->8050/tcp strange_gould\n'
Testing Splash Docker 6a30543d58f9:
success
Testing Splash Docker 719b9459bb82:
success
Testing Splash Docker e237c3d9a36d:
success
Testing Splash Docker c8c36770f590:
success
Running docker container:
# docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6a30543d58f9 scrapinghub/splash "python3 /app/bin/sp…" 6 minutes ago Up 6 minutes 0.0.0.0:8060->8050/tcp gallant_moser
719b9459bb82 scrapinghub/splash "python3 /app/bin/sp…" 6 minutes ago Up 6 minutes 0.0.0.0:8052->8050/tcp hardcore_nightingale
e237c3d9a36d scrapinghub/splash "python3 /app/bin/sp…" 6 minutes ago Up 6 minutes 0.0.0.0:8051->8050/tcp pensive_greider
c8c36770f590 scrapinghub/splash "python3 /app/bin/sp…" 6 minutes ago Up 6 minutes 0.0.0.0:8050->8050/tcp strange_gould
Under the onion crawler, both ports are listed (8050 Tor + 8060 web). Could this be the problem?
I checked the web proxy configuration (https://github.com/ail-project/ail-splash-manager#web-proxy) in the project description, but it is not clear to me which "/etc/squid/squid.conf" I need to configure. Squid is not installed on the host by default by the ail-splash-manager install script. Or do I need to change it inside the Docker container?
Is there anything I can debug further?
Please let me know if more logs are needed.
Thanks for your help!
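To see at a glance which Splash ports end up attached to which crawler type, the two config sections printed by `LAUNCH.sh -t` can be cross-referenced. The following is only a minimal sketch, assuming the config is plain INI as shown above; it is not how ail-splash-manager or AIL resolves ports internally. The function names `expand_ports` and `ports_by_crawler_type` are hypothetical, and the config text is a trimmed copy of the output above.

```python
import configparser

# Trimmed copies of the two config sections shown in the LAUNCH.sh -t output.
CONTAINERS_CFG = """
[default_splash_tor]
proxy_name=default_tor
port=8050-8052

[web_splash]
proxy_name=web_proxy
port=8060
"""

PROXIES_CFG = """
[default_tor]
host=172.17.0.1
port=9050
type=SOCKS5
crawler_type=tor

[web_proxy]
host=172.17.0.1
port=3128
type=HTTP
crawler_type=web
"""


def expand_ports(spec):
    """Expand a single port ('8050') or a range ('8050-8052') into a list of ints."""
    if "-" in spec:
        first, last = spec.split("-", 1)
        return list(range(int(first), int(last) + 1))
    return [int(spec)]


def ports_by_crawler_type(containers_text, proxies_text):
    """Map every Splash port to the crawler_type of the proxy its section names."""
    containers = configparser.ConfigParser()
    containers.read_string(containers_text)
    proxies = configparser.ConfigParser()
    proxies.read_string(proxies_text)

    mapping = {}
    for name in containers.sections():
        proxy_name = containers[name]["proxy_name"]
        crawler_type = proxies[proxy_name]["crawler_type"]
        for port in expand_ports(containers[name]["port"]):
            mapping[port] = crawler_type
    return mapping


if __name__ == "__main__":
    mapping = ports_by_crawler_type(CONTAINERS_CFG, PROXIES_CFG)
    for port, crawler_type in sorted(mapping.items()):
        print(port, crawler_type)
```

With the config above, ports 8050 to 8052 map to `tor` and port 8060 maps to `web`, so seeing 8060 listed under the onion crawler would not match what the config files describe.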
Hi @Terrtia,
regarding the possibility of also configuring crawlers to connect via an HTTP proxy: how does the ail-framework know which crawlers on which ports can be used for non-Tor connections? Does it need a special configuration in /ail-framework/configs/core.cfg under
```
[Crawler]
activate_crawler = True
crawler_depth_limit = 1
...
```
?
Hi @Terrtia, is the way the crawlers are started somehow different from the way the script from the ail-framework starts them? I tried the splash-manager and everything seems to start correctly, but the crawlers do not seem to be able to connect via the Tor proxy. This is not the case when using the script from the ail-framework. Side note: I am using Tor behind an HTTP(S) proxy, which is configured in /etc/tor/torrc.
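One thing that can be checked independently of AIL is whether anything is listening on the proxy addresses named in the proxies config. The snippet below is a rough sketch, not part of ail-splash-manager: `can_connect` is a hypothetical helper, and the host and port values are copied from the config posted above. It only tests that a TCP connection can be opened from the host; it does not verify that the proxy actually forwards traffic, and a container on the bridge network may still see something different.

```python
import socket

# Proxy endpoints as listed in the proxies config shown earlier in the thread.
PROXIES = {
    "default_tor": ("172.17.0.1", 9050),
    "web_proxy": ("172.17.0.1", 3128),
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    for name, (host, port) in PROXIES.items():
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{name} ({host}:{port}): {state}")
```

If an endpoint is reported as not reachable, it may be worth checking whether the proxy is bound to 172.17.0.1 at all or only to 127.0.0.1.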