
ail-splash-manager's People

Forkers

vncloudsco rafiot

ail-splash-manager's Issues

Crawler Error / Down

Hi!

I installed the AIL-Splash-Manager on the same machine that AIL itself is running on (we only have this single machine), but I'm not able to get the crawlers running because of an error:
[Screenshot of the error]

Console output of the ail-splash-manager:

Launching all Splash dockers ...

 * Serving Flask app 'Flask_server'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on https://127.0.0.1:7001
 * Running on https://192.168.158.2:7001
Press CTRL+C to quit
127.0.0.1 - - [19/Aug/2022 14:28:33] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/get/session_uuid HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/get/proxies/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/get/splash/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:42] "GET /api/v1/ping HTTP/1.1" 200 -
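In this log, every request AIL made to the manager (ping, session_uuid, proxies/all, splash/all) returned 200, so the manager API itself appears reachable. For longer logs, a small Python sketch like the following could summarize which endpoints were called and flag error statuses. It assumes the Werkzeug access-log format shown above; `summarize` and `failed_requests` are hypothetical helpers, not part of ail-splash-manager.

```python
import re
from collections import Counter

# Werkzeug/Flask access-log format (assumed from the lines above), e.g.
# 127.0.0.1 - - [19/Aug/2022 14:28:33] "GET /api/v1/ping HTTP/1.1" 200 -
LOG_RE = re.compile(
    r'^(?P<client>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

def summarize(lines):
    """Count requests per (method, path, status); lines that don't match are ignored."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            counts[(m["method"], m["path"], int(m["status"]))] += 1
    return counts

def failed_requests(counts):
    """Keep only entries with a 4xx or 5xx status code."""
    return {key: n for key, n in counts.items() if key[2] >= 400}

log = """ * Serving Flask app 'Flask_server'
127.0.0.1 - - [19/Aug/2022 14:28:33] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/get/splash/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:42] "GET /api/v1/ping HTTP/1.1" 200 -"""
counts = summarize(log.splitlines())
print(counts[("GET", "/api/v1/ping", 200)])  # 2
print(failed_requests(counts))               # {}
```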

Here is the output of ./LAUNCH.sh -t:


 ./LAUNCH.sh -t
 #### containers config: ####
# proxy_name: proxy name (defined in proxies_profiles.cfg)
# port: single port or port range (ex: 8050 or 8050-8052)
# cpu: max number of cpu allocated
# memory: RAM (G) allocated
# maxrss: max unbound in-memory cache (Mb, Restart Splash when full)
# description: docker description
[default_splash_tor]
proxy_name=default_tor
port=8050-8052
cpu=1
memory=1
maxrss=2000
description= default splash tor
net=bridge

# Splash with SQUID proxy
[web_splash]
proxy_name=web_proxy
port=8060
cpu=1
memory=1
maxrss=2000
description= web splash
net=bridge

# Splash with I2P proxy
#[default_splash_i2p] # section name: splash name
#proxy_name=default_i2p
#port=8053-8055
#cpu=1
#memory=1
#maxrss=2000
#description=default splash i2p
#net=host
#### proxies config: ####
# Tor: torrc default proxy
# use The torproject proxy https://2019.www.torproject.org/docs/debian
# (up to date, solve issues with v3 onion addresses)

# proxy name
[default_tor]
# proxy host
host=172.17.0.1
# proxy port
port=9050
# proxy type
type=SOCKS5
# proxy description
description=tor default proxy
# crawler type (tor or i2p or web)
crawler_type=tor

# SQUID proxy
[web_proxy]
host=172.17.0.1
port=3128
type=HTTP
description=web proxy
crawler_type=web

# I2P proxy
#[default_i2p]
#host=127.0.0.1
#port=4444
#type=HTTP
#description=i2p default proxy
#crawler_type=i2p

#### #### ####

 Launching Tests ...

Splash List:
b'6a30543d58f9   scrapinghub/splash   "python3 /app/bin/sp"   58 seconds ago   Up 57 seconds   0.0.0.0:8060->8050/tcp   gallant_moser\n719b9459bb82   scrapinghub/splash   "python3 /app/bin/sp"   58 seconds ago   Up 57 seconds   0.0.0.0:8052->8050/tcp   hardcore_nightingale\ne237c3d9a36d   scrapinghub/splash   "python3 /app/bin/sp"   59 seconds ago   Up 58 seconds   0.0.0.0:8051->8050/tcp   pensive_greider\nc8c36770f590   scrapinghub/splash   "python3 /app/bin/sp"   59 seconds ago   Up 58 seconds   0.0.0.0:8050->8050/tcp   strange_gould\n'

Testing Splash Docker 6a30543d58f9:
success

Testing Splash Docker 719b9459bb82:
success

Testing Splash Docker e237c3d9a36d:
success

Testing Splash Docker c8c36770f590:
success
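The test above reports success for all four containers. To re-check a single Splash instance by hand, one option is Splash's `/_ping` endpoint, which answers with JSON containing `"status": "ok"` when the instance is up. The Python sketch below assumes that endpoint; `ping_splash` and `parse_ping` are hypothetical helpers, and LAUNCH.sh -t may test the containers differently.

```python
import http.client
import json
import urllib.request

# An opener with an empty ProxyHandler ignores http_proxy/https_proxy environment
# variables, so the request goes straight to the Splash container.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def parse_ping(body):
    """True if a Splash /_ping body is a JSON object with "status": "ok"."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        return False
    return isinstance(data, dict) and data.get("status") == "ok"

def ping_splash(host="127.0.0.1", port=8050, timeout=5.0):
    """GET http://host:port/_ping; any connection or HTTP error counts as down."""
    try:
        with _DIRECT.open(f"http://{host}:{port}/_ping", timeout=timeout) as resp:
            return parse_ping(resp.read())
    except (OSError, http.client.HTTPException):
        return False
```

For example, looping over ports 8050, 8051, 8052 and 8060 with `ping_splash(port=port)` would check every container listed below.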

Running Docker containers:


# docker container ls
CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                    NAMES
6a30543d58f9   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8060->8050/tcp   gallant_moser
719b9459bb82   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8052->8050/tcp   hardcore_nightingale
e237c3d9a36d   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8051->8050/tcp   pensive_greider
c8c36770f590   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8050->8050/tcp   strange_gould
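To confirm that every port from the containers config has a running Splash container behind it, the `docker container ls` table can be parsed for published ports such as `0.0.0.0:8060->8050/tcp`. A minimal Python sketch, assuming Docker's default table layout where NAMES is the last column; `host_port_map` and `missing_ports` are hypothetical helpers.

```python
import re

# A published port looks like "0.0.0.0:8060->8050/tcp": host port 8060, container port 8050.
PORT_RE = re.compile(r":(\d+)->(\d+)/tcp")

def host_port_map(docker_ls_output):
    """Map each published host port to the container NAME from `docker container ls` text."""
    mapping = {}
    for line in docker_ls_output.splitlines():
        if not line.strip() or line.startswith("CONTAINER ID"):
            continue  # skip blank lines and the header row
        name = line.split()[-1]  # NAMES is the last column in the default table format
        for host_port, _container_port in PORT_RE.findall(line):
            mapping[int(host_port)] = name
    return mapping

def missing_ports(expected_ports, mapping):
    """Configured ports that no running container publishes."""
    return sorted(set(expected_ports) - set(mapping))

sample = """CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                    NAMES
6a30543d58f9   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8060->8050/tcp   gallant_moser
c8c36770f590   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8050->8050/tcp   strange_gould"""
ports = host_port_map(sample)
print(ports)                                           # {8060: 'gallant_moser', 8050: 'strange_gould'}
print(missing_ports([8050, 8051, 8052, 8060], ports))  # [8051, 8052]
```

With the full four-row output above, no configured port would be reported as missing.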

Under the onion crawler, both ports are listed (8050 for Tor and 8060 for web). Could this be the problem?
[Screenshot]
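Going by the comments in the config printed above, each Splash section names a proxy via `proxy_name`, and each proxy declares a `crawler_type` (tor, i2p or web). Following those links, ports 8050-8052 would belong to Tor and 8060 to web. The Python sketch below resolves that chain with `configparser` and also reports dangling `proxy_name` references. It assumes this is how the two files relate; whether AIL groups crawlers in its UI the same way is not confirmed here, and `ports_by_crawler_type`/`expand_ports` are hypothetical helpers.

```python
import configparser

# Trimmed copies of the two configs printed by ./LAUNCH.sh -t.
CONTAINERS_CFG = """
[default_splash_tor]
proxy_name=default_tor
port=8050-8052

[web_splash]
proxy_name=web_proxy
port=8060
"""

PROXIES_CFG = """
[default_tor]
host=172.17.0.1
port=9050
type=SOCKS5
crawler_type=tor

[web_proxy]
host=172.17.0.1
port=3128
type=HTTP
crawler_type=web
"""

def expand_ports(value):
    """'8050' -> [8050]; '8050-8052' -> [8050, 8051, 8052] (inclusive range)."""
    start, _, end = value.strip().partition("-")
    first, last = int(start), int(end or start)
    if last < first:
        raise ValueError(f"invalid port range: {value!r}")
    return list(range(first, last + 1))

def ports_by_crawler_type(containers_text, proxies_text):
    """Follow splash -> proxy_name -> crawler_type and group Splash ports by type.

    Returns (groups, problems): groups maps crawler_type to a list of ports,
    problems lists Splash sections whose proxy_name is not defined.
    """
    containers = configparser.ConfigParser()
    containers.read_string(containers_text)
    proxies = configparser.ConfigParser()
    proxies.read_string(proxies_text)

    groups, problems = {}, []
    for splash in containers.sections():
        proxy_name = containers[splash].get("proxy_name")
        if not proxies.has_section(proxy_name):
            problems.append(f"[{splash}] refers to undefined proxy {proxy_name!r}")
            continue
        crawler_type = proxies[proxy_name].get("crawler_type", "unknown")
        ports = expand_ports(containers[splash]["port"])
        groups.setdefault(crawler_type, []).extend(ports)
    return groups, problems

groups, problems = ports_by_crawler_type(CONTAINERS_CFG, PROXIES_CFG)
print(groups)    # {'tor': [8050, 8051, 8052], 'web': [8060]}
print(problems)  # []
```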

I checked the web proxy configuration in the project description (https://github.com/ail-project/ail-splash-manager#web-proxy), but it is not clear to me which "/etc/squid/squid.conf" I need to configure. Squid is not installed on the host by default by the ail-splash-manager install script. Or do I need to change it inside the Docker container?
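Whichever Squid instance is meant, the proxies config points the web Splash container at an HTTP proxy on 172.17.0.1:3128 (and Tor at 172.17.0.1:9050), typically the host's address on Docker's default bridge. A minimal Python check that something accepts TCP connections at such an address is sketched below; `port_open` is a hypothetical helper, and a check run from the host does not prove the container can reach the same address.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `port_open("172.17.0.1", 3128)` returning False would suggest no proxy is listening where the web Splash container expects one.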

Is there anything I can debug further?

Please let me know if more logs are needed.
Thanks for your help!

Question: How are non-Tor crawlers used in ail-framework?

Hi @Terrtia,
regarding the possibility of also configuring crawlers to connect via an HTTP proxy: how does the ail-framework know which crawlers, on which ports, can be used for non-Tor connections? Does it need a special configuration in /ail-framework/configs/core.cfg, under the following section?

```
[Crawler]
activate_crawler = True
crawler_depth_limit = 1
...
```

Question: Initialisation compared with the script from ail-framework

Hi @Terrtia, is the way the crawlers are started somehow different from the way the ail-framework script starts them? I tried the splash-manager and everything seems to start correctly, but the crawlers do not seem to be able to connect via the Tor proxy. This is not the case when using the script from the ail-framework. Side note: I am using Tor behind an HTTP(S) proxy, which is configured in /etc/tor/torrc.
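Since the Splash containers are configured to reach Tor at 172.17.0.1:9050 with type SOCKS5, one thing worth ruling out is whether a SOCKS5 server actually answers on that address; by default Tor's `SocksPort` listens only on 127.0.0.1. The Python sketch below sends the SOCKS5 greeting from RFC 1928, offering only "no authentication". It checks the handshake only, not whether Tor can build circuits through the upstream HTTP(S) proxy; `socks5_reachable` and `accepts_no_auth` are hypothetical helpers.

```python
import socket

NO_AUTH_GREETING = b"\x05\x01\x00"  # VER=5, NMETHODS=1, METHODS=[0x00 "no authentication"]

def accepts_no_auth(reply):
    """True if a SOCKS5 method-selection reply (VER, METHOD) selects 'no authentication'."""
    return reply == b"\x05\x00"

def socks5_reachable(host, port, timeout=5.0):
    """Connect to host:port, send the SOCKS5 greeting and check the 2-byte reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(NO_AUTH_GREETING)
            reply = b""
            while len(reply) < 2:
                chunk = sock.recv(2 - len(reply))
                if not chunk:  # server closed the connection early
                    return False
                reply += chunk
            return accepts_no_auth(reply)
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, comparing `socks5_reachable("127.0.0.1", 9050)` with `socks5_reachable("172.17.0.1", 9050)` would show whether Tor is reachable only on the loopback address.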
