
Comments (7)

yssoe commented on August 14, 2024

Hi, just run it on Amazon AWS or any other cloud service.

Cheers,


inkrement commented on August 14, 2024

You got me wrong. I am already using my own VPS, but I don't want to use the terminal all the time to set up virtualenvs, crontabs, etc. That gets quite messy, especially if you have to install and manage a lot of scrapers. So I am looking for a nice GUI to install, manage, configure and monitor my scrapers. A self-hosted Scrapinghub would be perfect, but I was not able to find such a tool.


yssoe commented on August 14, 2024

Hi, have you tried scrapyd?

It comes with a web interface:

https://scrapyd.readthedocs.io/en/latest/overview.html#web-interface

I run a fair number of spiders, and I scripted their deployment with Ansible; I only need to run one command and it's done.
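
As a point of reference, scheduling and checking spiders through scrapyd's JSON API takes only a few lines. The sketch below is illustrative: it assumes a scrapyd instance on localhost:6800 and a deployed project "myproject" with a spider "myspider", both placeholder names.

```python
# Minimal sketch of driving scrapyd's JSON API, assuming scrapyd runs on
# localhost:6800 and "myproject"/"myspider" (placeholder names) are deployed.
import requests

SCRAPYD = "http://localhost:6800"

# Schedule a crawl; scrapyd responds with a job id.
resp = requests.post(f"{SCRAPYD}/schedule.json",
                     data={"project": "myproject", "spider": "myspider"})
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}

# List pending/running/finished jobs for the project.
jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                    params={"project": "myproject"}).json()
print(len(jobs["running"]), "job(s) running")
```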

cheers


lookfwd commented on August 14, 2024

@inkrement - thank you so much! I'm so glad you like the book :)

One thing I would recommend is talking directly to @pablohoffman. Scrapinghub might be able to provide you with a licence, code or just the right direction to have exactly the system you need.

"install, manage, configure and monitor my scrapers"

All but monitoring on that list are actually very close to what scrapyd (as @yssoe says) and/or generic infrastructure tools like Chef, Vagrant or Docker provide (relevant tools: 1, 2, 3). For monitoring, indeed, I'm not aware of anything strong. The section named "Creating our custom monitoring command" in Chapter 11 gives some clues on how easy it is to implement such functionality. It's all REST + JSON, and it should be easy and cost-effective to contract someone on Upwork to develop something that fits your needs exactly, and potentially open-source it as well. There is indeed a gap.
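
For a flavour of what such a monitor could look like, here is a rough sketch that polls scrapyd's JSON API and summarises job counts per project. The host and the plain print reporting are assumptions for illustration, not anything from the book.

```python
# Rough sketch of a scrapyd monitor: poll the JSON API and summarise
# job counts per project. Host and reporting are illustrative only.
import requests

SCRAPYD = "http://localhost:6800"

def job_counts():
    """Return {project: {"pending": n, "running": n, "finished": n}}."""
    projects = requests.get(f"{SCRAPYD}/listprojects.json").json()["projects"]
    summary = {}
    for project in projects:
        jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                            params={"project": project}).json()
        summary[project] = {state: len(jobs.get(state, []))
                            for state in ("pending", "running", "finished")}
    return summary

if __name__ == "__main__":
    for project, counts in job_counts().items():
        print(f"{project}: {counts}")
```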


pablohoffman commented on August 14, 2024

Hi @inkrement, we have no plans to provide a self-hosted version of Scrapinghub, simply because it's too much work to maintain a separate appliance version of our platform (we're a small team!) and we've yet to find: 1. a customer our infrastructure can't accommodate, and 2. a customer that is willing to sponsor its development (we're talking north of a couple hundred grand).

I'm curious to understand what your concerns are with regard to running your spiders on Scrapinghub. Would you have the same concerns regarding, say, hosting your web app on Heroku or your code on GitHub? Thanks in advance for your insights!


inkrement commented on August 14, 2024

@yssoe Thanks for your input. Scrapyd looks very promising, I'll take a look at it!

@lookfwd Oh, nice - I skipped that chapter back then, but I will read it. Maybe I will code something myself; I studied software engineering, so that should not be a problem, but I was hoping there were already some existing tools.

@pablohoffman I have no concerns, and I would love to use Scrapinghub, but I work for a university and we have our own servers. If I pay for external infrastructure or services, I have to justify why I am not using our own hardware, and that's the only reason against it. That's not easy, especially because usability is not really a convincing argument for them.


pablohoffman commented on August 14, 2024

@inkrement Thanks for clarifying; I would love to continue the chat offline. You can reach me at pablo in scrapinghub.com.

