andi's Introduction

Scrapinghub command line client

shub is the Scrapinghub command line client. It allows you to deploy projects or dependencies, schedule spiders, and retrieve scraped data or logs without leaving the command line.

Requirements

  • Python >= 3.6

Installation

If you have pip installed on your system, you can install shub from the Python Package Index:

pip install shub

Please note that if you are using Python < 3.6, you should pin shub to 2.13.0 or lower.

We also supply stand-alone binaries. You can find them in our latest GitHub release.

Documentation

Documentation is available online via Read the Docs: https://shub.readthedocs.io/, or in the docs directory.

andi's People

Contributors

gallaecio, ivanprado, kmike, wrar

andi's Issues

andi does not support functools.partial

I'm trying to use scrapy-po, which relies on andi, with a callback provided by functools.partial, but I'm facing the following traceback.

andi internally calls typing.get_type_hints to inspect the target and build the andi.Plan, but this standard-library function rejects anything that "is not a module, class, method, or function", and a functools.partial object is none of these.

Traceback (most recent call last):
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 51, in process_response
    response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/victortorres/git/shub/scrapy-po/scrapy_po/middleware.py", line 30, in process_response
    plan, provider_instances = build_plan(callback, response)
  File "/Users/victortorres/git/shub/scrapy-po/scrapy_po/middleware.py", line 53, in build_plan
    externally_provided=provider_instances.keys()
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/andi/andi.py", line 264, in plan
    dependency_stack=None)
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/andi/andi.py", line 294, in _plan
    arguments = inspect(class_or_func)
  File "/Users/victortorres/.virtualenvs/scrapy-po/lib/python3.7/site-packages/andi/andi.py", line 24, in inspect
    annotations = get_type_hints(func, globalns)
  File "/Users/victortorres/miniconda3/lib/python3.7/typing.py", line 993, in get_type_hints
    'or function.'.format(obj))
TypeError: functools.partial(<bound method PartialCallbackSpider.parse of <PartialCallbackSpider 'test_spider' at 0x10f487dd8>>, use_x=True) is not a module, class, method, or function.

Example spider:

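import functools

import scrapy


class MySpider(scrapy.Spider):
    name = 'test_spider'

    def start_requests(self):
        # The callback is a functools.partial wrapping the bound method self.parse
        yield scrapy.Request('http://example.com/', functools.partial(self.parse, use_x=True))

    def parse(self, response, use_x):
        return {
            'url': response.url,
            'use_x': use_x,
        }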

I've discussed this behavior with @kmike and he asked me to report the bug here.
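
One possible workaround, sketched here under assumptions, is to unwrap the partial before asking for type hints. The helper name hints_for_callable is hypothetical and is not part of andi or scrapy-po. It only tracks keyword arguments bound by the partial; positionally bound arguments are ignored in this sketch.

```python
import functools
from typing import get_type_hints


def hints_for_callable(func):
    """Return type hints for func, looking through functools.partial layers.

    Hypothetical helper: arguments already bound by keyword in a partial
    are dropped from the result, since the caller no longer supplies them.
    """
    bound = set()
    # A partial may wrap another partial, so unwrap in a loop.
    while isinstance(func, functools.partial):
        bound.update(func.keywords)
        func = func.func
    hints = get_type_hints(func)
    return {name: tp for name, tp in hints.items() if name not in bound}


def parse(response: str, use_x: bool) -> dict:
    return {'url': response, 'use_x': use_x}


callback = functools.partial(parse, use_x=True)
remaining = hints_for_callable(callback)
```

Here remaining keeps the hint for response and the return type, while use_x is left out because the partial already provides it.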

Can build functions be simplified?

Currently, the andi docs suggest this build function:

def build(plan):
    instances = {}
    for fn_or_cls, kwargs_spec in plan:
        if isinstance(fn_or_cls, CustomBuilder):
            instances[fn_or_cls.result_class_or_fn] = fn_or_cls.factory(**kwargs_spec.kwargs(instances))
        else:
            instances[fn_or_cls] = fn_or_cls(**kwargs_spec.kwargs(instances))
    return instances

It seems possible to simplify it to

def build(plan):
    instances = {}
    for builder, kwargs_spec in plan:
        instances[builder.result_class_or_fn] = builder.factory(**kwargs_spec.kwargs(instances))
    return instances

if we wrap every step of the plan in a builder.
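
As a rough sketch of that idea, the wrapping could happen in one pass over the plan. The Builder and KwargsSpec classes below are hypothetical stand-ins written for this example so that it runs without andi installed; only the attribute names result_class_or_fn, factory and kwargs() are taken from the snippets above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Builder:
    """Hypothetical stand-in for a builder step (not andi's own class)."""
    result_class_or_fn: Any
    factory: Callable


class KwargsSpec(dict):
    """Hypothetical stand-in: maps argument names to the classes to inject."""

    def kwargs(self, instances):
        return {name: instances[cls] for name, cls in self.items()}


def wrap_steps(plan):
    """Wrap bare functions/classes so every step in the plan is a Builder."""
    return [
        (step if isinstance(step, Builder) else Builder(step, step), spec)
        for step, spec in plan
    ]


def build(plan):
    """The simplified build loop: no isinstance check is needed any more."""
    instances = {}
    for builder, kwargs_spec in plan:
        instances[builder.result_class_or_fn] = builder.factory(
            **kwargs_spec.kwargs(instances)
        )
    return instances


class Valves:
    pass


class Engine:
    def __init__(self, valves: Valves):
        self.valves = valves


plan = [(Valves, KwargsSpec()), (Engine, KwargsSpec(valves=Valves))]
instances = build(wrap_steps(plan))
```

A bare class is wrapped as a builder whose factory is the class itself, so the result is keyed by the same class as before.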

Update `andi.plan` to support a function/class signature in `dict`

Currently, andi.plan() takes a function or class, which requires its signature to be defined ahead of time:

class Valves:
    pass

class Engine:
    def __init__(self, valves: Valves):
        self.valves = valves

class Wheels:
    pass

class Car:
    def __init__(self, engine: Engine, wheels: Wheels):
        self.engine = engine
        self.wheels = wheels

However, there are use cases where we want to create different types of cars, so the car's signature is dynamic and only known at runtime.

For example, at runtime the car's wheels could be electric, or they could be replaced with tank treads. We could solve this by defining an ElectricCar or a TankCar. Yet this doesn't cover all the other permutations of car types, especially when the signature changes because an arbitrary number of attachments is added:

  • engine: Engine, wheels: Wheels, top_carrier: RoofCarrier
  • engine: Engine, wheels: Wheels, top_carrier: BikeRack
  • engine: Engine, wheels: Wheels, back_carrier: BikeRack, top_carrier: RoofCarrier

We could list all of the possible dependencies in the signature, but that would be inefficient: fulfilling all of them takes effort, and in the end only a handful would be used.

I'm proposing to update the API to allow such arbitrary signatures to be used. This would make something like the following possible:

from typing import Any, Dict

def get_blueprint(customer_request):
    results: Dict[str, Any] = {}
    # read_request() is assumed to yield the dependency classes for this request
    for i, dependency in enumerate(read_request(customer_request), start=1):
        results[f"arg_{i}"] = dependency

    return results  # something like {"arg_1": Engine, "arg_2": Wheels, "arg_3": BikeRack}

signature = get_blueprint(customer_request)

plan = andi.plan(
    signature,
    is_injectable=is_injectable,
    externally_provided=externally_provided,
)

andi.plan() largely remains the same except that it now supports a mapping representing any arbitrary function/class signature.
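
Until such an API exists, a mapping can be turned into a callable that carries the same signature. The sketch below is an assumption-laden illustration: callable_from_signature is a hypothetical helper, and whether andi.plan() accepts the resulting object has not been verified here. It only relies on the standard library's inspect and typing modules.

```python
import inspect
from typing import Any, Dict, get_type_hints


def callable_from_signature(spec: Dict[str, Any]):
    """Build a function whose keyword-only parameters mirror spec.

    Hypothetical helper: the returned function simply hands back the
    keyword arguments it was called with.
    """
    def factory(**kwargs):
        return kwargs

    # Expose the mapping both as annotations and as an explicit signature,
    # so introspection sees the named, typed parameters instead of **kwargs.
    factory.__annotations__ = dict(spec)
    factory.__signature__ = inspect.Signature([
        inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY, annotation=tp)
        for name, tp in spec.items()
    ])
    return factory


class Engine:
    pass


class Wheels:
    pass


car_factory = callable_from_signature({"arg_1": Engine, "arg_2": Wheels})
hints = get_type_hints(car_factory)
params = list(inspect.signature(car_factory).parameters)
```

The keys of the mapping must be valid Python identifiers, because inspect.Parameter rejects any other name.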
