risjbot's People

Contributors

dependabot[bot], pmyteh

risjbot's Issues

How to run?

Hi,
I've downloaded your project, but I don't know how to run it. Could you please help?
Thanks in advance :)

usatoday.py is raising an exception and failing to parse many pages

2020-01-17 11:41:01 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.usatoday.com/story/sports/sports-betting/2020/01/17/anaheim-ducks-at-carolina-hurricanes-odds-picks-and-best-bets/41014307/> (referer: https://www.usatoday.com/news-sitemap.xml)
Traceback (most recent call last):
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/processors.py", line 59, in __call__
    value = func(value)
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/processors.py", line 107, in __call__
    return self.separator.join(values)
  File "/mnt/data1/NLP/RISJbot-master/RISJbot/loaders.py", line 42, in _strip_strl
    yield s.strip()
AttributeError: 'int' object has no attribute 'strip'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/__init__.py", line 159, in _process_input_value
    return proc(value)
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/processors.py", line 63, in __call__
    (str(func), value, type(e).__name__, str(e)))
ValueError: Error in Compose with <scrapy.loader.processors.Join object at 0x7f4bf64d1dd0> value=<generator object _strip_strl at 0x7f4be104a8d0> error='AttributeError: 'int' object has no attribute 'strip''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/mnt/data1/NLP/RISJbot-master/RISJbot/spiders/newssitemapspider.py", line 29, in parse
    return self.parse_page(response)
  File "/mnt/data1/NLP/RISJbot-master/RISJbot/spiders/us/usatoday.py", line 60, in parse_page
    l.add_schemaorg(response)
  File "/mnt/data1/NLP/RISJbot-master/RISJbot/loaders.py", line 171, in add_schemaorg
    rdfa=False)
  File "/mnt/data1/NLP/RISJbot-master/RISJbot/loaders.py", line 184, in add_schemaorg_mde
    self.add_value('keywords',     data.get('keywords'))
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/__init__.py", line 78, in add_value
    self._add_value(field_name, value)
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/__init__.py", line 92, in _add_value
    processed_value = self._process_input_value(field_name, value)
  File "/home/euler/miniconda3/envs/pytorch/lib/python3.7/site-packages/scrapy/loader/__init__.py", line 164, in _process_input_value
    value, type(e).__name__, str(e)))
ValueError: Error with input processor Compose: field='keywords' value=[2] error='ValueError: Error in Compose with <scrapy.loader.processors.Join object at 0x7f4bf64d1dd0> value=<generator object _strip_strl at 0x7f4be104a8d0> error='AttributeError: 'int' object has no attribute 'strip'''
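
The traceback above ends with `_strip_strl` calling `s.strip()` on an integer (the `keywords` value was `[2]`). One possible workaround is a generator that tolerates non-string items. This is a minimal sketch, not the project's code: `strip_strl_tolerant` is a hypothetical name, and it assumes that converting non-string values to `str` and dropping `None` or empty values is acceptable for this field.

```python
from typing import Any, Iterable, Iterator


def strip_strl_tolerant(values: Iterable[Any]) -> Iterator[str]:
    """Yield stripped strings, tolerating non-string items such as ints."""
    for value in values:
        if value is None:
            continue  # assumption: missing values are simply dropped
        # Assumption: non-string values (e.g. the int 2) are kept as text.
        text = value if isinstance(value, str) else str(value)
        text = text.strip()
        if text:  # skip strings that are empty after stripping
            yield text


if __name__ == "__main__":
    # Mixed input similar to the failing case: strings plus an int.
    print(", ".join(strip_strl_tolerant([" sports ", 2, None, "  "])))
```

Because the function still yields only strings, a later join step such as the one shown in the traceback would receive values it can concatenate.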

import error

from .aws_credentials import *
I ran the command scrapy crawl yahoo, but it gave me this error:

ModuleNotFoundError: No module named 'RISJbot.aws_credentials'

My system is Windows 10, with Python 3.7.
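
The error suggests that a module named `aws_credentials` is expected inside the `RISJbot` package but is not present in the download. If that module is an optional, locally created file (an assumption; the issue does not say what it should contain), the import could be made tolerant of its absence. A minimal sketch, with `load_optional_settings` as a hypothetical helper:

```python
import importlib
from typing import Any, Dict


def load_optional_settings(module_name: str) -> Dict[str, Any]:
    """Return the upper-case names defined in a module, or {} if it is missing."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Assumption: a missing credentials module just means "no overrides".
        return {}
    return {name: value for name, value in vars(module).items() if name.isupper()}


# Hypothetical usage inside a settings module, replacing the star import:
# globals().update(load_optional_settings("RISJbot.aws_credentials"))

if __name__ == "__main__":
    print(load_optional_settings("no_such_module_for_demo"))
```

Alternatively, creating an empty `aws_credentials.py` next to the file that imports it may be enough to get past the error, provided nothing later depends on the values it would define.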

Spider not found

I keep getting a key error, "Spider not found: CNN", when I run scrapy crawl cnn, and the same happens for any other news website. What directory am I supposed to run that in? The README is very vague.
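
Scrapy project commands such as `scrapy crawl` are run from inside the project, that is, the directory containing `scrapy.cfg` or one of its subdirectories. The sketch below locates that directory by walking upwards from a starting path; `find_scrapy_project_root` is a hypothetical helper, and it assumes the repository ships a `scrapy.cfg` file.

```python
from pathlib import Path
from typing import Optional


def find_scrapy_project_root(start: Path) -> Optional[Path]:
    """Walk upwards from `start`; return the first directory holding scrapy.cfg."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None  # no scrapy.cfg found in this directory or any parent


if __name__ == "__main__":
    root = find_scrapy_project_root(Path.cwd())
    print(root if root else "Not inside a Scrapy project")
```

Once inside the project, `scrapy list` prints the registered spider names. Names are matched exactly, so the name passed to `scrapy crawl` has to match one of them, including its case.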
