Comments (9)

codelucas commented on May 18, 2024

After this revamp is done, the only two updates I see on the immediate horizon for this library are making category extraction better and finally adding a publishing date extraction feature.

from newspaper.

julianofischer commented on May 18, 2024

Wow, automatic language detection will be amazing.
=)

codelucas commented on May 18, 2024

OK, the update is now out. Update your pip packages and post any errors you guys find in this thread!!

The functionality works as far as I know, but I only added barebones test cases to our testing suite. WE DESPERATELY need more test cases lol.

julianofischer commented on May 18, 2024

Hi,
I tried to update using "sudo pip install newspaper --upgrade" and I got an error.

The entire log is below:

Downloading/unpacking newspaper from https://pypi.python.org/packages/source/n/newspaper/newspaper-0.0.5.tar.gz#md5=f70140b081028c9b272098df725f354b
  Downloading newspaper-0.0.5.tar.gz (7.7MB): 7.7MB downloaded
  Running setup.py egg_info for package newspaper


    package init file 'newspaper/data/__init__.py' not found (or not a regular file)
Downloading/unpacking lxml from https://pypi.python.org/packages/source/l/lxml/lxml-3.2.5.tar.gz#md5=6c4fb9b1840631cff09b8229a12a9ef7 (from newspaper)
  Downloading lxml-3.2.5.tar.gz (3.3MB): 3.3MB downloaded
  Running setup.py egg_info for package lxml
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
    Building lxml version 3.2.5.
    Building without Cython.
    ERROR: /bin/sh: 1: xslt-config: not found

    ** make sure the development packages of libxml2 and libxslt are installed **

    Using build configuration of libxslt

    warning: no previously-included files found matching '*.py'
    warning: no files found matching '*.txt' under directory 'src/lxml/tests'
Downloading/unpacking requests from https://pypi.python.org/packages/source/r/requests/requests-2.1.0.tar.gz#md5=28543001831f46b1ff40686ebc027deb (from newspaper)
  Downloading requests-2.1.0.tar.gz (420kB): 420kB downloaded
  Running setup.py egg_info for package requests

Requirement already up-to-date: nltk in /usr/local/lib/python2.7/dist-packages (from newspaper)
Downloading/unpacking Pillow from https://pypi.python.org/packages/source/P/Pillow/Pillow-2.3.0.zip#md5=56b6614499aacb7d6b5983c4914daea7 (from newspaper)
  Downloading Pillow-2.3.0.zip (2.4MB): 2.4MB downloaded
  Running setup.py egg_info for package Pillow

Requirement already up-to-date: cssselect in /usr/local/lib/python2.7/dist-packages (from newspaper)
Requirement already up-to-date: BeautifulSoup in /usr/lib/python2.7/dist-packages (from newspaper)
Requirement already up-to-date: PyYAML>=3.09 in /usr/local/lib/python2.7/dist-packages (from nltk->newspaper)
Installing collected packages: newspaper, lxml, requests, Pillow
  Found existing installation: newspaper 0.0.4
    Uninstalling newspaper:
      Successfully uninstalled newspaper
  Running setup.py install for newspaper

    package init file 'newspaper/data/__init__.py' not found (or not a regular file)

  Found existing installation: lxml 3.2.0
    Uninstalling lxml:
      Successfully uninstalled lxml
  Running setup.py install for lxml
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
    Building lxml version 3.2.5.
    Building without Cython.
    ERROR: /bin/sh: 1: xslt-config: not found

    ** make sure the development packages of libxml2 and libxslt are installed **

    Using build configuration of libxslt
    building 'lxml.etree' extension
    i686-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip_build_root/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-i686-2.7/src/lxml/lxml.etree.o
    src/lxml/lxml.etree.c:8:22: fatal error: pyconfig.h: Arquivo ou diretório não encontrado
     #include "pyconfig.h"
                          ^
    compilation terminated.
    error: command 'i686-linux-gnu-gcc' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/pip_build_root/lxml/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-hoqnZK-record/install-record.txt --single-version-externally-managed:
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'

  warnings.warn(msg)

Building lxml version 3.2.5.

Building without Cython.

ERROR: /bin/sh: 1: xslt-config: not found



** make sure the development packages of libxml2 and libxslt are installed **



Using build configuration of libxslt

running install

running build

running build_py

creating build

creating build/lib.linux-i686-2.7

creating build/lib.linux-i686-2.7/lxml

copying src/lxml/builder.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/cssselect.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/usedoctest.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/ElementInclude.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/pyclasslookup.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/sax.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/doctestcompare.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/_elementpath.py -> build/lib.linux-i686-2.7/lxml

copying src/lxml/__init__.py -> build/lib.linux-i686-2.7/lxml

creating build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/__init__.py -> build/lib.linux-i686-2.7/lxml/includes

creating build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/_html5builder.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/html5parser.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/builder.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/formfill.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/ElementSoup.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/usedoctest.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/defs.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/diff.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/_setmixin.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/clean.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/_diffcommand.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/soupparser.py -> build/lib.linux-i686-2.7/lxml/html

copying src/lxml/html/__init__.py -> build/lib.linux-i686-2.7/lxml/html

creating build/lib.linux-i686-2.7/lxml/isoschematron

copying src/lxml/isoschematron/__init__.py -> build/lib.linux-i686-2.7/lxml/isoschematron

copying src/lxml/lxml.etree.h -> build/lib.linux-i686-2.7/lxml

copying src/lxml/lxml.etree_api.h -> build/lib.linux-i686-2.7/lxml

copying src/lxml/includes/c14n.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/xmlerror.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/htmlparser.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/dtdvalid.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/xmlschema.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/schematron.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/uri.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/etreepublic.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/tree.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/xinclude.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/xslt.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/relaxng.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/xmlparser.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/config.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/xpath.pxd -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/lxml-version.h -> build/lib.linux-i686-2.7/lxml/includes

copying src/lxml/includes/etree_defs.h -> build/lib.linux-i686-2.7/lxml/includes

creating build/lib.linux-i686-2.7/lxml/isoschematron/resources

creating build/lib.linux-i686-2.7/lxml/isoschematron/resources/rng

copying src/lxml/isoschematron/resources/rng/iso-schematron.rng -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/rng

creating build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl

copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl

copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl

creating build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

running build_ext

building 'lxml.etree' extension

creating build/temp.linux-i686-2.7

creating build/temp.linux-i686-2.7/src

creating build/temp.linux-i686-2.7/src/lxml

i686-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip_build_root/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-i686-2.7/src/lxml/lxml.etree.o

src/lxml/lxml.etree.c:8:22: fatal error: pyconfig.h: Arquivo ou diretório não encontrado

 #include "pyconfig.h"

                      ^

compilation terminated.

error: command 'i686-linux-gnu-gcc' failed with exit status 1

----------------------------------------
  Rolling back uninstall of lxml
Cleaning up...
Command /usr/bin/python -c "import setuptools;__file__='/tmp/pip_build_root/lxml/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-hoqnZK-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip_build_root/lxml
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
  File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 148, in main
    return command.main(args[1:], options)
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 169, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 

codelucas commented on May 18, 2024

I tried to reproduce the pip upgrade, but this was my log (it worked on Ubuntu in a virtualenv):

lukeuser@laguna:~/labs/python_labs/newspaper_env$ source bin/activate
(newspaper_env)lukeuser@laguna:~/labs/python_labs/newspaper_env$ pip install newspaper --upgrade
Downloading/unpacking newspaper from https://pypi.python.org/packages/source/n/newspaper/newspaper-0.0.5.tar.gz#md5=f70140b081028c9b272098df725f354b
  Downloading newspaper-0.0.5.tar.gz (7.7MB): 7.7MB downloaded
  Running setup.py (path:/home/lukeuser/labs/python_labs/newspaper_env/build/newspaper/setup.py) egg_info for package newspaper


    package init file 'newspaper/data/__init__.py' not found (or not a regular file)
Requirement already up-to-date: lxml in ./lib/python2.7/site-packages/lxml-3.3.0beta3-py2.7-linux-x86_64.egg (from newspaper)
Requirement already up-to-date: requests in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: nltk in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: Pillow in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: cssselect in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: BeautifulSoup in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: PyYAML>=3.09 in ./lib/python2.7/site-packages (from nltk->newspaper)
Installing collected packages: newspaper
  Found existing installation: newspaper 0.0.4
    Uninstalling newspaper:
      Successfully uninstalled newspaper
  Running setup.py install for newspaper

    package init file 'newspaper/data/__init__.py' not found (or not a regular file)

Successfully installed newspaper
Cleaning up...
(newspaper_env)lukeuser@laguna:~/labs/python_labs/newspaper_env$

Can I suggest you remove your newspaper completely then try installing from scratch?

The error in your re-installation looks like it is from text = '\n'.join(complete_log)
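
For context, a traceback that ends at that join is consistent with a text-encoding failure in the log handling: the compiler output in the log contains non-ASCII characters (the localized gcc message), and Python 2 implicitly decodes byte strings as ASCII when they are joined with unicode strings. The snippet below is a minimal Python 3 sketch of that mechanism, assuming the log line was held as UTF-8 bytes. It is not pip's actual code, and `decode_ascii` is a hypothetical helper.

```python
# Sketch only: reproduces the kind of error seen at the end of the log.
# Assumption: the log line was stored as UTF-8 encoded bytes.
line = "fatal error: pyconfig.h: Arquivo ou diretório não encontrado"
raw = line.encode("utf-8")


def decode_ascii(data):
    """Hypothetical helper: try an ASCII decode, return (text, error)."""
    try:
        return data.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, err = decode_ascii(raw)
print(type(err).__name__)  # the ASCII decode fails on the accented characters

# Decoding with the right codec (or with errors="replace") avoids the crash.
safe = raw.decode("utf-8", errors="replace")
print(safe)
```

Note that this only explains the final traceback. The build failure further up in the log (the missing xslt-config and pyconfig.h) is a separate problem.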

codelucas commented on May 18, 2024

Try running: sudo easy_install lxml. Then pip install newspaper. It looks like all your required packages are intact, but for some reason pip is re-installing lxml. Are you using Ubuntu by any chance? Read this:

http://newspaper.readthedocs.org/en/latest/user_guide/install.html
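
The two concrete failures in the log above are the missing xslt-config tool and the missing pyconfig.h header, both of which lxml needs when it compiles from source. As a rough sketch, a log like this can be scanned for those signatures. The package names below are assumptions for Debian/Ubuntu with Python 2.7, and `HINTS` and `missing_packages` are hypothetical names, not part of pip or newspaper.

```python
import re

# Assumed mapping from error signatures in the log to Debian/Ubuntu
# development packages (names may differ on other distributions).
HINTS = [
    (re.compile(r"xslt-config: not found"), "libxslt1-dev"),
    (re.compile(r"development packages of libxml2 and libxslt"), "libxml2-dev"),
    (re.compile(r"fatal error: pyconfig\.h"), "python-dev"),
]


def missing_packages(log_text):
    """Return the suggested packages, in HINTS order, without duplicates."""
    found = []
    for pattern, package in HINTS:
        if pattern.search(log_text) and package not in found:
            found.append(package)
    return found


# Three lines copied from the failing log in this thread.
sample_log = """\
    ERROR: /bin/sh: 1: xslt-config: not found
    ** make sure the development packages of libxml2 and libxslt are installed **
    src/lxml/lxml.etree.c:8:22: fatal error: pyconfig.h: Arquivo ou diretório não encontrado
"""

packages = missing_packages(sample_log)
if packages:
    print("sudo apt-get install " + " ".join(packages))
```

Installing the suggested packages and then re-running the pip upgrade is one thing to try before reinstalling everything from scratch.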

julianofischer commented on May 18, 2024

The fix did not work.
I reinstalled and it seemingly worked.
=)

Thank you!

codelucas commented on May 18, 2024

Wooh! props 👍

codelucas commented on May 18, 2024

I'm closing this issue. Please reopen if you have any issues.
