Comments (9)
After this revamp is done, the only two updates I see on the immediate horizon for this library are making category extraction better and finally adding a publishing date extraction feature.
from newspaper.
Wow, automatic language detection will be amazing.
=)
from newspaper.
OK, the update is now out. Update your pip packages and post any errors you guys find in this thread!!
The functionality works as far as I know, but I only added bare-bones test cases to our test suite. WE DESPERATELY need more test cases lol.
from newspaper.
Hi,
I tried to update using "sudo pip install newspaper --upgrade" and I got an error.
The entire log is below:
Downloading/unpacking newspaper from https://pypi.python.org/packages/source/n/newspaper/newspaper-0.0.5.tar.gz#md5=f70140b081028c9b272098df725f354b
Downloading newspaper-0.0.5.tar.gz (7.7MB): 7.7MB downloaded
Running setup.py egg_info for package newspaper
package init file 'newspaper/data/__init__.py' not found (or not a regular file)
Downloading/unpacking lxml from https://pypi.python.org/packages/source/l/lxml/lxml-3.2.5.tar.gz#md5=6c4fb9b1840631cff09b8229a12a9ef7 (from newspaper)
Downloading lxml-3.2.5.tar.gz (3.3MB): 3.3MB downloaded
Running setup.py egg_info for package lxml
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.2.5.
Building without Cython.
ERROR: /bin/sh: 1: xslt-config: not found
** make sure the development packages of libxml2 and libxslt are installed **
Using build configuration of libxslt
warning: no previously-included files found matching '*.py'
warning: no files found matching '*.txt' under directory 'src/lxml/tests'
Downloading/unpacking requests from https://pypi.python.org/packages/source/r/requests/requests-2.1.0.tar.gz#md5=28543001831f46b1ff40686ebc027deb (from newspaper)
Downloading requests-2.1.0.tar.gz (420kB): 420kB downloaded
Running setup.py egg_info for package requests
Requirement already up-to-date: nltk in /usr/local/lib/python2.7/dist-packages (from newspaper)
Downloading/unpacking Pillow from https://pypi.python.org/packages/source/P/Pillow/Pillow-2.3.0.zip#md5=56b6614499aacb7d6b5983c4914daea7 (from newspaper)
Downloading Pillow-2.3.0.zip (2.4MB): 2.4MB downloaded
Running setup.py egg_info for package Pillow
Requirement already up-to-date: cssselect in /usr/local/lib/python2.7/dist-packages (from newspaper)
Requirement already up-to-date: BeautifulSoup in /usr/lib/python2.7/dist-packages (from newspaper)
Requirement already up-to-date: PyYAML>=3.09 in /usr/local/lib/python2.7/dist-packages (from nltk->newspaper)
Installing collected packages: newspaper, lxml, requests, Pillow
Found existing installation: newspaper 0.0.4
Uninstalling newspaper:
Successfully uninstalled newspaper
Running setup.py install for newspaper
package init file 'newspaper/data/__init__.py' not found (or not a regular file)
Found existing installation: lxml 3.2.0
Uninstalling lxml:
Successfully uninstalled lxml
Running setup.py install for lxml
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.2.5.
Building without Cython.
ERROR: /bin/sh: 1: xslt-config: not found
** make sure the development packages of libxml2 and libxslt are installed **
Using build configuration of libxslt
building 'lxml.etree' extension
i686-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip_build_root/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-i686-2.7/src/lxml/lxml.etree.o
src/lxml/lxml.etree.c:8:22: fatal error: pyconfig.h: Arquivo ou diretório não encontrado
#include "pyconfig.h"
^
compilation terminated.
error: command 'i686-linux-gnu-gcc' failed with exit status 1
Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/pip_build_root/lxml/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-hoqnZK-record/install-record.txt --single-version-externally-managed:
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.2.5.
Building without Cython.
ERROR: /bin/sh: 1: xslt-config: not found
** make sure the development packages of libxml2 and libxslt are installed **
Using build configuration of libxslt
running install
running build
running build_py
creating build
creating build/lib.linux-i686-2.7
creating build/lib.linux-i686-2.7/lxml
copying src/lxml/builder.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/cssselect.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/usedoctest.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/ElementInclude.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/pyclasslookup.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/sax.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/doctestcompare.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/_elementpath.py -> build/lib.linux-i686-2.7/lxml
copying src/lxml/__init__.py -> build/lib.linux-i686-2.7/lxml
creating build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/__init__.py -> build/lib.linux-i686-2.7/lxml/includes
creating build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/_html5builder.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/html5parser.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/builder.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/formfill.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/ElementSoup.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/usedoctest.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/defs.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/diff.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/_setmixin.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/clean.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/_diffcommand.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/soupparser.py -> build/lib.linux-i686-2.7/lxml/html
copying src/lxml/html/__init__.py -> build/lib.linux-i686-2.7/lxml/html
creating build/lib.linux-i686-2.7/lxml/isoschematron
copying src/lxml/isoschematron/__init__.py -> build/lib.linux-i686-2.7/lxml/isoschematron
copying src/lxml/lxml.etree.h -> build/lib.linux-i686-2.7/lxml
copying src/lxml/lxml.etree_api.h -> build/lib.linux-i686-2.7/lxml
copying src/lxml/includes/c14n.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/xmlerror.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/htmlparser.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/dtdvalid.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/xmlschema.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/schematron.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/uri.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/etreepublic.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/tree.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/xinclude.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/xslt.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/relaxng.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/xmlparser.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/config.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/xpath.pxd -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/lxml-version.h -> build/lib.linux-i686-2.7/lxml/includes
copying src/lxml/includes/etree_defs.h -> build/lib.linux-i686-2.7/lxml/includes
creating build/lib.linux-i686-2.7/lxml/isoschematron/resources
creating build/lib.linux-i686-2.7/lxml/isoschematron/resources/rng
copying src/lxml/isoschematron/resources/rng/iso-schematron.rng -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/rng
creating build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl
copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl
copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl
creating build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt -> build/lib.linux-i686-2.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
creating build/temp.linux-i686-2.7
creating build/temp.linux-i686-2.7/src
creating build/temp.linux-i686-2.7/src/lxml
i686-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip_build_root/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-i686-2.7/src/lxml/lxml.etree.o
src/lxml/lxml.etree.c:8:22: fatal error: pyconfig.h: Arquivo ou diretório não encontrado
#include "pyconfig.h"
^
compilation terminated.
error: command 'i686-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Rolling back uninstall of lxml
Cleaning up...
Command /usr/bin/python -c "import setuptools;__file__='/tmp/pip_build_root/lxml/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-hoqnZK-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip_build_root/lxml
Traceback (most recent call last):
File "/usr/bin/pip", line 9, in <module>
load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 148, in main
return command.main(args[1:], options)
File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 169, in main
text = '\n'.join(complete_log)
UnicodeDecodeError:
from newspaper.
I tried to reproduce the pip upgrade, and this was my log (it worked on Ubuntu in a virtualenv):
lukeuser@laguna:~/labs/python_labs/newspaper_env$ source bin/activate
(newspaper_env)lukeuser@laguna:~/labs/python_labs/newspaper_env$ pip install newspaper --upgrade
Downloading/unpacking newspaper from https://pypi.python.org/packages/source/n/newspaper/newspaper-0.0.5.tar.gz#md5=f70140b081028c9b272098df725f354b
Downloading newspaper-0.0.5.tar.gz (7.7MB): 7.7MB downloaded
Running setup.py (path:/home/lukeuser/labs/python_labs/newspaper_env/build/newspaper/setup.py) egg_info for package newspaper
package init file 'newspaper/data/__init__.py' not found (or not a regular file)
Requirement already up-to-date: lxml in ./lib/python2.7/site-packages/lxml-3.3.0beta3-py2.7-linux-x86_64.egg (from newspaper)
Requirement already up-to-date: requests in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: nltk in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: Pillow in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: cssselect in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: BeautifulSoup in ./lib/python2.7/site-packages (from newspaper)
Requirement already up-to-date: PyYAML>=3.09 in ./lib/python2.7/site-packages (from nltk->newspaper)
Installing collected packages: newspaper
Found existing installation: newspaper 0.0.4
Uninstalling newspaper:
Successfully uninstalled newspaper
Running setup.py install for newspaper
package init file 'newspaper/data/__init__.py' not found (or not a regular file)
Successfully installed newspaper
Cleaning up...
(newspaper_env)lukeuser@laguna:~/labs/python_labs/newspaper_env$
Can I suggest you remove newspaper completely and then try installing from scratch?
The error at the end of your reinstallation looks like it comes from the line text = '\n'.join(complete_log) in pip's basecommand.py, not from newspaper itself.
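For anyone curious why that join blows up, here is a rough sketch of what I think is going on. It assumes the gcc output reached pip as UTF-8 bytes and that Python 2 then tried to decode them as ASCII while joining the log. I have not traced pip's code, so treat this as a guess. It is written for Python 3, where the same decode has to be done explicitly.

```python
# Python 3 sketch of the suspected failure mode. pip 1.4.1 itself ran on
# Python 2.7, where the ASCII decode happens implicitly when byte strings
# and unicode strings are mixed in a join.

# Log line copied from the gcc output in the report above.
log_line = "fatal error: pyconfig.h: Arquivo ou diretório não encontrado"

# Assumption: the compiler emitted the message as UTF-8 bytes.
raw = log_line.encode("utf-8")


def decode_ascii(data):
    """Return (text, error) for a strict ASCII decode of data."""
    try:
        return data.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = decode_ascii(raw)
print(error)  # the non-ASCII bytes in the message cannot be decoded as ASCII

# Decoding with the matching codec (errors="replace" as a safety net)
# avoids the crash.
safe = raw.decode("utf-8", errors="replace")
print(safe)
```

If this is right, the UnicodeDecodeError is only pip failing to print its own log. The real failure is the lxml build further up.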
from newspaper.
Try running "sudo easy_install lxml", then "pip install newspaper". It looks like all your required packages are intact, but for some reason pip is reinstalling lxml. Are you using Ubuntu by any chance? Read this:
http://newspaper.readthedocs.org/en/latest/user_guide/install.html
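Your log complains about two missing pieces: xslt-config and pyconfig.h. Below is a small sketch (Python 3) that checks whether they are visible on your machine before you retry. The function name check_lxml_build_prereqs is made up for this example, and the xml2-config check is my own addition since the log only hints at libxml2. On Ubuntu these usually come from the python-dev, libxml2-dev and libxslt1-dev packages, but verify the package names against the install guide.

```python
import os
import shutil
import sysconfig


def check_lxml_build_prereqs():
    """Report whether the pieces the lxml source build looked for are present.

    Hypothetical helper for illustration; not part of newspaper or lxml.
    """
    # Path where this interpreter expects its pyconfig.h header.
    config_h = sysconfig.get_config_h_filename()
    return {
        "pyconfig.h": os.path.isfile(config_h),
        # Helper scripts shipped with the libxslt / libxml2 development packages.
        "xslt-config": shutil.which("xslt-config") is not None,
        "xml2-config": shutil.which("xml2-config") is not None,
    }


if __name__ == "__main__":
    for name, found in check_lxml_build_prereqs().items():
        print("%-12s %s" % (name, "found" if found else "MISSING"))
```

Run it with the same interpreter you install into. Anything reported as MISSING points at a development package to install first.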
from newspaper.
The fix did not work.
I reinstalled, and it seemingly worked.
=)
Thank you!
from newspaper.
Wooh! props 👍
from newspaper.
I'm closing this issue. Please reopen if you have any issues.
from newspaper.