sethblack / python-seo-analyzer Goto Github PK

An SEO tool that analyzes the structure of a site, crawls the site, count words in the body of the site and warns of any technical SEO issues.

License: Other

Python 86.91% HTML 12.78% Dockerfile 0.31%

seo python python-seo-analyzer analyzer python3 python-3 search-engine seo-optimization seo-monitor technical-seo

python-seo-analyzer's Introduction

Python SEO Analyzer

An SEO tool that analyzes the structure of a site, crawls the site, counts words in the body of the site and warns of any technical SEO issues.

Requires Python 3.6+, BeautifulSoup4 and urllib3.

Installation

PIP

pip3 install pyseoanalyzer

Docker

docker run sethblack/python-seo-analyzer [ARGS ...]

Command-line Usage

If you run without a sitemap it will start crawling at the homepage.

seoanalyze http://www.domain.com/

Or you can specify the path to a sitmap to seed the urls to scan list.

seoanalyze http://www.domain.com/ --sitemap path/to/sitemap.xml

HTML output can be generated from the analysis instead of json.

seoanalyze http://www.domain.com/ --output-format html

API

The analyze function returns a dictionary with the results of the crawl.

from seoanalyzer import analyze

output = analyze(site, sitemap)

print(output)

In order to analyze heading tags (h1-h6) and other extra additional tags as well, the following options can be passed to the analyze function

from seoanalyzer import analyze

output = analyze(site, sitemap, analyze_headings=True, analyze_extra_tags=True)

print(output)

By default, the analyze function analyzes all the existing inner links as well, which might be time consuming. This default behaviour can be changed to analyze only the provided URL by passing the following option to the analyze function

from seoanalyzer import analyze

output = analyze(site, sitemap, follow_links=False)

print(output)

Alternatively, you can run the analysis as a script from the seoanalyzer folder.

python -m seoanalyzer https://www.sethserver.com/ -f html > results.html

Notes

If you get requests.exceptions.SSLError at either the command-line or via the python-API, try using:

http://www.foo.bar

instead of..

https://www.foo.bar

python-seo-analyzer's People

Contributors

Stargazers

Watchers

Forkers

nickgzzjr maruthiprithivi b-rich bengeek88 gabouh amitgandhinz trevorfox speculator55005 ndrbrt nazim cash2one dlee325 enile8 pcoder optionalg sudhi10 fjk890 rocketoc tvignesh49 miklochon sasili-adetunji rtruxal juandisay quintendewilde bianc0niglio menudoproblema dhanunjaya-tokala forvendettaw itwonders chinopuente llazzaro ayeedshaikh andrewisidoro arunjoseph unclepeach z3r0h0ur666 prasb tdhungit stungkit seo-china rybo4490 ibadr3 beaupedraza dking3876 hamedkhosrawi fikachu mormukut11 theheartofraven iskysir rohitreddygaddam azzenabidi pablopda gustavoarmoa viribi sariogonfer alirezarezvani selmiyoussef abhishek-ag2000 mysticaltech josephachacha myke47 vanpitkinobi buscamais moralexgg yashodhank wgb128 yaroschak yesblogger jhgjhtuytdfbnfvmnbgjtuydt legendri kamalsharma9392 jyotikarajpal zldoty ravishekharco chavarera umohpyro iasad1 mindaugasvaitkus2 paqman85 imne-hh afcarl perryyo ronknight nagappankv namedevelopers agilulfe chiewxia ksheh1326 paulseperformance poeblu gasbarroni8 j-stone toscan6 rougeredwired mewt jhyoung09 hvaandres benoss coreymakes skylinemarketing

python-seo-analyzer's Issues

feat: open graph property support

open graph properties aid in SEO, so it would be good to scan a page to see if it supports it.

reference: http://itprism.com/blog/19-opengraph-microformats-semantic

404 Audit

As a user I'd like to see a 404 (broken link) audit.

Dictionary is not really usable

I am trying to Use the API Version and I get a really long Dictionary Returned. How do I convert that to a CSV or Html

NameError: name 'k' is not defined

Hi I get this error on my site (other sites work fine)

  File "seo.py", line 49, in __init__
    output = analyze(site, sitemap)
  File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/analyzer.py", line 546, in analyze
    pg.analyze()
  File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/analyzer.py", line 218, in analyze
    self.populate(soup_lower)
  File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/analyzer.py", line 171, in populate
    k))
NameError: name 'k' is not defined

my code

site = "https://vandersluijs.nl"
sitemap = "https://vandersluijs.nl/sitemap_index.xml"
output = analyze(site, sitemap) #line gives error
print(str(output)

ImportError: cannot import name 'PoolManager' from 'urllib3'

Describe the bug
Not working ...

Expected behavior
still not working after fixinf urllib3

A clear and concise description of what you expected to happen.
python analyzer.py https://www.sethserver.com/ -f html > results.html
Traceback (most recent call last):
File "analyzer.py", line 5, in
from seoanalyzer.website import Website
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer_init_.py", line 3, in
from .analyzer import analyze
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer\analyzer.py", line 5, in
from seoanalyzer.website import Website
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer\website.py", line 8, in
from seoanalyzer.http import http
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer\http.py", line 2, in
import urllib3
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3_init_.py", line 8, in
from .connectionpool import (
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 11, in
from .exceptions import (
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\exceptions.py", line 2, in
from .packages.six.moves.http_client import (
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 203, in load_module
mod = mod._resolve()
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 115, in _resolve
return _import_module(self.mod)
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 82, in import_module
import(name)
File "C:\Users\Manik Malhotra\Downloads\python-seo-analyzer-master (5)\python-seo-analyzer-master\seoanalyzer\http.py", line 2, in
from urllib3 import PoolManager
ImportError: cannot import name 'PoolManager' from 'urllib3' (C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3_init.py)

Desktop (please complete the following information):

OS:windows 10
Browser [ edge
Version 4.04

Keyword Analysis

Describe the bug
Words on keyword analysis list seem to have trailing characters missing.
Not always, but frequently enough that I noticed it.

To Reproduce
Steps to reproduce the behavior:

Run python/cli code
Navigate to keyword section of html output
Observe

add processing time

just for fun

'PoolManager' not found !!!!!

Describe the bug
After writing the command "python analyzer.py https://www.sethserver.com/ -f html > results.html" it got the error of pool manager not found.
how can i run this project

Expected behavior
to run the website and show answer in result.html

Screenshots
whole error

Traceback (most recent call last):
File "analyzer.py", line 5, in
from seoanalyzer.website import Website
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer_init_.py", line 3, in
from .analyzer import analyze
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer\analyzer.py", line 5, in
from seoanalyzer.website import Website
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer\website.py", line 8, in
from seoanalyzer.http import http
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\seoanalyzer\http.py", line 2, in
import urllib3
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3_init_.py", line 8, in
from .connectionpool import (
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 11, in
from .exceptions import (
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\exceptions.py", line 2, in
from .packages.six.moves.http_client import (
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 668, in _load_unlocked
File "", line 638, in _load_backward_compatible
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 203, in load_module
mod = mod._resolve()
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 115, in _resolve
return _import_module(self.mod)
File "C:\Users\Manik Malhotra\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 82, in _import_module
import(name)
File "F:\python-seo-analyzer-master (2)\python-seo-analyzer-master\seoanalyzer\http.py", line 18, in
http = Http()
File "F:\python-seo-analyzer-master (2)\python-seo-analyzer-master\seoanalyzer\http.py", line 8, in init
self.http = urllib3.PoolManager(
AttributeError: module 'urllib3' has no attribute 'PoolManager'

Desktop (please complete the following information):*

OS: [Windows]
Browser [microsoft edge, chrome]
Version [4.0.2]

Additional context
Add any other context about the problem here.

Errors on large sites?

Hey, great script and works great on small sites, but i guess i am having issues and not sure if this is related to the size of the site or special characters in there:

File "...programs\python\python38-32\lib\runpy.py", line 192, in _run_module_as_main
return run_code(code, main_globals, None,
File "...programs\python\python38-32\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "...Programs\Python\Python38-32\Scripts\seoanalyze.exe_main.py", line 7, in
File "...\appdata\local\programs\python\python38-32\lib\site-packages\seoanalyzer_main.py", line 35, in main
print(output_from_parsed_template)
File "...\appdata\local\programs\python\python38-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea5' in position 1179652: character maps to

Runing on localhost:port only crawls home page

I installed the library and every thing seems good i used python 3 on a Linux Ubuntu 16 machine
when i run on my local instance it only crawls the home page but when i run against my live site http://mysite.com it crawlies all pages.

Memory Leak with the API

When using python-seo-analyzer as an API the memory is never released
That's because

wordcount = {}
two_ngram = Counter()
three_ngram = Counter()
pages_crawled = []
pages_to_crawl = []
stem_to_word = {}
stemmer = nltk.stem.porter.PorterStemmer()
page_titles = []
page_descriptions = []

are defined globally in analyzer.py

This also prevent from running 2 different analyze() on 2 different sites with the same interpreter.

Suggestion create a Site class that would be instantiated at the beginning of analyze would hold the data and will be released at the end of the function.

Make Page object more independant

I know it is a bit more longshot, but do you think you could refactor the Page so that a Page object can get the full HTML as a parameter and skip the request.get() in analyze.

I have 2 real use cases for that:

I have a Django website and I'd like to do a report in the CMS when somebody preview a page. The page is not live yet, but in Django I have access to the rendered HTML. Sending it to the Page could give the writers good insights before an article goes live
One of my client has an Angular + API website. Because this is a full page app the HTML is just javascript and not returning anything. I can fetch those pages with Javascript rendering using Selenium and chrome driver then extract the DOM and send it to python-seo-analyzer

certificate verify failed

I tried to run this on Mac:

seoanalyze http://www.benefits.gov
Traceback (most recent call last):
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 444, in wrap_socket
cnx.do_handshake()
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1907, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1639, in _raise_ssl_error
_raise_current_error()
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 849, in validate_conn
conn.connect()
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 356, in connect
ssl_context=context)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/util/ssl.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 450, in wrap_socket
raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')])",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='ashbbcpsg03.usae.bah.com', port=4433): Max retries exceeded with url: /?cfru=aHR0cDovL3d3dy5iZW5lZml0cy5nb3Yv (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')])")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/syedahmed/anaconda3/bin/seoanalyze", line 11, in
sys.exit(main())
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/seoanalyzer/main.py", line 26, in main
output = analyze(args.site, args.sitemap)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/seoanalyzer/analyzer.py", line 622, in analyze
pg.analyze()
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/seoanalyzer/analyzer.py", line 243, in analyze
page = self.session.get(self.url)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 525, in get
return self.request('GET', url, **kwargs)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 644, in send
history = [resp for resp in gen] if allow_redirects else []
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 644, in
history = [resp for resp in gen] if allow_redirects else []
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 222, in resolve_redirects
**adapter_kwargs
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/Users/syedahmed/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 511, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='ashbbcpsg03.usae.bah.com', port=4433): Max retries exceeded with url: /?cfru=aHR0cDovL3d3dy5iZW5lZml0cy5nb3Yv (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')])")))

Duplicate Content Check

As a user I'd like to see warnings for duplicate content.

Scan only one page

Anyway to scan only 1 page? I tried to add only 1 url in a sitemap but it did not work.

When trying to build Dockerfile, we get error "Error: Please make sure the libxml2 and libxslt development packages are installed."

Describe the bug

Trying to build Dockerfile results in the error:

Error: Please make sure the libxml2 and libxslt development packages are installed.

To Reproduce

docker build --progress=plain .

#1 [internal] load build definition from Dockerfile
#1 sha256:ca1a3ea44b16c6c0ed281e17b8be63ee91165143750d4d5baaa06a6304293c42
#1 transferring dockerfile: 162B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:7104b95f2931ddf29b0d17d59efe21cbb4f726aaf604f361fa3536a5b63b18d2
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/python:3-alpine
#3 sha256:64ad20c706f0d0f73eed262cd381a08bc4d90b5012dcc541110fc95d1f8eb2f5
#3 DONE 4.4s

#6 [internal] load build context
#6 sha256:27c2c9293847f00a8c49645172e6dd67a9b6eba343a3171dbf00cabfada82b97
#6 transferring context: 240.96kB 0.0s done
#6 DONE 0.0s

#4 [1/4] FROM docker.io/library/python:3-alpine@sha256:7099d74f22c2d7a597875c3084e840846ca294ad01da1e845b0154100a6ac15b
#4 sha256:cc28dc005d2eca7ba4c0b767bbf501395dd4476d63bb3faf60c9816bd69f6432
#4 resolve docker.io/library/python:3-alpine@sha256:7099d74f22c2d7a597875c3084e840846ca294ad01da1e845b0154100a6ac15b done
#4 sha256:7099d74f22c2d7a597875c3084e840846ca294ad01da1e845b0154100a6ac15b 1.65kB / 1.65kB done
#4 sha256:052d3e34bb778210138247340bd82f18fc45fb506e89ff0ea6c182e3f98593a7 1.37kB / 1.37kB done
#4 sha256:60bc44358912526bc4bf5dd6199cd731e876893ddef8513dee02fed271c7bd05 7.08kB / 7.08kB done
#4 sha256:148d739a8e6b9342daa1f5b428d3a3c6118f340f21df28c16e06f918ef150147 0B / 2.71MB 0.1s
#4 sha256:582f36864e09c64d3e23d060e609187b7c702f079cd984b2be6b46e2a83b3c71 0B / 668.15kB 0.1s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 0B / 12.35MB 0.1s
#4 sha256:148d739a8e6b9342daa1f5b428d3a3c6118f340f21df28c16e06f918ef150147 2.71MB / 2.71MB 0.5s
#4 sha256:148d739a8e6b9342daa1f5b428d3a3c6118f340f21df28c16e06f918ef150147 2.71MB / 2.71MB 0.5s done
#4 sha256:582f36864e09c64d3e23d060e609187b7c702f079cd984b2be6b46e2a83b3c71 668.15kB / 668.15kB 0.6s done
#4 extracting sha256:148d739a8e6b9342daa1f5b428d3a3c6118f340f21df28c16e06f918ef150147 0.1s
#4 sha256:044058648f3a966c82116fe875c914c863c0a2cc7ea10ece1e75386197b8a7a0 0B / 234B 0.6s
#4 sha256:46eb628fede2974e3dee1a129a5f1906cb6d42eb6e981af3a34810567a60ecf6 0B / 2.87MB 0.6s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 1.05MB / 12.35MB 0.7s
#4 extracting sha256:148d739a8e6b9342daa1f5b428d3a3c6118f340f21df28c16e06f918ef150147 0.1s done
#4 extracting sha256:582f36864e09c64d3e23d060e609187b7c702f079cd984b2be6b46e2a83b3c71 0.1s
#4 extracting sha256:582f36864e09c64d3e23d060e609187b7c702f079cd984b2be6b46e2a83b3c71 0.1s done
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 2.10MB / 12.35MB 1.0s
#4 sha256:044058648f3a966c82116fe875c914c863c0a2cc7ea10ece1e75386197b8a7a0 234B / 234B 1.0s done
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 3.15MB / 12.35MB 1.1s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 4.19MB / 12.35MB 1.2s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 5.24MB / 12.35MB 1.4s
#4 sha256:46eb628fede2974e3dee1a129a5f1906cb6d42eb6e981af3a34810567a60ecf6 1.05MB / 2.87MB 1.4s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 6.29MB / 12.35MB 1.6s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 8.39MB / 12.35MB 1.9s
#4 sha256:46eb628fede2974e3dee1a129a5f1906cb6d42eb6e981af3a34810567a60ecf6 2.87MB / 2.87MB 1.9s done
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 9.44MB / 12.35MB 2.1s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 10.49MB / 12.35MB 2.3s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 11.53MB / 12.35MB 2.6s
#4 sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 12.35MB / 12.35MB 2.6s done
#4 extracting sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956
#4 extracting sha256:81f13c633245010441e09d30ba078b2753389ad7cc1428bfbc54f64eb3ce2956 0.3s done
#4 extracting sha256:044058648f3a966c82116fe875c914c863c0a2cc7ea10ece1e75386197b8a7a0
#4 extracting sha256:044058648f3a966c82116fe875c914c863c0a2cc7ea10ece1e75386197b8a7a0 done
#4 extracting sha256:46eb628fede2974e3dee1a129a5f1906cb6d42eb6e981af3a34810567a60ecf6 0.1s
#4 extracting sha256:46eb628fede2974e3dee1a129a5f1906cb6d42eb6e981af3a34810567a60ecf6 0.1s done
#4 DONE 3.3s

#5 [2/4] WORKDIR /app
#5 sha256:ab37595f8baa0fd3ceecc2248029ec5652d56972339c797855cecd0c7ae5ae3f
#5 DONE 0.1s

#7 [3/4] COPY . /app
#7 sha256:e478db86057002c42ee14550e93eb08a80719ee27f23b15c655a889a902da687
#7 DONE 0.0s

#8 [4/4] RUN python3 setup.py install
#8 sha256:7973305ca98aeb3f29f52989e8748fc8c0d15aaf7b9ea2fb09e36ffb02034fa8
#8 0.585 running install
#8 0.686 running bdist_egg
#8 0.686 running egg_info
#8 0.686 creating pyseoanalyzer.egg-info
#8 0.686 writing pyseoanalyzer.egg-info/PKG-INFO
#8 0.686 writing dependency_links to pyseoanalyzer.egg-info/dependency_links.txt
#8 0.687 writing entry points to pyseoanalyzer.egg-info/entry_points.txt
#8 0.687 writing requirements to pyseoanalyzer.egg-info/requires.txt
#8 0.687 writing top-level names to pyseoanalyzer.egg-info/top_level.txt
#8 0.687 writing manifest file 'pyseoanalyzer.egg-info/SOURCES.txt'
#8 0.691 reading manifest file 'pyseoanalyzer.egg-info/SOURCES.txt'
#8 0.691 reading manifest template 'MANIFEST.in'
#8 0.691 adding license file 'LICENSE'
#8 0.691 writing manifest file 'pyseoanalyzer.egg-info/SOURCES.txt'
#8 0.692 installing library code to build/bdist.linux-aarch64/egg
#8 0.692 running install_lib
#8 0.692 running build_py
#8 0.692 creating build
#8 0.692 creating build/lib
#8 0.692 creating build/lib/seoanalyzer
#8 0.692 copying seoanalyzer/__main__.py -> build/lib/seoanalyzer
#8 0.692 copying seoanalyzer/http.py -> build/lib/seoanalyzer
#8 0.692 copying seoanalyzer/page.py -> build/lib/seoanalyzer
#8 0.693 copying seoanalyzer/analyzer.py -> build/lib/seoanalyzer
#8 0.693 copying seoanalyzer/__init__.py -> build/lib/seoanalyzer
#8 0.693 copying seoanalyzer/stemmer.py -> build/lib/seoanalyzer
#8 0.693 copying seoanalyzer/website.py -> build/lib/seoanalyzer
#8 0.693 creating build/lib/tests
#8 0.693 copying tests/test_http.py -> build/lib/tests
#8 0.693 copying tests/test_page.py -> build/lib/tests
#8 0.693 copying tests/__init__.py -> build/lib/tests
#8 0.693 copying tests/test_analyzer.py -> build/lib/tests
#8 0.694 creating build/lib/seoanalyzer/templates
#8 0.694 copying seoanalyzer/templates/index.html -> build/lib/seoanalyzer/templates
#8 0.694 creating build/bdist.linux-aarch64
#8 0.694 creating build/bdist.linux-aarch64/egg
#8 0.694 creating build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.694 creating build/bdist.linux-aarch64/egg/seoanalyzer/templates
#8 0.695 copying build/lib/seoanalyzer/templates/index.html -> build/bdist.linux-aarch64/egg/seoanalyzer/templates
#8 0.695 copying build/lib/seoanalyzer/__main__.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.695 copying build/lib/seoanalyzer/http.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.695 copying build/lib/seoanalyzer/page.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.695 copying build/lib/seoanalyzer/analyzer.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.695 copying build/lib/seoanalyzer/__init__.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.695 copying build/lib/seoanalyzer/stemmer.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.695 copying build/lib/seoanalyzer/website.py -> build/bdist.linux-aarch64/egg/seoanalyzer
#8 0.696 creating build/bdist.linux-aarch64/egg/tests
#8 0.696 copying build/lib/tests/test_http.py -> build/bdist.linux-aarch64/egg/tests
#8 0.696 copying build/lib/tests/test_page.py -> build/bdist.linux-aarch64/egg/tests
#8 0.697 copying build/lib/tests/__init__.py -> build/bdist.linux-aarch64/egg/tests
#8 0.697 copying build/lib/tests/test_analyzer.py -> build/bdist.linux-aarch64/egg/tests
#8 0.698 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/__main__.py to __main__.cpython-310.pyc
#8 0.698 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/http.py to http.cpython-310.pyc
#8 0.699 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/page.py to page.cpython-310.pyc
#8 0.701 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/analyzer.py to analyzer.cpython-310.pyc
#8 0.702 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/__init__.py to __init__.cpython-310.pyc
#8 0.702 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/stemmer.py to stemmer.cpython-310.pyc
#8 0.703 byte-compiling build/bdist.linux-aarch64/egg/seoanalyzer/website.py to website.cpython-310.pyc
#8 0.703 byte-compiling build/bdist.linux-aarch64/egg/tests/test_http.py to test_http.cpython-310.pyc
#8 0.703 byte-compiling build/bdist.linux-aarch64/egg/tests/test_page.py to test_page.cpython-310.pyc
#8 0.704 byte-compiling build/bdist.linux-aarch64/egg/tests/__init__.py to __init__.cpython-310.pyc
#8 0.704 byte-compiling build/bdist.linux-aarch64/egg/tests/test_analyzer.py to test_analyzer.cpython-310.pyc
#8 0.704 creating build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.704 copying pyseoanalyzer.egg-info/PKG-INFO -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 copying pyseoanalyzer.egg-info/SOURCES.txt -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 copying pyseoanalyzer.egg-info/dependency_links.txt -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 copying pyseoanalyzer.egg-info/entry_points.txt -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 copying pyseoanalyzer.egg-info/not-zip-safe -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 copying pyseoanalyzer.egg-info/requires.txt -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 copying pyseoanalyzer.egg-info/top_level.txt -> build/bdist.linux-aarch64/egg/EGG-INFO
#8 0.705 creating dist
#8 0.705 creating 'dist/pyseoanalyzer-4.0.6-py3.10.egg' and adding 'build/bdist.linux-aarch64/egg' to it
#8 0.710 removing 'build/bdist.linux-aarch64/egg' (and everything under it)
#8 0.711 Processing pyseoanalyzer-4.0.6-py3.10.egg
#8 0.714 creating /usr/local/lib/python3.10/site-packages/pyseoanalyzer-4.0.6-py3.10.egg
#8 0.714 Extracting pyseoanalyzer-4.0.6-py3.10.egg to /usr/local/lib/python3.10/site-packages
#8 0.724 Adding pyseoanalyzer 4.0.6 to easy-install.pth file
#8 0.725 Installing seoanalyze script to /usr/local/bin
#8 0.725 
#8 0.725 Installed /usr/local/lib/python3.10/site-packages/pyseoanalyzer-4.0.6-py3.10.egg
#8 0.726 Processing dependencies for pyseoanalyzer==4.0.6
#8 0.727 Searching for certifi
#8 0.727 Reading https://pypi.org/simple/certifi/
#8 0.876 Downloading https://files.pythonhosted.org/packages/37/45/946c02767aabb873146011e665728b680884cd8fe70dde973c640e45b775/certifi-2021.10.8-py2.py3-none-any.whl#sha256=d62a0163eb4c2344ac042ab2bdf75399a71a2d8c7d47eac2e2ee91b9d6339569
#8 1.057 Best match: certifi 2021.10.8
#8 1.057 Processing certifi-2021.10.8-py2.py3-none-any.whl
#8 1.058 Installing certifi-2021.10.8-py2.py3-none-any.whl to /usr/local/lib/python3.10/site-packages
#8 1.063 Adding certifi 2021.10.8 to easy-install.pth file
#8 1.065 
#8 1.065 Installed /usr/local/lib/python3.10/site-packages/certifi-2021.10.8-py3.10.egg
#8 1.065 Searching for urllib3
#8 1.065 Reading https://pypi.org/simple/urllib3/
#8 1.232 Downloading https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl#sha256=44ece4d53fb1706f667c9bd1c648f5469a2ec925fcf3a776667042d645472c14
#8 1.382 Best match: urllib3 1.26.9
#8 1.382 Processing urllib3-1.26.9-py2.py3-none-any.whl
#8 1.383 Installing urllib3-1.26.9-py2.py3-none-any.whl to /usr/local/lib/python3.10/site-packages
#8 1.401 Adding urllib3 1.26.9 to easy-install.pth file
#8 1.402 
#8 1.402 Installed /usr/local/lib/python3.10/site-packages/urllib3-1.26.9-py3.10.egg
#8 1.404 Searching for jinja2
#8 1.404 Reading https://pypi.org/simple/jinja2/
#8 1.530 Downloading https://files.pythonhosted.org/packages/20/9a/e5d9ec41927401e41aea8af6d16e78b5e612bca4699d417f646a9610a076/Jinja2-3.0.3-py3-none-any.whl#sha256=077ce6014f7b40d03b47d1f1ca4b0fc8328a692bd284016f806ed0eaca390ad8
#8 1.677 Best match: Jinja2 3.0.3
#8 1.677 Processing Jinja2-3.0.3-py3-none-any.whl
#8 1.677 Installing Jinja2-3.0.3-py3-none-any.whl to /usr/local/lib/python3.10/site-packages
#8 1.685 Adding Jinja2 3.0.3 to easy-install.pth file
#8 1.686 
#8 1.686 Installed /usr/local/lib/python3.10/site-packages/Jinja2-3.0.3-py3.10.egg
#8 1.687 Searching for requests
#8 1.687 Reading https://pypi.org/simple/requests/
#8 1.900 Downloading https://files.pythonhosted.org/packages/2d/61/08076519c80041bc0ffa1a8af0cbd3bf3e2b62af10435d269a9d0f40564d/requests-2.27.1-py2.py3-none-any.whl#sha256=f22fa1e554c9ddfd16e6e41ac79759e17be9e492b3587efa038054674760e72d
#8 2.042 Best match: requests 2.27.1
#8 2.042 Processing requests-2.27.1-py2.py3-none-any.whl
#8 2.043 Installing requests-2.27.1-py2.py3-none-any.whl to /usr/local/lib/python3.10/site-packages
#8 2.057 Adding requests 2.27.1 to easy-install.pth file
#8 2.058 
#8 2.058 Installed /usr/local/lib/python3.10/site-packages/requests-2.27.1-py3.10.egg
#8 2.060 Searching for lxml
#8 2.060 Reading https://pypi.org/simple/lxml/
#8 2.594 Downloading https://files.pythonhosted.org/packages/3b/94/e2b1b3bad91d15526c7e38918795883cee18b93f6785ea8ecf13f8ffa01e/lxml-4.8.0.tar.gz#sha256=f63f62fc60e6228a4ca9abae28228f35e1bd3ce675013d1dfb828688d50c6e23
#8 3.175 Best match: lxml 4.8.0
#8 3.175 Processing lxml-4.8.0.tar.gz
#8 3.353 Writing /tmp/easy_install-qymx0_na/lxml-4.8.0/setup.cfg
#8 3.353 Running lxml-4.8.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-qymx0_na/lxml-4.8.0/egg-dist-tmp-w8ibzyt9
#8 3.392 error: Setup script exited with 1
#8 3.393 Building lxml version 4.8.0.
#8 3.393 Building without Cython.
#8 3.393 Error: Please make sure the libxml2 and libxslt development packages are installed.
#8 ERROR: executor failed running [/bin/sh -c python3 setup.py install]: exit code: 1
------
 > [4/4] RUN python3 setup.py install:
------
executor failed running [/bin/sh -c python3 setup.py install]: exit code: 1

Expected behavior

Dockerfile should build.

Desktop (please complete the following information):

macOS
Docker version 20.10.12, build e91ed57

Python 3.8.5 not working at all

Describe the bug
Hi, help me please: I installed all libraries in accordance with your guide. But when I run a simple first example:

seoanalyze https://www.prostoprelest.com.ua/

terminal shows errors:

{
"resource": "/c:/Users/Admin/hello/python-seo-analyzer/my-script.py",
"owner": "generated_diagnostic_collection_name#0",
"severity": 8,
"message": "Statements must be separated by newlines or semicolons",
"source": "Pylance",
"startLineNumber": 1,
"startColumn": 12,
"endLineNumber": 1,
"endColumn": 17
}

{
"resource": "/c:/Users/Admin/hello/python-seo-analyzer/my-script.py",
"owner": "generated_diagnostic_collection_name#0",
"severity": 8,
"message": "Expected expression",
"source": "Pylance",
"startLineNumber": 1,
"startColumn": 18,
"endLineNumber": 1,
"endColumn": 20
}

{
"resource": "/c:/Users/Admin/hello/python-seo-analyzer/my-script.py",
"owner": "generated_diagnostic_collection_name#0",
"code": {
"value": "reportUndefinedVariable",
"target": {
"$mid": 1,
"external": "https://github.com/microsoft/pylance-release/blob/main/DIAGNOSTIC_SEVERITY_RULES.md#diagnostic-severity-rules",
"path": "/microsoft/pylance-release/blob/main/DIAGNOSTIC_SEVERITY_RULES.md",
"scheme": "https",
"authority": "github.com",
"fragment": "diagnostic-severity-rules"
}
},
"severity": 4,
"message": ""seoanalyze" is not defined",
"source": "Pylance",
"startLineNumber": 1,
"startColumn": 1,
"endLineNumber": 1,
"endColumn": 11
}

Thanks ahead

TemplateNotFound

When running with html output request, error received:

Traceback (most recent call last):
File "/usr/local/bin/seoanalyze", line 11, in
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/seoanalyzer/main.py", line 33, in main
template = env.get_template('index.html')
File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py", line 830, in get_template
return self._load_template(name, self.make_globals(globals))
File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py", line 804, in _load_template
template = self.loader.load(self, name, globals)
File "/usr/local/lib/python3.6/site-packages/jinja2/loaders.py", line 113, in load
source, filename, uptodate = self.get_source(environment, name)
File "/usr/local/lib/python3.6/site-packages/jinja2/loaders.py", line 187, in get_source
raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: index.html

Please Check analyzer.py line 171

While using your code, I found this problem. Please fix it ASAP! @sethblack
`$ python seoanalyze.py > seow3.txt

Traceback (most recent call last):
File "seoanalyze.py", line 4, in
output = analyze("http://www.w3schools.com")
File "/home/fahimfarhan/anaconda3/lib/python3.6/site-packages/seoanalyzer/analyzer.py", line 546, in analyze
pg.analyze()
File "/home/fahimfarhan/anaconda3/lib/python3.6/site-packages/seoanalyzer/analyzer.py", line 218, in analyze
self.populate(soup_lower)
File "/home/fahimfarhan/anaconda3/lib/python3.6/site-packages/seoanalyzer/analyzer.py", line 171, in populate
k))
NameError: name 'k' is not defined
`

feat: Structured JSON Results

Currently the results are returned partially in JSON for the social stats and list of warnings. The word stats are then just returned as output without structure.

It would be nice to return all the results in a nice parseable JSON output so that the output can be used to generate GUI's/reports over the top of it.

For Example:

{
 {
 "www.site.com/page.html": {
   "social": [ { ... } ]
   "warnings" : [ {...} ]
 },
 {
 "keywords": {
    "apple" : 6,
    "banana": 2,
    "monkey eat banana" : 4
  }
 }
}

Argparse

Not working with 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)

FYI

error output below:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/bin/seoanalyze", line 11, in
load_entry_point('pyseoanalyzer==3.1.1', 'console_scripts', 'seoanalyze')()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/seoanalyzer/main.py", line 33, in main
template = env.get_template('index.html')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jinja2/environment.py", line 830, in get_template
return self._load_template(name, self.make_globals(globals))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jinja2/environment.py", line 804, in _load_template
template = self.loader.load(self, name, globals)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jinja2/loaders.py", line 113, in load
source, filename, uptodate = self.get_source(environment, name)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jinja2/loaders.py", line 187, in get_source
raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: index.html

utf-8' codec can't decode bytes in position 31608-31609

I am using Jupyter notebook to run the script. I used the example from this site, but with an actual company website. This is on windows 10 using the latest version of Anaconda.

What am I doing incorrectly?

Input:
from seoanalyzer import analyze
site = 'http://www.site.com'
sitemap = None
output = analyze(site, sitemap)
print(output)

Results:

UnicodeDecodeError Traceback (most recent call last)
in
4 sitemap = None
5
----> 6 output = analyze(site, sitemap)
7 print(output)

C:\ProgramData\Anaconda3\lib\site-packages\seoanalyzer\analyzer.py in analyze(url, sitemap_url)
15 site = Website(url, sitemap_url)
16
---> 17 site.crawl()
18
19 for p in site.crawled_pages:

C:\ProgramData\Anaconda3\lib\site-packages\seoanalyzer\website.py in crawl(self)
63 continue
64
---> 65 page.analyze()
66
67 self.content_hashes[page.content_hash].add(page.url)

C:\ProgramData\Anaconda3\lib\site-packages\seoanalyzer\page.py in analyze(self, raw_html)
170 return
171 else:
--> 172 raw_html = page.data.decode('utf-8')
173
174 self.content_hash = hashlib.sha1(raw_html.encode('utf-8')).hexdigest()

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 31608-31609: invalid continuation byteAdd any other context about the problem here.

Invalid Syntax for all commands

Hi everyone, i can't manage to make it work ! I installed all the required package
import urllib.request
import urllib.error
import urllib3

!tar -xvzf /content/pyseoanalyzer-4.0.7.tar.gz
!cd path/to/package && pip install .

!pip install docker

!pip install pyseoanalyzer

And i keep having invalid syntax errors on my request like : seoanalyze http://internetvergelijnk.nl/

But this one work
from seoanalyzer import analyze

output = analyze('https://www.3ds.com/consulting-services-value-engagement/')

print(output)

I don't understand what i did wrong :')

Trying to pull Docker image sethblack/python-seo-analyzer gives error "repository does not exist"

Describe the bug

The README says this can be used with Docker:

docker run sethblack/python-seo-analyzer [ARGS ...]

However, that image does not exist.

To Reproduce

$ docker pull sethblack/python-seo-analyzer           
Using default tag: latest
Error response from daemon: pull access denied for sethblack/python-seo-analyzer, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Expected behavior

Docker image should either exist, or the README should not contain Docker information.

Desktop (please complete the following information):

OS: macOS

No file gets output?

No matter what I do, no file gets output. seoanalyze -f html https://x.com

The script seems to do what it needs as the data is printed in the terminal but no file is being created.

..

Traceback errors, seoanalyze won't run

02:21 PM [pilot]~ root # seoanalyze
Traceback (most recent call last):
File "/usr/local/bin/seoanalyze", line 7, in
from seoanalyzer.main import main
File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/init.py", line 3, in
from .analyzer import analyze
File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/analyzer.py", line 5, in
from seoanalyzer.website import Website
File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/website.py", line 9, in
from seoanalyzer.page import Page
File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/page.py", line 128
self.warn(f'Keywords should be avoided as they are a spam indicator and no longer used by Search Engines: {keywords}')
^
SyntaxError: invalid syntax

02:21 PM [pilot]~ root # pip3 install -r requirements.txt
Requirement already satisfied: beautifulsoup4==4.6.0 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1))
Requirement already satisfied: requests==2.20.0 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 2))
Requirement already satisfied: Jinja2==2.10.1 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 3))
Requirement already satisfied: urllib3==1.24.2 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 4))
Requirement already satisfied: certifi==2018.11.29 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 5))
Requirement already satisfied: idna<2.8,>=2.5 in /usr/local/lib/python3.5/dist-packages (from requests==2.20.0->-r requirements.txt (line 2))
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.5/dist-packages (from requests==2.20.0->-r requirements.txt (line 2))
Requirement already satisfied: MarkupSafe>=0.23 in /usr/lib/python3/dist-packages (from Jinja2==2.10.1->-r requirements.txt (line 3))

ModuleNotFoundError: No module named 'lxml'

I use Python 3.8.10
I use virtual environment to run your script
I check my web already have index.xml

When i run it from terminal :

seoanalyze http://www.domain.com/

seoanalyze http://www.domain.com/ --sitemap http://www.domain.com/sf2-sitemap/index.xml

it response :

Traceback (most recent call last):
File "...../my-venv/bin/seoanalyze", line 7, in
from seoanalyzer.main import main
File "...../my-venv/lib/python3.8/site-packages/seoanalyzer/init.py", line 3, in
from .analyzer import analyze
File "...../my-venv/lib/python3.8/site-packages/seoanalyzer/analyzer.py", line 5, in
from seoanalyzer.website import Website
File "...../my-venv/lib/python3.8/site-packages/seoanalyzer/website.py", line 9, in
from seoanalyzer.page import Page
File "...../my-venv/lib/python3.8/site-packages/seoanalyzer/page.py", line 8, in
import lxml.html as lh
ModuleNotFoundError: No module named 'lxml'

Need your help. Many thanks.

TF/IDF

Bring in TF/IDF content report per page.

feat: SEO ScoreCard

As a user I would like to see each page scored (A, B, C, F) based on the issues found so that I know where my effort should be focused for On Page SEO Optimization.

Out-of-domain pages get crawled.

Describe the bug
If there are links outside of the domain they are still crawled.

To Reproduce
Steps to reproduce the behavior:

Run on website with external links.

Expected behavior
External pages should not be crawled.

Replace the template

Is there any way that will allow me to replace the index.html with my own template or do I need to export as JSON and play with that ?

Selenium Implementation For Website that uses JavaScript

error: option --single-version-externally-managed not recognized

chetan@chetan-Inspiron-3542:~/Desktop$ sudo pip3 install -U  pyseoanalyzer
The directory '/home/chetan/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/chetan/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pyseoanalyzer
  Downloading pyseoanalyzer-3.0.5.tar.gz
Requirement already up-to-date: BeautifulSoup4 in /usr/local/lib/python3.4/dist-packages (from pyseoanalyzer)
Requirement already up-to-date: nltk in /usr/local/lib/python3.4/dist-packages (from pyseoanalyzer)
Requirement already up-to-date: numpy in /usr/local/lib/python3.4/dist-packages (from pyseoanalyzer)
Requirement already up-to-date: requests in /usr/local/lib/python3.4/dist-packages (from pyseoanalyzer)
Requirement already up-to-date: jinja2 in /usr/local/lib/python3.4/dist-packages (from pyseoanalyzer)
Requirement already up-to-date: six in /usr/local/lib/python3.4/dist-packages (from nltk->pyseoanalyzer)
Requirement already up-to-date: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.4/dist-packages (from requests->pyseoanalyzer)
Requirement already up-to-date: urllib3<1.23,>=1.21.1 in /usr/local/lib/python3.4/dist-packages (from requests->pyseoanalyzer)
Requirement already up-to-date: idna<2.7,>=2.5 in /usr/local/lib/python3.4/dist-packages (from requests->pyseoanalyzer)
Requirement already up-to-date: certifi>=2017.4.17 in /usr/local/lib/python3.4/dist-packages (from requests->pyseoanalyzer)
Requirement already up-to-date: MarkupSafe>=0.23 in /usr/local/lib/python3.4/dist-packages (from jinja2->pyseoanalyzer)
Installing collected packages: pyseoanalyzer
  Running setup.py install for pyseoanalyzer ... error
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6d9pf_nh/pyseoanalyzer/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-g5lm90ji-record/install-record.txt --single-version-externally-managed --compile:
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: -c --help [cmd1 cmd2 ...]
       or: -c --help-commands
       or: -c cmd --help
    
    error: option --single-version-externally-managed not recognized

loop on multiple Analyze website write on same variable object

Hi,
I want to analyze multiple website by loop on a list and write the results in a json file.

I notice that when we crawl 2 differents website and we store the output in two differents variables (let's say A and B), the second variable, B, gets incremented of A...and so on for other crawls.

It is like the analyse() write on a the same object !!

And it gets even weirder when I delete A and B with a del A,B, the analyse() function do not re-run, it recovers the old results from nowhere !!

I tried to use function % reset to erase the memory...but still recover the results from local memory !!!

here is an example:

from seoanalyzer import analyze
A = analyze("https://krugerwildlifesafaris.com/")

# the lenght is 90
print(len(A['pages'])) 

B = analyze("http://www.vintage.co.bw/")

# the lenght is 90
print(len(A['pages']))
# the lenght is 100 but it should be 10 
print(len(B['pages']))

the A has 90 pages and B should have only 10 pages, but it has 90 from A + its own 10..

how to avoid this ?
Why this erratic behavior ?

regards,

karim.m

InvalidSchema: javascript event code in href

I ran the analyzer on a website and got this error. Anyway to filter these urls?

Traceback (most recent call last):
File "...\python37-32\lib\runpy.py", line 193, in run_module_as_main
"main", mod_spec)
File "...\python37-32\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "...\Python37-32\Scripts\seoanalyze.exe_main.py", line 7, in
File "...\python37-32\lib\site-packages\seoanalyzer_main.py", line 26, in main
output = analyze(args.site, args.sitemap)
File "...\python37-32\lib\site-packages\seoanalyzer\analyzer.py", line 618, in analyze
pg.analyze()
File "...\python37-32\lib\site-packages\seoanalyzer\analyzer.py", line 239, in analyze
page = self.session.get(self.url)
File "...\python37-32\lib\site-packages\requests\sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "...\python37-32\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "...\python37-32\lib\site-packages\requests\sessions.py", line 640, in send
adapter = self.get_adapter(url=request.url)
File "...\python37-32\lib\site-packages\requests\sessions.py", line 731, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'javascript:poptastic('http://www.website.com/share.php?...');'

UnicodeEncodeError: 'ascii' codec can't encode characters when calling self._output(request.encode('ascii'))

Describe the bug
When crawling websites that have non-ascii characters in the URL (for example the character é), I get this error:

UnicodeEncodeError: 'ascii' codec can't encode characters when calling self._output(request.encode('ascii'))

To Reproduce
Steps to reproduce the behavior:

Run seoanalyze https://www.archi-graph.com/
This website has pages with URLs containing non-ascii characters and will throw the error above

Expected behavior
Program should run as normal

Desktop (please complete the following information):

OS: Windows 10
Browser: N/A

Smartphone (please complete the following information):
N/A

Additional context
I propose a fix that sanitizes all URLs passed to the get method in the http module

No module named seoanalyzer

Heya everyone,
This is probably a PEBCAK. Hopefully someone with more experience will be patient and explain what silly thing I did wrong.

I followed the README to install the module:

dylan@compy:~/scripts/url_tester$ pip3 install pyseoanalyzer
Requirement already satisfied: pyseoanalyzer in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: BeautifulSoup4 in /usr/local/lib/python3.6/site-packages (from pyseoanalyzer)
Requirement already satisfied: requests in /usr/local/lib/python3.6/site-packages (from pyseoanalyzer)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.6/site-packages (from pyseoanalyzer)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.6/site-packages (from pyseoanalyzer)
Requirement already satisfied: soupsieve>=1.2 in /usr/local/lib/python3.6/site-packages (from BeautifulSoup4->pyseoanalyzer)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/site-packages (from requests->pyseoanalyzer)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/site-packages (from requests->pyseoanalyzer)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python3.6/site-packages (from requests->pyseoanalyzer)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/site-packages (from jinja2->pyseoanalyzer)

Then crafted the simplest file to see it work:

#!/usr/bin/env python
from seoanalyzer import analyze
output = analyze('https://valid.url.com')
print(output)

When I run the file, I get an error:

dylan@compy:~/scripts/url_tester$ ./network_site_tester.py
Traceback (most recent call last):
  File "./network_site_tester.py", line 3, in <module>
    from seoanalyzer import analyze
ImportError: No module named seoanalyzer

Obviously I'm no Python guru, but this is probably a no-brainer for a proper Python developer.
Thanks!

Possible error with sitemap crawling

It seems that whenever I try to run my script with a sitemap I get a Traceback error. I can run my script with just a URL inserted but when I try to crawl using the sitemap I get the error.

Screenshot

Desktop (please complete the following information):

OS: Linux using replit
Browser: Chrome

Im new to coding in general so this could be entirely on me, thanks in advance!

Newbie Issue

Hi Guys, Trying to run seoanalyzer on Google Colab.

Tried to follow all instructions mentioned on github, but it seems like I have missed something.

Below is the code I used.

import zipfile
!unzip /content/python-seo-analyzer-master.zip -d /content/Project
!pip install pyseoanalyzer
from seoanalyzer import analyze
seoanalyze https://domain.com/

Would really appreciate any feedback/help, am a complete newbie with coding/python.

Thanks

`seoanalyze -f html https://www.foo-site.bar` throws Jinja2 Loader Error

Gotta love Jinja2 errors!!

λ seoanalyze -f html https://landing.bscg.us
Traceback (most recent call last):
  File "c:\anaconda2\envs\py36test\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\anaconda2\envs\py36test\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Anaconda2\envs\py36test\Scripts\seoanalyze.exe\__main__.py", line 9, in <module>
  File "c:\anaconda2\envs\py36test\lib\site-packages\seoanalyzer\__main__.py", line 33, in main
    template = env.get_template('index.html')
  File "c:\anaconda2\envs\py36test\lib\site-packages\jinja2\environment.py", line 830, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "c:\anaconda2\envs\py36test\lib\site-packages\jinja2\environment.py", line 804, in _load_template
    template = self.loader.load(self, name, globals)
  File "c:\anaconda2\envs\py36test\lib\site-packages\jinja2\loaders.py", line 113, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "c:\anaconda2\envs\py36test\lib\site-packages\jinja2\loaders.py", line 187, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: index.html

TemplateNotFound can only mean ~3 things. & I think I know which one it is based on:

File "c:\anaconda2\envs\py36test\lib\site-packages\seoanalyzer\__main__.py", line 33, in main
    template = env.get_template('index.html')

Let me know if anyone else is getting this error!

Otherwise, I'll probably have a fixing pull-req out by EoD.

urllib3.exceptions.MaxRetryError

Hey there,

Love seoanlyzer. Getting the error: urllib3.exceptions.MaxRetryError

I've tried some solutions here https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url-in-requests

with no luck. Any suggestions would be great. thanks

AttributeError: module 'urllib3' has no attribute 'PoolManager'

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

Go to '...'
Click on '....'
Scroll down to '....'
See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Version [e.g. 22]

Smartphone (please complete the following information):

Device: [e.g. iPhone6]
OS: [e.g. iOS8.1]
Browser [e.g. stock browser, safari]
Version [e.g. 22]

Additional context
Add any other context about the problem here.

Trivial examples of scripts using seoanalyzer

Is there any chance you would be able to make a markdown document that gave some simple examples of python scripts using this tool so that beginners (like me; people starting out with SEO) can see exactly how this might be utilised.

It is probably very trivial information to some, but I think it would benefit people like me.

jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'facebook'

exits with the error jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'facebook'

while running a scan with sitemap and html output options.

Ubuntu Xenial running in the Windows 10 environment

UnicodeEncodeError when calling print(output_from_parsed_template) for some websites

Describe the bug
When running the analyzer on https://www.amazon.jobs/en/, the following error occurs:

UnicodeEncodeError: 'charmap' codec can't encode character '\u202f' in position 10874: character maps to <undefined>

The error occurs when calling print(output_from_parsed_template) in __main__.py and seem to be related to the html output option.

When commenting out the print statement, the program finishes but the HTML output report is blank.

To Reproduce
Steps to reproduce the behavior:

Run seoanalyze https://www.amazon.jobs/en/ --output-format html

Expected behavior
No error. Program executes normally.

Desktop (please complete the following information):

OS: Windows 10

check_dns doesn't work with a proxy

If use seo-analyzer behind a proxy, call to check_dns fails with 'DNS Lookup Failed'.

xml.parsers.expat.ExpatError: syntax error: line 1, column 0

A site (https://40enfit.nl) gives the following XML error.

  File "seo.py", line 51, in __init__
    output = analyze(site, sitemap)
  File "/usr/local/lib/python3.5/dist-packages/seoanalyzer/analyzer.py", line 523, in analyze
    xmldoc = minidom.parseString(xml_raw)
  File "/usr/lib/python3.5/xml/dom/minidom.py", line 1968, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python3.5/xml/dom/expatbuilder.py", line 925, in parseString
    return builder.parseString(string)
  File "/usr/lib/python3.5/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0

my code

site = "https://40enfit.nl"
sitemap = "https://40enfit.nl/sitemap_index.xml"
output = analyze(site, sitemap)
print(str(output))

`analyze()` not returning keywords in JSON or HTML

Assuming this is some collateral-damage from your cleanup of the global vars.

Here's the issue in a nutshell:

>>> seo_data = analyze('http[s]://foo-any-website.com')
>>> seo_data['keywords'] == []
True

for all websites.

sethblack / python-seo-analyzer Goto Github PK

python-seo-analyzer's Introduction

Python SEO Analyzer

Installation

PIP

Docker

Command-line Usage

API

Notes

python-seo-analyzer's People

Contributors

Stargazers

Watchers

Forkers

python-seo-analyzer's Issues

Results:

Below is the code I used.

import zipfile !unzip /content/python-seo-analyzer-master.zip -d /content/Project !pip install pyseoanalyzer from seoanalyzer import analyze seoanalyze https://domain.com/

Recommend Projects

Recommend Topics

Recommend Org

import zipfile
!unzip /content/python-seo-analyzer-master.zip -d /content/Project
!pip install pyseoanalyzer
from seoanalyzer import analyze
seoanalyze https://domain.com/