
ayima / sitemap-visualization-tool


Python scripts for extracting, categorizing and visualizing an XML sitemap

Home Page: https://www.ayima.com/guides/how-to-visualize-an-xml-sitemap-using-python.html

License: Mozilla Public License 2.0

Python 1.70% HTML 61.43% Jupyter Notebook 36.87%

Contributors

agalea91, denics

sitemap-visualization-tool's Issues

Error with numpy?

Hi, I recently reinstalled sitemap-visualization-tool on a new Debian-based machine and I am getting this error:

$ python categorize_urls.py --depth 4
ImportError: No module named _multiarray_umath

I think this refers to the numpy library. This is the installed version:

Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (1.15.4)

Denis
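One likely cause, offered as a hedge rather than a diagnosis: NumPy 1.16 merged its `multiarray` and `umath` extension modules into `_multiarray_umath`, so a package compiled against NumPy 1.16 or newer fails to import on NumPy 1.15.4. Which package triggers it here (pandas is a plausible candidate) is an assumption, and the `/usr/local/lib/python2.7` path also suggests the script may be running under Python 2.7. The helpers below (`version_tuple`, `numpy_is_recent_enough`) are hypothetical and not part of the tool; they report the interpreter in use and whether the installed NumPy is at least 1.16.

```python
import sys


def version_tuple(version):
    """Turn a version string such as "1.15.4" into (1, 15, 4), ignoring suffixes like "rc1"."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def numpy_is_recent_enough(version, minimum=(1, 16)):
    """True if `version` is at least `minimum` (NumPy 1.16 introduced _multiarray_umath)."""
    return version_tuple(version)[: len(minimum)] >= minimum


if __name__ == "__main__":
    # Show which interpreter runs the scripts: pip may have installed into a different one.
    print("Interpreter:", sys.executable, sys.version.split()[0])
    try:
        import numpy
    except ImportError as exc:
        print("numpy import failed:", exc)
    else:
        print("numpy", numpy.__version__, "at", numpy.__file__)
        if not numpy_is_recent_enough(numpy.__version__):
            print("numpy < 1.16: try `python -m pip install --upgrade numpy`")
```

Running it with the same `python` command used for `categorize_urls.py` shows whether the interpreter and the NumPy it sees are the ones you expect.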

Crawler link detection

Hello again,
after some more testing I noticed that the crawler is not picking up all the internal links. The links displayed in the .pdf are mostly menu and sidebar links; links in the content area are ignored.
Can we add a setting to adjust the crawler "depth"?

Add analytics

I was looking at categorize_urls and wondering whether it would be possible, using google-api-python-client, to add a "visits" column. By passing a few arguments such as startdate and enddate, we could retrieve the visits (or another metric; I would offer a list of alternatives) for each URL and display or color each node according to its number of visits. This could be a useful enhancement.
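A minimal sketch of the second half of that idea, assuming the Analytics query has already produced a mapping from URL to visit count (the google-api-python-client call itself is not shown, and the `"url"` column name, `add_visits` and `visits_to_fillcolor` are all hypothetical). Each node's DOT `fillcolor` is shaded from white (no visits) to red (the busiest URL):

```python
def add_visits(rows, visits_by_url):
    """Return copies of `rows` (dicts with a "url" key) with a "visits" column added.

    URLs missing from `visits_by_url` get 0 visits; the input rows are not modified.
    """
    return [dict(row, visits=visits_by_url.get(row["url"], 0)) for row in rows]


def visits_to_fillcolor(visits, max_visits):
    """Map a visit count to a hex colour between white (0 visits) and red (max_visits)."""
    if max_visits <= 0:
        return "#ffffff"
    ratio = min(max(visits / max_visits, 0.0), 1.0)  # clamp to [0, 1]
    level = 255 - int(255 * ratio)  # green and blue fade as visits grow
    return "#ff{0:02x}{0:02x}".format(level)


if __name__ == "__main__":
    rows = add_visits([{"url": "/"}, {"url": "/blog"}], {"/": 1200, "/blog": 300})
    busiest = max(row["visits"] for row in rows)
    for row in rows:
        print(row["url"], row["visits"], visits_to_fillcolor(row["visits"], busiest))
```

A linear scale is the simplest choice; a logarithmic one would keep low-traffic branches distinguishable when a few pages dominate.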

visualize_urls.py generates malformed content

dot trips on the generated output:

digraph sitemap {
        graph [fontcolor=black fontname=Helvetica fontsize=18 label="www.example.org"]
        node [color=black fillcolor="#dbdddd" fontcolor=black fontname=Helvetica fontsize=14 style=filled]
        edge [arrowhead=open color=black fontcolor=black fontname=Helvetica fontsize=12]
rankdir=LRsize="40"     node [shape=rectangle]
        "www.example.org" [label="www.example.org (495)"]

with this error:

graphviz.backend.execute.CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tpdf', '-O', 'sitemap_graph_5_layer']' returned non-zero exit status 1. [stderr: b"Error: sitemap_graph_5_layer: syntax error in line 5 near '='\n"]

After splitting line 5 into two lines and adding a space, dot accepts the data:

        rankdir=LR size="40"
        node [shape=rectangle]

Ubuntu, python 3.10.6, graphviz 2.43.0
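The fused `rankdir=LRsize="40"     node [shape=rectangle]` suggests that graph-level statements are being concatenated without separators or line breaks. As a hedged, stdlib-only sketch (the helper `dot_source` is hypothetical and not part of visualize_urls.py), this emits the same header with one statement per line, in the form the report above says dot accepts:

```python
def dot_source(label, nodes, rankdir="LR", size="40"):
    """Build a small DOT digraph with one statement per line.

    `nodes` is a list of (name, url_count) pairs; names are quoted and
    embedded double quotes are escaped.
    """
    def quote(text):
        return '"' + str(text).replace('"', '\\"') + '"'

    lines = [
        "digraph sitemap {",
        "\tgraph [fontname=Helvetica fontsize=18 label={}]".format(quote(label)),
        # Graph-level attributes: separated by a space and terminated by a newline.
        "\trankdir={} size={}".format(rankdir, quote(size)),
        "\tnode [shape=rectangle]",
    ]
    for name, count in nodes:
        lines.append("\t{} [label={}]".format(quote(name), quote("{} ({})".format(name, count))))
    lines.append("}")
    return "\n".join(lines) + "\n"


print(dot_source("www.example.org", [("www.example.org", 495)]))
```

If the script builds the graph with the `graphviz` Python package, calling `f.attr(rankdir='LR', size='40')` writes a standalone attribute statement instead of relying on hand-appended strings; how visualize_urls.py currently adds these attributes is an assumption here.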

error with visualization

Hello!

I ran the scripts, and while the first two run fine, the visualization script returns this error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/graphviz/backend.py", line 164, in run
    proc = subprocess.Popen(cmd, startupinfo=get_startupinfo(), **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dot'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/giorgos/Downloads/sitemap-visualization-tool-master/visualize_urls.py", line 240, in <module>
    main()
  File "/Users/giorgos/Downloads/sitemap-visualization-tool-master/visualize_urls.py", line 235, in main
    f.render(cleanup=True)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/graphviz/files.py", line 243, in render
    rendered = backend.render(self._engine, format, filepath,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/graphviz/backend.py", line 223, in render
    run(cmd, capture_output=True, cwd=cwd, check=True, quiet=quiet)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/graphviz/backend.py", line 167, in run
    raise ExecutableNotFound(cmd)
graphviz.backend.ExecutableNotFound: failed to execute ['dot', '-Kdot', '-Tpng', '-O', 'sitemap_graph_2_layer'], make sure the Graphviz executables are on your systems' PATH

Could you help me resolve this?

Thank you in advance for your help,

VC
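The `graphviz` package installed with pip is only a Python wrapper: it runs the separate `dot` program, which has to be installed on its own (on macOS, `brew install graphviz` is one common route, assuming Homebrew is available). A minimal pre-flight check, using the hypothetical helper `require_executable`, could look like this:

```python
import shutil


def require_executable(name="dot"):
    """Return the full path of `name`, or raise RuntimeError with an install hint."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "'{}' was not found on PATH. Install the Graphviz binaries "
            "(e.g. `brew install graphviz` on macOS) and open a new terminal.".format(name)
        )
    return path


if __name__ == "__main__":
    try:
        print("Graphviz found at", require_executable("dot"))
    except RuntimeError as exc:
        print(exc)
```

Once `dot -V` prints a version in the same terminal, `visualize_urls.py` should be able to find it too.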

Add requirements file

It would be nice to be able to install all requirements with a simple pip command such as:

pip install -r requirements.txt
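Until such a file exists, a quick import check can show what is missing. This is a hedged sketch: the module list below is an assumption and should be checked against the imports at the top of each script; note that an import name can differ from the pip distribution name.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: adjust to match the scripts' actual imports.
    needed = ["pandas", "requests", "graphviz"]
    missing = missing_modules(needed)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: python -m pip install -r requirements.txt")
    else:
        print("All listed modules are importable.")
```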

.csv file

While categorizing, only two columns are generated in the .csv file.
site

Possibility to have limit by layer.

Again, working with complex websites can result in messy (and huge) sitemaps. It would be handy to be able to render only the layers we want.

For example, given a full sitemap, we might not want to print the whole of level one but, in combination with "only", print just a subsection of level two.
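A rough sketch of selecting a band of layers before categorizing, assuming a layer is simply a path segment (`/a/b/c` has three). The helpers `url_layers` and `limit_layers` are hypothetical and not existing options of the tool:

```python
from urllib.parse import urlparse


def url_layers(url):
    """Split a URL path into its non-empty segments: /a/b/ -> ["a", "b"]."""
    return [part for part in urlparse(url).path.split("/") if part]


def limit_layers(urls, first_layer, last_layer):
    """Keep path segments from `first_layer` to `last_layer` (1-based, inclusive).

    URLs with fewer than `first_layer` segments are dropped; duplicates are
    removed while preserving the original order.
    """
    keys = []
    for url in urls:
        segments = url_layers(url)
        if len(segments) < first_layer:
            continue
        keys.append("/".join(segments[first_layer - 1:last_layer]))
    return list(dict.fromkeys(keys))
```

Trimming the upper layers loses the parent context, which is why pairing it with a branch filter such as the proposed "only" option makes sense.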

Local .xml file

Hi there,

Is it possible to pass in a local .xml file instead of a URL?
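Whether the extraction script accepts a local path is not stated here, but reading the URLs out of a local sitemap needs only the standard library. A minimal sketch (the function name `urls_from_sitemap_file` is hypothetical), assuming the file uses the standard sitemaps.org namespace:

```python
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.org/</loc></url>
  <url><loc>https://www.example.org/blog/post-1</loc></url>
</urlset>"""


def urls_from_sitemap_file(source):
    """Return every <loc> value from a local sitemap; `source` is a path or binary file object."""
    root = ET.parse(source).getroot()
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


if __name__ == "__main__":
    print(urls_from_sitemap_file(io.BytesIO(SAMPLE_SITEMAP)))
```

For a sitemap index file, the `<loc>` values are child sitemap URLs rather than pages, so each would need to be read in turn.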

Resolution of pdf

Hello, I was wondering whether this is still being updated.
I noticed that on large pages the resolution of the exported .pdf is so low that the labels cannot be read. Is there any solution for this?

Interactive Graph

Now I have one more question:
since this module exports a CSV, is it possible to arrange the CSV so that it can be imported into Gephi, or is there any other way to make the output interactive?
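One hedged approach: Gephi's spreadsheet import can read an edge table whose columns are named `Source` and `Target`, so the URL list can be turned into parent/child path pairs. The helpers `sitemap_edges` and `write_gephi_edges` below are hypothetical and work from a plain list of URLs, not from the tool's own CSV layout:

```python
import csv
import io
from urllib.parse import urlparse


def sitemap_edges(urls):
    """Return sorted (parent, child) path pairs, e.g. ("/blog", "/blog/post-1")."""
    edges = set()
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        for depth in range(1, len(parts) + 1):
            parent = "/" + "/".join(parts[: depth - 1])
            child = "/" + "/".join(parts[:depth])
            edges.add((parent, child))
    return sorted(edges)


def write_gephi_edges(urls, out):
    """Write an edge list with a Source/Target header to the text stream `out`."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    writer.writerows(sitemap_edges(urls))


if __name__ == "__main__":
    buf = io.StringIO()
    write_gephi_edges(["https://ex.org/blog/post-1", "https://ex.org/about"], buf)
    print(buf.getvalue())
```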

add "only" option

In certain cases, when the sitemap is too complex, it would be handy to render only a part of it, especially if working on a specific branch of the site.

Having an "only" option, like the existing "skip" option, would be very beneficial in my opinion.
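A sketch of what such a filter might do, as the mirror image of skipping a branch: keep only URLs at or beneath a given path. The name `only_branch` and its behaviour are assumptions, not an existing flag:

```python
from urllib.parse import urlparse


def only_branch(urls, branch):
    """Keep URLs whose path is `branch` or lies beneath it (e.g. branch="/blog")."""
    prefix = "/" + branch.strip("/")
    if prefix == "/":
        return list(urls)  # the root branch contains everything
    kept = []
    for url in urls:
        path = urlparse(url).path.rstrip("/") or "/"
        # Match whole segments so "/blogroll" is not treated as part of "/blog".
        if path == prefix or path.startswith(prefix + "/"):
            kept.append(url)
    return kept
```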
