
ikreymer / webarchiveplayer


NOTE: This project is no longer being actively developed. Check out Webrecorder Player (https://github.com/webrecorder/webrecorderplayer-electron) for the latest player. (Legacy: Desktop application for browsing web archives (WARC and ARC).)

License: GNU General Public License v3.0

Shell 4.88% Python 87.85% HTML 6.87% Batchfile 0.40%

webarchiveplayer's Introduction

Please note: this project is no longer being actively developed.

Please use the new Webrecorder Player app (https://github.com/webrecorder/webrecorderplayer-electron) instead. The Webrecorder Player will receive regular feature updates in sync with https://webrecorder.io/

WebArchivePlayer 1.4.7

WebArchivePlayer is a new desktop tool which provides a simple point-and-click wrapper for viewing any web archive file (in WARC and ARC format).

To create a web archive (WARC) file of your own, you can use the free https://webrecorder.io/ service to browse any page and then download the recorded WARC file.

The player allows users to pick one or more ARC/WARC files from their local machine and browse their contents in any browser. No internet connection is needed to browse the archive.

Usage (Windows and OS X Apps)

  1. Download the latest version for OS X or Windows.

  2. Double-click to open. (For OS X, open the .dmg file to mount the volume and extract the player.) You may have to agree to allow opening files downloaded from the internet and (Windows only) to allow the player to make internet connections. This is still new software, and other distribution methods may be added in the future.

  3. A file dialog will show up. Browse to one or more existing WARC or ARC files.

    You can use https://webrecorder.io to record pages as you browse and then download the WARC file.

  4. A browser will open to http://localhost:8090/ listing all the pages in the archive.

  5. Click on any listed page to view the replay, or enter a URL to search the full archive.

  6. To exit, simply close the WebArchivePlayer window.
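The 1.3.0 changelog notes that the player falls back to a random port when 8090 is already taken, which is why the address in step 4 can differ between runs. The sketch below illustrates that idea with Python 3's standard `socket` module. It is not the player's code: the helper names are hypothetical, and it lets the OS assign a free port rather than drawing a random number.

```python
import socket

DEFAULT_PORT = 8090  # default port named in this README


def port_is_free(port, host="127.0.0.1"):
    """Return True if binding host:port succeeds, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def pick_port(preferred=DEFAULT_PORT, host="127.0.0.1"):
    """Use the preferred port if free, otherwise ask the OS for any free port."""
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "let the OS choose"
        return sock.getsockname()[1]
```

A check like this is inherently racy: another process can claim the port between the check and the server starting, so a real server should still handle a failed bind.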

Example

(Screenshots: OS X and Windows)

(Replaying screenshot from Wikipedia SOPA Blackout. You can download the WARC from GitHub.)

Usage for All Platforms -- Running from python source

Currently, executable versions are available only for OS X and Windows.

However, the player should work on any system that has Python 2.7.x, but requires a little bit more setup.

On other systems (or to build from source):

  1. Clone this repo: git clone https://github.com/ikreymer/webarchiveplayer.git; cd webarchiveplayer

  2. Install by running python setup.py install (optionally using a virtualenv)

  3. Run webarchiveplayer [/path/to/warc_or_arc]

GUI Mode

If a W/ARC file argument is omitted, the player will attempt to start in GUI mode and show a File Open dialog.

However, in order to run in GUI mode, the wxPython toolkit will also need to be installed separately.

Refer to the instructions on the wxPython Download page for your platform.

wxPython and virtualenv

wxPython does not work in a virtualenv by default. The simplest way to make it work is to symlink the system wxredirect.pth into the virtualenv's site-packages directory. For example, on OS X, if you've created a virtualenv with `virtualenv [myenv]`, run:

ln -s /Library/Python/2.7/site-packages/wxredirect.pth [myenv]/lib/python2.7/site-packages/wxredirect.pth
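The same link can be created from Python, which avoids hard-coding the virtualenv path. This is a sketch under assumptions: `link_wxredirect` is a hypothetical helper, the source path is the OS X example above, and it relies on `sysconfig.get_paths()["purelib"]` pointing at the virtualenv's site-packages when run with the virtualenv's interpreter.

```python
import os
import sysconfig

# System wxredirect.pth from the OS X example above; may differ elsewhere
SYSTEM_PTH = "/Library/Python/2.7/site-packages/wxredirect.pth"


def link_wxredirect(src=SYSTEM_PTH, site_dir=None):
    """Symlink src into site_dir (default: the running interpreter's site-packages)."""
    if site_dir is None:
        # 'purelib' is the site-packages directory of the interpreter running
        # this code, i.e. the virtualenv's one when using the virtualenv's python
        site_dir = sysconfig.get_paths()["purelib"]
    dst = os.path.join(site_dir, os.path.basename(src))
    if not os.path.lexists(dst):  # leave an existing file or link alone
        os.symlink(src, dst)
    return dst
```

Run it with the virtualenv's own python (e.g. `[myenv]/bin/python`) so that the computed site-packages directory is the right one.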

CLI Mode

If a W/ARC file argument is passed to the player, e.g.:

webarchiveplayer /path/to/warcfile.warc.gz

The player will select that file and skip the File Open dialog. Installation of wxPython is not required when specifying the WARC explicitly via the command line.

The OS X and Windows applications also support specifying the file via command line.
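The choice between GUI and CLI mode described above can be sketched as follows. This is a hypothetical illustration for Python 3, not the player's actual entry point: `parse_args` and `choose_mode` are made-up names, and detecting wxPython with `importlib.util.find_spec("wx")` is an assumption about how such a check could be done. Accepting several paths mirrors the multi-file support noted in the 1.1.0 changelog.

```python
import argparse
import importlib.util


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="webarchiveplayer")
    # Zero or more W/ARC paths; shell quoting takes care of spaces in names
    parser.add_argument("archives", nargs="*", help="WARC/ARC files to open")
    return parser.parse_args(argv)


def choose_mode(archives, wx_available=None):
    """Return 'cli' when files are given, else 'gui' if wxPython is importable."""
    if archives:
        return "cli"  # wxPython is not needed when files are passed explicitly
    if wx_available is None:
        wx_available = importlib.util.find_spec("wx") is not None
    if not wx_available:
        raise SystemExit("No W/ARC file given and wxPython is not installed")
    return "gui"
```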

Custom Preset Archive Mode

In addition to opening files, WebArchivePlayer can now also be used to provide a point-and-click launcher for any pywb archive.

If a config.yaml file is present in the working directory (same directory as WebArchivePlayer), the specified configuration will be loaded instead of a file prompt.

This can be used to distribute specific archives together with WebArchivePlayer.

Certain aspects of the player can also be modified in the config.yaml, including replacing the default 'Web Archive Player' title and window contents with a custom title and HTML page:

webarchiveplayer:
   # initial page to load on start-up
   # eg: http://localhost:8090/my_coll/http://example.com/
   start_url: my_coll/http://example.com/

   # set initial width of player window
   width: 400

   # set initial height of player window
   height: 250

   # set window title
   title: My Archive

   # Load custom contents from local HTML
   desc_html: ./desc.html
   
   # Auto-load WARCs from specified directory (supported from 1.4.6)
   auto_load_dir: ./warcs/
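The `start_url` comment above implies that the value is resolved against the local server root. A minimal sketch of reading these settings, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `player_settings` and the fallback values in `DEFAULTS` are hypothetical, not the player's actual defaults.

```python
DEFAULT_BASE = "http://localhost:8090/"

# Hypothetical fallbacks used only when a key is missing from config.yaml
DEFAULTS = {"start_url": "", "title": "Web Archive Player", "width": None, "height": None}


def player_settings(config, base=DEFAULT_BASE):
    """Merge the 'webarchiveplayer' section of a parsed config.yaml with defaults."""
    section = dict(DEFAULTS)
    section.update(config.get("webarchiveplayer") or {})
    # start_url is relative to the server root; join by plain concatenation
    section["start_url"] = base.rstrip("/") + "/" + section["start_url"].lstrip("/")
    return section
```

Plain string concatenation is deliberate: in current Python 3 versions, `urllib.parse.urljoin` drops empty path segments, which would turn the embedded `http://example.com/` into `http:/example.com/`.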

For example, one could distribute a WARC together with the player and provide a custom setup. This includes automatically indexing WARCs on load to allow quick drop in, or configuring a multi-collection archive.

Auto-Load WARCs

With version 1.4.6, webarchiveplayer supports indexing WARCs automatically from a designated directory. Archive files are indexed on each load to allow for dropping or updating the files more easily.

To set it up, all that's needed is a config.yaml with the following:

   webarchiveplayer:
       auto_load_dir: ./warcs
       
       title: 'My Archive'
       desc_html: ./desc_page.html

If WebArchivePlayer is placed in the same directory as the config.yaml and warcs directory, the player will automatically load and index all WARC/ARC files found in this directory.

Optionally, the config.yaml and warcs may also be placed in an archive sub-directory. This allows an archive to be transported more easily (e.g., as a tarball or zip file).

The last two params allow for customizing the WebArchivePlayer window. The title param specifies the window title, while the desc_html param specifies the contents of the WebArchivePlayer window.
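The lookup order described here (a config.yaml next to the player, or one inside an archive sub-directory) and the auto-load scan can be sketched as below. It is illustrative only: the function names are hypothetical, the extension list is an assumption about which files count as WARC/ARC, and resolving `auto_load_dir` relative to the directory holding config.yaml is a guess that keeps the archive relocatable.

```python
import os

# Assumed set of archive file extensions (compressed and uncompressed)
ARCHIVE_EXTS = (".warc", ".warc.gz", ".arc", ".arc.gz")


def find_config(root="."):
    """Return root/config.yaml or root/archive/config.yaml, whichever exists first."""
    for candidate in (os.path.join(root, "config.yaml"),
                      os.path.join(root, "archive", "config.yaml")):
        if os.path.isfile(candidate):
            return candidate
    return None


def list_archives(auto_load_dir, config_path):
    """List WARC/ARC files in auto_load_dir, resolved next to the config file."""
    base = os.path.dirname(os.path.abspath(config_path))
    warc_dir = os.path.normpath(os.path.join(base, auto_load_dir))
    return sorted(
        os.path.join(warc_dir, name)
        for name in os.listdir(warc_dir)
        if name.lower().endswith(ARCHIVE_EXTS)
    )
```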

Create multi-collection archive

The following steps describe creating a static archive with preset collections and indexed archive files:

  1. Create new directory my_archive and switch to it.

  2. Copy the WebArchivePlayer application to my_archive

  3. In my_archive, run wb-manager init my_coll

  4. Run wb-manager add my_coll <path/to/warc>

  5. Add config.yaml in my_archive, perhaps with

       webarchiveplayer:
          start_url: my_coll/http://example.com/
          title: My Archive Demo
    
  6. Now, when WebArchivePlayer is started in my_archive, it will use the WARC in my_coll and load http://localhost:8090/my_coll/http://example.com/ as the starting URL.

  7. The my_archive dir can be distributed as a standalone archive and player.
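Steps 3 to 5 can be scripted. A sketch, assuming pywb's `wb-manager` is on the PATH; the helper names are hypothetical, one `add` command is issued per WARC to mirror step 4, and the default `dry_run=True` only prints the commands so nothing runs until you opt in.

```python
import os
import subprocess


def archive_setup_commands(coll, warc_paths):
    """Build the wb-manager commands from steps 3 and 4 (one 'add' per WARC)."""
    cmds = [["wb-manager", "init", coll]]
    cmds += [["wb-manager", "add", coll, path] for path in warc_paths]
    return cmds


def write_config(archive_dir, coll, start_url, title):
    """Write a minimal config.yaml like the one in step 5."""
    text = (
        "webarchiveplayer:\n"
        "   start_url: {0}/{1}\n"
        "   title: {2}\n"
    ).format(coll, start_url, title)
    path = os.path.join(archive_dir, "config.yaml")
    with open(path, "w") as f:
        f.write(text)
    return path


def setup_archive(archive_dir, coll, warc_paths, start_url, title, dry_run=True):
    for cmd in archive_setup_commands(coll, warc_paths):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Run inside archive_dir, as the steps above do
            subprocess.check_call(cmd, cwd=archive_dir)
    return write_config(archive_dir, coll, start_url, title)
```

For example, `setup_archive("my_archive", "my_coll", ["path/to/example.warc.gz"], "http://example.com/", "My Archive Demo", dry_run=False)` would run both commands inside an existing my_archive directory and then write its config.yaml.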

Building GUI Binaries

The binaries can be built by running the build scripts from the app directory:

Note: wxPython must be installed for this to work. If running in a virtualenv, follow the instructions above. The build script will not run if it can't find wxPython.

OS X: (output written to osx/webarchiveplayer.dmg)

cd app
./build-osx.sh

Windows: (output copied to windows\webarchiveplayer.exe)

cd app
build-windows.bat

Changelist

1.4.7

Ensure the config file and desc HTML are read as UTF-8.

1.4.6

Update to pywb 0.33.1. Support for the auto_load_dir option in config.yaml (or archive/config.yaml), which specifies a directory from which to automatically load WARCs on startup.

1.4.5

Update to pywb 0.32.1. Support Webrecorder collection WARCs; read pages/bookmarks from all warcinfo records.

1.4.1

Update to pywb 0.30.1. Support reading WARC files with non-HTTP response records (which are skipped).

1.4.0

Build using Python 3 and pywb 0.30.0, using the latest pyinstaller. Page detection: re-enable reading the page list from json-metadata, if present in the WARC.

1.3.0

Support multiple instances by picking a random port if 8090 is not available. Ensure HTML 'resource' records are included in the page list. Display an error dialog before quitting if unable to read and index WARC/ARCs. Switch to pywb 0.11.1, with many improvements in indexing and replay.

1.2.0

Custom preset archive support with a custom config.yaml. Use HTML for main window rendering. Switch to pywb 0.10.9.1 for more rewriting improvements.

1.1.4

Update to pywb 0.10.8, rewriting improvements, add pywb version display

1.1.3

Update to pywb 0.10.6, significant replay improvements

1.1.2

Fix issue where page listing only lists pages for one WARC/ARC when multiple are selected. Build scripts check for wxPython installation.

1.1.1

Update to use latest pywb release (0.8.3)

1.1.0

Support opening multiple WARC/ARC files at once. Also fix issue with opening files with spaces in filename.

1.0.1

Initial release.

How it Works

WebArchivePlayer is a simple wrapper over the pywb web archiving tools, packaged with pyinstaller as a standalone GUI application. The wxPython toolkit provides the GUI. The wrapper starts a local server that serves content from the selected web archive, and pywb handles the rest.

Consult the pywb documentation for more info on web archive replay.

Questions / Issues

Please feel free to open an issue on this page for any problems, questions, or concerns regarding this tool. This is brand-new software, so feedback is encouraged.

Other Tools

Another project, which in part inspired WebArchivePlayer, is Mat Kelly's excellent WAIL project, which provides a GUI for different web crawling and replay systems.

webarchiveplayer's People

Contributors

ikreymer, kblumenthal, machawk1


webarchiveplayer's Issues

Is there a way to turn error reporting to a high level?

Currently, I get many silent failures while trying to read WARC files which are successfully read by the openwayback application.

Sometimes, the webarchiveplayer fails with no errors reported. Other times, it successfully starts up, but only gives me a couple of URLs when the WARC in question contains a few dozen resources (all response records). These are visible and functional in openwayback.

I would like to debug this myself, but since I only have a small clue about running and debugging at the python level, I'm hoping I can at least get started by setting logging/error reporting to a maximum level. Is there such a thing? Would it be a good idea?

I am running the latest webarchiveplayer.exe from either cmd.exe or from gitbash on Windows 8.1.

Thanks!

Issue building GUI binaries on OS X

I was attempting to build the GUI binary for OS X 10.10.5 and ran into some issues using the following commands:

  1. git clone https://github.com/ikreymer/webarchiveplayer
  2. cd webarchiveplayer/app
  3. ./build-osx.sh
    • webarchiveplayer/app/osx/webarchiveplayer.dmg is created as well as the pre-packaged products at webarchiveplayer/app/dist/{webarchiveplayer,webarchiveplayer.app}.
  4. I open the .dmg to reveal a file/directory named webarchiveplayer.app.
  5. I launch the app via double-click. It immediately closes (the icon is shown and hidden quickly in the dock).
    • The webarchiveplayer/app/dist/webarchiveplayer.app file exhibits the same behavior.
  6. Per the procedure I have used to remedy this behavior in WAIL, I launched the webarchiveplayer/app/dist/webarchiveplayer file and receive the error:
    • Traceback (most recent call last): File "<string>", line 1, in <module> ImportError: No module named archiveplayer.archiveplayer

I played devil's advocate and tried pip install archiveplayer to no avail (the package is not in pip).

I then ran the system-wide installation via python setup.py install and re-ran ./build-osx.sh, which produced another .dmg (I was sure to remove the original .dmg and build remnants).

I relaunched the .app in the .dmg and the Open window now displays as expected. This indicates to me that python setup.py install must be run before the GUI can be built. I have not tested whether the binary is reliant on the system-installed package when using the pyinstaller-interfacing build script you have provided. It might be useful to provide a requirements file and amend the instructions to first install the dependencies for pyinstaller to access and include in the binary.

I realize that this is a derivative project of pywb, but there is definitely appeal in providing a simple native interface for replaying WARCs as you have here. I also hope to reuse your code extensively when I can find the time.

Not an issue, but a request

Sorry, I can be so stupid at times and can't find the appropriate place to post.

Would it be possible to integrate NutchWAX with your software? I'd love to be able to search WARCs.

Anyway, thanks so much for your wonderful software and work.

Excessive memory usage when serving big files

On Debian 7, I loaded a WARC that contained some larger files (200-300 MB). Loading went smoothly, but when I tried to access one of these files in a browser (in this case, Firefox), the memory usage of the process (pywb, python, or webarchiveplayer; I can't remember and can't test it again now) increased to 700-800 MB, at which point I ran out of memory and the following error messages appeared in the terminal:

Pywb Error

Error Details:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/framework/wsgi_wrappers.py", line 98, in handle_methods
    response = wb_router(env)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/framework/archivalrouter.py", line 39, in __call__
    return route.handler(wbrequest)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/handlers.py", line 73, in __call__
    return self.handle_request(wbrequest)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/handlers.py", line 141, in handle_request
    return self.handle_replay(wbrequest, cdx_lines)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/handlers.py", line 155, in handle_replay
    cdx_callback)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/replay_views.py", line 90, in render_content
    failed_files)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/replay_views.py", line 123, in cached_replay_capture
    return get_capture()
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/replay_views.py", line 115, in get_capture
    failed_files)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/replay_views.py", line 183, in replay_capture
    response_iter)
  File "/usr/local/lib/python2.7/dist-packages/pywb-0.8.3-py2.7.egg/pywb/webapp/replay_views.py", line 205, in buffered_response
    content = out.getvalue()
MemoryError

Interestingly, when I accessed the file via wget (initiated a download), it was able to serve, however, there was some strain there too. (I haven't checked the memory usage, though.)

I guess the problem is easily reproducible: you can find the WARC here: https://archive.org/details/hajduvolan_hu_2015_05, and the largest file (an flv video) is on this page: http://hajduvolan.hu/2011-2012-475.html, you'll find it in an embedded player, or you can directly download it with the link under the player.

I believe it's not a problem with the browser, but some bug in pywb. (I wondered whether I should file this issue with pywb, but pywb is now at a newer version than the one used by webarchiveplayer. Maybe the problem has been fixed since?) And I believe 700 MB of memory usage is too much for a 250 MB file.

If it's not a bug in webarchiveplayer or pywb, or if the problem has been fixed since, I apologize for this issue. Also, sorry for not being able to do more tests on it.
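The last frame of the traceback is pywb's `buffered_response` calling `out.getvalue()`, which suggests the whole payload is assembled in memory before anything is sent. The sketch below contrasts that pattern with yielding fixed-size chunks, which a WSGI server can send one at a time. It is a hypothetical illustration, not pywb code, and the 64 KiB chunk size is arbitrary.

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for the sketch


def buffered_body(stream):
    """Buffering pattern: the whole payload ends up in one bytes object."""
    out = io.BytesIO()
    out.write(stream.read())
    return [out.getvalue()]  # peak memory grows with the size of the file


def streamed_body(stream, chunk_size=CHUNK_SIZE):
    """Streaming pattern: yield the payload piece by piece."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk  # only one chunk needs to be held at a time
```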

coursera warc files not really showing content

  1. I downloaded and extracted the last file in this list:
    https://gist.github.com/marai2/b548ce70b6af4789522c6ef5e54c6bbf
    file is: https://archive.org/download/archiveteam_coursera_20160709070402/coursera_20160709070402.megawarc.warc.gz

  2. Downloaded and ran:
    Web Archive Player 1.4.7
    (pywb 0.33.1)
    Archive Player Server running at:
    http://localhost:8090/

  3. Opened the warc file with web archive player. Chrome opened the url with 11548 links in it.

  4. I clicked on tissue101 whose link is: http://localhost:8090/20160629170142/https://www.coursera.org/course/tissue101

  5. It shows an empty Coursera page. Same happens with other course links. Or it loads forever.

I am using Mac OS 10.13.4 High Sierra.
What can I do to use these warc files? Is there a way to see content as folders? I want to see if there are any video files at least.
Thanks

Associate date/build information with the binary

I noticed when using this that there was no way to tell what version of the software I was using. It would be useful to know what version of the binary as well as other information (e.g., pywb version) the app uses.

On OS X, this information would normally be present in the "About" box. https://github.com/machawk1/wail/blob/osagnostic/bundledApps/WAIL.py#L232 and
https://github.com/machawk1/wail/blob/osagnostic/bundledApps/WAIL.py#L236-L246 are how I do it within the app in WAIL.

From outside of the app, the version is also not evident.
(Screenshot, 2015-11-06)

This information is set in the app's .plist file in OS X. Pyinstaller does not have a way to specify a plist file on-build but recommends replacing it within the .app after the binary has been built. I do this in my MAKEFILE in WAIL:
https://github.com/machawk1/wail/blob/osagnostic/bundledApps/MAKEFILE.sh#L46
https://github.com/machawk1/wail/blob/osagnostic/build/Info.plist

It's not the ideal way, but it is a solution.
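The post-build step described here could also be done in Python with the standard `plistlib` module (Python 3.4+). This is a sketch, not WAIL's MAKEFILE logic: `set_bundle_version` is hypothetical, the plist path (for example `dist/webarchiveplayer.app/Contents/Info.plist`) is assumed from the usual .app layout, and `PywbVersion` is a made-up custom key.

```python
import plistlib


def set_bundle_version(plist_path, version, pywb_version=None):
    """Write version info into a built .app's Info.plist (run after pyinstaller)."""
    with open(plist_path, "rb") as f:
        info = plistlib.load(f)
    info["CFBundleShortVersionString"] = version  # user-visible version string
    info["CFBundleVersion"] = version              # build version
    if pywb_version:
        info["PywbVersion"] = pywb_version  # hypothetical custom key
    with open(plist_path, "wb") as f:
        plistlib.dump(info, f)  # writes XML plist by default
    return info
```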

Unknown archive format, first line: ['WARC/1.1']

I have downloaded a WARC file from archive.org. I can't share it as it's over 100GB in size. It contains a file I am after, but unfortunately, the webarchiveplayer cannot open it and responds with this error message.

(Screenshot of the error message)

Many Thanks in advance.
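When an archive is too large to share, its first line is still easy to inspect and include in a report. A sketch with hypothetical helper names that reads only the first line of a plain or gzip-compressed file and labels it; the `filedesc://` check reflects the usual start of an ARC file, and gzip detection relies on the standard `1f 8b` magic bytes.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def first_line(path):
    """Return the first line of a (possibly gzip-compressed) W/ARC file."""
    with open(path, "rb") as f:
        magic = f.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as f:
        # Only one line is read, so even very large files are cheap to check
        return f.readline().rstrip(b"\r\n").decode("utf-8", "replace")


def describe_format(line):
    """Label a first line as a WARC version, ARC, or unrecognized."""
    if line.startswith("WARC/"):
        return "WARC version " + line[len("WARC/"):]
    if line.startswith("filedesc://"):
        return "ARC"
    return "unrecognized"
```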

Support running with preset collection with a custom config.yaml

If a config.yaml is present in the current working directory, use that configuration instead of dynamic file selection mode. In this mode, WebArchivePlayer will start with the specified collections available.

Also support customizing the WebArchivePlayer window, including initial width, height, title and set window from custom html file, as well as custom starting url (instead of just /replay/)

Resource entries are not listed

When loading a warc with "resource" entries conforming to Section 6.4 of ISO 28500's latest draft, the resource entries are omitted from the page list presented to the end user.
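Per the 1.3.0 changelog, HTML 'resource' records are now included in the page list. The hypothetical filter below shows the distinction; it is not the logic in pagedetect.py. One detail it captures: a `response` record's WARC Content-Type is `application/http`, so the HTML type must come from the HTTP headers inside it, while a `resource` record's WARC Content-Type describes the content directly.

```python
# Record types that can hold a page; 'request', 'metadata', etc. are skipped
PAGE_RECORD_TYPES = {"response", "resource"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def is_page(record_type, content_type):
    """Return True if a record looks like an HTML page worth listing.

    Pass the HTTP Content-Type for 'response' records and the WARC
    Content-Type header for 'resource' records (which have no HTTP headers).
    """
    if record_type not in PAGE_RECORD_TYPES:
        return False
    # Drop parameters such as '; charset=utf-8' before comparing
    mime = (content_type or "").split(";")[0].strip().lower()
    return mime in HTML_TYPES
```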

Error Reading Web Archive File(s)

Hi! I'm trying to use webarchiveplayer to open a WARC file, but facing a problem of "Error Reading".

The error message is as follow:

 WebArchivePlayer is unable to read the input file(s) and will quit.
 Details: Invalid WARC record, first line: WARC-Warcinfo-ID:
 f8ea7d54-e7a3-4d33-9ff4-45d99aa7864c

The WARC files are the samples of project ClueWeb09 (http://www.lemurproject.org/clueweb09/sampleFiles.php).

Can you tell me what is going wrong?

Thank you for your help.

Files created using WARCreate don't work

I'm running Win7 SP1 Home.

I'm using Chrome 46.0.2490.86 m with the latest version of http://warcreate.com/ extension to generate some .WARC files.

webarchiveplayer.exe 1.2.0 simply does nothing if I try to open any WARC file generated using that Chrome extension: no error message, no warning, nothing at all.

enhancement request: search more than URLs

In the GUI you have a nice box to allow the searching of URLs. I would LOVE to have the ability to search for any text string, both in the visible capture, and the source code. I'm currently working on a WARC of a Twitter search history, and would love to be able to search the source code for the tweet-ids (ideally generating a list of all of them from the given WARC file).

Thanks for the consideration, and the neat tool!

Support for recent requests module

I see webarchiveplayer needs requests version 2.5.1. Is there a chance it will support newer versions of it in the near future (or is it already possible, just the dependency record is not updated), or shall I set up a virtualenv not to have conflict with other applications? (Please consider this as a feature request.)

I don't know how complicated it would be, that's why I'm asking. Thanks for the reply.

File menu is empty

Clicking on the file menu shows no menu contents. A solution would be to completely get rid of the menu (I don't know if this is possible w/ wxPython). The "webarchiveplayer" menu should also contain a hook into an about box, per #12. OS X 10.11.1

(Screenshot: webarchiveplayer_nomenu)

Wrong dmg upload to the releases page?

I just checked the releases page and the dmg and exe for the last two versions.

The dmg files for versions 1.4.1 and 1.4.5 appear to be exactly the same (same sha256). This suggests the previous version's dmg was erroneously uploaded as the current one. The exes have different shasums, so they seem to be correct.
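Comparing release downloads like this can be done with Python's `hashlib`; the sketch below (hypothetical helper name) hashes a file in 1 MiB chunks so large installers are not read into memory at once.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Identical digests for the dmg files of two different releases would confirm a duplicate upload.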

Fails to install

Fails to install (I am aware that there is already a DMG file for OSX, but I need this for linux later).

$ cd /tmp
$ git clone https://github.com/ikreymer/webarchiveplayer
$ cd webarchiveplayer
$ python setup.py install
running install
running bdist_egg
running egg_info
writing webarchiveplayer.egg-info/PKG-INFO
writing dependency_links to webarchiveplayer.egg-info/dependency_links.txt
writing entry points to webarchiveplayer.egg-info/entry_points.txt
writing requirements to webarchiveplayer.egg-info/requires.txt
writing top-level names to webarchiveplayer.egg-info/top_level.txt
reading manifest file 'webarchiveplayer.egg-info/SOURCES.txt'
writing manifest file 'webarchiveplayer.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.12-x86_64/egg
running install_lib
running build_py
creating build/bdist.macosx-10.12-x86_64/egg
creating build/bdist.macosx-10.12-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/version.py -> build/bdist.macosx-10.12-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/__init__.py -> build/bdist.macosx-10.12-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/archiveplayer.py -> build/bdist.macosx-10.12-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/pagedetect.py -> build/bdist.macosx-10.12-x86_64/egg/archiveplayer
creating build/bdist.macosx-10.12-x86_64/egg/archiveplayer/templates
copying build/lib/archiveplayer/templates/index.html -> build/bdist.macosx-10.12-x86_64/egg/archiveplayer/templates
copying build/lib/archiveplayer/templates/pagelist_search.html -> build/bdist.macosx-10.12-x86_64/egg/archiveplayer/templates
byte-compiling build/bdist.macosx-10.12-x86_64/egg/archiveplayer/version.py to version.cpython-36.pyc
byte-compiling build/bdist.macosx-10.12-x86_64/egg/archiveplayer/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.macosx-10.12-x86_64/egg/archiveplayer/archiveplayer.py to archiveplayer.cpython-36.pyc
byte-compiling build/bdist.macosx-10.12-x86_64/egg/archiveplayer/pagedetect.py to pagedetect.cpython-36.pyc
creating build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/PKG-INFO -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/SOURCES.txt -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/dependency_links.txt -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/entry_points.txt -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/requires.txt -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/top_level.txt -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/zip-safe -> build/bdist.macosx-10.12-x86_64/egg/EGG-INFO
creating 'dist/webarchiveplayer-1.4.7-py3.6.egg' and adding 'build/bdist.macosx-10.12-x86_64/egg' to it
removing 'build/bdist.macosx-10.12-x86_64/egg' (and everything under it)
Processing webarchiveplayer-1.4.7-py3.6.egg
Removing /Users/david/.virtualenvs/py3-data/lib/python3.6/site-packages/webarchiveplayer-1.4.7-py3.6.egg
Copying webarchiveplayer-1.4.7-py3.6.egg to /Users/david/.virtualenvs/py3-data/lib/python3.6/site-packages
webarchiveplayer 1.4.7 is already the active version in easy-install.pth
Installing webarchiveplayer script to /Users/david/.virtualenvs/py3-data/bin

Installed /Users/david/.virtualenvs/py3-data/lib/python3.6/site-packages/webarchiveplayer-1.4.7-py3.6.egg
Processing dependencies for webarchiveplayer==1.4.7
Searching for pywb>=0.32.0
Reading https://pypi.python.org/simple/pywb/
Downloading https://pypi.python.org/packages/fe/8b/5e51f6fb6811a5132d63c2a61d7d7806f64cd630616f1e6112591aac2918/pywb-2.0.1.tar.gz#md5=a078abffea5d03b586824f7da8438bf2
Best match: pywb 2.0.1
Processing pywb-2.0.1.tar.gz
Writing /var/folders/f2/_2s_jkt505b_2dkrm1mnqjgw0000gn/T/easy_install-64u7qsxa/pywb-2.0.1/setup.cfg
Running pywb-2.0.1/setup.py -q bdist_egg --dist-dir /var/folders/f2/_2s_jkt505b_2dkrm1mnqjgw0000gn/T/easy_install-64u7qsxa/pywb-2.0.1/egg-dist-tmp-503fzvy1
fatal: Not a git repository (or any of the parent directories): .git
error: [Errno 2] No such file or directory: 'requirements.txt'
(py3-data)
david at vpn-253-022 in /tmp/webarchiveplayer on master

Getting 'permanently moved' results on 200/OK WARC entries

I'm getting 'permanently moved to here' results when loading up a WARC I made. The word 'here' is a link, and when I click it, it just reloads the same 'permanently moved' page. This would make sense if the URL in question had a result of 30x in my WARC, however I'm getting this for some 200/OK request/responses in my WARC. Any idea why this would be?

Thank you for your time.

Unable to launch 1.1.2 on OS X

Using the version 1.1.1 binary on OS X 10.10.2, the .app shows the icon in the dock, then immediately quits. I built from source using the shell script and ran the webarchiveplayer in the dist folder that accompanies the .app. This produced the following error on the command line, which might be the culprit.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named archiveplayer.archiveplayer

Fails to run on docker

$ docker run -ti python bash

$ git clone https://github.com/ikreymer/webarchiveplayer

$ cd webarchiveplayer

$ python setup.py install
running install
running bdist_egg
running egg_info
writing webarchiveplayer.egg-info/PKG-INFO
writing dependency_links to webarchiveplayer.egg-info/dependency_links.txt
writing entry points to webarchiveplayer.egg-info/entry_points.txt
writing requirements to webarchiveplayer.egg-info/requires.txt
writing top-level names to webarchiveplayer.egg-info/top_level.txt
reading manifest file 'webarchiveplayer.egg-info/SOURCES.txt'
writing manifest file 'webarchiveplayer.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/version.py -> build/bdist.linux-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/__init__.py -> build/bdist.linux-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/archiveplayer.py -> build/bdist.linux-x86_64/egg/archiveplayer
copying build/lib/archiveplayer/pagedetect.py -> build/bdist.linux-x86_64/egg/archiveplayer
creating build/bdist.linux-x86_64/egg/archiveplayer/templates
copying build/lib/archiveplayer/templates/index.html -> build/bdist.linux-x86_64/egg/archiveplayer/templates
copying build/lib/archiveplayer/templates/pagelist_search.html -> build/bdist.linux-x86_64/egg/archiveplayer/templates
byte-compiling build/bdist.linux-x86_64/egg/archiveplayer/version.py to version.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/archiveplayer/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/archiveplayer/archiveplayer.py to archiveplayer.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/archiveplayer/pagedetect.py to pagedetect.cpython-36.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying webarchiveplayer.egg-info/zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
creating 'dist/webarchiveplayer-1.4.7-py3.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing webarchiveplayer-1.4.7-py3.6.egg
Removing /usr/local/lib/python3.6/site-packages/webarchiveplayer-1.4.7-py3.6.egg
Copying webarchiveplayer-1.4.7-py3.6.egg to /usr/local/lib/python3.6/site-packages
webarchiveplayer 1.4.7 is already the active version in easy-install.pth
Installing webarchiveplayer script to /usr/local/bin

Installed /usr/local/lib/python3.6/site-packages/webarchiveplayer-1.4.7-py3.6.egg
Processing dependencies for webarchiveplayer==1.4.7
Searching for requests==2.18.4
Best match: requests 2.18.4
Processing requests-2.18.4-py3.6.egg
requests 2.18.4 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/requests-2.18.4-py3.6.egg
Searching for waitress==1.1.0
Best match: waitress 1.1.0
Processing waitress-1.1.0-py3.6.egg
waitress 1.1.0 is already the active version in easy-install.pth
Installing waitress-serve script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/waitress-1.1.0-py3.6.egg
Searching for pywb==2.0.1
Best match: pywb 2.0.1
Processing pywb-2.0.1-py3.6.egg
pywb 2.0.1 is already the active version in easy-install.pth
Installing pywb script to /usr/local/bin
Installing wayback script to /usr/local/bin
Installing cdx-server script to /usr/local/bin
Installing live-rewrite-server script to /usr/local/bin
Installing cdx-indexer script to /usr/local/bin
Installing wb-manager script to /usr/local/bin
Installing warcserver script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/pywb-2.0.1-py3.6.egg
Searching for urllib3==1.22
Best match: urllib3 1.22
Processing urllib3-1.22-py3.6.egg
urllib3 1.22 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/urllib3-1.22-py3.6.egg
Searching for idna==2.6
Best match: idna 2.6
Processing idna-2.6-py3.6.egg
idna 2.6 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/idna-2.6-py3.6.egg
Searching for chardet==3.0.4
Best match: chardet 3.0.4
Processing chardet-3.0.4-py3.6.egg
chardet 3.0.4 is already the active version in easy-install.pth
Installing chardetect script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/chardet-3.0.4-py3.6.egg
Searching for certifi==2018.1.18
Best match: certifi 2018.1.18
Processing certifi-2018.1.18-py3.6.egg
certifi 2018.1.18 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/certifi-2018.1.18-py3.6.egg
Searching for wsgiprox==1.4.1
Best match: wsgiprox 1.4.1
Processing wsgiprox-1.4.1-py3.6.egg
wsgiprox 1.4.1 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/wsgiprox-1.4.1-py3.6.egg
Searching for Werkzeug==0.14.1
Best match: Werkzeug 0.14.1
Processing Werkzeug-0.14.1-py3.6.egg
Werkzeug 0.14.1 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/Werkzeug-0.14.1-py3.6.egg
Searching for webencodings==0.5.1
Best match: webencodings 0.5.1
Processing webencodings-0.5.1-py3.6.egg
webencodings 0.5.1 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/webencodings-0.5.1-py3.6.egg
Searching for webassets==0.12.1
Best match: webassets 0.12.1
Processing webassets-0.12.1-py3.6.egg
webassets 0.12.1 is already the active version in easy-install.pth
Installing webassets script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/webassets-0.12.1-py3.6.egg
Searching for warcio==1.5.1
Best match: warcio 1.5.1
Processing warcio-1.5.1-py3.6.egg
warcio 1.5.1 is already the active version in easy-install.pth
Installing warcio script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/warcio-1.5.1-py3.6.egg
Searching for surt==0.3.0
Best match: surt 0.3.0
Processing surt-0.3.0-py3.6.egg
surt 0.3.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/surt-0.3.0-py3.6.egg
Searching for six==1.11.0
Best match: six 1.11.0
Processing six-1.11.0-py3.6.egg
six 1.11.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/six-1.11.0-py3.6.egg
Searching for redis==2.10.6
Best match: redis 2.10.6
Processing redis-2.10.6-py3.6.egg
redis 2.10.6 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/redis-2.10.6-py3.6.egg
Searching for PyYAML==3.12
Best match: PyYAML 3.12
Processing PyYAML-3.12-py3.6-linux-x86_64.egg
PyYAML 3.12 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/PyYAML-3.12-py3.6-linux-x86_64.egg
Searching for portalocker==1.1.0
Best match: portalocker 1.1.0
Processing portalocker-1.1.0-py3.6.egg
portalocker 1.1.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/portalocker-1.1.0-py3.6.egg
Searching for Jinja2==2.8.1
Best match: Jinja2 2.8.1
Processing Jinja2-2.8.1-py3.6.egg
Jinja2 2.8.1 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/Jinja2-2.8.1-py3.6.egg
Searching for gevent==1.2.2
Best match: gevent 1.2.2
Processing gevent-1.2.2-py3.6-linux-x86_64.egg
gevent 1.2.2 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/gevent-1.2.2-py3.6-linux-x86_64.egg
Searching for brotlipy==0.7.0
Best match: brotlipy 0.7.0
Processing brotlipy-0.7.0-py3.6-linux-x86_64.egg
brotlipy 0.7.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/brotlipy-0.7.0-py3.6-linux-x86_64.egg
Searching for certauth==1.2
Best match: certauth 1.2
Processing certauth-1.2-py3.6.egg
certauth 1.2 is already the active version in easy-install.pth
Installing certauth script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/certauth-1.2-py3.6.egg
Searching for tldextract==2.2.0
Best match: tldextract 2.2.0
Processing tldextract-2.2.0-py3.6.egg
tldextract 2.2.0 is already the active version in easy-install.pth
Installing tldextract script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages/tldextract-2.2.0-py3.6.egg
Searching for MarkupSafe==1.0
Best match: MarkupSafe 1.0
Processing MarkupSafe-1.0-py3.6-linux-x86_64.egg
MarkupSafe 1.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/MarkupSafe-1.0-py3.6-linux-x86_64.egg
Searching for greenlet==0.4.13
Best match: greenlet 0.4.13
Processing greenlet-0.4.13-py3.6-linux-x86_64.egg
greenlet 0.4.13 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/greenlet-0.4.13-py3.6-linux-x86_64.egg
Searching for cffi==1.11.4
Best match: cffi 1.11.4
Processing cffi-1.11.4-py3.6-linux-x86_64.egg
cffi 1.11.4 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/cffi-1.11.4-py3.6-linux-x86_64.egg
Searching for pyOpenSSL==17.5.0
Best match: pyOpenSSL 17.5.0
Processing pyOpenSSL-17.5.0-py3.6.egg
pyOpenSSL 17.5.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/pyOpenSSL-17.5.0-py3.6.egg
Searching for setuptools==38.5.1
Best match: setuptools 38.5.1
Adding setuptools 38.5.1 to easy-install.pth file
Installing easy_install script to /usr/local/bin
Installing easy_install-3.6 script to /usr/local/bin

Using /usr/local/lib/python3.6/site-packages
Searching for requests-file==1.4.3
Best match: requests-file 1.4.3
Processing requests_file-1.4.3-py3.6.egg
requests-file 1.4.3 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/requests_file-1.4.3-py3.6.egg
Searching for pycparser==2.18
Best match: pycparser 2.18
Processing pycparser-2.18-py3.6.egg
pycparser 2.18 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/pycparser-2.18-py3.6.egg
Searching for cryptography==2.1.4
Best match: cryptography 2.1.4
Processing cryptography-2.1.4-py3.6-linux-x86_64.egg
cryptography 2.1.4 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/cryptography-2.1.4-py3.6-linux-x86_64.egg
Searching for asn1crypto==0.24.0
Best match: asn1crypto 0.24.0
Processing asn1crypto-0.24.0-py3.6.egg
asn1crypto 0.24.0 is already the active version in easy-install.pth

Using /usr/local/lib/python3.6/site-packages/asn1crypto-0.24.0-py3.6.egg
Finished processing dependencies for webarchiveplayer==1.4.7

$ webarchiveplayer
Traceback (most recent call last):
  File "/usr/local/bin/webarchiveplayer", line 11, in <module>
    load_entry_point('webarchiveplayer==1.4.7', 'console_scripts', 'webarchiveplayer')()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 572, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2755, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2408, in load
    return self.resolve()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2414, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/webarchiveplayer-1.4.7-py3.6.egg/archiveplayer/archiveplayer.py", line 3, in <module>
ModuleNotFoundError: No module named 'pywb.framework'
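The dependency log above shows pywb 2.0.1 being resolved, while archiveplayer.py imports `pywb.framework`, which suggests the installed pywb no longer ships that submodule. A minimal diagnostic sketch (the helper name `module_available` is hypothetical, not part of webarchiveplayer or pywb) that checks whether a dotted module path is importable without actually running the player:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the dotted module path `name` (e.g. 'pywb.framework') can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent
        # (or a parent that is not a package) raises instead of returning None
        return False


if __name__ == "__main__":
    for mod in ("pywb", "pywb.framework"):
        print(mod, "available" if module_available(mod) else "missing")
```

If `pywb` is reported as available but `pywb.framework` as missing, the installed pywb release is likely incompatible with this version of the player.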

Is there a clean way to run multiple instances of this?

Hi,

I'd like to be able to run several of these simultaneously, for multiple different WARCs, but each one needing a new port causes some issues...

I guess the easiest solution would be to make it cleanly increment ports; would it also be possible to spawn webarchiveplayer with multiple WARCs at once, as a list of arguments passed to the script?
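A rough sketch of both ideas, assuming a hypothetical command line (this is not webarchiveplayer's actual CLI): accept several archive paths as positional arguments, and when no port is given, pick the first port from 8090 upward that can currently be bound.

```python
import argparse
import socket


def find_free_port(start: int = 8090, attempts: int = 100, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already in use, try the next one
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


def parse_args(argv=None):
    """Hypothetical CLI: one or more archive files plus an optional port."""
    parser = argparse.ArgumentParser(description="Serve one or more WARC/ARC files")
    parser.add_argument("archives", nargs="+", help="WARC or ARC files to load")
    parser.add_argument("--port", type=int, default=None,
                        help="port to listen on; defaults to the first free port from 8090")
    return parser.parse_args(argv)
```

A launcher would then use `args.port or find_free_port()`. Note that probing by binding and releasing is inherently racy: another process can grab the port between the probe and the real server start, so the server should still handle a bind failure.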

Can't be used as default application for WARCs on OS X

If I right-click a WARC, choose 'Get Info', select webarchiveplayer as the default application for WARC files and hit 'Change All', all I get is "The operation can't be completed. An unexpected error occurred (error code -10813)." I expect it to change the file association instead.

error: requests 2.5.1 is installed but requests<3,>=2.9.1 is required by set(['tldextract'])

jaap@jaap:/opt$ cat /etc/debian_version
8.4
jaap@jaap:/opt$ uname -a
Linux jaap 4.4.0-0.bpo.1-amd64 #1 SMP Debian 4.4.6-1~bpo8+1 (2016-03-20) x86_64 GNU/Linux
jaap@jaap:/opt$ git clone https://github.com/ikreymer/webarchiveplayer.git; cd webarchiveplayer
Cloning into 'webarchiveplayer'...
remote: Counting objects: 401, done.
remote: Total 401 (delta 0), reused 0 (delta 0), pack-reused 401
Receiving objects: 100% (401/401), 132.99 MiB | 1.32 MiB/s, done.
Resolving deltas: 100% (223/223), done.
Checking connectivity... done.
/opt/webarchiveplayer
jaap@jaap:/opt/webarchiveplayer$ python setup.py install
running install
running bdist_egg
running egg_info
...
Installed /usr/local/lib/python2.7/dist-packages/tldextract-2.0rc1-py2.7.egg
Searching for requests-file>=1.4
Reading https://pypi.python.org/simple/requests-file/
Best match: requests-file 1.4
Downloading https://pypi.python.org/packages/source/r/requests-file/requests-file-1.4.tar.gz#md5=fe475905d26986ee5fe77d9bcae3efcb
Processing requests-file-1.4.tar.gz
Writing /tmp/easy_install-smkdnc/requests-file-1.4/setup.cfg
Running requests-file-1.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-smkdnc/requests-file-1.4/egg-dist-tmp-NW3O5e
zip_safe flag not set; analyzing archive contents...
Moving requests_file-1.4-py2.7.egg to /usr/local/lib/python2.7/dist-packages
Adding requests-file 1.4 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/requests_file-1.4-py2.7.egg
error: requests 2.5.1 is installed but requests<3,>=2.9.1 is required by set(['tldextract'])
jaap@jaap:/opt/webarchiveplayer
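The error says the already installed requests 2.5.1 falls outside the range `>=2.9.1,<3` that tldextract declares. A minimal sketch of that range check using plain tuple comparison (the helpers are hypothetical and only handle purely numeric versions such as `2.5.1`; real tooling uses full PEP 440 parsing, e.g. `packaging.specifiers.SpecifierSet`):

```python
def parse_version(text: str) -> tuple:
    """Turn a plain numeric release string like '2.5.1' into (2, 5, 1)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str, minimum: str, below: str) -> bool:
    """True if minimum <= installed < below, i.e. the spec '>=minimum,<below'."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)
```

With these, `satisfies("2.5.1", "2.9.1", "3")` is `False` because 5 < 9 in the second component, which is exactly the conflict reported. Upgrading requests into that range, or installing the player into a virtualenv, typically resolves it.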

Auto-index WARC in specified directory.

Add an auto_load_dir option to config.yaml (in the webarchiveplayer section). If specified, check that directory and index all WARC/ARC files in it on startup.

e.g. config.yaml:

webarchiveplayer:
  auto_load_dir: ./warcs
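A minimal sketch of the directory scan such an option might perform, assuming archives are recognized by the extensions `.warc`, `.warc.gz`, `.arc` and `.arc.gz` (case-insensitive). The function name is hypothetical, and reading `auto_load_dir` from config.yaml (e.g. with PyYAML) is left out:

```python
from pathlib import Path

# Assumed set of archive extensions: plain and gzipped WARC/ARC
ARCHIVE_SUFFIXES = (".warc", ".warc.gz", ".arc", ".arc.gz")


def find_archives(auto_load_dir: str) -> list:
    """Return sorted paths of WARC/ARC files directly inside auto_load_dir."""
    root = Path(auto_load_dir)
    if not root.is_dir():
        return []  # missing or invalid directory: nothing to auto-load
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.name.lower().endswith(ARCHIVE_SUFFIXES)
    )
```

This only looks at the top level of the directory, matching "check specified directory"; swapping `iterdir()` for `rglob("*")` would also pick up archives in subdirectories.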

Not working for me

Hi, I have a WARC file I've downloaded, but for the life of me I cannot get it to work with your player. I'm using Windows 7 and don't have Python or anything like that installed. Do you know what the issue could be? Thank you.

enhancement: simple Wayback function

Hi

I really like the WebArchivePlayer because it makes viewing a WARC file very simple.

The only thing I miss in this player is the ability to switch from one archived version of a page to another (a simple Wayback function). It would be a great enhancement if the WebArchivePlayer could indicate when the displayed page also exists in another version and let the user change the date.

In my example (Bistro_201602.warc renamed to Bistro_201602.warc.jpg) you can see the two versions of the page “offer” (http://www.bistrot-bern.ch/offer).

Thanks!
Chlara
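Conceptually, this needs the index to be grouped by URL so that every capture timestamp of the current page can be offered. A small sketch over hypothetical `(url, timestamp)` pairs, assuming 14-digit `YYYYMMDDhhmmss` timestamps as used in CDX-style indexes (this is not the player's actual data model):

```python
from collections import defaultdict


def versions_by_url(captures):
    """Map each URL to the sorted list of timestamps at which it was captured."""
    versions = defaultdict(list)
    for url, timestamp in captures:
        versions[url].append(timestamp)
    # 14-digit timestamps sort chronologically as plain strings
    return {url: sorted(stamps) for url, stamps in versions.items()}


def other_versions(captures, url, current):
    """Timestamps of `url` other than the one currently displayed."""
    return [ts for ts in versions_by_url(captures).get(url, []) if ts != current]
```

A replay UI could then show a date picker whenever `other_versions(...)` is non-empty for the page being viewed.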

