openwayback's People

Contributors

aaronbinns, adam-miller, anjackson, aponb, ato, bitzl, bnfklm, csrster, dependabot[bot], egh, hhockx, ibnesayeed, ikreymer, jason-ellis, johnerikhalse, kngenie, kris-sigur, ldko, logpanic, machawk1, mohammedelsayyed, mtcjayne, nlevitt, obrienben, peveikko, psypherpunk, ptrourke, rogermathisen, rossh, vinaygoel

openwayback's Issues

Can StaticMapExclusionFilterFactory cope with SURT prefixes

I've been attempting to re-use the StaticMapExclusionFilterFactory code elsewhere, and although it says it can cope with SURT prefixes, I'm finding that they fail to load. The problem is that the system appears to catch the wrong exception.

https://github.com/internetarchive/wayback/blob/master/wayback-core/src/main/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterFactory.java#L115

This says it captures URIException, but I'm seeing them get through:

org.apache.commons.httpclient.URIException: gnu.inet.encoding.IDNAException: Contains non-LDH characters. (org,

https://github.com/iipc/wayback/blob/master/wayback-core/src/main/java/org/archive/wayback/accesscontrol/staticmap/StaticMapExclusionFilterFactory.java#L115
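One possible explanation, not confirmed by anything in this report, is that the catch clause names a different class that happens to share the simple name URIException. The sketch below only illustrates that general Java behaviour. Every class and method name in it (LegacyUri, HttpClientUri, SurtLoadSketch and its methods) is a hypothetical stand-in, not Wayback code.

```java
import java.util.Locale;

// Hypothetical stand-ins for two unrelated exception classes that share the
// simple name URIException (as if defined in two different packages).
class LegacyUri {
    static class URIException extends Exception {
        URIException(String message) { super(message); }
    }
}

class HttpClientUri {
    static class URIException extends Exception {
        URIException(String message) { super(message); }
    }
}

class SurtLoadSketch {
    // Rejects blank lines using the first exception type.
    static void checkNotBlank(String line) throws LegacyUri.URIException {
        if (line.trim().isEmpty()) {
            throw new LegacyUri.URIException("blank line");
        }
    }

    // Rejects anything outside letters, digits, hyphens and dots using the
    // second exception type; a SURT prefix such as "(org," fails here.
    static String toHostKey(String line) throws HttpClientUri.URIException {
        if (!line.matches("[A-Za-z0-9.-]+")) {
            throw new HttpClientUri.URIException("Contains non-LDH characters: " + line);
        }
        return line.toLowerCase(Locale.ROOT);
    }

    // Catches only one of the two types, so the other propagates to the caller.
    static String loadCatchingOneType(String line) throws HttpClientUri.URIException {
        try {
            checkNotBlank(line);
            return toHostKey(line);
        } catch (LegacyUri.URIException e) {
            return null; // line skipped
        }
    }

    // Catches both types, so a bad line is skipped instead of aborting the load.
    static String loadCatchingBothTypes(String line) {
        try {
            checkNotBlank(line);
            return toHostKey(line);
        } catch (LegacyUri.URIException | HttpClientUri.URIException e) {
            return null; // line skipped
        }
    }
}
```

If this is what is happening, checking the import at the top of the factory class against the fully qualified name in the trace would confirm or rule it out.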

Live Archiving Proxy integration

Live Archiving Proxy integration, requires SSL handling.

S'sheet line: 26
For whom? INA, BL, IA
Notes: Separate project, integrating LAP and Proxy-mode Wayback? e.g. forward request to LAP? Largely a documentation issue.
Est. Milestone: 2.0.x

Support WARC Revisit, inc. URL-Agnostic

Support WARC revisit, inc. URL-agnostic.

S'sheet line: 4
For whom? BNF, BL, DN
Notes: Revisits fixed. URL-agnostic working but at edge of spec?
Est. Milestone: 2.x.x

UI Localisation

S'sheet line: 29
For whom? BNF, NULI, BL, DN
Notes: There are holes, but framework ok. How to review? Consider switch to JSTL?
Est. Milestone: 2.0.x

HTTPS in proxy mode, full support

S'sheet line: 8
For whom? BNF, BL, DN
Notes: Same problem as for LAP. Documentation challenge. Look at LAP certificate magic.
Est. Milestone: 2.x.x

Better display of information panel/toolbar into frames and iframes

Better display of information panel/toolbar into frames and iframes.

S'sheet line: 13
For whom? BNF
Notes: Wayback toolbar multiple times, one in each frame. No scaling/frame-sense. At least the date in small frames. Perhaps test-suite driven solution?
Est. Milestone: 2.x.x

Clean up dependencies, enable CI and proper releases.

The goal is to reduce linkage to IA Maven artefacts, so that we can build using a Sonatype POM and make clean releases. This should also enable Travis CI builds to work and ensure that proper releases go to Maven Central.

UI Customisation

S'sheet line: 27
For whom? BNF, NULI, BL, DN, IA
Notes: Document overlay stuff, keep UI stuff together/modular.
Est. Milestone: TBC

Avoid dependency on heritrix-commons SNAPSHOT release

The code is critically dependent on the heritrix-commons codebase, mainly for the WARC readers/writers. API changes between 3.1.1 and 3.1.2-SNAPSHOT mean that we cannot rely on a proper release at the moment.

This is really rather bad practice for code stability, as the code we depend on can shift from underneath us. It is even more pressing now that the two codebases are under the control of different groups.

Easiest solution is probably for IA to make a 3.1.2 release. Best solution is probably to pull the ARC/WARC code out into a separate project AND/OR shift over to the JWAT implementation. Clumsy fallback would be to make an IIPC release of H3 and depend on that instead.

IDN support

S'sheet line: 10
For whom? BNF, DN
Notes: CDX/indexing consequences? Need a test case. Heritrix issues, maybe just H1, so need H1 and H3 test cases.
Est. Milestone: 2.x.x

Display CDX metadata

Display CDX metadata, e.g. WARC filename.

S'sheet line: 11
For whom? BNF, DN
Notes: Option to show somewhere in UI.
Est. Milestone: 2.x.x

Dynamic JSP inserts

Dynamic JSP inserts (based on host).

S'sheet line: 24
For whom? BNF
Notes: Currently implemented in JavaScript.
Est. Milestone: 2.x.x

adding livewebPrefix to wayback.xml

Hello, is it possible to add an example of livewebPrefix to wayback.xml?
This one did not work for me.
After:
<property name="replayPrefix" value="${wayback.urlprefix}" />
<property name="queryPrefix" value="${wayback.urlprefix}" />
<property name="staticPrefix" value="${wayback.urlprefix}" />
I added:
<property name="livewebPrefix" value="${wayback.urlprefix}/liveweb/" />
According to the documentation (http://archive-access.sourceforge.net/projects/wayback/administrator_manual.html and https://github.com/internetarchive/wayback/blob/master/wayback-core/src/main/java/org/archive/wayback/webapp/AccessPoint.java), I need to add it.

Extract page or subset of pages and associated embeds

S'sheet line: 15
For whom? BNF, IA
Notes: Download current page as (W)ARC. Multiple pages, e.g. every page visited in session, c.f. Zotero. Separate component that talks to the proxy? Solution not clear.
Est. Milestone: TBC

Better timeline

Decide on need to aggregate stats and then do some UI design.

WARCRecordToSearchResultAdapter.java needs a new case added to handle resource records

For whois records, WARCRecordToSearchResultAdapter.java needs a new case added to handle resource records.

(Ilya, see https://webarchive.jira.com/browse/ARI-3552)

/data/arcs/archive-it/ARCHIVEIT-4000-NONE-25860-20131018173129550-00000-wbgrp-crawl055.us.archive.org-6444.warc.gz
java.io.IOException: Failed parse of http status line.
at org.archive.io.RecoverableIOException.<init>(RecoverableIOException.java:36)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adaptWARCHTTPResponse(WARCRecordToSearchResultAdapter.java:273)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adaptInner(WARCRecordToSearchResultAdapter.java:105)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adapt(WARCRecordToSearchResultAdapter.java:74)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adapt(WARCRecordToSearchResultAdapter.java:52)
at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:54)
at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:52)
at org.archive.wayback.resourcestore.indexer.IndexWorker.main(IndexWorker.java:209)
java.io.IOException: Failed parse of http status line.
at org.archive.io.RecoverableIOException.<init>(RecoverableIOException.java:36)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adaptWARCHTTPResponse(WARCRecordToSearchResultAdapter.java:273)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adaptInner(WARCRecordToSearchResultAdapter.java:105)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adapt(WARCRecordToSearchResultAdapter.java:74)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adapt(WARCRecordToSearchResultAdapter.java:52)
at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:54)
at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:52)
at org.archive.wayback.resourcestore.indexer.IndexWorker.main(IndexWorker.java:209)
java.io.IOException: Failed parse of http status line.
at org.archive.io.RecoverableIOException.<init>(RecoverableIOException.java:36)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adaptWARCHTTPResponse(WARCRecordToSearchResultAdapter.java:273)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adaptInner(WARCRecordToSearchResultAdapter.java:105)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adapt(WARCRecordToSearchResultAdapter.java:74)
at org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.adapt(WARCRecordToSearchResultAdapter.java:52)
at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:54)
at org.archive.wayback.util.AdaptedIterator.hasNext(AdaptedIterator.java:52)
at org.archive.wayback.resourcestore.indexer.IndexWorker.main(IndexWorker.java:209)
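As a rough illustration of the kind of case being asked for: a WARC resource record carries its payload directly, with no HTTP status line, so sending it through HTTP response parsing would be expected to fail in the way the trace shows. The sketch below dispatches on the WARC-Type value. The class name, the Handling enum and the handlingFor method are hypothetical and are not taken from WARCRecordToSearchResultAdapter.java.

```java
import java.util.Locale;

class WarcTypeDispatchSketch {
    // Hypothetical outcomes an indexer might choose between.
    enum Handling { PARSE_HTTP_RESPONSE, INDEX_PAYLOAD_DIRECTLY, REVISIT, SKIP }

    // Maps a WARC-Type header value to a handling strategy.
    static Handling handlingFor(String warcType) {
        if (warcType == null) {
            return Handling.SKIP;
        }
        switch (warcType.trim().toLowerCase(Locale.ROOT)) {
            case "response":
                // Payload is expected to start with an HTTP status line.
                return Handling.PARSE_HTTP_RESPONSE;
            case "resource":
                // Payload is the resource itself, so no status line is parsed.
                return Handling.INDEX_PAYLOAD_DIRECTLY;
            case "revisit":
                return Handling.REVISIT;
            default:
                // warcinfo, request, metadata and others are not indexed here.
                return Handling.SKIP;
        }
    }
}
```

The point of the separate branch is that resource records never reach the status-line parser, which is an assumption about where a fix would sit rather than a description of the existing code.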

Dynamic collections under same replay point

Dynamic collections under same replay point. Proxy is fixed, so switching collections needed, added to session.

S'sheet line: 20
For whom? BNF
Notes: Related case for Archive-IT using HTTP Basic. Perhaps an API consideration.
Est. Milestone: TBC

jQuery getting stomped on

As was discussed over on SourceForge, Wayback's use of jQuery appears to be causing problems: it uses an older version (v1.3.2), which can stomp on previously loaded versions and break functionality on the page. I think it may also interfere with any jQuery plugins that have already been installed.

I don’t know what the best solution is, but it seems to me there are (at least) four options:

  1. From a brief search, it looks like jQuery takes backwards compatibility pretty seriously. So if Wayback must have a jQuery dependency, perhaps it could simply be upgraded to use the latest version.
  2. It looks like it’s possible for multiple versions of jQuery to co-exist on the same page. So perhaps Wayback could be updated to use jQuery in this way, so that it doesn’t interfere with archived pages that also use jQuery?
  3. Perhaps Wayback should test whether jQuery is already loaded before re-loading it? This is what was recommended in a previous bug report.
  4. Perhaps Wayback doesn't need jQuery at all anymore?

Personally, I think 4 is probably the best option, to keep things as simple as possible. But it's not entirely clear to me why jQuery was added, or what it and the associated plugins are currently used for.

Switch to Google Guava for public suffix API

While porting for #10, this happened:

One issue I noticed was that the archive-access code brings in entire heritrix-commons just for one class, which appears to be quite general purpose:

import org.archive.net.PublicSuffixes;

(indeed, there is a Google Guava class that does pretty much the same thing). This seems a little over the top, so I copied the PublicSuffixes to iipc-web-commons under the org.archive.url package, along with the corresponding unit tests and effective_tld data file.

This is rather clumsy, and given this is provided by Google Guava, there seems little point maintaining our own code (assuming theirs is kept up to date). The task is then to check that the Google one is well maintained and switch over to that instead of copying in code from elsewhere.

Problem with arcIndexer

Hi all,

The ARC indexer throws an exception like "Created (escaped) uuri > 2083" and the indexing process stops. How can I solve this problem?

Thanks all
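One way such a failure is sometimes handled, sketched here purely as an assumption and not as a documented Wayback option, is to skip over-long URIs per record so that a single one does not stop the whole run. The class and method names are hypothetical. The 2083 limit is taken from the error message above, which refers to the escaped form of the URI, whereas this simplified check uses the raw string length.

```java
import java.util.ArrayList;
import java.util.List;

class LongUriGuardSketch {
    // Limit quoted in the error message above.
    static final int MAX_URI_LENGTH = 2083;

    // Returns the URIs that are short enough to index, skipping the rest
    // instead of letting one over-long URI abort the whole batch.
    static List<String> indexable(List<String> uris) {
        List<String> kept = new ArrayList<>();
        for (String uri : uris) {
            if (uri.length() > MAX_URI_LENGTH) {
                continue; // skipped; a real indexer would probably log this
            }
            kept.add(uri);
        }
        return kept;
    }
}
```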

Dynamic canonicalisation rules (based on host)

S'sheet line: 22
For whom? BNF
Notes: e.g. hash-bang URIs, cross-component issues. Share examples first? API consideration. Shared rule-bank, versioned, between H and W. Perhaps a shared rule framework with local rules. Look at the two Wayback canonicalisation systems.
Est. Milestone: TBC

Better Spring

S'sheet line: 16
For whom? BNF
Est. Milestone: 2.x.x
