oduwsdl / mementoembed
A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (mementos).
License: MIT License
Sometimes all images on a page are loaded via CSS. MementoEmbed should interrogate the CSS files for URIs found in the background-image property.
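A minimal sketch of how such URIs could be pulled out of a stylesheet, assuming the CSS text has already been downloaded. The function name background_image_uris is hypothetical and not part of MementoEmbed. The regular expression is a simplification: it picks up only the first url(...) in each background or background-image declaration and does not handle CSS comments or escapes.

```python
import re
from urllib.parse import urljoin

# Matches the first url(...) inside a background or background-image
# declaration, with optional single or double quotes around the URI.
_BG_URL = re.compile(
    r"background(?:-image)?\s*:[^;}]*?url\(\s*(['\"]?)(.*?)\1\s*\)",
    re.IGNORECASE,
)


def background_image_uris(css_text, stylesheet_uri):
    """Return absolute URIs referenced by background declarations.

    Hypothetical helper: relative URIs are resolved against the URI of
    the stylesheet they were found in.
    """
    uris = []
    for match in _BG_URL.finditer(css_text):
        candidate = match.group(2).strip()
        if candidate:
            uris.append(urljoin(stylesheet_uri, candidate))
    return uris
```

Resolution against URI-Ms that embed a second scheme in the path should be checked separately, since generic URL joining may normalize the embedded "//".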
Run the server and load the page http://localhost:5550/, then enter a URI-M and click the "Create a Social Card" button. The first time, the card creation request is not processed. Instead, the page is redirected to http://localhost:5550/?# (notice the added ?# part in the URL), the form is cleared, and a message is logged in the console saying: we have failure with data: [object Object]. Any further attempts are processed as expected (unless the ?# is removed from the URL again).
Test URI-M to get started:
This will likely be re-opened as new, interesting URIs are tried, but this requirement is for a base number of URIs that are tested with the Internet Archive.
To start:
If these cards are intended to be used in other sites, it might be a good idea to utilize HTML CustomElements for bi-directional style isolation. We are using it in Reconstructive Banner. Here are some other resources that will help you get started with this fairly new HTML standard.
Every time I try to request a card for a URI-M like https://web.archive.org/web/20180604110141/http://www.example.com/, the card is missing the thumbnail image and a 404 is reported in the browser's developer console, pointing to a resource at http://localhost:5550/undefined. This might be a duplicate of #30.
Placing JS in HTML attribute values is generally considered obtrusive and poor practice: even when the rest of the JS is extracted into an external file, these handlers still leak into the markup. The click handlers (such as onclick="requestEmbed();") in the following code can be removed and bound externally in an unobtrusive way.
<button type="button" class="btn btn-primary" onclick="requestEmbed();">Create a Social Card</button>
<button type="reset" class="btn btn-secondary" onclick="clearEmbed();">Clear URL</button>
URI-Ms to get started:
That last one is important for the "all" collection that contains all publicly-available Archive-It URI-Ms.
Many files in this repo currently lack a trailing newline character, which is not standard practice.
This requirement will be delayed until we have URI-Ms to test.
"Please note that the Bibliotheca Alexandrina is currently migrating the web archive collection to a new storage system. Therefore, availability of archived webpages is not guaranteed for the time being. We appreciate your patience, and we look forward to the collection being fully available once again soon."
The following URI contains framesets that create an issue when searching for a striking image: https://www.webarchive.org.uk/wayback/archive/20050726120000/http://www.isb.org.uk/index.html
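One possible way to deal with framesets is to collect the frame sources first and then search each framed document for a striking image. The sketch below uses only the standard library's html.parser; the class and function names are hypothetical, and the returned values are the raw src attributes, which would still need to be resolved against the URI-M.

```python
from html.parser import HTMLParser


class FrameSourceParser(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_sources(html_text):
    """Return the src values of all frames found in html_text, in order."""
    parser = FrameSourceParser()
    parser.feed(html_text)
    return parser.sources
```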
The system gives these messages during testing:
DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
At some point the warn function will no longer be available.
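The fix is a rename at each call site, since warning is the supported spelling in the standard logging module. A minimal sketch, in which the logger name and the helper function are made up for illustration:

```python
import logging

# Hypothetical logger name; the real module may use a different one.
logger = logging.getLogger("mementoembed.example")


def report_missing_image(uri):
    """Log a warning using Logger.warning rather than the deprecated warn."""
    # Before: logger.warn("no striking image found for %s", uri)
    logger.warning("no striking image found for %s", uri)
```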
Sometimes the image returned generates a 404. This could be a temporary archive issue, but we should assume that it just wasn't crawled and isn't available.
FROM python:3.6.4-stretch
The base image is so specific that it will not automatically pick up even non-breaking security patches (i.e., the third component, the patch release, in the semver scheme). For example, python:3.6.4-stretch is already stale and python:3.6.5-stretch is published. We should perhaps use python:3.6-stretch instead to accommodate any security patch releases. Better yet, if Python 3.6 is not specifically required and any later version would work, we can use python:3-stretch instead.
# TODO: publish archiveit_utilities so that we don't need to do this
RUN git clone https://github.com/shawnmjones/archiveit_utilities.git
RUN cd archiveit_utilities && pip install .
Each RUN instruction adds one layer to the image. It is better to put related tasks and any associated cleanup in a single layer for a cleaner image. Hence, the above instructions can be consolidated into a single one.
From testing, this message shows up:
mementosurrogate.py:785: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
It is related to the readability library.
After booting the server when the first card creation request arrives, the server logs a warning.
/app/mementoembed/mementosurrogate.py:785: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if maxpara:
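The warning is raised when an element object is tested for truth, because an element with no children evaluates as false even though it exists. A minimal sketch of the explicit test, using the standard library's ElementTree as a stand-in for whatever element type maxpara actually is (an assumption; the function name is also hypothetical):

```python
import xml.etree.ElementTree as ET


def first_paragraph_text(root):
    """Return the stripped text of the first <p> under root, or None."""
    para = root.find(".//p")
    # `if para:` would warn, and would also be False for a childless <p>.
    # Comparing against None tests for presence, which is what is meant.
    if para is not None:
        return (para.text or "").strip()
    return None
```

If the intent of the original test was "has child elements", len(elem) > 0 is the explicit form instead.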
Some URI-Ms to get started:
This URI-M contains image URIs with the data scheme:
http://perma-archives.org/warc/20180501135244/https://www.flexispy.com/
They should be processed.
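A data URI carries the image bytes inline, so no network request is needed to obtain them. The sketch below decodes one into a media type and raw bytes following the RFC 2397 layout; decode_data_uri is a hypothetical helper, and it does not validate the media type or its parameters.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Return (media_type, raw_bytes) for a data: URI."""
    if uri[:5].lower() != "data:":
        raise ValueError("not a data URI")
    header, sep, payload = uri[5:].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # RFC 2397 default when the media type is omitted.
    media_type = header or "text/plain;charset=US-ASCII"
    raw = unquote_to_bytes(payload)  # undo any percent-encoding
    if is_base64:
        raw = base64.b64decode(raw)
    return media_type, raw
```

The returned bytes can then be handed to the same image-inspection code used for images fetched over HTTP.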
Ensure that a badge is placed on the README so that users understand the degree of stability of the current version of the code.
Is there anything we can do about this?
ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.86.26', 50357), raddr=('207.241.225.186', 443)>
This only shows up during unit tests and appears to be related to:
psf/requests#3912
MementoEmbed should at least link to the CarbonDate service for a given URI. Perhaps the CarbonDate endpoint could be made configurable?
Once the text snippets and thumbnails are implemented, this combination should not require too much additional effort.
Right now, the following error message appears on a red background:
The URL you supplied (https://www.flexispy.com/) is not a memento or comes from an archive that is not Memento-Compliant.
For a live web resource, you can create a memento that resides on the web in the following ways:
* Using the Internet Archive's Save Page Now button.
* Saving the web page at Archive.is
* Using the ArchiveNow service.
* Using a browser plugin, like Mink.
Happy Memento Making!
The red background is not friendly and there should be links to the recommended services.
To start, try this URI-M:
It appears that MementoEmbed does not execute properly at first load. If a user submits a URI-M at /, the system reloads the page to /#? rather than submitting the request. Once this reload has occurred, subsequent URI-M submissions are successful.
Some URI-Ms to get started:
MementoEmbed should at least link to the MementoDamage service for a given URI. Perhaps the MementoDamage endpoint could be made configurable?
On the bottom of the card, there is a datetime and source, but the time is ambiguous because it lacks a timezone. "GMT", "Z", or whatever is applicable ought to be appended.
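Memento-Datetime values are expressed in GMT, so one option is to normalize to UTC and append "Z" when rendering. A minimal sketch, assuming the card's datetime is available as a Python datetime and that naive values are already in GMT (the function name and output format are illustrative only):

```python
from datetime import datetime, timezone


def format_memento_datetime(dt):
    """Render dt as an unambiguous UTC timestamp ending in 'Z'."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime here already represents GMT.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```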
MementoEmbed 6563c01 using Docker.
I've noticed that the system does not give the user a good message when the server is no longer running. It should say something constructive to the user.
In spite of using a custom heuristic with cachecontrol, the system still skips the cache for some requests. Because web archives will likely block too many requests for the same URI-M, this will need to be alleviated for automated testing to be used with Travis CI.
Specifically Archive-It and archive.is do not appear to display their favicons.
Currently, issues are downloaded sequentially. Using the requests-futures library, one can run multiple simultaneous requests.
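To illustrate the pattern without depending on requests-futures, the sketch below uses the standard library's concurrent.futures, which offers a comparable futures-based model. This is a swapped-in technique, not the requests-futures API. The fetch argument is a placeholder for whatever function performs a single request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(uris, fetch, max_workers=8):
    """Call fetch(uri) for every URI concurrently.

    Returns a dict mapping each URI to its result, or to the exception
    raised while fetching it, so one failure does not abort the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {uri: pool.submit(fetch, uri) for uri in uris}
        for uri, future in futures.items():
            try:
                results[uri] = future.result()
            except Exception as exc:
                results[uri] = exc
    return results
```

Keeping max_workers small is a deliberate choice here, since web archives may throttle or block clients that issue many simultaneous requests.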
After reviewing MementoEmbed's capabilities with a variety of images, it has become clear that an image scoring function will prove to be better than just selecting the largest image.
When a card generation request is made, it would be a good idea to have the URI-M in question be part of the URL to make the request page sharable.
We want to support other forms of surrogates besides social cards and thumbnails. Text snippet should be relatively easy to implement.
Some URI-Ms to start with:
A feature of oEmbed is the generation of thumbnails based on given height/width requirements. MementoEmbed needs to support this.
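oEmbed consumers express these requirements through the maxwidth and maxheight request parameters. A minimal sketch of the size calculation, assuming the source image dimensions are known; fit_within is a hypothetical helper that scales down while preserving aspect ratio and never scales up.

```python
def fit_within(width, height, maxwidth=None, maxheight=None):
    """Return (w, h) scaled to fit the given bounds, keeping aspect ratio."""
    scale = 1.0  # never enlarge beyond the original size
    if maxwidth is not None:
        scale = min(scale, maxwidth / width)
    if maxheight is not None:
        scale = min(scale, maxheight / height)
    # Truncate so the result cannot exceed a bound; keep at least 1 pixel.
    return max(1, int(width * scale)), max(1, int(height * scale))
```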
Sometimes a 404 is returned for a memento of a favicon. The resulting URI-M should not be used.
Keeping the list of dependencies in setup.py (or any other files) sorted makes it more maintainable.
This project has issues with resolving content from archive.is URI-Ms. A potential solution may exist in the use of the ZIP URI containing the content that is rendered within the archive.is banner.
In the case of http://webarchive.parliament.uk/20100426094738/http://www.publications.parliament.uk/pa/cm199899/cmselect/cmagric/141/9020402.htm, the same image is checked multiple times, resulting in unnecessary network traffic.
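One way to avoid the repeated checks is to memoize the per-URI check for the duration of a request. The sketch below wraps an arbitrary check function with functools.lru_cache; make_cached_checker and the check callable are hypothetical names, and an unbounded in-memory cache is an assumption that may not suit a long-running service.

```python
from functools import lru_cache


def make_cached_checker(check):
    """Wrap check(uri) so each distinct URI is only checked once."""

    @lru_cache(maxsize=None)
    def cached(uri):
        return check(uri)

    return cached
```

Creating a fresh wrapper per card request keeps results from going stale across requests while still collapsing duplicates within one page.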
We usually use the MIT license for most of our code, but you can choose whichever feels more suitable to you.