beatonma / django-wm
Automatic Webmention functionality for Django models
Home Page: https://beatonma.org/webmentions_tester/
License: GNU General Public License v3.0
The HCard parser currently uses mf2py to get a list of "items" and only looks for an h-card item at the top level of that, i.e. alongside the h-entry item.
I think it's also OK to have a p-author h-card within an h-entry to indicate its author. For example, from the Microformats h-entry wiki page:
<article class="h-entry">
  <h1 class="p-name">Microformats are amazing</h1>
  <p>Published by <a class="p-author h-card" href="http://example.com">W. Developer</a>
     on <time class="dt-published" datetime="2013-06-13 12:00:00">13<sup>th</sup> June 2013</time></p>
  <p class="p-summary">In which I extoll the virtues of using microformats.</p>
  <div class="e-content">
    <p>Blah blah blah</p>
  </div>
</article>
I guess this would require adding to the _parse_hcard() method so that, if the item is an h-entry, it looks for an author item within it (or more than one?) and checks whether it has an 'h-card' type?
I guess any author h-card should take precedence over an h-card at the same level as the h-entry...?
It all gets a bit complicated!
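The precedence idea above could be sketched roughly as follows. This is only an illustration working on the dictionary shape that mf2py produces ("type", "properties", nested items as dicts); find_author_hcard and the sample data are hypothetical and not part of django-wm.

```python
from typing import Optional


def find_author_hcard(items: list) -> Optional[dict]:
    """Return an h-card, preferring one nested as the author of an h-entry.

    Falls back to the first top-level h-card if no h-entry author is found.
    """
    top_level_hcard = None
    for item in items:
        types = item.get("type", [])
        if "h-entry" in types:
            for author in item.get("properties", {}).get("author", []):
                # A plain-text author is a str; an embedded h-card is a dict.
                if isinstance(author, dict) and "h-card" in author.get("type", []):
                    return author
        elif "h-card" in types and top_level_hcard is None:
            top_level_hcard = item
    return top_level_hcard


# Hypothetical parsed data: a site-wide h-card alongside an h-entry with its own author.
parsed_items = [
    {"type": ["h-card"], "properties": {"name": ["Site Owner"]}},
    {
        "type": ["h-entry"],
        "properties": {
            "name": ["Microformats are amazing"],
            "author": [
                {
                    "type": ["h-card"],
                    "properties": {
                        "name": ["W. Developer"],
                        "url": ["http://example.com"],
                    },
                    "value": "W. Developer",
                }
            ],
        },
    },
]
author = find_author_hcard(parsed_items)
```

Here the nested author wins even though the top-level h-card appears first in the list; whether more than one author should be supported is left open, as in the question above.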
I can’t seem to get incoming webmentions working correctly on a Wagtail install, and I’m hoping you can offer some pointers.
I get the "Unable to find matching page on our server for url" error when I try to process incoming mentions.
Background Info
I am using Wagtail as my CMS. I have a PostPage model for blog posts. It extends Wagtail’s Page model, and includes the mixins from wagtail-seo and django-wm.
The mentions app and middleware are enabled in my settings, and mentions.urls is in my urls.py.
I have the following in my settings:
DOMAIN_NAME = "polytechnic.co.uk"
WEBMENTIONS_USE_CELERY = False
WEBMENTIONS_AUTO_APPROVE = False
WEBMENTIONS_INCOMING_TARGET_MODEL_REQUIRED = False
WEBMENTIONS_ALLOW_SELF_MENTIONS = False
The PostPage model implements the all_text and get_absolute_url methods:
class PostPage(SeoMixin, Page, MentionableMixin):
    # field definitions
    def all_text(self):
        return f"{self.overview} {self.body}"

    def get_absolute_url(self):
        # Avoid circular imports
        from blog.templatetags.blogapp_tags import post_page_date_slug_url
        return post_page_date_slug_url(self, self.blog_page)
get_absolute_url() seems to be doing the right thing:
>>> pp = PostPage.objects.get(title="dConstruct 2022")
>>> pp.title
'dConstruct 2022'
>>> pp.get_absolute_url()
'/blog/2022/10/dconstruct-2022/'
Is there something else I’m missing?
django-wm includes concrete models that require database migrations, but no migration files have been included in the package. Running makemigrations generates at least one new migration file inside the package directory (in site-packages or wherever django-wm is installed).
If these concrete models are altered in a later update, upgrading django-wm may start to get painful for downstream users. Even if a user creates the initial migration for the mentions app, upgrading to a new version may actually remove that file, as the full source of the package is replaced with the newer version. At that point, running makemigrations would again create a 0001_initial migration file.
Because a migration with the matching filename would already be tagged in the database as "completed", the changes from the new version would not be migrated at all. Users may start to experience errors due to a mismatch between schema and models that can only be resolved with some manual intervention.
Recommendations:
0001_initial migration file for the mentions app into the project
The HCard.from_soup() method tries to loop through all of the items that mf2py has found in the page source:
for item in parsed_data.get("items", []):
    try:
        return _parse_hcard(item, save)
    except NotEnoughData as e:
        log.debug(e)
        continue
But this currently results in it giving up after the first item, unless that item happened to raise a NotEnoughData exception.
e.g. if there are two items found, an h-entry and then an h-card, _parse_hcard() will return None when it looks at the h-entry, and so from_soup() will immediately do the same, before it can send the h-card to _parse_hcard().
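One possible shape for a fix, as a minimal sketch: only return from the loop when the parser actually produced something. The parse_hcard function and NotEnoughData class below are simplified stand-ins written for this example, not the real django-wm code.

```python
import logging
from typing import Optional

log = logging.getLogger(__name__)


class NotEnoughData(Exception):
    """Stand-in for the exception named in the report."""


def parse_hcard(item: dict) -> Optional[dict]:
    """Hypothetical stand-in for _parse_hcard(): returns None for non-h-card items."""
    if "h-card" not in item.get("type", []):
        return None
    name = item.get("properties", {}).get("name")
    if not name:
        raise NotEnoughData("h-card has no name")
    return {"name": name[0]}


def first_hcard(parsed_data: dict) -> Optional[dict]:
    """Keep looking through the items until one yields a usable h-card."""
    for item in parsed_data.get("items", []):
        try:
            hcard = parse_hcard(item)
        except NotEnoughData as e:
            log.debug(e)
            continue
        if hcard is not None:
            return hcard
    return None


# An h-entry followed by an h-card, as in the example described above.
sample = {
    "items": [
        {"type": ["h-entry"], "properties": {"name": ["A post"]}},
        {"type": ["h-card"], "properties": {"name": ["Jane"]}},
    ]
}
result = first_hcard(sample)
```

The only behavioural difference from the quoted loop is the `is not None` check before returning.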
A difficulty I've found integrating this into my existing site is with the requirement for the URL pattern for a MentionableMixin model to just have a slug.
I've added MentionableMixin to my Post model, but the URL pattern for its get_absolute_url() / detail view is like:
"<slug:blog_slug>/<int:year>/<int:month>/<int:day>/<slug:post_slug>/"
I could change post_slug to be just slug but the Post.slug field has a unique_for_date constraint on it, and there are many Posts with identical slugs, but with different dates. So the get_model_for_url_path() wouldn't be able to find a Post solely by using the slug field.
My first thought at a solution to this would be to add an optional setting to replace the get_model_for_url_path() function, something like this (as a default):
WEBMENTIONS_GET_MODEL_FUNCTION = "mentions.resolution.get_model_for_url_path"
I'm not sure that's the best name, or that it's the best idea, but still.
It would allow the replacement of that function with another that would receive a URL and optional ResolverMatch and either return the found model or else raise one of BadConfig or TargetDoesNotExist exceptions. It would give flexibility over exactly how that model should be found. Is that enough flexibility? Is it a bit too complicated to describe how to replace the function? Will this all end in tears?
As ever, I'm open to other, better ideas.
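For what it's worth, resolving a dotted-path setting like the one proposed above is a small amount of code. The sketch below uses only the standard library; load_callable is a hypothetical helper, and Django's own django.utils.module_loading.import_string does the same job inside a Django project.

```python
from importlib import import_module
from typing import Callable


def load_callable(dotted_path: str) -> Callable:
    """Import and return the object named by a 'package.module.attribute' string."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path to a callable")
    module = import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError as e:
        raise ImportError(
            f"Module {module_path!r} has no attribute {attr_name!r}"
        ) from e


# Demonstrated with a standard-library function, since the real setting value
# would point at a function inside a Django project.
resolver = load_callable("urllib.parse.urlparse")
```

A project would then set the (proposed, not existing) WEBMENTIONS_GET_MODEL_FUNCTION to the dotted path of its own lookup function.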
I am using custom forms on my site, but after upgrading from v3.1.0 to v4.0.0, the allow_outgoing_webmentions checkbox does not display. The form views look like this:
class AdminNoteCreate(LoginRequiredMixin, CreateView):
    model = Note
    template_name = "note.html"
    fields = [
        "text",
        "in_reply_to",
        "create_date",
        "rss_only",
        "allow_outgoing_webmentions",
    ]
And the template looks like this:
<h4>Note Meta</h4>
<div>
  {{ form.allow_outgoing_webmentions }}
  <label for="{{ form.allow_outgoing_webmentions.id_for_label }}">Send webmentions</label>
</div>
<div>
  {{ form.rss_only }}
  <label for="{{ form.rss_only.id_for_label }}">RSS only</label>
</div>
This is what I see:
I've just tried to install django-wm but hit an issue where its specified version of requests is older than that allowed by other packages I'm already using.
setup.cfg specifies requests ~= 2.20.0, so anything like 2.20.*. But requests is now up to 2.27.1.
Other things I'm using in this project, as examples, have >=2.2.1 (flickrapi), >=2.1.0 (twython), >=2.0,<3.0 (responses).
Any chance it could be given a more lenient requirement?
(Sorry, after coming here ages ago to ask about a non-celery version of django-wm, I disappeared and only now have I found time to get to grips with it again... and here I am causing trouble once more! These kinds of interrelated dependencies can be such a pain and I only assume I haven't hit a similar problem with my own packages because nobody uses them :) Thanks for your time.)
Hi,
I'm seeing a bug where a webmention is being continuously processed and keeps adding entries to the Webmentions page in admin (my cron job runs at 45 minutes past every hour).
Each entry has the note:
Unable to find matching page on our server for url 'https://polytechnic.co.uk/blog/2022/03/site-update/'
That message is a separate issue at my end in trying to get django-wm to play with Wagtail: I have other incoming webmentions that fail to find the post, but they aren't repeating like this.
There's nothing outstanding in the "Pending incoming webmentions" screen.
I've spoken with @philgyford (the source of the webmention) and he doesn't see anything strange at his end.
Any ideas on how to diagnose this? Happy to give you a database snapshot if needed.
python 3.7.3
Django 3.2.14
django-wm 3.1.0
If target and source URLs are successfully submitted via the form in WebmentionView then it returns:
return HttpResponse("Thank you, your webmention has been accepted.", status=202)
If this was replaced with a basic template then it would be possible for projects using the app to use the view, but to override both the mentions/webmention-submit-manual.html template and the new "thank you" response template, to keep appearances consistent.
With 4.0.0, when I run my site's tests I get dozens of warnings from here:
if not settings.DEBUG and scheme != "https":
    log.warning(
        f"settings.{SETTING_URL_SCHEME} should not be `http` when in production!"
    )
Because when running my tests, DEBUG is False and I'm not using https. It's not "in production". I can figure out how to stop my tests from showing warning-level log messages, but also part of me wonders if this is an appropriate message? Is it up to django-wm to police a site's use of http/https?
A minor quibble, so feel free to disagree and close this :)
Michael,
I receive this error when I am trying to add MentionableMixin to my "Posts" model.
ImportError: cannot import name 'MentionableMixin' from 'mentions' (/lib/python3.7/site-packages/mentions/__init__.py)
Do you have any thoughts on that? Thanks a ton in advance.
Best,
Rasul
I've added MentionableMixin to a Post model, that has statuses of "Draft" and "Published", and I'm trying to work out how I can ensure that webmentions are only sent when a Post object's saved and it's in "Published" state. At the moment, overriding the save() method, I can't figure out a way to do it.
My initial thought is to give MentionableMixin a get_allow_outgoing_webmentions() method that, by default, just returns the value of allow_outgoing_webmentions. Child classes could override this with something like:
def get_allow_outgoing_webmentions(self):
    if self.allow_outgoing_webmentions and self.status == LIVE:
        return True
    else:
        return False
Then the existing MentionableMixin.save() method could check that instead of allow_outgoing_webmentions before handling them.
I guess add a matching get_allow_incoming_webmentions() method for completeness too.
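To make the proposal concrete, here is a small plain-Python sketch of the hook pattern. MentionableSketch is a stand-in written for this example and deliberately not the real MentionableMixin; its save() only reports whether outgoing webmentions would be handled. LIVE and DRAFT are assumed status constants.

```python
LIVE = "live"
DRAFT = "draft"


class MentionableSketch:
    """Plain-Python stand-in for MentionableMixin; not the real class."""

    allow_outgoing_webmentions = True

    def get_allow_outgoing_webmentions(self) -> bool:
        # Default behaviour: defer to the stored flag.
        return self.allow_outgoing_webmentions

    def save(self) -> bool:
        """Return True if outgoing webmentions would be handled on this save."""
        return self.get_allow_outgoing_webmentions()


class Post(MentionableSketch):
    def __init__(self, status: str):
        self.status = status

    def get_allow_outgoing_webmentions(self) -> bool:
        # Only send webmentions once the post is actually published.
        return self.allow_outgoing_webmentions and self.status == LIVE


published = Post(LIVE).save()
draft = Post(DRAFT).save()
```

The base class checks the method rather than the attribute, so subclasses can add conditions without touching save().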
If I save a MentionableMixin object that contains links to anchors elsewhere on the page, those are treated as targets to which outgoing webmentions should be sent.
e.g. On this page there is this HTML:
<p id="s5"><a class="section-anchor" href="#s5" title="Link to this section">§</a> ...
and I end up with Outgoing Webmention Statuses that include:
etc. I guess filter out any links that are # fragment links to the page itself?
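A rough sketch of that filtering, using only urllib.parse from the standard library. The function names and the example URLs are made up for illustration; this is not how django-wm currently collects links.

```python
from urllib.parse import urldefrag, urljoin


def is_same_page_link(href: str, page_url: str) -> bool:
    """True if href resolves to page_url itself once any #fragment is ignored."""
    target, _ = urldefrag(urljoin(page_url, href))
    page, _ = urldefrag(page_url)
    return target == page


def outgoing_targets(hrefs, page_url):
    """Resolve hrefs against page_url and drop links back to the same page."""
    return [
        urljoin(page_url, href)
        for href in hrefs
        if not is_same_page_link(href, page_url)
    ]


page = "https://example.com/post/"
links = ["#s5", "https://example.com/post/#s3", "https://other.example/page#top", "/about/"]
targets = outgoing_targets(links, page)
```

Fragment links to other pages are kept, since those are still legitimate webmention targets.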
More of a question than an issue: I have a couple of legacy models that do not have a slug field, and am wondering if it is possible to enable mentions for them. I have tried defining a slug in the model:
@property
def slug(self):
    return self.id
… but it doesn't seem to work; I get:
File "[_PATH TO DJANGO..._]/venv/lib/python3.10/site-packages/django/db/models/sql/query.py", line 1709, in names_to_path
raise FieldError(
django.core.exceptions.FieldError: Cannot resolve keyword 'slug' into field. Choices are: [_MODEL FIELD NAMES..._]
)
Is there a way I can work around this?
Webmention has a quote field but, as far as I can tell, it's never set.
I assume it would be good to set it in mentions.tasks.incoming_webmentions.process_incoming_webmention() using a tweaked _update_wm() function, but I'm not sure if you have in mind what text the quote should be?
Maybe use BeautifulSoup to try a few ways of getting something representative from the source page? Maybe grab the .e-content's content (falling back to .h-entry, then trying <main>, then trying the meta description?), and then use the HTML-stripped start of that?
When sending a webmention to a site that returns multiple links in the header, the resolved endpoint URL gets mashed together with other link values. For example, if the site returns a header like this:
HTTP/1.1 200 OK
Date: Mon, 03 Oct 2022 19:33:58 GMT
Content-Type: text/html
Content-Length: 16887
Last-Modified: Sun, 02 Oct 2022 20:46:59 GMT
link: <https://websub.io>; rel="websub"
link: <https://websub.io>; rel="websub"
link: <https://webmention.io>; rel="webmention"
Referrer-Policy: no-referrer
X-Content-Type-Options: nosniff
Accept-Ranges: bytes
The URL endpoint resolves as:
https://websub.io>; rel="websub",<https://websub.io>; rel="websub",<https://webmention.io
I think it has something to do with the regex in the _get_endpoint_in_http_headers function; if the string "webmention" is found in the headers, it looks like it might be matching everything from the https at start of the first URL until it finds the webmention URL?
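As an illustration of how the greedy match could be avoided, the sketch below walks the combined header one `<url>; params` pair at a time and only returns the URL whose rel contains "webmention". It is a simplified take on Link header parsing written for this example, not the regex django-wm uses, and it does not cover every corner of the Link header grammar.

```python
import re
from typing import Optional

# One link-value: "<url>" followed by its parameters, up to the next "<".
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or unquoted; may hold several space-separated values.
REL_PARAM = re.compile(r'\brel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def webmention_endpoint(link_header: str) -> Optional[str]:
    """Return the first URL in a Link header whose rel includes 'webmention'."""
    for url, params in LINK_VALUE.findall(link_header):
        rel = REL_PARAM.search(params)
        if rel and "webmention" in rel.group(1).lower().split():
            return url
    return None


# Multiple link headers are typically merged into one comma-separated value.
combined = (
    '<https://websub.io>; rel="websub",'
    '<https://websub.io>; rel="websub",'
    '<https://webmention.io>; rel="webmention"'
)
endpoint = webmention_endpoint(combined)
```

Because `[^>]*` cannot cross a closing angle bracket, each URL is captured on its own rather than spanning several link values.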
Sorry, me again. I've found a couple of issues, which I don't think are related, so making two issues for them.
First, running ./manage.py pending_mentions manually I end up with a lot of lines like this:
The target URL could not be retrieved: Invalid URL '#s3': No scheme supplied. Perhaps you meant http://#s3?.
The target URL could not be retrieved: Invalid URL '#s8': No scheme supplied. Perhaps you meant http://#s8?.
The target URL could not be retrieved: Invalid URL '#s5': No scheme supplied. Perhaps you meant http://#s5?.
The target URL could not be retrieved: Invalid URL '#s9': No scheme supplied. Perhaps you meant http://#s9?.
The target URL could not be retrieved: Invalid URL '#s8': No scheme supplied. Perhaps you meant http://#s8?.
The target URL could not be retrieved: Invalid URL '#s8': No scheme supplied. Perhaps you meant http://#s8?.
The target URL could not be retrieved: Invalid URL '#s3': No scheme supplied. Perhaps you meant http://#s3?.
I had to stop it eventually, after 550 similar lines. It looks like it's still trying to send webmentions to #anchor links? My pages have a few of these in, e.g. the § markers on https://www.gyford.com/phil/writing/2022/05/15/weeknotes/
Not at all critical, but there are times when I have multiple links in a post and know that some of the sites don't accept webmentions. It would be cool if there was a way to exclude outgoing webmentions from being sent to those domains that are known not to accept them to stop them from clogging up the admin. I could see this being done either as a setting (e.g.):
WEBMENTIONS_EXCLUDE_DOMAINS = ["domain1.com", "domain2.com"]
…or even as a class added to a link on a case-by-case basis (e.g.):
<a class="wm-nosend" href="https://domain.com/page.html">link text</a>
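The settings-based variant might look something like this sketch. WEBMENTIONS_EXCLUDE_DOMAINS is the setting name proposed above, not an existing django-wm setting, and treating subdomains as excluded too is an assumption made for this example.

```python
from urllib.parse import urlsplit

# Proposed setting from the request above; not an existing django-wm option.
WEBMENTIONS_EXCLUDE_DOMAINS = ["domain1.com", "domain2.com"]


def is_excluded(url: str, excluded_domains=WEBMENTIONS_EXCLUDE_DOMAINS) -> bool:
    """True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in excluded_domains:
        domain = domain.lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


links = [
    "https://domain1.com/page.html",
    "https://www.domain2.com/post/",
    "https://notdomain1.com/",
    "https://example.org/article/",
]
to_send = [link for link in links if not is_excluded(link)]
```

Matching on the parsed hostname rather than a substring keeps notdomain1.com from being caught by the domain1.com rule.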
If I link to a page on my own site that doesn't represent an object with MentionableMixin then it registers a mention to that URL. This results in a Webmention object that has no target_object etc.
For example, on this page representing a Post (the only model on the site that has MentionableMixin) I link to this page https://www.gyford.com/phil/creators/8kk28/ . This results in these Webmentions:
With the relevant one like this:
I'm not sure why this happens, given the target URL (/phil/creators/8kk28/) doesn't match the URL of the only MentionableMixin model?
I've taken to running some tests across my projects' Admin classes and came across this error in django-wm's:
In QuotableAdmin it lists "hcard" in search_fields – but that's a ForeignKey and can't be searched on. Maybe it should be "hcard__name"?
I have fully implemented the django-wm library. I tested for incoming webmentions via https://django-wm.dev/. Everything works fine, I can receive the Hcard information and the mention itself.
When it comes to outgoing mentions, here I am running into a couple of issues. I have tested the outgoing mention using https://django-wm.dev/ and https://webmention.rocks/. Neither worked. The link to a "mentionable" post is https://rasulkireev.com/writings/wm-test-1.
The problem is that I am not receiving any errors, so I am not entirely sure what to do. @beatonma Do you have any ideas or suggestions? Thanks a ton in advance.
From v2.0.0 onwards, mf2py supports img alt text by default, which breaks incoming webmentions, since (I think!) the line hcard.avatar = urljoin(source_url, hcard.avatar) in mentions/tasks/incoming/remote.py expects hcard.avatar to be a string, but is being passed an object with value and alt instead of just the URL string. Trying to parse incoming webmentions throws TypeError: Cannot mix str and non-str arguments (see stack trace below).
The current requirements (mf2py>=1.1.2) will mean a clean install of django-wm will default to the latest version (currently 2.0.1). Downgrading to version 1.1.3 does solve the problem, however.
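A defensive way to handle both shapes, sketched under the assumption described above that newer mf2py versions may return a dict with "value" and "alt" keys for an image while older ones return a bare string. avatar_url is a hypothetical helper, not part of django-wm.

```python
from typing import Optional, Union
from urllib.parse import urljoin


def avatar_url(photo: Union[str, dict, None], source_url: str) -> Optional[str]:
    """Resolve an h-card photo to an absolute URL, whichever shape it arrives in."""
    if isinstance(photo, dict):
        # Newer shape: {"value": "<url>", "alt": "<alt text>"}
        photo = photo.get("value")
    if not photo:
        return None
    return urljoin(source_url, photo)


source = "https://example.com/post/"
old_style = avatar_url("/img/me.png", source)
new_style = avatar_url({"value": "/img/me.png", "alt": "Me"}, source)
```

Both calls give the same absolute URL, so the rest of the code would not need to care which mf2py version produced the data.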
I'm not entirely clear how or why this started or continued but, from what I can tell, going back to the first instance of the error in Sentry for my website...
When sending outgoing webmentions, requests timed out and generated an error. I think this meant that the code stopped, with the new PendingOutgoingContent object created but not deleted. Then next time the script runs, it creates another PendingOutgoingContent object and tries again, and repeat. A month later I actually notice this is happening and I have 37,000 outgoing webmention statuses, mostly from this post and then more duplicates from subsequent ones!
It's possible it's related to several of the links being from my website to other posts on my site (or #anchor links to the same page (#32)) and so the server is coping with both opening outgoing connections and handling the incoming ones - maybe if something's slow it uses up its connections and so everything gets even slower, etc, etc.
Anyway. Here's the traceback from the first time it happened:
TimeoutError: [Errno 110] Connection timed out
File "urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "urllib3/util/connection.py", line 95, in create_connection
raise err
File "urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f05e0f6a940>: Failed to establish a new connection: [Errno 110] Connection timed out
File "urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "urllib3/connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "urllib3/connection.py", line 358, in connect
self.sock = conn = self._new_conn()
File "urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
MaxRetryError: HTTPSConnectionPool(host='www.gyford.com', port=9494): Max retries exceeded with url: /webmentions/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f05e0f6a940>: Failed to establish a new connection: [Errno 110] Connection timed out'))
File "requests/adapters.py", line 440, in send
resp = conn.urlopen(
File "urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
ConnectionError: HTTPSConnectionPool(host='www.gyford.com', port=9494): Max retries exceeded with url: /webmentions/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f05e0f6a940>: Failed to establish a new connection: [Errno 110] Connection timed out'))
File "manage.py", line 22, in <module>
execute_from_command_line(sys.argv)
File "django/core/management/__init__.py", line 446, in execute_from_command_line
utility.execute()
File "django/core/management/__init__.py", line 440, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "django/core/management/base.py", line 414, in run_from_argv
self.execute(*args, **cmd_options)
File "django/core/management/base.py", line 460, in execute
output = self.handle(*args, **options)
File "mentions/management/commands/pending_mentions.py", line 28, in handle
handle_pending_webmentions(incoming=incoming, outgoing=outgoing)
File "mentions/tasks/scheduling.py", line 66, in handle_pending_webmentions
process_outgoing_webmentions(wm.absolute_url, wm.text)
File "mentions/util.py", line 46, in __call__
return func(*args, **kwargs)
File "mentions/tasks/outgoing_webmentions.py", line 82, in process_outgoing_webmentions
result = _process_link(source_urlpath, link_url)
File "mentions/tasks/outgoing_webmentions.py", line 145, in _process_link
success, status_code = _send_webmention(source_urlpath, endpoint, link_url)
File "mentions/tasks/outgoing_webmentions.py", line 237, in _send_webmention
response = requests.post(endpoint, data=payload)
File "requests/api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
And here's the "breadcrumbs" from Sentry (updated to correct image since I first posted this):
A couple of things that we could do:
Add a timeout to calls to requests, which the docs say should always be done. That would at least make failure happen more quickly. Given I was running the management command every 10 minutes (too frequently in retrospect!) the server was spending a lot of time trying to send webmentions!
PendingOutgoingContent object, so I'm not sure what should happen to it if a connection fails, but maybe there's something to do there?
FWIW, when I've used requests in the past this is what I've done, which feels rather too laborious but I pieced it together from trying to capture all the possible failures:
import requests


def send_request(url):
    error_message = ""
    try:
        response = requests.get(url, timeout=5)
    except requests.exceptions.ConnectionError:
        error_message = "Can't connect to domain."
    except requests.exceptions.Timeout:
        error_message = "Connection timed out."
    except requests.exceptions.TooManyRedirects:
        error_message = "Too many redirects."
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        # 4xx or 5xx errors:
        error_message = "HTTP Error: %s" % response.status_code
    except NameError:
        # response was never assigned because the request itself failed
        if error_message == "":
            error_message = "Something unusual went wrong."
    if error_message:
        return {"success": False, "content": error_message}
    else:
        return {"success": True, "content": response.text}
(It's also possible I'm entirely misreading what's caused this!)
The webmention implementation on my current site allows me to specify the type of webmention, so I can group them on a post, this post for example, showing the webmentions sorted into likes/favourites, reposts/retweets, and comments.
The comment functionality is handled by QuotableMixin, but it would be good to have some optional flags to extend this to cover the other scenarios.
I'm imagining a type field on QuotableMixin that could either be a dictionary of options, or just a charfield, and then some logic in process_incoming_webmention to identify the type.
This would be especially useful when using something like Bridgy.
Happy to take a run at a PR if this is something you'd be interested in adding?
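As a starting point for that logic, here is a hedged sketch that classifies a mention from the properties of a parsed h-entry, using the microformats2 property names like-of, repost-of, bookmark-of and in-reply-to. The function name, the labels, and the exact matching on the target URL are assumptions for illustration; nothing like this exists in django-wm as far as this thread shows.

```python
def mention_type(properties: dict, target_url: str) -> str:
    """Guess the kind of webmention from an h-entry's parsed properties."""
    checks = [
        ("like-of", "like"),
        ("repost-of", "repost"),
        ("bookmark-of", "bookmark"),
        ("in-reply-to", "reply"),
    ]
    for prop, label in checks:
        for value in properties.get(prop, []):
            # A value may be a bare URL string or a nested item (e.g. an h-cite dict).
            url = value.get("value") if isinstance(value, dict) else value
            if url == target_url:
                return label
    return "mention"


target = "https://example.com/post/"
like = mention_type({"like-of": [target]}, target)
reply = mention_type({"in-reply-to": [{"type": ["h-cite"], "value": target}]}, target)
plain = mention_type({"name": ["Just a link"]}, target)
```

The returned label could then be stored in whatever form the type field ends up taking.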
I was able to go through the whole process of setting up the Celery worker, Redis database, and all the settings for the django-wm. I am running the celery worker on my Ubuntu 18.04 droplet and when I am testing the incoming submission I get the following error:
[2020-01-16 23:31:12,680: ERROR/ForkPoolWorker-1] Task mentions.tasks.incoming_webmentions.process_incoming_webmention[7edc74b1-8e2a-4111-b8cf-3d0c963f9250] raised unexpected: RuntimeError("Model class django.contrib.flatpages.models.FlatPage doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.",)
Traceback (most recent call last):
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
R = retval = fun(*args, **kwargs)
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/celery/app/trace.py", line 650, in __protected_call__
return self.run(*args, **kwargs)
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/mentions/tasks/incoming_webmentions.py", line 42, in process_incoming_webmention
obj = _get_target_object(target)
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/mentions/tasks/incoming_webmentions.py", line 111, in _get_target_object
return get_model_for_url_path(path)
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/mentions/util.py", line 24, in get_model_for_url_path
from django.contrib.flatpages.views import flatpage
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/django/contrib/flatpages/views.py", line 2, in <module>
from django.contrib.flatpages.models import FlatPage
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/django/contrib/flatpages/models.py", line 8, in <module>
class FlatPage(models.Model):
File "/var/www/dj-pw/venv/lib/python3.6/site-packages/django/db/models/base.py", line 111, in __new__
"INSTALLED_APPS." % (module, name)
RuntimeError: Model class django.contrib.flatpages.models.FlatPage doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
I figured the error is due to the lack of django.contrib.flatpages in my INSTALLED_APPS. I followed this instruction to enable it, but that did not help unfortunately.
Would you happen to know what is the issue? Thanks a ton in advance.
I want to add webmentions to my own website and I really like the look of django-wm – there are so many nice and useful touches. But I don't currently use Celery on it and I'm unlikely to get it set up just to enable webmentions.
Would you be open to me adding an option to work without Celery? e.g. Add two custom management commands to process any outstanding incoming and outgoing webmentions, which could then be run using cron or similar.
I haven't thought in further detail what would be involved in making this work – maybe it would require either adding extra field(s) to the models to indicate whether they were waiting to be processed, or adding another model to store this queue...
Anyway, if you're open to the idea I'd be happy to give it a go, and if you have any better ideas about how to approach it, that would be appreciated.