Comments (25)

frabcus commented on August 19, 2024

It already does use the HTML part if there is one. (See _get_attachment_leaves_recursive in incoming_message.rb, near the comment "Take an HTML one as even higher priority").
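As a rough illustration of that priority rule (not the actual code in incoming_message.rb), here is a minimal sketch that picks a main part from a flat list of leaves. The `Part` struct and `pick_main_part` are hypothetical names used only for this example:

```ruby
# Hypothetical sketch, not the code in incoming_message.rb: choose the main
# body part from a flat list of leaves, preferring HTML over plain text.
Part = Struct.new(:content_type, :body)

def pick_main_part(leaves)
  # HTML wins if present, then plain text, then whatever comes first.
  leaves.find { |part| part.content_type == 'text/html' } ||
    leaves.find { |part| part.content_type == 'text/plain' } ||
    leaves.first
end
```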

The main part is then rendered to text using elinks (see _convert_part_body_to_text, where it says "XXX This is a bit of a hack as it is calling a convert to text routine. Could instead call a sanitize HTML one.", and then the code for HTML in IncomingMessage._get_attachment_text_internal_one_file).

It's using this elinks command line:
IO.popen("/usr/bin/elinks -dump-charset utf-8 -force-html -dump " + tempfile.path, "r") do |child|
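Passing a single concatenated string to IO.popen sends the command through a shell. That is probably harmless for a tempfile path, but a variant using an argument array avoids shell interpretation altogether. A minimal sketch, where `elinks_command` and `html_file_to_text` are hypothetical helper names rather than Alaveteli methods:

```ruby
# Hypothetical helpers, not Alaveteli code: build the elinks command as an
# argument array so that the path is never interpreted by a shell.
ELINKS = '/usr/bin/elinks'

def elinks_command(path)
  [ELINKS, '-dump-charset', 'utf-8', '-force-html', '-dump', path]
end

# Runs elinks and returns the rendered text (requires elinks to be installed).
def html_file_to_text(path)
  IO.popen(elinks_command(path), 'r') { |child| child.read }
end
```

The flags are the same ones used in the command line quoted above; only the way they are handed to IO.popen differs.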

This is relatively new code. It's absolutely true that the plain text part itself is often really bad, e.g. not even containing vital hyperlinks, but converting the HTML part via elinks greatly reduced that kind of problem.

So what are the specific requests where this is a problem? Is it to do with colour, tables, or something else? It certainly shouldn't be to do with hyperlinks.

The answer is to find a good HTML renderer/sanitiser library and update _convert_part_body_to_text to show more of the HTML.

from alaveteli.

sebbacon commented on August 19, 2024

A couple of examples of badly-rendered emails would be useful when we come to address this.

hsenag commented on August 19, 2024

I see, sorry for jumping to conclusions about which part was being rendered!

Here's one example of a reply where the user mentioned it to us: http://www.whatdotheyknow.com/request/grit_bins_locations_dates_reason_19#incoming-140959

In the past I've seen others which were hard to read, and in some cases the users complained to the authority about this. I'll try to remember to add them here when I come across more.

hsenag commented on August 19, 2024

http://www.whatdotheyknow.com/request/power_line_technology_plt#incoming-154812

Here's another request where the response is hard to read, because (a) the different colour and indentation of the response is not shown, and (b) the response is all inline with the quoted message and hidden behind "Show quoted sections".

skenaja commented on August 19, 2024

There are examples of replies with formatting issues in todo.txt

https://github.com/sebbacon/alaveteli/blob/master/todo.txt

skenaja commented on August 19, 2024

Another example: http://www.whatdotheyknow.com/request/appendix_b_budget_savings_propos?unfold=1#incoming-176534

hsenag commented on August 19, 2024

Another example where real content is hidden inside the quoted sections: http://www.whatdotheyknow.com/request/quality_risk_profile_south_essex#incoming-189862

skenaja commented on August 19, 2024

Example of a .bmp embedded in the HTML part not showing up as an attachment:

http://www.whatdotheyknow.com/request/chronic_disease_in_lambeth?unfold=1#incoming-138418

hsenag commented on August 19, 2024

In this case MIME decoding seems to have failed completely:
https://www.whatdotheyknow.com/request/pct_contacts_and_gp_systems_3#incoming-396756

The raw email opens fine in Thunderbird.

RichardTaylor commented on August 19, 2024

There's a specific issue at

https://www.whatdotheyknow.com/request/children_and_adult_social_servic#incoming-1253330

where only an HTML response was provided and Alaveteli displays links whose URLs contain literal spaces (" " rather than "%20"), resulting in broken links.

Perhaps a similar issue to #3400
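To illustrate the kind of fix this might need, here is a minimal sketch that percent-encodes literal spaces in an href before it is rendered. `encode_spaces_in_href` is a hypothetical helper and the example URL is invented; it is not taken from Alaveteli or from the response above:

```ruby
# Hypothetical helper, not Alaveteli code: percent-encode literal spaces in an
# href so the link survives rendering. Other characters are left untouched.
def encode_spaces_in_href(href)
  href.strip.gsub(' ', '%20')
end

puts encode_spaces_in_href('https://example.org/files/annual report 2019.pdf')
# prints https://example.org/files/annual%20report%202019.pdf
```

This only handles spaces; a real fix would probably need to consider other characters that are not valid in URLs as well.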

RichardTaylor commented on August 19, 2024

The problem with spaces in a URL occurred again at

https://www.whatdotheyknow.com/request/fly_tipping_enforcement_in_londo_2#comment-90013

RichardTaylor commented on August 19, 2024

More specific issue: Improve/fix HTML rendering of tables #1528

RichardTaylor commented on August 19, 2024

A further example which doesn't obviously fit into any of the more specific tickets:
https://www.whatdotheyknow.com/request/missed_bin_collections_data_7#incoming-1659864

mdeuk commented on August 19, 2024

> A further example which doesn't obviously fit into any of the more specific tickets:
> https://www.whatdotheyknow.com/request/missed_bin_collections_data_7#incoming-1659864

Oh dear, that response is pretty mangled. It appears that the reply is being generated using the same software we've noted on #5905, so I wonder if there is some commonality between these issues.

In any case, the raw email renders as you'd expect in an email client; however, it's not clear whether the MIME encoding in the email is free of errors.

gbp commented on August 19, 2024

> It appears that the reply is being generated using the same software we've noted on #5905, so I wonder if there is some commonality between these issues.

Yep - I can confirm it's caused by the same issue.

MattK1234 commented on August 19, 2024

A user contacted us regarding https://www.whatdotheyknow.com/request/secure_email_contracts_23?unfold=1#incoming-1685108 because the inline responses using a different colour are not clearly shown on WhatDoTheyKnow.com

RichardTaylor commented on August 19, 2024

Noting a corrupted response at

https://www.whatdotheyknow.com/request/ipc_grants_11

In this case the responses don't open legibly in Mac Mail when the raw message is downloaded, so they're either corrupt on receipt or they're being mangled at an early stage by the system.

RichardTaylor commented on August 19, 2024

Another example of a response only being viewable after clicking "show quoted sections"

https://www.whatdotheyknow.com/request/milton_keynes_city_status_bid#incoming-1934835

RichardTaylor commented on August 19, 2024

Another example of a response only being viewable after clicking "show quoted sections"

https://www.whatdotheyknow.com/request/request_for_details_of_parking_i#incoming-1973914

This case is from the same council, and sent via the same system, as the previous one.

RichardTaylor commented on August 19, 2024

What looks like a one-off case at

https://www.whatdotheyknow.com/request/freedom_of_information_request_a_162#incoming-1454731

Is it possible that the problem is content which looks like an email address, and so would have been redacted, in the header of the part of the email containing an image?


"--_005_CWXP265MB1797D3B3C4A15889D320B57CA6680CWXP265MB1797GBRP_
Content-Type: image/jpeg; name="119102214405700977.jpg"
Content-Disposition: inline; filename="119102214405700977.jpg"
Content-Id: <[email protected]>
Content-Transfer-Encoding: base64"

The system will have replaced that Content-ID element with a note saying the email is redacted?
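To show why that is at least plausible, here is a minimal sketch of a pattern-based redaction matching a Content-Id value. It is not Alaveteli's redaction code: `EMAIL_LIKE` and `naive_redact` are hypothetical, and the identifier is invented, since the real value is not visible in the header quoted above.

```ruby
# Hypothetical illustration, not Alaveteli's redaction code. The address-like
# value below is invented; the real Content-Id is not shown in this thread.
EMAIL_LIKE = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

def naive_redact(text)
  text.gsub(EMAIL_LIKE, '[email address]')
end

header = 'Content-Id: <image001.jpg@01D5A2B3.C4D5E6F0>'
html   = '<img src="cid:image001.jpg@01D5A2B3.C4D5E6F0">'

puts naive_redact(header)
puts naive_redact(html)
```

Under this assumption both the header and the cid: reference lose their original identifier, which could explain an inline image no longer resolving. Whether that is what happened in this case is unconfirmed.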

FOIMonkey commented on August 19, 2024

All Frimley Health NHS Foundation Trust responses start with "Description:" repeated multiple times https://www.whatdotheyknow.com/request/foi_request_patients_treated_56#incoming-1933094

RichardTaylor commented on August 19, 2024

> All Frimley Health NHS Foundation Trust responses start with "Description:" repeated multiple times https://www.whatdotheyknow.com/request/foi_request_patients_treated_56#incoming-1933094

That's a message from December 2021.

This email displayed without the repeated "Description:" when opened in the Mail App on OSX. The raw email contains the following:

--_000_LO2P265MB5546B640EC9FCD7027BB944D95719LO2P265MB5546GBRP_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

[Description: Description: Description: Description: Description: Descripti=
on: Description: Description: Description: Description: Description: Descri=
ption: Description: Description: Description: Description: Description: Des=
cription: Description: Description: Description: Description: Description: =
Description: Description: Description: Description: Description: Descriptio=
n: Description: Description: Description: Description: Description: Descrip=
tion: Description: Description: Description: Description: Description: Desc=
ription: Description: Description: Description: Description: Description: D=
escription: Description: Description: Description: Description: Description=
: Description: Description: Description: Description: Description: Descript=
ion: Description: Description: Description: Description: Description: Descr=
iption: Description: Description: Description: Description: Description: De=
scription: Description: Description: Description: Description: Description:=
 Description: Description: Description: Description: Description: Descripti=
on: Description: Description: Description: Description: Frimley Health FT c=
ol (3)]

Dear Requester

And the HTML version contains:


<body lang=3D"EN-GB" link=3D"blue" vlink=3D"purple" style=3D"word-wrap:brea=
k-word">
<div class=3D"WordSection1">
<p align=3D"right" style=3D"text-align:right"><span style=3D"font-size:10.0=
pt;font-family:&quot;Arial&quot;,sans-serif"><img width=3D"381" height=3D"7=
6" style=3D"width:3.9687in;height:.7916in" id=3D"Picture_x0020_2" src=3D"ci=
d:[email protected]" alt=3D"Description: Description: Descript=
ion: Description: Description: Description: Description: Description: Descr=
iption: Description: Description: Description: Description: Description: De=
scription: Description: Description: Description: Description: Description:=
 Description: Description: Description: Description: Description: Descripti=
on: Description: Description: Description: Description: Description: Descri=
ption: Description: Description: Description: Description: Description: Des=
cription: Description: Description: Description: Description: Description: =
Description: Description: Description: Description: Description: Descriptio=
n: Description: Description: Description: Description: Description: Descrip=
tion: Description: Description: Description: Description: Description: Desc=
ription: Description: Description: Description: Description: Description: D=
escription: Description: Description: Description: Description: Description=
: Description: Description: Description: Description: Description: Descript=
ion: Description: Description: Description: Description: Description: Descr=
iption: Description: Frimley Health FT col (3)"></span><span style=3D"font-=
size:10.0pt;font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p=
>

The email appears to have been generated by NHS Microsoft systems.
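For anyone reading the raw fragments above: they are quoted-printable encoded, so a trailing "=" is a soft line break and "=3D" is an encoded "=". A minimal sketch of decoding a shortened sample with Ruby's built-in `unpack1('M')` (the sample string is abbreviated from the raw email, not the full content):

```ruby
# Decode a quoted-printable fragment: "=\n" is a soft line break and "=3D" is
# an encoded "=". The sample is shortened from the raw email above.
raw = "alt=3D\"Description: Descripti=\non: Description: Frimley Health FT c=\nol (3)\""

decoded = raw.unpack1('M')
puts decoded
# prints alt="Description: Description: Description: Frimley Health FT col (3)"

# Count how many times the prefix is repeated once the line breaks are gone.
puts decoded.scan('Description: ').length
# prints 3
```

Once decoded, the repeated text is simply the image's alt attribute, which is why it appears in the text rendering but not in a mail client that displays the image.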

There is a further example from the same body at

https://www.whatdotheyknow.com/request/ooutsourcing_radiology_imaging#incoming-473633 (from 2014)

Googling suggests the issue might be related to a Microsoft Outlook bug triggered when saving an OFT (Outlook File Template):

https://answers.microsoft.com/en-us/outlook_com/forum/all/outlook-2010-picture-alt-text/59aac086-8ac0-4afb-83e2-ca765b8e8bab

I don't think there's anything Alaveteli should do here. We could point those generating the problematic emails to the above link describing the bug. Upgrading or changing their email system might be a way for public bodies to prevent this issue.

FOIMonkey commented on August 19, 2024

Another example of the description spam from this month: https://www.whatdotheyknow.com/request/epr_solutions_26#incoming-1999411
It does look like the Outlook bug is to blame. Other authorities are also affected, to a much lesser extent, e.g. https://www.whatdotheyknow.com/request/information_on_facilities_manage_188#incoming-2002336 and https://www.whatdotheyknow.com/request/invoices_for_sp_beautiful_brows#incoming-2002002 from today.

FOIMonkey commented on August 19, 2024

The substantive part of the response to this request isn't displayed at all for some reason: https://www.whatdotheyknow.com/request/avaliable_propertties_foi_2000#incoming-1046540

FOIMonkey commented on August 19, 2024

Another occurrence of spaces in URLs causing broken links: https://www.whatdotheyknow.com/request/pass_card_guidance#incoming-2117148
