global-healthy-liveable-cities / global_scorecards

The code in this repository draws on results from the Global Healthy and Sustainable Cities Indicator Collaboration Study to generate policy 'scorecard' reports.

License: MIT License

Python 100.00%

health policy sustainability physical-activity liveability livability urban-planning spatial-analysis policy-analysis research-translation

global_scorecards's Introduction

Scorecards for the Global Healthy and Sustainable Cities Indicator Collaboration Study

The functionality of this code has now been fully incorporated into the current version of our Global Healthy and Sustainable City Indicators software. The code here specifically supplemented v1.0.0 of the tool, designed to produce analyses and reports for 25 cities as part of the Lancet Global Health Series on Urban Design, Transport and Health.

The repository is located as a subfolder of the study's spatial components repository, within the analysis folder.

The version of the code tagged '0.1' was used to prepare the reports for our 25-city analysis (https://doi.org/10.25439/rmt.c.6012649), iterating over the processed set of cities and producing reports in multiple languages, with associated fonts, as configured. The current version of the code is designed to support the Global Healthy and Sustainable City Indicators Collaboration 1000 Cities challenge, generating reports for newly processed cities in conjunction with supplied policy review results. Report configuration ties in with the indicators.yml and policies.yml files, which are stored, with optional modification, in a project's configuration folder.

The code is run using the Docker image for our Global Healthy and Sustainable Cities Indicator Collaboration Study, and launched from the analysis/global_scorecards directory:

ghsci@docker-desktop:~/work/analysis/global_scorecards$ python _generate_reports.py --help
Global Healthy Liveable Cities Indicator Study Collaboration, version 1.2

Generate reports

usage: _generate_reports.py [-h] [--city CITY] [--generate_resources] [--language LANGUAGE] [--auto_language]
                            [--templates TEMPLATES [TEMPLATES ...]] [--configuration CONFIGURATION]

Reports and infographic scorecards for the Global Healthy and Sustainable City Indicators Collaboration

optional arguments:
  -h, --help            show this help message and exit
  --city CITY           The city for which reports are to be generated.
  --generate_resources  Generate images from input data for each city? Default is True.
  --language LANGUAGE   The desired language for presentation, as defined in the template workbook languages sheet.
  --auto_language       Identify all languages associated with specified cities and prepare reports for these.
  --templates TEMPLATES [TEMPLATES ...]
                        A list of templates to iterate outputs over, for example: "web" (default), or "web,print" The words
                        listed correspond to sheets present in the configuration file, prefixed by "template_",for example,
                        "template_web" and "template_print". These files contain the PDF template layout information required by
                        fpdf2 for pagination of the output PDF files.
  --configuration CONFIGURATION
                        An XLSX workbook containing spreadsheets detailing template layout(s), prose, fonts and city details to
                        be drawn upon when formatting reports.

So, given available translations have been defined, the following would generate scorecards for the city configured as ghent_v2 in each language associated with that city (via --auto_language):

python _generate_reports.py --city ghent_v2 --auto_language

The code identifies required data sources, and extracts key information which is passed to functions in scorecard_functions.py to generate resources, format pages, and output the final PDF layout, according to the locations, pages and prose specified in the _report_configuration.xlsx file ¹.
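As an illustration of the --templates option described in the help text, the sketch below maps template names to the worksheet names expected in the configuration workbook. The function name template_sheet_names is hypothetical and is not taken from scorecard_functions.py; only the "template_" prefix comes from the help text.

```python
def template_sheet_names(templates):
    """Map template names (e.g. 'web', 'print') to workbook sheet names.

    Accepts a single string or a list of strings; each string may itself
    be comma-separated, as in "web,print".
    """
    if isinstance(templates, str):
        templates = [templates]
    # split comma-separated entries and strip surrounding whitespace
    names = [name.strip() for item in templates for name in item.split(",")]
    # drop empty entries and add the documented "template_" prefix
    return [f"template_{name}" for name in names if name]
```

For example, template_sheet_names("web,print") yields the two sheet names quoted in the help text, "template_web" and "template_print".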

Carl Higgs

¹ The _report_configuration.xlsx file makes use of relative formulas for placement of nested elements. This makes it easier to quickly move entire blocks of content around the page when fine-tuning the template, compared with a plain text format like CSV. It also contains multiple worksheets, used to define phrases in different languages and their associated typefaces.

global_scorecards's Issues

Import of geopandas warns ERROR 1: PROJ: proj_create_from_database: Open of /env/share/proj failed

After re-building the Docker image, the following warning is observed upon import of geopandas (in scorecard_functions.py):

import geopandas
ERROR 1: PROJ: proj_create_from_database: Open of /env/share/proj failed

This first occurred with a trial installation of Babel to address #8, but it persisted even after reverting to the previous version of the Docker image.

The explicit dependencies used to install the image using conda-forge were

python=3.9.*
geopandas=0.10.2
pygeos=0.12.0
osmnx=1.1.2
descartes=1.1.*
fpdf2=2.4.*
openpyxl=3.0.9

However, I noticed a relatively recent fiona update had been released so I also trialled using

fiona<1.8.21

I haven't seen any clear issues on the Geopandas repository referencing this, although I may investigate further before reporting it there.

generate_scorecard() function in scorecard_functions.py is too complicated

The generate_scorecard() function is too complicated and needs to be refactored to simplify the logic. Currently this function causes scorecard_functions.py to fail flake8 at the git pre-commit hooks stage ("too complex"). To date I have been bypassing this to focus on core functionality, but to simplify future required amendments, including city-specific exceptions, this complexity should be addressed.

By abstracting out the setting up of language- and city-specific phrases, we can reduce the complexity of this function and hopefully simplify the plotting functions that also rely on phrase look up.

Simplifying docker set up

Currently this repository's Docker setup is based on that of https://github.com/global-healthy-liveable-cities/global-indicators, as I envisioned it as potentially evolving into a trial re-factoring (at least of the processing environment and configuration). However, I just posted an issue in our global indicators software proposing that, while we update our tools, we could consider focusing this repo on reporting (its current role) while the global-indicators repository focuses on the analyses giving rise to the data outputs to be reported on.

https://github.com/global-healthy-liveable-cities/global-indicators/issues/139

Anyway, this is just a placeholder issue linking it to the one above, for consideration.

Localisation of number formatting

To date, numbers have been presented using commas as thousands separators and periods as decimal separators. This is not a universal standard, however; it was pointed out to me, for example, that in Germany the opposite is done.

So, we need to add in capacity for customisation of number formatting in tandem with translation.
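A minimal sketch of how separators could be configured per language is given below. The SEPARATORS table and the format_number name are assumptions made for illustration, not part of the existing code; the German entry follows the convention described above.

```python
# Hypothetical per-language separators: (thousands, decimal).
SEPARATORS = {
    "English": (",", "."),
    "German": (".", ","),
}


def format_number(value, language, decimals=1):
    """Format a number using the separators configured for a language."""
    # fall back to the English convention for unconfigured languages
    thousands, decimal = SEPARATORS.get(language, SEPARATORS["English"])
    # start from Python's English-style grouping, e.g. '1,234,567.9'
    english = f"{value:,.{decimals}f}"
    # translate both separators in a single pass so they are not swapped twice
    return english.translate(str.maketrans({",": thousands, ".": decimal}))
```

A hand-maintained table like this is only a starting point; the Babel library mentioned elsewhere in these issues provides locale-aware number formatting and may be a better fit if it is already a dependency.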

Report prose should allow for city-specific exceptions

The reports currently allow for customisation of prose by language, but not by city. However, some cities may require additional clarification or other customisation of language-specific prose.

This could be incorporated via city-specific exceptions, defined through a JSON data structure. Where prose elements are found referenced for a city in its exception JSON, the city-specific element will be used for that city.

The specific application of this is for Seattle, which in our study actually refers to the broader Seattle metropolitan area (which also incorporates Tacoma and Bellevue). The 'series_intro' text should refer to the 'Seattle metropolitan area' to make this clear in official terms, so that when 'Seattle' is subsequently mentioned it is understood that this refers to the urban conglomeration surrounding Seattle.
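The override logic could be sketched as follows. The function name resolve_phrases, the shape of the exceptions JSON (city name mapped to phrase overrides) and the placeholder wording are all assumptions for illustration; in practice the exceptions would probably also need to be keyed by language.

```python
import json


def resolve_phrases(language_phrases, city, exceptions_json):
    """Return phrases for a city, applying any city-specific overrides."""
    exceptions = json.loads(exceptions_json)
    phrases = dict(language_phrases)  # copy so the shared phrases are untouched
    phrases.update(exceptions.get(city, {}))  # no entry means no overrides
    return phrases


# Illustrative data only; the wording is a placeholder, not the report text.
english = {"series_intro": "Intro text about {city}.", "title": "Report"}
exceptions = (
    '{"Seattle": {"series_intro": '
    '"Intro text about the Seattle metropolitan area."}}'
)

seattle = resolve_phrases(english, "Seattle", exceptions)
graz = resolve_phrases(english, "Graz", exceptions)
```

Here only the 'series_intro' phrase changes for Seattle, while every other city, and every other phrase, falls back to the language default.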

Localised north arrow and units in maps

The north arrow in maps is currently annotated with an 'N', regardless of the language used for the rest of the prose.

The language for units should be abstracted so that it can be modified along with other map elements.

The image below shows an example snippet of the map for Hong Kong, where 'N' should be '北', and km should be '公里'

North arrow should have the label for north above, not below the arrow

Feedback was received from our Hong Kong research collaborator, who is a GIS expert, that the label for north (eg 'N' or '北') should be above, not below, the arrow.

In principle this should be achievable easily through modification of the matplotlib annotation code, but could also be fiddly to do.

(In the above image, the 'N' should be on top of the arrow, according to the feedback received)

Option to output to city-specific sub-folders, rather than language specific sub-folders

To date, the scorecard reports for each city have been output in language-specific subfolders (e.g. the 'English' language subfolder contains the reports for all 25 cities, while the 'Spanish - Mexico' subfolder only contains that for Mexico City).

An alternate way of storing the reports is to do this by city. This way, all of Barcelona's reports (Catalan, English, Spanish) are stored in the same folder.

Both approaches to organisation are useful. However, storing by city will make upload of the reports to Figshare (which does not support programmatic replacement of existing files) much more straightforward. For example, updating the reports for Barcelona would be one drag and drop operation, rather than three; and for all cities, it would be 25 operations, rather than 47 reports across 25 cities and 16 languages.

We could implement an optional argument 'by_city', which will then output city-specific sub-folders to a 'scorecards/by_city' folder.
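A rough sketch of how the output folder might be chosen is shown below. The function name report_folder and the 'by_language' folder name are assumptions; only 'scorecards/by_city' comes from the proposal above.

```python
from pathlib import PurePosixPath


def report_folder(city, language, by_city=False, root="scorecards"):
    """Return the output folder for one report."""
    if by_city:
        # all language versions of a city's report share one folder
        return PurePosixPath(root) / "by_city" / city
    # 'by_language' is an assumed name for the existing language-based layout
    return PurePosixPath(root) / "by_language" / language
```

On the command line this could be exposed as a boolean flag, for example with argparse's add_argument('--by_city', action='store_true'), so that the existing behaviour remains the default.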

Percent sign spacing

Feedback was received that a space should be present before the percent sign in some languages (eg Czech).

As per https://en.wikipedia.org/wiki/Percent_sign#Spacing, the rules around this are somewhat complicated and contextual. If this is to be implemented, there should be consideration put into exactly how best to do this, and if there is existing localisation support in Python.
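One simple option, sketched below, is a per-language setting for whether a non-breaking space precedes the sign. The SPACE_BEFORE_PERCENT table and the format_percent name are hypothetical, and the sketch ignores the contextual rules described in the linked article.

```python
NBSP = "\u00a0"  # non-breaking space, so the sign never wraps onto a new line

# Hypothetical per-language setting: is the percent sign preceded by a space?
SPACE_BEFORE_PERCENT = {"English": False, "Czech": True}


def format_percent(value, language, decimals=0):
    """Format a percentage, spacing the sign according to the language."""
    gap = NBSP if SPACE_BEFORE_PERCENT.get(language, False) else ""
    return f"{value:.{decimals}f}{gap}%"
```

If Babel is already in use for localisation, its locale-aware percent formatting may make a hand-written table like this unnecessary; that would need checking against the languages used in the reports.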

Citation for translated reports (English title in brackets)

It's been suggested that the title for translations shouldn't essentially be the English version; it's a fair point.

The current format is like:

Global Healthy & Sustainable City-Indicators Collaboration. 2022. Graz, Austria—Healthy and Sustainable City Indicators Report: Comparisons with 25 cities internationally (Deutsch). https://doi.org/10.25439/rmt.19614039

The justification I gave for this was

A common English language citation will make it easier to track any citation usage, especially in cases where the DOI is not cited
The English language citation wasn't finalised until very recently, which made it challenging to arrange for translation
We're reluctant to approach our collaborators for further translation advice, because coordinating this across the 25 cities and 16 languages, with variable levels of engagement, is challenging (with a deadline for sign off)
While the citation is in English, it does include the translated name of the language
However, perhaps we should also include translator names as part of the citation for the translated documents

I think it's a fair point though, and we could easily implement a translated citation like the following example:

Global Healthy & Sustainable City-Indicators Collaboration. 2022. Köln, Deutschland - Bericht über gesunde und nachhaltige Stadtindikatoren: Internationaler Vergleich von 25 Städten (Cologne, Germany - Healthy and Sustainable City Indicators Report: Comparisons with 25 cities internationally).

This could be like

[English author name]. [Year]. [City, Country] - [Translated title] (English title; translator names). [DOI]

If others agree, I'll implement this
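The proposed pattern could be assembled along these lines. The function name translated_citation and its parameters are assumptions for illustration; the example reuses the Köln wording above, with the DOI left as a placeholder because none is given for that report.

```python
def translated_citation(author, year, place, translated_title,
                        english_title, doi, translators=()):
    """Build: author. year. place - translated title (English title; translators). DOI"""
    bracket = english_title
    if translators:
        # translator names are appended after the English title, if supplied
        bracket += "; " + ", ".join(translators)
    return f"{author}. {year}. {place} - {translated_title} ({bracket}). {doi}"


citation = translated_citation(
    "Global Healthy & Sustainable City-Indicators Collaboration",
    2022,
    "Köln, Deutschland",
    "Bericht über gesunde und nachhaltige Stadtindikatoren: "
    "Internationaler Vergleich von 25 Städten",
    "Cologne, Germany - Healthy and Sustainable City Indicators Report: "
    "Comparisons with 25 cities internationally",
    "[DOI]",
)
```

Keeping the English title inside the brackets preserves a common string for tracking citation usage, which was the main justification for the current format.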

Matplotlib tick units should be localised according to country and language

Currently, tick marks for the threshold plots are handled by matplotlib's engineering ticker

from matplotlib import ticker

# axis formatting: EngFormatter abbreviates thousands with an SI 'k' prefix
cax.xaxis.set_major_formatter(ticker.EngFormatter())

For numbers in the thousands this has the result of abbreviating units using a 'k' which is generally desirable, at least in English.

However, in Czech the meaning of 'k' is not natural/intuitive; for example, a better option would be "tisíce" (thousand).

There is a Python library for localising units which we have implemented elsewhere in the code for this project, Babel, which has a format_unit() function that could potentially be used for this. However, I haven't seen examples of its use in the context of matplotlib, or specifically the engineering ticker. It may be beyond scope to address this issue, but ideally we would deal with internationalisation/localisation of these units as we have elsewhere in the project for translations.
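As an alternative to the engineering ticker, a custom formatter function could be wrapped in matplotlib's FuncFormatter. The sketch below is untested against the actual plots; the THOUSAND_SUFFIX table and both function names are hypothetical, and the decimal separator and grammatical agreement of the suffix are not localised here.

```python
# Hypothetical suffixes for thousands; only 'k' (as produced by EngFormatter)
# and the Czech word "tisíce" come from the discussion above.
THOUSAND_SUFFIX = {"English": "k", "Czech": " tisíce"}


def localised_tick(value, pos=None, language="English"):
    """Format a tick value, abbreviating thousands with a localised suffix."""
    suffix = THOUSAND_SUFFIX.get(language, "k")
    if abs(value) >= 1000:
        return f"{value / 1000:g}{suffix}"
    return f"{value:g}"


def apply_localised_ticks(axis, language):
    """Attach the formatter to a matplotlib axis (e.g. cax.xaxis)."""
    # imported here so localised_tick itself has no dependencies
    from matplotlib import ticker

    axis.set_major_formatter(
        ticker.FuncFormatter(lambda v, p: localised_tick(v, p, language))
    )
```

FuncFormatter calls the supplied function with the tick value and position, so the language has to be bound in a closure as shown; apply_localised_ticks(cax.xaxis, "Czech") would then replace the EngFormatter call.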

Aesthetic improvements of wind rose accessibility diagram

Suggestions were made for improvements to the wind rose diagram / access profile plot:

replace '...' with ':'
add % to axis labels

Adding % to axis labels may require left rotation of all axes to ensure the labels remain visible.

In addition, there are still challenges with word-wrapping in some languages; a further decrease in font size may be required.

Policy checklist statistics should be entered programmatically

To date, the policy statistics (% of cities with requirement met, by country income group) have been entered directly from the template, where values had been copied across. Notwithstanding that these values were copied across incorrectly, they should also be evaluated programmatically, so that if new cities are added they can be evaluated in the same way. At the very least, entering values from the source numerical spreadsheet ensures that the values and their derivation are transparent. As a side effect, the values should then be correct too!
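The calculation itself is small; a sketch under assumed inputs is given below. The function name percent_met_by_group and the record layout (city, income group, requirement met) are hypothetical, and the records are toy values rather than results from the study.

```python
from collections import defaultdict


def percent_met_by_group(records):
    """Percent of cities meeting a requirement, by country income group.

    records: iterable of (city, income_group, requirement_met) tuples.
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for _city, group, requirement_met in records:
        total[group] += 1
        met[group] += bool(requirement_met)
    return {group: 100 * met[group] / total[group] for group in total}


# Toy records for illustration only; not results from the study.
records = [
    ("City A", "High income", True),
    ("City B", "High income", False),
    ("City C", "Middle income", True),
]
summary = percent_met_by_group(records)
```

Because the percentages are derived from the records on every run, adding a newly processed city only requires appending its policy review results.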

Allow for non-breaking of words in languages without spaces between words

The wrapping / non-wrapping / justification of text is apparently not working well for Thai. This relates to word boundary definition in this language, and in many other languages written in non-Latin scripts, where words are not demarcated using spaces. In practice, in our current report implementation, this leads to inappropriate breaks, or to inappropriately large gaps where phrases should be split at particular words but the lack of spaces means they aren't.

Here are some links I located about this issue; I'm not sure yet how best to resolve it. While there are APIs used to identify word boundaries for line breaking in languages like Thai (eg ICU), which are implemented in web browsers like Chrome, I haven't been able to locate a straightforward solution that I can apply for this task yet.

Links:
https://stackoverflow.com/questions/8492763/thai-line-breaking-how-to-break-thai-text-effectively
https://opensource.googleblog.com/2016/10/budou-automatic-japanese-line-breaking.html
https://github.com/google/budou
https://gametorrahod.com/manual-word-breaking-approach-in-a-webpage/
https://unicode-org.github.io/icu/userguide/icu/
https://icu.unicode.org/download/70
https://github.com/unicode-org/icu/releases/tag/release-70-1
https://unicode-org.github.io/icu/userguide/boundaryanalysis/
https://www.w3.org/International/articles/typography/linebreak.en
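Once word boundaries are available from a segmenter (for example ICU's boundary analysis, linked above), the wrapping step itself is simple. The sketch below assumes the text has already been split into words by such a tool; the function name wrap_segments is hypothetical, and character count is used as a crude stand-in for rendered width.

```python
def wrap_segments(segments, max_width):
    """Greedily wrap pre-segmented words into lines of at most max_width characters.

    Words are joined without spaces, as in Thai. A single word longer than
    max_width is kept whole on its own line rather than broken.
    """
    lines, current = [], ""
    for word in segments:
        if current and len(current) + len(word) > max_width:
            # the next word does not fit: close the line at a word boundary
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines
```

In the PDF context the len() calls would need to be replaced with a measurement of rendered string width in the chosen font, since combining marks and glyph widths make character counts unreliable.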

Tamil text appears incorrect, regardless of font

The appearance of glyphs for Tamil using fpdf2 is incorrect. Apparently this relates to a known fpdf2 bug (as of March 2022): py-pdf/fpdf2#365

In the case of Tamil, சென்னை, இந்தியா appears as

and if you select the text in the above PDF and paste it elsewhere (like in this post above), it appears correct.

Apparently it relates to the way 'Devanagari conjuncts' are represented. But it's very annoying for our Tamil reports.

2cm margin ideal for print layout (ie. consider separate templates for web and print)

We received advice from one of our collaborators' print centres that a 2 cm margin on all pages is ideal for print.

So as not to disrupt our existing web-friendly PDF layouts with any new experiments for a print layout, I suspect the best approach to quickly trial a re-arrangement would be to allow for separate layouts for print and web; that is, a new configuration workbook sheet specifying the layout specifically for print.

Regarding the advice, a 2 cm margin would presumably only be required on pages which may be subject to binding; however, it would be easiest to adopt '2 cm' as a rule to keep it simple.

Anyway -- it will be interesting to see what can be achieved in 2 hours with regard to a draft approach to this to see if we can get it working, given all the effort put into creating our existing content and translations.

Update to work with re-factored global indicators workflow, and support 1000 cities challenge

The global_indicators repo had some large re-workings recently to allow the methods used for our 25-city study to more easily be generalised for new cities (eg for the 1000 cities challenge). To work with the new methods, including removal of between-city walkability comparison (as per https://github.com/global-healthy-liveable-cities/global-indicators/pull/143) and streamlined project configuration (https://github.com/global-healthy-liveable-cities/global-indicators/issues/144) the global_scorecards methods also need updating.
