unfoldingword-dev / translationdatabaseweb

Python 48.54% HTML 47.76% JavaScript 1.54% Shell 0.03% Less 2.08% Dockerfile 0.05%

translationdatabaseweb's Introduction

translationDatabase

Goals

The goals for translationDatabase are to manage and track data for languages and the progress of getting unrestricted biblical content into every language.

For more information on the unfoldingWord project, see the About page.

Data Sources

Many of the data sources are pulled in from the ISO Codes repository, which is maintained as part of the Debian project.

In Use

Other Potential Sources

Getting Started

To set up a new working environment for this project, you need:

Python (see requirements.txt for the specific libraries/packages)
Redis
Postgres
Node

Building Static Media

npm install
npm run watch     # run a watcher on the static folder
npm run build     # builds static and exits
npm run buildprod # builds for production (uglify/minification)

Initialize the Database

After installing requirements (via pip) within your environment or virtualenv:

python manage.py migrate
python manage.py loaddata sites
python manage.py loaddata uw_network_seed
python manage.py loaddata uw_region_seed
python manage.py loaddata uw_title_seed
python manage.py loaddata uw_media_seed
python manage.py loaddata additional-languages
python manage.py reload_imports
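If you rebuild environments often, the steps above could be wrapped in a small script. The following is a minimal sketch, not part of the repository: the script itself and its `--run` flag are hypothetical, and it simply replays the same `manage.py` commands in order, stopping at the first one that fails. By default it only prints the commands.

```python
"""Hypothetical bootstrap script replaying the README's init steps."""
import subprocess
import sys

# Fixtures from the README, loaded in the same order.
FIXTURES = [
    "sites",
    "uw_network_seed",
    "uw_region_seed",
    "uw_title_seed",
    "uw_media_seed",
    "additional-languages",
]


def init_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one argv list per manage.py step, in README order."""
    commands = [[python, "manage.py", "migrate"]]
    commands += [[python, "manage.py", "loaddata", name] for name in FIXTURES]
    commands.append([python, "manage.py", "reload_imports"])
    return commands


def initialize(run: bool = False) -> None:
    """Print each step; with run=True execute it, raising on the first failure."""
    for argv in init_commands():
        print(" ".join(argv))
        if run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    initialize(run="--run" in sys.argv[1:])
```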

At this point, the basic country and language datasets will be populated but without many optional fields or extra data.

Updating the `/exports/langnames.json` and `/exports/langnames_short.json` endpoints

When languages are added or updated, run this command to update the data locally:

python manage.py rebuild_langnames

Switch to the master branch and run this command to update the data on the server:

ec run web python manage.py rebuild_langnames

Docker Deployments

translationDatabase was previously built using the Heroku-18 stack and deployed on Heroku dynos.

It is now being deployed using Heroku's Docker container support.

This was configured via:

heroku stack:set container -a ${HEROKU_APP_NAME:-translation-database-demo}

(and repeated using HEROKU_APP_NAME=translation-database).

Deploying via heroku.yml

This application can be deployed to Heroku via Git.

Heroku's documentation on Git / GitHub deployments can be found here:

To deploy the master branch to the translation-database-demo site:

git checkout master
heroku git:remote -a translation-database-demo
git push heroku master:main

For additional documentation, see Building Docker Images with heroku.yml

Building the Docker image manually

These instructions are provided as a convenience; the application should be deployable by following "Deploying via heroku.yml" above.

NOTE: This assumes that you have a version of Docker installed.

Build the production image

rm -Rf archive archive.tgz
git archive HEAD > archive.tgz
mkdir -p archive
tar -xvf archive.tgz -C archive
cd archive

docker build --platform=linux/amd64 -f Dockerfile -t td .
cd ..
rm -Rf archive

Run it with:

# assumes environment variables populated in
# .dev-env file
docker run --name=td --rm -d --env-file ./.dev-env -p 8000:8000 td

Push the Docker image to Heroku manually

To deploy the pre-built image to Heroku manually, you need to:

have access to the translation-database-demo and translation-database apps on Heroku
have authenticated with the Heroku Container Registry

Tag and push for $HEROKU_APP_NAME:

docker tag td registry.heroku.com/${HEROKU_APP_NAME:-translation-database-demo}/web
docker tag td registry.heroku.com/${HEROKU_APP_NAME:-translation-database-demo}/worker
docker push registry.heroku.com/${HEROKU_APP_NAME:-translation-database-demo}/web
docker push registry.heroku.com/${HEROKU_APP_NAME:-translation-database-demo}/worker

Release to Heroku

heroku container:release web worker -a ${HEROKU_APP_NAME:-translation-database-demo}

Repeat the steps above with the HEROKU_APP_NAME variable set for the production environment:

export HEROKU_APP_NAME=translation-database

translationdatabaseweb's People

Contributors

Stargazers

Watchers

Forkers

wycliffeassociates knaveenjoshi kidaa phillip-hopper eldarion claudep jacobwegner

translationdatabaseweb's Issues

tW IDs

From @jag3773 on July 16, 2015 21:2

We need to hash the page's filename for the ID.

Copied from original issue: unfoldingWord-dev/uwadmin#42
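A minimal sketch of such an ID, assuming SHA-1 over the UTF-8 bytes of the filename. The issue does not name a hash algorithm, and `tw_id` is a hypothetical helper.

```python
import hashlib


def tw_id(filename: str) -> str:
    """Return a stable 40-character hex ID for a translationWords page.

    SHA-1 is an illustrative choice; any stable digest would do.
    """
    return hashlib.sha1(filename.encode("utf-8")).hexdigest()


print(tw_id("god.txt"))  # the same filename always yields the same ID
```

Hashing the full filename, including any directory part, keeps pages that share a base name from colliding. The same function would work unchanged for the translationAcademy slugs in the tA IDs issue.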

Dynamic API: Search

From @jag3773 on July 16, 2015 21:25

From @jag3773 on June 23, 2015 22:16

We need a dynamic API that can return results based on keyword searches.

Copied from original issue: unfoldingWord-dev/Door43#69

Copied from original issue: unfoldingWord-dev/uwadmin#43

There is some interest in having a "no scripture" flag for the languages that have no scripture. @jag3773, could this be generated from the "resources" block, so that languages with empty resources would be flagged as "no scripture"? Would that be close to accurate, or do we have massive holes in our data?
--Perry
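A sketch of the proposed rule, assuming each language record has a `resources` list. The record shape and sample codes are hypothetical.

```python
def no_scripture(language: dict) -> bool:
    """Flag a language when its resources block is empty or missing."""
    return not language.get("resources")


# Hypothetical sample records.
languages = [
    {"lc": "aa", "resources": []},
    {"lc": "en", "resources": [{"name": "sample resource"}]},
    {"lc": "xx"},
]
flagged = [lang["lc"] for lang in languages if no_scripture(lang)]
print(flagged)
```

Any language whose resources are simply missing from our data would also be flagged, so the flag is only as accurate as the resources data.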

Add Source Link Field to Resources section

Offline Progress Form

We need a form for GL teams to record offline progress. Such a form should include:

Language name
Resource name (drop-down)
Status (drop-down)
Comment (plain text)

The list of status options needs to be defined.
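One way to pin down the fields before building the form is a small validated record. This is only a sketch: `OfflineProgressEntry` and both choice lists are hypothetical placeholders, since the status options are still to be defined.

```python
from dataclasses import dataclass

# Placeholder drop-down values; the real lists are still to be defined.
RESOURCE_CHOICES = ("Open Bible Stories", "translationNotes", "translationWords")
STATUS_CHOICES = ("Not started", "In progress", "Complete")


@dataclass
class OfflineProgressEntry:
    language_name: str
    resource_name: str  # one of RESOURCE_CHOICES (drop-down)
    status: str  # one of STATUS_CHOICES (drop-down)
    comment: str = ""  # free plain text

    def __post_init__(self) -> None:
        # Reject blank languages and values outside the drop-down lists.
        if not self.language_name.strip():
            raise ValueError("language name is required")
        if self.resource_name not in RESOURCE_CHOICES:
            raise ValueError(f"unknown resource: {self.resource_name!r}")
        if self.status not in STATUS_CHOICES:
            raise ValueError(f"unknown status: {self.status!r}")


entry = OfflineProgressEntry("Tok Pisin", "Open Bible Stories", "In progress")
```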

Add Missing 'Translation Suggestions' Section

From @jag3773 on July 29, 2015 20:16

The translationNotes now use a new section, 'Translation Suggestions', which needs to be added to the publish utility https://github.com/unfoldingWord-dev/tools/blob/master/obs/json/json_tn_export.py and probably to https://github.com/unfoldingWord-dev/tools/blob/master/uwb/tn-kt_export.py. Is @swilcox perhaps already in the process of importing these into the uW Admin site?

Copied from original issue: unfoldingWord-dev/uwadmin#50

Bible Publishing

From @jag3773 on February 9, 2015 18:1

We need to add an interface and guts for publishing Scripture.

This should be based on USFM input. Ideally, the administrator could upload a zip file of USFM-formatted text and then add publishing information about it.

Some cobbled-together scripts that currently publish from our Etherpad texts are:
https://github.com/Door43/tools/blob/master/uwb/ep_export.py
https://github.com/Door43/tools/blob/master/uwb/api_publish.py
https://github.com/Door43/tools/blob/master/uw/update_catalog.py

Note that ep_export should stay as it is; it just takes the text out of Etherpad and puts it into a GitHub repo as USFM files.

Copied from original issue: unfoldingWord-dev/uwadmin#5
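A sketch of the first step of an upload handler, assuming the zip holds one USFM file per book and that each file opens with the standard `\id` marker naming the book code. `list_usfm_books` is a hypothetical name; publishing information and catalog updates are not covered.

```python
import io
import zipfile

USFM_SUFFIXES = (".usfm", ".sfm")


def list_usfm_books(zip_bytes: bytes) -> dict[str, str]:
    """Map each book code found in an uploaded zip to its member file name."""
    books: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(USFM_SUFFIXES):
                continue
            # utf-8-sig also strips a byte-order mark if the editor added one.
            text = archive.read(name).decode("utf-8-sig")
            for line in text.splitlines():
                parts = line.split()
                # A USFM book starts with a line like: \id GEN <description>
                if len(parts) >= 2 and parts[0] == "\\id":
                    books[parts[1].upper()] = name
                    break
    return books
```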

Allow ordering of networks by the user

The order of the networks seems to be the order in which they are selected. Allow the user to drag network names left or right without having to delete and reselect them in order to change the order.

The user might want to list the networks in order of importance or use.

Create Visualizations for Gateway Languages

@jag3773 mentioned in previous conversations that he would like to have something like the dendrogram or Node-Link Tree.

We should have visualizations:

By Country
By Gateway Language

Considerations:

We should also consider the Tri-Fold Tree where the part you click on rotates to the 3 O'Clock position.
English is in the center as the source text
- The next layer of nodes should be the 47 gateway languages.
- See the attached Gateway Languages PDF. Note that the DB needs to be seeded with this information based on region. In other words, every language in the DB needs to have its gateway set based on its region.
- Note that we'll need to have some meta gateway languages. For instance, for now Papua New Guinea languages should show up with a gateway of "English/Tok Pisin". We know that the languages in PNG should be able to read one or the other of those, but we don't necessarily know the exact mapping at this point.
The outer layer is the rest of the languages.
Every language should be color-coded based on the region it's in (see page 2 of the attached PDF).
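The layered structure described above (English in the center, gateway languages next, every other language outside) maps naturally to a nested tree. A sketch, assuming a `gateway_of` mapping from language name to gateway language (a hypothetical input), producing the `{"name", "children"}` shape that D3's `d3.hierarchy` reads by default:

```python
from collections import defaultdict


def build_gl_tree(gateway_of: dict[str, str], root: str = "English") -> dict:
    """Nest every language under its gateway language, under one root node.

    Meta gateways such as "English/Tok Pisin" are simply another
    middle-layer name.
    """
    layers: dict[str, list[str]] = defaultdict(list)
    for language, gateway in gateway_of.items():
        layers[gateway].append(language)
    return {
        "name": root,
        "children": [
            {"name": gl, "children": [{"name": lang} for lang in sorted(langs)]}
            for gl, langs in sorted(layers.items())
        ],
    }
```

Region color-coding could be carried as an extra key on each language node.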

The visualization URL should be http://td.unfoldingword.org/gl/

The ring-style visualization doesn't seem to scale to the sheer number of languages (over 7500) split across only 47 gateway languages.

Fix Anticipated Completion Date

The calendar beginning with year 0001 isn't very helpful. Maybe start the calendar at today's date.
Clicking on the right arrow advances the year by one but does not rapidly advance the years.
Show the allowed format after the field name.

MAP: Fix Multiple GLs for given country

We need to fix the issue where only 1 GL shows per country on the map on the main page.
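The cause of the map bug isn't stated here. If the map data is built with a single gateway language per country key, later GLs would overwrite earlier ones. A sketch that keeps all of them, assuming the input is a list of (country code, GL) pairs (`gls_by_country` is hypothetical):

```python
from collections import defaultdict


def gls_by_country(pairs) -> dict[str, list[str]]:
    """Collect every gateway language per country, without duplicates.

    Storing a list per country, instead of one value per key, keeps a
    later GL from replacing an earlier one.
    """
    result: dict[str, list[str]] = defaultdict(list)
    for country, gl in pairs:
        if gl not in result[country]:
            result[country].append(gl)
    return dict(result)
```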

Harvest Ethnologue Maps

The Ethnologue makes low-res, watermarked language maps available for free download on their website, located at consistent URLs based on the ISO 3166 country code (VN = Vietnam, etc.), e.g. http://www.ethnologue.com/country/VN/maps

Given that we know the full list of ISO 3166 country codes, and if the URL structure is consistent, we need a script to download all the maps available for every country and show them in tD.

Then, generate a PDF with a TOC for each country code (and name) and map thereof.
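A sketch that builds the per-country map page URLs from the pattern in the example above. The caller supplies the code list, and this does not check that each page exists or locate the image files on it; that is the consistency check the issue calls for.

```python
import re

# Pattern copied from the example URL in this issue.
MAP_PAGE_URL = "http://www.ethnologue.com/country/{code}/maps"


def map_page_urls(country_codes) -> dict[str, str]:
    """Return {code: map page URL} for ISO 3166-1 alpha-2 codes."""
    urls = {}
    for code in country_codes:
        code = code.strip().upper()
        if not re.fullmatch(r"[A-Z]{2}", code):
            raise ValueError(f"not an alpha-2 country code: {code!r}")
        urls[code] = MAP_PAGE_URL.format(code=code)
    return urls
```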

Allow multiple books to be selected

Allow for multiple books to be selected.

We might even have selections such as Pentateuch (G, E, L, N, D), History, Poetic, Major Prophets, Minor Prophets, Gospels, Synoptic Gospels, Gospels and Acts, Pauline, yada, yada. I'm not sure of the best way, but Dave especially saw a need for selecting multiple books.
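A sketch of named presets that expand into individual books, assuming books are identified by their USFM codes. `BOOK_GROUPS` is hypothetical, and only a few of the suggested groups are filled in.

```python
# Hypothetical presets using USFM book codes.
BOOK_GROUPS = {
    "Pentateuch": ["GEN", "EXO", "LEV", "NUM", "DEU"],
    "Synoptic Gospels": ["MAT", "MRK", "LUK"],
    "Gospels": ["MAT", "MRK", "LUK", "JHN"],
    "Gospels and Acts": ["MAT", "MRK", "LUK", "JHN", "ACT"],
}


def expand_selection(selection) -> list[str]:
    """Expand group names into book codes, keeping order and dropping repeats.

    Anything that is not a group name is treated as a single book code.
    """
    books: list[str] = []
    for item in selection:
        for code in BOOK_GROUPS.get(item, [item]):
            if code not in books:
                books.append(code)
    return books


print(expand_selection(["Synoptic Gospels", "JHN", "ACT"]))
```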

GL Dashboard - uW Published Material

Page 2 of PDF. Indicate GL language-resource progress based on material published to the catalog.

Update Scripture Field

Given the P3 we are using, the first field under Scripture should be “Is there a desire for translation?” or words to that effect with Yes, No, Unknown/Not Available options.

Rationale: Before capturing the rest of the information about the Scripture, we should document whether the local church (network) and people want or have a translation. If not, why spend time now capturing the information? If we want to know which networks don't desire a translation, the system can easily display or print that list for any purpose, such as entering the available info when someone has the time even though the local network isn't interested.

Add a “Year” field to describe the Yes/No/NA field in #1 above.

Add a “How much/What part” field.

Either a typed-in descriptive field or a selection list

Add a “What format” field.

Written, video, etc.

Networks translating

I've noticed some glitches in the tD: under "Networks translating" there is a long dropdown list of bizarre entities, and SIL is not there and can't be typed in, although they do most of the translation work in the world.

Should we remove app_words from JSON files?

The JSON files for the translation text contain "app_words". These are deprecated and should be removed to decrease the file size.
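If the answer is yes, the key can be dropped when the files are generated. A sketch, assuming the files are plain JSON and that `app_words` can be removed wherever it appears (`drop_key` is a hypothetical helper):

```python
import json


def drop_key(data, key: str = "app_words"):
    """Return a copy of parsed JSON with `key` removed at any nesting level."""
    if isinstance(data, dict):
        return {k: drop_key(v, key) for k, v in data.items() if k != key}
    if isinstance(data, list):
        return [drop_key(item, key) for item in data]
    return data


original = json.loads('{"app_words": {"cancel": "Cancel"}, "chapters": [{"number": "01"}]}')
cleaned = drop_key(original)
print(json.dumps(cleaned))
```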

Check out l'Observatoire linguistique as a data source

Jesse,

I am excited to see this slowly coming together, and I’m convinced it will be a very good tool in the end. You mentioned that you currently import data from the Ethnologue, SIL and Wikipedia, with ‘glottolog’ being “worked on” right now. I strongly suggest we also draw information from the Linguasphere Register.

This register covers not only all languages, and where they are spoken, but also indicates here and there which script is in use (e.g. the Cyrillic script for some Caucasian languages). I think the Linguasphere coding system is the one that WCD (World Christian Database) uses. (I took just a very quick look at this, so it would need to be verified.)

At any rate, we should utilize the huge amount of data that Linguasphere provides.

Blessings

Ralph

GL Dashboard - Combined Overview

Page 3 of PDF. Combine Door43, uW, and Offline activity into a GL language-resource heat map. Relies on #61, #60, and #59.

Add Number of Languages Column

This page should also show a column for the number of languages in each country: http://td.unfoldingword.org/uw/countries/.

GL Dashboard - Online Activity

Page 1 of the PDF. Indicate activity for each GL language-resource combination in a heat map.

Add Alternate Language Names to langnames.json

We need to add all of the known alternate language names to the langnames.json export.
First priority:

Include English names for all languages (we already have this info in http://td.unfoldingword.org/data-sources/wikipedia/). We also have the description field from http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry that has the English name of the language.

Secondary priority:

See if we can find and import other "alternate" names for languages, possibly by scraping the Ethnologue.
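The IANA registry is a plain-text file of records separated by `%%` lines. Each record has `Type`, `Subtag`, and one or more `Description` fields, and a line that starts with whitespace continues the previous field. A sketch that collects the English descriptions for each language subtag (`parse_registry` is a hypothetical helper):

```python
def parse_registry(text: str) -> dict[str, list[str]]:
    """Map each language subtag to its Description values (English names)."""
    names: dict[str, list[str]] = {}
    fields: list[tuple[str, str]] = []  # (field, value) pairs of current record

    def flush() -> None:
        # Keep only "Type: language" records; skip the File-Date header etc.
        record_type = next((v for f, v in fields if f == "Type"), None)
        subtag = next((v for f, v in fields if f == "Subtag"), None)
        if record_type == "language" and subtag:
            names[subtag] = [v for f, v in fields if f == "Description"]
        fields.clear()

    for line in text.splitlines():
        if line.strip() == "%%":
            flush()
        elif line[:1].isspace() and fields:
            # Folded line: append to the previous field's value.
            field, value = fields[-1]
            fields[-1] = (field, value + " " + line.strip())
        elif ":" in line:
            field, value = line.split(":", 1)
            fields.append((field.strip(), value.strip()))
    flush()
    return names
```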

Database Backups

We need to be making daily database backups to our offsite backup server. @swilcox, what do we have on the Gondor side to facilitate this?
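A sketch of the dump half of a daily backup, assuming `pg_dump` is installed and a database URL is at hand. The backup directory and file naming are hypothetical. The transfer to the offsite server is left open, since the tooling on that side is the open question.

```python
import datetime
import subprocess

# Hypothetical location; adjust to wherever the offsite sync picks files up.
BACKUP_DIR = "/var/backups/td"


def backup_command(database_url: str, day: datetime.date) -> list[str]:
    """Build a pg_dump call that writes one custom-format dump per day."""
    target = f"{BACKUP_DIR}/td-{day.isoformat()}.dump"
    # pg_dump accepts a connection URI in place of a plain database name.
    return ["pg_dump", "--format=custom", f"--file={target}", database_url]


def run_backup(database_url: str) -> None:
    """Run today's dump; check=True raises if pg_dump exits non-zero."""
    subprocess.run(backup_command(database_url, datetime.date.today()), check=True)
```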

Dynamic API: Verse Search

From @jag3773 on July 16, 2015 21:25

From @jag3773 on June 23, 2015 22:15

We need a dynamic API endpoint that returns Scripture. Endpoint should accept verses or verse ranges as well as language and/or version requests.

Copied from original issue: unfoldingWord-dev/Door43#68

Copied from original issue: unfoldingWord-dev/uwadmin#44
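A sketch of reference parsing for such an endpoint, assuming a hypothetical `BOOK chapter:verse[-verse]` syntax with USFM book codes. Language and version would be separate request parameters and are not handled here.

```python
import re

# Hypothetical syntax, e.g. "JHN 3:16" or "1CO 13:4-7".
REF = re.compile(r"^([1-3A-Z][A-Z0-9]{2})\s+(\d+):(\d+)(?:-(\d+))?$")


def parse_reference(ref: str) -> tuple[str, int, int, int]:
    """Return (book, chapter, first_verse, last_verse) for a reference."""
    match = REF.match(ref.strip().upper())
    if not match:
        raise ValueError(f"unrecognized reference: {ref!r}")
    book, chapter, first, last = match.groups()
    first_verse = int(first)
    last_verse = int(last) if last else first_verse
    if last_verse < first_verse:
        raise ValueError(f"verse range runs backwards: {ref!r}")
    return book, int(chapter), first_verse, last_verse
```

Ranges that cross chapter boundaries are not supported by this pattern.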

Add a Graphviz-compatible data export endpoint
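A sketch of what such an export could emit, assuming the graph is a set of gateway-language-to-language edges (the edge list and the `to_dot` helper are hypothetical). The output is a DOT `digraph` that Graphviz tools such as `dot` can render.

```python
def to_dot(edges, graph_name: str = "languages") -> str:
    """Render (gateway, language) edges as a Graphviz DOT digraph."""

    def quote(label: str) -> str:
        # DOT double-quoted IDs escape embedded quotes with a backslash.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for gateway, language in edges:
        lines.append(f"  {quote(gateway)} -> {quote(language)};")
    lines.append("}")
    return "\n".join(lines)
```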

tA IDs

From @jag3773 on July 16, 2015 21:2

We need to hash the slug as the ID for translationAcademy articles.

Copied from original issue: unfoldingWord-dev/uwadmin#41

Add ability to add region codes to AdditionalLanguage

GL Translation Progress Tracking Dashboard

We need to track the progress of GL translation efforts. I've attached a PDF of a spreadsheet layout that shows what we are aiming for.
gl_dashboard__mockup_.pdf

Add GL Column

This page should also show a column for the GLs of each country: http://td.unfoldingword.org/uw/countries/.

Add ability to submit a request for a new Living Language

We realize this is the ISO identifier. If we find a language without an identifier, can we (should we) add it to the tD? If so, we need an Add option.

Is the tD only interested in capturing data about “living” languages? This might be a “duh” question; just clarifying.

Version 1 and 2 Export

From @jag3773 on February 9, 2015 18:15

Translations whose source text version number is lower than the latest source text version number should be listed in an API endpoint of some sort. This will allow a plugin on our Door43 server to show a notice for that language explaining how to update to the latest version.

Note that English version should be considered the "latest source text version".

Copied from original issue: unfoldingWord-dev/uwadmin#9
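A sketch of the comparison. It assumes each translation records the source text version it was made from as a dotted number such as `2.1` (a hypothetical input shape), and it treats the English entry as the latest.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn "2.1" into (2, 1) so versions compare numerically, not as text."""
    parts = [int(part) for part in version.split(".")]
    # Drop trailing zeros so "3" and "3.0" count as the same version.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def outdated_translations(source_versions: dict[str, str], latest_lang: str = "en") -> list[str]:
    """Return language codes whose source version is below the English one."""
    latest = version_key(source_versions[latest_lang])
    return sorted(
        lang
        for lang, version in source_versions.items()
        if version_key(version) < latest
    )
```

Comparing integer tuples avoids the text-comparison trap where "10" sorts before "9".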

Publish tA vol1 to uW Catalog

From @jag3773 on July 29, 2015 20:5

We need to write a publishing routine to get tA vol1 into the catalog.

Each entry should have an ID that is the "slug" field in the source YAML matter. It's possible that we'll need to prefix the slug with vol1-slug and/or vol2-slug in order to avoid collisions (please first check to see if there are any existing collisions before implementing that though). This comment replaces #72 .

Copied from original issue: unfoldingWord-dev/uwadmin#49
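A sketch of the collision check the issue asks for before adding prefixes. It assumes the slugs have already been read from each volume's YAML front matter. `find_slug_collisions` and `entry_id` are hypothetical helpers.

```python
from collections import defaultdict


def find_slug_collisions(volumes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return slugs used more than once, with the volumes that use them."""
    seen: dict[str, list[str]] = defaultdict(list)
    for volume, slugs in volumes.items():
        for slug in slugs:
            seen[slug].append(volume)
    return {slug: vols for slug, vols in seen.items() if len(vols) > 1}


def entry_id(volume: str, slug: str, collisions: dict[str, list[str]]) -> str:
    """Prefix with the volume (e.g. vol1-slug) only when the bare slug collides."""
    return f"{volume}-{slug}" if slug in collisions else slug
```

Prefixing only colliding slugs keeps most IDs unchanged. If collisions do turn up, prefixing every slug would be the simpler and more uniform alternative.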

Versioned Data Dumps

We need to discuss versioned data dumps of information we are allowed to dump.

Add Geographic Data to langnames.json

We need the country code and the geographic region added to the langnames.json endpoint. An entry should look something like this:

{  "lc": "aa",
   "cc": "NG",
   "lr": "Africa",
   "ln": "Afaraf"
}

Add ability to add/edit/remove Translators

Update OBS Available Translations

This mockup shows the general direction that I would like to see for the static HTML rendering of the "Available Translations of Open Bible Stories" that is embedded on the page unfoldingWord.or/stories.

Some points about it:

the HTML bullet should be the level checking indicator for each respective translation. Note the new icon for "in progress" (fourth language in the list).
the language names are self-names where we have them
the language codes are IETF-compliant
icon 1: low-res icon - links to SD in-browser display
icon 2: hi-res icon - links to HD in-browser display
icon 3: download icon - links directly to PDF download
icon 4: app icon - links directly to unfoldingWord mobile app in Google Play (eventually we will add different icons for Android, iOS, etc.) Note: this icon could be moved to a larger icon at the top of the page for "get all these languages in the mobile apps"
note the "in progress" icon which should link directly to the translation draft for that language in Door43. Note: this link could also be provided for checked translations as well.

Update Regions

We need to make the regions that tD is showing match up with the region list on the GL page and that WA uses. I think that means we add South Asia and change Europe to Eurasia.

Reference https://unfoldingword.org/gateway/ and Help Desk Ticket https://help.door43.org/Ticket/Display.html?id=1479.

Serve compact JSON files rather than pretty-printed ones

We can save about 40KB per JSON file if we serve them compact, i.e. with all the extra whitespace stripped out.
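With Python's `json` module the difference comes down to the `indent` and `separators` arguments. A sketch (`dump_compact` is a hypothetical helper):

```python
import json


def dump_compact(data) -> str:
    """Serialize without the whitespace a pretty-printed dump adds."""
    # separators=(",", ":") removes the space after each comma and colon;
    # omitting indent keeps everything on one line. ensure_ascii=False writes
    # non-Latin self-names as UTF-8 instead of longer \uXXXX escapes.
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)
```

With `ensure_ascii=False` the file must be written as UTF-8. If the endpoints are produced by a Django view using `JsonResponse`, the same arguments could be passed through its `json_dumps_params` parameter.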

Copyright saving issue

Also, when you type something in the "copyright" field and click "Save", it doesn't get saved.

Merge "Is Gateway" Field

In the language lists, the "Is Gateway" column should be merged into the Gateway Language column to conserve space.

Import the Americas' Data

Hi Jesse and Patrick

I mentioned that the Americas region has data that we'd like you to import into the tD. This email has all that info. In addition, we believe we have a list of additional languages to be added. Would you also help with that effort?

I've attached two files. The one whose name begins with "Languages" is a spreadsheet compiled by a WA person working his deputation (as I understand). His name is Todd, hence you'll read "Todd's spreadsheet" in various places. This spreadsheet ("ss" in places) is the source of the info I'd like you to import into the tD.

The other attachment (2 db - 2 "database" - comparison) has 3 tabs. The first is my field by field comparison of Todd's ss with the fields in the tD. On the left side of this ss, I've listed Todd's field name (exactly from the other attachment) and then my best guess at the definition for the field. In fields where it made sense, I have also listed the current values in Todd's ss.

In the first column on the right side of the first tab of the 2 db comparison ss, I have listed the 4 or 5 data fields which I believe are a match (the data in the field means exactly the same thing in Todd's ss as in the tD). Then, I have listed the fields (without an attempt at definitions) of the tD in a hierarchy read from left to right.

The second tab of the "2 db comparison" ss holds a new topic altogether: identification of new languages. I've listed possible new languages in Guinea-Bissau and have compared the list of new languages to those currently listed for G-B. Dave Byron received this list from a Brazilian woman and believes the list does represent as yet unidentified languages. I'm not sure what the procedure is to "add new languages" since there likely will be some official comparison to the ISO and, if new, assignment of new ISO codes.

Please let me know if there's anything I can do to help or clarify with either of these efforts.

Thanks! Karen

Add travis config

From @jag3773 on July 23, 2015 21:13

Copied from original issue: unfoldingWord-dev/uwadmin#45

Create Importer for Glottolog

Single Text Box Entry Form

Create a form with a single entry box, similar to Twitter, that allows people to dump information into it.

The text box should be hashtag-aware so that the entry can be tagged with various topics. Once inserted, the entry should become a comment on the relevant page, and the data should be mined and inserted into the appropriate fields on the appropriate page.

See the example mockup.
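A sketch of the hashtag-detection step. It assumes a tag is `#` followed by letters, digits, or underscores. Mapping tags to pages and fields is not covered, and `extract_hashtags` is a hypothetical helper.

```python
import re

# "#" must not be glued to a preceding word character, so "C#" is not a tag.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(entry: str) -> list[str]:
    """Return distinct hashtags in an entry, lowercased, in first-seen order."""
    tags: list[str] = []
    for tag in HASHTAG.findall(entry):
        tag = tag.lower()
        if tag not in tags:
            tags.append(tag)
    return tags
```

Python's `\w` is Unicode-aware, so tags written in non-Latin scripts are matched too.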

Published Date Issue

Also, under "Published date" if you type in a year, it adds another number that seems to be the current month. I think most of the time we will only know the year and that's really all we care about, so I would recommend getting rid of that month feature.

Copied from original issue: unfoldingWord-dev/uwadmin#12

GL Dashboard - Language-Resource Detail

Page 5 of PDF. Show detailed heat map for each GL language-resource page in Door43.