A JavaScript library for converting linguistic data to HTML

License: MIT License

dlx2html

A JavaScript library for converting linguistic data to HTML for presentation on the web.

Written using modern ES modules, and usable in both Node and the browser.

When writing about linguistic data, linguists use a format called an interlinear glossed example which shows each of the parts of a word (morphemes) and their meanings. This allows people who are not familiar with the language under discussion to read the examples and understand their structure and meaning. Below is a very simple example from Swahili:

# Swahili
ninaenda
ni-na-end-a
1SG-PRES-go-IND
I am going

These interlinear glossed examples follow a very specific format, originally specified in the Leipzig Glossing Rules. Another specification, called DaFoDiL, formalizes how such data should be structured when being stored as JSON or worked with as a plain old JavaScript object (POJO).

The dlx2html library takes one or more interlinear glosses in the DaFoDiL format and converts them to HTML for representing linguistic examples on the web.

The dlx2html library does not add any styling to the output HTML. Users should either add their own CSS styles, or use the compatible Digital Linguistics Style Library. The structure of the output HTML and CSS classes are described below.

If using this library for research, please cite it using the model below:

Hieber, Daniel W. {year}. @digitallinguistics/dlx2html. https://github.com/digitallinguistics/dlx2html/. DOI: 10.5281/zenodo.10720085.

Samples

The following pages demo the HTML output from the library. They are styled using the DLx styles library.

Usage

This library is written in JavaScript, and may be run as either a Node.js module or as a script in the browser. See the Node.js learning path for more information about Node.js, how to install it, and how to run programs with it.

Node.js

To use dlx2html in Node:

  1. Install the package.

    npm install @digitallinguistics/dlx2html
    # OR
    yarn add @digitallinguistics/dlx2html

  2. Import the package and use it to convert the data to HTML.

    // Import the dlx2html module.
    import convert      from '@digitallinguistics/dlx2html'
    import { readFile } from 'node:fs/promises'
    
    // Load the data from a JSON-formatted DaFoDiL file.
    const json = await readFile(`examples.json`, `utf-8`)
    const data = JSON.parse(json)
    
    // Convert the text to HTML.
    const html = convert(data, { /* specify options here */ })
    
    console.log(html) // <div class=igl>...</div>

Browser

To use dlx2html in the browser:

  1. Download the latest version of the library from the releases page. Copy the dlx2html.js file to your project.

  2. Import and use the script in your code:

    <script type=module>
    
      // Import the dlx2html module
      import convert from './dlx2html.js'
    
      // Get a reference to your data.
      const json = document.body.innerText
      const data = JSON.parse(json)
    
      // Convert the text to HTML.
      const html = convert(data, { /* specify options here */ })
    
      // Insert the HTML into your page.
      document.body.innerHTML = html
    
    </script>

API

Calling the dlx2html function returns an HTML string.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| abbreviations | Object | {} | An object hash providing the full descriptions of gloss abbreviations, e.g. "sg" => "singular". If present, these will be used to populate the title attribute of <abbr> elements for glosses. Note that the abbreviations are case-sensitive. |
| analysisLang | String | undefined | An IETF language tag to use as the default value of the lang attribute for any data in the analysis language (metadata, literal translation, free translation, glosses, literal word translation). If undefined, no lang attribute is added, which means that browsers will assume that the analysis language is the same as that of the HTML document. |
| classes | Array | ['igl'] | An array of classes to apply to the wrapper element. |
| glosses | Boolean | false | Options for wrapping glosses in <abbr> tags. If set to false (default), no <abbr> tags are added to the glosses. If set to true, an <abbr> tag is wrapped around any glosses in CAPS, any numbers, and any of sg, du, or pl (lowercased). Note that text within the <abbr> will be converted to lowercase, since by convention glosses are rendered in smallcaps (and uppercase letters display as normal uppercase letters even when in smallcaps). |
| tag | String | 'div' | The HTML tag to wrap each interlinear gloss in. Can also be a custom tag (useful for HTML custom elements). |
| targetLang | String | undefined | An IETF language tag to use as the default value of the lang attribute for any data in the target language. |
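
The behavior described for the glosses and abbreviations options can be illustrated with a small standalone sketch. This is not the library's implementation: wrapGlosses and glossRegExp are hypothetical names, and the sketch assumes that abbreviations are looked up using the gloss exactly as it is written in the data.

```javascript
// Hypothetical helper for illustration; it is NOT part of the dlx2html API.
// Matches runs of capital letters, runs of digits, and the lowercase number
// glosses sg / du / pl when they are not part of a longer lowercase word.
const glossRegExp = /[A-Z]+|[0-9]+|(?<![a-z])(?:sg|du|pl)(?![a-z])/gu

function wrapGlosses(glossLine, abbreviations = {}) {
  return glossLine.replace(glossRegExp, match => {
    // Assumption: the lookup is case-sensitive and uses the gloss as written.
    const description = abbreviations[match]
    const title       = description ? ` title="${ description }"` : ``
    // The wrapped text is lowercased so that it displays correctly in smallcaps.
    return `<abbr${ title }>${ match.toLowerCase() }</abbr>`
  })
}

const abbreviations = { IND: `indicative`, PRES: `present tense`, SG: `singular` }

console.log(wrapGlosses(`1SG-PRES-go-IND`, abbreviations))
// <abbr>1</abbr><abbr title="singular">sg</abbr>-<abbr title="present tense">pres</abbr>-go-<abbr title="indicative">ind</abbr>
```

Note that the person and number glosses end up in adjacent <abbr> elements with no separator, and that the sketch does not escape quotation marks in descriptions.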

HTML Structure

This section describes the structure of the HTML output by this library, and the classes added to the HTML elements. You can see sample HTML output in the samples/ folder, as well as in the DLx Styles library.

Note: The output HTML does not contain much extraneous whitespace and therefore is not very human-readable. If you want more readable output, use a formatting library like Prettier to format the result.

Each utterance/example in the original data is wrapped in a <div class=igl> element by default. You can customize both the tag that is used for the wrapper and the classes applied to it with the tag and classes options. For example, to wrap each utterance in <li class=interlinear>, you would provide the following options:

const options = {
  classes: [`interlinear`],
  tag:     `li`
}

You can apply the following types of emphasis to the data:

| Scription | HTML Output | Renders As |
| --- | --- | --- |
| ***text*** | <strong>text</strong> | text |
| **text** | <em>text</em> | text |
| *text* | <b>text</b> | text |
| _text_ | <u>text</u> | text |

Additional Notes

  • The speaker (\sp) and source (\s) data are combined into a single element structured as follows: <p class=ex-source>{speaker} ({source})</p>.
  • Notes fields (\n) are not added to the HTML by default.
  • Individual glosses receive the .gl class.
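
The combined speaker and source element described in the notes above could be produced along the following lines. This is a minimal sketch rather than the library's code: formatSource is a hypothetical name, the sample values are placeholders, and the handling of a missing speaker or source is an assumption.

```javascript
// Hypothetical helper for illustration; it is NOT part of the dlx2html API.
function formatSource(speaker, source) {
  // Assumption: if neither value is present, no element is produced.
  if (!speaker && !source) return ``
  // Assumption: if only one value is present, it is shown on its own.
  const text = speaker && source ? `${ speaker } (${ source })` : speaker || source
  return `<p class=ex-source>${ text }</p>`
}

console.log(formatSource(`Speaker A`, `Field notes, p. 12`))
// <p class=ex-source>Speaker A (Field notes, p. 12)</p>
```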

CSS

The CSS classes for each line type are as follows:

| Line | CSS Class |
| --- | --- |
| metadata | ex-header |
| source | ex-source |
| transcript | trs |
| phonemic transcription | txn |
| phonetic transcription | phon |
| word transcription | w |
| morphemic analysis | m |
| glosses | glosses |
| literal translation | lit |
| timespan | timespan |
| free translation | tln |
| word translation | wlt |

If the language of the text is specified, it is set as the value of the lang attribute for data in the target language wherever relevant. Likewise, when the language of the analysis data (metadata, glosses, translations, etc.) is specified, it is passed through to the lang attribute of the relevant analysis-language elements (for example, <p class=tln lang=en>).
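
As a rough illustration of how the line classes and the lang attribute fit together, the sketch below looks up the class for a line type and renders a single element. The names lineClasses and renderLine are hypothetical, and rendering every line as a <p> element is an assumption; word-level lines may well use different elements in the actual output.

```javascript
// CSS classes for each line type, as listed in this section.
const lineClasses = new Map([
  [`metadata`, `ex-header`],
  [`source`, `ex-source`],
  [`transcript`, `trs`],
  [`phonemic transcription`, `txn`],
  [`phonetic transcription`, `phon`],
  [`word transcription`, `w`],
  [`morphemic analysis`, `m`],
  [`glosses`, `glosses`],
  [`literal translation`, `lit`],
  [`timespan`, `timespan`],
  [`free translation`, `tln`],
  [`word translation`, `wlt`],
])

// Hypothetical helper for illustration; it is NOT part of the dlx2html API.
function renderLine(lineType, text, lang) {
  const className = lineClasses.get(lineType)
  if (!className) throw new Error(`Unknown line type: ${ lineType }`)
  // The lang attribute is only added when a language is specified.
  const langAttr = lang ? ` lang=${ lang }` : ``
  return `<p class=${ className }${ langAttr }>${ text }</p>`
}

console.log(renderLine(`free translation`, `He left his brothers.`, `en`))
// <p class=tln lang=en>He left his brothers.</p>
```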

When the data occurs in multiple orthographies, the orthography of the data is specified in the data-ortho attribute. For example, the following data is transformed to the HTML that follows:

\trs-Modern  Wetkx hus naancaakamankx wetk hi hokmiqi.
\trs-Swadesh wetkšˊ husˊ na·nča·kamankšˊ wetkˊ hi hokmiʔiˊ.
\tln         He left his brothers.

<p class=trs data-ortho=Modern>Wetkx hus naancaakamankx wetk hi hokmiqi.</p>
<p class=trs data-ortho=Swadesh>wetkšˊ husˊ na·nča·kamankšˊ wetkˊ hi hokmiʔiˊ.</p>
<p class=tln>He left his brothers.</p>
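
The mapping from orthographies to data-ortho attributes can be sketched as follows. This assumes that the transcript is available as an object keyed by orthography name; renderTranscripts is a hypothetical name, and the sketch does no HTML escaping.

```javascript
// Hypothetical helper for illustration; it is NOT part of the dlx2html API.
// Assumption: `transcripts` maps each orthography name to its text.
function renderTranscripts(transcripts) {
  return Object.entries(transcripts)
    .map(([ortho, text]) => `<p class=trs data-ortho=${ ortho }>${ text }</p>`)
    .join(`\n`)
}

const transcripts = {
  Modern:  `Wetkx hus naancaakamankx wetk hi hokmiqi.`,
  Swadesh: `wetkšˊ husˊ na·nča·kamankšˊ wetkˊ hi hokmiʔiˊ.`,
}

// Prints one <p class=trs> element per orthography, in insertion order.
console.log(renderTranscripts(transcripts))
```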

dlx2html's Issues

example source

Transform the speaker data and source data into a single example source tag.

Support transforms for custom line codes

The scription2dlx library passes through undefined/unknown line data to the final object without processing them by default.

A custom lines option should allow the user to pass a hash mapping custom line codes -> rendering functions.

The user will most likely want to pass a custom utterance transform or custom line ordering as well. Otherwise, custom lines are added to the end of the utterance.

Output HTML should match the number and order of the lines in the original

The output HTML should match the number and order of the lines in the original.

This won't be directly possible once #56 is complete, because this library will no longer have information about the original scription data. This will probably have to be an option. Even so, it would be a very complicated option, and might be out of scope. This is perhaps better handled by post-processor plugins.

Allow custom transformers for utterance-level fields

Allow the user to pass custom line parsers for utterance-level fields.

Explain the file structure of the library in this option's details.

Could exemplify this with the metadata line since that's very straightforward.

Might want to pass the Utterance object to each of the line methods too, so users can access that context if needed.

add content to readme

Do this after the library is fully written.

Add user guide to readme.

Don't deploy to GitHub Pages - just use the GitHub repo's readme.

  • purpose
  • citation
  • installation
  • basic usage

Support custom transforms for utterance

The user should be able to provide a custom transform for the utterance, which accepts all of the line data as one argument, and the utterance object as a second argument, and must return an HTML string.

This allows the user to add custom lines or other markup around existing lines, or to reorder lines. One specific use case is to add notes lines at the end of the interlinear example (which by default aren't processed).

This runs after individual line transforms are run.
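
A transform of the kind proposed here might look like the sketch below. Everything in it is an assumption made for illustration: the function name appendNotes, the shape of the lines argument (an object of already-rendered HTML strings), the shape of utterance.notes (an array of objects with a text property), and the note class.

```javascript
// Hypothetical transform sketching the proposal; the signature is an assumption.
function appendNotes(lines, utterance) {
  // Keep the existing lines in their original order.
  const renderedLines = Object.values(lines)
  // Add one element per note after the existing lines.
  const renderedNotes = (utterance.notes ?? []).map(note => `<p class=note>${ note.text }</p>`)
  return [...renderedLines, ...renderedNotes].join(``)
}

const lines     = { trs: `<p class=trs>ninaenda</p>`, tln: `<p class=tln>I am going</p>` }
const utterance = { notes: [{ text: `Elicited example.` }] }

console.log(appendNotes(lines, utterance))
// <p class=trs>ninaenda</p><p class=tln>I am going</p><p class=note>Elicited example.</p>
```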

Support example labels

Support the addition of a label to the left of the example, independent of the counter number.

Not clear what the HTML markup for this should be.

This should be specified globally, and applied to all utterances. If the user needs to customize the labels on a per-utterance basis, they should call the function multiple times.

Support morphemic analysis

Though the DLx data has a morphemes property which contains information, this library should only populate HTML based on the word's analysis field (the morphemic analysis for the word).

If the user wants to construct the morphemic analysis line from the morphemes array, they should do so using the scription2dlx library (which populates the field automatically).

https://scription.digitallinguistics.io/#morphemes

Support line labels

Allow the user to specify labels to display in front of lines.

Not sure what the HTML structure for this should be, or how the user should specify it.

This should be specified globally. If users need to customize line labels on a per-utterance basis, they should call the function multiple times.

option: `glosses`

Support a glosses option with the following values:

| Value | Description |
| --- | --- |
| false (default) | Do not wrap glosses in <abbr> tags. |
| true | Any glosses in CAPS will be wrapped in an <abbr> tag. Warning: This will also wrap acronyms like NSF or all-caps words like I. |

You don't need to parse the string of glosses to do this. Just search for numbers or sequences of capital letters to wrap in abbreviations. You don't even need to separate the combined person-number glosses; the <abbr> tags should be butted up next to each other with no separator.

option: tag

Add a tag option which allows the user to specify the HTML tag to wrap the interlinear gloss in.

option: targetLang + analysisLang

The targetLang option should allow the user to pass an IETF language code to use as the default value of the lang attribute in the HTML.

When applying this to the phonetic transcription, include the -fonipa subcode.
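
One way to derive the language tag for the phonetic transcription is sketched below. phoneticLang is a hypothetical name, and the sketch assumes a simple tag such as sw; a tag that already carries extension or private-use subtags would need the variant inserted before them rather than appended.

```javascript
// Hypothetical helper for illustration; it is NOT part of the dlx2html API.
function phoneticLang(targetLang) {
  // Without a target language there is nothing to extend.
  if (!targetLang) return undefined
  // Avoid adding the variant twice.
  return targetLang.endsWith(`-fonipa`) ? targetLang : `${ targetLang }-fonipa`
}

console.log(phoneticLang(`sw`)) // sw-fonipa
```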

set up Cypress for HTML testing

Set up Cypress for testing the HTML output of the library. Try to use Cypress' component testing functionality, and write your own custom mount() command.

utterance metadata

Parse the utterance metadata line and output it as the example header.

option: attributes

Add an attributes option which allows the user to provide a hash of attributes and their values to be added to the outer element of the interlinear gloss.

option: classes

Add a classes option which allows the user to provide an array of CSS classes to apply to the outer element of the interlinear gloss.

bundle for browsers

Use esbuild to bundle the script for browsers, if it winds up being more than one file.

option: errors

Add an errors option which allows the user to specify how to handle errors.

  • true: Throw on errors. (default)
  • false: Fail silently, skip the utterance.
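
The two proposed values could behave roughly as in this sketch. It is not the library's code: convertAll is a hypothetical wrapper, and convertOne stands in for whatever function converts a single utterance to HTML.

```javascript
// Hypothetical wrapper for illustration; it is NOT part of the dlx2html API.
function convertAll(utterances, convertOne, { errors = true } = {}) {
  const output = []
  for (const utterance of utterances) {
    try {
      output.push(convertOne(utterance))
    } catch (e) {
      // errors: true (default) rethrows; errors: false skips the utterance.
      if (errors) throw e
    }
  }
  return output.join(``)
}

// A stand-in converter that rejects utterances without a transcript.
const convertOne = utterance => {
  if (!utterance.trs) throw new Error(`Missing transcript`)
  return `<div class=igl><p class=trs>${ utterance.trs }</p></div>`
}

console.log(convertAll([{ trs: `ninaenda` }, {}], convertOne, { errors: false }))
// <div class=igl><p class=trs>ninaenda</p></div>
```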

Create sample output for visual testing

Create sample HTML output for visual testing. This should make testing a lot easier.

  • Create sample HTML output that updates as part of the build step.
  • Apply DLx styles to the sample HTML.
  • Deploy project documentation + sample HTML to GitHub pages.
  • Include links to sample HTML in the readme.
