laws-africa / bluebell

Bluebell is a generic Akoma Ntoso 3 parser.

Home Page: https://laws.africa/open-law-technology

License: GNU General Public License v3.0

Languages: Python 94.55%, XSLT 5.44%, Makefile 0.02%

Topics: akoma-ntoso

bluebell's People

Contributors

goose-life, longhotsummer

bluebell's Issues

Support ordered list (<ol>)

Perhaps, to mirror the BULLETS markup, something like:

NUMBERS
  1. sdfsdf
  2. dfsfs
  3. sdfsdf

where the space marks the end of the num portion (although I realise that li elements don't have a num).
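The splitting rule suggested above can be sketched in a few lines of Python. This is only an illustration of the proposal, not bluebell's grammar: the pattern `ITEM_RE` and the function `parse_numbers_block` are hypothetical names, and the sketch assumes the num never contains a space.

```python
import re

# Hypothetical item pattern: optional indent, a num without spaces,
# at least one space, then the item text.
ITEM_RE = re.compile(r"^\s*(?P<num>\S+)\s+(?P<text>.+)$")


def parse_numbers_block(lines):
    """Split each item line into (num, text) at the first run of spaces."""
    items = []
    for line in lines:
        match = ITEM_RE.match(line)
        if match:
            items.append((match.group("num"), match.group("text")))
    return items


items = parse_numbers_block(["  1. sdfsdf", "  2. dfsfs", "  3. sdfsdf"])
```

Since li has no num, the num captured here would have to be dropped or stored some other way when building the XML; the sketch leaves that question open.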

Escaping when unparsing may be incorrect

This is the code that handles escaping both block elements and inlines when unparsing:

  <!-- first text nodes of these elems must be escaped if they have special chars -->
  <xsl:template match="a:*[self::a:p or self::a:listIntroduction or self::a:listWrapUp]/text()[not(preceding-sibling::*)]">
    <xsl:call-template name="escape">
      <xsl:with-param name="text" select="." />
    </xsl:call-template>
  </xsl:template>

However, I don't think this is correct. Surely we should be escaping inlines for everything, not just p, listIntroduction and listWrapUp? What about headings and crossheadings?

Additionally, I don't think it's possible to have a text() node that does have a preceding sibling! I think this is a hangover from slaw where we only escaped the first paragraphs of some things.
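One way to test the second point is with mixed content. In `<p>first <b>bold</b> second</p>`, the text after the inline element is a separate text node whose preceding sibling is `<b>`, so a `not(preceding-sibling::*)` predicate would skip it. The sketch below uses only Python's standard library (it is not bluebell code) to show the two text runs; ElementTree exposes the second one as the `tail` of the inline element.

```python
import xml.etree.ElementTree as ET

# A paragraph with mixed content: text, an inline element, then more text.
p = ET.fromstring("<p>first <b>bold</b> second</p>")

# Text before the first child element (no preceding sibling element).
leading_text = p.text

# Text after the inline element; in XPath terms this text node has
# <b> as a preceding sibling.
trailing_text = p[0].tail
```

If that reading is right, the template as written would escape only the leading text run of each matched element.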

Implicit hier elements

eg.:

1. The text of the paragraph

should be recognised as a paragraph.

  • parser support for number + text
  • unparse when number matches what parser would match
  • handle escapes in parser
  • apply escapes when unparsing
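The four tasks above fit together as a round trip, which the following sketch tries to make concrete. It is a simplified illustration rather than bluebell's implementation: the number pattern `NUM_RE`, the backslash escape character and both function names are assumptions made for the example.

```python
import re

# Assumed shape of an implicit number: digits, optional letters, a dot.
NUM_RE = re.compile(r"^(?P<num>\d+[A-Za-z]*\.)\s+(?P<text>.+)$")
ESCAPE = "\\"  # assumed escape character


def parse_line(line):
    """Recognise 'number + text' as a paragraph unless it is escaped."""
    if line.startswith(ESCAPE):
        # Escaped: drop one escape character and treat as plain text.
        return {"name": "p", "text": line[len(ESCAPE):]}
    match = NUM_RE.match(line)
    if match:
        return {"name": "paragraph", "num": match.group("num"),
                "text": match.group("text")}
    return {"name": "p", "text": line}


def unparse_text(text):
    """Escape plain text that the parser would otherwise read as a number."""
    if NUM_RE.match(text) or text.startswith(ESCAPE):
        return ESCAPE + text
    return text
```

With these definitions, `parse_line(unparse_text(text))` returns the original text as a plain `p`, whether or not the text happens to start with something that looks like a number.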

Be more tolerant of empty elements on a round trip

eg. <heading/> and <subheading/> shouldn't result in errors on a round trip. They don't need to be preserved, necessarily, but we should be able to handle unparsing and re-parsing them.

eg. this doesn't work, but it should: SEC 2. -
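Since the empty elements don't need to be preserved, one possible approach is to strip them from the XML before unparsing. The sketch below shows that idea with Python's standard library; it is not how bluebell currently works, `drop_empty` is a hypothetical helper, and it assumes the documents use the Akoma Ntoso 3.0 namespace.

```python
import xml.etree.ElementTree as ET

AKN = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


def drop_empty(root, names=("heading", "subheading")):
    """Remove the named elements when they have no children and no text."""
    tags = {f"{{{AKN}}}{name}" for name in names}
    # Copy both lists so that removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag in tags and is_empty:
                parent.remove(child)
    return root


xml = (f'<section xmlns="{AKN}"><num>2.</num><heading/>'
       '<content><p>text</p></content></section>')
section = drop_empty(ET.fromstring(xml))
```

The alternative is to teach the grammar to accept an empty heading after the dash, which would keep the element on a round trip.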

Fully support attributes and classes for hier and block elements

Ensure that the grammar and the XSLT fully support attributes and PARA.foo style class names.

Some of the elements do (eg. tables and table cells), and some of the grammar does when parsing, but not when unparsing.

It should also only allow valid attribute names, and drop invalid ones.

  • hierarchical elements
  • BULLETS and ITEMS
  • ITEM inside ITEMS
  • crossheadings
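Dropping invalid attribute names could look something like the sketch below. It assumes "valid" means a well-formed XML attribute name, approximated here with a simplified ASCII-only pattern and no namespace prefixes; validating against the attributes that the Akoma Ntoso schema actually allows would need more than this. `NAME_RE` and `clean_attrs` are hypothetical names.

```python
import re

# Simplified, ASCII-only approximation of an XML attribute name.
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")


def clean_attrs(attrs):
    """Keep attributes with well-formed names and silently drop the rest."""
    return {name: value for name, value in attrs.items()
            if NAME_RE.match(name)}


cleaned = clean_attrs({
    "class": "foo",
    "data-x": "y",
    "1bad": "starts with a digit",
    "has space": "contains a space",
})
```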

Smart sections and subsections

Eg. parse:

1. some text at the start of a paragraph,

  that also wraps

as

PARAGRAPH 1.
  some text at the start of a paragraph,

  that also wraps

  • parse
  • unparse
  • ignore escaped number when parsing
  • insert escapes when unparsing

Support multi-line editorial remarks

What happens with indents? I suspect we should enforce consistent indents.

eg.

SEC 1
  SUBSEC (a)
    some text
    [[this remark
      spans multiple
lines and
      indents]]

Address duplicate ids sanely

ref #21 #22

  • always prefer num if there is one
  • use human-friendly number if there isn't one (e.g. not 1 if it's the sixth paragraph in a section)
  • don't end up with duplicate ids

e.g.

PARA 
    Intro

PARA 1.
    First para

PARA 1A.
    Added in later

PARA
    Unnumbered

PARA 2.
    Second (actually third/fourth/fifth, depending on who's counting) para.

PARA 2.
    Another para with the num 2.

PARA 2.3-4.5.
    A para with the num 2.3-4.5.

PARA 2.3-4.5.
    Another para with the num 2.3-4.5.

PARA 2.3-4.5_1
    A para with the num 2.3-4.5_1.

should have the following as their eIds:

  • para_nn-1
  • para_1
  • para_1A
  • para_nn-4 para_nn-2
  • para_2
  • para_2_1
  • para_2.3-4.5
  • para_2.3-4.5_1
  • para_2.3-4.5_1_1

(Not finalised)
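The rules above can be sketched as a small function that reproduces the list of eIds for this example. It is an illustration under stated assumptions, not a settled design: `assign_eids` is a hypothetical name, a trailing dot is stripped from the num, and duplicates get the first free `_1`, `_2`, ... suffix. Because the list gives both para_nn-4 and para_nn-2 for the second unnumbered paragraph, the `positional` flag covers both readings.

```python
def assign_eids(nums, prefix="para", positional=True):
    """Assign unique eIds, preferring the num where there is one.

    nums holds one entry per element: the num text, or None if unnumbered.
    """
    seen = set()
    eids = []
    unnumbered = 0
    for position, num in enumerate(nums, start=1):
        if num:
            base = f"{prefix}_{num.rstrip('.')}"
        else:
            unnumbered += 1
            # Either the position among all siblings, or a running
            # count of unnumbered siblings only.
            n = position if positional else unnumbered
            base = f"{prefix}_nn-{n}"
        # Add a numeric suffix until the eId is unique.
        eid, suffix = base, 0
        while eid in seen:
            suffix += 1
            eid = f"{base}_{suffix}"
        seen.add(eid)
        eids.append(eid)
    return eids


nums = [None, "1.", "1A.", None, "2.", "2.",
        "2.3-4.5.", "2.3-4.5.", "2.3-4.5_1"]
eids = assign_eids(nums)
```

Note how the last paragraph ends up as para_2.3-4.5_1_1: its natural eId is already taken by the de-duplicated second para 2.3-4.5.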

Support nested bullets

The following markup should work in theory but doesn't:

BULLETS
* Level 1

  BULLETS
  * Level 2

Support provisos

Complicated legislation like Income Tax Acts can have provisos to provisos, and provisos containing deeply nested elements. Delineating where precisely the proviso ends will help readers as well as future drafters understand the structure.
