Giter Site home page Giter Site logo

regionize's People

Contributors

danielepolencic avatar evnbr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

regionize's Issues

Lazy load the document

Context: I'm here because I wanted to use Bindery.js.

We currently have a snippet of code that does this:

import Bindery from 'bindery'

const HTML = <script dangerouslySetInnerHTML={{__html: `(${useBindery.toString()})()`}}>
function useBindery() {
  Bindery.makeBook({...})
}

In our case, the HTML is rendered server-side and shipped to the browser for execution.
The script fails with the current error:

ReferenceError: document is not defined
    at Object.<anonymous> (node_modules/regionize/dist/regionize.cjs.js:109:24)
    at Module._compile (internal/modules/cjs/loader.js:1200:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1220:10)
    at Module.load (internal/modules/cjs/loader.js:1049:32)
    at Function.Module._load (internal/modules/cjs/loader.js:937:14)
    at Module.require (internal/modules/cjs/loader.js:1089:19)
    at require (internal/modules/cjs/helpers.js:73:18)
    at Object.<anonymous> (node_modules/bindery/dist/bindery.cjs.js:25:17)
    at Module._compile (internal/modules/cjs/loader.js:1200:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1220:10)

Upon further inspection, I noticed that regionize expects the document to be immediately available:

image

Unfortunately, document is not defined in Node.js (where we run the code).
I was wondering if you'd be open to a PR to fix that.
The change is: all the functions that use the document do so lazily.

Widows, orphans, and keep-together

Tracking task for

  • Preventing a split between sibling elements, ie keep a heading and following paragraph together
  • Preventing orphaned or widowed lines, ie one line of a paragraph on a different page

Requests

For those reading this: Please feel free to upvote or add other requests. Expected in the release after the current 1.0.0 is stabilized, ie, unlikely in the next few months.

Regionize redesign

Preface / context

  1. This is one of the best libraries for paginating books. I tried Weasyprint, paged.js, etc. This is by far the easiest and simplest implementation. It also has the best results. Thank you.
  2. I'm interested in printing PDFs.
  3. You can ignore this issue. I'm writing here because I thought it'd be interesting to discuss, but I don't expect a response or a full or partial redesign as a consequence of this ticket.

What happened

I dived into the code for regionize because I found a few edge cases while printing PDFs with Bindery.js.
In particular:

Sometimes blocks are not split properly:

image

I believe I'm hitting this code block: https://github.com/evnbr/regionize/blob/master/src/flowIntoRegions.ts#L129
Please notice that I have ~30-40 similar blocks and this issue happens only once.

It's not possible to keep title + text on the same page or any other component:

image

This is tricky because we have no control over how the elements are split: https://github.com/evnbr/regionize/blob/master/src/tryInNextRegion.ts#L24

Lastly, the code is written to be recursive.
It's rather hard to reason about the flow and it's hard to include more complex rules (i.e. if you split a text after the header, move both to the new page).

Proposal

I tried to play with the code and come up with a way to extend it to cater for:

  1. keeping elements next to each other.
  2. deciding how to split elements.

I found it very hard with the current design. There's a lot of complexity in the tryInNextRegion function and the code caters for a lot of edge cases already.

So I tried to come up with an alternative implementation.

I designed a function that given a Region and an element, it can return 3 states:

  1. the element fit in the region.
  2. the element does not fit at all in the region.
  3. the element fits in the region, but it should be split.

The function signature is the following:

type HTMLOperation =
  | { type: 'nospace' }
  | { type: 'split'; split: { left: Element; right: Element }[] }
  | { type: 'fit' };

function Paginate(region: Region, content: Element): HTMLOperation

I originally designed the split object to be {left: Element, right: Element} to indicate that the element can only fit partially in the page. It became apparent very quickly that this has the same limitation of nearestMovableElement. So I refactored the code to return all possible splits for the element (hence the array of left, right elements { left: Element; right: Element }[]).

Let me explain with an example.
Imagine having an unordered list with 4 elements. Only 3 elements fit, the fourth has to be moved to a new page.
The Paginate function returns a split element with an array that has:

  1. {left: ['<li>','<li>','<li>'], right: ['<li>']}
  2. {left: ['<li>','<li>'], right: ['<li>','<li>']}
  3. {left: ['<li>'], right: ['<li>','<li>','<li>']}
  4. {left: [], right: ['<li>','<li>', '<li>','<li>']}(all items to the new page)

How can I choose the best option?
By default, we can fall back to the nearestMoveAbleElement, but now that we have all four combinations, we can ask the user to score them and we select the one with the highest score.

function scoreSplits(splits: { left: Element; right: Element }[]): number[]

const [bestSplit] = scoreSplits(splits).sort((a, b) => b - a)

This should solve the problem of deciding if an element is worth splitting and how many elements it should contain.
As an example, I could decide to split the list to have 2+2 <li> elements because I don't want a single element on a new page.

The other benefit of the Paginate function is that we only check one element at the time. There is no recursion (apart from the split part that has been encapsulated in the function itself).

So in theory, the new code could be:

const allElementsToPaginate = [...document.querySelector('main').childNodes]
for (const element of allElementsToPaginate) {

  const operation = Paginate(region, element)

  if (operation.type === 'fit') {
    // add to current page
  }
  if (operation.type === 'nospace') {
   // move to next page
  }
  if (operation.type === 'split') {
   // add to current page, and move to the next one
  }
}

However, because allElementsToPaginate is a list of elements, it's easy to look ahead and decide if an element should stay next to another. Example:

function checkIfElementShouldStaytogether(currentElement: Element, nextElement: Element): boolean

const allElementsToPaginate = [...document.querySelector('main').childNodes]
for (let i =0; i <  allElementsToPaginate.length i++) {
  const element = allElementsToPaginate[i]

  const operation = Paginate(region, element)

  if (operation.type === 'fit') {
    // add to current page
  }
  if (operation.type === 'nospace') {
    // look to the next element
   const nextElement = allElementsToPaginate[i+1]
   if (checkIfElementShouldStaytogether(element, nextElement)) {
     // move this element to the next page
   } else {
    // keep element in the current page
   }
  }
  if (operation.type === 'split') {
   // add to current page, and move to the next one
  }
}

Since the Paginate function does not change the region (only checks if the element fits) we can write a scheduler that takes into account of the splits, the current, previous and next elements.

Show me the code

the following code implements a broken version of the algorithm above for demo purposes:

type HTMLOperation =
  | { type: 'nospace' }
  | { type: 'split'; split: { left: Element; right: Element }[] }
  | { type: 'fit' };

type TextOperation =
  | { type: 'nospace' }
  | { type: 'split'; left: Text; right: Text }
  | { type: 'fit' };

export class Region {
  constructor(
    private box: Element,
    private innerContainer: Element,
    private maxHeight: number,
  ) {}
  appendChild(node: Node): void {
    this.innerContainer.appendChild(node);
  }
  removeLastChild(): void {
    if (this.innerContainer.childNodes.length === 0) {
      return;
    }
    const [lastChild] = Array.from(this.innerContainer.childNodes).slice(-1);
    this.innerContainer.removeChild(lastChild);
  }
  hasOverflowed(): boolean {
    const { height } = this.box.getBoundingClientRect();
    return height > this.maxHeight;
  }
  enter(): Region {
    if (this.innerContainer.childNodes.length === 0) {
      throw `Cannot enter an empty node`;
    }
    const [lastChild] = Array.from(this.innerContainer.childNodes).slice(-1);
    if (lastChild.nodeType !== Node.ELEMENT_NODE) {
      throw `Cannot enter a ${lastChild.nodeType} node`;
    }
    return new Region(this.box, lastChild as Element, this.maxHeight);
  }
}

export function Paginate(region: Region, content: Element): HTMLOperation {
  region.appendChild(content);

  if (!region.hasOverflowed()) {
    region.removeLastChild();
    return { type: 'fit' };
  }

  region.removeLastChild();

  const childNodes = [...content.childNodes].filter(
    (it) => it.nodeType === Node.TEXT_NODE || it.nodeType === Node.ELEMENT_NODE,
  ) as Array<Element | Text>;

  function cloneEmpty(element: Element): Element {
    const copy = element.cloneNode() as Element;
    copy.innerHTML = '';
    return copy;
  }

  for (let i = 0; i < childNodes.length; i++) {
    const child = childNodes[i];
    switch (child.nodeType) {
      case Node.TEXT_NODE: {
        const operation = addText(region, (child as Text).cloneNode() as Text);
        switch (operation.type) {
          case 'fit': {
            continue;
          }
          case 'nospace': {
            if (i === 0) {
              return { type: 'nospace' };
            }
            const split = range(0, i).map((index) => {
              const right = cloneEmpty(content);
              const left = cloneEmpty(content);
              childNodes
                .slice(0, i)
                .forEach((it) => left.appendChild(it.cloneNode()));
              childNodes
                .slice(i)
                .forEach((it) => right.appendChild(it.cloneNode()));
              return { left, right };
            });
            return { type: 'split', split };
          }
          case 'split': {
            const split = range(0, i).map((index) => {
              const right = cloneEmpty(content);
              const left = cloneEmpty(content);
              if (childNodes.length === 1) {
                left.appendChild(operation.left);
                right.appendChild(operation.right);
                return { left, right };
              }
              childNodes.slice(0, i).forEach((it) => {
                if (index === i) {
                  left.appendChild(operation.left);
                } else {
                  left.appendChild(it.cloneNode());
                }
              });
              childNodes.slice(i).forEach((it) => {
                if (index === i) {
                  right.appendChild(operation.right);
                } else {
                  right.appendChild(it.cloneNode());
                }
              });
              return { left, right };
            });
            region.removeLastChild()
            return { type: 'split', split };
          }
          default:
            break;
        }
      }
      case Node.ELEMENT_NODE: {
        const operation = Paginate(region.enter(), child as Element);
        switch (operation.type) {
          case 'fit':
            continue;
          case 'nospace': {
            if (i === 0) {
              return { type: 'nospace' };
            }
            const split = range(0, i).map((index) => {
              const right = cloneEmpty(content);
              const left = cloneEmpty(content);
              childNodes
                .slice(0, i)
                .forEach((it) => left.appendChild(it.cloneNode()));
              childNodes
                .slice(i)
                .forEach((it) => right.appendChild(it.cloneNode()));
              return { left, right };
            });
            return { type: 'split', split };
          }
          case 'split': {
            const split = range(0, i)
              .map((index) => {
                const right = cloneEmpty(content);
                const left = cloneEmpty(content);
                childNodes.slice(0, i).forEach((it) => {
                  left.appendChild(it.cloneNode());
                });
                childNodes.slice(i).forEach((it) => {
                  right.appendChild(it.cloneNode());
                });
                const splits = operation.split.map((it) => {
                  const l = left.cloneNode().appendChild(it.left);
                  const r = right.cloneNode().appendChild(it.right);
                  return { left: l, right: r };
                });
                return splits;
              })
              .reduce((acc, it) => acc.concat(it), []);
            return { type: 'split', split };
          }
        }
      }
      default:
        throw 'Did not expect to get here';
    }
  }

  return { type: 'nospace' };

  function addText(region: Region, textNode: Text): TextOperation {
    region.appendChild(textNode);
    if (!region.hasOverflowed()) {
      return { type: 'fit' };
    }
    const originalText = textNode.nodeValue ?? '';
    const chunks = originalText.split(' ');
    let currentChunk = chunks.length;
    while (region.hasOverflowed() && currentChunk >= 0) {
      textNode.nodeValue = chunks.slice(0, currentChunk).join(' ');
      currentChunk--;
    }
    if (currentChunk < 0 && region.hasOverflowed()) {
      return { type: 'nospace' };
    }
    return {
      type: 'split',
      left: document.createTextNode(
        chunks.slice(0, currentChunk + 1).join(' '),
      ),
      right: document.createTextNode(
        chunks.slice(currentChunk + 1, chunks.length).join(' '),
      ),
    };
  }
}

function range(start: number, end: number) {
  return [...Array.from(Array(1 + end - start).keys())].map((v) => start + v);
}

Thanks for reading till the end.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.