This project forked from andytango/mupdf-js

📰 Yet another WebAssembly PDF renderer for Node and the browser

Home Page: https://andytango.github.io/mupdf-js-demo/index.html

License: GNU Affero General Public License v3.0

📰 MuPDF.js

This is a port of MuPDF to JavaScript and WebAssembly, giving you the following:

  • 🔥 Blazing fast rendering of PDFs to PNG, SVG and even HTML
  • 💼 Runs in the web browser or on your server. Basically any platform that supports WebAssembly!
  • ☑ Supports TypeScript
  • 🗺️ A super simple API that's also completely flexible, see below...

🏁 Getting Started

yarn add mupdf-js
# or
npm i mupdf-js

Basic Usage

Before you do any processing, you'll need to initialise the MuPdf library:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file: File) {
  const mupdf = await createMuPdf();
  
  //...
}

In the browser, you'll most likely retrieve a File or Blob object from an HTML <input type="file"> element, supplied by a user.

You'll need to convert the file first to an ArrayBuffer, then to a Uint8Array:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);
  
  //...
}
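Outside the browser the bytes often arrive in a different container. As a minimal sketch, the hypothetical helper below (toUint8Array is not part of mupdf-js) normalises an ArrayBuffer or any typed-array view to a Uint8Array. A Node.js Buffer is itself a Uint8Array subclass, so the result of reading a file from disk passes through unchanged.

```typescript
// Hypothetical helper, not part of mupdf-js: normalise binary input to a Uint8Array.
function toUint8Array(input: ArrayBuffer | ArrayBufferView): Uint8Array {
  if (input instanceof Uint8Array) {
    // Already the right type (this includes Node.js Buffer instances).
    return input;
  }
  if (ArrayBuffer.isView(input)) {
    // Keep the view's window into its underlying buffer; no bytes are copied.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  // A plain ArrayBuffer, e.g. the result of file.arrayBuffer().
  return new Uint8Array(input);
}
```

The byteOffset and byteLength arguments matter for views that cover only part of a larger buffer; without them the resulting array would expose the whole underlying buffer.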

Once you have this, you can load the file into the MuPdf environment, creating a MuPdf document:

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);
  const doc = mupdf.load(arrayBuf);
}

You can now render a page of the PDF document in three different output formats (PNG, SVG or HTML):

import { createMuPdf } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);
  const doc = mupdf.load(arrayBuf);
  
  // Each of these returns a string:
  
  const png = mupdf.drawPageAsPNG(doc, 1, 300);
  const svg = mupdf.drawPageAsSVG(doc, 1);
  const html = mupdf.drawPageAsHTML(doc, 1);
  
  // This method returns Uint8Array
  const pngRaw = mupdf.drawPageAsPNGRaw(doc, 1, 300);
}

Conversion Options

PNG

// Returns PNG as data uri string
mupdf.drawPageAsPNG(document, page, resolution); 

// Returns PNG data as Uint8Array
mupdf.drawPageAsPNGRaw(document, page, resolution); 

Arguments:

  • document: a MuPdf document object
  • page: the page number to be rendered, starting from 1
  • resolution: the DPI to use for rendering the file

Returns: an uncompressed PNG image, either as a base64 data URI string (drawPageAsPNG) or as a Uint8Array of PNG bytes (drawPageAsPNGRaw).
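To choose a resolution it helps to estimate the output size first. PDF page dimensions are expressed in points, with 72 points to the inch, so the pixel size is roughly the point size multiplied by dpi / 72. The sketch below assumes you already know the page size in points (the README does not show an API for reading it), and the renderer's exact rounding may differ by a pixel.

```typescript
// PDF user space uses 72 points per inch.
const POINTS_PER_INCH = 72;

// Hypothetical helper: estimate rendered pixel dimensions for a page.
function estimatePixelSize(
  widthPt: number,
  heightPt: number,
  dpi: number
): { width: number; height: number } {
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points, so at 300 DPI this gives 2550 x 3300 pixels.
const letterAt300 = estimatePixelSize(612, 792, 300);
```

Because the PNG is uncompressed, memory use grows with the square of the resolution: doubling the DPI quadruples the pixel count.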

SVG

mupdf.drawPageAsSVG(document, page);

Arguments:

  • document: a MuPdf document object
  • page: the page number to be rendered, starting from 1

Returns: a string containing an SVG document, with the page rendered as image tiles.

HTML

mupdf.drawPageAsHTML(document, page);

Arguments:

  • document: a MuPdf document object
  • page: the page number to be rendered, starting from 1

Returns: a string containing an HTML document that uses absolutely positioned elements for layout.

Text operations

Get text from page

mupdf.getPageText(document, page);

Arguments:

  • document: a MuPdf document object
  • page: the page number to extract text from, starting from 1

Returns: a string containing all the text collected from the page.
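Since getPageText works on one page at a time, extracting the text of a whole document means looping over the pages. The following is only a sketch: collectDocumentText is a hypothetical helper, and pageCount must come from your own code, because this README does not show how to query the number of pages.

```typescript
// Hypothetical helper: join the text of pages 1..pageCount.
// `getText` wraps whatever extracts the text of a single page.
function collectDocumentText(
  getText: (page: number) => string,
  pageCount: number,
  separator = "\n\n"
): string {
  const pages: string[] = [];
  // Page numbers start from 1, matching the mupdf-js convention.
  for (let page = 1; page <= pageCount; page++) {
    pages.push(getText(page));
  }
  return pages.join(separator);
}

// Possible usage, assuming `mupdf`, `doc` and `pageCount` exist in scope:
//   const text = collectDocumentText((p) => mupdf.getPageText(doc, p), pageCount);
```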

Search on the page

mupdf.searchPageText(document, page, searchString, maxHits);

Arguments:

  • document: a MuPdf document object
  • page: the page number to search, starting from 1
  • searchString: the string to search for
  • maxHits: the maximum number of matches to return (the search stops when it reaches this limit)

Returns: an array of rectangles, one per text match ({x: number, y: number, w: number, h: number}[])

You should set maxHits to a level that a user would reasonably expect (for example 100), or allow users to set their own limit. Alternatively, if you want to allow effectively unlimited search hits (and risk running out of memory), you can set it to the largest value of a C unsigned 32-bit integer, which is 4294967295.
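If the limit comes from user input, it is worth sanitising it before passing it on. A minimal sketch, assuming a default of 100 as suggested above; normalizeMaxHits is a hypothetical helper and not part of mupdf-js:

```typescript
// Largest value of an unsigned 32-bit integer: 4294967295.
const UINT32_MAX = 0xffffffff;

// Hypothetical helper: turn an optional, possibly invalid user value
// into a whole number between 1 and UINT32_MAX.
function normalizeMaxHits(requested: number | undefined, fallback = 100): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    // Missing, NaN or infinite input falls back to the default.
    return fallback;
  }
  const whole = Math.floor(requested);
  return Math.min(Math.max(whole, 1), UINT32_MAX);
}

// Possible usage, assuming `mupdf`, `doc` and `userLimit` exist in scope:
//   const hits = mupdf.searchPageText(doc, 1, "invoice", normalizeMaxHits(userLimit));
```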

Manual context management

By default, mupdf-js creates a MuPDF context upon initialization and uses it for all calls. However, since the context includes a cache, over time this can lead to an increase in the application's memory consumption. To manage the context independently, mupdf-js supports the following:

import { createMuPdfWithoutContext } from "mupdf-js";

async function handleSomePdf(file) {
  const mupdf = await createMuPdfWithoutContext();
  const ctx = mupdf.createContext();
  const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(buf);
  const doc = mupdf.load(ctx, arrayBuf);
  
  // Each of these returns a string:
  
  const png = mupdf.drawPageAsPNG(ctx, doc, 1, 300);
  const svg = mupdf.drawPageAsSVG(ctx, doc, 1);
  const html = mupdf.drawPageAsHTML(ctx, doc, 1);
  
  // This method returns Uint8Array
  const pngRaw = mupdf.drawPageAsPNGRaw(ctx, doc, 1, 300);
  mupdf.freeDocument(doc);
  mupdf.freeContext(ctx);
}
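With manual management, an exception thrown between createContext and freeContext would leave the context and document allocated. One way to guard against that is a try/finally wrapper. This is a sketch only: withDocument and the ContextApi interface are hypothetical, and the method shapes are inferred from the example above rather than taken from the library's type definitions.

```typescript
// Method shapes inferred from the example above (an assumption).
interface ContextApi<Ctx, Doc> {
  createContext(): Ctx;
  load(ctx: Ctx, data: Uint8Array): Doc;
  freeDocument(doc: Doc): void;
  freeContext(ctx: Ctx): void;
}

// Hypothetical helper: run `work` with a fresh context and document,
// releasing both afterwards even if `work` throws.
function withDocument<Ctx, Doc, T>(
  api: ContextApi<Ctx, Doc>,
  data: Uint8Array,
  work: (ctx: Ctx, doc: Doc) => T
): T {
  const ctx = api.createContext();
  try {
    const doc = api.load(ctx, data);
    try {
      return work(ctx, doc);
    } finally {
      api.freeDocument(doc); // Free the document before its context.
    }
  } finally {
    api.freeContext(ctx);
  }
}
```

The callback is expected to be synchronous, like the draw calls shown above. If it returned a promise, the document and context would be freed before the promise settled.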

Custom logging

By default, console.log and console.warn are used for printing errors and other messages. If you prefer to use your own logger (e.g., pino), you can do the following:

import {createMuPdf} from "mupdf-js";
import pino from 'pino';

const logger = pino();

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  mupdf.setLogger({
    log: (...args: any[]) => {
      logger.debug(...args);
    },
    errorLog: (...args: any[]) => {
      logger.error(...args);
    },
  });
  //...
}
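The same wiring can be factored into a small adapter so it is not repeated wherever a MuPdf instance is created. This is a sketch: toMuPdfLogger and both interfaces are hypothetical names, and the { log, errorLog } shape is taken from the example above.

```typescript
// Any logger exposing level methods, reduced to the two used here.
interface LevelLogger {
  debug(...args: unknown[]): void;
  error(...args: unknown[]): void;
}

// Shape passed to setLogger in the example above (an assumption
// about the full type, which may accept more fields).
interface MuPdfLoggerLike {
  log(...args: unknown[]): void;
  errorLog(...args: unknown[]): void;
}

// Hypothetical adapter: route regular messages to debug and errors to error.
function toMuPdfLogger(target: LevelLogger): MuPdfLoggerLike {
  return {
    // Arrow functions call the methods on `target`, preserving its `this`.
    log: (...args) => target.debug(...args),
    errorLog: (...args) => target.error(...args),
  };
}

// Possible usage, assuming `mupdf` and `logger` exist in scope:
//   mupdf.setLogger(toMuPdfLogger(logger));
```

Wrapping the calls, rather than passing target.debug directly, avoids losing the method's binding to the logger object.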

Contributing

See CONTRIBUTING.md

License

AGPL, subject to the MuPDF license.
