asnunes / notion-page-to-html

NodeJS tool to convert public Notion pages to HTML from page ID

License: MIT License

JavaScript 0.48% TypeScript 99.32% Makefile 0.14% Shell 0.06%
notion-pages html html5 equation

notion-page-to-html's Introduction

Notion Page To HTML

NodeJS tool to convert public notion pages to HTML.

Also available as public API:

https://notion-page-to-html-api.vercel.app/

Supported features

Most of the native Notion blocks are currently supported:

  • Headings
  • Text With Decorations
  • Quote
  • Image
  • YouTube Videos
  • Code
  • Math Equations
  • To-do
  • Checkbox
  • Bulleted Lists
  • Numbered Lists
  • Toggle Lists
  • Divider
  • Callout
  • Nested blocks

Embeds and tables are not supported yet.

Why notion-page-to-html?

It's perfect as a content management system

  • This tool can get any public page from Notion and convert it to HTML. This is perfect for those who want to use Notion as a CMS. Once it gets the page content from Notion, the result is completely independent (images are converted to base64, so you do not have to call Notion again to get content). You can convert a page and then make it private again.

It's fully customizable

  • You can choose how you want to get page content. Do you want the title, cover, and icon in the HTML body? You can do that! Do you want them separate from the HTML so you can choose where to place them? You can do that too. Do you want HTML without style? Without the equation and code highlighting scripts? Do you want body content only? You have those options as well.

Basic Usage

Install it in a NodeJS project using npm

npm install notion-page-to-html

Then import it and pass a public Notion page URL

const NotionPageToHtml = require('notion-page-to-html');

// using async/await
async function getPage() {
  const { title, icon, cover, html } = await NotionPageToHtml.convert("https://www.notion.so/asnunes/Simple-Page-Text-4d64bbc0634d4758befa85c5a3a6c22f");
  console.log(title, icon, cover, html);
}

getPage();

cover is a base64 string of the original page's cover image. icon is either an emoji or a base64 image, depending on the original page icon. html is a full HTML document by default: it includes the style, the body, and the MathJax and PrismJS CDN scripts. You can pass options to control the HTML output.

NotionPageToHtml.convert(
  'https://www.notion.so/asnunes/Simple-Page-Text-4d64bbc0634d4758befa85c5a3a6c22f',
  options,
);

options is an object with the following keys

Key                     Default value   If true
excludeCSS              false           returns html without style tag
excludeMetadata         false           returns html without metatags
excludeScripts          false           returns html without script tags
excludeHeaderFromBody   false           returns html without title, cover and icon inside body
excludeTitleFromHead    false           returns html without title tag in head
bodyContentOnly         false           returns html body tag content only
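
As a rough sketch of how the options above might be combined, the snippet below merges overrides into the documented defaults and passes them to a converter. The names buildOptions and convertUnstyled are hypothetical helpers, not part of the library, and the converter is injected as a parameter so the sketch runs without the package installed.

```javascript
// The six documented option keys, all false by default (taken from the table above).
const DEFAULT_OPTIONS = Object.freeze({
  excludeCSS: false,
  excludeMetadata: false,
  excludeScripts: false,
  excludeHeaderFromBody: false,
  excludeTitleFromHead: false,
  bodyContentOnly: false,
});

// Hypothetical helper (not part of the library): merge overrides into the
// defaults and reject keys that the table does not list.
function buildOptions(overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (!(key in DEFAULT_OPTIONS)) {
      throw new Error(`Unknown option: ${key}`);
    }
  }
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Hypothetical wrapper: asks for HTML without the style and script tags.
// `convert` is any function with the (url, options) shape shown in the README;
// this wrapper simply returns whatever `convert` returns.
function convertUnstyled(convert, url) {
  return convert(url, buildOptions({ excludeCSS: true, excludeScripts: true }));
}
```

With the package installed, the wrapper could be called as convertUnstyled((url, opts) => NotionPageToHtml.convert(url, opts), pageUrl), where pageUrl is a public Notion page URL.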

Development and testing

  1. Clone this application

  2. Make sure you have node v14 or higher and then install all dependencies

npm i

Running tests:

npm test

Installing locally in another project:

npm run build
npm pack

Inside your project:

npm i /path/to/tar/gz

Docker approach for testing

  1. Make sure you have Docker and Docker Compose installed and then run:
make test

Contributing

We love your feedback! Feel free to:

  • Report a bug
  • Discuss the current state of the code
  • Submit a fix
  • Propose new features
  • Become a maintainer

Just create a GitHub issue or a PR ;)

notion-page-to-html's People

Contributors

8byteagency, asnunes, brunobmello25, dependabot[bot], dianait, smor, soneji

notion-page-to-html's Issues

Outputs the contents of nested pages too

When you try to get the content of a page, it also adds the HTML content of any nested page.

Take this page as an example:
https://therohitdas.notion.site/Rohit-Das-99163302a0c44cd086292fd8f7637c5e
I have added a page called "Testing something don't mind" at the very bottom.

The output:
https://notion-page-to-html-api.vercel.app/html?id=99163302a0c44cd086292fd8f7637c5e
When you visit this link, you can see the content of the nested page at the very bottom.

What would be nice?

  • Only the top-level page is fetched and rendered as HTML.
  • Each nested page is rendered as a link to https://notion-page-to-html-api.vercel.app/html?id=[ID FOR NESTED PAGE], so clicking a link fetches that nested page. This cycle can continue for deeper levels of nesting.
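
The link scheme proposed above could be sketched roughly as follows. This is only an illustration of the request, not how the tool currently behaves: nestedPageUrl, nestedPageLink, and escapeHtml are hypothetical helpers, and the base URL is the endpoint quoted in this issue.

```javascript
// Endpoint quoted in the issue text; treat it as configurable.
const API_BASE = 'https://notion-page-to-html-api.vercel.app/html';

// Hypothetical helper: build the API URL for a nested page id.
function nestedPageUrl(pageId, base = API_BASE) {
  const url = new URL(base);
  url.searchParams.set('id', pageId);
  return url.toString();
}

// Minimal HTML escaping for the link text.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  return text.replace(/[&<>"]/g, (ch) => entities[ch]);
}

// Hypothetical helper: render a nested page as a link instead of inlining
// its content into the parent page.
function nestedPageLink(pageId, title) {
  return `<a href="${nestedPageUrl(pageId)}">${escapeHtml(title)}</a>`;
}
```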

FYI - I also hosted the API on Vercel and this is my endpoint: https://notion-page-to-html-api.vercel.app. Don't use it for testing, as it points to a fork of this repo.

Private pages?

Hi, this does exactly what I need. For my use case, all the pages I want to convert are private. I'm willing to help submit a PR because it doesn't look like any other library can parse Notion blocks into HTML.

Alternatively, I'm already using the nishan library to talk to the API, and I really just need an exposed method that allowed me to pass blocks into it.

I'm not sure which is easiest, but again happy to submit a PR.

Example?

Two ideas to make this project more approachable

  1. A public example of input Notion and output/resulting hosted HTML file somewhere

  2. (If I'm understanding this project correctly) Ideally there'd be a way to run this without writing any code, via npx, so that a single command takes a public Notion URL and converts it to static HTML that can be hosted anywhere. (I'm also looking at https://github.com/leoncvlt/loconotion as something similar)

Apologies if I'm misunderstanding what this is intended for though!

Unable to find any public page NPM latest

NotionPageToHtml.convert("https://www.notion.so/asnunes/Simple-Page-Text-2-4d64bbc0634d4758befa85c5a3a6c22f").then((page) => console.log(page));

Error: Can not find Notion Page of id 4d64bbc0-634d-4758-befa-85c5a3a6c22f. Is the url correct? It is the original page or a redirect page (not supported)?

My setup:
npm version 8.3.1
node v16.14.0

Same error on:
https://npm.runkit.com/notion-page-to-html

Subpages Support

It works well, but it does not support sub-pages. Here is some demo code:

const NotionPageToHtml = require('notion-page-to-html');
const fs = require('fs');

async function getPage() {
  const { title, icon, cover, html } = await NotionPageToHtml.convert("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX");
  console.log(title);

  fs.writeFile("./out.html", html, function(err) {
    if(err) {
        return console.log(err);
    }
    console.log("The file was saved!");
  });
}

getPage();

Subpage support would make this much more useful, especially for generating static pages with GitHub Actions.

404 on every page I've tried so far

After updating from 1.1.2 to 1.1.3, every page I try to read throws "Can not find Notion Page of id [id]. Is the url correct? It is the original page or a redirect page (not supported)?"

I looked through this, but still not sure why it is being triggered:
e190430

Here are some example page ids that work with 1.1.2 but not 1.1.3:
22425fea05234ab282bb79f5b81881c4
640f9f67-56c6-4f6b-a371-81de2b2b16fd
71f15f9d-3da7-457d-b32e-b5363a402c2e
736b903b-3e9b-4dcd-973c-9262bb16f545
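
The ids in these reports appear both with and without dashes, and the error message elsewhere on this page shows the dashed form derived from a URL. The sketch below normalizes either form. It does not diagnose the 1.1.3 regression, and extractPageId and toDashedId are hypothetical helper names, not library functions.

```javascript
// Hypothetical helper: pull the trailing 32 hex characters out of a Notion
// page URL or a raw id (dashes optional), returned in compact lowercase form.
function extractPageId(urlOrId) {
  const withoutQuery = urlOrId.split(/[?#]/)[0];
  const compact = withoutQuery.replace(/-/g, '');
  const match = compact.match(/[0-9a-f]{32}$/i);
  if (!match) {
    throw new Error(`No 32-character page id found in: ${urlOrId}`);
  }
  return match[0].toLowerCase();
}

// Hypothetical helper: format a compact 32-character id as 8-4-4-4-12.
function toDashedId(compactId) {
  return compactId.replace(
    /^(.{8})(.{4})(.{4})(.{4})(.{12})$/,
    '$1-$2-$3-$4-$5',
  );
}

console.log(toDashedId(extractPageId('22425fea05234ab282bb79f5b81881c4')));
// prints "22425fea-0523-4ab2-82bb-79f5b81881c4"
```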

Databases and table support

Hello there! Outstanding job with this! 🚀
Is there any plan to introduce support for databases and/or tables?

Code formatting in Notion, not always well rendered.

Hello,

I think, but I'm not sure, that I've found a tiny bug in your API and it would make me so happy if you could take a minute to look at it, that would be great.
Let me give you some background.

I make my articles in Notion and then I use a process with n8n to send it to WordPress.
To do this, I use your API, which transforms the Notion page into HTML. It's a crazy thing: within seconds it arrives as a WordPress post and it's done. All I have to do is add an image made by ChatGPT. It's so easy, thank you!

Everything works fine as long as there's no code in the text.

When there is code, the code is present but not quite correctly formatted.
To help you understand better, I'll give you an example:

Here's the Notion page: https://lmvi.notion.site/Test-e3e31ae11fc94ab38d6c12ed98a226ae
Here's the page created by your wonderful API: https://notion-page-to-html-api.vercel.app/html?id=e3e31ae11fc94ab38d6c12ed98a226ae

If you had a couple of minutes to correct it, it would be so great and so wonderful.
1000 thanks for your help and your tool.
Sincerely
Jean-Marc Henry

Unexpected token e in JSON at position 0

I'm getting the following error:

SyntaxError: Unexpected token e in JSON at position 0
    at JSON.parse (<anonymous>)
    at IncomingMessage.<anonymous> (/node_modules/notion-page-to-html/dist/utils/usecases/http-get/node-http-get.js:74:44)
    at IncomingMessage.emit (node:events:406:35)
    at endReadableNT (node:internal/streams/readable:1329:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

When trying to use it with this page:

https://www.notion.so/smarterlabs/Setting-Up-a-New-Vercel-Website-584a1c27a10642d8869473588a5c1b45

I can get it to work just fine with the example page you have in the readme. So I'm assuming it's something on that page the notion-page-to-html module doesn't support.
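
The stack trace shows JSON.parse throwing on a response body that is not JSON. A defensive parse along the following lines would at least surface what the body contained. This is a sketch only: parseJsonBody is a hypothetical helper, not the library's code, and it does not address why the response was not JSON.

```javascript
// Hypothetical helper: parse a response body as JSON, and on failure raise an
// error that includes a short preview of the body instead of a bare SyntaxError.
function parseJsonBody(body, sourceUrl = 'unknown source') {
  try {
    return JSON.parse(body);
  } catch (err) {
    const preview = String(body).slice(0, 60);
    throw new Error(
      `Expected JSON from ${sourceUrl} but received: ${preview} (${err.message})`
    );
  }
}
```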

Bug with uploaded images

Hi
I seem to be having an issue with uploaded images.

This is the URL I am trying to use with the module https://www.notion.so/dhavalsoneji/2c5dd1f8b26840d7ba882d1490a4a917

I get this error:

error - uncaughtException: SyntaxError: Unexpected token P in JSON at position 0
    at JSON.parse (<anonymous>)
    at IncomingMessage.<anonymous> (/Users/dhaval/git-clones/portfolio/node_modules/notion-page-to-html/dist/utils/usecases/http-get/node-http-get.js:72:44)
    at IncomingMessage.emit (node:events:406:35)
    at IncomingMessage.emit (node:domain:475:12)
    at endReadableNT (node:internal/streams/readable:1343:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

This happens because stringData is passed to JSON.parse, but its value is:

PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPEVycm9yPjxDb2RlPkFjY2Vzc0RlbmllZDwvQ29kZT48TWVzc2FnZT5BY2Nlc3MgRGVuaWVkPC9NZXNzYWdlPjxSZXF1ZXN0SWQ+WENKU0tNQzM5VEhQMTU5RjwvUmVxdWVzdElkPjxIb3N0SWQ+bjV2T3I4NEpURk5CWE5Ed1RKeWN4U0FocU5pUDJqSkd2U2dEcFl6ckcwQU5uQ1B6cUNUWHZLODJxZndFOE1OakwxOFFuNlpxSkU4PTwvSG9zdElkPjwvRXJyb3I+

Which base64-decodes to:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>XCJSKMC39THP159F</RequestId>
  <HostId>n5vOr84JTFNBXNDwTJycxSAhqNiP2jJGvSgDpYzrG0ANnCPzqCTXvK82qfwE8MNjL18Qn6ZqJE8=</HostId>
</Error>

I believe this is happening due to an image I uploaded to notion for the page cover. The API by default doesn't give us a useful URL to get the image.
It will give something like:
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/40b79211-1ae6-427f-8b3f-85216732792a/Untitled.png
Which is inaccessible

I think a solution is to check if the image url contains notion's aws and use notion's image endpoint

if (image.includes("amazonaws.com") && image.includes("secure.notion-static.com")) {
    image = "https://www.notion.so/image/" + encodeURIComponent(image) + "?table=block&id=" + id
}

Where id is the ID of the page.

This should give something like:
https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F40b79211-1ae6-427f-8b3f-85216732792a%2FUntitled.png?table=block&id=2c5dd1f8-b268-40d7-ba88-2d1490a4a917

Which is properly accessible
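
The proposed rewrite can be wrapped in a small function and checked against the example URL above. toNotionImageUrl is a hypothetical name used for illustration, and the notion.so/image endpoint is taken from this issue rather than from any official documentation.

```javascript
// Hypothetical helper implementing the rewrite proposed in this issue:
// route images uploaded to Notion's S3 bucket through notion.so/image.
function toNotionImageUrl(imageUrl, pageId) {
  const isNotionUpload =
    imageUrl.includes('amazonaws.com') &&
    imageUrl.includes('secure.notion-static.com');
  if (!isNotionUpload) {
    return imageUrl; // external images (e.g. Unsplash) are left unchanged
  }
  return (
    'https://www.notion.so/image/' +
    encodeURIComponent(imageUrl) +
    '?table=block&id=' +
    pageId
  );
}
```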

MathJax Rendering

The MathJax rendering is erroneous. For example, it doesn't recognize \\ as a marker for the equation to be rendered on the next line.

resume releases?

Hello! Amazing project, thanks to everyone involved and especially @asnunes.

I saw that there has been no release for 5 months now. Can we do a new release on NPM? I need the fix in c03060a so we can use it in a project.

For now I'll clone the project and publish it in a private npm repository.

Some pages have incorrectly rendered images

Demo page and code:

import NotionPageToHtml from "notion-page-to-html";
import fs from "fs";

async function getPage() {
  const { html } = await NotionPageToHtml.convert(
    "https://jmlecoach.notion.site/Travis-Scott-x-Jordan-1-Low-OG-Olive-2ed7e492a0ed4588970efdf9a69ce954"
  );

  fs.writeFile("./out.html", html, function (err) {
    if (err) {
      return console.log(err);
    }
    console.log("The file was saved!");
  });
}

getPage();

The output ends up hiding some images and subtitles that are present in the original page.

Error when inputting a page that has an uploaded cover image

Whenever I try to get the HTML of a page with no cover or cover from Unsplash, everything works correctly. The moment I upload my own cover, I get

A server error has occurred
FUNCTION_INVOCATION_FAILED

When I look into the Vercel log, I get:

[GET] /?id=Page-with-a-custom-cover-65f2a5a62d7344768e95cfa91b423618
10:44:58:94
2022-01-29T09:45:01.776Z	1e9b4de1-f8c6-409f-a7f1-09860b5d1548	ERROR	Uncaught Exception 	{"errorType":"SyntaxError","errorMessage":"Unexpected token P in JSON at position 0","stack":["SyntaxError: Unexpected token P in JSON at position 0","    at JSON.parse (<anonymous>)","    at IncomingMessage.<anonymous> (/var/task/node_modules/notion-page-to-html/dist/utils/usecases/http-get/node-http-get.js:71:44)","    at IncomingMessage.emit (events.js:412:35)","    at endReadableNT (internal/streams/readable.js:1334:12)","    at processTicksAndRejections (internal/process/task_queues.js:82:21)"]}
Unknown application error occurred

I've deployed this app on https://notion-to-html.vercel.app/ and have been testing it with this Notion database: the no-cover page and the Unsplash cover page work, but the custom cover does not. Thank you for this amazing extension. I hope we can find a solution, and I'll gladly follow up with further information if needed.
