Giter Site home page Giter Site logo

caltechlibrary / newt Goto Github PK

View Code? Open in Web Editor NEW
7.0 3.0 0.0 54.69 MB

Newt a microservice for integrating Postgres+PostgREST and Pandoc

Home Page: https://caltechlibrary.github.io/newt/

License: Other

HTML 55.26% Makefile 1.39% Go 32.70% Shell 0.90% Lua 0.02% JavaScript 4.89% CSS 0.82% PLpgSQL 0.14% Batchfile 0.17% PowerShell 0.32% TypeScript 3.39%
data-router microservice pandoc postgresql postgrest

newt's Introduction

Newt Project

Newt is an experimental rapid application development. Specifically Newt is focused on creating web based metadata curation tools. These types of applications are commonly needed in galleries, libraries, archives and museums (abbr: GLAM). Newt makes creating these type of applications easier.

How does Newt do that? Newt generates an application by implementing a service oriented architecture using a combination of off the shelf software and generated code.

You can think of a web application as a sequence of requests and responses. Traditionally a web browser contacts your web site or application then one of two things will happen. Your app knows the answer and hands back the result. Alternatively if it doesn't know the answer it tells you it can't do so (e.g. 404 HTTP STATUS CODE). With service oriented architecture your application has another option. Your application can contact another service and use that result to answer the request from the web browser. Newt's implements a service oriented architecture by orchestrating data processing pipelines1.

With data pipelines we can accept a request and feed that request to one service then take its output and send it to the next service. Newt does this by providing a data router. Newt can manage the request sequence through a simple YAML described pipeline. While it is possible to create pipelines using Apache and NginX proxy features in practice that approach quickly becomes an unmanageable configuration problem. You could encapsulate clusters of processes in containers but this too becomes complex to manage. Newt's router can cut through the hairball of configurations and define pipelines per request route. With Newt's pipelines the last service completed hands its result back to Newt's router which returns the result to the web browser.

The data pipelines are defined in Newt' YAML configuration file. The pipelines are managed by Newt's data router.

Why is this important? Much of the "back end" of a web application is already available as off the shelf software. Here is a short list of examples used by Caltech Library.

This is not an exhaustive list. These types of applications can all be integrated into your application through configuring the connection in Newt's YAML file. Newt's router runs the data pipelines and can host static content.

Wait, what about my custom metadata needs?

Metadata oriented applications share the following operations -- create, retrieve, update, delete and list. These are called CRUD-L2 features. Customization tend towards data models. Newt's configuration file includes simple YAML description of your data models. It uses the data model to render configuration, middleware and templates. Newt's data router ties them all together into an application using service oriented architecture.

Newt provides:

  • a simple data modeler
  • code generation
  • template engine
  • data routing

You can extend your application through browser side enhancements, adding additional data routes and pipelines or through customizing the generated code.

Does Newt clean my house or make cocktails?

Newt is a narrowly focused rapid application development toolbox. Newt will not clean your house or make a cocktail. Additionally it does not support the class of web applications that handle file uploads. That means it is not a replacement for Drupal, WordPress, Islandora, etc. Newt is for building applications more in line with ArchivesSpace but with simpler data models. If you need file upload support Newt is not the right solution at this time.

Newt applications are well suited to interacting with other applications that provide a JSON API. A service with a JSON API can be treated as a JSON data source. A JSON data source can easily be run through a pipeline. Many GLAM applications like ArchivesSpace and Invenio RDM provide JSON API. It is possible to extended those systems by creating simpler services that can talk to those JSON data sources. Newt is well suited to this "development at the edges" approach.

What if those systems aren't available on localhost?

Services not available on localhost must be proxied to integrate with your Newt application. Why is this necessary? Short answer, security. Newt seeks to reduce the attack surface of your web application as much as possible. It does that by only trusting services offered directly on localhost. Newt's application models presumes you're running behind a "front end web server" like Apache 2, NginX or Lighttd. These systems can be configured to provide access control as well as perform proxy services. The front end web server is your first line of defense against a cracker.

External web services integrate through a proxy setup (e.g ORCID, ROR, CrossRef or DataCite). This can be done via the front end web server or by writing a dedicated proxy service. Today writing proxy services us easily accomplished in most popular programming languages due to good support for web protocols. This is true of Python, PHP, JavaScript, Go, Rust and many others.

How does Newt impact web application development?

A Newt application encourages the following.

  • modeling your data simply the result is expressed in YAML
  • preference for "off the shelf" over writing new code
  • use a database management system for managing your data and SQL functions
  • prefer software that can function as a JSON data source
  • transforming data representations by using a light weight template engine
  • code generation where appropriate

If Newt doesn't make cocktails, what is it bringing to the party?

In 2024 there is allot of off the self software to build on. Newt provides a few tools to fill in the gaps.

  • newt is a development tool for data modeling, generating code, running Newt router, template engine and PostgREST or dataset collections
  • ndr is a stateless web service (a.k.a. micro service) that routes a web requests through a data pipelines built from other web services
  • nte is a stateless template engine inspired by Pandoc server that supports the Handlebarsjs template language and is designed to process data from a JSON data source

The Newt YAML configuration ties these together expressing

  • data modeling (descriptions of data as you would provided in a web form)
  • code generation
  • data routes (web requests differentiated by a HTTP method and URL path that trigger processing in a data pipeline)
  • template maps (path/template pairs used that can recieve JSON and render a results)

What type of applications are supported by Newt?

Most GLAM applications are focused on managing and curating some sort of metadata records. Sometimes these metadata records are quite complex (e.g. ArchivesSpace, RDM records) but often they are simple (e.g. a list of authors, a list of citations). Newt's primary target is generating applications to manage simple data models. Simple data models are those which can be expressed in web forms via HTML5 native input elements.

Motivation

Over the last several decades web applications became very complex. This complexity is expensive in terms of reliability, enhancement, bug fixes and software sustainability. Newt is an attempt to address this by reducing the code you write and focusing your efforts on declaring what you want.

In 2024 the back end of web applications can largely be assemble from off the shelf software. Middleware however remains complex. I believe this to be a by product of inertia in software development practices and the assumption that what is good for "Google Scale" is good for everyone.

I think a radical simplification is due. Newt is intended to spark that conversation. My observation is most software doesn't need to scale large. In the research and GLAM communities we don't routinely write software that scales as large as Zenodo. We don't typically support tens of thousands of simultaneous users. I think we can focus our efforts around orchestrating off the shelf components and put our remaining development time into improving the human experience using our software. A better human experience is an intended side effect of Newt.

A big key to simplification is narrowing the focus of our middleware. When our middleware has to implement everything it becomes very complex. Look at Drupal and WordPress. They implement data modeling, data management, user accounts, access management, data transformation.

I think our web services should be doing less, much less. Our web services should be narrowly focused. Conceptually simpler. Do one or two things really well. Newt enables using simpler discrete services to build our applications.

Working with off the shelf deliverables

Take the following as a "for instance".

  • (data management) Postgres combined with PostgREST gives you an out of the box JSON API for managing data
  • (access control) Apache 2 or NGINX combined with Shibboleth for access control and communicating with the web browser
  • (rich client) Web browsers now provide a rich software platform in their own right
  • (optionally you can full text search) Solr gives you a powerful, friendly, JSON API for search and discovery

With the above list we can build capable applications relying on the sophisticated features of our web browsers. This is true even without using Newt. There is a problem though. If we only use the above software to build our application we must rely on JavaScript (or WASM module) running in the web browser to interact with the server. This sounds simple. In practice this is a terrible idea3.

What we should do is use Newt to tie those JSON services together and send rendered HTML back to the web browser. Newt's router provides static file service and a means of pipelining our JSON data source through a template engine. Newt provides a Handlebars4 template engine for that purpose. Newt provides the missing bits from my original list so we don't need to send JavaScript down the wire to the web browser. The Newt approach uses less bandwidth, fewer network accesses and less computations cycles on your viewing device. The Newt approach takes advantage of what the web browser is really good at without turning your web pages into a web service. Newt YAML describes the system you want. You get the Newt capabilities without writing much code. Maybe without writing any code if Newt's code generator does a sufficient job for your needs.

A "newt" baseline

Web services talk to other web services all the time. This isn't new. It isn't exotic. Newt scales down this approach to the single application.

  • Can we align access control with our front end web server?
  • Can we insist on our database management system providing a JSON API?
  • Can we treat the output of one web service as the input for the next?
  • Can we aggregate these into data pipelines?
  • Will that be enough to define our web application?

In Spring 2024 for metadata curation apps I think the answer is "yes we can".

What comes with the Newt Project?

The primary tools are.

  • newt a developer tool for building a Newt based application which includes modeling, code generation support, and runtime support
  • ndr a web service designed for working with other "off the shelf" web services. It functions both as a router and as a static file server. It does this by routing your request through a YAML defined pipeline and returning the results. Typically this will be a JSON data source and running that output through a template engine like Newt's nte.
  • nte provides a Handlebars template engine for transforming JSON data into a more human friendly format.

See the user manual for details.

About the Newt source repository

Newt is a project of Caltech Library's Digital Library Development group. It is hosted on GitHub at https://github.com/caltechlibrary/newt. If you have questions, problems or concerns regarding Newt you can use GitHub issue tracker to communicate with the development team. It is located at https://github.com/caltechlibrary/newt/issues. The name comes from wanting a "[New t]ake" on web application development.

Getting help

The Newt Project is an experiment!!. The source code for the project is supplied "as is". Newt is a partially implemented prototype (May 2024). However if you'd like to ask a question or have something you'd like to contribute please feel free to file a GitHub issue, see https://github.com/caltechlibrary/newt/issues.

Currently Newt is targeted at Windows on X86, macOS on x86 and ARM (e.g. M1, M2) and Linux on aarch64 (ARM 64) and x86.

Building from source

Newt is experimental so doesn't have installers yet. If you are compiling from source the following software is required.

  1. Git to retrieve the Newt repository
  2. Golang >= 1.22 (used to compile command line programs and services except newthandlebars)
  3. Deno >= 1.44 (used to build newthandlebars, note not available on Windows ARM or 32bit Linux)
  4. Pandoc >= 3.1 (used to build version.go, version.ts and documentation)
  5. GNU Make and Bash shell (you can cross compile for Windows and macOS from Linux)

Steps to compile

  1. From your home directory clone the repository
  2. change into the repository directory
  3. Run Make
  4. Run make test
  5. Run make install
cd $HOME
git clone [email protected]:caltechlibrary/newt src/github.com/caltechlibrary/newt
cd  src/github.com/caltechlibrary/newt
make
make test
make install

Footnotes

  1. A data pipeline is formed by taking the results from one web service and using it as the input to another web service. It is the web equivalent of Unix pipes. Prior art: Yahoo! Pipes

  2. CRUD-L, acronym meaning, "Create, Read, Update, Delete and List" or alternately "Create, Retrieve, Update, Delete and List". These are the basic actions used to manage metadata.

  3. See https://infrequently.org/2024/01/performance-inequality-gap-2024/ for a nice discussion of the problem.

  4. Handlebars is largely a superset of the Handlbars Template Language, see https://handlebarsjs.com and https://mustache.github.io for details.

newt's People

Contributors

rsdoiel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

newt's Issues

newtinit, a way of generating your starter Newt YAML file

Learning a YAML syntax can be overwhelming (at least to me). It's helpful to get a hand generating one to modify. Codemeta faced a similar problem which they solved (or mitigated) via the Codemeta generator linked from codemeta.github.io. Newt will need something like that to bootstrap a Newt YAML file. That way learning the Newt YAML syntax can be incremental while you learn Newt program usage.

This could be addressed one of three ways (or all of three ways)

  • website generator, maybe as a web component?
  • a simple command line app that asks questions, offers defaults then renders a starter Newt YAML file
  • a GUI app that could be used to manage a Newt YAML file

Newt Mustache GET results

It would be helpful if doing a GET to Newt Mustache return either a 404 when no POST route is supported or the definition of the route's template map.

Generators for common data types, models and applications

Libraries, Archives and Museums have some common standards implemented around records and field types. Neet as a rapid development tool should be able to generate these models out of the box.

E.g. DOI or ROR can be validated and declaring in your model that a variable of type DOI or ROR should take care of that. See idutils Python package for ideas.

Libraries also have some common models, Dublin Core, Portland Common Data models.Archives have things like EAD. These could be implemented via generators for common use cases.

There are some common applications too. Blogs/News Letters, bibliographical lists, holdings oists, author and organizational lists. A generator could render an example Newt YAML file for these along with a set of SQL files and Pandoc templates.

Supporting generators could allow faster uptake of Newt by new developers or fast prototyping for experienced ones.

Newt runner improvements

Right now newt can run both newt router and mustache engine. To make development convenient it would be nice to have it also manage starting and stopping PostgREST when the scheme/sql get changed or data base reloaded. To be able to do that I would need to be able to do the following

  • launch PostgREST with the right config file tracking the pid of the process
  • newt would need to handle signals like SIGTERM and SIGHUP with notification also sent to PostgREST.
  • newt would need to handle SIGKILL with a message that indicates the PostgREST process is could still be running.

SQL model generation is broken

As of v0.0.4 the SQL model generation is incorrect. The full type definition used in a CREATE TABLE statement is seeping into CREATE FUNCTION definitions. This renders the functions as errors so models fail to be available to PostgREST.

newmustache needs to support Mustache templates that contain partials.

Given a starting template, newtmustache needs to load both the primary templates and any partial templates at startup of the web service. It could do this by if all the templates are in the same directory. Then the JSON API to render the template could take a primary template name and render the result while still supporting keeping templates simple like Pandoc templates on the command line.

Example: In feeds the citation.tmpl defines how to cite a record and can be used whenever HTML content is required. Using a partial template allows standardize output of the elements across the feeds website. Similarly application that handle person or organizational names could benefit from an ability of resolving partials when the template engine is provided by a web service.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.