h-dong / serina

Natural Language Parser for date and time in JavaScript

Home Page: https://serina.netlify.com

License: MIT License

TypeScript 98.91% HTML 0.99% Shell 0.11%
javascript natural-language-processing date-parser time-parsing javascript-library typescript

Serina

Serina v2 is under active development (currently in alpha), so please continue to use v1.x.x until v2 becomes stable.


Natural Language Parser (NLP) for date and time in JavaScript.

Serina Demo: serina.netlify.com

Introduction

Serina can parse English phrases and return an object that is easier to work with. This project is inspired by the Sherlock project. The name comes from the Xbox game "Halo Wars", where Serina is the artificial intelligence of the UNSC navy ship Spirit of Fire.

Installation

Simply run npm install serina

Basic browser setup

Just include Serina in a script tag. You can access its various classes through the serina global.

<script src="serina.umd.js"></script>
<script>
    var parsed = serina('Remind me to buy milk tomorrow 3pm');
</script>

Node

Install via NPM:

npm i --save serina
const serina = require('serina');

var parsed = serina('Remind me to buy milk tomorrow 3pm');

ES6

import serina from 'serina';

const parsed = serina('Remind me to buy milk tomorrow 3pm');

Usage of Library

// assuming the current time is 2019/09/10 1pm
var parsed = serina('Remind me to buy milk tomorrow 3pm');

console.log(parsed);
// console output
{
    "original": "Remind me to buy milk tomorrow 3pm",
    "isValid": true,
    "matches": [
        {
            "text": "Remind me to buy milk tomorrow",
            "dateTime": "2019-09-10T15:00:00.000Z",
            "matched": "3pm"
        },
        {
            "text": "Remind me to buy milk 3pm",
            "dateTime": "2019-09-11T00:00:00.000Z",
            "matched": "tomorrow"
        },
        {
            "text": "Remind me to buy milk",
            "dateTime": "2019-09-11T15:00:00.000Z",
            "matched": "tomorrow 3pm"
        }
    ]
}
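Each entry in matches pairs a matched phrase with the remaining text and the resolved date. If only one result is wanted, a caller could pick the entry whose matched phrase is longest. The sketch below assumes the result has exactly the shape shown above and does not call Serina itself; pickMostSpecificMatch is a hypothetical helper, not part of the library.

```javascript
// Sample result copied from the console output above.
const parsed = {
    original: 'Remind me to buy milk tomorrow 3pm',
    isValid: true,
    matches: [
        { text: 'Remind me to buy milk tomorrow', dateTime: '2019-09-10T15:00:00.000Z', matched: '3pm' },
        { text: 'Remind me to buy milk 3pm', dateTime: '2019-09-11T00:00:00.000Z', matched: 'tomorrow' },
        { text: 'Remind me to buy milk', dateTime: '2019-09-11T15:00:00.000Z', matched: 'tomorrow 3pm' },
    ],
};

// Hypothetical helper: returns the match with the longest matched phrase,
// or null when the result is invalid or has no matches.
function pickMostSpecificMatch(result) {
    if (!result.isValid || result.matches.length === 0) {
        return null;
    }
    return result.matches.reduce(function (best, current) {
        return current.matched.length > best.matched.length ? current : best;
    });
}

const best = pickMostSpecificMatch(parsed);
console.log(best.text, new Date(best.dateTime));
```

Picking by phrase length is only one possible heuristic; an application may prefer to show all matches as suggestions.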

Publish

The recommended way to publish is through GitHub Actions:

  1. Change the version in package.json to x.y.z.
  2. Push a commit with the message "Release x.y.z". This generates the tag and publishes the package to npm.

For example, when someone changes the version in package.json to 1.2.3 and pushes a commit with the message Release 1.2.3, the npm-publish action creates a new tag v1.2.3 and publishes the package to the npm registry.
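The mapping from commit message to tag can be illustrated with a small sketch. The workflow configuration is not shown here, so the pattern and the v prefix below are assumptions based on the example above, and tagForCommit is a hypothetical helper rather than part of the action.

```javascript
// Assumed pattern: the commit message starts with "Release " followed by the version.
const RELEASE_PATTERN = /^Release (\S+)/;

// Hypothetical helper: returns the tag name for a release commit, or null otherwise.
function tagForCommit(message) {
    const match = RELEASE_PATTERN.exec(message);
    return match ? 'v' + match[1] : null;
}

console.log(tagForCommit('Release 1.2.3')); // v1.2.3
console.log(tagForCommit('Fix typo in README')); // null
```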

Or publish locally if all else fails.

npm run publish

Progress

This project is currently developed by just me, so I can't say when the library will be ready. To give a high-level breakdown, here is everything I'm planning to include:

Version 1 ✅

  • Parse weekdays e.g. tue, tuesday
  • Parse day e.g. 11th, 2nd
  • Parse month e.g. july, jan
  • Parse year e.g. 2018, 9999
  • Parse time e.g. 5pm, 5:00am, 15:00
  • Parse combined day, month and year e.g. 11th June 2019, 11/09/2018
  • Parse incomplete date formats e.g. 20/08 or Jan 2020
  • Parse combined date and time e.g. 20/10/2019 8pm, 11th 14:00
  • Parse day of week with time e.g. 4pm Mon, Tuesday 5:30pm
  • Parse relative time e.g. in half an hour, 4 hours from now
  • Parse relative days e.g. today, tomorrow, a week from now
  • Parse relative dates e.g. next year, 2 weeks from now
  • Parse combined relative date and time e.g. a week from now 2pm
  • Parse keywords such as noon, midnight, mid day

Version 2

  • Rewrite Serina to stop using Luxon as peer dependency
  • Improve Regex logic to make it easier to maintain
  • Review unit tests
  • Add support for parsing yesterday
  • Bug fixes

Todo

  • Parse seconds e.g. 15:30:22
  • Parse date range e.g. tue - thu, 4th july to 8th aug
  • Parse time range e.g. between 5pm and 8pm
  • Parse international date formats better e.g. 2018/06/21
  • Parse text that contains number words e.g. one, twelve
  • Parse more time related words e.g. noon, midnight
  • Parse more UK keywords e.g. oxt, fortnight
  • Parse more advanced time e.g. seconds, millisecond
  • Timezone support #59

Why remove Luxon and reinvent the wheel?

There are two main considerations for removing Luxon:

  • This project now has zero dependencies (excluding dev-dependencies, of course).
  • Taking inspiration from Day.js, a DayLite class was implemented to handle the date operations previously done with Luxon. With full control over the date-time logic, it is now possible to move more complex date operations into the DayLite class. Hopefully this will translate to simpler Serina utility files that are more focused on NLP.

The idea is to release DayLite as a separate library at some point. For now, it is easier to keep it within this project.

Edge cases / Limitations

People can express dates and times in many different ways, and sometimes there is no single clear logical choice. I'll try to list those situations here so everyone is aware of these edge cases and what the expected outcome should be. If you have suggestions about any of these decisions, feel free to raise an issue where we can discuss it in more detail. I'm happy for any of these to be challenged!

Resolve "Next 31st" when the current/next month doesn't have 31 days

Given the current date is 20th February, the logical month for "next" should be February itself, since the 31st is later than the 20th. However, February only has 28 or 29 days, depending on whether it is a leap year. The current logic is to skip February and look for the next month that has 31 days. So in this case Serina will resolve "next 31st" to 31st March.
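The rule above can be sketched with the native Date object. This is an illustration under stated assumptions, not Serina's actual implementation: nextDayOfMonth and daysInMonth are hypothetical names, all dates are handled in UTC, and a day equal to today's date is treated as already passed.

```javascript
// Number of days in a month (monthIndex is 0-based, as in Date).
function daysInMonth(year, monthIndex) {
    // Day 0 of the following month is the last day of this month.
    return new Date(Date.UTC(year, monthIndex + 1, 0)).getUTCDate();
}

// Hypothetical helper: finds the next date with the given day of the month,
// skipping months that are too short to contain that day.
function nextDayOfMonth(from, day) {
    const year = from.getUTCFullYear();
    let month = from.getUTCMonth();
    // Assumption: if the day has passed (or is today), start from next month.
    if (day <= from.getUTCDate()) {
        month += 1;
    }
    for (let checked = 0; checked < 12; checked += 1, month += 1) {
        const candidateYear = year + Math.floor(month / 12);
        const candidateMonth = month % 12;
        if (day <= daysInMonth(candidateYear, candidateMonth)) {
            return new Date(Date.UTC(candidateYear, candidateMonth, day));
        }
    }
    return null; // no month contains this day, e.g. day 32
}

const resolved = nextDayOfMonth(new Date(Date.UTC(2023, 1, 20)), 31);
console.log(resolved.toISOString()); // 2023-03-31T00:00:00.000Z
```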

Week day vs. day of the week

Week day normally refers to Mon - Fri and excludes weekends, but for the sake of simplicity (and looking at other libs) I decided it is much easier to just refer to "day of the week" as "week day" in the code.

Only match year 1000 - 9999

Currently the library only matches years between 1000 and 9999. This could be a limitation for some people, so we may need to come back and address it. Please raise an issue if this is a problem for you.
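As an illustration of this limitation, a four-digit year restricted to 1000 - 9999 could be matched as follows. The pattern is an assumption made for this sketch and is not taken from Serina's source; findYear is a hypothetical helper.

```javascript
// Assumed pattern: a standalone four-digit number whose first digit is 1-9.
const YEAR_PATTERN = /\b[1-9]\d{3}\b/;

// Hypothetical helper: returns the first year found in the text, or null.
function findYear(text) {
    const match = YEAR_PATTERN.exec(text);
    return match ? Number(match[0]) : null;
}

console.log(findYear('conference in 2018')); // 2018
console.log(findYear('founded in 999')); // null, fewer than four digits
console.log(findYear('order 12345')); // null, more than four digits
```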

Multiple identical matches in the same string

It was decided to always prioritize the first match in these situations e.g. "catch the 2:20pm bus at 2:20pm". This decision was made due to the primary function of the library being time and date conversion. By utilizing the first match, the resulting date object will always be consistent. Allowing for multiple matches would result in duplicate suggestions, potentially leading to a poor user experience.
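A minimal sketch of the "first match wins" idea, using plain string operations. Serina's exact output for this phrase is not documented here, so the resulting text is only what this hypothetical removeFirstMatch helper produces.

```javascript
// Hypothetical helper: removes only the first occurrence of the matched phrase
// and tidies up the whitespace left behind.
function removeFirstMatch(text, matched) {
    const index = text.indexOf(matched); // indexOf always returns the first occurrence
    if (index === -1) {
        return text;
    }
    const remaining = text.slice(0, index) + text.slice(index + matched.length);
    return remaining.replace(/\s+/g, ' ').trim();
}

console.log(removeFirstMatch('catch the 2:20pm bus at 2:20pm', '2:20pm'));
// catch the bus at 2:20pm
```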

serina's Issues

ES5 compatibility

We should revisit the source code and see what was used to ensure ES5 compatibility. Or maybe ES6 compatibility is enough?

  • Loops for...of or for...in
  • Regex look behind and look ahead MDN
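For the look-behind point, support can be detected at runtime. The sketch below builds the pattern through the RegExp constructor so that the file itself still parses on engines without look-behind support; supportsLookbehind is a hypothetical helper name.

```javascript
// Returns true when the current engine can compile a look-behind assertion.
function supportsLookbehind() {
    try {
        // A regex literal would be a syntax error at parse time on older engines,
        // so the pattern is passed as a string instead.
        new RegExp('(?<=a)b');
        return true;
    } catch (error) {
        return false;
    }
}

console.log('look-behind supported:', supportsLookbehind());
```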

Review `g` in all regex

Using the global flag in Regex checks seems to cause some weird behaviours. See https://stackoverflow.com/a/2630538. We should review all usages of g and remove them where it is safe to do so.

One thing to consider is how well these Regex checks work with long text, for example text that includes tabs, new lines, etc. Maybe we need to check for m (multi-line) instead of g?
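The behaviour in question comes from lastIndex: a regex with the g flag remembers where its last match ended, so calling test repeatedly on the same regex object can alternate between true and false. The patterns below are made up for this demonstration and are not taken from Serina's source.

```javascript
const input = 'meet at 3pm';

// With the g flag the regex object is stateful.
const globalPattern = /\d{1,2}pm/g;
const firstGlobal = globalPattern.test(input); // true
const lastIndexAfterFirst = globalPattern.lastIndex; // 11, the end of the match
const secondGlobal = globalPattern.test(input); // false, search resumed at index 11

// Without the g flag every call starts from the beginning of the string.
const plainPattern = /\d{1,2}pm/;
const firstPlain = plainPattern.test(input); // true
const secondPlain = plainPattern.test(input); // true

console.log(firstGlobal, secondGlobal, firstPlain, secondPlain);
```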

Refactor 7 lines occurring 2 times in 2 files: dates.ts, time.ts

I've selected for refactoring 7 lines of code which are duplicated in 2 file(s) (1, 2). Addressing this will make our codebase more maintainable and improve Better Code Hub's Write Code Once guideline rating! 👍

Here's the gist of this guideline:

  • Definition 📖
    Do not copy code.
  • Why ❓
    When code is copied, bugs need to be fixed in multiple places. This is both inefficient and a source of regression bugs.
  • How 🔧
    Avoid duplication by never copy/pasting blocks of code and reduce duplication by extracting shared code, either to a new unit or introduce a superclass if the language permits.

You can find more info about this guideline in Building Maintainable Software. 📖

Add optional filler word to date

Since we have added an optional filler word to date-time matches, we should do the same for date matches to be consistent.
