ang-zeyu / infisearch

Easy and flexible client-side search for static sites

Home Page: https://infi-search.com

License: MIT License

Languages: Rust 49.72%, HTML 35.39%, TypeScript 7.75%, JavaScript 4.01%, CSS 2.90%, Makefile 0.23%

Topics: jamstack, javascript, rust, search, static-site, wasm

infisearch's Introduction

Hi there 👋

My most notable projects here are ⭐

  • InfiSearch, a semi-scalable end-to-end client-side search solution powered by a static pre-built index. Scalability is achieved by fragmenting the static index into numerous files for on-demand retrieval. It consists of a multi-threaded CLI indexer written in Rust, a search library compiled to WebAssembly (Rust), and a user interface written in TypeScript.
  • MarkBind, an open-source static site generator tailored for education use cases, allowing users to write Markdown, Nunjucks and various custom syntaxes.

Directory to other projects

This is a non-exhaustive list of projects I've worked on as part of university coursework, along with various smaller pet projects for picking up certain skills, that I've uploaded to GitHub. For my professional experience, you can refer to my LinkedIn profile and/or resume.

Web Development
  • AppTracky, a drag-and-drop Kanban job application tracking app built on Spring Boot and Spring Security, utilising various AWS services (ECS, ECR, IAM, S3, S3 presigning, RDS, SSM Parameter Store) and supporting federated identity with OAuth/OIDC auth.
  • Sudoku game built with Next.js and Supabase, deployed with Vercel
  • A simple blogging web app allowing posts to be written in Markdown, built with React + GraphQL and Node.js with MongoDB
  • An online marketplace website with an automatic price discounting feature for sellers, written with Nunjucks, Node.js, and MongoDB
  • A simple dating application allowing users to upload photos and view/edit user profiles, written with Angular + .NET Core
  • Anime recommendation catalogue: One of my first ever web projects, made using only raw HTML, CSS and JS. A fun attempt to create an anime "fandom/recommendation" website with features like a persistent "watch progress" bar.
  • Weathery: Minimalistic weather site made using an external API and React
  • LargestHistogramVisualizer: Step-by-step visualizer in React.js for the solution to the largest rectangle under a histogram problem.
Mobile Development
  • IoT Music Song Recommender: Team coursework project to build an IoT-powered music song recommender using ambient weather data. Uses React Native, some BLE libraries and a Texas Instruments CC2650 sensor. I was responsible for the bulk of the React Native development and some custom firmware (C) development for the sensor.
  • Quizzy: Native Android app that generates 10 random MCQ questions on any topic of your choice, powered by ChatGPT.
Computer Graphics (C++) 🖼️
  • Simple Mesh Editor: A well-featured but basic 3D modelling tool. This was initially part of a coursework assignment. I had fun implementing many additional features beyond the requirements, ending with a proof-of-concept mesh editor.
  • Many other less notable coursework assignments related to computer graphics that I haven't uploaded. Please contact me if you'd like to see them, or we can talk more about them in person 🙂
Misc.
  • A simple containerised setup for a React frontend and an Express backend, proxied by an nginx container, showcasing basic familiarity with Docker.
  • TravelPal: A travelling companion app, written in Java and JavaFX for a software engineering course. (core developer) (personal contribution ~= 10k loc)
  • Https Proxy: Multi-threaded C++ HTTPS proxy made with OpenMP
Game Development
Links

infisearch's People

Contributors

ang-zeyu

Forkers

lhybdv

infisearch's Issues

Explore the use of cloudflare workers

Context: This suggestion was brought up here. There is an existing project https://github.com/wilsonzlin/edgesearch that does this but is more API focused.

A lot of Morsels' code is already set up to split index files, similar to edgesearch. If the performance improvements from using Cloudflare are substantial, this could improve the scalability of Morsels even further.

What's missing is:

  • alternate code paths to retrieve index files and field stores from Cloudflare
  • APIs to query and return results
  • easy ways to upload the index files to Cloudflare

The architecture could be generalized roughly like so:

(image: cloudflare)

Support "enum/category" fields, and UI category filters

Documents may have domain specific 'categories' (e.g. sunny vs rainy vs gloomy weather forecasts).

Category filtering is already possible using a query like the following:

+mycategoryfield:(sunny warm)

However, this requires understanding the query syntax. A UI multi-select dropdown could be provided instead.
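As a rough illustration of how a multi-select dropdown could hide the query syntax, the sketch below turns a list of selected values into a filter string of the form shown above. The function name buildCategoryFilter is hypothetical and not part of the library; it only assumes the "+field:(a b)" syntax from this issue.

```typescript
// Hypothetical helper (not part of the library): builds a category filter
// in the "+field:(value1 value2)" form quoted in this issue.
function buildCategoryFilter(field: string, selected: string[]): string {
  // No selection means no filter should be appended to the query.
  if (selected.length === 0) {
    return "";
  }
  return `+${field}:(${selected.join(" ")})`;
}

// Values a user might tick in a multi-select dropdown.
const weatherFilter = buildCategoryFilter("mycategoryfield", ["sunny", "warm"]);
console.log(weatherFilter);
```

A real UI would also need to escape or quote values containing spaces or brackets, which this sketch does not attempt.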

Regular 'text' fields are also inefficient storage for this, as each document may only take one enum / category value, and positions are not important.

A specialised 'enum' field could be supported that maps enum values to integer values. These integer values can be bitpacked for compression. For example, a 50000 document collection with at most 16 categories needs 4 bits per document, so it can be stored in just 50000 * 4 bits / 8 = 25000 bytes = 25 KB.
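The arithmetic above can be checked with a small bitpacking sketch. This is only an illustration of the idea, not the indexer's actual storage format; the function names (bitsNeeded, packEnums, unpackEnum) and the little-endian bit layout are assumptions made for the example.

```typescript
// Smallest number of bits able to represent `numCategories` distinct values.
function bitsNeeded(numCategories: number): number {
  let bits = 1;
  while ((1 << bits) < numCategories) {
    bits++;
  }
  return bits;
}

// Pack one enum value per document into a byte array, `bits` bits each.
// Bit layout (an assumption for this sketch): lowest bit first within each byte.
function packEnums(values: number[], bits: number): Uint8Array {
  const out = new Uint8Array(Math.ceil((values.length * bits) / 8));
  values.forEach((value, docIndex) => {
    for (let b = 0; b < bits; b++) {
      if ((value >> b) & 1) {
        const pos = docIndex * bits + b;
        out[pos >> 3] |= 1 << (pos & 7);
      }
    }
  });
  return out;
}

// Read back the enum value of a single document.
function unpackEnum(packed: Uint8Array, bits: number, docIndex: number): number {
  let value = 0;
  for (let b = 0; b < bits; b++) {
    const pos = docIndex * bits + b;
    if ((packed[pos >> 3] >> (pos & 7)) & 1) {
      value |= 1 << b;
    }
  }
  return value;
}

// 50000 documents, 16 categories, as in the example above.
const numCategories = 16;
const enumBits = bitsNeeded(numCategories);
const docCategories = Array.from({ length: 50000 }, (_, i) => i % numCategories);
const packedEnums = packEnums(docCategories, enumBits);
console.log(enumBits, packedEnums.length);
```

With 16 categories each value fits in 4 bits, so two documents share one byte and the packed array is 25000 bytes long.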

Incorrect metadata.json url

Hi!

It's me again.
I've been curious to implement the latest version of your great library!

However, I encountered a little issue when hosting the data.

It looks like the URL is not correctly inferred.

After a little investigation, I guess that the issue comes from here:

const metadataUrl = `${innerUrl}/metadata.json`;

where innerUrl already ends with a /:

const innerUrl = `${searcherOptions.url}${indexVer}/`;

For some obscure reason this causes no problem in local dev, but once deployed, the search UI fails to load the data :/
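One possible way to avoid the doubled slash is to normalise the separator when joining the two parts. The sketch below is not the library's actual fix; joinUrl is a hypothetical helper and the example URL is made up.

```typescript
// Hypothetical helper: joins a base URL and a path with exactly one slash,
// regardless of trailing slashes on the base or leading slashes on the path.
function joinUrl(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, "");
  const trimmedPath = path.replace(/^\/+/, "");
  return `${trimmedBase}/${trimmedPath}`;
}

// Made-up example values; in the issue, innerUrl already ends with "/".
const exampleInnerUrl = "https://example.com/search/output/1/";
const naiveUrl = `${exampleInnerUrl}/metadata.json`; // contains "//metadata.json"
const joinedUrl = joinUrl(exampleInnerUrl, "metadata.json");
console.log(naiveUrl);
console.log(joinedUrl);
```

Some static hosts tolerate a doubled slash in the path and others do not, which may explain why the problem only showed up after deployment; that is a guess, not something confirmed in the issue.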

Marketing ideas

Apologies for the unsolicited marketing ideas. SCR... I was searching for this billion dollar library for a long time (meilisearch/meilisearch-rust#67). Sorry to see the project is not getting the attention it deserves. I think it is mostly due to the keywords, SEO and the pitch.

Here are my 2 cents:

  1. Compare the keywords, pitches, etc with the competitors (Typesense, Algolia, ElasticSearch, Meilisearch, Site Search 360, Klevu)
  2. For example, look at the title, description and topics used by the new and successful player in this space https://github.com/typesense/typesense
  3. Create a new website. Again compare with https://typesense.org/
  4. Create head-to-head comparison with the competitors. Again, compare with https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
  5. Remove mdBook related pitch from the main README. They should be sub-products
  6. Remove limitations from the README. Perhaps move them to subpages
  7. Fix the demo to accept 1 character. Currently, "No results" for searches shorter than 4 characters is confusing. If that's not possible, display "Type 4 or more characters to get results" instead
  8. Create a demo using https://github.com/algolia/instantsearch.js (See, how MeiliSearch has done it https://github.com/meilisearch/instant-meilisearch)
  9. Fork and add comparison in https://mosuka.github.io/search-benchmark-game/
  10. Create server-based and CLI-based search options, like Algolia, etc. (I think that should be easy as you used Rust). It can be Enterprise-only too.
  11. Possibly rename the project to include "search" in the product name (?)
  12. Publish the crate
  13. Think about moving a few parts to Enterprise (open core approach) with some sort of licensing, to make it a commercial success.

Note: I can contribute now and then for 1-6 and a bit on 9. My advance wishes for your success.

Support integer/timestamp/numeric fields

This could be a generalisation of #4.

Documents may have associated numeric values, the most common of which would be timestamps (e.g. blog posts made on a certain date).

Morsels only indexes text, but text values could be mapped to integer values through some configuration (e.g. parsing a datetime string with a specific format into a 64-bit Unix timestamp).

On the API side, there could be support for:

  • less than, greater than, in between operations (1 < x < 1000)
  • sorting results by integer fields instead of BM25

On the UI side,

  • before/after dropdowns specifically for date times could be supported. (e.g. Any time vs After Dec 20 2005)
  • a sort-by dropdown could be supported. (e.g. by relevance, X field or Y field)

In terms of priority, timestamp fields should come first. The use case for generic numeric fields in static site search may be less common.
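To make the proposal more concrete, here is a minimal sketch of mapping date strings to Unix timestamps, then applying a range filter and a sort on the resulting integers. It does not reflect any existing API in the library; the Post type, the function names, and the sample data are all assumptions for illustration, and dates are assumed to be ISO 8601 strings.

```typescript
// Hypothetical document shape with a date stored as text.
interface Post {
  title: string;
  published: string; // ISO 8601 date string
}

// Parse an ISO 8601 date string into a Unix timestamp in seconds.
function toUnixSeconds(isoDate: string): number {
  const ms = Date.parse(isoDate);
  if (Number.isNaN(ms)) {
    throw new Error(`Unparseable date: ${isoDate}`);
  }
  return Math.floor(ms / 1000);
}

// Exclusive range check, matching the "1 < x < 1000" form above.
function inRange(x: number, min: number, max: number): boolean {
  return min < x && x < max;
}

// Keep posts published strictly after `after`, sorted newest first
// (i.e. sorted by the integer field rather than by relevance).
function postsAfter(posts: Post[], after: string): Post[] {
  const threshold = toUnixSeconds(after);
  return posts
    .map((post) => ({ post, ts: toUnixSeconds(post.published) }))
    .filter(({ ts }) => inRange(ts, threshold, Infinity))
    .sort((a, b) => b.ts - a.ts)
    .map(({ post }) => post);
}

// Made-up sample data for the "After Dec 20 2005" dropdown option.
const samplePosts: Post[] = [
  { title: "Old post", published: "2004-03-01T00:00:00Z" },
  { title: "Newer post", published: "2006-01-15T00:00:00Z" },
  { title: "Newest post", published: "2010-07-04T00:00:00Z" },
];
const recentPosts = postsAfter(samplePosts, "2005-12-20T00:00:00Z");
console.log(recentPosts.map((p) => p.title));
```

In an actual implementation the parsing would presumably happen once at index time, so that only the integer values need to be stored and compared at query time.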

Search is broken when indexed data contains special symbols

We encountered an issue when our data contained a ^C control character, which was included as-is in the resulting JSON during indexing. This character is not allowed unescaped in JSON and makes the output invalid. When the client side tries to read the JSON, it can't parse it and the search doesn't work.

Maybe the indexer should remove symbols not allowed in JSON to avoid such situations.
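The failure and one possible remedy can be sketched as follows. The indexer itself is written in Rust, so this TypeScript snippet only illustrates the idea on the client side of the format; stripControlChars is a hypothetical helper, and dropping the characters (rather than escaping them) is just one option.

```typescript
// Hypothetical helper: removes ASCII control characters (U+0000 to U+001F)
// except tab, line feed and carriage return, which are left for the JSON
// serializer to escape.
function stripControlChars(text: string): string {
  return text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "");
}

// ^C is U+0003. Embedding it raw inside a JSON string makes the JSON invalid.
const rawJson = '{"body":"before\u0003after"}';
let rawParseFailed = false;
try {
  JSON.parse(rawJson);
} catch (e) {
  rawParseFailed = true;
}

// Option 1: strip the character before writing the JSON.
const cleanedText = stripControlChars("before\u0003after");

// Option 2: let a JSON serializer escape it (it is written as \u0003),
// which keeps the output parseable without losing the character.
const escapedJson = JSON.stringify({ body: "before\u0003after" });

console.log(rawParseFailed, cleanedText, escapedJson);
```

Either approach keeps the emitted JSON parseable; stripping is simpler, while escaping preserves the original text exactly.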

TypeScript client

Hello!

First of all, thank you for this great work.
I'm interested in client-side search solutions, and this is one of the few that also keeps track of the latest id in the page.

One of the difficulties I encounter with this library is that I can only initiate the UI with initiateMorsels, which is exported in the bundle file (search-ui.*.bundle.js).
As I am currently using Next JS (with TypeScript) to build the website, it would be very convenient to have access to a typescript client, installable from npm or yarn.

Currently, the only way I see to update the UI would be to redefine the types mentioned on this page.
This still makes custom handling of events (e.g. client-side routing) quite complex.

At the moment, I'm using pagefind. This library emits a .js file that is quite simple compared to the search-ui bundle, so I managed to simply overwrite part of it.

However, I'd still be very interested to avoid any overwriting of .js files and directly use a TypeScript client, even though I believe that emitting such a library would make the handling of a WebWorker more difficult for the user.

How difficult and useful do you think this kind of feature could be?

Support for PDF Files

Would it be possible to support PDF files?
We would like to create a new book with mdbook and use PDF files as content here. Therefore it would be very good if the search would also have the content of PDF files in the index.

toml version conflict

The latest version of mdbook, 0.4.32, upgraded the toml package to version 0.7.6 (commit link), which conflicts with the toml 0.5 referenced in the project's packages/mdbook-infisearch/Cargo.toml.
