Giter Site home page Giter Site logo

autocompleteapi's Introduction

AutoCompleteApi

Overview

The objective of this repository is to create an autoComplete API that given a user input query, provides suggested queries, based on the search history stored in a csv file and the number of times the queries were typed.

The project was implemented using two different methods. The programming language used is Java with the framework SpringBoot to build the api.

The first one consists in storing all the queries in a Map (text : count). Then, for each input query, we loop over the Map and determine all the queries that starts with the input. The result Map will be sorted according to the values of the count which allows extract the most typed queries. ( branch main of the code)

The second method consists in using a prefix tree (trie). After storing the different queries in a Map (text : count), we loop over the Map to insert each query sentence in the tree. The root node is empty and each node is defined by a hashMap children and an integer endOfSentence to tell whether we have already a complete sentence at this node or not. The insertion is done character by character. Once it is inserted, we set the endOfSentence attribute of the last node to the number of times the query was typed (count Value). Given an input query (prefix), the search is done letter by letter accross the tree which speeds up the search and allows you to eliminate certain branches each time you move forward. ( branch tree-search of the code)

Complexity:

N : max query length M : size of query list data
Time Memory
naive O(M) O(MN)
tree (insert) O(MN) O(MN)
tree (search) O(N) cte

Results:

image

To get the request input in the url, i use the @GetMapping annotation.

Methods comparaison

Then, I compared the two methods performances by sending a certain number of requests and calculating each time the time to execute all of them. The simulation is done with python.

The experience is repeated using 5, 10, 100, 200, 500, 1000, 2000, 3000, 5000 Samples.

First method --> naive Second method --> Tree

The time is displayed by seconds.

image

image

Example with 5000 input query: time to execute all the queries:

  • first method (naive) : ~184s

  • Second method (tree) : ~20s

The second method may be improved by storing the different sentences in the tree decreasingly according to the count value which allows get the 10 most typed queries without having to loop over all the queries starting with the prefix input.

We can also define a new rule to prioritize and manage the case where our result list of suggested queries contains queries with the same number of time typed.

To implement an approximate suggestion logic by measuring the similarity between the input query and the queries stored and defining a threshold to determine whether the words can be considered as similar or not.

To start the api, clone the project and change the fileName in src/main/resources/application.properties with your csv file path then run the project.

autocompleteapi's People

Contributors

yassinelar avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.