israelviner / cpp_search_engine

In this project I implemented a search engine that builds a database from the Internet and supports different types of search queries. The search can also be performed over the network using the TCP protocol.

Makefile 10.98% C++ 89.02%

cpp_search_engine's Introduction

Search Engine Project

In this project, I implemented a search engine system that builds a database by crawling URLs and runs several types of search queries against that data. Search results are sorted by relevance, measured as the number of times the search terms occur on a web page. The search can also be performed over the network using the TCP protocol.
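The relevance rule described above (more occurrences of the search term means a higher rank) can be sketched as follows. This is a minimal illustration, not the project's code: the `Page` struct, `count_occurrences`, and `rank_pages` are hypothetical names, and matching on exact whitespace-delimited tokens is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page record: a URL plus the plain text extracted from it.
struct Page {
    std::string url;
    std::string text;
};

// Count whitespace-delimited tokens in `text` that equal `term` exactly.
std::size_t count_occurrences(const std::string& text, const std::string& term) {
    std::istringstream words(text);
    std::string word;
    std::size_t count = 0;
    while (words >> word) {
        if (word == term) ++count;
    }
    return count;
}

// Return the URLs of the pages that contain `term`, most occurrences first.
std::vector<std::string> rank_pages(const std::vector<Page>& pages,
                                    const std::string& term) {
    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const Page& page : pages) {
        const std::size_t hits = count_occurrences(page.text, term);
        if (hits > 0) scored.emplace_back(hits, page.url);
    }
    // stable_sort keeps the input order among pages with equal counts.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<std::size_t, std::string>& a,
                        const std::pair<std::size_t, std::string>& b) {
                         return a.first > b.first;
                     });
    std::vector<std::string> urls;
    for (const auto& entry : scored) urls.push_back(entry.second);
    return urls;
}
```

Pages that never mention the term are dropped rather than ranked last, which is one possible reading of "sorted by relevance".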

Architecture:

The design separates the classes from one another as much as possible: objects are passed between modules only through interfaces. Several design patterns are also used, including singleton, factory, command, and proxy.
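As an illustration of passing objects only through interfaces, here is a minimal sketch of one store exposed through a write-side and a read-side interface. The interface names `set_data` and `get_data` are the ones the project uses for its data module; every method name, the `index_page` helper, and the in-memory layout are assumptions made for this example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Write-side interface: all that database-building code needs to see.
// (Method name is hypothetical.)
class set_data {
public:
    virtual ~set_data() = default;
    virtual void add_word(const std::string& url, const std::string& word) = 0;
};

// Read-side interface: all that search code needs to see.
// (Method name is hypothetical.)
class get_data {
public:
    virtual ~get_data() = default;
    virtual std::size_t occurrences(const std::string& url,
                                    const std::string& word) const = 0;
};

// Assumed in-memory store implementing both interfaces.
class database : public set_data, public get_data {
public:
    void add_word(const std::string& url, const std::string& word) override {
        ++m_counts[word][url];
    }

    std::size_t occurrences(const std::string& url,
                            const std::string& word) const override {
        const auto by_word = m_counts.find(word);
        if (by_word == m_counts.end()) return 0;
        const auto by_url = by_word->second.find(url);
        return by_url == by_word->second.end() ? 0 : by_url->second;
    }

private:
    // word -> (url -> number of occurrences)
    std::map<std::string, std::map<std::string, std::size_t>> m_counts;
};

// Crawler-side helper: receives only the write interface, so it cannot
// read from the store it is filling.
void index_page(set_data& sink, const std::string& url, const std::string& text) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) sink.add_word(url, word);
}
```

A caller would create one `database`, hand it to crawling code as `set_data&`, and hand it to search code as `const get_data&`, so each side compiles against only the operations it needs.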

The project is divided into several modules:

  • The data module consists of the database itself, which implements two separate interfaces, set_data and get_data. These give narrowly scoped access for building the database and for retrieving data from it.

  • The crawl module covers everything involved in building the database. At the start of the process, the crawler class reads from the configuration file the URLs, the number of pages to scan, and various other conditions, and then crawls iteratively. At each step, one web page is sent for analysis and data processing.

  • The search module handles queries. Three types of search are supported: search by a single word, search by multiple words, and search that requires certain words to be absent. First, a searcher object is created and loaded with the query. It is then passed to the search_factory class, which identifies the search type of the query and instantiates the matching search class.

  • The TCP module makes it possible to receive queries over the network. It is divided into client_tcp and server_tcp, which together form a complete client-server module with bidirectional data transfer. Using the proxy pattern, the TUI creates a client_searcher object, which does not perform the query itself but sends it over TCP to the tcp_searcher object. The tcp_searcher object performs the query and returns the results over the network.
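The factory step in the search module can be sketched roughly as below. This is not the project's implementation: the query syntax (a leading `-` marks a word that must be absent), the `Index` layout, the `make_search` function standing in for search_factory, and all class names are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index layout: url -> (word -> occurrence count).
using WordCounts = std::map<std::string, std::size_t>;
using Index = std::map<std::string, WordCounts>;

// Common interface for every search type.
class search {
public:
    virtual ~search() = default;
    virtual std::vector<std::string> run(const Index& index) const = 0;
};

// Shared matching rule: every required word present, no excluded word present.
std::vector<std::string> match(const Index& index,
                               const std::vector<std::string>& required,
                               const std::vector<std::string>& excluded) {
    std::vector<std::string> urls;
    for (const auto& page : index) {
        bool ok = true;
        for (const auto& w : required) ok = ok && page.second.count(w) > 0;
        for (const auto& w : excluded) ok = ok && page.second.count(w) == 0;
        if (ok) urls.push_back(page.first);
    }
    return urls;
}

class single_word_search : public search {
public:
    explicit single_word_search(std::string word) : m_word(std::move(word)) {}
    std::vector<std::string> run(const Index& index) const override {
        return match(index, {m_word}, {});
    }
private:
    std::string m_word;
};

class multi_word_search : public search {
public:
    explicit multi_word_search(std::vector<std::string> words)
        : m_words(std::move(words)) {}
    std::vector<std::string> run(const Index& index) const override {
        return match(index, m_words, {});
    }
private:
    std::vector<std::string> m_words;
};

class exclusion_search : public search {
public:
    exclusion_search(std::vector<std::string> required,
                     std::vector<std::string> excluded)
        : m_required(std::move(required)), m_excluded(std::move(excluded)) {}
    std::vector<std::string> run(const Index& index) const override {
        return match(index, m_required, m_excluded);
    }
private:
    std::vector<std::string> m_required;
    std::vector<std::string> m_excluded;
};

// Factory: classify the query text and build the matching search object.
// An empty query falls through to multi_word_search and matches every page;
// a real implementation would probably reject it instead.
std::unique_ptr<search> make_search(const std::string& query) {
    std::vector<std::string> required, excluded;
    std::istringstream tokens(query);
    std::string token;
    while (tokens >> token) {
        if (token.size() > 1 && token[0] == '-') excluded.push_back(token.substr(1));
        else required.push_back(token);
    }
    if (!excluded.empty())
        return std::make_unique<exclusion_search>(required, excluded);
    if (required.size() == 1)
        return std::make_unique<single_word_search>(required.front());
    return std::make_unique<multi_word_search>(required);
}
```

Callers only ever hold a `std::unique_ptr<search>`, so adding a fourth search type would mean one new class and one new branch in the factory, with no change to the code that runs queries.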
