
samalyarov / telegram_monitor


A telegram channel parser + binary text classifier utilizing a simple logistic regression model

License: MIT License

Jupyter Notebook 95.51% Python 4.49%
binary-classification classifier-model logistic-regression sklearn telegram telegrambot-python text-classification nltk-python sklearn-library eli5 openpyxl parsing regular-expressions scikit-learn tqdm

telegram_monitor's Introduction

Telegram monitoring script

The script is built on the telethon library and uses a personal Telegram account to collect messages from open channels or from channels to which the user has access. Data can be collected from multiple channels (taken from a list kept in a separate file). This version of the script then applies text pre-processing and a trained logistic regression model to decide whether a message is "of interest" and, if so, forwards it to a specified channel. What counts as "of interest" is arbitrary: it depends entirely on the training samples and the interests of whoever uses the script.
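The classification step described above can be sketched roughly as follows. This is only an illustration, not the code from TG_Monitor.py: the cleaning rules, the function names `clean_text` and `is_of_interest`, and the `threshold` parameter are assumptions, and the real pre-processing (which also relies on pymorphy2 and nltk) is not shown here. The vectorizer and model are assumed to be fitted scikit-learn objects that expose `transform` and `predict_proba`.

```python
import re

# Hypothetical cleaning rules; the repository's actual pre-processing
# is not reproduced here.
URL_RE = re.compile(r"https?://\S+|t\.me/\S+")
NON_WORD_RE = re.compile(r"[^\w\s]|\d|_")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, drop links, punctuation and digits, collapse whitespace."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_WORD_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def is_of_interest(text, vectorizer, model, threshold=0.5):
    """Return True if the positive-class probability reaches the threshold.

    `vectorizer` and `model` are assumed to be fitted scikit-learn objects
    (for example a text vectorizer and LogisticRegression).
    """
    features = vectorizer.transform([clean_text(text)])
    probability = model.predict_proba(features)[0][1]
    return probability >= threshold
```

A message for which `is_of_interest` returns True would then be forwarded to the target channel; the forwarding itself needs a logged-in Telegram client and is left out of this sketch.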

In this particular example, the script is used by a concierge company to ease the monitoring of the many chats in which it searches for potential clients. This saved a lot of working time (an estimated workload reduction of up to 70-80% on that particular task) and expanded the monitoring range: the script made it possible to monitor many more channels and groups consistently, substantially increasing the company's coverage without incurring any additional costs (a single worker could monitor several times more channels).

Files included in repository:

  • t_channels.xlsx is a list of channels (with names and links) from which posts are loaded. The list is kept in an .xlsx file so that colleagues (especially those with no coding experience) can easily read it and leave comments and remarks.
  • config.ini is a config file holding the user's username, password, and phone number, as well as the bot API settings for the connection.
  • TG_Monitor.py is the script itself.
  • Posts_prediction.ipynb is a Jupyter Notebook with preliminary data analysis and model training, as well as an overall analysis of the model and prospects for future development.
  • To be operational, the folder must also include three .pkl files: a trained logistic regression model, a trained vectorizer model, and the vocabulary for that vectorizer. All three can be created with the "Posts_prediction" notebook.
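As an illustration of how settings in a file like config.ini can be read, here is a minimal sketch using the standard configparser module. The section names, key names, and values below are hypothetical placeholders; the actual layout of the repository's config.ini is not shown in this description. The sample is parsed from a string so that the snippet runs on its own.

```python
import configparser

# Hypothetical layout: section and key names are illustrative only.
SAMPLE_CONFIG = """
[Telegram]
api_id = 123456
api_hash = 0123456789abcdef
username = example_user
phone = +10000000000

[Bot]
token = 000000:EXAMPLE
target_channel = @example_channel
"""


def load_settings(parser: configparser.ConfigParser) -> dict:
    """Collect the values the monitoring script would need into one dict."""
    telegram = parser["Telegram"]
    bot = parser["Bot"]
    return {
        "api_id": telegram.getint("api_id"),
        "api_hash": telegram["api_hash"],
        "username": telegram["username"],
        "phone": telegram["phone"],
        "bot_token": bot["token"],
        "target_channel": bot["target_channel"],
    }


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)  # for a real file: parser.read("config.ini")
settings = load_settings(parser)
```

Because the file contains account credentials, it is usually best kept out of version control.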

The folder is designed to be packaged later into a single .exe file (using pyinstaller) for easy use.
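When a script is frozen into an executable, files that sit next to it (such as the three .pkl files) are typically located relative to the executable rather than the source file. The sketch below shows one way to do that. The file names in `ARTIFACT_FILES` and the helpers `app_dir` and `load_artifacts` are assumptions made for illustration, not names taken from the repository.

```python
import pickle
import sys
from pathlib import Path

# Hypothetical file names; only the existence of three .pkl files is known.
ARTIFACT_FILES = {
    "model": "model.pkl",
    "vectorizer": "vectorizer.pkl",
    "vocabulary": "vocabulary.pkl",
}


def app_dir() -> Path:
    """Return the folder of the .exe when frozen, else of this source file."""
    if getattr(sys, "frozen", False):  # attribute set by PyInstaller bundles
        return Path(sys.executable).resolve().parent
    return Path(__file__).resolve().parent


def load_artifacts(folder: Path) -> dict:
    """Unpickle every expected artifact found in `folder`."""
    artifacts = {}
    for name, filename in ARTIFACT_FILES.items():
        with open(folder / filename, "rb") as handle:
            artifacts[name] = pickle.load(handle)
    return artifacts
```

A typical call would be `load_artifacts(app_dir())`. Note that unpickling runs arbitrary code from the file, so only .pkl files from a trusted source should be loaded.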

Libraries used: telethon, configparser, datetime, numpy, pandas, pytz, matplotlib, seaborn, pickle, re, pymorphy2, nltk, scikit-learn (sklearn), eli5, tqdm (tqdm_notebook)

