obsei / obsei

Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative studies, and more.

Home Page: https://obsei.com/

License: Apache License 2.0

Python 60.44% Jupyter Notebook 39.16% Dockerfile 0.33% HTML 0.07%
artificial-intelligence natural-language-processing sentiment-analysis workflow social-network-analysis customer-engagement text-analysis text-analytics python nlp

obsei's People

Contributors

akar5h, arorajatin, chxlium, cnarte, dependabot[bot], girishpatel, kuutsav, lalitpagaria, namanjuneja771, pyup-bot, reenabapna, sanjaybharkatiya, shahrukhx01, snyk-bot, tanish36

obsei's Issues

Batch call to pipeline in Analyzers

Is your feature request related to a problem? Please describe.
Currently, analyzers iterate over the input array and call the pipeline method with a single argument each time. This can be improved by calling the pipeline with an array of data.

Describe the solution you'd like
Divide the input array into batches and pass each batch to the pipeline. Also measure whether this improves library latency.
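
A minimal sketch of the batching idea, assuming the pipeline is any callable that accepts a list of texts and returns one result per text. The names iter_batches, analyze_in_batches and fake_pipeline are hypothetical and are not part of Obsei.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def analyze_in_batches(
    texts: Sequence[str],
    pipeline: Callable[[List[str]], List[Dict[str, Any]]],
    batch_size: int = 8,
) -> List[Dict[str, Any]]:
    """Call `pipeline` once per batch instead of once per text."""
    results: List[Dict[str, Any]] = []
    for batch in iter_batches(texts, batch_size):
        results.extend(pipeline(batch))
    return results


def fake_pipeline(batch: List[str]) -> List[Dict[str, Any]]:
    """Stand-in for a real model pipeline (assumption, for illustration only)."""
    return [{"text": text, "length": len(text)} for text in batch]
```

Timing analyze_in_batches against a per-item loop for a few batch sizes would be one way to answer the latency question.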

Issue while running Colab Project

The following error occurs when running the "Configure Play Store Scrapper Source" step in the Colab project:

ImportError Traceback (most recent call last)
in ()
----> 1 from obsei.source.playstore_scrapper import PlayStoreScrapperConfig, PlayStoreScrapperSource
2
3 # initialize play store source config
4 source_config = PlayStoreScrapperConfig(
5 # Need two parameters package_name and country.

/usr/local/lib/python3.7/dist-packages/obsei/source/playstore_scrapper.py in ()
3
4 from google_play_scraper import Sort, reviews
----> 5 from google_play_scraper.features.reviews import ContinuationToken
6
7 from obsei.source.base_source import BaseSource, BaseSourceConfig

ImportError: cannot import name 'ContinuationToken' from 'google_play_scraper.features.reviews' (/usr/local/lib/python3.7/dist-packages/google_play_scraper/features/reviews.py)

Remove hydra's dependency

Hydra pulls in many sub-dependencies. To keep the package lean, it would be better to remove Hydra as a dependency and add the required boilerplate code instead.

SlackSink is not printing translated data correctly, Unicode data is visible

Describe the bug
When using the Slack sink with any Source such as Reddit, Playstore, or AppStore, the data is sometimes not in a readable format: Unicode characters are visible.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
Unicode characters should not be printed. The data should be converted to a string before the output is presented.
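
One possible cause, stated here as an assumption rather than a confirmed diagnosis: if the sink serializes its payload with json.dumps using the default settings, non-ASCII text is emitted as \uXXXX escapes. The sketch below shows the difference that ensure_ascii=False makes. The payload is made up for illustration.

```python
import json

# Hypothetical payload with translated, non-ASCII text.
payload = {"text": "Grüße"}

# Default behaviour: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(payload, ensure_ascii=False)
```

Both strings decode to the same data, so the change only affects how the text looks when printed.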

Please complete the following information:

  • OS: Windows
  • Version:

Add an examples table to the README for quick reference

Is your feature request related to a problem? Please describe.
The table could include:

  • Credentials needed
  • Dependencies required
  • Link to example python
  • etc

Suggested by @julian-risch

Data transformation node

The idea is to have a node that transforms a list of data (dict/JSON) from one format to another.
Ideally it could also be used for data merging and conversion.
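
A rough sketch of what such a node might do, assuming records are plain dicts and the transformation is a simple field renaming. The function names and the field_map format are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def transform_records(
    records: Iterable[Dict[str, Any]], field_map: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Rename fields according to `field_map` (old name -> new name).

    Fields that are not listed in `field_map` are dropped; missing fields
    are filled with None.
    """
    return [
        {new: record.get(old) for old, new in field_map.items()}
        for record in records
    ]


def merge_records(*record_lists: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate several record lists into one list."""
    merged: List[Dict[str, Any]] = []
    for records in record_lists:
        merged.extend(records)
    return merged
```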

[BUG] All analyzer examples are broken

Describe the bug
This is a regression caused by moving the analyzer config parameter from the class initializer to the analyze function.

To Reproduce
Steps to reproduce the behavior:

Add DAG support and fix inconsistent naming

  • Introduce a DAG-based workflow. Need to decide between networkx and airflow
  • Replace use of Sink with Informer (packages, classes and variables)
  • Replace use of Source with Observer (packages, classes and variables)
  • Replace use of Analyzer with Segmenter (packages, classes and variables)
  • Add BaseSegmenter class
  • Fix state store
  • Fix circular dependency in Workflow classes
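
To illustrate what a DAG-based workflow involves, here is a small sketch that uses the standard-library graphlib module (Python 3.9+) instead of networkx or airflow, purely to keep the example dependency-free. The node names and the run_workflow function are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Sequence


def run_workflow(
    nodes: Dict[str, Callable[[List[Any]], Any]],
    dependencies: Dict[str, Sequence[str]],
) -> Dict[str, Any]:
    """Run every node after all of its predecessors.

    `dependencies` maps a node name to the names of the nodes it depends on.
    Each node receives the outputs of its predecessors as a list.
    """
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = TopologicalSorter(dependencies).static_order()
    outputs: Dict[str, Any] = {}
    for name in order:
        inputs = [outputs[dep] for dep in dependencies.get(name, ())]
        outputs[name] = nodes[name](inputs)
    return outputs


# Toy three-stage workflow: source -> analyzer -> sink.
nodes = {
    "source": lambda inputs: ["good app", "bad app"],
    "analyzer": lambda inputs: [text.upper() for text in inputs[0]],
    "sink": lambda inputs: len(inputs[0]),
}
dependencies = {"analyzer": ["source"], "sink": ["analyzer"]}
results = run_workflow(nodes, dependencies)
```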

[Analyzer] Integrate intent classifier

An intent classifier is useful for detecting customer goals such as buy, sell, and purchase. It is also useful in conversational flows.

The same can be done with a zero-shot classifier, but it would be better to add a dedicated analyzer to keep it separate from generic text classification. This would let users load their own models for this purpose.

Add persistent storage to store current state

Problem

Twitter V2 APIs, Play Store review APIs, and others can fetch results after a given tweet ID or review ID, respectively. The idea is to store this intermediate information in a persistent store to avoid fetching duplicate data.

Proposed solution

sqlalchemy is already included in the dependencies, so it would be easy to add a storage layer. sqlalchemy would also let users choose any data store supported by its DBAPI.

A persistent layer would also help the workflow engine recover from some failure scenarios.
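
A minimal sketch of the state-store idea. It uses the standard-library sqlite3 module rather than sqlalchemy so that it runs without extra dependencies; the class name, table name, and columns are all hypothetical.

```python
import sqlite3
from typing import Optional


class StateStore:
    """Remember the last item ID seen per source, so later runs can resume."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS source_state ("
            "source_id TEXT PRIMARY KEY, last_seen_id TEXT NOT NULL)"
        )
        self._conn.commit()

    def get_last_seen(self, source_id: str) -> Optional[str]:
        """Return the stored ID for `source_id`, or None if nothing is stored."""
        row = self._conn.execute(
            "SELECT last_seen_id FROM source_state WHERE source_id = ?",
            (source_id,),
        ).fetchone()
        return row[0] if row else None

    def set_last_seen(self, source_id: str, last_seen_id: str) -> None:
        """Insert or overwrite the stored ID for `source_id`."""
        self._conn.execute(
            "INSERT OR REPLACE INTO source_state (source_id, last_seen_id) "
            "VALUES (?, ?)",
            (source_id, last_seen_id),
        )
        self._conn.commit()
```

A source would read get_last_seen before fetching and call set_last_seen after a successful fetch.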

Reduce size of docker image

The current Docker image size is 1 GB:

  • the apt-get install layer consumes 240 MB
  • the pip install layer consumes 800 MB

The idea is to create a lean Docker image that installs only the required dependencies.

Add text cleaner node

The idea is to have a configurable text cleaning node.
This node would also have predefined templates for cleaning tweets, Facebook feeds, app reviews, etc.

For details, refer to #75 (comment)
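
A small sketch of a configurable cleaner with one predefined template. The cleaning steps, option names, and TWEET_TEMPLATE are assumptions made for illustration, not a design taken from the linked discussion.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(
    text: str,
    remove_urls: bool = True,
    remove_mentions: bool = True,
    lowercase: bool = False,
) -> str:
    """Apply the enabled cleaning steps, then collapse whitespace."""
    if remove_urls:
        text = URL_RE.sub(" ", text)
    if remove_mentions:
        text = MENTION_RE.sub(" ", text)
    if lowercase:
        text = text.lower()
    return WHITESPACE_RE.sub(" ", text).strip()


# Hypothetical predefined template for tweets.
TWEET_TEMPLATE = {"remove_urls": True, "remove_mentions": True, "lowercase": True}

cleaned = clean_text("Hey @obsei check https://obsei.com  NOW", **TWEET_TEMPLATE)
```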

Add conda release

Currently the Obsei library is released on PyPI, but a substantial number of users use conda to install their project dependencies.

[BUG] Play Store Scrapper fails

Describe the bug
Running the Play Store scrapper example itself fails.

To Reproduce
Run the Play Store scrapper example.

Stacktrace

  File "/Users/lalitp/PycharmProjects/obsei/example/playstore_scrapper_example.py", line 31, in <module>
    source_response_list = source.lookup(source_config)
  File "/Users/lalitp/PycharmProjects/obsei/obsei/source/playstore_scrapper.py", line 73, in lookup
    if review.date < review["at"]:
AttributeError: 'dict' object has no attribute 'date'
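
The traceback shows that each review is a dict, so attribute access such as review.date fails, while key access such as review["at"] works. The sketch below assumes the intent is to keep reviews newer than some cutoff; the function name and the since parameter are hypothetical, and the actual comparison intended in playstore_scrapper.py is not known from this report.

```python
from datetime import datetime
from typing import Any, Dict, List


def reviews_newer_than(
    reviews: List[Dict[str, Any]], since: datetime
) -> List[Dict[str, Any]]:
    """Keep reviews whose "at" timestamp is later than `since`."""
    # Reviews are dicts, so the timestamp is read with review["at"],
    # not with attribute access.
    return [review for review in reviews if review["at"] > since]
```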

HTTP Sink fails due to a datetime serialization issue with the AppStore and PlayStore Scrapper sources

The following error occurs:

TypeError: datetime.datetime(...) is not JSON serializable

To Reproduce
Select the PlayStore or AppStore Scrapper source and use an HTTP mock server or a local HTTP server to receive the sentiment data.

Expected behavior
The sink should work with any datetime value.

Stacktrace
TypeError: datetime.datetime(...) is not JSON serializable
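
One common way to handle this error is to pass a default function to json.dumps that converts datetime values to ISO 8601 strings. This is a sketch of that general technique, not the fix adopted in Obsei; json_default and the payload are illustrative.

```python
import json
from datetime import date, datetime
from typing import Any


def json_default(value: Any) -> str:
    """Fallback used by json.dumps for objects it cannot serialize itself."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(
        f"Object of type {type(value).__name__} is not JSON serializable"
    )


# Hypothetical payload similar to a scraped review.
payload = {"review": "Great app", "at": datetime(2021, 4, 1, 12, 30)}
body = json.dumps(payload, default=json_default)
```

The receiving server then gets the timestamp as a plain string and can parse it as needed.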

Please complete the following information:

  • OS: windows
  • Version:

Issue while loading Obsei Image using URL

Stacktrace
SSLError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /lalitpagaria/obsei/master/images/logos/obsei_200x200.png (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125)')))
Traceback:
File "/usr/local/lib/python3.8/site-packages/streamlit/script_runner.py", line 337, in _run_script
exec(code, module.__dict__)
File "/home/user/ui.py", line 9, in <module>
favicon = Image.open(requests.get(logo_url, stream=True).raw)
File "/usr/local/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)

[BUG] Text size is more than the model can handle

This bug is reported by @shahrukhx01

When the input text is longer than the model can process, we get this error:

(The size of tensor a (1453) must match the size of tensor b (512) at non-singleton dimension 1)

Currently we don't have a proper solution, so as a workaround we will truncate the text to the required size before passing it to the model.
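
The truncation workaround can be sketched on a plain list of token IDs. The limit of 512 comes from the error message above; truncate_tokens is a hypothetical helper. If the analyzer uses Hugging Face tokenizers, passing truncation=True together with max_length to the tokenizer call has a similar effect.

```python
from typing import List, Sequence


def truncate_tokens(token_ids: Sequence[int], max_length: int = 512) -> List[int]:
    """Keep at most `max_length` tokens from the start of the sequence."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return list(token_ids[:max_length])
```

Truncation discards everything after the limit, so any sentiment expressed late in a long text is lost. That is the trade-off of this workaround.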

Add Google news and website crawler as Source

The idea is to add Google News as a Source.
Google News provides an RSS feed with query support, so it is easy to crawl.
RSS link:

https://news.google.com/rss/search?q=[INPUT]

For now, just add GoogleNews as a source. Other news sources can be added later.

The Google RSS feed gives the title, highlight, date, and URL. So in order to fetch the full article we need to use another library such as https://github.com/adbar/trafilatura
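
A sketch of the crawling step using only the standard library. The URL pattern is the one given above; the function names are hypothetical, and SAMPLE_FEED is a made-up RSS 2.0 document used in place of a real network response.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import quote_plus


def google_news_rss_url(query: str) -> str:
    """Build the RSS search URL for a query, URL-encoding the query text."""
    return "https://news.google.com/rss/search?q=" + quote_plus(query)


def parse_rss_items(rss_xml: str) -> List[Dict[str, str]]:
    """Extract title, URL and date from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


# Made-up feed for illustration; a real feed would be downloaded from the URL.
SAMPLE_FEED = (
    '<rss version="2.0"><channel><item>'
    "<title>Example headline</title>"
    "<link>https://example.com/a</link>"
    "<pubDate>Thu, 01 Apr 2021 10:00:00 GMT</pubDate>"
    "</item></channel></rss>"
)
items = parse_rss_items(SAMPLE_FEED)
```

Each returned URL could then be handed to an article extractor to fetch the full text.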

Better offline support for transformers

Is your feature request related to a problem? Please describe.
When deploying Docker containers in data centres, models need to be cached locally, either copied manually or by script, or auto-downloaded by code.
This should provide offline access to models. Since models are huge (in GBs), frequent upload or download of models should be avoided.

Describe the solution you'd like
Transformers can run models offline by using the environment variable TRANSFORMERS_OFFLINE=1. This is documented here: https://huggingface.co/transformers/installation.html#offline-mode
We can achieve automatic download on first use in code, with logic similar to that raised in PR - spacy download
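
A sketch of how code might switch to offline mode only when a local model cache already exists. TRANSFORMERS_OFFLINE is the variable named above; the function name, the cache_dir argument, and the "directory is non-empty" check are assumptions. Setting the variable before transformers is imported is the safe choice.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def configure_offline_mode(
    cache_dir: str, env: Optional[MutableMapping[str, str]] = None
) -> bool:
    """Set TRANSFORMERS_OFFLINE=1 if `cache_dir` already contains files.

    Returns True when offline mode was enabled. When the cache is missing or
    empty, the variable is left untouched so the first run can download.
    """
    env = os.environ if env is None else env
    cache_path = Path(cache_dir)
    has_cached_files = cache_path.is_dir() and any(cache_path.iterdir())
    if has_cached_files:
        env["TRANSFORMERS_OFFLINE"] = "1"
    return has_cached_files
```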

Describe alternatives you've considered

  1. Manually copy the model and have the code look for it. This needs a disk mounted by Docker.
  2. Create a Docker image that includes the model. The image becomes too big.

Not able to run with the HTTP informer

Describe the bug
I have created a mock server that is running locally,
but when running Obsei with the HTTP informer I get the response below:

(HTTPSConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /test (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)'))))
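
A WRONG_VERSION_NUMBER error commonly appears when a client attempts a TLS handshake against a server that speaks plain HTTP. Assuming that is what happens here (the report does not confirm it), pointing the informer at an http:// URL instead of https:// would be the thing to try. The helper below is a hypothetical illustration of rewriting the scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(url: str, scheme: str) -> str:
    """Return `url` with its scheme replaced, leaving the rest unchanged."""
    parts = urlsplit(url)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


# The host, port and path are taken from the error message above.
plain_url = with_scheme("https://localhost:8080/test", "http")
```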

[BUG] Not able to build Obsei on a Windows machine using pip install

Describe the bug

While installing Obsei using pip, the build continuously fails on uvloop-0.15.2.tar.gz

To Reproduce
Run pip install obsei on a Windows 7/10 machine
Expected behavior
Should build successfully
Stacktrace
ERROR: Command errored out with exit status 1:
command: 'c:\users\sanjay.bharkatiya\appdata\local\programs\python\python39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop\setup.py'"'"'; __file__='"'"'C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-pip-egg-info-993ebv1w'
cwd: C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop
Complete output (15 lines):
Error processing line 1 of c:\users\sanjay.bharkatiya\appdata\local\programs\python\python39\lib\site-packages\matplotlib-3.4.1-py3.9-nspkg.pth:

  Traceback (most recent call last):
    File "c:\users\sanjay.bharkatiya\appdata\local\programs\python\python39\lib\site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 562, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop\setup.py", line 8, in <module>
    raise RuntimeError('uvloop does not support Windows at the moment')
RuntimeError: uvloop does not support Windows at the moment
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
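
Since uvloop refuses to install on Windows, code that benefits from it usually treats it as optional. The sketch below shows that pattern with a fallback to the standard asyncio loop; create_event_loop is a hypothetical helper, and which Obsei dependency pulls in uvloop is not stated in this report. On the packaging side, a requirement with an environment marker such as uvloop; sys_platform != "win32" is the usual way to skip it on Windows.

```python
import asyncio
import sys


def create_event_loop() -> asyncio.AbstractEventLoop:
    """Return a uvloop event loop when possible, else the default asyncio loop."""
    if sys.platform != "win32":
        try:
            import uvloop  # optional dependency, not available on Windows

            return uvloop.new_event_loop()
        except ImportError:
            pass
    return asyncio.new_event_loop()
```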

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Add Translator analyzer

The idea is to detect the language of incoming text automatically and translate it to a configured language.

This can be achieved with either API calls or NLP models. Need to check which suits better here.

Abstract Analyzer

Currently the Analyzer only supports sentiment and classification.
Abstract it into a BaseAnalyzer and create separate classes for Sentiment/Classification/NER/QA/FAQ/Search etc.
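
A minimal sketch of the proposed abstraction using Python's abc module. Only the name BaseAnalyzer comes from the issue; the analyze signature and the toy KeywordSentimentAnalyzer subclass are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAnalyzer(ABC):
    """Common interface that every concrete analyzer implements."""

    @abstractmethod
    def analyze(self, texts: List[str]) -> List[Dict[str, Any]]:
        """Return one result dict per input text."""


class KeywordSentimentAnalyzer(BaseAnalyzer):
    """Toy sentiment analyzer based on keyword counts (illustration only)."""

    POSITIVE = {"good", "great"}
    NEGATIVE = {"bad", "poor"}

    def analyze(self, texts: List[str]) -> List[Dict[str, Any]]:
        results = []
        for text in texts:
            words = text.lower().split()
            score = sum(w in self.POSITIVE for w in words) - sum(
                w in self.NEGATIVE for w in words
            )
            if score > 0:
                label = "positive"
            elif score < 0:
                label = "negative"
            else:
                label = "neutral"
            results.append({"text": text, "label": label})
        return results
```

Other analyzers (classification, NER, and so on) would subclass BaseAnalyzer in the same way, so the workflow code only depends on the shared analyze method.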
