obsei / obsei

Obsei is a low-code, AI-powered automation tool. It can be used in business flows such as social listening, AI-based alerting, brand image analysis, comparative studies, and more.

Home Page: https://obsei.com/

License: Apache License 2.0

Python 60.44% Jupyter Notebook 39.16% Dockerfile 0.33% HTML 0.07%

artificial-intelligence natural-language-processing sentiment-analysis workflow social-network-analysis customer-engagement text-analysis text-analytics python nlp

obsei's Issues

Batch call to pipeline in Analyzers

Is your feature request related to a problem? Please describe.
Currently, analyzers iterate over the input array and call the pipeline method with a single argument each time. This can be improved by calling the pipeline with an array of data.

Describe the solution you'd like
Divide the input array into multiple batches and pass each batch to the pipeline. Also, run a performance analysis to check whether this improves library latency.
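
A minimal sketch of the batching idea, assuming the pipeline is any callable that accepts a list of texts and returns one result per text. The names `iter_batches`, `analyze_in_batches` and `fake_pipeline` are hypothetical and not part of Obsei; the stand-in pipeline only illustrates the call pattern.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def analyze_in_batches(
    texts: Sequence[str],
    pipeline: Callable[[List[str]], List[dict]],
    batch_size: int = 8,
) -> List[dict]:
    """Call `pipeline` once per batch instead of once per text."""
    results: List[dict] = []
    for batch in iter_batches(texts, batch_size):
        results.extend(pipeline(batch))
    return results


# Hypothetical stand-in for a real model pipeline (assumption, not Obsei code).
def fake_pipeline(batch: List[str]) -> List[dict]:
    return [{"text": text, "length": len(text)} for text in batch]


batched_results = analyze_in_batches(["a", "bb", "ccc"], fake_pipeline, batch_size=2)
```

Whether batching actually reduces latency depends on the model and hardware, which is why the issue asks for a performance analysis.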

Issue while running Colab Project

An error occurs when running the step "Configure Play Store Scrapper Source" in the Colab project.

ImportError Traceback (most recent call last)
in ()
----> 1 from obsei.source.playstore_scrapper import PlayStoreScrapperConfig, PlayStoreScrapperSource
2
3 # initialize play store source config
4 source_config = PlayStoreScrapperConfig(
5 # Need two parameters package_name and country.

/usr/local/lib/python3.7/dist-packages/obsei/source/playstore_scrapper.py in ()
3
4 from google_play_scraper import Sort, reviews
----> 5 from google_play_scraper.features.reviews import ContinuationToken
6
7 from obsei.source.base_source import BaseSource, BaseSourceConfig

ImportError: cannot import name 'ContinuationToken' from 'google_play_scraper.features.reviews' (/usr/local/lib/python3.7/dist-packages/google_play_scraper/features/reviews.py)

Use dependencies cache to reduce CI time

https://docs.github.com/en/actions/guides/building-and-testing-python
https://github.com/actions/cache
https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows
https://github.com/deepset-ai/haystack/blob/master/.github/workflows/ci.yml

Remove hydra's dependency

Hydra pulls in many sub-dependencies. To keep the binary clean, it is better to remove Hydra as a dependency and add a small amount of boilerplate code instead.

Confluence as Source/Observer

Add Slack as Informer/Sink

SlackSink is not printing translated data correctly, Unicode data is visible

Describe the bug
When using the Slack sink with any Source, such as Reddit, Play Store or App Store, the data is sometimes not in a readable format and Unicode characters are visible.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
Should not print Unicode characters; convert the data to a string and present the output.

Please complete the following information:

OS: Windows
Version:
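
The issue does not state the root cause. One possible explanation (an assumption, not confirmed by the report) is that the message is JSON-encoded with ASCII escaping before it is posted, so non-ASCII text shows up as `\uXXXX` sequences. A minimal sketch of the difference, using only the standard `json` module:

```python
import json

# Example translated text containing non-ASCII characters
payload = {"text": "Très bon café"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(payload)

# ensure_ascii=False keeps the characters readable in the encoded string
readable = json.dumps(payload, ensure_ascii=False)
```

Both strings decode to the same data; only the textual representation differs.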

Add apple app store review support

Currently Apple does not provide an API to fetch reviews, but it does provide an RSS feed for fetching review data without authentication.
Refer: https://developer.apple.com/forums/thread/16909

A Python library that provides this functionality can also be used: https://github.com/cowboy-bebug/app-store-scraper

Add example table on Readme for good information

Is your feature request related to a problem? Please describe.
The table can include information such as:

Credentials needed
Dependencies required
Link to example python
etc

Suggested by @julian-risch

Support for Reddit Observer / source

With Reddit gaining popularity and as a tribute to r/wallstreetbets/ 💎🚀🚀, support for Reddit would be nice here!

Use google play store scrapper to fetch reviews

The current implementation of the Google Play Store review fetcher uses the official Google-supported API, but it requires too much work related to authentication. So the idea is to also provide a Google Play Store review scrapper so that users can easily use this tool.

This Python library can be used for this purpose: https://github.com/JoMingyu/google-play-scraper

Data transformation node

The idea is to have a node which transforms a list of data/dict/JSON from one format to another.
Ideally it can be used for data merging and conversion purposes as well.
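
As an illustration only, one transformation such a node might offer is flattening nested dicts into dotted keys and merging several records. The function names `flatten` and `merge_records` are hypothetical; the issue does not specify an interface.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Convert nested dicts into a single-level dict with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def merge_records(*records: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten each record and merge them; later records win on key clashes."""
    merged: Dict[str, Any] = {}
    for record in records:
        merged.update(flatten(record))
    return merged


flat_review = flatten({"review": {"text": "Nice app", "score": 5}, "source": "playstore"})
```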

[BUG] All analyzer examples are broken

Describe the bug
Regression caused by moving the analyzer config param from the class init to the analyze function.
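
To make the kind of breakage concrete, here is a sketch with hypothetical classes (`ExampleAnalyzer` and `ExampleAnalyzerConfig` are not Obsei's actual classes). Once the config is passed per call, code that still passes it to the constructor fails.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExampleAnalyzerConfig:
    labels: List[str]


class ExampleAnalyzer:
    """Hypothetical analyzer whose config is supplied per call, not at construction."""

    def analyze(self, texts: List[str], config: ExampleAnalyzerConfig) -> List[dict]:
        return [{"text": text, "labels": list(config.labels)} for text in texts]


# Old call pattern, which this kind of change breaks (raises TypeError):
#     analyzer = ExampleAnalyzer(config=ExampleAnalyzerConfig(labels=["positive"]))
#     analyzer.analyze(texts)

# New call pattern: construct without config, pass config to analyze()
analyzer = ExampleAnalyzer()
output = analyzer.analyze(
    ["great app"],
    config=ExampleAnalyzerConfig(labels=["positive", "negative"]),
)
```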

Add Telegram as Informer/Sink

Add DAG support and fix inconsistent naming

Introduce a DAG-based workflow. Need to choose between networkx and airflow
Replace use of Sink with Informer (packages, classes and variables)
Replace use of Source with Observer (packages, classes and variables)
Replace use of Analyzer with Segmenter (packages, classes and variables)
Add BaseSegmenter class
Fix state store
Fix circular dependency in Workflow classes
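
The issue leaves the choice between networkx and airflow open. Purely to illustrate what a DAG-based workflow means, the sketch below uses neither; it relies on the standard library's `graphlib` (Python 3.9+) to order hypothetical nodes so that each one runs after its dependencies. The node names are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key lists the nodes it depends on (its predecessors).
workflow = {
    "analyzer": {"source"},
    "slack_sink": {"analyzer"},
    "http_sink": {"analyzer"},
}

# static_order() yields nodes so that every node comes after its dependencies;
# it raises graphlib.CycleError if the graph contains a cycle.
execution_order = list(TopologicalSorter(workflow).static_order())
```

A cycle check of this kind is also one way to surface circular dependencies between workflow nodes early.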

[Analyzer] Integrate intent classifier

An intent classifier is useful for detecting customer goals such as buy, sell, and purchase; it is also useful in conversational flows.

The same thing can be done with a zero-shot classifier, but it would be better to add a separate analyzer to keep it distinct from generic text classification. It would help users load their own models for this purpose.

Add persistent storage to store current state

Problem

Twitter V2 APIs, Play Store Review APIs, etc. can fetch results after a given tweet ID or review ID respectively. So the idea is to store intermediate information in a persistent store to avoid fetching duplicate data.

Proposed solution

sqlalchemy is already included in the dependencies, so it would be easy to add a storage layer. sqlalchemy also lets users use the data store of their choice, as long as it is supported through its DBAPI.

The persistent layer would also help the workflow engine recover from some failure scenarios.
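
A minimal sketch of the idea of remembering the last fetched ID per source. The issue proposes sqlalchemy; to stay dependency-free this example swaps in the standard library's `sqlite3` instead, and the class, table and column names are hypothetical.

```python
import sqlite3
from typing import Optional


class StateStore:
    """Stores the last seen item ID per source so later runs can resume from it."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS source_state ("
            "source_id TEXT PRIMARY KEY, last_seen_id TEXT NOT NULL)"
        )
        self._conn.commit()

    def get_last_seen(self, source_id: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT last_seen_id FROM source_state WHERE source_id = ?",
            (source_id,),
        ).fetchone()
        return row[0] if row else None

    def set_last_seen(self, source_id: str, last_seen_id: str) -> None:
        # INSERT OR REPLACE overwrites the existing row for this source_id
        self._conn.execute(
            "INSERT OR REPLACE INTO source_state (source_id, last_seen_id) VALUES (?, ?)",
            (source_id, last_seen_id),
        )
        self._conn.commit()


store = StateStore()
store.set_last_seen("twitter", "1001")
store.set_last_seen("twitter", "1002")
```

A source would read `get_last_seen` before fetching and call `set_last_seen` after a successful run.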

Add Google play store reviews as Source

Add Facebook Observer/Source

Add Translation Analyzer

Reduce size of docker image

The current Docker image size is 1 GB:

the apt-get install layer consumes 240 MB and
the pip install layer consumes 800 MB.

The idea is to create a lean Docker image that installs only the required dependencies.

Move rest api interface to separate repo

To manage the number of dependencies and test changes better, move the REST API interface to a separate repo: https://github.com/lalitpagaria/obsei-rest

Google places reviews as source

Supporting links -
https://developers.google.com/my-business/reference/rest/v4/accounts.locations.reviews/list
https://developers.google.com/my-business/content/review-data#list_all_reviews
https://developers.google.com/my-business/reference/rest/#collection-v3accountslocationsreviews

For Crawler -
Yet to find

Add text cleaner node

The idea is to have a configurable text cleaning node.
This node would also have predefined templates to clean tweets, Facebook feeds, app reviews, etc.

For details refer to #75 (comment)
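
A rough sketch of how a configurable cleaner with predefined templates could look. The step functions, the `TEMPLATES` mapping and the template names are all hypothetical; the design discussed in #75 may differ.

```python
import re
from typing import Callable, Dict, List


def remove_urls(text: str) -> str:
    return re.sub(r"https?://\S+", "", text)


def remove_mentions(text: str) -> str:
    return re.sub(r"@\w+", "", text)


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


# Each template is an ordered list of cleaning steps (names are assumptions).
TEMPLATES: Dict[str, List[Callable[[str], str]]] = {
    "tweet": [remove_urls, remove_mentions, collapse_whitespace],
    "app_review": [collapse_whitespace],
}


def clean(text: str, template: str) -> str:
    """Apply every step of the chosen template, in order."""
    for step in TEMPLATES[template]:
        text = step(text)
    return text


cleaned_tweet = clean("@bob check https://example.com  now", "tweet")
```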

[Analyzer] Add model explainability

Add support for model explainability with transformer models. It should be provided via optional dependencies along with an optional parameter. The following repos can be used -

https://github.com/hila-chefer/Transformer-Explainability
https://github.com/cdpierse/transformers-interpret

[Bonus work :D ] For non-transformer models
https://github.com/marcotcr/lime

Add conda release

Currently the Obsei library is released on PyPI, but a substantial number of users use conda to download their project dependencies.

[BUG] Play Store Scrapper fails

Describe the bug
Running the example itself fails.

To Reproduce
Just run the Play Store scrapper example.

Stacktrace

  File "/Users/lalitp/PycharmProjects/obsei/example/playstore_scrapper_example.py", line 31, in <module>
    source_response_list = source.lookup(source_config)
  File "/Users/lalitp/PycharmProjects/obsei/obsei/source/playstore_scrapper.py", line 73, in lookup
    if review.date < review["at"]:
AttributeError: 'dict' object has no attribute 'date'
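
The traceback shows attribute access (`review.date`) on a plain dict. The intended comparison is not stated in the issue, so the sketch below is only an assumption: it filters review dicts by their `at` timestamp (the key that appears in the traceback) against a hypothetical `since` value.

```python
from datetime import datetime
from typing import Any, Dict, List


def reviews_after(reviews: List[Dict[str, Any]], since: datetime) -> List[Dict[str, Any]]:
    """Keep reviews whose 'at' timestamp is later than `since`.

    Dict entries are read with review["at"]; review.at or review.date would
    raise AttributeError because dicts expose keys, not attributes.
    """
    return [review for review in reviews if review["at"] > since]


sample_reviews = [
    {"content": "old review", "at": datetime(2021, 1, 1)},
    {"content": "new review", "at": datetime(2021, 3, 1)},
]
recent = reviews_after(sample_reviews, since=datetime(2021, 2, 1))
```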

Obsei demo UI

Ticket for creation of small ui demo of Obsei

Create documentation website

[BUG] Test Obsei in GPU environment

HTTP Sink is not working due to date time serialization issue on AppStore and PlayStore Scrapper Sources

The following error occurs:

TypeError: datetime.datetime(...) is not JSON serializable

To Reproduce
Select the PlayStore and AppStore Scrapper sources and use an HTTP mock server or a local HTTP server to receive the sentiment data.

Expected behavior
Should work with any date-time format.

Stacktrace
TypeError: datetime.datetime(...) is not JSON serializable

Please complete the following information:

OS: windows
Version:
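
The `TypeError` above is what the standard `json` module raises for `datetime` values. One common way around it, sketched here under the assumption that the sink builds its request body with `json.dumps`, is a `default` hook that converts dates to ISO 8601 strings. The helper name `json_default` is hypothetical.

```python
import json
from datetime import date, datetime
from typing import Any


def json_default(value: Any) -> str:
    """Fallback used by json.dumps for objects it cannot serialize itself."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


payload = {"review": "Nice app", "at": datetime(2021, 3, 1, 12, 30)}
body = json.dumps(payload, default=json_default)
```

The receiving side then gets the timestamp as a string such as `2021-03-01T12:30:00` and can parse it as needed.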

[Analyzer] Add Haystack FAQ analyzer

The idea is to query the FAQ and add context for the customer's query when informing the ticket issuer or responding to the user.

Issue while loading Obsei Image using URL

Stacktrace
SSLError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /lalitpagaria/obsei/master/images/logos/obsei_200x200.png (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125)')))
Traceback:
File "/usr/local/lib/python3.8/site-packages/streamlit/script_runner.py", line 337, in _run_script
exec(code, module.__dict__)
File "/home/user/ui.py", line 9, in <module>
favicon = Image.open(requests.get(logo_url, stream=True).raw)
File "/usr/local/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)

[BUG] Text size is more than the model can handle

This bug is reported by @shahrukhx01

When the input text is longer than the model can process, we get this error -

(The size of tensor a (1453) must match the size of tensor b (512) at non-singleton dimension 1)

Currently we don't have a proper solution, so as a workaround we will truncate the text to the required size before passing it to the model.
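
The truncation workaround can be sketched on a plain list of token IDs, as below. The function name is hypothetical; the numbers 1453 and 512 come from the error message. With Hugging Face tokenizers, the same effect is usually achieved by passing `truncation=True` and `max_length=512` when tokenizing.

```python
from typing import List

MAX_MODEL_TOKENS = 512  # limit taken from the error message above


def truncate_token_ids(token_ids: List[int], max_length: int = MAX_MODEL_TOKENS) -> List[int]:
    """Keep only the first `max_length` token IDs; everything after is dropped."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return token_ids[:max_length]


# 1453 token IDs, the input size reported in the error message
long_input = list(range(1453))
model_input = truncate_token_ids(long_input)
```

Truncation discards the tail of the text, so anything said after the cut-off does not influence the model's output.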

Add Google news and website crawler as Source

The idea is to add Google News as a Source.
Google News provides an RSS feed with query support, so it is easy to crawl.
RSS link -

https://news.google.com/rss/search?q=[INPUT]

For now just add GoogleNews as a source; later we can add a few other news sources.

The Google RSS feed gives the title, headlight, date and URL. So in order to fetch the full article we need to use another library such as https://github.com/adbar/trafilatura
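
A small sketch of the two steps involved: building the search URL from the pattern above and reading items out of an RSS 2.0 document. It uses only the standard library and a made-up sample feed, so no network call is made; the function names are hypothetical, and fetching full articles with trafilatura is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import quote_plus


def google_news_rss_url(query: str) -> str:
    """Fill the [INPUT] placeholder of the RSS link with a URL-encoded query."""
    return f"https://news.google.com/rss/search?q={quote_plus(query)}"


def parse_rss_items(rss_xml: str) -> List[Dict[str, str]]:
    """Extract title, link and publication date from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
            }
        )
    return items


# Made-up feed used only to exercise the parser
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Example headline</title><link>https://example.com/a</link>
<pubDate>Mon, 01 Mar 2021 10:00:00 GMT</pubDate></item>
</channel></rss>"""

news_items = parse_rss_items(SAMPLE_RSS)
```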

LinkedIn as source

The idea is to fetch a post and its comments when someone tags the user.

Clean CI workflows

Add code coverage, lint, mypy and security checks.

Add Email Observer

To handle email tagging and classification use cases

[Analyzer] integrate PII information masking analyzer

Let's integrate an Analyzer that masks PII in the given text.
Basically, PII detection is done via NER and the detected entities are then masked. We can directly use the following repo, which uses a spaCy model to perform this action.

https://github.com/microsoft/presidio
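
To show what masking means in practice, here is a deliberately simplified sketch. It does not use NER or Presidio; it swaps in two regular expressions for emails and phone-like numbers, which is far less accurate than a model-based detector. All names and patterns are assumptions.

```python
import re

# Simplified patterns for illustration; real PII detection needs NER or a dedicated library.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace detected emails and phone-like numbers with placeholder tags."""
    text = EMAIL_PATTERN.sub("<EMAIL>", text)
    return PHONE_PATTERN.sub("<PHONE>", text)


masked = mask_pii("Mail jane.doe@example.com or call +1 555-123-4567")
```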

Pandas DataFrame as Sink

Better offline support for transformers

Is your feature request related to a problem? Please describe.
For deployment of Docker images in data centres, models need to be cached locally, either copied manually or by scripts, or auto-downloaded by code.
This should provide offline access to models. Since models are huge (in GBs), we need to avoid frequent uploads or downloads of models.

Describe the solution you'd like
Transformers can run models offline by using the environment variable TRANSFORMERS_OFFLINE=1. This is documented here - https://huggingface.co/transformers/installation.html#offline-mode
We can achieve auto-download on first use in code, with logic similar to that raised in PR - spacy download

Describe alternatives you've considered

Manually copy the model and have the code look for it. Needs a disk mounted by Docker.
Create a Docker image that includes the model. The image would be too big.
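
The offline-mode approach from the proposed solution could be wired up roughly as follows. The environment variable is the one named in the issue; the helper names and the check for a `config.json` file inside the model directory are assumptions made for illustration.

```python
import os
from pathlib import Path


def model_is_available_locally(model_dir: str) -> bool:
    """Treat a directory containing config.json as a locally cached model (assumption)."""
    return (Path(model_dir) / "config.json").is_file()


def configure_offline_mode(model_dir: str) -> bool:
    """Enable offline mode only when the model is already on disk.

    The variable must be set before the transformers library is imported.
    Returns True if offline mode was enabled.
    """
    if model_is_available_locally(model_dir):
        os.environ["TRANSFORMERS_OFFLINE"] = "1"
        return True
    return False


offline_enabled = configure_offline_mode("/path/that/does/not/exist")
```

When the function returns False, the caller can fall back to downloading the model once and caching it for later runs.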

Not able to run on HTTP informer

Describe the bug
I have created a mock server that is running locally,
but while running Obsei with the HTTP informer I am getting the response below:

(HTTPSConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /test (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)'))))
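
`WRONG_VERSION_NUMBER` typically appears when a client speaks TLS to a port that serves plain HTTP. The issue does not confirm that this is the cause here, so treat the following as a guess: if the local mock server is plain HTTP, the configured URL would need the `http` scheme. The helper name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(url: str, scheme: str) -> str:
    """Return `url` with its scheme replaced, leaving host, port and path untouched."""
    parts = urlsplit(url)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


# URL from the error message, switched to plain HTTP for a non-TLS local server
plain_url = with_scheme("https://localhost:8080/test", "http")
```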

[Sink] output analyzed data in Dataframe and CSV

Add ZenDesk Informer

Add Whatsapp as Informer

Not able to build Obsei in Windows machine using pip install [BUG]

Describe the bug

While installing Obsei using pip, it keeps failing while building uvloop-0.15.2.tar.gz

To Reproduce
pip install obsei on a Windows 7/10 machine
Expected behavior
Should build successfully
Stacktrace
ERROR: Command errored out with exit status 1:
command: 'c:\users\sanjay.bharkatiya\appdata\local\programs\python\python39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop\setup.py'"'"'; __file__='"'"'C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-pip-egg-info-993ebv1w'
cwd: C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop
Complete output (15 lines):
Error processing line 1 of c:\users\sanjay.bharkatiya\appdata\local\programs\python\python39\lib\site-packages\matplotlib-3.4.1-py3.9-nspkg.pth:

  Traceback (most recent call last):
    File "c:\users\sanjay.bharkatiya\appdata\local\programs\python\python39\lib\site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 562, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\sanjay.bharkatiya\AppData\Local\Temp\pip-install-fumhr70o\uvloop\setup.py", line 8, in <module>
    raise RuntimeError('uvloop does not support Windows at the moment')
RuntimeError: uvloop does not support Windows at the moment
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
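
Since uvloop refuses to build on Windows, code that wants it usually treats it as optional. The sketch below is one way to do that at runtime, not necessarily how Obsei resolved the issue; the function name is hypothetical. On the packaging side, a comparable effect is commonly achieved with an environment marker such as `uvloop; sys_platform != "win32"` in the requirements.

```python
import asyncio
import sys


def install_best_event_loop() -> str:
    """Use uvloop where it is importable, otherwise keep asyncio's default loop."""
    if sys.platform == "win32":
        # uvloop does not support Windows, so do not even try to import it
        return "asyncio"
    try:
        import uvloop  # optional dependency
    except ImportError:
        return "asyncio"
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return "uvloop"


loop_name = install_best_event_loop()
```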
