Name: Sohom Ghosh
Type: User
Company: Fidelity Investments | Jadavpur University
Bio: Sr. Data Scientist | Financial Natural Language Processing Researcher | Deep Learning, Machine Learning & AI | Large Language Models
Twitter: sohom1ghosh
Location: Bangalore, India
Blog: https://sohomghosh.github.io/
Sohom Ghosh's Projects
Experiments with D3.js
My notebooks for the eBay / CSUMB Data Science for Search Course
A curated list of R tutorials for Data Science, NLP and Machine Learning
Datasets, papers and books on AI & Finance.
Deep Learning with TensorFlow by Packt
Deep Learning codes will be added here
Example Repo for the Udemy Course "Deployment of Machine Learning Models"
The earnings conference call dataset of S&P 500 companies
ECTSum Dataset and Codes
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
An IPython Notebook to try Elasticsearch-py
Numerals are a crucial part of financial documents. To understand the details of opinions in financial documents, we should not only analyze the text but also examine the numeric information in depth. Because of its informal writing style, social media data is more challenging to analyze than news and official documents. FinNum is a dataset for fine-grained numeral understanding in financial social media data, where the task is to identify the category of a numeral.
FinProLex provides 5,162 tokens from professional analysts' reports and financial social media posts, each with an expert-like score. The expert-like scores are calculated from pointwise mutual information (PMI).
NTUSD-Fin provides various scoring methods for its tokens, including frequency, CFIDF, chi-squared value, market sentiment score and word vector. Only tokens that appear at least ten times and show a significant difference between expected and observed frequency under a chi-squared test are retained in the dictionary. The predetermined significance level is 0.05. The market sentiment score is calculated by subtracting the bearish PMI from the bullish PMI. The constructed dictionary, NTUSD-Fin, contains 8,331 words, 112 hashtags and 115 emojis.
Numerals are a crucial part of narratives, especially in financial documents. We should not only analyze the text but also examine the numeric information in depth. Numeracy-600K is a dataset for testing the numeracy of machines.
Solution developed by team LIPI while participating in the FNS 2022 shared task.
A tool to detect whether numerals in financial texts are in-claim or out-of-claim
Code for FinCLASS: Modeling Financial Uncertainty with Multivariate Temporal Entropy-based Curriculums at UAI 2021
NLP progress in Fintech. A repository to track the progress in Natural Language Processing (NLP) related to the domain of Finance, including the datasets, papers, and current state-of-the-art results for the most popular tasks.
FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability
Free resources for learning data science
Getting Started with TensorFlow, published by Packt
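
The NTUSD-Fin description above defines the market sentiment score as the bullish PMI minus the bearish PMI. A minimal sketch of that arithmetic follows, under assumptions that are not taken from the repository: posts arrive as (tokens, label) pairs, PMI is computed from post-level co-occurrence counts with a base-2 logarithm, and tokens never seen with one of the two labels are skipped rather than smoothed. The function name market_sentiment_scores and the sample posts are hypothetical.

```python
import math
from collections import Counter


def market_sentiment_scores(posts):
    """Return {token: PMI(token, bullish) - PMI(token, bearish)}.

    posts: iterable of (tokens, label), label is "bullish" or "bearish".
    Assumption: counts are per post (a token counts once per post).
    """
    token_label = Counter()  # posts containing the token, per label
    token_total = Counter()  # posts containing the token, any label
    label_total = Counter()  # posts per label
    n_posts = 0

    for tokens, label in posts:
        n_posts += 1
        label_total[label] += 1
        for tok in set(tokens):
            token_total[tok] += 1
            token_label[(tok, label)] += 1

    def pmi(tok, label):
        joint = token_label[(tok, label)]
        if joint == 0:
            return None  # PMI is undefined for zero co-occurrence
        # log2( P(tok, label) / (P(tok) * P(label)) )
        return math.log2(joint * n_posts / (token_total[tok] * label_total[label]))

    scores = {}
    for tok in token_total:
        bull = pmi(tok, "bullish")
        bear = pmi(tok, "bearish")
        if bull is None or bear is None:
            continue  # assumption: skip tokens seen with only one label
        scores[tok] = bull - bear
    return scores


if __name__ == "__main__":
    sample = [
        (["moon", "buy"], "bullish"),
        (["buy", "dip"], "bullish"),
        (["crash", "buy"], "bearish"),
        (["crash", "moon"], "bearish"),
    ]
    for token, score in sorted(market_sentiment_scores(sample).items()):
        print(token, round(score, 3))
```

A positive score marks a token that leans bullish and a negative score one that leans bearish. The published dictionary additionally filters tokens by minimum frequency and a chi-squared test, which this sketch leaves out.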