
Sugata Ghosh, PhD




Data Science and Machine Learning Portfolio Website: https://sugatagh.github.io/dsml/

Experience

Ford Motor Company 2024-Present
Reliability Data Scientist

  • Department: Global Data Insight and Analytics

Indian Institute of Science Education and Research Kolkata 2018-2024
Research Fellow and Teaching Assistant

  • Research Focus: Stochastic ordering

  • Teaching Assistantship: Served as Teaching Assistant for the courses Statistics I, Probability I, and Analysis I. Conducted tutorial sessions, prepared question papers, and graded examinations.

Education

Indian Institute of Science Education and Research Kolkata 2018-2024
Doctor of Philosophy in Statistics

Indian Institute of Technology Kanpur 2015-2017
Master of Science in Statistics

University of Calcutta 2012-2015
Bachelor of Science in Statistics

Skills

Languages: Python, SQL, R, MATLAB

Tools: LaTeX, Jupyter Notebook

Statistical Software: Minitab

Publications and Presentations

Refereed Journal Publications

Preprints

Academic Magazine Articles

  • Banerjee, P., Ghosh, S. (2016) A brief review on missing data. Prakarsho.*

  • Ghosh, S. (2014) A generalization of the Kelly gambling system. Prakarsho.

  • Dutta, T., Ghosh, S. (2014) An attempt to generate random numbers. Prakarsho.

Presentations

  • Departure-based Asymptotic Stochastic Order for Random Processes. International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022, IISER Kolkata.

  • On Some Inconsistent Multivariate Distributions. Open House'16, IIT Kanpur.

*Prakarsho: Departmental magazine published by the Department of Statistics, St. Xavier's College, Kolkata.

Scholastic Achievements

Scholarship and Research Fellowship

  • Research Fellowship from University Grants Commission, MHRD, Government of India.

  • National Scholarship from Department of Higher Education, MHRD, Government of India.

Test Performances

  • AIR-94 in Mathematical Science paper in CSIR-UGC NET-JRF (Dec 2016).

  • AIR-31 in Mathematical Statistics paper in IIT-JAM (2015).

Seminars, Workshops, and Summer/Winter Schools

Winter School on Deep Learning: From Perceptrons to Diffusion Models.
Organized by Electronics and Communication Sciences Unit, ISI Kolkata.

International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022.
Organized by Department of Mathematics and Statistics, IISER Kolkata.

Indo-French Center for Applied Mathematics (IFCAM) Winter School 2018.
On Stochastic Methods for Uncertainty Quantification and Sensitivity Analysis of Complex Models.

National Seminar on Application of Statistics and Statistical Computing.
Organized by Xaverian Statistical Association under Department of Statistics, St. Xavier's College, Kolkata.

Fellowship Programs

TMLC Fellowship Program 2022-2023
Conducted by The Machine Learning Company.
Contributed to the Conversational AI DeepPavlov project.

Data Science and Machine Learning Projects

Author Identification with Natural Language Processing

E-commerce Text Classification

  • Classified products into four given categories based on their descriptions available on an e-commerce platform.
  • Employed a TF-IDF vectorizer and a Word2Vec embedder with a number of classifiers. The hyperparameter-tuned model with the highest validation accuracy (TF-IDF + Linear SVM) obtained a test accuracy of $0.949$.
  • GitHub repository: https://github.com/sugatagh/E-commerce-Text-Classification

Anomaly Detection in Credit Card Transactions

Higgs Boson Event Detection
Conducted by The Machine Learning Company.

  • Predicted whether or not an event produced in a particle accelerator indicates the discovery of a new particle.
  • Trained a deep neural network, using GridSearchCV for hyperparameter optimization, and achieved a test AMS (approximate median significance) score of $1.200$ and a test accuracy of $0.824$.
  • GitHub repository: https://github.com/sugatagh/Higgs-Boson-Event-Detection
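
For readers unfamiliar with the AMS metric mentioned above, here is a minimal sketch of how it is commonly computed. The formula and the regularization constant `b_reg = 10` follow the definition used in the Kaggle Higgs Boson Machine Learning Challenge; that the project used exactly this variant is an assumption, and the function name `ams` is illustrative.

```python
import math


def ams(s: float, b: float, b_reg: float = 10.0) -> float:
    """Approximate median significance.

    s: sum of weights of true signal events selected as signal
    b: sum of weights of background events selected as signal
    b_reg: regularization term (10 in the Kaggle challenge; assumed here)
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    radicand = 2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s)
    return math.sqrt(radicand)


if __name__ == "__main__":
    # Hypothetical weighted counts, for illustration only.
    print(round(ams(10.0, 90.0), 3))
```

A higher AMS means the selected region contains more signal relative to the expected background fluctuation, which is why it is reported alongside plain accuracy.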

Patient Survival Prediction
Conducted by The Machine Learning Company.

Electron Energy Flux Prediction
Conducted by The Machine Learning Company.

Site Energy Usage Intensity Prediction
Conducted by The Machine Learning Company.

Road Traffic Accident Severity Classification
Conducted by The Machine Learning Company.

Natural Language Processing with Disaster Tweets
Jointly with Shyambhu Mukherjee.

Credit Card Fraud Detection
Jointly with Shyambhu Mukherjee.

  • Classified credit card transactions as authentic or fraudulent, based on relevant data such as time and amount.
  • Obtained a test $F_2$-score of $0.880$ with a random forest classifier after oversampling the minority class (fraudulent transactions) in the training set via the synthetic minority over-sampling technique (SMOTE).
  • GitHub repository: https://github.com/sugatagh/Credit-Card-Fraud-Detection
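
The $F_2$-score reported above weights recall more heavily than precision, which suits fraud detection, where a missed fraud is usually costlier than a false alarm. The sketch below computes the general $F_\beta$-score from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the project.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives, and false negatives.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    With beta = 2, recall counts four times as much as precision.
    """
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1.0 + b2) * tp / denom


if __name__ == "__main__":
    # Hypothetical counts: recall (0.8) is higher than precision (0.5),
    # so the F2-score exceeds the F1-score.
    print(f_beta(tp=8, fp=8, fn=2, beta=2.0))
    print(f_beta(tp=8, fp=8, fn=2, beta=1.0))
```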

Online Internships

Machine Learning Internship Program 2022
Conducted by Uniconverge Technologies and The IoT Academy.

  • Detected duplication of points of interest in a dataset of over $1.5$ million place entries.
  • Trained several algorithms and obtained a test accuracy of $0.770$ with a hyperparameter-tuned XGBoost classifier.
  • GitHub repository: https://github.com/sugatagh/Foursquare-Location-Matching

Academic Course Projects

A Time Series Analysis of Monthly Airline Revenue Passenger Mile (RPM)
Supervisor: Dr. Amit Mitra (IIT Kanpur).

  • Analyzed RPM data for $1996$–$2014$ and built a predictive model for forecasting future revenue values.

A Study on Performances in the Olympic Games
Supervisor: Dr. Sharmishtha Mitra (IIT Kanpur).

  • Built a regression model to predict the overall performance of the countries in the Summer Olympic Games.

Students' Future Plans and the Reasons Behind
Supervisor: Dr. Shalabh (IIT Kanpur).

  • Examined the variation in career choices of the students at IIT Kanpur and how the reasons for such choices vary.

A Statistical Analysis of the Variation in Preference to Movie Genres among Spectators
Supervisors: Dr. Durba Bhattacharya and Prof. Soumya Banerjee (St. Xavier's College, Kolkata).

  • Studied how hobbies influence an individual's preferred movie genre, and checked for bias due to gender and age group.
  • Analyzed how preferences for one factor in a movie's success over another differ across age groups and genders.

Certifications

Generative AI for Everyone 2023
Authorized by DeepLearning.AI, offered by Coursera.
https://www.coursera.org/account/accomplishments/certificate/EV8T2EF4VUKN

Machine Learning Specialization 2022
Authorized by Stanford University, offered by Coursera.
https://www.coursera.org/account/accomplishments/specialization/certificate/U2MZV5HWRG5L

Data Analyst in SQL Track 2022
Offered by DataCamp.
https://www.datacamp.com/statement-of-accomplishment/track/689ba9d0ab9984f55aac593e6caacd1f9d197194

IBM Data Science Specialization 2022
Authorized by IBM, offered by Coursera.
https://www.coursera.org/account/accomplishments/specialization/certificate/9V355HMT2FB6

Applied Data Science with Python 2021
Offered by Electronics and ICT Academy, IIT Roorkee.
https://eict.iitr.ac.in/wp-content/uploads/L214613B669.jpg

Academic Courses

Statistics
Regression Analysis, Statistical Inference, Time-Series Analysis, Statistical Simulation and Data Analysis, Probabilistic Theory of Pattern Recognition, Multivariate Analysis, Analysis of Variance, Robust Statistical Methods, Nonparametric Inference, Non-linear Regression, Large Sample Theory, Sampling Theory, Matrix Theory and Linear Estimation, Design of Experiments, Statistical Quality Control, Distributions Theory in Statistics, Population Statistics, Economic Statistics.

Mathematics
Real Analysis, Linear Algebra, Multivariable Calculus, Numerical Analysis, Complex Variables, Ergodic Theory, Introduction to Graph Theory, Measure Theory.

Probability and Applications
Probability Theory, Applied Stochastic Process.

Others
Computer Programming and Data Structures, Research Methodology.

Sugata Ghosh's Projects

anomaly-detection-in-credit-card-transactions

The objective of the project is to detect anomalies in credit card transactions. More precisely, given the data on time, amount and 28 transformed features, our goal is to fit a probability distribution based on authentic transactions, and then use it to correctly identify a new transaction as authentic or fraudulent.
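
The density-based idea described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual model: it assumes independent Gaussian features, and the helper names, the toy data, and the threshold rule are all hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(rows):
    """Estimate a per-feature mean and variance from authentic transactions."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    # Floor the variance to avoid division by zero for constant features.
    variances = [max(pvariance(c, m), 1e-12) for c, m in zip(cols, means)]
    return means, variances


def log_density(x, means, variances):
    """Log-density of x under independent Gaussians (an assumed model)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        for xi, m, v in zip(x, means, variances)
    )


def is_anomaly(x, means, variances, threshold):
    """Flag a transaction whose log-density falls below the threshold."""
    return log_density(x, means, variances) < threshold


if __name__ == "__main__":
    # Toy two-feature data standing in for authentic transactions.
    authentic = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0], [1.1, 10.5], [0.9, 9.5]]
    means, variances = fit_gaussian(authentic)
    # Hypothetical rule: the lowest log-density seen among authentic rows.
    threshold = min(log_density(r, means, variances) for r in authentic)
    print(is_anomaly([5.0, 30.0], means, variances, threshold))
```

In practice the threshold would be tuned on a validation set that contains some labeled fraudulent transactions, rather than taken from the training data as in this toy example.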

credit-card-fraud-detection

The objective of the project is to classify credit card transactions as authentic or fraudulent, based on data regarding time, amount and a set of PCA-transformed features for transactions. We explore the data extensively and employ different techniques to build classification models, which are compared through various evaluation metrics.

dsml

Data Science and Machine Learning Portfolio Website

e-commerce-text-classification

Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.

foursquare-location-matching

Using the provided dataset of over one-and-a-half million place entries, heavily altered to include noise, duplications, extraneous, or incorrect information, the objective is to produce an algorithm that predicts which place entries represent the same point of interest (POI).

higgs-boson-event-detection

The goal of the project is to classify an event produced in the particle accelerator as background or signal. A background event is explained by the existing theories and previous observations. A signal event, however, indicates a process that cannot be described by previous observations and leads to the potential discovery of a new particle.

implementing-logistic-regression-from-scratch

While it is convenient to use advanced libraries for day-to-day modeling, doing so gives little insight into what really happens underneath when we run the code. In this work, we implement a logistic regression model manually from scratch, without using any advanced library, to understand how it works.
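
To give a flavor of what such a from-scratch implementation involves, here is a minimal sketch using only the Python standard library. It trains by batch gradient descent on the mean log-loss; the function names, learning rate, epoch count, and toy data are assumptions for illustration and do not reproduce the repository's code.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, cutoff=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= cutoff)
        for xi in X
    ]


if __name__ == "__main__":
    # Toy one-feature data: small values are class 0, large values class 1.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    w, b = fit_logistic(X, y)
    print(predict(X, w, b))
```

The update rule follows from the gradient of the log-loss, which for each sample is the prediction error `p - y` multiplied by the feature vector.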

natural-language-processing-with-disaster-tweets

The objective of the project is to predict whether a particular tweet, of which the text (occasionally the keyword and the location as well) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and objectively compare these models by means of an appropriate evaluation metric.

patient-survival-prediction-using-deep-learning

Knowledge of a patient's medical records is crucial in treatment and indicates the patient's survival odds to a great extent. In this project, we predict whether a patient will survive or not, based on various relevant medical information.

road-traffic-accident-severity-classification

The aim of the project is to build prediction models to classify the severity of road traffic accidents (slight injury, serious injury or fatal injury) based on various relevant information regarding the involved vehicles, drivers, casualties and surrounding conditions.

site-energy-usage-intensity-prediction

The objective of the project is to predict the energy usage intensity of a building in a given year, based on building characteristics as well as weather data for the location of the building. The dataset contains about 100 thousand observations of building energy usage records collected over 7 years, in several states within the United States.

spacex-falcon-9-first-stage-landing-prediction

A primary reason behind SpaceX's stellar success is its relatively inexpensive rocket launches, founded on its ingenious reuse of the first stage of a rocket. Thus, to estimate the cost of a launch, it is critical to predict if the first stage will land. In this project, we attempt to predict first stage landing of SpaceX's Falcon 9 rocket.

spooky-author-identification

The objective of the project is to train an LSTM model with the help of GloVe embeddings, to predict probabilities that a given text is written by particular authors. Furthermore, these probabilities are used to predict the author of the text in a multiclass classification setup.
