Leondra R. Gonzalez's Projects
Capstone project #2 for the Harvard University Professional Certificate in Data Science
A Python XGBoost model built on Amazon SageMaker, using EC2 instances and S3 buckets to prepare, partition, train, tune, and evaluate the model. The project predicts which bank customers will sign up for a financial product.
An exploratory analysis of the Kaggle bikeshare dataset applying linear regression models, which turn out to be suboptimal for predicting the number of bikes rented.
Leveraging random forest regression and XGBoost algorithms with cross-validation and grid search to tune the best-performing model on the Boston Housing dataset. Analyzed and visualized the most statistically significant features for both models. Achieved an RMSE of roughly $2K.
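The tuning workflow described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: synthetic data stands in for the Boston Housing dataset (its loader has been removed from recent scikit-learn releases), and the hyperparameter grid is made up.

```python
# Hypothetical sketch: cross-validated grid search over a random forest
# regressor. Synthetic data stands in for Boston Housing; grid values
# are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                                  # stand-in features
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=300)   # stand-in target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_tr, y_tr)

# evaluate the tuned model on held-out data
rmse = np.sqrt(mean_squared_error(y_te, grid.best_estimator_.predict(X_te)))
print(grid.best_params_, round(rmse, 3))
```

The same `GridSearchCV` wrapper applies to an XGBoost regressor by swapping in a different estimator and parameter grid.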
My first attempt at implementing a neural network using the Boston housing data set from the MASS library.
Candy Crush Level Difficulty Analysis
This is a descriptive and exploratory data analysis project from DataCamp which explores real data on every Chipotle location to identify franchising opportunities. The goal is to scout out the next Chipotle location using interactive maps (i.e., Leaflet) and external data to compare proposed locations on several important factors, such as proximity to current Chipotle locations, the distribution of the state's population, and the distance from interstates and tourist attractions.
Utilizing tools such as Spark, Python (PySpark), SQL, and Databricks, performed logistic regression on customer data to predict customers at a higher risk of churning, then applied the model to an unseen "new customers" dataset.
Data Visualizations
A cluster analysis leveraging the k-means algorithm to determine which degrees are likely to yield which levels of income, based on historical data.
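The clustering step might look roughly like this; the salary figures below are invented for illustration and are not the project's data:

```python
# Illustrative sketch: k-means on (starting, mid-career) median salary
# features per degree. All numbers are made up.
import numpy as np
from sklearn.cluster import KMeans

salaries = np.array([
    [46_000, 77_000],   # hypothetical humanities-style degree
    [44_000, 71_000],
    [61_000, 105_000],  # hypothetical engineering-style degree
    [64_000, 110_000],
    [38_000, 60_000],   # hypothetical lower-earning degree
    [40_000, 63_000],
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(salaries)
print(km.labels_)            # cluster assignment per degree
print(km.cluster_centers_)   # the income level each cluster represents
```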
Analysis of Disney's top-grossing films (adjusted for inflation) in Python, using regression to estimate the effect of film genre on a film's success. The project includes fitting a regression on the data, as well as bootstrap regression to determine confidence intervals for the intercept and coefficients.
Data Science & Machine Learning Data Capstone based on the Moneyball dataset
Used NLP techniques (tokenization, stemming, TF-IDF vectorization) and clustering algorithms (k-means and hierarchical clustering) to mine the "similarities" between films based on their plots as provided by IMDb and Wikipedia. The dataset contains the titles of the top 100 movies on IMDb.
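A minimal sketch of that pipeline (TF-IDF vectorization followed by k-means), using toy plot summaries in place of the IMDb/Wikipedia text:

```python
# Minimal sketch of the TF-IDF + k-means pipeline on toy plot summaries;
# the real project clusters the top 100 IMDb film plots.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

plots = [
    "a detective hunts a serial killer in the city",
    "a police detective investigates a string of murders",
    "two robots explore space and discover a distant planet",
    "an astronaut crew travels through space to a new planet",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(plots)          # sparse TF-IDF document-term matrix
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # plots with overlapping vocabulary share a cluster
```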
This is my first attempt at a KNN model, in which I classify the purchase of caravan insurance in the Caravan dataset (ISLR package).
Video games are big business: the global gaming market is projected to be worth more than $300 billion by 2027 according to Mordor Intelligence. With so much money at stake, the major game publishers are increasingly incentivized to create the next big hit. But are games getting better, or has the golden age of video games already passed? In this project, I explore the top 400 best-selling video games created between 1977 and 2020. This is achieved by comparing gaming sales data with critic and user review data. In doing so, we can discover whether video games have improved as the gaming market has grown. Each table is limited to 400 rows for this experiment, but the complete dataset with over 13,000 games can be found on Kaggle.
Analysis of the co-occurrence network of characters in the Game of Thrones books. Here, two characters are considered to co-occur if their names appear within 15 words of one another in the books. This project utilized graph analysis and modeling frameworks such as Google's PageRank algorithm.
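The ranking step can be illustrated with a tiny hand-rolled PageRank (power iteration) on a made-up co-occurrence graph; the real project ranks a much larger character network, and the names and edges here are purely illustrative:

```python
# Hand-rolled PageRank via power iteration on a made-up co-occurrence
# graph. Co-occurrence is symmetric, so each edge is added in both
# directions. Names and edges are illustrative only.
edges = [("Ned", "Robert"), ("Ned", "Jon"),
         ("Robert", "Cersei"), ("Cersei", "Jaime")]

nodes = sorted({n for e in edges for n in e})
neighbors = {n: [] for n in nodes}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

d = 0.85                                   # damping factor
rank = {n: 1 / len(nodes) for n in nodes}
for _ in range(50):                        # power iteration
    new = {n: (1 - d) / len(nodes) for n in nodes}
    for src in nodes:
        for dst in neighbors[src]:
            new[dst] += d * rank[src] / len(neighbors[src])
    rank = new

print(sorted(rank.items(), key=lambda kv: -kv[1]))  # most central first
```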
Capstone Submission #1 for the Harvard University Professional Certificate in Data Science.
"What Your Heart Is Telling You" Logit Model
Predicting the number of crew members required to man a Hyundai cruise ship, based on information such as the number of cabins and passengers, using linear regression. Leveraged SQL and PySpark.
Used SQL in Jupyter Notebooks to analyze and explore international debt data.
My first attempt at building an SVM model, optimizing the cost and gamma parameters of the Gaussian kernel via grid search.
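That tuning procedure looks roughly like the sketch below, shown here with scikit-learn on synthetic toy data; the C and gamma grids are made up for illustration:

```python
# Hedged sketch: grid search over the cost (C) and gamma parameters of an
# RBF ("Gaussian") kernel SVM, on synthetic data. Grid values are made up.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```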
Association rule mining using the Apriori algorithm
Multi-touch attribution models, including Markov chains
Two A/B tests, measuring the difference in (1) average 1-day and (2) 7-day player retention between the control group (old player level) and the new version (new player level)
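A retention comparison like this is often summarized with a two-proportion z-test; the sketch below uses invented retention counts (not the project's data) and a hand-rolled pooled-standard-error statistic:

```python
# Illustrative two-proportion z-test for a retention A/B comparison.
# All counts below are invented, not the project's data.
from math import sqrt

def z_test(success_a, n_a, success_b, n_b):
    """z statistic for the difference of two proportions (pooled SE)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)          # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))       # pooled standard error
    return (p_a - p_b) / se

# hypothetical 1-day retention: control vs new version
z1 = z_test(success_a=20_100, n_a=44_700, success_b=20_000, n_b=45_500)
print(round(z1, 2))
```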
Given the large number of movies and series available on Netflix, it is a perfect opportunity to dive into the entertainment industry with an analysis of Netflix content durations. This analysis aims to understand trends in content duration on the Netflix platform from 2011 through 2020.
An analysis and prediction of taxi fares based on 2013 NYC data using decision trees and random forests.