Melinda Malone's Projects
The Amazon Vine Analysis repo features a Big Data workflow using Google Colab, PySpark, AWS RDS, and AWS Simple Storage Service (S3) in a PostgreSQL environment to analyze Amazon customer reviews. The analysis focuses on the Pet Products category to determine whether positivity bias exists in reviews from Amazon Vine members, who receive these products and are required to publish a review.
This repository contains a Machine Learning project that analyzes the Austin Animal Center's data.
This repo contains an interactive web data visualization created using Plotly.js, a JavaScript data visualization library, along with HTML and basic JavaScript. D3.json() was used to fetch external data, such as CSV files and web APIs, and the data was parsed in JSON format. Functional programming, JavaScript's Math library, and event handlers were used to manipulate the data and add interactivity to the visualization. The index.html file was deployed to GitHub Pages so users can access the interactive data visualization with a single web link.
The purpose of the Bike-Sharing Analysis is to utilize Tableau to create data visualizations and present the data in a series of worksheets, dashboards, and stories to determine whether a bike-sharing service could be launched in Des Moines, Iowa, based on New York City bike-sharing data from August 2019.
This repo features Supervised Machine Learning and its use in data analytics by analyzing credit card risk. Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine algorithms were used, in addition to ensemble and resampling techniques.
The Election Analysis repo uses Python with Microsoft Visual Studio Code to confirm the winner of the election and to determine which county had the highest voter turnout. Vote counts and percentages were calculated for each candidate and each county using lists, dictionaries, for loops, and conditional statements with membership and logical operators.
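The vote-tallying approach described above can be sketched in plain Python. The candidate names and ballots below are illustrative, not the repo's actual dataset.

```python
# Hypothetical sketch: lists, dictionaries, a for loop, and conditional
# statements with membership operators, as in the Election Analysis.
ballots = [
    ("Jefferson", "Arapahoe"),
    ("Adams", "Denver"),
    ("Jefferson", "Denver"),
    ("Adams", "Arapahoe"),
    ("Jefferson", "Denver"),
]

candidate_votes = {}
county_votes = {}
for candidate, county in ballots:
    # The membership operator checks whether the key already exists.
    if candidate not in candidate_votes:
        candidate_votes[candidate] = 0
    candidate_votes[candidate] += 1
    if county not in county_votes:
        county_votes[county] = 0
    county_votes[county] += 1

total_votes = len(ballots)
for candidate, votes in candidate_votes.items():
    percentage = votes / total_votes * 100
    print(f"{candidate}: {percentage:.1f}% ({votes} votes)")

# The county with the highest turnout.
largest_county = max(county_votes, key=county_votes.get)
print(f"Largest county turnout: {largest_county}")
```

The same pattern scales to a CSV of real ballots read row by row.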
This repo contains an analysis of a theater fundraising campaign to determine ideal parameters to achieve optimal results. The dataset was analyzed in Microsoft Excel using Pivot Tables, Pivot Charts, COUNTIFS formulas, and line charts. This was the first assignment for the Data Analysis Boot Camp where Git Bash and the SSH key were set up to push/pull to and from GitHub.
This repo contains an analysis of vehicle data using the R programming language, a language popular in data science and academia for its strengths in statistical modeling and hypothesis testing.
This is a special repo that features the README on my public GitHub profile.
The Movies Extract-Transform-Load (ETL) Analysis repo contains movie data extracted from Wikipedia and Kaggle in CSV and JSON file formats. The datasets were transformed by cleaning and merging them, and the cleaned data was loaded into a movie_data SQL database. Regular expressions, which match strings of characters defined by search patterns, played a critical role in cleaning the box office, budget, release date, and running time data. Lambda functions were used in the transform phase as "anonymous functions."
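The regex cleaning step can be sketched as follows. The two patterns and the `parse_dollars` helper are hypothetical illustrations of the technique, covering two common box office formats.

```python
import re

# Illustrative patterns for two common box office formats:
# "$123.4 million" and "$1,234,567". ("milli?on" also tolerates the
# common "millon" typo.)
form_one = r"\$\s*\d+\.?\d*\s*milli?on"
form_two = r"\$\s*\d{1,3}(?:,\d{3})+"

def parse_dollars(raw):
    """Convert a raw box office string to a float, or None if unrecognized."""
    if re.match(form_one, raw, flags=re.IGNORECASE):
        # Strip "$", whitespace, and "million", then scale up.
        value = float(re.sub(r"[\$\s]|milli?on", "", raw, flags=re.IGNORECASE))
        return value * 1_000_000
    if re.match(form_two, raw):
        # Strip "$" and thousands separators.
        return float(re.sub(r"[\$,\s]", "", raw))
    return None

print(parse_dollars("$123.4 million"))  # 123400000.0
print(parse_dollars("$1,234,567"))      # 1234567.0
```

In a Pandas pipeline, a function like this is typically applied to the raw column with `str.extract` or `apply`.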
The Pewlett Hackard Analysis repo uses Structured Query Language (SQL) in a Postgres relational database via pgAdmin. An entity-relationship diagram (ERD) was initially designed to organize employee data from six separate CSV files containing employee, department, employees-by-department, managers-by-department, salary, and title information. The datasets were imported into pgAdmin, and queries using JOIN, WHERE, ORDER BY, GROUP BY, ON, and INTO clauses, the COUNT function, and table aliases were used to determine how many employees are eligible for retirement and what mentorship programs need to be put in place before retirement-eligible employees retire.
This is my practice repo where I edited and reformatted the README as I maintained the PostgreSQL database and Tableau visualization for the Austin AniML Rescue Machine Learning group data project as part of the Data Analysis and Visualization Boot Camp at Texas McCombs School of Business. During the project, I acted as the database administrator and Tableau developer. The technologies I used include PostgreSQL, pgAdmin, Python, Pandas, SQLAlchemy, Quick DBD, and Tableau.
The PyBer Analysis repo contains an analysis of ridesharing and city data using Python, NumPy, Matplotlib, and SciPy, presented through line charts, bar charts, scatter plots, bubble charts, pie charts, and box-and-whisker plots. Using Pandas DataFrames and the groupby, pivot, and resample functions, the data was analyzed to determine total rides, total drivers, total fares, and average fare per ride and per driver by rural, suburban, and urban city type.
The School District Analysis uses Python, Anaconda, Jupyter Notebook, the Pandas library (specifically DataFrames), and NumPy to analyze school data for fifteen high schools across four grades by merging the data into several Pandas DataFrames. Using the Pandas loc method and the groupby function with bins, average math, reading, and overall scores and passing rates were identified by student budget, school size, and school type.
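The binning-plus-groupby step can be sketched like this. The school names, sizes, and scores below are illustrative sample data, not the actual dataset.

```python
import pandas as pd

# Hypothetical sample of per-school summary data.
schools = pd.DataFrame({
    "school": ["Huang", "Figueroa", "Pena", "Holden"],
    "size": [2917, 2949, 962, 427],
    "avg_math_score": [76.6, 76.7, 83.8, 83.8],
})

# Cut school sizes into bands with pd.cut, then aggregate with groupby,
# mirroring the by-school-size breakdown in the analysis.
bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
schools["size_band"] = pd.cut(schools["size"], bins=bins, labels=labels)

by_size = schools.groupby("size_band", observed=True)["avg_math_score"].mean()
print(by_size)
```

The same pattern, with spending-per-student bins instead of size bins, yields the by-budget breakdown.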
The purpose of the All Stocks Analysis Refactored repo is to write and execute code in Visual Basic for Applications (VBA), refactor code in VBA, and learn the benefits of refactoring code in VBA.
This repo is for the Surfs Up Analysis where weather data is analyzed for the Hawaiian island of Oahu to determine the potential success of an ice cream surf shop using SQLite, SQLAlchemy, and Flask in a Python environment in Jupyter Notebook and VSCode editors.
This repo contains the UFOs Analysis, employing standard JavaScript and JavaScript ES6+ (aka ES2015) to create, populate, and dynamically filter an HTML table. A dynamic webpage was created and customized using JavaScript, HTML, CSS, and Bootstrap components to display UFO sighting data and allow users to filter sightings by date, city, state, country, and shape.
The World Weather Analysis repo utilizes Python and Jupyter Notebook in conjunction with decision and repetition statements, data structures, Pandas, Matplotlib, NumPy, CitiPy, and SciPy statistics to retrieve and use data from the OpenWeatherMap and Google Maps APIs. The APIs are used to make GET requests to a server and to retrieve and store values from a JSON array. Try and except blocks resolve errors, scatter plots are created and formatted with Matplotlib, and linear regression lines are added to the scatter plots, all while determining favorable vacation destinations for customers based on weather conditions.
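The retrieve-and-store step with try/except handling can be sketched as follows. To keep the example self-contained (no network call), the payload is a hardcoded sample that mimics the shape of an OpenWeatherMap response; the field names shown are illustrative.

```python
import json

# Hypothetical sample payload shaped like an OpenWeatherMap response.
response_text = """
{"name": "Honolulu",
 "main": {"temp": 80.6, "humidity": 70},
 "clouds": {"all": 20},
 "wind": {"speed": 8.1}}
"""

city_weather = json.loads(response_text)
try:
    # Retrieve and store the values the analysis needs from the JSON.
    record = {
        "City": city_weather["name"],
        "Temp": city_weather["main"]["temp"],
        "Humidity": city_weather["main"]["humidity"],
        "Cloudiness": city_weather["clouds"]["all"],
        "Wind Speed": city_weather["wind"]["speed"],
    }
    print(record)
except KeyError:
    # A missing field (e.g. city not found) is skipped rather than crashing,
    # so the retrieval loop can continue with the next city.
    print("City not found. Skipping...")
```

In the real workflow, `response_text` would come from a GET request to the API, and each `record` would be appended to a list that becomes a DataFrame.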