The main objective of this personal data science portfolio is to demonstrate my skills in solving business challenges using Data Science knowledge and tools.
I am a Data Scientist and Data Analyst, currently working as a freelancer. My main objective is to build data products powered by Machine Learning that increase company revenue and reduce costs.
I completed my undergraduate degree in Biological Sciences at Universidade Santa Úrsula. I joined the ornithology sector of the National Museum/UFRJ as an intern (scientific initiation), where I developed my monograph on taxonomy. While still at this institution, in the postgraduate program in Zoology, I completed a Master's degree in taxonomy and a Doctorate in Molecular Systematics and Taxonomy. I also held postdoctoral positions at the University of São Paulo and at the Federal University of Rio de Janeiro.
During this period I developed my skills in data analysis, machine learning and bioinformatics, gaining extensive experience with tools such as Python and R. I also worked in several research groups and supervised master's and doctoral students.
As a professional, I have always pushed myself to learn the skills needed to become an independent Data Scientist, able to build an end-to-end project, from data collection to ML model creation and deployment.
At this point in my career, I feel confident in my expertise and in my ability to build relevant solutions that improve company results and set me apart as a professional.
Data Collection and Storage: SQL.
Data Processing and Analysis: R and Python.
Development Tools: Git, Heroku and Linux.
Machine Learning: Classification, Regression, Clustering, Time Series, Reinforcement Learning and Deep Learning (image classification).
This project uses YOLOv8, Vision Transformer (ViT), and traditional methods to build a classifier and object detection system for Brazilian birds through image analysis. It uses photos sourced from the WikiAves website (https://www.wikiaves.com.br/), a rich dataset covering Brazil's avian biodiversity from which the models can learn and generalize patterns. By combining state-of-the-art deep learning techniques with traditional methods, the project aims to accurately identify and localize Brazilian birds in images. Ultimately, it aims to contribute to conservation efforts and to a better understanding of Brazil's diverse bird species.
App for iOS/Android that encourages bird watching by giving access to Xeno-canto recordings for playback. The project is still at a very early stage.
Building a Machine Learning model to detect cardiovascular disease in its early stages and boost the diagnostic precision of health professionals.
In this project, I developed a Machine Learning model able to detect the disease in its early stages with 65% accuracy on data from 70k patients.
The performance of this model would increase revenue by US$81.2 million according to the company's business model described in the problem definition.
Machine Learning is taking fraud detection to the next level. Companies are reducing their costs by detecting fraudulent transactions, while companies that provide these types of services are increasing their income. In this project, I built a Machine Learning classifier that labels fraudulent transactions with 99.63% accuracy. The performance of this model would bring revenue of US$877.5 million according to the company's business model described in the problem definition.
Insurance All is a company that provides vehicle insurance to its customers and the product team is analyzing the possibility of offering policyholders a new product: health insurance.
As with vehicle insurance, customers of this new health insurance plan need to pay an amount annually to Insurance All to obtain an amount insured by the company, intended for the costs of a possible accident or damage to the vehicle.
Last year, Insurance All surveyed about 380,000 customers about their interest in a new health insurance product. Each customer indicated whether or not they were interested in purchasing health insurance, and these responses were saved in a database along with other customer attributes.
The product team selected 127 thousand new customers who did not respond to the survey to participate in a campaign, in which they will receive the offer of the new health insurance product. The offer will be made by the sales team through telephone calls.
However, the sales team only has the capacity to make 20 thousand calls within the campaign period.
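With a fixed call capacity, the problem becomes one of ranking: order customers by their estimated likelihood of interest and call only the top of the list. The sketch below illustrates that idea with the standard library only. The `Customer` type, the `score` field, and the sample values are hypothetical and are not taken from the project's data or model.

```python
from typing import List, NamedTuple


class Customer(NamedTuple):
    customer_id: int
    score: float  # hypothetical model output: estimated probability of interest


def select_call_list(customers: List[Customer], capacity: int) -> List[Customer]:
    """Return the `capacity` customers with the highest scores."""
    ranked = sorted(customers, key=lambda c: c.score, reverse=True)
    return ranked[:capacity]


def captured_share(customers: List[Customer], call_list: List[Customer]) -> float:
    """Fraction of the total expected interest covered by the call list."""
    total = sum(c.score for c in customers)
    return sum(c.score for c in call_list) / total if total else 0.0


# Toy data: five customers and a capacity of two calls.
customers = [
    Customer(1, 0.10),
    Customer(2, 0.80),
    Customer(3, 0.35),
    Customer(4, 0.60),
    Customer(5, 0.05),
]
call_list = select_call_list(customers, capacity=2)
share = captured_share(customers, call_list)
```

In the scenario described above, `customers` would hold the 127 thousand campaign customers and `capacity` would be 20,000. Comparing `captured_share` against the share expected from a random list of the same size is one way to judge whether the ranking is worth using.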
The business problem is to select customers for a loyalty program called Insiders. A UK-based online retail store has captured the sales data for different products for the period of one year (Nov 2016 to Dec 2017). The organization sells gifts primarily through its online platform. Customers who make a purchase consume the products directly themselves. There are also small businesses that buy in bulk and resell to other customers through the retail outlet channel.
House Rocket is a digital platform whose business model is the purchase and sale of properties using technology. House Rocket's CEO would like to maximize the company's revenue by finding good business opportunities. Its main strategy is to buy good houses in great locations at low prices and then resell them later at higher prices. The greater the difference between buying and selling, the greater the company's profit and therefore the greater its revenue. However, houses have many attributes that make them more or less attractive to buyers and sellers, and the location and time of year can also influence prices. Therefore, the company's CEO asked for a business scenario simulation and answers to the following questions:
Which houses should the company buy and at what price? Once the house is in the company's possession, what is the best time to sell it and what would be the sale price? Should the company renovate the house to increase the sale price? Which changes would be suggested?
An end-to-end Data Science project that uses regression adapted for time series as the solution. Four machine learning models were created to forecast sales. Predictions can be accessed by users through a bot in the Telegram smartphone app. This repository contains the solution to a Kaggle competition problem: https://www.kaggle.com/c/rossmann-store-sales. This project is part of the "Data Science Community" (Comunidade DS), a study environment to promote, learn, discuss and execute Data Science projects. For more information, please visit https://sejaumdatascientist.com/ (in Portuguese). The goal of this README is to show the context of the problem, the steps taken to solve it, the main insights and the overall performance.
How to automatically optimize the hyperparameters of a neural network using Optuna. This is end-to-end code in which I select a problem, design a neural network in PyTorch, and then find the optimal number of layers, dropout, learning rate, and other parameters using Optuna. The code is based on the approach by Abhishek Thakur. The dataset used was "Mechanisms of Action (MoA) Prediction".
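The overall shape of such a search can be sketched without any dependencies: define a search space, define an objective that returns a validation loss, and keep the best trial. The sketch below uses plain random search as a stand-in for Optuna's samplers, so it is not the project's code. The parameter ranges and the `validation_loss` placeholder are assumptions made for illustration. In a real run the objective would train the PyTorch network and return its validation loss.

```python
import math
import random

# Hypothetical search space mirroring the kinds of parameters named above.
SEARCH_SPACE = {
    "num_layers": lambda rng: rng.randint(1, 7),
    "dropout": lambda rng: rng.uniform(0.1, 0.7),
    # Sample the learning rate on a log scale between 1e-6 and 1e-3.
    "learning_rate": lambda rng: 10 ** rng.uniform(-6, -3),
}


def sample_params(rng):
    """Draw one hyperparameter configuration from the search space."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def validation_loss(params):
    """Placeholder objective: a smooth bowl with an arbitrary minimum.

    A real objective would build the network from `params`, train it,
    and return the loss measured on held-out data.
    """
    return (
        (params["num_layers"] - 3) ** 2
        + (params["dropout"] - 0.3) ** 2
        + (math.log10(params["learning_rate"]) + 4) ** 2
    )


def random_search(n_trials, seed=0):
    """Run `n_trials` random trials and return the best (params, loss)."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        loss = validation_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=50)
```

Optuna follows the same loop but replaces the random draws with samplers that learn from earlier trials, and it can stop unpromising trials early, which matters when each trial means training a network.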