Data-Career-Portfolio

Projects

[ETL] GCP Data Engineer Project

In this project, I designed and implemented an ETL data pipeline using Google Cloud Storage as the data lake, Google BigQuery as the data warehouse, and Google Cloud Composer to run Apache Airflow as the orchestrator. The whole system runs on Google Cloud Platform.
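The extract, transform, load order described above can be sketched in plain Python. This is a minimal illustration under assumptions, not the project's code: `sqlite3` stands in for BigQuery, an in-memory CSV string stands in for a file in Cloud Storage, and the table name, column names, and rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw file; in the project this would be an object in the data lake.
RAW_CSV = """title,price,currency
Book A,10.50,USD
Book B,,USD
Book C,7.25,USD
"""


def extract(raw_text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows without a price and cast price to float, before loading."""
    return [
        {"title": r["title"], "price": float(r["price"])}
        for r in rows
        if r["price"]
    ]


def load(rows, conn):
    """Write the cleaned rows into the warehouse stand-in."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (title TEXT, price REAL)")
    conn.executemany("INSERT INTO sales VALUES (:title, :price)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in ETL order and return (row count, total price)."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT COUNT(*), SUM(price) FROM sales").fetchone()
```

In an orchestrated setup, each of these functions would typically become its own task so that a failed step can be retried without rerunning the others.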

  • Technology used: Google Cloud Storage, Google BigQuery, Airflow, Looker Studio.

  • Architecture Diagram:

    Diagram

  • Dashboard : Audible Sale Dashboard

    Audible Dashboard page1

    Audible Dashboard page2


[ELT] Retail Data Engineer Project

In this project, I designed and implemented an ELT data pipeline using Google Cloud Storage as the data lake, Google BigQuery as the data warehouse, and Apache Airflow as the orchestrator. The whole system runs locally through the Astro CLI.
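In an ELT pipeline the raw data lands in the warehouse first and is transformed there with SQL, which is the role dbt plays in this stack. The sketch below illustrates that ordering only. It assumes `sqlite3` as a stand-in for BigQuery, and the table names `raw_orders` and `stg_orders`, the columns, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical raw rows, loaded as-is: everything is text, including a bad record.
RAW_ROWS = [
    ("1001", "2", "3.50"),
    ("1002", "1", ""),
    ("1003", "4", "1.25"),
]

# The kind of SELECT a dbt model might contain: casting and filtering
# happen inside the warehouse, after the raw load.
TRANSFORM_SQL = """
CREATE TABLE stg_orders AS
SELECT
    CAST(order_id AS INTEGER) AS order_id,
    CAST(quantity AS INTEGER) * CAST(unit_price AS REAL) AS total
FROM raw_orders
WHERE unit_price <> ''
"""


def load_raw(conn):
    """Load step: copy the raw rows into the warehouse without cleaning them."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, quantity TEXT, unit_price TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)


def transform_in_warehouse(conn):
    """Transform step: build the cleaned table with SQL and return its rows."""
    conn.execute(TRANSFORM_SQL)
    return conn.execute(
        "SELECT order_id, total FROM stg_orders ORDER BY order_id"
    ).fetchall()
```

Keeping the raw table untouched means the transformation can be changed and rerun later without extracting the data again.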

  • Technology used: Google Cloud Storage, Google BigQuery, Airflow, Looker Studio, DBT Core, Docker

  • Architecture Diagram:

    Diagram

  • Dashboard : Retail Dashboard

    Retail Dashboard


[Streaming] Weather Monitoring Stream Data Pipeline

Extraction is handled by Apache Kafka: weather data is pulled from the OpenWeatherMap API and published to Kafka topics. In the transform and load steps, a schema is derived from the API data, the Kafka topic is read as a streaming DataFrame, and the results are written to Cassandra for downstream use.
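The schema-derivation step can be illustrated without a running cluster. The sketch below is not the project's Spark code. It assumes a plain-Python helper, `infer_schema` (a hypothetical name), that walks one nested JSON record and maps each leaf field to a CQL-style type name. The sample payload is also hypothetical and only loosely shaped like a weather API response.

```python
import json


def infer_schema(record, prefix=""):
    """Map each leaf field of a nested dict to a CQL-style type name.

    Nested keys are flattened with an underscore, e.g. main.temp -> main_temp.
    """
    schema = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            schema.update(infer_schema(value, prefix=f"{name}_"))
        elif isinstance(value, bool):  # check bool first: bool is a subclass of int
            schema[name] = "boolean"
        elif isinstance(value, int):
            schema[name] = "bigint"
        elif isinstance(value, float):
            schema[name] = "double"
        else:
            # Strings, nulls, and lists are all treated as text in this sketch.
            schema[name] = "text"
    return schema


# Hypothetical message as it might arrive from a Kafka topic.
SAMPLE_MESSAGE = '{"name": "Bangkok", "dt": 1700000000, "main": {"temp": 30.5, "humidity": 70}}'

if __name__ == "__main__":
    for column, cql_type in infer_schema(json.loads(SAMPLE_MESSAGE)).items():
        print(column, cql_type)
```

A schema derived this way from a single record is only as good as that record, so a real pipeline would usually pin the schema explicitly once it is known.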

  • Technology used: Apache Kafka, Apache Spark, Cassandra, Docker

  • Architecture Diagram:

    Diagram

  • Cassandra :

cassandra


[ETL] HADOOP Data Pipeline

  • Architecture Diagram:

    Diagram

  • Next development step: replace Flume with Kafka.


[ML] Sentiment Analysis Machine Learning Model

  • This project was developed with Spark ML, using Amazon Sports and Outdoors product reviews as the dataset.
  • Each review has an overall score from 1.0 to 5.0. I dropped rows with an overall score of 3.0 because their sentiment is ambiguous; the remaining data was still more than enough for training and testing.
  • I built a pipeline to train the model. It has 5 stages, including tokenizing, stop-word removal, count vectorization, and logistic regression.
  • The model was evaluated with a Binary Classification Evaluator on 9003 rows of test data. It predicted 7776 rows correctly and 1227 rows incorrectly.
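The labeling rule and the reported counts can be expressed as a short worked example. This is a sketch in plain Python rather than the Spark ML code. It assumes that scores above 3.0 count as positive and scores below 3.0 as negative, which the description implies but does not state, and the function names are hypothetical.

```python
def label_from_score(overall):
    """Return 1 (positive), 0 (negative), or None for the dropped 3.0 rows.

    Assumption: scores above 3.0 are positive and scores below 3.0 are negative.
    """
    if overall == 3.0:
        return None
    return 1 if overall > 3.0 else 0


def accuracy(correct, incorrect):
    """Share of test rows that were predicted correctly."""
    return correct / (correct + incorrect)


if __name__ == "__main__":
    # Counts reported above: 7776 correct and 1227 incorrect out of 9003 rows.
    print(f"{accuracy(7776, 1227):.1%}")
```

With the reported counts this works out to roughly 86.4% accuracy on the test set.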

[MLE] Fast ML API

Deploys a machine learning model as an ML API using FastAPI.


ABC Theater Database

  • ER Diagram :

    erd


Cognizant Data Scientist Job Simulation

In this virtual internship as a Data Scientist, I:

  • Completed a job simulation focused on AI for Cognizant’s Data Science team.
  • Conducted exploratory data analysis using Python script and Python notebook for one of Cognizant’s technology-led clients, Gala Groceries.
  • Prepared a Python module that contains code to train a model and output the performance metrics for the Machine Learning engineering team.
  • Communicated findings and analysis in the form of a PowerPoint slide to present the results back to the business.

Simple LLM Chatbot

I developed an LLM chatbot using the LangChain framework with an OpenAI model as the base LLM. The chatbot answers questions about online courses by interpreting the context of a question and matching it against prepared prompts and answers.

  • Technology used: LangChain framework, OpenAI API, Faiss vector database, Streamlit
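Matching a question against prepared answers is a similarity search. The sketch below shows the idea under simplifying assumptions: word-count vectors and cosine similarity stand in for the OpenAI embeddings and the Faiss index, no LLM is called, and the question and answer pairs are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical prepared question/answer pairs.
FAQ = [
    ("How long is the course?", "The course runs for eight weeks."),
    ("Is there a certificate?", "Yes, a certificate is issued on completion."),
    ("What is the refund policy?", "Refunds are available within 14 days."),
]


def vectorize(text):
    """Turn text into a bag-of-words count vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(question):
    """Return the prepared answer whose question is most similar to the input."""
    query = vectorize(question)
    _, best_answer = max(FAQ, key=lambda pair: cosine(query, vectorize(pair[0])))
    return best_answer
```

Word counts only match exact terms, which is why a real retriever uses learned embeddings: they can match paraphrases that share no words with the prepared question.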
