
66 Days of Data

Days of Data part 2

Week 1: Revisit previous work in Python. Build Python reflexes.

Using the Python Data Science Handbook.


  • Day 1: Finished Tina's SQL for Tech and Data Science Interviews
  • Day 2: Spun up a GCP virtual machine and loaded it with Python, Visual Studio and some additional helpers (Jupyter, Black). First time coding in VS, liking it so far. Hopefully this VM setup will jumpstart some coding assignments until I get some official credentials to work with client data.
  • Day 3: Revisited Chapter 3 of the linked book above for pandas. Read about and looked at Pandas Profiling. May actually use it in the next week.
  • Day 4: Off topic - looked at a few different ML certs. My production skills are lacking, so I may need to change the content of this 66 days to better align with my weaknesses. Also rebuilt the VM on Linux and learned about Poetry (a dependency manager).
  • Day 5: Yesterday and today were short on time. At work I learned how to use pyenv and Poetry for version and dependency management. My goal is to remote into the GCP virtual machine (where pyenv is installed) and spin up two projects, each with a different version of Python and different versions of the desired libraries. Pretty neat stuff!
  • Day 6: Briefly learned about Docker, Kubernetes and Kubeflow for machine learning. I have a lot more to learn about containerized environments for ML deployment.
  • Day 7: Learned about classes and `self`. Now I've gotta learn to make ML classes 😭. Also forgot to push this, so it posted on day 8 😅 RIP contributor streak.
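Day 7's topic, classes and `self`, boils down to this: `self` is just the instance a method was called on. A minimal sketch (the class and its attributes are invented for illustration):

```python
class ModelConfig:
    """Tiny example class: `self` is the instance being operated on."""

    def __init__(self, name, n_estimators=100):
        # assigning to `self` stores the values on this particular instance
        self.name = name
        self.n_estimators = n_estimators

    def describe(self):
        # every method receives the instance as its first argument,
        # named `self` by convention
        return f"{self.name} with {self.n_estimators} estimators"


cfg = ModelConfig("random-forest", n_estimators=50)
print(cfg.describe())  # random-forest with 50 estimators
```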

Week 2: Looking for in-depth pandas magic to speed up analysis. I actually need to update the rest of the challenge since my objectives have changed a bit.

Welcome to advanced pandas

Advanced pandas article

From novice to advanced pandas

Pandas to optimize speed and memory

Advanced pandas features

Pandas Cheat Sheet


  • Day 8: Read about the Google style guide. Learned more about asserts and how to handle try/except blocks. I must get better at docstrings as well for documenting classes. Also didn't know you can use `self` outside of classes.
  • Day 9: Brought code up to Google docstring style. Used type hinting in a function. Learned briefly about how to create Python packages.
  • Day 10: Writing a Python package complete with its own modules. Nothing special, just fitting an RF model on the Iris dataset. Having it take keyword arguments, lots of `self.thing`. It feels much more modular this way.
  • Day 11: Somewhat finished the package. It has its own modules and .py files, runs main() when invoked, and takes command-line arguments with docopt. Also breezed through this: Applied Pandas - Twitter Analytics. I picked up that course from Humble Bundle. Going to read Noah Gift's 'Cloud Computing for Data Analysis'. Hope to keep up with this for a few days of this challenge. 10 chapters, 1 chapter a day?? He also JUST published Practical MLOps.
  • Day 12: Read the first chapter of 'Cloud Computing for Data Analysis'. YO! That chapter was jam-packed with resources. I read through it, and tomorrow I will go through the linked resources. Then read Chapter 2 on day 13, then go over the Chapter 2 resources on day 14?
  • Day 13: Set up Azure resources: Azure ML, Azure Key Vault, Azure Storage, Azure Insights... Just gotta get some data in there next. Hope to read Chapter 2 tomorrow.
  • Day 14: Light day today. Watched a brief video comparing different implementations of gradient boosting algos. Of course algos should be considered carefully when implementing, but CatBoost came out on top for accuracy, though it was also the slowest algo.
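Days 8-9 cover asserts, try/except handling, Google-style docstrings, and type hints; a small sketch combining all four (the function and its behavior are made up for illustration):

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide two numbers, falling back to 0.0 on a zero denominator.

    Args:
        numerator: Value on top.
        denominator: Value on the bottom; may be zero.

    Returns:
        The ratio, or 0.0 when the denominator is zero.
    """
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return 0.0


# asserts make expectations executable
assert safe_ratio(1.0, 2.0) == 0.5
assert safe_ratio(1.0, 0.0) == 0.0
```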

Week 3: Learn about machine learning in the cloud -> Azure

Intro to ML with Python

'Cloud Computing for Data Analysis'

Perform data science with Azure Databricks


  • Day 15: Re-formatted my google sheet where I track my stock trading.
  • Day 16: Learned about Databricks and Azure. Having trouble uploading data to the blob. Delta Lake is cool. I gotta think harder about mounting the blob/container and working with the data.
  • Day 17: Completed an intro to Databricks and Azure notebook. Uploaded a Parquet file to my very own file storage and mounted it in the notebook!
  • Day 18: Read Chapter 2 and began Chapter 3 of 'Cloud Computing for Data Analysis'. Topics were cloud computing foundations, then Docker, Kubernetes, hybrid clouds and some more! This book is so insightful and has neat little links which lend themselves to a great mobile reading experience.
  • Day 19: Through the week I was working on this course - 'Perform data science with Azure Databricks'. I finished it today. Hopefully soon I can put it to use. I have an idea of automating my reddit classifier or 'upserting' data from my IoT device into a visualization.
  • Day 20: Watched How Starbucks Forecasts Demand at Scale with Facebook Prophet and Azure Databricks. At first I didn't see the value of building a single product-price model, but the power came from using pandas UDFs to train a model on each product-store combo, distributing the workload on Azure Databricks with Spark. B) Also updated next week's goals.
  • Day 21: Attended satRdays - an R conference. I listened to three talks.

1. Intro to CatBoost
2. The Current State of NLP and Text-Based Machine Learning Modeling
3. Survey, Linguistics, Analysis kNowledge Guide (SLANG), a tool for multi-faceted text analysis in R
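The Day 20 Starbucks pattern, one small model per product-store combo, is exactly what pandas UDFs distribute on Spark. A single-machine pandas analogue (column names and the mean-as-forecast stand-in are invented for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "store":   ["A", "A", "B", "B"],
    "product": ["latte", "latte", "latte", "latte"],
    "units":   [10, 12, 30, 34],
})

def fit_group(df: pd.DataFrame) -> float:
    # stand-in for Prophet: "forecast" the group's mean demand
    return float(df["units"].mean())

# one tiny model per store/product combo; on Spark the same per-group
# function would be handed to applyInPandas and run across executors
forecasts = {key: fit_group(grp) for key, grp in sales.groupby(["store", "product"])}
print(forecasts)  # {('A', 'latte'): 11.0, ('B', 'latte'): 32.0}
```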

Week 4: Practical MLOps, by Noah Gift.

I need to ramp up my cloud skills and try to become as 'full-stack' as possible. Not sure how far this book will take me into 'full-stack', but it's a start!

Practical MLOps: Operationalizing Machine Learning Models


  • Day 22: Read Chapters 4 and 5 of 'Cloud Computing for Data Analysis'. Topics were cloud storage (mainly AWS) and the challenges of distributed computing (the issues with CPU computing and the benefits of GPUs and TPUs).
  • Day 23: Read the prologue and half of Chapter 1 of Practical MLOps: Operationalizing Machine Learning Models. Topics were: What is MLOps? How did MLOps become needed? The fundamentals of MLOps. The intersection of ML and DevOps = MLOps. Kaizen, continuous improvement.
  • Day 24: Finished Chapter 1: it continues to address the usefulness of and need for MLOps. Some exercises and questions are prompted, which I plan to answer tomorrow. Began reading Chapter 12 (half). This chapter is about case studies, linking MLOps to the real world. I thought this would be a good place to start so that the exercises would be more engaging. Reading these case studies is building my motivation to begin MLOps. I will think about my previous work and see if I can save it using MLOps.
  • Day 25: Built a Twitch chat websocket in Python. Trying to connect it to AWS Kinesis for streaming data analysis. I plan to read MLOps tonight before bed.
  • Day 26: Finished Chapter 12 - began Chapter 2. Also connected the Python script to Kinesis Data Firehose! Though I can't see the raw data, I can see the transactions.
  • Day 27: Read a little more of Chapter 2. Completed a few exercises from Chapter 1 (CI and CD with GitHub Actions and AWS CodeBuild!). Built makefiles and tests to facilitate the continuous integration. Also finished the Factfulness audiobook.
  • Day 28: Finished Chapter 2! Short day today.
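Day 27's CI setup (makefiles plus tests run on every push) ultimately rests on plain test functions that a runner like pytest can discover; a minimal sketch with an invented function under test:

```python
def double_all(values):
    """Toy 'pipeline step' to give the CI something to verify."""
    return [v * 2 for v in values]


def test_double_all():
    # pytest (invoked from the Makefile / CI job) collects any test_* function
    assert double_all([1.0, 2.0]) == [2.0, 4.0]


test_double_all()  # also runs standalone, which is how a quick smoke check works
```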

Week 5: Cloud data products and their relevance to my work. A goal in mind here is to send Twitch data into a BigQuery DB or an AWS bucket.

Understanding Cloud data services

AI with Google

Accenture Applied AI


  • Day 29: Read half of Chapter 3! It was about containers and contained a walkthrough of setting one up (I didn't follow along... YET).
  • Day 30: Finished Chapter 3. The last half was about edge computing and highlighted where containers may play a role in configuring ML models for edge deployment. I'm beginning to understand how useful containers can be. Also messed around in GitLab. Hoping to do some CI/CD in GitLab tomorrow.
  • Day 31: Read half of Chapter 4. I'm sensing a pattern here! Half a chapter (about 20 pages) seems to fill me up on reading. Got the CI/CD working in GitLab today :)
  • Day 32: Finished Chapter 4. Added a basic RF model into the MLOps repo, complete with lints and tests, so that the repo builds correctly with every push :)
  • Day 33: Finished Chapter 5. This chapter was about AutoML. I was today years old when I found out Apple has a built-in AutoML solution.
  • Day 34: Finished Chapter 6: logging and monitoring. I need to review the last six chapters tomorrow.
  • Day 35: Started Chapter 10. This chapter is about model interoperability. Chapters 7, 8 and 9 deal with the big three cloud providers. I want to give each of those chapters special time.
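Day 34's logging-and-monitoring chapter maps directly onto Python's standard library; a minimal sketch (the logger name and the fake model are arbitrary):

```python
import logging

# timestamps and levels on every record are the first step toward monitoring
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("ml_service")

def predict(x: float) -> float:
    logger.info("prediction requested for x=%s", x)
    result = x * 0.5  # stand-in for a real model
    if result < 0:
        logger.warning("negative prediction %s, check inputs", result)
    return result

print(predict(4.0))
```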

Week 6: Deep Learning Begins: Keras

  • Day 36: Finished Chapter 10. The chapter wrapped up by converting a few TF models to ONNX format, then converting an ONNX model to Apple's Core ML.
  • Day 37: Started Chapter 11 :) A bit less than half the chapter today. The first section is about CLI tools.
  • Day 38: Continued with Chapter 11. It encourages new users to use high-level 'serverless' platforms as microservices. Don't reinvent the wheel, just use AWS App Runner. Sounds interesting!
  • Day 39: Finished Chapter 11. Thinking about re-reading Chapter 12, though I really need to start attacking all of the questions and exercises from the other chapters.
  • Day 40: Skimmed back over Chapter 12 (case study about Athlete Intel). Read some appendix chapters: tech certifications (lots of info on AWS, focused on Azure), remote work, and thinking like a venture capitalist (when building an ML career). I guess I'm doing everything I can to avoid these exercises, huh? LOL
  • Day 41: Finally reading Chapter 8: Azure ML.
  • Day 42: Two-thirds of the way through Chapter 8 - registering models, registering datasets and deploying models.

Week 7: Deep Learning Continues:

Pytorch

NLP with Pytorch

  • Day 43: Made changes to Rchamp and added a GitHub Actions test when merging into main. I want to add lints and tests to the package. Eventually I would like to set up some automatic data collection into an S3 bucket.
  • Day 44: Learned about XGBoost4J in Spark for distributed training.
  • Day 45: Finished Chapter 8. Attempted to use sparkdl for XGBoost on EMR. Failed! I'll try again tomorrow... I think.
  • Day 46: Spent most of the day trying to get XGBoost onto EMR with different EMR versions. My last attempt will be to build the XGBoost4J library for Spark.
  • Day 47: Continued trying to build XGBoost4J for Spark. Tried on EMR 5.19 and 5.33; 5.33 seemed to work but not 5.19! I don't think it will be worth the effort at the moment to continue down this path.
  • Day 48: Read some more about MLOps in GCP.
  • Day 49: Watched a Ken Jee video about business skills in data science.

Week 8: TensorFlow

  • Day 50: NOOOOOOOOOOO I missed this day :(
  • Day 51: Watched IaC with AWS CDK on Pragmatic Labs. Honestly, I need to revisit the plan for the rest of the challenge so that I can act on it. I've been ignoring the exercises :(.
  • Day 52: Another day GONE! No time set aside for the challenge, so I watched GCP Cloud Functions for the Impatient.
  • Day 53: Added Chapter 2 exercises and questions to my private repo :)
  • Day 54: Read some more about GCP in Chapter 9, starting with GKE. Also watched this video on DS for infrastructure.
  • Day 55: Updated the CI for the Rchamp package. Couldn't get it to completely work yet :(
  • Day 56: Watched part of this screencast.

Week 9: Advanced NLP with Python.

spaCy for text processing

LSTM and use cases

NLP with Python


  • Day 57: Watched some more of the DVC screencast.
  • Day 58: NOOOOOOOO I FORGOT!
  • Day 59: Learned about DVC experiments. Looks a lot like AWS SageMaker.
  • Day 60: NOOOOOOOOOOOO
  • Day 61: Watched an AWS Lambda demo by Noah Gift. Pretty interesting! A tool I could use when I build a Twitch sentiment API call, since AWS Lambda scales...
  • Day 62: Watched an AWS study guide by Noah Gift. Going to look for an Azure study guide. Study guides for DS or MLE.
  • Day 63: Intro to CI/CD with DVC. Feeling like I'm in a code rut :(.

Week 10: Advanced NLP with Python: Vector Models and RNNs.

  • Day 64: Watched #2 and #3 of the MLOps playlist from DVC.
  • Day 65: Read about NLP word embeddings used in non-NLP settings, such as product similarity for online purchases. Blog HERE
  • Day 66: Watched #4 of the MLOps playlist. This one uses custom code runners for CI/CD, which I think is cool. This is good for using GPU runners for deep learning workloads.
  • [ ] Day 67
  • [ ] Day 68
  • [ ] Day 69
  • [ ] Day 70

Contributors

mowgl-i
