Pre-College Big Data, Machine Learning, and Artificial Intelligence

This GitHub repo introduces a portfolio of topics in probability theory, statistical inference, and applied machine learning.

Instructor

Yiqiao Yin: He received a B.A. in Mathematics from the University of Rochester, an M.S. in Finance from the University of Rochester Simon Business School, and an M.A. in Statistics from Columbia University. He was a PhD student in the Statistics Department at Columbia University from Sept. 2020 to Dec. 2021. He has prior experience at T3 Trading as a Trader and at AQR Capital as a Quant. He was also an enterprise-level Data Scientist at Bayer, a EURO STOXX 50 company. With three sought-after skill sets (trading, quant, and data science) under his belt, he has a wide range of experience in financial analysis, quantitative modeling, statistical machine learning, representation learning, transfer learning, interpretable machine learning, computer vision, and so on.

He is currently a Senior Data Scientist at an S&P 500 company. He is the founder of W.Y.N. Associates, LLC, and he covers topics in Money Management, Machine Learning, Deep Learning, Interpretable ML, Artificial General Intelligence, eXplainable AI, and Responsible AI on his YouTube Channel: YinsCapital.

Additional course materials from Yiqiao Yin or W.Y.N. Associates can be found at WYN Education.

ITEMS	LINK
Courses	Link
-- Overview	Link
-- Materials	Link
-- Resources and References	Link
-- Collection of Course Resources	Link
-- Software Installation	Link
-- News and Social Media	Link
Philosophy	Link
-- Teaching Philosophy	Link
-- Why do I want to teach	Link
-- Philosophy with Machine Learning / Data Science / Artificial Intelligence	Link
-- What You Take Away From This Course	Link
-- Skill Sets	Link
-- Career Development	Link

Courses

Overview

The course focuses on the strategic use of data and innovative technologies to support actionable decision making. Participants develop a strong understanding of how to use statistical learning and data-driven critical thinking to solve real-world problems. Students are introduced to the foundations of statistical learning and machine learning tools. Moreover, they develop an analytical, data-driven way of thinking and gain familiarity with a programming language such as Python or R (programming languages widely used for statistics and data science).

Materials

Required texts (all links are publicly available):

Probability and Statistics: Link
Introduction to Probability, Statistics and Random Processes: Link
Seeing Theory: Link
Professor Shervine's Computer Science Courses at Stanford: Link
James, G., Witten, D., Hastie, T., and Tibshirani, R., An Introduction to Statistical Learning: Link
Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning: Link
Efron, B. and Hastie, T., Computer Age Statistical Inference: Algorithms, Evidence, and Data Science: Link

We focus on selected topics in the required textbooks above. Topics are described under Course Schedule below. For additional textbooks, please reach out to Yiqiao Yin for more resources. Students are also recommended to have a working laptop or desktop. Please install R / RStudio for homework as well as the capstone project.

R/RStudio (required): please refer to RStudio Cloud; we will be using this IDE as the main development environment for R
Python/Jupyter Lab (required): please refer to Google Colab; we will be using this IDE as the main development environment for Python
GitHub (a repo hosting website where students showcase their projects)

Resources and References

Let me cite these resources again here (and more):

For Probability and Statistics:

Probability and Statistics: site
Stat Lecture: site

For Computer Science & Deep Learning:

An Introduction to Statistical Learning: site
- Fundamental of Machine Learning by Mr. Yin: YouTube
Deep Learning: site
- Deep Learning Series by Mr. Yin: YouTube
Stanford CS 229: site
NYU CS Yann LeCun: site
Columbia COMS 4995: site
Isabelle Guyon: site
Trevor Hastie: site

For Capstone Projects (Individual Research Project):

TensorFlow Keras: site
TensorFlow R Interface: site
RStudio AI Blog: site

For Online CoderPad (Technical Interview):

Free Online Coding Platform (C++/Python/R/Matlab/Java all available): https://coderpad.io/launch-sandbox

Collection of Course Resources

Available resources for data:

UCI Machine Learning: link
Kaggle: link
TensorFlow Datasets: link
Firm AI: link

Common resources:

The main course repo is here. Deep Learning related repo is here.
Related course topics are fully uploaded to YouTube channels: Machine Learning, and Deep Learning.
Google Drive link (includes both machine learning code and deep learning code) is here. I made it view-only, so please make sure to save a copy to your own Google Drive. This way other students can use it as well, and one person's edits won't interfere with others'.
Selected course Zoom recordings are posted on this YouTube link.
Previous student projects are on this YouTube link. PLEASE FEEL FREE TO WATCH THEM AND USE THEM AS REFERENCE!

Software Installation

It is required to use the programming language Python. I will explain the difference between using Python and R.

For lectures, homework, tutorials, and capstone projects, we all use the Python programming language.
You only need a Gmail account and Google Drive to access Colab. Log in to your Drive and look for the "New" button at the top left of the screen. Click "New", then click "More" to open the list of apps. If this is your first time using Colab, go to "Connect More Apps". Once you click on it, a window pops up where you can search for and install "Colab".
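Once a Colab notebook opens, a quick check like the one below can confirm that the Python runtime works and show which common libraries are already available. This is only a minimal sketch: the helper name `check_environment` and the package list are my own assumptions for illustration, not part of the course materials, so adjust the list to whatever a given lecture notebook imports.

```python
import importlib.util
import sys


def check_environment(packages):
    """Return a dict mapping each package name to True if it can be found."""
    # find_spec looks a module up without importing it, so this stays fast.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Hypothetical package list (an assumption, not a course requirement).
    wanted = ["numpy", "pandas", "sklearn", "tensorflow"]
    print("Python version:", sys.version.split()[0])
    for name, found in check_environment(wanted).items():
        print(f"{name}: {'available' if found else 'missing'}")
```

If a package shows up as missing, running `!pip install <package>` in a notebook cell is the usual way to add it for the current session.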

It is optional to use the programming language R.

To install the programming language R, please use the link here and click the big blue link that says "Download R 4.0.3 for Windows". If you are located outside the United States, the version offered may be different. Do not worry about the version.
To install the RStudio IDE (this is a separate installation from R), please use the link here.
Do not worry about Windows or Mac. The links provided above have downloads for both Windows and Mac.
To install the programming language Python, please use the link here. It is required that you install Python, because we will run Python inside the RStudio IDE.

For development purposes, please install Git (sometimes also called Git Bash) using the links here for Windows. If you have a Mac, it should be built in already.

News and Social Media

Sometimes I just want some light, leisurely reading on a getaway trip to a resort. When that's the case, I keep the following websites saved so I can stay up to date on AI news from around the world.

Philosophy

These are the philosophies I would like to share with my audience.

Teaching Philosophy: Trinity Set

Teaching philosophy can be explained with the following diagram. The first element is Passion. Passion is what guides learning, and it is one of the most essential elements, if not the most essential, for anyone seeking to learn something. The next element is Talent. Many students have had the unfortunate experience of being told that they do not have talent (perhaps by their parents, friends, or school environment). Do not let that prior fear dominate what you can become. Everyone has talent, and it is okay to take our time to explore what it might be. But you need to keep exploring. We fail not when somebody else tells us we are a failure, but when we stop exploring our talent. The last element is Wealth. A common explanation of Wealth is money, but that is only a vanilla, perhaps naive, explanation. A better one is resources. Every market and every sector has its own unique hierarchy, and it is up to us to explore what each level of the hierarchy offers in terms of resources.

With the above definition given, the relationship of these three elements can be shown as the following diagram:

The left arrows connect Passion with Talent. If I am interested in a topic, I am more likely to spend more time on it, so it makes sense that I get better at it. The converse is also true. If I happen to stumble upon some skills that I am good at, it is very likely I will want to keep doing things that way because I am good at it. It will make me fall in love with what I am doing.
The bottom arrows connect Talent with Wealth. If I have a talent that sits right at the top of the competition pyramid where nobody else can get to, then I am in good shape: people want me there when they are in trouble, which creates a market that generates wealth. The converse is also true. When I have more money, I would very much like to strengthen the things that I used to generate that wealth so I can keep on making more money.
The right arrows connect Passion with Wealth. If I am passionate about a topic, then I love talking about it, reading news or stories about it, and spending more time with it. This series of actions will eventually deepen my pockets and broaden my resources on this topic, which translates to wealth. The converse is also true. If I have wealth, the resources I have on a topic can directly translate into passion, if not reinforce it.

Things to think about:

What are some popular examples that hold all three elements of the Trinity Set? LeBron James, Bill Gates, Elon Musk...
Why is it called a Trinity Set? What happens when one side of the arrows disappears?
Why does an ideal career path need all three elements?

Why do I want to teach

Why do I want to teach? What do I get out of this? Well, there are many different reasons, and they are all based on different factors. The truth is that the reasons change over time. However, let me summarize them as follows:

I truly love it. I really enjoy talking to different people and showing them what I know about Data Science, Machine Learning, and Artificial Intelligence. It is, in some way, a small reinforcement to the flow of knowledge in the overall population. I found that kind of help extremely valuable at the beginning of my career, and I want to give it back to the community if I can.
I am learning from all my students. Being humble aside, I truly believe one important way of learning is to teach. The process of stating a piece of information and then breaking it down for another person is non-trivial. It is not a skill gained overnight. Rather, it takes years of practice. On top of all that, it requires me to understand the knowledge and its many caveat scenarios by heart, which is a piece of experience I hold dear.
A data scientist's career typically follows this progression: Data Analyst > Data Scientist > Senior Data Scientist > Lead/Principal Data Scientist > Associate Director > Director > Vice President > Senior Vice President and so on. I can take on a project, but that is just for me and it is not enough. I want to be able to train and mentor others to be successful in their data science careers as well. This leadership skill and experience come from practice and training, years of it. It is also not trivial. It takes time to get to know a person's background and strengths in order to mentor them and encourage them to go above and beyond.

Philosophy with Machine Learning / Data Science / Artificial Intelligence

There are three stages in the learning curve on the road to pursuing Machine Learning / Data Science / Artificial Intelligence. For a detailed description of the following stages, please see My Advice.

Stage I: This stage is at the beginner level. There is a wide range of topics to learn, and you need to put in the time to get through all of them as fast as you can. This means picking a common machine learning book and going through it word by word and topic by topic. Through this journey you also get a sense of the practices that other people follow. More importantly, you need to explore the pros and cons of these practices. In brief, Stage I is about collecting tools.
Stage II: This stage is at the intermediate level. At this level you already know plenty of topics and have probably already seen some interesting projects. You need to test whether you can reproduce or carry out these projects yourself on a brand new data set. It is time to explore as many data sets as possible and to go through as many projects as possible independently. In brief, Stage II is about broadening your horizons.
Stage III: This stage is at the advanced level. This level assumes you have already gone through Stages I and II. It means you know almost all of the tools and you have done plenty of projects independently and critically. However, through all this practice you have come across some issues that were never resolved by the people before you (in academia or in industry). You have gained some insights of your own about these issues and, more importantly, you have found a solution. It is time to build a software package that collects the tools you create and to become the expert yourself. In brief, Stage III is about creation and your legacy.

What You Take Away From This Course

You only live once. Your life is your legacy! We all want to build a better world and a nice world. I am here to tell you: Ideals are peaceful. History is violent.

You are here because you are entering college. This means you leave home for the first time in a very long time and you are about to write your own legacy. So you want to work hard and you want to deliver. Be the kind of person who thrives and shines. Be the kind of person who others look up to and who others can learn from. Show others what life can be. Show others what legacy is about.

At the end of the day we are fighting off a common enemy: the extinction of mankind. Newton left us the laws of physics. Confucius left us the Analects of ancient Chinese wisdom. Dr. MLK showed us what can be done as a regular citizen. Elon Musk sent a car to space. These are stories that were told. These are the stories that will be told. These are the legacies we are left with. These are all noble works. What type of noble work will you choose to do? What kind of tomorrow are you trying to build that you can start today?

Skill Sets

Individual and Independent Github Opensource Package

You are responsible for the design of the package
You are responsible for the documentation and shipped code
You are capable of showcasing the problems your package solves

Area of Expertise: Reference

Embedded Advanced Analytics - close collaboration & dialogue with business
Scalable Analytics Solutions - development of scalable analytics tools & recommendations in close cooperation with IT
Innovative Data Science - experts on AI, machine learning, deep learning
Integrated Care Platform - platform for digital health solutions and data architecture design

Skill Sets (you may end up developing): Reference

Lead research to advance the science and technology of intelligent machines.
Lead research that enables learning the semantics of data (images, video, text, audio, speech and other modalities).
Devise better data-driven models of human behavior.
Work towards long-term ambitious research goals, while identifying intermediate milestones.
Influence progress of relevant research communities by producing publications.
Contribute research that can be applied to Facebook product development.
Lead and collaborate on research projects within a globally based team.

Career Development

I make the assumption that you are all here because you are somewhat interested in a tech job. Allow me to illustrate the taxonomy of "tech job" and what that means.

Software Engineer:

Software engineering is a branch of computer science which includes the development and building of computer systems software and applications software. Computer systems software is composed of programs that include computing utilities and operating systems. Applications software consists of user-focused programs that include web browsers, database programs, etc.

There is a lot of investment going into software engineering at the moment due to the increasing reliance on mobile technology, venture capital-backed start-ups, the growing complexity of technology, and emerging industries. The demand for skilled and qualified software engineers seems to have no end. This demand is strengthened by a changing economic landscape and fueled by the need for technology solutions. With billions of physical devices around the world that are now connected to the internet and that are collecting and sharing data, all industries are quickly becoming technology driven industries. Reference

Research Scientist (AI Focused):

Research Scientists at Google work closely with Software Engineers to discover, invent, and build at the largest scale. Ideas may come from internal projects as well as from collaborations with researchers at partner universities and technical institutes all over the world. From creating experiments and prototyping implementations to designing new learning algorithms, Research Scientists work on challenges in machine perception, data mining, machine learning, and natural language understanding. As a Google Research Scientist, you will continue to be an active contributor to the wider research community by collaborating with academic researchers and by publishing papers.

Researchers on the Google AI team have the freedom to set their research agenda and to engage as much or as little as they wish with existing products, choosing between doing more basic, methodological research or more applied research as necessary to produce the most compelling results. Because many of the advances we develop today may take years to become useful, the team as a whole maintains a portfolio of projects across this spectrum. It is our philosophy that making substantive progress on hard applications can help drive and sharpen the research questions we study, and in turn scientific breakthroughs can spawn entirely new applications.

The Google AI team’s research focuses on methods that can learn multiple layers of rich, non-linear feature extractors and can scale to large amounts of data. Much of our work is best understood as part of the deep learning subfield of machine learning, but we are interested in any methods capable of efficient and effective feature learning that get good results on challenging problems. We have resources and access to projects impossible to find elsewhere. Our broad and fundamental research goals allow us to collaborate closely with, and contribute uniquely to, many different product teams across the company. Google Example

DeepMind: Glassdoor, DeepMind Career
Facebook AI Research (FAIR): FAIR Career
Google Brain: Homepage, Career

Data Scientist (Research Focused)

Research Scientist Facebook Example

Data Scientist (Data Cleanup Focused)

Data Scientist Facebook Example

Interview

Technical Interview:

R Questions: 27 R Questions
Python Questions: Python Interview Questions
