My graduation project for the "Data Science" course taught by instructor Engin Deniz Alpman. (Patika.dev)
veribilimi's Introduction
📊📋Data Science🧮🗂
⚙ What is data and Data Science? 🤔
Data is everything we can perceive and describe.
For example, the population of Turkey is data. So are the population of Germany, the population of the world, and everyday things like dogs, cats, houses, and schools.
Subcategories of data:
Numeric Data
Categorical Data
When we look at Numeric Data closely we'll see:
Continuous (Interval)
Discrete (Ratio)
When we look at Categorical Data closely we'll see:
Binary
Multiclass
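As a quick sketch of these four subtypes, here are some illustrative Python values (my own made-up examples, not from the course):

```python
# Illustrative examples of the data subtypes above (hypothetical values).

numeric_continuous = 36.6        # body temperature in °C: any value in a range
numeric_discrete = 3             # number of pets: whole counts only
categorical_binary = "yes"       # smoker? exactly two possible classes
categorical_multiclass = "red"   # eye color: more than two classes

for name, value in [
    ("continuous", numeric_continuous),
    ("discrete", numeric_discrete),
    ("binary", categorical_binary),
    ("multiclass", categorical_multiclass),
]:
    print(f"{name:12} -> {value!r} ({type(value).__name__})")
```

Note how the numeric subtypes map naturally to `float` and `int`, while both categorical subtypes are just labels.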
Although I have simplified the meaning of the term "data", "Data Science" is actually a broad concept that encompasses mathematics and statistics, custom programming, advanced analytics, artificial intelligence (AI), and machine learning. Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract information and insights from structured and unstructured data.
Data Science is collected under 3 main headings.
A programming language is the communication tool we use to tell the computer what we want. Deep Learning is a sub-branch of Machine Learning, and Machine Learning is a sub-branch of Data Science.
Machine Learning has 2 areas:
There is no absolute zero reference on an "interval" scale, but there is on a "ratio" scale.
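A small sketch of why this matters, using temperature as my own illustrative example (not from the course): Celsius is an interval scale, Kelvin a ratio scale.

```python
# Interval vs. ratio scales, using temperature as a hypothetical example.
# Celsius is an interval scale: 0 °C does not mean "no temperature",
# so ratios of Celsius values are meaningless.
# Kelvin is a ratio scale: 0 K is an absolute zero, so ratios make sense.

def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

t1_c, t2_c = 10.0, 20.0
naive_ratio = t2_c / t1_c                                 # 2.0, but physically meaningless
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"misleading Celsius ratio: {naive_ratio:.2f}")
print(f"actual Kelvin ratio:      {true_ratio:.2f}")      # ≈ 1.04, not 2
```

In other words, 20 °C is not "twice as hot" as 10 °C, because 0 °C is an arbitrary reference point rather than an absolute zero.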
A categorical variable is not continuous.
Prediction: We have a lot of data, and we try to correctly guess the answer to a question from it. For example, given data such as a flower's height, leaves, and color, we can predict whether the flower is poisonous.
Mapping: f(x1, x2, x3) = ŷ, and we compare ŷ with y. The function takes the features x1, x2, x3 as input and produces "ŷ" as output. "ŷ" is the model's prediction, and "y" is the truth. Our main goal in mapping is to minimize the error that occurs: error = e(ŷ, y).
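The mapping and error function above can be sketched in a few lines of Python. The linear form and the weights below are my own illustrative choices, not a trained model from the course:

```python
# A minimal sketch of the mapping f(x1, x2, x3) = ŷ and the error e(ŷ, y).
# The weights are made-up illustrative numbers, not a trained model.

def f(x1: float, x2: float, x3: float) -> float:
    """A toy linear model producing a prediction ŷ."""
    w1, w2, w3, b = 0.5, -0.2, 1.0, 0.1   # hypothetical parameters
    return w1 * x1 + w2 * x2 + w3 * x3 + b

def e(y_hat: float, y: float) -> float:
    """Squared error between prediction ŷ and truth y."""
    return (y_hat - y) ** 2

y_hat = f(1.0, 2.0, 3.0)          # ŷ = 0.5 - 0.4 + 3.0 + 0.1 = 3.2
print("prediction:", y_hat)
print("error:", e(y_hat, 3.0))    # truth y = 3.0 -> error ≈ 0.04
```

Training a model means adjusting the parameters w1, w2, w3, b so that this error gets as small as possible.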
Even if e = 0 on the known data, there is no guarantee that e = 0 on unknown data. The main purpose is to minimize the error that will arise on unseen data while training the machine on the "train" set. So, how can we do this?
The answer is: I train the model on the "train" set and tune its hyperparameters on the validation set. But since I updated the hyperparameters based on good validation performance, my model starts to overfit the validation set, albeit indirectly. So I also need data it has never seen, to test it fairly.
In short 🤐, I split the data into 3 parts:
Train
Validation
Test
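A minimal sketch of this three-way split in plain Python (the 70/15/15 ratios are my own illustrative choice, not from the course):

```python
import random

# A sketch of splitting a dataset into train / validation / test.
# The 70/15/15 ratios are illustrative, not prescribed by the course.

random.seed(42)
data = list(range(100))          # stand-in for 100 samples
random.shuffle(data)             # shuffle so the split is random

n = len(data)
n_train = int(n * 0.70)
n_val = int(n * 0.15)

train = data[:n_train]
val = data[n_train:n_train + n_val]
test = data[n_train + n_val:]

print(len(train), len(val), len(test))   # 70 15 15
```

The model is fit on `train`, hyperparameters are tuned against `val`, and `test` is touched only once, at the very end, to get an honest estimate of performance on unseen data.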
⚙ What is Bias in Statistics? 🤔
Bias is when a model systematically discriminates. Models carry the ideas of the people who created them; that is why every model is only as objective as its designer. (See "overestimate" and "underestimate".) 👉https://www.statisticshowto.com/what-is-bias/
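The statistical side of bias (systematic over- or underestimation) can be sketched with a classic example, chosen by me as an illustration: the "divide by n" variance estimator consistently underestimates the true variance, while "divide by n−1" (Bessel's correction) does not.

```python
import random

# A sketch of statistical bias: the "divide by n" variance estimator
# systematically underestimates the true variance, while "divide by n-1"
# (Bessel's correction) does not. Data here are simulated, not real.

random.seed(0)
TRUE_MEAN, TRUE_SD = 0.0, 1.0          # population variance = 1.0
n, trials = 5, 20000

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased_sum += ss / n               # biased estimator
    unbiased_sum += ss / (n - 1)       # unbiased estimator

print(f"biased estimate:   {biased_sum / trials:.3f}")   # ≈ 0.8, underestimates
print(f"unbiased estimate: {unbiased_sum / trials:.3f}") # ≈ 1.0
```

The biased estimator lands around (n−1)/n = 0.8 of the true variance no matter how many trials we average, which is exactly what "systematic" means: the error does not wash out with more data.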