: Study and practice various machine learning models
- 1-1 Gaussian Naive Bayes
- 1-2 Multinomial Naive Bayes
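- A minimal sketch of the two variants (assuming scikit-learn is available; the toy data are made up for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous features -> Gaussian naive Bayes
X_cont = np.array([[1.2, 3.1], [0.9, 2.8], [3.5, 0.4], [3.9, 0.2]])
gnb = GaussianNB().fit(X_cont, y)
print(gnb.predict([[1.0, 3.0]]))   # -> [0]

# Count features (e.g. word counts) -> multinomial naive Bayes
X_counts = np.array([[3, 0, 1], [2, 0, 2], [0, 4, 0], [1, 3, 0]])
mnb = MultinomialNB().fit(X_counts, y)
print(mnb.predict([[0, 3, 1]]))    # -> [1]
```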
- Select a distance d(a, b)
- Categorical variables : Hamming distance
- Continuous variables : Euclidean distance, Manhattan distance
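- A quick sketch of the three distances on made-up vectors (assuming NumPy):

```python
import numpy as np

# Continuous vectors
a = np.array([1.0, 4.0, 2.0])
b = np.array([3.0, 1.0, 2.0])
euclidean = np.sqrt(np.sum((a - b) ** 2))  # sqrt(4 + 9 + 0) ~= 3.61
manhattan = np.sum(np.abs(a - b))          # 2 + 3 + 0 = 5.0

# Categorical vectors: Hamming distance counts mismatched positions
u = np.array(["red", "small", "round"])
v = np.array(["red", "large", "square"])
hamming = np.sum(u != v)                   # 2

print(euclidean, manhattan, hamming)
```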
-
-
3-1. Assumptions
- The observations in each class follow a normal (Gaussian) distribution.
- Every class has a similar (shared) covariance structure.
-
3-2. Characteristics of the decision boundary obtained from LDA
- The axis orthogonal to the boundary : consider the shape of the class distributions when the data are projected onto this axis.
- Maximize only the difference in means? That would just use the difference vector of the two class means, ignoring the spread.
=> The boundary that maximizes the difference in projected means relative to the projected variance.
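- A worked sketch of this criterion (assuming NumPy; the class means and the shared covariance are made up). The discriminant direction is w = Sigma_w^-1 (mu1 - mu0):

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])       # shared covariance (LDA assumption)
X0 = rng.multivariate_normal([0.0, 0.0], cov, 100)
X1 = rng.multivariate_normal([2.0, 1.0], cov, 100)

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sigma_w = 0.5 * (np.cov(X0.T) + np.cov(X1.T))  # pooled within-class covariance
w = np.linalg.solve(Sigma_w, mu1 - mu0)        # direction orthogonal to the boundary

# Projected onto w, the class means separate relative to the within-class spread
proj0, proj1 = X0 @ w, X1 @ w
fisher = (proj1.mean() - proj0.mean()) ** 2 / (proj0.var() + proj1.var())
print(fisher)
```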
-
-
3-3. Advantages
- Unlike the naive Bayes model, it reflects the covariance structure between the explanatory variables.
- Relatively robust even when assumptions are violated.
-
3-4. Disadvantages
- The sample size of the smallest class must be greater than the number of explanatory variables.
- Performs poorly when the data deviate significantly from the normality assumption.
- Cannot reflect cases where the covariance structure differs across the categories of y.
-
3-5. Define and understand QDA
- QDA removes the assumption of a common covariance structure Σ independent of the class k.
- It can be used when different categories of Y have different covariance structures.
-
- Relative advantages of QDA
- Allows a different covariance structure for each category of y.
- Relative disadvantages of QDA
- With a large number of explanatory variables, there are many more parameters to estimate.
- Therefore requires a large sample size.
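- A sketch contrasting LDA and QDA (assuming scikit-learn; the two made-up classes deliberately have different covariances):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(1)
# Class 0: tight, round covariance; class 1: stretched, correlated covariance
X0 = rng.multivariate_normal([0.0, 0.0], [[0.3, 0.0], [0.0, 0.3]], 200)
X1 = rng.multivariate_normal([2.0, 2.0], [[2.0, 1.5], [1.5, 2.0]], 200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

# With unequal covariances, QDA's quadratic boundary usually fits better
print("LDA accuracy:", lda.score(X, y))
print("QDA accuracy:", qda.score(X, y))
```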
-
-
4-1. Background
- When assumptions about the distribution of the data are hard to make, how do we split the data?
- Focus on the boundary : determine the boundary that maximizes the margin.
- Problem : what if some cases are not perfectly separable?
=> Allow a small amount of error and determine the boundary that minimizes it.
-
-
The method divides into two cases based on the form of the dependent variable.
- Categorical variable : support vector classifier (SVM)
- Continuous variable : support vector regression (SVR)
- Key to SVM and SVR : use the margin to distinguish the points that do and do not affect the model cost.
- SVM : points that fall within the margin, or are classified on the wrong side.
- SVR : points that fall outside the margin.
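- A minimal sketch of both variants (assuming scikit-learn; the data are synthetic, and C / epsilon are illustrative values that would normally be tuned):

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
# Overlapping classes (label noise), and a continuous target
y_class = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)
y_cont = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

clf = SVC(kernel="linear", C=1.0).fit(X, y_class)
reg = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y_cont)

# Only the support vectors affect the fitted model: points inside the margin
# or on the wrong side (SVC), and points outside the epsilon-tube (SVR)
print(len(clf.support_), "of", len(X), "points shape the SVC boundary")
print(len(reg.support_), "of", len(X), "points shape the SVR fit")
```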
-
-
-
4-2. SVM with Kernel
- For non-linear relationships
- The curse of dimensionality
- When fitting data with a non-linear structure, it is necessary to use a kernel.
- However, as the degree d of a polynomial expansion increases, the number of parameters to estimate grows rapidly; above a certain dimensionality this results in higher test error. The kernel trick gives the non-linear fit without constructing these features explicitly, as in the sketch below.
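- A sketch of this point (assuming scikit-learn): a non-linear boundary via the kernel, with no explicit high-degree features.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no linear boundary can separate them
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # near chance
print("rbf kernel accuracy:", rbf.score(X, y))        # near 1.0
```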
-
4-3. SVM vs. LDA
- Relative advantages of SVM
- When the distribution of the data is difficult to model, considering the covariance structure is inefficient; only the observations near the boundary need to be considered.
- Higher prediction accuracy.
-
- Relative disadvantages of SVM
- The cost parameter C must be determined (e.g. by cross-validation).
- Takes a long time to fit the model.
-
-
5-1. Definition
: A model that builds splitting criteria from the variables, uses them to partition the sample, and then estimates the properties of each resulting group.
- Advantages: highly interpretable, intuitive, universal.
- Disadvantages: high variance; can be sensitive to the particular sample.
-
5-2. Decision tree terminology
- Node - the location of the variable on which a split is based; the sample is divided here.
- Parent node - a relative concept; the node immediately above.
- Child node - the node immediately below.
- Root node - the top-level node with no parent node.
- Leaf node (tip) - the lowest node, with no children.
- Internal node - a node that is not a leaf node.
- Edge - where the condition that splits the samples is located.
- Depth - the number of edges that must be traversed to reach a particular node from the root node.
- Depending on the response variable :
- Categorical variables : Classification tree
- Continuous variables : Regression tree (estimate y by the mean of the samples in each region)
-
5-3. Entropy
- Entropy is often used as the criterion for selecting the best attribute on which to split a node of the tree.
- The attribute that maximizes the information gain, i.e. the reduction in entropy achieved by splitting the node on that attribute, is chosen as the best attribute.
- The entropy of a set S with respect to a binary classification problem is:
  Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-),
  where p(+) and p(-) are the proportions of positive and negative examples in S.
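- A minimal sketch of this formula (assuming NumPy):

```python
import numpy as np

def entropy(labels):
    # Entropy of a label array; unique() only returns classes that occur,
    # so no zero-probability terms enter the log
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([1, 1, 0, 0]))  # 1.0   (maximally mixed node)
print(entropy([1, 1, 1, 1]))  # 0.0   (pure node)
print(entropy([1, 1, 1, 0]))  # ~0.811
```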
5-4. Information Gain
- The difference in entropy before and after splitting at a particular node of the decision tree.
- A higher information gain indicates that the attribute splits the dataset into more homogeneous subsets, making the classification task easier. Conversely, a lower information gain indicates that the attribute is less useful for classification.
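- A sketch of the computation (assuming NumPy; the two splits are made up to show the extremes):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # Gain = entropy before the split minus the size-weighted entropy after
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 0, 0, 0])
# A split into pure children achieves the maximum gain
print(information_gain(parent, [np.array([1, 1, 1]), np.array([0, 0, 0])]))  # 1.0
# A split whose children are as mixed as the parent gains nothing
print(information_gain(parent, [np.array([1, 0]), np.array([1, 1, 0, 0])]))  # 0.0
```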
-
-
5-5. Classification Tree
- The idea : according to the tree's split conditions, divide the region that X can occupy into blocks.
- Estimate Y from the attributes of the samples in each blocked region (e.g. the majority class), as in the sketch below.
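- A minimal sketch (assuming scikit-learn; max_depth=2 is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printed rules are exactly the conditions that divide X-space into blocks
print(export_text(tree, feature_names=["sep_len", "sep_wid", "pet_len", "pet_wid"]))
print(tree.predict(X[:3]))  # majority class of each sample's block
```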
-
-
5-6. Regression Tree
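- A minimal sketch following the definition above (estimate y by the mean within each region), assuming scikit-learn and made-up sine data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0.0, 5.0, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Predictions are piecewise constant: the leaf-wise means of y
print(reg.predict([[0.5], [2.5], [4.5]]))
```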