Kaggel Notebook: https://www.kaggle.com/code/mdriponmiah/ml-dl-interview-short-question-and-answer
@@ To learn very basic of ML/DL for Interview Preparation.<br> @@
A Perceptron is a type of linear classifier that separates data points into different classes by finding a linear boundary in feature space.
Overfitting in machine learning occurs when a model is trained too well on the training data, and as a result, it performs poorly on unseen data.
Supervised learning is a type of machine learning where the model is trained on labeled data and the goal is to make predictions about unseen data.
Unsupervised learning, on the other hand, is a type of machine learning where the model is not given any labeled data and the goal is to discover patterns or structure in the data.
Backpropagation algorithm is used to calculate the gradient of a loss function with respect to the weights of a neural network in order to update the weights in the opposite direction of the gradient.
A Convolutional Neural Network (CNN) is a type of deep learning model that is particularly effective in image and video analysis. CNNs use convolutional layers to learn the features of an image, and then use pooling layers to reduce the dimensionality of the feature maps.
Recurrent Neural Network (RNN) is a type of deep learning model that is designed to process sequential data. RNNs use feedback connections to process the current input in relation to the previous inputs. This allows them to maintain a hidden state that can be used to process sequences of variable length.
A generative model learns the underlying probability distribution of the data, and can generate new samples from that distribution. A discriminative model, on the other hand, learns to directly predict the output class or value from the input features.
The Bias-Variance tradeoff is a fundamental concept in machine learning that refers to the tradeoff between a model's ability to fit the training data well (low bias) and its ability to generalize to unseen data (low variance). Models with high bias tend to underfit the training data, while models with high variance tend to overfit the training data.
The curse of dimensionality refers to the problem that occurs in high-dimensional spaces, where the number of data points required to accurately estimate the underlying probability distribution increases exponentially with the number of dimensions. This can lead to poor performance and overfitting for machine learning models.
Dropout is a regularization technique for neural networks that randomly drops out (i.e. sets to zero) a certain proportion of neurons during training. This helps to prevent overfitting by adding noise to the activations and forcing the network to learn multiple independent representations of the data.
Batch normalization normalizes the activations of a layer for each mini-batch of data.
Layer normalization normalizes the activations of a layer for each example in a dataset.
Batch Normalization is more robust to the different scales of features, and is more often used in practice,
but Layer normalization can work better for small dataset.
GAN (Generative Adversarial Network) is a type of generative model that consists of two parts: a generator network that generates new samples, and a discriminator network that tries to distinguish the generated samples from real samples.
VAE (Variational Autoencoder) is a type of generative model that consists of an encoder network that maps the input data to a lower-dimensional latent space, and a decoder network that maps the latent space back to the original data space.
GANs are mainly used for generating images, while VAEs are mainly used for generative models and feature learning.
Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function. This term makes the model more likely to choose a simpler solution that generalizes better to unseen data. The most common regularization techniques are L1 and L2 regularization, also called Lasso and Ridge respectively.
Parameters are the internal variables of a model that are learned during training, such as the weights of a neural network. Hyperparameters are the external variables that control the behavior of a model, such as the learning rate or the number of layers in a neural network.
A feedforward neural network is a type of network in which the data flows through the network in only one direction, from input to output. A recurrent neural network, on the other hand, has feedback connections that allow information to flow in multiple directions, allowing it to process sequential data.
The softmax activation function is used for multi-class classification problems. It maps the input values to a probability distribution over the classes. The sigmoid activation function, on the other hand, is used for binary classification problems. It maps the input values to a probability between 0 and 1.
Online learning is a type of machine learning where the model is trained on a stream of data and updates its parameters incrementally as new data becomes available. Offline learning, on the other hand, is a type of machine learning where the model is trained on a fixed dataset and the parameters are updated only once.
GPT is a type of transformer-based model for natural language processing tasks such as language translation, text summarization, and question answering. GPT has been pre-trained on a large corpus of text data and fine-tuned on specific tasks. GPT models are widely used for natural language understanding, and are considered state-of-the-art in language models.
Gradient Descent is an optimization algorithm that is commonly used to update the parameters of a machine learning model. It works by iteratively adjusting the parameters in the direction of the negative gradient of the loss function, with the goal of minimizing the loss and improving the model's performance. Gradient descent is important because it is the most commonly used optimization algorithm in machine learning and it is very efficient and easy to implement.
The ReLU activation function is a non-linear activation function that is commonly used in neural networks. It maps all negative input values to zero and all positive input values to the same positive output value. ReLU is computationally efficient and helps to improve the convergence of the network and prevent the vanishing gradient problem.
Supervised learning is a type of machine learning where the model is trained on labeled data and the goal is to make predictions about unseen data. Semi-supervised learning is a type of machine learning where the model is trained on a dataset that contains both labeled and unlabeled data. The goal is to make use of the unlabeled data to improve the performance of the model.
A decision tree is a type of model used for classification and regression problems. It is a tree-like structure where each internal node represents a feature or attribute of the data, each branch represents a decision based on the value of the feature, and each leaf node represents a prediction or class label. The decision tree algorithm works by recursively splitting the data based on the feature that provides the most information gain, until all the leaf nodes are pure.
Random Forest is an ensemble of decision trees. It is a supervised learning algorithm that is used for both classification and regression problems. The random forest algorithm creates multiple decision trees and combines their predictions by averaging or majority voting. This results in a more robust model that reduces overfitting and improves the generalization performance.
Support Vector Machine (SVM) is a type of linear model that is commonly used for classification problems. It works by finding the best boundary (or hyperplane) that separates the different classes in the data. The boundary is chosen in such a way that it maximizes the margin, or the distance between the boundary and the closest data points of each class, known as support vectors. SVM is robust to noise and can work well with high-dimensional data.
The k-Nearest Neighbors (k-NN) algorithm is a type of instance-based learning or non-parametric method. It is used for both classification and regression problems. The algorithm works by storing all the training data and then classifying a new data point based on the majority class or mean value of its k nearest neighbors. The value of k is a hyperparameter that can be chosen by the user or by using cross-validation methods. K-NN is simple to implement, but can be computationally expensive for large datasets.
Data preprocessing is the process of cleaning, transforming, and normalizing the data before it is fed into a machine learning model. It is important because raw data often contains missing or inconsistent values, outliers, and irrelevant features that can negatively impact the performance of the model. Data preprocessing helps to improve the quality of the data and make it more suitable for analysis.
Feature engineering is the process of creating new features or modifying existing ones to improve the performance of a machine learning model. It is important because it allows the model to extract more information from the data and make more accurate predictions. Feature engineering can involve techniques such as feature scaling, feature transformation, and feature extraction.
Exploratory data analysis (EDA) is the process of analyzing and summarizing the main characteristics of a dataset. It is important because it allows the data scientist to gain a deeper understanding of the data and identify patterns, outliers, and other relevant information that can inform the development of the model. EDA can involve techniques such as data visualization, statistical analysis, and hypothesis testing.
A population is the entire group of individuals or objects that we are interested in studying, while a sample is a smaller subset of the population that is selected for analysis. The goal of statistical analysis is to use information from the sample to make inferences about the population.
Overfitting occurs when a model is too complex and memorizes the noise in the training data, instead of generalizing to unseen data. It can be prevented by using techniques such as regularization, cross-validation, and early stopping.
A list is a mutable data type, which means that it can be modified after it is created. Lists are defined using square brackets, [].
A tuple is an immutable data type, which means that it cannot be modified after it is created. Tuples are defined using parentheses, ().
A dictionary is a data type that stores key-value pairs, where each key is unique. Dictionaries are defined using curly braces, {}.
A set is a data type that stores unique elements, it is unordered collection of items. Sets are defined using the set() function or curly braces with colons.