Thirumala Reddy
December 13, 2022
AI(Artificial Intelligence):-
AI is the broad field of building applications that can perform their own tasks without any human interaction.
eg:- the Netflix movie recommendation system, and the Amazon recommendation system for buying products.
ML(Machine Learning):-
Machine learning is a subfield of artificial intelligence (AI). Machine learning deals with the concept that a computer program can learn and adapt to new data, without human interference, by using different algorithms.
DL(Deep Learning):-
Deep learning is a subset of machine learning that uses algorithms modeled on the human brain. These deep learning algorithms are called artificial neural networks.
DS(Data Science):-
Data science is the study of data. The role of a data scientist involves developing methods of recording, storing, and analyzing data to effectively extract useful information. The final goal of data science is to gain insights and knowledge from any type of data.
Let's discuss Machine Learning.
Machine Learning is divided into 3 types:
1)Supervised Machine Learning
2)Unsupervised Machine Learning
3)Reinforcement Machine Learning
1)Supervised Machine Learning:-
Supervised Machine Learning has 2 types
1)Classification
2)Regression
Classification:-
-->Classification is the process of categorizing given data into different classes.
-->Classification can be performed on both structured and unstructured data.
eg:-Classifying whether a mail is spam or not spam.
Regression:-
-->Regression is the process of predicting a continuous output value (e.g., a price or a salary) from input features.
eg:-I have a company & I want to release 2 products. The 1st product is costly, so I want to target rich people; the 2nd product is medium-cost, so I want to target middle-class people. So when I am doing ad marketing, I can apply customer segmentation & focus on those particular clusters.
R^2 = 1 - (SS_res / SS_tot)
where R^2 is the coefficient of determination, SS_res is the sum of squares of residuals, and SS_tot is the total sum of squares.
-->The top-right chart of the above fig indicates polynomial regression with a degree equal to 2.
-->R^2 considers all features, whether or not they actually affect the output, so adding even an irrelevant feature never decreases R^2. Adjusted R^2 corrects for this by penalizing features that don't improve the model, so it only rewards the features that matter.
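The point above can be sketched with toy data (this example, including the `adjusted_r2` helper, is my own illustration, not code from the post): adding a pure-noise feature never lowers the training R^2 of an ordinary least-squares fit, while adjusted R^2 penalizes it.

```python
# Illustrative sketch: compare R^2 and adjusted R^2 when a useless
# random feature is added to a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p features."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))
y = 3 * x[:, 0] + rng.normal(scale=0.5, size=n)

# Model 1: only the informative feature
r2_1 = r2_score(y, LinearRegression().fit(x, y).predict(x))

# Model 2: the same feature plus a pure-noise column
x2 = np.hstack([x, rng.normal(size=(n, 1))])
r2_2 = r2_score(y, LinearRegression().fit(x2, y).predict(x2))

print(r2_2 >= r2_1)   # plain R^2 never decreases on training data
print(adjusted_r2(r2_1, n, 1), adjusted_r2(r2_2, n, 2))
```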
Uses of regularization (Ridge & Lasso):
1)Preventing Overfitting
2)Performing Feature Selection
Assumptions of Linear Regression:-
1)Linearity: the relationship between the features and the target is linear.
2)Independence: the observations (and their errors) are independent of each other.
3)Homoscedasticity: the residuals have constant variance at every level of the features.
4)Normality: the residuals are normally distributed.
1)Binary Classification
From the above figure, we can observe that in Linear Regression, if an outlier is present, the best-fit line changes, which results in misclassification of data if we use Linear Regression for classification. This is not the case with Logistic Regression: its curve is "S"-shaped (a sigmoid), not a straight line as in Linear Regression. Because of this "S" shape, data points are classified accurately even in the presence of outliers. So, for classification of data, Logistic Regression is used.
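The "S" curve mentioned above is the sigmoid function; a minimal sketch (my own helper, not code from the post):

```python
# The logistic (sigmoid) function squashes any real input into (0, 1),
# giving the "S"-shaped curve used by Logistic Regression.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))     # 0.5, the decision boundary
print(sigmoid(10))    # close to 1
print(sigmoid(-10))   # close to 0
```

Because extreme inputs saturate near 0 or 1, a single outlier barely moves the decision boundary, unlike the straight line of Linear Regression.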
A confusion matrix is one of the ways to evaluate the performance of our algorithm. To construct a confusion matrix, we take both the predicted & actual responses of our model & arrange them as below
where TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives.
-->Accuracy = (TP+TN)/(TP+TN+FP+FN). Generally, accuracy is taken into consideration only for balanced data.
3)PRECISION:-
Precision can be defined as: out of all predicted positive values, how many are actually positive.
eg:-In spam classification, we should concentrate on reducing FP. Even though a mail we got is not spam, if our algorithm detects it as spam, we are going to miss our important mails. So, to avoid this case, we should concentrate on reducing FP.
4)RECALL:-
Recall can be defined as: out of the total actual positive values, how many did we correctly predict as positive.
eg:-In classifying whether a person has cancer or not, FN is more important to reduce. If our model predicts that a person doesn't have cancer even though he has it, this leads to the growth of cancer cells in his body & affects his health.
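The metrics above can be checked on tiny hand-made labels (the labels here are my own toy example, not data from the post):

```python
# Relating the confusion matrix to accuracy, precision, and recall
# on the same set of predictions.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

# sklearn's confusion_matrix ravels in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                  # 3 2 1 2
print(accuracy_score(y_true, y_pred))  # (TP+TN)/(TP+TN+FP+FN) = 5/8 = 0.625
print(precision_score(y_true, y_pred)) # TP/(TP+FP) = 3/5 = 0.6
print(recall_score(y_true, y_pred))    # TP/(TP+FN) = 3/4 = 0.75
```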
Let's understand how the Naive Bayes classifier works by using the below example:
Let's take a dataset of weather conditions and the corresponding target variable "Play". The dataset has records of different weather conditions and, for each condition, whether he/she can play or not.
Now, using the dataset, we classify whether he/she can play when the weather is sunny.
Solution: To solve this, first consider the below dataset:
No. | Outlook | Play
0 | Rainy | Yes |
1 | Sunny | Yes |
2 | Overcast | Yes |
3 | Overcast | Yes |
4 | Sunny | No |
5 | Rainy | Yes |
6 | Sunny | Yes |
7 | Overcast | Yes |
8 | Rainy | No |
9 | Sunny | No |
10 | Sunny | Yes |
11 | Rainy | No |
12 | Overcast | Yes |
13 | Overcast | Yes |
Weather | Yes | No |
Overcast | 5 | 0 |
Rainy | 2 | 2 |
Sunny | 3 | 2 |
Total | 10 | 4
Likelihood table for the weather conditions:
Weather | No | Yes | Total
Overcast | 0 | 5 | 5/14= 0.35 |
Rainy | 2 | 2 | 4/14=0.29 |
Sunny | 2 | 3 | 5/14=0.35 |
All | 4/14=0.29 | 10/14=0.71 |
Applying Bayes' theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|No)= 2/4=0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
From above result we can notice that P(Yes|Sunny)>P(No|Sunny)
Hence on a Sunny day, Players can play the game.
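The hand calculation above can be reproduced directly from the same 14-row table (plain counting, no library assumed; with exact fractions P(No|Sunny) comes out as 0.40, while the post's 0.41 comes from rounding the intermediate values):

```python
# Naive Bayes by hand: P(Yes|Sunny) and P(No|Sunny) from the weather table.
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play    = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
           "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n            # 10/14
p_no = play.count("No") / n              # 4/14
p_sunny = outlook.count("Sunny") / n     # 5/14
p_sunny_given_yes = sum(o == "Sunny" and p == "Yes"
                        for o, p in zip(outlook, play)) / play.count("Yes")  # 3/10
p_sunny_given_no = sum(o == "Sunny" and p == "No"
                       for o, p in zip(outlook, play)) / play.count("No")    # 2/4

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
print(round(p_no_given_sunny, 2))   # 0.4
```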
K-Nearest Neighbor
-->KNN works on the distance concept
Manhattan Distance:-
Let us consider two points A(X1,Y1) & B(X2,Y2). The Manhattan Distance between A & B is:
Manhattan Distance = |X1-X2| + |Y1-Y2|
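A small sketch of the two common KNN distance metrics (the helper functions are my own illustration):

```python
# Manhattan vs Euclidean distance between two 2-D points A and B.
import math

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

A, B = (1, 2), (4, 6)
print(manhattan(A, B))  # |1-4| + |2-6| = 7
print(euclidean(A, B))  # sqrt(3^2 + 4^2) = 5.0
```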
Assumptions of K-Nearest Neighbor:-
-->From the above, we can observe that as the number of features increases, the performance of the model decreases (the curse of dimensionality).
Leaf Node:
The leaf nodes (green), also called terminal nodes, are nodes that don't split into more nodes.
-->A node is 100% impure when its data is split evenly 50/50 between classes, and 100% pure when all of its data belongs to a single class. To optimize our model, we need to reach maximum purity and avoid impurity.
In the decision Tree, the purity of the split is measured by
1)Entropy
2)Gini Impurity
The features are selected based on the value of Information Gain
1)Entropy
-->Entropy helps us to build an appropriate decision tree for selecting the best splitter.
-->Entropy can be defined as a measure of the purity of the sub-split.
-->For binary classification, entropy always lies between 0 and 1.
-->The entropy of any split can be calculated by the formula: Entropy = -Σ p_i log2(p_i), where p_i is the proportion of samples in class i.
-->The split with the lowest entropy is selected & we proceed further.
Information Gain:-
Information gain is the basic criterion to decide whether a feature should be used to split a node or not. The feature with the optimal split i.e., the highest value of information gain at a node of a decision tree is used as the feature for splitting the node
-->Information Gain for a split is calculated by subtracting the weighted entropies of each branch from the original entropy. When training a Decision Tree using these metrics, the best split is chosen by maximizing Information Gain.
-->The feature with the highest Information Gain is selected & we proceed further.
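The entropy formula and the Information Gain rule above can be sketched as follows (the helper names and the perfectly separable toy labels are my own illustration):

```python
# Entropy of a set of labels, and Information Gain of a candidate split.
import math

def entropy(labels):
    n = len(labels)
    probs = [labels.count(v) / n for v in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, branches):
    # parent entropy minus the size-weighted entropies of the branches
    n = len(parent)
    weighted = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(parent) - weighted

parent = ["Yes"] * 5 + ["No"] * 5       # 50/50 split -> entropy = 1.0
left, right = ["Yes"] * 5, ["No"] * 5   # pure branches -> entropy = 0
print(entropy(parent))                   # 1.0
print(information_gain(parent, [left, right]))  # 1.0, a perfect split
```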
-->Gini impurity = 1 - Σ p_i^2. For binary classification it has a maximum value of 0.5, which is the worst we can get, and a minimum value of 0, which is the best we can get.
A decision tree builds regression or classification models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast, and Rainy), each representing values for the attribute tested. The leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree corresponds to the best predictor called the root node. Decision trees can handle both categorical and numerical data.
-->In regression problems, the Decision Tree performs splitting based on the mean squared error (MSE).
The strengths of decision tree methods are:
-->They are simple to understand and interpret.
-->They can handle both categorical and numerical data with little data preparation.
The weaknesses of decision tree methods are:
-->They are prone to overfitting, especially when grown deep.
-->Small changes in the data can produce a very different tree (instability).
Overfitting in Decision Trees can be reduced by tuning hyperparameters such as max_depth and min_samples_split, or by pruning the tree using cost_complexity_pruning.
In this blog, I will use GridSearchCV for hyperparameter tuning.
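A minimal GridSearchCV sketch (the toy dataset and the parameter grid are my own assumptions) tuning the max_depth and min_samples_split hyperparameters mentioned above:

```python
# Grid search over Decision Tree hyperparameters with 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

param_grid = {"max_depth": [2, 4, 6], "min_samples_split": [2, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the combination with the best CV score
print(search.best_score_)
```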
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and based on the majority votes of predictions, predicts the final output.
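A short sketch of the ensemble idea above (toy data assumed, not from the post): the forest holds many decision trees and combines their votes.

```python
# A Random Forest of 100 trees, scored on a held-out split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(len(forest.estimators_))  # 100 individual decision trees inside
print(acc)                      # majority-vote accuracy on the test split
```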
-->The algorithms that come under Boosting are AdaBoost, Gradient Boosting, and XGBoost.
The formula to calculate the initial sample weights is: w = 1/N, where N is the total number of data points, so every point starts with equal weight.
Step2:-Creating a decision stump
We’ll create a decision stump for each of the features and then calculate the Gini Index of each tree. The tree with the lowest Gini Index will be our first stump.
Step3:-Calculating the Amount of Say
We now calculate the "Amount of Say" (also called "Importance" or "Influence") of this classifier using the formula: Amount of Say = (1/2) * ln((1 - Total Error) / Total Error).
Step4:- Updating the weights
We need to update the weights because if the same weights are applied to the next model, its output will be the same as what was received from the first model.
The wrong predictions will be given more weight whereas the correct predictions' weights will be decreased. Now when we build our next model after updating the weights, more preference will be given to the points with higher weights.
After finding the importance of the classifier and the total error, we update the weights using the formula: new weight = old weight × e^(+Amount of Say) for misclassified points, and new weight = old weight × e^(-Amount of Say) for correctly classified points.
Step5:-Creating the buckets
We will create buckets based on Normalized weights
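The AdaBoost steps above can be sketched on a tiny example (the five data points and the stump's predictions are my own assumptions): equal initial weights, the amount of say, the weight update, and normalization.

```python
# One AdaBoost round: initial weights, amount of say, weight update.
import math

n = 5                                      # total number of data points
weights = [1 / n] * n                      # Step 1: w = 1/N for every point
correct = [True, True, True, True, False]  # assumed stump predictions

# Total Error = sum of weights of the misclassified points
total_error = sum(w for w, c in zip(weights, correct) if not c)   # 0.2
say = 0.5 * math.log((1 - total_error) / total_error)             # Amount of Say

# Step 4: raise weights of wrong points, lower weights of right ones
updated = [w * math.exp(-say if c else say) for w, c in zip(weights, correct)]
normalized = [w / sum(updated) for w in updated]   # feeds Step 5 (the buckets)
print(round(normalized[-1], 2))  # the misclassified point now carries weight 0.5
```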
-->The classification of data points can be done using a single straight line, or using a nonlinear boundary. For a linear boundary we use kernel="linear" and for a nonlinear boundary we use kernel="rbf".
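The kernel choice above can be demonstrated on data that is not linearly separable (the concentric-circles toy dataset is my own assumption):

```python
# kernel="linear" vs kernel="rbf" on concentric circles.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
print(linear_acc, rbf_acc)  # the rbf kernel separates the circles far better
```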
Below are some of the features of the K-Means clustering algorithm:
-->It is simple, fast, and scales well to large datasets.
-->It always converges (though possibly to a local optimum).
Some of the drawbacks of the K-Means clustering technique are as follows:
-->The number of clusters k must be chosen in advance.
-->It is sensitive to outliers and to the initial placement of centroids.
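A minimal K-Means sketch (toy blobs assumed, not data from the post) showing how to fit the model and read back its centers and labels:

```python
# Fit K-Means on three well-separated blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(km.cluster_centers_.shape)  # (3, 2): one 2-D center per cluster
print(len(set(km.labels_)))       # 3 distinct cluster labels
```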
-->Let’s take six data points A, B, C, D, E, and F for constructing a dendrogram
Silhouette Clustering:-
-->The silhouette can be calculated with any distance metric, such as the Euclidean distance or the Manhattan distance.
STEP1:-
-->For each data point i, we define a(i) as the mean distance from i to all other points in its own cluster, and b(i) as the mean distance from i to all points in the nearest neighbouring cluster. The silhouette value is then s(i) = (b(i) - a(i)) / max(a(i), b(i)), which lies between -1 and 1.
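The silhouette score can be used to compare candidate numbers of clusters (the toy blobs and the range of k values are my own assumptions):

```python
# Mean silhouette score for several candidate k values.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10,
                                        random_state=42).fit_predict(X))
          for k in (2, 3, 4, 5)}
print(max(scores, key=scores.get))  # the k with the best mean silhouette
```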