The major algorithms used in machine learning include:
Linear Regression– Regression fits a line or curve through the data points on a scatter plot so that the vertical distance between the data points and the fitted line is minimized.
If the regression models a linear relationship between the two plotted variables, it is called linear regression.
For example, umbrella sales can be treated as the dependent variable and rainfall as the independent variable.
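As a minimal sketch of the umbrella-vs-rainfall example (using scikit-learn, which the article does not name, and made-up numbers):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: rainfall in mm (independent) vs. umbrellas sold (dependent)
rainfall = np.array([[10], [20], [30], [40], [50]])
umbrellas = np.array([12, 22, 31, 43, 52])

model = LinearRegression()
# Fitting chooses the line that minimizes the sum of squared vertical distances
model.fit(rainfall, umbrellas)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
prediction = model.predict([[35]])  # expected umbrella sales at 35 mm of rain
print(prediction)
```

The fitted slope tells us how many extra umbrellas are sold per additional mm of rain.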
Support Vector Machine
Support Vector Machine is a supervised ML algorithm used for classification or regression problems. Each data item is represented as a point in x-dimensional space, where x is the number of features, and the value of each feature is the value of a particular coordinate. The goal is to find the best line or decision boundary (a hyperplane) that separates this x-dimensional space into the respective classes. For example, suppose I come across a character that could be either a digit or a letter. Using SVM, we can identify which it is: we train the model on images of digits and letters, letting it learn their geometric features, and then test it on the new character.
How does SVM work? The hyperplane is positioned by the support vectors, the training points closest to the decision boundary; based on which side of that boundary the new point falls, we identify whether it is a digit or a letter.
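A small sketch of this idea (scikit-learn and toy 2-D feature vectors are my assumptions, standing in for features extracted from character images):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D feature vectors for two classes
X = np.array([[1, 1], [1, 2], [2, 1],     # class 0: "digit"
              [6, 6], [6, 7], [7, 6]])    # class 1: "letter"
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# The support vectors are the points closest to the separating hyperplane
print("support vectors:\n", clf.support_vectors_)
preds = clf.predict([[2, 2], [6, 5]])
print(preds)
```

New points are classified by which side of the learned hyperplane they land on.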
Naïve Bayes- It is a supervised learning algorithm based on Bayes' theorem:
P(A|B) = P(B|A) * P(A) / P(B)
P(A)- Probability of the hypothesis (the prior)
P(B)- Probability of the evidence
P(A|B)- Probability of hypothesis A given evidence B (the posterior)
P(B|A)- Probability of the evidence given that the hypothesis is true (the likelihood)
It is generally used in text classification. It is called Naïve Bayes because it naïvely assumes each feature occurs independently of the other features. Naïve Bayes can be used for both binary and multi-class classification.
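A minimal text-classification sketch (scikit-learn's multinomial Naïve Bayes and the tiny spam dataset are my assumptions, not from the article):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training texts for a binary spam classifier
texts = ["win money now", "free prize win", "meeting at noon", "lunch at noon"]
labels = [1, 1, 0, 0]   # 1 = spam, 0 = not spam

# Turn each text into word-count features
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

# Classify an unseen message
result = clf.predict(vec.transform(["free money"]))
print(result)
```

Under the hood the model applies Bayes' theorem per word, multiplying the per-word likelihoods as if the words were independent.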
Logistic Regression– Logistic Regression is an ML algorithm used for classification. Unlike linear regression, it uses a sigmoid function (also called the logistic function) instead of a linear function. The sigmoid function maps the model's output to a probability between 0 and 1, which is then used for prediction.
Sigmoid- f(x) = 1 / (1 + e^(-x))
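The formula above can be written directly as a function, showing how any real-valued input is squashed into the (0, 1) probability range:

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); maps any real number into (0, 1)
    return 1 / (1 + math.exp(-x))

print(sigmoid(0))    # midpoint: an input of 0 gives probability 0.5
print(sigmoid(5))    # large positive inputs approach 1
print(sigmoid(-5))   # large negative inputs approach 0
```

In logistic regression, x is the linear combination of the features, and the sigmoid output is thresholded (typically at 0.5) to pick a class.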
K-Nearest Neighbors – This algorithm measures the similarity between a new data point and the existing data points and assigns the new point to the category most common among its k nearest neighbors.
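A short sketch of this neighbor vote (scikit-learn and the toy points are my assumptions):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated hypothetical groups
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# A new point is assigned the majority class among its 3 nearest neighbors
pred = knn.predict([[2, 2]])
print(pred)
```

Note that KNN does no real "training"; it simply stores the data and votes at prediction time.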
Random Forest
It is a supervised ML algorithm made up of many decision trees, the whole ensemble forming a "forest" of trees. The predictions of these randomized decision trees are combined, by majority vote or averaging, to produce a more accurate prediction than any single tree.
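A minimal sketch of the voting forest (scikit-learn and the toy data are assumptions of mine):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0, 0], [0, 1], [1, 0],
              [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# 100 decision trees, each trained on a random resample of the data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Each tree votes; the majority vote becomes the prediction
preds = forest.predict([[1, 1], [6, 6]])
print(preds)
```

Averaging many decorrelated trees reduces the overfitting that a single deep tree is prone to.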
K-Means Clustering
It is an unsupervised ML algorithm used to group unlabeled data into k clusters, assigning each point to the cluster whose center (mean) is nearest.
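A small sketch, again assuming scikit-learn and invented points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data with two visually obvious groups
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])

# Ask for k = 2 clusters; the algorithm alternates between assigning
# points to the nearest center and moving each center to its cluster mean
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)
```

No class labels are provided; the cluster ids (0/1) are discovered from the geometry of the data alone.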
Hierarchical Clustering
Here, similar objects are grouped into clusters. The groups are formed so that objects within one cluster resemble each other more than they resemble objects in other clusters. The end result is a hierarchy of clusters.
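A sketch of the bottom-up (agglomerative) variant, where the closest clusters are merged step by step until the requested number remains (scikit-learn and the sample points are my assumptions):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])

# Start with every point as its own cluster, then repeatedly merge the
# two closest clusters; cutting the hierarchy at 2 gives two groups
hc = AgglomerativeClustering(n_clusters=2)
labels = hc.fit_predict(X)
print(labels)
```

The full merge history forms the hierarchy (often drawn as a dendrogram); choosing where to cut it decides how many clusters you get.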
DBScan Clustering
With the density-based scan algorithm, clusters are defined as dense regions of points, so a few outliers or small changes in the data do not substantially affect the clustering outcome.
The algorithm uses two parameters:
- minPts: the minimum number of points (a threshold) that must be grouped together for a region to be considered dense
- eps (ε): the distance measure used to locate points in the neighbourhood of any other point
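A sketch of both parameters in action (scikit-learn and the toy points, including a deliberate outlier, are my assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8],
              [50, 50]])   # an outlier far from any dense region

# eps = 2.0: neighbourhood radius; min_samples = 3: points needed
# within that radius for a region to count as dense
db = DBSCAN(eps=2.0, min_samples=3)
labels = db.fit_predict(X)

# Points in no dense region are labelled -1 (noise)
print(labels)
```

Unlike k-means, DBSCAN needs no preset cluster count, and the outlier is simply marked as noise instead of distorting a cluster.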