The major algorithms used in machine learning include:
Linear Regression– Regression fits a line or curve through the data points on a scatter plot so that the vertical distance between the data points and the fitted line is minimized.
If the regression models a linear relationship between the two plotted variables, it is called linear regression.
For example, umbrella sales can be treated as the dependent variable and rainfall as the independent variable.
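As a minimal sketch of the umbrella-vs-rainfall example (using scikit-learn, which the article does not name, and made-up numbers):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: rainfall in mm (independent) vs. umbrellas sold (dependent)
rainfall = np.array([[10], [20], [30], [40], [50]])
umbrellas = np.array([12, 22, 31, 43, 52])

model = LinearRegression()
# Fitting chooses the line that minimizes the sum of squared vertical distances
model.fit(rainfall, umbrellas)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
prediction = model.predict([[35]])  # expected umbrella sales at 35 mm of rain
print(prediction)
```

The fitted slope tells us how many extra umbrellas are sold per additional mm of rain.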
Support Vector Machine
Support Vector Machine is a supervised ML algorithm used for classification or regression problems. Each data item is represented as a point in x-dimensional space, where x is the number of features, and the value of each feature is the value of a particular coordinate. The goal is to find the best line or decision boundary (a hyperplane) that separates this x-dimensional space into the respective classes. For example, suppose I come across a character that could be either a digit or a letter. Using SVM, we can identify which it is: we train the model on images of digits and letters, letting it learn their geometric features, and then test it on the new character.
How does SVM work? The hyperplane is positioned by the support vectors, the training points closest to the decision boundary; based on which side of that boundary the new point falls, we identify whether it is a digit or a letter.
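A small sketch of this idea (scikit-learn and toy 2-D feature vectors are my assumptions, standing in for features extracted from character images):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D feature vectors for two classes
X = np.array([[1, 1], [1, 2], [2, 1],     # class 0: "digit"
              [6, 6], [6, 7], [7, 6]])    # class 1: "letter"
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# The support vectors are the points closest to the separating hyperplane
print("support vectors:\n", clf.support_vectors_)
preds = clf.predict([[2, 2], [6, 5]])
print(preds)
```

New points are classified by which side of the learned hyperplane they land on.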
Naïve Bayes- It is a supervised learning algorithm based on Bayes' theorem:
P(A|B) = P(B|A) * P(A) / P(B)
P(A)- Probability of the hypothesis (the prior)
P(B)- Probability of the evidence
P(A|B)- Probability of hypothesis A given evidence B (the posterior)
P(B|A)- Probability of the evidence given that the hypothesis is true (the likelihood)
It is generally used in text classification. It is called Naïve Bayes because it naïvely assumes each feature occurs independently of the other features. Naïve Bayes can be used for both binary and multi-class classification.
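A minimal text-classification sketch (scikit-learn's multinomial Naïve Bayes and the tiny spam dataset are my assumptions, not from the article):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training texts for a binary spam classifier
texts = ["win money now", "free prize win", "meeting at noon", "lunch at noon"]
labels = [1, 1, 0, 0]   # 1 = spam, 0 = not spam

# Turn each text into word-count features
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

# Classify an unseen message
result = clf.predict(vec.transform(["free money"]))
print(result)
```

Under the hood the model applies Bayes' theorem per word, multiplying the per-word likelihoods as if the words were independent.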
Logistic Regression– Logistic Regression is an ML algorithm used for classification. Unlike linear regression, it uses a sigmoid function (also called the logistic function) instead of a linear function. The sigmoid function maps the model's output to a probability between 0 and 1, which is then used for prediction.
Sigmoid- f(x) = 1 / (1 + e^(-x))
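The formula above can be written directly as a function, showing how any real-valued input is squashed into the (0, 1) probability range:

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); maps any real number into (0, 1)
    return 1 / (1 + math.exp(-x))

print(sigmoid(0))    # midpoint: an input of 0 gives probability 0.5
print(sigmoid(5))    # large positive inputs approach 1
print(sigmoid(-5))   # large negative inputs approach 0
```

In logistic regression, x is the linear combination of the features, and the sigmoid output is thresholded (typically at 0.5) to pick a class.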
K-Nearest Neighbors – This algorithm measures the similarity between a new data point and the existing data points and assigns the new point to the category most common among its k nearest neighbors.
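A short sketch of this neighbor vote (scikit-learn and the toy points are my assumptions):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated hypothetical groups
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# A new point is assigned the majority class among its 3 nearest neighbors
pred = knn.predict([[2, 2]])
print(pred)
```

Note that KNN does no real "training"; it simply stores the data and votes at prediction time.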
Random Forest
It is a supervised ML algorithm made up of many decision trees, the whole ensemble forming a "forest" of trees. The predictions of these randomized decision trees are combined, by majority vote or averaging, to produce a more accurate prediction than any single tree.
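A minimal sketch of the voting forest (scikit-learn and the toy data are assumptions of mine):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0, 0], [0, 1], [1, 0],
              [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# 100 decision trees, each trained on a random resample of the data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Each tree votes; the majority vote becomes the prediction
preds = forest.predict([[1, 1], [6, 6]])
print(preds)
```

Averaging many decorrelated trees reduces the overfitting that a single deep tree is prone to.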
K-Means Clustering
It is an unsupervised ML algorithm used to group unlabeled data into k clusters, assigning each point to the cluster whose center (mean) is nearest.
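A small sketch, again assuming scikit-learn and invented points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data with two visually obvious groups
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])

# Ask for k = 2 clusters; the algorithm alternates between assigning
# points to the nearest center and moving each center to its cluster mean
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)
```

No class labels are provided; the cluster ids (0/1) are discovered from the geometry of the data alone.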
Hierarchical Clustering
Here, similar objects are grouped into clusters. The groups are formed so that objects within one cluster resemble each other more than they resemble objects in other clusters. The end result is a hierarchy of clusters.
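A sketch of the bottom-up (agglomerative) variant, where the closest clusters are merged step by step until the requested number remains (scikit-learn and the sample points are my assumptions):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])

# Start with every point as its own cluster, then repeatedly merge the
# two closest clusters; cutting the hierarchy at 2 gives two groups
hc = AgglomerativeClustering(n_clusters=2)
labels = hc.fit_predict(X)
print(labels)
```

The full merge history forms the hierarchy (often drawn as a dendrogram); choosing where to cut it decides how many clusters you get.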
DBScan Clustering
With the density-based scan algorithm, clusters are defined as dense regions of points, so a few outliers or small changes in the data do not substantially affect the clustering outcome.
The algorithm uses two parameters:
- minPts: the minimum number of points (a threshold) that must be grouped together for a region to be considered dense
- eps (ε): the distance measure used to locate points in the neighbourhood of any other point
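A sketch of both parameters in action (scikit-learn and the toy points, including a deliberate outlier, are my assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8],
              [50, 50]])   # an outlier far from any dense region

# eps = 2.0: neighbourhood radius; min_samples = 3: points needed
# within that radius for a region to count as dense
db = DBSCAN(eps=2.0, min_samples=3)
labels = db.fit_predict(X)

# Points in no dense region are labelled -1 (noise)
print(labels)
```

Unlike k-means, DBSCAN needs no preset cluster count, and the outlier is simply marked as noise instead of distorting a cluster.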