Important ML algorithms and how to choose between them
By admin
July 12th, 2023 · 16 min read
What Is a Machine Learning (ML) Algorithm?
An ML algorithm is a mathematical model, or a set of rules, that learns how to reach a given goal from data instead of being explicitly coded for it. It is the core component of an ML system: it allows the system to learn and improve automatically from new data, in other words from new experience.
ML algorithms use statistical techniques to identify patterns and relationships between the inputs and outputs in the data. The resulting model is then used to make predictions or decisions on new, unseen data. This is the power of machine learning, and its accuracy depends mostly on:
the training algorithm, and
the quality of the data used in the training phase.
In this article, let’s discuss the main ML algorithms and the types of tasks each one suits best.
What are the most common types of ML algorithms?
The most common families of ML algorithms are listed below. We will discuss each in more detail in upcoming articles. There are three main types under the ML umbrella: supervised, unsupervised, and reinforcement learning.
Supervised Algorithms:
Supervised learning algorithms use labeled datasets, in which each input is paired with its expected output. Typically, the data are labeled by an external supervisor (a human or a program). For example, in Figure 03 [left], images of cats carrying the label "cat" are fed to the algorithm for training.
Several supervised algorithms can be used to train ML models. The trained model is then used to make predictions on new, unseen data (Figure 03 [right]).
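The train-then-predict workflow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's own implementation: the function names `fit_line` and `predict` and the toy data are assumptions made up for the example, and the model is a simple least-squares line.

```python
def fit_line(xs, ys):
    """Training step: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction step: apply the learned line to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: each input is paired with a known output (y = 2x + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # learn from labeled examples
prediction = predict(model, 10.0)    # apply to an input not seen in training
print(model, prediction)
```

The labels (`train_y`) are only needed during training; at prediction time the model sees the input alone.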
# Supervised Learning Algorithms:
Here are some important supervised algorithms from the literature.
| Supervised Learning Algorithm | Description | Use cases |
| --- | --- | --- |
| Regression: Linear Regression | Captures the relation between the input variables and the output variable using a linear equation and finds the best-fitting line, the one that minimizes the difference between the predicted and actual values. | Estimating real values (number of vehicle registrations, revenue of a company) |
| Regression: Polynomial | Learns the relation between the input variables and the output variable using a polynomial equation and finds the best-fitting curve, the one that minimizes the difference between the predicted and actual values. | Estimating real values (nonlinear relation between the temperature of a chemical reaction and the reaction rate) |
| Classification: Logistic Regression | Captures the relation between the input variables and the probability of belonging to a particular class. It applies a logistic function to map the input features to the probability of the output class. | Binary classification tasks |
| Classification: Naive Bayes | Probabilistic classifier based on Bayes’ theorem. Assuming the features are independent given the class, it calculates the probability of each class for a given input. It is simple and needs little computation, so it is fast. | Text classification tasks |
| Classification/Regression: Random Forest | Widely adopted in industry because it is easy to deploy and gives good results, even though it often performs a little worse than XGBoost. It is an ensemble learning method: it builds a collection of decision trees and predicts based on the majority vote (classification) or average prediction (regression) of the individual trees. | |
| Classification/Regression: XGBoost (Extreme Gradient Boosting) | Often gives the best results among the algorithms listed here. It optimizes an objective function by iteratively adding new models, each one fitted to reduce the loss left by the previous ones. | Recommendation Systems, Image Processing, Natural Language Processing (NLP), etc. |
| Classification/Regression: K-Nearest Neighbors (KNN) | KNN is a non-parametric algorithm that classifies or predicts the output based on the majority vote or average of the k nearest neighbors in the training data, where “k” is a user-defined hyperparameter. | Movie recommendation systems, Anomaly Detection |
| Classification/Regression: Support Vector Machine (SVM) | Constructs a hyperplane in a high-dimensional feature space to separate different classes. The algorithm aims to maximize the margin between the classes, which helps it generalize to unseen data. SVM can also handle nonlinear relationships by using kernel functions to map the data into a higher-dimensional space. | |
| Classification/Regression: Decision Tree | Creates a tree-like model where each internal node represents a test on a feature, and each leaf node represents a class or a predicted value. Decision trees typically make binary splits based on feature thresholds to partition the data into homogeneous subsets. They are easy to interpret and can handle both categorical and numerical features. | |
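To make one row of the table concrete, here is a minimal sketch of KNN classification by majority vote, written in plain Python. The function name `knn_predict`, the toy points, and the "cat"/"dog" labels are hypothetical and chosen only for illustration; a real project would more likely use an existing library.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training label with its Euclidean distance to the query,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(points, labels, (2, 2)))  # query near the first group
print(knn_predict(points, labels, (7, 8)))  # query near the second group
```

Note that KNN has no separate training step: the "model" is the stored training data, and all the work happens at prediction time.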
Unsupervised ML Algorithms:
Unsupervised learning uses unlabeled data and tries to group similar items together or find hidden structure. It is like sorting a pile of different objects into groups without being given any categories, purely by matching the similarities in the data. These algorithms are very useful for letting a machine explore the data on its own, without being told explicitly what is expected.
Several unsupervised algorithms can be used to train ML models. The trained model is then used to cluster or categorize new data points (Figure 04).
# Unsupervised Algorithms:
Here are some important unsupervised algorithms from the literature.
| Unsupervised Algorithm | Description | Use case |
| --- | --- | --- |
| Dimensionality Reduction: Singular Value Decomposition (SVD) | SVD and PCA are dimensionality reduction techniques used to extract the most important features from high-dimensional data. SVD factorizes a matrix into three matrices, capturing the latent relationships between variables. | Collaborative Filtering for Recommender Systems |
| Dimensionality Reduction: Principal Component Analysis (PCA) | PCA can be computed by applying SVD to mean-centered data. It identifies orthogonal components that explain the maximum variance in the data. These techniques are widely used for feature extraction, noise reduction, and data visualization. | Dimensionality Reduction for Classification |
| Clustering: K-means | K-means is an unsupervised clustering algorithm used to partition data into k distinct clusters. It assigns each data point to the nearest cluster centroid based on a distance metric (typically Euclidean distance). K-means aims to minimize the within-cluster sum of squared distances, forming tight clusters. It is a simple and efficient algorithm, but k must be chosen in advance. | Customer Segmentation for Marketing |
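The K-means description above (assign each point to the nearest centroid, then move each centroid to the mean of its points) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `kmeans`, the toy points, and the hand-picked initial centroids are hypothetical, and practical implementations choose the starting centroids more carefully.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=10):
    """Alternate assignment and centroid-update steps until nothing changes."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    assignments = [nearest_index(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical unlabeled data with two visible groups; k = 2.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, assignments = kmeans(data, [data[0], data[3]])
print(final_centroids, assignments)
```

No labels appear anywhere in the input: the grouping comes only from distances between points, which is what makes this unsupervised.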
Reinforcement Learning (RL) Algorithms:
In reinforcement learning, the model controls an agent (physical or virtual). The agent interacts with its environment (physical or virtual) by sensing the current state, and it learns to take actions that maximize its long-term reward. Training a dog to fetch a stick follows a similar approach. The dog is the agent, and the environment is the area where the dog interacts with the stick and other objects. The goal is to bring back the thrown object, and the reward is a snack. Through trial and error, the dog eventually learns the action that reaches the goal: picking up the stick and bringing it back.
Several RL algorithms can be used to train ML models. The trained models are then used by the agent to take actions or make decisions in a given environment (Figure 05 [right]).
# Reinforcement Learning (RL) Algorithms:
We will now look at the main RL algorithms and what each one is used for and best suited to.
| RL algorithm | Description | Use case |
| --- | --- | --- |
| Model-free RL: Q-Learning | Q-Learning is a popular model-free RL algorithm. It is based on learning action-value functions (Q-values) that represent the expected cumulative reward for taking a particular action in a given state. The agent iteratively updates the Q-values from the observed rewards while balancing exploration and exploitation. | Problems with discrete state and action spaces. |
| Model-free RL: Deep Q-Network (DQN) | An extension of Q-Learning that combines RL with deep neural networks. It uses a deep neural network to approximate Q-values, enabling RL in environments with high-dimensional state spaces. DQN uses experience replay and target networks to stabilize learning and improve sample efficiency. | Playing Atari games and controlling robotic systems. |
| Model-free RL: Policy Gradient (PG) | Policy Gradient methods directly optimize a parameterized policy that maps states to actions. These methods use gradient ascent to update the policy parameters, aiming to maximize the expected cumulative reward. | Tasks like robotic control and game playing. |
| Model-Based Monte Carlo (MBMC) | MBMC is an RL algorithm that learns a model of the environment dynamics through Monte Carlo simulation. It collects samples by executing actions in the real environment and then uses those samples to learn a transition model and a reward model. | The trained models are used for planning and decision-making. |
| Model-Based: Dyna-Q | Combines Q-Learning with planning. It maintains a learned model of the environment and uses it to simulate additional transitions. The simulated transitions are used to update the Q-values, allowing for more efficient exploration and faster convergence. | |
| Model-Based Tree Search | Model-Based Tree Search algorithms, such as Monte Carlo Tree Search (MCTS), combine RL with tree search techniques. These algorithms build a search tree by iteratively simulating trajectories using the learned model. | Enables efficient exploration and planning to find the optimal actions. |
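As an illustration of the Q-Learning row, the sketch below trains a Q-table on a tiny made-up environment: a corridor of five states where the agent starts on the left and earns a reward of 1 for reaching the rightmost state. The environment, the names `step` and `train`, and the hyperparameter values are assumptions chosen for the example. Each update moves Q(s, a) toward `reward + gamma * max Q(next_state, ·)`.

```python
import random

N_STATES = 5       # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states: the action with the highest Q-value.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The algorithm never looks inside `step`; it learns only from the rewards it observes, which is what "model-free" means in the table above.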
Conclusion
In conclusion, machine learning algorithms play a crucial role in solving complex problems and extracting insights from data. In this article, we have given an overview of the main supervised, unsupervised, and reinforcement learning algorithms. In future articles, we will look at each algorithm in detail, focusing mostly on implementation with Python code. Stay connected with us on social media for the latest updates on this series.
Last updated on July 12th, 2023.