AI, ML, and DL: Core Concepts and Relationships
How AI, ML, and DL Relate
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are interconnected fields, each building upon the other, but they are not synonymous. AI is the broader concept of machines being able to carry out tasks in a way that we would consider “intelligent.” ML is a subset of AI that focuses on the development of algorithms that allow computers to learn from and make predictions or decisions based on data. DL is a subfield of ML that focuses on artificial neural networks with multiple layers (deep neural networks).
Supervised vs. Unsupervised Learning
| Supervised Learning (SL) | Unsupervised Learning (USL) |
|---|---|
| SL algorithms are trained using labelled data. | USL algorithms are trained using unlabelled data. |
| SL models take direct feedback to check if they are predicting the correct output. | USL models do not take any feedback. |
| SL models predict the output. | USL finds the hidden pattern in data. |
| In SL, input data is provided to the model along with the output. | In USL, only input data is provided to the model. |
| The goal of SL is to train the model so that it can predict the output when given new data. | The goal of USL is to find the hidden pattern and useful insight from the unknown data set. |
| SL model accuracy can be measured directly against the known labels. | USL results are harder to evaluate, since there are no ground-truth labels to compare against. |
| It includes algorithms such as linear regression, logistic regression, support vector machines, decision trees, and multiclass classification methods. | It includes algorithms such as k-means clustering, hierarchical clustering, and the Apriori algorithm. |
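The contrast in the table above can be sketched in a few lines. This is a deliberately minimal illustration with made-up data: a hand-rolled least-squares fit stands in for supervised learning (labelled pairs), and a tiny 1-D two-centre k-means stands in for unsupervised learning (no labels at all).

```python
def fit_line(xs, ys):
    """Supervised: learn y = a*x + b from labelled pairs (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def kmeans_1d(points, iters=10):
    """Unsupervised: find two cluster centres with no labels at all."""
    c1, c2 = min(points), max(points)          # crude initialization
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2

a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])        # labelled data: y = 2x
print(a, b)                                         # slope 2.0, intercept 0.0
print(kmeans_1d([1.0, 1.2, 0.9, 9.8, 10.1, 10.3])) # hidden pattern: two groups
```

The supervised function is told the right answers and reproduces the rule; the unsupervised one is given only inputs and discovers the grouping on its own.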
In practice, machine learning is widely adopted because organizations must continuously learn from their data to stay competitive, while deep learning is reserved for complex learning problems that simpler models cannot solve.
Deep Learning (DL)
DL is a subset of machine learning predicated on learning from examples. It eliminates much of the manual data pre-processing typically required in classical machine learning: a deep learning model learns to perform classification tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance, and are trained using large sets of labelled data together with neural network architectures that contain many layers.
Pros and Cons of Deep Learning
- Pros:
  - High accuracy: deep learning models learn complex patterns from data, achieving strong results across many tasks.
  - Automatic feature extraction: relevant features are learned from raw data, eliminating the need for manual feature engineering.
  - Scalability: with advances in hardware and software, models scale effectively to large datasets and complex problems.
  - Adaptability: models adjust to new data and environments, making them suitable for dynamic scenarios.
- Cons:
  - Data hunger: large amounts of labelled training data are required, which can be expensive and time-consuming to acquire.
  - Computational cost: training often demands significant resources, including powerful GPUs or TPUs and large amounts of memory.
  - Overfitting: complex architectures can memorize noise or specific patterns in the training data rather than generalizing to unseen data.
  - Limited explainability: it is difficult to understand how a deep model arrives at its predictions.
Machine Learning vs. Deep Learning Comparison
| Machine Learning | Deep Learning |
|---|---|
| ML is a superset of Deep Learning. | DL is a subset of Machine Learning. |
| ML algorithms often rely on relatively simple, frequently linear, models. | DL models are complex and highly non-linear. |
| ML can often work well with thousands of data points. | DL typically needs big data, on the order of millions of data points. |
| Not necessary to have costly high-end machines. | High-end machines and high-performing GPUs are required. |
| ML grew out of AI research. | DL grew out of ML; the "deep" refers to the number of layers that give the model its depth. |
Popular Industrial Tools for Deep Learning
TensorFlow: An open-source machine learning framework created by the Google Brain team for implementing machine learning and deep learning applications. Its primary API is Python, which makes it approachable, while its core is implemented in C++ for performance. TensorFlow includes a wide variety of machine learning and deep learning algorithms and can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, and various sequence models. It is among the best-known symbolic maths libraries used for creating neural networks and deep learning models.
Keras: A high-level deep learning API, developed at Google and written in Python, that makes neural network implementation easy. It supports multiple backend engines for the underlying computation, running on top of libraries such as TensorFlow, Theano, or Microsoft Cognitive Toolkit (CNTK). Keras favours a minimal structure that provides a clean, quick way to define deep learning models, which makes it a popular choice for deep learning applications.
PyTorch: An open-source machine learning library for Python, based on Torch and developed by Facebook's AI Research (FAIR) group; it is widely used for applications such as natural language processing, and Uber's probabilistic programming library "Pyro" is built on top of it. PyTorch reimplements Torch in Python while sharing the same core C libraries for the backend, tuned to run Python efficiently, and it retains the GPU-based hardware acceleration and extensibility that made the Lua-based Torch powerful.
Shogun: It is an open-source machine learning software library built in C++. It offers a wide range of efficient and unified machine learning algorithms. The heart of Shogun lies in kernel machines such as support vector machines for regression and classification problems. Shogun offers a full implementation of Hidden Markov models.
Its core is written in C++, and it offers interfaces for MATLAB, Octave, Python, R, Java, Lua, Ruby, and C#. The Shogun toolkit is accessible and open source, encourages exploration, and places a strong emphasis on ML education and development. It is one of the oldest and largest open-source ML platforms.
Bias-Variance Trade-off
Bias: the simplifying assumptions a model makes so that the target function is easier to learn. A convenient proxy is the algorithm's error rate on the training set: persistently high training error signals high bias.
Variance: how much the model's learned function changes when it is trained on different training data. If a model achieves a very low error on its training data but a high error once the data changes, it is exhibiting high variance.
Underfitting (High Bias and Low Variance)
A machine learning model is said to underfit when it cannot capture the underlying trend of the data. This usually happens when the model is too simple for the problem, for example fitting a linear model to a non-linear dataset, or when there is too little data to learn from. Because the model's rules are too simple and inflexible, it makes poor predictions even on the training data. Underfitting can be addressed by using a more expressive model, engineering better features, or training for longer.
Overfitting (High Variance and Low Bias)
A model is said to overfit when it learns the training data too closely, including the noise and inaccurate entries in the dataset, rather than the underlying pattern. It then fails to generalize, miscategorizing new data because it has latched onto too many incidental details. Overfitting is most common with flexible non-parametric and non-linear methods, because these algorithms have more freedom in building the model from the dataset and can therefore construct unrealistically intricate models.
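The two failure modes can be contrasted with a toy example. The data below is made up (roughly y = 2x), and the two "models" are deliberately extreme: a constant predictor that is too simple (high bias) and a lookup table that memorizes the training points exactly (high variance).

```python
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]   # roughly y = 2x
test  = [(5, 10.1), (6, 11.9)]                      # unseen data

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# High bias (underfit): always predicts the training mean, too simple
# to capture the upward trend.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# High variance (overfit): memorizes training points exactly and has
# no sensible answer for anything it has not seen.
table = dict(train)
overfit = lambda x: table.get(x, 0.0)

print(mse(underfit, train), mse(underfit, test))  # high error everywhere
print(mse(overfit, train), mse(overfit, test))    # zero train error, huge test error
```

The underfit model is mediocre on both sets; the overfit model looks perfect on the training set and fails badly on the test set, which is exactly the pattern described above.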
Understanding How Deep Learning Works in Three Figures
In machine learning, we take some data, train a model on that data, and use the trained model to make predictions on new data. Training can be seen as a step-by-step learning process: at each step the model makes predictions and receives feedback about how accurate those predictions were. This feedback, expressed as an error, is used to adjust the model so that its future predictions improve.
The first figure that can help in understanding deep learning is the architecture of a neural network. A neural network is composed of layers of interconnected ‘neurons’, which are inspired by the structure and function of a biological neuron in the brain. Each neuron in a layer receives input from the previous layer, processes it, and sends it to the next layer. The input layer receives the raw data, and the output layer produces the final output of the network. The layers in between are called ‘hidden layers’ and they are used to extract features and representations of the data.
The second figure that can help in understanding deep learning is the process of training a neural network. In supervised learning, data with labelled examples is used to train the network. The network is presented with inputs and the corresponding desired outputs, and its weights and biases are adjusted to minimize the difference between the network predictions and the desired outputs. This process is repeated for many examples in the dataset, and the network gradually learns to make accurate predictions on new, unseen examples.
The third figure that may help in understanding deep learning is forward and backward propagation. It is the process of passing input data through the layers of a neural network and computing the output, as well as adjusting the weights of the network in the backward pass by using an optimization algorithm, like Stochastic Gradient Descent (SGD). This is the process by which the network learns from data by minimizing the error between the predicted output and the actual output.
Hyperparameters in Neural Networks
Hyperparameters are settings that control the learning process of a neural network. They are not directly learned by the network from the data but rather set by the developer before training begins. Choosing the right hyperparameters is crucial for optimizing the performance of a neural network. Here are some common hyperparameters:
- Learning Rate: This determines how much the weights of the network are adjusted during training based on the error. A high learning rate can lead to faster learning but also instability, while a low learning rate can cause slow learning or getting stuck in local minima.
- Number of Epochs: An epoch represents one complete pass through the entire training dataset. The number of epochs determines how many times the network sees the training data during training.
- Batch Size: This defines the number of data points processed by the network before updating its weights. A larger batch size improves hardware efficiency and yields smoother gradient estimates but fewer updates per epoch, while a smaller batch size produces noisier, more frequent updates, which can sometimes help the model generalize.
- Number of Hidden Layers and Neurons: The architecture of the network, including the number of hidden layers and neurons in each layer, is a hyperparameter. This impacts the model’s capacity to learn complex patterns, but too many layers or neurons can lead to overfitting.
- Regularization Parameters: Techniques like L1 or L2 regularization penalize the model for having large weights, promoting simpler models and reducing overfitting. The strength of this penalty is a hyperparameter that needs to be tuned.
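To see how just one of these hyperparameters changes training, here is a minimal sketch (all numbers arbitrary) of plain gradient descent minimizing the one-dimensional loss f(w) = (w - 3)^2, whose gradient is 2(w - 3), run with three different learning rates:

```python
def train_w(lr, epochs, w=0.0):
    """Gradient descent on f(w) = (w - 3)**2 starting from w."""
    for _ in range(epochs):
        grad = 2 * (w - 3)   # gradient of the loss at the current weight
        w -= lr * grad       # update step scaled by the learning rate
    return w

print(train_w(lr=0.1, epochs=50))    # converges close to the optimum w = 3
print(train_w(lr=0.001, epochs=50))  # too small: barely moved after 50 epochs
print(train_w(lr=1.1, epochs=50))    # too large: overshoots and diverges
```

The same loss, the same number of epochs: only the learning rate differs, yet the outcomes range from convergence to divergence, which is why this hyperparameter is usually the first one tuned.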
Forward Propagation
- Input Layer: We start with data fed into the input layer of the neural network. Each input is assigned a value.
- Hidden Layers: The data then travels through the hidden layers. At each neuron:
- The neuron receives inputs from the previous layer, each multiplied by a corresponding weight.
- These weighted inputs are summed together.
- An activation function (like ReLU) is applied to the sum, introducing non-linearity and determining the neuron’s output.
- This output becomes the input for the next layer of neurons.
- Output Layer: Finally, the data reaches the output layer. The process in the hidden layers is repeated, and the final output is generated.
Forward propagation simply calculates the output of the network for a given input. This initial output is likely inaccurate, as the weights in the network are randomly assigned at first.
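The steps above can be sketched with NumPy. The layer sizes and random weights here are illustrative stand-ins for a real trained network, which is why the output is meaningless at this stage, exactly as the text notes:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is reproducible
x  = np.array([0.5, -1.0])            # input layer: two feature values
W1 = rng.normal(size=(2, 3))          # weights: input -> hidden (3 neurons)
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))          # weights: hidden -> output
b2 = np.zeros(1)

def relu(z):
    return np.maximum(z, 0.0)         # activation introduces non-linearity

h = relu(x @ W1 + b1)                 # hidden layer: weighted sum, then activation
y = h @ W2 + b2                       # output layer: final network output
print(y)
```

Each `@` is exactly the "weighted inputs summed together" step from the bullet list; `relu` is the activation that determines each hidden neuron's output.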
Backpropagation
- Error Calculation: We compare the obtained output with the desired output (the actual label of the data). The difference between these values is the error.
- Error Propagation: This error is then propagated backward through the network layer by layer. At each neuron:
- The contribution of that neuron’s error to the overall error is calculated.
- Based on this contribution and the learning rate (a hyperparameter), the weights connected to that neuron are adjusted.
- Weight Update: These adjustments nudge the weights in a direction that aims to reduce the overall error. Backpropagation essentially guides the network towards better performance by fine-tuning the weights based on the error.
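For a single linear neuron the whole backward step fits in a few lines. The numbers below are illustrative: the model is y_hat = w * x with squared error, so the gradient of the loss with respect to w is (y_hat - target) * x.

```python
x, target = 2.0, 10.0     # one training example (input, desired output)
w, lr = 1.0, 0.05         # initial weight and learning rate

y_hat = w * x             # forward propagation
error = y_hat - target    # error calculation: prediction minus label
grad  = error * x         # d((y_hat - target)**2 / 2) / dw
w    -= lr * grad         # weight update against the gradient

print(w)                  # 1.8: the weight moved toward the value that fits
```

The update nudged w from 1.0 toward 5.0 (the weight that would fit this example exactly); repeating the step would continue closing the gap, which is what the full training cycle below does.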
The Cycle
- We perform forward propagation to get an output for a data point.
- We calculate the error by comparing the output with the desired output.
- We use backpropagation to adjust the weights based on the error.
- We repeat steps 1-3 for all data points in the training set (one epoch).
- We iterate through multiple epochs, allowing the network to learn progressively and improve its accuracy.
Through this continuous process of forward propagation, error calculation, and backpropagation, the neural network gradually learns the optimal weights that minimize the error and produce accurate outputs for unseen data.
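The full cycle, forward pass, error calculation, backward update, repeated over epochs, can be sketched for the simplest possible model: a single weight learning y = 3x from a made-up dataset.

```python
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # (input, desired output) pairs
w, lr = 0.0, 0.05                             # initial weight, learning rate

for epoch in range(200):                      # step 5: iterate over many epochs
    for x, target in data:                    # step 4: every point is one epoch
        y_hat = w * x                         # step 1: forward propagation
        error = y_hat - target                # step 2: error calculation
        w -= lr * error * x                   # step 3: backpropagation update

print(w)  # close to 3.0, the weight that minimizes the error
```

Even this one-weight model shows the essential behaviour: the error shrinks a little on every update, and after enough epochs the learned weight settles near the value that generates the data.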
Sentiment Analysis in Detail
Sentiment analysis, also known as opinion mining, is a technique in Natural Language Processing (NLP) that aims to understand the emotional tone behind a piece of text. It categorizes the sentiment as positive, negative, or neutral. Here’s a summarized breakdown:
Applications
- Analyze customer reviews to gauge brand perception.
- Monitor social media sentiment towards a product or event.
- Filter comments and identify potentially offensive content.
Process
- Data Collection: Textual data like social media posts, reviews, or articles is gathered.
- Data Preprocessing: Clean text data by removing irrelevant symbols, stop words (common words like “the,” “a”), and applying stemming/lemmatization (reducing words to their root form).
- Feature Engineering: Extract features that convey sentiment, like specific words (“happy,” “sad”), emojis, or part-of-speech tags (nouns, verbs).
- Model Training: Train a machine learning model (e.g., Naive Bayes, Support Vector Machines) on labelled data (text with positive, negative, or neutral sentiment labels).
- Sentiment Classification: Use the trained model to classify the sentiment of new, unseen text data.
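As a deliberately tiny stand-in for the pipeline above, here is a lexicon-based classifier. Real systems use trained models such as Naive Bayes, and the word lists here are hypothetical, but it shows preprocessing (lowercasing, punctuation stripping, stop-word removal) followed by classification:

```python
POSITIVE = {"happy", "great", "love", "excellent"}     # toy sentiment lexicon
NEGATIVE = {"sad", "bad", "hate", "terrible"}
STOP_WORDS = {"the", "a", "is", "this"}                # common filler words

def classify(text):
    # Preprocessing: lowercase, strip punctuation, drop stop words.
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Classification: score each token against the lexicon.
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify("This is a great product, I love it!"))  # positive
print(classify("Terrible service, I hate waiting."))    # negative
```

Such a lexicon approach also makes the listed challenges concrete: sarcasm ("oh, great...") would score as positive, because the classifier sees only the words, not the context.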
Challenges
- Sarcasm and irony can be misinterpreted by models.
- Context and domain-specific language can influence sentiment.
- Limited training data for specific languages or emotions.
Perceptron
The Perceptron is one of the earliest and simplest artificial neural networks, developed for binary classification problems, where the goal is to classify data into one of two classes. It is a single artificial neuron that classifies input data into one of two categories based on a set of weights and a threshold.
The perceptron takes input from several sources and applies a weight to each input, determining how important each input is in making the decision. The perceptron then calculates the weighted sum of these inputs and compares this sum to a threshold value. If the sum is greater than the threshold, the perceptron output is 1 (indicating one category), and if the sum is less than the threshold, the perceptron output is 0 (indicating the other category).
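The description above translates directly into code. This sketch trains a perceptron on the AND function using the classic perceptron learning rule; the learning rate and epoch count are arbitrary choices.

```python
def predict(weights, bias, inputs):
    """Weighted sum of inputs, thresholded at zero: output 1 or 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

# Training data for logical AND: output is 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                            # perceptron learning rule
    for inputs, label in data:
        error = label - predict(weights, bias, inputs)
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
        bias += lr * error                     # bias acts as -threshold

print([predict(weights, bias, inp) for inp, _ in data])  # [0, 0, 0, 1]
```

Note that the threshold is folded into a bias term, which is the standard formulation: comparing the weighted sum against a threshold t is equivalent to adding a bias of -t and comparing against zero.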
