Neural Network Architectures and Learning Concepts
Feedforward Neural Network (FNN)
A Feedforward Neural Network is the simplest type of artificial neural network in which information flows in only one direction, from the input layer to the output layer. There are no feedback connections or loops.
Basic Structure
- Consists of an input layer, one or more hidden layers, and an output layer.
- Data flows from input → hidden → output.
Single-layer Feedforward Network
Has only an input layer and an output layer. The inputs are connected directly to the outputs through weights, the output layer performs the main computation, and each output node produces one result.
Multilayer Feedforward Network (MLP)
Has one or more hidden layers between the input and output layers. It is more powerful and more commonly used than single-layer networks.
Working of MLP
Each neuron in a hidden layer computes the weighted sum of its inputs and applies an activation function (e.g., sigmoid, ReLU) to produce its output. The output of one layer becomes the input to the next layer.
Training (Learning Process)
The network is trained on data using a learning algorithm such as backpropagation. During training, the weights and biases are adjusted to reduce the error between the predicted output and the actual output, as sketched below.
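As a concrete illustration, here is a minimal sketch, assuming scikit-learn is installed, that trains a small feedforward network with backpropagation on a synthetic toy dataset (the layer size and hyperparameters are illustrative):

```python
# Minimal sketch: train a small feedforward network with backpropagation.
# Assumes scikit-learn; the dataset and layer size are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 16 ReLU units; weights and biases are adjusted
# iteratively to reduce the error between predictions and targets.
model = MLPClassifier(hidden_layer_sizes=(16,), activation='relu',
                      max_iter=1000, random_state=0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```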
Characteristics
- Information moves only forward; no loops.
- Does not have memory of previous inputs.
Applications
Used for classification, regression (predicting continuous values), speech recognition, image recognition, etc.
Multilayer Perceptron (MLP)
A Multilayer Perceptron (MLP) is a type of feedforward neural network that contains one or more hidden layers between the input and output layers. It can learn complex non-linear relationships.
Architecture
- Input Layer: Receives the raw input features from the dataset (e.g., pixel values, sensor readings).
- Hidden Layers: One or more layers of neurons between input and output. Each neuron receives a weighted sum of inputs and passes it through an activation function.
- Output Layer: Produces the final output such as a class label or a numerical value.
Working
Each neuron computes a weighted sum of its inputs plus a bias term. This sum is passed through an activation function (such as sigmoid or ReLU) to produce the neuron's output. The outputs of one layer become the inputs to the next layer, so information flows forward through the network. During training, the network uses a learning algorithm such as backpropagation to adjust the weights and biases so that the error between the predicted output and the actual target is minimized.
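A minimal NumPy sketch of this forward computation, with random (untrained) weights and illustrative layer sizes:

```python
# Forward pass through an MLP with one hidden layer (illustrative sizes,
# random untrained weights).
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                           # 3 input features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

h = relu(W1 @ x + b1)          # weighted sum + bias, then activation
y_hat = sigmoid(W2 @ h + b2)   # hidden output becomes input to the next layer
print(y_hat)
```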
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a type of neural network designed to handle sequential data. Unlike feedforward networks, RNNs have feedback connections, which give them a form of memory about previous inputs.
Feedback Concept
In an RNN, the output of a neuron at one time step can be fed back as input at the next time step. This creates a loop and allows the network to maintain an internal state that stores information about the past.
Single-layer Feedback Network
Consists of one main recurrent layer with feedback connections. At each time step, the output depends on both the current input and the previous hidden state.
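A minimal NumPy sketch of one recurrent step, where the new hidden state depends on the current input and the previous hidden state (sizes and weights are illustrative, not a trained model):

```python
# One recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_x = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (feedback) weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)                        # initial hidden state
for x_t in rng.normal(size=(5, input_size)):     # a sequence of 5 time steps
    h = rnn_step(x_t, h)                         # state carries information forward
print(h)
```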
Multilayer Feedback Network
Extends the idea to multiple hidden layers. Feedback can occur within the same layer and from deeper layers back to earlier layers. This allows for a richer and more complex memory system.
RNNs for Sequential Data
RNNs are suitable for tasks where order matters, such as time series, speech, text, and video.
Types of RNN Based on Input–Output Structure
- One-to-One: Single input, single output (e.g., image classification).
- One-to-Many: Single input, sequence output (e.g., image → image caption).
- Many-to-One: Sequence input, single output (e.g., sentiment analysis from a sentence or review).
- Many-to-Many: Sequence input, sequence output (e.g., machine translation, where a sentence in one language is translated into another sentence).
Applications
Used in language modelling, sentiment analysis, speech recognition, time-series prediction, and machine translation.
Radial Basis Function Network (RBFN)
A Radial Basis Function Network (RBFN) has three layers: Input, one Hidden Layer, and Output. The hidden layer units use radial basis functions that depend on the distance between the input and a center. Each hidden neuron responds strongly when the input is close to its center. The output layer computes a weighted sum of these responses. Used in pattern recognition, time-series prediction, and financial forecasting.
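A minimal NumPy sketch of an RBFN forward pass with Gaussian basis functions; the centers, width, and output weights are illustrative assumptions rather than a trained model:

```python
# RBFN forward pass: Gaussian hidden units plus a weighted-sum output.
import numpy as np

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # hidden-unit centers
width = 0.5                                               # spread of each RBF
w_out = np.array([0.3, -0.7, 1.2])                        # output weights

def rbfn(x):
    # Each hidden unit responds strongly when x is close to its center.
    dists = np.linalg.norm(centers - x, axis=1)
    phi = np.exp(-(dists ** 2) / (2 * width ** 2))
    # Output layer: weighted sum of the hidden responses.
    return w_out @ phi

print(rbfn(np.array([0.9, 1.1])))   # dominated by the second center
```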
Recursive Neural Network (RecNN)
A Recursive Neural Network (RecNN) is designed for structured input such as trees. It applies the same set of weights recursively over a tree structure. Unlike RNNs (which work on sequences), RecNNs work on hierarchical structures. Used in NLP, especially where sentence structure (parse tree) is important, e.g., sentiment analysis.
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) networks are a special type of Recurrent Neural Network (RNN) designed to handle long-term dependencies and to overcome the vanishing gradient problem of standard RNNs.
Motivation
Standard RNNs have difficulty learning from long sequences due to vanishing and exploding gradients. LSTMs were designed to remember information for long periods of time by controlling the flow of information.
Basic Structure
An LSTM unit contains a cell state (long-term memory) and a hidden state (short-term output). It uses gates to control what to keep, what to forget, and what to output.
Forget Gate ($f_t$)
- Input: current input and previous hidden state.
- Output: a value between 0 and 1 for each element of the cell state.
- Function: decides what part of the old cell state to forget.
Input Gate ($i_t$)
- Has two parts: a sigmoid layer that decides which values to update, and a tanh layer that creates new candidate values.
- Together, they determine what new information to store in the cell state.
Updating Cell State
The old cell state is first multiplied by the forget gate output (to discard some information), and then the new candidate values, scaled by the input gate, are added. This produces the updated cell state: $C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$.
Output Gate ($o_t$)
Decides what part of the cell state becomes the hidden state (output) for the current time step. The hidden state is used for prediction and passed to the next time step.
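A minimal NumPy sketch of a single LSTM step following the gate descriptions above (weight shapes and values are illustrative, not a trained model):

```python
# One LSTM step: forget gate, input gate, cell-state update, output gate.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
concat = input_size + hidden_size
# One weight matrix per gate, acting on the concatenation [h_{t-1}, x_t].
W_f, W_i, W_c, W_o = (rng.normal(size=(hidden_size, concat)) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: what to keep from c_prev
    i_t = sigmoid(W_i @ z + b_i)          # input gate: which values to update
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate values
    c_t = f_t * c_prev + i_t * c_tilde    # updated cell state
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)              # new hidden state (output)
    return h_t, c_t

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
print(h)
```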
Advantages
- LSTMs can preserve long-term information and selectively forget irrelevant information.
- They reduce the effect of vanishing gradients and allow training on long sequences.
Applications
Used for language modelling, machine translation, speech recognition, time-series forecasting, etc.
Model Evaluation Metrics
- Accuracy: Out of all predictions, how many were correct? $\frac{TP+TN}{TP+TN+FP+FN}$
- Precision: When the model says Fraud, how often is it right? $\frac{TP}{TP+FP}$
- Recall: Out of all real frauds, how many did the model catch? $\frac{TP}{TP+FN}$
- Specificity: Out of all genuinely not-fraud cases, how many did the model correctly identify as not fraud? $\frac{TN}{TN+FP}$
- F1 Score: Balance between precision and recall. When you want both to be good. $2 \times \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
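A minimal sketch that plugs assumed confusion-matrix counts (purely illustrative numbers) into the formulas above:

```python
# Illustrative confusion-matrix counts for a fraud-detection example.
TP, TN, FP, FN = 40, 900, 10, 50

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)
specificity = TN / (TN + FP)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, specificity, f1)
```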
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN), also called a ConvNet, is a feedforward deep learning model mainly used for image processing and computer vision tasks such as image classification, object detection, and recognition. It processes input data that has a grid-like topology, for example a 2D image represented as a matrix of pixel values. A CNN is designed to automatically and adaptively learn spatial hierarchies of features from images, using layers such as convolution, ReLU, pooling, flattening, and fully connected layers.
Input Representation
Each image is represented as a 2D (grayscale) or 3D (color) matrix of pixel values. This matrix is given as input to the CNN.
Convolution Layer
Uses small matrices called filters/kernels (e.g., 3×3). The filter is slid over the input image (with a step size called stride). At each position, the dot product of filter and image patch is computed to produce a feature map. Purpose: to extract low-level features like edges, lines, textures.
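A minimal NumPy sketch of a valid 2D convolution with stride 1, using an illustrative vertical-edge filter:

```python
# Slide a 3x3 filter over the image and take the dot product at each position.
import numpy as np

def conv2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # dot product of filter and patch
    return feature_map

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # vertical edges
print(conv2d(image, kernel))
```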
ReLU Layer (Activation)
ReLU stands for Rectified Linear Unit. It applies the function $f(x) = \max(0, x)$ element-wise to the feature map, converting negative values to 0 and introducing non-linearity.
Pooling Layer
Performs down-sampling of feature maps. Reduces spatial dimensions, number of parameters, and computational cost. Common pooling operations:
- Max pooling – selects maximum value in a region.
- Average pooling – takes average value in a region.
This helps make the model more invariant to small translations in the input image.
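A minimal NumPy sketch of 2x2 max pooling with stride 2 over a small feature map (the values are illustrative):

```python
# Down-sample a feature map by keeping the maximum value in each 2x2 region.
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = region.max()   # keep the strongest activation
    return pooled

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 2],
                 [0, 2, 5, 7],
                 [1, 1, 3, 4]], dtype=float)
print(max_pool(fmap))   # 2x2 output: [[6, 2], [2, 7]]
```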
Stacking of Convolution + ReLU + Pooling Layers
Several such layers are stacked to learn high-level features. Early layers learn edges, later layers learn shapes, parts, and finally objects.
Flattening Layer
Converts the final pooled feature maps into a single long vector. This vector is then passed to the fully connected layers.
Fully Connected (FC) Layer
Works like a traditional neural network. Every neuron is connected to all outputs of the previous layer. Learns the non-linear combination of high-level features to perform classification.
Output Layer
Typically uses Softmax activation for multi-class classification. Produces a probability distribution over possible classes (e.g., cat, dog, bird).
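A minimal NumPy sketch of softmax turning raw class scores into a probability distribution (the scores are illustrative):

```python
# Softmax: exponentiate the scores and normalize so they sum to 1.
import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])         # e.g., cat, dog, bird
print(softmax(scores))                      # probabilities summing to 1
```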
Applications
- Image and video classification
- Object detection and tracking
- Face recognition
- Medical image analysis, etc.
Ensemble Learning Methods
Ensemble learning combines multiple weak models to build a strong predictive model. Random Forest uses bagging, while AdaBoost uses boosting. Both improve accuracy but work differently.
1. Random Forest
- Based on Bagging (Bootstrap Aggregating).
- Creates many datasets using sampling with replacement.
- Builds many decision trees in parallel.
- Uses random feature selection at each split (feature bagging).
- Final result: majority vote (classification) or average (regression).
- Reduces variance and prevents overfitting. Works well for noisy datasets.
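A minimal scikit-learn sketch of the Random Forest described above, on a synthetic toy dataset (assumes scikit-learn is installed):

```python
# Bagging of decision trees with random feature selection at each split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees built in parallel on bootstrapped samples; prediction is a majority vote.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```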
2. AdaBoost
- Based on Boosting.
- Builds weak learners sequentially.
- Each new learner focuses on misclassified examples.
- Assigns weights to data points and alpha values to weak learners.
- Final prediction is a weighted vote.
- Reduces bias by improving step-by-step. Sensitive to noisy data and outliers.
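A minimal scikit-learn sketch of AdaBoost; by default its weak learners are depth-1 decision stumps (assumes scikit-learn; toy dataset):

```python
# Boosting: weak learners are added sequentially, each focusing on the
# examples the previous learners misclassified.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```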
3. Key Differences
- Random Forest → Parallel, reduces variance, uses full trees.
- AdaBoost → Sequential, reduces bias, uses stumps.
Random Forest is more robust; AdaBoost is more sensitive but often very accurate.
Ensemble Methods Fundamentals
Ensemble methods are machine learning techniques that combine predictions from multiple models to form a more accurate and robust final model. They work on the idea of “wisdom of the crowd,” where multiple weak learners together form a strong learner.
1. Weak and Strong Learners
- Weak Learner: Performs slightly better than random guessing.
- Strong Learner: Formed by combining many weak learners.
2. Why Ensemble Methods Work
- Reduce bias and variance.
- Different models make different errors.
- Combining many predictions cancels individual errors.
3. Types of Ensemble Methods
A. Bagging (Bootstrap Aggregating)
- Goal: Reduce variance.
- Trains multiple models on different bootstrapped samples. Models are trained in parallel.
- Final prediction: Majority vote or average.
- Example: Random Forest.
B. Boosting
- Goal: Reduce bias.
- Models are trained sequentially. Each new model focuses on the examples misclassified by previous models by increasing their weights.
- Final prediction uses weighted voting.
- Examples: AdaBoost, Gradient Boosting, XGBoost.
C. Stacking
- Goal: Combine strengths of different models.
- Trains multiple diverse models on the same data.
- A final meta-model learns how to best combine their outputs.
- Provides high predictive accuracy.
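A minimal scikit-learn sketch of stacking, where a logistic-regression meta-model combines two diverse base models (assumes scikit-learn; the dataset and model choices are illustrative):

```python
# Stacking: base models are trained on the data, and a meta-model learns
# how to best combine their outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

base_models = [('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
               ('svm', SVC(probability=True, random_state=0))]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print("Training accuracy:", stack.score(X, y))
```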
Handling Class Imbalance
Class imbalance occurs when one class heavily outnumbers another. In such cases, standard classifiers become biased toward the majority class, making accuracy an unreliable metric. Special techniques are needed to correctly learn and detect the minority class.
1. Use Appropriate Evaluation Metrics
Accuracy is misleading for imbalanced data. Use precision, recall, F1-score, and PR-AUC, which focus on minority-class performance. Use the confusion matrix to examine true positives and false negatives.
2. Data Resampling Methods
- a. Oversampling: Increase minority class samples by duplication.
- b. SMOTE (Synthetic Minority Over-sampling Technique): Generates synthetic minority samples by interpolating between neighboring minority examples (see the sketch after this list).
- c. Undersampling: Reduces majority class size. Faster but may lose information.
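A minimal sketch of oversampling with SMOTE, assuming the imbalanced-learn package is installed (the imbalanced toy dataset is illustrative):

```python
# SMOTE creates synthetic minority samples to balance the class counts.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)
print("Before:", Counter(y))          # heavily skewed toward the majority class

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("After:", Counter(y_res))       # synthetic minority samples balance the classes
```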
3. Algorithm-Level Solutions
- a. Cost-Sensitive Learning: Assign a higher misclassification cost to the minority class, e.g., by using parameters like class_weight='balanced' (see the sketch below).
- b. Algorithms Robust to Imbalance: Boosting methods (AdaBoost, XGBoost) give more focus to misclassified data; tree-based models (Random Forest) capture minority patterns effectively.
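A minimal scikit-learn sketch of cost-sensitive learning with class_weight='balanced' on an illustrative imbalanced toy dataset:

```python
# class_weight='balanced' raises the misclassification cost of the rare class.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

clf = RandomForestClassifier(class_weight='balanced', random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```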
4. Example
In fraud detection (99.5% normal, 0.5% fraud): Use F1-score and PR-AUC. Apply SMOTE to create synthetic fraud samples. Train Random Forest/XGBoost with balanced class weights. Evaluate using precision–recall thresholding.
Bias–Variance Tradeoff
The Bias–Variance Tradeoff is a key concept for improving machine learning models. It explains why a model performs poorly and guides the correct strategy to fix it. By analyzing whether a model has high bias or high variance, we can take targeted actions to improve its generalization.
1. Diagnosing High Bias (Underfitting)
- High error on both training & test data.
- Model too simple to learn patterns.
- Example: Linear Regression used for a non-linear problem.
Fixes for High Bias
- Use more complex models.
- Add relevant features.
- Reduce regularization strength.
- Use Boosting (AdaBoost) to reduce bias.
2. Diagnosing High Variance (Overfitting)
- Low training error but high test error.
- Model learns noise instead of pattern.
- Example: Very deep decision tree.
Fixes for High Variance
- Use simpler models.
- Collect more training data.
- Apply feature selection.
- Use regularization (L1/L2).
- Use Bagging (Random Forest) to reduce variance.
- Prune decision trees.
3. Why This Tradeoff Helps
- Helps identify exact reason for poor performance.
- Prevents random trial and error.
- Allows targeted optimization strategies.
- Leads to better generalization and efficient model tuning.
