A complete implementation of a multi-layer neural network with backpropagation for the MNIST handwritten digit classification task. Built entirely from scratch without any external ML libraries.
## Table of Contents
- Overview
- Project Structure
- Features
- Building the Project
- Running the Project
- Network Architecture
- How It Works
- Activation Functions
- Loss Functions
- Data Format
- Customization
- Performance Notes
## Overview
This project implements a fully-connected neural network (also called a multi-layer perceptron or MLP) trained using backpropagation to classify handwritten digits from the MNIST dataset.
The implementation includes:
- Custom matrix operations
- Dense (fully-connected) layers with learnable weights and biases
- Multiple activation functions
- Cross-entropy and MSE loss functions
- Stochastic Gradient Descent (SGD) optimizer
## Project Structure

```
backpropagation/
├── include/
│   ├── activations.hpp
│   ├── layer.hpp
│   ├── matrix.hpp
│   ├── mnist_loader.hpp
│   └── neural_network.hpp
├── src/
│   ├── activations.cpp
│   ├── layer.cpp
│   ├── main.cpp
│   ├── matrix.cpp
│   ├── mnist_loader.cpp
│   └── neural_network.cpp
└── README.md
```
## Features
- Matrix Operations: Custom Matrix class with dot product, element-wise operations, transposition
- Dense Layers: Fully-connected layers with configurable input/output sizes
- Activation Functions: Sigmoid, ReLU, Leaky ReLU, Softmax, Tanh, Linear
- Loss Functions: Cross-entropy (recommended for classification) and Mean Squared Error
- Backpropagation: Full gradient computation through all layers
- Stochastic Gradient Descent (SGD): Mini-batch training with configurable batch size
- MNIST Loading: Native support for IDX format MNIST files
## Building the Project

Prerequisites:
- C++17 compatible compiler
- CMake 3.10 or higher
```sh
mkdir -p build && cd build
cmake ..
make -j$(nproc)
```

## Running the Project

```sh
./neural_network <data_directory> <epochs> <learning_rate> <batch_size>

# Example: 20 epochs, learning rate 0.005, batch size 64
./neural_network ./data/mnist 20 0.005 64
```

| Parameter | Default Value | Description |
|---|---|---|
| Data Directory | ./data/mnist | Path to MNIST files |
| Epochs | 10 | Number of training passes |
| Learning Rate | 0.01 | Step size for weight updates |
| Batch Size | 32 | Samples per training batch |
## Network Architecture

```mermaid
graph TD
    subgraph Input_Layer
        I[Input: 784 neurons<br/>28x28 pixels]
    end
    subgraph Hidden_Layer_1
        H1[128 neurons<br/>ReLU activation]
    end
    subgraph Hidden_Layer_2
        H2[64 neurons<br/>ReLU activation]
    end
    subgraph Output_Layer
        O[10 neurons<br/>Softmax activation<br/>Digits 0-9]
    end
    I --> H1
    H1 --> H2
    H2 --> O
```
| Layer | Input | Output | Parameters | Activation |
|---|---|---|---|---|
| Layer 1 | 784 | 128 | 100,480 | ReLU |
| Layer 2 | 128 | 64 | 8,256 | ReLU |
| Output | 64 | 10 | 650 | Softmax |
| Total | | | 109,386 | |
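Each parameter count is (inputs × outputs) weights plus one bias per output neuron; a quick check (hypothetical helper, not part of the project):

```cpp
#include <cassert>

// Parameter count for a dense layer: one weight per input-output pair
// plus one bias per output neuron. (Illustrative helper, not project code.)
int dense_params(int inputs, int outputs) {
    return inputs * outputs + outputs;
}
```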
## How It Works

### Forward Pass

```mermaid
flowchart LR
    subgraph "Forward Pass"
        Ap[A_prev] --> W[W]
        Ap --> b[b]
        W --> Z[Z]
        b --> Z
        Z --> act[activation]
        act --> A[A]
    end
```
Forward propagation passes the input through each layer:
- Linear transformation: Z = X * W + b
- Activation: A = activation(Z)
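The two steps above can be sketched for a single sample as follows (illustrative names; the project's Layer class operates on whole mini-batch matrices rather than single vectors):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of one dense layer's forward pass on a single sample:
// z_j = sum_i x_i * W[i][j] + b[j], then a_j = activation(z_j).
std::vector<double> dense_forward(const std::vector<double>& x,
                                  const std::vector<std::vector<double>>& W,
                                  const std::vector<double>& b,
                                  bool relu) {
    std::vector<double> a(b.size());
    for (std::size_t j = 0; j < b.size(); ++j) {
        double z = b[j];                          // start from the bias
        for (std::size_t i = 0; i < x.size(); ++i)
            z += x[i] * W[i][j];                  // linear transformation
        a[j] = relu ? std::max(0.0, z)            // ReLU for hidden layers
                    : 1.0 / (1.0 + std::exp(-z)); // sigmoid otherwise
    }
    return a;
}
```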
### Backpropagation

```mermaid
flowchart TD
    Start[Start] --> Output[Output Layer]
    Output --> ActDerive[Compute dL/dZ]
    ActDerive --> WeightGrad[Compute dL/dW]
    ActDerive --> BiasGrad[Compute dL/db]
    ActDerive --> PrevGrad[Compute dL/dA_prev]
    PrevGrad --> Hidden1{Hidden Layer?}
    Hidden1 -->|Yes| Process1[Process this layer]
    Process1 --> Hidden1
    Hidden1 -->|No| Update[Update Weights]
    WeightGrad --> Update
    BiasGrad --> Update
    Update --> NextBatch[Next Batch]
```
Backpropagation computes gradients recursively through each layer:

- Start with the gradient from the layer above
- For each layer (from output to input):
  - Apply the activation derivative
  - Compute the weight gradient
  - Compute the bias gradient
  - Pass the gradient to the previous layer
- Update the weights
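The per-layer gradient computations above might look like this for a single sample (an illustrative sketch, not the project's actual Layer code):

```cpp
#include <vector>

// Given dL/dZ for this layer (dz), the cached layer input a_prev, and the
// weights W, compute the weight gradient dW, the bias gradient db, and
// the gradient dL/dA_prev handed to the previous layer.
struct LayerGrads {
    std::vector<std::vector<double>> dW; // same shape as W: inputs x outputs
    std::vector<double> db;              // one entry per output neuron
    std::vector<double> da_prev;         // gradient w.r.t. the layer input
};

LayerGrads dense_backward(const std::vector<double>& a_prev,
                          const std::vector<std::vector<double>>& W,
                          const std::vector<double>& dz) {
    LayerGrads g;
    g.dW.assign(a_prev.size(), std::vector<double>(dz.size(), 0.0));
    g.db = dz;                               // dL/db = dL/dZ
    g.da_prev.assign(a_prev.size(), 0.0);
    for (std::size_t i = 0; i < a_prev.size(); ++i)
        for (std::size_t j = 0; j < dz.size(); ++j) {
            g.dW[i][j] = a_prev[i] * dz[j];  // dL/dW = A_prev^T * dZ
            g.da_prev[i] += W[i][j] * dz[j]; // dL/dA_prev = dZ * W^T
        }
    return g;
}
```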
### Training Loop

```mermaid
flowchart TD
    A([Start]) --> B[Initialize Network]
    B --> C{For each epoch}
    C --> D[Shuffle Training Data]
    D --> E{For each batch}
    E --> F[Forward Pass]
    F --> G[Compute Loss]
    G --> H[Backward Pass]
    H --> I[Update Weights]
    I --> E
    E -->|All batches| J[Evaluate on Test Set]
    J --> C
    C -->|All epochs| K([Done])
```
The training uses mini-batch Stochastic Gradient Descent:
- Loop through epochs
- Shuffle training data each epoch
- For each batch: forward pass → compute loss → backward pass → update weights
- Evaluate on test set after each epoch
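The weight-update step can be sketched as plain SGD over a flat parameter vector (hypothetical helper, not the project's optimizer API):

```cpp
#include <vector>

// One SGD step: each parameter moves against its gradient,
// scaled by the learning rate (w <- w - lr * dL/dw).
void sgd_step(std::vector<double>& params,
              const std::vector<double>& grads,
              double learning_rate) {
    for (std::size_t i = 0; i < params.size(); ++i)
        params[i] -= learning_rate * grads[i];
}
```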
## Activation Functions

| Function | Formula | Use Case |
|---|---|---|
| Sigmoid | 1/(1+e^(-x)) | Binary classification |
| ReLU | max(0, x) | Hidden layers |
| Leaky ReLU | x > 0 ? x : 0.01x | Dying ReLU fix |
| Softmax | e^(x_i) / Σ e^(x_j) | Multi-class output |
| Tanh | tanh(x) | Hidden layers |
| Linear | x | Regression |
### Derivatives

| Function | Derivative |
|---|---|
| Sigmoid | σ(x) × (1 - σ(x)) |
| ReLU | 1 if x > 0, 0 otherwise |
| Leaky ReLU | 1 if x > 0, α otherwise |
| Softmax | Jacobian-based |
| Tanh | 1 - tanh²(x) |
| Linear | 1 |
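As a concrete example, the softmax above is usually implemented in a numerically stable form (an illustrative sketch, not necessarily the project's version): subtracting the row maximum before exponentiating leaves the result unchanged but prevents overflow for large inputs.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax: softmax(z)_i = e^(z_i - max) / sum_j e^(z_j - max).
std::vector<double> softmax(const std::vector<double>& z) {
    double m = *std::max_element(z.begin(), z.end());
    std::vector<double> out(z.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = std::exp(z[i] - m);
        sum += out[i];
    }
    for (double& v : out) v /= sum; // normalize so outputs sum to 1
    return out;
}
```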
## Loss Functions

### Cross-Entropy

L = -Σ y_true × log(y_pred)

Used with softmax for multi-class classification.
### Mean Squared Error

L = (1/n) × Σ (y_true - y_pred)²
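Both loss formulas can be sketched for a single sample as follows (illustrative helpers; the small epsilon guarding log(0) is an assumption, not confirmed project behavior):

```cpp
#include <cmath>
#include <vector>

// Cross-entropy for a one-hot target: L = -sum_i y_true_i * log(y_pred_i).
// The epsilon avoids taking log(0) when a predicted probability is zero.
double cross_entropy(const std::vector<double>& y_pred,
                     const std::vector<double>& y_true) {
    double loss = 0.0;
    for (std::size_t i = 0; i < y_pred.size(); ++i)
        loss -= y_true[i] * std::log(y_pred[i] + 1e-12);
    return loss;
}

// Mean squared error: L = (1/n) * sum_i (y_true_i - y_pred_i)^2.
double mse(const std::vector<double>& y_pred,
           const std::vector<double>& y_true) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_pred.size(); ++i) {
        double d = y_true[i] - y_pred[i];
        sum += d * d;
    }
    return sum / static_cast<double>(y_pred.size());
}
```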
## Data Format

- Images: 28×28 → flattened to 784 values
- Pixel values: normalized from [0, 255] to [0, 1]
- Labels: one-hot encoded (10 classes)
- Predictions: probability distribution over the 10 classes
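The preprocessing above might be sketched as follows (illustrative helpers, not the project's mnist_loader API):

```cpp
#include <vector>

// Scale raw [0, 255] pixel bytes into [0, 1] doubles.
std::vector<double> normalize_pixels(const std::vector<unsigned char>& raw) {
    std::vector<double> out(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i)
        out[i] = raw[i] / 255.0;
    return out;
}

// One-hot encode a digit label: all zeros except a 1 at the true class.
std::vector<double> one_hot(int label, int num_classes) {
    std::vector<double> v(num_classes, 0.0);
    v[label] = 1.0;
    return v;
}
```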
## Customization

Add a layer:

```cpp
Layer newLayer(inputSize, outputSize, "activation_name");
nn.addLayer(newLayer);
```

Switch the loss function:

```cpp
nn.setLossFunction("mse");
nn.setLossFunction("cross_entropy");
```

## Performance Notes

- Training accuracy: ~95%+ after 10 epochs
- Test accuracy: ~94% after 10 epochs
Limitations:

- Basic SGD (no momentum/Adam)
- No regularization
- No batch normalization
- CPU only