Neural Network from Scratch in C++

A complete implementation of a multi-layer neural network with backpropagation for the MNIST handwritten digit classification task. Built entirely from scratch without any external ML libraries.

Overview

This project implements a fully-connected neural network (also called a multi-layer perceptron or MLP) trained using backpropagation to classify handwritten digits from the MNIST dataset.

The implementation includes:

  • Custom matrix operations
  • Dense (fully-connected) layers with learnable weights and biases
  • Multiple activation functions
  • Cross-entropy and MSE loss functions
  • Stochastic Gradient Descent (SGD) optimizer

Project Structure

```
backpropagation/
├── include/
│   ├── activations.hpp
│   ├── layer.hpp
│   ├── matrix.hpp
│   ├── mnist_loader.hpp
│   └── neural_network.hpp
├── src/
│   ├── activations.cpp
│   ├── layer.cpp
│   ├── main.cpp
│   ├── matrix.cpp
│   ├── mnist_loader.cpp
│   └── neural_network.cpp
└── README.md
```

Features

  • Matrix Operations: Custom Matrix class with dot product, element-wise operations, transposition
  • Dense Layers: Fully-connected layers with configurable input/output sizes
  • Activation Functions: Sigmoid, ReLU, Leaky ReLU, Softmax, Tanh, Linear
  • Loss Functions: Cross-entropy (recommended for classification) and Mean Squared Error
  • Backpropagation: Full gradient computation through all layers
  • Stochastic Gradient Descent (SGD): Mini-batch training with configurable batch size
  • MNIST Loading: Native support for IDX format MNIST files

Building the Project

Prerequisites

  • C++17 compatible compiler
  • CMake 3.10 or higher

Compile

```bash
mkdir -p build && cd build
cmake ..
make -j$(nproc)
```

Running the Project

```bash
./neural_network <data_directory> <epochs> <learning_rate> <batch_size>

# Example: 20 epochs, learning rate 0.005, batch size 64
./neural_network ./data/mnist 20 0.005 64
```

Default Parameters

| Parameter | Default Value | Description |
|---|---|---|
| Data Directory | `./data/mnist` | Path to MNIST files |
| Epochs | 10 | Number of training passes |
| Learning Rate | 0.01 | Step size for weight updates |
| Batch Size | 32 | Samples per training batch |

Network Architecture

```mermaid
graph TD
    subgraph Input_Layer
        I[Input: 784 neurons<br/>28x28 pixels]
    end

    subgraph Hidden_Layer_1
        H1[128 neurons<br/>ReLU activation]
    end

    subgraph Hidden_Layer_2
        H2[64 neurons<br/>ReLU activation]
    end

    subgraph Output_Layer
        O[10 neurons<br/>Softmax activation<br/>Digits 0-9]
    end

    I --> H1
    H1 --> H2
    H2 --> O
```

Layer Details

| Layer | Input | Output | Parameters | Activation |
|---|---|---|---|---|
| Layer 1 | 784 | 128 | 100,480 | ReLU |
| Layer 2 | 128 | 64 | 8,256 | ReLU |
| Output | 64 | 10 | 650 | Softmax |
| **Total** | | | **109,386** | |

Each layer's parameter count is inputs × outputs weights plus one bias per output, e.g. 784 × 128 + 128 = 100,480 for Layer 1.

How It Works

Forward Propagation

```mermaid
flowchart LR
    subgraph "Forward Pass"
        Ap[A_prev] --> W[W]
        Ap --> b[b]
        W --> Z[Z]
        b --> Z
        Z --> act[activation]
        act --> A[A]
    end
```

Forward propagation passes the input through each layer:

  1. Linear transformation: Z = X * W + b
  2. Activation: A = activation(Z)

Backward Propagation

```mermaid
flowchart TD
    Start[Start] --> Output[Output Layer]

    Output --> ActDerive[Compute dL/dZ]
    ActDerive --> WeightGrad[Compute dL/dW]
    ActDerive --> BiasGrad[Compute dL/db]
    ActDerive --> PrevGrad[Compute dL/dA_prev]

    PrevGrad --> Hidden1{Hidden Layer?}
    Hidden1 -->|Yes| Process1[Process this layer]
    Process1 --> Hidden1

    Hidden1 -->|No| Update[Update Weights]

    WeightGrad --> Update
    BiasGrad --> Update

    Update --> NextBatch[Next Batch]
```

Backpropagation computes gradients recursively through each layer:

  1. Start with gradient from layer above
  2. For each layer (from output to input):
    • Apply activation derivative
    • Compute weight gradient
    • Compute bias gradient
    • Pass gradient to previous layer
  3. Update weights

Training Algorithm

```mermaid
flowchart TD
    A([Start]) --> B[Initialize Network]
    B --> C{For each epoch}
    C --> D[Shuffle Training Data]
    D --> E{For each batch}
    E --> F[Forward Pass]
    F --> G[Compute Loss]
    G --> H[Backward Pass]
    H --> I[Update Weights]
    I --> E
    E -->|All batches| J[Evaluate on Test Set]
    J --> C
    C -->|All epochs| K([Done])
```

The training uses mini-batch Stochastic Gradient Descent:

  1. Loop through epochs
  2. Shuffle training data each epoch
  3. For each batch: forward pass → compute loss → backward pass → update weights
  4. Evaluate on test set after each epoch
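The same loop structure can be shown on a deliberately tiny problem, fitting a single weight of the model y = w·x with mini-batch SGD (everything here is an illustration, not code from this repo):

```cpp
// Mini-batch SGD skeleton: shuffle each epoch, walk the data in batches,
// average the gradient over the batch, and take a step against it.
#include <algorithm>
#include <random>
#include <vector>
#include <cmath>
#include <cassert>

double train_sgd(std::vector<std::pair<double, double>> data,
                 int epochs, double lr, size_t batch_size) {
    std::mt19937 rng(42);
    double w = 0.0;                                   // initialize the "network"
    for (int e = 0; e < epochs; ++e) {                // loop through epochs
        std::shuffle(data.begin(), data.end(), rng);  // shuffle each epoch
        for (size_t s = 0; s < data.size(); s += batch_size) {
            size_t end = std::min(s + batch_size, data.size());
            double grad = 0.0;
            for (size_t i = s; i < end; ++i) {        // forward + loss gradient
                auto [x, y] = data[i];
                grad += 2.0 * (w * x - y) * x;        // d/dw of (w*x - y)^2
            }
            w -= lr * grad / double(end - s);         // update weights
        }
    }
    return w;
}
```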

Activation Functions

| Function | Formula | Use Case |
|---|---|---|
| Sigmoid | 1 / (1 + e^(-x)) | Binary classification |
| ReLU | max(0, x) | Hidden layers |
| Leaky ReLU | x > 0 ? x : 0.01x | Dying ReLU fix |
| Softmax | e^(x_i) / Σ e^(x_j) | Multi-class output |
| Tanh | tanh(x) | Hidden layers |
| Linear | x | Regression |

Derivatives

| Function | Derivative |
|---|---|
| Sigmoid | σ(x) × (1 − σ(x)) |
| ReLU | 1 if x > 0, 0 otherwise |
| Leaky ReLU | 1 if x > 0, α otherwise |
| Softmax | Jacobian-based |
| Tanh | 1 − tanh²(x) |
| Linear | 1 |
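These formulas translate directly into small scalar functions (a sketch; the repo's `activations.cpp` applies them element-wise over matrices and may organize them differently):

```cpp
// Scalar activations and their derivatives, matching the tables above.
#include <cmath>
#include <cassert>

double sigmoid(double x)       { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoid_deriv(double x) { double s = sigmoid(x); return s * (1.0 - s); }
double relu(double x)          { return x > 0 ? x : 0.0; }
double relu_deriv(double x)    { return x > 0 ? 1.0 : 0.0; }
double leaky_relu(double x)    { return x > 0 ? x : 0.01 * x; }   // alpha = 0.01
double tanh_deriv(double x)    { double t = std::tanh(x); return 1.0 - t * t; }
```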

Loss Functions

Cross-Entropy Loss

```
L = -Σ y_true × log(y_pred)
```

Used with softmax for multi-class classification.

Mean Squared Error

```
L = (1/n) × Σ (y_true - y_pred)²
```
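Both losses are a few lines each; a sketch (the repo's actual signatures may differ, and the small epsilon guarding log(0) is an assumption on my part):

```cpp
// Cross-entropy and MSE over one prediction vector.
#include <cmath>
#include <vector>
#include <cassert>

// L = -sum(y_true * log(y_pred)), for one-hot y_true.
double cross_entropy(const std::vector<double>& y_true,
                     const std::vector<double>& y_pred) {
    double loss = 0.0;
    for (size_t i = 0; i < y_true.size(); ++i)
        loss -= y_true[i] * std::log(y_pred[i] + 1e-12); // epsilon guards log(0)
    return loss;
}

// L = (1/n) * sum((y_true - y_pred)^2).
double mse(const std::vector<double>& y_true,
           const std::vector<double>& y_pred) {
    double loss = 0.0;
    for (size_t i = 0; i < y_true.size(); ++i) {
        double d = y_true[i] - y_pred[i];
        loss += d * d;
    }
    return loss / y_true.size();
}
```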

Data Format

Input

  • Images: 28×28 → flattened to 784 values
  • Normalized: [0, 255] → [0, 1]

Output

  • Labels: one-hot encoded (10 classes)
  • Predictions: probability distribution

Customization

Adding New Layers

```cpp
Layer newLayer(inputSize, outputSize, "activation_name");
nn.addLayer(newLayer);
```

Changing Loss Function

```cpp
nn.setLossFunction("mse");
nn.setLossFunction("cross_entropy");
```

Performance Notes

  • Training accuracy: ~95%+ after 10 epochs
  • Test accuracy: ~94% after 10 epochs

Limitations

  • Basic SGD (no momentum/Adam)
  • No regularization
  • No batch normalization
  • CPU only
