[QDP] Add SVHN IQP encoding benchmark with PennyLane baseline and QDP pipeline #1186

Open
ryankert01 wants to merge 3 commits into apache:main from ryankert01:svhn-iqp
Conversation

@ryankert01 (Member)

1. Introduction

This report presents a benchmark comparing two quantum encoding pipelines for a
binary classification task on the SVHN (Street View House Numbers) dataset. The
task discriminates digit 1 vs digit 7 using an IQP (Instantaneous Quantum
Polynomial) encoding followed by a variational quantum classifier.

The two pipelines under comparison are:

  • PennyLane baseline (pennylane_baseline/svhn_iqp.py) — IQP encoding is
    embedded inside the quantum circuit and re-executed on every forward/backward
    pass during training.
  • QDP pipeline (qdp_pipeline/svhn_iqp.py) — IQP encoding is performed
    once upfront on GPU via QDP's CUDA kernels. The training circuit loads
    pre-encoded state vectors using StatePrep.

The central question is: how much wall-clock time does one-shot GPU encoding
save compared to re-encoding on every circuit evaluation, given identical
quantum states and identical training configurations?

2. Method

2.1 Dataset and Preprocessing

| Step | Description |
| --- | --- |
| Source | SVHN `train_32x32.mat` + `test_32x32.mat` (Stanford) |
| Flatten | 32 × 32 × 3 = 3072-dim float64 vectors, normalized to [0, 1] |
| Binary filter | Keep digits 1 (+1) and 7 (−1) only |
| Subsample | Uniformly sample `n_samples` from the filtered pool |
| Scale + PCA | `StandardScaler`, then PCA to `n_qubits` dimensions |
| Train/test split | Single random permutation (seed-controlled), 80/20 split, performed once before all trials |
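The preprocessing chain above can be sketched in NumPy alone. This is an illustrative stand-in, not the benchmark script: random data replaces the SVHN `.mat` files, and manual standardization plus an SVD projection mirror what `StandardScaler` and `PCA` do.

```python
# NumPy-only sketch of the preprocessing steps (the benchmark itself uses
# sklearn's StandardScaler and PCA; random data stands in for SVHN images).
import numpy as np

n_samples, n_qubits, seed = 200, 6, 42
rng = np.random.default_rng(seed)

# Stand-in for flattened 32x32x3 SVHN images in [0, 1] (3072-dim float64).
X = rng.random((n_samples, 3072))

# StandardScaler equivalent: zero mean, unit variance per feature.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA to n_qubits dimensions via SVD of the standardized data.
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
X_pca = X_scaled @ Vt[:n_qubits].T

# Single seed-controlled permutation, 80/20 split, performed once.
perm = np.random.default_rng(seed).permutation(n_samples)
n_test = int(0.2 * n_samples)
test_idx, train_idx = perm[:n_test], perm[n_test:]
X_train, X_test = X_pca[train_idx], X_pca[test_idx]
print(X_train.shape, X_test.shape)  # (160, 6) (40, 6)
```

With `n_samples = 200` and a 0.2 test fraction this yields the 160/40 split reported in the results tables.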

2.2 IQP Encoding

Both pipelines implement the same IQP circuit:

$$|\psi\rangle = H^{\otimes n} \, U_{\text{phase}}(\mathbf{x}) \, H^{\otimes n} |0\rangle^{\otimes n}$$

where the diagonal unitary $U_{\text{phase}}$ applies:

  • Single-qubit phases: $\text{PhaseShift}(x_i) = \text{diag}(1,\, e^{i x_i})$ on qubit $i$
  • Two-qubit phases: $\text{ControlledPhaseShift}(x_i x_j) = \text{diag}(1,\, 1,\, 1,\, e^{i x_i x_j})$ on qubits $(i, j)$ for all $i < j$

This matches QDP's CUDA kernel (iqp.cu), which computes:

$$\text{amplitude}[z] = \frac{1}{2^n} \sum_{x} e^{i\theta(x)} \cdot (-1)^{\text{popcount}(x \land z)}$$

with $\theta(x) = \sum_i x_i \cdot \text{data}_i + \sum_{i<j} x_i \cdot x_j \cdot \text{data}_{ij}$.
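The equivalence between the H-D-H circuit and the kernel's closed-form sum can be checked numerically. The sketch below (NumPy only, small qubit count, randomly chosen phases) builds the state both ways and confirms they agree:

```python
# Sanity check: the H-D-H sandwich circuit and the closed-form amplitude
# sum from the kernel formula produce the same IQP state. Illustrative
# only; qubit count and phase data are arbitrary.
import numpy as np

n = 3                       # small qubit count keeps the check fast
dim = 2 ** n
rng = np.random.default_rng(0)
x = rng.random(n)           # features feeding the phases

def theta(b):
    """theta(b) = sum_i b_i x_i + sum_{i<j} b_i b_j x_i x_j for bitstring b."""
    bits = [(b >> i) & 1 for i in range(n)]
    t = sum(bits[i] * x[i] for i in range(n))
    t += sum(bits[i] * bits[j] * x[i] * x[j]
             for i in range(n) for j in range(i + 1, n))
    return t

# Route 1: explicit H-D-H matrix product, as in the PennyLane circuit.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
D = np.diag([np.exp(1j * theta(b)) for b in range(dim)])
psi_circuit = Hn @ D @ Hn @ np.eye(dim)[:, 0]

# Route 2: closed-form sum, matching the kernel formula above.
psi_kernel = np.array([
    sum(np.exp(1j * theta(b)) * (-1) ** bin(b & z).count("1")
        for b in range(dim)) / dim
    for z in range(dim)
])

print(np.allclose(psi_circuit, psi_kernel))  # True
```

Both routes use the same integer-bit convention for the basis labels, so qubit-ordering choices cancel out of the comparison.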

PennyLane baseline constructs this circuit with explicit PennyLane gates
(Hadamard, PhaseShift, ControlledPhaseShift) inside a @qml.qnode. It is
re-evaluated on every forward and backward pass.

QDP pipeline calls QdpEngine.encode(method="iqp") once on GPU, converts
the resulting state vectors to NumPy, and feeds them via StatePrep during
training.

2.3 Variational Classifier

Both pipelines share the same classifier architecture:

  • Variational layers: num_layers repetitions of Rot(theta, phi, omega)
    on each qubit + a ring of CNOTs
  • Readout: expval(PauliZ(0)) + trainable bias
  • Loss: Mean squared error (square loss)
  • Optimizer: Adam (lr = 0.01)
  • Batching: Random mini-batches of size batch_size per step
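One variational layer of this architecture can be simulated directly in NumPy. The sketch below is not the benchmark code (which uses `qml.Rot` on `default.qubit`); it illustrates the Rot-per-qubit layer, the CNOT ring, and the $\langle Z_0 \rangle$ + bias readout, using PennyLane's Rot convention $\text{Rot}(\phi, \theta, \omega) = R_Z(\omega) R_Y(\theta) R_Z(\phi)$:

```python
# NumPy sketch of one variational layer: Rot on each qubit, a ring of
# CNOTs, and readout <Z> on qubit 0 plus a trainable bias. Illustrative
# only; parameters are random and the bias is zero.
import numpy as np

n = 3
dim = 2 ** n

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

def rot(phi, theta, omega):
    # PennyLane convention: Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)
    return rz(omega) @ ry(theta) @ rz(phi)

def single_qubit_op(U, wire):
    ops = [np.eye(2)] * n
    ops[wire] = U
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full

def cnot(control, target):
    # Big-endian wire order: wire 0 is the most significant bit.
    full = np.zeros((dim, dim), dtype=complex)
    for b in range(dim):
        c = (b >> (n - 1 - control)) & 1
        t = b ^ (c << (n - 1 - target))
        full[t, b] = 1.0
    return full

rng = np.random.default_rng(42)
params = rng.normal(size=(n, 3))                  # one (phi, theta, omega) per qubit

psi = np.zeros(dim, dtype=complex); psi[0] = 1.0  # |000>
for w in range(n):
    psi = single_qubit_op(rot(*params[w]), w) @ psi
for w in range(n):                                # ring of CNOTs
    psi = cnot(w, (w + 1) % n) @ psi

Z0 = single_qubit_op(np.diag([1.0, -1.0]), 0)
bias = 0.0
prediction = np.real(psi.conj() @ Z0 @ psi) + bias
```

The square loss is then the mean of `(prediction - label) ** 2` over a mini-batch, optimized with Adam.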

2.4 Experimental Configuration

All runs use the following parameters unless stated otherwise:

| Parameter | Value |
| --- | --- |
| `--n-samples` | 200 |
| `--n-qubits` | 6 |
| `--iters` | 200 |
| `--batch-size` | 10 |
| `--layers` | 4 |
| `--lr` | 0.01 |
| `--optimizer` | adam |
| `--seed` | 42 |
| `--test-size` | 0.2 |
| `--early-stop` | 0 (disabled) |

Hardware: Single NVIDIA GPU (CUDA), CPU for PennyLane default.qubit.

2.5 Fairness Controls

To ensure an apples-to-apples comparison, the following controls are enforced:

  1. Identical quantum states — PennyLane uses the same H-D-H sandwich
    circuit with the same phase convention ($e^{i\phi}$) as QDP's CUDA kernel.
  2. Identical train/test split — Both pipelines split data once in main()
    using np.random.default_rng(seed).permutation() before the trial loop.
  3. Identical batch sampling RNG — Inside run_training(), the RNG is
    initialized fresh with np.random.default_rng(seed) with no prior
    permutation() call, so both pipelines draw the same mini-batch sequences.
  4. Fair timing — QDP's encode time includes GPU-to-CPU (torch.Tensor.cpu().numpy())
    transfer.
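Controls 2 and 3 hinge on both pipelines seeding a fresh `Generator` identically. A minimal illustration (the exact sampling call inside `run_training()` may differ; `rng.integers` is an assumption here):

```python
# Both pipelines initialize a fresh Generator with the same seed and no
# prior permutation() call, so their mini-batch index draws coincide.
import numpy as np

seed, n_train, batch_size, iters = 42, 160, 10, 5

def batch_indices(seed):
    rng = np.random.default_rng(seed)  # fresh Generator, identical state
    return [rng.integers(0, n_train, size=batch_size) for _ in range(iters)]

baseline_batches = batch_indices(seed)  # PennyLane baseline
qdp_batches = batch_indices(seed)       # QDP pipeline

print(all(np.array_equal(a, b)
          for a, b in zip(baseline_batches, qdp_batches)))  # True
```

Any extra draw from the generator before the batch loop (e.g. an additional `permutation()`) would desynchronize the sequences, which is why the split is done once in `main()` instead.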

3. Results

3.1 Single-Trial Comparison (seed = 42)

| Metric | PennyLane baseline | QDP pipeline |
| --- | --- | --- |
| Train samples | 160 | 160 |
| Test samples | 40 | 40 |
| Train accuracy | 0.6625 | 0.6625 |
| Test accuracy | 0.6750 | 0.6750 |
| Compile time | 0.0838 s | 0.0303 s |
| Train time | 129.94 s | 120.64 s |
| Throughput | 15.4 samples/s | 16.6 samples/s |
| IQP encode time | (embedded in train) | 0.0095 s (one-shot) |
| PCA time | 0.1704 s | 0.1700 s |

  • Train and test accuracies match exactly, confirming numerical equivalence
    of the two encoding implementations.
  • QDP is ~7.2% faster in wall-clock training time (120.6 s vs 129.9 s).
  • QDP's one-shot encoding takes only 9.5 ms including GPU-to-CPU transfer.

3.2 Multi-Trial Run (3 trials, PennyLane baseline)

Seeds: 42, 43, 44. Same train/test split across all trials.

| Trial | Seed | Train acc | Test acc | Train time (s) |
| --- | --- | --- | --- | --- |
| 1 | 42 | 0.6625 | 0.6750 | 133.10 |
| 2 | 43 | 0.6562 | 0.6750 | 133.33 |
| 3 | 44 | 0.6750 | 0.6500 | 129.93 |

Aggregate statistics:

| Statistic | Value |
| --- | --- |
| Best | 0.6750 |
| Median | 0.6750 |
| Mean ± Std | 0.6667 ± 0.0118 |
| Min | 0.6500 |
| Max | 0.6750 |

The consistent train/test partition across trials (verified by constant
n_train = 160, n_test = 40) confirms the split-once-in-main fix.

3.3 Timing Breakdown (QDP pipeline)

| Phase | Time | Notes |
| --- | --- | --- |
| PCA | 0.1700 s | StandardScaler + PCA (CPU) |
| IQP encode (QDP) | 0.0095 s | GPU kernel + D2H transfer |
| Training (avg) | 120.64 s | default.qubit + Adam |
| Encode fraction | < 0.01% | of (encode + train) |

The IQP encoding is negligible relative to training time for this problem size.
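A back-of-envelope check of the encode fraction from the table:

```python
# Encode fraction = encode / (encode + train), using the reported timings.
encode, train = 0.0095, 120.64  # seconds
fraction = encode / (encode + train)
print(f"{fraction:.4%}")  # roughly 0.008%, i.e. well under 0.01%
```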

4. Discussion

4.1 Encoding Equivalence

The exact match in train/test accuracy (0.6625/0.6750) across pipelines
confirms that the PennyLane H-D-H circuit and QDP's CUDA kernel produce
identical quantum states. This is expected since:

  • $\text{PhaseShift}(\phi) = \text{diag}(1,\, e^{i\phi})$ matches QDP's $e^{i \cdot x_i \cdot \text{data}[i]}$
  • $\text{ControlledPhaseShift}(\phi) = \text{diag}(1,\, 1,\, 1,\, e^{i\phi})$ matches QDP's two-qubit interaction term
  • Both use the same $H$-$D$-$H$ sandwich structure

4.2 Performance

At 200 samples and 6 qubits, the state dimension is only $2^6 = 64$, so the
encoding cost is negligible in both pipelines. The ~7% speed advantage of QDP
comes from not re-running the IQP gates on every forward/backward pass. This
advantage is expected to grow with:

  • More qubits (exponential state space)
  • More training steps (more circuit evaluations)
  • Larger batches (more encoding calls per step in PennyLane)

4.3 Limitations

  • Accuracies (~67%) are modest because --iters 200 and --n-samples 200 are
    deliberately small for benchmarking speed, not classification quality.
  • Training uses PennyLane's default.qubit (CPU) in both pipelines. A
    GPU-native training backend would shift the bottleneck.

5. Reproduction

```bash
# PennyLane baseline
python3 qdp/qdp-python/benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py \
  --n-samples 200 --n-qubits 6 --iters 200 --batch-size 10 --layers 4 \
  --trials 1 --seed 42 --early-stop 0

# QDP pipeline
python3 qdp/qdp-python/benchmark/encoding_benchmarks/qdp_pipeline/svhn_iqp.py \
  --n-samples 200 --n-qubits 6 --iters 200 --batch-size 10 --layers 4 \
  --trials 1 --seed 42 --early-stop 0

# Multi-trial
python3 qdp/qdp-python/benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py \
  --n-samples 200 --n-qubits 6 --iters 200 --batch-size 10 --layers 4 \
  --trials 3 --seed 42 --early-stop 0
```

@ryankert01 ryankert01 changed the title feat: add QDP pipeline for SVHN IQP variational classifier with one-time encoding [QDP] Add SVHN IQP encoding benchmark with PennyLane baseline and QDP pipeline Mar 15, 2026