[QDP] Add SVHN IQP encoding benchmark with PennyLane baseline and QDP pipeline #1186

Open
ryankert01 wants to merge 3 commits into apache:main from ryankert01:svhn-iqp
Conversation

@ryankert01 (Member)

1. Introduction

This report presents a benchmark comparing two quantum encoding pipelines for a
binary classification task on the SVHN (Street View House Numbers) dataset. The
task discriminates digit 1 vs digit 7 using an IQP (Instantaneous Quantum
Polynomial) encoding followed by a variational quantum classifier.

The two pipelines under comparison are:

  • PennyLane baseline (pennylane_baseline/svhn_iqp.py) — IQP encoding is
    embedded inside the quantum circuit and re-executed on every forward/backward
    pass during training.
  • QDP pipeline (qdp_pipeline/svhn_iqp.py) — IQP encoding is performed
    once upfront on GPU via QDP's CUDA kernels. The training circuit loads
    pre-encoded state vectors using StatePrep.

The central question is: how much wall-clock time does one-shot GPU encoding
save compared to re-encoding on every circuit evaluation, given identical
quantum states and identical training configurations?

2. Method

2.1 Dataset and Preprocessing

| Step | Description |
| --- | --- |
| Source | SVHN `train_32x32.mat` + `test_32x32.mat` (Stanford) |
| Flatten | 32 × 32 × 3 = 3072-dim float64 vectors, normalized to [0, 1] |
| Binary filter | Keep digits 1 (+1) and 7 (−1) only |
| Subsample | Uniformly sample `n_samples` from the filtered pool |
| Scale + PCA | `StandardScaler`, then PCA to `n_qubits` dimensions |
| Train/test split | Single random permutation (seed-controlled), 80/20 split, performed once before all trials |
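The preprocessing chain above can be sketched in NumPy alone. This is an illustrative stand-in, not the benchmark script: random data replaces the SVHN `.mat` files, and manual standardization plus an SVD projection mirror what `StandardScaler` and `PCA` do.

```python
# NumPy-only sketch of the preprocessing steps (the benchmark itself uses
# sklearn's StandardScaler and PCA; random data stands in for SVHN images).
import numpy as np

n_samples, n_qubits, seed = 200, 6, 42
rng = np.random.default_rng(seed)

# Stand-in for flattened 32x32x3 SVHN images in [0, 1] (3072-dim float64).
X = rng.random((n_samples, 3072))

# StandardScaler equivalent: zero mean, unit variance per feature.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA to n_qubits dimensions via SVD of the standardized data.
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
X_pca = X_scaled @ Vt[:n_qubits].T

# Single seed-controlled permutation, 80/20 split, performed once.
perm = np.random.default_rng(seed).permutation(n_samples)
n_test = int(0.2 * n_samples)
test_idx, train_idx = perm[:n_test], perm[n_test:]
X_train, X_test = X_pca[train_idx], X_pca[test_idx]
print(X_train.shape, X_test.shape)  # (160, 6) (40, 6)
```

With `n_samples = 200` and a 0.2 test fraction this yields the 160/40 split reported in the results tables.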

2.2 IQP Encoding

Both pipelines implement the same IQP circuit:

$$|\psi\rangle = H^{\otimes n} \, U_{\text{phase}}(\mathbf{x}) \, H^{\otimes n} |0\rangle^{\otimes n}$$

where the diagonal unitary $U_{\text{phase}}$ applies:

  • Single-qubit phases: $\text{PhaseShift}(x_i) = \text{diag}(1,\, e^{i x_i})$ on qubit $i$
  • Two-qubit phases: $\text{ControlledPhaseShift}(x_i x_j) = \text{diag}(1,\, 1,\, 1,\, e^{i x_i x_j})$ on qubits $(i, j)$ for all $i < j$

This matches QDP's CUDA kernel (iqp.cu), which computes:

$$\text{amplitude}[z] = \frac{1}{2^n} \sum_{x} e^{i\theta(x)} \cdot (-1)^{\text{popcount}(x \land z)}$$

with $\theta(x) = \sum_i x_i \cdot \text{data}_i + \sum_{i<j} x_i \cdot x_j \cdot \text{data}_{ij}$.
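The equivalence between the H-D-H circuit and the kernel's closed-form sum can be checked numerically. The sketch below (NumPy only, small qubit count, randomly chosen phases) builds the state both ways and confirms they agree:

```python
# Sanity check: the H-D-H sandwich circuit and the closed-form amplitude
# sum from the kernel formula produce the same IQP state. Illustrative
# only; qubit count and phase data are arbitrary.
import numpy as np

n = 3                       # small qubit count keeps the check fast
dim = 2 ** n
rng = np.random.default_rng(0)
x = rng.random(n)           # features feeding the phases

def theta(b):
    """theta(b) = sum_i b_i x_i + sum_{i<j} b_i b_j x_i x_j for bitstring b."""
    bits = [(b >> i) & 1 for i in range(n)]
    t = sum(bits[i] * x[i] for i in range(n))
    t += sum(bits[i] * bits[j] * x[i] * x[j]
             for i in range(n) for j in range(i + 1, n))
    return t

# Route 1: explicit H-D-H matrix product, as in the PennyLane circuit.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
D = np.diag([np.exp(1j * theta(b)) for b in range(dim)])
psi_circuit = Hn @ D @ Hn @ np.eye(dim)[:, 0]

# Route 2: closed-form sum, matching the kernel formula above.
psi_kernel = np.array([
    sum(np.exp(1j * theta(b)) * (-1) ** bin(b & z).count("1")
        for b in range(dim)) / dim
    for z in range(dim)
])

print(np.allclose(psi_circuit, psi_kernel))  # True
```

Both routes use the same integer-bit convention for the basis labels, so qubit-ordering choices cancel out of the comparison.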

PennyLane baseline constructs this circuit with explicit PennyLane gates
(Hadamard, PhaseShift, ControlledPhaseShift) inside a @qml.qnode. It is
re-evaluated on every forward and backward pass.

QDP pipeline calls QdpEngine.encode(method="iqp") once on GPU, converts
the resulting state vectors to NumPy, and feeds them via StatePrep during
training.

2.3 Variational Classifier

Both pipelines share the same classifier architecture:

  • Variational layers: num_layers repetitions of Rot(theta, phi, omega)
    on each qubit + a ring of CNOTs
  • Readout: expval(PauliZ(0)) + trainable bias
  • Loss: Mean squared error (square loss)
  • Optimizer: Adam (lr = 0.01)
  • Batching: Random mini-batches of size batch_size per step
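One variational layer of this architecture can be simulated directly in NumPy. The sketch below is not the benchmark code (which uses `qml.Rot` on `default.qubit`); it illustrates the Rot-per-qubit layer, the CNOT ring, and the $\langle Z_0 \rangle$ + bias readout, using PennyLane's Rot convention $\text{Rot}(\phi, \theta, \omega) = R_Z(\omega) R_Y(\theta) R_Z(\phi)$:

```python
# NumPy sketch of one variational layer: Rot on each qubit, a ring of
# CNOTs, and readout <Z> on qubit 0 plus a trainable bias. Illustrative
# only; parameters are random and the bias is zero.
import numpy as np

n = 3
dim = 2 ** n

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

def rot(phi, theta, omega):
    # PennyLane convention: Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)
    return rz(omega) @ ry(theta) @ rz(phi)

def single_qubit_op(U, wire):
    ops = [np.eye(2)] * n
    ops[wire] = U
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full

def cnot(control, target):
    # Big-endian wire order: wire 0 is the most significant bit.
    full = np.zeros((dim, dim), dtype=complex)
    for b in range(dim):
        c = (b >> (n - 1 - control)) & 1
        t = b ^ (c << (n - 1 - target))
        full[t, b] = 1.0
    return full

rng = np.random.default_rng(42)
params = rng.normal(size=(n, 3))                  # one (phi, theta, omega) per qubit

psi = np.zeros(dim, dtype=complex); psi[0] = 1.0  # |000>
for w in range(n):
    psi = single_qubit_op(rot(*params[w]), w) @ psi
for w in range(n):                                # ring of CNOTs
    psi = cnot(w, (w + 1) % n) @ psi

Z0 = single_qubit_op(np.diag([1.0, -1.0]), 0)
bias = 0.0
prediction = np.real(psi.conj() @ Z0 @ psi) + bias
```

The square loss is then the mean of `(prediction - label) ** 2` over a mini-batch, optimized with Adam.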

2.4 Experimental Configuration

All runs use the following parameters unless stated otherwise:

| Parameter | Value |
| --- | --- |
| `--n-samples` | 200 |
| `--n-qubits` | 6 |
| `--iters` | 200 |
| `--batch-size` | 10 |
| `--layers` | 4 |
| `--lr` | 0.01 |
| `--optimizer` | adam |
| `--seed` | 42 |
| `--test-size` | 0.2 |
| `--early-stop` | 0 (disabled) |

Hardware: Single NVIDIA GPU (CUDA), CPU for PennyLane default.qubit.

2.5 Fairness Controls

To ensure an apples-to-apples comparison, the following controls are enforced:

  1. Identical quantum states — PennyLane uses the same H-D-H sandwich
    circuit with the same phase convention ($e^{i\phi}$) as QDP's CUDA kernel.
  2. Identical train/test split — Both pipelines split data once in main()
    using np.random.default_rng(seed).permutation() before the trial loop.
  3. Identical batch sampling RNG — Inside run_training(), the RNG is
    initialized fresh with np.random.default_rng(seed) with no prior
    permutation() call, so both pipelines draw the same mini-batch sequences.
  4. Fair timing — QDP's encode time includes GPU-to-CPU (torch.Tensor.cpu().numpy())
    transfer.
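Controls 2 and 3 hinge on both pipelines seeding a fresh `Generator` identically. A minimal illustration (the exact sampling call inside `run_training()` may differ; `rng.integers` is an assumption here):

```python
# Both pipelines initialize a fresh Generator with the same seed and no
# prior permutation() call, so their mini-batch index draws coincide.
import numpy as np

seed, n_train, batch_size, iters = 42, 160, 10, 5

def batch_indices(seed):
    rng = np.random.default_rng(seed)  # fresh Generator, identical state
    return [rng.integers(0, n_train, size=batch_size) for _ in range(iters)]

baseline_batches = batch_indices(seed)  # PennyLane baseline
qdp_batches = batch_indices(seed)       # QDP pipeline

print(all(np.array_equal(a, b)
          for a, b in zip(baseline_batches, qdp_batches)))  # True
```

Any extra draw from the generator before the batch loop (e.g. an additional `permutation()`) would desynchronize the sequences, which is why the split is done once in `main()` instead.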

3. Results

3.1 Single-Trial Comparison (seed = 42)

| Metric | PennyLane baseline | QDP pipeline |
| --- | --- | --- |
| Train samples | 160 | 160 |
| Test samples | 40 | 40 |
| Train accuracy | 0.6625 | 0.6625 |
| Test accuracy | 0.6750 | 0.6750 |
| Compile time | 0.0838 s | 0.0303 s |
| Train time | 129.94 s | 120.64 s |
| Throughput | 15.4 samples/s | 16.6 samples/s |
| IQP encode time | (embedded in train) | 0.0095 s (one-shot) |
| PCA time | 0.1704 s | 0.1700 s |

  • Train and test accuracies match exactly, confirming numerical equivalence
    of the two encoding implementations.
  • QDP is ~7.2% faster in wall-clock training time (120.6 s vs 129.9 s).
  • QDP's one-shot encoding takes only 9.5 ms including GPU-to-CPU transfer.

3.2 Multi-Trial Run (3 trials, PennyLane baseline)

Seeds: 42, 43, 44. Same train/test split across all trials.

| Trial | Seed | Train acc | Test acc | Train time (s) |
| --- | --- | --- | --- | --- |
| 1 | 42 | 0.6625 | 0.6750 | 133.10 |
| 2 | 43 | 0.6562 | 0.6750 | 133.33 |
| 3 | 44 | 0.6750 | 0.6500 | 129.93 |

Aggregate statistics:

| Statistic | Value |
| --- | --- |
| Best | 0.6750 |
| Median | 0.6750 |
| Mean ± Std | 0.6667 ± 0.0118 |
| Min | 0.6500 |
| Max | 0.6750 |

The consistent train/test partition across trials (verified by constant
n_train = 160, n_test = 40) confirms the split-once-in-main fix.

3.3 Timing Breakdown (QDP pipeline)

| Phase | Time | Notes |
| --- | --- | --- |
| PCA | 0.1700 s | StandardScaler + PCA (CPU) |
| IQP encode (QDP) | 0.0095 s | GPU kernel + D2H transfer |
| Training (avg) | 120.64 s | default.qubit + Adam |
| Encode fraction | < 0.01% | of (encode + train) |

The IQP encoding is negligible relative to training time for this problem size.
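A back-of-envelope check of the encode fraction from the table:

```python
# Encode fraction = encode / (encode + train), using the reported timings.
encode, train = 0.0095, 120.64  # seconds
fraction = encode / (encode + train)
print(f"{fraction:.4%}")  # roughly 0.008%, i.e. well under 0.01%
```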

4. Discussion

4.1 Encoding Equivalence

The exact match in train/test accuracy (0.6625/0.6750) across pipelines
confirms that the PennyLane H-D-H circuit and QDP's CUDA kernel produce
identical quantum states. This is expected since:

  • $\text{PhaseShift}(\phi) = \text{diag}(1,\, e^{i\phi})$ matches QDP's $e^{i \cdot x_i \cdot \text{data}[i]}$
  • $\text{ControlledPhaseShift}(\phi) = \text{diag}(1,\, 1,\, 1,\, e^{i\phi})$ matches QDP's two-qubit interaction term
  • Both use the same $H$-$D$-$H$ sandwich structure

4.2 Performance

At 200 samples and 6 qubits, the state dimension is only $2^6 = 64$, so the
encoding cost is negligible in both pipelines. The ~7% speed advantage of QDP
comes from not re-running the IQP gates on every forward/backward pass. This
advantage is expected to grow with:

  • More qubits (exponential state space)
  • More training steps (more circuit evaluations)
  • Larger batches (more encoding calls per step in PennyLane)

4.3 Limitations

  • Accuracies (~67%) are modest because --iters 200 and --n-samples 200 are
    deliberately small for benchmarking speed, not classification quality.
  • Training uses PennyLane's default.qubit (CPU) in both pipelines. A
    GPU-native training backend would shift the bottleneck.

5. Reproduction

```bash
# PennyLane baseline
python3 qdp/qdp-python/benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py \
  --n-samples 200 --n-qubits 6 --iters 200 --batch-size 10 --layers 4 \
  --trials 1 --seed 42 --early-stop 0

# QDP pipeline
python3 qdp/qdp-python/benchmark/encoding_benchmarks/qdp_pipeline/svhn_iqp.py \
  --n-samples 200 --n-qubits 6 --iters 200 --batch-size 10 --layers 4 \
  --trials 1 --seed 42 --early-stop 0

# Multi-trial
python3 qdp/qdp-python/benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py \
  --n-samples 200 --n-qubits 6 --iters 200 --batch-size 10 --layers 4 \
  --trials 3 --seed 42 --early-stop 0
```

@ryankert01 ryankert01 changed the title feat: add QDP pipeline for SVHN IQP variational classifier with one-time encoding [QDP] Add SVHN IQP encoding benchmark with PennyLane baseline and QDP pipeline Mar 15, 2026