Conversation
Port 5 examples from pytorch/examples that were missing:
- SuperResolution: ESPCN model using PixelShuffle for image upscaling
- GCN: Graph Convolutional Network for node classification
- GAT: Graph Attention Network with multi-head attention
- ForwardForward: Forward-Forward algorithm (Hinton, 2022) for MNIST
- SiameseNetwork: Siamese network for image similarity using BCELoss

Each example includes both the model (in Models/) and the training harness (in CSharpExamples/), all wired up in Program.cs.
Port two missing PyTorch examples to TorchSharp:
- TimeSequencePrediction: LSTM-based sine wave prediction using stacked LSTMCells, matching pytorch/examples/time_sequence_prediction. Uses synthetic sine wave data with the LBFGS optimizer.
- WordLanguageModel: LSTM/GRU/RNN word-level language model on WikiText-2, matching pytorch/examples/word_language_model. Supports four model types (wlm-lstm, wlm-gru, wlm-rnn-tanh, wlm-rnn-relu) with manual SGD, gradient clipping, and learning rate annealing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds seven new machine learning model examples to the TorchSharp C# examples collection, porting PyTorch examples to demonstrate various neural network architectures and training techniques.
Changes:
- Adds RNN-based word language models (LSTM/GRU/RNN variants) for text prediction
- Implements time-series prediction with stacked LSTM cells for sine wave forecasting
- Adds super-resolution model using sub-pixel convolution (ESPCN)
- Implements Siamese networks for image similarity comparison
- Adds graph neural networks (GCN and GAT) for node classification
- Implements Forward-Forward algorithm as an alternative to backpropagation
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/CSharp/Models/WordLanguageModel.cs | RNN model class supporting LSTM, GRU, and vanilla RNN with weight tying |
| src/CSharp/Models/TimeSequencePrediction.cs | Stacked LSTMCell model for sequential prediction |
| src/CSharp/Models/SuperResolution.cs | ESPCN model with PixelShuffle for image upscaling |
| src/CSharp/Models/SiameseNetwork.cs | Dual CNN backbone for image pair comparison |
| src/CSharp/Models/GCN.cs | Graph convolutional layer and 2-layer GCN model |
| src/CSharp/Models/GAT.cs | Graph attention layer with multi-head attention and 2-layer GAT model |
| src/CSharp/Models/ForwardForward.cs | Layer-wise training implementation of Forward-Forward algorithm |
| src/CSharp/CSharpExamples/WordLanguageModel.cs | Training runner for word language model on WikiText-2 |
| src/CSharp/CSharpExamples/TimeSequencePrediction.cs | Training runner with synthetic sine wave data |
| src/CSharp/CSharpExamples/SuperResolution.cs | Training runner using MNIST for super-resolution demonstration |
| src/CSharp/CSharpExamples/SiameseNetwork.cs | Training runner with MNIST pair generation |
| src/CSharp/CSharpExamples/GCN.cs | Training runner with synthetic graph data |
| src/CSharp/CSharpExamples/GAT.cs | Training runner with synthetic graph data and multi-head attention |
| src/CSharp/CSharpExamples/ForwardForward.cs | Training runner implementing layer-wise Forward-Forward training |
| src/CSharp/CSharpExamples/Program.cs | Adds command-line switches for all new examples |
```csharp
if (tie_weights)
{
    if (nhid != ninp)
        throw new ArgumentException("When using the tied flag, nhid must be equal to emsize");
```
The error message refers to 'nhid must be equal to emsize', but the parameter name is 'ninp' not 'emsize'. This inconsistency between the error message and the actual parameter name could confuse users. Consider changing the error message to: "When using the tied flag, nhid must be equal to ninp"
Suggested change:
```diff
-throw new ArgumentException("When using the tied flag, nhid must be equal to emsize");
+throw new ArgumentException("When using the tied flag, nhid must be equal to ninp");
```
```csharp
lstm1 = LSTMCell(1, 51);
lstm2 = LSTMCell(51, 51);
linear = Linear(51, 1);
```
The model hard-codes the hidden size as 51 in multiple places (lines 26, 27, 28, 49, 50, 51, 52). This magic number should be extracted to a named constant or passed as a constructor parameter to improve maintainability and make the model more flexible.
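One way to address this is a constructor parameter with the current value as the default. A minimal sketch (the `hiddenSize` parameter name and the surrounding class shape are illustrative, not the PR's actual code):

```csharp
// Sketch: hidden size as a constructor parameter instead of a hard-coded 51.
// The name `hiddenSize` is illustrative; 51 is kept as the default to
// preserve the current behavior.
public Sequence(string name, int hiddenSize = 51) : base(name)
{
    this.hiddenSize = hiddenSize;
    lstm1 = LSTMCell(1, hiddenSize);
    lstm2 = LSTMCell(hiddenSize, hiddenSize);
    linear = Linear(hiddenSize, 1);
    RegisterComponents();
}
```

The forward pass would then allocate its hidden-state tensors from `this.hiddenSize` instead of repeating the literal.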
```csharp
total_loss += loss.to(torch.CPU).item<float>();
```
The loss calculation here is inconsistent with the Evaluate function. In Train, it accumulates raw loss values (line 170), while in Evaluate it multiplies by data.shape[0] (line 204). This means the two loss values are not directly comparable and could lead to confusion. Consider making the loss calculation consistent between training and evaluation.
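One way to make the two comparable is to weight the accumulated loss by the number of elements in both loops and divide by the total count at the end. A sketch (variable names such as `totalTokens` are illustrative):

```csharp
// Sketch: accumulate loss weighted by batch size in both Train and Evaluate
// so the reported averages are directly comparable.
total_loss += loss.to(torch.CPU).item<float>() * data.shape[0];
totalTokens += data.shape[0];
// ... after the loop:
var avgLoss = total_loss / totalTokens;
```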
```csharp
var h = OverlayLabelOnInput(x, torch.full(x.shape[0], label, dtype: ScalarType.Int64, device: device));
var goodness = torch.tensor(0.0f, device: device);
foreach (var layer in layers) {
    h = layer.forward(h);
    goodness = goodness + h.pow(2).mean(new long[] { 1 });
}
goodnessList.Add(goodness.unsqueeze(1));
```
The goodnessList accumulates tensors that are not disposed between loop iterations. This could lead to memory leaks during prediction, especially when iterating over all 10 labels. Consider using a NewDisposeScope or explicitly disposing tensors that are no longer needed after each iteration.
Suggested change:
```diff
-var h = OverlayLabelOnInput(x, torch.full(x.shape[0], label, dtype: ScalarType.Int64, device: device));
-var goodness = torch.tensor(0.0f, device: device);
-foreach (var layer in layers) {
-    h = layer.forward(h);
-    goodness = goodness + h.pow(2).mean(new long[] { 1 });
-}
-goodnessList.Add(goodness.unsqueeze(1));
+using (var d = torch.NewDisposeScope()) {
+    var labelTensor = torch.full(x.shape[0], label, dtype: ScalarType.Int64, device: device);
+    var h = OverlayLabelOnInput(x, labelTensor);
+    var goodness = torch.tensor(0.0f, device: device);
+    foreach (var layer in layers) {
+        h = layer.forward(h);
+        goodness = goodness + h.pow(2).mean(new long[] { 1 });
+    }
+    var goodnessUnsqueezed = goodness.unsqueeze(1).MoveToOuterDisposeScope();
+    goodnessList.Add(goodnessUnsqueezed);
+}
```
```csharp
var sourceScores = torch.matmul(hTransformed, a.index(new TensorIndex[] {
    TensorIndex.Colon, TensorIndex.Slice(null, nHidden), TensorIndex.Colon }));
var targetScores = torch.matmul(hTransformed, a.index(new TensorIndex[] {
    TensorIndex.Colon, TensorIndex.Slice(nHidden), TensorIndex.Colon }));
```
The slicing operation on line 64 uses 'TensorIndex.Slice(nHidden)', which is equivalent to 'TensorIndex.Slice(nHidden, null)', meaning it slices from nHidden to the end. For the attention mechanism, this should slice from nHidden to 2*nHidden. While the current code works if the tensor has exactly 2*nHidden elements in that dimension, it is less explicit and could be confusing. Consider using 'TensorIndex.Slice(nHidden, 2 * nHidden)' for clarity.
Suggested change:
```diff
-    TensorIndex.Colon, TensorIndex.Slice(nHidden), TensorIndex.Colon }));
+    TensorIndex.Colon, TensorIndex.Slice(nHidden, 2 * nHidden), TensorIndex.Colon }));
```
```csharp
var idxTrain = idx.slice(0, 1600, numNodes, 1);
var idxVal = idx.slice(0, 1200, 1600, 1);
var idxTest = idx.slice(0, 0, 1200, 1);
```
The train/val/test splits appear inverted. idx.slice(0, 1600, numNodes, 1) on line 65 assigns training only the indices from 1600 to numNodes, while the test split takes the first 1200 indices, so training ends up with the smallest subset and test with the largest. Conventionally the training set should get the largest subset, validation a medium one, and test the smallest. Verify that the slice parameters match the intended data split.
Suggested change:
```diff
-var idxTrain = idx.slice(0, 1600, numNodes, 1);
-var idxVal = idx.slice(0, 1200, 1600, 1);
-var idxTest = idx.slice(0, 0, 1200, 1);
+int numTrain = 1600;
+int numVal = 400;
+int trainEnd = numTrain;
+int valEnd = numTrain + numVal;
+var idxTrain = idx.slice(0, 0, trainEnd, 1);
+var idxVal = idx.slice(0, trainEnd, valEnd, 1);
+var idxTest = idx.slice(0, valEnd, numNodes, 1);
```
```csharp
// Mask non-existent edges
var connectivityMask = -9e16 * torch.ones_like(e);
```
The connectivity mask uses -9e16 which may be too large and could cause numerical instability or overflow issues. PyTorch's standard practice is to use -9e15 or a safer value like -1e9. Consider using a more conservative masking value to avoid potential NaN values after softmax.
Suggested change:
```diff
-// Mask non-existent edges
-var connectivityMask = -9e16 * torch.ones_like(e);
+// Mask non-existent edges with a large negative value to approximate -infinity
+var connectivityMask = -1e9 * torch.ones_like(e);
```
```csharp
// Add edges and self-loops
for (int i = 0; i < numNodes; i++) {
    adjMat[i, i] = 1.0f; // self-loops
}
// Note: In a real implementation, you'd construct the adjacency matrix properly
// and apply the renormalization trick D^(-1/2) A D^(-1/2)
// For now, use identity + random edges normalized by degree
adjMat = adjMat + torch.eye(numNodes, device: device) * 0.1f;
```
The code adds an identity matrix scaled by 0.1 to the adjacency matrix that already has self-loops (line 66). This results in diagonal values of 1.1 instead of 1.0, which may not be the intended behavior. The comment on line 63 mentions "In a real implementation, you'd construct the adjacency matrix properly", but this double-addition of self-loops should be corrected even in this synthetic example.
Suggested change:
```diff
-// Add edges and self-loops
-for (int i = 0; i < numNodes; i++) {
-    adjMat[i, i] = 1.0f; // self-loops
-}
-// Note: In a real implementation, you'd construct the adjacency matrix properly
-// and apply the renormalization trick D^(-1/2) A D^(-1/2)
-// For now, use identity + random edges normalized by degree
-adjMat = adjMat + torch.eye(numNodes, device: device) * 0.1f;
+// Add self-loops
+for (int i = 0; i < numNodes; i++) {
+    adjMat[i, i] = 1.0f; // self-loops
+}
+// Note: In a real implementation, you'd construct the adjacency matrix properly,
+// add edges based on the dataset, and apply the renormalization trick D^(-1/2) A D^(-1/2).
+// Here we only add self-loops for demonstration; the adjacency is then normalized below.
```
```csharp
// Find another image with the same label
int j = rng.Next(batchSize);
int attempts = 0;
while (labels[j].item<long>() != sameLabel && attempts < batchSize) {
    j = rng.Next(batchSize);
    attempts++;
}
images2.Add(data[j].unsqueeze(0));
targets.Add(1.0f);
} else {
    // Different class pair
    var thisLabel = labels[i].item<long>();
    int j = rng.Next(batchSize);
    int attempts = 0;
    while (labels[j].item<long>() == thisLabel && attempts < batchSize) {
        j = rng.Next(batchSize);
        attempts++;
    }
```
The CreatePairs function has a potential infinite loop issue. When trying to find a matching or different label, the code gives up after 'batchSize' attempts (lines 112-115, 123-126), but if no suitable pair is found, it uses the last randomly selected index 'j'. This could result in incorrect pairs (same label when different was requested, or vice versa). Consider either guaranteeing valid pairs or throwing an exception if no valid pair can be found.
Suggested change:
```diff
-// Find another image with the same label
-int j = rng.Next(batchSize);
-int attempts = 0;
-while (labels[j].item<long>() != sameLabel && attempts < batchSize) {
-    j = rng.Next(batchSize);
-    attempts++;
-}
-images2.Add(data[j].unsqueeze(0));
-targets.Add(1.0f);
-} else {
-    // Different class pair
-    var thisLabel = labels[i].item<long>();
-    int j = rng.Next(batchSize);
-    int attempts = 0;
-    while (labels[j].item<long>() == thisLabel && attempts < batchSize) {
-        j = rng.Next(batchSize);
-        attempts++;
-    }
+// Find another image with the same label (and not the same index)
+int j = -1;
+int start = rng.Next(batchSize);
+for (int offset = 0; offset < batchSize; offset++) {
+    int idx = (start + offset) % batchSize;
+    if (idx == i) continue;
+    if (labels[idx].item<long>() == sameLabel) {
+        j = idx;
+        break;
+    }
+}
+if (j == -1) {
+    throw new InvalidOperationException("Unable to find a pair with the same label in the current batch.");
+}
+images2.Add(data[j].unsqueeze(0));
+targets.Add(1.0f);
+} else {
+    // Different class pair
+    var thisLabel = labels[i].item<long>();
+    int j = -1;
+    int start = rng.Next(batchSize);
+    for (int offset = 0; offset < batchSize; offset++) {
+        int idx = (start + offset) % batchSize;
+        if (idx == i) continue;
+        if (labels[idx].item<long>() != thisLabel) {
+            j = idx;
+            break;
+        }
+    }
+    if (j == -1) {
+        throw new InvalidOperationException("Unable to find a pair with a different label in the current batch.");
+    }
```
```csharp
var idx = torch.randperm(numNodes, device: device);
var idxTrain = idx.slice(0, 1500, numNodes, 1);
var idxVal = idx.slice(0, 1000, 1500, 1);
var idxTest = idx.slice(0, 0, 1000, 1);
```
The slice arguments here are (dim, start, end, step), so idx.slice(0, 1500, numNodes, 1) on line 77 already selects indices 1500 through numNodes. The actual concern, as in the GCN example, is that the split ordering looks inverted: training takes only the tail of the permutation while the test split takes the first 1000 indices, the largest share. Conventionally the training set should get the first and largest contiguous range of the shuffled indices, with validation and test taking the remainder. Similar issues exist on lines 78-79.
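A fix along the same lines as the GCN suggestion, as a sketch (the boundary values below are illustrative; in practice derive them from numNodes, ensuring valEnd does not exceed it):

```csharp
// Sketch: give training the first and largest slice of the shuffled indices.
// trainEnd/valEnd are illustrative constants, not taken from the PR.
int trainEnd = 1500;
int valEnd = trainEnd + 500;
var idxTrain = idx.slice(0, 0, trainEnd, 1);
var idxVal = idx.slice(0, trainEnd, valEnd, 1);
var idxTest = idx.slice(0, valEnd, numNodes, 1);
```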