feat(internal): protocol setup #540
aristidesstaffieri wants to merge 21 commits into feature/data-migrations
Pull request overview
Implements the “protocol-setup” classification pipeline (issue #506): adds a CLI command and service to fetch unclassified contract WASMs, extract Soroban spec entries, and classify/assign protocol_id for matching protocols; also introduces a protocols table/status tracking and related model plumbing across ingest/indexer.
Changes:
- Added `protocol-setup` CLI + `ProtocolSetupService`, WASM spec extraction, and protocol validator abstractions (incl. SEP-41 validator).
- Added `protocols` table + protocol registration migration runner, and FK from `protocol_wasms.protocol_id` → `protocols.id`.
- Refactored protocol wasm model/type usage across ingest/indexer/checkpoint and added unit/integration tests.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| internal/services/validator_registry_test.go | Adds tests for validator registry behavior. |
| internal/services/validator_registry.go | Introduces global registry for protocol validator factories. |
| internal/services/protocol_validator.go | Adds ProtocolValidator/WasmSpecExtractor and wazero-based spec extraction. |
| internal/services/protocol_setup_test.go | Adds unit tests for protocol setup service behavior and batching. |
| internal/services/protocol_setup.go | Implements protocol classification flow + RPC fetching and DB updates. |
| internal/services/mocks.go | Adds mocks for ProtocolValidator and WasmSpecExtractor. |
| internal/services/ingest_test.go | Updates ingest tests for renamed protocol wasm types. |
| internal/services/ingest_live.go | Updates live ingest persistence to use ProtocolWasms model/type. |
| internal/services/ingest_backfill.go | Updates backfill ingest persistence/signatures for renamed protocol wasm types. |
| internal/services/contract_validator.go | Adds SEP-41 protocol validator wrapper. |
| internal/services/checkpoint_test.go | Updates checkpoint tests for renamed protocol wasm model/type. |
| internal/services/checkpoint.go | Updates checkpoint service to use renamed protocol wasm model/type. |
| internal/integrationtests/protocol_setup_test.go | Adds integration tests using real WASMs for extraction/validation/classification. |
| internal/ingest/ingest.go | Wires checkpoint service config to renamed protocol wasm model. |
| internal/indexer/processors/protocol_wasms.go | Updates protocol wasm processor to return renamed type. |
| internal/indexer/mocks.go | Updates indexer mocks to use renamed protocol wasm type. |
| internal/indexer/indexer_test.go | Updates indexer tests for renamed protocol wasm processor return type. |
| internal/indexer/indexer_buffer_test.go | Updates buffer tests for renamed protocol wasm type. |
| internal/indexer/indexer_buffer.go | Updates buffer storage and methods for renamed protocol wasm type. |
| internal/indexer/indexer.go | Updates indexer interfaces/generics to use renamed protocol wasm type. |
| internal/db/migrations/protocols/main.go | Adds embedded “protocol registration” SQL runner. |
| internal/db/migrations/protocols/000_placeholder.sql | Adds placeholder protocol registration SQL file. |
| internal/db/migrations/2026-03-09.2-protocol_wasms_fk.sql | Adds FK constraint from protocol_wasms.protocol_id to protocols.id. |
| internal/db/migrations/2026-03-09.0-protocols.sql | Creates protocols table and status columns. |
| internal/db/migrate.go | Adds RunProtocolMigrations entrypoint. |
| internal/data/protocols_test.go | Adds tests for ProtocolsModel. |
| internal/data/protocols.go | Adds ProtocolsModel + status constants. |
| internal/data/protocol_wasms_test.go | Updates protocol wasm tests and adds BatchUpdateProtocolID tests. |
| internal/data/protocol_wasms.go | Renames protocol wasm type/model and adds GetUnclassified/BatchUpdateProtocolID. |
| internal/data/protocol_contracts_test.go | Updates tests to use renamed protocol wasm model/type. |
| internal/data/models.go | Adds Protocols and renames protocol wasms model entry. |
| internal/data/mocks.go | Adds mocks for protocols/protocol-wasms models and updates types. |
| cmd/root.go | Registers the new protocol-setup command. |
| cmd/protocol_setup.go | Implements CLI wiring: registry lookup, DB/RPC setup, and service execution. |
Comments suppressed due to low confidence (1)
internal/db/migrations/protocols/000_placeholder.sql:3
- The embedded protocol migrations directory currently contains only `000_placeholder.sql`, which is comments-only. Executing a comments-only SQL string via `ExecContext` can fail with an "empty query" error in Postgres, causing `RunProtocolMigrations` to fail. Either remove the placeholder and ensure at least one real .sql file exists, make the placeholder a no-op statement (e.g., `SELECT 1;`), or explicitly skip it in `Run`.
-- Placeholder file required by go:embed *.sql directive.
-- Protocol registration SQL files (e.g., 001_sep41.sql) are added here.
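One way to implement the "explicitly skip it" option is to detect comments-only files before executing them. The following is a minimal sketch, not the PR's code: `isCommentOnly` is a hypothetical helper, it only understands `--` line comments, and whether a comments-only string actually errors depends on the database driver in use.

```go
package main

import (
	"fmt"
	"strings"
)

// isCommentOnly reports whether sql contains nothing but blank lines and
// "--" line comments. Block comments (/* ... */) are deliberately not
// handled in this sketch.
func isCommentOnly(sql string) bool {
	for _, line := range strings.Split(sql, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "--") {
			continue
		}
		return false
	}
	return true
}

func main() {
	placeholder := "-- Placeholder file required by go:embed *.sql directive.\n" +
		"-- Protocol registration SQL files (e.g., 001_sep41.sql) are added here.\n"

	fmt.Println(isCommentOnly(placeholder))                   // true: safe to skip
	fmt.Println(isCommentOnly("-- no-op\nSELECT 1;\n"))       // false: has a statement
}
```

A runner could call such a check on each embedded file and skip the ones that return true instead of passing them to `ExecContext`.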
    @@ -0,0 +1,2 @@
    -- Placeholder file required by go:embed *.sql directive.

A little confused here: what do these individual protocol migration files track? We have the separate protocols.sql too, right?
Each new protocol migration will need new entries in the protocols table, as well as potentially new tables to track state. protocols.sql only adds the protocols table itself, which is why it lives in the regular schema migrations. Migrations for a specific protocol are added when that protocol is implemented, and they run during the protocol-setup process.
@aristidesstaffieri makes sense. One question: will all new protocol-specific migrations added here be related to current state?
No, not just current state: they also add the protocol itself to the protocols table. I expect those are the two types of SQL that will live here.
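To illustrate how per-protocol SQL files could be picked up, here is a minimal sketch that lists `*.sql` files from a file system in filename order (a commit message in this PR says the runner executes files alphabetically). Everything here is an assumption for illustration: `sqlFilesInOrder` and `exampleFS` are hypothetical names, and an in-memory `fstest.MapFS` stands in for the embedded directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// sqlFilesInOrder returns the names of the *.sql files at the root of fsys.
// fs.ReadDir returns entries sorted by filename, so the result is in
// lexical order (000_..., 001_..., and so on).
func sqlFilesInOrder(fsys fs.FS) ([]string, error) {
	entries, err := fs.ReadDir(fsys, ".")
	if err != nil {
		return nil, fmt.Errorf("reading migration directory: %w", err)
	}
	var names []string
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".sql") {
			continue
		}
		names = append(names, e.Name())
	}
	return names, nil
}

// exampleFS builds an in-memory stand-in for the embedded migrations
// directory. The file contents are placeholders, not real migrations.
func exampleFS() fs.FS {
	return fstest.MapFS{
		"001_sep41.sql":       &fstest.MapFile{Data: []byte("-- hypothetical registration SQL")},
		"000_placeholder.sql": &fstest.MapFile{Data: []byte("-- placeholder")},
		"README.md":           &fstest.MapFile{Data: []byte("not SQL")},
	}
}

func main() {
	names, err := sqlFilesInOrder(exampleFS())
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [000_placeholder.sql 001_sep41.sql]
}
```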
    @@ -0,0 +1,85 @@
    package services

The two files contract_validator.go and protocol_validator.go have duplicated WASM extraction logic and overlapping validation concerns from parallel evolution — the ingestion pipeline built its own validator, then protocol-setup introduced a more extensible design without refactoring the original.
What each file provides
| | contract_validator.go (older) | protocol_validator.go (newer) |
|---|---|---|
| WASM extraction | `contractValidator.extractContractSpecFromWasmCode()` (private method) | `WasmSpecExtractor.ExtractSpec()` (public interface) |
| SEP-41 validation | `isContractCodeSEP41()` + `ContractValidator.ValidateFromContractCode()` | `SEP41ProtocolValidator.Validate()` (delegates to the same `isContractCodeSEP41`) |
| wazero runtime | Owns one instance | Owns a separate instance |
| Design | Tightly coupled (extract + validate in one call) | Separated concerns, supports multiple protocols via registry |
Who uses what
- `contract_validator.go` → ingestion pipeline (`ingest.go` → `CheckpointService` → `processContractCode`)
- `protocol_validator.go` → protocol-setup command (`cmd/protocol_setup.go` → `protocolSetupService`)
The duplication
The WASM extraction logic is copy-pasted almost verbatim:
- `contractValidator.extractContractSpecFromWasmCode()`
- `wasmSpecExtractor.ExtractSpec()`
Both compile the WASM module with wazero, find the contractspecv0 custom section, and unmarshal XDR ScSpecEntry entries. Additionally, SEP41ProtocolValidator.Validate() just delegates to the same isContractCodeSEP41() function that ContractValidator uses internally.
Suggested consolidation
Delete contract_validator.go entirely. Migrate to the newer, extensible design with one file per protocol:
internal/services/
protocol_validator.go # WasmSpecExtractor + ProtocolValidator interface (keep as-is)
validator_registry.go # Registry glue (keep as-is)
sep41_validator.go # SEP-41 spec, validation logic, type maps (new — extracted from contract_validator.go)
sep41_validator_test.go # SEP-41 tests (new — extracted from contract_validator_test.go)
Specific changes:
- Create `sep41_validator.go`: move `SEP41ProtocolValidator`, `isContractCodeSEP41()`, `sep41RequiredFunctions`, `sep41FunctionSpec`, `scSpecTypeNames`, `getTypeName()`, and `validateFunctionInputsAndOutputs()` here
- Delete `contract_validator.go`: the `ContractValidator` interface and `contractValidator` struct are no longer needed
- Update `CheckpointService`: replace the `ContractValidator` dependency with `WasmSpecExtractor` + `[]ProtocolValidator`:

      // Before:
      contractValidator.ValidateFromContractCode(ctx, wasmBytes)
      // After:
      specs, err := specExtractor.ExtractSpec(ctx, wasmBytes)
      // then run validators against specs

- Update `ingest.go`: wire `WasmSpecExtractor` and validators into `CheckpointService` instead of `ContractValidator`
This pattern scales cleanly — adding a new protocol (e.g., SEP-50) means adding one file + one registry call, no existing files modified.
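The extract-once, validate-many flow suggested above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `SpecEntry` is a hypothetical stand-in for the XDR `ScSpecEntry` type, the interface signatures are simplified guesses based on the names in this thread, `fakeExtractor` treats the "WASM" bytes as a comma-separated list of function names instead of parsing a real module, and the required-function lists are placeholders rather than the actual SEP-41 interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// SpecEntry is a hypothetical stand-in for xdr.ScSpecEntry.
type SpecEntry struct{ FunctionName string }

// Simplified versions of the interfaces named in the review.
type WasmSpecExtractor interface {
	ExtractSpec(ctx context.Context, wasm []byte) ([]SpecEntry, error)
}

type ProtocolValidator interface {
	ProtocolID() string
	Validate(specs []SpecEntry) bool
}

// fakeExtractor pretends the bytes are "fn1,fn2,..." (no real WASM parsing).
type fakeExtractor struct{}

func (fakeExtractor) ExtractSpec(_ context.Context, wasm []byte) ([]SpecEntry, error) {
	if len(wasm) == 0 {
		return nil, errors.New("empty wasm")
	}
	var specs []SpecEntry
	for _, name := range strings.Split(string(wasm), ",") {
		specs = append(specs, SpecEntry{FunctionName: name})
	}
	return specs, nil
}

// requiredFuncsValidator matches when every required function is present.
type requiredFuncsValidator struct {
	id       string
	required []string
}

func (v requiredFuncsValidator) ProtocolID() string { return v.id }

func (v requiredFuncsValidator) Validate(specs []SpecEntry) bool {
	have := make(map[string]bool, len(specs))
	for _, s := range specs {
		have[s.FunctionName] = true
	}
	for _, name := range v.required {
		if !have[name] {
			return false
		}
	}
	return true
}

// classify extracts the spec once and runs every validator against it.
func classify(ctx context.Context, ex WasmSpecExtractor, validators []ProtocolValidator, wasm []byte) ([]string, error) {
	specs, err := ex.ExtractSpec(ctx, wasm)
	if err != nil {
		return nil, fmt.Errorf("extracting spec: %w", err)
	}
	var matched []string
	for _, v := range validators {
		if v.Validate(specs) {
			matched = append(matched, v.ProtocolID())
		}
	}
	return matched, nil
}

// classifyExample wires the fakes together for demonstration.
func classifyExample(wasm string) ([]string, error) {
	validators := []ProtocolValidator{
		requiredFuncsValidator{id: "SEP41", required: []string{"balance", "transfer"}},
		requiredFuncsValidator{id: "HYPOTHETICAL", required: []string{"mint"}},
	}
	return classify(context.Background(), fakeExtractor{}, validators, []byte(wasm))
}

func main() {
	matched, err := classifyExample("balance,transfer,approve")
	fmt.Println(matched, err) // [SEP41] <nil>
}
```

With this shape, supporting another protocol means appending one more `ProtocolValidator` to the slice; `classify` itself does not change.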
    var validatorRegistry = map[string]func() ProtocolValidator{}

    // RegisterValidator registers a validator factory for a protocol ID.
    func RegisterValidator(protocolID string, factory func() ProtocolValidator) {

Where do we call RegisterValidator from? I do not see where a new validator is registered
The intended pattern is a Go init() function in the protocol's package.
So when a new protocol is implemented, you would do:
func init() {
RegisterValidator("SEP41", NewSEP41ProtocolValidator())
}
That runs at package initialization. Since cmd/protocol_setup.go already imports internal/services, any init() in that package executes before the CLI command runs, populating the registry.
    var registryMu sync.RWMutex

    // validatorRegistry holds factory functions keyed by protocol ID.
    var validatorRegistry = map[string]func() ProtocolValidator{}

Registry Simplification: Store Instances, Not Factories
Current code
validator_registry.go stores factory functions:
- `map[string]func() ProtocolValidator`
- `RegisterValidator(protocolID string, factory func() ProtocolValidator)`
- `GetValidator(protocolID string) (func() ProtocolValidator, bool)`
Problem
The factory pattern adds indirection that isn't justified. Protocol validators are stateless — SEP41ProtocolValidator is an empty struct whose Validate() is a pure function over spec entries. There's no per-use state, no expensive construction, no resource lifecycle that would warrant lazy instantiation via factories.
Suggested change
Store validator instances directly where ProtocolValidator is an interface and each new validator implements it:
- `map[string]ProtocolValidator`
- `RegisterValidator(protocolID string, v ProtocolValidator)`
- `GetValidator(protocolID string) (ProtocolValidator, bool)`
Self-registration in sep41_validator.go becomes:
func init() {
RegisterValidator("SEP41", NewSEP41ProtocolValidator())
}

This removes an unnecessary layer of indirection. If a future validator ever needs per-use instantiation, it can be refactored then.
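A self-contained sketch of the instance-based registry with `init()` self-registration might look like this. The interface shape, `SpecEntry`, and the body of `Validate` are assumptions for illustration (the real validator inspects the contract spec), and locking is omitted here to keep the focus on the registration pattern.

```go
package main

import "fmt"

// SpecEntry is a hypothetical stand-in for the XDR spec entry type.
type SpecEntry struct{ FunctionName string }

// ProtocolValidator is a simplified guess at the interface in the PR.
type ProtocolValidator interface {
	ProtocolID() string
	Validate(specs []SpecEntry) bool
}

// validatorRegistry stores validator instances keyed by protocol ID.
// Package-level variables are initialized before any init() runs, so the
// map is ready by the time init() registers into it.
var validatorRegistry = map[string]ProtocolValidator{}

func RegisterValidator(protocolID string, v ProtocolValidator) {
	validatorRegistry[protocolID] = v
}

func GetValidator(protocolID string) (ProtocolValidator, bool) {
	v, ok := validatorRegistry[protocolID]
	return v, ok
}

// SEP41ProtocolValidator is stateless, so a single shared instance suffices.
type SEP41ProtocolValidator struct{}

func NewSEP41ProtocolValidator() *SEP41ProtocolValidator { return &SEP41ProtocolValidator{} }

func (*SEP41ProtocolValidator) ProtocolID() string { return "SEP41" }

// Validate is a placeholder; it does not implement real SEP-41 checks.
func (*SEP41ProtocolValidator) Validate(specs []SpecEntry) bool { return len(specs) > 0 }

// init runs during package initialization, before main.
func init() {
	RegisterValidator("SEP41", NewSEP41ProtocolValidator())
}

func main() {
	v, ok := GetValidator("SEP41")
	fmt.Println(ok, v.ProtocolID()) // true SEP41

	_, ok = GetValidator("UNKNOWN")
	fmt.Println(ok) // false
}
```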
This change is already applied in 6fc4535 and referenced in a previous comment; the registration is already set up to work with an init() call.
Introduce the protocol-setup CLI command that classifies unclassified WASM hashes in protocol_wasms by fetching bytecodes via RPC getLedgerEntries and running them against protocol validators.
- Add protocols, protocol_contracts tables and protocol_wasms FK migration
- Add Protocol and ProtocolContract data models with tests
- Extend ProtocolWasmModel with BatchUpdateProtocolID
- Add ProtocolValidator interface and WasmSpecExtractor for bytecode analysis
- Add protocolSetupService with batched RPC fetching (200 per call)
- Wire protocol-setup command with --rpc-url, --network-passphrase flags
- Add mocks and tests for all new service and data layer code
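The batched fetching mentioned above (200 keys per RPC call) comes down to splitting the list of hashes into fixed-size chunks. A minimal sketch, assuming a hypothetical generic helper named `chunk`; the actual service may structure its loop differently:

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// It returns nil when size is not positive or items is empty.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	hashes := make([]int, 450) // stand-ins for 450 unclassified WASM hashes
	for i, batch := range chunk(hashes, 200) {
		// Each batch would become one getLedgerEntries call.
		fmt.Printf("batch %d: %d hashes\n", i, len(batch))
	}
	// batch 0: 200 hashes
	// batch 1: 200 hashes
	// batch 2: 50 hashes
}
```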
Replace Go constants (e.g., ProtocolSEP41) with SQL migration files as
the registry for protocol IDs. This means adding a new protocol only
requires a SQL file + validator implementation — no Go constant
maintenance needed.
- Remove ProtocolSEP41 constant block from internal/data/protocols.go
- Add embedded SQL migration infrastructure (internal/db/migrations/protocols/)
that executes idempotent INSERT statements alphabetically
- Add RunProtocolMigrations wrapper in internal/db/migrate.go
- Call RunProtocolMigrations in protocol-setup command before building validators
- Replace registerProtocols (InsertIfNotExists per validator) with
validateProtocolsExist (GetByIDs + missing check with descriptive error)
- Update all 6 test cases to mock GetByIDs instead of InsertIfNotExists
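The "GetByIDs + missing check" described above can be sketched as a pure function over two ID lists. This is an assumption-laden illustration: the real `validateProtocolsExist` presumably queries the database itself, whereas here `found` stands in for whatever `GetByIDs` returned, and the error wording is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// validateProtocolsExist returns an error naming every wanted protocol ID
// that is absent from found (the IDs the protocols table returned).
func validateProtocolsExist(wanted, found []string) error {
	present := make(map[string]struct{}, len(found))
	for _, id := range found {
		present[id] = struct{}{}
	}
	var missing []string
	for _, id := range wanted {
		if _, ok := present[id]; !ok {
			missing = append(missing, id)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf(
			"protocols not found in protocols table: %s (is a registration SQL file missing?)",
			strings.Join(missing, ", "),
		)
	}
	return nil
}

func main() {
	fmt.Println(validateProtocolsExist([]string{"SEP41"}, []string{"SEP41"}))
	fmt.Println(validateProtocolsExist([]string{"SEP41", "OTHER"}, []string{"SEP41"}))
}
```

Failing with the full list of missing IDs, rather than inserting rows on the fly, keeps the SQL files as the single source of protocol IDs.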
Restrict classification_status, history_migration_status, and
current_state_migration_status to valid values ('not_started',
'in_progress', 'success', 'failed') at the database level.
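On the Go side, the same four values could be mirrored by a typed constant set so that invalid statuses are caught before they reach the database constraint. The type and constant names below are hypothetical; only the four string values come from the commit message.

```go
package main

import "fmt"

// MigrationStatus mirrors the values allowed by the database constraint.
type MigrationStatus string

const (
	StatusNotStarted MigrationStatus = "not_started"
	StatusInProgress MigrationStatus = "in_progress"
	StatusSuccess    MigrationStatus = "success"
	StatusFailed     MigrationStatus = "failed"
)

// Valid reports whether s is one of the four allowed values.
func (s MigrationStatus) Valid() bool {
	switch s {
	case StatusNotStarted, StatusInProgress, StatusSuccess, StatusFailed:
		return true
	}
	return false
}

func main() {
	fmt.Println(StatusInProgress.Valid())        // true
	fmt.Println(MigrationStatus("done").Valid()) // false
}
```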
…tion
Export SEP41ProtocolValidator in contract_validator.go to wrap the existing isContractCodeSEP41() logic behind the ProtocolValidator interface, without registering it in production validator factories.
Add integration tests exercising the full pipeline with real WASM bytecodes from disk: spec extraction, SEP-41 validation, and end-to-end ProtocolSetupService classification against a real DB with mocked RPC.
Make protocol setup convert `HashBytea` values explicitly before RPC lookups and update `protocol_wasms` using `bytea[]` hashes instead of text parameters. Update mocks and tests to match the typed API, and add coverage for `BatchUpdateProtocolID`.
… names This aligns Protocol→Protocols and ProtocolWasm→ProtocolWasms (structs, interfaces, mocks, and Models struct fields) to match the protocols and protocol_wasms table names, consistent with the existing ProtocolContracts convention.
Tests RegisterValidator and GetValidator for registration, lookup of unknown protocols, and overwrite behavior. Also removes outdated init() references from comments since registration is dynamic.
…RPC response type guard
- Use a fresh context with timeout for best-effort status cleanup when
classify() fails, preventing protocols from being permanently stuck at
'in_progress' when the parent context is cancelled (e.g. SIGTERM)
- Add type guard before MustContractCode() in fetchWasmBytecodes to
prevent panics on unexpected RPC response entry types
- Remove dead base64ToHex map that was allocated but never read
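The fresh-context cleanup in the first bullet follows a common Go pattern: derive the cleanup context from `context.Background()` rather than from the (possibly cancelled) parent. A minimal sketch, where `markFailed` is a hypothetical stand-in for the status update and the 5-second timeout is an arbitrary choice:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// markFailed is a hypothetical stand-in for the DB status update. Like a
// real query, it refuses to run on a context that is already done.
func markFailed(ctx context.Context, protocolID string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("marking %s as failed: %w", protocolID, err)
	}
	return nil
}

// cleanupStatus runs update on a fresh, time-limited context so that it
// still works when the caller's context has been cancelled.
func cleanupStatus(protocolID string, update func(context.Context, string) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return update(ctx, protocolID)
}

// cancelledContext simulates a parent context after SIGTERM.
func cancelledContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	parent := cancelledContext()

	// Reusing the cancelled parent fails immediately.
	fmt.Println(markFailed(parent, "SEP41") != nil) // true

	// The fresh context lets the best-effort cleanup go through.
	fmt.Println(cleanupStatus("SEP41", markFailed)) // <nil>
}
```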
Address errcheck, gofumpt, staticcheck, and wrapcheck violations:
- Handle Close() error via DeferredClose in protocol_setup cmd
- Check xdr.MarshalBase64 return errors in test helpers
- Fix gofumpt formatting in buildMultiRPCResponse signature
- Correct Protocols type comment to match ST1021 convention
- Wrap external error from protocols.Run in migrate.go
If a validator factory returns nil, the nil propagates silently until Run() calls v.ProtocolID(), causing a panic far from the source. Fail early with a descriptive error at the call site instead.
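A sketch of such a fail-early check, at the point where factories are invoked, could look like the following. `buildValidators`, `exampleFactories`, and the error messages are hypothetical; note also that a plain `== nil` comparison only catches a nil interface value, not an interface wrapping a nil pointer.

```go
package main

import "fmt"

// ProtocolValidator is reduced to the one method needed for this sketch.
type ProtocolValidator interface{ ProtocolID() string }

type stubValidator struct{ id string }

func (s stubValidator) ProtocolID() string { return s.id }

// buildValidators invokes each factory and rejects nil results right away,
// so the error points at the offending protocol ID.
func buildValidators(ids []string, factories map[string]func() ProtocolValidator) ([]ProtocolValidator, error) {
	validators := make([]ProtocolValidator, 0, len(ids))
	for _, id := range ids {
		factory, ok := factories[id]
		if !ok {
			return nil, fmt.Errorf("no validator registered for protocol %q", id)
		}
		v := factory()
		if v == nil {
			return nil, fmt.Errorf("validator factory for protocol %q returned nil", id)
		}
		validators = append(validators, v)
	}
	return validators, nil
}

// exampleFactories contains one working factory and one that returns nil.
func exampleFactories() map[string]func() ProtocolValidator {
	return map[string]func() ProtocolValidator{
		"SEP41":  func() ProtocolValidator { return stubValidator{id: "SEP41"} },
		"BROKEN": func() ProtocolValidator { return nil },
	}
}

func main() {
	_, err := buildValidators([]string{"SEP41", "BROKEN"}, exampleFactories())
	fmt.Println(err) // validator factory for protocol "BROKEN" returned nil
}
```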
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Guard the package-level validatorRegistry map with a read-write mutex to prevent data races when RegisterValidator and GetValidator are called concurrently. Expose a resetRegistry helper for tests and add a concurrent race-detector test.
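The mutex-guarded registry and its test helper can be sketched as below. This is a simplified illustration, not the repository's code: the map stores instances (the design the PR later settles on), `stubValidator`, `registrySize`, and `registerConcurrently` are hypothetical, and data races would only be reported when running with Go's race detector.

```go
package main

import (
	"fmt"
	"sync"
)

type ProtocolValidator interface{ ProtocolID() string }

type stubValidator struct{ id string }

func (s stubValidator) ProtocolID() string { return s.id }

var (
	registryMu        sync.RWMutex
	validatorRegistry = map[string]ProtocolValidator{}
)

// RegisterValidator takes the write lock because it mutates the map.
func RegisterValidator(protocolID string, v ProtocolValidator) {
	registryMu.Lock()
	defer registryMu.Unlock()
	validatorRegistry[protocolID] = v
}

// GetValidator only reads, so concurrent lookups can share the read lock.
func GetValidator(protocolID string) (ProtocolValidator, bool) {
	registryMu.RLock()
	defer registryMu.RUnlock()
	v, ok := validatorRegistry[protocolID]
	return v, ok
}

// resetRegistry clears the registry; intended for tests.
func resetRegistry() {
	registryMu.Lock()
	defer registryMu.Unlock()
	validatorRegistry = map[string]ProtocolValidator{}
}

func registrySize() int {
	registryMu.RLock()
	defer registryMu.RUnlock()
	return len(validatorRegistry)
}

// registerConcurrently registers and looks up n validators from n goroutines.
func registerConcurrently(n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("PROTO-%d", i)
			RegisterValidator(id, stubValidator{id: id})
			_, _ = GetValidator(id)
		}(i)
	}
	wg.Wait()
}

func main() {
	resetRegistry()
	registerConcurrently(50)
	fmt.Println(registrySize()) // 50
}
```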
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
SEP41ProtocolValidator embedded a full contractValidator (which allocates a
wazero runtime), but Validate() only called isContractCodeSEP41() — a pure
function that never touches the runtime. The runtime leaked since
ProtocolValidator has no Close() and shouldn't need one.
- Convert isContractCodeSEP41 and validateFunctionInputsAndOutputs from
receiver methods to standalone package-level functions
- Make SEP41ProtocolValidator an empty struct with no Close() method
- Remove defer validator.Close(ctx) from integration tests
- Legacy contractValidator is untouched (deferred to SEP-42 migration)
Move the validator-matching check before the specExtractor nil check so that an unmatched protocol ID is caught before infrastructure dependencies are validated.
…tern The real-pipeline tests used dbtest.Open() which requires a standalone Postgres via PGHOST/PGPORT env vars. This works in the unit test CI job but fails in the integration-test job which only has Docker containers. Convert to a testify suite using SharedContainers.GetWalletDBConnectionString() to match the existing integration test pattern.
Replace three mock-based unit tests with a single end-to-end integration
test that verifies the full protocol setup pipeline: ingest discovers
WASM hashes, stores them as unclassified, and ProtocolSetupService
classifies them using real RPC.
Key changes:
- Remove mock RPC, manual WASM insertion, and test helpers
(buildRPCResponse, buildMultiRPCResponse, loadTestWasm, testdataDir)
- Use testEnv.RPCService (real RPC) and ingest-populated protocol_wasms
- SetupTest now resets classification state (UPDATE SET NULL) instead of
truncating, preserving ingest-populated rows
- Move suite to run after AccountBalancesAfterCheckpointTestSuite to
ensure ingest has populated the DB
- Fix unparam lint in wasm_spec_extractor_test.go (unused xdr.Hash return)
Since the project is not in production, reorder protocol migrations so protocols is created first, allowing protocol_wasms to define its FK to protocols(id) inline instead of in a separate ALTER TABLE migration.
… wasm extractor and processors directly in ingestion
b2ac7ce to 3b1e155
3b1e155 to 77e65e2
SEP-41 validation and account-contract token tracking will be
reintroduced later as a protocol data migration. This removes all
SEP-41 classification, metadata fetching, and relationship tracking
from checkpoint population, live ingestion, and the GraphQL API.
SAC contract tracking remains intact.
Changes:
- Remove SEP-41 spec validation from checkpoint processContractCode
- Remove contractTypesByWasmHash, contractIDsByWasmHash,
contractTokensByHolderAddress from checkpoint data
- Remove fetchSep41Metadata and account-contract relationship storage
- Remove FetchSep41Metadata from ContractMetadataService interface
- Remove AccountContractTokensModel, table, and migration
- Remove SEP-41 from indexer buffer, token transfer processor
- Remove ContractTypeSEP41 constant
- Remove processContractTokenChanges from token ingestion
- Remove prepareNewContractTokens (replaced with prepareNewSACContracts)
- Remove SEP41Balance GraphQL type and resolver logic
- Remove SEP-41 from wbclient SDK types and queries
- Simplify validator registry to store instances instead of factories
- Extract SEP41ProtocolValidator to sep41_validator.go for future use
- Delete contract_validator.go (replaced by shared WasmSpecExtractor)
77e65e2 to 310fa92
Closes #506
What
Add the protocol-setup CLI command that classifies WASM bytecodes against registered protocol validators.
Why
This is the first step of the protocol data migration pipeline. Before the wallet-backend can index protocol-specific data (e.g., SEP-41 token balances), it needs to know which WASM contracts on the network implement which protocols. The protocol-setup command performs this one-time classification (per protocol) by comparing on-chain WASM bytecodes against protocol interface definitions and tagging each matching WASM in the database. Subsequent pipeline steps (history migration, current-state migration) depend on this classification to know which contracts to index.
Known limitations
N/A
Issue that this PR addresses
#506