feat(schema): add semantic IR and symbol ID infrastructure by hardbyte · Pull Request #124 · thepartly/reflectapi

hardbyte · 2026-03-27T02:49:51Z

Summary

Adds semantic IR infrastructure (#96), fixes Python codegen for flattened tagged-union fields (#123), makes Python a first-class codegen backend with namespace classes, typed errors, field descriptions, and SemanticSchema-driven code generation.

Semantic IR Infrastructure (`reflectapi-schema`)

| Module | Purpose |
| --- | --- |
| `symbol.rs` | `SymbolId` / `SymbolKind`: stable unique identifiers for all schema symbols |
| `ids.rs` | `ensure_symbol_ids()`: canonical ID assignment with cross-typespace disambiguation |
| `semantic.rs` | Immutable `SemanticSchema`, `SemanticType`, `SymbolTable`, `ResolvedTypeReference` |
| `normalize.rs` | `NormalizationPipeline` + `Normalizer` (`&Schema` → `SemanticSchema`) |

Schema type changes: id: SymbolId on all types, Normalizer::normalize(&Schema), original_name on SemanticType preserving pre-normalization qualified names.

Python Codegen — Now Driven by SemanticSchema

The Python codegen uses SemanticSchema as the primary driver:

Type iteration: semantic.types() (deterministic BTreeMap order, replaces manual topological sort)
Import detection: SemanticType pattern matching
Function ordering: semantic.functions()
Raw Schema kept for rendering (concrete field/variant data)

Python Codegen — Wire-Compatible Flatten (#123)

For #[serde(flatten)] on internally-tagged enums, generates per-variant models merging parent fields + tag + variant fields into a discriminated union RootModel.

Python Codegen — First-Class DX

Namespace classes mirroring Rust module structure (auth.UsersSignInRequest)
Field descriptions via Field(description="...")
Typed error returns — ApiResponse[OutputType, ErrorType] in method signatures
Typed error deserialization — ApplicationError.typed_error as Pydantic model
Typed list responses — list[Model] via TypeAdapter (was returning raw dicts)
Fast JSON parsing — Pydantic's Rust-based validate_json(bytes)
Factory classes removed — direct construction via namespace types (-13% file size)
Docstring escaping for backslashes and triple-quotes; Python keywords sanitized in method/parameter names

Python Runtime Fixes

TypeAdapter for all response validation (handles list[Model], generics, unions)
error_model parameter on _make_request for typed error deserialization
ApiResponse[T, E] generic with both success and error type parameters
validate_json(bytes) fast path for Pydantic's Rust-based parser

Other Changes

Architecture documentation (docs/architecture.md)
Fixed dead README links
Pin mdbook 0.4.x in CI (fixes 3-month doc build failure)
Merged Andrey/refactoring cleanup #122 askama removal
21 new edge case snapshot tests

Real-World Validation

Partly's core-server (284 endpoints, 78K-line schema):

58K-line Python client (was 68K before factory removal)
Valid Python, imports in ~0.65s
Authenticates against live API, typed responses and errors work
list[BillingCurrencyListItem] returns Pydantic models
ApplicationError.typed_error = CustomerGetErrorCustomerNotFoundVariant

Test Coverage

220 tests total (0 failures)
166 demo snapshot tests (21 new edge case tests)
37 schema crate tests
All 3 CI workflows green (including doc build)

Add SymbolId system, semantic IR types, ID assignment, and normalization pipeline to reflectapi-schema. This provides stable, unique identifiers for all schema symbols and a multi-stage pipeline for transforming raw schemas into validated semantic representations.

New modules:
- symbol.rs: SymbolId/SymbolKind types with stable identifiers
- ids.rs: ensure_symbol_ids() for post-deserialization ID assignment
- semantic.rs: Immutable semantic IR (SemanticSchema, SymbolTable, etc.)
- normalize.rs: TypeConsolidation, NamingResolution, CircularDependency detection stages, and Normalizer (Schema -> SemanticSchema)

Schema type changes:
- Added id: SymbolId field to Schema, Function, Primitive, Struct, Field, Enum, Variant (serde skip_serializing, backward compatible)
- Manual PartialEq/Hash impls exclude id from comparisons
- PartialEq + Eq added to SerializationMode, Copy to SymbolKind

Addresses #96, lays groundwork for #123.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86c00edfb8


reflectapi-schema/src/ids.rs

reflectapi-schema/src/normalize.rs

- TypeConsolidationStage now rewrites type references after renaming conflicted types (fixes dangling references to old names)
- ensure_symbol_ids uses separate seen maps per typespace and disambiguates output types that share an FQN with a different input type (prevents SymbolId collisions in the Normalizer)
- Field::new and Variant::new use Default::default() for id so ensure_symbol_ids can assign proper parent-contextualized paths

hardbyte · 2026-03-27T03:12:36Z

@claude review

reflectapi-schema/src/normalize.rs

reflectapi-schema/src/ids.rs

- ids.rs: Use struct/enum's actual ID (not seen-map ID) as owner for member ID assignment, fixing inconsistent parent-child paths when types have pre-assigned IDs
- normalize.rs: Track all conflicting qualified names in name_usage (Vec<String> per simple name), not just the first, so update_type_references_in_schema builds mappings for all conflicting types and avoids dangling references
- normalize.rs: Fix generate_unique_name fallback to join all module parts instead of using module_parts[0], which would return an excluded part ("model"/"proto") and cause name collisions

https://claude.ai/code/session_01UcJQe3CE12BFgqDiadkgii

- test_pre_assigned_id_member_paths_consistent: verifies struct field ID paths use the struct's actual ID as parent prefix
- test_pre_assigned_id_enum_member_paths_consistent: same for enums
- test_naming_resolution_all_conflicting_types_have_references_rewritten: verifies function references to all conflicting types (not just the first) are rewritten to valid names after NamingResolutionStage
- test_generate_unique_name_excluded_modules_no_collision: verifies model::Foo and model::proto::Foo produce different names
- test_generate_unique_name_with_non_excluded_module: normal case

https://claude.ai/code/session_01UcJQe3CE12BFgqDiadkgii

reflectapi-schema/src/semantic.rs

reflectapi-schema/src/normalize.rs

reflectapi-schema/src/ids.rs

reflectapi-schema/src/normalize.rs

For structs with `#[serde(flatten)]` on an internally-tagged enum, generate per-variant models that merge parent struct fields + variant fields + tag discriminator, then emit a discriminated union RootModel. This matches the flat wire format serde produces.

Before: `Offer` had only `id: str` (enum fields silently dropped)
After: `Offer` is a RootModel union of `OfferSingle{id,type,business}` and `OfferGroup{id,type,count}`, wire-compatible with serde

Also:
- Compose NormalizationPipeline into Normalizer (runs TypeConsolidation, NamingResolution, CircularDependencyResolution before IR construction)
- Add snapshot tests for flattened externally-tagged, adjacently-tagged, and untagged enums
- Document Boxing strategy as intentional no-op (Rust schemas already encode Box<T>); add integration tests for self-referential and multi-type circular dependency normalization
- Add docs/architecture.md covering semantic IR pipeline, codegen backends, and flattened type handling
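To make the flat wire format concrete, here is a minimal standard-library sketch of the per-variant shape described above. It is not the generated Pydantic code: the field types (`business: str`, `count: int`), the tag values `"Single"` and `"Group"`, and the `parse_offer` helper are assumptions for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class OfferSingle:
    id: str        # parent struct field
    type: str      # tag discriminator
    business: str  # variant field (type assumed)


@dataclass
class OfferGroup:
    id: str        # parent struct field
    type: str      # tag discriminator
    count: int     # variant field (type assumed)


Offer = Union[OfferSingle, OfferGroup]

# Hypothetical tag values; the real ones come from the Rust variant names.
_VARIANTS = {"Single": OfferSingle, "Group": OfferGroup}


def parse_offer(payload: str) -> Offer:
    """Pick the per-variant model from the tag, then build it from the flat object."""
    data = json.loads(payload)
    tag = data.get("type")
    if tag not in _VARIANTS:
        raise ValueError(f"unknown offer type: {tag!r}")
    return _VARIANTS[tag](**data)
```

Because parent fields, tag, and variant fields all sit at the same level of the JSON object, each variant model has to carry the parent's `id` itself; a model with only `id` cannot represent either payload.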

- Remove all point-in-time language ("currently", "not yet", "planned")
- Rename Section 7 from "Current Status and Roadmap" to "Limitations and Design Gaps" (state facts, not progress)
- Delete "Complete" and "In Progress" subsections
- Fix Schema/SemanticSchema code samples to include id fields
- Add reflectapi-python-runtime crate description
- Fix OpenAPI version (3.1, not 3.0)
- Replace vague language with specifics throughout
- Remove subjective tone and issue number references from prose

- Handler function signature convention (Input, Output, Headers, Error)
- Input/Output traits as the self-registration mechanism
- reflectapi::Option<T> three-state type (Undefined | None | Some)
- Primitive.fallback mechanism for codegen type resolution
- #[reflectapi(...)] derive macro attributes reference
- Snapshot test architecture (5 snapshots per test, trybuild)

- Add description/deprecation_note to Function struct sample
- Fix TypeConsolidation claim: both copies are renamed when name appears in both typespaces (not just when types differ)
- Fix NamingResolution example: proto is skipped in prefix generation, so use ApiUser/BillingUser not ProtoUser

Replace dead reflectapi.partly.workers.dev URLs (returning 404) with links to local docs. Add link to architecture doc.

ids.rs:
- Use struct's actual id (not seen-map id) as owner for member assignment, fixing inconsistent parent-child paths
- Zero-pad tuple field indices (arg00, arg01, ...) so BTreeMap ordering matches positional order for 10+ fields
- assign_disambiguated_id now clears and re-assigns all member IDs after disambiguation, maintaining hierarchical consistency
- Schema root uses sentinel path ["__schema__", name] to avoid collision with same-named user types

normalize.rs:
- TypeConsolidation uses full qualified name for conflict renaming (input.a.Foo vs input.b.Foo) preventing silent type drops
- resolve_types filters resolution_cache to type-level symbols only, preventing Field/Variant entries from shadowing type lookups
- discover_struct/enum_symbols derives SymbolInfo.path from field.id.path for consistency with split-path ID assignment
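The zero-padding point is easy to check in isolation. The sketch below uses Python's `sorted()` on strings as a stand-in for the lexicographic key ordering of a Rust `BTreeMap`; the `tuple_field_names` helper is hypothetical and not part of ids.rs.

```python
from typing import List


def tuple_field_names(count: int, width: int = 0) -> List[str]:
    """Build positional field names; width > 0 zero-pads the index."""
    return [f"arg{str(i).zfill(width)}" for i in range(count)]


unpadded = tuple_field_names(12)         # arg0, arg1, ..., arg11
padded = tuple_field_names(12, width=2)  # arg00, arg01, ..., arg11

# Lexicographic order places "arg10" and "arg11" before "arg2".
unpadded_sorted = sorted(unpadded)

# With two-digit padding, lexicographic order equals positional order.
padded_sorted = sorted(padded)
```

Two digits cover up to 100 positional fields; a wider tuple would need a wider pad for the ordering to keep holding.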

hardbyte · 2026-03-27T22:04:38Z

@claude review

...c/tests/snapshots/reflectapi_demo__tests__serde__flatten_adjacently_tagged_enum_field-5.snap

reflectapi-schema/src/normalize.rs

claude · 2026-03-27T22:35:33Z

reflectapi/src/codegen/python.rs

+    let flattened_internal_enum =
+        struct_def
+            .fields
+            .iter()
+            .filter(|f| f.flattened())
+            .find_map(|field| {
+                let type_name = resolve_flattened_type_name(&field.type_ref);
+                match schema.get_type(type_name) {
+                    Some(reflectapi_schema::Type::Enum(enum_def)) => {
+                        match &enum_def.representation {
+                            reflectapi_schema::Representation::Internal { tag } => {
+                                Some((field, enum_def.clone(), tag.clone()))
+                            }
+                            _ => None,
+                        }
+                    }
+                    _ => None,
+                }
+            });
+
+    if let Some((_enum_field, enum_def, tag)) = flattened_internal_enum {
+        // Wire-compatible path: generate per-variant models with merged fields
+        render_struct_with_flattened_internal_enum(
+            struct_def,


🔴 The find_map at python.rs:462 returns only the FIRST flattened internally-tagged enum field; if a struct has two such fields (valid in Rust when they use different tag names), the second enum's variants are never generated and are silently dropped from the Python output. Any consumer deserializing such a struct will face a mismatch: the Rust type has two independent discriminated unions flattened in, but the Python model only reflects one of them.

Extended reasoning...

What the bug is and how it manifests

In render_struct_with_flatten (python.rs lines 457–475), the iterator chains .filter(|f| f.flattened()).find_map(...) to locate a flattened internally-tagged enum field. find_map short-circuits on the first match and returns Option<(field, enum_def, tag)>. Only that single enum_def is ever passed to render_struct_with_flattened_internal_enum. If a struct has a second flattened internally-tagged enum field — valid Rust with serde when the two enums use distinct tag field names (e.g. type and kind) — find_map never sees it.

The specific code path that triggers it

Inside render_struct_with_flattened_internal_enum, the loop at lines 561–578 iterates over all flattened fields:

    for field in struct_def.fields.iter().filter(|f| f.flattened()) {
        let type_name = resolve_flattened_type_name(&field.type_ref);
        if let Some(reflectapi_schema::Type::Struct(_)) = schema.get_type(type_name) {
            // expand struct fields into base_fields
        }
        // Enum fields are handled below as variants  <-- misleading comment
    }

The comment says "Enum fields are handled below as variants", but "below" refers only to the for variant in &enum_def.variants loop, which iterates over the variants of the ONE enum that was found by find_map. A second flattened internally-tagged enum field is neither expanded into base_fields nor iterated as a variant block. It is completely skipped.

Why existing code does not prevent it

The function signature render_struct_with_flattened_internal_enum(... enum_def: &Enum ...) accepts a single enum. There is no mechanism to pass, receive, or render a second enum. The test suite (test_flatten_internally_tagged_enum_field) uses a struct with exactly one flattened enum, so the missing second-enum path is never exercised.

What the impact would be

Given a Rust struct:

    struct Combined {
        id: String,
        #[serde(flatten)]
        action: ActionKind, // internal tag "type"
        #[serde(flatten)]
        status: StatusKind, // internal tag "kind"
    }

The generated Python model would contain only the ActionKind discriminated union variants. Every variant that comes from StatusKind — including its tag field "kind" — is absent from the Python output. Any Python code receiving a wire message with {"id":"1","type":"Create","kind":"Active",...} would fail to deserialize or would silently ignore the kind and all status-related fields.

Step-by-step proof

struct_def has two flattened fields: action: ActionKind (internal tag type) and status: StatusKind (internal tag kind).

.filter(|f| f.flattened()).find_map(...) evaluates action first. ActionKind matches Representation::Internal, so find_map returns Some((action_field, action_enum_def, "type")) immediately.

status: StatusKind is never evaluated.

render_struct_with_flattened_internal_enum receives enum_def = ActionKind and generates CombinedCreate, CombinedDelete, etc. — no CombinedActive, CombinedInactive variants.

The inner loop at line 562 skips status because it is an Enum (not a Struct) and the comment defers to code that never runs for it.

Result: StatusKind's variants are entirely absent from the Python output.

How to fix it

Collect ALL flattened internally-tagged enums (not just the first), then either: (a) generate a cross-product of variant combinations, which is complex but wire-accurate; or (b) for each additional internally-tagged enum beyond the first, fall back to the standard field emission path used for non-internal enums, with a documented limitation. At minimum, a warning or error should be surfaced when multiple flattened internally-tagged enums are detected, rather than silently generating incorrect output.
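The difference between "first match" and "collect all, then refuse loudly" can be sketched independently of the Rust code. The following is a language-neutral illustration written in Python; `FieldInfo`, its attributes, and the helper names are hypothetical and do not mirror the actual types in python.rs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldInfo:
    name: str
    flattened: bool
    # Tag name if the field's type is an internally tagged enum, else None.
    internal_tag: Optional[str] = None


def first_flattened_internal_enum(fields: List[FieldInfo]) -> Optional[FieldInfo]:
    """Mirrors the find_map behaviour: stops at the first match."""
    return next((f for f in fields if f.flattened and f.internal_tag), None)


def all_flattened_internal_enums(fields: List[FieldInfo]) -> List[FieldInfo]:
    """Collects every match so that none is dropped silently."""
    return [f for f in fields if f.flattened and f.internal_tag]


def select_enum_for_variant_expansion(fields: List[FieldInfo]) -> Optional[FieldInfo]:
    """Minimal guard: fail instead of generating an incomplete model."""
    found = all_flattened_internal_enums(fields)
    if len(found) > 1:
        names = ", ".join(f.name for f in found)
        raise ValueError(f"multiple flattened internally-tagged enums: {names}")
    return found[0] if found else None


combined_fields = [
    FieldInfo("id", flattened=False),
    FieldInfo("action", flattened=True, internal_tag="type"),
    FieldInfo("status", flattened=True, internal_tag="kind"),
]
```

With `combined_fields`, the first-match helper returns only `action`, while the guarded selector raises, which corresponds to the "at minimum, surface an error" option in the review.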

claude · 2026-03-27T22:35:34Z

...i-demo/src/tests/snapshots/reflectapi_demo__tests__serde__flatten_untagged_enum_field-5.snap

+StdNumNonZeroI32 = Annotated[int, "Rust NonZero i32 type"]
+StdNumNonZeroI64 = Annotated[int, "Rust NonZero i64 type"]
+
+# Rebuild models to resolve forward references
+try:
+    ReflectapiDemoTestsSerdeCell.model_rebuild()
+    ReflectapiDemoTestsSerdeValue.model_rebuild()
+except AttributeError:
+    # Some types may not have model_rebuild method
+    pass
+
+# Factory classes (generated after model rebuild to avoid forward references)
+


🔴 The Python codegen emits model_rebuild() calls for Union type aliases (e.g., ReflectapiDemoTestsSerdeValue = Union[...]) alongside real BaseModel subclasses inside a single try/except AttributeError block. Union aliases have no model_rebuild() method, so the call always raises AttributeError. Because all calls share one block, any Union alias that sorts alphabetically before a real BaseModel subclass will silently abort the entire try block, leaving the real model's forward references unresolved. The fix is to either wrap each model_rebuild() call in its own try/except block, or filter the list to exclude Union aliases.

Extended reasoning...

What the bug is and how it manifests

The Python codegen in reflectapi/src/codegen/python.rs (around line 1320-1332) collects all rendered type names, sorts them alphabetically, and emits them in a single try/except AttributeError block. In the snapshot flatten_untagged_enum_field-5.snap (lines 148-160), ReflectapiDemoTestsSerdeValue is a plain Python Union type alias, not a Pydantic BaseModel subclass:

    ReflectapiDemoTestsSerdeValue = Union[
        ReflectapiDemoTestsSerdeValueNum,
        ReflectapiDemoTestsSerdeValueText
    ]

Union type aliases in Python are typing special forms and have no model_rebuild() method. Calling .model_rebuild() on them always raises AttributeError.

The specific code path that triggers the latent bug

Types are sorted alphabetically before the block is emitted (sorted_type_names.sort() in python.rs). In the tested snapshot, Cell (C) sorts before Value (V), so ReflectapiDemoTestsSerdeCell.model_rebuild() runs first and succeeds, and then the AttributeError from ReflectapiDemoTestsSerdeValue.model_rebuild() is caught. This specific case is harmless.

However, the structural defect is that all calls share one try/except block. Consider any schema where a Union alias name sorts alphabetically before a real BaseModel/RootModel subclass — for example, an 'AValue = Union[...]' alias and a 'BModel(BaseModel)' class. The sequence would be: (1) AValue.model_rebuild() raises AttributeError, (2) the except block catches it and execution exits the entire try block, (3) BModel.model_rebuild() is never called.

Why existing code does not prevent it

The comment 'Some types may not have model_rebuild method' shows the author anticipated this case, but the single-block structure is the defect. The only reason the tested snapshots work is that all real models happen to sort before the Union aliases in the current test cases. With `from __future__ import annotations` active (which this generated file uses), Pydantic defers annotation evaluation and depends on model_rebuild() being called to resolve forward references in complex schemas. Any schema where a Union alias sorts before a real model relying on forward reference resolution will silently produce broken Pydantic models.

Step-by-step proof for the latent ordering failure

Suppose a schema produces 'AValueUnion = Union[AVariant1, AVariant2]' and 'class BModel(BaseModel): field: SomeForwardRef'. In the single try/except block (alphabetical order): AValueUnion.model_rebuild() raises AttributeError, the except catches it and exits the block, BModel.model_rebuild() never runs, and SomeForwardRef remains an unresolved string annotation in BModel.

How to fix it

Option 1 (simplest): wrap each call in its own try/except so that a failure on a Union alias does not abort subsequent real model rebuilds. Option 2: filter the type name list at codegen time to exclude Union type aliases, only emitting model_rebuild() calls for actual BaseModel/RootModel subclasses.
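The ordering failure and Option 1 can be reproduced without Pydantic. In the sketch below, `FakeModel` is a stand-in for a `BaseModel` subclass (an assumption made to keep the example dependency-free), while `AValue` is a real `typing.Union` alias, which has no `model_rebuild` attribute.

```python
from typing import Union


class FakeModel:
    """Stand-in for a Pydantic model; records whether it was rebuilt."""

    rebuilt = False

    @classmethod
    def model_rebuild(cls) -> None:
        cls.rebuilt = True


class BModel(FakeModel):
    rebuilt = False


# A plain Union alias: attribute access for model_rebuild raises AttributeError.
AValue = Union[int, str]


def rebuild_single_block(candidates) -> None:
    """Shape of the generated code: one try block around every call."""
    try:
        for candidate in candidates:
            candidate.model_rebuild()
    except AttributeError:
        # The first failure ends the whole block; later models are skipped.
        pass


def rebuild_each(candidates) -> list:
    """Option 1: isolate each call so one failure cannot skip the rest."""
    rebuilt = []
    for candidate in candidates:
        try:
            candidate.model_rebuild()
        except AttributeError:
            continue
        rebuilt.append(candidate)
    return rebuilt
```

Calling `rebuild_single_block([AValue, BModel])` leaves `BModel` untouched, whereas `rebuild_each` with the same list rebuilds it. Option 2 (filtering aliases at codegen time) avoids emitting the failing call at all.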

reflectapi/src/codegen/python.rs

claude · 2026-03-27T22:35:34Z

reflectapi-schema/src/semantic.rs

+pub struct SemanticEnum {
+    pub id: SymbolId,
+    pub name: String,
+    pub serde_name: String,
+    pub description: String,
+
+    /// Resolved generic parameters
+    pub parameters: Vec<SemanticTypeParameter>,
+
+    /// Variants ordered deterministically
+    pub variants: BTreeMap<SymbolId, SemanticVariant>,
+
+    /// Serde representation strategy
+    pub representation: crate::Representation,
+
+    /// Language-specific configuration
+    pub codegen_config: crate::LanguageSpecificTypeCodegenConfig,
+}


🔴 SemanticEnum.variants is BTreeMap<SymbolId, SemanticVariant> (semantic.rs:103), which sorts variants alphabetically by name rather than by declaration order. For #[serde(untagged)] enums, serde tries variants in declaration order and picks the first successful deserialization — any downstream codegen backend iterating SemanticEnum.variants will silently use the wrong order, causing incorrect deserialization when an alphabetically-earlier variant can absorb input intended for a later one. Fix by using IndexMap<SymbolId, SemanticVariant> or Vec to preserve insertion order.

Extended reasoning...

What the bug is and how it manifests

SemanticEnum.variants is typed as BTreeMap<SymbolId, SemanticVariant> (semantic.rs line 103). SymbolId derives Ord by field order: kind, then path: Vec, then disambiguator. All variants within the same enum share kind=Variant, and their path ends with the variant name — so the BTreeMap sorts them alphabetically by variant name, not by the order they appear in the source.

The specific code path that triggers it

In normalize.rs, build_semantic_enum (around line 1134) iterates enm.variants() — which returns variants in their raw declaration order (preserved in Vec) — and inserts each into a BTreeMap<SymbolId, SemanticVariant> keyed by SymbolId. The BTreeMap then re-sorts by SymbolId::Ord, discarding the position metadata. The Normalizer::build_semantic_enum code is:

    for variant in enm.variants() {
        let semantic_variant = self.build_semantic_variant(variant)?;
        variants.insert(variant.id.clone(), semantic_variant); // BTreeMap re-sorts
    }

Why existing code does not prevent it

The raw Enum.variants field is Vec, which preserves declaration order. That order is available at the point build_semantic_enum iterates enm.variants(). However, the result is inserted into a BTreeMap which re-sorts by SymbolId. There is no assertion, test, or fallback that checks whether BTreeMap ordering matches declaration order.

What the impact would be

For #[serde(untagged)] enums, serde's contract is: try variants in declaration order, use the first that deserializes successfully. A codegen backend that iterates SemanticEnum.variants (the natural, intended API) will silently produce a client that applies variants in alphabetical order instead. This leads to incorrect deserialization for any untagged enum where two variants can both deserialize a given input — the wrong variant is selected with no error.

Example: an enum declared as [Integer(i64), Float(f64)] — both variants can deserialize the JSON value 42. Serde (declaration order) picks Integer. A backend using SemanticEnum.variants iteration (alphabetical) tries Float first and picks Float. The generated client silently deserializes a different type than Rust would.

The new test case added in this PR includes test_flatten_untagged_enum_field with enum Value { Num { value: f64 }, Text { text: String } }. Alphabetically Num < Text, which happens to match declaration order here. But for any enum where declaration order differs from alphabetical order, the bug manifests.

How to fix it

Replace BTreeMap<SymbolId, SemanticVariant> with an insertion-order-preserving collection:

IndexMap<SymbolId, SemanticVariant> from the indexmap crate — preserves insertion order, provides O(1) keyed lookup

Vec — simplest, no key-based lookup without an auxiliary index

The same fix is needed for SemanticVariant.fields and SemanticStruct.fields for correctness with positional (unnamed) fields.

Step-by-step proof

Define an untagged enum: variants declared as [Integer(i64), Float(f64)].

Normalizer::build_semantic_enum inserts Integer (SymbolId path=["MyEnum","Integer"]) then Float (path=["MyEnum","Float"]) into BTreeMap.

BTreeMap sorts by path lexicographically: "Float" < "Integer", so Float entry comes first in iteration.

A codegen backend calls semantic_enum.variants.values() and emits: try Float, then try Integer.

For JSON input 42: both would match — Float wins because it was tried first. Rust's serde would have picked Integer (declaration order). The generated client deserializes a different type silently.
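The effect of the re-sorting on first-match deserialization can be emulated in a few lines. This is a Python illustration, not the reflectapi code: the parser functions and `deserialize_untagged` are hypothetical, and `sorted()` by variant name stands in for iterating a map keyed by a name-based ID.

```python
from typing import Any, Callable, List, Optional, Tuple

Variant = Tuple[str, Callable[[Any], Optional[Any]]]


def parse_integer(value: Any) -> Optional[int]:
    """Accepts JSON integers only (bool is excluded on purpose)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None


def parse_float(value: Any) -> Optional[float]:
    """Accepts any JSON number, including integers such as 42."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return None


def deserialize_untagged(variants: List[Variant], value: Any) -> Tuple[str, Any]:
    """First-match semantics: try variants in the given order."""
    for name, parse in variants:
        parsed = parse(value)
        if parsed is not None:
            return name, parsed
    raise ValueError("no variant matched")


# Declaration order, as in the example enum [Integer(i64), Float(f64)].
DECLARED: List[Variant] = [("Integer", parse_integer), ("Float", parse_float)]

# Order seen when iterating a map sorted by variant name.
SORTED_BY_NAME: List[Variant] = sorted(DECLARED, key=lambda v: v[0])
```

For the input `42`, the declared order selects `Integer` and the sorted order selects `Float`, so the two clients disagree without any error being raised.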

- Python codegen: set has_externally_tagged_enums flag for Adjacent representation too, fixing missing RootModel/model_validator imports
- generate_unique_name: join ALL non-excluded module components to avoid collisions (ServicesUserProfile vs AuthUserProfile)
- discover_symbols: use function.id.path instead of splitting HTTP URL path, fixing SymbolTable get_by_path for endpoints

reflectapi/src/codegen/python.rs

hardbyte · 2026-03-28T01:50:35Z

@claude review

- Sanitize tag discriminator field name for Python reserved words (e.g., "type" → "type_" with alias). Fixes SyntaxError when tag name is a Python keyword.
- Add model_rebuild() calls for per-variant classes generated by render_struct_with_flattened_internal_enum. Fixes forward reference resolution with `from __future__ import annotations`.
- Guard against empty enum variants producing invalid `Union[]` syntax.

ids.rs (3 tests):
- Zero-padded tuple field ordering (arg00..arg11 sort correctly)
- Disambiguated ID propagates to member IDs
- Schema root ID does not collide with same-named type

normalize.rs (4 tests):
- TypeConsolidation preserves all types with qualified name uniqueness
- resolve_types does not confuse variant with type of same name
- generate_unique_name distinguishes same-inner-module paths
- Function symbol path matches ID for get_by_path lookups

Merges the askama dependency removal from PR #122. Template structs now use manual render() methods returning String instead of askama::Template derive + fallible render. Conflict resolution: kept both the TestingModule render() impl from #122 and the #[derive(Clone)] on Field from our branch. Fixed render()? -> render() in render_struct_with_flattened_internal_enum.

Run Normalizer::normalize() at the start of Python codegen's generate() function, making the SemanticSchema available alongside the raw Schema.

- Add convenience methods to SemanticSchema: get_type_by_name(), get_type(), types(), functions(), type_names()
- The SemanticSchema is constructed once and available for render functions that benefit from type-safe SymbolId lookups
- Raw Schema is still used for the main iteration loop since the Normalizer's NamingResolutionStage transforms type names, and the existing codegen relies on pre-normalization names
- Graceful fallback if normalization fails (best-effort)

This is the first consumer of SemanticSchema in the codegen path, validating the IR infrastructure from #96.

- Replace broken fallback (would panic on same error) with .ok() that makes normalization best-effort
- Use _semantic prefix for intentionally-unused binding
- get_type_by_name: use symbol table O(log n) lookup with linear scan fallback, instead of always O(n)
- type_names: return iterator instead of allocating Vec<String>
- Remove stale dead code reference to `semantic` variable

Python codegen fixes:
- Underscore-prefixed fields no longer treated as Pydantic private attributes. sanitize_field_name strips leading underscores and generates Field(alias="_original") for wire compatibility.
- exclude_none=True removed from enum serializers; it was dropping intentional None values. Plain model_dump() matches serde behavior.
- Factory method parameters now include type annotations (e.g., `def circle(radius: float)` instead of `def circle(radius)`).
- sanitize_field_name_with_alias now takes serde_name for proper alias generation on renamed fields.

Normalizer refactor:
- normalize() takes &Schema instead of Schema by value, eliminating the clone at the call site (clones internally for pipeline mutation)
- build_semantic_ir receives pre-pipeline original_names map
- SemanticPrimitive/Struct/Enum gain original_name field preserving pre-normalization qualified names
- SemanticSchema::get_type_by_name falls back to original_name search

~88 snapshots updated with type-annotated factory params, wire-name aliases on renamed fields, and model_dump() without exclude_none.

Port the TypeScript/Rust namespace algorithm to Python codegen. Type definitions remain at module top-level with flat PascalCase names for Pydantic forward-reference resolution. Namespace alias classes provide dotted access paths mirroring the Rust module hierarchy:

    class reflectapi_demo:
        class tests:
            class serde:
                Offer = ReflectapiDemoTestsSerdeOffer
                OfferKind = ReflectapiDemoTestsSerdeOfferKind

Users access types as: reflectapi_demo.tests.serde.Offer

Type references in annotations, client methods, model_rebuild calls, and factory classes all use dotted paths. This matches the approach used by TypeScript (export namespace) and Rust (pub mod) backends.

Implementation:
- New Module struct + modules_from_rendered_types (ported from TS)
- type_name_to_python_ref converts :: paths to dotted notation
- Client signatures use dotted type references
- Factory/testing utilities use namespaced names
- Removed old generate_nested_class_structure dead code

125 snapshot files updated.
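A runnable sketch of the alias pattern follows. Plain dataclasses stand in for the generated Pydantic models, and the fields `id` and `name` are assumptions; only the class names and the nesting come from the commit message.

```python
from dataclasses import dataclass


# Flat, top-level definitions (stand-ins for the generated models).
@dataclass
class ReflectapiDemoTestsSerdeOffer:
    id: str  # hypothetical field


@dataclass
class ReflectapiDemoTestsSerdeOfferKind:
    name: str  # hypothetical field


# Namespace alias classes: nested classes used purely as containers,
# defined after the flat types so the names already exist.
class reflectapi_demo:
    class tests:
        class serde:
            Offer = ReflectapiDemoTestsSerdeOffer
            OfferKind = ReflectapiDemoTestsSerdeOfferKind


# The dotted path and the flat name refer to the same class object.
offer = reflectapi_demo.tests.serde.Offer(id="42")
```

Since the aliases are ordinary attribute bindings, `isinstance` checks and equality behave identically whichever name is used.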

- extract_defined_names now only matches top-level definitions (no leading whitespace), preventing enum member values like NOT_FOUND from leaking into namespace alias classes
- Filter out SCREAMING_SNAKE_CASE constants (enum members)
- Filter out *Variants internal union type aliases from namespace (implementation details, not part of the public API surface)

…lones
- Remove dead _semantic normalizer call (constructed but never used)
- Filter TypeVar declarations (T, U) from extract_defined_names
- Move instead of clone rendered_original_names_in_order
- Collect rendered_type_keys before moving rendered_types
- Delete dead Imports::render() method (~95 lines)
- Delete always-false has_flatten_support field
- Inline trivial to_valid_python_identifier wrapper

Coverage for previously untested code paths across 6 categories:

- Namespace edge cases (3): single-segment types, deeply nested modules, numeric/special character field names
- Flatten edge cases (5): nested flatten depth > 1, optional internally-tagged enum flatten, multiple flattened structs, combined struct + enum flatten, unit-variant-only enum flatten
- Enum representation edge cases (4): generic externally-tagged enum, generic adjacently-tagged enum, mixed variant types (unit + struct), serde rename on variants
- Type reference edge cases (4): Box<T> unwrapping, nested generic containers (Vec<Vec<u32>>), self-referential struct, Option<Option<T>>
- Field sanitization edge cases (3): all Python keywords as field names, special characters in serde renames, multiple underscore prefixes
- Factory/client edge cases (2): 12-variant enum at scale, empty enum

105 new snapshot files (21 tests x 5 snapshots each).

Real-world validation against Partly's core-server (284 endpoints, 78K-line schema) revealed two codegen bugs:

1. Descriptions containing backslashes (e.g., "object\'s") break Python docstrings because \ is interpreted as an escape character inside the string literal (or as a line continuation before a newline). Added sanitize_for_docstring() that escapes \ and """ in all 13 template render methods that emit docstrings.
2. Factory method names derived from enum variant names (e.g., "global", "from") can be Python keywords, producing SyntaxError. Applied safe_python_identifier() to all factory method name and parameter name generation sites.

The generated 57K-line Python client for core-server now parses as valid Python (verified with py_compile).
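Both fixes can be sketched in Python. The function names are borrowed from the commit message, but the bodies are assumptions: the real helpers are Rust functions in python.rs, and this sketch escapes every double quote (a superset of escaping triple quotes) and appends an underscore to keywords, neither of which is confirmed by the source.

```python
import keyword


def sanitize_for_docstring(text: str) -> str:
    """Escape text so it can sit inside a triple-double-quoted docstring."""
    # Backslashes first, so the backslashes added for quotes are not doubled.
    escaped = text.replace("\\", "\\\\")
    # Escaping every double quote also covers a quote at the very end.
    return escaped.replace('"', '\\"')


def safe_python_identifier(name: str) -> str:
    """Append an underscore when the name is a Python keyword (assumed scheme)."""
    return name + "_" if keyword.iskeyword(name) else name


def render_function(name: str, description: str) -> str:
    """Emit the source of a no-op function carrying the description as docstring."""
    ident = safe_python_identifier(name)
    doc = sanitize_for_docstring(description)
    return f'def {ident}():\n    """{doc}"""\n'
```

A quick round-trip check is to `exec` the rendered source and compare the function's `__doc__` with the original description; if the escaping is right, they are identical.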

Fixes NameError when importing the generated client: model_rebuild() was called inline (in render_struct_with_flattened_internal_enum) before namespace alias classes were defined, so dotted type references like `business_rules.Response` could not be resolved. Moved all model_rebuild() calls to the global rebuild section which runs after namespace classes are defined. Also sanitized all remaining docstring emission points (13 locations) to escape backslashes and triple-quotes in description text. Fixed factory method names and parameters using Python keywords (from, global) via safe_python_identifier(). Validated against Partly's core-server (284 endpoints, 78K-line schema): - 57K-line Python client generates as valid Python - Imports in 0.65s - Successfully authenticates against live API (dev13)

mdbook 0.5.x changed the preprocessor JSON protocol, breaking mdbook-keeper compatibility. Pin both tools to compatible versions:
- mdbook ~0.4 (0.4.x series)
- mdbook-keeper ~0.5

This fixes doc builds that have been failing on main since Jan 2026. Applied to the docs.yml and docs-preview.yml workflows.

Also: add __pycache__/*.pyc to .gitignore, remove accidentally committed pycache files.

github-actions · 2026-03-28T09:57:32Z

📖 Documentation Preview: https://reflectapi-docs-preview-pr-124.partly.workers.dev

Updated automatically from commit 1dd3496

… approach)

- TypeScript no longer uses askama (removed in #122), uses std::fmt::Write
- Python is no longer experimental: validated against production API
- Python section updated to document namespace classes, alias handling, docstring escaping, factory type annotations
- Python flatten example updated to show actual type_ alias pattern
- Limitations section references #127 for remaining DX improvements

Field descriptions:
- Schema field descriptions now emitted as Field(description="...") in generated Pydantic models. Descriptions appear in IDE hover, model_json_schema(), and help() output.
- Added sanitize_for_string_literal() to escape newlines, quotes, and backslashes in description strings.
- Flattened-field internal descriptions (prefixed "(flattened") are filtered out as they're implementation details.

Typed error returns:
- Client methods now return ApiResponse[OutputType, ErrorType] instead of ApiResponse[Any], making the error type visible in the signature and IDE autocompletion.
- ApiResponse runtime class updated to Generic[T, E] (backward compatible; ApiResponse[T] still works).
- Docstring return section shows both success and error types.

Validated against Partly's core-server (284 endpoints):
- All field descriptions preserved including multi-line ones
- All error types visible in method signatures
- Generated 57K-line client passes py_compile
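To show what "error type visible in the signature" means in practice, here is a minimal sketch using only the standard library. The ApiResponse class below is a simplified stand-in, not the real runtime class: its fields, and the Pet, PetsCreateError and pets_create names, are assumptions for illustration, and the ApiResponse[T] backward-compatibility path is not modelled.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar, get_args, get_type_hints

T = TypeVar("T")  # success payload type
E = TypeVar("E")  # typed error payload type


@dataclass
class ApiResponse(Generic[T, E]):
    # Simplified stand-in: either value or error is populated.
    value: Optional[T] = None
    error: Optional[E] = None

    @property
    def ok(self) -> bool:
        return self.error is None


@dataclass
class Pet:  # hypothetical output type
    name: str


@dataclass
class PetsCreateError:  # hypothetical error type
    reason: str


def pets_create(name: str) -> ApiResponse[Pet, PetsCreateError]:
    # Hypothetical client method: both types appear in the return annotation.
    if not name:
        return ApiResponse(error=PetsCreateError("InvalidName"))
    return ApiResponse(value=Pet(name))


# Tools (and IDEs) can read both type arguments from the signature.
return_hint = get_type_hints(pets_create)["return"]
success_type, error_type = get_args(return_hint)
```

With ApiResponse[Any] only the container would be visible here; with two type arguments, a type checker can narrow both the success and the error branch.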

Factory classes (371 in core-server output) consumed ~13K lines (19% of file) and provided no value over direct type construction:

    # Before (factory):
    myapi_proto_PetsCreateErrorFactory.conflict()

    # After (direct, already works):
    myapi.proto.PetsCreateError("Conflict")

Removed:
- FactoryInfo struct and 5 factory generation functions
- generate_factory_method_params/args helpers
- render_*_without_factory naming (renamed to render_*/render_enum)
- sanitize_field_name (only used by factory code)
- HybridEnumClass and FactoryMethod template structs
- Default generate_testing changed to false

697 lines removed from python.rs (6595 -> 5898). Core-server output: 58K lines (down from 68K, -13%).

Validated: imports, constructs types, authenticates against live API.

Runtime fixes:
- Use Pydantic TypeAdapter for all response validation, replacing the manual isinstance/model_validate chain. This correctly handles generic types like list[Model], dict[str, Model], Union types, and plain BaseModel subclasses.
- Use TypeAdapter.validate_json(bytes) for Pydantic's Rust-based fast JSON parser when raw bytes are available, falling back to validate_python(dict) otherwise.
- Add error_model parameter to _make_request and _handle_error_response. When an API returns an error, the runtime attempts to deserialize the error body into the typed error model. Accessible via ApplicationError.typed_error.

Codegen:
- Generated _make_request calls now pass error_model= with the typed error type from the schema.

Validated against Partly's core-server:
- list[BillingCurrencyListItem] returns typed Pydantic models (was returning raw dicts)
- CustomerGetError.typed_error = CustomerNotFoundVariant(customer_id=...) (was raw dict string)
- 170 currencies validated via fast validate_json path
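The validation strategy described above can be sketched with Pydantic v2's public TypeAdapter API. This is an illustration, not the runtime's actual code: parse_response is a hypothetical helper, and the fields on BillingCurrencyListItem are invented for the example.

```python
from typing import Any, Optional

from pydantic import BaseModel, TypeAdapter


class BillingCurrencyListItem(BaseModel):
    # Field names are assumptions; the real model is generated from the schema.
    code: str
    name: str


def parse_response(
    response_type: Any,
    raw_bytes: Optional[bytes] = None,
    data: Any = None,
) -> Any:
    # One adapter handles plain models, list[Model], dict[str, Model] and unions.
    adapter = TypeAdapter(response_type)
    if raw_bytes is not None:
        # Fast path: parse and validate the JSON bytes in one step.
        return adapter.validate_json(raw_bytes)
    # Fallback: validate data that was already decoded into Python objects.
    return adapter.validate_python(data)


payload = b'[{"code": "USD", "name": "US Dollar"}]'
from_bytes = parse_response(list[BillingCurrencyListItem], raw_bytes=payload)
from_python = parse_response(
    list[BillingCurrencyListItem],
    data=[{"code": "USD", "name": "US Dollar"}],
)
```

Both paths return a list of BillingCurrencyListItem instances rather than raw dicts, which is the behaviour change the validation notes describe.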

The Python codegen now uses SemanticSchema as the primary driver for type iteration, import detection, and function ordering:
- Type iteration uses semantic.types() (deterministic BTreeMap order) instead of manual topological_sort_types (removed: 118 lines)
- Import detection (has_enums, has_literal, etc.) uses SemanticType pattern matching instead of raw schema.get_type() lookups
- Function iteration uses semantic.functions() for ordering
- Deprecation detection uses semantic function metadata

The raw Schema is kept for rendering (render functions need concrete Struct/Enum/Field types). Lookups use original_name (pre-normalization qualified name like "analytics::AnalyticsEventInsertData") to find types in the consolidated raw schema.

Fixed original_names capture in Normalizer: builds short→qualified name mapping from pre-normalization type names, keyed by the post-normalization short name that NamingResolutionStage produces.

Validated: 220 tests pass, core-server (284 endpoints) generates valid 47K-line Python client, live API authentication works.

The Python codegen now uses SemanticSchema as the single source of truth for type iteration, with the raw Schema providing concrete type data for rendering.

Architecture:
- NormalizationPipeline::for_codegen() runs only CircularDependency detection (no TypeConsolidation, no NamingResolution)
- schema.consolidate_types() runs first, then Normalizer builds SemanticSchema from the consolidated schema
- Since NamingResolution is skipped, SemanticType.name() matches the raw Schema's names exactly, so there is no name-domain mismatch
- Removed all original_name bridging logic

TypeVar collision fix:
- Detects when TypeVar names (e.g., Identity) collide with class names and renames them with _T_ prefix (_T_Identity)
- rename_type_params_in_schema() propagates renames through all type parameter declarations and type references

Validated: 220 tests pass, core-server (284 endpoints, 59K lines) generates valid Python, live API authentication works.
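The TypeVar collision fix can be sketched in a few lines of Python. This models the idea only: TypeRef, plan_renames and rename_in_ref are hypothetical names, the real logic lives in Rust (rename_type_params_in_schema()), and the sketch assumes renames are applied only inside the generic type that declares the parameter.

```python
from dataclasses import dataclass, field


@dataclass
class TypeRef:
    # Simplified type reference: a name plus generic arguments.
    name: str
    arguments: list["TypeRef"] = field(default_factory=list)


def plan_renames(class_names: set[str], type_params: list[str]) -> dict[str, str]:
    # A type parameter that shares its name with a class gets a _T_ prefix.
    return {p: f"_T_{p}" for p in type_params if p in class_names}


def rename_in_ref(ref: TypeRef, renames: dict[str, str]) -> TypeRef:
    # Rewrite the reference and, recursively, every generic argument.
    return TypeRef(
        renames.get(ref.name, ref.name),
        [rename_in_ref(arg, renames) for arg in ref.arguments],
    )


class_names = {"Identity", "User"}
renames = plan_renames(class_names, ["Identity", "T"])

# A field of type Option<Identity> inside the generic type.
field_type = TypeRef("Option", [TypeRef("Identity")])
renamed = rename_in_ref(field_type, renames)
```

Only the colliding parameter is renamed; T is left alone, and references to the parameter inside the declaring type follow the declaration.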

Replace the fixed standard()/for_codegen() pipeline variants with a declarative PipelineBuilder that lets backends configure each stage:

    PipelineBuilder::new()
        .consolidation(Consolidation::Skip)   // or Standard (default)
        .naming(Naming::Skip)                 // or Standard, or Custom(stage)
        .circular_dependency_strategy(...)    // default: Intelligent
        .add_stage(custom_stage)              // append backend-specific stages
        .build()

Three configuration dimensions:
- Consolidation: Standard (run TypeConsolidationStage) | Skip
- Naming: Standard (NamingResolution) | Skip | Custom(Box<dyn Stage>)
- ResolutionStrategy: passed to CircularDependencyResolutionStage

Convenience methods standard() and for_codegen() delegate to the builder internally and remain as shorthand. Python codegen uses PipelineBuilder directly with Skip/Skip.

Architecture doc updated with PipelineBuilder diagram and config docs.

- Remove stale architecture doc claims (field descriptions and error types are now implemented, not "remaining gaps")
- Remove dead code in render_struct: unreachable flattened-fields collection loop (flattened structs take the early return path)

- Always import Field, which is used for descriptions, aliases, and discriminators across many contexts. Fixes NameError for schemas with aliased fields but no discriminated unions.
- Remove dead try/except around response_model identity check in runtime client (both sync and async).

avkonst and others added 5 commits March 26, 2026 20:49

remove aksama dependency

f9a9660

format

bc6d273

resolve CI issues

d67d509

fix CI issues

a7ea9d9