[refactor] Semantic Function Clustering Analysis - Code Organization Review #671

@github-actions

🎯 Analysis Overview

This automated semantic function clustering analysis examined 61 non-test Go files (391 total functions) in the MCP Gateway repository to identify refactoring opportunities through function organization patterns, semantic clustering, and code similarity detection.

Key Findings:

  • Overall Organization: Good - Most packages are well-organized by feature
  • ⚠️ 20 Similar Implementations: Including exact duplicates and near-duplicates (>60% similarity)
  • ⚠️ 2 Files with Mixed Purposes: Some files mix validation, formatting, and I/O logic
  • ℹ️ 12 Scattered Helper Functions: Helper functions spread across multiple files but logically grouped

Priority Findings:

  1. Exact Duplicate: SetGatewayVersion function duplicated in 2 packages (100% identical)
  2. Pattern Duplication: DIFC packages have similar set management code (Add, AddAll, Contains methods)
  3. Logger Close Methods: Similar Close implementations across 3 logger types (83-88% similar)

Full Analysis Report

1. Exact Duplicate Functions

🔴 Critical: SetGatewayVersion (100% Identical)

Function appears in:

  1. internal/config/validation_schema.go:53
  2. internal/server/unified.go:36

Implementation:

func SetGatewayVersion(version string) {
    if version != "" {
        gatewayVersion = version
    }
}

Issue: Two packages maintain identical gateway version state independently. This creates:

  • State synchronization risk (callers must invoke both setters, or the two copies diverge)
  • Maintenance burden (changes must be duplicated)
  • Confusion about single source of truth
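
The synchronization risk can be illustrated with a minimal sketch. The versionHolder type below is a hypothetical stand-in for one package's private gatewayVersion variable and its setter; the type name, the inSync helper, and the "dev" default are assumptions for illustration, not code from the repository.

```go
package main

import "fmt"

// versionHolder is a hypothetical stand-in for one package's private
// gatewayVersion variable plus its SetGatewayVersion function.
type versionHolder struct {
	gatewayVersion string
}

// SetGatewayVersion mirrors the duplicated function: empty input is ignored.
func (h *versionHolder) SetGatewayVersion(version string) {
	if version != "" {
		h.gatewayVersion = version
	}
}

// inSync reports whether both copies hold the same version.
func inSync(a, b *versionHolder) bool {
	return a.gatewayVersion == b.gatewayVersion
}

func main() {
	// Assumed default; each package starts with its own copy.
	config := &versionHolder{gatewayVersion: "dev"}
	server := &versionHolder{gatewayVersion: "dev"}

	// Startup code updates only one of the two copies.
	config.SetGatewayVersion("1.2.3")

	fmt.Println("config:", config.gatewayVersion) // config: 1.2.3
	fmt.Println("server:", server.gatewayVersion) // server: dev
	fmt.Println("in sync:", inSync(config, server)) // in sync: false
}
```

Nothing in the type system flags the missing second call, which is why a single shared variable is the safer design.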

Recommendation:

  • Create a single version module: internal/version/version.go
  • Move gateway version management there
  • Import from both config and server packages
  • Impact: Eliminates state duplication, provides single source of truth

Estimated Effort: 1 hour (low risk refactoring)


2. High-Similarity Duplicates (>80%)

🟡 DIFC Set Management Pattern (82-92% similar)

Location: internal/difc/capabilities.go and internal/difc/labels.go

Both types implement nearly identical set management operations:

  • Add(tag Tag) - 89.7% similar
  • AddAll(tags []Tag) - 92.4% similar
  • Contains(tag Tag) bool - 91.4% similar
  • Count() int - 82.2% similar

Code Example (Add method):

// capabilities.go
func (c *Capabilities) Add(tag Tag) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.tags[tag] = struct{}{}
}

// labels.go
func (l *Label) Add(tag Tag) {
    l.mu.Lock()
    defer l.mu.Unlock()
    l.tags[tag] = struct{}{}
}

Analysis: Both types wrap map[Tag]struct{} with mutex protection and implement identical operations. This is a classic case for Go generics or shared implementation.

Options:

  • Option A: Create a generic TagSet[T comparable] type (Go 1.18+; the element type must be comparable to serve as a map key)
  • Option B: Extract the shared implementation into an embedded type
  • Option C: Keep as-is (acceptable duplication for type-specific semantics)

Assessment: This duplication may be intentional for type safety and semantic clarity. Capabilities represents global tags while Label represents agent-specific tags. The minor duplication (4 methods) may be preferable to abstraction complexity.
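
For comparison, Option B could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the tagSet type, the constructors, the string-based Tag, and the choice of sync.RWMutex are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Tag is a simplified stand-in for the DIFC tag type (assumed to be a string).
type Tag string

// tagSet is a hypothetical shared implementation: a mutex-protected set of tags.
type tagSet struct {
	mu   sync.RWMutex
	tags map[Tag]struct{}
}

func newTagSet() *tagSet {
	return &tagSet{tags: make(map[Tag]struct{})}
}

// Add inserts a single tag.
func (s *tagSet) Add(tag Tag) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tags[tag] = struct{}{}
}

// AddAll inserts every tag in the slice; duplicates collapse.
func (s *tagSet) AddAll(tags []Tag) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, t := range tags {
		s.tags[t] = struct{}{}
	}
}

// Contains reports whether the tag is present.
func (s *tagSet) Contains(tag Tag) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.tags[tag]
	return ok
}

// Count returns the number of distinct tags.
func (s *tagSet) Count() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.tags)
}

// Capabilities and Label remain distinct types, so they cannot be swapped
// at call sites, but both gain the set methods through embedding.
type Capabilities struct{ *tagSet }
type Label struct{ *tagSet }

func NewCapabilities() *Capabilities { return &Capabilities{newTagSet()} }
func NewLabel() *Label               { return &Label{newTagSet()} }

func main() {
	caps := NewCapabilities()
	caps.AddAll([]Tag{"public", "internal"})

	label := NewLabel()
	label.Add("internal")

	fmt.Println(caps.Count(), caps.Contains("public"))   // 2 true
	fmt.Println(label.Count(), label.Contains("public")) // 1 false
}
```

The trade-off is visible here: the four methods are written once, but readers now have to follow the embedding to find them, and any type-specific behavior would need method overrides.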

Recommendation: Keep as-is - The duplication is minimal and provides clear type-specific semantics. ✅


🟡 Logger Close Methods (83-88% similar)

Implementations:

  1. internal/logger/file_logger.go:Close()
  2. internal/logger/jsonl_logger.go:Close()
  3. internal/logger/markdown_logger.go:Close()

Pattern:

func (fl *FileLogger) Close() error {
    fl.mu.Lock()
    defer fl.mu.Unlock()
    return closeLogFile(fl.logFile, &fl.mu, "file")
}

func (jl *JSONLLogger) Close() error {
    jl.mu.Lock()
    defer jl.mu.Unlock()
    return closeLogFile(jl.logFile, &jl.mu, "JSONL")
}

Analysis: All three loggers delegate to a shared closeLogFile helper. This is good design: the duplication is minimal (3 lines per method) and each type implements its own cleanup.

Recommendation: Keep as-is - This is idiomatic Go. Each type implements the io.Closer interface. ✅


3. Files with Mixed Purposes

🟡 internal/config/validation_schema.go (8 functions)

Current Organization:

  • Formatting functions: formatSchemaError, formatValidationErrorRecursive, formatErrorContext
  • Getter functions: fetchAndFixSchema, getOrCompileSchema
  • Validation functions: validateJSONSchema, validateStringPatterns

Analysis: This file mixes schema validation, error formatting, and schema fetching. The file serves a cohesive purpose (schema validation), so the mixed categories are acceptable.

Recommendation: Acceptable - All functions relate to JSON schema validation. Consider splitting only if file grows beyond ~500 lines. ✅


🟡 internal/mcp/connection.go (30 functions)

Current Organization:

  • Creation: NewConnection, NewHTTPConnection, createJSONRPCRequest
  • Getters: GetHTTPURL, GetHTTPHeaders, getPrompt
  • I/O: readResource, Close

Analysis: 30 functions in one file suggests this is a central abstraction. The file manages MCP connection lifecycle, which naturally includes creation, access, and I/O.

Recommendation: Consider splitting into:

  • connection.go - core Connection type and lifecycle
  • connection_http.go - HTTP-specific functionality
  • connection_stdio.go - stdio-specific functionality

Estimated Effort: 3-4 hours


4. Scattered Helper Functions

ℹ️ Analysis: Well-Organized Helpers

The analysis found 12 helper functions across the codebase:

Convert functions (5 total):

  • internal/config/ - stdin configuration conversion (3 functions)
  • internal/server/unified.go - tool result conversion (1 function)

Sanitize functions (4 total):

  • internal/auth/header.go - TruncateSessionID (security)
  • internal/logger/sanitize/ - centralized sanitization package (3 functions) ✅

Assessment: Helper functions are well-organized:

  • Sanitization helpers are already in a dedicated package (internal/logger/sanitize/)
  • Convert functions are in appropriate config and server packages
  • No refactoring needed ✅

Function Organization by Package

Package Distribution (61 files, 391 functions)

| Package    | Files | Functions | Assessment                              |
|------------|-------|-----------|-----------------------------------------|
| difc       | 5     | 66        | ✅ Well-organized DIFC security labels  |
| server     | 10    | 66        | ⚠️ Consider splitting connection.go     |
| logger     | 11    | 61        | ✅ Well-modularized logging system      |
| config     | 7     | 45        | ✅ Clear config validation structure    |
| mcp        | 2     | 31        | ⚠️ connection.go is large (30 funcs)    |
| launcher   | 3     | 25        | ✅ Clean backend management             |
| mcptest    | 4     | 22        | ✅ Well-structured test utilities       |
| cmd        | 7     | 20        | ✅ Clean flag organization              |
| guard      | 4     | 18        | ✅ Good security guard pattern          |
| rules      | 1     | 11        | ✅ Focused rule evaluation              |
| middleware | 1     | 6         | ✅ Clean middleware pattern             |
| sys        | 1     | 6         | ✅ System utilities                     |
| sanitize   | 1     | 5         | ✅ Centralized sanitization             |
| auth       | 1     | 5         | ✅ Focused auth logic                   |
| tty        | 2     | 3         | ✅ TTY utilities                        |
| timeutil   | 1     | 1         | ✅ Time formatting                      |

Overall Assessment: The codebase demonstrates good Go project organization following best practices:

  • Internal packages are well-scoped by feature
  • Helper functions are centralized where needed
  • Clear separation of concerns

Priority Refactoring Recommendations

✅ Priority 1: High Value, Low Risk

1.1 Consolidate SetGatewayVersion Duplicate

Action: Create unified version management module

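// internal/version/version.go

// Package version holds the gateway version as a single source of truth.
package version

// gatewayVersion defaults to "dev" until Set is called at startup.
var gatewayVersion = "dev"

// Set records the gateway version; an empty string is ignored.
func Set(v string) {
    if v != "" {
        gatewayVersion = v
    }
}

// Get returns the current gateway version.
func Get() string {
    return gatewayVersion
}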

Changes Required:

  • Create internal/version/version.go
  • Update internal/config/validation_schema.go to import version
  • Update internal/server/unified.go to import version
  • Update main.go to call version.Set() once at startup

Benefits:

  • Single source of truth for gateway version
  • No risk of state inconsistency
  • Clearer initialization flow

Estimated Effort: 1 hour
Risk: Low


⏸️ Priority 2: Consider for Future Work

2.1 Split internal/mcp/connection.go

Observation: 30 functions in one file (largest in codebase)

Suggested Split:

internal/mcp/
  connection.go          # Core Connection type, lifecycle
  connection_http.go     # HTTP-specific methods
  connection_stdio.go    # stdio-specific methods
  types.go               # (already exists)

Benefits:

  • Easier navigation and maintenance
  • Clear separation of transport-specific code

Estimated Effort: 3-4 hours
Priority: Medium (nice-to-have, not critical)


✅ Priority 3: No Action Needed

The following findings are acceptable as-is:

  1. DIFC Set Methods Duplication: Intentional for type-specific semantics ✅
  2. Logger Close Methods: Idiomatic Go interface implementation ✅
  3. Scattered Helpers: Already well-organized in appropriate packages ✅
  4. validation_schema.go Mixed Categories: Cohesive schema validation purpose ✅

Implementation Checklist

Immediate Actions (Optional)

  • Create internal/version package for unified version management
  • Update config and server packages to use shared version
  • Add tests for version module
  • Verify no functionality broken
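
The "add tests" item could start from a table-driven check like the sketch below. To stay self-contained it inlines Set and Get into package main and reports through a plain error; in the repository the same cases would more naturally live in a _test.go file using the standard testing package. The checkVersionBehaviour name and the specific cases are assumptions for illustration.

```go
package main

import "fmt"

// gatewayVersion, Set and Get mirror the proposed internal/version package,
// inlined here so the sketch is self-contained.
var gatewayVersion = "dev"

func Set(v string) {
	if v != "" {
		gatewayVersion = v
	}
}

func Get() string { return gatewayVersion }

// checkVersionBehaviour runs the cases in order and returns the first mismatch.
func checkVersionBehaviour() error {
	gatewayVersion = "dev" // reset so the check is repeatable
	cases := []struct {
		name  string
		input string
		want  string
	}{
		{"empty input keeps default", "", "dev"},
		{"non-empty input is stored", "1.2.3", "1.2.3"},
		{"empty input keeps previous value", "", "1.2.3"},
		{"later value overrides earlier one", "2.0.0", "2.0.0"},
	}
	for _, c := range cases {
		Set(c.input)
		if got := Get(); got != c.want {
			return fmt.Errorf("%s: got %q, want %q", c.name, got, c.want)
		}
	}
	return nil
}

func main() {
	if err := checkVersionBehaviour(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

The cases are order-dependent on purpose: they exercise the one non-obvious rule in the module, namely that an empty string never overwrites a previously set version.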

Future Considerations (Low Priority)

  • Monitor internal/mcp/connection.go - consider splitting if it grows beyond 500 lines
  • Review logger implementations if new logger types are added (template pattern opportunity)

Analysis Metadata

  • Files Analyzed: 61 Go files (excluding test files)
  • Functions Cataloged: 391 total functions
  • Similarity Pairs Detected: 20 pairs
  • High-Priority Issues: 1 (SetGatewayVersion duplicate)
  • Analysis Date: 2026-02-04
  • Detection Method: Semantic analysis + AST parsing + code similarity detection

Conclusion

The MCP Gateway codebase demonstrates strong code organization with clear package boundaries and focused responsibilities. The analysis identified one high-priority refactoring opportunity (SetGatewayVersion duplicate) and several acceptable duplication patterns that reflect good Go practices.

Recommended Action: Address the SetGatewayVersion duplicate (Priority 1) and consider the connection.go split as a future enhancement.

The remaining duplication patterns are either:

  • Intentional for type safety (DIFC)
  • Idiomatic interface implementations (Loggers)
  • Well-organized helpers (Sanitization)

Overall Code Health: ✅ Excellent - Only minor improvements suggested.


References:

AI generated by Semantic Function Refactoring
