Skip to content

CannObserv/address-validator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

337 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Address Validator

FastAPI service (v2.0.0) that parses and standardizes US physical addresses per USPS Publication 28.

All API routes live under /api/v1/ and return an api_version field in every response body plus an API-Version: 1 response header.

Features

  • Parse a raw address string into labelled components using the usaddress library.
  • Standardize addresses to USPS format: all-caps, official suffix abbreviations (Avenue → AVE), directional abbreviations (South → S), state abbreviations (Illinois → IL), secondary unit designators (Suite → STE), and ZIP code normalization.
  • Intersection support"Hollywood Blvd and Vine St""HOLLYWOOD BLVD & VINE ST".
  • Country validation — requests accept an optional country field (ISO 3166-1 alpha-2, default "US"). Invalid codes are rejected using the pycountry library; only US is currently supported.
  • API key authentication/api/v1/* endpoints require an X-API-Key header; docs remain open.
  • CORS enabled — cross-origin requests are allowed from any origin.
  • Health checkGET /api/v1/health for liveness probes.

Endpoints

Method Path Auth Description
POST /api/v1/parse 🔒 Parse raw address string into components
POST /api/v1/standardize 🔒 Standardize address to USPS Pub 28 format
GET /api/v1/health Service health check
GET /docs Interactive Swagger UI
GET /redoc ReDoc API documentation

All POST endpoints accept and return application/json. Address inputs are limited to 1000 characters.

POST /api/v1/parse

Request:

{
  "address": "1600 Pennsylvania Avenue NW, Washington, DC 20500",
  "country": "US"
}

The country field is optional (defaults to "US").

Response:

{
  "input": "1600 Pennsylvania Avenue NW, Washington, DC 20500",
  "country": "US",
  "components": {
    "spec": "usps-pub28",
    "spec_version": "unknown",
    "values": {
      "address_number": "1600",
      "street_name": "Pennsylvania",
      "street_name_post_type": "Avenue",
      "street_name_post_directional": "NW",
      "city": "Washington",
      "state": "DC",
      "zip_code": "20500"
    }
  },
  "type": "Street Address",
  "warnings": [],
  "api_version": "1"
}

The type field is one of "Street Address", "Intersection", or "Ambiguous". The warnings list is populated whenever input is silently modified during parsing — for example when parenthesized text is stripped, when repeated address numbers are joined into a range, or when a unit designator is recovered from a mis-tagged field. It is empty on clean input.

The components field is a ComponentSet containing:

  • spec — machine identifier for the component schema (e.g. "usps-pub28").
  • spec_version — edition of the spec the values conform to.
  • values — the labelled address component key/value pairs.

POST /api/v1/standardize

Accepts either a raw address string or pre-parsed components. When both are provided, components takes precedence and address is ignored.

Request (string):

{
  "address": "350 Fifth Ave Suite 3300, New York, NY 10118",
  "country": "US"
}

Request (components):

{
  "components": {
    "address_number": "350",
    "street_name": "Fifth",
    "street_name_post_type": "Ave",
    "occupancy_type": "Suite",
    "occupancy_identifier": "3300",
    "city": "New York",
    "state": "NY",
    "zip_code": "10118"
  },
  "country": "US"
}

Response:

{
  "address_line_1": "350 FIFTH AVE",
  "address_line_2": "STE 3300",
  "city": "NEW YORK",
  "region": "NY",
  "postal_code": "10118",
  "country": "US",
  "standardized": "350 FIFTH AVE  STE 3300  NEW YORK, NY 10118",
  "components": {
    "spec": "usps-pub28",
    "spec_version": "unknown",
    "values": {
      "address_number": "350",
      "street_name": "FIFTH",
      "street_name_post_type": "AVE",
      "occupancy_type": "STE",
      "occupancy_identifier": "3300",
      "city": "NEW YORK",
      "state": "NY",
      "zip_code": "10118"
    }
  },
  "warnings": [],
  "api_version": "1"
}

The response uses geography-neutral field names: region (not state) and postal_code (not zip_code). The standardized field uses two-space separators between address lines, matching the USPS single-line format convention.

GET /api/v1/health

Response:

{"status": "ok", "api_version": "1"}

No authentication required.

Authentication

All /api/v1/* endpoints (except /api/v1/health) require an X-API-Key header. Set the expected key via the API_KEY environment variable:

export API_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')

Requests without a valid key receive 401 or 403. Swagger (/docs) and ReDoc (/redoc) remain open.

curl -X POST http://localhost:8000/api/v1/standardize \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: YOUR_KEY' \
  -d '{"address": "350 Fifth Ave, New York, NY 10118"}'

Setup

Requires Python ≥ 3.12. Dependencies are managed with uv.

uv sync                  # install dependencies into .venv/
export API_KEY="your-secret-key"
uv run uvicorn address_validator.main:app --host 0.0.0.0 --port 8000

A systemd unit file (address-validator.service) is included for persistent deployment. The API key is stored in /etc/address-validator/.env and loaded via EnvironmentFile=.

Project Structure

src/address_validator/
  main.py                      # FastAPI app entry point, CORS config
  auth.py                      # API key authentication dependency
  models.py                    # Shared Pydantic request/response models
  logging_filter.py            # RequestIdFilter — injects request_id into logs
  middleware/
    request_id.py              # ULID generation, X-Request-ID header
  routers/v1/
    core.py                    # Country validation, APIError, helpers
    health.py                  # GET /api/v1/health
    parse.py                   # POST /api/v1/parse
    standardize.py             # POST /api/v1/standardize
  services/
    parser.py                  # usaddress wrapper, tag-name mapping
    standardizer.py            # USPS Pub 28 standardization logic
    validation/                # Provider abstraction + USPS/Google/chain/cache
  usps_data/
    spec.py                    # Pub 28 spec identifier constants
    suffixes.py                # Street suffix abbreviations
    directionals.py            # Directional abbreviations
    states.py                  # State name → abbreviation map
    units.py                   # Secondary unit designators
docs/
  usps-pub28.md                # Pub 28 research notes
  usps-addresses-v3r2_3.yaml   # Archived USPS Addresses API v3 spec
tests/
  conftest.py                  # Shared fixtures, API_KEY bootstrap
  unit/                        # Unit tests (parser, standardizer, auth, data)
  integration/                 # Integration tests (HTTP endpoints)
address-validator.service      # systemd unit file
pyproject.toml                 # Project metadata, dependencies, tool config

Standardization Rules Applied

  • All output uppercased
  • Parenthesized text removed (USPS Pub 28 §354 — not valid in standardized addresses; typically wayfinding notes)
  • Trailing commas, semicolons, and stray punctuation stripped
  • Street suffixes abbreviated (USPS Pub 28 Appendix C)
  • Directionals abbreviated (N, S, E, W, NE, NW, SE, SW)
  • State names converted to two-letter abbreviations
  • Secondary unit designators abbreviated (Suite → STE, Apartment → APT, Building/Bldg/Bld → BLDG, etc.)
  • Unit identifiers without a designator default to #
  • Designator words folded into identifiers are extracted (e.g. NO. 16# 16)
  • Both occupancy and subaddress designators preserved when present
  • Dual address numbers joined with hyphen (1804 & 18101804-1810)
  • Periods removed from all components
  • ZIP codes normalized to 5-digit or 5+4 format
  • Unit designators mis-tagged as city by the parser are recovered (e.g. BASEMENT, FREELAND → line 2 BSMT, city FREELAND)
  • Non-address wayfinding words (e.g. YARD) dropped from city
  • Line 2 ordering: larger container (BLDG) before specific unit (STE)
  • Intersections formatted as STREET1 & STREET2

Testing

uv run pytest                          # full suite with coverage
uv run pytest --no-cov                 # fast, no coverage
uv run pytest tests/unit/test_parser.py # single file
uv run ruff check .                    # lint
uv run ruff format .                   # format

Coverage floor: 80% line + branch (enforced by --cov-fail-under=80).

About

Web service to normalize and validate physical addresses

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors