Data Warehouse for the Texas Hearing Institute.
This repository provides a Data Warehouse designed to ingest, validate, and store patient and organizational data. The system is accessed via Power BI for reporting and analytics.
The system is orchestrated using the Make utility, which provides a single entry point for all major operations. Before you begin, ensure you have the following installed:
- Docker Desktop (or equivalent): Required for running the core infrastructure.
- Make: Orchestration utility (pre-installed on macOS and most Linux distributions).
- Node.js (v18+): Required only for local frontend development.
- Python (v3.11+): Required only for local backend development.
For clients and operators deploying the full system, use the following command to launch the production-ready background stack:
make deploy
This command automatically initializes and starts the following services:
- Postgres 16: The central relational database and primary data warehouse.
- RabbitMQ: The message broker for asynchronous task processing.
- SeaweedFS: S3-compatible local object storage for file persistence.
- FastAPI Backend: The core API server providing the REST interface for schema validation and file management.
- Celery Worker: The background processing engine that handles file validation and ETL.
Once the system is deployed, you can verify the health of the services using these interfaces:
- System Status: Run `make ps` to view the status of all containers.
- API Documentation: http://localhost:8000/docs
- Task Queue Dashboard: http://localhost:15672 (guest / guest)
- Storage Explorer: http://localhost:8888
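The interface checks above can also be scripted. The following is a minimal sketch, not part of this repository, that probes the three URLs listed above using only the Python standard library. The function names `is_up` and `check_all` are illustrative, and the script assumes that any HTTP response (even an error status) means the service is running.

```python
import http.client
import urllib.error
import urllib.request

# Endpoints listed above; the labels are only used for display.
ENDPOINTS = {
    "API Documentation": "http://localhost:8000/docs",
    "Task Queue Dashboard": "http://localhost:15672",
    "Storage Explorer": "http://localhost:8888",
}

# The targets are local, so bypass any system-wide HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply: treat as down.
        return False


def check_all(endpoints: dict) -> dict:
    """Map each endpoint label to True (up) or False (down)."""
    return {name: is_up(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_all(ENDPOINTS).items():
        print(f"{name:22} {'UP' if ok else 'DOWN'}")
```

This only confirms that each port answers HTTP requests. It does not replace `make ps`, which reports container status directly.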
This system is optimized for use as a Data Warehouse for Power BI. To connect your reports to the warehouse, follow these steps:
- Open Power BI Desktop.
- Go to Get Data > PostgreSQL Database.
- Enter the following connection details:
| Parameter | Value (Out-of-the-Box Defaults) |
|---|---|
| Server | localhost (Or your server IP) |
| Database | postgres |
| Authentication | Use the Database tab |
| User | postgres |
| Password | password |
| Port | 5432 |
- Once connected, you will see the tables generated from your processed CSV/XLSX files. You can now build relationships and visualizations directly in Power BI.
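If Power BI cannot connect, it can help to confirm that the database port is reachable before troubleshooting further. The sketch below is an illustration rather than part of this repository. It builds a standard PostgreSQL connection URI from the defaults in the table above and tests whether the port accepts TCP connections. The helper names `postgres_uri` and `port_is_open` are hypothetical.

```python
import socket
from urllib.parse import quote

# Out-of-the-box defaults from the table above.
DEFAULTS = {
    "host": "localhost",
    "port": 5432,
    "database": "postgres",
    "user": "postgres",
    "password": "password",
}


def postgres_uri(host: str, port: int, database: str, user: str, password: str) -> str:
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(postgres_uri(**DEFAULTS))
    reachable = port_is_open(DEFAULTS["host"], DEFAULTS["port"])
    print("database port reachable" if reachable else "database port NOT reachable")
```

A successful TCP connection shows only that something is listening on the port. It does not validate the user name or password.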
The system is built to run immediately "out of the box" using the default credentials listed above. These are placeholder values intended only for local use.
When preparing to deploy this to your organization's live environment, you must override these defaults directly within the docker-compose.yml file to secure the warehouse. You do not need to manage a separate .env file.
Open docker-compose.yml and replace the default values (postgres and password) in these three services:
- db: Update `POSTGRES_USER` and `POSTGRES_PASSWORD`.
- backend: Update `user` and `password` to match the new database credentials.
- celery_worker: Update `user` and `password` to match the new database credentials.
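A quick pre-deployment check can catch defaults that were missed in one of the three services. The sketch below is a heuristic and not part of this repository. It scans the text of docker-compose.yml line by line for the keys named above when they are still set to `postgres` or `password`. It assumes the credentials appear as simple `KEY: value` or `KEY=value` entries; the actual layout of the file may differ, and the function name `find_default_credentials` is hypothetical.

```python
import re
from pathlib import Path

# Keys named in the list above and the default values they ship with.
CREDENTIAL_KEYS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "user", "password")
DEFAULT_VALUES = ("postgres", "password")

# Matches "KEY: value", "KEY=value" and "- KEY=value", with optional quotes.
# Assumption: credentials are plain scalar entries, one per line.
_PATTERN = re.compile(
    r"^\s*(?:-\s*)?(?P<key>[A-Za-z_]+)\s*[:=]\s*[\"']?(?P<value>[^\"'\s#]+)"
)


def find_default_credentials(compose_text: str) -> list:
    """Return (line_number, line) pairs that still carry a default credential."""
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        match = _PATTERN.match(line)
        if not match:
            continue
        if match["key"] in CREDENTIAL_KEYS and match["value"] in DEFAULT_VALUES:
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    path = Path("docker-compose.yml")
    if path.exists():
        for number, line in find_default_credentials(path.read_text()):
            print(f"line {number}: {line}")
```

An empty result means only that no line matched this pattern. Credentials embedded in a connection string, for example, would not be detected.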
For developers wishing to contribute or modify the codebase, follow these steps.
Prepare your local machine by installing the necessary dependencies for both the frontend and backend:
make install
- Full Stack (Recommended): Starts everything in Docker.
make dev
- Hybrid Mode (Fastest iterations): Runs the infrastructure (DB/Queue/Storage) in Docker, but executes the API and Next.js processes directly on your host machine for optimized debugging and hot-reloading.
make dev-local
All backend logic is covered by a comprehensive pytest suite.
make test
This command checks that the infrastructure is healthy, then runs the tests inside the backend container to maintain environment parity.
| Command | Primary Use Case | Action |
|---|---|---|
| `make deploy` | Production Startup | Launches the full background stack and waits for health checks. |
| `make dev` | Local Development | Starts the full stack and the Next.js development server. |
| `make dev-local` | Active Coding | Runs the database, queue, and storage in Docker while the API and Next.js run on the host. |
| `make test` | Quality Assurance | Executes the full test suite in the backend container. |
| `make install` | Setup | Configures local npm and Python virtual environments. |
| `make ps` | Monitoring | Shows the uptime and status of all system containers. |
| `make down` | Shutdown | Stops and removes all containers, networks, and volumes. |