This repository was archived by the owner on Mar 12, 2026. It is now read-only.
This file was deleted.

4 changes: 4 additions & 0 deletions .gitignore
@@ -38,3 +38,7 @@ venv/

# Solutions (maintainer only)
**/solutions/

# Full validation scripts (contain solutions)
.github/skills/validate-devops-lab/aws/scripts/run-full-validation.sh
.github/skills/validate-devops-lab/azure/scripts/run-full-validation.sh
2 changes: 1 addition & 1 deletion README.md
@@ -25,7 +25,7 @@ You've just joined a startup as the DevOps engineer. The previous engineer left,
| Provider | Status | Guide |
|----------|--------|-------|
| Azure | ✅ Available | [azure/README.md](azure/README.md) |
-| AWS | 🚧 Coming soon | — |
| AWS | ✅ Available | [aws/README.md](aws/README.md) |
| GCP | 🚧 Coming soon | — |

## How It Works
171 changes: 162 additions & 9 deletions aws/README.md
@@ -1,14 +1,167 @@
# AWS DevOps Lab

-🚧 **Coming soon.**
Fix a broken DevOps pipeline deployed to AWS. Work through 7 incidents to get the application running.

-The AWS version of this lab will use equivalent services:
```
┌─────────────────────────────────────────────────────────────┐
│  AWS Resources                                              │
│                                                             │
│  ┌──────────┐  ┌──────────┐   ┌────────────────────────┐    │
│  │   VPC    │  │   ECR    │   │          EKS           │    │
│  │          │  │ (images) │──▶│   ┌─────┐  ┌───────┐   │    │
│  │  Subnet  │  │          │   │   │ App │──│ Redis │   │    │
│  │          │  └──────────┘   │   └─────┘  └───────┘   │    │
│  └──────────┘                 └────────────────────────┘    │
│                                                             │
│  ┌──────────────────┐  ┌───────────────────────────────┐    │
│  │    CloudWatch    │  │      Container Insights       │    │
│  │    Log Group     │  │           + Alarms            │    │
│  └──────────────────┘  └───────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```

-| Azure | AWS Equivalent |
-|-------|---------------|
-| ACR | ECR |
-| AKS | EKS |
-| Azure Monitor | CloudWatch |
-| VNet | VPC |
## Prerequisites

-Want to help build it? See our [Contributing Guide](../CONTRIBUTING.md).
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
- [Terraform](https://developer.hashicorp.com/terraform/install) (v1.0+)
- [Docker](https://docs.docker.com/get-docker/)
- [kubectl](https://kubernetes.io/docs/tasks/tools/)

Copilot AI Feb 13, 2026

aws/scripts/validate.sh uses Python with the yaml module to validate GitHub Actions workflows, but the AWS lab prerequisites don’t mention Python/PyYAML. Add Python 3 + PyYAML (or adjust validation to avoid requiring PyYAML) so students can actually reach 7/7 incident resolution and export a token.

Suggested change
- [kubectl](https://kubernetes.io/docs/tasks/tools/)
- [kubectl](https://kubernetes.io/docs/tasks/tools/)
- [Python 3](https://www.python.org/downloads/) (3.x)
- [PyYAML](https://pyyaml.org/wiki/PyYAMLDocumentation) for Python 3 (e.g., `pip install pyyaml`)

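The review comment above points out that `validate.sh` also needs Python 3 and PyYAML. Before running `setup.sh`, a short pre-flight check can confirm every tool is installed. The sketch below is hypothetical (`check_prereqs` is not part of the repo): it only looks for the CLIs listed under Prerequisites on your `PATH` and checks that the `yaml` module (PyYAML's import name) can be imported.

```python
import importlib.util
import shutil

# CLI tools listed in the AWS lab prerequisites.
REQUIRED_CLIS = ["aws", "terraform", "docker", "kubectl"]
# Python modules validate.sh relies on, per the review comment (PyYAML imports as "yaml").
REQUIRED_MODULES = ["yaml"]


def check_prereqs(clis=REQUIRED_CLIS, modules=REQUIRED_MODULES):
    """Return {name: bool} for each CLI found on PATH and each importable module."""
    status = {}
    for cli in clis:
        # shutil.which returns None when the executable is not on PATH.
        status[cli] = shutil.which(cli) is not None
    for mod in modules:
        # find_spec returns None when the top-level module cannot be found.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

Running it with `python3` prints one OK/MISSING line per tool.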

## Getting Started

1. Clone this repo and navigate to the AWS scripts:
```bash
git clone https://github.com/learntocloud/devops-lab
cd devops-lab/aws/scripts
```

2. Configure your AWS credentials:
```bash
aws configure
```

3. Run the setup script:
```bash
chmod +x *.sh
./setup.sh
```

**Cost**: ~$3-5/session. Destroy resources when done.

---

## Incident Queue

You're the new DevOps engineer. Seven incidents are waiting. Diagnose and fix each one.

---

### 🎫 INC-001: Container Image Won't Build

**Priority:** High
**Reported by:** Development Team
**Tools:** `docker` CLI

> "We can't build the app's Docker image. The `docker build` command fails immediately with errors. The Dockerfile is at `aws/docker/Dockerfile`. We need the image to build successfully and the container to start and respond on the correct port."

**What to fix:** `aws/docker/Dockerfile`
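After the image builds, you can confirm the container actually serves the app by polling the `/health` endpoint defined in `aws/app/app.py`. This is a hypothetical helper, not part of the lab; it assumes the container's port has been published on host port 8000 (the host port used in `docker-compose.yml`) and uses only the Python standard library.

```python
import http.client
import json
import time
import urllib.request


def wait_for_health(url, timeout=30.0, interval=1.0):
    """Poll `url` until it answers 2xx with {"status": "healthy"}.

    Returns the decoded JSON body, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            # urlopen raises HTTPError (an OSError subclass) for non-2xx responses.
            with urllib.request.urlopen(url, timeout=interval) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("status") == "healthy":
                return body
            last_error = f"unexpected body: {body!r}"
        except (OSError, http.client.HTTPException, ValueError) as exc:
            # Refused/reset connection, HTTP error, timeout, or a non-JSON reply:
            # the container may still be starting or listening on another port.
            last_error = exc
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s (last error: {last_error})")
```

For example, after `docker run --rm -p 8000:<container-port> <image>`, calling `wait_for_health("http://localhost:8000/health")` either returns the JSON body or raises `TimeoutError`, which usually means the process never started or is listening on a different port than the one you published.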

---

### 🎫 INC-002: Local Dev Environment Broken

**Priority:** High
**Reported by:** Development Team
**Tools:** `docker compose` CLI

> "Docker Compose won't bring up our local environment. The app can't connect to Redis, and the port mapping seems wrong. The compose file is at `aws/docker/docker-compose.yml`. We need both services (app + redis) to start and communicate."

**What to fix:** `aws/docker/docker-compose.yml`
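On the default network Docker Compose creates, each service is reachable from the others by its service name, so a hostname variable like `REDIS_HOST` has to match a service key (or an alias you define). The sketch below is a hypothetical checker that works on the compose file after it has been parsed into a dict (for example with PyYAML's `yaml.safe_load`). It flags unknown `depends_on` targets, undeclared networks, and host variables that don't name a service; it ignores aliases, `hostname:`, and external networks, so treat its output as hints.

```python
def env_to_dict(env):
    """Normalize a Compose `environment` entry (list of "KEY=VALUE" or a mapping)."""
    if not env:
        return {}
    if isinstance(env, dict):
        return {k: "" if v is None else str(v) for k, v in env.items()}
    result = {}
    for item in env:
        key, _, value = item.partition("=")
        result[key] = value
    return result


def compose_problems(compose, host_vars=("REDIS_HOST",)):
    """Return human-readable problems found in a parsed docker-compose mapping."""
    problems = []
    services = compose.get("services") or {}
    declared_networks = set((compose.get("networks") or {}).keys())
    for name, svc in services.items():
        svc = svc or {}
        # depends_on and networks may be lists or mappings; iterating yields names.
        for dep in svc.get("depends_on") or []:
            if dep not in services:
                problems.append(f"{name}: depends_on unknown service '{dep}'")
        for net in svc.get("networks") or []:
            if net != "default" and net not in declared_networks:
                problems.append(f"{name}: uses undeclared network '{net}'")
        env = env_to_dict(svc.get("environment"))
        for var in host_vars:
            host = env.get(var)
            if host and host not in services:
                problems.append(f"{name}: {var}={host} is not a service name")
    return problems
```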

---

### 🎫 INC-003: CI Pipeline is Broken

**Priority:** High
**Reported by:** Engineering Manager
**Tools:** GitHub Actions YAML reference

> "Our CI workflow has YAML errors and the steps are in the wrong order. Tests run before dependencies are installed, and some action versions look wrong. The workflow is at `aws/github-actions/ci.yml`."

**What to fix:** `aws/github-actions/ci.yml`
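Steps in a GitHub Actions job run top to bottom, so any step that imports project dependencies must come after the step that installs them. Here is a hypothetical ordering check over a job's `steps` list, assuming the workflow has already been parsed into Python objects (for example with PyYAML). It matches steps by substrings such as `pip install` and `pytest`, which is a heuristic rather than a real parser for shell commands.

```python
def step_index(steps, predicate):
    """Index of the first step for which predicate(step) is true, or None."""
    for i, step in enumerate(steps):
        if predicate(step):
            return i
    return None


def ci_order_problems(steps):
    """Flag obvious ordering mistakes in a list of GitHub Actions step mappings."""

    def run(step):
        # Shell snippet of a `run:` step; "" for `uses:` steps.
        return step.get("run") or ""

    checkout = step_index(steps, lambda s: (s.get("uses") or "").startswith("actions/checkout@"))
    install = step_index(steps, lambda s: "pip install" in run(s))
    tests = step_index(steps, lambda s: "pytest" in run(s))

    problems = []
    if checkout is None:
        problems.append("no actions/checkout step")
    elif install is not None and checkout > install:
        problems.append("checkout must run before installing dependencies")
    if tests is not None and (install is None or install > tests):
        problems.append("dependencies must be installed before tests run")
    return problems
```

If you load a workflow with PyYAML, note that it follows YAML 1.1, so the top-level `on:` key comes back as the boolean `True` rather than the string `"on"`.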

---

### 🎫 INC-004: Terraform Can't Provision Infrastructure

**Priority:** Critical
**Reported by:** Platform Team
**Tools:** `terraform` CLI, `aws` CLI

> "Terraform plan fails with multiple errors. There are typos in resource types, something is wrong with the IAM role policies, and the cluster networking configuration has conflicts. The config is at `aws/terraform/`. We need the VPC, ECR, EKS cluster, and monitoring log group to all deploy successfully."

**What to fix:** `aws/terraform/main.tf`, `aws/terraform/outputs.tf`
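The incident doesn't say which networking setting conflicts. Overlapping CIDR ranges are a common cause: subnets must sit inside the VPC's block and must not overlap each other, and an EKS cluster's service CIDR (`service_ipv4_cidr` in the cluster's `kubernetes_network_config` block) should generally not overlap the VPC. The sketch below is a hypothetical helper built on Python's standard `ipaddress` module; the CIDR values you feed it would come from `aws/terraform/main.tf`.

```python
import ipaddress
from itertools import combinations


def cidr_problems(vpc_cidr, subnet_cidrs, service_cidr=None):
    """Sanity-check a VPC / subnet / EKS service CIDR layout; returns a list of problems.

    subnet_cidrs maps a label (e.g. the Terraform resource name) to a CIDR string.
    ip_network() raises ValueError if a CIDR has host bits set (e.g. "10.0.1.5/24").
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(c) for name, c in subnet_cidrs.items()}
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"subnet {name} ({net}) is outside the VPC ({vpc})")
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"subnets {a} and {b} overlap")
    if service_cidr is not None:
        svc = ipaddress.ip_network(service_cidr)
        if svc.overlaps(vpc):
            problems.append(f"service CIDR {svc} overlaps the VPC ({vpc})")
    return problems
```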

---

### 🎫 INC-005: Deployment Pipeline Failing

**Priority:** High
**Reported by:** Release Team
**Tools:** GitHub Actions YAML reference, `aws` CLI

> "The CD pipeline can't deploy to EKS. The AWS credentials action is misconfigured, and the deployment steps aren't right. The workflow is at `aws/github-actions/cd.yml`."

**What to fix:** `aws/github-actions/cd.yml`

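**Note:** OIDC is the recommended approach: the workflow assumes an IAM role through GitHub's OpenID Connect provider instead of storing long-lived AWS access keys as secrets.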
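With OIDC, GitHub issues a short-lived token that the workflow exchanges for temporary credentials from an IAM role, and that role's trust policy decides which repositories may assume it. Below is a minimal sketch of building such a trust policy in Python, assuming the GitHub OIDC identity provider (`token.actions.githubusercontent.com`) has already been added to your AWS account; the account ID and repository in the example are placeholders. On the workflow side, the job also needs `permissions: id-token: write` and a `role-to-assume` input on `aws-actions/configure-aws-credentials`.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, ref="*"):
    """IAM trust policy letting GitHub Actions runs from `repo` ("owner/name") assume a role.

    `ref` narrows the token's `sub` claim, e.g. "ref:refs/heads/main" for one branch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Default audience requested by configure-aws-credentials.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Only tokens from this repository (and ref pattern) match.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:{ref}"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository; substitute your own.
    print(json.dumps(github_oidc_trust_policy("123456789012", "your-org/devops-lab"), indent=2))
```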

---

### 🎫 INC-006: Kubernetes Deployment Crashing

**Priority:** Critical
**Reported by:** SRE Team
**Tools:** `kubectl` CLI

> "Pods won't start in EKS. The deployments have wrong API versions, label selectors don't match between deployments and services, container ports are wrong, and the readiness probe is hitting an endpoint that doesn't exist. Manifests are in `aws/kubernetes/`."

**What to fix:** `aws/kubernetes/app-deployment.yaml`, `aws/kubernetes/app-service.yaml`, `aws/kubernetes/redis-deployment.yaml`, `aws/kubernetes/redis-service.yaml`
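A Service routes traffic to pods whose labels match its `selector`, and an `apps/v1` Deployment only manages pods whose template labels match its `spec.selector.matchLabels`. The hypothetical check below compares those fields, plus the Service's `targetPort` against the container ports the Deployment declares, on manifests parsed into dicts (for example with PyYAML). It handles only `matchLabels` and numeric ports; `matchExpressions` and named ports are skipped.

```python
def labels_match(selector, labels):
    """True if every key/value in a matchLabels-style selector appears in labels."""
    return all(labels.get(k) == v for k, v in (selector or {}).items())


def k8s_wiring_problems(deployment, service):
    """Check a parsed Deployment/Service pair for selector and port mismatches."""
    problems = []
    spec = deployment.get("spec") or {}
    template = spec.get("template") or {}
    pod_labels = (template.get("metadata") or {}).get("labels") or {}
    match_labels = (spec.get("selector") or {}).get("matchLabels")
    if not labels_match(match_labels, pod_labels):
        problems.append("Deployment selector.matchLabels does not match pod template labels")
    svc_spec = service.get("spec") or {}
    if not labels_match(svc_spec.get("selector"), pod_labels):
        problems.append("Service selector does not match pod template labels")
    declared = {
        p.get("containerPort")
        for c in (template.get("spec") or {}).get("containers") or []
        for p in c.get("ports") or []
    }
    for port in svc_spec.get("ports") or []:
        # targetPort defaults to port when omitted; string (named) ports are skipped.
        target = port.get("targetPort", port.get("port"))
        if isinstance(target, int) and target not in declared:
            problems.append(f"Service targetPort {target} not declared as a containerPort")
    return problems
```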

---

### 🎫 INC-007: Monitoring Not Working

**Priority:** Medium
**Reported by:** Observability Team
**Tools:** `aws` CLI

> "The pod restart alarm is disabled and should be enabled. We need Container Insights running on EKS, and our alarm configuration at `aws/monitoring/alerts.json` needs fixing. The alarm for pod restarts should be severity 2 (not 1), and it should evaluate every minute (not every 5 minutes)."

**What to fix:** `aws/monitoring/alerts.json`
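For orientation, here is how a restart alarm might map onto CloudWatch's own alarm settings: `Period` is in seconds, so "every minute" means `Period=60`, and `ActionsEnabled=False` leaves an alarm changing state without notifying anyone. CloudWatch metric alarms have no severity field, so the severity in `alerts.json` belongs to the lab's own schema, which this sketch does not model. The function is hypothetical; it returns keyword arguments in the shape boto3's `put_metric_alarm` accepts, and the Container Insights metric name and dimension set are assumptions to verify against the Container Insights metrics reference.

```python
def pod_restart_alarm_params(cluster, namespace, pod_name, threshold=1):
    """Hypothetical put_metric_alarm kwargs for a Container Insights pod-restart alarm."""
    return {
        "AlarmName": f"{cluster}-{pod_name}-restarts",  # naming scheme is made up
        "Namespace": "ContainerInsights",
        "MetricName": "pod_number_of_container_restarts",
        # Assumed dimension set; check it against the Container Insights metrics docs.
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "Namespace", "Value": namespace},
            {"Name": "PodName", "Value": pod_name},
        ],
        "Statistic": "Maximum",
        "Period": 60,            # seconds: evaluate once per minute
        "EvaluationPeriods": 1,  # alarm after a single breaching period
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "ActionsEnabled": True,  # False: state still changes, but no actions run
        "TreatMissingData": "notBreaching",
    }
```

With boto3 installed and credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**pod_restart_alarm_params(...))` would create or update the alarm.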

---

## Verify Your Fixes

Check incident status anytime:

```bash
cd aws/scripts
./validate.sh
```

Generate your completion token after all incidents are resolved:

```bash
./validate.sh export
```

## Clean Up

**Always destroy resources when done to avoid charges:**

```bash
cd aws/scripts
./destroy.sh
```
39 changes: 39 additions & 0 deletions aws/app/app.py
@@ -0,0 +1,39 @@
from fastapi import FastAPI
import redis
import os

app = FastAPI(title="DevOps Lab App")

REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))


def get_redis():
    try:
        r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, decode_responses=True)
        r.ping()
        return r
    except redis.ConnectionError:
        return None


@app.get("/health")
def health():
    r = get_redis()
    redis_status = "connected" if r else "disconnected"
    return {"status": "healthy", "redis": redis_status}


@app.get("/api/status")
def status():
    r = get_redis()
    if r:
        visits = r.incr("visits")
    else:
        visits = -1
    return {
        "app": "devops-lab",
        "version": "1.0.0",
        "visits": visits,
        "redis": "connected" if r else "disconnected",
    }
3 changes: 3 additions & 0 deletions aws/app/requirements.txt
@@ -0,0 +1,3 @@
fastapi==0.115.0
uvicorn==0.30.0
redis==5.0.0
19 changes: 19 additions & 0 deletions aws/app/tests/test_app.py
@@ -0,0 +1,19 @@
from fastapi.testclient import TestClient
from app import app

client = TestClient(app)


def test_health():
    response = client.get("/health")
    assert response.status_code == 200
    data = response.json()
    assert data["status"] == "healthy"


def test_status():
    response = client.get("/api/status")
    assert response.status_code == 200
    data = response.json()
    assert data["app"] == "devops-lab"
    assert data["version"] == "1.0.0"
14 changes: 14 additions & 0 deletions aws/docker/Dockerfile
@@ -0,0 +1,14 @@
# Dockerfile for DevOps Lab App
FROM python:3.11-slm

WORKDIR /src

COPY app/requirements.txt .

RUN pip install -r requirements.txt

COPY app/ .

EXPOSE 5000

CMD ["python", "app.py"]
26 changes: 26 additions & 0 deletions aws/docker/docker-compose.yml
@@ -0,0 +1,26 @@
version: "3.8"

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:5000"
    environment:
      - REDIS_HOST=cache
      - REDIS_PORT=6379
    depends_on:
      - cache
    networks:
      - backend

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

volumes:
  redis_data:
55 changes: 55 additions & 0 deletions aws/github-actions/cd.yml
@@ -0,0 +1,55 @@
name: CD Pipeline

on:
  workflow_dispatch:
  push:
    branches: [main]

env:
  AWS_REGION: ${{ secrets.AWS_REGION }}
  ECR_REPO: ${{ secrets.ECR_REPO }}
  EKS_CLUSTER: ${{ secrets.EKS_CLUSTER_NAME }}

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          credentials: ${{ secrets.AWS_CREDENTIALS }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to ECR
        run: |
          aws ecr get-login-password --region ${{ env.AWS_REGION }} | \
            docker login --username AWS --password-stdin ${{ env.ECR_REPO }}
Comment on lines +8 to +29

Copilot AI Feb 13, 2026

aws/scripts/validate.sh marks INC-005 as resolved when cd.yml contains configure-aws-credentials@v4, no credentials: key, aws-access-key-id, and kubectl. This cd.yml already satisfies those checks, so students will see INC-005 resolved immediately, which conflicts with the lab’s incident queue/intent. Either introduce an intentional failure in this workflow (that validate_inc_005 detects) or tighten validate_inc_005 so the initial file is correctly treated as broken.


      - name: Build and push image
        run: |
          docker build -f aws/docker/Dockerfile -t ${{ env.ECR_REPO }}:latest .
          docker push ${{ env.ECR_REPO }}:latest

  deploy-to-eks:
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          credentials: ${{ secrets.AWS_CREDENTIALS }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Get EKS credentials
        run: |
          aws eks update-kubeconfig --region ${{ env.AWS_REGION }} --name ${{ env.EKS_CLUSTER }}

      - name: Deploy to EKS
        run: |
          kubectl set image deployment/devops-lab-app app=${{ env.ECR_REPO }}:latest -n devops-lab
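The review comment on this file argues that the INC-005 check is too loose. As a hypothetical example of a stricter rule (not the lab's actual `validate_inc_005`), the sketch below inspects a parsed workflow and reports any job whose `configure-aws-credentials` step doesn't use OIDC: a missing `role-to-assume`, static credential inputs, or permissions that don't allow the job to request an ID token.

```python
CREDS_ACTION = "aws-actions/configure-aws-credentials@"
STATIC_INPUTS = ("credentials", "aws-access-key-id", "aws-secret-access-key")


def can_request_id_token(perms):
    """True if a `permissions` value lets the job request an OIDC ID token."""
    if perms == "write-all":
        return True
    return isinstance(perms, dict) and perms.get("id-token") == "write"


def oidc_problems(workflow):
    """List jobs whose configure-aws-credentials steps are not set up for OIDC."""
    problems = []
    top_perms = workflow.get("permissions")
    for job_name, job in (workflow.get("jobs") or {}).items():
        # Job-level permissions replace (not merge with) workflow-level ones.
        perms = job.get("permissions", top_perms)
        for step in job.get("steps") or []:
            if not str(step.get("uses", "")).startswith(CREDS_ACTION):
                continue
            inputs = step.get("with") or {}
            if "role-to-assume" not in inputs:
                problems.append(f"{job_name}: no role-to-assume (not using OIDC)")
            for key in STATIC_INPUTS:
                if key in inputs:
                    problems.append(f"{job_name}: uses '{key}' instead of OIDC")
            if not can_request_id_token(perms):
                problems.append(f"{job_name}: permissions do not grant id-token: write")
    return problems
```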
33 changes: 33 additions & 0 deletions aws/github-actions/ci.yml
@@ -0,0 +1,33 @@
name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v99

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Run tests
        working-directory: ./aws/app
        run: |
          python -m pytest tests/ -v

      - name: Install dependencies
        working-directory: ./aws/app
        run: |
          pip install -r requirements.txt

      - name: Build Docker image
        run: |
          docker build -f aws/docker/Dockerfile -t devops-lab-app .