Admin Ops & Metrics

The admin ops system provides operational visibility through health checks, queue monitoring, latency metrics, and build information.

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  /admin/ops     │────▶│  AdminOpsService │────▶│  Queue Snapshot │
│  /health        │     │                  │     │  Prometheus     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                               │
                               ▼
                        ┌──────────────────┐
                        │  Health          │
                        │  Indicators      │
                        └──────────────────┘

Sources:

apps/teetime/teetime-backend/src/admin/admin-ops.controller.ts
apps/teetime/teetime-backend/src/app/health/health.controller.ts

Ops Status

Endpoint

GET /admin/ops/status

// Response
{
  "queues": [
    { "name": "tee-time-queue", "waiting": 5, "active": 2, "delayed": 0, "failed": 1 }
  ],
  "apiLatencyMsP50": 45,
  "apiLatencyMsP95": 120,
  "replicaLagSeconds": 0.5,
  "timestamp": "2025-12-15T08:00:00Z"
}

Queue Metrics

Metric	Description
`waiting`	Jobs waiting to be processed
`active`	Jobs currently processing
`delayed`	Jobs scheduled to run at a future time
`failed`	Failed jobs
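As a rough illustration of how a client might consume this response, the sketch below types the payload from the example above and flags queues whose counts exceed a limit. The `QueueThresholds` type, the `findQueueWarnings` helper, and the limit values are hypothetical and not part of the service; whether the latency fields can be null when Prometheus is unavailable is not stated here, so they are typed as plain numbers.

```typescript
// Shape of the GET /admin/ops/status response, taken from the example above.
interface QueueSnapshot {
  name: string;
  waiting: number;
  active: number;
  delayed: number;
  failed: number;
}

interface OpsStatus {
  queues: QueueSnapshot[];
  apiLatencyMsP50: number;
  apiLatencyMsP95: number;
  replicaLagSeconds: number;
  timestamp: string;
}

// Hypothetical alerting limits; tune them for your own workload.
interface QueueThresholds {
  maxWaiting: number;
  maxFailed: number;
}

// Returns one human-readable warning per exceeded limit.
function findQueueWarnings(status: OpsStatus, limits: QueueThresholds): string[] {
  const warnings: string[] = [];
  for (const q of status.queues) {
    if (q.waiting > limits.maxWaiting) {
      warnings.push(`${q.name}: ${q.waiting} waiting (limit ${limits.maxWaiting})`);
    }
    if (q.failed > limits.maxFailed) {
      warnings.push(`${q.name}: ${q.failed} failed (limit ${limits.maxFailed})`);
    }
  }
  return warnings;
}

// The sample payload from this page.
const sampleStatus: OpsStatus = {
  queues: [{ name: "tee-time-queue", waiting: 5, active: 2, delayed: 0, failed: 1 }],
  apiLatencyMsP50: 45,
  apiLatencyMsP95: 120,
  replicaLagSeconds: 0.5,
  timestamp: "2025-12-15T08:00:00Z",
};
```

Calling `findQueueWarnings(sampleStatus, { maxWaiting: 3, maxFailed: 5 })` would report only the waiting backlog, since the single failed job is under the limit.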

Latency Metrics

Fetched from Prometheus:

# P50 latency
histogram_quantile(0.5, sum(rate(http_server_duration_seconds_bucket[5m])) by (le))

# P95 latency
histogram_quantile(0.95, sum(rate(http_server_duration_seconds_bucket[5m])) by (le))

# Replica lag
pg_last_wal_receive_lsn_lag_seconds
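The sketch below shows one way such a query could be sent to Prometheus and its result converted to milliseconds. It relies on Prometheus' standard instant-query endpoint (`/api/v1/query`); the helper names are hypothetical, and the seconds-to-milliseconds conversion is an assumption inferred from the metric name (`..._seconds_bucket`) and the `apiLatencyMs*` response fields, not a description of how `AdminOpsService` is actually written.

```typescript
// Builds the URL for Prometheus' instant-query endpoint.
function buildInstantQueryUrl(baseUrl: string, promql: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmed}/api/v1/query?query=${encodeURIComponent(promql)}`;
}

// Subset of the instant-query response for a vector result.
interface PromVectorResponse {
  status: "success" | "error";
  data?: {
    resultType: string;
    // Each sample value is [unixTimestamp, valueAsString].
    result: Array<{ metric: Record<string, string>; value: [number, string] }>;
  };
}

// Extracts the first sample as a number, or null when there is no usable data.
// histogram_quantile yields "NaN" when no requests were observed, so
// non-finite values are treated as missing.
function firstSampleValue(body: PromVectorResponse): number | null {
  if (body.status !== "success" || !body.data) return null;
  const sample = body.data.result[0];
  if (!sample) return null;
  const value = Number(sample.value[1]);
  return Number.isFinite(value) ? value : null;
}

// Assumption: the histogram is recorded in seconds and reported in milliseconds.
function secondsToMs(seconds: number): number {
  return Math.round(seconds * 1000);
}
```

A caller would fetch `buildInstantQueryUrl(PROMETHEUS_BASE_URL, query)`, parse the JSON body, and pass it through `firstSampleValue` before converting.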

Build Info

Endpoint

GET /admin/build-info

// Response
{
  "version": "2.1.0",
  "commitSha": "abc123def456",
  "environment": "production",
  "builtAt": "2025-01-10T14:30:00Z"
}

Configuration Priority

Environment variables (highest)
Build info file (BUILD_INFO_FILE)
Defaults (lowest)

Env Var	Field
`BUILD_VERSION`	version
`BUILD_COMMIT_SHA`	commitSha
`NODE_ENV`	environment
`BUILD_DATE`	builtAt
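The priority order above can be sketched as a simple merge. This is a minimal illustration, assuming the build info file has already been parsed into an object with the same field names; the `DEFAULTS` values and the `resolveBuildInfo` helper are hypothetical, since the real fallback values are not documented here.

```typescript
interface BuildInfo {
  version: string;
  commitSha: string;
  environment: string;
  builtAt: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical defaults; the service's actual fallback values may differ.
const DEFAULTS: BuildInfo = {
  version: "0.0.0",
  commitSha: "unknown",
  environment: "development",
  builtAt: "unknown",
};

// Environment variables win, then the build info file, then defaults.
// Empty strings are treated as unset.
function resolveBuildInfo(env: Env, fileInfo: Partial<BuildInfo> = {}): BuildInfo {
  const pick = (envValue: string | undefined, key: keyof BuildInfo): string =>
    envValue || fileInfo[key] || DEFAULTS[key];
  return {
    version: pick(env["BUILD_VERSION"], "version"),
    commitSha: pick(env["BUILD_COMMIT_SHA"], "commitSha"),
    environment: pick(env["NODE_ENV"], "environment"),
    builtAt: pick(env["BUILD_DATE"], "builtAt"),
  };
}
```

For example, with `BUILD_VERSION=2.1.0` in the environment and a file that supplies `version` and `commitSha`, the environment's version is used, the file's commit SHA is used, and the remaining fields fall back to defaults.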

Health Checks

Endpoints

Endpoint	Purpose
`GET /health`	Aggregate readiness
`GET /health/readiness`	Full readiness check
`GET /health/liveness`	Lightweight alive check
`GET /health/providers`	External provider health
`GET /health/config`	Environment validation
`GET /health/external`	External services (S3)
`GET /health/panels`	All panels combined

Health Indicators

Indicator	What It Checks
`StorageHealthIndicator`	Storage database ping
`TeeSheetDbHealthIndicator`	Tee-sheet database (5s timeout)
`QueueHealthIndicator`	Redis PING/PONG
`RedisReadyIndicator`	Redis cache connectivity
`McaHealthIndicator`	MCA API endpoint
`ProviderHealthIndicator`	GolfNow, Lightspeed, ForeUp
`S3HealthIndicator`	S3 bucket HeadBucket
`EnvHealthIndicator`	Required env vars

Response Format

{
  "status": "ok",
  "details": [
    { "storage": { "status": "up", "latencyMs": 42 } },
    { "teeSheetDb": { "status": "up", "latencyMs": 55 } },
    { "queue": { "status": "up", "latencyMs": 23 } }
  ]
}
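Because `details` is an array of single-key objects, a consumer has to flatten it before inspecting individual indicators. The sketch below shows one way to do that; the `failingIndicators` helper is hypothetical, and treating any status other than "up" as failing is an assumption, since only the "up" value appears in the example.

```typescript
interface IndicatorResult {
  status: string;
  latencyMs?: number;
}

interface HealthResponse {
  status: string;
  details: Array<Record<string, IndicatorResult>>;
}

// Returns the names of all indicators whose status is not "up".
function failingIndicators(health: HealthResponse): string[] {
  const failing: string[] = [];
  for (const entry of health.details) {
    for (const [name, result] of Object.entries(entry)) {
      if (result.status !== "up") failing.push(name);
    }
  }
  return failing;
}
```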

Prometheus Metrics

Endpoint

GET /metrics
// Returns metrics in the Prometheus text exposition format

Infrastructure Gauges

Metric	Description
`storage_up`	Storage DB connectivity (1 = up, 0 = down)
`tee_sheet_db_up`	Tee-sheet DB connectivity
`queue_up`	Queue/Redis connectivity
`service_ready`	Overall service readiness
`provider_up`	Provider availability
`s3_up`	S3 bucket connectivity
`weather_up`	Weather service availability
`geocoding_up`	Geocoding service availability

Queue Gauges

queue_jobs{queue="tee-time-queue", status="waiting"} 5
queue_jobs{queue="tee-time-queue", status="active"} 2
queue_jobs{queue="tee-time-queue", status="delayed"} 0
queue_jobs{queue="tee-time-queue", status="failed"} 1
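To make the mapping from a queue snapshot to these gauge lines concrete, here is a small sketch that renders the same output by hand. It is illustrative only: the service presumably uses a Prometheus client library rather than string building, and the `renderQueueGauges` helper is hypothetical. Label values are escaped as the text exposition format requires (backslash, double quote, newline).

```typescript
interface QueueCounts {
  name: string;
  waiting: number;
  active: number;
  delayed: number;
  failed: number;
}

// Escapes a label value for the Prometheus text exposition format.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Emits one queue_jobs line per queue and status, matching the sample above.
function renderQueueGauges(queues: QueueCounts[]): string {
  const statuses = ["waiting", "active", "delayed", "failed"] as const;
  const lines: string[] = [];
  for (const q of queues) {
    for (const status of statuses) {
      const queueLabel = escapeLabelValue(q.name);
      lines.push(`queue_jobs{queue="${queueLabel}", status="${status}"} ${q[status]}`);
    }
  }
  return lines.join("\n");
}
```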

Metrics Refresh

Background services refresh metrics periodically:

Service	Interval	Config
`StorageMetricsService`	60s	`METRICS_PING_INTERVAL_MS`
`QueueMetricsService`	60s	`METRICS_QUEUE_INTERVAL_MS`
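A refresh interval read from the environment usually needs defensive parsing. The sketch below assumes the two variables hold a millisecond count (as their `_MS` suffix suggests) and that the 60s from the table is the fallback when the variable is unset or invalid; both points are assumptions, and `parseIntervalMs` is a hypothetical helper.

```typescript
// 60s, the interval listed in the table above.
const DEFAULT_INTERVAL_MS = 60000;

// Parses an interval in milliseconds, falling back when the value is
// missing, non-numeric, or less than one millisecond.
function parseIntervalMs(raw: string | undefined, fallbackMs: number = DEFAULT_INTERVAL_MS): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 1 ? Math.floor(parsed) : fallbackMs;
}
```

A background service could then pass `parseIntervalMs(env.METRICS_QUEUE_INTERVAL_MS)` to its timer so that a typo in the configuration does not result in a zero or negative interval.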

Environment Validation

The config health check validates:

Required:

TEETIME_DATABASE_URL

Optional Groups (consistency rules):

Group	Variables	Rule
Redis	`TEE_CACHE_REDIS_URL`	—
OpenMeteo	`OPENMETEO_BASE_URL`, `OPENMETEO_APIKEY`	Both required if either set
Geocoding	`GEOCODING_BASE_URL`, `OPENMETEO_APIKEY`	Both required if either set
Twilio	`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`	Both required if either set
Booking Policy	`BOOKING_ADVANCE_WINDOW_DAYS`	Must be numeric
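The rules in this table can be expressed as a small validator. The sketch below applies them literally as listed; it is not the `EnvHealthIndicator` implementation, and the `validateEnv` helper and its error messages are hypothetical. Note that, read literally, the OpenMeteo and Geocoding groups share `OPENMETEO_APIKEY`, so setting that key would require both base URLs; the real check may treat this differently.

```typescript
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = ["TEETIME_DATABASE_URL"];

// Pairs copied from the table above: both must be set if either is.
const PAIRED_GROUPS: Array<{ group: string; vars: [string, string] }> = [
  { group: "OpenMeteo", vars: ["OPENMETEO_BASE_URL", "OPENMETEO_APIKEY"] },
  { group: "Geocoding", vars: ["GEOCODING_BASE_URL", "OPENMETEO_APIKEY"] },
  { group: "Twilio", vars: ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN"] },
];

// Returns a list of validation errors; an empty list means the config is consistent.
function validateEnv(env: Env): string[] {
  const isSet = (key: string): boolean => (env[key] ?? "").trim() !== "";
  const errors: string[] = [];

  for (const key of REQUIRED_VARS) {
    if (!isSet(key)) errors.push(`${key} is required`);
  }
  for (const { group, vars: [a, b] } of PAIRED_GROUPS) {
    if (isSet(a) !== isSet(b)) errors.push(`${group}: ${a} and ${b} must be set together`);
  }
  if (isSet("BOOKING_ADVANCE_WINDOW_DAYS") && !Number.isFinite(Number(env["BOOKING_ADVANCE_WINDOW_DAYS"]))) {
    errors.push("BOOKING_ADVANCE_WINDOW_DAYS must be numeric");
  }
  return errors;
}
```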

Configuration

Prometheus

Env Var	Description
`PROMETHEUS_BASE_URL`	Prometheus server URL
`API_LATENCY_P50_QUERY`	Custom P50 query
`API_LATENCY_P95_QUERY`	Custom P95 query
`REPLICA_LAG_QUERY`	Custom replica lag query

Queue Monitoring

Env Var	Description
`OPS_STATUS_QUEUE_NAMES`	Comma-separated queue names
`QUEUE_NAMES`	Fallback queue names
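The fallback between these two variables might look like the following. This is a sketch under two assumptions: that `QUEUE_NAMES` is also comma-separated, and that an empty `OPS_STATUS_QUEUE_NAMES` counts as unset. The `resolveQueueNames` helper is hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Prefers OPS_STATUS_QUEUE_NAMES, falls back to QUEUE_NAMES, and returns
// a trimmed list with empty entries removed.
function resolveQueueNames(env: Env): string[] {
  const raw = env["OPS_STATUS_QUEUE_NAMES"] || env["QUEUE_NAMES"] || "";
  return raw
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}
```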

Health Checks

Env Var	Description
`MCA_PING_ENDPOINTS`	Additional MCA endpoints to check
`STORAGE_BUCKET`	S3 bucket name
`S3_REGION`	AWS region

Authentication

All admin ops endpoints require:

JwtAuthGuard
AudienceGuard('teetime-admin')

Health and metrics endpoints are typically left public (no guards) so that monitoring systems can poll them.
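The internals of `AudienceGuard` are not shown here, but an audience check generally comes down to comparing the token's `aud` claim with the expected value. The sketch below illustrates that comparison only; `hasAudience` is a hypothetical helper, and signature verification (the job of `JwtAuthGuard`) is assumed to have happened already.

```typescript
// Subset of JWT claims relevant here. Per RFC 7519, "aud" may be a single
// string or an array of strings.
interface JwtClaims {
  aud?: string | string[];
  [claim: string]: unknown;
}

// True when the expected audience appears in the token's "aud" claim.
function hasAudience(claims: JwtClaims, expected: string): boolean {
  const aud = claims.aud;
  if (aud === undefined) return false;
  return Array.isArray(aud) ? aud.includes(expected) : aud === expected;
}
```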

Tenant Resolution

The ops service resolves tenant ID from multiple sources:

User's tenantIds array
JWT claims
x-tenant-id header
Custom claim keys

GET /admin/tenant/profile

// Response
{
  "tenantId": "tenant-123",
  "displayName": "Golf Club Inc",
  "description": "Premium golf management"
}
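A possible shape for the tenant lookup is sketched below. It tries the four sources in the order they are listed above, but that precedence is an assumption, as are the `tenantId` claim name, the `TenantContext` type, and the `resolveTenantId` helper. Header names are expected in lower case, as Node.js delivers them.

```typescript
interface TenantContext {
  user?: { tenantIds?: string[] };
  claims?: Record<string, unknown>;
  headers?: Record<string, string | undefined>;
}

// Reads a non-empty string claim, or undefined when absent or of another type.
function stringClaim(claims: Record<string, unknown>, key: string): string | undefined {
  const value = claims[key];
  return typeof value === "string" && value !== "" ? value : undefined;
}

// Tries each source in turn and returns the first tenant ID found.
function resolveTenantId(ctx: TenantContext, customClaimKeys: string[] = []): string | undefined {
  const fromUser = ctx.user?.tenantIds?.[0];
  if (fromUser) return fromUser;

  const claims = ctx.claims ?? {};
  const fromClaim = stringClaim(claims, "tenantId"); // hypothetical claim name
  if (fromClaim) return fromClaim;

  const fromHeader = ctx.headers?.["x-tenant-id"];
  if (fromHeader) return fromHeader;

  for (const key of customClaimKeys) {
    const fromCustom = stringClaim(claims, key);
    if (fromCustom) return fromCustom;
  }
  return undefined;
}
```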

Troubleshooting

Health Check Failing

Check database connectivity
Verify Redis is running
Check external service URLs
Review timeout settings (default: 5s for DB)

Metrics Not Updating

Verify PROMETHEUS_BASE_URL is set
Check Prometheus is accessible
Review query syntax
Check metrics refresh interval

Queue Depth High

Check worker processes are running
Review failed job logs
Check Redis memory usage
Consider scaling workers
