Monitoring

kombify TechStack provides comprehensive monitoring for your infrastructure through built-in metrics, health checks, and integrations with popular observability tools.

Monitoring 2.0: TechStack now includes an embedded Prometheus TSDB with a PromQL-compatible API, enabling built-in metric storage and querying without external Prometheus. External Prometheus integration remains supported.

Built-in health checks

API health endpoint

curl https://api.kombify.io/v1/orchestrator/health

Self-hosted: Replace the base URL with http://localhost:5260/api/v1 when running TechStack locally.

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": "72h15m30s",
  "checks": {
    "database": "ok",
    "grpc": "ok",
    "workers": "ok"
  }
}
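A script that consumes this response usually needs to know which sub-check failed, not just the top-level status. The following is a minimal sketch under stated assumptions: it works on the sample payload shown above and assumes that `"ok"` is the only passing value for an entry in `checks`, which the source does not confirm. The helper names `failing_checks` and `is_healthy` are hypothetical.

```python
import json

# Sample payload copied from the response shown above.
SAMPLE = """{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": "72h15m30s",
  "checks": {"database": "ok", "grpc": "ok", "workers": "ok"}
}"""


def failing_checks(payload: dict) -> list[str]:
    """Return the names of sub-checks whose value is not "ok".

    Assumption: "ok" is the only passing value for a sub-check.
    """
    checks = payload.get("checks", {})
    return sorted(name for name, state in checks.items() if state != "ok")


def is_healthy(payload: dict) -> bool:
    """True when the top-level status is "healthy" and no sub-check fails."""
    return payload.get("status") == "healthy" and not failing_checks(payload)


payload = json.loads(SAMPLE)
```

Calling `is_healthy(payload)` on the sample returns `True`. In practice the payload would come from the body of the health endpoint response instead of `SAMPLE`.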

Health probes

TechStack exposes standard health probes (compatible with Kubernetes, Docker healthchecks, and load balancers):

Endpoint	Purpose	Question it answers
`/health/live`	Liveness probe	Is the process running?
`/health/ready`	Readiness probe	Can it handle traffic?
`/health/startup`	Startup probe	Has it started successfully?

# Docker Compose healthcheck example
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:5260/health/ready"]
  interval: 30s
  timeout: 10s
  retries: 3
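Outside Docker, the same retry behaviour can be approximated in a deployment script. This is a rough sketch, not an official client: it polls the readiness URL from the healthcheck above using only the Python standard library, and it treats any 2xx response as "ready", which is an assumption. The names `probe` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# URL taken from the Docker Compose healthcheck example above.
READY_URL = "http://localhost:5260/health/ready"


def probe(url: str = READY_URL, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with a 2xx status (assumed to mean ready)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(
    check: Callable[[], bool] = probe,
    retries: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

The `check` and `sleep` parameters are injectable so the loop can be exercised without a running instance; `wait_until_ready()` with no arguments uses the real probe and the same interval and retry count as the Compose example.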

Prometheus metrics

TechStack exposes metrics in Prometheus format at /metrics:

curl http://localhost:5260/metrics

Key metrics

Metric	Type	Description
`kombistack_api_requests_total`	Counter	Total API requests
`kombistack_api_request_duration_seconds`	Histogram	Request latency
`kombistack_workers_connected`	Gauge	Connected agent count
`kombistack_workers_healthy`	Gauge	Healthy agent count
`kombistack_jobs_total`	Counter	Jobs by type and status
`kombistack_jobs_duration_seconds`	Histogram	Job execution time
`kombistack_stacks_total`	Gauge	Total stacks managed
`kombistack_drift_detected_total`	Counter	Drift detections
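For quick checks without a Prometheus server, the text returned by /metrics can be read directly. The sketch below parses a small, hypothetical sample in the Prometheus text exposition format; the sample values and the `type` and `status` label names on `kombistack_jobs_total` are assumptions for illustration. The parser is deliberately simplified and ignores the optional trailing timestamp that the format allows.

```python
# Hypothetical /metrics excerpt; values and label names are illustrative only.
SAMPLE = """\
# HELP kombistack_workers_connected Connected agent count
# TYPE kombistack_workers_connected gauge
kombistack_workers_connected 4
kombistack_workers_healthy 3
kombistack_stacks_total 7
kombistack_jobs_total{type="provision",status="success"} 120
kombistack_jobs_total{type="provision",status="failed"} 5
"""


def parse_samples(text: str) -> list[tuple[str, str, float]]:
    """Split exposition text into (metric name, raw label string, value) tuples.

    Simplification: assumes each sample line ends with its value (no timestamp).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        samples.append((name, labels.rstrip("}"), float(value)))
    return samples


def unhealthy_workers(samples: list[tuple[str, str, float]]) -> float:
    """Connected workers minus healthy workers, using the unlabelled gauges."""
    values = {name: value for name, labels, value in samples if not labels}
    return values["kombistack_workers_connected"] - values["kombistack_workers_healthy"]
```

With the sample above, `unhealthy_workers(parse_samples(SAMPLE))` evaluates to `1.0`. For anything beyond ad hoc inspection, a maintained Prometheus client parser is the safer choice.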

Prometheus configuration

prometheus.yml

scrape_configs:
  - job_name: 'kombistack'
    static_configs:
      - targets: ['localhost:5260']
    metrics_path: /metrics
    scheme: http

Worker (node) monitoring

Each connected worker (agent) reports metrics about its node:

# Get worker status
curl https://api.kombify.io/v1/orchestrator/workers

Response:

{
  "workers": [
    {
      "id": "worker_abc123",
      "name": "main-server",
      "status": "healthy",
      "last_heartbeat": "2026-02-03T10:30:00Z",
      "metrics": {
        "cpu_percent": 25.5,
        "memory_percent": 45.2,
        "disk_percent": 62.0,
        "containers_running": 12
      }
    }
  ]
}
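The worker list can be turned into a short "needs attention" report. The following sketch assumes the response shape shown above; the two-minute heartbeat tolerance and the 90 percent resource limit are hypothetical thresholds chosen for illustration, not documented defaults.

```python
from datetime import datetime, timedelta, timezone

# Payload copied from the sample response above.
SAMPLE = {
    "workers": [
        {
            "id": "worker_abc123",
            "name": "main-server",
            "status": "healthy",
            "last_heartbeat": "2026-02-03T10:30:00Z",
            "metrics": {
                "cpu_percent": 25.5,
                "memory_percent": 45.2,
                "disk_percent": 62.0,
                "containers_running": 12,
            },
        }
    ]
}


def parse_heartbeat(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp; the "Z" swap keeps Python < 3.11 working."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def workers_needing_attention(
    payload: dict,
    now: datetime,
    max_silence: timedelta = timedelta(minutes=2),  # hypothetical tolerance
    limit: float = 90.0,  # hypothetical usage limit in percent
) -> dict[str, list[str]]:
    """Map worker name to the reasons it was flagged; unflagged workers are omitted."""
    flagged: dict[str, list[str]] = {}
    for worker in payload.get("workers", []):
        reasons = []
        if now - parse_heartbeat(worker["last_heartbeat"]) > max_silence:
            reasons.append("stale heartbeat")
        for key in ("cpu_percent", "memory_percent", "disk_percent"):
            if worker.get("metrics", {}).get(key, 0.0) >= limit:
                reasons.append(f"{key} >= {limit}")
        if reasons:
            flagged[worker["name"]] = reasons
    return flagged
```

Pass `datetime.now(timezone.utc)` as `now` when running against live data; taking the clock as a parameter keeps the function deterministic.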

Worker health states

State	Description
`healthy`	All checks passing
`degraded`	Some non-critical issues
`unhealthy`	Critical issues detected
`unreachable`	No heartbeat received
`pending`	Awaiting approval

Grafana dashboards

Import our pre-built Grafana dashboards:

Stack Overview

Dashboard ID: kombistack-overview

Shows:

API request rates
Error rates
Active workers
Job queue status

Workers

Dashboard ID: kombistack-workers

Shows:

CPU/memory per worker
Container counts
Network traffic
Heartbeat status

Jobs

Dashboard ID: kombistack-jobs

Shows:

Job success/failure rates
Duration by type
Queue depth
Recent failures

Alerting

Example alert rules

alerting-rules.yml

groups:
  - name: kombistack
    rules:
      - alert: WorkerUnhealthy
        expr: kombistack_workers_healthy < kombistack_workers_connected
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Worker unhealthy"
          
      - alert: HighErrorRate
        expr: rate(kombistack_api_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High API error rate"
          
      - alert: DriftDetected
        expr: increase(kombistack_drift_detected_total[1h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Configuration drift detected"
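To make the HighErrorRate threshold concrete, the arithmetic behind `rate(...[5m]) > 0.1` can be worked through by hand. This is a simplified illustration only: it computes the average per-second increase between two counter samples, whereas PromQL's `rate()` also handles counter resets and extrapolates to the window boundaries. The function names and the sample numbers are hypothetical.

```python
def per_second_rate(first: tuple[float, float], last: tuple[float, float]) -> float:
    """Average per-second increase between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def high_error_rate(
    first: tuple[float, float],
    last: tuple[float, float],
    threshold: float = 0.1,  # same threshold as the HighErrorRate rule
) -> bool:
    """True when the 5xx counter grows faster than `threshold` requests per second."""
    return per_second_rate(first, last) > threshold


# Hypothetical numbers: the 5xx counter grows from 100 to 145 over 300 seconds,
# i.e. 45 errors in 5 minutes, or 0.15 errors per second.
example = high_error_rate((0.0, 100.0), (300.0, 145.0))
```

Here `example` is `True` because 0.15 exceeds 0.1. Note that the rule compares an absolute error rate in requests per second, not a share of total traffic, so a busy API reaches the threshold sooner than a quiet one.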

Log aggregation

TechStack outputs structured JSON logs that can be collected by any log aggregator:

{
  "time": "2026-02-03T10:30:00Z",
  "level": "INFO",
  "msg": "Job completed",
  "job_id": "job_abc123",
  "job_type": "provision",
  "duration_ms": 45230,
  "worker_id": "worker_xyz789"
}
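Because each log entry is a JSON object, simple filters can be written without a log aggregator. The sketch below assumes one JSON object per line, which is a common convention but not stated in the source. The first sample line is the entry shown above; the second and third are hypothetical. The 30-second cutoff in `slow_jobs` is likewise an illustrative value.

```python
import json

SAMPLE_LINES = [
    # Entry from the example above.
    '{"time": "2026-02-03T10:30:00Z", "level": "INFO", "msg": "Job completed", '
    '"job_id": "job_abc123", "job_type": "provision", "duration_ms": 45230, '
    '"worker_id": "worker_xyz789"}',
    # Hypothetical non-JSON noise, for example a startup banner.
    "starting up...",
    # Hypothetical second entry for a faster job.
    '{"time": "2026-02-03T10:31:00Z", "level": "INFO", "msg": "Job completed", '
    '"job_id": "job_def456", "job_type": "provision", "duration_ms": 1200, '
    '"worker_id": "worker_xyz789"}',
]


def parse_lines(lines: list[str]) -> list[dict]:
    """Decode each line as JSON, silently skipping lines that are not JSON objects."""
    records = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            records.append(record)
    return records


def slow_jobs(records: list[dict], min_duration_ms: int = 30_000) -> list[str]:
    """Return job IDs whose duration_ms is at or above the cutoff."""
    return [
        r["job_id"]
        for r in records
        if "job_id" in r and r.get("duration_ms", 0) >= min_duration_ms
    ]
```

On the sample lines, `slow_jobs(parse_lines(SAMPLE_LINES))` returns `["job_abc123"]`.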

Loki configuration

loki-config.yml

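# scrape_configs is read by the log collector that ships container logs to Loki
# (for example Promtail); Loki itself does not scrape containers.
scrape_configs:
  - job_name: kombistack
    # Discover containers through the local Docker socket.
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Docker reports container names with a leading slash, so this keeps
      # only the container named "kombistack".
      - source_labels: ['__meta_docker_container_name']
        regex: '/kombistack'
        action: keep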

Service monitoring

Monitor deployed services via TechStack’s built-in checks:

# Get service health for a stack
curl https://api.kombify.io/v1/orchestrator/stacks/{stack-id}/health

Response:

{
  "stack_id": "stack_abc123",
  "overall": "healthy",
  "services": [
    {
      "name": "traefik",
      "status": "healthy",
      "uptime": "72h",
      "health_check": {
        "url": "http://traefik:8080/ping",
        "status_code": 200,
        "latency_ms": 5
      }
    },
    {
      "name": "dokploy",
      "status": "healthy",
      "uptime": "71h55m"
    }
  ]
}
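Not every service in the response carries a `health_check` object (in the sample, `dokploy` has none), so a consumer should treat that field as optional. A minimal sketch, assuming the response shape above; the 100 ms latency budget and the function name `service_problems` are hypothetical.

```python
# Payload copied from the sample response above.
SAMPLE = {
    "stack_id": "stack_abc123",
    "overall": "healthy",
    "services": [
        {
            "name": "traefik",
            "status": "healthy",
            "uptime": "72h",
            "health_check": {
                "url": "http://traefik:8080/ping",
                "status_code": 200,
                "latency_ms": 5,
            },
        },
        {"name": "dokploy", "status": "healthy", "uptime": "71h55m"},
    ],
}


def service_problems(payload: dict, max_latency_ms: int = 100) -> list[tuple[str, str]]:
    """Return (service name, reason) pairs for unhealthy or slow services."""
    problems = []
    for service in payload.get("services", []):
        if service.get("status") != "healthy":
            problems.append((service["name"], f"status={service.get('status')}"))
        check = service.get("health_check")  # optional in the sample response
        if check and check.get("latency_ms", 0) > max_latency_ms:
            problems.append((service["name"], f"latency_ms={check['latency_ms']}"))
    return problems
```

An empty list means every listed service reports `healthy` and responds within the latency budget.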

Monitoring stack deployment

Deploy a complete monitoring stack with Prometheus, Grafana, and Loki:

kombination.yaml

stackkit: base-kit
variant: default

services:
  # ... your services

monitoring:
  enabled: true
  prometheus:
    retention: 15d
    storage: 50Gi
  grafana:
    enabled: true
    admin_password: ${GRAFANA_PASSWORD}
  loki:
    enabled: true
    retention: 7d

Next steps

Troubleshooting

Drift detection
