Vision-only — This StackKit is not yet implemented. The documentation below describes the planned architecture and capabilities. Use the Base Kit for production deployments today.
The High Availability Kit provides high-availability infrastructure for homelabs that can’t afford downtime. It supports 2-5 nodes with automatic failover.
Overview
Key Features
Auto Failover: Automatic failover when the primary node fails
Load Balancing: Distribute traffic across healthy nodes
Data Replication: Synchronous database replication
Requirements
| Resource | Per Node | Total (2 nodes) |
|----------|----------|-----------------|
| CPU | 2 cores | 4 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB SSD | 100 GB |
| Network | 1 Gbps | Same subnet |
Architecture
Failover Mechanism
Database Replication
Quick Start
Define your nodes
stackkit: ha-kit
domain: homelab.local
email: you@example.com
nodes:
  - name: node-1
    ip: 192.168.1.10
    role: primary
  - name: node-2
    ip: 192.168.1.11
    role: replica
# Virtual IP for failover
failover:
  strategy: keepalived
  virtual_ip: 192.168.1.100
  interface: eth0
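The node layout above has to satisfy the limits listed under Constraints (2-5 nodes, same subnet). A minimal sketch of such a check follows. The function name `validate_cluster`, the `/24` prefix length, and the "exactly one primary" rule are assumptions made for illustration, not part of the planned StackKit CLI.

```python
import ipaddress


def validate_cluster(nodes, virtual_ip, prefix_len=24):
    """Return a list of problems found in a node layout (empty list means OK).

    Hypothetical helper: prefix_len=24 and the single-primary rule are
    assumptions, not documented StackKit behavior.
    """
    problems = []
    if not 2 <= len(nodes) <= 5:
        problems.append(f"expected 2-5 nodes, got {len(nodes)}")
    primaries = [n["name"] for n in nodes if n["role"] == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary, got {len(primaries)}")
    if not nodes:
        return problems

    # Derive the subnet from the first node's address and the assumed prefix.
    subnet = ipaddress.ip_network(f"{nodes[0]['ip']}/{prefix_len}", strict=False)
    for n in nodes:
        if ipaddress.ip_address(n["ip"]) not in subnet:
            problems.append(f"{n['name']} ({n['ip']}) is outside {subnet}")

    vip = ipaddress.ip_address(virtual_ip)
    if vip not in subnet:
        problems.append(f"virtual IP {vip} is outside {subnet}")
    if any(ipaddress.ip_address(n["ip"]) == vip for n in nodes):
        problems.append(f"virtual IP {vip} collides with a node address")
    return problems
```

Called with the two nodes and the virtual IP from the example, the function returns an empty list.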
Configure services
services:
  traefik:
    enabled: true
    replicas: all  # Run on all nodes
  authelia:
    enabled: true
    replicas: all
  postgres:
    enabled: true
    mode: ha           # Enable replication
    synchronous: true  # Zero data loss
  redis:
    enabled: true
    mode: sentinel  # Redis Sentinel for HA
Validate and generate
stackkit validate
stackkit generate
Deploy to all nodes
# Deploy to primary first
stackkit deploy --node node-1
# Then replicas
stackkit deploy --node node-2
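The primary-first ordering can be derived from the node list rather than typed by hand. The sketch below only builds the command strings shown above; `deploy_commands` is a hypothetical helper, and it assumes each node is a dict with `name` and `role` keys as in the example config.

```python
def deploy_commands(nodes):
    """Return deploy commands with the primary first, then the other nodes.

    sorted() is stable, so replicas keep the order they have in the config.
    """
    # False sorts before True, so the node whose role is "primary" comes first.
    ordered = sorted(nodes, key=lambda n: n["role"] != "primary")
    return [f"stackkit deploy --node {n['name']}" for n in ordered]
```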
Configuration Reference
Failover Strategies
Keepalived (Recommended)
failover:
  strategy: keepalived
  virtual_ip: 192.168.1.100
  interface: eth0
  priority:
    node-1: 100  # Higher = preferred primary
    node-2: 90
Uses VRRP for fast failover (~3 seconds).
Corosync/Pacemaker
failover:
  strategy: corosync
  quorum: majority
  resources:
    - virtual_ip: 192.168.1.100
    - postgres_primary
Enterprise-grade clustering with resource management.
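To make the priority values concrete, here is a simplified model of how a priority-based election could choose the node that holds the virtual IP. It is a sketch of the idea only, not Keepalived's actual VRRP state machine: real VRRP also breaks ties by address and uses advertisement timers. The function name and the `preempt` flag are assumptions. With `preempt=False` a healthy current holder keeps the VIP, which matches the "no auto-failback by default" behavior described on this page.

```python
def next_vip_holder(priorities, healthy, current=None, preempt=False):
    """Pick the node that should hold the virtual IP.

    priorities: dict of node name -> priority (higher is preferred)
    healthy:    set of node names that currently pass health checks
    current:    node holding the VIP right now, if any
    preempt:    if True, a healthy higher-priority node takes the VIP back
    """
    # Without preemption, a healthy holder keeps the VIP (no auto-failback).
    if current in healthy and not preempt:
        return current
    candidates = [name for name in priorities if name in healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda name: priorities[name])
```

With priorities `{"node-1": 100, "node-2": 90}`, the VIP moves to node-2 when node-1 becomes unhealthy and stays there after node-1 recovers unless `preempt=True` is passed.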
Database HA Modes
| Mode | Description | RPO | RTO |
|------|-------------|-----|-----|
| async | Async replication | Seconds | ~30s |
| sync | Sync replication | Zero | ~30s |
| quorum | Quorum commit | Zero | ~30s |
services:
  postgres:
    enabled: true
    mode: ha
    replication:
      type: sync  # async, sync, quorum
      max_replicas: 2
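One way to read the three replication types is by how many replica acknowledgements a commit waits for. The page does not define this, so the mapping below is an assumed interpretation for illustration: `async` waits for none, `sync` waits for every replica, and `quorum` waits for a majority of replicas. Both function names are hypothetical.

```python
def required_acks(mode, replica_count):
    """Replica acknowledgements a commit waits for (assumed semantics)."""
    if mode == "async":
        return 0  # commit returns immediately; recent writes may be lost
    if mode == "sync":
        return replica_count  # assumption: every replica must confirm
    if mode == "quorum":
        return replica_count // 2 + 1  # assumption: majority of replicas
    raise ValueError(f"unknown replication mode: {mode!r}")


def can_commit(mode, replica_count, acks_received):
    """True when enough replicas have confirmed the write."""
    return acks_received >= required_acks(mode, replica_count)
```

Under this reading, the zero RPO of `sync` and `quorum` comes from refusing to confirm a write until it exists on more than one machine, while `async` trades that guarantee for lower write latency.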
Service Distribution
services:
  # Run on all nodes (stateless)
  traefik:
    replicas: all
  # Run on specific nodes
  home-assistant:
    replicas: 1
    preferred_node: node-1
  # Run on N nodes
  plex:
    replicas: 2
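The three placement styles above can be expressed as a single selection rule. The sketch below is one plausible interpretation, not the planned scheduler: `place_service` is a hypothetical name, and filling the remaining slots in config order is an assumption.

```python
def place_service(replicas, nodes, preferred_node=None):
    """Return the node names a service would run on.

    replicas:       "all" or a positive integer
    nodes:          node names in config order
    preferred_node: optional node that is chosen first
    """
    if replicas == "all":
        return list(nodes)
    if not isinstance(replicas, int) or replicas < 1:
        raise ValueError(f"replicas must be 'all' or a positive integer, got {replicas!r}")
    if replicas > len(nodes):
        raise ValueError(f"requested {replicas} replicas but only {len(nodes)} nodes exist")
    # Stable sort: the preferred node moves to the front, the rest keep their order.
    ordered = sorted(nodes, key=lambda name: name != preferred_node)
    return ordered[:replicas]
```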
Network Architecture
Monitoring
Built-in health checks and monitoring:
monitoring:
  enabled: true
  healthchecks:
    interval: 10s
    timeout: 5s
  alerts:
    - type: node_down
      notify: email
    - type: failover_triggered
      notify: [email, slack]
    - type: replication_lag
      threshold: 10s
      notify: slack
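The alert rules above combine an event type, an optional threshold, and one or more channels. The following sketch shows how such rules might be evaluated. The function names, the shape of the `events` dict, and the supported duration units (`s` and `m`) are assumptions made for illustration.

```python
import re


def parse_seconds(text):
    """Convert a duration such as '10s' or '2m' to seconds."""
    match = re.fullmatch(r"(\d+)(s|m)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * (60 if unit == "m" else 1)


def alerts_to_send(rules, events):
    """Return (alert_type, channel) pairs for the events that occurred.

    events maps an alert type to an observed value: a number of seconds
    for threshold rules such as replication_lag, or True for plain events.
    Types that did not occur are simply absent from the dict.
    """
    sent = []
    for rule in rules:
        observed = events.get(rule["type"])
        if observed is None:
            continue
        threshold = rule.get("threshold")
        if threshold is not None and observed <= parse_seconds(threshold):
            continue  # below the threshold, nothing to report
        channels = rule["notify"]
        if isinstance(channels, str):
            channels = [channels]  # allow both 'email' and [email, slack]
        sent.extend((rule["type"], channel) for channel in channels)
    return sent
```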
Testing Failover
Always test failover in a maintenance window first!
# Simulate node failure
ssh node-1 "sudo systemctl stop docker"
# Watch failover
stackkit status --watch
# Expected output:
# [12:00:01] node-1: UNREACHABLE
# [12:00:03] Failover triggered: node-1 → node-2
# [12:00:04] VIP 192.168.1.100 moved to node-2
# [12:00:05] Services healthy on node-2
# Restore node-1
ssh node-1 "sudo systemctl start docker"
# node-1 becomes backup (no auto-failback by default)
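During a test it helps to turn the status log into a single number you can compare against your downtime budget. The sketch below measures the time between the first UNREACHABLE line and the first "Services healthy" line. It assumes the log lines look exactly like the sample output above (without the leading `# `), which is itself only the planned format; `failover_seconds` is a hypothetical helper.

```python
from datetime import datetime


def failover_seconds(lines):
    """Seconds from the first UNREACHABLE line to the first 'Services healthy' line.

    Assumes each line starts with a "[HH:MM:SS]" stamp and that both events
    happen on the same day. Raises StopIteration if either line is missing.
    """
    def stamp(line):
        # Characters 1..8 hold the time, e.g. "[12:00:01] ..." -> "12:00:01".
        return datetime.strptime(line[1:9], "%H:%M:%S")

    start = next(stamp(line) for line in lines if "UNREACHABLE" in line)
    end = next(stamp(line) for line in lines if "Services healthy" in line)
    return (end - start).total_seconds()
```

For the sample output above, the measured window runs from 12:00:01 to 12:00:05.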
Constraints
| Constraint | Value | Reason |
|------------|-------|--------|
| Min nodes | 2 | Need backup for HA |
| Max nodes | 5 | Complexity ceiling |
| Same subnet | Required | VRRP/VIP requirement |
| Network | 1 Gbps+ | Replication bandwidth |
Troubleshooting
When nodes can’t communicate but both think they’re primary:
Check network connectivity between nodes
Review Keepalived logs: journalctl -u keepalived
Consider adding a third node for quorum
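The reason a third node helps can be shown with a majority rule: a node may act as primary only if it can reach more than half of the cluster, counting itself. This is a generic quorum sketch, not Keepalived behavior (plain VRRP has no quorum concept), and `has_quorum` is a hypothetical name.

```python
def has_quorum(reachable, cluster_size):
    """True when a partition sees a strict majority of the cluster.

    reachable counts the nodes in this partition, including the local node.
    """
    return reachable > cluster_size // 2


# Two nodes split 1/1: neither side has a majority, so a quorum rule
# would leave no primary rather than two.
two_node_split = [has_quorum(1, 2), has_quorum(1, 2)]

# Three nodes split 2/1: exactly one side keeps quorum and stays primary.
three_node_split = [has_quorum(2, 3), has_quorum(1, 3)]
```

With two nodes a partition can never be resolved safely by counting votes, which is why the split-brain case only gets a clean answer from three nodes upward.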
If a replica falls behind:
Check network bandwidth: iperf3 -c node-1
Review PostgreSQL logs for errors
Consider async replication for high-write workloads
If the virtual IP does not move during failover:
Verify Keepalived is running on both nodes
Check VRRP traffic: tcpdump -i eth0 vrrp
Review interface configuration
Migration from the Base Kit
Add second node hardware
Install the same OS and Docker version as on your current node.
Update kombination.yaml
stackkit: ha-kit  # Changed from base-kit!
nodes:
  - name: node-1  # Existing
    ip: 192.168.1.10
    role: primary
  - name: node-2  # New
    ip: 192.168.1.11
    role: replica
failover:
  strategy: keepalived
  virtual_ip: 192.168.1.100
Migrate data
stackkit migrate --from base-kit --to ha-kit
Next Steps
Monitoring Setup: Set up comprehensive monitoring