Introduction
Monitoring is the heartbeat of any homelab. After trying several heavy solutions, I decided to build something targeted: PiHook. It’s a Python-based service designed to monitor both my internal services (like Jellyfin and Gitea) and my external footprint, providing real-time alerts without the overhead of a full enterprise stack.
The Core Architecture
PiHook is built for stability and low resource usage, making it perfect for running on a Raspberry Pi or a background container in a Proxmox VM.
Key Features:
- YAML-Based Configuration: I can add or remove services just by editing a simple services.yaml file.
- Dual-Zone Monitoring: It distinguishes between local IP services and external DuckDNS/public URLs.
- Discord Integration: Webhooks notify me as soon as a check sees a service drop or recover.
- Persistent History: Every check is logged to a SQLite database, allowing for long-term health analysis.
- System Awareness: It doesn't just watch services; it monitors the host's CPU, RAM, and disk usage, even checking for apt updates.
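To make the dual-zone idea concrete, here is a minimal sketch of how a service URL could be sorted into an internal or external zone. It is an illustration under assumptions, not PiHook's actual code: the `classify_zone` helper, the example addresses, and the shape of the `services` list (roughly what a parsed services.yaml might look like) are all hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def classify_zone(url: str) -> str:
    """Return "internal" for private/loopback IP hosts, otherwise "external"."""
    host = urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a DuckDNS hostname): treat it as external.
        return "external"
    return "internal" if (addr.is_private or addr.is_loopback) else "external"


# Hypothetical entries, shaped like what a services.yaml might parse into.
services = [
    {"name": "Jellyfin", "url": "http://192.168.1.50:8096"},
    {"name": "Gitea", "url": "http://192.168.1.51:3000"},
    {"name": "Public site", "url": "https://example.duckdns.org"},
]

for svc in services:
    print(f"{svc['name']}: {classify_zone(svc['url'])}")
```

A rule like this only looks at the URL itself, so a local hostname such as `nas.lan` would be treated as external. An explicit `zone` key per service in the config would be a reasonable alternative.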
Design Decisions
I wanted the system to be "set and forget." To achieve this, I implemented:
- Maintenance Mode: A simple flag file (maintenance.flag) that silences all Discord alerts when I'm intentionally working on the rack.
- Escalation Logic: It doesn't spam me on a single blink. It uses an escalation threshold (e.g., 3 consecutive failures) before firing a critical alert.
- WSGI Dashboard: A lightweight Flask app serves an HTML status table, so I can see everything at a glance.
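The maintenance flag and the escalation threshold can be sketched together as a small gate that sits between a check result and the Discord alert. This is a sketch under assumptions rather than the real implementation: the `AlertGate` class, its `record` method, and the "critical"/"recovered" event names are hypothetical. Only the maintenance.flag filename and the three-failure threshold come from the description above.

```python
from pathlib import Path
from typing import Optional


class AlertGate:
    """Decide when a check result should become an alert."""

    def __init__(self, threshold: int = 3, flag_path: str = "maintenance.flag"):
        self.threshold = threshold
        self.flag_path = Path(flag_path)
        self.failures = {}  # service name -> consecutive failure count

    def record(self, name: str, status: str) -> Optional[str]:
        """Record a check result; return "critical", "recovered", or None."""
        previous = self.failures.get(name, 0)
        if status == "UP":
            self.failures[name] = 0
            # Only announce recovery if a critical alert could have fired.
            event = "recovered" if previous >= self.threshold else None
        else:
            self.failures[name] = previous + 1
            # Fire exactly once, when the threshold is first reached.
            event = "critical" if previous + 1 == self.threshold else None
        # Counters still update during maintenance; only the alert is muted.
        if self.flag_path.exists():
            return None
        return event


gate = AlertGate(threshold=3, flag_path="maintenance.flag")
for status in ["DOWN", "DOWN", "DOWN", "DOWN", "UP"]:
    print(status, "->", gate.record("Jellyfin", status))
```

Comparing the counter to the threshold with `==` rather than `>=` is what keeps a long outage from producing a new alert on every check.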
Implementation Snippet
The heart of the service-checking logic handles retries and response time logging:
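    import logging
    import time

    import requests

    # name, url, retries, connect_timeout, read_timeout and verify
    # are defined by the surrounding code.
    curr = "DOWN"  # stays DOWN unless an attempt succeeds
    # Attempt request with retries
    for attempt in range(retries + 1):
        try:
            start = time.time()
            r = requests.get(url, timeout=(connect_timeout, read_timeout), verify=verify)
            resp_time = time.time() - start
            if r.ok:
                curr = "UP"
                logging.info(f"✅ {name} is UP (resp_time={resp_time:.2f}s)")
                break
            # 4xx/5xx responses count as a failed attempt
            logging.warning(f"{name} returned HTTP {r.status_code}")
        except requests.RequestException as e:
            # Connection errors and timeouts land here; retry until attempts run out
            logging.warning(f"{name} attempt {attempt + 1} failed: {e}")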
Weekly Reporting
One of my favorite additions was the Weekly CSV Export. Every 7 days, the system prunes old logs and generates a CSV summary of uptime percentages and average response times. It keeps the database lean while giving me enough data to spot trends (like a server that's slowing down over time).
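The weekly summary described above could look roughly like the following. This is a hedged sketch, not the actual export code: the `checks` table and its columns (`ts`, `name`, `status`, `resp_time`), the `weekly_report` function, and the CSV column names are assumptions made for illustration. It writes the summary first and prunes afterwards, so no data is dropped before it has been reported.

```python
import csv
import io
import sqlite3
import time

WEEK_SECONDS = 7 * 24 * 3600


def weekly_report(conn: sqlite3.Connection, out, now: float) -> None:
    """Write a per-service CSV summary of the last 7 days, then prune older rows."""
    cutoff = now - WEEK_SECONDS
    rows = conn.execute(
        """
        SELECT name,
               100.0 * SUM(status = 'UP') / COUNT(*) AS uptime_pct,
               AVG(resp_time) AS avg_resp_time
        FROM checks
        WHERE ts >= ?
        GROUP BY name
        ORDER BY name
        """,
        (cutoff,),
    ).fetchall()
    writer = csv.writer(out)
    writer.writerow(["service", "uptime_pct", "avg_resp_time_s"])
    for name, uptime_pct, avg_resp in rows:
        # AVG() skips NULLs and is NULL when no check ever got a response.
        avg = "" if avg_resp is None else f"{avg_resp:.3f}"
        writer.writerow([name, f"{uptime_pct:.2f}", avg])
    # Prune rows that fall outside the reporting window to keep the DB small.
    conn.execute("DELETE FROM checks WHERE ts < ?", (cutoff,))
    conn.commit()


# Demo with an in-memory database and a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checks (ts REAL, name TEXT, status TEXT, resp_time REAL)")
now = time.time()
conn.executemany(
    "INSERT INTO checks VALUES (?, ?, ?, ?)",
    [
        (now - 60, "Jellyfin", "UP", 0.12),
        (now - 120, "Jellyfin", "DOWN", None),
        (now - 180, "Gitea", "UP", 0.30),
        (now - 8 * 24 * 3600, "Gitea", "DOWN", None),  # older than a week
    ],
)
buf = io.StringIO()
weekly_report(conn, buf, now)
print(buf.getvalue())
```

Passing `now` in explicitly, rather than calling `time.time()` inside the function, makes the reporting window easy to test and to re-run for a past week.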
Conclusion
PiHook has become an essential part of my toolkit. It gives me peace of mind knowing that if a hard drive fails or my WAN IP changes, I'll know within minutes. Next on the roadmap: adding more granular Telegram support and integrating Grafana for visual dashboards.