You are building a Python-based process resource monitoring library specifically designed for Aisha, a platform engineer managing containerized workloads. The library should monitor resource usage within pods and nodes to ensure efficient resource allocation and prevent noisy neighbor problems.
- Identify processes running within containers vs host processes
- Map processes to Kubernetes pods, containers, and namespaces
- Track cgroup resource limits and actual usage per container
- Monitor container runtime overhead (Docker, containerd, CRI-O)
- Support multi-container pods with shared process namespaces
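
One way to approach the container-versus-host distinction above is to read `/proc/<pid>/cgroup` and look for a pod UID and container ID in the cgroup path. The sketch below is a minimal illustration, not the library's confirmed design. The names `parse_cgroup`, `CgroupOwner`, and `is_containerized` are hypothetical. The path patterns are assumptions covering common cgroupfs and systemd driver layouts, and may need adjusting for a particular runtime or distribution.

```python
import re
from typing import NamedTuple, Optional

# Pod UID as it tends to appear in cgroup paths: dashes under the cgroupfs
# driver, underscores under the systemd driver (assumed layouts).
_POD_RE = re.compile(r"pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})")
# Container IDs are 64 hex characters, optionally wrapped in a runtime
# prefix and a ".scope" suffix under the systemd driver.
_CONTAINER_RE = re.compile(
    r"(?:docker-|cri-containerd-|crio-)?([0-9a-f]{64})(?:\.scope)?$"
)


class CgroupOwner(NamedTuple):
    """Hypothetical result type: who owns a process according to its cgroup."""
    pod_uid: Optional[str]
    container_id: Optional[str]


def parse_cgroup(text: str) -> CgroupOwner:
    """Extract pod UID and container ID from the content of /proc/<pid>/cgroup."""
    for line in text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        path = parts[2]
        pod = _POD_RE.search(path)
        container = _CONTAINER_RE.search(path)
        if pod or container:
            return CgroupOwner(
                pod_uid=pod.group(1).replace("_", "-") if pod else None,
                container_id=container.group(1) if container else None,
            )
    return CgroupOwner(None, None)


def is_containerized(text: str) -> bool:
    """Treat a process as containerized if its cgroup path ends in a container ID."""
    return parse_cgroup(text).container_id is not None
```

Keeping the parser a pure function of the file content makes it easy to unit test without a live node; the caller would read `/proc/<pid>/cgroup` for each PID it cares about.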
- Monitor CPU and memory usage against configured limits and requests
- Detect when pods approach or exceed resource limits
- Track OOMKilled events and correlate with memory pressure
- Identify pods without resource limits defined
- Calculate resource utilization efficiency (actual vs requested)
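
Comparing actual usage with requests and limits requires converting Kubernetes quantity strings such as `500m` or `128Mi` into plain numbers first. The following is a minimal sketch of that conversion and of an "actual vs requested" ratio. The function names `parse_quantity` and `utilization` are hypothetical, and the parser handles only the common suffixes rather than the full quantity grammar.

```python
from decimal import Decimal
from typing import Optional

# Multipliers for the common Kubernetes quantity suffixes.
_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12, "P": Decimal(10) ** 15, "E": Decimal(10) ** 18,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40, "Pi": Decimal(2) ** 50, "Ei": Decimal(2) ** 60,
}


def parse_quantity(value: str) -> Decimal:
    """Convert a quantity string (e.g. "500m", "128Mi") to cores or bytes."""
    value = value.strip()
    # Try two-character suffixes before one-character ones ("Mi" before "M").
    for length in (2, 1):
        suffix = value[-length:]
        if suffix in _SUFFIXES and len(value) > length:
            return Decimal(value[:-length]) * _SUFFIXES[suffix]
    return Decimal(value)


def utilization(actual: str, requested: str) -> Optional[float]:
    """Return actual / requested, or None when nothing was requested."""
    req = parse_quantity(requested)
    if req == 0:
        return None
    return float(parse_quantity(actual) / req)
```

A ratio well below 1.0 over a long window suggests over-provisioned requests, while a ratio close to the limit is a candidate for the breach detection described above.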
- Monitor node-level resource pressure signals
- Track allocatable vs capacity for CPU, memory, and ephemeral storage
- Identify nodes with high scheduling pressure
- Detect pod eviction risks due to resource pressure
- Analyze resource fragmentation preventing pod scheduling
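
Resource fragmentation can be illustrated with a simple check: a pod's requests fit into the cluster's total free capacity, yet no single node has enough room. The sketch below assumes the caller has already computed free capacity per node (allocatable minus the sum of scheduled pod requests). The names `Free` and `fragmentation_report` are hypothetical, and the check deliberately ignores taints, affinity, and other scheduling constraints.

```python
from typing import Dict, NamedTuple


class Free(NamedTuple):
    """Hypothetical container for free (or requested) CPU and memory."""
    cpu_millicores: int
    memory_bytes: int


def fragmentation_report(free_by_node: Dict[str, Free], pod: Free) -> dict:
    """Report whether a pod fits on some node, in aggregate only, or not at all."""
    # Nodes where both CPU and memory requests fit at the same time.
    fitting = sorted(
        name for name, free in free_by_node.items()
        if free.cpu_millicores >= pod.cpu_millicores
        and free.memory_bytes >= pod.memory_bytes
    )
    total_cpu = sum(f.cpu_millicores for f in free_by_node.values())
    total_mem = sum(f.memory_bytes for f in free_by_node.values())
    fits_in_aggregate = (
        total_cpu >= pod.cpu_millicores and total_mem >= pod.memory_bytes
    )
    return {
        "fitting_nodes": fitting,
        "fits_in_aggregate": fits_in_aggregate,
        # Fragmented: enough capacity overall, but spread too thinly.
        "fragmented": fits_in_aggregate and not fitting,
    }
```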
- Generate custom metrics for HPA decisions
- Track request rate, response time, and queue depth per pod
- Calculate rolling averages and percentiles for scaling metrics
- Support both resource and custom metric types
- Predict scaling events based on trend analysis
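
Rolling averages and percentiles for scaling metrics can be kept in a time-bounded window of samples. The sketch below is one possible shape for this, using only the standard library. The class name `RollingWindow` is hypothetical, the percentile uses the nearest-rank method (one of several common definitions), and samples are assumed to arrive in non-decreasing timestamp order.

```python
import math
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RollingWindow:
    """Keep (timestamp, value) samples from the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, float]] = deque()

    def add(self, value: float, timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        self._samples.append((ts, value))
        # Drop samples that fell out of the window relative to the newest one.
        cutoff = ts - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def mean(self) -> Optional[float]:
        if not self._samples:
            return None
        return sum(v for _, v in self._samples) / len(self._samples)

    def percentile(self, p: float) -> Optional[float]:
        """Nearest-rank percentile for 0 <= p <= 100."""
        if not self._samples:
            return None
        ordered = sorted(v for _, v in self._samples)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

Sorting on every call is simple but costs O(n log n); for windows with many samples per pod, a streaming quantile estimator may be a better fit.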
- Monitor ResourceQuota usage per namespace
- Track quota consumption trends over time
- Alert on approaching quota limits
- Identify unused quota allocations
- Generate namespace resource usage reports
- Direct integration with Kubernetes API and metrics-server
- Support for Prometheus metrics exposition
- Container runtime API integration for detailed stats
- Real-time event streaming from Kubernetes watch API
- Efficient batch collection to minimize API server load
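
Batch collection that limits API server load is commonly done with paginated list calls, which the Kubernetes Python client exposes through the `limit` and `_continue` parameters. The helper below is a sketch under that assumption. The name `iter_list_items` is hypothetical, and the list function is passed in as a callable so the pagination logic can be exercised without a cluster.

```python
from typing import Any, Callable, Iterator


def iter_list_items(
    list_fn: Callable[..., Any], page_size: int = 500, **kwargs: Any
) -> Iterator[Any]:
    """Yield items from a paginated Kubernetes list call, one page at a time.

    `list_fn` is expected to behave like a client list method, for example
    CoreV1Api().list_pod_for_all_namespaces: it accepts `limit` and
    `_continue` and returns an object with `.items` and `.metadata._continue`.
    """
    token = None
    while True:
        if token:
            page = list_fn(limit=page_size, _continue=token, **kwargs)
        else:
            page = list_fn(limit=page_size, **kwargs)
        yield from page.items
        # An empty or missing continue token marks the last page.
        token = getattr(page.metadata, "_continue", None)
        if not token:
            break
```

Because the helper is a generator, callers can start processing the first page while later pages are still unfetched, which keeps memory use bounded on clusters with thousands of pods.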
# Example usage
monitor = K8sResourceMonitor()

# Connect to cluster
monitor.connect(
    kubeconfig="/path/to/kubeconfig",
    context="production"
)

# Monitor namespace resources
namespace_stats = monitor.get_namespace_resources(
    namespace="production",
    include_pods=True,
    include_quota=True
)

# Detect resource limit breaches
breaches = monitor.detect_limit_breaches(
    threshold_percent=90,
    time_window="5m"
)

# Analyze node pressure
node_pressure = monitor.analyze_node_pressure(
    include_scheduling_hints=True,
    predict_evictions=True
)

# Generate HPA metrics
hpa_metrics = monitor.generate_hpa_metrics(
    deployment="api-server",
    metric_type="requests_per_second",
    aggregation="p95"
)

# Get resource recommendations
recommendations = monitor.get_resource_recommendations(
    namespace="production",
    optimization_target="balanced"  # or "performance" or "cost"
)

- Unit tests with mocked Kubernetes API responses
- Integration tests with kind/minikube clusters
- Chaos testing for pod eviction scenarios
- Load tests with 1000+ pod simulations
- Use pytest with pytest-json-report for test result formatting
- Test against multiple Kubernetes versions
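
As an illustration of unit testing with mocked Kubernetes API responses, the sketch below tests a small helper that finds containers without resource limits. The helper `containers_without_limits` and the fixture builder `_pod` are hypothetical names. The API object is a `MagicMock` standing in for the client's `CoreV1Api`, and the fake pods mirror only the attributes the helper reads.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from unittest.mock import MagicMock


def containers_without_limits(core_api: Any, namespace: str) -> List[Tuple[str, str]]:
    """Return (pod name, container name) pairs that define no resource limits."""
    missing = []
    for pod in core_api.list_namespaced_pod(namespace).items:
        for container in pod.spec.containers:
            resources = container.resources
            limits = resources.limits if resources is not None else None
            if not limits:
                missing.append((pod.metadata.name, container.name))
    return missing


def _pod(name: str, containers: Dict[str, Optional[Dict[str, str]]]) -> SimpleNamespace:
    """Build a fake pod exposing only the fields the helper touches."""
    return SimpleNamespace(
        metadata=SimpleNamespace(name=name),
        spec=SimpleNamespace(containers=[
            SimpleNamespace(name=cname, resources=SimpleNamespace(limits=limits))
            for cname, limits in containers.items()
        ]),
    )


def test_containers_without_limits() -> None:
    api = MagicMock()  # stands in for kubernetes.client.CoreV1Api
    api.list_namespaced_pod.return_value = SimpleNamespace(items=[
        _pod("web", {"app": {"cpu": "500m", "memory": "128Mi"}, "sidecar": None}),
        _pod("db", {"postgres": {"memory": "1Gi"}}),
    ])
    assert containers_without_limits(api, "production") == [("web", "sidecar")]
    api.list_namespaced_pod.assert_called_once_with("production")
```

Run under pytest, the test function is collected automatically; the same fixtures can be reused for the limit-breach and quota checks.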
- Monitor clusters with up to 5000 pods
- Update metrics within 10 seconds of change
- Process node metrics for 100 nodes in <5 seconds
- Store 24 hours of detailed pod metrics
- Generate namespace reports in <3 seconds
- Python 3.8+ compatibility required
- Use only Python standard library plus: kubernetes, psutil, prometheus-client
- No GUI components - this is a backend library only
- Support RBAC with minimal required permissions
- Compatible with Kubernetes 1.19+
- Core Python library with Kubernetes-native resource monitoring
- Prometheus metrics exporter for integration
- Resource recommendation engine based on usage patterns
- CLI tool for kubectl-style resource investigations
- Helm chart for deploying as DaemonSet with minimal privileges