admin/k3s-gitops

admin 2850989230 Update apps/cluster-health-dashboard/Jenkinsfile

2026-01-07 09:40:37 +00:00

dashboard_preview.md

Add apps/cluster-health-dashboard/dashboard_preview.md

2026-01-07 08:25:48 +00:00

Jenkinsfile

Update apps/cluster-health-dashboard/Jenkinsfile

2026-01-07 09:40:37 +00:00

Jenkinsfile.old

Update apps/cluster-health-dashboard/Jenkinsfile.old

2026-01-07 08:48:50 +00:00

readme.md

Add apps/cluster-health-dashboard/readme.md

2026-01-07 08:26:35 +00:00

setup.sh

Add apps/cluster-health-dashboard/setup.sh

2026-01-07 08:27:25 +00:00

readme.md

📊 Cluster Health Dashboard - Setup Guide

Complete setup guide for the Kubernetes Cluster Health Dashboard Jenkins pipeline.

🎯 What This Dashboard Does

Collects:

✅ Cluster information (version, nodes, namespaces, pods)
✅ Resource metrics from Prometheus (CPU, Memory, Network)
✅ Pod status across all namespaces
✅ Node capacity and usage
✅ Cost estimation (monthly)
✅ Health checks and issue detection

Generates:

📊 Interactive HTML dashboard
📄 JSON report with all metrics
📱 Telegram summary notification
📧 Optional email report

📋 Prerequisites

Required:

✅ Jenkins with Kubernetes plugin
✅ kubectl configured with cluster access
✅ Prometheus running in cluster (for metrics)
✅ jq installed on Jenkins agent
✅ curl installed on Jenkins agent

Optional:

⚙️ Telegram bot (for notifications)
⚙️ Email configured in Jenkins
⚙️ Grafana (referenced in dashboard)

🚀 Setup Steps

Step 1: Install Required Tools on Jenkins Agent

# SSH to your Jenkins agent or use Jenkins shell

# Install jq (JSON processor) and curl
sudo apt-get update
sudo apt-get install -y jq curl

# Verify installations
jq --version
kubectl version --client
curl --version
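
The verification commands above can be wrapped in a small preflight check that reports every missing tool in one pass. This is a minimal sketch; the function name `check_tools` is hypothetical and is not part of the Jenkinsfile.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: report every required tool missing from PATH.
# Returns 0 when all tools are found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the tools from the prerequisites list:
# check_tools jq kubectl curl || exit 1
```

Running it as the first step of a build makes a missing dependency fail fast instead of surfacing later as an empty metric.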

Step 2: Configure Prometheus Access

Option A: If Prometheus is in your cluster (recommended)

Check if Prometheus is accessible:

# From Jenkins agent or any pod in cluster
kubectl get svc -n monitoring

# Should see something like:
# prometheus-server   ClusterIP   10.43.xxx.xxx   <none>   80/TCP

Option B: If Prometheus is external

Update Jenkinsfile environment variables:

environment {
    PROMETHEUS_URL = 'http://your-prometheus-url:9090'
}

Test Prometheus access:

# From Jenkins agent
curl "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=up"

# Should return JSON with metrics
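
To turn the manual check into a pass/fail test, the response body can be inspected for the `"status":"success"` field that the Prometheus HTTP API returns on a successful query. The helper below is a sketch under that assumption; `prometheus_ok` is a hypothetical name, and the grep match is a deliberately simple stand-in for real JSON parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the JSON on stdin reports status "success".
# grep keeps this dependency-free; parsing with jq would be more robust.
prometheus_ok() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"'
}

# Example call, using the in-cluster URL from this guide:
# curl -fsS --max-time 10 \
#   "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=up" \
#   | prometheus_ok && echo "Prometheus reachable"
```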

Step 3: Set Up Telegram Notifications (Optional)

If you already have a bot from a previous setup, skip this step.

A. Create Bot (if not done)

Open Telegram → @BotFather
/newbot
Get token: 1234567890:ABC...

B. Get Chat ID

Telegram → @userinfobot
Get your ID (a numeric value, shown here as <your-chat-id>)

C. Add to Jenkins Credentials

Jenkins → Manage Jenkins → Manage Credentials → Add:

Credential 1:

Kind: Secret text
Secret: <your-bot-token>
ID: telegram-bot-token
Description: Telegram Bot Token

Credential 2:

Kind: Secret text
Secret: <your-chat-id>
ID: telegram-chat-id
Description: Telegram Chat ID

Step 4: Adjust Cost Estimates

Edit Jenkinsfile to match your actual cloud costs:

environment {
    // Adjust these to your actual pricing
    CPU_PRICE_PER_HOUR = '0.04'        // $0.04 per vCPU/hour
    MEMORY_PRICE_PER_GB_HOUR = '0.005' // $0.005 per GB/hour
}

Common pricing reference:

AWS t3.medium: ~$0.0416/hour per instance (2 vCPU, 4 GB RAM)
DigitalOcean: $0.06 per vCPU/hour, $0.007 per GB RAM/hour
Local/Bare metal: $0 (or electricity cost)
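
To see how the two rates combine into a monthly figure, here is a minimal sketch. It assumes a simple linear formula and 730 hours per month; neither is confirmed by the Jenkinsfile, and `estimate_monthly_cost` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate monthly cost from capacity and hourly rates.
# Assumption: cost = (cores * cpu_rate + mem_gb * mem_rate) * 730 hours/month.
estimate_monthly_cost() {
    local cores="$1" mem_gb="$2" cpu_price="$3" mem_price="$4"
    awk -v c="$cores" -v m="$mem_gb" -v cp="$cpu_price" -v mp="$mem_price" \
        'BEGIN { printf "%.2f\n", (c * cp + m * mp) * 730 }'
}

# 12 cores and 48 GB at the default rates from this guide:
estimate_monthly_cost 12 48 0.04 0.005   # prints 525.60
```

If the reported cost differs a lot from a hand calculation like this one, the rates or the hours-per-month constant in the pipeline are the first things to check.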

Step 5: Create Jenkins Pipeline

A. Create New Pipeline Job

Jenkins → New Item
Name: cluster-health-dashboard
Type: Pipeline
OK

B. Configure Pipeline

Description:

Daily cluster health monitoring and reporting. 
Generates dashboard with metrics, costs, and health checks.

Build Triggers:
- ☑️ Build periodically
- Schedule: 0 8 * * 1-5 (8 AM weekdays)

Pipeline:
- Definition: Pipeline script from SCM
- SCM: Git
- Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
- Credentials: gitea-credentials
- Branch: */main
- Script Path: apps/cluster-health-dashboard/Jenkinsfile

C. Or use Pipeline Script Directly

If you want to test first without Git:

Definition: Pipeline script
Copy entire Jenkinsfile content into the script box
Save

Step 6: Add to GitOps Repository

# On your local machine
cd ~/projects/k3s-gitops

# Create directory
mkdir -p apps/cluster-health-dashboard

# Copy Jenkinsfile
cp /path/to/Jenkinsfile apps/cluster-health-dashboard/

# Commit
git add apps/cluster-health-dashboard/
git commit -m "feat: add cluster health dashboard pipeline"
git push origin main

🧪 Testing

Test 1: Manual Run (First Time)

Jenkins → cluster-health-dashboard → Build with Parameters
Set:
- REPORT_PERIOD: 24h
- SEND_EMAIL: false (for first test)
- SEND_TELEGRAM: true
Build Now

Watch Console Output:

🚀 Starting Cluster Health Dashboard generation...
📋 Collecting cluster information...
Cluster version: v1.28.0
Nodes: 3
Namespaces: 14
Pods: 67
📈 Querying Prometheus for metrics...
✅ Dashboard generated

Test 2: Check Generated Dashboard

After build completes:

Jenkins → cluster-health-dashboard → Build #1
Click "Cluster Health Dashboard" (left sidebar)
You should see the rendered HTML dashboard 🎨

Test 3: Check Telegram Notification

You should receive:

📊 Cluster Health Report

━━━━━━━━━━━━━━━━━━━━━━
📋 Cluster Info
Version: v1.28.0
Nodes: 3
Namespaces: 14
Total Pods: 67

━━━━━━━━━━━━━━━━━━━━━━
💻 Resources
CPU Cores: 12
Memory: 48 GB
Avg CPU Usage: 23.5%
...

Test 4: Check Artifacts

Build #1 → Artifacts
Should see:
- dashboard.html
- report.json
- namespace-stats.json
- all-pods.json
- node-resources.json
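
The artifact check can also be scripted against a downloaded or archived build directory. This is a sketch only: the file names come from the list above, while the directory argument and the name `missing_artifacts` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected artifact that is missing or empty
# in the given directory. Prints nothing when all five files are present.
missing_artifacts() {
    local dir="$1" name
    for name in dashboard.html report.json namespace-stats.json \
                all-pods.json node-resources.json; do
        [ -s "$dir/$name" ] || echo "$name"
    done
}

# Example call (the path is a placeholder):
# missing_artifacts ./build-1-artifacts
```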

🔧 Troubleshooting

Issue 1: "Failed to query Prometheus"

Symptoms:

⚠️ Failed to query Prometheus: Connection refused

Fix:

# Check if Prometheus is running
kubectl get pods -n monitoring

# Check service
kubectl get svc -n monitoring

# Test connection from Jenkins pod
kubectl exec -it jenkins-0 -n jenkins -- \
  curl "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=up"

If Prometheus is in a different namespace:

Update Jenkinsfile:

PROMETHEUS_URL = 'http://prometheus-server.YOUR_NAMESPACE.svc.cluster.local'

Issue 2: "jq: command not found"

Fix:

# Install jq inside the Jenkins pod (needs root in the container; lost when the pod is recreated)
kubectl exec -it jenkins-0 -n jenkins -- apt-get update
kubectl exec -it jenkins-0 -n jenkins -- apt-get install -y jq

# Or add to Jenkins Dockerfile:
# RUN apt-get update && apt-get install -y jq

Issue 3: "kubectl: command not found"

Fix:

Jenkins needs kubectl. Check installation:

kubectl exec -it jenkins-0 -n jenkins -- kubectl version --client

# If not installed, add to Jenkins image or install:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin/

Issue 4: Dashboard shows "0" for all metrics

Possible causes:

Prometheus not accessible
Wrong Prometheus URL
No metrics in Prometheus

Debug:

# Test Prometheus query manually
curl "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=up"

# Check if metrics exist
curl "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=container_cpu_usage_seconds_total"
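
When both queries return JSON, the useful question is whether the result set is empty. A Prometheus instant query returns its series under `.data.result`, so counting that array separates "Prometheus answered but has no such metric" from "Prometheus has data". A minimal sketch, assuming jq is installed; `result_count` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the series in a Prometheus instant-query response
# read from stdin. 0 means the query ran but matched nothing.
result_count() {
    jq -r '.data.result | length'
}

# Example call, using the in-cluster URL from this guide:
# curl -fsS -G "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query" \
#   --data-urlencode 'query=container_cpu_usage_seconds_total' | result_count
```

A count of 0 for `container_cpu_usage_seconds_total` while `up` returns series suggests that container metrics are not being scraped, rather than a connectivity problem.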

Issue 5: HTML Dashboard not showing

Check:

# Verify the HTML Publisher plugin is installed
Jenkins → Manage Jenkins → Manage Plugins → Installed

# Look for: HTML Publisher Plugin

# If not installed:
# Manage Plugins → Available → Search "HTML Publisher" → Install

Issue 6: Telegram notifications not sending

Check credentials:

# Verify credentials exist
Jenkins → Manage Jenkins → Manage Credentials

# Should see:
# - telegram-bot-token
# - telegram-chat-id

# Test manually:
# Use your own values. Never commit a real token; if one leaks, rotate it via @BotFather (/revoke).
BOT_TOKEN="<your-bot-token>"
CHAT_ID="<your-chat-id>"

curl -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    -d chat_id="${CHAT_ID}" \
    -d text="Test from terminal"
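
One mistake worth ruling out is storing the wrong string as the secret, for example the credential ID instead of the token itself. The check below is a rough sketch: the `<digits>:<secret>` pattern is an assumption based on the token example in Step 3, not an official format definition, and `looks_like_bot_token` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: does the argument look like "<digits>:<secret>"?
# The pattern is deliberately loose and is an assumption, not a specification.
looks_like_bot_token() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+:[A-Za-z0-9_-]+$'
}

# Example call:
# looks_like_bot_token "$BOT_TOKEN" || echo "BOT_TOKEN does not look like a bot token" >&2
```

A string that passes this check can still be revoked or mistyped, so the curl test remains the real verification.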

📊 Understanding the Dashboard

Metrics Explained:

Cluster Information:

Kubernetes Version: Your K8s version
Nodes: Number of worker nodes
Namespaces: Total namespaces
Total Pods: All pods across cluster

Resource Capacity:

Total CPU Cores: Sum of all node CPUs
Total Memory: Sum of all node RAM
Avg CPU Usage: Average CPU across containers
Progress Bar: Visual CPU usage

Pod Status:

Running: Healthy pods ✅
Pending: Pods waiting to start ⏳
Failed: Crashed pods ❌
Total Restarts: Container restarts (high = problem)

Monthly Costs:

Based on CPU cores and Memory GB
Calculated using rates you configured
Estimates infrastructure cost

Health Checks:

High restart count (>10)
Failed pods (>0)
Pending pods (>5)
High CPU usage (>80%)
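
The four thresholds above can be expressed as a small function, which is handy for checking what a given set of numbers would trigger. This is a hypothetical re-implementation for illustration; the actual logic in the Jenkinsfile may differ.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the thresholds listed above.
# Arguments: total restarts, failed pods, pending pods, average CPU percent.
health_issues() {
    local restarts="$1" failed="$2" pending="$3" cpu="$4"
    [ "$restarts" -gt 10 ] && echo "High restart count: $restarts"
    [ "$failed" -gt 0 ] && echo "Failed pods: $failed"
    [ "$pending" -gt 5 ] && echo "Pending pods: $pending"
    # CPU usage may be fractional, so compare with awk instead of [ -gt ].
    awk -v c="$cpu" 'BEGIN { exit !(c > 80) }' && echo "High CPU usage: ${cpu}%"
    return 0
}

# Example call with the sample values from the Testing section:
# health_issues 0 0 0 23.5    # prints nothing
```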

Resources by Namespace:

Table showing pod/container count per namespace
Sorted by pod count (highest first)
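
A comparable per-namespace pod count can be reproduced from the command line to cross-check the table. The sketch below assumes its input is the output of `kubectl get pods -A --no-headers`, where the first column is the namespace; `count_pods_by_namespace` is a hypothetical name, and it counts pods only, not containers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get pods -A --no-headers" output on stdin
# and print "<namespace> <pod count>", highest count first.
count_pods_by_namespace() {
    awk '{ count[$1]++ } END { for (ns in count) print ns, count[ns] }' \
        | sort -k2,2nr -k1,1
}

# Example call:
# kubectl get pods -A --no-headers | count_pods_by_namespace
```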

🎨 Customization

Change Schedule

Edit cron trigger in Jenkinsfile:

triggers {
    cron('0 8 * * 1-5')  // Weekdays 8 AM
    
    // Examples:
    // cron('0 */6 * * *')     // Every 6 hours
    // cron('0 9 * * MON')     // Mondays 9 AM
    // cron('0 0 * * *')       // Daily midnight
}

Add More Metrics

Add to Prometheus queries section:

// Disk I/O
env.DISK_READ_MB = queryPrometheus(
    "sum(rate(container_fs_reads_bytes_total[5m])) / 1024 / 1024"
)

// HTTP Requests (if you have metrics)
env.HTTP_REQUESTS_PER_SEC = queryPrometheus(
    "sum(rate(http_requests_total[5m]))"
)

Then add to HTML dashboard:

<div class="metric">
    <span class="metric-label">Disk Read</span>
    <span class="metric-value">${env.DISK_READ_MB} MB/s</span>
</div>

Change Colors/Styling

Edit CSS in generateDashboardHTML():

/* Change main gradient */
background: linear-gradient(135deg, #YOUR_COLOR1 0%, #YOUR_COLOR2 100%);

/* Change card colors */
.card h2 {
    color: #YOUR_COLOR;
}

Add Email Recipients

Add to Jenkinsfile:

post {
    success {
        emailext (
            to: 'devops-team@company.com',
            subject: "Cluster Health Report - ${new Date().format('yyyy-MM-dd')}",
            body: '''
                <h2>Daily Cluster Health Report</h2>
                <p>Please see attached dashboard.</p>
            ''',
            mimeType: 'text/html',
            attachmentsPattern: '**/dashboard.html'
        )
    }
}

📈 Usage Examples

Weekly Review

Monday 8 AM → Dashboard generated
Review:
- Are costs increasing? Why?
- Any failed pods? Investigate
- CPU usage trending up? Scale?
- Restarts increasing? Bug in app?

Cost Tracking

Week 1: $150/month
Week 2: $180/month ⚠️  (+20%)
→ Check namespace-stats.json
→ Which namespace grew?
→ Review pod counts
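
The week-over-week change in this example is plain percentage arithmetic. A minimal sketch, with `percent_change` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: signed percentage change from an old value to a new one.
percent_change() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        if (old == 0) { print "n/a"; exit }
        printf "%+.1f\n", (new - old) / old * 100
    }'
}

percent_change 150 180   # prints +20.0
```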

Capacity Planning

Current: 12 CPU cores, 23.5% usage
If usage > 70% for 7 days:
→ Time to add nodes
→ Dashboard shows trend
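
The "above 70% for 7 days" rule can be checked mechanically once daily readings are available. The sketch below assumes the readings come from stored daily reports, passed as arguments; the guide does not describe such storage, and `sustained_high_usage` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least 7 readings are given
# and every one of them is above 70 (percent).
sustained_high_usage() {
    [ "$#" -ge 7 ] || return 1
    local v
    for v in "$@"; do
        awk -v x="$v" 'BEGIN { exit !(x > 70) }' || return 1
    done
    return 0
}

# Example call with seven daily averages:
# sustained_high_usage 71 75 80 72 90 88 76 && echo "Consider adding nodes"
```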

Health Monitoring

Dashboard shows:
❌ 5 pods in Failed state
⚠️ 15 container restarts

→ Click artifact → all-pods.json
→ Find which pods
→ kubectl logs <pod>
→ Fix issue
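
Finding the failed pods in the artifact can be done with jq. This sketch assumes `all-pods.json` has the same shape as `kubectl get pods -A -o json` output (a list under `.items`), which the guide does not confirm; `failed_pods` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list "<namespace>/<name>" for every pod whose phase is Failed.
# Assumes the file looks like "kubectl get pods -A -o json" output.
failed_pods() {
    jq -r '.items[]
           | select(.status.phase == "Failed")
           | "\(.metadata.namespace)/\(.metadata.name)"' "$1"
}

# Example call:
# failed_pods all-pods.json
```

Each line of output gives the namespace and pod name to pass to `kubectl logs <pod> -n <namespace>`.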

🔗 Integration with Other Tools

Export to Grafana

Use report.json:

# Download report.json from Jenkins artifact
# Import to Grafana via JSON API datasource
# Create time-series dashboard

Send to Slack

Add Slack webhook:

post {
    success {
        sh """
            curl -X POST ${SLACK_WEBHOOK_URL} \
                -H 'Content-Type: application/json' \
                -d '{
                    "text": "Daily Cluster Report: ${env.MONTHLY_TOTAL_COST} USD/month",
                    "attachments": [{
                        "color": "good",
                        "fields": [
                            {"title": "Nodes", "value": "${env.NODE_COUNT}", "short": true},
                            {"title": "Pods", "value": "${env.POD_COUNT}", "short": true}
                        ]
                    }]
                }'
        """
    }
}

Store in Database

Parse JSON and insert:

stage('Store in Database') {
    steps {
        script {
            def report = readJSON file: "${OUTPUT_DIR}/report.json"
            
            sh """
                psql -h postgres -U metrics -d cluster_metrics -c "
                    INSERT INTO daily_reports (date, cpu_usage, pod_count, cost)
                    VALUES ('${report.generated_at}', ${report.resources.avg_cpu_usage_percent}, 
                            ${report.cluster.total_pods}, ${report.costs.monthly_total_usd})
                "
            """
        }
    }
}

✅ Verification Checklist

After setup, verify:

Jenkins job created
First build succeeds
HTML dashboard accessible
Metrics show real data (not zeros)
Telegram notification received
Costs calculated correctly
JSON report generated
Namespace table populated
Health checks working
Schedule triggers correctly

📚 Next Steps

Enhancements:

Historical Tracking - Store reports in Git or database
Alerts - Trigger alerts on threshold breaches
Comparison - Compare week-over-week trends
Recommendations - Auto-suggest optimizations
Deep Dive - Per-namespace detailed reports

Related Pipelines:

Security Scanning (scan images from this report)
Cleanup Pipeline (remove resources shown as unused)
Backup Pipeline (backup based on importance shown here)

You're all set! 🎉

Run your first build and enjoy your cluster health dashboard! 📊✨
