🔄 Manual Rollback Feature - Complete Documentation
📋 Table of Contents
- Overview
- Features
- Setup Guide
- Usage Guide
- Rollback Methods
- Troubleshooting & Fixes
- Best Practices
- Examples
Overview
The Manual Rollback feature lets you roll a deployment back to any previous version through a Jenkins Pipeline.
Key Features:
- ✅ 3 rollback methods (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
- ✅ GitOps sync - automatically updates the Git manifests
- ✅ Zero downtime - rolling updates
- ✅ DRY_RUN mode - safe testing
- ✅ Health checks - optional verification after rollback
- ✅ Full RBAC - correct permissions configured
Features
Rollback Methods
| Method | Description | Example | Use Case |
|---|---|---|---|
| IMAGE_TAG | By Docker image tag | main-21 | You know the specific build number |
| REVISION_NUMBER | By Kubernetes revision | 2 | Roll back N steps |
| GIT_COMMIT | By Git commit SHA | abc123def | Exact code state |
Parameters
ROLLBACK_METHOD // Rollback method to use
TARGET_VERSION // Target version (whitespace is trimmed automatically)
SKIP_HEALTH_CHECK // Skip health checks (default: false)
DRY_RUN // Only show the plan (default: false)
Setup Guide
Step 1: Create Jenkins Pipeline
1. Jenkins → New Item
2. Name: demo-nginx-rollback
3. Type: Pipeline
4. Click OK
Step 2: Configure Pipeline
Pipeline:
Definition: Pipeline script from SCM
SCM: Git
Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
Credentials: gitea-credentials
Branch: */main
Script Path: apps/demo-nginx/Jenkinsfile.rollback
Step 3: Verify RBAC
RBAC is already configured in apps/jenkins/rbac.yaml:
ClusterRole: jenkins-deployer
Permissions:
- pods, services, deployments (full CRUD)
- pods/exec, pods/log (for health checks)
- ingresses, applications (for ArgoCD)
Step 4: Test with DRY_RUN
Jenkins → demo-nginx-rollback → Build with Parameters
├─ ROLLBACK_METHOD: IMAGE_TAG
├─ TARGET_VERSION: main-21
├─ DRY_RUN: ✅ true
└─ Build
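The source does not show how Jenkinsfile.rollback implements DRY_RUN. One common approach in a shell step is a small wrapper that prints mutating commands instead of running them. The sketch below assumes that approach; the `run` helper is hypothetical and not taken from the pipeline.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=true, execute it otherwise.
run() {
    if [ "${DRY_RUN:-false}" = "true" ]; then
        echo "[DRY_RUN] $*"
    else
        "$@"
    fi
}

# With DRY_RUN=true this only prints the kubectl command; nothing is changed.
DRY_RUN=true
run kubectl set image deployment/demo-nginx \
    nginx=docker.io/vladcrypto/demo-nginx:main-21 -n demo-app
```

Read-only commands such as kubectl get can be called directly, so the plan still shows the current state.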
Usage Guide
Quick Start
Jenkins → demo-nginx-rollback → Build with Parameters
┌─────────────────────────────────────┐
│ ROLLBACK_METHOD: IMAGE_TAG │
│ TARGET_VERSION: main-21 │
│ SKIP_HEALTH_CHECK: true (recommended) │
│ DRY_RUN: false │
└─────────────────────────────────────┘
→ Build → ✅ SUCCESS!
Pipeline Stages
Stage 1: Validate Input
└─ Trim whitespace, validate TARGET_VERSION
Stage 2: Show Current State
└─ Current deployment, image, pods, history
Stage 3: Prepare Rollback
└─ Build target image path or verify revision
Stage 4: Execute Rollback
├─ kubectl set image (or rollout undo)
└─ Git commit & push
Stage 5: Wait for Rollout
├─ kubectl rollout status (300s timeout)
└─ sleep 10s (stabilization)
Stage 6: Health Check (optional)
└─ 5 retry attempts with 5s delay
Stage 7: Show New State
└─ New deployment state, pods, history
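Stage 1 validates TARGET_VERSION, but the exact rules are not listed here. As a rough sketch, a per-method format check could look like the following. The `validate_target_version` helper and its patterns are assumptions (Docker's tag grammar, a positive integer, a 7-40 character hex SHA), and the input is expected to be a single, already-trimmed line.

```shell
#!/bin/sh
# Hypothetical validator: returns 0 when VALUE has a plausible format for METHOD.
validate_target_version() {
    method=$1
    value=$2
    case "$method" in
        IMAGE_TAG)       pattern='^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$' ;;
        REVISION_NUMBER) pattern='^[1-9][0-9]*$' ;;
        GIT_COMMIT)      pattern='^[0-9a-fA-F]{7,40}$' ;;
        *) echo "Unknown ROLLBACK_METHOD: $method" >&2; return 2 ;;
    esac
    # grep -E applies the pattern to the single input line.
    if ! printf '%s\n' "$value" | grep -Eq "$pattern"; then
        echo "Invalid TARGET_VERSION for $method: '$value'" >&2
        return 1
    fi
}

# Usage: stop early when the value does not fit the chosen method.
validate_target_version IMAGE_TAG main-21 || exit 1
```

A format check only catches typos. Whether the tag, revision, or commit actually exists still has to be verified against the registry, the cluster, or Git.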
Rollback Methods
Method 1: IMAGE_TAG (Recommended)
When to use: You know the specific build number
How to find the tag:
# Docker Hub
https://hub.docker.com/r/vladcrypto/demo-nginx/tags
# Jenkins build history
Jenkins → demo-nginx → Build History
# Git commits
git log --oneline | grep "Update image"
Example:
ROLLBACK_METHOD: IMAGE_TAG
TARGET_VERSION: main-21
Result: Rollback to docker.io/vladcrypto/demo-nginx:main-21
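The target image path in this result is composed from the registry, repository, app name, and tag. A minimal sketch of that composition, using the values from the Configuration Reference; the `build_target_image` helper name is hypothetical:

```shell
#!/bin/sh
# Values as listed under Environment Variables in this document.
DOCKER_REGISTRY='docker.io'
DOCKER_REPO='vladcrypto'
APP_NAME='demo-nginx'

# Hypothetical helper: compose the full image reference for a given tag.
build_target_image() {
    printf '%s/%s/%s:%s\n' "$DOCKER_REGISTRY" "$DOCKER_REPO" "$APP_NAME" "$1"
}

build_target_image main-21
# prints docker.io/vladcrypto/demo-nginx:main-21
```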
Method 2: REVISION_NUMBER
When to use: You need to roll back N steps
How to find the revision:
kubectl rollout history deployment/demo-nginx -n demo-app
# Output:
REVISION CHANGE-CAUSE
1 Initial deployment
2 Update to main-20
3 Update to main-21
4 Update to main-22 (current)
Example:
ROLLBACK_METHOD: REVISION_NUMBER
TARGET_VERSION: 2
Result: Rollback to revision 2 (main-20)
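Before running the rollback, it can help to confirm that the requested revision is still in the rollout history, since Kubernetes prunes old revisions. The sketch below reads the history text on stdin. The `revision_exists` helper is hypothetical and is not claimed to be what the pipeline does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if revision $1 appears in the first column
# of `kubectl rollout history` output read from stdin.
revision_exists() {
    awk -v rev="$1" '$1 == rev { found = 1 } END { exit !found }'
}

# Usage (requires cluster access):
#   kubectl rollout history deployment/demo-nginx -n demo-app | revision_exists 2 \
#       || echo "Revision 2 not found"
```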
Method 3: GIT_COMMIT
When to use: You need to return to a specific state of the code
How to find the commit:
# Gitea
https://git.thedevops.dev/admin/k3s-gitops/commits/branch/main
# Git CLI
git log --oneline apps/demo-nginx/deployment.yaml
# Output:
abc123d Update image to main-22 (current)
def456e Update image to main-21
ghi789f Update image to main-20
Example:
ROLLBACK_METHOD: GIT_COMMIT
TARGET_VERSION: def456e
Result: Rollback to commit def456e
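How the pipeline resolves a commit to an image is not shown here. One possible approach is to read the manifest as it existed at that commit with git show and extract its image line. The sketch below assumes a simple manifest with an unquoted `image:` value; the `image_from_manifest` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first "image:" value found in a manifest on stdin.
# Handles both "image: x" and "- image: x" lines; quoted values are not handled.
image_from_manifest() {
    awk '$1 == "image:" { print $2; exit }
         $1 == "-" && $2 == "image:" { print $3; exit }'
}

# Usage (inside a clone of the GitOps repository):
#   git show def456e:apps/demo-nginx/deployment.yaml | image_from_manifest
```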
Troubleshooting & Fixes
Issue #1: Container Name Error ✅ FIXED
Error:
error: unable to find container named "demo-nginx"
Root Cause: The pipeline used the deployment name instead of the container name.
Fix:
environment {
APP_NAME = 'demo-nginx' // Deployment name
CONTAINER_NAME = 'nginx' // Container name ✅
}
kubectl set image deployment/${APP_NAME} \
${CONTAINER_NAME}=${TARGET_IMAGE}
How to verify:
kubectl get deployment demo-nginx -n demo-app \
-o jsonpath='{.spec.template.spec.containers[0].name}'
# Output: nginx
Issue #2: Whitespace in Input ✅ FIXED
Error:
Target image: docker.io/vladcrypto/demo-nginx: main-21
^
Space!
Root Cause: The user entered TARGET_VERSION with a leading space.
Fix:
stage('Validate Input') {
    steps {
        script {
            // Auto-trim whitespace once, then use the cleaned value in every later stage
            env.TARGET_VERSION_CLEAN = params.TARGET_VERSION.trim()
            echo "Target version: ${env.TARGET_VERSION_CLEAN}"
        }
    }
}
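The fix above trims the value in Groovy. If a shell step ever receives the raw parameter instead, the same trimming can be done there. This is a sketch only; the `trim` helper is hypothetical and not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper: strip leading and trailing whitespace from $1.
trim() {
    printf '%s\n' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

TARGET_VERSION_CLEAN=$(trim ' main-21 ')
echo "docker.io/vladcrypto/demo-nginx:${TARGET_VERSION_CLEAN}"
# prints docker.io/vladcrypto/demo-nginx:main-21
```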
Issue #3: RBAC Permissions ✅ FIXED
Error:
Error: User "system:serviceaccount:jenkins:jenkins"
cannot create resource "pods/exec"
Root Cause: The Jenkins ServiceAccount had no pods/exec permission, which the health checks need.
Fix:
# apps/jenkins/rbac.yaml
rules:
- apiGroups: [""]
resources: ["pods/exec", "pods/log"] # ← Added!
verbs: ["create", "get"]
Applied:
kubectl apply -f apps/jenkins/rbac.yaml
Issue #4: Health Check Timing ⚠️ WORKAROUND
Error:
wget: can't connect to remote host: Connection refused
Root Cause: Health check runs too early during rolling update (race condition).
Workaround:
// Option 1: Skip health check (recommended)
SKIP_HEALTH_CHECK: true
// Option 2: Longer stabilization wait
sleep 30 // Instead of 10
Timeline:
T+0s: kubectl set image
T+30s: Rollout status = complete
T+40s: sleep 10s
T+50s: Health check (pods might still be starting)
Solution:
Use SKIP_HEALTH_CHECK: true and verify manually after 30-60s:
kubectl get pods -n demo-app -l app=demo-nginx
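The manual check can also be scripted by parsing the default kubectl get pods table. The sketch below treats the rollout as settled only when every listed pod is Running with all containers ready. The `all_pods_ready` helper is hypothetical and assumes the default column layout (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` output on stdin and succeed only
# if there is at least one pod and every pod is Running with READY n/n.
all_pods_ready() {
    awk 'NR == 1 { next }   # skip the header row
         { n++; split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad++ }
         END { exit (n == 0 || bad > 0) }'
}

# Usage (requires cluster access):
#   kubectl get pods -n demo-app -l app=demo-nginx | all_pods_ready \
#       && echo "All pods ready"
```

As an alternative to parsing, kubectl wait --for=condition=Ready pod -l app=demo-nginx -n demo-app --timeout=60s blocks until the pods report Ready or the timeout expires.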
Issue #5: Bash Loop Syntax ✅ FIXED
Error:
Health check attempt {1..5}/5...
# Loop executed only once!
Root Cause:
{1..5} is not expanded by sh/dash; brace expansion requires bash.
Fix:
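#!/bin/bash
# Shebang added so the script runs under bash
set -e
# Explicit list instead of {1..5}, which sh/dash does not expand
for i in 1 2 3 4 5; do
  echo "Health check attempt $i/5..."
  if kubectl exec ...; then
    exit 0
  fi
  if [ "$i" -lt 5 ]; then
    sleep 5
  fi
done
# All attempts failed: exit non-zero so the stage is marked as failed
exit 1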
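If the number of attempts needs to be configurable, a counter-based loop avoids both brace expansion and a hand-written list. The following is a sketch, not the pipeline's code; the `retry` helper is hypothetical and works in plain sh as well as bash.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "Health check attempt $i/$attempts..."
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "All $attempts attempts failed" >&2
    return 1
}

# Usage: retry 5 5 <health check command and its arguments>
```

Returning non-zero after the last attempt matters: without it, a loop that never succeeds would still let the stage pass.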
Best Practices
1. Always Use DRY_RUN First
Step 1: DRY_RUN=true → Review the plan
Step 2: Verify output
Step 3: DRY_RUN=false → Execute
2. Use SKIP_HEALTH_CHECK for Emergency
Emergency rollback:
├─ SKIP_HEALTH_CHECK: true
├─ Focus on speed
└─ Verify manually after
3. Document Rollback Reason
Add a comment to the Jenkins build:
Build Comment:
"Rollback due to: API errors in main-23
Previous working version: main-21
Impact: None (zero downtime)"
4. Monitor After Rollback
# Watch pods
watch kubectl get pods -n demo-app
# Check logs
kubectl logs -n demo-app -l app=demo-nginx -f
# Verify image
kubectl get deployment demo-nginx -n demo-app \
-o jsonpath='{.spec.template.spec.containers[0].image}'
5. Verify in ArgoCD
ArgoCD UI → demo-nginx
├─ Status: Synced ✅
└─ Health: Healthy ✅
Examples
Example 1: Quick Rollback to Previous Build
Scenario: Build #23 failed, rollback to #21
Steps:
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG + main-21
3. SKIP_HEALTH_CHECK: true
4. Build
Time: ~2 minutes
Result: ✅ SUCCESS
Example 2: Rollback to Last Week's Version
Scenario: Need stable version from last week
Steps:
1. Find old build: Jenkins → Build History → #15
2. Check image tag: main-15
3. Jenkins → demo-nginx-rollback
4. IMAGE_TAG + main-15
5. DRY_RUN: true (verify first!)
6. DRY_RUN: false (execute)
Result: ✅ Rolled back to main-15
Example 3: Rollback by Revision Number
Scenario: Roll back 3 versions
Steps:
1. Check history:
kubectl rollout history deployment/demo-nginx -n demo-app
2. Find revision: 25 (current: 28)
3. Jenkins → demo-nginx-rollback
4. REVISION_NUMBER + 25
5. Build
Result: ✅ Rolled back to revision 25
Example 4: Rollback by Git Commit
Scenario: You need the exact state of the code
Steps:
1. Find commit:
git log --oneline apps/demo-nginx/deployment.yaml
2. Copy SHA: abc123def
3. Jenkins → demo-nginx-rollback
4. GIT_COMMIT + abc123def
5. Build
Result: ✅ Rolled back to commit abc123def
Manual Verification Commands
Check Deployment Status
kubectl get deployment demo-nginx -n demo-app
# Expected:
NAME READY UP-TO-DATE AVAILABLE AGE
demo-nginx 2/2 2 2 15h
Check Image Version
kubectl get deployment demo-nginx -n demo-app \
-o jsonpath='{.spec.template.spec.containers[0].image}'
# Expected: docker.io/vladcrypto/demo-nginx:main-21
Check Pods
kubectl get pods -n demo-app -l app=demo-nginx
# Expected: 2 pods Running
Check Rollout History
kubectl rollout history deployment/demo-nginx -n demo-app
# Shows all revisions
Test Health Endpoint
POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec $POD -n demo-app -- wget -q -O- http://localhost/health
# Expected: healthy
Emergency Rollback Procedure
If Production is Down
Option 1: Jenkins (2 minutes)
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG → last known good version
3. SKIP_HEALTH_CHECK: ✅ true
4. Build
Option 2: kubectl (30 seconds)
# Fastest - rollback to previous
kubectl rollout undo deployment/demo-nginx -n demo-app
# To specific revision
kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=25
Option 3: ArgoCD (1 minute)
1. ArgoCD UI → demo-nginx
2. History → Select previous version
3. Rollback button
Configuration Reference
Environment Variables
APP_NAME = 'demo-nginx' // Deployment name
CONTAINER_NAME = 'nginx' // Container name
NAMESPACE = 'demo-app' // K8s namespace
DOCKER_REGISTRY = 'docker.io' // Registry
DOCKER_REPO = 'vladcrypto' // Docker Hub user
HEALTH_CHECK_TIMEOUT = '300s' // Rollout timeout
Customization
Change the settings in Jenkinsfile.rollback:
// Increase the timeout
HEALTH_CHECK_TIMEOUT = '600s'
// More health check attempts
for i in 1 2 3 4 5 6 7 8 9 10; do
// Wait longer for stabilization
sleep 30 // Instead of 10
Monitoring & Alerts
Grafana Dashboard
# Rollback count
sum(increase(deployment_rollback_total[1h])) by (deployment)
# Rollback rate
rate(deployment_rollback_total[5m])
# Average rollback duration
avg(deployment_rollback_duration_seconds)
Alert Rules
- alert: FrequentRollbacks
expr: increase(deployment_rollback_total[1h]) > 2
annotations:
summary: "Frequent rollbacks detected"
description: "More than 2 rollbacks in last hour"
- alert: RollbackFailed
expr: deployment_rollback_failed_total > 0
annotations:
summary: "Rollback failed"
description: "Manual intervention required"
Summary of All Fixes
| # | Issue | Fix | Status |
|---|---|---|---|
| 1 | Container name wrong | Use nginx not demo-nginx | ✅ Fixed |
| 2 | Whitespace in input | Auto-trim with .trim() | ✅ Fixed |
| 3 | RBAC pods/exec | Add permission to ClusterRole | ✅ Fixed |
| 4 | Health check timing | Use SKIP_HEALTH_CHECK=true | ⚠️ Workaround |
| 5 | Bash loop syntax | Use explicit list 1 2 3 4 5 | ✅ Fixed |
Success Criteria
✅ Rollback Methods: 3/3 working (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
✅ GitOps Sync: Changes are committed to Git automatically
✅ Zero Downtime: Rolling updates
✅ RBAC: Full permissions configured
✅ Input Validation: Whitespace auto-trimmed
✅ DRY_RUN: Safe testing mode
✅ Retry Logic: 5 attempts with proper bash syntax
⚠️ Health Check: Optional (use SKIP_HEALTH_CHECK=true)
FAQ
Q: The health check always fails. Is that normal?
A: Yes, because of a timing race condition during the rolling update. Use SKIP_HEALTH_CHECK: true and verify manually after 30-60s.
Q: How do I roll back several versions?
A: Use the REVISION_NUMBER method and specify the revision you need from kubectl rollout history.
Q: Can I roll back only in staging?
A: Yes. Change NAMESPACE in the Jenkinsfile or create a separate job for staging.
Q: How do I roll back quickly in an emergency?
A: Use kubectl rollout undo (30 seconds) or Jenkins with SKIP_HEALTH_CHECK=true (2 minutes).
Q: What if the Git commit fails?
A: The rollback has still been applied in Kubernetes. Git is only needed for GitOps sync. ArgoCD re-syncs after about 3 minutes, so if automated sync with self-heal is enabled it may revert the cluster to the image still recorded in Git; push the manifest change manually to keep the rollback.
Related Documentation
- CI/CD Guide
- Automatic Rollback - See post { failure } section
- Jenkins RBAC
- Deployment Manifest
Support
Issues?
- Check Jenkins console output
- Verify RBAC permissions
- Check pod status: kubectl get pods -n demo-app
- Review ArgoCD sync status
Need Help?
- Jenkins logs: Jenkins → Build → Console Output
- Kubernetes events: kubectl get events -n demo-app
- Pod logs: kubectl logs -n demo-app -l app=demo-nginx
Last Updated: 2026-01-06
Version: 1.0
Status: Production Ready ✅