# 🔄 Manual Rollback Feature - Complete Documentation

## 📋 Table of Contents

1. [Overview](#overview)
2. [Features](#features)
3. [Setup Guide](#setup-guide)
4. [Usage Guide](#usage-guide)
5. [Rollback Methods](#rollback-methods)
6. [Troubleshooting & Fixes](#troubleshooting--fixes)
7. [Best Practices](#best-practices)
8. [Examples](#examples)

---

## Overview

The Manual Rollback feature rolls a deployment back to any previous version through a Jenkins Pipeline.

### Key Features:

- ✅ **3 rollback methods** (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
- ✅ **GitOps sync** - automatically updates the Git manifests
- ✅ **Zero downtime** - rolling updates
- ✅ **DRY_RUN mode** - safe testing
- ✅ **Health checks** - optional verification after the rollback
- ✅ **Full RBAC** - correct permissions

---

## Features

### Rollback Methods

| Method | Description | Example | Use Case |
|--------|-------------|---------|----------|
| **IMAGE_TAG** | By Docker image tag | `main-21` | You know the exact build number |
| **REVISION_NUMBER** | By Kubernetes revision | `2` | Roll back N steps in the rollout history |
| **GIT_COMMIT** | By Git commit SHA | `abc123def` | Exact state of the code |

### Parameters

```groovy
ROLLBACK_METHOD      // Rollback method to use
TARGET_VERSION       // Target version (whitespace is trimmed automatically)
SKIP_HEALTH_CHECK    // Skip health checks (default: false)
DRY_RUN              // Only show the plan (default: false)
```

---

## Setup Guide

### Step 1: Create Jenkins Pipeline

```
1. Jenkins → New Item
2. Name: demo-nginx-rollback
3. Type: Pipeline
4. Click OK
```

### Step 2: Configure Pipeline

```yaml
Pipeline:
  Definition: Pipeline script from SCM
  SCM: Git
  Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
  Credentials: gitea-credentials
  Branch: "*/main"
  Script Path: apps/demo-nginx/Jenkinsfile.rollback
```

### Step 3: Verify RBAC

RBAC is already configured in `apps/jenkins/rbac.yaml`:

```yaml
ClusterRole: jenkins-deployer
Permissions:
  - pods, services, deployments (full CRUD)
  - pods/exec, pods/log (for health checks)
  - ingresses, applications (for ArgoCD)
```

### Step 4: Test with DRY_RUN

```
Jenkins → demo-nginx-rollback → Build with Parameters
├─ ROLLBACK_METHOD: IMAGE_TAG
├─ TARGET_VERSION: main-21
├─ DRY_RUN: ✅ true
└─ Build
```

---

## Usage Guide

### Quick Start

```
Jenkins → demo-nginx-rollback → Build with Parameters

┌────────────────────────────────────────┐
│ ROLLBACK_METHOD: IMAGE_TAG             │
│ TARGET_VERSION: main-21                │
│ SKIP_HEALTH_CHECK: true (recommended)  │
│ DRY_RUN: false                         │
└────────────────────────────────────────┘

→ Build → ✅ SUCCESS!
```

### Pipeline Stages

```
Stage 1: Validate Input
  └─ Trim whitespace, validate TARGET_VERSION

Stage 2: Show Current State
  └─ Current deployment, image, pods, history

Stage 3: Prepare Rollback
  └─ Build target image path or verify revision

Stage 4: Execute Rollback
  ├─ kubectl set image (or rollout undo)
  └─ Git commit & push

Stage 5: Wait for Rollout
  ├─ kubectl rollout status (300s timeout)
  └─ sleep 10s (stabilization)

Stage 6: Health Check (optional)
  └─ 5 retry attempts with 5s delay

Stage 7: Show New State
  └─ New deployment state, pods, history
```

---

## Rollback Methods

### Method 1: IMAGE_TAG (Recommended)

**When to use:** You know the exact build number.

**How to find the tag:**

```bash
# Docker Hub (open in a browser)
# https://hub.docker.com/r/vladcrypto/demo-nginx/tags

# Jenkins build history (UI)
# Jenkins → demo-nginx → Build History

# Git commits
git log --oneline | grep "Update image"
```

**Example:**

```
ROLLBACK_METHOD: IMAGE_TAG
TARGET_VERSION: main-21

Result: Rollback to docker.io/vladcrypto/demo-nginx:main-21
```

---

### Method 2: REVISION_NUMBER

**When to use:** You need to roll back N steps.

**How to find the revision:**

```bash
kubectl rollout history deployment/demo-nginx -n demo-app

# Output:
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         Update to main-20
# 3         Update to main-21
# 4         Update to main-22 (current)
```

**Example:**

```
ROLLBACK_METHOD: REVISION_NUMBER
TARGET_VERSION: 2

Result: Rollback to revision 2 (main-20)
```

---

### Method 3: GIT_COMMIT

**When to use:** You need to return to a specific state of the code.

**How to find the commit:**

```bash
# Gitea (open in a browser)
# https://git.thedevops.dev/admin/k3s-gitops/commits/branch/main

# Git CLI
git log --oneline apps/demo-nginx/deployment.yaml

# Output:
# abc123d Update image to main-22 (current)
# def456e Update image to main-21
# ghi789f Update image to main-20
```

**Example:**

```
ROLLBACK_METHOD: GIT_COMMIT
TARGET_VERSION: def456e

Result: Rollback to commit def456e
```

---

## Troubleshooting & Fixes

### Issue #1: Container Name Error ✅ FIXED

**Error:**

```
error: unable to find container named "demo-nginx"
```

**Root Cause:** The pipeline used the deployment name instead of the container name.

**Fix:**

```groovy
environment {
    APP_NAME = 'demo-nginx'      // Deployment name
    CONTAINER_NAME = 'nginx'     // Container name ✅
}

// Shell command run by the pipeline:
kubectl set image deployment/${APP_NAME} \
    ${CONTAINER_NAME}=${TARGET_IMAGE}
```

**How to verify:**

```bash
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].name}'

# Output: nginx
```

---

### Issue #2: Whitespace in Input ✅ FIXED

**Error:**

```
Target image: docker.io/vladcrypto/demo-nginx: main-21
                                              ^ Space!
```

**Root Cause:** The user entered TARGET_VERSION with a stray space.

**Fix:**

```groovy
stage('Validate Input') {
    steps {
        script {
            // Auto-trim whitespace
            env.TARGET_VERSION_CLEAN = params.TARGET_VERSION.trim()
            // From here on, use env.TARGET_VERSION_CLEAN everywhere
        }
    }
}
```

---

### Issue #3: RBAC Permissions ✅ FIXED

**Error:**

```
Error: User "system:serviceaccount:jenkins:jenkins" cannot create resource "pods/exec"
```

**Root Cause:** The Jenkins ServiceAccount had no permission on pods/exec, which the health checks need.

**Fix:**

```yaml
# apps/jenkins/rbac.yaml
rules:
  - apiGroups: [""]
    resources: ["pods/exec", "pods/log"]  # ← Added!
    verbs: ["create", "get"]
```

**Applied:**

```bash
kubectl apply -f apps/jenkins/rbac.yaml
```

---

### Issue #4: Health Check Timing ⚠️ WORKAROUND

**Error:**

```
wget: can't connect to remote host: Connection refused
```

**Root Cause:** The health check runs too early during the rolling update (race condition).

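An alternative to skipping the check is to retry the probe until the new pods answer. The helper below is a minimal POSIX `sh` sketch of that idea, not code taken from `Jenkinsfile.rollback`: the function name `retry_until_ready` and its argument order are hypothetical, and the `kubectl exec` line in the comment only shows one assumed way to wire it in.

```shell
#!/bin/sh
# Hypothetical helper: rerun a probe command until it succeeds or attempts run out.
# Usage: retry_until_ready <attempts> <delay_seconds> <command...>
retry_until_ready() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "Health check attempt $i/$attempts..."
        if "$@"; then
            echo "Healthy after $i attempt(s)"
            return 0
        fi
        # Sleep only between attempts, not after the last one
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "Still unhealthy after $attempts attempts" >&2
    return 1
}

# Assumed usage (POD resolved beforehand, as in the manual verification commands):
# retry_until_ready 10 5 kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health
```

Because the loop uses a counter rather than brace expansion, it behaves the same under `sh`, `dash`, and `bash`. Whether a longer retry window fully closes the race depends on how long the new pods take to start, so treat the attempt count and delay as values to tune.
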
**Workaround:**

```groovy
// Option 1: Skip health check (recommended); set as a build parameter
SKIP_HEALTH_CHECK: true

// Option 2: Longer stabilization wait
sleep 30  // Instead of 10
```

**Timeline:**

```
T+0s:  kubectl set image
T+30s: Rollout status = complete
T+40s: sleep 10s
T+50s: Health check (pods might still be starting)
```

**Solution:** Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s:

```bash
kubectl get pods -n demo-app -l app=demo-nginx
```

---

### Issue #5: Bash Loop Syntax ✅ FIXED

**Error:**

```
Health check attempt {1..5}/5...
# Loop executed only once!
```

**Root Cause:** `{1..5}` brace expansion does not work in sh/dash; it requires bash.

**Fix:**

```bash
#!/bin/bash
# ↑ Added shebang (keep it alone on the first line)
set -e

# Fixed loop syntax
for i in 1 2 3 4 5; do  # Instead of {1..5}
    echo "Health check attempt $i/5..."
    if kubectl exec ...; then
        exit 0
    fi
    if [ "$i" -lt 5 ]; then
        sleep 5
    fi
done

# All attempts failed
echo "Health check failed after 5 attempts" >&2
exit 1
```

---

## Best Practices

### 1. Always Use DRY_RUN First

```
Step 1: DRY_RUN=true → Review the plan
Step 2: Verify output
Step 3: DRY_RUN=false → Execute
```

### 2. Use SKIP_HEALTH_CHECK for Emergency

```
Emergency rollback:
├─ SKIP_HEALTH_CHECK: true
├─ Focus on speed
└─ Verify manually after
```

### 3. Document Rollback Reason

Add a comment to the Jenkins build:

```
Build Comment: "Rollback due to: API errors in main-23
Previous working version: main-21
Impact: None (zero downtime)"
```

### 4. Monitor After Rollback

```bash
# Watch pods
watch kubectl get pods -n demo-app

# Check logs
kubectl logs -n demo-app -l app=demo-nginx -f

# Verify image
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```

### 5. Verify in ArgoCD

```
ArgoCD UI → demo-nginx
├─ Status: Synced ✅
└─ Health: Healthy ✅
```

---

## Examples

### Example 1: Quick Rollback to Previous Build

```
Scenario: Build #23 failed, rollback to #21

Steps:
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG + main-21
3. SKIP_HEALTH_CHECK: true
4. Build

Time: ~2 minutes
Result: ✅ SUCCESS
```

---

### Example 2: Rollback to Last Week's Version

```
Scenario: Need stable version from last week

Steps:
1. Find old build: Jenkins → Build History → #15
2. Check image tag: main-15
3. Jenkins → demo-nginx-rollback
4. IMAGE_TAG + main-15
5. DRY_RUN: true (verify first!)
6. DRY_RUN: false (execute)

Result: ✅ Rolled back to main-15
```

---

### Example 3: Rollback by Revision Number

```
Scenario: Roll back 3 versions

Steps:
1. Check history:
   kubectl rollout history deployment/demo-nginx -n demo-app
2. Find revision: 25 (current: 28)
3. Jenkins → demo-nginx-rollback
4. REVISION_NUMBER + 25
5. Build

Result: ✅ Rolled back to revision 25
```

---

### Example 4: Rollback by Git Commit

```
Scenario: Need the exact state of the code

Steps:
1. Find commit:
   git log --oneline apps/demo-nginx/deployment.yaml
2. Copy SHA: abc123def
3. Jenkins → demo-nginx-rollback
4. GIT_COMMIT + abc123def
5. Build

Result: ✅ Rolled back to commit abc123def
```

---

## Manual Verification Commands

### Check Deployment Status

```bash
kubectl get deployment demo-nginx -n demo-app

# Expected:
# NAME         READY   UP-TO-DATE   AVAILABLE   AGE
# demo-nginx   2/2     2            2           15h
```

### Check Image Version

```bash
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Expected: docker.io/vladcrypto/demo-nginx:main-21
```

### Check Pods

```bash
kubectl get pods -n demo-app -l app=demo-nginx

# Expected: 2 pods Running
```

### Check Rollout History

```bash
kubectl rollout history deployment/demo-nginx -n demo-app

# Shows all revisions
```

### Test Health Endpoint

```bash
POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health

# Expected: healthy
```

---

## Emergency Rollback Procedure

### If Production is Down

**Option 1: Jenkins (2 minutes)**

```
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG → last known good version
3. SKIP_HEALTH_CHECK: ✅ true
4. Build
```

**Option 2: kubectl (30 seconds)**

```bash
# Fastest - rollback to previous
kubectl rollout undo deployment/demo-nginx -n demo-app

# To specific revision
kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=25
```

**Option 3: ArgoCD (1 minute)**

```
1. ArgoCD UI → demo-nginx
2. History → Select previous version
3. Rollback button
```

---

## Configuration Reference

### Environment Variables

```groovy
APP_NAME = 'demo-nginx'           // Deployment name
CONTAINER_NAME = 'nginx'          // Container name
NAMESPACE = 'demo-app'            // K8s namespace
DOCKER_REGISTRY = 'docker.io'     // Registry
DOCKER_REPO = 'vladcrypto'        // Docker Hub user
HEALTH_CHECK_TIMEOUT = '300s'     // Rollout timeout
```

### Customization

Change these settings in `Jenkinsfile.rollback`:

```groovy
// Increase the timeout
HEALTH_CHECK_TIMEOUT = '600s'

// More health check attempts (inside the shell step)
for i in 1 2 3 4 5 6 7 8 9 10; do

// Wait longer for stabilization
sleep 30  // Instead of 10
```

---

## Monitoring & Alerts

### Grafana Dashboard

```promql
# Rollback count
sum(increase(deployment_rollback_total[1h])) by (deployment)

# Rollback rate
rate(deployment_rollback_total[5m])

# Average rollback duration
avg(deployment_rollback_duration_seconds)
```

### Alert Rules

```yaml
- alert: FrequentRollbacks
  # increase() counts rollbacks over the window; rate() would be per second
  expr: increase(deployment_rollback_total[1h]) > 2
  annotations:
    summary: "Frequent rollbacks detected"
    description: "More than 2 rollbacks in last hour"

- alert: RollbackFailed
  expr: deployment_rollback_failed_total > 0
  annotations:
    summary: "Rollback failed"
    description: "Manual intervention required"
```

---

## Summary of All Fixes

| # | Issue | Fix | Status |
|---|-------|-----|--------|
| 1 | Container name wrong | Use `nginx` not `demo-nginx` | ✅ Fixed |
| 2 | Whitespace in input | Auto-trim with `.trim()` | ✅ Fixed |
| 3 | RBAC pods/exec | Add permission to ClusterRole | ✅ Fixed |
| 4 | Health check timing | Use `SKIP_HEALTH_CHECK=true` | ⚠️ Workaround |
| 5 | Bash loop syntax | Use explicit list `1 2 3 4 5` | ✅ Fixed |

---

## Success Criteria

- ✅ **Rollback Methods:** 3/3 working (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
- ✅ **GitOps Sync:** Git commits automatically
- ✅ **Zero Downtime:** Rolling updates
- ✅ **RBAC:** Full permissions configured
- ✅ **Input Validation:** Whitespace auto-trimmed
- ✅ **DRY_RUN:** Safe testing mode
- ✅ **Retry Logic:** 5 attempts with proper bash syntax
- ⚠️ **Health Check:** Optional (use SKIP_HEALTH_CHECK=true)

---

## FAQ

### Q: The health check always fails. Is that expected?

**A:** Yes, because of the timing race condition during the rolling update. Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s.

### Q: How do I roll back several versions?

**A:** Use the `REVISION_NUMBER` method and pass the revision you need from `kubectl rollout history`.

### Q: Can I roll back only staging?

**A:** Yes, change `NAMESPACE` in the Jenkinsfile or create a separate job for staging.

### Q: How do I roll back quickly in an emergency?

**A:** Use `kubectl rollout undo` (30 seconds) or Jenkins with `SKIP_HEALTH_CHECK=true` (2 minutes).

### Q: What if the Git commit fails?

**A:** The rollback has still happened in Kubernetes. Git is only needed for GitOps sync, and ArgoCD re-syncs after about 3 minutes. If ArgoCD auto-sync with self-heal is enabled, that sync restores whatever version Git still records, so push the manifest change manually to keep the rollback in place.

---

## Related Documentation

- [CI/CD Guide](../../../CICD_GUIDE.md)
- [Automatic Rollback](../Jenkinsfile) - See `post { failure }` section
- [Jenkins RBAC](../../jenkins/rbac.yaml)
- [Deployment Manifest](../deployment.yaml)

---

## Support

**Issues?**

- Check Jenkins console output
- Verify RBAC permissions
- Check pod status: `kubectl get pods -n demo-app`
- Review ArgoCD sync status

**Need Help?**

- Jenkins logs: Jenkins → Build → Console Output
- Kubernetes events: `kubectl get events -n demo-app`
- Pod logs: `kubectl logs -n demo-app -l app=demo-nginx`

---

**Last Updated:** 2026-01-06
**Version:** 1.0
**Status:** Production Ready ✅