diff --git a/apps/demo-nginx/docs/ROLLBACK_MANUAL.md b/apps/demo-nginx/docs/ROLLBACK_MANUAL.md
new file mode 100644
index 0000000..c50207c
--- /dev/null
+++ b/apps/demo-nginx/docs/ROLLBACK_MANUAL.md
@@ -0,0 +1,717 @@
+# 🔄 Manual Rollback Feature - Complete Documentation
+
+## 📋 Table of Contents
+
+1. [Overview](#overview)
+2. [Features](#features)
+3. [Setup Guide](#setup-guide)
+4. [Usage Guide](#usage-guide)
+5. [Rollback Methods](#rollback-methods)
+6. [Troubleshooting & Fixes](#troubleshooting--fixes)
+7. [Best Practices](#best-practices)
+8. [Examples](#examples)
+
+---
+
+## Overview
+
+The Manual Rollback feature rolls a deployment back to any previous version through a Jenkins Pipeline.
+
+### Key Features:
+- ✅ **3 rollback methods** (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
+- ✅ **GitOps sync** - automatically updates the Git manifests
+- ✅ **Zero downtime** - rolling updates
+- ✅ **DRY_RUN mode** - safe testing
+- ✅ **Health checks** - optional verification after rollback
+- ✅ **Full RBAC** - required permissions configured
+
+---
+
+## Features
+
+### Rollback Methods
+
+| Method | Description | Example | Use Case |
+|--------|-------------|---------|----------|
+| **IMAGE_TAG** | By Docker image tag | `main-21` | You know the specific build number |
+| **REVISION_NUMBER** | By Kubernetes revision | `2` | Roll back several steps to a known revision |
+| **GIT_COMMIT** | By Git commit SHA | `abc123def` | Exact state of the code |
+
+### Parameters
+
+```groovy
+ROLLBACK_METHOD      // Rollback method to use
+TARGET_VERSION       // Target version (whitespace is trimmed automatically)
+SKIP_HEALTH_CHECK    // Skip health checks (default: false)
+DRY_RUN              // Only show the plan (default: false)
+```
+
+---
+
+## Setup Guide
+
+### Step 1: Create Jenkins Pipeline
+
+```
+1. Jenkins → New Item
+2. Name: demo-nginx-rollback
+3. Type: Pipeline
+4. Click OK
+```
+
+### Step 2: Configure Pipeline
+
+```yaml
+Pipeline:
+  Definition: Pipeline script from SCM
+  SCM: Git
+  Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
+  Credentials: gitea-credentials
+  Branch: */main
+  Script Path: apps/demo-nginx/Jenkinsfile.rollback
+```
+
+### Step 3: Verify RBAC
+
+RBAC is already configured in `apps/jenkins/rbac.yaml`:
+
+```yaml
+ClusterRole: jenkins-deployer
+Permissions:
+  - pods, services, deployments (full CRUD)
+  - pods/exec, pods/log (for health checks)
+  - ingresses, applications (for ArgoCD)
+```
+
+### Step 4: Test with DRY_RUN
+
+```
+Jenkins → demo-nginx-rollback → Build with Parameters
+├─ ROLLBACK_METHOD: IMAGE_TAG
+├─ TARGET_VERSION: main-21
+├─ DRY_RUN: ✅ true
+└─ Build
+```
+
+---
+
+## Usage Guide
+
+### Quick Start
+
+```
+Jenkins → demo-nginx-rollback → Build with Parameters
+
+┌───────────────────────────────────────┐
+│ ROLLBACK_METHOD: IMAGE_TAG            │
+│ TARGET_VERSION: main-21               │
+│ SKIP_HEALTH_CHECK: true (recommended) │
+│ DRY_RUN: false                        │
+└───────────────────────────────────────┘
+
+→ Build → ✅ SUCCESS!
+```
+
+### Pipeline Stages
+
+```
+Stage 1: Validate Input
+  └─ Trim whitespace, validate TARGET_VERSION
+
+Stage 2: Show Current State
+  └─ Current deployment, image, pods, history
+
+Stage 3: Prepare Rollback
+  └─ Build target image path or verify revision
+
+Stage 4: Execute Rollback
+  ├─ kubectl set image (or rollout undo)
+  └─ Git commit & push
+
+Stage 5: Wait for Rollout
+  ├─ kubectl rollout status (300s timeout)
+  └─ sleep 10s (stabilization)
+
+Stage 6: Health Check (optional)
+  └─ 5 retry attempts with 5s delay
+
+Stage 7: Show New State
+  └─ New deployment state, pods, history
+```
+
+---
+
+## Rollback Methods
+
+### Method 1: IMAGE_TAG (Recommended)
+
+**When to use:** You know the specific build number
+
+**How to find the tag:**
+```bash
+# Docker Hub
+https://hub.docker.com/r/vladcrypto/demo-nginx/tags
+
+# Jenkins build history
+Jenkins → demo-nginx → Build History
+
+# Git commits
+git log --oneline | grep "Update image"
+```
+
+**Example:**
+```
+ROLLBACK_METHOD: IMAGE_TAG
+TARGET_VERSION: main-21
+
+Result: Rollback to docker.io/vladcrypto/demo-nginx:main-21
+```
+
+---
+
+### Method 2: REVISION_NUMBER
+
+**When to use:** You need to roll back several steps to a known revision
+
+**How to find the revision:**
+```bash
+kubectl rollout history deployment/demo-nginx -n demo-app
+
+# Output:
+REVISION  CHANGE-CAUSE
+1         Initial deployment
+2         Update to main-20
+3         Update to main-21
+4         Update to main-22 (current)
+```
+
+**Example:**
+```
+ROLLBACK_METHOD: REVISION_NUMBER
+TARGET_VERSION: 2
+
+Result: Rollback to revision 2 (main-20)
+```
+
+---
+
+### Method 3: GIT_COMMIT
+
+**When to use:** You need to return to a specific state of the code
+
+**How to find the commit:**
+```bash
+# Gitea
+https://git.thedevops.dev/admin/k3s-gitops/commits/branch/main
+
+# Git CLI
+git log --oneline apps/demo-nginx/deployment.yaml
+
+# Output:
+abc123d Update image to main-22 (current)
+def456e Update image to main-21
+ghi789f Update image to main-20
+```
+
+**Example:**
+```
+ROLLBACK_METHOD: GIT_COMMIT
+TARGET_VERSION: def456e
+
+Result: Rollback to commit def456e
+```
+
+---
+
+## Troubleshooting & Fixes
+
+### Issue #1: Container Name Error ✅ FIXED
+
+**Error:**
+```
+error: unable to find container named "demo-nginx"
+```
+
+**Root Cause:**
+The pipeline used the deployment name instead of the container name.
+
+**Fix:**
+```groovy
+environment {
+    APP_NAME = 'demo-nginx'      // Deployment name
+    CONTAINER_NAME = 'nginx'     // Container name ✅
+}
+
+// Inside the sh step:
+kubectl set image deployment/${APP_NAME} \
+    ${CONTAINER_NAME}=${TARGET_IMAGE}
+```
+
+**How to verify:**
+```bash
+kubectl get deployment demo-nginx -n demo-app \
+  -o jsonpath='{.spec.template.spec.containers[0].name}'
+# Output: nginx
+```
+
+---
+
+### Issue #2: Whitespace in Input ✅ FIXED
+
+**Error:**
+```
+Target image: docker.io/vladcrypto/demo-nginx: main-21
+                                               ^
+                                               Space!
+```
+
+**Root Cause:**
+The user entered TARGET_VERSION with a leading space.
+
+**Fix:**
+```groovy
+stage('Validate Input') {
+    steps {
+        script {
+            // Auto-trim whitespace
+            env.TARGET_VERSION_CLEAN = params.TARGET_VERSION.trim()
+        }
+    }
+}
+
+// In later stages, use ${env.TARGET_VERSION_CLEAN} instead of params.TARGET_VERSION
+```
+
+---
+
+### Issue #3: RBAC Permissions ✅ FIXED
+
+**Error:**
+```
+Error: User "system:serviceaccount:jenkins:jenkins"
+cannot create resource "pods/exec"
+```
+
+**Root Cause:**
+The Jenkins ServiceAccount lacked permission for pods/exec, which the health checks need.
+
+**Fix:**
+```yaml
+# apps/jenkins/rbac.yaml
+rules:
+- apiGroups: [""]
+  resources: ["pods/exec", "pods/log"]  # ← Added!
+  verbs: ["create", "get"]
+```
+
+**Applied:**
+```bash
+kubectl apply -f apps/jenkins/rbac.yaml
+```
+
+---
+
+### Issue #4: Health Check Timing ⚠️ WORKAROUND
+
+**Error:**
+```
+wget: can't connect to remote host: Connection refused
+```
+
+**Root Cause:**
+The health check runs too early during the rolling update (race condition).
+
+**Workaround:**
+```groovy
+// Option 1: Skip health check (recommended)
+SKIP_HEALTH_CHECK: true
+
+// Option 2: Longer stabilization wait
+sleep 30  // Instead of 10
+```
+
+**Timeline:**
+```
+T+0s:  kubectl set image
+T+30s: Rollout status = complete
+T+40s: sleep 10s
+T+50s: Health check (pods might still be starting)
+```
+
+**Solution:**
+Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s:
+
+```bash
+kubectl get pods -n demo-app -l app=demo-nginx
+```
+
+---
+
+### Issue #5: Bash Loop Syntax ✅ FIXED
+
+**Error:**
+```
+Health check attempt {1..5}/5...
+# Loop executed only once!
+```
+
+**Root Cause:**
+Brace expansion `{1..5}` does not work in sh/dash; it requires bash.
+
+**Fix:**
+```bash
+#!/bin/bash
+# Shebang added; it must be the first line, with no trailing comment
+set -e
+
+# Explicit list instead of {1..5}
+for i in 1 2 3 4 5; do
+  echo "Health check attempt $i/5..."
+  if kubectl exec ...; then
+    exit 0
+  fi
+  if [ "$i" -lt 5 ]; then
+    sleep 5
+  fi
+done
+exit 1  # All attempts failed
+```
+
+---
+
+## Best Practices
+
+### 1. Always Use DRY_RUN First
+
+```
+Step 1: DRY_RUN=true  → Review the plan
+Step 2: Verify output
+Step 3: DRY_RUN=false → Execute
+```
+
+### 2. Use SKIP_HEALTH_CHECK for Emergency
+
+```
+Emergency rollback:
+├─ SKIP_HEALTH_CHECK: true
+├─ Focus on speed
+└─ Verify manually after
+```
+
+### 3. Document Rollback Reason
+
+Add a comment to the Jenkins build:
+```
+Build Comment:
+"Rollback due to: API errors in main-23
+Previous working version: main-21
+Impact: None (zero downtime)"
+```
+
+### 4. Monitor After Rollback
+
+```bash
+# Watch pods
+watch kubectl get pods -n demo-app
+
+# Check logs
+kubectl logs -n demo-app -l app=demo-nginx -f
+
+# Verify image
+kubectl get deployment demo-nginx -n demo-app \
+  -o jsonpath='{.spec.template.spec.containers[0].image}'
+```
+
+### 5. Verify in ArgoCD
+
+```
+ArgoCD UI → demo-nginx
+├─ Status: Synced ✅
+└─ Health: Healthy ✅
+```
+
+---
+
+## Examples
+
+### Example 1: Quick Rollback to Previous Build
+
+```
+Scenario: Build #23 failed, rollback to #21
+
+Steps:
+1. Jenkins → demo-nginx-rollback
+2. IMAGE_TAG + main-21
+3. SKIP_HEALTH_CHECK: true
+4. Build
+
+Time: ~2 minutes
+Result: ✅ SUCCESS
+```
+
+---
+
+### Example 2: Rollback to Last Week's Version
+
+```
+Scenario: Need stable version from last week
+
+Steps:
+1. Find old build: Jenkins → Build History → #15
+2. Check image tag: main-15
+3. Jenkins → demo-nginx-rollback
+4. IMAGE_TAG + main-15
+5. DRY_RUN: true (verify first!)
+6. DRY_RUN: false (execute)
+
+Result: ✅ Rolled back to main-15
+```
+
+---
+
+### Example 3: Rollback by Revision Number
+
+```
+Scenario: Roll back 3 versions
+
+Steps:
+1. Check history:
+   kubectl rollout history deployment/demo-nginx -n demo-app
+
+2. Find revision: 25 (current: 28)
+
+3. Jenkins → demo-nginx-rollback
+4. REVISION_NUMBER + 25
+5. Build
+
+Result: ✅ Rolled back to revision 25
+```
+
+---
+
+### Example 4: Rollback by Git Commit
+
+```
+Scenario: Need the exact state of the code
+
+Steps:
+1. Find commit:
+   git log --oneline apps/demo-nginx/deployment.yaml
+
+2. Copy SHA: abc123def
+
+3. Jenkins → demo-nginx-rollback
+4. GIT_COMMIT + abc123def
+5. Build
+
+Result: ✅ Rolled back to commit abc123def
+```
+
+---
+
+## Manual Verification Commands
+
+### Check Deployment Status
+```bash
+kubectl get deployment demo-nginx -n demo-app
+
+# Expected:
+NAME         READY   UP-TO-DATE   AVAILABLE   AGE
+demo-nginx   2/2     2            2           15h
+```
+
+### Check Image Version
+```bash
+kubectl get deployment demo-nginx -n demo-app \
+  -o jsonpath='{.spec.template.spec.containers[0].image}'
+
+# Expected: docker.io/vladcrypto/demo-nginx:main-21
+```
+
+### Check Pods
+```bash
+kubectl get pods -n demo-app -l app=demo-nginx
+
+# Expected: 2 pods Running
+```
+
+### Check Rollout History
+```bash
+kubectl rollout history deployment/demo-nginx -n demo-app
+
+# Shows all revisions
+```
+
+### Test Health Endpoint
+```bash
+POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
+kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health
+
+# Expected: healthy
+```
+
+---
+
+## Emergency Rollback Procedure
+
+### If Production is Down
+
+**Option 1: Jenkins (2 minutes)**
+```
+1. Jenkins → demo-nginx-rollback
+2. IMAGE_TAG → last known good version
+3. SKIP_HEALTH_CHECK: ✅ true
+4. Build
+```
+
+**Option 2: kubectl (30 seconds)**
+```bash
+# Fastest - rollback to previous
+kubectl rollout undo deployment/demo-nginx -n demo-app
+
+# To specific revision
+kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=25
+```
+
+**Option 3: ArgoCD (1 minute)**
+```
+1. ArgoCD UI → demo-nginx
+2. History → Select previous version
+3. Rollback button
+```
+
+---
+
+## Configuration Reference
+
+### Environment Variables
+
+```groovy
+APP_NAME = 'demo-nginx'           // Deployment name
+CONTAINER_NAME = 'nginx'          // Container name
+NAMESPACE = 'demo-app'            // K8s namespace
+DOCKER_REGISTRY = 'docker.io'     // Registry
+DOCKER_REPO = 'vladcrypto'        // Docker Hub user
+HEALTH_CHECK_TIMEOUT = '300s'     // Rollout timeout
+```
+
+### Customization
+
+Change these settings in Jenkinsfile.rollback:
+
+```groovy
+// Increase the rollout timeout
+HEALTH_CHECK_TIMEOUT = '600s'
+
+// More health check attempts
+for i in 1 2 3 4 5 6 7 8 9 10; do
+
+// Wait longer for stabilization
+sleep 30  // Instead of 10
+```
+
+---
+
+## Monitoring & Alerts
+
+### Grafana Dashboard
+
+```promql
+# Rollback count
+sum(increase(deployment_rollback_total[1h])) by (deployment)
+
+# Rollback rate
+rate(deployment_rollback_total[5m])
+
+# Average rollback duration
+avg(deployment_rollback_duration_seconds)
+```
+
+### Alert Rules
+
+```yaml
+- alert: FrequentRollbacks
+  # increase() counts events over the window; rate() would be per second
+  expr: increase(deployment_rollback_total[1h]) > 2
+  annotations:
+    summary: "Frequent rollbacks detected"
+    description: "More than 2 rollbacks in last hour"
+
+- alert: RollbackFailed
+  expr: deployment_rollback_failed_total > 0
+  annotations:
+    summary: "Rollback failed"
+    description: "Manual intervention required"
+```
+
+---
+
+## Summary of All Fixes
+
+| # | Issue | Fix | Status |
+|---|-------|-----|--------|
+| 1 | Container name wrong | Use `nginx` not `demo-nginx` | ✅ Fixed |
+| 2 | Whitespace in input | Auto-trim with `.trim()` | ✅ Fixed |
+| 3 | RBAC pods/exec | Add permission to ClusterRole | ✅ Fixed |
+| 4 | Health check timing | Use `SKIP_HEALTH_CHECK=true` | ⚠️ Workaround |
+| 5 | Bash loop syntax | Use explicit list `1 2 3 4 5` | ✅ Fixed |
+
+---
+
+## Success Criteria
+
+✅ **Rollback Methods:** 3/3 working (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
+✅ **GitOps Sync:** Git commits automatically
+✅ **Zero Downtime:** Rolling updates
+✅ **RBAC:** Full permissions configured
+✅ **Input Validation:** Whitespace auto-trimmed
+✅ **DRY_RUN:** Safe testing mode
+✅ **Retry Logic:** 5 attempts with proper bash syntax
+⚠️ **Health Check:** Optional (use SKIP_HEALTH_CHECK=true)
+
+---
+
+## FAQ
+
+### Q: The health check always fails. Is that normal?
+
+**A:** Yes, this is a known issue caused by a timing race condition during the rolling update. Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s.
+
+### Q: How do I roll back several versions?
+
+**A:** Use the `REVISION_NUMBER` method and specify the revision you need from `kubectl rollout history`.
+
+### Q: Can I roll back only in staging?
+
+**A:** Yes, change `NAMESPACE` in the Jenkinsfile or create a separate job for staging.
+
+### Q: How do I roll back quickly in an emergency?
+
+**A:** Use `kubectl rollout undo` (30 seconds) or Jenkins with `SKIP_HEALTH_CHECK=true` (2 minutes).
+
+### Q: What if the Git commit fails?
+
+**A:** The rollback has still been applied in Kubernetes; Git is needed only for GitOps sync. ArgoCD re-syncs about every 3 minutes, so if auto-sync with self-heal is enabled it may restore the version still recorded in Git. Push the manifest change manually to keep the rollback.
+
+---
+
+## Related Documentation
+
+- [CI/CD Guide](../../../CICD_GUIDE.md)
+- [Automatic Rollback](../Jenkinsfile) - See `post { failure }` section
+- [Jenkins RBAC](../../jenkins/rbac.yaml)
+- [Deployment Manifest](../deployment.yaml)
+
+---
+
+## Support
+
+**Issues?**
+- Check Jenkins console output
+- Verify RBAC permissions
+- Check pod status: `kubectl get pods -n demo-app`
+- Review ArgoCD sync status
+
+**Need Help?**
+- Jenkins logs: Jenkins → Build → Console Output
+- Kubernetes events: `kubectl get events -n demo-app`
+- Pod logs: `kubectl logs -n demo-app -l app=demo-nginx`
+
+---
+
+**Last Updated:** 2026-01-06
+**Version:** 1.0
+**Status:** Production Ready ✅
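
The health check retry loop from Issue #5 and the longer attempt list under Customization could be factored into one reusable function. The following is a minimal sketch, not code from `Jenkinsfile.rollback`: the `retry` helper and its argument order are hypothetical. It uses only POSIX sh constructs, so it behaves the same under dash and bash and does not depend on `{1..5}` brace expansion.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of Jenkinsfile.rollback):
# run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Note: POSIX sh has no `local`, so attempts/delay/i are global variables.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "Attempt $i/$attempts: $*"
        if "$@"; then
            return 0
        fi
        # Sleep only when another attempt remains.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "All $attempts attempts failed" >&2
    return 1
}

# Usage with the health check command from this document (needs cluster access):
# POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
# retry 5 5 kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health
```

With a helper like this, raising the attempt count or delay becomes a change of two arguments rather than an edit to the loop, and the function returns non-zero when every attempt fails so the pipeline stage can fail visibly.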