# 🔄 Manual Rollback Feature - Complete Documentation

## 📋 Table of Contents

1. [Overview](#overview)
2. [Features](#features)
3. [Setup Guide](#setup-guide)
4. [Usage Guide](#usage-guide)
5. [Rollback Methods](#rollback-methods)
6. [Troubleshooting & Fixes](#troubleshooting--fixes)
7. [Best Practices](#best-practices)
8. [Examples](#examples)

---

## Overview

The Manual Rollback feature lets you roll a deployment back to any previous version through a Jenkins Pipeline.

### Key Features:

- ✅ **3 rollback methods** (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
- ✅ **GitOps sync** - updates the Git manifests automatically
- ✅ **Zero downtime** - rolling updates
- ✅ **DRY_RUN mode** - safe testing
- ✅ **Health checks** - optional verification after the rollback
- ✅ **Full RBAC** - the required permissions are configured

---

## Features

### Rollback Methods

| Method | Description | Example | Use Case |
|--------|-------------|---------|----------|
| **IMAGE_TAG** | By Docker image tag | `main-21` | You know the specific build number |
| **REVISION_NUMBER** | By Kubernetes revision | `2` | Roll back N steps |
| **GIT_COMMIT** | By Git commit SHA | `abc123def` | Exact state of the code |

### Parameters

```groovy
ROLLBACK_METHOD      // Rollback method to use
TARGET_VERSION       // Target version (whitespace is trimmed automatically)
SKIP_HEALTH_CHECK    // Skip health checks (default: false)
DRY_RUN              // Only show the plan (default: false)
```
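
The parameters above can be checked before any cluster call is made. The following is a minimal shell sketch, not the pipeline's actual validation stage: the function names `is_valid_method` and `check_params` are hypothetical, and the only rules assumed are that `ROLLBACK_METHOD` is one of the three documented values and that `TARGET_VERSION` is not empty.

```bash
# Hypothetical helper: succeeds only for the three documented methods.
is_valid_method() {
    case "$1" in
        IMAGE_TAG|REVISION_NUMBER|GIT_COMMIT) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: rejects an unknown method or an empty target.
# Prints the reason to stderr and returns non-zero on failure.
# Usage: check_params <ROLLBACK_METHOD> <TARGET_VERSION>
check_params() {
    if ! is_valid_method "$1"; then
        echo "Unknown ROLLBACK_METHOD: $1" >&2
        return 1
    fi
    if [ -z "$2" ]; then
        echo "TARGET_VERSION must not be empty" >&2
        return 1
    fi
    return 0
}
```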

---

## Setup Guide

### Step 1: Create Jenkins Pipeline

```
1. Jenkins → New Item
2. Name: demo-nginx-rollback
3. Type: Pipeline
4. Click OK
```

### Step 2: Configure Pipeline

```yaml
Pipeline:
  Definition: Pipeline script from SCM
  SCM: Git
  Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
  Credentials: gitea-credentials
  Branch: */main
  Script Path: apps/demo-nginx/Jenkinsfile.rollback
```

### Step 3: Verify RBAC

RBAC is already configured in `apps/jenkins/rbac.yaml`:

```yaml
ClusterRole: jenkins-deployer
Permissions:
  - pods, services, deployments (full CRUD)
  - pods/exec, pods/log (for health checks)
  - ingresses, applications (for ArgoCD)
```
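
One way to confirm these permissions from a workstation is `kubectl auth can-i` with impersonation. This is a sketch under assumptions: the helper name `can_jenkins` is hypothetical, the service account is the one quoted in the Issue #3 error message, and the person running it needs the right to impersonate service accounts (cluster admins normally have it).

```bash
# Service account taken from the error message in Issue #3.
SA="system:serviceaccount:jenkins:jenkins"

# Ask the API server whether the Jenkins service account may perform
# a verb on a resource in the demo-app namespace.
# Usage: can_jenkins <verb> <resource> [subresource]
can_jenkins() (
    verb=$1
    resource=$2
    subresource=${3:-}
    if [ -n "$subresource" ]; then
        kubectl auth can-i "$verb" "$resource" --subresource="$subresource" \
            --as="$SA" -n demo-app
    else
        kubectl auth can-i "$verb" "$resource" --as="$SA" -n demo-app
    fi
)

# Example calls (kubectl prints "yes" or "no" for each):
#   can_jenkins patch deployments
#   can_jenkins create pods exec
#   can_jenkins get pods log
```

A "no" for `create pods exec` would reproduce the condition described in Issue #3.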

### Step 4: Test with DRY_RUN

```
Jenkins → demo-nginx-rollback → Build with Parameters
├─ ROLLBACK_METHOD: IMAGE_TAG
├─ TARGET_VERSION: main-21
├─ DRY_RUN: ✅ true
└─ Build
```

---

## Usage Guide

### Quick Start

```
Jenkins → demo-nginx-rollback → Build with Parameters

┌─────────────────────────────────────┐
│ ROLLBACK_METHOD: IMAGE_TAG          │
│ TARGET_VERSION: main-21             │
│ SKIP_HEALTH_CHECK: true (recomm.)   │
│ DRY_RUN: false                      │
└─────────────────────────────────────┘

→ Build → ✅ SUCCESS!
```

### Pipeline Stages

```
Stage 1: Validate Input
  └─ Trim whitespace, validate TARGET_VERSION

Stage 2: Show Current State
  └─ Current deployment, image, pods, history

Stage 3: Prepare Rollback
  └─ Build target image path or verify revision

Stage 4: Execute Rollback
  ├─ kubectl set image (or rollout undo)
  └─ Git commit & push

Stage 5: Wait for Rollout
  ├─ kubectl rollout status (300s timeout)
  └─ sleep 10s (stabilization)

Stage 6: Health Check (optional)
  └─ 5 retry attempts with 5s delay

Stage 7: Show New State
  └─ New deployment state, pods, history
```
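
Stage 5 from the list above could look roughly like this in shell. It is only a sketch of the two documented steps (rollout status with a timeout, then a fixed pause); the function name `wait_for_rollout` is hypothetical and the real `Jenkinsfile.rollback` may structure the stage differently.

```bash
# Wait for the rollout to finish, then pause so new pods can settle.
# Returns non-zero without pausing if the rollout does not complete
# within the timeout.
# Usage: wait_for_rollout <deployment> <namespace> <timeout> <pause_seconds>
wait_for_rollout() {
    kubectl rollout status "deployment/$1" -n "$2" --timeout="$3" || return 1
    sleep "$4"
}

# With the values from the stage list:
#   wait_for_rollout demo-nginx demo-app 300s 10
```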

---

## Rollback Methods

### Method 1: IMAGE_TAG (Recommended)

**When to use:** You know the specific build number

**How to find the tag:**
```bash
# Docker Hub
# https://hub.docker.com/r/vladcrypto/demo-nginx/tags

# Jenkins build history
# Jenkins → demo-nginx → Build History

# Git commits
git log --oneline | grep "Update image"
```

**Example:**
```
ROLLBACK_METHOD: IMAGE_TAG
TARGET_VERSION: main-21

Result: Rollback to docker.io/vladcrypto/demo-nginx:main-21
```
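
The mapping from tag to full image path in the example above can be sketched as follows. The variable values come from the Configuration Reference section; the function name `build_image_ref` and the tag check are assumptions added for illustration, using the character rules Docker applies to tags.

```bash
# Values from the Configuration Reference section of this document.
DOCKER_REGISTRY="docker.io"
DOCKER_REPO="vladcrypto"
APP_NAME="demo-nginx"

# Compose registry/repo/app:tag. Rejects tags with characters Docker
# does not allow: only letters, digits, "_", "." and "-", not starting
# with "." or "-", and at most 128 characters.
build_image_ref() {
    if ! printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'; then
        echo "Invalid image tag: '$1'" >&2
        return 1
    fi
    printf '%s/%s/%s:%s\n' "$DOCKER_REGISTRY" "$DOCKER_REPO" "$APP_NAME" "$1"
}

# build_image_ref main-21
#   prints docker.io/vladcrypto/demo-nginx:main-21
```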

---

### Method 2: REVISION_NUMBER

**When to use:** You need to roll back N steps

**How to find the revision:**
```bash
kubectl rollout history deployment/demo-nginx -n demo-app

# Output:
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         Update to main-20
# 3         Update to main-21
# 4         Update to main-22 (current)
```

**Example:**
```
ROLLBACK_METHOD: REVISION_NUMBER
TARGET_VERSION: 2

Result: Rollback to revision 2 (main-20)
```
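
For this method the target must be a positive whole number. The sketch below checks that and prints the matching `kubectl rollout undo` command instead of running it, so it has no side effects. The function names `is_revision` and `undo_command` are hypothetical.

```bash
# Succeeds if the argument is a positive whole number.
is_revision() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}

# Print the kubectl command that would roll back to a revision.
# Usage: undo_command <deployment> <namespace> <revision>
undo_command() {
    if ! is_revision "$3"; then
        echo "Invalid revision: '$3'" >&2
        return 1
    fi
    printf 'kubectl rollout undo deployment/%s -n %s --to-revision=%s\n' "$1" "$2" "$3"
}

# undo_command demo-nginx demo-app 2
#   prints kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=2
```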

---

### Method 3: GIT_COMMIT

**When to use:** You need to return to a specific state of the code

**How to find the commit:**
```bash
# Gitea
# https://git.thedevops.dev/admin/k3s-gitops/commits/branch/main

# Git CLI
git log --oneline apps/demo-nginx/deployment.yaml

# Output:
# abc123d Update image to main-22 (current)
# def456e Update image to main-21
# ghi789f Update image to main-20
```

**Example:**
```
ROLLBACK_METHOD: GIT_COMMIT
TARGET_VERSION: def456e

Result: Rollback to commit def456e
```
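
Before rolling back by commit it can help to see which image that commit's manifest referenced. The sketch below is one way to look it up and is not necessarily how the pipeline resolves a commit: the function names are hypothetical, and it assumes the manifest has a plain, unquoted `image:` line and that the command runs inside a clone of the k3s-gitops repository.

```bash
# Print the first "image:" value found in a manifest read from stdin.
image_from_manifest() {
    sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' | head -n 1
}

# Print the image that deployment.yaml referenced at a given commit.
# Usage: image_at_commit <commit-sha>
image_at_commit() {
    git show "$1:apps/demo-nginx/deployment.yaml" | image_from_manifest
}

# Example: image_at_commit def456e
```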

---

## Troubleshooting & Fixes

### Issue #1: Container Name Error ✅ FIXED

**Error:**
```
error: unable to find container named "demo-nginx"
```

**Root Cause:**
The pipeline used the deployment name instead of the container name.

**Fix:**
```groovy
environment {
    APP_NAME = 'demo-nginx'       // Deployment name
    CONTAINER_NAME = 'nginx'      // Container name ✅
}

// Shell command run by the pipeline (inside a sh step)
kubectl set image deployment/${APP_NAME} \
    ${CONTAINER_NAME}=${TARGET_IMAGE}
```

**How to verify:**
```bash
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].name}'
# Output: nginx
```

---

### Issue #2: Whitespace in Input ✅ FIXED

**Error:**
```
Target image: docker.io/vladcrypto/demo-nginx: main-21
                                              ^
                                            Space!
```

**Root Cause:**
The user entered TARGET_VERSION with a leading space.

**Fix:**
```groovy
stage('Validate Input') {
    steps {
        script {
            // Auto-trim whitespace
            env.TARGET_VERSION_CLEAN = params.TARGET_VERSION.trim()

            // Use the cleaned value everywhere
            echo "Target version: ${env.TARGET_VERSION_CLEAN}"
        }
    }
}
```
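
If the same value is ever handled in a shell step or passed to `kubectl` by hand, an equivalent trim can be done in shell. This is a small sketch; the function name `trim` is hypothetical and it is not part of the pipeline.

```bash
# Strip leading and trailing whitespace from the first argument.
trim() {
    printf '%s\n' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# TARGET_VERSION_CLEAN=$(trim "$TARGET_VERSION")
```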

---

### Issue #3: RBAC Permissions ✅ FIXED

**Error:**
```
Error: User "system:serviceaccount:jenkins:jenkins"
cannot create resource "pods/exec"
```

**Root Cause:**
The Jenkins ServiceAccount had no permission on pods/exec, which the health checks need.

**Fix:**
```yaml
# apps/jenkins/rbac.yaml
rules:
  - apiGroups: [""]
    resources: ["pods/exec", "pods/log"]  # ← Added!
    verbs: ["create", "get"]
```

**Applied:**
```bash
kubectl apply -f apps/jenkins/rbac.yaml
```

---

### Issue #4: Health Check Timing ⚠️ WORKAROUND

**Error:**
```
wget: can't connect to remote host: Connection refused
```

**Root Cause:**
Health check runs too early during rolling update (race condition).

**Workaround:**
```groovy
// Option 1: Skip health check (recommended)
SKIP_HEALTH_CHECK: true

// Option 2: Longer stabilization wait
sleep 30  // Instead of 10
```

**Timeline:**
```
T+0s:  kubectl set image
T+30s: Rollout status = complete
T+40s: sleep 10s
T+50s: Health check (pods might still be starting)
```

**Solution:**
Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s:

```bash
kubectl get pods -n demo-app -l app=demo-nginx
```
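
The manual check can be narrowed to the pods that still need attention. The sketch below filters `kubectl get pods --no-headers` output, whose columns are NAME, READY, STATUS, RESTARTS and AGE; the function name `unhealthy_pods` is hypothetical. Empty output means every listed pod is Running and fully ready.

```bash
# Reads `kubectl get pods --no-headers` output on stdin and prints the
# name of every pod that is not Running or whose READY column
# (ready/total) shows fewer ready containers than total.
unhealthy_pods() {
    awk '{
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Usage, 30-60s after the rollback:
#   kubectl get pods -n demo-app -l app=demo-nginx --no-headers | unhealthy_pods
```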

---

### Issue #5: Bash Loop Syntax ✅ FIXED

**Error:**
```
Health check attempt {1..5}/5...
# Loop executed only once!
```

**Root Cause:**
`{1..5}` brace expansion does not work in sh/dash; it requires bash.

**Fix:**
```bash
#!/bin/bash
# ↑ Added shebang
set -e

# Fixed loop syntax
for i in 1 2 3 4 5; do  # Instead of {1..5}
    echo "Health check attempt $i/5..."
    if kubectl exec ...; then
        exit 0
    fi
    if [ "$i" -lt 5 ]; then
        sleep 5
    fi
done

# All attempts failed
exit 1
```
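
When the number of attempts needs to be configurable, a counter-based `while` loop avoids both brace expansion and a hard-coded list, and it runs in sh/dash as well as bash. The helper below is a sketch; the name `retry` and its argument order are assumptions, not part of `Jenkinsfile.rollback`. Its variables are global because plain POSIX sh has no `local`.

```bash
# Run a command up to max_attempts times, sleeping delay seconds
# between failed attempts. Returns 0 on the first success and 1 if
# every attempt fails.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "Health check attempt $attempt/$max_attempts..."
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example: retry 5 5 kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health
```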

---

## Best Practices

### 1. Always Use DRY_RUN First

```
Step 1: DRY_RUN=true  → Review the plan
Step 2: Verify output
Step 3: DRY_RUN=false → Execute
```
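
A common way to implement such a mode in shell is a wrapper that either runs a command or only prints it. This is a generic sketch of the pattern, with a hypothetical function name `run`; how `Jenkinsfile.rollback` actually implements DRY_RUN is not shown in this document.

```bash
# Run the given command, or only print it when DRY_RUN is "true".
run() {
    if [ "${DRY_RUN:-false}" = "true" ]; then
        echo "[DRY_RUN] $*"
    else
        "$@"
    fi
}

# DRY_RUN=true
# run kubectl rollout undo deployment/demo-nginx -n demo-app
#   prints: [DRY_RUN] kubectl rollout undo deployment/demo-nginx -n demo-app
```

Routing every state-changing command through one wrapper keeps the dry-run output identical to what a real run would execute.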

### 2. Use SKIP_HEALTH_CHECK for Emergency

```
Emergency rollback:
├─ SKIP_HEALTH_CHECK: true
├─ Focus on speed
└─ Verify manually after
```

### 3. Document Rollback Reason

Add a comment to the Jenkins build:
```
Build Comment:
"Rollback due to: API errors in main-23
Previous working version: main-21
Impact: None (zero downtime)"
```

### 4. Monitor After Rollback

```bash
# Watch pods
watch kubectl get pods -n demo-app

# Check logs
kubectl logs -n demo-app -l app=demo-nginx -f

# Verify image
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```
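
The image check can be turned into a pass/fail step by comparing the deployed image with the one the rollback was meant to set. A minimal sketch, with a hypothetical function name `verify_image`:

```bash
# Compare the image the deployment currently references with the
# expected one. Prints the result and returns non-zero on mismatch.
# Usage: verify_image <deployment> <namespace> <expected-image>
verify_image() (
    actual=$(kubectl get deployment "$1" -n "$2" \
        -o jsonpath='{.spec.template.spec.containers[0].image}')
    if [ "$actual" = "$3" ]; then
        echo "OK: $actual"
    else
        echo "MISMATCH: expected $3, found $actual" >&2
        exit 1
    fi
)

# verify_image demo-nginx demo-app docker.io/vladcrypto/demo-nginx:main-21
```

Like the command above, this only looks at the first container in the pod template.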

### 5. Verify in ArgoCD

```
ArgoCD UI → demo-nginx
├─ Status: Synced ✅
└─ Health: Healthy ✅
```

---

## Examples

### Example 1: Quick Rollback to Previous Build

```
Scenario: Build #23 failed, rollback to #21

Steps:
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG + main-21
3. SKIP_HEALTH_CHECK: true
4. Build

Time: ~2 minutes
Result: ✅ SUCCESS
```

---

### Example 2: Rollback to Last Week's Version

```
Scenario: Need stable version from last week

Steps:
1. Find old build: Jenkins → Build History → #15
2. Check image tag: main-15
3. Jenkins → demo-nginx-rollback
4. IMAGE_TAG + main-15
5. DRY_RUN: true (verify first!)
6. DRY_RUN: false (execute)

Result: ✅ Rolled back to main-15
```

---

### Example 3: Rollback by Revision Number

```
Scenario: Roll back 3 versions

Steps:
1. Check history:
   kubectl rollout history deployment/demo-nginx -n demo-app

2. Find revision: 25 (current: 28)

3. Jenkins → demo-nginx-rollback
4. REVISION_NUMBER + 25
5. Build

Result: ✅ Rolled back to revision 25
```

---

### Example 4: Rollback by Git Commit

```
Scenario: Need the exact state of the code

Steps:
1. Find commit:
   git log --oneline apps/demo-nginx/deployment.yaml

2. Copy SHA: abc123def

3. Jenkins → demo-nginx-rollback
4. GIT_COMMIT + abc123def
5. Build

Result: ✅ Rolled back to commit abc123def
```

---

## Manual Verification Commands

### Check Deployment Status
```bash
kubectl get deployment demo-nginx -n demo-app

# Expected:
# NAME         READY   UP-TO-DATE   AVAILABLE   AGE
# demo-nginx   2/2     2            2           15h
```

### Check Image Version
```bash
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Expected: docker.io/vladcrypto/demo-nginx:main-21
```

### Check Pods
```bash
kubectl get pods -n demo-app -l app=demo-nginx

# Expected: 2 pods Running
```

### Check Rollout History
```bash
kubectl rollout history deployment/demo-nginx -n demo-app

# Shows all revisions
```

### Test Health Endpoint
```bash
POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health

# Expected: healthy
```

---

## Emergency Rollback Procedure

### If Production is Down

**Option 1: Jenkins (2 minutes)**
```
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG → last known good version
3. SKIP_HEALTH_CHECK: ✅ true
4. Build
```

**Option 2: kubectl (30 seconds)**
```bash
# Fastest - rollback to previous
kubectl rollout undo deployment/demo-nginx -n demo-app

# To specific revision
kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=25
```

**Option 3: ArgoCD (1 minute)**
```
1. ArgoCD UI → demo-nginx
2. History → Select previous version
3. Rollback button
```

---

## Configuration Reference

### Environment Variables

```groovy
APP_NAME = 'demo-nginx'           // Deployment name
CONTAINER_NAME = 'nginx'          // Container name
NAMESPACE = 'demo-app'            // K8s namespace
DOCKER_REGISTRY = 'docker.io'     // Registry
DOCKER_REPO = 'vladcrypto'        // Docker Hub user
HEALTH_CHECK_TIMEOUT = '300s'     // Rollout timeout
```

### Customization

Change the settings in `Jenkinsfile.rollback`:

```groovy
// Increase the timeout
HEALTH_CHECK_TIMEOUT = '600s'

// More health check attempts
for i in 1 2 3 4 5 6 7 8 9 10; do

// Wait longer for stabilization
sleep 30  // Instead of 10
```

---

## Monitoring & Alerts

### Grafana Dashboard

```promql
# Rollback count
sum(increase(deployment_rollback_total[1h])) by (deployment)

# Rollback rate
rate(deployment_rollback_total[5m])

# Average rollback duration
avg(deployment_rollback_duration_seconds)
```

### Alert Rules

```yaml
- alert: FrequentRollbacks
  # increase() counts rollbacks over the window; rate() would be per second
  expr: increase(deployment_rollback_total[1h]) > 2
  annotations:
    summary: "Frequent rollbacks detected"
    description: "More than 2 rollbacks in last hour"

- alert: RollbackFailed
  expr: deployment_rollback_failed_total > 0
  annotations:
    summary: "Rollback failed"
    description: "Manual intervention required"
```

---

## Summary of All Fixes

| # | Issue | Fix | Status |
|---|-------|-----|--------|
| 1 | Container name wrong | Use `nginx` not `demo-nginx` | ✅ Fixed |
| 2 | Whitespace in input | Auto-trim with `.trim()` | ✅ Fixed |
| 3 | RBAC pods/exec | Add permission to ClusterRole | ✅ Fixed |
| 4 | Health check timing | Use `SKIP_HEALTH_CHECK=true` | ⚠️ Workaround |
| 5 | Bash loop syntax | Use explicit list `1 2 3 4 5` | ✅ Fixed |

---

## Success Criteria

✅ **Rollback Methods:** 3/3 working (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
✅ **GitOps Sync:** Git commits automatically
✅ **Zero Downtime:** Rolling updates
✅ **RBAC:** Full permissions configured
✅ **Input Validation:** Whitespace auto-trimmed
✅ **DRY_RUN:** Safe testing mode
✅ **Retry Logic:** 5 attempts with proper bash syntax
⚠️ **Health Check:** Optional (use SKIP_HEALTH_CHECK=true)

---

## FAQ

### Q: The health check always fails. Is that normal?

**A:** Yes, because of the timing race condition during the rolling update. Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s.

### Q: How do I roll back several versions?

**A:** Use the `REVISION_NUMBER` method and specify the revision you need from `kubectl rollout history`.

### Q: Can I roll back only staging?

**A:** Yes, change `NAMESPACE` in the Jenkinsfile or create a separate job for staging.

### Q: How do I roll back quickly in an emergency?

**A:** Use `kubectl rollout undo` (30 seconds) or Jenkins with `SKIP_HEALTH_CHECK=true` (2 minutes).

### Q: What if the Git commit fails?

**A:** The rollback has still happened in Kubernetes! Git is only needed for the GitOps sync. ArgoCD re-syncs after 3 minutes.

---

## Related Documentation

- [CI/CD Guide](../../../CICD_GUIDE.md)
- [Automatic Rollback](../Jenkinsfile) - See `post { failure }` section
- [Jenkins RBAC](../../jenkins/rbac.yaml)
- [Deployment Manifest](../deployment.yaml)

---

## Support

**Issues?**
- Check Jenkins console output
- Verify RBAC permissions
- Check pod status: `kubectl get pods -n demo-app`
- Review ArgoCD sync status

**Need Help?**
- Jenkins logs: Jenkins → Build → Console Output
- Kubernetes events: `kubectl get events -n demo-app`
- Pod logs: `kubectl logs -n demo-app -l app=demo-nginx`

---

**Last Updated:** 2026-01-06
**Version:** 1.0
**Status:** Production Ready ✅