# 🔄 Manual Rollback Feature - Complete Documentation
## 📋 Table of Contents
1. [Overview](#overview)
2. [Features](#features)
3. [Setup Guide](#setup-guide)
4. [Usage Guide](#usage-guide)
5. [Rollback Methods](#rollback-methods)
6. [Troubleshooting & Fixes](#troubleshooting--fixes)
7. [Best Practices](#best-practices)
8. [Examples](#examples)
---
## Overview
The Manual Rollback feature rolls a deployment back to any previous version through a Jenkins Pipeline.
### Key Features:
- **3 rollback methods** (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
- **GitOps sync** - automatically updates the Git manifests
- **Zero downtime** - rolling updates
- **DRY_RUN mode** - safe testing
- **Health checks** - optional verification after rollback
- **Full RBAC** - correct permissions
---
## Features
### Rollback Methods
| Method | Description | Example | Use Case |
|--------|-------------|---------|----------|
| **IMAGE_TAG** | By Docker image tag | `main-21` | You know the specific build number |
| **REVISION_NUMBER** | By Kubernetes revision | `2` | Roll back N steps |
| **GIT_COMMIT** | By Git commit SHA | `abc123def` | Exact state of the code |
### Parameters
```groovy
ROLLBACK_METHOD    // Rollback method to use
TARGET_VERSION     // Target version (whitespace is auto-trimmed)
SKIP_HEALTH_CHECK  // Skip health checks (default: false)
DRY_RUN            // Only show the plan (default: false)
```
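A minimal sketch of how `TARGET_VERSION` might be checked per method from a shell step. The helper name `validate_target` and its patterns (7 to 40 hex characters for a commit, Docker's tag character set for an image tag) are assumptions for illustration and are not taken from `Jenkinsfile.rollback`. For example, `validate_target IMAGE_TAG main-21` succeeds, while a value containing a space fails.

```shell
#!/bin/sh
# validate_target METHOD VALUE
# Hypothetical helper: exit status 0 if VALUE looks plausible for METHOD.
# Assumes VALUE is a single line, as a Jenkins string parameter would be.
validate_target() {
    method="$1"
    value="$2"
    case "$method" in
        IMAGE_TAG)
            # Docker tags: letters, digits, '_', '.', '-'; must not start
            # with '.' or '-'; at most 128 characters.
            printf '%s\n' "$value" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$' ;;
        REVISION_NUMBER)
            # Kubernetes revisions are positive integers.
            printf '%s\n' "$value" | grep -Eq '^[1-9][0-9]*$' ;;
        GIT_COMMIT)
            # Abbreviated or full SHA-1 (the 7-character minimum is an assumption).
            printf '%s\n' "$value" | grep -Eq '^[0-9a-fA-F]{7,40}$' ;;
        *)
            return 1 ;;
    esac
}
```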
---
## Setup Guide
### Step 1: Create Jenkins Pipeline
```
1. Jenkins → New Item
2. Name: demo-nginx-rollback
3. Type: Pipeline
4. Click OK
```
### Step 2: Configure Pipeline
```yaml
Pipeline:
  Definition: Pipeline script from SCM
  SCM: Git
  Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
  Credentials: gitea-credentials
  Branch: */main
  Script Path: apps/demo-nginx/Jenkinsfile.rollback
```
### Step 3: Verify RBAC
RBAC is already configured in `apps/jenkins/rbac.yaml`:
```yaml
ClusterRole: jenkins-deployer
Permissions:
- pods, services, deployments (full CRUD)
- pods/exec, pods/log (for health checks)
- ingresses, applications (for ArgoCD)
```
### Step 4: Test with DRY_RUN
```
Jenkins → demo-nginx-rollback → Build with Parameters
├─ ROLLBACK_METHOD: IMAGE_TAG
├─ TARGET_VERSION: main-21
├─ DRY_RUN: ✅ true
└─ Build
```
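One way a shell step could honor `DRY_RUN` is to route every mutating command through a small wrapper. This is a sketch under assumptions: the wrapper name `run` is hypothetical, and the real pipeline may implement dry-run differently. With `DRY_RUN=true`, a call such as `run kubectl rollout undo deployment/demo-nginx -n demo-app` only prints the command.

```shell
#!/bin/sh
# run CMD [ARGS...]
# Hypothetical wrapper: prints the command when DRY_RUN=true,
# otherwise executes it unchanged.
run() {
    if [ "${DRY_RUN:-false}" = "true" ]; then
        echo "[DRY_RUN] $*"
    else
        "$@"
    fi
}
```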
---
## Usage Guide
### Quick Start
```
Jenkins → demo-nginx-rollback → Build with Parameters
┌───────────────────────────────────────┐
│ ROLLBACK_METHOD: IMAGE_TAG            │
│ TARGET_VERSION: main-21               │
│ SKIP_HEALTH_CHECK: true (recommended) │
│ DRY_RUN: false                        │
└───────────────────────────────────────┘
→ Build → ✅ SUCCESS!
```
### Pipeline Stages
```
Stage 1: Validate Input
└─ Trim whitespace, validate TARGET_VERSION
Stage 2: Show Current State
└─ Current deployment, image, pods, history
Stage 3: Prepare Rollback
└─ Build target image path or verify revision
Stage 4: Execute Rollback
├─ kubectl set image (or rollout undo)
└─ Git commit & push
Stage 5: Wait for Rollout
├─ kubectl rollout status (300s timeout)
└─ sleep 10s (stabilization)
Stage 6: Health Check (optional)
└─ 5 retry attempts with 5s delay
Stage 7: Show New State
└─ New deployment state, pods, history
```
---
## Rollback Methods
### Method 1: IMAGE_TAG (Recommended)
**When to use:** You know the specific build number

**How to find the tag:**
```bash
# Docker Hub tags page:
#   https://hub.docker.com/r/vladcrypto/demo-nginx/tags
# Jenkins build history:
#   Jenkins → demo-nginx → Build History
# Git commits
git log --oneline | grep "Update image"
```
**Example:**
```
ROLLBACK_METHOD: IMAGE_TAG
TARGET_VERSION: main-21
Result: Rollback to docker.io/vladcrypto/demo-nginx:main-21
```
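The target image path in the result above is the registry, repository, application name and tag joined together. A minimal sketch, assuming the variable names from the Configuration Reference; the helper `build_image_ref` is hypothetical, and `build_image_ref main-21` prints the path shown in the example.

```shell
#!/bin/sh
# Defaults mirror the values shown in this document.
DOCKER_REGISTRY="${DOCKER_REGISTRY:-docker.io}"
DOCKER_REPO="${DOCKER_REPO:-vladcrypto}"
APP_NAME="${APP_NAME:-demo-nginx}"

# build_image_ref TAG
# Hypothetical helper: prints REGISTRY/REPO/APP:TAG, fails on an empty tag.
build_image_ref() {
    tag="$1"
    if [ -z "$tag" ]; then
        echo "build_image_ref: empty tag" >&2
        return 1
    fi
    printf '%s/%s/%s:%s\n' "$DOCKER_REGISTRY" "$DOCKER_REPO" "$APP_NAME" "$tag"
}
```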
---
### Method 2: REVISION_NUMBER
**When to use:** You need to roll back N steps

**How to find the revision:**
```bash
kubectl rollout history deployment/demo-nginx -n demo-app
# Output:
REVISION  CHANGE-CAUSE
1         Initial deployment
2         Update to main-20
3         Update to main-21
4         Update to main-22 (current)
```
**Example:**
```
ROLLBACK_METHOD: REVISION_NUMBER
TARGET_VERSION: 2
Result: Rollback to revision 2 (main-20)
```
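To pick the revision just before the current one without reading the table by eye, the history output can be filtered. This is a sketch; `previous_revision` is a hypothetical helper that assumes the first column of each data row is the revision number. A possible call is `kubectl rollout history deployment/demo-nginx -n demo-app | previous_revision`.

```shell
#!/bin/sh
# previous_revision
# Hypothetical helper: reads `kubectl rollout history` output on stdin and
# prints the second-highest revision number. Fails if fewer than two
# revisions are listed.
previous_revision() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }' | sort -n | tail -n 2 | {
        read -r prev || exit 1   # second-highest revision
        read -r cur || exit 1    # highest revision (current)
        echo "$prev"
    }
}
```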
---
### Method 3: GIT_COMMIT
**When to use:** You need to return to a specific state of the code

**How to find the commit:**
```bash
# Gitea commit history:
#   https://git.thedevops.dev/admin/k3s-gitops/commits/branch/main
# Git CLI
git log --oneline apps/demo-nginx/deployment.yaml
# Output:
abc123d Update image to main-22 (current)
def456e Update image to main-21
ghi789f Update image to main-20
```
**Example:**
```
ROLLBACK_METHOD: GIT_COMMIT
TARGET_VERSION: def456e
Result: Rollback to commit def456e
```
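Before rolling back to a commit, it can help to see which image that commit's manifest pinned. A minimal sketch, assuming the manifest has a plain `image:` line; `image_from_manifest` is a hypothetical helper and not part of the pipeline. A possible call is `git show def456e:apps/demo-nginx/deployment.yaml | image_from_manifest`.

```shell
#!/bin/sh
# image_from_manifest
# Hypothetical helper: reads a Deployment manifest on stdin and prints the
# value of its first `image:` field. Fails if no such field is found.
image_from_manifest() {
    # Strip the optional list dash, the key and surrounding quotes.
    img=$(sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' \
        | head -n 1 | tr -d "\"'")
    [ -n "$img" ] || return 1
    printf '%s\n' "$img"
}
```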
---
## Troubleshooting & Fixes
### Issue #1: Container Name Error ✅ FIXED
**Error:**
```
error: unable to find container named "demo-nginx"
```
**Root Cause:**
The pipeline used the deployment name instead of the container name.

**Fix:**
```groovy
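environment {
    APP_NAME       = 'demo-nginx' // Deployment name
    CONTAINER_NAME = 'nginx'      // Container name ✅
}
// Inside a stage's steps block; assumes TARGET_IMAGE and NAMESPACE
// are also available as environment variables
sh '''
    kubectl set image deployment/${APP_NAME} \
        ${CONTAINER_NAME}=${TARGET_IMAGE} -n ${NAMESPACE}
'''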
**How to verify:**
```bash
kubectl get deployment demo-nginx -n demo-app \
-o jsonpath='{.spec.template.spec.containers[0].name}'
# Output: nginx
```
---
### Issue #2: Whitespace in Input ✅ FIXED
**Error:**
```
Target image: docker.io/vladcrypto/demo-nginx: main-21
                                              ^
                                              Space!
```
**Root Cause:**
The user entered TARGET_VERSION with a leading space.

**Fix:**
```groovy
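stage('Validate Input') {
    steps {
        script {
            // Auto-trim leading/trailing whitespace once
            env.TARGET_VERSION_CLEAN = params.TARGET_VERSION.trim()
            // Use env.TARGET_VERSION_CLEAN everywhere after this point
            echo "Target version: ${env.TARGET_VERSION_CLEAN}"
        }
    }
}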
```
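The same trimming can also be done on the shell side as a second line of defense, for example when the value is passed into an `sh` step. This is a sketch using POSIX parameter expansion; `trim` is a hypothetical helper, and `trim " main-21"` prints `main-21`.

```shell
#!/bin/sh
# trim STRING
# Hypothetical helper: prints STRING without leading or trailing whitespace.
trim() {
    s="$1"
    # Remove the longest leading run of whitespace.
    s="${s#"${s%%[![:space:]]*}"}"
    # Remove the longest trailing run of whitespace.
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s' "$s"
}
```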
---
### Issue #3: RBAC Permissions ✅ FIXED
**Error:**
```
Error: User "system:serviceaccount:jenkins:jenkins"
cannot create resource "pods/exec"
```
**Root Cause:**
The Jenkins ServiceAccount lacked permission on pods/exec, which the health checks need.

**Fix:**
```yaml
# apps/jenkins/rbac.yaml
rules:
  - apiGroups: [""]
    resources: ["pods/exec", "pods/log"]  # ← Added!
    verbs: ["create", "get"]
```
**Applied:**
```bash
kubectl apply -f apps/jenkins/rbac.yaml
```
---
### Issue #4: Health Check Timing ⚠️ WORKAROUND
**Error:**
```
wget: can't connect to remote host: Connection refused
```
**Root Cause:**
Health check runs too early during rolling update (race condition).

**Workaround:**
```groovy
// Option 1: Skip health check (recommended)
SKIP_HEALTH_CHECK: true
// Option 2: Longer stabilization wait
sleep 30 // Instead of 10
```
**Timeline:**
```
T+0s: kubectl set image
T+30s: Rollout status = complete
T+40s: sleep 10s
T+50s: Health check (pods might still be starting)
```
**Solution:**
Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s:
```bash
kubectl get pods -n demo-app -l app=demo-nginx
```
---
### Issue #5: Bash Loop Syntax ✅ FIXED
**Error:**
```
Health check attempt {1..5}/5...
# Loop executed only once!
```
**Root Cause:**
`{1..5}` brace expansion does not work in sh/dash; it requires bash.

**Fix:**
```bash
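#!/bin/bash
# ↑ Added shebang (kept on its own line; a trailing comment would be passed to bash as an argument)
set -e
# Explicit list instead of {1..5}, which sh/dash does not expand
for i in 1 2 3 4 5; do
    echo "Health check attempt $i/5..."
    if kubectl exec ...; then
        exit 0
    fi
    if [ "$i" -lt 5 ]; then
        sleep 5
    fi
done
# All attempts failed: report failure instead of falling through with status 0
exit 1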
```
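If the number of attempts needs to be configurable, a counter loop avoids both brace expansion and a hard-coded list and works in plain `sh`. A minimal sketch; the helper name `retry` and its argument order are assumptions, not part of `Jenkinsfile.rollback`. Called as `retry 5 5 some_check`, it runs `some_check` up to 5 times with a 5s pause between attempts.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "Attempt $i/$attempts..." >&2
        if "$@"; then
            return 0
        fi
        # No sleep after the final attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```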
---
## Best Practices
### 1. Always Use DRY_RUN First
```
Step 1: DRY_RUN=true → Review the plan
Step 2: Verify output
Step 3: DRY_RUN=false → Execute
```
### 2. Use SKIP_HEALTH_CHECK for Emergency
```
Emergency rollback:
├─ SKIP_HEALTH_CHECK: true
├─ Focus on speed
└─ Verify manually after
```
### 3. Document Rollback Reason
Add a comment to the Jenkins build:
```
Build Comment:
"Rollback due to: API errors in main-23
Previous working version: main-21
Impact: None (zero downtime)"
```
### 4. Monitor After Rollback
```bash
# Watch pods
watch kubectl get pods -n demo-app
# Check logs
kubectl logs -n demo-app -l app=demo-nginx -f
# Verify image
kubectl get deployment demo-nginx -n demo-app \
-o jsonpath='{.spec.template.spec.containers[0].image}'
```
### 5. Verify in ArgoCD
```
ArgoCD UI → demo-nginx
├─ Status: Synced ✅
└─ Health: Healthy ✅
```
---
## Examples
### Example 1: Quick Rollback to Previous Build
```
Scenario: Build #23 failed, rollback to #21
Steps:
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG + main-21
3. SKIP_HEALTH_CHECK: true
4. Build
Time: ~2 minutes
Result: ✅ SUCCESS
```
---
### Example 2: Rollback to Last Week's Version
```
Scenario: Need stable version from last week
Steps:
1. Find old build: Jenkins → Build History → #15
2. Check image tag: main-15
3. Jenkins → demo-nginx-rollback
4. IMAGE_TAG + main-15
5. DRY_RUN: true (verify first!)
6. DRY_RUN: false (execute)
Result: ✅ Rolled back to main-15
```
---
### Example 3: Rollback by Revision Number
```
Scenario: Roll back 3 versions
Steps:
1. Check history:
kubectl rollout history deployment/demo-nginx -n demo-app
2. Find revision: 25 (current: 28)
3. Jenkins → demo-nginx-rollback
4. REVISION_NUMBER + 25
5. Build
Result: ✅ Rolled back to revision 25
```
---
### Example 4: Rollback by Git Commit
```
Scenario: Need the exact state of the code
Steps:
1. Find commit:
git log --oneline apps/demo-nginx/deployment.yaml
2. Copy SHA: abc123def
3. Jenkins → demo-nginx-rollback
4. GIT_COMMIT + abc123def
5. Build
Result: ✅ Rolled back to commit abc123def
```
---
## Manual Verification Commands
### Check Deployment Status
```bash
kubectl get deployment demo-nginx -n demo-app
# Expected:
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
demo-nginx   2/2     2            2           15h
```
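The same check can be scripted by comparing the two halves of the `READY` column. This is a sketch that assumes the default table output shown above; `deployment_ready` is a hypothetical helper, and `kubectl rollout status` remains the usual way to wait for a rollout.

```shell
#!/bin/sh
# deployment_ready NAME
# Hypothetical helper: reads `kubectl get deployment` table output on stdin.
# Succeeds only if the row for NAME shows READY as N/N with N > 0.
deployment_ready() {
    awk -v name="$1" '
        $1 == name {
            split($2, r, "/")
            if (r[1] == r[2] && r[1] + 0 > 0) ok = 1
        }
        END { exit !ok }
    '
}
```

Possible usage: `kubectl get deployment demo-nginx -n demo-app | deployment_ready demo-nginx && echo "ready"`. A deployment scaled to zero is reported as not ready by this sketch.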
### Check Image Version
```bash
kubectl get deployment demo-nginx -n demo-app \
-o jsonpath='{.spec.template.spec.containers[0].image}'
# Expected: docker.io/vladcrypto/demo-nginx:main-21
```
### Check Pods
```bash
kubectl get pods -n demo-app -l app=demo-nginx
# Expected: 2 pods Running
```
### Check Rollout History
```bash
kubectl rollout history deployment/demo-nginx -n demo-app
# Shows all revisions
```
### Test Health Endpoint
```bash
POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -n demo-app -- wget -q -O- http://localhost/health
# Expected: healthy
```
---
## Emergency Rollback Procedure
### If Production is Down
**Option 1: Jenkins (2 minutes)**
```
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG → last known good version
3. SKIP_HEALTH_CHECK: ✅ true
4. Build
```
**Option 2: kubectl (30 seconds)**
```bash
# Fastest - rollback to previous
kubectl rollout undo deployment/demo-nginx -n demo-app
# To specific revision
kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=25
```
**Option 3: ArgoCD (1 minute)**
```
1. ArgoCD UI → demo-nginx
2. History → Select previous version
3. Rollback button
```
---
## Configuration Reference
### Environment Variables
```groovy
APP_NAME = 'demo-nginx' // Deployment name
CONTAINER_NAME = 'nginx' // Container name
NAMESPACE = 'demo-app' // K8s namespace
DOCKER_REGISTRY = 'docker.io' // Registry
DOCKER_REPO = 'vladcrypto' // Docker Hub user
HEALTH_CHECK_TIMEOUT = '300s' // Rollout timeout
```
### Customization
Change the settings in Jenkinsfile.rollback:
```groovy
// Increase the timeout
HEALTH_CHECK_TIMEOUT = '600s'
// More health check attempts
for i in 1 2 3 4 5 6 7 8 9 10; do
// Longer stabilization wait
sleep 30 // Instead of 10
```
---
## Monitoring & Alerts
### Grafana Dashboard
```promql
# Rollback count
sum(increase(deployment_rollback_total[1h])) by (deployment)
# Rollback rate
rate(deployment_rollback_total[5m])
# Average rollback duration
avg(deployment_rollback_duration_seconds)
```
### Alert Rules
```yaml
- alert: FrequentRollbacks
  # increase() counts rollbacks over the window; rate() would be per second
  expr: increase(deployment_rollback_total[1h]) > 2
  annotations:
    summary: "Frequent rollbacks detected"
    description: "More than 2 rollbacks in last hour"
- alert: RollbackFailed
  expr: deployment_rollback_failed_total > 0
  annotations:
    summary: "Rollback failed"
    description: "Manual intervention required"
```
---
## Summary of All Fixes
| # | Issue | Fix | Status |
|---|-------|-----|--------|
| 1 | Container name wrong | Use `nginx` not `demo-nginx` | ✅ Fixed |
| 2 | Whitespace in input | Auto-trim with `.trim()` | ✅ Fixed |
| 3 | RBAC pods/exec | Add permission to ClusterRole | ✅ Fixed |
| 4 | Health check timing | Use `SKIP_HEALTH_CHECK=true` | ⚠️ Workaround |
| 5 | Bash loop syntax | Use explicit list `1 2 3 4 5` | ✅ Fixed |
---
## Success Criteria
- **Rollback Methods:** 3/3 working (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
- **GitOps Sync:** Git commits automatically
- **Zero Downtime:** Rolling updates
- **RBAC:** Full permissions configured
- **Input Validation:** Whitespace auto-trimmed
- **DRY_RUN:** Safe testing mode
- **Retry Logic:** 5 attempts with proper bash syntax
- ⚠️ **Health Check:** Optional (use SKIP_HEALTH_CHECK=true)
---
## FAQ
### Q: The health check always fails. Is that normal?
**A:** Yes, because of a timing race condition during the rolling update. Use `SKIP_HEALTH_CHECK: true` and verify manually after 30-60s.
### Q: How do I roll back several versions?
**A:** Use the `REVISION_NUMBER` method and specify the revision you need from `kubectl rollout history`.
### Q: Can I roll back only in staging?
**A:** Yes, change `NAMESPACE` in the Jenkinsfile or create a separate job for staging.
### Q: How do I roll back quickly in an emergency?
**A:** Use `kubectl rollout undo` (30 seconds) or Jenkins with `SKIP_HEALTH_CHECK=true` (2 minutes).
### Q: What if the Git commit fails?
**A:** The rollback has still been applied in Kubernetes. Git is only needed for the GitOps sync. ArgoCD re-syncs with Git after about 3 minutes.

---
## Related Documentation
- [CI/CD Guide](../../../CICD_GUIDE.md)
- [Automatic Rollback](../Jenkinsfile) - See `post { failure }` section
- [Jenkins RBAC](../../jenkins/rbac.yaml)
- [Deployment Manifest](../deployment.yaml)
---
## Support
**Issues?**
- Check Jenkins console output
- Verify RBAC permissions
- Check pod status: `kubectl get pods -n demo-app`
- Review ArgoCD sync status

**Need Help?**
- Jenkins logs: Jenkins → Build → Console Output
- Kubernetes events: `kubectl get events -n demo-app`
- Pod logs: `kubectl logs -n demo-app -l app=demo-nginx`
---
**Last Updated:** 2026-01-06
**Version:** 1.0
**Status:** Production Ready ✅