k3s-gitops/apps/demo-nginx/docs/ROLLBACK_MANUAL.md

🔄 Manual Rollback Feature - Complete Documentation

📋 Table of Contents

  1. Overview
  2. Features
  3. Setup Guide
  4. Usage Guide
  5. Rollback Methods
  6. Troubleshooting & Fixes
  7. Best Practices
  8. Examples

Overview

The Manual Rollback feature lets you roll the deployment back to any previous version through a Jenkins pipeline.

Key Features:

  • 3 rollback methods (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
  • GitOps sync - automatically updates the Git manifests
  • Zero downtime - rolling updates
  • DRY_RUN mode - safe testing
  • Health checks - optional check after the rollback
  • RBAC - the permissions the pipeline needs are configured

Features

Rollback Methods

| Method | Description | Example | Use case |
|---|---|---|---|
| IMAGE_TAG | By Docker image tag | main-21 | You know the exact build number |
| REVISION_NUMBER | By Kubernetes revision | 2 | Roll back N steps in the rollout history |
| GIT_COMMIT | By Git commit SHA | abc123def | You need an exact state of the code |

Parameters

ROLLBACK_METHOD      // Method selection
TARGET_VERSION       // Target version (whitespace auto-trimmed)
SKIP_HEALTH_CHECK    // Skip health checks (default: false)
DRY_RUN              // Only show the plan (default: false)
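The guide only states that TARGET_VERSION is trimmed. A stricter per-method format check could catch typos before anything touches the cluster. A minimal sketch, assuming the value is already trimmed; `validate_target_version` is a hypothetical helper, not part of the documented Jenkinsfile.rollback. It is POSIX sh, so it also runs under the default Jenkins `sh` interpreter.

```shell
#!/bin/sh
# Hypothetical pre-flight check: reject a TARGET_VERSION whose shape does not
# match the chosen ROLLBACK_METHOD. Expects an already-trimmed value.
validate_target_version() {
    method=$1
    value=$2

    # grep matches line by line, so refuse multi-line input up front.
    nl='
'
    case $value in
        *"$nl"*) return 1 ;;
    esac

    case $method in
        IMAGE_TAG)
            # Docker tag rules: letters, digits, '_', '.', '-';
            # no leading '.' or '-'; at most 128 characters.
            printf '%s\n' "$value" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$' ;;
        REVISION_NUMBER)
            # Deployment revisions are positive integers.
            printf '%s\n' "$value" | grep -Eq '^[1-9][0-9]*$' ;;
        GIT_COMMIT)
            # Abbreviated (7+ chars) or full (40 chars) hex SHA-1.
            printf '%s\n' "$value" | grep -Eq '^[0-9a-fA-F]{7,40}$' ;;
        *)
            echo "Unknown ROLLBACK_METHOD: $method" >&2
            return 2 ;;
    esac
}
```

Example: `validate_target_version IMAGE_TAG main-21` succeeds, while `validate_target_version IMAGE_TAG " main-21"` fails because of the leading space.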

Setup Guide

Step 1: Create Jenkins Pipeline

1. Jenkins → New Item
2. Name: demo-nginx-rollback
3. Type: Pipeline
4. Click OK

Step 2: Configure Pipeline

Pipeline:
  Definition: Pipeline script from SCM
  SCM: Git
  Repository URL: http://gitea-http.gitea.svc.cluster.local:3000/admin/k3s-gitops
  Credentials: gitea-credentials
  Branch: */main
  Script Path: apps/demo-nginx/Jenkinsfile.rollback

Step 3: Verify RBAC

RBAC is already configured in apps/jenkins/rbac.yaml:

ClusterRole: jenkins-deployer
Permissions:
  - pods, services, deployments (full CRUD)
  - pods/exec, pods/log (for health checks)
  - ingresses, applications (for ArgoCD)

Step 4: Test with DRY_RUN

Jenkins → demo-nginx-rollback → Build with Parameters
├─ ROLLBACK_METHOD: IMAGE_TAG
├─ TARGET_VERSION: main-21
├─ DRY_RUN: ✅ true
└─ Build

Usage Guide

Quick Start

Jenkins → demo-nginx-rollback → Build with Parameters

┌───────────────────────────────────────┐
│ ROLLBACK_METHOD: IMAGE_TAG            │
│ TARGET_VERSION: main-21               │
│ SKIP_HEALTH_CHECK: true (recommended) │
│ DRY_RUN: false                        │
└───────────────────────────────────────┘

→ Build → ✅ SUCCESS!

Pipeline Stages

Stage 1: Validate Input
  └─ Trim whitespace, validate TARGET_VERSION

Stage 2: Show Current State
  └─ Current deployment, image, pods, history

Stage 3: Prepare Rollback
  └─ Build target image path or verify revision

Stage 4: Execute Rollback
  ├─ kubectl set image (or rollout undo)
  └─ Git commit & push

Stage 5: Wait for Rollout
  ├─ kubectl rollout status (300s timeout)
  └─ sleep 10s (stabilization)

Stage 6: Health Check (optional)
  └─ 5 retry attempts with 5s delay

Stage 7: Show New State
  └─ New deployment state, pods, history
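The Stage 4 commands and the DRY_RUN parameter can share one code path: print the command when DRY_RUN is true, execute it otherwise. A minimal sketch, assuming DRY_RUN reaches the shell step as the string "true" or "false" (Jenkins exposes build parameters as environment variables); the `run` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: one code path for both DRY_RUN modes.
run() {
    if [ "${DRY_RUN:-false}" = "true" ]; then
        # Plan only: show the command instead of executing it.
        echo "[DRY_RUN] $*"
    else
        "$@"
    fi
}

# Usage (Stage 4), with the values used in this guide:
# run kubectl set image deployment/demo-nginx \
#     nginx=docker.io/vladcrypto/demo-nginx:main-21 -n demo-app
```

Because the plan is printed by the same call that would execute it, the DRY_RUN output cannot drift from what a real run does.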

Rollback Methods

Method 1: IMAGE_TAG

When to use: you know the exact build number.

How to find the tag:

# Docker Hub
https://hub.docker.com/r/vladcrypto/demo-nginx/tags

# Jenkins build history
Jenkins → demo-nginx → Build History

# Git commits
git log --oneline | grep "Update image"
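The Git search above can be narrowed to just the tags. A minimal sketch, assuming the main-<build> tag scheme used throughout this guide and commit messages that contain "Update image"; `tags_from_log` is a hypothetical helper, and `grep -o` is a common GNU/BSD/BusyBox extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: list image tags from "Update image ..." commits,
# newest first, reading `git log --oneline` output on stdin.
tags_from_log() {
    grep 'Update image' | grep -Eo 'main-[0-9]+'
}

# Usage:
# git log --oneline apps/demo-nginx/deployment.yaml | tags_from_log
```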

Example:

ROLLBACK_METHOD: IMAGE_TAG
TARGET_VERSION: main-21

Result: Rollback to docker.io/vladcrypto/demo-nginx:main-21
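The result above is the registry, repository, app name, and tag joined together. A minimal sketch of that composition, assuming the DOCKER_REGISTRY, DOCKER_REPO, and APP_NAME values from the Configuration Reference; `build_target_image` is a hypothetical helper that also refuses a tag with whitespace (the cause of Issue #2).

```shell
#!/bin/sh
# Values from the Configuration Reference; override via the environment.
DOCKER_REGISTRY=${DOCKER_REGISTRY:-docker.io}
DOCKER_REPO=${DOCKER_REPO:-vladcrypto}
APP_NAME=${APP_NAME:-demo-nginx}

# Hypothetical helper: build "<registry>/<repo>/<app>:<tag>".
build_target_image() {
    tag=$1
    case $tag in
        ''|*[[:space:]]*)
            echo "Invalid image tag: '$tag'" >&2
            return 1 ;;
    esac
    printf '%s/%s/%s:%s\n' "$DOCKER_REGISTRY" "$DOCKER_REPO" "$APP_NAME" "$tag"
}
```

Example: `build_target_image main-21` prints `docker.io/vladcrypto/demo-nginx:main-21`.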

Method 2: REVISION_NUMBER

When to use: you need to go back N steps.

How to find the revision:

kubectl rollout history deployment/demo-nginx -n demo-app

# Output (illustrative; CHANGE-CAUSE shows <none> unless the kubernetes.io/change-cause annotation is set):
REVISION  CHANGE-CAUSE
1         Initial deployment
2         Update to main-20
3         Update to main-21
4         Update to main-22 (current)

Example:

ROLLBACK_METHOD: REVISION_NUMBER
TARGET_VERSION: 2

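Result: Rollback to revision 2 (main-20). Kubernetes records the rollback as a new revision (5), and revision 2 disappears from the history. Only revisions still kept under the Deployment's revisionHistoryLimit (default 10) can be targeted.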
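Because every rollback re-numbers the restored revision, the history develops gaps (for example 1, 3, 4, 5), so "current minus N" can point at a revision that no longer exists. A minimal sketch that counts N entries back by position instead, reading `kubectl rollout history` output on stdin; `revision_steps_back` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the revision that is N steps before the newest.
# Only lines whose first field is a number count as revisions, which skips the
# "deployment.apps/..." and "REVISION  CHANGE-CAUSE" header lines.
revision_steps_back() {
    steps=$1
    awk '$1 ~ /^[0-9]+$/ { print $1 }' | sort -n | awk -v steps="$steps" '
        { rev[NR] = $1 }
        END {
            target = NR - steps
            if (steps < 1 || target < 1) exit 1   # not enough history
            print rev[target]
        }'
}

# Usage:
# kubectl rollout history deployment/demo-nginx -n demo-app | revision_steps_back 2
```

With the history shown above (1 to 4), `revision_steps_back 2` prints 2.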

Method 3: GIT_COMMIT

When to use: you need to return to a specific state of the code.

How to find the commit:

# Gitea
https://git.thedevops.dev/admin/k3s-gitops/commits/branch/main

# Git CLI
git log --oneline apps/demo-nginx/deployment.yaml

# Output:
abc123d Update image to main-22 (current)
def456e Update image to main-21
ghi789f Update image to main-20

Example:

ROLLBACK_METHOD: GIT_COMMIT
TARGET_VERSION: def456e

Result: Rollback to commit def456e
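To see which image a commit corresponds to before rolling back, the manifest can be read as it was at that commit with `git show <commit>:<path>`, without checking anything out. A minimal sketch, assuming a single `image:` line as in a one-container Deployment; `image_from_manifest` is a hypothetical helper, and how the pipeline itself applies a GIT_COMMIT rollback is not documented here.

```shell
#!/bin/sh
# Hypothetical helper: print the first container image found in a
# Deployment manifest read from stdin (quotes around the value are stripped).
image_from_manifest() {
    sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' |
        tr -d "\"'" |
        head -n 1
}

# Usage:
# git show def456e:apps/demo-nginx/deployment.yaml | image_from_manifest
```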

Troubleshooting & Fixes

Issue #1: Container Name Error FIXED

Error:

error: unable to find container named "demo-nginx"

Root Cause: The pipeline used the deployment name instead of the container name.

Fix:

environment {
    APP_NAME       = 'demo-nginx'   // Deployment name
    CONTAINER_NAME = 'nginx'        // Container name in the pod template ✅
    NAMESPACE      = 'demo-app'
}

sh """
    kubectl set image deployment/${APP_NAME} \\
        ${CONTAINER_NAME}=${env.TARGET_IMAGE} -n ${NAMESPACE}
"""

How to verify:

kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].name}'
# Output: nginx
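The same lookup can serve as a pre-flight check that fails before `kubectl set image` is attempted. A minimal sketch, assuming the names come from the `containers[*]` form of the jsonpath above (space-separated); `require_container` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check: succeed only if WANTED is one of the container names.
# NAMES is the output of:
#   kubectl get deployment demo-nginx -n demo-app \
#     -o jsonpath='{.spec.template.spec.containers[*].name}'
require_container() {
    wanted=$1
    names=$2
    for name in $names; do
        [ "$name" = "$wanted" ] && return 0
    done
    echo "Container '$wanted' not found (have: $names)" >&2
    return 1
}
```

Example: `require_container demo-nginx nginx` fails, which is exactly the Issue #1 mistake.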

Issue #2: Whitespace in Input FIXED

Error:

Target image: docker.io/vladcrypto/demo-nginx: main-21
                                              ^
                                              Space!

Root Cause: The user entered TARGET_VERSION with a leading space.

Fix:

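stage('Validate Input') {
    steps {
        script {
            // Auto-trim whitespace (an unset parameter becomes '')
            env.TARGET_VERSION_CLEAN = (params.TARGET_VERSION ?: '').trim()
            if (!env.TARGET_VERSION_CLEAN) {
                error('TARGET_VERSION is empty')
            }
        }
    }
}
// Later stages use ${env.TARGET_VERSION_CLEAN}, never params.TARGET_VERSION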
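Shell steps that read TARGET_VERSION straight from the environment need the same treatment as the Groovy `.trim()`. A minimal sketch using POSIX sed, so it behaves the same under sh/dash and bash; `trim` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: strip leading and trailing whitespace (spaces, tabs).
trim() {
    printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Usage:
# TARGET_VERSION_CLEAN=$(trim "$TARGET_VERSION")
```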

Issue #3: RBAC Permissions FIXED

Error:

Error: User "system:serviceaccount:jenkins:jenkins" 
cannot create resource "pods/exec"

Root Cause: The Jenkins ServiceAccount had no permission on pods/exec, which the health checks need.

Fix:

# apps/jenkins/rbac.yaml
rules:
- apiGroups: [""]
  resources: ["pods/exec", "pods/log"]  # ← Added!
  verbs: ["create", "get"]

Applied:

kubectl apply -f apps/jenkins/rbac.yaml
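After applying, the permissions can be checked by impersonating the ServiceAccount from the error message with `kubectl auth can-i ... --as=...`. A minimal sketch that prints one check per permission; the verb/resource list is an assumption derived from the commands in this guide (`set image` patches deployments, `rollout history`/`undo` read replicasets), and `--as` requires the caller to have impersonation rights.

```shell
#!/bin/sh
# Defaults match the error message and namespace used in this guide.
SA=${SA:-system:serviceaccount:jenkins:jenkins}
NS=${NS:-demo-app}

# Hypothetical helper: print one `kubectl auth can-i` command per permission.
rbac_checks() {
    while read -r verb resource; do
        [ -n "$verb" ] || continue
        echo "kubectl auth can-i $verb $resource -n $NS --as=$SA"
    done <<EOF
get deployments
patch deployments
list replicasets
create pods/exec
get pods/log
EOF
}

# Usage: rbac_checks | sh    (each check answers "yes" or "no")
```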

Issue #4: Health Check Timing ⚠️ WORKAROUND

Error:

wget: can't connect to remote host: Connection refused

Root Cause: Health check runs too early during rolling update (race condition).

Workaround:

// Option 1: Skip health check (recommended)
SKIP_HEALTH_CHECK: true

// Option 2: Longer stabilization wait
sleep 30  // Instead of 10

Timeline:

T+0s:      kubectl set image
T+30s:     Rollout status = complete
T+30-40s:  sleep 10s (stabilization)
T+40s:     Health check (pods might still be starting)

Solution: Use SKIP_HEALTH_CHECK: true and verify manually after 30-60s:

kubectl get pods -n demo-app -l app=demo-nginx
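One possible contributor to the race (an assumption, not confirmed by the error above): a check that takes the first pod from the label selector, as the Test Health Endpoint command later in this guide does with items[0], can land on an old pod that is still terminating and whose nginx has already stopped listening. A minimal sketch of a hypothetical `pick_live_pod` helper that skips pods carrying a deletionTimestamp:

```shell
#!/bin/sh
# Hypothetical helper: print the first pod that is not being deleted.
# Input: one pod per line, "<name> <deletionTimestamp>", e.g. from
#   kubectl get pods -n demo-app -l app=demo-nginx \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.deletionTimestamp}{"\n"}{end}'
# Live pods have no deletionTimestamp, so their line holds only the name.
pick_live_pod() {
    awk 'NF == 1 { print $1; found = 1; exit }
         END     { if (!found) exit 1 }'
}

# Usage:
# POD=$(kubectl get pods ... -o jsonpath='...' | pick_live_pod)
```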

Issue #5: Bash Loop Syntax FIXED

Error:

Health check attempt {1..5}/5...
# Loop executed only once!

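Root Cause: {1..5} is bash brace expansion. The Jenkins sh step runs /bin/sh (dash on Debian/Ubuntu), which does not expand it, so the loop runs once with the literal string "{1..5}".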

Fix:

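#!/bin/bash
# ↑ Added shebang (keep it alone on line 1: anything after the path is passed to bash as an argument)
set -e

# Fixed loop syntax: explicit list instead of {1..5}
for i in 1 2 3 4 5; do
    echo "Health check attempt $i/5..."
    if kubectl exec ...; then
        exit 0
    fi
    if [ "$i" -lt 5 ]; then
        sleep 5
    fi
done

# Without this, the script would exit 0 even after 5 failed attempts
echo "Health check failed after 5 attempts"
exit 1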
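The explicit list fixes the loop but hard-codes the attempt count. A minimal sketch of a hypothetical `retry` helper that takes the count and delay as arguments and counts with `while` and `$((...))`, both POSIX, so it works under sh/dash even without the bash shebang:

```shell
#!/bin/sh
# Hypothetical helper: retry MAX DELAY command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        echo "Health check attempt $i/$max..."
        if "$@"; then
            return 0
        fi
        # No sleep after the last attempt.
        if [ "$i" -lt "$max" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "Health check failed after $max attempts" >&2
    return 1
}

# Usage, mirroring the 5 x 5s loop (command elided as in the original):
# retry 5 5 kubectl exec ...
```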

Best Practices

1. Always Use DRY_RUN First

Step 1: DRY_RUN=true  → Check the plan
Step 2: Verify the output
Step 3: DRY_RUN=false → Execute

2. Use SKIP_HEALTH_CHECK for Emergency

Emergency rollback:
├─ SKIP_HEALTH_CHECK: true
├─ Focus on speed
└─ Verify manually after

3. Document Rollback Reason

Add a comment to the Jenkins build:

Build Comment:
"Rollback due to: API errors in main-23
Previous working version: main-21
Impact: None (zero downtime)"

4. Monitor After Rollback

# Watch pods
watch kubectl get pods -n demo-app

# Check logs
kubectl logs -n demo-app -l app=demo-nginx -f

# Verify image
kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

5. Verify in ArgoCD

ArgoCD UI → demo-nginx
├─ Status: Synced ✅
└─ Health: Healthy ✅

Examples

Example 1: Quick Rollback to Previous Build

Scenario: Build #23 failed, rollback to #21

Steps:
1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG + main-21
3. SKIP_HEALTH_CHECK: true
4. Build

Time: ~2 minutes
Result: ✅ SUCCESS

Example 2: Rollback to Last Week's Version

Scenario: Need stable version from last week

Steps:
1. Find old build: Jenkins → Build History → #15
2. Check image tag: main-15
3. Jenkins → demo-nginx-rollback
4. IMAGE_TAG + main-15
5. DRY_RUN: true (verify first!)
6. DRY_RUN: false (execute)

Result: ✅ Rolled back to main-15

Example 3: Rollback by Revision Number

Scenario: Roll back 3 versions

Steps:
1. Check history:
   kubectl rollout history deployment/demo-nginx -n demo-app
   
2. Find revision: 25 (current: 28)

3. Jenkins → demo-nginx-rollback
4. REVISION_NUMBER + 25
5. Build

Result: ✅ Rolled back to revision 25

Example 4: Rollback by Git Commit

Scenario: You need an exact state of the code

Steps:
1. Find commit:
   git log --oneline apps/demo-nginx/deployment.yaml
   
2. Copy SHA: abc123def

3. Jenkins → demo-nginx-rollback
4. GIT_COMMIT + abc123def
5. Build

Result: ✅ Rolled back to commit abc123def

Manual Verification Commands

Check Deployment Status

kubectl get deployment demo-nginx -n demo-app

# Expected:
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
demo-nginx   2/2     2            2           15h

Check Image Version

kubectl get deployment demo-nginx -n demo-app \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Expected: docker.io/vladcrypto/demo-nginx:main-21
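Instead of reading the output by eye, a script or an extra pipeline stage can compare the reported image with the expected one and fail on a mismatch. A minimal sketch; `assert_image` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical post-rollback assertion: ACTUAL must equal EXPECTED.
assert_image() {
    actual=$1
    expected=$2
    if [ "$actual" = "$expected" ]; then
        echo "OK: running $actual"
    else
        echo "MISMATCH: expected $expected, got $actual" >&2
        return 1
    fi
}

# Usage:
# assert_image "$(kubectl get deployment demo-nginx -n demo-app \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')" \
#     docker.io/vladcrypto/demo-nginx:main-21
```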

Check Pods

kubectl get pods -n demo-app -l app=demo-nginx

# Expected: 2 pods Running

Check Rollout History

kubectl rollout history deployment/demo-nginx -n demo-app

# Shows all revisions

Test Health Endpoint

POD=$(kubectl get pods -n demo-app -l app=demo-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec $POD -n demo-app -- wget -q -O- http://localhost/health

# Expected: healthy

Emergency Rollback Procedure

If Production is Down

Option 1: Jenkins (2 minutes)

1. Jenkins → demo-nginx-rollback
2. IMAGE_TAG → last known good version
3. SKIP_HEALTH_CHECK: ✅ true
4. Build

Option 2: kubectl (30 seconds)

# Fastest - rollback to previous
kubectl rollout undo deployment/demo-nginx -n demo-app

# To specific revision
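kubectl rollout undo deployment/demo-nginx -n demo-app --to-revision=25

Note: if the ArgoCD Application for demo-nginx uses automated sync with selfHeal enabled, ArgoCD will revert this change to the version in Git. Update the manifest in Git as well to make the rollback stick.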

Option 3: ArgoCD (1 minute)

1. ArgoCD UI → demo-nginx
2. History → Select previous version
3. Rollback button (ArgoCD only allows this when automated sync is disabled for the Application)

Configuration Reference

Environment Variables

APP_NAME = 'demo-nginx'                  // Deployment name
CONTAINER_NAME = 'nginx'                 // Container name
NAMESPACE = 'demo-app'                   // K8s namespace
DOCKER_REGISTRY = 'docker.io'            // Registry
DOCKER_REPO = 'vladcrypto'              // Docker Hub user
HEALTH_CHECK_TIMEOUT = '300s'            // Rollout timeout

Customization

Change the settings in Jenkinsfile.rollback:

// Increase the rollout timeout
HEALTH_CHECK_TIMEOUT = '600s'

// More health check attempts (also update the `-lt 5` guard and the "/5" in the message)
for i in 1 2 3 4 5 6 7 8 9 10; do

// Wait longer for stabilization
sleep 30  // Instead of 10
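The attempt count can be derived from the time window the check should cover instead of being edited by hand. With N attempts and a D-second delay the checks span (N-1)·D seconds, so the current 5 × 5s covers only about 20s, less than the 30-60s recommended above. A minimal sketch (ignoring the time each check itself takes); `attempts_for_window` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: attempts needed so the last check happens at least
# WINDOW seconds after the first, with DELAY seconds between checks.
attempts_for_window() {
    window=$1
    delay=$2
    if [ "$delay" -le 0 ]; then
        echo "delay must be > 0" >&2
        return 1
    fi
    # ceil(window / delay) gaps between attempts, plus the first attempt
    echo $(( (window + delay - 1) / delay + 1 ))
}
```

Example: `attempts_for_window 60 5` prints 13, and `attempts_for_window 30 5` prints 7.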

Monitoring & Alerts

Grafana Dashboard

# Rollback count
sum(increase(deployment_rollback_total[1h])) by (deployment)

# Rollback rate
rate(deployment_rollback_total[5m])

# Average rollback duration
avg(deployment_rollback_duration_seconds)

Alert Rules

- alert: FrequentRollbacks
  # increase() counts events in the window; rate() would be per second
  expr: increase(deployment_rollback_total[1h]) > 2
  annotations:
    summary: "Frequent rollbacks detected"
    description: "More than 2 rollbacks in the last hour"

- alert: RollbackFailed
  # A bare counter > 0 would keep firing forever after the first failure
  expr: increase(deployment_rollback_failed_total[1h]) > 0
  annotations:
    summary: "Rollback failed"
    description: "Manual intervention required"

Summary of All Fixes

| # | Issue | Fix | Status |
|---|---|---|---|
| 1 | Container name wrong | Use nginx, not demo-nginx | Fixed |
| 2 | Whitespace in input | Auto-trim with .trim() | Fixed |
| 3 | RBAC pods/exec | Add permission to ClusterRole | Fixed |
| 4 | Health check timing | Use SKIP_HEALTH_CHECK=true | ⚠️ Workaround |
| 5 | Bash loop syntax | Use explicit list 1 2 3 4 5 | Fixed |

Success Criteria

Rollback Methods: 3/3 working (IMAGE_TAG, REVISION_NUMBER, GIT_COMMIT)
GitOps Sync: changes are committed to Git automatically
Zero Downtime: Rolling updates
RBAC: Full permissions configured
Input Validation: Whitespace auto-trimmed
DRY_RUN: Safe testing mode
Retry Logic: 5 attempts with proper bash syntax
⚠️ Health Check: Optional (use SKIP_HEALTH_CHECK=true)


FAQ

Q: The health check always fails. Is that normal?

A: Yes, because of a timing race condition during the rolling update. Use SKIP_HEALTH_CHECK: true and verify manually after 30-60s.

Q: How do I roll back several versions?

A: Use the REVISION_NUMBER method and specify the revision you need from kubectl rollout history.

Q: Can I roll back only in staging?

A: Yes. Change NAMESPACE in the Jenkinsfile or create a separate job for staging.

Q: How do I roll back quickly in an emergency?

A: Use kubectl rollout undo (30 seconds) or Jenkins with SKIP_HEALTH_CHECK=true (2 minutes).

Q: What if the Git commit fails?

A: The rollback has still been applied in Kubernetes; Git is only needed for the GitOps sync. Git, however, still points to the version you rolled back from: if the ArgoCD Application uses automated sync with selfHeal, ArgoCD will restore that version, otherwise it shows the app as OutOfSync. Commit the manifest change to Git manually so the two agree.



Support

Issues?

  • Check Jenkins console output
  • Verify RBAC permissions
  • Check pod status: kubectl get pods -n demo-app
  • Review ArgoCD sync status

Need Help?

  • Jenkins logs: Jenkins → Build → Console Output
  • Kubernetes events: kubectl get events -n demo-app
  • Pod logs: kubectl logs -n demo-app -l app=demo-nginx

Last Updated: 2026-01-06
Version: 1.0
Status: Production Ready