# GitLab + Harbor + Docker Swarm: Automated Deployment Solution
**Version:** 1.0
**Created:** January 2026
**Status:** Implementation Ready
**Target audience:** DevOps Team, Development Team
---
## Executive Summary
This document describes a practical solution for automating the deployment process on the existing infrastructure:
**Current situation:**
- ✅ GitLab is already installed
- ✅ Harbor Registry is already running
- ✅ Docker Swarm runs several containers
- ✅ 4 environments: Development → Sandbox → Testing → Production
- ❌ Manual deployment through bash scripts
- ❌ No code review process
- ❌ No automatic rollback
- ❌ Ready-made images are pulled from Harbor with no visibility into what is deployed
**Proposed solution:**
- GitLab CI/CD pipelines for automated deployment
- GitOps approach: Git as the source of truth for deployments
- Automated promotion across environments with approval gates
- One-click rollback capability
- Deployment history and audit trail
- Health checks and automatic rollback on failure
**Expected results of implementation:**
- 🚀 Deployment time: from 30-60 minutes down to 5-10 minutes
- 🔒 Human errors: 90% reduction (target)
- 📊 Full visibility: who deployed what, and when
- ⚡ Rollback: from 1-2 hours down to 2-3 minutes
- ✅ Compliance: complete audit trail
---
## Contents
1. [Solution Architecture](#1-solution-architecture)
2. [GitLab CI/CD Pipeline Implementation](#2-gitlab-cicd-pipeline-implementation)
3. [Docker Stack Management](#3-docker-stack-management)
4. [Environment Management Strategy](#4-environment-management-strategy)
5. [Rollback Strategy](#5-rollback-strategy)
6. [Monitoring & Health Checks](#6-monitoring--health-checks)
7. [Implementation Roadmap](#7-implementation-roadmap)
8. [Best Practices & Tips](#8-best-practices--tips)
9. [Success Metrics](#9-success-metrics)
10. [Conclusion & Next Steps](#10-conclusion--next-steps)
---
## 1. Solution Architecture
### 1.1 Current State Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                   Current Manual Process                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Developer → Build Image → Push to Harbor                   │
│                     ↓                                       │
│              Notify DevOps Team                             │
│                     ↓                                       │
│  DevOps manually runs bash scripts:                         │
│                                                             │
│  1. SSH to Swarm manager                                    │
│  2. docker service update app --image harbor/app:new-tag    │
│  3. Check logs manually                                     │
│  4. Hope everything works                                   │
│  5. Repeat for each environment (4x)                        │
│                                                             │
│  Problems:                                                  │
│  • Time consuming (30-60 min per environment)               │
│  • Error prone (typos, wrong tags)                          │
│  • No rollback plan                                         │
│  • No audit trail                                           │
│  • No validation before deployment                          │
└─────────────────────────────────────────────────────────────┘
```
### 1.2 Target Automated Architecture
```
┌──────────────────────────────────────────────────────────────┐
│               Automated GitOps-Based Solution                │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Developer pushes image tag change to Git                    │
│                    ↓                                         │
│  GitLab CI/CD Pipeline automatically:                        │
│                    ↓                                         │
│  ┌─────────────────────────────────────────────────┐         │
│  │ 1. Validate docker-compose.yml syntax           │         │
│  │ 2. Check image exists in Harbor                 │         │
│  │ 3. Deploy to Development (automatic)            │         │
│  │ 4. Run health checks                            │         │
│  │ 5. Wait for manual approval → Sandbox           │         │
│  │ 6. Deploy to Sandbox                            │         │
│  │ 7. Wait for manual approval → Testing           │         │
│  │ 8. Deploy to Testing                            │         │
│  │ 9. Wait for manual approval → Production        │         │
│  │ 10. Deploy to Production                        │         │
│  │ 11. Monitor deployment success                  │         │
│  │ 12. Auto-rollback if health checks fail         │         │
│  └─────────────────────────────────────────────────┘         │
│                                                              │
│  Benefits:                                                   │
│  ✅ 5-10 minutes per environment                             │
│  ✅ Far fewer human errors (target: <2% failed deployments)  │
│  ✅ Automatic rollback on failure                            │
│  ✅ Complete audit trail in Git                              │
│  ✅ Pre-deployment validation                                │
└──────────────────────────────────────────────────────────────┘
```
### 1.3 Git Repository Structure
```
deployment-configs/              # New GitLab repository
├── README.md
├── .gitlab-ci.yml               # CI/CD pipeline definition
├── environments/
│   ├── development/
│   │   ├── docker-compose.yml
│   │   ├── .env
│   │   └── healthcheck.sh
│   │
│   ├── sandbox/
│   │   ├── docker-compose.yml
│   │   ├── .env
│   │   └── healthcheck.sh
│   │
│   ├── testing/
│   │   ├── docker-compose.yml
│   │   ├── .env
│   │   └── healthcheck.sh
│   │
│   └── production/
│       ├── docker-compose.yml
│       ├── .env
│       └── healthcheck.sh
├── scripts/
│   ├── deploy.sh                # Deployment script
│   ├── rollback.sh              # Rollback script
│   ├── healthcheck.sh           # Health validation
│   └── validate-compose.sh      # Pre-deployment validation
└── docs/
    ├── deployment-guide.md
    └── rollback-procedure.md
```
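The layout above can be created in one step. The following is a minimal sketch, not part of the original solution: the function name `scaffold_repo` is hypothetical, and it only creates empty placeholder files that still need real content.

```bash
#!/usr/bin/env bash
# scaffold_repo [ROOT]
# Create the directory layout shown above with empty placeholder files.
# ROOT defaults to ./deployment-configs. Existing files are left untouched
# because `touch` only updates timestamps.
scaffold_repo() {
  local root="${1:-deployment-configs}"
  local env
  for env in development sandbox testing production; do
    mkdir -p "$root/environments/$env"
    touch "$root/environments/$env/docker-compose.yml" \
          "$root/environments/$env/.env" \
          "$root/environments/$env/healthcheck.sh"
  done
  mkdir -p "$root/scripts" "$root/docs"
  touch "$root/README.md" "$root/.gitlab-ci.yml"
}
```

Calling `scaffold_repo` with no arguments inside an empty working directory produces the tree, after which the current compose files can be copied into each environment directory.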
---
## 2. GitLab CI/CD Pipeline Implementation
### 2.1 Complete .gitlab-ci.yml
```yaml
# .gitlab-ci.yml - Complete automated deployment pipeline
variables:
  # Used by the Docker CLI in the validate jobs; the runner needs matching TLS client certificates
  DOCKER_HOST: "tcp://docker-swarm-manager:2376"
  DOCKER_TLS_VERIFY: "1"
  HARBOR_REGISTRY: "harbor.company.com"
  # Swarm connection details (stored in GitLab CI/CD variables)
  # SWARM_DEV_HOST, SWARM_SANDBOX_HOST, SWARM_TEST_HOST, SWARM_PROD_HOST
  # SWARM_SSH_KEY (SSH private key for authentication)
stages:
  - validate
  - deploy-dev
  - deploy-sandbox
  - deploy-testing
  - deploy-production
  - rollback
#═══════════════════════════════════════════════════════════
# Stage 1: VALIDATION
#═══════════════════════════════════════════════════════════
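validate:syntax:
  stage: validate
  image: docker:24-cli
  script:
    - echo "Validating docker-compose files..."
    - |
      for env in development sandbox testing production; do
        echo "Checking $env environment..."
        # Uses the Compose CLI plugin (`docker compose`); the standalone
        # `docker-compose` binary is not assumed to be installed in this image.
        # Testing the command directly also works when the shell runs with `set -e`.
        if docker compose -f "environments/$env/docker-compose.yml" config > /dev/null; then
          echo "✅ $env: Syntax OK"
        else
          echo "❌ $env: Syntax ERROR"
          exit 1
        fi
      done
  only:
    - branches
  tags:
    - docker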
validate:images:
  stage: validate
  image: docker:24-cli
  before_script:
    # --password-stdin keeps the password out of the process list
    - echo "$HARBOR_PASSWORD" | docker login -u "$HARBOR_USER" --password-stdin "$HARBOR_REGISTRY"
  script:
    - echo "Checking if images exist in Harbor..."
    - |
      for env in development sandbox testing production; do
        echo "Checking images for $env..."
        # Let Compose resolve ${VAR} placeholders from the environment's .env file;
        # a plain grep would return the unexpanded "${HARBOR_REGISTRY}/..." strings
        images=$(docker compose -f "environments/$env/docker-compose.yml" config --images)
        for image in $images; do
          echo "Pulling $image to verify existence..."
          if docker pull "$image"; then
            echo "✅ Image exists: $image"
          else
            echo "❌ Image NOT found: $image"
            exit 1
          fi
        done
      done
  only:
    - branches
  tags:
    - docker
#═══════════════════════════════════════════════════════════
# Stage 2: DEPLOY TO DEVELOPMENT (Automatic)
#═══════════════════════════════════════════════════════════
deploy:development:
  stage: deploy-dev
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_DEV_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to DEVELOPMENT environment..."
    # Copy files to swarm manager. Start from a clean, existing target so that
    # scp creates /tmp/deploy/development instead of renaming or nesting it.
    - ssh root@$SWARM_DEV_HOST "rm -rf /tmp/deploy && mkdir -p /tmp/deploy"
    - scp -r environments/development root@$SWARM_DEV_HOST:/tmp/deploy/
    - scp scripts/deploy.sh root@$SWARM_DEV_HOST:/tmp/deploy/
    # Execute deployment
    - |
      ssh root@$SWARM_DEV_HOST bash << 'EOF'
      set -euo pipefail
      cd /tmp/deploy/development
      # Load environment variables and export them, so that
      # `docker stack deploy` can substitute ${VAR} in the compose file
      set -a; source .env; set +a
      # Deploy stack
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      # Wait for services to stabilize
      echo "Waiting for services to start..."
      sleep 30
      # Check service status
      docker stack services app-stack
      # Run health checks (healthcheck.sh sits next to docker-compose.yml)
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to DEVELOPMENT completed"
  environment:
    name: development
    url: https://dev.company.com
  only:
    - main
    - develop
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# Stage 3: DEPLOY TO SANDBOX (Manual Approval Required)
#═══════════════════════════════════════════════════════════
deploy:sandbox:
  stage: deploy-sandbox
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_SANDBOX_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to SANDBOX environment..."
    - ssh root@$SWARM_SANDBOX_HOST "rm -rf /tmp/deploy && mkdir -p /tmp/deploy"
    - scp -r environments/sandbox root@$SWARM_SANDBOX_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_SANDBOX_HOST bash << 'EOF'
      set -euo pipefail
      cd /tmp/deploy/sandbox
      # Export the variables so `docker stack deploy` can substitute them
      set -a; source .env; set +a
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to SANDBOX completed"
  environment:
    name: sandbox
    url: https://sandbox.company.com
  when: manual # ⚠️ Requires manual approval
  allow_failure: false # Later stages stay blocked until this job has run
  only:
    - main
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# Stage 4: DEPLOY TO TESTING (Manual Approval Required)
#═══════════════════════════════════════════════════════════
deploy:testing:
  stage: deploy-testing
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_TEST_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to TESTING environment..."
    - ssh root@$SWARM_TEST_HOST "rm -rf /tmp/deploy && mkdir -p /tmp/deploy"
    - scp -r environments/testing root@$SWARM_TEST_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_TEST_HOST bash << 'EOF'
      set -euo pipefail
      cd /tmp/deploy/testing
      # Export the variables so `docker stack deploy` can substitute them
      set -a; source .env; set +a
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to TESTING completed"
  environment:
    name: testing
    url: https://testing.company.com
  when: manual # ⚠️ Requires manual approval
  allow_failure: false # Later stages stay blocked until this job has run
  only:
    - main
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# Stage 5: DEPLOY TO PRODUCTION (Manual Approval Required)
#═══════════════════════════════════════════════════════════
deploy:production:
  stage: deploy-production
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to PRODUCTION environment..."
    # Backup current deployment
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -euo pipefail
      echo "Creating backup of current deployment..."
      # Compute the timestamp once so both commands use the same directory
      BACKUP_DIR="/backup/deployments/$(date +%Y%m%d-%H%M%S)"
      mkdir -p "$BACKUP_DIR"
      docker stack services app-stack --format "{{.Name}} {{.Image}}" > "$BACKUP_DIR/services.txt"
      echo "Backup created"
      EOF
    # Deploy new version
    - ssh root@$SWARM_PROD_HOST "rm -rf /tmp/deploy && mkdir -p /tmp/deploy"
    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -euo pipefail
      cd /tmp/deploy/production
      # Export the variables so `docker stack deploy` can substitute them
      set -a; source .env; set +a
      echo "Starting production deployment..."
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      echo "Waiting for services to stabilize..."
      sleep 60
      echo "Checking service health..."
      docker stack services app-stack
      # Run comprehensive health checks
      if bash ./healthcheck.sh; then
        echo "✅ Health checks PASSED"
      else
        echo "❌ Health checks FAILED - consider rollback"
        exit 1
      fi
      EOF
    - echo "✅ Deployment to PRODUCTION completed successfully"
  environment:
    name: production
    url: https://app.company.com
  when: manual # ⚠️ Requires manual approval + confirmation
  allow_failure: false # The pipeline stays blocked until this job has run
  only:
    - main
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# ROLLBACK JOBS (Manual Trigger)
#═══════════════════════════════════════════════════════════
rollback:production:
  stage: rollback
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli git
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🔄 Rolling back PRODUCTION to previous version..."
    # Get previous Git commit (assumes HEAD~1 holds the last known-good production config)
    - PREVIOUS_COMMIT=$(git rev-parse HEAD~1)
    - echo "Rolling back to commit $PREVIOUS_COMMIT"
    # Checkout previous version
    - git checkout $PREVIOUS_COMMIT -- environments/production/
    # Deploy previous version
    - ssh root@$SWARM_PROD_HOST "rm -rf /tmp/rollback && mkdir -p /tmp/rollback"
    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/rollback/
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -euo pipefail
      cd /tmp/rollback/production
      # Export the variables so `docker stack deploy` can substitute them
      set -a; source .env; set +a
      echo "Rolling back to previous version..."
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      echo "Verifying rollback..."
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Rollback completed"
  environment:
    name: production
  when: manual
  only:
    - main
  tags:
    - deployment
```
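The deploy jobs wait a fixed `sleep 30` or `sleep 60` before checking health, which is either too long or too short depending on the service. One possible alternative is to poll until a check succeeds or a deadline passes. The helper below is a sketch; `wait_until` is a hypothetical name, not something the pipeline above defines.

```bash
#!/usr/bin/env bash
# wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Re-run CMD every INTERVAL seconds until it succeeds (returns 0)
# or until TIMEOUT seconds have elapsed (returns 1).
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is a bash built-in counter of seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

In a deploy script this could replace the fixed sleep, for example `wait_until 120 5 curl -sf http://localhost/health`, which retries the frontend health endpoint every 5 seconds for up to 2 minutes.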
---
## 3. Docker Stack Management
### 3.1 Example docker-compose.yml Structure
```yaml
# environments/production/docker-compose.yml
version: '3.8'
services:
#════════════════════════════════════════════════════════
# Frontend Application
#════════════════════════════════════════════════════════
  frontend:
    image: ${HARBOR_REGISTRY}/company/frontend:${FRONTEND_VERSION}
    networks:
      - app-network
    ports:
      - "80:80"
      - "443:443"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
        monitor: 30s
      rollback_config:
        parallelism: 1
        delay: 5s
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
#════════════════════════════════════════════════════════
# Backend API
#════════════════════════════════════════════════════════
  api:
    image: ${HARBOR_REGISTRY}/company/api:${API_VERSION}
    networks:
      - app-network
      - db-network
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
      # Path of the mounted Docker secret (defined in .env); the secret value
      # itself never appears in the environment
      - JWT_SECRET_FILE=${JWT_SECRET_FILE}
    secrets:
      - db_password
      - jwt_secret
    deploy:
      replicas: 5
      update_config:
        parallelism: 2
        delay: 10s
        failure_action: rollback
        monitor: 45s
      rollback_config:
        parallelism: 2
        delay: 5s
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
#════════════════════════════════════════════════════════
# Worker Service
#════════════════════════════════════════════════════════
  worker:
    image: ${HARBOR_REGISTRY}/company/worker:${WORKER_VERSION}
    networks:
      - app-network
      - db-network
    environment:
      - REDIS_URL=${REDIS_URL}
      - QUEUE_NAME=jobs
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: any
        delay: 10s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
#════════════════════════════════════════════════════════
# Cache (Redis)
#════════════════════════════════════════════════════════
  redis:
    image: redis:7-alpine
    networks:
      - app-network
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == worker
      restart_policy:
        condition: any
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
#════════════════════════════════════════════════════════
# Networks
#════════════════════════════════════════════════════════
networks:
  app-network:
    driver: overlay
    attachable: true
  db-network:
    driver: overlay
    internal: true
#════════════════════════════════════════════════════════
# Secrets
#════════════════════════════════════════════════════════
secrets:
  db_password:
    external: true
  jwt_secret:
    external: true
```
### 3.2 Environment Variables (.env files)
```bash
# environments/production/.env
# Harbor Registry
HARBOR_REGISTRY=harbor.company.com
# Application Versions (THIS IS WHAT YOU UPDATE!)
FRONTEND_VERSION=v2.1.5
API_VERSION=v3.2.1
WORKER_VERSION=v1.8.3
# Database Configuration
DATABASE_URL=postgresql://user@db-prod:5432/appdb
# Redis Configuration
REDIS_URL=redis://redis:6379
# Application Configuration
JWT_SECRET_FILE=/run/secrets/jwt_secret
LOG_LEVEL=info
ENVIRONMENT=production
```
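`docker stack deploy` substitutes `${VAR}` placeholders from the process environment and does not read a `.env` file on its own, while a plain `source .env` only sets shell variables without exporting them to child processes. A minimal sketch of a loader that exports everything in the file follows; the function name `load_env_file` is hypothetical.

```bash
#!/usr/bin/env bash
# load_env_file FILE
# Export every KEY=VALUE assignment in FILE so that child processes
# (for example `docker stack deploy`) can see the values.
load_env_file() {
  local file="$1"
  if [[ ! -r "$file" ]]; then
    echo "Cannot read $file" >&2
    return 1
  fi
  set -a        # auto-export every variable assigned from here on
  # shellcheck disable=SC1090
  . "$file"
  set +a        # stop auto-exporting
}
```

A deploy script could call `load_env_file .env` right before `docker stack deploy -c docker-compose.yml --with-registry-auth app-stack`. The file is executed as shell code, so it must only contain trusted assignments.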
### 3.3 Health Check Script
```bash
#!/bin/bash
# environments/production/healthcheck.sh
# No `set -e` here: every check must run, and failures are collected in FAILED.
set -u
echo "═══════════════════════════════════════════════"
echo "Running Health Checks for Production"
echo "═══════════════════════════════════════════════"
STACK_NAME="app-stack"
FAILED=0
# Check if all services are running
echo ""
echo "1. Checking service status..."
while read -r service replicas _; do
  echo "   $service: $replicas"
  running="${replicas%%/*}"
  desired="${replicas##*/}"
  # Compare running vs. desired; a substring match on "0/" would also flag 10/10
  if [ "$running" != "$desired" ] || [ "$desired" = "0" ]; then
    echo "   ❌ Service $service is not fully running ($replicas)"
    FAILED=1
  fi
done < <(docker stack services "$STACK_NAME" --format "{{.Name}} {{.Replicas}}")
# Check frontend health endpoint
echo ""
echo "2. Checking Frontend health endpoint..."
if curl -sf http://localhost/health > /dev/null; then
  echo "   ✅ Frontend health check PASSED"
else
  echo "   ❌ Frontend health check FAILED"
  FAILED=1
fi
# Check API health endpoint
# Assumes port 3000 is reachable from this node; the example compose file does not publish it
echo ""
echo "3. Checking API health endpoint..."
if curl -sf http://localhost:3000/health > /dev/null; then
  echo "   ✅ API health check PASSED"
else
  echo "   ❌ API health check FAILED"
  FAILED=1
fi
# Check Redis connectivity
echo ""
echo "4. Checking Redis connectivity..."
REDIS_CONTAINER=$(docker ps -q -f "name=${STACK_NAME}_redis" | head -n 1)
if [ -z "$REDIS_CONTAINER" ]; then
  # docker exec only reaches containers on this node; Redis may run on another one
  echo "   ⚠️ No Redis container on this node - skipping direct check"
elif docker exec "$REDIS_CONTAINER" redis-cli ping | grep -q PONG; then
  echo "   ✅ Redis connectivity PASSED"
else
  echo "   ❌ Redis connectivity FAILED"
  FAILED=1
fi
# Check for recent errors in logs
echo ""
echo "5. Checking recent logs for errors..."
ERROR_COUNT=0
for service in $(docker stack services "$STACK_NAME" --format "{{.Name}}"); do
  # docker service logs takes a service name, not a stack name
  COUNT=$(docker service logs --since 5m "$service" 2>&1 | grep -ciE "error|fatal|panic")
  ERROR_COUNT=$((ERROR_COUNT + COUNT))
done
if [ "$ERROR_COUNT" -gt 10 ]; then
  echo "   ⚠️ Found $ERROR_COUNT errors in last 5 minutes"
  FAILED=1
else
  echo "   ✅ Error count acceptable: $ERROR_COUNT"
fi
echo ""
echo "═══════════════════════════════════════════════"
if [ $FAILED -eq 0 ]; then
  echo "✅ ALL HEALTH CHECKS PASSED"
  echo "═══════════════════════════════════════════════"
  exit 0
else
  echo "❌ HEALTH CHECKS FAILED"
  echo "═══════════════════════════════════════════════"
  exit 1
fi
```
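Replica strings such as `0/3` and `10/10` both contain the substring `0/`, so a substring match cannot tell a dead service from a healthy one. A stricter comparison parses both numbers. The sketch below uses a hypothetical helper name, `replicas_ok`, and assumes the `RUNNING/DESIRED` format that `docker service ls` prints in its Replicas column.

```bash
#!/usr/bin/env bash
# replicas_ok "RUNNING/DESIRED[ suffix]"
# Succeeds only when DESIRED is greater than zero and RUNNING equals DESIRED.
replicas_ok() {
  local replicas="${1%% *}"        # drop suffixes such as " (max 1 per node)"
  local running="${replicas%%/*}"  # text before the first slash
  local desired="${replicas##*/}"  # text after the last slash
  # Reject anything that is not two plain integers.
  [[ "$running" =~ ^[0-9]+$ && "$desired" =~ ^[0-9]+$ ]] || return 1
  (( desired > 0 && running == desired ))
}
```

Inside a loop over services this reads as `replicas_ok "$REPLICAS" || FAILED=1`. It is stricter than checking for zero running replicas, because a partially running service such as `2/3` also counts as a failure.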
---
## 4. Environment Management Strategy
### 4.1 Promotion Flow
```
┌─────────────────────────────────────────────────────────┐
│               Environment Promotion Flow                │
└─────────────────────────────────────────────────────────┘
Developer updates image version in Git
        ↓
Development (Automatic)
  ├─ Deploy immediately
  ├─ Run health checks
  └─ ✅ If successful → enable Sandbox deployment
        ↓ (Manual approval required)
Sandbox (Manual Trigger)
  ├─ QA team tests features
  ├─ Run integration tests
  └─ ✅ If approved → enable Testing deployment
        ↓ (Manual approval required)
Testing (Manual Trigger)
  ├─ Full regression testing
  ├─ Performance testing
  └─ ✅ If approved → enable Production deployment
        ↓ (Manual approval required + confirmation)
Production (Manual Trigger)
  ├─ Backup current state
  ├─ Deploy as a rolling update (Swarm update_config)
  ├─ Run comprehensive health checks
  └─ ✅ Monitor or 🔄 Rollback if issues
```
### 4.2 Deployment Approval Matrix
| Environment | Approval Required | Who Can Approve | Rollback Strategy |
|-------------|-------------------|-----------------|-------------------|
| **Development** | ❌ No (Automatic) | N/A | Automatic on health check failure |
| **Sandbox** | ✅ Yes (Manual) | Any Developer | Manual via GitLab UI |
| **Testing** | ✅ Yes (Manual) | QA Lead, DevOps Lead | Manual via GitLab UI |
| **Production** | ✅ Yes (Manual + Confirmation) | DevOps Lead, CTO | Automatic on failure + Manual option |
### 4.3 Change Management Workflow
```bash
# Example: Updating application version
# 1. Developer receives new image from Harbor
#    New image available: harbor.company.com/company/api:v3.2.2
# 2. Developer creates feature branch
git checkout -b update-api-v3.2.2
# 3. Update version in Development environment
#    Edit environments/development/.env and set:
#    API_VERSION=v3.2.2
# 4. Commit and push
git add environments/development/.env
git commit -m "feat: update API to v3.2.2 in development"
git push origin update-api-v3.2.2
# 5. Create Merge Request in GitLab
#    - Title: "Update API to v3.2.2"
#    - Description: "New features: X, Y, Z. Bug fixes: A, B"
#    - Assign to: DevOps team for review
# 6. After MR approval and merge to main:
#    - GitLab CI automatically deploys to Development
#    - Monitor deployment
#    - If successful, manually trigger Sandbox deployment
# 7. QA tests in Sandbox
#    - If approved, update Testing environment
#    - Repeat process
# 8. Production deployment
#    - Update production/.env with new version
#    - Create MR with detailed change log
#    - Require approvals from: DevOps Lead + CTO
#    - Schedule deployment window
#    - Execute manual deployment
#    - Monitor closely
```
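Step 3 of the workflow is a one-line change to a `.env` file, which is easy to script so that typos in the key name are caught. The helper below is a sketch under that assumption; `set_env_value` is a hypothetical name and is not one of the scripts listed in the repository structure.

```bash
#!/usr/bin/env bash
# set_env_value FILE KEY VALUE
# Replace the line KEY=... in FILE, or append KEY=VALUE when KEY is missing.
# All other lines, including comments, are copied unchanged.
set_env_value() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if grep -q "^${key}=" "$file"; then
    # Split on "=" and rewrite only the line whose first field is KEY.
    awk -F= -v k="$key" -v v="$value" \
      '$1 == k { print k "=" v; next } { print }' "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back through redirection so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For the example in this section the call would be `set_env_value environments/development/.env API_VERSION v3.2.2`, followed by the usual `git add` and `git commit`.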
---
## 5. Rollback Strategy
### 5.1 Automatic Rollback (Health Check Failure)
```yaml
# In docker-compose.yml - automatic rollback on failure
services:
  api:
    deploy:
      update_config:
        failure_action: rollback # ← Automatic rollback!
        monitor: 60s             # Monitor each updated task for 60 seconds
      rollback_config:
        parallelism: 2           # Roll back 2 at a time
        delay: 5s                # 5s between rollbacks
```
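**How it works:**
1. `docker stack deploy` starts a rolling update of the service
2. After each task is updated, Docker Swarm monitors it for the `monitor` period (60 seconds here); a task that exits or whose health check reports unhealthy in that window counts as a failure
3. If the update fails → Swarm automatically rolls the service back to its previous version
4. The previous version is typically restored within 2-3 minutes (target; the actual time depends on replica count, `parallelism`, and `delay`)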
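Because `docker stack deploy` returns as soon as the update is submitted, a pipeline job can finish green even though Swarm later rolled the service back. One way to detect this is to read the service's update state afterwards. The sketch below maps the state values Swarm reports to a decision; the function name `update_outcome` and the outcome labels are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# update_outcome STATE
# Map a Swarm UpdateStatus.State value to a coarse outcome label.
update_outcome() {
  case "$1" in
    completed|"")              echo "ok" ;;           # "" = service has never been updated
    updating|rollback_started) echo "in-progress" ;;
    rollback_completed)        echo "rolled-back" ;;
    paused|rollback_paused)    echo "stuck" ;;
    *)                         echo "unknown" ;;
  esac
}

# Usage on a manager node (requires the Docker CLI):
#   state=$(docker service inspect app-stack_api \
#             --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}')
#   [ "$(update_outcome "$state")" = "ok" ] || exit 1
```

Failing the job on `rolled-back` keeps the pipeline status consistent with what is actually running, even though the service itself has recovered.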
### 5.2 Manual Rollback via GitLab
**Option A: Rollback via Git History**
```bash
# GitLab Pipeline: rollback:production job
# 1. Identify previous working version
git log --oneline environments/production/.env
# 2. Checkout previous commit
git checkout <previous-commit-hash> -- environments/production/
# 3. Pipeline redeploys previous version
# 4. Verify health checks
```
**Option B: Rollback via GitLab UI**
```
GitLab → Deployments → Environments → Production
Click "Rollback" button
Select previous successful deployment
Confirm rollback
GitLab re-runs the deployment job of the selected deployment
```
### 5.3 Emergency Rollback Procedure
```bash
#!/bin/bash
# scripts/emergency-rollback.sh
# FOR EMERGENCY USE ONLY - bypasses GitLab pipeline
# Run directly on Swarm manager node
set -eu
BACKUP_DIR="/backup/deployments"
echo "🚨 EMERGENCY ROLLBACK INITIATED"
# Find last backup (most recently modified directory)
LAST_BACKUP=$(ls -td "$BACKUP_DIR"/* | head -1)
echo "Rolling back to: $LAST_BACKUP"
# Extract previous image versions.
# services.txt holds "<service name> <image>" per line, and the service name
# already includes the stack prefix (e.g. app-stack_api), so none is added here.
while read -r SERVICE IMAGE; do
  [ -n "$SERVICE" ] || continue
  echo "Rolling back $SERVICE to $IMAGE"
  docker service update --with-registry-auth --image "$IMAGE" "$SERVICE"
done < "$LAST_BACKUP/services.txt"
echo "✅ Emergency rollback completed"
echo "⚠️ Remember to update Git repository to match!"
```
---
## 6. Monitoring & Health Checks
### 6.1 Service-Level Health Checks
```yaml
# In docker-compose.yml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s     # Check every 30 seconds
  timeout: 10s      # Request timeout
  retries: 3        # Mark unhealthy after 3 consecutive failures
  start_period: 60s # Grace period for startup
```
### 6.2 Stack-Level Monitoring
```bash
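#!/bin/bash
# scripts/monitor-deployment.sh
STACK_NAME="app-stack"
while true; do
  clear
  echo "═══════════════════════════════════════════════"
  echo "Stack: $STACK_NAME - $(date)"
  echo "═══════════════════════════════════════════════"
  # Show service status
  docker stack services "$STACK_NAME"
  echo ""
  echo "Recent logs (last 10 lines per service):"
  # docker service logs takes a service name, so iterate over the stack's services
  for service in $(docker stack services "$STACK_NAME" --format "{{.Name}}"); do
    echo "--- $service ---"
    docker service logs --tail=10 "$service" 2>&1
  done
  sleep 10
done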
```
### 6.3 Notification Integration
```yaml
# Add to .gitlab-ci.yml
# Note: the job image must provide curl (and mail, if email is used)
after_script:
  - |
    if [ "$CI_JOB_STATUS" = "success" ]; then
      MESSAGE="✅ Deployment to $CI_ENVIRONMENT_NAME successful"
    else
      MESSAGE="❌ Deployment to $CI_ENVIRONMENT_NAME FAILED"
    fi
    # Send to Slack
    curl -X POST -H 'Content-type: application/json' \
      --data "{\"text\":\"$MESSAGE\nPipeline: $CI_PIPELINE_URL\"}" \
      "$SLACK_WEBHOOK_URL"
    # Send email (if SMTP configured)
    echo "$MESSAGE" | mail -s "Deployment Notification" devops@company.com
```
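Building the JSON body by string interpolation breaks as soon as the message contains a double quote, a backslash, or a line break. A small escaping helper avoids that. This is a sketch with hypothetical function names (`json_escape`, `slack_payload`); it handles only those three cases, not other control characters.

```bash
#!/usr/bin/env bash
# json_escape STRING
# Escape backslashes, double quotes, and newlines for use inside a JSON string.
json_escape() {
  local s="$1"
  local nl=$'\n'
  s="${s//\\/\\\\}"   # backslash    -> \\
  s="${s//\"/\\\"}"   # double quote -> \"
  s="${s//$nl/\\n}"   # newline      -> \n
  printf '%s' "$s"
}

# slack_payload MESSAGE
# Print a JSON object of the form {"text":"..."} for a Slack incoming webhook.
slack_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}
```

The notification step could then send `curl -X POST -H 'Content-type: application/json' --data "$(slack_payload "$MESSAGE")" "$SLACK_WEBHOOK_URL"`.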
---
## 7. Implementation Roadmap
### Phase 1: Preparation (Week 1)
**Day 1-2: Repository Setup**
- [ ] Create `deployment-configs` repository in GitLab
- [ ] Create directory structure (environments/, scripts/)
- [ ] Add current docker-compose.yml to each environment
- [ ] Create .env files with current versions
- [ ] Commit initial structure
**Day 3-4: GitLab Configuration**
- [ ] Configure GitLab CI/CD variables:
  - `SWARM_DEV_HOST`, `SWARM_SANDBOX_HOST`, `SWARM_TEST_HOST`, `SWARM_PROD_HOST`
  - `SWARM_SSH_KEY` (SSH private key)
  - `HARBOR_USER`, `HARBOR_PASSWORD`
  - `SLACK_WEBHOOK_URL` (optional)
- [ ] Create SSH keys for GitLab Runner → Swarm access
- [ ] Test SSH connectivity from GitLab to each Swarm environment
**Day 5: Scripts Development**
- [ ] Create deploy.sh script
- [ ] Create healthcheck.sh script
- [ ] Create rollback.sh script
- [ ] Test scripts manually on Development environment
### Phase 2: Pipeline Implementation (Week 2)
**Day 1-2: Basic Pipeline**
- [ ] Create .gitlab-ci.yml with validation stage only
- [ ] Test syntax validation
- [ ] Test image validation
**Day 3: Development Deployment**
- [ ] Add deploy:development job
- [ ] Test automatic deployment to Development
- [ ] Verify health checks work
**Day 4: Sandbox & Testing**
- [ ] Add deploy:sandbox job (manual)
- [ ] Add deploy:testing job (manual)
- [ ] Test manual approval workflow
**Day 5: Production Deployment**
- [ ] Add deploy:production job (manual + confirmation)
- [ ] Add backup before deployment
- [ ] Test during an approved low-traffic window (see the deployment windows in section 8.2)
### Phase 3: Rollback Implementation (Week 3)
**Day 1-2: Automatic Rollback**
- [ ] Configure Docker Swarm automatic rollback
- [ ] Test by deploying broken version
- [ ] Verify automatic recovery
**Day 3-4: Manual Rollback**
- [ ] Implement rollback:production job
- [ ] Test Git-based rollback
- [ ] Document rollback procedure
**Day 5: Emergency Procedures**
- [ ] Create emergency-rollback.sh script
- [ ] Test emergency rollback
- [ ] Document for on-call team
### Phase 4: Monitoring & Optimization (Week 4)
**Day 1-2: Monitoring**
- [ ] Set up deployment notifications (Slack/Email)
- [ ] Configure Prometheus metrics collection
- [ ] Create Grafana dashboards for deployments
**Day 3-4: Documentation**
- [ ] Write deployment guide for developers
- [ ] Write operations runbook
- [ ] Create troubleshooting guide
- [ ] Record demo video
**Day 5: Team Training**
- [ ] Train developers on new workflow
- [ ] Train QA team on approval process
- [ ] Train DevOps team on monitoring/rollback
- [ ] Conduct Q&A session
---
## 8. Best Practices & Tips
### 8.1 Version Management
**✅ DO:**
```bash
# Use semantic versioning
API_VERSION=v3.2.1 # ← Good: Clear, semantic version
# Include Git commit hash for traceability
API_VERSION=v3.2.1-abc123ef
# Use immutable tags
IMAGE=harbor.company.com/app:v1.2.3 # ← Good: Specific version
```
**❌ DON'T:**
```bash
# Avoid mutable tags
API_VERSION=latest # ← Bad: Can change unexpectedly
# Avoid ambiguous versions
API_VERSION=production # ← Bad: What version is this?
```
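The DO and DON'T rules above can be enforced mechanically, for example as an extra step in the validation stage. The sketch below assumes that every version variable in a `.env` file ends in `_VERSION` and follows the `vMAJOR.MINOR.PATCH` pattern with an optional suffix; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# is_pinned_version TAG
# Accept vMAJOR.MINOR.PATCH with an optional "-suffix" such as a commit hash.
is_pinned_version() {
  [[ "$1" =~ ^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.]+)?$ ]]
}

# check_env_versions FILE
# Return 1 if any *_VERSION entry in FILE is not a pinned version.
check_env_versions() {
  local file="$1" key value status=0
  # The "|| [[ -n $key ]]" part also processes a last line without a newline.
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    [[ "$key" == *_VERSION ]] || continue
    if ! is_pinned_version "$value"; then
      echo "Unpinned tag: $key=$value" >&2
      status=1
    fi
  done < "$file"
  return "$status"
}
```

Running `check_env_versions environments/production/.env` before deployment would reject tags such as `latest` or `production` while accepting `v3.2.1` and `v3.2.1-abc123ef`.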
### 8.2 Deployment Timing
**Recommended deployment windows:**
- **Development:** Anytime (automatic)
- **Sandbox:** Business hours (9am-5pm)
- **Testing:** Business hours (requires QA)
- **Production:**
  - Normal changes: Tuesday-Thursday, 10am-2pm
  - Critical fixes: Anytime with proper approval
  - Avoid: Monday mornings, Friday afternoons, weekends
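The production window for normal changes (Tuesday to Thursday, 10am to 2pm) is simple enough to check in a script, for instance to print a warning before a manual production job proceeds. This is only a sketch: `in_production_window` is a hypothetical helper, and it assumes the runner's clock is set to the team's local time zone.

```bash
#!/usr/bin/env bash
# in_production_window DAY HOUR
# DAY is the ISO weekday (1 = Monday ... 7 = Sunday), HOUR is 0-23.
# Succeeds for Tuesday-Thursday between 10:00 and 13:59.
in_production_window() {
  local day="$1" hour="$2"
  hour=$((10#$hour))   # force base 10 so "08" or "09" is not parsed as octal
  (( day >= 2 && day <= 4 && hour >= 10 && hour < 14 ))
}

# Usage:
#   in_production_window "$(date +%u)" "$(date +%H)" \
#     || echo "Outside the normal production window; approval for a critical fix is required"
```

Critical fixes remain possible at any time, so a check like this is better suited to a warning than to a hard failure.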
### 8.3 Communication
**Before Production deployment:**
```
Slack announcement template:
📢 Production Deployment Scheduled
🗓 Date: January 15, 2026
⏰ Time: 11:00 AM (EST)
⏱ Duration: ~15 minutes
📝 Changes:
- API v3.2.1 → v3.2.2 (bug fixes)
- Frontend v2.1.5 → v2.1.6 (UI improvements)
🔗 Release Notes: [link]
🔗 Rollback Plan: [link]
Please report any issues to #devops-alerts
```
### 8.4 Security Considerations
```yaml
# Store sensitive data as Docker secrets
secrets:
  db_password:
    external: true # ← Created outside compose file
  api_key:
    external: true
# Never commit secrets to Git!
# Use GitLab CI/CD variables for:
# - SSH keys
# - API tokens
# - Passwords
# - Certificates
```
### 8.5 Troubleshooting Common Issues
**Issue 1: Pipeline fails with "SSH connection refused"**
```bash
# Solution: Verify SSH key in GitLab CI/CD variables
# Test manually:
ssh -i ~/.ssh/gitlab_rsa root@swarm-manager
```
**Issue 2: Image pull fails from Harbor**
```bash
# Solution: Check registry credentials
echo "$HARBOR_PASSWORD" | docker login harbor.company.com -u "$HARBOR_USER" --password-stdin
# Verify image exists:
docker pull harbor.company.com/company/api:v3.2.1
```
**Issue 3: Health checks fail after deployment**
```bash
# Debug: Check service logs
docker service logs app-stack_api --tail 100
# Check service status
docker service ps app-stack_api
# Manual health check
curl http://localhost:3000/health
```
**Issue 4: Deployment stuck "pending"**
```bash
# Check swarm node status
docker node ls
# Check resource availability
docker node inspect swarm-worker-1 | grep Resources -A 10
# Check for failed tasks
docker service ps app-stack_api --no-trunc
```
---
## 9. Success Metrics
### 9.1 Key Performance Indicators
**Before Automation:**
- 📊 Deployment frequency: 1-2 per week
- ⏱ Average deployment time: 30-60 minutes per environment
- 🐛 Deployment errors: ~20% (typos, wrong tags)
- 🔄 Rollback time: 1-2 hours (manual)
- 📝 Audit trail: Partial (chat logs, manual notes)
**After Automation (Target):**
- 📊 Deployment frequency: 5-10 per week
- ⏱ Average deployment time: 5-10 minutes per environment
- 🐛 Deployment errors: <2% (automated validation)
- 🔄 Rollback time: 2-3 minutes (automatic)
- 📝 Audit trail: Complete (Git history + GitLab logs)
### 9.2 Success Criteria
**Week 4 Evaluation:**
- [ ] All 4 environments deployed via GitLab CI/CD
- [ ] Zero manual SSH deployments
- [ ] At least 5 successful Production deployments
- [ ] At least 1 successful rollback test
- [ ] Team can deploy without DevOps assistance
- [ ] Complete audit trail for all deployments
- [ ] Average deployment time < 15 minutes
---
## 10. Conclusion & Next Steps
### Current State
- Manual bash script deployments
- No audit trail
- Error-prone process
- Slow rollbacks
### Target State (After Implementation)
- Automated GitLab CI/CD pipelines
- Complete Git-based audit trail
- Validated deployments with health checks
- 2-3 minute automatic rollbacks
- Self-service for developers
### Immediate Next Steps
1. **This Week:**
   - Create GitLab repository structure
   - Configure CI/CD variables
   - Test SSH connectivity
2. **Next Week:**
   - Implement basic pipeline
   - Test Development deployments
   - Add validation stages
3. **Week 3-4:**
   - Roll out to all environments
   - Implement rollback procedures
   - Train team
### Resources Needed
- **Time Investment:** 2-4 weeks (1 DevOps engineer)
- **Infrastructure:** GitLab Runner (existing OK)
- **Training:** 2-3 hours team training session
- **Documentation:** Deployment guide + runbooks
### Support & Questions
For implementation assistance:
- 📧 Email: devops@company.com
- 💬 Slack: #devops-automation
- 📖 Documentation: https://gitlab.company.com/deployment-configs
---
**Document Version:** 1.0
**Last Updated:** January 2026
**Status:** Ready for Implementation
**Author:** DevOps Team
**Review Date:** After Phase 2 completion