diff --git a/docs/gitops-cicd/gitlab-harbor-swarm-automation-solution.md b/docs/gitops-cicd/gitlab-harbor-swarm-automation-solution.md
new file mode 100644
index 0000000..16af87d
--- /dev/null
+++ b/docs/gitops-cicd/gitlab-harbor-swarm-automation-solution.md
@@ -0,0 +1,1320 @@
+# GitLab + Harbor + Docker Swarm: Automated Deployment Solution
+
+**Version:** 1.0
+**Created:** January 2026
+**Status:** Implementation Ready
+**Target audience:** DevOps Team, Development Team
+
+---
+
+## Executive Summary
+
+This document describes a practical solution for automating the deployment process in the existing infrastructure:
+
+**Current situation:**
+- ✅ GitLab is already installed
+- ✅ Harbor Registry is already running
+- ✅ Docker Swarm is running several containers
+- ✅ 4 environments: Development → Sandbox → Testing → Production
+- ❌ Manual deployment via bash scripts
+- ❌ No code review process
+- ❌ No automatic rollback
+- ❌ Prebuilt images are received from Harbor without visibility
+
+**Proposed solution:**
+- GitLab CI/CD pipelines for automated deployment
+- GitOps approach: Git as the source of truth for deployments
+- Automated deployment across environments with approval gates
+- One-click rollback capability
+- Deployment history and audit trail
+- Health checks and automatic rollback on failure
+
+**Expected results:**
+- 🚀 Deployment time: from 30-60 minutes → 5-10 minutes
+- 🔒 Human errors: reduced by 90%
+- 📊 Full visibility: who deployed what, and when
+- ⚡ Rollback: from 1-2 hours → 2-3 minutes
+- ✅ Compliance: complete audit trail
+
+---
+
+## Table of Contents
+
+1. [Solution Architecture](#1-solution-architecture)
+2. [GitLab CI/CD Pipeline Implementation](#2-gitlab-cicd-pipeline-implementation)
+3. [Docker Stack Management](#3-docker-stack-management)
+4. [Environment Management Strategy](#4-environment-management-strategy)
+5. [Rollback Strategy](#5-rollback-strategy)
+6. [Monitoring & Health Checks](#6-monitoring--health-checks)
+7. [Implementation Roadmap](#7-implementation-roadmap)
+8. [Best Practices & Tips](#8-best-practices--tips)
+9. [Success Metrics](#9-success-metrics)
+10. [Conclusion & Next Steps](#10-conclusion--next-steps)
+
+---
+
+## 1. Solution Architecture
+
+### 1.1 Current State Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                   Current Manual Process                    │
+├─────────────────────────────────────────────────────────────┤
+│                                                             │
+│  Developer → Build Image → Push to Harbor                   │
+│                              ↓                              │
+│                     Notify DevOps Team                      │
+│                              ↓                              │
+│  DevOps manually runs bash scripts:                         │
+│                                                             │
+│    1. SSH to Swarm manager                                  │
+│    2. docker service update app --image harbor/app:new-tag  │
+│    3. Check logs manually                                   │
+│    4. Hope everything works                                 │
+│    5. Repeat for each environment (4x)                      │
+│                                                             │
+│  Problems:                                                  │
+│  • Time consuming (30-60 min per environment)               │
+│  • Error prone (typos, wrong tags)                          │
+│  • No rollback plan                                         │
+│  • No audit trail                                           │
+│  • No validation before deployment                          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 1.2 Target Automated Architecture
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│               Automated GitOps-Based Solution                │
+├──────────────────────────────────────────────────────────────┤
+│                                                              │
+│  Developer pushes image tag change to Git                    │
+│                         ↓                                    │
+│  GitLab CI/CD Pipeline automatically:                        │
+│                         ↓                                    │
+│  ┌─────────────────────────────────────────────────┐         │
+│  │ 1. Validate docker-compose.yml syntax           │         │
+│  │ 2. Check image exists in Harbor                 │         │
+│  │ 3. Deploy to Development (automatic)            │         │
+│  │ 4. Run health checks                            │         │
+│  │ 5. Wait for manual approval → Sandbox           │         │
+│  │ 6. Deploy to Sandbox                            │         │
+│  │ 7. Wait for manual approval → Testing           │         │
+│  │ 8. Deploy to Testing                            │         │
+│  │ 9. Wait for manual approval → Production        │         │
+│  │ 10. Deploy to Production                        │         │
+│  │ 11. Monitor deployment success                  │         │
+│  │ 12. Auto-rollback if health checks fail         │         │
+│  └─────────────────────────────────────────────────┘         │
+│                                                              │
+│  Benefits:                                                   │
+│  ✅ 5-10 minutes per environment                             │
+│  ✅ Fewer human errors (target: 90% reduction)               │
+│  ✅ Automatic rollback on failure                            │
+│  ✅ Complete audit trail in Git                              │
+│  ✅ Pre-deployment validation                                │
+└──────────────────────────────────────────────────────────────┘
+```
+
+### 1.3 Git Repository Structure
+
+```
+deployment-configs/              # New GitLab repository
+├── README.md
+├── .gitlab-ci.yml               # CI/CD pipeline definition
+│
+├── environments/
+│   ├── development/
+│   │   ├── docker-compose.yml
+│   │   ├── .env
+│   │   └── healthcheck.sh
+│   │
+│   ├── sandbox/
+│   │   ├── docker-compose.yml
+│   │   ├── .env
+│   │   └── healthcheck.sh
+│   │
+│   ├── testing/
+│   │   ├── docker-compose.yml
+│   │   ├── .env
+│   │   └── healthcheck.sh
+│   │
+│   └── production/
+│       ├── docker-compose.yml
+│       ├── .env
+│       └── healthcheck.sh
+│
+├── scripts/
+│   ├── deploy.sh                # Deployment script
+│   ├── rollback.sh              # Rollback script
+│   ├── healthcheck.sh           # Health validation
+│   └── validate-compose.sh      # Pre-deployment validation
+│
+└── docs/
+    ├── deployment-guide.md
+    └── rollback-procedure.md
+```
+
+---
+
+## 2. GitLab CI/CD Pipeline Implementation
+
+### 2.1 Complete .gitlab-ci.yml
+
+```yaml
+# .gitlab-ci.yml - Complete automated deployment pipeline
+
+variables:
+  DOCKER_HOST: "tcp://docker-swarm-manager:2376"
+  DOCKER_TLS_VERIFY: "1"
+  HARBOR_REGISTRY: "harbor.company.com"
+
+  # Swarm connection details (stored in GitLab CI/CD variables)
+  # SWARM_DEV_HOST, SWARM_SANDBOX_HOST, SWARM_TEST_HOST, SWARM_PROD_HOST
+  # SWARM_SSH_KEY (SSH private key for authentication)
+
+stages:
+  - validate
+  - deploy-dev
+  - deploy-sandbox
+  - deploy-testing
+  - deploy-production
+  - rollback
+
+#═══════════════════════════════════════════════════════════
+# Stage 1: VALIDATION
+#═══════════════════════════════════════════════════════════
+
+validate:syntax:
+  stage: validate
+  image: docker:24-cli
+  script:
+    - echo "Validating docker-compose files..."
+    - |
+      for env in development sandbox testing production; do
+        echo "Checking $env environment..."
+        # Assumes the Compose v2 plugin ("docker compose") is available in the job image.
+        # --env-file resolves the variable placeholders from the environment's .env file.
+        if docker compose --env-file environments/$env/.env -f environments/$env/docker-compose.yml config > /dev/null; then
+          echo "✅ $env: Syntax OK"
+        else
+          echo "❌ $env: Syntax ERROR"
+          exit 1
+        fi
+      done
+  only:
+    - branches
+  tags:
+    - docker
+
+validate:images:
+  stage: validate
+  image: docker:24-cli
+  before_script:
+    # --password-stdin keeps the password out of the process list
+    - echo "$HARBOR_PASSWORD" | docker login -u "$HARBOR_USER" --password-stdin "$HARBOR_REGISTRY"
+  script:
+    - echo "Checking if images exist in Harbor..."
+    - |
+      for env in development sandbox testing production; do
+        echo "Checking images for $env..."
+
+        # Resolve image references with the .env values substituted
+        # (a plain grep on "image:" would return unexpanded placeholders).
+        # Assumes a Compose v2 release that supports "config --images".
+        images=$(docker compose --env-file environments/$env/.env -f environments/$env/docker-compose.yml config --images)
+
+        for image in $images; do
+          echo "Pulling $image to verify existence..."
+          if docker pull "$image"; then
+            echo "✅ Image exists: $image"
+          else
+            echo "❌ Image NOT found: $image"
+            exit 1
+          fi
+        done
+      done
+  only:
+    - branches
+  tags:
+    - docker
+
+#═══════════════════════════════════════════════════════════
+# Stage 2: DEPLOY TO DEVELOPMENT (Automatic)
+#═══════════════════════════════════════════════════════════
+
+deploy:development:
+  stage: deploy-dev
+  image: alpine:latest
+  before_script:
+    - apk add --no-cache openssh-client bash docker-cli
+    - eval $(ssh-agent -s)
+    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
+    - mkdir -p ~/.ssh
+    - chmod 700 ~/.ssh
+    - ssh-keyscan -H $SWARM_DEV_HOST >> ~/.ssh/known_hosts
+  script:
+    - echo "🚀 Deploying to DEVELOPMENT environment..."
+
+    # Copy files to swarm manager
+    - ssh root@$SWARM_DEV_HOST "mkdir -p /tmp/deploy"
+    - scp -r environments/development root@$SWARM_DEV_HOST:/tmp/deploy/
+    - scp scripts/deploy.sh root@$SWARM_DEV_HOST:/tmp/deploy/
+
+    # Execute deployment
+    - |
+      ssh root@$SWARM_DEV_HOST bash << 'EOF'
+        set -e
+        cd /tmp/deploy/development
+
+        # Load and export environment variables so that docker stack deploy
+        # can substitute them in docker-compose.yml
+        set -a; source .env; set +a
+
+        # Deploy stack
+        docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
+
+        # Wait for services to stabilize
+        echo "Waiting for services to start..."
+        sleep 30
+
+        # Check service status
+        docker stack services app-stack
+
+        # Run health checks (healthcheck.sh is copied with the environment directory)
+        bash ./healthcheck.sh
+      EOF
+
+    - echo "✅ Deployment to DEVELOPMENT completed"
+
+  environment:
+    name: development
+    url: https://dev.company.com
+
+  only:
+    - main
+    - develop
+
+  tags:
+    - deployment
+
+#═══════════════════════════════════════════════════════════
+# Stage 3: DEPLOY TO SANDBOX (Manual Approval Required)
+#═══════════════════════════════════════════════════════════
+
+deploy:sandbox:
+  stage: deploy-sandbox
+  image: alpine:latest
+  before_script:
+    - apk add --no-cache openssh-client bash docker-cli
+    - eval $(ssh-agent -s)
+    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
+    - mkdir -p ~/.ssh
+    - chmod 700 ~/.ssh
+    - ssh-keyscan -H $SWARM_SANDBOX_HOST >> ~/.ssh/known_hosts
+
+  script:
+    - echo "🚀 Deploying to SANDBOX environment..."
+    - ssh root@$SWARM_SANDBOX_HOST "mkdir -p /tmp/deploy"
+    - scp -r environments/sandbox root@$SWARM_SANDBOX_HOST:/tmp/deploy/
+    - |
+      ssh root@$SWARM_SANDBOX_HOST bash << 'EOF'
+        set -e
+        cd /tmp/deploy/sandbox
+        # Export the .env values so docker stack deploy can substitute them
+        set -a; source .env; set +a
+        docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
+        sleep 30
+        docker stack services app-stack
+        bash ./healthcheck.sh
+      EOF
+    - echo "✅ Deployment to SANDBOX completed"
+
+  environment:
+    name: sandbox
+    url: https://sandbox.company.com
+
+  when: manual  # ⚠️ Requires manual approval
+  allow_failure: false  # Block later stages until this job has been run
+
+  only:
+    - main
+
+  tags:
+    - deployment
+
+#═══════════════════════════════════════════════════════════
+# Stage 4: DEPLOY TO TESTING (Manual Approval Required)
+#═══════════════════════════════════════════════════════════
+
+deploy:testing:
+  stage: deploy-testing
+  image: alpine:latest
+  before_script:
+    - apk add --no-cache openssh-client bash docker-cli
+    - eval $(ssh-agent -s)
+    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
+    - mkdir -p ~/.ssh
+    - chmod 700 ~/.ssh
+    - ssh-keyscan -H $SWARM_TEST_HOST >> ~/.ssh/known_hosts
+
+  script:
+    - echo "🚀 Deploying to TESTING environment..."
+    - ssh root@$SWARM_TEST_HOST "mkdir -p /tmp/deploy"
+    - scp -r environments/testing root@$SWARM_TEST_HOST:/tmp/deploy/
+    - |
+      ssh root@$SWARM_TEST_HOST bash << 'EOF'
+        set -e
+        cd /tmp/deploy/testing
+        # Export the .env values so docker stack deploy can substitute them
+        set -a; source .env; set +a
+        docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
+        sleep 30
+        docker stack services app-stack
+        bash ./healthcheck.sh
+      EOF
+    - echo "✅ Deployment to TESTING completed"
+
+  environment:
+    name: testing
+    url: https://testing.company.com
+
+  when: manual  # ⚠️ Requires manual approval
+  allow_failure: false  # Block later stages until this job has been run
+
+  only:
+    - main
+
+  tags:
+    - deployment
+
+#═══════════════════════════════════════════════════════════
+# Stage 5: DEPLOY TO PRODUCTION (Manual Approval Required)
+#═══════════════════════════════════════════════════════════
+
+deploy:production:
+  stage: deploy-production
+  image: alpine:latest
+  before_script:
+    - apk add --no-cache openssh-client bash docker-cli
+    - eval $(ssh-agent -s)
+    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
+    - mkdir -p ~/.ssh
+    - chmod 700 ~/.ssh
+    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts
+
+  script:
+    - echo "🚀 Deploying to PRODUCTION environment..."
+
+    # Backup current deployment
+    - |
+      ssh root@$SWARM_PROD_HOST bash << 'EOF'
+        set -e
+        echo "Creating backup of current deployment..."
+        # Compute the timestamp once so both commands use the same directory
+        BACKUP_DIR=/backup/deployments/$(date +%Y%m%d-%H%M%S)
+        mkdir -p "$BACKUP_DIR"
+        docker stack services app-stack --format "{{.Name}} {{.Image}}" > "$BACKUP_DIR/services.txt"
+        echo "Backup created"
+      EOF
+
+    # Deploy new version
+    - ssh root@$SWARM_PROD_HOST "mkdir -p /tmp/deploy"
+    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/deploy/
+    - |
+      ssh root@$SWARM_PROD_HOST bash << 'EOF'
+        set -e
+        cd /tmp/deploy/production
+        # Export the .env values so docker stack deploy can substitute them
+        set -a; source .env; set +a
+
+        echo "Starting production deployment..."
+        docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
+
+        echo "Waiting for services to stabilize..."
+        sleep 60
+
+        echo "Checking service health..."
+        docker stack services app-stack
+
+        # Run comprehensive health checks
+        if bash ./healthcheck.sh; then
+          echo "✅ Health checks PASSED"
+        else
+          echo "❌ Health checks FAILED - consider rollback"
+          exit 1
+        fi
+      EOF
+
+    - echo "✅ Deployment to PRODUCTION completed successfully"
+
+  environment:
+    name: production
+    url: https://app.company.com
+
+  when: manual  # ⚠️ Requires manual approval + confirmation
+  allow_failure: false
+
+  only:
+    - main
+
+  tags:
+    - deployment
+
+#═══════════════════════════════════════════════════════════
+# ROLLBACK JOBS (Manual Trigger)
+#═══════════════════════════════════════════════════════════
+
+rollback:production:
+  stage: rollback
+  image: alpine:latest
+  variables:
+    GIT_DEPTH: "2"  # HEAD~1 must be present in the (shallow) clone
+  before_script:
+    - apk add --no-cache openssh-client bash docker-cli git
+    - eval $(ssh-agent -s)
+    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
+    - mkdir -p ~/.ssh
+    - chmod 700 ~/.ssh
+    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts
+
+  script:
+    - echo "🔄 Rolling back PRODUCTION to previous version..."
+
+    # Get previous Git commit
+    - PREVIOUS_COMMIT=$(git rev-parse HEAD~1)
+    - echo "Rolling back to commit $PREVIOUS_COMMIT"
+
+    # Checkout previous version
+    - git checkout $PREVIOUS_COMMIT -- environments/production/
+
+    # Deploy previous version
+    - ssh root@$SWARM_PROD_HOST "mkdir -p /tmp/rollback"
+    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/rollback/
+    - |
+      ssh root@$SWARM_PROD_HOST bash << 'EOF'
+        set -e
+        cd /tmp/rollback/production
+        # Export the .env values so docker stack deploy can substitute them
+        set -a; source .env; set +a
+
+        echo "Rolling back to previous version..."
+        docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
+
+        sleep 30
+
+        echo "Verifying rollback..."
+        docker stack services app-stack
+        bash ./healthcheck.sh
+      EOF
+
+    - echo "✅ Rollback completed"
+
+  # Note: "rollback" is not a valid value for environment:action, so only the name is set
+  environment:
+    name: production
+
+  when: manual
+
+  only:
+    - main
+
+  tags:
+    - deployment
+```
+
+---
+
+## 3. Docker Stack Management
+
+### 3.1 Example docker-compose.yml Structure
+
+```yaml
+# environments/production/docker-compose.yml
+
+version: '3.8'
+
+services:
+
+  #════════════════════════════════════════════════════════
+  # Frontend Application
+  #════════════════════════════════════════════════════════
+  frontend:
+    image: ${HARBOR_REGISTRY}/company/frontend:${FRONTEND_VERSION}
+    networks:
+      - app-network
+    ports:
+      - "80:80"
+      - "443:443"
+    deploy:
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+        monitor: 30s
+      rollback_config:
+        parallelism: 1
+        delay: 5s
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 3
+      placement:
+        constraints:
+          - node.role == worker
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+
+  #════════════════════════════════════════════════════════
+  # Backend API
+  #════════════════════════════════════════════════════════
+  api:
+    image: ${HARBOR_REGISTRY}/company/api:${API_VERSION}
+    networks:
+      - app-network
+      - db-network
+    environment:
+      - DATABASE_URL=${DATABASE_URL}
+      - REDIS_URL=${REDIS_URL}
+      # Path to the mounted secret (matches JWT_SECRET_FILE in .env), not the secret value
+      - JWT_SECRET_FILE=${JWT_SECRET_FILE}
+    secrets:
+      - db_password
+      - jwt_secret
+    deploy:
+      replicas: 5
+      update_config:
+        parallelism: 2
+        delay: 10s
+        failure_action: rollback
+        monitor: 45s
+      rollback_config:
+        parallelism: 2
+        delay: 5s
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 3
+      placement:
+        constraints:
+          - node.role == worker
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+
+  #════════════════════════════════════════════════════════
+  # Worker Service
+  #════════════════════════════════════════════════════════
+  worker:
+    image: ${HARBOR_REGISTRY}/company/worker:${WORKER_VERSION}
+    networks:
+      - app-network
+      - db-network
+    environment:
+      - REDIS_URL=${REDIS_URL}
+      - QUEUE_NAME=jobs
+    deploy:
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+      restart_policy:
+        condition: any
+        delay: 10s
+        max_attempts: 3
+      placement:
+        constraints:
+          - node.role == worker
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+
+  #════════════════════════════════════════════════════════
+  # Cache (Redis)
+  #════════════════════════════════════════════════════════
+  redis:
+    image: redis:7-alpine
+    networks:
+      - app-network
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.role == worker
+      restart_policy:
+        condition: any
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 3s
+      retries: 3
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+
+#════════════════════════════════════════════════════════
+# Networks
+#════════════════════════════════════════════════════════
+networks:
+  app-network:
+    driver: overlay
+    attachable: true
+  db-network:
+    driver: overlay
+    internal: true
+
+#════════════════════════════════════════════════════════
+# Secrets
+#════════════════════════════════════════════════════════
+secrets:
+  db_password:
+    external: true
+  jwt_secret:
+    external: true
+```
+
+### 3.2 Environment Variables (.env files)
+
+```bash
+# environments/production/.env
+
+# Harbor Registry
+HARBOR_REGISTRY=harbor.company.com
+
+# Application Versions (THIS IS WHAT YOU UPDATE!)
+FRONTEND_VERSION=v2.1.5
+API_VERSION=v3.2.1
+WORKER_VERSION=v1.8.3
+
+# Database Configuration
+DATABASE_URL=postgresql://user@db-prod:5432/appdb
+
+# Redis Configuration
+REDIS_URL=redis://redis:6379
+
+# Application Configuration
+JWT_SECRET_FILE=/run/secrets/jwt_secret
+LOG_LEVEL=info
+ENVIRONMENT=production
+```
+
+### 3.3 Health Check Script
+
+```bash
+#!/bin/bash
+# environments/production/healthcheck.sh
+
+set -e
+
+echo "═══════════════════════════════════════════════"
+echo "Running Health Checks for Production"
+echo "═══════════════════════════════════════════════"
+
+STACK_NAME="app-stack"
+FAILED=0
+
+# Check if all services are running
+echo ""
+echo "1️⃣ Checking service status..."
+SERVICES=$(docker stack services $STACK_NAME --format "{{.Name}}")
+
+for service in $SERVICES; do
+  REPLICAS=$(docker service ls --filter name=$service --format "{{.Replicas}}")
+  echo "  $service: $REPLICAS"
+
+  # Check if service has no running replicas.
+  # Anchor the match so that values such as "10/10" are not flagged.
+  if echo "$REPLICAS" | grep -q "^0/"; then
+    echo "  ❌ Service $service has NO running replicas!"
+    FAILED=1
+  fi
+done
+
+# Check frontend health endpoint
+echo ""
+echo "2️⃣ Checking Frontend health endpoint..."
+if curl -sf http://localhost/health > /dev/null; then
+  echo "  ✅ Frontend health check PASSED"
+else
+  echo "  ❌ Frontend health check FAILED"
+  FAILED=1
+fi
+
+# Check API health endpoint
+# Note: this only works from the manager if port 3000 is reachable on localhost
+# (the example compose file does not publish it)
+echo ""
+echo "3️⃣ Checking API health endpoint..."
+if curl -sf http://localhost:3000/health > /dev/null; then
+  echo "  ✅ API health check PASSED"
+else
+  echo "  ❌ API health check FAILED"
+  FAILED=1
+fi
+
+# Check Redis connectivity
+# docker exec only reaches containers on this node; Redis is constrained to worker
+# nodes, so the check is skipped when no local Redis container is found
+echo ""
+echo "4️⃣ Checking Redis connectivity..."
+REDIS_CONTAINER=$(docker ps -q -f name=${STACK_NAME}_redis | head -n 1)
+if [ -z "$REDIS_CONTAINER" ]; then
+  echo "  ⚠️ No Redis container on this node - skipping direct check"
+elif docker exec "$REDIS_CONTAINER" redis-cli ping | grep -q PONG; then
+  echo "  ✅ Redis connectivity PASSED"
+else
+  echo "  ❌ Redis connectivity FAILED"
+  FAILED=1
+fi
+
+# Check for recent errors in logs
+echo ""
+echo "5️⃣ Checking recent logs for errors..."
+# docker service logs takes a service name (not a stack name), so sum over all services
+ERROR_COUNT=0
+for service in $SERVICES; do
+  COUNT=$(docker service logs --since 5m "$service" 2>&1 | grep -ciE "error|fatal|panic" || true)
+  ERROR_COUNT=$((ERROR_COUNT + COUNT))
+done
+if [ $ERROR_COUNT -gt 10 ]; then
+  echo "  ⚠️ Found $ERROR_COUNT errors in last 5 minutes"
+  FAILED=1
+else
+  echo "  ✅ Error count acceptable: $ERROR_COUNT"
+fi
+
+echo ""
+echo "═══════════════════════════════════════════════"
+if [ $FAILED -eq 0 ]; then
+  echo "✅ ALL HEALTH CHECKS PASSED"
+  echo "═══════════════════════════════════════════════"
+  exit 0
+else
+  echo "❌ HEALTH CHECKS FAILED"
+  echo "═══════════════════════════════════════════════"
+  exit 1
+fi
+```
+
+---
+
+## 4. Environment Management Strategy
+
+### 4.1 Promotion Flow
+
+```
+┌─────────────────────────────────────────────────────────┐
+│               Environment Promotion Flow                │
+└─────────────────────────────────────────────────────────┘
+
+Developer updates image version in Git
+         ↓
+   Development (Automatic)
+   ├─ Deploy immediately
+   ├─ Run health checks
+   └─ ✅ If successful → enable Sandbox deployment
+
+         ↓ (Manual approval required)
+
+   Sandbox (Manual Trigger)
+   ├─ QA team tests features
+   ├─ Run integration tests
+   └─ ✅ If approved → enable Testing deployment
+
+         ↓ (Manual approval required)
+
+   Testing (Manual Trigger)
+   ├─ Full regression testing
+   ├─ Performance testing
+   └─ ✅ If approved → enable Production deployment
+
+         ↓ (Manual approval required + confirmation)
+
+   Production (Manual Trigger)
+   ├─ Backup current state
+   ├─ Deploy with a rolling update (per update_config)
+   ├─ Run comprehensive health checks
+   └─ ✅ Monitor or 🔄 Rollback if issues
+```
+
+### 4.2 Deployment Approval Matrix
+
+| Environment | Approval Required | Who Can Approve | Rollback Strategy |
+|-------------|-------------------|-----------------|-------------------|
+| **Development** | ❌ No (Automatic) | N/A | Automatic on health check failure |
+| **Sandbox** | ✅ Yes (Manual) | Any Developer | Manual via GitLab UI |
+| **Testing** | ✅ Yes (Manual) | QA Lead, DevOps Lead | Manual via GitLab UI |
+| **Production** | ✅ Yes (Manual + Confirmation) | DevOps Lead, CTO | Automatic on failure + Manual option |
+
+### 4.3 Change Management Workflow
+
+```bash
+# Example: Updating application version
+
+# 1. Developer receives new image from Harbor
+#    New image available: harbor.company.com/company/api:v3.2.2
+
+# 2. Developer creates feature branch
+git checkout -b update-api-v3.2.2
+
+# 3. Update version in Development environment
+#    Edit environments/development/.env and set:
+#    API_VERSION=v3.2.2
+
+# 4. Commit and push
+git add environments/development/.env
+git commit -m "feat: update API to v3.2.2 in development"
+git push origin update-api-v3.2.2
+
+# 5. Create Merge Request in GitLab
+#    - Title: "Update API to v3.2.2"
+#    - Description: "New features: X, Y, Z. Bug fixes: A, B"
+#    - Assign to: DevOps team for review
+
+# 6. After MR approval and merge to main:
+#    - GitLab CI automatically deploys to Development
+#    - Monitor deployment
+#    - If successful, manually trigger Sandbox deployment
+
+# 7. QA tests in Sandbox
+#    - If approved, update Testing environment
+#    - Repeat process
+
+# 8. Production deployment
+#    - Update production/.env with new version
+#    - Create MR with detailed change log
+#    - Require approvals from: DevOps Lead + CTO
+#    - Schedule deployment window
+#    - Execute manual deployment
+#    - Monitor closely
+```
+
+---
+
+## 5. Rollback Strategy
+
+### 5.1 Automatic Rollback (Health Check Failure)
+
+```yaml
+# In docker-compose.yml - automatic rollback on failure
+
+services:
+  api:
+    deploy:
+      update_config:
+        failure_action: rollback  # ← Automatic rollback!
+        monitor: 60s              # Monitor for 60 seconds
+      rollback_config:
+        parallelism: 2            # Roll back 2 at a time
+        delay: 5s                 # 5s between rollbacks
+```
+
+**How it works:**
+1. New version deploys
+2. Docker Swarm monitors each updated task for 60 seconds
+3. If health checks fail → Automatic rollback to previous version
+4. Previous version restored within 2-3 minutes
+
+### 5.2 Manual Rollback via GitLab
+
+**Option A: Rollback via Git History**
+
+```bash
+# GitLab Pipeline: rollback:production job
+
+# 1. Identify previous working version
+git log --oneline environments/production/.env
+
+# 2. Checkout previous commit (replace <commit> with the hash chosen in step 1)
+git checkout <commit> -- environments/production/
+
+# 3. Pipeline redeploys previous version
+# 4. Verify health checks
+```
+
+**Option B: Rollback via GitLab UI**
+
+```
+GitLab → Deployments → Environments → Production
+         ↓
+Click "Rollback" button
+         ↓
+Select previous successful deployment
+         ↓
+Confirm rollback
+         ↓
+GitLab re-runs the deployment job of the selected deployment
+```
+
+### 5.3 Emergency Rollback Procedure
+
+```bash
+#!/bin/bash
+# scripts/emergency-rollback.sh
+
+# FOR EMERGENCY USE ONLY - bypasses GitLab pipeline
+# Run directly on Swarm manager node
+
+STACK_NAME="app-stack"
+BACKUP_DIR="/backup/deployments"
+
+echo "🚨 EMERGENCY ROLLBACK INITIATED"
+
+# Find last backup
+LAST_BACKUP=$(ls -td $BACKUP_DIR/* | head -1)
+echo "Rolling back to: $LAST_BACKUP"
+
+# Extract previous image versions.
+# services.txt stores full service names (for example app-stack_api),
+# so the stack prefix must not be added again.
+while read -r SERVICE IMAGE; do
+  echo "Rolling back $SERVICE to $IMAGE"
+  docker service update --image "$IMAGE" "$SERVICE"
+done < "$LAST_BACKUP/services.txt"
+
+echo "✅ Emergency rollback completed"
+echo "⚠️ Remember to update Git repository to match!"
+```
+
+---
+
+## 6. Monitoring & Health Checks
+
+### 6.1 Service-Level Health Checks
+
+```yaml
+# In docker-compose.yml
+
+healthcheck:
+  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+  interval: 30s      # Check every 30 seconds
+  timeout: 10s       # Request timeout
+  retries: 3         # Fail after 3 attempts
+  start_period: 60s  # Grace period for startup
+```
+
+### 6.2 Stack-Level Monitoring
+
+```bash
+#!/bin/bash
+# scripts/monitor-deployment.sh
+
+STACK_NAME="app-stack"
+
+while true; do
+  clear
+  echo "═══════════════════════════════════════════════"
+  echo "Stack: $STACK_NAME - $(date)"
+  echo "═══════════════════════════════════════════════"
+
+  # Show service status
+  docker stack services $STACK_NAME
+
+  echo ""
+  echo "Recent logs (last 10 lines per service):"
+  # docker service logs expects a service name, so iterate over the stack's services
+  for service in $(docker stack services $STACK_NAME --format "{{.Name}}"); do
+    docker service logs --tail=10 "$service"
+  done
+
+  sleep 10
+done
+```
+
+### 6.3 Notification Integration
+
+```yaml
+# Add to .gitlab-ci.yml
+# Note: the job image must provide curl (and mail, if email is used);
+# the alpine-based jobs above install neither
+
+after_script:
+  - |
+    if [ "$CI_JOB_STATUS" = "success" ]; then
+      MESSAGE="✅ Deployment to $CI_ENVIRONMENT_NAME successful"
+    else
+      MESSAGE="❌ Deployment to $CI_ENVIRONMENT_NAME FAILED"
+    fi
+
+    # Send to Slack
+    curl -X POST -H 'Content-type: application/json' \
+      --data "{\"text\":\"$MESSAGE\nPipeline: $CI_PIPELINE_URL\"}" \
+      $SLACK_WEBHOOK_URL
+
+    # Send email (if SMTP configured)
+    echo "$MESSAGE" | mail -s "Deployment Notification" devops@company.com
+```
+
+---
+
+## 7. Implementation Roadmap
+
+### Phase 1: Preparation (Week 1)
+
+**Day 1-2: Repository Setup**
+- [ ] Create `deployment-configs` repository in GitLab
+- [ ] Create directory structure (environments/, scripts/)
+- [ ] Add current docker-compose.yml to each environment
+- [ ] Create .env files with current versions
+- [ ] Commit initial structure
+
+**Day 3-4: GitLab Configuration**
+- [ ] Configure GitLab CI/CD variables:
+  - `SWARM_DEV_HOST`, `SWARM_SANDBOX_HOST`, `SWARM_TEST_HOST`, `SWARM_PROD_HOST`
+  - `SWARM_SSH_KEY` (SSH private key)
+  - `HARBOR_USER`, `HARBOR_PASSWORD`
+  - `SLACK_WEBHOOK_URL` (optional)
+- [ ] Create SSH keys for GitLab Runner → Swarm access
+- [ ] Test SSH connectivity from GitLab to each Swarm environment
+
+**Day 5: Scripts Development**
+- [ ] Create deploy.sh script
+- [ ] Create healthcheck.sh script
+- [ ] Create rollback.sh script
+- [ ] Test scripts manually on Development environment
+
+### Phase 2: Pipeline Implementation (Week 2)
+
+**Day 1-2: Basic Pipeline**
+- [ ] Create .gitlab-ci.yml with validation stage only
+- [ ] Test syntax validation
+- [ ] Test image validation
+
+**Day 3: Development Deployment**
+- [ ] Add deploy:development job
+- [ ] Test automatic deployment to Development
+- [ ] Verify health checks work
+
+**Day 4: Sandbox & Testing**
+- [ ] Add deploy:sandbox job (manual)
+- [ ] Add deploy:testing job (manual)
+- [ ] Test manual approval workflow
+
+**Day 5: Production Deployment**
+- [ ] Add deploy:production job (manual + confirmation)
+- [ ] Add backup before deployment
+- [ ] Test during a low-traffic window (see section 8.2 for recommended deployment windows)
+
+### Phase 3: Rollback Implementation (Week 3)
+
+**Day 1-2: Automatic Rollback**
+- [ ] Configure Docker Swarm automatic rollback
+- [ ] Test by deploying broken version
+- [ ] Verify automatic recovery
+
+**Day 3-4: Manual Rollback**
+- [ ] Implement rollback:production job
+- [ ] Test Git-based rollback
+- [ ] Document rollback procedure
+
+**Day 5: Emergency Procedures**
+- [ ] Create emergency-rollback.sh script
+- [ ] Test emergency rollback
+- [ ] Document for on-call team
+
+### Phase 4: Monitoring & Optimization (Week 4)
+
+**Day 1-2: Monitoring**
+- [ ] Set up deployment notifications (Slack/Email)
+- [ ] Configure Prometheus metrics collection
+- [ ] Create Grafana dashboards for deployments
+
+**Day 3-4: Documentation**
+- [ ] Write deployment guide for developers
+- [ ] Write operations runbook
+- [ ] Create troubleshooting guide
+- [ ] Record demo video
+
+**Day 5: Team Training**
+- [ ] Train developers on new workflow
+- [ ] Train QA team on approval process
+- [ ] Train DevOps team on monitoring/rollback
+- [ ] Conduct Q&A session
+
+---
+
+## 8. Best Practices & Tips
+
+### 8.1 Version Management
+
+**✅ DO:**
+```bash
+# Use semantic versioning
+API_VERSION=v3.2.1  # ← Good: Clear, semantic version
+
+# Include Git commit hash for traceability
+API_VERSION=v3.2.1-abc123ef
+
+# Use immutable tags
+IMAGE=harbor.company.com/app:v1.2.3  # ← Good: Specific version
+```
+
+**❌ DON'T:**
+```bash
+# Avoid mutable tags
+API_VERSION=latest  # ← Bad: Can change unexpectedly
+
+# Avoid ambiguous versions
+API_VERSION=production  # ← Bad: What version is this?
+```
+
+### 8.2 Deployment Timing
+
+**Recommended deployment windows:**
+- **Development:** Anytime (automatic)
+- **Sandbox:** Business hours (9am-5pm)
+- **Testing:** Business hours (requires QA)
+- **Production:**
+  - Normal changes: Tuesday-Thursday, 10am-2pm
+  - Critical fixes: Anytime with proper approval
+  - Avoid: Monday mornings, Friday afternoons, weekends
+
+### 8.3 Communication
+
+**Before Production deployment:**
+```
+Slack announcement template:
+
+📢 Production Deployment Scheduled
+
+🗓 Date: January 15, 2026
+⏰ Time: 11:00 AM (EST)
+⏱ Duration: ~15 minutes
+📝 Changes:
+  - API v3.2.1 → v3.2.2 (bug fixes)
+  - Frontend v2.1.5 → v2.1.6 (UI improvements)
+
+🔗 Release Notes: [link]
+🔗 Rollback Plan: [link]
+
+Please report any issues to #devops-alerts
+```
+
+### 8.4 Security Considerations
+
+```yaml
+# Store sensitive data as Docker secrets
+secrets:
+  db_password:
+    external: true  # ← Created outside compose file
+  api_key:
+    external: true
+
+# Never commit secrets to Git!
+# Use GitLab CI/CD variables for:
+# - SSH keys
+# - API tokens
+# - Passwords
+# - Certificates
+```
+
+### 8.5 Troubleshooting Common Issues
+
+**Issue 1: Pipeline fails with "SSH connection refused"**
+```bash
+# Solution: Verify SSH key in GitLab CI/CD variables
+# Test manually:
+ssh -i ~/.ssh/gitlab_rsa root@swarm-manager
+```
+
+**Issue 2: Image pull fails from Harbor**
+```bash
+# Solution: Check registry credentials
+# (--password-stdin keeps the password out of the shell history and process list)
+echo "$HARBOR_PASSWORD" | docker login harbor.company.com -u "$HARBOR_USER" --password-stdin
+
+# Verify image exists:
+docker pull harbor.company.com/company/api:v3.2.1
+```
+
+**Issue 3: Health checks fail after deployment**
+```bash
+# Debug: Check service logs
+docker service logs app-stack_api --tail 100
+
+# Check service status
+docker service ps app-stack_api
+
+# Manual health check
+curl http://localhost:3000/health
+```
+
+**Issue 4: Deployment stuck "pending"**
+```bash
+# Check swarm node status
+docker node ls
+
+# Check resource availability
+docker node inspect swarm-worker-1 | grep Resources -A 10
+
+# Check for failed tasks
+docker service ps app-stack_api --no-trunc
+```
+
+---
+
+## 9. Success Metrics
+
+### 9.1 Key Performance Indicators
+
+**Before Automation:**
+- 📊 Deployment frequency: 1-2 per week
+- ⏱ Average deployment time: 30-60 minutes per environment
+- 🐛 Deployment errors: ~20% (typos, wrong tags)
+- 🔄 Rollback time: 1-2 hours (manual)
+- 📝 Audit trail: Partial (chat logs, manual notes)
+
+**After Automation (Target):**
+- 📊 Deployment frequency: 5-10 per week
+- ⏱ Average deployment time: 5-10 minutes per environment
+- 🐛 Deployment errors: <2% (automated validation)
+- 🔄 Rollback time: 2-3 minutes (automatic)
+- 📝 Audit trail: Complete (Git history + GitLab logs)
+
+### 9.2 Success Criteria
+
+**Week 4 Evaluation:**
+- [ ] All 4 environments deployed via GitLab CI/CD
+- [ ] Zero manual SSH deployments
+- [ ] At least 5 successful Production deployments
+- [ ] At least 1 successful rollback test
+- [ ] Team can deploy without DevOps assistance
+- [ ] Complete audit trail for all deployments
+- [ ] Average deployment time < 15 minutes
+
+---
+
+## 10. Conclusion & Next Steps
+
+### Current State
+❌ Manual bash script deployments
+❌ No audit trail
+❌ Error-prone process
+❌ Slow rollbacks
+
+### Target State (After Implementation)
+✅ Automated GitLab CI/CD pipelines
+✅ Complete Git-based audit trail
+✅ Validated deployments with health checks
+✅ 2-3 minute automatic rollbacks
+✅ Self-service for developers
+
+### Immediate Next Steps
+
+1. **This Week:**
+   - Create GitLab repository structure
+   - Configure CI/CD variables
+   - Test SSH connectivity
+
+2. **Next Week:**
+   - Implement basic pipeline
+   - Test Development deployments
+   - Add validation stages
+
+3. **Week 3-4:**
+   - Roll out to all environments
+   - Implement rollback procedures
+   - Train team
+
+### Resources Needed
+
+- **Time Investment:** 2-4 weeks (1 DevOps engineer)
+- **Infrastructure:** GitLab Runner (existing OK)
+- **Training:** 2-3 hours team training session
+- **Documentation:** Deployment guide + runbooks
+
+### Support & Questions
+
+For implementation assistance:
+- 📧 Email: devops@company.com
+- 💬 Slack: #devops-automation
+- 📖 Documentation: https://gitlab.company.com/deployment-configs
+
+---
+
+**Document Version:** 1.0
+**Last Updated:** January 2026
+**Status:** Ready for Implementation
+**Author:** DevOps Team
+**Review Date:** After Phase 2 completion
\ No newline at end of file