GitLab + Harbor + Docker Swarm: Automated Deployment Solution
Version: 1.0
Created: January 2026
Status: Implementation Ready
Target audience: DevOps Team, Development Team
Executive Summary
This document describes a practical solution for automating the deployment process on the existing infrastructure.
Current situation:
- ✅ GitLab is already installed
- ✅ Harbor Registry is already running
- ✅ Docker Swarm runs several containers
- ✅ 4 environments: Development → Sandbox → Testing → Production
- ❌ Manual deployment via bash scripts
- ❌ No code review process
- ❌ No automatic rollback
- ❌ Ready-made images are pulled from Harbor with no visibility into what is deployed
Proposed solution:
- GitLab CI/CD pipelines for automatic deployment
- GitOps approach: Git as the source of truth for deployments
- Automatic deployment across environments with approval gates
- One-click rollback capability
- Deployment history and audit trail
- Health checks and automatic rollback on failure
Expected results:
- 🚀 Deployment time: from 30-60 minutes down to 5-10 minutes per environment
- 🔒 Human errors: reduced by about 90%
- 📊 Full visibility: who deployed what, and when
- ⚡ Rollback: from 1-2 hours down to 2-3 minutes
- ✅ Compliance: complete audit trail
Contents
- Solution Architecture
- GitLab CI/CD Pipeline Implementation
- Docker Stack Management
- Environment Management Strategy
- Rollback Strategy
- Monitoring & Health Checks
- Implementation Roadmap
- Best Practices & Tips
- Success Metrics
- Conclusion & Next Steps
1. Solution Architecture
1.1 Current State Architecture
┌─────────────────────────────────────────────────────────────┐
│ Current Manual Process │
├─────────────────────────────────────────────────────────────┤
│ │
│ Developer → Build Image → Push to Harbor │
│ ↓ │
│ Notify DevOps Team │
│ ↓ │
│ DevOps manually runs bash scripts: │
│ │
│ 1. SSH to Swarm manager │
│ 2. docker service update app --image harbor/app:new-tag │
│ 3. Check logs manually │
│ 4. Hope everything works │
│ 5. Repeat for each environment (4x) │
│ │
│ Problems: │
│ • Time consuming (30-60 min per environment) │
│ • Error prone (typos, wrong tags) │
│ • No rollback plan │
│ • No audit trail │
│ • No validation before deployment │
└─────────────────────────────────────────────────────────────┘
1.2 Target Automated Architecture
┌──────────────────────────────────────────────────────────────┐
│ Automated GitOps-Based Solution │
├──────────────────────────────────────────────────────────────┤
│ │
│ Developer pushes image tag change to Git │
│ ↓ │
│ GitLab CI/CD Pipeline automatically: │
│ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ 1. Validate docker-compose.yml syntax │ │
│ │ 2. Check image exists in Harbor │ │
│ │ 3. Deploy to Development (automatic) │ │
│ │ 4. Run health checks │ │
│ │ 5. Wait for manual approval → Sandbox │ │
│ │ 6. Deploy to Sandbox │ │
│ │ 7. Wait for manual approval → Testing │ │
│ │ 8. Deploy to Testing │ │
│ │ 9. Wait for manual approval → Production │ │
│ │ 10. Deploy to Production │ │
│ │ 11. Monitor deployment success │ │
│ │ 12. Auto-rollback if health checks fail │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ Benefits: │
│ ✅ 5-10 minutes per environment │
│ ✅ Far fewer human errors (target: <2%) │
│ ✅ Automatic rollback on failure │
│ ✅ Complete audit trail in Git │
│ ✅ Pre-deployment validation │
└──────────────────────────────────────────────────────────────┘
1.3 Git Repository Structure
deployment-configs/ # New GitLab repository
├── README.md
├── .gitlab-ci.yml # CI/CD pipeline definition
│
├── environments/
│ ├── development/
│ │ ├── docker-compose.yml
│ │ ├── .env
│ │ └── healthcheck.sh
│ │
│ ├── sandbox/
│ │ ├── docker-compose.yml
│ │ ├── .env
│ │ └── healthcheck.sh
│ │
│ ├── testing/
│ │ ├── docker-compose.yml
│ │ ├── .env
│ │ └── healthcheck.sh
│ │
│ └── production/
│ ├── docker-compose.yml
│ ├── .env
│ └── healthcheck.sh
│
├── scripts/
│ ├── deploy.sh # Deployment script
│ ├── rollback.sh # Rollback script
│ ├── healthcheck.sh # Health validation
│ └── validate-compose.sh # Pre-deployment validation
│
└── docs/
├── deployment-guide.md
└── rollback-procedure.md
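The layout above can be created in one step. The following is a minimal sketch, assuming an empty target directory; the function name scaffold_repo is hypothetical and the files are created as empty placeholders only.

```shell
#!/usr/bin/env bash
# Sketch: create the deployment-configs skeleton shown in section 1.3.
# scaffold_repo is an illustrative name, not part of the original design.
scaffold_repo() {
  local root="$1" env
  mkdir -p "$root/scripts" "$root/docs"
  for env in development sandbox testing production; do
    mkdir -p "$root/environments/$env"
    # Empty placeholders; real content is filled in per environment.
    touch "$root/environments/$env/docker-compose.yml" \
          "$root/environments/$env/.env" \
          "$root/environments/$env/healthcheck.sh"
  done
  touch "$root/README.md" "$root/.gitlab-ci.yml" \
        "$root/scripts/deploy.sh" "$root/scripts/rollback.sh" \
        "$root/scripts/healthcheck.sh" "$root/scripts/validate-compose.sh" \
        "$root/docs/deployment-guide.md" "$root/docs/rollback-procedure.md"
}
```

Usage: scaffold_repo ./deployment-configs, followed by git init and the first commit.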
2. GitLab CI/CD Pipeline Implementation
2.1 Complete .gitlab-ci.yml
# .gitlab-ci.yml - Complete automated deployment pipeline
variables:
  DOCKER_HOST: "tcp://docker-swarm-manager:2376"
  DOCKER_TLS_VERIFY: "1"
  HARBOR_REGISTRY: "harbor.company.com"
  # Swarm connection details (stored in GitLab CI/CD variables)
  # SWARM_DEV_HOST, SWARM_SANDBOX_HOST, SWARM_TEST_HOST, SWARM_PROD_HOST
  # SWARM_SSH_KEY (SSH private key for authentication)
stages:
  - validate
  - deploy-dev
  - deploy-sandbox
  - deploy-testing
  - deploy-production
  - rollback
#═══════════════════════════════════════════════════════════
# Stage 1: VALIDATION
#═══════════════════════════════════════════════════════════
validate:syntax:
  stage: validate
  image: docker:24-cli
  script:
    - echo "Validating docker-compose files..."
    - |
      for env in development sandbox testing production; do
        echo "Checking $env environment..."
        # Compose v2 is invoked as the "docker compose" plugin; the .env file is
        # passed explicitly so that ${...} references in the compose file resolve.
        if docker compose --env-file "environments/$env/.env" \
             -f "environments/$env/docker-compose.yml" config > /dev/null; then
          echo "✅ $env: Syntax OK"
        else
          echo "❌ $env: Syntax ERROR"
          exit 1
        fi
      done
  only:
    - branches
  tags:
    - docker
validate:images:
  stage: validate
  image: docker:24-cli
  before_script:
    # Pass the password on stdin so it does not show up in the process list
    - echo "$HARBOR_PASSWORD" | docker login -u "$HARBOR_USER" --password-stdin "$HARBOR_REGISTRY"
  script:
    - echo "Checking if images exist in Harbor..."
    - |
      for env in development sandbox testing production; do
        echo "Checking images for $env..."
        # Resolve image references with the variables from .env substituted;
        # a plain grep would return unexpanded ${...} placeholders.
        images=$(docker compose --env-file "environments/$env/.env" \
          -f "environments/$env/docker-compose.yml" config --images)
        for image in $images; do
          echo "Pulling $image to verify existence..."
          if docker pull "$image"; then
            echo "✅ Image exists: $image"
          else
            echo "❌ Image NOT found: $image"
            exit 1
          fi
        done
      done
  only:
    - branches
  tags:
    - docker
#═══════════════════════════════════════════════════════════
# Stage 2: DEPLOY TO DEVELOPMENT (Automatic)
#═══════════════════════════════════════════════════════════
deploy:development:
  stage: deploy-dev
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_DEV_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to DEVELOPMENT environment..."
    # Copy files to swarm manager. The target directory is created first so
    # that scp places the files under /tmp/deploy/development.
    - ssh root@$SWARM_DEV_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/development root@$SWARM_DEV_HOST:/tmp/deploy/
    - scp scripts/deploy.sh root@$SWARM_DEV_HOST:/tmp/deploy/
    # Execute deployment
    - |
      ssh root@$SWARM_DEV_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/development
      # Load and export environment variables so that docker stack deploy
      # can substitute them into docker-compose.yml
      set -a; source .env; set +a
      # Deploy stack
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      # Wait for services to stabilize
      echo "Waiting for services to start..."
      sleep 30
      # Check service status
      docker stack services app-stack
      # Run health checks (healthcheck.sh is copied with the environment directory)
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to DEVELOPMENT completed"
  environment:
    name: development
    url: https://dev.company.com
  only:
    - main
    - develop
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# Stage 3: DEPLOY TO SANDBOX (Manual Approval Required)
#═══════════════════════════════════════════════════════════
deploy:sandbox:
  stage: deploy-sandbox
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_SANDBOX_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to SANDBOX environment..."
    - ssh root@$SWARM_SANDBOX_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/sandbox root@$SWARM_SANDBOX_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_SANDBOX_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/sandbox
      # Export the variables so docker stack deploy can substitute them
      set -a; source .env; set +a
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to SANDBOX completed"
  environment:
    name: sandbox
    url: https://sandbox.company.com
  when: manual # ⚠️ Requires manual approval
  only:
    - main
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# Stage 4: DEPLOY TO TESTING (Manual Approval Required)
#═══════════════════════════════════════════════════════════
deploy:testing:
  stage: deploy-testing
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_TEST_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to TESTING environment..."
    - ssh root@$SWARM_TEST_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/testing root@$SWARM_TEST_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_TEST_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/testing
      # Export the variables so docker stack deploy can substitute them
      set -a; source .env; set +a
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to TESTING completed"
  environment:
    name: testing
    url: https://testing.company.com
  when: manual # ⚠️ Requires manual approval
  only:
    - main
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# Stage 5: DEPLOY TO PRODUCTION (Manual Approval Required)
#═══════════════════════════════════════════════════════════
deploy:production:
  stage: deploy-production
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to PRODUCTION environment..."
    # Backup current deployment
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -e
      echo "Creating backup of current deployment..."
      # Compute the timestamp once so both commands use the same directory
      BACKUP_DIR="/backup/deployments/$(date +%Y%m%d-%H%M%S)"
      mkdir -p "$BACKUP_DIR"
      docker stack services app-stack --format "{{.Name}} {{.Image}}" > "$BACKUP_DIR/services.txt"
      echo "Backup created"
      EOF
    # Deploy new version
    - ssh root@$SWARM_PROD_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/production
      # Export the variables so docker stack deploy can substitute them
      set -a; source .env; set +a
      echo "Starting production deployment..."
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      echo "Waiting for services to stabilize..."
      sleep 60
      echo "Checking service health..."
      docker stack services app-stack
      # Run comprehensive health checks
      if bash ./healthcheck.sh; then
        echo "✅ Health checks PASSED"
      else
        echo "❌ Health checks FAILED - consider rollback"
        exit 1
      fi
      EOF
    - echo "✅ Deployment to PRODUCTION completed successfully"
  environment:
    name: production
    url: https://app.company.com
  when: manual # ⚠️ Requires manual approval + confirmation
  only:
    - main
  tags:
    - deployment
#═══════════════════════════════════════════════════════════
# ROLLBACK JOBS (Manual Trigger)
#═══════════════════════════════════════════════════════════
rollback:production:
  stage: rollback
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli git
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🔄 Rolling back PRODUCTION to previous version..."
    # Get previous Git commit
    - PREVIOUS_COMMIT=$(git rev-parse HEAD~1)
    - echo "Rolling back to commit $PREVIOUS_COMMIT"
    # Checkout previous version
    - git checkout $PREVIOUS_COMMIT -- environments/production/
    # Deploy previous version
    - ssh root@$SWARM_PROD_HOST "mkdir -p /tmp/rollback"
    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/rollback/
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -e
      cd /tmp/rollback/production
      # Export the variables so docker stack deploy can substitute them
      set -a; source .env; set +a
      echo "Rolling back to previous version..."
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      echo "Verifying rollback..."
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Rollback completed"
  # "rollback" is not an accepted environment:action value, so the job is
  # registered as a regular deployment to the production environment.
  environment:
    name: production
  when: manual
  only:
    - main
  tags:
    - deployment
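The deploy jobs above wait a fixed 30 or 60 seconds before checking the stack. A possible alternative is to poll until every service reports all replicas running. The sketch below assumes the output format of docker stack services with a "{{.Name}} {{.Replicas}}" template; the function names replicas_converged and wait_for_stack are hypothetical, and the timeout values are placeholders.

```shell
#!/usr/bin/env bash
# replicas_converged: reads "name current/desired" lines on stdin and succeeds
# only if at least one service was seen and every service has current == desired.
replicas_converged() {
  local name replicas rest current desired seen=0
  # "rest" absorbs suffixes such as "(max 1 per node)"
  while read -r name replicas rest; do
    [ -n "$name" ] || continue
    seen=1
    current="${replicas%%/*}"
    desired="${replicas##*/}"
    if [ "$current" != "$desired" ]; then
      return 1
    fi
  done
  [ "$seen" -eq 1 ]
}

# wait_for_stack: polls the stack until it converges or the timeout expires.
# Arguments: stack name, timeout in seconds (default 180), poll interval (default 5).
wait_for_stack() {
  local stack="$1" timeout="${2:-180}" interval="${3:-5}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if docker stack services "$stack" --format '{{.Name}} {{.Replicas}}' | replicas_converged; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

In a deploy job, wait_for_stack app-stack 180 could stand in for the sleep call, with the health check script still run afterwards.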
3. Docker Stack Management
3.1 Example docker-compose.yml Structure
# environments/production/docker-compose.yml
version: '3.8'
services:
  #════════════════════════════════════════════════════════
  # Frontend Application
  #════════════════════════════════════════════════════════
  frontend:
    image: ${HARBOR_REGISTRY}/company/frontend:${FRONTEND_VERSION}
    networks:
      - app-network
    ports:
      - "80:80"
      - "443:443"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
        monitor: 30s
      rollback_config:
        parallelism: 1
        delay: 5s
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  #════════════════════════════════════════════════════════
  # Backend API
  #════════════════════════════════════════════════════════
  api:
    image: ${HARBOR_REGISTRY}/company/api:${API_VERSION}
    networks:
      - app-network
      - db-network
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
      # The .env file defines JWT_SECRET_FILE (path of the mounted secret),
      # not JWT_SECRET, so the path is what gets passed to the container.
      - JWT_SECRET_FILE=${JWT_SECRET_FILE}
    secrets:
      - db_password
      - jwt_secret
    deploy:
      replicas: 5
      update_config:
        parallelism: 2
        delay: 10s
        failure_action: rollback
        monitor: 45s
      rollback_config:
        parallelism: 2
        delay: 5s
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  #════════════════════════════════════════════════════════
  # Worker Service
  #════════════════════════════════════════════════════════
  worker:
    image: ${HARBOR_REGISTRY}/company/worker:${WORKER_VERSION}
    networks:
      - app-network
      - db-network
    environment:
      - REDIS_URL=${REDIS_URL}
      - QUEUE_NAME=jobs
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: any
        delay: 10s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  #════════════════════════════════════════════════════════
  # Cache (Redis)
  #════════════════════════════════════════════════════════
  redis:
    image: redis:7-alpine
    networks:
      - app-network
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == worker
      restart_policy:
        condition: any
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
#════════════════════════════════════════════════════════
# Networks
#════════════════════════════════════════════════════════
networks:
  app-network:
    driver: overlay
    attachable: true
  db-network:
    driver: overlay
    internal: true
#════════════════════════════════════════════════════════
# Secrets
#════════════════════════════════════════════════════════
secrets:
  db_password:
    external: true
  jwt_secret:
    external: true
3.2 Environment Variables (.env files)
# environments/production/.env
# Harbor Registry
HARBOR_REGISTRY=harbor.company.com
# Application Versions (THIS IS WHAT YOU UPDATE!)
FRONTEND_VERSION=v2.1.5
API_VERSION=v3.2.1
WORKER_VERSION=v1.8.3
# Database Configuration
DATABASE_URL=postgresql://user@db-prod:5432/appdb
# Redis Configuration
REDIS_URL=redis://redis:6379
# Application Configuration
JWT_SECRET_FILE=/run/secrets/jwt_secret
LOG_LEVEL=info
ENVIRONMENT=production
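Unlike docker compose, docker stack deploy does not read a .env file on its own; it only substitutes variables that are present in the environment of the calling shell. A plain source .env sets shell variables without exporting them, so the values would not reach the docker command. A minimal sketch of a helper that exports them follows; the name load_env_file is hypothetical.

```shell
#!/usr/bin/env bash
# load_env_file: export every KEY=VALUE assignment from a .env file so that
# child processes (such as docker stack deploy) can see the values.
load_env_file() {
  local file="$1"
  [ -f "$file" ] || { echo "missing env file: $file" >&2; return 1; }
  set -a   # auto-export every variable assigned from here on
  . "$file"
  set +a   # stop auto-exporting
}
```

Usage on the Swarm manager: load_env_file ./.env followed by docker stack deploy -c docker-compose.yml --with-registry-auth app-stack.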
3.3 Health Check Script
#!/bin/bash
# environments/production/healthcheck.sh
set -e
echo "═══════════════════════════════════════════════"
echo "Running Health Checks for Production"
echo "═══════════════════════════════════════════════"
STACK_NAME="app-stack"
FAILED=0
# Check if all services are running
echo ""
echo "1️⃣ Checking service status..."
SERVICES=$(docker stack services "$STACK_NAME" --format "{{.Name}}")
for service in $SERVICES; do
  REPLICAS=$(docker service ls --filter "name=$service" --format "{{.Replicas}}")
  echo "  $service: $REPLICAS"
  # Anchor the match so that "10/10" is not mistaken for zero running replicas
  if echo "$REPLICAS" | grep -q "^0/"; then
    echo "  ❌ Service $service has NO running replicas!"
    FAILED=1
  fi
done
# Check frontend health endpoint
echo ""
echo "2️⃣ Checking Frontend health endpoint..."
if curl -sf http://localhost/health > /dev/null; then
  echo "  ✅ Frontend health check PASSED"
else
  echo "  ❌ Frontend health check FAILED"
  FAILED=1
fi
# Check API health endpoint
# Note: this only works from the manager if port 3000 is published by the stack
echo ""
echo "3️⃣ Checking API health endpoint..."
if curl -sf http://localhost:3000/health > /dev/null; then
  echo "  ✅ API health check PASSED"
else
  echo "  ❌ API health check FAILED"
  FAILED=1
fi
# Check Redis connectivity
# Note: docker exec only reaches containers on the local node, so this check
# requires the Redis task to run on the node where the script is executed
echo ""
echo "4️⃣ Checking Redis connectivity..."
if docker exec "$(docker ps -q -f name=${STACK_NAME}_redis)" redis-cli ping | grep -q PONG; then
  echo "  ✅ Redis connectivity PASSED"
else
  echo "  ❌ Redis connectivity FAILED"
  FAILED=1
fi
# Check for recent errors in logs
echo ""
echo "5️⃣ Checking recent logs for errors..."
ERROR_COUNT=0
for service in $SERVICES; do
  # docker service logs expects a service name, not a stack name
  COUNT=$(docker service logs --since 5m "$service" 2>&1 | grep -i "error\|fatal\|panic" | wc -l)
  ERROR_COUNT=$((ERROR_COUNT + COUNT))
done
if [ "$ERROR_COUNT" -gt 10 ]; then
  echo "  ⚠️ Found $ERROR_COUNT errors in last 5 minutes"
  FAILED=1
else
  echo "  ✅ Error count acceptable: $ERROR_COUNT"
fi
echo ""
echo "═══════════════════════════════════════════════"
if [ $FAILED -eq 0 ]; then
  echo "✅ ALL HEALTH CHECKS PASSED"
  echo "═══════════════════════════════════════════════"
  exit 0
else
  echo "❌ HEALTH CHECKS FAILED"
  echo "═══════════════════════════════════════════════"
  exit 1
fi
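The endpoint checks in the script above run once, so a service that is still starting can fail the whole deployment. One possible refinement is a small retry wrapper. This is a sketch only; the name retry and the attempt and delay values are assumptions, not part of the original script.

```shell
#!/usr/bin/env bash
# retry: run a command up to <attempts> times, sleeping <delay> seconds
# between failures. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, retry 5 3 curl -sf http://localhost/health would try the frontend endpoint up to five times, three seconds apart, before the check is counted as failed.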
4. Environment Management Strategy
4.1 Promotion Flow
┌─────────────────────────────────────────────────────────┐
│ Environment Promotion Flow │
└─────────────────────────────────────────────────────────┘
Developer updates image version in Git
↓
Development (Automatic)
├─ Deploy immediately
├─ Run health checks
└─ ✅ If successful → enable Sandbox deployment
↓ (Manual approval required)
Sandbox (Manual Trigger)
├─ QA team tests features
├─ Run integration tests
└─ ✅ If approved → enable Testing deployment
↓ (Manual approval required)
Testing (Manual Trigger)
├─ Full regression testing
├─ Performance testing
└─ ✅ If approved → enable Production deployment
↓ (Manual approval required + confirmation)
Production (Manual Trigger)
├─ Backup current state
├─ Deploy with a rolling update (update_config in docker-compose.yml)
├─ Run comprehensive health checks
└─ ✅ Monitor or 🔄 Rollback if issues
4.2 Deployment Approval Matrix
| Environment | Approval Required | Who Can Approve | Rollback Strategy |
|---|---|---|---|
| Development | ❌ No (Automatic) | N/A | Automatic on health check failure |
| Sandbox | ✅ Yes (Manual) | Any Developer | Manual via GitLab UI |
| Testing | ✅ Yes (Manual) | QA Lead, DevOps Lead | Manual via GitLab UI |
| Production | ✅ Yes (Manual + Confirmation) | DevOps Lead, CTO | Automatic on failure + Manual option |
4.3 Change Management Workflow
# Example: Updating application version
# 1. Developer receives new image from Harbor
# New image available: harbor.company.com/company/api:v3.2.2
# 2. Developer creates feature branch
git checkout -b update-api-v3.2.2
# 3. Update version in Development environment
# Edit: environments/development/.env
API_VERSION=v3.2.2
# 4. Commit and push
git add environments/development/.env
git commit -m "feat: update API to v3.2.2 in development"
git push origin update-api-v3.2.2
# 5. Create Merge Request in GitLab
- Title: "Update API to v3.2.2"
- Description: "New features: X, Y, Z. Bug fixes: A, B"
- Assign to: DevOps team for review
# 6. After MR approval and merge to main:
- GitLab CI automatically deploys to Development
- Monitor deployment
- If successful, manually trigger Sandbox deployment
# 7. QA tests in Sandbox
- If approved, update Testing environment
- Repeat process
# 8. Production deployment
- Update production/.env with new version
- Create MR with detailed change log
- Require approvals from: DevOps Lead + CTO
- Schedule deployment window
- Execute manual deployment
- Monitor closely
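Steps 3 and 8 of the workflow above edit a version line in a .env file by hand. A small helper can make that edit less error-prone. The following is a sketch under the assumption that each version is stored as a single KEY=value line; the function name set_version is hypothetical, and values containing the characters | or & are not supported by this simple sed expression.

```shell
#!/usr/bin/env bash
# set_version: replace the value of KEY in a .env file, in place.
# Fails if the key is not already present, so a typo cannot silently add a key.
set_version() {
  local file="$1" key="$2" value="$3" tmp
  grep -q "^${key}=" "$file" || { echo "key not found: $key" >&2; return 1; }
  tmp=$(mktemp)
  sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
  # Write back through redirection so the original file permissions are kept
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: set_version environments/development/.env API_VERSION v3.2.2, then commit and push as in step 4.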
5. Rollback Strategy
5.1 Automatic Rollback (Health Check Failure)
# In docker-compose.yml - automatic rollback on failure
services:
  api:
    deploy:
      update_config:
        failure_action: rollback # ← Automatic rollback!
        monitor: 60s # Monitor for 60 seconds
      rollback_config:
        parallelism: 2 # Roll back 2 at a time
        delay: 5s # 5s between rollbacks
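How it works:
- The new version is rolled out task by task
- After each task is updated, Docker Swarm watches it for the monitor period (60 seconds here); a task that exits or fails its health check in that window counts as a failed update
- On failure, failure_action: rollback makes Swarm revert the service to its previous specification automatically
- The previous version is typically restored within 2-3 minutes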
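Whether an automatic rollback actually happened can be read from the service's update status. The sketch below maps the state values reported by Docker Swarm to a short verdict; the function names and the verdict strings are hypothetical, chosen for illustration.

```shell
#!/usr/bin/env bash
# update_outcome: map a Swarm UpdateStatus.State value to a short verdict.
update_outcome() {
  case "$1" in
    completed)                 echo "updated" ;;
    rollback_completed)        echo "rolled-back" ;;
    updating|rollback_started) echo "in-progress" ;;
    paused|rollback_paused)    echo "needs-attention" ;;
    "")                        echo "never-updated" ;;
    *)                         echo "unknown" ;;
  esac
}

# service_update_state: print the update state of one service. The template
# guards against services that have never been updated (no UpdateStatus yet).
service_update_state() {
  docker service inspect "$1" \
    --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}'
}
```

For example, update_outcome "$(service_update_state app-stack_api)" could be called at the end of a deploy job to fail the pipeline when the verdict is rolled-back.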
5.2 Manual Rollback via GitLab
Option A: Rollback via Git History
# GitLab Pipeline: rollback:production job
# 1. Identify previous working version
git log --oneline environments/production/.env
# 2. Checkout previous commit
git checkout <previous-commit-hash> -- environments/production/
# 3. Pipeline redeploys previous version
# 4. Verify health checks
Option B: Rollback via GitLab UI
GitLab → Deployments → Environments → Production
↓
Click "Rollback" button
↓
Select previous successful deployment
↓
Confirm rollback
↓
GitLab re-runs the deployment job of the selected earlier deployment
5.3 Emergency Rollback Procedure
#!/bin/bash
# scripts/emergency-rollback.sh
# FOR EMERGENCY USE ONLY - bypasses GitLab pipeline
# Run directly on Swarm manager node
set -eu
BACKUP_DIR="/backup/deployments"
echo "🚨 EMERGENCY ROLLBACK INITIATED"
# Find last backup
LAST_BACKUP=$(ls -td "$BACKUP_DIR"/* | head -1)
echo "Rolling back to: $LAST_BACKUP"
# Restore previous image versions. Each line of services.txt holds the full
# service name (already prefixed with the stack name) followed by its image,
# so the stack name must not be prepended again.
while read -r SERVICE IMAGE; do
  [ -n "$SERVICE" ] || continue
  echo "Rolling back $SERVICE to $IMAGE"
  docker service update --with-registry-auth --image "$IMAGE" "$SERVICE"
done < "$LAST_BACKUP/services.txt"
echo "✅ Emergency rollback completed"
echo "⚠️ Remember to update Git repository to match!"
6. Monitoring & Health Checks
6.1 Service-Level Health Checks
# In docker-compose.yml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s # Check every 30 seconds
  timeout: 10s # Request timeout
  retries: 3 # Mark unhealthy after 3 consecutive failures
  start_period: 60s # Grace period for startup
6.2 Stack-Level Monitoring
#!/bin/bash
# scripts/monitor-deployment.sh
STACK_NAME="app-stack"
while true; do
  clear
  echo "═══════════════════════════════════════════════"
  echo "Stack: $STACK_NAME - $(date)"
  echo "═══════════════════════════════════════════════"
  # Show service status
  docker stack services "$STACK_NAME"
  echo ""
  echo "Recent logs (last 10 lines per service):"
  # docker service logs expects a service name, so iterate over the stack's services
  for service in $(docker stack services "$STACK_NAME" --format "{{.Name}}"); do
    docker service logs --tail=10 "$service"
  done
  sleep 10
done
6.3 Notification Integration
# Add to .gitlab-ci.yml
after_script:
  - |
    # Use "=" for string comparison so the test also works in plain sh
    if [ "$CI_JOB_STATUS" = "success" ]; then
      MESSAGE="✅ Deployment to $CI_ENVIRONMENT_NAME successful"
    else
      MESSAGE="❌ Deployment to $CI_ENVIRONMENT_NAME FAILED"
    fi
    # Send to Slack (curl must be installed in the job image)
    curl -X POST -H 'Content-type: application/json' \
      --data "{\"text\":\"$MESSAGE\nPipeline: $CI_PIPELINE_URL\"}" \
      "$SLACK_WEBHOOK_URL"
    # Send email (if SMTP and a mail client are configured)
    echo "$MESSAGE" | mail -s "Deployment Notification" devops@company.com
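The Slack payload above is assembled by string interpolation, which produces invalid JSON as soon as the message contains a double quote or a backslash. A minimal sketch of an escaping helper follows; the function names json_escape and slack_payload are hypothetical, and the sketch assumes single-line messages.

```shell
#!/usr/bin/env bash
# json_escape: escape backslashes and double quotes for use inside a JSON string.
# Assumes a single-line input; newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# slack_payload: build the {"text": "..."} body expected by an incoming webhook.
slack_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}
```

The curl call could then use --data "$(slack_payload "$MESSAGE")" instead of the inline JSON string.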
7. Implementation Roadmap
Phase 1: Preparation (Week 1)
Day 1-2: Repository Setup
- Create deployment-configs repository in GitLab
- Create directory structure (environments/, scripts/)
- Add current docker-compose.yml to each environment
- Create .env files with current versions
- Commit initial structure
Day 3-4: GitLab Configuration
- Configure GitLab CI/CD variables:
  - SWARM_DEV_HOST, SWARM_SANDBOX_HOST, SWARM_TEST_HOST, SWARM_PROD_HOST
  - SWARM_SSH_KEY (SSH private key)
  - HARBOR_USER, HARBOR_PASSWORD
  - SLACK_WEBHOOK_URL (optional)
- Create SSH keys for GitLab Runner → Swarm access
- Test SSH connectivity from GitLab to each Swarm environment
Day 5: Scripts Development
- Create deploy.sh script
- Create healthcheck.sh script
- Create rollback.sh script
- Test scripts manually on Development environment
Phase 2: Pipeline Implementation (Week 2)
Day 1-2: Basic Pipeline
- Create .gitlab-ci.yml with validation stage only
- Test syntax validation
- Test image validation
Day 3: Development Deployment
- Add deploy:development job
- Test automatic deployment to Development
- Verify health checks work
Day 4: Sandbox & Testing
- Add deploy:sandbox job (manual)
- Add deploy:testing job (manual)
- Test manual approval workflow
Day 5: Production Deployment
- Add deploy:production job (manual + confirmation)
- Add backup before deployment
- Test during a low-traffic window, with the rollback plan ready
Phase 3: Rollback Implementation (Week 3)
Day 1-2: Automatic Rollback
- Configure Docker Swarm automatic rollback
- Test by deploying broken version
- Verify automatic recovery
Day 3-4: Manual Rollback
- Implement rollback:production job
- Test Git-based rollback
- Document rollback procedure
Day 5: Emergency Procedures
- Create emergency-rollback.sh script
- Test emergency rollback
- Document for on-call team
Phase 4: Monitoring & Optimization (Week 4)
Day 1-2: Monitoring
- Set up deployment notifications (Slack/Email)
- Configure Prometheus metrics collection
- Create Grafana dashboards for deployments
Day 3-4: Documentation
- Write deployment guide for developers
- Write operations runbook
- Create troubleshooting guide
- Record demo video
Day 5: Team Training
- Train developers on new workflow
- Train QA team on approval process
- Train DevOps team on monitoring/rollback
- Conduct Q&A session
8. Best Practices & Tips
8.1 Version Management
✅ DO:
# Use semantic versioning
API_VERSION=v3.2.1 # ← Good: Clear, semantic version
# Include Git commit hash for traceability
API_VERSION=v3.2.1-abc123ef
# Use immutable tags
IMAGE=harbor.company.com/app:v1.2.3 # ← Good: Specific version
❌ DON'T:
# Avoid mutable tags
API_VERSION=latest # ← Bad: Can change unexpectedly
# Avoid ambiguous versions
API_VERSION=production # ← Bad: What version is this?
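The DO and DON'T rules above can be enforced before a commit or in the validate stage. The following sketch accepts vMAJOR.MINOR.PATCH with an optional commit-hash suffix and rejects everything else; the function name is_release_tag and the exact pattern (6 to 40 lowercase hex characters for the suffix) are assumptions.

```shell
#!/usr/bin/env bash
# is_release_tag: succeed for tags such as v3.2.1 or v3.2.1-abc123ef,
# fail for mutable or ambiguous tags such as "latest" or "production".
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9a-f]{6,40})?$'
}
```

Usage: is_release_tag "$API_VERSION" || { echo "refusing mutable tag: $API_VERSION"; exit 1; }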
8.2 Deployment Timing
Recommended deployment windows:
- Development: Anytime (automatic)
- Sandbox: Business hours (9am-5pm)
- Testing: Business hours (requires QA)
- Production:
  - Normal changes: Tuesday-Thursday, 10am-2pm
  - Critical fixes: Anytime with proper approval
  - Avoid: Monday mornings, Friday afternoons, weekends
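The production window for normal changes can be checked by the pipeline before the deploy job runs. The sketch below treats the window as Tuesday to Thursday, from 10:00 up to but not including 14:00, in the runner's local time zone; the function name in_prod_window and the exclusive upper bound are assumptions.

```shell
#!/usr/bin/env bash
# in_prod_window: day is the ISO weekday (1 = Monday ... 7 = Sunday),
# hour is 0-23 and may carry a leading zero as printed by date +%H.
in_prod_window() {
  local day="$1" hour="$2"
  hour="${hour#0}"     # strip one leading zero ("09" -> "9")
  hour="${hour:-0}"    # "0" or "00" collapses to empty above, restore it
  [ "$day" -ge 2 ] && [ "$day" -le 4 ] && [ "$hour" -ge 10 ] && [ "$hour" -lt 14 ]
}
```

Usage: in_prod_window "$(date +%u)" "$(date +%H)" || echo "Outside the normal production window". Critical fixes would bypass this check with the proper approval.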
8.3 Communication
Before Production deployment:
Slack announcement template:
📢 Production Deployment Scheduled
🗓 Date: January 15, 2026
⏰ Time: 11:00 AM (EST)
⏱ Duration: ~15 minutes
📝 Changes:
- API v3.2.1 → v3.2.2 (bug fixes)
- Frontend v2.1.5 → v2.1.6 (UI improvements)
🔗 Release Notes: [link]
🔗 Rollback Plan: [link]
Please report any issues to #devops-alerts
8.4 Security Considerations
# Store sensitive data as Docker secrets
secrets:
  db_password:
    external: true # ← Created outside compose file
  api_key:
    external: true
# Never commit secrets to Git!
# Use GitLab CI/CD variables for:
# - SSH keys
# - API tokens
# - Passwords
# - Certificates
8.5 Troubleshooting Common Issues
Issue 1: Pipeline fails with "SSH connection refused"
# Solution: Verify SSH key in GitLab CI/CD variables
# Test manually:
ssh -i ~/.ssh/gitlab_rsa root@swarm-manager
Issue 2: Image pull fails from Harbor
# Solution: Check registry credentials
echo "$HARBOR_PASSWORD" | docker login harbor.company.com -u "$HARBOR_USER" --password-stdin
# Verify image exists:
docker pull harbor.company.com/company/api:v3.2.1
Issue 3: Health checks fail after deployment
# Debug: Check service logs
docker service logs app-stack_api --tail 100
# Check service status
docker service ps app-stack_api
# Manual health check
curl http://localhost:3000/health
Issue 4: Deployment stuck "pending"
# Check swarm node status
docker node ls
# Check resource availability
docker node inspect swarm-worker-1 | grep Resources -A 10
# Check for failed tasks
docker service ps app-stack_api --no-trunc
9. Success Metrics
9.1 Key Performance Indicators
Before Automation:
- 📊 Deployment frequency: 1-2 per week
- ⏱ Average deployment time: 30-60 minutes per environment
- 🐛 Deployment errors: ~20% (typos, wrong tags)
- 🔄 Rollback time: 1-2 hours (manual)
- 📝 Audit trail: Partial (chat logs, manual notes)
After Automation (Target):
- 📊 Deployment frequency: 5-10 per week
- ⏱ Average deployment time: 5-10 minutes per environment
- 🐛 Deployment errors: <2% (automated validation)
- 🔄 Rollback time: 2-3 minutes (automatic)
- 📝 Audit trail: Complete (Git history + GitLab logs)
9.2 Success Criteria
Week 4 Evaluation:
- All 4 environments deployed via GitLab CI/CD
- Zero manual SSH deployments
- At least 5 successful Production deployments
- At least 1 successful rollback test
- Team can deploy without DevOps assistance
- Complete audit trail for all deployments
- Average deployment time < 15 minutes
10. Conclusion & Next Steps
Current State
❌ Manual bash script deployments
❌ No audit trail
❌ Error-prone process
❌ Slow rollbacks
Target State (After Implementation)
✅ Automated GitLab CI/CD pipelines
✅ Complete Git-based audit trail
✅ Validated deployments with health checks
✅ Automatic rollbacks in 2-3 minutes
✅ Self-service for developers
Immediate Next Steps
- This Week:
  - Create GitLab repository structure
  - Configure CI/CD variables
  - Test SSH connectivity
- Next Week:
  - Implement basic pipeline
  - Test Development deployments
  - Add validation stages
- Week 3-4:
  - Roll out to all environments
  - Implement rollback procedures
  - Train team
Resources Needed
- Time Investment: 2-4 weeks (1 DevOps engineer)
- Infrastructure: GitLab Runner (existing OK)
- Training: 2-3 hours team training session
- Documentation: Deployment guide + runbooks
Support & Questions
For implementation assistance:
- 📧 Email: devops@company.com
- 💬 Slack: #devops-automation
- 📖 Documentation: https://gitlab.company.com/deployment-configs
Document Version: 1.0
Last Updated: January 2026
Status: Ready for Implementation
Author: DevOps Team
Review Date: After Phase 2 completion