# GitLab + Harbor + Docker Swarm: Automated Deployment Solution

**Version:** 1.0
**Created:** January 2026
**Status:** Implementation Ready
**Target audience:** DevOps Team, Development Team

---

## Executive Summary

This document describes a practical solution for automating the deployment process on the existing infrastructure.

**Current situation:**
- ✅ GitLab is already installed
- ✅ Harbor Registry is already running
- ✅ Docker Swarm runs several containers
- ✅ 4 environments: Development → Sandbox → Testing → Production
- ❌ Manual deployment via bash scripts
- ❌ No code review process
- ❌ No automatic rollback
- ❌ Prebuilt images are pulled from Harbor without visibility

**Proposed solution:**
- GitLab CI/CD pipelines for automated deployment
- GitOps approach: Git as the source of truth for deployments
- Automated deployment across environments with approval gates
- One-click rollback capability
- Deployment history and audit trail
- Health checks and automatic rollback on failure

**Expected results:**
- 🚀 Deployment time: from 30-60 minutes down to 5-10 minutes
- 🔒 Human errors: reduced by about 90%
- 📊 Full visibility: who deployed what, and when
- ⚡ Rollback: from 1-2 hours down to 2-3 minutes
- ✅ Compliance: complete audit trail

---

## Contents

1. [Solution Architecture](#1-solution-architecture)
2. [GitLab CI/CD Pipeline Implementation](#2-gitlab-cicd-pipeline-implementation)
3. [Docker Stack Management](#3-docker-stack-management)
4. [Environment Management Strategy](#4-environment-management-strategy)
5. [Rollback Strategy](#5-rollback-strategy)
6. [Monitoring & Health Checks](#6-monitoring--health-checks)
7. [Implementation Roadmap](#7-implementation-roadmap)
8. [Best Practices & Tips](#8-best-practices--tips)

---

## 1. Solution Architecture

### 1.1 Current State Architecture

```
┌────────────────────────────────────────────────────────────┐
│                   Current Manual Process                   │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Developer → Build Image → Push to Harbor                  │
│      ↓                                                     │
│  Notify DevOps Team                                        │
│      ↓                                                     │
│  DevOps manually runs bash scripts:                        │
│                                                            │
│  1. SSH to Swarm manager                                   │
│  2. docker service update app --image harbor/app:new-tag   │
│  3. Check logs manually                                    │
│  4. Hope everything works                                  │
│  5. Repeat for each environment (4x)                       │
│                                                            │
│  Problems:                                                 │
│  • Time consuming (30-60 min per environment)              │
│  • Error prone (typos, wrong tags)                         │
│  • No rollback plan                                        │
│  • No audit trail                                          │
│  • No validation before deployment                         │
└────────────────────────────────────────────────────────────┘
```

### 1.2 Target Automated Architecture

```
┌────────────────────────────────────────────────────────────┐
│              Automated GitOps-Based Solution               │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Developer pushes image tag change to Git                  │
│      ↓                                                     │
│  GitLab CI/CD Pipeline automatically:                      │
│      ↓                                                     │
│  ┌──────────────────────────────────────────────────┐      │
│  │ 1. Validate docker-compose.yml syntax            │      │
│  │ 2. Check image exists in Harbor                  │      │
│  │ 3. Deploy to Development (automatic)             │      │
│  │ 4. Run health checks                             │      │
│  │ 5. Wait for manual approval → Sandbox            │      │
│  │ 6. Deploy to Sandbox                             │      │
│  │ 7. Wait for manual approval → Testing            │      │
│  │ 8. Deploy to Testing                             │      │
│  │ 9. Wait for manual approval → Production         │      │
│  │ 10. Deploy to Production                         │      │
│  │ 11. Monitor deployment success                   │      │
│  │ 12. Auto-rollback if health checks fail          │      │
│  └──────────────────────────────────────────────────┘      │
│                                                            │
│  Benefits:                                                 │
│  ✅ 5-10 minutes per environment                           │
│  ✅ Far fewer manual errors                                │
│  ✅ Automatic rollback on failure                          │
│  ✅ Complete audit trail in Git                            │
│  ✅ Pre-deployment validation                              │
└────────────────────────────────────────────────────────────┘
```

### 1.3 Git Repository Structure

```
deployment-configs/              # New GitLab repository
├── README.md
├── .gitlab-ci.yml               # CI/CD pipeline definition
│
├── environments/
│   ├── development/
│   │   ├── docker-compose.yml
│   │   ├── .env
│   │   └── healthcheck.sh
│   │
│   ├── sandbox/
│   │   ├── docker-compose.yml
│   │   ├── .env
│   │   └── healthcheck.sh
│   │
│   ├── testing/
│   │   ├── docker-compose.yml
│   │   ├── .env
│   │   └── healthcheck.sh
│   │
│   └── production/
│       ├── docker-compose.yml
│       ├── .env
│       └── healthcheck.sh
│
├── scripts/
│   ├── deploy.sh                # Deployment script
│   ├── rollback.sh              # Rollback script
│   ├── healthcheck.sh           # Health validation
│   └── validate-compose.sh      # Pre-deployment validation
│
└── docs/
    ├── deployment-guide.md
    └── rollback-procedure.md
```

---

## 2. GitLab CI/CD Pipeline Implementation

### 2.1 Complete .gitlab-ci.yml

```yaml
# .gitlab-ci.yml - Complete automated deployment pipeline

variables:
  DOCKER_HOST: "tcp://docker-swarm-manager:2376"
  DOCKER_TLS_VERIFY: "1"
  HARBOR_REGISTRY: "harbor.company.com"

# Swarm connection details (stored in GitLab CI/CD variables)
# SWARM_DEV_HOST, SWARM_SANDBOX_HOST, SWARM_TEST_HOST, SWARM_PROD_HOST
# SWARM_SSH_KEY (SSH private key for authentication)

stages:
  - validate
  - deploy-dev
  - deploy-sandbox
  - deploy-testing
  - deploy-production
  - rollback

#═══════════════════════════════════════════════════════════
# Stage 1: VALIDATION
#═══════════════════════════════════════════════════════════

validate:syntax:
  stage: validate
  image: docker:24-cli
  script:
    - echo "Validating docker-compose files..."
    - |
      for env in development sandbox testing production; do
        echo "Checking $env environment..."
        # Test the command directly instead of inspecting $? afterwards;
        # this also works when the shell exits on the first failing command.
        # Uses the Compose v2 plugin (`docker compose`) and the environment's .env.
        if docker compose --env-file "environments/$env/.env" \
             -f "environments/$env/docker-compose.yml" config > /dev/null; then
          echo "✅ $env: Syntax OK"
        else
          echo "❌ $env: Syntax ERROR"
          exit 1
        fi
      done
  only:
    - branches
  tags:
    - docker

validate:images:
  stage: validate
  image: docker:24-cli
  before_script:
    # Pass the password on stdin so it does not show up in the process list
    - echo "$HARBOR_PASSWORD" | docker login -u "$HARBOR_USER" --password-stdin "$HARBOR_REGISTRY"
  script:
    - echo "Checking if images exist in Harbor..."
    - |
      for env in development sandbox testing production; do
        echo "Checking images for $env..."

        # Extract image references from docker-compose.
        # `config --images` resolves ${HARBOR_REGISTRY} and ${*_VERSION} from .env;
        # a plain grep on "image:" would return the unresolved placeholders.
        images=$(docker compose --env-file "environments/$env/.env" \
                   -f "environments/$env/docker-compose.yml" config --images)

        for image in $images; do
          echo "Pulling $image to verify existence..."
          if docker pull "$image"; then
            echo "✅ Image exists: $image"
          else
            echo "❌ Image NOT found: $image"
            exit 1
          fi
        done
      done
  only:
    - branches
  tags:
    - docker

#═══════════════════════════════════════════════════════════
# Stage 2: DEPLOY TO DEVELOPMENT (Automatic)
#═══════════════════════════════════════════════════════════

deploy:development:
  stage: deploy-dev
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_DEV_HOST >> ~/.ssh/known_hosts
  script:
    - echo "🚀 Deploying to DEVELOPMENT environment..."

    # Copy files to swarm manager (create the target directory first)
    - ssh root@$SWARM_DEV_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/development root@$SWARM_DEV_HOST:/tmp/deploy/
    - scp scripts/deploy.sh root@$SWARM_DEV_HOST:/tmp/deploy/

    # Execute deployment
    - |
      ssh root@$SWARM_DEV_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/development

      # Load environment variables and export them:
      # `docker stack deploy` substitutes ${VAR} from the shell environment
      # and does not read .env files by itself
      set -a
      source .env
      set +a

      # Deploy stack
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack

      # Wait for services to stabilize
      echo "Waiting for services to start..."
      sleep 30

      # Check service status
      docker stack services app-stack

      # Run health checks (healthcheck.sh is copied with the environment directory)
      bash ./healthcheck.sh
      EOF

    - echo "✅ Deployment to DEVELOPMENT completed"

  environment:
    name: development
    url: https://dev.company.com
    # on_stop is omitted: no stop:development job is defined in this pipeline

  only:
    - main
    - develop

  tags:
    - deployment

#═══════════════════════════════════════════════════════════
# Stage 3: DEPLOY TO SANDBOX (Manual Approval Required)
#═══════════════════════════════════════════════════════════

deploy:sandbox:
  stage: deploy-sandbox
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_SANDBOX_HOST >> ~/.ssh/known_hosts

  script:
    - echo "🚀 Deploying to SANDBOX environment..."
    - ssh root@$SWARM_SANDBOX_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/sandbox root@$SWARM_SANDBOX_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_SANDBOX_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/sandbox
      # Export .env values so `docker stack deploy` can substitute them
      set -a
      source .env
      set +a
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to SANDBOX completed"

  environment:
    name: sandbox
    url: https://sandbox.company.com

  when: manual  # ⚠️ Requires manual approval

  only:
    - main

  tags:
    - deployment

#═══════════════════════════════════════════════════════════
# Stage 4: DEPLOY TO TESTING (Manual Approval Required)
#═══════════════════════════════════════════════════════════

deploy:testing:
  stage: deploy-testing
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_TEST_HOST >> ~/.ssh/known_hosts

  script:
    - echo "🚀 Deploying to TESTING environment..."
    - ssh root@$SWARM_TEST_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/testing root@$SWARM_TEST_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_TEST_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/testing
      # Export .env values so `docker stack deploy` can substitute them
      set -a
      source .env
      set +a
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack
      sleep 30
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF
    - echo "✅ Deployment to TESTING completed"

  environment:
    name: testing
    url: https://testing.company.com

  when: manual  # ⚠️ Requires manual approval

  only:
    - main

  tags:
    - deployment

#═══════════════════════════════════════════════════════════
# Stage 5: DEPLOY TO PRODUCTION (Manual Approval Required)
#═══════════════════════════════════════════════════════════

deploy:production:
  stage: deploy-production
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts

  script:
    - echo "🚀 Deploying to PRODUCTION environment..."

    # Backup current deployment
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -e
      echo "Creating backup of current deployment..."
      # Compute the timestamp once so both commands use the same directory
      BACKUP_DIR=/backup/deployments/$(date +%Y%m%d-%H%M%S)
      mkdir -p "$BACKUP_DIR"
      docker stack services app-stack --format "{{.Name}} {{.Image}}" > "$BACKUP_DIR/services.txt"
      echo "Backup created"
      EOF

    # Deploy new version
    - ssh root@$SWARM_PROD_HOST "mkdir -p /tmp/deploy"
    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/deploy/
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -e
      cd /tmp/deploy/production
      # Export .env values so `docker stack deploy` can substitute them
      set -a
      source .env
      set +a

      echo "Starting production deployment..."
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack

      echo "Waiting for services to stabilize..."
      sleep 60

      echo "Checking service health..."
      docker stack services app-stack

      # Run comprehensive health checks
      if bash ./healthcheck.sh; then
        echo "✅ Health checks PASSED"
      else
        echo "❌ Health checks FAILED - consider rollback"
        exit 1
      fi
      EOF

    - echo "✅ Deployment to PRODUCTION completed successfully"

  environment:
    name: production
    url: https://app.company.com

  when: manual  # ⚠️ Requires manual approval + confirmation

  only:
    - main

  tags:
    - deployment

#═══════════════════════════════════════════════════════════
# ROLLBACK JOBS (Manual Trigger)
#═══════════════════════════════════════════════════════════

rollback:production:
  stage: rollback
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client bash docker-cli git
    - eval $(ssh-agent -s)
    - echo "$SWARM_SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H $SWARM_PROD_HOST >> ~/.ssh/known_hosts

  script:
    - echo "🔄 Rolling back PRODUCTION to previous version..."

    # Get previous Git commit (the job's clone must contain at least two commits)
    - PREVIOUS_COMMIT=$(git rev-parse HEAD~1)
    - echo "Rolling back to commit $PREVIOUS_COMMIT"

    # Checkout previous version
    - git checkout $PREVIOUS_COMMIT -- environments/production/

    # Deploy previous version
    - ssh root@$SWARM_PROD_HOST "mkdir -p /tmp/rollback"
    - scp -r environments/production root@$SWARM_PROD_HOST:/tmp/rollback/
    - |
      ssh root@$SWARM_PROD_HOST bash << 'EOF'
      set -e
      cd /tmp/rollback/production
      # Export .env values so `docker stack deploy` can substitute them
      set -a
      source .env
      set +a

      echo "Rolling back to previous version..."
      docker stack deploy -c docker-compose.yml --with-registry-auth app-stack

      sleep 30

      echo "Verifying rollback..."
      docker stack services app-stack
      bash ./healthcheck.sh
      EOF

    - echo "✅ Rollback completed"

  environment:
    name: production
    # No `action:` key here: "rollback" is not a valid environment action in GitLab

  when: manual

  only:
    - main

  tags:
    - deployment
```
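
The deploy jobs above wait for a fixed `sleep 30` (or `sleep 60`) before checking the services. One possible alternative, sketched here, polls the replica counts until every service reports all desired replicas or a timeout expires. The function names `replicas_converged` and `wait_for_stack`, the 5-second poll interval, and the 120-second default timeout are illustrative assumptions, not part of the original pipeline. On a manager node it could be called as `wait_for_stack app-stack 180` in place of the sleep.

```shell
#!/bin/sh
# Sketch: poll Swarm replica counts instead of sleeping for a fixed time.

# Returns 0 when a "running/desired" string such as "3/3" shows that all
# desired replicas are running (and at least one replica is desired).
replicas_converged() {
    running=${1%%/*}
    desired=${1##*/}
    [ "$desired" -gt 0 ] 2>/dev/null && [ "$running" -eq "$desired" ] 2>/dev/null
}

# Polls `docker stack services` until all services converge or the timeout
# (seconds, default 120) expires. Must run on a Swarm manager node.
wait_for_stack() {
    stack=$1
    timeout=${2:-120}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        seen=0
        pending=0
        # The Replicas column can carry a suffix such as "(max 1 per node)";
        # awk keeps only the "running/desired" part.
        for r in $(docker stack services "$stack" --format '{{.Replicas}}' | awk '{print $1}'); do
            seen=1
            replicas_converged "$r" || pending=1
        done
        if [ "$seen" -eq 1 ] && [ "$pending" -eq 0 ]; then
            return 0
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
    return 1
}
```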

---

## 3. Docker Stack Management

### 3.1 Example docker-compose.yml Structure

```yaml
# environments/production/docker-compose.yml

version: '3.8'

services:

  #════════════════════════════════════════════════════════
  # Frontend Application
  #════════════════════════════════════════════════════════
  frontend:
    image: ${HARBOR_REGISTRY}/company/frontend:${FRONTEND_VERSION}
    networks:
      - app-network
    ports:
      - "80:80"
      - "443:443"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
        monitor: 30s
      rollback_config:
        parallelism: 1
        delay: 5s
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    healthcheck:
      # Requires curl to be present in the image
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  #════════════════════════════════════════════════════════
  # Backend API
  #════════════════════════════════════════════════════════
  api:
    image: ${HARBOR_REGISTRY}/company/api:${API_VERSION}
    networks:
      - app-network
      - db-network
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
      # Path to the mounted secret, matching JWT_SECRET_FILE in .env;
      # the secret value itself is never placed in an environment variable
      - JWT_SECRET_FILE=${JWT_SECRET_FILE}
    secrets:
      - db_password
      - jwt_secret
    deploy:
      replicas: 5
      update_config:
        parallelism: 2
        delay: 10s
        failure_action: rollback
        monitor: 45s
      rollback_config:
        parallelism: 2
        delay: 5s
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  #════════════════════════════════════════════════════════
  # Worker Service
  #════════════════════════════════════════════════════════
  worker:
    image: ${HARBOR_REGISTRY}/company/worker:${WORKER_VERSION}
    networks:
      - app-network
      - db-network
    environment:
      - REDIS_URL=${REDIS_URL}
      - QUEUE_NAME=jobs
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: any
        delay: 10s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  #════════════════════════════════════════════════════════
  # Cache (Redis)
  #════════════════════════════════════════════════════════
  redis:
    image: redis:7-alpine
    networks:
      - app-network
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == worker
      restart_policy:
        condition: any
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

#════════════════════════════════════════════════════════
# Networks
#════════════════════════════════════════════════════════
networks:
  app-network:
    driver: overlay
    attachable: true
  db-network:
    driver: overlay
    internal: true

#════════════════════════════════════════════════════════
# Secrets
#════════════════════════════════════════════════════════
secrets:
  db_password:
    external: true
  jwt_secret:
    external: true
```

### 3.2 Environment Variables (.env files)

```bash
# environments/production/.env

# Harbor Registry
HARBOR_REGISTRY=harbor.company.com

# Application Versions (THIS IS WHAT YOU UPDATE!)
FRONTEND_VERSION=v2.1.5
API_VERSION=v3.2.1
WORKER_VERSION=v1.8.3

# Database Configuration
DATABASE_URL=postgresql://user@db-prod:5432/appdb

# Redis Configuration
REDIS_URL=redis://redis:6379

# Application Configuration
JWT_SECRET_FILE=/run/secrets/jwt_secret
LOG_LEVEL=info
ENVIRONMENT=production
```
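
`docker stack deploy` substitutes `${VAR}` placeholders from the shell environment and does not read `.env` files on its own, so the values above have to be exported before deploying. A minimal sketch of one way to do that follows; the helper name `load_env` is hypothetical, and it assumes the file contains only plain `KEY=value` lines that are valid shell syntax.

```shell
#!/bin/sh
# Sketch: export every assignment from a .env file into the current shell.

load_env() {
    # $1: path to the .env file (use a path containing a slash, e.g. ./.env,
    # so the shell does not search PATH for it)
    set -a      # auto-export every variable assigned from here on
    . "$1"
    set +a      # switch auto-export off again
}
```

On the manager node this would be used as `load_env ./.env` followed by `docker stack deploy -c docker-compose.yml --with-registry-auth app-stack`. A plain `source .env` sets the variables in the shell but does not export them to child processes such as `docker`.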

### 3.3 Health Check Script

```bash
#!/bin/bash
# environments/production/healthcheck.sh

set -e

echo "═══════════════════════════════════════════════"
echo "Running Health Checks for Production"
echo "═══════════════════════════════════════════════"

STACK_NAME="app-stack"
FAILED=0

# Check if all services are running
echo ""
echo "1️⃣ Checking service status..."
SERVICES=$(docker stack services "$STACK_NAME" --format "{{.Name}}")

for service in $SERVICES; do
    # The name filter is a prefix match, so pick the exact service with awk
    REPLICAS=$(docker service ls --filter "name=$service" --format "{{.Name}} {{.Replicas}}" \
        | awk -v s="$service" '$1 == s {print $2}')
    echo "   $service: $REPLICAS"

    # Check if service has no running replicas.
    # The match is anchored so that "10/10" is not mistaken for "0/...".
    if echo "$REPLICAS" | grep -q "^0/"; then
        echo "   ❌ Service $service has NO running replicas!"
        FAILED=1
    fi
done

# Check frontend health endpoint
echo ""
echo "2️⃣ Checking Frontend health endpoint..."
if curl -sf http://localhost/health > /dev/null; then
    echo "   ✅ Frontend health check PASSED"
else
    echo "   ❌ Frontend health check FAILED"
    FAILED=1
fi

# Check API health endpoint
# Note: this only works if port 3000 is reachable from this host;
# the example docker-compose.yml does not publish it.
echo ""
echo "3️⃣ Checking API health endpoint..."
if curl -sf http://localhost:3000/health > /dev/null; then
    echo "   ✅ API health check PASSED"
else
    echo "   ❌ API health check FAILED"
    FAILED=1
fi

# Check Redis connectivity
# Note: `docker ps` only sees containers on this node, so this check
# fails if the Redis task is scheduled on a different node.
echo ""
echo "4️⃣ Checking Redis connectivity..."
if docker exec "$(docker ps -q -f name=${STACK_NAME}_redis)" redis-cli ping | grep -q PONG; then
    echo "   ✅ Redis connectivity PASSED"
else
    echo "   ❌ Redis connectivity FAILED"
    FAILED=1
fi

# Check for recent errors in logs
echo ""
echo "5️⃣ Checking recent logs for errors..."
ERROR_COUNT=0
for service in $SERVICES; do
    # `docker service logs` takes a service name, not a stack name
    COUNT=$(docker service logs --since 5m "$service" 2>&1 | grep -Eic "error|fatal|panic" || true)
    ERROR_COUNT=$((ERROR_COUNT + COUNT))
done
if [ "$ERROR_COUNT" -gt 10 ]; then
    echo "   ⚠️ Found $ERROR_COUNT errors in last 5 minutes"
    FAILED=1
else
    echo "   ✅ Error count acceptable: $ERROR_COUNT"
fi

echo ""
echo "═══════════════════════════════════════════════"
if [ $FAILED -eq 0 ]; then
    echo "✅ ALL HEALTH CHECKS PASSED"
    echo "═══════════════════════════════════════════════"
    exit 0
else
    echo "❌ HEALTH CHECKS FAILED"
    echo "═══════════════════════════════════════════════"
    exit 1
fi
```
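
The health check script probes each endpoint once, so a service that is still starting is reported as failed. A small retry wrapper, sketched below, is one way to tolerate slow startups. The helper name `retry` and its argument order (attempts, delay in seconds, then the command) are assumptions made for illustration; in the script it could wrap a probe as `retry 5 3 curl -sf http://localhost/health`.

```shell
#!/bin/sh
# Sketch: run a command up to N times, sleeping between failed attempts.

retry() {
    attempts=$1     # maximum number of attempts
    delay=$2        # seconds to sleep between attempts
    shift 2
    n=1
    while :; do
        "$@" && return 0                        # success: stop retrying
        [ "$n" -ge "$attempts" ] && return 1    # out of attempts: report failure
        n=$((n + 1))
        sleep "$delay"
    done
}
```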

---

## 4. Environment Management Strategy

### 4.1 Promotion Flow

```
┌────────────────────────────────────────────────────────┐
│               Environment Promotion Flow               │
└────────────────────────────────────────────────────────┘

Developer updates image version in Git
        ↓
Development (Automatic)
  ├─ Deploy immediately
  ├─ Run health checks
  └─ ✅ If successful → enable Sandbox deployment

        ↓ (Manual approval required)

Sandbox (Manual Trigger)
  ├─ QA team tests features
  ├─ Run integration tests
  └─ ✅ If approved → enable Testing deployment

        ↓ (Manual approval required)

Testing (Manual Trigger)
  ├─ Full regression testing
  ├─ Performance testing
  └─ ✅ If approved → enable Production deployment

        ↓ (Manual approval required + confirmation)

Production (Manual Trigger)
  ├─ Backup current state
  ├─ Deploy as a rolling update (see update_config in docker-compose.yml)
  ├─ Run comprehensive health checks
  └─ ✅ Monitor or 🔄 Rollback if issues
```

### 4.2 Deployment Approval Matrix

| Environment | Approval Required | Who Can Approve | Rollback Strategy |
|-------------|-------------------|-----------------|-------------------|
| **Development** | ❌ No (Automatic) | N/A | Automatic on health check failure |
| **Sandbox** | ✅ Yes (Manual) | Any Developer | Manual via GitLab UI |
| **Testing** | ✅ Yes (Manual) | QA Lead, DevOps Lead | Manual via GitLab UI |
| **Production** | ✅ Yes (Manual + Confirmation) | DevOps Lead, CTO | Automatic on failure + Manual option |

### 4.3 Change Management Workflow

```bash
# Example: Updating application version

# 1. Developer receives new image from Harbor
#    New image available: harbor.company.com/company/api:v3.2.2

# 2. Developer creates feature branch
git checkout -b update-api-v3.2.2

# 3. Update version in Development environment
#    Edit environments/development/.env and set:
#    API_VERSION=v3.2.2

# 4. Commit and push
git add environments/development/.env
git commit -m "feat: update API to v3.2.2 in development"
git push origin update-api-v3.2.2

# 5. Create Merge Request in GitLab
#    - Title: "Update API to v3.2.2"
#    - Description: "New features: X, Y, Z. Bug fixes: A, B"
#    - Assign to: DevOps team for review

# 6. After MR approval and merge to main:
#    - GitLab CI automatically deploys to Development
#    - Monitor deployment
#    - If successful, manually trigger Sandbox deployment

# 7. QA tests in Sandbox
#    - If approved, update Testing environment
#    - Repeat process

# 8. Production deployment
#    - Update production/.env with new version
#    - Create MR with detailed change log
#    - Require approvals from: DevOps Lead + CTO
#    - Schedule deployment window
#    - Execute manual deployment
#    - Monitor closely
```
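
Promoting a version in the workflow above means copying one `*_VERSION` value from one environment's `.env` file to the next. The sketch below shows how that edit could be scripted so that only the intended line changes. The helper name `promote_version` is hypothetical and not part of the repository layout described earlier; it assumes plain `KEY=value` lines and version values without `|` or `&` characters. A call would look like `promote_version API_VERSION environments/development/.env environments/sandbox/.env`, followed by the usual commit and Merge Request.

```shell
#!/bin/sh
# Sketch: copy the value of one variable from a source .env to a target .env.

promote_version() {
    var=$1      # variable name, e.g. API_VERSION
    src=$2      # .env file to read the value from
    dst=$3      # .env file to update

    value=$(grep "^${var}=" "$src" | head -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
        echo "no $var in $src" >&2
        return 1
    fi
    if ! grep -q "^${var}=" "$dst"; then
        echo "no $var in $dst" >&2
        return 1
    fi

    # Rewrite only the matching line; write back with cat so the target
    # file keeps its original permissions.
    tmp=$(mktemp)
    sed "s|^${var}=.*|${var}=${value}|" "$dst" > "$tmp" && cat "$tmp" > "$dst"
    rm -f "$tmp"
}
```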

---

## 5. Rollback Strategy

### 5.1 Automatic Rollback (Health Check Failure)

```yaml
# In docker-compose.yml - automatic rollback on failure

services:
  api:
    deploy:
      update_config:
        failure_action: rollback  # ← Automatic rollback!
        monitor: 60s              # Monitor each updated task for 60 seconds
      rollback_config:
        parallelism: 2            # Roll back 2 at a time
        delay: 5s                 # 5s between rollbacks
```

**How it works:**
1. The new version is rolled out task by task according to `update_config`.
2. Docker Swarm monitors each updated task for 60 seconds (`monitor: 60s`), including its container health check.
3. If a task fails during that window → automatic rollback to the previous service definition.
4. The previous version is typically restored within 2-3 minutes.
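
Whether Swarm completed an update or rolled it back can be read from the service's update status after deployment. The sketch below maps the state values reported by the Docker API to a simple verdict that a pipeline step could act on. The function name `classify_update_state` and the verdict strings are illustrative assumptions; a service that has never been updated may not report an update status at all.

```shell
#!/bin/sh
# Sketch: translate a Swarm UpdateStatus.State value into a pipeline verdict.

classify_update_state() {
    case "$1" in
        completed)                  echo "ok" ;;
        updating|rollback_started)  echo "in-progress" ;;
        rollback_completed)         echo "rolled-back" ;;
        paused|rollback_paused)     echo "stuck" ;;
        *)                          echo "unknown" ;;
    esac
}

# Hypothetical usage on a manager node (the service name is an example):
#   state=$(docker service inspect --format '{{.UpdateStatus.State}}' app-stack_api)
#   [ "$(classify_update_state "$state")" = "ok" ] || exit 1
```

Treating `rolled-back` as a failed deployment keeps the pipeline red even though the service itself is healthy again, which makes the automatic rollback visible in the audit trail.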

### 5.2 Manual Rollback via GitLab

**Option A: Rollback via Git History**

```bash
# GitLab Pipeline: rollback:production job

# 1. Identify previous working version
git log --oneline environments/production/.env

# 2. Checkout previous commit
git checkout <previous-commit-hash> -- environments/production/

# 3. Pipeline redeploys previous version
# 4. Verify health checks
```

**Option B: Rollback via GitLab UI**

```
GitLab → Deployments → Environments → Production
        ↓
Click "Rollback" button
        ↓
Select previous successful deployment
        ↓
Confirm rollback
        ↓
Pipeline automatically executes rollback job
```

### 5.3 Emergency Rollback Procedure

```bash
#!/bin/bash
# scripts/emergency-rollback.sh

# FOR EMERGENCY USE ONLY - bypasses GitLab pipeline
# Run directly on Swarm manager node

BACKUP_DIR="/backup/deployments"

echo "🚨 EMERGENCY ROLLBACK INITIATED"

# Find last backup
LAST_BACKUP=$(ls -td "$BACKUP_DIR"/* | head -1)
echo "Rolling back to: $LAST_BACKUP"

# Extract previous image versions.
# Each line of services.txt is "<service name> <image>", as written by the
# production backup step. The service name already includes the stack prefix
# (e.g. app-stack_api), so it is used as-is.
while read -r SERVICE IMAGE; do
    echo "Rolling back $SERVICE to $IMAGE"
    docker service update --image "$IMAGE" "$SERVICE"
done < "$LAST_BACKUP/services.txt"

echo "✅ Emergency rollback completed"
echo "⚠️ Remember to update Git repository to match!"
```

---

## 6. Monitoring & Health Checks

### 6.1 Service-Level Health Checks

```yaml
# In docker-compose.yml

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s      # Check every 30 seconds
  timeout: 10s       # Request timeout
  retries: 3         # Fail after 3 attempts
  start_period: 60s  # Grace period for startup
```

### 6.2 Stack-Level Monitoring

```bash
#!/bin/bash
# scripts/monitor-deployment.sh

STACK_NAME="app-stack"

while true; do
    clear
    echo "═══════════════════════════════════════════════"
    echo "Stack: $STACK_NAME - $(date)"
    echo "═══════════════════════════════════════════════"

    # Show service status
    docker stack services "$STACK_NAME"

    echo ""
    echo "Recent logs (last 10 lines per service):"
    # `docker service logs` takes a service name, so iterate over the stack's services
    for service in $(docker stack services "$STACK_NAME" --format "{{.Name}}"); do
        echo "--- $service ---"
        docker service logs --tail=10 "$service"
    done

    sleep 10
done
```

### 6.3 Notification Integration

```yaml
# Add to .gitlab-ci.yml

after_script:
  - |
    # Use "=" for string comparison; "==" is not portable to every /bin/sh
    if [ "$CI_JOB_STATUS" = "success" ]; then
      MESSAGE="✅ Deployment to $CI_ENVIRONMENT_NAME successful"
    else
      MESSAGE="❌ Deployment to $CI_ENVIRONMENT_NAME FAILED"
    fi

    # Send to Slack
    curl -X POST -H 'Content-type: application/json' \
      --data "{\"text\":\"$MESSAGE\nPipeline: $CI_PIPELINE_URL\"}" \
      "$SLACK_WEBHOOK_URL"

    # Send email (if SMTP is configured and a `mail` client is installed in the job image)
    echo "$MESSAGE" | mail -s "Deployment Notification" devops@company.com
```
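
The Slack payload above is assembled by string interpolation, which produces invalid JSON if a value ever contains a double quote or a backslash. The sketch below shows one way to escape those two characters before building the payload. The helper names `json_escape` and `build_slack_payload` are hypothetical; the sketch does not handle newlines or other control characters inside the values. The result could be passed to curl as `--data "$(build_slack_payload "$MESSAGE" "$CI_PIPELINE_URL")"`.

```shell
#!/bin/sh
# Sketch: escape values before embedding them in a JSON string.

# Escapes backslashes and double quotes so the value is safe inside "...".
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Builds {"text":"<message>\nPipeline: <url>"}; the \n is a JSON newline escape.
build_slack_payload() {
    printf '{"text":"%s\\nPipeline: %s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}
```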

---

## 7. Implementation Roadmap

### Phase 1: Preparation (Week 1)

**Day 1-2: Repository Setup**
- [ ] Create `deployment-configs` repository in GitLab
- [ ] Create directory structure (environments/, scripts/)
- [ ] Add current docker-compose.yml to each environment
- [ ] Create .env files with current versions
- [ ] Commit initial structure

**Day 3-4: GitLab Configuration**
- [ ] Configure GitLab CI/CD variables:
  - `SWARM_DEV_HOST`, `SWARM_SANDBOX_HOST`, `SWARM_TEST_HOST`, `SWARM_PROD_HOST`
  - `SWARM_SSH_KEY` (SSH private key)
  - `HARBOR_USER`, `HARBOR_PASSWORD`
  - `SLACK_WEBHOOK_URL` (optional)
- [ ] Create SSH keys for GitLab Runner → Swarm access
- [ ] Test SSH connectivity from GitLab to each Swarm environment

**Day 5: Scripts Development**
- [ ] Create deploy.sh script
- [ ] Create healthcheck.sh script
- [ ] Create rollback.sh script
- [ ] Test scripts manually on Development environment

### Phase 2: Pipeline Implementation (Week 2)

**Day 1-2: Basic Pipeline**
- [ ] Create .gitlab-ci.yml with validation stage only
- [ ] Test syntax validation
- [ ] Test image validation

**Day 3: Development Deployment**
- [ ] Add deploy:development job
- [ ] Test automatic deployment to Development
- [ ] Verify health checks work

**Day 4: Sandbox & Testing**
- [ ] Add deploy:sandbox job (manual)
- [ ] Add deploy:testing job (manual)
- [ ] Test manual approval workflow

**Day 5: Production Deployment**
- [ ] Add deploy:production job (manual + confirmation)
- [ ] Add backup before deployment
- [ ] Test during a low-traffic window (section 8.2 advises against Friday afternoons)

### Phase 3: Rollback Implementation (Week 3)

**Day 1-2: Automatic Rollback**
- [ ] Configure Docker Swarm automatic rollback
- [ ] Test by deploying broken version
- [ ] Verify automatic recovery

**Day 3-4: Manual Rollback**
- [ ] Implement rollback:production job
- [ ] Test Git-based rollback
- [ ] Document rollback procedure

**Day 5: Emergency Procedures**
- [ ] Create emergency-rollback.sh script
- [ ] Test emergency rollback
- [ ] Document for on-call team

### Phase 4: Monitoring & Optimization (Week 4)

**Day 1-2: Monitoring**
- [ ] Set up deployment notifications (Slack/Email)
- [ ] Configure Prometheus metrics collection
- [ ] Create Grafana dashboards for deployments

**Day 3-4: Documentation**
- [ ] Write deployment guide for developers
- [ ] Write operations runbook
- [ ] Create troubleshooting guide
- [ ] Record demo video

**Day 5: Team Training**
- [ ] Train developers on new workflow
- [ ] Train QA team on approval process
- [ ] Train DevOps team on monitoring/rollback
- [ ] Conduct Q&A session

---

## 8. Best Practices & Tips
|
||
|
||
### 8.1 Version Management
|
||
|
||
**✅ DO:**
|
||
```bash
|
||
# Use semantic versioning
|
||
API_VERSION=v3.2.1 # ← Good: Clear, semantic version
|
||
|
||
# Include Git commit hash for traceability
|
||
API_VERSION=v3.2.1-abc123ef
|
||
|
||
# Use immutable tags
|
||
IMAGE=harbor.company.com/app:v1.2.3 # ← Good: Specific version
|
||
```
|
||
|
||
**❌ DON'T:**
|
||
```bash
|
||
# Avoid mutable tags
|
||
API_VERSION=latest # ← Bad: Can change unexpectedly
|
||
|
||
# Avoid ambiguous versions
|
||
API_VERSION=production # ← Bad: What version is this?
|
||
```
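
These rules can be enforced in a validation stage rather than left to review. The sketch below is one possible check, assuming tags follow the `vMAJOR.MINOR.PATCH` form with an optional `-<commit hash>` suffix as in the examples above; the function names `is_immutable_tag` and `image_tag_ok` are hypothetical.

```bash
#!/bin/sh
# Sketch: reject mutable or ambiguous tags before deployment.

# is_immutable_tag TAG
# Accepts vX.Y.Z, optionally followed by "-" and a 7-40 character hex commit hash.
is_immutable_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9a-f]{7,40})?$'
}

# image_tag_ok IMAGE_REFERENCE
# Checks the part after the last ":" of a reference such as
# harbor.company.com/app:v1.2.3.
image_tag_ok() {
  case "$1" in
    *:*) is_immutable_tag "${1##*:}" ;;
    *)   return 1 ;;   # no tag at all means an implicit "latest"
  esac
}
```

A pipeline job could call `image_tag_ok "$IMAGE" || exit 1` so that `latest` or `production` never reaches a deploy stage.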

### 8.2 Deployment Timing

**Recommended deployment windows:**
- **Development:** Anytime (automatic)
- **Sandbox:** Business hours (9am-5pm)
- **Testing:** Business hours (requires QA)
- **Production:**
  - Normal changes: Tuesday-Thursday, 10am-2pm
  - Critical fixes: Anytime with proper approval
  - Avoid: Monday mornings, Friday afternoons, weekends
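
The window for normal Production changes can also be checked by the pipeline instead of relying on memory. The sketch below encodes only the Tuesday-Thursday, 10am-2pm rule. The function name `in_production_window` is hypothetical, and it assumes the runner's clock is set to the team's local timezone.

```bash
#!/bin/sh
# Sketch: guard for the normal Production deployment window.

# in_production_window DAY HOUR
#   DAY:  1 (Monday) .. 7 (Sunday), as printed by `date +%u`
#   HOUR: 00 .. 23, as printed by `date +%H`
# Succeeds only for Tuesday-Thursday between 10:00 and 13:59.
in_production_window() {
  day="$1"
  hour="$2"
  [ "$day" -ge 2 ] && [ "$day" -le 4 ] && [ "$hour" -ge 10 ] && [ "$hour" -lt 14 ]
}
```

A deploy job might run `in_production_window "$(date +%u)" "$(date +%H)"` and stop with a clear message when it fails. Critical fixes would need an explicit override, since the guidance above allows them at any time with proper approval.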

### 8.3 Communication

**Before Production deployment:**
```
Slack announcement template:

📢 Production Deployment Scheduled

🗓 Date: January 15, 2026
⏰ Time: 11:00 AM (EST)
⏱ Duration: ~15 minutes
📝 Changes:
- API v3.2.1 → v3.2.2 (bug fixes)
- Frontend v2.1.5 → v2.1.6 (UI improvements)

🔗 Release Notes: [link]
🔗 Rollback Plan: [link]

Please report any issues to #devops-alerts
```

### 8.4 Security Considerations

```yaml
# Store sensitive data as Docker secrets
secrets:
  db_password:
    external: true  # ← Created outside compose file
  api_key:
    external: true

# Never commit secrets to Git!
# Use GitLab CI/CD variables for:
# - SSH keys
# - API tokens
# - Passwords
# - Certificates
```
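
Because the secrets above are marked `external: true`, they must already exist in the Swarm before the stack is deployed. A minimal sketch of creating one from a CI/CD variable follows. It assumes a masked GitLab variable named `DB_PASSWORD` (a hypothetical name), and `run` and `create_secret` are illustrative helpers.

```bash
#!/bin/sh
# Sketch: create an external Docker secret without writing it to disk
# or exposing it on the command line.
# Set DRY_RUN=1 to print the docker command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# create_secret NAME
# Reads the secret value from stdin ("-" tells docker to read stdin).
create_secret() {
  run docker secret create "$1" -
}
```

Usage would look like `printf '%s' "$DB_PASSWORD" | create_secret db_password`. Swarm secrets cannot be updated in place, so rotating a value generally means creating a secret under a new name and pointing the service at it.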

### 8.5 Troubleshooting Common Issues

**Issue 1: Pipeline fails with "SSH connection refused"**
```bash
# "Connection refused" usually means sshd is unreachable (host, port, firewall)
# Solution: Verify connectivity, then the SSH key in GitLab CI/CD variables
# Test manually from the runner host (-v prints connection debug output):
ssh -v -i ~/.ssh/gitlab_rsa root@swarm-manager
```

**Issue 2: Image pull fails from Harbor**
```bash
# Solution: Check registry credentials
# (--password-stdin keeps the password out of the process list and shell history)
echo "$HARBOR_PASSWORD" | docker login harbor.company.com -u "$HARBOR_USER" --password-stdin

# Verify image exists:
docker pull harbor.company.com/company/api:v3.2.1
```

**Issue 3: Health checks fail after deployment**
```bash
# Debug: Check service logs
docker service logs app-stack_api --tail 100

# Check service status
docker service ps app-stack_api

# Manual health check
curl http://localhost:3000/health
```
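
A single `curl` right after a deployment can fail simply because the container is still starting. One way to make the manual check less flaky is to retry it a few times, as in the sketch below. `retry` and `check_health` are hypothetical helpers, and the attempt count and delay are assumptions to adjust per service.

```bash
#!/bin/sh
# Sketch: retry a health check before declaring a deployment unhealthy.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is reached; sleeps between tries.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# check_health URL
# -f makes curl fail on HTTP errors; --max-time bounds each attempt.
check_health() {
  curl -fsS --max-time 5 "$1" >/dev/null
}
```

For the endpoint above this would be `retry 10 3 check_health http://localhost:3000/health`, which allows roughly half a minute of startup time before giving up.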

**Issue 4: Deployment stuck "pending"**
```bash
# Check swarm node status
docker node ls

# Check resource availability
docker node inspect swarm-worker-1 | grep -A 10 Resources

# Check for failed tasks
docker service ps app-stack_api --no-trunc
```

---

## 9. Success Metrics

### 9.1 Key Performance Indicators

**Before Automation:**
- 📊 Deployment frequency: 1-2 per week
- ⏱ Average deployment time: 30-60 minutes per environment
- 🐛 Deployment errors: ~20% (typos, wrong tags)
- 🔄 Rollback time: 1-2 hours (manual)
- 📝 Audit trail: Partial (chat logs, manual notes)

**After Automation (Target):**
- 📊 Deployment frequency: 5-10 per week
- ⏱ Average deployment time: 5-10 minutes per environment
- 🐛 Deployment errors: <2% (automated validation)
- 🔄 Rollback time: 2-3 minutes (automatic)
- 📝 Audit trail: Complete (Git history + GitLab logs)

### 9.2 Success Criteria

**Week 4 Evaluation:**
- [ ] All 4 environments deployed via GitLab CI/CD
- [ ] Zero manual SSH deployments
- [ ] At least 5 successful Production deployments
- [ ] At least 1 successful rollback test
- [ ] Team can deploy without DevOps assistance
- [ ] Complete audit trail for all deployments
- [ ] Average deployment time < 15 minutes

---

## 10. Conclusion & Next Steps

### Current State
❌ Manual bash script deployments
❌ Only a partial audit trail (chat logs, manual notes)
❌ Error-prone process
❌ Slow rollbacks

### Target State (After Implementation)
✅ Automated GitLab CI/CD pipelines
✅ Complete Git-based audit trail
✅ Validated deployments with health checks
✅ 2-3 minute automatic rollbacks
✅ Self-service for developers

### Immediate Next Steps

1. **This Week:**
   - Create GitLab repository structure
   - Configure CI/CD variables
   - Test SSH connectivity

2. **Next Week:**
   - Implement basic pipeline
   - Test Development deployments
   - Add validation stages

3. **Week 3-4:**
   - Roll out to all environments
   - Implement rollback procedures
   - Train team

### Resources Needed

- **Time Investment:** 2-4 weeks (1 DevOps engineer)
- **Infrastructure:** GitLab Runner (existing OK)
- **Training:** 2-3 hours team training session
- **Documentation:** Deployment guide + runbooks

### Support & Questions

For implementation assistance:
- 📧 Email: devops@company.com
- 💬 Slack: #devops-automation
- 📖 Documentation: https://gitlab.company.com/deployment-configs

---

**Document Version:** 1.0
**Last Updated:** January 2026
**Status:** Ready for Implementation
**Author:** DevOps Team
**Review Date:** After Phase 2 completion