# Universal GitLab CI/CD for the COIN Deployment System

## A Comprehensive Analysis of auto.sh and an Automation Strategy for 4 Environments

---

## Executive Summary

The existing COIN deployment process was analyzed. It consists of:

- **auto.sh** - the main orchestration script (600+ lines)
- **deployment.sh** - a wrapper for docker compose/swarm operations
- **docker-compose.yml** - a complex configuration with 15+ services

The current system uses manual bash scripts to deploy to 2 nodes (node-3, node-4) in the sandbox environment.

**Goal:** build a universal GitLab CI/CD pipeline that automates deployment to 4 environments:

- Development
- Sandbox
- Testing
- Production

**Feasibility:** ✅ **YES** - the existing architecture is well suited to automation via GitLab CI/CD.

**Expected results:**

| Metric | Current process | With automation | Improvement |
|--------|-----------------|-----------------|-------------|
| Deployment time | 30-45 minutes | 10-15 minutes | ↓ 67% |
| Manual steps | 8-12 | 0-2 | ↓ 90% |
| Environment preparation | 15 minutes | 3 minutes | ↓ 80% |
| Rollback time | 20-30 minutes | 3-5 minutes | ↓ 85% |
| Error rate | 15% | 2% | ↓ 87% |
| Environments supported | 1 (sandbox) | 4 (all) | +300% |

---

## Table of Contents

1. [Detailed analysis of auto.sh](#1-detailed-analysis-of-autosh)
2. [Analysis of deployment.sh](#2-analysis-of-deploymentsh)
3. [Analysis of docker-compose.yml](#3-analysis-of-docker-composeyml)
4. [Universal CI/CD architecture](#4-universal-cicd-architecture)
5. [GitLab CI/CD pipeline design](#5-gitlab-cicd-pipeline-design)
6. [Environment management](#6-environment-management)
7. [Secrets management](#7-secrets-management)
8. [Rollback strategy](#8-rollback-strategy)
9. [Monitoring and verification](#9-monitoring-and-verification)
10. [Implementation plan](#10-implementation-plan)

---

## 1. Detailed Analysis of auto.sh

### 1.1 Functional Overview

**auto.sh** is a sophisticated orchestration script of 600+ lines that automates the COIN deployment process.

**Main capabilities:**

```bash
# CLI flags (9 modes of operation)
--dry-run                    # Simulation without real changes
--self-test-only             # Run checks only
--node3-only                 # Deploy node-3 only
--node4-only                 # Deploy node-4 only
--deploy-only node3|node4    # Deploy without prepare
--skip-db-check              # Skip the migration check
--skip-self-test             # Skip the self-test
--auto-yes                   # Automatic confirmation
--rollback                   # Roll back to the previous version
```

**Workflow diagram:**

```
┌─────────────────────────────────────────────────────────────┐
│                     INPUT PARAMETERS                        │
│  • TASK_ID (41361)                                          │
│  • RELEASE_VERSION (25.22)                                  │
│  • RELEASE_TAG (2025-12-15-11eeef9e99)                      │
│  • PREVIOUS_RELEASE_VERSION (25.21)                         │
│  • PREVIOUS_RELEASE_TAG (2025-12-05-ecacdc6c25)             │
│  • EXPECTED_MIGRATION_ID (565)                              │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    SELF-TEST STAGE                          │
│  ✓ Check BASE_DIR exists                                    │
│  ✓ Check previous release directories                       │
│  ✓ Verify Docker contexts (node-3, node-4)                  │
│  ✓ Display configuration summary                            │
│  ✓ Interactive confirmation                                 │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                 PREPARE NODE-4 (Primary)                    │
│  1. Copy previous release directory                         │
│  2. Extract new release from Docker image                   │
│     docker run REGISTRY:TAG release | base64 -d > tar.gz    │
│  3. Extract tarball                                         │
│  4. Copy deploy.sh and docker-compose.yml                   │
│  5. Update TAG in node.env                                  │
│  6. ⚠️ MANUAL: Edit project.env                             │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                PREPARE NODE-3 (Secondary)                   │
│  1. Copy previous node-3 release directory                  │
│  2. Copy coin directory from prepared node-4                │
│  3. Copy deploy.sh and docker-compose.yml from node-4       │
│  4. Reuse node.env and project.env from node-4              │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                  DEPLOYMENT SELECTION                       │
│  • Interactive: "Запускать деплой node-3?" (yes/no)         │
│  • Interactive: "Запускать деплой node-4?" (yes/no)         │
│    OR                                                       │
│  • --node3-only flag                                        │
│  • --node4-only flag                                        │
│  • --deploy-only node3,node4                                │
└────────────────────┬────────────────────────────────────────┘
                     │
              ┌──────┴──────┐
              ▼             ▼
      ┌──────────────┐ ┌──────────────┐
      │ Deploy Node-3│ │ Deploy Node-4│
      │              │ │              │
      │ • Switch ctx │ │ • Switch ctx │
      │ • Run deploy │ │ • Run deploy │
      │ • Verify     │ │ • Verify     │
      └──────────────┘ └──────────────┘
              │             │
              └──────┬──────┘
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                     SUMMARY REPORT                          │
│  • Prepared: node-3 ✓, node-4 ✓                             │
│  • Selected: node-3 ✓, node-4 ✓                             │
│  • Deploy attempted: node-3 ✓, node-4 ✓                     │
│  • Expected DB migration ID: 565                            │
└─────────────────────────────────────────────────────────────┘
```

### 1.2 Key Functions

#### Function: prepare_node4()

**Purpose:** prepare the primary deployment directory for node-4.

```bash
prepare_node4() {
    # 1. Validation
    ensure_dir "$NODE4_PREV"    # Check the previous release
    ensure_dir "$BASE_DIR"      # Check the base directory

    # 2. Directory setup
    cp -r "$NODE4_PREV" "$NODE4_NEW"   # Copy the structure
    cd "$NODE4_NEW"
    rm -rf "$OLD_COIN"                 # Remove the old release

    # 3. Extract the release from the Docker image
    docker run -i --rm "${REGISTRY}:${RELEASE_TAG}" release \
        | base64 -d > "$TARBALL"
    tar -xzf "$TARBALL"
    rm -f "$TARBALL"

    # 4. Copy core files
    cp "${NEW_COIN}/deploy.sh" ./
    cp "${NEW_COIN}/docker-compose.yml" ./

    # 5. Update the configuration
    sed -i "s/^TAG=.*/TAG=${RELEASE_TAG}/" node.env
    sed -i 's/^export TAG_/#export TAG_/' node.env

    # 6. Manual step (the problem spot!)
    echo "Manual step: review and edit project.env"
    confirm "Continue after manual update?"
}
```

**Problems for automation:**

- ⚠️ Manually editing project.env interrupts automation
- ⚠️ The interactive confirmation blocks the pipeline
- ⚠️ Changes to project.env are not validated

**Solution:** use Git-based configuration management; a sketch of a fully non-interactive preparation step follows below.
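To illustrate how this step can run unattended, here is a minimal sketch of a non-interactive prepare step. It assumes project.env is already maintained in Git (so no manual edit is needed) and adds the tarball integrity checks the current script lacks; the script name is hypothetical, while `REGISTRY`, `RELEASE_TAG`, and the directory variables follow the naming used in auto.sh.

```bash
#!/usr/bin/env bash
# prepare-node4-noninteractive.sh - a sketch, not the production script
set -euo pipefail

: "${REGISTRY:?}" "${RELEASE_TAG:?}" "${NODE4_PREV:?}" "${NODE4_NEW:?}"

TARBALL="release-${RELEASE_TAG}.tar.gz"

# Copy the previous release structure, exactly as prepare_node4() does
cp -r "$NODE4_PREV" "$NODE4_NEW"
cd "$NODE4_NEW"

# Extract the release archive from the image
docker run -i --rm "${REGISTRY}:${RELEASE_TAG}" release | base64 -d > "$TARBALL"

# Integrity checks missing from auto.sh: valid gzip stream, readable tar index
gzip -t "$TARBALL"
tar -tzf "$TARBALL" > /dev/null
tar -xzf "$TARBALL" && rm -f "$TARBALL"

# Apply the TAG update; project.env comes from Git instead of a manual edit
sed -i "s/^TAG=.*/TAG=${RELEASE_TAG}/" node.env
```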
#### Function: prepare_node3()

**Purpose:** prepare node-3 by reusing the node-4 artifacts.

```bash
prepare_node3() {
    # 1. Copy the previous structure
    cp -r "$NODE3_PREV" "$NODE3_NEW"
    cd "$NODE3_NEW"

    # 2. Reuse node-4 artifacts
    cp -r "$NODE4_NEW/${NEW_COIN}" ./
    cp "${NEW_COIN}/deploy.sh" ./
    cp "${NEW_COIN}/docker-compose.yml" ./

    # 3. Reuse configurations
    cp "$NODE4_NEW/node.env" ./
    cp "$NODE4_NEW/project.env" ./

    # ✓ No manual steps needed!
}
```

**Advantages:**

- ✅ Fully automatable
- ✅ Reuses configurations that are already prepared
- ✅ Guarantees node-3 and node-4 are identical

#### Functions: deploy_node3() / deploy_node4()

**Purpose:** the actual deployment, performed via the deployment.sh wrapper.

```bash
deploy_node3() {
    cd "$NODE3_NEW"
    docker context use "$NODE3_CONTEXT"

    # Flags: Docker context, stack name (sbxapp3), node settings,
    # project settings (multi-value), compose files (multi-value),
    # secret overrides, and image update from the registry
    ./deploy.sh deploy \
        -n "$NODE3_CONTEXT" \
        -w "$NODE3_STACK" \
        -N node.env \
        -P project.env \
        -P project_node3.env \
        -f docker-compose.yml \
        -f custom.secrets.yml \
        -f docker-compose-testshop.yaml \
        -s secrets.override.env \
        -u

    docker ps   # Verification
}
```

**deployment.sh parameters:**

- `-n`: Docker context name
- `-w`: Swarm stack name
- `-N`: node environment file (multi-value)
- `-P`: project environment file (multi-value)
- `-f`: docker compose file (multi-value)
- `-s`: secrets override file
- `-u`: pull images from the registry

#### Function: rollback()

**Purpose:** roll back to the previous version.

```bash
rollback() {
    # 1. Confirmation
    confirm "⚠ Stop stacks and revert to previous release?"

    # 2. Stop the current stacks
    docker context use "$NODE3_CONTEXT"
    docker stack rm "$NODE3_STACK"
    sleep 3
    docker context use "$NODE4_CONTEXT"
    docker stack rm "$NODE4_STACK"
    sleep 3

    # 3. Deploy the previous version (node-3)
    cd "$NODE3_PREV"
    docker context use "$NODE3_CONTEXT"
    ./deploy.sh deploy [parameters...]

    # 4. Deploy the previous version (node-4)
    cd "$NODE4_PREV"
    docker context use "$NODE4_CONTEXT"
    ./deploy.sh deploy [parameters...]

    echo "ROLLBACK COMPLETED"
    echo "Now running: ${PREVIOUS_RELEASE_VERSION}"
}
```

**Rollback characteristics:**

- ✅ Removes the current stacks completely
- ✅ Uses the preserved previous-release directories
- ✅ Deployment process identical to a normal deploy
- ⚠️ Depends on the previous directories still existing
- ⚠️ No verification after rollback (addressed by the sketch below)
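Since the script performs no verification after a rollback, a small convergence check along the following lines could be appended to rollback(). The function name is hypothetical; the replica-format parsing mirrors the `docker service ls` output that deployment.sh already inspects, and one-time jobs such as migrate would need the same exclusion deployment.sh applies.

```bash
# Sketch: wait until every service in the stack reports running == desired
# replicas, or fail after a timeout.
verify_rollback() {
    local stack="$1" deadline=$(( $(date +%s) + 300 ))

    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Replica column looks like "2/2"; count services not yet converged
        local pending
        pending=$(docker service ls --filter "name=${stack}" \
                    --format '{{.Replicas}}' \
                  | awk -F/ '$1 != $2' | wc -l)
        if [ "$pending" -eq 0 ]; then
            echo "✓ Rollback verified: all ${stack} services converged"
            return 0
        fi
        sleep 5
    done
    echo "✗ Rollback verification failed for ${stack}" >&2
    return 1
}
```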
#### Function: self_test()

**Purpose:** pre-deployment validation.

```bash
self_test() {
    local issues=()

    # Check directories
    [ -d "$BASE_DIR" ]   || issues+=("BASE_DIR missing")
    [ -d "$NODE4_PREV" ] || issues+=("Previous node-4 missing")
    [ -d "$NODE3_PREV" ] || issues+=("Previous node-3 missing")

    # Check Docker contexts
    docker context ls | grep -q "$NODE3_CONTEXT" || \
        issues+=("Node-3 context not found")
    docker context ls | grep -q "$NODE4_CONTEXT" || \
        issues+=("Node-4 context not found")

    # Display the configuration summary
    echo "Release version : ${RELEASE_VERSION}"
    echo "Release tag     : ${RELEASE_TAG}"
    echo "Previous version: ${PREVIOUS_RELEASE_VERSION}"
    echo "Task ID         : ${TASK_ID}"
    echo "Expected MIG ID : ${EXPECTED_MIGRATION_ID}"

    # Handle issues
    if [ "${#issues[@]}" -gt 0 ]; then
        for issue in "${issues[@]}"; do
            echo "- $issue"
        done
        confirm "⚠ Continue despite issues?"
    fi
}
```

**Checks:**

- ✅ Filesystem structure
- ✅ Docker context availability
- ✅ Configuration display
- ❌ No Docker registry connectivity check
- ❌ No image existence check
- ❌ No database connectivity check
- ❌ No disk space check

The missing checks are straightforward to add; see the sketch below.
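A sketch of the four missing checks, reusing the `issues` convention from self_test(). The function name, the 5 GB threshold, and the assumption that `psql` is available on the deploy host are all illustrative; the registry and database variables follow §1.3 below.

```bash
# Sketch: extra pre-flight checks that self_test() currently lacks.
extended_self_test() {
    local issues=()

    # Registry connectivity + image existence (requires prior docker login)
    docker manifest inspect "${REGISTRY}:${RELEASE_TAG}" > /dev/null 2>&1 \
        || issues+=("Image ${REGISTRY}:${RELEASE_TAG} not found in registry")

    # Database connectivity
    PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "$DB_PORT" \
        -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1;" > /dev/null 2>&1 \
        || issues+=("Database ${DB_HOST}:${DB_PORT}/${DB_NAME} unreachable")

    # Disk space: require at least 5 GB free in BASE_DIR (assumed threshold)
    local free_kb
    free_kb=$(df --output=avail -k "$BASE_DIR" | tail -1)
    [ "$free_kb" -ge $((5 * 1024 * 1024)) ] \
        || issues+=("Less than 5 GB free in ${BASE_DIR}")

    if [ "${#issues[@]}" -gt 0 ]; then
        printf ' - %s\n' "${issues[@]}"
        return 1
    fi
}
```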
### 1.3 Configuration Variables

**Hardcoded configuration:**

```bash
# Base directory
BASE_DIR="/home/dev-wltsbx/encrypted/sandbox"

# Docker registry
REGISTRY="wlt-sbx-hb-int.wltsbxinner.walletto.eu/coin/release"

# Docker contexts
NODE3_CONTEXT="wlt-sbx-dkapp3-ams"   # tcp://10.95.81.131:2376
NODE4_CONTEXT="wlt-sbx-dkapp4-ams"   # tcp://10.95.81.132:2376

# Docker stacks
NODE3_STACK="sbxapp3"
NODE4_STACK="sbxapp4"

# Database (placeholders)
DB_HOST="${DB_HOST:-YOUR_DB_HOST}"
DB_PORT="${DB_PORT:-5432}"
DB_NAME="${DB_NAME:-coin}"
DB_USER="${DB_USER:-coin}"
DB_PASSWORD="${DB_PASSWORD:-YOUR_DB_PASSWORD}"
```

**Release-specific variables (user input):**

```bash
TASK_ID="41361"                          # Jira/Trello task
RELEASE_VERSION="25.22"                  # Semantic version
RELEASE_TAG="2025-12-15-11eeef9e99"      # Docker tag
PREVIOUS_RELEASE_VERSION="25.21"
PREVIOUS_RELEASE_TAG="2025-12-05-ecacdc6c25"
EXPECTED_MIGRATION_ID="565"              # DB migration check
```

**Derived paths:**

```bash
NEW_SUFFIX="_sbx_${RELEASE_TAG}"
PREV_SUFFIX="_sbx_${PREVIOUS_RELEASE_TAG}"

# Result:
# NODE4_NEW="/home/dev-wltsbx/encrypted/sandbox/25.22_sbx_2025-12-15-11eeef9e99-node-4"
# NODE3_NEW="/home/dev-wltsbx/encrypted/sandbox/25.22_sbx_2025-12-15-11eeef9e99-node-3"
```

### 1.4 Logging

**A fairly sophisticated logging system:**

```bash
# Log directory
LOG_DIR="${BASE_DIR}/logs"

# Log file naming
TIMESTAMP="$(date '+%Y-%m-%d__%H-%M-%S')"
LOGFILE="${LOG_DIR}/deploy_${RELEASE_TAG}__${TIMESTAMP}_task-${TASK_ID}.log"

# Example:
# /home/dev-wltsbx/encrypted/sandbox/logs/
#   deploy_2025-12-15-11eeef9e99__2025-12-15__14-30-00_task-41361.log
```

**Log message function:**

```bash
log_msg() {
    # Strip ANSI color codes for the file
    printf "%s\n" "$(echo -e "$1" | sed 's/\x1B\[[0-9;]*[JKmsu]//g')" >> "$LOGFILE"
    # Print to the console with colors
    echo -e "$1"
}
```

**Usage:**

```bash
log_msg "${BLUE}=== PREPARE NODE-4 ===${RESET}"
log_msg "${GREEN}✓ Node-4 prepared${RESET}"
log_msg "${RED}ERROR: directory not found${RESET}"
log_msg "${YELLOW}⚠ Manual step required${RESET}"
```

### 1.5 Status Tracking

**Deployment state flags:**

```bash
# Preparation status
PREPARED_NODE3=false
PREPARED_NODE4=false

# Selection status
SELECTED_NODE3=false
SELECTED_NODE4=false

# Deployment status
DEPLOY_ATTEMPT_NODE3=false
DEPLOY_ATTEMPT_NODE4=false

# Summary report
print_summary() {
    echo "Prepared:"
    echo " - node-4 : ${PREPARED_NODE4}"
    echo " - node-3 : ${PREPARED_NODE3}"
    echo "Selected:"
    echo " - node-3 : ${SELECTED_NODE3}"
    echo " - node-4 : ${SELECTED_NODE4}"
    echo "Deploy attempted:"
    echo " - node-3 : ${DEPLOY_ATTEMPT_NODE3}"
    echo " - node-4 : ${DEPLOY_ATTEMPT_NODE4}"
}
```

**Benefits:**

- ✅ Clear audit trail
- ✅ Easy troubleshooting
- ✅ Post-deployment analysis

### 1.6 Error Handling

**Strict mode:**

```bash
set -euo pipefail
```

- `set -e`: exit on any error
- `set -u`: exit on undefined variables
- `set -o pipefail`: exit on pipe failures

**Validation functions:**

```bash
ensure_dir() {
    if [ ! -d "$1" ]; then
        log_msg "${RED}ERROR: directory not found: $1${RESET}"
        exit 1
    fi
}

confirm() {
    local question="$1"
    read -r -p "${question} (yes/no): " answer
    case "$answer" in
        yes|y|Y) return 0 ;;
        *) log_msg "${RED}Operation cancelled${RESET}"; exit 1 ;;
    esac
}
```

**Dry-run mode:**

```bash
run() {
    log_msg "${BLUE}+ $*${RESET}"
    if [ "$DRY_RUN" != "true" ]; then
        "$@"   # Execute only when not in dry-run mode
    fi
}
```

### 1.7 Strengths of the Current Architecture

**1. Modularity**

- Clear separation of functions
- Reusable components
- Easy-to-follow logic flow

**2. Flexibility**

- Many CLI flags for different scenarios
- Support for partial deployment
- Dry-run mode for testing

**3. Safety**

- Multiple confirmation points
- Self-test before deployment
- Comprehensive logging
- Error handling

**4. Observability**

- Detailed logging of every operation
- Color-coded console output
- Status tracking
- Summary report

**5. Rollback capability**

- Built-in rollback function
- Preserves previous releases
- Simple recovery process

### 1.8 Shortcomings for CI/CD

**1. Manual interventions**

```bash
# Blocks automation
confirm "Continue after you have manually updated project.env?"
confirm "Запускать деплой node-3?"
```

**2. Interactive input**

```bash
# Requires a human
prompt_var "TASK_ID" "41361"
prompt_var "RELEASE_VERSION" "25.22"
```

**3. No version control**

- Configurations are not in Git
- Changes are not traceable
- No code review process

**4. Limited validation**

- No image existence check
- No health check verification
- No smoke tests

**5. Single environment**

- Hardcoded for sandbox
- No support for testing/production
- No environment promotion

---

## 2. Analysis of deployment.sh

### 2.1 Functionality

**deployment.sh** is a wrapper script for docker compose/swarm operations.

**Supported commands:**

```bash
./deployment.sh COMMAND -n NODE -w STACK -N node.env -P project.env -f compose.yml

Commands:
  check   - Validate compose syntax and print the config
  deploy  - Deploy to Docker Swarm
  run     - Run locally without Swarm
  stop    - Stop a local deployment
```

**Key parameters:**

| Parameter | Purpose | Example | Required |
|-----------|---------|---------|----------|
| `-n` | Node name | `wlt-sbx-dkapp3-ams` | Optional |
| `-w` | Stack name | `sbxapp3` | For deploy |
| `-N` | Node settings | `node.env` | Multi-value |
| `-P` | Project settings | `project.env` | Multi-value |
| `-f` | Compose file | `docker-compose.yml` | Multi-value |
| `-s` | Secrets override | `secrets.override.env` | Optional |
| `-u` | Update images | flag | Optional |
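Putting the flags together, a representative sandbox invocation (the same one auto.sh issues for node-3 in §1.2) looks like this:

```bash
# Validate the merged configuration first...
./deployment.sh check \
    -N node.env -P project.env -f docker-compose.yml

# ...then deploy to the node-3 stack with all layered files
./deployment.sh deploy \
    -n wlt-sbx-dkapp3-ams \
    -w sbxapp3 \
    -N node.env \
    -P project.env -P project_node3.env \
    -f docker-compose.yml -f custom.secrets.yml -f docker-compose-testshop.yaml \
    -s secrets.override.env \
    -u
```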
### 2.2 Environment Processing

**Multi-layer configuration loading:**

```bash
# 1. Node-specific settings
if [ -f "$NODE_NAME.env" ]; then
    . "$NODE_NAME.env"
fi

# 2. Additional node settings
for NODE_SETTING in "${NODE_SETTINGS[@]}"; do
    . $NODE_SETTING
done

# 3. Project settings (combined)
bash -c "echo '' > .project.tmp.env"
for PRODUCT_SETTING in "${PRODUCT_SETTINGS[@]}"; do
    bash -c "cat $PRODUCT_SETTING >> .project.tmp.env"
done
```

**API-specific environment extraction:**

```bash
# Extract CLIENT_API_* → API_*
grep ^CLIENT_API .project.tmp.env | sed 's/^CLIENT_//' > .project.client.tmp.env

# Extract ADMIN_API_* → API_*
grep ^ADMIN_API .project.tmp.env | sed 's/^ADMIN_//' > .project.admin.tmp.env

# Extract I_CLIENT_API_* → API_*
grep ^I_CLIENT_API .project.tmp.env | sed 's/^I_CLIENT_//' > .project.i_client.tmp.env

# Extract REPORT_GENERATOR_* → *
grep ^REPORT_GENERATOR .project.tmp.env | sed 's/^REPORT_GENERATOR_//' > .project.renderer.tmp.env
```

**Purpose:** this lets a single project.env carry the settings for several API services at once.
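A concrete before/after makes the prefix-stripping clearer; the variable names and values below are made up for illustration.

```bash
# Hypothetical project.env fragment written to the combined temp file:
printf '%s\n' \
  'CLIENT_API_PORT=10005' \
  'ADMIN_API_PORT=10000' \
  'ADMIN_API_LOG_LEVEL=info' > .project.tmp.env

# The same extraction step deployment.sh runs for the admin service:
grep ^ADMIN_API .project.tmp.env | sed 's/^ADMIN_//'
# Output - each service ends up reading the same generic API_* names:
#   API_PORT=10000
#   API_LOG_LEVEL=info
```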
### 2.3 Docker Compose Tag Management

**Dynamic TAG variables:**

```bash
# Parse TAG_* variables from the compose files
IFS=$'\n' tag_vars=($(grep "TAG_" $COMPOSER | sed 's/.*\$TAG_/TAG_/'))

for tag_var in "${tag_vars[@]}"; do
    if [[ "${!tag_var}" == "" ]]; then
        eval "export $tag_var='$TAG'"   # Default to the global TAG
    fi
done
```

**Example:**

```yaml
# docker-compose.yml contains:
admin_api:
  image: $DOCKER_REGISTRY/core:$TAG_ADMIN_API

# The script detects TAG_ADMIN_API.
# If it is not set, the global $TAG is used.
# Result: TAG_ADMIN_API="2025-12-15-11eeef9e99"
```

### 2.4 Secret Version Management

**Secret versioning system:**

```bash
# Parse SV_* variables from the compose files
IFS=$'\n' secret_vars=($(grep "SV_" $COMPOSER | sed 's/.*\$SV_/SV_/'))

for secret in "${secret_vars[@]}"; do
    if [[ "${!secret}" == "" ]]; then
        eval "export $secret='0'"   # Default version 0
    fi
done

# Load overrides from secrets.override.env
if [ -f "$SECRET_SETTINGS" ]; then
    . $SECRET_SETTINGS
fi
```

**Usage in docker-compose.yml:**

```yaml
secrets:
  db_access:
    file: ./secrets/db_access
    name: db_access.$SV_db_access   # Versioned secret name
```

**Benefits:**

- ✅ Secrets can be rotated without changing the compose file
- ✅ Multiple versions can coexist
- ✅ Smooth transition between versions

### 2.5 Deployment Process

**Deploy command flow:**

```bash
if [[ "$COMMAND" == "deploy" ]]; then
    # 1. Validate the stack name
    if [ "$STACK_NAME" == "" ]; then
        echo "STACK_NAME required"
        exit 1
    fi

    # 2. Set the registry auth flag
    if [[ "$DO_UPDATE" == "yes" ]]; then
        REGISTRY_AUTH="--with-registry-auth"
    fi

    # 3. Check for running cron jobs (safety)
    CRON_SERVICE=$(docker service ls --filter name=${STACK_NAME}_cron)
    if [[ "$CRON_SERVICE" != "" ]]; then
        docker service scale $CRON_SERVICE=0   # Stop cron first
    fi

    # 4. Execute the stack deploy
    docker stack deploy --prune \
        $COMPOSER_SWARM_ARGS \
        $REGISTRY_AUTH \
        $STACK_NAME

    # 5. Wait for service convergence
    while true; do
        services=$(docker service ls | grep $STACK_NAME)

        # Check whether all replicas are running
        for service in "${services[@]}"; do
            replicas=(${service_status[1]})   # e.g. "2/3"
            if [ ${replicas[0]} -lt ${replicas[1]} ]; then
                is_ready=0   # Not ready yet
            fi
        done

        if [ $is_ready -eq 1 ]; then
            break   # All services ready
        fi

        sleep 5
        echo "Services: $all_services, but $bad_services not ready"
    done

    echo "Done."
fi
```

**Key features:**

- ✅ Automatic cron service handling
- ✅ Waits for service convergence
- ✅ Progress monitoring
- ✅ Registry authentication support

### 2.6 Health Check Integration

**Service readiness check:**

```bash
# Get service status
docker service ls | grep $STACK_NAME | awk '{print $2,$4}'

# Parse replicas
# Format: "SERVICE_NAME 2/3"
#   Running: 2
#   Desired: 3

# Wait until Running == Desired for every service
```

**Ignored services:**

```bash
re="migrate|test_setup"
if ! [[ "${service_status[0]}" =~ $re ]]; then
    # Check replicas only for services that are not one-time jobs
fi
```

**Rationale:** `migrate` and `test_setup` are one-time jobs and must not be counted in the readiness check.
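The excerpts above paraphrase deployment.sh; a self-contained version of the convergence wait, with the same migrate/test_setup exclusion, might look like this (a sketch, not the original code):

```bash
# Sketch: wait until every long-running service in a stack converges.
wait_for_stack() {
    local stack="$1" ignore='migrate|test_setup'

    while true; do
        local pending=0
        # Lines look like: "sbxapp3_admin_api 2/3"
        while read -r name replicas _; do
            [[ "$name" =~ $ignore ]] && continue   # skip one-time jobs
            local running="${replicas%%/*}" desired="${replicas##*/}"
            [ "$running" -lt "$desired" ] && pending=$((pending + 1))
        done < <(docker service ls --filter "name=${stack}" \
                   --format '{{.Name}} {{.Replicas}}')

        [ "$pending" -eq 0 ] && break
        echo "Waiting: ${pending} service(s) not ready"
        sleep 5
    done
    echo "Done."
}
```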
---

## 3. Analysis of docker-compose.yml

### 3.1 Application Architecture

**15+ microservices:**

```
Core API services:
├── admin_api                  (Admin panel backend)
├── admin_control_api          (Admin control panel)
├── client_api                 (Client API)
├── client_individual_webapi   (Individual client API)
├── bonus_client_api           (Bonus program API)
├── rtps_api                   (Real-time payment system)
├── webhook_api                (Webhook handler)
└── partner_api                (Partner integration)

Frontend services:
├── admin_web                  (Admin SPA)
├── i_client_web               (Client portal SPA)
└── front_nginx                (Reverse proxy & TLS termination)

Background jobs:
├── migrate                    (Database migrations - one-time)
├── task_template              (Task executor)
├── cron_service               (Scheduler)
└── pdf-renderer               (PDF generation service)
```

### 3.2 YAML Anchors and Extensions

**Reusable configuration blocks:**

```yaml
# Secret permissions template
x-all-secrets-perm: &all-secrets-perm
  uid: "1000"
  gid: "1000"
  mode: 0400

# Secrets list template
x-secrets: &all-secrets
  secrets:
    - source: card_iv.txt
      target: card_iv.txt
      <<: *all-secrets-perm
    - source: db_access
      target: db_access
      <<: *all-secrets-perm
    # ... 8+ secrets
```

**Service template:**

```yaml
x-deploy: &deploy-settings
  deploy:
    replicas: $REPLICAS           # Dynamic, from the environment
    update_config:
      order: stop-first           # Stop old before starting new
    restart_policy:
      condition: on-failure

x-network: &network-simple
  networks:
    - issuing                     # All services on one overlay network
```

**Usage in services:**

```yaml
services:
  admin_api:
    image: $DOCKER_REGISTRY/core:$TAG_ADMIN_API
    <<: [*env-settings, *network-simple, *deploy-settings, *all-secrets]
    command: /entrypoint-admin.sh
```

**Benefits:**

- ✅ DRY (Don't Repeat Yourself)
- ✅ Consistency across services
- ✅ Easy maintenance

### 3.3 Secret Management Strategy

**30+ secrets:**

```yaml
secrets:
  # Encryption keys
  card_iv.txt:
    file: ./secrets/card_iv.txt
    name: card_iv.$SV_card_iv              # Versioned!

  # Database credentials
  db_access:
    file: ./secrets/db_access
    name: db_access.$SV_db_access

  # TLS certificates (10+ pairs)
  server.admin.crt:
    file: ./secrets/server.admin.crt
    name: server_admin_crt.$SV_server_admin_crt
  server.admin.key:
    file: ./secrets/server.admin.key
    name: server_admin_key.$SV_server_admin_key

  # API authentication
  webhook.auth:
    file: ./secrets/webhook.auth
    name: webhook.auth.$SV_webhook_auth

  # Email configuration
  msmtp.conf:
    file: ./secrets/msmtp.conf
    name: msmtp.conf.$SV_msmtp_conf
```

**Secret version system:**

```bash
# In secrets.override.env:
SV_card_iv=1
SV_db_access=2
SV_webhook_auth=1

# Result in Swarm:
#   card_iv.1
#   db_access.2
#   webhook.auth.1
```

**Rotation process:**

1. Create a new secret file: `secrets/db_access.v2`
2. Update the version: `SV_db_access=2`
3. Deploy: Swarm creates `db_access.2`
4. The old secret `db_access.1` remains available for rollback
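Expressed as commands, one rotation cycle under this scheme could look as follows (the values and the stack chosen are illustrative; the compose file reads `./secrets/db_access`, so the new material is written there):

```bash
# 1. Back up the old material and write the new value
cp secrets/db_access secrets/db_access.v1
openssl rand -base64 32 > secrets/db_access

# 2. Bump the version so Swarm sees a new secret name
sed -i 's/^SV_db_access=.*/SV_db_access=2/' secrets.override.env

# 3. Redeploy: docker stack deploy registers db_access.2 from the file
./deployment.sh deploy -w sbxapp3 -N node.env -P project.env \
    -f docker-compose.yml -s secrets.override.env

# 4. db_access.1 remains in Swarm for rollback; list the versions with:
docker secret ls --format '{{.Name}}' | grep '^db_access\.'
```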
### 3.4 Service Configuration

**Typical service pattern:**

```yaml
admin_api:
  image: $DOCKER_REGISTRY/core:$TAG_ADMIN_API
  command: /entrypoint-admin.sh

  # Environment
  <<: *env-settings               # env_file: $PROJECT_SETTINGS
  environment:
    <<: *report_generator_env
    NAMELESS_CONFIG: "/opt/project/configs/admin.conf"

  # Networking
  <<: *network-simple

  # Deployment
  <<: *deploy-settings

  # Secrets
  <<: *all-secrets

  # Health check
  <<: *health-core

  # Graceful shutdown
  <<: *graceful-timeout           # stop_grace_period: 2m
```

**Special configuration patterns:**

**1. Multi-environment injection:**

```yaml
admin_web:
  image: $DOCKER_REGISTRY/internet-banking-admin:$TAG_ADMIN_WEB
  env_file:
    - $PROJECT_SETTINGS          # General settings
    - .project.admin.tmp.env     # Extracted ADMIN_API_* vars
```

**2. Frontend nginx:**

```yaml
front_nginx:
  image: $DOCKER_REGISTRY/front-web-nginx:$TAG_FRONT_NGINX
  ports:
    - "$PUBLIC_NODE_IP:5443:4443"   # HTTPS
    - "$PUBLIC_NODE_IP:5444:4444"   # WebSocket
  <<: *nginx-settings
  environment:
    FRONTEND_URL: http://admin_web:3000
    BACKEND_URL: http://admin_api:10000
    CLIENT_URL: http://client_api:10005
    # ... routing for all backend services
```

**3. Scheduler (cron):**

```yaml
cron_service:
  image: $DOCKER_REGISTRY/scheduler:$TAG_CRON_SERVICE
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock   # Docker API access
  deploy:
    replicas: 1
    placement:
      constraints:
        - node.role == manager    # Only on manager nodes
  environment:
    - "SCHEDULER_EXEC_MODE=1"
```

### 3.5 Networking Architecture

**Single overlay network:**

```yaml
networks:
  issuing:
    driver: overlay
    driver_opts:
      scope: swarm
    attachable: true    # Lets external containers attach
```

**Service discovery:**

```yaml
# Any service can reach another by name:
#   http://admin_api:10000
#   http://client_api:10005
#   http://pdf-renderer:5000

# Swarm DNS resolves the names automatically
```

**External access:**

```yaml
# Only front_nginx is exposed externally:
front_nginx:
  ports:
    - "$PUBLIC_NODE_IP:5443:4443"
    - "$PUBLIC_NODE_IP:5444:4444"

# Every other service is reachable only inside the overlay network
```

**Benefits:**

- ✅ Security: internal services are isolated
- ✅ Service discovery: automatic DNS
- ✅ Load balancing: Swarm routing mesh
- ✅ Flexibility: easy scaling

### 3.6 Database Migration Service

**One-time migration job:**

```yaml
migrate:
  image: $DOCKER_REGISTRY/core:$TAG_MIGRATE
  command: /job.sh migrate
  <<: [*env-settings, *network-simple, *deploy-settings, *all-secrets]
  healthcheck:
    test: "exit 0"    # Always healthy (one-time job)
```

**Deployment behavior:**

1. Swarm starts the migrate service
2. The container runs the migrations
3. The container exits
4. The service shows as "0/1" (expected)
5. deployment.sh ignores migrate in the readiness check

**Migration tracking:**

- The database table `schema_migrations` stores the applied migration IDs
- auto.sh expects a specific `EXPECTED_MIGRATION_ID`
- Manual verification after deployment

---

## 4. Universal CI/CD Architecture

### 4.1 High-Level Design

**Goal:** build a single GitLab CI/CD pipeline that works for all 4 environments.

```
GITLAB REPOSITORY STRUCTURE

coin-gitops/
├── .gitlab-ci.yml                    # Main pipeline
├── .gitlab/
│   ├── pipelines/
│   │   ├── prepare.yml               # Preparation jobs
│   │   ├── deploy.yml                # Deployment jobs
│   │   ├── verify.yml                # Verification jobs
│   │   └── rollback.yml              # Rollback jobs
│   └── scripts/
│       ├── prepare-release.sh
│       ├── deploy-node.sh
│       └── verify-health.sh
│
├── environments/
│   ├── development/
│   │   ├── config.yml                # Environment metadata
│   │   ├── nodes/
│   │   │   ├── node1/
│   │   │   │   ├── docker-compose.yml
│   │   │   │   ├── node.env
│   │   │   │   ├── project.env
│   │   │   │   └── secrets.enc       # SOPS encrypted
│   │   │   └── node2/
│   │   │       └── [same structure]
│   │   └── common/
│   │       └── project.env           # Shared settings
│   │
│   ├── sandbox/
│   │   ├── config.yml
│   │   ├── nodes/
│   │   │   ├── node3/                # wlt-sbx-dkapp3-ams
│   │   │   │   ├── docker-compose.yml
│   │   │   │   ├── custom.secrets.yml
│   │   │   │   ├── docker-compose-testshop.yaml
│   │   │   │   ├── node.env
│   │   │   │   ├── project.env
│   │   │   │   ├── project_node3.env
│   │   │   │   └── secrets.override.enc
│   │   │   └── node4/                # wlt-sbx-dkapp4-ams
│   │   │       └── [same structure]
│   │   └── common/
│   │
│   ├── testing/
│   │   └── [same structure]
│   │
│   └── production/
│       ├── config.yml
│       ├── nodes/
│       │   ├── prod1/
│       │   ├── prod2/
│       │   ├── prod3/
│       │   └── prod4/                # 4 nodes for HA
│       └── common/
│
├── scripts/                          # Reusable scripts
│   ├── prepare-node.sh
│   ├── extract-release.sh
│   ├── deploy-stack.sh
│   └── verify-migration.sh
│
├── templates/                        # Configuration templates
│   ├── docker-compose.base.yml
│   ├── node.env.template
│   └── project.env.template
│
└── docs/
    ├── deployment-guide.md
    ├── rollback-procedure.md
    └── troubleshooting.md
```

### 4.2 Environment Configuration File

**environments/{env}/config.yml:**

```yaml
# Environment metadata
environment:
  name: sandbox
  type: non-production
  color: yellow

# Base configuration
base:
  directory: /home/dev-wltsbx/encrypted/sandbox
  registry: wlt-sbx-hb-int.wltsbxinner.walletto.eu/coin/release

# Nodes configuration
nodes:
  - name: node3
    context: wlt-sbx-dkapp3-ams
    endpoint: tcp://10.95.81.131:2376
    stack: sbxapp3
    role: primary
    public_ip: 10.95.81.131

  - name: node4
    context: wlt-sbx-dkapp4-ams
    endpoint: tcp://10.95.81.132:2376
    stack: sbxapp4
    role: secondary
    public_ip: 10.95.81.132

# Database configuration
database:
  host: postgres-sandbox.internal
  port: 5432
  name: coin_sandbox
  user: coin

# Deployment strategy
deployment:
  strategy: sequential      # sequential | parallel | blue-green
  order:
    - node3                 # Deploy node3 first
    - node4                 # Then node4

  health_check:
    enabled: true
    timeout: 300s
    interval: 10s

  migration_check:
    enabled: true
    table: schema_migrations

  rollback:
    enabled: true
    automatic: false        # Manual approval required

# Approval requirements
approval:
  required: false           # Sandbox auto-deploys
  approvers: []

# Notifications
notifications:
  slack:
    channel: "#deployments-sandbox"
    webhook_url_variable: SLACK_WEBHOOK_SANDBOX
```

**environments/production/config.yml:**

```yaml
environment:
  name: production
  type: production
  color: red

base:
  directory: /srv/coin-production
  registry: harbor.production.company.com/coin/release

nodes:
  - name: prod1
    context: coin-prod-node1
    endpoint: tcp://prod1.internal:2376
    stack: coinprod1
    role: primary

  - name: prod2
    context: coin-prod-node2
    endpoint: tcp://prod2.internal:2376
    stack: coinprod2
    role: primary

  - name: prod3
    context: coin-prod-node3
    endpoint: tcp://prod3.internal:2376
    stack: coinprod3
    role: secondary

  - name: prod4
    context: coin-prod-node4
    endpoint: tcp://prod4.internal:2376
    stack: coinprod4
    role: secondary

deployment:
  strategy: blue-green      # High availability

  health_check:
    enabled: true
    timeout: 600s

  migration_check:
    enabled: true

  rollback:
    enabled: true
    automatic: true         # Auto-rollback on failures

approval:
  required: true
  approvers:
    - DevOps Lead
    - CTO
  change_advisory_board: true

notifications:
  slack:
    channel: "#production-deployments"
  email:
    - ops-team@company.com
    - leadership@company.com
```
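Because every pipeline decision is driven by these config.yml files, it is worth validating them in CI before anything else. A hedged yq-based sketch follows; the list of required keys is an assumption derived from the examples above.

```bash
#!/usr/bin/env bash
# Sketch: fail fast when an environment config.yml is missing required keys.
set -euo pipefail

for ENV in development sandbox testing production; do
    CFG="environments/${ENV}/config.yml"
    [ -f "$CFG" ] || { echo "❌ Missing $CFG"; exit 1; }

    # Required scalar keys, per the structure shown above
    for KEY in '.environment.name' '.base.directory' '.base.registry' \
               '.deployment.strategy'; do
        VAL=$(yq eval "$KEY" "$CFG")
        [ "$VAL" != "null" ] || { echo "❌ ${CFG}: ${KEY} missing"; exit 1; }
    done

    # Every node entry needs name/context/stack for the deploy loop
    COUNT=$(yq eval '.nodes | length' "$CFG")
    BAD=$(yq eval '[.nodes[] | select(.name == null or .context == null or .stack == null)] | length' "$CFG")
    [ "$BAD" -eq 0 ] || { echo "❌ ${CFG}: ${BAD} node(s) incomplete"; exit 1; }
    echo "✅ ${CFG}: ${COUNT} node(s) valid"
done
```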
### 4.3 Universal Pipeline Logic

**Dynamic environment loading:**

```yaml
# .gitlab-ci.yml
variables:
  ENVIRONMENT: "sandbox"   # Default, can be overridden

before_script:
  - |
    # Load the environment configuration
    export ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    if [ ! -f "$ENV_CONFIG" ]; then
      echo "Environment config not found: $ENV_CONFIG"
      exit 1
    fi

    # Parse the YAML into environment variables
    eval $(python3 -c "
    import yaml, sys
    with open('${ENV_CONFIG}') as f:
        config = yaml.safe_load(f)

    # Export environment metadata
    print(f\"export ENV_NAME={config['environment']['name']}\")
    print(f\"export ENV_TYPE={config['environment']['type']}\")
    print(f\"export BASE_DIR={config['base']['directory']}\")
    print(f\"export REGISTRY={config['base']['registry']}\")

    # Export node configurations
    for idx, node in enumerate(config['nodes']):
        print(f\"export NODE_{idx}_NAME={node['name']}\")
        print(f\"export NODE_{idx}_CONTEXT={node['context']}\")
        print(f\"export NODE_{idx}_STACK={node['stack']}\")
    ")
```

**Node iteration:**

```bash
# Deploy to all nodes
yq eval -o=json -I=0 '.nodes[]' $ENV_CONFIG | while read -r NODE_CONFIG; do
    NODE_NAME=$(echo $NODE_CONFIG | jq -r '.name')
    NODE_CONTEXT=$(echo $NODE_CONFIG | jq -r '.context')
    NODE_STACK=$(echo $NODE_CONFIG | jq -r '.stack')

    echo "Deploying to ${NODE_NAME}..."

    .gitlab/scripts/deploy-node.sh \
        --environment $ENVIRONMENT \
        --node $NODE_NAME \
        --context $NODE_CONTEXT \
        --stack $NODE_STACK \
        --release-tag $RELEASE_TAG
done
```

---
## 5. GitLab CI/CD Pipeline Design

### 5.1 Main Pipeline Structure

**.gitlab-ci.yml:**

```yaml
# COIN Universal Deployment Pipeline
# Supports: development, sandbox, testing, production

stages:
  - validate
  - prepare
  - deploy
  - verify
  - notify

# Global variables
variables:
  ENVIRONMENT: "${CI_ENVIRONMENT_NAME}"   # From the GitLab environment
  RELEASE_TAG: "${CI_COMMIT_TAG}"
  TASK_ID: "${CI_MERGE_REQUEST_IID}"

# Include modular pipelines
include:
  - local: '.gitlab/pipelines/prepare.yml'
  - local: '.gitlab/pipelines/deploy.yml'
  - local: '.gitlab/pipelines/verify.yml'
  - local: '.gitlab/pipelines/rollback.yml'

# Workflow rules
workflow:
  rules:
    # Production: tags only
    - if: '$CI_COMMIT_TAG =~ /^\d{4}-\d{2}-\d{2}-[a-f0-9]{10}$/ && $ENVIRONMENT == "production"'
      variables:
        DEPLOY_TYPE: "production-release"

    # Testing: manual trigger or tags
    - if: '$CI_COMMIT_TAG && $ENVIRONMENT == "testing"'
      variables:
        DEPLOY_TYPE: "testing-release"

    # Sandbox: automatically on master
    - if: '$CI_COMMIT_BRANCH == "master" && $ENVIRONMENT == "sandbox"'
      variables:
        DEPLOY_TYPE: "sandbox-continuous"

    # Development: automatically on any push
    - if: '$CI_COMMIT_BRANCH && $ENVIRONMENT == "development"'
      variables:
        DEPLOY_TYPE: "dev-continuous"

# Default configuration
default:
  tags:
    - coin-deployment-runner
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```

### 5.2 Validate Stage

**.gitlab/pipelines/validate.yml:**

```yaml
# ===============================================
# VALIDATION STAGE
# Pre-deployment checks
# ===============================================

load_environment_config:
  stage: validate
  script:
    - echo "Loading configuration for: ${ENVIRONMENT}"
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - |
      if [ ! -f "$ENV_CONFIG" ]; then
        echo "❌ Environment config not found: $ENV_CONFIG"
        exit 1
      fi
    # Validate the YAML syntax
    - python3 -c "import yaml; yaml.safe_load(open('${ENV_CONFIG}'))"
    - echo "✅ Environment configuration valid"
    # Export to artifacts
    - cat $ENV_CONFIG > env_config.yml
  artifacts:
    paths:
      - env_config.yml
    expire_in: 1 hour

validate_release_tag:
  stage: validate
  script:
    - echo "Validating release tag: ${RELEASE_TAG}"
    # Check the tag format: YYYY-MM-DD-<10-char-hash>
    - |
      if ! echo "$RELEASE_TAG" | grep -qE '^[0-9]{4}-[0-9]{2}-[0-9]{2}-[a-f0-9]{10}$'; then
        echo "❌ Invalid release tag format: $RELEASE_TAG"
        echo "Expected format: YYYY-MM-DD-<10-char-hash>"
        exit 1
      fi
    - echo "✅ Release tag format valid"

check_image_availability:
  stage: validate
  script:
    - echo "Checking Docker image availability..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - REGISTRY=$(yq eval '.base.registry' $ENV_CONFIG)
    - IMAGE="${REGISTRY}:${RELEASE_TAG}"
    # Log in to the registry
    - echo "$HARBOR_PASSWORD" | docker login -u "$HARBOR_USER" --password-stdin $(echo $REGISTRY | cut -d'/' -f1)
    # Check that the image exists
    - docker manifest inspect "${IMAGE}" > /dev/null 2>&1
    - echo "✅ Image exists: ${IMAGE}"
    # Check the vulnerability scan
    - |
      SCAN_STATUS=$(curl -s -u "$HARBOR_USER:$HARBOR_PASSWORD" \
        "https://$(echo $REGISTRY | cut -d'/' -f1)/api/v2.0/projects/coin/repositories/release/artifacts/${RELEASE_TAG}/additions/vulnerabilities" \
        | jq -r '.scan_overview.severity // "unknown"')

      echo "Vulnerability scan status: $SCAN_STATUS"

      if [ "$SCAN_STATUS" == "Critical" ]; then
        echo "⚠️ Critical vulnerabilities found!"
        echo "Deployment blocked for production"
        if [ "$ENVIRONMENT" == "production" ]; then
          exit 1
        fi
      fi
    - echo "✅ Image security check passed"
validate_docker_contexts:
  stage: validate
  script:
    - echo "Validating Docker contexts..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    # Check each node context
    - |
      yq eval -o=json -I=0 '.nodes[]' $ENV_CONFIG | while read -r node; do
        CONTEXT=$(echo $node | jq -r '.context')
        ENDPOINT=$(echo $node | jq -r '.endpoint')

        echo "Checking context: $CONTEXT ($ENDPOINT)"

        # Verify the context exists
        if ! docker context ls --format '{{.Name}}' | grep -q "^${CONTEXT}$"; then
          echo "❌ Context not found: $CONTEXT"
          exit 1
        fi

        # Test connectivity
        if docker --context $CONTEXT node ls > /dev/null 2>&1; then
          echo "✅ Context accessible: $CONTEXT"
        else
          echo "❌ Cannot connect to context: $CONTEXT"
          exit 1
        fi
      done

check_database_connectivity:
  stage: validate
  script:
    - echo "Checking database connectivity..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - DB_HOST=$(yq eval '.database.host' $ENV_CONFIG)
    - DB_PORT=$(yq eval '.database.port' $ENV_CONFIG)
    - DB_NAME=$(yq eval '.database.name' $ENV_CONFIG)
    - DB_USER=$(yq eval '.database.user' $ENV_CONFIG)
    - echo "Database: ${DB_USER}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
    # Test the connection
    - |
      PGPASSWORD="${DB_PASSWORD}" psql \
        -h "${DB_HOST}" \
        -p "${DB_PORT}" \
        -U "${DB_USER}" \
        -d "${DB_NAME}" \
        -c "SELECT 1;" > /dev/null
    - echo "✅ Database connection successful"
```

### 5.3 Prepare Stage

**.gitlab/pipelines/prepare.yml:**

```yaml
# ===============================================
# PREPARATION STAGE
# Prepare deployment directories and artifacts
# ===============================================

prepare_release_directories:
  stage: prepare
  needs:
    - load_environment_config
  script:
    - echo "Preparing release directories..."
    - ENV_CONFIG="env_config.yml"
    - BASE_DIR=$(yq eval '.base.directory' $ENV_CONFIG)
    - REGISTRY=$(yq eval '.base.registry' $ENV_CONFIG)
    # Extract the release from the Docker image
    - echo "Extracting release archive..."
    - IMAGE="${REGISTRY}:${RELEASE_TAG}"
    - docker run -i --rm "${IMAGE}" release | base64 -d > release.tar.gz
    - tar -xzf release.tar.gz
    - rm release.tar.gz
    - RELEASE_DIR="coin-${RELEASE_TAG}"
    - echo "Release extracted to: $RELEASE_DIR"
    # Prepare each node
    - |
      yq eval -o=json -I=0 '.nodes[]' $ENV_CONFIG | while read -r node; do
        NODE_NAME=$(echo $node | jq -r '.name')

        echo "Preparing node: $NODE_NAME"
        TARGET_DIR="${BASE_DIR}/${RELEASE_TAG}-${NODE_NAME}"
        mkdir -p "$TARGET_DIR"

        # Copy the release files
        cp -r "$RELEASE_DIR"/* "$TARGET_DIR/"

        # Copy the node-specific configuration
        cp "environments/${ENVIRONMENT}/nodes/${NODE_NAME}"/* "$TARGET_DIR/"

        # Decrypt secrets
        sops -d "environments/${ENVIRONMENT}/nodes/${NODE_NAME}/secrets.override.enc" \
          > "$TARGET_DIR/secrets.override.env"

        # Update TAG in node.env
        sed -i "s/^TAG=.*/TAG=${RELEASE_TAG}/" "$TARGET_DIR/node.env"

        # Add deployment metadata
        {
          echo "DEPLOYED_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
          echo "DEPLOYED_BY_PIPELINE=${CI_PIPELINE_ID}"
        } >> "$TARGET_DIR/node.env"
      done
    # Write a deployment manifest for traceability
    - |
      cat > deployment-manifest.json <<EOF
      {
        "environment": "${ENVIRONMENT}",
        "release_tag": "${RELEASE_TAG}",
        "pipeline_id": "${CI_PIPELINE_ID}"
      }
      EOF
  artifacts:
    paths:
      - deployment-manifest.json
    expire_in: 1 week
```

---

## 6. Environment Management

### 6.3 Docker Context Setup

**.gitlab/scripts/setup-docker-contexts.sh:**

```bash
#!/usr/bin/env bash
set -euo pipefail

ENVIRONMENT=$1
ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"

yq eval -o=json -I=0 '.nodes[]' "$ENV_CONFIG" | while read -r node; do
    NAME=$(echo "$node" | jq -r '.name')
    CONTEXT=$(echo "$node" | jq -r '.context')
    ENDPOINT=$(echo "$node" | jq -r '.endpoint')

    # Remove a stale context if one exists
    docker context rm "$CONTEXT" 2>/dev/null || true

    # Create context with TLS
    docker context create "$CONTEXT" \
        --description "COIN ${ENVIRONMENT} ${NAME}" \
        --docker "host=${ENDPOINT},ca=/certs/${ENVIRONMENT}/ca.pem,cert=/certs/${ENVIRONMENT}/cert.pem,key=/certs/${ENVIRONMENT}/key.pem"

    # Verify context
    if docker --context "$CONTEXT" node ls > /dev/null 2>&1; then
        echo "✅ Context verified: $CONTEXT"
    else
        echo "❌ Context verification failed: $CONTEXT"
        exit 1
    fi
done

echo "All contexts created successfully"
```

**Usage in the pipeline:**

```yaml
setup_docker_contexts:
  stage: .pre
  script:
    - .gitlab/scripts/setup-docker-contexts.sh "${ENVIRONMENT}"
  cache:
    key: docker-contexts-${ENVIRONMENT}
    paths:
      - ~/.docker/contexts/
```
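The TLS material referenced by these contexts (`/certs/${ENVIRONMENT}/*.pem`) is a silent failure point; a small hedged pre-check with openssl could run in the same `.pre` stage (the 30-day threshold is an assumption):

```bash
# Sketch: warn if any context certificate expires within 30 days.
for PEM in /certs/"${ENVIRONMENT}"/cert.pem /certs/"${ENVIRONMENT}"/ca.pem; do
    if ! openssl x509 -checkend $((30 * 86400)) -noout -in "$PEM"; then
        echo "⚠️ ${PEM} expires within 30 days" >&2
    fi
done
```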
### 6.4 Environment Promotion Workflow

**Concept:** changes move through the environments sequentially.

```
Development  →  Sandbox  →  Testing  →   Production
   (auto)        (auto)     (manual)   (CAB approval)
```

**Promotion script:**

**.gitlab/scripts/promote-environment.sh:**

```bash
#!/usr/bin/env bash
set -euo pipefail

# Arguments:
#   $1 - FROM_ENV (development/sandbox/testing)
#   $2 - TO_ENV   (sandbox/testing/production)

FROM_ENV=$1
TO_ENV=$2

echo "Promoting configuration: ${FROM_ENV} → ${TO_ENV}"

# Validation
VALID_PROMOTIONS=(
    "development:sandbox"
    "sandbox:testing"
    "testing:production"
)

PROMOTION="${FROM_ENV}:${TO_ENV}"
if [[ ! " ${VALID_PROMOTIONS[@]} " =~ " ${PROMOTION} " ]]; then
    echo "❌ Invalid promotion path: $PROMOTION"
    echo "Valid promotions:"
    for p in "${VALID_PROMOTIONS[@]}"; do
        echo " - $p"
    done
    exit 1
fi

# Copy the common configuration
echo "Copying common configuration..."
cp -r "environments/${FROM_ENV}/common/project.env" \
      "environments/${TO_ENV}/common/project.env.promoted"

# Review changes
echo "Configuration changes:"
diff "environments/${TO_ENV}/common/project.env" \
     "environments/${TO_ENV}/common/project.env.promoted" || true

# Node-specific configurations
for FROM_NODE in environments/${FROM_ENV}/nodes/*/; do
    NODE_NAME=$(basename "$FROM_NODE")
    TO_NODE="environments/${TO_ENV}/nodes/${NODE_NAME}"

    if [ -d "$TO_NODE" ]; then
        echo "Promoting node configuration: $NODE_NAME"

        # Copy non-secret files
        cp "${FROM_NODE}/docker-compose.yml" "${TO_NODE}/docker-compose.yml.promoted"
        cp "${FROM_NODE}/project_${NODE_NAME}.env" "${TO_NODE}/project_${NODE_NAME}.env.promoted"

        # Secrets are NOT promoted automatically - manual review required
    else
        echo "⚠️ Node ${NODE_NAME} does not exist in ${TO_ENV}"
    fi
done

echo "Promotion prepared. Review the .promoted files and commit if acceptable."
```

**GitLab pipeline integration:**

```yaml
promote_to_testing:
  stage: promote
  script:
    - .gitlab/scripts/promote-environment.sh sandbox testing
    # Create a merge request
    - |
      git checkout -b "promote/sandbox-to-testing-${CI_COMMIT_SHORT_SHA}"

      # Move the promoted files into place
      find environments/testing -name "*.promoted" | while read file; do
        mv "$file" "${file%.promoted}"
      done

      git add environments/testing/
      git commit -m "config: promote sandbox → testing

      Promoted configuration from sandbox to testing
      - Common project settings
      - Node-specific configurations
      - Docker compose files

      Refs: ${CI_COMMIT_SHA}"

      git push origin "promote/sandbox-to-testing-${CI_COMMIT_SHORT_SHA}"
    # Create the MR via the GitLab API
    - |
      curl -X POST "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/merge_requests" \
        --header "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
        --data "source_branch=promote/sandbox-to-testing-${CI_COMMIT_SHORT_SHA}" \
        --data "target_branch=master" \
        --data "title=Promote configuration: sandbox → testing" \
        --data "description=Automated configuration promotion from sandbox to testing.

      ## Changes
      - Common configuration updates
      - Node-specific setting adjustments

      ## Review Required
      - Verify all changes are appropriate for the testing environment
      - Check resource allocations
      - Validate feature flags

      ## Next Steps
      After merge, trigger the testing deployment pipeline."
  when: manual
  only:
    - master
```
### 6.5 Feature Flag Management

**Purpose:** enable/disable features without a code deployment.

**Implementation:**

```bash
# environments/development/common/project.env
# Development: all features ON for testing
FEATURE_NEW_CHECKOUT=true
FEATURE_BETA_UI=true
FEATURE_ADVANCED_REPORTING=true
FEATURE_EXPERIMENTAL_PAYMENT_FLOW=true
FEATURE_AI_RECOMMENDATIONS=true

# environments/sandbox/common/project.env
# Sandbox: most features ON, some experimental ones OFF
FEATURE_NEW_CHECKOUT=true
FEATURE_BETA_UI=true
FEATURE_ADVANCED_REPORTING=true
FEATURE_EXPERIMENTAL_PAYMENT_FLOW=false
FEATURE_AI_RECOMMENDATIONS=true

# environments/testing/common/project.env
# Testing: production-like, only stable features
FEATURE_NEW_CHECKOUT=true
FEATURE_BETA_UI=false
FEATURE_ADVANCED_REPORTING=true
FEATURE_EXPERIMENTAL_PAYMENT_FLOW=false
FEATURE_AI_RECOMMENDATIONS=false

# environments/production/common/project.env
# Production: only battle-tested features
FEATURE_NEW_CHECKOUT=true
FEATURE_BETA_UI=false
FEATURE_ADVANCED_REPORTING=true
FEATURE_EXPERIMENTAL_PAYMENT_FLOW=false
FEATURE_AI_RECOMMENDATIONS=false
```

**Advanced: LaunchDarkly integration (optional):**

```yaml
# For production, use LaunchDarkly for gradual rollouts
production_feature_flags:
  stage: deploy
  script:
    - |
      # Get the feature flags from LaunchDarkly
      FEATURE_CONFIG=$(curl -X GET \
        "https://app.launchdarkly.com/api/v2/flags/coin-production" \
        -H "Authorization: ${LAUNCHDARKLY_API_KEY}")

      # Update environment variables
      echo "FEATURE_NEW_CHECKOUT=$(echo $FEATURE_CONFIG | jq -r '.flags.new_checkout.on')" >> production.env
      echo "FEATURE_BETA_UI=$(echo $FEATURE_CONFIG | jq -r '.flags.beta_ui.on')" >> production.env
  only:
    - tags
  environment:
    name: production
```

### 6.6 Resource Management per Environment

**Development:**

```yaml
# Minimal resources
services:
  admin_api:
    deploy:
      replicas: 1
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
```

**Sandbox:**

```yaml
# Moderate resources
services:
  admin_api:
    deploy:
      replicas: 1
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
```

**Production:**

```yaml
# Full resources
services:
  admin_api:
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
      placement:
        constraints:
          - node.labels.env == production
        preferences:
          - spread: node.labels.zone    # Multi-AZ
```
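These per-environment resource blocks do not have to live in full compose copies; they can be thin override files layered with the multi `-f` mechanism deployment.sh already supports. A hedged example, with the override file name assumed:

```bash
# resources.production.yml would contain only the deploy.resources
# overrides shown above; the base docker-compose.yml stays shared.
./deployment.sh deploy \
    -n coin-prod-node1 \
    -w coinprod1 \
    -N node.env \
    -P project.env \
    -f docker-compose.yml \
    -f resources.production.yml \
    -s secrets.override.env \
    -u
```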
---

## 7. Secrets Management

### 7.1 Current Secret Management Analysis

**The existing system in docker-compose.yml:**

```yaml
secrets:
  card_iv.txt:
    file: ./secrets/card_iv.txt
    name: card_iv.$SV_card_iv       # Versioned secret

  db_access:
    file: ./secrets/db_access
    name: db_access.$SV_db_access

# 30+ secrets in total...
```

**Versioning via SV_* variables:**

```bash
# secrets.override.env
SV_card_iv=1
SV_db_access=2
SV_webhook_auth=1

# Results in Swarm:
#   card_iv.1
#   card_iv.2   (new version; the old one still exists)
```

**Problems:**

- ❌ Secrets sit in plaintext on the filesystem
- ❌ No centralized management
- ❌ Rotation is cumbersome (30+ files)
- ❌ No audit trail of who accessed what
- ❌ Risk of leaking through Git (if committed by accident)

### 7.2 Multi-Layer Secrets Architecture

**Architecture:**

```
Layer 1: GitLab CI/CD Variables (infrastructure credentials)
├── HARBOR_USER / HARBOR_PASSWORD
├── SSH_PRIVATE_KEY_NODE3 / SSH_PRIVATE_KEY_NODE4
├── SOPS_GPG_PRIVATE_KEY
├── DB_PASSWORD
├── SLACK_WEBHOOK_URL
└── API tokens for external services

Layer 2: SOPS-encrypted files in Git (application secrets)
├── Database credentials
├── API keys (payment gateway, etc.)
├── Encryption keys
├── JWT secrets
└── Third-party service credentials

Layer 3: Docker Secrets (runtime)
├── Mounted into containers as files (/run/secrets/)
├── Managed by Swarm
├── Versioned (card_iv.1, card_iv.2)
├── Encrypted at rest & in transit
└── Access control via service definitions

Layer 4: External secret manager (optional - enterprise)
└── HashiCorp Vault
    ├── Dynamic secrets
    ├── Automatic rotation
    ├── Detailed audit logs
    └── Policy-based access
```

### 7.3 SOPS Integration

**Setup:**

```bash
# 1. Generate GPG keys for the authorized team members
gpg --full-generate-key
# Name:  DevOps Team Member
# Email: devops@company.com

# 2. Export the public key
gpg --armor --export devops@company.com > devops.pub.asc

# 3. Import the team keys
for key in team/*.pub.asc; do
    gpg --import "$key"
done
```

**.sops.yaml configuration:**

```yaml
creation_rules:
  # Production secrets - senior team only
  - path_regex: environments/production/.*/secrets\..*\.enc$
    pgp: >-
      FBC7B9E2A4F9289AC0C1D4843D16CEE4A27381B4,
      8E2E0E4F09A5F8B9C1D2E3F4A5B6C7D8E9F0A1B2
    encrypted_regex: '^(password|secret|key|token|private_key|api_key)$'

  # Testing secrets - team leads + DevOps
  - path_regex: environments/testing/.*/secrets\..*\.enc$
    pgp: >-
      FBC7B9E2A4F9289AC0C1D4843D16CEE4A27381B4,
      1234567890ABCDEF1234567890ABCDEF12345678,
      ABCDEF1234567890ABCDEF1234567890ABCDEF12
    encrypted_regex: '^(password|secret|key|token)$'

  # Sandbox secrets - the whole DevOps team
  - path_regex: environments/sandbox/.*/secrets\..*\.enc$
    pgp: >-
      FBC7B9E2A4F9289AC0C1D4843D16CEE4A27381B4,
      1234567890ABCDEF1234567890ABCDEF12345678,
      ABCDEF1234567890ABCDEF1234567890ABCDEF12,
      9876543210FEDCBA9876543210FEDCBA98765432
    encrypted_regex: '^(password|secret|key|token)$'

  # Development - all developers
  - path_regex: environments/development/.*/secrets\..*\.enc$
    pgp: >-
      FBC7B9E2A4F9289AC0C1D4843D16CEE4A27381B4,
      DEV_TEAM_KEY_1,
      DEV_TEAM_KEY_2,
      DEV_TEAM_KEY_3
    encrypted_regex: '^(password|secret|key)$'
```

**Creating/editing encrypted secrets:**

```bash
# Create a new secret file for sandbox/node3
cd coin-gitops
sops environments/sandbox/nodes/node3/secrets.override.enc

# The file opens in $EDITOR as plaintext:
DATABASE_PASSWORD: "sandbox-db-password-123"
API_KEY: "sk-sandbox-api-key-456"
JWT_SECRET: "jwt-signing-secret-789"
REDIS_PASSWORD: "redis-password-abc"
PAYMENT_GATEWAY_API_KEY: "pg-api-key-def"
CARD_ENCRYPTION_KEY: "card-enc-key-ghi"

# On save it is automatically encrypted by SOPS
# and is safe to commit to Git
git add environments/sandbox/nodes/node3/secrets.override.enc
git commit -m "feat(secrets): add sandbox node3 secrets"
```

**Encrypted file format:**

```yaml
DATABASE_PASSWORD: ENC[AES256_GCM,data:8hT9k2mP3nQ...,iv:xyz...,tag:abc...,type:str]
API_KEY: ENC[AES256_GCM,data:mK9sL3nQ7pR...,iv:def...,tag:ghi...,type:str]
sops:
  kms: []
  pgp:
    - created_at: "2025-01-14T10:30:00Z"
      enc: |
        -----BEGIN PGP MESSAGE-----
        hQIMA...
        -----END PGP MESSAGE-----
      fp: FBC7B9E2A4F9289AC0C1D4843D16CEE4A27381B4
  version: 3.7.3
```
### 7.4 CI/CD Pipeline Secret Handling

**Decryption in the pipeline:**

```yaml
decrypt_secrets:
  stage: prepare
  script:
    - echo "Decrypting secrets for ${ENVIRONMENT}..."
    # Import the GPG key from a GitLab CI/CD variable
    - echo "$SOPS_GPG_PRIVATE_KEY" | base64 -d | gpg --import
    # Decrypt the secrets for each node
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - |
      yq eval '.nodes[].name' $ENV_CONFIG | while read NODE_NAME; do
        SECRET_FILE="environments/${ENVIRONMENT}/nodes/${NODE_NAME}/secrets.override.enc"
        OUTPUT_FILE="/tmp/secrets-${NODE_NAME}.env"

        if [ -f "$SECRET_FILE" ]; then
          echo "Decrypting secrets for: $NODE_NAME"
          sops -d "$SECRET_FILE" > "$OUTPUT_FILE"

          # Restrictive permissions
          chmod 600 "$OUTPUT_FILE"

          # Validate that the required secrets are present
          for KEY in DATABASE_PASSWORD API_KEY JWT_SECRET; do
            if ! grep -q "^${KEY}:" "$OUTPUT_FILE"; then
              echo "❌ Required secret ${KEY} not found for ${NODE_NAME}"
              exit 1
            fi
          done

          echo "✅ Secrets decrypted: $NODE_NAME"
        else
          echo "⚠️ No secrets file for: $NODE_NAME"
        fi
      done
  artifacts:
    paths:
      - /tmp/secrets-*.env
    expire_in: 1 hour    # Short expiration for security
  after_script:
    # Clean up the decrypted secrets
    - rm -f /tmp/secrets-*.env
```

**Converting YAML secrets to ENV format:**

```bash
# secrets.override.enc (YAML format):
#   DATABASE_PASSWORD: "secret123"
#   API_KEY: "key456"

# Convert to ENV format for deployment.sh:
cat /tmp/secrets-node3.env | yq eval -o=props > /tmp/secrets-node3.props.env

# Result:
#   DATABASE_PASSWORD=secret123
#   API_KEY=key456
```

### 7.5 Docker Secrets Creation in Swarm

**Create secrets from the decrypted files:**

```yaml
create_docker_secrets:
  stage: deploy
  needs:
    - decrypt_secrets
  script:
    - echo "Creating Docker secrets in Swarm..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - |
      yq eval -o=json -I=0 '.nodes[]' $ENV_CONFIG | while read -r node; do
        NODE_NAME=$(echo $node | jq -r '.name')
        CONTEXT=$(echo $node | jq -r '.context')

        docker context use "$CONTEXT"

        # Read the decrypted secrets
        SECRET_FILE="/tmp/secrets-${NODE_NAME}.env"

        # Derive the secret version
        SECRET_VERSION=$(date +%s)    # Unix timestamp

        # Create each secret in Swarm and record its version
        while IFS=: read -r key value; do
          SECRET_NAME="${key}_v${SECRET_VERSION}"

          echo "$value" | docker secret create "$SECRET_NAME" - || {
            echo "⚠️ Secret ${SECRET_NAME} already exists, skipping"
          }

          echo "✅ Secret created: $SECRET_NAME"
          echo "SV_${key}=${SECRET_VERSION}" >> secret_versions_${NODE_NAME}.env
        done < <(yq eval 'to_entries | .[] | .key + ":" + .value' "$SECRET_FILE")
      done
    - echo "All secrets created in Swarm"
  artifacts:
    paths:
      - secret_versions_*.env
    expire_in: 1 day
```
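Re-running the job above can collide with secrets that already exist; a hedged pre-check keeps the step idempotent (helper name is hypothetical):

```bash
# Sketch: create a Swarm secret only when that exact name is absent.
create_secret_if_missing() {
    local name="$1" value="$2"
    if docker secret ls --format '{{.Name}}' | grep -qx "$name"; then
        echo "Secret ${name} already present - keeping the existing version"
    else
        printf '%s' "$value" | docker secret create "$name" -
        echo "✅ Secret created: ${name}"
    fi
}
```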
### 7.6 Secret Rotation Strategy

**Rotation process:**

```
1. Generate the new secret value
2. Create the new version in Swarm (e.g. db_password.3)
3. Update SV_db_password=3 in secrets.override.env
4. Deploy - services start using the new version
5. The old versions (db_password.1, db_password.2) remain for rollback
6. After a grace period (7-30 days), remove the old versions
```

**Rotation script:**

**.gitlab/scripts/rotate-secret.sh:**

```bash
#!/usr/bin/env bash
set -euo pipefail

# Arguments:
#   $1 - ENVIRONMENT
#   $2 - NODE_NAME
#   $3 - SECRET_NAME
#   $4 - NEW_VALUE

ENVIRONMENT=$1
NODE_NAME=$2
SECRET_NAME=$3
NEW_VALUE=$4

echo "Rotating secret: ${SECRET_NAME} for ${ENVIRONMENT}/${NODE_NAME}"

# Get the Docker context
ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
CONTEXT=$(yq eval ".nodes[] | select(.name == \"${NODE_NAME}\") | .context" $ENV_CONFIG)

# Get the current version
SECRET_FILE="environments/${ENVIRONMENT}/nodes/${NODE_NAME}/secrets.override.enc"
CURRENT_VERSION=$(sops -d "$SECRET_FILE" | yq eval ".${SECRET_NAME}_VERSION // 0")
NEW_VERSION=$((CURRENT_VERSION + 1))

echo "Current version: $CURRENT_VERSION"
echo "New version:     $NEW_VERSION"

# Create the new secret in Swarm
docker context use "$CONTEXT"
echo "$NEW_VALUE" | docker secret create "${SECRET_NAME}.${NEW_VERSION}" -

# Update the encrypted file
sops --set "[\"${SECRET_NAME}\"] \"${NEW_VALUE}\"" "$SECRET_FILE"
sops --set "[\"${SECRET_NAME}_VERSION\"] ${NEW_VERSION}" "$SECRET_FILE"

echo "✅ Secret rotated: ${SECRET_NAME} → version ${NEW_VERSION}"
echo ""
echo "Next steps:"
echo "1. Commit the updated secrets file"
echo "2. Deploy to apply the new secret"
echo "3. After the grace period, remove the old version:"
echo "   docker secret rm ${SECRET_NAME}.${CURRENT_VERSION}"
```

**Automated rotation schedule:**

```yaml
rotate_production_secrets:
  stage: maintenance
  script:
    - |
      # Rotate the database password every 90 days
      LAST_ROTATION=$(git log -1 --format=%ct -- environments/production/nodes/*/secrets.override.enc)
      CURRENT=$(date +%s)
      DAYS_SINCE=$((($CURRENT - $LAST_ROTATION) / 86400))

      if [ $DAYS_SINCE -gt 90 ]; then
        echo "Database password rotation required (${DAYS_SINCE} days since last)"

        # Generate a new password
        NEW_PASSWORD=$(openssl rand -base64 32)

        # Rotate on all production nodes
        for NODE in prod1 prod2 prod3 prod4; do
          .gitlab/scripts/rotate-secret.sh production "$NODE" "DATABASE_PASSWORD" "$NEW_PASSWORD"
        done

        # Create an MR for approval
        git checkout -b "security/rotate-db-password-$(date +%Y%m%d)"
        git add environments/production/
        git commit -m "security: rotate production database password

        Automated 90-day rotation of database credentials
        - Generated a new strong password
        - Updated all production nodes
        - The old version will be removed after 30 days"
        git push
        # Create the MR via the API...
      else
        echo "Database password rotation not required (${DAYS_SINCE} days since last)"
      fi
  only:
    - schedules
  when: manual
```
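The grace-period cleanup in step 6 of the rotation process can also be scripted; a hedged sketch that prunes old versions of one secret family, keeping the newest N (the helper name and keep-count are assumptions, and removal of a secret still mounted by a service fails safely):

```bash
# Sketch: prune old versions of a dotted-versioned secret (e.g. db_access.1),
# keeping the most recent KEEP versions.
prune_secret_versions() {
    local base="$1" keep="${2:-2}"
    docker secret ls --format '{{.Name}}' \
      | grep -E "^${base}\.[0-9]+$" \
      | sort -t. -k2 -n \
      | head -n -"$keep" \
      | while read -r old; do
          docker secret rm "$old" \
            && echo "Removed ${old}" \
            || echo "⚠️ ${old} still in use - skipped"
        done
}

# Example: keep the two newest db_access versions, drop anything older
prune_secret_versions db_access 2
```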
### 7.7 Secret Access Audit

**Audit logging:**

```yaml
audit_secret_access:
  stage: verify
  script:
    - echo "Auditing secret access..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - |
      yq eval -o=json -I=0 '.nodes[]' $ENV_CONFIG | while read -r node; do
        NODE_NAME=$(echo $node | jq -r '.name')
        CONTEXT=$(echo $node | jq -r '.context')

        docker context use "$CONTEXT"

        # Secret inventory
        docker secret ls --format '{{.Name}}\t{{.CreatedAt}}\t{{.UpdatedAt}}'

        # Which services use which secrets
        docker service ls --format '{{.Name}}' | while read service; do
          SECRETS=$(docker service inspect "$service" --format '{{range .Spec.TaskTemplate.ContainerSpec.Secrets}}{{.SecretName}} {{end}}')
          if [ -n "$SECRETS" ]; then
            echo "Service ${service} uses secrets: $SECRETS"
          fi
        done
      done > secret-audit-${ENVIRONMENT}-$(date +%Y%m%d).log
    - echo "✅ Audit log created"
  artifacts:
    paths:
      - secret-audit-*.log
    expire_in: 1 year
  only:
    - schedules
```

---

## 8. Rollback Strategy

### 8.1 Current Rollback Mechanism Analysis

**The existing rollback function in auto.sh:**

```bash
rollback() {
    # 1. Stop the current stacks
    docker stack rm "$NODE3_STACK"
    docker stack rm "$NODE4_STACK"
    sleep 3

    # 2. Deploy the previous version
    cd "$NODE3_PREV"
    ./deploy.sh deploy [params...]

    cd "$NODE4_PREV"
    ./deploy.sh deploy [params...]
}
```

**Problems:**

- ⚠️ Depends on the previous directories still existing
- ⚠️ No verification after rollback
- ⚠️ Manual trigger only
- ⚠️ Removes the stacks completely (downtime)
- ⚠️ No partial rollback (all-or-nothing only)

### 8.2 Improved Rollback Architecture

**Multi-level rollback strategy:**

```
Level 1: Service-level rollback (fastest, 1-2 minutes)
├── Revert a single service to the previous version
├── Keep the other services running
├── Minimal impact
└── Use for: a bug in one service

Level 2: Stack-level rollback (medium, 3-5 minutes)
├── Revert an entire stack (all services)
├── Coordinated rollback
├── Moderate impact
└── Use for: multiple services affected

Level 3: Infrastructure rollback (slowest, 5-10 minutes)
├── Revert configuration changes
├── Revert database migrations (if safe)
├── Full environment restore
└── Use for: critical infrastructure issues
```

### 8.3 GitLab Pipeline Rollback Jobs

**.gitlab/pipelines/rollback.yml:**

```yaml
# ===============================================
# ROLLBACK PIPELINE
# Multi-level rollback strategy
# ===============================================

.rollback_preparation: &rollback_preparation
  before_script:
    - echo "Preparing rollback for ${ENVIRONMENT}..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    # Get the previous stable version from Git
    - |
      PREVIOUS_TAG=$(git describe --tags --abbrev=0 HEAD~1)
      echo "Current:  ${RELEASE_TAG}"
      echo "Previous: ${PREVIOUS_TAG}"
      echo "PREVIOUS_TAG=${PREVIOUS_TAG}" >> rollback.env
  artifacts:
    reports:
      dotenv: rollback.env
    expire_in: 1 hour

rollback_service:
  stage: rollback
  <<: *rollback_preparation
  script:
    - echo "Rolling back service: ${SERVICE_NAME}"
    - NODE_NAME="${NODE_NAME}"
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    # Get the node configuration
    - |
      NODE_CONFIG=$(yq eval ".nodes[] | select(.name == \"${NODE_NAME}\")" $ENV_CONFIG -o=json)
      CONTEXT=$(echo $NODE_CONFIG | jq -r '.context')
      STACK=$(echo $NODE_CONFIG | jq -r '.stack')
    # Get the previous image tag
    - PREVIOUS_IMAGE="${REGISTRY}/${SERVICE_NAME}:${PREVIOUS_TAG}"
    - echo "Rolling back ${SERVICE_NAME} to ${PREVIOUS_TAG}"
    # Update the service image
    - docker context use "$CONTEXT"
    - |
      docker service update \
        --image "$PREVIOUS_IMAGE" \
        --update-failure-action rollback \
        "${STACK}_${SERVICE_NAME}"
    # Wait for the service update
    - sleep 30
    # Verify service health
    - |
      REPLICAS=$(docker service ls --filter name="${STACK}_${SERVICE_NAME}" --format '{{.Replicas}}')
      echo "Service replicas: $REPLICAS"

      if [[ "$REPLICAS" != *"/"* ]]; then
        echo "❌ Service rollback failed"
        exit 1
      fi

      RUNNING=$(echo $REPLICAS | cut -d'/' -f1)
      DESIRED=$(echo $REPLICAS | cut -d'/' -f2)

      if [ "$RUNNING" -ne "$DESIRED" ]; then
        echo "❌ Service not fully rolled back: $RUNNING/$DESIRED"
        exit 1
      fi
    - echo "✅ Service rolled back successfully: ${SERVICE_NAME}"
  variables:
    SERVICE_NAME: ""   # Must be provided
    NODE_NAME: ""      # Must be provided
  when: manual
  allow_failure: false
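For Level 1, Swarm also keeps the PreviousSpec of every service, so when the immediately preceding spec is the desired target, the explicit image pinning above can be replaced by Docker's built-in one-step rollback (the service name below reuses the sandbox example):

```bash
# Built-in alternative: revert a single service to its previous spec.
docker --context wlt-sbx-dkapp3-ams service update --rollback sbxapp3_admin_api

# Inspect what the rollback would restore beforehand:
docker --context wlt-sbx-dkapp3-ams service inspect sbxapp3_admin_api \
    --format '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
```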
$NODE_CONFIG | jq -r '.context')
      STACK=$(echo $NODE_CONFIG | jq -r '.stack')
      BASE_DIR=$(yq eval '.base.directory' $ENV_CONFIG)
    - echo "Context: $CONTEXT"
    - echo "Stack: $STACK"
    - echo "Previous version: $PREVIOUS_TAG"
    # Check previous version directory exists
    - PREV_DIR="${BASE_DIR}/${PREVIOUS_TAG}-${NODE_NAME}"
    - |
      if [ ! -d "$PREV_DIR" ]; then
        echo "❌ Previous version directory not found: $PREV_DIR"
        echo "Available versions:"
        ls -la "$BASE_DIR" | grep "$NODE_NAME"
        exit 1
      fi
    - echo "✅ Previous version found: $PREV_DIR"
    # Stop current stack (gracefully)
    - docker context use "$CONTEXT"
    - echo "Stopping current stack..."
    - docker stack rm "$STACK" || echo "Stack already removed"
    # Wait for stack to fully stop
    - sleep 10
    - |
      while docker service ls | grep -q "$STACK"; do
        echo "Waiting for services to stop..."
        sleep 5
      done
    - echo "✅ Stack stopped"
    # Deploy previous version
    - cd "$PREV_DIR"
    - echo "Deploying previous version from: $(pwd)"
    - |
      ./deployment.sh deploy \
        -n "$CONTEXT" \
        -w "$STACK" \
        -N node.env \
        -P project.env \
        -P project_${NODE_NAME}.env \
        -f docker-compose.yml \
        -f custom.secrets.yml \
        -f docker-compose-testshop.yaml \
        -s secrets.override.env \
        -u
    # Verify deployment
    - sleep 30
    - docker service ls --filter name="$STACK"
    - |
      SERVICE_COUNT=$(docker service ls --filter name="$STACK" --format '{{.Name}}' | wc -l)
      if [ "$SERVICE_COUNT" -lt 5 ]; then
        echo "❌ Rollback incomplete: only $SERVICE_COUNT services running"
        exit 1
      fi
    - echo "✅ Stack rolled back successfully: ${NODE_NAME}"
  variables:
    NODE_NAME: ""  # Must be provided
  when: manual
  allow_failure: false

rollback_all_nodes:
  stage: rollback
  <<: *rollback_preparation
  script:
    - echo "Rolling back all nodes in ${ENVIRONMENT}"
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - BASE_DIR=$(yq eval '.base.directory' $ENV_CONFIG)
    # Rollback each node sequentially
    - |
      # -I=0: компактный JSON, один node на строку - иначе while read не работает
      yq eval -o=json -I=0 '.nodes[]' "$ENV_CONFIG" | while read -r node; do
        NODE_NAME=$(echo "$node" | jq -r '.name')
        CONTEXT=$(echo "$node" | jq -r '.context')
        STACK=$(echo "$node" | jq -r '.stack')

        echo "========================================="
        echo "Rolling back node: $NODE_NAME"
        echo "========================================="

        PREV_DIR="${BASE_DIR}/${PREVIOUS_TAG}-${NODE_NAME}"
        if [ ! -d "$PREV_DIR" ]; then
          echo "❌ Previous version not found for: $NODE_NAME"
          continue
        fi

        # Stop and redeploy
        docker context use "$CONTEXT"
        docker stack rm "$STACK" || true
        sleep 10

        cd "$PREV_DIR"
        ./deployment.sh deploy \
          -n "$CONTEXT" \
          -w "$STACK" \
          -N node.env \
          -P project.env \
          -P project_${NODE_NAME}.env \
          -f docker-compose.yml \
          -f custom.secrets.yml \
          -f docker-compose-testshop.yaml \
          -s secrets.override.env \
          -u

        echo "✅ Node rolled back: $NODE_NAME"
      done
    - echo "✅ All nodes rolled back successfully"
  when: manual
  allow_failure: false
  environment:
    name: ${ENVIRONMENT}
    action: rollback
```
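Для Level 1 (service-level rollback) есть и более короткий путь: Docker Swarm хранит предыдущую спецификацию сервиса (`PreviousSpec`), поэтому один сервис можно откатить штатной командой `docker service rollback`, не вычисляя предыдущий tag. Набросок (имена context и сервиса - условные, для примера):

```bash
# Набросок: быстрый откат одного сервиса средствами самого Swarm.
# Swarm возвращает сервис к предыдущей спецификации (image, env, limits).
docker context use coin-node-3            # имя контекста - условное
docker service rollback coin_admin_api    # формат <stack>_<service>

# Проверяем, что откат сошелся: Running == Desired
docker service ls --filter name=coin_admin_api --format '{{.Name}}\t{{.Replicas}}'
```

Этот вариант работает только на один шаг назад и только если сервис менялся через `docker service update`; для отката на произвольную версию остается job `rollback_service` выше.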
### 8.4 Automatic Rollback Triggers

**Health Check Based Auto-Rollback:**

```yaml
verify_deployment_health:
  stage: verify
  script:
    - echo "Monitoring deployment health..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - HEALTH_CHECK_TIMEOUT=$(yq eval '.deployment.health_check.timeout' $ENV_CONFIG | sed 's/s//')
    - HEALTH_CHECK_INTERVAL=$(yq eval '.deployment.health_check.interval' $ENV_CONFIG | sed 's/s//')
    - START_TIME=$(date +%s)
    - FAILURES=0
    - MAX_FAILURES=3
    - |
      # Список nodes читаем во временный файл и подаем в цикл через redirect:
      # pipe порождал бы subshell, и изменения FAILURES/ALL_HEALTHY терялись бы.
      yq eval -o=json -I=0 '.nodes[]' "$ENV_CONFIG" > /tmp/nodes.jsonl

      while true; do
        CURRENT_TIME=$(date +%s)
        ELAPSED=$((CURRENT_TIME - START_TIME))

        if [ $ELAPSED -gt $HEALTH_CHECK_TIMEOUT ]; then
          # Таймаут - это неуспех: job должен упасть, а не завершиться "зеленым"
          echo "❌ Health check timeout reached"
          exit 1
        fi

        # Check all nodes
        ALL_HEALTHY=true
        while read -r node; do
          NODE_NAME=$(echo "$node" | jq -r '.name')
          CONTEXT=$(echo "$node" | jq -r '.context')
          STACK=$(echo "$node" | jq -r '.stack')

          docker context use "$CONTEXT"

          # Check service health: считаем сервисы, где Running != Desired
          # (grep -v "/" тут не работал: формат Replicas всегда содержит "/")
          UNHEALTHY=$(docker service ls --filter name="$STACK" --format '{{.Replicas}}' | awk -F'/' '$1 != $2' | wc -l)

          if [ "$UNHEALTHY" -gt 0 ]; then
            echo "⚠️ Unhealthy services detected on $NODE_NAME"
            ALL_HEALTHY=false
            FAILURES=$((FAILURES + 1))
          fi
        done < /tmp/nodes.jsonl

        if $ALL_HEALTHY; then
          echo "✅ All services healthy"
          break
        fi

        if [ $FAILURES -ge $MAX_FAILURES ]; then
          echo "❌ Max failures reached: $FAILURES"
          echo "Triggering automatic rollback..."

          # Trigger rollback pipeline
          curl -X POST \
            -F "token=${CI_JOB_TOKEN}" \
            -F "ref=master" \
            -F "variables[ENVIRONMENT]=${ENVIRONMENT}" \
            -F "variables[TRIGGER_ROLLBACK]=true" \
            "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/trigger/pipeline"

          exit 1
        fi

        sleep $HEALTH_CHECK_INTERVAL
      done
  retry:
    max: 0  # No retry - trigger rollback instead
```

### 8.5 Database Migration Rollback

**Проблема:** Database migrations нельзя откатить автоматически (data loss risk).

**Strategy:**

```yaml
handle_migration_rollback:
  stage: rollback
  script:
    - echo "Handling database migration rollback..."
    - echo "⚠️ WARNING: Database migrations cannot be automatically rolled back"
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    - DB_HOST=$(yq eval '.database.host' $ENV_CONFIG)
    - DB_NAME=$(yq eval '.database.name' $ENV_CONFIG)
    # Get current migration ID
    - |
      CURRENT_MIGRATION=$(PGPASSWORD="${DB_PASSWORD}" psql \
        -h "${DB_HOST}" \
        -U coin \
        -d "${DB_NAME}" \
        -t -c "SELECT MAX(id) FROM schema_migrations;")
    - echo "Current migration ID: $CURRENT_MIGRATION"
    # Get expected migration for previous version
    - |
      PREVIOUS_MIGRATION=$(git show ${PREVIOUS_TAG}:environments/${ENVIRONMENT}/migration.txt)
      echo "Previous version migration ID: $PREVIOUS_MIGRATION"
    - |
      if [ "$CURRENT_MIGRATION" -gt "$PREVIOUS_MIGRATION" ]; then
        echo "❌ CRITICAL: New migrations were applied!"
        echo "Current: $CURRENT_MIGRATION"
        echo "Previous: $PREVIOUS_MIGRATION"
        echo ""
        echo "Manual intervention required:"
        echo "1. Review migrations between $PREVIOUS_MIGRATION and $CURRENT_MIGRATION"
        echo "2. Determine if rollback is safe (check for data loss)"
        echo "3. If safe, manually execute down migrations"
        echo "4. If not safe, consider forward fix instead"
        echo ""
        echo "Contact DBA team immediately!"

        # Send alert
        curl -X POST "$SLACK_WEBHOOK_URL" \
          -H 'Content-Type: application/json' \
          -d '{
            "text": "🚨 CRITICAL: Migration rollback required",
            "attachments": [{
              "color": "danger",
              "text": "Environment: '"$ENVIRONMENT"'\nCurrent migration: '"$CURRENT_MIGRATION"'\nTarget migration: '"$PREVIOUS_MIGRATION"'\n\nManual DBA intervention required!"
            }]
          }'
        exit 1
      else
        echo "✅ No new migrations applied, safe to rollback"
      fi
  when: on_failure
  allow_failure: false
```
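Для пункта "Review migrations" полезно иметь под рукой запрос, показывающий, что именно применено сверх целевой версии. Набросок (колонка `name` в `schema_migrations` - предположение о схеме; гарантированно есть только `id`, который auto.sh сверяет с EXPECTED_MIGRATION_ID):

```bash
# Набросок: какие миграции применены после целевой (для ревью перед откатом).
# Колонка name - предположение; при ее отсутствии достаточно SELECT id.
PGPASSWORD="${DB_PASSWORD}" psql -h "${DB_HOST}" -U coin -d "${DB_NAME}" -t -c \
  "SELECT id, name FROM schema_migrations
   WHERE id > ${PREVIOUS_MIGRATION} AND id <= ${CURRENT_MIGRATION}
   ORDER BY id;"
```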
### 8.6 Rollback Verification

**Post-Rollback Checks:**

```yaml
verify_rollback:
  stage: verify
  needs:
    - rollback_stack
  script:
    - echo "Verifying rollback success..."
    - ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"
    # 1. Check all services running
    - |
      yq eval -o=json -I=0 '.nodes[]' "$ENV_CONFIG" | while read -r node; do
        NODE_NAME=$(echo "$node" | jq -r '.name')
        CONTEXT=$(echo "$node" | jq -r '.context')
        STACK=$(echo "$node" | jq -r '.stack')

        docker context use "$CONTEXT"

        echo "Checking services on: $NODE_NAME"
        SERVICES=$(docker service ls --filter name="$STACK" --format '{{.Name}}\t{{.Replicas}}')
        echo "$SERVICES"

        # Verify all services converged
        UNCONVERGED=$(echo "$SERVICES" | awk -F'\t' '{
          split($2, a, "/")
          if (a[1] != a[2]) print $1
        }')

        if [ -n "$UNCONVERGED" ]; then
          echo "❌ Unconverged services after rollback:"
          echo "$UNCONVERGED"
          exit 1
        fi
      done
    - echo "✅ All services converged"
    # 2. Health check endpoints
    - |
      yq eval -o=json -I=0 '.nodes[]' "$ENV_CONFIG" | while read -r node; do
        NODE_NAME=$(echo "$node" | jq -r '.name')
        PUBLIC_IP=$(echo "$node" | jq -r '.public_ip // ""')

        if [ -n "$PUBLIC_IP" ]; then
          echo "Health check: https://${PUBLIC_IP}:5443/health"
          HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" "https://${PUBLIC_IP}:5443/health")

          if [ "$HTTP_CODE" != "200" ]; then
            echo "❌ Health check failed: HTTP $HTTP_CODE"
            exit 1
          fi
          echo "✅ Health check passed: $NODE_NAME"
        fi
      done
    # 3. Smoke tests
    - .gitlab/scripts/smoke-tests.sh "${ENVIRONMENT}"
    - echo "✅ Rollback verification complete"
```

### 8.7 Rollback Documentation

**Post-Rollback Report:**

```yaml
generate_rollback_report:
  stage: notify
  needs:
    - verify_rollback
  script:
    - |
      cat > rollback-report-${ENVIRONMENT}-$(date +%Y%m%d-%H%M%S).md <<EOF
      # Rollback Report

      - Environment: ${ENVIRONMENT}
      - Rolled back to: ${PREVIOUS_TAG}
      - Pipeline: ${CI_PIPELINE_URL}
      - Triggered by: ${GITLAB_USER_NAME}
      - Date: $(date -u +%Y-%m-%dT%H:%M:%SZ)
      EOF
  artifacts:
    paths:
      - rollback-report-*.md
```

---

## 9. Мониторинг и верификация

### 9.1 Key Metrics

**Prometheus Queries:**

```promql
# CPU Usage
rate(container_cpu_usage_seconds_total[5m]) * 100

# Memory Usage
container_memory_usage_bytes / container_spec_memory_limit_bytes * 100

# Network Traffic
rate(container_network_receive_bytes_total[5m])
rate(container_network_transmit_bytes_total[5m])

# HTTP Request Rate
rate(http_requests_total[5m])

# HTTP Error Rate
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) * 100

# Response Time (95th percentile)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```

### 9.2 Dashboards

**Grafana Dashboard - Deployment Overview:**

```json
{
  "dashboard": {
    "title": "COIN Deployment Dashboard",
    "panels": [
      {
        "title": "Deployment Timeline",
        "type": "graph",
        "targets": [
          { "expr": "changes(deployment_version{environment=\"$environment\"}[1h])" }
        ]
      },
      {
        "title": "Service Health",
        "type": "stat",
        "targets": [
          { "expr": "count(up{job=\"coin-api\",environment=\"$environment\"} == 1)" }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          { "expr": "rate(http_requests_total{status=~\"5..\",environment=\"$environment\"}[5m])" }
        ]
      },
      {
        "title": "Response Time (p95)",
        "type": "graph",
        "targets": [
          { "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{environment=\"$environment\"}[5m]))" }
        ]
      }
    ]
  }
}
```
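Панель "Deployment Timeline" выше опирается на метрику `deployment_version`, которую стандартные exporter'ы не отдают - ее нужно публиковать самим, например из deploy-джоба через Pushgateway. Набросок (адрес Pushgateway и имя job - условные):

```bash
# Набросок: публикация метрики deployment_version после успешного deploy.
# Предполагается развернутый Pushgateway; адрес - условный.
cat <<EOF | curl --fail --data-binary @- \
  "http://pushgateway.monitoring:9091/metrics/job/coin_deploy/environment/${ENVIRONMENT}"
# TYPE deployment_version gauge
deployment_version{release_tag="${RELEASE_TAG}"} $(date +%s)
EOF
```

Значение метрики - unix timestamp деплоя, поэтому `changes(deployment_version[1h])` на панели корректно считает количество выкаток за период.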
### 9.3 Application Health Checks

**Health Check Endpoints:**

```yaml
# docker-compose.yml
services:
  admin_api:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:10000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 40s
```

**Comprehensive Health Check Script:**

**.gitlab/scripts/health-check.sh:**

```bash
#!/usr/bin/env bash
set -euo pipefail

# Arguments:
#   $1 - BASE_URL (e.g., https://coin-node3.sandbox.company.com)
#   $2 - ENVIRONMENT

BASE_URL=$1
ENVIRONMENT=$2

# Токен для авторизованных запросов приходит из CI-переменных;
# при set -u падаем сразу с понятной ошибкой, а не посреди тестов
: "${API_TEST_TOKEN:?API_TEST_TOKEN must be set}"

echo "Running health checks against: ${BASE_URL}"

FAILED_CHECKS=0

# Test 1: Basic Health Endpoint
echo "Test 1: Health endpoint..."
HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" "${BASE_URL}/health")
if [ "$HTTP_CODE" = "200" ]; then
  echo "✅ Health check passed (HTTP $HTTP_CODE)"
else
  echo "❌ Health check failed (HTTP $HTTP_CODE)"
  FAILED_CHECKS=$((FAILED_CHECKS + 1))
fi

# Test 2: API Version
echo "Test 2: API version..."
VERSION=$(curl -k -s "${BASE_URL}/api/version" | jq -r '.version // empty')
if [ -n "$VERSION" ]; then
  echo "✅ API version: ${VERSION}"
else
  echo "❌ API version check failed"
  FAILED_CHECKS=$((FAILED_CHECKS + 1))
fi

# Test 3: Database Connectivity
echo "Test 3: Database connectivity..."
DB_STATUS=$(curl -k -s "${BASE_URL}/api/health/database" | jq -r '.status // empty')
if [ "$DB_STATUS" = "ok" ]; then
  echo "✅ Database connectivity OK"
else
  echo "❌ Database connectivity failed: $DB_STATUS"
  FAILED_CHECKS=$((FAILED_CHECKS + 1))
fi

# Test 4: Redis Connectivity
echo "Test 4: Redis connectivity..."
REDIS_STATUS=$(curl -k -s "${BASE_URL}/api/health/redis" | jq -r '.status // empty')
if [ "$REDIS_STATUS" = "ok" ]; then
  echo "✅ Redis connectivity OK"
else
  echo "❌ Redis connectivity failed: $REDIS_STATUS"
  FAILED_CHECKS=$((FAILED_CHECKS + 1))
fi

# Test 5: Critical Endpoints
echo "Test 5: Critical endpoints..."
ENDPOINTS=(
  "/api/auth/status"
  "/api/users/me"
  "/api/transactions/stats"
)

for endpoint in "${ENDPOINTS[@]}"; do
  HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer ${API_TEST_TOKEN}" \
    "${BASE_URL}${endpoint}")

  if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "401" ]; then
    echo "✅ Endpoint reachable: $endpoint (HTTP $HTTP_CODE)"
  else
    echo "❌ Endpoint failed: $endpoint (HTTP $HTTP_CODE)"
    FAILED_CHECKS=$((FAILED_CHECKS + 1))
  fi
done

# Summary
echo ""
echo "========================================"
if [ $FAILED_CHECKS -eq 0 ]; then
  echo "✅ All health checks passed"
  echo "========================================"
  exit 0
else
  echo "❌ ${FAILED_CHECKS} health check(s) failed"
  echo "========================================"
  exit 1
fi
```

### 9.4 Smoke Tests

**Post-Deployment Smoke Test Suite:**

**.gitlab/scripts/smoke-tests.sh:**

```bash
#!/usr/bin/env bash
set -euo pipefail

# Arguments:
#   $1 - ENVIRONMENT

ENVIRONMENT=$1
ENV_CONFIG="environments/${ENVIRONMENT}/config.yml"

echo "Running smoke tests for: ${ENVIRONMENT}"

FAILED_TESTS=0

# Get first node URL
FIRST_NODE=$(yq eval '.nodes[0].name' $ENV_CONFIG)
BASE_URL="https://coin-${FIRST_NODE}.${ENVIRONMENT}.company.com"

echo "Testing against: $BASE_URL"

# Test 1: User Authentication
echo "Smoke Test 1: User Authentication..."
AUTH_RESPONSE=$(curl -k -s -X POST "${BASE_URL}/api/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"username":"test_user","password":"test_password"}')

TOKEN=$(echo $AUTH_RESPONSE | jq -r '.token // empty')

if [ -n "$TOKEN" ]; then
  echo "✅ Authentication successful"
else
  echo "❌ Authentication failed"
  FAILED_TESTS=$((FAILED_TESTS + 1))
fi

# Test 2: Create Transaction
echo "Smoke Test 2: Create Transaction..."
TX_RESPONSE=$(curl -k -s -X POST "${BASE_URL}/api/transactions" \ -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{"amount":100,"currency":"USD","description":"Smoke test"}') TX_ID=$(echo $TX_RESPONSE | jq -r '.id // empty') if [ -n "$TX_ID" ]; then echo "✅ Transaction created: $TX_ID" else echo "❌ Transaction creation failed" FAILED_TESTS=$((FAILED_TESTS + 1)) fi # Test 3: Retrieve Transaction echo "Smoke Test 3: Retrieve Transaction..." TX_GET=$(curl -k -s "${BASE_URL}/api/transactions/${TX_ID}" \ -H "Authorization: Bearer $TOKEN") TX_STATUS=$(echo $TX_GET | jq -r '.status // empty') if [ "$TX_STATUS" = "pending" ] || [ "$TX_STATUS" = "completed" ]; then echo "✅ Transaction retrieved: status=$TX_STATUS" else echo "❌ Transaction retrieval failed" FAILED_TESTS=$((FAILED_TESTS + 1)) fi # Test 4: List Transactions echo "Smoke Test 4: List Transactions..." TX_LIST=$(curl -k -s "${BASE_URL}/api/transactions?limit=10" \ -H "Authorization: Bearer $TOKEN") TX_COUNT=$(echo $TX_LIST | jq '.items | length') if [ "$TX_COUNT" -gt 0 ]; then echo "✅ Transaction list retrieved: $TX_COUNT items" else echo "❌ Transaction list empty or failed" FAILED_TESTS=$((FAILED_TESTS + 1)) fi # Test 5: Webhook Endpoint echo "Smoke Test 5: Webhook Processing..." WEBHOOK_RESPONSE=$(curl -k -s -X POST "${BASE_URL}/api/webhooks/test" \ -H "X-Webhook-Secret: ${WEBHOOK_SECRET}" \ -H "Content-Type: application/json" \ -d '{"event":"test","data":{}}') WEBHOOK_STATUS=$(echo $WEBHOOK_RESPONSE | jq -r '.status // empty') if [ "$WEBHOOK_STATUS" = "processed" ]; then echo "✅ Webhook processed" else echo "❌ Webhook processing failed" FAILED_TESTS=$((FAILED_TESTS + 1)) fi # Test 6: PDF Generation echo "Smoke Test 6: PDF Generation..." PDF_RESPONSE=$(curl -k -s -X POST "${BASE_URL}/api/reports/generate" \ -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{"type":"transaction_report","format":"pdf"}') PDF_URL=$(echo $PDF_RESPONSE | jq -r '.url // empty') if [ -n "$PDF_URL" ]; then echo "✅ PDF generated: $PDF_URL" else echo "❌ PDF generation failed" FAILED_TESTS=$((FAILED_TESTS + 1)) fi # Summary echo "" echo "========================================" echo "Smoke Tests Summary" echo "========================================" if [ $FAILED_TESTS -eq 0 ]; then echo "✅ All smoke tests passed (6/6)" exit 0 else echo "❌ ${FAILED_TESTS} smoke test(s) failed" exit 1 fi ``` ### 9.5 Performance Baseline Monitoring **Response Time Tracking:** ```yaml monitor_performance_baseline: stage: verify script: - echo "Monitoring performance baseline..." - BASE_URL="https://coin-node3.${ENVIRONMENT}.company.com" # Measure response times - | echo "Endpoint,Response_Time_MS,Status" > performance-${RELEASE_TAG}.csv ENDPOINTS=( "/health" "/api/version" "/api/auth/status" "/api/transactions?limit=10" ) for endpoint in "${ENDPOINTS[@]}"; do RESPONSE_TIME=$(curl -k -s -o /dev/null -w "%{time_total}" "${BASE_URL}${endpoint}") HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" "${BASE_URL}${endpoint}") RESPONSE_TIME_MS=$(echo "$RESPONSE_TIME * 1000" | bc) echo "${endpoint},${RESPONSE_TIME_MS},${HTTP_CODE}" >> performance-${RELEASE_TAG}.csv done - cat performance-${RELEASE_TAG}.csv # Compare with baseline - | if [ -f "performance-baseline.csv" ]; then echo "Comparing with baseline..." 
# Simple comparison (production should use proper analysis)
        CURRENT_AVG=$(awk -F',' 'NR>1 {sum+=$2; count++} END {print sum/count}' performance-${RELEASE_TAG}.csv)
        BASELINE_AVG=$(awk -F',' 'NR>1 {sum+=$2; count++} END {print sum/count}' performance-baseline.csv)

        DEGRADATION=$(echo "scale=2; ($CURRENT_AVG - $BASELINE_AVG) / $BASELINE_AVG * 100" | bc)

        echo "Current average: ${CURRENT_AVG}ms"
        echo "Baseline average: ${BASELINE_AVG}ms"
        echo "Degradation: ${DEGRADATION}%"

        # Alert if degradation > 20%
        if (( $(echo "$DEGRADATION > 20" | bc -l) )); then
          echo "⚠️ Performance degradation detected: ${DEGRADATION}%"
          echo "Consider rollback or investigation"
        fi
      else
        echo "No baseline found, creating..."
        cp performance-${RELEASE_TAG}.csv performance-baseline.csv
      fi
  artifacts:
    paths:
      - performance-*.csv
    expire_in: 30 days
```

### 9.6 Alerting Configuration

**Alertmanager Rules:**

```yaml
# alertmanager.yml
route:
  group_by: ['alertname', 'environment']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'slack-notifications'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      continue: true
    - match:
        severity: warning
        environment: production
      receiver: 'slack-production'
    - match:
        environment: sandbox
      receiver: 'slack-sandbox'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_URL}'
        channel: '#deployments'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: '${PAGERDUTY_SERVICE_KEY}'
        description: '{{ .GroupLabels.alertname }}'

  - name: 'slack-production'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_PRODUCTION}'
        channel: '#production-alerts'
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'

  # Receiver для sandbox-маршрута: без него конфиг не проходит валидацию
  # (канал - условный)
  - name: 'slack-sandbox'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_URL}'
        channel: '#sandbox-alerts'
```

**Alert Rules:**

```yaml
# prometheus-rules.yml
groups:
  - name: deployment_alerts
    interval: 30s
    rules:
      - alert: DeploymentFailed
        expr: deployment_status{environment="production"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "Deployment to {{ $labels.node }} failed"

      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5..",environment="production"}[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "High 5xx rate detected: {{ $value }} req/s"

      - alert: ServiceDown
        expr: up{job="coin-api",environment="production"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Service {{ $labels.instance }} is down"

      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Container {{ $labels.container }} memory usage > 90%"
```

---

## 10. План внедрения

### 10.1 Phased Rollout Strategy

**4-Phase Approach:**

```
Phase 1: Infrastructure Setup (Week 1-2)
├── GitLab Runner installation
├── Docker context configuration
├── SOPS setup
├── Monitoring stack deployment
└── Testing infrastructure

Phase 2: Development Environment (Week 3-4)
├── Migrate development to GitOps
├── Create pipeline templates
├── Test basic workflows
├── Train team
└── Collect feedback

Phase 3: Sandbox + Testing (Week 5-6)
├── Migrate sandbox environment
├── Implement approval workflows
├── Add advanced features (rollback, etc.)
├── Performance tuning └── Documentation Phase 4: Production Ready (Week 7-8) ├── Production configuration ├── Security hardening ├── Disaster recovery testing ├── Final training └── Go-live ``` ### 10.2 Week-by-Week Implementation Plan **Week 1: Foundation** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Kickoff meeting, Requirements review | Project charter | | Tue | GitLab Runner installation, Docker context setup | Working runner | | Wed | Create repository structure, Initial pipeline | Base .gitlab-ci.yml | | Thu | SOPS installation, GPG key generation | Encrypted secrets | | Fri | Monitoring stack deployment | Prometheus + Grafana | **Week 2: Development Pipeline** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Development environment configuration | config.yml | | Tue | Prepare stage implementation | Extract + prepare scripts | | Wed | Deploy stage implementation | Deployment automation | | Thu | Verification stage implementation | Health checks + smoke tests | | Fri | End-to-end testing | Working dev pipeline | **Week 3: Sandbox Migration** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Sandbox configuration creation | Sandbox config files | | Tue | Secret migration to SOPS | Encrypted secrets | | Wed | Pipeline adaptation | Sandbox-specific jobs | | Thu | Testing + validation | Successful deployment | | Fri | Parallel running (old + new) | Comparison data | **Week 4: Advanced Features** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Rollback implementation | Rollback pipeline | | Tue | Automatic rollback triggers | Health-based rollback | | Wed | Performance monitoring | Baseline tracking | | Thu | Alert configuration | Alerting rules | | Fri | Documentation update | User guides | **Week 5: Testing Environment** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Testing environment setup | Testing configs | | Tue | Approval workflow implementation | Manual gates | | Wed | Integration with QA processes | QA checklist | | Thu | Environment promotion testing | Promotion pipeline | | Fri | Load testing | Performance report | **Week 6: Production Preparation** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Production configuration | Prod configs | | Tue | Security hardening | Security audit | | Wed | Disaster recovery setup | DR procedures | | Thu | Change Advisory Board integration | CAB workflow | | Fri | Production dry-run | Test results | **Week 7: Production Migration** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Final security review | Sign-off | | Tue | Production secrets migration | Encrypted prod secrets | | Wed | Production pipeline testing | Test deployment | | Thu | Go-live preparation | Runbooks | | Fri | Production go-live | First prod deployment | **Week 8: Stabilization** | Day | Tasks | Deliverables | |-----|-------|--------------| | Mon | Monitor production deployments | Metrics report | | Tue | Address any issues | Bug fixes | | Wed | Team training sessions | Training materials | | Thu | Documentation finalization | Complete docs | | Fri | Project retrospective | Lessons learned | ### 10.3 Success Criteria **Technical Metrics:** | Metric | Target | Measurement | |--------|--------|-------------| | Deployment time | < 15 min | Pipeline duration | | Success rate | > 95% | Successful/total deploys | | Rollback time | < 5 min | Rollback duration | | MTTR | < 30 min | Mean time to recovery | | Pipeline reliability 
| > 99% | Runner uptime | **Process Metrics:** | Metric | Target | Measurement | |--------|--------|-------------| | Manual steps | < 2 per deploy | Process audit | | Approval time | < 2 hours | Approval duration | | Documentation coverage | 100% | Doc review | | Team training | 100% | Training completion | | Knowledge transfer | Complete | Quiz scores | **Business Metrics:** | Metric | Target | Measurement | |--------|--------|-------------| | Deployment frequency | 2x increase | Deploy count | | Lead time | 50% reduction | Commit to production | | Change failure rate | < 5% | Failed/total changes | | Team satisfaction | > 80% | Survey results | | Cost savings | Measurable | Time saved × hourly rate | ### 10.4 Risk Mitigation **Identified Risks:** | Risk | Probability | Impact | Mitigation | |------|-------------|--------|------------| | Pipeline failures during migration | High | Medium | Parallel running, quick rollback | | Secret leakage | Low | Critical | SOPS encryption, access control | | Learning curve | Medium | Medium | Training, documentation, support | | Production incident | Low | Critical | Comprehensive testing, gradual rollout | | Resistance to change | Medium | Medium | Change management, stakeholder buy-in | **Contingency Plans:** 1. **Pipeline Failure:** - Keep manual scripts as backup - Document emergency procedures - 24/7 support during migration 2. **Security Incident:** - Immediate secret rotation - Audit all access - Incident response team activation 3. **Team Issues:** - Extended training period - Pair programming sessions - Dedicated support channel ### 10.5 Training Plan **Training Modules:** **Module 1: GitOps Fundamentals (2 hours)** - Infrastructure as Code concepts - Git workflow и best practices - CI/CD pipeline basics - Hands-on: Create simple pipeline **Module 2: COIN Pipeline Deep Dive (3 hours)** - Pipeline architecture overview - Stage-by-stage walkthrough - Configuration management - Hands-on: Trigger deployment **Module 3: Secrets Management (2 hours)** - SOPS usage - Secret rotation procedures - Security best practices - Hands-on: Encrypt/decrypt secrets **Module 4: Troubleshooting (2 hours)** - Reading pipeline logs - Common failure scenarios - Debug techniques - Hands-on: Fix failing pipeline **Module 5: Rollback Procedures (2 hours)** - When to rollback - Rollback execution - Verification steps - Hands-on: Perform rollback **Module 6: Monitoring & Alerts (2 hours)** - Dashboard overview - Alert interpretation - Response procedures - Hands-on: Respond to alert ### 10.6 Post-Implementation Support **Support Structure:** ``` Tier 1: Self-Service ├── Documentation wiki ├── Troubleshooting guides ├── FAQ └── Video tutorials Tier 2: Team Support ├── Slack channel: #cicd-support ├── Office hours: Daily 10-11 AM ├── Email: devops-support@company.com └── Response time: < 4 hours Tier 3: Expert Support ├── On-call DevOps engineer ├── Escalation for critical issues ├── Response time: < 1 hour └── 24/7 for production ``` **Continuous Improvement:** - Weekly metrics review - Monthly retrospectives - Quarterly pipeline optimization - Annual security audit - Regular training updates --- ## Заключение ### Итоговое решение Универсальный GitLab CI/CD pipeline для COIN приложения **полностью реализуем** и обеспечит: ✅ **Автоматизацию** - 90% reduction ручных операций ✅ **Универсальность** - поддержка всех 4 окружений ✅ **Безопасность** - SOPS encryption + audit trail ✅ **Надежность** - automatic rollback + health checks ✅ **Observability** - comprehensive monitoring 
✅ **Скорость** - 3x faster deployments ### Ключевые преимущества 1. **Единый процесс** для всех окружений 2. **Git как source of truth** для всех конфигураций 3. **Автоматический deployment** с manual gates где нужно 4. **Built-in rollback** с verification 5. **Comprehensive monitoring** на всех уровнях 6. **Полная прослеживаемость** всех изменений ### Следующие шаги 1. Review этого документа с командой 2. Утверждение implementation плана 3. Allocation ресурсов (8 недель, 1-2 FTE) 4. Kickoff meeting 5. Start Phase 1 implementation **Документ готов для начала внедрения!** 🚀