Claude AI dd061c8f1f docs: add detailed architecture document for GitOps solution

2026-01-12 13:00:31 +00:00


FinTech GitOps CI/CD - Solution Architecture

Version: 1.0
Date: January 2026
Target audience: Architects, DevOps, Infrastructure, Security

1. Overall Architecture

1.1 Design Principles

Defense in Depth: Layered protection with isolation at every level:

Network segmentation via VLANs
Firewalls between all zones
Application-level authentication and authorization
Encryption at rest and in transit
Audit logging at every level

Least Privilege: The minimum rights each component needs:

Service accounts with limited permissions
Network access only to required endpoints
Time-bound credentials where possible
Regular rotation of secrets

Immutable Infrastructure: Infrastructure as code, with changes made only through Git:

No manual changes on servers
All changes are version controlled
Reproducible deployments
Easy rollback via Git history

Observability: Full visibility into all processes:

Centralized logging
Metrics from all components
Distributed tracing for requests
Audit trail for compliance

1.2 Logical Layers

Presentation Layer (User Interface):

Portainer UI for visual management of the Swarm
Grafana for dashboards and metrics
Jenkins Blue Ocean for CI/CD visualization
Ollama web interface for AI interaction
Gitea web UI for repository management

API Layer:

Docker Swarm API for cluster management
Harbor API for registry operations
Gitea API for Git operations
Jenkins API for triggering builds
Prometheus API for metrics
MCP Server API for AI integration

Service Layer (Business Logic):

GitOps Operator - automatic synchronization
Jenkins pipelines - CI/CD logic
Harbor webhooks - notifications about new images
AlertManager - alerting rules
AI models - request processing

Data Layer:

PostgreSQL - relational data
Git repositories - code and configuration
Harbor storage - Docker images
Prometheus TSDB - metric time series
Loki - logs
Vector DB - embeddings for AI

Infrastructure Layer:

Docker Swarm - orchestration platform
Overlay networks - service communication
Shared storage - persistent data
Backup systems - disaster recovery

2. Network Architecture

2.1 VLAN Segmentation

VLAN 10 - Management & CI/CD Zone:

Subnet: 10.10.10.0/24
Gateway: 10.10.10.1
Components: Gitea, Jenkins, Harbor, GitOps Operator, Portainer
Access: Only via VPN with MFA
Isolation: Strict firewall at the boundary

VLAN 20 - Docker Swarm Cluster Zone:

Subnet: 10.20.0.0/16 (sized for a large number of containers)
Manager subnet: 10.20.1.0/24
Worker subnet: 10.20.2.0/23
Gateway: 10.20.0.1
Components: Swarm managers, workers, overlay networks
Access: Only from the Management zone and the Monitoring zone
Isolation: Encrypted overlay network inside the zone

VLAN 30 - AI & Analytics Zone:

Subnet: 10.30.10.0/24
Gateway: 10.30.10.1
Components: Ollama, MCP Server, Vector Database
Access: Read-only to data sources
Isolation: Cannot initiate changes in other zones

VLAN 40 - Monitoring & Logging Zone:

Subnet: 10.40.10.0/24
Gateway: 10.40.10.1
Components: Prometheus, Grafana, Loki, AlertManager
Access: Read-only metrics collection
Isolation: Cannot manage components

VLAN 50 - Data & Database Zone:

Subnet: 10.50.0.0/16
Infrastructure DB subnet: 10.50.10.0/24
Application DB subnet: 10.50.20.0/23
Storage subnet: 10.50.30.0/24
Gateway: 10.50.0.1
Components: PostgreSQL, Application databases, Shared storage
Access: Strictly controlled, encrypted connections
Isolation: The strictest; all connections are audited

VLAN 60 - Backup & DR Zone:

Subnet: 10.60.10.0/24
Gateway: 10.60.10.1
Components: Backup server, long-term storage
Access: Write-only for backup agents, read for recovery
Isolation: Offline storage, air-gapped where possible
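
An address plan like the one above is easy to get wrong when subnets are added later. The following is a minimal sketch, using only Python's standard ipaddress module, that checks that zone subnets do not overlap and that each child subnet lies inside its zone. The VLAN_PLAN dictionary and the check_plan function are illustrative names, not part of the described solution.

```python
import ipaddress

# Address plan copied from the VLAN list above; the dict layout is illustrative.
VLAN_PLAN = {
    10: ("10.10.10.0/24", []),
    20: ("10.20.0.0/16", ["10.20.1.0/24", "10.20.2.0/23"]),
    30: ("10.30.10.0/24", []),
    40: ("10.40.10.0/24", []),
    50: ("10.50.0.0/16", ["10.50.10.0/24", "10.50.20.0/23", "10.50.30.0/24"]),
    60: ("10.60.10.0/24", []),
}


def check_plan(plan):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    zones = {vlan: ipaddress.ip_network(cidr) for vlan, (cidr, _) in plan.items()}
    vlan_ids = sorted(zones)
    # Zone subnets must not overlap each other.
    for i, a in enumerate(vlan_ids):
        for b in vlan_ids[i + 1:]:
            if zones[a].overlaps(zones[b]):
                problems.append(f"VLAN {a} overlaps VLAN {b}")
    # Child subnets must sit inside their zone and not overlap each other.
    for vlan, (_, children) in plan.items():
        nets = [ipaddress.ip_network(c) for c in children]
        for i, net in enumerate(nets):
            if not net.subnet_of(zones[vlan]):
                problems.append(f"{net} is outside VLAN {vlan}")
            for other in nets[i + 1:]:
                if net.overlaps(other):
                    problems.append(f"{net} overlaps {other} in VLAN {vlan}")
    return problems


print(check_plan(VLAN_PLAN) or "address plan is consistent")
```

Because ip_network rejects prefixes with host bits set, the same call also catches misaligned entries such as a /23 that starts on an odd third octet.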

2.2 Firewall Rules

Principle: Deny all, allow only what is explicitly needed

Management VLAN → Swarm VLAN:

Source: 10.10.10.40 (GitOps Operator)
Destination: 10.20.1.0/24 (Swarm Managers)
Ports: 2377/tcp (cluster management)
Action: ALLOW
Logging: YES

Source: 10.10.10.50 (Portainer)
Destination: 10.20.1.0/24 (Swarm Managers)
Ports: 2376/tcp (Docker API over TLS; 2375 is the conventional unencrypted port)
Action: ALLOW
Logging: YES

All other traffic: DENY

Swarm VLAN → Harbor (Management VLAN):

Source: 10.20.0.0/16 (All Swarm nodes)
Destination: 10.10.10.30 (Harbor)
Ports: 443/tcp, 5000/tcp (HTTPS, Docker registry)
Protocol: TLS 1.3 with mutual authentication
Action: ALLOW
Logging: YES

All other traffic: DENY

AI VLAN → Data Sources:

Source: 10.30.10.20 (MCP Server)
Destination: Multiple (via MCP connectors)
Ports: Varies (SSH 22, HTTPS 443, PostgreSQL 5432, etc.)
Access: READ-ONLY
Authentication: Service account per destination
Logging: ALL QUERIES LOGGED
Action: ALLOW with rate limiting

Write operations: DENY

Monitoring VLAN → All Zones:

Source: 10.40.10.10 (Prometheus)
Destination: ALL VLANs
Ports: Metrics endpoints (usually 9090-9999)
Access: READ-ONLY metrics scraping
Action: ALLOW
Logging: NO (too verbose, metrics only)

Any non-metrics ports: DENY

Data VLAN → Backup VLAN:

Source: 10.50.0.0/16 (All databases)
Destination: 10.60.10.10 (Backup server)
Ports: Backup protocol specific
Direction: ONE-WAY (source → backup only)
Action: ALLOW
Logging: YES
Encryption: MANDATORY

Reverse direction: DENY (except for restore procedures)
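
The deny-by-default principle can be illustrated with a small rule evaluator. This is a sketch only: the Rule class and is_allowed function are hypothetical names, and just two of the rules above are encoded. A real firewall would also track protocol, direction, and connection state.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str          # CIDR of the allowed source
    destination: str     # CIDR of the allowed destination
    ports: frozenset     # allowed TCP destination ports


# Two rules taken from the tables above; everything else is denied.
RULES = [
    Rule("10.10.10.40/32", "10.20.1.0/24", frozenset({2377})),      # GitOps Operator -> Swarm managers
    Rule("10.20.0.0/16", "10.10.10.30/32", frozenset({443, 5000})),  # Swarm nodes -> Harbor
]


def is_allowed(src, dst, port, rules=RULES):
    """Return True only if some rule explicitly allows the connection."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in rules:
        if (src_ip in ipaddress.ip_network(rule.source)
                and dst_ip in ipaddress.ip_network(rule.destination)
                and port in rule.ports):
            return True
    return False  # default deny


print(is_allowed("10.10.10.40", "10.20.1.2", 2377))  # GitOps Operator to a manager
print(is_allowed("10.10.10.20", "10.20.1.2", 2377))  # Jenkins to a manager: no matching rule
```

The function has no "deny" rules at all: anything not matched by an allow rule falls through to the final return, which is what "deny all, allow explicitly" means in practice.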

2.3 External Connectivity

VPN Gateway:

Public IP for VPN connections
Multi-factor authentication is mandatory
Certificate-based authentication + one-time password
Split tunneling is prohibited (all traffic goes through the VPN)
Session timeout: 8 hours
Idle timeout: 30 minutes
Disconnect after 3 failed MFA attempts

Jump Host/Bastion:

Single point of entry after the VPN
Session recording for audit
No direct access to production systems, only through the jump host
Centralized management of authorized keys
Automatic logout after 15 minutes idle
Audit log of all commands

Authorized users:

Developers: Access to Gitea, Jenkins, Portainer (read-only for production)
DevOps: Full access to all management systems
Security team: Read-only audit access to everything
Managers: Grafana and reporting dashboards only

3. Zones and Their Purpose

3.1 Management & CI/CD Zone

Purpose: Centralized management of code, CI/CD processes, and the container registry.

Criticality: HIGH - downtime blocks deployment of new versions

Components:

Gitea (10.10.10.10):

Role: Single source of truth for all code and configuration
Interaction: Accepts pushes from developers, sends webhooks to Jenkins
Dependencies: PostgreSQL (VLAN 50), Shared storage for Git LFS
SLA: 99.9% uptime

Jenkins (10.10.10.20):

Role: CI automation; builds and tests applications
Interaction: Receives webhooks from Gitea, pushes images to Harbor, updates Git
Dependencies: Gitea, Harbor, Docker build agents
SLA: 99.5% uptime (can run in degraded mode)

Harbor (10.10.10.30):

Role: Enterprise container registry with security scanning
Interaction: Accepts pushes from Jenkins, serves pulls to Swarm nodes
Dependencies: PostgreSQL, Object storage for images
SLA: 99.9% uptime (critical for image pulls)

GitOps Operator (10.10.10.40):

Role: Automatic Git → Swarm synchronization
Interaction: Monitors Gitea, applies changes to the Swarm through the API
Dependencies: Gitea, Docker Swarm API
SLA: 99.9% uptime

Portainer (10.10.10.50):

Role: Web UI for managing and monitoring the Swarm
Interaction: Connects to Swarm managers through the Docker API
Dependencies: Docker Swarm API, PostgreSQL for its own database
SLA: 99% uptime (not critical; a CLI alternative exists)

Redundancy:

Gitea: Master-slave replication, automated failover
Jenkins: Standby instance in warm mode
Harbor: Geo-replication to a secondary site
GitOps Operator: Active-passive pair
Portainer: Standby instance

3.2 Docker Swarm Cluster Zone

Purpose: Running production workloads with high availability and load balancing.

Criticality: CRITICAL - direct impact on business services

Swarm Manager Nodes (10.20.1.1-3):

Count: 3 for quorum (an odd number is recommended)
Role: Cluster orchestration, scheduling, API endpoint
Raft consensus: At least 2 of the 3 managers must be alive for the cluster to operate
Workload: Do NOT run application containers (infrastructure only)
CPU: 4 vCPU each
RAM: 8 GB each
Disk: 200 GB SSD each
Network: 10 Gbps for Raft communication

Swarm Worker Nodes (10.20.2.1-N):

Count: Depends on the workload, minimum 3 for redundancy
Role: Running application containers
Constraints: Nodes can be labeled for specific workloads
CPU: 8-16 vCPU each
RAM: 32-64 GB each
Disk: 500 GB SSD each
Network: 10 Gbps for overlay network performance

Overlay Networks:

IPsec encryption of overlay traffic (enabled per network with the encrypted option)
Service discovery via DNS
Load balancing via the routing mesh
Isolation between different stacks

Secrets Management:

Docker Swarm secrets encrypted at rest
Rotation via stack update
Mounted as files in containers
Audit log of access to secrets

Redundancy:

Manager nodes: Tolerates (N-1)/2 failures, rounded down (3 nodes = 1 failure tolerated)
Worker nodes: Application replicas are spread across different nodes
Persistent data: Replicated storage (GlusterFS or NFS with HA)
Network: Bonded interfaces for redundancy
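
The manager quorum rule described above follows from Raft's majority requirement. A short sketch of the arithmetic (the function names are illustrative):

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """How many managers can fail while a majority is still alive."""
    return (managers - 1) // 2


for n in range(1, 8):
    print(f"managers={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

The loop shows why odd manager counts are recommended: going from 3 to 4 managers raises the quorum from 2 to 3 but still tolerates only one failure.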

3.3 AI & Analytics Zone

Purpose: AI-powered assistance based on analysis of internal data sources.

Criticality: MEDIUM - a convenience, not critical for operations

Ollama Server (10.30.10.10):

Role: Runs AI models locally on in-house hardware
Models: Llama 3.3 70B, Qwen 2.5 Coder, DeepSeek, and others
Interaction: Receives requests from users and context from the MCP Server
Requirements: GPU highly recommended for performance
CPU: 16 vCPU (or fewer if a GPU is available)
RAM: 64 GB (models require a lot of memory)
GPU: NVIDIA A100 40GB or 2x RTX 4090 24GB (optional but recommended)
Disk: 2 TB NVMe SSD (each model takes 10-100 GB)
Network: 10 Gbps for fast responses

MCP Server (10.30.10.20):

Role: Integrates AI with data sources (Gitea, Swarm, DBs, logs)
Connectors: Modular plugins, one per source
Interaction: Read-only queries to data sources, passes context to Ollama
Security: A service account for each connector, audit of all queries
CPU: 8 vCPU
RAM: 16 GB
Disk: 100 GB SSD
Network: 1 Gbps

Vector Database (10.30.10.30):

Role: Stores documentation embeddings for semantic search
Technology: Qdrant or Milvus
Size: Depends on the amount of documentation
CPU: 4 vCPU
RAM: 16 GB (depends on index size)
Disk: 500 GB SSD
Network: 1 Gbps

Data Flow: User → Ollama → MCP Server → (in parallel):

Gitea MCP Connector → Gitea (documentation, code)
Swarm MCP Connector → Docker API (status, logs)
Database MCP Connector → PostgreSQL (metadata)
Prometheus MCP Connector → Metrics
Loki MCP Connector → Logs

→ Aggregated context → Ollama → Response to the user

Redundancy:

Ollama: Standby instance (warm standby)
MCP Server: Active-passive pair
Vector DB: Replicated for HA

3.4 Monitoring & Logging Zone

Purpose: Infrastructure observability for proactive monitoring and troubleshooting.

Criticality: HIGH - required for detecting problems

Prometheus (10.40.10.10):

Role: Collects and stores time-series metrics
Scrape targets: All infrastructure components
Retention: 30 days in Prometheus, long-term in Thanos/VictoriaMetrics
CPU: 8 vCPU
RAM: 32 GB
Disk: 2 TB HDD (time-series data)
Network: 1 Gbps

Grafana (10.40.10.20):

Role: Visualization of metrics and logs
Dashboards: Preconfigured for each component
Alerting: Visual alert editor
CPU: 4 vCPU
RAM: 8 GB
Disk: 100 GB SSD
Network: 1 Gbps

Loki (10.40.10.30):

Role: Centralized log storage
Agents: Promtail on every node
Retention: 90 days
CPU: 8 vCPU
RAM: 16 GB
Disk: 5 TB HDD (logs)
Network: 1 Gbps

AlertManager (10.40.10.40):

Role: Alert processing and routing
Integrations: Slack, Email, PagerDuty, Telegram
Deduplication: Grouping of similar alerts
CPU: 2 vCPU
RAM: 4 GB
Disk: 50 GB SSD
Network: 1 Gbps

Redundancy:

Prometheus: Federated setup, multiple instances
Grafana: Load balanced instances
Loki: Distributed deployment
AlertManager: Clustered for HA

3.5 Data & Database Zone

Purpose: Storage of persistent data for infrastructure and applications.

Criticality: CRITICAL - data loss is unacceptable

Infrastructure PostgreSQL Cluster (10.50.10.10-11):

Role: Databases for Gitea, Harbor, Portainer
Topology: Master-slave with automatic failover
Backup: Continuous WAL archiving + daily full backup
Encryption: At rest (LUKS) and in transit (TLS)
CPU: 8 vCPU per instance
RAM: 16 GB per instance
Disk: 500 GB SSD per instance
Network: 10 Gbps

Application Databases (10.50.20.x):

Role: Databases of business applications
Technologies: Depends on the applications (PostgreSQL, MySQL, MongoDB)
Isolation: Each application in its own database/schema
Backup: Application-specific strategy
Resources: Depends on the workload

Shared Storage (10.50.30.1-3):

Role: Persistent volumes for Swarm services
Technology: GlusterFS (replicated) or NFS with HA
Replication: 3x for fault tolerance
Snapshots: Hourly, retention 7 days
Capacity: 10 TB (grows as needed)
Network: 10 Gbps for I/O performance

Redundancy:

PostgreSQL: Synchronous replication, automatic failover
Shared Storage: Distributed replication (GlusterFS 3-way)
Backups: Multiple copies in different locations

3.6 Backup & DR Zone

Purpose: Protection against data loss and fast recovery after disasters.

Criticality: CRITICAL for long-term business resilience

Backup Server (10.60.10.10):

Role: Receives and stores backups
Technology: Bacula or Bareos (enterprise backup solution)
Scheduling: Automated on a schedule + on-demand
Encryption: All backups encrypted at rest
CPU: 4 vCPU
RAM: 8 GB
Disk: 20 TB HDD (RAID 10)
Network: 10 Gbps for fast backups

Backup Strategy:

Hourly Incremental:

Git repositories (changes only)
Retention: 48 hours

Daily Full:

Databases (full dump)
Docker Swarm configs
Important logs
Retention: 30 days

Weekly Full:

Full snapshot of the entire infrastructure
VM images, configs, data
Retention: 12 weeks

Monthly Archives:

Long-term compliance storage
Retention: 7 years (regulatory requirement)
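
The retention periods in the backup strategy above can be expressed as a small lookup that a cleanup job might consult. This is a sketch under assumptions: the tier names and the is_expired function are hypothetical, and 7 years is approximated as 7 x 365 days.

```python
from datetime import datetime, timedelta

# Retention periods from the backup strategy above.
RETENTION = {
    "hourly": timedelta(hours=48),
    "daily": timedelta(days=30),
    "weekly": timedelta(weeks=12),
    "monthly": timedelta(days=7 * 365),  # 7 years, ignoring leap days
}


def is_expired(tier: str, created: datetime, now: datetime) -> bool:
    """Return True if a backup of the given tier is older than its retention period."""
    return now - created > RETENTION[tier]


now = datetime(2026, 1, 12, 13, 0)
print(is_expired("hourly", datetime(2026, 1, 9, 13, 0), now))   # 72 hours old
print(is_expired("daily", datetime(2026, 1, 1, 0, 0), now))     # about 11 days old
```

Keeping the periods in one table makes it harder for the hourly, daily, weekly, and monthly jobs to drift apart from the documented policy.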

DR Site (optional, in a different data center):

Роль: Geographic redundancy
Replication: Asynchronous из primary site
RTO (Recovery Time Objective): 4 hours
RPO (Recovery Point Objective): 15 minutes
Testing: Quarterly DR drills

4. Data Flows

4.1 Development Workflow

Developer commits code:

Developer Workstation
↓ (SSH over VPN)
Gitea (VLAN 10)
↓ (Webhook HTTPS + signature verification)
Jenkins (VLAN 10)
↓ (git clone through SSH)
Gitea
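
The webhook signature verification step in this flow can be sketched as follows. The sketch assumes the sender signs the raw request body with HMAC-SHA256 and transmits the hex digest in a request header; the exact header name and algorithm should be confirmed against the Gitea webhook documentation. The function names and the sample secret are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature of a webhook body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign(secret, body), signature_hex)


secret = b"example-shared-secret"            # placeholder, not a real credential
body = b'{"ref": "refs/heads/main"}'
signature = sign(secret, body)               # what the sender would attach

print(verify(secret, body, signature))               # untouched payload
print(verify(secret, body + b" ", signature))        # tampered payload
```

The receiver must verify against the raw bytes it received, before any JSON parsing or re-serialization, because even whitespace changes alter the digest.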

CI Pipeline execution:

Jenkins
↓ (build application)
Build Agent (ephemeral container/VM)
↓ (run tests)
Test results → Archived in Jenkins
↓ (build Docker image)
Docker build agent
↓ (security scan with Trivy)
Vulnerability report
↓ (docker push over TLS + creds)
Harbor (VLAN 10)

Update GitOps repo:

Jenkins
↓ (update image tag in compose file)
Gitea GitOps repository
↓ (commit + push)
Gitea
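
The "update image tag" step could look like the sketch below. It is a plain-text rewrite rather than a YAML round trip, so comments and formatting in the compose file are preserved. The function name, the registry host, and the repository path are hypothetical examples.

```python
import re


def set_image_tag(compose_text: str, repository: str, new_tag: str) -> str:
    """Replace the tag on every 'image: <repository>:<tag>' line."""
    pattern = re.compile(
        r"^([ \t]*image:[ \t]*" + re.escape(repository) + r"):\S+[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash escapes in new_tag being interpreted.
    return pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", compose_text)


compose = (
    "services:\n"
    "  api:\n"
    "    image: harbor.example.internal/fintech/api:1.4.2\n"
    "    deploy:\n"
    "      replicas: 3\n"
)
print(set_image_tag(compose, "harbor.example.internal/fintech/api", "1.5.0"))
```

Only lines whose repository matches exactly are touched, so other services in the same file keep their tags. Pinning by digest instead of tag would need a different pattern.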

4.2 CD Workflow

GitOps sync:

GitOps Operator (VLAN 10)
↓ (poll Git repository every 30 sec)
Gitea
↓ (detect changes)
GitOps Operator
↓ (docker stack deploy via Swarm API)
Swarm Managers (VLAN 20)
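
The polling loop in this flow can be sketched with the Git lookup and the deploy step injected as callables, which keeps the logic testable without a real cluster. All names here are illustrative; in a real operator get_head would query Gitea and deploy would run the stack deployment.

```python
import time


def sync_loop(get_head, deploy, interval_seconds=30.0, max_polls=None, sleep=time.sleep):
    """Poll for the current commit and deploy whenever it changes.

    get_head: callable returning the current commit id of the GitOps repo.
    deploy:   callable applying the given commit to the cluster.
    Returns the list of commits that were deployed.
    """
    last_applied = None
    deployed = []
    polls = 0
    while max_polls is None or polls < max_polls:
        head = get_head()
        if head != last_applied:
            deploy(head)              # if this raises, last_applied stays unchanged
            last_applied = head
            deployed.append(head)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
    return deployed


# Simulated repository history: the head changes on the third poll.
heads = iter(["a1b2c3", "a1b2c3", "d4e5f6"])
print(sync_loop(lambda: next(heads), lambda commit: None, max_polls=3, sleep=lambda s: None))
```

Comparing against the last applied commit, rather than the last seen one, means a failed deploy is retried on the next run instead of being silently skipped.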

Swarm orchestration:

Swarm Manager
↓ (schedule tasks on workers)
Swarm Scheduler
↓ (pull image from Harbor)
Worker Nodes ↔ Harbor (VLAN 10)
↓ (start containers)
Application Running

Service update (rolling):

Swarm Manager
↓ (stop 1 task of N)
Worker Node A
↓ (start new task with the new image)
Worker Node B
↓ (verify health check)
Health Check (5 consecutive passes required)
↓ (proceed to next task)
Repeat until all tasks updated
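
The rolling update sequence above can be simulated in a few lines. This is a toy model, not Swarm's scheduler: the function name, the max_checks limit, and the decision to restore the failed task and halt are assumptions made for illustration.

```python
def rolling_update(tasks, new_image, health_check, required_passes=5, max_checks=20):
    """Update tasks one at a time; halt at the first task that never becomes healthy.

    tasks:        list of image names currently running, one per task.
    health_check: callable taking the task index and returning True or False.
    Returns (resulting task list, True if every task was updated).
    """
    updated = list(tasks)
    for i in range(len(updated)):
        previous = updated[i]
        updated[i] = new_image
        passes = 0
        for _ in range(max_checks):
            # A failed check resets the streak: passes must be consecutive.
            passes = passes + 1 if health_check(i) else 0
            if passes >= required_passes:
                break
        else:
            updated[i] = previous  # restore this task and stop the rollout
            return updated, False
    return updated, True


print(rolling_update(["v1", "v1", "v1"], "v2", lambda i: True))
print(rolling_update(["v1", "v1", "v1"], "v2", lambda i: i != 1))
```

Updating one task at a time limits the blast radius: in the second call the rollout stops at the unhealthy task, and the remaining tasks keep serving the old image.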

4.3 AI Interaction Flow

User query:

User (via Web UI or API)
↓ (HTTPS request)
Ollama Server (VLAN 30)
↓ (request context via MCP protocol)
MCP Server (VLAN 30)

MCP gathers context (parallel):

MCP Server
├→ Gitea MCP Connector → Gitea API (docs, code)
├→ Swarm MCP Connector → Docker API (logs, metrics)
├→ Database MCP Connector → PostgreSQL (metadata)
├→ Prometheus MCP Connector → Prometheus API (metrics)
└→ Loki MCP Connector → Loki API (logs)
↓ (all responses aggregated)
MCP Server
↓ (full context sent to AI)
Ollama Server
↓ (generate response)
User
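
The parallel context gathering step can be sketched with asyncio. The connector functions below are stubs standing in for the real MCP connectors, and gather_context is a hypothetical name. The sketch shows one reasonable design choice: a slow or failing source yields None instead of failing the whole request.

```python
import asyncio


async def gather_context(query, connectors, timeout=5.0):
    """Query all connectors concurrently; failed or slow ones map to None."""
    names = list(connectors)
    results = await asyncio.gather(
        *(asyncio.wait_for(connectors[name](query), timeout) for name in names),
        return_exceptions=True,  # collect errors instead of cancelling the batch
    )
    return {
        name: None if isinstance(result, Exception) else result
        for name, result in zip(names, results)
    }


# Stub connectors for illustration only.
async def gitea_connector(query):
    return f"docs matching {query!r}"


async def loki_connector(query):
    raise RuntimeError("log backend unavailable")


context = asyncio.run(
    gather_context("deploy failure", {"gitea": gitea_connector, "loki": loki_connector})
)
print(context)
```

With this shape, total latency is bounded by the slowest connector or the timeout, not by the sum of all connector latencies.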

AI response with action:

AI determines action needed
↓ (if requires change)
AI suggests change to user
↓ (user approves)
Change committed to Git
↓ (normal GitOps flow)
Applied to infrastructure

4.4 Monitoring Data Flow

Metrics collection:

All Infrastructure Components
↓ (expose metrics endpoints)
Prometheus Exporters
↓ (scrape every 15 seconds)
Prometheus (VLAN 40)
↓ (evaluate alert rules)
AlertManager (VLAN 40)
↓ (route notifications)
Slack/Email/PagerDuty

Logs collection:

All Containers
↓ (stdout/stderr)
Docker logging driver
↓ (forward)
Promtail Agent (on every node)
↓ (push)
Loki (VLAN 40)
↓ (index and store)
Loki Storage
↓ (query)
Grafana or CLI

Audit logs:

All Infrastructure Actions
├→ Gitea (Git operations)
├→ Docker Swarm (API calls)
├→ Harbor (image push/pull)
├→ Jenkins (builds)
└→ SSH sessions (bastion)
↓ (forward)
Centralized Syslog
↓ (store)
Long-term Audit Storage (7 years)

5. High Availability and Scaling

5.1 HA Strategy

Tier 1 - Critical (99.99% uptime):

Docker Swarm (application platform)
Harbor (cannot deploy without it)
Shared Storage (persistent data)
Strategy: Active-Active where possible, N+1 redundancy

Tier 2 - Important (99.9% uptime):

Gitea (code access)
GitOps Operator (CD automation)
Databases (infrastructure metadata)
Strategy: Active-Passive с automatic failover

Tier 3 - Nice to have (99% uptime):

Jenkins (can wait for restore)
Portainer (CLI alternative exists)
Monitoring (short downtime acceptable)
Strategy: Warm standby, manual failover
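
The uptime targets for the three tiers translate into yearly downtime budgets. A short calculation, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for tier, target in [("Tier 1", 99.99), ("Tier 2", 99.9), ("Tier 3", 99.0)]:
    minutes = downtime_minutes_per_year(target)
    print(f"{tier}: {target}% -> {minutes:.1f} min/year ({minutes / 60:.1f} h)")
```

The budgets are roughly 53 minutes, 8.8 hours, and 3.7 days per year respectively, which is why Tier 1 needs automatic failover while Tier 3 can rely on manual failover.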

5.2 Scaling Points

Vertical Scaling (adding resources):

Databases: More RAM for cache
Ollama: Adding GPUs for speed
Harbor storage: More disk for images
Limit: Hardware limitations

Horizontal Scaling (adding instances):

Swarm Workers: Add nodes for capacity
Jenkins Agents: Dynamic scaling on demand
Prometheus: Federation for distributed scraping
MCP Connectors: Independent instances per source

Data Scaling:

PostgreSQL: Read replicas for read-heavy workloads
Harbor: Geo-replication for distributed teams
Loki: Sharding by time
Git: Repository sharding (rarely needed)

5.3 Capacity Planning

Metrics to track:

CPU utilization (target <70% average)
Memory utilization (target <80%)
Disk usage (alert at 80%, critical at 90%)
Network bandwidth (baseline + trend analysis)
IOPS (SSD wear, performance degradation)

Growth projections:

Applications: 20% growth per year
Code repositories: 30% growth per year (cumulative)
Logs: 50% growth per year (more verbose logging)
Metrics retention: Linear in the number of services
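
Annual growth rates compound, so capacity runs out sooner than a linear estimate suggests. The sketch below projects usage and estimates the time until a disk fills. The starting figure of 1 TB of logs is a made-up example, not a measurement from this environment; the 5 TB capacity and 50% rate come from this document.

```python
import math


def projected(current: float, annual_growth: float, years: float) -> float:
    """Size after compound growth, e.g. annual_growth=0.5 for 50% per year."""
    return current * (1 + annual_growth) ** years


def years_until_full(used: float, capacity: float, annual_growth: float) -> float:
    """Years until compound growth takes 'used' up to 'capacity'."""
    return math.log(capacity / used) / math.log(1 + annual_growth)


# Hypothetical: 1 TB of logs today on the 5 TB Loki disk, growing 50% per year.
print(f"after 3 years: {projected(1.0, 0.5, 3):.2f} TB")
print(f"disk full in about {years_until_full(1.0, 5.0, 0.5):.1f} years")
```

Under these assumed inputs the disk would fill in just under four years, so the retention period, not the disk size, is the first lever to review.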

Scaling triggers:

Add a Swarm worker when CPU >80% sustained
Upgrade the database when query latency >100ms p95
Expand storage when >75% used
Add Jenkins agents when queue >5 builds
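
The scaling triggers above map directly onto threshold checks. In this sketch the metric key names and the scaling_actions function are hypothetical; in practice the values would come from Prometheus queries.

```python
def scaling_actions(metrics: dict) -> list:
    """Return the scaling actions suggested by the current metrics."""
    actions = []
    if metrics["cpu_sustained_percent"] > 80:
        actions.append("add Swarm worker")
    if metrics["db_p95_latency_ms"] > 100:
        actions.append("upgrade database")
    if metrics["storage_used_percent"] > 75:
        actions.append("expand storage")
    if metrics["jenkins_queue_length"] > 5:
        actions.append("add Jenkins agents")
    return actions


sample = {
    "cpu_sustained_percent": 85,
    "db_p95_latency_ms": 40,
    "storage_used_percent": 78,
    "jenkins_queue_length": 2,
}
print(scaling_actions(sample))
```

Note that the storage trigger (75%) fires before the disk usage alert (80%), so expansion should normally be under way before anyone is paged.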

6. Disaster Recovery

6.1 RTO and RPO Targets

Recovery Time Objective (RTO):

Tier 1 services: 1 hour
Tier 2 services: 4 hours
Tier 3 services: 24 hours
Full infrastructure: 8 hours

Recovery Point Objective (RPO):

Databases: 15 minutes (via WAL shipping)
Git repositories: 1 hour (hourly backup)
Docker images: 0 (replicated to DR)
Configs: 0 (in Git)
Logs: 1 hour (buffered before ingestion)
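
Targets like these are only useful if drill results are compared against them. A minimal sketch of such a comparison for the RTO tiers follows; the tier keys, the missed_targets function, and the measured durations are hypothetical.

```python
from datetime import timedelta

# RTO targets per tier, taken from the list above.
RTO_TARGETS = {
    "tier1": timedelta(hours=1),
    "tier2": timedelta(hours=4),
    "tier3": timedelta(hours=24),
}


def missed_targets(measured: dict) -> list:
    """Return the tiers whose measured recovery time exceeded the target."""
    return sorted(tier for tier, took in measured.items() if took > RTO_TARGETS[tier])


# Hypothetical drill results.
drill = {"tier1": timedelta(minutes=50), "tier2": timedelta(hours=5)}
print(missed_targets(drill))
```

The same pattern applies to RPO: record the age of the newest restorable data at the moment of the simulated failure and compare it with the per-system targets.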

6.2 DR Scenarios

Scenario 1: Single server failure

Detection: Automated monitoring
Response: Automatic failover to redundant instance
Recovery time: <5 minutes
Data loss: None (active-active or sync replication)

Scenario 2: Network partition

Detection: Raft consensus loss, monitoring alerts
Response: Manual investigation, possible split-brain resolution
Recovery time: 30 minutes
Data loss: Possible if write to minority partition

Scenario 3: Data center failure

Detection: Total loss of connectivity
Response: Failover to DR site
Recovery time: 4 hours (RTO)
Data loss: Up to 15 minutes (RPO)

Scenario 4: Ransomware/Corruption

Detection: File integrity monitoring, unusual encryption activity
Response: Isolate affected systems, restore from clean backup
Recovery time: 8 hours (full rebuild)
Data loss: Up to last clean backup (potentially hours)

Scenario 5: Human error (accidental delete)

Detection: Git history, audit logs
Response: Restore from backup or Git revert
Recovery time: 1-2 hours
Data loss: None (everything in version control)

6.3 Recovery Procedures

Database Recovery:

Stop application access
Restore base backup
Apply WAL logs up to the target point in time
Verify data integrity
Resume application access

Git Repository Recovery:

Clone from the DR site or restore from backup
Verify commit history integrity
Restore hooks and configurations
Test push/pull operations
Notify team of recovery

Docker Swarm Recovery:

Deploy manager nodes from backup configs
Join worker nodes
Restore network and volume configs
Deploy stacks from Git
Verify service health

Full Site Recovery:

Deploy infrastructure from Terraform/IaC
Restore databases from backup
Clone Git repositories
Deploy Docker Swarm
Apply all stacks from GitOps
Verify end-to-end functionality
Switch DNS to DR site
Notify stakeholders

6.4 Testing DR

Monthly:

Restore test on separate infrastructure
Verify backup integrity
Test recovery procedures

Quarterly:

Full DR drill with failover to the DR site
Measure actual RTO/RPO
Update procedures based on findings

Annually:

Tabletop exercise with all stakeholders
Test business continuity plans
Update procedures and train staff on the changes

Next documents:

03-security-compliance.md - Detailed security requirements
04-component-specifications.md - Technical specifications of the components
05-development-environment.md - Dev environment for testing

Approval:

Enterprise Architect: _______________
Security Architect: _______________
Infrastructure Lead: _______________
Date: _______________
