diff --git a/docs/gitops-cicd/05-development-environment.md b/docs/gitops-cicd/05-development-environment.md
new file mode 100644
index 0000000..0972c8f
--- /dev/null
+++ b/docs/gitops-cicd/05-development-environment.md
@@ -0,0 +1,500 @@
+# FinTech GitOps CI/CD - Development Environment
+
+**Version:** 1.0
+**Date:** January 2026
+**Target audience:** DevOps Team, Infrastructure, Development Team
+
+---
+
+## Contents
+
+1. [Purpose of the Dev Environment](#1-purpose-of-the-dev-environment)
+2. [Dev Environment Architecture](#2-dev-environment-architecture)
+3. [Technical Requirements](#3-technical-requirements)
+4. [Deployment Plan](#4-deployment-plan)
+5. [Testing and Validation](#5-testing-and-validation)
+6. [Transition to Production](#6-transition-to-production)
+
+---
+
+## 1. Purpose of the Dev Environment
+
+### 1.1 Why a Separate Dev Environment Is Needed
+
+**Safety:**
+- Test new components without risk to production
+- Experiment with configurations
+- Train the team in a safe environment
+- Validate security policies before production
+
+**Integration verification:**
+- Test CI/CD pipelines
+- Validate GitOps workflows
+- Verify backup/restore procedures
+- Test disaster recovery scenarios
+
+**Development and debugging:**
+- Develop applications in a production-like environment
+- Debug problems without impact on production
+- Performance testing and tuning
+- Capacity planning and load testing
+
+**Team training:**
+- Hands-on training on real infrastructure
+- Troubleshooting practice
+- Learning new tools
+- Onboarding new employees
+
+### 1.2 Differences from Production
+
+**Scale:**
+- Fewer resources (~40% of production)
+- Fewer replicas per service
+- Shorter retention periods for data
+- Simplified HA (full redundancy is not required)
+
+**Data:**
+- Synthetic/mock data (NOT production data)
+- Anonymized copies of production data where necessary
+- Smaller dataset sizes
+- Shorter retention
+
+**Availability:**
+- SLAs are not critical (downtime for maintenance is acceptable)
+- Can be shut down outside working hours
+- Scheduled maintenance windows without prior approval
+
+**Security:**
+- Less strict access controls (more people have access)
+- Simplified authentication (MFA is optional for the dev team)
+- Relaxed network policies (for easier debugging)
+- BUT: core security practices still apply
+
+---
+
+## 2. Dev Environment Architecture
+
+### 2.1 Network Layout
+
+**Separate VLAN from Production:**
+
+```
+Dev Environment VLAN: 10.100.0.0/16
+
+Zones (subnets):
+├── Management Zone: 10.100.10.0/24
+│   ├── Gitea Dev: 10.100.10.10
+│   ├── Jenkins Dev: 10.100.10.20
+│   ├── Harbor Dev: 10.100.10.30
+│   ├── GitOps Operator Dev: 10.100.10.40
+│   └── Portainer Dev: 10.100.10.50
+│
+├── Swarm Cluster Zone: 10.100.20.0/24
+│   ├── Manager: 10.100.20.1
+│   └── Workers: 10.100.20.2-4 (3 workers)
+│
+├── AI Zone: 10.100.30.0/24
+│   ├── Ollama Dev: 10.100.30.10
+│   └── MCP Server Dev: 10.100.30.20
+│
+├── Monitoring Zone: 10.100.40.0/24
+│   ├── Prometheus Dev: 10.100.40.10
+│   ├── Grafana Dev: 10.100.40.20
+│   └── Loki Dev: 10.100.40.30
+│
+└── Data Zone: 10.100.50.0/24
+    ├── PostgreSQL: 10.100.50.10
+    └── Storage: 10.100.50.20
+```
+
+**Access:**
+- Access through the same VPN as production (but with separate subnet routing)
+- Or a dedicated Dev VPN (optional)
+- Jump host is optional (direct access is allowed for the dev team's convenience)
+
+### 2.2 Simplified Architecture
+
+**Single-manager Swarm (simplification):**
+- 1 manager node instead of 3 (a fault-tolerant manager quorum is not needed in dev)
+- 3 worker nodes (enough for testing HA behaviors)
+
+**No full redundancy:**
+- Single instance of each infrastructure service
+- No automated failover (manual recovery is acceptable)
+- Simplified backup (daily instead of hourly)
+
+**Shared infrastructure where possible:**
+- One PostgreSQL server for all dev databases
+- Shared storage (single NFS server)
+- Combined monitoring (everything in one Grafana)
+
+---
+
+## 3. Technical Requirements
+
+### 3.1 Server Infrastructure
+
+**Option A: Separate VMs (recommended)**
+
+| Component | Qty | vCPU (per VM) | RAM (per VM) | Storage (per VM) | Total Resources |
+|-----------|-----|---------------|--------------|------------------|-----------------|
+| Gitea | 1 | 4 | 8 GB | 200 GB | 4 vCPU, 8 GB, 200 GB |
+| Jenkins | 1 | 8 | 16 GB | 500 GB | 8 vCPU, 16 GB, 500 GB |
+| Harbor | 1 | 4 | 8 GB | 2 TB | 4 vCPU, 8 GB, 2 TB |
+| Swarm Manager | 1 | 4 | 8 GB | 100 GB | 4 vCPU, 8 GB, 100 GB |
+| Swarm Workers | 3 | 8 | 16 GB | 200 GB | 24 vCPU, 48 GB, 600 GB |
+| GitOps/Portainer | 1 | 2 | 4 GB | 50 GB | 2 vCPU, 4 GB, 50 GB |
+| Ollama | 1 | 8 | 32 GB | 500 GB | 8 vCPU, 32 GB, 500 GB |
+| MCP Server | 1 | 4 | 8 GB | 50 GB | 4 vCPU, 8 GB, 50 GB |
+| Monitoring | 1 | 8 | 16 GB | 1 TB | 8 vCPU, 16 GB, 1 TB |
+| PostgreSQL | 1 | 4 | 8 GB | 200 GB | 4 vCPU, 8 GB, 200 GB |
+| Storage/Backup | 1 | 2 | 8 GB | 5 TB | 2 vCPU, 8 GB, 5 TB |
+| **TOTAL** | **13 VMs** | **72 vCPU** | **164 GB** | **~10 TB** | - |
+
+**Option B: Single powerful server (budget option)**
+
+If the budget is limited, everything can be deployed on one powerful server:
+
+| Component | Specification |
+|-----------|--------------|
+| **CPU** | 80 vCPU |
+| **RAM** | 256 GB |
+| **Disk 1** | 2 TB NVMe SSD (OS, apps, databases) |
+| **Disk 2** | 10 TB HDD RAID 10 (storage, backups) |
+| **Network** | 2x 10 Gbps (bonded) |
+
+All components run as VMs on this single host (using KVM/Proxmox).
+
+**Pros:** Lower cost, simpler management
+**Cons:** Single point of failure (acceptable for dev), limited scale
+
+### 3.2 Network Infrastructure
+
+**Minimum requirements:**
+- 1 Gbps switch with VLAN support
+- Firewall with routing between VLANs (can be virtual/software)
+- VPN gateway (shared with production or dedicated)
+
+**Recommended:**
+- 10 Gbps switch for better performance
+- Separate internet connection (so that dev experiments do not affect production traffic)
+
+### 3.3 Storage Infrastructure
+
+**Local storage:**
+- Fast SSD for OS and applications
+- HDD for Harbor images and backups
+
+**Shared storage:**
+- A simple NFS server is sufficient (GlusterFS replication is not needed in dev)
+- 5 TB capacity
+
+---
+
+## 4. Deployment Plan
+
+### 4.1 Phase 1: Base Infrastructure (Week 1)
+
+**Day 1-2: Network Setup**
+- Configure VLANs
+- Set up firewall rules
+- Configure VPN access
+- Create DNS entries for dev services
+
+**Day 3-4: Server Provisioning**
+- Deploy VMs or prepare physical servers
+- Install OS (Ubuntu 22.04 LTS)
+- Basic hardening
+- Network configuration
+
+**Day 5: Base Services**
+- PostgreSQL installation and setup
+- NFS storage setup
+- Monitoring agents deployment
+
+### 4.2 Phase 2: Core Services (Week 2)
+
+**Day 1-2: Source Control**
+- Deploy Gitea
+- Configure PostgreSQL database
+- Set up LDAP integration (if used)
+- Create initial repository structure
+- Import existing docs, if any
+
+**Day 3-4: CI/CD Foundation**
+- Deploy Jenkins
+- Install essential plugins
+- Configure Gitea webhook integration
+- Set up first sample pipeline
+- Test build process
+
+**Day 5: Container Registry**
+- Deploy Harbor
+- Configure storage backend
+- Enable vulnerability scanning
+- Set up replication (if a secondary Harbor exists)
+- Test image push/pull
+
+### 4.3 Phase 3: Orchestration (Week 3)
+
+**Day 1-2: Docker Swarm Setup**
+- Initialize Swarm on the manager node
+- Join worker nodes
+- Configure overlay networks
+- Set up secrets management
+- Deploy test stack
+
+**Day 3: GitOps Automation**
+- Deploy GitOps Operator
+- Configure Git polling
+- Test automated deployment
+- Verify rollback functionality
+
+**Day 4: Management UI**
+- Deploy Portainer
+- Connect to Swarm
+- Configure RBAC
+- Create user accounts
+- Deploy through the UI (test)
+
+**Day 5: Integration Testing**
+- End-to-end CI/CD test
+- Git commit → build → push → deploy
+- Verify monitoring
+- Test rollback
+
+### 4.4 Phase 4: AI Infrastructure (Week 4)
+
+**Day 1-2: AI Server**
+- Deploy Ollama server
+- Download AI models (Llama 3, Qwen, etc.)
+- Test inference
+- Performance tuning
+
+**Day 3-4: MCP Server**
+- Deploy MCP Server
+- Configure connectors (Gitea, Swarm, DB)
+- Test data access
+- Integration with Ollama
+
+**Day 5: AI Integration Testing**
+- End-to-end AI workflow test
+- Query documentation through AI
+- Analyze logs through AI
+- Generate code examples
+
+### 4.5 Phase 5: Monitoring & Documentation (Week 5)
+
+**Day 1-2: Monitoring Stack**
+- Deploy Prometheus
+- Deploy Grafana
+- Deploy Loki
+- Configure dashboards
+- Set up alerting rules
+
+**Day 3-4: Documentation**
+- Create detailed runbooks
+- Document all procedures
+- Record configuration details
+- Create architecture diagrams
+- Write troubleshooting guides
+
+**Day 5: Team Training**
+- Walkthrough of all components
+- Hands-on exercises
+- Q&A session
+- Access provisioning
+
+---
+
+## 5. Testing and Validation
+
+### 5.1 Functional Testing
+
+**Git Operations:**
+- Clone repositories
+- Push commits
+- Create Pull Requests
+- Merge workflows
+- Webhook triggers
+
+**CI Pipeline:**
+- Build applications (multiple languages)
+- Run tests (unit, integration)
+- Security scanning
+- Docker image creation
+- Push to Harbor
+
+**CD Process:**
+- Automated deployment
+- Manual deployment through Portainer
+- Service scaling
+- Rolling updates
+- Rollback operations
+
+**Monitoring:**
+- Metrics collection
+- Log aggregation
+- Alert triggering
+- Dashboard visualization
+
+**AI Capabilities:**
+- Query documentation
+- Analyze logs
+- Code generation
+- Troubleshooting assistance
+
+### 5.2 Performance Testing
+
+**Load Testing:**
+- Multiple concurrent builds in Jenkins
+- High-frequency deployments
+- Large image pushes to Harbor
+- Monitoring system under load
+
+**Capacity Planning:**
+- Resource utilization measurement
+- Identify bottlenecks
+- Determine scaling needs for production
+
+### 5.3 Security Testing
+
+**Vulnerability Scanning:**
+- Container images
+- Infrastructure components
+- Dependencies
+
+**Penetration Testing:**
+- Network security
+- Access controls
+- Authentication mechanisms
+
+**Compliance Validation:**
+- Audit logging working
+- Data encryption verified
+- Access controls enforced
+
+### 5.4 Disaster Recovery Testing
+
+**Backup/Restore:**
+- Database backup and restore
+- Git repository backup and restore
+- Configuration backup
+- Full system restore
+
+**Failover Scenarios:**
+- Service failures
+- Node failures
+- Network partitions
+- Data corruption
+
+---
+
+## 6. Transition to Production
+
+### 6.1 Lessons Learned from Dev
+
+**Document:**
+- All problems encountered
+- Solutions and workarounds
+- Performance bottlenecks
+- Configuration optimizations
+- Team feedback
+
+**Updates for Production:**
+- Refined architecture
+- Optimized configurations
+- Improved procedures
+- Better sizing estimates
+- Updated documentation
+
+### 6.2 Production Readiness Checklist
+
+**Infrastructure:**
+- [ ] All servers provisioned according to specs
+- [ ] Network configured with proper segmentation
+- [ ] Firewall rules implemented and tested
+- [ ] VPN access configured
+- [ ] Monitoring fully deployed
+
+**Services:**
+- [ ] All components deployed
+- [ ] High availability configured
+- [ ] Backup systems operational
+- [ ] Disaster recovery tested
+- [ ] Security hardening completed
+
+**Processes:**
+- [ ] CI/CD pipelines validated
+- [ ] GitOps workflows tested
+- [ ] Incident response procedures documented
+- [ ] Escalation paths defined
+- [ ] On-call rotation established
+
+**Security:**
+- [ ] Vulnerability scans completed
+- [ ] Penetration testing passed
+- [ ] Compliance requirements met
+- [ ] Audit logging verified
+- [ ] Access controls implemented
+
+**Documentation:**
+- [ ] Architecture documented
+- [ ] Runbooks created
+- [ ] Troubleshooting guides written
+- [ ] Contact lists updated
+- [ ] Training materials prepared
+
+**Team:**
+- [ ] Training completed
+- [ ] Access provisioned
+- [ ] Roles and responsibilities defined
+- [ ] Communication channels established
+- [ ] Support procedures understood
+
+### 6.3 Migration Strategy
+
+**Phased Approach:**
+
+**Phase 1: Pilot (1-2 weeks)**
+- Migrate 1-2 non-critical applications
+- Test full workflow in production
+- Gather feedback
+- Refine processes
+
+**Phase 2: Gradual Migration (1-2 months)**
+- Migrate applications in batches
+- 3-5 applications per week
+- Monitor closely
+- Address issues quickly
+
+**Phase 3: Full Production (ongoing)**
+- All new applications use GitOps
+- Legacy applications migrated over time
+- Continuous improvement
+- Regular reviews
+
+**Rollback Plan:**
+- Keep the legacy deployment process operational in parallel
+- Document rollback procedures
+- Test rollback scenarios
+- Clear decision criteria for rollback
+
+---
+
+**Success Criteria:**
+
+The dev environment is considered successful when:
+1. All components are deployed and operational
+2. The end-to-end CI/CD workflow works
+3. The team is trained and comfortable with the tools
+4. Documentation is complete and accurate
+5. The production deployment plan is validated
+
+---
+
+**Sign-off:**
+- DevOps Lead: _______________
+- Development Lead: _______________
+- Infrastructure Lead: _______________
+- Date: _______________
\ No newline at end of file
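
Review note: the figures in sections 2.1 and 3.1 are easy to get wrong by hand, so a small cross-check may be useful before sign-off. The sketch below recomputes the totals of the separate-VM sizing table from its per-row values and verifies that every host address in the network layout lies inside its zone subnet, which in turn lies inside 10.100.0.0/16. The names `SIZING`, `ZONES`, `sizing_totals`, and `address_plan_errors` are hypothetical and not part of the document. It assumes 1 TB = 1000 GB and expands the worker range 10.100.20.2-4 into three addresses.

```python
import ipaddress

# Rows of the separate-VM sizing table in section 3.1:
# component -> (quantity, vCPU per VM, RAM in GB per VM, storage in GB per VM).
# Assumption: 1 TB is counted as 1000 GB.
SIZING = {
    "Gitea": (1, 4, 8, 200),
    "Jenkins": (1, 8, 16, 500),
    "Harbor": (1, 4, 8, 2000),
    "Swarm Manager": (1, 4, 8, 100),
    "Swarm Workers": (3, 8, 16, 200),
    "GitOps/Portainer": (1, 2, 4, 50),
    "Ollama": (1, 8, 32, 500),
    "MCP Server": (1, 4, 8, 50),
    "Monitoring": (1, 8, 16, 1000),
    "PostgreSQL": (1, 4, 8, 200),
    "Storage/Backup": (1, 2, 8, 5000),
}

# Zones of the network layout in section 2.1: zone -> (subnet, host addresses).
DEV_NETWORK = "10.100.0.0/16"
ZONES = {
    "Management": ("10.100.10.0/24", ["10.100.10.10", "10.100.10.20",
                                      "10.100.10.30", "10.100.10.40",
                                      "10.100.10.50"]),
    "Swarm Cluster": ("10.100.20.0/24", ["10.100.20.1", "10.100.20.2",
                                         "10.100.20.3", "10.100.20.4"]),
    "AI": ("10.100.30.0/24", ["10.100.30.10", "10.100.30.20"]),
    "Monitoring": ("10.100.40.0/24", ["10.100.40.10", "10.100.40.20",
                                      "10.100.40.30"]),
    "Data": ("10.100.50.0/24", ["10.100.50.10", "10.100.50.20"]),
}


def sizing_totals(sizing):
    """Return (VM count, total vCPU, total RAM GB, total storage GB)."""
    rows = list(sizing.values())
    vms = sum(qty for qty, _, _, _ in rows)
    vcpu = sum(qty * cpu for qty, cpu, _, _ in rows)
    ram = sum(qty * mem for qty, _, mem, _ in rows)
    storage = sum(qty * disk for qty, _, _, disk in rows)
    return vms, vcpu, ram, storage


def address_plan_errors(network, zones):
    """Return a list of human-readable problems found in the address plan."""
    parent = ipaddress.ip_network(network)
    errors = []
    for zone, (cidr, hosts) in zones.items():
        subnet = ipaddress.ip_network(cidr)
        if not subnet.subnet_of(parent):
            errors.append(f"{zone}: {cidr} is outside {network}")
        for host in hosts:
            if ipaddress.ip_address(host) not in subnet:
                errors.append(f"{zone}: {host} is outside {cidr}")
    return errors


if __name__ == "__main__":
    vms, vcpu, ram, storage = sizing_totals(SIZING)
    print(f"{vms} VMs, {vcpu} vCPU, {ram} GB RAM, {storage / 1000:.1f} TB storage")
    for problem in address_plan_errors(DEV_NETWORK, ZONES):
        print("address plan problem:", problem)
```

Keeping the rows as data rather than hard-coding the totals means the check can be rerun whenever a VM is added or resized, and any mismatch with the TOTAL row of the table shows up immediately.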