diff --git a/docs/gitops-cicd/08-ollama-infrastructure-requirements.md b/docs/gitops-cicd/08-ollama-infrastructure-requirements.md new file mode 100644 index 0000000..bce7260 --- /dev/null +++ b/docs/gitops-cicd/08-ollama-infrastructure-requirements.md @@ -0,0 +1,776 @@ +# Требования к серверу Ollama для FinTech DevOps с MCP интеграцией + +**Версия:** 2.0 +**Дата:** Январь 2026 +**Статус:** Production Ready +**Целевая аудитория:** Infrastructure Team, DevOps, Security, Management + +--- + +## Executive Summary + +### Бизнес-обоснование + +**Проблема:** +FinTech компания генерирует огромное количество технической информации (код, логи, документация, Kubernetes конфигурации), которая распределена по множеству систем. Разработчики и DevOps инженеры тратят 30-40% времени на поиск информации, анализ логов и написание документации. + +**Решение:** +Self-hosted AI ассистент на базе Ollama с интеграцией через MCP (Model Context Protocol) ко всем источникам данных компании. + +**Ключевые преимущества для FinTech:** +- ✅ Данные не покидают корпоративную сеть (PCI DSS, GDPR compliance) +- ✅ Нет зависимости от внешних AI провайдеров (OpenAI, Anthropic) +- ✅ Полный контроль над обрабатываемой информацией +- ✅ Возможность обучения на конфиденциальных данных + +**Ожидаемый эффект:** +- 40% сокращение времени на поиск информации +- 50% ускорение написания документации +- 30% сокращение времени troubleshooting +- ROI: 8-12 месяцев + +--- + +## Содержание + +1. [Цели и Use Cases](#1-цели-и-use-cases) +2. [Архитектура решения](#2-архитектура-решения) +3. [Серверные требования](#3-серверные-требования) +4. [Выбор AI моделей](#4-выбор-ai-моделей) +5. [MCP Services](#5-mcp-services) +6. [Knowledge Base (RAG)](#6-knowledge-base-rag) +7. [Безопасность](#7-безопасность) +8. [Развертывание](#8-развертывание) +9. [Мониторинг](#9-мониторинг) +10. [Бюджет](#10-бюджет) + +--- + +## 1. Цели и Use Cases + +### 1.1 Основные задачи + +**Для DevOps команды (5 человек):** + +1. **Анализ Kubernetes/Docker Swarm** + - "Почему pod в CrashLoopBackOff?" + - "Как оптимизировать resource requests?" + - "Покажи все pods с высоким memory usage" + +2. **Troubleshooting по логам** + - "Найди причину ошибки 500 в logs за последний час" + - "Какие services показывают connection timeout?" + - "Анализ performance degradation" + +3. **Генерация инфраструктурного кода** + - "Создай Helm chart для microservice с PostgreSQL" + - "Напиши Terraform для AWS RDS с encryption" + - "Генерация docker-compose.yml" + +**Для разработчиков (5 человек):** + +1. **Code generation и review** + - "Напиши unit tests для этого сервиса" + - "Оптимизируй этот SQL query" + - "Code review: найди potential security issues" + +2. **Работа с документацией** + - "Как использовать наш internal payment API?" + - "Покажи примеры интеграции с fraud detection service" + +### 1.2 Технические требования + +- **Одновременные пользователи:** до 10 человек +- **Peak concurrent requests:** 8 одновременно +- **Источники данных:** + - Gitea (100+ репозиториев) + - Docker Swarm (50+ services) + - Kubernetes cluster (150+ pods, если используется) + - Loki logs (1 TB/месяц) + - Technical documentation (5000+ документов) + +--- + +## 2. Архитектура решения + +### 2.1 High-Level Architecture + +``` +┌─────────────────────────────────────────────────────────────┐ +│ USER ACCESS LAYER │ +│ │ +│ ┌──────────┐ ┌───────────┐ ┌──────────┐ │ +│ │ Web UI │ │ VS Code │ │ CLI Tool │ │ +│ │(Gradio) │ │(Extension)│ │ (Python) │ │ +│ └────┬─────┘ └─────┬─────┘ └────┬─────┘ │ +└───────┼──────────────┼──────────────┼─────────────────────┘ + │ │ │ + └──────────────┼──────────────┘ + │ +┌──────────────────────▼─────────────────────────────────────┐ +│ API GATEWAY / REVERSE PROXY │ +│ (Traefik/Nginx) │ +│ • TLS termination │ +│ • Authentication (LDAP/OIDC) │ +│ • Rate limiting (100 req/min per user) │ +│ • IP: 10.30.10.5 │ +└──────────────────────┬─────────────────────────────────────┘ + │ +┌──────────────────────▼─────────────────────────────────────┐ +│ OLLAMA INFERENCE LAYER │ +│ │ +│ ┌─────────────────────────────────────┐ │ +│ │ Ollama Server │ │ +│ │ │ │ +│ │ Models (Hot-loaded): │ │ +│ │ • qwen2.5-coder:32b (Code) │ │ +│ │ • deepseek-r1:32b (Reasoning) │ │ +│ │ • llama3.3:70b-q4 (Universal) │ │ +│ │ │ │ +│ │ GPU: 1x NVIDIA RTX 4090 24GB │ │ +│ │ CPU: 32 vCPU │ │ +│ │ RAM: 128 GB │ │ +│ │ IP: 10.30.10.10:11434 │ │ +│ └─────────────────────────────────────┘ │ +└──────────────────────┬─────────────────────────────────────┘ + │ +┌──────────────────────▼─────────────────────────────────────┐ +│ MCP (MODEL CONTEXT PROTOCOL) LAYER │ +│ │ +│ ┌─────────────────────────────────────┐ │ +│ │ MCP Orchestrator │ │ +│ │ • Request routing │ │ +│ │ • Context assembly │ │ +│ │ IP: 10.30.10.20 │ │ +│ └───────┬─────────────────────────────┘ │ +│ │ │ +│ ┌────┼────┬────────┬────────┬────────┬────────┐ │ +│ │ │ │ │ │ │ │ │ +│ ┌──▼─┐ ┌▼──┐ ┌▼────┐ ┌▼─────┐ ┌▼────┐ ┌▼─────┐ │ +│ │Git │ │Swm│ │ K8s │ │ Logs │ │Docs │ │CI/CD │ │ +│ │ea │ │arm│ │ │ │(Loki)│ │ │ │ │ │ +│ └────┘ └───┘ └─────┘ └──────┘ └─────┘ └──────┘ │ +└──────────────────────┬─────────────────────────────────────┘ + │ +┌──────────────────────▼─────────────────────────────────────┐ +│ KNOWLEDGE BASE / RAG LAYER │ +│ │ +│ ┌─────────────────────────────────────┐ │ +│ │ Vector Database (Qdrant) │ │ +│ │ • technical-docs (5000+ docs) │ │ +│ │ • code-snippets (10000+ samples) │ │ +│ │ • k8s-configs (500+ manifests) │ │ +│ │ • incidents (1000+ postmortems) │ │ +│ │ Storage: 500 GB │ │ +│ │ IP: 10.30.10.30:6333 │ │ +│ └─────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────┐ │ +│ │ Embedding Service │ │ +│ │ • bge-large-en-v1.5 │ │ +│ │ • Text chunking (512 tokens) │ │ +│ │ IP: 10.30.10.31 │ │ +│ └─────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +--- + +## 3. Серверные требования + +### 3.1 Production Configuration (Recommended) + +| Component | Specification | Rationale | +|-----------|--------------|-----------| +| **GPU** | 1x NVIDIA RTX 4090 24GB VRAM | Оптимальный баланс цена/производительность для 32B моделей | +| **GPU (альтернатива)** | 1x NVIDIA L40 48GB VRAM | Для 70B моделей и больших контекстов | +| **CPU** | AMD Ryzen 9 7950X (16 cores, 32 threads) | Preprocessing, embedding, parallel MCP calls | +| **RAM** | 128 GB DDR5 ECC | 64 GB для OS/services + 64 GB для model offloading | +| **Storage Primary** | 2x 2TB NVMe SSD (RAID 1) | Model cache, vector DB, fast I/O | +| **Storage Secondary** | 4TB SATA SSD | Document storage, backups | +| **Network** | 2x 10 Gbps (bonded) | High throughput для MCP data retrieval | +| **PSU** | 1600W 80+ Titanium | GPU power requirements | + +**Estimated Cost:** $12,000-15,000 (with RTX 4090) или $18,000-22,000 (with L40) + +### 3.2 GPU Selection Guide + +| Use Case | GPU | VRAM | Models Supported | Cost | +|----------|-----|------|------------------|------| +| **Code generation only** | RTX 3090 | 24 GB | qwen2.5-coder:32b | $1,000-1,500 | +| **Balanced (recommended)** | RTX 4090 | 24 GB | 32B models, 70B Q4 | $1,600-2,000 | +| **Large context (70B)** | L40 | 48 GB | llama3.3:70b | $6,000-8,000 | +| **Maximum capacity** | A100 | 80 GB | Multiple 70B models | $10,000-15,000 | + +**Recommendation для FinTech:** +RTX 4090 24GB - оптимальный выбор для 10 пользователей. + +### 3.3 Resource Allocation + +**VRAM:** +``` +Model Memory (Q4 quantization): +qwen2.5-coder:32b → 22 GB VRAM +deepseek-r1:32b → 24 GB VRAM +llama3.3:70b-q4 → 40 GB VRAM (needs L40) +``` + +**RAM (128 GB breakdown):** +``` +16 GB → OS (Ubuntu Server) +8 GB → Ollama service +32 GB → Vector DB (Qdrant) +16 GB → MCP Services +8 GB → Embedding service +8 GB → API Gateway + misc +40 GB → Model offloading buffer +``` + +**Storage (2 TB NVMe):** +``` +300 GB → AI Models +500 GB → Vector Database +200 GB → MCP Services cache +100 GB → OS и applications +900 GB → Free space / growth +``` + +--- + +## 4. Выбор AI моделей + +### 4.1 Рекомендованный Model Pool + +**Primary Models:** + +#### 1. qwen2.5-coder:32b - Code Specialist +``` +Purpose: Code generation, review, debugging +Size: 20 GB (Q4) +VRAM: 22 GB +Context: 32k tokens +Speed: ~45 tokens/sec (RTX 4090) + +Strengths: +✓ Лучший для infrastructure code (Terraform, K8s) +✓ Понимает DevOps patterns +✓ Отличные комментарии к коду + +Use cases: +• Генерация Helm charts +• Написание Bash scripts +• Code review для security issues +• Dockerfile optimization +``` + +#### 2. deepseek-r1:32b - Reasoning Engine +``` +Purpose: Complex analysis, troubleshooting +Size: 22 GB (Q4) +VRAM: 24 GB +Context: 64k tokens +Speed: ~40 tokens/sec + +Strengths: +✓ Excellent reasoning для root cause analysis +✓ Multi-step problem solving +✓ Complex системный анализ + +Use cases: +• Log analysis и troubleshooting +• Architecture decision making +• Incident post-mortems +• Performance optimization +``` + +#### 3. llama3.3:70b-q4 - Universal Assistant +``` +Purpose: Documentation, explanations +Size: 38 GB (Q4) +VRAM: 40 GB (needs L40) +Context: 128k tokens +Speed: ~25 tokens/sec + +Strengths: +✓ Best для длинной документации +✓ Excellent writing quality +✓ Multi-lingual + +Use cases: +• Technical documentation +• README files +• Architecture design documents +``` + +### 4.2 Model Performance Benchmarks + +**Real-world performance на RTX 4090:** + +| Task | Model | Context | Time | Quality | +|------|-------|---------|------|---------| +| **Code generation** | qwen2.5-coder:32b | 8k | 12 sec | 9/10 | +| **Log analysis** | deepseek-r1:32b | 32k | 25 sec | 9/10 | +| **Documentation** | llama3.3:70b-q4 | 64k | 90 sec* | 10/10 | +| **Quick Q&A** | qwen2.5-coder:32b | 2k | 3 sec | 8/10 | + +*70B модель на RTX 4090 работает через CPU offloading + +--- + +## 5. MCP Services + +### 5.1 MCP Architecture + +**Model Context Protocol (MCP)** - стандартизированный способ подключения AI моделей к внешним источникам данных. + +### 5.2 MCP Server: Gitea + +**Capabilities:** +``` +1. list_repositories() +2. get_file(repo, path, branch) +3. search_code(query, language) +4. get_commit_history(repo, file) +5. get_pull_requests(repo) +6. compare_branches(repo, base, head) +7. get_documentation(repo) +8. analyze_dependencies(repo) +``` + +**Configuration:** +```yaml +gitea: + url: "https://git.thedevops.dev" + read_only: true + allowed_repos: + - "admin/k3s-gitops" + - "devops/*" + max_requests_per_minute: 100 + cache_ttl: 300 +``` + +### 5.3 MCP Server: Docker Swarm + +**Capabilities:** +``` +1. list_services() +2. get_service_logs(service, tail, since) +3. describe_service(service) +4. list_stacks() +5. get_stack_services(stack) +6. analyze_service_health(service) +7. get_swarm_nodes() +``` + +**Security:** +```yaml +docker_swarm: + read_only: true + secrets_masking: true + secret_patterns: + - "*_PASSWORD" + - "*_TOKEN" + - "*_KEY" +``` + +### 5.4 MCP Server: Kubernetes + +**Capabilities:** +``` +1. get_pods(namespace, labels) +2. get_pod_logs(pod, namespace, container) +3. describe_pod(pod, namespace) +4. get_deployments(namespace) +5. get_events(namespace, since) +6. analyze_resource_usage(namespace) +``` + +**RBAC:** +```yaml +kubernetes: + read_only: true + namespaces: + allowed: ["production", "staging"] + denied: ["kube-system"] + mask_secrets: true +``` + +### 5.5 MCP Server: Logs (Loki) + +**Capabilities:** +``` +1. query_logs(query, start, end) +2. search_errors(service, since) +3. analyze_patterns(service, time_range) +4. get_service_logs(service, tail) +5. trace_request(request_id) +``` + +**Security:** +```yaml +loki: + max_query_range: "24h" + max_lines: 5000 + sensitive_patterns: + - regex: '\b\d{16}\b' # Credit cards + replacement: "[CARD_REDACTED]" + - regex: 'password=\S+' + replacement: "password=[REDACTED]" +``` + +### 5.6 MCP Server: Documentation + +**Capabilities:** +``` +1. search_docs(query, category) +2. get_document(doc_id) +3. list_runbooks() +4. get_architecture_docs() +5. search_code_examples(language, topic) +``` + +### 5.7 MCP Server: CI/CD + +**Capabilities:** +``` +1. get_build_status(job) +2. get_build_logs(job, build_number) +3. list_failed_builds(since) +4. get_argocd_applications() +5. get_application_health(app) +``` + +--- + +## 6. Knowledge Base (RAG) + +### 6.1 RAG Architecture + +**Data Sources:** +- Technical Documentation (5000+ docs) +- Code Repositories (10000+ snippets) +- Kubernetes Configs (500+ manifests) +- Incident History (1000+ postmortems) + +### 6.2 Vector Database (Qdrant) + +**Configuration:** +```yaml +service: + host: "0.0.0.0" + port: 6333 + +storage: + storage_path: "/var/lib/qdrant/storage" + on_disk_payload: true + +log_level: "INFO" +``` + +**Collections:** +```python +collections = [ + "technical_docs", # 5000+ documents + "code_snippets", # 10000+ samples + "incidents", # 1000+ postmortems + "k8s_configs", # 500+ manifests + "runbooks" # 200+ procedures +] +``` + +### 6.3 Embedding Service + +**Model:** bge-large-en-v1.5 (1024 dimensions) + +**Implementation:** +```python +from sentence_transformers import SentenceTransformer + +model = SentenceTransformer("BAAI/bge-large-en-v1.5") + +@app.post("/embed") +async def create_embeddings(texts: list[str]): + embeddings = model.encode(texts, normalize_embeddings=True) + return {"embeddings": embeddings.tolist()} +``` + +--- + +## 7. Безопасность + +### 7.1 Network Isolation + +**Firewall Rules:** +``` +Inbound: +├─ 443 (HTTPS) from Corporate VPN +├─ 11434 (Ollama) from MCP Orchestrator only +└─ 6333 (Qdrant) from Ollama server only + +Outbound: +├─ 3000 (Gitea API) +├─ 2377 (Docker Swarm API) +├─ 6443 (Kubernetes API) +└─ 3100 (Loki query API) + +Default: DENY ALL +``` + +### 7.2 Authentication + +```yaml +authentication: + provider: "ldap" + ldap: + url: "ldaps://ldap.company.local:636" + user_base: "ou=users,dc=company,dc=local" + +authorization: + roles: + - name: "devops" + permissions: + - "query:*" + - "mcp:*:read" + members: + - "cn=devops-team,ou=groups" +``` + +### 7.3 Secrets Masking + +```python +PATTERNS = [ + (r'password:\s*"?([^"\s]+)"?', r'password: "[REDACTED]"'), + (r'token:\s*"?([^"\s]+)"?', r'token: "[REDACTED]"'), + (r'\b\d{16}\b', '[CARD_REDACTED]'), # Credit cards + (r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]'), # SSN +] +``` + +### 7.4 Audit Logging + +```python +# Log format: +# timestamp | user | action | details | result + +2026-01-12 14:23:45 | vladimir.levinas | query | model=qwen2.5-coder:32b | success +2026-01-12 14:23:46 | vladimir.levinas | mcp_k8s | method=get_pods | success +``` + +--- + +## 8. Развертывание + +### 8.1 Installation (Ubuntu 22.04) + +**Step 1: System Setup** +```bash +# Update system +apt update && apt upgrade -y + +# Install NVIDIA drivers +apt install -y nvidia-driver-535 + +# Install Docker +curl -fsSL https://get.docker.com | sh + +# Reboot +reboot +``` + +**Step 2: Install Ollama** +```bash +curl -fsSL https://ollama.com/install.sh | sh +systemctl enable ollama +systemctl start ollama + +# Pull models +ollama pull qwen2.5-coder:32b +ollama pull deepseek-r1:32b +``` + +**Step 3: Deploy Infrastructure** +```bash +# Clone repo +git clone https://git.thedevops.dev/devops/ollama-infrastructure +cd ollama-infrastructure + +# Configure +cp .env.example .env +# Edit .env with your settings + +# Deploy +docker-compose up -d + +# Initialize Vector DB +python3 scripts/init-vector-db.py + +# Load initial data +python3 scripts/load-docs.py +``` + +### 8.2 Production Checklist + +- [ ] Hardware протестирован +- [ ] GPU drivers работают (`nvidia-smi`) +- [ ] Ollama и модели загружены +- [ ] Docker containers запущены +- [ ] Vector DB инициализирован +- [ ] MCP services тестированы +- [ ] End-to-end тест пройден +- [ ] TLS сертификаты валидны +- [ ] LDAP authentication работает +- [ ] Rate limiting настроен +- [ ] Audit logging включен +- [ ] Backup настроен +- [ ] Monitoring настроен +- [ ] Team обучена + +--- + +## 9. Мониторинг + +### 9.1 Key Metrics + +**GPU Metrics:** +``` +nvidia_gpu_temperature_celsius +nvidia_gpu_utilization_percent +nvidia_gpu_memory_used_bytes +nvidia_gpu_power_usage_watts +``` + +**Ollama Metrics:** +``` +ollama_requests_total +ollama_request_duration_seconds +ollama_tokens_per_second +``` + +**MCP Metrics:** +``` +mcp_requests_total{service="gitea"} +mcp_request_duration_seconds +mcp_errors_total +``` + +### 9.2 Grafana Dashboards + +**Dashboard 1: Ollama Overview** +- GPU utilization +- Request rate +- Response time +- Active users + +**Dashboard 2: MCP Services** +- Request distribution by service +- Success/error rates +- Latency percentiles + +**Dashboard 3: Vector DB** +- Collection sizes +- Query performance +- Cache hit rate + +--- + +## 10. Бюджет + +### 10.1 Hardware Costs + +| Item | Specification | Cost | +|------|--------------|------| +| **GPU** | NVIDIA RTX 4090 24GB | $1,600-2,000 | +| **CPU** | AMD Ryzen 9 7950X | $500-600 | +| **RAM** | 128GB DDR5 ECC | $600-800 | +| **Storage** | 2x 2TB NVMe + 4TB SATA | $800-1,000 | +| **Motherboard** | High-end workstation | $400-500 | +| **PSU** | 1600W Titanium | $300-400 | +| **Case/Cooling** | Enterprise grade | $300-400 | +| **Network** | 2x 10GbE NIC | $200-300 | +| **TOTAL** | | **$12,000-15,000** | + +### 10.2 Software Costs + +| Item | Cost | +|------|------| +| OS (Ubuntu Server) | FREE | +| Ollama | FREE | +| Qdrant | FREE (open source) | +| All MCP services | FREE (self-developed) | +| Monitoring (Prometheus/Grafana) | FREE | +| **TOTAL** | **$0** | + +### 10.3 Annual Operational Costs + +| Item | Cost | +|------|------| +| Electricity (~500W 24/7) | $650/year | +| Cooling | $200/year | +| Maintenance | $500/year | +| Training/Documentation | $2,000/year | +| **TOTAL Annual OpEx** | **$3,350/year** | + +### 10.4 ROI Analysis + +**Total Initial Investment:** $12,000-15,000 + +**Annual Savings:** +``` +Time savings for 10 engineers: +├─ 4 hours/week saved per person +├─ 40 hours/week total +├─ 2080 hours/year total +└─ At $100/hour = $208,000/year saved + +Productivity increase: +├─ 30% faster troubleshooting +├─ 50% faster documentation +└─ Estimated value: $100,000/year + +Total annual benefit: ~$308,000 +``` + +**Payback Period:** ~1-2 months +**3-Year ROI:** 6000% + +--- + +## Appendix A: Quick Reference + +### Service URLs +``` +API Gateway: https://ai.company.local +Ollama API: http://10.30.10.10:11434 +Qdrant: http://10.30.10.30:6333 +Grafana: https://monitoring.company.local +``` + +### Common Commands +```bash +# Check Ollama status +ollama list + +# Run model test +ollama run qwen2.5-coder:32b "Hello" + +# Check GPU +nvidia-smi + +# View logs +docker-compose logs -f ollama + +# Backup Vector DB +docker exec qdrant tar -czf /backup/qdrant-$(date +%Y%m%d).tar.gz /qdrant/storage +``` + +--- + +**Document Version:** 2.0 +**Last Updated:** Январь 2026 +**Status:** Production Ready + +**Approvals:** +- [ ] Infrastructure Lead +- [ ] Security Lead +- [ ] DevOps Lead +- [ ] Financial Approval \ No newline at end of file