From e7657db02dc0698e916a298da3f511d272ee3446 Mon Sep 17 00:00:00 2001
From: admin
Date: Tue, 13 Jan 2026 08:49:43 +0000
Subject: [PATCH] Delete docs/gitops-cicd/08-ollama-infrastructure-requirements.md

---
 .../08-ollama-infrastructure-requirements.md | 776 ------------------
 1 file changed, 776 deletions(-)
 delete mode 100644 docs/gitops-cicd/08-ollama-infrastructure-requirements.md

diff --git a/docs/gitops-cicd/08-ollama-infrastructure-requirements.md b/docs/gitops-cicd/08-ollama-infrastructure-requirements.md
deleted file mode 100644
index bce7260..0000000
--- a/docs/gitops-cicd/08-ollama-infrastructure-requirements.md
+++ /dev/null
@@ -1,776 +0,0 @@

# Ollama Server Requirements for FinTech DevOps with MCP Integration

**Version:** 2.0
**Date:** January 2026
**Status:** Production Ready
**Target audience:** Infrastructure Team, DevOps, Security, Management

---

## Executive Summary

### Business Case

**Problem:**
The company generates a large volume of technical information (code, logs, documentation, Kubernetes configurations) spread across many systems. Developers and DevOps engineers spend 30-40% of their time searching for information, analyzing logs, and writing documentation.

**Solution:**
A self-hosted AI assistant based on Ollama, connected to all of the company's data sources via MCP (Model Context Protocol).

**Key advantages for FinTech:**
- ✅ Data never leaves the corporate network (PCI DSS, GDPR compliance)
- ✅ No dependency on external AI providers (OpenAI, Anthropic)
- ✅ Full control over the information being processed
- ✅ Option to fine-tune on confidential data

**Expected impact:**
- 40% less time spent searching for information
- 50% faster documentation writing
- 30% shorter troubleshooting cycles
- ROI: 8-12 months

---

## Contents

1. [Goals and Use Cases](#1-goals-and-use-cases)
2. [Solution Architecture](#2-solution-architecture)
3. [Server Requirements](#3-server-requirements)
4. [AI Model Selection](#4-ai-model-selection)
5. [MCP Services](#5-mcp-services)
6. [Knowledge Base (RAG)](#6-knowledge-base-rag)
7. [Security](#7-security)
8. [Deployment](#8-deployment)
9. [Monitoring](#9-monitoring)
10. [Budget](#10-budget)

---

## 1. Goals and Use Cases

### 1.1 Core Tasks

**For the DevOps team (5 people):**

1. **Kubernetes/Docker Swarm analysis**
   - "Why is this pod in CrashLoopBackOff?"
   - "How do I optimize resource requests?"
   - "Show all pods with high memory usage"

2. **Log-driven troubleshooting**
   - "Find the cause of the 500 errors in the logs from the last hour"
   - "Which services are showing connection timeouts?"
   - "Analyze the performance degradation"

3. **Infrastructure code generation**
   - "Create a Helm chart for a microservice with PostgreSQL"
   - "Write Terraform for AWS RDS with encryption"
   - "Generate a docker-compose.yml"

**For developers (5 people):**

1. **Code generation and review**
   - "Write unit tests for this service"
   - "Optimize this SQL query"
   - "Code review: find potential security issues"

2. **Working with documentation**
   - "How do I use our internal payment API?"
   - "Show examples of integrating with the fraud detection service"
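
To make these use cases concrete, here is a minimal sketch of a scripted interaction with the inference layer via Ollama's standard `/api/chat` endpoint. In production, traffic goes through the API gateway with LDAP authentication; this bypasses it purely for illustration. The host and model name follow the configuration described in sections 2-4.

```python
import requests

OLLAMA_URL = "http://10.30.10.10:11434"   # Ollama server, see section 2.1

def ask(question: str, model: str = "qwen2.5-coder:32b") -> str:
    """Send one non-streaming chat request and return the model's reply."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": question}],
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("Why is this pod in CrashLoopBackOff?"))
```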
- - "Покажи примеры интеграции с fraud detection service" - -### 1.2 Технические требования - -- **Одновременные пользователи:** до 10 человек -- **Peak concurrent requests:** 8 одновременно -- **Источники данных:** - - Gitea (100+ репозиториев) - - Docker Swarm (50+ services) - - Kubernetes cluster (150+ pods, если используется) - - Loki logs (1 TB/месяц) - - Technical documentation (5000+ документов) - ---- - -## 2. Архитектура решения - -### 2.1 High-Level Architecture - -``` -┌─────────────────────────────────────────────────────────────┐ -│ USER ACCESS LAYER │ -│ │ -│ ┌──────────┐ ┌───────────┐ ┌──────────┐ │ -│ │ Web UI │ │ VS Code │ │ CLI Tool │ │ -│ │(Gradio) │ │(Extension)│ │ (Python) │ │ -│ └────┬─────┘ └─────┬─────┘ └────┬─────┘ │ -└───────┼──────────────┼──────────────┼─────────────────────┘ - │ │ │ - └──────────────┼──────────────┘ - │ -┌──────────────────────▼─────────────────────────────────────┐ -│ API GATEWAY / REVERSE PROXY │ -│ (Traefik/Nginx) │ -│ • TLS termination │ -│ • Authentication (LDAP/OIDC) │ -│ • Rate limiting (100 req/min per user) │ -│ • IP: 10.30.10.5 │ -└──────────────────────┬─────────────────────────────────────┘ - │ -┌──────────────────────▼─────────────────────────────────────┐ -│ OLLAMA INFERENCE LAYER │ -│ │ -│ ┌─────────────────────────────────────┐ │ -│ │ Ollama Server │ │ -│ │ │ │ -│ │ Models (Hot-loaded): │ │ -│ │ • qwen2.5-coder:32b (Code) │ │ -│ │ • deepseek-r1:32b (Reasoning) │ │ -│ │ • llama3.3:70b-q4 (Universal) │ │ -│ │ │ │ -│ │ GPU: 1x NVIDIA RTX 4090 24GB │ │ -│ │ CPU: 32 vCPU │ │ -│ │ RAM: 128 GB │ │ -│ │ IP: 10.30.10.10:11434 │ │ -│ └─────────────────────────────────────┘ │ -└──────────────────────┬─────────────────────────────────────┘ - │ -┌──────────────────────▼─────────────────────────────────────┐ -│ MCP (MODEL CONTEXT PROTOCOL) LAYER │ -│ │ -│ ┌─────────────────────────────────────┐ │ -│ │ MCP Orchestrator │ │ -│ │ • Request routing │ │ -│ │ • Context assembly │ │ -│ │ IP: 10.30.10.20 │ │ -│ └───────┬─────────────────────────────┘ │ -│ │ │ -│ ┌────┼────┬────────┬────────┬────────┬────────┐ │ -│ │ │ │ │ │ │ │ │ -│ ┌──▼─┐ ┌▼──┐ ┌▼────┐ ┌▼─────┐ ┌▼────┐ ┌▼─────┐ │ -│ │Git │ │Swm│ │ K8s │ │ Logs │ │Docs │ │CI/CD │ │ -│ │ea │ │arm│ │ │ │(Loki)│ │ │ │ │ │ -│ └────┘ └───┘ └─────┘ └──────┘ └─────┘ └──────┘ │ -└──────────────────────┬─────────────────────────────────────┘ - │ -┌──────────────────────▼─────────────────────────────────────┐ -│ KNOWLEDGE BASE / RAG LAYER │ -│ │ -│ ┌─────────────────────────────────────┐ │ -│ │ Vector Database (Qdrant) │ │ -│ │ • technical-docs (5000+ docs) │ │ -│ │ • code-snippets (10000+ samples) │ │ -│ │ • k8s-configs (500+ manifests) │ │ -│ │ • incidents (1000+ postmortems) │ │ -│ │ Storage: 500 GB │ │ -│ │ IP: 10.30.10.30:6333 │ │ -│ └─────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────┐ │ -│ │ Embedding Service │ │ -│ │ • bge-large-en-v1.5 │ │ -│ │ • Text chunking (512 tokens) │ │ -│ │ IP: 10.30.10.31 │ │ -│ └─────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - ---- - -## 3. 

## 3. Server Requirements

### 3.1 Production Configuration (Recommended)

| Component | Specification | Rationale |
|-----------|--------------|-----------|
| **GPU** | 1x NVIDIA RTX 4090 24GB VRAM | Best price/performance balance for 32B models |
| **GPU (alternative)** | 1x NVIDIA L40 48GB VRAM | For 70B models and large contexts |
| **CPU** | AMD Ryzen 9 7950X (16 cores, 32 threads) | Preprocessing, embedding, parallel MCP calls |
| **RAM** | 128 GB DDR5 ECC | 64 GB for OS/services + 64 GB for model offloading |
| **Storage Primary** | 2x 2TB NVMe SSD (RAID 1) | Model cache, vector DB, fast I/O |
| **Storage Secondary** | 4TB SATA SSD | Document storage, backups |
| **Network** | 2x 10 Gbps (bonded) | High throughput for MCP data retrieval |
| **PSU** | 1600W 80+ Titanium | GPU power requirements |

**Estimated cost:** $4,700-6,000 with the RTX 4090, or $9,000-12,000 with the L40 (see the component breakdown in section 10.1).

### 3.2 GPU Selection Guide

| Use Case | GPU | VRAM | Models Supported | Cost |
|----------|-----|------|------------------|------|
| **Code generation only** | RTX 3090 | 24 GB | qwen2.5-coder:32b | $1,000-1,500 |
| **Balanced (recommended)** | RTX 4090 | 24 GB | 32B models, 70B Q4 | $1,600-2,000 |
| **Large context (70B)** | L40 | 48 GB | llama3.3:70b | $6,000-8,000 |
| **Maximum capacity** | A100 | 80 GB | Multiple 70B models | $10,000-15,000 |

**Recommendation for FinTech:**
The RTX 4090 24GB is the optimal choice for 10 users.

### 3.3 Resource Allocation

**VRAM:**
```
Model memory (Q4 quantization):
qwen2.5-coder:32b  → 22 GB VRAM
deepseek-r1:32b    → 24 GB VRAM
llama3.3:70b-q4    → 40 GB VRAM (needs L40)
```

**RAM (128 GB breakdown):**
```
16 GB → OS (Ubuntu Server)
 8 GB → Ollama service
32 GB → Vector DB (Qdrant)
16 GB → MCP services
 8 GB → Embedding service
 8 GB → API gateway + misc
40 GB → Model offloading buffer
```

**Storage (2 TB NVMe):**
```
300 GB → AI models
500 GB → Vector database
200 GB → MCP services cache
100 GB → OS and applications
900 GB → Free space / growth
```

---

## 4. AI Model Selection

### 4.1 Recommended Model Pool

**Primary Models:**

#### 1. qwen2.5-coder:32b - Code Specialist
```
Purpose: Code generation, review, debugging
Size: 20 GB (Q4)
VRAM: 22 GB
Context: 32k tokens
Speed: ~45 tokens/sec (RTX 4090)

Strengths:
✓ Best for infrastructure code (Terraform, K8s)
✓ Understands DevOps patterns
✓ Writes excellent code comments

Use cases:
• Generating Helm charts
• Writing Bash scripts
• Code review for security issues
• Dockerfile optimization
```

#### 2. deepseek-r1:32b - Reasoning Engine
```
Purpose: Complex analysis, troubleshooting
Size: 22 GB (Q4)
VRAM: 24 GB
Context: 64k tokens
Speed: ~40 tokens/sec

Strengths:
✓ Excellent reasoning for root cause analysis
✓ Multi-step problem solving
✓ Complex systems analysis

Use cases:
• Log analysis and troubleshooting
• Architecture decision making
• Incident post-mortems
• Performance optimization
```

#### 3. llama3.3:70b-q4 - Universal Assistant
```
Purpose: Documentation, explanations
Size: 38 GB (Q4)
VRAM: 40 GB (needs L40)
Context: 128k tokens
Speed: ~25 tokens/sec

Strengths:
✓ Best for long-form documentation
✓ Excellent writing quality
✓ Multi-lingual

Use cases:
• Technical documentation
• README files
• Architecture design documents
```
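
This split of responsibilities implies some routing logic in front of Ollama. Below is a deliberately naive sketch of such a router; the keyword heuristics are illustrative only, and a production setup could instead use a small classifier or explicit model selection in the UI.

```python
CODE_HINTS = ("helm", "terraform", "dockerfile", "docker-compose", "script", "unit test", "sql")
REASONING_HINTS = ("why", "root cause", "crashloop", "incident", "degradation", "troubleshoot")

def pick_model(query: str) -> str:
    """Map a user query to the model pool from section 4.1 (illustrative heuristic)."""
    q = query.lower()
    if any(hint in q for hint in CODE_HINTS):
        return "qwen2.5-coder:32b"   # code specialist
    if any(hint in q for hint in REASONING_HINTS):
        return "deepseek-r1:32b"     # reasoning engine
    return "llama3.3:70b-q4"         # documentation / universal assistant

# Example: pick_model("Why is this pod in CrashLoopBackOff?") -> "deepseek-r1:32b"
```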

### 4.2 Model Performance Benchmarks

**Real-world performance on an RTX 4090:**

| Task | Model | Context | Time | Quality |
|------|-------|---------|------|---------|
| **Code generation** | qwen2.5-coder:32b | 8k | 12 sec | 9/10 |
| **Log analysis** | deepseek-r1:32b | 32k | 25 sec | 9/10 |
| **Documentation** | llama3.3:70b-q4 | 64k | 90 sec* | 10/10 |
| **Quick Q&A** | qwen2.5-coder:32b | 2k | 3 sec | 8/10 |

*The 70B model runs on the RTX 4090 with partial CPU offloading.

---

## 5. MCP Services

### 5.1 MCP Architecture

**Model Context Protocol (MCP)** is a standardized way of connecting AI models to external data sources.

### 5.2 MCP Server: Gitea

**Capabilities:**
```
1. list_repositories()
2. get_file(repo, path, branch)
3. search_code(query, language)
4. get_commit_history(repo, file)
5. get_pull_requests(repo)
6. compare_branches(repo, base, head)
7. get_documentation(repo)
8. analyze_dependencies(repo)
```

**Configuration:**
```yaml
gitea:
  url: "https://git.thedevops.dev"
  read_only: true
  allowed_repos:
    - "admin/k3s-gitops"
    - "devops/*"
  max_requests_per_minute: 100
  cache_ttl: 300
```

### 5.3 MCP Server: Docker Swarm

**Capabilities:**
```
1. list_services()
2. get_service_logs(service, tail, since)
3. describe_service(service)
4. list_stacks()
5. get_stack_services(stack)
6. analyze_service_health(service)
7. get_swarm_nodes()
```

**Security:**
```yaml
docker_swarm:
  read_only: true
  secrets_masking: true
  secret_patterns:
    - "*_PASSWORD"
    - "*_TOKEN"
    - "*_KEY"
```

### 5.4 MCP Server: Kubernetes

**Capabilities:**
```
1. get_pods(namespace, labels)
2. get_pod_logs(pod, namespace, container)
3. describe_pod(pod, namespace)
4. get_deployments(namespace)
5. get_events(namespace, since)
6. analyze_resource_usage(namespace)
```

**RBAC:**
```yaml
kubernetes:
  read_only: true
  namespaces:
    allowed: ["production", "staging"]
    denied: ["kube-system"]
  mask_secrets: true
```

### 5.5 MCP Server: Logs (Loki)

**Capabilities:**
```
1. query_logs(query, start, end)
2. search_errors(service, since)
3. analyze_patterns(service, time_range)
4. get_service_logs(service, tail)
5. trace_request(request_id)
```

**Security:**
```yaml
loki:
  max_query_range: "24h"
  max_lines: 5000
  sensitive_patterns:
    - regex: '\b\d{16}\b'   # Credit cards
      replacement: "[CARD_REDACTED]"
    - regex: 'password=\S+'
      replacement: "password=[REDACTED]"
```

### 5.6 MCP Server: Documentation

**Capabilities:**
```
1. search_docs(query, category)
2. get_document(doc_id)
3. list_runbooks()
4. get_architecture_docs()
5. search_code_examples(language, topic)
```

### 5.7 MCP Server: CI/CD

**Capabilities:**
```
1. get_build_status(job)
2. get_build_logs(job, build_number)
3. list_failed_builds(since)
4. get_argocd_applications()
5. get_application_health(app)
```

---
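As a sketch of what one of these read-only capabilities could look like in practice, here is a hypothetical implementation of the Kubernetes `get_pod_logs` tool using the official `kubernetes` Python client, enforcing the namespace allowlist from section 5.4. Function and constant names are illustrative, not part of an existing codebase.

```python
from kubernetes import client, config

ALLOWED_NAMESPACES = {"production", "staging"}   # mirrors the RBAC rules in section 5.4

def get_pod_logs(pod: str, namespace: str, container: str | None = None, tail: int = 200) -> str:
    """Read-only log fetch with the namespace allowlist enforced before any API call."""
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace '{namespace}' is not allowed")
    config.load_kube_config()        # use load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()
    logs = v1.read_namespaced_pod_log(
        name=pod, namespace=namespace, container=container, tail_lines=tail
    )
    # In production, pipe `logs` through the masking rules (section 7.3) before returning
    return logs
```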

## 6. Knowledge Base (RAG)

### 6.1 RAG Architecture

**Data Sources:**
- Technical documentation (5,000+ docs)
- Code repositories (10,000+ snippets)
- Kubernetes configs (500+ manifests)
- Incident history (1,000+ postmortems)

### 6.2 Vector Database (Qdrant)

**Configuration:**
```yaml
service:
  host: "0.0.0.0"
  port: 6333

storage:
  storage_path: "/var/lib/qdrant/storage"
  on_disk_payload: true

log_level: "INFO"
```

**Collections:**
```python
collections = [
    "technical_docs",   # 5000+ documents
    "code_snippets",    # 10000+ samples
    "incidents",        # 1000+ postmortems
    "k8s_configs",      # 500+ manifests
    "runbooks",         # 200+ procedures
]
```

### 6.3 Embedding Service

**Model:** bge-large-en-v1.5 (1024 dimensions)

**Implementation:**
```python
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

@app.post("/embed")
async def create_embeddings(texts: list[str]):
    # Normalized embeddings so cosine similarity reduces to a dot product
    embeddings = model.encode(texts, normalize_embeddings=True)
    return {"embeddings": embeddings.tolist()}
```

---

## 7. Security

### 7.1 Network Isolation

**Firewall Rules:**
```
Inbound:
├─ 443 (HTTPS) from Corporate VPN
├─ 11434 (Ollama) from MCP Orchestrator only
└─ 6333 (Qdrant) from Ollama server only

Outbound:
├─ 3000 (Gitea API)
├─ 2377 (Docker Swarm API)
├─ 6443 (Kubernetes API)
└─ 3100 (Loki query API)

Default: DENY ALL
```

### 7.2 Authentication

```yaml
authentication:
  provider: "ldap"
  ldap:
    url: "ldaps://ldap.company.local:636"
    user_base: "ou=users,dc=company,dc=local"

authorization:
  roles:
    - name: "devops"
      permissions:
        - "query:*"
        - "mcp:*:read"
      members:
        - "cn=devops-team,ou=groups"
```

### 7.3 Secrets Masking

```python
PATTERNS = [
    (r'password:\s*"?([^"\s]+)"?', r'password: "[REDACTED]"'),
    (r'token:\s*"?([^"\s]+)"?', r'token: "[REDACTED]"'),
    (r'\b\d{16}\b', '[CARD_REDACTED]'),          # Credit cards
    (r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]'), # SSN
]
```

### 7.4 Audit Logging

```python
# Log format:
# timestamp | user | action | details | result

2026-01-12 14:23:45 | vladimir.levinas | query   | model=qwen2.5-coder:32b | success
2026-01-12 14:23:46 | vladimir.levinas | mcp_k8s | method=get_pods         | success
```

---
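Applying the masking rules from 7.3 is a single `re.sub` pass per pattern. A minimal sketch, using the `PATTERNS` list above (the helper name is illustrative):

```python
import re

# The rules from section 7.3, applied in order
PATTERNS = [
    (r'password:\s*"?([^"\s]+)"?', r'password: "[REDACTED]"'),
    (r'token:\s*"?([^"\s]+)"?', r'token: "[REDACTED]"'),
    (r'\b\d{16}\b', '[CARD_REDACTED]'),
    (r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]'),
]

def mask_secrets(text: str) -> str:
    """Apply every rule; run on all MCP output before it reaches the model or the logs."""
    for pattern, replacement in PATTERNS:
        text = re.sub(pattern, replacement, text)
    return text

assert mask_secrets("card 4111111111111111") == "card [CARD_REDACTED]"
```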

## 8. Deployment

### 8.1 Installation (Ubuntu 22.04)

**Step 1: System Setup**
```bash
# Update system
apt update && apt upgrade -y

# Install NVIDIA drivers
apt install -y nvidia-driver-535

# Install Docker
curl -fsSL https://get.docker.com | sh

# Reboot
reboot
```

**Step 2: Install Ollama**
```bash
curl -fsSL https://ollama.com/install.sh | sh
systemctl enable ollama
systemctl start ollama

# Pull models (llama3.3:70b-q4 is only needed on the L40 build, see section 3.2)
ollama pull qwen2.5-coder:32b
ollama pull deepseek-r1:32b
```

**Step 3: Deploy Infrastructure**
```bash
# Clone repo
git clone https://git.thedevops.dev/devops/ollama-infrastructure
cd ollama-infrastructure

# Configure
cp .env.example .env
# Edit .env with your settings

# Deploy
docker-compose up -d

# Initialize Vector DB
python3 scripts/init-vector-db.py

# Load initial data
python3 scripts/load-docs.py
```

### 8.2 Production Checklist

- [ ] Hardware tested
- [ ] GPU drivers working (`nvidia-smi`)
- [ ] Ollama installed and models pulled
- [ ] Docker containers running
- [ ] Vector DB initialized
- [ ] MCP services tested
- [ ] End-to-end test passed
- [ ] TLS certificates valid
- [ ] LDAP authentication working
- [ ] Rate limiting configured
- [ ] Audit logging enabled
- [ ] Backups configured
- [ ] Monitoring configured
- [ ] Team trained

---

## 9. Monitoring

### 9.1 Key Metrics

**GPU Metrics:**
```
nvidia_gpu_temperature_celsius
nvidia_gpu_utilization_percent
nvidia_gpu_memory_used_bytes
nvidia_gpu_power_usage_watts
```

**Ollama Metrics:**
```
ollama_requests_total
ollama_request_duration_seconds
ollama_tokens_per_second
```

**MCP Metrics:**
```
mcp_requests_total{service="gitea"}
mcp_request_duration_seconds
mcp_errors_total
```

### 9.2 Grafana Dashboards

**Dashboard 1: Ollama Overview**
- GPU utilization
- Request rate
- Response time
- Active users

**Dashboard 2: MCP Services**
- Request distribution by service
- Success/error rates
- Latency percentiles

**Dashboard 3: Vector DB**
- Collection sizes
- Query performance
- Cache hit rate

---
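For the custom components (MCP servers, the orchestrator), the Ollama- and MCP-style metrics listed above can be exposed with the standard `prometheus_client` library. A hedged sketch, assuming port 9100 is free for the exporter and the label sets are illustrative:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Metric names mirror section 9.1; labels are an assumption of this sketch
OLLAMA_REQUESTS = Counter("ollama_requests_total", "Ollama requests", ["model", "status"])
OLLAMA_LATENCY = Histogram("ollama_request_duration_seconds", "Request latency", ["model"])

def observed(model: str, call):
    """Run an inference call and record its outcome and duration."""
    with OLLAMA_LATENCY.labels(model=model).time():
        try:
            result = call()
            OLLAMA_REQUESTS.labels(model=model, status="success").inc()
            return result
        except Exception:
            OLLAMA_REQUESTS.labels(model=model, status="error").inc()
            raise

if __name__ == "__main__":
    start_http_server(9100)   # Prometheus scrapes /metrics on this port
```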

## 10. Budget

### 10.1 Hardware Costs

| Item | Specification | Cost |
|------|--------------|------|
| **GPU** | NVIDIA RTX 4090 24GB | $1,600-2,000 |
| **CPU** | AMD Ryzen 9 7950X | $500-600 |
| **RAM** | 128GB DDR5 ECC | $600-800 |
| **Storage** | 2x 2TB NVMe + 4TB SATA | $800-1,000 |
| **Motherboard** | High-end workstation | $400-500 |
| **PSU** | 1600W Titanium | $300-400 |
| **Case/Cooling** | Enterprise grade | $300-400 |
| **Network** | 2x 10GbE NIC | $200-300 |
| **TOTAL** | | **$4,700-6,000** |

### 10.2 Software Costs

| Item | Cost |
|------|------|
| OS (Ubuntu Server) | FREE |
| Ollama | FREE |
| Qdrant | FREE (open source) |
| All MCP services | FREE (self-developed) |
| Monitoring (Prometheus/Grafana) | FREE |
| **TOTAL** | **$0** |

### 10.3 Annual Operational Costs

| Item | Cost |
|------|------|
| Electricity (~500W 24/7) | $650/year |
| Cooling | $200/year |
| Maintenance | $500/year |
| Training/Documentation | $2,000/year |
| **TOTAL Annual OpEx** | **$3,350/year** |

### 10.4 ROI Analysis

**Total Initial Investment:** $4,700-6,000 (hardware, section 10.1)

**Annual Savings:**
```
Time savings for 10 engineers:
├─ 4 hours/week saved per person
├─ 40 hours/week total
├─ 2,080 hours/year total
└─ At $100/hour = $208,000/year saved

Productivity increase:
├─ 30% faster troubleshooting
├─ 50% faster documentation
└─ Estimated value: $100,000/year

Total annual benefit: ~$308,000
```

**Payback Period:** ~1-2 months at face value (the 8-12 month ROI in the Executive Summary is the conservative planning figure)
**3-Year ROI:** ~6000%

---

## Appendix A: Quick Reference

### Service URLs
```
API Gateway:  https://ai.company.local
Ollama API:   http://10.30.10.10:11434
Qdrant:       http://10.30.10.30:6333
Grafana:      https://monitoring.company.local
```

### Common Commands
```bash
# Check Ollama status
ollama list

# Run model test
ollama run qwen2.5-coder:32b "Hello"

# Check GPU
nvidia-smi

# View logs
docker-compose logs -f ollama

# Backup Vector DB
docker exec qdrant tar -czf /backup/qdrant-$(date +%Y%m%d).tar.gz /qdrant/storage
```

---

**Document Version:** 2.0
**Last Updated:** January 2026
**Status:** Production Ready

**Approvals:**
- [ ] Infrastructure Lead
- [ ] Security Lead
- [ ] DevOps Lead
- [ ] Financial Approval