# Ollama Server Requirements for FinTech DevOps with MCP Integration

**Version:** 2.0

**Date:** January 2026

**Status:** Production Ready

**Target audience:** Infrastructure Team, DevOps, Security, Management

---

## Executive Summary

### Business Rationale

**Problem:**
The FinTech company generates a large volume of technical information (code, logs, documentation, Kubernetes configurations) spread across many systems. Developers and DevOps engineers spend 30-40% of their time searching for information, analyzing logs, and writing documentation.

**Solution:**
A self-hosted AI assistant built on Ollama, integrated with all of the company's data sources via MCP (Model Context Protocol).

**Key advantages for FinTech:**
- ✅ Data never leaves the corporate network (PCI DSS, GDPR compliance)
- ✅ No dependency on external AI providers (OpenAI, Anthropic)
- ✅ Full control over the information being processed
- ✅ Ability to train on confidential data

**Expected impact:**
- 40% less time spent searching for information
- 50% faster documentation writing
- 30% shorter troubleshooting time
- ROI: 8-12 months

---

## Contents

1. [Goals and Use Cases](#1-goals-and-use-cases)
2. [Solution Architecture](#2-solution-architecture)
3. [Server Requirements](#3-server-requirements)
4. [AI Model Selection](#4-ai-model-selection)
5. [MCP Services](#5-mcp-services)
6. [Knowledge Base (RAG)](#6-knowledge-base-rag)
7. [Security](#7-security)
8. [Deployment](#8-deployment)
9. [Monitoring](#9-monitoring)
10. [Budget](#10-budget)

---

## 1. Goals and Use Cases

### 1.1 Main Tasks

**For the DevOps team (5 people):**

1. **Kubernetes/Docker Swarm analysis**
   - "Why is this pod in CrashLoopBackOff?"
   - "How do I optimize resource requests?"
   - "Show all pods with high memory usage"

2. **Log-driven troubleshooting**
   - "Find the cause of the 500 errors in the logs from the last hour"
   - "Which services are showing connection timeouts?"
   - "Analyze the performance degradation"

3. **Infrastructure code generation**
   - "Create a Helm chart for a microservice with PostgreSQL"
   - "Write Terraform for AWS RDS with encryption"
   - "Generate a docker-compose.yml"

**For developers (5 people):**

1. **Code generation and review**
   - "Write unit tests for this service"
   - "Optimize this SQL query"
   - "Code review: find potential security issues"

2. **Working with documentation**
   - "How do I use our internal payment API?"
   - "Show examples of integrating with the fraud detection service"

### 1.2 Technical Requirements

- **Concurrent users:** up to 10 people
- **Peak concurrent requests:** 8 simultaneous
- **Data sources:**
  - Gitea (100+ repositories)
  - Docker Swarm (50+ services)
  - Kubernetes cluster (150+ pods, if used)
  - Loki logs (1 TB/month)
  - Technical documentation (5,000+ documents)

---

## 2. Solution Architecture

### 2.1 High-Level Architecture

```
┌─────────────────────────────────────────────────────┐
│                  USER ACCESS LAYER                  │
│                                                     │
│  ┌──────────┐   ┌───────────┐   ┌──────────┐        │
│  │ Web UI   │   │ VS Code   │   │ CLI Tool │        │
│  │ (Gradio) │   │(Extension)│   │ (Python) │        │
│  └────┬─────┘   └─────┬─────┘   └────┬─────┘        │
└───────┼───────────────┼──────────────┼──────────────┘
        │               │              │
        └───────────────┼──────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│             API GATEWAY / REVERSE PROXY             │
│                   (Traefik/Nginx)                   │
│  • TLS termination                                  │
│  • Authentication (LDAP/OIDC)                       │
│  • Rate limiting (100 req/min per user)             │
│  • IP: 10.30.10.5                                   │
└───────────────────────┬─────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│                OLLAMA INFERENCE LAYER               │
│                                                     │
│  ┌─────────────────────────────────────┐            │
│  │          Ollama Server              │            │
│  │                                     │            │
│  │  Models (Hot-loaded):               │            │
│  │  • qwen2.5-coder:32b (Code)         │            │
│  │  • deepseek-r1:32b (Reasoning)      │            │
│  │  • llama3.3:70b-q4 (Universal)      │            │
│  │                                     │            │
│  │  GPU: 1x NVIDIA RTX 4090 24GB       │            │
│  │  CPU: 32 vCPU                       │            │
│  │  RAM: 128 GB                        │            │
│  │  IP: 10.30.10.10:11434              │            │
│  └─────────────────────────────────────┘            │
└───────────────────────┬─────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│         MCP (MODEL CONTEXT PROTOCOL) LAYER          │
│                                                     │
│  ┌─────────────────────────────────────┐            │
│  │          MCP Orchestrator           │            │
│  │  • Request routing                  │            │
│  │  • Context assembly                 │            │
│  │  IP: 10.30.10.20                    │            │
│  └───────┬─────────────────────────────┘            │
│          │                                          │
│     ┌────┼────┬────────┬────────┬────────┬──────┐   │
│     │    │    │        │        │        │      │   │
│  ┌──▼─┐ ┌▼──┐ ┌▼────┐ ┌▼─────┐ ┌▼────┐ ┌▼─────┐    │
│  │Git │ │Swm│ │ K8s │ │ Logs │ │Docs │ │CI/CD │    │
│  │ea  │ │arm│ │     │ │(Loki)│ │     │ │      │    │
│  └────┘ └───┘ └─────┘ └──────┘ └─────┘ └──────┘    │
└───────────────────────┬─────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│             KNOWLEDGE BASE / RAG LAYER              │
│                                                     │
│  ┌─────────────────────────────────────┐            │
│  │      Vector Database (Qdrant)       │            │
│  │  • technical-docs (5000+ docs)      │            │
│  │  • code-snippets (10000+ samples)   │            │
│  │  • k8s-configs (500+ manifests)     │            │
│  │  • incidents (1000+ postmortems)    │            │
│  │  Storage: 500 GB                    │            │
│  │  IP: 10.30.10.30:6333               │            │
│  └─────────────────────────────────────┘            │
│                                                     │
│  ┌─────────────────────────────────────┐            │
│  │         Embedding Service           │            │
│  │  • bge-large-en-v1.5                │            │
│  │  • Text chunking (512 tokens)       │            │
│  │  IP: 10.30.10.31                    │            │
│  └─────────────────────────────────────┘            │
└─────────────────────────────────────────────────────┘
```

---

## 3. Server Requirements

### 3.1 Production Configuration (Recommended)

| Component | Specification | Rationale |
|-----------|--------------|-----------|
| **GPU** | 1x NVIDIA RTX 4090 24GB VRAM | Best price/performance balance for 32B models |
| **GPU (alternative)** | 1x NVIDIA L40 48GB VRAM | For 70B models and large contexts |
| **CPU** | AMD Ryzen 9 7950X (16 cores, 32 threads) | Preprocessing, embedding, parallel MCP calls |
| **RAM** | 128 GB DDR5 ECC | 64 GB for OS/services + 64 GB for model offloading |
| **Storage Primary** | 2x 2TB NVMe SSD (RAID 1) | Model cache, vector DB, fast I/O |
| **Storage Secondary** | 4TB SATA SSD | Document storage, backups |
| **Network** | 2x 10 Gbps (bonded) | High throughput for MCP data retrieval |
| **PSU** | 1600W 80+ Titanium | GPU power requirements |

**Estimated Cost:** $12,000-15,000 (with RTX 4090) or $18,000-22,000 (with L40)

### 3.2 GPU Selection Guide

| Use Case | GPU | VRAM | Models Supported | Cost |
|----------|-----|------|------------------|------|
| **Code generation only** | RTX 3090 | 24 GB | qwen2.5-coder:32b | $1,000-1,500 |
| **Balanced (recommended)** | RTX 4090 | 24 GB | 32B models, 70B Q4 | $1,600-2,000 |
| **Large context (70B)** | L40 | 48 GB | llama3.3:70b | $6,000-8,000 |
| **Maximum capacity** | A100 | 80 GB | Multiple 70B models | $10,000-15,000 |

**Recommendation for FinTech:**
The RTX 4090 24GB is the optimal choice for 10 users.

### 3.3 Resource Allocation

**VRAM:**
```
Model Memory (Q4 quantization):
qwen2.5-coder:32b  → 22 GB VRAM
deepseek-r1:32b    → 24 GB VRAM
llama3.3:70b-q4    → 40 GB VRAM (needs L40)
```

**RAM (128 GB breakdown):**
```
16 GB → OS (Ubuntu Server)
 8 GB → Ollama service
32 GB → Vector DB (Qdrant)
16 GB → MCP Services
 8 GB → Embedding service
 8 GB → API Gateway + misc
40 GB → Model offloading buffer
```

**Storage (2 TB NVMe):**
```
300 GB → AI Models
500 GB → Vector Database
200 GB → MCP Services cache
100 GB → OS and applications
900 GB → Free space / growth
```

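The breakdowns above can be sanity-checked with a few lines of arithmetic; this throwaway script (illustrative only, not part of the deployment) simply confirms that each budget sums to its stated total:

```python
# Sanity-check the RAM and NVMe storage breakdowns from section 3.3.
ram_gb = {
    "OS (Ubuntu Server)": 16,
    "Ollama service": 8,
    "Vector DB (Qdrant)": 32,
    "MCP Services": 16,
    "Embedding service": 8,
    "API Gateway + misc": 8,
    "Model offloading buffer": 40,
}
storage_gb = {
    "AI Models": 300,
    "Vector Database": 500,
    "MCP Services cache": 200,
    "OS and applications": 100,
    "Free space / growth": 900,
}

assert sum(ram_gb.values()) == 128       # matches the 128 GB RAM budget
assert sum(storage_gb.values()) == 2000  # matches the 2 TB NVMe budget
```
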
---

## 4. AI Model Selection

### 4.1 Recommended Model Pool

**Primary Models:**

#### 1. qwen2.5-coder:32b - Code Specialist
```
Purpose: Code generation, review, debugging
Size: 20 GB (Q4)
VRAM: 22 GB
Context: 32k tokens
Speed: ~45 tokens/sec (RTX 4090)

Strengths:
✓ Best for infrastructure code (Terraform, K8s)
✓ Understands DevOps patterns
✓ Excellent code comments

Use cases:
• Generating Helm charts
• Writing Bash scripts
• Code review for security issues
• Dockerfile optimization
```

#### 2. deepseek-r1:32b - Reasoning Engine
```
Purpose: Complex analysis, troubleshooting
Size: 22 GB (Q4)
VRAM: 24 GB
Context: 64k tokens
Speed: ~40 tokens/sec

Strengths:
✓ Excellent reasoning for root cause analysis
✓ Multi-step problem solving
✓ Complex systems analysis

Use cases:
• Log analysis and troubleshooting
• Architecture decision making
• Incident post-mortems
• Performance optimization
```

#### 3. llama3.3:70b-q4 - Universal Assistant
```
Purpose: Documentation, explanations
Size: 38 GB (Q4)
VRAM: 40 GB (needs L40)
Context: 128k tokens
Speed: ~25 tokens/sec

Strengths:
✓ Best for long-form documentation
✓ Excellent writing quality
✓ Multi-lingual

Use cases:
• Technical documentation
• README files
• Architecture design documents
```

### 4.2 Model Performance Benchmarks

**Real-world performance on an RTX 4090:**

| Task | Model | Context | Time | Quality |
|------|-------|---------|------|---------|
| **Code generation** | qwen2.5-coder:32b | 8k | 12 sec | 9/10 |
| **Log analysis** | deepseek-r1:32b | 32k | 25 sec | 9/10 |
| **Documentation** | llama3.3:70b-q4 | 64k | 90 sec* | 10/10 |
| **Quick Q&A** | qwen2.5-coder:32b | 2k | 3 sec | 8/10 |

*The 70B model runs on the RTX 4090 via CPU offloading.

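In practice the pool is most useful behind a simple task-based router that picks a model per request. The sketch below is illustrative only; the task categories and the `route_model` helper are assumptions for this document, not part of any shipped component:

```python
# Illustrative task-to-model routing for the pool in section 4.1.
# The category names and fallback choice are assumptions.
ROUTING = {
    "code": "qwen2.5-coder:32b",        # generation, review, debugging
    "troubleshoot": "deepseek-r1:32b",  # log analysis, root cause
    "docs": "llama3.3:70b-q4",          # long-form documentation
}

def route_model(task_type: str) -> str:
    """Pick a model for a task; fall back to the code specialist."""
    return ROUTING.get(task_type, "qwen2.5-coder:32b")

print(route_model("troubleshoot"))  # deepseek-r1:32b
```

A router like this keeps only one 32B model hot on the 24 GB GPU at a time, which matches the VRAM budget in section 3.3.
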
---

## 5. MCP Services

### 5.1 MCP Architecture

**Model Context Protocol (MCP)** is a standardized way to connect AI models to external data sources.

### 5.2 MCP Server: Gitea

**Capabilities:**
```
1. list_repositories()
2. get_file(repo, path, branch)
3. search_code(query, language)
4. get_commit_history(repo, file)
5. get_pull_requests(repo)
6. compare_branches(repo, base, head)
7. get_documentation(repo)
8. analyze_dependencies(repo)
```

**Configuration:**
```yaml
gitea:
  url: "https://git.thedevops.dev"
  read_only: true
  allowed_repos:
    - "admin/k3s-gitops"
    - "devops/*"
  max_requests_per_minute: 100
  cache_ttl: 300
```

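To illustrate how the `allowed_repos` policy above might be enforced inside the MCP server, here is a minimal sketch. The `repo_allowed` helper is a hypothetical name, and glob matching via `fnmatch` is an assumption about how the patterns are interpreted:

```python
# Illustrative enforcement of the Gitea MCP repo allow-list.
# Patterns mirror the allowed_repos config above; fnmatch treats
# "devops/*" as a glob, so any repo under devops/ is permitted.
from fnmatch import fnmatch

ALLOWED_REPOS = ["admin/k3s-gitops", "devops/*"]

def repo_allowed(repo: str) -> bool:
    """True if the repo matches any allow-list pattern."""
    return any(fnmatch(repo, pattern) for pattern in ALLOWED_REPOS)

assert repo_allowed("devops/ollama-infrastructure")
assert not repo_allowed("finance/ledger")
```

Every capability listed above would call a check like this before touching the Gitea API, so a model can never be steered into repositories outside the allow-list.
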

### 5.3 MCP Server: Docker Swarm

**Capabilities:**
```
1. list_services()
2. get_service_logs(service, tail, since)
3. describe_service(service)
4. list_stacks()
5. get_stack_services(stack)
6. analyze_service_health(service)
7. get_swarm_nodes()
```

**Security:**
```yaml
docker_swarm:
  read_only: true
  secrets_masking: true
  secret_patterns:
    - "*_PASSWORD"
    - "*_TOKEN"
    - "*_KEY"
```

### 5.4 MCP Server: Kubernetes

**Capabilities:**
```
1. get_pods(namespace, labels)
2. get_pod_logs(pod, namespace, container)
3. describe_pod(pod, namespace)
4. get_deployments(namespace)
5. get_events(namespace, since)
6. analyze_resource_usage(namespace)
```

**RBAC:**
```yaml
kubernetes:
  read_only: true
  namespaces:
    allowed: ["production", "staging"]
    denied: ["kube-system"]
  mask_secrets: true
```

### 5.5 MCP Server: Logs (Loki)

**Capabilities:**
```
1. query_logs(query, start, end)
2. search_errors(service, since)
3. analyze_patterns(service, time_range)
4. get_service_logs(service, tail)
5. trace_request(request_id)
```

**Security:**
```yaml
loki:
  max_query_range: "24h"
  max_lines: 5000
  sensitive_patterns:
    - regex: '\b\d{16}\b'  # Credit cards
      replacement: "[CARD_REDACTED]"
    - regex: 'password=\S+'
      replacement: "password=[REDACTED]"
```

### 5.6 MCP Server: Documentation

**Capabilities:**
```
1. search_docs(query, category)
2. get_document(doc_id)
3. list_runbooks()
4. get_architecture_docs()
5. search_code_examples(language, topic)
```

### 5.7 MCP Server: CI/CD

**Capabilities:**
```
1. get_build_status(job)
2. get_build_logs(job, build_number)
3. list_failed_builds(since)
4. get_argocd_applications()
5. get_application_health(app)
```

---

## 6. Knowledge Base (RAG)

### 6.1 RAG Architecture

**Data Sources:**
- Technical Documentation (5000+ docs)
- Code Repositories (10000+ snippets)
- Kubernetes Configs (500+ manifests)
- Incident History (1000+ postmortems)

### 6.2 Vector Database (Qdrant)

**Configuration:**
```yaml
service:
  host: "0.0.0.0"
  port: 6333

storage:
  storage_path: "/var/lib/qdrant/storage"
  on_disk_payload: true

log_level: "INFO"
```

**Collections:**
```python
collections = [
    "technical_docs",  # 5000+ documents
    "code_snippets",   # 10000+ samples
    "incidents",       # 1000+ postmortems
    "k8s_configs",     # 500+ manifests
    "runbooks",        # 200+ procedures
]
```

### 6.3 Embedding Service

**Model:** bge-large-en-v1.5 (1024 dimensions)

**Implementation:**
```python
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

@app.post("/embed")
async def create_embeddings(texts: list[str]):
    embeddings = model.encode(texts, normalize_embeddings=True)
    return {"embeddings": embeddings.tolist()}
```

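Retrieval over these collections comes down to embedding the query and ranking stored chunks by similarity. A minimal, self-contained sketch of that step, where toy 3-d vectors stand in for the 1024-d bge embeddings and all names are illustrative:

```python
# Minimal sketch of the retrieval step: rank stored chunks by cosine
# similarity to the query vector, keep the best match as model context.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for chunks already indexed in Qdrant.
chunks = {
    "runbook: restart payment-api": [0.9, 0.1, 0.0],
    "postmortem: 2025 outage": [0.2, 0.8, 0.1],
}
query_vec = [0.85, 0.15, 0.05]  # would come from the /embed endpoint

top = max(chunks, key=lambda name: cosine(query_vec, chunks[name]))
print(top)  # runbook: restart payment-api
```

Because the embedding service normalizes its output (`normalize_embeddings=True`), cosine similarity reduces to a plain dot product in production, which is what Qdrant computes natively.
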
---

## 7. Security

### 7.1 Network Isolation

**Firewall Rules:**
```
Inbound:
├─ 443 (HTTPS) from Corporate VPN
├─ 11434 (Ollama) from MCP Orchestrator only
└─ 6333 (Qdrant) from Ollama server only

Outbound:
├─ 3000 (Gitea API)
├─ 2377 (Docker Swarm API)
├─ 6443 (Kubernetes API)
└─ 3100 (Loki query API)

Default: DENY ALL
```

### 7.2 Authentication

```yaml
authentication:
  provider: "ldap"
  ldap:
    url: "ldaps://ldap.company.local:636"
    user_base: "ou=users,dc=company,dc=local"

authorization:
  roles:
    - name: "devops"
      permissions:
        - "query:*"
        - "mcp:*:read"
      members:
        - "cn=devops-team,ou=groups"
```

### 7.3 Secrets Masking

```python
PATTERNS = [
    (r'password:\s*"?([^"\s]+)"?', r'password: "[REDACTED]"'),
    (r'token:\s*"?([^"\s]+)"?', r'token: "[REDACTED]"'),
    (r'\b\d{16}\b', '[CARD_REDACTED]'),            # Credit cards
    (r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]'),  # SSN
]
```

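Applying these patterns before any text leaves the MCP layer takes only a few lines; `mask_secrets` is an assumed helper name for this sketch:

```python
# Apply the masking patterns from section 7.3 to outbound text.
import re

PATTERNS = [
    (r'password:\s*"?([^"\s]+)"?', r'password: "[REDACTED]"'),
    (r'token:\s*"?([^"\s]+)"?', r'token: "[REDACTED]"'),
    (r'\b\d{16}\b', '[CARD_REDACTED]'),            # Credit cards
    (r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]'),  # SSN
]

def mask_secrets(text: str) -> str:
    """Run every pattern over the text before it reaches the model."""
    for pattern, replacement in PATTERNS:
        text = re.sub(pattern, replacement, text)
    return text

print(mask_secrets("password: hunter2 card 4111111111111111"))
# password: "[REDACTED]" card [CARD_REDACTED]
```

The same function can back both the Loki `sensitive_patterns` filter and the Docker Swarm `secrets_masking` option, keeping one redaction code path for all MCP services.
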

### 7.4 Audit Logging

```
# Format: timestamp | user | action | details | result

2026-01-12 14:23:45 | vladimir.levinas | query   | model=qwen2.5-coder:32b | success
2026-01-12 14:23:46 | vladimir.levinas | mcp_k8s | method=get_pods         | success
```

---

## 8. Deployment

### 8.1 Installation (Ubuntu 22.04)

**Step 1: System Setup**
```bash
# Update system
apt update && apt upgrade -y

# Install NVIDIA drivers
apt install -y nvidia-driver-535

# Install Docker
curl -fsSL https://get.docker.com | sh

# Reboot
reboot
```

**Step 2: Install Ollama**
```bash
curl -fsSL https://ollama.com/install.sh | sh
systemctl enable ollama
systemctl start ollama

# Pull models
ollama pull qwen2.5-coder:32b
ollama pull deepseek-r1:32b
```

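Once the models are pulled, they can be smoke-tested over Ollama's HTTP API. The request shape below follows Ollama's standard `/api/generate` endpoint; the host IP is the one assigned to the Ollama server in this design, and the helper names are illustrative:

```python
# Smoke-test a pulled model via Ollama's HTTP API (default port 11434).
import json
from urllib import request

OLLAMA_URL = "http://10.30.10.10:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    body = json.dumps(build_payload(model, prompt)).encode()
    req = request.Request(OLLAMA_URL, data=body,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("qwen2.5-coder:32b", "Say hello in one word."))
```

A non-empty response here confirms the model is loaded and the GPU path works before the rest of the stack is deployed.
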

**Step 3: Deploy Infrastructure**
```bash
# Clone repo
git clone https://git.thedevops.dev/devops/ollama-infrastructure
cd ollama-infrastructure

# Configure
cp .env.example .env
# Edit .env with your settings

# Deploy
docker-compose up -d

# Initialize Vector DB
python3 scripts/init-vector-db.py

# Load initial data
python3 scripts/load-docs.py
```

### 8.2 Production Checklist

- [ ] Hardware tested
- [ ] GPU drivers working (`nvidia-smi`)
- [ ] Ollama and models loaded
- [ ] Docker containers running
- [ ] Vector DB initialized
- [ ] MCP services tested
- [ ] End-to-end test passed
- [ ] TLS certificates valid
- [ ] LDAP authentication working
- [ ] Rate limiting configured
- [ ] Audit logging enabled
- [ ] Backups configured
- [ ] Monitoring configured
- [ ] Team trained

---

## 9. Monitoring

### 9.1 Key Metrics

**GPU Metrics:**
```
nvidia_gpu_temperature_celsius
nvidia_gpu_utilization_percent
nvidia_gpu_memory_used_bytes
nvidia_gpu_power_usage_watts
```

**Ollama Metrics:**
```
ollama_requests_total
ollama_request_duration_seconds
ollama_tokens_per_second
```

**MCP Metrics:**
```
mcp_requests_total{service="gitea"}
mcp_request_duration_seconds
mcp_errors_total
```

### 9.2 Grafana Dashboards

**Dashboard 1: Ollama Overview**
- GPU utilization
- Request rate
- Response time
- Active users

**Dashboard 2: MCP Services**
- Request distribution by service
- Success/error rates
- Latency percentiles

**Dashboard 3: Vector DB**
- Collection sizes
- Query performance
- Cache hit rate

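The latency-percentile panels reduce to a quantile computation over observed request durations. A dependency-free reference version, with made-up sample durations standing in for `mcp_request_duration_seconds` observations:

```python
# Compute p50/p95/p99 over request durations (seconds), as a Grafana
# latency panel would from a duration histogram/summary.
import statistics

durations = [0.12, 0.15, 0.2, 0.22, 0.3, 0.35, 0.5, 0.8, 1.2, 4.0]

# method="inclusive" keeps cut points inside the observed range,
# which is sensible for small samples like this one.
cuts = statistics.quantiles(durations, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.2f}s p95={p95:.2f}s p99={p99:.2f}s")
```

Note how a single slow outlier (4.0 s) barely moves p50 but dominates p99; this is why the dashboards track percentiles rather than averages.
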
---

## 10. Budget

### 10.1 Hardware Costs

| Item | Specification | Cost |
|------|--------------|------|
| **GPU** | NVIDIA RTX 4090 24GB | $1,600-2,000 |
| **CPU** | AMD Ryzen 9 7950X | $500-600 |
| **RAM** | 128GB DDR5 ECC | $600-800 |
| **Storage** | 2x 2TB NVMe + 4TB SATA | $800-1,000 |
| **Motherboard** | High-end workstation | $400-500 |
| **PSU** | 1600W Titanium | $300-400 |
| **Case/Cooling** | Enterprise grade | $300-400 |
| **Network** | 2x 10GbE NIC | $200-300 |
| **TOTAL** | | **$12,000-15,000** |

### 10.2 Software Costs

| Item | Cost |
|------|------|
| OS (Ubuntu Server) | FREE |
| Ollama | FREE |
| Qdrant | FREE (open source) |
| All MCP services | FREE (self-developed) |
| Monitoring (Prometheus/Grafana) | FREE |
| **TOTAL** | **$0** |

### 10.3 Annual Operational Costs

| Item | Cost |
|------|------|
| Electricity (~500W 24/7) | $650/year |
| Cooling | $200/year |
| Maintenance | $500/year |
| Training/Documentation | $2,000/year |
| **TOTAL Annual OpEx** | **$3,350/year** |

### 10.4 ROI Analysis

**Total Initial Investment:** $12,000-15,000

**Annual Savings:**
```
Time savings for 10 engineers:
├─ 4 hours/week saved per person
├─ 40 hours/week total
├─ 2080 hours/year total
└─ At $100/hour = $208,000/year saved

Productivity increase:
├─ 30% faster troubleshooting
├─ 50% faster documentation
└─ Estimated value: $100,000/year

Total annual benefit: ~$308,000
```

**Payback Period:** ~1-2 months

**3-Year ROI:** 6000%

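The arithmetic behind these figures is easy to reproduce; this sketch assumes the upper-bound $15,000 investment and the OpEx total from section 10.3:

```python
# Reproduce the ROI arithmetic from section 10.4.
engineers, hours_saved_per_week, hourly_rate = 10, 4, 100

hours_per_year = engineers * hours_saved_per_week * 52   # 2080
time_savings = hours_per_year * hourly_rate              # $208,000
productivity_value = 100_000                             # section 10.4 estimate
annual_benefit = time_savings + productivity_value       # $308,000

capex, annual_opex = 15_000, 3_350
roi_3yr = (3 * annual_benefit - capex - 3 * annual_opex) / capex

assert hours_per_year == 2080
assert annual_benefit == 308_000
print(f"3-year ROI ≈ {roi_3yr:.0%}")  # ≈ 5993%
```

This lands within rounding of the stated 6000% three-year ROI.
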
---

## Appendix A: Quick Reference

### Service URLs
```
API Gateway: https://ai.company.local
Ollama API:  http://10.30.10.10:11434
Qdrant:      http://10.30.10.30:6333
Grafana:     https://monitoring.company.local
```

### Common Commands
```bash
# Check Ollama status
ollama list

# Run model test
ollama run qwen2.5-coder:32b "Hello"

# Check GPU
nvidia-smi

# View logs
docker-compose logs -f ollama

# Backup Vector DB
docker exec qdrant tar -czf /backup/qdrant-$(date +%Y%m%d).tar.gz /qdrant/storage
```

---

**Document Version:** 2.0

**Last Updated:** January 2026

**Status:** Production Ready

**Approvals:**
- [ ] Infrastructure Lead
- [ ] Security Lead
- [ ] DevOps Lead
- [ ] Financial Approval