diff --git a/docs/gitops-cicd/11-ollama-comprehensive-enterprise-guide.md b/docs/gitops-cicd/11-ollama-comprehensive-enterprise-guide.md
index 60c74d7..1abc834 100644
--- a/docs/gitops-cicd/11-ollama-comprehensive-enterprise-guide.md
+++ b/docs/gitops-cicd/11-ollama-comprehensive-enterprise-guide.md
@@ -182,7 +182,7 @@ Self-hosted AI infrastructure based on Ollama with integr

 ### Level 1: User Access Layer

-**Web interface** built on Gradio provides convenient browser-based access without installing any additional software. It is the primary way most users interact with the system.
+**Web interface** built on Open WebUI provides convenient browser-based access without installing any additional software. It is the primary way most users interact with the system.

 **VS Code Extension** integrates the AI assistant directly into the development workflow. A developer can ask questions about code, generate tests, and get explanations without leaving the IDE.

@@ -237,7 +237,7 @@ Embedding Service uses the bge-large-en-v1.5 model for
 | **Network** | 2x 10 Gbps (bonded) | High throughput for MCP data retrieval |
 | **PSU** | 1600W 80+ Titanium | GPU power requirements |

-**Estimated cost:** $12,000-15,000
+

 ### Choosing a GPU by use case

@@ -261,29 +261,7 @@ Embedding Service uses the bge-large-en-v1.5 model for

 *with partial offloading to RAM

-### System memory allocation (128 GB)

-```
-16 GB → Ubuntu Server operating system
-8 GB → Ollama service
-32 GB → Qdrant vector database
-16 GB → MCP Services
-8 GB → Embedding service
-8 GB → API Gateway + monitoring
-40 GB → Model offloading buffer
-```
-
-### Storage allocation (2 TB NVMe)
-
-```
-300 GB → AI Models
-500 GB → Vector Database
-200 GB → MCP Services cache
-100 GB → OS and applications
-900 GB → Reserve for growth
-```
-
----

 ## Selecting and optimizing AI models

@@ -604,33 +582,6 @@ An effective AI assistant builds every interact

 **Relevance-based selection** - instead of selecting messages by recency, the relevance of each message to the current query is assessed via embedding similarity.

-### Persistent storage
-
-PostgreSQL stores conversation data:
-- **sessions** table: ID, user_id, created_at, updated_at, title, status
-- **messages** table: session_id, role, content, created_at, model_used, token_count
-- JSONB columns for semi-structured metadata
-
-**Indexes:**
-- (user_id, updated_at) for listing recent sessions
-- (session_id, created_at) for retrieving history
-
-**Partitioning:** Monthly partitions keep performance stable as data grows.
-
-### Privacy and retention
-
-**Encryption:**
-- At rest: database- or filesystem-level encryption
-- In transit: TLS for all communication
-
-**Access controls:**
-- Users see only their own conversations
-- RBAC for managers, with an audit trail
-
-**Retention policies:**
-- Automated cleanup according to policy
-- User right to deletion
-- Anonymization for analytics

 ### Search and navigation

@@ -652,678 +603,7 @@ PostgreSQL stores conversation data:

 **Sharing links** - a read-only URL with an expiration time and access controls.

-### Analytics

-**Usage metrics:**
-- Active users per day
-- Number of sessions
-- Average messages per session
-- Peak usage times
-
-**Query patterns:**
-- Common question types
-- Frequently discussed topics
-- Typical workflows
-
-**User satisfaction:**
-- Explicit ratings
-- Implicit signals (conversation length, corrections)
-
-### Session management table
-
-| Parameter | Value | Rationale |
-|----------|----------|-------------|
-| Max messages in window | 40 | Balance between context and performance |
-| Summarization trigger | 30 messages | Before the window is exhausted |
-| Compression ratio | 5:1 | 5 messages → 1 summary |
-| Max session idle time | 24 hours | Auto-close inactive sessions |
-| Max concurrent sessions | 10/user | Abuse prevention |
-
-### Retention policy table
-
-| Data type | Retention | Action | Access |
-|------------|-----------|----------|--------|
-| Active sessions | Indefinite | N/A | User only |
-| Inactive (<30d) | Indefinite | N/A | User only |
-| Old (30-90d) | Summarized | Messages→summary | User only |
-| Very old (>90d) | Archived | Cold storage | Read-only |
-| Marked for deletion | 30d grace | Permanent delete | User during grace |
-
----
-
-## Data storage strategy
-
-### Multi-tier architecture
-
-Effective AI infrastructure needs a deliberate approach to storing many types of data, each with its own characteristics and requirements.
-
-### Hot Storage: NVMe SSD RAID
-
-**Primary tier** provides high performance for frequently accessed data.
-
-**Contents:**
-- AI models (300 GB) - fast loading is critical for UX
-- Vector DB indices (200 GB) - intensive I/O on every query
-- Recent conversations (100 GB) - frequent access
-
-**Characteristics:**
-- NVMe interface: several GB/sec throughput
-- Latency: <100 microseconds
-- RAID 1: fault tolerance without downtime
-
-### Warm Storage: SATA SSD
-
-**Secondary tier** provides more capacity at a lower price.
-
-**Contents:**
-- Vector DB payload (300 GB)
-- Source documents (200 GB)
-- Older conversations (200 GB)
-- Daily backups (1 TB)
-
-**Characteristics:**
-- SATA interface: adequate speed
-- Cost-effective for large volumes
-- Acceptable latency for less frequent access
-
-### Cold Storage: Object Storage
-
-**Tertiary tier** for archival data and compliance.
-
-**Contents:**
-- Very old sessions (500 GB)
-- Weekly backups (500 GB)
-- Long-term analytics (variable)
-
-**Characteristics:**
-- S3-compatible storage
-- Dramatically lower cost
-- Retrieval latency measured in seconds
-
-### Lifecycle Management
-
-**Automated policies:**
-- Hot→Warm after a month of inactivity
-- Warm→Cold after three months
-- Deletion according to retention policy
-- Compression of older data
-
-### Backup Strategy
-
-**Continuous WAL archiving** in PostgreSQL for point-in-time recovery.
-
-**Daily full backups:**
-- Qdrant snapshots
-- PostgreSQL dumps
-- Stored on the warm and cold tiers
-
-**Weekly full backups:**
-- AI models (rarely change)
-- Configuration
-- Stored on the cold tier
-
-**Testing:** Automated restoration tests in a test environment.
-
-### Storage Tier Allocation table
-
-| Data | Volume | Tier | Access pattern | Latency | Retention |
-|--------|--------|------|----------------|---------|-----------|
-| AI models | 300 GB | Hot | On load | <1s | Indefinite |
-| Vector indices | 200 GB | Hot | On query | <100ms | Indefinite |
-| Vector payload | 300 GB | Warm | On retrieval | <500ms | Indefinite |
-| Recent sessions | 100 GB | Hot | Very frequent | <50ms | Indefinite |
-| Old sessions | 200 GB | Warm | Occasional | <1s | Until deletion |
-| Archived | 500 GB | Cold | Rare | <10s | Until deletion |
-| Source docs | 200 GB | Warm | On reindex | <2s | Indefinite |
-
-### Backup Strategy table
-
-| Type | Frequency | Retention | Location | RTO | RPO |
-|-----|-----------|-----------|----------|-----|-----|
-| PostgreSQL WAL | Continuous | 7d | Object | 1h | 5min |
-| PostgreSQL full | Daily | 30d | Warm+Cold | 2h | 24h |
-| Qdrant snapshot | Daily | 30d | Warm | 3h | 24h |
-| Qdrant snapshot | Weekly | 90d | Cold | 6h | 7d |
-| AI models | Weekly | Indefinite | Cold | 1h | 7d |
-| Configuration | On change | Indefinite | Git | 30min | Last commit |
-
----
-
-## Security and Compliance
-
-### Network Isolation
-
-**Firewall rules** implement least privilege:
-
-**Inbound:**
-- 443 (HTTPS) from the Corporate VPN
-- 11434 (Ollama) only from the MCP Orchestrator
-- 6333 (Qdrant) only from the Ollama server
-
-**Outbound:**
-- 3000 (Gitea API)
-- 2377 (Docker Swarm API)
-- 6443 (Kubernetes API)
-- 3100 (Loki API)
-- Default: DENY ALL
-
-**IDS/IPS** monitors traffic for suspicious patterns using ML-based anomaly detection.
-
-### Authentication and Authorization
-
-**LDAP integration** for enterprises:
-- Authentication with corporate credentials
-- Group membership determines access levels
-- Centralized password management
-
-**OIDC** for modern cloud-native auth:
-- Integration with Okta, Auth0, Azure AD
-- SSO capabilities
-- MFA support
-
-**RBAC (Role-Based Access Control):**
-- **devops role**: query:*, mcp:*:read
-- **developer role**: query:code, mcp:gitea:read
-- **viewer role**: query:docs
-
-### Secrets Masking
-
-**Automated patterns:**
-```
-password:\s*"?([^"\s]+)"? → password: "[REDACTED]"
-token:\s*"?([^"\s]+)"? → token: "[REDACTED]"
-\b\d{16}\b → [CARD_REDACTED]
-\b\d{3}-\d{2}-\d{4}\b → [SSN_REDACTED]
-```
-
-**Applied to:**
-- MCP server responses
-- System logs
-- Conversation histories
-- Export files
-
-### Audit Logging
-
-**All operations are logged:**
-```
-Timestamp | User | Action | Details | Result
-2026-01-12 14:23:45 | user@company.com | query | model=qwen2.5-coder | success
-2026-01-12 14:23:46 | user@company.com | mcp_k8s | get_pods | success
-```
-
-**Retention:** 1 year for compliance.
-
-**Analysis:** Regular review for suspicious patterns.
-
-### Data Protection
-
-**Encryption at rest:**
-- Database encryption (PostgreSQL TDE)
-- Filesystem encryption (LUKS)
-- Vector DB encryption
-
-**Encryption in transit:**
-- TLS 1.3 for all connections
-- Certificate management via Let's Encrypt or an internal CA
-
-**DLP (Data Loss Prevention):**
-- Content inspection on egress
-- Blocking transfer of sensitive patterns
-- Alerts on suspicious exports
-
-### Compliance
-
-**PCI DSS:** Data never leaves the secured network.
-
-**GDPR:**
-- Right to deletion implemented
-- Data minimization principles
-- Consent management
-- Data portability via exports
-
-**SOC 2:**
-- Comprehensive audit trails
-- Access controls documented
-- Regular security reviews
-- Incident response procedures
-
-### Security Monitoring
-
-**Metrics tracked:**
-- Failed authentication attempts
-- Unusual access patterns
-- MCP server errors
-- Rate limit hits
-- Secrets exposure attempts
-
-**Alerting:**
-- Slack integration for the security team
-- PagerDuty for critical alerts
-- Email for regular notifications
-
-### Security Controls table
-
-| Control | Type | Layer | Monitoring |
-|----------|-----|---------|------------|
-| Network firewall | Preventive | Infrastructure | 24/7 |
-| TLS encryption | Preventive | Transport | Certificate monitoring |
-| LDAP auth | Detective | Application | Login success rate |
-| RBAC | Preventive | Application | Access patterns |
-| Secrets masking | Preventive | Application | Exposure attempts |
-| Audit logging | Detective | All layers | Log analysis |
-| IDS/IPS | Detective/Preventive | Network | Alert monitoring |
-| Backup encryption | Preventive | Storage | Backup verification |
-
----
-
-## Monitoring and Observability
-
-### Key Metrics
-
-**GPU Metrics:**
-- nvidia_gpu_temperature_celsius
-- nvidia_gpu_utilization_percent
-- nvidia_gpu_memory_used_bytes
-- nvidia_gpu_power_usage_watts
-
-**Ollama Metrics:**
-- ollama_requests_total
-- ollama_request_duration_seconds
-- ollama_tokens_per_second
-- ollama_active_models
-
-**MCP Metrics:**
-- mcp_requests_total{service="gitea"}
-- mcp_request_duration_seconds
-- mcp_errors_total
-- mcp_cache_hit_ratio
-
-**RAG Metrics:**
-- qdrant_collection_size
-- qdrant_query_duration_seconds
-- embedding_generation_duration
-- reranking_duration
-
-**Storage Metrics:**
-- disk_usage_percent{tier="hot"}
-- disk_iops{tier="hot"}
-- disk_throughput_bytes
-- backup_last_success_timestamp
-
-### Grafana Dashboards
-
-**Dashboard 1: Ollama Overview**
-- GPU utilization timeline
-- Request rate per model
-- Response time percentiles (p50, p95, p99)
-- Active users count
-- Token generation rate
-
-**Dashboard 2: MCP Services**
-- Request distribution pie chart
-- Success/error rates per service
-- Latency heatmap
-- Cache hit rates
-- Top users by requests
-
-**Dashboard 3: Vector DB**
-- Collection size growth
-- Query performance trends
-- Cache effectiveness
-- Index rebuild status
-
-**Dashboard 4: User Experience**
-- Average response time
-- User satisfaction ratings
-- Session duration distribution
-- Popular query types
-- Error rate by type
-
-**Dashboard 5: Infrastructure Health**
-- CPU/RAM utilization
-- Disk I/O patterns
-- Network throughput
-- Temperature monitoring
-- Power consumption
-
-### Alerting Strategy
-
-**Critical Alerts (PagerDuty):**
-- Ollama service down
-- GPU temperature >85°C
-- Disk usage >90%
-- Authentication system unavailable
-- Backup failed
-
-**Warning Alerts (Slack):**
-- High error rate (>5%)
-- Slow response times (p95 >10s)
-- GPU utilization consistently >95%
-- MCP service degraded
-- Cache miss rate >50%
-
-**Info Alerts (Email):**
-- Scheduled maintenance reminders
-- Weekly usage statistics digest
-- Capacity planning recommendations
-
-### Logging Strategy
-
-**Structured logging** in JSON format for all components:
-```json
-{
-  "timestamp": "2026-01-12T14:23:45Z",
-  "level": "INFO",
-  "service": "ollama",
-  "message": "Model loaded",
-  "model": "qwen2.5-coder:32b",
-  "load_time_ms": 2341
-}
-```
-
-**Log aggregation** via Loki:
-- Central collection
-- Retention: 30 days hot, 90 days warm
-- Full-text search capability
-- Correlation with metrics
-
-**Log levels:**
-- ERROR: Failures requiring attention
-- WARN: Degraded performance
-- INFO: Normal operations
-- DEBUG: Detailed troubleshooting (disabled in production)
-
-### Distributed Tracing
-
-OpenTelemetry for end-to-end request tracing:
-- User request → API Gateway
-- Gateway → Ollama
-- Ollama → MCP services
-- MCP → Backend systems
-- RAG → Vector DB
-
-Jaeger UI for visualizing traces and identifying bottlenecks.
-
-### Health Checks
-
-**Liveness probes:**
-- Ollama /health endpoint
-- Qdrant readiness
-- PostgreSQL connectivity
-- MCP services status
-
-**Readiness probes:**
-- Models loaded
-- Indices ready
-- Database connections available
-
-**Frequency:** Every 30 seconds.
-
-### Capacity Planning
-
-**Trend analysis:**
-- Usage growth rate
-- Storage consumption trends
-- Peak load patterns
-- Resource saturation points
-
-**Forecasting:**
-- When an additional GPU will be needed
-- Storage expansion timeline
-- Network bandwidth requirements
-- Accommodating team growth
-
-### Monitoring table
-
-| Component | Metric | Warning threshold | Critical threshold | Action |
-|-----------|---------|-------------------|-------------------|--------|
-| GPU | Temperature | >75°C | >85°C | Check cooling |
-| GPU | Utilization | >85% | >95% | Consider scaling |
-| GPU | Memory | >20GB | >23GB | Model optimization |
-| Storage | Disk usage | >75% | >90% | Cleanup/expansion |
-| Storage | IOPS | >80% max | >95% max | Storage upgrade |
-| API | Error rate | >2% | >5% | Investigate logs |
-| API | Latency p95 | >5s | >10s | Performance tuning |
-| RAG | Query time | >1s | >2s | Index optimization |
-
----
-
-## Business case
-
-### Capital expenditure (CapEx)
-
-| Component | Cost |
-|-----------|-----------|
-| GPU (RTX 4090 24GB) | $1,600-2,000 |
-| CPU (Ryzen 9 7950X) | $500-600 |
-| RAM (128GB DDR5 ECC) | $600-800 |
-| Storage (NVMe + SATA) | $800-1,000 |
-| Motherboard (High-end) | $400-500 |
-| PSU (1600W Titanium) | $300-400 |
-| Case/Cooling | $300-400 |
-| Network (2x 10GbE) | $200-300 |
-| **TOTAL CapEx** | **$12,000-15,000** |
-
-### Annual operating expenditure (OpEx)
-
-| Item | Cost |
-|--------|-----------|
-| Electricity (~500W 24/7) | $650/year |
-| Cooling | $200/year |
-| Maintenance | $500/year |
-| Training/Documentation | $2,000/year |
-| **TOTAL OpEx** | **$3,350/year** |
-
-### Software (free)
-
-All software components are open source:
-- Ubuntu Server: FREE
-- Ollama: FREE
-- Qdrant: FREE
-- PostgreSQL: FREE
-- All MCP services: FREE (self-developed)
-- Prometheus/Grafana: FREE
-
-### ROI Analysis
-
-**Time savings for a team of 10 engineers:**
-
-| Activity | Saved | Hours/year | Value ($100/hour) |
-|------------|-------------|-----------|---------------------|
-| Information search | 40% | 832 hours | $83,200 |
-| Writing documentation | 50% | 520 hours | $52,000 |
-| Troubleshooting | 30% | 624 hours | $62,400 |
-| Code review | 20% | 208 hours | $20,800 |
-| **TOTAL** | | **2,184 hours** | **$218,400/year** |
-
-**ROI calculation:**
-```
-Total Investment: $15,000 (CapEx) + $3,350 (OpEx year 1) = $18,350
-Annual Benefit: $218,400
-Payback Period: 18,350 / 218,400 = 0.08 years = 1 month
-3-Year ROI: (3 × $218,400 - $18,350 - 2 × $3,350) / $18,350 = 3,458%
-```
-
-### Comparison with cloud AI APIs
-
-**OpenAI GPT-4 pricing:**
-- Prompt: $0.03 per 1K tokens
-- Completion: $0.06 per 1K tokens
-
-**Typical query:**
-- 2K tokens prompt (context + question)
-- 1K tokens completion
-- Cost per query: $0.12
-
-**Monthly cost for 10 users:**
-- 50 queries/day per user = 500 queries/day
-- 500 × 30 days = 15,000 queries/month
-- 15,000 × $0.12 = $1,800/month = $21,600/year
-
-**Self-hosted advantages:**
-- Lower cost after year 1
-- Complete data control
-- No API rate limits
-- Customizable models
-- No vendor lock-in
-
-### 3-year TCO (Total Cost of Ownership) table
-
-| Year | CapEx | OpEx | Total Annual | Cumulative | Cloud Alternative |
-|-----|-------|------|--------------|------------|-------------------|
-| 1 | $15,000 | $3,350 | $18,350 | $18,350 | $21,600 |
-| 2 | $0 | $3,350 | $3,350 | $21,700 | $43,200 |
-| 3 | $0 | $3,350 | $3,350 | $25,050 | $64,800 |
-| **Savings** | | | | | **$39,750** |
-
----
-
-## Deployment Roadmap
-
-### Phase 1: Foundation (Weeks 1-2)
-
-**Infrastructure setup:**
-- Server assembly and OS installation
-- Network configuration
-- GPU driver installation
-- Docker setup
-
-**Deliverables:**
-- Working server with a functional GPU
-- Network connectivity verified
-- Monitoring baseline established
-
-### Phase 2: Core Services (Weeks 3-4)
-
-**AI infrastructure:**
-- Ollama installation
-- Model download and testing
-- Basic API Gateway setup
-
-**Deliverables:**
-- Models responding to queries
-- Simple web interface functional
-- Performance benchmarks completed
-
-### Phase 3: MCP Integration (Weeks 5-6)
-
-**MCP services deployment:**
-- Gitea MCP server
-- Docker Swarm MCP server
-- Kubernetes MCP server (if applicable)
-
-**Deliverables:**
-- Models accessing corporate systems
-- Read-only access verified
-- Security controls tested
-
-### Phase 4: RAG Implementation (Weeks 7-8)
-
-**Knowledge base setup:**
-- Qdrant deployment
-- Embedding service
-- Initial document indexing
-
-**Deliverables:**
-- Vector DB operational
-- Initial corpus indexed
-- Search quality validated
-
-### Phase 5: Production Readiness (Weeks 9-10)
-
-**Finalization:**
-- Authentication integration
-- Monitoring dashboards
-- Backup automation
-- Documentation
-
-**Deliverables:**
-- Production-ready system
-- Team training completed
-- Operational runbooks
-- Go-live approval
-
-### Phase 6: Rollout (Weeks 11-12)
-
-**Gradual adoption:**
-- Pilot group (2-3 users)
-- Feedback collection
-- Issue resolution
-- Full team rollout
-
----
-
-## Operational Excellence
-
-### Daily Operations
-
-**Health checks:**
-- Morning dashboard review
-- Check overnight alerts
-- Verify backup success
-- Monitor disk usage
-
-**User support:**
-- Answer questions in Slack
-- Collect feedback
-- Document common issues
-
-### Weekly Tasks
-
-**Performance review:**
-- Analyze usage trends
-- Review slow queries
-- Check error patterns
-- Optimize as needed
-
-**Content updates:**
-- Reindex modified documents
-- Update code snippets
-- Refresh runbooks
-
-**Capacity planning:**
-- Review storage trends
-- Analyze GPU utilization
-- Forecast growth
-
-### Monthly Tasks
-
-**Security review:**
-- Audit log analysis
-- Access pattern review
-- Update firewall rules
-- Vulnerability scanning
-
-**System maintenance:**
-- OS updates
-- Driver updates
-- Dependency updates
-- Performance tuning
-
-**Reporting:**
-- Usage statistics
-- ROI tracking
-- User satisfaction
-- Improvement recommendations
-
-### Quarterly Tasks
-
-**Major upgrades:**
-- Model updates
-- Infrastructure upgrades
-- Feature additions
-
-**Strategy review:**
-- Roadmap adjustment
-- Budget review
-- Team expansion planning
-
-**Training:**
-- Advanced features training
-- Onboarding new team members
-- Best practices sharing
-
----

 ## Best Practices

@@ -1366,103 +646,6 @@ Payback Period: 18,350 / 218,400 = 0.08 years = 1 month
 4. **Test backups** regularly
 5. **Plan for growth** from day one

----
-
-## Troubleshooting Guide
-
-### GPU Issues
-
-**Symptom:** Model loading fails
-**Causes:**
-- Insufficient VRAM
-- Driver issues
-- Cooling problems
-
-**Resolution:**
-1. Check nvidia-smi output
-2. Verify model size vs VRAM
-3. Update drivers if needed
-4. Check temperatures
-
-**Symptom:** Slow inference
-**Causes:**
-- GPU throttling due to heat
-- CPU bottleneck
-- Insufficient RAM
-
-**Resolution:**
-1. Monitor GPU temperature
-2. Check cooling system
-3. Verify CPU usage
-4. Check RAM availability
-
-### MCP Service Issues
-
-**Symptom:** MCP timeouts
-**Causes:**
-- Backend system slow/down
-- Network issues
-- Rate limiting
-
-**Resolution:**
-1. Check backend system health
-2. Verify network connectivity
-3. Review rate limit settings
-4. Check MCP logs
-
-**Symptom:** Incorrect data returned
-**Causes:**
-- Cache staleness
-- Backend API changes
-- Parsing errors
-
-**Resolution:**
-1. Clear MCP cache
-2. Verify backend API format
-3. Check MCP server logs
-4. Update parsers if needed
-
-### RAG Issues
-
-**Symptom:** Poor search quality
-**Causes:**
-- Outdated index
-- Poor chunk strategy
-- Embedding model issues
-
-**Resolution:**
-1. Trigger reindexing
-2. Review chunk configuration
-3. Test embedding service
-4. Analyze user feedback
-
-**Symptom:** Slow searches
-**Causes:**
-- Index size too large
-- Insufficient resources
-- Network latency
-
-**Resolution:**
-1. Optimize index parameters
-2. Add more RAM/storage
-3. Check Qdrant configuration
-4. Review network latency
-
-### Storage Issues
-
-**Symptom:** Disk full
-**Causes:**
-- Uncontrolled growth
-- Failed cleanup jobs
-- Backup accumulation
-
-**Resolution:**
-1. Run cleanup scripts
-2. Archive old data
-3. Verify retention policies
-4. Plan capacity expansion
-
----

 ## Conclusion

@@ -1480,29 +663,4 @@ Self-hosted AI infrastructure based on Ollama with integr

 **History for context**. Persistent storage and intelligent management of conversation history are critical for the user experience and for continuous improvement of the system.

-### The path forward

-Deploying this kind of infrastructure is not a one-off project but the start of a journey of continuous improvement. The system will evolve along with:
-- The arrival of newer, more powerful models
-- Expanding integrations with corporate systems
-- Growth of the knowledge base
-- A growing user base
-- Evolving best practices
-
-### Next steps
-
-1. **Assess the readiness** of your organization for adoption
-2. **Plan the budget** and obtain approvals
-3. **Form a team** for deployment and support
-4. **Pilot deployment** with a small group of users
-5. **Iterative improvement** based on feedback
-6. **Gradual rollout** to the whole team
-
-With the right strategy, investment, and commitment, self-hosted AI infrastructure becomes a powerful enabler of productivity, work quality, and innovation in your organization.
-
----
-
-**Document version:** 1.0
-**Date:** January 2026
-**Author:** Based on infrastructure requirements for k3s-gitops
-**Status:** Comprehensive Guide
\ No newline at end of file
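The **Relevance-based selection** line that this patch keeps as context describes choosing which earlier messages to send to the model by their embedding similarity to the current query, rather than by recency. Below is a minimal Python sketch of that idea under stated assumptions. Message and query embeddings are taken as already computed (the vectors are toy 3-dimensional values, not output from any real model), similarity is plain cosine similarity, and `cosine_similarity` and `select_relevant_messages` are hypothetical helper names that are not part of Ollama, Qdrant, or Open WebUI.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_relevant_messages(
    query_embedding: Sequence[float],
    messages: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Hypothetical helper: keep the k past messages most similar to the query.

    `messages` is the chronologically ordered history as (text, embedding) pairs.
    The chosen messages are returned in their original order, so the prompt
    still reads as a coherent conversation.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), index)
        for index, (_, embedding) in enumerate(messages)
    ]
    # Rank by similarity; on equal scores the more recent message (higher index) wins.
    top_k = sorted(scored, reverse=True)[:k]
    kept_indices = sorted(index for _, index in top_k)
    return [messages[index][0] for index in kept_indices]


# Toy 3-dimensional embeddings; a real deployment would use an embedding model.
history = [
    ("How do I restart the Qdrant container?", [0.9, 0.1, 0.0]),
    ("Can we move the team sync to Friday?", [0.0, 0.2, 0.9]),
    ("Which port does Qdrant listen on?", [0.8, 0.3, 0.1]),
]
print(select_relevant_messages([1.0, 0.2, 0.0], history, k=2))
# prints ['How do I restart the Qdrant container?', 'Which port does Qdrant listen on?']
```

In this sketch, the kept messages are restored to chronological order instead of being emitted in score order. Relevance decides *which* turns are included, while their original order preserves the flow of the conversation the model sees. The off-topic scheduling question is dropped even though it is more recent than the first Qdrant question.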