Update docs/gitops-cicd/11-ollama-comprehensive-enterprise-guide.md
@@ -182,7 +182,7 @@ Self-hosted AI-инфраструктура на базе Ollama с интегр
### Level 1: User Access Layer

The **web interface**, built on Open WebUI, provides convenient browser-based access without installing any additional software. It is the primary way most users interact with the system.

The **VS Code extension** integrates the AI assistant directly into the development workflow. Developers can ask questions about code, generate tests, and get explanations without leaving the IDE.

@@ -237,7 +237,7 @@ Embedding Service использует модель bge-large-en-v1.5 для с
| **Network** | 2x 10 Gbps (bonded) | High throughput for MCP data retrieval |
| **PSU** | 1600W 80+ Titanium | GPU power requirements |

**Estimated cost:** $12,000-15,000

### Choosing a GPU by use case

@@ -261,29 +261,7 @@ Embedding Service использует модель bge-large-en-v1.5 для с
*with partial offloading to RAM

### System memory allocation (128 GB)

```
16 GB → Ubuntu Server operating system
8 GB → Ollama service
32 GB → Qdrant vector database
16 GB → MCP services
8 GB → Embedding service
8 GB → API Gateway + monitoring
40 GB → Model offloading buffer
```

### Storage allocation (2 TB NVMe)

```
300 GB → AI models
500 GB → Vector database
200 GB → MCP services cache
100 GB → OS and applications
900 GB → Reserve for growth
```

---

## Choosing and optimizing AI models

@@ -604,33 +582,6 @@ Effective AI-ассистент строит каждое взаимодейст
**Relevance-based selection**: instead of selecting messages by recency, the system scores each message's relevance to the current query using embedding similarity.

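A minimal sketch of relevance-based selection, assuming every stored message already has an embedding (for example produced by the embedding service) passed in as a plain list of floats. The function names and the `top_k` cutoff are hypothetical; only the idea of ranking history by embedding similarity comes from the text.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_relevant_messages(
    query_embedding: Sequence[float],
    history: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[str]:
    """Pick the top_k history messages most similar to the query and
    return them in their original (chronological) order."""
    scored = [
        (cosine_similarity(query_embedding, embedding), index, text)
        for index, (text, embedding) in enumerate(history)
    ]
    # Highest similarity first; keep only the top_k candidates.
    best = sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
    # Restore chronological order so the prompt still reads as a conversation.
    return [text for _, _, text in sorted(best, key=lambda item: item[1])]


if __name__ == "__main__":
    history = [
        ("How do I restart the Gitea runner?", [1.0, 0.0, 0.0]),
        ("What is our Qdrant collection size?", [0.0, 1.0, 0.0]),
        ("The Gitea runner logs show a timeout", [0.9, 0.1, 0.0]),
    ]
    print(select_relevant_messages([1.0, 0.0, 0.0], history, top_k=2))
    # ['How do I restart the Gitea runner?', 'The Gitea runner logs show a timeout']
```

Restoring chronological order after ranking keeps the selected turns readable as a dialogue, even though they were chosen by relevance.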
### Persistent storage

PostgreSQL stores conversation data:

- **sessions** table: id, user_id, created_at, updated_at, title, status
- **messages** table: session_id, role, content, created_at, model_used, token_count
- JSONB columns for semi-structured metadata

**Indexes:**

- (user_id, updated_at) for listing recent sessions
- (session_id, created_at) for retrieving conversation history

**Partitioning:** monthly partitions keep performance stable as data grows.

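One way to implement the monthly partitioning is PostgreSQL declarative range partitioning on `created_at`. The sketch below assumes that approach and only generates the DDL text. The column types in `PARENT_DDL`, the `messages_YYYY_MM` naming scheme, and the helper names are hypothetical; executing the statements (for example from a monthly scheduled job) is left to whatever migration tooling is in use.

```python
from datetime import date
from typing import List

# Hypothetical parent table: columns follow the messages table above, types are
# assumptions. PARTITION BY RANGE makes created_at the partition key.
PARENT_DDL = """\
CREATE TABLE IF NOT EXISTS messages (
    session_id  bigint      NOT NULL,
    role        text        NOT NULL,
    content     text        NOT NULL,
    created_at  timestamptz NOT NULL,
    model_used  text,
    token_count integer,
    metadata    jsonb
) PARTITION BY RANGE (created_at);
-- An index on the partitioned parent is created on every partition (PostgreSQL 11+).
CREATE INDEX IF NOT EXISTS messages_session_created_idx
    ON messages (session_id, created_at);
"""


def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partition_ddl(parent: str, first: date, months: int) -> List[str]:
    """CREATE TABLE ... PARTITION OF statements for `months` consecutive months,
    starting with the month that contains `first`. Range upper bounds are
    exclusive in PostgreSQL, so each partition ends at the next month's start."""
    statements = []
    start = date(first.year, first.month, 1)
    for _ in range(months):
        end = next_month(start)
        name = f"{parent}_{start.year}_{start.month:02d}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
        start = end
    return statements


if __name__ == "__main__":
    print(PARENT_DDL)
    for statement in monthly_partition_ddl("messages", date(2025, 12, 15), 2):
        print(statement)
```

Creating partitions a month or two ahead avoids inserts failing at month boundaries when no matching partition exists yet.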
### Privacy and retention

**Encryption:**

- At rest: database-level or filesystem-level encryption
- In transit: TLS for all communications

**Access controls:**

- Users can see only their own conversations
- RBAC for managers, with an audit trail

**Retention policies:**

- Automated cleanup according to policy
- User's right to deletion
- Anonymization for analytics

### Search and navigation

@@ -652,678 +603,7 @@ PostgreSQL хранит conversation data:
**Sharing links**: a read-only URL with an expiration time and access controls.

### Analytics

**Usage metrics:**

- Daily active users
- Number of sessions
- Average messages per session
- Peak usage times

**Query patterns:**

- Common question types
- Frequently discussed topics
- Typical workflows

**User satisfaction:**

- Explicit ratings
- Implicit signals (conversation length, corrections)

### Session management table

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Max messages in window | 40 | Balance between context and performance |
| Summarization trigger | 30 messages | Fires before the window is exhausted |
| Compression ratio | 5:1 | 5 messages → 1 summary |
| Max session idle time | 24 hours | Auto-close inactive sessions |
| Max concurrent sessions | 10 per user | Abuse prevention |

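A minimal sketch of how the window, trigger, and compression parameters could interact, assuming a rolling summarization scheme: once the window reaches the 30-message trigger, the five oldest entries are replaced by one summary (5:1). The `summarize` callback is a hypothetical hook (for example a call to a small local model); idle timeouts and per-user session limits are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_WINDOW = 40          # hard cap on entries kept in the context window
SUMMARIZE_TRIGGER = 30   # compress once this many entries have accumulated
COMPRESSION_RATIO = 5    # 5 oldest entries -> 1 summary


@dataclass
class ContextWindow:
    summarize: Callable[[List[str]], str]  # hypothetical summarizer hook
    entries: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.entries.append(message)
        # Fold the oldest entries into a summary before the window fills up.
        # Earlier summaries sit at the front, so they get re-summarized over time.
        while len(self.entries) >= SUMMARIZE_TRIGGER:
            oldest = self.entries[:COMPRESSION_RATIO]
            self.entries = [self.summarize(oldest)] + self.entries[COMPRESSION_RATIO:]
        # Safety net: only reachable if SUMMARIZE_TRIGGER is raised above MAX_WINDOW.
        if len(self.entries) > MAX_WINDOW:
            self.entries = self.entries[-MAX_WINDOW:]


if __name__ == "__main__":
    window = ContextWindow(summarize=lambda msgs: f"[summary of {len(msgs)} messages]")
    for i in range(30):
        window.add(f"message {i}")
    print(len(window.entries), window.entries[0])  # 26 [summary of 5 messages]
```

Keeping the trigger below the hard cap means compression normally happens well before the 40-message limit is ever hit.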
### Retention policy table

| Data type | Retention | Action | Access |
|-----------|-----------|--------|--------|
| Active sessions | Indefinite | N/A | User only |
| Inactive (<30d) | Indefinite | N/A | User only |
| Old (30-90d) | Summarized | Messages → summary | User only |
| Very old (>90d) | Archived | Cold storage | Read-only |
| Marked for deletion | 30-day grace period | Permanent deletion | User, during grace period |

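The retention table can be expressed as a small classification function. This is a sketch under assumptions: age is measured from the session's last activity, the 30- and 90-day boundaries follow the table, and the returned action names are hypothetical labels for whatever jobs perform summarization, archiving, and deletion.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=30)


def retention_action(
    last_activity: datetime,
    now: datetime,
    active: bool = False,
    marked_for_deletion_at: Optional[datetime] = None,
) -> str:
    """Map a session to the action from the retention policy table."""
    if marked_for_deletion_at is not None:
        # The user can still restore the session until the grace period ends.
        if now - marked_for_deletion_at >= GRACE_PERIOD:
            return "delete_permanently"
        return "pending_deletion"
    if active:
        return "keep"
    age = now - last_activity
    if age < timedelta(days=30):
        return "keep"
    if age <= timedelta(days=90):
        return "summarize"          # messages -> summary
    return "archive_to_cold"        # read-only in cold storage
```

Checking the deletion mark first means a user's deletion request always takes precedence over age-based rules.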
---

## Data storage strategy

### Multi-tier architecture

An efficient AI infrastructure needs a deliberate approach to storing different types of data, each with its own access patterns and requirements.

### Hot Storage: NVMe SSD RAID

The **primary tier** provides high performance for frequently accessed data.

**Contents:**

- AI models (300 GB): fast loading is critical for UX
- Vector DB indices (200 GB): intensive I/O on every query
- Recent conversations (100 GB): frequent access

**Characteristics:**

- NVMe interface: several GB/s of throughput
- Latency: <100 microseconds
- RAID 1: fault tolerance without downtime

### Warm Storage: SATA SSD

The **secondary tier** provides more capacity at a lower price.

**Contents:**

- Vector DB payload (300 GB)
- Source documents (200 GB)
- Older conversations (200 GB)
- Daily backups (1 TB)

**Characteristics:**

- SATA interface: sufficient speed
- Cost-effective for large volumes
- Acceptable latency for less frequent access

### Cold Storage: Object Storage

The **tertiary tier** holds archival data and data retained for compliance.

**Contents:**

- Very old sessions (500 GB)
- Weekly backups (500 GB)
- Long-term analytics (variable)

**Characteristics:**

- S3-compatible storage
- Much lower cost
- Retrieval latency measured in seconds

### Lifecycle Management

**Automated policies:**

- Hot → Warm after one month of inactivity
- Warm → Cold after three months
- Deletion according to the retention policy
- Compression of older data

### Backup Strategy

**Continuous WAL archiving** in PostgreSQL enables point-in-time recovery.

**Daily full backups:**

- Qdrant snapshots
- PostgreSQL base backups (point-in-time recovery replays archived WAL on top of a physical base backup; a logical `pg_dump` cannot be combined with WAL)
- Stored on the warm and cold tiers

**Weekly full backups:**

- AI models (rarely change)
- Configuration
- Stored on the cold tier

**Testing:** automated restore tests in a test environment.

### Storage Tier Allocation table

| Data | Volume | Tier | Access pattern | Latency | Retention |
|------|--------|------|----------------|---------|-----------|
| AI models | 300 GB | Hot | On load | <1s | Indefinite |
| Vector indices | 200 GB | Hot | On query | <100ms | Indefinite |
| Vector payload | 300 GB | Warm | On retrieval | <500ms | Indefinite |
| Recent sessions | 100 GB | Hot | Very frequent | <50ms | Indefinite |
| Old sessions | 200 GB | Warm | Occasional | <1s | Until deletion |
| Archived | 500 GB | Cold | Rare | <10s | Until deletion |
| Source docs | 200 GB | Warm | On reindex | <2s | Indefinite |

### Backup Strategy table

| Type | Frequency | Retention | Location | RTO | RPO |
|------|-----------|-----------|----------|-----|-----|
| PostgreSQL WAL | Continuous | 7d | Object | 1h | 5min |
| PostgreSQL full | Daily | 30d | Warm+Cold | 2h | 24h |
| Qdrant snapshot | Daily | 30d | Warm | 3h | 24h |
| Qdrant snapshot | Weekly | 90d | Cold | 6h | 7d |
| AI models | Weekly | Indefinite | Cold | 1h | 7d |
| Configuration | On change | Indefinite | Git | 30min | Last commit |

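To make the RPO column actionable, a monitoring job can compare each backup job's last successful run (the `backup_last_success_timestamp` metric listed under monitoring) with its RPO. A minimal sketch; the job names are hypothetical, and the Git-backed configuration row is left out because its RPO is tied to commits rather than to elapsed time.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# RPO targets from the backup table; the job names are hypothetical.
RPO_TARGETS: Dict[str, timedelta] = {
    "postgresql_wal": timedelta(minutes=5),
    "postgresql_full_daily": timedelta(hours=24),
    "qdrant_snapshot_daily": timedelta(hours=24),
    "qdrant_snapshot_weekly": timedelta(days=7),
    "ai_models_weekly": timedelta(days=7),
}


def rpo_violations(last_success: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Jobs whose last successful run is older than their RPO.
    A job with no recorded success is always reported."""
    violations = []
    for job, rpo in RPO_TARGETS.items():
        finished = last_success.get(job)
        if finished is None or now - finished > rpo:
            violations.append(job)
    return violations
```

Reporting jobs that have never succeeded is deliberate: a silently missing backup is the failure mode this check exists to catch.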
---

## Security and Compliance

### Network Isolation

**Firewall rules** follow the principle of least privilege:

**Inbound:**

- 443 (HTTPS) from the corporate VPN
- 11434 (Ollama) only from the MCP Orchestrator
- 6333 (Qdrant) only from the Ollama server

**Outbound:**

- 3000 (Gitea API)
- 2377 (Docker Swarm API)
- 6443 (Kubernetes API)
- 3100 (Loki API)
- Default: DENY ALL

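The allow-list semantics, where anything not explicitly permitted is denied, can be illustrated with a tiny evaluator. This only models the policy: in practice it is enforced by the host firewall (nftables, ufw, or similar), and the peer labels below are hypothetical stand-ins for real CIDRs or host groups.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    direction: str           # "in" or "out"
    port: int
    peer: Optional[str]      # required source (inbound) or destination (outbound); None = any


# Hypothetical peer labels; a real firewall would match CIDRs or host groups.
RULES: Tuple[Rule, ...] = (
    Rule("in", 443, "corporate_vpn"),
    Rule("in", 11434, "mcp_orchestrator"),
    Rule("in", 6333, "ollama_server"),
    Rule("out", 3000, None),    # Gitea API
    Rule("out", 2377, None),    # Docker Swarm API
    Rule("out", 6443, None),    # Kubernetes API
    Rule("out", 3100, None),    # Loki API
)


def is_allowed(direction: str, port: int, peer: str) -> bool:
    """Allow only traffic matched by an explicit rule; everything else is denied."""
    return any(
        rule.direction == direction
        and rule.port == port
        and (rule.peer is None or rule.peer == peer)
        for rule in RULES
    )
```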
**IDS/IPS** monitors traffic for suspicious patterns using ML-based anomaly detection.

### Authentication and Authorization

**LDAP integration** for enterprise environments:

- Authentication with corporate credentials
- Group membership determines access levels
- Centralized password management

**OIDC** for modern cloud-native authentication:

- Integration with Okta, Auth0, Azure AD
- SSO capabilities
- MFA support

**RBAC (Role-Based Access Control):**

- **devops role**: query:*, mcp:*:read
- **developer role**: query:code, mcp:gitea:read
- **viewer role**: query:docs

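A sketch of how these permission strings could be checked. It assumes `*` is a wildcard matching any text (implemented with Python's `fnmatch.fnmatchcase`) and that permissions are plain `resource:scope[:action]` strings; both are assumptions about the format, and the role table is copied from the list above.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Role -> permission patterns, taken from the RBAC list above.
ROLES: Dict[str, List[str]] = {
    "devops": ["query:*", "mcp:*:read"],
    "developer": ["query:code", "mcp:gitea:read"],
    "viewer": ["query:docs"],
}


def is_permitted(role: str, permission: str) -> bool:
    """True if any of the role's patterns matches the requested permission.
    fnmatch's '*' can also span ':' characters; unknown roles get nothing."""
    return any(fnmatchcase(permission, pattern) for pattern in ROLES.get(role, []))
```

Because `mcp:*:read` must end in `:read`, the devops role stays read-only towards MCP services even with the wildcard.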
### Secrets Masking

**Automated patterns:**

```
password:\s*"?([^"\s]+)"?  →  password: "[REDACTED]"
token:\s*"?([^"\s]+)"?     →  token: "[REDACTED]"
\b\d{16}\b                 →  [CARD_REDACTED]
\b\d{3}-\d{2}-\d{4}\b      →  [SSN_REDACTED]
```

**Applied to:**

- MCP server responses
- System logs
- Conversation histories
- Export files

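The four patterns translate directly into Python's `re` module. A minimal sketch: rules are applied in order, and case-insensitive matching for the `password`/`token` keys is an added assumption. Note that `\b\d{16}\b` only catches unseparated 16-digit numbers, so card numbers written with spaces or dashes would need extra patterns.

```python
import re
from typing import List, Tuple

# The patterns from the table above; IGNORECASE on the key/value rules is an assumption.
MASKING_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r'password:\s*"?([^"\s]+)"?', re.IGNORECASE), 'password: "[REDACTED]"'),
    (re.compile(r'token:\s*"?([^"\s]+)"?', re.IGNORECASE), 'token: "[REDACTED]"'),
    (re.compile(r"\b\d{16}\b"), "[CARD_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
]


def mask_secrets(text: str) -> str:
    """Apply every masking rule in order and return the redacted text."""
    for pattern, replacement in MASKING_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(mask_secrets('password: "hunter2" token: abc123 card 4111111111111111'))
```

Running the same function at every output point (MCP responses, logs, histories, exports) keeps the redaction rules in one place.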
### Audit Logging

**All operations are logged:**

```
Timestamp           | User             | Action  | Details             | Result
2026-01-12 14:23:45 | user@company.com | query   | model=qwen2.5-coder | success
2026-01-12 14:23:46 | user@company.com | mcp_k8s | get_pods            | success
```

**Retention:** 1 year for compliance.

**Analysis:** regular reviews for suspicious patterns.

### Data Protection

**Encryption at rest:**

- Database encryption (TDE; not built into community PostgreSQL, but offered by some PostgreSQL distributions)
- Filesystem encryption (LUKS)
- Vector DB encryption

**Encryption in transit:**

- TLS 1.3 for all connections
- Certificate management via Let's Encrypt or an internal CA

**DLP (Data Loss Prevention):**

- Content inspection on egress
- Blocking transfers that match sensitive patterns
- Alerts on suspicious exports

### Compliance

**PCI DSS:** data never leaves the secured network.

**GDPR:**

- Right to deletion implemented
- Data minimization principles
- Consent management
- Data portability through exports

**SOC 2:**

- Comprehensive audit trails
- Documented access controls
- Regular security reviews
- Incident response procedures

### Security Monitoring

**Metrics tracked:**

- Failed authentication attempts
- Unusual access patterns
- MCP server errors
- Rate limit hits
- Secrets exposure attempts

**Alerting:**

- Slack integration for the security team
- PagerDuty for critical alerts
- Email for regular notifications

### Security Controls table

| Control | Type | Layer | Monitoring |
|---------|------|-------|------------|
| Network firewall | Preventive | Infrastructure | 24/7 |
| TLS encryption | Preventive | Transport | Certificate monitoring |
| LDAP auth | Preventive | Application | Login success rate |
| RBAC | Preventive | Application | Access patterns |
| Secrets masking | Preventive | Application | Exposure attempts |
| Audit logging | Detective | All layers | Log analysis |
| IDS/IPS | Detective/Preventive | Network | Alert monitoring |
| Backup encryption | Preventive | Storage | Backup verification |

---

## Monitoring and Observability

### Key Metrics

**GPU Metrics:**

- nvidia_gpu_temperature_celsius
- nvidia_gpu_utilization_percent
- nvidia_gpu_memory_used_bytes
- nvidia_gpu_power_usage_watts

**Ollama Metrics:**

- ollama_requests_total
- ollama_request_duration_seconds
- ollama_tokens_per_second
- ollama_active_models

**MCP Metrics:**

- mcp_requests_total{service="gitea"}
- mcp_request_duration_seconds
- mcp_errors_total
- mcp_cache_hit_ratio

**RAG Metrics:**

- qdrant_collection_size
- qdrant_query_duration_seconds
- embedding_generation_duration
- reranking_duration

**Storage Metrics:**

- disk_usage_percent{tier="hot"}
- disk_iops{tier="hot"}
- disk_throughput_bytes
- backup_last_success_timestamp

### Grafana Dashboards

**Dashboard 1: Ollama Overview**

- GPU utilization timeline
- Request rate per model
- Response time percentiles (p50, p95, p99)
- Active users count
- Token generation rate

**Dashboard 2: MCP Services**

- Request distribution pie chart
- Success/error rates per service
- Latency heatmap
- Cache hit rates
- Top users by requests

**Dashboard 3: Vector DB**

- Collection size growth
- Query performance trends
- Cache effectiveness
- Index rebuild status

**Dashboard 4: User Experience**

- Average response time
- User satisfaction ratings
- Session duration distribution
- Popular query types
- Error rate by type

**Dashboard 5: Infrastructure Health**

- CPU/RAM utilization
- Disk I/O patterns
- Network throughput
- Temperature monitoring
- Power consumption

### Alerting Strategy

**Critical Alerts (PagerDuty):**

- Ollama service down
- GPU temperature >85°C
- Disk usage >90%
- Authentication system unavailable
- Backup failed

**Warning Alerts (Slack):**

- High error rate (>5%)
- Slow response times (p95 >10s)
- GPU utilization consistently >95%
- MCP service degraded
- Cache miss rate >50%

**Info Alerts (Email):**

- Scheduled maintenance reminders
- Weekly usage statistics digest
- Capacity planning recommendations

### Logging Strategy

**Structured logging** in JSON format for all components:

```json
{
  "timestamp": "2026-01-12T14:23:45Z",
  "level": "INFO",
  "service": "ollama",
  "message": "Model loaded",
  "model": "qwen2.5-coder:32b",
  "load_time_ms": 2341
}
```

**Log aggregation** via Loki:

- Central collection
- Retention: 30 days hot, 90 days warm
- Text search via LogQL line filters (Loki indexes labels, not the full log content)
- Correlation with metrics

**Log levels:**

- ERROR: failures requiring attention
- WARN: degraded performance
- INFO: normal operations
- DEBUG: detailed troubleshooting (disabled in production)

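For components written in Python (for example MCP services or the API gateway, if they use Python), the JSON format above can be produced with a custom `logging.Formatter`. A minimal standard-library sketch; passing extra fields through a `fields` key in `extra=` is a hypothetical convention, not part of `logging` itself.

```python
import json
import logging
from datetime import datetime, timezone
from typing import IO, Optional


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, using the
    field names from the example above."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
            .strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Structured fields arrive via extra={"fields": {...}} (hypothetical convention).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(service: str, stream: Optional[IO[str]] = None) -> logging.Logger:
    """Logger that writes JSON lines to `stream` (stderr by default) at INFO level."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)  # DEBUG stays off, matching the production setting
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("ollama")
    log.info("Model loaded",
             extra={"fields": {"model": "qwen2.5-coder:32b", "load_time_ms": 2341}})
```

One JSON object per line is what Loki's `json` parser stage expects, so fields such as `model` can be extracted at query time without extra parsing rules.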
### Distributed Tracing

OpenTelemetry provides end-to-end request tracing:

- User request → API Gateway
- Gateway → Ollama
- Ollama → MCP services
- MCP → Backend systems
- RAG → Vector DB

Jaeger UI is used to visualize traces and identify bottlenecks.

### Health Checks

**Liveness probes:**

- Ollama API reachability (for example `GET /api/version`; Ollama does not document a dedicated `/health` endpoint)
- Qdrant readiness
- PostgreSQL connectivity
- MCP services status

**Readiness probes:**

- Models loaded
- Indices ready
- Database connections available

**Frequency:** every 30 seconds.

### Capacity Planning

**Trend analysis:**

- Usage growth rate
- Storage consumption trends
- Peak load patterns
- Resource saturation points

**Forecasting:**

- When an additional GPU will be needed
- Storage expansion timeline
- Network bandwidth requirements
- Accommodating team growth

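For the storage expansion timeline, a simple starting point is a least-squares linear trend over disk-usage samples, projected to the 90% critical threshold used in the alerting rules. This sketch assumes roughly linear growth and daily samples as `(day, usage_percent)` pairs; bursty growth (for example a large reindex) would need a more careful model.

```python
from typing import List, Optional, Tuple


def days_until_threshold(
    samples: List[Tuple[float, float]], threshold_percent: float
) -> Optional[float]:
    """Fit usage = intercept + slope * day by least squares and return how many
    days after the last sample the fitted line reaches threshold_percent.
    Returns None if usage is flat or shrinking, 0.0 if already past it."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        raise ValueError("samples need distinct days")
    slope = sxy / sxx                       # percentage points per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_percent - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, crossing_day - last_day)


if __name__ == "__main__":
    # Hypothetical hot-tier samples: (day, disk usage %).
    history = [(0, 50.0), (10, 55.0), (20, 60.0)]
    print(days_until_threshold(history, 90.0))  # 60.0
```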
### Monitoring table

| Component | Metric | Warning threshold | Critical threshold | Action |
|-----------|--------|-------------------|--------------------|--------|
| GPU | Temperature | >75°C | >85°C | Check cooling |
| GPU | Utilization | >85% | >95% | Consider scaling |
| GPU | Memory | >20GB | >23GB | Model optimization |
| Storage | Disk usage | >75% | >90% | Cleanup/expansion |
| Storage | IOPS | >80% of max | >95% of max | Storage upgrade |
| API | Error rate | >2% | >5% | Investigate logs |
| API | Latency p95 | >5s | >10s | Performance tuning |
| RAG | Query time | >1s | >2s | Index optimization |

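The table's two-level thresholds map naturally onto a small classifier, which can be handy for tests or a custom health report; in a Prometheus/Grafana setup the same numbers would normally live in alert rules instead. A sketch: the metric keys and units are hypothetical, and the IOPS row is omitted because it depends on the device's maximum.

```python
from typing import Dict, Tuple

# (warning, critical) thresholds from the monitoring table; a reading strictly
# above a threshold triggers that level. Keys and units are hypothetical.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "gpu_temperature_c": (75, 85),
    "gpu_utilization_pct": (85, 95),
    "gpu_memory_gb": (20, 23),
    "disk_usage_pct": (75, 90),
    "api_error_rate_pct": (2, 5),
    "api_latency_p95_s": (5, 10),
    "rag_query_time_s": (1, 2),
}


def alert_level(metric: str, value: float) -> str:
    """Return 'ok', 'warning' or 'critical' for a single metric reading."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```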
---

## Economic justification

### Capital expenditures (CapEx)

| Component | Cost |
|-----------|------|
| GPU (RTX 4090 24GB) | $1,600-2,000 |
| CPU (Ryzen 9 7950X) | $500-600 |
| RAM (128GB DDR5 ECC) | $600-800 |
| Storage (NVMe + SATA) | $800-1,000 |
| Motherboard (High-end) | $400-500 |
| PSU (1600W Titanium) | $300-400 |
| Case/Cooling | $300-400 |
| Network (2x 10GbE) | $200-300 |
| Subtotal of listed components | $4,700-6,000 |
| **TOTAL CapEx** | **$12,000-15,000** |

### Annual operating expenses (OpEx)

| Item | Cost |
|------|------|
| Electricity (~500W 24/7) | $650/year |
| Cooling | $200/year |
| Maintenance | $500/year |
| Training/Documentation | $2,000/year |
| **TOTAL OpEx** | **$3,350/year** |

### Software (free)

All software components are open source:

- Ubuntu Server: FREE
- Ollama: FREE
- Qdrant: FREE
- PostgreSQL: FREE
- All MCP services: FREE (self-developed)
- Prometheus/Grafana: FREE

### ROI Analysis

**Time savings for a team of 10 engineers:**

| Activity | Time saved | Hours/year | Value ($100/hour) |
|----------|------------|------------|-------------------|
| Information search | 40% | 832 hours | $83,200 |
| Writing documentation | 50% | 520 hours | $52,000 |
| Troubleshooting | 30% | 624 hours | $62,400 |
| Code review | 20% | 208 hours | $20,800 |
| **TOTAL** | | **2,184 hours** | **$218,400/year** |

**ROI calculation:**

```
Total Investment: $15,000 (CapEx) + $3,350 (OpEx, year 1) = $18,350
Annual Benefit: $218,400
Payback Period: 18,350 / 218,400 ≈ 0.08 years ≈ 1 month
3-Year ROI: (3 × $218,400 - $18,350 - 2 × $3,350) / $18,350 = $630,150 / $18,350 ≈ 3,434%
```

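The ROI arithmetic can be reproduced in a few lines, which makes it easy to rerun with different assumptions. As in the calculation above, the denominator is the first-year investment (CapEx plus one year of OpEx) and later years add OpEx only; the function name is hypothetical.

```python
from typing import Tuple


def roi_summary(capex: float, opex_per_year: float, benefit_per_year: float,
                years: int = 3) -> Tuple[float, float, float]:
    """Return (first-year investment, payback in years, ROI in percent over `years`)."""
    investment = capex + opex_per_year              # year 1: hardware + one year of OpEx
    payback_years = investment / benefit_per_year
    net_gain = years * benefit_per_year - investment - (years - 1) * opex_per_year
    roi_percent = net_gain / investment * 100
    return investment, payback_years, roi_percent


if __name__ == "__main__":
    investment, payback, roi = roi_summary(15_000, 3_350, 218_400)
    print(f"investment ${investment:,.0f}, payback {payback * 12:.1f} months, "
          f"3-year ROI {roi:,.0f}%")
    # investment $18,350, payback 1.0 months, 3-year ROI 3,434%
```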
### Comparison with cloud AI APIs

**OpenAI GPT-4 (8K context) list pricing:**

- Prompt: $0.03 per 1K tokens
- Completion: $0.06 per 1K tokens

**Typical query:**

- 2K prompt tokens (context + question)
- 1K completion tokens
- Cost per query: 2 × $0.03 + 1 × $0.06 = $0.12

**Monthly cost for 10 users:**

- 50 queries/day per user = 500 queries/day
- 500 × 30 days = 15,000 queries/month
- 15,000 × $0.12 = $1,800/month = $21,600/year

**Self-hosted advantages:**

- Lower cost: per the TCO table, cumulative cost stays below the cloud alternative from year 1 onward, and the gap widens every year
- Complete data control
- No API rate limits
- Customizable models
- No vendor lock-in

### TCO (Total Cost of Ownership) table, 3 years

| Year | CapEx | OpEx | Total annual | Cumulative (self-hosted) | Cumulative (cloud alternative) |
|------|-------|------|--------------|--------------------------|--------------------------------|
| 1 | $15,000 | $3,350 | $18,350 | $18,350 | $21,600 |
| 2 | $0 | $3,350 | $3,350 | $21,700 | $43,200 |
| 3 | $0 | $3,350 | $3,350 | $25,050 | $64,800 |
| **Savings** | | | | | **$39,750** |

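The per-query price and the cumulative TCO columns can likewise be recomputed from their inputs. A sketch under the same assumptions as above (GPT-4 list prices, 2K prompt plus 1K completion tokens per query, 50 queries per user per day, 30-day months); the helper names are hypothetical.

```python
from typing import List, Tuple


def cost_per_query(prompt_tokens: int, completion_tokens: int,
                   prompt_per_1k: float = 0.03, completion_per_1k: float = 0.06) -> float:
    """API cost of one query at per-1K-token prices."""
    return prompt_tokens / 1000 * prompt_per_1k + completion_tokens / 1000 * completion_per_1k


def cumulative_tco(years: int, capex: float, opex_per_year: float,
                   cloud_per_year: float) -> List[Tuple[int, float, float]]:
    """(year, cumulative self-hosted cost, cumulative cloud cost) for each year.
    CapEx is paid once in year 1; OpEx and cloud spend recur every year."""
    rows = []
    self_hosted = cloud = 0.0
    for year in range(1, years + 1):
        self_hosted += opex_per_year + (capex if year == 1 else 0.0)
        cloud += cloud_per_year
        rows.append((year, self_hosted, cloud))
    return rows


if __name__ == "__main__":
    per_query = cost_per_query(2000, 1000)
    cloud_per_year = per_query * 50 * 10 * 30 * 12   # 50 queries/day x 10 users
    rows = cumulative_tco(3, 15_000, 3_350, cloud_per_year)
    for year, self_hosted, cloud in rows:
        print(f"year {year}: self-hosted ${self_hosted:,.0f}, cloud ${cloud:,.0f}")
    print(f"3-year savings: ${rows[-1][2] - rows[-1][1]:,.0f}")
```

Changing the token counts or per-1K prices is the quickest way to see how sensitive the comparison is to the cloud model chosen.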
---

## Deployment Roadmap

### Phase 1: Foundation (Weeks 1-2)

**Infrastructure setup:**

- Server assembly and OS installation
- Network configuration
- GPU driver installation
- Docker setup

**Deliverables:**

- Working server with a functional GPU
- Network connectivity verified
- Monitoring baseline established

### Phase 2: Core Services (Weeks 3-4)

**AI infrastructure:**

- Ollama installation
- Model download and testing
- Basic API Gateway setup

**Deliverables:**

- Models responding to queries
- Simple web interface functional
- Performance benchmarks completed

### Phase 3: MCP Integration (Weeks 5-6)

**MCP services deployment:**

- Gitea MCP server
- Docker Swarm MCP server
- Kubernetes MCP server (if applicable)

**Deliverables:**

- Models accessing corporate systems
- Read-only access verified
- Security controls tested

### Phase 4: RAG Implementation (Weeks 7-8)

**Knowledge base setup:**

- Qdrant deployment
- Embedding service
- Initial document indexing

**Deliverables:**

- Vector DB operational
- Initial corpus indexed
- Search quality validated

### Phase 5: Production Readiness (Weeks 9-10)

**Finalization:**

- Authentication integration
- Monitoring dashboards
- Backup automation
- Documentation

**Deliverables:**

- Production-ready system
- Team training completed
- Operational runbooks
- Go-live approval

### Phase 6: Rollout (Weeks 11-12)

**Gradual adoption:**

- Pilot group (2-3 users)
- Feedback collection
- Issue resolution
- Full team rollout

---

## Operational Excellence

### Daily Operations

**Health checks:**

- Morning dashboard review
- Check overnight alerts
- Verify backup success
- Monitor disk usage

**User support:**

- Answer questions in Slack
- Collect feedback
- Document common issues

### Weekly Tasks

**Performance review:**

- Analyze usage trends
- Review slow queries
- Check error patterns
- Optimize as needed

**Content updates:**

- Reindex modified documents
- Update code snippets
- Refresh runbooks

**Capacity planning:**

- Review storage trends
- Analyze GPU utilization
- Forecast growth

### Monthly Tasks

**Security review:**

- Audit log analysis
- Access pattern review
- Update firewall rules
- Vulnerability scanning

**System maintenance:**

- OS updates
- Driver updates
- Dependency updates
- Performance tuning

**Reporting:**

- Usage statistics
- ROI tracking
- User satisfaction
- Improvement recommendations

### Quarterly Tasks

**Major upgrades:**

- Model updates
- Infrastructure upgrades
- Feature additions

**Strategy review:**

- Roadmap adjustment
- Budget review
- Team expansion planning

**Training:**

- Advanced features training
- Onboarding new team members
- Sharing best practices

---

## Best Practices

@@ -1366,103 +646,6 @@ Payback Period: 18,350 / 218,400 = 0.08 года = 1 месяц
4. **Test backups** regularly
5. **Plan for growth** from day one

---

## Troubleshooting Guide

### GPU Issues

**Symptom:** Model loading fails

**Causes:**

- Insufficient VRAM
- Driver issues
- Cooling problems

**Resolution:**

1. Check `nvidia-smi` output
2. Verify model size against available VRAM
3. Update drivers if needed
4. Check temperatures

**Symptom:** Slow inference

**Causes:**

- GPU throttling due to heat
- CPU bottleneck
- Insufficient RAM

**Resolution:**

1. Monitor GPU temperature
2. Check the cooling system
3. Verify CPU usage
4. Check RAM availability

### MCP Service Issues

**Symptom:** MCP timeouts

**Causes:**

- Backend system slow or down
- Network issues
- Rate limiting

**Resolution:**

1. Check backend system health
2. Verify network connectivity
3. Review rate limit settings
4. Check MCP logs

**Symptom:** Incorrect data returned

**Causes:**

- Stale cache
- Backend API changes
- Parsing errors

**Resolution:**

1. Clear the MCP cache
2. Verify the backend API format
3. Check MCP server logs
4. Update parsers if needed

### RAG Issues

**Symptom:** Poor search quality

**Causes:**

- Outdated index
- Poor chunking strategy
- Embedding model issues

**Resolution:**

1. Trigger reindexing
2. Review chunking configuration
3. Test the embedding service
4. Analyze user feedback

**Symptom:** Slow searches

**Causes:**

- Index too large
- Insufficient resources
- Network latency

**Resolution:**

1. Optimize index parameters
2. Add more RAM/storage
3. Check Qdrant configuration
4. Review network latency

### Storage Issues

**Symptom:** Disk full

**Causes:**

- Uncontrolled growth
- Failed cleanup jobs
- Backup accumulation

**Resolution:**

1. Run cleanup scripts
2. Archive old data
3. Verify retention policies
4. Plan capacity expansion

---

## Conclusion

@@ -1480,29 +663,4 @@ Self-hosted AI-инфраструктура на базе Ollama с интегр
**History as context**. Persistent storage and intelligent management of conversation history are critical for the user experience and for the continuous improvement of the system.

### The Way Forward

Deploying this infrastructure is not a one-off project but the start of a continuous-improvement journey. The system will evolve along with:

- The arrival of new, more capable models
- Expanded integrations with corporate systems
- Growth of the knowledge base
- A growing number of users
- Evolving best practices

### Next Steps

1. **Assess your organization's readiness** for adoption
2. **Plan the budget** and obtain approvals
3. **Form a team** for deployment and support
4. **Run a pilot deployment** with a small group of users
5. **Improve iteratively** based on feedback
6. **Roll out gradually** to the whole team

With the right strategy, investment, and commitment, a self-hosted AI infrastructure becomes a powerful enabler of productivity, work quality, and innovation in your organization.

---

**Document version:** 1.0
**Date:** January 2026
**Author:** Based on infrastructure requirements for k3s-gitops
**Status:** Comprehensive Guide