# 📊 Grafana + Loki Integration

## ✅ What Was Configured

### 1. Loki Data Source

**File**: `apps/grafana/loki-datasource.yaml`

Automatically adds Loki as a data source in Grafana:

- **URL**: http://loki.loki.svc.cluster.local:3100
- **Type**: Loki
- **Access**: Proxy (internal cluster access)

### 2. Loki Logs Dashboard

**File**: `apps/grafana/loki-dashboard.yaml`

Comprehensive dashboard with 7 panels:

#### 📈 Panel 1: Log Rate by Namespace
- Real-time log ingestion rate
- Grouped by namespace
- Shows logs/second

#### 🔥 Panel 2: Error Rate by Namespace
- Errors, exceptions, and fatal messages
- Per-namespace breakdown
- 5-minute rate

#### ⚠️ Panel 3: Total Errors (Last Hour)
- Gauge showing total error count
- Color-coded thresholds:
  - Green: < 10 errors
  - Yellow: 10-50 errors
  - Red: > 50 errors

#### 🔍 Panel 4: Log Browser
- Interactive log viewer
- Filterable by:
  - Namespace (dropdown)
  - Pod (dropdown)
  - Search text (free text)
- Live tail capability

#### 📊 Panel 5: Top 10 Namespaces by Log Volume
- Pie chart showing which namespaces generate the most logs
- Based on the last hour

#### 🎯 Panel 6: Top 10 Pods by Log Volume
- Pie chart of the chattiest pods
- Filtered by the selected namespace

#### 🚨 Panel 7: Errors & Warnings
- All errors/warnings across the cluster
- Full log details
- Sortable and searchable

---

## 🚀 How to Access

### 1. Wait for ArgoCD Sync

ArgoCD will automatically apply the changes (~2-3 minutes).

### 2. Access Grafana

```bash
# Get Grafana URL
kubectl get ingress -n monitoring

# Or port-forward
kubectl port-forward -n monitoring svc/k8s-monitoring-grafana 3000:80
```

### 3. Find the Dashboard

In Grafana:

1. Click **Dashboards** (left menu)
2. Search for **"Loki Logs Dashboard"**
3. Or navigate to: **Dashboards → Browse → loki-logs**

---

## 🔍 How to Use the Dashboard

### Filters at the Top

**Namespace Filter:**
- Select one or multiple namespaces
- Default: All namespaces

**Pod Filter:**
- Dynamically updates based on the selected namespace(s)
- Default: All pods

**Search Box:**
- Free-text search across all logs
- Examples:
  - `error` - find errors
  - `timeout` - find timeouts
  - `sync` - find ArgoCD syncs

### Example Workflows

#### 1. Debug Application Errors

```
1. Select namespace: "default"
2. Select pod: "myapp-xyz"
3. Search: "error"
4. Look at "Log Browser" panel
```

#### 2. Monitor ArgoCD

```
1. Select namespace: "argocd"
2. Search: "sync"
3. Check "Error Rate" and "Log Browser"
```

#### 3. Find Noisy Pods

```
1. Look at "Top 10 Pods by Log Volume"
2. Click on the highest pod
3. Use "Log Browser" to see what it's logging
```

#### 4. Cluster-Wide Error Monitoring

```
1. Set namespace to "All"
2. Check "Total Errors" gauge
3. Look at "Errors & Warnings" panel at bottom
```
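The first workflow can also be reproduced outside Grafana by querying Loki's `query_range` API directly. The following is a minimal sketch for sanity-checking what the dashboard shows: the namespace and pod name are the illustrative values from the workflow above, the request is run from the Grafana pod exactly as in the troubleshooting section further down, and the time range defaults to roughly the last hour.

```bash
# Fetch up to 20 recent lines containing "error" for the example pod
# (illustrative namespace/pod values; adjust to your own workload)
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl -sG "http://loki.loki.svc.cluster.local:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="default", pod="myapp-xyz"} |= "error"' \
  --data-urlencode 'limit=20'
```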
---

## 📝 LogQL Query Examples

The dashboard uses LogQL (Loki Query Language). Here are some queries you can use:

### Basic Queries

```logql
# All logs from a namespace
{namespace="loki"}

# Logs from specific pod
{pod="loki-0"}

# Multiple namespaces
{namespace=~"loki|argocd|grafana"}
```

### Filtering

```logql
# Contains "error"
{namespace="default"} |= "error"

# Regex match
{namespace="argocd"} |~ "sync|deploy"

# NOT containing
{namespace="loki"} != "debug"

# Case insensitive
{namespace="default"} |~ "(?i)error"
```

### Metrics Queries

```logql
# Log rate per namespace
sum by (namespace) (rate({namespace=~".+"}[1m]))

# Error count
sum(count_over_time({namespace=~".+"} |~ "(?i)error"[5m]))

# Top namespaces
topk(10, sum by (namespace) (count_over_time({namespace=~".+"}[1h])))
```

### JSON Parsing

```logql
# If logs are JSON
{namespace="app"} | json | level="error"

# Extract fields
{namespace="app"} | json | line_format "{{.message}}"
```

---

## 🎨 Dashboard Customization

### Add New Panel

1. Click **"Add panel"** (top right)
2. Select **"Loki"** as data source
3. Write your LogQL query
4. Choose visualization type
5. Save

### Useful Panel Types

- **Time series**: For rates and counts over time
- **Logs**: For viewing actual log lines
- **Stat**: For single values (like total errors)
- **Gauge**: For thresholds (like error counts)
- **Table**: For structured data
- **Pie chart**: For distribution

---

## 🔧 Troubleshooting

### Dashboard Not Appearing

```bash
# Check ConfigMap created
kubectl get configmap -n monitoring loki-logs-dashboard

# Check Grafana pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana

# Restart Grafana
kubectl rollout restart deployment/k8s-monitoring-grafana -n monitoring
```

### Data Source Not Working

```bash
# Test Loki from Grafana pod
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl http://loki.loki.svc.cluster.local:3100/ready

# Should return: ready
```

### No Logs Showing

```bash
# Check Promtail is running
kubectl get pods -n loki -l app.kubernetes.io/name=promtail

# Check Promtail logs
kubectl logs -n loki -l app.kubernetes.io/name=promtail --tail=50

# Test query directly
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl "http://loki.loki.svc.cluster.local:3100/loki/api/v1/labels"
```

---

## 📊 What Logs Are Collected

Promtail collects logs from:

### 1. All Pod Logs

```
/var/log/pods/*_*_*/*/*.log
```

### 2. Labels Added Automatically

Every log line gets these labels:

- `namespace` - Kubernetes namespace
- `pod` - Pod name
- `container` - Container name
- `node` - Node where the pod runs
- `job` - Always "kubernetes-pods"

### 3. Example Log Entry

```json
{
  "namespace": "loki",
  "pod": "loki-0",
  "container": "loki",
  "node": "master1",
  "timestamp": "2026-01-05T13:30:00Z",
  "line": "level=info msg=\"flushing stream\""
}
```
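To confirm these labels are actually arriving in Loki, you can list the ingested values for any of them with the label-values API. A small sketch, reusing the same in-cluster access pattern as the troubleshooting section above:

```bash
# List every namespace Loki has received logs for; swap "namespace" for
# pod, container, node, or job to inspect the other labels
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl -s "http://loki.loki.svc.cluster.local:3100/loki/api/v1/label/namespace/values"
```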
---

## 🎯 Advanced Features

### Live Tail

Click the **"Live"** button in the Log Browser panel to stream logs in real-time.

### Context

Click on any log line → "Show context" to see the surrounding logs.

### Log Details

Click on any log line to see:

- All labels
- Parsed fields (if JSON)
- Timestamp
- Full message

### Sharing

Click **"Share"** (top right) to:

- Copy link
- Create snapshot
- Export as JSON

---

## 🚨 Alerting (Optional)

You can create alerts based on log patterns:

### Example: Alert on High Error Rate

```yaml
alert: HighErrorRate
expr: sum(rate({namespace=~".+"} |~ "(?i)error"[5m])) > 10
for: 5m
annotations:
  summary: "High error rate detected"
  description: "{{ $value }} errors/sec"
```

To add alerts:

1. Go to the dashboard panel
2. Click the "Alert" tab
3. Configure the threshold
4. Set a notification channel

---

## 📈 Performance Tips

### 1. Limit Time Range

- Use smaller time ranges for faster queries
- Default: 1 hour (a good balance)

### 2. Use Filters

- Always filter by namespace/pod when possible
- Reduces the amount of data scanned

### 3. Dashboard Refresh

- Default: 10 seconds
- Increase if experiencing lag

### 4. Log Volume

- Monitor the "Top 10" panels
- Consider a log retention policy if volume is high

---

## 🔗 Useful Links

- **Loki API**: http://loki.loki.svc.cluster.local:3100
- **Loki Ready**: http://loki.loki.svc.cluster.local:3100/ready
- **Loki Metrics**: http://loki.loki.svc.cluster.local:3100/metrics
- **LogQL Docs**: https://grafana.com/docs/loki/latest/logql/

---

## 📋 Quick Reference

### LogQL Operators

- `|=` - Contains (exact)
- `!=` - Does not contain
- `|~` - Regex match
- `!~` - Regex not match
- `| json` - Parse JSON
- `| logfmt` - Parse logfmt
- `| line_format` - Format output

### Rate Functions

- `rate()` - Per-second rate
- `count_over_time()` - Total count
- `bytes_over_time()` - Total bytes
- `bytes_rate()` - Bytes per second

### Aggregations

- `sum by (label)` - Sum grouped by label
- `count by (label)` - Count grouped
- `avg by (label)` - Average
- `max by (label)` - Maximum
- `topk(n, query)` - Top N results

---

**Dashboard is ready! It will appear in Grafana after ArgoCD syncs (~2-3 minutes).** 🎉

## Next Steps

1. ✅ Wait for ArgoCD sync
2. ✅ Open Grafana
3. ✅ Find "Loki Logs Dashboard"
4. ✅ Start exploring your logs!
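If you want to exercise the quick reference above from the command line once the dashboard is up, here is a hedged sketch of a single instant metric query, run from the Grafana pod as in the troubleshooting section, that combines a regex filter, `count_over_time()`, and `sum by ()`:

```bash
# Count error-like lines per namespace over the last hour in one instant query
# (|~ regex filter + count_over_time() + sum by (), all from the quick reference)
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl -sG "http://loki.loki.svc.cluster.local:3100/loki/api/v1/query" \
  --data-urlencode 'query=sum by (namespace) (count_over_time({namespace=~".+"} |~ "(?i)error"[1h]))'
```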