📊 Grafana + Loki Integration
✅ What Was Configured
1. Loki Data Source
File: apps/grafana/loki-datasource.yaml
Automatically adds Loki as a data source in Grafana:
- URL: http://loki.loki.svc.cluster.local:3100
- Type: Loki
- Access: Proxy (internal cluster access)
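A minimal sketch of what this file might look like, assuming the Grafana provisioning sidecar that watches for ConfigMaps labeled grafana_datasource (the label and key names are common chart defaults, not confirmed from the repo):
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        url: http://loki.loki.svc.cluster.local:3100
        access: proxy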
2. Loki Logs Dashboard
File: apps/grafana/loki-dashboard.yaml
Comprehensive dashboard with 7 panels:
📈 Panel 1: Log Rate by Namespace
- Real-time log ingestion rate
- Grouped by namespace
- Shows logs/second
🔥 Panel 2: Error Rate by Namespace
- Errors, exceptions, and fatal messages
- Per namespace breakdown
- 5-minute rate
⚠️ Panel 3: Total Errors (Last Hour)
- Gauge showing total error count
- Color-coded thresholds:
- Green: < 10 errors
- Yellow: 10-50 errors
- Red: > 50 errors
🔍 Panel 4: Log Browser
- Interactive log viewer
- Filterable by:
- Namespace (dropdown)
- Pod (dropdown)
- Search text (free text)
- Live tail capability
📊 Panel 5: Top 10 Namespaces by Log Volume
- Pie chart showing which namespaces generate most logs
- Based on last hour
🎯 Panel 6: Top 10 Pods by Log Volume
- Pie chart of chattiest pods
- Filtered by selected namespace
🚨 Panel 7: Errors & Warnings
- All errors/warnings across cluster
- Full log details
- Sortable and searchable
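The dashboard itself is plain Grafana dashboard JSON. A sketch of how apps/grafana/loki-dashboard.yaml likely wraps it, assuming the same sidecar pattern with a grafana_dashboard label (label and data key are illustrative):
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-logs-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  loki-logs.json: |
    { "title": "Loki Logs Dashboard", "panels": [ ... ] }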
🚀 How to Access
1. Wait for ArgoCD Sync
ArgoCD will automatically apply the changes (~2-3 minutes).
2. Access Grafana
# Get Grafana URL
kubectl get ingress -n monitoring
# Or port-forward
kubectl port-forward -n monitoring svc/k8s-monitoring-grafana 3000:80
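If you need the admin credentials, the Grafana chart usually stores them in a Secret named after the release; the Secret name below is inferred from the Service name above and may differ in your setup:
# Retrieve the Grafana admin password (Secret name is an assumption)
kubectl get secret -n monitoring k8s-monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 -d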
3. Find the Dashboard
In Grafana:
- Click Dashboards (left menu)
- Search for "Loki Logs Dashboard"
- Or navigate to: Dashboards → Browse → loki-logs
🔍 How to Use the Dashboard
Filters at the Top
Namespace Filter:
- Select one or multiple namespaces
- Default: All namespaces
Pod Filter:
- Dynamically updates based on selected namespace(s)
- Default: All pods
Search Box:
- Free-text search across all logs
- Examples:
- error - find errors
- timeout - find timeouts
- sync - find ArgoCD syncs
Example Workflows
1. Debug Application Errors
1. Select namespace: "default"
2. Select pod: "myapp-xyz"
3. Search: "error"
4. Look at "Log Browser" panel
2. Monitor ArgoCD
1. Select namespace: "argocd"
2. Search: "sync"
3. Check "Error Rate" and "Log Browser"
3. Find Noisy Pods
1. Look at "Top 10 Pods by Log Volume"
2. Click on highest pod
3. Use "Log Browser" to see what it's logging
4. Cluster-Wide Error Monitoring
1. Set namespace to "All"
2. Check "Total Errors" gauge
3. Look at "Errors & Warnings" panel at bottom
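The same workflows can be run as raw LogQL in Explore; the pod name below is a placeholder from the examples above:
# Workflow 1: errors from one pod
{namespace="default", pod="myapp-xyz"} |= "error"
# Workflow 2: ArgoCD sync activity
{namespace="argocd"} |~ "(?i)sync"
# Workflow 4: cluster-wide errors and warnings
{namespace=~".+"} |~ "(?i)(error|warn)"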
📝 LogQL Query Examples
The dashboard uses LogQL (Loki Query Language). Here are some queries you can use:
Basic Queries
# All logs from a namespace
{namespace="loki"}
# Logs from specific pod
{pod="loki-0"}
# Multiple namespaces
{namespace=~"loki|argocd|grafana"}
Filtering
# Contains "error"
{namespace="default"} |= "error"
# Regex match
{namespace="argocd"} |~ "sync|deploy"
# NOT containing
{namespace="loki"} != "debug"
# Case insensitive
{namespace="default"} |~ "(?i)error"
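Filter expressions can be chained in a single query; the values here are illustrative:
# ArgoCD lines that mention sync, are not debug, and contain an error
{namespace="argocd"} |= "sync" != "debug" |~ "(?i)error"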
Metrics Queries
# Log rate per namespace
sum by (namespace) (rate({namespace=~".+"}[1m]))
# Error count
sum(count_over_time({namespace=~".+"} |~ "(?i)error"[5m]))
# Top namespaces
topk(10, sum by (namespace) (count_over_time({namespace=~".+"}[1h])))
JSON Parsing
# If logs are JSON
{namespace="app"} | json | level="error"
# Extract fields
{namespace="app"} | json | line_format "{{.message}}"
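Parser stages and metric functions can also be combined; a sketch assuming the application writes JSON logs with a level field:
# Per-namespace error rate for JSON logs with a level field
sum by (namespace) (rate({namespace="app"} | json | level="error" [5m]))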
🎨 Dashboard Customization
Add New Panel
- Click "Add panel" (top right)
- Select "Loki" as data source
- Write your LogQL query
- Choose visualization type
- Save
Useful Panel Types
- Time series: For rates and counts over time
- Logs: For viewing actual log lines
- Stat: For single values (like total errors)
- Gauge: For thresholds (like error counts)
- Table: For structured data
- Pie chart: For distribution
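For example, a Stat or Gauge panel similar to "Total Errors (Last Hour)" can be backed by a single query along these lines (the exact expression in the bundled dashboard may differ):
sum(count_over_time({namespace=~".+"} |~ "(?i)error" [1h]))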
🔧 Troubleshooting
Dashboard Not Appearing
# Check ConfigMap created
kubectl get configmap -n monitoring loki-logs-dashboard
# Check Grafana pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana
# Restart Grafana
kubectl rollout restart deployment/k8s-monitoring-grafana -n monitoring
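If the ConfigMap exists but the dashboard still does not appear, check its labels: with the kube-prometheus-stack sidecar, dashboard ConfigMaps are typically expected to carry a grafana_dashboard label (the label name is the chart default and may differ in your values):
# Verify the sidecar label on the dashboard ConfigMap
kubectl get configmap -n monitoring loki-logs-dashboard --show-labels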
Data Source Not Working
# Test Loki from Grafana pod
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
curl http://loki.loki.svc.cluster.local:3100/ready
# Should return: ready
No Logs Showing
# Check Promtail is running
kubectl get pods -n loki -l app.kubernetes.io/name=promtail
# Check Promtail logs
kubectl logs -n loki -l app.kubernetes.io/name=promtail --tail=50
# Test query directly
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
curl "http://loki.loki.svc.cluster.local:3100/loki/api/v1/labels"
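You can also run a query against Loki directly to confirm logs are being ingested (the namespace value is just an example):
# Fetch a few recent log lines from the loki namespace
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl -G "http://loki.loki.svc.cluster.local:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="loki"}' --data-urlencode 'limit=5'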
📊 What Logs Are Collected
Promtail collects logs from:
1. All Pod Logs
/var/log/pods/<namespace>_<pod>_<uid>/<container>/*.log
2. Labels Added Automatically
Every log line gets these labels:
- namespace - Kubernetes namespace
- pod - Pod name
- container - Container name
- node - Node where the pod runs
- job - Always "kubernetes-pods"
3. Example Log Entry
{
  "namespace": "loki",
  "pod": "loki-0",
  "container": "loki",
  "node": "master1",
  "timestamp": "2026-01-05T13:30:00Z",
  "line": "level=info msg=\"flushing stream\""
}
🎯 Advanced Features
Live Tail
Click "Live" button in Log Browser panel to stream logs in real-time.
Context
Click on any log line → "Show context" to see surrounding logs.
Log Details
Click on any log line to see:
- All labels
- Parsed fields (if JSON)
- Timestamp
- Full message
Sharing
Click "Share" (top right) to:
- Copy link
- Create snapshot
- Export as JSON
🚨 Alerting (Optional)
You can create alerts based on log patterns:
Example: Alert on High Error Rate
alert: HighErrorRate
expr: sum(rate({namespace=~".+"} |~ "(?i)error"[5m])) > 10
for: 5m
annotations:
  summary: "High error rate detected"
  description: "{{ $value }} errors/sec"
To add alerts:
- Go to dashboard panel
- Click "Alert" tab
- Configure threshold
- Set notification channel
📈 Performance Tips
1. Limit Time Range
- Use smaller time ranges for faster queries
- Default: 1 hour (good balance)
2. Use Filters
- Always filter by namespace/pod when possible
- Reduces data scanned
3. Dashboard Refresh
- Default: 10 seconds
- Increase if experiencing lag
4. Log Volume
- Monitor "Top 10" panels
- Consider log retention policy if volume is high
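As a concrete illustration of tips 1 and 2, narrowing the stream selector does most of the work; the namespace and pod values below are placeholders:
# Scans every stream in the cluster
count_over_time({namespace=~".+"} |= "error" [1h])
# Scans only the streams matching the selector
count_over_time({namespace="default", pod=~"myapp.*"} |= "error" [1h])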
🔗 Useful Links
- Loki API: http://loki.loki.svc.cluster.local:3100
- Loki Ready: http://loki.loki.svc.cluster.local:3100/ready
- Loki Metrics: http://loki.loki.svc.cluster.local:3100/metrics
- LogQL Docs: https://grafana.com/docs/loki/latest/logql/
📋 Quick Reference
LogQL Operators
- |= - Contains (exact)
- != - Does not contain
- |~ - Regex match
- !~ - Regex not match
- | json - Parse JSON
- | logfmt - Parse logfmt
- | line_format - Format output
Rate Functions
- rate() - Per-second rate
- count_over_time() - Total count
- bytes_over_time() - Total bytes
- bytes_rate() - Bytes per second
Aggregations
- sum by (label) - Sum grouped by label
- count by (label) - Count grouped by label
- avg by (label) - Average
- max by (label) - Maximum
- topk(n, query) - Top N results
Dashboard is ready! It will appear in Grafana after ArgoCD syncs (~2-3 minutes). 🎉
Next Steps
- ✅ Wait for ArgoCD sync
- ✅ Open Grafana
- ✅ Find "Loki Logs Dashboard"
- ✅ Start exploring your logs!
Want me to add more panels or create specific queries for your use case?