📊 Grafana + Loki Integration

What Was Configured

1. Loki Data Source

File: apps/grafana/loki-datasource.yaml

Automatically adds Loki as a data source in Grafana.
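
A minimal sketch of what this file typically contains, assuming the common Grafana sidecar pattern where ConfigMaps labelled grafana_datasource are loaded automatically (the label, namespace, and Loki URL below match the rest of this document but are assumptions about your setup):

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring
  labels:
    grafana_datasource: "1"   # picked up by the Grafana provisioning sidecar (assumed setup)
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.loki.svc.cluster.local:3100
        isDefault: false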

2. Loki Logs Dashboard

File: apps/grafana/loki-dashboard.yaml

Comprehensive dashboard with 7 panels:

📈 Panel 1: Log Rate by Namespace

  • Real-time log ingestion rate
  • Grouped by namespace
  • Shows logs/second

🔥 Panel 2: Error Rate by Namespace

  • Errors, exceptions, and fatal messages
  • Per namespace breakdown
  • 5-minute rate

⚠️ Panel 3: Total Errors (Last Hour)

  • Gauge showing total error count
  • Color-coded thresholds:
    • Green: < 10 errors
    • Yellow: 10-50 errors
    • Red: > 50 errors

🔍 Panel 4: Log Browser

  • Interactive log viewer
  • Filterable by:
    • Namespace (dropdown)
    • Pod (dropdown)
    • Search text (free text)
  • Live tail capability

📊 Panel 5: Top 10 Namespaces by Log Volume

  • Pie chart showing which namespaces generate the most logs
  • Based on last hour

🎯 Panel 6: Top 10 Pods by Log Volume

  • Pie chart of chattiest pods
  • Filtered by selected namespace

🚨 Panel 7: Errors & Warnings

  • All errors/warnings across cluster
  • Full log details
  • Sortable and searchable
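
Like the data source, the dashboard is delivered as a ConfigMap that the Grafana sidecar imports. A minimal sketch of the wrapper in apps/grafana/loki-dashboard.yaml, assuming the sidecar watches for the grafana_dashboard label (the dashboard JSON is abbreviated here):

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-logs-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # picked up by the Grafana dashboard sidecar (assumed setup)
data:
  loki-logs-dashboard.json: |
    { "title": "Loki Logs Dashboard", "uid": "loki-logs", "panels": [ ... ] }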

🚀 How to Access

1. Wait for ArgoCD Sync

ArgoCD will automatically apply the changes (~2-3 minutes).

2. Access Grafana

# Get Grafana URL
kubectl get ingress -n monitoring

# Or port-forward
kubectl port-forward -n monitoring svc/k8s-monitoring-grafana 3000:80
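
If you need credentials, the Grafana admin password normally lives in a Secret created alongside the deployment; the Secret name and key below are assumptions based on the release name used above:

# Read the Grafana admin password (Secret name/key assumed from the chart defaults)
kubectl get secret -n monitoring k8s-monitoring-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo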

3. Find the Dashboard

In Grafana:

  1. Click Dashboards (left menu)
  2. Search for "Loki Logs Dashboard"
  3. Or navigate to: Dashboards → Browse → loki-logs

🔍 How to Use the Dashboard

Filters at the Top

Namespace Filter:

  • Select one or more namespaces
  • Default: All namespaces

Pod Filter:

  • Dynamically updates based on selected namespace(s)
  • Default: All pods

Search Box:

  • Free-text search across all logs
  • Examples:
    • error - find errors
    • timeout - find timeouts
    • sync - find ArgoCD syncs

Example Workflows

1. Debug Application Errors

1. Select namespace: "default"
2. Select pod: "myapp-xyz"
3. Search: "error"
4. Look at "Log Browser" panel
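
The same workflow expressed as a single LogQL query (namespace and pod are the placeholder values from the steps above):

{namespace="default", pod="myapp-xyz"} |= "error"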

2. Monitor ArgoCD

1. Select namespace: "argocd"
2. Search: "sync"
3. Check "Error Rate" and "Log Browser"

3. Find Noisy Pods

1. Look at "Top 10 Pods by Log Volume"
2. Click on highest pod
3. Use "Log Browser" to see what it's logging

4. Cluster-Wide Error Monitoring

1. Set namespace to "All"
2. Check "Total Errors" gauge
3. Look at "Errors & Warnings" panel at bottom

📝 LogQL Query Examples

The dashboard uses LogQL (Loki Query Language). Here are some queries you can use:

Basic Queries

# All logs from a namespace
{namespace="loki"}

# Logs from specific pod
{pod="loki-0"}

# Multiple namespaces
{namespace=~"loki|argocd|grafana"}

Filtering

# Contains "error"
{namespace="default"} |= "error"

# Regex match
{namespace="argocd"} |~ "sync|deploy"

# NOT containing
{namespace="loki"} != "debug"

# Case insensitive
{namespace="default"} |~ "(?i)error"

Metrics Queries

# Log rate per namespace
sum by (namespace) (rate({namespace=~".+"}[1m]))

# Error count
sum(count_over_time({namespace=~".+"} |~ "(?i)error"[5m]))

# Top namespaces
topk(10, sum by (namespace) (count_over_time({namespace=~".+"}[1h])))

JSON Parsing

# If logs are JSON
{namespace="app"} | json | level="error"

# Extract fields
{namespace="app"} | json | line_format "{{.message}}"

🎨 Dashboard Customization

Add New Panel

  1. Click "Add panel" (top right)
  2. Select "Loki" as data source
  3. Write your LogQL query
  4. Choose visualization type
  5. Save

Useful Panel Types

  • Time series: For rates and counts over time
  • Logs: For viewing actual log lines
  • Stat: For single values (like total errors)
  • Gauge: For thresholds (like error counts)
  • Table: For structured data
  • Pie chart: For distribution
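
For example, a Stat panel showing total errors over the currently selected time range could use a query like this ($__range is Grafana's built-in time-range variable; the selector itself is just an example):

sum(count_over_time({namespace=~".+"} |~ "(?i)error" [$__range]))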

🔧 Troubleshooting

Dashboard Not Appearing

# Check ConfigMap created
kubectl get configmap -n monitoring loki-logs-dashboard

# Check Grafana pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana

# Restart Grafana
kubectl rollout restart deployment/k8s-monitoring-grafana -n monitoring

Data Source Not Working

# Test Loki from Grafana pod
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl http://loki.loki.svc.cluster.local:3100/ready

# Should return: ready

No Logs Showing

# Check Promtail is running
kubectl get pods -n loki -l app.kubernetes.io/name=promtail

# Check Promtail logs
kubectl logs -n loki -l app.kubernetes.io/name=promtail --tail=50

# Test query directly
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl "http://loki.loki.svc.cluster.local:3100/loki/api/v1/labels"

📊 What Logs Are Collected

Promtail collects logs from:

1. All Pod Logs

/var/log/pods/<namespace>_<pod>_<uid>/<container>/*.log

2. Labels Added Automatically

Every log line gets these labels:

  • namespace - Kubernetes namespace
  • pod - Pod name
  • container - Container name
  • node - Node where pod runs
  • job - Always "kubernetes-pods"

3. Example Log Entry

{
  "namespace": "loki",
  "pod": "loki-0",
  "container": "loki",
  "node": "master1",
  "timestamp": "2026-01-05T13:30:00Z",
  "line": "level=info msg=\"flushing stream\""
}
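
Any of these labels can be combined into a stream selector, so the entry above could be queried with:

{namespace="loki", pod="loki-0", container="loki"}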

🎯 Advanced Features

Live Tail

Click "Live" button in Log Browser panel to stream logs in real-time.

Context

Click on any log line → "Show context" to see surrounding logs.

Log Details

Click on any log line to see:

  • All labels
  • Parsed fields (if JSON)
  • Timestamp
  • Full message

Sharing

Click "Share" (top right) to:

  • Copy link
  • Create snapshot
  • Export as JSON

🚨 Alerting (Optional)

You can create alerts based on log patterns:

Example: Alert on High Error Rate

alert: HighErrorRate
expr: sum(rate({namespace=~".+"} |~ "(?i)error"[5m])) > 10
for: 5m
annotations:
  summary: "High error rate detected"
  description: "{{ $value }} errors/sec"

To add alerts:

  1. Go to dashboard panel
  2. Click "Alert" tab
  3. Configure threshold
  4. Set notification channel

📈 Performance Tips

1. Limit Time Range

  • Use smaller time ranges for faster queries
  • Default: 1 hour (good balance)

2. Use Filters

  • Always filter by namespace/pod when possible
  • Reduces data scanned

3. Dashboard Refresh

  • Default: 10 seconds
  • Increase if experiencing lag

4. Log Volume

  • Monitor "Top 10" panels
  • Consider a log retention policy if volume is high (a retention sketch follows below)
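
A minimal retention sketch for Loki (exact keys depend on your Loki version and how its config/values file is structured, so treat this as an assumption to verify against your chart):

limits_config:
  retention_period: 168h        # keep logs for 7 days
compactor:
  retention_enabled: true       # the compactor enforces the retention period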


📋 Quick Reference

LogQL Operators

  • |= - Contains (exact)
  • != - Does not contain
  • |~ - Regex match
  • !~ - Regex not match
  • | json - Parse JSON
  • | logfmt - Parse logfmt
  • | line_format - Format output

Rate Functions

  • rate() - Per-second rate
  • count_over_time() - Total count
  • bytes_over_time() - Total bytes
  • bytes_rate() - Bytes per second

Aggregations

  • sum by (label) - Sum grouped by label
  • count by (label) - Count grouped
  • avg by (label) - Average
  • max by (label) - Maximum
  • topk(n, query) - Top N results

Dashboard is ready! It will appear in Grafana after ArgoCD syncs (~2-3 minutes). 🎉

Next Steps

  1. Wait for ArgoCD sync
  2. Open Grafana
  3. Find "Loki Logs Dashboard"
  4. Start exploring your logs!
