📊 Grafana + Loki Integration

What Was Configured

1. Loki Data Source

File: apps/grafana/loki-datasource.yaml

Automatically adds Loki as a data source in Grafana.
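
A minimal sketch of what this file typically contains, assuming the common Grafana sidecar pattern where ConfigMaps labelled grafana_datasource are loaded automatically (the label, namespace, and Loki URL below match the rest of this document but are assumptions about your setup):

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring
  labels:
    grafana_datasource: "1"   # picked up by the Grafana provisioning sidecar (assumed setup)
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.loki.svc.cluster.local:3100
        isDefault: false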

2. Loki Logs Dashboard

File: apps/grafana/loki-dashboard.yaml

Comprehensive dashboard with 7 panels:

📈 Panel 1: Log Rate by Namespace

  • Real-time log ingestion rate
  • Grouped by namespace
  • Shows logs/second

🔥 Panel 2: Error Rate by Namespace

  • Errors, exceptions, and fatal messages
  • Per namespace breakdown
  • 5-minute rate

⚠️ Panel 3: Total Errors (Last Hour)

  • Gauge showing total error count
  • Color-coded thresholds:
    • Green: < 10 errors
    • Yellow: 10-50 errors
    • Red: > 50 errors

🔍 Panel 4: Log Browser

  • Interactive log viewer
  • Filterable by:
    • Namespace (dropdown)
    • Pod (dropdown)
    • Search text (free text)
  • Live tail capability

📊 Panel 5: Top 10 Namespaces by Log Volume

  • Pie chart showing which namespaces generate the most logs
  • Based on last hour

🎯 Panel 6: Top 10 Pods by Log Volume

  • Pie chart of chattiest pods
  • Filtered by selected namespace

🚨 Panel 7: Errors & Warnings

  • All errors/warnings across cluster
  • Full log details
  • Sortable and searchable
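
Like the data source, the dashboard is delivered as a ConfigMap that the Grafana sidecar imports. A minimal sketch of the wrapper in apps/grafana/loki-dashboard.yaml, assuming the sidecar watches for the grafana_dashboard label (the dashboard JSON is abbreviated here):

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-logs-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # picked up by the Grafana dashboard sidecar (assumed setup)
data:
  loki-logs-dashboard.json: |
    { "title": "Loki Logs Dashboard", "uid": "loki-logs", "panels": [ ... ] }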

🚀 How to Access

1. Wait for ArgoCD Sync

ArgoCD will automatically apply the changes (~2-3 minutes).

2. Access Grafana

# Get Grafana URL
kubectl get ingress -n monitoring

# Or port-forward
kubectl port-forward -n monitoring svc/k8s-monitoring-grafana 3000:80
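
If you need credentials, the Grafana admin password normally lives in a Secret created alongside the deployment; the Secret name and key below are assumptions based on the release name used above:

# Read the Grafana admin password (Secret name/key assumed from the chart defaults)
kubectl get secret -n monitoring k8s-monitoring-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo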

3. Find the Dashboard

In Grafana:

  1. Click Dashboards (left menu)
  2. Search for "Loki Logs Dashboard"
  3. Or navigate to: Dashboards → Browse → loki-logs

🔍 How to Use the Dashboard

Filters at the Top

Namespace Filter:

  • Select one or more namespaces
  • Default: All namespaces

Pod Filter:

  • Dynamically updates based on selected namespace(s)
  • Default: All pods

Search Box:

  • Free-text search across all logs
  • Examples:
    • error - find errors
    • timeout - find timeouts
    • sync - find ArgoCD syncs

Example Workflows

1. Debug Application Errors

1. Select namespace: "default"
2. Select pod: "myapp-xyz"
3. Search: "error"
4. Look at "Log Browser" panel
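
The same workflow expressed as a single LogQL query (namespace and pod are the placeholder values from the steps above):

{namespace="default", pod="myapp-xyz"} |= "error"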

2. Monitor ArgoCD

1. Select namespace: "argocd"
2. Search: "sync"
3. Check "Error Rate" and "Log Browser"

3. Find Noisy Pods

1. Look at "Top 10 Pods by Log Volume"
2. Click on highest pod
3. Use "Log Browser" to see what it's logging

4. Cluster-Wide Error Monitoring

1. Set namespace to "All"
2. Check "Total Errors" gauge
3. Look at "Errors & Warnings" panel at bottom

📝 LogQL Query Examples

The dashboard uses LogQL (Loki Query Language). Here are some queries you can use:

Basic Queries

# All logs from a namespace
{namespace="loki"}

# Logs from specific pod
{pod="loki-0"}

# Multiple namespaces
{namespace=~"loki|argocd|grafana"}

Filtering

# Contains "error"
{namespace="default"} |= "error"

# Regex match
{namespace="argocd"} |~ "sync|deploy"

# NOT containing
{namespace="loki"} != "debug"

# Case insensitive
{namespace="default"} |~ "(?i)error"

Metrics Queries

# Log rate per namespace
sum by (namespace) (rate({namespace=~".+"}[1m]))

# Error count
sum(count_over_time({namespace=~".+"} |~ "(?i)error"[5m]))

# Top namespaces
topk(10, sum by (namespace) (count_over_time({namespace=~".+"}[1h])))

JSON Parsing

# If logs are JSON
{namespace="app"} | json | level="error"

# Extract fields
{namespace="app"} | json | line_format "{{.message}}"

🎨 Dashboard Customization

Add New Panel

  1. Click "Add panel" (top right)
  2. Select "Loki" as data source
  3. Write your LogQL query
  4. Choose visualization type
  5. Save

Useful Panel Types

  • Time series: For rates and counts over time
  • Logs: For viewing actual log lines
  • Stat: For single values (like total errors)
  • Gauge: For thresholds (like error counts)
  • Table: For structured data
  • Pie chart: For distribution
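
For example, a Stat panel showing total errors over the currently selected time range could use a query like this ($__range is Grafana's built-in time-range variable; the selector itself is just an example):

sum(count_over_time({namespace=~".+"} |~ "(?i)error" [$__range]))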

🔧 Troubleshooting

Dashboard Not Appearing

# Check ConfigMap created
kubectl get configmap -n monitoring loki-logs-dashboard

# Check Grafana pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana

# Restart Grafana
kubectl rollout restart deployment/k8s-monitoring-grafana -n monitoring

Data Source Not Working

# Test Loki from Grafana pod
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl http://loki.loki.svc.cluster.local:3100/ready

# Should return: ready

No Logs Showing

# Check Promtail is running
kubectl get pods -n loki -l app.kubernetes.io/name=promtail

# Check Promtail logs
kubectl logs -n loki -l app.kubernetes.io/name=promtail --tail=50

# Test query directly
kubectl exec -n monitoring -it deployment/k8s-monitoring-grafana -- \
  curl "http://loki.loki.svc.cluster.local:3100/loki/api/v1/labels"

📊 What Logs Are Collected

Promtail collects logs from:

1. All Pod Logs

/var/log/pods/<namespace>_<pod>_<uid>/<container>/*.log

2. Labels Added Automatically

Every log line gets these labels:

  • namespace - Kubernetes namespace
  • pod - Pod name
  • container - Container name
  • node - Node where pod runs
  • job - Always "kubernetes-pods"

3. Example Log Entry

{
  "namespace": "loki",
  "pod": "loki-0",
  "container": "loki",
  "node": "master1",
  "timestamp": "2026-01-05T13:30:00Z",
  "line": "level=info msg=\"flushing stream\""
}
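
Any of these labels can be combined into a stream selector, so the entry above could be queried with:

{namespace="loki", pod="loki-0", container="loki"}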

🎯 Advanced Features

Live Tail

Click "Live" button in Log Browser panel to stream logs in real-time.

Context

Click on any log line → "Show context" to see surrounding logs.

Log Details

Click on any log line to see:

  • All labels
  • Parsed fields (if JSON)
  • Timestamp
  • Full message

Sharing

Click "Share" (top right) to:

  • Copy link
  • Create snapshot
  • Export as JSON

🚨 Alerting (Optional)

You can create alerts based on log patterns:

Example: Alert on High Error Rate

alert: HighErrorRate
expr: sum(rate({namespace=~".+"} |~ "(?i)error"[5m])) > 10
for: 5m
annotations:
  summary: "High error rate detected"
  description: "{{ $value }} errors/sec"

To add alerts:

  1. Go to dashboard panel
  2. Click "Alert" tab
  3. Configure threshold
  4. Set notification channel

📈 Performance Tips

1. Limit Time Range

  • Use smaller time ranges for faster queries
  • Default: 1 hour (good balance)

2. Use Filters

  • Always filter by namespace/pod when possible
  • Reduces data scanned

3. Dashboard Refresh

  • Default: 10 seconds
  • Increase if experiencing lag

4. Log Volume

  • Monitor "Top 10" panels
  • Consider a log retention policy if volume is high (a retention sketch follows below)
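
A minimal retention sketch for Loki (exact keys depend on your Loki version and how its config/values file is structured, so treat this as an assumption to verify against your chart):

limits_config:
  retention_period: 168h        # keep logs for 7 days
compactor:
  retention_enabled: true       # the compactor enforces the retention period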


📋 Quick Reference

LogQL Operators

  • |= - Contains (exact)
  • != - Does not contain
  • |~ - Regex match
  • !~ - Regex not match
  • | json - Parse JSON
  • | logfmt - Parse logfmt
  • | line_format - Format output

Rate Functions

  • rate() - Per-second rate
  • count_over_time() - Total count
  • bytes_over_time() - Total bytes
  • bytes_rate() - Bytes per second

Aggregations

  • sum by (label) - Sum grouped by label
  • count by (label) - Count grouped
  • avg by (label) - Average
  • max by (label) - Maximum
  • topk(n, query) - Top N results

Dashboard is ready! It will appear in Grafana after ArgoCD syncs (~2-3 minutes). 🎉

Next Steps

  1. Wait for ArgoCD sync
  2. Open Grafana
  3. Find "Loki Logs Dashboard"
  4. Start exploring your logs!
