Add apps/demo-nginx/docs/Argocd_sync_issue.md
# 🔧 ArgoCD Sync vs Actual Deployment Issue

## 🐛 The Problem

**Symptom:**
- ArgoCD shows `Synced` ✅
- Deployment manifest in Kubernetes is updated ✅
- **BUT** pods are still running the old image ❌

**Why This Happens:**

```
┌─────────────────────────────────────────────────────────────┐
│  Git Repository                                             │
│  deployment.yaml: image: app:v2 ✅                          │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ ArgoCD syncs
                   ▼
┌─────────────────────────────────────────────────────────────┐
│  Kubernetes API (Deployment Object)                         │
│  spec.template.spec.containers[0].image: app:v2 ✅          │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ Deployment controller should trigger rollout
                   ▼
┌─────────────────────────────────────────────────────────────┐
│  Running Pods                                               │
│  Pod-1: image: app:v1 ❌ (OLD!)                             │
│  Pod-2: image: app:v1 ❌ (OLD!)                             │
└─────────────────────────────────────────────────────────────┘

ArgoCD says "Synced" because Git == Kubernetes manifest ✅
But pods haven't rolled out yet! ❌
```

---

## 🔍 Why ArgoCD Says "Synced"

ArgoCD checks:
1. ✅ Git manifest == Kubernetes Deployment object (this is all that `Synced` means)
2. ✅ Health status (from status fields), reported separately from the sync status; it can still be `Progressing` while the app already shows `Synced`

The `Synced` status alone **DOES NOT** tell you:
- ❌ Are pods actually running?
- ❌ What image are pods using?
- ❌ Did rollout complete?

**ArgoCD's job:** Keep Kubernetes resources in sync with Git
**NOT ArgoCD's job:** Wait for pods to finish rolling out
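
A pipeline that gates only on the sync status can pass while the rollout is still under way. The sketch below shows a stricter gate that accepts only `Synced` together with `Healthy`. It is a minimal illustration, not the pipeline's actual code: the helper name `argocd_gate` is hypothetical, and the usage comment assumes the Application is called `demo-nginx` and lives in the `argocd` namespace. The two values are read from the `.status.sync.status` and `.status.health.status` fields of the Application resource.

```bash
# Hypothetical helper: succeed only when ArgoCD reports both Synced and Healthy.
# Args: sync status, health status
argocd_gate() {
  local sync="$1" health="$2"
  [ "$sync" = "Synced" ] && [ "$health" = "Healthy" ]
}

# Usage sketch (application name and namespace are assumptions):
#   sync=$(kubectl get applications.argoproj.io demo-nginx -n argocd \
#            -o jsonpath='{.status.sync.status}')
#   health=$(kubectl get applications.argoproj.io demo-nginx -n argocd \
#            -o jsonpath='{.status.health.status}')
#   argocd_gate "$sync" "$health" || echo "not ready: sync=$sync health=$health"
```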

---

## ⚠️ When This Happens

### Scenario 1: Slow Rollout
```
14:00:00 - ArgoCD syncs deployment (v1 → v2)
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes starts rollout
14:00:30 - Pod-1 terminates (v1)
14:00:35 - Pod-3 starts (v2)
14:00:50 - Pod-2 terminates (v1)
14:00:55 - Pod-4 starts (v2)
14:01:00 - Rollout complete! ✅

Jenkins checks at 14:00:05: ArgoCD says "Synced"
But pods are still v1! ❌
```

### Scenario 2: Image Pull Delay
```
14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes tries to start new pod
14:00:15 - Pulling image... (slow network)
14:00:45 - Image pulled
14:00:50 - Pod starts
14:01:00 - Pod ready

Jenkins checks at 14:00:05: "Synced" but no new pods yet!
```

### Scenario 3: Resource Constraints
```
14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes: "No resources available"
14:00:20 - Kubernetes: "Waiting for node capacity..."
14:01:00 - Old pod terminates, resources freed
14:01:10 - New pod starts

Jenkins checks at 14:00:05: "Synced" but can't schedule pods!
```
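
In all three scenarios the Deployment object itself shows that the rollout is unfinished, even though ArgoCD already says `Synced`. The sketch below decides this from the Deployment's generation and replica counters. It is written under assumptions: the helper name `rollout_complete` is hypothetical, and the conditions are roughly the ones `kubectl rollout status` waits on, not a copy of its implementation. Empty values (fields absent from the status) are treated as `0`.

```bash
# Hypothetical helper: decide from Deployment fields whether a rollout has finished.
# Args: metadata.generation, status.observedGeneration, spec.replicas,
#       status.updatedReplicas, status.replicas, status.availableReplicas
rollout_complete() {
  local generation="$1" observed="${2:-0}" desired="$3"
  local updated="${4:-0}" total="${5:-0}" available="${6:-0}"
  [ "$observed" -ge "$generation" ] || return 1   # controller has not seen the new spec yet
  [ "$updated" -ge "$desired" ] || return 1       # new pods are still being created
  [ "$total" -le "$updated" ] || return 1         # old pods are still pending termination
  [ "$available" -ge "$updated" ] || return 1     # new pods are not yet available
}

# Usage sketch (deployment name "app" as in the examples above):
#   read -r gen obs want upd tot avail <<< "$(kubectl get deployment app -o \
#     jsonpath='{.metadata.generation} {.status.observedGeneration} {.spec.replicas} {.status.updatedReplicas} {.status.replicas} {.status.availableReplicas}')"
#   rollout_complete "$gen" "$obs" "$want" "$upd" "$tot" "$avail" || echo "still rolling out"
```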

---

## ✅ The Solution

### What Jenkins Must Check:

```groovy
// ❌ BAD - Only checks ArgoCD
if (argocdStatus == 'Synced') {
    echo "Done!"
}

// ✅ GOOD - Checks ArgoCD + Kubernetes
if (argocdStatus == 'Synced') {
    // 1. Wait for rollout (the step fails if the rollout times out)
    sh "kubectl rollout status deployment/app --timeout=5m"

    // 2. Verify actual pod images (add a label selector to limit this to the app's pods)
    def podImages = sh(
        script: "kubectl get pods -o jsonpath='{.items[*].status.containerStatuses[0].image}'",
        returnStdout: true
    ).trim()
    for (image in podImages.split(/\s+/)) {
        if (!image.contains(newVersion)) {
            error "Pod still running old image: ${image}"
        }
    }
    echo "Verified!"
}
```
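
The image comparison is the step that actually catches stale pods, so it helps to see it in isolation. The sketch below is an illustration under assumptions: the helper name `stale_images` and the `app=app` label in the usage comment are hypothetical, and it matches on an exact `:tag` suffix, so images referenced by digest would need a different comparison.

```bash
# Hypothetical helper: print every image that does not carry the expected tag.
# Args: expected tag, then one image reference per argument.
# Returns non-zero if at least one stale image was found.
stale_images() {
  local expected_tag="$1"; shift
  local image stale=0
  for image in "$@"; do
    case "$image" in
      *":$expected_tag") ;;            # tag matches, nothing to report
      *) echo "$image"; stale=1 ;;     # report the mismatching image
    esac
  done
  return "$stale"
}

# Usage sketch (label selector is an assumption):
#   stale_images v2 $(kubectl get pods -l app=app \
#     -o jsonpath='{.items[*].spec.containers[0].image}') || echo "old pods still running"
```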

---

## 🎯 New Jenkinsfile Verification

### Stage 1: ArgoCD Sync Check
```groovy
stage('Wait for ArgoCD Sync') {
    // Checks:
    // 1. ArgoCD sync status = "Synced"
    // 2. Deployment SPEC image updated
    //
    // Does NOT check if pods rolled out!
    // That's the next stage.
}
```

**Output:**
```
ArgoCD sync status: Synced
Deployment spec image: app:v2
✅ ArgoCD synced and deployment spec updated!
Note: Pods may still be rolling out - will verify in next stage
```

### Stage 2: Wait for Rollout
```groovy
stage('Wait for Deployment') {
    // Uses kubectl rollout status
    // Waits for actual pod rollout to complete
    sh "kubectl rollout status deployment/app --timeout=5m"
}
```

**What `kubectl rollout status` does:**
- Watches deployment progress
- Waits for all new pods to be ready
- Returns when the rollout is complete
- Times out if the rollout is stuck

**Output:**
```
Waiting for deployment "app" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "app" rollout to finish: 1 old replicas are pending termination...
deployment "app" successfully rolled out
✅ Rollout completed successfully!
```
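
Because `kubectl rollout status` exits non-zero when the timeout expires, the pipeline can branch on its exit code instead of parsing its output. The wrapper below is a minimal sketch of that idea; the function name `wait_for_rollout` and its messages are hypothetical, and it assumes the current kubectl context and namespace already point at the right cluster.

```bash
# Hypothetical wrapper: wait for a rollout and report a clear failure instead of
# letting the pipeline continue on a stuck deployment.
# Args: deployment name, optional timeout (defaults to 5m as in the stage above)
wait_for_rollout() {
  local deployment="$1" timeout="${2:-5m}"
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout of $deployment completed"
  else
    echo "rollout of $deployment did not complete within $timeout" >&2
    return 1
  fi
}

# Usage sketch:
#   wait_for_rollout app 5m || exit 1
```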
### Stage 3: Verify Actual Pods
```groovy
stage('Verify Deployment') {
    // CRITICAL CHECKS (outline of what the stage verifies):

    // 1. Deployment status
    //    readyReplicas == desiredReplicas

    // 2. Deployment spec image
    //    deploymentImage contains newTag

    // 3. ACTUAL POD IMAGES (most important!)
    //    podImages = images of all pods
    //    for each podImage:
    //        if podImage does not contain newTag:
    //            FAIL!

    // 4. Pod health
    //    all pods in Running state

    // 5. Restart count
    //    check for crash loops
}
```

**Output:**
```
================================================
DEPLOYMENT VERIFICATION
================================================

1. Checking deployment status...
Desired replicas: 2
Updated replicas: 2
Ready replicas: 2
Available replicas: 2
✅ All pods ready

2. Checking deployment spec image...
Deployment spec image: app:v2
Expected tag: v2
✅ Deployment spec correct

3. Checking actual running pod images...
Running pod images:
- app:v2
- app:v2
✅ All pods running correct image

4. Checking pod readiness probes...
✅ All pods in Running state

5. Checking for container restarts...
Max restart count: 0
✅ Restart count acceptable

================================================
✅ ALL VERIFICATION CHECKS PASSED!
================================================
```
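
Check 5 reduces to finding the highest restart count across all containers and comparing it with a threshold. The sketch below shows only that reduction; the helper name `max_restarts`, the `app=app` label, and the threshold in the usage comment are assumptions for illustration, not values taken from the Jenkinsfile.

```bash
# Hypothetical helper: print the highest restart count among the given values.
# Args: one restart count per argument (prints 0 when none are given)
max_restarts() {
  local max=0 count
  for count in "$@"; do
    if [ "$count" -gt "$max" ]; then
      max="$count"
    fi
  done
  echo "$max"
}

# Usage sketch (label selector and threshold of 3 are assumptions):
#   counts=$(kubectl get pods -l app=app \
#     -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}')
#   [ "$(max_restarts $counts)" -le 3 ] || echo "possible crash loop"
```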

---

## 🔥 What Happens If Check #3 Fails

```
3. Checking actual running pod images...
Running pod images:
- app:v1 ❌
- app:v1 ❌
❌ Pod running wrong image: app:v1
❌ FAILED: 2 pod(s) running old image!
This is the ArgoCD sync bug - deployment updated but pods not rolled out
```

**Jenkins will:**
1. ❌ Mark build as failed
2. 🔄 Trigger rollback (if enabled)
3. 📱 Send notification with details

---

## 🧪 Testing the Fix

### Test 1: Normal Deployment
```bash
# Update image in Git
git commit -am "Update to v2"
git push

# Jenkins should:
# 1. Wait for ArgoCD sync ✅
# 2. Wait for rollout ✅
# 3. Verify pods have v2 ✅
# 4. Success! ✅
```

### Test 2: Slow Rollout
```bash
# Set slow rollout
kubectl patch deployment app -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'

# Update the image in Git, commit, then push
git push

# Jenkins should:
# 1. ArgoCD syncs quickly ✅
# 2. Wait for slow rollout (may take 2-3 minutes) ⏳
# 3. Verify when complete ✅
```

### Test 3: Rollout Stuck
```bash
# Create a broken image tag:
# update the manifest to image: app:nonexistent, commit, then push
git push

# Jenkins should:
# 1. ArgoCD syncs ✅
# 2. kubectl rollout status times out ❌
# 3. Rollback triggered ✅
```
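
In Test 3 the pipeline only fails once the rollout timeout expires. A possible refinement, sketched here as an assumption and not as part of the Jenkinsfile, is to look at the containers' waiting reasons and fail early when Kubernetes reports `ErrImagePull` or `ImagePullBackOff`. The helper name `has_pull_failure` and the `app=app` label in the usage comment are hypothetical.

```bash
# Hypothetical helper: succeed if any waiting reason indicates a failed image pull.
# Args: one container waiting reason per argument
has_pull_failure() {
  local reason
  for reason in "$@"; do
    case "$reason" in
      ErrImagePull|ImagePullBackOff) return 0 ;;
    esac
  done
  return 1
}

# Usage sketch (label selector is an assumption):
#   reasons=$(kubectl get pods -l app=app \
#     -o jsonpath='{.items[*].status.containerStatuses[*].state.waiting.reason}')
#   has_pull_failure $reasons && echo "image cannot be pulled, failing early"
```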

---

## 📊 Comparison: Old vs New

### Old Pipeline (Unreliable)
```
1. ArgoCD sync check
   ├─ Checks: ArgoCD status
   ├─ Checks: Deployment spec image
   └─ Duration: ~30 seconds

   ⚠️ PROBLEM: Pods might not have rolled out!

2. Success! ✅ (but pods are still old!)
```

### New Pipeline (Reliable)
```
1. ArgoCD sync check
   ├─ Checks: ArgoCD status
   ├─ Checks: Deployment spec image
   └─ Duration: ~30 seconds

2. Rollout status check
   ├─ Checks: kubectl rollout status
   ├─ Waits: For actual pod rollout
   └─ Duration: ~1-2 minutes

3. Verification
   ├─ Checks: Deployment status
   ├─ Checks: ACTUAL pod images ← KEY!
   ├─ Checks: Pod health
   ├─ Checks: Restart count
   └─ Duration: ~10 seconds

4. Success! ✅ (pods verified running new version)
```

---

## 🎯 Key Takeaways

### ❌ Don't Trust:
- ArgoCD "Synced" status alone
- Deployment spec image alone
- Health status alone

### ✅ Always Verify:
1. **ArgoCD synced** (manifest applied)
2. **Rollout completed** (`kubectl rollout status`)
3. **Actual pod images** (what's really running)
4. **Pod health** (ready and not crashing)

### 💡 Remember:
```
ArgoCD "Synced" = Git matches Kubernetes manifest ✅
BUT
Kubernetes manifest != Running pods ⚠️

You MUST check actual pods!
```

---

## 🔗 Related Issues

- [ArgoCD #2723](https://github.com/argoproj/argo-cd/issues/2723) - "Synced but pods not updated"
- [Kubernetes #93033](https://github.com/kubernetes/kubernetes/issues/93033) - "Deployment rollout delays"

---

## 🚀 Using the New Jenkinsfile

```bash
# 1. Update Jenkinsfile in your repo
cp Jenkinsfile.telegram.en apps/demo-nginx/Jenkinsfile

# 2. Commit and push
git add apps/demo-nginx/Jenkinsfile
git commit -m "fix: add proper deployment verification"
git push

# 3. Run build
# Jenkins will now properly verify deployments!
```

---

## 📱 Notifications

With the new verification, you'll see:

**During deployment:**
```
⏳ ArgoCD Syncing
Application: demo-nginx
Timeout: 120s

🚀 Deploying to Kubernetes
Deployment: demo-nginx
Image: main-42
Rolling out new pods...
```

**On success:**
```
✅ Deployment Successful!

Verified:
- ArgoCD synced ✅
- Rollout completed ✅
- Pods running v42 ✅
- All pods healthy ✅
```

**On failure:**
```
❌ Deployment Failed

Error: 2 pods running old image!

Rollback initiated...
```

---

**This fix ensures Jenkins never reports success until pods are actually running the new version!** ✅