🔧 ArgoCD Sync vs Actual Deployment Issue
🐛 The Problem
Symptom:
- ArgoCD shows Synced ✅
- Deployment manifest in Kubernetes is updated ✅
- BUT pods are still running old image ❌
Why This Happens:
┌─────────────────────────────────────────────────────────────┐
│ Git Repository │
│ deployment.yaml: image: app:v2 ✅ │
└──────────────────┬──────────────────────────────────────────┘
│
│ ArgoCD syncs
▼
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes API (Deployment Object) │
│ spec.template.spec.containers[0].image: app:v2 ✅ │
└──────────────────┬──────────────────────────────────────────┘
│
│ Kubernetes Controller should trigger rollout
▼
┌─────────────────────────────────────────────────────────────┐
│ Running Pods │
│ Pod-1: image: app:v1 ❌ (OLD!) │
│ Pod-2: image: app:v1 ❌ (OLD!) │
└─────────────────────────────────────────────────────────────┘
ArgoCD says "Synced" because Git == Kubernetes manifest ✅
But pods haven't rolled out yet! ❌
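🔍 Why ArgoCD Says "Synced"
The "Synced" status checks:
- ✅ Git manifest == Kubernetes Deployment object
The "Synced" status DOES NOT check:
- ❌ Are pods actually running?
- ❌ What image are pods using?
- ❌ Did rollout complete?
ArgoCD reports health separately (from status fields): a Deployment that is mid-rollout is typically "Progressing" rather than "Healthy", so a pipeline that looks only at "Synced" never sees it.
ArgoCD's job: Keep Kubernetes resources in sync with Git
NOT ArgoCD's job: Wait for pods to finish rolling out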
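One way to make the sync/health distinction explicit in a pipeline is a small gate that looks at both values. This is a minimal sketch, not the Jenkinsfile's actual code: the function name `argocd_gate` is hypothetical, and the `argocd` namespace in the usage comment is an assumption.

```shell
# argocd_gate SYNC_STATUS HEALTH_STATUS
# Prints a decision and succeeds only when the app is both Synced and Healthy.
argocd_gate() {
    local sync="$1" health="$2"
    if [ "$sync" != "Synced" ]; then
        echo "wait: manifests not applied yet (sync=$sync)"
        return 1
    fi
    if [ "$health" != "Healthy" ]; then
        echo "wait: synced, but resources not healthy yet (health=$health)"
        return 1
    fi
    echo "ok: synced and healthy"
}

# Example against a cluster (the namespace "argocd" is an assumption):
#   set -- $(kubectl get applications.argoproj.io demo-nginx -n argocd \
#       -o jsonpath='{.status.sync.status} {.status.health.status}')
#   argocd_gate "$1" "$2"
```

Even when this gate passes, checking the actual pod images is still worthwhile, since it is the only check that looks at what is really running.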
⚠️ When This Happens
Scenario 1: Slow Rollout
14:00:00 - ArgoCD syncs deployment (v1 → v2)
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes starts rollout
14:00:30 - Pod-1 terminates (v1)
14:00:35 - Pod-3 starts (v2)
14:00:50 - Pod-2 terminates (v1)
14:00:55 - Pod-4 starts (v2)
14:01:00 - Rollout complete! ✅
Jenkins checks at 14:00:05: ArgoCD says "Synced"
But pods are still v1! ❌
Scenario 2: Image Pull Delay
14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes tries to start new pod
14:00:15 - Pulling image... (slow network)
14:00:45 - Image pulled
14:00:50 - Pod starts
14:01:00 - Pod ready
Jenkins checks at 14:00:05: "Synced" but no new pods yet!
Scenario 3: Resource Constraints
14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes: "No resources available"
14:00:20 - Kubernetes: "Waiting for node capacity..."
14:01:00 - Old pod terminates, resources freed
14:01:10 - New pod starts
Jenkins checks at 14:00:05: "Synced" but can't schedule pods!
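All three scenarios share one shape: a single check at 14:00:05 is too early, so the check has to be repeated until it passes or a limit is reached. A minimal polling sketch follows. `poll_until` and the check function named in the usage comment are hypothetical, not part of the pipeline.

```shell
# poll_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
poll_until() {
    local max_attempts="$1" interval="$2" attempt=1
    shift 2
    while :; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "gave up after $attempt attempts"
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
}

# Example: check every 10 seconds, at most 30 times
# (pods_run_expected_image is a hypothetical check function):
#   poll_until 30 10 pods_run_expected_image
```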
✅ The Solution
What Jenkins Must Check:
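// ❌ BAD - Only checks ArgoCD
if (argocdStatus == 'Synced') {
    echo "Done!"
}
// ✅ GOOD - Checks ArgoCD + Kubernetes
if (argocdStatus == 'Synced') {
    // 1. Wait for rollout (fails the build on timeout)
    sh "kubectl rollout status deployment/app --timeout=5m"
    // 2. Verify actual pod images
    //    (the label selector app=app is an assumption; use your Deployment's selector)
    def podImages = sh(
        script: "kubectl get pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[0].image}'",
        returnStdout: true
    ).trim().split(/\s+/)
    if (podImages.every { it.contains(newVersion) }) {
        echo "Verified!"
    } else {
        error "Pods are still running an old image: ${podImages.join(', ')}"
    }
}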
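The image comparison itself can be tried outside Jenkins. Below is a small sketch that reads image references on stdin and fails if any of them does not end in the expected tag. The function name `images_match_tag` is hypothetical, the label selector in the usage comment is an assumption, and images pinned by digest instead of tag would need a different comparison.

```shell
# images_match_tag TAG
# Reads whitespace-separated image references from stdin.
# Succeeds only if at least one image was read and every one ends in ":TAG".
images_match_tag() {
    local tag="$1" image seen=0 stale=0
    for image in $(cat); do
        seen=$((seen + 1))
        case "$image" in
            *":$tag") ;;
            *)
                echo "stale image: $image"
                stale=$((stale + 1))
                ;;
        esac
    done
    [ "$seen" -gt 0 ] && [ "$stale" -eq 0 ]
}

# Example against a cluster (label selector app=app is an assumption):
#   kubectl get pods -l app=app \
#       -o jsonpath='{.items[*].status.containerStatuses[0].image}' \
#       | images_match_tag v2
```

Treating an empty pod list as a failure is a deliberate choice here: "no pods found" usually means the selector is wrong, not that the deployment is fine.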
🎯 New Jenkinsfile Verification
Stage 1: ArgoCD Sync Check
stage('Wait for ArgoCD Sync') {
// Checks:
// 1. ArgoCD sync status = "Synced"
// 2. Deployment SPEC image updated
//
// Does NOT check if pods rolled out!
// That's the next stage.
}
Output:
ArgoCD sync status: Synced
Deployment spec image: app:v2
✅ ArgoCD synced and deployment spec updated!
Note: Pods may still be rolling out - will verify in next stage
Stage 2: Wait for Rollout
stage('Wait for Deployment') {
// Uses kubectl rollout status
// Waits for actual pod rollout to complete
sh "kubectl rollout status deployment/app --timeout=5m"
}
What kubectl rollout status does:
- Watches deployment progress
- Waits for all new pods to be ready
- Returns when the rollout is complete
- Exits non-zero if the rollout is still stuck when --timeout expires
Output:
Waiting for deployment "app" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "app" rollout to finish: 1 old replicas are pending termination...
deployment "app" successfully rolled out
✅ Rollout completed successfully!
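The messages above follow from a few counters in the Deployment status. The sketch below roughly mirrors that decision order so the states are easier to reason about. It is an approximation for illustration, not kubectl's implementation, and `rollout_state` is a hypothetical name. The four inputs correspond to spec.replicas, status.updatedReplicas, status.replicas and status.availableReplicas.

```shell
# rollout_state DESIRED UPDATED TOTAL AVAILABLE
# Prints the current rollout state; succeeds only when the rollout is complete.
rollout_state() {
    local desired="$1" updated="$2" total="$3" available="$4"
    if [ "$updated" -lt "$desired" ]; then
        echo "waiting: $updated of $desired new replicas updated"
        return 1
    fi
    if [ "$total" -gt "$updated" ]; then
        echo "waiting: $((total - updated)) old replicas pending termination"
        return 1
    fi
    if [ "$available" -lt "$updated" ]; then
        echo "waiting: $available of $updated updated replicas available"
        return 1
    fi
    echo "rolled out"
}

# Example: 2 desired, 2 updated, 3 total (one old pod left), 2 available
#   rollout_state 2 2 3 2
```

Note that Kubernetes omits status counters that are zero, so a caller reading them with jsonpath should substitute 0 for empty values before passing them in.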
Stage 3: Verify Actual Pods
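stage('Verify Deployment') {
    // Sketch: assumes newTag is set earlier and pods carry the label app=app (adjust to your selector).
    def kget = { String args -> sh(script: "kubectl get ${args}", returnStdout: true).trim() }
    // CRITICAL CHECKS:
    // 1. Deployment status
    def desiredReplicas = kget("deployment app -o jsonpath='{.spec.replicas}'")
    def readyReplicas = kget("deployment app -o jsonpath='{.status.readyReplicas}'")
    if (readyReplicas != desiredReplicas) { error "Only ${readyReplicas}/${desiredReplicas} replicas ready" }
    // 2. Deployment spec image
    def deploymentImage = kget("deployment app -o jsonpath='{.spec.template.spec.containers[0].image}'")
    if (!deploymentImage.contains(newTag)) { error "Deployment spec image is ${deploymentImage}" }
    // 3. ACTUAL POD IMAGES (most important!)
    def podImages = kget("pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[0].image}'").split(/\s+/)
    for (podImage in podImages) {
        if (!podImage.contains(newTag)) { error "Pod running wrong image: ${podImage}" }
    }
    // 4. Pod health
    def phases = kget("pods -l app=app -o jsonpath='{.items[*].status.phase}'").split(/\s+/)
    if (phases.any { it != 'Running' }) { error "Not all pods are in Running state" }
    // 5. Restart count (any restart is treated as a possible crash loop; the threshold is an assumption)
    def restarts = kget("pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[0].restartCount}'").split(/\s+/)
    if (restarts.any { it.toInteger() > 0 }) { error "Containers have restarted: ${restarts.join(', ')}" }
}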
Output:
================================================
DEPLOYMENT VERIFICATION
================================================
1. Checking deployment status...
Desired replicas: 2
Updated replicas: 2
Ready replicas: 2
Available replicas: 2
✅ All pods ready
2. Checking deployment spec image...
Deployment spec image: app:v2
Expected tag: v2
✅ Deployment spec correct
3. Checking actual running pod images...
Running pod images:
- app:v2
- app:v2
✅ All pods running correct image
4. Checking pod readiness probes...
✅ All pods in Running state
5. Checking for container restarts...
Max restart count: 0
✅ Restart count acceptable
================================================
✅ ALL VERIFICATION CHECKS PASSED!
================================================
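Check 5 boils down to taking the largest restart count across containers and comparing it with a limit. A minimal sketch follows. Both function names are hypothetical, and the limit is an assumption, since the output above only says the count is "acceptable".

```shell
# max_restart_count
# Reads whitespace-separated restart counts from stdin, prints the largest (0 if none).
max_restart_count() {
    local count max=0
    for count in $(cat); do
        if [ "$count" -gt "$max" ]; then
            max=$count
        fi
    done
    echo "$max"
}

# restarts_acceptable LIMIT
# Succeeds if the largest restart count read from stdin is at most LIMIT.
restarts_acceptable() {
    local limit="$1" max
    max=$(max_restart_count)
    echo "Max restart count: $max"
    [ "$max" -le "$limit" ]
}

# Example against a cluster (label selector and limit are assumptions):
#   kubectl get pods -l app=app \
#       -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}' \
#       | restarts_acceptable 3
```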
🔥 What Happens If Check #3 Fails
3. Checking actual running pod images...
Running pod images:
- app:v1 ❌
- app:v1 ❌
❌ Pod running wrong image: app:v1
❌ FAILED: 2 pod(s) running old image!
This is the ArgoCD sync bug - deployment updated but pods not rolled out
Jenkins will:
- ❌ Mark build as failed
- 🔄 Trigger rollback (if enabled)
- 📱 Send notification with details
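The fail-then-rollback flow can be sketched as a small wrapper. How the real pipeline rolls back is not described here, so everything below is an assumption: `verify_or_rollback` is a hypothetical name, and the `kubectl rollout undo` wiring in the comment is only one option. With ArgoCD automated sync and self-heal enabled, a live `kubectl rollout undo` may be overwritten on the next sync, so reverting the commit in Git is often the more durable route.

```shell
# verify_or_rollback VERIFY_CMD ROLLBACK_CMD
# Each argument names a command or shell function that takes no arguments.
# Runs ROLLBACK_CMD only when VERIFY_CMD fails; fails whenever verification failed.
verify_or_rollback() {
    local verify="$1" rollback="$2"
    if "$verify"; then
        echo "verification passed"
        return 0
    fi
    echo "verification failed, rolling back"
    "$rollback" || echo "rollback command failed"
    return 1
}

# Hypothetical wiring:
#   verify()   { kubectl rollout status deployment/app --timeout=5m; }
#   rollback() { kubectl rollout undo deployment/app; }
#   verify_or_rollback verify rollback
```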
🧪 Testing the Fix
Test 1: Normal Deployment
# Update image in Git (edit deployment.yaml first; -a stages the change)
git commit -am "Update to v2"
git push
# Jenkins should:
# 1. Wait for ArgoCD sync ✅
# 2. Wait for rollout ✅
# 3. Verify pods have v2 ✅
# 4. Success! ✅
Test 2: Slow Rollout
# Set slow rollout
kubectl patch deployment app -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'
# Update image
git push
# Jenkins should:
# 1. ArgoCD syncs quickly ✅
# 2. Wait for slow rollout (may take 2-3 minutes) ⏳
# 3. Verify when complete ✅
Test 3: Rollout Stuck
# Create a broken image tag
# Update to image: app:nonexistent
git push
# Jenkins should:
# 1. ArgoCD syncs ✅
# 2. kubectl rollout status times out ❌
# 3. Rollback triggered ✅
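When a rollout is stuck on a bad tag, the container waiting reason usually says so well before the timeout expires. The sketch below filters waiting reasons for image pull problems so the failure message can be more specific than "timed out". `pull_failures` is a hypothetical helper and the label selector in the usage comment is an assumption.

```shell
# pull_failures
# Reads whitespace-separated container waiting reasons from stdin and prints
# the ones that indicate an image pull problem. Succeeds if any were found.
pull_failures() {
    local reason found=1
    for reason in $(cat); do
        case "$reason" in
            ErrImagePull|ImagePullBackOff|InvalidImageName)
                echo "$reason"
                found=0
                ;;
        esac
    done
    return "$found"
}

# Example against a cluster (label selector app=app is an assumption):
#   kubectl get pods -l app=app \
#       -o jsonpath='{.items[*].status.containerStatuses[*].state.waiting.reason}' \
#       | pull_failures
```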
📊 Comparison: Old vs New
Old Pipeline (Unreliable)
1. ArgoCD sync check
├─ Checks: ArgoCD status
├─ Checks: Deployment spec image
└─ Duration: ~30 seconds
⚠️ PROBLEM: Pods might not have rolled out!
2. Success! ✅ (but pods are still old!)
New Pipeline (Reliable)
1. ArgoCD sync check
├─ Checks: ArgoCD status
├─ Checks: Deployment spec image
└─ Duration: ~30 seconds
2. Rollout status check
├─ Checks: kubectl rollout status
├─ Waits: For actual pod rollout
└─ Duration: ~1-2 minutes
3. Verification
├─ Checks: Deployment status
├─ Checks: ACTUAL pod images ← KEY!
├─ Checks: Pod health
├─ Checks: Restart count
└─ Duration: ~10 seconds
4. Success! ✅ (pods verified running new version)
🎯 Key Takeaways
❌ Don't Trust:
- ArgoCD "Synced" status alone
- Deployment spec image alone
- Health status alone
✅ Always Verify:
- ArgoCD synced (manifest applied)
- Rollout completed (kubectl rollout status)
- Actual pod images (what's really running)
- Pod health (ready and not crashing)
💡 Remember:
ArgoCD "Synced" = Git matches Kubernetes manifest ✅
BUT
Kubernetes manifest != Running pods ⚠️
You MUST check actual pods!
🚀 Using the New Jenkinsfile
# 1. Update Jenkinsfile in your repo
cp Jenkinsfile.telegram.en apps/demo-nginx/Jenkinsfile
# 2. Commit and push
git add apps/demo-nginx/Jenkinsfile
git commit -m "fix: add proper deployment verification"
git push
# 3. Run build
# Jenkins will now properly verify deployments!
📱 Notifications
With the new verification, you'll see:
During deployment:
⏳ ArgoCD Syncing
Application: demo-nginx
Timeout: 120s
🚀 Deploying to Kubernetes
Deployment: demo-nginx
Image: main-42
Rolling out new pods...
On success:
✅ Deployment Successful!
Verified:
- ArgoCD synced ✅
- Rollout completed ✅
- Pods running v42 ✅
- All pods healthy ✅
On failure:
❌ Deployment Failed
Error: 2 pods running old image!
Rollback initiated...
This fix ensures Jenkins never reports success until pods are actually running the new version! ✅