🔧 ArgoCD Sync vs Actual Deployment Issue

🐛 The Problem

Symptom:

  • ArgoCD shows Synced
  • Deployment manifest in Kubernetes is updated
  • BUT pods are still running old image

Why This Happens:

┌─────────────────────────────────────────────────────────────┐
│ Git Repository                                              │
│   deployment.yaml: image: app:v2  ✅                        │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ ArgoCD syncs
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes API (Deployment Object)                         │
│   spec.template.spec.containers[0].image: app:v2  ✅        │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ Kubernetes Controller should trigger rollout
                   ▼
┌─────────────────────────────────────────────────────────────┐
│ Running Pods                                                │
│   Pod-1: image: app:v1  ❌ (OLD!)                           │
│   Pod-2: image: app:v1  ❌ (OLD!)                           │
└─────────────────────────────────────────────────────────────┘

ArgoCD says "Synced" because Git == Kubernetes manifest ✅
But pods haven't rolled out yet! ❌

🔍 Why ArgoCD Says "Synced"

ArgoCD checks:

  1. Sync status: Git manifest == Kubernetes Deployment object
  2. Health status: derived from the resource's status fields (for a Deployment it stays Progressing until the rollout finishes)

The "Synced" status alone DOES NOT tell you:

  • Are pods actually running?
  • What image are pods using?
  • Did rollout complete?

ArgoCD's job: Keep Kubernetes resources in sync with Git
NOT ArgoCD's job: Wait for pods to finish rolling out before reporting "Synced"
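
Sync status and health status are separate fields on the Argo CD Application resource, so a pipeline can read both instead of stopping at "Synced". A minimal sketch, assuming the Application is named demo-nginx and lives in the argocd namespace (both assumptions), and that kubectl is allowed to read Application resources. The helper names `app_state` and `is_settled` are hypothetical:

```shell
#!/bin/sh
# Print "<sync> <health>" for an Argo CD Application, e.g. "Synced Progressing".
# Assumption: the Application CRD is readable with kubectl in namespace $2.
app_state() {
    app="$1"; ns="${2:-argocd}"
    kubectl get applications.argoproj.io "$app" -n "$ns" \
        -o jsonpath='{.status.sync.status} {.status.health.status}'
}

# Succeed only when the app is both Synced and Healthy.
is_settled() {
    [ "$1" = "Synced" ] && [ "$2" = "Healthy" ]
}

# Usage (commented out so sourcing this file has no side effects):
# set -- $(app_state demo-nginx)
# is_settled "$1" "$2" && echo "settled" || echo "not yet: $*"
```

Even a Healthy result is computed from the Deployment's status fields, so it complements the pod-level checks in this document and does not replace them.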


⚠️ When This Happens

Scenario 1: Slow Rollout

14:00:00 - ArgoCD syncs deployment (v1 → v2)
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes starts rollout
14:00:30 - Pod-1 terminates (v1)
14:00:35 - Pod-3 starts (v2)
14:00:50 - Pod-2 terminates (v1)
14:00:55 - Pod-4 starts (v2)
14:01:00 - Rollout complete! ✅

Jenkins checks at 14:00:05: ArgoCD says "Synced"
But pods are still v1! ❌

Scenario 2: Image Pull Delay

14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes tries to start new pod
14:00:15 - Pulling image... (slow network)
14:00:45 - Image pulled
14:00:50 - Pod starts
14:01:00 - Pod ready

Jenkins checks at 14:00:05: "Synced" but no new pods yet!

Scenario 3: Resource Constraints

14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes: "No resources available"
14:00:20 - Kubernetes: "Waiting for node capacity..."
14:01:00 - Old pod terminates, resources freed
14:01:10 - New pod starts

Jenkins checks at 14:00:05: "Synced" but can't schedule pods!
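
All three timelines share one cause: a single check at 14:00:05 is too early. One way to handle that is to repeat a check until it passes or a deadline expires. The helper below is a generic sketch and not taken from the Jenkinsfile; the name `wait_until` is hypothetical:

```shell
#!/bin/sh
# Retry a command until it succeeds or the deadline passes.
# usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the deadline is reached first.
wait_until() {
    _timeout="$1"; _interval="$2"; shift 2
    _deadline=$(( $(date +%s) + _timeout ))
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$(date +%s)" -ge "$_deadline" ]; then
            return 1
        fi
        sleep "$_interval"
    done
}

# Usage (commented out; my_check stands for any command that exits 0 on success):
# wait_until 120 5 my_check || echo "gave up after 120s"
```

The command is always tried at least once, so a timeout of 0 means "check exactly once".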

The Solution

What Jenkins Must Check:

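// ❌ BAD - Only checks ArgoCD
if (argocdStatus == 'Synced') {
    echo "Done!"
}

// ✅ GOOD - Checks ArgoCD + Kubernetes
if (argocdStatus == 'Synced') {
    // 1. Wait for rollout (the step fails if the timeout is reached)
    sh "kubectl rollout status deployment/app --timeout=5m"

    // 2. Verify actual pod images
    //    (.items[*] is required because "get pods" returns a list;
    //     the label selector app=app is a placeholder)
    def podImages = sh(
        script: "kubectl get pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[0].image}'",
        returnStdout: true
    ).trim().split(/\s+/)
    for (podImage in podImages) {
        if (!podImage.contains(newVersion)) {
            error "Pod still running ${podImage}"
        }
    }
    echo "Verified!"
}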

🎯 New Jenkinsfile Verification

Stage 1: ArgoCD Sync Check

stage('Wait for ArgoCD Sync') {
    // Checks:
    // 1. ArgoCD sync status = "Synced"
    // 2. Deployment SPEC image updated
    // 
    // Does NOT check if pods rolled out!
    // That's the next stage.
}

Output:

ArgoCD sync status: Synced
Deployment spec image: app:v2
✅ ArgoCD synced and deployment spec updated!
Note: Pods may still be rolling out - will verify in next stage

Stage 2: Wait for Rollout

stage('Wait for Deployment') {
    // Uses kubectl rollout status
    // Waits for actual pod rollout to complete
    sh "kubectl rollout status deployment/app --timeout=5m"
}

What kubectl rollout status does:

  • Watches deployment progress
  • Waits for all new pods to be ready
  • Exits with status 0 once the rollout is complete
  • Exits non-zero if --timeout elapses or the rollout exceeds its progress deadline, which fails the sh step

Output:

Waiting for deployment "app" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "app" rollout to finish: 1 old replicas are pending termination...
deployment "app" successfully rolled out
✅ Rollout completed successfully!
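
The progress messages above map onto a few replica counters in the Deployment object. The sketch below classifies a rollout from those counters. It is a simplified approximation, not kubectl's actual implementation: it ignores observedGeneration and the progress deadline. The function name `rollout_state` is hypothetical:

```shell
#!/bin/sh
# Classify a Deployment rollout from its replica counters.
# usage: rollout_state <spec.replicas> <status.updatedReplicas> \
#                      <status.replicas> <status.availableReplicas>
# Missing or empty counters are treated as 0.
rollout_state() {
    desired="$1"; updated="${2:-0}"; total="${3:-0}"; available="${4:-0}"
    if [ "$updated" -lt "$desired" ]; then
        echo "waiting: $updated out of $desired new replicas have been updated"
    elif [ "$total" -gt "$updated" ]; then
        echo "waiting: $((total - updated)) old replicas are pending termination"
    elif [ "$available" -lt "$updated" ]; then
        echo "waiting: $available of $updated updated replicas are available"
    else
        echo "complete"
    fi
}
```

For example, `rollout_state 2 1 2 2` reports that one of two new replicas has been updated. Each counter can be read with a query such as `kubectl get deployment app -o jsonpath='{.status.updatedReplicas}'`.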

Stage 3: Verify Actual Pods

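stage('Verify Deployment') {
    // CRITICAL CHECKS
    // Sketch only: deployment "app", label app=app, newTag and the restart limit of 3 are placeholders.
    def kubectlGet = { String args -> sh(script: "kubectl get ${args}", returnStdout: true).trim() }

    // 1. Deployment status
    def desiredReplicas = kubectlGet("deployment/app -o jsonpath='{.spec.replicas}'")
    def readyReplicas = kubectlGet("deployment/app -o jsonpath='{.status.readyReplicas}'")
    if (readyReplicas != desiredReplicas) { error "Only ${readyReplicas}/${desiredReplicas} replicas ready" }

    // 2. Deployment spec image
    def deploymentImage = kubectlGet("deployment/app -o jsonpath='{.spec.template.spec.containers[0].image}'")
    if (!deploymentImage.contains(newTag)) { error "Deployment spec image is ${deploymentImage}" }

    // 3. ACTUAL POD IMAGES (most important!)
    def podImages = kubectlGet("pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[0].image}'").split(/\s+/)
    for (podImage in podImages) {
        if (!podImage.contains(newTag)) { error "Pod running wrong image: ${podImage}" }
    }

    // 4. Pod health
    def phases = kubectlGet("pods -l app=app -o jsonpath='{.items[*].status.phase}'").split(/\s+/)
    for (phase in phases) {
        if (phase != 'Running') { error "Pod not in Running state: ${phase}" }
    }

    // 5. Restart count (check for crash loops)
    def restarts = kubectlGet("pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[0].restartCount}'").split(/\s+/)
    def maxRestarts = 0
    for (r in restarts) {
        if (r) { maxRestarts = Math.max(maxRestarts, r.toInteger()) }
    }
    if (maxRestarts > 3) { error "Max restart count ${maxRestarts} suggests a crash loop" }
}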

Output:

================================================
DEPLOYMENT VERIFICATION
================================================

1. Checking deployment status...
   Desired replicas: 2
   Updated replicas: 2
   Ready replicas: 2
   Available replicas: 2
   ✅ All pods ready

2. Checking deployment spec image...
   Deployment spec image: app:v2
   Expected tag: v2
   ✅ Deployment spec correct

3. Checking actual running pod images...
   Running pod images:
      - app:v2
      - app:v2
   ✅ All pods running correct image

4. Checking pod readiness probes...
   ✅ All pods in Running state

5. Checking for container restarts...
   Max restart count: 0
   ✅ Restart count acceptable

================================================
✅ ALL VERIFICATION CHECKS PASSED!
================================================
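
Checks 3 and 5 reduce to simple text processing over kubectl output. The sketch below shows one way to do that in plain shell. It assumes the pods carry the label app=demo-nginx and that the first container is the application container (both assumptions). The function names are hypothetical and this is not the code from the Jenkinsfile:

```shell
#!/bin/sh
# Count images that do not end in ":<tag>". Reads one image per line on stdin.
count_stale_images() {
    tag="$1"; stale=0
    while IFS= read -r image; do
        [ -z "$image" ] && continue
        case "$image" in
            *":$tag") ;;                      # image carries the expected tag
            *) stale=$((stale + 1)) ;;        # anything else counts as stale
        esac
    done
    echo "$stale"
}

# Print the largest restart count. Reads one integer per line on stdin.
max_restarts() {
    max=0
    while IFS= read -r count; do
        [ -z "$count" ] && continue
        [ "$count" -gt "$max" ] && max="$count"
    done
    echo "$max"
}

# Usage (commented out; requires cluster access):
# kubectl get pods -l app=demo-nginx \
#   -o jsonpath='{range .items[*]}{.status.containerStatuses[0].image}{"\n"}{end}' \
#   | count_stale_images v2
# kubectl get pods -l app=demo-nginx \
#   -o jsonpath='{range .items[*]}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
#   | max_restarts
```

Matching on the `:<tag>` suffix is stricter than a substring match, so a pod on `app:v22` is not mistaken for one on `app:v2`.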

🔥 What Happens If Check #3 Fails

3. Checking actual running pod images...
   Running pod images:
      - app:v1  ❌
      - app:v1  ❌
   ❌ Pod running wrong image: app:v1
   ❌ FAILED: 2 pod(s) running old image!
   This is the ArgoCD sync bug - deployment updated but pods not rolled out

Jenkins will:

  1. Mark build as failed
  2. 🔄 Trigger rollback (if enabled)
  3. 📱 Send notification with details

🧪 Testing the Fix

Test 1: Normal Deployment

# Update image in Git (edit the manifest first; -a stages the change)
git commit -am "Update to v2"
git push

# Jenkins should:
# 1. Wait for ArgoCD sync ✅
# 2. Wait for rollout ✅
# 3. Verify pods have v2 ✅
# 4. Success! ✅

Test 2: Slow Rollout

# Set slow rollout
kubectl patch deployment app -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'

# Update image
git push

# Jenkins should:
# 1. ArgoCD syncs quickly ✅
# 2. Wait for slow rollout (may take 2-3 minutes) ⏳
# 3. Verify when complete ✅

Test 3: Rollout Stuck

# Create a broken image tag
# Update to image: app:nonexistent

git push

# Jenkins should:
# 1. ArgoCD syncs ✅
# 2. kubectl rollout status times out ❌
# 3. Rollback triggered ✅
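
When a rollout is stuck on a missing tag, the pods usually report a waiting reason that names the cause, so the pipeline can say why it timed out. A small sketch that looks for two common image pull reasons. The function name `has_pull_error` and the label app=demo-nginx are assumptions:

```shell
#!/bin/sh
# Succeed if any whitespace-separated waiting reason on stdin is an
# image pull failure (ErrImagePull or ImagePullBackOff).
has_pull_error() {
    for reason in $(cat); do
        case "$reason" in
            ErrImagePull|ImagePullBackOff) return 0 ;;
        esac
    done
    return 1
}

# Usage (commented out; requires cluster access):
# kubectl get pods -l app=demo-nginx \
#   -o jsonpath='{.items[*].status.containerStatuses[*].state.waiting.reason}' \
#   | has_pull_error && echo "new image cannot be pulled"
```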

📊 Comparison: Old vs New

Old Pipeline (Unreliable)

1. ArgoCD sync check
   ├─ Checks: ArgoCD status
   ├─ Checks: Deployment spec image
   └─ Duration: ~30 seconds
   
   ⚠️ PROBLEM: Pods might not have rolled out!
   
2. Success! ✅ (but pods are still old!)

New Pipeline (Reliable)

1. ArgoCD sync check
   ├─ Checks: ArgoCD status
   ├─ Checks: Deployment spec image
   └─ Duration: ~30 seconds
   
2. Rollout status check
   ├─ Checks: kubectl rollout status
   ├─ Waits: For actual pod rollout
   └─ Duration: ~1-2 minutes
   
3. Verification
   ├─ Checks: Deployment status
   ├─ Checks: ACTUAL pod images ← KEY!
   ├─ Checks: Pod health
   ├─ Checks: Restart count
   └─ Duration: ~10 seconds
   
4. Success! ✅ (pods verified running new version)

🎯 Key Takeaways

Don't Trust:

  • ArgoCD "Synced" status alone
  • Deployment spec image alone
  • Health status alone

Always Verify:

  1. ArgoCD synced (manifest applied)
  2. Rollout completed (kubectl rollout status)
  3. Actual pod images (what's really running)
  4. Pod health (ready and not crashing)

💡 Remember:

ArgoCD "Synced" = Git matches Kubernetes manifest ✅
BUT
Kubernetes manifest != Running pods ⚠️

You MUST check actual pods!


🚀 Using the New Jenkinsfile

# 1. Update Jenkinsfile in your repo
cp Jenkinsfile.telegram.en apps/demo-nginx/Jenkinsfile

# 2. Commit and push
git add apps/demo-nginx/Jenkinsfile
git commit -m "fix: add proper deployment verification"
git push

# 3. Run build
# Jenkins will now properly verify deployments!

📱 Notifications

With the new verification, you'll see:

During deployment:

⏳ ArgoCD Syncing
Application: demo-nginx
Timeout: 120s

🚀 Deploying to Kubernetes
Deployment: demo-nginx
Image: main-42
Rolling out new pods...

On success:

✅ Deployment Successful!

Verified:
- ArgoCD synced ✅
- Rollout completed ✅  
- Pods running v42 ✅
- All pods healthy ✅

On failure:

❌ Deployment Failed

Error: 2 pods running old image!

Rollback initiated...

This fix ensures Jenkins never reports success until pods are actually running the new version!