# 🔧 ArgoCD Sync vs Actual Deployment Issue

## 🐛 The Problem

**Symptom:**

- ArgoCD shows `Synced` ✅
- Deployment manifest in Kubernetes is updated ✅
- **BUT** the pods are still running the old image ❌

**Why This Happens:**

```
┌────────────────────────────────────────────────────────┐
│ Git Repository                                         │
│ deployment.yaml: image: app:v2 ✅                      │
└──────────────────┬─────────────────────────────────────┘
                   │
                   │ ArgoCD syncs
                   ▼
┌────────────────────────────────────────────────────────┐
│ Kubernetes API (Deployment Object)                     │
│ spec.template.spec.containers[0].image: app:v2 ✅      │
└──────────────────┬─────────────────────────────────────┘
                   │
                   │ Deployment controller triggers the rollout (takes time)
                   ▼
┌────────────────────────────────────────────────────────┐
│ Running Pods                                           │
│ Pod-1: image: app:v1 ❌ (OLD!)                         │
│ Pod-2: image: app:v1 ❌ (OLD!)                         │
└────────────────────────────────────────────────────────┘

ArgoCD says "Synced" because Git == Kubernetes manifest ✅
But pods haven't rolled out yet! ❌
```

---

## 🔍 Why ArgoCD Says "Synced"

ArgoCD checks:

1. ✅ **Sync status:** Git manifest == Kubernetes Deployment object
2. ✅ **Health status:** derived from the Deployment's `status` fields

The `Synced` status **DOES NOT** tell you:

- ❌ Are the pods actually running?
- ❌ Which image are the pods using?
- ❌ Has the rollout completed?

**ArgoCD's job:** Keep Kubernetes resources in sync with Git

**NOT ArgoCD's job:** Wait for pods to finish rolling out

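To see the two statuses side by side, you can read them straight from the ArgoCD `Application` resource. Below is a minimal Bash sketch, assuming the Application is named `demo-nginx` in the `argocd` namespace (the default install namespace) and that your kubeconfig can read Application objects; the function names are hypothetical. Even when both checks pass, neither field says which image the pods are actually running.

```bash
#!/usr/bin/env bash
# Sketch only: app name "demo-nginx" and namespace "argocd" are assumptions.

# Prints "<sync status> <health status>", e.g. "Synced Progressing".
argocd_app_state() {
  local app="${1:-demo-nginx}" ns="${2:-argocd}"
  kubectl get application "$app" -n "$ns" \
    -o jsonpath='{.status.sync.status}{" "}{.status.health.status}'
}

# Succeeds only if the live manifests match Git AND ArgoCD reports Healthy.
# Neither field reports which image the running pods actually use.
argocd_says_done() {
  local sync="$1" health="$2"
  [[ "$sync" == "Synced" && "$health" == "Healthy" ]]
}

# Usage:
#   read -r sync health < <(argocd_app_state demo-nginx argocd)
#   argocd_says_done "$sync" "$health" && echo "ArgoCD is done; pods still unverified"
```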
---

## ⚠️ When This Happens

### Scenario 1: Slow Rollout

```
14:00:00 - ArgoCD syncs deployment (v1 → v2)
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes starts rollout
14:00:30 - Pod-1 terminates (v1)
14:00:35 - Pod-3 starts (v2)
14:00:50 - Pod-2 terminates (v1)
14:00:55 - Pod-4 starts (v2)
14:01:00 - Rollout complete! ✅

Jenkins checks at 14:00:05: ArgoCD says "Synced"
But pods are still v1! ❌
```

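A rollout like this is only finished once the Deployment's status counters catch up with its spec. The Bash sketch below roughly mirrors the conditions `kubectl rollout status` waits for; the Deployment name `app` and the helper names are assumptions. Counters that are zero are omitted from the object, so the jsonpath template separates fields with commas to keep empty fields in place.

```bash
#!/usr/bin/env bash
# Sketch only: Deployment name "app" and helper names are hypothetical.

# Prints "generation,observedGeneration,spec.replicas,replicas,updated,available".
deployment_counters() {
  kubectl get deployment "${1:-app}" -o jsonpath='{.metadata.generation},{.status.observedGeneration},{.spec.replicas},{.status.replicas},{.status.updatedReplicas},{.status.availableReplicas}'
}

# Args: generation observedGeneration desired total updated available.
# Empty counters are treated as 0; an empty spec.replicas as 1.
rollout_complete() {
  local gen=${1:-0} observed=${2:-0} desired=${3:-1}
  local total=${4:-0} updated=${5:-0} available=${6:-0}
  (( observed >= gen ))    || return 1  # controller has not seen the new spec yet
  (( updated >= desired )) || return 1  # some replicas still use the old template
  (( total <= updated ))   || return 1  # old replicas still pending termination
  (( available >= updated ))            # every updated replica is available
}

# Usage:
#   IFS=, read -r gen obs des tot upd avail < <(deployment_counters app)
#   rollout_complete "$gen" "$obs" "$des" "$tot" "$upd" "$avail" && echo "rolled out"
```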
### Scenario 2: Image Pull Delay

```
14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes tries to start new pod
14:00:15 - Pulling image... (slow network)
14:00:45 - Image pulled
14:00:50 - Pod starts
14:01:00 - Pod ready

Jenkins checks at 14:00:05: "Synced" but no new pods yet!
```

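While an image is being pulled, the new pod's container typically sits in a `waiting` state, so the container statuses show what is holding the rollout up. A Bash sketch, assuming the pods carry the label `app=app` (hypothetical, as are the function names); `ContainerCreating`, `ErrImagePull`, `ImagePullBackOff` and `CrashLoopBackOff` are standard waiting reasons reported by the kubelet.

```bash
#!/usr/bin/env bash
# Sketch only: the label selector "app=app" is an assumption.

# Prints "<pod> <waiting reason>" per pod; the reason is empty when nothing waits.
waiting_reasons() {
  kubectl get pods -l "${1:-app=app}" -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
}

# Maps a waiting reason to a short explanation.
explain_wait() {
  case "$1" in
    "")                            echo "not waiting" ;;
    ContainerCreating)             echo "starting (image may still be pulling)" ;;
    ErrImagePull|ImagePullBackOff) echo "image pull failing" ;;
    CrashLoopBackOff)              echo "container keeps crashing" ;;
    *)                             echo "waiting: $1" ;;
  esac
}

# Usage:
#   waiting_reasons app=app | while read -r pod reason; do
#     echo "$pod: $(explain_wait "$reason")"
#   done
```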
### Scenario 3: Resource Constraints

```
14:00:00 - ArgoCD syncs
14:00:05 - ArgoCD: "Synced!" ✅
14:00:10 - Kubernetes: "No resources available"
14:00:20 - Kubernetes: "Waiting for node capacity..."
14:01:00 - Old pod terminates, resources freed
14:01:10 - New pod starts

Jenkins checks at 14:00:05: "Synced", but the new pods can't even be scheduled yet!
```

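Pods that cannot be placed stay in the `Pending` phase, and the scheduler sets their `PodScheduled` condition to `False` with reason `Unschedulable`. A Bash sketch that lists them; the label selector `app=app` and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: the label selector "app=app" is an assumption.

# Prints "<pod> <PodScheduled reason>" for every Pending pod.
pending_pods() {
  kubectl get pods -l "${1:-app=app}" --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="PodScheduled")].reason}{"\n"}{end}'
}

# Reads "<pod> <reason>" lines and prints the pods the scheduler could not place.
unschedulable_only() {
  local pod reason
  while read -r pod reason; do
    if [[ "$reason" == "Unschedulable" ]]; then
      echo "$pod"
    fi
  done
}

# Usage:
#   pending_pods app=app | unschedulable_only
#   kubectl describe pod <name>   # the Events section explains why, e.g. insufficient cpu
```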
---

## ✅ The Solution

### What Jenkins Must Check:

```groovy
// ❌ BAD - Only checks ArgoCD
if (argocdStatus == 'Synced') {
    echo "Done!"
}

// ✅ GOOD - Checks ArgoCD + Kubernetes
if (argocdStatus == 'Synced') {
    // 1. Wait for rollout (the sh step fails the build on timeout)
    sh "kubectl rollout status deployment/app --timeout=5m"

    // 2. Verify actual pod images (assumes the pods are labelled app=app)
    def podImages = sh(
        script: "kubectl get pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[*].image}'",
        returnStdout: true
    ).trim().tokenize()
    def stale = podImages.findAll { !it.endsWith(":${newVersion}") }
    if (podImages && !stale) {
        echo "Verified!"
    } else {
        error "Pods not running ${newVersion}: ${podImages}"
    }
}
```

---

## 🎯 New Jenkinsfile Verification

### Stage 1: ArgoCD Sync Check

```groovy
stage('Wait for ArgoCD Sync') {
    // Checks:
    // 1. ArgoCD sync status = "Synced"
    // 2. Deployment SPEC image updated
    //
    // Does NOT check if pods rolled out!
    // That's the next stage.
}
```

**Output:**

```
ArgoCD sync status: Synced
Deployment spec image: app:v2
✅ ArgoCD synced and deployment spec updated!
Note: Pods may still be rolling out - will verify in next stage
```

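One way to implement this stage is to poll both conditions until they hold or a timeout expires. A Bash sketch, assuming the Application `demo-nginx` in namespace `argocd`, a Deployment named `app`, and hypothetical helper names. Checking the spec image matters because ArgoCD only notices a new Git commit when it next refreshes (by default roughly every three minutes unless a Git webhook is configured), so it can briefly report `Synced` against the previous revision.

```bash
#!/usr/bin/env bash
# Sketch only: names "demo-nginx", "argocd" and "app" are assumptions.

# Re-runs a command until it succeeds or TIMEOUT seconds have passed.
# Usage: poll_until TIMEOUT INTERVAL command [args...]
poll_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# True if IMAGE ends with ":TAG" (so tag v2 does not match v20).
image_has_tag() {
  [[ "$1" == *":$2" ]]
}

# True once ArgoCD reports Synced AND the Deployment spec carries the new tag.
synced_with_tag() {
  local app=$1 deploy=$2 tag=$3 sync image
  sync=$(kubectl get application "$app" -n argocd -o jsonpath='{.status.sync.status}')
  image=$(kubectl get deployment "$deploy" -o jsonpath='{.spec.template.spec.containers[0].image}')
  [[ "$sync" == "Synced" ]] && image_has_tag "$image" "$tag"
}

# Usage (120 s timeout, check every 5 s):
#   poll_until 120 5 synced_with_tag demo-nginx app v2 || echo "sync timed out"
```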
### Stage 2: Wait for Rollout

```groovy
stage('Wait for Deployment') {
    // Uses kubectl rollout status:
    // waits for the actual pod rollout to complete (fails the stage after 5 minutes)
    sh "kubectl rollout status deployment/app --timeout=5m"
}
```

**What `kubectl rollout status` does:**

- Watches the Deployment's rollout progress
- Waits until every replica runs the new pod template and is available
- Exits with status 0 once the rollout is complete
- Exits non-zero if `--timeout` elapses or the Deployment exceeds its `progressDeadlineSeconds`, which fails the `sh` step

**Output:**

```
Waiting for deployment "app" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "app" rollout to finish: 1 old replicas are pending termination...
deployment "app" successfully rolled out
✅ Rollout completed successfully!
```

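`kubectl rollout status` also stops early when the Deployment itself gives up: after `spec.progressDeadlineSeconds` (600 seconds by default) without progress, the Deployment controller sets the `Progressing` condition to `False` with reason `ProgressDeadlineExceeded`. A Bash sketch for reading that condition directly, assuming a Deployment named `app` (the helper names are hypothetical):

```bash
#!/usr/bin/env bash
# Sketch only: Deployment name "app" and helper names are hypothetical.

# Prints "<status> <reason>" of the Deployment's Progressing condition,
# e.g. "True NewReplicaSetAvailable" or "False ProgressDeadlineExceeded".
progressing_condition() {
  kubectl get deployment "${1:-app}" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}{" "}{.status.conditions[?(@.type=="Progressing")].reason}'
}

# Succeeds if the condition says the rollout has stalled.
rollout_stalled() {
  [[ "$1" == "False" && "$2" == "ProgressDeadlineExceeded" ]]
}

# Usage:
#   read -r status reason < <(progressing_condition app)
#   rollout_stalled "$status" "$reason" && echo "rollout stuck; consider rolling back"
```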
### Stage 3: Verify Actual Pods

```groovy
stage('Verify Deployment') {
    // CRITICAL CHECKS. Assumed names: Deployment "app", pods labelled
    // app=app, and newTag holding the image tag that was just built.
    def kget = { String args ->
        sh(script: "kubectl get ${args}", returnStdout: true).trim()
    }

    // 1. Deployment status: readyReplicas == desiredReplicas
    def desiredReplicas = kget("deployment app -o jsonpath='{.spec.replicas}'")
    def readyReplicas = kget("deployment app -o jsonpath='{.status.readyReplicas}'") ?: '0'
    if (readyReplicas != desiredReplicas) { error "Only ${readyReplicas}/${desiredReplicas} replicas ready" }

    // 2. Deployment spec image
    def deploymentImage = kget("deployment app -o jsonpath='{.spec.template.spec.containers[0].image}'")
    if (!deploymentImage.endsWith(":${newTag}")) { error "Deployment spec image is ${deploymentImage}" }

    // 3. ACTUAL POD IMAGES (most important!): every container must run newTag
    def podImages = kget("pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[*].image}'").tokenize()
    def stale = podImages.findAll { !it.endsWith(":${newTag}") }
    if (!podImages) { error "No pods found" }
    if (stale) { error "FAILED: ${stale.size()} container(s) running old image: ${stale}" }

    // 4. Pod health: all pods in Running state
    def phases = kget("pods -l app=app -o jsonpath='{.items[*].status.phase}'").tokenize()
    if (phases.any { it != 'Running' }) { error "Pod phases: ${phases}" }

    // 5. Restart count: treat any restart as a possible crash loop (threshold is an assumption)
    def restarts = kget("pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}'")
        .tokenize().collect { it as Integer }
    if (restarts && restarts.max() > 0) { error "Max restart count: ${restarts.max()}" }
}
```

**Output:**

```
================================================
DEPLOYMENT VERIFICATION
================================================

1. Checking deployment status...
   Desired replicas: 2
   Updated replicas: 2
   Ready replicas: 2
   Available replicas: 2
   ✅ All pods ready

2. Checking deployment spec image...
   Deployment spec image: app:v2
   Expected tag: v2
   ✅ Deployment spec correct

3. Checking actual running pod images...
   Running pod images:
     - app:v2
     - app:v2
   ✅ All pods running correct image

4. Checking pod readiness probes...
   ✅ All pods in Running state

5. Checking for container restarts...
   Max restart count: 0
   ✅ Restart count acceptable

================================================
✅ ALL VERIFICATION CHECKS PASSED!
================================================
```

---

## 🔥 What Happens If Check #3 Fails

```
3. Checking actual running pod images...
   Running pod images:
     - app:v1 ❌
     - app:v1 ❌
   ❌ Pod running wrong image: app:v1
   ❌ FAILED: 2 pod(s) running old image!
   This is the ArgoCD sync bug - deployment updated but pods not rolled out
```

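Check #3 boils down to listing every container image the pods report and flagging the ones without the expected tag. A Bash sketch, assuming the pods carry the label `app=app` (hypothetical, as are the function names) and that images are referenced by tag rather than by digest. It matches `:<tag>` at the end of the image so that `v2` does not also match `v20`. Right after a rollout, old pods that are still terminating can briefly report the old image, so retrying for a few seconds before failing may be worthwhile.

```bash
#!/usr/bin/env bash
# Sketch only: the label selector "app=app" is an assumption.

# Prints one image per line for every container in every matching pod.
pod_images() {
  kubectl get pods -l "${1:-app=app}" \
    -o jsonpath='{.items[*].status.containerStatuses[*].image}' | tr -s ' ' '\n'
}

# Reads images on stdin (last line may lack a newline); prints those not ending in ":<tag>".
stale_images() {
  local tag=$1 image
  while IFS= read -r image || [[ -n "$image" ]]; do
    if [[ -n "$image" && "$image" != *":$tag" ]]; then
      echo "$image"
    fi
  done
}

# Fails with a summary if no pods are found or any container runs another tag.
verify_pod_images() {
  local selector=$1 tag=$2 images stale
  images=$(pod_images "$selector")
  if [[ -z "$images" ]]; then
    echo "❌ FAILED: no pods found for selector $selector" >&2
    return 1
  fi
  stale=$(stale_images "$tag" <<< "$images")
  if [[ -n "$stale" ]]; then
    echo "❌ FAILED: $(grep -c . <<< "$stale") container(s) running old image:" >&2
    echo "$stale" >&2
    return 1
  fi
  echo "✅ All pods running correct image"
}

# Usage:
#   verify_pod_images app=app v2 || exit 1
```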
**Jenkins will:**

1. ❌ Mark build as failed
2. 🔄 Trigger rollback (if enabled)
3. 📱 Send notification with details

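With ArgoCD in the loop, how the rollback is done matters: an in-cluster `kubectl rollout undo` moves the live image away from Git, so ArgoCD reports `OutOfSync` and, if self-heal is enabled, re-applies the broken version. `argocd app rollback` refuses to run while automated sync is enabled, so for auto-synced apps the usual GitOps route is to revert the commit in Git. A Bash sketch of that decision, assuming the Application `demo-nginx` in namespace `argocd` and that the bad image bump is the latest commit; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: app "demo-nginx" / namespace "argocd" are assumptions.

# Prints the Application's automated sync policy (empty if auto-sync is off).
auto_sync_policy() {
  kubectl get application "${1:-demo-nginx}" -n "${2:-argocd}" \
    -o jsonpath='{.spec.syncPolicy.automated}'
}

# Chooses a rollback approach from that policy.
rollback_strategy() {
  if [[ -n "$1" ]]; then
    echo "git-revert"        # auto-sync on: revert the commit, let ArgoCD sync it
  else
    echo "argocd-rollback"   # auto-sync off: argocd app rollback is allowed
  fi
}

# Usage (assumes the failing image bump is HEAD):
#   case "$(rollback_strategy "$(auto_sync_policy)")" in
#     git-revert)      git revert --no-edit HEAD && git push ;;
#     argocd-rollback) argocd app rollback demo-nginx ;;
#   esac
```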
---

## 🧪 Testing the Fix

### Test 1: Normal Deployment

```bash
# Update the image tag in the manifest, then commit and push
git commit -am "Update to v2"
git push

# Jenkins should:
# 1. Wait for ArgoCD sync ✅
# 2. Wait for rollout ✅
# 3. Verify pods have v2 ✅
# 4. Success! ✅
```

### Test 2: Slow Rollout

```bash
# Roll out one pod at a time: no old pod is removed until its replacement is ready.
# (Prefer making this change in Git; ArgoCD may flag or revert manual patches.)
kubectl patch deployment app -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'

# Commit an image update, then push
git push

# Jenkins should:
# 1. ArgoCD syncs quickly ✅
# 2. Wait for slow rollout (may take 2-3 minutes) ⏳
# 3. Verify when complete ✅
```

### Test 3: Rollout Stuck

```bash
# Point the Deployment at a tag that does not exist, e.g. image: app:nonexistent
git commit -am "Test: broken image tag"
git push

# Jenkins should:
# 1. ArgoCD syncs ✅
# 2. kubectl rollout status times out ❌ (new pods stuck in ErrImagePull / ImagePullBackOff)
# 3. Rollback triggered ✅
```

---

## 📊 Comparison: Old vs New

### Old Pipeline (Unreliable)

```
1. ArgoCD sync check
   ├─ Checks: ArgoCD status
   ├─ Checks: Deployment spec image
   └─ Duration: ~30 seconds

   ⚠️ PROBLEM: Pods might not have rolled out!

2. Success! ✅ (even if pods are still old!)
```

### New Pipeline (Reliable)

```
1. ArgoCD sync check
   ├─ Checks: ArgoCD status
   ├─ Checks: Deployment spec image
   └─ Duration: ~30 seconds

2. Rollout status check
   ├─ Checks: kubectl rollout status
   ├─ Waits: For actual pod rollout
   └─ Duration: ~1-2 minutes

3. Verification
   ├─ Checks: Deployment status
   ├─ Checks: ACTUAL pod images ← KEY!
   ├─ Checks: Pod health
   ├─ Checks: Restart count
   └─ Duration: ~10 seconds

4. Success! ✅ (pods verified running new version)
```

---

## 🎯 Key Takeaways

### ❌ Don't Trust:

- ArgoCD "Synced" status alone
- Deployment spec image alone
- Health status alone

### ✅ Always Verify:

1. **ArgoCD synced** (manifest applied)
2. **Rollout completed** (`kubectl rollout status`)
3. **Actual pod images** (what's really running)
4. **Pod health** (ready and not crashing)

### 💡 Remember:

```
ArgoCD "Synced" = Git matches Kubernetes manifest ✅
BUT
Kubernetes manifest != Running pods ⚠️

You MUST check actual pods!
```

---

## 🔗 Related Issues

- [ArgoCD #2723](https://github.com/argoproj/argo-cd/issues/2723) - "Synced but pods not updated"
- [Kubernetes #93033](https://github.com/kubernetes/kubernetes/issues/93033) - "Deployment rollout delays"

---

## 🚀 Using the New Jenkinsfile

```bash
# 1. Update Jenkinsfile in your repo
cp Jenkinsfile.telegram.en apps/demo-nginx/Jenkinsfile

# 2. Commit and push
git add apps/demo-nginx/Jenkinsfile
git commit -m "fix: add proper deployment verification"
git push

# 3. Run build
# Jenkins will now properly verify deployments!
```

---

## 📱 Notifications

With the new verification, you'll see:

**During deployment:**

```
⏳ ArgoCD Syncing
Application: demo-nginx
Timeout: 120s

🚀 Deploying to Kubernetes
Deployment: demo-nginx
Image: main-42
Rolling out new pods...
```

**On success:**

```
✅ Deployment Successful!

Verified:
- ArgoCD synced ✅
- Rollout completed ✅
- Pods running v42 ✅
- All pods healthy ✅
```

**On failure:**

```
❌ Deployment Failed

Error: 2 pods running old image!

Rollback initiated...
```

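The Jenkinsfile referenced above (`Jenkinsfile.telegram.en`) suggests these messages go to Telegram. To send a similar message from a shell step, the Telegram Bot API's `sendMessage` method takes a `chat_id` and a `text`. A Bash sketch, assuming hypothetical `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` environment variables; the message layout only imitates the examples above and is not the pipeline's actual format.

```bash
#!/usr/bin/env bash
# Sketch only: TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID are hypothetical variable names.

# Builds the message text for a finished deployment.
format_result() {
  local result=$1 image=$2 detail=$3
  if [[ "$result" == "success" ]]; then
    printf '✅ Deployment Successful!\n\nImage: %s\n' "$image"
  else
    printf '❌ Deployment Failed\n\nImage: %s\nError: %s\n' "$image" "$detail"
  fi
}

# Sends TEXT through the Telegram Bot API sendMessage method.
send_telegram() {
  curl -fsS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$1" > /dev/null
}

# Usage:
#   send_telegram "$(format_result failure main-42 '2 pods running old image!')"
```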
---

**This fix ensures Jenkins never reports success until pods are actually running the new version!** ✅