Operational Runbooks#
Audience: IT Operations, Helpdesk, Security Operations
Prerequisites: Access to Kleidia admin interface and/or Kubernetes cluster
Outcome: Resolve incidents efficiently with documented procedures
Runbook Overview#
This document provides step-by-step procedures for common incidents and operational scenarios. Each runbook follows a consistent structure:
- Trigger: What initiates this procedure
- Roles Involved: Who participates
- Actions: Step-by-step procedure
- Expected Outcome: What success looks like
- Related Docs: Links to additional information
Lost or Stolen YubiKey#
Trigger#
User reports their YubiKey as lost, stolen, or potentially compromised.
Roles Involved#
| Role | Responsibility |
|---|---|
| End User | Reports incident to helpdesk |
| Helpdesk | Verifies identity, initiates revocation |
| Security Team | Reviews for signs of compromise, approves replacement |
Actions#
1. Verify User Identity#
Before taking any action, verify the reporter’s identity using your organization’s verification policy (e.g., callback, manager confirmation, security questions).
2. Locate Device in Kleidia#
- Log into Kleidia admin interface
- Navigate to Admin → YubiKeys
- Search for the user’s device by:
  - User name/email
  - Device serial number (if known)
3. Revoke All Certificates#
- Select the lost YubiKey
- Click Revoke All Certificates
- Confirm the revocation
- Certificates are added to CRL immediately
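To confirm that a revoked certificate actually appears in the published CRL, the decoded CRL text can be searched for the certificate's serial number. The sketch below is a minimal helper under stated assumptions: the function name `serial_in_crl_text` is hypothetical, and the fetch command in the comment assumes a PKI mount named `pki` (as in the `pki/cert/ca` path used later in this document) and a `VAULT_ADDR` variable that points at your OpenBao instance.

```shell
# Succeeds (exit 0) if the given serial is listed in the CRL text on stdin.
# Input is expected to be the output of `openssl crl -noout -text`.
# The serial may be given with or without colons, in either letter case.
serial_in_crl_text() {
    # Normalise the wanted serial: strip colons, upper-case hex digits.
    want=$(printf '%s' "$1" | tr -d ':' | tr 'abcdef' 'ABCDEF')
    awk -v want="$want" '
        $1 == "Serial" && $2 == "Number:" {
            got = toupper($3)
            gsub(":", "", got)
            if (got == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (assumed endpoint and variable names, adjust to your deployment):
#   curl -s "$VAULT_ADDR/v1/pki/crl/pem" \
#     | openssl crl -noout -text \
#     | serial_in_crl_text "3a:f2:01:9c" && echo "revoked"
```

If the serial is not found, check whether the CRL has been regenerated since the revocation before assuming the revocation failed.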
4. Disable Device#
- Select Mark as Lost/Stolen
- Device status changes to disabled
- Document the incident in the notes field
5. Disable FIDO2 Credentials#
- Navigate to FIDO2 Credentials for the user
- Remove/disable all credentials associated with the lost device
- Verify removal in the credential list
6. Document in Audit Log#
- Navigate to Audit Logs
- Verify revocation and disable events are logged
- Add incident reference number if applicable
7. Issue Replacement (Optional)#
If the user needs a replacement:
- Obtain new YubiKey from inventory
- Navigate to YubiKeys → Register New Device
- Follow enrollment wizard for the user
- Generate new certificates
- Register new FIDO2 credentials
Expected Outcome#
- ✅ All certificates on lost device are revoked
- ✅ Device is marked as lost/disabled in system
- ✅ FIDO2 credentials are removed
- ✅ Audit log documents all actions
- ✅ User has replacement device (if applicable)
Time to Resolution#
| Scenario | Target |
|---|---|
| Lost (no suspected theft) | 4 hours |
| Stolen/Compromised | 1 hour |
Related Docs#
User Leaves Company#
Trigger#
HR notifies IT that a user is leaving the organization (voluntary or involuntary).
Roles Involved#
| Role | Responsibility |
|---|---|
| HR | Initiates departure notification |
| Helpdesk/IT Admin | Executes credential revocation |
| User’s Manager | Confirms departure, coordinates handoff |
| Security Team | Verifies complete revocation (high-risk departures) |
Actions#
1. Receive Departure Notification#
Document receipt of notification including:
- User name/email
- Last working day
- Priority (standard vs. immediate termination)
2. Identify User’s YubiKey(s)#
- Log into Kleidia admin interface
- Navigate to Admin → Users
- Search for departing user
- List all assigned YubiKeys
3. Revoke All Certificates#
For each YubiKey:
- Select the device
- Click Revoke All Certificates
- Confirm revocation
- Verify certificates added to CRL
4. Disable FIDO2 Credentials#
- Navigate to FIDO2 Credentials for the user
- Remove all registered credentials
- Verify removal complete
5. Disable or Retire Devices#
Based on your organization’s policy:
Option A: Retire Device
- Mark device as retired
- Physical device collected and securely disposed
Option B: Reassign Device
- Mark device as available
- Wipe PIV certificates and keys
- Reset PIN/PUK to defaults
- Re-enroll to new user
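If the wipe for Option B is done by hand on a workstation rather than through Kleidia, Yubico's `ykman` CLI can reset the PIV application, which erases PIV keys and certificates and restores the default PIN and PUK. The sketch below only builds the command for review and does not run it. The helper name `piv_reset_command` is hypothetical, and whether your deployment expects this step to go through the Kleidia agent instead is an assumption to verify.

```shell
# Print the ykman command that would reset the PIV application on the
# YubiKey with the given serial number. Nothing is executed here, so an
# operator can review the target device first.
piv_reset_command() {
    serial="$1"
    case "$serial" in
        ''|*[!0-9]*)
            echo "error: serial must be a non-empty decimal number" >&2
            return 1
            ;;
    esac
    # ykman prompts for confirmation when run without --force.
    printf 'ykman --device %s piv reset\n' "$serial"
}

# Example:
#   piv_reset_command 12345678
```

Printing the command rather than running it keeps the destructive step explicit and tied to a specific serial number.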
6. Disable User Account#
- Navigate to Admin → Users
- Select the departing user
- Click Disable Account
- User can no longer log in
7. Generate Departure Report#
- Navigate to Audit Logs
- Filter by user
- Export log for compliance records
- Attach to HR departure file
Expected Outcome#
- ✅ All user certificates revoked
- ✅ All FIDO2 credentials removed
- ✅ User account disabled
- ✅ Devices collected or reassigned
- ✅ Audit trail documented
- ✅ Report generated for HR/compliance
Time to Resolution#
| Departure Type | Target |
|---|---|
| Standard (2+ weeks notice) | Before last day |
| Immediate termination | Within 1 hour of notification |
Related Docs#
OpenBao/Vault Failure#
Trigger#
OpenBao (Vault) is unavailable, sealed, or returning errors.
Roles Involved#
| Role | Responsibility |
|---|---|
| Operations/DevOps | Diagnose and restore service |
| Security Team | Provide unseal keys if needed |
| Management | Authorize data restoration if needed |
Actions#
1. Identify the Issue#
Check Vault status:
kubectl exec -it kleidia-openbao-0 -n kleidia -- vault status
Common states:
- Sealed: Vault needs to be unsealed
- Standby: HA replica, not primary
- Active: Should be working
- Pod not running: Container issue
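The states above can be read off the plain-text table that `vault status` prints. The following is a minimal sketch that classifies that output. The function name `classify_vault_status` is hypothetical, and the parsing assumes the default table format with `Sealed` and `HA Mode` rows.

```shell
# Read `vault status` table output on stdin and print one of:
#   sealed, standby, active, unknown
classify_vault_status() {
    awk '
        $1 == "Sealed"             { sealed = $2 }
        $1 == "HA" && $2 == "Mode" { mode = $0 }
        END {
            if (sealed == "true")        print "sealed"
            else if (mode ~ /standby/)   print "standby"
            else if (sealed == "false")  print "active"
            else                         print "unknown"
        }
    '
}

# Example (no -t flag, because the output is piped):
#   kubectl exec kleidia-openbao-0 -n kleidia -- vault status \
#     | classify_vault_status
```

An `unknown` result usually means the command produced no status table at all, which points at the pod or container rather than the seal state.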
2. If Vault is Sealed#
Unseal using your organization’s procedure:
# Unseal with key shares (repeat for each key holder)
kubectl exec -it kleidia-openbao-0 -n kleidia -- \
vault operator unseal <key-share>
Note: Your organization should have a documented key ceremony procedure. Never store all unseal key shares in the same location.
3. If Pod is Crashing#
Check pod logs:
kubectl logs kleidia-openbao-0 -n kleidia --previous
kubectl describe pod kleidia-openbao-0 -n kleidia
Common issues:
- Storage full: Expand PVC or clean up
- Memory limits: Increase resource limits
- Network issues: Check service connectivity
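A first pass over the log and `describe` output can be automated by searching for a few well-known error strings. This is a rough sketch, not an exhaustive diagnosis: the function name `triage_openbao_crash` is hypothetical, and the patterns are common Linux and Kubernetes messages rather than anything specific to Kleidia.

```shell
# Read combined `kubectl logs` / `kubectl describe pod` output on stdin
# and print a hint for each common issue whose signature is found.
# Returns 0 if at least one hint was printed, 1 otherwise.
triage_openbao_crash() {
    input=$(cat)
    status=1
    if printf '%s\n' "$input" | grep -qi 'no space left on device'; then
        echo "storage: expand the PVC or clean up"
        status=0
    fi
    if printf '%s\n' "$input" | grep -q 'OOMKilled'; then
        echo "memory: increase resource limits"
        status=0
    fi
    if printf '%s\n' "$input" | grep -Eqi 'connection refused|i/o timeout|no such host'; then
        echo "network: check service connectivity"
        status=0
    fi
    return "$status"
}

# Example:
#   { kubectl logs kleidia-openbao-0 -n kleidia --previous
#     kubectl describe pod kleidia-openbao-0 -n kleidia; } \
#     | triage_openbao_crash
```

No output means none of these signatures matched, so read the logs manually.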
4. If Data Corruption Suspected#
Stop dependent services first:
# Scale down backend
kubectl scale deployment/kleidia-backend --replicas=0 -n kleidia
Restore from backup:
# Copy backup to pod
kubectl cp backups/vault-snapshot.snap \
kleidia-openbao-0:/tmp/vault-snapshot.snap -n kleidia
# Restore snapshot
kubectl exec -it kleidia-openbao-0 -n kleidia -- \
vault operator raft snapshot restore /tmp/vault-snapshot.snap
# Unseal after restore
kubectl exec -it kleidia-openbao-0 -n kleidia -- \
vault operator unseal <key-share>
Restart dependent services:
kubectl scale deployment/kleidia-backend --replicas=2 -n kleidia
5. Verify Recovery#
# Check Vault is active
kubectl exec -it kleidia-openbao-0 -n kleidia -- vault status
# Check secrets are accessible
kubectl exec -it kleidia-openbao-0 -n kleidia -- \
vault kv list yubikeys/metadata/
# Check PKI is functional
kubectl exec -it kleidia-openbao-0 -n kleidia -- \
vault read pki/cert/ca
6. Test Kleidia Operations#
- Log into Kleidia web UI
- Verify YubiKey operations work
- Test certificate generation on a test device
- Review audit logs for errors
Expected Outcome#
- ✅ Vault is unsealed and active
- ✅ All secrets are accessible
- ✅ PKI engine is functional
- ✅ Kleidia operations work normally
- ✅ Incident documented
Escalation#
If unable to restore:
- Contact Kleidia support
- Engage security team for key ceremony
- Consider point-in-time recovery from backups
Related Docs#
Database Failure#
Trigger#
PostgreSQL database is unavailable or returning errors.
Roles Involved#
| Role | Responsibility |
|---|---|
| Operations/DevOps | Diagnose and restore service |
| DBA (if available) | Assist with complex recovery |
Actions#
1. Identify the Issue#
Check pod status:
kubectl get pods -n kleidia | grep postgres
kubectl describe pod kleidia-postgresql-0 -n kleidia
Check logs:
kubectl logs kleidia-postgresql-0 -n kleidia
2. If Pod is Not Running#
Restart the pod:
kubectl delete pod kleidia-postgresql-0 -n kleidia
# StatefulSet will recreate it
If persistent volume issue:
kubectl describe pvc data-kleidia-postgresql-0 -n kleidia
3. If Database Corruption#
Stop dependent services:
kubectl scale deployment/kleidia-backend --replicas=0 -n kleidia
Restore from backup:
# Restore database
gunzip -c backups/kleidia-db.sql.gz | \
kubectl exec -i kleidia-postgresql-0 -n kleidia -- \
psql -U kleidia -d kleidia
Restart services:
kubectl scale deployment/kleidia-backend --replicas=2 -n kleidia
4. Verify Recovery#
# Check database connectivity
kubectl exec -it kleidia-postgresql-0 -n kleidia -- \
psql -U kleidia -d kleidia -c "SELECT 1;"
# Check tables exist
kubectl exec -it kleidia-postgresql-0 -n kleidia -- \
psql -U kleidia -d kleidia -c "\dt"
Expected Outcome#
- ✅ Database pod running
- ✅ Data accessible
- ✅ Kleidia backend connects successfully
- ✅ Audit logs intact
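The restore step in this runbook pipes a gzip-compressed dump straight into `psql`, so a truncated or corrupt archive only shows up partway through the restore. A minimal pre-flight check, run before scaling anything down, could look like the sketch below. The function name `backup_is_readable` is hypothetical, and the path in the example is the one used in the restore command.

```shell
# Succeeds if the file exists, is non-empty, and passes gzip's integrity
# test. This does not validate the SQL inside, only the archive itself.
backup_is_readable() {
    file="$1"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    if ! gzip -t "$file" 2>/dev/null; then
        echo "not a valid gzip archive: $file" >&2
        return 1
    fi
}

# Example:
#   backup_is_readable backups/kleidia-db.sql.gz && echo "archive OK"
```

Passing this check only shows that the archive decompresses cleanly. The verification queries in this runbook are still needed after the restore.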
Related Docs#
Additional Runbooks#
Agent Pairing Issues#
See Troubleshooting Guide for agent connectivity problems.
Certificate Expiry#
See Certificates & PKI for certificate renewal procedures.
TLS Certificate Expiry#
See Load Balancer Setup for TLS certificate management.