Runbook
When something is broken or behaving oddly, start here. Each section is "symptom → diagnosis → fix".
Bot is offline / not responding
Symptom: No response to slash commands, "Bot is offline" indicator in Discord member list.
bash
ssh root@<droplet>
cd /opt/clanguard                  # compose commands must run from the project directory
docker compose ps                  # is the container even running?
docker compose logs --tail=200 clanguard
Common causes:
| What logs show | Cause | Fix |
|---|---|---|
| Cannot connect to Discord / WebSocketException | Token revoked or rotated | Update DISCORD_BOT_TOKEN in /opt/clanguard/.env, then run docker compose up -d --force-recreate |
| Used disallowed intents | Privileged intents disabled in dev portal | Re-enable Presence / Members / Message Content intents |
| database is locked | WAL contention or stuck process | Restart the container |
| Cannot find google-credentials.json | Mount missing or file empty | Re-decode from GOOGLE_CREDENTIALS_BASE64 secret |
| Container exits immediately | Missing env var or bad config | Check docker compose logs for the .NET stack trace |
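For the missing-env-var case, a quick pre-flight check of the .env file can save a restart cycle. The helper below is a minimal sketch, not part of the bot: check_env_file is a hypothetical name, and the list of required keys is an assumption you should fill in yourself (only DISCORD_BOT_TOKEN is named in this runbook). It treats a key as present when the file has an uncommented KEY=value line with a non-empty value.

```shell
# Hypothetical pre-flight helper: report required keys that are missing or
# empty in a .env file. Prints each offending key; returns 1 if any are found.
check_env_file() {
  env_file=$1
  shift
  missing=0
  for key in "$@"; do
    # Match KEY=<at least one character> at the start of a line, so
    # commented-out entries such as "#KEY=value" do not count.
    if ! grep -Eq "^${key}=.+" "$env_file" 2>/dev/null; then
      echo "missing or empty: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Example (extend the key list with whatever the bot actually requires):
# check_env_file /opt/clanguard/.env DISCORD_BOT_TOKEN
```

A value written as a pair of empty quotes still counts as set here, so treat a clean result as a hint rather than proof.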
Deploy succeeded but old code is still running
Almost always a stale container. Force a clean rebuild:
bash
cd /opt/clanguard
docker compose build --no-cache
docker compose up -d --force-recreate
docker compose logs --tail=50
If that still doesn't pick up the change:
bash
docker compose down
docker image prune -f
docker compose up -d --build
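Don't run docker system prune -a --volumes
Plain docker system prune -a leaves volumes alone, but adding --volumes lets it delete unused volumes, which on older Docker releases includes named volumes such as bot-data once the stack is down. That means losing the database. For the same reason, never pass -v to docker compose down. Always scope prunes to images.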
AWOL list keeps showing the same notification
Symptom: Same user gets posted to the AWOL channel every check cycle.
Possible causes:
- NotificationSent column not being written. Check AwolRecord for the user: NotificationSent should be 1 once posted.
- Missing channel permissions. If posting silently fails, LastNotificationAttemptUtc will keep updating but NotificationSent stays 0. After 7 days the record auto-resolves as "given up" (see AwolCheckService).
sql
SELECT UserId, Username, AssignedAt, NotificationSent, LastNotificationAttemptUtc
FROM AwolRecord
WHERE NotificationSent = 0
ORDER BY AssignedAt DESC;
Auto-promotion didn't run / promoted the wrong people
Auto-promotion runs once a day at AutoPromotionRunHourUtc (default 3 UTC). To verify a run happened:
bash
docker compose logs clanguard | grep -i "AutoPromotion"
The BotState table records cycle completion. If AutoPromotionEnabled=false or AutoPromotionDryRun=true in config, no announcements happen.
To re-trigger manually, restart the container after the configured hour. There is currently no "run now" command for auto-promotion.
Dry-run vs. dry-run-ranks
AutoPromotionDryRun=true makes every tier dry-run. AutoPromotionDryRunRanks=PVT,PFC makes only those tiers dry-run while other tiers continue to promote for real. Use the per-tier flag when rolling out a tier change.
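The interaction between the two flags can be sketched as a small decision function. This is an illustration of the rule described above, not the bot's actual code: tier_mode is a hypothetical name, CPL is a made-up tier used only for the example, and the sketch assumes rank names match exactly with no spaces around the commas.

```shell
# Sketch of the dry-run decision described above.
# Arguments: $1 = AutoPromotionDryRun (true/false)
#            $2 = AutoPromotionDryRunRanks (comma-separated, may be empty)
#            $3 = tier being evaluated, e.g. PVT
# Prints "dry-run" or "live".
tier_mode() {
  global_dry_run=$1
  dry_run_ranks=$2
  tier=$3

  # The global flag wins: every tier is dry-run.
  if [ "$global_dry_run" = "true" ]; then
    echo "dry-run"
    return
  fi

  # Otherwise only tiers named in the per-tier list are dry-run.
  # Wrapping both sides in commas avoids partial matches.
  case ",$dry_run_ranks," in
    *",$tier,"*) echo "dry-run" ;;
    *)           echo "live" ;;
  esac
}

tier_mode false "PVT,PFC" PFC   # prints: dry-run
tier_mode false "PVT,PFC" CPL   # prints: live
tier_mode true  ""        CPL   # prints: dry-run
```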
Apollo events aren't being captured
Symptom: New Apollo posts in #events aren't appearing on the calendar or in attendance.
- Confirm the Apollo bot user matches BotConfig.ApolloBotName (default Apollo).
- Check ApolloMessageLog for the most recent entry. If it is recent, the capture handler is working but the parser is failing.
- Check logs: grep -i Apollo logs/clanguard-*.log
Known parser landmines:
- Smart punctuation in event titles crashes the embed parser. Strip curly quotes / em-dashes before posting if seen.
- Apollo edits the same message to update RSVP counts. The bot needs MessageCacheSize=500 and the GuildMessages intent for MessageUpdated to fire.
- Apollo embed timestamps are parsed in ApolloEmbedParser. If the embed format changes, the parser may need updating.
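For the smart-punctuation landmine, one way to clean a title before posting is a small sed filter. This is a sketch under assumptions: ascii_title is a hypothetical helper, and the character list (curly single and double quotes, en and em dashes) is a guess at what counts as smart punctuation here, so extend it if the parser chokes on something else.

```shell
# Replace curly quotes and en/em dashes with ASCII equivalents.
# The characters are built from UTF-8 octal escapes so the script stays ASCII.
ascii_title() {
  ldq=$(printf '\342\200\234')     # U+201C left double quote
  rdq=$(printf '\342\200\235')     # U+201D right double quote
  lsq=$(printf '\342\200\230')     # U+2018 left single quote
  rsq=$(printf '\342\200\231')     # U+2019 right single quote
  ndash=$(printf '\342\200\223')   # U+2013 en dash
  mdash=$(printf '\342\200\224')   # U+2014 em dash
  printf '%s\n' "$1" | sed \
    -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" \
    -e "s/$lsq/'/g"  -e "s/$rsq/'/g" \
    -e "s/$ndash/-/g" -e "s/$mdash/-/g"
}

# Usage: paste the draft title as the single argument and post the output.
# ascii_title "$draft_title"
```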
Voice tracking is wrong
Symptom: A user has voice time recorded but they were definitely never in voice / vice versa.
Most likely cause: bot restarted while the user was in voice. The VoiceSession row for that session never got a LeftAt timestamp.
VoiceSessionCleanupService reconciles dangling sessions on startup, but if the user joined and left during a downtime window, the session is lost entirely.
To audit one user:
sql
SELECT JoinedAt, LeftAt, ChannelName, ROUND((julianday(COALESCE(LeftAt, datetime('now'))) - julianday(JoinedAt)) * 24, 2) AS hours
FROM VoiceSession
WHERE UserId = <user-id>
ORDER BY JoinedAt DESC
LIMIT 20;
To cap a single session that ran absurdly long:
sql
-- MaxSingleSessionHours defaults to 12. Sessions exceeding this are auto-trimmed
-- by AwolCheckService when computing activity, but the raw row is unchanged.
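To see what the cap does to one user's sessions without touching the raw rows, a read-only query can show raw and capped hours side by side. The wrapper below is a sketch, not the bot's trimming logic: capped_sessions is a hypothetical name, it assumes the sqlite3 CLI is available where you run it, that UserId is numeric, and that the cap is a whole number of hours (defaulting to the documented 12).

```shell
# Read-only sketch: list a user's recent sessions with raw and capped hours.
# Usage: capped_sessions <db-path> <user-id> [cap-hours]
capped_sessions() {
  db=$1
  user_id=$2
  cap=${3:-12}   # mirrors the documented MaxSingleSessionHours default

  # Accept digits only, so the values are safe to splice into the SQL text.
  for v in "$user_id" "$cap"; do
    case "$v" in
      ''|*[!0-9]*) echo "user id and cap must be numeric" >&2; return 2 ;;
    esac
  done

  sqlite3 "$db" "
    SELECT JoinedAt,
           ROUND(hours, 2)            AS raw_hours,
           ROUND(MIN(hours, $cap), 2) AS capped_hours
    FROM (
      SELECT JoinedAt,
             (julianday(COALESCE(LeftAt, datetime('now'))) - julianday(JoinedAt)) * 24 AS hours
      FROM VoiceSession
      WHERE UserId = $user_id
    )
    ORDER BY JoinedAt DESC
    LIMIT 20;"
}
```

Inside the container this would be called against /app/data/clanguard.db. Any row where raw_hours exceeds capped_hours is a session the cap would trim.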
Roster export to Google Sheets is empty / outdated
Runs once a day at RosterExportHourUtc (default 6 UTC).
- Verify RosterSpreadsheetId and RosterSheetName in config.
- Check that google-credentials.json is mounted and the service account has edit access on the sheet (share the sheet with the service account email).
- Logs: grep -i RosterExport logs/clanguard-*.log
To re-export immediately, run /roster-export (officer-only).
/promote or /demote fails silently
The bot's own role must be above every rank role it's modifying. Check Server Settings → Roles and drag the bot's role up.
Other failure modes:
- User running the command lacks the PromoteDemoteMinRank rank (default MAJ).
- Target user is exempt (Bot, Retired, etc.).
- Target rank doesn't appear in the RankRoles config list.
Reddit Leads stopped posting
bash
docker compose logs clanguard | grep -i Reddit
- 403 from Reddit → User-Agent is being rejected. Reddit requires a unique, descriptive UA. Check RedditLeads.UserAgent.
- 429 → rate-limited. Increase PollingIntervalMinutes.
- No errors but no posts → LeadMatcher filters may be too strict, or no recent posts match.
Briefing never posted on Sunday
WeeklyOfficerBriefingService runs at WeeklyBriefing.RunOnDayUtc / RunAtUtc. Check:
- Claude__ApiKey is set in the environment.
- WeeklyBriefing.OfficerChannelId is correct and the bot can post there.
- DryRun isn't true.
- Anthropic API hasn't returned an error. Search logs for Claude or Anthropic.
To test on demand: /briefing-now (BG+ only).
Database is corrupt
bash
docker exec -it clanguard-bot sqlite3 /app/data/clanguard.db "PRAGMA integrity_check;"
If it returns anything other than ok:
- Stop the bot: docker compose stop clanguard
- Restore from the most recent ~/clanguard-backups/backup-*.db:
bash
docker cp ~/clanguard-backups/backup-YYYYMMDD.db clanguard-bot:/app/data/clanguard.db
- Restart: docker compose start clanguard
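Picking the most recent backup by eye is easy to get wrong under pressure, so it can be scripted. The helper below is a sketch: latest_backup is a hypothetical name, and it assumes every backup follows the backup-YYYYMMDD.db pattern shown above, so that alphabetical order matches date order.

```shell
# Print the newest backup in a directory, assuming backup-YYYYMMDD.db names.
# Returns 1 (and prints nothing) if the directory holds no matching file.
latest_backup() {
  dir=$1
  newest=""
  # The shell expands globs in sorted order, so the last match is the newest.
  for f in "$dir"/backup-*.db; do
    [ -e "$f" ] && newest=$f   # skips the literal pattern when nothing matched
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Usage during a restore, after stopping the bot:
# docker cp "$(latest_backup ~/clanguard-backups)" clanguard-bot:/app/data/clanguard.db
```

If backups are ever named differently (for example with a time suffix of varying width), fall back to checking modification times by hand.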
"I just need to see what the bot is doing right now"
bash
docker compose logs -f --tail=100 clanguard
Filter for a specific subsystem:
bash
docker compose logs --tail=500 clanguard | grep -iE "AutoPromotion|Apollo|Awol"