Monitoring
These pages are for day-to-day reliability checks and troubleshooting. They are safe to use without SSH access.
Queues
[screenshot here soon: /admin/queues. Circle any Failed count and the queue name link.]
[screenshot here soon: /admin/queues/<queue>. On the Failed tab, circle the Retry and Remove buttons.]
- Queue list: shows backlog counts (waiting, active, delayed, failed)
- Drill-down: shows recent jobs by status, with safe metadata and failure reasons
- Actions: failed jobs can be retried or removed. These actions are recorded in audit logs.
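As a rough illustration of how the queue list can be read during triage, the sketch below flags queues with a non-zero failed count and orders them worst first. The `QueueCounts` type and `queues_needing_attention` function are hypothetical names invented for this example; they are not part of the product, and the counts would in practice be read off the /admin/queues page.

```python
from dataclasses import dataclass


@dataclass
class QueueCounts:
    """Hypothetical snapshot of one row in the queue list."""
    name: str
    waiting: int
    active: int
    delayed: int
    failed: int


def queues_needing_attention(queues: list[QueueCounts]) -> list[str]:
    """Return names of queues with failed jobs, highest failed count first."""
    with_failures = [q for q in queues if q.failed > 0]
    with_failures.sort(key=lambda q: q.failed, reverse=True)
    return [q.name for q in with_failures]


# Example data (made up for illustration only).
snapshot = [
    QueueCounts("email", waiting=12, active=2, delayed=0, failed=0),
    QueueCounts("grade-sync", waiting=3, active=1, delayed=5, failed=4),
    QueueCounts("reports", waiting=0, active=0, delayed=0, failed=1),
]
flagged = queues_needing_attention(snapshot)
```

Any queue named in `flagged` is one whose Failed tab is worth opening before deciding whether to retry or remove jobs.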
Workers
[screenshot here soon: /admin/workers. Circle a Stale tag and the “(Xs ago)” age indicator.]
Worker status is based on a database heartbeat, not on process manager output. If workers show as stale, their heartbeats are no longer recent and background processing is not healthy.
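To make the heartbeat idea concrete, here is a minimal sketch of how a stale check and the "(Xs ago)" label could be derived from a stored heartbeat timestamp. The 60-second threshold and the function names are assumptions made for this example; this guide does not state the actual threshold or how the product implements the check.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold for illustration; the real value is not documented here.
STALE_AFTER = timedelta(seconds=60)


def heartbeat_age(last_heartbeat: datetime, now: datetime) -> timedelta:
    """Time elapsed since the worker last wrote its heartbeat row."""
    return now - last_heartbeat


def is_stale(last_heartbeat: datetime, now: datetime,
             stale_after: timedelta = STALE_AFTER) -> bool:
    """A worker counts as stale when its heartbeat is older than the threshold."""
    return heartbeat_age(last_heartbeat, now) > stale_after


def age_label(last_heartbeat: datetime, now: datetime) -> str:
    """Format the age in the same shape as the "(Xs ago)" indicator."""
    seconds = int(heartbeat_age(last_heartbeat, now).total_seconds())
    return f"({seconds}s ago)"


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = now - timedelta(seconds=10)
old = now - timedelta(seconds=90)
```

Because the check relies only on the timestamp in the database, a worker process that is still running but no longer writing heartbeats would also be reported as stale, which is the behavior an operator wants to see.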
Incidents
[screenshot here soon: /admin/incidents. Circle a CRITICAL incident and the Acknowledge button.]
[screenshot here soon: Incident Details modal. Circle the “Details” button that opens it.]
- Incidents: a record of important failures (for example, a failed LTI launch or a worker bootstrap failure)
- Acknowledge: marks an incident as reviewed so it no longer shows as unacknowledged
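The acknowledge behavior can be pictured as a simple flag on each incident. The sketch below is a hypothetical model, not the product's data schema: the `Incident` fields and the helper names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are assumed."""
    id: int
    severity: str
    summary: str
    acknowledged: bool = False


def acknowledge(incident: Incident) -> Incident:
    """Return a copy of the incident marked as reviewed."""
    return replace(incident, acknowledged=True)


def unacknowledged(incidents: list[Incident]) -> list[Incident]:
    """Incidents that still need a reviewer's attention."""
    return [i for i in incidents if not i.acknowledged]


# Example data (made up for illustration only).
incidents = [
    Incident(1, "CRITICAL", "worker bootstrap failure"),
    Incident(2, "WARNING", "failed LTI launch"),
]
incidents[0] = acknowledge(incidents[0])
pending = unacknowledged(incidents)
```

In this model, acknowledging does not delete or resolve the incident; it only removes it from the unacknowledged view, which matches the description above.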