Monitoring

These pages are for day-to-day reliability checks and troubleshooting. They are safe to use without SSH access.

Queues

[screenshot here soon: /admin/queues. Circle any Failed count and the queue name link.]

[screenshot here soon: /admin/queues/<queue>. On the Failed tab, circle the Retry and Remove buttons.]

  • Queue list: shows backlog counts (waiting, active, delayed, failed)
  • Drill-down: shows recent jobs by status, with safe metadata and failure reasons
  • Actions: failed jobs can be retried or removed. These actions are recorded in audit logs.
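
As a rough illustration of how the backlog counts might be read, the sketch below flags queues that deserve a closer look. The `QueueCounts` type, the `queues_needing_attention` function, and the `max_waiting` threshold are all hypothetical names chosen for this example; they are not part of the admin pages or any documented API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QueueCounts:
    """Hypothetical snapshot of one row on the queue list page."""
    name: str
    waiting: int
    active: int
    delayed: int
    failed: int


def queues_needing_attention(
    queues: Iterable[QueueCounts], max_waiting: int = 100
) -> List[str]:
    """Return the names of queues with failed jobs or a large backlog.

    max_waiting is an assumed threshold; pick one that fits your workload.
    """
    return [q.name for q in queues if q.failed > 0 or q.waiting > max_waiting]


if __name__ == "__main__":
    snapshot = [
        QueueCounts("email", waiting=3, active=1, delayed=0, failed=0),
        QueueCounts("grades", waiting=250, active=2, delayed=0, failed=0),
        QueueCounts("sync", waiting=0, active=0, delayed=5, failed=2),
    ]
    print(queues_needing_attention(snapshot))
```

A queue that appears in the result is a candidate for the drill-down view, where the failure reasons can be reviewed before retrying or removing jobs.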

Workers

[screenshot here soon: /admin/workers. Circle a Stale tag and the “(Xs ago)” age indicator.]

Worker status is based on a database heartbeat, not on process manager output. A Stale tag means the worker has not recorded a recent heartbeat; while workers are stale, background processing is not healthy.
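
A heartbeat check of this kind can be sketched as a comparison between the last recorded heartbeat time and the current time. This is a minimal sketch under assumptions: the 60-second threshold, the function names, and the use of UTC timestamps are illustrative only, and the real threshold used by the Workers page is not stated here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a worker counts as stale after 60 seconds without a heartbeat.
STALE_AFTER = timedelta(seconds=60)


def heartbeat_age_seconds(last_heartbeat: datetime, now: datetime) -> int:
    """Whole seconds since the last heartbeat, as in an "(Xs ago)" label."""
    # Clamp at zero so small clock skew never yields a negative age.
    return max(0, int((now - last_heartbeat).total_seconds()))


def is_stale(
    last_heartbeat: datetime, now: datetime, stale_after: timedelta = STALE_AFTER
) -> bool:
    """True when the heartbeat is older than the allowed window."""
    return now - last_heartbeat > stale_after


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last = now - timedelta(seconds=95)
    print(f"({heartbeat_age_seconds(last, now)}s ago)", "stale:", is_stale(last, now))
```

Reading the heartbeat from the database rather than from the process manager means a worker that is running but stuck still shows up as stale.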

Incidents

[screenshot here soon: /admin/incidents. Circle a CRITICAL incident and the Acknowledge button.]

[screenshot here soon: Incident Details modal. Circle the “Details” button that opens it.]

  • Incidents: record important failures (for example, a failed LTI launch or a worker bootstrap failure)
  • Acknowledge: marks an incident as reviewed so it no longer shows as unacknowledged
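
The acknowledge behavior described above can be pictured as setting a timestamp on the incident and filtering on it. The `Incident` type, its fields, and both functions below are hypothetical and exist only to illustrate the idea; the actual data model is not documented here.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    id: int
    severity: str
    summary: str
    acknowledged_at: Optional[datetime] = None


def acknowledge(incident: Incident, when: datetime) -> Incident:
    """Mark an incident as reviewed; an earlier acknowledgement is kept."""
    if incident.acknowledged_at is not None:
        return incident
    return replace(incident, acknowledged_at=when)


def unacknowledged(incidents: Iterable[Incident]) -> List[Incident]:
    """Incidents that still need review."""
    return [i for i in incidents if i.acknowledged_at is None]


if __name__ == "__main__":
    incidents = [
        Incident(1, "CRITICAL", "Worker bootstrap failure"),
        Incident(2, "WARNING", "Failed LTI launch"),
    ]
    incidents[0] = acknowledge(incidents[0], datetime.now(timezone.utc))
    print([i.id for i in unacknowledged(incidents)])
```

In this sketch, acknowledging does not delete anything: the incident stays in the list and only drops out of the unacknowledged view.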
