watchdogs:quartzjobsblocked

watchdogs:quartzjobsblocked [2020/04/10 12:58] bosowski
watchdogs:quartzjobsblocked [2021/06/25 10:09] (current) – external edit 127.0.0.1
 +//Author: Stephanie Song - Sept 12, 2018//
 +//Edited: Bartosz Osowski - April 10th, 2020//
  
 +====== Sample QuartzJobsBlocked Watchdog: ======
 +
 +**WATCHDOG-787755 - Alarm Received : extdb1 - 10.230.10.14 - CRITICAL - ticketerDbCheck : errigalDatabaseMySqlQuartzCheckJobFailureAlarm - QuartzJobsBlocked**
 +
 +Background: 
 +
 +Quartz is a job scheduler used as a plugin inside Grails. When a job's state is ‘BLOCKED’, something went wrong and the job could not complete. While the job stays BLOCKED, Quartz will not run it again.
 +
 +Each of the main databases has its own set of Quartz tables, so make sure you are using the right database. In this watchdog sample the database is ticketer, so run ‘use ticketer’ first.
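 +
 +A minimal sketch of selecting the database for this sample and confirming the selection before running any other query on this page. The ticketer name comes from the sample alarm above; substitute the database named in your own alarm:
 +<code>
 +-- Switch to the database named in the alarm (ticketer in this sample).
 +use ticketer;
 +-- Confirm which database the session is now using.
 +select database();
 +</code>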
 +
 +To see all quartz-related tables:
 +<code>
 +show tables like '%QRTZ%';
 +</code>
 +====== Resolution: ======
 +
 +For this particular watchdog, the table we are concerned with is QRTZ_CLUSTER_TRIGGERS:
 +
 +<code>
 +select * from QRTZ_CLUSTER_TRIGGERS;
 +</code>
 +
 +<code>
 +select * from QRTZ_CLUSTER_TRIGGERS where TRIGGER_STATE = 'BLOCKED';
 +</code>
 +
 +The TRIGGER_STATE field holds the current state of each job, such as ‘BLOCKED’ or ‘WAITING’.
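 +
 +For a quick overview of which states are present, a summary query along these lines may help. This is only a sketch; it uses nothing beyond the table and column named above:
 +<code>
 +-- Count the triggers in each state (BLOCKED, WAITING, and so on).
 +select TRIGGER_STATE, count(*) as trigger_count
 +from QRTZ_CLUSTER_TRIGGERS
 +group by TRIGGER_STATE;
 +</code>
 +
 +If no row shows BLOCKED, there is nothing to unblock in this database.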
 +
 +To fix this watchdog, change every job whose state is ‘BLOCKED’ to ‘WAITING’ and move its START_TIME and NEXT_FIRE_TIME to a point shortly in the future (say, 30 seconds from now).
 +
 +This will trigger Quartz to try running the job again and hopefully the job will run successfully.
 +
 +START_TIME and NEXT_FIRE_TIME are stored as Unix epoch timestamps in milliseconds.
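 +
 +As a rough illustration of the millisecond format, the sketch below converts the stored values to readable dates and prints the current time in the same units. It assumes the table also has the standard Quartz TRIGGER_NAME column, which this page does not confirm; drop it from the select list if it is missing. Dates are shown in the session time zone:
 +<code>
 +-- Show the stored millisecond values as readable dates.
 +-- TRIGGER_NAME is an assumed column name (standard in Quartz schemas).
 +select TRIGGER_NAME, TRIGGER_STATE,
 +       from_unixtime(START_TIME div 1000) as start_time_readable,
 +       from_unixtime(NEXT_FIRE_TIME div 1000) as next_fire_time_readable
 +from QRTZ_CLUSTER_TRIGGERS;
 +-- Current time, and 30 seconds from now, in epoch milliseconds.
 +select unix_timestamp(now()) * 1000 as now_ms,
 +       unix_timestamp(date_add(now(), interval 30 second)) * 1000 as in_30s_ms;
 +</code>
 +
 +The two values returned by the second query differ by 30000, that is, 30 seconds expressed in milliseconds.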
 +
 +Run the following to unblock the jobs. It resets TRIGGER_STATE to ‘WAITING’ and sets NEXT_FIRE_TIME to 30 seconds from now. NEXT_FIRE_TIME must be a time in the future:
 +<code>
 +update QRTZ_CLUSTER_TRIGGERS set NEXT_FIRE_TIME = (UNIX_TIMESTAMP(date_add(now(), interval 30 second))*1000), TRIGGER_STATE = 'WAITING' where TRIGGER_STATE  = 'BLOCKED';
 +</code>
 +
 +Wait for the job to run again at the specified trigger time, then check whether the watchdog clears.
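 +
 +While waiting, a check along these lines can show whether anything is still blocked and how long remains until each trigger fires. This is a sketch only, built from the columns already used on this page:
 +<code>
 +-- Any triggers still blocked? An empty result means none are.
 +select * from QRTZ_CLUSTER_TRIGGERS where TRIGGER_STATE = 'BLOCKED';
 +-- Seconds until each trigger is due; a negative value means it is overdue.
 +select TRIGGER_STATE,
 +       (NEXT_FIRE_TIME - unix_timestamp(now()) * 1000) / 1000 as seconds_until_fire
 +from QRTZ_CLUSTER_TRIGGERS;
 +</code>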
 +
 +Sometimes the watchdog clears and then comes back. If the fix above does not work and the watchdog keeps recurring, consult the Operations team.
 +
 +
 +====== If the jobs keep constantly blocking ======
 +This is not recommended on production. In any other environment, run the following, which deletes every row from both tables:
 +<code>
 +delete from persisted_variable;
 +delete from persisted_task;
 +</code>
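 +
 +Because the deletes above cannot be undone, it may be worth recording row counts and keeping a copy before running them. The following is a cautious sketch, not an established procedure; the backup table names are hypothetical, and create table ... as select copies the rows but not the indexes or keys:
 +<code>
 +-- Record how many rows are about to be removed.
 +select count(*) from persisted_variable;
 +select count(*) from persisted_task;
 +-- Keep copies of the data (hypothetical backup table names).
 +create table persisted_variable_backup as select * from persisted_variable;
 +create table persisted_task_backup as select * from persisted_task;
 +</code>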