=====MdcTasksFailing=====

**Level:** Major


**Purpose:** Notify operations that the percentage of failed MDC tasks has stayed above 60% for more than 1 hour. The percentage is the number of failed tasks in the orchestrator database divided by the total number of processed tasks (failed + completed) over the last hour.
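
As a rough illustration of the calculation described above, the following sketch computes the failed percentage from two counts and compares it with the 60% threshold. The function names are hypothetical, and returning 0 when no tasks were processed is an assumption; the real metric may handle that case differently.

```python
ALERT_THRESHOLD_PERCENT = 60.0  # threshold stated in this alert's purpose


def failed_task_percentage(failed: int, completed: int) -> float:
    """Failed tasks as a percentage of processed tasks (failed + completed)."""
    processed = failed + completed
    if processed == 0:
        # Assumption: with nothing processed there is nothing to alert on.
        return 0.0
    return 100.0 * failed / processed


def exceeds_threshold(failed: int, completed: int) -> bool:
    """True when the failed percentage is strictly greater than the threshold."""
    return failed_task_percentage(failed, completed) > ALERT_THRESHOLD_PERCENT
```

For example, 70 failed and 30 completed tasks in the last hour give 70%, which is above the threshold; 60 failed and 40 completed give exactly 60%, which is not.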

**Scenario:** The RDF agent has become overloaded with tasks and cannot process them, so a larger share of them fail.
  

**Resolution:**
Clear out the active_task table and restart the agent. If the issue persists, check the schedule_config table for schedules with a long timeout (> 200 seconds) that might be bottlenecking the agent, and check whether too many schedules start at the same time. Review the metrics in Prometheus to detect a pattern.
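
The two schedule checks above can be sketched as follows, assuming the rows of schedule_config have already been loaded into dictionaries. The keys ''name'', ''timeout_seconds'' and ''start_time'' are hypothetical, since the actual column names are not given here, and the ''max_concurrent'' default is an arbitrary illustration rather than a recommended limit.

```python
from collections import Counter

LONG_TIMEOUT_SECONDS = 200  # "long timeout" limit mentioned in the resolution


def long_timeout_schedules(schedules):
    """Names of schedules whose timeout exceeds LONG_TIMEOUT_SECONDS."""
    return [s["name"] for s in schedules
            if s["timeout_seconds"] > LONG_TIMEOUT_SECONDS]


def crowded_start_times(schedules, max_concurrent=3):
    """Start times shared by more than max_concurrent schedules (hypothetical limit)."""
    counts = Counter(s["start_time"] for s in schedules)
    return {start: n for start, n in counts.items() if n > max_concurrent}
```

Any schedule returned by the first function, or any start time returned by the second, is a candidate for a shorter timeout or a staggered start.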

 
**Manual Action Steps:**
<code>delete from orchestrator.active_task;</code>

On the oat servers:
<code>sudo docker restart rdfagent</code>

[[http://prometheusprod.err:9090/graph?g0.expr=mdc_agent_task_failed_percentage&g0.tab=0&g0.stacked=0&g0.range_input=1d]]
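
To look at the same metric outside the graph UI, a range query against the Prometheus HTTP API (''/api/v1/query_range'') may help when looking for a pattern. The sketch below only builds the request URL; the host and metric name are taken from the link above, while the function name, time window and step are illustrative assumptions.

```python
from urllib.parse import urlencode

PROMETHEUS_BASE = "http://prometheusprod.err:9090"  # host from the link above
METRIC = "mdc_agent_task_failed_percentage"         # metric from the link above


def build_range_query_url(end_ts: int, hours: int = 24, step_seconds: int = 300) -> str:
    """Build a query_range URL covering the `hours` before `end_ts` (Unix seconds)."""
    params = {
        "query": METRIC,
        "start": end_ts - hours * 3600,
        "end": end_ts,
        "step": step_seconds,
    }
    return f"{PROMETHEUS_BASE}/api/v1/query_range?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client; the response is JSON containing the sampled values for the requested window.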


**Auto Clear:** When the failed task percentage drops below 60%.