====== Resolving Replication Timeout Issue ======

Author: Yan Wang

Steps to resolve a replication timeout issue:

1. Log in to the slave database.

2. Run SHOW SLAVE STATUS\G

3. Check Last_Errno and Last_Error to confirm that the error is a timeout.

4. Run SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; to skip the failed query (it will be run manually later). AWS RDS cannot run this statement; see this post for a solution: https://jerrylogic.wordpress.com/2017/04/21/amazon-rds-skip-replication-errors-by-repositioning-slave/

5. Restart the slave with START SLAVE;

6. If Last_Error doesn't show the query, or shows only part of it, either check the binary log or relay log, or use Percona Toolkit's pt-table-sync to find the differences in the table between master and slave. Run this command from the master server: pt-table-sync -u<user with write permission> --print --sync-to-master <slave db server IP> --tables <database name>.<target table name> --ask-pass > file_to_store_query.sql. For example, to check the network_element table: pt-table-sync -uwriter --print --sync-to-master 10.230.10.15 --tables snmp_manager.network_element --ask-pass > ne.sql

7. Run the queries saved in file_to_store_query.sql. If one fails on a foreign key constraint, compare the two rows on master and slave manually, find the difference, write an UPDATE query yourself, and run it on the slave database.
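
Steps 2 and 3 can be scripted. The following is a minimal sketch, not part of the original procedure. The function name slave_error_summary is hypothetical, and it assumes a timeout can be recognised by the word "timeout" appearing in Last_Error (as in the MySQL message "Lock wait timeout exceeded"). It reads the output of SHOW SLAVE STATUS\G on standard input and prints Last_Errno, Last_Error and a verdict.

```shell
#!/bin/sh
# Hypothetical helper (not from the original procedure): summarise the
# replication error found in SHOW SLAVE STATUS\G output read from stdin.
slave_error_summary() {
    awk '
        # Match the field names exactly so Last_IO_Errno and Last_SQL_Errno are skipped.
        /^[ \t]*Last_Errno:/ { sub(/^[ \t]*Last_Errno:[ \t]*/, ""); errno = $0 }
        /^[ \t]*Last_Error:/ { sub(/^[ \t]*Last_Error:[ \t]*/, ""); error = $0 }
        END {
            print "Last_Errno=" errno
            print "Last_Error=" error
            # Assumption: a timeout is recognised by the word "timeout" in Last_Error.
            if (tolower(error) ~ /timeout/) print "verdict=timeout"
            else print "verdict=other"
        }
    '
}
```

Assuming the mysql client on the slave host is already configured with credentials, it could be used as: mysql -e 'SHOW SLAVE STATUS\G' | slave_error_summary. Only continue with step 4 when the verdict is "timeout"; other errors need their own investigation.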