Repair a broken MySQL slave

If a MySQL slave encounters an error while replicating commands from the master, the slave will abort.

One way this can happen is if you are using triggers on a table that calls a stored procedure, but the stored procedures are missing on the slave because you forgot to include the –routines option when generating a mysqldump from the master to import to the slave.

So what do you do? First, verify that the slave is indeed encountering errors:

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 1.2.3.4
                Master_User: slave_user
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.001079
        Read_Master_Log_Pos: 269214454
             Relay_Log_File: slave-relay.000130
              Relay_Log_Pos: 100125935
      Relay_Master_Log_File: mysql-bin.001079
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB: mydb
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 1146
                 Last_Error: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'. 
Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate
        SET thread.views = thread.views + aggregate.views
        WHERE thread.threadid = aggregate.threadid'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 203015142
            Relay_Log_Space: 166325247
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

mysql>

If  either Slave_IO_Running or Slave_SQL_Running are ‘No’ then the slave has aborted. Here’s how to fix it:

Stop the slave

mysql> STOP SLAVE;

Resolve the error

Pay attention to the Last_Error field and try to resolve it. If this is a recurring issue you may need to import a fresh dump from the master to bring your slave back into sync. If the situation merits it, you can run the following to have the slave skip the offending SQL query:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

Be careful, though because this could have severe repercussions down the line.

Restart the slave

mysql> START SLAVE;

Verify that the slave is running

mysql> SHOW SLAVE STATUS \G

When Slave_IO_Running and Slave_SQL_Running both are set to ‘Yes’ then you’re good to go. It would probably be a good idea to have some sort of monitor to ensure your slaves are indeed running when you need them to. Perhaps I’ll throw together a quick script to do that…

Thanks to http://www.howtoforge.com/how-to-repair-mysql-replication for this information.

Written on May 2, 2009