Replication Lag Race Conditions in Magento

Published: September 7, 2018

Tags:

One of my favorite class of bugs are race conditions caused by replication lag (yes, I’m a masochist). These typically play out something like this…

I’ve run into a number of these both in core Magento as well as in 3rd party extensions. In this post I’ll cover a couple of my favorites…

Incorrect Usage Count For Sales Rules / Coupons

This was the first bug of this type that I’ve run into. I found this on a Magento 1 site in the sales_order_afterPlace method of Mage_SalesRule_Model_Observer, however Magento 2 currently looks to have the same issue. The code is as follows…

$rule->setTimesUsed($rule->getTimesUsed() + 1);

Under high concurrency the following can happen…

The solution to this problem, fortunately, is simple…increment times_used without consulting the slave to provide the current value. From a SQL standpoint the query needs to be changed like this…

Before

UPDATE salesrule_coupon SET times_used = 0 + 1;

After

UPDATE salesrule_coupon SET times_used = times_used + 1;

All Users Removed From Role When Modifying Role

There’s another fun replication lag race condition bug where saving a role causes all users assigned to the role to become unassigned. This one also was found on a Magento 1 site, but again appears to remain an issue in Magento 2.

In Magento 1 the trouble is in the saveRoleAction() method of Mage_Adminhtml_Permissions_RoleController, where, you’ll find the following code…

foreach($oldRoleUsers as $oUid) {
    $this->_deleteUserFromRole($oUid, $role->getId());
}

foreach ($roleUsers as $nRuid) {
    $this->_addUserToRole($nRuid, $role->getId());
}

$oldRoleUsers is an array with the ids all the users assigned to the role prior to save (when the page was loaded). $roleUsers is an array with the ids of all the users currently being saved to the role.

Astute readers may notice that this code is not particularly efficient as users who were present when the page is loaded AND when being saved will be unnecessarily removed from the role and then re-added.

But worse than the inefficiency is the race condition bug that can happen in the case of (even slight) replication lag…

The _deleteUserFromRole method will eventually wind up calling the deleteFromRole method on Mage_Admin_Model_Resource_User.

public function deleteFromRole(Mage_Core_Model_Abstract $user)
{
    if ( $user->getUserId() <= 0 ) {
        return $this;
    }
    if ( $user->getRoleId() <= 0 ) {
        return $this;
    }

    $dbh = $this->_getWriteAdapter();

    $condition = array(
        'user_id = ?'   => (int) $user->getId(),
        'parent_id = ?' => (int) $user->getRoleId(),
    );

    $dbh->delete($this->getTable('admin/role'), $condition);
    return $this;
}

This will issue a DELETE query for the user record in the admin_role table (which will need to replicate to the slave).

The problem can be found back in Mage_Adminhtml_Permissions_RoleController’s _addUserToRole method, which looks like this…

protected function _addUserToRole($userId, $roleId)
{
    $user = Mage::getModel('admin/user')->load($userId);
    $user->setRoleId($roleId)->setUserId($userId);

    if( $user->roleUserExists() === true ) {
        return false;
    } else {
        $user->add();
        return true;
    }
}

If it check passes the user won’t be added back to the role…

if( $user->roleUserExists() === true ) {
    return false;

roleUserExists checks for the record in the admin_user table that was previously deleted…

public function roleUserExists(Mage_Core_Model_Abstract $user)
{
    if ( $user->getUserId() > 0 ) {
        $roleTable = $this->getTable('admin/role');

        $dbh = $this->_getReadAdapter();

        $binds = array(
            'parent_id' => $user->getRoleId(),
            'user_id'   => $user->getUserId(),
        );

        $select = $dbh->select()->from($roleTable)
            ->where('parent_id = :parent_id')
            ->where('user_id = :user_id');

        return $dbh->fetchCol($select, $binds);
    } else {
        return array();
    }
}

If there’s a slight delay in replicating the previous DELETE statement from the master to the slave, the record may still exist when this check is run, causing it to get deleted, but never re-added.

Again, the solution is fairly simple, filter $oldRoleUsers and $roleUsers to remove any duplicates…

$originalOldRuleUsers = $oldRoleUsers;
$originalRoleUsers = $roleUsers;
$oldRoleUsers = array_diff($originalOldRoleUsers, $originalNewRoleUsers);
$roleUsers = array_diff($originalNewRoleUsers, $originalOldRoleUsers);

Max Chadwick Hi, I'm Max!

I'm a software developer who mainly works in PHP, but also dabbles in Ruby and Go. Technical topics that interest me are monitoring, security and performance.

During the day I solve challenging technical problems at Something Digital where I mainly work with the Magento platform. I also blog about tech, work on open source and hunt for bugs.

If you'd like to get in touch with me the best way is on Twitter.