How Partial Reindexing Schedule Impacts Page Cache Hit Rate

Published: January 30, 2017

Tags:

One factor that impacts your Magento site’s full page cache hit rate is your partial reindexing schedule. By partial reindexing, I mean execution of the enterprise_refresh_index job, which runs when the Magento cron is executed in “always” mode.

Let’s take a closer look at the interplay between partial reindexing and full page cache hit rate.

Reindexing Impacts Full Page Cache?

Yes!

It is out of scope for this blog post to go too deep on Magento Enterprise’s changelog based reindexing, but one file I’d like to draw your attention to is Enterprise_CatalogInventory_Model_Index_Action_Refresh_Changelog. Everytime enterprise_refresh_index runs it will call the _reindex method on this class (along with all the other changelog action models) for any new records in the cataloginventory_stock_status_cl table.

You’ll see that once it’s done reindexing any products with inventory changes it dispatches an event and passes along the IDs of each product that was reindexed.

Mage::dispatchEvent('cataloginventory_stock_partial_reindex', array('product_ids' => $processIds));

Searching through the code base you’ll see that Enterprise_PageCache is actually listening for that event.

<cataloginventory_stock_partial_reindex>
    <observers>
        <enterprise_pagecache>
            <class>enterprise_pagecache/observer_index</class>
            <method>cleanProductsCacheAfterPartialReindex</method>
        </enterprise_pagecache>
    </observers>
</cataloginventory_stock_partial_reindex>

As you can see, the cleanProductsCacheAfterPartialReindex method will be called, which, as you might guess, will clean the cache for any pages tagged with the reindexed products.

How Does Schedule Fit Into The Equation?

Good question. Let me give you an example…

In this example, if I were to run partial reindexing every 1 minute, all pages tagged for this product would be flushed each minute as a new entry would be in the changelog each time partial reindexing ran.

However, if I decreased my partial reindexing frequency to every 5 minutes, those tags would only be flushed once every 5 minutes.

In other words, decreasing the frequency of partial reindexing can be used to dedupe cache invalidations (NOTE: I recommend you also install SomethingDigital_EnterpriseIndexPerf as out-of-box Magento does not dedupe entries in the changelog tables).

The net result of this is that information is more likely to be stale on the frontend, however end users are more likely to get cached responses.

Build Ups

While decreasing the frequency partial indexing will ultimately increase your hit rate, decreasing it too much could be damaging. For example, let’s say I set partial reindexing to run every hour. In that case, I would dedupe even more cache cleans than I did switching from one minutes to five minutes.

However, when partial reindexing would run each hour, there would be a large build up of products whose cache needed to be cleared. The net result of this would be a sudden burst of uncached responses.

Tune this setting carefully.

A Case Study

With this in mind, we recently did an experiment at Something Digital decreasing the partial reindexing frequency from every one minute to every five minutes. Then, we compared the site’s page cache hit rate for the week after the change to the week before the change.

The overall throughput was very close for each period (925,741 responses before / 992,001 responses after). However, there was a 12% increase in responses coming from cache (partial of full) (37.50% before to 42.08% after)

A chart demonstrating the increase in FPC hit rate

Leveraging This During An Emergency / High Traffic Scenario

During an emergency / high traffic scenario, one potential strategy could be to temporarily disable partial reindexing entirely. Then, when traffic quiets down and things stabilize, partial reindexing could be enabled (as mentioned above time this carefully). The frontend will be stale during this time, however, potentially, your infrastructure will be able to handle some additional throughput.

Conclusion

If you have any questions or comments, feel free to drop a note below, or, as always, you can reach me on Twitter as well.

Max ChadwickHi, I'm Max!

I'm a software developer who mainly works in PHP, but also dabbles in Ruby and Go. Technical topics that interest me are monitoring, security and performance.

During the day I solve challenging technical problems at Something Digital where I mainly work with the Magento platform. I also blog about tech, work on open source and hunt for bugs.

If you'd like to get in touch with me the best way is on Twitter.