Published: May 2, 2015
Sometimes it's helpful to keep a visual record of certain areas of your website.
I'll give you an example. I worked on a Magento site with a long-standing issue where, multiple times a day, half of the documents would fall out of the Solr index. We eventually found the culprit, but while debugging and analyzing the issue it was helpful to keep screenshots of the results for certain search terms on record. In this post I'll dive into the details of setting that monitoring up.
The Search Begins
To get this set up, the first and most obvious question is "What tool should I use?" I had heard of PhantomJS being used for things like this in the past (in fact, its README even lists "screen capture" in the "Use Cases" section), so researching PhantomJS was my first order of business.
PhantomJS likely would've been a solid choice for this. However, I would be running this job on a LAMP stack and wanted to keep everything as "LAMP-Y" as possible. There had to be a better way...
The Search Continues
After an hour or so of Googling I determined that a command line solution was the way to go (there are plenty of PHP solutions as well). Simply install the Linux package and execute it from the command line. What could be easier (or more "LAMP-Y")?
The End Game
At this point I'm convinced that there really isn't a better solution than wkhtmltopdf for my needs. While I'm not wild about the name (since it does more than just PDFs), it damn well gets the job done and is dead simple to set up. Below is what I have implemented for keeping records of the necessary screenshots.
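As a rough illustration, a script along these lines could look like the sketch below. It assumes the wkhtmltoimage binary that ships alongside wkhtmltopdf is on the PATH, and it takes the URL as the first argument and a unique name as the second. The script name, the function names, the timestamp format, and the default output directory are all hypothetical choices made for this example.

```shell
#!/usr/bin/env bash
# Usage: screenshot.sh <url> <name> [output_dir]
# Sketch only: the names, timestamp format, and default directory are assumptions.

# Build a filename of the form "<YYYY-MM-DD_HH-MM-SS>_<name>.png".
build_filename() {
    local name="$1"
    printf '%s_%s.png' "$(date +%Y-%m-%d_%H-%M-%S)" "$name"
}

# Render the page at <url> to a timestamped PNG inside <output_dir>.
capture() {
    local url="$1" name="$2" dir="${3:-./screenshots}"
    mkdir -p "$dir" || return 1
    # wkhtmltoimage takes the input URL followed by the output file.
    wkhtmltoimage "$url" "$dir/$(build_filename "$name")"
}

# Capture only when a URL and a name were supplied. With no arguments the
# file just defines the functions, so it is safe to source.
if [ "$#" -ge 2 ]; then
    capture "$@"
elif [ "$#" -eq 1 ]; then
    echo "Usage: $0 <url> <name> [output_dir]" >&2
    exit 1
fi
```

Calling it as `./screenshot.sh "https://example.com/" homepage` (a placeholder URL) would write something like `2015-05-02_06-00-00_homepage.png` into `./screenshots`.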
I run this script on a cron twice a day, once at 6:00AM and again at 6:00PM.
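For reference, a 6:00AM and 6:00PM schedule can be written as a single crontab line with `0 6,18 * * *`. The snippet below is a sketch: the script path, the URL, and the name are placeholders, not values from my setup.

```shell
# Hypothetical script path, URL, and name; adjust to your setup.
# Note that a literal % in a crontab command must be escaped as \%.
CRON_ENTRY='0 6,18 * * * /usr/local/bin/screenshot.sh "https://example.com/catalogsearch/result/?q=shirt" search-shirt'

# Preview the current crontab with the new entry appended.
# Pipe this output into `crontab -` to actually install it.
{ crontab -l 2>/dev/null; echo "$CRON_ENTRY"; }
```

Adding one such line per page keeps every monitored URL on the same schedule.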
Since all the parameters are accepted as arguments rather than being hardcoded in the script, I am able to monitor multiple pages with the same script. Every filename is prefixed with the date and time the screenshot was captured, followed by a unique name taken from the second argument, so nothing gets overwritten and the screenshots are retained indefinitely (setting up rotation is a good idea as well!). I also keep these in a publicly accessible directory, with an .htaccess file that enables directory indexes, so that my team and our client can review the screenshots on demand without needing an SFTP connection.
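On the rotation point, one simple approach is to delete screenshots older than a set number of days with `find`. This is a minimal sketch rather than something I have running: the function name `prune_screenshots`, the 30-day default, and the `.png` extension are assumptions, and `-maxdepth` and `-delete` assume GNU or BSD `find`.

```shell
# prune_screenshots <dir> [days]
# Delete PNG files directly inside <dir> that are older than [days] days
# (30 by default). Hypothetical helper for illustration.
prune_screenshots() {
    local dir="$1" days="${2:-30}"
    find "$dir" -maxdepth 1 -type f -name '*.png' -mtime +"$days" -delete
}
```

Running `prune_screenshots /path/to/screenshots 30` from a daily cron job would keep roughly a month of history.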
This has worked like a charm for me since implementation.