Removing Paginated URLs From jekyll-sitemap

Published: January 14, 2018

While I couldn’t find any official statement from Google on the matter, leaving paginated URLs out of your sitemap is generally agreed to be best practice.

However, if you’re using jekyll-sitemap to generate a sitemap for your Jekyll-based website, paginated URLs are included by default.

In this post, let’s explore how you can remove these URLs from your sitemap.

Front Matter Defaults

As discussed in the “Exclude Pagination Pages” issue in the jekyll-sitemap GitHub repo, the trick is to use front matter defaults.

On this site, I’m using the following configuration:

paginate_path: "/blog/page/:num/"

defaults:
  -
    scope:
      path: "blog/page"
    values:
      sitemap: false

This will automatically add sitemap: false to the front matter for all paginated URLs, ensuring they are not added to the sitemap.
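
To confirm the defaults took effect, one option is to search the generated sitemap for any remaining paginated URLs. The helper below is a minimal sketch: the function name paginated_sitemap_urls is hypothetical, and the usage note assumes the sitemap is written to _site/sitemap.xml (Jekyll's default destination directory), which may differ on your setup.

```shell
#!/usr/bin/env bash
# List any <loc> entries in a sitemap whose URL contains the given path fragment.
# Prints matching entries, one per line; prints nothing if the sitemap is clean.
# (Hypothetical helper for illustration; not part of jekyll-sitemap.)
paginated_sitemap_urls() {
  local sitemap="$1" fragment="$2"
  # Pull out every <loc>...</loc> element, then keep those containing the fragment.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -o '<loc>[^<]*</loc>' "$sitemap" | grep -F "$fragment" || true
}
```

After a build, running paginated_sitemap_urls _site/sitemap.xml "/blog/page/" should print nothing if the paginated URLs were excluded.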

The Problem With Some Paginate Paths

While front matter defaults are a great solution for this, depending on your paginate_path you may run into an issue.

For example, on this site, prior to implementing front matter defaults to remove paginated URLs from the sitemap, I was using the following paginate_path:

paginate_path: "/blog/:num/"

To add sitemap: false to all paginated URLs, the front matter default would have had to be as follows:

defaults:
  -
    scope:
      path: "blog"
    values:
      sitemap: false

However, this would also have excluded https://maxchadwick.xyz/blog/ (the front page of my blog) from the sitemap.

In order to use front matter defaults, I needed to change my paginate_path, but this would also mean that all my old paginated URLs (which were being crawled and indexed by Googlebot) would start to 404.
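
The overlap is easier to see with a small sketch of directory-prefix matching. This is a simplified approximation written for illustration, not Jekyll's actual implementation; the function name in_scope and the example file paths are assumptions.

```shell
#!/usr/bin/env bash
# Simplified stand-in for scope matching: succeeds when the scope path
# is the file's directory or one of its ancestor directories.
# Approximation for illustration only, not Jekyll's actual code.
in_scope() {
  local scope="${1#/}" path="${2#/}"
  scope="${scope%/}"
  case "$path" in
    "$scope"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the two scopes against the blog index and paginated pages.
for file in blog/index.html blog/2/index.html blog/page/2/index.html; do
  for scope in blog blog/page; do
    if in_scope "$scope" "$file"; then
      echo "scope '$scope' matches $file"
    fi
  done
done
```

Under this approximation, the scope blog matches all three files, including the blog index, while blog/page matches only the file under blog/page/. That is why the more specific paginate_path makes a narrower scope possible.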

My solution was to use jekyll-redirect-from to create redirects for all the old URLs. I created a simple bash script to create all the files:

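#!/usr/bin/env bash
# Create a redirect stub at blog/N/index.html for each old paginated URL.
# Usage: ./jekyll-pagination-redirects <highest page number>

count=$1
# Page 1 is the blog index itself, so start at 2.
for ((i=2; i<=count; i++)); do
	mkdir -p "blog/$i"
	{
		echo "---"
		echo "redirect_to: /blog/page/$i"
		echo "sitemap: false"
		echo "---"
	} > "blog/$i/index.html"
done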

I had 19 paginated URLs at the time I made the switch, so I ran it as follows:

$ ./jekyll-pagination-redirects 19
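
A quick sanity check after running the script is to confirm that every expected stub exists and carries a redirect_to line. The following is a sketch only; the function name check_redirect_stubs and its optional second argument (the directory holding the stubs, defaulting to blog) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Report any expected redirect stub (ROOT/2 .. ROOT/N) that is missing
# or lacks a redirect_to line. Prints nothing when every stub looks right.
# (Hypothetical helper for illustration.)
check_redirect_stubs() {
  local count="$1" root="${2:-blog}" i file
  for ((i = 2; i <= count; i++)); do
    file="$root/$i/index.html"
    if [[ ! -f "$file" ]]; then
      echo "missing: $file"
    elif ! grep -q '^redirect_to: ' "$file"; then
      echo "no redirect_to: $file"
    fi
  done
}
```

With 19 pages, check_redirect_stubs 19 run from the site root should produce no output if all the stubs were created.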

You can see the diff from when I made the switch here.

Max Chadwick Hi, I'm Max!
