Get Lines of Code Changed in git diff Excluding Directory

Published: March 3, 2020

Tags:

Recently I was faced with large amount of code requiring review. I needed to get back to the individual who requested the review with an estimate of how long it would take.

To give an informed estimate, it’s useful to have a sense of how many lines of code have changed. This is generally easy to do with the --shortstat diff option (GitHub also conveniently displays this when looking at a pull request):

$ git diff --shortstat 2.0.0 2.3.0
 209 files changed, 30020 insertions(+), 2531 deletions(-)

However a quick glance at the changeset told me that the majority of the changes were unit test files, which I wasn’t particularly interested in reviewing here (this was a review of 3rd party code). I wanted to figure out how many lines of code had changed, excluding test files.

I did some Googling but didn’t find a great answer. This StackOverflow answer was informative, but ultimately didn’t outline a clear solution to the problem.

I dug into it a bit myself and came up with the following command, which did what I was looking for.

$ git diff --stat 2.0.0 2.3.0 \
  | grep -v 'Test/' \
  | awk -F"|" '{ print $2 }' \
  | awk '{ print $1 }' \
  | sed '/^$/d' \
  | paste -sd+ - \
  | bc
10745

Let’s do a quick review of what’s going on here…

Get the full diff stat

git diff --stat 2.0.0 2.3.0 will give me the diff like this:

 .circleci/config.yml                        |  154 ++-
 .github/CODEOWNERS                          |    2 +-
 .github/pull_request_template.md            |   33 +
 .github/stale.yml                           |   17 +
 .gitignore                                  |    7 +
 ...

On the left we see the file name. Next we see the pipe character, followed by the lines of code changed, and then a representation of how many of those were added lines and how many were removed lines.

Filter out the directory

In my case, all the tests are in the Test/ directory. As such I pipe the diff stat to grep -v 'Test/' to filter out any files paths containing the string Test/. If you need to filter a different directory, this is the part of the command you would update.

Extract the number of lines changed

I take the part of the string after the | character via awk.

awk -F"|" '{ print $2 }'

Next I take the number:

awk '{ print $1 }'

Remove any empty lines

I strip out any empty lines with sed:

sed '/^$/d'

Sum the numbers

Finally I sum up all the numbers:

paste -sd+ - | bc

For the example in question the result of this command was “10745”, meaning that out of the initial 32,551 changes reported by git diff --shortstat 10,745 of those were to files outside of the Test/ directory.

This isn’t the most elegant solution, but it works. If you know of a better way I’d love to hear it in the comments below.

Max Chadwick Hi, I'm Max!

I'm a software developer who mainly works in PHP, but loves dabbling in other languages like Go and Ruby. Technical topics that interest me are monitoring, security and performance. I'm also a stickler for good documentation and clear technical writing.

During the day I lead a team of developers and solve challenging technical problems at Rightpoint where I mainly work with the Magento platform. I've also spoken at a number of events.

In my spare time I blog about tech, work on open source and participate in bug bounty programs.

If you'd like to get in contact, you can find me on Twitter and LinkedIn.