When a team of people, both technical and non-technical, collectively operates a shared software installation, things are bound to go wrong at some point. As the technical folk, we are often engaged to perform forensic analysis. This type of work frequently includes tasks such as grep-ping server access logs for certain request paths, dates, and IP addresses, or reviewing any other logs or information related to whatever incident may have occurred.
This post is about a specific incident that came up recently. It was not a major one, but there were some learnings for me along the way and I figured it would be interesting to document the process.
As with most things these days, the story starts with an email. I had previously requested a client conduct UAT on a bugfix in a staging environment. Not long after, the client responded and told me that he was having trouble with the site: he couldn't proceed through the checkout flow and was getting a white screen.
"Hmm...that's strange", I thought. I had personally deployed and tested the bug fix on that same staging environment prior to notifying the client and did not observe any such issue. What could be wrong?
To start out, I SSH-ed into the box and ran a git status in the web root. While I was expecting it to be on the head of the develop branch, it was not. But I had just run a git pull before notifying the client! Something was amiss...
This is where things get interesting.
"Anyone doing anything on the staging server?" I asked, popping into the Hipchat room for the site in question. We have Hipchat rooms for each site we manage at Something Digital, which is an awesome practice.
"Hmm...what's going on?", I wondered.
Like most devs do, I turned to Google.
NOTE: I probably could've figured out the solution without Googling for it. However, I didn't even take a moment to think about how I might approach the challenge. I could easily go off on a tangent here about the tendency of devs to Google solutions and copy / paste answers from Stack Overflow without attempting to implement a solution on their own, which I, myself, am clearly guilty of, but I'll save that for a separate post.
"Search bash history for all users" I typed into Google.
The command that turned up was as follows
For me the term to search for was
Upon running the command I got an inordinate amount of results. What I needed to know was the most recent execution of
.bash_history does not include timestamps for each command by default. I can't say I agree with that behavior. If you're responsible for a server that is accessed by multiple users, it's probably a good idea to make sure bash commands are getting logged with timestamps. This article covers how to add timestamps to .bash_history globally. You could also default all users to z-shell, which can log timestamps to .zsh_history when its EXTENDED_HISTORY option is set. Finally, it may be worth looking into tools such as snoopy that offer several improvements over both
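As a minimal sketch of what enabling timestamps globally can look like: bash records a timestamp for each history entry once the HISTTIMEFORMAT variable is set to a non-empty value. The file name below is hypothetical, and the assumption is that the system sources scripts in /etc/profile.d for login shells, which is common but not universal.

```shell
# Hypothetical file: /etc/profile.d/history-timestamps.sh
# (assumes login shells on this system source /etc/profile.d/*.sh)

# When HISTTIMEFORMAT is set and non-empty, bash writes a "#<epoch seconds>"
# comment line above each command in the history file, and the `history`
# builtin displays entries using this strftime-style format.
export HISTTIMEFORMAT='%F %T '
```

Only commands run in sessions started after the change get timestamps; entries already in the history file stay as they are.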
Fortunately, timestamps were being recorded globally for .bash_history on this box. Running less ~/.bash_history looked like this...
The timestamps were recorded on the line above the command, so I needed to add the -B 1 flag to the grep command to also get one line before.
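To illustrate the effect of -B 1, here is a small self-contained sketch. The history contents are made up, the search term git checkout is assumed from the rest of this story, and the "#<epoch>" line format is what bash writes when HISTTIMEFORMAT is set.

```shell
# Build a small sample history file (made-up commands and timestamps).
sample_history="$(mktemp)"
printf '%s\n' \
  '#1500000000' 'ls -la' \
  '#1500000300' 'git checkout master' \
  '#1500000600' 'git pull' > "$sample_history"

# -B 1 prints one line of leading context before each matching line,
# which here is the timestamp line sitting directly above the command.
matches="$(grep -B 1 'git checkout' "$sample_history")"
printf '%s\n' "$matches"

rm -f "$sample_history"
```

Each match now comes out as a two-line group: the timestamp, then the command.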
Awesome. Now all I need to do is combine every two lines in the output, and finally sort.
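One way to do that pairing and sorting, as a minimal sketch rather than the exact command from this incident. The file names and history contents are invented for the demo, and it assumes every matching command has a timestamp line directly above it.

```shell
# Two made-up history files standing in for two users' .bash_history files.
workdir="$(mktemp -d)"
printf '%s\n' '#1500000600' 'git checkout develop' '#1500000900' 'ls' \
  > "$workdir/alice_history"
printf '%s\n' '#1500000000' 'git checkout master' '#1500000300' 'git pull' \
  > "$workdir/bob_history"

# -h            drop the file name prefix grep adds when given several files
# grep -v '--'  drop the "--" separators grep prints between context groups
# paste - -     join every two lines (timestamp + command) into one line
# sort          order by the leading epoch timestamp (a plain text sort is
#               enough while all timestamps have the same number of digits)
combined="$(grep -h -B 1 'git checkout' "$workdir"/*_history \
  | grep -v '^--$' \
  | paste -d ' ' - - \
  | sort)"
printf '%s\n' "$combined"

rm -rf "$workdir"
```

The last line of the sorted output is the most recent matching command. On a real server the glob would point at the users' actual history files, and reading them would typically require root.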
Beautiful. This command was so sexy I put it into the main Something Digital Hipchat room, created an alias in my .zshrc, tweeted about it, and now am even writing a blog post about it.
After all this buildup, I have to say, the result of the story is pretty anticlimactic. Or, I should say, it remains an unsolved mystery...or maybe I just thought I QA-ed it. Or maybe clearing the application cache upon deploy didn't kick in immediately. Anyway, for whatever reason, no one had run a git checkout since my most recent git pull. Who knows what happened, but I certainly had fun trying to figure it out.
Hi, I'm Max!
If you'd like to get in touch with me the best way is on Twitter.