Triage

What I am about to discuss may seem self-evident at first, but the fact is, even really bright people can get distracted by faux emergencies and lose sight of the big picture in project (and crisis) management.

This is where triage comes into play.

Over the last year, as people have moved up in our company and other people have joined the team, the concept of what constitutes a high priority bug has been tossed around quite a bit. There have been some rather passionate conversations regarding whether or not a particular bug is an immediate priority and what should be done about it.

The conversations would typically go as follows:

Engineer: “I was testing today and I realized that user home pages are broken. You should be able to type http://imp.fm/username and go straight to the user’s page, but right now you get a 404.”

Me: “Ah, okay, open a bug on it and get it captured.”

Engineer, pressing a bit: “Well, this is really important. If people can’t give out their home page links to their fans, they’re going to have to say something like ‘yeah, check out my page on imp. The address is http://imp.fm/viewprofile.php?id=80.’ People will get confused and no one will use our site!”

Me: “Well, yeah, we definitely have to fix it, and we will, but this isn’t really heinous yet, you know? Open a bug on it, give it a low priority, and we’ll get it fixed over the next couple of weeks.”

Engineer, now a bit flustered: “I think that this is heinous! This feature is going to be a big deal to our users. If we don’t make this as simple as possible to remember, people will just stick with their MySpace pages!”

Me: “I understand, but we haven’t even gone live yet. Just open the bug and let’s get back to dealing with the ‘server bursts into flames on startup’ issue, okay?”

The conversation would continue a bit, with the engineer getting more and more frustrated, feeling like they weren’t being listened to or taken seriously, and with me getting frustrated as well, because I didn’t have time to explain triage, actual heinousness, or Maslow’s Hierarchy of Needs right then and there in the hallway. Eventually things would end with a directive from me to open the bug as requested, get back to the immediate task at hand, and deal with the bug later.

Bug Priorities Change Throughout the Development Process

Rands has an excellent post on heinous bugs. In his own words, a heinous bug is “the type of bug you will not ship with. As a responsible parent for your product, you think you will ship with no bad bugs and that’s where you’re loopy. You’re going to ship with tons of bad bugs. More than you’ll be comfortable with. You’re, however, not going to ship with Heinous bugs because these are the bugs which, if found AFTER YOU SHIP, would stop the presses. People would run around the building screaming. Much money would be lost and heads would roll.”

Heinousness is undeniable, but it is also a relative thing, and the relative heinousness of a particular bug will change over time as other bugs are fixed and product release draws closer. A bug that was relatively unimportant two weeks ago may now be a top priority on which most of the team’s resources are focused.

An Example

As an example, consider a hypothetical case in which a product has 14 known major bugs (see below).

bugs-1.png

At this particular time there are three bugs getting most of the attention:

  • Bug 10 - “product causes server to burst into flame on application startup.”
  • Bug 7 - “product induces Snow Crash in users on login error.”
  • Bug 14 - “exiting the application erases user’s hard drive.”

Bug 10 is the one currently looming in my mind. Bug 7 (which reduces users to babbling idiots) is definitely worse, but until we can actually get the server to run, this off-nominal case is pretty much academic. Number 14 is also one I’m losing sleep over, but it’s going to have to wait until we can actually get the program to run without lobotomizing anyone before it will get fixed.

The engineer’s reported bug, number 5 (“user website names broken”), is so far down in the noise that it’s barely noticeable, which is why I wanted it captured and ignored for the present. Its relative importance compared to other issues is nil and we don’t have any users yet to complain about it. It’s safe to defer for the time being.

Pass Two

Eventually we find the “[self detonate_power_supply:true]” call and comment it out. Now the bug landscape looks something like this:

bugs-2.png

Bug 7 is still causing people to lapse into semi-catatonic baby talk, and is currently the biggest outstanding issue we have to deal with. A few days later, Garret Cole removes the Asherah virus from the login failure notification message, effectively closing the bug and making the world (especially the office) a slightly better place.

Pass Three

Suddenly, bug 14 stands out dramatically:

bugs-3.png

At this point, what becomes plainly evident is that it’s the priority of your worst bug that determines the relative importance of everything else. It is also evident that our team has some unusual coding practices, and should perhaps stop attending Cthulhu cult singalongs.

This really just comes back to Maslow’s hierarchy of needs (again discussed deftly by the esteemed Mr. Rands here). All major bugs must be normalized to the worst bug you are currently facing and considered within the current context in order to be prioritized correctly. Once the worst bugs have been fixed and the fires extinguished, our sense of proportion begins to change to reflect the new crisis level. At the new level, bugs that were once considered tertiary considerations become important, and the importance of previously unimportant bugs begins to grow.
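The normalization idea can be sketched in a few lines of Python. This is only an illustration: the severity scores are invented for the example (the post never assigns numbers to bugs), and `normalize` is a hypothetical helper, not part of any real bug tracker.

```python
def normalize(severities):
    """Scale each open bug's severity to the worst open bug (worst == 1.0).

    Assumes every score is a positive number.
    """
    if not severities:
        return {}
    worst = max(severities.values())
    return {bug: score / worst for bug, score in severities.items()}


# Hypothetical raw severity scores, keyed by bug number (made up for illustration).
open_bugs = {10: 1000, 7: 800, 14: 600, 3: 60, 4: 50, 6: 40, 5: 10}

# While the server is still bursting into flames, bug 5 barely registers.
before = normalize(open_bugs)

# Once bugs 10, 7, and 14 are closed, the same raw score looms much larger.
remaining = {bug: s for bug, s in open_bugs.items() if bug not in (10, 7, 14)}
after = normalize(remaining)

print(f"bug 5 before: {before[5]:.2f}, after: {after[5]:.2f}")
```

Nothing about bug 5 itself changes between the two calls; only the worst open bug it is measured against does.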

Pass Four

With bug 14 out of the way, the relative priority of the other bugs grows considerably:

bugs-4.png

Let’s see, bug 3 (“Users can’t upload”), bug 4 (“Downloads broken”), and bug 6 (“User uploaded files not visible”) are starting to look a lot more relevant. Since what we’re working on is effectively a content management system, these are starting to seem pretty important. Bug 5 is still pretty much lost in the noise at this point.

Pass Five

It takes about a week, but the file management code is eventually fixed. Most of the pr0n used by the software team to test the file distribution system has been removed at the request of our QA team, and we’re ready to deal with what’s left in the bug tracker.

bugs-5.png

At this point bug 5 is becoming pretty damn relevant: it’s finally on the radar relative to the worst case bug in the system, and we’re also starting to take on beta testers that are going to want to give out their personal site addresses to people. The bug gets assigned and handled in short order.
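The whole sequence of passes can be read as a simple loop: fix the worst open bug, re-measure everything against the new worst bug, and repeat until the bug you care about is on the radar. The sketch below assumes made-up severity scores and an arbitrary "on the radar" threshold of 0.5; neither comes from a real tracker, and `fixes_before_visible` is a hypothetical name.

```python
def fixes_before_visible(severities, bug, threshold=0.5):
    """Return the bugs fixed, worst first, before `bug` reaches `threshold`
    relative to the worst open bug.

    Assumes positive scores and a threshold no greater than 1.0.
    """
    remaining = dict(severities)
    fixed = []
    while remaining[bug] / max(remaining.values()) < threshold:
        worst = max(remaining, key=remaining.get)
        fixed.append(worst)
        del remaining[worst]
    return fixed


# Hypothetical raw severity scores, keyed by bug number (made up for illustration).
scores = {10: 1000, 7: 800, 14: 600, 3: 60, 4: 50, 6: 40, 5: 10}

order = fixes_before_visible(scores, bug=5)
print(order)
```

With these invented numbers, the flaming server, the Snow Crash login, the disk eraser, and the file management bugs all get fixed before bug 5 crosses the threshold, which mirrors the order of the passes above.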

The Point

The point here is not that bugs like number 5 aren’t important. The point is that bugs must also be taken in the context of the current system state in order to be prioritized properly. Getting rid of heinous bugs is like smoothing out a piece of wood with increasingly fine-grained sandpaper. Bad triage practice is like taking the finest-grit sandpaper you have to a tree trunk that still has bark on it - until you get rid of the really rough spots, you can polish forever without results.

Another point, brought up to me by Garrett Kelly at the time this was being written, is that the choices one makes during the triage process are not always obvious unless taken in the context of your basic goals.

In the example cited by Garrett, he pointed out that the military was faced with a short supply of penicillin during World War II. Forced to decide between giving the drug to the seriously injured or to soldiers with venereal disease, the military chose to administer the drug to the soldiers with VD, because they were still able to fight.

I personally would have expected it to go to the wounded, but I wasn’t thinking of the Army’s primary goal - winning the war. Giving penicillin to someone who wasn’t going to be able to fight anymore wouldn’t make sense from their point of view.

3 Responses to “Triage”

  1. Brain Murmurs » CEO Blog: Triage in Project Management Says:

    [...] CEO Daniel Pasco discusses the fractal nature of heinousness in project management [...]

  2. Around the web | alexking.org Says:

    [...] Soft Arts - Triage [...]

  3. Soft Arts » Blog Archive » Things that I like to keep in mind Says:

    [...] This seems particularly relevant considering Andy Ihnatko’s recent, scathing review of Mars Edit, developed by Red Sweater Software, a one-man operation run by Daniel Jalkut. Many of the points Ihnatko makes are reasonable, but he doesn’t seem to consider whether or not the missing features he wants to see are eliminated on purpose, or simply haven’t made it in yet due to triage. [...]
