All software contains bugs. Period. Some bugs require immediate attention once discovered, and some aren’t worth fixing at all. How you tell the two apart makes all the difference, both inside and outside your Engineering department.
A Framework for Classifying Bugs
Bugs may be classified by severity and scope in order to assign a priority.
- Severity is a measurement of a bug’s impact on the usefulness of the software. A high-severity bug significantly curtails the user’s ability to derive value from the system. A moderate-severity bug limits a user’s access to a major feature. A low-severity bug is a minor nuisance that does not block the use of any feature.
- Scope is a measurement of how widely the bug’s impact is felt. This could be the number of users affected, or it could be the total spend of all the impacted customers. However you define it, ensure that it is measurable.
- Priority is used to set expectations with customers and between departments as to when an issue will receive attention. Essentially, a priority level is shorthand for a service level agreement and the actions the engineering team takes in response to the bug.
One way to define priority levels is as follows:
A few notes on using the number of users affected as your measurement of scope:
- If you think a bug affects all users, treat it as such until you discover otherwise. If you discover that, in fact, the bug did not affect all users, you may downgrade the bug as appropriate.
- If you can reproduce a bug, treat it as though at least a subset of users (some) are affected, even if only a single user has reported the issue.
- If you cannot reproduce a bug that was reported by a single user, treat it as affecting only that user.
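The three rules of thumb above can be sketched as a small function. This is only an illustration: the `Scope` enum, the `estimate_scope` name, and its parameters are hypothetical, and the branch for an unreproducible bug with several reporters is my own assumption, since the notes above only cover the single-reporter case.

```python
from enum import Enum


class Scope(Enum):
    """Hypothetical scope buckets; the names are illustrative assumptions."""
    SINGLE = "single user"
    SOME = "some users"
    ALL = "all users"


def estimate_scope(suspected_all_users: bool, reproducible: bool,
                   reporter_count: int = 1) -> Scope:
    """Apply the rules of thumb for a users-affected measure of scope."""
    if suspected_all_users:
        # Assume the worst until investigation shows otherwise; downgrade later.
        return Scope.ALL
    if reproducible:
        # A reproducible bug likely affects more than the one person who reported it.
        return Scope.SOME
    if reporter_count > 1:
        # Assumption (not from the notes above): several independent reports
        # count as a subset of users even without a reproduction.
        return Scope.SOME
    # Not reproducible and reported by one user: treat it as a single-user bug.
    return Scope.SINGLE
```

Calling `estimate_scope(False, True)` for a reproducible bug reported by one user returns `Scope.SOME`, which matches the second note.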
Service Level Agreements
One way to define service level agreements is as follows:
The highest-priority bugs deserve immediate attention, even after hours. The next level down should receive attention no later than one business hour after being reported. And so on, down to the lowest priority level, where an issue may not receive attention for months.
Note that “receives attention” means that the Product team will prioritize and the Engineering team will look into the bug within the specified time frame. It does not promise resolution within that time. Some bugs are harder to fix than others.
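To make "one business hour" concrete, here is a minimal sketch of how an attention deadline might be computed for the top two levels. The 9:00 to 17:00, Monday to Friday business hours are an assumption, as are the function names and the use of "P0" for the highest level and "P1" for the next one down. Holidays and time zones are ignored, and lower levels are left out because their windows depend on your own SLA definitions.

```python
from datetime import datetime, timedelta
from typing import Optional

OPEN_HOUR, CLOSE_HOUR = 9, 17  # assumed business hours, Monday to Friday


def next_business_moment(ts: datetime) -> datetime:
    """Return ts if it falls within business hours, else the next opening time."""
    if ts.weekday() < 5 and OPEN_HOUR <= ts.hour < CLOSE_HOUR:
        return ts
    opening = ts.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    if ts.weekday() >= 5 or ts.hour >= CLOSE_HOUR:
        opening += timedelta(days=1)  # today's window is over or it is a weekend
    while opening.weekday() >= 5:  # skip Saturday and Sunday
        opening += timedelta(days=1)
    return opening


def add_business_time(start: datetime, amount: timedelta) -> datetime:
    """Add an amount of time, counting only time inside business hours."""
    cur = next_business_moment(start)
    remaining = amount
    while True:
        closing = cur.replace(hour=CLOSE_HOUR, minute=0, second=0, microsecond=0)
        if cur + remaining <= closing:
            return cur + remaining
        remaining -= closing - cur  # use up the rest of today
        cur = next_business_moment(closing)  # carry over to the next business day


def attention_due(priority: str, reported_at: datetime) -> Optional[datetime]:
    """Deadline for a bug to receive attention (not for it to be resolved)."""
    if priority == "P0":
        return reported_at  # immediate, even after hours
    if priority == "P1":
        return add_business_time(reported_at, timedelta(hours=1))
    return None  # lower levels: defined by your own SLA table
```

With these assumptions, a P1 bug reported at 16:30 on a Friday is due for attention at 9:30 the following Monday, while a P0 bug reported at the same moment is due right away.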
Adjusting Service Levels Based on Circumstances
For the vast majority of bugs, the default service level should be fine. But, from time to time, you may encounter an issue that requires more urgent attention than the default SLA affords. Specifically, the Product Manager may decide (in consultation with Customer Service/Support) to prioritize a bug sooner than the SLA would warrant, based on the presence of one or more aggravating circumstances, including but not limited to:
- The bug affects our relationship with a strategic customer.
- The bug affects the renewal of a large account.
Note that aggravating circumstances only affect a bug’s SLA. They do not change a bug’s priority level, since under many compliance regimes priority levels govern the type of response Engineering is required to undertake. This matters because the additional overhead associated with P0 and P1 bugs means that Engineering can actually resolve more P2 bugs than P0 and P1 bugs in the same amount of time.
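One way to keep this separation honest in a bug tracker is to store the expedited SLA next to the priority instead of overwriting it. The sketch below is purely illustrative: the `Bug` class, its fields, and the `expedite` function are hypothetical names, and the SLA is modeled as a simple time window.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class Bug:
    title: str
    priority: str                       # e.g. "P2"; expediting never changes this
    default_sla: timedelta              # attention window implied by the priority
    sla_override: Optional[timedelta] = None
    aggravating_circumstances: List[str] = field(default_factory=list)

    @property
    def effective_sla(self) -> timedelta:
        """The override if one was set, otherwise the default for the priority."""
        if self.sla_override is not None:
            return self.sla_override
        return self.default_sla


def expedite(bug: Bug, new_sla: timedelta, reason: str) -> None:
    """Tighten a bug's SLA for an aggravating circumstance; priority is untouched."""
    if new_sla >= bug.effective_sla:
        raise ValueError("an expedited SLA must be shorter than the current one")
    bug.sla_override = new_sla
    bug.aggravating_circumstances.append(reason)
```

For example, a P2 bug with a five-day default window can be expedited to one day because it threatens a renewal, and it still follows the lighter P2 response process.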
Classifying bugs well is an important attribute of a healthy engineering culture. Using this small framework enabled me to bring sanity to an otherwise completely random process at one startup. In the end it made everyone happier, knowing what the expectations were and how to communicate them clearly, both internally and with customers.