Stop Wasting Time Troubleshooting Technology

What I learned about techniques for faster problem solving

I was recently introduced to the idea of “heuristics” as a valid(?) problem-solving technique.

Common Faults — A Heuristics Approach

The concept itself is quite simple — for any given fault, compile a list of known root-causes and/or solutions, and sort your list by decreasing frequency.

As a demonstration (with made-up data), if your problem is that the light(s) are not illuminated, the possible causes might be:

50% — The power is off at the light switch
25% — The bulb is blown
12% — The circuit breaker has tripped
6% — Someone left the main power switch off in the switchboard
3% — The power is off at the street
1.5% — The bulb is not screwed in properly
0.8% — The switch is faulty
0.4% — There’s a loose wire at the back of the light socket
0.2% — A tree fell on / someone dug through your property supply line
0.1% — Something chewed through the cabling in your ceiling
0.05% — The circuit breaker has failed open circuit
0.02% — Someone put a nail in the wall through your wiring
0.01% — Someone installed a new oven on your light circuit
0.005% — Something more and more random…
0.002% — Etc., etc.

So now, armed with our heuristics data we can try and fix the problem, starting with the most likely root-cause.

If it turns out that the bulb was blown, we only need to test the first two items in our checklist before we find the solution. Easy! (Or is it?)
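The checklist idea can be sketched in a few lines of Python. This is only an illustration using the made-up frequencies from the list above; the names `CAUSES`, `ranked_checklist` and `checks_until_found` are my own and not from any library.

```python
# Made-up frequencies from the list above (percent of past incidents).
# Deliberately entered out of order to show the sorting step.
CAUSES = [
    ("bulb is blown", 25.0),
    ("power is off at the light switch", 50.0),
    ("circuit breaker has tripped", 12.0),
    ("main power switch is off", 6.0),
    ("power is off at the street", 3.0),
]


def ranked_checklist(causes):
    """Return the causes sorted by decreasing frequency."""
    return sorted(causes, key=lambda item: item[1], reverse=True)


def checks_until_found(causes, actual_cause):
    """Count how many checklist items are tested before actual_cause is hit.

    Returns None if the cause is not on the list at all.
    """
    for position, (name, _freq) in enumerate(ranked_checklist(causes), start=1):
        if name == actual_cause:
            return position
    return None


print(checks_until_found(CAUSES, "bulb is blown"))  # 2
```

A fault that is missing from the list returns `None`, which is the situation where a checklist on its own runs out of ideas.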

My personal preference for problem solving is a logic-based approach. While the heuristics-based approach works really well for simple problems, it can (more on this later) start to resemble more of a guess-and-check methodology when you start dealing with high-tech systems, with more and more possible faults.

Even with our simple lighting example, if someone did put a nail through our wire (twelfth on our list), that might also cause the circuit breaker to trip, so we would get to the third solution in our list (a tripped circuit breaker), thinking we solved the problem, only to realise that the circuit breaker would trip again right away, as we hadn’t really found the true root cause.

Alternative — Splitting a Problem Into Parts

A logic-based approach, on the other hand, treats the problem like an algebraic equation: you rearrange it to solve for ‘x’, where ‘x’ is your fault. Then, as you methodically substitute measured values for all the other unknowns, you are left with a defined fault.

If algebra wasn’t your favourite subject at school, that last bit probably left you thinking — “huh, what?”

The key here is that we need a way of getting measurable feedback. In our light scenario, we could use a bulb that we know works as a test instrument, and place it at strategic test-points around the circuit to establish our baseline measurements.

In reality, trailing a fragile test bulb around might not be practical, so instead of a test bulb we could use a “multimeter” set to “volts” (AC or DC depending on the circuit), and now we have a reliable test instrument at our disposal.

Now, using our logic-based approach and our trusty test equipment, we can begin to break the problem down into parts (think back to our algebra expression), solving the largest unknowns first, and then narrowing the problem down (reduction) until all we are left with is a single answer.

If we test the lightbulb end of the circuit, and we get a reading on our multimeter, then straight away we can say everything is working fine through to this point, replace the bulb, and problem solved!

At the other end of the scale, if we don’t get a reading at the lightbulb end, we can quickly go to the meter box and (with the correct training) test for a reading there. If we find it is working there, we know the fault is somewhere between the two points.

From here you could go somewhere in the middle, say the light switch (circuit design dependent), and depending on the reading, we can then determine if the fault is upstream, or downstream of that mid-point.

In a 100 m circuit, after just seven tests, we could have narrowed the fault down to within a few metres.

In this illustration, we’re splitting the circuit into parts (approximately half) each time. This “half-split” methodology works well in “straight-line” circuits, where we are looking for a fault at a particular physical location.
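Here is a rough simulation of the half-split walk-through, assuming a single open-circuit fault on a straight 100 m run with the supply at 0 m and the bulb at 100 m. The function name `half_split`, the simulated meter `has_reading` and the fault position of 63 m are all made up for the example.

```python
def half_split(length_m, fault_at_m, max_tests):
    """Narrow down one open-circuit fault using at most max_tests readings.

    Returns (span, tests_used), where span is the (good, dead) pair of
    positions that bracket the fault, or None if no bracket was needed.
    """
    def has_reading(point_m):
        # Simulated multimeter: voltage is present only upstream of the fault.
        return point_m < fault_at_m

    tests = 1
    if has_reading(length_m):
        # Reading at the bulb end: the wiring is fine, so replace the bulb.
        return None, tests

    tests += 1
    if not has_reading(0.0):
        # No reading at the supply end: the fault is upstream of this circuit.
        return None, tests

    good, dead = 0.0, float(length_m)  # last live point, first dead point
    while tests < max_tests:
        mid = (good + dead) / 2
        tests += 1
        if has_reading(mid):
            good = mid   # fault is downstream of the mid-point
        else:
            dead = mid   # fault is upstream of the mid-point
    return (good, dead), tests


span, tests = half_split(100, 63, 7)
print(span, tests)  # (62.5, 65.625) 7
```

Two of the seven tests are spent on the two ends, which leaves five halvings: 100 m divided by 2 five times is about 3 m, which is where "within a few metres" comes from.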

(The concept of breaking a problem down into smaller parts is not limited to linear problems, however: logic approaches work just as well for more complex, branched systems too.)

Comparison

How does a logic-based approach compare to using a heuristic-based approach?

Well, the answer is — it depends. If you have been following along until now, we used seven tests in our logic approach to narrow the fault down to within a few metres. If we limited ourselves to just seven tests using the heuristics approach, we would have ruled out several of the most common faults, but if the fault wasn’t one of those seven, we’re stuck.
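To put a rough number on the heuristic side of that comparison, we can add up how much of the made-up frequency list is covered by the first seven checks. The figures are the invented percentages from earlier, and `coverage_after` is just a helper name for this sketch.

```python
# Invented percentages from the earlier checklist, most common first.
FREQUENCIES = [50, 25, 12, 6, 3, 1.5, 0.8, 0.4, 0.2, 0.1,
               0.05, 0.02, 0.01, 0.005, 0.002]


def coverage_after(frequencies, checks):
    """Sum the share of faults covered by the first `checks` list items."""
    ranked = sorted(frequencies, reverse=True)
    return sum(ranked[:checks])


covered = coverage_after(FREQUENCIES, 7)
print(f"Covered by seven checks: {covered:.1f}%")        # 98.3%
print(f"Left unresolved:         {100 - covered:.1f}%")  # 1.7%
```

With this invented data, seven checks catch the great majority of faults, but the remaining couple of percent are exactly the uncommon cases where the checklist leaves us stuck.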

In a situation where you have an uncommon fault, and you must find the root-cause no matter what (this is the area I operate in), a logic-based approach will give you an outcome, in a timely fashion, without fail.

On the other hand, a heuristics approach alone doesn’t guarantee an outcome. If you are dealing with an uncommon fault, the further you move down the list, the more you will feel like you are operating under a guess-and-check methodology, which is not an efficient way to isolate the problem.

My conclusion is that a hybrid approach, combining the guaranteed accuracy of a logic approach, together with high-speed resolution for heuristically common faults, will give the best overall outcome.
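One way to picture the hybrid is: try a handful of the most common causes first, then hand over to the systematic search if none of them pans out. The sketch below is only one possible shape for this. `hybrid_diagnose`, the `max_quick` cut-off and the stand-in check functions are all assumptions made for the example.

```python
def hybrid_diagnose(quick_checks, fallback, max_quick=3):
    """Try the most common causes first, then fall back to a logic search.

    quick_checks: list of (name, frequency, check_fn), where check_fn()
                  returns True if that cause is confirmed.
    fallback:     callable that runs the systematic search and returns
                  whatever it finds.
    """
    ranked = sorted(quick_checks, key=lambda item: item[1], reverse=True)
    for name, _freq, check in ranked[:max_quick]:
        if check():
            return ("heuristic", name)
    return ("logic", fallback())


# Stand-in checks: here none of the common causes turns out to be the fault.
quick_checks = [
    ("bulb is blown", 25, lambda: False),
    ("power is off at the light switch", 50, lambda: False),
    ("circuit breaker has tripped", 12, lambda: False),
]

# Placeholder for a real half-split search of the circuit.
result = hybrid_diagnose(quick_checks, fallback=lambda: "fault located by half-split")
print(result)  # ('logic', 'fault located by half-split')
```

Where to set the cut-off is a judgement call: it depends on how cheap the quick checks are compared with getting the test equipment out.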

Next time you find yourself using your favourite logic-based technique, remember it might pay to try a handful of common faults from the top of your heuristic list first. And if you are a heuristics-only kind of person, remember to employ some logic approaches too.

If I had to pick one, I would still choose a logic-based approach. However, there’s no reason to use just one approach. After all, there’s no point going straight for the multimeter if all you have to do is reset the circuit breaker!
