When something goes wrong with an industrial or business process it is important, obviously, to know what went wrong, and in some detail. It is important to understand the steps that led to the failure so that it is possible to identify where the process should be changed or adapted so the failure does not occur again.

One of the last things any business needs is a failure mode that keeps repeating itself because its cause is not understood or found. That way lies, at the very least, the potential for poor customer experiences with the company, and damage to its reputation. At worst, it can mean catastrophic damage to the whole business.

In the early days of IT, when it was still something of a fringe activity run by pointy-headed rocket scientists in white coats and only adding marginal value to the business, taking a week or so to identify a problem and write a patch to get round it – actually re-writing the original code would take longer and might not appear until the next upgrade in a year or two’s time – would normally be an inconvenience. The business would, however, nearly always survive despite the problem.

Now, things are different…significantly different.

New communication behaviors create new expectations

The major change is that all of us are starting to become consumers of IT services rather than users of IT applications. It is already endemic in the world of smartphones and tablet computers, which increasingly use their prodigious power to be the super-portals to every kind of service across the globe. Those services can be sources of information or entertainment, applications that provide specific services and capabilities, places to store personal and/or business-related data, and communications services that allow business users in particular to be integral components of business processes, systems and environments that can – and do – span the world.

For such users the two most important services are ‘always on’ and ‘instant access’. Time – perhaps better stated as timeliness – is a major factor now in their operations and ways of working. Regardless of the hour of the day or the day of the week, if such people need their IT services to be available, ready to be consumed, wherever they may be, then the systems providing those services need to be there, ready to run.

Any process failure puts your business at stake

So if there are problems with the delivery of those services, business can be affected both directly and instantly. There can be few things worse for a company representative than to fail to deliver an up-to-the-second quotation or similar vital information in real time, especially if that is part of the ‘sales pitch’.

And these days, customer expectations associated with a brand are key business components, and they can be far too easily damaged. One of the surest ways to damage them is the failure of an online business application, especially one dealing directly with end-user consumers, to deliver when needed.

That is why the old ways of remediating such failures can no longer be the front-line option for any business. The bottom line is now simple: if the old ways of identifying and remediating a problem with a business process or service are the only solution available to you, your business may well be seriously damaged – and indeed may no longer exist – by the time that service is ready to function again.

Knowing a little, now, can make all the difference

It is now imperative for most businesses that problems and failures with business processes are identified as quickly as possible. And what is important here is the exact opposite of the ‘old ways’ of managing such process failures.

A little knowledge – and ideally a small amount of the right knowledge – can be a very powerful tool. If it can identify at least the component(s) within a process that are causing its failure or reduced efficiency, the IT team is well on the way, perhaps 75% of the way, to remediating the problem. They will know exactly where to look, and what was done to that component just prior to failure. If no changes were made, they can quickly check back for changes to the other components that interoperate with it.

The closer this can be brought to real time the better it will be, because that means remediation time can also be brought close to real time.

Have your system learn from the past

The key to this is the availability of business process monitoring systems that can effectively ‘learn’. This is not the currently well-hyped AI/machine learning approach, but it does exploit a key IT capability – the ability to very rapidly compare a current incident with past records and, quite often, pick out the appropriate solution and present it to IT staff. They can then either use it or, if their knowledge and experience suggest an alternative resolution for the identified problem, build and implement that.
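As a rough illustration of comparing a current incident with past records, the Python sketch below scores past incidents on the same component by simple word overlap and proposes the fix that was used for the closest match. All names here (`PastIncident`, `suggest_fix`, the sample services and fixes) are hypothetical, and this is not a description of how any particular monitoring product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastIncident:
    # Hypothetical record of an earlier, already resolved incident.
    component: str
    symptom: str
    fix: str


def _tokens(text):
    # Lower-cased set of words; a deliberately simple comparison, no machine learning.
    return set(text.lower().split())


def suggest_fix(component, symptom, history, min_overlap=0.5):
    """Return the fix of the most similar past incident on the same component, or None."""
    current = _tokens(symptom)
    best_fix, best_score = None, 0.0
    for past in history:
        if past.component != component:
            continue
        previous = _tokens(past.symptom)
        union = current | previous
        # Jaccard overlap: shared words divided by all distinct words.
        score = len(current & previous) / len(union) if union else 0.0
        if score > best_score:
            best_fix, best_score = past.fix, score
    # Only suggest a fix when the match is reasonably close.
    return best_fix if best_score >= min_overlap else None


# Illustrative, made-up history.
history = [
    PastIncident("order-service", "timeout calling payment gateway", "roll back to previous release"),
    PastIncident("order-service", "out of memory during batch import", "raise heap limit"),
    PastIncident("quote-service", "timeout calling payment gateway", "restart connection pool"),
]

suggestion = suggest_fix("order-service", "timeout calling payment gateway after deploy", history)
print(suggestion)
```

The suggestion is only a starting point: as the text notes, staff remain free to reject it and build a different resolution.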

This also gives the IT department a very fast, and automatable, first line of defense, which is to roll back to the previous implementation of that application component. As most application updates are about improving performance or capabilities rather than fixing earlier problems, such a rollback will rapidly provide a working environment once again.

Identify the change that caused the problem

This is what Integration Matters' nJAMS has been designed to provide to the world of Business Activity Monitoring. Its key difference is that it uses sensors that monitor specific applications and services. Its monitoring services set deployed application components and their performance against a timeline so that changes (one of the most common causes of problems and failures) can be rapidly compared with the time a problem or failure occurred.
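The idea of setting changes against a timeline can be sketched in a few lines of Python. The example below is an illustrative assumption, not nJAMS code or its API: `Deployment`, `suspect_changes`, the 24-hour window and the sample data are all invented for the sketch. It lists the components that were changed shortly before a failure, most recent change first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    # Hypothetical timeline entry: which component changed, to what version, and when.
    component: str
    version: str
    deployed_at: datetime


def suspect_changes(deployments, failure_time, window=timedelta(hours=24)):
    """Return deployments made within `window` before the failure, most recent first."""
    recent = [
        d for d in deployments
        if failure_time - window <= d.deployed_at <= failure_time
    ]
    return sorted(recent, key=lambda d: d.deployed_at, reverse=True)


# Illustrative, made-up timeline.
timeline = [
    Deployment("quote-service", "2.3.0", datetime(2024, 5, 1, 9, 0)),
    Deployment("order-service", "1.8.1", datetime(2024, 5, 6, 14, 30)),
    Deployment("billing-adapter", "4.0.2", datetime(2024, 5, 6, 21, 15)),
]
failure = datetime(2024, 5, 6, 22, 0)

suspects = suspect_changes(timeline, failure)
for d in suspects:
    # Print each suspect with the time elapsed between its deployment and the failure.
    print(d.component, d.version, failure - d.deployed_at)
```

The component changed closest to the failure comes first in the list, which gives staff an obvious first candidate for investigation or rollback.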

This is a very important element in short-circuiting the problems of the traditional, time-consuming route to finding the causes of failures. Remedial action can be started quickly because one of the first factors identified is that it is ‘this’ application component which is demonstrably generating ‘that’ problem.

…and the fixes that may already exist

That way, the IT staff know where to look and often have a pretty good idea of what solution will be needed. Sometimes, it will even be a fix they have already engineered and used before, which points at the potential to even automate some of the more common fixes that occur.

And best of all, it moves the traditional option for remedial action to a more appropriate place. There will still be problems that need lengthy investigation and recoding work to resolve. But the ability to identify the specific component causing a problem and roll back to the previous working version can keep the business rolling and the company alive while that essential work is completed.

About the Author: Hendrik Siegeln is co-founder and Managing Director of Integration Matters.