Direct Control Systems & Failure Analysis

Data Records

A chart from an old-fashioned but effective circular recorder.

While modern instrumentation makes it much easier to control a process, perversely it seems to make it more difficult to use the records for failure analysis. Years ago the controls of industrial processes were relatively simple, and a typical record would consist of circular or linear charts. Data collection was easy, and careful examination of the charts allowed events to be timed with surprising precision.

A linear chart from a gas turbine accident where close examination with a magnifier allowed events to be resolved to intervals of 6.5 seconds.

Fast forward to the modern era. In the control room of a modern factory, rows of computer consoles allow operators to look up parameters such as temperature, pressure, density and flow rate for hundreds of units spread over hectares of site. Not only can they do this in real time, but they can typically pull up graphs showing how these parameters have changed over the past hour or 24 hours, at a resolution of 1-2 seconds. In recent years we have carried out over a hundred machinery accident investigations, most of them in modern factories operated by direct control systems (DCS). In not one of these investigations has it been possible for us to analyse an accident to a resolution remotely approaching the 6.5 seconds achieved with the linear chart described above. The quantity of data is immense, and during normal operation no attempt is made to record it all on paper. Consequently the following problems arise in our investigations.

Data is Discarded!

A modern DCS room.

The data takes up so much space on the magnetic recording media that after a period of time it is erased. This may happen after a fixed time, or the data may be progressively summarized, losing resolution each time. Alternatively, the system may work on a first-in first-out basis, in which case the continuous error messages generated during an accident can fill the available capacity and push out the important data.
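Both mechanisms are easy to illustrate. The following is a minimal sketch, not a model of any particular DCS: the buffer capacity, the sample counts and the size of the spike are all invented for illustration. The first function shows how a flood of alarm messages can displace process data in a fixed-capacity first-in first-out log; the second shows how averaging one-second readings into one-minute values can all but erase a brief excursion.

```python
from collections import deque


def simulate_fifo_log(capacity, normal_samples, alarm_burst):
    """Return the entries that survive in a fixed-capacity FIFO log."""
    log = deque(maxlen=capacity)  # the oldest entry is dropped when full
    for i in range(normal_samples):
        log.append(("process", i))  # routine process readings
    for i in range(alarm_burst):
        log.append(("alarm", i))  # error messages generated during the accident
    return list(log)


def summarize(samples, factor):
    """Average consecutive groups of `factor` samples, reducing resolution."""
    groups = [samples[i:i + factor] for i in range(0, len(samples), factor)]
    return [sum(group) / len(group) for group in groups]


# Hypothetical figures: 900 process readings followed by 950 alarm messages
# written into a log that can only hold 1000 entries.
surviving = simulate_fifo_log(capacity=1000, normal_samples=900, alarm_burst=950)
process_left = sum(1 for kind, _ in surviving if kind == "process")
alarm_left = sum(1 for kind, _ in surviving if kind == "alarm")
print(process_left, alarm_left)  # 50 950

# Hypothetical two minutes of one-second readings with a single brief spike.
readings = [100.0] * 120
readings[60] = 400.0
minute_means = summarize(readings, 60)
print(minute_means)  # [100.0, 105.0]
```

In this invented example only 50 of the 900 process readings survive the alarm flood, and a one-second excursion to 400 appears in the summarized record as a one-minute average of 105.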

Data Cannot be Read!

If an investigator is given a disk with the data, how is it to be read? Most of the time it can only be read on the actual DCS or on a dummy system. Often, by the time the investigator visits the site, the plant is back in operation, and it is unrealistic to expect production to stop so that old data can be reloaded.

What Data is Required?

A steam turbine failure. The DCS data for the period relevant to the above failure was one gigabyte in size.

The records for a reasonable time period surrounding the accident could amount to gigabytes of data. To ask for all the records is so impractical at a modern factory that the request, though politely received, will probably be ignored.
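A rough estimate shows how readily the volume reaches gigabytes. The sketch below is illustrative arithmetic only: the number of units, the four parameters per unit, the one-second sampling interval and the 16 bytes per stored sample are assumptions chosen to match the orders of magnitude mentioned in this article, not figures from any particular DCS.

```python
def estimate_bytes(units, params_per_unit, sample_interval_s, bytes_per_sample, hours):
    """Estimate raw storage for every parameter of every unit over a period."""
    tags = units * params_per_unit
    samples_per_tag = int(hours * 3600 / sample_interval_s)
    return tags * samples_per_tag * bytes_per_sample


# Assumed whole-site figures: 300 units, 4 parameters each, sampled every second.
day_bytes = estimate_bytes(
    units=300, params_per_unit=4, sample_interval_s=1, bytes_per_sample=16, hours=24
)
print(f"{day_bytes / 1e9:.2f} GB per day")  # 1.66 GB per day

# The same arithmetic for an assumed 5 units over 2 hours.
focused_bytes = estimate_bytes(
    units=5, params_per_unit=4, sample_interval_s=1, bytes_per_sample=16, hours=2
)
print(f"{focused_bytes / 1e6:.1f} MB")  # 2.3 MB
```

Under these assumptions a single day for the whole site comes to well over a gigabyte, whereas a few units over a couple of hours amount to only a few megabytes.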

A preliminary failure analysis should be carried out as quickly as possible. Its aim is not necessarily to determine the precise cause, but rather to identify the sections of the plant that could have played a part in the accident and to ensure that the relevant information is printed out and the records copied.
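Once the preliminary analysis has named the suspect sections of plant, the request for data can be narrowed to specific parameters and a time window around the accident. The following is a minimal sketch, assuming the records can be exported as simple entries of time, tag and value. The record layout, the tag names, the readings and the `select_records` helper are all hypothetical and do not correspond to any particular DCS export format.

```python
from datetime import datetime, timedelta


def select_records(records, tags, incident_time, before, after):
    """Keep only records for the chosen tags within a window around the incident."""
    start = incident_time - before
    end = incident_time + after
    return [r for r in records if r["tag"] in tags and start <= r["time"] <= end]


# Placeholder incident time and invented readings, for illustration only.
incident = datetime(2000, 1, 1, 12, 0, 0)
records = [
    {"time": incident - timedelta(hours=3), "tag": "TURBINE_BEARING_TEMP", "value": 71.0},
    {"time": incident - timedelta(minutes=10), "tag": "TURBINE_BEARING_TEMP", "value": 88.5},
    {"time": incident - timedelta(minutes=5), "tag": "BOILER_FEED_FLOW", "value": 12.3},
    {"time": incident + timedelta(minutes=1), "tag": "TURBINE_BEARING_TEMP", "value": 140.2},
]

relevant = select_records(
    records,
    tags={"TURBINE_BEARING_TEMP"},
    incident_time=incident,
    before=timedelta(hours=1),
    after=timedelta(minutes=30),
)
for r in relevant:
    print(r["time"].isoformat(), r["tag"], r["value"])
```

A request framed this way, naming the tags and the window, is far more likely to be met than a request for all the records.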

When DCS Data is Unavailable

Apart from the automatic procedures built into DCS computers, in our experience the main reason information is discarded is simply the pressure applied by management to get the factory working again. If the investigator arrives and finds that the DCS data has gone, it is still possible that, shortly after the accident, one or other of the engineers printed the relevant information and has kept it, without management necessarily being aware.

A plot from data recorded manually each hour at a lime kiln, documenting the rise in kiln temperature, culminating in the collapse of the refractory lining.

If the electronic data records are truly gone and no one thought at the time to print out the relevant data, most factories still keep a manual record of key parameters. These are recorded at intervals, typically once an hour, but many accidents have their origins over that sort of time frame, so such records can still be valuable.