Preventing Enterprise Software Failures

by Abayomi Oloko

Businesses often run on database software applications and, in many cases, the software controls a machine or a process which provides a value or a service intended to aid the organization. Robots run the assembly lines of major automobile manufacturers and software packages communicate task-based instructions to those robots. Banking software helps financial institutions effectively track transactions of their customer accounts from various locations and reconcile those activities in a central database. Telecommunication firms use software to track the duration of calls and subsequently determine billing actions to take on user accounts. Hospital databases help keep track of patient medical history so future diagnoses are more successful and doctors can track trends to aid in the prevention of further ailments. As useful as the contributions of software have been to enterprises, there have also been numerous reported failure cases with far-reaching negative consequences:

In June 2010, news of the death of Janice Hall in Minnesota broke in the media. Janice lost her life because the software controlling an oxygen delivery system in the ambulance conveying her to the hospital suddenly failed, forcing the shutdown of the oxygen supply.

In October 2010, software failures kept Brazil’s 675MW Angra I nuclear plant from coming back online for almost one week after it was shut down for planned maintenance.

In March 2011, Japan’s financial giant Mizuho Bank experienced serious problems with its banking software: ATMs malfunctioned, deposits could not be processed, foreign exchange transactions halted, and firms using the bank for employee salary payments could not pay their staff. The failure affected an estimated 1 million ATM users.

In March 2011, the Commonwealth Bank of Australia was forced to shut down its entire network of ATMs after a problem with its banking systems software caused the ATMs to overpay customers. Word got around quickly that the ATMs were issuing “free” money, allowing customers to withdraw more money than they had in their accounts (or than the machines indicated), and many people took advantage.

In October 2011, a switch failure in the infrastructure of Research In Motion (RIM) caused an outage of BlackBerry services in Europe, the Middle East, Africa, and South America for three to four days.

By August 2011, Honda had recalled 2 million cars due to a software failure in the transmission control modules of some of its models. Among the recalled models were the 2001 and 2002 Accord, 2001 to 2003 Civic, 2003 CR-V, 2003 Pilot and 2003 Acura 3.2 TL. GM also recalled about 50,000 units of the Cadillac SRX crossover SUV in June 2011 because of a software failure that could prevent the deployment of airbags for passengers sitting in the right rear seat in the event of an accident.

Software failures happen mostly because of programming inconsistencies, generally referred to as software bugs. A software bug is an error, flaw, mistake, or failure in a software program that makes it behave in an unexpected way or give an unexpected result.

Some Reasons for Enterprise Software Failure

Human error in application programming is the primary reason for the failure of enterprise software. Most of these failures are due to oversights during the software development life cycle (SDLC). In programming, exception handling deals with cases in which an “illegal operation” is carried out by the user. A typical oversight occurs during coding when the programmer does not consider extreme cases of user input. For example, the exception handling code may not correctly resolve the error that arises when a user enters a hyphen along with a name into a field meant only for letters. If the code does not treat this as an exception or error, it may interpret the hyphen as a mathematical operator, and the result will be unexpected. Some failures are also related to sign errors. For instance, some logic may require comparing two numbers, and the programmer may erroneously use a “>” sign in the code instead of a “<” sign.
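The hyphen case can be illustrated with a minimal Python sketch. The function names parse_letters_only and read_name_field are hypothetical, and the rule that the field accepts letters only is taken from the example above; a real name field would likely need a more permissive rule.

```python
from typing import Optional


def parse_letters_only(raw: str) -> str:
    """Return the trimmed value, or raise ValueError if it is not letters only.

    Hypothetical helper: the letters-only rule mirrors the example in the text.
    """
    value = raw.strip()
    # str.isalpha() is False for empty strings and for any non-letter,
    # so "Mary-Jane" is rejected instead of being passed on silently.
    if not value.isalpha():
        raise ValueError(f"field accepts letters only, got {raw!r}")
    return value


def read_name_field(raw: str) -> Optional[str]:
    """Handle the exception at the input boundary and report the problem."""
    try:
        return parse_letters_only(raw)
    except ValueError as error:
        print(f"Input rejected: {error}")
        return None
```

Calling read_name_field("Mary-Jane") reports the problem and returns None, while read_name_field("Mary") returns the cleaned value. The point of the sketch is that the unexpected input is detected explicitly rather than interpreted in some unintended way further along.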

Software Failure Prevention Techniques

The task of preventing software failure can effectively be achieved at the time of conception and during the software engineering process. It is better for developers to put mechanisms and controls into place during the design of the architecture and the coding operation from project inception. Below are some of the techniques to use for the prevention of software malfunction:

Code Analysis

This involves inspecting the program text beyond the compiler’s abilities to identify potential problems. The programming code can be analysed using tools that have capabilities for both static and dynamic analysis.

Static Code Analysis is the analysis of software programming code without executing the programs built from that code.
Dynamic Code Analysis is the analysis of software programming code by executing the programs built from that code on a real or virtual processor.
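
The difference between the two can be sketched in Python using the standard library's ast module. This is only an illustration: the sample function ratio and the choice to flag bare "except:" clauses are assumptions made for the example, not a description of any particular analysis tool.

```python
import ast
from typing import List

# Hypothetical code under analysis: the bare "except:" hides every error.
SOURCE = '''
def ratio(a, b):
    try:
        return a / b
    except:
        return 0
'''


def find_bare_excepts(source: str) -> List[int]:
    """Static analysis: inspect the syntax tree without running the code."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


def run_ratio(source: str, a, b):
    """Dynamic analysis: execute the code and observe its actual behaviour."""
    namespace = {}
    exec(source, namespace)
    return namespace["ratio"](a, b)
```

Here find_bare_excepts(SOURCE) reports the line of the suspicious handler without executing anything, while run_ratio(SOURCE, 1, 0) shows the consequence at run time: the division by zero is silently turned into 0.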

Programming Techniques

Software failures often lead to problems in the consistency of internal data within the running program. A good programming technique is to check this data within the code and provide a means of handling errors that may emanate from such inconsistency. The code may be required to halt the operation of the program and give a message to the user on the specific problem.
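
As a minimal sketch of such a check, assume a hypothetical banking routine that stores amounts as whole cents. The names InconsistentDataError, verify_balance, and post_statement are invented for illustration.

```python
from typing import List


class InconsistentDataError(Exception):
    """Raised when internal data fails a consistency check."""


def verify_balance(opening_cents: int, transactions_cents: List[int],
                   recorded_cents: int) -> None:
    """Check that the recorded balance agrees with the transaction history."""
    expected = opening_cents + sum(transactions_cents)
    if expected != recorded_cents:
        raise InconsistentDataError(
            f"balance mismatch: expected {expected} cents, "
            f"found {recorded_cents} cents"
        )


def post_statement(opening_cents: int, transactions_cents: List[int],
                   recorded_cents: int) -> str:
    """Halt the operation with a specific message if the data is inconsistent."""
    try:
        verify_balance(opening_cents, transactions_cents, recorded_cents)
    except InconsistentDataError as error:
        return f"Operation halted: {error}"
    return "Statement posted"
```

With an opening balance of 1000 cents and transactions of 500 and -200, a recorded balance of 1300 is posted normally, while a recorded balance of 1400 halts the operation with a message naming the mismatch.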

Programming style

This is a set of guidelines used when writing the programming code for software. If programmers follow a particular programming style, they will be more likely to read and understand source code conforming to that style, which helps avoid the introduction of errors or mistakes. Common features of a good programming style are a clean appearance, code indentation, vertical alignment of similar variables, spacing, and the creative use of tabs. The naming conventions for functions, modules, and variables are also very important parts of a good programming style. For example, intMyAge could be an integer variable that is used to hold the value of a user’s age as a number, while a variable like charMyAddress could allow both numbers and letters. Functions could also be named based on the actions they are supposed to carry out when called from within another function: CheckIfMyAgeIsInCorrectFormat(intMyAge) could check whether the passed argument intMyAge in brackets is actually a number. Which style is best may be a matter of debate, but actually adopting a style helps to prevent the failure of the software in the future.
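
A possible Python rendering of the function named above is sketched here. The names follow the convention used in the paragraph rather than Python's usual lowercase style, and the body is an assumption about what such a check might do.

```python
def CheckIfMyAgeIsInCorrectFormat(intMyAge) -> bool:
    """Return True only if the argument is a whole number.

    Assumption for this sketch: "a number" means an int. In Python, bool is
    a subclass of int, so True and False are excluded explicitly.
    """
    return isinstance(intMyAge, int) and not isinstance(intMyAge, bool)
```

The descriptive name makes the call site read almost like a sentence, for example: if CheckIfMyAgeIsInCorrectFormat(intMyAge): ...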

Defensive Programming

The main objective of this technique is to ensure that the software continues to function as intended in the event of an error or misuse that may occur inadvertently or due to mischief. This technique aims to improve the general quality of the code by reducing the number of software bugs and errors, making the code more readable and understandable, and also making the software behave in a particular manner in the event of unexpected user action or data input. Below are some common Defensive Programming techniques:

Reduction of Source Code Complexity: The more complex the source code is, the more likely it is to fail in the future. It is better to write simple code and organize it into modules, which makes the programming code easier to organise and maintain.
Reviewing of Source Code: The source code of the software program should be reviewed by someone other than the programmer who wrote the code. It is not practical for the programmer to take on this role as it represents a conflict of interest.
Software Testing: There are various techniques available for testing the software, and the main objective is to subject the software to unexpected inputs to test possible behaviours. If the behaviour is known under such circumstances, then it should be possible to design a countermeasure into the software to ensure that it does not fail catastrophically.
Intelligent Code Reuse: Code that has been used earlier can be recalled and reused in another area of the software. This is possible only if the source code had been designed in modular form, where each subroutine is designed in blocks that can be called independently to perform its specific function when needed elsewhere.
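
Testing with unexpected inputs and modular reuse can be combined in a short Python sketch. It assumes a hypothetical billing helper, parse_duration_seconds, that any part of a program could call; the list of unexpected inputs is illustrative, not exhaustive.

```python
def parse_duration_seconds(raw) -> int:
    """Reusable helper: convert input to a non-negative whole number of seconds."""
    # Reject types that int() would silently coerce (floats, booleans) or
    # that are not meaningful here (None, lists, and so on).
    if isinstance(raw, bool) or not isinstance(raw, (int, str)):
        raise ValueError(f"unsupported input: {raw!r}")
    try:
        seconds = int(raw)
    except ValueError:
        raise ValueError(f"not a whole number: {raw!r}") from None
    if seconds < 0:
        raise ValueError("duration cannot be negative")
    return seconds


# Inputs a user or an upstream system might supply by mistake or mischief.
UNEXPECTED_INPUTS = ["", "abc", "-5", "12.5", None, 3.7, True]


def is_rejected(raw) -> bool:
    """Return True if the helper refuses the input with a ValueError."""
    try:
        parse_duration_seconds(raw)
    except ValueError:
        return True
    return False
```

Running is_rejected over every item in UNEXPECTED_INPUTS confirms that each one is refused with a clear error, while a valid value such as "90" is accepted. Because the helper is a self-contained function, the same check can be reused wherever a duration is read.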

The general rule is to guarantee that errors are minimized in the software code because apart from failures, malicious code may actually exploit known vulnerabilities, especially those associated with a particular programming software tool or known human software coding error. These may introduce salient failures that may impact the profitability of the business in the long term. In 2002, a study commissioned by the US Department of Commerce’s National Institute of Standards and Technology concluded that software bugs, or errors, are so prevalent and so detrimental that they cost the US economy an estimated $59 billion annually, or about 0.6 percent of the gross domestic product. Careful programming could significantly lower that number.
