Technological Themes: Reliability with Self-Healing Mechanism

Reliability is the ability of a system to perform its operations under routine circumstances as well as hostile or unexpected ones. During development, attention usually goes to ensuring optimum performance under normal circumstances. Hostile scenarios are hard to simulate because they are unpredictable, and many of them are understood only once they are encountered.

A Self-Healing mechanism is a technology that anticipates the worst-case scenarios a system could face and addresses them so that the system still delivers the desired performance. Self-healing means that failed objects cure their own problems and get back to work. To cure a problem, the mechanism first has to identify it, so a Self-Healing mechanism is composed of automatic error detection and recovery mechanisms.

Before looking at other, more detailed aspects of Self-Healing systems, let us understand the problem more clearly and see how far a self-healing mechanism already solves it in our daily use of computer software.

Example 1: Consider the MS Word application. You want to write down some notes, so you just start typing, and in the process you forget to save the document. After you have written several pages, Word closes unexpectedly. This could happen for several reasons: a power outage, the Word process being killed by an external process, the system shutting down because of a hardware failure, and so on. You would expect the end result to be that some of your valuable data is lost, along with the time spent.

Fortunately this is not the case: Word has a self-healing mechanism that recovers unsaved or lost data. When the system or application resumes after the unexpected failure and you restart Word, a recovery window pops up listing all the documents that were recovered. If you open the document that you had not saved, to your surprise you will find most of the data you entered.
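To make the idea concrete, here is a minimal sketch of this kind of recovery in Python. It is not how Word actually implements the feature; the `Document` class, the `.recover` file suffix and the `recover` function are hypothetical names chosen for illustration. The sketch takes a snapshot after every edit for simplicity, whereas a real editor would more likely do so on a timer.

```python
import json
import os
import tempfile


class Document:
    """Toy editor buffer (hypothetical) that keeps a recovery snapshot on disk."""

    def __init__(self, path, recovery_dir):
        self.path = path
        self.recovery_path = os.path.join(
            recovery_dir, os.path.basename(path) + ".recover"
        )
        self.text = ""

    def type_text(self, more):
        self.text += more
        self.autosave()  # simplification: snapshot after every edit

    def autosave(self):
        # Write to a temporary file first, then swap it in, so a crash in the
        # middle of a write never leaves a half-written snapshot behind.
        tmp = self.recovery_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"path": self.path, "text": self.text}, f)
        os.replace(tmp, self.recovery_path)

    def save(self):
        # A normal save makes the snapshot unnecessary, so remove it.
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(self.text)
        if os.path.exists(self.recovery_path):
            os.remove(self.recovery_path)


def recover(recovery_dir):
    """Return {original_path: text} for every snapshot left behind by a crash."""
    recovered = {}
    for name in os.listdir(recovery_dir):
        if name.endswith(".recover"):
            with open(os.path.join(recovery_dir, name), encoding="utf-8") as f:
                snapshot = json.load(f)
            recovered[snapshot["path"]] = snapshot["text"]
    return recovered


def demo():
    with tempfile.TemporaryDirectory() as d:
        doc = Document(os.path.join(d, "notes.txt"), d)
        doc.type_text("page one. ")
        doc.type_text("page two.")
        del doc  # simulate a crash: save() was never called
        return recover(d)  # what the "recovery window" would list


if __name__ == "__main__":
    print(demo())
```

Detection here is simply the presence of a leftover snapshot at startup, and recovery is reading it back; both halves of a self-healing mechanism are present even in this small form.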

Example 2: Consider robots, which are gradually being used more at home, in industry and in the military for a wide variety of tasks. Suppose that a robot application has to deliver water from a kitchen to a human being. This application can be composed of several software components, including Camera, Face Recognition, Obstacle Detection, and Mobility components. The Camera component repeatedly takes pictures of the path between the kitchen and the human being and sends them to both the Face Recognition component and the Obstacle Detection component. The Face Recognition component analyzes the camera data to recognize the human being, whereas the Obstacle Detection component uses the data to detect obstacles on the path.

As the example above shows, a robotic application is composed of several components that together accomplish a particular task. If any one of these components fails, the task is not completed as expected. In this kind of adverse situation, a Self-Healing mechanism that monitors all the components in the robot application will immediately analyze the problem in the failed component and automatically take measures to repair and recover it. This greatly improves the performance of robotic applications even in adverse situations.
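The monitoring idea can be sketched as a small supervisor loop in Python. This is only an illustration under simple assumptions: the `Component` and `Supervisor` classes are hypothetical, a failure is modelled as a raised exception, and "repair" is reduced to a restart followed by one retry. A real robot would need far more careful diagnosis.

```python
class Component:
    """Hypothetical robot component with a simple health flag."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def step(self):
        # One unit of work; an unhealthy component signals failure by raising.
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return f"{self.name} ok"

    def restart(self):
        # Stand-in for whatever repair action a real component would need.
        self.healthy = True
        self.restarts += 1


class Supervisor:
    """Monitors every component, detects failures and recovers from them."""

    def __init__(self, components):
        self.components = components
        self.log = []

    def run_cycle(self):
        results = []
        for component in self.components:
            try:
                results.append(component.step())
            except RuntimeError as err:
                self.log.append(str(err))         # error detection
                component.restart()               # recovery
                results.append(component.step())  # retry once after repair
        return results


def demo():
    names = ["Camera", "Face Recognition", "Obstacle Detection", "Mobility"]
    parts = [Component(name) for name in names]
    supervisor = Supervisor(parts)
    parts[2].healthy = False  # inject a fault into Obstacle Detection
    results = supervisor.run_cycle()
    return results, supervisor.log, parts[2].restarts


if __name__ == "__main__":
    print(demo())
```

Even though one component fails in the middle of the cycle, the cycle still completes with every component reporting success, and the failure is recorded in the supervisor's log.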

In summary, whether the application is as widely used as Word or as sophisticated as a robot, a Self-Healing mechanism adds reliability to the system and makes it more robust. Still, this technology is in its infancy: you find it only in some widely used, popular products or in some critical applications.

In this post, I have only discussed the reliability issue in software systems, for which a Self-Healing mechanism could be a possible solution.

In my next post, Designing a Self-Healing mechanism as a layered architecture, I’ll discuss the design and implementation of a Self-Healing system in more detail.

Friday, March 5, 2010
