1. OceanStore by University of California, Berkeley
OceanStore is a global persistent data store designed to scale to billions of users. It provides a consistent, highly available, and durable storage utility atop an infrastructure composed of untrusted servers. Any computer can join the infrastructure; users need only subscribe to a single OceanStore service provider, although they may consume storage and bandwidth from many different providers. Researchers at Berkeley are also exploring the space of Introspective Computing, that is, systems that perform continuous, online adaptation. Applications include on-chip tolerance of flaky components, continuous optimization to adapt to server failures and denial-of-service attacks, and autonomic computing. [23]
2. Recovery-Oriented Computing by University of California, Berkeley
The Recovery-Oriented Computing (ROC) project is a joint Berkeley/Stanford research project investigating novel techniques for building highly dependable Internet services. ROC emphasizes recovery from failures rather than failure avoidance. This philosophy is motivated by the observation that even the most robust systems still occasionally encounter failures due to human operator error, transient or permanent hardware failure, or software anomalies resulting from software aging. [21]
3. Anthill project by University of Bologna, Italy
Anthill is a framework built to support the design, implementation, and evaluation of peer-to-peer (P2P) applications. P2P systems are characterized by decentralized control, large scale, and extreme dynamism of their operating environment and can be seen as instances of Complex Adaptive Systems, typically found in biological and social sciences. Anthill exploits this analogy and advocates a methodology whereby the desired application properties correspond to the "emergent behavior" of the underlying complex adaptive system. An Anthill system consists of a dynamic network of peer nodes; societies of adaptive agents travel through this network, interacting with nodes and cooperating with other agents in order to solve complex problems. Anthill can be used to construct different classes of P2P services that exhibit resilience, adaptation, and self-organization properties. [5]
4. Software Rejuvenation by Duke University
Software rejuvenation is a proactive fault management technique aimed at cleaning up a system's internal state to prevent the occurrence of more severe crash failures in the future. It involves occasionally terminating an application or a system, cleaning its internal state, and restarting it. Current methods of software rejuvenation include system restart, application restart (partial rejuvenation), and node/application failover (in a cluster system). Software rejuvenation is a cost-effective technique for dealing with software faults; it protects not only against hard failures but against performance degradation as well. Duke University collaborated with IBM to develop the IBM Director Software Rejuvenation tool. [6]
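The terminate, clean, and restart cycle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the IBM Director tool or Duke's method: the `Service` class, the per-request leak of 4 KB, and the `leak_limit_kb` threshold are all hypothetical stand-ins for real aging effects and real rejuvenation policies.

```python
from dataclasses import dataclass

# Assumed leak per request, purely illustrative.
LEAK_PER_REQUEST_KB = 4


@dataclass
class Service:
    """Hypothetical long-running service whose internal state ages with use."""
    leaked_kb: int = 0        # stand-in for aging effects such as memory leaks
    requests_served: int = 0
    restarts: int = 0

    def handle_request(self) -> None:
        self.requests_served += 1
        self.leaked_kb += LEAK_PER_REQUEST_KB

    def rejuvenate(self) -> None:
        """Terminate, clean internal state, and restart (here: reset the leak)."""
        self.leaked_kb = 0
        self.restarts += 1


def run_with_rejuvenation(service: Service, total_requests: int,
                          leak_limit_kb: int) -> int:
    """Serve requests, rejuvenating proactively before the limit is exceeded.

    Returns the peak leaked memory observed during the run.
    """
    peak = 0
    for _ in range(total_requests):
        # Proactive check: restart before the next request would cross the limit.
        if service.leaked_kb + LEAK_PER_REQUEST_KB > leak_limit_kb:
            service.rejuvenate()
        service.handle_request()
        peak = max(peak, service.leaked_kb)
    return peak


if __name__ == "__main__":
    svc = Service()
    peak_kb = run_with_rejuvenation(svc, total_requests=100, leak_limit_kb=40)
    print(f"peak leak: {peak_kb} KB, restarts: {svc.restarts}")
```

The point of the sketch is that the restart happens before the degradation becomes a failure, which is what distinguishes rejuvenation from reactive recovery.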
5. Bio-Inspired Approaches to Autonomous Configuration of Distributed Systems by University College London, England
Next-generation networks require new control techniques to increase automation and deal with complexity. Active networks in particular will require the management and control systems to evolve extremely rapidly, since users will be continuously adding new applications, services, and virtual configurations. This research is exploring novel ad-hoc distributed control algorithms and architectures derived from biological and geophysical systems and from measurements of fabricated systems such as the World Wide Web. [16]
4.0 Four basic elements of autonomic computing
By examining the eight characteristics above, researchers have identified four basic elements of autonomic computing: self-configuring, self-healing, self-optimizing, and self-protecting.
4.1 Self-Configuring
An autonomous computing system must be able to install and set up software automatically. To do so, it will use dynamic software configuration techniques: applying technical and administrative direction and surveillance to identify and document the functional and physical characteristics of a configurable item, to control changes to those characteristics, to record and report change processing and implementation status, and to verify compliance with specified service levels. Downloading new versions of software and installing regular service packs are also required. When working with other autonomous components, an autonomous system will update virus-protection signatures and security levels. Self-configuration will use adaptive algorithms to determine the optimum configurations.
Examples:
1. Updating Web pages dynamically with software changes, testing those changes, analyzing the results, releasing the system back into production, and reporting back to self-management whether the procedure was successful.
2. Installation, testing, and release of regular vendor service packs.
3. Installation of vendor patches, corrections, and modifications together with the necessary testing and release.
4. Installation of new software releases—automatically and seamlessly.
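The install, test, release, and report cycle that these examples share can be sketched as follows. This is a minimal illustration rather than a description of any vendor's tooling: `ManagedSystem`, `apply_update`, and the `self_test` callback are hypothetical names, and the rollback on a failed test is an assumption added here to keep the system in a known state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ManagedSystem:
    """Hypothetical system that tracks installed packages and a status report."""
    installed: Dict[str, str] = field(default_factory=dict)  # package -> version
    report: List[str] = field(default_factory=list)


def apply_update(system: ManagedSystem, package: str, new_version: str,
                 self_test: Callable[[ManagedSystem], bool]) -> bool:
    """Install an update, test it, then release it or roll it back.

    The outcome is appended to system.report so that a self-management
    layer can see whether the procedure succeeded.
    """
    previous = system.installed.get(package)
    system.installed[package] = new_version          # install
    if self_test(system):                            # test
        system.report.append(f"{package} {new_version}: released")
        return True
    # Assumed behavior: restore the previous version when the test fails.
    if previous is None:
        del system.installed[package]
    else:
        system.installed[package] = previous
    system.report.append(f"{package} {new_version}: rolled back")
    return False


if __name__ == "__main__":
    system = ManagedSystem(installed={"webserver": "1.0"})

    def passes(s: ManagedSystem) -> bool:
        # Stand-in test: versions tagged "broken" fail.
        return "broken" not in s.installed["webserver"]

    apply_update(system, "webserver", "1.1", passes)
    apply_update(system, "webserver", "2.0-broken", passes)
    print(system.installed, system.report)
```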
4.2 Self-optimizing
An autonomous system will never settle for the status quo. It will be constantly monitoring predefined system goals or performance levels to ensure that all systems are running at optimum levels. With the business constantly changing and demands from customers and suppliers changing equally fast, self-adapting requirements will be needed.
Self-optimization will be the key to allocating e-utility-type resources: determining when an increase in processing cycles is needed, how much is needed, where it is needed, and for how long. To be effective, autonomous self-optimization will need advanced data and feedback. The metrics need to be in a form that allows rapid analysis. Many new and innovative techniques are needed for optimization to be successful. For example, control theory is needed in new autonomous infrastructures, along with new algorithms to process control decisions.[y]
Examples:
1. Calling for additional processing power from the e-utility when needed. Releasing those additional cycles when peaks are over.
2. Working with outside vendor software.
3. Interfacing with other autonomic modules to exchange data and files.
4. Optimum sub-second response times for all types of access devices, such as personal computers, handheld devices, and media phones.
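The first example, acquiring extra capacity at peaks and releasing it afterwards, can be sketched with a simple threshold rule. This is a deliberately simplified stand-in for the control-theoretic techniques mentioned above, not an implementation of them; the function names, the 80% and 30% utilization thresholds, and the unit limits are all assumptions chosen for illustration.

```python
from typing import List


def desired_capacity(current_units: int, utilization: float,
                     high: float = 0.80, low: float = 0.30,
                     min_units: int = 1, max_units: int = 16) -> int:
    """Return the capacity for the next interval.

    Adds one unit when utilization exceeds `high` and releases one when it
    drops below `low`. The gap between the thresholds avoids oscillation.
    """
    if utilization > high and current_units < max_units:
        return current_units + 1
    if utilization < low and current_units > min_units:
        return current_units - 1
    return current_units


def simulate(demand: List[float], start_units: int = 1) -> List[int]:
    """Run the rule over a demand trace; each unit handles 1.0 of load."""
    units = start_units
    history = []
    for load in demand:
        utilization = load / units
        units = desired_capacity(units, utilization)
        history.append(units)
    return history


if __name__ == "__main__":
    # Demand rises to a peak and then falls away again.
    print(simulate([0.5, 0.9, 1.8, 1.8, 0.4, 0.2]))
```

In this trace the rule grows the allocation to three units during the peak and returns to one unit once demand drops, which is the acquire-and-release behavior the example describes.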
4.3 Self-healing
Present computer systems are very fragile. They fail at the smallest problem. If a period, a comma, or a bracket is not correct, the software will fail. We still have much to do in designing tolerant systems. Autonomous computing systems will have the ability to discover and repair potential problems to ensure that the systems run smoothly.
With today's complex IT architectures, it can be hours before a problem is identified at the root-cause level. System staff members need to pore over listings of error logs and memory dumps, tracing step by step back to the point of failure. Downtime is too costly to the business. For example, in large-scale banking networks, the cost can be as much as $2,600,000 per hour. Self-healing systems will be able to take immediate action to resolve the issue, even if further analysis is required. Rules for self-healing will need to be defined and applied. As autonomous systems become more sophisticated, embedded intelligence will be applied to discover new rules and objectives. For example, recall from the previous section that IBM will be building SMART (Self-Managing and Resource Tuning) databases into upcoming versions of its DB2 database product. This database is designed to run with less need for human intervention. The user can opt not to be involved, and the database will automatically detect failures when they occur and configure itself by installing operating systems and data automatically to cope with the changing demands of e-business and the Internet [19].
Examples:
1. Self-correcting Job Control Language (JCL): when a job fails, the errors or problems are identified and jobs rerun without human intervention.
2. An application error forces the entire system to halt. After root cause analysis, the error is corrected, recompiled, tested, and moved back into production.
3. A database index fails. The files are automatically re-indexed, tested, and loaded back into production.
4. Automatically extend file space and database storage, according to previous data on growth and expansion.
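The idea that rules for self-healing are defined and then applied can be sketched as a table of symptom checks paired with repair actions. This is an illustrative assumption about how such rules might be organized, not a description of DB2 or any other product; the state dictionary, the rule names, the 90% fullness threshold, and the 50 GB extension are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def index_broken(state: State) -> bool:
    return not state["index_ok"]


def reindex(state: State) -> None:
    state["index_ok"] = True  # stand-in for rebuilding a database index


def storage_nearly_full(state: State) -> bool:
    return state["used_gb"] / state["capacity_gb"] > 0.90  # assumed threshold


def extend_storage(state: State) -> None:
    state["capacity_gb"] += 50  # assumed fixed extension step


# Each rule: (name, symptom check, repair action).
RULES: List[Tuple[str, Callable[[State], bool], Callable[[State], None]]] = [
    ("index", index_broken, reindex),
    ("storage", storage_nearly_full, extend_storage),
]


def heal(state: State) -> List[str]:
    """Apply every rule whose symptom is present and verify the repair.

    A repair that does not clear the symptom is escalated instead of retried.
    """
    outcomes = []
    for name, is_broken, repair in RULES:
        if is_broken(state):
            repair(state)
            if is_broken(state):
                outcomes.append(f"{name}: escalate to operator")
            else:
                outcomes.append(f"{name}: repaired")
    return outcomes


if __name__ == "__main__":
    state = {"index_ok": False, "used_gb": 95, "capacity_gb": 100}
    print(heal(state))   # both symptoms present
    print(heal(state))   # nothing left to repair
```

Re-checking the symptom after each repair is the design choice worth noting: the system acts immediately but still knows when a problem needs further analysis.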
4.4 Self-protecting
In an increasingly hostile corporate world, autonomous systems must identify and detect numerous threats and protect valuable corporate assets from them. They must maintain integrity and accuracy and be responsible for overall system security. For years before the Internet, each corporation was an isolated island where threats usually came from within.
Now, outside threats come daily, and security and protection are paramount. Threats must be identified quickly and protective action taken.
Autonomic system solutions must address all aspects of system security at the platform, operating system, network, application, Internet, and infrastructure levels. This involves developing new cryptographic techniques and algorithms, their secure implementation, and designing secure networking protocols, operating environments, and mechanisms to monitor and maintain overall system integrity. Such security solutions need to be standardized to provide/preserve interoperability and to ensure that these techniques are used in a correct way.
Achieving this will require continuous sensors feeding data to a protection center. A log of events will be written and accessed when appropriate for audit purposes. To manage threat levels, we might expect a tiered scheme in which threats are escalated through the tiers for increasing action and priority.
Examples:
1. Confirm the availability of backup and recovery resources that may be needed.
2. Implement tiered security levels.
3. Focus resources on network monitoring and immediately disconnect computer systems with suspicious network traffic.
4. Verify that network configuration inventories are correct and, if not, take action.
5. Contact system administrators outside the autonomous system and other offices that may be affected by the increasing threat levels.
6. Have the system verify that all computer systems are at the appropriate version levels, including "patches." Update automatically as needed.
7. Resolve any open security concerns.
8. Implement any special software for additional security protection according to the threat level.
9. Contact offsite vendors to determine whether any preventive measures (patches, etc.) need to be applied to hardware or software.
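The tiered escalation described above, in which sensor data is classified, logged for audit, and answered with increasingly strong actions, can be sketched as follows. The tier names, the events-per-minute thresholds, and the action strings are assumptions chosen for illustration; a real protection center would derive them from policy.

```python
from enum import IntEnum
from typing import List


class ThreatTier(IntEnum):
    NORMAL = 0
    ELEVATED = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical thresholds: suspicious events per minute needed to reach a tier.
TIER_THRESHOLDS = [
    (100, ThreatTier.CRITICAL),
    (20, ThreatTier.HIGH),
    (5, ThreatTier.ELEVATED),
]

# Hypothetical actions added at each tier, loosely based on the examples above.
TIER_ACTIONS = {
    ThreatTier.NORMAL: [],
    ThreatTier.ELEVATED: ["increase network monitoring"],
    ThreatTier.HIGH: ["verify patch levels", "confirm backup resources"],
    ThreatTier.CRITICAL: ["disconnect suspicious hosts", "notify administrators"],
}


def classify(events_per_minute: int) -> ThreatTier:
    """Map a sensor reading to the highest tier whose threshold it meets."""
    for threshold, tier in TIER_THRESHOLDS:
        if events_per_minute >= threshold:
            return tier
    return ThreatTier.NORMAL


def respond(events_per_minute: int, audit_log: List[str]) -> List[str]:
    """Record the reading for audit and return the actions for every tier reached."""
    tier = classify(events_per_minute)
    audit_log.append(f"{events_per_minute} events/min -> {tier.name}")
    actions: List[str] = []
    for level in ThreatTier:
        if level <= tier:
            actions.extend(TIER_ACTIONS[level])
    return actions


if __name__ == "__main__":
    log: List[str] = []
    print(respond(25, log))
    print(log)
```

Actions accumulate as a threat moves up the tiers, so a critical threat triggers everything a lower tier would, plus disconnection and notification.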
Table 4.1 compares how the four elements of autonomic computing are managed today with how they will be handled by full autonomic systems.
Concept | Current Computing | Autonomic Computing |
Self-configuration | Corporate data centers have multiple vendors and platforms. Installing, configuring, and integrating systems is time-consuming and error-prone | Automated configuration of components and systems follows high-level policies. Rest of system adjusts automatically and seamlessly |
Self-optimization | Systems have hundreds of manually set nonlinear tuning parameters, and their number increases with each release | Components and systems continually seek opportunities to improve their own performance and efficiency |
Self-healing | Problem determination in large, complex systems can take a team of programmers weeks | System automatically detects, diagnoses, and repairs localized software and hardware problems |
Self-protection | Detection of and recovery from attacks and cascading failures is manual | System automatically defends against malicious attacks or cascading failures. It uses early warning to anticipate and prevent systemwide failures |
Table 4.1 A comparison of current computing systems with autonomic computing
5.0 Autonomic computing architecture
In an autonomic computing architecture, the basic management element is a control loop, depicted in Figure 5.1. This acts as manager of the resource through monitoring, analysis, and actions taken on a set of predefined system policies. These control loops, or managers, can communicate and eventually will negotiate with each other and other types of resources within and outside of the autonomic computing architecture [14].
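A control loop of this kind can be sketched in a few lines. The sketch assumes a hypothetical managed resource (a work queue served by a pool of workers) and a single hypothetical policy (a maximum queue length per worker); the `monitor`, `analyze`, and `act` function names follow the monitoring, analysis, and action steps named above but are otherwise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    """Hypothetical managed resource: a work queue served by workers."""
    queue_length: int
    workers: int


@dataclass
class Policy:
    """Hypothetical predefined policy for the manager to enforce."""
    max_queue_per_worker: int = 10


def monitor(resource: Resource) -> Dict[str, int]:
    """Collect the metrics the manager needs from the resource."""
    return {"queue_length": resource.queue_length, "workers": resource.workers}


def analyze(metrics: Dict[str, int], policy: Policy) -> Optional[str]:
    """Compare the metrics with the policy and decide on an action, if any."""
    if metrics["queue_length"] > policy.max_queue_per_worker * metrics["workers"]:
        return "add_worker"
    return None


def act(resource: Resource, decision: Optional[str]) -> None:
    """Issue the adjustment to the resource."""
    if decision == "add_worker":
        resource.workers += 1


def control_loop(resource: Resource, policy: Policy, iterations: int) -> List[Optional[str]]:
    """Run monitor -> analyze -> act repeatedly and return the decisions taken."""
    decisions = []
    for _ in range(iterations):
        metrics = monitor(resource)
        decision = analyze(metrics, policy)
        act(resource, decision)
        decisions.append(decision)
    return decisions


if __name__ == "__main__":
    pool = Resource(queue_length=35, workers=1)
    print(control_loop(pool, Policy(), iterations=4), pool.workers)
```

Keeping the three steps as separate functions mirrors the architecture: the policy can change without touching the sensors or the actuators, and several such loops can later be composed or made to negotiate with each other.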
Figure 5.1 An example of a basic autonomic control loop.
The control loop collects information from the system, makes decisions based on that data, and then issues instructions to adjust the system. An intelligent control loop can provide the functionality of autonomous computing, such as the following: