Putting the brains into your network

Automatic Correction
The previous focus of the automatic management of servers and other devices has been on event handling (eg, by OpenView and Tivoli), but according to Munch the emphasis is now moving towards end-to-end performance monitoring. This functionality is not being embedded in the network itself, but is being achieved by devices providing the hooks for management systems.

"People are starting to correlate alarms to give network actions," says Boland.

What's needed, says Munch, is a combination of load management, application level response monitoring (with different metrics for different applications), root cause analysis (as manual processes are too slow), and business level reporting (eg, in terms of a service level agreement).

In any case, change management requires proper processes, he says. Reconfiguring a network should not be an ad hoc process -- it's important to simulate and evaluate first, and doing this for a large network requires the right tools. However, there is a need for automation in order to improve the response time when problems arise.

Intrusion detection or prevention systems are capable of automated responses, but it should be a business decision to allow such automation, says Nick Day, a technical specialist with NetStar. He suggests automatic actions should perhaps be restricted to times when an appropriately skilled engineer is present to undo the mess if things go wrong, or that they should be disabled during business-critical times such as end-of-month processing.
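The kind of business rule Day describes can be expressed as a simple gate that is checked before any automated response runs. The following is a minimal sketch only: the `AutomationPolicy` class, its field names, and the two-day end-of-month freeze are illustrative assumptions, and it combines both of Day's suggestions even though he offers them as alternatives.

```python
import calendar
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class AutomationPolicy:
    """Hypothetical business policy controlling automated responses."""
    engineer_on_duty: bool              # someone is present who can undo a bad action
    freeze_last_days_of_month: int = 2  # assumed length of the end-of-month freeze


def in_month_end_freeze(today: date, freeze_days: int) -> bool:
    """True if 'today' falls within the last 'freeze_days' days of its month."""
    last_day = calendar.monthrange(today.year, today.month)[1]
    return today.day > last_day - freeze_days


def automation_allowed(policy: AutomationPolicy, now: datetime) -> bool:
    """Allow automated actions only when an engineer is present and it is
    not a business-critical period."""
    if not policy.engineer_on_duty:
        return False
    return not in_month_end_freeze(now.date(), policy.freeze_last_days_of_month)
```

An intrusion prevention system wired up this way would fall back to alerting only whenever `automation_allowed` returns `False`.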

House says that at some stage Packeteer will start to offer automatic actions based on correlated information. For example, if connections to a device suddenly spike, the reaction might be to cap bandwidth to that device and alert the appropriate administrator. If you believe such a spike could only be the result of some sort of attack, the cap could be set at a very low value.
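The spike-and-cap reaction House describes could look something like the sketch below. It is not Packeteer's implementation: the `SpikeGuard` class, the rolling-average baseline, the threefold spike threshold, and the 64 kbps cap are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


class SpikeGuard:
    """Flags a sudden jump in connection count against a rolling baseline."""

    def __init__(self, window=5, spike_factor=3.0, cap_kbps=64):
        self.history = deque(maxlen=window)  # recent normal connection counts
        self.spike_factor = spike_factor     # how many times baseline counts as a spike
        self.cap_kbps = cap_kbps             # assumed bandwidth cap to apply

    def observe(self, device, connections):
        """Return a recommended action dict on a spike, otherwise None."""
        action = None
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            if baseline > 0 and connections > self.spike_factor * baseline:
                action = {
                    "device": device,
                    "cap_kbps": self.cap_kbps,
                    "alert": f"connection spike on {device}: "
                             f"{connections} vs baseline {baseline:.0f}",
                }
        if action is None:
            # Only normal samples feed the baseline, so an ongoing
            # attack does not gradually become the new normal.
            self.history.append(connections)
        return action
```

If a spike could only plausibly be an attack, `cap_kbps` would be set very low; where a legitimate surge is possible, a more generous cap limits the damage of a false alarm.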

Even if you don't want to go as far as automating the response, automated diagnosis can be a big timesaver. Felix Marks, technical services manager, Australia and New Zealand at Micromuse, says it can be very expensive to have sufficient staff to handle the problems that can arise in a large network with multiple routers, but the cost of an outage can be extreme.

The company's Netcool/Visionary product automates the diagnosis of problems concerning routers, switches, and associated equipment. Unless multiple aspects of devices' behaviour are considered, it can be difficult to detect misconfigured routers and other types of problems, says Marks. Visionary "is like having many of these [skilled] engineers constantly looking at those devices," he says.

Visionary ships with around 1000 pre-defined rules. For example, it knows that border gateway protocol update problems are manifested by three different symptoms occurring simultaneously. While it can take a skilled engineer up to two weeks to diagnose this type of problem, Visionary can detect it before services are affected, Marks says.
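A rule of the "several symptoms at once" kind can be sketched as follows. This is an illustration of the general idea, not Visionary's rule format: the `Rule` class, the 60-second window, and the three symptom names are invented placeholders, since the article does not say which symptoms the product looks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    diagnosis: str
    required_symptoms: frozenset  # all must be seen on the same device
    window_seconds: float         # how close together counts as "simultaneous"


def diagnose(events, rules):
    """events: iterable of (timestamp_seconds, device, symptom).
    Returns a sorted list of (device, diagnosis) pairs."""
    by_device = {}
    for ts, device, symptom in events:
        by_device.setdefault(device, []).append((ts, symptom))

    findings = set()
    for device, observed in by_device.items():
        for rule in rules:
            relevant = sorted(
                (ts, s) for ts, s in observed if s in rule.required_symptoms
            )
            # Slide a window starting at each relevant event and check
            # whether every required symptom appears inside it.
            for start_ts, _ in relevant:
                seen = {
                    s for ts, s in relevant
                    if start_ts <= ts <= start_ts + rule.window_seconds
                }
                if rule.required_symptoms <= seen:
                    findings.add((device, rule.diagnosis))
                    break
    return sorted(findings)


# Hypothetical rule: the symptom names are placeholders, not real product rules.
BGP_RULE = Rule(
    diagnosis="bgp_update_problem",
    required_symptoms=frozenset({"session_flap", "route_count_drop", "cpu_high"}),
    window_seconds=60,
)
```

A device showing only one or two of the symptoms, or all three spread over a long period, produces no finding, which is what keeps a rule like this from firing on unrelated noise.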

Ericsson is using Netcool products, including Realtime Active Dashboards (RAD), in the management of its customers' networks. General manager network operations Michael Pease says Netcool was originally used to manage alarms, but Ericsson is moving to proactive management. The tools help maintain customer service levels and provide those customers with the information to make business and investment decisions. For example, RAD helps the company to prioritise different alarms, while customers can detect trends that will require high-level action rather than merely ongoing network management.

Computer Associates' neugent (neural agent) technology is implemented in the Unicenter suite to predict events, says principal consultant Robert Cruchley. The software finds clusters of activity, and determines patterns of movement from one cluster to another. Predictions can then be made about the likely future state of the system based on real-time information about its current state. This works well for the behaviour of servers, but "network usage tends to be unpredictable," says Cruchley, as it relates to real-world conditions -- even the weather (people are more likely to stay in the office at lunchtime on wet days and surf the Web instead of going for a walk in the park).
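The idea of predicting the next state from movements between clusters can be illustrated with a simple transition count. This is a deliberately simplified stand-in, not the neugent algorithm: it assumes the clustering has already been done and each observation carries a cluster label, and the labels and function names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(cluster_sequence):
    """Count how often the system moved from each cluster to each other cluster."""
    counts = defaultdict(Counter)
    for current, following in zip(cluster_sequence, cluster_sequence[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return (most likely next cluster, estimated probability),
    or None if the current cluster has never been seen leading anywhere."""
    followers = counts.get(current)
    if not followers:
        return None
    following, n = followers.most_common(1)[0]
    return following, n / sum(followers.values())


# Hypothetical history of a server's observed states, already clustered.
history = ["idle", "busy", "overloaded", "idle", "busy",
           "overloaded", "idle", "busy", "idle"]
model = learn_transitions(history)
```

With this history, a server currently in the "busy" cluster has been followed by "overloaded" more often than by "idle", so that is the state an operator would be warned about. Cruchley's caveat applies here too: a model like this is only as good as the regularity of the behaviour it was trained on.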

IBM is probably the company most associated with the move towards self-managing systems. The company suggests there are five steps along this path:

  • Basic. Everything monitored and managed by people.
  • Managed. Information from multiple subsystems is correlated into a small number of consoles.
  • Predictive. Management software suggests actions for human approval.
  • Adaptive. The system takes those actions without intervention.
  • Autonomic. Systems and components are dynamically managed by business rules and policies.

Most medium to large organisations would be at the managed or predictive level. Some operations are at the adaptive level. For example, IBM used its ThinkDynamics Intelligent Orchestrator earlier this year to automatically allocate infrastructure between the Australian Open Web site, grid-based credit analysis, and protein folding experiments. According to IBM officials, server resources were shifted as needed to manage the unpredictable spikes in demand for the Web site, allowing visitors to the site to access continuous, uninterrupted live scores and results.

Some IBM server products already include features for self-healing, self-optimising, and other characteristics of autonomic computing.
