Virtual stores

By Stephen Withers
14 August 2003 06:30 PM
Tags: hsm, storage, technology, nas, business, virtual, san, virtualisation


Virtualisation is the latest buzzword in storage. How will it help you simplify your storage management, and when will it be ready?

The idea of separating storage from processing through the use of SANs (storage area networks), NAS (network attached storage), and related technologies has many attractive implications. To take full advantage of the opportunities presented, an additional layer--storage virtualisation--is needed.

Just as a DBMS (database management system) means an application programmer need not be concerned with how data is stored (a database can be reorganised and applications keep working without change), storage virtualisation isolates logical storage entities from the physical devices that implement them, so administrators need not be concerned about exactly where data is stored.

Storage hierarchy
The idea of hierarchical storage management (HSM) is not new. Go back a couple of decades, and the limited capacity of disk drives meant it was common practice to keep only current and essential data on disk, with older or less important data stored on tape and brought online as needed. Various tools were available to automate this process.

"Virtualisation has been around since the year dot," says Ian Selway, HP's product manager, network storage solutions. Vendors tend to talk about what's going to happen, then we get the reality, he says, but "it's all about reducing the cost of ownership and providing business value."

While monolithic storage has a significant value proposition such as very high availability, these advantages come at a price. It makes sense to classify data in terms of requirements for performance and availability, and then map classes of data onto classes of storage, says Selway.

"For organisations that require large amounts of storage, a storage differentiation approach is very much an option," says Greg Bowden, business manager at Dimension Data. "Where perhaps two different classes of storage are deployed providing different levels of service, one storage class might provide basic levels of redundancy, while another class might provide data replication for redundancy, snapshot technology for backups, and reporting and so on."

Even though disks are now cheap by historical standards, the growth in data volumes from e-mail, CRM systems and so on means that most organisations still can't afford to keep all their data on disk, especially when the need for disaster recovery is factored in. Disk prices are falling in terms of their cost per byte, but that's been offset by increased demands for space. Storage manufacturers have been recording flat revenues, and CIOs find themselves spending as much as ever just to tread water.

"We have this insatiable appetite for information," says Michael Burnie, managing director of Network Appliance Australia and New Zealand. Exploding quantities of data mean the demand for tier-two storage (cheap, slower disks such as SATA drives) is growing because organisations can't afford to keep everything on tier-one storage. He says "you can't buy expensive storage when you're leveraging blade servers" and other low-cost technologies. However, "tier three [tape] will never go away".

Policy can help
The solution to the seemingly infinite demand for storage is not necessarily technological. Kevin McIsaac, research director, Asia-Pacific at META Group, says IT organisations must implement some kind of demand management.

"The best approach is economic, where some type of costing mechanism is used with the business to pass back the full cost of the storage growth," he says. This will drive changes in behaviour that will enable IT to reduce demand. Examples include policies on the use of particular storage assets (no audio or video files on fileservers, for instance) and cleaning up old data by summarising or archiving. "Without demand management, IT simply can't influence [users'] behaviour," says McIsaac.

The promise of storage virtualisation is that it will allow more finely grained control over the type of storage used for particular types of data.

For example, if a customer is going to query a bill from a utility company, they are most likely to do so soon after the bill arrives. This means it makes sense to keep such data on fast disks for a week or two, so the call centre can respond quickly to most enquiries. The older the bill gets, the less likely the customer is to ask a question about it, so the underlying data can be gradually migrated to increasingly slower and cheaper devices. Eventually, it may be permissible and even desirable to delete the data completely, but that's another story.

Similarly, if a manager doesn't want to drill down to a finer level of detail soon after receiving a report, they probably never will. The same reasoning applies to most types of data, the difference being how quickly it ages.
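The ageing argument can be written down as a simple policy table. The Python sketch below is illustrative only: the tier names and age thresholds are assumptions chosen for the example, not figures from any vendor or from anyone quoted here.

```python
from datetime import date

# Hypothetical policy: (maximum age in days, tier). All thresholds are assumptions.
AGE_POLICY = [
    (14, "fast disk"),
    (180, "cheap disk"),
    (7 * 365, "tape"),
]


def tier_for(created: date, today: date) -> str:
    """Return the storage tier for data created on `created`."""
    age_days = (today - created).days
    for max_age, tier in AGE_POLICY:
        if age_days <= max_age:
            return tier
    # Older than every threshold: a candidate for deletion, subject to policy.
    return "review for deletion"
```

Different kinds of data would simply get different tables, since the only thing that changes is how quickly the data ages.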

Mike Palermo, director of StorageTek's Application Storage Manager business group, says industry analysts report that between 60 and 70 percent of data on primary storage is not accessed within a six-month period. This is clearly inefficient, but the process of moving data on and off primary storage must be automated if a reduction in the total cost of ownership is to be achieved.

Why HSM?
External pressures such as the requirement that US companies store e-mail messages for seven years are encouraging renewed attention to HSM, says Andrew Antal, senior consultant, storage at Computer Associates Australia. "There's still the issue of the manageability of an HSM solution," he says, and the cost of management has to be balanced against any hardware savings.

HSM started at a time when there was a huge difference in the cost of different types of storage devices, but falling disk prices mean the saving isn't so large. Consequently, disk-to-disk backup is becoming more common, and thanks to virtualisation software this can be done while applications are running. Virtualisation also allows point-in-time copies to be made for quick recovery and spread across various devices.

"There's going to be a whole revaluation of the lifecycle of data," says Selway. For example, health records may be retained for a certain number of years, but moved from first to second-tier storage when inactive. "HSM never really fulfilled its potential," he adds, but it is re-emerging in the concept of tiered storage in a virtualised environment.

Products in this space may work either generically, moving entire files from tier to tier, or in a program-specific manner, migrating individual records from e-mail systems or database servers in a way that is transparent to the applications involved.

Price-performance
Storage classes (or tiers) typically represent different price and performance levels, says Clive Gold, marketing director at EMC. Storage units from different vendors may be grouped in one class if they have similar parameters, and one unit may be placed in multiple classes if it can present different characteristics (eg, it may be possible to choose whether data is kept permanently in the cache of a storage unit, or one unit may contain a mix of devices such as disk drives with Fibre Channel or ATA interfaces).

Storage tiers are "a very valid reason for virtualising," says Vic Madarevic, who is responsible for storage solutions marketing and management in Australia and New Zealand for Hitachi Data Systems. Tiers mean resources are used most effectively, and virtualisation means pools of storage are available for allocation and reallocation without the management headaches, even when using more than one vendor's hardware.

However, the tools needed to mix and match different vendors' products are hard to find at this stage.

Currently, customers buy high-performance storage for a particular application, then put other applications onto the same device to improve its utilisation, says Mark Bregman, executive vice president, product operations at Veritas. When the device starts to become full, they buy more of the same type of storage.

Service levels
Dan Kieran, national storage business manager at Sun, suggests application users and database administrators don't really care about disks, but about measures such as capacity, performance, and importance--for example, they need 10GB of space with a particular I/O rate and a maximum recovery time of two minutes in the event of failure. These measures must be mapped onto arrangements such as disk-to-disk backup and RAID configurations. By turning this around, organisations can implement three or four storage service levels and let users pick the appropriate one for their needs. Currently, you need to know a lot of technical details to make that choice, but this will change, he says.
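One way to picture the service-level approach is as a small catalogue from which the cheapest level meeting the user's stated requirements is chosen. The following Python sketch is hypothetical: the level names, I/O rates, recovery times, and relative costs are all invented for illustration and do not describe any vendor's offering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    iops: int              # sustained I/O operations per second
    recovery_minutes: int  # worst-case recovery time after a failure
    cost_per_gb: float     # relative cost, arbitrary units


# Hypothetical catalogue; every figure here is an assumption.
CATALOGUE = [
    ServiceLevel("gold", iops=5000, recovery_minutes=2, cost_per_gb=10.0),
    ServiceLevel("silver", iops=2000, recovery_minutes=60, cost_per_gb=4.0),
    ServiceLevel("bronze", iops=500, recovery_minutes=24 * 60, cost_per_gb=1.0),
]


def cheapest_level(min_iops: int, max_recovery_minutes: int) -> ServiceLevel:
    """Pick the cheapest level that meets both requirements."""
    candidates = [
        level for level in CATALOGUE
        if level.iops >= min_iops
        and level.recovery_minutes <= max_recovery_minutes
    ]
    if not candidates:
        raise ValueError("no service level meets the stated requirements")
    return min(candidates, key=lambda level: level.cost_per_gb)
```

With this catalogue, a request for a two-minute recovery time can only be met by the top level, while a user who can tolerate an hour of downtime is steered to a cheaper one. The RAID and backup arrangements behind each level stay hidden from the person choosing.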

The goal is to reduce complexity by making disparate systems appear the same, and "it's going to be done by software," says Burnie.

McIsaac says it is very realistic to begin setting up tiered storage. "Our leading clients are doing this now," he says, but warns "this requires a high-level sponsor to drive the required changes, eg, charging by tiers, using specified storage."

On demand
Virtualisation makes it easier to make efficient use of storage, as there is less need for headroom. Instead of leaving 10GB of free space on each of a dozen systems, it may be possible to have the same amount of spare capacity--or even less--shared between all those servers.

Joan Tunstall, marketing manager, Australia and New Zealand at StorageTek, points out that usually disk volumes are not very full, and with virtualisation the free space in a logical volume doesn't occupy any space on a physical drive.
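The point about free space can be illustrated with a toy model of a shared pool in which a logical volume consumes physical blocks only when they are first written. This is a minimal Python sketch under simplifying assumptions (fixed-size blocks, no deletion or reclamation), and the class and method names are invented for the example.

```python
class ThinPool:
    """A shared pool that assigns physical blocks only when first written."""

    def __init__(self, physical_blocks: int) -> None:
        self.free_blocks = physical_blocks
        # volume name -> (logical size in blocks, set of written block numbers)
        self.volumes = {}

    def create_volume(self, name: str, logical_blocks: int) -> None:
        # Creating a volume consumes no physical space, only a size record.
        self.volumes[name] = (logical_blocks, set())

    def write(self, name: str, block: int) -> None:
        size, written = self.volumes[name]
        if not 0 <= block < size:
            raise IndexError("block outside the logical volume")
        if block in written:
            return  # already backed by a physical block
        if self.free_blocks == 0:
            raise RuntimeError("physical pool exhausted")
        self.free_blocks -= 1
        written.add(block)
```

In this model a dozen volumes can each advertise far more space than the pool physically holds, and the spare capacity is shared between them. The trade-off is that the pool itself must be monitored, because it can run out even when every individual volume still appears to have room.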

With virtualisation, you "just use the disks you have more efficiently," says HP's Selway. Directly attached storage is typically 30 percent utilised, consolidated storage lifts this to 50 or 60 percent, and virtualisation can give a further improvement to perhaps 75 or 80 percent, he says.

The use of storage pools means applications are unaware of the physical disks holding the data. This means adding new hardware is a non-disruptive process, whether you are adding a new server or a new storage device, since neither can see the other. Similarly, an application is unaware of the failure of any particular storage device as the virtualisation layer can provide access to an alternative copy of the data.
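The indirection described here can be sketched as a logical volume that keeps a list of the physical devices holding copies of its data and picks a healthy one for each read. The Python below is a simplified illustration, assuming full mirrored copies and hypothetical device names; real virtualisation layers work at a much finer granularity.

```python
class VirtualVolume:
    """Maps one logical volume onto mirrored copies on physical devices."""

    def __init__(self, replicas) -> None:
        self.replicas = list(replicas)  # devices holding a full copy
        self.failed = set()

    def add_replica(self, device: str) -> None:
        # New hardware is added behind the mapping; applications see no change.
        self.replicas.append(device)

    def mark_failed(self, device: str) -> None:
        self.failed.add(device)

    def device_for_read(self) -> str:
        """Return the first healthy device; the caller never names one."""
        for device in self.replicas:
            if device not in self.failed:
                return device
        raise OSError("no healthy copy available")
```

The application only ever asks the volume for data. Which array answers, and whether one of them has failed or been added since the last request, is decided inside the mapping layer.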

According to Bregman, secondary gains from virtualisation include more effective backup strategies and more sophisticated HSM and disaster recovery strategies--when users want storage, they would be able to specify requirements and performance.

Virtualisation promises to allow the non-disruptive movement of data between storage technologies, such as from an old storage device to a new one to overcome performance issues but without stopping the application using that data, says Garry Barker, storage consultant at IBM.

Management
Virtualisation combines the manageability improvements provided by storage consolidation with those that come from removing the barriers between different types of storage unit, says Barker. "The virtualisation philosophy is to make the storage one entity presented to the outside world."

A major selling point for most players in the virtualisation market is the cost of storage management. Sal Ferando, technical architect-solutions partner at Veritas, says you can buy storage for around US$1 per megabyte but then spend around US$15 to manage it. Barker makes a more modest claim that hardware cost is around 15 or 20 percent of storage TCO.

But these claims don't go unchallenged. In a report (Storage Technologies: Separating Fact From Fantasy) published late last year, META Group states its own research shows hardware and software costs are as much as 60 percent of the three-year TCO of storage. Furthermore, around two-thirds of the management costs are related to backup and restore, the firm claims, and the shift to disk-based backup and recovery has the potential to halve storage management labour costs.
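The gap between these claims is easier to see when they are put on the same footing. The Python sketch below converts between "dollars spent on management per dollar of purchase" and "purchase cost as a share of the total". It assumes, as a simplification, that purchase and management are the only two cost components and that the quoted figures are directly comparable, which they may not be.

```python
def hardware_share(purchase: float, management: float) -> float:
    """Purchase cost as a fraction of purchase plus management cost."""
    return purchase / (purchase + management)


def management_per_purchase_dollar(share: float) -> float:
    """Management spend implied per purchase dollar for a given share."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return (1 - share) / share
```

On these assumptions, US$1 to buy and US$15 to manage puts the purchase cost at about 6 percent of the total. A 20 percent share implies about US$4 of management per purchase dollar, while META Group's 60 percent figure implies well under US$1.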

Graham Penn, IDC's Asia-Pacific director of storage research, says organisations are buying more storage software in an effort to manage the huge capacity they have installed.

Hitting the management wall
The availability of cheap storage has been outstripping our ability to manage it, says Barker, so "storage at some point becomes a scalability barrier".

"Now we're talking in petabytes" of storage, says Burnie, but although the price per gigabyte has fallen, the management complexity has yet to change.

Selway says that rather than the generally accepted ratio of one administrator to 10TB of storage, virtualisation and its associated tools provide a consolidated view and allow easy provisioning so one administrator might be responsible for as much as 100TB of storage.

Much of the benefit from virtualisation is expected to come from policy-based automated management. EMC's Gold says "what people want is one unified way . . . to set a business policy to work across different systems." Current policy-based tools are application-specific, working with particular programs such as SAP to move records relating to transactions more than 12 months old into a secondary storage tier.

Some people were expecting networked storage to automatically deliver lifecycle data management, but networked storage is merely a precondition for virtualisation, he says. While virtualisation abstracts access to storage, you still need a component that can move data between storage devices or--more appropriately--between classes of storage.

Simplifying management
Palermo says HSM failed in the open systems arena because the need to separate frequently and infrequently used data and to provide on-demand access put pressure on skilled people to manage the retrieval process, so the staffing costs offset the hardware savings. Policy-based automatic processes can provide a net saving, he says, and from the user's perspective the only change is that the system response time is sometimes a little longer than it would have been if the data were on primary storage.

Virtualisation can simplify management in other ways, suggests Mario Blandini, senior fabric applications and solutions manager at Brocade. For example, patch management can be a major issue as it is necessary to schedule downtime to install patches. If volume management is performed in the switch, it becomes possible to apply patches to even 100 servers in a non-disruptive way.

"What you're seeing is just the beginning," says Gold. Most organisations do not have enterprise-wide systems, he says; instead they operate various mail, ERP, and other systems. Consequently, there will be a continuing need to separate servers from the information they use, and storage virtualisation does that, moving some of the intelligence from the servers into the network.

"It really doesn't have to be difficult," says Grant Smith, storage software manager at IBM, providing you can introduce intelligent mechanisms to take the load away from human administrators--or as Tunstall puts it, "control in a hands-off fashion".

Some inroads towards policy-based management are possible, says McIsaac, but "the hype is 18 months in front of the reality," he warns.

Intelligent software
While current policy-based software takes the characteristics of a storage device as given, Penn suggests that in three or four years it will be capable of making recommendations such as the installation of additional drives within a storage unit in order to meet performance goals.

CA's Antal agrees. Some degree of virtualisation is a given in a SAN environment, he says, but the missing piece is an application or user-based management engine that can automate back end processes. For example, such an engine could detect that an Oracle application was running out of disk space, and automatically request and allocate more storage. That's coming, but "we're not there yet," says Antal.
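The kind of rule such an engine might apply can be sketched in a few lines. The Python below is a hypothetical illustration only: the 90 percent threshold, the 10GB step, and the function name are assumptions, and a real engine would also need to discover usage figures and drive the actual allocation.

```python
def plan_extension(used_gb: int, allocated_gb: int, pool_free_gb: int,
                   threshold_pct: int = 90, step_gb: int = 10) -> int:
    """Return how many GB to add to a volume, or 0 if no action is needed."""
    if allocated_gb <= 0:
        raise ValueError("allocated_gb must be positive")
    if used_gb * 100 < threshold_pct * allocated_gb:
        return 0  # still below the threshold, nothing to do
    # Grant a fixed step, limited by what the shared pool can supply.
    return min(step_gb, pool_free_gb)
```

A volume that is 95 percent full would be granted another step of capacity, or whatever the pool has left if that is less. A result of zero from a full volume is the signal that a human, or a purchasing process, has to get involved.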

Mechanisation or automation?
Drawing an analogy with the history of the phone system, Gold distinguishes between mechanisation and automation. The transition from manual to mechanical exchanges was mechanisation: the same task was done, simply without human intervention. But when those mechanical exchanges were replaced with electronic switches, the door was opened to previously impossible functions such as real-time routing according to costs.

The interface of EMC's SAN Manager lets an administrator drag and drop a chunk of storage onto a server, and then the software sets up the connections between the server, switch, and storage unit. "That's mechanisation," says Gold. The more recent Automated Resource Manager (ARM) is a provisioning tool that lets the administrator define classes of storage, and specify how much storage of which class is required by a particular application. ARM then allocates and manages the storage according to the defined policies, "and that's automation."

But the ability to move, for example, a particular subset of records rather than the entire database (as opposed to moving the data and leaving "stubs" in the database that act as pointers to the new locations) is around 12 months away, Gold suggests.
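The stub technique mentioned here can be illustrated with a toy key-value store. In this Python sketch, which is an assumption-laden simplification rather than a description of any product, migrating an item moves its data to a secondary store and leaves a small placeholder on primary storage, and reads follow the placeholder transparently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    """Placeholder left on primary storage, pointing at the migrated copy."""
    key: str


class StubbedStore:
    def __init__(self) -> None:
        self.primary = {}    # key -> data, or a Stub after migration
        self.secondary = {}  # key -> migrated data

    def put(self, key: str, data: bytes) -> None:
        self.primary[key] = data

    def migrate(self, key: str) -> None:
        """Move the data to secondary storage and leave a stub behind."""
        item = self.primary[key]
        if isinstance(item, Stub):
            return  # already migrated
        self.secondary[key] = item
        self.primary[key] = Stub(key)

    def get(self, key: str) -> bytes:
        """Read transparently, following the stub if the data has moved."""
        item = self.primary[key]
        if isinstance(item, Stub):
            return self.secondary[item.key]
        return item
```

The caller uses the same `get` call before and after migration. In a real system the only visible difference would be the longer response time when the data has to come from the slower tier.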

Automation questions
Bregman draws an analogy with the water supply: "We're currently managing the pipes, but customers care about qualities of the water" such as whether it is hot or cold. They aren't interested in the plumbing, he says; they just want the water to be there when they need it, and they want billing that tracks their usage.

"How do you manage all this at a business level?" he asks. For example, if a certain application becomes less critical to the organisation, you would want to move its data to lower quality storage.

"Software is where the value is in an IT infrastructure," says Ferando. "People have to let go of the physical relationship between their data and the storage," he says. "You need to have classes of storage, and software to apply those policies automatically."

But simply providing storage virtualisation is not sufficient, says Ferando. Additional tools are needed to support the utility model of IT, covering server provisioning, storage provisioning, and applications monitoring.

Like others in the industry, Burnie predicts the emergence of a "storage tone" utility structure that includes the user-controlled restoration of corrupted or inadvertently deleted files. "There's a lot going on under the covers" of such a scenario, but the important thing is that it will be automated with no human involvement in the background processes.

Another important trend is the use of intelligent (eg, self-tuning or self-healing) systems to reduce human involvement in everyday operations, Madarevic says, and this matches equivalent strategies from other vendors such as IBM's push towards autonomic computing.

Implementation
"Virtualisation isn't new, it's just come back to the fore again," says Antal.

Traditionally, storage virtualisation has been done at the server or storage array level, but now it can be done in switches (such as those from Brocade) or by appliances.

Virtualisation is often done at the host level, says Antal. The downside of host-based virtualisation is that it can put a significant load on the CPU, especially when creating or extending a volume. The cost of software licences can also become an issue as the number of hosts increases.

Bregman says "Virtualisation has become a catchphrase." Veritas has been doing it for years, he says, but called it "volume management". The job is basically the same wherever it is done, though some aspects are better done close to the host (eg, naming), while others are better done close to the disks (eg, RAID management).

HDS' Madarevic says immediate benefits can be achieved through storage consolidation. He claims 60 or 70 percent of the advantages of virtualisation can be obtained from consolidating and rationalising storage software and hardware. He says organisations should wait for a choice of standards-compliant switches before proceeding with virtualisation. "The world's not going to stop if you don't go for virtualisation today."

Standardisation of low-level functions (eg, data copying) is underway. Technical Committee T11 of the International Committee for Information Technology Standards (INCITS) recently accepted a proposal from Brocade for an API for this purpose, and the resulting Fabric Application Interface Standard (FAIS) is expected to be released in mid-2004.

“You need to fix the problems at the core, not just sweep them under the carpet.”
Brocade is offering its XPath technology as the basis for the standard. The company was the author or co-author of most major Fibre Channel standards and was a major contributor to the Bluefin/SMI-S management standard.

While an early partial implementation may provide benefits in the area of data migration and business continuity, the additional components required might add to the risks, warns Madarevic.

Virtualisation is ideally implemented in the switch, as it sits in the data path, says Madarevic.

Who should do the work?
Sun Microsystems' Kieran echoes Antal's warning about the CPU load involved in host-based virtualisation, but points to the advantage of being able to use any vendor's drives.

Doing the work in switches makes it possible to virtualise cheap disk drives, but "there are still some host components involved" to manage the process and provide a user interface, Antal says. Virtualisation in switches can improve performance because the work is being performed in hardware that's close to the disks. Further advantages are that this arrangement provides easier manageability and simplifies access from multiple operating systems.

Brocade's Blandini expands on this theme, observing that an organisation using storage arrays from multiple vendors currently needs expertise in multiple replication technologies. With virtualisation in the fabric, any-to-any replication becomes possible with a single application, even when duplicating data from a high-performance array to mid-range drives or to a remote site.

Switch-based virtualisation provides storage companies with "a new vehicle for selling their storage software," he says. Each of their products provides unique value, and there are opportunities for revenue growth as customers may have three or four brands of storage but they will want a single set of software to manage it.

According to Antal, another reason for the popularity of switch-based virtualisation is that switches can be clustered to avoid a single point of failure more cheaply than virtualisation software running on clustered servers.

This arrangement allows flexibility of hosts and storage, says Kieran, and this is where Sun's Data Services Platform (DSP)--part of the company's N1 strategy--is positioned. Where NAS tends to be file oriented and SAN is block oriented, DSP can serve blocks of data or files: "all data is treated the same," says Kieran. Volume management, point in time copies, and other functions are performed inside this layer.

The first set of DSP products has been released, but Sun has initially chosen to sell services rather than products, as "we want to develop best practices and implementations," says Kieran.

Switching to appliances
Bregman says switch vendors looking to implement fabric-based virtualisation tend to say "OK, it's only software, let's do it ourself," but managing persistent data in the network is very different to the data communications issues they are used to dealing with, so vendors such as MaXXan Systems have chosen instead to put Veritas' software into their switches.

He expects developments in storage virtualisation to follow the pattern exhibited by client-server computing: conventional practice began host-based, moved to client-server, and then evolved to a multi-tier model. With virtualisation, he predicts that the move from host-centric to switch-centric models will be followed by a collaborative model.

Appliances are black boxes, says Antal, in that they take care of virtualisation but the administrator has no real control over how it is done. Using an appliance also introduces a single point of failure, he adds.

Madarevic is also pessimistic about appliances. "In-band hardware components [eg, virtualisation appliances] are potentially the wrong solution as you are adding complexity," he says, suggesting a better approach is to add intelligence to existing components such as switches.

Some major vendors including IBM and HP are taking the appliance route.

Mike Zisman, VP corporate strategy at IBM, says existing SANs are complex, especially when it comes to advanced functions such as point-in-time copy and remote copy. And as for interoperability, there are "lots of things that should work, but don't quite."

The first version of the IBM virtualisation appliance supports two IBM disk systems, Barker says, but by the end of the year other vendors' units will be supported along with multiple operating systems.
