How do you scale databases to e-business heights? The demands of Internet computing, where every click can turn into a transaction, present a huge database challenge to corporate IT. The long-term solution is database clustering, but organisations must choose between two opposing technical designs and, thus, two opposing vendor camps: Oracle versus just about everyone else.
Choosing the right database clustering strategy is a critical, far-reaching decision for e-businesses, and evidence is mounting that the "shared-nothing" database cluster design used by Microsoft and IBM is the best way to meet the challenges inherent in high-performance e-commerce.
This leaves Oracle, as one of the last remaining proponents of shared-disk clustering, fighting a rear-guard action as it slowly and quietly migrates from its own clustering architecture toward that of its competitors.
Shared-disk versus shared-nothing clustering has been a major split in the database industry for years, but e-commerce workloads are pushing an increasing number of organisations to design for scalability as never before. Corporate IT architects planning for, or experiencing, these kinds of heavy workloads need to choose carefully between the easy failover but fundamentally inefficient design of shared-disk clusters and the nearly unlimited scalability but higher maintenance of shared-nothing clusters.
In shared-disk database clustering, multiple database servers have equal access to a shared set of disk devices. In shared-nothing schemes, each database server manages its own disk devices. Shared nothing is a misleading term, of course: Shared-nothing clusters don't require telepathic CPUs; what's shared is a network connection among all the systems in the cluster.
The clear trend among database vendors, particularly for new projects, is toward shared-nothing designs. Other than Oracle, the only mainstream shared-disk database platform is DB2 on the 390, where IBM solved shared-disk scalability problems by implementing global locking and a shared-memory disk cache in dedicated hardware, its Sysplex Coupling Facility.
Oracle's tack
In discussions with PC Week Labs, Oracle officials extolled the availability and manageability advantages of its clustered database, OPS (Oracle Parallel Server), which uses the shared-disk design.
"The problem with shared nothing [is] it requires human intervention to reallocate resources," said Merrill Holt, director of product management for OPS. "The shared-disk approach [can] dynamically respond to usage patterns and adapt to that very efficiently."
Very telling, however, is the company's current effort, called Cache Fusion, to fundamentally redesign OPS to pass more and more information over its network interconnects and to continue distributing ownership of more cluster resources to each local node; in other words, to move toward a shared-nothing design.
In Oracle8i, a major change to OPS (and the first part of Cache Fusion) is the ability of a node that has modified a disk block to ship a consistent read image of that block directly to another node instead of writing it out to disk first; this is called I/O shipping.
Oracle also told us that the next major release of OPS will include I/O shipping for the write/write case and the read/read case. These are complicated and fundamental changes to the product's design, but Oracle officials said they see big benefits ahead for customers.
"In the write/write case, it's much more complicated," Holt said. "We have the code working and are working on cleaning up the recovery. We're 99 percent confident we'll have it out for the 8.2 release [of Oracle8i]. We'll be shipping over the interconnect instead of a disk-based ping. It's a huge advantage over disk drives that are slow."
The dominant database architecture on the market is still nonclustered SMP (symmetric multiprocessing) systems. Oracle is deservedly a key player in this market because of its database platform's functional breadth and superior programmability. However, customers needing high-throughput systems either have to invest in very high-end SMP servers that are pushing up against their performance limits or switch to using clustered database systems.
Six of the top 10 systems on the Transaction Processing Performance Council's TPC-C benchmark are clustered systems, and the far-and-away leader on the list uses a shared-nothing design.
The No. 1 figure was achieved by Microsoft's as-yet unreleased SQL Server 2000, which has some shared-nothing features, running on Compaq Computer's ProLiant 8500 servers. The result has sent shock waves through the industry.
The Microsoft/Compaq system achieved 227,079 transactions per minute, about 67 percent higher than the previous No. 1 figure, held by Oracle, of 135,816 transactions per minute. And the Microsoft/Compaq system did so at about one-third the cost: $19.12 per tpmC (the TPC-C measure of transactions per minute) vs. $52.70 per tpmC for Oracle's best.
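The comparison can be checked directly from the published figures. The short calculation below is only a sanity check on the arithmetic; the variable names are ours, and the four numbers are the ones quoted above.

```python
# TPC-C figures quoted in the article: throughput and price/performance.
ms_tpm, oracle_tpm = 227_079, 135_816
ms_price, oracle_price = 19.12, 52.70

# Fractional throughput improvement of the Microsoft/Compaq result over Oracle's.
throughput_gain = ms_tpm / oracle_tpm - 1

# Microsoft/Compaq price/performance as a fraction of Oracle's.
price_ratio = ms_price / oracle_price

print(f"throughput gain: {throughput_gain:.1%}")
print(f"price ratio: {price_ratio:.2f}")
```

The gain works out to roughly two-thirds, and the price ratio to a little over one-third.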
This isn't the first time shared-nothing database clusters have rocked the database market. Tandem Computers' shared-nothing NonStop SQL database dominated the TPC-C in the mid-1990s.
Today, shared-nothing database clustering is much less expensive and more mainstream than it was on the specialised Tandem systems because of the availability of volume four-way and eight-way servers and use of off-the-rack networks as cluster interconnects.
In the short term, IBM, not Microsoft, will be the real winner in the shared-nothing space because of the strength of IBM's DB2 EEE (Enterprise-Extended Edition) database. In tests, PC Week Labs found that DB2 EEE provides mature, well-honed shared-nothing technology that runs on high-volume, low-margin servers such as lower-end systems from Sun Microsystems and those based on Intel processors. For example, we've seen excellent scalability in tests of DB2 EEE using multiple Dell Computer servers. DB2 EEE also provides a single administrative image for managing servers in a cluster, a feature Microsoft's upcoming SQL Server 2000, which uses a federated database approach, will not offer. This critical difference makes IBM's offering easier to tune and less expensive to administer.
Weighing options
Shared-disk clustering grew up with Digital Equipment Corporation's OpenVMS clusters, which used shared-disk hardware so any VAX in the cluster could access any disk block. Digital's own Rdb database, later acquired by Oracle, as well as Oracle's OPS, was originally written for OpenVMS clusters.
The really difficult task with shared-disk designs, especially with databases, is making sure that all the computers in the cluster have identical pictures of all shared disks, no matter which nodes are writing to which disks. The solution is a shared lock manager component that ensures that only a single node can write to a particular disk block at a time and that no node's local disk cache contains copies of out-of-date data.
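To make the lock manager's two jobs concrete, here is a minimal, single-threaded sketch. It is not how OPS or Rdb implement locking; the class names LockManager and Node, and the message counter, are hypothetical and exist only to illustrate the single-writer rule and cache invalidation described above.

```python
from collections import defaultdict


class LockManager:
    """Toy global lock manager: one writer per block, stale caches invalidated."""

    def __init__(self):
        self.owner = {}                    # block id -> node holding the write lock
        self.cached_by = defaultdict(set)  # block id -> nodes holding a cached copy
        self.messages = 0                  # lock and invalidation messages sent

    def acquire_write(self, node, block):
        self.messages += 1                 # the lock request itself
        holder = self.owner.get(block)
        if holder is not None and holder is not node:
            return False                   # another node is already writing this block
        self.owner[block] = node
        # Every other cached copy is about to become stale, so drop it.
        for other in self.cached_by[block] - {node}:
            other.cache.pop(block, None)
            self.messages += 1             # one invalidation message per stale copy
        self.cached_by[block] = {node}
        return True

    def release_write(self, node, block):
        if self.owner.get(block) is node:
            del self.owner[block]
            self.messages += 1             # the release message


class Node:
    """A database server with a private cache in front of the shared disk."""

    def __init__(self, name, disk, locks):
        self.name, self.disk, self.locks = name, disk, locks
        self.cache = {}

    def read(self, block):
        if block not in self.cache:        # cache miss: go to the shared disk
            self.cache[block] = self.disk[block]
            self.locks.cached_by[block].add(self)
        return self.cache[block]

    def write(self, block, value):
        if not self.locks.acquire_write(self, block):
            raise RuntimeError(f"{block} is locked by another node")
        self.disk[block] = value
        self.cache[block] = value
        self.locks.release_write(self, block)


disk = {"b1": "old"}                       # the shared disk, visible to every node
locks = LockManager()
a, b = Node("a", disk, locks), Node("b", disk, locks)
b.read("b1")                               # node b caches the old contents
a.write("b1", "new")                       # node a's write invalidates b's copy
```

Even in this toy, a single write costs three messages (request, invalidation, release), which hints at why lock traffic becomes the limiting factor as nodes are added.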
The advantage of the shared-disk approach, once the lock manager is working, is that it provides a similar programming environment to a nonclustered system.
As a result, getting a shared-disk database to market is a lot faster than writing a shared-nothing system, according to Hal Berenson, who was Digital's head of Rdb development when clustering was being added to the database.
"For Rdb, we took the same approach [as VMS clusters]," said Berenson. "The simplest way to [add clustering] is to not change much in the database system. Use the same buffering system, the same I/O mechanism, and take the same locking construct in the database. But extend that, so no matter what node you're running on ... you make sure multiple instances don't step on each other."
Scalability proved to be the long-term problem with shared-disk design. "What we found with Rdb is we never got [traffic levels] so low you would get linear scaling in the cluster," Berenson said. "You could get some level of scaling up to three or four nodes, but after that, things would flatten out."
Microsoft later hired Berenson to help revamp its SQL Server 7.0 release, and he lobbied for a shared-nothing design.
A high level of lock manager traffic is a particular problem for databases that use row-level locking, as all modern databases do, according to Gilles Fecteau, the lead architect for shared-nothing databases at IBM and an IBM Distinguished Engineer.
"If you have an OLTP [online transaction processing] system with row-level locking, you have to check who is the owner. This generates a lot of small messages. Shared disk has 100 times as many messages [as shared nothing]," Fecteau said.
In contrast to shared-disk designs, shared-nothing designs, such as IBM's DB2 EEE, Informix's Extended Parallel Server and Microsoft's SQL Server 2000, don't ship locks or even disk blocks around. They implement a function-shipping approach, where a query is divided into pieces and each subquery is sent to the node that owns part of the needed data set.
Each node then accesses all the necessary local disk blocks and sends the answer back over the wire; it's like stored procedures for clusters. As a result, shared nothing is far less chatty than shared-disk designs.
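Function shipping can be sketched in a few lines. The example below is an illustration only, not the mechanism used by DB2 EEE or any other product: the DataNode and Coordinator classes are hypothetical, rows are plain dictionaries, and ownership is assumed to be decided by taking an integer customer_id modulo the node count.

```python
class DataNode:
    """A node that owns a private partition of the rows."""

    def __init__(self):
        self.rows = []

    def run_subquery(self, predicate):
        # Scan only local rows and return a small partial result,
        # not the rows or disk blocks themselves.
        matching = [r["amount"] for r in self.rows if predicate(r)]
        return sum(matching), len(matching)


class Coordinator:
    """Splits a query into per-node subqueries and merges the answers."""

    def __init__(self, node_count):
        self.nodes = [DataNode() for _ in range(node_count)]

    def owner_of(self, key):
        # Assumed partitioning rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def insert(self, row):
        self.owner_of(row["customer_id"]).rows.append(row)

    def total_amount(self, predicate):
        # Ship the same function to every node, then combine the partials.
        partials = [node.run_subquery(predicate) for node in self.nodes]
        return sum(p[0] for p in partials), sum(p[1] for p in partials)


cluster = Coordinator(node_count=3)
for cid, amount in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    cluster.insert({"customer_id": cid, "amount": amount})

total, count = cluster.total_amount(lambda r: r["amount"] >= 20)
print(total, count)
```

Only one small tuple per node crosses the network for the whole query, however many rows each node had to scan locally.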
Shared-nothing databases can also make much better use of disk caching because there's a one-to-one match between a local cache and local disk resources. If a disk block is in the cache of any cluster node, it will always be pulled from the cache.
In the shared-disk case, if one node needs a disk block that happens to be in the disk cache of another node, the disk block often will still have to be read from disk by the first node. (OPS has some optimisations so this does not happen all the time.)
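The caching difference can be shown with a simple count of disk reads. This is a deliberately simplified model under stated assumptions: caches are unbounded, the workload is read-only, blocks are integers owned by node (block modulo node count) in the shared-nothing case, and the shared-disk case has no cache-to-cache transfer at all (so it ignores the OPS optimisations mentioned above). The function name disk_reads is ours.

```python
def disk_reads(requests, node_count, shared_nothing):
    """Count disk reads for a list of (requesting_node, block) pairs."""
    caches = [set() for _ in range(node_count)]
    reads = 0
    for node, block in requests:
        # Shared nothing: the request is shipped to the node that owns the block,
        # so there is exactly one cache that can ever hold it.
        # Shared disk: each node consults only its own private cache.
        target = block % node_count if shared_nothing else node
        if block not in caches[target]:
            reads += 1                     # cache miss: fetch the block from disk
            caches[target].add(block)
    return reads


# Three nodes ask for the same block; node 0 asks twice.
requests = [(0, 7), (1, 7), (2, 7), (0, 7)]
nothing_reads = disk_reads(requests, node_count=3, shared_nothing=True)
disk_cluster_reads = disk_reads(requests, node_count=3, shared_nothing=False)
print(nothing_reads, disk_cluster_reads)
```

In this model the shared-nothing cluster reads the block from disk once, while the shared-disk cluster reads it once per node that touches it.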
Moves to high-bandwidth networks, such as 100M-bps, 1G-bps and soon 10G-bps networks, and to low-latency networks, such as Intel's VI Architecture, almost always make it less expensive to go remote for data if it resides somewhere, anywhere, in cache, rather than going to it on disk.
PC Week Labs Senior Analyst Timothy Dyck can be reached at timothy_dyck@ziffdavis.com.