Distributed computing, which harnesses the power of multiple CPUs, grew out of scientists' and academics' needs for processing power, but it is rapidly developing commercial applications.
One of the hottest tickets in distributed computing is grid computing. According to Ian Foster, associate division director, senior scientist, and head of the distributed systems lab at Argonne National Laboratory, the scientific community realised in the early 1990s that high-speed networks presented an opportunity for resource sharing.
This would allow interpersonal collaboration, distributed data analysis, or access to specialised scientific instrumentation.
This led to the Globus Project, which defines grids as persistent environments that enable software applications to integrate instruments, displays, computational, and information resources that are managed by diverse organisations in widespread locations. Foster qualifies this by identifying three distinguishing features of grid computing:
- Coupling resources that span multiple administrative domains;
- Doing that in a way that provides well-defined and non-trivial qualities of service, whether it's performance, security, or something else; and
- Doing it with standard, open protocols so resources can be incorporated into other systems.
The Globus Toolkit created at Argonne is used by essentially all the big science projects across the US and Europe. "In Australia as well there's a lot of interest [and] Japan is just starting a big program," says Foster.
The idea attracted broader attention when Foster, Carl Kesselman of The University of Chicago, and Steven Tuecke of The University of Southern California published a paper called "The Anatomy of the Grid", which showed how this idea of resource sharing could be generalised. IBM picked up the idea and the two organisations collaborated on the Open Grid Services Architecture (OGSA), which is basically "grid meets Web Services".
According to Foster, an OGSA version of Globus Toolkit will appear late this year, and IBM, Microsoft, Sun, and other companies have committed to OGSA-compliant versions of relevant software.
One of the goals of OGSA is to provide the building blocks required for the construction of autonomic (self-managing) systems. "We're taking WSDL (Web Services Description Language) and building on that, defining standard WSDL interfaces for things like manageability of resources, lifetime management of services, service data access, so that every resource has a standard format description of its capabilities, standard WSDL interfaces for subscribing to notification events," says Foster. The next version of the Globus Toolkit will define WSDL interfaces.
The big grid
The first components of a grid described as "the world's largest, fastest, most comprehensive, distributed infrastructure for open scientific research" are expected to go online early next year.
Funded by the US National Science Foundation, the US$53 million TeraGrid will include 13.6 teraflops of computing power, over 450 terabytes of data storage, and high-resolution visualisation systems, interconnected by a 40Gbps network. The nodes will comprise Linux clusters of Intel-based IBM computers. Sun and Oracle are also involved in the project, and the research partners are the National Center for Supercomputing Applications at the University of Illinois, the San Diego Supercomputer Center at the University of California, Argonne National Laboratory, and the Center for Advanced Computing Research at the California Institute of Technology.
IBM is also involved in several other major Grid projects including the UK National Grid, the North Carolina Bioinformatics Grid, and the University of Pennsylvania Grid.
SGI is another company closely involved with the development of grid computing. The Globus Toolkit was developed entirely on SGI systems, and the company's hardware ran the first public demonstration of grid technology by Argonne and the University of Southern California. The NASA Information Power Grid is powered exclusively by SGI, and the company claims its systems are used in almost all of the major grid installations in Europe, North America, Japan, and Australia.
Australian universities and other organisations are collaborating to create GrangeNet (GRid And Next GEneration Network), which will connect Melbourne, Canberra, Sydney, and Brisbane with a high-speed backbone network that will provide a platform for grid and other services.
Where's it going?
Grid computing has been driven by technical areas such as life sciences, automotive design and testing, and electronic design, says Kevin Mayo, Asia-Pacific government technologist at Sun. His company uses the technology internally for chip design, with a campus grid covering three centres. "The tools are mostly there now," says Mayo, including open source and public domain software. Over 3000 new grids were created with Sun's software last year, he says. The average size is 40 CPUs, but some have over 1000.
Despite that initial emphasis, Foster says that vendors such as IBM seem to be looking to apply the technology to the notion of sharing heterogeneous resources within organisations in order to make more effective use of them.
Early examples of this might be seen in hosting centres. "If you look at most hosting centres they're incredibly inefficient with all these dedicated stacks of machines for each user. The promise of grid computing is you can start being much more dynamic with resource provision at the back end, and then at the front end you make it possible for people to dynamically acquire resources from different locations rather than rely on the static resources they have," says Foster.
"And then companies like Sun, IBM, and HP want to move towards utility computing, where you're starting to break down this tight coupling between producing and consuming of services, and you can achieve economies of scale and hopefully reduce the cost of acquiring and using various kinds of computing services. But that's maybe some time out, two or three years," he adds.
Security must also be addressed, and allowing the use of resources without increasing risk is a challenging issue.
Grid services could be accessed through a portal, suggests Mayo. Users would be able to indicate the type of hardware needed for their problem and an approximation of the time required, and the portal would act as a broker, discovering the available resources, scheduling the jobs, and handling security issues.
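The broker role Mayo describes can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the Resource and JobRequest classes, the field names, and the first-fit matching rule are assumptions made for illustration and are not taken from any Sun or Globus product. Security handling is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """A hypothetical grid resource advertised to the portal."""
    name: str
    hardware: str      # e.g. "vector", "cluster", "large-memory"
    free_hours: float  # time currently available on this resource


@dataclass
class JobRequest:
    """What a user tells the portal: hardware type and estimated run time."""
    hardware: str
    est_hours: float


def broker(request: JobRequest, resources: List[Resource]) -> Optional[Resource]:
    """Discover matching resources and reserve time on the first that fits.

    Returns the chosen resource, or None if nothing suitable is available.
    """
    for res in resources:
        if res.hardware == request.hardware and res.free_hours >= request.est_hours:
            res.free_hours -= request.est_hours  # "schedule" by reserving the time
            return res
    return None


# Hypothetical pool of resources known to the portal.
pool = [
    Resource("campus-a", "cluster", 10.0),
    Resource("campus-b", "vector", 4.0),
    Resource("campus-c", "vector", 12.0),
]
chosen = broker(JobRequest(hardware="vector", est_hours=6.0), pool)
print(chosen.name if chosen else "no resource available")
```

A real broker would also rank candidates, queue jobs that cannot start immediately, and check the user's credentials; first-fit is used here only to keep the idea visible.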
Don't DIY
Jim Miller is lead program manager, common language runtime (kernel) at Microsoft, but spoke to Technology & Business in a private capacity as someone with long experience of distributed computing, a topic he describes as a recurrent theme in IT.
Miller is concerned by the research establishment's tradition of custom-building its own distributed systems when much is available commercially with broadly similar capabilities. For example, researchers shouldn't worry about creating new wire formats and protocols, he says, since these can deliver only marginal performance improvements. Instead, existing and emerging standards such as Web Services can provide these functions. Although Web Services uses a text protocol rather than a binary one, this incurs a performance penalty perhaps as small as 10 percent. It would be useful if commercial and research work were aligned this time, he suggests.
But Miller understands the reluctance to throw away infrastructure that's been created over a period of years, even if that means ongoing costs escalate.
The focus of distributed computing has shifted from parallelism to the sharing of equipment for computing, he suggests, for example to take advantage of processing time outside normal hours. Although it is now viable on a global scale, "I don't think it's going to make the commercial cut this time round," he says, due to the need for security and a mechanism to support accounting for payment.
According to Kim Branson, a computational biochemist at CSIRO Health Sciences and Nutrition, grids already have security features to prevent eavesdropping, but they are only as secure as the individual machines doing the work.
Back to time-sharing
Work is underway locally to address the need for a market in grid services. The flagship project of the Grid Computing and Distributed Systems (GRIDS) Laboratory at The University of Melbourne is called GridBus (for grid computing and business). Its initiatives include the development of an architecture for a grid economy (including brokers that can dynamically trade for grid resources in an open market), and GridBank, a system to handle accounting and payments for grid computing.
Branson says such an approach is likely to work in two ways: organisations that donate processing time will receive credits that can be used to buy time on other systems (obviously, time on a modest PC cluster would be less valuable than time on a real supercomputer), and people who cannot afford the capital cost of high-performance computing will be able to purchase time on the grid. He hopes this will allow researchers in countries with limited resources to find cures for local diseases that are not of commercial interest.
But why trade time on one system for another? Branson explains that certain problems are best solved on particular types of system. Some problems benefit from hardware with vector maths capabilities; others need large amounts of memory. A heterogeneous grid provides the flexibility to attack different types of problems, he says. Another aspect of the grid economy is the ability to juggle time and cost. Existing scheduling algorithms usually enable jobs to be completed within the requested time period and notional budget, he explains.
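The juggling of time and cost can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the algorithm used by GridBus or any other scheduler: it supposes that each resource has a known relative speed and a price per hour in notional credits, that a job runs on a single resource, and that the user wants the cheapest option that still meets the deadline and budget. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """A hypothetical offer of time on one grid resource."""
    name: str
    speed: float           # relative speed; 1.0 = reference machine
    price_per_hour: float  # notional cost in grid credits


def cheapest_within_limits(work_hours: float, deadline_hours: float,
                           budget: float, offers: List[Offer]) -> Optional[Offer]:
    """Pick the cheapest offer that finishes on time and within budget.

    work_hours is the job's run time on the reference machine.
    Returns None when no offer satisfies both constraints.
    """
    best = None
    best_cost = None
    for offer in offers:
        run_time = work_hours / offer.speed
        cost = run_time * offer.price_per_hour
        if run_time <= deadline_hours and cost <= budget:
            if best_cost is None or cost < best_cost:
                best, best_cost = offer, cost
    return best


# Hypothetical market: a modest PC cluster and a much faster, dearer supercomputer.
offers = [
    Offer("pc-cluster", speed=1.0, price_per_hour=1.0),
    Offer("supercomputer", speed=8.0, price_per_hour=20.0),
]

relaxed = cheapest_within_limits(40.0, deadline_hours=48.0, budget=150.0, offers=offers)
urgent = cheapest_within_limits(40.0, deadline_hours=8.0, budget=150.0, offers=offers)
print(relaxed.name, urgent.name)
```

With a loose deadline the cheaper cluster wins; tightening the deadline forces the job onto the faster and more expensive system, which is the trade-off Branson describes.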
Branson works in computer-aided drug design. This involves testing compounds to see if they will usefully bind to protein receptors associated with particular diseases. The virtual screening process compares the shapes of the compound and the receptor to ensure a sufficiently sticky fit to block the normal action of the protein. What makes this job difficult is that the receptors are tested against databases of six to eight million compounds.
"The size of the database begins to pose some interesting problems in a grid environment," he says. It is too large to store copies at every site, so a few copies are located around the world. When a particular set of calculations is scheduled, the system concerned collects information about the relevant compounds from the closest copy of the database. Once this virtual process is completed, promising candidate compounds are tested in the laboratory.
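Choosing the closest copy of the database can be sketched in a few lines of Python. The article does not say how "closest" is measured, so this example assumes it means the lowest measured network latency; the site names and timings are invented for illustration.

```python
from typing import Dict


def closest_replica(latency_ms: Dict[str, float]) -> str:
    """Return the replica site with the lowest measured latency."""
    if not latency_ms:
        raise ValueError("no database replicas are known")
    return min(latency_ms, key=latency_ms.get)


# Hypothetical round-trip times from a compute site to each database copy.
measured = {"melbourne": 12.0, "san-diego": 180.0, "london": 290.0}
site = closest_replica(measured)
print(site)
```

In practice the choice might also weigh available bandwidth or the load on each copy; latency alone is used here to keep the sketch short.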