This story was printed from ZDNet Australia.
Google open sources 'Protocol Buffers'

By Matthew Broersma, ZDNet UK
July 11, 2008
URL: http://www.zdnet.com.au/news/software/soa/Google-open-sources-Protocol-Buffers-/0,130061733,339290529,00.htm
Google has open sourced an internal development tool called 'Protocol Buffers', a data description language that forms a basic part of the operation of the company's vast computing cluster.

The tool has been in use at Google for several years. It handles the encoding of almost any sort of structured information that the company needs to pass across the network or store on disk, Google open-source programs manager Chris DiBona said in a blog post announcing the move. Protocol Buffers could be useful for other organisations that need an efficient way to move structured data around a network, for instance in large clusters or datacentres, DiBona said.

Google uses thousands of data formats for networked messages, and XML is simply too cumbersome to serve as the encoding method for all of them, Google software engineer Kenton Varda explained in a separate blog post. "As nice as XML is, it isn't going to be efficient enough for this scale," he wrote. "When all of your machines and network links are running at capacity, XML is an extremely expensive proposition."

Various other methods exist for passing encoded data over networks. Varda said that Google found none of them suited its particular need, which was a system optimised for efficiency above everything else.

Protocol Buffers is a sort of interface definition language (IDL), but IDLs have a reputation for being over-complicated, Varda said. "One of Protocol Buffers' major design goals is simplicity," he wrote. "By sticking to a simple lists-and-records model that solves the majority of problems, and resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated."

He estimated that the system is at least an order of magnitude faster than XML. Other Google documentation said Protocol Buffers can be parsed 20 to 100 times faster.
The binary files produced by Protocol Buffers are three to 10 times smaller than comparable XML files, Google said. Google released an FAQ detailing Protocol Buffers, along with source code for the Java, Python, and C++ protocol buffer compilers.

Google acknowledged that the system is comparable to long-established formats such as JavaScript Object Notation (JSON), which is often used in Ajax web programming. Google said, however, that JSON is a human-readable text format like XML, not a binary format like Protocol Buffers, and that this makes JSON less efficient.

Even so, some critics faulted Google for building its own system from scratch instead of adopting existing approaches. David Golightly, user experience developer lead for Zillow.com, argued that the textual syntax used in Protocol Buffers could have been made interoperable with an existing text-based format. "I'm always just a little disappointed when someone goes about creating their own new textual format syntax on arbitrary grounds, rather than adapting an existing format to their needs," Golightly said in a blog post.

Google is not the first to open source its internal data interchange system. Protocol Buffers is very similar to the Thrift framework, which Facebook developed and which is now an open-source project in the Apache Software Foundation Incubator. Thrift, however, differs in that it describes services rather than pure data.
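The following minimal sketch illustrates why a binary encoding can be much smaller than an equivalent XML text. It assumes the base-128 "varint" scheme described in the public Protocol Buffers encoding documentation. The sketch hand-encodes a message with a single integer field (field number 1, value 150). The helper names `encode_varint`, `encode_int_field` and `decode_varint` are hypothetical and are not part of the protobuf library's API. The XML fragment `<a>150</a>` is likewise only an assumed stand-in for a comparable text encoding.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint: 7 bits per byte,
    least-significant group first, high bit set on every byte but the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)  # final byte has the high bit clear
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field (hypothetical helper): a varint key made of
    the field number shifted left 3 bits OR'd with wire type 0 (varint),
    followed by the varint-encoded value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


binary = encode_int_field(1, 150)
xml = b"<a>150</a>"  # assumed text equivalent, for comparison only
print(binary.hex())  # 089601
print(len(binary), len(xml))  # 3 10
```

For this single field the binary form takes 3 bytes, against 10 for the XML fragment. A toy message like this does not establish the three-to-10-times figure Google quoted, which depends on the shape of real messages. It does show the basic trade-off: the binary encoding drops tag names and decimal digits in favour of compact numeric keys and values.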
Copyright © 2009 CBS Interactive, a CBS Company. All Rights Reserved.