
CSIRO work tackles enterprise data mountains

Written by Suzanne Tindal, Contributor

CSIRO's new Terabyte Science Project is aimed at helping science and business cope with ever-growing masses of data.

The world is generating "vast amounts of data, but people don't know how to extract information from [it]", Dr John Taylor, leader of the Terabyte Science Project at CSIRO (Commonwealth Scientific and Industrial Research Organisation), told ZDNet Australia. The organisation is now working on a project that will develop completely new mathematical approaches and processes to deal with such data mountains, he added.

Taylor said businesses will benefit from CSIRO's work on processing large sets of data by gaining new abilities to analyse information produced by data-intensive processes, such as those that surround the use of RFID tags. With the track-and-trace tags set to be embedded in "just about everything", Taylor said, more businesses will find themselves in need of such analysis tools as they track the chips and detect patterns in their movement.

Sequencing of the human genome could also be aided by CSIRO's data analysis work: since automatic genome sequencing machines were developed, data has been "flooding in", according to Taylor.

Taylor said that, until now, the world has been happy working on small datasets. If successful, he continued, CSIRO's work will cause a "step change in the way we do science" and could lead to the "potential for huge new science discoveries".

Taylor's team has already been considering new algorithms for the Square Kilometre Array, an international project to develop a next-generation radio telescope capable of exploring the origins of the universe. The array will produce "terabytes an hour of data", according to Taylor.

He said that the data analysis methods currently in use will not work on projects such as the Square Kilometre Array: although they work for small datasets, they cannot be scaled up for larger ones.

The current methods "won't be able to compute the answers in a reasonable amount of time", said Taylor, since the "computational cost of an algorithm rises as a square of the data points".
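
Taylor's quadratic-cost point can be made concrete with a small experiment. The Python sketch below is illustrative only: the naive all-pairs comparison is an assumed stand-in workload, not CSIRO's actual method. It shows that when cost grows as the square of the data points, doubling the input roughly quadruples the running time.

```python
import itertools
import random
import time

def pairwise_cost(points):
    """Naive all-pairs computation: n*(n-1)/2 comparisons, i.e. O(n^2)."""
    total = 0.0
    for a, b in itertools.combinations(points, 2):
        total += abs(a - b)  # stand-in for any per-pair calculation
    return total

for n in (1_000, 2_000, 4_000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    pairwise_cost(data)
    print(f"n={n:>5}: {time.perf_counter() - start:.3f}s")
    # Each doubling of n roughly quadruples the time; extrapolated to
    # billions of points, such an algorithm would never finish.
```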

To deal with larger datasets, Taylor said, it is necessary to consciously acknowledge the problem of scale and to find new mathematical methods to address it.

By working on projects such as the Square Kilometre Array, Taylor hopes to build up a general body of knowledge about algorithms for large datasets, and he expects that a significant portion of what his team develops for individual projects will be applicable to a wide range of problems across science and business.
