CSIRO work tackles enterprise data mountains

CSIRO's new terabyte science project is aimed at helping science and business cope with ever-growing masses of data.

The world is generating "vast amounts of data, but people don't know how to extract information from [it]", Dr John Taylor, leader of the terabyte science project at CSIRO (Commonwealth Scientific and Industrial Research Organisation), told ZDNet Australia. The organisation is now working on a project that will develop completely new mathematical approaches and processes to deal with such data mountains, he added.

Taylor said businesses will benefit from CSIRO's work on processing large sets of data by gaining new abilities to analyse information produced by data-intensive processes, such as those that surround the use of RFID tags. With the track-and-trace tags set to be embedded in "just about everything", Taylor said, more businesses will find themselves in need of such analysis tools as they track the chips and detect patterns in their movement.

Sequencing of the human genome could also be aided by CSIRO's data analysis work: since automatic genome sequencing machines were developed, data has been "flooding in", according to Taylor.

Taylor said that until now, the world has been happy working on small data sets. He added that, if successful, CSIRO's work will cause a "step change in the way we do science" and could lead to the "potential for huge new science discoveries".

Taylor's team has already been considering new algorithms for the Square Kilometre Array, an international project to develop a next-generation radio telescope capable of exploring the origins of the universe. The telescope will produce "terabytes an hour of data", according to Taylor.

He said that the data analysis methods currently in use will not work on projects such as the Square Kilometre Array: although they work for small data sets, they cannot be scaled up for larger ones.

The current methods "won't be able to compute the answers in a reasonable amount of time", said Taylor, since the "computational cost of an algorithm rises as a square of the data points".
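
To make the scaling point concrete, here is a minimal sketch of how quadratic cost grows. Taylor does not name a specific algorithm, so the all-pairs comparison used here is only an assumed stand-in for a method whose cost rises with the square of the number of data points; the function names are hypothetical.

```python
def pairwise_operations(n_points: int) -> int:
    """Count the distinct pairs among n_points items: n * (n - 1) / 2.

    Assumption: an all-pairs comparison stands in for a method whose
    cost grows with the square of the number of data points.
    """
    return n_points * (n_points - 1) // 2


def cost_growth(small_n: int, large_n: int) -> float:
    """Return how many times more pairwise work large_n needs than small_n."""
    return pairwise_operations(large_n) / pairwise_operations(small_n)


if __name__ == "__main__":
    # Print the pair count for data sets of increasing size.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} points -> {pairwise_operations(n):,} pair comparisons")

    # Compare a thousandfold increase in data with the increase in work.
    growth = cost_growth(1_000, 1_000_000)
    print(f"1,000x more data -> about {growth:,.0f}x more work")
```

Under this assumption, a thousandfold increase in data means roughly a millionfold increase in work, which is why a method that is comfortable on small data sets can become impractical on very large ones.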

To be able to deal with larger data sets, Taylor said it is necessary to consciously acknowledge the problem of scale, and find new mathematical methods to deal with it.

By working on projects such as the Square Kilometre Array, he hopes to build up a generic community of knowledge about algorithms for large data sets. He also hopes that a significant portion of what his team develops for individual projects will be applicable to a wide range of problems across science and business.
