CSIRO work tackles enterprise data mountains

CSIRO's new terabyte science project is aimed at helping science and business cope with ever-growing masses of data.

The world is generating "vast amounts of data, but people don't know how to extract information from [it]", Dr John Taylor, leader of the terabyte science project at CSIRO (Commonwealth Scientific and Industrial Research Organisation), told ZDNet Australia. The organisation is now working on a project that will develop completely new mathematical approaches and processes to deal with such data mountains, he added.

Taylor said businesses will benefit from CSIRO's work on processing large sets of data by gaining new abilities to analyse information produced by data-intensive processes, such as those that surround the use of RFID tags. With the track-and-trace tags set to be embedded in "just about everything", Taylor said, more businesses will find themselves in need of such analysis tools as they track the chips and detect patterns in their movement.

Sequencing of the human genome could also be aided by CSIRO's data analysis work: since automatic genome sequencing machines were developed, data has been "flooding in", according to Taylor.

Taylor said that until now, the world has been happy working on small data sets. He added that, if successful, CSIRO's work will cause a "step change in the way we do science" and could lead to the "potential for huge new science discoveries".

Taylor's team has already been considering new algorithms for the Square Kilometre Array, an international project to develop a next-generation radio telescope capable of exploring the origins of the universe. The telescope will produce "terabytes an hour of data", according to Taylor.

He said that the data analysis methods currently in use will not work on projects such as the Square Kilometre Array: they work for small data sets, but they cannot be scaled up for larger ones.

The current methods "won't be able to compute the answers in a reasonable amount of time", said Taylor, since the "computational cost of an algorithm rises as a square of the data points".
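To give a sense of what quadratic cost means in practice, here is a minimal Python sketch. It is an illustration only, not one of the methods CSIRO is using or developing: the function names and the sample task (finding the smallest gap between any two values) are assumptions chosen for brevity. The first function counts the pairs a compare-everything method must examine; the other two solve the same small problem by checking every pair and by sorting first.

```python
from itertools import combinations


def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n data points: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def closest_gap_brute_force(values):
    """Smallest gap between any two values, checking every pair (quadratic cost)."""
    return min(abs(a - b) for a, b in combinations(values, 2))


def closest_gap_sorted(values):
    """Same answer, but sort first and compare neighbours only (n log n cost)."""
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Hypothetical data set sizes, chosen only to show the growth rate.
    for n in (1_000, 10_000, 100_000):
        print(f"{n} points -> {pairwise_comparisons(n)} pairs to compare")
```

Ten times as many data points means roughly a hundred times as many pairs to compare, which is why a method that is comfortable on a small data set can become impractical on a large one. The sorted variant hints at the kind of reformulation that avoids examining every pair, although whether such a shortcut exists depends on the problem.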

To be able to deal with larger data sets, Taylor said it is necessary to consciously acknowledge the problem of scale, and find new mathematical methods to deal with it.

By working on projects such as the Square Kilometre Array, he hopes to build up a general body of knowledge about algorithms for large data sets. He also hopes that a significant portion of what his team develops for individual projects will be applicable to a wide range of problems across science and business.
