Govt, hurry up with releasing data

Gen Why?

Josh Taylor

Millennials were raised on technology -- they never had to be taught. So if you really need someone to explain what it all really means, just ask Gen-Y geek Josh Taylor, and he'll blog about it (whenever he feels like it).

A programmer scraped data from the My School website to make some really cool heat maps showing which regions have the best-performing schools, no thanks to the government, which didn't supply the data in any usable format.

Joel Pobar, a former Microsoft employee, showed on his blog how he combined the scraped data with Google Maps to produce heat maps of which regions offered the best education.

He marked schools across the state in either green or red on Google Maps, depending on how good each school's average was. It made for an interesting map and, at first glance, revealed some striking patterns, such as city schools having better averages than rural schools.

NSW with the My School data applied
(Credit: Joel Pobar)
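
The green-or-red marking described above can be sketched in a few lines. This is only an illustration, not Pobar's actual code: the school names, the scores, the cut-off of 500 and the rule that green means "at or above the cut-off" are all assumptions made up for the example.

```python
# Hypothetical sample data: (school name, average score). Not real My School figures.
schools = [
    ("Example City High", 560.0),
    ("Example Rural Central", 470.0),
    ("Example Suburban Public", 500.0),
]

# Assumed cut-off: schools at or above it are drawn green, the rest red.
THRESHOLD = 500.0


def marker_colour(average, threshold=THRESHOLD):
    """Return the marker colour for one school's average score."""
    return "green" if average >= threshold else "red"


# Build the (name, colour) pairs that a mapping layer could then plot.
markers = [(name, marker_colour(avg)) for name, avg in schools]

for name, colour in markers:
    print(f"{name}: {colour}")
```

A two-colour split like this is crude; a graded colour scale would probably show more, but it would still need the same underlying data.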

But these images weren't easy to produce. Pobar had a lot of trouble getting the raw data (data that comes straight from its original source). He ended up "scraping" the data from the My School website, something he likely didn't have permission to do.

Data scraping, as defined by Wikipedia, is a technique in which a computer program extracts data from human-readable output (such as a web page).

In Pobar's case, he needed to extract data from the My School site and export it into a format that his code could understand. It is, however, something most programmers try to avoid, as you can end up with all sorts of nasties if the data isn't extracted correctly.
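
To give a feel for what scraping involves, here is a minimal sketch using only Python's standard library. It is not Pobar's code and does not reflect how the My School pages are actually built: the HTML fragment, the school names and the `TableScraper` and `scrape_to_csv` names are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a human-readable results table.
PAGE = """
<table>
  <tr><th>School</th><th>Average</th></tr>
  <tr><td>Example City High</td><td>560</td></tr>
  <tr><td>Example Rural Central</td><td>470</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row currently being read, if any
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Text outside table cells (layout whitespace, captions) is ignored.
        if self._in_cell:
            self._row[-1] += data


def scrape_to_csv(html):
    """Parse the HTML and return the table contents as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.rows)
    return out.getvalue()


print(scrape_to_csv(PAGE))
```

A real page is far messier than this fragment, which is where the nasties creep in: change the layout slightly and a scraper like this can quietly collect the wrong cells.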

The scraping process took him around four hours, four hours of his life he could have had back if the government had provided the data for developers to use. "Why didn't the government just offer up the raw data and let the programmers of Australia mash it up ... or at least give me a feed of the raw data to save me some time," Pobar said on his blog.

You see, government website data is not, by default, licensed under a Creative Commons licence (oh how nice that would be!). Although we pay taxes to the government, we don't own the information it produces. That data is Crown data: data we need permission to reproduce. So if Pobar wished to publish his work, he would need to seek permission to do so. If he wanted to earn money from the work, well, that's another kettle of fish.

The My School copyright statement says:

Copyright in the content and design of this website, including publications and logos, is owned by or licensed to the Australian Curriculum, Assessment and Reporting Authority (ACARA).

Subject to uses permitted under the Copyright Act 1968, you may only download, display, print and reproduce this material in unaltered form only for your personal, non-commercial educational use or non-commercial educational use within your organisation. However, unless otherwise indicated, this permission does not extend to reproduction, communication to the public, publication or other use of the work (in whole or in part) on an external website, intranet site or equivalent media.

This is an issue the Government 2.0 Taskforce tried to fix late last year by creating a contest designed to entice programmers to use government data.

In creating the competition, it also launched a new website called data.australia.gov.au, which lists a whole bunch of raw data sets available for people to use.

This is a great step forward, but we need more of it. At the time I wrote this, the date of the last data release on the site was 11 December. That's last year! Also, some of the data wasn't raw; it was in Excel spreadsheets, which weren't comma separated or easily usable for mashups. That definitely needs improvement.
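
To show why the format matters: comma-separated data loads with nothing but the standard library. The sketch below assumes a made-up data set; the column names and values are invented for the example and are not taken from anything on the site.

```python
import csv
import io

# Hypothetical CSV content; a real data set would be read from a downloaded file.
RAW = """suburb,latitude,longitude,value
Exampleville,-33.87,151.21,42
Sampleton,-37.81,144.96,17
"""

# DictReader uses the header row as keys, so each record is ready to use.
records = list(csv.DictReader(io.StringIO(RAW)))

# Convert the numeric columns into the shape a map mashup would need.
points = [
    (row["suburb"], float(row["latitude"]), float(row["longitude"]), int(row["value"]))
    for row in records
]

for point in points:
    print(point)
```

Getting the same records out of a formatted spreadsheet typically means extra tooling and hand-cleaning before any mashup work can start.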

Of course, I understand that not all data can be released.

I was at a mashup event last year where an Australian Bureau of Statistics employee faced down angry developers calling for the release of data. He said that if the ABS were to release most of its raw data, it could allow people to figure out sensitive information about other people or companies.
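
One simple way that kind of inference can happen is by subtraction. The sketch below uses invented figures and a hypothetical region with only two companies; it is not an ABS example, just an illustration of the risk.

```python
# Hypothetical published aggregate: total revenue for one industry in one region.
published_total = 12_500_000

# Suppose only two companies operate there, and one of them knows its own figure.
my_revenue = 4_200_000

# Subtracting reveals the competitor's revenue, which was never published directly.
competitor_revenue = published_total - my_revenue

print(f"Competitor revenue: ${competitor_revenue:,}")
```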

"Some people are that brilliant that they can work out how much companies earn, what their profit margins are and all of that — and that's something that we have to kind of avoid," said Anthony Zuza, quality assurance manager at the ABS.

I guess finding the right balance is going to be tough, and it is slowing the government's hand in releasing information in the right format. Hopefully we can get past this, so that developers can start doing more cool things.

Talkback

Wow, this is such a shame.

I worked on the Gov2.0 project, looking at online video, but I was there for the free the data discussions.

It is just crazy that this could have happened. Such a high-profile site.

I am sure that the folks who did the site were rushed, so I think that we can forgive them.

Let us hope that we start to see less of this and that machine-readable data becomes the norm.

It is perhaps coincidence that the last data.gov release was just before the Gov 2.0 Taskforce packed up.

I would love to know what is happening with the Gov 2.0 report.

Jimi Bostock
PUSH Agency
Brisbane | Canberra | Sydney | Australia
jimi@pushagency.net

JimiBostock, March 29th, 2010