This story was printed from ZDNet Australia.
Business intelligence: Beyond Olap

By Mark Whitehorn, IT Week
March 21, 2001
URL: http://www.zdnet.com.au/news/business/soa/Business-intelligence-Beyond-Olap/0,139023166,120210277,00.htm


Core business intelligence technologies such as Olap can struggle to make sense of some data sets, but alternative BI approaches can offer new ways of exploring information, says Mark Whitehorn.

Business intelligence (BI) is difficult to define because it is not a single technology. Rather, it is a collection of methodologies that are grouped together solely because they are all designed to help business people extract usable information from raw data.

There is a core set of technologies that are widely accepted as being part of BI: data warehousing, online analytical processing (Olap), and data mining. Quite rightly, these get the majority of the coverage when BI is discussed. However, there are several lesser-known technologies that also deserve to be discussed and perhaps included under the BI umbrella. This briefing paper will consider just two, chosen to demonstrate the broad range of software that is currently available.

Query response
Operational data is almost always held in relational databases, which provide efficient data storage and high data integrity, and allow questions to be asked of the data. The bad news is that they have a very poor query response when used with large sets of data. However, Olap can be used to speed up data queries.

Olap is an established approach to BI and is usually based on several methodologies and technologies. Data is first structured in a star schema with what are known as facts, dimensions and hierarchies. Facts are the data to be analysed, such as number of items sold, payment methods and so on. Typical dimensions might be time, customer, region and product. Each dimension is often arranged as a hierarchy, so time might be arranged as days, weeks, months, quarters and years.
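To make the vocabulary concrete, here is a minimal Python sketch of facts, dimensions and a time hierarchy. The rows, the `time_level` helper and the `roll_up` function are invented for illustration; they are not taken from any particular Olap product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (sale date, product, region, items sold).
# The date, product and region are dimensions; items sold is the fact.
FACTS = [
    (date(2001, 1, 15), "chair", "Leeds", 4),
    (date(2001, 2, 3), "chair", "Leeds", 2),
    (date(2001, 4, 20), "chair", "York", 5),
    (date(2001, 2, 9), "table", "Leeds", 1),
]


def time_level(day, level):
    """Map a date onto one level of the time hierarchy."""
    if level == "month":
        return (day.year, day.month)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "year":
        return (day.year,)
    raise ValueError(f"unknown level: {level}")


def roll_up(facts, level):
    """Total items sold per (time period, product) at the chosen level."""
    totals = defaultdict(int)
    for day, product, _region, items in facts:
        totals[(time_level(day, level), product)] += items
    return dict(totals)


quarterly = roll_up(FACTS, "quarter")
yearly = roll_up(FACTS, "year")
```

Rolling the same facts up by "quarter" or by "year" is what arranging the time dimension as a hierarchy makes possible.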

Users can then ask questions such as, 'How many chairs did we sell in the first quarter of this year to male customers in Leeds?' and expect an almost instant reply. Olap provides such a rapid response by the simple expedient of calculating the answer to all the possible questions that can be asked. In other words, it calculates all of the intersections of all the dimensions during the initial building of the data set, which is known as an Olap cube. But although Olap cubes provide astonishing speed, they have several disadvantages.
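The idea of answering every question in advance can be sketched in a few lines of Python. This is a toy illustration only: the sample rows, the `ALL` marker and the `build_cube` and `query` functions are assumptions made for the example, and real Olap engines are far more sophisticated.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "any value of this dimension"

# Hypothetical rows: (quarter, product, gender, city, items sold).
SALES = [
    ("Q1", "chair", "male", "Leeds", 3),
    ("Q1", "chair", "female", "Leeds", 2),
    ("Q1", "chair", "male", "York", 1),
    ("Q2", "table", "male", "Leeds", 4),
]


def build_cube(rows):
    """Pre-calculate the total for every intersection of dimension values."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        # Each row feeds every combination of (its own value or ALL)
        # across the four dimensions: 2 ** 4 = 16 cells per row.
        for key in product(*[(value, ALL) for value in dims]):
            cube[key] += measure
    return dict(cube)


def query(cube, quarter=ALL, product_name=ALL, gender=ALL, city=ALL):
    """Answer a question with a single dictionary lookup."""
    return cube.get((quarter, product_name, gender, city), 0)


CUBE = build_cube(SALES)
chairs_q1_male_leeds = query(CUBE, "Q1", "chair", "male", "Leeds")
```

All the work happens in `build_cube`; each later question costs one lookup, which is where the speed comes from, and also why the cube grows so quickly.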

For a start, given, say, 10 dimensions (not an unusual number) each with 1,000 values, the number of intersections that can be calculated is 1,000 to the power of 10, or 10^30. Olap software usually has mechanisms to reduce this, but cubes are still massive. The number of possible intersections also means that they can take a long time to build. However, the main disadvantage of Olap goes right back to that star schema and those dimensions. At the point when cube designers choose the dimensions and hierarchies, they are limiting the questions that can be asked of that particular cube. For example, if a monthly hierarchy is not included, the user cannot get monthly totals of product sales. The cube can, of course, be rebuilt to include monthly totals, but that takes time.
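The arithmetic behind the size problem is easy to check: every dimension multiplies the number of intersections by its number of values. The function name below is made up for the example.

```python
def intersections(dimensions, values_per_dimension):
    """Number of cells when every dimension has the same number of values."""
    return values_per_dimension ** dimensions


# 10 dimensions with 1,000 values each: 1,000 multiplied by itself 10 times.
count = intersections(10, 1_000)

# Adding an eleventh dimension multiplies the total by a further 1,000.
with_one_more = intersections(11, 1_000)
```

Here `count` is a 1 followed by 30 zeros, and that is before any hierarchy totals are stored.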

This is where Alterian, a UK-based company, comes in. Alterian has attempted to combine the speed of Olap with the flexibility of a relational database. It provides a database engine that leaves the data structured as relational tables, which instantly cures Olap's problems, because those problems are an inevitable consequence of Olap structuring. However, the reason that Olap was developed in the first place was that relational tables are notoriously slow to query; queries can take hours or days to run. The Alterian solution overcomes this difficulty by making significant changes to the way in which the data is compressed and indexed.

Consider a 10-million-row table with a yes/no field in which the majority of responses are yes. Traditional databases store them all; the Alterian engine can note the default value and then store only the exceptions. The engine has a host of different compression techniques available to it that it can apply to different data types. It draws from this pool of techniques and applies any or all that are appropriate. In fact, the choice of a given technique may depend not just upon the data type but also on the distribution of the data within a given field.
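The "default plus exceptions" idea can be sketched as follows. This is only an illustration of the general technique described above, not Alterian's actual implementation; the function names and the dictionary layout are assumptions.

```python
from collections import Counter


def compress_column(values):
    """Store the most common value once, plus the rows that differ from it."""
    default, _count = Counter(values).most_common(1)[0]
    exceptions = {row: v for row, v in enumerate(values) if v != default}
    return {"length": len(values), "default": default, "exceptions": exceptions}


def read_value(column, row):
    """Look up one row without decompressing the whole column."""
    if not 0 <= row < column["length"]:
        raise IndexError(row)
    return column["exceptions"].get(row, column["default"])


def decompress_column(column):
    """Rebuild the original list of values."""
    return [read_value(column, row) for row in range(column["length"])]


# Hypothetical yes/no field in which most answers are "yes".
answers = ["yes"] * 10
answers[3] = "no"
answers[7] = "no"

packed = compress_column(answers)
```

Ten values shrink to one default and two exceptions; the more lopsided the distribution, the bigger the saving, which is why the choice of technique can depend on how the data is distributed.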

The tables are very heavily indexed and, similarly, the Alterian database engine can draw upon a variety of different indexing techniques. Indeed, some fields may end up indexed multiple times, depending on the structure of the table and its relationship with other tables.
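The article does not say which indexing techniques the engine uses, so the following is purely an assumption: a sketch of one common approach, a per-field index that maps each value to the set of rows containing it. All names and rows are invented.

```python
from collections import defaultdict

# Hypothetical table rows.
ROWS = [
    {"product": "chair", "city": "Leeds", "gender": "male"},
    {"product": "chair", "city": "York", "gender": "female"},
    {"product": "table", "city": "Leeds", "gender": "male"},
    {"product": "chair", "city": "Leeds", "gender": "female"},
]


def build_indexes(rows):
    """Index every field: field -> value -> set of matching row numbers."""
    indexes = defaultdict(lambda: defaultdict(set))
    for row_id, row in enumerate(rows):
        for field, value in row.items():
            indexes[field][value].add(row_id)
    return indexes


def find_rows(indexes, **criteria):
    """Return the row numbers matching all criteria; [] if none are given."""
    matches = None
    for field, value in criteria.items():
        ids = indexes[field].get(value, set())
        # Intersect the row sets instead of scanning the table.
        matches = ids if matches is None else matches & ids
    return sorted(matches) if matches is not None else []


INDEXES = build_indexes(ROWS)
chairs_in_leeds = find_rows(INDEXES, product="chair", city="Leeds")
```

Because every field is indexed, any combination of criteria can be answered by intersecting small sets, without deciding in advance which questions will be asked.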

It is worth noting that although this database is structured as a relational database, this does not mean that it is operational. In other words, it is designed to be used as a read-only database rather than as one where users are allowed to constantly update the data it contains.

Alterian really does seem to let users have their cake and eat it. The database remains structured in a very flexible way, allowing any and all queries to be run. The data volume is trivial compared to Olap, and so is the load time, with data loading at more than 10GB per hour. The company says it offers query speeds in excess of 50 million rows per second using a desktop PC, while servers typically provide faster performance. So users should be able to obtain the query performance of Olap without the disadvantages.

Intuition

Meanwhile, Ijen from Ncorp attempts to use a computer to perform the sort of intuitive matching that we might expect from humans.

One of the big problems that people have when working with computers is that computers are very specific and exact. If a customer searches the traditional database of an online travel agent, they might look for a two-week break in Spain costing between £200 and £300. If no such holiday has been entered into the database, they have to try again. Had they been talking to a human travel agent, the travel agent might have said: 'I've nothing exactly like that, but I've got a fortnight in Portugal for £195; what about that?'

The difference is crucial: a human being has an intuitive feel for how close different pieces of data happen to be. For example, in human terms, a £300 holiday is an ocean away from a £700 one, whereas a £250,000 house is very close to a £249,600 house, despite the fact that the financial difference in both cases is the same: £400. People have acquired a sense of proportion which computers lack.

It is, of course, possible to program into an application a complex set of rules that define what, in this instance, is a sense of proportion. In our example, we might define 'close' as plus or minus five percent. The problem here is twofold. First, it is difficult and time-consuming to define the rules and then program them; it is often the process of defining the rules that is the most time-consuming. Second, experience suggests that, in most business interactions, the rules are constantly altering.
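A hand-coded rule of this kind might look like the sketch below. The function name is invented, and measuring the gap against the larger of the two values is one choice among several.

```python
def is_close(a, b, tolerance=0.05):
    """Treat two amounts as close if they differ by at most 5% of the larger."""
    larger = max(abs(a), abs(b))
    if larger == 0:
        return True  # both values are zero
    return abs(a - b) / larger <= tolerance


# The same £400 gap, judged in proportion to the amounts involved.
holidays_close = is_close(300, 700)          # 400 / 700 is about 57%
houses_close = is_close(250_000, 249_600)    # 400 / 250,000 is 0.16%
```

Even this tiny rule embodies decisions (why five percent, and five percent of what?) that someone has to make and then keep up to date as the business changes.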

This process would be much easier if the root problem were addressed. If it were possible to encapsulate such concepts as 'closeness', 'matching', and 'similarity' and develop software that could recognise when blocks of data are close, then this solution could be applied in a number of different ways. This is exactly what Ijen has done, and the range of applications is staggering. For a start, it can be used to solve the holiday problem outlined above. But it can also be used, for example, to 'watch' the activity of a user and figure out what sort of holiday the user likes and which factors predominate (the price, the destination, or the dates, perhaps) and suggest alternatives that are likely to be acceptable within the ranges deduced.
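How Ijen actually measures similarity is not described here, so the sketch below is only a guess at the general shape of the idea: score each holiday on each factor, average the scores, and offer the nearest matches rather than insisting on an exact hit. The similarity table, the scoring formulas and the sample holidays are all invented for illustration.

```python
# Hypothetical similarity between destinations, on a 0-to-1 scale.
DESTINATION_SIMILARITY = {
    frozenset(["Spain", "Portugal"]): 0.8,
    frozenset(["Spain", "Iceland"]): 0.1,
}


def destination_score(wanted, offered):
    if wanted == offered:
        return 1.0
    return DESTINATION_SIMILARITY.get(frozenset([wanted, offered]), 0.0)


def price_score(low, high, price):
    """1.0 inside the range, falling off in proportion to the overshoot."""
    if low <= price <= high:
        return 1.0
    nearest = low if price < low else high
    return max(0.0, 1.0 - abs(price - nearest) / nearest)


def duration_score(wanted_days, offered_days):
    return max(0.0, 1.0 - abs(wanted_days - offered_days) / wanted_days)


def similarity(request, holiday):
    """Average the three factor scores (equal weights are an assumption)."""
    scores = [
        destination_score(request["destination"], holiday["destination"]),
        price_score(request["min_price"], request["max_price"], holiday["price"]),
        duration_score(request["days"], holiday["days"]),
    ]
    return sum(scores) / len(scores)


def best_matches(request, holidays, limit=3):
    ranked = sorted(holidays, key=lambda h: similarity(request, h), reverse=True)
    return ranked[:limit]


REQUEST = {"destination": "Spain", "min_price": 200, "max_price": 300, "days": 14}
HOLIDAYS = [
    {"destination": "Iceland", "price": 250, "days": 10},
    {"destination": "Spain", "price": 700, "days": 14},
    {"destination": "Portugal", "price": 195, "days": 14},
]

suggestion = best_matches(REQUEST, HOLIDAYS, limit=1)[0]
```

With these made-up numbers, the fortnight in Portugal for £195 comes out on top even though it fails the exact search on both destination and price, much as the human travel agent would suggest.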

It is also able to identify separate holiday events. If the same user has been looking for a week in Portugal and a fortnight in Florida, the system will not suggest 10 days in Iceland. And the same technology can be used in reverse: holiday packages can be put together and tried against the profiles of, for example, the top 1,000 users, so the travel agent can predict the take-up.
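Running the match in reverse could be sketched like this. The profile fields, the five percent price tolerance and the acceptance rule are assumptions made for the example; the article does not describe how Ijen builds or scores its profiles.

```python
def accepts(profile, package, tolerance=0.05):
    """Guess whether a profiled user would consider the package."""
    low = profile["min_price"] * (1 - tolerance)
    high = profile["max_price"] * (1 + tolerance)
    price_ok = low <= package["price"] <= high
    destination_ok = package["destination"] in profile["destinations"]
    days_ok = abs(package["days"] - profile["days"]) <= profile["days_flex"]
    return price_ok and destination_ok and days_ok


def predicted_take_up(profiles, package):
    """Fraction of profiles for which the package is an acceptable match."""
    if not profiles:
        return 0.0
    return sum(accepts(p, package) for p in profiles) / len(profiles)


# Hypothetical user profiles deduced from past searches.
PROFILES = [
    {"destinations": {"Spain", "Portugal"}, "min_price": 200, "max_price": 300,
     "days": 14, "days_flex": 2},
    {"destinations": {"Florida"}, "min_price": 600, "max_price": 900,
     "days": 14, "days_flex": 0},
    {"destinations": {"Portugal"}, "min_price": 150, "max_price": 250,
     "days": 7, "days_flex": 7},
    {"destinations": {"Iceland"}, "min_price": 300, "max_price": 500,
     "days": 10, "days_flex": 3},
]

PACKAGE = {"destination": "Portugal", "price": 195, "days": 14}
take_up = predicted_take_up(PROFILES, PACKAGE)
```

In this toy data set two of the four profiles accept the Portugal package, giving a predicted take-up of 50 percent.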

But it is important to realise that this is a generic solution that can be applied in a variety of areas. The financial services industry has been an early adopter, using the software to identify patterns in the mass of structured information which it collects but which is often stored and never used because the processing has been too complex to automate. Human resources departments can also use this technology, matching people to jobs and getting the right people to do the right tasks.

Copyright © 2009 CBS Interactive, a CBS Company. All Rights Reserved.