Unlocking the business intelligence in research data

Vibeke Ulmann explains how a visual analytics solution is empowering the healthcare and life science research sectors

Today’s digital world is collecting data faster than ever before. The rise is driven not by population growth but by the number, range and affordability of information-gathering devices.

These cheap and increasingly numerous devices include the omnipresent mobile phone, whose user base is estimated to have passed the 5 billion mark and is on target to reach 5.7 billion by 2020. They also include remote-sensing devices, RFID readers, wireless sensors, CCTV cameras and microphones, not to mention the vast data-collecting behemoth that is the worldwide web.

And we should not overlook the fact that the Internet of Things (IoT) and wearable medical data-collection devices are still in their infancy, so the flow of data is set to grow further still.

If there is any doubt about the scale of the data explosion, a 2015 Aureus Analytics report suggested that 90% of the world’s data had been created in the previous two years alone, and that global data volumes were expected to keep growing at 40% a year.

The healthcare and life science industries are no strangers to large data sets. For many years they have collected scientific research results across numerous disciplines, patient profiles, high-throughput screening data on active compounds, and drug compliance and regulatory databases.

These activities continue and the data collected is also expected to grow exponentially in the coming years, holding out the promise of new treatments, new drugs, better clinical decision-making and increased quality of life.

However, to make good on these promises, we need to ensure – as far as possible – the quality, consistency and integrity of this data. We also need tools to interrogate and visualise datasets so large and complex that they challenge relational database software and desktop statistics and data-analysis packages.

Only with tools that reveal hidden correlations within the data, and the patterns and trends behind those associations, will we be able to take advantage of this data explosion.

Finally, to keep the insights we derive current, collecting and analysing these big data sets in real time is a critical requirement for the healthcare and life science sectors.

A multi-dimensional, unstructured analytical engine

Given that big data has the potential to cut research lead times, lower costs, improve care and save lives, deploying massively parallel relational database software across tens or even hundreds of powerful servers may seem the logical approach. However, a better approach is to adopt a multidimensional database optimised for data warehousing and online analytical processing (OLAP), driven by a specialised language built for speed.

This is the concept behind Cosmos – a true visual analytics solution for research purposes. Powered by a multi-dimensional, unstructured analytical engine, written entirely in Dyalog APL, Cosmos can quickly – and flexibly – analyse vast quantities of data, and show the analysis in an interactive and dynamic display of interconnected data points.

Cosmos was developed in the UK by Optima Systems. Its MD, Paul Grosvenor, is clear about the challenges facing the medical and pharmaceutical industries, which find themselves in a battle of understanding: trying to delve into the mass of information now routinely available to researchers and extract only the data of interest.

“Researchers know that cause and effect are very hard to identify and even harder to prove in a world of multiple inputs,” he begins. “For instance, it is easy to say that smoking is bad for you, and few would argue differently, but then so too are chocolate and convenience foods.

“But we don’t live simple lives, and the arguments for and against must take into account other factors, such as our lifestyle, genetic make-up, and environmental pressures.

“Chocolate may be bad for us, but regular exercise may offset its effects unless, of course, you suffer from diabetes.

“Or take a study into the efficacy of a drug treatment, where the outcome may be influenced by many other factors. These could include personal factors, such as age, gender and previous medical history; social and environmental factors, such as income, postcode and where the care is provided; and other factors, such as whether the drug is used alone or in combination.

“Fortunately, our acquisitive approach to collecting data, which is especially true of the healthcare sector, means we have no shortage of information about the manifold factors that may impact our health outcomes. We just need to know what the question is!”

Don’t ask questions – look for anomalies and patterns

Cosmos takes a different approach: it ingests as much data as possible from a wide variety of sources.

Then, handling up to 20 data dimensions at a time, it actively searches for patterns and anomalies, and graphically displays links between the data points.

Rather than asking discrete questions, it visualises patterns and hot spots within the data so that the researcher can be guided towards the areas where questions should, could or might be asked. What’s more, it monitors and tracks the totality of the data store in real time, constantly updating the graphical display in response to the changing data.

“Cosmos eliminates the need to know which question to ask,” says Grosvenor. “Instead it illuminates correlations of data where you can drill down and investigate the underlying individual factors that are coming together. Several clients have called it a ‘thesis generator’ as it allows researchers to tailor their questioning of the data to what the data actually displays, rather than what they think it might hold.

“It is a fundamentally different approach to unlocking the business intelligence held in the data.”

CancerLinq case study

US-based CancerLinq was one of the first organisations to take an interest in the project, and it decided to implement Cosmos. Within three months of the initial concept, the prototype was showing enough promise to accelerate the move to a larger system. The CancerLinq solution currently holds around one million patient records, covering every US state and all cancer types.

Clinicians across the country can now search, to an unprecedented degree, for similar cases of specific cancer types anywhere in the USA. By locating closely matching cases, they can see which courses of action and treatment schedules deliver the best outcomes.
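How CancerLinq matches cases is not described here. A common way to frame ‘find similar patients’ is a nearest-neighbour search, and the sketch below shows that general technique under stated assumptions: each record is a dictionary with a hypothetical cancer_type field plus numeric features (age and tumour_size_mm are invented for illustration), features are rescaled to a common range so that no single unit dominates, and similarity is plain Euclidean distance.

```python
import math


def similar_cases(query, records, features, k=3):
    """Return up to k records of the same cancer type as `query`, ordered by
    Euclidean distance over `features` after min-max scaling.

    Field names ("cancer_type" and the feature names) are hypothetical.
    """
    candidates = [r for r in records if r["cancer_type"] == query["cancer_type"]]
    if not candidates:
        return []

    # Min-max bounds per feature, taken from the candidate pool.
    lo = {f: min(r[f] for r in candidates) for f in features}
    hi = {f: max(r[f] for r in candidates) for f in features}

    def scaled(rec, f):
        span = hi[f] - lo[f]
        return 0.0 if span == 0 else (rec[f] - lo[f]) / span

    def distance(rec):
        return math.sqrt(sum((scaled(rec, f) - scaled(query, f)) ** 2 for f in features))

    return sorted(candidates, key=distance)[:k]


# Hypothetical records for illustration only.
records = [
    {"id": 1, "cancer_type": "breast", "age": 50, "tumour_size_mm": 20},
    {"id": 2, "cancer_type": "breast", "age": 70, "tumour_size_mm": 40},
    {"id": 3, "cancer_type": "lung", "age": 51, "tumour_size_mm": 21},
    {"id": 4, "cancer_type": "breast", "age": 53, "tumour_size_mm": 23},
]
query = {"cancer_type": "breast", "age": 51, "tumour_size_mm": 21}
print([r["id"] for r in similar_cases(query, records, ["age", "tumour_size_mm"], k=2)])  # [1, 4]
```

In practice, choosing which features to compare and how to weight them is the hard part, and a system holding a million records would index them rather than sort the whole pool for every query.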

Specialist language

Multidimensional processing on the scale of Cosmos would take days or even weeks using traditional techniques.

Cosmos relies on Dyalog APL, a specialist language designed specifically for mathematical processing, to deliver its real-time performance. As an illustration of the language’s power, the developers wrote the code in hundreds of lines rather than thousands.

APL is highly effective at solving problems that involve complex calculations on lists or arrays (‘chunks’ of data). Its bit-manipulation capabilities also make it well suited to embedded, robotics and computer vision applications.
