Achieving data-enabled science

Jesse Harris asks whether Excel is holding back the future of digitalised science

Every company is a data company. Businesses worldwide are working hard to turn data into actionable insights. Machine learning, artificial intelligence and big data will change which products we make and how we make them. This is especially true for science companies, which rely on research and development – today’s data management decisions will shape tomorrow’s winners and losers.

Are scientists ready for this new world of digitalised science? Over-reliance on simple data tools, such as spreadsheets, is holding back the transition to the next era of science. Training a generation of digitally enabled researchers is necessary to maintain scientific progress.

Excel isn’t enough

As scientists, we may think of ourselves as data-savvy. Many of us start using Excel in secondary school and continue through university, graduate school and beyond. Spreadsheets are both beginner-friendly and flexible, which is part of the reason they have become the world’s default data management tool.

But Excel is merely the beginning. The software is designed to manage small data sets and create graphs, not to be the backbone of scientific research. Excel also doesn’t speak science – it doesn’t understand chemical structures or genetic code. It is impossible to search by chemical structure or assign relationships between structures without chemical intelligence.

Excel doesn’t work directly with analytical data. Information from NMR spectra, mass spectra and chromatograms must be abstracted into numerical values to be compatible with a spreadsheet. This flattens the data and prevents reprocessing. Analytical data must also be exported into Excel-friendly files before it can be merged with other data streams, which increases the chance of transcription errors.
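A minimal Python sketch can illustrate how flattening prevents reprocessing. It assumes a hypothetical detector trace and a deliberately simple peak picker; the names RAW_TRACE and pick_peaks are invented for illustration and do not come from any instrument or vendor software.

```python
# Hypothetical raw detector trace: (retention time in minutes, intensity).
# These values are made up purely for illustration.
RAW_TRACE = [
    (0.0, 1.0), (0.5, 2.0), (1.0, 40.0), (1.5, 3.0),
    (2.0, 2.0), (2.5, 9.0), (3.0, 2.0), (3.5, 120.0), (4.0, 1.0),
]


def pick_peaks(trace, min_height):
    """Return local maxima whose intensity is at least min_height."""
    peaks = []
    for i in range(1, len(trace) - 1):
        _, prev_y = trace[i - 1]
        t, y = trace[i]
        _, next_y = trace[i + 1]
        if y > prev_y and y > next_y and y >= min_height:
            peaks.append((t, y))
    return peaks


# What typically reaches a spreadsheet: a peak table at one fixed threshold.
spreadsheet_rows = pick_peaks(RAW_TRACE, min_height=20.0)

# With the raw trace still available, a lower threshold finds a minor peak.
reprocessed = pick_peaks(RAW_TRACE, min_height=5.0)

print("Spreadsheet rows:", spreadsheet_rows)
print("Reprocessed from raw trace:", reprocessed)
```

In this sketch the minor peak at 2.5 minutes never reaches the spreadsheet, and nothing in the two stored rows allows it to be recovered later.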

Does Excel still have a place in scientific research? Absolutely. Spreadsheets fill many critical roles. But we need to recognise that Excel is not the only option for managing scientific data. Research organisations must consider how to train or hire scientists with data skills beyond just Excel.

Minding the skills gap

Companies are built on data. Whether it is customer behaviour information, product data or website analytics, every business sits on a mountain of data. Machine learning and artificial intelligence are essential for parsing through this pile of data and turning it into actionable insights.

Organisations worldwide are hunting for employees with skills in data analysis and data management. In almost every industry you look at, employers report a skills gap: they need more data experts. Everyone from big tech to small business is fighting for analytics specialists to help them take advantage of their information and implement machine learning and artificial intelligence projects.

This gap becomes magnified when you look to the sciences. General knowledge about data management will no longer cut it – scientific data is unique. Research organisations are looking for people who are knowledgeable in the natural sciences as well as computers.

It’s not enough for a few data overlords to manage things from afar – every scientist needs to contribute. The success of machine learning applications depends on data cleaning, processing, and databasing. Organisations will need training, systems, and a culture suited for this new reality.

On the other hand, the promise of advanced data sciences in research organisations is exciting. Computer-designed pharmaceuticals, materials, or foods will improve the lives of millions of people worldwide. These products are also a tremendous business opportunity for those who can capitalise on them.

The future of science is collaborative

Science and research work is becoming increasingly collaborative and multi-disciplinary. Major pharmaceutical companies are entering into joint projects or working with contract organisations to bring in subject-area expertise. This is paired with a move towards decentralised research, which was accelerated by the pandemic.

Again, Excel is a barrier to progress. Passing spreadsheets back and forth, either within a team or between groups, leads to “versioning” problems, where users cannot accurately track which version of a file is current. Automatically synced shared spreadsheets often have similar versioning issues, in addition to file permission challenges and accuracy issues.

These problems can lead to losing critical results, submitting incorrect data to regulators, or needing to repeat costly experiments. To put it simply, small mistakes in spreadsheet management can lead to frustration, embarrassment, and expense. Excel is not meant for the highly collaborative workflow of modern research.

Data-enabled science

What does the future of science look like? How will data-enabled researchers work differently from researchers today?

It is hard to say exactly, but you can find clues in cutting-edge scientific data management technology. Luminata is an example of software designed to manage chemistry data in a large research organisation. It is a chemistry, manufacturing and controls (CMC) decision support tool designed to consolidate process chemistry and analytical data in one application. It allows researchers to:

• Search for chemicals based on structure
• Compare analytical results using data visualisation
• Improve decision making
• Simplify regulatory submissions with audit trails
• Track project completion

Pharmaceutical companies using Luminata report substantial time savings, more efficient access to data than when working with Excel, and a reinforced culture of data sharing. While learning to use new software may be intimidating, once implemented it is often easier to use than the alternative.

Technology such as Luminata is only possible when researchers understand the need to improve their data management. Excel is still an essential tool for scientists, but we need to develop a richer toolset. Companies that adapt to this new reality have the opportunity to be true innovators.

Jesse Harris is with ACD/Labs
