Standard file formats – taking mass spectrometry a step forward

1st April 2013

Standard file formats are critical when immediate and easy access to a range of raw instrument data is required.

Since only one data format needs to be considered, standard file formats make a wide range of tasks far easier, including comparing data from different instruments, conducting statistical analyses, generating reports and archival storage.

So standard file formats have significant benefits. Why, then, is their development apparently going backwards?

In the 1980s, there were three significant attempts to develop standard file formats for data exchange. The first format, JCAMP, was focused on optical spectroscopy applications. The second one was known as the Andi protocols and was intended for use with chromatography and mass spectrometry instruments. A third format, which was created to support any type of instrument data, was the Galactic SPC format.

While JCAMP and Andi protocols were created by industry committees, SPC was a published de facto standard developed by software vendors. Both SPC and Andi protocol formats are still in wide use today.

Originally, standard file formats were intended to accelerate processing applications and enable easy data exchange between instruments and laboratory information management systems (LIMS). This was necessary as users were often locked in to a vendor’s software because of proprietary formats. If data were only accessible via the vendor’s software and tools, how would they be accessed in the future when the vendor ceased support of the software or even ceased to exist? Long-term archival storage was therefore a further reason for developing standard file formats.

All three standard file formats attempted to grow to encompass new experiments and types of instrumentation. The Andi protocols were well supported by chromatography and mass spectrometry vendors alike. JCAMP was supported by infrared (IR) and UV/VIS instruments, although complications from a rather complex, unstructured implementation limited its effectiveness.

However, those standard file formats never evolved to become extensible or versatile enough to keep up with the introduction of new experiments and instrument types. As a result, standard file formats have languished.

In the late 1990s, new regulatory requirements for archival data storage, namely the FDA’s 21 CFR Part 11, renewed interest in standard file formats. The new regulations required archive formats to be ‘accurate and complete’. Many new formats were proposed in response, many of them based on a then-new data format language, XML. Unfortunately, none of them has been widely adopted, although there has been some success in specific fields.

There continues to be a strong argument for the development of standard file formats. Data are owned by the user or company who made the measurement and vendors should provide open methods to access data at every level.

Most vendors provide a convenient, simple method of copying a single data scan to the clipboard. Many also support exporting to third-party formats. However, in many cases the export is incomplete or suitable for use only in another vendor’s application. This can make data archiving or post-run analysis difficult or limited. In addition, the vendor’s software is needed in order to export to third-party formats. This limits the ability of third-party analysis tools to access vendor data easily, and those tools must still support multiple formats from multiple vendors.

Mass spectrometry data

Efficiency, in terms of both file size and performance, is particularly important in mass spectrometry (MS) experiments, where huge volumes of data are generated. In addition, there are considerable differences between vendors, each of which supports specialised experiments of its own. This makes defining standard formats challenging.

Arguably a different approach needs to be taken for MS data. One interesting possibility is to define a standard software application programming interface (API) instead of a standard format to contain the data.

An API is a software interface that allows other programs to access the underlying data or functionality through a set of standardised software calls. If vendors were to support a common software API for accessing raw data, this would go a long way towards providing a solution. It would remove the need to create a copy of the potentially huge data files while still allowing efficient access for third-party applications.
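To make the idea concrete, here is a minimal Python sketch of what such a common API might look like. Everything in it is an assumption made for illustration: the names RawDataReader, VendorAReader, VendorBReader and total_ion_chromatogram are hypothetical, the two "vendor" classes are in-memory stand-ins rather than real file readers, and no existing vendor library is being described. The point is only that an analysis routine written against the shared interface never needs to know which vendor produced the data.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

# One scan: (retention time, m/z values, intensities). Illustrative layout only.
Scan = Tuple[float, List[float], List[float]]


class RawDataReader(ABC):
    """Hypothetical common interface that each vendor library would implement."""

    @abstractmethod
    def scan_count(self) -> int:
        """Number of scans in the acquisition."""

    @abstractmethod
    def read_scan(self, index: int) -> Scan:
        """Return one scan without copying or converting the whole file."""

    def scans(self) -> Iterator[Scan]:
        """Iterate over all scans using only the two calls above."""
        for i in range(self.scan_count()):
            yield self.read_scan(i)


class VendorAReader(RawDataReader):
    """Stand-in for one vendor: scans are stored as a list of tuples."""

    def __init__(self, scans: List[Scan]) -> None:
        self._scans = list(scans)

    def scan_count(self) -> int:
        return len(self._scans)

    def read_scan(self, index: int) -> Scan:
        return self._scans[index]


class VendorBReader(RawDataReader):
    """Stand-in for a second vendor with a different internal layout (columns)."""

    def __init__(self, times, mzs, intensities) -> None:
        self._times, self._mzs, self._intensities = times, mzs, intensities

    def scan_count(self) -> int:
        return len(self._times)

    def read_scan(self, index: int) -> Scan:
        return (self._times[index], self._mzs[index], self._intensities[index])


def total_ion_chromatogram(reader: RawDataReader) -> List[Tuple[float, float]]:
    """Vendor-agnostic analysis: summed intensity for each scan."""
    return [(rt, sum(intensities)) for rt, _mz, intensities in reader.scans()]
```

In this sketch the third-party tool is total_ion_chromatogram: it is written once and works unchanged with either reader, even though the two store their data quite differently.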

It should be noted that for GLP and regulatory compliance issues the API would need to be read-only. In other words, the vendor data could not be altered or updated through the API. This approach still allows vendor data to be transferred to archive or other storage formats, and in fact, makes it very easy to do so as the conversion utility need only be written once.
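The two points above, read-only access and a conversion utility that is written only once, can also be sketched in a few lines of Python. Again, this is an illustration under stated assumptions rather than a proposal for the actual API: Scan, ReadOnlyReader, InMemoryReader and export_to_json are hypothetical names, JSON stands in for whatever archive format is chosen, and the reader holds its data in memory instead of opening a vendor file.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol, Tuple


@dataclass(frozen=True)
class Scan:
    """Immutable scan record: attributes cannot be reassigned after creation."""
    retention_time: float
    mz: Tuple[float, ...]
    intensity: Tuple[float, ...]


class ReadOnlyReader(Protocol):
    """Hypothetical read-only API: it exposes reads and has no write methods."""

    def scans(self) -> Iterable[Scan]: ...


class InMemoryReader:
    """Stand-in for a vendor-supplied reader (assumption: data already loaded)."""

    def __init__(self, scans: Iterable[Scan]) -> None:
        self._scans = tuple(scans)  # tuple, so the sequence itself is immutable

    def scans(self) -> Iterator[Scan]:
        return iter(self._scans)


def export_to_json(reader: ReadOnlyReader) -> str:
    """Archive converter written once against the API, not per vendor format."""
    return json.dumps([
        {
            "retention_time": s.retention_time,
            "mz": list(s.mz),
            "intensity": list(s.intensity),
        }
        for s in reader.scans()
    ])
```

Because the converter depends only on the scans() call, any vendor reader that satisfies the same interface could be archived with it, and an attempt to modify a Scan obtained through the reader raises an error instead of silently changing the data.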

Software tools for reading the vendor formats would need to be provided and supported by the vendor. Furthermore, these software tools should be independent of the vendor data acquisition and analysis software. Finally, they should be available or licensed under reasonable terms.

Defining the API would be challenging, but the results, I believe, would be hugely beneficial to vendors and users alike.

In many ways, a standard API is superior to a standard file format for automating data analysis workflows, LIMS integration, third-party application integration and knowledge sharing. It also facilitates the task of long-term archival storage. Imagine a centralised data archive in which data from any vendor’s instruments can be easily shared, viewed and analysed from a single application, including a web-based viewer!

Cerno is particularly interested in standard file formats for MS applications as we provide general tools for processing and analysing mass spectrometry data. Any system that standardised the method by which MS data are accessed would benefit users, vendors and third-party developers alike.

For example, Cerno provides a novel MS calibration technology that significantly improves mass accuracy on an MS instrument (Fig.1). More importantly, Cerno’s comprehensive calibration process performs standardisation on different types of MS data, allowing quantitative comparison/search of archived MS data from different vendors and/or acquired at different times by different users under different mass spectrometry tuning conditions. This makes all MS measurements more reproducible and comparable between brands and techniques.

This could have a large impact on improving the results for proteomic database searching, statistical analysis for metabolomics, and a host of other MS applications. Therefore, an openly accessible standard data format containing such quantitatively standardised mass spectrometry data would be an important step forward for the mass spectrometry community.

Don Kuehl is Vice President, Marketing and Product Development with Cerno Bioscience, Danbury, CT, USA.




