Software / file formats

In your data management plan, you should describe and justify your choice of file formats for all data collected by your project.

According to the University's code of research conduct (pdf), data must be retained intact in its original form. 

However, the formats that are most appropriate for data collection and analysis may not ensure the long-term preservation, accessibility and re-usability of data.

It is important to consider from the outset how the data can be converted into open, re-usable and preservable formats before deposit into an archive or repository.   
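One way to show that the original files have been retained intact while converted copies are prepared for deposit is to record a checksum for each original and recompute it later. The sketch below is illustrative only: the function names `sha256_of_file` and `is_unchanged` are hypothetical, and the choice of SHA-256 is an assumption rather than a University requirement.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Return True if the file's current digest matches the recorded one."""
    return sha256_of_file(path) == recorded_digest
```

In practice you might record the digest when the data is first collected, store it alongside your documentation, and call `is_unchanged` before deposit.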

"While it is possible for bits to be preserved indefinitely without consideration of format, it is only through the careful management of format that the meaning of those bits remains accessible over time."

- Stephen Abrams, California Digital Library

Formats for data collection and analysis 

The formats you use during research will depend on:

  • How you plan to analyse data
  • Discipline-specific standards and customs
  • Whether you prefer proprietary or open software
  • Compatibility of software with available hardware
  • Availability of institutional software licences
  • Staff expertise

Planning for data conversion

When data is converted from its original format in preparation for long-term storage, the conversion itself can alter the data.

After conversion, check the data for errors and unintended changes.
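For tabular data, a post-conversion check can be as simple as comparing the original and converted files row by row. The following is a minimal sketch, assuming both versions can be read as CSV text. The helper names `read_rows` and `compare_tables` and the sample values are hypothetical.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def compare_tables(original, converted):
    """Return a list of human-readable differences between two row lists."""
    problems = []
    if len(original) != len(converted):
        problems.append(
            f"row count differs: {len(original)} vs {len(converted)}"
        )
    # Compare cell values as strings so that silent reformatting is caught.
    for index, (before, after) in enumerate(zip(original, converted), start=1):
        if before != after:
            problems.append(f"row {index} differs: {before} -> {after}")
    return problems


# Illustrative data: the converted copy has dropped a trailing zero.
original = read_rows("id,reading\n1,0.10\n2,0.25\n")
converted = read_rows("id,reading\n1,0.1\n2,0.25\n")
for problem in compare_tables(original, converted):
    print(problem)
```

Comparing values as strings is deliberately strict: it flags changes such as lost trailing zeros that a numeric comparison would miss. Whether such a change matters depends on your data.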

Hardware and software independence

You should consider whether the hardware and software used to create and manipulate research data are likely to remain available in the long term. 

Considerations include:

  • how long the hardware and software are likely to remain available
  • the likelihood of support for the current data formats in future versions of the software
  • the size and level of activity of the developer and user communities
  • the level of technical support available, both now and in the future

JISC digital media

The JISC digital media site offers advice and information, especially regarding images, sound and video. 


Formats for preservation and usability 

When choosing a file format for long-term preservation, ask:

  • Will future users of the data be able to use this format?
  • Will this format still be usable in ten years' time?
  • Does your archive accept this file format?
  • When converting data across formats, how will you check that no data or internal metadata have been lost or changed?

The file formats you choose for long-term preservation should be:

  • non-proprietary
  • unencrypted
  • uncompressed
  • based on open standards (endorsed and published by standards agencies such as ISO)
  • publicly documented (in complete and authoritative specifications)
  • widely used and accepted as best practice (within your discipline or another user community)
  • self-documenting (i.e. the file itself includes useful metadata)
  • the product of collaborative development and consultative processes
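The criteria above can be recorded as a simple checklist for each candidate format, which may help when justifying your choice in a data management plan. This is a sketch only: the class `FormatChecklist` and its field names are hypothetical, and the answers in the example are placeholders that you would replace with your own assessment.

```python
from dataclasses import dataclass, fields


@dataclass
class FormatChecklist:
    """Answers for one candidate format; every value is supplied by you."""
    non_proprietary: bool
    unencrypted: bool
    uncompressed: bool
    open_standard: bool
    publicly_documented: bool
    widely_used: bool
    self_documenting: bool
    collaboratively_developed: bool

    def unmet(self):
        """Return the names of the criteria answered False."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder answers for an unnamed candidate format.
candidate = FormatChecklist(
    non_proprietary=True,
    unencrypted=True,
    uncompressed=True,
    open_standard=False,
    publicly_documented=True,
    widely_used=True,
    self_documenting=False,
    collaboratively_developed=True,
)
print(candidate.unmet())  # ['open_standard', 'self_documenting']
```

A format that fails one or two criteria may still be the best available option in your discipline. The list of unmet criteria simply shows what your plan should justify.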

For detailed guidance, see: