A brief look at file format design

Introduction: What are file formats? Why talk about them?

Everybody has come across a great many file formats and all the problems that go along with them. Coders in particular bear the burden of supporting the multitude of formats out there. File formats are a fact of life, then; why talk about them? The answer is simple: far too many bad formats exist. As the scene is very active in establishing its own standards and conventions, putting a finger on the basics might well serve a valuable purpose in the long run.

In computer science circles, what most people know as file formats are called external data representations or serialisations. These two terms nicely summarise what formats are all about: representing abstract data types (ADTs) as concrete byte sequences. But this is only part of the story: ADTs include a procedural side as well, which tells how the stored structures are accessed and modified. Now, current software architectures do not admit a portable definition of complete ADTs: we must separate ADTs into procedural and declarative parts and fix a particular concrete transfer syntax for the latter. This is why computer scientists draw a line between transfer syntaxes and internal representations: the first is used when we cannot control the procedural side of our ADTs, the second when we can. Strictly specified file formats are, then, just a way to circumvent the need for common access libraries and proper type encapsulation.

In the scene, this internal vs. external distinction is not often appreciated. A common mistake is made: file formats are modelled too closely after the operation of the writing program. Instead, the logical organization of the data itself should be used as the starting point.

One direct consequence of fixed representations is the need to define them accurately.
Often, in addition to being incompletely defined, file formats tend to be too syntax-oriented: usage guidelines, algorithm details and data dependencies are seldom spelled out. In effect, we leave a considerable hole in the definition of our abstract type. This means that even though a format spec may be available, improper implementation is still quite easy. All too frequently this happens because the writer of the original application thought no one else would use the format. The scene rests purely on de facto standards, which are almost invariably meant to be private at the time of their inception. This assumption often fails, so most file formats should be designed and defined more carefully. This is the subject matter of this article: food for thought, basically.

Fundamentals of serialisation

In designing a transfer encoding, we face a multitude of engineering goals, some of which are mutually incompatible. Some of the major ones are:

1. Compact encoding
2. Speedy access
3. Generality and extensibility
4. Simplicity and clarity (in formats, specs and software)
5. Architectural requirements (like serial access)
6. Consistency/orthogonality (see below)
7. Fault tolerance and security
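
To make the tension between the first two goals concrete, here is a minimal Python sketch that serialises the same list of unsigned integers in two ways. It is an illustration under assumptions, not something taken from any real format: the 4-byte little-endian layout, the LEB128-style variable-length integers and all function names are hypothetical choices made for this example. The fixed-width form allows direct access by offset; the variable-length form is smaller but has to be scanned from the start.

```python
import struct


def encode_fixed(values):
    """Fixed width: every value is a 4-byte little-endian unsigned int."""
    return b"".join(struct.pack("<I", v) for v in values)


def read_fixed(buf, index):
    """Speedy access: the offset of item `index` is computed directly."""
    return struct.unpack_from("<I", buf, index * 4)[0]


def encode_varint(value):
    """LEB128-style integer: 7 data bits per byte, least significant group
    first, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("only unsigned values are supported in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_variable(values):
    """Compact encoding: small values take fewer bytes."""
    return b"".join(encode_varint(v) for v in values)


def read_variable(buf, index):
    """Serial access: every earlier item must be decoded to find item `index`."""
    pos = 0
    value = 0
    for _ in range(index + 1):
        value = 0
        shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
    return value


values = [3, 300, 70000, 5]
fixed = encode_fixed(values)
variable = encode_variable(values)
print(len(fixed), len(variable))
print(read_fixed(fixed, 2), read_variable(variable, 2))
```

Note that the byte order is stated explicitly in both encodings rather than inherited from the writing machine, in line with the point that the format should follow the logical organisation of the data and not the internals of the program that wrote it.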