This document was put together by b3ta member claws of doom as
part of a digitisation project which is to be found here.
Why do we have file formats?
We have file formats because they form the basis of a convenient
way of passing image data from machine to machine. If I were to
ask you to pass information about an image to me over the phone,
how would you go about it? Would you break the image into sections
and describe what is in each section? Would you try and convey the
whole of the image at once? Would you try and define what colours
there are in the image?
File formats are merely pre-agreed standard ways of describing
images – so that when your computer talks to mine about a
so-and-so colour in a specific part of the image, my computer will
be able to understand it exactly and replicate the image your computer
describes.
File formats
This is a rough guide to what each of the popular file formats was
designed to do. It'll discuss which format is suitable for different
purposes, and what the specific strengths and weaknesses of each
file format are. This isn't meant to be an absolutely accurate, fact-laden
primer – merely a gentle introduction to file formats.
The three formats under scrutiny here will be GIF (Graphics Interchange
Format), JPEG (Joint Photographic Experts Group), and TIFF (Tagged
Image File Format). There are many (many!) more file formats than
the three described here. These three are merely the ones most often
used for their given purposes.
The basics
All the file formats follow roughly the same logic for storing
images. They take an image, break it up into manageable bits, and
then store those bits in an order from which the image can be
reconstructed later. The "bits" that the images are broken into, and
the way their order is chosen, are what differentiate the file formats
(on a very basic level).
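The shared "store the bits in order, rebuild them later" idea can be sketched in a few lines of Python. This is a minimal illustration assuming pixels are held as a list of rows, top row first; the names `flatten` and `rebuild` are made up for this example and are not part of any real format.

```python
def flatten(rows):
    """Place the rows end to end, topmost row first."""
    return [pixel for row in rows for pixel in row]

def rebuild(pixels, width):
    """Cut the flat list back into rows of `width` pixels each."""
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("pixel count must be a whole number of rows")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

# A tiny 3-by-2 image whose pixels are labelled by colour key.
image = [["a", "b", "a"],
         ["a", "a", "c"]]
flat = flatten(image)
print(flat)                       # ['a', 'b', 'a', 'a', 'a', 'c']
print(rebuild(flat, 3) == image)  # True
```

Without the width, the same six-item list could just as well be a 2-by-3 picture, which is why the dimensions have to be stored alongside the data.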
GIF
A GIF is mostly used on the web for logos, lines and one-pixel
images. It is the only widely supported image format that can handle
animations on the web (not discussed here). GIFs are limited by their
definition to at most 256 colours. This means that they cannot
accurately display photographic images, which need 256 shades of
each base colour (red, green and blue), or roughly 16.7 million colours in all.
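The gap between a GIF palette and full photographic colour is a quick calculation. Here it is in Python, assuming 256 shades for each of red, green and blue:

```python
palette_size = 256          # the most colours a GIF palette can hold
shades = 256                # levels of each of red, green and blue
full_colour = shades ** 3   # every combination of the three

print(full_colour)                      # 16777216
print(full_colour // palette_size)      # 65536
print((palette_size - 1).bit_length())  # 8 bits to pick a palette entry
print((full_colour - 1).bit_length())   # 24 bits to spell out any colour
```

So, before any compression, a palette key takes one byte per pixel against three bytes for a full-colour pixel, but it can only choose from a far smaller range of colours.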
To store an image as a GIF, the following steps are taken.
- The image is analyzed to see which set of colours (up to 256
different ones) will best describe the colours present.
- The colours are defined and each tagged with a different key
(colour a, b, c etc. if you will)
- The lines of the picture are placed end to end, with the topmost
line first.
- Each individual pixel is then described. This generates a line
of data which could appear as "a,b,a,a,a,c,b,a,a,b,b,b,b,c,a"
etc.
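- This is compressed to: "a,b,3a,c,b,2a,4b,c,a" etc. (This is a
simplification: real GIFs use a scheme called LZW, which shortens any
repeated sequence of pixels, not just runs of a single colour.)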
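The GIF steps above can be sketched in Python. This is a simplified illustration: it assumes the image already has 256 colours or fewer, and it uses the run-length scheme described here rather than the LZW compression real GIF files use. The function names are hypothetical.

```python
def build_palette(rows):
    """Tag each distinct colour with a key (its index in the palette)."""
    palette = {}
    for row in rows:
        for colour in row:
            if colour not in palette:
                palette[colour] = len(palette)
    if len(palette) > 256:
        raise ValueError("more than 256 colours; they would need reducing first")
    return palette

def run_length_encode(keys):
    """Collapse runs of the same key into (count, key) pairs."""
    runs = []
    for key in keys:
        if runs and runs[-1][1] == key:
            runs[-1][0] += 1
        else:
            runs.append([1, key])
    return [(count, key) for count, key in runs]

def as_text(runs):
    """Write runs as the article does: '3a' for three a's, plain 'b' for one."""
    return ",".join(str(key) if count == 1 else f"{count}{key}"
                    for count, key in runs)

# The article's example line of keys.
line = "a,b,a,a,a,c,b,a,a,b,b,b,b,c,a".split(",")
print(as_text(run_length_encode(line)))  # a,b,3a,c,b,2a,4b,c,a

# The same steps on a tiny 3-by-2 image of (red, green, blue) colours:
# build the palette, lay the rows end to end, then compress.
rows = [[(255, 0, 0), (255, 0, 0), (0, 0, 255)],
        [(0, 0, 255), (0, 0, 255), (0, 0, 255)]]
palette = build_palette(rows)                     # red -> 0, blue -> 1
keys = [palette[c] for row in rows for c in row]  # [0, 0, 1, 1, 1, 1]
print(run_length_encode(keys))                    # [(2, 0), (4, 1)]
```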
From this you can see that a large swathe of the same colour takes
up very little space – imagine a section compressed to "2478a".
These large swathes of colour occur mostly in logos and blocks of
colour. Even slight, irregular changes of shade between neighbouring
pixels break up these runs and make the compression far less effective.
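The difference between a flat swathe and a shaded one shows up in a quick count of runs under the simplified run-length scheme: each run costs one (count, key) pair. The alternating line below defeats run-length coding specifically. Real GIF compression (LZW) would still catch a regular repeating pattern like this, so it is irregular variation, such as the noise in a photograph, that defeats it.

```python
def count_runs(keys):
    """How many runs (stretches of one repeated key) a line contains."""
    runs = 0
    previous = object()  # a sentinel that matches no real key
    for key in keys:
        if key != previous:
            runs += 1
            previous = key
    return runs

flat_line = ["a"] * 2478       # one swathe of a single colour
shaded_line = ["a", "b"] * 1239  # same length, but the shade keeps changing

print(count_runs(flat_line))    # 1    -> stored as "2478a"
print(count_runs(shaded_line))  # 2478 -> no saving at all
```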
Given the original picture dimensions, the image can be worked out
again by merely reading the list of data and laying each pixel
down – much like a mosaic. If the image had 256 colours or fewer
to start with, you wouldn't have lost any detail in your image by
storing it as a GIF.
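Reading the list back is the same process in reverse. This sketch assumes the simplified (count, key) run list and a palette that maps colours to keys. `decode` is a hypothetical helper, and a real GIF decoder would undo LZW instead.

```python
def decode(runs, palette, width):
    """Expand each run, swap keys back for colours, and lay out the rows."""
    colours = {key: colour for colour, key in palette.items()}
    pixels = [colours[key] for count, key in runs for _ in range(count)]
    if len(pixels) % width:
        raise ValueError("pixel count does not fit the stated width")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

red, blue = (255, 0, 0), (0, 0, 255)
palette = {red: 0, blue: 1}
runs = [(2, 0), (4, 1)]

image = decode(runs, palette, width=3)
print(image == [[red, red, blue], [blue, blue, blue]])  # True: nothing lost
```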
JPEG
A JPEG is mostly used for photographs and complex images where
continuous changes of tone are required. The number of colours isn't
limited to a palette. How much the image is compressed depends on the
image quality desired: the smaller the file, the lower the quality.
To store an image as a JPEG, the following steps are taken (in a
much-simplified form):
- The image is broken up into bits: squares whose size depends
on the compression desired. The higher the compression, the bigger
the square. (Real JPEGs always use squares of 8 by 8 pixels and instead
vary how much of the detail inside each square is kept, but the effect
is similar.)
- The top left and bottom right corners of each square are
then looked at, and the colours in those corners defined.
- A simple mathematical equation is used to draw a colour curve
from the top left corner to the bottom right, approximating the
square's contents as closely as possible.
- A list is generated of the two colours and the equation for
each box, in order (for example: Col1, Col2, Eq1; Col3, Col1,
Eq2). Most of the saving comes not from squeezing this list but from
making the squares bigger, so that fewer of them are needed to
complete the whole picture.
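The simplified box model above can be tried out in Python. This is a toy, not the real JPEG algorithm. To keep it short, it assumes single brightness values (0 to 255) instead of colours, and a straight-line blend along the diagonal as the "simple mathematical equation". Both are assumptions made for illustration.

```python
def approximate_square(square):
    """Keep only the top-left and bottom-right values and redraw the
    square as a straight-line blend between them along the diagonal."""
    n = len(square)
    start, end = square[0][0], square[-1][-1]
    if n == 1:
        return [[start]]
    steps = 2 * (n - 1)  # diagonal distance from top-left to bottom-right
    return [[round(start + (end - start) * (x + y) / steps) for x in range(n)]
            for y in range(n)]

smooth = [[0, 50, 100],
          [50, 100, 150],
          [100, 150, 200]]
print(approximate_square(smooth) == smooth)  # True: a smooth blend survives

speck = [[0, 0, 0],
         [0, 255, 0],
         [0, 0, 0]]
print(approximate_square(speck))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The bright speck in the middle is smaller than the square, so it vanishes. That is the kind of lost fine detail the next paragraph describes. Each square costs two stored values plus its equation, however many pixels it covers.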
From this you can see that any fine detail (smaller than the size of
the squares) would be lost – or at least altered into an
approximation. When these changes are visible, they are known as
"artefacts".
Once again, given the dimensions of the original image, it is a
simple process to re-lay all the boxes in their correct order and
regenerate the image.
Note that the regenerated image isn't exactly the same as your
original – merely a (close) approximation. If you re-save
the image as a JPEG many times, with varying compression
(and therefore box size), the image will drift further and further
away from the original.
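The drift from repeated re-saving can be seen with the same toy box model shrunk to a single row of brightness values. This is an illustration only: the box sizes 3 and 4 are arbitrary choices, `approximate_line` is a hypothetical helper, and real JPEG re-saves lose detail through a different mechanism.

```python
def approximate_line(values, size):
    """Toy 're-save': split a row into segments of `size` values and
    redraw each segment as a straight blend between its end values."""
    out = []
    for start in range(0, len(values), size):
        seg = values[start:start + size]
        if len(seg) == 1:  # a lone value has nothing to blend
            out.extend(seg)
            continue
        a, b, n = seg[0], seg[-1], len(seg) - 1
        out.extend(round(a + (b - a) * j / n) for j in range(len(seg)))
    return out

def total_error(values, original):
    """Sum of how far each value has drifted from the original."""
    return sum(abs(v - o) for v, o in zip(values, original))

original = [10, 80, 20, 90, 30, 70, 40, 60]
first = approximate_line(original, 3)  # saved with one box size...
second = approximate_line(first, 4)    # ...then re-saved with another
print(first)   # [10, 15, 20, 90, 80, 70, 40, 60]
print(second)  # [10, 37, 63, 90, 80, 73, 67, 60]
print(total_error(first, original), total_error(second, original))  # 115 166
```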
TIFF
A TIFF is an image format that is not used to display images on
the web, but is used to transfer images accurately. It can handle
all the colours of a JPEG with the accuracy of a GIF. The downside
is that files tend to be larger. The upside is that, when saved
without lossy compression, a TIFF can be re-saved again and again
without loss of data between generations, and it stores photographic
images more faithfully than a JPEG can.
Storing an image as a TIFF can work much the same way as a GIF,
just with a larger colour table, although TIFFs more often skip the
table and record each pixel's red, green and blue values directly.
TIFF is also a highly extensible format: should you want to define
an image in your own way, it allows you to do so. It also supports
a variety of compression schemes (LZW among them), which can help
reduce file size.
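Lossless re-saving can be illustrated with Deflate, one of the compression schemes TIFF files can use, through Python's built-in `zlib` module. This sketch does not write a real TIFF file, which would also need the TIFF header and tags. It only shows that lossless compression gives back exactly the same data, however many times you save and reopen. The pixel data is made up for the example.

```python
import zlib

# Hypothetical raw image: 10,000 pixels, 3 bytes (red, green, blue) each,
# stored directly rather than through a colour table.
pixels = bytes([200, 30, 30] * 5000 + [30, 30, 200] * 5000)

data = pixels
for generation in range(10):
    stored = zlib.compress(data)    # "save" with lossless Deflate compression
    data = zlib.decompress(stored)  # "open" the file again

print(data == pixels)             # True: ten generations, no data lost
print(len(stored) < len(pixels))  # True: and the stored copy is smaller
```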
Examples:
Included are exaggerated examples of the suitability of GIFs and
JPEGs for different images. A TIFF would match the better of the two
in every case, and would beat the JPEG for photographic images.
Example 1: Logo
[Images: logo.gif and logo.jpg]
Example 2: Photo
[Images: photo.gif and photo.jpg]
Comparison table of uses for GIF, JPEG and TIFF
| Suitable for | GIF (*.gif) | JPEG (*.jpg/*.jpeg) | TIFF (*.tif) |
|---|---|---|---|
| Logos | Yes | No | Yes |
| Photographs | No | Yes | Yes |
| Data loss | No/Variable | Yes, variable | No |
| Web display | Yes | Yes | No |
Notes
This is in no way meant to be an exacting or even truthful account
of How Things Work – merely an explanation that gives some
(if any) insight into the mechanics of image formats. It is intended
to whet the appetite more than provide the final answer. It was
written after noticing that explanations of file formats often dive
straight into technical specifications. I hope it
helps. Should you require further input (Johnny Five is alive!),
see the following links:
(c) Gwydion Gruffudd (5/1/2004)