
Types of Compression

Compression in computer science can be categorized into two main types: lossless and lossy compression. Lossless compression reduces file size without losing any data, making it suitable for text and data files. Lossy compression, on the other hand, reduces file size by eliminating some data, making it more suitable for multimedia files like images, audio, and video.


7 Key excerpts on "Types of Compression"

Index pages curate the most relevant extracts from our library of academic textbooks. They are created using an in-house natural language model (NLM), and each adds context and meaning to a key research topic.
  • The Digital Document
    • Bruce Duyshart (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    There are quite a number of different compression algorithms which can be used to compress data. As was shown in the previous examples, some file formats can use just one or a combination of these different types of compression in their specification. Fortunately, in most cases, the user is also shielded from having to understand the type of compression which a file format uses. An understanding of the basic principles of compression, however, will explain which types of data can be more easily compressed and, consequently, why some file formats are more efficient with the amount of space they occupy than others.
    Data compression is a method of reducing the physical size of a block of information. In order for the process to be used effectively, a standard encoding scheme must be used to determine how data is to be compressed when it is not in use, and decompressed when it is in use. The difference in file size between these two states is referred to as the compression ratio or compression percentage. A ratio of 10:1, or 10 per cent, for example, would imply that the algorithm used is capable of compressing the raw data to one tenth of its original size.
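    To make the arithmetic concrete, here is a minimal sketch in Python (it uses the standard zlib module, which the excerpt itself does not mention) that compresses a redundant block of text, reports the ratio, and verifies the lossless round trip:

```python
import zlib

# Highly redundant input: repeated text compresses well.
original = b"The quick brown fox jumps over the lazy dog. " * 100
compressed = zlib.compress(original)

# Compression ratio as defined above: raw size over compressed size.
ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio {ratio:.0f}:1")

# Lossless: decompression restores every byte exactly.
assert zlib.decompress(compressed) == original
```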
    Compression technology is quite diverse. The algorithms that can be used include: physical, logical, symmetrical, asymmetrical, adaptive, semi-adaptive, non-adaptive, lossy and lossless methods. The most important of these types of compression to understand is the difference between lossy and lossless compression.
    Most compression schemes use the lossless method of compression, whereby data can be compressed and decompressed without any loss of data. By comparison, lossy compression removes some of the relatively superfluous data in a file, in order to achieve better compression ratios.
    Lossy compression systems generally work by using complex heuristic algorithms to rationalise the number of required colours in an image. The process works by replacing adjacent and similarly coloured pixels within a certain range, with another common colour. While this process may seem to reduce the overall quality of an image, most lossy compression algorithms can achieve compression ratios of 20:1 to 25:1 without any perceptible loss of image detail.
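    The colour-rationalisation idea can be sketched in a few lines of Python. The sketch below is a deliberately crude uniform quantizer, not the heuristic of any particular format: it snaps each channel to a coarse grid so that similar neighbouring colours collapse to one common colour.

```python
def quantize_pixel(rgb, step=32):
    """Snap each channel to a coarse grid so that similar colours
    collapse to one common colour (the lossy step)."""
    return tuple(min(255, (c // step) * step + step // 2) for c in rgb)

# Three nearly identical reds and one blue.
row = [(200, 10, 10), (201, 12, 9), (199, 11, 11), (50, 50, 255)]
print([quantize_pixel(p) for p in row])
# The three reds all become (208, 16, 16); the blue stays distinct,
# so a later lossless pass sees a long run of a single colour.
```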
  • How Video Works: From Broadcast to the Cloud
    • Diana Weynand, Vance Piccin (Authors)
    • 2015 (Publication Date)
    • Routledge (Publisher)
    14 Compression
    Compression is the process of reducing data in a digital signal by eliminating redundant information. This process reduces the amount of bandwidth required to transmit the data and the amount of storage space required to store it. Any type of digital data can be compressed. Reducing the required bandwidth permits more data to be transmitted at one time.
    Compression can be divided into two categories: lossless and lossy. In lossless compression, the restored image is an exact duplicate of the original with no loss of data. In lossy compression, the restored image is an approximation, not an exact duplicate, of the original (Figure 14.1).

    Lossless Compression

    In lossless compression, the original data can be perfectly reconstructed from the compressed data. Compressing a document is a form of lossless compression in that the restored document must be exactly the same as the original. It cannot be an approximation. In the visual world, lossless compression lends itself to images that contain large quantities of repeated information, such as an image that contains a large area of one color, perhaps a blue sky. Computer-generated images or flat colored areas that do not contain much detail—e.g., cartoons, graphics, and 3D animation—also lend themselves to lossless compression.
    Figure 14.1 Lossless vs Lossy Compression
    One type of lossless compression commonly used in graphics and computer-generated images (CGI) is run-length encoding.
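    A minimal run-length encoder and decoder, sketched in Python (an illustration of the principle, not the exact scheme used by any particular graphics format):

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (value, count) pairs."""
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (value, count) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


sky = bytes([120] * 1000 + [30] * 4)   # a flat area, e.g. a blue sky
runs = rle_encode(sky)
print(runs)                             # [(120, 1000), (30, 4)]
assert rle_decode(runs) == sky          # lossless round trip
```

    Long flat runs, like the blue-sky example above, collapse to a handful of pairs, which is why this method suits graphics and CGI rather than noisy natural images.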
  • The Manual of Photography
    • Elizabeth Allen, Sophie Triantaphillidou (Authors)
    • 2012 (Publication Date)
    • Routledge (Publisher)
    lossy.
    Lossless compression methods, as the name suggests, compress data without removing any information, meaning that after decompression the reconstruction will be identical to the original. However, the amount of compression achieved will be limited. Certain types of information require perfect reconstruction, and therefore only lossless methods are applicable.
    Lossy compression methods remove redundancy in both data and information, incurring some losses in the reconstructed version. Lossy compression is possible in cases where there is some tolerance for loss and depends on the type of information being represented. An example of such a situation is one where some of the information is beyond the capabilities of the receiver. This process is sometimes described as the removal of irrelevancies. In lossy methods there is always a trade-off between the level of compression achieved and the degree of quality loss in the reconstructed signal.

    Types of redundancy

    Mathematically, the process of compression may be seen as the removal of correlation within the image. There are a number of different areas of redundancy commonly present in typical digital images:
    •  Spatial redundancy (see Figure 29.2). This type of redundancy refers to correlation between neighbouring pixels and therefore inherent redundancy in the pixel values (also known as interpixel redundancy). The correlation may consist of several consecutive pixels of the same value, in an area where there is a block of colour, for example. More commonly in natural images, however, neighbouring pixels will not be identical, but will have similar values with very small differences. In images where there are repeating patterns, there may be correlation between groups of pixels. A specific type of interpixel redundancy occurs between pixels in the same position in subsequent frames in a sequence of images (i.e. in video applications). This is known as interframe redundancy, or temporal redundancy.
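    Interpixel redundancy is easy to demonstrate in code. The sketch below (hypothetical Python, not from the book) delta-encodes a smooth signal so that each sample is stored as its difference from the previous one; because neighbours are correlated, the differences cluster near zero and a generic lossless compressor can exploit that.

```python
import math
import zlib

# A smooth signal: neighbouring samples are strongly correlated.
row = bytes(128 + int(60 * math.sin(i / 40)) for i in range(4000))

# Delta encoding: keep the first sample, then store each sample as the
# difference from its neighbour (mod 256 so it still fits in a byte).
deltas = bytes([row[0]] + [(row[i] - row[i - 1]) % 256 for i in range(1, len(row))])

# The differences cluster near zero, so the decorrelated stream
# typically compresses much better than the raw samples.
print(len(zlib.compress(row)), len(zlib.compress(deltas)))
```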
  • Understanding Digital Cinema: A Professional Handbook
    • Charles S. Swartz (Author)
    • 2004 (Publication Date)
    • Routledge (Publisher)
    lossless compression, and it does have practical applications. Well-known computer programs such as PK-Zip and Stuffit are lossless compression systems. They can take a computer file, make it more compact for storage or transmission, and then restore a perfect copy of the original.
    Unfortunately, lossless systems generally do not provide sufficient compression for large-scale imagery applications such as Digital Cinema distribution. Typically, lossless systems can compress image data by factors in the range of two or three to one; a useful degree of compression, certainly, but not enough to make Digital Cinema practical. Recently there have been claims that new techniques can provide much higher compression ratios but—at the time of writing—no independent tests have verified these claims.
    So the majority of this chapter will be devoted to the characteristics and design of lossy compression systems; systems that are likely to meet the practical needs of Digital Cinema distribution.
    However, lossless compression does still play an important role. These techniques may be used with almost any source of data, including the output data of a lossy compression system. So practical compression systems usually consist of a lossy front end followed by a lossless section (known as the entropy coder) to reduce the bit rate even further.
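    That two-stage architecture can be caricatured in a few lines of Python (a sketch only: crude quantization stands in for the lossy front end, and zlib stands in for the entropy coder):

```python
import zlib

def lossy_front_end(samples: bytes, step: int = 16) -> bytes:
    """Quantize: discard low-order detail. This stage is irreversible."""
    return bytes((s // step) * step for s in samples)

def encode(samples: bytes) -> bytes:
    # Lossy stage first; the lossless entropy-style stage (zlib here)
    # then squeezes out the statistical redundancy that remains.
    return zlib.compress(lossy_front_end(samples))

original = bytes((i * 7) % 256 for i in range(10_000))
packed = encode(original)
restored = zlib.decompress(packed)   # undoes only the lossless stage
print(len(original), len(packed))    # overall size reduction
print(restored == original)          # False: the front end discarded detail
```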

    Lossy Compression

    For the foreseeable future, Digital Cinema will require the use of compression systems that are not lossless: systems that discard or distort some of the information in the original image data, or lossy compression. The intent of such systems is to provide the maximum amount of compression of the image data consistent with an acceptably low level of distortion of the images, as perceived by a human viewer viewing the images under the intended conditions.
  • Compression for Great Video and Audio: Master Tips and Common Sense
    • Ben Waggoner (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    Generating a codebook for each compressed file is time-consuming, expands the size of the file, and increases compression time. Ideally, a compression technology will be able to be tuned to the structure of the data it gets. This is why lossless still-image compression will typically make the file somewhat smaller than running generic data compression on the same uncompressed source file, and will do it faster as well. We see the same effect as in the text compression example.

    Small Increases in Compression Require Large Increases in Compression Time

    There is a fundamental limit to how small a given file can be compressed, called the Shannon limit. For random data, the limit is the same as the size of the source file. For highly redundant data, the limit can be tiny. A file that consists of the pattern “01010101” repeated a few million times can be compressed down to a tiny percentage of the original data. However, real-world applications don’t get all the way to the Shannon limit, since it requires an enormous amount of computer horsepower, especially as the files get larger. Most compression applications have a controlling tradeoff between encoding speed and compression efficiency. In essence, these controls expand the amount of the file that is being examined at any given moment, and the size of the codebook that is searched for matches. However, doubling compression time doesn’t cut file size in half! Doubling compression time might only get you a few percentage points closer to the Shannon limit for the file. Getting a file 10 percent smaller might take more than 10 times the processing time, or be flat-out impossible.
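    A rough feel for this limit can be had by computing a file's zero-order byte entropy (a sketch; this single-byte model ignores the longer-range structure that real codebooks exploit, so it overstates the limit for patterned data):

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Zero-order Shannon entropy: average bits per byte needed by a
    coder that models only single-byte frequencies (8.0 = incompressible)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

pattern = b"01" * 500_000          # the highly redundant case
noise = os.urandom(1_000_000)      # the random case

print(entropy_bits_per_byte(pattern))  # 1.0 bit/byte under this model
# (the repeating structure lets a real compressor do far better still)
print(entropy_bits_per_byte(noise))    # ~8.0: already at the limit
```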

    Lossy and Lossless Compression

    Lossless compression codecs preserve all of the information contained within the original file. Lossy codecs, on the other hand, discard some data contained in the original file during compression. Some codecs, like PNG, are always lossless. Others, like VC-1, are always lossy. Still others may or may not be lossy depending on how you set their quality and data rate options. Lossless algorithms, by definition, might not be able to compress the file any smaller than it started. Lossy codecs generally let you specify a target data rate, and discard enough information to hit that data rate target. This really only makes sense with media—we wouldn’t want poems coming out with different words after compression!
  • Introduction to Digital Audio
    • John Watkinson (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    on-line editing is being performed, the output of the workstation is the finished product and clearly a lower compression factor will have to be used.
    The cost of digital storage continues to fall and the pressure to use compression for recording purposes falls with it. Perhaps it is in broadcasting and the Internet where the use of compression will have its greatest impact. There is only one electromagnetic spectrum and pressure from other services such as cellular telephones makes efficient use of bandwidth mandatory. Analog broadcasting is an old technology and makes very inefficient use of bandwidth. Its replacement by a compressed digital transmission will be inevitable for the practical reason that the bandwidth is needed elsewhere.
    Fortunately in broadcasting there is a mass market for decoders and these can be implemented as low-cost integrated circuits. Fewer encoders are needed and so it is less important if these are expensive. Whilst the cost of digital storage goes down year on year, the cost of electromagnetic spectrum goes up. Consequently in the future the pressure to use compression in recording will ease whereas the pressure to use it in radio communications will increase.

    5.2 Lossless and perceptive coding

    Although there are many different audio coding tools, all of them fall into one or other of these categories. In lossless coding, the data from the expander are identical bit-for-bit with the original source data. The so-called ‘stacker’ programs which increase the apparent capacity of disk drives in personal computers use lossless codecs. Clearly with computer programs the corruption of a single bit can be catastrophic. Lossless coding is generally restricted to compression factors of around 2:1.
    It is important to appreciate that a lossless coder cannot guarantee a particular compression factor and the communications link or recorder used with it must be able to handle the variable output data rate. Audio material which results in poor compression factors on a given codec is described as difficult. It should be pointed out that the difficulty is often a function of the codec. In other words, audio which one codec finds difficult may not be found difficult by another. Lossless codecs can be included in bit-error-rate testing schemes. It is also possible to cascade or concatenate
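    The point about variable output rate is easy to reproduce (a sketch; zlib serves here as a generic stand-in for a lossless codec, not the audio codecs the book discusses):

```python
import os
import zlib

easy = bytes(500_000)         # silence-like: all zero bytes
hard = os.urandom(500_000)    # noise-like, "difficult" material

for name, data in (("easy", easy), ("difficult", hard)):
    factor = len(data) / len(zlib.compress(data))
    print(f"{name}: compression factor {factor:.2f}:1")

# The factor depends entirely on the material, so the channel carrying
# the codec's output must tolerate a variable data rate.
```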
  • The Technology of Video and Audio Streaming
    • David Austerberry (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    The typical scaled video data rate at a size and frame rate used with analog modems is 1.15 Mbit/s. Compression to a rate suitable for delivery below 56 kbit/s will require a further 30:1 reduction. To reduce the rate even further, some form of image compression has to be employed.

    Compression

    Compression removes information that is perceptually redundant; that is, information that does not add to the perception of a scene. Compression is a tradeoff between the level of artifacts that it causes and the saving in bandwidth. These trade-offs sometimes can be seen on satellite television. If too many channels are squeezed into one transponder, fast-moving objects within a scene can become blocky and soft.
    Like scaling, compression of video splits into spatial (intraframe) compression and temporal (interframe) compression.

    Intraframe compression

    Single frames can be compressed with spatial, or intraframe, compression. This can be a simple system like run-length encoding, or a lossy system where the original data cannot wholly be reconstructed. A typical example of a lossy system is JPEG, a popular codec for continuous-tone still images.

    Interframe compression

    The next method to compress video is to remove information that does not change from one frame to the next, and to transmit information only in the areas where the picture has changed. This is referred to as temporal or interframe compression. This technique is one of those used by the MPEG-1, MPEG-2, and MPEG-4 standards.
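    A toy version of the idea (hypothetical Python, far simpler than the motion-compensated prediction the MPEG standards actually use): send one full frame, then send only the pixels that changed.

```python
def frame_delta(previous: list[int], current: list[int]) -> dict[int, int]:
    """Interframe step: record only the pixels that changed."""
    return {i: c for i, (p, c) in enumerate(zip(previous, current)) if p != c}

def apply_delta(previous: list[int], delta: dict[int, int]) -> list[int]:
    """Rebuild the new frame from the last decoded frame plus the delta."""
    frame = previous[:]
    for i, value in delta.items():
        frame[i] = value
    return frame

frame1 = [10] * 64                   # a static background
frame2 = frame1[:]
frame2[20:23] = [99, 99, 99]         # a small object moves into view

delta = frame_delta(frame1, frame2)
print(delta)                         # {20: 99, 21: 99, 22: 99}: 3 values, not 64
assert apply_delta(frame1, delta) == frame2
```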

    Compression classes

    The different algorithms are classified into families:
    1. Lossless
    2. Lossy
      • Naturally lossy
      • Unnaturally lossy
    If all the original information is preserved, the codec is called lossless. A typical example for basic file compression is PKZIP. To achieve the high levels of compression demanded by streaming codecs, a lossless codec is not an option – its data reduction is insufficient.
    The goal with compression is to avoid artifacts that are perceived as unnatural. The fine detail in an image can be degraded gently without losing understanding of the objects in a scene. As an example we can watch a 70-mm print of a movie or a VHS transfer, and in both cases still enjoy the experience, even though the latter is a pale representation of the former.