An embedded image coder incorporating a perceptually motivated, spatially and scale-adaptive coefficient reordering is proposed. The spatially adaptive reordering is achieved by weighting wavelet coefficients according to the classification of the corresponding pixels in the image as belonging to one of three perceptually significant activity regions: smooth, edge, or detailed. The scale-adaptive reordering omits the finest wavelet coefficients until a pre-determined bit rate has been reached, significantly improving image quality at very low bit rates. The perceptual weights are known at the decoder, and no side information is needed for the activity classification, which is hidden in the bit stream. By classifying the coefficients according to their activity, the coder shifts compression artifacts toward the less visually significant regions of the image, thereby achieving higher perceptual quality than the standard SPIHT algorithm, especially at low bit rates and around the strong edges of the image.
Introduction
The SPIHT coder [1] is a powerful image compression algorithm that produces an embedded bit stream from which the best reconstructed images in the mean square error sense can be extracted at various bit rates. The perceptual image quality, however, is not guaranteed to be optimal since the coder is not designed to explicitly consider the human visual system (HVS) characteristics. Extensive HVS research has shown that there are three perceptually significant activity regions in an image: smooth, edge, and textured or detailed regions [2]-[4]. By incorporating the differing sensitivity of the HVS to these regions in image compression schemes such as SPIHT, the perceptual quality of the images can be improved at all bit rates [5].
Previous work on improving the visual quality of embedded coders has applied just-noticeable-distortion thresholds for uniform noise in different subbands to weight the transform coefficients, but made no distinction between coefficients belonging to different activity regions inside a subband [6]. In this paper, the differing activity regions are used to assign perceptual weights to the transform coefficients prior to SPIHT encoding.
Activity Selective SPIHT Coder
A block diagram of the coder is shown in Figure 1. The image is first segmented into 16 x 16 smooth, edge, or detailed blocks using the segmentation algorithm introduced in [7]. The image is then transformed into a 4-level subband decomposition using the 9-7 biorthogonal filters. The activity classification is used to apply perceptual weights to the high-frequency transform coefficients by noting that a 16 x 16 image block belonging to a given activity region corresponds to a single coefficient in each high-frequency band at the fourth decomposition level, and to a proportionally larger group of coefficients at each finer level. Since humans are more sensitive to the edge regions of the image than to the smooth and detailed regions, the edge regions receive the highest weights, followed by the smooth regions and then the detailed regions. The weights increase with the decomposition level to reflect the higher perceptual importance of the coarsest bands. After the weights are applied to the transform coefficients, the image is coded with the SPIHT algorithm. To ensure that the most visually significant information is sent at very low bit rates, a scale-adaptive reordering is introduced that omits the finest wavelet coefficients until a pre-determined bit rate has been reached. This frees bits for encoding the more perceptually important information of the coarser bands, significantly improving the visual quality of very low bit rate images.
[Figure 1: block diagram of the activity selective SPIHT coder]
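The block-to-coefficient correspondence and the weighting step can be sketched as follows. This is a minimal illustration, not the coder's actual implementation: the numeric values in `BASE_WEIGHT` and `LEVEL_GAIN` are hypothetical (the text gives only their ordering), the function names are invented for the example, and the default toggle rate simply reuses the 0.2 bpp value reported in the results.

```python
# Hypothetical weights: the paper states only that edge > smooth > detailed
# and that weights grow with the decomposition level, not the actual values.
BASE_WEIGHT = {"edge": 2.0, "smooth": 1.5, "detailed": 1.0}
LEVEL_GAIN = {1: 1.0, 2: 1.25, 3: 1.5, 4: 2.0}
BLOCK = 16  # side of the segmentation blocks, in pixels


def perceptual_weight(activity, level):
    """Weight for a coefficient of the given activity class and level."""
    return BASE_WEIGHT[activity] * LEVEL_GAIN[level]


def weight_band(band, level, class_map):
    """Return a weighted copy of one high-frequency band.

    band      -- 2-D list of coefficients of an HL, LH, or HH band
    level     -- decomposition level of the band (1 = finest, 4 = coarsest)
    class_map -- class_map[r][c] is the activity label of 16 x 16 block (r, c)
    """
    # A 16 x 16 block covers 8, 4, 2, 1 coefficients per side at levels 1..4.
    span = BLOCK >> level
    return [
        [c * perceptual_weight(class_map[i // span][j // span], level)
         for j, c in enumerate(row)]
        for i, row in enumerate(band)
    ]


def finest_level_enabled(bits_spent, num_pixels, toggle_bpp=0.2):
    """Scale-adaptive rule: code the finest band only past the toggle rate."""
    return bits_spent / num_pixels >= toggle_bpp


# A 32 x 32 image has a 2 x 2 grid of blocks and 2 x 2 bands at level 4.
class_map = [["smooth", "edge"], ["detailed", "edge"]]
level4 = weight_band([[1.0, 1.0], [1.0, 1.0]], 4, class_map)
```

Because the decoder knows the same weights, it can divide them out after decoding once the activity class of each block has been recovered.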
At the decoder, the perceptual weights are known, but the activity classification is inferred from the bit stream, with no side information being transmitted. To hide the activity classification of the coefficients in the bit stream, the encoder uses the sum, taken modulo 3, of the high-frequency coefficients belonging to the 4th and 3rd level bands. If the sum modulo 3 is zero, the coefficients belong to a smooth region; if it is one, they belong to an edge region; otherwise, they belong to a detailed region. The required residue is forced at the encoder by modifying only the least significant bits of the non-zero coefficients, starting from the $HH$ band at the 3rd decomposition level. In this way, no extra bits are used to encode the activity classification.
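One way the modulo-3 embedding could work is sketched below. The text does not spell out the exact search procedure, so the greedy search over one or two least-significant-bit flips, the assumption of integer-quantized coefficients, the rule that a coefficient must stay non-zero, and all function names are assumptions made for illustration.

```python
from itertools import combinations

CLASS_TO_RESIDUE = {"smooth": 0, "edge": 1, "detailed": 2}
RESIDUE_TO_CLASS = {r: c for c, r in CLASS_TO_RESIDUE.items()}


def flip_lsb(value):
    """Flip the least significant bit of the magnitude, keeping the sign."""
    sign = -1 if value < 0 else 1
    return sign * (abs(value) ^ 1)


def embed_class(coeffs, activity):
    """Return a copy of coeffs whose sum modulo 3 encodes the activity class.

    coeffs -- integer coefficients of one block from the 3rd and 4th level
              high-frequency bands, listed in order of modification
              preference (HH band at level 3 first).
    """
    target = CLASS_TO_RESIDUE[activity]
    if sum(coeffs) % 3 == target:
        return list(coeffs)
    # Assumption: only touch coefficients that remain non-zero after a flip.
    candidates = [i for i, c in enumerate(coeffs) if abs(c) > 1]
    for count in (1, 2):  # each flip moves the sum by +1 or -1
        for idxs in combinations(candidates, count):
            trial = list(coeffs)
            for i in idxs:
                trial[i] = flip_lsb(trial[i])
            if sum(trial) % 3 == target:
                return trial
    raise ValueError("no suitable coefficients to carry the classification")


def decode_class(coeffs):
    """Recover the activity class from the coefficient sum modulo 3."""
    return RESIDUE_TO_CLASS[sum(coeffs) % 3]


block = [5, -4, 0, 7, 2]  # sum is 10, so the block already reads as "edge"
as_smooth = embed_class(block, "smooth")
```

Each flip changes one coefficient by exactly one quantization step, which is consistent with the small PSNR cost reported for hiding the classification.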
Results
The activity selective coder achieves higher perceptual quality and higher PSNRs than the standard SPIHT coder at very low to medium bit rates. Both coders perform well at high bit rates, with the standard SPIHT coder achieving slightly higher PSNRs but slightly lower visual quality. Hiding the classification information among the high-frequency coefficients produces no visually noticeable artifacts, and the resulting drop in PSNR is only a fraction of a dB. Figure 2 presents the PSNR results for the barbara image. The toggle bit rate, at which the finest coefficients start to be included in the bit stream, is 0.2 bpp. At that bit rate, the PSNR performance curves intersect, and the standard SPIHT coder starts to show a slight advantage in PSNR. The visual quality of the activity selective images is, however, better throughout the span of bit rates.
[Figure 2: PSNR results for the barbara image]
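For reference, the PSNR figures quoted here follow the usual definition, 10 log10(peak^2 / MSE). The sketch below assumes 8-bit grayscale images (peak value 255), which the text does not state explicitly; the function name and list-of-rows image layout are choices made for the example.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are 2-D lists of pixel values; peak=255 assumes 8-bit data.
    """
    squared_errors = [
        (a - b) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for a, b in zip(row_o, row_r)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```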
Figure 3 shows the barbara image coded at several bit rates. At very low bit rates, the visual quality of the activity selective SPIHT coded images is considerably better, especially around the edges in the image. This results from both the perceptually motivated spatially adaptive reordering and the scale-adaptive reordering, which omits the finest wavelet coefficients until the toggle bit rate is reached.
[Figure 3: the barbara image coded at several bit rates]
Table I shows the percentage of bits used to code coefficients corresponding to each activity region for both coders. The activity selective coder devotes a higher percentage of bits to edge information than the standard SPIHT algorithm, thereby considerably improving the visual quality of very low bit rate images. At bit rates higher than the toggle bit rate, the difference in the bit distributions is smaller, due to the presence of the finest coefficients in both coders.
| Image | Bit Rate (bpp) | Coder | Smooth (% of bits) | Edge (% of bits) | Detailed (% of bits) |
| --- | --- | --- | --- | --- | --- |
| barbara | 0.05 | SPIHT | 29.62 % | 67.75 % | 2.63 % |
| barbara | 0.05 | AS-SPIHT | 28.17 % | 69.29 % | 2.54 % |
| barbara | 0.2 | SPIHT | 21.88 % | 74.71 % | 3.40 % |
| barbara | 0.2 | AS-SPIHT | 16.73 % | 80.68 % | 2.59 % |
| barbara | 0.5 | SPIHT | 12.12 % | 83.18 % | 4.69 % |
| barbara | 0.5 | AS-SPIHT | 13.13 % | 84.21 % | 2.65 % |
| mandrill | 0.05 | SPIHT | 21.54 % | 32.38 % | 46.18 % |
| mandrill | 0.05 | AS-SPIHT | 18.57 % | 34.88 % | 46.55 % |
| mandrill | 0.2 | SPIHT | 15.04 % | 29.45 % | 55.51 % |
| mandrill | 0.2 | AS-SPIHT | 17.15 % | 36.22 % | 46.63 % |
| mandrill | 0.5 | SPIHT | 8.61 % | 24.11 % | 67.28 % |
| mandrill | 0.5 | AS-SPIHT | 10.36 % | 35.77 % | 53.87 % |
Concluding Remarks
This paper presents an embedded coder that incorporates a perceptually motivated, spatially and scale-adaptive coefficient reordering. Results show that the proposed coder achieves higher perceptual quality than the standard SPIHT algorithm.
References
[1] A. Said and W. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees", IEEE Transactions on Circuits and Systems for Video Technology, (6)6, pp. 243-50, June 1996.
[2] D. Marr, Vision, W. H. Freeman and Company, New York, 1982.
[3] X. Ran and N. Farvardin, "A perceptually-motivated three-component image model - part I: description of the model", IEEE Transactions on Image Processing, (4)4, pp. 401-15, April 1995.
[4] X. Ran and N. Farvardin, "A perceptually-motivated three-component image model - part II: applications to image compression", IEEE Transactions on Image Processing, (4)4, pp. 430-47, April 1995.
[5] M. G. Ramos and S. S. Hemami, "Perceptually-based scalable image coding for packet networks", to appear in Journal of Electronic Imaging, special issue on Image/Video Compression and Processing for Visual Communications, July 1998.
[6] I. Hontsch, L. J. Karam, and R. J. Safranek, "A perceptually tuned embedded zerotree image coder", Proceedings IEEE International Conference on Image Processing, Santa Barbara, CA, vol. 1, pp. 41-4, October 1997.
[7] M. G. Ramos, S. S. Hemami, and M. A. Tamburro, "Psychovisually-based multiresolution image segmentation", Proceedings IEEE International Conference on Image Processing, Santa Barbara, CA, vol. 3, pp. 66-9, October 1997.