VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images
(C++ and MATLAB implementations below)

D. M. Chandler and S. S. Hemami

IEEE Transactions on Image Processing, Vol. 16 (9), pp. 2284-2298, 2007.

  • Update 1: VSNR performs even better on the LIVE image database now that the realigned DMOS values have been released. Using the MATLAB code provided below, you should see correlation coefficients of 0.96 and 0.97 between VSNR and DMOS on JPEG and JPEG-2000 images, respectively (fits performed using a cubic polynomial).
     
  • Update 2: The MATLAB source code for VSNR has been updated. If you're using the C++ version, be sure to see the Usage Note below.
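The correlation figures in Update 1 come from a two-step procedure: VSNR is mapped to DMOS with a cubic polynomial, and the fitted values are then correlated with DMOS. The C++ sketch below shows one way to carry out that procedure. It fits y ≈ c0 + c1·x + c2·x² + c3·x³ by ordinary least squares through the normal equations and reports the Pearson correlation, which is assumed here to be the "correlation coefficient" meant above. The function names `fitCubic`, `pearson`, and `fittedCorrelation` are hypothetical and are not part of the released VSNR code. The fitting used for the reported numbers may differ in details such as normalization.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Least-squares cubic fit y ~ c0 + c1*x + c2*x^2 + c3*x^3, solved through the
// 4x4 normal equations with Gaussian elimination (partial pivoting).
std::array<double, 4> fitCubic(const std::vector<double>& x,
                               const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 4);
    double a[4][5] = {};  // augmented matrix [A | b]
    for (std::size_t n = 0; n < x.size(); ++n) {
        double p[7];  // powers x^0 .. x^6
        p[0] = 1.0;
        for (int k = 1; k < 7; ++k) p[k] = p[k - 1] * x[n];
        for (int i = 0; i < 4; ++i) {
            for (int j = 0; j < 4; ++j) a[i][j] += p[i + j];
            a[i][4] += y[n] * p[i];
        }
    }
    for (int col = 0; col < 4; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        for (int k = 0; k < 5; ++k) std::swap(a[col][k], a[pivot][k]);
        for (int r = col + 1; r < 4; ++r) {
            const double f = a[r][col] / a[col][col];
            for (int k = col; k < 5; ++k) a[r][k] -= f * a[col][k];
        }
    }
    std::array<double, 4> c{};  // back substitution
    for (int i = 3; i >= 0; --i) {
        double s = a[i][4];
        for (int j = i + 1; j < 4; ++j) s -= a[i][j] * c[j];
        c[i] = s / a[i][i];
    }
    return c;
}

// Pearson linear correlation coefficient between two equal-length series.
double pearson(const std::vector<double>& u, const std::vector<double>& v) {
    assert(u.size() == v.size() && u.size() >= 2);
    const double n = static_cast<double>(u.size());
    double mu = 0.0, mv = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) { mu += u[i]; mv += v[i]; }
    mu /= n;
    mv /= n;
    double suv = 0.0, suu = 0.0, svv = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        const double du = u[i] - mu, dv = v[i] - mv;
        suv += du * dv;
        suu += du * du;
        svv += dv * dv;
    }
    return suv / std::sqrt(suu * svv);
}

// Fit metric -> DMOS with a cubic, then correlate the fitted values with DMOS.
double fittedCorrelation(const std::vector<double>& metric,
                         const std::vector<double>& dmos) {
    const std::array<double, 4> c = fitCubic(metric, dmos);
    std::vector<double> fitted(metric.size());
    for (std::size_t i = 0; i < metric.size(); ++i) {
        const double x = metric[i];
        fitted[i] = c[0] + x * (c[1] + x * (c[2] + x * c[3]));  // Horner form
    }
    return pearson(fitted, dmos);
}
```

If the metric values span a wide range, centering and scaling them before the fit improves the conditioning of the normal equations.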

Abstract: This paper presents an efficient metric for quantifying the visual fidelity of natural images based on near-threshold and suprathreshold properties of human vision. The proposed metric, the visual signal-to-noise ratio (VSNR), operates via a two-stage approach: In the first stage, contrast thresholds for detection of distortions in the presence of natural images are computed via wavelet-based models of visual masking and visual summation in order to determine whether the distortions in the distorted image are visible. If the distortions are below the threshold of detection, the distorted image is deemed to be of perfect visual fidelity (VSNR = inf) and no further analysis is required. If the distortions are suprathreshold, a second stage is applied which operates based on the low-level visual property of perceived contrast, and the mid-level visual property of global precedence. These two properties are modeled as Euclidean distances in distortion-contrast space of a multiscale wavelet decomposition, and VSNR is computed based on a simple linear sum of these distances. The proposed VSNR metric is generally competitive with current metrics of visual fidelity; it is efficient both in terms of its low computational complexity and in terms of its low memory requirements; and it operates based on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.
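The final step of the abstract combines the two distances into a single value. The C++ fragment below sketches that step under the assumed combination VSNR = 20·log10( C(I) / (α·d_pc + (1 − α)·d_gp/√2) ), with α = 0.04. In this formula, C(I) is the RMS contrast of the original image's luminance, d_pc is the perceived-contrast distance, and d_gp is the global-precedence distance. Treat the form, the constant α, and the display-model parameters in `toLuminance` (b = 0, k = 0.02874, γ = 2.2) as assumptions to check against the paper and the MATLAB code. The wavelet-based threshold test and the computation of d_pc and d_gp are not shown. The names `toLuminance`, `rmsContrast`, and `vsnrFromDistances` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical display model: convert 8-bit pixel values X to luminance
// L = (b + k*X)^gamma. The defaults are assumed sRGB-like values.
std::vector<double> toLuminance(const std::vector<double>& pixels,
                                double b = 0.0, double k = 0.02874,
                                double gamma = 2.2) {
    std::vector<double> lum(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        lum[i] = std::pow(b + k * pixels[i], gamma);
    return lum;
}

// RMS contrast: standard deviation of luminance divided by mean luminance.
double rmsContrast(const std::vector<double>& lum) {
    assert(!lum.empty());
    const double n = static_cast<double>(lum.size());
    double mean = 0.0;
    for (double v : lum) mean += v;
    mean /= n;
    assert(mean > 0.0);
    double var = 0.0;
    for (double v : lum) var += (v - mean) * (v - mean);
    return std::sqrt(var / n) / mean;
}

// Final VSNR stage (assumed form). If the first stage judged the distortions
// sub-threshold, VSNR is infinite. Otherwise the perceived-contrast distance dPc
// and the global-precedence distance dGp are combined linearly and compared,
// in dB, with the RMS contrast of the original image.
double vsnrFromDistances(double contrastOriginal, double dPc, double dGp,
                         bool belowThreshold, double alpha = 0.04) {
    if (belowThreshold) return std::numeric_limits<double>::infinity();
    const double distance = alpha * dPc + (1.0 - alpha) * dGp / std::sqrt(2.0);
    assert(distance > 0.0);
    return 20.0 * std::log10(contrastOriginal / distance);
}
```

Returning infinity for sub-threshold distortions follows the abstract's statement that such images are deemed to have perfect visual fidelity (VSNR = inf).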

- Online supplement to the paper (preliminary results)
 
- "A57" database (10 MB ZIP file)
- "A57" database (10 MB TGZ file)
 
- C++ source code: vsnr_source.zip (260 KB) [Usage Note: You must crop input images to dimensions that are multiples of 32 pixels when using this code; this is not required with the MATLAB code.]
- MATLAB source code: vsnr_matlab_source.zip (740 KB) [Updated 04/02/2008]
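The usage note for the C++ code requires image dimensions that are multiples of 32 pixels. One simple way to meet that requirement is to crop each image to the largest such dimensions, as in the sketch below. The `GrayImage` struct and `cropToMultiple` function are hypothetical illustrations, not types from vsnr_source.zip. Cropping from the top-left corner is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 8-bit grayscale image stored row-major (width * height pixels).
struct GrayImage {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<unsigned char> pixels;
};

// Keep the top-left region whose width and height are the largest multiples
// of `block` (32 for the C++ VSNR code) that fit inside the input.
GrayImage cropToMultiple(const GrayImage& in, std::size_t block = 32) {
    assert(block > 0 && in.pixels.size() == in.width * in.height);
    GrayImage out;
    out.width = (in.width / block) * block;
    out.height = (in.height / block) * block;
    out.pixels.reserve(out.width * out.height);
    for (std::size_t r = 0; r < out.height; ++r)
        for (std::size_t c = 0; c < out.width; ++c)
            out.pixels.push_back(in.pixels[r * in.width + c]);
    return out;
}
```

Whatever crop is used, apply the identical crop to the reference and distorted images so their pixels stay aligned. For example, an image 700 pixels wide and 515 pixels tall becomes 672×512.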
  

Supplement: Performance on the A57 Database (Preliminary Results) [Full Text PDF]

In addition to the LIVE image database, the VSNR metric was evaluated on a preliminary database, the A57 database. A psychophysical scaling experiment was performed to obtain subjective ratings of the visual fidelity of various distorted images (the A57 database). The metric was then applied to these images, and its predictions were compared with the subjective ratings. For comparison, the same images were also analyzed with PSNR, the Universal Quality Index (UQI), the Noise Quality Measure (NQM), the Structural Similarity (SSIM) metric, and the Visual Information Fidelity (VIF) metric.
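PSNR, the simplest of the comparison metrics, is computed directly from pixel values. This makes it the kind of pixel-based measure that the abstract contrasts with VSNR's luminance-based approach. A minimal C++ sketch for 8-bit grayscale images (peak value 255) follows. The function name `psnr` is hypothetical, and this page does not specify the exact PSNR variant used in the A57 comparison.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// PSNR in dB between two equal-size 8-bit grayscale images:
// PSNR = 10 * log10(255^2 / MSE).
double psnr(const std::vector<unsigned char>& ref,
            const std::vector<unsigned char>& dist) {
    assert(ref.size() == dist.size() && !ref.empty());
    double sse = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - static_cast<double>(dist[i]);
        sse += d * d;
    }
    const double mse = sse / static_cast<double>(ref.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();  // identical images
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```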

It is important to note that due to the limited number of images and limited number of human subjects, the A57 database is of limited statistical reliability. The results provided here should be considered preliminary and subject to change.

Three natural images from the Kodak image database (horse, harbor, and baby) served as the original images in this study; two of them, horse and baby, were also used in [11]. The images were 512×512 pixels, 8-bit grayscale, with pixel values in the range 0–255. Figure 1 depicts the three original images. These images were distorted with six types of distortion:

  1. Quantization of the LH subbands of a 5-level DWT of the image using the 9/7 filters; the bands were quantized via uniform scalar quantization with step sizes chosen such that the RMS contrast of the distortions was equal
  2. Additive Gaussian white noise.
  3. Baseline JPEG compression of the image (using the standard quantization matrix).
  4. JPEG-2000 compression of the image (using the 9/7 filters and no visual frequency weighting).
  5. JPEG-2000 compression (using the 9/7 filters) with the Dynamic Contrast-Based Quantization (DCQ) algorithm, which applies greater quantization to the fine spatial scales than to the coarse scales in an attempt to preserve global precedence; the DCQ algorithm was applied assuming sRGB display characteristics and a viewing distance of three picture heights.
  6. Blurring with a Gaussian filter.
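Two of the distortion types above can be illustrated in a few lines of C++ (C++17, for `std::clamp`). The first function is a uniform scalar quantizer of the kind applied to the DWT subband coefficients in distortion 1. A mid-tread quantizer is assumed, and the 5-level 9/7 DWT and the step-size selection are not shown. The second function adds Gaussian white noise, as in distortion 2, with rounding and clipping to the 8-bit range assumed. The function names and the `seed` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Uniform scalar quantizer (mid-tread variant assumed): each coefficient is
// replaced by the nearest multiple of the step size.
double quantize(double coeff, double step) {
    assert(step > 0.0);
    return step * std::round(coeff / step);
}

// Additive Gaussian white noise: add zero-mean noise with standard deviation
// sigma (in pixel units), then round and clip back to the 8-bit range [0, 255].
std::vector<double> addGaussianNoise(const std::vector<double>& image,
                                     double sigma, unsigned seed) {
    assert(sigma > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> out(image.size());
    for (std::size_t i = 0; i < image.size(); ++i)
        out[i] = std::clamp(std::round(image[i] + noise(rng)), 0.0, 255.0);
    return out;
}
```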

Figure 2 plots the subjective ratings of perceived distortion (vertical axis) against each metric's transformed output (horizontal axis). Further details on these results and on the experimental methods are available in the document vsnr_a57.pdf.

Fig. 1. Three natural images, horse, harbor, and baby, used in the subjective rating experiments.

Fig. 2. Subjective ratings of perceived distortion plotted against predicted values from each of the six metrics: (a) PSNR, (b) UQI, (c) NQM, (d) SSIM, (e) VIF, and (f) VSNR. In all graphs, the vertical axis denotes perceived distortion as reported by subjects; error bars represent standard deviations of the means. The horizontal axes correspond to transformed metric outputs.

 


 
Damon M. Chandler
306 Engineering South
Image Coding and Analysis Lab
School of Electrical and Computer Engineering
Oklahoma State University
Stillwater, OK 74078

(405) 744-9924
damon.chandler@okstate.edu
S. S. Hemami
332 Rhodes Hall
Visual Communications Lab
School of Electrical and Computer Engineering
Cornell University
Ithaca NY 14853

(607) 255-6393
hemami@ece.cornell.edu