VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for
Natural Images
(C++ and MATLAB implementations below)
D. M. Chandler and S. S. Hemami
IEEE Transactions on Image Processing, Vol. 16 (9),
pp. 2284-2298, 2007.
- Update 1: VSNR performs even better on the LIVE image
database now that the
realigned DMOS values have been released. Using the
MATLAB code provided below, you should see correlation coefficients
of 0.96 and 0.97 between VSNR and DMOS on JPEG and JPEG-2000 images,
respectively (fits performed using a cubic polynomial).
- Update 2: The MATLAB source code for VSNR has been
updated. If you're using the C++ version, be sure to see the Usage
Note below.
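The correlation figures quoted in Update 1 are computed after fitting the metric output to DMOS with a cubic polynomial. The following is a minimal Python sketch of that kind of evaluation, not the MATLAB code referred to above. The function names (`fit_cubic`, `pearson`, `correlation_after_cubic_fit`) are hypothetical, and the fit is assumed to be an ordinary least-squares fit solved through the normal equations.

```python
import math

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_cubic(x, y):
    """Least-squares coefficients [c0, c1, c2, c3] of c0 + c1*x + c2*x^2 + c3*x^3."""
    a = [[sum(xi ** (i + j) for xi in x) for j in range(4)] for i in range(4)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(4)]
    return solve_linear(a, b)

def eval_poly(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def pearson(u, v):
    """Pearson linear correlation coefficient between two sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def correlation_after_cubic_fit(metric, dmos):
    """Map metric values through a fitted cubic, then correlate with DMOS."""
    coeffs = fit_cubic(metric, dmos)
    predicted = [eval_poly(coeffs, m) for m in metric]
    return pearson(predicted, dmos)
```

Here `metric` would hold one VSNR value per distorted image and `dmos` the matching subjective score. The fit needs at least four distinct metric values, and images for which the metric is infinite would have to be excluded or handled separately before fitting.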
Abstract: This paper presents an efficient metric for
quantifying the visual fidelity of natural images based on
near-threshold and suprathreshold properties of human vision.
The proposed metric, the visual signal-to-noise ratio
(VSNR), operates via a two-stage approach: In the first stage,
contrast thresholds for detection of distortions in the presence
of natural images are computed via wavelet-based models of
visual masking and visual summation in order to determine
whether the distortions in the distorted image are visible. If
the distortions are below the threshold of detection, the
distorted image is deemed to be of perfect visual fidelity (VSNR
= inf) and no further analysis is required. If the distortions
are suprathreshold, a second stage is applied which operates
based on the low-level visual property of perceived contrast,
and the mid-level visual property of global precedence. These
two properties are modeled as Euclidean distances in
distortion-contrast space of a multiscale wavelet decomposition,
and VSNR is computed based on a simple linear sum of these
distances. The proposed VSNR metric is generally competitive
with current metrics of visual fidelity; it is efficient both in
terms of its low computational complexity and in terms of its
low memory requirements; and it operates based on physical
luminances and visual angle (rather than on digital pixel values
and pixel-based dimensions) to accommodate different viewing
conditions.
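The two-stage structure described in the abstract can be outlined in a few lines of Python. This is only a sketch of the control flow, not the published algorithm: the per-scale lists, the weight `alpha`, the parameter `image_contrast`, and the final mapping to decibels are all assumptions made for illustration, and the inputs are taken as already computed by the wavelet-based models.

```python
import math

def two_stage_fidelity(distortion_contrasts, detection_thresholds,
                       d_perceived_contrast, d_global_precedence,
                       image_contrast, alpha=0.5):
    """Illustrative two-stage fidelity estimate (hypothetical interface).

    distortion_contrasts / detection_thresholds: one value per wavelet scale
    (assumed layout). alpha is a placeholder weight, not a published value.
    """
    # Stage 1: if the distortions are below threshold at every scale, they
    # are deemed invisible and the image has perfect visual fidelity.
    visible = any(c > t for c, t in zip(distortion_contrasts, detection_thresholds))
    if not visible:
        return math.inf

    # Stage 2: linear sum of the two Euclidean distances named in the abstract
    # (perceived contrast and global precedence).
    visual_distortion = (alpha * d_perceived_contrast
                         + (1.0 - alpha) * d_global_precedence)

    # Assumed here: express the result as a signal-to-noise style ratio in dB.
    return 20.0 * math.log10(image_contrast / visual_distortion)
```

For example, with distortions above threshold at one scale, both distances equal to 0.1, and an image contrast of 1.0, the sketch returns 20 dB. The actual weights and normalization should be taken from the paper or the released code.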
Supplement: Performance on the
A57 Database (Preliminary Results) [Full Text PDF]
In addition to its performance on the LIVE image database, the
performance of the VSNR metric was analyzed on a preliminary database,
the A57 database. A psychophysical scaling experiment was performed on
various distorted images to obtain subjective ratings of visual fidelity
(these ratings form the A57 database); the metric was then applied to the
same images, and its predictions were compared with the subjective ratings.
For comparison, these same sets of images were analyzed in terms of
PSNR, the Universal Quality Index (UQI), the Noise Quality Measure
(NQM), the Structural Similarity (SSIM) metric, and the Visual
Information Fidelity (VIF) metric.
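As a point of reference for the simplest of these comparison metrics, PSNR can be computed directly from pixel differences. The sketch below uses the standard definition with a peak value of 255 for 8-bit images; the function name `psnr` and the list-of-rows image layout are choices made here for illustration.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size grayscale
    images given as lists of rows of pixel values."""
    squared_errors = [
        (r - d) ** 2
        for ref_row, dis_row in zip(reference, distorted)
        for r, d in zip(ref_row, dis_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Unlike VSNR, this measure works on digital pixel values only and takes no account of viewing conditions or visual masking.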
It is important to note that due to the limited number of images
and limited number of human subjects, the A57 database is of limited
statistical reliability. The results provided here should be considered
preliminary and subject to change.
Three natural images (horse, harbor, and baby)
obtained from the Kodak image database served as original images in this
study; two of these images, horse and baby, were also used in [11]. The
digital images were of size 512×512 pixels and were 8-bit grayscale with
pixel values in the range 0−255. Figure 1 depicts the three original
images. These images were distorted with six types of distortions:
- Quantization of the LH subbands of a 5-level DWT of the image
using the 9/7 filters; the bands were quantized
via uniform scalar quantization with step sizes chosen such that the
RMS contrast of the distortions was equal.
- Additive Gaussian white noise.
- Baseline JPEG compression of the image (using the standard
quantization matrix).
- JPEG-2000 compression of the image (using the 9/7 filters and no
visual frequency weighting).
- JPEG-2000 compression (using the 9/7 filters) with the Dynamic
Contrast-Based Quantization (DCQ) algorithm which applies greater
quantization to the fine spatial scales relative to the coarse
scales in an attempt to preserve global precedence; the DCQ
algorithm was applied assuming sRGB display characteristics and a
viewing distance of three picture heights.
- Blurring by using a Gaussian filter.
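Two of the distortion types listed above, uniform scalar quantization and additive Gaussian white noise, are simple enough to sketch in Python. This is an illustration only, not the procedure used to build the A57 database: the helper names are hypothetical, the quantizer is applied to a plain list of coefficients rather than to DWT subbands, and the search for step sizes that meet a target RMS contrast is not reproduced.

```python
import math
import random

def quantize_uniform(values, step):
    """Uniform scalar quantization: snap each value to the nearest multiple
    of step (Python's round() sends exact ties to the even multiple)."""
    return [step * round(v / step) for v in values]

def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian white noise to 8-bit pixel values, then round
    and clip the result back to the range 0..255."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]

def rms_error(original, distorted):
    """Root-mean-square difference between two equal-length sequences."""
    n = len(original)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, distorted)) / n)
```

A larger `step` or `sigma` gives a larger `rms_error`, so in principle either parameter could be adjusted until the distortion reaches a desired strength.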
Figure 2 depicts graphs of the subjective ratings of perceived distortion (on
the vertical axis) plotted against each metric’s transformed output (on
the horizontal axis). Further information regarding these results and
the experimental methods is available in the
document: vsnr_a57.pdf.
Fig. 1. Three natural images, horse, harbor,
and baby, used in the subjective rating experiments.
Fig. 2. Subjective ratings of perceived distortion
plotted against predicted values from each of the six metrics:
(a) PSNR, (b) UQI, (c) NQM, (d) SSIM, (e) VIF, and (f) VSNR. In
all graphs, the vertical axis denotes perceived distortion as
reported by subjects; error bars represent standard deviations
of the means. The horizontal axes correspond to transformed
metric outputs.