SCD ScalableColorDescriptor

From hpcwiki

The Scalable Color Descriptor (SCD) can be interpreted as a Haar transform based encoding scheme applied across the values of a color histogram in the HSV color space (see Section 2.1). The histogram values are extracted, normalized and non-linearly mapped into a 4-bit integer representation, giving higher significance to small values. The Haar transform is then applied to the 4-bit integer values across the histogram bins.

The basic unit of the transform consists of a sum operation and a difference operation (see Figure 4 (a)), which correspond to primitive low pass and high pass filters. Summing pairs of adjacent bins is equivalent to calculating a histogram with half the number of bins. From the sums of every two adjacent Hue bin values of the 256-bin histogram, we get a representation of a 128-bin histogram with 8 levels in H, 4 levels in S and 4 levels in V. If this process is repeated, the resulting 64, 32 or 16 sum coefficients from the Haar representation are equivalent to histograms with 64, 32 or 16 bins. Table 1 shows the equivalent partitioning of the HSV color space for different numbers of coefficients of the Haar transform.

If an application does not require the full resolution, a limited number of Haar coefficients may simply be extracted from a 128, 64 or 32 bin histogram; this still guarantees interoperability with another representation where all coefficients were extracted, but only to the precision of the coefficients that are available in both representations. Note that since all partitions in the original color space quantization are powers of 2, the combination with the Haar transform is very natural.

The high pass (difference) coefficients of the Haar transform express the information contained in the finer-resolution levels (with a higher number of bins). Histograms of natural image signals usually exhibit high redundancy between adjacent histogram bins.
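
The sum and difference unit described above can be illustrated with a minimal Python sketch. It is not the reference extraction code: the function names `haar_step` and `haar_decompose` are hypothetical, the histogram is treated as a flat list in which adjacent entries are paired (the actual pairing order across H, S and V follows Table 1 and is not reproduced here), and the difference is assumed to be "first minus second".

```python
def haar_step(values):
    """Apply one sum/difference unit to every pair of adjacent values."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    pairs = list(zip(values[0::2], values[1::2]))
    sums = [a + b for a, b in pairs]    # low pass: histogram with half the bins
    diffs = [a - b for a, b in pairs]   # high pass: detail lost by the merge
    return sums, diffs


def haar_decompose(hist, min_bins=16):
    """Repeat the step on the sums until only `min_bins` sums remain.

    Returns the coarse sums and the difference levels, finest level first.
    """
    sums = list(hist)
    diff_levels = []
    while len(sums) > min_bins:
        sums, diffs = haar_step(sums)
        diff_levels.append(diffs)
    return sums, diff_levels


# Toy 8-bin histogram (made-up values), reduced to 2 sum coefficients.
coarse, details = haar_decompose([3, 1, 0, 2, 5, 5, 1, 3], min_bins=2)
print(coarse)   # [6, 14]
print(details)  # [[2, -2, 0, -2], [2, 6]]
```

For a 256-entry input and the default `min_bins=16`, the sketch yields 16 sums plus difference levels of 128, 64, 32 and 16 entries, 256 coefficients in total, which matches the bin counts quoted in the text.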
The redundancy between adjacent bins can be explained by the “impurity” (slight variation) of colors caused by variable illumination and shadowing effects. Hence, it can be expected that the high pass coefficients, which express differences between adjacent histogram bins, usually have only small values. Exploiting this property, it is possible to truncate the high pass coefficients to an integer representation with only a small number of bits.

4.1 Extraction and Matching

Figure 4b shows the block diagram of the SCD extraction process. The output representation is scalable in terms of the number of bins, by varying the number of coefficients used. Interoperability between different resolution levels is retained due to the scaling property of the Haar transform. Thus, matching based on the information from subsets of coefficients guarantees an approximation of the similarity in full resolution. Furthermore, as mentioned above, the feature extraction operation can also be scaled to lower levels (fewer bins in the source histogram).

Besides the scalability in the number of histogram bins, another form of scalability is achieved by scaling the quantized (integer) representation of the coefficients to different numbers of bits. The “difference” coefficients in the Haar transform can take either positive or negative values. The sign part is always retained, whereas the magnitude part can be scaled by skipping the least significant bits. Using the sign bit only (1 bit / coefficient) leads to an extremely compact representation, while good retrieval efficiency is retained. At the highest-accuracy level, 1-8 bits are defined for integer representations of the magnitude part, depending on the relevance of the respective coefficients. Between these extremes, it is possible to scale to different resolution levels. For example, consider a set of five coefficients whose magnitudes are encoded using 8, 4, 7, 3, and 7 bits, respectively, as shown in Figure 5.
If the lowest 3 bits are discarded in the scalable bit representation, only 5, 1, 4, 0, and 4 bits remain to encode the absolute value.

l1-norm based matching (sum of absolute differences) can be applied in the Haar transform domain; however, the results are not identical to l1-norm based matching in the histogram domain. In the case where only the sign bit is used (all bit planes representing the absolute value discarded), the l1-norm degenerates to a Hamming distance, allowing very low complexity in the distance calculation.

4.2 Representation

The scalability in the number of histogram bins and in the number of bit planes is represented by the fields NumberofCoefficients and NumberofBitplanesDiscarded. NumberofCoefficients indicates whether 16, 32, 64, 128 or 256 bins (coefficients) are used. NumberofBitplanesDiscarded specifies the number of bitplanes of the coefficients that are discarded, ranging from 0 to 8. When this value is 8, the magnitudes of the coefficients are not present; only the sign of each coefficient is retained, which is represented by CoefficientSign. The magnitudes of the coefficients are represented in a bit-plane fashion, which means that the most significant bits of all coefficients are taken first, followed by the next most significant, etc. The bit-plane representation allows the transmission of only a certain number of most significant bits for bandwidth-constrained applications. The representation is as follows:

Field                        Number of Bits         Meaning
NumberofCoefficients         3                      Specifies the number of histogram bins = 16, 32, 64, 128, 256
NumberofBitplanesDiscarded   3                      Specifies discarding 0 to 8 bitplanes
CoefficientSign[ ]           NumberofCoefficients   The sign of each coefficient
BitPlane[ ][ ]               See text               Coefficient magnitudes represented in a bitplane fashion

4.3 Experimental Results

Retrieval results achieved by the SCD are shown in Figure 6.
In addition, the ANMRR quality measure was calculated from matching in the histogram domain, after performing an inverse Haar transform. The results show that a reasonable performance can be achieved even with small numbers of bits, while the performance saturates between 256 and 512 bits.
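
The bit-plane scaling and the matching described in Section 4.1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the normative procedure: all function names are hypothetical, the sign bit convention (1 for negative) is assumed, and the coefficient values for the two images are made up. Only the bit widths 8, 4, 7, 3, 7 come from the text.

```python
def remaining_bits(bit_widths, discarded):
    """Magnitude bits left per coefficient after dropping the lowest bit planes."""
    return [max(w - discarded, 0) for w in bit_widths]


def split_sign_magnitude(coeffs):
    """Split signed coefficients into sign bits and magnitudes.

    Assumed convention: sign bit 1 means negative, 0 means non-negative.
    """
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    return signs, mags


def discard_bitplanes(mags, discarded):
    """Skip the `discarded` least significant bits of every magnitude."""
    return [m >> discarded for m in mags]


def l1_distance(a, b):
    """Sum of absolute differences between two equally long sequences."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(signs_a, signs_b):
    """Number of positions where the sign bits differ."""
    if len(signs_a) != len(signs_b):
        raise ValueError("descriptors must have the same length")
    return sum(x != y for x, y in zip(signs_a, signs_b))


# The five-coefficient example from the text.
print(remaining_bits([8, 4, 7, 3, 7], 3))   # [5, 1, 4, 0, 4]

# Hypothetical difference coefficients for two images.
a = [37, -5, 12, -3, 0]
b = [35, 6, -12, -1, 2]
signs_a, mags_a = split_sign_magnitude(a)
signs_b, mags_b = split_sign_magnitude(b)

print(l1_distance(a, b))                    # 41, matching at full precision
print(discard_bitplanes(mags_a, 3))         # [4, 0, 1, 0, 0]
print(hamming_distance(signs_a, signs_b))   # 2, matching on sign bits only
```

Because the sign bits take only the values 0 and 1, `l1_distance(signs_a, signs_b)` returns the same number as `hamming_distance(signs_a, signs_b)`, which is the degenerate case mentioned in Section 4.1.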