MPEG 7

MPEG-7 is an ISO/IEC standard of the International Organization for Standardization, developed by the MPEG group for describing multimedia content. Its objectives are to:

  • Enable fast and efficient searching, filtering, and identification of content.
  • Describe the main aspects of the content (low-level features, structure, semantics, models, collections, etc.).
  • Support indexing for a wide range of applications.
  • Handle the following types of information: audio, speech, video, images, graphics, and 3D models.
  • Describe how objects are combined within a scene.
  • Keep the description independent of the medium on which the information is stored.

This has supported various studies that led to advances in the field of computer vision.

The MPEG-7 standard defines a variety of visual descriptors that focus on color. Some of them are described below.

Dominant color descriptor

F = \{\{c_i, p_i, v_i\}, s\}, \quad (i = 1, 2, \ldots, N)

where N is the number of dominant colors. Each dominant color value c_i is a vector of the corresponding color space component values. The percentage p_i (normalized to a value between 0 and 1) is the fraction of pixels in the image or image region corresponding to color c_i, and \sum_{i=1}^{N} p_i = 1. The optional color variance v_i describes the variation of the color values of the pixels in a cluster around the corresponding representative color. The spatial coherency s is a single number that represents the overall spatial homogeneity of the dominant colors in the image. The number of dominant colors N can vary from image to image, and a maximum of eight dominant colors was found to be sufficient to represent an image or an image region. The color space quantization depends on the color space specifications defined for the entire database and need not be specified with each descriptor.

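Extraction

The dominant colors are obtained by clustering the pixel colors. For each cluster C_i, the distortion D_i is

D_{i} = \sum_{n} h(n) \, \| x(n) - c_{i} \|^{2}, \quad x(n) \in C_{i}

and the centroid c_i is

c_{i} = \frac{\sum_{n} h(n) \, x(n)}{\sum_{n} h(n)}, \quad x(n) \in C_{i}

where x(n) is the color vector of pixel n and h(n) is the perceptual weight of pixel n.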
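The two formulas can be checked on a toy cluster. The following is a minimal sketch, not the reference extraction procedure: the function names `centroid` and `distortion` are hypothetical, colors are plain tuples, and h(n) is assumed to be a per-pixel weight supplied by the caller (all weights are set to 1 here).

```python
def centroid(colors, weights):
    """Weighted mean c_i of the color vectors x(n) in one cluster C_i."""
    total = sum(weights)
    dims = len(colors[0])
    return tuple(
        sum(w * x[d] for x, w in zip(colors, weights)) / total
        for d in range(dims)
    )


def distortion(colors, weights, c):
    """D_i = sum over n of h(n) * ||x(n) - c_i||^2 for one cluster."""
    return sum(
        w * sum((xd - cd) ** 2 for xd, cd in zip(x, c))
        for x, w in zip(colors, weights)
    )


# Toy cluster of three color vectors with equal weights (assumed values).
cluster = [(10.0, 20.0, 30.0), (14.0, 20.0, 30.0), (12.0, 26.0, 30.0)]
h = [1.0, 1.0, 1.0]
c_i = centroid(cluster, h)         # (12.0, 22.0, 30.0)
d_i = distortion(cluster, h, c_i)  # 8 + 8 + 16 = 32.0
```

A clustering loop would typically alternate between assigning pixels to the nearest centroid and recomputing c_i and D_i in this way; that loop is left out of the sketch.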


Color Structure Descriptor

The Color Structure Descriptor (CSD), as defined in [Jens-Rainer Ohm - The MPEG-7 Color Descriptors], represents an image by both its color distribution and the local spatial structure of the color. This additional structural information makes the descriptor sensitive to certain image features that a color histogram cannot capture. The CSD resembles a color histogram in form, but its meaning is different. It is defined as a one-dimensional array of 8-bit values:

CSD = \overline{h}_{s}(m), \quad m \in \{1, \ldots, M\}

where M can take one of four values, {256, 128, 64, 32}, and s denotes the scale associated with the square structuring element. One notable detail of this descriptor is that it uses the HMMD color space, which is obtained as follows:

      /* HMMD color space: Hue, Max, Min, Diff (R, G, B are floating-point) */
      Max = max(R, max(G, B));
      Min = min(R, min(G, B));
      if (Max == Min)
          Hue = 0;                                  /* achromatic pixel */
      else if (Max == R && G >= B)
          Hue = 60 * (G - B) / (Max - Min);
      else if (Max == R && G < B)
          Hue = 360 + 60 * (G - B) / (Max - Min);
      else if (Max == G)
          Hue = 60 * (2.0 + (B - R) / (Max - Min));
      else                                          /* Max == B */
          Hue = 60 * (4.0 + (R - G) / (Max - Min));
      /* Hue-Max-Min-Diff */
      Diff = Max - Min;
The algorithm shows how to extract the values needed to work in this color space: the hue component (Hue), as in the HSV color space, the Max and Min values taken from the RGB components, and finally the difference between the maximum and minimum RGB values. These are computed for every pixel of the image.
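For readers who want to try the conversion on actual pixel values, here is a small Python sketch of the same steps. The function name `rgb_to_hmmd` and the tuple it returns are choices made for this illustration, not part of the standard.

```python
def rgb_to_hmmd(r, g, b):
    """Return (hue, max, min, diff) for one RGB pixel.

    Hue is in degrees in [0, 360); max, min and diff share the input range.
    """
    mx = max(r, g, b)
    mn = min(r, g, b)
    diff = mx - mn
    if diff == 0:
        hue = 0.0  # achromatic pixel: hue is set to 0 by convention
    elif mx == r and g >= b:
        hue = 60.0 * (g - b) / diff
    elif mx == r:  # here g < b
        hue = 360.0 + 60.0 * (g - b) / diff
    elif mx == g:
        hue = 60.0 * (2.0 + (b - r) / diff)
    else:  # mx == b
        hue = 60.0 * (4.0 + (r - g) / diff)
    return hue, mx, mn, diff
```

For example, `rgb_to_hmmd(255, 0, 0)` gives a hue of 0 degrees and `rgb_to_hmmd(0, 0, 255)` a hue of 240 degrees, each with Max = 255, Min = 0 and Diff = 255.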

The sub-sampling factor K used when accumulating the descriptor depends on the image size:

K = 2^{p}

p = \max\{0, \lfloor \log_2 \sqrt{W \cdot H} - 7.5 \rfloor\}


where W and H are the picture width and height respectively, and where ⎣⋅⎦ is the floor operator. The reader is directed to [1] and [2] for an equivalent formulation where the accumulation requires no explicit sub-sampling of the image.
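The sketch below illustrates how the image size could drive the accumulation. It rests on several assumptions that are not stated on this page: p is taken as max{0, floor(log2 sqrt(W·H) - 7.5)}, the structuring element is assumed to span E = 8K pixels per side, and the element is assumed to advance in steps of K pixels while reading every K-th pixel. The function names and the `index_image` input (a 2-D list of already quantized color indices) are hypothetical.

```python
import math


def structuring_element_size(width, height):
    """Return (p, K, E): exponent, sub-sampling factor, element extent."""
    p = max(0, math.floor(0.5 * math.log2(width * height) - 7.5))
    k = 2 ** p
    return p, k, 8 * k  # E = 8 * K is an assumption of this sketch


def color_structure_histogram(index_image, num_bins):
    """Count, per color bin, the element positions that contain that color."""
    height, width = len(index_image), len(index_image[0])
    _, k, e = structuring_element_size(width, height)
    hist = [0] * num_bins
    for top in range(0, height - e + 1, k):
        for left in range(0, width - e + 1, k):
            # Set of color indices seen inside the element at this position.
            present = {
                index_image[y][x]
                for y in range(top, top + e, k)
                for x in range(left, left + e, k)
            }
            for m in present:
                hist[m] += 1  # each color counts once per position
    return hist
```

Under these assumptions a 320x240 image gives K = 1 and an 8x8 element, while a 640x480 image gives K = 2 and a 16x16 element. Unlike a plain histogram, a color is counted once per element position in which it appears, regardless of how many pixels carry it.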

Scalable Color Descriptor

SCD ScalableColorDescriptor

Color layout descriptor

The Color Layout Descriptor (CLD) is a very compact and resolution-invariant representation of color for high-speed image retrieval. It is designed to efficiently represent the spatial distribution of colors. This feature can be used for a wide variety of similarity-based retrieval, content filtering, and visualization tasks. It is especially useful for applications based on spatial structure, for example sketch-based retrieval and video segment identification. Sketch-based retrieval is considered a very important functionality because it allows very user-friendly interfaces, especially when the search is fast enough. The descriptor supports image-to-image matching, video-clip-to-video-clip matching, and sketch-to-image or sketch-to-video-clip matching. The color layout can also be described using the Grid Layout data type of MPEG-7 together with the Dominant Color Descriptor. However, this combination requires a relatively large number of bits, and matching is more complex and expensive. The CLD provides more precise and faster retrieval with a more compact description.

Extraction

This descriptor is obtained by applying the DCT to a two-dimensional array of local representative colors in the Y/Cb/Cr color space. Figure 9 illustrates the extraction process of the descriptor from an image. It consists of four stages: image partitioning, representative color detection, DCT transformation, and non-linear quantization of the zigzag-scanned coefficients. In the first stage, the input picture is divided into 64 blocks to guarantee resolution or scale invariance. In the next stage, a single representative color is selected from each block. Any method of selecting the representative color can be applied, but averaging the pixel colors is recommended because it is the simplest and its description accuracy is generally sufficient. The selection results in a tiny image icon of size 8x8. In the third stage, each of the three color components is transformed by an 8x8 DCT, so three sets of 64 DCT coefficients are obtained. They are zigzag scanned and the first few coefficients are non-linearly quantized (using 64 and 32 levels for DC and AC coefficients, respectively).

The standard allows a scalable representation of the feature by controlling the number of enclosed coefficients. For most images, a total of 12 coefficients is recommended, 6 for luminance and 3 for each chrominance. However, a total of 18 coefficients (6 for luminance and 6 for each chrominance) can also be considered when applying this descriptor to high-quality still pictures. The total bit length of the recommended descriptor (12 coefficients) is just 64 bits, including one signaling bit that specifies the extension of the number of coefficients. This descriptor is one of the more compact descriptors in MPEG-7 Visual and is well suited to applications with limited storage or bandwidth.
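The first three stages can be sketched for a single color channel as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: the input `channel` is assumed to be a 2-D list of one Y, Cb, or Cr component that is at least 8 pixels in each dimension, the DCT uses an orthonormal DCT-II scaling that may differ from the standard's normalization, and the non-linear quantization stage is omitted. All function names are hypothetical.

```python
import math

N = 8  # the icon is N x N, i.e. 64 blocks


def block_averages(channel):
    """Average one color channel over an 8x8 grid of blocks."""
    height, width = len(channel), len(channel[0])
    icon = []
    for by in range(N):
        y0, y1 = by * height // N, (by + 1) * height // N
        row = []
        for bx in range(N):
            x0, x1 = bx * width // N, (bx + 1) * width // N
            values = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(values) / len(values))
        icon.append(row)
    return icon


def dct_2d(icon):
    """Orthonormal 2-D DCT-II of an 8x8 array."""
    def alpha(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

    def basis(u, x):
        return math.cos((2 * x + 1) * u * math.pi / (2 * N))

    return [
        [
            alpha(u) * alpha(v) * sum(
                icon[y][x] * basis(u, y) * basis(v, x)
                for y in range(N) for x in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def zigzag(coeffs):
    """Read an 8x8 array in zigzag order, DC coefficient first."""
    order = sorted(
        ((y, x) for y in range(N) for x in range(N)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [coeffs[y][x] for y, x in order]


def cld_coefficients(channel, count):
    """First `count` zigzag-scanned DCT coefficients of one channel."""
    return zigzag(dct_2d(block_averages(channel)))[:count]
```

With the recommended configuration, one would call `cld_coefficients` with `count=6` on the Y channel and `count=3` on each of Cb and Cr, then quantize the results.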

Representation

The number of DCT coefficients used in the CLD is variable and is represented by the CoefficientPattern field, which can take three possible values. The first value indicates the use of six DCT coefficients for luminance and three for each chrominance. The second value indicates the use of six coefficients for luminance and six for each chrominance. For the third value of CoefficientPattern, the number of DCT coefficients is given by the NumberofYCoeff and NumberofCCoeff fields. The possible number of coefficients is one of 3, 6, 10, 15, 21, 28, and 64. The actual values of the coefficients are stored in the arrays Ycoeff, CbCoeff and CrCoeff. Each element of these arrays is either five or six bits long, depending on the coefficient.
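As a quick arithmetic check of the sizes quoted above, the following sketch assumes that each DC coefficient takes 6 bits (64 levels) and each AC coefficient takes 5 bits (32 levels), which is how the quantization levels and the five-or-six-bit lengths appear to fit together. The function name `coefficient_bits` is hypothetical.

```python
DC_BITS = 6  # 64 quantization levels for the DC coefficient
AC_BITS = 5  # 32 quantization levels for each AC coefficient


def coefficient_bits(num_y, num_c):
    """Bits used by num_y luminance and num_c coefficients per chrominance."""
    def per_channel(n):
        # One DC coefficient followed by n - 1 AC coefficients.
        return DC_BITS + (n - 1) * AC_BITS

    return per_channel(num_y) + 2 * per_channel(num_c)


default_bits = coefficient_bits(6, 3)   # 31 + 2 * 16 = 63
extended_bits = coefficient_bits(6, 6)  # 3 * 31 = 93
```

Adding the one signaling bit to the 63 coefficient bits of the default pattern gives the 64-bit total stated in the extraction section.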

References

  1. ISO/IEC JTC1/SC29/WG11: “Text of ISO/IEC 15938-3 Multimedia Content Description Interface – Part 3: Visual. Final Committee Draft”, document no. N4062, Singapore, March 2001. [LESZEK]
  2. ISO/IEC JTC1/SC29/WG11: “MPEG-7 Visual Experimentation Model (XM), Version 10.0”, document no. N4063, Singapore, March 2001. [LESZEK]