Image-DNA: the key to high speed video analytics

One of the key features of the underlying technology in the Astraguard products was the extensive use of wavelet analysis (see lecture notes by BJ). Wavelet representations of images made it possible to remove noise from images simply and quickly, and to exploit the scale dependency of the image data to maintain the highest resolution. Moreover, the use of wavelets allowed, at no extra computational cost, the computation of multi-scale components of the image Hessian fields and the associated analysis of image structure. This technology grew out of, and fed back into, BJ's astrophysics research on the analysis of cosmic structure (see keynote talk by BJ 2016).

Camera data quality: colour, resolution and sensitivity
MarkVI Frame grabber

Astraguard's MarkVI frame grabber produced high quality colour images from analogue PAL/NTSC video input. It supported from 1 to 16 cameras.

In the early days of analogue video imaging the low-cost colour cameras used in the security industry had very poor colour rendition. This was made worse by low-grade analogue-to-digital converters, and pictures were generally encoded in 4:2:0 video format. The situation with cheap IP cameras is not much better.

Astraguard manufactured a series of high quality "frame grabbers" over a period of 20 years. Their function was to convert analogue video to a digital format that could be handled by a computer. In other words, we converted analogue video cameras into high quality IP cameras, though at a lower pixel resolution. (The pixel format was determined by the PAL or NTSC video standards.) The last Astraguard frame grabber, the "Mark VI", was the only colour grabber of the series.

By design, the Astraguard Mark VI frame grabber had excellent colour rendition, making it higher in cost but allowing a better rendition in 4:2:2 format. This improved performance meant that we could work in colour and develop software to analyse colour as well as intensity.
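To put numbers on the difference between the two formats, here is a small sketch of the chroma plane sizes they imply. In 4:2:2 each colour-difference channel is sampled at half the horizontal resolution of the luma; in 4:2:0 it is halved vertically as well. The 720 x 576 frame size is a common digitised PAL size and is an assumption for illustration, not a statement about the Mark VI; the function name is hypothetical.

```python
def chroma_plane_size(width, height, scheme):
    """Return (width, height) of ONE chroma plane for a given luma size."""
    if scheme == "4:4:4":   # no subsampling
        return width, height
    if scheme == "4:2:2":   # chroma halved horizontally only
        return width // 2, height
    if scheme == "4:2:0":   # chroma halved horizontally and vertically
        return width // 2, height // 2
    raise ValueError(f"unknown scheme: {scheme}")


# Assumed frame size, for illustration only.
WIDTH, HEIGHT = 720, 576
for scheme in ("4:2:2", "4:2:0"):
    w, h = chroma_plane_size(WIDTH, HEIGHT, scheme)
    print(f"{scheme}: each chroma plane is {w} x {h} = {w * h} samples")
```

For the same luma image, 4:2:2 therefore carries twice as many colour samples as 4:2:0.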

Wavelet representation of an image (technical)

Wavelets transform

The wavelet transform, in the context of images, is a mathematical device by means of which an image can be shrunk by a factor of two in both horizontal and vertical directions, and then reconstructed from the shrunken image. In order to do this, the shrinking process must also generate additional information which enables the reconstruction to work: there is no free lunch.

This additional information takes the form of three separate images, each having the same dimensions as the shrunken image, so we have the same number of pixels-worth of data, now comprising four quarter-size images. These three images are referred to as the "detail": they supply the detail that is in the original image but is missing from the shrunken image.
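The shrink-and-reconstruct idea can be sketched with the simplest wavelet, the Haar wavelet, applied to 2 x 2 blocks of pixels. This is a minimal illustration and not the wavelet used in the Astraguard products; the function names and the use of plain averages (rather than an orthonormal scaling) are assumptions made for clarity.

```python
def haar_forward(img):
    """One Haar level on a 2D list with even dimensions.

    Returns four quarter-size images: the shrunken image and three details.
    """
    shrunk, det_h, det_v, det_d = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top row of the block
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom row of the block
            rows[0].append((a + b + d + e) / 4)  # block average
            rows[1].append((a - b + d - e) / 4)  # left minus right
            rows[2].append((a + b - d - e) / 4)  # top minus bottom
            rows[3].append((a - b - d + e) / 4)  # diagonal difference
        for band, row in zip((shrunk, det_h, det_v, det_d), rows):
            band.append(row)
    return shrunk, det_h, det_v, det_d


def haar_inverse(shrunk, det_h, det_v, det_d):
    """Rebuild the full-size image from the four quarter-size images."""
    rows, cols = len(shrunk), len(shrunk[0])
    img = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            s, h, v, x = shrunk[r][c], det_h[r][c], det_v[r][c], det_d[r][c]
            img[2 * r][2 * c] = s + h + v + x
            img[2 * r][2 * c + 1] = s - h + v - x
            img[2 * r + 1][2 * c] = s + h - v - x
            img[2 * r + 1][2 * c + 1] = s - h - v + x
    return img


# Hypothetical 4x4 test image.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
bands = haar_forward(image)
print("quarter-size shape:", len(bands[0]), "x", len(bands[0][0]))
print("exact reconstruction:", haar_inverse(*bands) == image)
```

Four 2 x 2 images hold sixteen numbers, the same count as the original 4 x 4 image, and the inverse recovers the original exactly.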

Wavelets transform

So why do this? The key factor is that these support images contain a lot of data that is not essential to the reconstruction. The inessential values can be marked by setting them to zero. Then we get the image appearing in the right hand panel of the above figure. While the total number of pixels of data has not changed, most of them are marked as zero-valued and can be ignored in the reconstruction.

We see in the right hand panel large blank areas which correspond to the architectural structures in the original image. These areas become blank because not much information is required to characterise the surface of those structures. That is the essence of wavelet image compression: the wavelet allows us to sort out what is important and what is not.
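A minimal sketch of this marking step follows, assuming a Haar wavelet and a simple magnitude threshold. The threshold value, the test image and the function names are all hypothetical, and the rule actually used to decide what is "not essential" is not specified here.

```python
def haar_forward(img):
    """One averaging Haar level on a 2D list with even dimensions."""
    bands = ([], [], [], [])
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4)  # shrunken image
            rows[1].append((a - b + d - e) / 4)  # left minus right
            rows[2].append((a + b - d - e) / 4)  # top minus bottom
            rows[3].append((a - b - d + e) / 4)  # diagonal difference
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def haar_inverse(s, h, v, x):
    """Rebuild the full-size image from the four quarter-size images."""
    img = [[0.0] * (2 * len(s[0])) for _ in range(2 * len(s))]
    for r in range(len(s)):
        for c in range(len(s[0])):
            img[2 * r][2 * c] = s[r][c] + h[r][c] + v[r][c] + x[r][c]
            img[2 * r][2 * c + 1] = s[r][c] - h[r][c] + v[r][c] - x[r][c]
            img[2 * r + 1][2 * c] = s[r][c] + h[r][c] - v[r][c] - x[r][c]
            img[2 * r + 1][2 * c + 1] = s[r][c] - h[r][c] - v[r][c] + x[r][c]
    return img


def zero_small(band, t):
    """Mark detail values smaller in magnitude than t by setting them to zero."""
    return [[v if abs(v) >= t else 0.0 for v in row] for row in band]


# Hypothetical 8x8 image: a nearly flat "wall" (100 +/- 1) with one bright column.
image = [[200 if c == 3 else 100 + (r * r + 2 * c) % 3 - 1 for c in range(8)]
         for r in range(8)]
T = 1.5  # assumed threshold
ss, sd, ds, dd = haar_forward(image)
details = [zero_small(band, T) for band in (sd, ds, dd)]
zeroed = sum(v == 0.0 for band in details for row in band for v in row)
total = 3 * len(ss) * len(ss[0])
approx = haar_inverse(ss, *details)
max_err = max(abs(approx[r][c] - image[r][c]) for r in range(8) for c in range(8))
print(f"{zeroed} of {total} detail values are zero; max pixel error {max_err}")
```

Only the detail values along the bright column survive the threshold; the flat wall contributes nothing but zeros, and the reconstruction error stays below three times the threshold because at most three zeroed details contribute to any one pixel.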

Pyramidal decomposition (technical)
Wavelet pyramid

The wavelet pyramid. A succession of wavelet transforms on the shrunken image data reduces the resolution of the image detail by a factor of 2 at each stage. We are, in effect, looking at ever larger scale structures at lower resolution in the image as we head towards the top of the pyramid.

The four images that make up the wavelet transform image are conventionally labelled "SS" for the shrunken source in the top left, "SD" for the additional vertically oriented data that is required for reconstruction, "DS" for the horizontal data, and "DD" for the diagonal.

Indeed we see from the right hand panel of the previous figure that the SD image emphasizes vertical lines while the DS image emphasizes the horizontal and the DD the diagonals.

The half-size image that results from the first shrinking can itself be shrunk by a factor of two with its own three support images for reconstruction. This process is shown in the above figure where the picture on the left is shrunk twice, each time generating three of the support images necessary for the reconstruction. This is shown in the middle panel.

The classical Wavelet Pyramid is illustrated to the right. The wavelet transform produces an image of half the size of the original (quarter the number of pixels) together with three other images that are, in effect, second order directional derivatives of the image gradient.
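The repeated shrinking can be sketched as a loop that keeps the three detail images at each level and feeds the shrunken image into the next level. This again assumes a Haar wavelet for illustration; `build_pyramid` is a hypothetical name. The mapping of the labels is also an assumption: "SD" is taken to be the left-minus-right difference (which responds to vertical lines, as described above) and "DS" the top-minus-bottom difference.

```python
def haar_forward(img):
    """One averaging Haar level; returns (SS, SD, DS, DD), each half size."""
    bands = ([], [], [], [])
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4)  # SS: shrunken source
            rows[1].append((a - b + d - e) / 4)  # SD: responds to vertical lines
            rows[2].append((a + b - d - e) / 4)  # DS: responds to horizontal lines
            rows[3].append((a - b - d + e) / 4)  # DD: responds to diagonals
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def build_pyramid(img, levels):
    """Apply the transform repeatedly to the shrunken image.

    Returns the final shrunken image and a list of detail sets, finest first.
    The image sides must be divisible by 2**levels.
    """
    pyramid = []
    ss = img
    for _ in range(levels):
        ss, sd, ds, dd = haar_forward(ss)
        pyramid.append({"SD": sd, "DS": ds, "DD": dd})
    return ss, pyramid


# Hypothetical 8x8 image: a diagonal intensity ramp.
image = [[r + c for c in range(8)] for r in range(8)]
top, pyramid = build_pyramid(image, levels=3)
for level, detail in enumerate(pyramid, start=1):
    print(f"level {level}: detail images are "
          f"{len(detail['SD'])} x {len(detail['SD'][0])}")
stored = sum(3 * len(d["SD"]) * len(d["SD"][0]) for d in pyramid)
stored += len(top) * len(top[0])
print("values stored:", stored, "of", 8 * 8, "original pixels")
```

The total number of stored values equals the number of pixels in the original, whatever the number of levels.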

Image-DNA (technical)

Image-DNA is the information carried by the detail maps of the wavelet pyramid. Since we are interested specifically in motion, a set of motion-maps is used as the basis. Likewise, any other feature of interest is encoded in the same way, the corresponding detail maps being used to create the Image-DNA for that attribute.

Why do we call this "Image-DNA"? Because the "bases" that make up the information are the encoded eigenvalues of the Hessian matrix of the structure at a point. There is a set of such eigenvalues for each scale in the wavelet transformation.
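For readers unfamiliar with the Hessian, the following sketch computes its two eigenvalues at a single pixel using central finite differences and the closed form for a symmetric 2 x 2 matrix. This is an illustration of the quantity only: the function name is hypothetical, and neither the finite-difference stencil nor the way the eigenvalues are encoded into Image-DNA is taken from the product. Applying the same calculation to the shrunken image at each pyramid level would give one pair of eigenvalues per scale.

```python
import math


def hessian_eigenvalues(img, x, y):
    """Eigenvalues (larger first) of the image Hessian at interior pixel (x, y)."""
    # Second derivatives by central differences (unit pixel spacing assumed).
    hxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    hyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    hxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4
    # Closed form for the symmetric matrix [[hxx, hxy], [hxy, hyy]].
    mean = (hxx + hyy) / 2
    radius = math.hypot((hxx - hyy) / 2, hxy)
    return mean + radius, mean - radius


# Two hypothetical test surfaces on a 5x5 grid.
bowl = [[x * x + 3 * y * y for x in range(5)] for y in range(5)]
saddle = [[x * y for x in range(5)] for y in range(5)]

print(hessian_eigenvalues(bowl, 2, 2))    # (6.0, 2.0): both positive, a minimum
print(hessian_eigenvalues(saddle, 2, 2))  # (1.0, -1.0): opposite signs, a saddle
```

The signs and relative sizes of the two eigenvalues distinguish blob-like, line-like and saddle-like structure at a point.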

For more details about eigenvalues of Hessian matrices of images see my two papers: Aragon-Calvo et al. and Cautun et al.

Concluding technical remarks

It is important to recognise that, while images have scaling properties which are highlighted by wavelets, those scaling properties themselves depend on the scale. In other words, a typical image scales as a multifractal having a continuum of (Renyi) dimensions. It is therefore natural to use different wavelets at different levels. The multifractal spectrum reflects textures in the image. See this paper on multifractal scaling and this one in relation to textures. These are set in an astrophysical context, but the theory is applicable to any structured distribution in two or three dimensions.
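For reference, the Renyi dimensions are conventionally defined as follows. Cover the image with boxes of side $\epsilon$ and let $p_i(\epsilon)$ be the fraction of the total intensity (or of whatever measure is being studied) that falls in box $i$. The symbols are my notation here and are not taken from the papers cited above.

```latex
D_q \;=\; \frac{1}{q-1}\,\lim_{\epsilon \to 0}
      \frac{\log \sum_i p_i(\epsilon)^{\,q}}{\log \epsilon},
\qquad q \neq 1 .
```

For a simple (mono)fractal $D_q$ is the same for every $q$; for a multifractal $D_q$ decreases as $q$ increases, which is the "continuum of dimensions" referred to above.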