where $I_c(x, y, t)$ is the current intensity value of the pixel at location $(x, y)$ at time $t$, and $T_c(x, y, t)$ is the set of allowable values for that pixel to be figure at that time.
2.2. Construction of a figure image
There is a wide variety of techniques that could be used to identify whether a pixel is part of the figure or the ground. For example, we could possess a model of the average shape of objects and attempt to fit this model to locations within an image. However, detection algorithms that are too computationally intensive will not be able to perform within the limited time available for real-time processing. In searching for a fast means of estimating the figure/ground state of a pixel, we consider the following heuristic regarding moving objects. These objects tend to be located where pixel intensities have recently experienced significant change, while environmental objects tend to be displayed by pixels whose intensity change is very slight. Thus, a temporal comparison between pixel intensities may yield information about the existence of important objects.
As a base for a temporal comparison, the proposed framework maintains a ground image ($I_g(x, y, t)$) which represents the environment’s past history. For each pixel in the current image ($I_c(x, y, t)$), a comparison is made to the corresponding pixel in the ground image. If they differ by more than a threshold intensity amount $c$, then the pixel is considered to be part of a binary figure image ($I_f(x, y, t)$). In accordance with our heuristic, the set $T_c(x, y, t)$ of acceptable intensity values is selected so as to yield the following pixel acceptance test for the construction of the figure image:
\[
I_f(x, y, t) =
\begin{cases}
1 & \text{if } |I_c(x, y, t) - I_g(x, y, t)| > c \\
0 & \text{otherwise.}
\end{cases}
\tag{3}
\]
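The pixel acceptance test can be illustrated with a minimal sketch. It assumes the test compares the absolute difference between the current and ground intensities against the threshold c, as the surrounding text describes. The function name `figure_image`, the nested-list image representation, and the sample intensities are hypothetical choices made for illustration; they are not the Datacube implementation.

```python
def figure_image(current, ground, c):
    """Return a binary figure image: 1 where |current - ground| > c, else 0.

    `current` and `ground` are equally sized lists of rows of intensities.
    """
    if len(current) != len(ground):
        raise ValueError("images must have the same number of rows")
    figure = []
    for cur_row, gnd_row in zip(current, ground):
        if len(cur_row) != len(gnd_row):
            raise ValueError("rows must have the same length")
        # A pixel is marked as figure only when it differs from the
        # ground image by more than the threshold.
        figure.append([1 if abs(p - g) > c else 0
                       for p, g in zip(cur_row, gnd_row)])
    return figure


# Illustrative 2 x 3 images: the middle column has changed substantially.
ground = [[10, 10, 10],
          [10, 10, 10]]
current = [[12, 80, 10],
           [10, 75, 9]]
mask = figure_image(current, ground, c=20)
```

With c = 20 only the middle column is marked as figure, while the small deviations of 1 or 2 intensity levels are treated as ground.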
The choice of threshold value c plays an important role in this process. If this threshold is too large, then portions of an object may blend into the background. If the threshold is too small, then slight changes in the environment will cause “false positive” errors in which the figure image contains many pixels that do not belong to objects of interest. A more sensitive (i.e. smaller) threshold can be used if the images are preprocessed with a low-pass filter (i.e. the image is convolved with an 8 × 8 discrete approximation to a Gaussian kernel). Inherent noise in the vision sensor’s signal occasionally causes small, brief variations in an area’s intensity values. By spatially averaging pixel intensities, the low-pass filter reduces the difference values that result from this noise. In addition, this filter can be implemented with Datacube pipeline processing elements (the Datacube is part of our hardware environment), making it a more efficient approach than more complicated means of noise reduction. Figure 1 illustrates the construction of a figure image by a visual servoing system that is monitoring the activity of objects moving through its environment.
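The effect of the low-pass filter on isolated noise can be sketched in software. The example below is only an illustration, not the Datacube pipeline: the standard deviation `sigma = 2.0`, the separable row-then-column implementation, and the edge-replicating border handling are all assumptions, since the text specifies only the 8 × 8 kernel size. All function names are hypothetical.

```python
import math


def gaussian_weights(size=8, sigma=2.0):
    """Normalized 1-D Gaussian samples centred on the middle of the window."""
    centre = (size - 1) / 2.0
    raw = [math.exp(-((k - centre) ** 2) / (2.0 * sigma ** 2))
           for k in range(size)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_1d(values, weights):
    """Filter one row or column, replicating edge pixels at the borders."""
    n, half = len(values), len(weights) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), n - 1)  # clamp index to the image
            acc += w * values[j]
        out.append(acc)
    return out


def low_pass(image, size=8, sigma=2.0):
    """Separable Gaussian smoothing: filter the rows, then the columns.

    With an even window the result is offset by half a pixel, which is
    ignored in this sketch.
    """
    weights = gaussian_weights(size, sigma)
    rows = [smooth_1d(row, weights) for row in image]
    cols = [smooth_1d(list(col), weights) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A flat 9 x 9 background of intensity 100 with one noisy pixel of 130.
noisy = [[100.0] * 9 for _ in range(9)]
noisy[4][4] = 130.0
smoothed = low_pass(noisy)
peak = max(max(row) for row in smoothed)
```

In this example the single-pixel spike of 30 intensity levels is spread over its neighbourhood, so the largest deviation from the background after filtering is only a small fraction of the original spike. This is why a smaller threshold becomes usable once the images are smoothed.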
2.3. Construction of a ground image
Initially, the ground image is a copy of the first image of a sequence. However, environmental changes (e.g. lighting changes and shadows) may cause a background pixel’s intensity to vary over time. To account for this dynamic aspect of some environments, our system periodically updates the ground image with information from the current frame of intensities. Rather than periodically replacing the previous ground image, a new ground image is produced by incorporating new intensity values from the current image according to a time-average: