Title: Sliding Window Filters : A Set-based Implementation

Author: Martin Moene

Date: 11 March 2016 14:37:22 +00:00 or Fri, 11 March 2016 14:37:22 +00:00

Summary: Omar Bashir presents an implementation using pre-sorted data to reduce CPU load.

Body:

Sliding window filters are commonly used for noise reduction and data smoothing in applications ranging from algorithmic trading in financial systems to signal processing in digital media systems. Efficient implementations of these filters optimise overall performance of systems in which they operate.

A set-based implementation of a sliding window filter is discussed which optimises the performance of the most common of sliding window filters, the median filter. This implementation can easily be extended to use measures other then median for filtering.

Motivation

A median filter operates on a window over the data stream. This window is implemented using a queue of a specified size. When the queue is full, an input causes the earliest value in the queue to be ejected and discarded. The output of the filter is the median of the values in the queue (see figure 1).

Figure 1

An obvious implementation of the Median Calculator in figure 1 sorts the data in the queue and then calculates the median from the sorted collection. An alternative and efficient implementation maintains a sorted collection which is updated with every new insert. If the queue is full, this update involves removing the earliest value from both the queue and the sorted collection and then adding the new value into both. Updating a sorted collection is generally a computationally less expensive operation than sorting the complete list (i.e., ln(n) vs. n ln(n)). If the sorted collection is index based, calculation of median from it is a constant complexity operation. Figure 2 shows operation of a median filter using a sorted collection.

Figure 2

Sorted collections

Bashir [1] demonstrated the versatility of sorted collections of numerical data in the context of performance monitoring of computer networks. Analysis of network latencies and investigation of related incidents requires iterative derivation and analysis of several summaries at different levels of temporal granularities. Processing these statistical summaries from raw data in every iteration can be computationally intensive.

Bashir proposes preprocessing raw data to intermediate information at the lowest level of granularity. This intermediate information should have properties of richness, efficiency and mergeability. Richness is the ability to derive most statistical summaries from intermediate information without accessing the raw data. Efficiency requires calculation of statistical summaries to be equally or less computationally complex than deriving them from raw data. Finally mergeability is the ability to combine appropriate intermediate information at finer granularities to provide intermediate information at coarser granularities. These can then be processed efficiently to derive required summaries at those granularities.

Sorted numerical data exhibit all these properties. In the context of filtering based on measures of central tendency, richness and efficiency are desired properties while mergeability is less relevant as these filters generally operate at fixed granularities.

Using sorted numerical data also allows decomposing such filters into a preprocessor and a summary calculator. The preprocessor queues the input and maintains a sorted copy of the queued data. The summary calculator produces the desired measure of central tendency from the sorted collection. Thus, the output of the summary calculator is the filtered value. Decoupling the two allows developing filters based on different measures of central tendency using the same preprocessor implementation.

However, such preprocessors need to maintain a sorted copy of the window in addition to the window itself. This can be an issue for larger window sizes in applications that need to have a smaller memory footprint.

A sliding window preprocessor implementation

A Java implementation of a sliding window preprocessor is described here. This preprocessor uses an instance of the ArrayBlockingQueue, an implementation of the Queue interface, as the sliding window. A TreeSet instance is used as the sorted collection of elements in the sliding window.

A TreeSet does not allow recurring values while a sliding window algorithm accepts recurring values. To allow recurring values in a TreeSet, each value is encapsulated with a monotonically increasing integer that is incremented for every new value. OrderedElement class (Listing 1) encapsulates the order of arrival with the arriving value. OrderedElementâ€™s implementation of the Comparable interface compares the order of arrival for recurring values. This allows recurring values encapsulated in an OrderedElement instance to exist in a TreeSet in sorted order.

public class OrderedElement<T extends Comparable<T>> implements Comparable<OrderedElement<T>>{ 

  private Integer order; 
  private T value; 

  public OrderedElement(Integer order, T value) { 
    this.order = order; 
    this.value = value; 
  }

  public Integer getOrder() { 
    return order; 
  }

  public T getValue() { 
    return value; 
  } 

  @Override 
  public int compareTo(OrderedElement<T> o) {
    int valueComp = value.compareTo(o.value);
    return (valueComp != 0) ? valueComp 
      : order.compareTo(o.order); 
  } 
  @Override 
  public boolean equals(Object o) { 
    return (null != o) && 
      (o instanceof OrderedElement<?>) && 
      ((this == o) ||
      (this.compareTo((OrderedElement<T>) o) 
       == 0)); 
  }
 ...
}

Listing 1

Listing 2 shows the preprocessor, SetOrderedWindow. An instance of this class uses queue, an ArrayBlockingQueue instance, and sorter, a TreeSet instance. SetOrderedWindow maintains an integer, sequence, which is incremented every time the insert method is called to insert a value into the window.

public class 
    SetOrderedWindow<T extends Comparable<T>> {
  private
    ArrayBlockingQueue<OrderedElement<T>> queue; 
  private Set<OrderedElement<T>> sorter;
  private int sequence;
  public SetOrderedWindow(int size) {
    this.sequence = 0;
    this.queue = new
     ArrayBlockingQueue<OrderedElement<T>>(size);
    this.sorter = new
     TreeSet<OrderedElement<T>>();
  }
  public int getCurrentSize() {
    return queue.size();
  }
  public int getCapacity() {
    return queue.size() +
      queue.remainingCapacity();
  }
  public void insert(T value) 
      throws InterruptedException {
    OrderedElement<T> element = 
      new OrderedElement<>(sequence, value);
    sequence++;
    if (queue.remainingCapacity() == 0) {
      OrderedElement<T> elementToRemove 
        = queue.poll();
      sorter.remove(elementToRemove);
    }
    queue.put(element);
    sorter.add(element);
  }
  public ArrayList<T> getElementsSorted() {
    ArrayList<T> reply = new ArrayList<T>();
    for (OrderedElement<T> element : sorter) {
      reply.add(element.getValue());
    }
    return reply;
  }
  public List<T> getElements() {
    List<T> reply = new ArrayList<T>();
    for (OrderedElement<T> element : queue) {
      reply.add(element.getValue());
    }
    return reply;
  }
}

Listing 2

If there is still capacity in the window, the input value is encapsulated with the sequence number in an instance of OrderedElement and inserted into both queue and sorter. If there is no capacity in the window, SetOrderedWindow removes the element at the head of the queue and also removes the same element from sorter before inserting the new element to both queue and sorter (Figure 3).

Figure 3

The SetOrderedWindow class contains two methods to retrieve the data its instance contains. The getElements method returns a List of all the values contained in queue in the order of arrival. getElementsSorted method returns the values contained in sorter in sorted order in an array which allows efficient index based access to the values. getElementsSorted is the most commonly used method for data access as the array it returns can be used efficiently to derive any measure of central tendency for filtering.

Performance measurements

The performance of the SetOrderedWindow class has been measured for different window sizes from 500 to 2500 elements.

The performance of SetOrderedWindow is compared with the performance of sorting the data in queue using an ArrayList for every summary query. These test applications were compiled using JDK 8 and executed over Lubuntu 14.04 on a 1.6 GHz Intel Atom N270 processor with 1 GB RAM.

Figure 4 compares the performance when the windows are being filled from the beginning and they are not yet full. This means the elements are only being inserted in the queue objects. In the SetOrderedWindow, these elements are also being inserted in sorter to keep the data in sorted order. For each window size, the time measured is the time required to fill the entire window and for each entry calculating the median of the data in the window.

Figure 4

Figure 5 compares the performance when the windows are already full. Inserting an element in the windows requires removing the element at the head of queue objects before inserting the new element at the tail of queue objects. In the SetOrderedWindow, the element at the queue head is also removed from sorter and the element being inserted at the tail of queue is also inserted into sorter. For each window size, the time measured is the time required to replace the contents of the entire window and for each new entry calculating the median of the data in the window.

Figure 5

These results show that the sliding window filter using a set based sorter performs better than the filter that sorts window data for every query. Moreover, relative performance of the set based sliding window filter improves for larger window sizes.

Most applications will have window sizes less than 100 elements. The performance difference between the two approaches may not be significant for an individual insert and summary calculation. But as these applications are long running, any additional performance impact reduces the overall throughput. However, a sliding window implementation using a set based sorter consumes more memory to maintain the data it needs to process. Depending upon the specifics of the implementation, it can be twice as much memory. As long as this overhead is acceptable, such an implementation can improve overall application throughput.

Conclusions

Sliding window filters summarise data captured in a window sliding over the input data stream and the summarised value at the output is the filtered value. These filters are used for noise reduction and data smoothing in applications from diverse domains such as financial computations, signal processing and automatic systems control.

A common sliding window filter is the median filter where the filter calculates median over the window and uses that as the filtered output. Conventional median filter implementations sort the data in the window for each insert. As sorting is computationally expensive, this can reduce throughput of host systems.

An alternative implementation discussed here uses a set to keep a copy of the data in the sliding window in sorted order. It is shown here that maintenance of this set is computationally less expensive than resorting the window as it slides over the input data. This, however, can have the cost of doubling the memory footprint of the filter because the filter needs to maintain the values in the window in arrival and in sorted order.

Maintaining sorted data in the filter allows calculating a range of summaries efficiently. Furthermore, actual summary calculation can be decoupled from maintaining the window in arrival and sorted orders inside a preprocessor. Thus, the same preprocessor can then be used with filters that use measures of central tendency other than the median.

While this article discusses implementing a 1-D sliding window filter, an adaptation can be used to implement 2-D sliding window filters. These can be used to optimise image processing applications.

References

[1] Bashir, O. (1998), Management and Processing of Network Performance Information, PhD Thesis, Loughborough University

Notes:

More fields may be available via dynamicdata ..