Title: Indexing STL lists with maps

Author: Martin Moene

Date: 05 February 2003 21:10:07 +00:00 or Wed, 05 February 2003 21:10:07 +00:00

Summary:

Body:

A list will keep its elements in the same order that they were added, but searching for elements takes linear time (i.e. time proportional to the length of the list). Removing an element from a list is a constant-time operation if you have an iterator that refers to the element in question, but if you donâ€™t have such an iterator then the element must first be found and this will also take linear time.

In the case of a set, searching and removing is faster (logarithmic, and in the case of the SGI extension hash_set, amortized linear). However, sets donâ€™t remember the order that the elements were added in.

If you want to have an ordered list of elements, but you also want to be able to find given elements quickly, then you can combine a list with a map to index it. This article describes a brief template class called hashed_list, which uses the SGI extension hash_map and the STL class list; it could also use the STL class map with little modification. The template class described here does not support all STL semantics (for example, you canâ€™t get a nonÂconst iterator out of it); this article is meant to demonstrate the concept.

Note that hash_map will assume that all the elements are unique. Generalising this class to support non-unique elements is left as an exercise to the reader (besides using multimap, you need to think about the erase() method that is defined below).

The general idea

The objects are stored in a list, and also in a map. The map will map objects to list iterators. Whenever an object is added to the list, an iterator that refers to that object is taken and stored in the map under that object. Then when objects need to be counted or deleted, the map is used to quickly retrieve the list iterator, so that the list does not need to be searched linearly.

This relies on the fact that iterators in a list will remain valid even if other parts of the list are modified. The only thing that invalidates a list iterator is deleting the object that it points to. Hence it is acceptable to store list iterators for later reference.

The code

First, we include the two container classes that we will be using:

#include <list>
#include <hash_map>
using namespace std;

Now for the hashed_list class itself. Since hash_map is a template class with three arguments (the object type, the hash function, and the equality function), we need to support the same three arguments if weâ€™re making a general template:

template<class object, class hashFunc, class equal>
class hashed_list {

I like to start with some typedefs to save typing later. We will need iterators and const iterators to the list, and also a type for the map:

    typedef list<object>::iterator iterator;
    typedef list<object>::const_iterator const_iterator;
    typedef hash_map<object,iterator, hashFunc,equal> mapType;

And the data itself. For brevity Iâ€™ll be storing it directly and I wonâ€™t write an explicit constructor, so we donâ€™t have to worry about the allocation.

    list<object> theList;
    mapType theMap;

Now for some methods. Obtaining iterators is simply a matter of getting them from the list; we only allow const iterators because otherwise weâ€™d have to write lots more code to deal with what happens when the user changes things via the iterators. Similarly, obtaining a count of objects just involves getting it from the map (since hash_map does not allow duplicate keys, the result will be 0 or 1).

public:
    const_iterator begin() const {
        return theList.begin();
    }
    const_iterator end() {
        return theList.end();
    }
    mapType::size_type count(const object& k) const {
        return theMap.count(k);
    }

Now if we want to add an object to the list, we must also add it to the map. Since the listâ€™s insert() method returns an iterator that refers to the object we just added, we can give its return value to the map, so our add() method is one statement:

    void add(const object &o) {
        theMap[o] = theList.insert(theList.end(),o);
    }

To check that the â€œobjects must be uniqueâ€ constraint is not being violated, you might wish to add

assert(theMap.count(o)==0);

to the beginning of the above method (and #include <cassert>).

The following method will erase a given object. First, a map iterator is obtained; this is dereferenced to get the list iterator and erase it from the list, and then the map iterator is used to erase it from the map. That way, only one lookup operation is needed in the map (when the iterator is found); it is not necessary to look up the object twice (once to reference it, again to erase it).

    void erase(const object &o) {
        mapType::iterator i=theMap.find(o);
        theList.erase((*i).second);
        theMap.erase(i);
    }

For readability you could also write it like this, but it will be less efficient if the lookup takes longer (even in a hashtable it can take a while in the worst case):

void slower_erase(const object &o) {
    theList.erase(theMap[o]);
    theMap.erase(o);
}

Finally, finish the class:

};

To test it, you might want to write something like this:

#include <string.h>
struct strEqual {
    bool operator()(const char* s1, const char* s2) const {
        return (s1==s2 || !strcmp(s1,s2));
        // (although strcmp() might already have the s1==s2 check)
    }
};

typedef hashed_list<const char*, hash<const char*>, strEqual> TestType;

int main() {
    TestType t;
    t.add("one");
    t.add("two");
    t.add("three");
    cout << t.count("two") << endl;
                                    // should be 1
    t.erase("two"); cout << t.count("two") << endl;
                                    // should be 0
    copy(
        t.begin(), t.end(),
        ostream_iterator<const char*>(cout," "));
    cout << endl;
}

Conclusion

The above code will increase the speed of finding items in lists, particularly when they are long. This is at the expense of consuming more memory (because of the map) and making the maintenance of the list slightly slower (because the map needs to be maintained with it). If hash_map is being used then the overhead of maintaining the map is (in the amortized case) a constant for each maintenance operation, which may or may not be acceptable depending on the application and on what proportion of its operations are maintenance.

Silas S Brown
ssb22@cam.ac.uk

Notes:

More fields may be available via dynamicdata ..