The JavaTM Tutorial
Previous Page Lesson Contents Next Page Start of Tutorial > Start of Trail > Start of Lesson Search
Feedback Form

Trail: Collections
Lesson: Implementations

Set Implementations

The Set implementations are grouped into general-purpose and special-purpose implementations.

General-Purpose Set Implementations

There are three general purpose Set (in the API reference documentation) implementations: HashSet (in the API reference documentation), TreeSet (in the API reference documentation), and LinkedHashSet (in the API reference documentation). It's generally straightforward to decide which of these three to use. HashSet is much faster than TreeSet (constant time versus log time for most operations) but offers no ordering guarantees. If you need to use the operations in the SortedSet interface, or if value-ordered iteration is required, use TreeSet; otherwise, use HashSet. It's a fair bet that you'll end up using HashSet most of the time.

LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, it provides insertion-ordered iteration (least recently inserted to most recently) and runs nearly as fast as HashSet. The LinkedHashSet implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet.

One thing worth keeping in mind about HashSet is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, choosing an initial capacity thatĘs too high can waste both space and time. On the other hand, choosing an initial capacity that's too low wastes time by copying the data structure each time it's forced to increase its capacity. If you don't specify an initial capacity, the default is 16. In the past, there was some advantage to choosing a prime number as the initial capacity. This is no longer true. Internally, the capacity is always rounded up to a power of two. The initial capacity is specified by using the int constructor. The following line of code allocates a HashSet whose initial capacity is 64:

        Set<String> s= new HashSet<String>(64);
The HashSet class has one other tuning parameter, called the load factor. If you care deeply about the space consumption of your HashSet, read the HashSet documentation for more information. Otherwise, just accept the default; it's almost always the right thing to do.

If you accept the default load factor but want to specify an initial capacity, pick a number that's about twice the size to which you expect the set to grow. If your guess is way off, you may waste a bit of space, time, or both, but it's unlikely to be a big problem.

LinkedHashSet has the same tuning parameters as HashSet, but iteration time is not affected by capacity. TreeSet has no tuning parameters.

Special-Purpose Set Implementations

There are two special-purpose Set implementations: EnumSet (in the API reference documentation) and CopyOnWriteArraySet (in the API reference documentation).

EnumSet is a high-performance Set implementation for enum types. All of the members of an enum set must be of the same enum type. Internally, it is represented by a bit-vector, typically a single long. Enum sets support iteration over ranges of enum types. For example given the enum declaration for the days of the week, you can iterate over the weekdays. The EnumSet class provides a static factory that makes it easy:

    for (Day d : EnumSet.range(Day.MONDAY, Day.FRIDAY))
        System.out.println(d);
Enum sets also provide a rich, typesafe replacement for traditional bit flags:
    EnumSet.of(Style.BOLD, Style.ITALIC)
CopyOnWriteArraySet is a Set implementation backed by a copy-on-write array. All mutative operations (such as add, set, and remove) are implemented by making a new copy of the array; no locking is ever required. Even iteration may proceed concurrently with element insertion and deletion. Unlike most Set implementations, the add, remove, and contains methods require time proportional to the size of the set. This implementation is only appropriate for sets that are rarely modified but frequently iterated. It is well-suited to maintaining event-handler lists that must prevent duplicates.

Previous Page Lesson Contents Next Page Start of Tutorial > Start of Trail > Start of Lesson Search
Feedback Form

Copyright 1995-2005 Sun Microsystems, Inc. All rights reserved.