Java Collections Framework

Introduction

The concept of collections is pretty old in the world of programming languages. But the collections framework in Java added a special flavor to it, by providing a perfect abstraction to the functionality that allowed the developers to easily focus on the development tasks instead of wasting time on implementing stuff from first principles. Java language provides several different types of collections - each with its own special utilities. Each is meant for a particular use case. It is important for developers to understand these concepts very well so that they can appropriately use the features provided by the collections framework.
The collections framework is based out of the java.util package. It defines an interface java.util.Iterable which is extended by java.util.Collection. This forms the base class of the collections framework. There is another interface in the java.util package. It is the java.util.Map. Although this does not inherit from the Collection or Iterable interfaces, it is considered as a part of the collections framework. Because it provides many meaningful features that often go along with the other collections. This blog gives you an overview of the different types of collections that Java provides. Please note that most collections, unless specified are not thread safe. They attempt to provide Fail-Fast behavior on a conflict by throwing the Concurrent updates exception. But that is not guaranteed. The JDK documentation warns against designing systems based on that exception. Even among the concurrent collections, the bulk operations may not always be thread safe and consistent.
Now what was that? Before we proceed, it is important to understand these concepts of thread safe, fail-fast, atomic, consistent and bulk operations. In collections, mostly thread-safe refers to operations on a particular element in the collection. For example, adding a particular element into the collection, or removing one element, etc. Thread safe for a single operation implies that operations related to a single element are guaranteed to be thread safe. That is if two different threads try to add an element or remove an element from the collection, each is guaranteed to see a consistent state of the data. Note that this does not mean updates to the object in the collection. That may or may not be thread safe as per the implementation of the object itself. Atomic refers to performing several operations as if it is just one operation. For example updating an element in a Map typically involve several steps like locating the value from the key, removing that pair from the Map and adding a new pair with the new Value. For a ConcurrentMap, it is guaranteed that such operations will be atomic - thus no other thread will be able to preempt and corrupt the operation. Bulk operations refer to operations not limited to just one element in the collection. For example, deleting all the elements of the collection or adding a new chunk of elements from another collection, etc. These require several operations on record level. Collections framework does not guarantee atomicity of such bulk operations. If you have multiple threads performing bulk operations on the same collection, it is possible that you get garbage at the end.
The most important concept is that of Fail Fast. Given that a collection is not thread safe, and it gets into a situation where two threads try to update the same record at the same time. The collections framework is designed in a way that it tries to identify such a situation and throw an exception as early as possible - thus reducing damage. The JDK documentation does not make any guarantee about this behavior, but promises that the code does make this attempt.
  • Important Interfaces: To start with, we look at the 5 major interfaces that form the basis of the collections framework - Iterable, Collection, List, Set, Queue, Map.
  • Implementations of List: Java provides many useful implementations of List. ArrayList is the most popular. There are many others that are good for certain scenarios.
  • Implementations of Queue: Queues suggest access by two different entities. Multi-threading is quite common.
  • Implementations of Set: Sets enforce distinct elements. There are a lot of scenarios where this is helpful. We have many different implementations of the interface that suite different requirements.
  • Implementations of Map: Although Maps are not a part of the Collections interface. They are considered a part of the collections framework.