Thursday, July 23, 2015

concurrent hashmap

http://javarevisited.blogspot.sg/2011/04/difference-between-concurrenthashmap.html

How to use ConcurrentHashMap in Java - Example Tutorial and Working

ConcurrentHashMap in Java was introduced as an alternative to Hashtable in Java 1.5, as part of the Java concurrency package. Prior to Java 1.5, if you needed a Map implementation that could be safely used in a concurrent, multi-threaded Java program, you only had Hashtable or a synchronized Map, because HashMap is not thread-safe. With ConcurrentHashMap you now have a better choice. Not only can it be safely used in a concurrent multi-threaded environment, it also performs better than Hashtable and synchronized Map. ConcurrentHashMap outperforms the earlier two because it locks only a portion of the Map instead of the whole Map, which is what Hashtable and synchronized Map do. CHM allows concurrent read operations and, at the same time, maintains integrity by synchronizing write operations. We have seen the basics of ConcurrentHashMap in Top 5 Java Concurrent Collections from JDK 5 and 6, and in this Java tutorial we will learn:

• How ConcurrentHashMap works in Java, or how it is implemented in Java.
• When to use ConcurrentHashMap in Java.
• ConcurrentHashMap examples in Java.
• Some important properties of CHM.

How ConcurrentHashMap is implemented in Java

ConcurrentHashMap was introduced as an alternative to Hashtable. It provides all the functions supported by Hashtable, plus an additional feature called "concurrency level", which allows ConcurrentHashMap to partition the Map. ConcurrentHashMap allows multiple readers to read concurrently without any blocking. It achieves this by partitioning the Map into different parts based on the concurrency level and locking only a portion of the Map during updates. The default concurrency level is 16, so the Map is divided into 16 parts, each governed by a different lock. This means up to 16 threads can update the Map simultaneously, as long as they are operating on different parts of the Map. This gives ConcurrentHashMap high performance while keeping thread-safety intact. It does come with a caveat, though. Update operations like put(), remove(), putAll() or clear() are not synchronized with retrievals, so concurrent retrieval may not reflect the most recent change on the Map.
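To make the partitioning idea concrete, here is a minimal, hypothetical sketch of lock striping. The map is split into `concurrencyLevel` independent HashMaps, each guarded by its own ReentrantLock, and the key's hash picks the stripe. The class name StripedMap and its methods are invented for illustration. This is not the JDK's ConcurrentHashMap code. Among other differences, the real CHM lets get() proceed mostly without locking, while this sketch also locks on reads to keep things simple.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified illustration of lock striping; not the JDK implementation.
final class StripedMap<K, V> {
    private final Map<K, V>[] stripes;
    private final ReentrantLock[] locks;

    @SuppressWarnings("unchecked")
    StripedMap(int concurrencyLevel) {
        stripes = new Map[concurrencyLevel];
        locks = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++) {
            stripes[i] = new HashMap<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Choose a stripe from the key's hash; floorMod keeps the index non-negative.
    int stripeFor(Object key) {
        return Math.floorMod(key.hashCode(), stripes.length);
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        locks[i].lock(); // only this stripe is locked; the other stripes stay available
        try {
            return stripes[i].put(key, value);
        } finally {
            locks[i].unlock();
        }
    }

    V get(Object key) {
        int i = stripeFor(key);
        locks[i].lock(); // simplification: the real CHM reads mostly without locking
        try {
            return stripes[i].get(key);
        } finally {
            locks[i].unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StripedMap<Integer, Integer> map = new StripedMap<>(16);
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final int offset = t;
            writers[t] = new Thread(() -> {
                // Two puts block each other only if their keys fall into the same stripe.
                for (int k = offset; k < 1000; k += 4) {
                    map.put(k, k * k);
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(map.get(12)); // 144
    }
}
```

With 16 stripes, two writers contend only when their keys hash to the same stripe, which is the effect the concurrency level is meant to control.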

In the case of putAll() or clear(), which operate on the whole Map, a concurrent read may reflect the insertion or removal of only some entries. Iteration over CHM is another important point. The Iterator returned by the keySet of ConcurrentHashMap is weakly consistent: it reflects the state of the ConcurrentHashMap at some point at or since the creation of the iterator, and it may not reflect more recent changes. Iterators of ConcurrentHashMap's keySet are also fail-safe and don't throw ConcurrentModificationException.
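The following minimal sketch shows the weakly consistent, fail-safe iterator. It adds new entries while iterating over keySet(), and no ConcurrentModificationException is thrown. The class and method names are hypothetical. Iterating over a HashMap's keySet() in the same way is expected to fail fast with ConcurrentModificationException, although fail-fast behaviour is only best-effort.

```java
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentDemo {
    // For every non-negative key, insert a new negative key while iterating over keySet().
    // The weakly consistent iterator visits each key present at its creation exactly once,
    // may or may not visit keys added later, and never throws ConcurrentModificationException.
    static void addWhileIterating(ConcurrentHashMap<Integer, Integer> map) {
        for (Integer key : map.keySet()) {
            if (key >= 0) {
                map.put(-key - 1, key); // structural modification during iteration
            }
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, i);
        }
        addWhileIterating(map);
        System.out.println(map.size()); // 20: each original key added one negative key
    }
}
```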

The default concurrency level is 16. You can change it by passing a number that makes sense for your workload when creating the ConcurrentHashMap. The concurrency level is used for internal sizing and indicates the number of concurrent updates possible without contention, so if only a few writer threads update the Map, keeping it low is much better. ConcurrentHashMap also uses ReentrantLock internally to lock its segments. (This describes the segment-based implementation used up to Java 7. From Java 8 onward, ConcurrentHashMap no longer locks segments and treats the concurrency level only as a sizing hint.)
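The concurrency level is passed through the three-argument constructor `ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel)`. The sketch below assumes a workload with about two writer threads. The capacity and level values are illustrative assumptions, not recommendations, and the class name is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrencyLevelDemo {
    // Assumed workload for illustration: about two threads update the map,
    // while many threads only read from it.
    static ConcurrentHashMap<String, Integer> newFewWriterMap() {
        int initialCapacity = 64;  // expected number of entries (assumed)
        float loadFactor = 0.75f;  // the usual default load factor
        int concurrencyLevel = 2;  // estimated number of concurrently updating threads
        return new ConcurrentHashMap<>(initialCapacity, loadFactor, concurrencyLevel);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = newFewWriterMap();
        map.put("requests", 0);
        System.out.println(map); // {requests=0}
    }
}
```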

ConcurrentHashMap putIfAbsent example in Java

ConcurrentHashMap examples are similar to the Hashtable examples we have seen earlier, but the putIfAbsent() method is worth knowing. Many times we need to insert an entry into a Map only if it's not already present, and we write the following kind of code:

synchronized(map){
  if (map.get(key) == null){
      return map.put(key, value);
  } else{
      return map.get(key);
  }
}

This code works with Hashtable, and with HashMap if every access synchronizes on the same map, but it won't work with ConcurrentHashMap. ConcurrentHashMap does not lock the whole map during put(), and its methods don't lock on the map object itself, so synchronized(map) does not stop another thread from calling put() directly. While one thread is putting a value, another thread's get() call can still return null, and one thread then overrides the value inserted by the other. Of course, you can wrap every access in a synchronized block and make it thread-safe, but that effectively makes your code single-threaded. ConcurrentHashMap provides putIfAbsent(key, value), which does the same thing atomically and so eliminates the race condition above.
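Here is a minimal sketch of the same check-then-insert logic written with putIfAbsent(). `putIfAbsent` returns the previous value, or null if there was none, so the helper returns whichever value ends up in the map. The helper name getOrInsert is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentDemo {
    // Returns the value associated with key after the call: the existing value
    // if one was already present, otherwise the value this call inserted.
    static <K, V> V getOrInsert(ConcurrentMap<K, V> map, K key, V value) {
        V previous = map.putIfAbsent(key, value); // atomic check-then-insert
        return previous == null ? value : previous;
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> getOrInsert(map, "config", "from-thread-1"));
        Thread t2 = new Thread(() -> getOrInsert(map, "config", "from-thread-2"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Exactly one thread's value wins; the other thread sees it instead of overwriting it.
        System.out.println(map.get("config"));
    }
}
```

Which thread wins is not deterministic, but whichever value is inserted first is never overwritten by the other thread.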


When to use ConcurrentHashMap in Java

ConcurrentHashMap is best suited when you have multiple readers and few writers. If writers outnumber readers, or there are as many writers as readers, the performance of ConcurrentHashMap effectively drops to that of a synchronized map or Hashtable. CHM slows down because most portions of the Map end up locked by writers, and threads operating on the same portion have to wait for one another. ConcurrentHashMap is a good choice for caches that can be initialized during application start-up and later accessed by many request-processing threads. As the javadoc states, CHM is also a good replacement for Hashtable and should be used whenever possible. Keep in mind, though, that CHM provides a slightly weaker form of synchronization than Hashtable.
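The cache use case can be sketched as follows. The map is bulk-loaded once at start-up, many threads read it, and an occasional writer updates an entry alongside them. The class ReadMostlyCache, its methods and the sample data are hypothetical, chosen only to illustrate the read-mostly pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical read-mostly cache: filled once at start-up, then read by many threads.
class ReadMostlyCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    ReadMostlyCache(Map<String, String> initialData) {
        cache.putAll(initialData); // bulk load during application start-up
    }

    String lookup(String key) {
        return cache.get(key); // the common case: a read
    }

    void refresh(String key, String value) {
        cache.put(key, value); // the rare case: an occasional write
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> data = new HashMap<>();
        data.put("country:0", "SG");
        data.put("country:1", "IN");
        data.put("country:2", "US");
        ReadMostlyCache countries = new ReadMostlyCache(data);

        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicLong hits = new AtomicLong();
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> {
                // Simulated request-processing thread doing lookups
                for (int j = 0; j < 100_000; j++) {
                    if (countries.lookup("country:" + (j % 3)) != null) {
                        hits.incrementAndGet();
                    }
                }
            });
        }
        countries.refresh("country:0", "Singapore"); // a writer running alongside the readers
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("hits = " + hits.get()); // 800000: every key stays present
    }
}
```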


Summary

Now that we know what ConcurrentHashMap is in Java and when to use it, it's time to revise some important points about CHM in Java.

1. ConcurrentHashMap allows concurrent read and thread-safe update operation.

2. During an update operation, ConcurrentHashMap locks only a portion of the Map instead of the whole Map.

3. Concurrent updates are achieved by internally dividing the Map into small portions, whose number is defined by the concurrency level.

4. Choose the concurrency level carefully: a significantly higher number can waste time and space, and a lower number may introduce thread contention when writers outnumber the concurrency level.

5. All operations of ConcurrentHashMap are thread-safe.

6. Since the ConcurrentHashMap implementation doesn't lock the whole Map, reads can overlap with update operations like put() and remove(). In that case, the result returned by get() reflects the most recently completed update operation as of the moment the get() started.

7. The Iterator returned by ConcurrentHashMap is weakly consistent and fail-safe, and never throws ConcurrentModificationException.

8. ConcurrentHashMap doesn't allow null as key or value.

9. You can use ConcurrentHashMap in place of Hashtable, but with caution, as CHM doesn't lock the whole Map.

10. During putAll() and clear() operations, concurrent read may only reflect insertion or deletion of some entries.
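Point 8 is easy to check. ConcurrentHashMap throws NullPointerException for a null key or a null value, while HashMap accepts both. One consequence is that a null returned by get() on a ConcurrentHashMap always means the key is absent, never that a null value is stored. The helper rejectsNull below is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullHandlingDemo {
    // Returns true if map.put(key, value) is rejected with NullPointerException.
    static boolean rejectsNull(Map<String, String> map, String key, String value) {
        try {
            map.put(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNull(new HashMap<>(), null, null));          // false
        System.out.println(rejectsNull(new ConcurrentHashMap<>(), null, "v")); // true
        System.out.println(rejectsNull(new ConcurrentHashMap<>(), "k", null)); // true
    }
}
```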

That's all on what ConcurrentHashMap is in Java and when to use it. We have also seen a little of the internal working of ConcurrentHashMap and how it achieves its thread-safety and better performance than Hashtable and synchronized Map. Use ConcurrentHashMap in a Java program when there are more readers than writers. It's also a good choice for creating a cache in Java.


14 comments :

SARAL SAXENA said...
one more thing I want to share Javin...

interviewer was possibly expecting a simple answer, such as:
•if the whole map is synchronized for get/put operations, adding threads won't improve throughput because the bottleneck will be the synchronized blocks. You can then write a piece of code with a synchronizedMap that shows that adding threads does not help
•because the map uses several locks, and assuming you have more than one core on your machine, adding threads will improve throughput

The example below outputs the following:


Synchronized one thread: 30
Synchronized multiple threads: 96
Concurrent one thread: 219
Concurrent multiple threads: 142

So you can see that the synchronized version is more than 3 times slower under high contention (16 threads) whereas the concurrent version is almost twice as fast with multiple threads as with a single thread.

It is also interesting to note that the ConcurrentMap has a non-negligible overhead in a single threaded situation.

This is a very contrived example, with all the possible problems due to micro-benchmarking (first results should be discarded anyway). But it gives a hint at what happens.


import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Test1 {
    static final int SIZE = 1000000;
    static final int THREADS = 16;
    static final ExecutorService executor = Executors.newFixedThreadPool(THREADS);

    public static void main(String[] args) throws Exception {
        // Repeat several times: the first rounds include JIT warm-up and should be discarded
        for (int i = 0; i < 10; i++) {
            System.out.println("Concurrent one thread");
            addSingleThread(new ConcurrentHashMap<Integer, Integer>());
            System.out.println("Concurrent multiple threads");
            addMultipleThreads(new ConcurrentHashMap<Integer, Integer>());
            System.out.println("Synchronized one thread");
            addSingleThread(Collections.synchronizedMap(new HashMap<Integer, Integer>()));
            System.out.println("Synchronized multiple threads");
            addMultipleThreads(Collections.synchronizedMap(new HashMap<Integer, Integer>()));
        }
        executor.shutdown();
    }

    private static void addSingleThread(Map<Integer, Integer> map) {
        long start = System.nanoTime();
        for (int i = 0; i < SIZE; i++) {
            map.put(i, i);
        }
        System.out.println(map.size()); // use the result
        long end = System.nanoTime();
        System.out.println("time with single thread: " + (end - start) / 1000000); // in ms
    }

    private static void addMultipleThreads(final Map<Integer, Integer> map) throws Exception {
        List<Runnable> runnables = new ArrayList<>();
        for (int i = 0; i < THREADS; i++) {
            final int start = i;
            runnables.add(new Runnable() {
                @Override
                public void run() {
                    // Each task puts every THREADS-th key (start, start + THREADS, ...),
                    // trying to have one runnable per bucket
                    for (int j = start; j < SIZE; j += THREADS) {
                        map.put(j, j);
                    }
                }
            });
        }
        List<Future<?>> futures = new ArrayList<>();
        long start = System.nanoTime();
        for (Runnable r : runnables) {
            futures.add(executor.submit(r));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for every task to finish
        }
        System.out.println(map.size()); // use the result
        long end = System.nanoTime();
        System.out.println("time with multiple threads: " + (end - start) / 1000000); // in ms
    }
}


Javin @ Java Classloder Working said...
@Saral Saxena, great comment. Yes, you are right: in a single-threaded environment, ConcurrentHashMap is a bit slower than Hashtable and HashMap. On the scalability front, it achieves this by using the concurrency level to divide the Map into segments, each with its own lock.
SARAL SAXENA said...
Thanks Javin. As I have shown in the above example, the two methods simply add 1 million entries to the map. The first one runs in the main thread, the second one with an executor service that has 16 threads. To try to help the CHM, I count with a step of 16 (so one thread will put 0, 16, 32 etc., the second thread 1, 17, 33 etc.), which should reduce lock contention and associate each thread with one lock (there are 16 locks, based on the hashcode of the key modulo 16). Note that the effect is more pronounced with more threads (with 100 threads, the synchronized version really suffers).
mattnguyen said...
In your code snippet, doesn't the "synchronized(map)" part actually lock the map? If you want your example to be relevant, shouldn't the if {} else {} be without the synchronized() block?
Javin @ String to int in Java said...
Hi @mattnguyen, if you don't synchronize, that code snippet will create a data race: one thread calling get() gets null just before another thread puts a value, and the first thread then overrides that value. synchronized is required to get the desired functionality, i.e. only insert if not present.
suresh said...
Has anyone really faced questions like how ConcurrentHashMap works, or the internal implementation of ConcurrentHashMap, in a Java interview? Is that a real question or a hypothetical one?
Arsh said...
8. ConcurrentHashMap doesn't allow null as key or value.!!
Then how the putIfAbsent will work, the example you gave..if(map.get(key)==null)
Anonymous said...
@suresh, yes I faced these questions in an interview recently
Anonymous said...
Does ConcurrentHashMap use ReadWriteLock internally to allow multiple threads to read and block only writers? I am not sure, but thought this may be a good question here.
Gauri said...
@Anonymous, ConcurrentHashMap doesn't use ReadWriteLock. Instead, it uses multiple tables (arrays) known as Segments, each with its own lock. This concept is known as lock striping: you don't need to lock the whole map during a write operation, only the relevant segment gets locked. This design is optimized for more reading than updating activity. Even a call to size() may perform poorly because, in the worst case, it may lock all segments to count the entries in each segment and then add them up. As I said, ConcurrentHashMap is optimized for multiple readers with few writers.
Suman said...
Just want to highlight one important limitation of ConcurrentHashMap: client-side locking is not possible, as there is no lock that can guard the whole map. The ConcurrentMap interface somewhat addresses this limitation by providing atomic methods for compound operations, e.g. putIfAbsent(), remove(key, value) or replace(key, oldValue, newValue).
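As a sketch of what such atomic compound operations allow without client-side locking, here is a hypothetical counter increment built only from putIfAbsent() and replace(key, oldValue, newValue). It retries whenever another thread changes the entry between the read and the update. On Java 8 and later, `counts.merge(key, 1, Integer::sum)` does the same in one call. The class and method names are invented for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicCompoundOps {
    // Atomically increments the count for key using ConcurrentMap's compound
    // operations, retrying if another thread updated the entry in between.
    static int increment(ConcurrentMap<String, Integer> counts, String key) {
        while (true) {
            Integer current = counts.get(key);
            if (current == null) {
                if (counts.putIfAbsent(key, 1) == null) {
                    return 1; // this thread inserted the first value
                }
            } else if (counts.replace(key, current, current + 1)) {
                return current + 1; // value was still 'current', so the update succeeded
            }
            // Another thread changed the entry between get() and the update: retry.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    increment(counts, "hits");
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counts.get("hits")); // 40000: no increment is lost
    }
}
```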
Anonymous said...
Don't use ConcurrentHashMap, instead create your own concurrent Map by using ReentrantReadWriteLock and high performance Map based on your need e.g. EnumMap, primitive maps from trove library.
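The commenter's suggestion could look roughly like the minimal sketch below: any Map is wrapped with a ReentrantReadWriteLock, so many readers can proceed together while a writer gets exclusive access. ReadWriteLockedMap is a hypothetical name. Unlike ConcurrentHashMap, a single write here blocks all readers of the whole map, so whether this performs better depends on the workload and the wrapped Map (for example an EnumMap).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: guards any Map with one read-write lock.
final class ReadWriteLockedMap<K, V> {
    private final Map<K, V> delegate;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    ReadWriteLockedMap(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    V get(Object key) {
        lock.readLock().lock(); // many readers may hold the read lock at the same time
        try {
            return delegate.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    V put(K key, V value) {
        lock.writeLock().lock(); // a writer excludes all readers and other writers
        try {
            return delegate.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteLockedMap<String, Integer> map = new ReadWriteLockedMap<>(new HashMap<String, Integer>());
        map.put("answer", 42);
        System.out.println(map.get("answer")); // 42
    }
}
```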
Anonymous said...
"concurrent retrieval may not reflect most recent change on Map"

That's imprecise and misleading.

From the Javadoc:

"Retrievals reflect the results of the most recently completed update operations" (for put() and remove()).

"For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries"
rahil khan said...
on one hand, you have said that CHM has concurrent reads and THREAD SAFE writes, on the other hand you said "Since update operations like put(), remove(), putAll() or clear() is not synchronized".
if updates are not synchronized then how come it has thread safe writes?


Read more: http://javarevisited.blogspot.com/2013/02/concurrenthashmap-in-java-example-tutorial-working.html
