Java Resources

The Latest Java Topics

Why 12 Factor Application Patterns, Microservices and CloudFoundry Matter (Part 2)

Learn why 12 Factor Application Patterns, Microservices and CloudFoundry matter when trying to change the way your product is produced.

June 12, 2015

by Tim Spann

CORE

· 15,650 Views · 4 Likes

Spring Integration Tests with MongoDB Rulez

Spring integration tests allow you to test functionality against a running application. This article shows proper database set- and clean-up with MongoDB.

June 10, 2015

by Ralf Stuckert

· 21,464 Views · 2 Likes

Purpose of ThreadLocal in Java and When to Use ThreadLocal

ThreadLocal is a simple way to have per-thread data that cannot be accessed concurrently by other threads, without requiring great effort or design compromises.

June 7, 2015

by Santosh Singh

· 21,467 Views · 3 Likes

Top 80 Thread- Java Interview Questions and Answers (Part 2)

PART 1 > THREADS - Top 80 interview questions and answers (detailed explanation with programs) Question 61. class MyRunnable implements Runnable{ public void run(){ for(int i=0;i<3;i++){ System.out.println("i="+i+" ,ThreadName="+Thread.currentThread().getName()); } } } public class MyClass { public static void main(String...args){ MyRunnable runnable=new MyRunnable(); System.out.println("start main() method"); Thread thread1=new Thread(runnable); Thread thread2=new Thread(runnable); thread1.start(); thread2.start(); System.out.println("end main() method"); } } Answer. Thread behaviour is unpredictable because execution of Threads depends on Thread scheduler, start main() method will be the printed first, but after that we cannot guarantee the order of thread1, thread2 and main thread they might run simultaneously or sequentially, so order of end main() method will not be guaranteed. /*OUTPUT start main() method end main() method i=0 ,ThreadName=Thread-0 i=0 ,ThreadName=Thread-1 i=1 ,ThreadName=Thread-0 i=2 ,ThreadName=Thread-0 i=1 ,ThreadName=Thread-1 i=2 ,ThreadName=Thread-1 */ Question 62. class MyRunnable implements Runnable{ public void run(){ for(int i=0;i<3;i++){ System.out.println("i="+i+" ,ThreadName="+Thread.currentThread().getName()); } } } public class MyClass { public static void main(String...args) throws InterruptedException{ System.out.println("In main() method"); MyRunnable runnable=new MyRunnable(); Thread thread1=new Thread(runnable); Thread thread2=new Thread(runnable); thread1.start(); thread1.join(); thread2.start(); thread2.join(); System.out.println("end main() method"); } } Answer. We use join() methodto ensure all threads that started from main must end in order in which they started and also main should end in last. In other words join() method waited for this thread to die. /*OUTPUT In main() method i=0 ,ThreadName=Thread-0 i=1 ,ThreadName=Thread-0 i=2 ,ThreadName=Thread-0 i=0 ,ThreadName=Thread-1 i=1 ,ThreadName=Thread-1 i=2 ,ThreadName=Thread-1 end main() method */ Question 63. class MyRunnable implements Runnable { public void run() { try { while (!Thread.currentThread().isInterrupted()) { Thread.sleep(1000); System.out.println("x"); } } catch (InterruptedException e) { System.out.println(Thread.currentThread().getName() + " ENDED"); } } } public class MyClass { public static void main(String args[]) throws Exception { MyRunnable obj = new MyRunnable(); Thread t = new Thread(obj, "Thread-1"); t.start(); System.out.println("press enter"); System.in.read(); t.interrupt(); } } Answer. "press enter" will be printed first then thread1 will keep on printing x until enter is pressed, once enter is pressed "Thread-1 ENDED" will be printed. System.in.read() causes main thread to go from running to waiting state (thread waits for user input) /* OUTPUT press enter x x x x Thread-1 ENDED */ Question 64. class MyRunnable implements Runnable{ public void run(){ synchronized (this) { System.out.println("1 "); try { this.wait(); System.out.println("2 "); } catch (InterruptedException e) { e.printStackTrace(); } } } } public class MyClass { public static void main(String[] args) { MyRunnable myRunnable=new MyRunnable(); Thread thread1=new Thread(myRunnable,"Thread-1"); thread1.start(); } } Answer. Thread acquires lock on myRunnable object so 1 was printed but notify wasn't called so 2 will never be printed, this is called frozen process. Deadlock is formed, These type of deadlocksare called Frozen processes. /*OUTPUT 1 */ Question 65. import java.util.ArrayList; /* Producer is producing, Producer will allow consumer to * consume only when 10 products have been produced (i.e. when production is over). */ class Producer implements Runnable{ ArrayList sharedQueue; Producer(){ sharedQueue=new ArrayList(); } @Override public void run(){ synchronized (this) { for(int i=1;i<=3;i++){ //Producer will produce 10 products sharedQueue.add(i); System.out.println("Producer is still Producing, Produced : "+i); try{ Thread.sleep(1000); }catch(InterruptedException e){e.printStackTrace();} } System.out.println("Production is over, consumer can consume."); this.notify(); } } } class Consumer extends Thread{ Producer prod; Consumer(Producer obj){ prod=obj; } public void run(){ synchronized (this.prod) { System.out.println("Consumer waiting for production to get over."); try{ this.prod.wait(); }catch(InterruptedException e){e.printStackTrace();} } int productSize=this.prod.sharedQueue.size(); for(int i=0;i Q61- Q80

June 6, 2015

by Ankit Mittal

· 13,697 Views · 3 Likes

If You Do It Do It Right

This is a philosophical or ethical command. Very general. It is something like “fail fast”. The reason it came up to my mind is that I wanted to compile and release License3j using Java 8 and JavaDoc refused to compile during release build. This package is a simple license manager, which has some established user base who require that I keep up with the new versions of BouncyCastle. It itself being a cryptography package should not be outdated and programs are encouraged to use the latest version to avoid security issues. When I executed mvn release:prepare I got many errors: [ERROR] * [ERROR] ^ [ERROR] /Users/verhasp/github/License3j/src/main/java/License3j.java:132: error: unexpected end tag: [ERROR] * [ERROR] ^ [ERROR] /Users/verhasp/github/License3j/src/main/java/License3j.java:134: warning: no @param for args [ERROR] public static void main(String[] args) throws Exception { [ERROR] ^ [ERROR] /Users/verhasp/github/License3j/src/main/java/License3j.java:134: warning: no @throws for java.lang.Exception [ERROR] public static void main(String[] args) throws Exception { [ERROR] ^ [ERROR] /Users/verhasp/github/License3j/src/main/java/com/verhas/licensor/ExtendedLicense.java:73: warning: no @param for expiryDate [ERROR] public void setExpiry(final Date expiryDate) { [ERROR] ^ [ERROR] /Users/verhasp/github/License3j/src/main/java/com/verhas/licensor/License.java:196: warning: no description for @throws [ERROR] * @throws IOException [ERROR] ^ [ERROR] /Users/verhasp/github/License3j/src/main/java/com/verhas/licensor/License.java:246: warning: no description for @throws New JavaDoc Wants You DIR The errors are there because the java doc of License3j is a bit sloppy. Sorry guys, I created the code many years ago and honestly it is not only the java doc that could be improved. As a matter of fact one of the unit tests rely on network and the reachability of GitHub. (Not anymore though, I fixed that.) The new Java version 8 is very strict regarding to JavaDoc. As you can see on the “Enhancements in Javadoc, Java SE 8” page of ORACLE: The javadoc tool now has support for checking the content of javadoc comments for issues that could lead to various problems, such as invalid HTML or accessibility issues, in the files that are generated by javadoc. The feature is enabled by default, and can also be controlled by the new -Xdoclint option. For more details, see the output from running “javadoc -X”. This feature is also available in javac, although it is not enabled by default there. To get the release working I had the choice to fix the JavaDoc or to use the configuration org.apache.maven.plugins maven-javadoc-plugin 2.9 attach-javadocs jar -Xdoclint:none in pom.xml. (Source is stackoverflow.) But You Just Won’t DIR You can easily imagine that you will opt for the second option when you are under time pressure. You fix the issue modifying your pom.xml or other build configuration and forget about it. But you keep on thinking about why it is the way like that? Why is the new tool strict by default? Is it a good choice? Will it drive people to create better JavaDoc? (Just for now I assume that the aim of the new behavior was to drive programmers to create better JavaDoc documentation and not simply to annoy us.) I am a bit suspicious that this alone will be sufficient to improve documentation. Programmers will: Switch off the lint option. Delete JavaDoc from the source. Write some description that Java 8 will accept but is generally meaningless. or some of them will just write correct java doc. Some of them who were writing it well anyway and will be helped by the new strictness. How many of us? 1% or 2%? The others will just see it as a whip and try to avoid. We would need carrot instead. Hey, bunnies! Where is the carrot?

June 1, 2015

by Peter Verhas

CORE

· 5,886 Views

Top 80 Thread- Java Interview Questions and Answers (Part 1)

Question 1. What is Thread in java? Answer. Threads consumes CPU in best possible manner, hence enables multi processing. Multi threading reduces idle time of CPU which improves performance of application. Thread are light weight process. A thread class belongs to java.lang package. We can create multiple threads in java, even if we don’t create any Thread, one Thread at least do exist i.e. main thread. Multiple threads run parallely in java. Threads have their own stack. Advantage of Thread : Suppose one thread needs 10 minutes to get certain task, 10 threads used at a time could complete that task in 1 minute, because threads can run parallely. Question 2. What is difference between Process and Thread in java? Answer. One process can have multiple Threads, Thread are subdivision of Process. One or more Threads runs in the context of process. Threads can execute any part of process. And same part of process can be executed by multiple Threads. Processes have their own copy of the data segment of the parent process while Threads have direct access to the data segment of its process. Processes have their own address while Threads share the address space of the process that created it. Process creation needs whole lot of stuff to be done, we might need to copy whole parent process, but Thread can be easily created. Processes can easily communicate with child processes but interprocess communication is difficult. While, Threads can easily communicate with other threads of the same process using wait() and notify() methods. In process all threads share system resource like heap Memory etc. while Thread has its own stack. Any change made to process does not affect child processes, but any change made to thread can affect the behavior of the other threads of the process. Example to see where threads on are created on different processes and same process. Question 3. How to implement Threads in java? Answer. This is very basic threading question. Threads can be created in two ways i.e. by implementing java.lang.Runnable interface or extending java.lang.Thread class and then extending run method. Thread has its own variables and methods, it lives and dies on the heap. But a thread of execution is an individual process that has its own call stack. Thread are lightweight process in java. Thread creation by implementingjava.lang.Runnableinterface. We will create object of class which implements Runnable interface : MyRunnable runnable=new MyRunnable(); Thread thread=new Thread(runnable); 2) And then create Thread object by calling constructor and passing reference of Runnable interface i.e. runnable object : Thread thread=new Thread(runnable); Question 4 . Does Thread implements their own Stack, if yes how? (Important) Answer. Yes, Threads have their own stack. This is very interesting question, where interviewer tends to check your basic knowledge about how threads internally maintains their own stacks. I’ll be explaining you the concept by diagram. Question 5. We should implement Runnable interface or extend Thread class. What are differences between implementing Runnable and extending Thread? Answer. Well the answer is you must extend Thread only when you are looking to modify run() and other methods as well. If you are simply looking to modify only the run() method implementing Runnable is the best option (Runnable interface has only one abstract method i.e. run() ). Differences between implementing Runnable interface and extending Thread class - Multiple inheritance in not allowed in java : When we implement Runnable interface we can extend another class as well, but if we extend Thread class we cannot extend any other class because java does not allow multiple inheritance. So, same work is done by implementing Runnable and extending Thread but in case of implementing Runnable we are still left with option of extending some other class. So, it’s better to implement Runnable. Thread safety : When we implement Runnable interface, same object is shared amongst multiple threads, but when we extend Thread class each and every thread gets associated with new object. Inheritance (Implementing Runnable is lightweight operation) : When we extend Thread unnecessary all Thread class features are inherited, but when we implement Runnable interface no extra feature are inherited, as Runnable only consists only of one abstract method i.e. run() method. So, implementing Runnable is lightweight operation. Coding to interface : Even java recommends coding to interface. So, we must implement Runnable rather than extending thread. Also, Thread class implements Runnable interface. Don’t extend unless you wanna modify fundamental behaviour of class, Runnable interface has only one abstract method i.e. run() : We must extend Thread only when you are looking to modify run() and other methods as well. If you are simply looking to modify only the run() method implementing Runnable is the best option (Runnable interface has only one abstract method i.e. run() ). We must not extend Thread class unless we're looking to modify fundamental behaviour of Thread class. Flexibility in code when we implement Runnable : When we extend Thread first a fall all thread features are inherited and our class becomes direct subclass of Thread , so whatever action we are doing is in Thread class. But, when we implement Runnable we create a new thread and pass runnable object as parameter,we could pass runnable object to executorService & much more. So, we have more options when we implement Runnable and our code becomes more flexible. ExecutorService : If we implement Runnable, we can start multiple thread created on runnable object with ExecutorService (because we can start Runnable object with new threads), but not in the case when we extend Thread (because thread can be started only once). Question 6. How can you say Thread behaviour is unpredictable? (Important) Answer. The solution to question is quite simple, Thread behaviour is unpredictable because execution of Threads depends on Thread scheduler, thread scheduler may have different implementation on different platforms like windows, unix etc. Same threading program may produce different output in subsequent executions even on same platform. To achieve we are going to create 2 threads on same Runnable Object, create for loop in run() method and start both threads. There is no surety that which threads will complete first, both threads will enter anonymously in for loop. Question 7 . When threads are not lightweight process in java? Answer. Threads are lightweight process only if threads of same process are executing concurrently. But if threads of different processes are executing concurrently then threads are heavy weight process. Question 8. How can you ensure all threads that started from main must end in order in which they started and also main should end in last? (Important) Answer. Interviewers tend to know interviewees knowledge about Thread methods. So this is time to prove your point by answering correctly. We can use join() methodto ensure all threads that started from main must end in order in which they started and also main should end in last.In other words waits for this thread to die. Calling join() method internally calls join(0); DETAILED DESCRIPTION : Join() method - ensure all threads that started from main must end in order in which they started and also main should end in last. Types of join() method with programs- 10 salient features of join. Question 9.What is difference between starting thread with run() and start() method? (Important) Answer. This is quite interesting question, it might confuse you a bit and at time may make you think is there really any difference between starting thread with run() and start() method. When you call start() method, main thread internally calls run() method to start newly created Thread, so run() method is ultimately called by newly created thread. When you call run() method main thread rather than starting run() method with newly thread it start run() method by itself. Question 10. What is significance of using Volatile keyword? (Important) Answer. Java allows threads to access shared variables. As a rule, to ensure that shared variables are consistently updated, a thread should ensure that it has exclusive use of such variables by obtaining a lock that enforces mutual exclusion for those shared variables. If a field is declared volatile, in that case the Java memory model ensures that all threads see a consistent value for the variable. Few small questions> Q. Can we have volatile methods in java? No, volatile is only a keyword, can be used only with variables. Q. Can we have synchronized variable in java? No, synchronized can be used only with methods, i.e. in method declaration. Question 11. Differences between synchronized and volatile keyword in Java? (Important) Answer.Its very important question from interview perspective. Volatilecan be used as a keyword against the variable, we cannot use volatile against method declaration. volatile void method1(){} //it’s illegal, compilation error. While synchronization can be used in method declaration or we can create synchronization blocks (In both cases thread acquires lock on object’s monitor). Variables cannot be synchronized. Synchronized method: synchronized void method2(){} //legal Synchronized block: void method2(){ synchronized (this) { //code inside synchronized block. } } Synchronized variable (illegal): synchronized int i;//it’s illegal, compilatiomn error. Volatile does not acquire any lock on variable or object, but Synchronization acquires lock on method or block in which it is used. Volatile variables are not cached, but variables used inside synchronized method or block are cached. When volatile is used will never create deadlock in program, as volatile never obtains any kind of lock . But in case if synchronization is not done properly, we might end up creating dedlock in program. Synchronization may cost us performance issues, as one thread might be waiting for another thread to release lock on object. But volatile is never expensive in terms of performance. DETAILED DESCRIPTION : Differences between synchronized and volatile keyword in detail with programs. Question 12. Can you again start Thread? Answer.No, we cannot start Thread again, doing so will throw runtimeException java.lang.IllegalThreadStateException. The reason is once run() method is executed by Thread, it goes into dead state. Let’s take an example- Thinking of starting thread again and calling start() method on it (which internally is going to call run() method) for us is some what like asking dead man to wake up and run. As, after completing his life person goes to dead state. Question 13. What is race condition in multithreading and how can we solve it? (Important) Answer. This is very important question, this forms the core of multi threading, you should be able to explain about race condition in detail. When more than one thread try to access same resource without synchronization causes race condition. So we can solve race condition by using either synchronized block or synchronized method. When no two threads can access same resource at a time phenomenon is also called as mutual exclusion. Few sub questions> What if two threads try to read same resource without synchronization? When two threads try to read on same resource without synchronization, it’s never going to create any problem. What if two threads try to write to same resource without synchronization? When two threads try to write to same resource without synchronization, it’s going to create synchronization problems. Question 14. How threads communicate between each other? Answer. This is very must know question for all the interviewees, you will most probably face this question in almost every time you go for interview. Threads can communicate with each other by using wait(), notify() and notifyAll() methods. Question 15. Why wait(), notify() and notifyAll() are in Object class and not in Thread class? (Important) Answer. Every Object has a monitor, acquiring that monitors allow thread to hold lock on object. But Thread class does not have any monitors. wait(), notify() and notifyAll()are called on objects only >When wait() method is called on object by thread it waits for another thread on that object to release object monitor by calling notify() or notifyAll() method on that object. When notify() method is called on object by thread it notifies all the threads which are waiting for that object monitor that object monitor is available now. So, this shows that wait(), notify() and notifyAll() are called on objects only. Now, Straight forward question that comes to mind is how thread acquires object lock by acquiring object monitor? Let’s try to understand this basic concept in detail? Wait(), notify() and notifyAll() method being in Object class allows all the threads created on that object to communicate with other. . As multiple threads exists on same object. Only one thread can hold object monitor at a time. As a result thread can notify other threads of same object that lock is available now. But, thread having these methods does not make any sense because multiple threads exists on object its not other way around (i.e. multiple objects exists on thread). Now let’s discuss one hypothetical scenario, what will happen if Thread class contains wait(), notify() and notifyAll() methods? Having wait(), notify() and notifyAll() methods means Thread class also must have their monitor. Every thread having their monitor will create few problems - >Thread communication problem. >Synchronization on object won’t be possible- Because object has monitor, one object can have multiple threads and thread hold lock on object by holding object monitor. But if each thread will have monitor, we won’t have any way of achieving synchronization. >Inconsistency in state of object (because synchronization won't be possible). Question 16. Is it important to acquire object lock before calling wait(), notify() and notifyAll()? Answer.Yes, it’s mandatory to acquire object lock before calling these methods on object. As discussed above wait(), notify() and notifyAll() methods are always called from Synchronized block only, and as soon as thread enters synchronized block it acquires object lock (by holding object monitor). If we call these methods without acquiring object lock i.e. from outside synchronize block then java.lang. IllegalMonitorStateException is thrown at runtime. Wait() method needs to enclosed in try-catch block, because it throws compile time exception i.e. InterruptedException. Question 17. How can you solve consumer producer problem by using wait() and notify() method? (Important) Answer. Here come the time to answer very very important question from interview perspective. Interviewers tends to check how sound you are in threads inter communication. Because for solving this problem we got to use synchronization blocks, wait() and notify() method very cautiously. If you misplace synchronization block or any of the method, that may cause your program to go horribly wrong. So, before going into this question first i’ll recommend you to understand how to use synchronized blocks, wait() and notify() methods. Key points we need to ensure before programming : >Producer will produce total of 10 products and cannot produce more than 2 products at a time until products are being consumed by consumer. Example> when sharedQueue’s size is 2, wait for consumer to consume (consumer will consume by calling remove(0) method on sharedQueue and reduce sharedQueue’s size). As soon as size is less than 2, producer will start producing. >Consumer can consume only when there are some products to consume. Example> when sharedQueue’s size is 0, wait for producer to produce (producer will produce by calling add() method on sharedQueue and increase sharedQueue’s size). As soon as size is greater than 0, consumer will start consuming. Explanation of Logic > We will create sharedQueue that will be shared amongst Producer and Consumer. We will now start consumer and producer thread. Note: it does not matter order in which threads are started (because rest of code has taken care of synchronization and key points mentioned above) First we will start consumerThread > consumerThread.start(); consumerThread will enter run method and call consume() method. There it will check for sharedQueue’s size. -if size is equal to 0 that means producer hasn’t produced any product, wait for producer to produce by using below piece of code- synchronized (sharedQueue) { while (sharedQueue.size() == 0) { sharedQueue.wait(); } } -if size is greater than 0, consumer will start consuming by using below piece of code. synchronized (sharedQueue) { Thread.sleep((long)(Math.random() * 2000)); System.out.println("consumed : "+ sharedQueue.remove(0)); sharedQueue.notify(); } Than we will start producerThread > producerThread.start(); producerThread will enter run method and call produce() method. There it will check for sharedQueue’s size. -if size is equal to 2 (i.e. maximum number of products which sharedQueue can hold at a time), wait for consumer to consume by using below piece of code- synchronized (sharedQueue) { while (sharedQueue.size() == maxSize) { //maxsize is 2 sharedQueue.wait(); } } -if size is less than 2, producer will start producing by using below piece of code. synchronized (sharedQueue) { System.out.println("Produced : " + i); sharedQueue.add(i); Thread.sleep((long)(Math.random() * 1000)); sharedQueue.notify(); } DETAILED DESCRIPTION with program : Solve Consumer Producer problem by using wait() and notify() methods in multithreading. Question 18. How to solve Consumer Producer problem without using wait() and notify() methods, where consumer can consume only when production is over.? Answer. In this problem, producer will allow consumer to consume only when 10 products have been produced (i.e. when production is over). We will approach by keeping one boolean variable productionInProcess and initially setting it to true, and later when production will be over we will set it to false. Question 19. How can you solve consumer producer pattern by using BlockingQueue? (Important) Answer. Now it’s time to gear up to face question which is most probably going to be followed up by previous question i.e. after how to solve consumer producer problem using wait() and notify() method. Generally you might wonder why interviewer's are so much interested in asking about solving consumer producer problem using BlockingQueue, answer is they want to know how strong knowledge you have about java concurrent Api’s, this Api use consumer producer pattern in very optimized manner, BlockingQueue is designed is such a manner that it offer us the best performance. BlockingQueue is a interface and we will use its implementation class LinkedBlockingQueue. Key methods for solving consumer producer pattern are > put(i); //used by producer to put/produce in sharedQueue. take();//used by consumer to take/consume from sharedQueue. Question 20. What is deadlock in multithreading? Write a program to form DeadLock in multi threading and also how to solve DeadLock situation. What measures you should take to avoid deadlock? (Important) Answer. This is very important question from interview perspective. But, what makes this question important is it checks interviewees capability of creating and detecting deadlock. If you can write a code to form deadlock, than I am sure you must be well capable in solving that deadlock as well. If not, later on this post we will learn how to solve deadlock as well. First question comes to mind is, what is deadlock in multi threading program? Deadlock is a situation where two threads are waiting for each other to release lock holded by them on resources. But how deadlock could be formed : Thread-1 acquires lock on String.class and then calls sleep() method which gives Thread-2 the chance to execute immediately after Thread-1 has acquired lock on String.class and Thread-2 acquires lock on Object.class then calls sleep() method and now it waits for Thread-1 to release lock on String.class. Conclusion: Now, Thread-1 is waiting for Thread-2 to release lock on Object.class and Thread-2 is waiting for Thread-1 to release lock on String.class and deadlock is formed. //Code called by Thread-1 public void run() { synchronized (String.class) { Thread.sleep(100); synchronized (Object.class) { } } } //Code called by Thread-2 publicvoid run() { synchronized (Object.class) { Thread.sleep(100); synchronized (String.class) { } } } Here comes the important part, how above formed deadlock could be solved : Thread-1 acquires lock on String.class and then calls sleep() method which gives Thread-2 the chance to execute immediately after Thread-1 has acquired lock on String.class and Thread-2 tries to acquire lock on String.class but lock is holded by Thread-1. Meanwhile, Thread-1 completes successfully. As Thread-1 has completed successfully it releases lock on String.class, Thread-2 can now acquire lock on String.class and complete successfully without any deadlock formation. Conclusion: No deadlock is formed. //Code called by Thread-1 publicvoid run() { synchronized (String.class) { Thread.sleep(100); synchronized (Object.class) { } } } //Code called by Thread-2 publicvoid run() { synchronized (String.class) { Thread.sleep(100); synchronized (Object.class) { } } } Few important measures to avoid Deadlock > Lock specific member variables of class rather than locking whole class: We must try to lock specific member variables of class rather than locking whole class. Use join() method: If possible try touse join() method, although it may refrain us from taking full advantage of multithreading environment because threads will start and end sequentially, but it can be handy in avoiding deadlocks. If possible try avoid using nested synchronization blocks. Question 21. Have you ever generated thread dumps or analyzed Thread Dumps? (Important) Answer. Answering this questions will show your in depth knowledge of Threads. Every experienced must know how to generate Thread Dumps. VisualVM is most popular way to generate Thread Dump and is most widely used by developers. It’s important to understand usage of VisualVM for in depth knowledge of VisualVM. I’ll recommend every developer must understand this topic to become master in multi threading. It helps us in analyzing threads performance, thread states, CPU consumed by threads, garbage collection and much more. For detailed information see Generating and analyzing Thread Dumps using VisualVM - step by step detail to setup VisualVM with screenshots jstack is very easy way to generate Thread dump and is widely used by developers. I’ll recommend every developer must understand this topic to become master in multi threading. For creating Thread dumps we need not to download any jar or any extra software. For detailed information see Generating and analyzing Thread Dumps using JSATCK - step by step detail to setup JSTACK with screenshots. Question 22. What is life cycle of Thread, explain thread states? (Important) Answer. Thread states/ Thread life cycle is very basic question, before going deep into concepts we must understand Thread life cycle. Thread have following states > New Runnable Running Waiting/blocked/sleeping Terminated (Dead) Thread states/ Thread life cycle in diagram > Thread states in detail > New : When instance of thread is created using new operator it is in new state, but the start() method has not been invoked on the thread yet, thread is not eligible to run yet. Runnable : When start() method is called on thread it enters runnable state. Running : Thread scheduler selects thread to go fromrunnable to running state. In running state Thread starts executing by entering run() method. Waiting/blocked/sleeping : In this state a thread is not eligible to run. >Thread is still alive, but currently it’s not eligible to run. In other words. > How can Thread go from running to waiting state? By calling wait()method thread go from running to waiting state. In waiting state it will wait for other threads to release object monitor/lock. > How can Thread go from running to sleeping state? By calling sleep() methodthread go from running to sleeping state. In sleeping state it will wait for sleep time to get over. Terminated (Dead) : A thread is considered dead when its run() method completes. Question 23. Are you aware of preemptive scheduling and time slicing? Answer. In preemptive scheduling, the highest priority thread executes until it enters into the waiting or dead state. In time slicing, a thread executes for a certain predefined time and then enters runnable pool. Than thread can enter running state when selected by thread scheduler. Question 24. What are daemon threads? Answer.Daemon threads are low priority threads which runs intermittently in background for doing garbage collection. 12 Few salient features of daemon() threads> Thread scheduler schedules these threads only when CPU is idle. Daemon threads are service oriented threads, they serves all other threads. These threads are created before user threads are created and die after all other user threads dies. Priority of daemon threads is always 1 (i.e. MIN_PRIORITY). User created threads are non daemon threads. JVM can exit when only daemon threads exist in system. we can use isDaemon() method to check whether thread is daemon thread or not. we can use setDaemon(boolean on) method to make any user method a daemon thread. If setDaemon(boolean on) is called on thread after calling start() method than IllegalThreadStateException is thrown. You may like to see how daemon threads work, for that you can use VisualVM or jStack. I have provided Thread dumps over there which shows daemon threads which were intermittently running in background. Some of the daemon threads which intermittently run in background are > "RMI TCP Connection(3)-10.175.2.71" daemon"RMI TCP Connection(idle)" daemon"RMI Scheduler(0)" daemon"C2 CompilerThread1" daemon "GC task thread#0 (ParallelGC)" Question 25. Why suspend() and resume() methods are deprecated? Answer.Suspend() method is deadlock prone. If the target thread holds a lock on object when it is suspended, no thread can lock this object until the target thread is resumed. If the thread that would resume the target thread attempts to lock this monitor prior to calling resume, it results in deadlock formation. These deadlocksare generally called Frozen processes. Suspend() method puts thread from running to waiting state. And thread can go from waiting to runnable state only when resume() method is called on thread. It is deprecated method. Resume() method is only used with suspend() method that’s why it’s also deprecated method. Question 26. Why destroy() methods is deprecated? Answer. This question is again going to check your in depth knowledge of thread methods i.e. destroy() method is deadlock prone. If the target thread holds a lock on object when it is destroyed, no thread can lock this object (Deadlock formed are similar to deadlock formed when suspend() and resume() methods are used improperly). It results in deadlock formation. These deadlocksare generally called Frozen processes. Additionally you must know calling destroy() method on Threads throw runtimeException i.e. NoSuchMethodError. Destroy() method puts thread from running to dead state. Question 27. As stop() method is deprecated, How can we terminate or stop infinitely running thread in java? (Important) Answer. This is very interesting question where interviewees thread basics basic will be tested. Interviewers tend to know user’s knowledge about main thread’s and thread invoked by main thread. We will try to address the problem by creating new thread which will run infinitely until certain condition is satisfied and will be called by main Thread. Infinitely running thread can be stopped using boolean variable. Infinitely running thread can be stopped using interrupt() method. Let’s understand Why stop() method is deprecated : Stopping a thread with Thread.stop() causes it to release all of the monitors that it has locked. If any of the objects previously protected by these monitors were in an inconsistent state, the damaged objects become visible to other threads, which might lead to unpredictable behavior. Question 28. what is significance of yield() method, what state does it put thread in? yield() is a native method it’s implementation in java 6 has been changed as compared to its implementation java 5. As method is native it’s implementation is provided by JVM. In java 5, yield() method internally used to call sleep() method giving all the other threads of same or higher priority to execute before yielded thread by leaving allocated CPU for time gap of 15 millisec. But java 6, calling yield() method gives a hint to the thread scheduler that the current thread is willing to yield its current use of a processor. The thread scheduler is free to ignore this hint. So, sometimes even after using yield() method, you may not notice any difference in output. salient features of yield() method > Definition : yield() method when called on thread gives a hint to the thread scheduler that the current thread is willing to yield its current use of a processor.The thread scheduler is free to ignore this hint. Thread state : when yield() method is called on thread it goes from running to runnable state, not in waiting state. Thread is eligible to run but not running and could be picked by scheduler at anytime. Waiting time : yield() method stops thread for unpredictable time. Static method : yield()is a static method, hence calling Thread.yield() causes currently executing thread to yield. Native method : implementation of yield() method is provided by JVM. Let’s see definition of yield() method as given in java.lang.Thread - public static native void yield(); synchronized block : thread need not to to acquire object lock before calling yield()method i.e. yield() method can be called from outside synchronized block. Question 29.What is significance of sleep() method in detail, what statedoes it put thread in ? sleep() is a native method, it’s implementation is provided by JVM. 10 salient features of sleep() method > Definition : sleep() methods causes current thread to sleep for specified number of milliseconds (i.e. time passed in sleep method as parameter). Ex- Thread.sleep(10) causes currently executing thread to sleep for 10 millisec. Thread state : when sleep() is called on thread it goes from running to waiting state and can return to runnable state when sleep time is up. Exception : sleep() method must catch or throw compile time exception i.e. InterruptedException. Waiting time : sleep() method have got few options. sleep(long millis) - Causes the currently executing thread to sleep for the specified number of milliseconds public static native void sleep(long millis) throws InterruptedException; sleep(long millis, int nanos) - Causes the currently executing thread to sleep for the specified number of milliseconds plus the specified number of nanoseconds. public static native void sleep(long millis,int nanos) throws InterruptedException; static method : sleep()is a static method, causes the currently executing thread to sleep for the specified number of milliseconds. Belongs to which class :sleep() method belongs to java.lang.Thread class. synchronized block : thread need not to to acquire object lock before calling sleep()method i.e. sleep() method can be called from outside synchronized block. Question 30. Difference between wait() and sleep() ? (Important) Answer. Should be called from synchronized block :wait() method is always called from synchronized block i.e. wait() method needs to lock object monitor before object on which it is called. But sleep() method can be called from outside synchronized block i.e. sleep() method doesn’t need any object monitor. IllegalMonitorStateException : if wait() method is called without acquiring object lock than IllegalMonitorStateException is thrown at runtime, but sleep() methodnever throws such exception. Belongs to which class : wait() method belongs to java.lang.Object class but sleep() method belongs to java.lang.Thread class. Called on object or thread : wait() method is called on objects but sleep() method is called on Threads not objects. Thread state : when wait() method is called on object, thread that holded object’s monitor goes from running to waiting state and can return to runnable state only when notify() or notifyAll()method is called on that object. And later thread scheduler schedules that thread to go from from runnable to running state. when sleep() is called on thread it goes from running to waiting state and can return to runnable state when sleep time is up. When called from synchronized block :when wait() method is called thread leaves the object lock. But sleep()method when called from synchronized block or method thread doesn’t leaves object lock. Question 31. Differences and similarities between yield() and sleep()? Answer. Differences yield() and sleep() : Definition : yield() method when called on thread gives a hint to the thread scheduler that the current thread is willing to yield its current use of a processor.The thread scheduler is free to ignore this hint. sleep() methods causes current thread to sleep for specified number of milliseconds (i.e. time passed in sleep method as parameter). Ex- Thread.sleep(10) causes currently executing thread to sleep for 10 millisec. Thread state : when sleep() is called on thread it goes from running to waiting state and can return to runnable state when sleep time is up. when yield() method is called on thread it goes from running to runnable state, not in waiting state. Thread is eligible to run but not running and could be picked by scheduler at anytime. Exception : yield() method need not to catch or throw any exception. But sleep() method must catch or throw compile time exception i.e. InterruptedException. Waiting time : yield() method stops thread for unpredictable time, that depends on thread scheduler. But sleep() method have got few options. sleep(long millis) - Causes the currently executing thread to sleep for the specified number of milliseconds sleep(long millis, int nanos) - Causes the currently executing thread to sleep for the specified number of milliseconds plus the specified number of nanoseconds. similarity between yield() and sleep(): > yield() and sleep() method belongs to java.lang.Thread class. > yield() and sleep() method can be called from outside synchronized block. > yield() and sleep() method are called on Threads not objects. Question 32. Mention some guidelines to write thread safe code, most important point we must take care of in multithreading programs? Answer. In multithreading environment it’s important very important to write thread safe code, thread unsafe code can cause a major threat to your application. I have posted many articles regarding thread safety. So overall this will be revision of what we have learned so far i.e. writing thread safe healthy code and avoiding any kind of deadlocks. If method is exposed in multithreading environment and it’s not synchronized (thread unsafe) than it might lead us to race condition, we must try to use synchronized block and synchronized methods. Multiple threads may exist on same object but only one thread of that object can enter synchronized method at a time, though threads on different object can enter same method at same time. Even static variables are not thread safe, they are used in static methods and if static methods are not synchronized then thread on same or different object can enter method concurrently. Multiple threads may exist on same or different objects of class but only one thread can enter static synchronized method at a time, we must consider making static methods as synchronized. If possible, try to use volatile variables. If a field is declared volatile all threads see a consistent value for the variable. Volatile variables at times can be used as alternate to synchronized methods as well. Final variables are thread safe because once assigned some reference of object they cannot point to reference of other object. s is pointing to String object. public class MyClass { final String s=new String("a"); void method(){ s="b"; //compilation error, s cannot point to new reference. } } If final is holding some primitive value it cannot point to other value. public class MyClass { final inti=0; void method(){ i=0; //compilation error, i cannot point to new value. } } Usage of local variables : If possible try to use local variables, local variables are thread safe, because every thread has its own stack, i.e. every thread has its own local variables and its pushes all the local variables on stack. public class MyClass { void method(){ inti=0; //Local variable, is thread safe. } } Using thread safe collections : Rather than using ArrayList we must Vector and in place of using HashMap we must use ConcurrentHashMap or HashTable. We must use VisualVM or jstack to detect problems such as deadlocks and time taken by threads to complete in multi threading programs. Using ThreadLocal:ThreadLocal is a class which provides thread-local variables. Every thread has its own ThreadLocal value that makes ThreadLocal value threadsafe as well. Rather than StringBuffer try using immutable classes such as String. Any change to String produces new String. Question 33. How thread can enter waiting, sleeping and blocked state and how can they go to runnable state ? Answer. This is very prominently asked question in interview which will test your knowledge about thread states. And it’s very important for developers to have in depth knowledge of this thread state transition. I will try to explain this thread state transition by framing few sub questions. I hope reading sub questions will be quite interesting. > How can Thread go from running to waiting state ? By calling wait()method thread go from running to waiting state. In waiting state it will wait for other threads to release object monitor/lock. > How can Thread return from waiting to runnable state ? Once notify() or notifyAll()method is called object monitor/lock becomes available and thread can again return to runnable state. > How can Thread go from running to sleeping state ? By calling sleep() methodthread go from running to sleeping state. In sleeping state it will wait for sleep time to get over. > How can Thread return from sleeping to runnable state ? Once specified sleep time is up thread can again return to runnable state. Suspend() method can be used to put thread in waiting state and resume() method is the only way which could put thread in runnable state. Thread also may go from running to waiting state if it is waiting for some I/O operation to take place. Once input is available thread may return to running state. >When threads are in running state, yield()method can make thread to go in Runnable state. Question 34. Difference between notify() and notifyAll() methods, can you write a code to prove your point? Answer. Goodness. Theoretically you must have heard or you must be aware of differences between notify() and notifyAll().But have you created program to achieve it? If not let’s do it. First, I will like give you a brief description of what notify() and notifyAll() methods do. notify()- Wakes up a single thread that is waiting on this object's monitor. If any threads are waiting on this object, one of them is chosen to be awakened. The choice is random and occurs at the discretion of the implementation. A thread waits on an object's monitor by calling one of the wait methods. The awakened threads will not be able to proceed until the current thread relinquishes the lock on this object. public final native void notify(); notifyAll()- Wakes up all threads that are waiting on this object's monitor. A thread waits on an object's monitor by calling one of the wait methods. The awakened threads will not be able to proceed until the current thread relinquishes the lock on this object. public final native void notifyAll(); Now it’s time to write down a program to prove the point. Question 35. Does thread leaves object lock when sleep() method is called? Answer. When sleep() method is called Thread does not leaves object lock and goes from running to waiting state. Thread waits for sleep time to over and once sleep time is up it goes from waiting to runnable state. Question 36. Does thread leaves object lock when wait() method is called? Answer. When wait() method is called Thread leaves the object lock and goes from running to waiting state. Thread waits for other threads on same object to call notify() or notifyAll() and once any of notify() or notifyAll() is called it goes from waiting to runnable state and again acquires object lock. Question 37. What will happen if we don’t override run method? Answer. This question will test your basic knowledge how start and run methods work internally in Thread Api. When we call start() method on thread, it internally calls run() method with newly created thread. So, if we don’t override run() method newly created thread won’t be called and nothing will happen. class MyThread extends Thread { //don't override run() method } publicclass DontOverrideRun { publicstaticvoid main(String[] args) { System.out.println("main has started."); MyThread thread1=new MyThread(); thread1.start(); System.out.println("main has ended."); } } /*OUTPUT main has started. main has ended. */ As we saw in output, we didn’t override run() method that’s why on calling start() method nothing happened. Question 38. What will happen if we override start method? Answer. This question will again test your basic core java knowledge how overriding works at runtime, what what will be called at runtime and how start and run methods work internally in Thread Api. When we call start() method on thread, it internally calls run() method with newly created thread. So, if we override start() method, run() method will not be called until we write code for calling run() method. class MyThread extends Thread { @Override publicvoid run() { System.out.println("in run() method"); } @Override publicvoid start(){ System.out.println("In start() method"); } } publicclass OverrideStartMethod { publicstaticvoid main(String[] args) { System.out.println("main has started."); MyThread thread1=new MyThread(); thread1.start(); System.out.println("main has ended."); } } /*OUTPUT main has started. In start() method main has ended. */ If we note output. we have overridden start method and didn’t called run() method from it, so, run() method wasn’t call. Question 39. Can we acquire lock on class? What are ways in which you can acquire lock on class? Answer. Yes, we can acquire lock on class’s class object in 2 ways to acquire lock on class. Thread can acquire lock on class’s class object by- Entering synchronized block or Let’s say there is one class MyClass. Now we can create synchronization block, and parameter passed with synchronization tells which class has to be synchronized. In below code, we have synchronized MyClass synchronized (MyClass.class) { //thread has acquired lock on MyClass’s class object. } by entering static synchronized methods. public staticsynchronizedvoid method1() { //thread has acquired lock on MyRunnable’s class object. } As soon as thread entered Synchronization method, thread acquired lock on class’s class object. Thread will leave lock when it exits static synchronized method. Question 40. Difference between object lock and class lock? Answer. It is very important question from multithreading point of view. We must understand difference between object lock and class lock to answer interview, ocjp answers correctly. Object lock Class lock Thread can acquire object lock by- Entering synchronized block or by entering synchronized methods. Thread can acquire lock on class’s class object by- Entering synchronized block or by entering static synchronized methods. Multiple threads may exist on same object but only one thread of that object can enter synchronized method at a time. Threads on different object can enter same method at same time. Multiple threads may exist on same or different objects of class but only one thread can enter static synchronized method at a time. Multiple objects of class may exist and every object has it’s own lock. Multiple objects of class may exist but there is always one class’s class object lock available. First let’s acquire object lock by entering synchronized block. Example- Let’s say there is one class MyClassand we have created it’s object and reference to that object is myClass. Now we can create synchronization block, and parameter passed with synchronization tells which object has to be synchronized. In below code, we have synchronized object reference by myClass. MyClass myClass=newMyclass(); synchronized (myClass) { } As soon thread entered Synchronization block, thread acquired object lock on object referenced by myClass (by acquiring object’s monitor.) Thread will leave lock when it exits synchronized block. First let’s acquire lock on class’s class object by entering synchronized block. Example- Let’s say there is one class MyClass. Now we can create synchronization block, and parameter passed with synchronization tells which class has to be synchronized. In below code, we have synchronized MyClass synchronized (MyClass.class) { } As soon as thread entered Synchronization block, thread acquired MyClass’s class object. Thread will leave lock when it exits synchronized block. publicsynchronizedvoid method1() { } As soon as thread entered Synchronization method, thread acquired object lock. Thread will leave lock when it exits synchronized method. public staticsynchronizedvoid method1() {} As soon as thread entered static Synchronization method, thread acquired lock on class’s class object. Thread will leave lock when it exits synchronized method. Let’s me give you some tricky situation based question, Question 41. Suppose you have 2 threads (Thread-1 and Thread-2) on same object. Thread-1 is in synchronized method1(), can Thread-2 enter synchronized method2() at same time? Answer.No, here when Thread-1 is in synchronized method1() it must be holding lock on object’s monitor and will release lock on object’s monitor only when it exits synchronized method1(). So, Thread-2 will have to waitfor Thread-1 to release lock on object’s monitor so that it could enter synchronized method2(). Likewise, Thread-2 even cannot enter synchronized method1() which is being executed by Thread-1. Thread-2 will have to wait for Thread-1 to release lock on object’s monitor so that it could enter synchronized method1(). Now, let’s see a program to prove our point. Question 42. Suppose you have 2 threads (Thread-1 and Thread-2) on same object. Thread-1 is in static synchronized method1(), can Thread-2 enter static synchronized method2() at same time? Answer.No, here when Thread-1 is in static synchronized method1() it must be holding lock on class class’s object and will release lock on class’s classobject only when it exits static synchronized method1(). So, Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method2(). Likewise, Thread-2 even cannot enter static synchronized method1() which is being executed by Thread-1. Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method1(). Now, let’s see a program to prove our point. Question 43. Suppose you have 2 threads (Thread-1 and Thread-2) on same object. Thread-1 is in synchronized method1(), can Thread-2 enter static synchronized method2() at same time? Answer.Yes, here when Thread-1 is in synchronized method1() it must be holding lock on object’s monitor and Thread-2 can enter static synchronized method2() by acquiring lock on class’s class object. Now, let’s see a program to prove our point. Question 44. Suppose you have thread and it is in synchronized method and now can thread enter other synchronized method from that method? Answer.Yes, here when thread is in synchronized method it must be holding lock on object’s monitor and using that lock thread can enter other synchronized method. Now, let’s see a program to prove our point. Question 45. Suppose you have thread and it is in static synchronized method and now can thread enter other static synchronized method from that method? Answer. Yes, here when thread is in static synchronized method it must be holding lock on class’s class object and using that lock thread can enter other static synchronized method. Now, let’s see a program to prove our point. Question 46. Suppose you have thread and it is in static synchronized method and now can thread enter other non static synchronized method from that method? Answer.Yes, here when thread is in static synchronized method it must be holding lock on class’s class object and when it enters synchronized method it will hold lock on object’s monitor as well. So, now thread holds 2 locks (it’s also called nested synchronization)- >first one on class’s class object. >second one on object’s monitor (This lock will be released when thread exits non static method).Now, let’s see a program to prove our point. Question 47. Suppose you have thread and it is in synchronized method and now can thread enter other static synchronized method from that method? Answer.Yes, here when thread is in synchronized method it must be holding lock on object’s monitor and when it enters static synchronized method it will hold lock on class’s class object as well. So, now thread holds 2 locks (it’s also called nested synchronization)- >first one on object’s monitor. >second one on class’s class object.(This lock will be released when thread exits static method).Now, let’s see a program to prove our point. Question 48. Suppose you have 2 threads (Thread-1 on object1 and Thread-2 on object2). Thread-1 is in synchronized method1(), can Thread-2 enter synchronized method2() at same time? Answer.Yes, here when Thread-1 is in synchronized method1() it must be holding lock on object1’s monitor. Thread-2 will acquire lock on object2’s monitor and enter synchronized method2(). Likewise, Thread-2 even enter synchronized method1() as well which is being executed by Thread-1 (because threads are created on different objects). Now, let’s see a program to prove our point. Question 49. Suppose you have 2 threads (Thread-1 on object1 and Thread-2 on object2). Thread-1 is in static synchronized method1(), can Thread-2 enter static synchronized method2() at same time? Answer.No, it might confuse you a bit that threads are created on different objects. But, not to forgot that multiple objects may exist but there is always one class’s class object lock available. Here, when Thread-1 is in static synchronized method1() it must be holding lock on class class’s object and will release lock on class’s classobject only when it exits static synchronized method1(). So, Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method2(). Likewise, Thread-2 even cannot enter static synchronized method1() which is being executed by Thread-1. Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method1(). Now, let’s see a program to prove our point. Question 50. Difference between wait() and wait(long timeout), What are thread states when these method are called? Answer. wait() wait(long timeout) When wait() method is called on object, it causes causes the current thread to wait until another thread invokes the notify() or notifyAll() method for this object. wait(long timeout) - Causes the current thread to wait until either another thread invokes the notify() or notifyAll() methods for this object, or a specified timeout time has elapsed. When wait() is called on object - Thread enters from running to waiting state. It waits for some other thread to call notify so that it could enter runnable state. When wait(1000) is called on object - Thread enters from running to waiting state. Than even if notify() or notifyAll() is not called after timeout time has elapsed thread will go from waiting to runnable state. Question 51. How can you implement your own Thread Pool in java? Answer. What is ThreadPool? ThreadPool is a pool of threads which reuses a fixed number of threads to execute tasks. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. ThreadPool implementation internally uses LinkedBlockingQueue for adding and removing tasks. In this post i will be using LinkedBlockingQueue provide by java Api, you can refer this post for implementing ThreadPool using custom LinkedBlockingQueue. Need/Advantage of ThreadPool? Instead of creating new thread every time for executing tasks, we can create ThreadPool which reuses a fixed number of threads for executing tasks. As threads are reused, performance of our application improves drastically. How ThreadPool works? We will instantiate ThreadPool, in ThreadPool’s constructor nThreads number of threads are created and started. ThreadPool threadPool=new ThreadPool(2); Here 2 threads will be created and started in ThreadPool. Then, threads will enter run() method of ThreadPoolsThread class and will call take() method on taskQueue. If tasks are available thread will execute task by entering run() method of task (As tasks executed always implements Runnable). publicvoid run() { . . . while (true) { . . . Runnable runnable = taskQueue.take(); runnable.run(); . . . } . . . } Else waits for tasks to become available. When tasks are added? When execute() method of ThreadPool is called, it internally calls put() method on taskQueue to add tasks. taskQueue.put(task); Once tasks are available all waiting threads are notified that task is available. Question 52. What is significance of using ThreadLocal? Answer. This question will test your command in multi threading, can you really create some perfect multithreading application or not. ThreadLocal is a class which provides thread-local variables. What is ThreadLocal ? ThreadLocal is a class which provides thread-local variables. Every thread has its own ThreadLocal value that makes ThreadLocal value threadsafe as well. For how long Thread holds ThreadLocal value? Thread holds ThreadLocal value till it hasn’t entered dead state. Can one thread see other thread’s ThreadLocal value? No, thread can see only it’s ThreadLocal value. Are ThreadLocal variables thread safe. Why? Yes, ThreadLocal variables are thread safe. As every thread has its own ThreadLocal value and one thread can’t see other threads ThreadLocal value. Application of ThreadLocal? ThreadLocal are used by many web frameworks for maintaining some context (may be session or request) related value. In any single threaded application, same thread is assigned for every request made to same action, so ThreadLocal values will be available in next request as well. In multi threaded application, different thread is assigned for every request made to same action, so ThreadLocal values will be different for every request. When threads have started at different time they might like to store time at which they have started. So, thread’s start time can be stored in ThreadLocal. Creating ThreadLocal > private ThreadLocal threadLocal = new ThreadLocal(); We will create instance of ThreadLocal. ThreadLocal is a generic class, i will be using String to demonstrate threadLocal. All threads will see same instance of ThreadLocal, but a thread will be able to see value which was set by it only. How thread set value of ThreadLocal > threadLocal.set( new Date().toString()); Thread set value of ThreadLocal by calling set(“”) method on threadLocal. How thread get value of ThreadLocal > threadLocal.get() Thread get value of ThreadLocal by calling get() method on threadLocal. See here for detailed explanation of threadLocal. Question 53. What is busy spin? Answer. What is busy spin? When one thread loops continuously waiting for another thread to signal. Performance point of view - Busy spin is very bad from performance point of view, because one thread keeps on looping continuously ( and consumes CPU) waiting for another thread to signal. Solution to busy spin - We must use sleep() or wait() and notify() method. Using wait() is better option. Why using wait() and notify() is much better option to solve busy spin? Because in case when we use sleep() method, thread will wake up again and again after specified sleep time until boolean variable is true. But, in case of wait() thread will wake up only when when notified by calling notify() or notifyAll(), hence end up consuming CPU in best possible manner. Program - Consumer Producer problem with busy spin > Consumer thread continuously execute (busy spin) in while loop tillproductionInProcess is true. Once producer thread has ended it will make boolean variable productionInProcess false and busy spin will be over. while(productionInProcess){ System.out.println("BUSY SPIN - Consumer waiting for production to get over"); } Question 54. Can a constructor be synchronized? Answer. No, constructor cannot be synchronized. Because constructor is used for instantiating object, when we are in constructor object is under creation. So, until object is not instantiated it does not need any synchronization. Enclosing constructor in synchronized block will generate compilation error. Using synchronized in constructor definition will also show compilation error. COMPILATION ERROR = Illegal modifier for the constructor in type ConstructorSynchronizeTest; only public, protected & private are permitted Though we can use synchronized block inside constructor. Read More about : Constructor in java cannot be synchronized Question 55. Can you find whether thread holds lock on object or not? Answer. holdsLock(object) method can be used to find out whether current thread holds the lock on monitor of specified object. holdsLock(object) method returns true if the current thread holds the lock on monitor of specified object. Question 56. What do you mean by thread starvation? Answer. When thread does not enough CPU for its execution Thread starvation happens. Thread starvation may happen in following scenarios > Low priority threads gets less CPU (time for execution) as compared to high priority threads. Lower priority thread may starve away waiting to get enough CPU to perform calculations. In deadlock two threads waits for each other to release lock holded by them on resources. There both Threads starves away to get CPU. Thread might be waiting indefinitely for lock on object’s monitor (by calling wait() method), because no other thread is calling notify()/notifAll() method on object. In that case, Thread starves away to get CPU. Thread might be waiting indefinitely for lock on object’s monitor (by calling wait() method), but notify() may be repeatedly awakening some other threads. In that case also Thread starves away to get CPU. Question 57. What is addShutdownHook method in java? Answer. addShutdownHook method in java > addShutdownHook method registers a new virtual-machine shutdown hook. A shutdown hook is a initialized but unstarted thread. When JVM starts its shutdown it will start all registered shutdown hooks in some unspecified order and let them run concurrently. When JVM (Java virtual machine) shuts down > When the last non-daemon thread finishes, or when the System.exit is called. Once JVM’s shutdown has begunnew shutdown hook cannot be registered neither previously-registered hook can be de-registered. Any attempt made to do any of these operations causes an IllegalStateException. For more detail with program read : Threads addShutdownHook method in java Question 58. How you can handle uncaught runtime exception generated in run method? Answer. We can use setDefaultUncaughtExceptionHandler method which can handle uncaught unchecked(runtime) exception generated in run() method. What is setDefaultUncaughtExceptionHandler method? setDefaultUncaughtExceptionHandler method sets the default handler which is called when a thread terminates due to an uncaught unchecked(runtime) exception. setDefaultUncaughtExceptionHandler method features > setDefaultUncaughtExceptionHandler method sets the default handler which is called when a thread terminates due to an uncaught unchecked(runtime) exception. setDefaultUncaughtExceptionHandler is a static method method, so we can directly call Thread.setDefaultUncaughtExceptionHandler to set the default handler to handle uncaught unchecked(runtime) exception. It avoids abrupt termination of thread caused by uncaught runtime exceptions. Defining setDefaultUncaughtExceptionHandler method > Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler(){ publicvoid uncaughtException(Thread thread, Throwable throwable) { System.out.println(thread.getName() + " has thrown " + throwable); } }); Question 59. What is ThreadGroup in java, What is default priority of newly created threadGroup, mention some important ThreadGroup methods ? Answer. When program starts JVM creates a ThreadGroup named main. Unless specified, all newly created threads become members of the main thread group. ThreadGroup is initialized with default priority of 10. ThreadGroup important methods > getName() name of ThreadGroup. activeGroupCount() count of active groups in ThreadGroup. activeCount() count of active threads in ThreadGroup. list() list() method has prints ThreadGroups information getMaxPriority() Method returns the maximum priority of ThreadGroup. setMaxPriority(int pri) Sets the maximum priority of ThreadGroup. Question 60. What are thread priorities? Answer. Thread Priority range is from 1 to 10. Where 1 is minimum priority and 10 is maximum priority. Thread class provides variables of final static int type for setting thread priority. /* The minimum priority that a thread can have. */ publicfinalstaticintMIN_PRIORITY= 1; /* The default priority that is assigned to a thread. */ publicfinalstaticintNORM_PRIORITY= 5; /* The maximum priority that a thread can have. */ publicfinalstaticintMAX_PRIORITY= 10; Thread with MAX_PRIORITY is likely to get more CPU as compared to low priority threads. But occasionally low priority thread might get more CPU. Because thread scheduler schedules thread on discretion of implementation and thread behaviour is totally unpredictable. Thread with MIN_PRIORITY is likely to get less CPU as compared to high priority threads. But occasionally high priority thread might less CPU. Because thread scheduler schedules thread on discretion of implementation and thread behaviour is totally unpredictable. setPriority()method is used for Changing the priority of thread. getPriority()method returns the thread’s priority.

May 29, 2015

by Ankit Mittal

· 338,426 Views · 38 Likes

FlatMap in Guava

This is a short post about a method I recently discovered in Guava. The Issue I had a situation at work where I was working with objects structured something like this: public class Outer { String outerId; List innerList; ....... } public class Inner { String innerId; Date timestamp; } public class Merged { String outerId; String innerId; Date timestamp; } My task was flatten a list Outer objects (along with the list of Inner objects) into a list of Merged objects. Since I’m working with Java 7, using streams is not an option. The First Solution Instead I turn to the FluentIterable class from Guava. My first instinct is to go with the FluentIterable.transform method (which is essentially a map function): List originalList = getListOfObjects(); Function> flattenFunction //Details left out for clarity //returns an Iterable of Lists! Iterable> mergedObjects = FluentIterable.from(originalList).tranform(flattenFunction); But I really want a single collection of Merged objects, not an iterable of lists! The missing ingredient here is a flatMap function. Since I’m not using Scala, Clojure or Java 8, I feel that I’m out of luck. A Better Solution I decide to take a closer look at the FluentIterable class and I discover the FluentIterable.transformAndConcat method. The transformAndConcat method applies a function to each element of the fluent iterable and appends the results into a single iterable instance. I have my flatMap function in Guava! Now my solution looks like this: List originalList = getListOfObjects(); Function> flattenFunction //Details left out for clarity Iterable mergedObjects = FluentIterable.from(originalList).transformAndConcat(flattenFunction); Conclusion While this is a very short post, it goes to show how useful the Guava library is and how functional programming concepts can make our code more concise.

May 26, 2015

by Bill Bejeck

· 9,973 Views · 1 Like

All About Java Modifier Keywords

I’ve been a Java programmer for a while now, however, recently someone asked me a question regarding one of Java modifier keywords and I had no clue what it was. This made it obvious to me that I needed to brush up on some Java that goes beyond actual coding and algorithms. After a few Google searches, I got bits and pieces on the topic, but never really the full story, so I’m using this post as a way to document the subject. This is a great interview question to test your computer science book-smarts. Modifiers in Java are keywords that you add to variables, classes, and methods in order to change their meaning. They can be broken into two groups: Access control modifiers Non-access modifiers Let’s first take a look at the access control modifiers and see some code examples on how to use them. Modifier Description public Visible to the world private Visible to the class protected Visible to the package and all subclasses So how do you use these three access control modifiers? Let’s take the following two classes. Please ignore how inefficient they may or may not be as that is besides the point for this tutorial. Create a file called project/mypackage/Person.java and add the following code: package mypackage; class Person { private String firstname; private String lastname; protected void setFirstname(String firstname) { this.firstname = firstname; } protected void setLastname(String lastname) { this.lastname = lastname; } protected String getFirstname() { return this.firstname; } protected String getLastname() { return this.lastname; } } The above Person class is going to have private variables and protected methods. This means that the variables will only be accessible from the class and the methods will only be accessible from the mypackage package. Next create a file called project/mypackage/Company.java and add the following code: package mypackage; import java.util.*; public class Company { private ArrayList people; public Company() { this.people = new ArrayList(); } public void addPerson(String firstname, String lastname) { Person p = new Person(); p.setFirstname(firstname); p.setLastname(lastname); this.people.add(p); } public void printPeople() { for(int i = 0; i < this.people.size(); i++) { System.out.println(this.people.get(i).getFirstname() + " " + this.people.get(i).getLastname()); } } } The above class is public, so it can be accessed from any future classes inside and outside of the package. It has a private variable that is only accessible from within the class, and it has a bunch of public methods. Because the Person class and Company class both share the same package, the Company class can access the Person class as well as all its methods. To complete the demonstration of the access control modifiers, let’s create a driver class in a new project/MainDriver.java file: import mypackage.*; public class MainDriver { public static void main(String[] args) { Company c = new Company(); c.addPerson("Nic", "Raboy"); c.printPeople(); Person p = new Person(); p.setFirstname("Maria"); p.setLastname("Campos"); } } Remember, because the Company class is public, we won’t have issues adding and printing people. However, because the Person class is protected, we’re going to get a compile time error since the MainDriver is not part of the mypackage package. Now let’s take a look at the available non-access modifiers and some example code on how to use them. Modifier Description static Used for creating class methods and variables final Used for finalizing implementations of classes, variables, and methods abstract Used for creating abstract methods and classes synchronized Used in threads and locks the method or variable so it can only be used by one thread at a time volatile Used in threads and keeps the variable in main memory rather than caching it locally in each thread So how do you use these five non-access modifiers? A good example of the static modifier is the following in Java: int max = Integer.MAX_VALUE int numeric = Integer.parseInt("1234"); Notice in the above example we make use of variables and methods in the Integer class without first instantiating it. This is because those particular methods and variables are static. The abstract modifier is a little different. You can create a class with methods, but they are essentially nothing more than definitions. You cannot add logic to them. For example: abstract class Shape { abstract int getArea(int width, int height); } Then inside a child class you would add code similar to this: class Rectangle extends Shape { int getArea(int width, int height) { return width * height; } } This brings us to the synchronized and volatile modifiers. Let’s take a look at a threading example where we try to access the same method from two different threads: import java.lang.*; public class ThreadExample { public static void main(String[] args) { Thread thread1 = new Thread(new Runnable() { public void run() { print("THREAD 1"); } }); Thread thread2 = new Thread(new Runnable() { public void run() { print("THREAD 2"); } }); thread1.start(); thread2.start(); } public static void print(String s) { for(int i = 0; i < 5; i++) { System.out.println(s + ": " + i); } } } Running the above code will result in output that is printed in a random order. It could be sequential, or not, it depends on the CPU. However, if we make use of the synchronized modifier, the first thread must complete before the second one can start printing. The print(String s) method will now look like this: public static synchronized void print(String s) { for(int i = 0; i < 5; i++) { System.out.println(s + ": " + i); } } Next let’s take a look at an example using the volatile modifier: import java.lang.*; public class ThreadExample { public static volatile boolean isActive; public static void main(String[] args) { isActive = true; Thread thread1 = new Thread(new Runnable() { public void run() { while(true) { if(isActive) { System.out.println("THREAD 1"); isActive = false; } } } }); Thread thread2 = new Thread(new Runnable() { public void run() { while(true) { if(!isActive) { System.out.println("THREAD 2"); try { Thread.sleep(100); } catch (Exception e) { } isActive = true; } } } }); thread1.start(); thread2.start(); } } Running the above code will print the thread number and alternate between them because our volatile variable is a status flag. This is because the flag is stored in main memory. If we remove the volatile keyword, the thread will only alternate one time because only a local reference is used and the two threads are essentially hidden from each other. Conclusion Java modifiers can be a bit tricky to understand and it is actually common for programmers to be unfamiliar with a lot of them. This is a great interview question to test your book knowledge too. If I’ve missed any or you think my explanations could be better, definitely share in the comments section.

May 15, 2015

by Nic Raboy

· 12,985 Views

HashMap Custom implementation in java

Contents of page : Custom HashMap > Entry Putting 5 key-value pairs in HashMap (step-by-step)> Methods used in custom HashMap > What will happen if map already contains mapping for key? Complexity calculation of put and get methods in HashMap > put method - worst Case complexity > put method - best Case complexity > get method - worst Case complexity > get method - best Case complexity > Summary of complexity of methods in HashMap > Custom HashMap > This is very important and trending topic. In this post i will be explaining HashMap custom implementation in lots of detail with diagrams which will help you in visualizing the HashMap implementation. I will be explaining how we will put and get key-value pair in HashMap by overriding- >equals method - helps in checking equality of entry objects. >hashCode method - helps in finding bucket’s index on which data will be stored. We will maintain bucket (ArrayList) which will store Entry (LinkedList). Entry We store key-value pair by usingEntry Entry contains K key, V value and Entrynext (i.e. next entry on that location of bucket). static class Entry { K key; V value; Entry next; public Entry(K key, V value, Entry next){ this.key = key; this.value = value; this.next = next; } } Putting 5 key-value pairs in custom HashMap (step-by-step)> I will explain you the whole concept of HashMap by putting 5 key-value pairs in HashMap. Initially, we have bucket of capacity=4. (all indexes of bucket i.e. 0,1,2,3 are pointing to null) Let’s put first key-value pair in HashMap- Key=21, value=12 newEntry Object will be formed like this > We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 21%4= 1. So, 1 will be the index of bucket on which newEntry object will be stored. We will go to 1stindex as it is pointing to null we will put our newEntry object there. At completion of this step, our HashMap will look like this- Let’s put second key-value pair in HashMap- Key=25, value=121 newEntry Object will be formed like this > We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 25%4= 1. So, 1 will be the index of bucket on which newEntry object will be stored. We will go to 1st index, it contains entry with key=21, we will compare two keys(i.e. compare 21 with 25 by using equals method), as two keys are different we check whether entry with key=21’s next is null or not, if next is null we will put our newEntry objecton next. At completion of this step our HashMap will look like this- Let’s put third key-value pair in HashMap- Key=30, value=151 newEntry Object will be formed like this > We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 30%4= 2. So, 2 will be the index of bucket on which newEntry object will be stored. We will go to 2nd index as it is pointing to null we will put our newEntry object there. At completion of this step, our HashMap will look like this- Let’s put fourth key-value pair in HashMap- Key=33, value=15 Entry Object will be formed like this > We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 33%4= 1, So, 1 will be the index of bucket on whichnewEntry object will be stored. We will go to 1st index - >it contains entry with key=21, we will compare two keys (i.e. compare 21 with 33 by using equals method, as two keys are different, proceed to next of entry with key=21 (proceed only if next is not null). >now, next contains entry with key=25, we will compare two keys (i.e. compare 25 with 33 by using equals method, as two keys are different, now next of entry with key=25 is pointing to null so we won’t proceed further, we will put our newEntry object on next. At completion of this step our HashMap will look like this- Let’s put fifth key-value pair in HashMap- Key=35, value=89 Repeat above mentioned steps. At completion of this step our HashMap will look like this- Must read: LinkedHashMap Custom implementation Methods used in custom HashMap > public void put(K newKey, V data) -Method allows you put key-value pair in HashMap -If the map already contains a mapping for the key, the old value is replaced. -provide complete functionality how to override equals method. -provide complete functionality how to override hashCode method. public V get(K key) Method returns value corresponding to key. public boolean remove(K deleteKey) Method removes key-value pair from HashMapCustom. public void display() -Method displays all key-value pairs present in HashMapCustom., -insertion order is not guaranteed, for maintaining insertion order refer LinkedHashMapCustom. private int hash(K key) -Method implements hashing functionality, which helps in finding the appropriate bucket location to store our data. -This is very important method, as performance of HashMapCustom is very much dependent on this method's implementation. What will happen if map already contains mapping for key? If the map already contains a mapping for the key, the old value is replaced. Complexity calculation of put and get methods in HashMap > put method - worst Case complexity > O(n). But how complexity is O(n)? Initially, let's say map is like this - And we have to insert newEntry Object with Key=25, value=121 We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 25%4= 1. So, 1 will be the index of bucket on which newEntry object will be stored. We will go to 1st index, it contains entry with key=21, we will compare two keys(i.e. compare 21 with 25 by using equals method), as two keys are different we check whether entry with key=21’s next is null or not, if next is null we will put our newEntry objecton next. At completion of this step our HashMap will look like this- Now let’s do complexity calculation - Earlier there was 1 element in HashMap and for putting newEntry Object we iterated on it. Hence complexity was O(n). Note: We may calculate complexity by adding more elements in HashMap as well, but to keep explanation simple i kept less elements in HashMap. put method - best Case complexity > O(1). But how complexity is O(n)? Let's say map is like this - And we have to insert newEntry Object with Key=30, value=151 We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 30%4= 2. So, 2 will be the index of bucket on which newEntry object will be stored. We will go to 2nd index as it is pointing to null we will put our newEntry object there. At completion of this step our HashMap will look like this- Now let’s do complexity calculation - Earlier there 2 elements in HashMap but we were able to put newEntry Object in first go. Hence complexity was O(1). get method - worst Case complexity > O(n). But how complexity is O(n)? Initially, let's say map is like this - And we have to get Entry Object with Key=25, value=121 We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 25%4= 1. So, 1 will be the index of bucket on which Entry object is stored. We will go to 1st index, it contains entry with key=21, we will compare two keys(i.e. compare 21 with 25 by using equals method), as two keys are different we check whether entry with key=21’s next is null or not, next is not null so we will repeat same process and ultimately will be able to get Entry object. Now let’s do complexity calculation - There were 2 elements in HashMap and for getting Entry Object we iterated on both of them. Hence complexity was O(n). Note: We may calculate complexity by using HashMap of larger size, but to keep explanation simple i kept less elements in HashMap. get method - best Case complexity > O(1). But how complexity is O(n)? Initially, let's say map is like this - And we have to get Entry Object with Key=30, value=151 We will calculate hash by using our hash(K key) method - in this case it returns key/capacity= 30%4= 2. So, 2 will be the index of bucket on which Entry object is stored. We will go to 2nd index and get Entry object. Now let’s do complexity calculation - There were 3 elements in HashMap but we were able to get Entry Object in first go. Hence complexity was O(1). Summary of complexity of methods in HashMap > Operation/ method Worst case Best case put(K key, V value) O(n) O(1) get(Object key) O(n) O(1) REFER: http://javamadesoeasy.com/2015/02/hashmap-custom-implementation.html HashMap Custom implementation - put, get, remove Employee object.

May 13, 2015

by Ankit Mittal

· 74,462 Views · 10 Likes

Groovy vs Java for Testing [video]

Video and slideshow for Trisha Gee's lecture on whether Groovy is better for testing than Java and why, along with other resources to learn more about testing.

May 10, 2015

by Trisha Gee

· 11,122 Views

Logging Stop-the-world Pauses in JVM

Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github) , but different JIT actions (example), biased lock revocation (example), certainJVMTI operations , and many more also require the application to be stopped. The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses. It is more or less common that GC logs are enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options: 1.-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime If you are wondering about the naming explicitly referring to GC, don’t be alarmed – turning on these options logs all of the safepoints, not just garbage collection pauses. If you run a following example (source in github) with the flags specified above 01.public class FullGc { 02. private static final Collection

May 7, 2015

by Nikita Salnikov-Tarnovski

· 16,632 Views

Binding to Data Services with Spring Boot in Cloud Foundry

Written by Dave Syer on the Spring blog In this article we look at how to bind a Spring Boot application to data services (JDBC, NoSQL, messaging etc.) and the various sources of default and automatic behaviour in Cloud Foundry, providing some guidance about which ones to use and which ones will be active under what conditions. Spring Boot provides a lot of autoconfiguration and external binding features, some of which are relevant to Cloud Foundry, and many of which are not. Spring Cloud Connectors is a library that you can use in your application if you want to create your own components programmatically, but it doesn’t do anything “magical” by itself. And finally there is the Cloud Foundry java buildpack which has an “auto-reconfiguration” feature that tries to ease the burden of moving simple applications to the cloud. The key to correctly configuring middleware services, like JDBC or AMQP or Mongo, is to understand what each of these tools provides, how they influence each other at runtime, and and to switch parts of them on and off. The goal should be a smooth transition from local execution of an application on a developer’s desktop to a test environment in Cloud Foundry, and ultimately to production in Cloud Foundry (or otherwise) with no changes in source code or packaging, per the twelve-factor application guidelines. There is some simple source code accompanying this article. To use it you can clone the repository and import it into your favourite IDE. You will need to remove two dependencies from the complete project to get to the same point where we start discussing concrete code samples, namely spring-boot-starter-cloud-connectors and auto-reconfiguration. NOTE: The current co-ordinates for all the libraries being discussed are org.springframework.boot:spring-boot-*:1.2.3.RELEASE,org.springframework.boot:spring-cloud-*-connector:1.1.1.RELEASE,org.cloudfoundry:auto-reconfiguration:1.7.0.RELEASE. TIP: The source code in github includes a docker-compose.yml file (docs here). You can use that to create a local MySQL database if you don’t have one running already. You don’t actually need it to run most of the code below, but it might be useful to validate that it will actually work. Punchline for the Impatient If you want to skip the details, and all you need is a recipe for running locally with H2 and in the cloud with MySQL, then start here and read the rest later when you want to understand in more depth. (Similar options exist for other data services, like RabbitMQ, Redis, Mongo etc.) Your first and simplest option is to simply do nothing: do not define a DataSource at all but put H2 on the classpath. Spring Boot will create the H2 embedded DataSource for you when you run locally. The Cloud Foundry buildpack will detect a database service binding and create a DataSource for you when you run in the cloud. If you add Spring Cloud Connectors as well, your app will also work in other cloud platforms, as long as you include a connector. That might be good enough if you just want to get something working. If you want to run a serious application in production you might want to tweak some of the connection pool settings (e.g. the size of the pool, various timeouts, the important test on borrow flag). In that case the buildpack auto-reconfiguration DataSource will not meet your requirements and you need to choose an alternative, and there are a number of more or less sensible choices. The best choice is probably to create a DataSource explicitly using Spring Cloud Connectors, but guarded by the “cloud” profile: @Configuration @Profile("cloud") public class DataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSourceclass, null); } } You can use spring.datasource.* properties (e.g. in application.properties or a profile-specific version of that) to set the additional properties at runtime. The “cloud” profile is automatically activated for you by the buildpack. Now for the details. We need to build up a picture of what’s going on in your application at runtime, so we can learn from that how to make a sensible choice for configuring data services. Layers of Autoconfiguration Let’s take a a simple app with DataSource (similar considerations apply to RabbitMQ, Mongo, Redis): @SpringBootApplication public class CloudApplication { @Autowired private DataSource dataSource; public static void main(String[] args) { SpringApplication.run(CloudApplication.class, args); } } This is a complete application: the DataSource can be @Autowired because it is created for us by Spring Boot. The details of the DataSource (concrete class, JDBC driver, connection URL, etc.) depend on what is on the classpath. Let’s assume that the application uses Spring JDBC via the spring-boot-starter-jdbc (or spring-boot-starter-data-jpa), so it has aDataSource implementation available from Tomcat (even if it isn’t a web application), and this is what Spring Boot uses. Consider what happens when: Classpath contains H2 (only) in addition to the starters: the DataSource is the Tomcat high-performance pool from DataSourceAutoConfiguration and it connects to an in memory database “testdb”. Classpath contains H2 and MySQL: DataSource is still H2 (same as before) because we didn’t provide any additional configuration for MySQL and Spring Boot can’t guess the credentials for connecting. Add spring-boot-starter-cloud-connectors to the classpath: no change inDataSource because the Spring Cloud Connectors do not detect that they are running in a Cloud platform. The providers that come with the starter all look for specific environment variables, which they won’t find unless you set them, or run the app in Cloud Foundry, Heroku, etc. Run the application in “cloud” profile with spring.profiles.active=cloud: no change yet in the DataSource, but this is one of the things that the Java buildpack does when your application runs in Cloud Foundry. Run in “cloud” profile and provide some environment variables to simulate running in Cloud Foundry and binding to a MySQL service: VCAP_APPLICATION={"name":"application","instance_id":"FOO"} VCAP_SERVICES={"mysql":[{"name":"mysql","tags":["mysql"],"credentials":{"uri":"mysql://root:root@localhost/test"}]} (the “tags” provides a hint that we want to create a MySQL DataSource, the “uri” provides the location, and the “name” becomes a bean ID). The DataSource is now using MySQL with the credentials supplied by the VCAP_* environment variables. Spring Boot has some autoconfiguration for the Connectors, so if you looked at the beans in your application you would see a CloudFactory bean, and also the DataSource bean (with ID “mysql”). Theautoconfiguration is equivalent to adding @ServiceScan to your application configuration. It is only active if your application runs in the “cloud” profile, and only if there is no existing @Bean of type Cloud, and the configuration flagspring.cloud.enabled is not “false”. Add the “auto-reconfiguration” JAR from the Java buildpack (Maven co-ordinatesorg.cloudfoundry:auto-reconfiguration:1.7.0.RELEASE). You can add it as a local dependency to simulate running an application in Cloud Foundry, but it wouldn’t be normal to do this with a real application (this is just for experimenting with autoconfiguration). The auto-reconfiguration JAR now has everything it needs to create a DataSource, but it doesn’t (yet) because it detects that you already have a bean of type CloudFactory, one that was added by Spring Boot. Remove the explicit “cloud” profile. The profile will still be active when your app starts because the auto-reconfiguration JAR adds it back again. There is still no change to theDataSource because Spring Boot has created it for you via the @ServiceScan. Remove the spring-boot-starter-cloud-connectors dependency, so that Spring Boot backs off creating a CloudFactory. The auto-reconfiguration JAR actually has its own copy of Spring Cloud Connectors (all the classes with different package names) and it now uses them to create a DataSource (in a BeanFactoryPostProcessor). The Spring Boot autoconfigured DataSource is replaced with one that binds to MySQL via theVCAP_SERVICES. There is no control over pool properties, but it does still use the Tomcat pool if available (no support for Hikari or DBCP2). Remove the auto-reconfiguration JAR and the DataSource reverts to H2. TIP: use web and actuator starters with endpoints.health.sensitive=false to inspect the DataSource quickly through “/health”. You can also use the “/beans”, “/env” and “/autoconfig” endpoints to see what is going in in the autoconfigurations and why. NOTE: Running in Cloud Foundry or including auto-reconfiguration JAR in classpath locally both activate the “cloud” profile (for the same reason). The VCAP_* env vars are the thing that makes Spring Cloud and/or the auto-reconfiguration JAR create beans. NOTE: The URL in the VCAP_SERVICES is actually not a “jdbc” scheme, which should be mandatory for JDBC connections. This is, however, the format that Cloud Foundry normally presents it in because it works for nearly every language other than Java. Spring Cloud Connectors or the buildpack auto-reconfiguration, if they are creating a DataSource, will translate it into a jdbc:* URL for you. NOTE: The MySQL URL also contains user credentials and a database name which are valid for the Docker container created by the docker-compose.yml in the sample source code. If you have a local MySQL server with different credentials you could substitute those. TIP: If you use a local MySQL server and want to verify that it is connected, you can use the “/health” endpoint from the Spring Boot Actuator (included in the sample code already). Or you could create a schema-mysql.sql file in the root of the classpath and put a simple keep alive query in it (e.g. SELECT 1). Spring Boot will run that on startupso if the app starts successfully you have configured the database correctly. The auto-reconfiguration JAR is always on the classpath in Cloud Foundry (by default) but it backs off creating any DataSource if it finds a org.springframework.cloud.CloudFactorybean (which is provided by Spring Boot if the CloudAutoConfiguration is active). Thus the net effect of adding it to the classpath, if the Connectors are also present in a Spring Boot application, is only to enable the “cloud” profile. You can see it making the decision to skip auto-reconfiguration in the application logs on startup: 015-04-14 15:11:11.765 INFO 12727 --- [ main] urceCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type javax.sql.DataSource 2015-04-14 15:11:57.650 INFO 12727 --- [ main] ongoCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.data.mongodb.MongoDbFactory 2015-04-14 15:11:57.650 INFO 12727 --- [ main] bbitCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.amqp.rabbit.connection.ConnectionFactory 2015-04-14 15:11:57.651 INFO 12727 --- [ main] edisCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.data.redis.connection.RedisConnectionFactory ... etc. Create your own DataSource The last section walked through most of the important autoconfiguration features in the various libraries. If you want to take control yourself, one thing you could start with is to create your own instance of DataSource. You could do that, for instance, using aDataSourceBuilder which is a convenience class and comes as part of Spring Boot (it chooses an implementation based on the classpath): @SpringBootApplication public class CloudApplication { @Bean public DataSource dataSource() { return DataSourceBuilder.create().build(); } ... } The DataSource as we’ve defined it is useless because it doesn’t have a connection URL or any credentials, but that can easily be fixed. Let’s run this application as if it was in Cloud Foundry: with the VCAP_* environment variables and the auto-reconfiguration JAR but not Spring Cloud Connectors on the classpath and no explicit “cloud” profile. The buildpack activates the “cloud” profile, creates a DataSource and binds it to the VCAP_SERVICES. As already described briefly, it removes your DataSource completely and replaces it with a manually registered singleton (which doesn’t show up in the “/beans” endpoint in Spring Boot). Now add Spring Cloud Connectors back into the classpath the application and see what happens when you run it again. It actually fails on startup! What has happened? The@ServiceScan (from Connectors) goes and looks for bound services, and creates bean definitions for them. That’s a bit like the buildpack, but different because it doesn’t attempt to replace any existing bean definitions of the same type. So you get an autowiring error because there are 2 DataSources and no way to choose one to inject into your application in various places where one is needed. To fix that we are going to have to take control of the Cloud Connectors (or simply not use them). Using a CloudFactory to create a DataSource You can disable the Spring Boot autoconfiguration and the Java buildpack auto-reconfiguration by creating your own Cloud instance as a @Bean: @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSource.class, null); } Pros: The Connectors autoconfiguration in Spring Boot backed off so there is only oneDataSource. It can be tweaked using application.properties via spring.datasource.*properties, per the Spring Boot User Guide. Cons: It doesn’t work without VCAP_* environment variables (or some other cloud platform). It also relies on user remembering to ceate the Cloud as a @Bean in order to disable the autoconfiguration. Summary: we are still not in a comfortable place (an app that doesn’t run without some intricate wrangling of environment variables is not much use in practice). Dual Running: Local with H2, in the Cloud with MySQL There is a local configuration file option in Spring Cloud Connectors, so you don’t have to be in a real cloud platform to use them, but it’s awkward to set up despite being boiler plate, and you also have to somehow switch it off when you are in a real cloud platform. The last point there is really the important one because you end up needing a local file to run locally, but only running locally, and it can’t be packaged with the rest of the application code (for instance violates the twelve factor guidelines). So to move forward with our explicit @Bean definition it’s probably better to stick to mainstream Spring and Spring Boot features, e.g. using the “cloud” profile to guard the explicit creation of a DataSource: @Configuration @Profile("cloud") public class DataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSource.class, null); } } With this in place we have a solution that works smoothly both locally and in Cloud Foundry. Locally Spring Boot will create a DataSource with an H2 embedded database. In Cloud Foundry it will bind to a singleton service of type DataSource and switch off the autconfigured one from Spring Boot. It also has the benefit of working with any platform supported by Spring Cloud Connectors, so the same code will run on Heroku and Cloud Foundry, for instance. Because of the @ConfigurationProperties you can bind additional configuration to the DataSource to tweak connection pool properties and things like that if you need to in production. NOTE: We have been using MySQL as an example database server, but actually PostgreSQL is at least as compelling a choice if not more. When paired with H2 locally, for instance, you can put H2 into its “Postgres compatibility” mode and use the same SQL in both environments. Manually Creating a Local and a Cloud DataSource If you like creating DataSource beans, and you want to do it both locally and in the cloud, you could use 2 profiles (“cloud” and “local”), for example. But then you would have to find a way to activate the “local” profile by default when not in the cloud. There is already a way to do that built into Spring because there is always a default profile called “default” (by default). So this should work: @Configuration @Profile("default") // or "!cloud" public class LocalDataSourceConfiguration { @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return DataSourceBuilder.create().build(); } } @Configuration @Profile("cloud") public class CloudDataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSource.class, null); } } The “default” DataSource is actually identical to the autoconfigured one in this simple example, so you wouldn’t do this unless you needed to, e.g. to create a custom concreteDataSource of a type not supported by Spring Boot. You might think it’s all getting a bit complicated, but in fact Spring Boot is not making it any harder, we are just dealing with the consequences of needing to control the DataSource construction in 2 environments. Using a Non-Embedded Database Locally If you don’t want to use H2 or any in-memory database locally, then you can’t really avoid having to configure it (Spring Boot can guess a lot from the URL, but it will need that at least). So at a minimum you need to set some spring.datasource.* properties (the URL for instance). That that isn’t hard to do, and you can easily set different values in different environments using additional profiles, but as soon as you do that you need to switch off the default values when you go into the cloud. To do that you could define thespring.datasource.* properties in a profile-specific file (or document in YAML) for the “default” profile, e.g. application-default.properties, and these will not be used in the “cloud” profile. A Purely Declarative Approach If you prefer not to write Java code, or don’t want to use Spring Cloud Connectors, you might want to try and use Spring Boot autoconfiguration and external properties (or YAML) files for everything. For example Spring Boot creates a DataSource for you if it finds the right stuff on the classpath, and it can be completely controlled through application.properties, including all the granular features on the DataSource that you need in production (like pool sizes and validation queries). So all you need is a way to discover the location and credentials for the service from the environment. The buildpack translates Cloud Foundry VCAP_*environment variables into usable property sources in the Spring Environment. Thus, for instance, a DataSource configuration might look like this: spring.datasource.url: ${cloud.services.mysql.connection.jdbcurl:jdbc:h2:mem:testdb} spring.datasource.username: ${cloud.services.mysql.connection.username:sa} spring.datasource.password: ${cloud.services.mysql.connection.password:} spring.datasource.testOnBorrow: true The “mysql” part of the property names is the service name in Cloud Foundry (so it is set by the user). And of course the same pattern applies to all kinds of services, not just a JDBCDataSource. Generally speaking it is good practice to use external configuration and in particular @ConfigurationProperties since they allow maximum flexibility, for instance to override using System properties or environment variables at runtime. Note: similar features are provided by Spring Boot, which provides vcap.services.*instead of cloud.services.*, so you actually end up with more than one way to do this. However, the JDBC urls are not available from the vcap.services.* properties (non-JDBC services work fine with tthe corresponding vcap.services.*credentials.url). One limitation of this approach is it doesn’t apply if the application needs to configure beans that are not provided by Spring Boot out of the box (e.g. if you need 2 DataSources), in which case you have to write Java code anyway, and may or may not choose to use properties files to parameterize it. Before you try this yourself, though, beware that actually it doesn’t work unless you also disable the buildpack auto-reconfiguration (and Spring Cloud Connectors if they are on the classpath). If you don’t do that, then they create a new DataSource for you and Spring Boot cannot bind it to your properties file. Thus even for this declarative approach, you end up needing an explicit @Bean definition, and you need this part of your “cloud” profile configuration: @Configuration @Profile("cloud") public class CloudDataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } } This is purely to switch off the buildpack auto-reconfiguration (and the Spring Boot autoconfiguration, but that could have been disabled with a properties file entry). Mixed Declarative and Explicit Bean Definition You can also mix the two approaches: declare a single @Bean definition so that you control the construction of the object, but bind additional configuration to it using@ConfigurationProperties (and do the same locally and in Cloud Foundry). Example: @Configuration public class LocalDataSourceConfiguration { @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return DataSourceBuilder.create().build(); } } (where the DataSourceBuilder would be replaced with whatever fancy logic you need for your use case). And the application.properties would be the same as above, with whatever additional properties you need for your production settings. A Third Way: Discover the Credentials and Bind Manually Another approach that lends itself to platform and environment independence is to declare explicit bean definitions for the @ConfigurationProperties beans that Spring Boot uses to bind its autoconfigured connectors. For instance, to set the default values for a DataSourceyou can declare a @Bean of type DataSourceProperties: @Bean @Primary public DataSourceProperties dataSourceProperties() { DataSourceProperties properties = new DataSourceProperties(); properties.setInitialize(false); return properties; } This sets a default value for the “initialize” flag, and allows other properties to be bound fromapplication.properties (or other external properties). Combine this with the Spring Cloud Connectors and you can control the binding of the credentials when a cloud service is detected: @Autowired(required="false") Cloud cloud; @Bean @Primary public DataSourceProperties dataSourceProperties() { DataSourceProperties properties = new DataSourceProperties(); properties.setInitialize(false); if (cloud != null) { List infos = cloud.getServiceInfos(RelationalServiceInfo.class); if (infos.size()==1) { RelationalServiceInfo info = (RelationalServiceInfo) infos.get(0); properties.setUrl(info.getJdbcUrl()); properties.setUsername(info.getUserName()); properties.setPassword(info.getPassword()); } } return properties; } and you still need to define the Cloud bean in the “cloud” profile. It ends up being quite a lot of code, and is quite unnecessary in this simple use case, but might be handy if you have more complicated bindings, or need to implement some logic to choose a DataSource at runtime. Spring Boot has similar *Properties beans for the other middleware you might commonly use (e.g. RabbitProperties, RedisProperties, MongoProperties). An instance of such a bean marked as @Primary is enough to reset the defaults for the autoconfigured connector. Deploying to Multiple Cloud Platforms So far, we have concentrated on Cloud Foundry as the only cloud platform in which to deploy the application. One of the nice features of Spring Cloud Connectors is that it supports other platforms, either out of the box or as extension points. Thespring-boot-starter-cloud-connectors even includes Heroku support. If you do nothing at all, and rely on the autoconfiguration (the lazy programmer’s approach), then your application will be deployable in all clouds where you have a connector on the classpath (i.e. Cloud Foundry and Heroku if you use the starter). If you take the explicit @Bean approach then you need to ensure that the “cloud” profile is active in the non-Cloud Foundry platforms, e.g. through an environment variable. And if you use the purely declarative approach (or any combination involving properties files) you need to activate the “cloud” profile and probably also another profile specific to your platform, so that the right properties files end up in theEnvironment at runtime. Summary of Autoconfiguration and Provided Behaviour Spring Boot provides DataSource (also RabbitMQ or Redis ConnectionFactory, Mongo etc.) if it finds all the right stuff on the classpath. Using the “spring-boot-starter-*” dependencies is sufficient to activate the behaviour. Spring Boot also provides an autowirable CloudFactory if it finds Spring Cloud Connectors on the classpath (but switches off only if it finds a @Bean of type Cloud). The CloudAutoConfiguration in Spring Boot also effectively adds a @CloudScan to your application, which you would want to switch off if you ever needed to create your ownDataSource (or similar). The Cloud Foundry Java buildpack detects a Spring Boot application and activates the “cloud” profile, unless it is already active. Adding the buildpack auto-reconfiguration JAR does the same thing if you want to try it locally. Through the auto-reconfiguration JAR, the buildpack also kicks in and creates aDataSource (ditto RabbitMQ, Redis, Mongo etc.) if it does not find a CloudFactory bean or a Cloud bean (amongst others). So including Spring Cloud Connectors in a Spring Boot application switches off this part of the “auto-reconfiguration” behaviour (the bean creation). Switching off the Spring Boot CloudAutoConfiguration is easy, but if you do that, you have to remember to switch off the buildpack auto-reconfiguration as well if you don’t want it. The only way to do that is to define a bean definition (can be of type Cloud orCloudFactory for instance). Spring Boot binds application.properties (and other sources of external properties) to@ConfigurationProperties beans, including but not limited to the ones that it autoconfigures. You can use this feature to tweak pool properties and other settings that need to be different in production environments. General Advice and Conclusion We have seen quite a few options and autoconfigurations in this short article, and we’ve only really used thee libraries (Spring Boot, Spring Cloud Connectors, and the Cloud Foundry buildpack auto-reconfiguration JAR) and one platform (Cloud Foundry), not counting local deployment. The buildpack features are really only useful for very simple applications because there is no flexibility to tune the connections in production. That said it is a nice thing to be able to do when prototyping. There are only three main approaches if you want to achieve the goal of deploying the same code locally and in the cloud, yet still being able to make necessary tweaks in production: Use Spring Cloud Connectors to explicitly create DataSource and other middleware connections and protect those @Beans with @Profile("cloud"). The approach always works, but leads to more code than you might need for many applications. Use the Spring Boot default autoconfiguration and declare the cloud bindings usingapplication.properties (or in YAML). To take full advantage you have to expliccitly switch off the buildpack auto-reconfiguration as well. Use Spring Cloud Connectors to discover the credentials, and bind them to the Spring Boot@ConfigurationProperties as default values if present. The three approaches are actually not incompatible, and can be mixed using@ConfigurationProperties to provide profile-specific overrides of default configuration (e.g. for setting up connection pools in a different way in a production environment). If you have a relatively simple Spring Boot application, the only way to choose between the approaches is probably personal taste. If you have a non-Spring Boot application then the explicit @Bean approach will win, and it may also win if you plan to deploy your application in more than one cloud platform (e.g. Heroku and Cloud Foundry). NOTE: This blog has been a journey of discovery (who knew there was so much to learn?). Thanks go to all those who helped with reviews and comments, in particularScott Frederick, who spotted most of the mistakes in the drafts and always had time to look at a new revision.

May 6, 2015

by Pieter Humphrey

· 27,065 Views · 2 Likes

Introduction to Probabilistic Data Structures

When processing large data sets, we often want to do some simple checks, such as number of unique items, most frequent items, and whether some items exist in the data set. The common approach is to use some kind of deterministic data structure like HashSet or Hashtable for such purposes. But when the data set we are dealing with becomes very large, such data structures are simply not feasible because the data is too big to fit in the memory. It becomes even more difficult for streaming applications which typically require data to be processed in one pass and perform incremental updates. Probabilistic data structures are a group of data structures that are extremely useful for big data and streaming applications. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of items. Collisions are ignored but errors can be well-controlled under certain threshold. Comparing with error-free approaches, these algorithms use much less memory and have constant query time. They usually support union and intersection operations and therefore can be easily parallelized. This article will introduce three commonly used probabilistic data structures: Bloom filter, HyperLogLog, and Count-Min sketch. Membership Query - Bloom filter A Bloom filter is a bit array of m bits initialized to 0. To add an element, feed it to k hash functions to get k array position and set the bits at these positions to 1. To query an element, feed it to k hash functions to obtain k array positions. If any of the bits at these positions is 0, then the element is definitely not in the set. If the bits are all 1, then the element might be in the set. A Bloom filter with 1% false positive rate only requires 9.6 bits per element regardless of the size of the elements. For example, if we have inserted x, y, z into the bloom filter, with k=3 hash functions like the picture above. Each of these three elements has three bits each set to 1 in the bit array. When we look up for w in the set, because one of the bits is not set to 1, the bloom filter will tell us that it is not in the set. Bloom filter has the following properties: False positive is possible when the queried positions are already set to 1. But false negative is impossible. Query time is O(k). Union and intersection of bloom filters with same size and hash functions can be implemented with bitwise OR and AND operations. Cannot remove an element from the set. Bloom filter requires the following inputs: m: size of the bit array n: estimated insertion p: false positive probability The optimum number of hash functions k can be determined using the formula: Given false positive probability p and the estimated number of insertions n, the length of the bit array can be calculated as: The hash functions used for bloom filter should generally be faster than cryptographic hash algorithms with good distribution and collision resistance. Commonly used hash functions for bloom filter include Murmur hash, fnv series of hashes and Jenkins hashes. Murmur hash is the fastest among them. MurmurHash3 is used by Google Guava library's bloom filter implementation. Cardinality - HyperLogLog HyperLogLog is a streaming algorithm used for estimating the number of distinct elements (the cardinality) of very large data sets. HyperLogLog counter can count one billion distinct items with an accuracy of 2% using only 1.5 KB of memory. It is based on the bit pattern observation that for a stream of randomly distributed numbers, if there is a number x with the maximum of leading 0 bits k, the cardinality of the stream is very likely equal to 2^k. For each element si in the stream, hash function h(si) transforms si into string of random bits (0 or 1 with probability of 1/2): The probability P of the bit patterns: 0xxxx... → P = 1/2 01xxx... → P = 1/4 001xx... → P = 1/8 The intuition is that when we are seeing prefix 0k 1..., it's likely there are n ≥ 2k+1 different strings. By keeping track of prefixes 0k 1... that have appeared in the data stream, we can estimate the cardinality to be 2p, where p is the length of the largest prefix. Because the variance is very high when using single counter, in order to get a better estimation, data is split into m sub-streams using the first few bits of the hash. The counters are maintained by m registers each has memory space of multiple of 4 bytes. If the standard deviation for each sub-stream is σ, then the standard deviation for the averaged value is only σ/√m. This is called stochastic averaging. For instance for m=4, The elements are split into m stream using the first 2 bits (00, 01, 10, 11) which are then discarded. Each of the register stores the rest of the hash bits that contains the largest 0k 1 prefix. The values in the m registers are then averaged to obtain the cardinality estimate. HyperLogLog algorithm uses harmonic mean to normalize result. The algorithm also makes adjustment for small and very large values. The resulting error is equal to 1.04/√m. Each of the m registers uses at most log2log2 n + O(1) bits when cardinalities ≤ n need to be estimated. Union of two HyperLogLog counters can be calculated by first taking the maximum value of the two counters for each of the m registers, and then calculate the estimated cardinality. Frequency - Count-Min Sketch Count-Min sketch is a probabilistic sub-linear space streaming algorithm. It is somewhat similar to bloom filter. The main difference is that bloom filter represents a set as a bitmap, while Count-Min sketch represents a multi-set which keeps a frequency distribution summary. The basic data structure is a two dimensional d x w array of counters with d pairwise independent hash functions h1 ... hd of range w. Given parameters (ε,δ), set w = [e/ε], and d = [ln1/δ]. ε is the accuracy we want to have and δ is the certainty with which we reach the accuracy. The two dimensional array consists of wd counts. To increment the counts, calculate the hash positions with the d hash functions and update the counts at those positions. The estimate of the counts for an item is the minimum value of the counts at the array positions determined by the d hash functions. The space used by Count-Min sketch is the array of w*d counters. By choosing appropriate values for d and w, very small error and high probability can be achieved. Example of Count-Min sketch sizes for different error and probability combination: ε 1 - δ w d wd 0.1 0.9 28 3 84 0.1 0.99 28 5 140 0.1 0.999 28 7 196 0.01 0.9 272 3 816 0.01 0.99 272 5 1360 0.01 0.999 272 7 1940 0.001 0.999 2719 7 19033 Count-Min sketch has the following properties: Union can be performed by cell-wise ADD operation O(k) query time Better accuracy for higher frequency items (heavy hitters) Can only cause over-counting but not under-counting Count-Min sketch can be used for querying single item count or "heavy hitters" which can be obtained by keeping a heap structure of all the counts. Summary Probabilistic data structures have many applications in modern web and data applications where the data arrives in a streaming fashion and needs to be processed on the fly using limited memory. Bloom filter, HyperLogLog, and Count-Min sketch are the most commonly used probabilistic data structures. There are a lot of research on various streaming algorithms, synopsis data structures and optimization techniques that are worth investigating and studying. If you haven't tried these data structures, you will be amazed how powerful they can be once you start using them. It may be a little bit intimidating to understand the concept initially, but the implementation is actually quite simple. Google Guava has Bloom filter implementation using murmur hash. Clearspring's Java library stream-lib and Twitter's Scala library Algebird have implementation for all three data structures and other useful data structures that you can play with. I have included the links below. Links http://bigsnarf.wordpress.com/2013/02/08/probabilistic-data-structures-for-data-analytics/ http://en.wikipedia.org/wiki/Bloom_filter http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf http://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/ http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf http://www.moneyscience.com/pg/blog/ThePracticalQuant/read/438348/realtime-analytics-hokusai-adds-a-temporal-component-to-countmin-sketch http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf https://github.com/addthis/stream-lib https://github.com/twitter/algebird

April 30, 2015

by Yin Niu

· 35,559 Views · 6 Likes

Implementing Filter and Bakery Locks in Java

In order to understand how locks work, implementing custom locks is a good way. This post will show how to implement Filter and Bakery locks at Java (which are spin locks) and will compare their performances with Java's ReentrantLock. Filter and Bakery locks satisfies mutual exclusion and are starvation free algorithms also, Bakery lock is a first-come-first-served lock [1]. For performance testing, a counter value is incremented up to 10000000 with different lock types, different number of threads and different number of times. Test system configuration is: Intel Core I7 (has 8 cores – 4 of them are real), Ubuntu 14.04 LTS and Java 1.7.0_60. Filter lock has n-1 levels which maybe considered as “waiting rooms”. A thread must traverse this waiting rooms before acquiring the lock. There are two important properties for levels [2]: 1) At least one thread trying to enter level l succeeds. 2) If more than one thread is trying to enter level l, then at least one is blocked (i.e., continues to wait at that level). Filter lock is implemented as follows: /** * @author Furkan KAMACI */ public class Filter extends AbstractDummyLock implements Lock { /* Due to Java Memory Model, int[] not used for level and victim variables. Java programming language does not guarantee linearizability, or even sequential consistency, when reading or writing fields of shared objects [The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.61.] */ private AtomicInteger[] level; private AtomicInteger[] victim; private int n; /** * Constructor for Filter lock * * @param n thread count */ public Filter(int n) { this.n = n; level = new AtomicInteger[n]; victim = new AtomicInteger[n]; for (int i = 0; i < n; i++) { level[i] = new AtomicInteger(); victim[i] = new AtomicInteger(); } } /** * Acquires the lock. */ @Override public void lock() { int me = ConcurrencyUtils.getCurrentThreadId(); for (int i = 1; i < n; i++) { level[me].set(i); victim[i].set(me); for (int k = 0; k < n; k++) { while ((k != me) && (level[k].get() >= i && victim[i].get() == me)) { //spin wait } } } } /** * Releases the lock. */ @Override public void unlock() { int me = ConcurrencyUtils.getCurrentThreadId(); level[me].set(0); } } Bakery lock algorithm maintains the first-come-first-served property by using a distributed version of the number-dispensing machines often found in bakeries: each thread takes a number in the doorway, and then waits until no thread with an earlier number is trying to enter it [3]. Bakery lock is implemented as follows: /** * @author Furkan KAMACI */ public class Bakery extends AbstractDummyLock implements Lock { /* Due to Java Memory Model, int[] not used for level and victim variables. Java programming language does not guarantee linearizability, or even sequential consistency, when reading or writing fields of shared objects [The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.61.] */ private AtomicBoolean[] flag; private AtomicInteger[] label; private int n; /** * Constructor for Bakery lock * * @param n thread count */ public Bakery(int n) { this.n = n; flag = new AtomicBoolean[n]; label = new AtomicInteger[n]; for (int i = 0; i < n; i++) { flag[i] = new AtomicBoolean(); label[i] = new AtomicInteger(); } } /** * Acquires the lock. */ @Override public void lock() { int i = ConcurrencyUtils.getCurrentThreadId(); flag[i].set(true); label[i].set(findMaximumElement(label) + 1); for (int k = 0; k < n; k++) { while ((k != i) && flag[k].get() && ((label[k].get() < label[i].get()) || ((label[k].get() == label[i].get()) && k < i))) { //spin wait } } } /** * Releases the lock. */ @Override public void unlock() { flag[ConcurrencyUtils.getCurrentThreadId()].set(false); } /** * Finds maximum element within and {@link java.util.concurrent.atomic.AtomicInteger} array * * @param elementArray element array * @return maximum element */ private int findMaximumElement(AtomicInteger[] elementArray) { int maxValue = Integer.MIN_VALUE; for (AtomicInteger element : elementArray) { if (element.get() > maxValue) { maxValue = element.get(); } } return maxValue; } } For such kind of algorithms, it should be provided or used a thread id system which starts from 0 or 1 and increments one by one. Threads' names set appropriately for that purpose. It should also be considererd that: Java programming language does not guarantee linearizability, or even sequential consistency, when reading or writing fields of shared objects [4]. So, level and victim variables for Filter lock, flag and label variables for Bakery lock defined as atomic variables. For one, who wants to test effects of Java Memory Model can change that variables into int[] and boolean[] and run algorithm with more than 2 threads. Than, can see that algorithm will hang for either Filter or Bakery even threads are alive. To test algorithm performances, a custom counter class implemented which has a getAndIncrement method as follows: /** * gets and increments value up to a maximum number * * @return value before increment if it didn't exceed a defined maximum number. Otherwise returns maximum number. */ public long getAndIncrement() { long temp; lock.lock(); try { if (value >= maxNumber) { return value; } temp = value; value = temp + 1; } finally { lock.unlock(); } return temp; } There is a maximum number barrier to fairly test multiple application configurations. Consideration is that: there is a piece amount of work (incrementing a variable up to a desired number) and with different number of threads how fast you can finish it. So, for comparison, there should be a “job” equality. This approach also tests unnecessary work load with that piece of code: if (value >= maxNumber) { return value; } for multiple threads when it is compared an approach that calculating unit work performance of threads (i.e. does not putting a maximum barrier, iterating in a loop up to a maximum number and than dividing last value to thread number). This configuration used for performance comparison: Threads 1,2,3,4,5,6,7,8 Retry Count 20 Maximum Number 10000000 This is the chart of results which includes standard errors: First of all, when you run a block of code within Java several time, there is an internal optimization for codes. When algorithm is run multiple times and first output compared to second output this optimization's effect can be seen. First elapsed time mostly should be greater than second line because of that. For example: currentTry = 0, threadCount = 1, maxNumber = 10000000, lockType = FILTER, elapsedTime = 500 (ms) currentTry = 1, threadCount = 1, maxNumber = 10000000, lockType = FILTER, elapsedTime = 433 (ms) Conclusion: From the chart, it can bee seen that Bakery lock is faster than Filter Lock with a low standard error. Reason is Filter Lock's lock method. At Bakery Lock, as a faired approach threads runs one by one but at Filter Lock they computes with each other. Java's ReentrantLock has best when compared to others. On the other hand Filter Lock gets worse linearly but Bakery and ReentrantLock are not (Filter lock may have a linear graphic when it run with much more threads). More thread count does not mean less elapsed time. 2 threads maybe worse than 1 thread because of thread creating and locking/unlocking. When thread count starts to increase, elapsed time gets better for Bakery and ReentrantLock. However when thread count keep going to increase than it gets worse. Reason is real core number of the test computer which runs algorithms. Source code for implementing filter and bakery locks in Java can be downloaded from here: https://github.com/kamaci/filbak [1] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.31.-33. [2] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.28. [3] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.31. [4] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.61.

April 28, 2015

by Furkan Kamaci

· 9,185 Views · 2 Likes

uniVocity-parsers: A powerful CSV/TSV/Fixed-width file parser library for Java

uniVocity-parsers is an open-source project CSV/TSV/Fixed-width file parser library in Java, providing many capabilities to read/write files with simplified API, and powerful features as shown below. Unlike other libraries out there, uniVocity-parsers built its own architecture for parsing text files, which focuses on maximum performance and flexibility while making it easy to extend and build new parsers. Contents Overview Installation Features Overview Reading CSV/TSV/Fixed-width Files Writing CSV/TSV/Fixed-width Files Performance and Flexibility Design and Implementations 1. Overview I'm a Java developer working on a web-based system to evaluate telecommunication carriers' network and work out reports. In the system, the CSV format was heavily involved for the network-related data, such as real-time network status (online/offline) for the broadband subscribers, and real-time traffic for each subscriber. Generally the size of a single CSV file would exceed 1GB, with millions of rows included. And we were using the library JavaCSV as the CSV file parser. As growth in the capacity of carriers' network and the time duration our system monitors, the size of data in CSV increased so much. My team and I have to work out a solution to achieve better performance (even in seconds) in CSV files processing, and better extendability to provide much more customized functionality. We came across this library uniVocity-parsers as a final solution after a lot of testing and analysis, and we found it great. In addition of better performance and extendability, the library provides developers with simplified APIs, detailed documents & tutorials and commercial support for highly customized functionality. This project is hosted at Github with 62 stars & 8 forks (at the time of writing). Tremendous documents & tutorials are provided at here and here. You can find more examples and news here as well. In addition, the well-known open-source project Apache Camel integrates uniVocity-parsers for reading and writing CSV/TSV/Fixed-width files. Find more details here. 2. Installation I'm using version 1.5.1 , but refer to the official download page to see if there's a more recent version available. The project is also available in the maven central repository, so you can add this to your pom.xml: com.univocity univocity-parsers 1.5.1 3. Features Overview uniVocity-parsers provides a list of powerful features, which can fulfill all requirements you might have for processing tabular presentations of data. Check the following overview chart for the features: 4. Reading Tabular Presentations Data Read all rows of a csv CsvParser parser = new CsvParser(new CsvParserSettings()); List allRows = parser.parseAll(getReader("/examples/example.csv")); For full list of demos in reading features, refer to: https://github.com/uniVocity/univocity-parsers#reading-csv 5. Writing Tabular Presentations Data Write data in CSV format with just 2 lines of code: List rows = someMethodToCreateRows(); CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings()); writer.writeRowsAndClose(rows); For full list of demos in writing features, refer to: https://github.com/uniVocity/univocity-parsers/blob/master/README.md#writing 6. Performance and Flexibility Here is the performance comparison we tested for uniVocity-parsers and JavaCSV in our system: File size Duration for JavaCSV parsing Duration for uniVocity-parsers parsing 10MB, 145453 rows 1138ms 836ms 100MB, 809008 rows 23s 6s 434MB, 4499959 rows 91s 28s 1GB, 23803502 rows 245s 70s Here are some performance comparison tables for almost all CSV parsers libraries in existence. And you can find that uniVocity-parsers got significantly ahead of other libraries in performance. uniVocity-parsers achieved its purpose in performance and flexibility with the following mechanisms: Read input on separate thread (enable by invoking CsvParserSettings.setReadInputOnSeparateThread()) Concurrent row processor (refer to ConcurrentRowProcessor which implements RowProcessor) Extend ColumnProcessor to process columns with your own business logic Extend RowProcessor to read rows with your own business logic 7. Design and Implementations A bunch of processors in uniVocity-parsers are core modules, which are responsible for reading/writing data in rows and columns, and execute data conversions. Here is the diagram of processors: You can create your own processors easily by implementing the RowProcessor interface or extending the provided implementations. In the following example I simply used an anonymous class: CsvParserSettings settings = new CsvParserSettings(); settings.setRowProcessor(new RowProcessor() { /** * initialize whatever you need before processing the first row, with your own business logic **/ @Override public void processStarted(ParsingContext context) { System.out.println("Started to process rows of data."); } /** * process the row with your own business logic **/ StringBuilder stringBuilder = new StringBuilder(); @Override public void rowProcessed(String[] row, ParsingContext context) { System.out.println("The row in line #" + context.currentLine() + ": "); for (String col : row) { stringBuilder.append(col).append("\t"); } } /** * After all rows were processed, perform any cleanup you need **/ @Override public void processEnded(ParsingContext context) { System.out.println("Finished processing rows of data."); System.out.println(stringBuilder); } }); CsvParser parser = new CsvParser(settings); List allRows = parser.parseAll(new FileReader("/myFile.csv")); The library offers a whole lot more features. I recommend you to have a look as it really made a difference in our project.

April 27, 2015

by Jerry Joe

· 9,309 Views

Using Apache Kafka for Integration and Data Processing Pipelines with Spring

written by josh long on the spring blog applications generated more and more data than ever before and a huge part of the challenge - before it can even be analyzed - is accommodating the load in the first place. apache’s kafka meets this challenge. it was originally designed by linkedin and subsequently open-sourced in 2011. the project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. the design is heavily influenced by transaction logs. it is a messaging system, similar to traditional messaging systems like rabbitmq, activemq, mqseries, but it’s ideal for log aggregation, persistent messaging, fast (_hundreds_ of megabytes per second!) reads and writes, and can accommodate numerous clients. naturally, this makes it perfect for cloud-scale architectures! kafka powers many large production systems . linkedin uses it for activity data and operational metrics to power the linkedin news feed, and linkedin today, as well as offline analytics going into hadoop. twitter uses it as part of their stream-processing infrastructure. kafka powers online-to-online and online-to-offline messaging at foursquare. it is used to integrate foursquare monitoring and production systems with hadoop-based offline infrastructures. square uses kafka as a bus to move all system events through square’s various data centers. this includes metrics, logs, custom events, and so on. on the consumer side, it outputs into splunk, graphite, or esper-like real-time alerting. netflix uses it for 300-600bn messages per day. it’s also used by airbnb, mozilla, goldman sachs, tumblr, yahoo, paypal, coursera, urban airship, hotels.com, and a seemingly endless list of other big-web stars. clearly, it’s earning its keep in some powerful systems! installing apache kafka there are many different ways to get apache kafka installed. if you’re on osx, and you’re using homebrew, it can be as simple as brew install kafka . you can also download the latest distribution from apache . i downloaded kafka_2.10-0.8.2.1.tgz , unzipped it, and then within you’ll find there’s a distribution of apache zookeeper as well as kafka, so nothing else is required. i installed apache kafka in my $home directory, under another directory, bin , then i created an environment variable, kafka_home , that points to $home/bin/kafka . start apache zookeeper first, specifying where the configuration properties file it requires is: $kafka_home/bin/zookeeper-server-start.sh $kafka_home/config/zookeeper.properties the apache kafka distribution comes with default configuration files for both zookeeper and kafka, which makes getting started easy. you will in more advanced use cases need to customize these files. then start apache kafka. it too requires a configuration file, like this: $kafka_home/bin/kafka-server-start.sh $kafka_home/config/server.properties the server.properties file contains, among other things, default values for where to connect to apache zookeeper ( zookeeper.connect ), how much data should be sent across sockets, how many partitions there are by default, and the broker id ( broker.id - which must be unique across a cluster). there are other scripts in the same directory that can be used to send and receive dummy data, very handy in establishing that everything’s up and running! now that apache kafka is up and running, let’s look at working with apache kafka from our application. some high level concepts.. a kafka broker cluster consists of one or more servers where each may have one or more broker processes running. apache kafka is designed to be highly available; there are no master nodes. all nodes are interchangeable. data is replicated from one node to another to ensure that it is still available in the event of a failure. in kafka, a topic is a category, similar to a jms destination or both an amqp exchange and queue. topics are partitioned, and the choice of which of a topic’s partition a message should be sent to is made by the message producer. each message in the partition is assigned a unique sequenced id, its offset . more partitions allow greater parallelism for consumption, but this will also result in more files across the brokers. producers send messages to apache kafka broker topics and specify the partition to use for every message they produce. message production may be synchronous or asynchronous. producers also specify what sort of replication guarantees they want. consumers listen for messages on topics and process the feed of published messages. as you’d expect if you’ve used other messaging systems, this is usually (and usefully!) asynchronous. like spring xd and numerous other distributed system, apache kafka uses apache zookeeper to coordinate cluster information. apache zookeeper provides a shared hierarchical namespace (called znodes ) that nodes can share to understand cluster topology and availability (yet another reason that spring cloud has forthcoming support for it..). zookeeper is very present in your interactions with apache kafka. apache kafka has, for example, two different apis for acting as a consumer. the higher level api is simpler to get started with and it handles all the nuances of handling partitioning and so on. it will need a reference to a zookeeper instance to keep the coordination state. let’s turn now turn to using apache kafka with spring. using apache kafka with spring integration the recently released apache kafka 1.1 spring integration adapter is very powerful, and provides inbound adapters for working with both the lower level apache kafka api as well as the higher level api. the adapter, currently, is xml-configuration first, though work is already underway on a spring integration java configuration dsl for the adapter and milestones are available. we’ll look at both here, now. to make all these examples work, i added the libs-milestone-local maven repository and used the following dependencies: org.apache.kafka:kafka_2.10:0.8.1.1 org.springframework.boot:spring-boot-starter-integration:1.2.3.release org.springframework.boot:spring-boot-starter:1.2.3.release org.springframework.integration:spring-integration-kafka:1.1.1.release org.springframework.integration:spring-integration-java-dsl:1.1.0.m1 using the spring integration apache kafka with the spring integration xml dsl first, let’s look at how to use the spring integration outbound adapter to send message instances from a spring integration flow to an external apache kafka instance. the example is fairly straightforward: a spring integration channel named inputtokafka acts as a conduit that forwards message messages to the outbound adapter, kafkaoutboundchanneladapter . the adapter itself can take its configuration from the defaults specified in the kafka:producer-context element or it from the adapter-local configuration overrides. there may be one or many configurations in a given kafka:producer-context element. here’s the java code from a spring boot application to trigger message sends using the outbound adapter by sending messages into the incoming inputtokafka messagechannel . package xml; import org.apache.commons.logging.log; import org.apache.commons.logging.logfactory; import org.springframework.beans.factory.annotation.qualifier; import org.springframework.boot.commandlinerunner; import org.springframework.boot.springapplication; import org.springframework.boot.autoconfigure.springbootapplication; import org.springframework.context.annotation.bean; import org.springframework.context.annotation.dependson; import org.springframework.context.annotation.importresource; import org.springframework.integration.config.enableintegration; import org.springframework.messaging.messagechannel; import org.springframework.messaging.support.genericmessage; @springbootapplication @enableintegration @importresource("/xml/outbound-kafka-integration.xml") public class demoapplication { private log log = logfactory.getlog(getclass()); @bean @dependson("kafkaoutboundchanneladapter") commandlinerunner kickoff(@qualifier("inputtokafka") messagechannel in) { return args -> { for (int i = 0; i < 1000; i++) { in.send(new genericmessage<>("#" + i)); log.info("sending message #" + i); } }; } public static void main(string args[]) { springapplication.run(demoapplication.class, args); } } using the new apache kafka spring integration java configuration dsl shortly after the spring integration 1.1 release, spring integration rockstar artem bilan got to work on adding a spring integration java configuration dsl analog and the result is a thing of beauty! it’s not yet ga (you need to add the libs-milestone repository for now), but i encourage you to try it out and kick the tires. it’s working well for me and the spring integration team are always keen on getting early feedback whenever possible! here’s an example that demonstrates both sending messages and consuming them from two different integrationflow s. the producer is similar to the example xml above. new in this example is the polling consumer. it is batch-centric, and will pull down all the messages it sees at a fixed interval. in our code, the message received will be a map that contains as its keys the topic and as its value another map with the partition id and the batch (in this case, of 10 records), of records read. there is a messagelistenercontainer -based alternative that processes messages as they come. package jc; import org.apache.commons.logging.log; import org.apache.commons.logging.logfactory; import org.springframework.beans.factory.annotation.autowired; import org.springframework.beans.factory.annotation.qualifier; import org.springframework.beans.factory.annotation.value; import org.springframework.boot.commandlinerunner; import org.springframework.boot.springapplication; import org.springframework.boot.autoconfigure.springbootapplication; import org.springframework.context.annotation.bean; import org.springframework.context.annotation.configuration; import org.springframework.context.annotation.dependson; import org.springframework.integration.integrationmessageheaderaccessor; import org.springframework.integration.config.enableintegration; import org.springframework.integration.dsl.integrationflow; import org.springframework.integration.dsl.integrationflows; import org.springframework.integration.dsl.sourcepollingchanneladapterspec; import org.springframework.integration.dsl.kafka.kafka; import org.springframework.integration.dsl.kafka.kafkahighlevelconsumermessagesourcespec; import org.springframework.integration.dsl.kafka.kafkaproducermessagehandlerspec; import org.springframework.integration.dsl.support.consumer; import org.springframework.integration.kafka.support.zookeeperconnect; import org.springframework.messaging.messagechannel; import org.springframework.messaging.support.genericmessage; import org.springframework.stereotype.component; import java.util.list; import java.util.map; /** * demonstrates using the spring integration apache kafka java configuration dsl. * thanks to spring integration ninja artem bilan * for getting the java configuration dsl working so quickly! * * @author josh long */ @enableintegration @springbootapplication public class demoapplication { public static final string test_topic_id = "event-stream"; @component public static class kafkaconfig { @value("${kafka.topic:" + test_topic_id + "}") private string topic; @value("${kafka.address:localhost:9092}") private string brokeraddress; @value("${zookeeper.address:localhost:2181}") private string zookeeperaddress; kafkaconfig() { } public kafkaconfig(string t, string b, string zk) { this.topic = t; this.brokeraddress = b; this.zookeeperaddress = zk; } public string gettopic() { return topic; } public string getbrokeraddress() { return brokeraddress; } public string getzookeeperaddress() { return zookeeperaddress; } } @configuration public static class producerconfiguration { @autowired private kafkaconfig kafkaconfig; private static final string outbound_id = "outbound"; private log log = logfactory.getlog(getclass()); @bean @dependson(outbound_id) commandlinerunner kickoff( @qualifier(outbound_id + ".input") messagechannel in) { return args -> { for (int i = 0; i < 1000; i++) { in.send(new genericmessage<>("#" + i)); log.info("sending message #" + i); } }; } @bean(name = outbound_id) integrationflow producer() { log.info("starting producer flow.."); return flowdefinition -> { consumer spec = (kafkaproducermessagehandlerspec.producermetadataspec metadata)-> metadata.async(true) .batchnummessages(10) .valueclasstype(string.class) .valueencoder(string::getbytes); kafkaproducermessagehandlerspec messagehandlerspec = kafka.outboundchanneladapter( props -> props.put("queue.buffering.max.ms", "15000")) .messagekey(m -> m.getheaders().get(integrationmessageheaderaccessor.sequence_number)) .addproducer(this.kafkaconfig.gettopic(), this.kafkaconfig.getbrokeraddress(), spec); flowdefinition .handle(messagehandlerspec); }; } } @configuration public static class consumerconfiguration { @autowired private kafkaconfig kafkaconfig; private log log = logfactory.getlog(getclass()); @bean integrationflow consumer() { log.info("starting consumer.."); kafkahighlevelconsumermessagesourcespec messagesourcespec = kafka.inboundchanneladapter( new zookeeperconnect(this.kafkaconfig.getzookeeperaddress())) .consumerproperties(props -> props.put("auto.offset.reset", "smallest") .put("auto.commit.interval.ms", "100")) .addconsumer("mygroup", metadata -> metadata.consumertimeout(100) .topicstreammap(m -> m.put(this.kafkaconfig.gettopic(), 1)) .maxmessages(10) .valuedecoder(string::new)); consumer endpointconfigurer = e -> e.poller(p -> p.fixeddelay(100)); return integrationflows .from(messagesourcespec, endpointconfigurer) .>>handle((payload, headers) -> { payload.entryset().foreach(e -> log.info(e.getkey() + '=' + e.getvalue())); return null; }) .get(); } } public static void main(string[] args) { springapplication.run(demoapplication.class, args); } } the example makes heavy use of java 8 lambdas. the producer spends a bit of time establishing how many messages will be sent in a single send operation, how keys and values are encoded (kafka only knows about byte[] arrays, after all) and whether messages should be sent synchronously or asynchronously. in the next line, we configure the outbound adapter itself and then define an integrationflow such that all messages get sent out via the kafka outbound adapter. the consumer spends a bit of time establishing which zookeeper instance to connect to, how many messages to receive (10) in a batch, etc. once the message batches are recieved, they’re handed to the handle method where i’ve passed in a lambda that’ll enumerate the payload’s body and print it out. nothing fancy. using apache kafka with spring xd apache kafka is a message bus and it can be very powerful when used as an integration bus. however, it really comes into its own because it’s fast enough and scalable enough that it can be used to route big-data through processing pipelines. and if you’re doing data processing, you really want spring xd ! spring xd makes it dead simple to use apache kafka (as the support is built on the apache kafka spring integration adapter!) in complex stream-processing pipelines. apache kafka is exposed as a spring xd source - where data comes from - and a sink - where data goes to. spring xd exposes a super convenient dsl for creating bash -like pipes-and-filter flows. spring xd is a centralized runtime that manages, scales, and monitors data processing jobs. it builds on top of spring integration, spring batch, spring data and spring for hadoop to be a one-stop data-processing shop. spring xd jobs read data from sources , run them through processing components that may count, filter, enrich or transform the data, and then write them to sinks. spring integration and spring xd ninja marius bogoevici , who did a lot of the recent work in the spring integration and spring xd implementation of apache kafka, put together a really nice example demonstrating how to get a full working spring xd and kafka flow working . the readme walks you through getting apache kafka, spring xd and the requisite topics all setup. the essence, however, is when you use the spring xd shell and the shell dsl to compose a stream. spring xd components are named components that are pre-configured but have lots of parameters that you can override with --.. arguments via the xd shell and dsl. (that dsl, by the way, is written by the amazing andy clement of spring expression language fame!) here’s an example that configures a stream to read data from an apache kafka source and then write the message a component called log , which is a sink. log , in this case, could be syslogd, splunk, hdfs, etc. xd> stream create kafka-source-test --definition "kafka --zkconnect=localhost:2181 --topic=event-stream | log"--deploy and that’s it! naturally, this is just a tase of spring xd, but hopefully you’ll agree the possibilities are tantalizing. deploying a kafka server with lattice and docker it’s easy to get an example kafka installation all setup using lattice , a distributed runtime that supports, among other container formats, the very popular docker image format. there’s a docker image provided by spotify that sets up a collocated zookeeper and kafka image . you can easily deploy this to a lattice cluster, as follows: ltc create --run-as-root m-kafka spotify/kafka from there, you can easily scale the apache kafka instances and even more easily still consume apache kafka from your cloud-based services. next steps you can find the code for this blog on my github account . we’ve only scratched the surface! if you want to learn more (and why wouldn’t you?), then be sure to check out marius bogoevici and dr. mark pollack’s upcoming webinar on reactive data-pipelines using spring xd and apache kafka where they’ll demonstrate how easy it can be to use rxjava, spring xd and apache kafka!

April 18, 2015

by Pieter Humphrey

· 29,099 Views

Using Multiple Grok Statements to Parse a Java Stack Trace

Parse your Java stack trace log information with the Logstash tool.

April 14, 2015

by Bipin Patwardhan

· 77,945 Views · 6 Likes

MQ Trace on Java MQ Clients

I find it very difficult to debug the Java client application written for MQ. Most of the error messages are self explanatory but some of the error messages are difficult to diagnose and find the cause. Enable tracing would really help to identify the root of the problem. In this post, we will see how to enable tracing on Java applications. MQ Trace Simple MQ Trace enabling can be done by adding a JVM parameter like this java -Dcom.ibm.mq.commonservices=trace.properties ... Create trace.properties with the following attributes Diagnostics.MQ=enabled Diagnostics.Java=explorer,wmqjavaclasses,all Diagnostics.Java.Trace.Detail=high Diagnostics.Java.Trace.Destination.File=enabled Diagnostics.Java.Trace.Destination.Console=disabled Diagnostics.Java.Trace.Destination.Pathname=/tmp/trace Diagnostics.Java.FFDC.Destination.Pathname=/tmp/FFDC Diagnostics.Java.Errors.Destination.Filename=/tmp/errors/AMQJERR.LOG oints to be noted Add libraries of IBM JRE (From the directory $JRE_HOME/lib) to CLASSPATH Create the directories /tmp/FFDC, /tmp/trace and /tmp/errors Add read permission to trace.properties Keep an eye on the log file size and make sure to disable logging when not required. Otherwise, it will keep filling the disk space. (Logs size will become some gigs in few hours easily) One you start the Java Client, it will start printing the trace into the mentioned files. If you want to disable the tracing, make the Diagnostics.MQ property value to disabled so that trace will be stopped. JSSE Debug In case of secure connection with MQ, if you want to enable debug for JSSE, then you can use the JVM parameter java.net.debug with different values -Djavax.net.debug=true This prints full trace of JSSE, this can be limited to only handshake by changing the value of the parameter to ssl:handshake -Djavax.net.debug=ssl:handshake This prints only the trace related to handshake, all other trace will be simply ignored. For more articles read my blog . Happy Learning!!!!

April 14, 2015

by Veeresham Kardas

· 7,355 Views

Mockito & DBUnit: Implementing a Mocking Structure Focused and Independent to Automated Tests on Java

On this post, we will make a hands-on about Mockito and DBUnit, two libraries from Java's open source ecosystem which can help us in improving our JUnit tests on focus and independence. But why mocking is so important on our unit tests? Focusing the tests Let's imagine a Java back-end application with a tier-like architecture. On this application, we could have 2 tiers: The service tier, which have the business rules and make as a interface for the front-end; The entity tier, which have the logic responsible for making calls to a database, utilizing techonologies like JDBC or JPA; Of course, on a architecture of this kind, we will have the following dependence of our tiers: Service >>> Entity On this kind of architecture, the most common way of building our automated tests is by creating JUnit Test Classes which test each tier independently, thus we can make running tests that reflect only the correctness of the tier we want to test. However, if we simply create the classes without any mocking, we will got problems like the following: On the JUnit tests of our service tier, for example, if we have a problem on the entity tier, we will have also our tests failed, because the error from the entity tier will reverberate across the tiers; If we have a project where different teams are working on the same system, and one team is responsible for the construction of the service tier, while the other is responsible for the construction of the entity tier, we will have a dependency of one team with the other before the tests could be made; To resolve such issues, we could mock the entity tier on the service tier's unit test classes, so we can have independence and focus of our tests on the service tier, which it belongs. independence One point that it is specially important when we make our JUnit test classes in the independence department is the entity tier. Since in our example this tier is focused in the connection and running of SQL commands on a database, it makes a break on our independence goal, since we will need a database so we can run our tests. Not only that, if a test breaks any structure that it is used by the subsequent tests, all of them will also fail. It is on this point that enters our other library, DBUnit. With DBUnit, we can use embedded databases, such as HSQLDB, to make our database exclusive to the running of our tests. So, without further delay, let's begin our hands-on! Hands-on For this lab, we will create a basic CRUD for a Client entity. The structure will follow the simple example we talked about previously, with the DAO (entity) and Service tiers. We will use DBUnit and JUnit to test the DAO tier, and Mockito with JUnit to test the Service tier. First, let's create a Maven project, without any archetype and include the following dependencies on pom.xml: . . . junit junit 4.12 org.dbunit dbunit 2.5.0 org.mockito mockito-all 1.10.19 org.hibernate hibernate-entitymanager 4.3.8.Final org.hsqldb hsqldb 2.3.2 org.springframework spring-core 4.1.4.RELEASE org.springframework spring-context 4.1.5.RELEASE org.springframework spring-test 4.1.5.RELEASE org.springframework spring-tx 4.1.5.RELEASE org.springframework spring-orm 4.1.5.RELEASE . . . On the previous snapshot, we included not only the Mockito, DBUnit and JUnit libraries, but we also included Hibernate to implement the persistence layer and Spring 4 to use the IoC container and the transaction management. We also included the Spring Test library, which includes some features that we will use later on this lab. Finally, to simplify the setup and remove the need of installing a database to run the code, we will use HSQLDB as our database. Our lab will have the following structure: One class will represent the application itself, as a standalone class, where we will consume the tiers, like a real application would do; We will have another 2 classes, each one with JUnit tests, that will test each tier independently; First, we define a persistence unit, where we define the name of the unit and the properties to make Hibernate create the table for us and populate her with some initial rows. The code of the persistence.xml can be seen bellow: com.alexandreesl.handson.model.Client And the initial data to populate the table can be seen bellow: insert into Client(id,name,sex, phone) values (1,'Alexandre Eleuterio Santos Lourenco','M','22323456'); insert into Client(id,name,sex, phone) values (2,'Lucebiane Santos Lourenco','F','22323876'); insert into Client(id,name,sex, phone) values (3,'Maria Odete dos Santos Lourenco','F','22309456'); insert into Client(id,name,sex, phone) values (4,'Eleuterio da Silva Lourenco','M','22323956'); insert into Client(id,name,sex, phone) values (5,'Ana Carolina Fernandes do Sim','F','22123456'); In order to not making the post burdensome, we will not discuss the project structure during the lab, but just show the final structure at the end. The code can be found on a Github repository, at the end of the post. With the persistence unit defined, we can start coding! First, we create the entity class: package com.alexandreesl.handson.model; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.Id; import javax.persistence.Table; @Table(name = "Client") @Entity public class Client { @Id private long id; @Column(name = "name", nullable = false, length = 50) private String name; @Column(name = "sex", nullable = false) private String sex; @Column(name = "phone", nullable = false) private long phone; public long getId() { return id; } public void setId(long id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getSex() { return sex; } public void setSex(String sex) { this.sex = sex; } public long getPhone() { return phone; } public void setPhone(long phone) { this.phone = phone; } } In order to create the persistence-related beans to enable Hibernate and the transaction manager, alongside all the rest of the beans necessary for the application, we use a Java-based Spring configuration class. The code of the class can be seen bellow: package com.alexandreesl.handson.core; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.jdbc.datasource.DriverManagerDataSource; import org.springframework.orm.jpa.JpaTransactionManager; import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean; import org.springframework.orm.jpa.vendor.Database; import org.springframework.orm.jpa.vendor.HibernateJpaDialect; import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter; import org.springframework.transaction.annotation.EnableTransactionManagement; @Configuration @EnableTransactionManagement @ComponentScan({ "com.alexandreesl.handson.dao", "com.alexandreesl.handson.service" }) public class AppConfiguration { @Bean public DriverManagerDataSource dataSource() { DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName("org.hsqldb.jdbcDriver"); dataSource.setUrl("jdbc:hsqldb:mem://standalone"); dataSource.setUsername("sa"); dataSource.setPassword(""); return dataSource; } @Bean public JpaTransactionManager transactionManager() { JpaTransactionManager transactionManager = new JpaTransactionManager(); transactionManager.setEntityManagerFactory(entityManagerFactory() .getNativeEntityManagerFactory()); transactionManager.setDataSource(dataSource()); transactionManager.setJpaDialect(jpaDialect()); return transactionManager; } @Bean public HibernateJpaDialect jpaDialect() { return new HibernateJpaDialect(); } @Bean public HibernateJpaVendorAdapter jpaVendorAdapter() { HibernateJpaVendorAdapter jpaVendor = new HibernateJpaVendorAdapter(); jpaVendor.setDatabase(Database.HSQL); jpaVendor.setDatabasePlatform("org.hibernate.dialect.HSQLDialect"); return jpaVendor; } @Bean public LocalContainerEntityManagerFactoryBean entityManagerFactory() { LocalContainerEntityManagerFactoryBean entityManagerFactory = new LocalContainerEntityManagerFactoryBean(); entityManagerFactory .setPersistenceXmlLocation("classpath:META-INF/persistence.xml"); entityManagerFactory.setPersistenceUnitName("persistence"); entityManagerFactory.setDataSource(dataSource()); entityManagerFactory.setJpaVendorAdapter(jpaVendorAdapter()); entityManagerFactory.setJpaDialect(jpaDialect()); return entityManagerFactory; } } And finally, we create the classes that represent the tiers itself. This is the DAO class: package com.alexandreesl.handson.dao; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import org.springframework.stereotype.Component; import org.springframework.transaction.annotation.Transactional; import com.alexandreesl.handson.model.Client; @Component public class ClientDAO { @PersistenceContext private EntityManager entityManager; @Transactional(readOnly = true) public Client find(long id) { return entityManager.find(Client.class, id); } @Transactional public void create(Client client) { entityManager.persist(client); } @Transactional public void update(Client client) { entityManager.merge(client); } @Transactional public void delete(Client client) { entityManager.remove(client); } } And this is the service class: package com.alexandreesl.handson.service; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Component; import com.alexandreesl.handson.dao.ClientDAO; import com.alexandreesl.handson.model.Client; @Component public class ClientService { @Autowired private ClientDAO clientDAO; public ClientDAO getClientDAO() { return clientDAO; } public void setClientDAO(ClientDAO clientDAO) { this.clientDAO = clientDAO; } public Client find(long id) { return clientDAO.find(id); } public void create(Client client) { clientDAO.create(client); } public void update(Client client) { clientDAO.update(client); } public void delete(Client client) { clientDAO.delete(client); } } The reader may notice that we created a getter/setter to the DAO class on the Service class. This is not necessary for the Spring injection, but we made this way to get easier to change the real DAO by a Mockito's mock on the tests class. Finally, we code the class we talked about previously, the one that consume the tiers: package com.alexandreesl.handson.core; import org.springframework.context.ApplicationContext; import org.springframework.context.annotation.AnnotationConfigApplicationContext; import com.alexandreesl.handson.model.Client; import com.alexandreesl.handson.service.ClientService; public class App { public static void main(String[] args) { ApplicationContext context = new AnnotationConfigApplicationContext( AppConfiguration.class); ClientService service = (ClientService) context .getBean(ClientService.class); System.out.println(service.find(1).getName()); System.out.println(service.find(3).getName()); System.out.println(service.find(5).getName()); Client client = new Client(); client.setId(6); client.setName("Celina do Sim"); client.setPhone(44657688); client.setSex("F"); service.create(client); System.out.println(service.find(6).getName()); System.exit(0); } } If we run the class, we can see that the console print all the clients we searched for and that Hibernate is initialized properly, proving our implementation is a success: Mar 28, 2015 1:09:22 PM org.springframework.context.annotation.AnnotationConfigApplicationContext prepareRefresh INFO: Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6433a2: startup date [Sat Mar 28 13:09:22 BRT 2015]; root of context hierarchy Mar 28, 2015 1:09:22 PM org.springframework.jdbc.datasource.DriverManagerDataSource setDriverClassName INFO: Loaded JDBC driver: org.hsqldb.jdbcDriver Mar 28, 2015 1:09:22 PM org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean createNativeEntityManagerFactory INFO: Building JPA container EntityManagerFactory for persistence unit 'persistence' Mar 28, 2015 1:09:22 PM org.hibernate.jpa.internal.util.LogHelper logPersistenceUnitInformation INFO: HHH000204: Processing PersistenceUnitInfo [ name: persistence ...] Mar 28, 2015 1:09:22 PM org.hibernate.Version logVersion INFO: HHH000412: Hibernate Core {4.3.8.Final} Mar 28, 2015 1:09:22 PM org.hibernate.cfg.Environment INFO: HHH000206: hibernate.properties not found Mar 28, 2015 1:09:22 PM org.hibernate.cfg.Environment buildBytecodeProvider INFO: HHH000021: Bytecode provider name : javassist Mar 28, 2015 1:09:22 PM org.hibernate.annotations.common.reflection.java.JavaReflectionManager INFO: HCANN000001: Hibernate Commons Annotations {4.0.5.Final} Mar 28, 2015 1:09:23 PM org.hibernate.dialect.Dialect INFO: HHH000400: Using dialect: org.hibernate.dialect.HSQLDialect Mar 28, 2015 1:09:23 PM org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory INFO: HHH000397: Using ASTQueryTranslatorFactory Mar 28, 2015 1:09:23 PM org.hibernate.tool.hbm2ddl.SchemaExport execute INFO: HHH000227: Running hbm2ddl schema export Mar 28, 2015 1:09:23 PM org.hibernate.tool.hbm2ddl.SchemaExport execute INFO: HHH000230: Schema export complete Alexandre Eleuterio Santos Lourenco Maria Odete dos Santos Lourenco Ana Carolina Fernandes do Sim Celina do Sim Now, let's move on for the tests themselves. For the DBUnit tests, we create a Base class, which will provide the base DB operations which all of our JUnit tests will benefit. On the @PostConstruct method, which is fired after all the injections of the Spring context are made - reason why we couldn't use the @BeforeClass annotation, because we need Spring to instantiate and inject the EntityManager first - we use DBUnit to make a connection to our database, with the class DatabaseConnection and populate the table using the DataSet class we created, passing a XML structure that represents the data used on the tests. This operation of populating the table is made by the DatabaseOperation class, which we use with the CLEAN_INSERT operation, that truncate the table first and them insert the data on the dataset. Finally, we use one of JUnit's event listeners, the @After event, which is called after every test case. On our scenario, we use this event to call the clear() method on the EntityManager, which forces Hibernate to query against the Database for the first time at every test case, thus eliminating possible problems we could have between our test cases because of data that it is different on the second level cache than it is on the DB. The code for the base class is the following: package com.alexandreesl.handson.dao.test; import java.io.InputStream; import java.sql.SQLException; import javax.annotation.PostConstruct; import javax.persistence.EntityManager; import javax.persistence.EntityManagerFactory; import javax.persistence.PersistenceUnit; import org.dbunit.DatabaseUnitException; import org.dbunit.database.DatabaseConfig; import org.dbunit.database.DatabaseConnection; import org.dbunit.database.IDatabaseConnection; import org.dbunit.dataset.IDataSet; import org.dbunit.dataset.xml.FlatXmlDataSetBuilder; import org.dbunit.ext.hsqldb.HsqldbDataTypeFactory; import org.dbunit.operation.DatabaseOperation; import org.hibernate.HibernateException; import org.hibernate.internal.SessionImpl; import org.junit.After; public class BaseDBUnitSetup { private static IDatabaseConnection connection; private static IDataSet dataset; @PersistenceUnit public EntityManagerFactory entityManagerFactory; private EntityManager entityManager; @PostConstruct public void init() throws HibernateException, DatabaseUnitException, SQLException { entityManager = entityManagerFactory.createEntityManager(); connection = new DatabaseConnection( ((SessionImpl) (entityManager.getDelegate())).connection()); connection.getConfig().setProperty( DatabaseConfig.PROPERTY_DATATYPE_FACTORY, new HsqldbDataTypeFactory()); FlatXmlDataSetBuilder flatXmlDataSetBuilder = new FlatXmlDataSetBuilder(); InputStream dataSet = Thread.currentThread().getContextClassLoader() .getResourceAsStream("test-data.xml"); dataset = flatXmlDataSetBuilder.build(dataSet); DatabaseOperation.CLEAN_INSERT.execute(connection, dataset); } @After public void afterTests() { entityManager.clear(); } } The xml structure used on the test cases is the following: And the code of our test class of the DAO tier is the following: package com.alexandreesl.handson.dao.test; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; import org.junit.Test; import org.junit.runner.RunWith; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import org.springframework.test.context.transaction.TransactionConfiguration; import org.springframework.transaction.annotation.Transactional; import com.alexandreesl.handson.core.test.AppTestConfiguration; import com.alexandreesl.handson.dao.ClientDAO; import com.alexandreesl.handson.model.Client; @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(classes = AppTestConfiguration.class) @TransactionConfiguration(defaultRollback = true) public class ClientDAOTest extends BaseDBUnitSetup { @Autowired private ClientDAO clientDAO; @Test public void testFind() { Client client = clientDAO.find(1); assertNotNull(client); client = clientDAO.find(2); assertNotNull(client); client = clientDAO.find(3); assertNull(client); client = clientDAO.find(4); assertNull(client); client = clientDAO.find(5); assertNull(client); } @Test @Transactional public void testInsert() { Client client = new Client(); client.setId(3); client.setName("Celina do Sim"); client.setPhone(44657688); client.setSex("F"); clientDAO.create(client); } @Test @Transactional public void testUpdate() { Client client = clientDAO.find(1); client.setPhone(12345678); clientDAO.update(client); } @Test @Transactional public void testRemove() { Client client = clientDAO.find(1); clientDAO.delete(client); } } The code is very self explanatory so we will just focus on explaining the annotations at the top-level class. The @RunWith(SpringJUnit4ClassRunner.class) annotationchanges the JUnit base class that runs our test cases, using rather one made by Spring that enable support of the IoC container and the Spring's annotations. The @TransactionConfiguration(defaultRollback = true) annotation is from Spring's test library and change the behavior of the @Transactional annotation, making the transactions to roll back after execution, instead of a commit. That ensures that our test cases wont change the structure of the DB, so a test case wont break the execution of his followers. The reader may notice that we changed the configuration class to another one, exclusive for the test cases. It is essentially the same beans we created on the original configuration class, just changing the database bean to point to a different DB then the previously one, showing that we can change the database of our tests without breaking the code. On a real world scenario, the configuration class of the application would be pointing to a relational database like Oracle, DB2, etc and the test cases would use a embedded database such as HSQLDB, which we are using on this case: package com.alexandreesl.handson.core.test; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.jdbc.datasource.DriverManagerDataSource; import org.springframework.orm.jpa.JpaTransactionManager; import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean; import org.springframework.orm.jpa.vendor.Database; import org.springframework.orm.jpa.vendor.HibernateJpaDialect; import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter; import org.springframework.transaction.annotation.EnableTransactionManagement; @Configuration @EnableTransactionManagement @ComponentScan({ "com.alexandreesl.handson.dao", "com.alexandreesl.handson.service" }) public class AppTestConfiguration { @Bean public DriverManagerDataSource dataSource() { DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName("org.hsqldb.jdbcDriver"); dataSource.setUrl("jdbc:hsqldb:mem://standalone-test"); dataSource.setUsername("sa"); dataSource.setPassword(""); return dataSource; } @Bean public JpaTransactionManager transactionManager() { JpaTransactionManager transactionManager = new JpaTransactionManager(); transactionManager.setEntityManagerFactory(entityManagerFactory() .getNativeEntityManagerFactory()); transactionManager.setDataSource(dataSource()); transactionManager.setJpaDialect(jpaDialect()); return transactionManager; } @Bean public HibernateJpaDialect jpaDialect() { return new HibernateJpaDialect(); } @Bean public HibernateJpaVendorAdapter jpaVendorAdapter() { HibernateJpaVendorAdapter jpaVendor = new HibernateJpaVendorAdapter(); jpaVendor.setDatabase(Database.HSQL); jpaVendor.setDatabasePlatform("org.hibernate.dialect.HSQLDialect"); return jpaVendor; } @Bean public LocalContainerEntityManagerFactoryBean entityManagerFactory() { LocalContainerEntityManagerFactoryBean entityManagerFactory = new LocalContainerEntityManagerFactoryBean(); entityManagerFactory .setPersistenceXmlLocation("classpath:META-INF/persistence.xml"); entityManagerFactory.setPersistenceUnitName("persistence"); entityManagerFactory.setDataSource(dataSource()); entityManagerFactory.setJpaVendorAdapter(jpaVendorAdapter()); entityManagerFactory.setJpaDialect(jpaDialect()); return entityManagerFactory; } } If we run the test class, we can see that it runs the test cases successfully, showing that our code is a success. If we see the console, we can see that transactions were created and rolled back, respecting our configuration: . . . ar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext startTransaction INFO: Began transaction (1) for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@1a411233, testMethod = testInsert@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager@7c2327fa]; rollback [true] Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext endTransaction INFO: Rolled back transaction for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@1a411233, testMethod = testInsert@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]. Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext startTransaction INFO: Began transaction (1) for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@2adddc06, testMethod = testRemove@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager@7c2327fa]; rollback [true] Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext endTransaction INFO: Rolled back transaction for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@2adddc06, testMethod = testRemove@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]. Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext startTransaction INFO: Began transaction (1) for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@4905c46b, testMethod = testUpdate@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager@7c2327fa]; rollback [true] Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext endTransaction INFO: Rolled back transaction for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@4905c46b, testMethod = testUpdate@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]. Now let's move on to the Service tests, with the help of Mockito. The class to test the Service tier is very simple, as we can see bellow: package com.alexandreesl.handson.service.test; import static org.junit.Assert.assertEquals; import org.junit.BeforeClass; import org.junit.Test; import org.mockito.Mockito; import org.mockito.invocation.InvocationOnMock; import org.mockito.stubbing.Answer; import com.alexandreesl.handson.dao.ClientDAO; import com.alexandreesl.handson.model.Client; import com.alexandreesl.handson.service.ClientService; public class ClientServiceTest { private static ClientDAO clientDAO; private static ClientService clientService; @BeforeClass public static void beforeClass() { clientService = new ClientService(); clientDAO = Mockito.mock(ClientDAO.class); clientService.setClientDAO(clientDAO); Client client = new Client(); client.setId(0); client.setName("Mocked client!"); client.setPhone(11111111); client.setSex("M"); Mockito.when(clientDAO.find(Mockito.anyLong())).thenReturn(client); Mockito.doThrow(new RuntimeException("error on client!")) .when(clientDAO).delete((Client) Mockito.any()); Mockito.doNothing().when(clientDAO).create((Client) Mockito.any()); Mockito.doAnswer(new Answer

April 14, 2015

by Alexandre Lourenco

· 21,778 Views · 2 Likes

How to Batch DELETE Statements with Hibernate

Introduction In my , I explained the Hibernate configurations required for batching INSERT and UPDATE statements. This post will continue this topic with DELETE statements batching. Domain model entities We’ll start with the following entity model: The Post entity has a one-to-many association to a Comment and a one-to-one relationship with the PostDetails entity: @OneToMany(cascade = CascadeType.ALL, mappedBy = "post", orphanRemoval = true) private List comments = new ArrayList<>(); @OneToOne(cascade = CascadeType.ALL, mappedBy = "post", orphanRemoval = true, fetch = FetchType.LAZY) private PostDetails details; The up-coming tests will be run against the following data: doInTransaction(session -> { int batchSize = batchSize(); for(int i = 0; i < itemsCount(); i++) { int j = 0; Post post = new Post(String.format( "Post no. %d", i)); post.addComment(new Comment( String.format( "Post comment %d:%d", i, j++))); post.addComment(new Comment(String.format( "Post comment %d:%d", i, j++))); post.addDetails(new PostDetails()); session.persist(post); if(i % batchSize == 0 && i > 0) { session.flush(); session.clear(); } } }); Hibernate Configuration As , the following properties are required for batching INSERT and UPDATE statements: properties.put("hibernate.jdbc.batch_size", String.valueOf(batchSize())); properties.put("hibernate.order_inserts", "true"); properties.put("hibernate.order_updates", "true"); properties.put("hibernate.jdbc.batch_versioned_data", "true"); Next, we are going to check if DELETE statements are batched as well. JPA Cascade Delete Because is convenient, I’m going to prove that CascadeType.DELETE and JDBC batching don’t mix well. The following tests is going to: Select some Posts along with Comments and PostDetails Delete the Posts, while propagating the delete event to Comments and PostDetails as well @Test public void testCascadeDelete() { LOGGER.info("Test batch delete with cascade"); final AtomicReference startNanos = new AtomicReference<>(); addDeleteBatchingRows(); doInTransaction(session -> { List posts = session.createQuery( "select distinct p " + "from Post p " + "join fetch p.details d " + "join fetch p.comments c") .list(); startNanos.set(System.nanoTime()); for (Post post : posts) { session.delete(post); } }); LOGGER.info("{}.testCascadeDelete took {} millis", getClass().getSimpleName(), TimeUnit.NANOSECONDS.toMillis( System.nanoTime() - startNanos.get() )); } Running this test gives the following output: Query:{[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][56,0]} Query:{[delete from PostDetails where id=?][3]} Query:{[delete from Post where id=? and version=?][3,0]} Query:{[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][53,0]} Query:{[delete from PostDetails where id=?][2]} Query:{[delete from Post where id=? and version=?][2,0]} Query:{[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,0]} Query:{[delete from PostDetails where id=?][1]} Query:{[delete from Post where id=? and version=?][1,0]} Only the Comment DELETE statements were batched, the other entities being deleted in separate database round-trips. The reason for this behaviour is given by the ActionQueue sorting implementation: if ( session.getFactory().getSettings().isOrderUpdatesEnabled() ) { // sort the updates by pk updates.sort(); } if ( session.getFactory().getSettings().isOrderInsertsEnabled() ) { insertions.sort(); } While INSERTS and UPDATES are covered, DELETE statements are not sorted at all. A JDBC batch can only be reused when all statements belong to the same database table. When an incoming statement targets a different database table, the current batch has to be released, so that the new batch matches the current statement database table: public Batch getBatch(BatchKey key) { if ( currentBatch != null ) { if ( currentBatch.getKey().equals( key ) ) { return currentBatch; } else { currentBatch.execute(); currentBatch.release(); } } currentBatch = batchBuilder().buildBatch(key, this); return currentBatch; } If you enjoy reading this article, you might want to subscribe to my newsletter and get a discount for my book as well. Orphan removal and manual flushing A work-around is to dissociate all Child entities while manually flushing the HibernateSession before advancing to a new Child association: @Test public void testOrphanRemoval() { LOGGER.info("Test batch delete with orphan removal"); final AtomicReference startNanos = new AtomicReference<>(); addDeleteBatchingRows(); doInTransaction(session -> { List posts = session.createQuery( "select distinct p " + "from Post p " + "join fetch p.details d " + "join fetch p.comments c") .list(); startNanos.set(System.nanoTime()); posts.forEach(Post::removeDetails); session.flush(); posts.forEach(post -> { for (Iterator commentIterator = post.getComments().iterator(); commentIterator.hasNext(); ) { Comment comment = commentIterator.next(); comment.post = null; commentIterator.remove(); } }); session.flush(); posts.forEach(session::delete); }); LOGGER.info("{}.testOrphanRemoval took {} millis", getClass().getSimpleName(), TimeUnit.NANOSECONDS.toMillis( System.nanoTime() - startNanos.get() )); } This time all DELETE statements are properly batched: Query:{[delete from PostDetails where id=?][2]} {[delete from PostDetails where id=?][3]} {[delete from PostDetails where id=?][1]} Query:{[delete from Comment where id=? and version=?][53,0]} {[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][56,0]} {[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51, Query:{[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]} {[delete from Post where id=? and version=?][1,0]} SQL Cascade Delete A better solution is to use SQL cascade deletion, instead of JPA entity state propagation mechanism. This way, we can also reduce the DML statements count. Because Hibernate Session acts as a , we must be extra cautious when mixing entity state transitions with database-side automatic actions, as the Persistence Context might not reflect the latest database changes. The Post entity one-to-manyComment association is marked with the Hibernate specific @OnDelete annotation, so that the auto-generated database schema includes the ON DELETE CASCADE directive: @OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE}, mappedBy = "post") @OnDelete(action = OnDeleteAction.CASCADE) private List comments = new ArrayList<>(); Generating the following DDL: alter table Comment add constraint FK_apirq8ka64iidc18f3k6x5tc5 foreign key (post_id) references Post on delete cascade The same is done with the PostDetails entity one-to-one Post association: @OneToOne(fetch = FetchType.LAZY) @JoinColumn(name = "id") @MapsId @OnDelete(action = OnDeleteAction.CASCADE) private Post post; And the associated DDL: alter table PostDetails add constraint FK_h14un5v94coafqonc6medfpv8 foreign key (id) references Post on delete cascade The CascadeType.ALL and orphanRemoval were replaced with CascadeType.PERSIST and CascadeType.MERGE, because we no longer want Hibernate to propagate the entity removal event. The test only deletes the Post entities. doInTransaction(session -> { List posts = session.createQuery( "select p from Post p") .list(); startNanos.set(System.nanoTime()); for (Post post : posts) { session.delete(post); } }); The DELETE statements are properly batched as there’s only one target table. Query:{[delete from Post where id=? and version=?][1,0]} {[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]} If you enjoyed this article, I bet you are going to love my book as well. Conclusion If INSERT and UPDATE statements batching is just a matter of configuration, DELETE statements require some additional steps, which may increase the data access layer complexity. Code available on GitHub. If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider .

April 11, 2015

by Vlad Mihalcea

· 22,620 Views · 1 Like