In my application we have a number of client databases, and every hour new data arrives in those databases for processing.
A cron job checks these databases, picks up the data, creates a thread pool, and starts executing 30 threads in parallel; the remaining tasks are held in a queue.
Processing all of these tasks takes several hours.
So while an execution is in progress, newly arrived data has to wait, because the cron job will not pick it up until the current execution has finished.
Sometimes we have high-priority data to process, but in this situation those clients also have to wait several hours for their data.
Please give me suggestions on how to avoid this waiting for newly arrived data.
(I am working with Java 1.7, Tomcat 7 and SQL Server 2012.)
Thank you in advance.
Please let me know if anything is unclear and more information is needed.
Each of your threads should process data in bulk (for example 100/1000 records), and these records should be selected from the DB by priority. Each time you select new records for processing, the data with the highest priority goes first.
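A rough sketch of that selection with JDBC against SQL Server (the table and column names task_queue, priority, status and created_at are made up; adjust to your schema):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PriorityBatchReader {
        public void readNextBatch(String jdbcUrl, String user, String password) throws SQLException {
            Connection con = DriverManager.getConnection(jdbcUrl, user, password);
            try {
                // TOP is SQL Server syntax; highest-priority pending records come first
                PreparedStatement ps = con.prepareStatement(
                    "SELECT TOP 1000 id, payload FROM task_queue " +
                    "WHERE status = 'PENDING' ORDER BY priority DESC, created_at ASC");
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    // hand each record to a worker; process(...) is a placeholder
                    process(rs.getLong("id"), rs.getString("payload"));
                }
                rs.close();
                ps.close();
            } finally {
                con.close();
            }
        }

        private void process(long id, String payload) { /* placeholder */ }
    }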
I can't create a comment yet :(
For this problem we are thinking about two solutions:
1. Create more than one thread pool, to process normal and high-priority data separately (a rough sketch is below).
2. Create more than one Tomcat instance with the same code, one for normal and one for priority data.
But I do not understand which solution is better for my case, 1 or 2.
Please give me suggestions about these solutions so that I can take a decision.
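For reference, option 1 could be as simple as the following sketch (the pool sizes are arbitrary examples):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PrioritySubmitter {
        // Separate pools so high-priority tasks never wait behind the normal backlog.
        private final ExecutorService normalPool   = Executors.newFixedThreadPool(30);
        private final ExecutorService priorityPool = Executors.newFixedThreadPool(10);

        public void submit(Runnable task, boolean highPriority) {
            (highPriority ? priorityPool : normalPool).submit(task);
        }
    }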
You can use an ExecutorService created with Executors.newCachedThreadPool().
Benefits of using a cached thread pool:
The pool creates new threads if needed, but reuses previously constructed threads when they are available.
Only if no threads are available for reuse will a new thread be created and added to the pool.
Threads that have not been used for more than sixty seconds are terminated and removed from the cache, so a pool that stays idle long enough will not consume any resources.
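A minimal usage sketch (Java 7 syntax, since the question mentions Java 1.7 and lambdas are not available there):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CachedPoolExample {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newCachedThreadPool();
            for (int i = 0; i < 100; i++) {
                final int taskId = i;
                pool.submit(new Runnable() {
                    public void run() {
                        System.out.println("Processing task " + taskId);
                    }
                });
            }
            pool.shutdown(); // stop accepting new tasks, let queued ones finish
        }
    }

One caveat: a cached pool is unbounded, so if tasks arrive faster than they complete, the number of threads can grow without limit; for long-running batch work a bounded pool may be safer.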
I'm chasing some memory issues in an app that pulls file names from a Kafka queue and does some processing on each. This app runs in Docker, with one instance per partition.
Each instance has a single consumer handle that retrieves the next file name and puts it into an ArrayBlockingQueue. Meanwhile there are several threads that take the next file from this queue and do the processing. I'm using this secondary queueing because each file can take some time to copy and process (there are instances of "exponential backoff" in use, i.e. a thread may be sleeping), so it seemed prudent to have several 'in the pipeline' simultaneously.
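For concreteness, the current model is roughly the following sketch (pollNextFileName() and processFile() stand in for the real Kafka and processing code):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class FilePipeline {
        private final BlockingQueue<String> files = new ArrayBlockingQueue<String>(100);

        public void start(int workerCount) {
            // several long-lived workers drain the shared queue
            for (int i = 0; i < workerCount; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        try {
                            while (true) {
                                processFile(files.take()); // take() blocks until a file is available
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }).start();
            }
            // single consumer loop: put() blocks when the queue is full (back-pressure)
            while (true) {
                try {
                    files.put(pollNextFileName());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        private String pollNextFileName() { return "placeholder"; } // real code reads from Kafka
        private void processFile(String file) { }                   // real code copies/processes the file
    }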
My question is about the relative benefits (with respect to memory management) of doing it this way (several 'permanent' threads reading from a shared queue) versus launching a new thread for each file as it gets pulled from the queue. In this alternative I would imagine a FixedThreadPool that supplies a worker thread as each file is pulled from Kafka.
Is there any advantage to one method versus the other?
Edit:
My primary concern is minimizing GC time. I want to avoid having anything substantial promoted to old gen. This makes me think the second model is the better way to go.
I designed a Java application. A friend suggested using multi-threading; he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry out several operations (out of scope here) that fill global static variables and hash maps to be used across the whole lifetime of the process. Then I run the core of the application on the entries of an array list.
for (int customerID : customers) {
    ConsumerPrinter consumerPrinter = new ConsumerPrinter();
    consumerPrinter.runPE(docsPath, outputPath, customerID);
    System.out.println("Customer with CustomerID:" + customerID + " Done");
}
For each iteration of this loop, the given customer's XMLs are fetched from the machine, parsed, and calculations are performed on the parsed data. Later, the processed results are written to a text file (fetched and written data can reach several gigabytes at most and 50 MB on average). More than one iteration can write to the same file.
Should I make this piece of code multi-threaded so each group of customers is processed in an independent thread?
How can I determine the optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?
Should I make this piece of code multi-threaded so each group of customers is processed
in an independent thread?
Yes, multi-threading will save you processing time. While iterating over your list you can spawn a new thread for each iteration and do the customer processing in it (or, better, submit tasks to a thread pool, as sketched below). But you need to do proper synchronization: if processing two customers requires operating on the same resource, you must synchronize that operation to avoid race conditions or memory-consistency issues.
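If you go the pool route rather than raw threads, a hedged sketch of that loop might look like this (it assumes ConsumerPrinter.runPE() from your question is safe to call from several threads at once, which you must verify):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    void processAllCustomers(final String docsPath, final String outputPath,
                             List<Integer> customers) throws InterruptedException {
        // one worker per core is a reasonable starting point; measure and adjust
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (final int customerID : customers) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerPrinter consumerPrinter = new ConsumerPrinter();
                    consumerPrinter.runPE(docsPath, outputPath, customerID);
                    System.out.println("Customer with CustomerID:" + customerID + " Done");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // upper bound, adjust as needed
    }

Since more than one iteration can write to the same output file, access to each shared file also has to be synchronized (or each task should write to its own file and the results merged afterwards).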
How can I determine the optimal number of threads to run?
You cannot really, without actually measuring the processing time for n customers with different numbers of threads. It will depend on the number of cores your processor has and on what processing actually takes place for each customer.
What are the best practices to take into consideration when implementing multi-threading?
The first criterion is that you must have multiple cores and your OS must support multi-threading. Almost every system does these days, but it is still worth checking. Secondly, you must analyse all the scenarios that may lead to race conditions. Every resource that you know will be shared among multiple threads must be made thread-safe. You must also look out for possible memory-consistency issues (for example, declare shared flags as volatile). Finally, there are things you cannot predict or analyse until you actually run tests, such as deadlocks (analyse a thread dump) or memory leaks (analyse a heap dump).
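A small illustration of those last two points, assuming a hypothetical shared object used by all worker threads:

    public class SharedState {
        private final Object lock = new Object();
        private int processedCount = 0;        // shared, mutable -> guard with a lock
        private volatile boolean shutdown;     // simple flag -> volatile is enough

        public void incrementProcessed() {
            synchronized (lock) {
                processedCount++;
            }
        }

        public void requestShutdown() {
            shutdown = true;
        }

        public boolean isShutdownRequested() {
            return shutdown;
        }
    }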
The idea of multi-threading is to move some heavy processing into another, let's say, "block of memory".
Any UI updates have to be done on the main/default thread, like printing messages or inflating a view, for example. You can ask the app to draw a bitmap, download images from the internet, or run a heavy validation/loop block on a separate thread; imagine that you are creating a second short-lived app to handle those tasks for you.
Remember, you can ask the app to download/draw an image on another thread, but you have to show this image on the screen on the main thread.
This is commonly used to load a large bitmap on a separate thread, do the math to resize this large image, and then, on the main thread, inflate/print/paint/show the smaller version of that image to the user.
In your case, I don't know how heavy the runPE() method is or what it does; you could try to create another thread for it, but the rest should stay on the main thread, since it is the main process of your UI.
You could also optimise your loop by placing the "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)": since it does not change dynamically, you can move it out of the loop to avoid creating the same object each time the loop iterates :)
While straight Java multi-threading (java.util.concurrent) can be used, as other answers have discussed, also consider alternative programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much of the complexity is handled by the actor framework rather than directly by you, the programmer. In addition, there is less (or no) need to reason about synchronizing shared state between threads, because of the way programs using the actor model are structured.
See "Which Actor model library/framework for Java?" for a discussion of popular actor model libraries.
I was asked a question during an interview today. First they asked how to provide synchronization
between threads. Then they asked how to provide synchronization between processes, because I told them that a variable inside one process cannot be shared with another process. So they asked me to explain how two processes can communicate with each other, how to provide synchronization
between them, and where to declare the shared variable. The interview is over now, but I want to know the answer. Can anyone explain? Thank you.
I think the interviewer(s) may not have been using the proper terminology. A process runs in its own address space and, as mentioned in other answers, you have to use OS-specific mechanisms to communicate between processes. This is called IPC, for Inter-Process Communication.
Using sockets is a common practice, but it can be grossly inefficient, depending on your application. However, if you are working in pure Java, this may be the only option, since sockets are universally supported.
Shared memory is another technique, but it is OS-specific and requires OS-specific calls. You would have to use something like JNI for a Java application to access shared-memory services. Shared-memory access is not synchronized, so you will likely have to use semaphores to synchronize access among multiple processes.
Unix-like systems provide multiple IPC mechanisms, and which one to use depends on the nature of your application. Shared memory can be a limited resource, so it may not be the best method. Googling this topic turns up numerous hits with useful technical details.
A process is a collection of virtual memory space, code, data, and system resources. A thread is code that is to be serially executed within a process. A processor executes threads, not processes, so each application has at least one process, and a process always has at least one thread of execution, known as the primary thread. A process can have multiple threads in addition to the primary thread. Prior to the introduction of multiple threads of execution, applications were all designed to run on a single thread of execution.
When a thread begins to execute, it continues until it is killed or until it is interrupted by a thread with higher priority (by a user action or the kernel's thread scheduler). Each thread can run separate sections of code, or multiple threads can execute the same section of code. Threads executing the same block of code maintain separate stacks. Each thread in a process shares that process's global variables and resources.
To communicate between two processes, I suppose you can use a ServerSocket and a Socket to manage process synchronization. You would bind to a well-known port (acquire the lock), and if a process is already bound, you connect to that port (block) and wait until the server socket is closed.
private static final int KNOWN_PORT = 11000; // arbitrary valid port
private ServerSocket socket;

public void acquireProcessLock() {
    InetAddress localhost = InetAddress.getLoopbackAddress();
    try {
        socket = new ServerSocket();                               // create unbound
        socket.bind(new InetSocketAddress(localhost, KNOWN_PORT)); // acquire lock
    } catch (IOException failed) {
        // another process already holds the port; connect and block until it releases it
        try {
            Socket client = new Socket(localhost, KNOWN_PORT);
            client.getInputStream().read();                        // blocks
        } catch (IOException ex) {
            // other process invoked releaseProcessLock()
        }
        acquireProcessLock();                                      // try to acquire again
    }
}

public void releaseProcessLock() throws IOException {
    socket.close();
}
Not sure if this is actually the best means of doing it, but I think it's worth considering.
Synchronization applies to threads only; it won't work across processes in Java. There would be no utility in it working across processes anyway, since processes do not share any state that would need to be synchronized: a variable in one process does not hold the same data as a variable in another process.
From a system point of view, a thread is defined by its "state" and its "instruction pointer".
The instruction pointer (eip) contains the address of the next instruction to be executed.
A thread's "state" can include: the registers (eax, ebx, etc.), the signals, the open files, the code, the stack, the data managed by this thread (variables, arrays, etc.) and also the heap.
A process is a group of threads that share part of their "state": typically the code, the data and the heap.
Hope I answered your question ;)
EDIT:
Processes can communicate via IPC (inter-process communication). There are three mechanisms: shared memory, message queues and semaphores. Synchronization between processes can be achieved with semaphores.
Thread synchronization can be achieved with mutexes (pthread_mutex_lock, pthread_mutex_unlock, etc.).
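For the Java side of the interview question, the closest analogues to a pthread mutex are the synchronized keyword and java.util.concurrent.locks.ReentrantLock; a minimal sketch:

    import java.util.concurrent.locks.ReentrantLock;

    public class SharedCounter {
        private final ReentrantLock lock = new ReentrantLock(); // explicit mutex
        private int value;

        public void increment() {
            lock.lock();          // roughly pthread_mutex_lock
            try {
                value++;
            } finally {
                lock.unlock();    // roughly pthread_mutex_unlock
            }
        }

        public int get() {
            lock.lock();
            try {
                return value;
            } finally {
                lock.unlock();
            }
        }
        // Declaring both methods synchronized would achieve the same effect using
        // the object's intrinsic lock instead of an explicit ReentrantLock.
    }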
Check the Terracotta Cluster or Terracotta DSO clustering documentation to see how this issue can be solved (bytecode manipulation, maintaining the semantics of the Java Language Specification at the putfield/getfield level, etc.).
The simplest answer is that a process is a program under execution, and a program is nothing but a collection of functions.
A thread is a part of the process, because threads are essentially functions being executed.
Put another way, a process may have multiple threads.
The OS always allocates memory to a process, and that memory is distributed among the threads of that process; the OS does not allocate memory to individual threads.
In one sentence: processes are designed to be more independent than threads.
Their major differences can be described at the memory level. Different processes share nothing with each other, from registers and stack memory to heap memory, which keeps them safe on their own tracks. Threads, however, are normally designed to share a common heap, which provides a more closely connected way for multiple computing tasks to cooperate and a more efficient way to use computation resources.
E.g. if I compute with 3 processes, I have to let each of them finish its job and wait for its result at the system level; in the meantime, registers and stack memory remain occupied. However, if I do it with 3 threads and thread 2 happens to finish its job earlier, then, because the result it computed has already been stored in the common heap, we can simply let it end without waiting for the others to deliver their results, and the released registers and stack memory can be used for other purposes.
Process:
A process is nothing but a program under execution.
Each process has its own memory address space.
Processes are used for heavyweight tasks, i.e. essentially the execution of applications.
The cost of communication between processes is high.
Switching from one process to another requires some time for saving and loading registers, memory maps, etc.
Processes are an operating-system-level construct.
Threads:
A thread is a lightweight sub-process.
Threads share the same address space.
The cost of communication between threads is low.
Note: at least one process is required for each thread.
I suppose the processes could also communicate through a third party: a file or a database, for example.
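A file is indeed a workable third party; a hedged sketch of cross-process coordination using java.nio file locks (the lock file name is arbitrary) could look like this:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    public class FileBasedLock {
        public static void main(String[] args) throws IOException {
            RandomAccessFile raf = new RandomAccessFile(new File("shared.lock"), "rw");
            FileChannel channel = raf.getChannel();

            FileLock lock = channel.lock();   // blocks until no other process holds the lock
            try {
                // critical section: read/write the shared file or database here
                System.out.println("This process holds the lock");
            } finally {
                lock.release();
                channel.close();
            }
        }
    }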
I am working on a Java web application for Tomcat 6 that offers suggest functionality. This means a user types free text and gets suggestions for completing the input. It is essential that the web application reacts very fast for this to make sense.
This web application makes suggestions based on data that can be modified at any time. If new data is available, the suggest index is rebuilt completely in the background by a daemon thread. When the data preparation process is finished, the old index is thrown away and the new index comes into play. This has the advantage of no service gaps.
The background process for data preparation costs a lot of CPU power. This causes the service to hang, sometimes for more than a second, which makes it less usable and leads to a bad user experience.
My first attempt to solve the problem was to pause all background data preparation threads when a request has to be processed. This narrows the problem a bit, but the service is still not smooth. It is not that the code generating the suggestions itself gets slower; rather, it seems as if Tomcat does not always start the thread for the request immediately because of the high load in other threads (I guess).
I have no idea how to approach this problem. Can I interact with Tomcat's thread scheduling and tell it to prioritise the execution of requests? As expected, adjusting the thread priority did not help either. I did not find any Tomcat configuration options that would help. Or is there no way to deal with this, and do I have to change the software design? I am at a loss. Do you have any hints on how to tackle this problem?
JanP
I would not change the thread priority. By doing that you slow down other threads and therefore other users. If you have synchronized data, you can also run into a priority inversion problem, where your faster threads end up waiting for lower-priority threads to release locks on the data.
Instead, I would look at how to optimise the data generation process. What exactly are you doing there?
EDIT:
You could create an ExecutorService and send messages to it through a queue, as in this example: java thread pool keep running. In order to be able to change the thread priority of the tasks, instead of calling ExecutorService pool = Executors.newFixedThreadPool(3); you would create a ThreadFactory, have the ThreadFactory lower the priority of the threads it creates, and then call ExecutorService pool = Executors.newSingleThreadExecutor(threadFactory);
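A hedged sketch of that ThreadFactory idea (the thread name and the placeholder task are made up):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadFactory;

    public class LowPriorityFactory implements ThreadFactory {
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "index-rebuild");
            t.setPriority(Thread.MIN_PRIORITY); // background work yields to request threads
            t.setDaemon(true);
            return t;
        }

        public static void main(String[] args) {
            ExecutorService pool = Executors.newSingleThreadExecutor(new LowPriorityFactory());
            pool.submit(new Runnable() {
                public void run() {
                    // placeholder for the expensive index/data preparation work
                }
            });
            pool.shutdown();
        }
    }

Keep in mind that thread priorities are only hints to the OS scheduler, so the effect can vary between platforms.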