How to read files in multithreaded mode? - Java

I currently have a program that reads a (very large) file in single-threaded mode and builds a search index from it, but indexing takes too long in a single-threaded environment.
Now I am trying to make it work in multithreaded mode, but I am not sure of the best way to achieve that.
My main program creates a BufferedReader and passes the instance to the threads, and the threads use the BufferedReader instance to read the file.
I don't think this works as expected; rather, each thread ends up reading the same lines again and again.
Is there a way to make each thread read only the lines that have not been read by another thread? Do I need to split the file? Is there a way to implement this without splitting the file?
Sample Main program:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;

public class TestMTFile {
    public static void main(String args[]) {
        BufferedReader reader = null;
        ArrayList<Thread> threads = new ArrayList<Thread>();
        try {
            reader = new BufferedReader(new FileReader("test.tsv"));
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
        for (int i = 0; i <= 10; i++) {
            Runnable task = new ReadFileMT(reader);
            Thread worker = new Thread(task);
            // We can set the name of the thread
            worker.setName(String.valueOf(i));
            // Start the thread, never call method run() direct
            worker.start();
            // Remember the thread for later usage
            threads.add(worker);
        }
        int running = 0;
        int runner1 = 0;
        int runner2 = 0;
        do {
            running = 0;
            for (Thread thread : threads) {
                if (thread.isAlive()) {
                    runner1 = running++;
                }
            }
            if (runner2 != runner1) {
                runner2 = runner1;
                System.out.println("We have " + runner2 + " running threads. ");
            }
        } while (running > 0);
        if (running == 0) {
            System.out.println("Ended");
        }
    }
}
Thread:
import java.io.BufferedReader;
import java.io.IOException;

public class ReadFileMT implements Runnable {
    BufferedReader bReader = null;

    ReadFileMT(BufferedReader reader) {
        this.bReader = reader;
    }

    public synchronized void run() {
        String line;
        try {
            while ((line = bReader.readLine()) != null) {
                try {
                    System.out.println(line);
                } catch (Exception e) {
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Your bottleneck is most likely the indexing, not the file reading. Assuming your indexing system supports multiple threads, you probably want a producer/consumer setup with one thread reading the file and pushing each line into a BlockingQueue (the producer), and multiple threads pulling lines from the BlockingQueue and pushing them into the index (the consumers).
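For illustration, here is a minimal sketch of that setup (my names throughout: the queue capacity, test.tsv, and indexLine() are placeholders, and the poison-pill shutdown is just one way to stop the consumers):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class IndexerPipeline {
    // distinct instance used as an end-of-input marker ("poison pill")
    private static final String POISON = new String("EOF");

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(10000);
        int consumers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        // Consumers: pull lines off the queue and index them
        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                try {
                    String line;
                    while ((line = queue.take()) != POISON) {
                        indexLine(line); // placeholder for your indexing call
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Producer: a single thread reads the file and feeds the queue
        try (BufferedReader reader = new BufferedReader(new FileReader("test.tsv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);
            }
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(POISON); // one marker per consumer
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void indexLine(String line) {
        // index the line here
    }
}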

See this thread - if your files are all on the same disk then you can't do better than reading them with a single thread, although it may be possible to process the files with multiple threads once you've read them into main memory.

If you can use Java 8, you may be able to do this quickly and easily using the Streams API. Read the file into a MappedByteBuffer, which can open a file of up to 2GB very quickly, then read the lines out of the buffer (you need to make sure your JVM has enough extra memory to hold the file):
package com.objective.stream;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamsFileProcessor {
    private MappedByteBuffer buffer;

    public static void main(String[] args){
        if (args.length > 0){
            Path myFile = Paths.get(args[0]);
            StreamsFileProcessor proc = new StreamsFileProcessor();
            try {
                proc.process(myFile);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void process(Path file) throws IOException {
        readFileIntoBuffer(file);
        getBufferStream().parallel()
            .forEach(this::doIndex);
    }

    private Stream<String> getBufferStream() throws IOException {
        // A mapped buffer is not array-backed (buffer.array() would throw), so
        // copy its contents into a byte[] first. Do not close the reader here:
        // the returned Stream is consumed lazily by the caller.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes)));
        return reader.lines();
    }

    private void readFileIntoBuffer(Path file) throws IOException{
        try(FileInputStream fis = new FileInputStream(file.toFile())){
            FileChannel channel = fis.getChannel();
            // READ_ONLY: the channel comes from a FileInputStream, so a PRIVATE
            // (copy-on-write) mapping, which needs write access, would fail.
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    private void doIndex(String s){
        // Do whatever I need to do to index the line here
    }
}

First, I agree with @Zim-Zam that it is the file I/O, not the indexing, that is likely the rate-determining step. (So I disagree with @jtahlborn.) It depends on how complex the indexing is.
Second, in your code, each thread has its own, independent BufferedReader, so they will all read the entire file. One possible fix is to use a single BufferedReader that they share. Then you would need to synchronize the BufferedReader.readLine() calls (I think), since the javadocs are silent on whether BufferedReader is thread-safe. And since I think the I/O is the bottleneck, the shared reader will become the bottleneck too, and I doubt multithreading will gain you much. But give it a try; I have been wrong occasionally. :-)
p.s. I agree with @jtahlborn that a producer/consumer pattern is better than my share-the-BufferedReader idea, but that would be much more work for you.
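For what it's worth, a rough sketch of the shared-reader idea (my code, assuming one BufferedReader instance is handed to every thread; I synchronize on the reader itself so each readLine() call is atomic and every line is handed out exactly once):

import java.io.BufferedReader;
import java.io.IOException;

public class SharedReaderTask implements Runnable {
    private final BufferedReader reader;

    public SharedReaderTask(BufferedReader reader) {
        this.reader = reader;
    }

    public void run() {
        try {
            while (true) {
                String line;
                synchronized (reader) { // one thread at a time in readLine()
                    line = reader.readLine();
                }
                if (line == null) {
                    break; // end of file
                }
                // process/index the line outside the lock
                System.out.println(Thread.currentThread().getName() + ": " + line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}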

Related

How to make a JTextComponent for input and output of a single process run through ProcessBuilder, like the NetBeans console [duplicate]

I am trying to create a sort of console/terminal that allows the user to input a string, which then gets made into a process, with the results printed out - just like a normal console. But I am having trouble managing the input/output streams. I have looked into this thread, but that solution sadly doesn't apply to my problem.
Along with standard commands like "ipconfig" and "cmd.exe", I need to be able to run a script and use the same input stream to pass some arguments, if the script asks for input.
For example, after running a script with "python pyScript.py", I should be able to pass further input to the script if it asks for it (example: raw_input), while also printing the output from the script - the basic behavior you would expect from a terminal.
What I've got so far:
import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Dimension;
import java.awt.event.KeyEvent;
import java.awt.event.KeyListener;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class Console extends JFrame {
    JTextPane inPane, outPane;
    InputStream inStream, inErrStream;
    OutputStream outStream;

    public Console() {
        super("Console");
        setPreferredSize(new Dimension(500, 600));
        setLocationByPlatform(true);
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        // GUI
        outPane = new JTextPane();
        outPane.setEditable(false);
        outPane.setBackground(new Color(20, 20, 20));
        outPane.setForeground(Color.white);
        inPane = new JTextPane();
        inPane.setBackground(new Color(40, 40, 40));
        inPane.setForeground(Color.white);
        inPane.setCaretColor(Color.white);
        JPanel panel = new JPanel(new BorderLayout());
        panel.add(outPane, BorderLayout.CENTER);
        panel.add(inPane, BorderLayout.SOUTH);
        JScrollPane scrollPanel = new JScrollPane(panel);
        getContentPane().add(scrollPanel);
        // LISTENER
        inPane.addKeyListener(new KeyListener() {
            @Override
            public void keyPressed(KeyEvent e) {
                if (e.getKeyCode() == KeyEvent.VK_ENTER) {
                    e.consume();
                    read(inPane.getText());
                }
            }
            @Override
            public void keyTyped(KeyEvent e) {}
            @Override
            public void keyReleased(KeyEvent e) {}
        });
        pack();
        setVisible(true);
    }

    private void read(String command) {
        println(command);
        // Write to Process
        if (outStream != null) {
            System.out.println("Outstream again");
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(outStream));
            try {
                writer.write(command);
                //writer.flush();
                //writer.close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        }
        // Execute Command
        try {
            exec(command);
        } catch (IOException e) {}
        inPane.setText("");
    }

    private void exec(String command) throws IOException {
        Process pro = Runtime.getRuntime().exec(command, null);
        inStream = pro.getInputStream();
        inErrStream = pro.getErrorStream();
        outStream = pro.getOutputStream();
        Thread t1 = new Thread(new Runnable() {
            public void run() {
                try {
                    String line = null;
                    while (true) {
                        BufferedReader in = new BufferedReader(new InputStreamReader(inStream));
                        while ((line = in.readLine()) != null) {
                            println(line);
                        }
                        BufferedReader inErr = new BufferedReader(new InputStreamReader(inErrStream));
                        while ((line = inErr.readLine()) != null) {
                            println(line);
                        }
                        Thread.sleep(1000);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        t1.start();
    }

    public void println(String line) {
        Document doc = outPane.getDocument();
        try {
            doc.insertString(doc.getLength(), line + "\n", null);
        } catch (BadLocationException e) {}
    }

    public static void main(String[] args) {
        new Console();
    }
}
I don't use the mentioned ProcessBuilder, since I'd like to differentiate between the error stream and the normal stream.
UPDATE 29.08.2016
With the help of @ArcticLord we have achieved what was asked in the original question.
Now it is just a matter of ironing out strange behavior like non-terminating processes. The Console has a "stop" button that simply calls pro.destroy(), but for some reason this does not work for infinitely running processes that spam output.
Console: http://pastebin.com/vyxfPEXC
InputStreamLineBuffer: http://pastebin.com/TzFamwZ1
Example code that does not stop:
public class Infinity{
    public static void main(String[] args){
        while(true){
            System.out.println(".");
        }
    }
}
Example code that does stop:
import java.util.concurrent.TimeUnit;

public class InfinitySlow{
    public static void main(String[] args){
        while(true){
            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println(".");
        }
    }
}
You are on the right track with your code. There are only a few minor things you missed.
Let's start with your read method:
private void read(String command){
    [...]
    // Write to Process
    if (outStream != null) {
        [...]
        try {
            writer.write(command + "\n"); // add a newline so your input gets processed
            writer.flush();               // flush your input to the process
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
    // ELSE!! - if no output stream is available
    // Execute Command
    else {
        try {
            exec(command);
        } catch (IOException e) {
            // Handle the exception here. Mostly this means
            // that the command could not be executed
            // because the command was not found.
            println("Command not found: " + command);
        }
    }
    inPane.setText("");
}
Now let's fix your exec method. You should use separate threads for reading the normal process output and the error output. Additionally, I introduce a third thread that waits for the process to end and closes the output stream, so that the next user input is not sent to the process but treated as a new command.
private void exec(String command) throws IOException{
    Process pro = Runtime.getRuntime().exec(command, null);
    inStream = pro.getInputStream();
    inErrStream = pro.getErrorStream();
    outStream = pro.getOutputStream();
    // Thread that reads process output
    Thread outStreamReader = new Thread(new Runnable() {
        public void run() {
            try {
                String line = null;
                BufferedReader in = new BufferedReader(new InputStreamReader(inStream));
                while ((line = in.readLine()) != null) {
                    println(line);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Exit reading process output");
        }
    });
    outStreamReader.start();
    // Thread that reads process error output
    Thread errStreamReader = new Thread(new Runnable() {
        public void run() {
            try {
                String line = null;
                BufferedReader inErr = new BufferedReader(new InputStreamReader(inErrStream));
                while ((line = inErr.readLine()) != null) {
                    println(line);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Exit reading error stream");
        }
    });
    errStreamReader.start();
    // Thread that waits for process to end
    Thread exitWaiter = new Thread(new Runnable() {
        public void run() {
            try {
                int retValue = pro.waitFor();
                println("Command exit with return value " + retValue);
                // close outStream
                outStream.close();
                outStream = null;
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    });
    exitWaiter.start();
}
Now this should work.
If you enter ipconfig it prints the command output, closes the output stream and is ready for a new command.
If you enter cmd it prints the output and lets you enter more cmd commands like dir or cd and so on until you enter exit. Then it closes the output stream and is ready for a new command.
You may run into problems executing Python scripts, because there are problems reading a Process's InputStream with Java if it is not flushed to the system pipeline.
See this example python script
print "Input something!"
str = raw_input()
print "Received input is : ", str
You could run this with your Java program and also enter the input, but you will not see the script's output until the script is finished.
The only fix I could find is to manually flush the output in the script:
import sys
print "Input something!"
sys.stdout.flush()
str = raw_input()
print "Received input is : ", str
sys.stdout.flush()
Running this script will behave as you expect.
You can read more about this problem at
Java: is there a way to run a system command and print the output during execution?
Why does reading from Process' InputStream block altough data is available
Java: can't get stdout data from Process unless its manually flushed
EDIT: I have just found another very easy solution for the stdout.flush() problem with Python scripts: start them with python -u script.py and you don't need to flush manually. This should solve your problem.
EDIT2: We discussed in the comments that with this solution the output and error streams get mixed up, since they are read in different threads. The problem is that we cannot tell whether the output writing is finished when the error stream thread wakes up; otherwise classic thread scheduling with locks could handle this situation. But we have a continuous stream until the process finishes, whether data flows or not. So we need a mechanism that tracks how much time has elapsed since the last line was read from each stream.
For this I will introduce a class that takes an InputStream and starts a thread to read the incoming data. The thread stores each line in a queue and stops when the end of the stream is reached. Additionally, it records the time when the last line was read and added to the queue.
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ConcurrentLinkedQueue;

public class InputStreamLineBuffer{
    private InputStream inputStream;
    private ConcurrentLinkedQueue<String> lines;
    private long lastTimeModified;
    private Thread inputCatcher;
    private boolean isAlive;

    public InputStreamLineBuffer(InputStream is){
        inputStream = is;
        lines = new ConcurrentLinkedQueue<String>();
        lastTimeModified = System.currentTimeMillis();
        isAlive = false;
        inputCatcher = new Thread(new Runnable(){
            @Override
            public void run() {
                StringBuilder sb = new StringBuilder(100);
                int b;
                try{
                    while ((b = inputStream.read()) != -1){
                        // read one char
                        if((char)b == '\n'){
                            // new line -> add it to the queue
                            lines.offer(sb.toString());
                            sb.setLength(0); // reset the StringBuilder
                            lastTimeModified = System.currentTimeMillis();
                        }
                        else sb.append((char)b); // append the char to the StringBuilder
                    }
                } catch (IOException e){
                    e.printStackTrace();
                } finally {
                    isAlive = false;
                }
            }});
    }

    // is the input reader thread alive
    public boolean isAlive(){
        return isAlive;
    }

    // start the input reader thread
    public void start(){
        isAlive = true;
        inputCatcher.start();
    }

    // does the queue hold some lines
    public boolean hasNext(){
        return lines.size() > 0;
    }

    // get the next line from the queue
    public String getNext(){
        return lines.poll();
    }

    // how much time has elapsed since the last line was read
    public long timeElapsed(){
        return (System.currentTimeMillis() - lastTimeModified);
    }
}
With this class we can combine the output and error reading threads into one, which lives while the input reader threads are alive or unconsumed data remains. On each pass it checks whether some time has passed since the last output was read and, if so, prints all unprinted lines at a stroke; the same for the error output. Then it sleeps for some millis so as not to waste CPU time.
private void exec(String command) throws IOException{
    Process pro = Runtime.getRuntime().exec(command, null);
    inStream = pro.getInputStream();
    inErrStream = pro.getErrorStream();
    outStream = pro.getOutputStream();
    InputStreamLineBuffer outBuff = new InputStreamLineBuffer(inStream);
    InputStreamLineBuffer errBuff = new InputStreamLineBuffer(inErrStream);
    Thread streamReader = new Thread(new Runnable() {
        public void run() {
            // start the input reader buffer threads
            outBuff.start();
            errBuff.start();
            // while an input reader buffer thread is alive
            // or there are unconsumed data left
            while(outBuff.isAlive() || outBuff.hasNext() ||
                  errBuff.isAlive() || errBuff.hasNext()){
                // get the normal output if at least 50 millis have passed
                if(outBuff.timeElapsed() > 50)
                    while(outBuff.hasNext())
                        println(outBuff.getNext());
                // get the error output if at least 50 millis have passed
                if(errBuff.timeElapsed() > 50)
                    while(errBuff.hasNext())
                        println(errBuff.getNext());
                // sleep a bit before the next run
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            System.out.println("Finish reading error and output stream");
        }
    });
    streamReader.start();
    // remove the outStreamReader and errStreamReader threads
    [...]
}
Maybe this is not a perfect solution, but it should handle the situation here.
EDIT (31.8.2016)
We discussed in the comments that there is still a problem with the code when implementing a stop button that kills the started process using Process#destroy(). A process that produces a lot of output, e.g. in an infinite loop, will be destroyed immediately by calling destroy(). But since it has already produced a lot of output that still has to be consumed by our streamReader, we can't get back to normal program behaviour.
So we need some small changes here:
We will introduce a destroy() method in InputStreamLineBuffer that stops the output reading and clears the queue.
The changes will look like this:
public class InputStreamLineBuffer{
    private boolean emergencyBrake = false;
    [...]
    public InputStreamLineBuffer(InputStream is){
        [...]
        while ((b = inputStream.read()) != -1 && !emergencyBrake){
            [...]
        }
    }
    [...]
    // exits immediately and clears line buffer
    public void destroy(){
        emergencyBrake = true;
        lines.clear();
    }
}
And some small changes in the main program:
public class ExeConsole extends JFrame{
    [...]
    // The line buffers must be declared outside the method
    InputStreamLineBuffer outBuff, errBuff;

    public ExeConsole(){
        [...]
        btnStop.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                if(pro != null){
                    pro.destroy();
                    outBuff.destroy();
                    errBuff.destroy();
                }
            }});
    }
    [...]
    private void exec(String command) throws IOException{
        [...]
        //InputStreamLineBuffer outBuff = new InputStreamLineBuffer(inStream);
        //InputStreamLineBuffer errBuff = new InputStreamLineBuffer(inErrStream);
        outBuff = new InputStreamLineBuffer(inStream);
        errBuff = new InputStreamLineBuffer(inErrStream);
        [...]
    }
}
Now it should be able to destroy even output-spamming processes.
Note: I found out that Process#destroy() is not able to destroy child processes. So if you start cmd on Windows and start a Java program from there, you will end up destroying the cmd process while the Java program is still running; you will see it in the task manager. This problem cannot be solved with Java itself. It needs some OS-dependent external tools to get the PIDs of these processes and kill them manually.
Although @ArcticLord's solution is nice and neat, I recently faced the same kind of problem and came up with a solution that is conceptually equivalent, but slightly different in its implementation.
The concept is the same, namely "bulk reads": when a reader thread acquires its turn, it consumes all of its stream and passes the turn only when it is done.
This guarantees the out/err print order.
But instead of using a timer-based turn assignment, I use a lock-based non-blocking read simulation:
// main method for testability: replace with private void exec(String command)
public static void main(String[] args) throws Exception
{
    // create a lock that will be shared between reader threads
    // the lock is fair to minimize starvation possibilities
    ReentrantLock lock = new ReentrantLock(true);
    // exec the command: I use nslookup for testing on windows
    // because it is interactive and prints to stderr too
    Process p = Runtime.getRuntime().exec("nslookup");
    // create a thread to handle output from process (uses a test consumer)
    Thread outThread = createThread(p.getInputStream(), lock, System.out::print);
    outThread.setName("outThread");
    outThread.start();
    // create a thread to handle error from process (test consumer, again)
    Thread errThread = createThread(p.getErrorStream(), lock, System.err::print);
    errThread.setName("errThread");
    errThread.start();
    // create a thread to handle input to process (read from stdin for testing purpose)
    PrintWriter writer = new PrintWriter(p.getOutputStream());
    Thread inThread = createThread(System.in, null, str ->
    {
        writer.print(str);
        writer.flush();
    });
    inThread.setName("inThread");
    inThread.start();
    // create a thread to handle termination gracefully. Not really needed in this simple
    // scenario, but on a real application we don't want to block the UI until process dies
    Thread endThread = new Thread(() ->
    {
        try
        {
            // wait until process is done
            p.waitFor();
            logger.debug("process exit");
            // signal threads to exit
            outThread.interrupt();
            errThread.interrupt();
            inThread.interrupt();
            // close process streams
            p.getOutputStream().close();
            p.getInputStream().close();
            p.getErrorStream().close();
            // wait for threads to exit
            outThread.join();
            errThread.join();
            inThread.join();
            logger.debug("exit");
        }
        catch(Exception e)
        {
            throw new RuntimeException(e.getMessage(), e);
        }
    });
    endThread.setName("endThread");
    endThread.start();
    // wait for full termination (process and related threads by cascade joins)
    endThread.join();
    logger.debug("END");
}

// convenience method to create a specific reader thread with exclusion by lock behavior
private static Thread createThread(InputStream input, ReentrantLock lock, Consumer<String> consumer)
{
    return new Thread(() ->
    {
        // wrap input to be buffered (enables ready()) and to read chars
        // using explicit encoding may be relevant in some case
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        // create a char buffer for reading
        char[] buffer = new char[8192];
        try
        {
            // repeat until EOF or interruption
            while(true)
            {
                try
                {
                    // wait for your turn to bulk read
                    if(lock != null && !lock.isHeldByCurrentThread())
                    {
                        lock.lockInterruptibly();
                    }
                    // when there's nothing to read, pass the hand (bulk read ended)
                    if(!reader.ready())
                    {
                        if(lock != null)
                        {
                            lock.unlock();
                        }
                        // this enables a soft busy-waiting loop, that simulates non-blocking reads
                        Thread.sleep(100);
                        continue;
                    }
                    // perform the read, as we are sure it will not block (input is "ready")
                    int len = reader.read(buffer);
                    if(len == -1)
                    {
                        return;
                    }
                    // transform to string and let the consumer consume it
                    String str = new String(buffer, 0, len);
                    consumer.accept(str);
                }
                catch(InterruptedException e)
                {
                    // catch interruptions either when sleeping or when waiting for the lock,
                    // and restore the interrupted flag (not necessary in this case, however it's a best practice)
                    Thread.currentThread().interrupt();
                    return;
                }
                catch(IOException e)
                {
                    throw new RuntimeException(e.getMessage(), e);
                }
            }
        }
        finally
        {
            // protect the lock against unhandled exceptions
            if(lock != null && lock.isHeldByCurrentThread())
            {
                lock.unlock();
            }
            logger.debug("exit");
        }
    });
}
Note that both solutions, @ArcticLord's and mine, are not totally starvation-safe, though the chances (really few) are inversely proportional to consumer speed.
Happy 2016! ;)

Force BufferedWriter to write from BlockingQueue when other tasks finish

I'm writing a simple HTML parser with jsoup. I've got about 50,000 links to check, so I thought it was a great chance to learn about threads and concurrency. I've got 8 tasks registered with an ExecutorService: 6 of them parse links to some data stored in ArrayLists and then add it to BlockingQueues. The other two tasks are file writers based on BufferedWriter. The problem is that when my 6 tasks finish parsing all the links, the file writers stop writing data from the BlockingQueues, so I lose part of the data. I'm pretty new to Java, so if you could give me a hand... The code:
Main file:
public static void main(String[] args) {
    BlockingQueue<ArrayList<String>> units = new ArrayBlockingQueue<ArrayList<String>>(50, true);
    BlockingQueue<ArrayList<String>> subjects = new ArrayBlockingQueue<ArrayList<String>>(50, true);
    File subjectFile = new File("lekarze.csv");
    File unitFile = new File("miejsca.csv");
    ExecutorService executor = Executors.newFixedThreadPool(9);
    executor.submit(new Thread(new FileSaver(subjects, subjectFile)));
    executor.submit(new Thread(new FileSaver(units, unitFile)));
    for(int i = 29323; i < 29400; i++){
        executor.submit(new ParserDocsThread(i, subjects, units, errors));
    }
    executor.shutdown();
}
FileSaver class:
package parser;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.BlockingQueue;

public class FileSaver implements Runnable {
    private BlockingQueue<ArrayList<String>> toWrite = null;
    private File outputFile = null;
    private BufferedWriter writer = null;

    public FileSaver(BlockingQueue<ArrayList<String>> queue, File file){
        toWrite = queue;
        outputFile = file;
    }

    public void run() {
        try {
            writer = new BufferedWriter(new FileWriter(outputFile, true));
            while(true){
                try{
                    save(toWrite.take());
                } catch(InterruptedException e) {
                    e.printStackTrace();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void save(ArrayList<String> data){
        String temp = "";
        int size = data.size();
        for(int i = 0; i < size; i++){
            temp += data.get(i);
            if(i != size - 1) temp += '\t';
        }
        try {
            writer.write(temp);
            writer.newLine();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In ParserDocsThread I only use the put() method to add elements to the BlockingQueue.
Your consumer threads don't end cleanly because the take() call waits forever for a new array, and the buffered writer is never closed. The ExecutorService gives up waiting for these threads to finish and kills them. This causes the last lines held in the writer's buffer never to be written out to disk.
You should use poll(10, TimeUnit.SECONDS) (with an appropriate timeout). After that timeout your consumers will give up on the producers, and you should make sure you close your buffered writer properly so that the rest of the buffer is flushed out.
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile, true)))
{
    while(true){
        List<String> data = toWrite.poll(10, TimeUnit.SECONDS);
        if (data == null) {
            break;
        }
        save(data, writer);
    }
} catch (...) {
}
I've put the buffered writer into a try-with-resources here (so the try will automatically close the writer) and passed it to your save method, but you can do it your way and manually close the writer in a finally block if you want:
try {
    ...
} catch(...) {
} finally {
    writer.close(); // Closes and flushes out the remaining lines
}
You may also want to put in a call to awaitTermination on the executor service (like so: How to wait for all threads to finish, using ExecutorService?) with a wait time greater than your poll timeout.
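For example, something like this after the submit loop (a sketch; the 60-second wait is an arbitrary value, just keep it comfortably larger than the poll timeout):

executor.shutdown();
try {
    // give the consumers time to hit their poll timeout and close their writers
    if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
        executor.shutdownNow();
    }
} catch (InterruptedException e) {
    executor.shutdownNow();
    Thread.currentThread().interrupt();
}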

Is there any Java blocking queue that can save data to hard drive when limit is reached

I know that I can use JMS and ActiveMQ, but I really need something very simple without a lot of overhead. I did some tests with ActiveMQ and didn't really like the performance of its persistent queues.
What I'm looking for is a basic implementation of a blocking queue with the ability to store messages on the HDD (ideally) when some size limit is reached. It should then be able to read stored messages from the HDD and, if possible, stop writing new ones to the HDD (resuming in-memory use).
My scenario is very simple - messages (JSON) come in from the outside world. I do some processing and then send them to another REST service. A problem can occur when the target REST service is down or the network between us is bad. In that case, ready-to-go events are stored in a queue that can potentially fill up all available memory. I don't want/need to write every message to HDD/DB - only those that can't fit into memory.
Thank you!
This code should work for you - it's an in-memory persistent blocking queue. It needs some file tuning, but it should work:
package test;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class BlockingQueue {
    //private static Long maxInMemorySize = 1L;
    private static Long minFlushSize = 3L;
    private static String baseDirectory = "/test/code/cache/";
    private static String fileNameFormat = "Table-";
    private static String currentWriteFile = "";
    private static List<Object> currentQueue = new LinkedList<Object>();
    private static List<Object> lastQueue = new LinkedList<Object>();

    static{
        try {
            load();
        } catch (IOException e) {
            System.out.println("Unable To Load");
            e.printStackTrace();
        }
    }

    private static void load() throws IOException{
        File baseLocation = new File(baseDirectory);
        List<String> fileList = new ArrayList<String>();
        for(File entry : baseLocation.listFiles()){
            if(!entry.isDirectory() && entry.getName().contains(fileNameFormat)){
                fileList.add(entry.getAbsolutePath());
            }
        }
        Collections.sort(fileList);
        if(fileList.size()==0){
            //currentQueue = lastQueue = new ArrayList<Object>();
            currentWriteFile = baseDirectory + "Table-1";
            BufferedWriter writer = new BufferedWriter(new FileWriter(currentWriteFile));
            while (!lastQueue.isEmpty()){
                writer.write(lastQueue.get(0).toString()+ "\n");
                lastQueue.remove(0);
            }
            writer.close();
        }else{
            if(fileList.size()>0){
                BufferedReader reader = new BufferedReader(new FileReader(fileList.get(0)));
                String line=null;
                while ((line=reader.readLine())!=null){
                    currentQueue.add(line);
                }
                reader.close();
                File toDelete = new File(fileList.get(0));
                toDelete.delete();
            }
            if(fileList.size()>0){
                BufferedReader reader = new BufferedReader(new FileReader(fileList.get(fileList.size()-1)));
                currentWriteFile = fileList.get(fileList.size()-1);
                String line=null;
                while ((line=reader.readLine())!=null){
                    lastQueue.add(line);
                }
                reader.close();
                //lastFileNameIndex=Long.parseLong(fileList.get(fileList.size()).substring(6, 9));
            }
        }
    }

    private void loadFirst() throws IOException{
        File baseLocation = new File(baseDirectory);
        List<String> fileList = new ArrayList<String>();
        for(File entry : baseLocation.listFiles()){
            if(!entry.isDirectory() && entry.getName().contains(fileNameFormat)){
                fileList.add(entry.getAbsolutePath());
            }
        }
        Collections.sort(fileList);
        if(fileList.size()>0){
            BufferedReader reader = new BufferedReader(new FileReader(fileList.get(0)));
            String line=null;
            while ((line=reader.readLine())!=null){
                currentQueue.add(line);
            }
            reader.close();
            File toDelete = new File(fileList.get(0));
            toDelete.delete();
        }
    }

    public Object pop(){
        if(currentQueue.size()>0)
            return currentQueue.remove(0);
        if(currentQueue.size()==0){
            try {
                loadFirst();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        if(currentQueue.size()>0)
            return currentQueue.remove(0);
        else
            return null;
    }

    public synchronized Object waitTillPop() throws InterruptedException{
        if(currentQueue.size()==0){
            try {
                loadFirst();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            if(currentQueue.size()==0)
                wait();
        }
        return currentQueue.remove(0);
    }

    public synchronized void push(Object data) throws IOException{
        lastQueue.add(data);
        this.notifyAll();
        if(lastQueue.size()>=minFlushSize){
            BufferedWriter writer = new BufferedWriter(new FileWriter(currentWriteFile));
            while (!lastQueue.isEmpty()){
                writer.write(lastQueue.get(0).toString() + "\n");
                lastQueue.remove(0);
            }
            writer.close();
            currentWriteFile = currentWriteFile.substring(0,currentWriteFile.indexOf("-")+1) +
                (Integer.parseInt(currentWriteFile.substring(currentWriteFile.indexOf("-")+1,currentWriteFile.length())) + 1);
        }
    }

    public static void main(String[] args) {
        try {
            BlockingQueue bq = new BlockingQueue();
            for(int i = 0; i <= 8; i++){
                bq.push("" + i);
            }
            System.out.println(bq.pop());
            System.out.println(bq.pop());
            System.out.println(bq.pop());
            System.out.println(bq.waitTillPop());
            System.out.println(bq.waitTillPop());
            System.out.println(bq.waitTillPop());
            System.out.println(bq.waitTillPop());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Okay, so having your queue persisted to disk would work if you back your queue with a RandomAccessFile, a memory-mapped file, or a MappedByteBuffer, or some other equivalent implementation.
In the event of your JVM crashing or terminating prematurely, you can pretty much rely on your operating system to persist uncommitted buffers to disk. The caveat is that if your machine crashes beforehand, you can say goodbye to any updates in your queue, so make sure you understand this.
You can sync your disk for guaranteed persistence, albeit with a heavy performance hit.
From a more hardcore perspective, another option is to replicate to another machine for redundancy, which warrants a separate answer given its complexity.
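If you would rather roll your own, here is a minimal, non-production sketch of the overflow idea (all names are mine; it assumes one message per line, i.e. JSON without raw newlines, and it rereads the whole spill file on reloads, trading performance for simplicity):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Bounded in-memory queue that appends overflow to a spill file
// and drains it back into memory once readers catch up.
public class SpillingQueue {
    private final BlockingQueue<String> memory;
    private final Path spillFile;

    public SpillingQueue(int capacity, Path spillFile) throws IOException {
        this.memory = new ArrayBlockingQueue<>(capacity);
        this.spillFile = spillFile;
        Files.deleteIfExists(spillFile);
        Files.createFile(spillFile);
    }

    public synchronized void put(String msg) throws IOException {
        // Preserve FIFO order: once anything has spilled, keep appending
        // to disk until the spill file has been drained back into memory.
        if (Files.size(spillFile) > 0 || !memory.offer(msg)) {
            Files.write(spillFile, (msg + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
        }
    }

    public synchronized String poll() throws IOException {
        String head = memory.poll();
        if (head != null || Files.size(spillFile) == 0) {
            return head; // plain in-memory case (or nothing queued at all)
        }
        // Memory is empty but messages were spilled: reload as many as fit.
        List<String> spilled = new ArrayList<>(
                Files.readAllLines(spillFile, StandardCharsets.UTF_8));
        Iterator<String> it = spilled.iterator();
        while (it.hasNext() && memory.offer(it.next())) {
            it.remove(); // moved into memory, drop from the spill list
        }
        // Rewrite whatever did not fit back to the spill file.
        Files.write(spillFile, spilled, StandardCharsets.UTF_8,
                StandardOpenOption.TRUNCATE_EXISTING);
        return memory.poll();
    }
}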

Parallel image renaming in Java with ExecutorService: doesn't use 100% of my CPU power

I have a problem with my parallel Java code. I am trying to read some images from disk, change their names, and then save them again into a different folder.
To do so, I tried to run it in parallel as follows:
int nrOfThreads = Runtime.getRuntime().availableProcessors();
int nrOfImagesPerThread = Math.round(remainingImages.size()/((float)nrOfThreads));
ExecutorService ex2 = Executors.newFixedThreadPool(nrOfThreads);
int indexCounter = 0;
for(int i = 0; i < nrOfThreads; ++i) {
    if(i != (nrOfThreads-1)) {
        ex2.execute(new ImageProcessing(remainingImages.subList(indexCounter, indexCounter+nrOfImagesPerThread),
                newNames.subList(indexCounter, indexCounter+nrOfImagesPerThread)));
        indexCounter += nrOfImagesPerThread;
    } else {
        ex2.execute(new ImageProcessing(remainingImages.subList(indexCounter, remainingImages.size()),
                newNames.subList(indexCounter, remainingImages.size())));
    }
}
ex2.shutdown();
try {
    ex2.awaitTermination(12, TimeUnit.HOURS);
} catch (InterruptedException e) {
    e.printStackTrace();
}
and here is the ImageProcessing class:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageProcessing implements Runnable {
    private List<String> oldPaths;
    private List<String> newPaths;

    public ImageProcessing(List<String> oldPaths, List<String> newPaths) {
        this.oldPaths = oldPaths;
        this.newPaths = newPaths;
    }

    @Override
    public void run() {
        for(int i = 0; i < oldPaths.size(); ++i) {
            try {
                BufferedImage img = ImageIO.read(new File(oldPaths.get(i)));
                File output = new File(newPaths.get(i));
                ImageIO.write(img, "jpg", output);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
I divide the image locations in the for-loop into (number of threads) parts, so in my case around 8 parts. When I run the code, it does run in parallel, but it does not utilize 100% of my CPU power; it only uses around 25% of each processor.
Does anybody have an idea why that happens? Or did I just screw up somewhere in the programming?
Thanks a lot!
Edit: Just for completeness, for people looking for the same functionality: I had a look at the Apache Commons IO library (see here) and found a nice and much faster way to copy the images from one HDD to the other. The ImageProcessing class now looks like the following:
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.commons.io.FileUtils;

public class ImageProcessing implements Runnable {
    private List<String> oldPaths;
    private List<String> newPaths;

    public ImageProcessing(List<String> oldPaths, List<String> newPaths) {
        this.oldPaths = oldPaths;
        this.newPaths = newPaths;
    }

    @Override
    public void run() {
        for(int i = 0; i < oldPaths.size(); ++i) {
            File sourceFile = new File(oldPaths.get(i));
            File targetFile = new File(newPaths.get(i));
            // copy file from one location to the other
            try {
                FileUtils.copyFile(sourceFile, targetFile);
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}
Your problem is that the bottleneck here is definitely the I/O to the disk.
You would probably need about the same time to rename your files without an ExecutorService.
In other words: writing the changes (renaming the files) to disk consumes more time than your CPU uses.
You cannot effectively multithread such an action.
Just measure the time needed by a serial (non-multithreaded) version of your code and compare it to the time needed by your multithreaded code. It will be more or less the same.
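A rough timing harness for that comparison (oldPaths/newPaths stand in for your own lists):

// serial version: run the same work on the current thread and time it
long start = System.nanoTime();
new ImageProcessing(oldPaths, newPaths).run();
long serialMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("Serial: " + serialMs + " ms");
// ...then time the ExecutorService version on the same data and compare.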

Log thread memory leak

I have coded a background logging thread for my program; if a class needs a logger, it pulls it from my thread pool, so for each filename there is only one log running. A class adds anything that needs to be logged via log(String).
Anyway, whenever I turn logging on and it runs writeToLog(), after a while I get a heap OutOfMemoryError. This is caused by the log threads, but I can't see where the memory leak is, and I am not that great at threading. My only idea is that it is in the BufferedWriter?
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Calendar;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Log extends Thread{
    private String file;
    private BlockingQueue<String> pq = new LinkedBlockingQueue<String>();
    private BufferedWriter bw;
    private boolean Writing;

    @Deprecated
    public Log(){
        super();
        file = "log.txt";
        start();
    }

    public Log(ThreadGroup tg, String fileName){
        super(tg, fileName);
        file = fileName;
        try {
            new File(file).createNewFile();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        start();
    }

    public Log(String fileName){
        file = fileName;
        try {
            new File(file).createNewFile();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        start();
    }

    @Override
    public void run(){
        //System.out.println("Log Thread booted " + file);
        while(Run.running){
            if (!Writing){
                if(Run.logging)
                    writeToLog();
            }
            try{
                Thread.sleep(500);
            }catch(InterruptedException e){
                Thread.currentThread().interrupt();
                break;
            }
        }
        //System.out.println("Log Thread shutting down " + file);
    }

    public synchronized void log(String s){
        if(Run.logging)
            pq.add(s);
    }

    private void writeToLog(){
        try{
            Writing = true;
            bw = new BufferedWriter(new FileWriter(file, true));
            while(!pq.isEmpty()){
                bw.write(Calendar.getInstance().getTime().toString() +" " +pq.poll());
                bw.newLine();
            }
            bw.flush();
            bw.close();
            Writing = false;
        }catch(Exception e){Writing = false; e.printStackTrace();}
    }
}
EDIT - It is worth mentioning as well that, in the context of the program, it is logging hundreds to thousands of lines.
Many thanks
Sam
If your background thread doesn't write to the disk fast enough, the LinkedBlockingQueue (whose capacity you left unspecified) will grow until it contains Integer.MAX_VALUE strings. That's too much for your Java heap.
Specify a capacity so that, when the queue is full, the thread calling the log method waits while part of the queued log is dumped to disk:
private BlockingQueue<String> pq = new LinkedBlockingQueue<String>(1000);
Use put instead of add in the log method so that the logging operation waits instead of throwing an exception.
(Did you notice that you record the time of writing to disk instead of the time of logging?)
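Applied to your log method, that might look like this (a sketch; note it also timestamps at logging time, per the remark above, so you would drop the timestamp in writeToLog()):

public void log(String s) {
    if (Run.logging) {
        try {
            // block when the queue is full instead of growing without bound
            pq.put(Calendar.getInstance().getTime().toString() + " " + s);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}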
I believe having private BufferedWriter bw; as a member variable is causing the trouble. Since you only use it in your writeToLog() function, there is no reason for it to be a member variable that gets re-assigned every time by multiple threads. Creating the BufferedWriter within the function lets the object be garbage-collected as soon as it goes out of scope:
private void writeToLog(){
    try{
        Writing = true;
        BufferedWriter bw = new BufferedWriter(new FileWriter(file, true));
        while(!pq.isEmpty()){
            bw.write(Calendar.getInstance().getTime().toString() +" " +pq.poll());
            bw.newLine();
        }
        bw.flush();
        bw.close();
        Writing = false;
    }catch(Exception e){Writing = false; e.printStackTrace();}
}
