Multi threading multiple pdf files - java

I'm trying to run multiple PDF files through a function that scrapes the text, compares it to a static dictionary, then adds its relational data to an index table in MySQL. I looked into multi-threading but am not sure if it would achieve what I need.
Here is the for loop where I am going through all the PDF files
for(String temp: files){
//addToDict(temp,dictonary,conn);
//new Scraper(temp,dictonary,conn).run();
Scraper obj=new Scraper(temp,dictonary,conn);
Thread T1 =new Thread(obj);
T1.start();
//System.out.println((ammountOfFiles--)+" files left");
}
And here is the Scraper class I created, which implements Runnable
public class Scraper implements Runnable {
private String filePath;
private HashMap<String,Integer> map;
private Connection conn;
public Scraper(String file_path,HashMap<String,Integer> dict,Connection connection) {
// store parameters for later use
filePath =file_path;
map = dict;
conn = connection;
}
@Override
public void run() {
//cut file path so it starts from the data folder
int cutPos = filePath.indexOf("Data");
String cutPath = filePath.substring(cutPos);
cutPath = cutPath.replaceAll("\\\\", "|");
System.out.println(cutPath+" being scrapped");
// Queries
String addSentanceQuery ="INSERT INTO sentance(sentance_ID,sentance_Value) VALUES(Default,?)";
String addContextQuery ="INSERT INTO context(context_ID,word_ID,sentance_ID,pdf_path) VALUES(Default,?,?,?)";
// Prepared Statementes
// RESULT SETS
ResultSet sentanceKeyRS=null;
BodyContentHandler handler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();
FileInputStream inputstream = null;
try {
inputstream = new FileInputStream(new File(filePath));
} catch (FileNotFoundException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
}
ParseContext pcontext = new ParseContext();
//parsing the document using PDF parser
PDFParser pdfparser = new PDFParser();
try {
pdfparser.parse(inputstream, handler, metadata, pcontext);
} catch (IOException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
} catch (SAXException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
} catch (TikaException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
}
//getting the content of the document
String fileText = handler.toString();
fileText = fileText.toLowerCase();
//spilt text by new line
String sentances [] = fileText.split("\\n");
for(String x : sentances){
x = x.trim();
if(x.isEmpty() || x.matches("\\t+") || x.matches("\\n+") || x.matches("")){
}else{
int sentanceID = 0;
//add sentance to db and get the id
try (PreparedStatement addSentancePrepare = conn.prepareStatement(addSentanceQuery,Statement.RETURN_GENERATED_KEYS)) {
addSentancePrepare.setString(1, x);
addSentancePrepare.executeUpdate();
sentanceKeyRS = addSentancePrepare.getGeneratedKeys();
while (sentanceKeyRS.next()) {
sentanceID = sentanceKeyRS.getInt(1);
}
addSentancePrepare.close();
sentanceKeyRS.close();
} catch (SQLException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
}
String words [] = x.split(" ");
for(String y : words){
y = y.trim();
if(y.matches("\\s+") || y.matches("")){
}else if(map.containsKey(y)){
//get ID and put in middle table
try (PreparedStatement addContextPrepare = conn.prepareStatement(addContextQuery)) {
addContextPrepare.setInt(1, map.get(y));
addContextPrepare.setInt(2, sentanceID);
addContextPrepare.setString(3, cutPath);
addContextPrepare.executeUpdate();
addContextPrepare.close();
} catch (SQLException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
}
}
try {
inputstream.close();
} catch (IOException ex) {
Logger.getLogger(Scraper.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
Am I going about this correctly? I have never used multi threading but it seems like it would speed up my program.

You have completed the basic modeling of your program, and conceptually you have it almost right. There are a few concerns, though.
Scalability
You cannot simply increase the number of threads as you get more files to process. It feels as if adding concurrent workers should always improve performance, but in the real world that is not the case. Once the number of threads passes a certain level (which depends on various parameters), performance actually decreases because of thread contention, communication and memory usage. So I propose using a thread pool implementation that comes with the java.util.concurrent package. Refer to the following modification I made to your code.
public class Test {
private final ThreadPoolExecutor threadPoolExecutor;
public Test(int coreSize, int maxSize) {
this.threadPoolExecutor = new ThreadPoolExecutor(coreSize,maxSize, 50, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(100));
}
public void submit(String[] files) {
for(String temp: files){
//addToDict(temp,dictonary,conn);
//new Scraper(temp,dictonary,conn).run();
// dictonary and conn are assumed to be fields initialised elsewhere, as in the question
Scraper obj=new Scraper(temp,dictonary,conn);
threadPoolExecutor.submit(obj);
//System.out.println((ammountOfFiles--)+" files left");
}
}
public void shutDown() {
this.threadPoolExecutor.shutdown();
}
}
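One caveat with the pool above: with a bounded ArrayBlockingQueue of 100 and the default rejection policy, submit() throws RejectedExecutionException once the queue is full and the maximum number of threads is busy. A minimal sketch of an alternative, assuming a fixed-size pool with its default unbounded queue is acceptable, is shown here. The class name BoundedScrapeDemo and the counter that stands in for the real Scraper work are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedScrapeDemo {

    // Runs one task per file on a pool of poolSize threads and
    // returns how many tasks completed.
    static int processAll(List<String> files, int poolSize) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String file : files) {
            pool.submit(() -> {
                // Hypothetical stand-in: real code would parse the PDF
                // and write to the database here.
                processed.incrementAndGet();
            });
        }
        pool.shutdown(); // stop accepting new tasks, let queued ones finish
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // give up on tasks that are still running
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> files = Arrays.asList("a.pdf", "b.pdf", "c.pdf");
        System.out.println(processAll(files, 2) + " files processed");
    }
}
```

Calling shutdown() followed by awaitTermination() also gives the caller a point at which all files are known to be done, which the plain Thread loop in the question does not.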
Thread safety and Synchronization
I can see you share a single java.sql.Connection instance across the threads. Even where a driver's Connection is thread safe, it typically achieves that through synchronization, so only one thread can use the connection at a time and this usage will reduce your application's performance significantly. To overcome this, use connection pooling so that each worker borrows its own connection. One simple implementation I can suggest is Apache Commons DBCP.
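To illustrate the pooling idea itself (this is not the Commons DBCP API), here is a minimal sketch of a blocking pool built only on java.util.concurrent. SimplePool, borrow and giveBack are hypothetical names, and a StringBuilder stands in for a database connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Creates 'size' resources up front using the given factory.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until a resource is free, so at most 'size' are in use at once.
    T borrow() throws InterruptedException {
        return idle.take();
    }

    // Returns a previously borrowed resource to the pool.
    void giveBack(T resource) {
        idle.add(resource);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        StringBuilder resource = pool.borrow();
        try {
            resource.append("used by one worker at a time");
        } finally {
            pool.giveBack(resource); // always return the resource
        }
        System.out.println("idle resources: " + pool.idleCount());
    }
}
```

A real connection pool adds validation, timeouts and eviction on top of this, which is why a library is usually the better choice.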

Related

Returning values from thread

I am running selenium tests inside a for loop which takes time.
I needed to indicate the progress of those tests using javafx progressbar.
So I replaced the code inside for loop with a task.
Following is my code
The runTests() method returns a String which is displayed as an Alert
I cannot return a String from inside Task<Void> as the return type is Void
The test code is inside runTest(data) method which returns true or false
@FXML
private void handleRunTestButton(ActionEvent aEvent) throws IOException {
String testResultMessage = runTests();
if (testResultMessage.equals("Testing Complete!")) {
Alert alert = DialogUtils.getAlert("Info", "Information", testResultMessage, "info");
alert.showAndWait();
} else {
Alert alert = DialogUtils.getAlert("Error", "Error(s)", testResultMessage, "error");
alert.showAndWait();
}
}
private String runTests() {
/* xlsx read */
FileInputStream fis = null;
File testDataFile = null;
try {
fis = new FileInputStream(selectedTestDataFile);
} catch (FileNotFoundException e) {
return "File Input Stream Error: File Not Found";
}
// Finds the workbook instance for XLSX file
XSSFWorkbook myWorkBook = null;
try {
myWorkBook = new XSSFWorkbook(fis);
} catch (IOException e) {
return "XSSFWorkbook I/O Error";
}
// Return first sheet from the XLSX workbook
XSSFSheet mySheet = myWorkBook.getSheetAt(0);
int totalWids = mySheet.getLastRowNum();
final Task<Void> task = new Task<Void>() {
@Override
protected Void call() throws Exception {
for (int rowIndex = 1; rowIndex <= totalWids; rowIndex++) {
updateProgress(rowIndex, totalWids);
Row row = mySheet.getRow(rowIndex);
if (row != null) {
String data = "";
Cell cellData = row.getCell(2);
if (cellData != null) {
data = cellData.getStringCellValue();
boolean testresult = runTest(data);
System.out.println(rowIndex + ". data = " + data + ", testresult = " + testresult);
}
}
}
return null;
}
};
progressBar.progressProperty().bind(task.progressProperty());
progressIndicator.progressProperty().bind(task.progressProperty());
final Thread thread = new Thread(task, "task-thread");
thread.setDaemon(true);
thread.start();
/* xlsx read */
FileOutputStream fos = null;
try {
fos = new FileOutputStream(selectedTestDataFile);
} catch (FileNotFoundException e) {
try {
myWorkBook.close();
} catch (IOException e1) {
return "Error: Please Close Workbook";
}
return "Error: File Not Found";
}
try {
myWorkBook.write(fos);
} catch (IOException e) {
try {
myWorkBook.close();
} catch (IOException e1) {
return "Error: Please Close Workbook";
}
return "Error: Workbook Write";
}
try {
fos.close();
} catch (IOException e) {
try {
myWorkBook.close();
} catch (IOException e1) {
return "Error: Please Close Workbook";
}
return "Error: File Output Stream";
}
try {
myWorkBook.close();
} catch (IOException e) {
return "Error: Please Close Workbook";
}
try {
fis.close();
} catch (IOException e) {
return "Error: Input file format";
}
return "Testing Complete!";
}
However, now it returns Testing Complete! while the tests are still running.
I am new to multithreading. Please suggest how I should structure the code.
How can I make the runTests() method return a String value from inside
final Task<Void> task = new Task<Void>() {
@Override
protected Void call() throws Exception {
for () {
}
return null;
}
};
Before this, when I didn't use Task, my code showed the alert properly, but the progress bar did not update despite my setting the progress from within the for loop.
Your code, in general, seems pretty solid, but there are several problems.
The task you created does the trick, and the progress bar will work. However, the tests now run on a separate thread, so returning "Testing Complete!" without waiting for that thread is wrong: the method does not depend on the thread, so it returns its value before the tests are done.
When you call thread.start(), the new thread executes separately from your current thread, so your code carries on as usual even though the thread has not finished.
You have two options: keep the thread, or don't. If you drop the thread, the tests run inside the method, and the JavaFX event that called it has to wait for them to finish. This is a bad idea because the JavaFX thread is then stuck and the window cannot handle any other events (basically, it becomes unresponsive).
A good option is to keep the thread and, at the end of the thread, show a dialog indicating whether the tests completed. To do that, use Platform.runLater(runnable) and pass it a Runnable that shows the dialog:
Platform.runLater(()->{
//show dialog
});
This is required because you can't show a dialog from outside the JavaFX thread. Platform.runLater lets you run something on the JavaFX thread.
Another issue is that you access the files outside of your thread. While the thread is running your tests, the method is already reading the files and writing to them. Instead, you should write to the file either inside the thread or before it is started.
To summarize, use your thread to execute the tests and to show the dialogs that indicate whether the tests completed. Writing to your test file should not happen while the thread is still executing tests, but after they have finished, so you can do it at the end of the task.
public void runTests(){
if(testsRunning) return;
testsRunning = true;
final Task<Void> task = new Task<Void>() {
@Override
protected Void call() throws Exception {
FileInputStream fis = null;
File testDataFile = null;
try {
fis = new FileInputStream(selectedTestDataFile);
} catch (FileNotFoundException e) {
displayResponse("File Input Stream Error: File Not Found");
return null; // stop here, otherwise fis is null below
}
// Finds the workbook instance for XLSX file
XSSFWorkbook myWorkBook = null;
try {
myWorkBook = new XSSFWorkbook(fis);
} catch (IOException e) {
displayResponse("XSSFWorkbook I/O Error");
return null; // stop here, otherwise myWorkBook is null below
}
// Get the first sheet from the XLSX workbook
XSSFSheet mySheet = myWorkBook.getSheetAt(0);
int totalWids = mySheet.getLastRowNum();
for (int rowIndex = 1; rowIndex <= totalWids; rowIndex++) {
updateProgress(rowIndex, totalWids);
Row row = mySheet.getRow(rowIndex);
if (row != null) {
String data = "";
Cell cellData = row.getCell(2);
if (cellData != null) {
data = cellData.getStringCellValue();
boolean testresult = runTest(data);
System.out.println(rowIndex + ". data = " + data + ", testresult = " + testresult);
}
}
}
/* xlsx read */
FileOutputStream fos = null;
try {
fos = new FileOutputStream(selectedTestDataFile);
} catch (FileNotFoundException e) {
try {
myWorkBook.close();
} catch (IOException e1) {
displayResponse("Error: Please Close Workbook");
}
displayResponse("Error: File Not Found");
return null; // do not try to write without an output stream
}
try {
myWorkBook.write(fos);
} catch (IOException e) {
try {
myWorkBook.close();
} catch (IOException e1) {
displayResponse("Error: Please Close Workbook");
}
displayResponse("Error: Workbook Write");
return null; // do not report success after a failed write
}
try {
fos.close();
} catch (IOException e) {
try {
myWorkBook.close();
} catch (IOException e1) {
displayResponse("Error: Please Close Workbook");
}
displayResponse("Error: File Output Stream");
return null; // do not report success after a failed close
}
try {
myWorkBook.close();
} catch (IOException e) {
displayResponse("Error: Please Close Workbook");
}
try {
fis.close();
} catch (IOException e) {
displayResponse("Error: Input file format");
}
displayResponse("Testing Complete!");
return null;
}
private void displayResponse(String testResultMessage){
Platform.runLater(()->{
if (testResultMessage.equals("Testing Complete!")) {
Alert alert = DialogUtils.getAlert("Info", "Information", testResultMessage, "info");
alert.showAndWait();
} else {
Alert alert = DialogUtils.getAlert("Error", "Error(s)", testResultMessage, "error");
alert.showAndWait();
}
testsRunning = false;
});
}
};
progressBar.progressProperty().bind(task.progressProperty());
progressIndicator.progressProperty().bind(task.progressProperty());
final Thread thread = new Thread(task, "task-thread");
thread.setDaemon(true);
thread.start();
}
This code now does everything test-related in the thread and doesn't stop your window from handling events. That creates one new problem: someone might press the run-tests button again while the tests are running. One option, which I added, is a boolean called testsRunning that indicates whether the tests are already active and is checked when runTests is called. displayResponse is called when the tests have finished (successfully or not) and displays the response dialog.
Hope I helped, and sorry for the long answer.
First off, you can't return a value from the thread in the sense you want without blocking. Instead, try calling a method when the thread is done.
Let's say we have this thread that runs some intensive task (in this case a simple for loop) and you want it to "return" the sum when it is done:
private void startMyThread() {
Thread t = new Thread( () -> {
System.out.println("Thread Started.");
// Some intensive task
int sum = 0;
for(int i = 0; i < 1000000; i++) {
sum++;
}
System.out.println("Thread ending.");
threadIsDone(sum);
});
System.out.println("Starting Thread.");
t.start();
}
Instead of getting your return value from startMyThread() you wait to execute your action until threadIsDone() is called:
private void threadIsDone(int sum) {
Platform.runLater( () -> {
/* Update Progress Bar */
System.out.println("Updated Progress Bar");
});
System.out.println("Thread ended.");
}
You'll notice I use Platform.runLater() inside the method because all updates to JavaFX elements need to be done on the JavaFX Application Thread. Since we called threadIsDone() from a different thread, we need to tell JavaFX to perform this action on its own thread and not on the current one.
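The same "call a method when the work is done" idea can be sketched without JavaFX by using CompletableFuture from the standard library. This is only an illustration under that assumption: CallbackDemo and sumAsync are hypothetical names, and in a JavaFX application the body of the callback would still need to be wrapped in Platform.runLater().

```java
import java.util.concurrent.CompletableFuture;

class CallbackDemo {

    // Starts the intensive task on a background thread and returns a
    // future that completes with the sum.
    static CompletableFuture<Integer> sumAsync(int n) {
        return CompletableFuture.supplyAsync(() -> {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum++;
            }
            return sum;
        });
    }

    public static void main(String[] args) {
        System.out.println("Starting task.");
        // The callback runs once the background work has produced its value.
        CompletableFuture<Void> done = sumAsync(1_000_000)
                .thenAccept(sum -> System.out.println("Task finished, sum = " + sum));
        // join() is here only so this small demo does not exit early;
        // a UI application would not block like this.
        done.join();
    }
}
```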

get query behind each executorservice thread

I am using ExecutorService in Java to execute some threads, let's say ten, although the number may vary. Each thread executes a SQL Server query. I am using the Future and Callable classes to submit the tasks, and I get the results [using future.get()] once each thread is finished.
Now my requirement is that I need to know the query which is executed by each thread once its result is returned, even if the result is an empty set.
Here is my code:
List<Future<List>> list = new ArrayList<Future<List>>();
int totalThreads = allQueriesWeight.size();
ExecutorService taskExecutor = Executors.newFixedThreadPool(totalThreads);
for (String query : allQueriesWeight) {//allQueriesWeight is an arraylist containing sql server queries
SearchTask searchTask = new SearchTask(query);
Future<List> submit = taskExecutor.submit(searchTask);
list.add(submit);
}
Here is my call function:
@Override
public List<SearchResult> call() throws Exception {
java.sql.Statement statement = null;
Connection co = null;
List<SearchResult> allSearchResults = new ArrayList();
try {
//executing query and getting results
while (r1.next()) {
...
allSearchResults.add(r);//populating array
}
} catch (Exception e) {
Logger.getLogger(GenericResource.class.getName()).log(Level.SEVERE, null, e);
} finally {
if (statement != null) {
statement.close();
}
if (co != null) {
co.close();
}
}
return allSearchResults;
}
Here is how I am getting the results:
for (Future<List> future : list) {
try {
System.out.println(future.get().size());
List<SearchResult> sr = future.get();
} catch (InterruptedException ex) {
Logger.getLogger(GenericResource.class.getName()).log(Level.SEVERE, null, ex);
} catch (ExecutionException ex) {
Logger.getLogger(GenericResource.class.getName()).log(Level.SEVERE, null, ex);
}
}
In this above for loop, I need to identify the query of which the result is returned. I am a newbie and any help/suggestion is highly appreciated.
Thanks.
Alternative 1:
Both lists are in the same order and of the same size, so you can simply do as below
for (int i = 0; i < allQueriesWeight.size(); i++) {
String query = allQueriesWeight.get(i);
Future<List> future = list.get(i);
}
Alternative 2:
If all the queries are different, you can use a map as shown below, but a HashMap does not preserve the order in which the queries were submitted.
int totalThreads = allQueriesWeight.size();
Map<String,Future<List>> map = new HashMap<>();
ExecutorService taskExecutor = Executors.newFixedThreadPool(totalThreads);
for (String query : allQueriesWeight) {//allQueriesWeight is an arraylist containing sql server queries
SearchTask searchTask = new SearchTask(query);
Future<List> submit = taskExecutor.submit(searchTask);
map.put(query ,submit );
}
And then iterate the map
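for (Entry<String,Future<List>> entry : map.entrySet()) {
System.out.println("query is:" + entry.getKey());
// get() throws InterruptedException and ExecutionException, handle them as in the question
List<SearchResult> sr = entry.getValue().get();
}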
Alternative 3
If you want to keep the order, create a class with Future and query as the attributes and then put that class in list
public class ResultWithQuery {
private final Future<List> future;
private final String query;
public ResultWithQuery(Future<List> future, String query) {
this.future = future;
this.query = query;
}
public Future<List> getFuture() {
return future;
}
public String getQuery() {
return query;
}
}
And
List<ResultWithQuery > list = new ArrayList<ResultWithQuery >();
int totalThreads = allQueriesWeight.size();
ExecutorService taskExecutor = Executors.newFixedThreadPool(totalThreads);
for (String query : allQueriesWeight) {//allQueriesWeight is an arraylist containing sql server queries
SearchTask searchTask = new SearchTask(query);
Future<List> submit = taskExecutor.submit(searchTask);
list.add(new ResultWithQuery (submit, query));
}
And iterate the list
for (ResultWithQuery resQuery: list) {
try {
resQuery.getQuery();
List<SearchResult> sr = resQuery.getFuture().get();
} catch (InterruptedException ex) {
Logger.getLogger(GenericResource.class.getName()).log(Level.SEVERE, null, ex);
} catch (ExecutionException ex) {
Logger.getLogger(GenericResource.class.getName()).log(Level.SEVERE, null, ex);
}
}
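A further variation on Alternative 3, sketched here under the assumption that the task class can be changed, is to let the Callable return an object that carries both the query and its rows. The names TaggedResultDemo, TaggedResult and fakeSearch are hypothetical, and fakeSearch only stands in for the real SQL Server call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaggedResultDemo {

    // Pairs a query with whatever rows it produced (possibly none).
    static final class TaggedResult {
        final String query;
        final List<String> rows;

        TaggedResult(String query, List<String> rows) {
            this.query = query;
            this.rows = rows;
        }
    }

    // Hypothetical stand-in for running the query against the database.
    static List<String> fakeSearch(String query) {
        if (query.contains("empty")) {
            return new ArrayList<>();
        }
        return Arrays.asList("row for " + query);
    }

    // Runs every query on a small pool; results come back in submission order.
    static List<TaggedResult> runAll(List<String> queries) throws Exception {
        int threads = Math.max(1, Math.min(queries.size(), 4));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<TaggedResult>> futures = new ArrayList<>();
            for (String query : queries) {
                Callable<TaggedResult> task = () -> new TaggedResult(query, fakeSearch(query));
                futures.add(pool.submit(task));
            }
            List<TaggedResult> results = new ArrayList<>();
            for (Future<TaggedResult> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (TaggedResult r : runAll(Arrays.asList("query one", "empty query"))) {
            System.out.println(r.query + " -> " + r.rows.size() + " row(s)");
        }
    }
}
```

Because the query travels with the result, an empty result set can still be traced back to the query that produced it.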

Java runtime.exec user input race condition

I want my app to be able to use a global su instance. I have code that does that, but I have encountered a race condition, I believe.
I am storing some variables for su like so:
public static List<Object> rootObjects = Collections.synchronizedList(new ArrayList<>());
protected void onCreate(Bundle savedInstanceState) {
...
if(PreferenceManager.getDefaultSharedPreferences(
getApplicationContext()).getBoolean("use_su", false) && rootObjects.isEmpty())
{
try {
Process process = Runtime.getRuntime().exec("su");
rootObjects.add(process);
InputStream inputStream = new DataInputStream(process.getInputStream());
rootObjects.add(inputStream);
OutputStream outputStream = new DataOutputStream(process.getOutputStream());
rootObjects.add(outputStream);
} catch (IOException e) {
Log.d(MainActivity.mainActivity.getPackageName(), e.getLocalizedMessage());
}
finally {
synchronized (rootObjects) {
rootObjects.notifyAll();
}
}
}
}
and using them like so:
byte[] getPrivateKeyAsSuperUser() {
byte[] data = null;
DataInputStream inputStream = null;
DataOutputStream outputStream = null;
if(MainActivity.rootObjects.size() != 3)
synchronized (MainActivity.rootObjects)
{
try {
MainActivity.rootObjects.wait();
} catch (InterruptedException e) {
Log.d(MainActivity.mainActivity.getPackageName(), e.getLocalizedMessage());
}
}
for(Object rootObj : MainActivity.rootObjects)
{
if(rootObj instanceof DataInputStream)
inputStream = (DataInputStream) rootObj;
else if(rootObj instanceof DataOutputStream)
outputStream = (DataOutputStream) rootObj;
}
try {
outputStream.writeBytes(String.format("cat \"%s\"\n", sshPrivateKey.getAbsolutePath()));
outputStream.flush();
data = readStream(inputStream);
} catch (IOException e) {
Log.d(MainActivity.mainActivity.getPackageName(), e.getLocalizedMessage());
}
return data;
}
private byte[] readStream(InputStream stream) {
byte[] data = null;
try {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte buff[] = new byte[1024];
int count = 0;
while (stream.available() != 0 && (count = stream.read(buff)) != -1) {
bos.write(buff, 0, count);
}
data = bos.toByteArray();
//System.out.println(new String(data));
} catch (IOException e) {
Log.d(MainActivity.mainActivity.getPackageName(), e.getLocalizedMessage());
}
return data;
}
But it does not wait like I expect, and I instantly receive a Toast that the returned private key is not valid with my sanity check (It's probably null).
The code works if I let Process finish initializing, but I'd like the program to do that for me.
I've tried some other synchronization techniques such as locks, but apparently as soon as you know if an object has a lock your info is stale.
What is the best thread safe approach to have the caller of getPrivateKeyAsSuperUser() wait if Process is not initialized properly?
EDIT:
I would like to add that, through some debugging, I have found that I do not want to be waiting for Process to initialize (because what I have DOES that), but rather for the shell spawned by su to be ready to accept further commands. I suppose I could have a thread pipe something like echo DONE and loop until I get DONE back, but that seems like it would waste CPU horsepower. If someone could lend some knowledge on the subject, I would be extremely grateful.
You're attempting the singleton pattern here. I'm not sure why you want to store these objects in a list. The most sensible way to store them is in an object that you guarantee to create one instance of. There are a few ways you could do this. I think in your case the following would work
public class SuProcessHolder {
// store the state of the process here - this would be your Process and streams as above
// these should be non-static members of the class
// this would be the singleton instance you'll use - it will be constructed once
// on first use
private static final SuProcessHolder singletonInstance = new SuProcessHolder();
// private, so no second instance can be created from outside
private SuProcessHolder() {
// put your construction code in here to create an SU process
}
// this will access your SU process
public static SuProcessHolder getInstance() { return singletonInstance; }
}
Then, wherever you need your SU process, just call
SuProcessHolder.getInstance()
and it will be there like a Michael Jackson song.
I have solved it. I did end up having to echo and check for done, but I have done it without a loop, or sleeping in my thread, so it will fire as soon as it can, without hogging the CPU. The concurrent class I was looking for was CountDownLatch as well.
The assignment looks like this:
process = Runtime.getRuntime().exec("su");
outputStream = new DataOutputStream(process.getOutputStream());
outputStream.writeBytes("echo DONE\n");
outputStream.flush();
inputStream = new DataInputStream(process.getInputStream());
byte[] buff = new byte[4];
inputStream.read(buff);
if(new String(buff).equals("DONE"))
MainActivity.rootLatch.countDown();
and getPrivateKeyAsSuperUser() became:
byte[] getPrivateKeyAsSuperUser() {
byte[] data = null;
try {
MainActivity.rootLatch.await();
} catch (InterruptedException e) {
Log.d(MainActivity.mainActivity.getPackageName(), e.getLocalizedMessage());
}
Su su = Su.getStaticInstance();
try {
su.outputStream.writeBytes(String.format("cat \"%s\"\n", sshPrivateKey.getAbsolutePath()));
su.outputStream.flush();
data = readStream(su.inputStream);
} catch (IOException e) {
Log.d(MainActivity.mainActivity.getPackageName(), e.getLocalizedMessage());
}
return data;
}
Although, this feels slightly sloppy, I may end up posting this on Code Review.
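The waiting pattern described above can be reduced to a small, self-contained sketch. It assumes a latch with a count of one and uses a plain string as a hypothetical stand-in for the su process and its streams. LatchDemo, initialise and useShell are illustrative names, and the timeout on await() is an addition so that a shell that never answers does not block the caller forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchDemo {
    private static final CountDownLatch ready = new CountDownLatch(1);
    // Hypothetical stand-in for the su process and its streams.
    private static volatile String shell;

    // Starts initialisation in the background and opens the latch when done.
    static void initialise() {
        new Thread(() -> {
            shell = "su-shell"; // pretend the shell echoed DONE
            ready.countDown();  // release every thread waiting in useShell()
        }).start();
    }

    // Blocks without spinning until the shell is ready, or fails after 5 seconds.
    static String useShell() throws InterruptedException {
        if (!ready.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("shell was not ready in time");
        }
        return shell;
    }

    public static void main(String[] args) throws InterruptedException {
        initialise();
        System.out.println("got " + useShell());
    }
}
```

Once the count reaches zero, later calls to await() return immediately, so every subsequent caller pays no waiting cost.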

Execute SQL Queries in multiple threads (HSQLDB)

Let's say we've got an SQL database (hsqldb) and want to run a number of queries on it which do not modify the content.
This takes a long time for some queries and I would like to run the queries in multiple threads.
So my question is: what is the best way to implement this?
I did not find any good samples to do this so I came up with the following (which I would love to get some comments on).
First, very briefly in words:
I use thread-safe collections to access the queries and to put the results in. The queries are executed in a number of worker threads. The results are processed in the main thread which checks for new results until all threads are finished.
Now the code:
Create thread-safe collections of queries and results:
ConcurrentLinkedQueue<String> queries = new ConcurrentLinkedQueue<String>();
ConcurrentLinkedQueue<ResultSet> sqlResults = new ConcurrentLinkedQueue<ResultSet>();
Create a number of threads and start them (edited):
ExecutorService executorService = Executors.newFixedThreadPool(4);
for(int i=0; i<4; i++){
executorService.execute(new QueryThread(sqlResults, queries));
}
Within the thread class QueryThread a connection is opened and queries are executed as long as there are any left:
private class QueryThread implements Runnable {
private ConcurrentLinkedQueue<ResultSet> sqlResults;
private ConcurrentLinkedQueue<String> queries;
public QueryThread(ConcurrentLinkedQueue<ResultSet> sqlResults, ConcurrentLinkedQueue<String> queries){
this.sqlResults = sqlResults;
this.queries = queries;
}
@Override
public void run(){
Connection connThr = null;
try{
try {
connThr = DriverManager.getConnection(dbModeSave, "sa", "");
connThr.setAutoCommit(false);
} catch (SQLException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String currentQuery;
do {
currentQuery = queries.poll(); // get and remove element from remaining queries
if (currentQuery != null) { // only continue if element was found
try {
Statement stmnt = connThr.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,ResultSet.CONCUR_UPDATABLE);
try {
ResultSet resultSet = stmnt.executeQuery(currentQuery);
sqlResults.add(resultSet);
} catch (SQLException e) {
// (Do something specific)
} finally {
stmnt.close();
}
} catch (SQLException e) {
// (Do something specific)
}
}
} while (currentQuery != null);
} finally {
if (connThr != null) {
try {
connThr.close();
} catch (SQLException e) {
// Nothing we can do?
}
}
}
}
}
From the original thread I check, if the threads are all finished and therefore all queries were processed (edited).
while (!executorService.isTerminated()) {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
while (!sqlResults.isEmpty()) {
ResultSet result = sqlResults.poll();
//process result and close it in the end
}
}
The standard Java solution for parallel processing is ThreadPoolExecutor. Try it.
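As a rough sketch of what that could look like, the example below submits one Callable per query with invokeAll and collects plain lists instead of ResultSet objects. Two points make this worth considering: closing a Statement also closes its ResultSet, so rows generally have to be copied out before stmnt.close() runs, and isTerminated() only becomes true after shutdown() has been called. ParallelQueryDemo and runQuery are hypothetical names, and runQuery merely simulates a database call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelQueryDemo {

    // Hypothetical stand-in: real code would open a connection, execute the
    // SQL and copy the rows into a list before closing the statement.
    static List<String> runQuery(String sql) {
        return Arrays.asList(sql.toUpperCase());
    }

    // Runs all queries on 'threads' workers; results are in query order.
    static List<List<String>> runAll(List<String> queries, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String query : queries) {
                tasks.add(() -> runQuery(query));
            }
            List<List<String>> results = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(Arrays.asList("select 1", "select 2"), 4));
    }
}
```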

Java ExecutorService only running 2 threads not four

I am working on an application that retrieves files from different URL's.
There is a TreeSet that contains the target to download. This is processed in a loop with each item being called with an ExecutorService. Here's some code:
private void retrieveDataFiles() {
if (this.urlsToRetrieve.size() > 0) {
System.out.println("Target URLs to retrieve: " + this.urlsToRetrieve.size());
ExecutorService executorProcessUrls = Executors.newFixedThreadPool(this.urlsToRetrieve.size());//could use fixed pool based on size of urls to retrieve
for (Entry target : this.urlsToRetrieve.entrySet()) {
final String fileName = (String) target.getKey();
final String url = (String) target.getValue();
String localFile = localDirectory + File.separator + fileName;
System.out.println(localFile);
executorProcessUrls.submit(new WikiDumpRetriever(url, localFile));
dumpFiles.add(localFile);
//TODO: figure out why only 2 files download
}
executorProcessUrls.shutdown();
try {
executorProcessUrls.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException ex) {
System.out.println("retrieveDataFiles InterruptedException: " + ex.getMessage());
}
} else {
System.out.println("No target URL's were retrieved");
}
}
Then the WikiDumpRetriever:
private static class WikiDumpRetriever implements Runnable {
private String wikiUrl;
private String downloadTo;
public WikiDumpRetriever(String targetUrl, String localDirectory) {
this.downloadTo = localDirectory;
this.wikiUrl = targetUrl;
}
public void downloadFile() throws FileNotFoundException, IOException, URISyntaxException {
HTTPCommunicationGet httpGet = new HTTPCommunicationGet(wikiUrl, "");
httpGet.downloadFiles(downloadTo);
}
@Override
public void run() {
try {
downloadFile();
} catch (FileNotFoundException ex) {
System.out.println("WDR: FileNotFound " + ex.getMessage());
} catch (IOException ex) {
System.out.println("WDR: IOException " + ex.getMessage());
} catch (URISyntaxException ex) {
System.out.println("WDR: URISyntaxException " + ex.getMessage());
}
}
}
As you can see this is an inner class. The TreeSet contains:
Key : Value
enwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
elwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/elwiki-latest-pages-articles.xml.bz2
zhwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/zhwiki-latest-pages-articles.xml.bz2
hewiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/hewiki-latest-pages-articles.xml.bz2
The problem is that this process downloads 2 of the four files. I know that all four are available and I know that they can be downloaded. However, only 2 of them process at any time.
Can anyone shed any light on this for me please - what am I missing or what am I getting wrong?
Thanks
nathj07
Thanks to ppeterka - it was a limit from the source. So, to overcome this I set the fixed thread pool size to 2. This means that only 2 files are downloaded simultaneously.
The answer then was to find the vendor imposed limit and set the thread pool:
ExecutorService executorProcessUrls = Executors.newFixedThreadPool(2);
I wanted to accept an answer but couldn't seem to do it with the comments. Sorry if this was the wrong way to do it.
Thanks for all the pointers - the 'group think' really helped solve this for me.
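To see how the pool size caps the number of simultaneous downloads, here is a small sketch that records the peak number of tasks running at once. DownloadLimitDemo and maxConcurrent are hypothetical names, and a short sleep stands in for the actual download.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DownloadLimitDemo {

    // Submits 'tasks' simulated downloads to a pool of 'poolSize' threads
    // and returns the highest number that were running at the same time.
    static int maxConcurrent(int tasks, int poolSize) throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(50); // pretend to download a file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent downloads: " + maxConcurrent(4, 2));
    }
}
```

With four tasks and a pool of two, the remaining two tasks wait in the executor's queue until a worker is free, so the peak never exceeds the pool size.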
