Hey StackOverflow community,
I am currently writing a small tool that reads a shapefile's geometries (Polygons / MultiPolygons) and writes their WKT representations into a text file.
I am using GeoTools for this and got it running fine, but since I am converting files with about 5,000,000 Polygons / MultiPolygons, it takes quite long to finish.
So my question is:
Is it possible to speed up the file loading/writing?
Since I am using a SimpleFeatureIterator, I have not figured out how to implement multithreading.
Is there a way to do so?
Or does anyone know how to get the shapefile's geometries without using an iterator?
This is my code:
This method just opens the file chooser and starts a thread for each selected file.
protected static void printGeometriesToFile() {
    JFileChooser chooser = new JFileChooser();
    FileNameExtensionFilter filter = new FileNameExtensionFilter("shape-files", "shp");
    chooser.setFileFilter(filter);
    chooser.setDialogTitle("Choose the file to be converted.");
    chooser.setMultiSelectionEnabled(true);
    int returnVal = chooser.showOpenDialog(null);
    if (returnVal != JFileChooser.APPROVE_OPTION) {
        return; // nothing selected, avoid a NullPointerException below
    }
    File[] files = chooser.getSelectedFiles();
    for (int i = 0; i < files.length; i++) {
        MultiThreadWriter writer = new MultiThreadWriter(files[i]);
        writer.start();
    }
}
The class for multithreading:
class MultiThreadWriter extends Thread {
    private File threadFile;

    MultiThreadWriter(File file) {
        threadFile = file;
        System.out.println("Starting Thread for " + file.getName());
    }

    public void run() {
        try {
            File outputFile = new File(threadFile.getAbsolutePath() + ".txt");
            FileOutputStream fos = new FileOutputStream(outputFile);
            System.out.println("Now writing data to file: " + outputFile.getName());
            FileDataStore store = FileDataStoreFinder.getDataStore(threadFile);
            SimpleFeatureSource featureSource = store.getFeatureSource();
            SimpleFeatureCollection featureCollection = featureSource.getFeatures();
            SimpleFeatureIterator featureIterator = featureCollection.features();
            int pos = 0;
            while (featureIterator.hasNext()) {
                fos.write(geometryToByteArray((Polygonal) featureIterator.next().getAttribute("the_geom")));
                pos++;
                System.out.println("The file " + threadFile.getName() + "'s current position is: " + pos);
            }
            featureIterator.close(); // release the underlying file handle
            fos.close();
            System.out.println("Finished writing.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This is just a helper function that converts MultiPolygons to Polygons and returns their WKT representations with a "|" as a separator.
private byte[] geometryToByteArray(Polygonal polygonal) {
    List<Polygon> polygonList;
    String polygonString = "";
    if (polygonal instanceof MultiPolygon) {
        // The method below just converts a MultiPolygon into a list of Polygons
        polygonList = GeometrieUtils.convertMultiPolygonToPolygonList((MultiPolygon) polygonal);
    } else {
        polygonList = new ArrayList<>(1);
        polygonList.add((Polygon) polygonal);
    }
    for (int i = 0; i < polygonList.size(); i++) {
        polygonString = polygonString + polygonList.get(i).toString() + "|";
    }
    return polygonString.getBytes();
}
}
I know my code is not pretty or good. I have just started learning Java and hope it will become better soon.
sincerely
ihavenoclue :)
You do not need to create a new thread for every file, because creating a thread is an expensive operation. Instead, you can let MultiThreadWriter implement Runnable and use a ThreadPoolExecutor to manage all the threads.
MultiThreadWriter
public class MultiThreadWriter implements Runnable {
    @Override
    public void run() {
        //
    }
}
Create a thread pool that matches the number of available processors:
ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (int i = 0; i < files.length; i++) {
    MultiThreadWriter writer = new MultiThreadWriter(files[i]);
    service.submit(writer);
}
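The snippet above never shuts the pool down, so its threads would keep the JVM alive. A minimal self-contained sketch of the full lifecycle (the lambda body is a hypothetical stand-in for the real per-file conversion work, and the file names are made up):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    // Submits one task per file name to a pool sized to the machine's
    // processors, then shuts the pool down and waits for completion.
    static int runAll(List<String> fileNames) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ExecutorService service =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (String name : fileNames) {
            service.submit(() -> {
                // Stand-in for the per-file shapefile conversion work.
                System.out.println("Processing " + name);
                done.incrementAndGet();
            });
        }
        service.shutdown();                            // accept no new tasks
        service.awaitTermination(1, TimeUnit.MINUTES); // wait for submitted ones
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(List.of("a.shp", "b.shp", "c.shp")) + " files processed");
    }
}
```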
You can use a BufferedWriter instead of a plain OutputStream; it is more
efficient when you repeatedly write small pieces.
File outputFile = new File(threadFile.getAbsolutePath() + ".txt");
// A BufferedWriter cannot wrap an OutputStream directly; bridge with an OutputStreamWriter
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile)));
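To show just the buffering idea in isolation, here is a small sketch that writes many small pieces through one BufferedWriter, using an in-memory StringWriter in place of a file so it is self-contained (all names are made up):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class BufferedWriteDemo {
    // Writes many small strings through one buffered writer; the buffer
    // batches them so the underlying sink sees far fewer write calls.
    static String writeAll(String[] parts) throws IOException {
        StringWriter sink = new StringWriter(); // stands in for a FileWriter
        try (BufferedWriter out = new BufferedWriter(sink)) {
            for (String p : parts) {
                out.write(p);
                out.write('|');
            }
        } // close() flushes the remaining buffer contents
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAll(new String[] {"POLYGON((0 0,1 0,1 1,0 0))"}));
    }
}
```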
I would prefer to read the file's content as a list of objects, then split the list into sublists and create a thread for each sublist. For example:
int nbrThreads = 10;
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(nbrThreads);
int count = myObjectsList != null ? myObjectsList.size() / nbrThreads : 0;
List<List<MyObject>> resultlists = choppeList(myObjectsList, count > 0 ? count : 1);
try
{
    for (List<MyObject> list : resultlists)
    {
        // TODO : create your thread and pass it the list of objects
    }
    executor.shutdown();
    executor.awaitTermination(30, TimeUnit.MINUTES); // choose the termination timeout
}
catch (Exception e)
{
    LOG.error("Problem launching threads", e);
}
The choppeList method can look like this:
public <T> List<List<T>> choppeList(final List<T> list, final int L)
{
final List<List<T>> parts = new ArrayList<List<T>>();
final int N = list.size();
for (int i = 0; i < N; i += L)
{
parts.add(new ArrayList<T>(list.subList(i, Math.min(N, i + L))));
}
return parts;
}
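For illustration, here is a self-contained version of the same chunking idea with a tiny usage example (the class name is made up):

```java
import java.util.ArrayList;
import java.util.List;

public class ChopDemo {
    // Same idea as choppeList above: split a list into sublists
    // of at most chunkSize elements each.
    static <T> List<List<T>> choppeList(List<T> list, int chunkSize) {
        List<List<T>> parts = new ArrayList<>();
        int n = list.size();
        for (int i = 0; i < n; i += chunkSize) {
            parts.add(new ArrayList<>(list.subList(i, Math.min(n, i + chunkSize))));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5);
        List<List<Integer>> parts = choppeList(items, 2);
        System.out.println(parts); // [[1, 2], [3, 4], [5]]
    }
}
```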
I have a file that I need to use to execute a word-count function (based on MapReduce), but using threads. I take the file and split it into multiple small files, then I loop over the small files to count the number of occurrences of words with a Reduce() function. How can I implement threads with the run() function so they can be used with the Reduce function?
here's my code:
public class WordCounter implements Runnable {
private String Nom;
protected static int Chunks = 1 ;
public WordCounter (String n) {
Nom = n;
}
public void split () throws IOException
{
File source = new File(this.Nom);
int maxRows = 100;
int i = 1;
try(Scanner sc = new Scanner(source)){
String line = null;
int lineNum = 1;
File splitFile = new File(this.Nom+i+".txt");
FileWriter myWriter = new FileWriter(splitFile);
while (sc.hasNextLine()) {
line = sc.nextLine();
if(lineNum > maxRows){
Chunks++;
myWriter.close();
lineNum = 1;
i++;
splitFile = new File(this.Nom+i+".txt");
myWriter = new FileWriter(splitFile);
}
myWriter.write(line+"\n");
lineNum++;
}
myWriter.close();
}
}
public void Reduce() throws IOException
{
ArrayList<String> words = new ArrayList<String>();
ArrayList<Integer> count = new ArrayList<Integer>();
for (int i = 1; i < Chunks; i++) {
//create the input stream (receive the text)
FileInputStream fin = new FileInputStream(this.getNom()+i+".txt");
//go through the text with a scanner
Scanner sc = new Scanner(fin);
while (sc.hasNext()) {
//Get the next word
String nextString = sc.next();
//Determine if the string exists in words
if (words.contains(nextString)) {
int index = words.indexOf(nextString);
count.set(index, count.get(index)+1);
}
else {
words.add(nextString);
count.add(1);
}
}
sc.close();
fin.close();
}
// Creating a File object that represents the disk file.
FileWriter myWriter = new FileWriter(new File(this.getNom()+"Result.txt"));
for (int i = 0; i < words.size(); i++) {
myWriter.write(words.get(i)+ " : " +count.get(i) +"\n");
}
myWriter.close();
//delete the small files
deleteFiles();
}
public void deleteFiles()
{
File f= new File("");
for (int i = 1; i <= Chunks; i++) {
f = new File(this.getNom()+i+".txt");
f.delete();
}
}
}
Better to use Callable instead of the Runnable interface; this way you can retrieve your data.
So in order to fix your code you can more or less do something like this:
public class WordCounter {
private static ExecutorService threadPool = Executors.newFixedThreadPool(5); // 5 represents the number of concurrent threads.
public Map<String, Integer> count(String filename) throws IOException, InterruptedException, ExecutionException {
int chunks = splitFileInChunks(filename);
List<Future<Report>> reports = new ArrayList<Future<Report>>();
for (int i=1; i<=chunks; i++) {
Callable<Report> callable = new ReduceCallable(filename + i + ".txt");
Future<Report> future = threadPool.submit(callable);
reports.add(future);
}
Map<String, Integer> finalMap = new HashMap<>();
for (Future<Report> future : reports) {
Map<String, Integer> map = future.get().getWords();
for (Map.Entry<String, Integer> entry : map.entrySet()) {
int oldCnt = finalMap.get(entry.getKey()) != null ? finalMap.get(entry.getKey()) : 0;
finalMap.put(entry.getKey(), entry.getValue() + oldCnt);
}
}
// return a map with the key being the word and the value the counter for that word
return finalMap;
}
// this method doesn't need to be run on the separate thread
private int splitFileInChunks(String filename) throws IOException { .... }
}
public class Report {
Map<String, Integer> words = new HashMap<>();
// ... getter, setter, constructor etc
}
public class ReduceCounter implements Callable<Report> {
String filename;
public ReduceCounter(String filename) { this.filename = filename;}
public Report call() {
// store the values in a Map<String, Integer> since it's easier that way
Map<String, Integer> myWordsMap = new HashMap<String, Integer>();
// here add the logic from your Reduce method, without the for loop iteration
// you should add logic to read only the file named with the value from "filename"
return new Report(myWordsMap);
}
}
Please note you can skip the Report class and return Future<Map<String, Integer>>, but I used Report to make it easier to follow.
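To make the pattern concrete, here is a compilable, stripped-down sketch of the same Callable/Future idea that counts words in in-memory strings instead of chunk files (all names here are hypothetical, not from the original code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

public class CallableCountDemo {
    // One Callable counts the words of one chunk and returns its partial map.
    static class CountChunk implements Callable<Map<String, Integer>> {
        final String text;
        CountChunk(String text) { this.text = text; }

        public Map<String, Integer> call() {
            Map<String, Integer> counts = new HashMap<>();
            for (String w : text.split("\\s+")) {
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
            return counts;
        }
    }

    // Submits one Callable per chunk, then merges the partial maps.
    static Map<String, Integer> count(List<String> chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String chunk : chunks) futures.add(pool.submit(new CountChunk(chunk)));
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : futures) {
                f.get().forEach((w, c) -> total.merge(w, c, Integer::sum)); // merge partials
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count(List.of("a b a", "b c")));
    }
}
```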
Update for Runnable as requested by user
public class WordCounter {
public Map<String, Integer> count(String filename) throws IOException, InterruptedException {
int chunks = splitFileInChunks(filename);
List<ReduceCounter> counters = new ArrayList<>();
List<Thread> reducerThreads = new ArrayList<>();
for (int i=1; i<=chunks; i++) {
ReduceCounter rc = new ReduceCounter(filename + i + ".txt");
Thread t = new Thread(rc);
counters.add(rc);
reducerThreads.add(t);
t.start();
}
// next wait for the threads to finish processing
for (Thread t : reducerThreads) {
t.join();
}
// now grab the results from each of them and merge the partial maps
Map<String, Integer> merged = new HashMap<>();
for (ReduceCounter cnt : counters) {
for (Map.Entry<String, Integer> e : cnt.getWords().entrySet()) {
merged.merge(e.getKey(), e.getValue(), Integer::sum);
}
}
return merged;
}
Reducer class should look like:
public class ReduceCounter implements Runnable {
String filename;
Map<String, Integer> words = new HashMap<>();
public ReduceCounter(String filename) { this.filename = filename;}
public void run() {
// store the values in the "words" map
// here add the logic from your Reduce method, without the for loop iteration
// also read, only the file named with the value from "filename"
}
public Map<String, Integer> getWords() {return words;}
}
I kind of found a solution: I assign a thread to each small file, then I call the Reduce() function inside the run() function, but I still don't fully have my head around it. Here's the code:
public void Reduce() throws IOException
{
ArrayList<String> words = new ArrayList<String>();
ArrayList<Integer> count = new ArrayList<Integer>();
Thread TT= new Thread();
for (int i = 1; i < Chunks; i++) {
//create the input stream (receive the text)
FileInputStream fin = new FileInputStream(this.getNom()+i+".txt");
TT=new Thread(this.getNom()+i+".txt");
TT.start();
//go through the text with a scanner
Scanner sc = new Scanner(fin);
while (sc.hasNext()) {
//Get the next word
String nextString = sc.next();
//Determine if the string exists in words
if (words.contains(nextString)) {
int index = words.indexOf(nextString);
count.set(index, count.get(index)+1);
}
else {
words.add(nextString);
count.add(1);
}
}
sc.close();
fin.close();
}
// Creating a File object that represents the disk file.
FileWriter myWriter = new FileWriter(new File(this.getNom()+"Result.txt"));
for (int i = 0; i < words.size(); i++) {
myWriter.write(words.get(i)+ " : " +count.get(i) +"\n");
}
myWriter.close();
//Store the result in the new file
deleteFiles();
}
public void run() {
try {
this.Reduce();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args) throws IOException {
WordCounter w1 = new WordCounter("Words.txt");
Thread T1= new Thread(w1);
T1.start();
}
I am trying to take an initial CSV file and pass it through a class that checks another file for an A or a D, and then adds or deletes the associated entry in an array object.
example of pokemon.csv:
1, Bulbasaur
2, Ivysaur
3, venasaur
example of changeList.csv:
A, Charizard
A, Suirtle
D, 2
That being said, I am having a lot of trouble getting the content of my new array into a new CSV file. I have checked that my array and class files are working properly, but I have been trying and failing to write the final contents of the "pokedex1" object array to the new CSV file.
Main File
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class PokedexManager {
public static void printArray(String[] array) {
System.out.print("Contents of array: ");
for(int i = 0; i < array.length; i++) {
if(i == array.length - 1) {
System.out.print(array[i]);
}else {
System.out.print(array[i] + ",");
}
}
System.out.println();
}
public static void main(String[] args) {
try {
//output for pokedex1 using PokemonNoGaps class
PokemonNoGaps pokedex1 = new PokemonNoGaps();
//initializes scanner to read from csv file
String pokedexFilename = "pokedex.csv";
File pokedexFile = new File(pokedexFilename);
Scanner pokescanner = new Scanner(pokedexFile);
//reads csv file, parses it into an array, and then adds new pokemon objects to Pokemon class
while(pokescanner.hasNextLine()) {
String pokeLine = pokescanner.nextLine();
String[] pokemonStringArray = pokeLine.split(", ");
int id = Integer.parseInt(pokemonStringArray[0]);
String name = pokemonStringArray[1];
Pokemon apokemon = new Pokemon(id, name);
pokedex1.add(apokemon);
}
//opens changeList.csv file to add or delete entries from Pokemon class
String changeListfilename = "changeList.csv";
File changeListFile = new File(changeListfilename);
Scanner changeScanner = new Scanner(changeListFile);
//loads text from csv file to be parsed to PokemonNoGaps class
while(changeScanner.hasNextLine()) {
String changeLine = changeScanner.nextLine();
String[] changeStringArray = changeLine.split(", ");
String action = changeStringArray[0];
String nameOrId = changeStringArray[1];
//if changList.csv file line has an "A" in the first spot add this entry to somePokemon
if(action.equals("A")) {
int newId = pokedex1.getNewId();
String name = nameOrId;
Pokemon somePokemon = new Pokemon(newId, name);
pokedex1.add(somePokemon);
}
//if it has a "D" then send it to PokemonNoGaps class to delete the entry from the array
else { //"D"
int someId = Integer.parseInt(nameOrId);
pokedex1.deleteById(someId);
}
//tests the action being taken and the update to the array
//System.out.println(action + "\t" + nameOrId + "\n");
System.out.println(pokedex1);
//*(supposedly)* prints the resulting contents of the array to a new csv file
String[] pokemonList = changeStringArray;
try {
String outputFile1 = "pokedex1.csv";
FileWriter writer1 = new FileWriter(outputFile1);
writer1.write(String.valueOf(pokemonList));
} catch (IOException e) {
System.out.println("\nError writing to Pokedex1.csv!");
e.printStackTrace();
}
}
//tests final contents of array after being passed through PokemonNoGaps class
//System.out.println(pokedex1);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
PokemonNoGaps class file:
public class PokemonNoGaps implements ChangePokedex {
private Pokemon[] pokedex = new Pokemon[1];
private int numElements = 0;
private static int id = 0;
// add, delete, search
@Override
public void add(Pokemon apokemon) {
// if you have space
this.pokedex[this.numElements] = apokemon;
this.numElements++;
// if you don't have space
if(this.numElements == pokedex.length) {
Pokemon[] newPokedex = new Pokemon[ this.numElements * 2]; // create new array
for(int i = 0; i < pokedex.length; i++) { // transfer all elements from array into bigger array
newPokedex[i] = pokedex[i];
}
this.pokedex = newPokedex;
}
this.id++;
}
public int getNewId() {
return this.id + 1;
}
@Override
public void deleteById(int id) {
for(int i = 0; i < numElements; i++) {
if(pokedex[i].getId() == id) {
for(int j = i+1; j < pokedex.length; j++) {
pokedex[j-1] = pokedex[j];
}
numElements--;
pokedex[numElements] = null;
}
}
}
public Pokemon getFirstElement() {
return pokedex[0];
}
public int getNumElements() {
return numElements;
}
public String toString() {
String result = "";
for(int i = 0; i < this.numElements; i++) {
result += this.pokedex[i].toString() + "\n";
}
return result;
}
}
Expected output:
1, Bulbasaur
3, Venasaur
4, Charizard
5, Squirtle
Am I using the wrong file writer? Am I calling the file writer at the wrong time or incorrectly? In other words, I do not know why my output file is empty instead of being loaded with the contents of my array. Can anybody help me out?
I spotted a few issues whilst running this. As mentioned in the previous answer, you want to set file append to true in the section of code that writes to the new pokedex1.csv:
try {
    String outputFile1 = "pokedex1.csv";
    FileWriter fileWriter = new FileWriter(outputFile1, true);
    BufferedWriter bw = new BufferedWriter(fileWriter);
    for (String pokemon : pokedex1.toString().split("\n")) {
        System.out.println(pokemon);
        bw.write(pokemon);
        bw.newLine(); // keep one entry per line in the CSV
    }
    bw.flush();
    bw.close();
} catch (IOException e) {
    System.out.println("\nError writing to Pokedex1.csv!");
    e.printStackTrace();
}
I opted to use a BufferedWriter for the solution. Another issue I found is that you're reading pokedex.csv but the file is named pokemon.csv.
String pokedexFilename = "pokemon.csv";
I made the above change to fix this issue.
On a side note, I noticed that you create several scanners to read the two files. With these types of resources it's good practice to call the close method once you have finished using them, as shown below.
Scanner pokescanner = new Scanner(pokedexFile);
// Use scanner code here
// Once finished with scanner
pokescanner.close();
String outputFile1 = "pokedex1.csv";
FileWriter writer1 = new FileWriter(outputFile1);
appears to be within your while loop, so a new file will be created every time.
Either use the FileWriter(File file, boolean append) constructor or create the writer before the loop.
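To see the difference between the two constructors, here is a small self-contained demo (the file is a temp file, and a fresh writer is deliberately opened per iteration to mimic creating it inside the loop):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppendDemo {
    // Opening a FileWriter without the append flag truncates the file each
    // time, which is why writing inside a loop leaves only the last record.
    static String writeTwice(boolean append) throws IOException {
        Path f = Files.createTempFile("append-demo", ".csv");
        for (String line : new String[] {"1, Bulbasaur\n", "3, Venasaur\n"}) {
            // A new writer per iteration mimics creating it inside the loop.
            try (FileWriter w = new FileWriter(f.toFile(), append)) {
                w.write(line);
            }
        }
        String content = Files.readString(f);
        Files.delete(f);
        return content;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeTwice(false)); // only the last line survives
        System.out.println(writeTwice(true));  // both lines are kept
    }
}
```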
I'm learning about socket programming in Java. I've seen client/server app examples with some using DataOutputStream, and some using ObjectOutputStream.
What's the difference between the two?
Is there a performance difference?
DataInput/OutputStream generally performs better because it is much simpler. It can only read/write primitive types and Strings.
ObjectInput/OutputStream can read/write any object type as well as primitives. It is less efficient but much easier to use if you want to send complex data.
I would assume that the Object*Stream is the best choice until you know that its performance is an issue.
This might be useful for people still looking for answers several years later... According to my tests on a recent JVM (1.8_51), the ObjectOutput/InputStream is surprisingly almost 2x times faster than DataOutput/InputStream for reading/writing a huge array of double!
Below are the results for writing a 10 million item array (for 1 million items the results are essentially the same). I also included the text format (BufferedWriter/Reader) for the sake of completeness:
TestObjectStream written 10000000 items, took: 409ms, or 24449.8778 items/ms, filesize 80390629b
TestDataStream written 10000000 items, took: 727ms, or 13755.1582 items/ms, filesize 80000000b
TestBufferedWriter written 10000000 items, took: 13700ms, or 729.9270 items/ms, filesize 224486395b
Reading:
TestObjectStream read 10000000 items, took: 250ms, or 40000.0000 items/ms, filesize 80390629b
TestDataStream read 10000000 items, took: 424ms, or 23584.9057 items/ms, filesize 80000000b
TestBufferedWriter read 10000000 items, took: 6298ms, or 1587.8057 items/ms, filesize 224486395b
I believe Oracle has heavily optimized the JVM for ObjectStreams in recent Java releases, as this is the most common way of writing/reading data (including serialization), and it is thus on the Java performance-critical path.
So it looks like today there's not much reason to use DataStreams anymore. "Don't try to outsmart the JVM": just use the most straightforward way, which is ObjectStreams :)
Here's the code for the test:
class Generator {
private int seed = 1235436537;
double generate(int i) {
seed = (seed + 1235436537) % 936855463;
return seed / (i + 1.) / 524323.;
}
}
class Data {
public final double[] array;
public Data(final double[] array) {
this.array = array;
}
}
class TestObjectStream {
public void write(File dest, Data data) {
try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(dest)))) {
for (int i = 0; i < data.array.length; i++) {
out.writeDouble(data.array[i]);
}
} catch (IOException e) {
throw new RuntimeIoException(e);
}
}
public void read(File dest, Data data) {
try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(dest)))) {
for (int i = 0; i < data.array.length; i++) {
data.array[i] = in.readDouble();
}
} catch (IOException e) {
throw new RuntimeIoException(e);
}
}
}
class TestDataStream {
public void write(File dest, Data data) {
try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(dest)))) {
for (int i = 0; i < data.array.length; i++) {
out.writeDouble(data.array[i]);
}
} catch (IOException e) {
throw new RuntimeIoException(e);
}
}
public void read(File dest, Data data) {
try (DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(dest)))) {
for (int i = 0; i < data.array.length; i++) {
data.array[i] = in.readDouble();
}
} catch (IOException e) {
throw new RuntimeIoException(e);
}
}
}
class TestBufferedWriter {
public void write(File dest, Data data) {
try (BufferedWriter out = new BufferedWriter(new FileWriter(dest))) {
for (int i = 0; i < data.array.length; i++) {
out.write(Double.toString(data.array[i]));
out.newLine();
}
} catch (IOException e) {
throw new RuntimeIoException(e);
}
}
public void read(File dest, Data data) {
try (BufferedReader in = new BufferedReader(new FileReader(dest))) {
String line = in.readLine();
int i = 0;
while (line != null) {
if(!line.isEmpty()) {
data.array[i++] = Double.parseDouble(line);
}
line = in.readLine();
}
} catch (IOException e) {
throw new RuntimeIoException(e);
}
}
}
@Test
public void testWrite() throws Exception {
int N = 10000000;
double[] array = new double[N];
Generator gen = new Generator();
for (int i = 0; i < array.length; i++) {
array[i] = gen.generate(i);
}
Data data = new Data(array);
Map<Class, BiConsumer<File, Data>> subjects = new LinkedHashMap<>();
subjects.put(TestDataStream.class, new TestDataStream()::write);
subjects.put(TestObjectStream.class, new TestObjectStream()::write);
subjects.put(TestBufferedWriter.class, new TestBufferedWriter()::write);
subjects.forEach((aClass, fileDataBiConsumer) -> {
File f = new File("test." + aClass.getName());
long start = System.nanoTime();
fileDataBiConsumer.accept(f, data);
long took = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.println(aClass.getSimpleName() + " written " + N + " items, took: " + took + "ms, or " + String.format("%.4f", (N / (double)took)) + " items/ms, filesize " + f.length() + "b");
});
}
@Test
public void testRead() throws Exception {
int N = 10000000;
double[] array = new double[N];
Data data = new Data(array);
Map<Class, BiConsumer<File, Data>> subjects = new LinkedHashMap<>();
subjects.put(TestDataStream.class, new TestDataStream()::read);
subjects.put(TestObjectStream.class, new TestObjectStream()::read);
subjects.put(TestBufferedWriter.class, new TestBufferedWriter()::read);
subjects.forEach((aClass, fileDataBiConsumer) -> {
File f = new File("test." + aClass.getName());
long start = System.nanoTime();
fileDataBiConsumer.accept(f, data);
long took = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.println(aClass.getSimpleName() + " read " + N + " items, took: " + took + "ms, or " + String.format("%.4f", (N / (double)took)) + " items/ms, filesize " + f.length() + "b");
});
}
DataOutputStream and ObjectOutputStream: when handling basic types, there is no difference apart from the header that ObjectOutputStream creates.
With the ObjectOutputStream class, instances of a class that implements Serializable can be written to the output stream, and can be read back with ObjectInputStream.
DataOutputStream can only handle basic types.
Only objects that implement the java.io.Serializable interface can be written to streams using ObjectOutputStream. Primitive data types can also be written to the stream using the appropriate methods from DataOutput, and Strings can be written using the writeUTF method. DataOutputStream, on the other hand, lets an application write only primitive Java data types (and Strings) to an output stream, in a portable way.
See the Javadoc for ObjectOutputStream and DataInputStream for details.
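As a concrete comparison, here is a minimal in-memory round trip with both stream types (class and method names are made up for the demo):

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class StreamDemo {
    // DataOutputStream: primitives and UTF strings only.
    static int dataRoundTrip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(42);
            out.writeUTF("hello");
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            int n = in.readInt();
            String s = in.readUTF(); // must be read back in the same order
            return n + s.length();
        }
    }

    // ObjectOutputStream: whole Serializable object graphs.
    @SuppressWarnings("unchecked")
    static List<String> objectRoundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(new ArrayList<>(List.of("a", "b")));
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (List<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dataRoundTrip());   // 47
        System.out.println(objectRoundTrip()); // [a, b]
    }
}
```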
I have written a multi-threaded Java application which reads a bunch of .jar files from a directory. This application spawns multiple threads, and each thread reads a bunch of jar files. I'm having trouble identifying the stopping condition for this application. How can I identify that all the files have been read?
The following is a snippet function which gets called from the run() method for each thread.
import java.io.*;
import java.util.Enumeration;
import java.util.jar.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipException;
import java.io.FilenameFilter;
public class ArchiveFileTest implements Runnable {
private static boolean stopAll = false;
private static int threadNumber = 0;
private int myNumber = 0;
public ArchiveFileTest () {
myNumber = threadNumber;
threadNumber++;
}
public static boolean setStopAll () {
return setStopAll(true);
}
public static boolean setStopAll (boolean b) {
stopAll = b;
return stopAll;
}
public static String[] listFiles (File parentDir,final String ext1,final String ext2,final String ext3,final String ext4) {
String allFiles[] = parentDir.list(new FilenameFilter() {
public boolean accept(File pDir, String fName) {
if (fName.endsWith("."+ext1) || fName.endsWith("."+ext2) || fName.endsWith("."+ext3) || fName.endsWith("."+ext4)) return true;
else return false;
}
});
for (int i=0; i<allFiles.length; i++)
allFiles[i] = parentDir.getAbsolutePath() + File.separator + allFiles[i];
return allFiles;
}
public ZipFile getMyZipFile (File parentDir) {
String fn[] = listFiles(parentDir, "jar", "zip", "war", "rar");
int fileNum = myNumber % fn.length;
ZipFile zFile = null;
for (int i=0; i<fn.length; i++) {
String jFile = fn[(fileNum + i)%fn.length];
try {
zFile = new ZipFile(jFile);
break;
} catch (IOException e) {
setStopAll();
}
}
return zFile;
}
public void doStuff() throws Exception {
File dName = new File("/home/sqatest/chander/sample-files");
final int N_TIMES = 15;
final int N_FILES = 500;
int counter = 0;
int fCount = 0;
if (!dName.isDirectory() || !dName.exists()) {
System.err.println("The parent directory given should point to an existing directory...");
setStopAll();
return;
}
while (counter < N_TIMES) {
ZipFile zipFile = getMyZipFile(dName);
if (zipFile == null) {
System.err.println("No zip file entry for the Thread-" + myNumber);
break;
}
try {
Enumeration <? extends ZipEntry> zipEntries = zipFile.entries();
fCount = 0;
ZipEntry ze = null;
while (zipEntries.hasMoreElements()) {
ze = zipEntries.nextElement();
if (ze.isDirectory()) continue; // if it is a directory go to next entry
InputStream is = zipFile.getInputStream(ze);
fCount++;
int readCount = 0;
try {
while(is.read((new byte[50])) != -1 && readCount != 200) readCount++;
System.out.println("Successfully Read " + zipFile.toString());
//is.close();
} catch (IOException e) {
e.printStackTrace();
}
if (fCount == N_FILES) break; // read maximum of N_FILES
}
if (stopAll) break;
} catch (Exception e) {
e.printStackTrace();
} finally {
counter++;
}
}
}
public void run () {
try {
doStuff();
} catch (IOException e) {
e.printStackTrace();
setStopAll();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main (String[] args) throws Exception {
final int MAX_THREADS = 500;
final int MAX_HOLDING_THREADS = 5;
int loopCount = 0;
Thread mainThread = Thread.currentThread();
for (int m=0; ; m++) {
Thread t[] = new Thread[MAX_HOLDING_THREADS];
for (int n=0; n<t.length; n++) {
t[n] = new Thread(new ArchiveFileTest());
t[n].start();
if ((m+1)*(n+1)==MAX_THREADS) {
System.out.println("\n" + MAX_THREADS + " reached... \nMain Sleeping for some mins...");
loopCount++;
try {
t[n].join();
System.out.println("\nMain is back... (" + loopCount + ")");
} catch (InterruptedException e) {
e.printStackTrace();
setStopAll();
}
m = 0;
}
}
}
}
}
I don't think your application will ever stop. You've got an infinite loop in the main method:
for (int m=0; ; m++) {
....
}
Note that setting m=0 inside the body won't break the loop, so I think it will never end even if you have no files. It continuously reads all zip/jar/war/rar files in the directory (choosing the file based on a rotating counter myNumber is not very maintainable), but never exits the loop.
If your requirement is to read ZIP files using a number of threads, then I would go about it a different way.
Create a Set of files which you want to look at.
Create a ThreadPoolExecutor to create a fixed pool of 5 threads
Iterate over the set of files and create a new Runnable which does the Zip extraction (though I'm not quite sure why you read the first 10000 bytes of a ZIP entry and then don't do anything with it), and call the execute method. That will use the thread pool to process 5 files at a time.
After submitting all the runnables, call the shutdown method followed by awaitTermination, which waits for all submitted tasks to finish, and then the thread pool shuts down.
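The steps above can be sketched as follows, assuming the per-archive work is factored into a hypothetical processArchive method (all names here are made up):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class ZipPoolDemo {
    // Collect the archive paths, hand each one to a fixed pool of 5
    // threads, then shut down and wait for all tasks to complete.
    static int processAll(Path dir) throws Exception {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(p -> p.toString().endsWith(".zip") || p.toString().endsWith(".jar"))
                 .forEach(p -> pool.execute(() -> {
                     processArchive(p);
                     processed.incrementAndGet();
                 }));
        }
        pool.shutdown();                             // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.MINUTES); // block until all finish
        return processed.get();
    }

    // Hypothetical stand-in for the real extraction logic.
    static void processArchive(Path p) {
        System.out.println("Would read entries of " + p);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(processAll(dir) + " archives processed");
    }
}
```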
If by stopping you mean terminating, then the application will stop when all threads that are not daemon threads have finished.
In your class that launches the threads, have a volatile counter for your running threads.
In your thread constructor pass a reference to the launching class.
Have a synchronized method to let the threads notify the launching class that they are done.
After instancing and starting your threads wait for the counter to become 0;
while(getRunningThreads() > 0) // getRunningThreads must be synchronized too
Thread.sleep(500); // Check every half second.
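As an alternative to polling a counter, java.util.concurrent's CountDownLatch lets the launcher block until every worker has signalled completion. A minimal sketch (all names are made up):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchDemo {
    // Each worker counts the latch down when it finishes; the launcher
    // blocks on await() instead of sleeping in a polling loop.
    static int runWorkers(int workers) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                System.out.println("Worker " + id + " reading its files...");
                finished.incrementAndGet();
                done.countDown(); // signal completion
            }).start();
        }
        done.await(); // blocks until the count reaches zero
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(3) + " workers finished");
    }
}
```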
import java.io.*;
import java.io.File;
import java.io.FilenameFilter;
public class YDSearch{
public void listFiles(String dir) throws IOException{
File directory = new File(dir);
if (!directory.isDirectory()) {
System.out.println("No directory provided");
return;
}
//create a FilenameFilter and override its accept-method
FilenameFilter filefilter = new FilenameFilter() {
public boolean accept(File dir, String name) {
//if the file extension is .mp3 return true, else false
return name.endsWith(".mp3")||name.endsWith(".mp4")||name.endsWith(".3gp")
||name.endsWith(".mov")||name.endsWith(".avi")||name.endsWith(".wmv");
}
};
String[] filenames = directory.list(filefilter);
DataOutputStream output = new DataOutputStream(new FileOutputStream("C:/Users/Jonathan/Desktop/YouDetect/SearchByFileType/AllMediaFiles.dat"));
for (String name : filenames) {
output.writeUTF(dir + name);
}
output.close();
DataInputStream input = new DataInputStream(new FileInputStream("C:/Users/Jonathan/Desktop/YouDetect/SearchByFileType/AllMediaFiles.dat"));
DataOutputStream output2 = new DataOutputStream(new FileOutputStream("C:/Users/Jonathan/Desktop/ReadyForAnalysis.dat"));
for (String name : filenames) {
FileInputStream in = new FileInputStream(input.readUTF());
int byteCounter = 0;
int rowCounter = 0;
long bufferCounter = 0;
if(name.endsWith(".mp3")){
byte[] b = new byte[36];
int read = in.read(b, 0, 36);
if (byteCounter != 1000){
if (rowCounter == 1){
System.out.println("\n");
rowCounter = 0;
}
output2.writeUTF(org.apache.commons.codec.binary.Hex.encodeHexString(b)+ " " + dir + name);
bufferCounter ++;
rowCounter ++;
}else{
byteCounter = 0;
try{
Thread.sleep(200);
}catch(InterruptedException e) {
}
}
}
else if(name.endsWith(".mp4")){
byte[] b = new byte[29];
int read = in.read(b, 0, 29);
if (byteCounter != 1000){
if (rowCounter == 1){
System.out.println("\n");
rowCounter = 0;
}
output2.writeUTF(org.apache.commons.codec.binary.Hex.encodeHexString(b)+ " " + dir + name);
bufferCounter ++;
rowCounter ++;
}else{
byteCounter = 0;
try{
Thread.sleep(200);
}catch(InterruptedException e) {
}
}
}
//System.out.println("====================");
}
output2.close();
input.close();
DataInputStream input2 = new DataInputStream(new FileInputStream("C:/Users/Jonathan/Desktop/ReadyForAnalysis.dat"));
for (String name : filenames) {
System.out.println(input2.readUTF()+"\n");
}
}
public void checkHeaderSC(String allFiles)throws IOException{
}
public static void main(String[] args) throws IOException {
YDSearch YDSearch = new YDSearch();
YDSearch.listFiles("C:/Users/Jonathan/Desktop/YD Tests/1) High Quality/");
YDSearch.listFiles("C:/Users/Jonathan/Desktop/YD Tests/2) Medium Quality/");
YDSearch.listFiles("C:/Users/Jonathan/Desktop/YD Tests/3) Low Quality/");
YDSearch.checkHeaderSC("C:/Users/Jonathan/Desktop/YouDetect/SearchByFileType/ReadyForAnalysis.dat");
}
}
Hey there, I'm having a little issue with the above code and hoped someone here might be able to help. This is a partial version of the code, as the real one has 4 more if/else if statements.
The program compiles and begins to run fine. It produces several results from the file that is being read into and then out of again via input2, but then it stops, produces no more results, and gives the error:
Exception in thread "main" java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
at java.io.DataInputStream.readUTF(DataInputStream.java:572)
at java.io.DataInputStream.readUTF(DataInputStream.java:547)
at YDSearch.listFiles(YDSearch.java:85)
at YDSearch.main(YDSearch.java:93)
Does anybody know why this might be happening, and does anyone have a solution they could share?
I've also tried moving the variable 'b' inside an if statement, but that doesn't work because of scope. If b were defined inside the ifs, then only one if statement would be needed to output to the file.
Please let me know if you've got any ideas, I'd really appreciate it :)
As far as I can see, you don't always write an output record for every name, only when the name matches one of your patterns. However, you do try to read an input record for every name.
Ergo, if you have any filenames that don't match the patterns, you try to read more than you wrote, and you will get the EOF.
EDIT:
In more detail, the problem is that you get a list of all the files that end with "mp3", "mp4", "3gp", "mov", "avi", or "wmv". You then process that list and write something into C:/Users/Jonathan/Desktop/ReadyForAnalysis.dat for each "mp3" and "mp4" file. You then assume that for each entry in your list of files there will be an entry in ReadyForAnalysis.dat. However, if there are any files ending in "3gp", "mov", "avi", or "wmv", this will not hold true.
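One way to keep the reads and writes in step is to record how many entries were actually written, e.g. with a count header, and loop on that count instead of the original filename list. A minimal in-memory sketch of the idea (class and method names are made up, and a byte array stands in for the .dat file):

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class UtfRecordDemo {
    // Write only the names that match, prefixed by the record count, so the
    // reader knows exactly how many readUTF calls are safe.
    static byte[] writeMatching(List<String> names) throws IOException {
        List<String> matching = new ArrayList<>();
        for (String n : names) {
            if (n.endsWith(".mp3") || n.endsWith(".mp4")) matching.add(n);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(matching.size()); // header: number of records
            for (String n : matching) out.writeUTF(n);
        }
        return buf.toByteArray();
    }

    static List<String> readAll(byte[] data) throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) result.add(in.readUTF()); // never reads past EOF
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeMatching(List.of("a.mp3", "b.mov", "c.mp4"));
        System.out.println(readAll(data)); // [a.mp3, c.mp4]
    }
}
```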