Sorting files using multithreading in Java

I was given an assignment to write the contents of the given files, in order, into a result.txt. The filenames are first split across different ArrayLists. Each file ends with a label in the format #n/N, where n is the file's position and N is the total number of files, e.g.
British explorer James Clark Ross led the first
expedition to reach the north magnetic pole
#001/004
from a file 1831-06-01.txt
The problem with my code is that it writes the files in the order 1, 4, 2, 3, whereas the result must be in the order 1, 2, 3, 4. This may be due to a lack of synchronization, but I am still struggling to fix the problem.
This is my code:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.*;
class PopThread implements Runnable {
ArrayList<String> fileList;
public PopThread(ArrayList<String> fileList) {
this.fileList = fileList;
}
@Override
public void run() {
//System.out.println("running\n");
Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
long startTime = System.nanoTime();
System.out.println("fileList: " + fileList);
ArrayList<String> sortedFileList = sortFiles(fileList);
File resultFile = new File("result.txt");
for (String filename : sortedFileList) {
Writer w1 = new Writer(filename, resultFile);
Thread t = new Thread(w1);
t.setPriority(Thread.MAX_PRIORITY);
t.start();
}
long stopTime = System.nanoTime();
//System.out.println("Total execution time: " + (stopTime - startTime));
}
public ArrayList<String> readFiles(String filename) {
ArrayList<String> list = new ArrayList<String>();
try {
File myObj = new File(filename);
Scanner s = new Scanner(myObj);
while (s.hasNext()) {
list.add(s.next());
}
s.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
return list;
}
public int getNumber(String filename) {
String lastLine = "";
String sCurrentLine;
int identifier_integer = -1;
try {
BufferedReader br = new BufferedReader(new FileReader(filename));
while ((sCurrentLine = br.readLine()) != null) {
lastLine = sCurrentLine;
}
String identifier_number = lastLine.substring(1,4);
identifier_integer = Integer.parseInt(identifier_number);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
return identifier_integer;
}
public ArrayList<String> sortFiles(ArrayList<String> listFileName) {
int i = listFileName.size();
boolean sorted = false;
while ( (i > 1) && (!(sorted)) ) {
sorted = true;
for (int j = 1; j < i; j++) {
if ( getNumber(listFileName.get(j-1)) > getNumber(listFileName.get(j)) ) {
String temp = listFileName.get(j-1);
listFileName.set(j-1, listFileName.get(j));
listFileName.set(j, temp);
sorted = false;
}
}
i--;
}
return listFileName;
}
}
class Writer implements Runnable {
String filename;
File resultFile;
public Writer(String filename, File resultFile) {
this.filename = filename;
this.resultFile = resultFile;
}
@Override
public void run() {
String content;
content = readFromFile(filename);
writeToFile(resultFile, content);
}
private static void writeToFile(File resultFile, String content) {
try {
BufferedWriter writer = new BufferedWriter(new FileWriter(resultFile, true));
writer.write(content);
//writer.write("file content written");
writer.flush();
} catch (IOException e) {
e.printStackTrace();
}
}
static String readFromFile(String filename) {
StringBuffer content = new StringBuffer();
try {
String text;
BufferedReader reader = new BufferedReader(new FileReader(filename));
while ((text = reader.readLine()) != null) {
content.append(text);
content.append("\n");
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
return content.toString();
}
}
public class q4 {
public static void main(String[] args) {
ArrayList<String> filesOne = new ArrayList<String>();
filesOne.add("1831-06-01.txt");
filesOne.add("2003-08-27.txt");
ArrayList<String> filesTwo = new ArrayList<String>();
filesTwo.add("1961-04-12.txt");
filesTwo.add("1972-12-11.txt");
PopThread popRunnableOne = new PopThread(filesOne);
PopThread popRunnableTwo = new PopThread(filesTwo);
Thread threadOne = new Thread(popRunnableOne);
Thread threadTwo = new Thread(popRunnableTwo);
threadOne.start();
threadTwo.start();
try {
threadOne.join();
threadTwo.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
(NOTE: The class q4 cannot be altered.)

This assignment is horrible. You have my sympathy.
Your two threads will have to communicate with each other. Each thread will have to know the filename that the other thread wants to output next, and they will have to take turns. Each thread needs to loop:
1. While the date on my next file is less than or equal to the date on the other thread's next file, output my next file.
2. Tell the other thread, "it's your turn."
3. If I have no more files, then exit (return from the run() method); otherwise, wait for the other thread to tell me it's my turn again.
4. Go back to step 1.
Having to take turns is the worst part of the assignment. Any time you find yourself needing to make threads take turns doing something—any time you need to make threads do things in a particular order—that's a clear sign that all of the things should be done by a single thread.
The only way threads can communicate is through shared variables. Your instructor has done you a huge disservice by telling you not to modify the q4 class. That prevents you from passing any shared objects in to your PopThread implementation through its constructor.
The only other way your two threads can share any variables is by making the variables static. Forcing you to use static is the second worst part of the assignment. If you go on to study software engineering, you will learn that static is an anti-pattern. Programs that use static variables are brittle (i.e., hard to modify), and they are hard to test.
Forcing you to use static variables also will make your threads do extra work to figure out who is who. Normally, I would do something like this so that each thread would automatically know which state is its own, and which belongs to the other guy:
class SharedState { ... }
class PopThread {
public PopThread(
SharedState myState,
SharedState otherThreadState,
ArrayList<String> fileList
) {
this.myState = myState;
this.otherGuyState = otherThreadState;
this.fileList = fileList;
...initialize this.myState...
}
...
}
class q4 {
public static void main(String[] args) {
SharedState stateOne = new SharedState();
SharedState stateTwo = new SharedState();
PopThread popRunnableOne = new PopThread(stateOne, stateTwo, filesOne);
PopThread popRunnableTwo = new PopThread(stateTwo, stateOne, filesTwo);
...
}
}
The best way I can think of with static variables would be to have an array of two SharedState objects, and have each PopThread use an AtomicInteger to assign itself one of the two array slots:
import java.util.concurrent.atomic.AtomicInteger;
class PopThread implements Runnable {
static SharedState[] state = new SharedState[2];
static AtomicInteger nextStateIndex = new AtomicInteger(0);
final int myStateIndex;
final int otherGuysStateIndex;
ArrayList<String> fileList;
// q4 cannot be altered, so the constructor takes only the file list.
public PopThread(ArrayList<String> fileList) {
myStateIndex = nextStateIndex.getAndIncrement();
otherGuysStateIndex = myStateIndex ^ 1;
this.fileList = fileList;
...initialize state[myStateIndex]...
}
...
}
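To make the turn-taking loop concrete, here is a minimal sketch that uses in-memory integers in place of files. It is not the assignment's solution: the class name TurnTakingDemo, the heads array, and the use of Integer.MIN_VALUE / Integer.MAX_VALUE as "not announced yet" / "finished" markers are all assumptions made for illustration, and it assumes each worker's values are already sorted and lie strictly between those two markers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: two threads take turns appending their own sorted
// numbers to one shared list so that the combined result is ordered.
class TurnTakingDemo {
    // Static shared state, because nothing can be passed in from outside.
    private static final Object LOCK = new Object();
    // heads[i] is the next value worker i wants to output.
    // Integer.MIN_VALUE means "not announced yet", MAX_VALUE means "finished".
    private static final int[] heads = new int[2];
    private static final List<Integer> output = new ArrayList<>();

    static class Worker implements Runnable {
        private final int me;
        private final int other;
        private final List<Integer> sorted;

        Worker(int me, List<Integer> sorted) {
            this.me = me;
            this.other = me ^ 1;
            this.sorted = sorted;
        }

        @Override
        public void run() {
            synchronized (LOCK) {
                try {
                    for (int mine : sorted) {
                        heads[me] = mine;      // announce my next value
                        LOCK.notifyAll();      // "it may be your turn"
                        while (heads[other] < mine) {
                            LOCK.wait();       // the other worker goes first
                        }
                        output.add(mine);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    heads[me] = Integer.MAX_VALUE;  // I have nothing left
                    LOCK.notifyAll();
                }
            }
        }
    }

    static List<Integer> merge(List<Integer> first, List<Integer> second) {
        synchronized (LOCK) {
            Arrays.fill(heads, Integer.MIN_VALUE);
            output.clear();
        }
        Thread a = new Thread(new Worker(0, first));
        Thread b = new Thread(new Worker(1, second));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (LOCK) {
            return new ArrayList<>(output);
        }
    }

    public static void main(String[] args) {
        // Same split as in the question: one worker has 1 and 4, the other 2 and 3.
        System.out.println(merge(Arrays.asList(1, 4), Arrays.asList(2, 3))); // prints [1, 2, 3, 4]
    }
}
```

Each worker only writes while the other worker's announced value is not smaller than its own, so the two can never both be blocked at once. For the real assignment, the announced value would be the label number (or date) of the worker's next file, and output.add would be replaced by appending that file to result.txt.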

Related

Java create thread with parameters

I have a program that counts how many times a word occurs in a text.
I want the loop to be in a separate thread. How can I pass parameters articles and stringToSearch to the thread, or set the global parameters?
public class Main {
public static void main(String[] args) {
Scanner s = new Scanner(System.in);
int numberArticles = s.nextInt();
ArrayList<Article> articles = new ArrayList<>();
for(int i = 0; i < numberArticles; i++) {
String articleName = s.nextLine();
String content = "";
File file = new File(articleName + ".txt");
BufferedReader br;
try {
br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null) {
content += st;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
articles.add(new Article(articleName, content));
}
String stringToSearch = s.nextLine();
MyThread myThread = new MyThread();
myThread.start();
}
}
public class MyThread extends Thread {
public void run(){
for(Article article : articles) {
int counter = 0;
String[] words = article.getContent().split(" ");
for (String word : words) {
if(word.equals(stringToSearch)) {
counter++;
}
}
}
}
}
You are extending Thread with your custom class. And you can add any number of any additional properties to that class (MyThread). And you can create a constructor in MyThread to pass all those parameters.
Here's an example showing how to pass some values into the constructor for your MyThread class. This passes two things into the constructor which then saves them to private members which can then be used within the run() method. I removed most of the other code from your question since it wasn't required for this explanation.
import java.util.ArrayList;
public class Scratch2 {
public static void main(String[] args) {
ArrayList<Article> articles = new ArrayList<>();
String stringToSearch = "...";
MyThread myThread = new MyThread(articles, stringToSearch);
myThread.start();
}
}
public class MyThread extends Thread {
private final ArrayList<Article> articles;
private final String stringToSearch;
public MyThread(ArrayList<Article> articles, String stringToSearch) {
this.articles = articles;
this.stringToSearch = stringToSearch;
}
public void run() {
for (Article article : articles) {
// ... do things with "stringToSearch"
}
}
}
class Article {
// more stuff here
}
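The example above leaves the loop body open. As a rough sketch of how the counting and the hand-back of results could look, the following uses plain strings instead of Article objects (Article is not fully defined here). The names WordCountThread, getCounts, and countInBackground are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical worker: receives its inputs through the constructor and
// counts how often a word occurs in each text.
class WordCountThread extends Thread {
    private final List<String> texts;
    private final String stringToSearch;
    private final List<Integer> counts = new ArrayList<>();

    WordCountThread(List<String> texts, String stringToSearch) {
        this.texts = texts;
        this.stringToSearch = stringToSearch;
    }

    @Override
    public void run() {
        for (String text : texts) {
            int counter = 0;
            for (String word : text.split(" ")) {
                if (word.equals(stringToSearch)) {
                    counter++;
                }
            }
            counts.add(counter);
        }
    }

    // Call only after join(): join() guarantees the worker's writes are visible.
    List<Integer> getCounts() {
        return counts;
    }

    // Convenience wrapper: start the thread, wait for it, return its result.
    static List<Integer> countInBackground(List<String> texts, String word) {
        WordCountThread thread = new WordCountThread(texts, word);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the count", e);
        }
        return thread.getCounts();
    }

    public static void main(String[] args) {
        System.out.println(countInBackground(Arrays.asList("a b a", "b c"), "a")); // prints [2, 0]
    }
}
```

Passing the inputs through the constructor and reading the result only after join() means no global variables are needed in either direction.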

How can I get what I'm reading in from a file to an output file

I'm able to pull out the 20 names randomly but how do I store them in an output file rather than displaying them to the screen? I tried filewriter but couldn't get it to work.
public class Assignment2 {
public static void main(String[] args) throws IOException {
// Read in the file into a list of strings
BufferedReader reader = new BufferedReader(new FileReader("textfile.txt"));
//BufferedWriter bw = new BufferedWriter(new FileWriter("out.txt"));
List<String> lines = new ArrayList<String>();
String line = reader.readLine();
while( line != null ) {
lines.add(line);
line = reader.readLine();
}
// Choose a random one from the list
Random r = new Random();
FileWriter letters = new FileWriter("out.txt");
for (int i = 0; i < 20; i++) {
int rowNum = r.nextInt(lines.size ());
System.out.println(lines.get(rowNum));
}
}
}
System.out is a PrintStream, not a Writer, so the API to access it is different; you can't replace one with the other. A Writer is, in a sense, a lower-level abstraction.
But it is easy to create a PrintStream that outputs to a file and use it as a replacement for System.out:
PrintStream out = new PrintStream("out.txt");
// you can even assign System.out to out:
// out = System.out;
for (int i = 0; i < 20; i++) {
int rowNum = r.nextInt(lines.size());
out.println(lines.get(rowNum));
}
out.close();
PS: don't forget to close any file you open (the auto-closeable functionality of Java 7, try-with-resources, is even better).
PPS: I assume you are learning Java; I can't recommend enough that you have a look at the Java I/O stream APIs.
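For reference, a minimal sketch of the try-with-resources variant mentioned in the PS might look like this. The class and method names (RandomLinesToFile, readLines, writeRandomLines) are invented for the example, and wrapping IOException in UncheckedIOException is just one possible way to handle errors.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper class for this illustration.
class RandomLinesToFile {
    // Reads every line of the file; the reader is closed automatically.
    static List<String> readLines(String fileName) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Writes `count` randomly chosen lines; the stream is flushed and closed
    // automatically when the try block ends. `lines` must not be empty.
    static void writeRandomLines(List<String> lines, int count, String fileName) {
        Random r = new Random();
        try (PrintStream out = new PrintStream(fileName)) {
            for (int i = 0; i < count; i++) {
                out.println(lines.get(r.nextInt(lines.size())));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // File names taken from the question.
        List<String> lines = readLines("textfile.txt");
        writeRandomLines(lines, 20, "out.txt");
    }
}
```

With this shape there is no explicit close() call to forget, even when an exception is thrown halfway through the loop.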
I am using a convenience method from the commons-io 2.4 library to write to the file; the code is also available on GitHub.
This example demonstrates how to read and write String lines to a file:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
/**
* Created by Pankaj Nimgade on 10-02-2016.
*/
public class WriteFile {
public static void main(String[] args) {
ArrayList<String> list = new ArrayList<>();
for (int i = 0; i < 20; i++) {
list.add("somename_" + i);
}
File file = new File("file.txt");
try {
FileUtils.writeLines(file, list); // write the 20 names to file.txt
ArrayList<String> strings = (ArrayList<String>) FileUtils.readLines(file);
ArrayList<Name> names = new ArrayList<>();
for (String single:strings) {
names.add(new Name(single));
if (names.size() == 20) {
break;
}
}
for (Name single_name:names) {
System.out.println(single_name.getName());
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
class Name {
String name;
public Name(String name) {
this.name = name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
@Override
public boolean equals(Object obj) {
return this.name.equalsIgnoreCase(((Name) obj).getName());
}
}
Output (this will be inside file.txt):
somename_0
somename_1
somename_2
somename_3
somename_4
somename_5
somename_6
somename_7
somename_8
somename_9
somename_10
somename_11
somename_12
somename_13
somename_14
somename_15
somename_16
somename_17
somename_18
somename_19

How can I return an array in Java that is accessible by other objects?

I want to return an array that is accessible by other objects after having read a text file. My instruction parsing class is:
import java.io.*;
public class Instruction {
public String[] instructionList;
public String[] readFile() throws IOException {
FileInputStream in = new FileInputStream("directions.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(in));
int n = 5;
instructionList = new String[n];
for (int j = 0; j < instructionList.length; j++) {
instructionList[j] = br.readLine();
}
in.close();
return instructionList;
}
}
The above takes in a text file with 5 lines of text in it. In my main() I want to run that function and have the string array be accessible to other objects.
import java.util.Arrays;
public class RoverCommand {
public static void main(String[] args) throws Exception {
Instruction directions = new Instruction();
directions.readFile();
String[] directionsArray;
directionsArray = directions.returnsInstructionList();
System.out.println(Arrays.toString(directionsArray));
}
}
What's the best way to do that? I would need the elements of the array to be integers if they are numbers and strings if they are letters. P.S. I'm brand new to Java. Is there a better way to do what I'm doing?
You don't have to use generics. I try to catch exceptions in the accessors and return null if anything blows up. So you can test if the value returned is null before proceeding.
// Client.java
import java.io.IOException;
public class Client {
public static void main(String args[]) {
try {
InstructionList il = new InstructionList();
il.readFile("C:\\testing\\ints.txt", 5);
int[] integers = il.getInstructionsAsIntegers();
if (integers != null) {
for (int i : integers) {
System.out.println(i);
}
}
} catch (IOException e) {
// handle
}
}
}
// InstructionList.java
import java.io.*;
public class InstructionList {
private String[] instructions;
public void readFile(String path, int lineLimit) throws IOException {
FileInputStream in = new FileInputStream(path);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
instructions = new String[lineLimit];
for (int i = 0; i < lineLimit; i++) {
instructions[i] = br.readLine();
}
in.close();
}
public String[] getInstructionsAsStrings() {
return instructions; // will return null if uninitialized
}
public int[] getInstructionsAsIntegers() {
if (this.instructions == null) {
return null;
}
int[] instructions = new int[this.instructions.length];
try {
for (int i = 0; i < instructions.length; i++) {
instructions[i] = Integer.parseInt(this.instructions[i]);
}
} catch (NumberFormatException e) {
return null; // data integrity fail, return null
}
return instructions;
}
}
Check whether instructionList is null. If it is null, call the readFile method.
public String[] returnsInstructionList() {
if (instructionList== null){
try { readFile(); } catch(Exception e){}
}
return instructionList;
}
Because readFile can throw an exception, it would be good to use one extra variable, like:
private boolean fileReaded = false;
public String[] returnsInstructionList() {
if (!fileReaded){
fileReaded = true;
try { readFile(); } catch(Exception e){}
}
return instructionList;
}
And if readFile can be run concurrently, the easiest way is to make the methods synchronized, like:
private boolean fileReaded = false;
public synchronized void readFile() throws IOException {
.
.
.
}
public synchronized String[] returnsInstructionList() {
if (!fileReaded){
fileReaded = true;
try { readFile(); } catch(Exception e){}
}
return instructionList;
}
There is no guarantee that readFile is called before returnsInstructionList is invoked, which leaves returnsInstructionList returning null.
I would have the method that reads the file return the contents directly:
public String[] getContentsFromFile(String fileName) throws IOException {
FileInputStream in = new FileInputStream(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
int n = 5;
instructionList = new String[n];
for (int j = 0; j < instructionList.length; j++) {
instructionList[j] = br.readLine();
}
in.close();
return instructionList;
}
For part two of the question, you could use generics, but either way you have to incorporate a way to say what type the value is. The example below does this with an Object field and an enum tag.
E.g.
public class Foo {
public ReturnForFoo returnAStringOrIntger(boolean val) {
if(val){
return new ReturnForFoo("String", ValueType.STRING) ;
}
return new ReturnForFoo(10, ValueType.INTEGER); //int
}
}
public class ReturnForFoo {
Object value;
ValueType type;
public ReturnForFoo(Object value, ValueType type) {
this.value = value;
this.type = type;
}
// Assume you have getters for both value and type: getValue() and getValueType()
}
// Declared at top level so that Foo and the calling code can refer to ValueType directly.
enum ValueType {
STRING,
INTEGER,
UNKNOWN
}
This code is in your main.
Foo foo = new Foo();
String value;
int val;
ReturnForFoo returnForFoo = foo.returnAStringOrIntger(true);
// NOTE you can use switch instead of if's and else if's. It will be better
if(returnForFoo.getValueType().equals(ValueType.INTEGER)){
val = (int) returnForFoo.getValue();
} else if(returnForFoo.getValueType().equals(ValueType.STRING)){
value = (String) returnForFoo.getValue();
} else {
// UNKNOWN case
}
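For the specific case in the question (integers if the line is a number, strings otherwise), a simpler alternative to a tagged wrapper is to parse each line into either an Integer or a String and check the type with instanceof. The sketch below takes that route; InstructionParser, parse, and parseAll are hypothetical names, and the sample values in main are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns raw text lines into Integer or String values.
class InstructionParser {
    // Returns an Integer when the trimmed text is a whole number that fits
    // in an int, otherwise returns the trimmed text itself.
    static Object parse(String raw) {
        String text = raw.trim();
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            return text;
        }
    }

    static List<Object> parseAll(List<String> lines) {
        List<Object> values = new ArrayList<>();
        for (String line : lines) {
            values.add(parse(line));
        }
        return values;
    }

    public static void main(String[] args) {
        for (Object value : parseAll(Arrays.asList("5", "N", "12", "LMLM"))) {
            if (value instanceof Integer) {
                System.out.println("Integer: " + value);
            } else {
                System.out.println("String: " + value);
            }
        }
    }
}
```

The trade-off is that callers lose compile-time type information and must check instanceof before using a value, which is the same information the enum tag carries in the wrapper approach.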

Sorting lines in a file by 2 fields with JAVA

I work at a printing company that has many programs in COBOL, and I have been tasked with
converting the COBOL programs into Java programs. I've run into a snag in one conversion. I need to take a file in which each line is a record, with the data on each line in fixed-position blocks.
An example line is
60000003448595072410013 FFFFFFFFFFV 80 0001438001000014530020120808060134
I need to sort the data by the 5-digit number in characters 19-23 and then by the very first character on the line.
BufferedReader input;
BufferedWriter output;
String[] sort, sorted, style, accountNumber, customerNumber;
String holder;
int lineCount;
int lineCounter() {
int result = 0;
boolean eof = false;
try {
FileReader inputFile = new FileReader("C:\\Users\\cbook\\Desktop\\Chemical\\"
+ "LB26529.fil");
input = new BufferedReader(inputFile);
while (!eof) {
holder = input.readLine();
if (holder == null) {
eof = true;
} else {
result++;
}
}
} catch (IOException e) {
System.out.println("Error - " + e.toString());
}
return result;
}
chemSort(){
lineCount = this.lineCounter();
sort = new String[lineCount];
sorted = new String[lineCount];
style = new String[lineCount];
accountNumber = new String[lineCount];
customerNumber = new String[lineCount];
try {
FileReader inputFile = new FileReader("C:\\Users\\cbook\\Desktop\\Chemical\\"
+ "LB26529.fil");
input = new BufferedReader(inputFile);
for (int i = 0; i < (lineCount + 1); i++) {
holder = input.readLine();
if (holder != null) {
sort[i] = holder;
style[i] = sort[i].substring(0, 1);
customerNumber[i] = sort[i].substring(252, 257);
}
}
} catch (IOException e) {
System.out.println("Error - " + e.toString());
}
}
This is what I have so far, and I'm not really sure where to go from here, or even whether this is the correct way
to go about sorting the file. After the file is sorted, it will be stored in another file and processed
again by another program to get it ready for printing.
List<String> linesAsList = new ArrayList<String>();
String line=null;
while(null!=(line=reader.readLine())) linesAsList.add(line);
Collections.sort(linesAsList, new Comparator<String>() {
public int compare(String o1,String o2){
return (o1.substring(18,23)+o1.substring(0,1)).compareTo(o2.substring(18,23)+o2.substring(0,1));
}});
for (String sortedLine : linesAsList) System.out.println(sortedLine); // or whatever output stream you want
This phone's autocorrect is messing up my answer
Read the file into an ArrayList (instead of an array). Use the following methods:
// to declare the arraylist
ArrayList<String> lines = new ArrayList<String>();
// to add a new line to it (within your reading-lines loop)
lines.add(input.readLine());
Then, sort it using a custom Comparator:
Collections.sort(lines, new Comparator<String>() {
public int compare(String a, String b) {
String a5 = theFiveNumbersOf(a);
String b5 = theFiveNumbersOf(b);
int firstComparison = a5.compareTo(b5);
if (firstComparison != 0) { return firstComparison; }
String a1 = theDigitOf(a);
String b1 = theDigitOf(b);
return a1.compareTo(b1);
}
});
(It is unclear what 5 digits or what digit you want to compare; I've left them as functions for you to fill in).
Finally, write it to the output file:
PrintWriter ow = new PrintWriter(new FileWriter("filename.extension"));
for (String line : lines) {
ow.println(line);
}
ow.close();
(adding imports and try/catch as needed)
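Putting the pieces together for the two keys named in the question (characters 19-23, then the first character), a compact sketch using the Java 8 Comparator helpers could look like the following. TwoFieldSort and its members are hypothetical names. It assumes every line is at least 23 characters long and that the 5-digit field is zero-padded, so comparing it as text gives the same order as comparing it as a number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for this illustration.
class TwoFieldSort {
    // Characters 19-23 (1-based) are indexes 18..22, i.e. substring(18, 23).
    // Ties on that field are broken by the first character of the line.
    static final Comparator<String> BY_NUMBER_THEN_FIRST_CHAR =
            Comparator.comparing((String line) -> line.substring(18, 23))
                      .thenComparing(line -> line.substring(0, 1));

    // Returns a sorted copy and leaves the input list untouched.
    // Throws StringIndexOutOfBoundsException for lines shorter than 23 characters.
    static List<String> sorted(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(BY_NUMBER_THEN_FIRST_CHAR);
        return copy;
    }

    public static void main(String[] args) {
        // The first 23 characters of the sample record, plus two made-up variations.
        List<String> lines = Arrays.asList(
                "60000003448595072410013",
                "10000003448595072410013",
                "90000003448595072400001");
        for (String line : sorted(lines)) {
            System.out.println(line);
        }
    }
}
```

Comparator.comparing(...).thenComparing(...) expresses the same "compare the first key, fall back to the second" logic as the hand-written compare method, just more compactly.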
This code will sort a file based on mainframe sort parameters.
You pass 3 parameters to the main method of the Sort class:
1. The input file path.
2. The output file path.
3. The sort parameters in mainframe sort format. In your case, this string would be 19,5,CH,A,1,1,CH,A
This first class, the SortParameter class, holds instances of the sort parameters. There's one instance for every group of 4 parameters in the sort parameters string. This class is a basic getter / setter class, except for the getDifference method. The getDifference method brings some of the sort comparator code into the SortParameter class to simplify the comparator code in the Sort class.
public class SortParameter {
protected int fieldStartByte;
protected int fieldLength;
protected String fieldType;
protected String sortDirection;
public SortParameter(int fieldStartByte, int fieldLength, String fieldType,
String sortDirection) {
this.fieldStartByte = fieldStartByte;
this.fieldLength = fieldLength;
this.fieldType = fieldType;
this.sortDirection = sortDirection;
}
public int getFieldStartPosition() {
return fieldStartByte - 1;
}
public int getFieldEndPosition() {
return getFieldStartPosition() + fieldLength;
}
public String getFieldType() {
return fieldType;
}
public String getSortDirection() {
return sortDirection;
}
public int getDifference(String a, String b) {
int difference = 0;
if (getFieldType().equals("CH")) {
String as = a.substring(getFieldStartPosition(),
getFieldEndPosition());
String bs = b.substring(getFieldStartPosition(),
getFieldEndPosition());
difference = as.compareTo(bs);
if (getSortDirection().equals("D")) {
difference = -difference;
}
}
return difference;
}
}
The Sort class contains the code to read the input file, sort the input file, and write the output file. This class could probably use some more error checking.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
public class Sort implements Runnable {
protected List<String> lines;
protected String inputFilePath;
protected String outputFilePath;
protected String sortParameters;
public Sort(String inputFilePath, String outputFilePath,
String sortParameters) {
this.inputFilePath = inputFilePath;
this.outputFilePath = outputFilePath;
this.sortParameters = sortParameters;
}
@Override
public void run() {
List<SortParameter> parameters = parseParameters(sortParameters);
lines = read(inputFilePath);
lines = sort(lines, parameters);
write(outputFilePath, lines);
}
protected List<SortParameter> parseParameters(String sortParameters) {
List<SortParameter> parameters = new ArrayList<SortParameter>();
String[] field = sortParameters.split(",");
for (int i = 0; i < field.length; i += 4) {
SortParameter parameter = new SortParameter(
Integer.parseInt(field[i]), Integer.parseInt(field[i + 1]),
field[i + 2], field[i + 3]);
parameters.add(parameter);
}
return parameters;
}
protected List<String> sort(List<String> lines,
final List<SortParameter> parameters) {
Collections.sort(lines, new Comparator<String>() {
@Override
public int compare(String a, String b) {
for (SortParameter parameter : parameters) {
int difference = parameter.getDifference(a, b);
if (difference != 0) {
return difference;
}
}
return 0;
}
});
return lines;
}
protected List<String> read(String filePath) {
List<String> lines = new ArrayList<String>();
BufferedReader reader = null;
try {
String line;
reader = new BufferedReader(new FileReader(filePath));
while ((line = reader.readLine()) != null) {
lines.add(line);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
return lines;
}
protected void write(String filePath, List<String> lines) {
BufferedWriter writer = null;
try {
writer = new BufferedWriter(new FileWriter(filePath));
for (String line : lines) {
writer.write(line);
writer.newLine();
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (writer != null) {
writer.flush();
writer.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) {
if (args.length < 3) {
System.err.println("The sort process requires 3 parameters.");
System.err.println(" 1. The input file path.");
System.err.println(" 2. The output file path.");
System.err.print (" 3. The sort parameters in mainframe ");
System.err.println("sort format. Example: 15,5,CH,A");
} else {
new Sort(args[0], args[1], args[2]).run();
}
}
}

A Producer-Consumer implemented using java threads writes only half the data to file

Hello, I have a problem wherein I have to read a huge CSV file, take the first field out of each line, and then store only the unique values to a file. I have written a program using threads which implements the producer-consumer pattern.
The class CSVLineStripper does what the name suggests: it takes a line from the CSV, pulls the first field out of it, and adds that field to a queue. CSVLineProcessor then takes the fields one by one, stores them in an ArrayList, and checks whether each field is unique so that only unique values are stored. The ArrayList is used only for reference; every unique field is written to a file.
Now, what is happening is that all fields are stripped correctly. When I run about 3000 lines, it's all correct. When I run the program for all lines, which is around 7,00,000+ lines, I get incomplete records: about 1000 unique values are missing. Every field is enclosed in double quotes. What is weird is that the last field in the generated file is an incomplete word and its ending double quote is missing. Why is this happening?
import java.util.*;
import java.io.*;
class CSVData
{
Queue <String> refererHosts = new LinkedList <String> ();
Queue <String> uniqueReferers = new LinkedList <String> (); // final writable queue of unique referers
private int finished = 0;
private int safety = 100;
private String line = "";
public CSVData(){}
public synchronized String getCSVLine() throws InterruptedException{
int i = 0;
while(refererHosts.isEmpty()){
if(i < safety){
wait(10);
}else{
return null;
}
i++;
}
finished = 0;
line = refererHosts.poll();
return line;
}
public synchronized void putCSVLine(String CSVLine){
if(finished == 0){
refererHosts.add(CSVLine);
this.notifyAll();
}
}
}
class CSVLineStripper implements Runnable //Producer
{
private CSVData cd;
private BufferedReader csv;
public CSVLineStripper(CSVData cd, BufferedReader csv){ // CONSTRUCTOR
this.cd = cd;
this.csv = csv;
}
public void run() {
System.out.println("Producer running");
String line = "";
String referer = "";
String [] CSVLineFields;
int limit = 700000;
int lineCount = 1;
try {
while((line = csv.readLine()) != null){
CSVLineFields = line.split(",");
referer = CSVLineFields[0];
cd.putCSVLine(referer);
lineCount++;
if(lineCount >= limit){
break;
}
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("<<<<<< PRODUCER FINISHED >>>>>>>");
}
private String printString(String [] str){
String string = "";
for(String s: str){
string = string + " "+s;
}
return string;
}
}
class CSVLineProcessor implements Runnable
{
private CSVData cd;
private FileWriter fw = null;
private BufferedWriter bw = null;
public CSVLineProcessor(CSVData cd, BufferedReader bufferedReader){ // CONSTRUCTOR
this.cd = cd;
try {
this.fw = new FileWriter("unique_referer_dump.txt");
} catch (IOException e) {
e.printStackTrace();
}
this.bw = new BufferedWriter(fw);
}
public void run() {
System.out.println("Consumer Started");
String CSVLine = "";
int safety = 10000;
ArrayList <String> list = new ArrayList <String> ();
while(CSVLine != null || safety <= 10000){
try {
CSVLine = cd.getCSVLine();
if(!list.contains(CSVLine)){
list.add(CSVLine);
this.CSVDataWriter(CSVLine);
}
} catch (Exception e) {
e.printStackTrace();
}
if(CSVLine == null){
break;
}else{
safety++;
}
}
System.out.println("<<<<<< CONSUMER FINISHED >>>>>>>");
System.out.println("Unique referers found in 30000 records "+list.size());
}
private void CSVDataWriter(String referer){
try {
bw.write(referer+"\n");
} catch (Exception e) {
e.printStackTrace();
}
}
}
public class RefererCheck2
{
public static void main(String [] args) throws InterruptedException
{
String pathToCSV = "/home/shantanu/DEV_DOCS/Contextual_Work/excite_domain_kw_site_wise_click_rev2.csv";
CSVResourceHandler csvResHandler = new CSVResourceHandler(pathToCSV);
CSVData cd = new CSVData();
CSVLineProcessor consumer = new CSVLineProcessor(cd, csvResHandler.getCSVFileHandler());
CSVLineStripper producer = new CSVLineStripper(cd, csvResHandler.getCSVFileHandler());
Thread consumerThread = new Thread(consumer);
Thread producerThread = new Thread(producer);
producerThread.start();
consumerThread.start();
}
}
This is how a sample input is:
"xyz.abc.com","4432"."clothing and gifts","true"
"pqr.stu.com","9537"."science and culture","false"
"0.stu.com","542331"."education, studies","false"
"m.dash.com","677665"."technology, gadgets","false"
Producer stores in queue:
"xyz.abc.com"
"pqr.stu.com"
"0.stu.com"
"m.dash.com"
Consumer stores uniques in the file, but after opening file contents one would see
"xyz.abc.com"
"pqr.stu.com"
"0.st
A couple of things: you are breaking after 700k lines, not 7m. You are also not flushing your BufferedWriter, so the last part of the output could be incomplete. Add a flush at the end and close all your resources. A debugger is a good idea :)
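As a rough illustration of the flush-and-close advice, the sketch below writes unique values through a BufferedWriter inside try-with-resources, which flushes and closes the writer when the block ends. UniqueWriter and writeUnique are hypothetical names. It also swaps the ArrayList.contains check from the question for a HashSet, which is a different technique chosen here only because set lookups stay fast as the number of values grows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for this illustration.
class UniqueWriter {
    // Writes each distinct value once, one per line, and returns how many
    // distinct values were written. The sink is flushed and closed on exit.
    static int writeUnique(Iterable<String> values, Writer sink) {
        Set<String> seen = new HashSet<>();
        try (BufferedWriter bw = new BufferedWriter(sink)) {
            for (String value : values) {
                if (seen.add(value)) {   // add() returns false for duplicates
                    bw.write(value);
                    bw.write('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // A StringWriter stands in for the output file in this demo.
        StringWriter sink = new StringWriter();
        int unique = writeUnique(
                Arrays.asList("\"xyz.abc.com\"", "\"0.stu.com\"", "\"xyz.abc.com\""), sink);
        System.out.print(sink);
        System.out.println(unique + " unique values");
    }
}
```

In the program from the question, the sink would be something like new FileWriter("unique_referer_dump.txt"), and the consumer would need to close its writer after the last line has been taken from the queue.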
