Is there a faster way to use a CSV reader in Java?

I need to open a CSV file in several parts of 5,000 samples each and then plot them. To go back and forward in the signal, each time I click a button I have to instantiate a new reader and then skip to the point I need. My signal is large, about 135,000 samples, so the csvReader.skip() method is very slow when I work with the last samples. But to go back I can't delete lines, so each time my iterator needs to be re-instantiated. I noticed that skip() uses a for loop. Is there a better way to overcome this problem? Here is my code:
public void updateSign(int segmento) {
    Log.d("segmento", Integer.toString(segmento));
    //check that I am within the signal length
    if (segmento > 0 && (float) (segmento - 1) <= (float) TOTAL / normaLen)
    {
        try {
            reader = new CSVReader(new FileReader(new File(patty)));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        List<Integer> sign = new ArrayList<>();
        //this is the point of the signal where I finish
        int len = segmento * normaLen;
        //check whether I am at the end of the signal
        if (len >= TOTAL) {
            len = TOTAL;
            segmento = 0;
            avanti.setValue(false);
            System.out.println(avanti.getValue());
        } else {
            lines = TOTAL - len;
            avanti.setValue(true);
            System.out.println(avanti.getValue());
        }
        //the number of lines I need to skip
        int skipper = (segmento - 1) * normaLen;
        try {
            System.out.println("pre skip");
            reader.skip(skipper);
            System.out.println("post skip");
        } catch (IOException e) {
            e.printStackTrace();
        }
        //my iterator
        it = reader.iterator();
        System.out.println("iterator done");
        //loop to build the mini-signal to plot;
        //with only 5,000 samples it is fast enough
        for (int i = skipper; i < len - 1; i++) {
            if (i >= (segmento - 1) * normaLen) {
                sign.add(Integer.parseInt(it.next()[0]));
            }
            else
            {
                it.next();
                System.out.println("the skip did not work");
            }
        }
        System.out.println("for loop: too much effort?");
        //set sign to be plotted by my fragment
        liveSign.setValue(sign);
    }
}
Thanks in advance!
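One way to avoid the slow skip (a sketch, not from the question; indexSegments and openSegment are hypothetical names): since CSVReader.skip() has to read and parse every skipped line, you can instead scan the file once, record the byte offset where each 5,000-line segment starts, and later jump straight to the segment you need in O(1). Assuming one sample per line:
private final List<Long> segmentOffsets = new ArrayList<>();

//Scan the file once and remember where each segment of normaLen lines starts
private void indexSegments(String path, int normaLen) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
        long lineNo = 0;
        segmentOffsets.add(0L); //segment 0 starts at byte 0
        while (raf.readLine() != null) {
            if (++lineNo % normaLen == 0) {
                segmentOffsets.add(raf.getFilePointer());
            }
        }
    }
}

//Open a reader positioned directly at the start of a segment, no line parsing needed
private BufferedReader openSegment(String path, int segmento) throws IOException {
    FileInputStream in = new FileInputStream(path);
    in.getChannel().position(segmentOffsets.get(segmento));
    return new BufferedReader(new InputStreamReader(in));
}
You could then wrap the returned Reader in a new CSVReader (its constructor accepts any Reader) and read exactly normaLen lines, going forward or backward between segments without ever calling skip().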

Related

Highscore binary file

Hi!
I would like to maintain a high-score file with at most 5 scores, stored as binary strings, for my school project game. If a new value is higher than what is already in the file, I want to replace the lower value with the new one.
I've been at it for a long time, but I can't quite figure it out. It always adds new lines, and my check that the value is already in the file does not work. I hope you can help me a step further.
Thanks in advance.
Here's my code:
private static final String HIGHSCORE_PATH = "highscore-" + LocalDate.now() + ".dat";
private static final Path filePath = Paths.get(HIGHSCORE_PATH);
private final ArrayList highscoreList = new ArrayList();
private final ArrayList highscoreFileWaardes = new ArrayList();
FileOutputStream fos = new FileOutputStream(HIGHSCORE_PATH, true);
int numberRulesInUse = 0;
int teller2 = 0;
int teller3 = 0;

public void writeHighscore(int highscore) throws IOException {
    if (Files.exists(filePath)) {
        try (Scanner fileScanner = new Scanner(filePath)) {
            if (numberRulesInUse > 6) {
                System.out.println("File is too long");
                return;
            } else if (numberRulesInUse == 0) {
                for (int i = 0; i < 5; i++) {
                    highscoreList.add(0);
                }
            } else {
                String highscoreBinair = fileScanner.nextLine();
                highscoreList.set(teller2, (Integer.parseInt(highscoreBinair, 2)));
                while (fileScanner.hasNext()) {
                    if (numberRulesInUse < 5) {
                        if (!highscoreList.get(teller3).equals(highscoreFileWaardes.get(teller3))) {
                            highscoreList.set(teller3, (Integer.parseInt(highscoreBinair, 2)));
                            teller3++;
                            numberRulesInUse++;
                        } else {
                            System.out.println("Already in file");
                            return;
                        }
                    } else {
                        System.out.println("FILE DOESN'T ALLOW MORE THAN 5 RULES");
                        return;
                    }
                }
            }
        } catch (IOException ioException) {
            throw new IOException("Error while reading file");
        }
        if ((Integer) highscoreList.get(teller3) < highscore && (Integer) highscoreList.get(teller3) != highscore) {
            try {
                highscoreList.set(teller3, Integer.parseInt(String.valueOf(highscore)));
                fos.write(Integer.toBinaryString(Integer.parseInt(String.valueOf(highscoreList.get(teller3)))).getBytes());
                fos.write("\n".getBytes());
                teller3++;
                numberRulesInUse++;
            } catch (IOException e) {
                throw new IOException("Error while writing to file");
            }
        } else {
            System.out.println("No new highscore or highscore already exists");
        }
    }
    fos.flush();
    //fos.close();
}
Let's take a step back and think about what you are trying to do, and whether there is an easier way to do it all.
Load the scores from the file to a list,
Add the new high score to the list if it is higher,
Save the top 5 new high scores,
Profit!
We start by reading each line of the file into a list, parsing each line to an Integer: existingHighScores.add(Integer.parseInt(line, 2));
The next step is to check whether your score is higher or not; however, there is a far easier way to do this. Simply add the score to the end of the list, existingHighScores.add(highscore);, and then sort the list in descending order with Collections.sort(existingHighScores, Collections.reverseOrder());. That single line of code cuts out all your loops and if/else checks and gives you some nicely readable code.
Finally, because we have a sorted list, we can just save the first 5 items, which are the top 5 scores, to the file. As mentioned earlier, because the list is sorted, this removes the need to compare and remove lower scores:
for (int i = 0; i < 5; i++)
{
    out.write(Integer.toBinaryString(existingHighScores.get(i)).getBytes());
    out.write("\n".getBytes());
}
Now if we put it all together a fully working example might look something like this:
public void writeHighscore(int highscore) throws Exception {
    //File path
    Path file = Paths.get(HIGHSCORE_PATH);
    //Fill the list with 5 zeros to avoid issues if the list loaded from file is corrupt or shorter than 5
    ArrayList<Integer> existingHighScores = new ArrayList<>(Arrays.asList(0, 0, 0, 0, 0));
    //Load the existing high scores
    InputStream in = Files.newInputStream(file);
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line = null;
    while ((line = reader.readLine()) != null)
    {
        //Load the score, or if there is an error default to 0
        try {
            existingHighScores.add(Integer.parseInt(line, 2));
        }
        catch (Exception e) {
            existingHighScores.add(0);
        }
    }
    //Remember to close the file, otherwise we can't save the new scores below
    reader.close();
    in.close();
    //Add the new high score to the list
    existingHighScores.add(highscore);
    //Sort the scores in descending order
    Collections.sort(existingHighScores, Collections.reverseOrder());
    //Save the high scores to the file
    OutputStream out = new BufferedOutputStream(Files.newOutputStream(file));
    //Only write the first 5 scores
    for (int i = 0; i < 5; i++)
    {
        out.write(Integer.toBinaryString(existingHighScores.get(i)).getBytes());
        out.write("\n".getBytes());
    }
    //Close the stream so everything is flushed to disk
    out.close();
}
Now you can add your exception handling and error checking (if (Files.exists(file)) etc.), and you can play with the code to get the file format working exactly as you want, but the logic itself remains simple and unchanged.

Correct Implementation of Multithreading

I've been thinking about this for a few days now, and I believe my mental model of how multithreading works is flawed. I've consulted the concurrency API docs and still did not find an answer to my question.
If I have some process that I want to return a unique string, and I want 100 different threads to run this process (thereby giving me 100 threads with 100 different strings), I can use an ExecutorService to accomplish this, correct?
Then, once every thread has a string, I want each one to send that string to a queue, thereby blasting the queue with messages (again, this sounds like a use for an executorService.submit() call).
Lastly, once a thread sends its message to the queue, I want that thread to immediately start checking another queue for a response (matching its unique string), and if it matches, output some data and terminate.
Although I am under the impression that using an ExecutorService is the answer to my issue, I am failing at the implementation. Is there another solution that I am missing, or does multithreading using this method suffice, and if so, how?
Code thus far. I stopped after the sendTextMessage call after realizing my issue:
int arraySize = 750; //Number of hits to send out: MANIPULATE THIS IF YOU WANT TO SEND MORE
// String[] toSend = new String[arraySize];
String[] rfids = new String[arraySize];
double total = 0;
int count = 0;
//Insert connection information and set up clients here//
clientSend.connectSend();
clientRec.connectRec(); //edit to ensure credentials are correct
// System.out.println("Signed-in");
StringBuffer output = new StringBuffer(); //What holds our output
File infile = new File("infile.txt"); //Populating the rfids array
Scanner scan = new Scanner(infile);
for (int i = 0; i <= arraySize - 1; i++)
{
    if (scan.hasNextLine())
    {
        rfids[i] = scan.nextLine();
        // System.out.println(rfids[i]);
    }
}
scan.close();
count = 0;
ExecutorService load = Executors.newFixedThreadPool(arraySize);
Callable<String> readAndSendPrep = () ->
{
    StringBuffer fileBasedResponse = new StringBuffer();
    String rfid = "";
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader("input.txt")); //This is the standard message that will be sent every time, give or take
    } catch (FileNotFoundException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    String line; //temp var
    for (int x = 0; x < arraySize - 1; x++)
    {
        try {
            while ((line = reader.readLine()) != null)
            {
                if (line.trim().startsWith("<MessageId>"))
                {
                    // System.out.println(rf);
                    rfid = rfids[arraySize]; //not necessary, I think
                    int endIndex = line.trim().indexOf("</MessageId>");
                    String messageId = line.trim().substring(11, endIndex);
                    line = "<MessageId>" + messageId + " - " + rfids[arraySize] + "</MessageId>"; //puts a unique ID in the thread details
                }
                else if (line.trim().startsWith("str"))
                {
                    // System.out.println(allRFID[thisIndex]);
                    rfid = rfids[arraySize];
                    line = "str" + rfids[arraySize] + "str"; //Another unique ID
                    // System.out.println("BOOM");
                }
                fileBasedResponse.append(line); //append the whole response to the StringBuffer object
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
    Thread.sleep(1);
    return fileBasedResponse.toString();
};
TimeUnit.SECONDS.sleep(10);
Future<String> fileBasedResponse = load.submit(readAndSendPrep);
while (!fileBasedResponse.isDone())
{
    Thread.sleep(1);
}
String fileBasedResponseStr = fileBasedResponse.toString();
Runnable sender = () ->
{
    try {
        clientSend.sendTextMessage(fileBasedResponseStr);
    } catch (JMSException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
};
clientSend.close(); //close connections
clientRec.close();
System.out.println(output.toString()); //output the results
System.out.println(count);
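For illustration, here is a minimal, self-contained sketch of the send-then-wait-for-a-matching-response pattern (my own sketch, not the asker's JMS setup: the BlockingQueue and the responder thread are hypothetical stand-ins for the real outgoing and incoming queues). Each worker registers a CompletableFuture under its unique string before sending, so a single listener can complete exactly the future belonging to that string:
import java.util.UUID;
import java.util.concurrent.*;

public class QueueRoundTripSketch {
    //Stand-ins for the real outgoing queue and the pending-response registry
    private static final BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>();
    private static final ConcurrentMap<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        int workers = 100;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(workers);

        //A single listener plays the role of the response-queue consumer:
        //it matches each response to the waiting worker by its unique id
        Thread responder = new Thread(() -> {
            try {
                while (true) {
                    String id = requestQueue.take();
                    pending.remove(id).complete("response for " + id);
                }
            } catch (InterruptedException ignored) {
            }
        });
        responder.setDaemon(true);
        responder.start();

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String id = UUID.randomUUID().toString(); //the unique string
                CompletableFuture<String> reply = new CompletableFuture<>();
                pending.put(id, reply); //register BEFORE sending, to avoid a race
                requestQueue.add(id);   //"send" the message
                try {
                    //Block until the matching response arrives, then act on it
                    System.out.println(reply.get(30, TimeUnit.SECONDS));
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    done.countDown();
                }
            });
        }

        done.await();
        pool.shutdown();
    }
}
The key design point is that no worker polls the response queue itself; one consumer drains it and dispatches by id, so 100 threads can block cheaply on their own futures.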

Why does my code become slow after processing a large dataset?

I have a Java program which basically reads from a file line by line and stores the lines in a set. The file contains more than 30,000,000 lines. My program runs fast at the beginning but slows down after processing 20,000,000 lines, eventually becoming too slow to wait for. Can somebody explain why this happens and how I can speed the program up again?
Thanks.
public void returnTop100Phases() {
    Set<Phase> phaseTreeSet = new TreeSet<>(new Comparator<Phase>() {
        @Override
        public int compare(Phase o1, Phase o2) {
            int diff = o2.count - o1.count;
            if (diff == 0) {
                return o1.phase.compareTo(o2.phase);
            } else {
                return diff > 0 ? 1 : -1;
            }
        }
    });
    try {
        int lineCount = 0;
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File("output")), StandardCharsets.UTF_8));
        String line = null;
        while ((line = br.readLine()) != null) {
            lineCount++;
            if (lineCount % 10000 == 0) {
                System.out.println(lineCount);
            }
            String[] tokens = line.split("\\t");
            phaseTreeSet.add(new Phase(tokens[0], Integer.parseInt(tokens[1])));
        }
        br.close();
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        Iterator<Phase> iterator = phaseTreeSet.iterator();
        int n = 100;
        while (n > 0 && iterator.hasNext()) {
            Phase phase = iterator.next();
            out.print(phase.phase + "\t" + phase.count + "\n");
            n--;
        }
        out.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Looking at the runtime behaviour, this is clearly a memory issue. My tests even broke after around 5M lines with 'GC overhead limit exceeded' on Java 8. If I limit the size of the phaseTreeSet by adding
if (phaseTreeSet.size() > 100) { phaseTreeSet.pollLast(); }
it runs through quickly. The reason it gets so slow is that it uses ever more memory, so garbage collection takes longer; and each time the JVM needs more memory, it first has to run a big garbage collection again. There is more and more memory to scan, so every round gets a bit slower...
To get faster you need to keep the data out of memory, either by keeping only the top Phases as I did, or by using some kind of database.
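For illustration, the bounded reading loop might look like this (a sketch; note that phaseTreeSet must be declared as TreeSet<Phase> rather than Set<Phase> so that pollLast() is available):
TreeSet<Phase> phaseTreeSet = new TreeSet<>(comparator); //same comparator as above
String line;
while ((line = br.readLine()) != null) {
    String[] tokens = line.split("\\t");
    phaseTreeSet.add(new Phase(tokens[0], Integer.parseInt(tokens[1])));
    if (phaseTreeSet.size() > 100) {
        phaseTreeSet.pollLast(); //evict the current smallest right away; the set never exceeds 101 entries
    }
}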

randomAccessFile.readLine() returns null after many uses even though not reaching EOF?

I have a file with 10K lines.
I read it in chunks of 200 lines.
I have a problem that after 5600 lines (chunk 28), randomAccessFile.readLine() returns null.
However, if I start reading from chunk 29, it reads another chunk and stops (returns null). If I force reading from chunk 30, again it reads one chunk and stops.
This is my code:
private void addRequestsToBuffer(int fromChunkId, List<String> requests) {
    String line;
    while (requests.size() < chunkSizeInLines) {
        if ((line = readNextLine()) == null) {
            return;
        }
        int httpPosition = line.indexOf("http");
        int index = fromChunkId * chunkSizeInLines + requests.size();
        requests.add(index + ") " + line.substring(httpPosition));
    }
}

private String readNextLine() {
    String line;
    try {
        line = randomAccessFile.readLine();
        if (line == null) {
            System.out.println("randomAccessFile.readLine() returned null");
        }
    } catch (IOException ex) {
        ex.printStackTrace();
        throw new RuntimeException(ex);
    }
    return line;
}

@Override
public List<String> getNextRequestsChunkStartingChunkId(int fromChunkId) {
    List<String> requests = new ArrayList<>();
    int linesNum = 0;
    try {
        for (int i = 0; i < fromChunkId; i++) {
            while ((linesNum < chunkSizeInLines) && (randomAccessFile.readLine()) != null) {
                linesNum++;
            }
            linesNum = 0;
        }
        addRequestsToBuffer(fromChunkId, requests);
    } catch (IOException ex) {
        ex.printStackTrace();
        throw new RuntimeException(ex);
    }
    return requests;
}
What can cause this? Does RandomAccessFile time out?
Each time you call getNextRequestsChunkStartingChunkId you're skipping the specified number of chunks, without "rewinding" the RandomAccessFile to the start. So for example, if you call:
getNextRequestsChunkStartingChunkId(0);
getNextRequestsChunkStartingChunkId(1);
getNextRequestsChunkStartingChunkId(2);
you'll actually read:
Chunk 0 (leaving the stream at the start of chunk 1)
Chunk 2 (leaving the stream at the start of chunk 3)
Chunk 5 (leaving the stream at the start of chunk 6)
Options:
Read the chunks sequentially, without skipping anything
Rewind at the start of the method
Unfortunately you can't use seek for this, because your chunks aren't equally sized, in terms of bytes.
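You can, however, seek(0) to rewind and then skip lines from the top. A sketch of that second option, assuming the same randomAccessFile and chunkSizeInLines fields as in the question:
@Override
public List<String> getNextRequestsChunkStartingChunkId(int fromChunkId) {
    List<String> requests = new ArrayList<>();
    try {
        randomAccessFile.seek(0); //always skip from the beginning of the file
        for (int i = 0; i < fromChunkId; i++) {
            int linesNum = 0;
            while (linesNum < chunkSizeInLines && randomAccessFile.readLine() != null) {
                linesNum++; //skip one chunk's worth of lines
            }
        }
        addRequestsToBuffer(fromChunkId, requests);
    } catch (IOException ex) {
        throw new RuntimeException(ex);
    }
    return requests;
}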

Java - Basic IO trouble

I'm trying to code a sieve of Eratosthenes which I intend to use to find the largest prime factor of 13195. If this works, I intend to use it on the number 600851475143.
Since creating a list of numbers ranging from 2 to 600851475143 would be nearly impossible due to memory issues, I have decided to store the numbers in a text file instead.
The problem I'm running into, though, is that instead of getting a text file filled with numbers, the code only produces a file with one number (this is my first time working with IO-related stuff in Java):
long number = 13195;
long limit = (long) Math.sqrt(number);
for (long i = 2; i < limit + 1; i++)
{
    try
    {
        Writer output = null;
        File file = new File("Primes.txt");
        output = new BufferedWriter(new FileWriter(file));
        output.write(Long.toString(i) + "\n");
        output.close();
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Here's the output contained in the text file:
114
What am I doing wrong?
Don't use Eratosthenes - it's too slow unless you need all the primes in the range.
Here is a better way to factorize a given number. The function returns a map where the keys are the prime factors of n and the values are their powers; e.g. for 13195 it will be {5:1, 7:1, 13:1, 29:1}.
Its complexity is O(sqrt(n)):
public static Map<Integer, Integer> Factorize(int n) {
    HashMap<Integer, Integer> ret = new HashMap<Integer, Integer>();
    int origN = n;
    for (int p = 2; p * p <= origN && n > 1; p += (p == 2 ? 1 : 2)) {
        int power = 0;
        while (n % p == 0) {
            ++power;
            n /= p;
        }
        if (power > 0)
            ret.put(p, power);
    }
    if (n > 1)
        ret.put(n, 1); //whatever remains has no factor <= sqrt(origN), so it is itself prime
    return ret;
}
Of course, if you need just the largest prime factor, you can return only the largest key instead of the whole map; the complexity is the same.
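For example, a quick usage sketch for the question's number (the printed map order is simply how the HashMap happens to iterate):
Map<Integer, Integer> factors = Factorize(13195);
System.out.println(factors); //{5=1, 7=1, 13=1, 29=1}
System.out.println(Collections.max(factors.keySet())); //29, the largest prime factor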
Your code keeps re-opening, overwriting, and closing the same file. You should do something like this:
long number = 13195;
long limit = (long) Math.sqrt(number);
try
{
    File file = new File("Primes.txt");
    Writer output = new BufferedWriter(new FileWriter(file));
    for (long i = 2; i < limit + 1; i++)
    {
        output.write(Long.toString(i) + "\n");
    }
    output.close();
}
catch (IOException e)
{
    // TODO Auto-generated catch block
    e.printStackTrace();
}
You need to take the file instantiation out of the loop.
You are overwriting your file on every pass through the loop.
You need to open your file outside the main loop.
long number = 13195;
long limit = (long) Math.sqrt(number);
Writer output = null;
try
{
    File file = new File("Primes.txt");
    output = new BufferedWriter(new FileWriter(file));
}
catch (IOException e)
{
    // Cannot open file
    e.printStackTrace();
    return;
}
for (long i = 2; i < limit + 1; i++)
{
    try
    {
        output.write(Long.toString(i) + "\n");
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
try
{
    output.close();
}
catch (IOException e)
{
    e.printStackTrace();
}
You are recreating your FileWriter in every iteration of your for-loop and not telling it to append, so you are overwriting your file in every iteration.
Create your FileWriter before the for-loop and close it after the loop. Something like this:
long number = 13195;
long limit = (long) Math.sqrt(number);
Writer output = null;
try
{
    File file = new File("/var/tmp/Primes.txt");
    output = new BufferedWriter(new FileWriter(file));
    for (long i = 2; i < limit + 1; i++) {
        output.write(Long.toString(i) + "\n");
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} finally {
    if (output != null) {
        try {
            output.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Saving directly to disk is slow; have you considered doing the work in pieces in memory and then saving to disk? It would also have the benefit of a smaller file, since you could write only the primes you have found instead of every candidate number.
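To make that concrete, here is a sketch of the in-memory route (an illustration of the suggestion above, not code from the question): sqrt(600851475143) is only about 775146, so a java.util.BitSet sieve up to that bound fits easily in memory, and trial division by the sieved primes gives the largest prime factor with no file at all:
long n = 600851475143L;
int limit = (int) Math.sqrt(n);
BitSet composite = new BitSet(limit + 1);
//Sieve of Eratosthenes, but only up to sqrt(n)
for (int p = 2; (long) p * p <= limit; p++) {
    if (!composite.get(p)) {
        for (int m = p * p; m <= limit; m += p) {
            composite.set(m);
        }
    }
}
//Trial-divide n by the sieved primes to find the largest prime factor
long rest = n;
long largest = 1;
for (int p = 2; p <= limit && rest > 1; p++) {
    if (!composite.get(p)) {
        while (rest % p == 0) {
            rest /= p;
            largest = p;
        }
    }
}
if (rest > 1) {
    largest = rest; //whatever is left has no factor <= sqrt(n), so it is prime
}
System.out.println(largest); //6857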
