Java looping through array - Optimization - java

I've got some Java code that works as expected, but it takes several seconds to run even though the job is just looping through an array.
The input file is a FASTA file. The file I'm using is 2.9 MB, and some other FASTA files can be up to 20 MB.
In the code I'm trying to loop through it in groups of three, e.g. AGC TTT TCA, etc. The code has no functional purpose for now, but what I want is to pair each amino acid with its equivalent group of bases. Example:
AGC - Ser / CUG Leu / ... etc
So what's wrong with the code? Is there a better way to do it? Any optimizations? Looping through the whole String takes some time, maybe only seconds, but I need to find a better way to do it.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class fasta {
public static void main(String[] args) throws IOException {
File fastaFile;
FileReader fastaReader;
BufferedReader fastaBuffer = null;
StringBuilder fastaString = new StringBuilder();
try {
fastaFile = new File("res/NC_017108.fna");
fastaReader = new FileReader(fastaFile);
fastaBuffer = new BufferedReader(fastaReader);
String fastaDescription = fastaBuffer.readLine();
String line = fastaBuffer.readLine();
while (line != null) {
fastaString.append(line);
line = fastaBuffer.readLine();
}
System.out.println(fastaDescription);
System.out.println();
String currentFastaAcid;
for (int i = 0; i < fastaString.length(); i+=3) {
currentFastaAcid = fastaString.toString().substring(i, i + 3);
System.out.println(currentFastaAcid);
}
} catch (NullPointerException e) {
System.out.println(e.getMessage());
} catch (FileNotFoundException e) {
System.out.println(e.getMessage());
} catch (IOException e) {
System.out.println(e.getMessage());
} finally {
// fastaBuffer is still null if opening the file failed
if (fastaBuffer != null) fastaBuffer.close();
}
}
}

currentFastaAcid = fastaString.toString().substring(i, i + 3);
Please replace with
currentFastaAcid = fastaString.substring(i, i + 3);
The toString method of StringBuilder creates a new String object every time you call it, and each one contains a copy of your entire large string. If you call substring directly on the StringBuilder, it returns only a small copy of the requested substring.
Also remove System.out.println if you don't really need it.
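To tie this back to the stated goal of pairing each group of bases with its amino acid, here is a minimal sketch. The class name CodonSketch, the translate method and the four-entry lookup table are assumptions made for illustration; a real table needs all 64 codons. The loop condition also stops before a trailing partial codon, since substring(i, i + 3) throws when the sequence length is not a multiple of three.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CodonSketch {
    // Deliberately incomplete table, for illustration only.
    private static final Map<String, String> AMINO_ACIDS = new HashMap<>();
    static {
        AMINO_ACIDS.put("AGC", "Ser");
        AMINO_ACIDS.put("CTG", "Leu");
        AMINO_ACIDS.put("TTT", "Phe");
        AMINO_ACIDS.put("TCA", "Ser");
    }

    // Splits the sequence into codons and pairs each with its amino acid.
    static List<String> translate(CharSequence sequence) {
        List<String> result = new ArrayList<>();
        // i + 3 <= length skips a trailing partial codon instead of throwing.
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            // Copies only three characters; the full sequence is never copied.
            String codon = sequence.subSequence(i, i + 3).toString();
            result.add(codon + " - " + AMINO_ACIDS.getOrDefault(codon, "???"));
        }
        return result;
    }

    public static void main(String[] args) {
        StringBuilder fastaString = new StringBuilder("AGCTTTTCACT");
        // One println for the whole batch rather than one per codon.
        System.out.println(String.join("\n", translate(fastaString)));
    }
}
```

Passing the StringBuilder as a CharSequence avoids calling toString() on the whole sequence at any point.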

The big factor here is that you are calling substring on a new String each time.
Instead, call substring directly on the StringBuilder:
for (int i = 0; i < fastaString.length(); i+=3){
currentFastaAcid = fastaString.substring(i, i + 3);
System.out.println(currentFastaAcid);
}
Also, instead of printing currentFastaAcid each time, save it into a list and print the list at the end:
List<String> acids = new LinkedList<String>();
for (int i = 0; i < fastaString.length(); i+=3){
currentFastaAcid = fastaString.substring(i, i + 3);
acids.add(currentFastaAcid);
}
System.out.println(acids.toString());

Your main problem, besides the debug output, is surely that you are creating a new String containing all the data read from the file in each iteration of your loop:
currentFastaAcid = fastaString.toString().substring(i, i + 3);
fastaString.toString() gives the same result in each iteration and is therefore redundant. Move it outside the loop and you will surely save some seconds of runtime.

Apart from the optimizations suggested for the serial code, I would use parallel processing to reduce the time further. If you have a really big file, you can split the work of reading the file and processing the lines read into separate threads. That way, while one thread is busy reading the next line from the large file, another thread can process the lines already read and print them to the console.

If you remove the
System.out.println(currentFastaAcid);
line in the for loop, you will save a considerable amount of time.

Related

How can I get Java to read all text in file?

I am trying to get Java to read text from a file so that I can convert the text into a series of ascii values, but currently it only seems to be reading and retrieving the first line of the txt file. I know this because the output is much shorter than the text in the file.
The text in the file is below:
AD Mullin Sep 2014 https://hellopoetry.com/poem/872466/prime/
Prime
Have you ever thought deeply about Prime numbers?
We normally think of prime as something unbreachable
In base ten this is most likely true
But there are other languages that might be used to break down numbers
I'm no theorist but I have my theories
What was behind the Big Bang?
Prime
If impermeable ... then the Big Bang never happened
And any good programmer worth a lick of salt, always leaves a back door
So, I bet there are some Prime numbers out there that are permeable, otherwise ...
We wouldn't be the Children of the Big Bang
I think that because there is an empty line between each line of text, the program reads only the first line and then stops when it sees no line directly after it, when in fact the next line is two lines down.
Here is the code I have written:
package poetry;
import java.io.FileNotFoundException;
import java.util.Formatter;
import java.util.Scanner;
import java.io.File;
import java.io.IOException;
import java.io.FileWriter;
public class poetry {
public static void main(String[] args) {
// TODO Auto-generated method stub
//Below try catch block reads file text and encodes it.
try {
File x = new File("/Users/jordanbendon/Desktop/poem.txt");
Scanner sc = new Scanner(x);
//Right below is where I think the issue lies!
while(sc.hasNextLine()) {
String lines = sc.nextLine();
char[] stringArray = lines.toCharArray();
String result = "";
for (int i = 0; i < lines.length(); i++) {
int ascii = lines.codePointAt(i);
if ((ascii >= 65 && ascii <= 90) || (ascii >= 97 && ascii <= 122)) {
ascii += 15;
result += Integer.toString(ascii);
} else {
result += stringArray[i];
}
}
System.out.println(result);
//Try catch block here creates a new file.
try {
File myObj = new File("/Users/jordanbendon/Desktop/EncryptedMessage.txt");
File s = myObj;
if (myObj.createNewFile()) {
System.out.println("File created: " + myObj.getName());
} else {
System.out.println("File already exists.");
break;
}
} catch (IOException e) {
System.out.println("An error occurred.");
e.printStackTrace();
}
//Try catch block here writes the new encrypted code to the newly created file.
try {
FileWriter myWriter = new FileWriter("/Users/jordanbendon/Desktop/EncryptedMessage.txt");
myWriter.write(result);
myWriter.close();
} catch (IOException e) {
System.out.println("An error occurred.");
e.printStackTrace();
}
}}
catch(FileNotFoundException e) {
System.out.println("error");
}
}
}
I have commented in the code where I think the issue is. The first while condition checks whether there is a next line by using hasNextLine(). I have tried using the method ReadAllLines(), but it says this method is undefined for the type Scanner.
How can I get the program to read and retrieve the entire text file instead of the first line?
Thanks!
To read the entire input stream:
Scanner sc = new Scanner(x).useDelimiter("\\A");
then just:
String entireInput = sc.next();
This works by setting the token delimiter to the start of all input, which of course is never encountered after any byte has been read, so the "next" token is the entire input.
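A small sketch of that delimiter trick follows. The class name ReadAllSketch and the readAll helper are made up for illustration, and a String source stands in for the file so the example is self-contained. The hasNext() check is there because next() throws NoSuchElementException on empty input.

```java
import java.util.Scanner;

class ReadAllSketch {
    // Reads everything the Scanner can see as a single token.
    static String readAll(Scanner sc) {
        sc.useDelimiter("\\A");
        // hasNext() is false for empty input, where next() would throw.
        return sc.hasNext() ? sc.next() : "";
    }

    public static void main(String[] args) {
        // A String source stands in for new Scanner(new File(...)).
        String poem = "Prime\n\nHave you ever thought deeply about Prime numbers?\n";
        String entireInput = readAll(new Scanner(poem));
        System.out.println(entireInput.equals(poem));
    }
}
```

The blank lines between the lines of the poem are kept, because the whole input comes back as one token.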
On every iteration of the loop you check whether the hard-coded file was created or already exists. If it already exists, you break out of the loop, which stops execution from progressing.
https://www.javatpoint.com/java-break
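One way to avoid that early break is to finish reading and encoding first and touch the output file only once, after the loop. The sketch below assumes that restructuring; the class and method names are hypothetical, the per-line rule is copied from the question, and a temporary file replaces the hard-coded Desktop path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EncodeAllSketch {
    // Same per-line rule as the question: letters become (code + 15) in decimal.
    static String encodeLine(String line) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                result.append(c + 15); // char + int is an int, so digits are appended
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    // Encodes every line first; nothing inside the loop can end it early.
    static List<String> encodeAll(Scanner sc) {
        List<String> encoded = new ArrayList<>();
        while (sc.hasNextLine()) {
            encoded.add(encodeLine(sc.nextLine()));
        }
        return encoded;
    }

    public static void main(String[] args) throws IOException {
        List<String> encoded = encodeAll(new Scanner("Prime\n\nBig Bang?\n"));
        // Hypothetical output location; written once, after the loop.
        Path out = Files.createTempFile("EncryptedMessage", ".txt");
        Files.write(out, encoded);
        System.out.println("Wrote " + encoded.size() + " lines to " + out);
    }
}
```

Writing after the loop also avoids reopening the FileWriter for every line, which in the original overwrites the file with only the most recent line.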

Getting data from .csv file into array list of objects

Here is the format of the .csv file I am working with -
Hostname,IP Address,Patched?,OS Version,Notes
A.example.COM,1.1.1.1,NO,11,Faulty fans
b.example.com,1.1.1.2,no,13,Behind the other routers so no one sees it
C.EXAMPLE.COM,1.1.1.3,no,12.1
d.example.com,1.1.1.4,yes,14
c.example.com,1.1.1.5,no,12,Case a bit loose
e.example.com,1.1.1.6,no,12.3
f.example.com,1.1.1.7,No,15,Guarded by sharks with lasers on their heads
I currently have this program which reads in all data from the above .csv file into an array list before then outputting the results to the console as you can see in the included output text. Ideally I eventually need to be able to compare each different field in each different row with one another, perform calculations etc. so would like to save each row as an object inside an array list instead. I have tried doing this but so far with no success. Is there a simple way of modifying my program to do this? Also, if possible, I would like for the headings and the notes not to be included. Here is my code so far -
package crunchify;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
public class CrunchifyCSVtoArrayList {
public static void main(String[] args) {
BufferedReader crunchifyBuffer = null;
try {
String crunchifyLine;
crunchifyBuffer = new BufferedReader(new FileReader("Crunchify-CSV-to-ArrayList.csv"));
while ((crunchifyLine = crunchifyBuffer.readLine()) != null) {
System.out.println("ArrayList data: " + crunchifyCSVtoArrayList(crunchifyLine) + "\n");
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (crunchifyBuffer != null) crunchifyBuffer.close();
} catch (IOException crunchifyException) {
crunchifyException.printStackTrace();
}
}
}
// Utility which converts CSV to ArrayList using Split Operation
public static ArrayList<String> crunchifyCSVtoArrayList(String crunchifyCSV) {
ArrayList<String> crunchifyResult = new ArrayList<String>();
if (crunchifyCSV != null) {
String[] splitData = crunchifyCSV.split("\\s*,\\s*");
for (int i = 0; i < splitData.length; i++) {
// Both checks must hold; with || a null entry would reach length() and throw
if (splitData[i] != null && splitData[i].length() != 0) {
crunchifyResult.add(splitData[i].trim());
}
}
}
return crunchifyResult;
}
}
Current output:
ArrayList data: [Hostname, IP Address, Patched?, OS Version, Notes]
ArrayList data: [A.example.COM, 1.1.1.1, NO, 11, Faulty fans]
ArrayList data: [b.example.com, 1.1.1.2, no, 13, Behind the other routers so no one sees it]
ArrayList data: [C.EXAMPLE.COM, 1.1.1.3, no, 12.1]
ArrayList data: [d.example.com, 1.1.1.4, yes, 14]
ArrayList data: [c.example.com, 1.1.1.5, no, 12, Case a bit loose]
ArrayList data: [e.example.com, 1.1.1.6, no, 12.3]
ArrayList data: [f.example.com, 1.1.1.7, No, 12.2]
ArrayList data: [g.example.com, 1.1.1.6, no, 15, Guarded by sharks with lasers on their heads]
To give some more detail on the kind of thing I will eventually need to do: the program needs to print out which routers need updating, e.g. if any of the routers are below version 12, print that those need updating.
You should create a custom class to hold the data. Objects of this class represent a single row from the csv file.
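A minimal sketch of such a class, under a few assumptions: the names RouterSketch, Router, parse and needingUpdate are invented here, the OS version is treated as a plain decimal number (which would misorder versions like 12.10 and 12.9), and "patched" means the field reads "yes" in any letter case. The header row is skipped and the notes column is ignored, as requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RouterSketch {
    // One row of the CSV; the notes column is intentionally dropped.
    static final class Router {
        final String hostname;
        final String ipAddress;
        final boolean patched;
        final double osVersion;

        Router(String hostname, String ipAddress, boolean patched, double osVersion) {
            this.hostname = hostname;
            this.ipAddress = ipAddress;
            this.patched = patched;
            this.osVersion = osVersion;
        }
    }

    // Turns raw CSV lines into Router objects.
    static List<Router> parse(List<String> lines) {
        List<Router> routers = new ArrayList<>();
        // Start at index 1 to skip the header row.
        for (int i = 1; i < lines.size(); i++) {
            String[] f = lines.get(i).trim().split("\\s*,\\s*");
            if (f.length < 4) {
                continue; // skip rows missing a required field
            }
            routers.add(new Router(f[0], f[1],
                    f[2].equalsIgnoreCase("yes"), Double.parseDouble(f[3])));
        }
        return routers;
    }

    // Hostnames of routers whose OS version is below minVersion.
    static List<String> needingUpdate(List<Router> routers, double minVersion) {
        List<String> names = new ArrayList<>();
        for (Router r : routers) {
            if (r.osVersion < minVersion) {
                names.add(r.hostname);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hostname,IP Address,Patched?,OS Version,Notes",
                "A.example.COM,1.1.1.1,NO,11,Faulty fans",
                "C.EXAMPLE.COM,1.1.1.3,no,12.1",
                "d.example.com,1.1.1.4,yes,14");
        System.out.println("Need updating: " + needingUpdate(parse(lines), 12.0));
    }
}
```

In the original program, each line read by the BufferedReader would be added to a list and passed to parse once the file has been read.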

Java variable not being affected

This may sound like a question that has been asked many times before, but I've spent a day researching it, and the other people I found had different causes for this issue.
I have a function that reads part of a save file, and it has been shown that it receives the correct data. The error is that the integer field completely ignores the new value and shows no change in the live debugger, so unlike many other posts, it is not just a duplicate-object error. I can't seem to pinpoint the main issue here, and it's the last major thing holding me back. Any help would be great, and I'm very sorry if I managed to miss a topic about this on the internet.
Code that fails:
@Override
public void read(List<String> data) {
//world positions are not being changed at all
System.out.println(data.get(1));
int test = Integer.valueOf(data.get(1).replaceAll("[^\\d.]", ""));
worldXPos = Integer.valueOf(data.get(0).replaceAll("[^\\d.]", ""));
worldZPos = test;
}
Another class that gives the data:
public void readSaveFunctions(){
if(!gameSaves.exists()){
gameSaves.mkdir();
}
String currentLine;
try {
List<String> data = new ArrayList<String>();
FileReader read = new FileReader(currentFile);
BufferedReader reader = new BufferedReader(read);
String key = "";
while((currentLine = reader.readLine()) != null){
if(currentLine.contains("#")){
key = currentLine;
data = new ArrayList<String>();
}else if(currentLine.contains("*end")){
for(int i = 0; i < saves.length; i++){
String tryKey = "#" + saves[i].IDName();
if(tryKey.equals(key)){
key = "";
saves[i].read(data);
}
}
}else data.add(currentLine);
}
reader.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Another way of explaining it is this:
The debugger is set to step-by-step mode, so I see each line being executed at human speed. Then I get to a line like this one (all of the lines that set the variables behave the same way):
worldXPos = Integer.valueOf(data.get(0).replaceAll("[^\\d.]", ""));
and the debugger shows the two integers having different values, but the instance variable stays exactly the same, with no change in the debugger after the line executes.
Update:
I forgot to mention that the method has an @Override annotation, and it seems this @Override may be causing the issue. Now I may finally have a path to follow again.
So I found my answer: the AWT thread was calling a method from another class that changed the integer before it could be read. It really threw me off at first, because the debugger showed only one of the threads, with no way to know the other one was changing the value too early. Thanks for all the help :P.

Java Strings not printing above length 4094

So I've got a program that generates large binary sequences, and if the string length goes above 4094 it doesn't print. Here's a code snippet that highlights the problem:
private static void ALStringTest() {
String al = "1";
for (int i = 0; i < 5000; i++) {
al += "1";
System.out.println(al.length());
System.out.println(al);
System.out.println(al.isEmpty());
}
}
What's interesting is the length continues to increase, and the boolean value stays false, but I'm unable to see the strings of length 4095 and above.
It's also not a printing error, as I've attempted to write the strings to XML and they don't appear there either; all I get is a run of spaces equal to the string's length.
Edit:
I've tried printing to a file using this snippet and I have the same problem:
private static void ALStringTest() throws IOException {
File fout = new File("out.txt");
FileOutputStream fos = new FileOutputStream(fout);
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos));
String al = "1";
for (int i = 0; i < 5000; i++) {
al += "1";
bw.write(al);
bw.newLine();
System.out.println(al.length());
System.out.println(al);
System.out.println(al.isEmpty());
}
bw.close();
}
However, people have confirmed this works on other machines (thanks), as well as on my own using javac, so I'm led to believe this may be Eclipse-specific.
Does anyone know why Eclipse might be doing this?
The boolean will stay false: the hash code of al is some value while that of "" is 0, and == compares references rather than contents.
So it turns out it was an IDE issue:
Simply copying it to a text editor revealed the strings. I'll update when I find the offending option.
A possible reason is a defect in the console when output is printed very rapidly, so it may affect only console output. When I tested this, the results varied: sometimes only false was printed, more than six times in a row, and sometimes only the length was printed, which should not happen. Everything works fine, however, when the loop runs in a thread that sleeps for even 1 millisecond. The output is then as expected with this code:
class ThreadTest extends Thread {
public ThreadTest() {
super();
}
public void run() {
String al = "1";
for (int i = 0; i < 5000; i++) {
try {
sleep(1);
al += "1";
System.out.println(al.length());
System.out.println(al);
System.out.println(al.equals(""));
} catch (InterruptedException e) {
}
}
}
}
Call this in the main method:
new ThreadTest().start();

Why is website crawling taking forever?

public class Parser {
public static void main(String[] args) {
Parser p = new Parser();
p.matchString();
}
parserObject courseObject = new parserObject();
ArrayList<parserObject> courseObjects = new ArrayList<parserObject>();
ArrayList<String> courseNames = new ArrayList<String>();
String theWebPage = " ";
{
try {
URL theUrl = new URL("http://ocw.mit.edu/courses/");
BufferedReader reader =
new BufferedReader(new InputStreamReader(theUrl.openStream()));
String str = null;
while((str = reader.readLine()) != null) {
theWebPage = theWebPage + " " + str;
}
reader.close();
} catch (MalformedURLException e) {
// do nothing
} catch (IOException e) {
// do nothing
}
}
public void matchString() {
// this is my regex that I am using to compare strings on input page
String matchRegex = "#\\w+(-\\w+)+";
Pattern p = Pattern.compile(matchRegex);
Matcher m = p.matcher(theWebPage);
int i = 0;
while (!m.hitEnd()) {
try {
System.out.println(m.group());
courseNames.add(i, m.group());
i++;
} catch (IllegalStateException e) {
// do nothing
}
}
}
}
What I am trying to achieve with the above code is to get the list of departments on the MIT OpenCourseWare website. I am using a regular expression that matches the pattern of the department names as they appear in the page source, and I am using a Pattern object and a Matcher object to try to find() and print the department names that match. But the code is taking forever to run, and I don't think reading in a web page using BufferedReader takes that long. So I think I am either doing something horribly wrong, or parsing websites takes a ridiculously long time. I would appreciate any input on how to improve performance or correct a mistake in my code, if any. I apologize for the badly written code.
The problem is with the code
while ((str = reader.readLine()) != null)
theWebPage = theWebPage + " " +str;
The variable theWebPage is a String, which is immutable. For each line read, this code creates a new String with a copy of everything that's been read so far, with a space and the just-read line appended. This is an extraordinary amount of unnecessary copying, which is why the program is running so slow.
I downloaded the web page in question. It has 55,000 lines and is about 3.25MB in size. Not too big. But because of the copying in the loop, the first line ends up being copied about 1.5 billion times (1/2 of 55,000 squared). The program is spending all its time copying and garbage collecting. I ran this on my laptop (2.66GHz Core2Duo, 1GB heap) and it took 15 minutes to run when reading from a local file (no network latency or web crawling countermeasures).
To fix this, make theWebPage into a StringBuilder instead, and change the line in the loop to be
theWebPage.append(" ").append(str);
You can convert theWebPage to a String using toString() after the loop if you wish. When I ran the modified version, it took a fraction of a second.
BTW your code is using a bare code block within { } inside a class. This is an instance initializer (as opposed to a static initializer). It gets run at object construction time. This is legal, but it's quite unusual. Notice that it misled other commenters. I'd suggest converting this code block into a named method.
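Putting both suggestions together, here is a rough sketch with the page-reading code moved into a named method that appends to a StringBuilder. The class and method names are made up, and a StringReader with invented sample text stands in for the URL stream. The matching loop advances with m.find(), which is an assumption about the intended behaviour: group() on its own never moves to the next match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageSketch {
    // Named method replacing the instance initializer; each append is cheap.
    static String readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(' ').append(line);
            }
        }
        return page.toString();
    }

    // Collects every match; find() moves the matcher to the next one.
    static List<String> findMatches(String page, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(page);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample text, not the real page source.
        String sample = "<a href=\"#aeronautics-and-astronautics\">\n<a href=\"#anthropology\">\n";
        String page = readAll(new StringReader(sample));
        System.out.println(findMatches(page, "#\\w+(-\\w+)+"));
    }
}
```

With the real site, readAll would be given new InputStreamReader(theUrl.openStream()) instead of the StringReader.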
Is this your whole program? Where is the declaration of parserObject?
Also, shouldn't all of this code be in your main() prior to calling matchString()?
parserObject courseObject = new parserObject();
ArrayList<parserObject> courseObjects = new ArrayList<parserObject>();
ArrayList<String> courseNames = new ArrayList<String>();
String theWebPage=" ";
{
try {
URL theUrl = new URL("http://ocw.mit.edu/courses/");
BufferedReader reader = new BufferedReader(new InputStreamReader(theUrl.openStream()));
String str = null;
while((str = reader.readLine())!=null)
{
theWebPage = theWebPage+" "+str;
}
reader.close();
} catch (MalformedURLException e) {
} catch (IOException e) {
}
}
You are also catching exceptions and not displaying any error messages. You should always display an error message and do something when you encounter an exception. For example, if you can't download the page, there is no reason to try to parse an empty string.
From your comment I learned about static blocks in classes (thank you, I didn't know about them). However, from what I've read, you need to put the keyword static before the start of the block {. Also, it might be better just to put the code into your main; that way you can exit if you get a MalformedURLException or IOException.
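For reference, the static keyword is only needed for a static initializer; a bare { } block in a class body is an instance initializer and is legal without it. A tiny sketch of the difference, with a made-up class name and counters:

```java
class InitializerSketch {
    static int staticRuns = 0;
    static int instanceRuns = 0;

    // Static initializer: runs once, when the class is initialized.
    static {
        staticRuns++;
    }

    // Instance initializer: no keyword; runs each time an object is created.
    {
        instanceRuns++;
    }

    public static void main(String[] args) {
        new InitializerSketch();
        new InitializerSketch();
        System.out.println(staticRuns + " " + instanceRuns); // prints "1 2"
    }
}
```

In the Parser class from the question, this means the download runs whenever new Parser() is executed, before matchString() is called.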
You can, of course, solve this assignment with the limited JDK 1.0 API, and run into the issue that Stuart Marks helped you solve in his excellent answer.
Or, you just use a popular de-facto standard library, like for instance, Apache Commons IO, and read your website into a String using a no-brainer like this:
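// using this...
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
// run this...
try (InputStream is = new URL("http://ocw.mit.edu/courses/").openStream()) {
// UTF-8 is an assumption about the page's encoding
theWebPage = IOUtils.toString(is, StandardCharsets.UTF_8);
}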
