Batch reading error (Train Scheduling) on Java NetBeans - java

I am currently enrolled in a Real-Time Systems class and I need to program an adaptation of a workload-scheduler module for a hybrid big data (stream) system.
I am starting from scratch: using Java on NetBeans, generating a synthetic dataset, and reading that dataset with a thread.
So what's going on: my thread is supposed to take different file readings and, depending on the size of each file, send it to a different ArrayList in my execution. It works fine for small files (<10KB), but I also need to handle I/O for batch files of 64MB+ and for medium files of variable size between 1KB and 64MB. Each of these files will be treated accordingly.
Problem: my (single) thread reads all the small .txt files, but when it gets to the larger files, NetBeans just keeps running and doesn't actually do anything.
Could it be a memory problem?
Does anyone know how to handle large .txt files in Java on NetBeans?
P.S.: Memory is not an issue on my computer.

In Java you can read an entire file into memory with:
java.nio.file.Files.readAllLines
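For example (a minimal sketch; the file name is made up), this pulls everything onto the heap at once:

List<String> lines = java.nio.file.Files.readAllLines(
        java.nio.file.Paths.get("data.txt"),
        java.nio.charset.StandardCharsets.UTF_8);
// fine for small files, but a 64MB+ file now lives entirely on the heap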
Or you can stream it with java.util.Scanner, which reads the file line by line, or token by token against a pattern:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
So, to read large text files, use the streaming Scanner approach rather than reading the whole file into memory.
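If you are on Java 7 or later, a more compact variant of the same snippet (my own sketch, not part of the original answer) uses try-with-resources, which closes the Scanner and its underlying stream automatically:

try (Scanner sc = new Scanner(new FileInputStream(path), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line here
    }
    // Scanner still suppresses exceptions, so check explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}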

I would go with a buffered file reader like this:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public static void main(String[] args) {
    File f = new File("data.txt");
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader b = new BufferedReader(new FileReader(f))) {
        String readLine;
        while ((readLine = b.readLine()) != null) {
            if (readLine.contains("WORD")) {
                System.out.println("Found WORD in: " + readLine);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
where "WORD" is the word you are searching for.
The advantage of a BufferedReader is that it reads ahead to reduce the number of I/O roundtrips - or as they put it in the JavaDoc: "Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines."
FileChannel is a slightly newer invention, arriving with NIO in Java 1.4. It might perform better than BufferedReader, but I also find it a lot more low-level in its API, so unless you have very special performance requirements, I would leave the read-ahead/buffering to BufferedReader and FileReader.
You can also say that BufferedReader is "line oriented" whereas FileChannel is "byte oriented".
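For comparison, a minimal byte-oriented read with FileChannel might look like this (a sketch of my own; the buffer size and file name are arbitrary):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel channel = FileChannel.open(Paths.get("data.txt"), StandardOpenOption.READ)) {
    ByteBuffer buffer = ByteBuffer.allocate(8192);
    while (channel.read(buffer) != -1) {
        buffer.flip();   // switch the buffer from writing to reading mode
        // consume buffer.remaining() bytes here
        buffer.clear();  // reuse the buffer for the next read
    }
}

Note that you have to carve lines out of the raw bytes yourself, which is exactly the work BufferedReader does for you.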
I like the BufferedReader from java.io with a FileReader most:
https://docs.oracle.com/javase/7/docs/api/java/io/FileReader.html
https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
https://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
It is easy to use and covers most needs. But your file must be character-based (like a text file) to use it.

Regarding stitching of multiple files into a single file

I work on query latencies and have a requirement where I have several files which contain data. I want to aggregate this data into a single file. I use a naive technique where I open each file and collect all the data into a global file. I do this for all the files, but it is time-consuming. Is there a way to stitch the end of one file to the beginning of another and create one big file containing all the data? I think many people might have faced this problem before. Can anyone kindly help?
I suppose you are currently doing the opening and appending by hand; otherwise I do not know why it would take a long time to aggregate the data, especially since you describe the number of files using "multiple" and "several", which seems to indicate it is not an enormous number.
Thus, I think you are just looking for a way to automate the opening and appending. In that case, you can use an approach similar to the one below. Note that this creates the output file, or overwrites it if it already exists, and then appends the contents of all specified files. If you want to call the method multiple times and append to the same file instead of overwriting it, use a FileWriter instead, with true as the second argument to its constructor, so it appends to the existing file (see the sketch after the example).
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

void aggregateFiles(List<String> fileNames, String outputFile) {
    PrintWriter writer = null;
    try {
        writer = new PrintWriter(outputFile);
        for (String fileName : fileNames) {
            Path path = Paths.get(fileName);
            // read each input file fully, then write it to the output
            String fileContents = new String(Files.readAllBytes(path));
            writer.println(fileContents);
        }
    } catch (IOException e) {
        // Handle IOException
    } finally {
        if (writer != null) writer.close();
    }
}
List<String> files = new ArrayList<>();
files.add("f1.txt");
files.add("someDir/f2.txt");
files.add("f3.txt");
aggregateFiles(files, "output.txt");
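As mentioned above, if you want to append instead of overwrite, the only change (a sketch, reusing the writer and outputFile names from the method) is to open the FileWriter in append mode:

import java.io.FileWriter;
import java.io.PrintWriter;

// The second constructor argument (true) makes FileWriter append
// to an existing file instead of truncating it.
writer = new PrintWriter(new FileWriter(outputFile, true));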

Assets + skipBytes performance

DataInputStream istream = new DataInputStream(inputstream);
istream.skipBytes(offset);
int value = istream.readInt();
When the input stream comes from getAssets().open("abc") and the file is big (up to 20MB), such a simple seek+read takes a long time to complete (~250ms on my Android phone).
When I first copy the file to getCacheDir().getPath(), the same code takes less than 1ms (if you ignore the ~3s startup time for copying). Plus the app uses more space: once for the copy and once for the asset.
I only read about 6 values (readInt()) at different locations.
Now to my question: is it possible to improve the performance of skipBytes() on assets? If yes, how? If not, is there an alternative to copying the file?
Finally I found a working alternative to copying assets.
Example Code
try {
    AssetFileDescriptor fd_description = getAssets().openFd("test.raw");
    String apk_path = getPackageResourcePath(); // maybe getPackageCodePath() ?
    RandomAccessFile file = new RandomAccessFile(apk_path, "r");
    file.seek(fd_description.getStartOffset());
    String value = file.readLine();
    Log.e("RAW ACCESS", "READ:" + value);
    file.seek(fd_description.getStartOffset());
    value = file.readLine();
    Log.e("RAW ACCESS", "READ:" + value);
} catch (IOException exp) {
    Log.e("RAW ACCESS", "ERROR:" + exp.toString());
}
Some Info
getPackageResourcePath() returns the path to your APK.
getAssets().openFd("test.raw") returns the asset information you need to locate the data inside the APK.
Your asset starts at fd_description.getStartOffset() and ends at fd_description.getStartOffset() + fd_description.getLength().
With if (fd_description.getLength() != fd_description.getDeclaredLength()) you can check whether the asset is compressed.
Important
Of course this only works if the asset isn't compressed! But it's not that hard to disable the compression.
Use a RandomAccessFile if you possibly can. DataInputStream.skipBytes() only works by reading through the file.
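A minimal sketch of the difference (reusing the offset variable from the question; the file path is made up, since an asset inside the APK needs the openFd() approach shown above):

try (RandomAccessFile raf = new RandomAccessFile("/sdcard/cache/abc", "r")) {
    raf.seek(offset);          // repositions directly, no bytes are read
    int value = raf.readInt(); // reads the 4-byte int at that offset
}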

How do other applications handle large text files without a large memory footprint?

I need to know how applications like BareTail or BareGrep can handle such large text files without a huge memory footprint.
I am trying to do something similar in Java as in question:
Viewing large log files in JavaFX in a ListView
But when I handle large text log files (900MB up to 2.5GB of text) I run into issues. The JVM memory size increases dramatically when I read the text files.
Another way would be to retrieve only the lines that I am interested in, but I am not aware of any technology to do this in Java. I have to start reading line by line until I get to the required line (let's say line 1000) and then grab hold of that text. But in doing so I have 999 lines in memory waiting to be GC'd.
BareGrep, for instance, scans multiple files in a folder looking for a pattern, and if I open the task manager I can hardly see the memory footprint growing. What type of technology or way of scanning are these programs using?
Is there a technology out there that I can use in my application to handle large text files?
I might add that my log files are generated by a Java application and the length of each line varies.
One correction... by memory footprint I mean I cannot read a 6GB file into memory. Even if I specify a small VM size with -Xmx, the application runs out of memory when reading the 6GB file.
Added below are 2 ways I tried to get the text from a 758MB log file.
Method 1
@FXML
private void handleButtonAction(ActionEvent event) {
    final String fileName = "D:/Development/Logs/File1.log";
    try {
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        while (in.ready()) {
            String s = in.readLine();
        }
        in.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Method 2
@FXML
private void handleButtonAction(ActionEvent event) {
    final String fileName = "D:/Development/Logs/File1.log";
    Scanner scan = null;
    try {
        File file = new File(fileName);
        if (!file.exists()) {
            return;
        }
        scan = new Scanner(file);
        long start = System.nanoTime();
        while (scan.hasNextLine()) {
            final String line = scan.nextLine();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (scan != null) { // avoid a NullPointerException if the Scanner was never created
            scan.close();
        }
    }
}
I think "MemoryMappedFile" is what you are looking for.
I found some links to help you:
http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ314_029.htm
http://javarevisited.blogspot.de/2012/01/memorymapped-file-and-io-in-java.html
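A minimal sketch of what that looks like (the file name is made up; note that a single map() region is limited to 2GB, so a 6GB file needs several mapped regions):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

try (RandomAccessFile raf = new RandomAccessFile("File1.log", "r");
     FileChannel channel = raf.getChannel()) {
    // The OS pages the file in on demand, so the JVM heap stays small
    // even for files far larger than -Xmx.
    long size = Math.min(channel.size(), Integer.MAX_VALUE);
    MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
    byte first = map.get(0); // random access without loading the whole file
}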
Both of the applications you mentioned might "handle" large files, but they don't actually need to load entire files into memory. The first one sounds like it seeks directly to the end of the file, while the second operates on a line-by-line basis.
It is possible they are using native code via JNI to achieve the low memory use.
Edit: In fact, they look to be pure C or C++ applications; they don't need to wait for GC like Java applications do.

java output html code to file

I have a chunk of HTML code that should be output as a .html file, in Java. The pre-written HTML code is the header and table for a page, and I need to generate some data and output it to the same .html file. Is there an easier way to print the HTML code than to do println() line by line? Thanks
You can look at some Java libraries for parsing HTML code. A quick Google search turns up a few. Read in the HTML, use their queries to manipulate the DOM as needed, and then spit it back out.
e.g. http://jsoup.org/
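A rough sketch of that approach with jsoup (the file names and the table selector are made up for illustration):

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.parse(new File("template.html"), "UTF-8");
// append a generated row to a hypothetical table with id="data"
doc.select("table#data").first().append("<tr><td>generated value</td></tr>");
java.nio.file.Files.write(java.nio.file.Paths.get("output.html"),
        doc.outerHtml().getBytes("UTF-8"));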
Try using a templating engine, MVEL2 or FreeMarker, for example. Both can be used by standalone applications outside of a web framework. You lose time upfront but it will save you time in the long run.
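With FreeMarker, for example, a minimal standalone setup looks roughly like this (the template name, directory, and model key are made up; match the version constant to the FreeMarker release you use):

import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import freemarker.template.Configuration;
import freemarker.template.Template;

Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
cfg.setDirectoryForTemplateLoading(new File("templates"));

Map<String, Object> model = new HashMap<>();
model.put("title", "Hi!"); // values the template refers to, e.g. ${title}

Template template = cfg.getTemplate("page.ftl");
try (Writer out = new FileWriter("mypage.html")) {
    template.process(model, out); // fills in the template and writes the result
}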
JSP (Java Server Pages) allows you to write HTML files which have some Java code easily embedded within them. For example
<html><head><title>Hi!</title></head><body>
<% some java code here that outputs some stuff %>
</body></html>
Though that requires a Java application server to be installed. But if this is running on a web server anyway, that might not be unreasonable to have.
If you want to do it in normal Java, that depends. I don't fully understand which part you meant you will be outputting line by line. Did you mean you are going to do something like
System.out.println("<html>");
System.out.println("<head><title>Hi!</title></head>");
System.out.println("<body>");
// etc
Like that? If that's what you meant, then don't do that. You can just read in the data from the template file and output all the data at once. You could read it into a multiline text string, or you could read the data in from the template and output it directly to the new file. Something like:
while ((strInput = templateFileReader.readLine()) != null) {
    newFileOutput.println(strInput);
}
Again, I'm not sure exactly what you mean by that part.
HTML is simply a way of marking up text, so to write an HTML file you are simply writing the HTML as text to a file with the .html extension.
There are plenty of tutorials out there for reading and writing from files, as well as getting a list of files from a directory. (Google 'java read file', 'java write file', 'java list directory' - that is basically everything you need.) The important thing is the use of BufferedReader/BufferedWriter for pulling and pushing the text into the files and realising that there is no particular code science involved in writing HTML to a file.
I'll reiterate: HTML is nothing more than <b>text with tags</b>.
Here's a really crude example that reads two files and writes them to a single output file, wrapped in an <html></html> tag.
import java.io.*;

static BufferedReader getReaderForFile(String filename) throws IOException {
    FileInputStream in = new FileInputStream(filename);
    return new BufferedReader(new InputStreamReader(in));
}

public static void main(String[] args) throws IOException {
    // Open the input files
    BufferedReader myheader = getReaderForFile("myheader.txt");
    BufferedReader contents = getReaderForFile("contentfile.txt");

    FileWriter fstream = new FileWriter("mypage.html");
    BufferedWriter out = new BufferedWriter(fstream);

    out.write("<html>");
    out.newLine();

    for (String line = myheader.readLine(); line != null; line = myheader.readLine()) {
        out.write(line);
        out.newLine(); // readLine() strips 'carriage return' characters
    }

    for (String line = contents.readLine(); line != null; line = contents.readLine()) {
        out.write(line);
        out.newLine(); // readLine() strips 'carriage return' characters
    }

    out.write("</html>");
    out.close(); // flush the buffer; without this the file may end up truncated
    myheader.close();
    contents.close();
}
To build a simple HTML text file, you don't have to read your input file line by line.
File theFile = new File("file.html");
byte[] content = new byte[(int) theFile.length()];
You can use "RandomAccessFile.readFully" to read files entirely as a byte array:
// Read file function:
RandomAccessFile file = null;
try {
    file = new RandomAccessFile(theFile, "r");
    file.readFully(content);
} finally {
    if (file != null) {
        file.close();
    }
}
Make your modifications on the text content:
String text = new String(content);
text = text.replace("<!-- placeholder -->", "generated data");
content = text.getBytes();
Writing is also easy:
// Write file content:
RandomAccessFile file = null;
try {
file = new RandomAccessFile(theFile, "rw");
file.write(content);
} finally {
if(file != null) {
file.close();
}
}
