Currently I have a massive log file in my application that I need to post to an endpoint. I periodically run a method that reads the entire file into a list, performs some formatting so that the endpoint will accept it, builds a string with a StringBuilder, returns that string, and then posts it to my endpoint. One thing I should mention: I batch the data out in chunks of X characters. I am seeing some memory issues in my application and am trying to deal with this.
So this is how I am partitioning out the data to a temporary list:

if (logFile.exists()) {
    try (BufferedReader br = new BufferedReader(new FileReader(logFile.getPath()))) {
        String line;
        while ((line = br.readLine()) != null) {
            if (isJSONValid(line)) {
                temp.add(line);
                tempCharCount += line.length();
            }
            if (tempCharCount >= LOG_PARTITION_CHAR_COUNT) {
                // Formatting for the backend
                String tempString = postFormat(temp);
                // Send
                sendLogs(tempString);
                // Refresh
                temp = new ArrayList<>();
                tempCharCount = 0;
            }
        }
        // Send "dangling" data
        // Formatting for the backend
        String tempString = postFormat(temp);
        // Send
        sendLogs(tempString);
    } catch (FileNotFoundException e) {
        Timber.e(new Exception(e));
    } catch (IOException e) {
        Timber.e(new Exception(e));
    }
}
So when we reach our character-count partition limit, you can see that we run
String tempString = postFormat(temp);
This is where we make sure our data is formatted into a string of JSON that the endpoint will accept.
private String postFormat(ArrayList<String> list) {
    list.add(0, LOG_ARRAY_START);
    list.add(LOG_ARRAY_END);
    StringBuilder sb = new StringBuilder();
    for (int stringCount = 0; stringCount < list.size(); stringCount++) {
        sb.append(list.get(stringCount));
        // Add comma separators between elements, but never after the array
        // start marker, the final element, or the end marker, to match the
        // expected backend input
        if (stringCount > 0 && stringCount < list.size() - 2) {
            sb.append(",");
        }
    }
    return sb.toString();
}
As you might imagine, if you have a large log file and these requests are going out async, we end up using a lot of memory. Once our StringBuilder is done, we return a string that will eventually be gzip-compressed and posted to an endpoint.
I am looking for ways to decrease the memory usage of this. I profiled it a bit on the side and could see how obviously inefficient it is, but am not sure how to do this better. Any ideas are appreciated.
I have one suggestion for you.
Formatted output in a temp file - You can write the formatted output to a temp file. Once the transformation is complete, you can read the temp file and send it to the endpoint. If ordering is not a concern, you can even use multiple threads to append to the same file.
With this approach you are not storing any data in memory during the transformation, which will save a lot of memory.
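For illustration, here is a minimal sketch of that idea, reusing the isJSONValid(...) helper and LOG_ARRAY_START / LOG_ARRAY_END markers from the question (those names are assumptions carried over from the code above); the per-batch character partitioning is omitted for brevity:

// Sketch: stream the formatted JSON array into a temp file instead of a StringBuilder.
// Assumes the question's isJSONValid(...) and LOG_ARRAY_START / LOG_ARRAY_END exist.
private void formatToTempFile(File logFile) throws IOException {
    File tempOut = File.createTempFile("logbatch", ".json");
    try (BufferedReader br = new BufferedReader(new FileReader(logFile));
         BufferedWriter bw = new BufferedWriter(new FileWriter(tempOut))) {
        bw.write(LOG_ARRAY_START);
        boolean first = true;
        String line;
        while ((line = br.readLine()) != null) {
            if (!isJSONValid(line)) continue;
            if (!first) bw.write(",");
            bw.write(line);
            first = false;
        }
        bw.write(LOG_ARRAY_END);
    }
    // Now stream tempOut to the endpoint (gzipping on the fly) instead of
    // holding the whole payload in memory as one big string.
}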
I have a method that stores an int in a .dat file (among other things). Later I try to retrieve it with a different method and it gives an absurd value. For example, if I try to store a 1, the other method retrieves 484449. I'm new to Java, so if this is somehow normal please explain.
Method that writes int:
public static int fromText(String textRefference, String binaryRefference,
        boolean overwrite, String countRefference) {
    if (!(new File(binaryRefference).exists())) overwrite = true;
    BufferedReader input;
    ObjectOutputStream output;
    ObjectInputStream binaryInput;
    ObjectInputStream countStreamI;
    ObjectOutputStream countStreamO;
    int count = 0;
    try {
        input = new BufferedReader(new FileReader(textRefference));
        String[] data = null;
        int oldCount = 0;
        if (!overwrite) {
            countStreamI = new ObjectInputStream(new FileInputStream(countRefference));
            binaryInput = new ObjectInputStream(new FileInputStream(binaryRefference));
            oldCount = countStreamI.readInt();
            data = new String[oldCount];
            int i;
            for (i = 0; i < oldCount; i++) {
                data[i] = binaryInput.readUTF();
            }
            countStreamI.close();
        }
        countStreamO = new ObjectOutputStream(new FileOutputStream(countRefference));
        output = new ObjectOutputStream(new FileOutputStream(binaryRefference));
        String sentinel = input.readLine();
        String[] data2 = new String[1500];
        while (!sentinel.equalsIgnoreCase("end")) {
            System.out.println(sentinel + " has been recorded");
            data2[count] = sentinel;
            sentinel = input.readLine();
            count++;
        }
        count += oldCount;
        countStreamO.writeInt(count);
        if (!overwrite) {
            int i;
            for (i = 0; i < oldCount; i++) {
                output.writeUTF(data[i]);
            }
        }
        int i = 0;
        for (; i < count + oldCount; i++) {
            output.writeUTF(data2[i]);
        }
        output.flush();
        countStreamO.flush();
        countStreamO.close();
        output.close();
        input.close();
    } catch (Exception e) {
        Scanner in = new Scanner(System.in);
        e.printStackTrace();
        in.nextLine();
        System.exit(0);
    }
    return count;
}
And the function retrieving it:
public static String[] pullStrings(String file, String countReferrence, boolean print) {
    String[] data = null;
    try {
        ObjectInputStream input = new ObjectInputStream(new FileInputStream(file));
        int count = input.readInt();
        data = new String[count];
        int i = 0;
        String string;
        for (; i < count; i++) {
            string = input.readUTF();
            if (print) System.out.println(string);
            data[i] = string;
        }
    } catch (Exception e) {
        Scanner in = new Scanner(System.in);
        System.out.println(e.getMessage() + "\n\n");
        e.printStackTrace();
        System.out.println("\n hit ENTER to exit.");
        in.nextLine();
        System.exit(0);
    }
    return data;
}
And the text file:
data!!!
end
This strange number you're getting, 484449, is actually the result of reading four bytes: 00 07 64 61.
Where did those bytes come from? Well, for some reason, you chose to send count to a different file, using countStreamO.writeInt(count);. So when your retrieval code executes input.readInt(), it's expecting to find a count in the same file, but you never wrote it there.
Instead, you sent the count to a different file, then proceeded to write each string to the main data file using output.writeUTF(data[i]) and output.writeUTF(data2[i]).
What does writeUTF actually do? Well, the documentation for ObjectOutputStream.writeUTF doesn't say much about it, except that the method is mandated by the DataOutput interface. The documentation for DataOutput.writeUTF is pretty informative, though:
Writes two bytes of length information to the output stream, followed by the modified UTF-8 representation of every character in the string s.
So you never wrote your count value to the file, but you did send the string "data!!!" to it. And now we know that writeUTF first writes the byte length of that string (after converting it to modified UTF-8) followed by the modified UTF-8 bytes themselves.
In this case, your String consists entirely of ASCII characters, which, when encoded in modified UTF-8 (or real UTF-8, for that matter), take up one byte each, with no encoding required. So the string requires 7 bytes, one for each character.
Meaning, the writeUTF method wrote two bytes for the byte length (00 07) followed by seven bytes for the characters (64 61 74 61 21 21 21).
Which means the first four bytes in the file are 00 07 64 61. You're trying to read them as 32-bit int, so you're getting 0x00076461, or 484449.
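You can reproduce that number with a tiny self-contained sketch. It uses DataOutputStream / DataInputStream, which share the writeUTF / readInt byte format that your object streams carry inside their block data:

import java.io.*;

// Writes "data!!!" the way writeUTF does (2-byte length, then the bytes),
// then misreads the first four bytes as an int, just like the broken code.
public class WriteUtfReadInt {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF("data!!!"); // writes 00 07, then 64 61 74 61 21 21 21
        }
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            System.out.println(in.readInt()); // 0x00076461 = 484449
        }
    }
}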
Your code is far more complicated than it needs to be. That complexity makes it difficult to see small problems like this one. Also, some documentation would have made it clear what your code should be doing. It looks like you realized by the time you got around to writing the retrieval code that you didn't need a separate file for the count, but you never went back and updated the code that writes the data to accommodate the improvement.
I don't know if your data file needs to adhere to an externally specified format, but if it doesn't, you can easily accomplish your task by doing away with counts entirely, and doing away with readUTF and writeUTF. Instead, you can simply serialize a String array:
String[] allData = new String[data.length + data2.length];
System.arraycopy(data, 0, allData, 0, data.length);
System.arraycopy(data2, 0, allData, data.length, data2.length);
try (ObjectOutputStream out = new ObjectOutputStream(
        new BufferedOutputStream(
            new FileOutputStream(binaryReference)))) {
    out.writeObject(allData);
}
The length is part of the array object's state, so it is included in the serialized output.
Reading it is even easier:
String[] data;
try (ObjectInputStream in = new ObjectInputStream(
        new BufferedInputStream(
            new FileInputStream(file)))) {
    // Note: readObject can also throw ClassNotFoundException
    data = (String[]) in.readObject();
}
I have a problem with running tshark from Java. Packets seem to arrive in bulk instead of in real time (as they do when tshark is run from a terminal).
I tried a few different approaches:
ArrayList<String> command = new ArrayList<String>();
command.add("C:\\Program Files\\Wireshark\\tshark.exe");
ProcessBuilder pb = new ProcessBuilder(command);
Process process = pb.start();
BufferedReader br = null;
try {
    // tried different numbers for BufferedReader's last parameter
    br = new BufferedReader(new InputStreamReader(process.getInputStream()), 1);
    String line = null;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
} catch...
I also tried using InputStream's available() method, as seen in What does InputStream.available() do in Java?
I also tried NuProcess library with the following code:
NuProcessBuilder pb = new NuProcessBuilder(command);
ProcessHandler processHandler = new ProcessHandler();
pb.setProcessListener(processHandler);
NuProcess process = pb.start();
try {
    process.waitFor(0, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
}
private class ProcessHandler extends NuAbstractProcessHandler {
    private NuProcess nuProcess;

    @Override
    public void onStart(NuProcess nuProcess) {
        this.nuProcess = nuProcess;
    }

    @Override
    public void onStdout(ByteBuffer buffer) {
        if (buffer == null)
            return;
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        System.out.println(new String(bytes));
    }
}
None of these methods work. Packets always arrive, as if buffered, only after about 50 have been sniffed.
Do you have any idea why this may be happening and how to solve it? It's pretty frustrating. I spent a lot of time looking at similar questions on SO, but none of them helped.
Do you see any errors in my code? Does it work in your case?
As the tshark man page says:

-l  Flush the standard output after the information for each packet is
    printed. (This is not, strictly speaking, line-buffered if -V was
    specified; however, it is the same as line-buffered if -V wasn't
    specified, as only one line is printed for each packet, and, as -l
    is normally used when piping a live capture to a program or script,
    so that output for a packet shows up as soon as the packet is seen
    and dissected, it should work just as well as true line-buffering.
    We do this as a workaround for a deficiency in the Microsoft Visual
    C++ C library.)

    This may be useful when piping the output of TShark to another
    program, as it means that the program to which the output is piped
    will see the dissected data for a packet as soon as TShark sees the
    packet and generates that output, rather than seeing it only when
    the standard output buffer containing that data fills up.
Try running tshark with the -l command-line argument.
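In the ProcessBuilder code from the question, that just means adding the flag to the command list, for example:

ArrayList<String> command = new ArrayList<String>();
command.add("C:\\Program Files\\Wireshark\\tshark.exe");
command.add("-l"); // flush stdout after each packet so readLine() sees it immediately
ProcessBuilder pb = new ProcessBuilder(command);
pb.redirectErrorStream(true); // optional: merge stderr so it cannot fill up and block
Process process = pb.start();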
I ran some tests to see how much buffering would be done by BufferedReader versus just using the input stream.
ProcessBuilder pb = new ProcessBuilder("ls", "-lR", "/");
System.out.println("pb.command() = " + pb.command());
Process p = pb.start();
byte ba[] = new byte[100];
InputStream is = p.getInputStream();
int bytecountsraw[] = new int[10000];
long timesraw[] = new long[10000];
long last_time = System.nanoTime();
for (int i = 0; i < timesraw.length; i++) {
    int bytecount = is.read(ba);
    long time = System.nanoTime();
    timesraw[i] = time - last_time;
    last_time = time;
    bytecountsraw[i] = bytecount;
}
try (PrintWriter pw = new PrintWriter(new FileWriter("dataraw.csv"))) {
    pw.println("bytecount,time");
    for (int i = 0; i < timesraw.length; i++) {
        pw.println(bytecountsraw[i] + "," + timesraw[i] * 1.0E-9);
    }
} catch (Exception e) {
    e.printStackTrace();
}
BufferedReader br = new BufferedReader(new InputStreamReader(is));
int bytecountsbuffered[] = new int[10000];
long timesbuffered[] = new long[10000];
last_time = System.nanoTime();
for (int i = 0; i < timesbuffered.length; i++) {
    String str = br.readLine();
    int bytecount = str.length();
    long time = System.nanoTime();
    timesbuffered[i] = time - last_time;
    last_time = time;
    bytecountsbuffered[i] = bytecount;
}
try (PrintWriter pw = new PrintWriter(new FileWriter("databuffered.csv"))) {
    pw.println("bytecount,time");
    for (int i = 0; i < timesbuffered.length; i++) {
        pw.println(bytecountsbuffered[i] + "," + timesbuffered[i] * 1.0E-9);
    }
} catch (Exception e) {
    e.printStackTrace();
}
I tried to find a command that would just keep printing as fast as it could, so that any delays would be due to the buffering and/or ProcessBuilder rather than the command itself. Here is a plot of the result.
You can plot the CSV files with Excel, although I used a NetBeans plugin called DebugPlot. There wasn't a great deal of difference between the raw and the buffered reads. Both were bursty, with the majority of reads taking less than a microsecond, separated by peaks of 10 to 50 milliseconds. The scale of the plot is in nanoseconds, so the top of 5E7 is 50 milliseconds, or 0.05 seconds. If you test and get similar results, perhaps that is the best ProcessBuilder can do. If you get dramatically worse results with tshark than with other commands, perhaps there is an option to tshark, or the packets themselves are coming in bursts.
I am currently writing a webserver, and I am working on the HTTP PUT method. The client connects and types something like this:
PUT /newfile.txt HTTP/1.1
Host: myhost
BODY This text is what has to be written to the new created file
I want to be able to write the BODY of the client request into the newly created file.
This is what I have so far; it works, but after I press enter it just stays there.
InputStream is = conn.getInputStream();
OutputStream fos = Files.newOutputStream(file);
int count = 0;
int n = 10;
while (count < n) {
    int b = is.read();
    if (b == -1) break;
    fos.write(b);
    ++count;
}
fos.close();
conn.close();
You may try this:
Scanner sc = new Scanner(is);
Scanner will let you easily read the stream into a String. You can separate out the BODY using a regex; you just have to provide it as an argument to Scanner's next method.
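For example, here is a rough sketch of that idea, treating the literal BODY token from the question's request as the delimiter (an illustrative assumption, not a general HTTP parser):

Scanner sc = new Scanner(is);
sc.useDelimiter("BODY"); // split the raw request on the BODY marker
String head = sc.next(); // request line and headers
// Note: on a socket, next() blocks until the delimiter (or end of stream)
// arrives, so the client must close its side for the final token to appear.
String bodyContentString = sc.hasNext() ? sc.next().trim() : "";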
After separating, you'll need to write that String to the file. Just use
FileWriter writer = new FileWriter(file);
writer.write(bodyContentString);
writer.flush();
writer.close();
Good luck.
I want to read the last n lines of a very big file without reading the whole file into any buffer/memory area using Java.
I looked around the JDK APIs and Apache Commons I/O and am not able to locate one which is suitable for this purpose.
I was thinking of the way tail or less does it in UNIX. I don't think they load the entire file and then show the last few lines of it. There should be a similar way to do the same in Java too.
I found the simplest way to do this is to use ReversedLinesFileReader from the Apache commons-io API.
This method gives you the lines of the file from bottom to top, and you can specify the n_lines value to control the number of lines read.
import org.apache.commons.io.input.ReversedLinesFileReader;

File file = new File("D:\\file_name.xml");
int n_lines = 10;
int counter = 0;
try (ReversedLinesFileReader object = new ReversedLinesFileReader(file)) {
    String line;
    // readLine() returns null once the top of the file is reached
    while (counter < n_lines && (line = object.readLine()) != null) {
        System.out.println(line);
        counter++;
    }
}
If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.
If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.
An initial best-guess assumption can be made based on your data properties. For example, if it's a text file, it's possible the line lengths won't exceed an average of 132, so, to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust that - for example, if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case).
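Here is a rough sketch of that strategy, using simple doubling instead of the smarter re-estimate described above. Note that RandomAccessFile.readLine assumes one byte per character, a caveat the next answer covers in detail:

import java.io.*;
import java.util.*;

// Sketch: guess a byte offset near the end, read forward, and back up
// (doubling the guess) until at least n lines have been collected.
static List<String> tailLines(File f, int n) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        long guess = 132L * n; // assumed average line length of 132
        while (true) {
            long start = Math.max(0, raf.length() - guess);
            raf.seek(start);
            if (start > 0) {
                raf.readLine(); // discard the probably-partial first line
            }
            List<String> lines = new ArrayList<>();
            for (String s; (s = raf.readLine()) != null; ) {
                lines.add(s);
            }
            if (lines.size() >= n || start == 0) {
                return lines.subList(Math.max(0, lines.size() - n), lines.size());
            }
            guess *= 2; // guessed too short: back up further and retry
        }
    }
}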
RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.
If your file is not encoded with a one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string preceded by a byte count ...)
Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed length encodings (e.g. flavors of UTF-16 or UTF-32) you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.
In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a second / third / fourth byte, or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. This means, as the comment discussion points out, that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream will represent a LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8) if we can assume that the other kinds of Unicode line separator (0x2028, 0x2029 and 0x0085) are not used. If you can't assume that, then the code would be more complicated.
Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count end-of-lines.
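As a tiny illustration of that boundary test for UTF-8: continuation bytes all match the 10xxxxxx pattern, so a byte can start a character exactly when it does not match that pattern:

// Sketch: true when b can be the first byte of a UTF-8 encoded character,
// i.e. it is not a 10xxxxxx continuation byte.
static boolean isUtf8CharStart(byte b) {
    return (b & 0xC0) != 0x80;
}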
The ReversedLinesFileReader class can be found in the Apache Commons IO Java library.
int n_lines = 1000;
ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path));
String result = "";
for (int i = 0; i < n_lines; i++) {
    String line = object.readLine();
    if (line == null)
        break;
    result += line;
}
return result;
I found RandomAccessFile and the other buffered reader classes too slow for me. Nothing can be faster than tail -<#lines>, so this was the best solution for me.
public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    try {
        Process p = Runtime.getRuntime().exec("tail -" + nLines + " " + file);
        java.io.BufferedReader input = new java.io.BufferedReader(new java.io.InputStreamReader(p.getInputStream()));
        String line = null;
        // Read the next line into the variable line, then check for
        // the EOF condition, which is a return value of null
        while ((line = input.readLine()) != null) {
            s.append(line + '\n');
        }
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
    return s.toString();
}
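One caveat: Runtime.exec(String) splits the command on whitespace, so a file path containing spaces breaks it. A ProcessBuilder variant avoids that:

// Pass the arguments separately so spaces in the path survive intact
Process p = new ProcessBuilder("tail", "-" + nLines, file.getAbsolutePath()).start();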
Use a CircularFifoBuffer from Apache Commons - see the answer to a similar question at How to read last 5 lines of a .txt file into java.
Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue.
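A minimal sketch with the Collections 4 class, assuming n and file as in the snippets above; this still streams the whole file once, but only ever keeps n lines in memory:

import java.io.*;
import org.apache.commons.collections4.queue.CircularFifoQueue;

// The queue evicts its oldest element when full, so after one pass
// it holds exactly the last n lines of the file.
CircularFifoQueue<String> lastLines = new CircularFifoQueue<>(n);
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    for (String line; (line = br.readLine()) != null; ) {
        lastLines.add(line);
    }
}
// lastLines now iterates oldest-to-newest over the final n lines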
package com.uday;

import java.io.File;
import java.io.RandomAccessFile;

public class TailN {

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        TailN tailN = new TailN();
        File file = new File("/Users/udakkuma/Documents/workspace/uday_cancel_feature/TestOOPS/src/file.txt");
        tailN.readFromLast(file);
        System.out.println("Execution Time : " + (System.currentTimeMillis() - startTime));
    }

    public void readFromLast(File file) throws Exception {
        int lines = 3;
        int readLines = 0;
        StringBuilder builder = new StringBuilder();
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            long fileLength = file.length() - 1;
            // Set the pointer at the end of the file
            randomAccessFile.seek(fileLength);
            for (long pointer = fileLength; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                // Read from the end, one char at a time
                char c = (char) randomAccessFile.read();
                // Count a line at each newline, stopping once enough are found
                if (c == '\n') {
                    readLines++;
                    if (readLines == lines)
                        break;
                }
                builder.append(c);
            }
            // The text was read backwards, so reverse it into the correct order
            builder.reverse();
            System.out.println(builder.toString());
        }
    }
}
A RandomAccessFile allows for seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method will return the size of the file. The problem is determining the number of lines. For this, you can seek to the end of the file and read backwards until you have hit the right number of lines.
I had a similar problem, but I didn't understand the other solutions, so I used this. I hope the code is simple.
// String filePathName = (directory and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Size of the file in bytes.
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table; I know one row has about e.g. 100 bytes/characters.
    // I used 1000 bytes before the file end as the point to start reading from.
    // If you don't know the line length, use @paxdiablo's advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // This manages opening and closing the file.
    raf.seek(fileLength_toRead); // Reading starts at this byte.
    String rowInFile = raf.readLine(); // The first line read is usually not whole, so I skip it.
    rowInFile = raf.readLine();
    while (rowInFile != null) {
        // Here the read lines (rowInFile) can be added to a String[] array or ArrayList<String>.
        // Later I can work with the rows from the array - the last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
} catch (IOException e) {
    //
}
Here is working code for this.
private static void printLastNLines(String filePath, int n) {
    File file = new File(filePath);
    StringBuilder builder = new StringBuilder();
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r")) {
        long pos = file.length() - 1;
        randomAccessFile.seek(pos);
        for (long i = pos - 1; i >= 0; i--) {
            randomAccessFile.seek(i);
            char c = (char) randomAccessFile.read();
            if (c == '\n') {
                n--;
                if (n == 0) {
                    break;
                }
            }
            builder.append(c);
        }
        builder.reverse();
        System.out.println(builder.toString());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Here is the best way I've found to do it. Simple and pretty fast and memory efficient.
public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {
    String[] lines = new String[maxLines];
    int lastNdx = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            if (lastNdx == lines.length) {
                lastNdx = 0;
            }
            lines[lastNdx++] = line;
        }
    }
    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Start at the oldest retained line and walk the ring buffer exactly once,
    // skipping slots that were never filled (file shorter than maxLines)
    for (int i = 0; i < lines.length; i++) {
        int ndx = (lastNdx + i) % lines.length;
        if (lines[ndx] != null) {
            writer.write(lines[ndx]);
            writer.write("\n");
        }
    }
    writer.flush();
}
(See comments)
public String readFromLast(File file, int howMany) throws IOException {
    int numLinesRead = 0;
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            long fileLength = file.length() - 1;
            /*
             * Set the pointer at the end of the file. If the file is empty, an IOException
             * will be thrown
             */
            randomAccessFile.seek(fileLength);
            for (long pointer = fileLength; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                byte b = (byte) randomAccessFile.read();
                if (b == '\n') {
                    numLinesRead++;
                    // (The last line is often terminated with a line separator)
                    if (numLinesRead == (howMany + 1))
                        break;
                }
                baos.write(b);
            }
            /*
             * Since the content was read from the end, it is in reverse order.
             * Reverse the bytes in place to restore the correct order.
             */
            byte[] a = baos.toByteArray();
            int start = 0;
            int end = a.length - 1;
            while (start < end) {
                byte temp = a[end];
                a[end] = a[start];
                a[start] = temp;
                start++;
                end--;
            } // End while
            return new String(a).trim();
        } // End inner try-with-resources
    } // End outer try-with-resources
} // End method
I tried RandomAccessFile first, and it was tedious to read the file backwards, repositioning the file pointer on every read operation. So I tried @Luca's solution and got the last few lines of the file as a string in just two lines, in a few minutes.
InputStream inputStream = Runtime.getRuntime().exec("tail " + path.toFile()).getInputStream();
String tail = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(System.lineSeparator()));
The code is only 2 lines:
// Please specify correct Charset
ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);
// read last 2 lines
System.out.println(rlf.toString(2));
Gradle:
implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'
Maven:
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
</dependency>