When to flush a BufferedWriter - Java

In a Java program (Java 1.5), I have a BufferedWriter that wraps a FileWriter, and I call write() many, many times... The resulting file is pretty big, and some of its lines are incomplete.
Do I need to call flush each time I write something (I suspect that would be inefficient), or should I use another method of BufferedWriter, or another class?
(Since I have a zillion lines to write, I do want something quite efficient.)
What would be the ideal "flushing" moment? (When I reach the capacity of the BufferedWriter?)
Init:
try {
analysisOutput = new BufferedWriter(new FileWriter(
"analysisResults", true));
analysisOutput.newLine();
analysisOutput.write("Processing File " + fileName + "\n");
}
catch (FileNotFoundException ex) {
ex.printStackTrace();
}
catch (IOException ex) {
ex.printStackTrace();
}
Writing:
private void printAfterInfo(String toBeMoved, HashMap<String, Boolean> afterMap, Location location)
        throws IOException {
    if (afterMap != null) {
        for (Map.Entry<String, Boolean> map : afterMap.entrySet()) {
            if (toBeMoved == "Condition") {
                if (1 <= DEBUG)
                    System.out.println("###" + toBeMoved + " " + location + " "
                            + conditionalDefs.get(conditionalDefs.size() - 1)
                            + " After " + map.getKey() + " "
                            + map.getValue() + "\n");
                analysisOutput.write("###" + toBeMoved + " " + location + " "
                        + conditionalDefs.get(conditionalDefs.size() - 1)
                        + " After " + map.getKey() + " " + map.getValue()
                        + "\n");
            } else {
                if (1 <= DEBUG)
                    System.out.println("###" + toBeMoved + " " + location + " "
                            + map.getKey() + " After "
                            + map.getValue() + "\n");
                if (conditionalDefs.size() > 0)
                    analysisOutput.write("###" + toBeMoved + " " + location + " "
                            + conditionalDefs.get(conditionalDefs.size() - 1) + " "
                            + map.getKey() + " After " + map.getValue()
                            + "\n");
                else
                    analysisOutput.write("###" + toBeMoved + " " + location + " "
                            + map.getKey() + " After " + map.getValue() + "\n");
            }
        }
    }
}
I've just figured out that the incomplete lines are those just before "Processing File"... so it happens when I switch from one analyzed file to the next...
Closing:
dispatch(unit);
try {
if (analysisOutput != null) {
printFileInfo();
analysisOutput.close();
}
}
catch (IOException ex) {
ex.printStackTrace();
}
Sometimes the information printed out by printFileInfo does not appear in the results file...

The BufferedWriter will already flush when it fills its buffer. From the docs of BufferedWriter.write:
"Ordinarily this method stores characters from the given array into this stream's buffer, flushing the buffer to the underlying stream *as needed*."
(Emphasis mine.)
The point of BufferedWriter is basically to consolidate lots of little writes into far fewer big writes, as that's usually more efficient (but more of a pain to code for). You shouldn't need to do anything special to get it to work properly though, other than making sure you flush it when you're finished with it - and calling close() will do this and flush/close the underlying writer anyway.
In other words, relax - just write, write, write and close :) The only time you normally need to call flush manually is if you really, really need the data to be on disk now. (For instance, if you have a perpetual logger, you might want to flush it every so often so that whoever's reading the logs doesn't need to wait until the buffer's full before they can see new log entries!)
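A tiny sketch of that logger case (the logWriter field and log method here are hypothetical): flush after each entry so a reader tailing the file sees it immediately, trading away most of the buffering.

// Hypothetical: logWriter is a long-lived BufferedWriter over the log file.
void log(String line) throws IOException {
    logWriter.write(line);
    logWriter.newLine();
    // Push the entry to disk now instead of waiting for the buffer to fill.
    logWriter.flush();
}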

The ideal flushing moment is when you need another program reading the file to see the data that's been written, before the file is closed. In many cases, that's never.

If you have a loop alternating init and printAfterInfo, my guess is that you aren't closing your writer before creating a new one on the same file. You'd better create the BufferedWriter once and close it at the end of all the processing, as sketched below.
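A minimal sketch of that lifecycle (the class, the loop, and the file names here are hypothetical, pieced together from the snippets above): open the writer once, write as often as you like, and close it in a finally block so the last buffered lines reach the file.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class AnalysisRunner {
    public static void main(String[] args) throws IOException {
        BufferedWriter analysisOutput = null;
        try {
            // Open once, before processing any files.
            analysisOutput = new BufferedWriter(new FileWriter("analysisResults", true));
            for (String fileName : args) {
                analysisOutput.write("Processing File " + fileName + "\n");
                // ... many write() calls per file; the buffer flushes itself as it fills ...
            }
        } finally {
            if (analysisOutput != null) {
                // close() flushes whatever is still buffered, then closes the FileWriter.
                analysisOutput.close();
            }
        }
    }
}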

Related

Compare speed of two Java methods [duplicate]

This question already has answers here:
How do I write a correct micro-benchmark in Java? (11 answers)
I have two different methods which do the same thing but are implemented a bit differently. Each walks through a directory, reads all the files in it, and counts how many files with a certain name it contains. Both take around 3-4 seconds (the directory has millions of files), but how can I know which is really faster? Is there a method to compare their speed?
First method:
private void getAllRelatedFilesEig(String corrId) throws InterruptedException, IOException
{
log.debug("Get all files with corrId=" + corrId + " from directory=" + processingDir);
Profiler profiler = Profiler.createStarted();
Files.list(Paths.get(processingDir))
.filter(p ->
p.getFileName().toString()
.indexOf("EPX_" + corrId + "_") >= 0)
.forEach( path ->
{
try
{
EPEXFile file = new EPEXFile(path);
if (file.isTranMessage())
{
if (file.isOrderMessage())
{
orderFiles.add(file);
}
else
{
tradeFiles.add(file);
}
}
else
{
infoFiles.add(file);
}
}
catch (IFException ex)
{
log.error("Error creating EPEXFile object " + ex.getMessage());
}
}
);
profiler.stop("allFilesWithSameCorrIdRetrieval");
log.info(orderFiles.size() + " order files with corrId=" + corrId);
log.info(tradeFiles.size() + " trade files with corrId=" + corrId);
log.info(infoFiles.size() + " info files with corrId=" + corrId);
profiler = Profiler.createStarted();
profiler.stop("processFiles");
orderFiles.clear();
tradeFiles.clear();
infoFiles.clear();
}
Second method:
private void getAllRelatedFilesOrig(String corrId) throws InterruptedException, IOException {
log.debug("Get all files with corrId=" + corrId + " from directory=" + processingDir);
Path dirPath = Paths.get(processingDir);
ArrayList<Path> fileList;
Profiler profiler = Profiler.createStarted();
try (Stream<Path> paths = Files.walk(dirPath)) {
fileList = paths.filter(t -> (t.getFileName().toString().indexOf("EPX_" + corrId + "_") >= 0))
.collect(Collectors.toCollection(ArrayList::new));
for (Path path : fileList) {
try {
EPEXFile file = new EPEXFile(path);
if (file.isTranMessage()) {
if (file.isOrderMessage()) {
orderFiles.add(file);
} else {
tradeFiles.add(file);
}
} else {
infoFiles.add(file);
}
} catch (IFException ex) {
log.error("Error creating EPEXFile object " + ex.getMessage());
}
}
}
profiler.stop("allFilesWithSameCorrIdRetrieval");
log.info(orderFiles.size() + " order files with corrId=" + corrId);
log.info(tradeFiles.size() + " trade files with corrId=" + corrId);
log.info(infoFiles.size() + " info files with corrId=" + corrId);
profiler = Profiler.createStarted();
profiler.stop("processFiles");
orderFiles.clear();
tradeFiles.clear();
infoFiles.clear();
}
I have tried to figure it out with the Profiler class, but I could not tell which is faster, since sometimes the first and sometimes the second wins. Is there even a way to say which is faster in general? Even if one is just a little bit faster, it would help me to know which one it is.
I recently wrote this method to test two of my methods which did the exact same thing differently.
private void benchMark() {
    long t, t1 = 0, t2 = 0;
    for (int i = 0; i < 50; i++) {
        // Alternate the two methods so background load affects both equally.
        t = System.currentTimeMillis();
        method1();
        t1 += System.currentTimeMillis() - t;
        t = System.currentTimeMillis();
        method2();
        t2 += System.currentTimeMillis() - t;
    }
    System.out.println("Benchmarking\n\tMethod 1 took " + t1 + " ms\n\tMethod 2 took " + t2 + " ms");
}
That's a brute-force way to do it, but it works: I found that one of my methods was consistently about 5% faster in every one of my tests. I call the methods one after the other to diminish the effect of performance variations during the test. For a more rigorous comparison, see the JMH sketch below.
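The duplicate linked above recommends JMH for this kind of measurement. A minimal sketch (the class and benchmark method names are hypothetical, and the two methods would need to be refactored so they can be called in isolation):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class FileListingBenchmark {

    @Benchmark
    public void walkVersion() {
        // call getAllRelatedFilesOrig("someCorrId") here
    }

    @Benchmark
    public void listVersion() {
        // call getAllRelatedFilesEig("someCorrId") here
    }
}

JMH takes care of warm-up iterations and JIT effects that a hand-rolled timing loop does not, which is exactly why the two methods trade places in ad-hoc runs.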

Spark - Restore nested saved RDD

I am using AWS S3 as backup storage for data coming into our Spark cluster. Data comes in every second and is processed once 10 seconds of data have been read. The RDD containing the 10 seconds of data is stored to S3 using
rdd.saveAsObjectFile(s3URL + dateFormat.format(new Date()));
This means that we get a lot of files added to S3 each day in the format of
S3URL/2017/07/23/12/00/10, S3URL/2017/07/23/12/00/20 etc
From here it is easy to restore the RDD, which is a JavaRDD<byte[]>, using either sc.objectFile or the AmazonS3 API.
The problem is that, to reduce the number of files we need to iterate through, we run a daily cron job that goes through each file for a day, bunches the data together and stores the new RDD to S3. This is done as follows:
List<byte[]> dataList = new ArrayList<>(); // A list of all read messages
/* Get all messages from S3 and store them in the above list */
try {
final ListObjectsV2Request req = new ListObjectsV2Request().withBucketName("bucketname").withPrefix("logs/" + dateString);
ListObjectsV2Result result;
do {
result = s3Client.listObjectsV2(req);
for (S3ObjectSummary objectSummary :
result.getObjectSummaries()) {
System.out.println(" - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() +
")");
if(objectSummary.getKey().contains("part-00000")){ // The messages are stored in files named "part-00000"
S3Object object = s3Client.getObject(
new GetObjectRequest(objectSummary.getBucketName(), objectSummary.getKey()));
InputStream objectData = object.getObjectContent();
byte[] byteData = new byte[(int) objectSummary.getSize()]; // The sizes of the messages differ
// read() may return fewer bytes than requested; loop until the buffer is full
int offset = 0, n;
while (offset < byteData.length && (n = objectData.read(byteData, offset, byteData.length - offset)) > 0)
    offset += n;
dataList.add(byteData); // Add the message to the list
objectData.close();
}
}
/* When iterating, messages are split into chunks called continuation tokens.
* All tokens have to be iterated through to get all messages. */
System.out.println("Next Continuation Token : " + result.getNextContinuationToken());
req.setContinuationToken(result.getNextContinuationToken());
} while (result.isTruncated());
} catch (AmazonServiceException ase) {
System.out.println("Caught an AmazonServiceException, " +
"which means your request made it " +
"to Amazon S3, but was rejected with an error response " +
"for some reason.");
System.out.println("Error Message: " + ase.getMessage());
System.out.println("HTTP Status Code: " + ase.getStatusCode());
System.out.println("AWS Error Code: " + ase.getErrorCode());
System.out.println("Error Type: " + ase.getErrorType());
System.out.println("Request ID: " + ase.getRequestId());
} catch (AmazonClientException ace) {
System.out.println("Caught an AmazonClientException, " +
"which means the client encountered " +
"an internal error while trying to communicate" +
" with S3, " +
"such as not being able to access the network.");
System.out.println("Error Message: " + ace.getMessage());
} catch (IOException e) {
e.printStackTrace();
}
JavaRDD<byte[]> messages = sc.parallelize(dataList); // Loads the messages into an RDD
messages.saveAsObjectFile("S3URL/daily_logs/" + dateString);
This all works fine, but now I am not sure how to actually restore the data to a manageable state again. If I use sc.objectFile to restore the RDD, I end up with a JavaRDD<byte[]> where each byte[] is actually a serialized JavaRDD<byte[]> in itself. How can I restore the nested JavaRDD from the byte[] elements of the outer JavaRDD<byte[]>?
I hope this somehow makes sense, and I am grateful for any help. In the worst case I'll have to come up with another way to back up the data.
Best regards
Mathias
I solved it by not storing a nested RDD at all: instead I flatMapped all the byte[] into a single JavaRDD and stored that one.
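A hedged sketch of that shape of fix (the glob is hypothetical and assumes the hour/minute/second path levels shown above): read the day's 10-second dumps back as object files, so the combined RDD holds the plain byte[] messages rather than the raw bytes of serialized RDD files, then save the flattened result.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class DailyCompactor {
    public static void compact(JavaSparkContext sc, String s3URL, String dateString) {
        // objectFile() deserializes the messages themselves, so the result is
        // a flat JavaRDD<byte[]> instead of an RDD of serialized RDD files.
        JavaRDD<byte[]> allMessages = sc.objectFile(s3URL + dateString + "/*/*/*");
        allMessages.saveAsObjectFile(s3URL + "daily_logs/" + dateString);
    }
}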

Is a character stream capable of transferring binary files without corrupting them?

I was reading for a while about byte and character streams in Java. A character stream should not be able to transfer binary files correctly; in other words, if we copy a binary file to another using a character stream, the target file should end up corrupted. But the result I get is not what I was expecting. The files I'm trying to copy are images and videos.
Here is the code I'm using:
public void charCopy() throws IOException {
FileReader fileReader = new FileReader(source);
FileWriter fileWriter = new FileWriter(target);
int readChar;
System.out.println("log : " + this.loggin() + ": copy successfully started !");
while ((readChar = fileReader.read()) != -1) {
fileWriter.write(readChar);
}
System.out.println("log : " + this.loggin() + ": copy successfully terminated !");
fileReader.close();
fileWriter.close();
System.out.println("log : " + this.loggin() + ": streams successfully closed !");
System.out.println("log : " + this.loggin() + ": target file ready !");
}
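For comparison, a byte-stream copy never decodes the data at all, so it round-trips any binary file exactly. A minimal sketch (assuming the same source and target fields as the method above):

public void byteCopy() throws IOException {
    FileInputStream in = new FileInputStream(source);
    FileOutputStream out = new FileOutputStream(target);
    try {
        byte[] buffer = new byte[8192];
        int n;
        // Copy raw bytes; no charset decoding or encoding is involved.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    } finally {
        in.close();
        out.close();
    }
}

Whether charCopy happens to work depends on the platform default charset: a single-byte charset like ISO-8859-1 maps every byte value to a character and back losslessly, while a charset like UTF-8 will mangle byte sequences that are not valid UTF-8.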

JTOpen ProgramCall Socket Timeout

I'm working on a web app (running on Tomcat) that calls programs on an IBM i (AS/400) using the JTOpen ProgramCall class (com.ibm.as400.access.ProgramCall). My problem is with program calls that take more than 30s to respond, which are triggering a java.net.SocketTimeoutException: Read timed out exception.
There is a setTimeout() method available for this class, but it doesn't seem to have an effect on the socket timeout. I've also checked my Tomcat configurations and didn't see anything that would cause this behavior.
Does anyone know of a way to alter the timeout for such an implementation?
Code:
pgmCall.setProgram(getCompleteName(), parmList);
initializeAS400TextParameters();
// Run the AS/400 program.
try {
Trace.setTraceDiagnosticOn(true);
Trace.setTraceInformationOn(true);
Trace.setTraceWarningOn(true);
Trace.setTraceErrorOn(true);
Trace.setTraceDatastreamOn(true);
if (pgmCall.run() != true) {
messageList = pgmCall.getMessageList();
for (int i = 0; i < messageList.length; i++) {
log.debug("Error Message " + i + " " + messageList[i]);
}
setCompletionMsg("Program call failed.");
log.debug("442 Program call failed.");
return false;
} else {
messageList = pgmCall.getMessageList();
for (int i = 0; i < messageList.length; i++) {
log.debug("Success Message " + i + " " + messageList[i]);
}
setCompletionMsg("Program called ok.");
log.debug("452 Program called ok.");
return true;
}
} catch (Exception e) {
// This is where the timeout exception is thrown
log.debug("Error Running Program: " + e.getMessage() + " " + e.getLocalizedMessage());
setCompletionMsg(e.getMessage());
}
Well, after several more hours I've found the solution. Apparently the original developer added a socket timeout parameter to the JDBC connection string; simply removing the parameter did the trick, as the default value is 0 (infinite timeout).
Before:
String connectionStr = "jdbc:as400://" + systemInfo.getIPAddress() + ":1527" + ";naming=system;socket timeout=30000;thread used=false;errors=full;prompt=false;date format=iso;block size=128;transaction isolation=none;user=" + systemInfo.getUserName() + ";password=" + systemInfo.getPassword();
After:
String connectionStr = "jdbc:as400://" + systemInfo.getIPAddress() + ":1527" + ";naming=system;thread used=false;errors=full;prompt=false;date format=iso;block size=128;transaction isolation=none;user=" + systemInfo.getUserName() + ";password=" + systemInfo.getPassword();

IOException doesn't give enough information

My Android program isn't working. I am using normal client-server sockets. I have tested my server with telnet and it works fine, but when I try it from my Android program, it doesn't work (more details in a second). Here's my code:
Socket s = null;
try
{
String SocketServerAddress = db.getPhSsServerAddress();
Integer SocketServerPort = db.getPhSsServerPort();
s = new Socket(SocketServerAddress, SocketServerPort);
Log.d(MY_DEBUG_TAG, "Setting up Socket: " + SocketServerAddress + ":" + SocketServerPort);
DataOutputStream out = new DataOutputStream(s.getOutputStream());
DataInputStream in = new DataInputStream(s.getInputStream());
Log.d(MY_DEBUG_TAG, "Connected to: " + s.getInetAddress() + " on port " + s.getPort());
out.writeUTF("Helo, Server");
out.flush();
Log.d(MY_DEBUG_TAG, "Bytes written: " + out.size());
String st = in.readUTF();
Log.d(MY_DEBUG_TAG, "SocketServerResponse: " + st);
}
catch (UnknownHostException e)
{
Log.e(MY_ERROR_TAG, "UnknownHostException: " + e.getMessage() + "; " + e.getCause());
}
catch (IOException e)
{
Log.e(MY_ERROR_TAG, "IOException: " + e.getMessage() + "; " + e.getCause() + "; " + e.getLocalizedMessage());
}
finally
{
try {
s.close();
} catch (IOException e) {
Log.e(MY_ERROR_TAG, "IOException on socket.close(): " + e.getMessage() + "; " + e.getCause());
}
}
All I ever get here is a thrown IOException with no message or cause attached. The specific line causing the error is String st = in.readUTF(). If I comment out that line, my code runs fine (no exceptions thrown), but my server does not acknowledge that any data has been sent to it. And of course I don't get any data back, since that line is commented out.
So, how can I figure out what the problem is? Tonight I am going to look at what is being sent with Wireshark to see if that gives any insight.
Is the server using readUTF() and writeUTF() too? writeUTF() writes data in a unique format that can only be understood by readUTF(), which won't understand anything else.
EDIT: EOFException means that there is no more data. You should catch it separately and handle it by closing the socket, etc. It can certainly be caused spuriously by readUTF() trying to read data that wasn't written with writeUTF().
And deciding it was an IOException when it was really an EOFException means you didn't print out or log the exception itself, just its message. Always use the log methods provided for exceptions, or at least use Exception.toString(); see the sketch below.
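A minimal sketch of that logging advice, reusing the question's variable names (which catch clause fires depends on what the server actually sends; note EOFException extends IOException, so it must be caught first):

try {
    String st = in.readUTF();
    Log.d(MY_DEBUG_TAG, "SocketServerResponse: " + st);
} catch (EOFException e) {
    // The peer closed the connection before a full UTF block arrived.
    Log.d(MY_DEBUG_TAG, "Server closed the connection", e);
} catch (IOException e) {
    // Passing the Throwable preserves its concrete class and stack trace in logcat.
    Log.e(MY_ERROR_TAG, "Read failed", e);
}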
As I remember, I had a problem with DataInputStream once... try doing this:
in = new DataInputStream(new BufferedInputStream(socket.getInputStream()));
