I am working on a GA which used to work fine up until last week. In my efforts to optimize the code I somehow broke the output sequence where the program dumps information about each generation of individuals. From all my debugging efforts I have come to understand that it's probably a flushing problem, but I can't really put my finger on the crux. The odd thing is that I actually did not touch any part of the IO since the last working version of the code.
The GA is part of a much larger piece of software which has a progress console that essentially "mirrors" System.out and System.err to the GUI. From my debugging efforts I realized that the buffer does not flush even if I specifically call ps.flush().
What could be the reason for this problem?
Edit: To answer the questions in the comments, and to provide further information on the problem:
The rest of the software does its output to the GUI and the Eclipse console as normal; it's only the output from the outputGenInfo() method (see below) that has disappeared.
If I add a notification line such as System.out.println("Fitness calculation is complete!"); in my evaluateGeneration() method, which is called just before outputGenInfo(), the information for each generation gets printed exactly as expected... [That particular line was one of the things I had trimmed during my optimization efforts.]
For mirroring/redirecting System.out I used the MessageConsole class written by Rob Camick, which can be found here.
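For reference, the general shape of that kind of redirection looks roughly like this (a minimal sketch, not the actual MessageConsole code; the sink here is illustrative):
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

public class RedirectSketch {
    public static void main(String[] args) {
        // Illustrative sink; MessageConsole appends to a Swing Document instead.
        OutputStream sink = new ByteArrayOutputStream();
        // autoFlush = true flushes on println()/newline writes; with autoFlush = false,
        // output can linger in the PrintStream buffer until flush() is called.
        System.setOut(new PrintStream(sink, true));
        System.out.println("mirrored line");
    }
}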
The suspicious code is as follows:
/**
 * Convenience method for getting debug/info
 * on this generation.
 * @param ps - a <code>PrintStream</code> to output to
 */
public void outputGenInfo(PrintStream ps) { // <-- ps = System.out by default
    double stdev_fitness = stats.getStandardDeviation();
    double mean_fitness = stats.getMean();
    double cv = stdev_fitness / mean_fitness;
    StringBuilder sb = new StringBuilder();
    String new_line = System.getProperty("line.separator");
    sb.append(new_line);
    sb.append("+++++++++++++++++++++++++++++++++");
    sb.append(new_line);
    sb.append("Generation: " + this.id);
    sb.append(new_line);
    sb.append("-------------------------------");
    sb.append(new_line);
    sb.append("Mean fitness: " +
            String.format("%.3f", mean_fitness) +
            ", CV: " + String.format("%.3f", cv));
    sb.append(new_line);
    sb.append("Top fitness: " + getTopIndividual().getEntropy());
    sb.append(new_line);
    sb.append("+++++++++++++++++++++++++++++++++");
    sb.append(new_line);
    ps.println(sb.toString());
    // during debug this actually helps, but not when the code is running full throttle!
    System.out.println("");
}
I am just setting up a new Java project which will (maybe, not so sure now) make use of Google Protocol Buffers. I am new to this API, so I started with a very basic test, whose outcome, to be honest, really disappointed me. Why isn't this very straightforward code working?
var output = new ByteArrayOutputStream();
Message.Echo.newBuilder().setMsg("MSG1?").build().writeTo(output);
System.out.println("output.length " + output.toByteArray().length);
Message.Echo.newBuilder().setMsg("MSG2!!").build().writeTo(output);
System.out.println("output.length " + output.toByteArray().length);
var input = new ByteArrayInputStream(output.toByteArray());
System.out.println("input.available " + input.available());
System.out.print(Message.Echo.parseFrom(input));
System.out.println("input.available " + input.available());
System.out.print(Message.Echo.parseFrom(input));
The above code produces the following output:
output.length 7
output.length 15
input.available 15
msg: "MSG2!!"
input.available 0
It entirely misses the first message; or rather, it seems to "overwrite" it in some way, since all 15 bytes get read. Plus, the second read fails to block even though there are no further bytes to read.
However, changing the two reading lines into:
System.out.print(Message.Echo.parseFrom(input.readNBytes(7)));
System.out.print(Message.Echo.parseFrom(input.readNBytes(15-7)));
correctly prints the two messages. I am running Kubuntu 18.04 with JDK 11. Am I missing something really important (not mentioned in the official tutorial) or is this a bug?
This is the .proto file:
syntax = "proto3";

package ...;

option java_package = "...";
option java_outer_classname = "Message";

message Echo {
    string msg = 1;
}
OK, it seems that writing/reading multiple messages over the same pair of streams requires using writeDelimitedTo and parseDelimitedFrom instead, because parseFrom reads until it reaches EOF.
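A minimal sketch of the delimited variant, adapting the code above (assuming the same generated Message.Echo class):
var output = new ByteArrayOutputStream();
// writeDelimitedTo prefixes each message with its length.
Message.Echo.newBuilder().setMsg("MSG1?").build().writeDelimitedTo(output);
Message.Echo.newBuilder().setMsg("MSG2!!").build().writeDelimitedTo(output);

var input = new ByteArrayInputStream(output.toByteArray());
// Each call consumes exactly one length-prefixed message.
System.out.println(Message.Echo.parseDelimitedFrom(input)); // prints msg: "MSG1?"
System.out.println(Message.Echo.parseDelimitedFrom(input)); // prints msg: "MSG2!!"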
It seems that the preferred behaviour is to use a new Socket for each message. That sounds a bit odd to me, but I am sure there are good reasons behind it. It should be better explained in the official tutorial, though.
I have a Dataflow job that reads data from Pub/Sub and, based on the time and filename, writes the contents to GCS, where the folder path is based on YYYY/MM/DD. This allows files to be generated in folders by date, using Apache Beam's FileIO and dynamic destinations.
About two weeks ago, I noticed an unusual buildup of unacknowledged messages. Upon restarting the Dataflow job, the errors disappeared and new files were being written to GCS.
After a couple of days, writing stopped again, except this time there were errors claiming that processing was stuck. After some trusty SO research, I found out that this was likely caused by a deadlock issue in pre-2.9.0 Beam, which used the Conscrypt library as the default security provider. So, I upgraded from Beam 2.8 to Beam 2.11.
Once again, it worked, until it didn't. I looked more closely at the error and noticed that it had a problem with a SimpleDateFormat object, which isn't thread-safe. So, I switched to java.time and DateTimeFormatter, which is thread-safe. It worked until it didn't. However, this time the error was slightly different and didn't point to anything in my code.
The error is provided below:
Processing stuck in step FileIO.Write/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles for at least 05m00s without outputting or completing in state process
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:469)
at org.apache.beam.vendor.guava.v20_0.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:202)
at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:409)
at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:311)
at org.apache.beam.runners.dataflow.worker.WindmillStateReader$BagPagingIterable$1.computeNext(WindmillStateReader.java:700)
at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at org.apache.beam.vendor.guava.v20_0.com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:47)
at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:701)
at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown Source)
This error started occurring approximately 5 hours after job deployment and at an increasing rate over time. Writing slowed significantly within 24 hours. I have 60 workers and I suspect that one worker fails every time there is an error, which eventually kills the job.
In my writer, I parse each line for certain keywords (which may not be the best way) in order to determine which folder it belongs in. I then proceed to write the file to GCS with the determined filename.
The partition function is the following:
@SuppressWarnings("serial")
public static class datePartition implements SerializableFunction<String, String> {

    private String filename;

    public datePartition(String filename) {
        this.filename = filename;
    }

    @Override
    public String apply(String input) {
        String folder_name = "NaN";
        String date_dtf = "NaN";
        String date_literal = "NaN";
        try {
            Matcher foldernames = Pattern.compile("\"foldername\":\"(.*?)\"").matcher(input);
            if (foldernames.find()) {
                folder_name = foldernames.group(1);
            } else {
                Matcher folderid = Pattern.compile("\"folderid\":\"(.*?)\"").matcher(input);
                if (folderid.find()) {
                    folder_name = folderid.group(1);
                }
            }
            Matcher date_long = Pattern.compile("\"timestamp\":\"(.*?)\"").matcher(input);
            if (date_long.find()) {
                date_literal = date_long.group(1);
                if (Utilities.isNumeric(date_literal)) {
                    LocalDateTime date = LocalDateTime.ofInstant(Instant.ofEpochMilli(Long.valueOf(date_literal)), ZoneId.systemDefault());
                    // dtf is a shared, thread-safe DateTimeFormatter declared elsewhere in the class
                    date_dtf = date.format(dtf);
                } else {
                    date_dtf = date_literal.split(":")[0].replace("-", "/").replace("T", "/");
                }
            }
            return folder_name + "/" + date_dtf + "h/" + filename;
        } catch (Exception e) {
            LOG.error("ERROR with either foldername or date");
            LOG.error("Line : " + input);
            LOG.error("folder : " + folder_name);
            LOG.error("Date : " + date_dtf);
            return folder_name + "/" + date_dtf + "h/" + filename;
        }
    }
}
And the place where the pipeline is actually deployed and run is shown below:
public void streamData() {
    Pipeline pipeline = Pipeline.create(options);
    pipeline.apply("Read PubSub Events", PubsubIO.readMessagesWithAttributes().fromSubscription(options.getInputSubscription()))
            .apply(options.getWindowDuration() + " Window",
                    Window.<PubsubMessage>into(FixedWindows.of(parseDuration(options.getWindowDuration())))
                            .triggering(AfterWatermark.pastEndOfWindow())
                            .discardingFiredPanes()
                            .withAllowedLateness(parseDuration("24h")))
            .apply(new GenericFunctions.extractMsg())
            .apply(FileIO.<String, String>writeDynamic()
                    .by(new datePartition(options.getOutputFilenamePrefix()))
                    .via(TextIO.sink())
                    .withNumShards(options.getNumShards())
                    .to(options.getOutputDirectory())
                    .withNaming(type -> FileIO.Write.defaultNaming(type, ".txt"))
                    .withDestinationCoder(StringUtf8Coder.of()));
    pipeline.run();
}
The error 'Processing stuck ...' indicates that some particular operation took longer than 5m, not that the job is permanently stuck. However, since the step FileIO.Write/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles is the one that is stuck and the job gets cancelled/killed, I would suspect an issue while the job is writing temp files.
I found the BEAM-7689 issue, which is related to the second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) used to write temporary files. This happens because several concurrent jobs can share the same temporary directory, and one job may delete that directory before the other jobs finish.
According to the previous link, upgrading to SDK 2.14 mitigates the issue. Please upgrade and let us know if the error is gone.
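In the meantime, giving each job its own temp directory should avoid the collision. A sketch based on the writer from the question (withTempDirectory is part of FileIO.Write; the per-job suffix via options.getJobName() is my assumption and depends on your options interface):
// Per-job temp directory, so concurrent jobs cannot delete each other's
// temp files (the BEAM-7689 failure mode).
FileIO.<String, String>writeDynamic()
        .by(new datePartition(options.getOutputFilenamePrefix()))
        .via(TextIO.sink())
        .withNumShards(options.getNumShards())
        .to(options.getOutputDirectory())
        .withNaming(type -> FileIO.Write.defaultNaming(type, ".txt"))
        .withDestinationCoder(StringUtf8Coder.of())
        .withTempDirectory(options.getTempLocation() + "/" + options.getJobName()); // hypothetical suffix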
Since posting this question, I've optimized the Dataflow job to dodge bottlenecks and increase parallelization. Much like rsantiago explained, 'processing stuck' isn't an error, but simply a way for Dataflow to communicate that a step is taking significantly longer than the others; essentially a bottleneck that can't be cleared with the given resources. The changes I made seem to have addressed them. The new code is as follows:
public void streamData() {
    try {
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply("Read PubSub Events", PubsubIO.readMessagesWithAttributes().fromSubscription(options.getInputSubscription()))
                .apply(options.getWindowDuration() + " Window",
                        Window.<PubsubMessage>into(FixedWindows.of(parseDuration(options.getWindowDuration())))
                                .triggering(AfterWatermark.pastEndOfWindow())
                                .discardingFiredPanes()
                                .withAllowedLateness(parseDuration("24h")))
                .apply(FileIO.<String, PubsubMessage>writeDynamic()
                        .by(new datePartition(options.getOutputFilenamePrefix()))
                        .via(Contextful.fn(
                                (SerializableFunction<PubsubMessage, String>) inputMsg -> new String(inputMsg.getPayload(), StandardCharsets.UTF_8)),
                                TextIO.sink())
                        .withDestinationCoder(StringUtf8Coder.of())
                        .to(options.getOutputDirectory())
                        .withNaming(type -> new CrowdStrikeFileNaming(type))
                        .withNumShards(options.getNumShards())
                        .withTempDirectory(options.getTempLocation()));
        pipeline.run();
    } catch (Exception e) {
        LOG.error("Unable to deploy pipeline");
        LOG.error(e.toString(), e);
    }
}
The biggest change involved removing the extractMsg() function and changing partitioning to only use metadata. Both of these steps forced deserialization/reserialization of messages and heavily impacted performance.
Additionally, since my data set was unbounded, I had to set a non-zero number of shards. I wanted to simplify my filenaming policy, so I set it to 1 without knowing how much it hurt performance. Since then, I've found a good balance of workers/shards/machine type for my job (mostly based on guess & check, unfortunately).
Although it's still possible that a bottleneck might be observed with a large enough data load, the pipeline has been performing well despite heavy load (3-5 TB per day). The changes also significantly improved autoscaling, though I'm not sure why; the Dataflow job now reacts to spikes and valleys a lot quicker.
I have code that invokes BufferedReader.lines().
EIPLogManager2.getServerLogger().info("Got header row: " + headerRow); //TODO delete this
List<String> allBatches = reader.lines()
        .skip(forkCount > 0 ? forkCount * forkSize : 0)
        .limit(transactionsRemaining.get() * forkSize)
        .collect(Collectors.toList());
EIPLogManager2.getServerLogger().info("Got all batches. Size: " + allBatches.size()); //TODO delete this
Let me explain how this code is behaving:
Run it on my Mac. Works perfectly.
Run it on Windows. The header row log entry prints out, but the Got all batches log entry never does. It seems to freeze during the stream.
The transactionsRemaining.get() call is to an AtomicInteger.
I don't know why this is happening on Windows. It makes no sense. I've seen this behavior with JRE 8 and JRE 11.
OK, I was dumb. The forkSize and transactionsRemaining variables are set by user input. The user had set transactionsRemaining to 1,000,000,000, so the int multiplication passed to limit() was overflowing past Integer.MAX_VALUE and wrapping around to a negative value. Stream.limit() then threw an IllegalArgumentException, and I guess I didn't have anything in place that was reporting that exception to me.
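For anyone hitting the same thing, a minimal sketch of the fix (variable names as in the snippet above): do the arithmetic in long so it cannot wrap.
// Widen to long before multiplying; int * int wraps on overflow, and a
// negative argument makes Stream.limit() throw IllegalArgumentException.
long skipCount = forkCount > 0 ? (long) forkCount * forkSize : 0L;
long limitCount = (long) transactionsRemaining.get() * forkSize;
List<String> allBatches = reader.lines()
        .skip(skipCount)
        .limit(limitCount)
        .collect(Collectors.toList());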
The problem is now resolved.
How can I make an Eclipse Java template that generates the repetitive part of registering code for a Java method? Example:
Assume that the class description is like so:
class A {
    public static void methodName(String s, int i, Object o) {
    }
}
Now, what I want is to make a template that does something somewhat like this:
"${enclosing_type}.${enclosing_method}(" + ${variable1} + ", " + ${variable2} + ", " + ${variable3} + ")"
Given the available Eclipse template variables I know of, the idea would probably be:
"${enclosing_type}.${enclosing_method}(" + ${enclosing_method_arguments(" + \", \" + ")} + ")"
where that argument would specify the glue used to join the elements of enclosing_method_arguments. The result of the template would be:
"A.methodName(" + s + ", " + i + ", " + o + ")"
If there's an even better alternative, I'm open to suggestions.
This is meant to be used with a piece of code that is executed a LOT.
Unfortunately, a String.format() (or related) "solution" is not an option here due to the requirement above and due to other inherited requirements of what I'm working on. It must generate the code in that exact format no matter what.
I'm open to any plugins that allow this and, if Eclipse doesn't have one, I'm open to making a plugin myself. In that case, please point me to the resources required to make such a plugin.
It should be possible to write a custom variable resolver. It is defined via the org.eclipse.ui.editors.templates extension point.
You could implement a custom resolver that points to the Java context (its id is java, defined in jdt.ui).
I don't know of any plugins that would provide what you need out of the box.
If you are new to Eclipse plug-ins you might need to read the documentation on extending Eclipse. However, the task is really simple, so you should be able to do it after just a glimpse at the docs.
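A minimal sketch of such a resolver (the variable type name is my invention, and the argument list is hard-coded; a real implementation would collect the parameter names from the enclosing method via the JDT AST):
import java.util.List;
import org.eclipse.jface.text.templates.TemplateContext;
import org.eclipse.jface.text.templates.TemplateVariable;
import org.eclipse.jface.text.templates.TemplateVariableResolver;

// Registered via the org.eclipse.ui.editors.templates extension point with
// type="enclosing_method_arguments" and the Java context type id.
public class MethodArgumentsResolver extends TemplateVariableResolver {

    public MethodArgumentsResolver() {
        super("enclosing_method_arguments",
                "Joins the enclosing method's argument names with a glue string");
    }

    @Override
    public void resolve(TemplateVariable variable, TemplateContext context) {
        // Hard-coded stand-in for the real parameter names of the enclosing method.
        String[] args = {"s", "i", "o"};
        // Use the variable's first parameter as the glue, defaulting to ", ".
        List params = variable.getVariableType().getParams();
        String glue = params.isEmpty() ? ", " : (String) params.get(0);
        variable.setValue(String.join(glue, args));
        variable.setResolved(true);
    }
}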
I have a college assignment where I have to get a first, middle, and last name from the user, along with their age, and give them a bank ID using the initials and whatnot. But that's not what I'm here to ask.
I wanted to have a little bit of fun with it, and make the "Imaginary Bank" accidentally tell the user that it's a scam! Then an error will pop up and delete that accidental line of text, replacing it with the normal "We look forward to helping you!" line. All I need to know is how to delete the line of text that starts with "At Imaginary Bank, we". Thanks!
System.out.println("Hello " + first_name + " " + last_name + ", greetings from the Imaginary Bank!");
System.out.println("To access your account, please use the following ID: " + first_init + middle_init + last_init + age);
System.out.println("At Imaginary Bank we look forward to scamming you and stealing your money!");
try
{
Thread.sleep(11000);
}
catch (InterruptedException e) {
e.printStackTrace();}
System.out.println("ERROR! ERROR! ERROR! ERROR! ERROR! ERROR! ERROR! ERROR! ERROR! ERROR! ERROR!");
try
{
Thread.sleep(3000);
}
catch (InterruptedException e) {
e.printStackTrace() ;}
System.out.println("We look forward to aiding you with your financial needs! - The IB Team");
According to this answer, you can print a backspace character to the console using \b with System.out.print. Therefore, for however many characters you have previously printed, print that many backspace characters.
Additionally, this answer to the same question suggests using the cls command to clear the console output entirely; however, this forever binds your application to operating systems that use that command (in this case Windows/DOS). In Linux, for example, the command is clear... I'm sure you see the potential problem.
You can always print out backspace characters like this:
System.out.print("\b");
Just print out the same number of characters you would like to remove.
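For example, a small self-contained sketch (note that \b only moves the cursor back, so the line to erase must be printed without a trailing newline, and many IDE consoles, Eclipse's among them, don't interpret \b at all):
public class BackspaceDemo {
    public static void main(String[] args) throws InterruptedException {
        String scam = "At Imaginary Bank we look forward to scamming you and stealing your money!";
        System.out.print(scam); // no println: \b cannot cross line breaks
        Thread.sleep(3000);
        // Back up over the text, blank it out with spaces, then back up again.
        for (int i = 0; i < scam.length(); i++) System.out.print('\b');
        for (int i = 0; i < scam.length(); i++) System.out.print(' ');
        for (int i = 0; i < scam.length(); i++) System.out.print('\b');
        System.out.println("We look forward to aiding you with your financial needs! - The IB Team");
    }
}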
Check out How to delete stuff printed to console by System.out.println()? There is no single guaranteed way to remove text from the output window, but there is generally a way for each type of console window. Take your pick of what works best for your deployment.