I have a stream of data from the database using Spring Data JPA that needs to be serialized to JSON and written to an HTTP response, without buffering it all in memory. Here is the sample code:
try (Stream<Employee> dataStream = empRepo.findAllStream()) {
    response.setHeader("content-type", "application/json");
    PrintWriter respWriter = response.getWriter();
    respWriter.write("["); // array begin
    dataStream.forEach(data -> {
        try {
            respWriter.write(jsonSerialize(data));
            respWriter.write(",");
        } catch (JsonProcessingException e) {
            log(e);
        }
        entityManager.detach(data);
    });
    respWriter.write("]"); // array end
    respWriter.flush();
} catch (IOException e) {
    log(e);
}
But this logic writes an extra comma after the last element. How can I skip respWriter.write(","); for the last element?
There are solutions using stream operators such as peek and reduce, but what's the most efficient one? Is there something like Stream.hasNext() that I could use in an if condition inside forEach?
First, I'd like to say that I don't think your problem is a good fit for a single stream pipeline: you are performing side effects both with the write call and the detach call. Maybe you are better off with a normal for-loop, or with multiple streams instead?
That being said, you can use the technique that Eran describes in an answer to this question: Interleave elements in a stream with separator
try (Stream<Employee> dataStream = empRepo.findAllStream()) {
    response.setHeader("content-type", "application/json");
    PrintWriter respWriter = response.getWriter();
    respWriter.write("["); // array begin
    dataStream.map(data -> {
                try {
                    String json = jsonSerialize(data);
                    // NOTE! It is confusing to have side effects like this in a stream!
                    entityManager.detach(data);
                    return json;
                } catch (JsonProcessingException e) {
                    throw new RuntimeException(e);
                }
            })
            .flatMap(json -> Stream.of(",", json))
            .skip(1)
            .forEach(respWriter::write);
    respWriter.write("]"); // array end
    respWriter.flush();
} catch (IOException e) {
    log(e);
}
For this specific scenario, you can use Collectors.joining:

respWriter.write(dataStream
        .map(this::serializeJson)
        .peek(entityManager::detach)
        .collect(Collectors.joining(",")));
However, since you are performing side effects (which are discouraged in streams), since you specifically asked about a hasNext() operation, and since this streaming solution builds one large string in memory, you might instead prefer converting the stream to an iterator and using an imperative loop:
Iterator<Employee> it = dataStream.iterator();
while (it.hasNext()) {
    Employee data = it.next();
    ...
    // skip writing the delimiter after the last entry
    if (it.hasNext()) {
        respWriter.write(",");
    }
}
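Putting the iterator approach together with the code from the question, a minimal sketch might look like this (it reuses jsonSerialize, empRepo, entityManager and response exactly as they appear in the question):

try (Stream<Employee> dataStream = empRepo.findAllStream()) {
    response.setHeader("content-type", "application/json");
    PrintWriter respWriter = response.getWriter();
    respWriter.write("["); // array begin
    Iterator<Employee> it = dataStream.iterator();
    while (it.hasNext()) {
        Employee data = it.next();
        try {
            respWriter.write(jsonSerialize(data));
        } catch (JsonProcessingException e) {
            log(e);
        }
        entityManager.detach(data);
        if (it.hasNext()) {
            respWriter.write(","); // no delimiter after the last entry
        }
    }
    respWriter.write("]"); // array end
    respWriter.flush();
} catch (IOException e) {
    log(e);
}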
Related
I am writing code that picks multiple API call details from a file, executes them one by one, and collects the response data in an ArrayList. Below is my current code:
ArrayList<APICallDetails> apiCallDetailsArray = new ArrayList<>();
APICallDetails apiCallDetails = new APICallDetails();
for (count = 1; count <= callsCount; count++) {
    try {
        apiCallDetails = new APICallDetails();
        apiCallDetails.setName(property.getPropertyReader(callName + "_" + count + "_Name", propsFile));
        apiCallDetails.setHost(marketConfigs.getRawJson().get(property.getPropertyReader(callName + "_" + count + "_Host", propsFile)).toString().replaceAll("\"", ""));
        apiCallDetails.setPath(property.getPropertyReader(callName + "_" + count + "_Path", propsFile));
        apiCallDetails.setMethod(property.getPropertyReader(callName + "_" + count + "_Method", propsFile));
        apiCallDetails.setBody(property.getPropertyReader(callName + "_" + count + "_Body", propsFile));
        apiCallDetails = sendAPIRequest.mwRequestWithoutBody(apiCallDetails, marketConfigs);
        BufferedWriter out = null;
        try {
            out = new BufferedWriter(new FileWriter("C:\\file" + count + ".html"));
            out.write("something");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
            logger.error(new Date() + " - Error in " + getClass() + ".apiCallRequester() flow: " + e.toString());
        }
        apiCallDetailsArray.add(apiCallDetails);
    } catch (NullPointerException e) {
        e.printStackTrace();
        logger.error(new Date() + " - Error in " + getClass() + ".apiCallRequester() flow: " + e.toString());
    }
}
As there are many API calls, this takes the sum of the response times of all the calls. I want the calls to run in parallel and to store the response data in an ArrayList that I can use afterwards.
I am new to Java, so can someone please help me with this?
You can use parallel streams. The following invocation will call createAPICallDetails(idx) in parallel and collect the returned objects into a List:
List<APICallDetails> result = IntStream.range(0, callsCount)
        .parallel()
        .mapToObj(idx -> createAPICallDetails(idx))
        .collect(Collectors.toList());
So, the only thing left for you is to implement the logic of:

APICallDetails createAPICallDetails(int index) { ... }

to create a single APICallDetails object for the given index, so it can be used in the lambda above.
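A hedged sketch of that method, simply mirroring the loop body from your question (note that your loop counts from 1, so IntStream.rangeClosed(1, callsCount) may match your property keys better than range(0, callsCount)):

APICallDetails createAPICallDetails(int index) {
    // Illustrative only: same setters and helpers as in the question
    APICallDetails details = new APICallDetails();
    details.setName(property.getPropertyReader(callName + "_" + index + "_Name", propsFile));
    details.setHost(marketConfigs.getRawJson()
            .get(property.getPropertyReader(callName + "_" + index + "_Host", propsFile))
            .toString().replaceAll("\"", ""));
    details.setPath(property.getPropertyReader(callName + "_" + index + "_Path", propsFile));
    details.setMethod(property.getPropertyReader(callName + "_" + index + "_Method", propsFile));
    details.setBody(property.getPropertyReader(callName + "_" + index + "_Body", propsFile));
    return sendAPIRequest.mwRequestWithoutBody(details, marketConfigs);
}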
Hope this helps.
Here is an example that does not compile, because not every path returns a value:
public Path writeToFile() {
    try {
        Path tempFilePath = Files.createTempFile(Paths.get(""), "sorting_test_", ".txt");
        BufferedWriter bw = new BufferedWriter(new FileWriter(tempFilePath.toFile()));
        for (List<Integer> arr : arrays) {
            // Convert the ints to strings, join them into a single line and write it
            bw.write(arr.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(" ")));
            bw.newLine();
        }
        bw.close();
        return tempFilePath;
    } catch (IOException e) {
        e.printStackTrace();
    }
}
I know that I can do it like this:

public Path writeToFile() {
    Path tempFilePath = null;
    // try-catch { ... }
    return tempFilePath;
}

But it looks ugly. Is there a more natural way to solve this task?
Here are some possible solutions:
Change the method signature to public void writeToFile(). Don't return the Path. (But this probably won't work for you: you probably need the Path.)
Add return null; at the end of the method. This has the disadvantage that the caller needs to deal with the case where null is returned ... or else they will get NPEs when they attempt to use the non-existent Path.
This is equivalent to your "ugly" solution. It is debatable which is better from a stylistic perspective. (A dogmatic "structured programming" person would say your way is better!)
Change the signature to return an Optional<Path>. This is a better alternative than returning an explicit null. If you implement it correctly, the caller is effectively forced to deal with the "absent" case.
Remove the try catch and change the signature of the method to public Path writeToFile() throws IOException. The caller has to deal with the checked exception, but that may be a good thing!
I should point out that your code is not handling resources properly. You should be using try-with-resources to ensure that the stream created by the FileWriter is always closed; otherwise there is a risk of leaking file descriptors, which could ultimately result in unexpected I/O errors.
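For illustration, a minimal sketch combining the last option (throws IOException) with try-with-resources, assuming the same arrays field as in the question:

public Path writeToFile() throws IOException {
    Path tempFilePath = Files.createTempFile(Paths.get(""), "sorting_test_", ".txt");
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(tempFilePath.toFile()))) {
        for (List<Integer> arr : arrays) {
            bw.write(arr.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(" ")));
            bw.newLine();
        }
    }
    return tempFilePath; // the writer is closed even if an exception is thrown
}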
If you don't want to return null, I would prefer using Optional from Java 8:
public Optional<Path> writeToFile() {
    try {
        Path tempFilePath = Files.createTempFile(Paths.get(""), "sorting_test_", ".txt");
        BufferedWriter bw = new BufferedWriter(new FileWriter(tempFilePath.toFile()));
        for (List<Integer> arr : arrays) {
            // Convert the ints to strings, join them into a single line and write it
            bw.write(arr.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(" ")));
            bw.newLine();
        }
        bw.close();
        return Optional.of(tempFilePath);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return Optional.empty();
}
So in the caller method you can use

public void ifPresent(Consumer<? super Path> consumer)

or

public boolean isPresent()
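For example (process is just a placeholder for whatever the caller does with the Path):

writeToFile().ifPresent(path -> System.out.println("Written to " + path));

// or with an explicit check:
Optional<Path> maybePath = writeToFile();
if (maybePath.isPresent()) {
    process(maybePath.get());
}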
I don't know why you're looking for a more "natural" solution, but you could just return null in your catch block.
Another solution: instead of swallowing the IOException (an antipattern), convert it to an appropriate subclass of RuntimeException and throw it from the catch block.
Also, in your example, you are leaking a file handle by not closing the FileWriter on exception.
public Path writeToFile() {
    final Path tempFilePath;
    try {
        tempFilePath = Files.createTempFile(Paths.get(""), "sorting_test_", ".txt");
    } catch (IOException e) {
        throw new MyRuntimeException("Cannot create sorting_test temp file", e);
    }
    try (final FileWriter fw = new FileWriter(tempFilePath.toFile());
         final BufferedWriter bw = new BufferedWriter(fw)) {
        for (List<Integer> arr : arrays) {
            // Convert the ints to strings, join them into a single line and write it
            bw.write(arr.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(" ")));
            bw.newLine();
        }
        return tempFilePath;
    } catch (IOException e) {
        throw new MyRuntimeException("Cannot write to " + tempFilePath, e);
    }
}
The most appropriate way is to keep the return statement in the try block.
If we put a return statement in finally (or after the catch), we might be swallowing the exception.
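For illustration, a return in finally really does discard an in-flight exception (a contrived sketch):

int swallow() {
    try {
        throw new IllegalStateException("never seen by the caller");
    } finally {
        return -1; // the IllegalStateException silently disappears here
    }
}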
I read a file, create objects from it, and store them in a PostgreSQL database. My file has 100,000 documents that I read from one file, split, and finally store in the database.
I can't create a List<> and keep all the documents in it, because I have very little RAM. My code to read and write to the database is below, but my JVM heap fills up and I cannot continue to store more documents. How do I read the file and store the documents in the database efficiently?
public void readFile() {
    StringBuilder wholeDocument = new StringBuilder();
    try {
        bufferedReader = new BufferedReader(new FileReader(files));
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            if (line.contains("<page>")) {
                wholeDocument.append(line);
                while ((line = bufferedReader.readLine()) != null) {
                    wholeDocument = wholeDocument.append("\n" + line);
                    if (line.contains("</page>")) {
                        System.out.println(count++);
                        addBodyToDatabase(wholeDocument.toString());
                        wholeDocument.setLength(0);
                        break;
                    }
                }
            }
        }
        wikiParser.commit();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            bufferedReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

public void addBodyToDatabase(String wholeContent) {
    Page page = new Page(new Timestamp(System.currentTimeMillis()), wholeContent);
    database.addPageToDatabase(page);
}

public static int counter = 1;

public void addPageToDatabase(Page page) {
    session.save(page);
    if (counter % 3000 == 0) {
        commit();
    }
    counter++;
}
First of all you should apply a fork-join approach here.
The main task parses the file and sends batches of at most 100 items to an ExecutorService. The ExecutorService should have a number of worker threads equal to the number of available database connections. If you have 4 CPU cores, let's say the database can take 8 concurrent connections without too much context switching.
You should then configure a connection-pooling DataSource with minSize equal to maxSize, both set to 8. Try HikariCP or ViburDBCP for connection pooling.
Then you need to configure JDBC batching. If you're using MySQL, the IDENTITY generator will disable batching. If you're using a database that supports sequences, make sure you also use the enhanced identifier generators (they are the default option in Hibernate 5.x).
This way the entity insert process is parallelized and decoupled from the main parsing thread. The main thread should wait for the ExecutorService to finish processing all tasks prior to shutting down.
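A minimal sketch of that producer/worker split (insertBatch is a placeholder for your actual batch insert, and the sizes are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelImporter {
    private static final int BATCH_SIZE = 100;
    private static final int WORKERS = 8; // match the connection pool size

    private final ExecutorService executor = Executors.newFixedThreadPool(WORKERS);

    public void importPages(Iterable<String> pages) throws InterruptedException {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        for (String page : pages) {
            batch.add(page);
            if (batch.size() == BATCH_SIZE) {
                final List<String> toInsert = batch; // hand off the full batch
                executor.submit(() -> insertBatch(toInsert));
                batch = new ArrayList<>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) {
            final List<String> toInsert = batch;
            executor.submit(() -> insertBatch(toInsert));
        }
        executor.shutdown(); // then wait for all pending inserts to finish
        executor.awaitTermination(1, TimeUnit.HOURS);
    }

    private void insertBatch(List<String> pages) { /* JDBC/Hibernate batch insert */ }
}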
Actually, it is hard to suggest anything without real profiling to find out what is making your code slow or inefficient.
However, there are several things we can see from your code:
You are using StringBuilder inefficiently:

wholeDocument.append("\n" + line);

should be written as

wholeDocument.append("\n").append(line);

instead, because what you originally wrote is translated by the compiler to

wholeDocument.append(new StringBuilder("\n").append(line).toString());

You can see how many unnecessary StringBuilders are created :)
Considerations when using Hibernate
I am not sure how you manage your session or how you implemented your commit(); I assume you have done it right, but there are still more things to consider:
Have you properly set up the batch size in Hibernate (hibernate.jdbc.batch_size)? By default, the JDBC batch size is something around 5. You may want to set it to a bigger value so that internally Hibernate sends the inserts in bigger batches.
Given that you do not need the entities in the first-level cache for later use, you may want to do an intermittent session flush() + clear() to trigger the batch inserts mentioned in the previous point and to clear out the first-level cache (see the sketch below).
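As a minimal sketch (BATCH_SIZE is illustrative and should match hibernate.jdbc.batch_size; documents stands for your parsed <page> blocks):

int i = 0;
for (String content : documents) {
    session.save(new Page(new Timestamp(System.currentTimeMillis()), content));
    if (++i % BATCH_SIZE == 0) {
        session.flush(); // push the pending inserts as a JDBC batch
        session.clear(); // evict the managed entities from the first-level cache
    }
}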
Switch away from Hibernate for this feature.
Hibernate is cool, but it is not a panacea for everything. Given that in this feature you are just saving records into the DB based on text file content, you neither need any entity behavior nor the first-level cache for later processing, so there is not much reason to use Hibernate here given the extra processing and space overhead. Simply doing JDBC with manual batch handling is going to save you a lot of trouble (a sketch follows).
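For illustration, a plain-JDBC version with manual batching might look like this (the table and column names are made up here; adapt them to your Page mapping):

String sql = "INSERT INTO page (created_at, content) VALUES (?, ?)";
try (Connection con = dataSource.getConnection();
     PreparedStatement ps = con.prepareStatement(sql)) {
    con.setAutoCommit(false);
    int i = 0;
    for (String content : documents) {
        ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
        ps.setString(2, content);
        ps.addBatch();
        if (++i % 1000 == 0) {
            ps.executeBatch(); // one round trip per 1000 rows
        }
    }
    ps.executeBatch(); // flush the remainder
    con.commit();
} catch (SQLException e) {
    throw new RuntimeException(e);
}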
I used #RookieGuy's answer.
stackoverflow.com/questions/14581865/hibernate-commit-and-flush
I use
session.flush();
session.clear();
and finally, after reading all documents and storing them in the database:
tx.commit();
session.close();
and change
wholeDocument = wholeDocument.append("\n" + line);
to
wholeDocument.append("\n" + line);
I'm not very sure about the structure of your data file. It would be easier to understand if you could provide a sample of your file.
The root cause of the memory consumption is the way the file is read and iterated: once something is read, it stays in memory. You should rather use either java.io.FileInputStream or org.apache.commons.io.FileUtils.
Here is sample code to iterate with java.io.FileInputStream:
try (FileInputStream inputStream = new FileInputStream("/tmp/sample.txt");
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        addBodyToDatabase(line);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Here is sample code to iterate with org.apache.commons.io.FileUtils:
File file = new File("/tmp/sample.txt");
LineIterator it = FileUtils.lineIterator(file, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        addBodyToDatabase(line);
    }
} finally {
    LineIterator.closeQuietly(it);
}
You should begin a transaction, do the save operations, and then commit the transaction (don't begin a transaction after save!). You can try to use a StatelessSession to avoid the memory consumed by the session cache; a sketch follows below.
And use a smaller value, for example 20, in this code:

if (counter % 20 == 0)

You can also try to pass the StringBuilder as a method argument as far as possible.
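To illustrate the StatelessSession suggestion (a sketch only; a StatelessSession bypasses the first-level cache entirely, so memory use stays flat no matter how many pages are inserted; documents stands for your parsed pages):

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
try {
    for (String content : documents) {
        session.insert(new Page(new Timestamp(System.currentTimeMillis()), content));
    }
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}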
I'm basically attempting to send video data and trying to understand how this whole process works; I'm not sure whether I've put this together properly. Any help would be greatly appreciated.
public void OutputStream(BufferedOutputStream out) throws MalformedURLException {
    URL url = new URL("http://www.android.com//");
    HttpURLConnection urlConnection = null;
    try {
        urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.setDoOutput(true);
        urlConnection.setChunkedStreamingMode(0);
        out = new BufferedOutputStream(urlConnection.getOutputStream());
        out = new BufferedOutputStream(new FileOutputStream(String.valueOf(mVideoUri)), 8 * 1024);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        assert urlConnection != null;
        urlConnection.disconnect();
    }
}
You aren't actually using the output stream at all here, merely reassigning it several times. The parameter out is reassigned within the method but never used before that, which doesn't seem like what you want: whatever output stream reference was passed into this method is simply discarded.
You reassign out once more and discard the buffered stream of the connection on this line:

new BufferedOutputStream(urlConnection.getOutputStream());

which is mostly harmless in that no resource is leaked (given that disconnect() is called), but once again doesn't seem like what you want to do.
Your code also leaks a resource on the last out, since it is not closed anywhere within the try-catch-finally block, which is a serious flaw. Additionally, the assertion used to null-check urlConnection should be promoted to an if-statement to handle the very real possibility that it is null after a failed URL resolution/open. Assertions can be turned off, in which case you'd get an NPE (and when turned on, you'll get an AssertionError; neither of which is better).
Whilst it's hard to anticipate exactly what your project structure is, the general contract of output stream usage can be seen as follows:
public void foo() {
    OutputStream out = null;
    byte[] data = ... // Populated from some data source
    try {
        out = ... // Populated from some source
        out.write(data); // Writes the data to the output destination
    } catch (IOException ex) {
        // Handle the exception here
    } finally {
        // Only attempt to close the output stream if it was actually opened successfully
        if (out != null) {
            try {
                out.close();
            } catch (IOException closeEx) {
                // Handle, propagate upwards or log it
            }
        }
    }
}
The output stream is used within the try block such that any exception will result in the finally block closing the stream as appropriate, removing the resource leakage. Note the sample write() call in the try block, illustrating in its most basic form how an OutputStream can be used to put data into some destination.
Under Java 7 and above, the example is more compact:
public void foo() {
    byte[] data = ... // Populated from some data source
    try (OutputStream out = ...) {
        out.write(data); // Writes the data to the output destination
    } catch (IOException ex) {
        // Handle the exception here
    }
}
Utilizing try-with-resources, resource safety is assured thanks to the AutoCloseable interface and Java 7's new syntax. There is one small difference: exceptions from closing the stream are also funneled into the same catch block instead of being handled separately, as in the first example.
Is there any way to check whether the readObject method of ObjectInputStream has finished reading the file, other than catching the exceptions it throws?
And if not, how can I make the outNewmast.writeObject(accountRecord); statement reachable in this case?
// read oldmast.ser
try {
    while (true) {
        accountRecord = (AccountRecord) inOldmast.readObject();
        // read trans.ser
        while (true) {
            transactionRecord = (TransactionRecord) inTrans.readObject();
            if (transactionRecord.getAccountNumber() == accountRecord.getAccount()) {
                accountRecord.combine(transactionRecord);
            } // end if
        } // end inner while
        outNewmast.writeObject(accountRecord);
    } // end while
} // end try
catch (ClassNotFoundException e) {
    System.err.println("Error reading file.");
    System.exit(1);
} // end catch
catch (IOException e) {
    System.err.println("Error reading file.");
    System.exit(1);
} // end catch
The best idea would be to serialize the number of elements beforehand, so you could just do:
cnt = file.readInt();
for (int i = 0; i < cnt; i++) {
    file.readObject();
}
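The corresponding write side would record the count before the objects (an illustrative sketch, assuming a records list holding what you want to serialize):

try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("trans.ser"))) {
    out.writeInt(records.size()); // the reader knows exactly how many objects follow
    for (TransactionRecord record : records) {
        out.writeObject(record);
    }
} catch (IOException e) {
    e.printStackTrace();
}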
The method proposed by #ChrisCooper is not reliable, as stated in the documentation: some streams don't implement it, or return an approximate result (in theory, it can even return 0 when there is still some data left; a network stream, for example).
Therefore, looking at the same documentation, we find this particular block:
Any attempt to read object data which exceeds the boundaries of the
custom data written by the corresponding writeObject method will cause
an OptionalDataException to be thrown with an eof field value of true.
Non-object reads which exceed the end of the allotted data will
reflect the end of data in the same way that they would indicate the
end of the stream: bytewise reads will return -1 as the byte read or
number of bytes read, and primitive reads will throw EOFExceptions. If
there is no corresponding writeObject method, then the end of default
serialized data marks the end of the allotted data.
So, the best idea would be to catch an OptionalDataException and check its eof field for true.
And to distill the answer even further, here's the method you want:
TransactionRecord readRecord(ObjectInputStream stream) throws IOException, ClassNotFoundException {
    TransactionRecord transactionRecord;
    try {
        transactionRecord = (TransactionRecord) stream.readObject();
    } catch (OptionalDataException e) {
        if (e.eof) {
            return null;
        } else {
            throw e;
        }
    }
    return transactionRecord;
}
.....
TransactionRecord record;
while ((record = readRecord(inTrans)) != null) {
    doSomethingWithRecord(record);
}
endOfFile();
Yes, check the input stream to see if anything more is available:
http://docs.oracle.com/javase/6/docs/api/java/io/InputStream.html#available()
if (inOldmast.available() > 0) {
    // read and process
} else {
    // Close the stream and clean up
}