Java Reactive Way to read lines of a File - java

So I started to play with the Advent of Code and I would like to use Project Reactor to find the solutions in a reactive way.
I have implemented a solution that works partially, but not quite how I want it, because it can also read lines partially when there is no more space in the buffer.
The input to run the following function can be found here: https://adventofcode.com/2022/day/1/input
public static Flux<String> getLocalInputForStackOverflow(String filePath) throws IOException {
    Path dayPath = Path.of(filePath);
    FileOutputStream resultDay = new FileOutputStream(basePath.resolve("result_day.txt").toFile());
    return DataBufferUtils
            .readAsynchronousFileChannel(
                    () -> AsynchronousFileChannel.open(dayPath),
                    new DefaultDataBufferFactory(),
                    64)
            .map(DataBuffer::asInputStream)
            .map(db -> {
                try {
                    resultDay.write(db.readAllBytes());
                    resultDay.write("\n".getBytes());
                    return db;
                } catch (FileNotFoundException e) {
                    throw new RuntimeException(e);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            })
            .map(InputStreamReader::new)
            .map(is -> new BufferedReader(is).lines())
            .flatMap(Flux::fromStream);
}
The point of this function is to read the lines of the file in a reactive way.
I used the FileOutputStream to write what I read into another file and then compare the resulting file with the original, because I noticed that some lines are only partially read when there is no more space in the buffer. So the .map() with the try-catch can be ignored.
My questions here would be:
Is there a more optimal way to read files asynchronously in a reactive way?
Is there a more optimal way to read a file asynchronously line by line with a limited buffer, and to make sure that only whole lines are read?
Workarounds that I've found are:
Increase the buffer to read the whole file in one run -> not an optimal solution
Use the following function, but this raises a warning:
Possibly blocking call in non-blocking context could lead to thread starvation
public static Flux<String> getLocalInput1(int day) throws IOException {
    Path dayPath = getFilePath(day);
    return Flux.using(() -> Files.lines(dayPath),
            Flux::fromStream,
            BaseStream::close);
}
}

You're almost there. Just use BufferedReader instead of Files.lines.
In Spring WebFlux, a convenient way to read files asynchronously in a reactive way is Reactor's Flux.using method. It creates a Flux that consumes a resource, performs some operations on it, and then cleans up the resource when the Flux completes.
Example of reading a file asynchronously and reactively:
Flux<String> flux = Flux.using(
        // resource factory creates the FileReader instance
        () -> new FileReader("/path/to/file.txt"),
        // transformer function turns the FileReader into a Flux of lines
        reader -> Flux.fromStream(new BufferedReader(reader).lines()),
        // resource cleanup function closes the FileReader when the Flux terminates
        reader -> {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
);
Subscribe to the Flux and consume the lines of the file as they are emitted; this will print each line of the file to the console as it is read from the file.
flux.subscribe(line -> System.out.println(line));
In a similar way, we can solve it while controlling each line explicitly with Flux.generate:
Flux<String> flux = Flux.generate(
        // state supplier creates the BufferedReader
        () -> new BufferedReader(new FileReader("/path/to/file.txt")),
        // generator reads one line per request and emits it, completing at EOF
        (bufferedReader, sink) -> {
            try {
                String line = bufferedReader.readLine();
                if (line != null) {
                    sink.next(line);
                } else {
                    sink.complete();
                }
            } catch (IOException e) {
                sink.error(e);
            }
            return bufferedReader;
        },
        // state consumer closes the reader when the Flux terminates
        reader -> {
            try {
                reader.close();
            } catch (IOException e) {
                // Handle exception
            }
        }
);
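Note that both examples still perform the actual file reads on whatever thread subscribes. If that thread must not block (as the warning quoted in the question suggests), one option, sketched here assuming Reactor 3.3+ is available, is to shift the work onto the bounded elastic scheduler intended for blocking I/O:
Flux<String> lines = Flux.using(
        () -> new BufferedReader(new FileReader("/path/to/file.txt")),
        reader -> Flux.fromStream(reader.lines()),
        reader -> {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        })
        // subscribeOn moves the subscription, and therefore the blocking reads,
        // off the caller's thread and onto a scheduler sized for blocking work
        .subscribeOn(Schedulers.boundedElastic());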

Related

Java stream apply different logic to last element in a single pass

I have a stream of data from the database, using Spring Data JPA, that needs to be JSON-serialized and written to an HTTP response without being stored in memory. This is the sample code.
try (Stream<Employee> dataStream = empRepo.findAllStream()) {
    response.setHeader("content-type", "application/json");
    PrintWriter respWriter = response.getWriter();
    respWriter.write("["); // array begin
    dataStream.forEach(data -> {
        try {
            respWriter.write(jsonSerialize(data));
            respWriter.write(",");
        } catch (JsonProcessingException e) {
            log(e);
        }
        entityManager.detach(data);
    });
    respWriter.write("]"); // array end
    respWriter.flush();
} catch (IOException e) {
    log(e);
}
But this logic will write an extra comma after the last element. How can I avoid calling respWriter.write(","); for the last element?
There are solutions with stream operators (peek, reduce, etc.), but what's the most optimized solution? Is there something like Stream.hasNext() so that I can use an if condition inside forEach?
First I'd like to say that I don't think your problem is a good fit for a single pipeline stream. You are performing side effects both with the write call and the detach call. Maybe you are better off with a normal for-loop? Or with multiple streams instead?
That being said, you can use the technique that Eran describes in an answer to this question: Interleave elements in a stream with separator
try (Stream<Employee> dataStream = empRepo.findAllStream()) {
    response.setHeader("content-type", "application/json");
    PrintWriter respWriter = response.getWriter();
    respWriter.write("["); // array begin
    dataStream.map(data -> {
                try {
                    String json = jsonSerialize(data);
                    // NOTE! It is confusing to have side effects like this in a stream!
                    entityManager.detach(data);
                    return json;
                } catch (JsonProcessingException e) {
                    throw new RuntimeException(e);
                }
            })
            .flatMap(json -> Stream.of(",", json))
            .skip(1)
            .forEach(respWriter::write);
    respWriter.write("]"); // array end
    respWriter.flush();
} catch (IOException e) {
    log(e);
}
For this specific scenario, you can use Collectors.joining:
printWriter.write(dataStream
        .map(this::deserializeJson)
        .peek(entityManager::detach)
        .collect(Collectors.joining(",")));
However, since you are performing side effects (which are discouraged in streams), since you specifically asked about a hasNext() operation, and since this streaming solution would build a large string in memory, you might instead prefer converting the stream to an iterator and using an imperative loop:
Iterator<Employee> it = dataStream.iterator();
while (it.hasNext()) {
    Employee data = it.next();
    ...
    // skip writing the delimiter on the last entry
    if (it.hasNext()) {
        respWriter.write(",");
    }
}

Reactive Stream skip filter on certain Exception

Say I have a reactive stream like so:
Flux<App> apps = this.getApps(arg)
        .filter(res -> firstFilter())
        .filter(res -> secondFilter());
And say that the getApps() call fails with an UnsupportedOperationException. How could one skip the firstFilter and return a default value for the secondFilter when this exception is raised, without terminating the whole chain?
Note that UnsupportedOperationException should be the only exception that results in the firstFilter being skipped.
For example, one can use onErrorReturn or onErrorResume for a fallback, but they would complete the whole chain, and only onErrorResume can discriminate between exception types.
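For what it's worth, a sketch of that idea (not from the original post; defaultApp() is a hypothetical fallback factory): onErrorResume can be restricted to UnsupportedOperationException, so only that exception swaps in the default element, which then flows through the second filter like any other value.
Flux<App> apps = this.getApps(arg)
        .filter(res -> firstFilter())
        // only UnsupportedOperationException triggers the fallback;
        // any other error still terminates the chain
        .onErrorResume(UnsupportedOperationException.class,
                e -> Flux.just(defaultApp())) // defaultApp() is a placeholder for your default value
        .filter(res -> secondFilter());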
Stream<String> stream = null; // your type here
try {
    stream = this.getApps(arg).filter(res -> firstFilter());
} catch (UnsupportedOperationException ex) {
    stream = Stream.of("1", "2", "3"); // your default provided stream of strings, for example
} finally {
    stream = stream.filter(res -> secondFilter());
}

is it ok to use Parallel stream to FileWriter?

I want to write a Stream to a file. However, the Stream is big (a few GB when written to a file), so I want to use parallel processing. At the end of the process, I would like to write to a file (I am using FileWriter).
I would like to ask if that could potentially cause any problems with the file.
Here is some code.
The function to write the stream to a file:
public static void writeStreamToFile(Stream<String> ss, String fileURI) {
    try (FileWriter wr = new FileWriter(fileURI)) {
        ss.forEach(line -> {
            try {
                if (line != null) {
                    wr.write(line + "\n");
                }
            } catch (Exception ex) {
                System.err.println("error when writing file");
            }
        });
    } catch (IOException ex) {
        Logger.getLogger(OaStreamer.class.getName()).log(Level.SEVERE, null, ex);
    }
}
How I use my stream:
Stream<String> ss = Files.lines(path).parallel()
        .map(x -> dosomething(x))
        .map(x -> dosomethingagain(x));
writeStreamToFile(ss, "path/to/output.csv");
As others have mentioned, this approach should work; however, you should question whether it is the best method. Writing to a file is a shared operation between threads, meaning you are introducing thread contention.
While it is easy to think that having multiple threads will speed up performance, in the case of I/O operations the opposite is true. I/O throughput is bounded by the underlying device, so more threads will not increase performance. In fact, this I/O contention will slow down access to the shared resource because of the constant locking and unlocking required to write to it.
The bottom line is that only one thread can write to the file at a time, so parallelizing the write operations is counterproductive.
Consider using multiple threads to handle your CPU-intensive tasks, and then having all threads post to a queue/buffer. A single thread can then pull from the queue and write to your file, as sketched below. This solution (and more detail) was suggested in this answer.
Check out this article for more info on thread contention and locks.
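A minimal sketch of that producer/consumer arrangement (assuming the surrounding method declares throws IOException, InterruptedException; the POISON_PILL sentinel is just an illustrative way to signal the end of the stream):
BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
String POISON_PILL = "__EOF__"; // sentinel marking the end of input

// single writer thread: drains the queue and owns the FileWriter exclusively
Thread writer = new Thread(() -> {
    try (FileWriter wr = new FileWriter("path/to/output.csv")) {
        String line;
        while (!(line = queue.take()).equals(POISON_PILL)) {
            wr.write(line + "\n");
        }
    } catch (IOException | InterruptedException ex) {
        ex.printStackTrace();
    }
});
writer.start();

// many producer threads: do the CPU-heavy work in parallel and enqueue the results
Files.lines(path).parallel()
        .map(x -> dosomething(x))
        .forEach(result -> {
            try {
                queue.put(result);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

queue.put(POISON_PILL); // tell the writer we are done
writer.join();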
Yes, it is OK to use FileWriter as you are using it; I have some other approaches that may be helpful to you.
Since you are dealing with large files, FileChannel can be faster than standard IO. The following code writes a String to a file using FileChannel:
@Test
public void givenWritingToFile_whenUsingFileChannel_thenCorrect()
        throws IOException {
    RandomAccessFile stream = new RandomAccessFile(fileName, "rw");
    FileChannel channel = stream.getChannel();
    String value = "Hello";
    byte[] strBytes = value.getBytes();
    ByteBuffer buffer = ByteBuffer.allocate(strBytes.length);
    buffer.put(strBytes);
    buffer.flip();
    channel.write(buffer);
    stream.close();
    channel.close();

    // verify
    RandomAccessFile reader = new RandomAccessFile(fileName, "r");
    assertEquals(value, reader.readLine());
    reader.close();
}
Reference: https://www.baeldung.com/java-write-to-file
You can use Files.write with stream operations, as below, which converts the Stream to an Iterable:
Files.write(Paths.get(filepath), (Iterable<String>)yourstream::iterator);
For example:
Files.write(Paths.get("/dir1/dir2/file.txt"),
(Iterable<String>)IntStream.range(0, 1000).mapToObj(String::valueOf)::iterator);
If you have a stream of some custom objects, you can always add the .map(Object::toString) step to apply the toString() method.
It is not a problem as long as it is okay for the file to have the lines in arbitrary order. You are processing the content in parallel, not in sequence, so you have no guarantees about the point at which any given line comes in for processing.
That is the only thing to keep in mind here.
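If the order does matter, one option worth noting (a sketch, not part of the original answers; it assumes the surrounding method handles IOException) is forEachOrdered, which preserves the stream's encounter order even when the upstream work runs in parallel, at the cost of some parallelism on the terminal step:
try (FileWriter wr = new FileWriter("path/to/output.csv")) {
    Files.lines(path).parallel()
            .map(x -> dosomething(x))
            // forEachOrdered keeps the original line order in the output file
            .forEachOrdered(line -> {
                try {
                    wr.write(line + "\n");
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            });
}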

Close file for reading before starting to write

I wrote a method which replaces some lines in a file (it's not the purpose of this question). Everything works fine, but I'm wondering whether the file is closed for reading when I start writing. I'd like to ensure that my solution is safe. That's what I've done:
private void replaceCodeInTranslationFile(File file, String code) {
    if (file.exists()) {
        try (Stream<String> lines = Files.lines(Paths.get(file.getAbsolutePath()), Charset.defaultCharset())) {
            String output = this.getLinesWithUpdatedCode(lines, code);
            this.replaceFileWithContent(file, output); // is it safe?
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
Method replaceFileWithContent() looks like this:
private void replaceFileWithContent(File file, String content) throws IOException {
    try (FileOutputStream fileOut = new FileOutputStream(file.getAbsolutePath())) {
        fileOut.write(content.getBytes(Charset.defaultCharset()));
    }
}
I think that try-with-resources closes the resource at the end of the statement, so this code could potentially be a source of problems. Am I correct?
Read/write lock implementations may be helpful for this kind of scenario to ensure thread-safe operations, as in the sketch below.
Refer to http://tutorials.jenkov.com/java-concurrency/read-write-locks.html for more.
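A minimal sketch of that idea with the JDK's ReentrantReadWriteLock (the field and wrapper method names here are illustrative, not from the original code); the write lock is exclusive, so no reader can hold the read lock while the file is being rewritten:
private final ReadWriteLock fileLock = new ReentrantReadWriteLock();

private void replaceCodeInTranslationFileSafely(File file, String code) {
    fileLock.writeLock().lock(); // exclusive: blocks until all readers have released
    try {
        replaceCodeInTranslationFile(file, code);
    } finally {
        fileLock.writeLock().unlock();
    }
}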

Creating observables that do IO work

I have several resources in my app that I need to load and dump into my database on first launch. I want to do this in parallel.
So I created an observable wrapper around reading a file.
@Override
public Observable<List<T>> loadDataFromFile() {
    return Observable.create(new Observable.OnSubscribe<List<T>>() {
        @Override
        public void call(Subscriber<? super List<T>> subscriber) {
            LOG.info("Starting load from file for %s ON THREAD %d" + type, Thread.currentThread().getId());
            InputStream inputStream = null;
            try {
                Gson gson = JsonConverter.getExplicitGson();
                inputStream = resourceWrapper.openRawResource(resourceId);
                InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
                List<T> tList = gson.fromJson(inputStreamReader, type);
                subscriber.onNext(tList);
                subscriber.onCompleted();
                LOG.info("Completed load from file for " + type);
            } catch (Exception e) {
                LOG.error("An error occurred loading the file");
                subscriber.onError(e);
            } finally {
                if (inputStream != null) {
                    try {
                        inputStream.close();
                    } catch (IOException e) {
                    }
                }
            }
        }
    });
}
However, it's not asynchronous. There are two approaches to making this asynchronous that I see:
1) Do the asynchrony inside the observable: spawn a new thread or use a callback-based file reading API.
2) Use a scheduler to do the work on an I/O thread.
Again, for the DB I have to create my own observable that wraps the database's API, and there is a synchronous and an asynchronous version with a callback.
So what is the correct way of creating observables that do I/O work?
Secondly, how can I use these observables in a chain to read these files all in parallel, and then for each one store the contents in the DB? I want to receive an onCompleted event when the entire process is complete for all my reference data.
One good thing about Rx is that you can control on what thread your "work" is done. You can use
subscribeOn(Schedulers.io())
If you want to load resources in parallel I suggest using the merge (or mergeDelayError) operator.
Assuming you have a function
Observable<List<T>> loadDataFromresource(int resID)
to load one resource, you could first create a list of observables, one for each resource:
List<Observable<List<T>>> obsList = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    obsList.add(loadDataFromresource(i + 1).subscribeOn(Schedulers.io()));
}
associating a scheduler with each observable. Merge the observables using
Observable<List<T>> mergedObs = Observable.merge(obsList);
Subscribing to the resulting observable should then load the resources in parallel. If you'd like to delay errors until the end of the merged observable then use
Observable<List<T>> mergedObs = Observable.mergeDelayError(obsList);
I'm not a Java developer, but in C# this is basically how this kind of code should be structured:
public IObservable<string> LoadDataFromFile()
{
    return
        Observable.Using(
            () => new FileStream("path", FileMode.Open),
            fs =>
                Observable.Using(
                    () => new StreamReader(fs),
                    sr => Observable.Start(() => sr.ReadLine())));
}
Hopefully you can adapt from that.
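For reference, a rough Java adaptation of that shape (a sketch only, using RxJava 1.x's Observable.using to match the question's RxJava 1 API; the file path is illustrative):
Observable<String> lines = Observable.using(
        // resource factory: open the reader lazily, once per subscriber
        () -> {
            try {
                return new BufferedReader(new FileReader("path/to/file.txt"));
            } catch (FileNotFoundException e) {
                throw new RuntimeException(e);
            }
        },
        // observable factory: stream the lines from the open reader
        reader -> Observable.from((Iterable<String>) () -> reader.lines().iterator()),
        // dispose action: close the reader when the observable terminates
        reader -> {
            try {
                reader.close();
            } catch (IOException e) {
                // ignore or log
            }
        });
Combine it with subscribeOn(Schedulers.io()), as in the other answer, if the reads must happen off the calling thread.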
