Why does Spring WebFlux (or Java NIO) create multipartData DataBuffer tmp files?
In my case on macOS, files like /private/var/folders/v6/vtrxqpbd4lb3pq8v_sbm10hc0000gn/T/nio-file-upload/nio-body-1-82f11dbe-61b3-4e5d-8c43-92e02aa38481.tmp are created on each request and then deleted.
Is it possible to improve performance by preventing the disk write?
This is my code:
public class FileHandler {
    public Mono<ServerResponse> postFile(ServerRequest req) {
        val file = req.multipartData()
                .map(map -> map.getFirst("file"))
                .ofType(FilePart.class);
        val buffer = file.flatMap(part -> part.content().next());
        val hash = buffer.map(d -> {
            try {
                val md = MessageDigest.getInstance("SHA-1");
                md.update(d.asByteBuffer());
                return Base64Utils.encodeToString(md.digest());
            } catch (NoSuchAlgorithmException e) {
                // does not reach here!
                return "";
            }
        });
        val name = file.map(FilePart::filename);
        return ok().body(hash, String.class);
    }
}
The multipart file support in Spring WebFlux uses the Synchronoss NIO Multipart library. The downside of that implementation is that it's not fully reactive, and as a result it can create temporary files to avoid loading the whole content in memory.
What makes you think that this behavior is a performance problem? Do you have a sample or benchmark results that show that this is an issue?
The Spring Framework team already worked on this and a fully reactive implementation will be available as the default in Spring Framework 5.2 (see spring-framework#21659).
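As a side note, independent of which multipart reader is in use, the hash in the question only covers the first DataBuffer of the part (content().next()). Below is a minimal sketch (the helper name sha1Of is illustrative) that digests the entire content stream in memory using Spring's DataBuffer API; it does not by itself change whether the multipart reader buffers to disk.
import java.security.MessageDigest;

import org.springframework.core.io.buffer.DataBufferUtils;
import org.springframework.http.codec.multipart.FilePart;
import org.springframework.util.Base64Utils;

import reactor.core.publisher.Mono;

class PartHashing {

    // Digest every DataBuffer of the part instead of only the first one.
    static Mono<String> sha1Of(FilePart part) {
        return Mono.fromCallable(() -> MessageDigest.getInstance("SHA-1"))
                .flatMap(md -> part.content()
                        .reduce(md, (digest, buffer) -> {
                            digest.update(buffer.asByteBuffer());
                            DataBufferUtils.release(buffer); // release pooled buffers
                            return digest;
                        }))
                .map(digest -> Base64Utils.encodeToString(digest.digest()));
    }
}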
Related
I am writing a service that obtains data from a large SQL query in the database (over 100,000 records) and streams it into a CSV file served by an API. Is there any Java library function that does this, or any way to make the code below more efficient? Currently using Java 8 in a Spring Boot environment.
The code is below, with the SQL repository method and the service for the CSV. Preferably I am trying to write to the CSV file while the data is being fetched from SQL concurrently, as the query may take 2-3 minutes for the user.
We are using Snowflake DB.
public class ProductService {
private final ProductRepository productRepository;
private final ExecutorService executorService;
public ProductService(ProductRepository productRepository) {
this.productRepository = productRepository;
this.executorService = Executors.newFixedThreadPool(20);
}
public InputStream getproductExportFile(productExportFilters filters) throws IOException {
PipedInputStream is = new PipedInputStream();
PipedOutputStream os = new PipedOutputStream(is);
executorService.execute(() -> {
try {
Stream<productExport> productStream = productRepository.getproductExportStream(filters);
Field[] fields = Stream.of(productExport.class.getDeclaredFields())
.peek(f -> f.setAccessible(true))
.toArray(Field[]::new);
String[] headers = Stream.of(fields)
.map(Field::getName).toArray(String[]::new);
CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
.setHeader(headers)
.build();
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os);
CSVPrinter csvPrinter = new CSVPrinter(outputStreamWriter, csvFormat);
productStream.forEach(productExport -> writeProductExportToCsv(productExport, csvPrinter, fields));
csvPrinter.close();
outputStreamWriter.close();
} catch (Exception e) {
logger.warn("Unable to complete writing to csv stream.", e);
} finally {
try {
os.close();
} catch (IOException ignored) { }
}
});
return is;
}
private void writeProductExportToCsv(productExport productExport, CSVPrinter csvPrinter, Field[] fields) {
Object[] values = Stream.of(fields).
map(f -> {
try {
return f.get(productExport);
} catch (IllegalAccessException e) {
return null;
}
})
.toArray();
try {
csvPrinter.printRecord(values);
csvPrinter.flush();
} catch (IOException e) {
logger.warn("Unable to write record to file.", e);
}
}
public Stream<productExport> getproductExportStream(productExportFilters filters) {
MapSqlParameterSource parameterSource = new MapSqlParameterSource();
parameterSource.addValue("customerId", filters.getCustomerId().toString());
parameterSource.addValue("practiceId", filters.getPracticeId().toString());
StringBuilder sqlQuery = new StringBuilder("SELECT * FROM dbo.Product ");
sqlQuery.append("\nWHERE CUSTOMERID = :customerId\n" +
"AND PRACTICEID = :practiceId\n"
);
Streaming allows you to transfer the data, little by little, without having to load it all into the server’s memory. You can do your operations by using the extractData() method in ResultSetExtractor. You can find javadoc about ResultSetExtractor here.
You can view an example using ResultSetExtractor here.
You can also run your queries and process the ResultSet easily using JdbcTemplate; you can take a look at an example of using ResultSetExtractor with JdbcTemplate here.
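As a sketch of that approach (the class name StreamingCsvExtractor is illustrative; it assumes the NamedParameterJdbcTemplate and Commons CSV CSVPrinter already used in the question), each row is written to the CSV as it is read from the ResultSet, so nothing beyond the current fetch batch is held in memory:
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.sql.ResultSetMetaData;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.springframework.jdbc.core.RowCallbackHandler;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

public class StreamingCsvExtractor {

    private final NamedParameterJdbcTemplate jdbcTemplate;

    public StreamingCsvExtractor(NamedParameterJdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Writes each row to the CSV while the ResultSet is being iterated.
    public void exportToCsv(String sql, MapSqlParameterSource params, Writer out) throws IOException {
        try (CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT)) {
            jdbcTemplate.query(sql, params, (RowCallbackHandler) rs -> {
                try {
                    ResultSetMetaData meta = rs.getMetaData();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        printer.print(rs.getObject(i));
                    }
                    printer.println();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}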
There is a product we bought for our company some time ago (we even got the source code back then): https://northconcepts.com/. We also evaluated Apache Camel, which had similar support, but it didn't suit our goal. If you really need speed you should go to the lowest level possible: pure JDBC and the simplest possible CSV writer.
The Northconcepts library itself provides the capability to read from JDBC and write to CSV at a lower level. We found a few tweaks that sped up the transmission and processing. With a single thread we are able to stream 100,000 records (with 400 columns) within 1-2 minutes.
Given that you didn't specify which database you use, I can only give you generic answers.
In general, code like this is network-bound: a JDBC ResultSet is usually transferred in packets of "only n rows", and only when you exhaust one does the database trigger fetching of the next packet. This property is often called the fetch size, and you should increase it significantly. With default settings, most databases transfer 10-100 rows per fetch. In Spring you can use the setFetchSize property. Some benchmarks here.
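For example (a sketch; the helper name is illustrative and 5000 is only a starting point to tune against your row width and network), the fetch size can be raised on Spring's JdbcTemplate like this:
import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

class JdbcFetchSizeConfig {

    // Ask the JDBC driver to transfer rows in larger batches than the 10-100 row default.
    static NamedParameterJdbcTemplate largeFetchTemplate(DataSource dataSource) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        jdbcTemplate.setFetchSize(5000);
        return new NamedParameterJdbcTemplate(jdbcTemplate);
    }
}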
There is other similar low-level tuning you can do. For example, the Oracle JDBC driver has "InsensitiveResultSetBufferSize", which controls how big (in bytes) the buffer holding the result set is. But these things tend to be database-specific.
That being said, the best way to really increase the speed of your transfer is to launch multiple queries. Divide your data by some column value and then launch several queries in parallel. Essentially, if you can design the data to support parallel queries working on easily distinguished subsets, the bottleneck shifts to network or CPU throughput.
For example, one of your columns might be 'timestamp'. Instead of having one query fetch all rows, fetch multiple subsets of rows with a query like this:
SELECT * FROM dbo.Product
WHERE CUSTOMERID = :customerId
AND PRACTICEID = :practiceId
AND :lowerLimit <= timestamp AND timestamp < :upperLimit
Launch this query in parallel with different timestamp ranges, aggregate the results of those subqueries in a shared ConcurrentLinkedQueue, and build the CSV there.
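A rough sketch of that idea (class and method names are illustrative, and rangeQuery stands in for a repository call that runs the range-limited query above): one query per timestamp range, all draining into a shared ConcurrentLinkedQueue.
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

class ParallelRangeExport {

    // boundaries define consecutive [lowerLimit, upperLimit) timestamp ranges;
    // rangeQuery runs the SQL above with those two parameters bound.
    static <T> Queue<T> fetchInParallel(List<LocalDateTime> boundaries,
                                        BiFunction<LocalDateTime, LocalDateTime, List<T>> rangeQuery)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, boundaries.size() - 1));
        Queue<T> results = new ConcurrentLinkedQueue<>();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < boundaries.size() - 1; i++) {
            LocalDateTime lower = boundaries.get(i);
            LocalDateTime upper = boundaries.get(i + 1);
            futures.add(pool.submit(() -> results.addAll(rangeQuery.apply(lower, upper))));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for every range query and propagate failures
        }
        pool.shutdown();
        return results; // build the CSV from this shared queue
    }
}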
With a similar approach I regularly read 100,000 rows/sec from an 80-column table in an Oracle DB. That is a 40-60 MB/sec sustained transfer rate from a table that is not even locked.
In PrimeFaces 8.0, DefaultStreamedContent cannot be initialized like new DefaultStreamedContent(inputStream, contentType, name) because that constructor has been deprecated; instead you should use DefaultStreamedContent.builder().
However, .stream() now asks for a SerializableSupplier<InputStream> instead of an InputStream as in the versions before 8.0.
DefaultStreamedContent.builder().contentType(contentType).name(name).stream(is).build();
^^
How can I convert an InputStream to a SerializableSupplier?
Everything is in the migration guide here: https://github.com/primefaces/primefaces/wiki/Migration-Guide.
In general the following will work:
DefaultStreamedContent.builder().contentType(contentType).name(name).stream(() -> is).build();
But the idea behind the change is different.
If you use a RequestScoped bean to build the StreamedContent, your bean and therefore the StreamedContent will be created twice:
when rendering the view
when streaming the resource (this is a new browser request!)
In this case, your is will probably be created two times. Most of the time this results in one useless IO access or DB call.
To create the is only once, you should lazily initialize it via the supplier lambda:
DefaultStreamedContent.builder().contentType(contentType).name(name).stream(() -> new FileInputStream(....)).build();
This worked for me
DataHandler dataHandler = someBean.getFileData();
byte contents[] = IOUtils.toByteArray(dataHandler.getInputStream());
StreamedContent streamedContent = DefaultStreamedContent.builder()
.name(someBean.getFileName())
.contentType("application/octet-stream")
.stream(() -> new ByteArrayInputStream(contents)).build();
The lazy-initialize answer above by @tandraschko did not work for me in NetBeans using Java 8. I had to have the FileInputStream created before injecting it into the builder.
So my code looks like :
public StreamedContent getFiledownload() throws FileNotFoundException {
    FileInputStream fis = new FileInputStream("...");
    filedownload = DefaultStreamedContent.builder()
            .contentType("...")
            .name("...")
            .stream(() -> fis)
            .build();
    return filedownload;
}
Thought I would comment just in case someone else was running into compiling issues.
For a MySQL-stored image I use this:
resultset = statement.executeQuery("call sp_query()");
if (resultset.next()) {
    // Read the BLOB while the ResultSet is still open, then hand it to the supplier.
    InputStream picture = resultset.getBinaryStream("picture");
    StreamedContent photo = DefaultStreamedContent.builder()
            .contentType("contentType")
            .name("name")
            .stream(() -> picture)
            .build();
}
// Close the connection
con.close();
I am writing a quick proof of concept for downloading images from Azure Blob Storage using the v12 Azure Storage SDK for Java. The following code works properly when I convert it to synchronous. However, despite the subscribe() at the bottom of the code, I only see the subscription message; the success and error handlers are not firing. I would appreciate any suggestions or ideas.
Thank you for your time and help.
private fun azureReactorDownload() {
var startTime = 0L
var accountName = "abcd"
var key = "09sd0908sd08f0s&&6^%"
var endpoint = "https://${accountName}.blob.core.windows.net/$accountName"
var containerName = "mycontainer"
var blobName = "animage.jpg"
// Get the Blob Service client, so we can use it to access blobs, containers, etc.
BlobServiceClientBuilder()
// Container URL
.endpoint(endpoint)
.credential(
SharedKeyCredential(
accountName,
key
)
)
.buildAsyncClient()
// Get the container client so we can work with our container and its blobs.
.getContainerAsyncClient(containerName)
// Get the block blob client so we can access individual blobs and include the path
// within the container as part of the filename.
.getBlockBlobAsyncClient(blobName)
// Initiate the download of the desired blob.
.download()
.map { response ->
// Drill down to the ByteBuffer.
response.value()
}
.doOnSubscribe {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subscription arrived.")
startTime = System.currentTimeMillis()
}
.doOnSuccess { data ->
data.map { byteBuffer ->
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> READY TO WRITE TO THE FILE")
byteBuffer.writeToFile("/tmp/azrxblobdownload.jpg")
val elapsedTime = System.currentTimeMillis() - startTime
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Finished downloading blob in $elapsedTime ms.")
}
}
.doOnError {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Failed to download blob: ${it.localizedMessage}")
}
.subscribe()
}
fun ByteBuffer.writeToFile(path: String) {
val fc = FileOutputStream(path).channel
fc.write(this)
fc.close()
}
I see someone asking the same question 4 months ago and getting no answer:
Azure Blob Storage Java SDK: Why isn't asynchronous working?
I'm going to conjecture that this part of the SDK just isn't working right now. I wouldn't recommend using this version of Azure's Java SDK.
You should be able to accomplish it another way; perhaps one of these answers will help:
Downloading Multiple Files Parallelly or Asynchronously in Java
I've worked with Microsoft and have a documented solution at the following link: https://github.com/Azure/azure-sdk-for-java/issues/5071. The person who worked with me provided very good background information, so it is more than just some working code.
I have opened a similar query with Microsoft for the downloadToFile() method in the Azure Java SDK v12, which is throwing an exception when saving to a file.
Here is the working code from that posting:
private fun azureReactorDownloadMS() {
var startTime = 0L
val chunkCounter = AtomicInteger(0)
// Get the Blob Service client, so we can use it to access blobs, containers, etc.
val aa = BlobServiceClientBuilder()
// Container URL
.endpoint(kEndpoint)
.credential(
SharedKeyCredential(
kAccountName,
kAccountKey
)
)
.buildAsyncClient()
// Get the container client so we can work with our container and its blobs.
.getContainerAsyncClient(kContainerName)
// Get the block blob client so we can access individual blobs and include the path
// within the container as part of the filename.
.getBlockBlobAsyncClient(kBlobName)
.download()
// Response<Flux<ByteBuffer>> to Flux<ByteBuffer>
.flatMapMany { response ->
response.value()
}
.doOnSubscribe {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subscription arrived.")
startTime = System.currentTimeMillis()
}
.doOnNext { byteBuffer ->
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CHUNK ${chunkCounter.incrementAndGet()} FROM BLOB ARRIVED...")
}
.doOnComplete {
val elapsedTime = System.currentTimeMillis() - startTime
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Finished downloading ${chunkCounter.incrementAndGet()} chunks of data for the blob in $elapsedTime ms.")
}
.doOnError {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Failed to download blob: ${it.localizedMessage}")
}
.blockLast()
}
I am developing a REST API using Spring WebFlux, but I have problems when uploading files. They are stored, but I don't get the expected return value.
This is what I do:
Receive a Flux<Part>
Cast Part to FilePart.
Save parts with transferTo() (this returns a Mono<Void>)
Map the Mono<Void> to Mono<String>, using file name.
Return Flux<String> to client.
I expect file name to be returned, but client gets an empty string.
Controller code
@PostMapping(value = "/muscles/{id}/image")
public Flux<String> updateImage(@PathVariable("id") String id, @RequestBody Flux<Part> file) {
    log.info("REST request to update image to Muscle");
    return storageService.saveFiles(file);
}
StorageService
public Flux<String> saveFiles(Flux<Part> parts) {
log.info("StorageService.saveFiles({})", parts);
return
parts
.filter(p -> p instanceof FilePart)
.cast(FilePart.class)
.flatMap(file -> saveFile(file));
}
private Mono<String> saveFile(FilePart filePart) {
log.info("StorageService.saveFile({})", filePart);
String filename = DigestUtils.sha256Hex(filePart.filename() + new Date());
Path target = rootLocation.resolve(filename);
try {
Files.deleteIfExists(target);
File file = Files.createFile(target).toFile();
return filePart.transferTo(file)
.map(r -> filename);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
FilePart.transferTo() returns Mono<Void>, which signals when the operation is done - this means the reactive Publisher will only publish an onComplete/onError signal and will never publish a value before that.
This means that the map operation was never executed, because it's only given elements published by the source.
You can return the name of the file and still chain reactive operators, like this:
return part.transferTo(file).thenReturn(part.filename());
It is forbidden to use the block operator within a reactive pipeline and it even throws an exception at runtime as of Reactor 3.2.
Using subscribe as an alternative is not good either, because subscribe will decouple the transferring process from your request processing, making those happen in different execution sequences. This means that your server could be done processing the request and close the HTTP connection while the other part is still trying to read the file part to copy it on disk. This is likely to fail in subtle ways at runtime.
FilePart.transferTo() returns a Mono<Void>, which completes empty, so the map after it was never executed. I solved it by doing this:
private Mono<String> saveFile(FilePart filePart) {
log.info("StorageService.saveFile({})", filePart);
String filename = DigestUtils.sha256Hex(filePart.filename() + new Date());
Path target = rootLocation.resolve(filename);
try {
Files.deleteIfExists(target);
File file = Files.createFile(target).toFile();
return filePart
.transferTo(file)
.doOnSuccess(data -> log.info("do something..."))
.thenReturn(filename);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
I'm using Apache jclouds to connect to my OpenStack Swift installation. I managed to upload and download objects from Swift. However, I failed to see how to upload a dynamic large object to Swift.
To upload a dynamic large object, I need to upload all segments first, which I can do as usual. Then I need to upload a manifest object to combine them logically. The problem is that to tell Swift this is a manifest object, I need to set a special header, and I don't know how to do that using the jclouds API.
Here's a dynamic large object example from the official OpenStack website.
The code I'm using:
public static void main(String[] args) throws IOException {
BlobStore blobStore = ContextBuilder.newBuilder("swift").endpoint("http://localhost:8080/auth/v1.0")
.credentials("test:test", "test").buildView(BlobStoreContext.class).getBlobStore();
blobStore.createContainerInLocation(null, "container");
ByteSource segment1 = ByteSource.wrap("foo".getBytes(Charsets.UTF_8));
Blob seg1Blob = blobStore.blobBuilder("/foo/bar/1").payload(segment1).contentLength(segment1.size()).build();
System.out.println(blobStore.putBlob("container", seg1Blob));
ByteSource segment2 = ByteSource.wrap("bar".getBytes(Charsets.UTF_8));
Blob seg2Blob = blobStore.blobBuilder("/foo/bar/2").payload(segment2).contentLength(segment2.size()).build();
System.out.println(blobStore.putBlob("container", seg2Blob));
ByteSource manifest = ByteSource.wrap("".getBytes(Charsets.UTF_8));
// TODO: set manifest header here
Blob manifestBlob = blobStore.blobBuilder("/foo/bar").payload(manifest).contentLength(manifest.size()).build();
System.out.println(blobStore.putBlob("container", manifestBlob));
Blob dloBlob = blobStore.getBlob("container", "/foo/bar");
InputStream input = dloBlob.getPayload().openStream();
while (true) {
int i = input.read();
if (i < 0) {
break;
}
System.out.print((char) i); // should print "foobar"
}
}
The "TODO" part is my problem.
Edited:
It has been pointed out that jclouds handles large file uploads automatically, which is not so useful in our case. In fact, we do not know how large the file will be, or when the next segment will arrive, at the time we start to upload the first segment. Our API is designed so that clients can upload their files in chunks of their own chosen size and at their own chosen time and, when done, call a 'commit' to combine these chunks into a file. So this makes us want to upload the manifest on our own here.
Following @Everett Toews's answer, I've got my code running correctly:
public static void main(String[] args) throws IOException {
CommonSwiftClient swift = ContextBuilder.newBuilder("swift").endpoint("http://localhost:8080/auth/v1.0")
.credentials("test:test", "test").buildApi(CommonSwiftClient.class);
SwiftObject segment1 = swift.newSwiftObject();
segment1.getInfo().setName("foo/bar/1");
segment1.setPayload("foo");
swift.putObject("container", segment1);
SwiftObject segment2 = swift.newSwiftObject();
segment2.getInfo().setName("foo/bar/2");
segment2.setPayload("bar");
swift.putObject("container", segment2);
swift.putObjectManifest("container", "foo/bar2");
SwiftObject dlo = swift.getObject("container", "foo/bar", GetOptions.NONE);
InputStream input = dlo.getPayload().openStream();
while (true) {
int i = input.read();
if (i < 0) {
break;
}
System.out.print((char) i);
}
}
jclouds handles writing the manifest for you. Here are a couple of examples that might help you: UploadLargeObject and largeblob.MainApp.
Try using
Map<String, String> manifestMetadata = ImmutableMap.of(
"X-Object-Manifest", "<container>/<prefix>");
BlobBuilder.userMetadata(manifestMetadata)
If that doesn't work you might have to use the CommonSwiftClient like in CrossOriginResourceSharingContainer.java.
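Applied to the TODO in the question, that suggestion would look roughly like the sketch below ("container/foo/bar/" is the "<container>/<prefix>" value for the segments in the question). As noted above, if the portable userMetadata route doesn't produce a working manifest, fall back to CommonSwiftClient.putObjectManifest as in the edited question.
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.domain.Blob;

import com.google.common.collect.ImmutableMap;
import com.google.common.io.ByteSource;

class ManifestUpload {

    // Zero-byte manifest object whose metadata points at the segment prefix.
    static void putManifest(BlobStore blobStore) {
        Map<String, String> manifestMetadata =
                ImmutableMap.of("X-Object-Manifest", "container/foo/bar/");
        ByteSource manifest = ByteSource.wrap("".getBytes(StandardCharsets.UTF_8));
        Blob manifestBlob = blobStore.blobBuilder("/foo/bar")
                .userMetadata(manifestMetadata)
                .payload(manifest)
                .contentLength(0)
                .build();
        blobStore.putBlob("container", manifestBlob);
    }
}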