Spring Boot exporting huge database to csv via REST endpoint - java

I need to build a Spring Boot application that exposes a REST endpoint to export a huge database table as a CSV file, with different filter parameters. I am trying to find an efficient solution to this problem.
Currently, I am using spring-data-jpa to query the database table, which returns a list of POJOs. I then write this list to the HttpServletResponse as a CSV file using Apache Commons CSV. There are a couple of issues with this approach. First, it loads all the data into memory. Second, it is slow.
I am not doing any business logic with the data, so is it necessary to use JPA and entities (POJOs) in this case? I suspect this is what is causing the problem.

You can try the new Spring WebFlux, introduced with Spring 5:
https://www.baeldung.com/spring-webflux

First, create a controller method that writes a Flux of DataBuffer to the response:
@GetMapping(path = "/report/detailReportFile/{uid}", produces = "text/csv")
public Mono<Void> getWorkDoneReportDetailSofkianoFile(@PathVariable(name = "uid") String uid,
        @RequestParam(name = "startDate", required = false, defaultValue = "0") long start,
        @RequestParam(name = "endDate", required = false, defaultValue = "0") long end,
        ServerHttpResponse response) {
    var startDate = start == 0 ? GenericData.GENERIC_DATE : new Date(start);
    var endDate = end == 0 ? new Date() : new Date(end);
    response.getHeaders().set(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename="+uid+".csv");
    response.getHeaders().add("Accept-Ranges", "bytes");
    Flux<DataBuffer> df = queryWorkDoneUseCase.findWorkDoneByIdSofkianoAndDateBetween(uid, startDate, endDate).collectList()
            .flatMapMany(workDoneList -> WriteCsvToResponse.writeWorkDone(workDoneList));
    return response.writeWith(df);
}
Next, the DataBuffer must be created. In my case, I build it using opencsv with a StringWriter:
public static Flux<DataBuffer> writeWorkDone(List<WorkDone> workDoneList) {
    try {
        StringWriter writer = new StringWriter();
        ColumnPositionMappingStrategy<WorkDone> mapStrategy = new ColumnPositionMappingStrategy<>();
        mapStrategy.setType(WorkDone.class);
        String[] columns = new String[]{"idSofkiano", "nameSofkiano","idProject", "nameProject", "description", "hours", "minutes", "type"};
        mapStrategy.setColumnMapping(columns);
        StatefulBeanToCsv<WorkDone> btcsv = new StatefulBeanToCsvBuilder<WorkDone>(writer)
                .withQuotechar(CSVWriter.NO_QUOTE_CHARACTER)
                .withMappingStrategy(mapStrategy)
                .withSeparator(',')
                .build();
        btcsv.write(workDoneList);
        return Flux.just(stringBuffer(writer.getBuffer().toString()));
    } catch (CsvException ex) {
        // Propagate the exception itself; ex.getCause() may be null
        return Flux.error(ex);
    }
}
private static DataBuffer stringBuffer(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    NettyDataBufferFactory nettyDataBufferFactory = new NettyDataBufferFactory(ByteBufAllocator.DEFAULT);
    DataBuffer buffer = nettyDataBufferFactory.allocateBuffer(bytes.length);
    buffer.write(bytes);
    return buffer;
}
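Note that the controller above still calls collectList(), so the whole result set sits in memory before any CSV is produced. As a rough, framework-free sketch of the alternative, the snippet below writes rows one at a time from an Iterator to a Writer and flushes periodically. It is not the answer author's code: CsvStreamSketch, writeCsv, and escape are hypothetical names, and the Iterator is a stand-in for whatever cursor or stream your data access layer can provide.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: writes rows to a Writer one at a time instead of
// building the whole CSV in memory first.
class CsvStreamSketch {

    // Quote a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Writes the header, then each row as it is pulled from the iterator.
    // Returns the number of data rows written.
    static long writeCsv(String[] header, Iterator<String[]> rows, Writer out, int flushEvery)
            throws IOException {
        writeLine(header, out);
        long count = 0;
        while (rows.hasNext()) {
            writeLine(rows.next(), out);
            count++;
            if (count % flushEvery == 0) {
                out.flush(); // hand data to the underlying stream regularly
            }
        }
        out.flush();
        return count;
    }

    private static void writeLine(String[] fields, Writer out) throws IOException {
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields[i]));
        }
        out.write("\r\n");
    }

    public static void main(String[] args) throws IOException {
        // In a servlet-based endpoint the Writer could be response.getWriter();
        // a StringWriter is used here only to keep the sketch self-contained.
        StringWriter out = new StringWriter();
        List<String[]> rows = List.of(
                new String[]{"1", "Ada"},
                new String[]{"2", "Smith, Joe"});
        writeCsv(new String[]{"id", "name"}, rows.iterator(), out, 1000);
        System.out.print(out);
    }
}
```

Because only one row is materialized at a time, memory use stays roughly constant regardless of table size, provided the row source itself is lazy.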

Related

How to combine a WebFlux WebClient DataBuffer download with more actions

I am trying to download a file (or multiple files) based on the result of a previous web request. After downloading the file, I need to send the previous Mono result (dossier and obj) and the file to another system. So far I have been working with flatMaps and Monos. But when reading large files, I cannot use a Mono during the file download, as the buffer is too small.
Simplified, the code looks something like this:
var filePath = Paths.get("test.pdf");
this.dmsService.search()
.flatMap(result -> {
var dossier = result.getObjects().get(0).getProperties();
var objectId = dossier.getReferencedObjectId();
return Mono.zip(this.dmsService.getById(objectId), Mono.just(dossier));
})
.flatMap(tuple -> {
var obj = tuple.getT1();
var dossier = tuple.getT2();
var media = this.dmsService.getDocument(objectId);
var writeMono = DataBufferUtils.write(media, filePath);
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
})
.flatMap(tuple -> {
var obj = tuple.getT1();
var dossier = tuple.getT2();
var objectId = dossier.getReferencedObjectId();
var zip = zipService.createZip(objectId, obj, dossier);
return zipService.uploadZip(Flux.just(zip));
})
.flatMap(newWorkItemId -> {
return updateMetadata(newWorkItemId);
})
.subscribe(() -> {
finishItem();
});
dmsService.search(), this.dmsService.getById(objectId), and zipService.uploadZip() all return a Mono of a specific type.
dmsService.getDocument(objectId) returns a Flux to support large files. When it returned a Mono<DataBuffer>, this worked for small files if I simply used Files.copy:
...
var contentMono = this.dmsService.getDocument(objectId);
return contentMono;
})
.flatMap(content -> {
Files.copy(content.asInputStream(), Path.of("test.pdf"));
...
}
I have tried different approaches but always ran into problems.
Based on https://www.amitph.com/spring-webclient-large-file-download/#Downloading_a_Large_File_with_WebClient
DataBufferUtils.write(dataBuffer, destination).share().block();
When I try this, nothing after .block() is ever executed, and no download is made.
Without the .share(), I get an exception saying that I may not use block:
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-nio-5
Since DataBufferUtils.write returns a Mono my next assumption was, that instead of calling block, I can Mono.zip() this together with my other values, but this never returns either.
var media = this.dmsService.getDocument(objectId);
var writeMono = DataBufferUtils.write(media, filePath);
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
Any input on how to achieve this is greatly appreciated.
I finally figured out that if I use the DataBufferUtils.write variant that takes a WritableByteChannel, which returns a Flux<DataBuffer> instead of a Mono<Void>, I can map the returned buffers to release each DataBuffer, which seems to do the trick. I found the inspiration for this solution here: DataBuffer doesn't write to file
var media = this.dmsService.getDocument(objectId);
var file = Files.createTempFile(objectId, ".tmp");
WritableByteChannel filechannel = Files.newByteChannel(file, StandardOpenOption.WRITE);
var writeMono = DataBufferUtils.write(media, filechannel)
.map(DataBufferUtils::release)
.then(Mono.just(file));
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);

How to read a CSV file that should not contain ", ' or newlines, and then insert the modified CSV into a DB

I have a CSV file that I need to read using Apache POI. While reading the file, the data must follow a pattern: it should not contain ' or " or newlines. After validating, we need to insert the validated CSV into the DB. The code I wrote for this is below.
@RequestMapping(value = "/insert", method = RequestMethod.POST)
public void uploadData(@RequestParam("file") final MultipartFile DataFile,
        @PathVariable("DataType") final String DataType,
        final Model model, final HttpServletRequest request,
        final HttpServletResponse response) throws Exception {
    byte[] bytes = null;
    InputStream inputStream = null;
    if (DataFile != null && !DataFile.isEmpty()) {
        inputStream = DataFile.getInputStream();
        LOGGER.info("Making Service call to save imported Enrichment details in DB ");
        if (StringUtils.equalsIgnoreCase(DataType, "csvData1")) {
            bytes = enrichmentDataFile.getBytes();
            inputStream = new ByteArrayInputStream(Pattern.compile("(\\r\"|\\n\"|\\r\"\\n\"|\"|\')+")
                    .matcher(new String(bytes)).replaceAll("").getBytes(Charset.forName("UTF-8")));
            DataService.insertData(inputStream,
                    DataFile.getOriginalFilename());//reading data using Apache POI and inserting into db
        } else if (StringUtils.equalsIgnoreCase(DataType, "csvData2")) {
            DataService.insertData(inputStream,
                    DataFile.getOriginalFilename());//reading data using Apache POI and inserting into db
        }
    }
}
I am able to insert csvData2 into the DB, but when I try to insert csvData1, an empty file is created and that empty file is inserted into the DB.
Can anyone suggest how I can validate the input stream (CSV) so that it contains no ' or " or newlines, and then insert the validated data into the DB?
Your InputStream mutation works as intended. The problem is most likely in the DataService.insertData method.
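One way to check that claim is to run the replacement on its own, outside the controller. The sketch below applies the same pattern as the question to a small in-memory sample; the class name CsvSanitizeSketch, the sanitize method, and the sample input are made up for illustration. It also decodes the bytes explicitly as UTF-8, whereas the question's new String(bytes) uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class CsvSanitizeSketch {

    // Same pattern as in the question: quotes, and CR/LF directly before a quote.
    private static final Pattern UNWANTED =
            Pattern.compile("(\\r\"|\\n\"|\\r\"\\n\"|\"|')+");

    // Decodes the bytes as UTF-8, strips the unwanted sequences, re-encodes as UTF-8.
    static byte[] sanitize(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return UNWANTED.matcher(text).replaceAll("").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = "a,\"b\",'c'\n\"d\",e\nf,g".getBytes(StandardCharsets.UTF_8);
        // Print the sanitized text to inspect which characters were removed.
        System.out.println(new String(sanitize(input), StandardCharsets.UTF_8));
    }
}
```

With this sample, the line break that is immediately followed by a quote is removed together with the quote, so the first two rows are joined, while the plain line break before f,g is kept. Whether that is the intended behavior depends on the data, so it is worth checking against a real file.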

How to use WordNet Synonyms with Hibernate Search?

I've been trying to figure out how to use WordNet synonyms with a search function I'm developing which uses Hibernate Search 5.6.1. At first, I thought about using Hibernate Search annotations:
@TokenFilterDef(factory = SynonymFilterFactory.class, params = {@Parameter(name = "ignoreCase", value = "true"),
@Parameter(name = "expand", value = "true"), @Parameter(name = "synonyms", value = "synonymsfile") })
However, this requires an actual file populated with synonyms. From WordNet I was only able to get ".pl" files. So I tried manually making a SynonymAnalyzer class which would read from the ".pl" file:
public class SynonymAnalyzer extends Analyzer {
@Override
protected TokenStreamComponents createComponents(String fieldName) {
final Tokenizer source = new StandardTokenizer();
TokenStream result = new StandardFilter(source);
result = new LowerCaseFilter(result);
SynonymMap wordnetSynonyms = null;
try {
wordnetSynonyms = loadSynonyms();
} catch (IOException e) {
e.printStackTrace();
}
result = new SynonymFilter(result, wordnetSynonyms, false);
result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
return new TokenStreamComponents(source, result);
}
private SynonymMap loadSynonyms() throws IOException {
File file = new File("synonyms\\wn_s.pl");
InputStream stream = new FileInputStream(file);
Reader reader = new InputStreamReader(stream);
SynonymMap.Builder parser = null;
parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(CharArraySet.EMPTY_SET));
try {
((WordnetSynonymParser) parser).parse(reader);
} catch (ParseException e) {
e.printStackTrace();
}
return parser.build();
}
}
The problem with this method is that I'm getting a java.lang.OutOfMemoryError, which I assume is because there are too many synonyms. What is the proper way to do this? Everywhere I've looked online suggests using WordNet, but I can't seem to find an example with Hibernate Search annotations. Any help is appreciated, thanks!
The WordNet format is actually supported by SynonymFilterFactory. You're simply missing the "format" parameter in your annotation configuration; by default, the factory uses the Solr format.
Change your annotation to this:
@TokenFilterDef(
    factory = SynonymFilterFactory.class,
    params = {
        @Parameter(name = "ignoreCase", value = "true"),
        @Parameter(name = "expand", value = "true"),
        @Parameter(name = "synonyms", value = "synonymsfile"),
        @Parameter(name = "format", value = "wordnet") // Add this
    }
)
Also, make sure that the value of the "synonyms" parameter is the path of a file in your classpath (e.g. "com/acme/synonyms.pl", or just "synonyms.pl" if the file is at the root of your "resources" directory).
In general when you have an issue with the parameters of a Lucene filter/tokenizer factory, your best bet is having a look at the source code of that factory, or having a look at this page.

How to get server base path to find and read a json

I have this REST endpoint and I'm trying to mock some responses.
I'm working on a WebSphere server with Spring Boot:
@RequestMapping(method = RequestMethod.GET, value = "", produces = {MediaType.APPLICATION_JSON_VALUE, "application/hal+json"})
public Resources<String> getAssemblyLines() throws IOException {
    String fullMockPath = servletContext.getContextPath() + "\\assets\\services-mocks\\assembly-lines\\get-assembly-lines.ok.json";
    List<String> result = new ArrayList<String>();
    result.add(fullMockPath);
    try {
        byte[] rawJson = Files.readAllBytes(Paths.get(fullMockPath));
        Map<String, String> mappedJson = new HashMap<String, String>();
        String jsonMock = new ObjectMapper().writeValueAsString(mappedJson);
        result = new ArrayList<String>();
        result.add(jsonMock);
    } catch (IOException e) {
        result.add("Not found");
        result.add(e.getMessage());
    }
    return new Resources<String>(result,
            linkTo(methodOn(this.getClass()).getAssemblyLines()).withSelfRel());
}
I get a FileNotFoundException.
I tried Tushinov's solution:
System.getProperty("user.dir");
But that returns the path of my server, not of my document root (and yes, they're in different folders).
How can I determine my base path?
Regarding your question about how to find your base path, you can use:
System.getProperty("user.dir")
This returns the working directory the JVM was started from, which is your project folder when you launch the application from there.
Example output:
C:\folder_with_java_projects\CURRENT_PROJECT
So if the file is inside your project folder, you can just do the following:
System.getProperty("user.dir") + "\\somePackage\\someJson.json";
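String concatenation makes it easy to drop or mis-escape a path separator. A small sketch using java.nio.file instead, assuming the file really lives under the JVM's working directory: somePackage and someJson.json are just the placeholders from the answer, and BasePathSketch and resolveFromWorkingDir are hypothetical names. Under an application server the working directory is usually the server's own, so this may not fit the WebSphere case described in the question.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class BasePathSketch {

    // Resolves path segments against the JVM working directory ("user.dir"),
    // letting the Path API insert the platform's separator.
    static Path resolveFromWorkingDir(String first, String... more) {
        Path base = Paths.get(System.getProperty("user.dir"));
        return base.resolve(Paths.get(first, more));
    }

    public static void main(String[] args) {
        Path json = resolveFromWorkingDir("somePackage", "someJson.json");
        System.out.println(json);
    }
}
```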

How to accept Form Params with multipart file upload

I'm working on a REST application hosted on a Liberty server that should allow support for file upload. As a part of this POST request, I'd like to allow the user to specify a few things with Form parameters. Despite using the example from IBM here, I can't quite get it to work. I can retrieve information related to the uploaded File using headers (like in the example). But when I try the same method to retrieve information about the headers, the only thing available is the name of the fields, but not their values.
I'm also trying to maintain support for Swagger.
Here's a snippet of my code:
@ApiOperation("Upload Source Code to Build")
@POST
@Path("/")
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Produces(MediaType.MULTIPART_FORM_DATA)
@ApiResponses({
    @ApiResponse(code = 200, message = "OK", response = String.class)})
@ApiImplicitParams({
    @ApiImplicitParam(
        name = "source",
        value = "Source File",
        required = true,
        dataType = "java.io.File",
        paramType = "form",
        type = "file"),
    @ApiImplicitParam(
        name = "archive_name",
        value = "Name of File to Retrieve",
        required = true,
        dataType = "java.lang.String",
        paramType = "form",
        type = "string"),
    @ApiImplicitParam(
        name = "build_command",
        value = "Command to Run",
        required = true,
        dataType = "java.lang.String",
        paramType = "form",
        type = "string")})
public Response uploadFileBuildArchive(
        @ApiParam(hidden = true) IMultipartBody multipartBody,
        @Context UriInfo uriInfo) throws IOException {
List<IAttachment> attachments = multipartBody.getAllAttachments();
InputStream stream = null;
Iterator<IAttachment> it = attachments.iterator();
IAttachment attachment = it.next();
if (attachment == null)
return Response.status(400).entity("No file attached").build();
StringBuilder fileName = new StringBuilder("");
StringBuilder archiveName = new StringBuilder("");
StringBuilder buildCommand = new StringBuilder("");
for(int i = 0; it.hasNext(); i++) {
DataHandler dataHandler = attachment.getDataHandler();
MultivaluedMap<String, String> map = attachment.getHeaders();
String[] contentDisposition = map.getFirst("Content-Disposition").split(";");
if(i == 0) {
stream = dataHandler.getInputStream();
Response saveFileResponse = handleFileStream(contentDisposition, stream, fileName);
if (saveFileResponse != null)
return saveFileResponse;
} else if (i == 1) {
if (!handleFormParam(contentDisposition, archiveName))
return Response.status(400).entity("Missing FormParam: archive_name").build();
}
.
.
.
private boolean handleFormParam(String[] contentDisposition, StringBuilder stringBuilder) {
String formElementName = null;
for (String tempName : contentDisposition) {
String[] names = tempName.split("=");
for (String n : names) {
System.out.println(tempName + ":" + n);
}
}
if (stringBuilder.length() == 0) {
return false;
}
return true;
}
The only information about the "archive_name" form param I can get from the attachment is that the name of the param is "archive_name". How do I get the value associated with "archive_name"?
I think you're saying you want to submit other text form parameters along with the file to upload, correct? Like:
<form action="upload" method="post" enctype="multipart/form-data">
Name: <input name="myName" type="text"/><br>
File: <input name="document" type="file"/><br>
<input type="submit"/>
</form>
If so, while I've only done this under older JAX-RS, I think the issue is that those extra field values don't arrive inside the Content-Disposition header, but in the body of the part. The older WebSphere JAX-RS, based on Apache Wink instead of CXF, actually had a part.getBody() method, and this is what worked for us.
If I use browser developer tools to watch the actual POST content of such a form, this is what the extra parameters look like:
-----------------------------18059782765430180341846081335
Content-Disposition: form-data; name="myName"
Joe Smith
So I'd look for other methods on that (apparently IBM-specific) class, IAttachment, or maybe the DataHandler, that might get you text from the body of the part.
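In the raw request, a part's headers are followed by its body, and for a plain form field that body is the value (Joe Smith above). Since the question's code already obtains a stream via attachment.getDataHandler().getInputStream() for the file part, one thing to try is reading that same stream as text for the form-field parts. The sketch below shows only the plain-Java pieces and is untested against Liberty: FormPartSketch, fieldName, and readValue are hypothetical names, the InputStream is assumed to be the one the DataHandler returns, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FormPartSketch {

    // Extracts the name="..." value from a Content-Disposition header, or null if absent.
    static String fieldName(String contentDisposition) {
        for (String token : contentDisposition.split(";")) {
            String t = token.trim();
            if (t.startsWith("name=")) {
                return t.substring("name=".length()).replace("\"", "");
            }
        }
        return null;
    }

    // Reads the whole part body as text; for a simple form field this is its value.
    // InputStream.readAllBytes() requires Java 9 or later.
    static String readValue(InputStream partBody) throws IOException {
        try (InputStream in = partBody) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8).trim();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for attachment.getDataHandler().getInputStream() (assumption)
        InputStream body = new ByteArrayInputStream("Joe Smith".getBytes(StandardCharsets.UTF_8));
        System.out.println(fieldName("form-data; name=\"myName\"") + " = " + readValue(body));
    }
}
```

In the question's handleFormParam, the value read this way could be appended to the StringBuilder that is currently left empty.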
