Extraneous data when reading/writing binary data

I am trying to write a process that will retrieve a file (various types - pdf, txt, docx, tif, etc.) via a rest API call, convert that file's binary data from base64 encoding to un-encoded, and write the file to another location (to be picked up by another process). All of the above is working, but if the file type is anything other than txt, the newly written out file will not open.
public File retrieveDocument(String in_ItemId, File in_DestinationFile, Map<String, String> in_DocumentProperties)
throws IOException {
byte[] binaryData = new byte[8198];
try {
String url = "filestore url";
RestTemplate restTemplate = new RestTemplate();
List<HttpMessageConverter<?>> messageConverters = new ArrayList<HttpMessageConverter<?>>();
messageConverters.add(new MappingJacksonHttpMessageConverter());
restTemplate.getMessageConverters().add(new StringHttpMessageConverter());
restTemplate.setMessageConverters(messageConverters);
Map documentMap = restTemplate.getForObject(url, Map.class);
if (documentMap.get("binaryData") != null) {
binaryData = Base64.decodeBase64(((String) documentMap.get("binaryData")).getBytes());
}
OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(in_DestinationFile));
outputStream.write(binaryData);
outputStream.close();
} catch (Exception e) {
e.printStackTrace();
}
return in_DestinationFile;
}
When I open both the original and new files in a text editor (e.g., Notepad++) and compare the two, there are a number of additional characters (mainly question marks) in the new file. Below is an example from a tif image. I've added ^ under some of the additional characters in the new file.
Original file:
II* P €?à#$
„BaP¸d6ˆDbQ8¤V-ŒFcQ¸äv=HdR9$–M'”JeR¹d¶]/˜LfS9¤Öm7œNgS¹äö}? PhT:%3Ñ©Tºe6O¨
‡ÄbqX¼f7•ß²<¦W-—ÌfsY¼æw=ŸÐhlÐ=—M§ÔjuZ½f·]¯Øll™-–×m·Ünw[½æ÷}¿à_tœ'Çäry\¾g7hÚsú]
New file:
II* P €?à#$
„BaP¸d6ˆDbQ8¤V-ŒFcQ¸äv=?HdR9$–M'”JeR¹d¶]/˜LfS9¤Öm7œNgS¹äö}? PhT:%3Ñ©Tºe6?O¨
^ ^
‡ÄbqX¼f7?•ß²<¦W-—ÌfsY¼æw=ŸÐhlÐ=—M§ÔjuZ½f·]¯Øll™-–×m·Ünw[½æ÷}¿à_tœ'?Çäry\¾g7?hÚsú]
^ ^
Any ideas as to what I'm doing wrong?

Writer classes including PrintWriter are for text data. Stream classes such as OutputStream are for binary data.
You're converting binary data into a String, at which point some of the binary data can get corrupted.
Get rid of the String strBinaryData and just write the byte[] you get from Base64.decodeBase64.
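A minimal sketch of the byte-only path described above, using java.util.Base64 (Java 8+) rather than the Commons Codec class from the question. The class name, method name, and sample bytes are illustrative assumptions, not part of the original code. The decoded byte[] goes straight to the file, with no String or Writer in between:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

class BinaryWriteSketch {

    // Decodes base64 text and writes the raw bytes to the target file unchanged.
    public static void writeDecoded(String base64Text, Path target) throws IOException {
        byte[] binaryData = Base64.getDecoder().decode(base64Text);
        // Files.write is byte-oriented, so no charset conversion can alter the data.
        Files.write(target, binaryData);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the REST payload: a few bytes that are not valid text.
        byte[] original = {0x49, 0x49, 0x2A, 0x00, (byte) 0x81, (byte) 0x8D, (byte) 0xFF};
        String payload = Base64.getEncoder().encodeToString(original);

        Path target = Files.createTempFile("document", ".bin");
        writeDecoded(payload, target);

        // The file on disk should match the source bytes exactly.
        System.out.println(Arrays.equals(original, Files.readAllBytes(target)));
        Files.delete(target);
    }
}
```

If the payload contains line breaks (MIME-style base64), Base64.getMimeDecoder() would be the decoder to try instead of the basic one.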

Related

How to get svg in database in Java

I have one problem. I save SVG images in the database as binary data. Now I want to download an image without converting it to base64. Is there any way to do that? Thank you.

Basically, that would mean getting the BLOB object from the database.
I would follow this approach to show it directly in the browser:
@RestController
public class ImageController {
    @GetMapping(value = "/img-test", produces = "image/svg+xml")
    public byte[] getImg() throws IOException {
        // Retrieve your image from the DB; a local file stands in for it here
        File imgFile = new File("C:\\Users\\USR\\Desktop\\img.svg");
        try (InputStream inp = new FileInputStream(imgFile)) {
            return inp.readAllBytes(); // readAllBytes() requires Java 9 or later
        }
    }
}
With this approach, you do not change anything in the BLOB image. You take it and return it as is, as a byte array. You can then show it directly in a browser or embed it somewhere in your HTML file.
The main thing here is the MIME type: image/svg+xml
If you are using an older version of Java, then check this question for the conversion of the InputStream object to a byte array.
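For Java 8 and earlier, one common way to do that conversion is a read loop into a ByteArrayOutputStream. This is a sketch only; the class and method names are hypothetical and it is not taken from the linked question:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class StreamBytesSketch {

    // Reads the stream to its end and returns everything that was read.
    public static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        // read() returns -1 at end of stream; it may return fewer bytes than the buffer holds.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] svg = "<svg xmlns=\"http://www.w3.org/2000/svg\"/>".getBytes("UTF-8");
        byte[] copy = toByteArray(new ByteArrayInputStream(svg));
        System.out.println(Arrays.equals(svg, copy));
    }
}
```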
And with this approach you can download the file:
@GetMapping("download-img")
public ResponseEntity<byte[]> downloadImg() throws IOException {
    // Get the file from the DB; a local file stands in for it here
    File imgFile = new File("C:\\Users\\USR\\Desktop\\img.svg");
    try (InputStream inp = new FileInputStream(imgFile)) {
        // Change the file name dynamically here if needed
        return ResponseEntity.ok()
                .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"img.svg\"")
                .body(inp.readAllBytes());
    }
}

Read a file using Java from an S3 bucket and HTTP PUT file to presigned AWS S3 URL of another bucket in a way that simulates an actual file upload

New to Java and HTTP requests.
Why this question is not a duplicate: I'm not using AWS SDK to generate any presigned URL. I get it from an external API.
Here is what I'm trying to accomplish:
Step 1: Read the source S3 bucket for a file (for now .xlsx)
Step 2: Parse this file by converting it to an InputStreamReader (I need help here)
Step 3: Do a HTTP PUT of this file by transferring the contents of the InputStreamReader to an OutputStreamWriter, on a pre-signed S3 URL that I already have obtained from an external team. The file must sit in the destination S3 bucket, in the exact way a file is uploaded manually by dragging and dropping. (Also need help here)
Here is what I've tried:
Step 1: Read the S3 bucket for the file
public class LambdaMain implements RequestHandler<S3Event, String> {
@Override
public String handleRequest(final S3Event event, final Context context) {
System.out.println("Create object was called on the S3 bucket");
S3EventNotification.S3EventNotificationRecord record = event.getRecords().get(0);
String srcBucket = record.getS3().getBucket().getName();
String srcKey = record.getS3().getObject().getUrlDecodedKey();
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
.build();
S3Object s3Object = s3Client.getObject(new GetObjectRequest(
srcBucket, srcKey));
String presignedS3Url = //Assume that I have this by making an external API call
InputStreamReader inputStreamReader = parseFileFromS3(s3Object); // Step 2
int responseCode = putContentIntoS3URL(inputStreamReader, presignedS3Url); // Step 3
}
Step 2: Parse the file into an InputStreamReader to copy it to an OutputStreamWriter:
private InputStreamReader parseFileFromS3(S3Object s3Object) {
return new InputStreamReader(s3Object.getObjectContent(), StandardCharsets.UTF_8);
}
Step 3: Make a HTTP PUT call by copying the contents from InputStreamReader to OutputStreamWriter:
private int putContentIntoS3URL(InputStreamReader inputStreamReader, String presignedS3Url) {
URL url = null;
try {
url = new URL(presignedS3Url);
} catch (MalformedURLException e) {
e.printStackTrace();
}
HttpURLConnection httpCon = null;
try {
assert url != null;
httpCon = (HttpURLConnection) url.openConnection();
} catch (IOException e) {
e.printStackTrace();
}
httpCon.setDoOutput(true);
try {
httpCon.setRequestMethod("PUT");
} catch (ProtocolException e) {
e.printStackTrace();
}
OutputStreamWriter outputStreamWriter = null;
try {
outputStreamWriter = new OutputStreamWriter(
httpCon.getOutputStream());
} catch (IOException e) {
e.printStackTrace();
}
try {
IOUtils.copy(inputStreamReader, outputStreamWriter);
} catch (IOException e) {
e.printStackTrace();
}
try {
outputStreamWriter.close();
} catch (IOException e) {
e.printStackTrace();
}
try {
httpCon.getInputStream();
} catch (IOException e) {
e.printStackTrace();
}
int responseCode = 0;
try {
responseCode = httpCon.getResponseCode();
} catch (IOException e) {
e.printStackTrace();
}
return responseCode;
}
The issue with the above approach is that when I read an .xlsx file via an S3 insert trigger and PUT it to the URL, the uploaded file downloads as gibberish.
When I read in a .png file and PUT it to the URL, the uploaded file downloads as a text file containing gibberish (I did see the word PNG in it, though).
It feels like I'm making mistakes with:
Incorrectly creating an OutputStreamWriter since I don't understand how to send a file via a HTTP request
Assuming that every file type can be handled in a generic way.
Not setting the content-type in the HTTP request
Expecting S3 to magically understand my file type after the PUT operation
I would like to know if my above 4 assumptions are correct or incorrect.
The intention is that, I do the PUT on the file data correctly so it sits in the S3 bucket along with the correct file type/extension. I hope my effort is worthy to garner some help. I've done a lot of searching into HTTP PUT and File/IO, but I'm unable to LINK them together for my use-case, since I perform a File I/O followed by a HTTP PUT.
UPDATE 1:
I've added the setRequestProperty("Content-Type", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"), but the file doesn't sit in the S3 bucket with the file extension. It simply sits there as an object.
UPDATE 2:
I think this also has something to do with setContentDisposition() header, although I'm not sure how I go about setting these headers for Excel files.
UPDATE 3:
This may simply have to do with how the Presigned S3 URL itself is vended out to us. As mentioned in the question, I said that we get the Presigned S3 URL from some other team. The question itself has multiple parts that need answering.
Does the default Presigned S3 URL ALLOW clients to set the content-type and content-disposition in the HTTP header?: I've set up another separate question here since it's quite unclear: Can a client set file name and extension programmatically when he PUTs file content to a presigned S3 URL that the service vends out?
If the answer to above question is TRUE, then and only then must we go into how to set the file contents and write it to the OutputStream

You are using InputStreamReader and OutputStreamWriter, which are both bridges between a byte stream and a character stream. However, you are using them with binary data, which means you first convert your bytes to characters and then back to bytes. Since your data is not character data, this conversion is lossy: byte sequences that are not valid in the charset are replaced during decoding, which would explain the gibberish you get as a result.
I'd start by getting rid of the reader and writer and instead use the InputStream (which you already get from s3Object.getObjectContent()) and the OutputStream (which you get from httpCon.getOutputStream()) directly. IOUtils.copy also has an overload that takes an InputStream and an OutputStream.
Also as a side note, when you construct the InputStreamReader you set StandardCharsets.UTF_8 as the charset to use, but when you construct the OutputStreamWriter you don't set the charset. Should the default charset not be UTF-8, this conversion would probably also result in gibberish.
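The difference can be sketched with in-memory streams standing in for the S3 object and the HTTP connection. The class and method names below are hypothetical; copyThroughCharacters mimics what the Reader/Writer pair in the question does, and copyBytes is the byte-only alternative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteCopySketch {

    // Copies raw bytes from in to out without any charset decoding.
    public static long copyBytes(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    // Decodes the bytes as UTF-8 text and encodes them again, as a Reader/Writer pair would.
    public static byte[] copyThroughCharacters(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
             Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The 8-byte PNG signature; its first byte (0x89) is not valid UTF-8 on its own.
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyBytes(new ByteArrayInputStream(png), sink);

        System.out.println("byte copy intact: " + Arrays.equals(png, sink.toByteArray()));
        System.out.println("char copy intact: " + Arrays.equals(png, copyThroughCharacters(png)));
    }
}
```

In the question's code, the same idea would mean passing s3Object.getObjectContent() and httpCon.getOutputStream() to the byte-based copy, with no Reader or Writer involved.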

Downloading a file with file name greater than 255 characters using MS edge

I am using the spring framework to serve a file for download. I have the following code.
public ResponseEntity<List<Integer>>
defExport()
throws IllegalStateException,
IOException {
Map<String, Object> resultMap = Maps.newHashMap();
int status = STATUS_SUCCESS;
HttpHeaders headers = new HttpHeaders();
List<Integer> byteList = Lists.newArrayList();
try {
File file = transformService.executeExport(something);
byte[] fileContent = null;
try (InputStream is = new FileInputStream(file)) {
fileContent = read(is);
}
String fileName = StringUtils.join(something, ".xlsx");
headers.add("fileName", fileName);
headers.add("Content-Disposition", StringUtils.join(
"attachment; filename=\"", URLEncoder.encode(fileName, "UTF8"), "\""));
for (byte b : fileContent) {
byteList.add(new Integer(b));
}
} catch (Exception e) {
status = STATUS_ERROR;
}
headers.add("ifxResultStatus", String.valueOf(status));
return new ResponseEntity<>(byteList, headers, HttpStatus.OK);
On the JS side, I have the following:
_self.xhr.open('POST', targetUrl, true);
_self.xhr.onreadystatechange = goog.bind(this.blankDlResponseHandler, this);
_self.xhr.setRequestHeader('X-CSRF-TOKEN', project.core.app.View.getCsrfToken());
var form_data = new FormData();
_self.xhr.send(form_data);
When I try to download the file with name greater than 255 characters, I can do so successfully in Chrome and IE11 on Windows 10.
However, when I try to do so in MS Edge, the download is unsuccessful due to the long file name.
How can I make it work in MS Edge as well?
Edit
I just realized that in Chrome the file name is always limited to 218 characters and the trailing characters are trimmed.
So my new question is: how can I limit the file name to 218 characters, especially in the case where a file with the same name already exists (file names then get (1), (2) and so on appended)?

Reading directly from Google Drive in Java

I need to read the content of a file stored in Google Drive programmatically. I'm looking for something like
InputStream is = <drive_stuff>.read(fileID);
Any help?
I'll also appreciate if I can write back to a file using some sort of
OutputStream dos = new DriveOutputStream(driveFileID);
dos.write(data);
If this sort of convenient approach is more than Drive can offer, I'd like suggestions on how I can read from and write to Drive directly with java.io.InputStream / OutputStream / Reader / Writer, without creating temporary local copies of the data I want to ship to Drive. Thanks!

// Build a new authorized API client service.
Drive service = getDriveService();
// Print the names and IDs for up to 10 files.
FileList result = service.files().list()
.setPageSize(10)
.setFields("nextPageToken, files(id, name)")
.execute();
List<File> files = result.getFiles();
if (files == null || files.size() == 0) {
System.out.println("No files found.");
} else {
System.out.println("Files:");
for (File file : files) {
System.out.printf("%s (%s)\n", file.getName(), file.getId());
String fileId = file.getId();
Export s=service.files().export(fileId, "text/plain");
InputStream in=s.executeMediaAsInputStream();
InputStreamReader isr=new InputStreamReader(in);
BufferedReader br = new BufferedReader(isr);
String line = null;
StringBuilder responseData = new StringBuilder();
while((line = br.readLine()) != null) {
responseData.append(line);
}
System.out.println(responseData);
}
}
}

Please take a look at the DrEdit Java sample that is available on the Google Drive SDK documentation.
This example shows how to authorize and build requests to read metadata, file's data and upload content to Google Drive.
Here is a code snippet showing how to use the ByteArrayContent to upload media to Google Drive stored in a byte array:
/**
* Create a new file given a JSON representation, and return the JSON
* representation of the created file.
*/
@Override
public void doPost(HttpServletRequest req, HttpServletResponse resp)
throws IOException {
Drive service = getDriveService(req, resp);
ClientFile clientFile = new ClientFile(req.getReader());
File file = clientFile.toFile();
if (!clientFile.content.equals("")) {
file = service.files().insert(file,
ByteArrayContent.fromString(clientFile.mimeType, clientFile.content))
.execute();
} else {
file = service.files().insert(file).execute();
}
resp.setContentType(JSON_MIMETYPE);
resp.getWriter().print(new Gson().toJson(file.getId()).toString());
}

Here's an (incomplete) snippet from my app which might help.
URL url = new URL(urlParam);
HttpURLConnection connection = (HttpURLConnection) url
.openConnection();
connection.setDoOutput(true);
connection.setRequestMethod("GET");
connection
.setRequestProperty("Authorization",
"OAuth "+accessToken);
String docText = convertStreamToString(connection.getInputStream());

Using google-api-services-drive-v3-rev24-java-1.22.0:
To read the contents of a file, make sure you set DriveScopes.DRIVE_READONLY when you do GoogleAuthorizationCodeFlow.Builder(...) in your credential authorizing method/code.
You'll need the fileId of the file you want to read. You can do something like this:
FileList result = driveService.files().list().execute();
You can then iterate the result for the file and fileId you want to read.
Once you have done that, reading the contents would be something like this:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
driveService.files().get(fileId).executeMediaAndDownloadTo(outputStream);
InputStream in = new ByteArrayInputStream(outputStream.toByteArray());

Convert PDF to Base64 and store data to BLOB of Database

I want to store binary data (e.g. a PDF) in a BLOB of my Oracle database.
At first I put the PDF into a FileInputStream and created a byte array. Here is the code for that:
public static byte[] createByteArray(File pCurrentFolder, String pNameOfBinaryFile)
{
String pathToBinaryData = pCurrentFolder.getAbsolutePath()+"/"+pNameOfBinaryFile;
File file = new File(pathToBinaryData);
if (!file.exists())
{
System.out.println(pNameOfBinaryFile+" could not be found in folder "+pCurrentFolder.getName());
return null;
}
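byte[] fileContent = new byte[(int) file.length()];
try (FileInputStream fin = new FileInputStream(file)) {
    // A single read() may return fewer bytes than requested, so loop until the array is full
    int offset = 0;
    while (offset < fileContent.length) {
        int read = fin.read(fileContent, offset, fileContent.length - offset);
        if (read == -1) {
            break; // end of file reached earlier than expected
        }
        offset += read;
    }
} catch (IOException e) {
    e.printStackTrace();
}
return fileContent;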
}
I sent this (the byte array) via MyBatis to the database and it worked, so that I had the PDF in my BLOB and I also could read the PDF from my database.
But now I face the following problem:
I have a JDBC connector for my search engine (FAST ESP, but that doesn't matter) which connects to a certain database and stores all the content in an XML file. Inside this XML file is an element called "data" which contains the binary data inside its CDATA section.
When I want to parse this xml, Java tells me:
The content of elements must consist of well-formed character data or markup.
With some PDFs it works, but with others it does not. So I think the problem is that I have stored them in the database in the wrong way.
For further information I would refer to another, similar question I asked before:
Java: skip binary data in xml file while parsing
Someone there told me that I should encode my PDF (or any binary file) with base64. So that would mean, I do not just put my PDF into a FileInputStream, store the byte[] and put this byte[] to my BLOB of the database.
What do I have to do, to store the PDF in correct way inside my database, so that afterwards I can correctly parse my XML file the JDBC connector creates?

You can use the JAXB DatatypeConverter class to easily convert your data to base64 without any external dependencies:
byte[] arr = YOUR_BINARY_ARRAY;
String result = javax.xml.bind.DatatypeConverter.printBase64Binary(arr);
You can simply add this code to the end of your method and change its return type to a String.
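Note that javax.xml.bind is no longer bundled with the JDK from Java 11 onward. On Java 8 or later, java.util.Base64 covers the same need. The sketch below uses hypothetical class and method names and made-up sample bytes; it shows that the encoded form contains only characters that are safe inside XML, and that decoding restores the original bytes:

```java
import java.util.Arrays;
import java.util.Base64;

class Base64XmlSketch {

    // Encodes arbitrary bytes as base64 text (letters, digits, '+', '/', '=').
    public static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Restores the original bytes from the base64 text.
    public static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Sample bytes including a NUL byte and the "]]>" sequence, both of which break raw CDATA.
        byte[] pdfLike = {0x25, 0x50, 0x44, 0x46, 0x00, 0x01, (byte) 0xFF, 0x5D, 0x5D, 0x3E};

        String forXml = encode(pdfLike);
        System.out.println(forXml);
        System.out.println(Arrays.equals(pdfLike, decode(forXml)));
    }
}
```

Whoever reads the XML then has to decode the element content again before treating it as a PDF.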

You can try to first convert the bytes to Base64 using Apache Commons Codec, as in this example:
import org.apache.commons.codec.binary.Base64;
import java.util.Arrays;
public class Base64Encode {
public static void main(String[] args) {
String hello = "Hello World";
byte[] encoded = Base64.encodeBase64(hello.getBytes());
System.out.println(Arrays.toString(encoded));
String encodedString = new String(encoded);
System.out.println(hello + " = " + encodedString);
}
}
