Consume JSON / Base64 encoded file in Android / Java

I have a web service capable of returning PDF files in two ways:
RAW: The file is simply included in the response body. For example:
HTTP/1.1 200 OK
Content-Type: application/pdf
<file_contents>
JSON: The file is encoded (Base 64) and served as a JSON with the following structure:
HTTP/1.1 200 OK
Content-Type: application/json
{
"base64": <file_contents_base64>
}
I want to be able to consume both services on Android / Java by using the following architecture:
// Get response body input stream (OUT OF THE SCOPE OF THIS QUESTION)
InputStream bodyStream = getResponseBodyInputStream();
// Get PDF file contents input stream from body stream
InputStream fileStream = getPDFFileContentsInputStream(bodyStream);
// Write stream to a local file (OUT OF THE SCOPE OF THIS QUESTION)
saveToFile(fileStream);
For the first case (RAW response), the response body will be the file itself. This means that the getPDFFileContentsInputStream(InputStream) method implementation is trivial:
@NonNull InputStream getPDFFileContentsInputStream(@NonNull InputStream bodyStream) {
// Return the input
return bodyStream;
}
The question is: how to implement the getPDFFileContentsInputStream(InputStream) method for the second case (JSON response)?

You can use any JSON parser (such as Jackson or Gson) to extract the string, and then decode it with Base64InputStream from Apache Commons Codec.
EDIT: You can obtain an input stream from a string using ByteArrayInputStream, i.e.
InputStream stream = new ByteArrayInputStream(exampleString.getBytes(StandardCharsets.UTF_8));
EDIT 2: This causes two passes over the data, and if the file is big you might run into memory problems. To avoid that, you can use Jackson and parse the content yourself instead of obtaining the whole object through reflection. You can also wrap the original input stream in another one, say ExtractingInputStream, which skips the data in the underlying input stream up to the encoded part. You can then wrap this ExtractingInputStream instance in a Base64InputStream.
A simple algorithm for skipping the unnecessary parts would be:
In the constructor of ExtractingInputStream, skip until you have read three quotation marks.
In the read method, return what the underlying stream returns, except return -1 when the underlying stream returns a quotation mark, which marks the end of the Base64-encoded data.
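The skipping algorithm described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the JSON has exactly the shape shown in the question (the Base64 value is the first string value, after the "base64" key), the value contains no JSON escape sequences such as \/, and java.util.Base64 (Java 8+, Android API 26+) is swapped in for Commons Codec's Base64InputStream so that the example has no dependencies. The class body is hypothetical and not taken from any library.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;

/** Exposes only the characters between the third and fourth quotation mark. */
class ExtractingInputStream extends FilterInputStream {
    private boolean finished = false;

    ExtractingInputStream(InputStream in) throws IOException {
        super(in);
        // Skip {"base64":" which ends at the third quotation mark.
        int quotes = 0;
        while (quotes < 3) {
            int b = in.read();
            if (b == -1) {
                throw new IOException("End of stream before the Base64 value started");
            }
            if (b == '"') {
                quotes++;
            }
        }
    }

    @Override
    public int read() throws IOException {
        if (finished) {
            return -1;
        }
        int b = in.read();
        if (b == -1 || b == '"') {
            // The closing quotation mark ends the Base64 data.
            finished = true;
            return -1;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int b = read();
            if (b == -1) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    /** JSON case of the method from the question: decode while streaming. */
    static InputStream getPDFFileContentsInputStream(InputStream bodyStream) throws IOException {
        return Base64.getDecoder().wrap(new ExtractingInputStream(bodyStream));
    }
}
```

With this in place, saveToFile(ExtractingInputStream.getPDFFileContentsInputStream(bodyStream)) would decode the file while it is being read, without holding the whole Base64 string in memory. The byte-at-a-time loop keeps the sketch short; a production version would probably scan whole buffers instead.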

Related

Cannot unzip nor get blob from HTTP Response

I am trying to unzip a file that arrives as the response to an HTTP request.
After receiving the response I can neither unzip it nor turn it into a blob to parse afterwards.
The zip always contains an XML file, and the idea is to transform that XML into JSON once the file is unzipped.
Here is the code I tried:
val client = HttpClient.newBuilder().build();
val request = HttpRequest.newBuilder()
.uri(URI.create("https://donnees.roulez-eco.fr/opendata/instantane"))
.build();
val response = client.send(request, HttpResponse.BodyHandlers.ofString());
Then response.body() is just unreadable, and I did not find a proper way to turn it into a blob.
The other code I used for unzipping directly is this one:
val url = URL("https://donnees.roulez-eco.fr/opendata/instantane")
val con = url.openConnection() as HttpURLConnection
con.setRequestProperty("Accept-Encoding", "gzip")
println("Length : " + con.contentLength)
var reader: Reader? = null
reader = InputStreamReader(GZIPInputStream(con.inputStream))
while (true) {
val ch: Int = reader.read()
if (ch == -1) {
break
}
print(ch.toChar())
}
But in this case, the stream is not accepted as gzip.
Any ideas?
It looks like you're confusing zip (an archive format that supports compression) with gzip (a simple compressed format).
Downloading https://donnees.roulez-eco.fr/opendata/instantane (e.g. with curl) and checking the result shows that it's a zip archive (containing a single file, PrixCarburants_instantane.xml).
But you're trying to decode it as a gzip stream (with GZIPInputStream), which it's not — hence your issue.
Reading a zip file is slightly more involved than reading a gzip file, because it can hold multiple compressed files. But ZipInputStream makes it fairly easy: you can read the first zip entry (which has metadata including its uncompressed size), and then go on to read the actual data in that entry.
A further complication is that this particular compressed file seems to use ISO 8859-1 encoding, not the usual UTF-8. So you need to take that into account when converting the byte stream into text.
Here's some example code:
val zipStream = ZipInputStream(con.inputStream)
val entry = zipStream.nextEntry
val reader = InputStreamReader(zipStream, Charset.forName("ISO-8859-1"))
for (i in 1..entry.size)
print(reader.read().toChar())
Obviously, reading and printing the entire 11MB file one character at a time is not very efficient! And if there's any possibility that the zip archive could have multiple entries, you'd have to read through them all, stopping when you get to the one with the right name. But I hope this is a good illustration.
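A buffered variant that also searches for an entry by name might look like the sketch below, written in Java. It reads until the end of the entry instead of counting up to the entry size, because ZipEntry.getSize() can return -1 when the size is not known up front. The class and method names are hypothetical; the ISO-8859-1 charset is taken from the answer above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipEntryReader {
    /** Returns the text of the named entry, or null if the archive has no such entry. */
    static String readEntry(InputStream raw, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(raw)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue; // not the entry we want, move on to the next one
                }
                // ZipInputStream reports end-of-stream at the end of the current entry.
                Reader reader = new InputStreamReader(zip, StandardCharsets.ISO_8859_1);
                StringBuilder text = new StringBuilder();
                char[] buf = new char[8192];
                int n;
                while ((n = reader.read(buf)) != -1) {
                    text.append(buf, 0, n);
                }
                return text.toString();
            }
        }
        return null;
    }
}
```

For the URL in the question this would be called as ZipEntryReader.readEntry(con.getInputStream(), "PrixCarburants_instantane.xml"). For an 11MB file, handing the reader to a streaming XML parser instead of building one big String would likely be the better choice.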

Headers getting added to file content while retrieving file from APIGatewayProxyRequestEvent in AWS lambda

I am using AWS Lambda to push the file to S3 through Java code.
When I send the file from Postman or from Angular and print its content in the Java function, headers are automatically added to the file content, like this:
"----------------------------965855468995803568737630
Content-Disposition: form-data; name="test"; filename="test.pdf"
Content-Type: application/pdf"
How can I get the file content without these headers from APIGatewayProxyRequestEvent?
This is the code I am using to print the file content:
context.getLogger().log("Input File: "+apiGatewayProxyRequestEvent.getBody());
This is a tricky one to solve. The getBody() method gives you the actual request body that was sent through the APIGatewayProxyRequest, so you get back exactly what was sent: the file encoded as form-data, with a Content-Type and a filename. It is your responsibility to convert the form-data back into a usable object format if you want to print the content.
If you have a look at this tutorial on Medium you can see an approach to this. It boils down to processing the data and working with the format boundary:
//Get the uploaded file and decode from base64
byte[] bI = Base64.decodeBase64(event.getBody().getBytes());
//Get the content-type header and extract the boundary
Map<String, String> hps = event.getHeaders();
String contentType = "";
if (hps != null) {
contentType = hps.get("content-type");
}
String[] boundaryArray = contentType.split("=");
//Transform the boundary to a byte array
byte[] boundary = boundaryArray[1].getBytes();
//Log the extraction for verification purposes
logger.log(new String(bI, "UTF-8") + "\n");
That last line gets you what you want, which is printing the body content, although if it's a binary format that might not be very useful. I'd recommend giving that tutorial a full read, as it shows how to iterate through the data stream and create the object.
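To give an idea of what working with the boundary involves, here is a minimal sketch that cuts the content of the first part out of a multipart/form-data body. It assumes the body has already been Base64-decoded into bytes, that the boundary string has been extracted from the Content-Type header as in the snippet above, and that lines end in CRLF. The class and method names are hypothetical, and a real multipart parser handles more cases (several parts, preamble text, quoted boundaries).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MultipartBodyExtractor {
    /** Index of the first occurrence of pattern in data at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Returns the raw bytes of the first part, without its part headers. */
    static byte[] firstPartContent(byte[] body, String boundary) {
        byte[] delimiter = ("--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
        byte[] headerEnd = "\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] closing = ("\r\n--" + boundary).getBytes(StandardCharsets.ISO_8859_1);

        int partStart = indexOf(body, delimiter, 0);
        if (partStart < 0) {
            throw new IllegalArgumentException("Boundary not found in body");
        }
        // The part headers end at the first empty line after the delimiter.
        int headersEnd = indexOf(body, headerEnd, partStart + delimiter.length);
        if (headersEnd < 0) {
            throw new IllegalArgumentException("End of part headers not found");
        }
        int contentStart = headersEnd + headerEnd.length;
        // The content runs until the CRLF that precedes the next delimiter.
        int contentEnd = indexOf(body, closing, contentStart);
        if (contentEnd < 0) {
            throw new IllegalArgumentException("Closing boundary not found");
        }
        return Arrays.copyOfRange(body, contentStart, contentEnd);
    }
}
```

With the variables from the snippet above, the call would be MultipartBodyExtractor.firstPartContent(bI, boundaryArray[1]). Working on bytes rather than on a String keeps binary content such as a PDF intact.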

Manipulate big XML file in Java Spring

I have a Java program (a war) that runs out of memory when manipulating a big XML file.
The program is a REST API that returns the manipulated XML via a REST Controller.
First, the program gets an XML file from a remote URL.
Then it replaces the values of id attributes.
Finally, it returns the new XML to the caller via the API controller.
What I get from the remote URL is a byte[] body with XML data.
Then, I convert it to a String.
Next, I do a regexp search-replace on the whole string.
Then I convert it back to a byte[].
I'm guessing that the XML now is in memory 3 times (the incoming bytes, the string and the outgoing bytes).
I'm looking for ways to improve this.
I have no local copies on the filesystem btw.
You can delete the incoming bytes from memory after converting the bytes to String:
byte[] bytes = bytesFromURL;
String xml = new String(bytes);
{...manipulate xml}
bytes = null;
System.gc();
bytes = xml.getBytes();

JAX-RS and character encoding problems

I am using JAX-RS and have a simple POST web service that takes an InputStream containing a MIME message (XML + file).
The MIME message is in UTF-8; the file contained as a body part is an email message in MIME (RFC 822) format in ISO-8859-1 encoding, which I'm converting to PDF using Aspose.
When running as a web service, the resulting PDF has incorrect characters (ø, å etc.). But when I use the exact same input, read from a file and passed to the method as a FileInputStream, the resulting PDF is OK.
Here is the simplified version of the code:
@POST
@Path(value = "/documents/convert/{flag}")
@Produces("text/plain")
public String convertFile(InputStream input, @PathParam("flag") String flag) throws WebApplicationException {
FileInfo info = convertToPdf(input);
return info.getResponse();
}
If I run this as a web service, it produces a PDF with incorrectly encoded characters, showing a "box" in place of some characters (such as ø, å etc.). When I run the same code with the same input by calling
FileInputStream fis = new FileInputStream(file);
convertFile(fis);
the resulting PDF has correct encoding (the WS is run on server, testing with file is done on my local machine).
Could this be incorrect setting of locale on the server?
Do you use an InputStreamReader to read the FileInputStream? If so, did you initialize it using the two-parameter constructor, with Charset.forName("UTF-8") as the second argument (as you mentioned, the incoming stream is already in UTF-8)?
You might need to tell the container that it's UTF-8.
Something like:
@Produces("text/plain; charset=utf-8")
Apparently your local file and your MIME message body are not encoded the same way.
Your post states that the file is encoded in ISO-8859-1.
If you are using an InputStreamReader (as Xavier Coulon suggests), you should pass the expected encoding to it. In this case:
Charset.forName("ISO-8859-1")
If this does not help, could you please provide the content of the convertToPdf(InputStream is) method
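The effect of passing the right or wrong charset to InputStreamReader can be shown with a small standalone sketch. It has nothing to do with Aspose or JAX-RS; the helper name is hypothetical and the sample characters are the ones mentioned in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class CharsetDecodingDemo {
    /** Reads the whole stream as text, decoding the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }
}
```

Decoding ISO-8859-1 bytes for "ø, å" with StandardCharsets.ISO_8859_1 gives the original text back, while decoding the same bytes as UTF-8 turns both letters into the replacement character U+FFFD. That is the kind of damage that would later show up as boxes in a rendered document. When no charset is passed at all, the platform default is used, which can differ between a server and a local machine.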

How to parse an XML file containing BOM?

I want to parse an XML file from URL using JDOM. But when trying this:
SAXBuilder builder = new SAXBuilder();
builder.build(aUrl);
I get this exception:
Invalid byte 1 of 1-byte UTF-8 sequence.
I thought this might be the BOM issue. So I checked the source and saw the BOM in the beginning of the file. I tried reading from URL using aUrl.openStream() and removing the BOM with Commons IO BOMInputStream. But to my surprise it didn't detect any BOM.
I tried reading from the stream and writing to a local file and parse the local file. I set all the encodings for InputStreamReader and OutputStreamWriter to UTF8 but when I opened the file it had crazy characters.
I thought the problem is with the source URL encoding. But when I open the URL in browser and save the XML in a file and read that file through the process I described above, everything works fine.
I appreciate any help on the possible cause of this issue.
That HTTP server is sending the content in GZIPped form (Content-Encoding: gzip; see http://en.wikipedia.org/wiki/HTTP_compression if you don't know what that means), so you need to wrap aUrl.openStream() in a GZIPInputStream that will decompress it for you. For example:
builder.build(new GZIPInputStream(aUrl.openStream()));
Edited to add, based on the follow-up comment: If you don't know in advance whether the URL will be GZIPped, you can write something like this:
private InputStream openStream(final URL url) throws IOException
{
final URLConnection cxn = url.openConnection();
final String contentEncoding = cxn.getContentEncoding();
if(contentEncoding == null)
return cxn.getInputStream();
else if(contentEncoding.equalsIgnoreCase("gzip")
|| contentEncoding.equalsIgnoreCase("x-gzip"))
return new GZIPInputStream(cxn.getInputStream());
else
throw new IOException("Unexpected content-encoding: " + contentEncoding);
}
(warning: not tested) and then use:
builder.build(openStream(aUrl));
This is basically equivalent to the above (aUrl.openStream() is explicitly documented to be a shorthand for aUrl.openConnection().getInputStream()), except that it examines the Content-Encoding header before deciding whether to wrap the stream in a GZIPInputStream.
See the documentation for java.net.URLConnection.
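If the Content-Encoding header cannot be trusted or is not available (for example when the bytes come from somewhere other than a URLConnection), a different technique is to peek at the first two bytes of the stream and look for the gzip magic number 0x1f 0x8b. The sketch below illustrates that alternative and is not what the answer above does; the class and method names are hypothetical, and InputStream.readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

class GzipSniffer {
    /** Wraps the stream in a GZIPInputStream only if it starts with the gzip magic bytes. */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] header = new byte[2];
        int n = in.readNBytes(header, 0, 2);
        if (n > 0) {
            // Put the peeked bytes back so the caller sees the full stream.
            in.unread(header, 0, n);
        }
        boolean gzip = n == 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }
}
```

Usage would be builder.build(GzipSniffer.maybeGunzip(aUrl.openStream())). Checking the header as in the answer is the more principled approach when the server sets it correctly; sniffing is only a fallback.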
You might find you can avoid handling encoded responses by sending a blank Accept-Encoding header. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html: "If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding.". That seems to be occurring here.
