Strange byte[] behavior reading from a URL


In the end, my ultimate goals are:
Read from a URL (what this question is about)
Save the retrieved [PDF] content to a BLOB field in a DB (already have that nailed down)
Read from the BLOB field and attach that content to an email
All without going to a filesystem
The goal with the following method is to get a byte[] that can be used downstream as an email attachment (to avoid writing to disk):
public byte[] retrievePDF() {
HttpClient httpClient = new HttpClient();
GetMethod httpGet = new GetMethod("http://website/document.pdf");
httpClient.executeMethod(httpGet);
InputStream is = httpGet.getResponseBodyAsStream();
byte[] byteArray = new byte[(int) httpGet.getResponseContentLength()];
is.read(byteArray, 0, byteArray.length);
return byteArray;
}
For a particular PDF, the getResponseContentLength() method returns 101,689 as the length. The strange part is that if I set a break-point and interrogate the byteArray variable, it has 101,689 byte elements, however, after byte #3744 the remaining bytes of the array are all zeroes (0). The resulting PDF is then not readable by a PDF-reader client, like Adobe Reader.
Why would that happen?
Retrieving this same PDF via browser and saving to disk, or using a method like the following (which I patterned after an answer to this StackOverflow post), results in a readable PDF:
public void retrievePDF() {
FileOutputStream fos = null;
URL url;
ReadableByteChannel rbc = null;
url = new URL("http://website/document.pdf");
DataSource urlDataSource = new URLDataSource(url);
/* Open a connection, then set appropriate time-out values */
URLConnection conn = url.openConnection();
conn.setConnectTimeout(120000);
conn.setReadTimeout(120000);
rbc = Channels.newChannel(conn.getInputStream());
String filePath = "C:\\temp\\";
String fileName = "testing1234.pdf";
String tempFileName = filePath + fileName;
fos = new FileOutputStream(tempFileName);
fos.getChannel().transferFrom(rbc, 0, 1 << 24);
fos.flush();
/* Clean-up everything */
fos.close();
rbc.close();
}
For both approaches, the size of the resulting PDF is 101,689-bytes when doing a Right-click > Properties... in Windows.
Why would the byte array essentially "stop" part-way through?

InputStream.read reads up to the requested number of bytes but may read fewer. It returns how many bytes it actually read, or -1 at end of stream. Call it repeatedly until the array is full, like this:
int bytesRead = 0;
while (bytesRead < byteArray.length) {
    // Ask only for the space that is left; passing byteArray.length here would overrun the array
    int n = is.read(byteArray, bytesRead, byteArray.length - bytesRead);
    if (n == -1) break;
    bytesRead += n;
}

Check the return value of InputStream.read. It's not going to read all at one go. You have to write a loop. Or, better yet, use Apache Commons IO to copy the stream.
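The loop described above can be sketched as follows. This is a minimal sketch rather than the asker's code: the class name StreamBytes and the method readAll are hypothetical. It copies into a ByteArrayOutputStream, so it does not depend on the Content-Length header being present or accurate. The demo stream hands out at most 3744 bytes per call, imitating the short reads a network stream can produce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class StreamBytes {
    /** Reads until end-of-stream, however many read() calls that takes. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() may return fewer bytes than requested; only -1 means "done"
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[101_689];
        Arrays.fill(source, (byte) 7);
        // Hypothetical stand-in for a network stream: never more than 3744 bytes per read
        InputStream trickle = new ByteArrayInputStream(source) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3744));
            }
        };
        byte[] copy = readAll(trickle);
        System.out.println(copy.length);                 // prints 101689
        System.out.println(Arrays.equals(source, copy)); // prints true
    }
}
```

On Java 9 or later, InputStream.readAllBytes() does the same job, and with Apache Commons IO on the classpath IOUtils.toByteArray(InputStream) is an alternative.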

101689 = 2^16 + 36153, so it looks as if there might be a 16-bit limitation on the buffer size.
The difference between 36153 and 3744 may stem from the header part having been read into a separate small buffer of 1K or so, which already contained some bytes.

Related

In java some redirected urls return the data from redirected page okay but some do not

I have some Java code that takes a URL and returns the data (a BufferedImage is constructed from it at a later point). The problem is that it works for most URLs from a particular website, but not all.
The urls are actually redirects so for example I pass
//https://coverartarchive.org/release/bdd0e35c-ce68-3f5b-b957-f83ab5846111/front
it will actually redirect to
//https://ia600301.us.archive.org/31/items/mbid-bdd0e35c-ce68-3f5b-b957-f83ab5846111/mbid-bdd0e35c-ce68-3f5b-b957-f83ab5846111-6094238097.jpg
and return the correct data
But if I pass the seemingly very similar url
//http://coverartarchive.org/release/6b105b89-21ee-414a-b98f-b2756c92b0bc/front
then, although this is the URL shown if it is pasted into a web browser
//https://ia902907.us.archive.org/33/items/mbid-6b105b89-21ee-414a-b98f-b2756c92b0bc/mbid-6b105b89-21ee-414a-b98f-b2756c92b0bc-3167310704.jpg
My code only returns 169 bytes
If I pass the url it redirects to directly
//https://ia902907.us.archive.org/33/items/mbid-6b105b89-21ee-414a-b98f-b2756c92b0bc/mbid-6b105b89-21ee-414a-b98f-b2756c92b0bc-3167310704.jpg
then it works okay, but I don't have this URL in advance, so that is not a solution.
This is quite old code and there may be a better way to do it now, but is my code broken or is the website broken?
private static byte[] convertUrlToByteArray(URL url) throws IOException
{
//Get imagedata, we want to ensure we just write the data as is as long as in a supported format
URLConnection connection = url.openConnection();
connection.setConnectTimeout(URL_TIMEOUT);
connection.setReadTimeout(URL_TIMEOUT);
// Since you get a URLConnection, use it to get the InputStream
InputStream in = connection.getInputStream();
// Now that the InputStream is open, get the content length
int contentLength = connection.getContentLength();
// To avoid having to resize the array over and over and over as
// bytes are written to the array, provide an accurate estimate of
// the ultimate size of the byte array
ByteArrayOutputStream tmpOut;
if (contentLength != -1)
{
tmpOut = new ByteArrayOutputStream(contentLength);
}
else
{
tmpOut = new ByteArrayOutputStream(16384); // Pick some appropriate size
}
byte[] buf = new byte[1024];
while (true)
{
int len = in.read(buf);
if (len == -1)
{
break;
}
tmpOut.write(buf, 0, len);
}
in.close();
tmpOut.close(); // No effect, but good to do anyway to keep the metaphor alive
return tmpOut.toByteArray();
}
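One possible explanation, which nothing in the question confirms: the failing URL starts with http:// while its target is https://, and HttpURLConnection's automatic redirect handling does not follow a redirect that changes protocol. In that case the 169 bytes would be the body of the redirect response itself. A minimal sketch that follows redirects by hand is shown below; the names RedirectFetch, MAX_REDIRECTS, isRedirect, resolveLocation, and fetch are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

final class RedirectFetch {
    static final int MAX_REDIRECTS = 5; // hypothetical limit to avoid redirect loops

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a Location header value, which may be relative, against the current URI. */
    static URI resolveLocation(URI current, String location) {
        return current.resolve(location);
    }

    static byte[] fetch(URI start, int timeoutMillis) throws IOException {
        URI current = start;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle every redirect ourselves
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + current);
                }
                current = resolveLocation(current, location); // may switch http -> https
                continue;
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
        throw new IOException("Too many redirects starting at " + start);
    }
}
```

Logging the response code and the Location header of the first response is a quick way to check whether this assumption applies before changing the download code.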

How send the http post request with DataOutPutStream properly so server can handle it

I would like someone to explain the difference between writing just a file and writing a file together with other bytes.
The server uses Python 3 and Flask.
Libraries such as Retrofit on Android might be useful, but I would like to try the classic method, HttpURLConnection.
I have successfully sent one or more string parameters to the server.
I have also successfully sent just a file to the server.
- My file is a 5-second audio or video MP4 created on a real Android device.
When I tried just two items, a parameter and a list of bytes (len(list) = 2), I could get my sent file back, but concatenating the bytes did not work.
But when I combine both, I found that when the file is chopped into multiple parts, the file cannot be recovered.
I know a delimiter is useful; I tried a string of "--------------" and split on it on the server side.
list= request.data.split(b"------------------------------")
newList= list[1:]
data = b""
for part in newList:
data += part
How I recover the file (Python):
def createAudioFromDataReceived(fileName, data):
with open(fileName, 'wb') as wfile:
wfile.write(data)
The basic code that writes to the DataOutputStream:
public void writeFilesParamToDataOutputStream(HttpURLConnection conn, File file, String action) throws IOException {
byte[] buffer;
FileInputStream fileInputStream = new FileInputStream(file);
DataOutputStream dos = new DataOutputStream(conn.getOutputStream());
buffer = new byte[1024 * 1024];
int length = 0;
while ( ( length = fileInputStream.read( buffer ) ) > 0 ) {
dos.write(buffer, 0, length);
}
dos.flush();
fileInputStream.close();
dos.close();
}
To add an extra line to the DataOutputStream:
//Bytes
byte[] bytes = "toSend".getBytes();
dos.write(bytes);
// DataOutputStream has no write(String); writeBytes writes the low byte of each char
dos.writeBytes("------------------------------");

Oops, I had seen a reference for this before,
roughly of this kind: How to send data from server to Android?
I could not have imagined that the bytes need a lot of "-" characters and "\r\n" ...
The delimiter should be something like
String delimiter = "--aaWEdFXvDF--" + "\r\n";
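A common way to send a parameter and a file in one request is a multipart/form-data body, where each part is introduced by "--" plus a boundary string and the body ends with the boundary followed by "--". The sketch below is a minimal illustration under assumptions: the class MultipartBody, its method names, the field names "action" and "file", and the file name clip.mp4 are all hypothetical, and the boundary must not occur inside the file data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class MultipartBody {
    private static final String CRLF = "\r\n";

    /** Writes one plain text form field. */
    static void writeText(OutputStream out, String boundary, String name, String value)
            throws IOException {
        String part = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF
                + CRLF
                + value + CRLF;
        out.write(part.getBytes(StandardCharsets.UTF_8));
    }

    /** Writes one file part; the bytes are copied unchanged between the part header and CRLF. */
    static void writeFile(OutputStream out, String boundary, String name, String fileName,
                          InputStream data) throws IOException {
        String header = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: application/octet-stream" + CRLF
                + CRLF;
        out.write(header.getBytes(StandardCharsets.UTF_8));
        byte[] buffer = new byte[8192];
        int n;
        while ((n = data.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.write(CRLF.getBytes(StandardCharsets.UTF_8));
    }

    /** Writes the closing boundary that ends the whole body. */
    static void finish(OutputStream out, String boundary) throws IOException {
        out.write(("--" + boundary + "--" + CRLF).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String boundary = "aaWEdFXvDF"; // hypothetical boundary value
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeText(body, boundary, "action", "upload");
        writeFile(body, boundary, "file", "clip.mp4",
                new ByteArrayInputStream(new byte[] {1, 2, 3}));
        finish(body, boundary);
        System.out.println(body.size() + " bytes in the multipart body");
    }
}
```

With HttpURLConnection, the same methods can write to conn.getOutputStream() after calling conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary). On the Flask side the parts should then be available through request.form and request.files, so no manual splitting of request.data is needed.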

How to unzip file from InputStream

I'm trying to get a zip file from the server.
I'm using HttpURLConnection to get an InputStream, and this is what I have:
myInputStream.toString().getBytes().toString() is equal to [B@4.....
byte[] bytes = Base64.decode(myInputStream.toString(), Base64.DEFAULT);
String string = new String(bytes, "UTF-8");
string == �&ܢ��z�m����y....
I really tried to unzip this file but nothing works, and there are so many questions: should I use GZIPInputStream or ZipInputStream? Do I have to save this stream as a file, or can I work on the InputStream directly?
Please help, my boss is getting impatient :O
I have no idea what is in this file; I have to find out :)

GZIPInputStream and ZipInputStream handle two different formats. https://en.wikipedia.org/wiki/Gzip
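One way to tell the two formats apart is to peek at the first two bytes: gzip data starts with 0x1F 0x8B, and ZIP archives start with the ASCII letters "PK". The following is a minimal sketch; CompressionSniffer, Kind, peekable, and sniff are hypothetical names, and the peeked bytes are pushed back so the stream can still be handed to the matching decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class CompressionSniffer {
    enum Kind { GZIP, ZIP, UNKNOWN }

    /** Wraps the raw stream so that the two peeked bytes can be pushed back. */
    static PushbackInputStream peekable(InputStream raw) {
        return new PushbackInputStream(raw, 2);
    }

    /** Looks at the first two bytes, then un-reads them so nothing is consumed. */
    static Kind sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[2];
        int got = 0;
        while (got < head.length) {
            int n = in.read(head, got, head.length - got);
            if (n == -1) break;
            got += n;
        }
        if (got > 0) in.unread(head, 0, got);
        if (got < 2) return Kind.UNKNOWN;
        if ((head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) return Kind.GZIP;
        if (head[0] == 'P' && head[1] == 'K') return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipStart = {'P', 'K', 3, 4};
        PushbackInputStream in = peekable(new ByteArrayInputStream(zipStart));
        System.out.println(sniff(in)); // prints ZIP
    }
}
```

If the result is UNKNOWN, the data may be Base64 text or something else entirely, and it is worth inspecting the response headers before guessing further.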
It is not a good idea to retrieve a string directly from the stream. From an InputStream, you can create a File and write data into it using a FileOutputStream.
Base64 decoding is a separate matter. If your stream has already been decoded upstream, that is fine; otherwise you have to wrap your stream in another input stream that decodes Base64.
The best practice is to use a buffer to avoid running out of memory.
Here is some Kotlin code that decompresses the zipped InputStream into a file (simpler than Java, where the management of byte[] is tedious):
val fileBinaryDecompress = File(..path..)
val outputStream = FileOutputStream(fileBinaryDecompress)
readFromStream(ZipInputStream(myInputStream), BUFFER_SIZE_BYTES,
object : ReadBytes {
override fun read(buffer: ByteArray) {
outputStream.write(buffer)
}
})
outputStream.close()
interface ReadBytes {
/**
* Called after each buffer fill
* @param buffer filled
*/
@Throws(IOException::class)
fun read(buffer: ByteArray)
}
@Throws(IOException::class)
fun readFromStream(inputStream: InputStream, bufferSize: Int, readBytes: ReadBytes) {
val buffer = ByteArray(bufferSize)
var read = 0
while (read != -1) {
read = inputStream.read(buffer, 0, buffer.size)
if (read != -1) {
val optimizedBuffer: ByteArray = if (buffer.size == read) {
buffer
} else {
buffer.copyOf(read)
}
readBytes.read(optimizedBuffer)
}
}
}
If you want to get the file from the server without decompressing it, remove the ZipInputStream() decorator.

GZIPInputStream and ZipInputStream read different formats (a single gzip-compressed stream versus a ZIP archive made of entries), so only the one that matches the data will work.
Next, you need to identify whether the zipped stream was Base64 encoded, or whether some Base64-encoded content was put into a zipped stream. From what you put in your question, it seems to be the latter.
So you should try
ZipInputStream zis = new ZipInputStream( myInputStream );
ZipEntry ze = zis.getNextEntry();
// ZipInputStream has no getInputStream(); after getNextEntry(), read the entry's bytes from zis itself
InputStream is = zis;
and proceed from there ...
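Proceeding from there could look like the sketch below, which reads every entry of a ZIP stream into memory without touching the filesystem. It assumes the data really is a ZIP archive and is small enough to hold in memory; ZipStreams and unzip are hypothetical names, and the demo builds its own tiny archive instead of contacting a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipStreams {
    /** Returns entry name -> uncompressed bytes, in archive order. */
    static Map<String, byte[]> unzip(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the whole archive
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory to stand in for the server response
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipped)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Map<String, byte[]> files = unzip(new ByteArrayInputStream(zipped.toByteArray()));
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

If the response turns out to be Base64 text wrapping a ZIP archive, one option on a JVM that provides java.util.Base64 is to pass Base64.getDecoder().wrap(myInputStream) to unzip instead of the raw stream.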

Basically, wrapping the InputStream in a GZIPInputStream should let you read the actual content.
Also, for simplicity, IOUtils from Apache Commons IO makes your life easy.
This works for me:
InputStream is = ...; // initialize your InputStream
is = new GZIPInputStream(is);
byte[] bytes = IOUtils.toByteArray(is);
String s = new String(bytes);
System.out.println(s);

Who is adding "\n" in Base64 encoded image when I write it in a file? Java

WHAT I'M DOING
I need to send via HTTPS request a JsonArray with some data and images in Base64 encoded strings. This works well if data is stored in memory.
Now I need to avoid loading all the data into memory, so I'm creating a temporary file on the Android device with all the data that I need to send.
To create the file, I'm writing lots of JsonObjects into it. Some of these JsonObjects have a field that represents the image. When I detect one, I get the image path and encode the image as a Base64 String.
UPDATE:
First of all, I initialize the file and get the BufferedWriter:
File f = new File(privateSincronizePath+File.separator+"upload_"+timefile+".json");
f.createNewFile();
FileWriter fw = new FileWriter(f.getAbsoluteFile(), true);
BufferedWriter bw = new BufferedWriter(fw);
Here is the code that creates the image field when it exists:
JSONObject jobInf = new JSONObject();
jobInf.put("xx", "INSERT");
jobInf.put("xx", c.getString(c.getColumnIndex("xx")));
jobInf.put("xx", ""+c.getInt(c.getColumnIndex("xx")));
jobInf.put("xx", c.getString(c.getColumnIndex("xx")));
JSONObject jsonObject = new JSONObject(docu.filtre(dades, docu.getXx()));
Iterator<?> keys = jsonObject.keys();
boolean updated = false;
while(keys.hasNext() && !updated){
String key = (String)keys.next();
if (key != null && key.equals("path") && key.length() > 0){
jsonObject.put(key, getBase64EncodeFromImage(jsonObject.getString(key)));
updated = true;
}
}
jobInf.put("object", jsonObject);
escriure(bw, ","+jobInf.toString());
Method escriure():
UPDATE: This method is called every time I finish creating a JsonObject. It only appends the JsonObject as a String to the file.
private void escriure(BufferedWriter bw, String s) throws IOException {
uploadLength += s.length();
bw.append(s);
}
Finally, when file is created, I'm reading it and, using OutputStream, I'm sending the data as Post parameters to the server:
this.Https_url = new URL(url+"/sync.php?");
HttpsURLConnection con = (HttpsURLConnection) Https_url.openConnection();
con.setReadTimeout(60000);
con.setConnectTimeout(60000);
con.setRequestMethod("POST");
con.setDoInput(true);
con.setDoOutput(true);
con.setFixedLengthStreamingMode(uploadLength);
OutputStream os = con.getOutputStream();
InputStream inp = new FileInputStream(new File(privateSincronizePath+File.separator+"upload_"+timefile+".json"));
byte[] buffer = new byte[1024];
int len;
while ((len = inp.read(buffer)) != -1) {
os.write(buffer, 0, len);
}
inp.close();
os.close();
// Establim la connexió:
con.connect();
WHAT IS THE PROBLEM
The problem is simple. When I open the image on the server, the file is corrupted and the image does not display.
WHAT I NOTICED
If I capture the Base64-encoded image string before it is written to the file and upload it to the server, the image is OK. After it is written to the file, the image appears to be corrupted.
After investigating, I noticed that the Base64-encoded string, once written to the file, has a "\n" every x characters.
If I delete all of these "\n" line breaks, the server can open the image correctly.
THE QUESTION
Who is inserting these line breaks? How can I write the Base64-encoded String "as is"?
Thanks all for your help in advance!
THE ANSWER
As Schoenobates answered, the solution was to use the NO_WRAP flag.
To add more information: on the server side we use this function to read the Base64 string encoded with the URL_SAFE flag.
The function, taken from a comment by TOM on php.net, is:
<?php
function base64url_decode($base64url)
{
$base64 = strtr($base64url, '-_', '+/');
$plainText = base64_decode($base64);
return ($plainText);
}
?>
Thanks to StackOverflow, Axel and Schoenobates for your time!

That'll be the Android Base64 class - you need to set the flags to remove the newlines:
byte[] image = ...;
Base64.encodeToString(image, Base64.NO_WRAP | Base64.URL_SAFE);
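The wrapping effect can also be reproduced off-device with the standard java.util.Base64 class. This is an analogy and an assumption, not the android.util.Base64 class used in the answer: the MIME encoder inserts a line break every 76 characters, the basic encoder never does, and the URL encoder uses '-' and '_' instead of '+' and '/'. The class name Base64Wrapping and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64Wrapping {
    /** Single-line output, comparable in spirit to Android's NO_WRAP. */
    static String encodeNoWrap(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** MIME output: a CRLF line break after every 76 characters. */
    static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Single-line output with the URL-safe alphabet ('-' and '_'). */
    static String encodeUrlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] image = new byte[100]; // stand-in for real image bytes
        Arrays.fill(image, (byte) 0xFF);
        System.out.println(encodeMime(image).contains("\r\n"));   // prints true
        System.out.println(encodeNoWrap(image).contains("\n"));   // prints false
        System.out.println(encodeUrlSafe(image).contains("/"));   // prints false
    }
}
```

A decoder that is strict about line breaks rejects the wrapped form, which matches the corruption described in the question.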

Fetch attachment content using javamail

I am using javamail to automate some email handling.
I managed to connect to the POP3 server and fetch the messages. Some of them contain an attachment. Based on the email title, I am able to "predict" the filename of the attachment that I need to fetch.
But I can't get its content :(
I have a function
public byte[] searchForContent(Part part,String fileName){
if(part.getFileName()!=null){
if(part.getFileName().equals(fileName)){
byte[] content = new byte[part.getSize()];
part.getInputStream().read(content);
return content;
}
}
return null;
}
The function works very well (i.e. it returns content only if the part is the attachment described by fileName). But the array it returns is too big.
The downloaded attachment is 256 bytes long and the function returns 352 bytes of content.
I think that the problem comes from the headers, but I can't be sure.
How would you proceed to get the content only?
Thank you.

For what it's worth, the API documentation for javax.mail.Part.getSize() says
Note that the size may not be an exact measure of the content size and may or may not account for any transfer encoding of the content. The size is appropriate for display in a user interface to give the user a rough idea of the size of this part.
Assuming that you know the type of content, you can probably fetch it from Part.getContent() and process that. getContent() returns an Object, but if you know the content type you can cast it appropriately (e.g. to a String for text).

I finally found a solution.
As eaj said, part.getSize() returns the size of the part object, not the size of the attachment itself.
However, the InputStream returned by part.getInputStream() contains only the content of the attachment.
So the code below gives the expected result:
public byte[] searchForContent(Part part,String fileName){
if(part.getFileName()!=null){
if(part.getFileName().equals(fileName)){
InputStream stream=part.getInputStream();
byte[] buffer = new byte[512];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = stream.read(buffer)) != -1)
{
baos.write(buffer, 0, bytesRead);
}
return baos.toByteArray();
}
}
return null;
}
