JMeter not decoding base64 correctly - Results in blank PDF - java

In JMeter, I get a base64 encoded PDF in a Response that I extract using the RegEx Extractor. That is all working great.
Then I need to decode that base64 encoded document and write it out to a file, which I'm doing with the following in a BeanShell Post Processor:
import org.apache.commons.io.FileUtils;
import org.apache.commons.codec.binary.Base64;
// Set the response variable
String response = vars.get("documentText");
// Remove the carriage return hex code and condense to single string
String encodedFile = response.replace("&#xd;","").replaceAll("[\n]+","");
// Decode the encoded string
vars.put("decodedFile",new String(Base64.decodeBase64(encodedFile)));
// Write out the decoded file
Output = vars.get("decodedFile");
f = new FileOutputStream("C:\\Users\\user\\Desktop\\decodedFile.pdf");
p = new PrintStream(f);
this.interpreter.setOut(p);
print(Output);
p.flush();
f.close();
My problem is that the file that gets decoded and written out opens as a blank PDF.
In troubleshooting this, I wrote out a file with the encoded string from JMeter and then manually decoded it using this base64 tool. When I manually decoded the file, it opened as expected.
I then compared the text of the file produced by JMeter with the one I decoded with the tool, and noticed that the file produced by JMeter included random question marks ('?') throughout.
I am assuming this must be the culprit, however, I do not know what is causing these to show up or how to fix it.

JMeter is not decoding Base64 correctly because JMeter cannot decode Base64 at all. If you are using some custom code to do it, I would suggest looking into that code first.
Given you need to do this magic:
String encodedFile = response.replace("&#xd;","").replaceAll("[\n]+","");
my expectation is that either your regular expression or the server response is broken.
Given you use a scripting-based post-processor, you don't need this interim "regex" step: you can access the parent sampler's response directly from the Beanshell PostProcessor via the data shorthand (a byte array holding the raw response).
So your whole script can be reduced to something like:
FileUtils.writeByteArrayToFile(new File("C:\\Users\\user\\Desktop\\decodedFile.pdf"), Base64.decodeBase64(data));
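For reference, a minimal sketch of the complete Beanshell PostProcessor, assuming the whole response body is the Base64 payload and that commons-io and commons-codec (both bundled with JMeter) are on the classpath; the output path is just an example:

import java.io.File;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.io.FileUtils;

// 'data' is the parent sampler's response as a raw byte array
FileUtils.writeByteArrayToFile(
        new File("C:\\Users\\user\\Desktop\\decodedFile.pdf"),
        Base64.decodeBase64(data));

The point is that the decoded bytes never pass through a String, so no unmappable characters get replaced with '?' on the way to disk.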
As a fallback option, you can execute the decb64.exe program using the OS Process Sampler.

Related

Converting string to byte[] returns wrong value (encoding?)

I read a byte[] from a file and convert it to a String:
byte[] bytesFromFile = Files.readAllBytes(...);
String stringFromFile = new String(bytesFromFile, "UTF-8");
I want to compare this to another byte[] I get from a web service:
String stringFromWebService = webService.getMyByteString();
byte[] bytesFromWebService = stringFromWebService.getBytes("UTF-8");
So I read a byte[] from a file and convert it to a String and I get a String from my web service and convert it to a byte[]. Then I do the following tests:
// works!
org.junit.Assert.assertEquals(stringFromFile, stringFromWebService);
// fails!
org.junit.Assert.assertArrayEquals(bytesFromFile, bytesFromWebService);
Why does the second assertion fail?
Other answers have covered the likely fact that the file is not UTF-8 encoded, giving rise to the symptoms described.
However, I think the most interesting aspect of this is not that the byte[] assert fails, but that the assert that the string values are the same passes. I'm not 100% sure why this is, but I think the following trawl through the source code might give us the answer:
Looking at how new String(bytesFromFile, "UTF-8"); works - we see that the constructor calls through to StringCoding.decode()
This in turn, if supplied with the UTF-8 character set, calls through to StringDecoder.decode()
This calls through to CharsetDecoder.decode(), which decides what to do when a byte sequence is malformed or unmappable (which will be the case when non-UTF-8 bytes are presented)
The CharsetDecoder default action is defined by
private CodingErrorAction unmappableCharacterAction
= CodingErrorAction.REPORT;
but StringCoding's StringDecoder overrides it (and the malformed-input action) with CodingErrorAction.REPLACE.
This means that when the decoder hits a malformed or unmappable sequence, it silently substitutes the replacement character U+FFFD instead of failing - so both String representations carry the same replacement characters and compare equal, while re-encoding U+FFFD yields different bytes than the original invalid sequence, so the byte[] are no longer the same.
This hypothesis is kind of supported by the fact that the catch block for CharacterCodingException in StringCoding.decode() says:
} catch (CharacterCodingException x) {
// Substitution is always enabled,
// so this shouldn't happen
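The hypothesis is easy to reproduce in isolation; a standalone sketch (the byte values are arbitrary, 0xFF is never valid in UTF-8):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

byte[] original = { 'a', (byte) 0xFF, 'b' };
String decoded = new String(original, StandardCharsets.UTF_8); // "a\uFFFDb"
byte[] reEncoded = decoded.getBytes(StandardCharsets.UTF_8);   // 'a', 0xEF, 0xBF, 0xBD, 'b'

// The String values compare equal...
System.out.println(decoded.equals(new String(reEncoded, StandardCharsets.UTF_8))); // true
// ...but the byte arrays do not:
System.out.println(Arrays.equals(original, reEncoded)); // false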
I don't understand it fully, but here's what I've got so far:
The problem is that the data contains some bytes which are not valid UTF-8 bytes as I know by the following check:
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// returns false for my data!
public static boolean isValidUTF8(byte[] input) {
    CharsetDecoder cs = Charset.forName("UTF-8").newDecoder();
    try {
        cs.decode(ByteBuffer.wrap(input));
        return true;
    } catch (CharacterCodingException e) {
        return false;
    }
}
When I change the encoding to ISO-8859-1, everything works fine. The strange thing (which I didn't understand at first) is why my conversion (new String(bytesFromFile, "UTF-8");) doesn't throw any exception like my isValidUTF8 method does, although the data is not valid UTF-8 - the reason, as explained above, is that the String constructor replaces invalid input with U+FFFD, while a freshly created CharsetDecoder defaults to reporting errors.
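A minimal sketch of that difference (the byte pair is an arbitrary malformed UTF-8 sequence):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

byte[] invalid = { (byte) 0xC3, (byte) 0x28 }; // 0xC3 opens a 2-byte sequence, 0x28 can't continue it

// The String constructor substitutes U+FFFD and never throws:
System.out.println(new String(invalid, StandardCharsets.UTF_8)); // prints "\uFFFD("

// A fresh CharsetDecoder defaults to CodingErrorAction.REPORT and throws MalformedInputException:
StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(invalid));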
However, I think I will take another route and encode my byte[] in a Base64 string, as I don't want more trouble with encoding.
The real problem in your code is that you don't know what the real file encoding is.
When you read the string from the web service you get a sequence of chars; when you convert the string from chars to bytes, the conversion is made correctly because you specify how to transform chars into bytes with a specific encoding ("UTF-8"). When you read a text file you face a different problem: you have a sequence of bytes that needs to be converted to chars, and to do that properly you must know how the chars were converted to bytes, i.e. what the file encoding is. For files (unless specified) it is a platform default; on Windows, files are typically encoded in windows-1252 (which is very close to ISO-8859-1); on Linux/Unix it depends, but UTF-8 is usually the default.
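A small sketch makes the dependency visible; the file name is just an example, and the same bytes can decode to different Strings depending on the charset you name:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

byte[] bytes = Files.readAllBytes(Paths.get("data.txt"));
String asUtf8 = new String(bytes, StandardCharsets.UTF_8);        // right only if the file is UTF-8
String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1); // right only if it is ISO-8859-1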
By the way, the web service call did a decode operation under the hood: the HTTP response carries a header (Content-Type) that defines how chars are encoded, i.e. how to read the bytes from the socket and transform them to chars. So calling a SOAP web service gives you back an XML document (which can be unmarshalled into a Java object) with all the encoding operations done properly.
So if you must read chars from a file, you must face the encoding issue; you can use Base64 as you stated, but you lose one of the main benefits of text files: they are human readable, which eases debugging and development.

Decoding a base64 XML cuts off the last part

I have a base64 encoded string, which represents an XML Schema (xsd). I decode this using Apache's Base64 utilities, put the resulting byte array into an InputSource, and let an XmlSchemaCollection read this InputSource:
String base64String = ......
byte[] decoded = Base64.decodeBase64(base64String);
InputSource inputSource = new InputSource(new ByteArrayInputStream(decoded));
xmlSchemaCollection.read(inputSource, new ValidationEventHandler());
This gives an error:
XML document structure must start and end within the same entity
Which usually means the XML structure isn't valid. I performed two tests to see what the base64 actually holds. The first was printing it out to the console:
System.out.println(new String(decoded,"UTF-8"));
In Eclipse, I see my XML is suddenly cut off, like part of it is missing. However, if I use any online website, such as https://www.base64decode.org/, and copy/paste my base64, I see the complete XML. If I validate this XML, the validation succeeds. So I'm a bit confused as to why Eclipse seemingly cuts off my XML after decoding?
Errors like this are usually indicative of a badly formatted document:
XML document structures must start and end within the same entity...
A few things you can do to debug this:
1. Print out the XML document to a log or file and run it through some sort of XML validator (see the sketch below).
2. Check to make sure that there are no invalid characters (e.g. UTF-16 characters in a UTF-8 document).
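A minimal sketch of the first step; writing the decoded bytes straight to a file sidesteps the Eclipse console, whose bounded buffer can truncate long output and make a complete document look cut off (the file name is an example):

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.codec.binary.Base64;

byte[] decoded = Base64.decodeBase64(base64String);
// Dump the raw bytes untouched, then open/validate the file with an XML tool:
Files.write(Paths.get("decoded.xsd"), decoded);
System.out.println("decoded length: " + decoded.length + " bytes");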

JAX-RS and character encoding problems

I am using JAX-RS and have a simple POST WS that takes an InputStream containing a MIME message (xml + file).
The MIME message is in UTF-8; the file contained as a body part is an email message in MIME RFC 822 in ISO-8859-1 encoding, which I'm converting to PDF using Aspose.
When running as a webservice, the resulting PDF has incorrect characters (ø, å etc.). But when I tried using the exact same input, reading it from a file instead and calling the method with a FileInputStream, the resulting PDF is OK.
Here is the simplified version of the code:
@POST
@Path(value = "/documents/convert/{flag}")
@Produces("text/plain")
public String convertFile(InputStream input, @PathParam("flag") String flag) throws WebApplicationException {
    FileInfo info = convertToPdf(input);
    return info.getResponse();
}
If I run this as a webservice it produces a PDF with incorrectly encoded characters, with a "box" instead of some characters (such as ø, å etc.). When I run the same code with the same input by calling
FileInputStream fis = new FileInputStream(file);
convertFile(fis);
the resulting PDF has correct encoding (the WS is run on server, testing with file is done on my local machine).
Could this be an incorrect locale setting on the server?
Do you use an InputStreamReader to read the FileInputStream? If so, did you initialize it using the 2-parameter constructor, with Charset.forName("UTF-8") as the second argument (as you mentioned the incoming stream is already in UTF-8)?
You might need to tell the container that it's UTF-8.
something like...
@Produces("text/plain; charset=utf-8")
Apparently your local file and your MIME message body are not encoded the same way.
Your post states that the file is encoded in ISO-8859-1.
If you are using an InputStreamReader (as Xavier Coulon is suggesting) you should pass the expected encoding to it. In this case:
Charset.forName("ISO-8859-1")
If this does not help, could you please provide the content of the convertToPdf(InputStream is) method?
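For illustration, a minimal sketch of the InputStreamReader suggestion with an explicit charset (variable names follow the snippet above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Decode the RFC 822 body part as ISO-8859-1 rather than the platform default:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(input, StandardCharsets.ISO_8859_1));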

PIL saved jpeg string cannot be read in java

I create a PIL image string on a Python server:
frame = cv.CaptureFromCAM(0)
image = Image.fromstring('RGB', cv.GetSize(frame), frame.tostring(), 'raw', 'BGR')
buffer = cStringIO.StringIO()
image.save(buffer,'JPEG')
udptransmit(buffer.getvalue())
I have a Java client trying to read the transmitted image string and reform the jpeg. This, however, doesn't seem to be working. I created a Python client just to check, and I can reform the jpeg correctly using a call to pygame's load method.
The string being sent from Python contains characters 6:10 = 'JFIF', which is the correct format (also recognised by Python's imghdr module).
In Java, I have tried:
1. Simply writing the byte contents of the string received into a file and naming it with a .jpeg extension. The file isn't a valid jpeg.
2. Using ImageIO to read the bytes from the string. This produces a null image.
3. Fetching an ImageReader for the JPEG format and parsing the bytes with it. This gives me an error stating 'Image is not a JPEG, starts with 0x...'
I really can't see why Python recognises the string as a valid jpeg and Java doesn't. Do the two use different jpeg decoders? Even if they do, shouldn't both either validate or reject the string?
Just found a solution to the problem.
The problem was with the charset used in Java while converting the string sent from my Python server into bytes.
Here's the simple modification that was required in my Java client code:
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.charset.Charset;
import javax.imageio.ImageIO;
import javax.imageio.stream.FileImageOutputStream;

Charset charset = Charset.forName("ISO-8859-1");
byte[] bytes = cam_data.getBytes(charset);
BufferedImage bim = ImageIO.read(new ByteArrayInputStream(bytes));
File f = new File("image.jpeg");
FileImageOutputStream fios = new FileImageOutputStream(f);
ImageIO.write(bim, "jpeg", fios);
The helpful link that lead me to the answer was http://www.java-forums.org/advanced-java/50516-reading-image-files-into-strings.html
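Why ISO-8859-1 works here: it maps the bytes 0x00-0xFF one-to-one onto the code points U+0000-U+00FF, so a bytes -> String -> bytes round trip is lossless, unlike UTF-8 where invalid sequences are replaced. A tiny sketch of that property (the sample bytes are arbitrary):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

byte[] data = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF }; // e.g. the start of a JPEG header
String s = new String(data, StandardCharsets.ISO_8859_1);
System.out.println(Arrays.equals(data, s.getBytes(StandardCharsets.ISO_8859_1))); // true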

Non-English characters are decoded incorrectly on Android with HtmlCleaner

I'm using HtmlCleaner to scrape an ISO-8859-1 encoded web site in Android.
I've implemented this in an external jar file that I import into my Android app.
When I run the unit tests in Eclipse it handles Norwegian letters (æ, ø, å) correctly (I can verify that in the debugger), but in the Android app these characters look like inverted question marks.
If I attach the debugger to my Android app I can see that these letters are not correct in the exact same places they were good when running unit test from Eclipse, so it's not a display/render/view issue in the Android app.
When I copy the text from the debuggers I get these results:
Java Process (Unit Test): «Blårek», «Benny»
Android Process (In emulator): «Bl�rek», «Benny»
I would expect these Strings to be equal, but notice how the "å" is replaced by the inverted question mark in Android.
I have tried running htmlCleaner.getProperties().setRecognizeUnicodeChars(true) without any luck. Also, I found no way of forcing UTF-8 or ISO-8859-1 encoding in HtmlCleaner, but I'm not sure if that would have made a difference.
Here is the code I run:
HtmlCleaner htmlCleaner = new HtmlCleaner();
// connect to url and get root TagNode from HtmlCleaner
InputStream is = new URL( url ).openConnection().getInputStream();
TagNode rootNode = htmlCleaner.clean( is );
// navigate through some TagNodes, getting the ContentNode
ContentNode cn = rootNode...
// This String contains the incorrectly decoded characters on Android.
// Good in Oracle JVM though..
String value = cn.toString().trim();
Does anyone know what could cause the decoding behavior to be different on Android? I guess the main difference between the two environments is that the Android app uses Android's java.io stack while my unit tests use Sun/Oracle's stack.
Thanks,
Geir
HtmlCleaner can't tell what encoding to use; you are passing only the body of the response in the InputStream, but the encoding is in the "content-type" header.
You can set the character encoding on the properties of the HtmlCleaner to the correct encoding from the HTTP connection. But that would require you to parse the correct parameter from the content-type header. Alternatively, you can pass a URL instance to HtmlCleaner and let it manage the connection. Then, it will have access to all the information it needs to decode properly.
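A sketch of the second option, assuming the clean(URL) overload present in HtmlCleaner 2.x; the explicit-charset variant is shown for comparison (treat the exact overloads as assumptions and check your HtmlCleaner version):

import java.net.URL;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

HtmlCleaner htmlCleaner = new HtmlCleaner();
// Let HtmlCleaner manage the connection so it can honor the declared encoding:
TagNode rootNode = htmlCleaner.clean(new URL(url));
// Or name the site's encoding explicitly (assumed overload):
TagNode rootNode2 = htmlCleaner.clean(new URL(url).openStream(), "ISO-8859-1");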
