Parsing curl response with Java

Before writing something like "why don't you use a Java HTTP client such as Apache HttpClient, etc.", you should know that the reason is SSL. I wish I could, since they are very convenient, but I can't.
None of the available HTTP clients support the GOST cipher suite, so I get a handshake exception every time. The ones that do support the suite don't support SNI (they are also proprietary), so I'm returned the wrong certificate and get a handshake exception over and over again.
The only solution was to configure OpenSSL (with the GOST engine) and curl, and then execute the curl command from Java.
Having said that, I wrote a simple snippet for executing a command and getting an input stream with the response:
public static InputStream executeCurlCommand(String finalCurlCommand) throws IOException
{
    return Runtime.getRuntime().exec(finalCurlCommand).getInputStream();
}
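A variant of the same idea with ProcessBuilder (a sketch only; it assumes the noisy "GOST engine already loaded" notice is written to stderr rather than stdout, which may not be the case), so that stderr output never reaches the stream being parsed:
public static InputStream executeCurlCommand(List<String> curlArgs) throws IOException
{
    ProcessBuilder pb = new ProcessBuilder(curlArgs);
    // keep stderr separate so engine/progress messages don't mix with the -i output;
    // INHERIT prints it to the console instead of merging it into stdout
    pb.redirectError(ProcessBuilder.Redirect.INHERIT);
    return pb.start().getInputStream();
}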
Additionally, I can convert the returned InputStream to a string like this:
public static String convertResponseToString(InputStream isToConvertToString) throws IOException
{
    StringWriter writer = new StringWriter();
    IOUtils.copy(isToConvertToString, writer, "UTF-8");
    return writer.toString();
}
However, I can't see a pattern I could rely on to extract the response body or a desired response header.
Here's what I mean.
After executing a command (with the -i flag), the output contains a lot of extra information besides the piece I actually need.
At first I thought I could just split it on '\n', but a required response header, or the response body itself, may span multiple lines (prettified JSON or a long redirect URL breaks that rule).
Also, the static line "GOST engine already loaded" is a bit annoying (though I hope I'll be able to get rid of it, and that no other unrelated output like that will appear).
I do believe there's a pattern I can use.
For now, all I can do is this:
public static String getLocationRedirectHeaderValue(String curlResponse)
{
    String locationHeaderValue = curlResponse.substring(curlResponse.indexOf("Location: "));
    locationHeaderValue = locationHeaderValue.substring(0, locationHeaderValue.indexOf("\n")).replace("Location: ", "");
    return locationHeaderValue;
}
Which is not nice, obviously.
Thanks in advance.

Instead of reading the whole result as a single string, you might want to consider reading it line by line with a Scanner.
Then keep a few status variables around. The main task is to separate the header from the body. The body might contain a payload you want to treat differently (e.g. feed it to GSON to build a JSON object).
The nice thing: the header and the body are separated by an empty line. So your code would be along these lines:
boolean inHeader = true;
StringBuilder b = new StringBuilder();
String lastLine = "";
// Technically you would need a multimap, since the same
// header name can occur more than once
Map<String, String> headers = new HashMap<>();
Scanner scanner = new Scanner(yourInputStream);
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    if (line.length() == 0) {
        inHeader = false;
    } else {
        if (inHeader) {
            // if the line starts with a space it is a
            // continuation of the previous header
            treatHeader(line, lastLine, headers);
            lastLine = line;
        } else {
            b.append(line);
            b.append(System.lineSeparator());
        }
    }
}
String body = b.toString();
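treatHeader is left open above; a minimal sketch of it (assuming a plain Map is enough, i.e. you don't need to keep every repeated header value) could look like this:
private static void treatHeader(String line, String lastLine, Map<String, String> headers)
{
    if (line.startsWith(" ") || line.startsWith("\t")) {
        // continuation line: append to the header started on the previous line
        int colon = lastLine.indexOf(':');
        if (colon > 0) {
            String name = lastLine.substring(0, colon).trim();
            headers.merge(name, line.trim(), (a, b) -> a + " " + b);
        }
        return;
    }
    int colon = line.indexOf(':');
    if (colon < 0) {
        return; // e.g. the "HTTP/1.1 200 OK" status line
    }
    headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
}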

Related

java or node.js removes one character when sending or receiving a string

I'm trying to send a string from a Java (Android) app to a Node.js server.
But one character disappears somewhere in the middle, and I can't figure out why.
To send it I use an HttpURLConnection (conn) and write the string like this:
try {
    OutputStream os = conn.getOutputStream();
    os.write(json.getBytes());
    os.close();
} catch (Exception e) {
    e.printStackTrace();
}
Here is the base64-encoded string as sent, followed by the string as received:
khVGUBH2kNAR5PPRy7v5dO5iz48Rc7benYARu78\/9wY=\n
khVGUBH2kNAR5PPRy7v5dO5iz48Rc7benYARu78/9wY=\n
so one backslash has been removed.
In node I use this:
exports.getString = function(req, res) {
    var string = req.body.thestring;
}
which outputs the latter of the two strings.
var express = require('express'),
    http = require('http'),
    stylus = require('stylus'),
    nib = require('nib');
var app = express();
app.configure(function () {
    app.use(express.logger('dev'));
    //app.use(express.bodyParser());
    app.use(express.json());
    app.use(express.urlencoded());
    app.use(app.router);
});
Any ideas on how I can get the missing character back?
The missing backslash character is most probably disappearing on the Node.js side.
As per the chosen answer on the following question:
Two part question on JavaScript forward slash
As far as JS is concerned, / and \/ are identical inside a string.
So maybe a fix from the Java side would solve your problem: use String's replaceAll method to replace all occurrences of \/ with \\/:
os.write(json.replaceAll("\\/", "\\\\/").getBytes());
Note that replaceAll returns the new string and doesn't change the original string.
Making the base64 encoding url safe solved my problem.
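For completeness, a minimal sketch of that fix on the Java side using java.util.Base64 (Java 8+); rawBytes stands in for whatever bytes are being encoded:
import java.util.Base64;

// URL-safe Base64 uses '-' and '_' instead of '+' and '/', so the encoded
// value contains no slashes (or backslash escapes) to get mangled in transit
String encoded = Base64.getUrlEncoder().encodeToString(rawBytes);
byte[] decoded = Base64.getUrlDecoder().decode(encoded);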

Ignore whitespace while converting XML to JSON using XStream

I'm attempting to establish a reliable and fast way to transform XML to JSON using Java, and I've started using XStream for this task. However, when I run the code below, the test fails because of whitespace (including newlines); if I remove those characters the test passes.
@Test
public void testXmlWithWhitespaceBeforeStartElementCanBeConverted() throws Exception {
    String xml =
            "<root>\n" +
            " <foo>bar</foo>\n" + // remove the newlines and whitespace to make the test pass
            "</root>";
    String expectedJson = "{\"root\": {\n" +
            " \"foo\": bar\n" +
            "}}";
    String actualJSON = transformXmlToJson(xml);
    Assert.assertEquals(expectedJson, actualJSON);
}
private String transformXmlToJson(String xml) throws XmlPullParserException {
    XmlPullParser parser = XppFactory.createDefaultParser();
    HierarchicalStreamReader reader = new XppReader(new StringReader(xml), parser, new NoNameCoder());
    StringWriter write = new StringWriter();
    JsonWriter jsonWriter = new JsonWriter(write);
    HierarchicalStreamCopier copier = new HierarchicalStreamCopier();
    copier.copy(reader, jsonWriter);
    jsonWriter.close();
    return write.toString();
}
The test fails with this exception:
com.thoughtworks.xstream.io.json.AbstractJsonWriter$IllegalWriterStateException: Cannot turn from state SET_VALUE into state START_OBJECT for property foo
at com.thoughtworks.xstream.io.json.AbstractJsonWriter.handleCheckedStateTransition(AbstractJsonWriter.java:265)
at com.thoughtworks.xstream.io.json.AbstractJsonWriter.startNode(AbstractJsonWriter.java:227)
at com.thoughtworks.xstream.io.json.AbstractJsonWriter.startNode(AbstractJsonWriter.java:232)
at com.thoughtworks.xstream.io.copy.HierarchicalStreamCopier.copy(HierarchicalStreamCopier.java:36)
at com.thoughtworks.xstream.io.copy.HierarchicalStreamCopier.copy(HierarchicalStreamCopier.java:47)
at testConvertXmlToJSON.transformXmlToJson(testConvertXmlToJSON.java:30)
Is there a way to tell the copy process to ignore the ignorable whitespace? I cannot find an obvious way to enable this behaviour, but I think it should be there. I know I can pre-process the XML to remove the whitespace, or maybe just use another library.
Update
I can work around the issue with a decorator of the HierarchicalStreamReader interface that suppresses the whitespace nodes manually, but this still doesn't feel ideal. It would look something like the code below, which makes the test pass.
public class IgnoreWhitespaceHierarchicalStreamReader implements HierarchicalStreamReader {

    private HierarchicalStreamReader innerHierarchicalStreamReader;

    public IgnoreWhitespaceHierarchicalStreamReader(HierarchicalStreamReader hierarchicalStreamReader) {
        this.innerHierarchicalStreamReader = hierarchicalStreamReader;
    }

    public String getValue() {
        String getValue = innerHierarchicalStreamReader.getValue();
        System.out.printf("getValue = '%s'\n", getValue);
        if (innerHierarchicalStreamReader.hasMoreChildren() && getValue.length() > 0) {
            if (getValue.matches("^\\s+$")) {
                System.out.printf("*** White space value suppressed\n");
                getValue = "";
            }
        }
        return getValue;
    }

    // rest of the interface delegates to innerHierarchicalStreamReader ...
}
Any help is appreciated.
Comparing two XML documents as String objects is not a good idea. How are you going to handle the case where the XML is the same but the nodes are not in the same order?
e.g.
<xml><node1>1</node1><node2>2</node2></xml>
is equivalent to
<xml><node2>2</node2><node1>1</node1></xml>
but a String comparison will always return false.
Instead, use a tool like XMLUnit. Refer to the following link for more details:
Best way to compare 2 XML documents in Java
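A minimal sketch with XMLUnit 1.x (assuming the org.custommonkey.xmlunit dependency is on the classpath); similar() tolerates reordered but otherwise equal children that identical() would flag:
import org.custommonkey.xmlunit.Diff;
import org.custommonkey.xmlunit.XMLUnit;

XMLUnit.setIgnoreWhitespace(true);
Diff diff = XMLUnit.compareXML("<xml><node1>1</node1><node2>2</node2></xml>",
                               "<xml><node2>2</node2><node1>1</node1></xml>");
// identical() would be false here, but similar() accepts the reordering
assertTrue(diff.similar());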

how can I detect charset of a web page

I just want to get a web page's source in Java, and I want to get that content with the correct encoding. I am able to get the content of a web page so far, but for some pages the content comes back with garbled characters, so I need to detect the charset of the page.
From my little research I found the jChardet library for this, but I couldn't import it into my project. Can someone please help me?
By the way, the code below is what I use to read the web page content:
StringBuilder builder = new StringBuilder();
InputStream is = fURL.openStream();
BufferedReader buffer = new BufferedReader(new InputStreamReader(is, encodingType));
int charRead;
while ((charRead = buffer.read()) != -1) {
    builder.append((char) charRead);
}
buffer.close();
return builder;
Read the Content-Type header of the HTTP response; it's the best way to get the charset. Only apply guessing when you have no alternative, and here you do have one.
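A rough sketch of that first step with plain URLConnection, falling back to a default when no charset is declared (the URL here is just a placeholder):
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

URLConnection conn = new URL("http://example.com/").openConnection();
String contentType = conn.getContentType();   // e.g. "text/html; charset=UTF-8"
Charset charset = StandardCharsets.UTF_8;     // fallback if nothing is declared
if (contentType != null) {
    for (String param : contentType.split(";")) {
        param = param.trim();
        if (param.toLowerCase().startsWith("charset=")) {
            charset = Charset.forName(param.substring("charset=".length()).trim());
        }
    }
}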
You can also use jChardet (http://jchardet.sourceforge.net/):
private static String detectCharset(byte[] body) {
    nsDetector det = new nsDetector(nsPSMDetector.ALL);
    det.Init(new nsICharsetDetectionObserver() {
        public void Notify(String charset) {
            HtmlCharsetDetector.found = true;
        }
    });
    boolean done = false;
    boolean isAscii = det.isAscii(body, body.length);
    // DoIt if non-ascii and not done yet.
    if (!isAscii && !done) {
        done = det.DoIt(body, body.length, false);
    }
    return det.getProbableCharsets()[0];
}
Minimally, you would need to read and parse the HTTP headers to see whether they declare the encoding and, in the absence of such a declaration (rather common), parse the document itself to find a meta tag that declares it. For XHTML documents, you would need to check the XML declaration and default to UTF-8. This would still leave a considerable number of pages with undeclared encoding, so some heuristics would be needed. You might check the section on encodings in the HTML5 draft, which contains some heuristic overrides too (e.g. treating iso-8859-1 as windows-1252).
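For the meta-tag step, a rough sketch (only a crude approximation of the HTML5 prescan, shown to illustrate the idea):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Looks for <meta charset="..."> or the older http-equiv="Content-Type" form
// in the first chunk of the document. Not a full HTML5 prescan.
private static String sniffMetaCharset(String htmlHead) {
    Pattern p = Pattern.compile("<meta[^>]+charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(htmlHead);
    return m.find() ? m.group(1) : null;
}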

Failing to parse this multi-part mime message body in Java

I'm not writing a mail application, so I don't have access to all the headers and such. All I have is something like the block at the end of this question. I've tried parsing it with the JavaMail API, using something like:
Session s = Session.getDefaultInstance(new Properties());
InputStream is = new ByteArrayInputStream(<< String to parse >>);
MimeMessage message = new MimeMessage(s, is);
Multipart multipart = (Multipart) message.getContent();
But it just tells me that message.getContent() returns a String, not a Multipart or MimeMultipart. Plus, I don't really need all the overhead of the whole JavaMail API; I just need to parse the text into its parts. Here's an example:
This is a multi-part message in MIME format.\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/plain;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\nStuff:\n\n Please read this stuff at the beginning of each week. =\nFeel free to discuss it throughout the week.\n\n\n--=20\n\nMrs. Suzy M. Smith\n555-555-5555\nsuzy#suzy.com\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/html;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\n\n\n\n\n\n\n\n\nStuff:\n =20\nPlease read this stuff at the beginning of each =\nweek. Feel=20\nfree to discuss it throughout the week.\n-- Mrs. Suzy M. Smith555-555-5555suzy#suzy.com\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0--\n\n
First I took your example message and replaced all occurrences of \n with newlines and \t with tabs.
Then I downloaded the JARs from the Mime4J project, a subproject of Apache James, and executed the GUI parsing example org.apache.james.mime4j.samples.tree.MessageTree with the transformed message above as input. And apparently Mime4J was able to parse the message and to extract the HTML message part.
There are a few things wrong with the text you posted.
It is not valid multi-part MIME. Check out the Wikipedia reference which, while non-normative, is still correct.
The MIME boundary is not defined. From the Wikipedia example, Content-Type: multipart/mixed; boundary="frontier" shows that the boundary is "frontier". In your example, "----=_NextPart_000_005D_01CC73D5.3BA43FB0" is the boundary, but that can only be determined by scanning the text (i.e. the MIME is malformed). You need to instruct the goofball passing you the MIME content that you also need the boundary value, which is not defined in a message header. If you get the entire body of the message you will have enough, because the body starts with MIME-Version: 1.0 followed by Content-Type: multipart/mixed; boundary="frontier", where frontier is replaced with the boundary value for the encoded MIME.
If the person sending the body will not (or, more likely, does not know how to) send the full body, you can derive the boundary by scanning the text for a line that starts and ends with "--" (i.e. --boundary--). Note that I said a "line": the terminal boundary is actually "--boundary--\n".
Finally, the text you posted has two parts. The first part appears to define substitutions to apply in the second part. If this is true, the Content-Type of the first part should probably be something other than "text/plain", perhaps "companyname/substitution-definition" or something like that, which would allow for multiple (future) substitution formats.
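Once the boundary is known (by scanning, as described above), one option is to hand the body to JavaMail through a javax.mail.util.ByteArrayDataSource with a reconstructed Content-Type. This is only a sketch: rawMimeText stands for the posted text with real newlines, and multipart/alternative is a guess at the subtype:
import javax.mail.BodyPart;
import javax.mail.internet.MimeMultipart;
import javax.mail.util.ByteArrayDataSource;

String contentType =
    "multipart/alternative; boundary=\"----=_NextPart_000_005D_01CC73D5.3BA43FB0\"";
MimeMultipart multipart = new MimeMultipart(new ByteArrayDataSource(rawMimeText, contentType));
for (int i = 0; i < multipart.getCount(); i++) {
    BodyPart part = multipart.getBodyPart(i);
    System.out.println(part.getContentType()); // text/plain and text/html in this example
}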
You can create a MimeMultipart from the HTTP request:
javax.mail.internet.MimeMultipart m = new MimeMultipart(new ServletMultipartDataSource(httpRequest));
public class ServletMultipartDataSource implements DataSource {

    String contentType;
    InputStream inputStream;

    public ServletMultipartDataSource(ServletRequest request) throws IOException {
        inputStream = new SequenceInputStream(new ByteArrayInputStream("\n".getBytes()), request.getInputStream());
        contentType = request.getContentType();
    }

    public InputStream getInputStream() throws IOException {
        return inputStream;
    }

    public OutputStream getOutputStream() throws IOException {
        return null;
    }

    public String getContentType() {
        return contentType;
    }

    public String getName() {
        return "ServletMultipartDataSource";
    }
}
To get a submitted form parameter, you need to parse the BodyPart headers:
public String getStringParameter(String name) throws MessagingException, IOException {
    for (int i = 0; i < m.getCount(); i++) {
        BodyPart bodyPart = m.getBodyPart(i);
        String[] nameHeader = bodyPart.getHeader("Content-Disposition");
        if (nameHeader != null && bodyPart.getContent() instanceof String) {
            for (String bodyName : nameHeader) {
                if (bodyName.contains("name=\"" + name + "\"")) return String.valueOf(bodyPart.getContent());
            }
        }
    }
    return null;
}
If you are using javax.servlet.http.HttpServlet to receive the message, you will have to use HttpServletRequest.getHeaders to obtain the value of the Content-Type HTTP header. You will then use org.apache.james.mime4j.stream.MimeConfig.setHeadlessParsing to pass that information to Mime4J so that it can properly process the MIME message.
It appears that you are using HttpServletRequest.getInputStream to read the contents of the request. The input stream returned only contains the content of the message after the HTTP headers (which are terminated by a blank line). That is why you have to extract Content-Type from the HTTP headers and feed it to the parser using setHeadlessParsing.

Java data object for bidirectional I/O

I am developing an interface that takes as input an encrypted byte stream -- probably a very large one -- and generates output in more or less the same format.
The input format is this:
{N byte envelope}
- encryption key IDs &c.
{X byte encrypted body}
The output format is the same.
Here's the usual use case (heavily pseudocoded, of course):
Message incomingMessage = new Message(inputStream);
ProcessingResults results = process(incomingMessage);

MessageEnvelope messageEnvelope = new MessageEnvelope();
// set message encryption options &c. ...
Message outgoingMessage = new Message();
outgoingMessage.setEnvelope(messageEnvelope);

writeProcessingResults(results, outgoingMessage);
outgoingMessage.writeToOutput(outputStream);
To me, it seems to make sense to use the same object to encapsulate this behaviour, but I'm at a bit of a loss as to how to go about it. It isn't practical to load the entire encrypted body at once; I need to be able to stream it (so I'll be using some kind of input stream filter to decrypt it), but at the same time I need to be able to write out new instances of this object. What's a good approach to making this work? What should Message look like internally?
I wouldn't create one class to handle both input and output (one class, one responsibility). Instead I would have two filter streams, one for input/decryption and one for output/encryption:
InputStream decrypted = new DecryptingStream(inputStream, decryptionParameters);
...
OutputStream encrypted = new EncryptingStream(outputStream, encryptionOptions);
They could have something like a lazy-init mechanism, reading the envelope before the first read() call / writing the envelope before the first write() call. You can also use classes like Message or MessageEnvelope in the filter implementations, but they can stay package-protected, non-API classes.
The processing then knows nothing about de-/encryption and just works on a stream. You may also use both streams at the same time during processing, streaming both the processing input and the output.
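A rough sketch of that lazy-init idea for the (hypothetical) DecryptingStream; DecryptionParameters is a placeholder, and the envelope parsing and cipher setup are left as comments since they depend on the actual format:
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class DecryptingStream extends FilterInputStream {

    private final DecryptionParameters params; // hypothetical holder for key IDs etc.
    private boolean envelopeRead = false;

    DecryptingStream(InputStream in, DecryptionParameters params) {
        super(in);
        this.params = params;
    }

    private void ensureEnvelope() throws IOException {
        if (!envelopeRead) {
            // read the N-byte envelope from 'in', set up the cipher,
            // then wrap 'in' (e.g. in a CipherInputStream) for the body
            envelopeRead = true;
        }
    }

    @Override
    public int read() throws IOException {
        ensureEnvelope();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        ensureEnvelope();
        return super.read(b, off, len);
    }
}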
Can you split the body at arbitrary locations?
If so, I would have two threads, an input thread and an output thread, with a concurrent queue of strings that the output thread monitors. Something like:
ConcurrentLinkedQueue<String> outputQueue = new ConcurrentLinkedQueue<String>();
...
private void readInput(BufferedReader in) throws IOException {
    String str;
    while ((str = in.readLine()) != null) {
        outputQueue.offer(processStream(str)); // ConcurrentLinkedQueue has no put()
    }
}

private String processStream(String input) {
    // do something
    return output;
}

private void writeOutput(Writer out) throws IOException, InterruptedException {
    while (true) {
        while (outputQueue.peek() == null) {
            Thread.sleep(100);
        }
        String msg = outputQueue.poll();
        out.write(msg);
    }
}
Note: This will definitely not work as-is. Just a suggestion of a design. Someone is welcome to edit this.
If you need to read and write at the same time, you either have to use threads (different threads reading and writing) or asynchronous I/O (the java.nio package). Using input and output streams from different threads is not a problem.
If you want to make a streaming API in Java, you should usually provide an InputStream for reading and an OutputStream for writing. That way they can be passed on to other APIs, so you can chain things and the data flows all the way through as streams.
Input example:
Message message = new Message(inputStream);
results = process(message.getInputStream());
Output example:
Message message = new Message(outputStream);
writeContent(message.getOutputStream());
The Message needs to wrap the given streams with classes that do the needed encryption and decryption.
Note that reading multiple messages at the same time, or writing multiple messages at the same time, would need support from the protocol too; you need to get the synchronization right.
You should check the Wikipedia article on block cipher modes that support encryption of streams; different encryption algorithms may support only a subset of these.
Buffered streams will allow you to read, encrypt/decrypt and write in a loop.
Examples demonstrating ZipInputStream and ZipOutputStream could provide some guidance on how you might solve this.
What you need are cipher streams (CipherInputStream and its counterpart CipherOutputStream).
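A minimal sketch of how they are used; AES/CBC stands in for whatever algorithm the envelope actually specifies, and keyBytes/ivBytes/encryptedIn/encryptedOut are placeholders:
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");
IvParameterSpec iv = new IvParameterSpec(ivBytes);

// reading: everything pulled from 'plain' is decrypted on the fly
Cipher decrypt = Cipher.getInstance("AES/CBC/PKCS5Padding");
decrypt.init(Cipher.DECRYPT_MODE, key, iv);
InputStream plain = new CipherInputStream(encryptedIn, decrypt);

// writing: everything written to 'cipherOut' is encrypted on the fly
Cipher encrypt = Cipher.getInstance("AES/CBC/PKCS5Padding");
encrypt.init(Cipher.ENCRYPT_MODE, key, iv);
OutputStream cipherOut = new CipherOutputStream(encryptedOut, encrypt);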
I agree with Arne: the data processor shouldn't know about encryption; it just needs to read the decrypted body of the message and write out the results, and stream filters should take care of the encryption. However, since this is logically operating on the same piece of information (a Message), I think they should be packaged inside one class which handles the message format, although the encryption/decryption streams are indeed independent of it.
Here's my idea for the structure, flipping the architecture around somewhat and moving the Message class outside the encryption streams:
class Message {

    InputStream input;
    Envelope envelope;

    public Message(InputStream input) {
        assert input != null;
        this.input = input;
    }

    public Message(Envelope envelope) {
        assert envelope != null;
        this.envelope = envelope;
    }

    public Envelope getEnvelope() {
        if (envelope == null && input != null) {
            // Read envelope from beginning of stream
            envelope = new Envelope(input);
        }
        return envelope;
    }

    public InputStream read() {
        assert input != null;
        // Initialise the decryption stream
        return new DecryptingStream(input, getEnvelope().getEncryptionParameters());
    }

    public OutputStream write(OutputStream output) {
        // Write envelope header to output stream
        getEnvelope().write(output);
        // Initialise the encryption
        return new EncryptingStream(output, getEnvelope().getEncryptionParameters());
    }
}
Now you can use it by creating a new message for the input, and one for the output:
OutputStream output; // This is the stream for sending the message
Message inputMessage = new Message(input);
Message outputMessage = new Message(inputMessage.getEnvelope());
process(inputMessage.read(), outputMessage.write(output));
Now the process method just needs to read chunks of data as required from the input, and write results to the output:
public void process(InputStream input, OutputStream output) throws IOException {
    byte[] buffer = new byte[1024];
    int read;
    while ((read = input.read(buffer)) > 0) {
        // Process buffer, writing to output as you go.
    }
}
This all now works in lockstep, and you don't need any extra threads. You can also abort early without having to process the whole message (if the output stream is closed for example).
