Extracting values from a String containing an HTTP header - java

I'm looking for a better way to extract data from a String that contains an HTTP header. For example, I'd like to get the number 160 from the content length portion of the string: "Content-Length: 160\r\n".
It appears that every value in the HTTP header is preceded by a name, a colon and a space, and is immediately followed by the '\r' and '\n' characters.
At the moment I am doing this:
int contentLengthIndex = serverHeader.lastIndexOf("Content-Length: ");
int contentLastIndex = serverHeader.length()-1;
String contentText = serverHeader.substring(contentLengthIndex + 16, contentLastIndex);
contentText = contentText.replaceAll("(\\r|\\n)", "");
int contentLength = Integer.parseInt(contentText);
But it seems messy, and it only works for getting the "Content-Length" when it is at the end of the string. Is there a better, more universal solution for extracting values from a String containing an HTTP header, one that can be adjusted to obtain either int values or String values?
I should also mention that the connection needs to be able to return data back to the browser after a request, which from my understanding prevents me from reaping the benefits of using HttpURLConnection.

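A quick solution would be the following. It returns the first number found in the string, so it only works when serverHeader holds just the Content-Length line:
int contentLength = new Scanner(serverHeader).useDelimiter("[^\\d]+").nextInt();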
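Alternatively, if you want to build a Hashtable of the headers, split each header line into a key and a value:
// Limit the split to 2 parts so that colons inside the value (dates, host:port) are preserved.
String[] fields = serverHeader.trim().split(":", 2);
String key = fields[0].trim();
String value = fields[1].trim();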
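The split approach above can be extended to a whole header block. This is only a minimal sketch: the class name HeaderParser, the method parseHeaders and the sample response are made up for illustration, header names are lower-cased because HTTP header names are case-insensitive, and a repeated header simply overwrites the earlier value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderParser {

    // Hypothetical helper: turns a raw header block into a name -> value map.
    public static Map<String, String> parseHeaders(String rawHeaders) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : rawHeaders.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip lines without a "name:" prefix, e.g. a typical status line
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value); // a repeated header overwrites the earlier value
        }
        return headers;
    }

    public static void main(String[] args) {
        // Sample response header, invented for this example.
        String serverHeader = "HTTP/1.1 200 OK\r\n"
                + "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: 160\r\n\r\n";
        Map<String, String> headers = parseHeaders(serverHeader);
        int contentLength = Integer.parseInt(headers.get("content-length"));
        System.out.println(contentLength); // prints 160
    }
}
```

String values come straight out of the map, and int values only need an Integer.parseInt on top, so the same lookup covers both cases from the question.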
I am not sure why you are doing this manually; there is already an API for this!

Use the class java.net.HttpURLConnection.
Edit: see also the methods URLConnection.getContentLength() and URLConnection.getContentLengthLong().

Have you tried just stripping all non-numeric characters from the string?
serverHeader = serverHeader.replaceAll("[^\\d.]", "");

If you are using the Socket class to read HTTP data, I suggest using HttpURLConnection instead, as it provides convenient methods for parsing the header fields.
It has a public Map<String, List<String>> getHeaderFields() method which you can use to get all the fields.
If you want a guide to getting started with HttpURLConnection, you can see it here.

Related

Rest API call encoding in Flutter and decoding in Java

My Flutter app calls a REST API method /user/search/<search string> and I am forming the URL endpoint using encodeQueryComponent like this:
String endpoint = "/user/search/"+Uri.encodeQueryComponent(searchString);
The back-end implemented in Java tries to retrieve the search string like this:
String value = URLDecoder.decode(value, StandardCharsets.UTF_8.toString());
However, when the search string contains the + sign, the raw encoded string in the back-end contains %2B and the decoded String contains a space. As a temporary hack, I am currently doing value = value.replace("%2B", "+"); instead of decoding. But this is obviously not the right approach, because the search string may contain characters from any language or special characters.
Can someone tell me the right way to get the original string sent by the user in Java?
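One thing worth checking, as a guess that the question does not confirm: decoding the same value twice produces exactly this symptom, because URLDecoder turns %2B into + on the first pass and then turns + into a space on the second. The sketch below only demonstrates that behaviour; the class name PlusDecodingDemo and the sample value are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusDecodingDemo {

    // Thin wrapper so callers do not have to handle the checked exception.
    public static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String encoded = "a%2Bb";      // what an encoder produces for "a+b"
        String once = decode(encoded); // "a+b"
        String twice = decode(once);   // "a b": the '+' is now read as a space
        System.out.println(once);
        System.out.println(twice);
    }
}
```

If the framework has already decoded the path segment before your code sees it, calling URLDecoder.decode again would be that second pass.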

Trying to figure out which Unicode encoding I should use

I'm working on a Spring Boot project that fetches data from the database and then sends it in an HTTP POST request. Everything works with Latin text, but the data I have in the database is encoded with ISO 8859-6. I have converted it to UTF-8 and UTF-16, but it still comes back as unreadable text: question marks and special characters.
Test example in Arabic:
مرحبا
It should look like this to be valid after the POST method:
06450631062d06280627
I can't figure out what kind of encoding happened here. I'm now doing an integration from .NET to Java.
This is what they used in .NET:
public static String UnicodeStr2HexStr(String strMessage)
{
byte[] ba = Encoding.BigEndianUnicode.GetBytes(strMessage);
String strHex = BitConverter.ToString(ba);
strHex = strHex.Replace("-", "");
return strHex;
}
I just need to know what kind of encoding happened here so I can apply it in Java, and it would be helpful if someone could show me a way to do it.
I have tried this, but it returns a different value:
String encodedWithISO88591 = "مرحبا";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
What you're looking for is the hex representation of the Arabic String in UTF-16BE:
String yourVal = "مرحبا";
System.out.println(DatatypeConverter.printHexBinary(yourVal.getBytes(StandardCharsets.UTF_16BE)));
The output will be:
06450631062D06280627
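DatatypeConverter lives in javax.xml.bind, which is no longer bundled with the JDK from Java 11 onwards, so here is a minimal sketch of the same conversion using only java.lang and java.nio. The class name Utf16Hex and the method toUtf16BeHex are made up for illustration, and the Arabic word is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Hex {

    // Encodes the text as UTF-16 big-endian (no byte order mark) and
    // renders every byte as two upper-case hex digits.
    public static String toUtf16BeHex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_16BE)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String yourVal = "\u0645\u0631\u062D\u0628\u0627"; // the Arabic word from the question
        System.out.println(toUtf16BeHex(yourVal)); // prints 06450631062D06280627
    }
}
```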

Parsing a URL in Java

I am looking for an equivalent to PHP's "parse_url" function in Java. I am not running in Tomcat. I have query strings saved in a database that I'm trying to break apart into individual parameters. I'm working inside of Pentaho, so I only have the Java SE classes to work with. I know I could write a plugin or something, but if I'm going to do all that I'll just write the script in PHP and be done with it.
TLDR: Looking for a Java standard class/function that takes a String and spits out an array of parameters.
Thanks,
Roger
You can accomplish that using java.net.URL:
// The port must be numeric, otherwise the constructor throws MalformedURLException.
URL url = new URL("http://hostname:8080/path?arg=value#anchor");
String protocol = url.getProtocol(); // http
String host = url.getHost(); // hostname
String path = url.getPath(); // /path
int port = url.getPort(); // 8080
String query = url.getQuery(); // arg=value
String ref = url.getRef(); // anchor
Here's something quick and dirty (I have not compiled it, but you should get the idea).
URL url = new URL("http://...");
String query = url.getQuery();
String[] paramStrings = query.split("&");
HashMultimap<String, String> params = HashMultimap.create(); // <== Google Guava class
for (int i = 0; i < paramStrings.length; i++) {
    String[] parts = paramStrings[i].split("=", 2);
    // a parameter without '=' gets an empty value
    String value = parts.length > 1 ? URLDecoder.decode(parts[1], "UTF-8") : "";
    params.put(URLDecoder.decode(parts[0], "UTF-8"), value);
}
Set<String> paramVals = params.get("paramName");
If you don't want to use the Guava class, you can accomplish the same thing with some additional code and a HashMap<String, Set<String>>.
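For the Java SE only route, a minimal sketch could look like the following. The class name QueryParser, the method parseQuery and the sample URL are made up for illustration, and the map values are Lists rather than Sets so that repeated values and their order are kept.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryParser {

    // Hypothetical helper: splits a raw (still percent-encoded) query string
    // into decoded names, each mapped to the list of its decoded values.
    public static Map<String, List<String>> parseQuery(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            String[] parts = pair.split("=", 2);
            String name = decode(parts[0]);
            String value = parts.length > 1 ? decode(parts[1]) : "";
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/path?a=1&b=hello%20world&a=2");
        // getRawQuery() keeps the percent-encoding so '&' and '=' inside values stay intact.
        System.out.println(parseQuery(uri.getRawQuery())); // prints {a=[1, 2], b=[hello world]}
    }
}
```

Since the query strings in the question are already stored on their own in a database, they can be passed to parseQuery directly without going through URI at all.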
No such thing in Java. You will need to parse the strings manually and create your own array. You could create your own parse_url using StringTokenizer, String.split, or Regular Expressions rather easily.
You could also convert those strings from the database back to URL objects and parse them that way; here are the docs.
String has a split function, but you will need to write your own regex to determine how to split the string.
See: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)

Character encoding URLDecoder on the action

I'm attempting to fix a character encoding issue. I realize this is not a good way to go about it, but for now I am just going to bandage it up, and when character encoding comes up on a future to-do list I will implement a proper solution.
Anyway, I've currently fixed a character encoding issue with French characters by doing this in the action:
String folderName = request.getParameter(PK_FOLDER_NAME);
if (response.getCharacterEncoding().equals("ISO-8859-1") && folderName != null) {
folderName = URLDecoder.decode(new String(folderName.getBytes("ISO-8859-1"), "UTF-8"), "UTF-8");
}
However, what if the parameter is an array? How would I do it?
For example, what if the values are obtained like this:
String[] memos = request.getParameterValues(PK_MEMO);
How would I convert it using the URLDecoder then?
Thanks, guys...
The answer I was looking for was this (which works):
if (response.getCharacterEncoding().equals("ISO-8859-1") && memos != null) {
for(int n=0; n< memos.length; n++) {
memos[n] = URLDecoder.decode(new String(memos[n].getBytes("ISO-8859-1"), "UTF-8"), "UTF-8");
}
}
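To see why the re-encoding step in this workaround appears to help, here is a minimal sketch. It assumes, as an illustration only, that the browser sent UTF-8 bytes and that they were decoded as ISO-8859-1 during parameter parsing; the class name MojibakeDemo and both method names are made up.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates the damage: UTF-8 bytes read as if they were ISO-8859-1.
    public static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The "bandage": recover the original bytes and decode them as UTF-8.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";        // "café", written with an escape
        String garbled = misdecode(original); // 'é' becomes the two characters U+00C3 U+00A9
        System.out.println(garbled.length()); // prints 5, one character more than the original
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

The round trip only works because ISO-8859-1 maps every byte to a character; it repairs the symptom after the fact rather than fixing the encoding used while the parameters are parsed.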
You're going about it completely the wrong way.
You first obtain the request parameter, which triggers parsing of the request parameters, so by then it is too late to set the proper encoding for that parsing. You also determine the encoding of the response instead of the request. This makes no sense.
Just set the request encoding before ever getting the first parameter. It will then be used during parsing the request parameters for the first time.
request.setCharacterEncoding("UTF-8");
String folderName = request.getParameter(PK_FOLDER_NAME);
String[] memos = request.getParameterValues(PK_MEMO);
// ...
Note that you'd normally call request.setCharacterEncoding("UTF-8") in a servlet filter so that you don't need to repeat it in every servlet of your webapp.
The response encoding is normally configured on the JSP side by @page pageEncoding on a per-JSP basis or <page-encoding> in web.xml on an application-wide basis.
Don't try to introduce bandages/workarounds; they would only make things worse. Just do it the right way from the beginning.
See also:
Unicode - How to get the characters right?

How to extract query string from a URL of a web-page using java

From the following URL in the OathCallBack page I want to extract access_token and token_type using Java. Any idea how to do it?
http://myserver.com/OathCallBack#state=/profile&access_token=ya29.AHES6ZQLqtYrPKuw2pMzURJtWuvINspm8-Vf5x-MZ5YzqVy5&token_type=Bearer&expires_in=3600
I tried the following, but was unable to extract the required information.
{
String scheme = req.getScheme(); // http
String serverName = req.getServerName(); // myserver.com
int serverPort = req.getServerPort(); // 80
String contextPath = req.getContextPath();
String servletPath = req.getServletPath();
String pathInfo = req.getPathInfo(); // return null and exception
String queryString = req.getQueryString(); // return null
}
<---------------------------------------------------------->
Edit to my question:
Thank you, everyone, for the nice replies.
Google does it this way; you can refer to this page:
http://developers.google.com/accounts/docs/OAuth2Login
On that page there is the following link:
http://accounts.google.com/o/oauth2/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile&state=%2Fprofile&redirect_uri=https%3A%2F%2Foauth2-login-demo.appspot.com%2Foauthcallback&response_type=token&client_id=812741506391.apps.googleusercontent.com
When you click on that link, you get the access_token for your Gmail login account, and that token comes after the # sign.
Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document; the character = is used to separate a name from a value.
See http://en.wikipedia.org/wiki/Query_string for more.
It looks like the '#' should be a '?'.
In a normal URL, the parameters are passed as key value pairs following a '?' and multiple parameters chained together using '&'. A URL might look as follows:
http://someserver.com/somedir/somepage.html?param1=value1&param2=value2&param3=value3
Normally the Java servlet container would return everything after the '?' when calling getQueryString() but due to the absence of the '?' it returns null.
As @Sandeep Nair has suggested, getRequestURL() should return this full URL to you, and you could parse it using regular expressions to get the information you want. A possible regular expression to use would be along the lines of:
(?<=access_token=)[a-zA-Z0-9.-]*
However, getRequestURL() does NOT normally return the query string, so this method relies on the fact that there is a '#' rather than a '?' and is therefore probably not a great solution. See here.
I would advise that you find out why you are getting a '#' instead of a '?' and try to get this changed. If you can do this, then the servlet container should manage the URL parameters for you, and calls to request.getParameter("access_token") and request.getParameter("token_type") (see here) will return both values as strings.
You get query string by calling
String queryString = req.getQueryString();
It correctly returns null in your case, as there is no query string. The characters after "#" are the fragment (anchor), which is only visible to the browser and is not sent to the server.
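The difference between the query and the fragment can be checked with java.net.URI on the URL from the question. This is only a sketch of the string parsing: the class name FragmentDemo and the method parseFragment are made up, no percent-decoding is applied, and it only helps where the full URL text is actually available, which on the server it normally is not.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class FragmentDemo {

    // Hypothetical helper: reads name=value pairs from the part after '#'.
    public static Map<String, String> parseFragment(String url) {
        Map<String, String> values = new LinkedHashMap<>();
        String fragment = URI.create(url).getRawFragment();
        if (fragment == null) {
            return values; // no '#' in the URL
        }
        for (String pair : fragment.split("&")) {
            String[] parts = pair.split("=", 2);
            values.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return values;
    }

    public static void main(String[] args) {
        String url = "http://myserver.com/OathCallBack#state=/profile"
                + "&access_token=ya29.AHES6ZQLqtYrPKuw2pMzURJtWuvINspm8-Vf5x-MZ5YzqVy5"
                + "&token_type=Bearer&expires_in=3600";
        System.out.println(URI.create(url).getQuery()); // prints null: there is no '?' part
        Map<String, String> values = parseFragment(url);
        System.out.println(values.get("access_token"));
        System.out.println(values.get("token_type")); // prints Bearer
    }
}
```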
