Parsing a URL in Java - java

I am looking for an equivalent to PHP's "parse_url" function in Java. I am not running in Tomcat. I have query strings saved in a database that I'm trying to break apart into individual parameters. I'm working inside of Pentaho, so I only have the Java SE classes to work with. I know I could write a plugin or something, but if I'm going to do all that I'll just write the script in PHP and be done with it.
TLDR: Looking for a Java standard class/function that takes a String and spits out an array of parameters.
Thanks,
Roger

You can accomplish that using java.net.URL:
URL url = new URL("http://hostname:8080/path?arg=value#anchor");
String protocol = url.getProtocol(); // http
String host = url.getHost(); // hostname
String path = url.getPath(); // /path
int port = url.getPort(); // 8080 (-1 if the URL has no explicit port)
String query = url.getQuery(); // arg=value
String ref = url.getRef(); // anchor

Here's something quick and dirty (I have not compiled it, but you should get the idea):
URL url = new URL("http://...");
String query = url.getQuery();
String[] paramStrings = query.split("&");
HashMultimap<String, String> params = HashMultimap.create(); // <== Google Guava class
for (int i = 0; i < paramStrings.length; i++) {
    String[] parts = paramStrings[i].split("=", 2); // limit 2 keeps any '=' inside the value
    String value = parts.length > 1 ? parts[1] : ""; // a parameter may have no value
    params.put(URLDecoder.decode(parts[0], "UTF-8"), URLDecoder.decode(value, "UTF-8"));
}
Set<String> paramVals = params.get("paramName");
If you don't want to use the Guava class, you can accomplish the same thing with some additional code and a HashMap<String, List<String>>.
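For the plain Java SE route mentioned above, a minimal sketch might look like the following. It assumes Java 8 or later and UTF-8 encoded values; the class name QueryStringParser and its parse method are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: uses only Java SE classes, no Guava.
final class QueryStringParser {

    // Decodes one percent-encoded component, assuming UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Splits a raw query string such as "a=1&b=2&a=3" into name -> list of values.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            String[] parts = pair.split("=", 2); // split on the first '=' only
            String name = decode(parts[0]);
            String value = parts.length > 1 ? decode(parts[1]) : "";
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> params = parse("arg=value&tag=a%20b&tag=c");
        System.out.println(params.get("arg")); // [value]
        System.out.println(params.get("tag")); // [a b, c]
    }
}
```

Splitting on "&" before decoding matters: an encoded "%26" inside a value is only turned back into "&" after the pairs have been separated.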

There is no single built-in equivalent in Java SE. You will need to parse the strings manually and build your own array. You could write your own parse_url rather easily using StringTokenizer, String.split, or regular expressions.
You could also convert those strings from the database back into URL objects and parse them that way; see the java.net.URL documentation.

String has a split function, but you will need to write your own regex to determine how to split the string.
See: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)

Related

Java - how to build an URI using a query string that is already escaped?

LATER EDIT: not same problem as the suggested answer. In my case, I need to build the URI, relying on the fact that the original query string is not modified.
I have a String (coming from a request query String) that is already correctly escaped, like param1=%2Ffolder1%2Ffolder2%2Ffolder%26name%202&param2=9481dxcv234.
The decoded value of param1 is /folder1/folder2/folder&name 2. Obviously, I cannot unescape that String (because of the & char in this value)...
I need to build an URI which has that original string as query value.
I tried using org.apache.http.client.utils.URIBuilder but could not get it to work: if I provide the original String to the URI(... constructor, the resulting URL is double-escaped, like param1=%252Ffolder1%252Ffolder2%252Ffolder%2526name%25202&param2=9481dxcv234.
Can I somehow build a URI by passing in the already-escaped query string and have it left unchanged?
Thanks.
I think the simplest way is to unescape it first.
Then you can work with the URL as usual.
You could use org.springframework.web.util.UriComponentsBuilder:
UriComponentsBuilder builder = UriComponentsBuilder.fromHttpUrl(url);
UriComponents components = builder.build(true); //EDIT: pass true when the URI was fully encoded already.
MultiValueMap<String, String> parameters = components.getQueryParams();
parameters.get("param1") //"%2Ffolder1%2Ffolder2%2Ffolder%26name%202"
You can then decode each value with URLDecoder.decode to get what you need.
Edit:
Since you appear to be using Apache HttpClient,
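List<NameValuePair> params = org.apache.http.client.utils.URLEncodedUtils.parse(new URI(url), Charset.forName("UTF-8"));
for (NameValuePair pair : params) { // params is a List, so look the pair up by name
    if ("param1".equals(pair.getName())) System.out.println(pair.getValue()); // /folder1/folder2/folder&name 2
}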
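For the original goal of building a URI around a query string that is already escaped, one option that needs only java.net.URI is to assemble the full text and parse it with URI.create (or the single-argument constructor). Those parse the string as given, whereas the multi-argument URI constructors always quote '%' again, which would produce the double-escaped result described in the question. The sketch below assumes the base URL has no query of its own; the class name RawQueryUri and the example.com address are hypothetical.

```java
import java.net.URI;

// Hypothetical helper for illustration.
final class RawQueryUri {

    // Appends an already-escaped query string to a base URL without re-encoding it.
    static URI withRawQuery(String base, String rawQuery) {
        // URI.create parses the text as written, so existing %XX escapes are kept.
        return URI.create(base + "?" + rawQuery);
    }

    public static void main(String[] args) {
        String rawQuery = "param1=%2Ffolder1%2Ffolder2%2Ffolder%26name%202&param2=9481dxcv234";
        URI uri = withRawQuery("http://example.com/search", rawQuery);
        System.out.println(uri.getRawQuery()); // the escaped string, unchanged
        System.out.println(uri.getQuery());    // the decoded form, for comparison
    }
}
```

getRawQuery() returns the query exactly as supplied, while getQuery() returns the decoded form, so code that forwards the URI should use toString() or the raw accessors.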

Cloudant java non-latin characters

I am having difficulty using the Cloudant Java client with Greek characters. Saving objects that include Strings with Greek characters seems to work, as they appear correctly in the Cloudant console. Below is a minimal test case. DummyClass has a String name, an _id and a _rev.
String password = "xxxx";
CloudantClient client = new CloudantClient("xx", "xxx", password);
Database database = client.database("mydatabase", false);
DummyClass dummyobject = new DummyClass();
dummyobject.setName("ά έ ό ύ αβγδεζηθικλμνξ");
Response saveResponse = database.save(dummyobject);
String id = saveResponse.getId();
String result=new String();
DummyClass loaded = database.find(DummyClass.class,id);
result = result+"Object:"+loaded.getName()+"\n"; //Prints out garbage
result = result+"UTF-8:"+new String(loaded.getName().getBytes(),Charset.forName("utf-8"))+"\n"; //Prints most characters correct, except for some accented ones
InputStream inputStream = database.find(id);
DummyClass loadedFromStream = Json.fromJson(Json.parse(inputStream), DummyClass.class);
result = result+"From Stream:"+loadedFromStream.getName(); //prints out fine
return ok(result);
By retrieving the stream and using Jackson to deserialize, the output is correct, but then I'd have to implement many of the provided methods for views, bulk document manipulation etc.
Perhaps the problem is in the LightCouch library, specifically here: CouchDbClientBase.java, since that is the point that I have found differs between the two implementations (get() as object and as stream). However, I do not know how to confirm, fix or work around it.
We fixed this in release 1.1.0, I think:
https://github.com/cloudant/java-cloudant/releases/tag/1.1.0
[FIX] Fixed handling of non-ASCII characters when the platform's default charset is not UTF-8.
The problem was indeed in the LightCouch library. Making the following change, plus the corresponding changes in the code for views, fixed it.
return getGson().fromJson(new InputStreamReader(in), classType);
to
return getGson().fromJson(new InputStreamReader(in, Charset.forName("UTF-8")), classType);
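The effect of that one-argument difference can be reproduced without Cloudant or Gson. The sketch below is only an illustration: it decodes the same UTF-8 bytes once with an explicit UTF-8 charset and once with ISO-8859-1, which stands in for a platform default that is not UTF-8. The class name CharsetDemo is hypothetical, and the Greek text is written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of LightCouch or the Cloudant client.
final class CharsetDemo {

    // Reads the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Greek alpha-with-tonos, space, alpha, beta, gamma
        String original = "\u03ac \u03b1\u03b2\u03b3";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String explicit = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(explicit.equals(original)); // true
        System.out.println(wrong.equals(original));    // false
    }
}
```

new InputStreamReader(in) without a charset uses the platform default, so the same code can behave correctly on one machine and produce garbage on another.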

Get a specific string from a url

How can I get the IP address or host name from a URL string using substring in Java?
http://abc.com:8080/abc/abc?abc=abc
I want to output abc.com from the URL above. How can I extract it?
Below is the code I used to retrieve it. Is this a good way?
String a = servlet.substring(servlet.indexOf(":")+1);
String b = a.substring(2,a.indexOf(":"));
System.out.println(a);
System.out.println(b);
String c = servlet.replace(b, "192.168.0.1");
System.out.println(c);
Use URL class:
URL url = new URL("http://abc.com:8080/abc/abc?abc=abc");
System.out.println(url.getHost());
Why not use the existing URL class and call getHost() ?
Gets the host name of this URL, if applicable. The format of the host
conforms to RFC 2732, i.e. for a literal IPv6 address, this method
will return the IPv6 address enclosed in square brackets ('[' and
']').
Note the other useful methods on this class (getPort() etc.). It's worth using these existing utility classes rather than rolling your own solution. The problem looks simple, but the existing utilities already cater for the edge cases.
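The question's code also replaces the host with 192.168.0.1. A sketch of doing both steps with java.net.URI, rather than indexOf and replace, might look like this. The class name HostSwap and its methods are hypothetical, and the sketch assumes an absolute URL with no user-info part.

```java
import java.net.URI;

// Hypothetical helper for illustration.
final class HostSwap {

    // Returns the host part of the URL, e.g. "abc.com".
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Rebuilds the URL with a different host, copying the other parts as written.
    static String withHost(String url, String newHost) {
        URI u = URI.create(url);
        StringBuilder sb = new StringBuilder();
        sb.append(u.getScheme()).append("://").append(newHost);
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String servlet = "http://abc.com:8080/abc/abc?abc=abc";
        System.out.println(hostOf(servlet));                  // abc.com
        System.out.println(withHost(servlet, "192.168.0.1")); // http://192.168.0.1:8080/abc/abc?abc=abc
    }
}
```

Unlike String.replace, this only touches the host, so a path or parameter that happens to contain the same text is left alone.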

Extracting values from a String containing a HTTP header

I'm seeking a better way to extract data from a String that contains a HTTP header. For example, I'd like to get the number 160 from the content length portion of the string: "Content-Length: 160\r\n".
It appears that all the data in the HTTP header is preceded with a name, colon and space, and after the value immediately follows the '\r' and '\n' characters.
At the moment I am doing this:
int contentLengthIndex = serverHeader.lastIndexOf("Content-Length: ");
int contentLastIndex = serverHeader.length()-1;
String contentText = serverHeader.substring(contentLengthIndex + 16, contentLastIndex);
contentText = contentText.replaceAll("(\\r|\\n)", "");
int contentLength = Integer.parseInt(contentText);
But it seems messy and it is only good for getting the "Content-Length" at the end of the string. Is there a better more universal solution for extracting values from a String containing a HTTP header that can be adjusted to work for obtaining both int values or String values?
I should also mention that the connection needs to be able to return data to the browser after a request, which, as I understand it, prevents me from using HttpURLConnection.
A quick solution will be:
int contentLength = new Scanner(serverHeader).useDelimiter("[^\\d]+").nextInt();
The other way if you want to create a Hashtable of the headers:
String[] fields = serverHeader.trim().split(":", 2); // limit 2 so a ':' inside the value is kept
String key = fields[0].trim();
String value = fields[1].trim();
I am not sure why you are doing this manually; there is already an API for this.
Use the class java.net.HttpURLConnection.
Edit: see also the methods URLConnection.getContentLength() and URLConnection.getContentLengthLong().
Have you tried just stripping all non-numeric characters from the string?
serverHeader = serverHeader.replaceAll("[^\\d.]", "");
If you are using the Socket class to read HTTP data, I suggest using HttpURLConnection instead, as it provides convenient methods for parsing the header fields.
It has a public Map<String, List<String>> getHeaderFields() method, which you can use to get all the fields.
If you want a guide to start using HttpURLConnection you can see it here.
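If HttpURLConnection is not an option because the headers arrive as a raw String, the name/value splitting discussed above can be generalized into a small parser. This is a minimal sketch, assuming one header per line, no folded (multi-line) headers, and a status line that contains no colon. The class name HeaderParser is hypothetical, and when a header is repeated the last value wins.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration.
final class HeaderParser {

    // Parses "Name: value" lines into a map whose keys compare case-insensitively.
    static Map<String, String> parse(String rawHeaders) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : rawHeaders.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no name before a colon: status line or blank line
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim(); // keeps any later ':' in the value
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String serverHeader = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 160\r\n\r\n";
        Map<String, String> headers = parse(serverHeader);
        int contentLength = Integer.parseInt(headers.get("Content-Length"));
        System.out.println(contentLength);               // 160
        System.out.println(headers.get("content-type")); // text/html
    }
}
```

String values come straight from the map, and int values go through Integer.parseInt, so the same lookup works for both kinds of header regardless of where it appears in the string.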

How do I pass multiple parameter in URL?

I am trying to figure out how to pass multiple parameters in a URL. I want to pass latitude and longitude from my android class to a java servlet. How can I do that?
URL url;
double lat=touchedPoint.getLatitudeE6() / 1E6;
double lon=touchedPoint.getLongitudeE6() / 1E6;
url = new URL("http://10.0.2.2:8080/HelloServlet/PDRS?param1="+lat+lon);
In this case output (written to file) is 28.53438677.472097.
This is working but I want to pass latitude and longitude in two separate parameters so that my work at server side is reduced. If it is not possible how can I at least add a space between lat & lon so that I can use tokenizer class to get my latitude and longitude. I tried following line but to no avail.
url = new URL("http://10.0.2.2:8080/HelloServlet/PDRS?param1="+lat+" "+lon);
output- Nothing is written to file
url = new URL("http://10.0.2.2:8080/HelloServlet/PDRS?param1="+lat+"&?param2="+lon);
output- 28.534386 (Only Latitude)
url = new URL("http://10.0.2.2:8080/HelloServlet/PDRS?param1="+lat+"?param2="+lon);
output- 28.532577?param2=77.502996
My servlet code is as follows:
req.setCharacterEncoding("UTF-8");
resp.setCharacterEncoding("UTF-8");
final String par1 = req.getParameter("param1");
final String par2 = req.getParameter("param2");
FileWriter fstream = new FileWriter("C:\\Users\\Hitchhiker\\Desktop\\out2.txt");
BufferedWriter out = new BufferedWriter(fstream);
out.write(par1);
out.append(par2);
out.close();
I also wanted to know whether this is the safest and most secure way to pass data from an Android device to a server.
This
url = new URL("http://10.0.2.2:8080/HelloServlet/PDRS?param1="+lat+"&param2="+lon);
must work. For whatever strange reason [1], you need ? before the first parameter and & before each of the following ones.
Using a compound parameter like
url = new URL("http://10.0.2.2:8080/HelloServlet/PDRS?param1="+lat+"_"+lon);
would work too, but it is not nice. You can't use a literal space there, as it's prohibited in a URL, but you could encode it as %20 or + (which is even worse style).
[1] Stating that ? separates the path from the parameters and that & separates the parameters from each other does not explain the reason. Some RFC says "use ? there and & there", but I can't see why they didn't choose the same character.
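The rule above ("?" before the first parameter, "&" before each later one) can be wrapped in a small helper that also percent-encodes names and values, so characters such as spaces or "&" inside a value do not break the URL. This is a sketch under assumptions: the class name QueryUrlBuilder is hypothetical, values are encoded as UTF-8, and the base URL is assumed to have no query string yet.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration.
final class QueryUrlBuilder {

    // Percent-encodes one name or value, assuming UTF-8.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Appends the parameters to the base URL: '?' first, then '&' between pairs.
    static String build(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double lat = 28.534386;
        double lon = 77.472097;
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("param1", String.valueOf(lat));
        params.put("param2", String.valueOf(lon));
        // http://10.0.2.2:8080/HelloServlet/PDRS?param1=28.534386&param2=77.472097
        System.out.println(build("http://10.0.2.2:8080/HelloServlet/PDRS", params));
    }
}
```

On the servlet side, req.getParameter("param1") and req.getParameter("param2") then return the two values separately, already decoded.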
I do not know much about Java, but URL query arguments should be separated by "&", not "?".
https://www.rfc-editor.org/rfc/rfc3986 is a good reference (search for "sub-delims"); http://en.wikipedia.org/wiki/Query_string is another good source.
You can pass multiple parameters as "?param1=value1&param2=value2"
But it's not secure on its own. If the server echoes the values into an HTML page, it is vulnerable to a cross-site scripting (XSS) attack:
your parameter can simply be replaced with a script.
Have a look at this article and article
You can make it secure by using the StringEscapeUtils API (Apache Commons Lang):
static String escapeHtml(String str)
Escapes the characters in a String using HTML entities.
Even using an https URL for security is not good practice without the above precautions.
Have a look at related SE question:
Is URLEncoder.encode(string, "UTF-8") a poor validation?
