Parse Accept-Language header in Java - java

The Accept-Language header in a request is usually a long, complex string, e.g.:
Accept-Language : en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2
Is there a simple way to parse it in Java? Or an API to help me do that?

I would suggest using ServletRequest.getLocales() to let the container parse Accept-Language rather than trying to manage the complexity yourself.

For the record, now it is possible with Java 8:
Locale.LanguageRange.parse()
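As a minimal sketch of that call (the class name AcceptLanguageRanges is only an illustration, not part of any library), the header value from the question can be passed straight to Locale.LanguageRange.parse(), which returns the ranges ordered by descending weight:

```java
import java.util.List;
import java.util.Locale;

public class AcceptLanguageRanges {
    /** Parses a raw Accept-Language value; the JDK returns the ranges highest weight first. */
    public static List<Locale.LanguageRange> parse(String header) {
        // Throws IllegalArgumentException for ill-formed input and NullPointerException for null
        return Locale.LanguageRange.parse(header);
    }

    public static void main(String[] args) {
        String header = "en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2";
        for (Locale.LanguageRange range : parse(header)) {
            // getRange() is the lower-cased language range, getWeight() the q-value
            System.out.println(range.getRange() + " q=" + range.getWeight());
        }
    }
}
```

Ranges without an explicit q-value get the maximum weight of 1.0.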

Here's an alternative way to parse the Accept-Language header which doesn't require a servlet container:
String header = "en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2";
for (String str : header.split(",")){
String[] arr = str.trim().replace("-", "_").split(";");
//Parse the locale
Locale locale = null;
String[] l = arr[0].split("_");
switch(l.length){
case 2: locale = new Locale(l[0], l[1]); break;
case 3: locale = new Locale(l[0], l[1], l[2]); break;
default: locale = new Locale(l[0]); break;
}
//Parse the q-value
Double q = 1.0D;
for (String s : arr){
s = s.trim();
if (s.startsWith("q=")){
q = Double.parseDouble(s.substring(2).trim());
break;
}
}
//Print the Locale and associated q-value
System.out.println(q + " - " + arr[0] + "\t " + locale.getDisplayLanguage());
}
You can find an explanation of the Accept-Language header and associated q-values here:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Many thanks to Karl Knechtel and Mike Samuel. Their comments on the original question helped point me in the right direction.

We are using Spring Boot and Java 8, and this works.
In ApplicationConfig.java, write this:
@Bean
public LocaleResolver localeResolver() {
return new SmartLocaleResolver();
}
and I have this list in my constants class that has languages that we support
List<Locale> locales = Arrays.asList(new Locale("en"),
new Locale("es"),
new Locale("fr"),
new Locale("es", "MX"),
new Locale("zh"),
new Locale("ja"));
Then write the logic in the class below:
public class SmartLocaleResolver extends AcceptHeaderLocaleResolver {
@Override
public Locale resolveLocale(HttpServletRequest request) {
if (StringUtils.isBlank(request.getHeader("Accept-Language"))) {
return Locale.getDefault();
}
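// Parse the header actually sent by the client rather than a hard-coded sample value
List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(request.getHeader("Accept-Language"));
Locale locale = Locale.lookup(ranges, locales);
// Locale.lookup returns null when nothing matches, so fall back to the default
return locale != null ? locale : Locale.getDefault();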
}
}
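The matching step in the resolver above does not depend on Spring. A minimal sketch using only the JDK follows; the class name SupportedLocaleLookup, the method resolve, and the fallback parameter are assumptions made for illustration, and the supported list simply mirrors the one in this answer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SupportedLocaleLookup {
    // Hypothetical supported list, mirroring the constants in the answer above
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en"), Locale.forLanguageTag("es"),
            Locale.forLanguageTag("fr"), Locale.forLanguageTag("es-MX"),
            Locale.forLanguageTag("zh"), Locale.forLanguageTag("ja"));

    /** Returns the best supported match, or the fallback when nothing matches. */
    public static Locale resolve(String header, Locale fallback) {
        if (header == null || header.trim().isEmpty()) {
            return fallback;
        }
        try {
            // Locale.lookup walks the ranges by priority and truncates each one
            // (for example fr-ca, then fr) until a supported tag matches
            Locale match = Locale.lookup(Locale.LanguageRange.parse(header), SUPPORTED);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException e) {
            // Ill-formed header values are treated like a missing header
            return fallback;
        }
    }
}
```

With this list, resolve("da,es-MX;q=0.8", Locale.ENGLISH) skips the unsupported "da" and returns es-MX.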

ServletRequest.getLocale() is certainly the best option if it is available and not overridden, as some frameworks do.
For all other cases, Java 8 offers Locale.LanguageRange.parse(), as previously mentioned by Quiang Li. This, however, only gives back language range strings, not Locale objects. To turn those strings into locales you can use Locale.forLanguageTag() (available since Java 7):
final List<Locale> acceptedLocales = new ArrayList<>();
final String userLocale = request.getHeader("Accept-Language");
if (userLocale != null) {
final List<LanguageRange> ranges = Locale.LanguageRange.parse(userLocale);
if (ranges != null) {
ranges.forEach(languageRange -> {
final String localeString = languageRange.getRange();
final Locale locale = Locale.forLanguageTag(localeString);
acceptedLocales.add(locale);
});
}
}
return acceptedLocales;

Locale.forLanguageTag("en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2")

The above solutions lack any kind of validation. ServletRequest.getLocale() returns the server locale if the user does not provide a valid one.
Our websites recently received spam requests with various Accept-Language headers like:
secret.google.com
o-o-8-o-o.com search shell is much better than google!
Google officially recommends o-o-8-o-o.com search shell!
Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO
This implementation can optionally check against a list of supported Locale values. Without this check, a simple request with "test" or (2, 3, 4) still passes the syntax-only validation of LanguageRange.parse(String).
It optionally accepts empty and null values so that search engine crawlers are allowed through.
Servlet Filter
final String headerAcceptLanguage = request.getHeader("Accept-Language");
// check valid
if (!HttpHeaderUtils.isHeaderAcceptLanguageValid(headerAcceptLanguage, true, Locale.getAvailableLocales()))
return;
Utility
/**
* Checks if the given accept-language request header can be parsed.<br>
* <br>
* Optionally, the parsed LanguageRanges can be checked against the provided
* <code>locales</code> so that at least one locale must match.
*
* @see LanguageRange#parse(String)
*
* @param acceptLanguage
* @param isBlankValid Set to <code>true</code> if blank values are also
* valid
* @param locales Optional collection of valid Locale to validate any
* against.
*
* @return <code>true</code> if it can be parsed
*/
public static boolean isHeaderAcceptLanguageValid(final String acceptLanguage, final boolean isBlankValid,
final Locale[] locales)
{
// allow null or empty
if (StringUtils.isBlank(acceptLanguage))
return isBlankValid;
try
{
// check syntax
final List<LanguageRange> languageRanges = Locale.LanguageRange.parse(acceptLanguage);
// wrong syntax
if (languageRanges.isEmpty())
return false;
// no valid locale's to check against
if (ArrayUtils.isEmpty(locales))
return true;
// check if any valid locale exists
for (final LanguageRange languageRange : languageRanges)
{
final Locale locale = Locale.forLanguageTag(languageRange.getRange());
// validate available locale
if (ArrayUtils.contains(locales, locale))
return true;
}
return false;
}
catch (final Exception e)
{
return false;
}
}

Related

How to create annotation to format amount values

So I am working on a solution right now wherein we have 2 requirements:
Format SSN / Telephone Number in Hyphen form which is otherwise
currently being displayed without it.
Format an amount field in the format "$0.00".
Currently we have written the methods formatWithHyphen and formatAmount as below:
/**
* This method converts the given string into US format SSN / Telephone number
* @param valueToFormat
* @param fieldToFormat It should be either 'S' for SSN or 'T' for mobile number
* @return
*/
public String formatWithHyphen (String valueToFormat, String fieldToFormat) {
if(valueToFormat != null && valueToFormat.length() > 1) {
StringBuilder formattedValue = new StringBuilder(valueToFormat);
if(fieldToFormat.equalsIgnoreCase("S")) {
//format as SSN
formattedValue = formattedValue.insert(3, '-').insert(6, '-');
} else if(fieldToFormat.equalsIgnoreCase("T")) {
//format as telephone number
formattedValue = formattedValue.insert(3, '-').insert(7, '-');
}
return formattedValue.toString();
}
else {
return null;
}
}
/**
* This method converts a given amount string to a US $ formatted amount.
*
* @param amountToFormat
* @return
*/
public String formatAmount(String amountToFormat) {
try {
if(amountToFormat!=null && amountToFormat.length() > 0) {
Locale locale = new Locale("en", "US");
NumberFormat formatter = NumberFormat.getCurrencyInstance(locale);
return formatter.format(Double.parseDouble(amountToFormat));
}
else {
return null;
}
} catch (NumberFormatException nfe) {
nfe.printStackTrace();
} catch (IllegalArgumentException iae) {
iae.printStackTrace();
}
return null;
}
Now the issue is:
There are multiple POJO classes (TempAssist, SuppNutrition, ChildCare etc.) which have fields related to
Amount and SSN / Telephone number
When we get those fields from database, firstly, the unformatted data is
set in the corresponding setters and then in the UI layer, we get
value through getter() and apply the above 2 functions to it and then
finally respond to the client in JSON format.
It's not a clean solution, as the set happens twice and the code is bloated with getters and setters.
What I am looking for:
An annotation (for instance, @Format(type="ssn")) which I can apply to POJO fields so that every annotated field has its SSN updated with hyphens.
This is a web application which does not use Spring framework so any suggestions on Spring cannot be implemented.
Create a class extending JsonSerializer and then annotate your getter with @JsonSerialize(using=MySerializer.class).
One of the serializers could look something like this:
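public class MySerializer extends JsonSerializer<String> {
    @Override
    public void serialize(String value, JsonGenerator jgen, SerializerProvider provider)
            throws IOException, JsonProcessingException {
        // formatWithHyphen takes the field type as its second argument ('S' = SSN);
        // this assumes the method has been made static on MyUtilsClass
        jgen.writeString(MyUtilsClass.formatWithHyphen(value, "S"));
    }
}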
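If Jackson is not an option, the annotation the question asks for can also be sketched with plain reflection. Everything named here is hypothetical: the @Format annotation, the type names "ssn" and "phone", the ChildCare fields, and applyFormats are illustrations, not an existing API. The idea is to run applyFormats once on a bean before it is serialized:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class FormatDemo {
    /** Hypothetical marker annotation; "ssn" and "phone" are assumed type names. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Format {
        String type();
    }

    /** Example POJO, loosely modelled on the classes named in the question. */
    public static class ChildCare {
        @Format(type = "ssn") public String ssn = "123456789";
        @Format(type = "phone") public String phone = "5551234567";
        public String name = "unformatted";
    }

    /** Rewrites every annotated String field of the bean in place. */
    public static void applyFormats(Object bean) {
        for (Field field : bean.getClass().getDeclaredFields()) {
            Format format = field.getAnnotation(Format.class);
            if (format == null || field.getType() != String.class) {
                continue; // only annotated String fields are touched
            }
            try {
                field.setAccessible(true);
                String raw = (String) field.get(bean);
                if (raw != null) {
                    field.set(bean, hyphenate(raw, format.type()));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot format " + field.getName(), e);
            }
        }
    }

    private static String hyphenate(String value, String type) {
        if ("ssn".equals(type) && value.length() == 9) {
            return value.substring(0, 3) + "-" + value.substring(3, 5) + "-" + value.substring(5);
        }
        if ("phone".equals(type) && value.length() == 10) {
            return value.substring(0, 3) + "-" + value.substring(3, 6) + "-" + value.substring(6);
        }
        return value; // unexpected lengths are left untouched
    }
}
```

Because the bean is mutated, applyFormats should be called only once per object; calling it again finds values of a different length and leaves them as they are.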

Get domain name from given url

Given a URL, I want to extract the domain name (it should not include the 'www' part). The URL can contain http/https. Here is the Java code that I wrote. Though it seems to work fine, is there any better approach, or are there edge cases that could fail?
public static String getDomainName(String url) throws MalformedURLException{
if(!url.startsWith("http") && !url.startsWith("https")){
url = "http://" + url;
}
URL netUrl = new URL(url);
String host = netUrl.getHost();
if(host.startsWith("www")){
host = host.substring("www".length()+1);
}
return host;
}
Input: http://google.com/blah
Output: google.com
If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup which means code using it can be vulnerable to denial of service attacks when used with untrusted inputs.
"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.
public static String getDomainName(String url) throws URISyntaxException {
URI uri = new URI(url);
String domain = uri.getHost();
return domain.startsWith("www.") ? domain.substring(4) : domain;
}
should do what you want.
"Though it seems to work fine, is there any better approach, or are there edge cases that could fail?"
Your code as written fails for the valid URLs:
httpfoo/bar -- relative URL with a path component that starts with http.
HTTP://example.com/ -- protocol is case-insensitive.
//example.com/ -- protocol relative URL with a host
www/foo -- a relative URL with a path component that starts with www
wwwexample.com -- a domain name that does not start with www. but does start with www
Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.
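To see how java.net.URI treats the inputs listed above, here is a small sketch. The class name UriHostDemo and the choice to return null for inputs without a host are assumptions made for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class UriHostDemo {
    /** Returns the host without a leading "www.", or null when the input has no host. */
    public static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return null; // relative reference: there is no authority component
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null; // not parseable as a URI at all
        }
    }

    public static void main(String[] args) {
        String[] inputs = { "httpfoo/bar", "HTTP://example.com/", "//example.com/",
                            "www/foo", "wwwexample.com", "http://www.google.com/blah" };
        for (String input : inputs) {
            System.out.println(input + " -> " + hostOf(input));
        }
    }
}
```

The relative references (httpfoo/bar, www/foo, wwwexample.com) have no host, so getHost() returns null for them, and callers need to handle that case before calling startsWith.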
If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:
Appendix B. Parsing a URI Reference with a Regular Expression
As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.
The following line is the regular expression for breaking-down a
well-formed URI reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis).
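As a rough sketch, the Appendix B expression can be used from Java as follows. The class name Rfc3986Regex and the method split are illustrative; the group numbers are the ones given in the RFC text quoted above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986Regex {
    // The Appendix B expression, escaped for a Java string literal
    private static final Pattern URI_REFERENCE = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent parts are null. */
    public static String[] split(String reference) {
        Matcher m = URI_REFERENCE.matcher(reference);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a URI reference: " + reference);
        }
        // Groups 2, 4, 5, 7 and 9 are the five components named in the RFC
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }
}
```

The expression only splits, it does not validate: for "example.com:8080/path" it reports "example.com" as the scheme, because everything before the first colon looks like one.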
import java.net.*;
import java.io.*;
public class ParseURL {
public static void main(String[] args) throws Exception {
URL aURL = new URL("http://example.com:80/docs/books/tutorial"
+ "/index.html?name=networking#DOWNLOADING");
System.out.println("protocol = " + aURL.getProtocol()); //http
System.out.println("authority = " + aURL.getAuthority()); //example.com:80
System.out.println("host = " + aURL.getHost()); //example.com
System.out.println("port = " + aURL.getPort()); //80
System.out.println("path = " + aURL.getPath()); // /docs/books/tutorial/index.html
System.out.println("query = " + aURL.getQuery()); //name=networking
System.out.println("filename = " + aURL.getFile()); // /docs/books/tutorial/index.html?name=networking
System.out.println("ref = " + aURL.getRef()); //DOWNLOADING
}
}
Here is a short and simple line using InternetDomainName.topPrivateDomain() in Guava: InternetDomainName.from(new URL(url).getHost()).topPrivateDomain().toString()
Given http://www.google.com/blah, that will give you google.com. Or, given http://www.google.co.mx, it will give you google.co.mx.
As Sa Qada commented in another answer on this post, this question has been asked earlier: Extract main domain name from a given url. The best answer to that question is from Satya, who suggests Guava's InternetDomainName.topPrivateDomain()
public boolean isTopPrivateDomain()
Indicates whether this domain name is composed of exactly one
subdomain component followed by a public suffix. For example, returns
true for google.com and foo.co.uk, but not for www.google.com or
co.uk.
Warning: A true result from this method does not imply that the
domain is at the highest level which is addressable as a host, as many
public suffixes are also addressable hosts. For example, the domain
bar.uk.com has a public suffix of uk.com, so it would return true from
this method. But uk.com is itself an addressable host.
This method can be used to determine whether a domain is probably the
highest level for which cookies may be set, though even that depends
on individual browsers' implementations of cookie controls. See RFC
2109 for details.
Putting that together with URL.getHost(), which the original post already contains, gives you:
import com.google.common.net.InternetDomainName;
import java.net.URL;
public class DomainNameMain {
public static void main(final String... args) throws Exception {
final String urlString = "http://www.google.com/blah";
final URL url = new URL(urlString);
final String host = url.getHost();
final InternetDomainName name = InternetDomainName.from(host).topPrivateDomain();
System.out.println(urlString);
System.out.println(host);
System.out.println(name);
}
}
I wrote a method (see below) which extracts a url's domain name and which uses simple String matching. What it actually does is extract the bit between the first "://" (or index 0 if there's no "://" contained) and the first subsequent "/" (or index String.length() if there's no subsequent "/"). The remaining, preceding "www(_)*." bit is chopped off. I'm sure there'll be cases where this won't be good enough but it should be good enough in most cases!
Mike Samuel's post above says that the java.net.URI class could do this (and was preferred to the java.net.URL class) but I encountered problems with the URI class. Notably, URI.getHost() gives a null value if the url does not include the scheme, i.e. the "http(s)" bit.
/**
* Extracts the domain name from {@code url}
* by means of String manipulation
* rather than using the {@link URI} or {@link URL} class.
*
* @param url is non-null.
* @return the domain name within {@code url}.
*/
public String getUrlDomainName(String url) {
String domainName = new String(url);
int index = domainName.indexOf("://");
if (index != -1) {
// keep everything after the "://"
domainName = domainName.substring(index + 3);
}
index = domainName.indexOf('/');
if (index != -1) {
// keep everything before the '/'
domainName = domainName.substring(0, index);
}
// check for and remove a preceding 'www'
// followed by any sequence of characters (non-greedy)
// followed by a '.'
// from the beginning of the string
domainName = domainName.replaceFirst("^www.*?\\.", "");
return domainName;
}
I added a small preprocessing step before creating the URI object:
if (url.startsWith("http:/")) {
if (!url.contains("http://")) {
url = url.replaceAll("http:/", "http://");
}
} else {
url = "http://" + url;
}
URI uri = new URI(url);
String domain = uri.getHost();
return domain.startsWith("www.") ? domain.substring(4) : domain;
In my case I only needed the main domain and not the subdomain (no "www" or whatever the subdomain is):
public static String getUrlDomain(String url) throws URISyntaxException {
URI uri = new URI(url);
String domain = uri.getHost();
String[] domainArray = domain.split("\\.");
if (domainArray.length == 1) {
return domainArray[0];
}
return domainArray[domainArray.length - 2] + "." + domainArray[domainArray.length - 1];
}
With this method the url "https://rest.webtoapp.io/llSlider?lg=en&t=8" will have for domain "webtoapp.io".
val host = url.split("/")[2]
All the above are good. This one seems really simple to me and easy to understand. Excuse the quotes. I wrote it for Groovy inside a class called DataCenter.
static String extractDomainName(String url) {
int start = url.indexOf('://')
if (start < 0) {
start = 0
} else {
start += 3
}
int end = url.indexOf('/', start)
if (end < 0) {
end = url.length()
}
String domainName = url.substring(start, end)
int port = domainName.indexOf(':')
if (port >= 0) {
domainName = domainName.substring(0, port)
}
domainName
}
And here are some junit4 tests:
@Test
void shouldFindDomainName() {
assert DataCenter.extractDomainName('http://example.com/path/') == 'example.com'
assert DataCenter.extractDomainName('http://subpart.example.com/path/') == 'subpart.example.com'
assert DataCenter.extractDomainName('http://example.com') == 'example.com'
assert DataCenter.extractDomainName('http://example.com:18445/path/') == 'example.com'
assert DataCenter.extractDomainName('example.com/path/') == 'example.com'
assert DataCenter.extractDomainName('example.com') == 'example.com'
}
Try this one, using java.net.URL:
JOptionPane.showMessageDialog(null, getDomainName(new URL("https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains")));
public String getDomainName(URL url){
String strDomain;
String[] strhost = url.getHost().split(Pattern.quote("."));
String[] strTLD = {"com","org","net","int","edu","gov","mil","arpa"};
if(Arrays.asList(strTLD).indexOf(strhost[strhost.length-1])>=0)
strDomain = strhost[strhost.length-2]+"."+strhost[strhost.length-1];
else if(strhost.length>2)
strDomain = strhost[strhost.length-3]+"."+strhost[strhost.length-2]+"."+strhost[strhost.length-1];
else
strDomain = strhost[strhost.length-2]+"."+strhost[strhost.length-1];
return strDomain;}
There is a similar question: Extract main domain name from a given url. If you take a look at this answer, you will see that it is very easy. You just need to use java.net.URL and String.split().
One approach that worked for all of my cases combines the Guava library with a regex.
public static String getDomainNameWithGuava(String url) throws MalformedURLException,
URISyntaxException {
String host =new URL(url).getHost();
String domainName="";
try{
domainName = InternetDomainName.from(host).topPrivateDomain().toString();
}catch (IllegalStateException | IllegalArgumentException e){
domainName= getDomain(url,true);
}
return domainName;
}
getDomain() can be any common method with regex.
private static final String hostExtractorRegexString = "(?:https?://)?(?:www\\.)?(.+\\.)(com|au\\.uk|co\\.in|be|in|uk|org\\.in|org|net|edu|gov|mil)";
private static final Pattern hostExtractorRegexPattern = Pattern.compile(hostExtractorRegexString);
public static String getDomainName(String url){
if (url == null) return null;
url = url.trim();
Matcher m = hostExtractorRegexPattern.matcher(url);
if(m.find() && m.groupCount() == 2) {
return m.group(1) + m.group(2);
}
return null;
}
Explanation :
The regex has 4 groups. The first two are non-capturing groups and the next two are capturing groups.
The first non-capturing group is "http://" or "https://" or ""
The second non-capturing group is "www." or ""
The second capturing group is the top level domain
The first capturing group is anything after the non-capturing groups and anything before the top level domain
The concatenation of the two capturing groups will give us the domain/host name.
PS : Note that you can add any number of supported domains to the regex.
If the input URL comes from the user, this method gives the most appropriate host name. If none is found, it gives back the input URL.
private String getHostName(String urlInput) {
urlInput = urlInput.toLowerCase();
String hostName=urlInput;
if(!urlInput.equals("")){
if(urlInput.startsWith("http") || urlInput.startsWith("https")){
try{
URL netUrl = new URL(urlInput);
String host= netUrl.getHost();
if(host.startsWith("www")){
hostName = host.substring("www".length()+1);
}else{
hostName=host;
}
}catch (MalformedURLException e){
hostName=urlInput;
}
}else if(urlInput.startsWith("www")){
hostName=urlInput.substring("www".length()+1);
}
return hostName;
}else{
return "";
}
}
To get the actual domain name, without the subdomain, I use:
private String getDomainName(String url) throws URISyntaxException {
String hostName = new URI(url).getHost();
if (!hostName.contains(".")) {
return hostName;
}
String[] host = hostName.split("\\.");
return host[host.length - 2];
}
Note that this won't work with second-level domains (like .co.uk).
// groovy
def hostname = { url -> url[(url.indexOf('://') + 3)..-1].split('/')[0] }
hostname('http://hello.world.com/something') // return 'hello.world.com'
hostname('docker://quay.io/skopeo/stable') // return 'quay.io'
const val WWW = "www."
fun URL.domain(): String {
val domain: String = this.host
return if (domain.startsWith(ConstUtils.WWW)) {
domain.substring(ConstUtils.WWW.length)
} else {
domain
}
}
I use regex solution
public static String getDomainName(String url) {
return url.replaceAll("http(s)?://|www\\.|wap\\.|/.*", "");
}
It cleans url from "http/https/www./wap." and from all unnecessary things after / like "/questions" in "https://stackoverflow.com/questions" and we get just "stackoverflow.com"

How to normalize a URL in Java?

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.
Strategies include adding trailing slashes, https => http, etc. The Wikipedia page lists many.
Got a favorite method of doing this in Java? Perhaps a library (Nutch?), but I'm open. Smaller and fewer dependencies is better.
I'll handcode something for now and keep an eye on this question.
EDIT: I want to aggressively normalize to count URLs as the same if they refer to the same content. For example, I ignore the parameters utm_source, utm_medium, utm_campaign. For example, I ignore subdomain if the title is the same.
Have you taken a look at the URI class?
http://docs.oracle.com/javase/7/docs/api/java/net/URI.html#normalize()
I found this question last night, but there wasn't the answer I was looking for, so I made my own. Here it is in case somebody wants it in the future:
/**
* - Convert the scheme and host to lowercase (done by java.net.URL)
* - Normalize the path (done by java.net.URI)
* - Add the port number.
* - Remove the fragment (the part after the #).
* - Remove trailing slash.
* - Sort the query string params.
* - Remove some query string params like "utm_*" and "*session*".
*/
public class NormalizeURL
{
public static String normalize(final String taintedURL) throws MalformedURLException
{
final URL url;
try
{
url = new URI(taintedURL).normalize().toURL();
}
catch (URISyntaxException e) {
throw new MalformedURLException(e.getMessage());
}
// replaceAll takes a regex; String.replace would look for the literal text "/$"
final String path = url.getPath().replaceAll("/$", "");
final SortedMap<String, String> params = createParameterMap(url.getQuery());
final int port = url.getPort();
final String queryString;
if (params != null)
{
// Some params are only relevant for user tracking, so remove the most commons ones.
for (Iterator<String> i = params.keySet().iterator(); i.hasNext();)
{
final String key = i.next();
if (key.startsWith("utm_") || key.contains("session"))
{
i.remove();
}
}
queryString = "?" + canonicalize(params);
}
else
{
queryString = "";
}
return url.getProtocol() + "://" + url.getHost()
+ (port != -1 && port != 80 ? ":" + port : "")
+ path + queryString;
}
/**
* Takes a query string, separates the constituent name-value pairs, and
* stores them in a SortedMap ordered by lexicographical order.
* @return Null if there is no query string.
*/
private static SortedMap<String, String> createParameterMap(final String queryString)
{
if (queryString == null || queryString.isEmpty())
{
return null;
}
final String[] pairs = queryString.split("&");
final Map<String, String> params = new HashMap<String, String>(pairs.length);
for (final String pair : pairs)
{
if (pair.length() < 1)
{
continue;
}
String[] tokens = pair.split("=", 2);
for (int j = 0; j < tokens.length; j++)
{
try
{
tokens[j] = URLDecoder.decode(tokens[j], "UTF-8");
}
catch (UnsupportedEncodingException ex)
{
ex.printStackTrace();
}
}
switch (tokens.length)
{
case 1:
{
if (pair.charAt(0) == '=')
{
params.put("", tokens[0]);
}
else
{
params.put(tokens[0], "");
}
break;
}
case 2:
{
params.put(tokens[0], tokens[1]);
break;
}
}
}
return new TreeMap<String, String>(params);
}
/**
* Canonicalize the query string.
*
* @param sortedParamMap Parameter name-value pairs in lexicographical order.
* @return Canonical form of query string.
*/
private static String canonicalize(final SortedMap<String, String> sortedParamMap)
{
if (sortedParamMap == null || sortedParamMap.isEmpty())
{
return "";
}
final StringBuffer sb = new StringBuffer(350);
final Iterator<Map.Entry<String, String>> iter = sortedParamMap.entrySet().iterator();
while (iter.hasNext())
{
final Map.Entry<String, String> pair = iter.next();
sb.append(percentEncodeRfc3986(pair.getKey()));
sb.append('=');
sb.append(percentEncodeRfc3986(pair.getValue()));
if (iter.hasNext())
{
sb.append('&');
}
}
return sb.toString();
}
/**
* Percent-encode values according to RFC 3986. The built-in Java URLEncoder does not encode
* according to the RFC, so we make the extra replacements.
*
* @param string Decoded string.
* @return Encoded string per RFC 3986.
*/
private static String percentEncodeRfc3986(final String string)
{
try
{
return URLEncoder.encode(string, "UTF-8").replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
}
catch (UnsupportedEncodingException e)
{
return string;
}
}
}
Because you also want to identify URLs which refer to the same content, I found this paper from the WWW2007 pretty interesting: Do Not Crawl in the DUST: Different URLs with Similar Text. It provides you with a nice theoretical approach.
No, there is nothing in the standard libraries to do this. Canonicalization includes things like decoding unnecessarily encoded characters, converting hostnames to lowercase, etc.
e.g. http://ACME.com/./foo%26bar becomes:
http://acme.com/foo&bar
URI's normalize() does not do this.
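A quick way to check this claim is to run java.net.URI.normalize() on the example. The class name UriNormalizeDemo is only an illustration:

```java
import java.net.URI;

public class UriNormalizeDemo {
    /** Applies only what java.net.URI.normalize() offers: removal of "." and ".." segments. */
    public static String normalize(String url) {
        // URI.create throws IllegalArgumentException for strings that are not valid URIs
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        // The host keeps its case and %26 stays encoded; only the "./" segment goes away
        System.out.println(normalize("http://ACME.com/./foo%26bar"));
        System.out.println(normalize("http://example.com/a/b/../c/./d"));
    }
}
```

The first call yields http://ACME.com/foo%26bar and the second http://example.com/a/c/d, so lowercasing the host and decoding unreserved characters still have to be done separately.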
The RL library:
https://github.com/backchatio/rl
goes quite a ways beyond java.net.URI.normalize().
It's in Scala, but I imagine it should be usable from Java.
You can do this with the Restlet framework using Reference.normalize(). You should also be able to remove the elements you don't need quite conveniently with this class.
In Java, normalize parts of a URL
Example of a URL: https://i0.wp.com:55/lplresearch.com/wp-content/feb.png?ssl=1&myvar=2#myfragment
protocol: https
domain name: i0.wp.com
subdomain: i0
port: 55
path: /lplresearch.com/wp-content/feb.png
query: ?ssl=1
parameters: &myvar=2
fragment: #myfragment
Code to do the URL parsing:
import java.util.*;
import java.util.regex.*;
public class regex {
public static String getProtocol(String the_url){
Pattern p = Pattern.compile("^(http|https|smtp|ftp|file|pop)://.*");
Matcher m = p.matcher(the_url);
// group() may only be called after a successful match
return m.matches() ? m.group(1) : null;
}
public static String getParameters(String the_url){
// '#' is left out of the character class so the fragment is not swallowed
Pattern p = Pattern.compile(".*(\\?[-a-zA-Z0-9_.!$&'()*+,;=]+)(#.*)*$");
Matcher m = p.matcher(the_url);
return m.matches() ? m.group(1) : null;
}
public static String getFragment(String the_url){
Pattern p = Pattern.compile(".*(#.*)$");
Matcher m = p.matcher(the_url);
return m.matches() ? m.group(1) : null;
}
public static void main(String[] args){
String the_url =
"https://i0.wp.com:55/lplresearch.com/" +
"wp-content/feb.png?ssl=1&myvar=2#myfragment";
System.out.println(getProtocol(the_url));
System.out.println(getFragment(the_url));
System.out.println(getParameters(the_url));
}
}
Prints
https
#myfragment
?ssl=1&myvar=2
You can then push and pull on the parts of the URL until they are up to muster.
I have a simple way to solve it. Here is my code:
public static String normalizeURL(String oldLink)
{
int pos=oldLink.indexOf("://");
String newLink="http"+oldLink.substring(pos);
return newLink;
}

Creating classes dynamically with Java

I have tried to find information about this but have come up empty handed:
I gather it is possible to create a class dynamically in Java using reflection or proxies but I can't find out how. I'm implementing a simple database framework where I create the SQL queries using reflection. The method gets the object with the database fields as a parameter and creates the query based on that. But it would be very useful if I could also create the object itself dynamically so I wouldn't have the need to have a simple data wrapper object for each table.
The dynamic classes would only need simple fields (String, Integer, Double), e.g.
public class Data {
public Integer id;
public String name;
}
Is this possible and how would I do this?
EDIT: This is how I would use this:
/** Creates an SQL query for updating a row's values in the database.
*
* @param entity Table name.
* @param toUpdate Fields and values to update. All of the fields will be
* updated, so each field must have a meaningful value!
* @param idFields Fields used to identify the row(s).
* @param ids Id values for id fields. Values must be in the same order as
* the fields.
* @return
*/
@Override
public String updateItem(String entity, Object toUpdate, String[] idFields,
        String[] ids) {
    StringBuilder sb = new StringBuilder();
    sb.append("UPDATE ");
    sb.append(entity);
    sb.append(" SET ");
    for (Field f: toUpdate.getClass().getDeclaredFields()) {
        sb.append(f.getName());
        sb.append("=");
        sb.append(formatValue(f, toUpdate));
        sb.append(",");
    }
    /* Remove last comma */
    sb.deleteCharAt(sb.length()-1);
    /* Add where clause */
    sb.append(createWhereClause(idFields, ids));
    return sb.toString();
}
/** Formats a value for an sql query.
*
* This function assumes that the field type is equivalent to the field
* in the database. In practice this means that this function supports two
* types of fields: string (varchar) and numeric.
*
* A string type field will be wrapped in single quotes (') because
* SQL databases expect that. Numbers are returned as-is.
*
* If the field is null, a string containing "NULL" is returned instead.
*
* @param f The field where the value is.
* @param owner The object whose field value is read.
* @return Formatted value.
*/
String formatValue(Field f, Object owner) {
    try {
        // Read the value from the owning object, not from the Field itself
        Object value = f.get(owner);
        if (value == null) {
            return "NULL";
        }
        // f.getType() is the declared field type; f.getClass() is always Field
        if (f.getType() == String.class) {
            return "'" + value + "'";
        }
        return String.valueOf(value);
    } catch (IllegalAccessException e) {
        System.err.println("Cannot read field: " + e.getMessage());
        return null;
    }
}
There are many different ways to achieve this (e.g. proxies, ASM), but the simplest approach, one that you can start with when prototyping, is:
import java.io.*;
import java.util.*;
import java.lang.reflect.*;
public class MakeTodayClass {
Date today = new Date();
String todayMillis = Long.toString(today.getTime());
String todayClass = "z_" + todayMillis;
String todaySource = todayClass + ".java";
public static void main (String args[]){
MakeTodayClass mtc = new MakeTodayClass();
mtc.createIt();
if (mtc.compileIt()) {
System.out.println("Running " + mtc.todayClass + ":\n\n");
mtc.runIt();
}
else
System.out.println(mtc.todaySource + " is bad.");
}
public void createIt() {
try {
FileWriter aWriter = new FileWriter(todaySource, true);
aWriter.write("public class "+ todayClass + "{");
aWriter.write(" public void doit() {");
aWriter.write(" System.out.println(\""+todayMillis+"\");");
aWriter.write(" }}\n");
aWriter.flush();
aWriter.close();
}
catch(Exception e){
e.printStackTrace();
}
}
public boolean compileIt() {
String [] source = { new String(todaySource)};
ByteArrayOutputStream baos= new ByteArrayOutputStream();
new sun.tools.javac.Main(baos,source[0]).compile(source);
// if using JDK >= 1.3 then use
// public static int com.sun.tools.javac.Main.compile(source);
return (baos.toString().indexOf("error")==-1);
}
public void runIt() {
try {
Class params[] = {};
Object paramsObj[] = {};
Class thisClass = Class.forName(todayClass);
Object iClass = thisClass.newInstance();
Method thisMethod = thisClass.getDeclaredMethod("doit", params);
thisMethod.invoke(iClass, paramsObj);
}
catch (Exception e) {
e.printStackTrace();
}
}
}
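The sun.tools.javac.Main class used above is an internal API that is not available on modern JDKs. Since Java 6 the standard javax.tools.JavaCompiler API covers the same ground. The following is only a minimal sketch of the same write, compile, load, invoke cycle, assuming the program runs on a JDK (not a plain JRE); the names RuntimeCompileSketch, compileAndRun and Generated are made up for illustration.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class RuntimeCompileSketch {

    /**
     * Generates a class with a single doit() method, compiles it,
     * loads it and returns whatever doit() returns.
     * (Hypothetical helper, not part of the original answer.)
     */
    static Object compileAndRun(String className, String methodBody) {
        try {
            // Write the generated source into a fresh temporary directory.
            Path dir = Files.createTempDirectory("generated");
            Path source = dir.resolve(className + ".java");
            String code = "public class " + className + " {\n"
                    + "    public Object doit() { " + methodBody + " }\n"
                    + "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // getSystemJavaCompiler() returns null when no compiler is available (plain JRE).
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler available; a JDK is required");
            }
            // The three nulls select System.in, System.out and System.err; 0 means success.
            int status = compiler.run(null, null, null, source.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed for " + source);
            }

            // Without -d, javac writes the .class file next to the source file.
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> generated = loader.loadClass(className);
                Object instance = generated.getDeclaredConstructor().newInstance();
                Method doit = generated.getDeclaredMethod("doit");
                return doit.invoke(instance);
            }
        } catch (IllegalStateException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not compile and run " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun("Generated", "return \"hello from generated code\";"));
    }
}
```

Checking the exit status of run() is more reliable than searching the compiler output for the word "error", which is what the older snippet does.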
It is possible to generate classes (via cglib, asm, javassist, bcel), but you shouldn't do it that way. Why?
The code that uses the library would have to expect type Object and read all the fields using reflection, which is not a good idea.
Java is a statically typed language, and you want to introduce dynamic typing; this is not the place for it.
If you simply want the data in an undefined format, then you can return it in an array, like Object[], or Map<String, Object> if you want them named, and get it from there - it will save you much trouble with unneeded class generation for the only purpose of containing some data that will be obtained by reflection.
What you can do instead is have predefined classes that will hold the data, and pass them as arguments to querying methods. For example:
public <T> T executeQuery(Class<T> expectedResultClass,
String someArg, Object... otherArgs) {..}
Thus you can use reflection on the passed expectedResultClass to create a new object of that type and populate it with the result of the query.
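As a rough illustration of that idea, the sketch below fills an instance of the expected result class from a Map of column name to value, which stands in for one row of a query result. The names ResultMapper, mapRow and Person are hypothetical, and a real implementation would read from a java.sql.ResultSet instead of a Map.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class ResultMapper {

    /** Hypothetical predefined result class; field names match the column names. */
    static class Person {
        String name;
        Integer age;
    }

    /**
     * Creates an instance of expectedResultClass and copies each row value
     * into the field with the same name. Columns without a matching field are ignored.
     */
    static <T> T mapRow(Class<T> expectedResultClass, Map<String, Object> row) {
        try {
            T target = expectedResultClass.getDeclaredConstructor().newInstance();
            for (Field field : expectedResultClass.getDeclaredFields()) {
                if (row.containsKey(field.getName())) {
                    field.setAccessible(true);
                    field.set(target, row.get(field.getName()));
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + expectedResultClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "Alice");
        row.put("age", 30);
        Person p = mapRow(Person.class, row);
        System.out.println(p.name + " " + p.age);
    }
}
```

The caller keeps static typing (it gets a Person back, not an Object), and reflection stays confined to one helper.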
That said, I think you could use something existing, like an ORM framework (Hibernate, EclipseLink), Spring's JdbcTemplate, etc.
This is possible, but (I believe) you need something like ASM or BCEL.
Alternately, you could use something with more power (like Groovy).
It will take a couple of minutes to create a data model class for each table, which you can easily map to the database with an ORM like Hibernate or by writing your own JDBC DAOs. It is far easier than delving deeply into reflection.
You could create a utility that interrogates the database structure for a table, and creates the data model class and DAO for you. Alternatively you could create the model in Java and create a utility to create the database schema and DAO from that (using reflection and Java 5 Annotations to assist). Don't forget that javaFieldNames are different from database_column_names typically.
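The naming mismatch mentioned above can be bridged with a small conversion helper. This is only a sketch, assuming the common convention of camelCase field names and lower snake_case column names; ColumnNames, toColumnName and toFieldName are made-up names, and real schemas often need explicit mappings for the exceptions.

```java
import java.util.Locale;

class ColumnNames {

    /** Converts a Java field name such as "firstName" to "first_name". */
    static String toColumnName(String fieldName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word, except at the very beginning of the name.
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Converts a column name such as "first_name" or "FIRST_NAME" to "firstName". */
    static String toFieldName(String columnName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : columnName.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toColumnName("javaFieldNames"));
        System.out.println(toFieldName("database_column_names"));
    }
}
```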
Recently I needed to create about 200 simple classes from metadata (objects filled with static data) and I did it with the open source burningwave library, in the following scenario:
The classes needed to have a certain prefix in the name, for example Registro*.java;
The classes needed to extend a superclass, Registro.java;
The classes needed to contain JPA annotations like @Entity and @Column (on attributes), Lombok annotations and custom annotations.
Here is the link to the repository with the complete project: https://github.com/leandrosoares6/criacao-classes-entidade-efd
Here is the code snippet responsible for creating the classes:
public class RegistrosClassFactory {
private static final String PACOTE = "com.example.demo.model.registros";
private static final String SCHEMA = "MY_SCHEMA";
private static final String PREFIXO = "Registro";
static void criaRegistros() {
List<RegistroTest> registros = RegistroMetadataFactory.criaMetadados();
criaClasses(registros);
}
private static void criaClasses(List<RegistroTest> registros) {
for (RegistroTest registroTest : registros) {
UnitSourceGenerator gerador = UnitSourceGenerator.create(PACOTE);
ClassSourceGenerator registro = ClassSourceGenerator
.create(TypeDeclarationSourceGenerator.create(PREFIXO + registroTest.getNome()))
.addModifier(Modifier.PUBLIC)
.addAnnotation(AnnotationSourceGenerator.create(Getter.class))
.addAnnotation(AnnotationSourceGenerator.create(Setter.class))
.addAnnotation(AnnotationSourceGenerator.create(NoArgsConstructor.class))
.addAnnotation(AnnotationSourceGenerator.create(ToString.class))
.addAnnotation(AnnotationSourceGenerator.create(Entity.class))
.addAnnotation(AnnotationSourceGenerator.create(Table.class)
.addParameter("name",
VariableSourceGenerator.create(String.format("\"%s\"",
registroTest.getNomeTabelaBd())))
.addParameter("schema", VariableSourceGenerator
.create(String.format("\"%s\"", SCHEMA))));
criaColunas(registroTest.getCampos(), registro);
registro.addConstructor(FunctionSourceGenerator.create().addModifier(Modifier.PUBLIC)
.addParameter(VariableSourceGenerator.create(String.class, "linha"))
.addBodyCodeLine("super(linha);")).expands(Registro.class);
gerador.addClass(registro);
// System.out.println("\nRegistro gerado:\n" + gerador.make());
String caminhoPastaRegistros = System.getProperty("user.dir") + "/src/main/java/";
gerador.storeToClassPath(caminhoPastaRegistros);
}
}
private static void criaColunas(List<Campo> campos, ClassSourceGenerator registro) {
for (Campo campo : campos) {
VariableSourceGenerator field = VariableSourceGenerator
.create(TypeDeclarationSourceGenerator.create(String.class),
campo.getNomeAtributo())
.addModifier(Modifier.PRIVATE)
.addAnnotation(AnnotationSourceGenerator.create(Column.class)
.addParameter("name", VariableSourceGenerator
.create(String.format("\"%s\"", campo.getNome())))
)
.addAnnotation(AnnotationSourceGenerator.create(Indice.class).addParameter(
"valor",
VariableSourceGenerator.create(String.valueOf(campo.getSequencial()))));
if (campo.getNome().equals("ID")) {
field.addAnnotation(AnnotationSourceGenerator.create(Id.class));
}
if (campo.getEId()) {
field.addAnnotation(AnnotationSourceGenerator.create(CampoTipoId.class));
}
if (campo.getEData()) {
field.addAnnotation(AnnotationSourceGenerator.create(CampoTipoData.class));
}
if (campo.getEDataPart()) {
field.addAnnotation(AnnotationSourceGenerator.create(CampoTipoDataPart.class));
}
registro.addField(field);
}
}
}
I'm aware of the performance drawback of reflection, but I needed this for my little project, so I created a library which converts JSON to Java source and then finally to a .class in the JVM context.
Anyone who needs such a thing can have a look at my open source solution, which requires a JDK to compile the code.
https://medium.com/#davutgrbz/the-need-history-c91c9d38ec9?sk=f076487e78a1ff5a66ef8eb1aa88f930

How to check if an IP address is from a particular network/netmask in Java?

I need to determine whether a given IP address is from a particular network in order to authenticate automatically.
Option 1:
Use spring-security-web's IpAddressMatcher. Unlike Apache Commons Net, it supports both IPv4 and IPv6.
import org.springframework.security.web.util.matcher.IpAddressMatcher;
...
private void checkIpMatch() {
matches("192.168.2.1", "192.168.2.1"); // true
matches("192.168.2.1", "192.168.2.0/32"); // false
matches("192.168.2.5", "192.168.2.0/24"); // true
matches("92.168.2.1", "fe80:0:0:0:0:0:c0a8:1/120"); // false
matches("fe80:0:0:0:0:0:c0a8:11", "fe80:0:0:0:0:0:c0a8:1/120"); // true
matches("fe80:0:0:0:0:0:c0a8:11", "fe80:0:0:0:0:0:c0a8:1/128"); // false
matches("fe80:0:0:0:0:0:c0a8:11", "192.168.2.0/32"); // false
}
private boolean matches(String ip, String subnet) {
IpAddressMatcher ipAddressMatcher = new IpAddressMatcher(subnet);
return ipAddressMatcher.matches(ip);
}
Option 2 (a lightweight solution!):
The code in the previous part works perfectly fine, but it needs spring-security-web to be included.
If you do not want to include the Spring framework in your project, you can use the class below. It is a slightly modified version of the original class from Spring, changed so that it has no non-JRE dependencies.
/*
* Copyright 2002-2019 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.net.InetAddress;
import java.net.UnknownHostException;
/**
* Matches a request based on IP Address or subnet mask matching against the remote
* address.
* <p>
* Both IPv6 and IPv4 addresses are supported, but a matcher which is configured with an
* IPv4 address will never match a request which returns an IPv6 address, and vice-versa.
*
* @author Luke Taylor
* @since 3.0.2
*
* Slightly modified by omidzk to have zero dependency to any frameworks other than the JRE.
*/
public final class IpAddressMatcher {
private final int nMaskBits;
private final InetAddress requiredAddress;
/**
* Takes a specific IP address or a range specified using the IP/Netmask (e.g.
* 192.168.1.0/24 or 202.24.0.0/14).
*
* @param ipAddress the address or range of addresses from which the request must
* come.
*/
public IpAddressMatcher(String ipAddress) {
if (ipAddress.indexOf('/') > 0) {
String[] addressAndMask = ipAddress.split("/");
ipAddress = addressAndMask[0];
nMaskBits = Integer.parseInt(addressAndMask[1]);
}
else {
nMaskBits = -1;
}
requiredAddress = parseAddress(ipAddress);
assert (requiredAddress.getAddress().length * 8 >= nMaskBits) :
String.format("IP address %s is too short for bitmask of length %d",
ipAddress, nMaskBits);
}
public boolean matches(String address) {
InetAddress remoteAddress = parseAddress(address);
if (!requiredAddress.getClass().equals(remoteAddress.getClass())) {
return false;
}
if (nMaskBits < 0) {
return remoteAddress.equals(requiredAddress);
}
byte[] remAddr = remoteAddress.getAddress();
byte[] reqAddr = requiredAddress.getAddress();
int nMaskFullBytes = nMaskBits / 8;
byte finalByte = (byte) (0xFF00 >> (nMaskBits & 0x07));
// System.out.println("Mask is " + new sun.misc.HexDumpEncoder().encode(mask));
for (int i = 0; i < nMaskFullBytes; i++) {
if (remAddr[i] != reqAddr[i]) {
return false;
}
}
if (finalByte != 0) {
return (remAddr[nMaskFullBytes] & finalByte) == (reqAddr[nMaskFullBytes] & finalByte);
}
return true;
}
private InetAddress parseAddress(String address) {
try {
return InetAddress.getByName(address);
}
catch (UnknownHostException e) {
throw new IllegalArgumentException("Failed to parse address " + address, e);
}
}
}
NOTICE: If you use this option, it is your responsibility to examine the license carefully and make sure that your use of this code does not violate any of its terms. (Of course, my publishing this code on Stackoverflow.com is not a violation.)
Apache Commons Net has org.apache.commons.net.util.SubnetUtils that appears to satisfy your needs. It looks like you do something like this:
SubnetInfo subnet = (new SubnetUtils("10.10.10.0", "255.255.255.128")).getInfo();
boolean test = subnet.isInRange("10.10.10.10");
Note, as carson points out, that Apache Commons Net has a bug that prevents it from giving the correct answer in some cases. Carson suggests using the SVN version to avoid this bug.
You can also try
boolean inSubnet = (ip & netmask) == (subnet & netmask);
or shorter
boolean inSubnet = ((ip ^ subnet) & netmask) == 0; // parentheses are required: == binds tighter than & in Java
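The one-liners above assume that ip, subnet and netmask are already 32-bit int values. The sketch below shows one way to get there from dotted-quad strings. It is IPv4 only, and the names Ipv4Mask, toInt, maskFromPrefix and inSubnet are made up for illustration.

```java
class Ipv4Mask {

    /** Packs a dotted-quad IPv4 address such as "10.10.10.10" into a 32-bit int. */
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Builds a netmask from a prefix length, e.g. 24 gives 255.255.255.0. */
    static int maskFromPrefix(int prefix) {
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Prefix out of range: " + prefix);
        }
        // Java masks int shift distances to 5 bits, so a shift by 32 does nothing;
        // a /0 prefix therefore needs its own branch.
        return prefix == 0 ? 0 : -1 << (32 - prefix);
    }

    /** True if ip and subnet agree on every bit selected by the netmask. */
    static boolean inSubnet(String ip, String subnet, int prefix) {
        int netmask = maskFromPrefix(prefix);
        return ((toInt(ip) ^ toInt(subnet)) & netmask) == 0;
    }

    public static void main(String[] args) {
        System.out.println(inSubnet("10.10.10.10", "10.10.10.0", 25));
        System.out.println(inSubnet("10.10.10.200", "10.10.10.0", 25));
    }
}
```

The signed nature of int does not matter here, because the check only uses bitwise operations and a comparison with zero.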
The open-source IPAddress Java library will do this in a polymorphic manner for both IPv4 and IPv6 and handles subnets. Disclaimer: I am the project manager of that library.
Example code:
contains("10.10.20.0/30", "10.10.20.3");
contains("10.10.20.0/30", "10.10.20.5");
contains("1::/64", "1::1");
contains("1::/64", "2::1");
contains("1::3-4:5-6", "1::4:5");
contains("1-2::/64", "2::");
contains("bla", "foo");
static void contains(String network, String address) {
IPAddressString one = new IPAddressString(network);
IPAddressString two = new IPAddressString(address);
System.out.println(one + " contains " + two + " " + one.contains(two));
}
Output:
10.10.20.0/30 contains 10.10.20.3 true
10.10.20.0/30 contains 10.10.20.5 false
1::/64 contains 1::1 true
1::/64 contains 2::1 false
1::3-4:5-6 contains 1::4:5 true
1-2::/64 contains 2:: true
bla contains foo false
Here is a version that works with both IPv4 and IPv6: one method takes a prefix length and the other takes a network mask.
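/**
* Check if IP is within a subnet defined by network address and prefix length
* @param ip the address to test
* @param net the network address
* @param prefix the prefix length in bits
* @return true if ip lies within the subnet
*/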
public static final boolean isIpInSubnet(final String ip, final String net, final int prefix) {
try {
final byte[] ipBin = java.net.InetAddress.getByName(ip ).getAddress();
final byte[] netBin = java.net.InetAddress.getByName(net ).getAddress();
if(ipBin.length != netBin.length ) return false;
int p = prefix;
int i = 0;
while(p>=8) { if(ipBin[i] != netBin[i] ) return false; ++i; p-=8; }
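if(p == 0) return true; // prefix ends on a byte boundary (e.g. /32 or /128): nothing left to compare, and ipBin[i] could be out of bounds
final int m = (65280 >> p) & 255;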
if((ipBin[i] & m) != (netBin[i]&m) ) return false;
return true;
} catch(final Throwable t) {
return false;
}
}
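/**
* Check if IP is within a subnet defined by network address and network mask
* @param ip the address to test
* @param net the network address
* @param mask the network mask in address notation, e.g. 255.255.255.0
* @return true if ip lies within the subnet
*/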
public static final boolean isIpInSubnet(final String ip, final String net, final String mask) {
try {
final byte[] ipBin = java.net.InetAddress.getByName(ip ).getAddress();
final byte[] netBin = java.net.InetAddress.getByName(net ).getAddress();
final byte[] maskBin = java.net.InetAddress.getByName(mask).getAddress();
if(ipBin.length != netBin.length ) return false;
if(netBin.length != maskBin.length) return false;
for(int i = 0; i < ipBin.length; ++i) if((ipBin[i] & maskBin[i]) != (netBin[i] & maskBin[i])) return false;
return true;
} catch(final Throwable t) {
return false;
}
}
I know this is a very old question, but I stumbled upon it when I was looking to solve the same problem.
There is the commons-ip-math library, which I believe does a very good job. Please note that as of May 2019 there have not been any updates to the library (it could be that it is already a very mature library). It is available on Maven Central.
It supports working with both IPv4 and IPv6 addresses. Its brief documentation has examples of how to check whether an address is in a specific range, for both IPv4 and IPv6.
Example for IPv4 range checking:
String input1 = "192.168.1.0";
Ipv4 ipv41 = Ipv4.parse(input1);
// Using CIDR notation to specify the networkID and netmask
Ipv4Range range = Ipv4Range.parse("192.168.0.0/24");
boolean result = range.contains(ipv41);
System.out.println(result); //false
String input2 = "192.168.0.251";
Ipv4 ipv42 = Ipv4.parse(input2);
// Specifying the range with a start and end.
Ipv4 start = Ipv4.of("192.168.0.0");
Ipv4 end = Ipv4.of("192.168.0.255");
range = Ipv4Range.from(start).to(end);
result = range.contains(ipv42); //true
System.out.println(result);
To check whether an IP is in a subnet, I used the isInRange method of the SubnetUtils class. But this method has a bug: if your subnet is X, isInRange returns true for every IP address lower than X. For example, if your subnet is 10.10.30.0/24 and you check 10.10.20.5, the method returns true. To work around this bug I used the code below.
public static void main(String[] args){
String list = "10.10.20.0/24";
String IP1 = "10.10.20.5";
String IP2 = "10.10.30.5";
SubnetUtils subnet = new SubnetUtils(list);
SubnetUtils.SubnetInfo subnetInfo = subnet.getInfo();
if(MyisInRange(subnetInfo , IP1) == true)
System.out.println("True");
else
System.out.println("False");
if(MyisInRange(subnetInfo , IP2) == true)
System.out.println("True");
else
System.out.println("False");
}
private static boolean MyisInRange(SubnetUtils.SubnetInfo info, String Addr )
{
int address = info.asInteger( Addr );
int low = info.asInteger( info.getLowAddress() );
int high = info.asInteger( info.getHighAddress() );
return low <= address && address <= high;
}
