Normalize a String to create a safe URL in Java

I'm writing a library in Java which creates the URL from a list of filenames in this way:
final String domain = "http://www.example.com/";
String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};
System.out.println(domain + normalize(filenames[0]));
//Prints "http://www.example.com/Normal_text"
System.out.println(domain + normalize(filenames[1]));
//Prints "http://www.example.com/Ich_weib_nicht"
System.out.println(domain + normalize(filenames[2]));
//Prints "http://www.example.com/L_ho_inserito_tra_i_principi"
Is there a Java library that exposes the normalize method I'm using in the code above?
Literature:
Which special characters are safe to use in url?
Safe characters for friendly url

Taking the content from my previous answer, you can use java.text.Normalizer, which comes close to normalizing Strings in Java. An example of normalization would be:
Accent removal:
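// requires: import java.text.Normalizer;
String accented = "árvíztűrő tükörfúrógép";
// NFD splits each accented letter into its base letter followed by a combining mark
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
// remove every non-ASCII character, i.e. the combining marks
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");
System.out.println(normalized);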
Gives:
arvizturo tukorfurogep
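Building on the accent-removal step, here is a minimal sketch of a normalize method that produces the underscore style from the question. The class name UrlSlug is hypothetical, and two choices are assumptions rather than anything the question specifies: ß has no canonical decomposition, so it is mapped to "ss" explicitly (the question's sample shows "weib" instead), and every run of characters other than ASCII letters and digits becomes a single underscore, which also means letters from non-Latin scripts are dropped.

```java
import java.text.Normalizer;

class UrlSlug {
    // Hypothetical helper standing in for the question's normalize(...) method.
    static String normalize(String input) {
        // Assumption: ß (U+00DF) is written as "ss"; NFD would leave it unchanged.
        String s = input.replace("\u00DF", "ss");
        // NFD splits accented letters into a base letter plus combining mark(s).
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks (Unicode general category M).
        s = s.replaceAll("\\p{M}", "");
        // Collapse each run of characters that are not ASCII letters or digits into one "_".
        s = s.replaceAll("[^A-Za-z0-9]+", "_");
        // Remove underscores left at either end (e.g. from leading or trailing spaces).
        return s.replaceAll("^_+|_+$", "");
    }

    public static void main(String[] args) {
        String domain = "http://www.example.com/";
        String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};
        for (String name : filenames) {
            System.out.println(domain + normalize(name));
        }
    }
}
```

With these rules "L'ho inserito tra i princìpi" becomes "L_ho_inserito_tra_i_principi"; adjust the ß mapping or the allowed character class if your URLs need something different.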

If you mean that you want to encode the strings so they are safe to use in a URL, use URLEncoder:
final String domain = "http://www.example.com/";
String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};
// URLEncoder applies application/x-www-form-urlencoded encoding, so spaces become "+";
// encode(String, String) throws the checked UnsupportedEncodingException
System.out.println(domain + URLEncoder.encode(filenames[0], "UTF-8"));
System.out.println(domain + URLEncoder.encode(filenames[1], "UTF-8"));
System.out.println(domain + URLEncoder.encode(filenames[2], "UTF-8"));
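URLEncoder is designed for form data and query strings, which is why a space turns into "+". For the path part of a URL, one alternative is to let java.net.URI do the quoting. The sketch below (class and method names are hypothetical) uses the multi-argument URI constructor, which quotes characters that are illegal in the path (a space becomes %20), and toASCIIString(), which percent-encodes non-ASCII characters as UTF-8. Note that a "/" inside the name is kept as a path separator.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PathEncoding {
    // Builds http://<host>/<name> with the name quoted as URL path text.
    static String pathUrl(String host, String name) {
        try {
            // The multi-argument constructor quotes illegal characters in the path;
            // toASCIIString() then percent-encodes the remaining non-ASCII characters.
            return new URI("http", host, "/" + name, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI for " + name, e);
        }
    }

    // Form encoding, as URLEncoder does it: meant for query strings, so a space becomes "+".
    static String formEncoded(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8); // Charset overload needs Java 10+
    }

    public static void main(String[] args) {
        String name = "Ich weiß nicht";
        System.out.println(pathUrl("www.example.com", name));
        System.out.println("http://www.example.com/" + formEncoded(name));
    }
}
```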

Related

How to replace string values with "XXXXX" in java?

I want to replace particular values in a string with "XXXX". The issue is that the input data is very dynamic and does not follow a fixed pattern.
My input data:
https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme
I need to replace the values of userId and password with "XXXX".
My output should be:
https://internetbanking.abc.co.uk/personal/logon/login/?userId=XXXX&password=XXXX&reme
This is a one-off example. There are other cases where only userId and password are present:
userId=12345678&password=stackoverflow&rememberID=
I am using a regex in Java to achieve this, but I have not been successful yet. I'd appreciate any guidance.
[&]([^\\/?&;]{0,})(userId=|password=)=[^&;]+|((?<=\\/)|(?<=\\?)|(?<=;))([^\\/?&;]{0,})(userId=|password=)=[^&]+|(?<=\\?)(userId=|password=)=[^&]+|(userId=|password=)=[^&]+
PS: I am not an expert in regex. Please also let me know if there are any alternatives to regex for this.
This should cover both of the given cases:
String maskUserNameAndPassword(String input) {
// replace everything after "userId=" or "password=" up to the next '&'
return input.replaceAll("(userId|password)=[^&]+", "$1=XXXXX");
}
String inputUrl1 =
"https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme";
String inputUrl2 =
"userId=12345678&password=stackoverflow&rememberID=";
String maskedUrl1 = maskUserNameAndPassword(inputUrl1);
System.out.println("Mask url1: " + maskedUrl1);
String maskedUrl2 = maskUserNameAndPassword(inputUrl2);
System.out.println("Mask url2: " + maskedUrl2);
The above prints:
Mask url1: https://internetbanking.abc.co.uk/personal/logon/login/?userId=XXXXX&password=XXXXX&reme
Mask url2: userId=XXXXX&password=XXXXX&rememberID=
String url = "https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme";
String masked = url.replaceAll("(userId|password)=[^&]+", "$1=XXXX");
Please note that sending sensitive data in the query string is a serious security issue.
I would rather use a URL parser than a regex. The example below uses the standard java.net.URL class, though third-party libraries can do this much better.
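// requires: java.net.URL, java.util.List, java.util.Map, java.util.function.Function,
// java.util.stream.Collectors, java.util.stream.Stream (Map.entry and List.of need Java 9+)
// replace the value of userId and password entries, leave every other entry untouched
Function<Map.Entry<String, String>, Map.Entry<String, String>> maskUserPasswordEntries = e ->
(e.getKey().equals("userId") || e.getKey().equals("password")) ? Map.entry(e.getKey(), "XXXX") : e;
// turn ["key"] or ["key", "value"] into a key/value entry; a missing value becomes ""
Function<List<String>, Map.Entry<String, String>> transformParamsToMap = p ->
Map.entry(p.get(0), p.size() == 1 ? "" : p.get(p.size() - 1));
// new URL(...) throws the checked MalformedURLException
URL url = new URL("https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme");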
String maskedQuery = Stream.of(url.getQuery().split("&"))
.map(s -> List.of(s.split("=")))
.map(transformParamsToMap)
.map(maskUserPasswordEntries).map(e -> e.getKey() + "=" + e.getValue())
.collect(Collectors.joining("&"));
System.out.println(url.getProtocol() + "://" + url.getAuthority() + url.getPath() + "?" + maskedQuery);
Output:
https://internetbanking.abc.co.uk/personal/logon/login/?userId=XXXX&password=XXXX&reme=
Just use the replace/replaceAll methods of the String class: replace works with literal CharSequences, and replaceAll works with regular expressions.
String url = "https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme";
// each pattern requires a trailing '&', so a userId or password at the very end of the string is not masked
url = url.replaceAll("(userId=.+?&)", "userId=XXXX&");
url = url.replaceAll("(password=.+?&)", "password=XXXX&");
System.out.println(url);
I'm not a regex expert either, but if you find it useful, I usually use this website to test my expressions and as an online cheat sheet:
https://regexr.com
Use:
(?<=(\?|&))(userId|password)=(.*?)(?=(&|$))
(?<=(\?|&)) makes sure it’s preceded by ? or & (but not part of the match)
(userId|password)= matches either userId or password, then =
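(.*?) lazily matches any characters, as few as possible, until the lookahead that follows can match
(?=(&|$)) makes sure the next character is either & or the end of the string (but not part of the match)
Then replace with $2=xxxxx (to keep the name userId or password) using replaceAll. In a Java string literal the backslash must be doubled: "(?<=(\\?|&))(userId|password)=(.*?)(?=(&|$))".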
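One caveat: because the lookbehind requires a preceding ? or &, a parameter at the very start of the input (as in the second example, "userId=12345678&...") is not masked. A minimal sketch that also covers that case captures the prefix instead of looking behind for it; the class name QueryMasker is hypothetical, and the sketch assumes values never contain a literal '&'.

```java
class QueryMasker {
    // (^|[?&])          group 1: start of input, '?' or '&' (copied back by the replacement)
    // (userId|password)  group 2: the parameter name
    // =[^&]*            '=' plus the value, up to the next '&' or the end of the input
    private static final String PATTERN = "(^|[?&])(userId|password)=[^&]*";

    static String mask(String input) {
        return input.replaceAll(PATTERN, "$1$2=XXXX");
    }

    public static void main(String[] args) {
        System.out.println(mask("https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme"));
        System.out.println(mask("userId=12345678&password=stackoverflow&rememberID="));
    }
}
```

Requiring ^, ? or & in front of the name also keeps a parameter such as myuserId from being masked by accident.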

Convert any simple string to JSON in Java

I have a string like:
String metadata = "{ Item: {Owner:John, About:{website:www.john.com/about, firstName:William, lastName:Shakespeare, date:1/2/2000, books:[Othello, R&J], shelf:[12/3/4, 14/5/6, 17/8/6]}}}";
I want to convert this metadata into JSON format, but because the quotes are missing I was not able to.
I could write code that parses it and adds quotes, but that would not be flexible enough to turn any such string into JSON (this metadata is likely to change in the future).
Is there a way to add quotes to any string to make it into JSON?
Or how can this be generalized so that a simple string is converted into JSON? Is there a Java library that does this conversion?
If there is no library, code snippets would really help me understand.
I would prefer not to use an external library.
Your string seems to use fancy incoherent notation: colons and equal signs to separate keys, unquoted strings, dates with multiple formats...
How is it generated in the first place? It may be easier to change how that string is generated than parsing it again.
To parse the string, you must first determine the rules on how to break it apart (the grammar). Incoherent as it is, this parser would have no end of special cases.
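// requires: import java.util.StringTokenizer;
// JSONObject.fromObject comes from the json-lib library (net.sf.json.JSONObject)
public static void main(String[] args) {
String metadata = "{ Item: {Owner:John, About:{website:www.john.com/about, firstName:William, lastName:Shakespeare, date:1/2/2000, books:[Othello, R&J], shelf:[12/3/4, 14/5/6, 17/8/6]}}}";
// StringTokenizer treats its delimiter argument as a set of characters, not as a regex
String delims = ":{}[], ";
StringBuilder json = new StringBuilder();
StringTokenizer st = new StringTokenizer(metadata, delims, true);     // tokens and delimiters
StringTokenizer stkey = new StringTokenizer(metadata, delims, false); // tokens only
while (stkey.hasMoreTokens()) {
String s1 = stkey.nextToken();
while (st.hasMoreTokens()) {
String s2 = st.nextToken();
if (s1.equals(s2)) {
// a key or value: wrap it in quotes
json.append('"').append(s2.trim()).append('"');
break;
} else {
// a delimiter: copy it unchanged
json.append(s2);
}
}
}
// copy the trailing delimiters, e.g. the closing braces
while (st.hasMoreTokens()) {
json.append(st.nextToken());
}
System.out.println(json);
JSONObject jo = JSONObject.fromObject(json.toString());
}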
This can only treat every value as a string. You can refine the delimiters (or use a regular expression) to do better.
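Along the lines of the regex suggestion, here is a minimal sketch that wraps every unquoted token in double quotes with a single replaceAll and no external library. The class name LooseJsonQuoter is hypothetical, and the sketch assumes that keys and values never contain the structural characters { } [ ] : , or a double quote or backslash; a value such as "http://..." would break it. Like the tokenizer answer, it treats every value as a string.

```java
class LooseJsonQuoter {
    // One unquoted token: starts and ends with a non-whitespace character and contains
    // none of the structural characters { } [ ] : , (inner spaces are allowed).
    private static final String TOKEN =
        "[^\\s{}\\[\\]:,](?:[^{}\\[\\]:,]*[^\\s{}\\[\\]:,])?";

    static String quoteTokens(String loose) {
        // $0 refers to the whole match, so each token is wrapped in double quotes.
        return loose.replaceAll(TOKEN, "\"$0\"");
    }

    public static void main(String[] args) {
        String metadata = "{ Item: {Owner:John, About:{website:www.john.com/about, firstName:William, "
            + "lastName:Shakespeare, date:1/2/2000, books:[Othello, R&J], shelf:[12/3/4, 14/5/6, 17/8/6]}}}";
        System.out.println(quoteTokens(metadata));
    }
}
```

For the sample metadata this yields text such as { "Item": {"Owner":"John", ... "books":["Othello", "R&J"], ...}}}, which any JSON parser can then read.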

parsing a string using string tokenizer twice

I am getting an input string like the one below from some procedure:
service:jmx:t3://10.20.30.40:9031/jndi/weblogic.management.mbeanservers.runtime
I want to parse it in Java and extract
t3
10.20.30.40
9031
as separate strings.
I think I can use StringTokenizer, but I would have to tokenize twice. Is there a better way to handle this?
Use the JMXServiceURL class. It will parse the URL for you, so there is no need to battle with regex or String splits.
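// requires: import javax.management.remote.JMXServiceURL;
// the constructor throws java.net.MalformedURLException if the string is not a valid service URL
String url = "service:jmx:t3://10.20.30.40:9031/jndi/weblogic.management.mbeanservers.runtime";
JMXServiceURL jmxServiceURL = new JMXServiceURL(url);
System.out.println(jmxServiceURL.getHost());
System.out.println(jmxServiceURL.getPort());
System.out.println(jmxServiceURL.getProtocol());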
Prints
10.20.30.40
9031
t3
If it's just an ad hoc String and you can ignore performance, I would prefer a readable solution (more readable than regex ;-)) like this:
int pos_1 = input.indexOf("//");
String s1 = input.substring(0, pos_1);
String input_2 = input.substring(pos_1 + 2);
int pos_2 = input_2.indexOf(":");
String s2 = input_2.substring(0, pos_2);
...
Regex is a good approach. Find the pattern for your string and put the parts you want into parenthesized groups. Something like this may be enough for you:
service\\:jmx\\:(?<groupName01>[a-z0-9]+)\\://(?<groupName02>[0-9\\.]+)\\:(?<groupName03>[0-9]+)
See Java Regex
If you use a Java version earlier than 7, named groups (?<groupName>) are not supported; leave the names out and refer to the groups by number.
Do a simple string split:
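A runnable version of the named-group approach might look like the sketch below. The class name JmxUrlRegex and the group names protocol, host and port are hypothetical (renamed from groupName01..03 for readability), and the host group assumes a numeric IPv4 address as in the question; hostnames would need a broader character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JmxUrlRegex {
    // Named groups need Java 7 or later.
    private static final Pattern JMX_URL =
        Pattern.compile("service:jmx:(?<protocol>[a-z0-9]+)://(?<host>[0-9.]+):(?<port>[0-9]+)");

    // Returns {protocol, host, port}.
    static String[] parse(String url) {
        Matcher m = JMX_URL.matcher(url);
        // lookingAt() anchors the match at the start but ignores the remaining "/jndi/..." part.
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("Not a JMX service URL: " + url);
        }
        return new String[] {m.group("protocol"), m.group("host"), m.group("port")};
    }

    public static void main(String[] args) {
        String[] parts = parse("service:jmx:t3://10.20.30.40:9031/jndi/weblogic.management.mbeanservers.runtime");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```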
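String s = "service:jmx:t3://10.20.30.40:9031/jndi/weblogic.management.mbeanservers.runtime";
// ":" followed by "//" yields two empty tokens, so the array is
// [service, jmx, t3, "", "", 10.20.30.40, 9031, jndi, weblogic.management.mbeanservers.runtime]
String[] tokens = s.split("[:/]");
System.out.println(tokens[2]); // t3
System.out.println(tokens[5]); // 10.20.30.40
System.out.println(tokens[6]); // 9031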

Extract text from string Java

Given the string "ADACADABRA", how do I extract "CADA" from it in Java?
Also, how do I extract the id between "/" and "?" from the link below?
http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0
output should be: zaaU9lJ34c5
The solution should use "/" and "?" in the process.
It should be:
String url = "http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0";
String str = url.substring(url.lastIndexOf("/") + 1, url.indexOf("?"));
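The substring call above throws StringIndexOutOfBoundsException when the link has no "?" (indexOf returns -1). One sketch that avoids this lets java.net.URI strip the query string first and then takes everything after the last "/". The class name EmbedId is hypothetical, and the sketch assumes the id is always the last path segment.

```java
import java.net.URI;

class EmbedId {
    static String videoId(String embedUrl) {
        // getPath() excludes the query, so "?rel=0" is already gone: "/embed/zaaU9lJ34c5"
        String path = URI.create(embedUrl).getPath();
        // the id is whatever follows the last '/'
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(videoId("http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0"));
    }
}
```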
String s = "ADACADABRA";
String s2 = s.substring(3,7);
Here 3 specifies the beginning index, and 7 specifies the stopping point.
The string returned contains all the characters from the beginning index, up to, but not including, the ending index.
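If "CADA" is identified by its content rather than by fixed positions, indexOf can supply the begin index that is hardcoded as 3 above, and the end index follows from the length. A small sketch; the class and method names are hypothetical.

```java
class SubstringByIndex {
    // Returns the first occurrence of target in s, or null when s does not contain it.
    static String extractFirst(String s, String target) {
        int begin = s.indexOf(target);     // 3 for "CADA" in "ADACADABRA"
        if (begin < 0) {
            return null;
        }
        int end = begin + target.length(); // 3 + 4 = 7, the exclusive end index
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(extractFirst("ADACADABRA", "CADA"));
    }
}
```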
I'm not entirely sure what you mean by "extract", so I've provided code that removes it from the String. I'm not certain this is what you want.
public static void main (String args[]){
String string = "ADACADABRA";
string = string.replace("CADA", "");
System.out.println(string);
}
This is untested but something like this may help for the youtube part:
String youtubeUrl = "http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0";
String[] urlParts = youtubeUrl.split("/");
String videoId = urlParts[urlParts.length - 1];
videoId = videoId.substring(0, videoId.indexOf("?"));
Extracting CADA from the string makes no sense. You will need to specify how you have determined that CADA is the string to extract.
E.g. is it because it is the middle 4 characters? Is it because you are stripping off 3 characters on each side? Are you just looking for the String "CADA"? Is it characters 3 to 7 of the String? Is it the first 4 of the last 7 characters of the String? Is it because it contains 2 vowels and 2 consonants? I could go on...
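// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String originalText = "ADACADABRA";
String regex = "CADA";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(originalText);
while (m.find()) {
// group() is the whole match; this pattern has no capturing group 1
String outputThis = m.group();
}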
Use this tool to test Java regexes: http://www.regexplanet.com/advanced/java/index.html
You are probably not taking the immutability of java.lang.String into account. That's why you need to assign the result of substring to a variable.

Need to Trim Java String

I need help trimming a URL string.
Let's say the String is http://myurl.com/users/232222232/pageid
What I would like returned is /232222232/pageid
Now the 'myurl.com' can change but the /users/ will always be the same.
I suggest you use substring and indexOf("/users/").
String url = "http://myurl.com/users/232222232/pageid";
String lastPart = url.substring(url.indexOf("/users/") + 6);
System.out.println(lastPart); // prints "/232222232/pageid"
A slightly more sophisticated variant would be to let the URL class parse the url for you:
URL url = new URL("http://myurl.com/users/232222232/pageid");
String lastPart = url.getPath().substring(6);
System.out.println(lastPart); // prints "/232222232/pageid"
And, a third approach, using regular expressions:
String url = "http://myurl.com/users/232222232/pageid";
String lastPart = url.replaceAll(".*/users", "");
System.out.println(lastPart); // prints "/232222232/pageid"
String lastPart = url.replaceAll(".*/users(/.*/.*)", "$1");
String rest = url.substring(url.indexOf("/users/") + 6);
You can use split(String regex, int limit), which applies the pattern at most limit - 1 times, so the array has at most limit elements:
String url="http://myurl.com/users/232222232/pageid";
String[] parts=url.split("/users",2);
//parts={"http://myurl.com","/232222232/pageid"}
String rest=parts[1];
//rest="/232222232/pageid"
The limit is there to prevent strings like "http://myurl.com/users/232222232/users/pageid" from giving answers like "/232222232".
You can use String.indexOf() and String.substring() in order to achieve this:
String pattern = "/users/";
String url = "http://myurl.com/users/232222232/pageid";
System.out.println(url.substring(url.indexOf(pattern) + pattern.length() - 1));
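All of the indexOf-based answers silently produce a wrong result when "/users/" is missing, because indexOf returns -1. A minimal sketch that parses the URL first, so the changing host and any query string are ignored, and reports a missing marker explicitly; the class name UserPath and the use of Optional are illustrative choices, not part of the question.

```java
import java.net.URI;
import java.util.Optional;

class UserPath {
    private static final String MARKER = "/users/";

    // Returns the part after "/users", keeping its leading slash, e.g. "/232222232/pageid".
    static Optional<String> afterUsers(String url) {
        String path = URI.create(url).getPath(); // no scheme, host, query or fragment
        if (path == null) {
            return Optional.empty();             // e.g. opaque URIs such as "mailto:..."
        }
        int idx = path.indexOf(MARKER);
        if (idx < 0) {
            return Optional.empty();
        }
        // -1 keeps the slash that ends the marker, matching the expected "/232222232/pageid"
        return Optional.of(path.substring(idx + MARKER.length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(afterUsers("http://myurl.com/users/232222232/pageid").orElse("no /users/ segment"));
    }
}
```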
