How to replace tokens in a string without StringTokenizer


Given a string like so:
Hello {FIRST_NAME}, this is a personalized message for you.
Where FIRST_NAME is an arbitrary token (a key in a map passed to the method), I need to write a routine that turns that string into:
Hello Jim, this is a personalized message for you.
given a map with an entry FIRST_NAME -> Jim.
It would seem that StringTokenizer is the most straightforward approach, but the Javadocs say you should prefer the regex approach. How would you do that in a regex-based solution?

Thanks everyone for the answers!
Gizmo's answer was definitely out of the box, and a great solution, but unfortunately not appropriate as the format can't be limited to what the Formatter class does in this case.
Adam Paynter really got to the heart of the matter, with the right pattern.
Peter Nix and Sean Bright had a great workaround to avoid all of the complexities of the regex, but I needed to raise some errors if there were bad tokens, which that didn't do.
But in terms of both doing a regex and a reasonable replace loop, this is the answer I came up with (with a little help from Google and the existing answer, including Sean Bright's comment about how to use group(1) vs group()):
private static Pattern tokenPattern = Pattern.compile("\\{([^}]*)\\}");
public static String process(String template, Map<String, Object> params) {
StringBuffer sb = new StringBuffer();
Matcher myMatcher = tokenPattern.matcher(template);
while (myMatcher.find()) {
String field = myMatcher.group(1);
myMatcher.appendReplacement(sb, "");
sb.append(doParameter(field, params));
}
myMatcher.appendTail(sb);
return sb.toString();
}
Where doParameter gets the value out of the map and converts it to a string and throws an exception if it isn't there.
Note also I changed the pattern to find empty braces (i.e. {}), as that is an error condition explicitly checked for.
EDIT: Note that appendReplacement is not agnostic about the content of the replacement string. Per the javadocs, it treats $ and backslash as special characters, so I added some escaping to the sample above to handle that. It is not done in the most performance-conscious way, but in my case it isn't a big enough deal to be worth micro-optimizing the string creations.
Thanks to the comment from Alan M, this can be made even simpler to avoid the special character issues of appendReplacement.
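The snippet above does not show doParameter, so here is a minimal, self-contained sketch of the whole routine. The body of doParameter, the class name TemplateProcessor, and the choice of IllegalArgumentException are assumptions made for illustration; only the pattern and the replace loop come from the answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateProcessor {
    // Matches {KEY}; the group may be empty so that "{}" can be reported as an error.
    private static final Pattern TOKEN = Pattern.compile("\\{([^}]*)\\}");

    static String process(String template, Map<String, Object> params) {
        StringBuffer sb = new StringBuffer();
        Matcher m = TOKEN.matcher(template);
        while (m.find()) {
            String field = m.group(1);
            // Copy the text before the token using an empty replacement, then append
            // the value directly so that '$' and '\' in it are never interpreted.
            m.appendReplacement(sb, "");
            sb.append(doParameter(field, params));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Hypothetical helper: looks the token up and fails loudly on bad tokens.
    static String doParameter(String field, Map<String, Object> params) {
        if (field.isEmpty()) {
            throw new IllegalArgumentException("Empty token: {}");
        }
        Object value = params.get(field);
        if (value == null) {
            throw new IllegalArgumentException("Unknown token: " + field);
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("FIRST_NAME", "Jim");
        params.put("PRICE", "$5");
        // prints "Hello Jim, you owe $5."
        System.out.println(process("Hello {FIRST_NAME}, you owe {PRICE}.", params));
    }
}
```

Because the value is appended to the buffer rather than passed to appendReplacement, no escaping of the value is needed.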

Well, I would rather use String.format(), or better MessageFormat.
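For comparison, a short sketch of what the two suggestions look like. The class and method names are made up for the example. Note that both use positional placeholders (%s, {0}) rather than named tokens such as {FIRST_NAME}, which is the limitation the asker mentions.

```java
import java.text.MessageFormat;

class FormatDemo {
    static String withStringFormat(String name) {
        // %s is replaced by the first argument.
        return String.format("Hello %s, this is a personalized message for you.", name);
    }

    static String withMessageFormat(String name) {
        // {0} is replaced by the first argument.
        return MessageFormat.format("Hello {0}, this is a personalized message for you.", name);
    }

    public static void main(String[] args) {
        System.out.println(withStringFormat("Jim"));
        System.out.println(withMessageFormat("Jim"));
    }
}
```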

String result = template.replaceAll("\\{FIRST_NAME\\}", actualName); // replaceAll takes a regex, so the braces must be escaped
Check out the javadocs for it here.

Try this:
Note: The author's final solution builds upon this sample and is much more concise.
public class TokenReplacer {
private Pattern tokenPattern;
public TokenReplacer() {
tokenPattern = Pattern.compile("\\{([^}]+)\\}");
}
public String replaceTokens(String text, Map<String, String> valuesByKey) {
StringBuilder output = new StringBuilder();
Matcher tokenMatcher = tokenPattern.matcher(text);
int cursor = 0;
while (tokenMatcher.find()) {
// A token is defined as a sequence of the format "{...}".
// A key is defined as the content between the brackets.
int tokenStart = tokenMatcher.start();
int tokenEnd = tokenMatcher.end();
int keyStart = tokenMatcher.start(1);
int keyEnd = tokenMatcher.end(1);
output.append(text.substring(cursor, tokenStart));
String token = text.substring(tokenStart, tokenEnd);
String key = text.substring(keyStart, keyEnd);
if (valuesByKey.containsKey(key)) {
String value = valuesByKey.get(key);
output.append(value);
} else {
output.append(token);
}
cursor = tokenEnd;
}
output.append(text.substring(cursor));
return output.toString();
}
}

With import java.util.regex.*:
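Pattern p = Pattern.compile("\\{([^{}]*)\\}"); // braces are regex metacharacters and must be escaped
Matcher m = p.matcher(line); // line being "Hello, {FIRST_NAME}..."
StringBuffer sb = new StringBuffer();
while (m.find()) {
String key = m.group(1);
// Unknown tokens are left as they are; quoteReplacement escapes $ and \ in the value.
String value = map.containsKey(key) ? map.get(key) : m.group();
m.appendReplacement(sb, Matcher.quoteReplacement(value));
}
m.appendTail(sb);
String result = sb.toString();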
So, the regex is recommended because it can easily identify the places that require substitution in the string, as well as extracting the name of the key for substitution. It's much more efficient than breaking the whole string.
You'll probably want to loop with the Matcher line inside and the Pattern line outside, so you can replace all lines. The pattern never needs to be recompiled, and it's more efficient to avoid doing so unnecessarily.
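A sketch of that arrangement, with the Pattern compiled once and a new Matcher created per line. The class name LineReplacer and the List-based signature are assumptions for the example; unknown tokens are left untouched here, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineReplacer {
    // Compiled once and shared by every call.
    private static final Pattern TOKEN = Pattern.compile("\\{([^{}]*)\\}");

    static List<String> replaceTokens(List<String> lines, Map<String, String> map) {
        List<String> out = new ArrayList<String>();
        for (String line : lines) {
            Matcher m = TOKEN.matcher(line); // one Matcher per line
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String value = map.get(m.group(1));
                if (value == null) {
                    value = m.group(); // keep the unknown token as written
                }
                m.appendReplacement(sb, Matcher.quoteReplacement(value));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("FIRST_NAME", "Jim");
        System.out.println(replaceTokens(Arrays.asList("Hello {FIRST_NAME},", "Bye {UNKNOWN}."), map));
    }
}
```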

The most straightforward approach would seem to be something along the lines of this:
public static void main(String[] args) {
String tokenString = "Hello {FIRST_NAME}, this is a personalized message for you.";
Map<String, String> tokenMap = new HashMap<String, String>();
tokenMap.put("{FIRST_NAME}", "Jim");
String transformedString = tokenString;
for (String token : tokenMap.keySet()) {
transformedString = transformedString.replace(token, tokenMap.get(token));
}
System.out.println("New String: " + transformedString);
}
It loops through all your tokens and replaces each one with its value using the standard String.replace method, which treats both arguments literally and so avoids the regex frustrations entirely.

Depending on how ridiculously complex your string is, you could try using a more serious string templating language, like Velocity. In Velocity's case, you'd do something like this:
Velocity.init();
VelocityContext context = new VelocityContext();
context.put( "name", "Bob" );
StringWriter output = new StringWriter();
Velocity.evaluate( context, output, "",
"Hello, $name, this is a personalized message for you."); // Velocity references start with $
System.out.println(output.toString());
But that is likely overkill if you only want to replace one or two values.

import java.util.HashMap;
public class ReplaceTest {
public static void main(String[] args) {
HashMap<String, String> map = new HashMap<String, String>();
map.put("FIRST_NAME", "Jim");
map.put("LAST_NAME", "Johnson");
map.put("PHONE", "410-555-1212");
String s = "Hello {FIRST_NAME} {LAST_NAME}, this is a personalized message for you.";
for (String key : map.keySet()) {
s = s.replaceAll("\\{" + key + "\\}", map.get(key));
}
System.out.println(s);
}
}
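One caveat with the loop above: replaceAll treats the key as a regex and the value as a replacement pattern, so a value containing $ or a backslash is not inserted literally. A hedged variant that quotes both sides is sketched below; the class and method names are illustrative. Plain String.replace would also work, since it treats both arguments literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeReplaceAll {
    static String fill(String template, Map<String, String> values) {
        String s = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Pattern.quote makes the token a literal; quoteReplacement does the same for the value.
            String token = Pattern.quote("{" + e.getKey() + "}");
            s = s.replaceAll(token, Matcher.quoteReplacement(e.getValue()));
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("AMOUNT", "$10");
        // prints "You owe $10."
        System.out.println(fill("You owe {AMOUNT}.", values));
    }
}
```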

The docs mean that you should prefer writing a regex-based tokenizer, IIRC. What might work better for you is a standard regex search-replace.

Generally we'd use MessageFormat in a case like this, coupled with loading the actual message text from a ResourceBundle. This gives you the added benefit of being G11N (globalization) friendly.

Related

Replacing special characters from a string

Just would like to know if there is a more elegant and maintainable approach for this:
private String replaceSpecialChars(String fileName) {
if (fileName.length() < 1) return null;
if (fileName.contains("Ü")) {
fileName = fileName.replace("Ü", "Ue");
}
if (fileName.contains("Ä")) {
fileName = fileName.replace("Ä", "Ae");
}
if (fileName.contains("Ö")) {
fileName = fileName.replace("Ö", "Oe");
}
if (fileName.contains("ü")) {
fileName = fileName.replace("ü", "ue");
}
...
return fileName;
}
I'm restricted to Java 6.

Before you go any further on this, note that what you're doing is effectively impossible. For example, the 'ascii-fication' of 'Ö' in Swedish is 'O' and not 'Oe'. There is no way to know if a word is Swedish or German; after all, Swedes sometimes move to Germany, for example. If you open a German phonebook and you see a Mrs. Sjögren, and you asciify that to Sjoegren, you messed it up.
If you want to run 'case and asciification insensitive comparisons', well, first you have to answer a few questions. Is muller equal to mueller equal to müller? That rabbit hole goes quite deep.
The general solution is trigrams or other generalized text search tools, such as those provided by PostgreSQL. Alternatively, opt out of this mechanism and store this stuff in Unicode, and be clear that to find Ms. Sjögren, you're going to have to search for "Sjögren", for the same reason that you won't find Mr. Johnson by searching for "Jahnson".
Note that most filesystems allow Unicode filenames; there is no need to try to replace a Ü.
This also goes some way as to explain why there are no ready libraries available for this seemingly common job; the job is, in fact, impossible.
You can simplify this code by using a Map<String, String> with replacements if you must. I advise against it for the above reasons. Or, just.. keep it as is, but ditch the contains. This code is needlessly slow and lengthy.
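If you do go the Map route, it could look roughly like the sketch below. The class name and the static map are illustrative, the syntax is kept Java 6 compatible (no diamond operator, no lambdas), and only the four pairs visible in the question are included because the rest of the list is elided there. Unicode escapes are used so the source compiles regardless of file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SpecialChars {
    // LinkedHashMap keeps the replacements in insertion order.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<String, String>();
    static {
        REPLACEMENTS.put("\u00DC", "Ue"); // capital U with umlaut
        REPLACEMENTS.put("\u00C4", "Ae"); // capital A with umlaut
        REPLACEMENTS.put("\u00D6", "Oe"); // capital O with umlaut
        REPLACEMENTS.put("\u00FC", "ue"); // small u with umlaut
    }

    static String replaceSpecialChars(String fileName) {
        // Same contract as the question's method: an empty name yields null.
        if (fileName == null || fileName.isEmpty()) {
            return null;
        }
        String result = fileName;
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "UeAeOeue"
        System.out.println(replaceSpecialChars("\u00DC\u00C4\u00D6\u00FC"));
    }
}
```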
There is no difference between:
if (fileName.contains("x")) fileName = fileName.replace("x", "y");
and just fileName = fileName.replace("x", "y"); except that the former is strictly slower. When asked to replace a string that the input does not contain, replace does not make a new string; it returns the original. The former searches twice, the latter only once, and neither makes a new string unless an actual replacement needs to be done.
You can then chain it:
if (fileName.isEmpty()) return null;
return fileName
.replace("Ü", "Ue")
.replace("Ä", "Ae")
...
;
But, as I said, you probably don't want to do that, unless you want an aggravated person on the line at some point in the future complaining that you bungled up the asciification of their surname.

You can remove the unnecessary if statements and use a chain of String.replace calls. Your code might look something like this:
private static String replaceSpecialChars(String fileName) {
if (fileName == null)
return null;
else
return fileName
.replace("Ü", "Ue")
.replace("Ä", "Ae")
.replace("Ö", "Oe")
.replace("ü", "ue");
}
public static void main(String[] args) {
System.out.println(replaceSpecialChars("ABc")); // ABc
System.out.println(replaceSpecialChars("ÜÄÖü")); // UeAeOeue
System.out.println(replaceSpecialChars("").length()); // 0
System.out.println(replaceSpecialChars(null)); // null
}

How can I parse a specific cookie from this string in Java?

say I have the following string in a variable
cookie-one=someValue;HttpOnly;Secure;Path=/;SameSite=none, cookie-two=someOtherValue;Path=/;Secure;HttpOnly, cookie-three=oneMoreValue;Path=/;Secure
and I want to pick a cookie by name, say cookie-two, and extract the substring that runs from that name to the end of that cookie's attributes.
So basically I need
cookie-two=someOtherValue;Path=/;Secure;HttpOnly
How can I get this substring out?

You can just separate the String by commas first to separate the cookies. For example if you wanted just the cookie that has the name cookie-two:
String s = "cookie-one=someValue;HttpOnly;Secure;Path=/;SameSite=none, cookie-two=someOtherValue;Path=/;Secure;HttpOnly, cookie-three=oneMoreValue;Path=/;Secure";
String[] cookies = s.split(",");
for(String cookie : cookies){
if(cookie.trim().startsWith("cookie-two")){
System.out.println(cookie);
}
}

This can be achieved in several different ways, depending on how the data might vary in the string. For your specific example we could, for instance, do it like this:
String cookieString = "cookie-one=someValue;HttpOnly;Secure;Path=/;SameSite=none, cookie-two=someOtherValue;Path=/;Secure;HttpOnly, cookie-three=oneMoreValue;Path=/;Secure";
String result = "";
for(String s: cookieString.split(", ")) {
if(s.startsWith("cookie-two")) {
result = s;
break;
}
}
We could also use regex and/or streams to make the code look nicer, but this is probably one of the most straightforward ways of achieving what you want.
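A streams version of the same idea might look like the sketch below (Java 8 or later). The class and method names are made up, and it assumes, like the loop above, that commas only ever separate cookies; an attribute whose value contains a comma, such as an Expires date, would break the split.

```java
import java.util.Arrays;
import java.util.Optional;

class CookieFinder {
    static Optional<String> find(String cookieString, String name) {
        return Arrays.stream(cookieString.split(",\\s*"))
                // Match "name=" so that "cookie-two" does not also match "cookie-twofold".
                .filter(cookie -> cookie.startsWith(name + "="))
                .findFirst();
    }

    public static void main(String[] args) {
        String cookieString = "cookie-one=someValue;HttpOnly;Secure;Path=/;SameSite=none, "
                + "cookie-two=someOtherValue;Path=/;Secure;HttpOnly, "
                + "cookie-three=oneMoreValue;Path=/;Secure";
        // prints "cookie-two=someOtherValue;Path=/;Secure;HttpOnly"
        System.out.println(find(cookieString, "cookie-two").orElse("not found"));
    }
}
```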

Advice on creating a translator in Java

I want to make a translator, e.g. English to Spanish.
I want to translate a large text using a map for the translations.
HashMap <String, Object> hashmap = new HashMap <String, Object>();
hashmap.put("hello", "holla");
.
.
.
Which object should I use to hold my initial text of 1000 words? Is a String or StringBuilder fine?
How can I do a large replace without iterating over each word against each element of the map?
I don't want to take each word of the string and check whether there is a match in my map.
Maybe a multimap keyed by the first letter of the word?
If you have any answer or advice, thank you.

Here is an example implementation:
import java.io.*;
import java.util.*;
public class Translator {
public enum Language {
EN, ES
}
private static final String TRANSLATION_TEMPLATE = "translation_%s_%s.properties";
private final Properties translations = new Properties();
public Translator(Language from, Language to) {
String translationFile = String.format(TRANSLATION_TEMPLATE, from, to);
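try (InputStream is = getClass().getResourceAsStream(translationFile)) {
// getResourceAsStream returns null when the file is not on the classpath
if (is == null) {
throw new RuntimeException("Not found on classpath: " + translationFile);
}
translations.load(is);
} catch (final IOException e) {
throw new RuntimeException("Could not read: " + translationFile, e);
}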
}
private String[] translate(String text) {
String[] source = normalizeText(text);
List<String> translation = new ArrayList<>();
for (String sourceWord : source) {
translation.add(translateWord(sourceWord));
}
return translation.toArray(new String[source.length]);
}
private String translateWord(String sourceWord) {
Object value = translations.get(sourceWord);
String translatedWord;
if (value != null) {
translatedWord = String.valueOf(value);
}
else {
// if no translation is found, add the source word with a question mark
translatedWord = sourceWord + "?";
}
return translatedWord;
}
private String[] normalizeText(String text) {
String alphaText = text.replaceAll("[^A-Za-z]", " ");
return alphaText.split("\\s+");
}
public static void main(final String[] args) {
final Translator translator = new Translator(Language.EN, Language.ES);
System.out.println(Arrays.toString(translator.translate("hello world!")));
}
}
And put a file called 'translation_EN_ES.properties' on your classpath (e.g. src/main/resources) with:
hello=holla
world=mundo

If you know all the words beforehand, you could easily create a regex trie.
Then at runtime, compile the regex once. Then you are good to go.
To create the regex, download and install RegexFormat 5 here.
From the main menu, select Tools -> Strings to Regex - Ternary Tree
paste the list in the input box, then press the Generate button.
It spits out a full regex Trie that is as fast as any hash lookup there is.
Copy the compressed output from that dialog into Rxform tab (mdi) window.
Right click window to get the Context menu, select Misc Utilities -> Line Wrap
set it for about a 60 character width, press ok.
Next press the C++ button from the windows toolbar to bring up the MegaString
dialog. Click make C-style strings Lines Catenated-1 press OK.
Copy and paste the result into your Java source.
Use the regex in a Replace-All with callback.
In the callback use the match as a key into your hash table to return the
translation to replace.
It's simple, one pass, and oh so fast.
For a more extreme example of the tool see this regex of a 130,000 word dictionary.
Sample of the letter X
"(?:x(?:anth(?:a(?:m|n|te(?:s)?)|e(?:in|ne)|i(?:an|"
"c|n(?:e)?|um)|o(?:ma(?:s|ta)?|psia|us|xyl))|e(?:be"
"c(?:s)?|n(?:arthral|i(?:a(?:l)?|um)|o(?:biotic|cry"
"st(?:s)?|g(?:amy|enous|raft(?:s)?)|lith(?:s)?|mani"
"a|n|ph(?:ile(?:s)?|ob(?:e(?:s)?|ia|y)|ya)|time))|r"
"(?:a(?:fin(?:s)?|n(?:sis|tic)|rch|sia)|ic|o(?:derm"
"(?:a|i(?:a|c))|graphy|m(?:a(?:s|ta)?|orph(?:s)?)|p"
"h(?:agy|ily|yt(?:e(?:s)?|ic))|s(?:is|tom(?:a|ia))|"
"t(?:es|ic))))|i(?:pho(?:id(?:al)?|pag(?:ic|us)|sur"
"an))?|oan(?:a|on)|u|y(?:l(?:e(?:m|n(?:e(?:s)?|ol(?"
":s)?))|i(?:c|tol)|o(?:carp(?:s)?|g(?:en(?:ous)?|ra"
"ph(?:s|y)?)|id(?:in)?|l(?:ogy|s)?|m(?:a(?:s)?|eter"
"(?:s)?)|nic|ph(?:ag(?:an|e(?:s)?)|on(?:e(?:s)?|ic)"
")|rimba(?:s)?|se|tomous)|yl(?:s)?)|st(?:er(?:s)?|i"
"|o(?:i|s)|s|us)?)))"
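The "Replace-All with callback" step described above can be sketched in plain Java with a Matcher loop. In this sketch a simple alternation built from the map keys stands in for the generated trie regex, and the class name and sample words are assumptions for illustration; the lookup-per-match logic is the same either way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DictionaryReplacer {
    private final Map<String, String> dictionary;
    private final Pattern words; // compiled once, reused for every text

    DictionaryReplacer(Map<String, String> dictionary) {
        this.dictionary = dictionary;
        // Stand-in for the generated trie: a plain alternation of the known words.
        StringBuilder alternation = new StringBuilder();
        for (String word : dictionary.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(word));
        }
        this.words = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    String translate(String text) {
        Matcher m = words.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The "callback": use the matched word as the key into the hash table.
            String translated = dictionary.get(m.group());
            if (translated == null) {
                translated = m.group();
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(translated));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<String, String>();
        dict.put("cat", "gato");
        dict.put("dog", "perro");
        // prints "the gato and the perro"
        System.out.println(new DictionaryReplacer(dict).translate("the cat and the dog"));
    }
}
```

The text is scanned once, and words that are not in the map are copied through unchanged.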

How to use regular expression for fetching specific data?

I have input stream with the following data:
---------------------------------------------
manil#manil-ubvm:~$ db2level
DB21085I Instance "manil" uses "64" bits and DB2 code release "SQL10010" with
level identifier "0201010E".
Informational tokens are "DB2 v10.1.0.0", "s120403", "LINUXAMD64101", and Fix
Pack "0".
Product is installed at "/home/manil/sqllib".
---------------------------------------------
From the above, I need v10.1.0.0 to be stored in a String variable.
How can I do that with a Java regular expression?

Use something like this to capture the version pattern :
import java.util.regex.*;
public class RTest {
public static void main(String [] args) {
String raw_data = "asdkgjasdbf984 sdkjfashfiu 4qwsadkfjnv w98sa-asdf08gywbfsd v1231.123.12.11.1 fkjsdfn9823isd";
Pattern version_find = Pattern.compile("v\\d+(?:\\.\\d+)*"); // "v", digits, then any number of ".digits" groups
Matcher version_finder = version_find.matcher(raw_data);
while(version_finder.find()) {
System.out.println(version_finder.group());
}
}
}
Output:
v1231.123.12.11.1
You really need to understand regexes deeply if you are a programmer. They are one of the essentials. They are hard at first, but once you 'crack them' you don't forget it. Like riding a bike.

This will suit your needs:
String version = yourLine.replaceAll(".*(v\\d+([.]\\d+){3}).*", "$1");
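The one-liner above works on a single line; applied to the whole multi-line output it would leave the other lines in place, and it returns the input unchanged when nothing matches. A sketch that searches the full text with Matcher.find and reports a miss explicitly is shown below. The class name and the decision to anchor on the quoted "DB2 " prefix are assumptions based on the sample output in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Db2Version {
    // Looks for the quoted token "DB2 v<digits>.<digits>..." and captures the version part.
    private static final Pattern VERSION = Pattern.compile("\"DB2 (v\\d+(?:\\.\\d+)+)\"");

    static String extract(String db2levelOutput) {
        Matcher m = VERSION.matcher(db2levelOutput);
        return m.find() ? m.group(1) : null; // null when no version is present
    }

    public static void main(String[] args) {
        String output = "DB21085I Instance \"manil\" uses \"64\" bits and DB2 code release \"SQL10010\" with\n"
                + "level identifier \"0201010E\".\n"
                + "Informational tokens are \"DB2 v10.1.0.0\", \"s120403\", \"LINUXAMD64101\", and Fix\n"
                + "Pack \"0\".";
        // prints "v10.1.0.0"
        System.out.println(extract(output));
    }
}
```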

You don't need a regular expression here.
Just use
the String contains() (or indexOf()) method and String substring().
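A sketch of that regex-free approach, using indexOf to locate the token and substring to cut it out. The class name and the assumption that the version always follows the literal text "DB2 inside double quotes are taken from the sample output in the question, not from any DB2 documentation.

```java
class Db2VersionPlain {
    static String extract(String db2levelOutput) {
        String marker = "\"DB2 "; // opening quote plus the product name
        int start = db2levelOutput.indexOf(marker);
        if (start < 0) {
            return null; // marker not found
        }
        start += marker.length();
        int end = db2levelOutput.indexOf('"', start); // closing quote of the token
        return end < 0 ? null : db2levelOutput.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "Informational tokens are \"DB2 v10.1.0.0\", \"s120403\", \"LINUXAMD64101\", and Fix";
        // prints "v10.1.0.0"
        System.out.println(extract(line));
    }
}
```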

Can jQuery serialize() use a separator other than ampersand?

I am looking for a nice solution. I've got a couple of text fields on my page and I am sending them via Ajax using the jQuery serialize method. This serialized string is parsed in my Java method into a HashMap with key = 'nameOfTextfield' and value = 'valueInTextfield'.
For example, I've got this String stdSel=value1&stdNamText=value2&stdRevText=value3 and everything works fine.
String[] sForm = serializedForm.split("&");
Map<String, String> fForm = new HashMap<String, String>();
for (String part : sForm) {
String key = null;
String value = null;
try {
key = part.split("=")[0];
value = part.split("=",2)[1];
fForm.put(key, value);
//if textfield is empty
} catch(IndexOutOfBoundsException e) {
fForm.put(key, "");
}
}
But this method breaks when an ampersand appears in a text field, for example stdSel=value1&stdNamText=value2&stdRevText=val&&ue3. My thought was to replace the ampersand separator in the serialized string with some other character, or perhaps several characters. Is that possible and a good idea, or is there a better way?
Regards
Ondrej

Ampersands are escaped by the serialize function, so they don't break the URL.
To unescape a field you got from a URL, use:
value = URLDecoder.decode(value,"UTF-8");
But, as was pointed out by... Pointy, if you're using a web framework rather than plain java.net, you probably don't have to do this yourself.
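Putting this together with the parsing loop from the question, a minimal sketch could look like the following. The class and method names are illustrative. It splits on the raw & and = first and decodes afterwards, so an escaped ampersand (%26) inside a value survives the split.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormParser {
    static Map<String, String> parse(String serializedForm) {
        Map<String, String> form = new LinkedHashMap<String, String>();
        if (serializedForm.isEmpty()) {
            return form;
        }
        for (String part : serializedForm.split("&")) {
            String[] keyValue = part.split("=", 2); // at most one split, values may contain '='
            String key = decode(keyValue[0]);
            String value = keyValue.length > 1 ? decode(keyValue[1]) : "";
            form.put(key, value);
        }
        return form;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "{stdSel=value1, stdNamText=value2, stdRevText=val&&ue3}"
        System.out.println(parse("stdSel=value1&stdNamText=value2&stdRevText=val%26%26ue3"));
    }
}
```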
