I have searched SO (and Google) but not found any fully matching answer to my question:
I want to replace all Swedish characters and whitespace in a String with other characters. It should work as follows:
"å" and "ä" should be replaced with "a"
"ö" should be replaced with "o"
"Å" and "Ä" should be replaced with "A"
"Ö" should be replaced with "O"
" " should be replaced with "-"
Can this be achieved with regex (or any other way), and if so, how?
Of course, the below method does the job (and can be improved, I know, by replacing for example "å" and "ä" on the same line):
private String changeSwedishCharactersAndWhitespace(String string) {
String newString = string.replaceAll("å", "a");
// keep working on newString; calling string.replaceAll each time would discard the earlier replacements
newString = newString.replaceAll("ä", "a");
newString = newString.replaceAll("ö", "o");
newString = newString.replaceAll("Å", "A");
newString = newString.replaceAll("Ä", "A");
newString = newString.replaceAll("Ö", "O");
newString = newString.replaceAll(" ", "-");
return newString;
}
I know how to use regex to replace, for example, all "å", "ä", or "ö" with "". The question is: how do I use regex to replace a character with another one depending on which character it is? Surely there must be a better way using regex than the above approach?
For Latin characters with diacritics, Unicode normalization (java.text.Normalizer), which decomposes each character into its base letter plus combining diacritic marks, might help. Something like:
import java.text.Normalizer;
newString = Normalizer.normalize(string,
Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
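To make the idea above concrete, here is a minimal sketch that also handles the whitespace part of the question. The class and method names are made up for illustration, and I use NFD instead of NFKD on the assumption that only canonical decomposition is wanted. Note that letters without a decomposition (for example "ø" or "ß") would pass through unchanged.

```java
import java.text.Normalizer;

final class DiacriticStripper {
    // Decompose accented letters (NFD), drop the combining marks,
    // then turn every space into a hyphen.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").replace(' ', '-');
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source file stays pure ASCII.
        System.out.println(stripDiacritics("Hell\u00f6 W\u00e4rld")); // prints Hello-Warld
    }
}
```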
You can use StringUtils.replaceEach, like this:
private String changeSwedishCharactersAndWhitespace(String string) {
String newString = StringUtils.replaceEach (string,
new String[] {"å", "ä", "ö", "Å", "Ä", "Ö", " "},
new String[] {"a", "a", "o", "A", "A", "O", "-"});
return newString;
}
I don't think there is a single regex that replaces all these characters at once. Apart from that, you can simplify the replacement work by using a HashMap.
HashMap<String, String> map = new HashMap<String, String>()
{{put("ä", "a"); /*put others*/}};
String newString = string;
// apply each replacement to the running result, not to the original string
for (Map.Entry<String, String> entry : map.entrySet())
newString = newString.replaceAll(entry.getKey(), entry.getValue());
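For what it's worth, a map can also be combined with a single regex pass, so the string is scanned only once and each match is looked up in the map. This is only a sketch: the class name, the table contents and the character class are my own assumptions based on the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MappedReplacer {
    // Hypothetical lookup table: matched character -> replacement.
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("\u00e5", "a"); // a-ring
        REPLACEMENTS.put("\u00e4", "a"); // a-umlaut
        REPLACEMENTS.put("\u00f6", "o"); // o-umlaut
        REPLACEMENTS.put("\u00c5", "A");
        REPLACEMENTS.put("\u00c4", "A");
        REPLACEMENTS.put("\u00d6", "O");
        REPLACEMENTS.put(" ", "-");
    }

    // One character class that covers every key of the table.
    private static final Pattern TARGETS =
            Pattern.compile("[\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6 ]");

    static String replaceMapped(String input) {
        Matcher m = TARGETS.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in a replacement literal
            m.appendReplacement(sb, Matcher.quoteReplacement(REPLACEMENTS.get(m.group())));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }
}
```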
You can write your own mapper using the Matcher.find method:
public static void main(String[] args) {
String from = "äöÂ";
String to = "aoA";
String testString = "Hellö Wärld";
Pattern p = Pattern.compile(String.format("[%s]", from));
Matcher m = p.matcher(testString);
String result = testString;
while (m.find()){
char charFound = m.group(0).charAt(0);
result = result.replace(charFound, to.charAt(from.indexOf(charFound)));
}
System.out.println(result);
}
This will replace
Hellö Wärld
with
Hello Warld
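The same from/to idea also works without regex, in a single pass over the characters. A rough sketch follows; the helper name translate and its signature are hypothetical, and it assumes the two strings line up position by position.

```java
final class CharTranslator {
    // Replace each character found in 'from' with the character
    // at the same index in 'to'; all other characters are kept.
    static String translate(String input, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int idx = from.indexOf(c);
            sb.append(idx >= 0 ? to.charAt(idx) : c);
        }
        return sb.toString();
    }
}
```

Because every character is visited exactly once, the result does not depend on the order of the entries in from.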
String[] operatorList = { "name", "first_name", "last_name", "city" };
String originalString = "city=Houston^ORlast_name=Cervantsz^ORfirst_name=John^name=don";
for (String opElement : operatorList) {
if (originalString.contains(opElement)) {
String tempStr = originalString.replace(opElement, "user." + opElement);
originalString = tempStr;
}
}
System.out.println("originalString " + originalString);
Output:
user.city=Houston^ORlast_user.name=Cervantsz^ORfirst_user.name=John^user.name=don
When I try to replace "name" with "user.name", the "name" inside "last_name" and "first_name" is replaced as well, giving "last_user.name" and "first_user.name".
But I want to replace "name" with "user.name", "last_name" with "user.last_name",
and "first_name" with "user.first_name".
Any help is appreciated.
You can match each key together with the delimiter that precedes it. Example:
String[] operatorList = {"name", "first_name", "last_name", "city"};
String originalString = "city=Houston^ORlast_name=Cervantsz^ORfirst_name=John^ORname=don";
for (String opElement : operatorList) {
// match the key together with its "^OR" delimiter so that "name" inside "last_name" is left alone
// (the very first key has no "^OR" in front of it and is not changed here)
if (originalString.contains("^OR" + opElement)) {
originalString = originalString.replace("^OR" + opElement, "^ORuser." + opElement);
}
}
System.out.println("originalString " + originalString);
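If the values you are trying to change are always unique and generated (meaning they are always in the same order), you can simply put your operators in the same order and replace only the last occurrence of each one. Note that Java's String has no built-in replaceLast(), so such a helper has to be written by hand.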
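A minimal sketch of such a helper, assuming literal (non-regex) matching. The name replaceLast and its signature are hypothetical; nothing like it exists on java.lang.String.

```java
final class ReplaceLastDemo {
    // Replace only the last literal occurrence of 'target' in 'text'.
    static String replaceLast(String text, String target, String replacement) {
        int idx = text.lastIndexOf(target);
        if (idx < 0) {
            return text; // nothing to replace
        }
        return text.substring(0, idx) + replacement + text.substring(idx + target.length());
    }

    public static void main(String[] args) {
        // prints last_name=x^user.name=y
        System.out.println(replaceLast("last_name=x^name=y", "name", "user.name"));
    }
}
```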
A more complete solution would be to determine how the string is constructed. Do all the values have a ^ in front of them? Is OR generated for particular values, or does it indicate optional values? In other words, work out what allows you to split the string properly. Then you can use a regex that relies on the surrounding characters.
I would format the string to make sure the splitters are constant (all "^OR" become "##%!!" and all remaining "^" become "%!!") so all replaced strings start with !!. Then I would reformat the string to the original format using the remaining "##%" or "%" :
String[] operatorList = { "name", "first_name", "last_name", "city" };
String originalString = "city=Houston^ORlast_name=Cervantsz^ORfirst_name=John^name=don";
originalString = originalString.replaceAll("\\^OR", "##%!!");
originalString = originalString.replaceAll("\\^", "%!!");
//the order is important here
for (String opElement : operatorList) {
if (originalString.startsWith(opElement)) {
originalString = originalString.replaceFirst(opElement, "user." + opElement);
}
originalString = originalString.replaceAll("!!" + opElement, "user." + opElement);
// no need for an other alternative here because replaceAll returns the
// String as is if it does not find the first parameter in the String.
}
originalString = originalString.replaceAll("##%", "^OR");
originalString = originalString.replaceAll("%", "^");
// the order here is also important
outputs : "user.city=Houston^ORuser.last_name=Cervantsz^ORuser.first_name=John^user.name=don"
If every key-value pair needs the "user." prefix, I would split originalString first.
In Java 8:
String originalString = "city=Houston^ORlast_name=Cervantsz^ORfirst_name=John^name=don";
String[] params = originalString.split("\\^");
String result = String.join("^", Arrays.stream(params)
.map(param -> param.startsWith("OR") ? "ORuser." + param.substring(2) : "user." + param)
.collect(Collectors.toList()));
System.out.println(result);
It can also be rewritten as a plain for loop.
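A sketch of that loop variant, under the same assumption as the stream version: every segment between "^" characters is a key=value pair, optionally starting with "OR". The class and method names are made up. A key that itself begins with "OR" would be misread by this approach.

```java
final class LoopPrefixer {
    static String prefixKeys(String query) {
        String[] params = query.split("\\^");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                out.append('^'); // restore the separator removed by split
            }
            String param = params[i];
            if (param.startsWith("OR")) {
                out.append("ORuser.").append(param.substring(2));
            } else {
                out.append("user.").append(param);
            }
        }
        return out.toString();
    }
}
```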
You may use a quick search and replace with an alternation based pattern created dynamically from the search words only when they are preceded with a word boundary or ^ + OR/AND/etc. operators and followed with a word boundary. Note that this solution assumes the search words only consist of word chars (letters, digits or _):
String[] operatorList = { "name", "first_name", "last_name", "city" };
// assuming operators only consist of word chars
String pat = "(\\b|\\^(?:OR|AND)?)(" + String.join("|", operatorList) + ")\\b";
String originalString = "city=Houston^ORlast_name=Cervantsz^ORfirst_name=John^name=don";
originalString = originalString.replaceAll(pat, "$1user.$2");
System.out.println(originalString);
// => user.city=Houston^ORuser.last_name=Cervantsz^ORuser.first_name=John^user.name=don
The regex will look like
(\b|\^(?:OR|AND)?)(name|first_name|last_name|city)\b
Details
(\b|\^(?:OR|AND)?) - Group 1: a word boundary \b or a ^ symbol and an optional substring, OR or AND (you may add more here after |)
(name|first_name|last_name|city) - Group 2: any of the search words
\b - a trailing word boundary.
The $1 in the replacement pattern inserts the contents of Group 1 and $2 does the same with Group 2 contents.
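If the search words might contain characters other than word chars, one possible variation (an assumption on my part, not part of the pattern above) is to quote each word with Pattern.quote and to require a following "=" instead of a word boundary. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class QuotedKeyPrefixer {
    static String prefixKeys(String query, String... keys) {
        // Quote every key so regex metacharacters in it are matched literally.
        String alternation = Arrays.stream(keys)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Group 1: start of input, or "^" plus an optional OR/AND operator.
        // Group 2: one of the keys, which must be followed by "=".
        String pattern = "(^|\\^(?:OR|AND)?)(" + alternation + ")(?==)";
        return query.replaceAll(pattern, "$1user.$2");
    }
}
```

Since a key only matches directly after the start of the string or after a "^" operator, "name" inside "last_name" is never touched.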
I have a string that may look like this: "aaaaffdddd", and I want to replace characters that occur 3 or more times in a row with [NUMBER_OF_CHARACTERS][ONE_TIME_THE_CHARACTER]. I am not very confident with regex, but I came up with "([A-z])(\1{2,})" to find exactly those. However, in Java's String.replaceAll() I cannot refer to the number of characters in a group (?), and if I use Matcher.appendReplacement() and a StringBuffer I lose the rest of my string, although the result should still include the characters which do not occur 3 or more times.
The example above should encode to "4aff4d"
This is not easy, as you cannot easily get the number of matches in the replacement part. Try this code:
Pattern pat = Pattern.compile("(?i)([A-Z])(?=\\1{2})");
String str = "aaaaffdddd";
Matcher mat = pat.matcher(str);
Map<String, Integer> charMap = new HashMap<>();
while(mat.find()) {
String key = mat.group();
if (!charMap.containsKey(key))
charMap.put(key, 3);
else
charMap.put(key, charMap.get(key)+1);
}
System.out.println("map " + charMap);
for (Entry<String, Integer> e: charMap.entrySet()) {
str = str.replaceAll(e.getKey() + "+", e.getValue() + e.getKey());
}
System.out.println(str);
OUTPUT:
map {d=4, a=4}
4aff4d
You can try this (not tested)
String str = "aaaaffdddd";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("([A-z])(\\1{2,})");
Matcher m = p.matcher(str);
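while (m.find()) {
m.appendReplacement(sb, "" + (m.group(2).length() + 1) + m.group(1));
}
// append the text after the last match; without this the rest of the string is lost
m.appendTail(sb);
System.out.println(sb);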
After using appendReplacement on a StringBuffer, I had to call appendTail in order to append the rest of the String. Thanks to Holger for his suggestion!
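As a side note, on Java 9 or newer the same thing can probably be written more compactly with the Matcher.replaceAll overload that takes a function. A sketch, assuming only ASCII letters should be counted (the class and method names are made up):

```java
import java.util.regex.Pattern;

final class RunLengthDemo {
    // A letter followed by at least two more copies of itself.
    private static final Pattern RUN = Pattern.compile("([A-Za-z])\\1{2,}");

    static String encodeRuns(String input) {
        // Requires Java 9+: the function receives each match and returns its replacement.
        // The whole match length is the run length; group 1 is the repeated letter.
        return RUN.matcher(input).replaceAll(mr -> mr.group().length() + mr.group(1));
    }

    public static void main(String[] args) {
        System.out.println(encodeRuns("aaaaffdddd")); // prints 4aff4d
    }
}
```

Unmatched text, including anything after the last run, is kept automatically, so no appendTail call is needed here.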
I want to combine two JSON strings in Java manually. To do this, I need to replace the last "}" of the first JSON string with a comma "," and remove the first "{" of the second JSON string.
This is my program:
import java.util.Map;
import org.codehaus.jackson.type.TypeReference;
public class Hi {
private static JsonHelper jsonHelper = JsonHelper.getInstance();
public static void main(String[] args) throws Exception {
Map<String, Tracker> allCusts = null;
String A = "{\"user5\":{\"Iden\":4,\"Num\":1},\"user2\":{\"Iden\":5,\"Num\":1}}";
String B = "{\"user1\":{\"Iden\":4,\"Num\":1},\"user3\":{\"Iden\":6,\"Num\":1},\"user2\":{\"Iden\":5,\"Num\":1}}";
String totalString = A + B;
if (null != totalString) {
allCusts = (Map<String, Tracker>) jsonHelper.toObject(
totalString, new TypeReference<Map<String, Tracker>>() {
});
}
System.out.println(allCusts);
}
}
When concatenating the two strings A + B,
I want to replace the last character "}" of A with "," and remove the first character "{" of B.
So they should look like this:
String A = "{\"user5\":{\"Iden\":4,\"Num\":1},\"user2\":{\"Iden\":5,\"Num\":1},";
String B = "\"user1\":{\"Iden\":4,\"Num\":1},\"user3\":{\"Iden\":6,\"Num\":1},\"user2\":{\"Iden\":5,\"Num\":1}}";
I have tried
String Astr = A.replace(A.substring(A.length()-1), ",");
String Bstr = B.replaceFirst("{", "");
String totalString = Astr + Bstr ;
With this I was getting
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
Please suggest a fix.
{ is a metacharacter in regular expressions (it opens a repetition quantifier such as {2,5}), and since replaceFirst takes a string representation of a regular expression as its first argument, you need to escape the { so that it is treated as a literal character:
String Bstr = B.replaceFirst("\\{", "");
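If you would rather not escape by hand, Pattern.quote and Matcher.quoteReplacement can do it for you. A small sketch (the helper name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceDemo {
    // Replace the first literal occurrence of 'target'; neither argument
    // is interpreted as regex syntax.
    static String replaceFirstLiteral(String text, String target, String replacement) {
        return text.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstLiteral("{a{b}}", "{", "")); // prints a{b}}
    }
}
```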
I would say that using the replace methods is really overkill here since you're just trying to chop a character off of either end of a string. This should work just as well:
String totalString = A.substring(0, A.length()-1) + "," + B.substring(1);
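A slightly more defensive sketch of the same substring idea, assuming both inputs are JSON objects written as "{...}". The class and method names are made up. It does no real JSON parsing, and if a key such as "user2" appears in both inputs the result contains it twice, so what you get after parsing depends on the parser.

```java
final class JsonConcatDemo {
    static String concatObjects(String a, String b) {
        String left = a.trim();
        String right = b.trim();
        if (!left.startsWith("{") || !left.endsWith("}")
                || !right.startsWith("{") || !right.endsWith("}")) {
            throw new IllegalArgumentException("both inputs must look like JSON objects");
        }
        // Strip the outer braces of each object.
        String leftBody = left.substring(1, left.length() - 1).trim();
        String rightBody = right.substring(1, right.length() - 1).trim();
        // Avoid a dangling comma when one side is an empty object.
        if (leftBody.isEmpty()) {
            return right;
        }
        if (rightBody.isEmpty()) {
            return left;
        }
        return "{" + leftBody + "," + rightBody + "}";
    }
}
```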
Of course, regex doesn't look like a very good tool for this. But the following seems to work:
String str = "{..{...}..}}";
str = str.replaceFirst("\\{", "");
str = str.replaceFirst("}$", ",");
System.out.println(str);
Output:
..{...}..},
There are some issues in your first two statements. In the first, pass 0 as the start index to the substring method and leave it at that. In the second, put \\ in front of the { as an escape in the matching pattern and use "," as the replacement value.
String Astr = A.substring(0, A.length()-1);//truncate the ending `}`
String Bstr = B.replaceFirst("\\{", ",");//replaces first '{` with a ','
String totalString = Astr + Bstr ;
Please note: There are better ways, but I am just trying to correct your statements.
I am trying to split a String into tokens.
The token delimiters are not single characters, some delimiters are contained in others (for example, & and &&), and I need the delimiters to be returned as tokens too.
StringTokenizer is not able to deal with multi-character delimiters. I presume it's possible with String.split, but I fail to guess the magical regular expression that will suit my needs.
Any ideas?
Example:
Token delimiters: "&", "&&", "=", "=>", " "
String to tokenize: a & b&&c=>d
Expected result: a string array containing "a", " ", "&", " ", "b", "&&", "c", "=>", "d"
--- Edit ---
Thanks to all for your help, Dasblinkenlight gives me the solution. Here is the "ready to use" code I wrote with his help:
private static String[] wonderfulTokenizer(String string, String[] delimiters) {
// First, create a regular expression that matches the union of the delimiters
// Be aware that, in case of delimiters containing others (example && and &),
// the longer must come before the shorter (&& should be before &) or the regexpr
// parser will recognize && as two &.
Arrays.sort(delimiters, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return -o1.compareTo(o2);
}
});
// Build a string that will contain the regular expression
StringBuilder regexpr = new StringBuilder();
regexpr.append('(');
for (String delim : delimiters) { // For each delimiter
if (regexpr.length() != 1) regexpr.append('|'); // Add union separator if needed
for (int i = 0; i < delim.length(); i++) {
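// Escape every character so that regexp reserved chars are matched literally
// (assumes delimiters contain no letters or digits, where a backslash has another meaning)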
regexpr.append('\\');
regexpr.append(delim.charAt(i));
}
}
regexpr.append(')'); // Close the union
Pattern p = Pattern.compile(regexpr.toString());
// Now, search for the tokens
List<String> res = new ArrayList<String>();
Matcher m = p.matcher(string);
int pos = 0;
while (m.find()) { // While there's a delimiter in the string
if (pos != m.start()) {
// If there's something between the current and the previous delimiter
// Add it to the tokens list
res.add(string.substring(pos, m.start()));
}
res.add(m.group()); // add the delimiter
pos = m.end(); // Remember end of delimiter
}
if (pos != string.length()) {
// If it remains some characters in the string after last delimiter
// Add this to the token list
res.add(string.substring(pos));
}
// Return the result
return res.toArray(new String[res.size()]);
}
If you have many strings to tokenize, this could be optimized by creating the Pattern only once.
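One way that optimization might look, as a sketch: compile the Pattern in a constructor and reuse it for every input. The class name is hypothetical. Compared with the method above, I assumed that sorting by length (longest first) and escaping with Pattern.quote are acceptable substitutes for the reverse sort and the manual backslashes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class DelimiterTokenizer {
    private final Pattern delimiterPattern; // compiled once, reused per call

    DelimiterTokenizer(String... delimiters) {
        String alternation = Arrays.stream(delimiters)
                // longest first, so "&&" is tried before "&"
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.delimiterPattern = Pattern.compile(alternation);
    }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = delimiterPattern.matcher(input);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) {
                tokens.add(input.substring(pos, m.start())); // text before the delimiter
            }
            tokens.add(m.group()); // the delimiter itself
            pos = m.end();
        }
        if (pos < input.length()) {
            tokens.add(input.substring(pos)); // text after the last delimiter
        }
        return tokens;
    }
}
```

Usage would be something like new DelimiterTokenizer("&", "&&", "=", "=>", " ").tokenize("a & b&&c=>d"), keeping the instance around for further strings.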
You can use the Pattern and a simple loop to achieve the results that you are looking for:
List<String> res = new ArrayList<String>();
Pattern p = Pattern.compile("([&]{1,2}|=>?| +)");
String s = "s=a&=>b";
Matcher m = p.matcher(s);
int pos = 0;
while (m.find()) {
if (pos != m.start()) {
res.add(s.substring(pos, m.start()));
}
res.add(m.group());
pos = m.end();
}
if (pos != s.length()) {
res.add(s.substring(pos));
}
for (String t : res) {
System.out.println("'"+t+"'");
}
This produces the result below:
's'
'='
'a'
'&'
'=>'
'b'
Split won't do it for you, as it removes the delimiters. You probably need to tokenize the string on your own (i.e. with a for loop) or use a framework like
http://www.antlr.org/
Try this:
String test = "a & b&&c=>d=A";
String regEx = "(&[&]?|=[>]?)";
String[] res = test.split(regEx);
for(String s : res){
System.out.println("Token: "+s);
}
I added the '=A' at the end to show that that is also parsed.
As mentioned in another answer, if you need the atypical behaviour of keeping the delimiters in the result, you will probably need to write the parser yourself. In that case, though, you really have to think about what a "delimiter" is in your code.
I'm not strong in regex, so any help would be appreciated.
I need to parse such strings:
["text", "text", ["text",["text"]],"text"]
And output should be (4 strings):
text, text, ["text",["text"]], text
I've tried this pattern (\\[[^\\[,^\\]]*\\])|(\"([^\"]*)\"):
String data="\"aa\", \"aaa\", [\"bb\", [\"1\",\"2\"]], [cc]";
Pattern p=Pattern.compile("(\\[[^\\[,^\\]]*\\])|(\"([^\"]*)\")");
But output is (quotes themselves in output are not so critical):
"aa", "aaa", "bb", "1", "2", [cc]
How to improve my regex?
I'm not sure regexes are able to do that kind of thing on their own. Here is a way to do it though:
// data string
String input = "\"aa\", \"a, aa\", [\"bb\", [\"1\", \"2\"]], [cc], [\"dd\", [\"5\"]]";
System.out.println(input);
// char that can't ever be within the data string
char tempReplacement = '#';
// escape strings containing commas, e.g "hello, world", ["x, y", 42]
while(input.matches(".*\"[^\"\\[\\]]+,[^\"\\[\\]]+\".*")) {
input = input.replaceAll("(\"[^\"\\[\\]]+),([^\"\\[\\]]+\")", "$1" + tempReplacement + "$2");
}
// while there are "[*,*]" substrings
while(input.matches(".*\\[[^\\]]+,[^\\]]+\\].*")) {
// replace the nested "," chars by the replacement char
input = input.replaceAll("(\\[[^\\]]+),([^\\]]+\\])", "$1" + tempReplacement + "$2");
}
// split the string by the remaining "," (i.e. those non nested)
String[] split = input.split(",");
List<String> output = new LinkedList<String>();
for(String s : split) {
// replace all the replacement chars by a ","
s = s.replaceAll(tempReplacement + "", ",");
s = s.trim();
output.add(s);
}
// syso
System.out.println("SPLIT:");
for(String s : output) {
System.out.println("\t" + s);
}
Output:
"aa", "a, aa", ["bb", ["1", "2"]], [cc], ["dd", ["5"]]
SPLIT:
"aa"
"a, aa"
["bb", ["1","2"]]
[cc]
["dd", ["5"]]
PS: the code only looks complex because of the comments. Here is a more concise version:
public static List<String> split(String input, char tempReplacement) {
while(input.matches(".*\"[^\"\\[\\]]+,[^\"\\[\\]]+\".*")) {
input = input.replaceAll("(\"[^\"\\[\\]]+),([^\"\\[\\]]+\")", "$1" + tempReplacement + "$2");
}
while(input.matches(".*\\[[^\\]]+,[^\\]]+\\].*")) {
input = input.replaceAll("(\\[[^\\]]+),([^\\]]+\\])", "$1" + tempReplacement + "$2");
}
String[] split = input.split(",");
List<String> output = new LinkedList<String>();
for(String s : split) {
output.add(s.replaceAll(tempReplacement + "", ",").trim());
}
return output;
}
Call:
String input = "\"aa\", \"a, aa\", [\"bb\", [\"1\", \"2\"]], [cc], [\"dd\", [\"5\"]]";
List<String> output = split(input, '#');
It seems that your input is recursive, so if you have many nested [] pairs, regexes are probably not the best solution.
For this purpose I think it's far better and easier to use a simple algorithm based on indexOf() and substring(). It's also often more efficient!
Unfortunately, I don't think you can do that with Java regexes. What you have here is a recursive expression. This type of language is not amenable to basic regular expressions (which is what Java's Pattern actually is).
But it's not that hard to write a small recursive descent parser for that language.
You can check the following answer for inspiration: java method for parsing nested expressions
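For the top-level split alone, a full parser may not even be needed. Here is a minimal sketch that tracks bracket depth and quotes while scanning. It assumes the outer brackets have already been removed (as in the data strings above), that brackets are balanced, and that quotes are never escaped. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelSplitter {
    // Split on commas that are neither inside [...] nor inside "...".
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0;            // current [ ] nesting level
        boolean inQuotes = false; // true while inside a "..." string
        int start = 0;            // start index of the current element
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes) {
                if (c == '[') {
                    depth++;
                } else if (c == ']') {
                    depth--;
                } else if (c == ',' && depth == 0) {
                    parts.add(input.substring(start, i).trim());
                    start = i + 1;
                }
            }
        }
        parts.add(input.substring(start).trim()); // last element
        return parts;
    }
}
```

To descend into a nested element, the same method could be called again on its contents after stripping that element's own outer brackets.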