replace multiple sub-strings in a string - java

This function is used to replace certain substrings in a string with respective values.
// map(string_to_replace, string_to_replace_with)
String template = "ola ala kala pala sala";
StringBuilder populatedTemplate = new StringBuilder();
HashMap<String, String> map = new HashMap<>();
map.put("ola", "patola");
map.put("pala", "papala");
for (String word : template.split(" ")) {
populatedTemplate.append( map.getOrDefault(word, word));
populatedTemplate.append(" ");
}
System.out.println(populatedTemplate.toString());
The function above works fine if the substring to be replaced is surrounded by spaces.
Example: String => "Hey {how} are $=you"
If the substring to be replaced is "Hey" or "are", it works fine. The issue arises when I want to replace "how" and "you".
How can I achieve this without additional complexity?

If you want to replace only the words that you have in the map and keep the rest as is, you can proceed as follows:
String template = "Hey {how} are $=you";
StringBuilder populatedTemplate = new StringBuilder();
Map<String, String> map = new HashMap<>();
map.put("how", "HH");
map.put("you", "YY");
// Pattern allowing to extract only the words
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(template);
int fromIndex = 0;
while (matcher.find(fromIndex)) {
// The start index of the current word
int startIdx = matcher.start();
if (fromIndex < startIdx) {
// Add what we have between two words
populatedTemplate.append(template, fromIndex, startIdx);
}
// The current word
String word = matcher.group();
// Replace the word by itself or what we have in the map
populatedTemplate.append(map.getOrDefault(word, word));
// Start the next find from the end index of the current word
fromIndex = matcher.end();
}
if (fromIndex < template.length()) {
// Add the remaining sub String
populatedTemplate.append(template, fromIndex, template.length());
}
System.out.println(populatedTemplate);
Output:
Hey {HH} are $=YY
Response Update:
Assuming that you want to be able to replace not only words but anything like ${questionNumber}, you will need to create the regular expression dynamically like this:
String template = "Hey {how} are $=you id=minScaleBox-${questionNumber}";
...
map.put("${questionNumber}", "foo");
StringBuilder regex = new StringBuilder();
boolean first = true;
for (String word : map.keySet()) {
if (first) {
first = false;
} else {
regex.append('|');
}
regex.append(Pattern.quote(word));
}
Pattern pattern = Pattern.compile(regex.toString());
...
Output:
Hey {HH} are $=YY id=minScaleBox-foo
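The elided parts of the update above could be filled in along these lines. This is only a sketch, not the answerer's exact code: the class name TemplateFiller and the method fill are made up for illustration, and sorting the keys longest-first is an added assumption so that a key that is a prefix of another key cannot shadow it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration.
final class TemplateFiller {

    // Replaces every occurrence of a map key in the template with its value.
    static String fill(String template, Map<String, String> map) {
        if (map.isEmpty()) {
            return template; // an empty alternation would match at every position
        }
        // Longest keys first, so a short key cannot win over a longer one (assumption).
        List<String> keys = new ArrayList<>(map.keySet());
        keys.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Quote each key so characters such as '$' and '{' are matched literally.
        StringBuilder regex = new StringBuilder();
        for (String key : keys) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(key));
        }

        Matcher matcher = Pattern.compile(regex.toString()).matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(map.get(matcher.group())));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("how", "HH");
        map.put("you", "YY");
        map.put("${questionNumber}", "foo");
        // Prints: Hey {HH} are $=YY id=minScaleBox-foo
        System.out.println(fill("Hey {how} are $=you id=minScaleBox-${questionNumber}", map));
    }
}
```

Note that, as in the answer, keys are matched anywhere in the text, so "you" would also be replaced inside "your".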

Related

Extracting strings that contains particular words

This code can extract sentences that contain a particular word. The problem is if I want to extract several sentences based on different words I must copy it several times. Is there a way of doing this with several words? possibly feeding an array to it?
String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
if (sentence.contains("this")) {
output.add(sentence);
}
}
System.out.println(">>output=" + output);

You can try this:
String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> keyList = new ArrayList<String>();
keyList.add("this");
keyList.add("these");
keyList.add("that");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
for (String key : keyList) {
if (sentence.contains(key)) {
output.add(sentence);
break;
}
}
}
System.out.println(">>output=" + output);

String sentence = "First String. Second Int. Third String. Fourth Array. Fifth Double. Sixth Boolean. Seventh String";
List<String> output = new ArrayList<String>();
for(String each: sentence.split("\\.")){
if(inKeyword(each)) output.add(each);
}
System.out.println(output);
Helper function:
public static boolean inKeyword(String currentSentence){
String[] keyword = {"int", "double"};
for(String each: keyword){
if(currentSentence.toLowerCase().contains(each)) return true;
}
return false;
}

If you have a collection of words to filter for, called filter, and an array of sentences, you could use Collections.disjoint to check whether the words of a sentence overlap with the words to filter for. Sadly, this does not work if you filter for "However" and your sentence contains "However,".
Collection<String> filter = /**/;
String[] sentences = /**/;
List<String> result = new ArrayList<>();
for(String sentence : sentences) {
Collection<String> words = Arrays.asList(sentence.split(" "));
// If the two collections are not disjoint, they share at least one word
if (!Collections.disjoint(words, filter)) {
result.add(sentence);
}
}
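One possible way around the "However," limitation is to split each sentence on runs of non-word characters instead of single spaces, so that punctuation never sticks to a word. A minimal sketch under that assumption; the class and method names are hypothetical, and the comparison stays case-sensitive as in the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, named here only for illustration.
final class SentenceFilter {

    // Returns the sentences that contain at least one of the filter words.
    static List<String> sentencesContainingAny(String[] sentences, Collection<String> filter) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            // "\\W+" splits on any run of characters other than ASCII letters, digits and '_',
            // so "However," yields the word "However".
            List<String> words = Arrays.asList(sentence.split("\\W+"));
            if (!Collections.disjoint(words, filter)) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sentences = {"However, it rained", "We stayed home"};
        // Prints: [However, it rained]
        System.out.println(sentencesContainingAny(sentences, Collections.singleton("However")));
    }
}
```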

You can use String.matches as follows.
String sentence = ...;
if (sentence.matches(".*(you|can|use).*")) { // Or:
if (sentence.matches(".*\\b(you|can|use)\\b.*")) { // With word boundaries
if (sentence.matches("(?i).*(you|can|use).*")) { // Case insensitive ("You")
In Java 8 the following variations might do:
String pattern = ".*(you|can|use).*";
// Or build the same pattern with a StringJoiner:
String pattern = new StringJoiner("|", ".*(", ").*")
.add("you")
.add("can")
.add("use")
.toString();
// Or a stream on the words with a joining collector
Arrays.stream(o.split("\\.\\s*"))
.filter(sentence -> sentence.matches(pattern))
.forEach(System.out::println);

With streams (splitting into sentences and words):
String o = "Trying to extract this string. And also the one next to it.";
Set<String> words = new HashSet<>(Arrays.asList("this", "also"));
List<String> output = Arrays.stream(o.split("\\.")).filter(
sentence -> Arrays.stream(sentence.split("\\s")).anyMatch(
word -> words.contains(word)
)
).collect(Collectors.toList());
System.out.println(">>output=" + output);

Uppercase all characters but not those in quoted strings

I have a String and I would like to uppercase everything that is not quoted.
Example:
My name is 'Angela'
Result:
MY NAME IS 'Angela'
Currently, I am matching every quoted string then looping and concatenating to get the result.
Is it possible to achieve this in one regex expression maybe using replace?

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
String format = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());
System.out.println(input);
System.out.println("----------------------");
System.out.println(format);
Input: 's'Hello This is 'Java' Not '.NET'
Output: 's'HELLO THIS IS 'Java' NOT '.NET'

You could use a regular expression like this:
([^'"]+)(['"]+[^'"]+['"]+)(.*)
# match and capture everything up to a single or double quote (but not including)
# match and capture a quoted string
# match and capture any rest which might or might not be there.
This will only work with one quoted string, obviously.

OK, this will do it for you. It is not efficient, and the pattern only recognizes quoted strings made up of word characters. I don't actually recommend this solution, as it will be too slow.
public static void main(String[] args) {
String s = "'Peter' said, My name is 'Angela' and I will not change my name to 'Pamela'.";
Pattern p = Pattern.compile("('\\w+')");
Matcher m = p.matcher(s);
List<String> quotedStrings = new ArrayList<>();
while(m.find()) {
quotedStrings.add(m.group(1));
}
s=s.toUpperCase();
// System.out.println(s);
for (String str : quotedStrings)
s= s.replaceAll("(?i)"+str, str);
System.out.println(s);
}
Output:
'Peter' SAID, MY NAME IS 'Angela' AND I WILL NOT CHANGE MY NAME TO 'Pamela'.

Adding to the answer by @jan_kiran, we need to call the appendTail() method. Updated code is:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
regexMatcher.appendTail(sb);
String formatted_string = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());

I had no luck with these solutions, as they seemed to remove trailing non-quoted text.
This code works for me, and handles both ' and " by remembering the type of the last opening quotation mark. Replace toLowerCase with toUpperCase as appropriate, of course.
This may be extremely slow; I don't know:
private static String toLowercaseExceptInQuotes(String line) {
StringBuffer sb = new StringBuffer(line);
boolean nowInQuotes = false;
char lastQuoteType = 0;
for (int i = 0; i < sb.length(); ++i) {
char cchar = sb.charAt(i);
if (cchar == '"' || cchar == '\''){
if (!nowInQuotes) {
nowInQuotes = true;
lastQuoteType = cchar;
}
else {
if (lastQuoteType == cchar) {
nowInQuotes = false;
}
}
}
else if (!nowInQuotes) {
sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
}
}
return sb.toString();
}
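As for the original question of doing this with a single regular expression: one option is a pattern that matches either a complete quoted run or a run of unquoted text, and uppercases only the latter. The following is a sketch, not a definitive answer; the class and method names are hypothetical, and it assumes single quotes only, with no escaped quotes inside quoted strings.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration.
final class QuoteAwareUpperCase {

    // Group 1: a complete single-quoted run. Otherwise: a run of unquoted characters.
    private static final Pattern TOKEN = Pattern.compile("('[^']*')|[^']+");

    static String upperCaseOutsideQuotes(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer sb = new StringBuffer(input.length());
        while (m.find()) {
            // Quoted runs are copied unchanged; everything else is uppercased.
            String replacement = m.group(1) != null
                    ? m.group(1)
                    : m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        // Copies anything after the last match, e.g. a lone unmatched quote.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: 's'HELLO THIS IS 'Java' NOT '.NET'
        System.out.println(upperCaseOutsideQuotes("'s'Hello This is 'Java' Not '.NET'"));
    }
}
```

An unmatched quote, as in "it's fine", is left in place and the text around it is still uppercased.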

Getting incorrect output when using the Java method indexof in combination with substring to create a new string

In my program I am trying to split a string at the "," character. After I split the string, I need to create a new string from the text after the "=" character. Then I need to rebuild the string back to its original state. Currently I'm able to split the string and rebuild it to its original state. However, when I try to create a new string using the indexOf and substring methods, I'm not getting the correct string. I have listed my code below along with my current output and my desired output. Thanks in advance for your help.
public class StringTestProgram {
public static void main(String[] args) {
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
System.out.println(relativeDN);
//Split String
String[] stringData = relativeDN.split(",");
{
StringBuilder sb = new StringBuilder();
CharSequence charAdded = ",";
// loop thru each element of the array
for (int place = 0; place < stringData.length; place++) {
System.out.println(stringData[place]);
{
int eq = relativeDN.indexOf('=');
String sub = relativeDN.substring(0, eq);
System.out.println(sub);
}
// append element to the StringBuilder
sb.append(stringData[place]);
// avoids adding an extra ',' at the end
if (place < stringData.length - 1)
// if not at the last element, add the ',' character
sb.append(charAdded);
}
System.out.print(sb.toString());
}
}
}
My original string "cn=abc,dn=xyz,ou=abc/def"
My current output:
cn=abc (split string)
cn (create new String)
dn=xyz (split string)
cn (create new String)
ou=abc/def (split string)
cn (create new String)
cn=abc,dn=xyz,ou=abc/def (rebuild String to its original form)
My desired output:
cn=abc (split string)
abc (create new string)
dn=xyz (split string)
xyz (create new String)
ou=abc/def (split string)
abc/def (create new String)
cn=abc,dn=xyz,ou=abc/def (rebuild String to its original form)

Change the following lines in your for loop:
int eq = relativeDN.indexOf('=');
String sub = relativeDN.substring(0, eq);
to:
int eq = stringData[place].indexOf('=');
String sub = stringData[place].substring(eq+1, stringData[place].length());
You need the separated string in each iteration, so you have to use stringData[place]. Because you were using relativeDN, every iteration searched the original string and returned cn.
Also, to print the text after =, you need to pass the position just after = as the start index and the end of the string (its length) as the end index to substring.
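The same idea can also be written with String.split and a limit of 2, which splits each component at its first '=' only. A small sketch with hypothetical class and method names; it assumes each component has the form key=value and treats a component without '=' as having an empty value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named here only for illustration.
final class RdnValues {

    // Returns the text after the first '=' of each comma-separated component.
    static List<String> values(String relativeDN) {
        List<String> values = new ArrayList<>();
        for (String component : relativeDN.split(",")) {
            // Limit 2: split at the first '=' only, so a value containing '=' stays whole.
            String[] keyAndValue = component.split("=", 2);
            // A component without '=' has no value part; record an empty string for it.
            values.add(keyAndValue.length == 2 ? keyAndValue[1] : "");
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints: [abc, xyz, abc/def]
        System.out.println(values("cn=abc,dn=xyz,ou=abc/def"));
    }
}
```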

Using Pattern and Matcher classes.
String s = "cn=abc,dn=xyz,ou=abc/def";
String parts[] = s.split(",");
for(String i: parts)
{
System.out.println(i);
Matcher m = Pattern.compile("(?<==).+").matcher(i);
while(m.find())
{
System.out.println(m.group());
}
}
System.out.println(s);
Output:
cn=abc
abc
dn=xyz
xyz
ou=abc/def
abc/def
cn=abc,dn=xyz,ou=abc/def
OR
Using the StringBuilder class.
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
String[] stringData = relativeDN.split(",");
StringBuilder sb = new StringBuilder();
for(int i =0;i<stringData.length;i++)
{
if(i!=0)
{
sb.append(",");
}
System.out.println(stringData[i]);
int eq = stringData[i].indexOf('=');
String sub = stringData[i].substring(eq+1,stringData[i].length());
System.out.println(sub);
sb.append(stringData[i]);
}
System.out.print(sb.toString());
Output:
cn=abc
abc
dn=xyz
xyz
ou=abc/def
abc/def
cn=abc,dn=xyz,ou=abc/def

Equivalent to StringTokenizer with multiple characters delimiters

I try to split a String into tokens.
The token delimiters are not single characters, some delimiters are included into others (example, & and &&), and I need to have the delimiters returned as token.
StringTokenizer is not able to deal with multiple characters delimiters. I presume it's possible with String.split, but fail to guess the magical regular expression that will suits my needs.
Any idea ?
Example:
Token delimiters: "&", "&&", "=", "=>", " "
String to tokenize: a & b&&c=>d
Expected result: an string array containing "a", " ", "&", " ", "b", "&&", "c", "=>", "d"
--- Edit ---
Thanks to all for your help, Dasblinkenlight gives me the solution. Here is the "ready to use" code I wrote with his help:
private static String[] wonderfulTokenizer(String string, String[] delimiters) {
// First, create a regular expression that matches the union of the delimiters.
// Be aware that, when one delimiter contains another (for example && and &),
// the longer one must come before the shorter one (&& before &), or the regex
// engine will recognize && as two &.
Arrays.sort(delimiters, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return -o1.compareTo(o2);
}
});
// Build a string that will contain the regular expression
StringBuilder regexpr = new StringBuilder();
regexpr.append('(');
for (String delim : delimiters) { // For each delimiter
if (regexpr.length() != 1) regexpr.append('|'); // Add union separator if needed
// Quote the delimiter so that regex metacharacters are matched literally
// (putting a backslash before every character would break letters and digits)
regexpr.append(Pattern.quote(delim));
}
regexpr.append(')'); // Close the union
Pattern p = Pattern.compile(regexpr.toString());
// Now, search for the tokens
List<String> res = new ArrayList<String>();
Matcher m = p.matcher(string);
int pos = 0;
while (m.find()) { // While there's a delimiter in the string
if (pos != m.start()) {
// If there's something between the current and the previous delimiter
// Add it to the tokens list
res.add(string.substring(pos, m.start()));
}
res.add(m.group()); // add the delimiter
pos = m.end(); // Remember end of delimiter
}
if (pos != string.length()) {
// If it remains some characters in the string after last delimiter
// Add this to the token list
res.add(string.substring(pos));
}
// Return the result
return res.toArray(new String[res.size()]);
}
If you have many strings to tokenize, it could be optimized by creating the Pattern only once.
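One way to build the Pattern only once is to move it into a small class whose constructor compiles the expression, so that each call to tokenize only creates a Matcher. This is a sketch rather than the poster's code: DelimiterTokenizer is a hypothetical name, delimiters are sorted by length instead of reverse alphabetical order, and Pattern.quote is used for escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, named here only for illustration.
final class DelimiterTokenizer {

    private final Pattern pattern; // compiled once, reused for every input

    DelimiterTokenizer(String... delimiters) {
        if (delimiters.length == 0) {
            throw new IllegalArgumentException("at least one delimiter is required");
        }
        // Work on a copy so the caller's array is not reordered.
        String[] sorted = delimiters.clone();
        // Longest first, so "&&" is tried before "&".
        Arrays.sort(sorted, (a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder regex = new StringBuilder();
        for (String delimiter : sorted) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(delimiter));
        }
        pattern = Pattern.compile(regex.toString());
    }

    // Splits the input into tokens, returning the delimiters as tokens too.
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int pos = 0;
        while (m.find()) {
            if (pos != m.start()) {
                tokens.add(input.substring(pos, m.start())); // text before the delimiter
            }
            tokens.add(m.group()); // the delimiter itself
            pos = m.end();
        }
        if (pos != input.length()) {
            tokens.add(input.substring(pos)); // text after the last delimiter
        }
        return tokens;
    }

    public static void main(String[] args) {
        DelimiterTokenizer tokenizer = new DelimiterTokenizer("&", "&&", "=", "=>", " ");
        // Prints: [a,  , &,  , b, &&, c, =>, d]
        System.out.println(tokenizer.tokenize("a & b&&c=>d"));
    }
}
```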

You can use the Pattern and a simple loop to achieve the results that you are looking for:
List<String> res = new ArrayList<String>();
Pattern p = Pattern.compile("([&]{1,2}|=>?| +)");
String s = "s=a&=>b";
Matcher m = p.matcher(s);
int pos = 0;
while (m.find()) {
if (pos != m.start()) {
res.add(s.substring(pos, m.start()));
}
res.add(m.group());
pos = m.end();
}
if (pos != s.length()) {
res.add(s.substring(pos));
}
for (String t : res) {
System.out.println("'"+t+"'");
}
This produces the result below:
's'
'='
'a'
'&'
'=>'
'b'

Split won't do it for you, as it removes the delimiters. You probably need to tokenize the string on your own (i.e. with a for loop) or use a framework like
http://www.antlr.org/

Try this:
String test = "a & b&&c=>d=A";
String regEx = "(&[&]?|=[>]?)";
String[] res = test.split(regEx);
for(String s : res){
System.out.println("Token: "+s);
}
I added the '=A' at the end to show that it is also parsed.
As mentioned in another answer, if you need the atypical behaviour of keeping the delimiters in the result, you will probably need to write the parser yourself. In that case, though, you really have to think about what a "delimiter" is in your code.

java regular expression find and replace

I am trying to find environment variables in input and replace them with values.
The pattern of env variable is ${\\.}
Pattern myPattern = Pattern.compile( "(${\\.})" );
String line ="${env1}sojods${env2}${env3}";
How can I replace env1 with 1 and env2 with 2 and env3 with 3, so
that after this I will have a new string 1sojods23?

Strings in Java are immutable, which makes this somewhat tricky if you are talking about an arbitrary number of things you need to find and replace.
Specifically, you need to define your replacements in a Map, use a StringBuilder (before Java 9, the less performant StringBuffer had to be used) and the appendReplacement() and appendTail() methods from Matcher. The final result will be stored in your StringBuilder (or StringBuffer).
Map<String, String> replacements = new HashMap<String, String>() {{
put("${env1}", "1");
put("${env2}", "2");
put("${env3}", "3");
}};
String line ="${env1}sojods${env2}${env3}";
String rx = "(\\$\\{[^}]+\\})";
StringBuilder sb = new StringBuilder(); //use StringBuffer before Java 9
Pattern p = Pattern.compile(rx);
Matcher m = p.matcher(line);
while (m.find())
{
// Avoids throwing a NullPointerException in the case that you
// Don't have a replacement defined in the map for the match
String repString = replacements.get(m.group(1));
if (repString != null)
m.appendReplacement(sb, repString);
}
m.appendTail(sb);
System.out.println(sb.toString());
Output:
1sojods23

I know this is old; I was myself looking for an appendReplacement/appendTail example when I found it. However, the OP's question doesn't need the complicated multi-line solutions I saw here.
In this exact case, where the string to replace itself contains the value we want to replace it with, this can be done easily with replaceAll:
String line ="${env1}sojods${env2}${env3}";
System.out.println( line.replaceAll("\\$\\{env([0-9]+)\\}", "$1") );
// Output => 1sojods23
When the replacement depends on some condition or logic evaluated for each match, you can use appendReplacement/appendTail instead.
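For that per-match case, Java 9 and later also provide Matcher.replaceAll with a function argument, which hides the appendReplacement/appendTail loop. A sketch assuming Java 9+; the class and method names are hypothetical, and leaving unknown variables untouched is a choice made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration. Requires Java 9 or later.
final class EnvSubstitution {

    // Group 1 captures the variable name between "${" and "}".
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String line, Map<String, String> values) {
        return VARIABLE.matcher(line).replaceAll(match -> {
            String value = values.get(match.group(1));
            // Unknown variables are left untouched (a choice made for this sketch).
            // quoteReplacement keeps '$' and '\' from being read as group references.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("env1", "1");
        values.put("env2", "2");
        values.put("env3", "3");
        // Prints: 1sojods23
        System.out.println(substitute("${env1}sojods${env2}${env3}", values));
    }
}
```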

Hopefully you would find this code useful:
Pattern phone = Pattern.compile("\\$\\{env([0-9]+)\\}");
String line ="${env1}sojods${env2}${env3}";
Matcher action = phone.matcher(line);
StringBuffer sb = new StringBuffer(line.length());
while (action.find()) {
String text = action.group(1);
action.appendReplacement(sb, Matcher.quoteReplacement(text));
}
action.appendTail(sb);
System.out.println(sb.toString());
The output is the expected: 1sojods23.

This gives you 1sojods23:
String s = "${env1}sojods${env2}${env3}";
final Pattern myPattern = Pattern.compile("\\$\\{[^\\}]*\\}");
Matcher m = myPattern.matcher(s);
int i = 0;
while (m.find()) {
s = m.replaceFirst(String.valueOf(++i));
m = myPattern.matcher(s);
}
System.out.println(s);
and this works too:
final String re = "\\$\\{[^\\}]*\\}";
String s = "${env1}sojods${env2}${env3}";
int i = 0;
String t;
while (true) {
t = s.replaceFirst(re, String.valueOf(++i));
if (s.equals(t)) {
break;
} else {
s = t;
}
}
System.out.println(s);

You can use a StringBuffer in combination with the Matcher appendReplacement() method, but if the pattern does not match, there is no point in creating the StringBuffer.
For example, here is a pattern that matches ${...}. Group 1 is the contents between the braces.
static Pattern rxTemplate = Pattern.compile("\\$\\{([^}\\s]+)\\}");
And here is a sample function that uses that pattern.
private static String replaceTemplateString(String text) {
StringBuffer sb = null;
Matcher m = rxTemplate.matcher(text);
while (m.find()) {
String t = m.group(1);
t = t.toUpperCase(); // LOOKUP YOUR REPLACEMENT HERE
if (sb == null) {
sb = new StringBuffer(text.length());
}
m.appendReplacement(sb, t);
}
if (sb == null) {
return text;
} else {
m.appendTail(sb);
return sb.toString();
}
}

Map<String, String> replacements = new HashMap<String, String>() {
{
put("env1", "1");
put("env2", "2");
put("env3", "3");
}
};
String line = "${env1}sojods${env2}${env3}";
String rx = "\\$\\{(.*?)\\}";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(rx);
Matcher m = p.matcher(line);
while (m.find()) {
// Avoids throwing a NullPointerException in the case that you
// Don't have a replacement defined in the map for the match
String repString = replacements.get(m.group(1));
if (repString != null)
m.appendReplacement(sb, repString);
}
m.appendTail(sb);
System.out.println(sb.toString());
In the above example the map can use just the variable names as keys (env1, env2, ...), because group 1 of the pattern captures only the name between the braces.

Use groups: once the pattern has matched, ${env1} will be your first group, and then you replace what is in each group.
Pattern p = Pattern.compile("(\\$\\{[^}]+\\})"); // $ and { must be escaped
Matcher m = p.matcher(line);
while (m.find()) {
for (int j = 0; j <= m.groupCount(); j++) {
String group = m.group(j); // group 0 is the whole match, group 1 is ${...}
// do the replacement for this group here
}
}
