replace words using regex [duplicate]


This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
regex replace all ignore case
I need to replace every occurrence of Sony Ericsson with Sony~Ericsson, i.e. with a tilde between the two words. This is what I have tried:
String outText="";
String inText="Sony Ericsson is a leading company in mobile. The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
outText = inText.replaceAll(word, word.replaceAll(" ", "~"));
System.out.println(outText);
The output is:
Sony Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
But what I want is:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
It should ignore case and give the desired output.

Change it to
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
to make the search-and-replace case-insensitive.
String outText="";
String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
System.out.println(outText);
Output:
sony~ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
Avoid ruining the original capitalization:
The approach above, however, loses the original capitalization of the replaced words. Here is a better suggestion:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
Pattern p = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(inText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String replacement = m.group().replace(' ', '~');
m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
String outText = sb.toString();
System.out.println(outText);
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
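On Java 9 or later, the same find/append loop can be written more compactly with Matcher.replaceAll(Function<MatchResult, String>). The sketch below assumes Java 9+; the class and method names (TildeReplace, tildeJoin) are only illustrative, and Pattern.quote is an addition so that a search word containing regex metacharacters is matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TildeReplace {
    // Replaces every case-insensitive occurrence of `word` with the text that
    // actually matched, spaces turned into '~', so capitalization is kept.
    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String tildeJoin(String text, String word) {
        // Pattern.quote makes `word` match literally, even if it contains metacharacters.
        Pattern p = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll(
                // quoteReplacement stops '$' or '\' in the matched text being read as group references.
                mr -> Matcher.quoteReplacement(mr.group().replace(' ', '~')));
    }

    public static void main(String[] args) {
        String inText = "Sony Ericsson is a leading company in mobile. "
                + "The company sony ericsson was found in oct 2001";
        System.out.println(tildeJoin(inText, "sony ericsson"));
    }
}
```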

str.replaceAll(regex, repl) is equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl). So you can make the matching case-insensitive by passing a flag:
Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(str).replaceAll(repl)
Using backreferences to preserve case:
Pattern.compile("(sony) (ericsson)", Pattern.CASE_INSENSITIVE)
.matcher(str)
.replaceAll("$1~$2")
Gives:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
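A self-contained version of the backreference approach, useful for checking that the Pattern.CASE_INSENSITIVE flag and the inline (?i) flag give the same result. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // Pattern.CASE_INSENSITIVE flag; $1 and $2 re-insert the text each group
    // actually matched, so the original capitalization is kept.
    static String viaFlag(String str) {
        return Pattern.compile("(sony) (ericsson)", Pattern.CASE_INSENSITIVE)
                .matcher(str)
                .replaceAll("$1~$2");
    }

    // The same replacement using the inline (?i) flag and String.replaceAll.
    static String viaInline(String str) {
        return str.replaceAll("(?i)(sony) (ericsson)", "$1~$2");
    }

    public static void main(String[] args) {
        String str = "Sony Ericsson is a leading company in mobile. "
                + "The company sony ericsson was found in oct 2001";
        System.out.println(viaFlag(str));
        System.out.println(viaFlag(str).equals(viaInline(str)));
    }
}
```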

String outText = inText.replaceAll("(?i)(Sony) (Ericsson)", "$1~$2");
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001

Related

How to exclude underscore from regex group java

I am working with Excel file names in the format table_A_Apr_2000.xlsx, and I want an expression that gives me the following groups as String objects:
[table_A, Apr, 2000, .xlsx]
When I use this expression in my code:
String table="table_A";
String[] slist = {"table_A_Apr_2001.xlsx"};
Pattern p = Pattern.compile("^"+table+"|\\d+|\\D+|[^_]*");
for(int i=0; i<slist.length;i++){
Matcher m = p.matcher(slist[i]);
List<String> a = new ArrayList<String>();
while(m.find()){
a.add((m.group()));
}
System.out.println(a);
System.out.println("~~~~~");
}
it gives the following output:
[table_A, _Apr_, 2001, .xlsx, ]
but I want:
[table_A, Apr, 2001, .xlsx]
Any suggestions would be much appreciated, especially for the pattern itself.

\\D matches every non-digit, which includes _. To exclude it, intersect the \\D and [^_] character classes with the && operator: use [\\D&&[^_]]+ instead of \\D+|[^_]*. (The trailing empty element in your output also comes from [^_]*, which can match an empty string.)
Alternatively, since \D is the negation of \d, De Morgan's law (NOT p AND NOT q is the same as NOT (p OR q)) lets you rewrite it as [^\\d_]+.
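A small runnable check of both fixes, reusing the asker's find loop; the class name FileNameParts and the helper parts are hypothetical. Both patterns should yield [table_A, Apr, 2001, .xlsx] for table_A_Apr_2001.xlsx, with no trailing empty element, because none of the alternatives can match an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameParts {
    // Collects every match of `regex` in `input`, in order.
    static List<String> parts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String table = "table_A";
        String file = "table_A_Apr_2001.xlsx";
        // Character-class intersection: non-digits that are also not '_'.
        System.out.println(parts("^" + table + "|\\d+|[\\D&&[^_]]+", file));
        // Equivalent negated class (De Morgan): neither a digit nor '_'.
        System.out.println(parts("^" + table + "|\\d+|[^\\d_]+", file));
    }
}
```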

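You could use a full regex matcher, but one option that might work here is a targeted split of the filename: drop the extension, then split only on underscores that are followed by at least three non-underscore characters (so the _ inside table_A is kept):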
String filename = "table_A_Apr_2001.xlsx";
filename = filename.substring(0, filename.indexOf('.'));
String[] parts = filename.split("_(?=[^_]{3,})");
System.out.println("table: " + parts[0]);
System.out.println("month: " + parts[1]);
System.out.println("year: " + parts[2]);
Output:
table: table_A
month: Apr
year: 2001
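For comparison, a sketch of the formal-regex-matcher route, with one capturing group per part. It assumes the month is always three letters and the year four digits; the class name FileNameGroups is hypothetical. Unlike the split, it also keeps the extension.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameGroups {
    // Assumed layout: <table>_<3-letter month>_<4-digit year><extension>.
    private static final Pattern NAME =
            Pattern.compile("^(.+)_([A-Za-z]{3})_(\\d{4})(\\.\\w+)$");

    // Returns {table, month, year, extension}, or null if the name does not fit the layout.
    static String[] split(String filename) {
        Matcher m = NAME.matcher(filename);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", split("table_A_Apr_2001.xlsx")));
    }
}
```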

java parse regex multiple capture groups

I need to handle both of these scenarios:
John, party of 4
william, party of 6 dislikes John, jeff
What I want to capture is:
From string 1: John, 4
From string 2: william, 6, John, jeff
I'm stumped on how to achieve this.
I know that ([^,])+ gives me the first part (just the name before the comma, without the comma), but I have no idea how to combine it with the rest of the expression.

You may use
(\w+)(?:,\s*party of (\d+)|(?![^,]))
Details
(\w+) - Group 1: one or more word chars
(?:,\s*party of (\d+)|(?![^,])) - a non-capturing group matching
,\s*party of (\d+) - a comma, then zero or more whitespace characters, then party of followed by a space, and then Group 2 capturing one or more digits
| - or
(?![^,]) - a position that is followed by a comma or the end of the string.
Java demo:
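import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))";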
List<String> strings = Arrays.asList("John, party of 4", "william, party of 6 dislikes John, jeff");
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
System.out.println("-------- Testing '" + s + "':");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(1) + ": " + (matcher.group(2) != null ? matcher.group(2) : "N/A"));
}
}
Output:
-------- Testing 'John, party of 4':
John: 4
-------- Testing 'william, party of 6 dislikes John, jeff':
william: 6
John: N/A
jeff: N/A
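The question asks for a flat list per string (william, 6, John, jeff). The following sketch builds that list from the same pattern, adding group 2 only when it took part in the match. The class name PartyParser is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartyParser {
    private static final Pattern P =
            Pattern.compile("(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))");

    // Flattens each match into the list: the name, then the party size if present.
    static List<String> capture(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
            // group(2) is null unless the "party of N" branch matched.
            if (m.group(2) != null) {
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(capture("John, party of 4"));
        System.out.println(capture("william, party of 6 dislikes John, jeff"));
    }
}
```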

How to write a regex with more than one group in Java?

I am writing a Java program that uses regex, and the input sentence can have many different structures.
For example: "book 'learning java' for doctor Ahmed Mohamed" or "The best title: learning java for Ahmed Mohamed", and so on.
That means:
(book) may be [the book or text: or (text)].
(for doctor) may be [for author or for or by or for doctor].
My regex is:
"(?<=(book| the book| \\ (\\)|\\:)) .*? (?=(for doctor| for| for author))"
The desired output:
I want to extract the words after (book) and before (for doctor) and name them Title, and extract the words after (for doctor) and name them Author.
String inputtext = "book 'learning java' for doctor ahmed mohamed";
Pattern p = Pattern.compile("(?<=(book| the book| \\ (\\)|\\:)) .*? (?=(for doctor| for| for author))");
Matcher m = p.matcher(inputtext);
if (m.matches()) {
String author = m.group(1).trim();
String bookTitle = m.group(2).trim();
System.out.println("Title is : " + author);
System.out.println("Author is : " + bookTitle);
This program throws an error when I run it, and I would like to be sure my regex is correct.

You're missing a ). Since \\) is an escaped ), the group (\\)|\\:) is closed by its own parenthesis, and so is (book| the book| \\ (\\)|\\:)), but the (?<= look-behind never gets a closing parenthesis.
Look-around does not combine well with Matcher::matches(), because matches() must match the whole string: a look-behind at the start would need text before the beginning of the string, and a look-ahead at the end would need text after its end. Matcher::find() works better.
Everything in parentheses is a capturing group (except look-around and non-capturing groups), so group 1 is (book| the book| \\ (\\)|\\:)) and group 2 is (\\)|\\:), not the title and author you expected.
(book| the book| \\ (\\)|\\:)) (and likewise (for doctor|...)) doesn't really need its own parentheses, because the look-around parentheses already group the alternatives.
You also seem to have too many spaces in your regex; spaces are literal characters in a pattern, so each one has to be matched.
If you use look-ahead on the for doctor|for|for author part, you won't be able to capture the author.
You don't need look-around at all.
Which brings us to:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inputtext = "book 'learning java' for doctor ahmed mohamed";
Pattern p = Pattern.compile("(book|\\)|\\:) (.*) for( doctor| author|) (.*)");
Matcher m = p.matcher(inputtext);
if (m.find()) {
String title = m.group(2).trim();
String author = m.group(4).trim();
System.out.println("Title is : " + title);
System.out.println("Author is : " + author);
}
If you do want to use matches:
String inputtext = "book 'learning java' for doctor ahmed mohamed";
Pattern p = Pattern.compile("(?:book|the book|(?:\\(.*?\\))|.*?\\:) (.*) for(?: doctor| author|) (.*)");
Matcher m = p.matcher(inputtext);
if (m.matches()) {
String title = m.group(1).trim();
String author = m.group(2).trim();
System.out.println("Title is : " + title);
System.out.println("Author is : " + author);
}
(?:...) is simply a non-capturing group. Without the ?:, the match would be the same, but those groups would be numbered too, so you'd have to use group(3) and group(5) instead of group(1) and group(2).
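The following sketch combines the points above with named groups, since the question asks for the parts to be named Title and Author. Named groups ((?<name>...)) are available in java.util.regex since Java 7. The pattern is an assumption built only from the question's two example sentences. It uses a lazy title, also accepts "by" as the question mentions, and the class name BookParser is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BookParser {
    // Assumed structure: a marker ("book", ")" or ":"), then the title,
    // then "for", "for doctor", "for author" or "by", then the author.
    private static final Pattern P = Pattern.compile(
            "(?:book|\\)|:)\\s+(?<title>.*?)\\s+(?:for(?: doctor| author)?|by)\\s+(?<author>.*)");

    // Returns {title, author}, or null when the sentence has no recognizable structure.
    static String[] parse(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group("title").trim(), m.group("author").trim()};
    }

    public static void main(String[] args) {
        for (String s : new String[] {
                "book 'learning java' for doctor ahmed mohamed",
                "The best title: learning java for Ahmed Mohamed "}) {
            String[] r = parse(s);
            System.out.println("Title is : " + r[0]);
            System.out.println("Author is : " + r[1]);
        }
    }
}
```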

regex replace all ignore case

How do I ignore case in the example below?
outText = inText.replaceAll(word, word.replaceAll(" ", "~"));
Example:
Input:
inText = "Retail banking Wikipedia, the free encyclopedia Retail banking "
+ "From Wikipedia. retail banking industry."
word = "retail banking"
Output:
outText = "Retail~banking Wikipedia, the free encyclopedia Retail~banking " +
"From Wikipedia. retail~banking industry."

To do case-insensitive search and replace, you can change
outText = inText.replaceAll(word, word.replaceAll(" ", "~"));
into
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
Avoid ruining the original capitalization:
The approach above, however, loses the original capitalization of the replaced words. Here is a better suggestion:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
Pattern p = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(inText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String replacement = m.group().replace(' ', '~');
m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
String outText = sb.toString();
System.out.println(outText);
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001

You could convert everything to lowercase before searching, or use the regex flag Pattern.CASE_INSENSITIVE.

Here is my way of doing it:
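import java.util.Locale;

private String replaceAllIgnoreCase(final String text, final String search, final String replacement){
    // An empty search string would loop forever; an identical replacement is a no-op.
    if(search.isEmpty() || search.equals(replacement)) return text;
    final StringBuilder buffer = new StringBuilder(text);
    // Locale.ROOT gives locale-neutral lowercasing.
    final String lowerSearch = search.toLowerCase(Locale.ROOT);
    int i;
    int prev = 0;
    // Lowercase the current buffer on every pass, because earlier replacements change it.
    while((i = buffer.toString().toLowerCase(Locale.ROOT).indexOf(lowerSearch, prev)) > -1){
        buffer.replace(i, i+search.length(), replacement);
        // Resume searching just after the inserted replacement.
        prev = i+replacement.length();
    }
    return buffer.toString();
}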
It seems to work flawlessly as far as I've tested it. The advantage of this approach is that it uses no regex, so if you want to replace a bracket, a plus sign, or any other metacharacter, it replaces the literal text rather than whatever that character means in a regex. Hope this helps.
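If you would rather keep a regex but still want literal matching, Pattern.quote (for the search text) and Matcher.quoteReplacement (for the replacement) give similar literal-text behaviour. Here is a minimal sketch with an illustrative class name. Pattern.UNICODE_CASE is added so that case folding also covers non-ASCII letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplace {
    // Case-insensitive, literal replacement: Pattern.quote disables metacharacters
    // in the search text, Matcher.quoteReplacement disables $ and \ in the replacement.
    static String replaceAllIgnoreCase(String text, String search, String replacement) {
        return Pattern.compile(Pattern.quote(search),
                        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceAllIgnoreCase("Price (USD) + tax, price (usd)", "price (usd)", "$cost"));
    }
}
```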

You didn't specify a language.
Java has Pattern.CASE_INSENSITIVE
C# and VB have RegexOptions.IgnoreCase

Replace string by excluding some strings in Java

How can I replace the following string in Java:
Sports videos (From 2002 To 2003) here.
TO
Sports videos 2002 2003 here.
I used the code below, but it removes the whole (From ... To ...) part, i.e.
I am getting this output: Sports videos here.
String pattern= "\\((From)(?:\\s*\\d*\\s*)(To)(?:\\s*\\d*\\s*)\\)";
String testStr = "Sports videos (From 2002 To 2003) here.";
String testStrAfterRegex = testStr.replaceFirst(pattern, "");
What is missing here?
Thanks
DIFFERENT STRING WITH DATE FORMATTER
If the string above contains a date separator such as \\ or any other characters or words instead of only digits, the answer will not work.
I replaced the pattern from the original answer with this one, and it works:
String pattern= "\\((From)(.*)(To)(.*)\\)";

Change to
String pattern= "\\((From)(\\s*\\d*\\s*)(To)(\\s*\\d*\\s*)\\)";
String testStr = "Sports videos (From 2002 To 2003) here.";
String testStrAfterRegex = testStr.replaceFirst(pattern, "$2 $4");
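There are two problems:
First
You wrapped the year groups in (?:), which makes them non-capturing, so their text cannot be referenced in the replacement.
Second
You don't use group references such as $1 and $2 in the replacement; replacing with "" simply deletes the whole match.
I fixed it by using $2 and $4 for the second and fourth groups.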
EDIT
Cleaner solution:
String pattern= "\\(From(\\s*\\d*\\s*)To(\\s*\\d*\\s*)\\)";
String testStr = "Sports videos (From 2002 To 2003) here.";
String testStrAfterRegex = testStr.replaceFirst(pattern, "$1$2");
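Both versions above put the \\s* inside the capturing groups, so the spaces around the years are copied into the result (the cleaner version gives "Sports videos  2002  2003 here.", with doubled spaces). The sketch below captures only the digits and lets the \\s* outside the groups absorb the whitespace; the class and method names are illustrative.

```java
public class YearsOnly {
    // Capture only the digits; the \\s* outside the groups swallows the spaces.
    static String stripFromTo(String s) {
        return s.replaceFirst("\\(From\\s*(\\d*)\\s*To\\s*(\\d*)\\s*\\)", "$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(stripFromTo("Sports videos (From 2002 To 2003) here."));
    }
}
```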
