Replace While Pattern is Found - java

I'm trying to go through a string and replace every substring that matches a regex. When I use if, it works and replaces just one match. When I change the if to while, it does some weird replacement over itself and makes a mess of the first match while not even touching the others...
pattern = Pattern.compile(regex);
matcher = pattern.matcher(docToProcess);
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
match = docToProcess.substring(start, end);
stringBuilder.replace(start, end, createRef(match));
docToProcess = stringBuilder.toString();
}

Aside from the sysouts I only added the last assignment. See if it helps:
// your snippet:
pattern = Pattern.compile(regex);
matcher = pattern.matcher(docToProcess);
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
match = docToProcess.substring(start, end);
String rep = createRef(match);
stringBuilder.replace(start, end, rep);
docToProcess = stringBuilder.toString();
// my addition:
System.out.println("Found: '" + matcher.group() + "'");
System.out.println("Replacing with: '" + rep + "'");
System.out.println(" --> " + docToProcess);
matcher = pattern.matcher(docToProcess);
}

I'm not sure exactly what problem you ran into, but maybe this example will help a little.
I want to change the names in a sentence like this:
Jack -> Albert
Albert -> Paul
Paul -> Jack
We can do this with a little help from the appendReplacement and appendTail methods of the Matcher class:
// This method could use a Map<String,String>, or even be replaced with Map.get(key)
static String getReplacement(String name) {
    if ("Jack".equals(name))
        return "Albert";
    else if ("Albert".equals(name))
        return "Paul";
    else
        return "Jack";
}
public static void main(String[] args) {
    String sentence = "Jack and Albert are going to see Paul. Jack is tall, " +
            "Albert small and Paul is not at home.";
    Matcher m = Pattern.compile("Jack|Albert|Paul").matcher(sentence);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        m.appendReplacement(sb, getReplacement(m.group()));
    }
    m.appendTail(sb);
    System.out.println(sb);
}
Output:
Albert and Paul are going to see Jack. Albert is tall, Paul small and Jack is not at home.

If createRef(match) returns a string which is not the same length as (end - start) then the indexes you are using in docToProcess.substring(start, end) will potentially overlap.
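One way to sidestep the shifting indexes is to let the Matcher assemble the result itself, so no offsets into a modified string are ever used. This is only a sketch: the createRef below is a hypothetical stand-in (the real one is not shown in the question), and the regex and input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllMatches {
    // Hypothetical stand-in for the asker's createRef; the real one is not shown.
    static String createRef(String match) {
        return "[ref:" + match + "]";
    }

    static String replaceAll(String doc, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(doc);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps any '$' or '\' in the replacement literal
            String replacement = Matcher.quoteReplacement(createRef(matcher.group()));
            matcher.appendReplacement(out, replacement);
        }
        // Copy whatever follows the last match
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: see [ref:A1] and [ref:B22] here
        System.out.println(replaceAll("see A1 and B22 here", "[A-Z]\\d+"));
    }
}
```

Because the matcher keeps scanning the original input, it does not matter whether the replacement is longer or shorter than the text it replaces.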

Related

How do I properly use Matcher to retrieve first 30 chars of a String?

My goal is to take the first 30 characters of a user-entered String and return them in an email subject line.
My current solution is this:
Matcher matcher = Pattern.compile(".{1,30}").matcher(Item.getName());
String subject = this.subjectPrefix + "You have been assigned to Item Number " + Item.getId() + ": " + matcher + "...";
What is being returned for matcher is "java.util.regex.Matcher[pattern=.{1,30} region=0,28 lastmatch=]"
I think it's better to use String.substring():
public static String getFirstChars(String str, int n) {
if(str == null)
return null;
return str.substring(0, Math.min(n, str.length()));
}
If you really want to use a regex, here is an example:
public static String getFirstChars(String str, int n) {
    if (str == null)
        return null;
    Pattern pattern = Pattern.compile(String.format(".{1,%d}", n));
    Matcher matcher = pattern.matcher(str);
    // lookingAt() matches a prefix; matches() would require the whole string
    // to fit the pattern and so return null for anything longer than n chars
    return matcher.lookingAt() ? matcher.group(0) : null;
}
I personally would use the substring method of the String class too.
However, don't take it for granted that your string is at least 30 chars long; I'd guess that this may have been part of your problem:
String itemName = "lorem ipsum";
String itemDisplayName = itemName.substring(0, itemName.length() < 30 ? itemName.length() : 30);
System.out.println(itemDisplayName);
This makes use of the ternary operator, which takes a boolean condition, a "then" value and an "else" value. So if your string is shorter than 30 chars, we'll use the whole string and avoid a java.lang.StringIndexOutOfBoundsException.
Well, if you really need to use Matcher, then try:
Matcher matcher = Pattern.compile(".{1,30}").matcher("123456789012345678901234567890");
if (matcher.find()) {
String subject = matcher.group(0);
}
But it would be better to use the substring method:
String subject = "123456789012345678901234567890".substring(0, 30);
Use substring instead, guarding against strings shorter than 30 characters:
String str = "....";
String sub = str.substring(0, Math.min(30, str.length()));

How to recover integers?

I get a string and I have to retrieve the values from it.
I think we need to use ".split".
if (stringReceived.contains("ID")&& stringReceived.contains("Value")) {
Here is my string:
I/RECEIVER: [1/1/0 3
I/RECEIVER: :32:11]
I/RECEIVER: Timestam
I/RECEIVER: p=946697
I/RECEIVER: 531 ID=4
I/RECEIVER: 3 Value=
I/RECEIVER: 18
I receive the value 1 byte by 1 byte.
I would like to recover the values of Timestamp, ID and Value.
You can also use regex for that. Something like:
String example="[11/2/19 9:48:25] Timestamp=1549878505 ID=4 Value=2475";
Pattern pattern=Pattern.compile(".*Timestamp=(\\d+).*ID=(\\d+).*Value=(\\d+)");
Matcher matcher = pattern.matcher(example);
while(matcher.find()) {
System.out.println("Timestamp is:" + matcher.group(1));
System.out.println("Id is:" + matcher.group(2));
System.out.println("Value is:" + matcher.group(3));
}
If the order of the tokens can vary (for example, ID can come before Timestamp), that can be handled too. But since this looks like a log, which is probably structured, I doubt you will need to.
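If the order does vary, one possible approach is to match each Name=digits pair on its own and collect them into a map. The following is a sketch under that assumption; the class name, the method name and the choice of Long for the values are illustrative and not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {
    // Matches any "Name=digits" pair, wherever it appears in the line.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    static Map<String, Long> parse(String line) {
        Map<String, Long> values = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // group(1) is the key, group(2) the numeric value
            values.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return values;
    }

    public static void main(String[] args) {
        // The tokens are deliberately out of order here.
        Map<String, Long> v = parse("[11/2/19 9:48:25] ID=4 Value=2475 Timestamp=1549878505");
        // prints: 1549878505 4 2475
        System.out.println(v.get("Timestamp") + " " + v.get("ID") + " " + v.get("Value"));
    }
}
```

Looking values up by key means the code no longer depends on where each token sits in the line.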
First [11/2/19 9:48:25] seems unnecessary so let's remove it by jumping right into "Timestamp".
Using indexOf(), we can find where Timestamp starts.
// "Timestamp=1549878505 ID=4 Value=2475"
line = line.substring(line.indexOf("Timestamp"));
Since each string is separated by space, we can split it.
// ["Timestamp=1549878505", "ID=4" ,"Value=2475"]
line.split(" ");
Now, for each token, we can take the substring after '=' and parse it into an int.
for (String token : line.split(" ")) {
    int v = Integer.parseInt(token.substring(token.indexOf('=') + 1));
    System.out.println(v);
}
Hope that helps :)
String text = "Timestamp=1549878505 ID=4 Value=2475";
Pattern p = Pattern.compile("ID=(\\d+)"); // \\d+ so multi-digit IDs such as 43 are captured whole
Matcher m = p.matcher(text);
if (m.find()) {
    System.out.println(m.group(1));
}
output
4
A simple regex is also an option:
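private int fromString(String data, String key) {
    // \\d+ (not \\d*) so that a key with no digits is treated as "not found"
    // instead of passing an empty string to Integer.parseInt
    Pattern pattern = Pattern.compile(key + "=(\\d+)");
    Matcher matcher = pattern.matcher(data);
    if (matcher.find()) {
        return Integer.parseInt(matcher.group(1));
    }
    return -1;
}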
private void test(String data, String key) {
System.out.println(key + " = " + fromString(data, key));
}
private void test() {
String test = "[11/2/19 9:48:25] Timestamp=1549878505 ID=4 Value=2475";
test(test, "Timestamp");
test(test, "ID");
test(test, "Value");
}
prints:
Timestamp = 1549878505
ID = 4
Value = 2475
You can try that:
String txt= "[11/2/19 9:48:25] Timestamp=1549878505 ID=4 Value=2475";
String re1= ".*?\\d+.*?\\d+.*?\\d+.*?\\d+.*?\\d+.*?\\d+.*?(\\d+).*?(\\d+).*?(\\d+)";
Pattern p = Pattern.compile(re1,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String int1=m.group(1);
String int2=m.group(2);
String int3=m.group(3);
System.out.print("("+int1+")"+"("+int2+")"+"("+int3+")"+"\n");
}
Use the code below. You will find the timestamp at index 0, the ID at index 1 and the value at index 2 in the list.
Pattern pattern = Pattern.compile("=\\d+");
Matcher matcher = pattern.matcher(stringToMatch);
final List<String> matches = new ArrayList<>();
while (matcher.find()) {
String ans = matcher.group(0);
matches.add(ans.substring(1, ans.length()));
}
Explaining the regex:
= matches the character = literally
\d matches a digit (equal to [0-9])
+ Quantifier: matches between one and unlimited times, as many times as possible

Get an array of Strings matching a pattern from a String

I have a long string let's say
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now I want to get all the hashtags in an array. How can I use this expression to get an array of all the hashtags from the string, something like
ArrayList hashtags = getArray(pattern, str)
You could write it like this:
private static List<String> getArray(Pattern tagMatcher, String str) {
    Matcher m = tagMatcher.matcher(str);
    List<String> l = new ArrayList<String>();
    while (m.find()) {
        String s = m.group(); // will give you "#computer"
        s = s.substring(1);   // will give you just "computer"
        l.add(s);
    }
    return l;
}
You can also use \\w- instead of A-Za-z0-9-_, making the regex [#]+[\\w-]+\\b
This link would surely be helpful for achieving what you want.
It says:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
You got the hint now.
Here is one way, using Matcher:
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]
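A quick way to check that the simplified pattern behaves like the original is to run both over the sample sentence from the question and compare the results. This is only a sketch; findAll is a helper name made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagPatterns {
    // Illustrative helper: collects every full match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "I like this #computer and I want to buy it from #XXXMall.";
        List<String> original = findAll("[#]+[A-Za-z0-9-_]+\\b", text);
        List<String> simplified = findAll("#+[-\\w]+\\b", text);
        System.out.println(original);                    // [#computer, #XXXMall]
        System.out.println(simplified);                  // [#computer, #XXXMall]
        System.out.println(original.equals(simplified)); // true
    }
}
```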
You can use :
String val="I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while(matcher.find()){
list.add(matcher.group());
}
(?<=#) is a positive lookbehind: it asserts that the match is immediately preceded by a literal #, without including the # in the match.
You can use the following code to get the names:
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while(m.find()) {
String s = m.group(1);
System.out.println(s);
}
It will print
akka
kumar

regex capture groups returning as null after an OR operator

Matcher matcher = Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px|\\bbackground\\s*:\\s*#([0-9A-Fa-f]+)").matcher(myString);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
Example data:
myString = width:17px;background:#555;float:left; will produce null.
What I wanted:
matcher.group(1) = 17
matcher.group(2) = 555
I've just started using regex on Java, any help?
I would suggest splitting things up a bit.
Instead of building one large regex (maybe you will want to add more rules later?), you should split the string into multiple sections:
String myString = "width:17px;background:#555;float:left;";
String[] sections = myString.split(";"); // split string in multiple sections
for (String section : sections) {
// check if this section contains a width definition
if (section.matches("width\\s*:\\s*(\\d+)px.*")) {
System.out.println("width: " + section.split(":")[1].trim());
}
// check if this section contains a background definition
if (section.matches("background\\s*:\\s*#[0-9A-Fa-f]+.*")) {
System.out.println("background: " + section.split(":")[1].trim());
}
...
}
Here is a working example. Having | (or) in the regexp is usually confusing so I've added two more matchers to show how I would do it.
public static void main(String[] args) {
String myString = "width:17px;background:#555;float:left";
int matcherOffset = 1;
Matcher matcher = Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px|\\bbackground\\s*:\\s*#([0-9A-Fa-f]+)").matcher(myString);
while (matcher.find()) {
System.out.println("found something: " + matcher.group(matcherOffset++));
}
matcher = Pattern.compile("width:(\\d+)px").matcher(myString);
if (matcher.find()) {
System.out.println("found width: " + matcher.group(1));
}
matcher = Pattern.compile("background:#([0-9A-Fa-f]+)").matcher(myString); // hex digits, not just \\d
if (matcher.find()) {
System.out.println("found background: " + matcher.group(1));
}
}
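As for why group(2) was null in the question: each successful find() matches only one side of the |, so only the capture group on that side is set and the other one is null. With a single if (matcher.find()), the first match is the width rule, so group(2) has nothing in it. A sketch that keeps the original regex, loops over all matches and checks which group took part (the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssGroups {
    private static final Pattern RULE = Pattern.compile(
            "\\bwidth\\s*:\\s*(\\d+)px|\\bbackground\\s*:\\s*#([0-9A-Fa-f]+)");

    // Returns {width, background}; an entry stays null if that rule is absent.
    static String[] extract(String css) {
        String[] result = new String[2];
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            if (m.group(1) != null) {
                result[0] = m.group(1); // the width alternative matched
            } else if (m.group(2) != null) {
                result[1] = m.group(2); // the background alternative matched
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] r = extract("width:17px;background:#555;float:left;");
        // prints: width=17 background=555
        System.out.println("width=" + r[0] + " background=" + r[1]);
    }
}
```

Checking for null rather than counting matches also works when the rules appear in a different order or when one of them is missing.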

Need help with using regular expression in Java

I am trying to match patterns like '#(a-zA-Z0-9)+' but not ones like 'abc#test'.
So this is what I tried:
Pattern MY_PATTERN
= Pattern.compile("\\s#(\\w)+\\s?");
String data = "abc#gere.com #gogasig #jytaz #tibuage";
Matcher m = MY_PATTERN.matcher(data);
StringBuffer sb = new StringBuffer();
boolean result = m.find();
while(result) {
System.out.println (" group " + m.group());
result = m.find();
}
But I can only see '#jytaz', but not #tibuage.
How can I fix my problem? Thank you.
This pattern should work: \B(#\w+)
The \B requires a non-word boundary in front of the #, which rules out 'abc#gere'. The \w+ already excludes the trailing space. I've also moved the parentheses so that the # and the + end up inside the group. You should preferably use m.group(1) to get it.
Here's the rewrite:
Pattern pattern = Pattern.compile("\\B(#\\w+)");
String data = "abc#gere.com #gogasig #jytaz #tibuage";
Matcher m = pattern.matcher(data);
while (m.find()) {
System.out.println(" group " + m.group(1));
}
