Find pattern text in Java and replace it with another pattern

I have a paragraph of text containing numbers in a specific format,
e.g. "123-21-1234 this is another text - some text 222-34-2244 another text".
I need to select those numbers (123-21-1234 and 222-34-2244) and convert them, so that the result is "123/21/1234 this is another text - some text 222/34/2244 another text".

You can try something like the code below, using Matcher.appendReplacement:
public static void main(String[] args) {
String str = "123-21-1234 this is another text - some text 222-34-2244 another text";
Pattern p = Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String num = m.group();
m.appendReplacement(sb, num.replace('-', '/'));
}
m.appendTail(sb);
System.out.println(sb.toString());
}

Using .replaceAll("-", "/") on the whole string has an annoying side effect: it also replaces hyphens that are not part of a number, such as the one in "text - some text".
Instead, you can look for the string literals to replace, or craft your own regex:
string.replaceAll("123-21-1234", "123/21/1234").replaceAll("222-34-2244", "222/34/2244");
If you wish to match any XXX-XX-XXXX pattern:
string.replaceAll("(\\d{3})-(\\d{2})-(\\d{4})", "$1/$2/$3");
This works by matching the digit sequences and capturing them in groups, which the replacement string then refers to ($0 is the whole match, $1 is the first pair of parentheses, $2 is the second, and so on).
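To make the difference concrete, here is a minimal sketch that runs both variants on the sample string from the question. The class and method names (SlashNumbers, toSlashes) are made up for illustration; only Pattern and Matcher.replaceAll come from the JDK.

```java
import java.util.regex.Pattern;

class SlashNumbers {
    // Compiled once; three capturing groups, one per digit run.
    private static final Pattern NUMBER =
            Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");

    // Hypothetical helper: rewrites only XXX-XX-XXXX numbers.
    static String toSlashes(String text) {
        return NUMBER.matcher(text).replaceAll("$1/$2/$3");
    }

    public static void main(String[] args) {
        String str = "123-21-1234 this is another text - some text 222-34-2244 another text";
        // Naive version: the lone hyphen between the words is replaced too.
        System.out.println(str.replace('-', '/'));
        // Grouped version: the lone hyphen is left alone.
        System.out.println(toSlashes(str));
    }
}
```

Compiling the pattern once is only a convenience if the method is called often; String.replaceAll with the same regex gives the same result.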

Related

How to parse a string to get array of #tags out of the string?

So I have a string like
"#tag1 #tag2 #tag3 not_tag1 not_tag2 #tag4" (the space between tag2 and tag4 is to indicate there can be many spaces). From this string I want to parse just tag1, tag2 and so on. They are similar to the #tags we see on LinkedIn or other social media. Is there an easy way to do this using regex or any other function in Java, or should I do it the hard way (i.e. using loops and conditions)?
The tag format is "#" (to indicate the start of a tag) and a space " " (to indicate the end of the tag). In between there can be letters or digits, but the first character must be a letter.
example,
input : "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4"
output : ["tag1", "tag2", "tag3", "tag4"]
Match with the regex "#\w+".
EDIT: this is the correct regex, but split (as originally suggested here) is not the right method.
This is the same solution javadev suggested, but use this instead:
String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
Matcher matcher = Pattern.compile("#\\w+").matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
The output includes the leading #, as expected. Note that \w also matches digits, so #12tag is printed as well.
Maybe something like:
public static void main(String[] args) {
    String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
    // [A-Za-z] rather than [A-z]: the range A-z also covers [ \ ] ^ _ and `
    Pattern pattern = Pattern.compile("#([A-Za-z][A-Za-z0-9]*) *");
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        // group(1) is the tag name without the leading #
        System.out.println(matcher.group(1));
    }
}
Worked for me :)
Output:
tag1
tag2
tag3
tag4
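Since the question asks for an array of tags rather than printed lines, here is a small sketch that collects the matches into a list. The class and method names (TagExtractor, extractTags) are invented for this example, and the lookahead that requires whitespace or the end of the input after a tag is my reading of the "space ends the tag" rule, not something taken from the answers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // '#', then a letter, then letters or digits,
    // followed by whitespace or the end of the input (not consumed).
    private static final Pattern TAG =
            Pattern.compile("#([A-Za-z][A-Za-z0-9]*)(?=\\s|$)");

    // Hypothetical helper: returns the tag names without the leading '#'.
    static List<String> extractTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String input = "#tag1 #tag2   #tag3 not_tag1 not_tag2 #12tag #tag4";
        System.out.println(extractTags(input)); // prints [tag1, tag2, tag3, tag4]
    }
}
```

If a real array is needed, tags.toArray(new String[0]) converts the list.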

JAVA Get text from String

Hi, I get this String from the server:
id_not="autoincrement"; id_obj="-"; id_tr="-"; id_pgo="-"; typ_not=""; tresc="Nie wystawił"; datetime="-"; lon="-"; lat="-";
I need to create a new String, e.g. String word, and assign it the value I get from tresc="Nie wystawił" in that string.
As @Jan suggested in a comment, you can use a regex, for example:
String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; tresc=\"Nie wystawił\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
Pattern p = Pattern.compile("tresc(.*?);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
tresc="Nie wystawił";
If you want to get only the value of tresc you can use :
Pattern p = Pattern.compile("tresc=\"(.*?)\";");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Output
Nie wystawił
Something along the lines of
Pattern p = Pattern.compile("tresc=\"([^\"]+)\"");
Matcher m = p.matcher(stringFromServer);
if (m.find()) {
    String whatYouWereLookingfor = m.group(1);
}
should do the trick. JSON parsing might be much better in the long run if you need additional values.
Your question is unclear, but I think you get a string from the server and want the value of tresc from it. You can first search for tresc in the string you get, like:
serverString.substring(serverString.indexOf("tresc") + x, serverString.length());
Here, replace x with how many characters further along you want to start picking characters.
Read up on substring and delimiters.
As the values are separated by semicolons, another solution could be:
int start = serverString.indexOf("tresc");
// indexOf returns -1 if the text is not found in the string,
// so check and account for that before calling substring
int end = (start == -1) ? -1 : serverString.indexOf(";", start);
if (end != -1) {
    String subString = serverString.substring(start, end);
}
Here start is the index where tresc begins and end is the index of the semicolon that follows it, so subString holds the tresc part.
You can then use it any way you want.
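If more than one value is needed, one option is to read every key="value" pair into a map in a single pass. This is only a sketch: KeyValueParser and parse are names made up here, and it assumes that values never contain a double quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // One key="value" pair; group 1 is the key, group 2 the (possibly empty) value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Hypothetical helper: collects all pairs, keeping their original order.
    static Map<String, String> parse(String serverString) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(serverString);
        while (m.find()) {
            values.put(m.group(1), m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened sample; the Unicode escape stands for the Polish letter l with stroke.
        String str = "id_not=\"autoincrement\"; typ_not=\"\"; tresc=\"Nie wystawi\u0142\"; lat=\"-\";";
        Map<String, String> values = parse(str);
        String word = values.get("tresc");
        System.out.println(word);
    }
}
```

A missing key simply yields null from values.get, so the caller can decide how to handle it.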

Replace group 1 of a Java regex without replacing the entire match

I have a regex pattern that will have only one group. I need to find the text in the input string that follows the pattern and replace ONLY match group 1. For example, I have the regex pattern and the string it is applied to shown below. The replacement string is "<--->".
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");
The expected result is
plan p<--->s <--->der p<--->itia
I tried the following approaches:
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll(m.group(1),"<--->");
}
System.out.print(result);
This gives result as
p<---> p<--->s <--->der p<--->itia
Another approach
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll("\\w*(lan)\\w+","<--->");
}
System.out.print(result);
Result is
plan <---> <---> <--->
I have gone through this link. There, the part of the string before the match is always constant ("foo"), but in my case it varies. I have also looked at this and this, but I am unable to apply any of the solutions given there to my present scenario.
Any help is appreciated
You need to use the following pattern with capturing groups:
(\w*)lan(\w+)
^-1-^   ^-2-^
and replace with $1<--->$2
See the regex demo
The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.
Java demo:
String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia
If you need to be able to replace the Group 1 and keep the rest, you may use the replace callback method emulation with Matcher#appendReplacement:
String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia
See another Java demo
Here, since we process the input match by match, we should only replace the Group 1 contents once with replaceFirst, and since we replace the substring as a literal, we should Pattern.quote it.
To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().
That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.
    start(1)
    ↓   end(1)
    ↓   ↓
\\w*(lan)\\w+
↑            ↑
start()      end()
You can then extract the values to keep.
String input = "plan plans lander planitia";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
"<--->" +
input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();
System.out.println(output);
Output
plan p<--->s <--->der p<--->itia
If you don't like that it uses the original string, you can use the matched substring instead.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
String match = m.group();
int start = m.start();
m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
"<--->" +
match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();
While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep it anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done using a lookahead, like (?=\w), so we actually only match lan with a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).
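A short sketch of the lookahead idea, using the sample input from the question. The class and method names are made up. For this input the result matches the expected output; a word that contains lan more than once would have every qualifying occurrence replaced, which differs from the original one-match-per-word pattern.

```java
class LookaheadReplace {
    // Hypothetical helper: replaces "lan" only when a word character follows.
    // The lookahead (?=\w) checks the next character without consuming it.
    static String replaceLan(String input) {
        return input.replaceAll("lan(?=\\w)", "<--->");
    }

    public static void main(String[] args) {
        System.out.println(replaceLan("plan plans lander planitia"));
        // prints: plan p<--->s <--->der p<--->itia
    }
}
```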
I like the other solutions. This is a slightly optimised, more robust version:
public static void main (String [] args) {
int groupPosition = 1;
String replacement = "foo";
Pattern r = Pattern.compile("foo(bar)");
Matcher m = r.matcher("bar1234foobar1234bar");
StringBuffer sb = new StringBuffer();
while (m.find()) {
StringBuffer buf = new StringBuffer(m.group());
buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement);
m.appendReplacement(sb, buf.toString());
}
m.appendTail(sb);
System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}
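One thing to keep in mind with appendReplacement is that it treats $ and \ in the replacement string specially, so a replacement containing those characters needs Matcher.quoteReplacement. A possible way around that, sketched below with an invented helper name (replaceGroup), is to skip appendReplacement and copy the pieces into a StringBuilder by index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReplacer {
    // Hypothetical helper: replaces only the given group of every match.
    // The replacement is inserted literally, so '$' and '\' need no escaping.
    static String replaceGroup(Pattern pattern, String input, int group, String replacement) {
        Matcher m = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text copied so far
        while (m.find()) {
            if (m.start(group) < 0) {
                continue; // the group did not take part in this match
            }
            out.append(input, last, m.start(group)).append(replacement);
            last = m.end(group);
        }
        out.append(input, last, input.length()); // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w*(lan)\\w+");
        System.out.println(replaceGroup(p, "plan plans lander planitia", 1, "<--->"));
        // prints: plan p<--->s <--->der p<--->itia
    }
}
```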

Pattern/Matcher in Java?

I have a certain text in Java, and I want to use pattern and matcher to extract something from it. This is my program:
public String getItemsByType(String text, String start, String end) {
String patternHolder;
StringBuffer itemLines = new StringBuffer();
patternHolder = start + ".*" + end;
Pattern pattern = Pattern.compile(patternHolder);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
itemLines.append(text.substring(matcher.start(), matcher.end())
+ "\n");
}
return itemLines.toString();
}
This code works fully WHEN the searched text is on the same line, for instance:
String text = "My name is John and I am 18 years Old";
getItemsByType(text, "My", "John");
immediately grabs the text "My name is John" out of the text. However, when my text looks like this:
String text = "My name\nis John\nand I'm\n18 years\nold";
getItemsByType(text, "My", "John");
It doesn't grab anything, since "My" and "John" are on different lines. How do I solve this?
Use this instead:
Pattern.compile(patternHolder, Pattern.DOTALL);
From the javadoc, the DOTALL flag means:
Enables dotall mode.
In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
Use Pattern.compile(patternHolder, Pattern.DOTALL) to compile the pattern. This way the dot will match the newline. By default, newline is treated in a special way and not matched by the dot.
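A minimal sketch showing the effect of the flag on the text from the question. The class and method names (DotAllExample, firstMatch) are invented; the flags parameter is just there so both variants can be run side by side.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllExample {
    // Hypothetical helper: returns the first match of start + ".*" + end, or null.
    static String firstMatch(String text, String start, String end, int flags) {
        Matcher m = Pattern.compile(start + ".*" + end, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "My name\nis John\nand I'm\n18 years\nold";
        // Without DOTALL the dot stops at the first line terminator: no match.
        System.out.println(firstMatch(text, "My", "John", 0));
        // With DOTALL the dot also matches '\n', so the match spans two lines.
        System.out.println(firstMatch(text, "My", "John", Pattern.DOTALL));
    }
}
```

Note that ".*" is greedy, so with DOTALL it runs to the last occurrence of the end text in the whole input; ".*?" stops at the first one.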

Escape special characters in java

I have a text file that uses | (pipe) as the separator. If I read a column and the column itself also contains |, then the split creates an extra column.
Example :
name|date|age
zzz|20-03-22|23
"xx|zz"|23-23-33|32
How can I escape the character within the double quotes ""
how to escape the regular expression used in the split, so that it works for user-specified delimiters
I have tried:
String[] cols = line.split("\|");
System.out.println("lets see column only=="+cols[1]);
Here's one approach:
String str = "\"xx|zz\"|23-23-33|32";
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find())
m.appendReplacement(sb, m.group().replace("|", "\\\\|"));
m.appendTail(sb);
System.out.println(sb); // prints "xx\|zz"|23-23-33|32
In order to get the columns back you'd do something like this:
String str = "\"xx\\|zz\"|23-23-33|32";
String[] cols = str.split("(?<!\\\\)\\|");
for (String col : cols)
System.out.println(col.replace("\\|", "|"));
Regarding your edit:
how to escape the regular expression used in the split, so that it works for user-specified delimiters
You should use Pattern.quote on the string you want to split on:
String[] cols = line.split(Pattern.quote(delimiter));
This will ensure that the split works as intended even if delimiter contains special regex-symbols such as . or |.
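For illustration, a small sketch of splitting on a user-specified delimiter. QuoteSplit and splitOn are made-up names; Pattern.quote and String.split are the only JDK calls involved.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteSplit {
    // Hypothetical helper: treats the delimiter as literal text, not as a regex.
    static String[] splitOn(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn("zzz|20-03-22|23", "|")));
        // prints: [zzz, 20-03-22, 23]
        System.out.println(Arrays.toString(splitOn("a.b.c", ".")));
        // prints: [a, b, c]
    }
}
```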
You can use a CSV parser like OpenCSV or Commons CSV:
http://opencsv.sourceforge.net
http://commons.apache.org/sandbox/csv
You can replace it with its Unicode escape sequence (prior to splitting on the pipe).
But what you should do is adjust your parser to take that into account, rather than changing the files.
Here is one way to parse it
String str = "zzz|20-03-22|23 \"xx|zz\"|23-23-33|32";
String regex = "(?<=^|\\|)(([^\"]*?)|([^\"]+\"[^\"]+\".*?))(?=\\||$)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
Output:
zzz
20-03-22
23 "xx|zz"
23-23-33
32
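Another possible approach, not taken from the answers above, is to split only on pipes that sit outside double quotes, by requiring an even number of quotes after the separator. The names below are invented, and the sketch assumes quotes are balanced and never escaped inside a column.

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // A pipe is a separator only if an even number of double quotes
    // follows it up to the end of the line, i.e. it is outside any quoted part.
    private static final String SEPARATOR =
            "\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one line into columns, keeping quoted pipes.
    static String[] splitColumns(String line) {
        return line.split(SEPARATOR);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitColumns("\"xx|zz\"|23-23-33|32")));
        // prints: ["xx|zz", 23-23-33, 32]
    }
}
```

The lookahead rescans the rest of the line for every pipe, so for large files a CSV library or a simple character loop is likely the better choice.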
