Pattern matching for character and end of line - java

I have a string in the following format:
I am extracting this Hello:A;B;C, also Hello:D;E;F
How do I extract the strings A;B;C and D;E;F?
I have written the code snippet below, but it fails to extract the last match, D;E;F:
Pattern pattern = Pattern.compile("(?<=Hello:).*?(?=,)");

By default, $ matches at the end of the input (with Pattern.MULTILINE it also matches at the end of each line).
Thus this should work:
Pattern pattern = Pattern.compile("(?<=Hello:).*?(?=,|$)");
So you look-ahead for a comma or the end-of-line.
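As a minimal sketch of that look-ahead in a complete program (the class name HelloExtractor and the method extract are made up for illustration, not part of the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HelloExtractor {
    // Lazily match everything after "Hello:" up to the next comma or the end of the input.
    private static final Pattern VALUES = Pattern.compile("(?<=Hello:).*?(?=,|$)");

    // Hypothetical helper: collects every value that follows "Hello:".
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = VALUES.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group());
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [A;B;C, D;E;F]
        System.out.println(extract("I am extracting this Hello:A;B;C, also Hello:D;E;F"));
    }
}
```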

Try this:
String test = "I am extracting this Hello:Word;AnotherWord;YetAnotherWord, also Hello:D;E;F";
// any word optionally followed by ";" three times, the whole thing followed by either two non-word characters or EOL
Pattern pattern = Pattern.compile("(\\w+;?){3}(?=\\W{2,}|$)");
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output:
Word;AnotherWord;YetAnotherWord
D;E;F

Assuming you mean omitting certain patterns in a string:
String s = "I am extracting this Hello:A;B;C, also Hello:D;E;F" ;
ArrayList<String> tokens = new ArrayList<String>();
tokens.add( "A;B;C" );
tokens.add( "D;E;F" );
for (String tok : tokens) {
    if (s.contains(tok)) {
        s = s.replace(tok, "");
    }
}
System.out.println( s );

Related

Can't split a line in Java

I am facing a problem: I don't know how to split this line correctly. I only need RandomAdresas0 100 2018 1.
String line = Files.readAllLines(Paths.get(failas2)).get(userInp);
System.out.println(line);
arr = line.split("[\\s\\-\\.\\'\\?\\,\\_\\#]+");
Content in line:
[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1}]
You can try this code (basically extracting a string between two delimiters):
String ss = "[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1}]";
Pattern pattern = Pattern.compile("=(.*?)[,}]");
Matcher matcher = pattern.matcher(ss);
while (matcher.find()) {
    System.out.println(matcher.group(1).replace("'", ""));
}
This outputs:
RandomAdresas0
100
2018
1
Remove all the characters before '{', including the '{' itself.
Remove all the characters after '}', including the '}' itself.
You can do both with the indexOf and substring methods.
You will then be left with only the following:
pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1
After that, see the thread "Parse a string with key=value pair in a map?".
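A rough sketch of those steps, assuming that no value contains a comma or an equals sign (the class name KeyValueParser and the method parse are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {
    // Hypothetical helper: keeps the text between '{' and '}' and splits it into key=value pairs.
    static Map<String, String> parse(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no {...} section in: " + line);
        }
        String body = line.substring(open + 1, close);

        // LinkedHashMap keeps the fields in the order they appear in the line.
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length < 2) {
                continue; // assumption: silently skip anything that is not key=value
            }
            // Trim stray spaces and drop the single quotes around string values.
            fields.put(kv[0].trim(), kv[1].trim().replace("'", ""));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,"
                + "pastatoPastatymoData=2018, pastatoButuKiekis=1}]";
        // Prints one value per line: RandomAdresas0, 100, 2018, 1
        parse(line).values().forEach(System.out::println);
    }
}
```

A map also lets you pick a single field by name, for example parse(line).get("pastatoAdresas").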
Here is a solution using a regular expression and the Pattern & Matcher classes. The values you are after can be retrieved using the group() method and you get all values by looping as long as find() returns true.
String data = "[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1}]";
Pattern pattern = Pattern.compile("=([^, }]*)");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.printf("[%d:%d] %s%n", matcher.start(), matcher.end(), matcher.group(1));
}
The matched value is in group 1; group 0 is the whole match of the regex.

extract a set of a characters between some characters

I have a string email = John.Mcgee.r2d2#hitachi.com
How can I write Java code using a regex to extract just the r2d2?
I tried this but got an error in Eclipse:
String email = John.Mcgee.r2d2#hitachi.com
Pattern pattern = Pattern.compile(".(.*)\#");
Matcher matcher = patter.matcher
for (Strimatcher.find()){
System.out.println(matcher.group(1));
}
To match after the last dot in a potential sequence of multiple dots, require that the sequence you capture does not contain a dot:
(?<=[.])([^.]*)(?=#)
(?<=[.]) means "preceded by a single dot"
(?=#) means "followed by # sign"
Note that since dot . is a metacharacter, it needs to be escaped either with \ (doubled for Java string literal) or with square brackets around it.
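For illustration, here is how that expression might look inside a small Java program (the class name LastSegment and the method beforeHash are invented for this sketch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegment {
    // (?<=[.]) : preceded by a dot
    // ([^.]*)  : a run of characters that contains no dot
    // (?=#)    : followed by a # sign
    private static final Pattern SEGMENT = Pattern.compile("(?<=[.])([^.]*)(?=#)");

    // Hypothetical helper: returns the segment right before '#', or null if there is none.
    static String beforeHash(String email) {
        Matcher matcher = SEGMENT.matcher(email);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // prints r2d2
        System.out.println(beforeHash("John.Mcgee.r2d2#hitachi.com"));
    }
}
```

Because the look-behind and look-ahead consume nothing, group(0) and group(1) hold the same text here.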
Not sure if you're posting the right code. I'll rewrite it based on what it should look like, though:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^.]+)#");
Matcher matcher = pattern.matcher(email);
while (matcher.find()) {
    // group(1) is the text captured by the parentheses in the current match
    System.out.println(matcher.group(1));
}
but I think you just want something like this:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^.]+)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
There is no need for Pattern; replaceAll is enough with the regex .*\.([^\.]+)#.* , which captures the group ([^\.]+) (one or more characters other than a dot) that sits between a dot \. and #:
email = email.replaceAll(".*\\.([^\\.]+)#.*", "$1");
Output
r2d2
If you want to go with Pattern then you have to use this regex \\.([^\\.]+)# :
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^\\.]+)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // Output: r2d2
}
Another solution is to use split:
String[] split = email.replaceAll("#.*", "").split("\\.");
email = split[split.length - 1];// Output : r2d2
Notes:
Strings in Java must be enclosed in double quotes: "John.Mcgee.r2d2#hitachi.com"
You don't need to escape # in Java, but you do have to escape the dot with a double backslash: \\.
There is no for-loop syntax like for (Strimatcher.find()){; you probably meant while.

How to extract a substring using regex in java

I have the following string:
String xmlnode = "<firstname id=\"{$person.id}\"> {$person.firstname} </firstname>";
How can I write a regex to extract the data inside each {$STRING_I_WANT}?
I need the part without the {$ and }. How can I achieve that?
You can use this regex \{\$(.*?)\} with pattern like this :
String xmlnode = "<firstname id=\"{$person.id}\"> {$person.firstname} </firstname>";
Pattern pattern = Pattern.compile("\\{\\$(.*?)\\}");
Matcher matcher = pattern.matcher(xmlnode);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Note: you have to escape each of the characters { $ } with \ because each one is a special character in regex.
Outputs
person.id
person.firstname
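If the backslashes become hard to read, one alternative sketch is to let Pattern.quote escape the literal delimiters. This swaps hand-written escapes for a standard library call; the class name PlaceholderNames and the method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderNames {
    // Pattern.quote turns "{$" and "}" into literal text, so no manual escaping is needed.
    private static final Pattern PLACEHOLDER =
            Pattern.compile(Pattern.quote("{$") + "(.*?)" + Pattern.quote("}"));

    // Hypothetical helper: returns the text inside every {$...} occurrence.
    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String xmlnode = "<firstname id=\"{$person.id}\"> {$person.firstname} </firstname>";
        // prints [person.id, person.firstname]
        System.out.println(names(xmlnode));
    }
}
```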

Wildcard match and replace in Java

I want to check a string to see if it contains $wildcard$, and ONLY if it does, extract the value between the "$ $", which I'll use to look up a replacement. Then I want to build the new string with that replacement in place (removing the $ $ as well).
Edit: I managed to get this working:
String subject = "test/$name$/something";
String replace = "foo_bar";
Pattern regex = Pattern.compile("(\\$).*?(\\$)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
    String something = m.group(0);
    System.out.println(something);
    m.appendReplacement(b, replace);
}
m.appendTail(b);
String replaced = b.toString();
System.out.println(replaced);
Gives me the output of
$name$
test/foo_bar/something
I could use substring to remove the leading/trailing $, but is there a way to split these into groups so that I can get just what is between the $ $, while the initial check still ensures there is both a start and an end $?
Add another matching group for the content of the tag:
Pattern.compile("(\\$)(.*?)(\\$)");
Remove the unnecessary capturing groups around both \\$, and put the capturing group on the part that matches what is between the two $ chars (the most efficient construct here is a negated character class, [^$]). Then just grab the value with .group(1):
String subject = "test/$name$/something";
String replace = "foo_bar";
Pattern regex = Pattern.compile("\\$([^$]*)\\$"); // ONLY 1 GROUP ROUND [^$]*
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
    String something = m.group(1); // ACCESS GROUP 1
    System.out.println(something);
    m.appendReplacement(b, replace);
}
m.appendTail(b);
String replaced = b.toString();
System.out.println(replaced);
Result:
name
test/foo_bar/something
Pattern details
\\$ - a $ char
([^$]*) - Capturing group 1 matching zero or more chars other than $ char
\\$ - a $ char.
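Since the goal is to look up a replacement by the captured name, the same pattern can be combined with a Map. The following is only a sketch: the class name DollarTemplate, the method expand, and the choice to leave unknown names untouched are assumptions, not something the question specifies. Matcher.quoteReplacement is used because appendReplacement treats $ and \ in the replacement text specially.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$([^$]*)\\$");

    // Hypothetical helper: replaces each $name$ with values.get(name).
    static String expand(String subject, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(subject);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // Assumption: a name with no entry in the map is kept as-is, including its $ signs.
            String replacement = values.getOrDefault(key, m.group());
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "foo_bar");
        // prints test/foo_bar/something
        System.out.println(expand("test/$name$/something", values));
    }
}
```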
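The syntax is slightly different from what you're asking for, but check out Apache Commons Text: https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StrSubstitutor.html (newer releases deprecate StrSubstitutor in favor of StringSubstitutor).
This will let you do things like the following (ImmutableMap here comes from Guava):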
Map<String,String> substitutions = ImmutableMap.of("name", "foo_bar");
String template = "/test/${name}/something";
StrSubstitutor substitutor = new StrSubstitutor(substitutions);
System.out.println(substitutor.replace(template));
You could build your own Map to populate with your substitution values.

Regex composition

I want to parse a line from a CSV (comma-separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file and replace the commas between the quotes with ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code, but it doesn't parse it correctly:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
    String replacedMatch = matcher.group();
    String gr1 = matcher.group(1);
    gr1.trim();
    replacedMatch = replacedMatch.replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}
The output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
Does anyone have any idea how to fix this?
This is my solution for replacing , with ; inside quotes. It assumes that if a " appears in a quoted string, it is escaped by another ". This property ensures that, counting from the start to the current character, if the number of quotes " is odd, then that character is inside a quoted string.
// Test string, with the tricky case """", which resolves to
// a length-1 string containing one double quote "
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
    // System.out.println(m.group() + "\n " + m.start() + " " + m.end());
    output
        .append(line.substring(start, matcher.start())) // Append unrelated contents
        .append(matcher.group().replaceAll(",", ";")); // Append replaced string
    start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find any case where replacing the matched group the way you did in line = line.replace(matcher.group(), replacedMatch); would fail, I feel safer rebuilding the string from scratch.
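The quote-parity property mentioned in this answer can also be applied directly, without a regex, by toggling a flag on every " while scanning the line. This is a sketch of that idea rather than the answerer's code; the class name QuotedCommaReplacer and the method name are hypothetical, and it relies on the same assumption that a literal " inside a field is written as "".

```java
class QuotedCommaReplacer {
    // Hypothetical helper: replaces commas with semicolons only inside quoted fields.
    static String replaceQuotedCommas(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false; // true while an odd number of quotes has been seen
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            out.append(c == ',' && inQuotes ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
        // prints Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
        System.out.println(replaceQuotedCommas(line));
    }
}
```

An escaped pair "" flips the flag twice, so the scan is still inside the field afterwards.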
Here's a way:
import java.util.regex.*;
class Main {
    public static void main(String[] args) {
        String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
        String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
        Matcher matcher = Pattern.compile(regex).matcher(in);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            out.append(matcher.group().replace(',', ';')).append(',');
        }
        out.deleteCharAt(out.length() - 1);
        System.out.println(in + "\n" + out);
    }
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is what you need:
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while (matcher.find()) {
    // Replace the commas inside this quoted field only
    String replacedMatch = matcher.group().replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}
line will then have the value you need.
Have you tried to make the RegExp lazy?
Another idea: inside the [] you should use a " too. If you do that, you should have the expected output with global flag set.
Your regex is faulty. Why would you want to make sure there is no ] within the "..." expression? You'd rather make the regex reluctant (the default is greedy, which means it matches as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right: you should use a proper CSV library to parse the line and replace , with ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
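To make the greedy versus reluctant difference concrete, here is a small comparison on a shortened sample line (the class name GreedyVsReluctant, the method findAll, and the sample input are made up for this sketch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "a,\"b, c\",\"d, e\",f";
        // Greedy: one match running from the first quote to the last quote.
        System.out.println(findAll("\".*\"", line));
        // Reluctant: stops at the nearest closing quote, so two matches.
        System.out.println(findAll("\".*?\"", line));
        // Negated character class: also two matches, with no backtracking over quotes.
        System.out.println(findAll("\"[^\"]*\"", line));
    }
}
```

The first call returns a single match spanning both quoted fields, which is why the comma between the fields was also replaced in the question's output.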
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.
