regular expressions escape on character - java

I have to separate a big list of emails and names. I have to split on commas, but some names contain commas, so I have to deal with those first. Luckily, such names are between "quotes".
At the moment my regex gives output like this, for example (edit: I see the forum doesn't display the emails!):
"Talboom, Esther"
"Wolde, Jos van der"
"Debbie Derksen" <deberken@casema.nl>, corine <corine5@xs4all.nl>, "
The last one went wrong because the name has no comma, so the match continues until it finds one, and that is the comma I want to use as the separator. So I want the match to look only until it finds '<'.
How can I do that?
import java.util.regex.Pattern;
import java.util.regex.Matcher;
String test = "\"Talboom, Esther\" <E.Talboom@wegener.nl>, \"Wolde, Jos van der\" <J.vdWolde@wegener.nl>, \"Debbie Derksen\" <deberken@casema.nl>, corine <corine5@xs4all.nl>, \"Markies Aart\" <A.Markies@wegenernieuwsmedia.nl>";
Pattern pattern = Pattern.compile("\".*?,.*?\"");
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Edit: here is a better line to work with, since not all entries have a name or quotes:
String test = "\"Talboom, Esther\" <E.Talboom@wegener.nl>, DRP - Wouter Haan <wouter@drp.eu>, \"Wolde, Jos van der\" <J.vdWolde@wegener.nl>, \"Debbie Derksen\" <deberken@casema.nl>, corine <corine5@xs4all.nl>, clankilllller@gmail.com, \"Markies Aart\" <A.Markies@wegenernieuwsmedia.nl>";

I would simplify the code by using String.split and String.replaceAll. This avoids the hassle of working with a Pattern and makes the code neat and brief.
Try this:
public static void main(String[] args) {
    String test = "\"Talboom, Esther\" <E.Talboom@wegener.nl>, \"Wolde, Jos van der\" <J.vdWolde@wegener.nl>, \"Debbie Derksen\" <deberken@casema.nl>, corine <corine5@xs4all.nl>, \"Markies Aart\" <A.Markies@wegenernieuwsmedia.nl>";
    // Split up into each person's details
    String[] nameEmailPairs = test.split(",\\s*(?=\")");
    for (String nameEmailPair : nameEmailPairs) {
        // Extract exactly the parts you need from the person's details
        String name = nameEmailPair.replaceAll("\"([^\"]+)\".*", "$1");
        String email = nameEmailPair.replaceAll(".*<([^>]+).*", "$1");
        System.out.println(name + " = " + email);
    }
}
Output:
Talboom, Esther = E.Talboom@wegener.nl
Wolde, Jos van der = J.vdWolde@wegener.nl
Debbie Derksen = corine5@xs4all.nl
Markies Aart = A.Markies@wegenernieuwsmedia.nl
Note that this works for the quoted names only: the split happens just before a quote, so the unquoted entry for corine stays attached to the Debbie Derksen pair, and the greedy .* in the email pattern then picks the last address in that pair.
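For the edited input in the question, where some entries have an unquoted name or only a bare address, one option is to split on every comma that lies outside a quoted name. The following is a minimal sketch rather than a full mail-header parser: it assumes quotes are always balanced and never escaped, the class and method names are made up for illustration, and the sample addresses are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class AddressListSplitter {
    // A comma counts as a separator only if an even number of quotes follows it,
    // which means the comma itself is not inside a quoted name.
    private static final String SEPARATOR = ",\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    /** Returns {name, email} pairs; the name is empty for bare addresses. */
    static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String entry : line.split(SEPARATOR)) {
            String trimmed = entry.trim();
            int lt = trimmed.indexOf('<');
            int gt = trimmed.indexOf('>', lt + 1);
            if (lt >= 0 && gt > lt) {
                // Everything before '<' is the name; drop the surrounding quotes.
                String name = trimmed.substring(0, lt).replace("\"", "").trim();
                pairs.add(new String[] {name, trimmed.substring(lt + 1, gt)});
            } else {
                // No angle brackets: treat the whole entry as a bare address.
                pairs.add(new String[] {"", trimmed});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String test = "\"Talboom, Esther\" <esther@example.com>, "
                + "DRP - Wouter Haan <wouter@example.com>, someone@example.com";
        for (String[] pair : parse(test)) {
            System.out.println(pair[0] + " = " + pair[1]);
        }
    }
}
```

The lookahead rescans the rest of the line for every comma, so for very long lists a simple character loop that tracks whether it is inside quotes may be the better choice.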

Related

Regex to remove line break within double quote in CSV

Hi, I have a CSV file with an error in it, so I want to correct it with a regular expression. Some of the fields contain a line break, as in the example below:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy
California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
The above two lines should be one line:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre PkwyCalifornia",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
I tried the regex below, but it didn't help:
%s/\\([^\"]\\)\\n/\\1/
Try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static void main(String[] args) {
    String input = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
            + ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
            + "California\",,\"Mountain View\",,\"United\n"
            + "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
    // Find quoted fields that contain a line break
    Matcher matcher = Pattern.compile("\"([^\"]*[\n\r].*?)\"").matcher(input);
    Pattern patternRemoveLineBreak = Pattern.compile("[\n\r]");
    String result = input;
    while (matcher.find()) {
        String quoteWithLineBreak = matcher.group(1);
        String quoteNoLineBreaks = patternRemoveLineBreak.matcher(quoteWithLineBreak).replaceAll(" ");
        // replace() treats both arguments as literal text; replaceFirst() would interpret the field as a regex
        result = result.replace(quoteWithLineBreak, quoteNoLineBreaks);
    }
    // Output
    System.out.println(result);
}
Output:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
Write a regex in which the text you want to keep is wrapped in parentheses; each pair of parentheses creates a capturing group. Then rebuild the string in the replacement, using the group references to compose it as you wish.
String test = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
+ ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
+ "California\",,\"Mountain View\",,\"United\n"
+ "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
System.out.println(test.replaceAll("(\"[^\"]*)\n([^\"]*\")", "$1$2"));
So when we replace the matching string ("United\nStates") with $1$2, the line break is removed because it does not belong to either group:
$1 => the first group (\"[^\"]*), which matches "United
$2 => the second group ([^\"]*\"), which matches States"
Based on this you can try with:
/\r?\n|\r/
I checked it and it seems to work.
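An alternative to the regex answers is to walk the text once and track whether the current character is inside a quoted field. As far as I can tell, a pattern like (\"[^\"]*)\n([^\"]*\") can also start matching at a closing quote, so in a file with several records it may join the end of one record to the start of the next. The sketch below avoids that. It assumes quotes are balanced and that an embedded quote is written as "" (which toggles the state twice and so leaves it unchanged); the class and method names are hypothetical.

```java
class CsvLineBreakFixer {
    /** Replaces line breaks that occur inside quoted fields with the given text. */
    static String joinQuotedLineBreaks(String csv, String replacement) {
        StringBuilder out = new StringBuilder(csv.length());
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                // Treat \r\n as a single line break.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(replacement);
            } else {
                // Line breaks outside quotes are record separators and are kept.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\"Investigating\",\"1600 Amphitheatre Pkwy\nCalifornia\"\n\"next\",\"record\"\n";
        System.out.print(joinQuotedLineBreaks(input, ""));
    }
}
```

Passing "" as the replacement gives the joined form asked for in the question (PkwyCalifornia); passing " " keeps the words apart.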

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a Java regex to capture some groups of words from a String using a Matcher.
Say I have this string: "Hello, we are #happy# to see you today".
I would like to get two groups of matches, one containing
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.
From comment:
I may encounter a string with more than one instance of words wrapped in #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different styling, the code needs to process each block of text separately, and needs to know whether the text is inside or outside a #..# section.
Note that the following code silently skips the last # if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
    if (m.start(1) != -1) {
        String outsideText = m.group(1);
        System.out.println("Outside: \"" + outsideText + "\"");
    } else {
        String insideText = m.group(2);
        System.out.println("Inside: \"" + insideText + "\"");
    }
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"
The best I could do is splitting the string into three parts, then merging groups 1 and 4:
(^.*)(\#(.+?)\#)(.*)
EDIT: Taking remarks from the comments :
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to @Lino, we no longer capture the useless group that includes the # characters, and in the first and last groups we capture anything except # instead of any character.
Is this solution fine?
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher = pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
    if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
    if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string : notBetween) {
    System.out.println(string);
}
System.out.println("Printing group 2");
for (String string : between) {
    System.out.println(string);
}
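Since the stated goal is to style the inside and outside text differently, the same alternation can also build the styled result in one pass. This is only a sketch: the <b> markup stands in for whatever formatting is actually needed, and the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashStyler {
    // Group 1: a run of text outside #...#; group 2: the text between a pair of #.
    private static final Pattern SEGMENT = Pattern.compile("([^#]+)|#([^#]+)#");

    static String style(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = SEGMENT.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1));                              // outside text, left as is
            } else {
                out.append("<b>").append(m.group(2)).append("</b>"); // inside text, emphasised
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Hello, we are <b>happy</b> to see you today
        System.out.println(style("Hello, we are #happy# to see you today"));
    }
}
```

As with the answers above, an unpaired # is silently dropped from the result.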

Accept everything in java if condition

Today I wrote a program that automatically checks whether a Netflix account is working or not. But I'm struggling at a point where I need to accept all the country codes in the URL. I wanted to use something like * in Linux, but my IDE is giving me errors. What is the solution, and are there better ways?
WebUI.openBrowser('')
WebUI.navigateToUrl('https://www.netflix.com/login')
WebUI.setText(findTestObject('/Page_Netflix/input_email'), 'example#gmail.com')
WebUI.setText(findTestObject('/Page_Netflix/input_password'), '1234')
WebUI.click(findTestObject('/Page_Netflix/button_Sign In'))
TimeUnit.SECONDS.sleep(10)
if (WebUI.getUrl() == "https://www.netflix.com/" + * + "-" + * + "/login") {
}
WebUI.closeBrowser()
So this is your attempt:
if (WebUI.getUrl() == "https://www.netflix.com/" + * + "-" + * + "/login") {
}
which fails, as you can't just use * like that (in addition to using ==, which isn't how you should compare strings in Java). But I think this is what you want:
if (WebUI.getUrl().matches("https://www\\.netflix\\.com/.+-.+/login")) {
// do whatever
}
which would match whatever country you are in: any URL like https://www.netflix.com/it-en/login. If you need to use the country information within the if statement, you might want a matcher:
import java.util.regex.*;
Pattern p = Pattern.compile("https://www\\.netflix\\.com/(.+)-(.+)/login");
Matcher m = p.matcher(WebUI.getUrl());
if (m.matches()) {
String country = m.group(1);
String language = m.group(2);
// do whatever
}
Note that we're using Java here, as you have the question tagged like that. Katalon can also use JavaScript and Groovy, whose style you've also used with your single-quoted strings and omitted semicolons. In Groovy, == is fine for string comparison, and it also has shorthands for regular expressions.
You could create a list of valid country-code pairs if you want to keep track of which country you are in, and then just compare the two strings.
If you don't want to do it that way and would rather accept whatever comes in the URL string, then I recommend using the split method:
String[] sections = WebUI.getUrl().split("/");
/* Now we have:
sections[0] = "https:"
sections[1] = ""
sections[2] = "www.netflix.com"
sections[3] = whatever the country code is
sections[4] = "login"
*/
Try to solve it with regular expression on URL string:
final String COUNTRY_CODES_REGEX =
"Country1|Country2|Country3";
Pattern pattern = Pattern.compile(COUNTRY_CODES_REGEX);
Matcher matcher = pattern.matcher(WebUI.getUrl());
if (matcher.find()) {
// Do some stuff.
}
Instead of using WebUI.getUrl() == ...
you could use String.matches(String regex). Similarly to AutomatedOwl's reply, you would define a String variable that joins the individual country codes with the regex alternation operator |. So you have
String country1 = ...
String country2 = ...
String countryN = ...
String countryCodes = String.join("|", country1, country2, countryN);
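then you have something along the lines of the following, with the alternatives wrapped in a non-capturing group so that | applies only to the country codes, and with the dots escaped:
if (WebUI.getUrl().matches("https://www\\.netflix\\.com/(?:" + countryCodes + ")/login")) {
    // ... do stuff
}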
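The matching logic can be tried outside Katalon by passing the URL in as a plain string. The sketch below assumes the localized login URL has the shape https://www.netflix.com/<xx>-<yy>/login with two lowercase letters on each side of the hyphen, which is inferred from the it-en example above rather than from any documented format. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoginUrlCheck {
    // Assumed shape: two lowercase letters, a hyphen, two lowercase letters.
    private static final Pattern LOCALIZED_LOGIN =
            Pattern.compile("https://www\\.netflix\\.com/([a-z]{2})-([a-z]{2})/login");

    /** Returns {country, language} if the URL is a localized login URL. */
    static Optional<String[]> parse(String url) {
        Matcher m = LOCALIZED_LOGIN.matcher(url);
        if (m.matches()) {
            return Optional.of(new String[] {m.group(1), m.group(2)});
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        parse("https://www.netflix.com/it-en/login")
                .ifPresent(p -> System.out.println("country=" + p[0] + ", language=" + p[1]));
    }
}
```

Restricting the two parts to [a-z]{2} is stricter than .+-.+ and rejects paths that merely contain a hyphen; loosen it if the real URLs turn out to differ.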

How to replace a word with specific word

I have a String:
String s="<p>Dear <span>{customerName}, your {accountName} is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
So I want to take the customerName and accountName words and replace them with the customer's details. Can anyone please tell me how I can do the replacement? The names customerName and accountName change dynamically, because they are database columns and the columns are sometimes different. So I want to find the words within { and } and replace them with the column data.
Use the following code
s = s.replace("{customerName}", realCustomerName);
s = s.replace("{accountName}", realAccountName);
With String's replace function, the first argument is the string you want to replace, and the second argument is the string you want to insert.
Try:
s = s.replace("{customerName}", customerName).replace("{accountName}", accountName);
where customerName and accountName are the strings holding your customer's details. Note that Java string literals need double quotes.
If you simply want to replace the words, you could do the following:
String s="<p>Dear <span>{customerName}, your {accountName} is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
// Strings are immutable, so the result has to be assigned back
s = s.replace("{customerName}", customer.getName());
s = s.replace("{accountName}", account.getName());
Or, if you are building the string yourself and you can modify it, it might be better to do the following:
String s="<p>Dear <span>%1$s, your %2$s is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
// You may also just create a new String object...
s = String.format( s, customer.getName(), account.getName() );
Finally, I found a way to replace the words using regular expressions. Here the words between ~ characters need to be replaced; these words are not fixed and are added to the string dynamically from a UI text area.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularEx {
/**
 * @param args
 */
public static void main(String args[]) {
Pattern pattern = Pattern.compile("\\~.*?\\~");
StringBuilder s = new StringBuilder(
"~ABCD~~BBCc~All the best ~ABCD~~BBCc~~in~~Raja~ Such kind of people ~in~~Raja~~ABCD~~BBCc~~in~~Raja~rajasekhar~ABCD~~BBCc~~in~~Raja~ Bayanapalli ~Chinthalacheruvu~");
Matcher matcher = pattern.matcher(s);
// using Matcher find(), group(), start() and end() methods
String s1 =new String("~ABCD~~BBCc~All the best ~ABCD~~BBCc~~in~~Raja~ Such kind of people ~in~~Raja~~ABCD~~BBCc~~in~~Raja~rajasekhar~ABCD~~BBCc~~in~~Raja~ Bayanapalli ~Chinthalacheruvu~");
int i = 0;
while (matcher.find()) {
String grp = matcher.group();
int si = matcher.start();
int ei = matcher.end();
System.out.println("Found the text \"" + grp
+ "\" starting at " + si + " index and ending at index " + ei);
s1=s1.replaceAll(grp, "Raja");
//System.out.println("FinalString" + s1);
}
System.out.println("------------------------------------\nFinalString" + s1);
}
}
s = s.replace("{customerName}", "John Doe");
s = s.replace("{accountName}", "jdoe");
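Because the placeholder names come from database columns and are not known in advance, another option is to find every {name} with a regex and look the name up in a map. This is a minimal sketch under a few assumptions: placeholder names consist of word characters only, unknown placeholders are left untouched, and the map stands in for whatever row data is actually available. The class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFiller {
    // Matches {name} and captures the name without the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = values.get(m.group(1));
            // Keep the placeholder as it is when no value is known for it.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement stops '$' and '\' in the data from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("customerName", "John Doe");
        row.put("accountName", "jdoe");
        // Prints: Dear John Doe, your jdoe is ready
        System.out.println(fill("Dear {customerName}, your {accountName} is ready", row));
    }
}
```

Compared with chained replace calls, this needs no code change when a new column name appears in the template.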

Using Multiple Java regular expressions

I am trying to extract an email and replace it with a space using a pattern(EMAIL_PATTERN). When running the following, no output is produced when a full document is passed in. The pattern will only match the entire region. So this means if we pass in only the email, the email will be matched and be replaced with a space. But the purpose of the following method is to find the email and previous manual extraction is not required. After the email in the tempString has been replaced, I want to use it for the next pattern. Should I combine the patterns I want to use in one method or should they be placed in separate methods? Below is the code I have as of now. Also I have other patterns, but since my method is not working correctly I have not posted them yet.
private static final String EMAIL_PATTERN = "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public static void main (String[] args) {
//Document takes in an ID, student information (which includes email, address, phone, name), school, and text
Document r = new Document("", "FirstName LastName, Address, example@email.com, phoneNumber", "School", "experience", "text");
personalEmailZone(r);
}
public static Document personalEmailZone(Document doc){
//tempString is the personal information section of a resume
String tempPI = doc.tempString();
if(doc.tempString().matches(EMAIL_PATTERN) == true){
//Pattern pattern = Pattern.compile("");
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(tempPI);
String emailTemp = "";
if(matcher.find()){
emailTemp = matcher.group();
System.out.println(emailTemp);
//PI.replace(emailTemp, "");
System.out.println(emailTemp.replace(emailTemp, ""));
tempPI = tempPI.replace(emailTemp, "");
System.out.println(tempPI);
}
}
return doc;
}
You have several problems:
public static Document personalEmailZone(Document doc){
//tempString is the personal information section of a resume
String tempPI = doc.tempString();
if(doc.tempString().matches(EMAIL_PATTERN) == true){
The above statement attempts to match the entire document against the email address pattern. This will not match unless doc.tempString() contains ONLY a single email address and nothing else.
//Pattern pattern = Pattern.compile("");
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(tempPI);
String emailTemp = "";
if(matcher.find()){
emailTemp = matcher.group();
System.out.println(emailTemp);
//PI.replace(emailTemp, "");
System.out.println(emailTemp.replace(emailTemp, ""));
Not sure what the above is for. If your code ever reached this point, it would always print an empty line.
tempPI = tempPI.replace(emailTemp, "");
System.out.println(tempPI);
}
Since there's no loop, you will have replaced only the first occurrence of an email address. If you're expecting to replace ALL occurrences, you need to loop over the input.
}
return doc;
At this point you haven't actually modified doc, so you're returning the document in its original form, with email addresses included.
}
Look at the Javadoc for String#replaceAll(String regex, String replacement)
You can place your patterns in different methods, each of which returns the modified string for the next pattern to use. For example:
String tempPI = doc.tempString();
tempPI = applyPattern1(tempPI);
tempPI = applyPattern2(tempPI);
tempPI = applyPattern3(tempPI);
Your code doesn't show any output because of doc.tempString().matches(EMAIL_PATTERN) == true. That check is probably not needed there, since matches() expects the entire string to be an email address.
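The chaining idea can be sketched without the Document class by working on plain strings. The example below assumes that the # in the posted email pattern stands for @. The phone pattern is purely hypothetical and only there to show a second step, and the class and method names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternChain {
    static final Pattern EMAIL = Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");
    // Hypothetical second pattern: three digits, a hyphen, four digits.
    static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}");

    /** Applies each pattern in turn, replacing every match with the given literal text. */
    static String replaceAll(String text, String replacement, Pattern... patterns) {
        String literal = Matcher.quoteReplacement(replacement);
        String result = text;
        for (Pattern p : patterns) {
            // The Matcher searches the whole text for matches, unlike String.matches,
            // which requires the entire string to match.
            result = p.matcher(result).replaceAll(literal);
        }
        return result;
    }

    public static void main(String[] args) {
        String info = "FirstName LastName, Address, example@email.com, 555-1234";
        System.out.println(replaceAll(info, " ", EMAIL, PHONE));
    }
}
```

Whether the patterns live in one method like this or in separate methods is mostly a matter of taste; what matters is that each step receives the string returned by the previous one and that the final result is written back to the document.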
