SwiftMessage Regular expression - java

I have the below message:
{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00:50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}
And i want it to be converted like below, with whitespaces in block 4 (which is
{4: :20:TEST000001 :23B:CRED :32A:141117EUR0,1 :33B:EUR1000,00 :50A:ANZBAU30 :59:ANZBAU30 :71A:SHA -}
{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4: :20:TEST000001 :23B:CRED :32A:141117EUR0,1 :33B:EUR1000,00 :50A:ANZBAU30 :59:ANZBAU30 :71A:SHA -}{5:{CHK:1DBBF1D81EE1}{TNG:}}
I tried to extract using groups and then apply regular expression. But, i was unsuccessfully. Unable to find the error i am making.
public static void StringReplace() {
String data = "{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00:50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}";
Pattern pat = Pattern.compile("(({1:\\w+})({2:\\w+})({4::\\d+:\\w+:\\d+.:\\w+:\\d+.:\\d+\\w+,\\d:\\d+.:\\w+,\\d+:\\d+.:\\w+:\\d+:\\w+:\\d+.:\\w+-})({5:{\\w+:.\\w+}{\\w+.}}))");
Matcher m = pat.matcher(data);
if(m.matches()) {
System.out.println(m.group(0));
}
}
Thanks in Adavance

You have just matched the string and simply printed it but havn't put logic of introducing a space in between. You need to add the logic of introducing space in block 4.
Looking at the expected output of your block 4, you can first catch the block 4 using this regex,
(.*?)(\\{4.*?\\})(.*?)
and then replace colon with a space colon ( :) in group 2 content which you call as block 4. I see you are not introducing space with every colon instead just for colon which are followed by 2-3 characters followed by colon. I have implemented the logic accordingly in my replaceAll() method.
Here is the modified java code,
public static void StringReplace() {
String data = "{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4::20:TEST000001:23B:CRED:32A:141117EUR0,1:33B:EUR1000,00:50A:ANZBAU30:59:ANZBAU30:71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}";
Pattern pat = Pattern.compile("(.*)(\\{4.*?\\})(.*)");
Matcher m = pat.matcher(data);
if (m.find()) {
String g1 = m.group(1);
String g2 = m.group(2).replaceAll(":(?=\\w{2,3}:)", " :");
String g3 = m.group(3);
System.out.println(g1 + g2 + g3);
} else {
System.out.println("Didn't match");
}
}
This prints the following output as you expect,
{1:F01ANZBDEF0AXXX0509036846}{2:I103ANZBDEF0XXXXN}{4: :20:TEST000001 :23B:CRED :32A:141117EUR0,1 :33B:EUR1000,00 :50A:ANZBAU30 :59:ANZBAU30 :71A:SHA-}{5:{CHK:1DBBF1D81EE1}{TNG:}}

Related

Regex to remove line break within double quote in CSV

Hi I have a csv file with an error in it.so i want it to correct with regular expression, some of the fields contain line break, Example as below
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy
California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
the above two lines should be in one line
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre PkwyCalifornia",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
I tried to use the below regex but it didnt help me
%s/\\([^\"]\\)\\n/\\1/
Try this:
public static void main(String[] args) {
String input = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
+ ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
+ "California\",,\"Mountain View\",,\"United\n"
+ "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
Matcher matcher = Pattern.compile("\"([^\"]*[\n\r].*?)\"").matcher(input);
Pattern patternRemoveLineBreak = Pattern.compile("[\n\r]");
String result = input;
while(matcher.find()) {
String quoteWithLineBreak = matcher.group(1);
String quoteNoLineBreaks = patternRemoveLineBreak.matcher(quoteWithLineBreak).replaceAll(" ");
result = result.replaceFirst(quoteWithLineBreak, quoteNoLineBreaks);
}
//Output
System.out.println(result);
}
Output:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
Create a RegEx surrounding the text you want to keep by parentheses and that will create a group of matched characters. Then replace the string using the group index to compose as you wish.
String test = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
+ ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
+ "California\",,\"Mountain View\",,\"United\n"
+ "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
System.out.println(test.replaceAll("(\"[^\"]*)\n([^\"]*\")", "$1$2"));
So when we replace the matching string ("United\nStates") by $1$2 we are removing the line break because it not belongs to any group:
$1 => the first group (\"[^\"]*) that will match "United
$2 => the second group ([^\"]*\")" that will match States"
Based on this you can try with:
/\r?\n|\r/
I checked it here and seems to be fine

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a java regex to catch some groups of words from a String using a Matcher.
Say i got this string: "Hello, we are #happy# to see you today".
I would like to get 2 group of matches, one having
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.
From comment:
I may incur in a string where I got more than one instances of words wrapped by #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different stylings, the code need to process each block of text separately, and needs to know if the text is inside or outside a #..# section.
Note, in the following code, it will silently skip the last #, if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
if (m.start(1) != -1) {
String outsideText = m.group(1);
System.out.println("Outside: \"" + outsideText + "\"");
} else {
String insideText = m.group(2);
System.out.println("Inside: \"" + insideText + "\"");
}
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"
The best I could do is splitting in 3 groups, then merging the group 1 and 4 :
(^.*)(\#(.+?)\#)(.*)
Test it here
EDIT: Taking remarks from the comments :
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to #Lino we don't capture the useless group with # anymore, and we capture anything except #, instead of any non whitespace character in the 1st and 2nd groups.
Test it here
Is this solution fine?
Pattern pattern =
Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher =
pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string :
notBetween) {
System.out.println(string);
}
System.out.println("Printing group 2");
for (String string :
between) {
System.out.println(string);
}

How to create a regex that accepts specific characters?

I have this regex:
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
I need a regex to accept a minimum word length of 8, letters(uppercase & lowercase), numbers and these characters:
!#$%&'*+-/=?^_`{|}~"(),:;<>#[]
It works when I tested it here.
This is how I used it in Java Android.
public static final String regex = "^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";
This is the error that I received.
java.util.regex.PatternSyntaxException: Missing closing bracket in character class near index 49
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
If you just want to test if a given input string matches your pattern, you may use String#matches directly, e.g.
String regex = "[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>\\[\\]-]{8,}";
String input = "Jon#Skeet#123";
if (input.matches(regex)) {
System.out.println("Found a match");
}
else {
System.out.println("No match");
}
If you wanted to parse a larger input text and identify such matching words, then you would want to use a formal Pattern and Matcher. But, I don't see the need for this just based on your question.
You have to use pattern marcher concept. it may help you.
follow tutorial : https://www.mkyong.com/regular-expressions/how-to-validate-password-with-regular-expression/
Here is one Example.
try {
Pattern pattern;
Matcher matcher;
final String PASSWORD_PATTERN = "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[##$%]).{6,20})";
pattern = Pattern.compile(PASSWORD_PATTERN);
matcher = pattern.matcher(password_string );
if(matcher.matches()){
Log.e("TAG", "TRUE")
}else{
Log.e("TAG", "FALSE")
}
} catch (RuntimeException e) {
return false;
}

Regular expression in java that encloses some url

i have this problem:
i have to make a regular expression which take this urls:
http://www.amazon.it/TP-LINK-TL-WR841N-Wireless-300Mbps-Ethernet/dp/B001FWYGJS?ie=UTF8&redirect=true&ref_=s9_simh_gw_p147_d0_i2
http://www.amazon.it/gp/product/B014KMQWU0/
http://www.amazon.it/gp/product/glance/B014KMQWU0/
I need a regular expression which matches the full url until the ASIN of the product (ASIN is a word of 10 capital letters)
I have write this regex but not make what i want:
String regex="http:\\/\\/(?:www\\.|)amazon\\.com\\/(?:gp\\ product|| gp\\ product\\ glance || [^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
while (urlAmazonMatcher.find()) {
System.out.println("PROVA "+urlAmazonMatcher.group(0));
}
This is my solution. Finally it works :D
String regex="(http|www\\.)amazon\\.(com|it|uk|fr|de)\\/(?:gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
String toReturn = null;
while (urlAmazonMatcher.find()) {
toReturn=urlAmazonMatcher.group(0);
}
How about
/[^/?]{10}(/$|\?)
This matches 10 characters that are neither / nor ? following a slash if those characters are followed by a final slash or a question mark.
You can get the part that precedes or follows the ASIN using one of the various Matcher functions.
Here is my work from a previous project that was to extract URLs from text:
private Pattern getUriPattern() {
if(uriPattern == null) {
// taken from http://labs.apache.org/webarch/uri/rfc/rfc3986.html
//TODO implement the full URI syntax
String genDelims = "\\:\\/\\?\\#\\[\\]\\#";
String subDelims = "\\!\\$\\&\\'\\*\\+\\,\\;\\=";
String reserved = genDelims + subDelims;
String unreserved = "\\w\\-\\.\\~"; // i.e. ALPHA / DIGIT / "-" / "." / "_" / "~"
String allowed = reserved + unreserved;
// ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
uriPattern = Pattern.compile("((?:[^\\:/\\?\\#]+:)?//[" + allowed + "&&[^\\?\\#]]*(?:\\?([" + allowed + "&&[^\\#]]*))?(?:\\#[" + allowed + "]*)?).*");
}
return uriPattern;
}
You can use the above method as follows:
Matcher uriMatcher =
getUriPattern().matcher(text);
if(uriMatcher.matches()) {
String candidateUriString = uriMatcher.group(1);
try {
new URI(candidateUriString); // check once again if you matched a URL
// your code here
} catch (Exception e) {
// error handling
}
}
This will catch the whole URL, including params. You can then split it up to the first occurence of '?' (if any) and take the first part. Of course, you can rework the regex too.

Java: Find a specific pattern using Pattern and Matcher

This is the string that I have:
KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007
This is a weather report. I need to extract the following numbers from the report: 10/M13. It is temperature and dewpoint, where M means minus. So, the place in the String may differ and the temperature may be presented as M10/M13 or 10/13 or M10/13.
I have done the following code:
public String getTemperature (String metarIn){
Pattern regex = Pattern.compile(".*(\\d+)\\D+(\\d+)");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 1) {
temperature = matcher.group(1);
System.out.println(temperature);
}
return temperature;
}
Obviously, the regex is wrong, since the method always returns null. I have tried tens of variations but to no avail. Thanks a lot if someone can help!
This will extract the String you seek, and it's only one line of code:
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "KLAS 282356Z 32010KT 10SM FEW090 M01/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
System.out.println(tempAndDP);
}
Output:
M01/M13
The regex should look like:
M?\d+/M?\d+
For Java this will look like:
"M?\\d+/M?\\d+"
You might want to add a check for white space on the front and end:
"\\sM?\\d+/M?\\d+\\s"
But this will depend on where you think you are going to find the pattern, as it will not be matched if it is at the end of the string, so instead we should use:
"(^|\\s)M?\\d+/M?\\d+($|\\s)"
This specifies that if there isn't any whitespace at the end or front we must match the end of the string or the start of the string instead.
Example code used to test:
Pattern p = Pattern.compile("(^|\\s)M?\\d+/M?\\d+($|\\s)");
String test = "gibberish M130/13 here";
Matcher m = p.matcher(test);
if (m.find())
System.out.println(m.group().trim());
This returns: M130/13
Try:
Pattern regex = Pattern.compile(".*\\sM?(\\d+)/M?(\\d+)\\s.*");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 2) {
temperature = matcher.group(1);
System.out.println(temperature);
}
Alternative for regex.
Some times a regex is not the only solution. It seems that in you case, you must get the 6th block of text. Each block is separated by a space character. So, what you need to do is count the blocks.
Considering that each block of text does NOT HAVE fixed length
Example:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
int spaces = 5;
int begin = 0;
while(spaces-- > 0){
begin = s.indexOf(' ', begin)+1;
}
int end = s.indexOf(' ', begin+1);
String result = s.substring(begin, end);
System.out.println(result);
Considering that each block of text does HAVE fixed length
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String result = s.substring(33, s.indexOf(' ', 33));
System.out.println(result);
Prettier alternative, as pointed by Adrian:
String result = rawString.split(" ")[5];
Note that split acctualy receives a regex pattern as parameter

Categories

Resources