Java regex partial match with period - java

In Java I am using Pattern and Matcher to find all instances of ".A (a number)" in a set of strings to retrieve the numbers.
I run into problems because one of the words in the file is "P.A.M.X.": the number comes back as 0 and the rest of the file is not processed. I've tried many different regular expressions, but I can't get past that occurrence of "P.A.M.X." and on to the next ".A (number)".
for (int i = 0; i < input.size(); i++) {
Pattern pattern = Pattern.compile("\\.A\\s\\d+");
Matcher matcher = pattern.matcher(input.get(i));
while (matcher.find())
{
String matchFound = matcher.group().toString();
int numMatch = 0;
String[] tokens = matchFound.split(" ");
numMatch = Integer.parseInt(tokens[1]);
System.out.println("The number is: " + numMatch);
}
}

Short sample for you:
Pattern pattern = Pattern.compile("\\.A\\s(\\d+)"); // grouping number
Matcher matcher = pattern.matcher(".A 1 .A 2 .A 3 .A 4 *text* .A5"); // full input string
while (matcher.find()) {
int n = Integer.valueOf(matcher.group(1)); // getting captured number - group #1
System.out.println(n);
}
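A minimal sketch of the same idea applied to a list of lines, assuming the input is a `List<String>` as in the question. The class and method names (`DotANumbers`, `extractNumbers`) and the sample lines are hypothetical. Since `find()` only reports substrings that match the whole pattern, "P.A.M.X." is skipped: the ".A" there is followed by ".", not by whitespace and digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotANumbers {
    // Compiled once, outside the loop; group 1 captures the digits after ".A ".
    private static final Pattern DOT_A = Pattern.compile("\\.A\\s(\\d+)");

    // Collects every number that follows ".A " in any of the given lines.
    static List<Integer> extractNumbers(List<String> lines) {
        List<Integer> numbers = new ArrayList<Integer>();
        for (String line : lines) {
            Matcher matcher = DOT_A.matcher(line);
            while (matcher.find()) {
                numbers.add(Integer.parseInt(matcher.group(1)));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "intro .A 12 text", "P.A.M.X. is skipped", "tail .A 7");
        System.out.println(extractNumbers(lines)); // [12, 7]
    }
}
```

Reading the number from the capture group avoids the `split(" ")` step, which would break if the whitespace matched by `\s` were a tab rather than a space.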

Related

Count & Split by regex pattern in java

I have a string in below format.
-52/ABC/35/BY/200/L/DEF/307/C/110/L
I need to perform the following.
1. Find the number of occurrences of 3-letter words like ABC and DEF in the above text.
2. Split the above string at ABC and DEF, as shown below.
ABC/35/BY/200/L
DEF/307/C/110/L
I have tried using a regex with the code below, but the match count is always zero. How can I approach this easily?
static String DEST_STRING = "^[A-Z]{3}$";
static Pattern DEST_PATTERN = Pattern.compile(DEST_STRING,
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
public static void main(String[] args) {
String test = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Matcher destMatcher = DEST_PATTERN.matcher(test);
int destCount = 0;
while (destMatcher.find()) {
destCount++;
}
System.out.println(destCount);
}
Please note that I need to use JDK 6 for this.
You can use this code :
public static void main(String[] args) throws Exception {
String s = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
// Pattern to find all 3 letter words . The \\b means "word boundary", which ensures that the words are of length 3 only.
Pattern p = Pattern.compile("(\\b[a-zA-Z]{3}\\b)");
Matcher m = p.matcher(s);
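// Explicit type arguments keep this compatible with JDK 6 (the diamond operator needs Java 7).
Map<String, Integer> countMap = new HashMap<String, Integer>();
// Count how many times each 3 letter word is used.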
// Find each 3 letter word.
while (m.find()) {
// Get the 3 letter word.
String val = m.group();
// If the word is present in the map, get old count and add 1, else add new entry in map and set count to 1
if (countMap.containsKey(val)) {
countMap.put(val, countMap.get(val) + 1);
} else {
countMap.put(val, 1);
}
}
System.out.println(countMap);
// Get ABC.. and DEF.. using positive lookahead for a 3 letter word or end of String
// Finds and selects everything starting from a 3 letter word until another 3 letter word is found or until string end is found.
p = Pattern.compile("(\\b[a-zA-Z]{3}\\b.*?)(?=/[A-Za-z]{3}|$)");
m = p.matcher(s);
while (m.find()) {
String val = m.group();
System.out.println(val);
}
}
Output:
{ABC=1, DEF=1}
ABC/35/BY/200/L
DEF/307/C/110/L
Check this one:
String stringToSearch = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Pattern p1 = Pattern.compile("\\b[a-zA-Z]{3}\\b");
Matcher m = p1.matcher(stringToSearch);
int startIndex = -1;
while (m.find())
{
//Try to use Apache Commons' StringUtils
int count = StringUtils.countMatches(stringToSearch, m.group());
System.out.println(m.group() + " : " + count);
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex,m.start()-1));
}
startIndex = m.start();
}
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex));
}
output:
ABC : 1
DEF : 1
ABC/35/BY/200/L
DEF/307/C/110/L

Regular Expression to check if a String contains '1-n' Integers and then '0-m' Alphabets

I have a peculiar situation where I need to validate a String.
The String has to satisfy the following criteria to move further:
It should start with an integer value whose length is > 1
and < n,
followed by alphabetic characters whose length is from 0 to m (which means letters may or may not be present).
myString.charAt(0) tells me whether the string starts with a digit.
How do I validate that it contains fewer than n digits?
How do I validate that it is > 0 and < n digits followed by 0 to < m letters?
Can I get a regular expression to solve this?
This should work
^\d{1,n - 1}[A-Za-z]{0,m - 1}$
Since you want < n, the upper bound should be n - 1.
Code in JAVA
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
static boolean isValid(String x, int n, int m)
{
String pattern = "^\\d{1," + (n - 1) + "}[A-Za-z]{0," + (m - 1) + "}$";
Pattern r = Pattern.compile(pattern);
Matcher t = r.matcher(x);
return t.find();
}
public static void main( String args[] ){
// String to be scanned to find the pattern.
String line = "123abcdef";
int n = 4, m = 4;
if (isValid(line, n, m)) {
System.out.println("FOUND");
} else {
System.out.println("NOT FOUND");
}
}
}
The value of n must be at least 2 and the value of m at least 1, so that both upper bounds (n - 1 and m - 1) are valid.
You can match this with a very simple regex:
^(\d+)([A-Za-z]*)$
1 or more digits, followed by 0 or more letters. (Use [A-Za-z] rather than [A-z]; the latter also matches the punctuation characters that sit between Z and a in ASCII, such as [ and _.) You can easily read the capture groups to find out exactly how many digits and how many letters are in the string. If you know m and n ahead of time as specific numbers, insert them into the regex like so:
For n = 4 and m = 3,
^(\d{1,4})([A-Za-z]{0,3})$
This will match 0000aaa, but not aaa or 000aaaa.
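As a rough sketch of the capture-group approach, the lengths of the two groups can be checked in code instead of baking n and m into the pattern. The class and method names (`DigitLetterCheck`, `isValid`) are hypothetical, and the bounds are treated as inclusive, as in the n = 4, m = 3 example, not as the strict "< n" from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitLetterCheck {
    // Group 1 holds the digits, group 2 the ASCII letters.
    private static final Pattern SHAPE = Pattern.compile("(\\d+)([A-Za-z]*)");

    // True when s is 1..maxDigits digits followed by 0..maxLetters letters.
    // Assumption: both limits are inclusive.
    static boolean isValid(String s, int maxDigits, int maxLetters) {
        Matcher m = SHAPE.matcher(s);
        if (!m.matches()) { // matches() requires the whole string to fit the pattern
            return false;
        }
        return m.group(1).length() <= maxDigits
                && m.group(2).length() <= maxLetters;
    }

    public static void main(String[] args) {
        System.out.println(isValid("0000aaa", 4, 3)); // true
        System.out.println(isValid("aaa", 4, 3));     // false
        System.out.println(isValid("000aaaa", 4, 3)); // false
    }
}
```

Because `matches()` anchors the pattern at both ends, the explicit `^` and `$` are not needed here.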
Another variant of the solutions we have seen so far; all the answers are really good. This one should also match Unicode digits and letters.
Pattern
\b\p{N}{1,n}\p{L}{0,m}\W
Source Code
public static void matchNumeroAlphaString(){
int n = 3;
int m = 3;
String text =
"John32 54writes about this, 444 and 456Joh writes about that," +
" and John writes #about 9EveryThing. ";
String patternString = "\\b\\p{N}{1," + n + "}\\p{L}{0," + m + "}\\W";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println("Found: " + matcher.group());
}
}
Output
Found: 444
Found: 456Joh

Get substring in a string with multiple occurring string

I have a string something like
(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)
Now I want to get the substring between the markers, for example the 5 in (D#01)5(D#02).
If I have something like
(D#01)5(D#02)
I can get the value with
quantity = content.substring(content.indexOf("(D#01)") + 6, content.indexOf("(D#02)"));
But sometimes (D#02) can be different, like (D#05). How can I use just "(D#" to get the string in between? There are multiple repetitions of "(D#".
Basically, this is what I want to do:
content.substring(content.indexOf("(D#01)") + 6, content.nextOccurringIndexOf("(D#"));
I suppose you can do
int markerIndex = content.indexOf("(D#01)"); // -1 if the marker is missing
int fromIndex = markerIndex + 6;
int toIndex = content.indexOf("(D#", fromIndex); // next occurring
if (markerIndex != -1 && toIndex != -1)
str = content.substring(fromIndex, toIndex);
Output
5
See http://ideone.com/RrUtBy demo.
Assuming that the marker and value are somehow linked and you want to know each pairing (for example, (D#01) == 5), you can make use of the Pattern/Matcher API, for example:
String text = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
Pattern p = Pattern.compile("\\(D#[0-9]+\\)");
Matcher m = p.matcher(text);
while (m.find()) {
String name = m.group();
if (m.end() < text.length()) {
String content = text.substring(m.end()); // everything after this marker
content = content.substring(0, content.indexOf("("));
System.out.println(name + " = " + content);
}
}
Which outputs
(D#01) = 5
(D#02) = 14100319530033M
(D#03) = 1336009-A-A
(D#04) = 141002A171
(D#05) = 1
Now, this is a little heavy-handed. I'd create some kind of "marker" object which contains the key (D#01) and its start and end indices. I'd then keep this information in a List and cut out each value based on the end of one key and the start of the next key... but that's just me ;)
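The "marker" object idea could look roughly like the sketch below. The `Marker` and `MarkerParser` names are hypothetical, and the last marker is assumed to have no value after it, as in the sample string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerParser {
    // Hypothetical holder for one "(D#nn)" key and its position in the text.
    static final class Marker {
        final String key;
        final int start;
        final int end;

        Marker(String key, int start, int end) {
            this.key = key;
            this.start = start;
            this.end = end;
        }
    }

    static Map<String, String> parse(String text) {
        // First pass: record every marker with its start and end indices.
        List<Marker> markers = new ArrayList<Marker>();
        Matcher m = Pattern.compile("\\(D#[0-9]+\\)").matcher(text);
        while (m.find()) {
            markers.add(new Marker(m.group(), m.start(), m.end()));
        }
        // Second pass: each value runs from the end of one key to the start of the next.
        Map<String, String> values = new LinkedHashMap<String, String>();
        for (int i = 0; i + 1 < markers.size(); i++) {
            Marker current = markers.get(i);
            Marker next = markers.get(i + 1);
            values.put(current.key, text.substring(current.end, next.start));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
        for (Map.Entry<String, String> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A `LinkedHashMap` is used so that the values come back in the order the markers appear in the input.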
You can use regex capture groups if you want the content between the (D#nn) markers:
Pattern p = Pattern.compile("(\\(D#\\d+\\))(.*?)(?=\\(D#\\d+\\))");
Matcher matcher = p.matcher("(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)");
while(matcher.find()) {
System.out.println(String.format("%s start: %2s end: %2s matched: %s ",
matcher.group(1), matcher.start(2), matcher.end(2), matcher.group(2)));
}
(D#01) start: 6 end: 7 matched: 5
(D#02) start: 13 end: 28 matched: 14100319530033M
(D#03) start: 34 end: 45 matched: 1336009-A-A
(D#04) start: 51 end: 61 matched: 141002A171
(D#05) start: 67 end: 68 matched: 1
You can use a regex to split the input, as suggested by @MadProgrammer. The split() method produces an array of Strings, so the values appear in the array in the same order as they occur in the input. For example:
String input = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
String[] table = input.split("\\(D#[0-9]+\\)");
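A small runnable sketch of the split approach, mainly to show what the resulting array looks like. The class and method names (`SplitMarkers`, `splitOnMarkers`) are hypothetical. Note that because the input starts with a marker, the array begins with an empty string, while the trailing empty string after the last marker is dropped by `split()`.

```java
import java.util.Arrays;

public class SplitMarkers {
    static String[] splitOnMarkers(String input) {
        // The parentheses are escaped for the regex, and each backslash
        // is doubled because it sits inside a Java string literal.
        return input.split("\\(D#[0-9]+\\)");
    }

    public static void main(String[] args) {
        String input = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
        String[] table = splitOnMarkers(input);
        // Index 0 is the empty string before the first marker.
        System.out.println(Arrays.toString(table));
        // [, 5, 14100319530033M, 1336009-A-A, 141002A171, 1]
    }
}
```

The value for (D#01) is therefore at index 1, not index 0.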
Try this:
public static void main(String[] args) {
String input = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
Pattern p = Pattern.compile("\\(D#\\d+\\)(.*?)(?=\\(D#\\d+\\))");
Matcher matches = p.matcher(input);
while(matches.find()) {
int number = getNum(matches.group(0)); // parses the number
System.out.printf("%d. %s\n", number, matches.group(1)); // print the string
}
}
public static int getNum(String str) {
int start = str.indexOf('#') + 1;
int end = str.indexOf(')', start);
return Integer.parseInt(str.substring(start,end));
}
Result:
1. 5
2. 14100319530033M
3. 1336009-A-A
4. 141002A171
5. 1

java regex matching each group starting with specific string

I have a string like a1wwa1xxa1yya1zz.
I would like to get every group starting with a1, up to but excluding the next a1.
(In my example, that would be: a1ww, a1xx, a1yy and a1zz.)
If I use:
Matcher m = Pattern.compile("(a1.*?)a1").matcher("a1wwa1xxa1yya1zz");
while(m.find()) {
String myGroup = m.group(1);
}
myGroup captures only every second group.
So in my example, I can only capture a1ww and a1yy.
Does anyone have a good idea?
Split is a good solution, but if you want to remain in the regex world, here is a solution:
Matcher m = Pattern.compile("(a1.*?)(?=a1|$)").matcher("a1wwa1xxa1yya1zz");
while (m.find()) {
String myGroup = m.group(1);
System.out.println("> " + myGroup);
}
I used a positive lookahead to ensure the capture is followed by a1, or alternatively by the end of the line.
Lookaheads are zero-width assertions, i.e. they verify a condition without advancing the match cursor, so the text they verify remains available for the next match.
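To make the difference concrete, here is a small sketch that runs both patterns over the sample string. The `LookaheadDemo` class and its `groups` helper are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Returns the contents of capture group 1 for every match of regex in input.
    static List<String> groups(String regex, String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a1wwa1xxa1yya1zz";
        // Consuming version: the trailing "a1" is part of the match,
        // so the next search starts after it and every second group is lost.
        System.out.println(groups("(a1.*?)a1", input));       // [a1ww, a1yy]
        // Lookahead version: the trailing "a1" is only checked, not consumed,
        // so it can begin the next match.
        System.out.println(groups("(a1.*?)(?=a1|$)", input)); // [a1ww, a1xx, a1yy, a1zz]
    }
}
```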
You can use the split() method, then prepend "a1" to each of the resulting elements:
String str = "a1wwa1xxa1yya1zz";
String[] parts = str.split("a1");
String[] output = new String[parts.length - 1];
for (int i = 0; i < output.length; i++)
output[i] = "a1" + parts[i + 1];
for (String p : output)
System.out.println(p);
Output:
a1ww
a1xx
a1yy
a1zz
I would use an approach like this:
String str = "a1wwa1xxa1yya1zz";
String[] parts = str.split("a1");
for (int i = 1; i < parts.length; i++) {
String found = "a1" + parts[i];
}

How to Insert Commas Into a Number WITHIN a String of Other Words

I have a String like the following:
"The answer is 1000"
I want to insert commas into the number 1000 without destroying the rest of the String.
NOTE: I also want to use this for other Strings of differing lengths, so substring(int index) would not be advised for getting the number.
The best way that I can think of is to use a regex command, but I have no idea how.
Thanks in advance!
The following formats all the non-decimal numbers:
public String formatNumbers(String input) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(input);
NumberFormat nf = NumberFormat.getInstance();
StringBuffer sb = new StringBuffer();
while(m.find()) {
String g = m.group();
m.appendReplacement(sb, nf.format(Double.parseDouble(g)));
}
return m.appendTail(sb).toString();
}
e.g. if you call: formatNumbers("The answer is 1000 1000000")
Result is: "The answer is 1,000 1,000,000"
See: NumberFormat and Matcher.appendReplacement().
modified from Most efficient way to extract all the (natural) numbers from a string:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
private static final String REGEX = "\\d+";
public static void main(String[] args) {
String input = "dog dog 1342 dog doggie 2321 dogg";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(input); // get a matcher object
int end = 0;
String result = "";
while (m.find()) {
result = result + input.substring(end, m.start());
result = result
+ addCommas(
input.substring(
m.start(), m.end()));
end = m.end();
}
result = result + input.substring(end); // append the text after the last match
System.out.println(result);
}
private static String addCommas(String s) {
char[] c = s.toCharArray();
String result = "";
for (int i = 0; i < s.length(); i++) {
if (i > 0 && s.length() % 3 == i % 3) // i > 0 avoids a leading comma, e.g. ",300"
result += ",";
result += c[i];
}
return result;
}
}
You could use the regular expression:
[0-9]+
to find contiguous runs of digits, so it would match 1000, 7500, 22387234, etc. You can test this on http://regexpal.com/. By the way, this doesn't handle numbers that contain decimal points.
This isn't a complete answer with code, but the basic algorithm is as follows:
Use that pattern to find the index of each match within the string (the index of the character where each match starts).
From each of those indexes, copy the digits into a temporary string that contains only the digits of that number.
Write a function that starts at the end of that string and inserts a comma before every 3rd digit (counting from the end), unless the index of the current character is 0 (which prevents 300 from being turned into ,300).
Replace the original number in the source string with the comma-separated string, using the replace() method.
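The steps above can be sketched as follows. This is only one possible reading of the algorithm: the class and method names (`CommaInserter`, `addCommas`, `formatNumbers`) are hypothetical, and the final step rebuilds the string piece by piece instead of calling replace(), which avoids accidentally replacing the same digits elsewhere in the text. As noted, decimal points are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaInserter {
    // Walks the digits from the right and inserts a comma before every
    // third digit; the i > 0 condition prevents a comma at index 0.
    static String addCommas(String digits) {
        StringBuilder sb = new StringBuilder(digits);
        for (int i = digits.length() - 3; i > 0; i -= 3) {
            sb.insert(i, ',');
        }
        return sb.toString();
    }

    // Copies the input, replacing each run of digits with its comma'ed form.
    static String formatNumbers(String input) {
        Matcher m = Pattern.compile("[0-9]+").matcher(input);
        StringBuilder result = new StringBuilder();
        int end = 0;
        while (m.find()) {
            result.append(input, end, m.start()); // text before this number
            result.append(addCommas(m.group()));
            end = m.end();
        }
        result.append(input.substring(end));      // text after the last number
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatNumbers("The answer is 1000")); // The answer is 1,000
        System.out.println(formatNumbers("300 and 22387234"));   // 300 and 22,387,234
    }
}
```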
