Regex required for space delimited strings java


I have an operation that deals with many space-delimited strings. I am looking for a regex for the String matches function that returns true if the first two space-delimited substrings both start with a capital letter, and false if they do not.
Examples:
"AL_RIT_121 PA_YT_32 rit cell 22 pulse"
will return true, as the first two substrings, AL_RIT_121 and PA_YT_32, start with the capital letters A and P respectively.
"AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri"
will return false, as the p is lower case.

Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}")
will work with the .find() method. There's no reason to use matches on a prefix test, but if you have an external constraint, just do
Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}.*", Pattern.DOTALL)
To break this down:
^ matches the start of the string,
\\p{Lu} matches any upper-case letter,
\\S* matches zero or more non-space characters, including _
\\s+ matches one or more space characters, and
the second \\p{Lu} matches the upper-case letter starting the second word.
In the second variant, .* combined with Pattern.DOTALL matches the rest of the input.
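As a quick check of the two variants above, the following sketch runs both against the sample strings from the question. The class and method names (PrefixCheck, startsWithTwoCapitalizedWords, matchesWhole) are made up for illustration.

```java
import java.util.regex.Pattern;

class PrefixCheck {
    // Prefix pattern: meant for Matcher.find(), which does not have to consume the whole input.
    private static final Pattern PREFIX =
            Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}");

    // Same test plus a trailing .* so that matches(), which must consume the whole input, can succeed.
    private static final Pattern WHOLE =
            Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}.*", Pattern.DOTALL);

    static boolean startsWithTwoCapitalizedWords(String s) {
        return PREFIX.matcher(s).find();
    }

    static boolean matchesWhole(String s) {
        return WHOLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String ok = "AL_RIT_121 PA_YT_32 rit cell 22 pulse";
        String bad = "AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri";
        System.out.println(startsWithTwoCapitalizedWords(ok));  // true
        System.out.println(startsWithTwoCapitalizedWords(bad)); // false
        System.out.println(matchesWhole(ok));                   // true
        System.out.println(matchesWhole(bad));                  // false
    }
}
```

Compiling each pattern once as a constant avoids recompiling it on every call, which matters if the check runs over many strings.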

Simply string.matches("[A-Z]\\w+ [A-Z].*")
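A small sketch of how this one-liner behaves, including two edge cases that may or may not matter for your data: \\w+ needs at least one more word character after the first capital, and the literal space allows exactly one space between the words. The class name SimpleMatchCheck and the extra inputs are only for illustration.

```java
class SimpleMatchCheck {
    static boolean check(String s) {
        // String.matches() must consume the whole input, hence the trailing .*
        return s.matches("[A-Z]\\w+ [A-Z].*");
    }

    public static void main(String[] args) {
        System.out.println(check("AL_RIT_121 PA_YT_32 rit cell 22 pulse"));    // true
        System.out.println(check("AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri")); // false
        System.out.println(check("A Bc"));   // false: the first word has only one character
        System.out.println(check("AB  CD")); // false: two spaces between the words
    }
}
```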

You can use a specific regex if those two examples demonstrate your input format:
^(?:[A-Z]+_[A-Z]+_\d+\s*)+
Which means:
^ - Match the beginning of the string
(?: - Start a non-capturing group (used to repeat the following)
[A-Z]+ - Match one or more uppercase characters
_ - Match an underscore
[A-Z]+ - Match one or more uppercase characters
_ - Match an underscore
\d+ - Match one or more decimals (0-9)
\s* - Match zero or more space characters
)+ - Repeat the above group one or more times
You would use it in Java like this:
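Pattern pattern = Pattern.compile("^(?:[A-Z]+_[A-Z]+_\\d+\\s*)+");
Matcher matcher = pattern.matcher(inputString);
// matches() requires the whole input to match; use lookingAt() to test only the start of the string
if (matcher.matches()) {
System.out.println("Match found.");
}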

Check this out:
public static void main(String[] args)
{
String text = "AL_RIT_121 pA_YT_32 rit cell 22 pulse";
boolean areFirstTwoWordsCapitalized = areFirstTwoWordsCapitalized(text);
System.out.println("areFirstTwoWordsCapitalized = <" + areFirstTwoWordsCapitalized + ">");
}
private static boolean areFirstTwoWordsCapitalized(String text)
{
boolean rslt = false;
String[] words = text.split("\\s");
int wordIndx = 0;
boolean frstWordCap = false;
boolean scndWordCap = false;
for(String word : words)
{
wordIndx++;
//System.out.println("word = <" + word + ">");
Pattern ptrn = Pattern.compile("^[A-Z].+");
Matcher mtchr = ptrn.matcher(word);
while(mtchr.find())
{
String match = mtchr.group();
//System.out.println("\tMatch = <" + match + ">");
if(wordIndx == 1)
{
frstWordCap = true;
}
else if(wordIndx == 2)
{
scndWordCap = true;
}
}
}
rslt = frstWordCap && scndWordCap;
return rslt;
}

Try this:
public class RegularExp
{
/**
* @param args
*/
public static void main(String[] args) {
String regex = "[A-Z][^\\s.]*\\s[A-Z].*";
String str = "APzsnnm lmn Dlld";
System.out.println(str.matches(regex));
}
}

Related

Parsing a string that contains double quotes

I have a relatively simple Java question. I have a string that looks like this:
"Anderson,T",CWS,SS
I need to parse it in a way that I have
Anderson,T
CWS
SS
all as separate strings.
Thanks!

Here's a solution that will capture quoted strings, remove spaces, and match empty items:
public static void main(String[] args) {
String quoted = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
Pattern regex = Pattern.compile(
"(?:^|(?<=,))\\s*(" + quoted + "|[^,]*?)\\s*(?:$|,)");
String line = "\"Anderson,T\",CWS,\"single quote\\\"\", SS ,,hello,,";
Matcher m = regex.matcher(line);
int count = 0;
while (m.find()) {
String s = m.group(2) == null ? m.group(1) : m.group(2);
System.out.println(s);
count++;
}
System.out.printf("(%d matches found)%n", count);
}
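I split out the quoted part of the pattern to make it a bit easier to follow. Capturing group 1 is the whole item (quoted or not); group 2 is the content inside the quotes and is null for unquoted items, which is why the loop falls back to group 1.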
To break down the overall pattern:
Look for start of line or previous comma (?:^|(?<=,)) (don't capture)
Ignore 0+ spaces \\s*
Look for quoted string or string without comma (" + quoted + "|[^,]*?)
(The non-comma match is non-greedy so it doesn't grab any following spaces)
Ignore 0+ spaces again \\s*
Look for end of line, or comma (?:$|,) (don't capture)
To break down the quote pattern:
Look for opening quote \"
Start group capture (
Get the minimum match of any character .*?
Match 0+ even number of backslashes (?<!\\\\)(?:\\\\\\\\)* (to avoid matching escaped quotes with or without preceding escaped backslashes)
Close capturing group )
Match closing quote \"

Assuming your string looks like this
String input = "\"Anderson,T\",CWS,SS";
You can use this solution found for a similar scenario.
String input = "\"Anderson,T\",CWS,SS";
List<String> result = new ArrayList<String>();
int start = 0; //start index. Used to determine where the word starts
boolean inQuotes = false;
for (int current = 0; current < input.length(); current++) { //iterate through characters
if (input.charAt(current) == '\"') //if found a quote
inQuotes = !inQuotes; // toggle state
if(current == (input.length() - 1))//if it is the last character
result.add(input.substring(start)); //add last word
else if (input.charAt(current) == ',' && !inQuotes) { //if found a comma not inside quotes
result.add(input.substring(start, current)); //add everything between the start index and the current character. (add a word)
start = current + 1; //update start index
}
}
System.out.println(result);
I have modified it a bit to improve readability. This code stores the strings you want in the list result.

regular expression java for URL parameter string

I am trying to verify whether a string matches a regular expression.
The URL format is : key=value&key=value&....
Key or value can be empty.
My code is :
Pattern patt = Pattern.compile("\\w*=\\w*&(\\w *=\\w*)* ");
Matcher m = patt.matcher(s);
if(m.matches()) return true;
else return false;
When I enter one=1&two=2, it shows false whereas it should show true.
Any ideas?

The regex you need is
Pattern.compile("(?:\\w+=\\w*|=\\w+)(?:&(?:\\w+=\\w*|=\\w+))*");
It will match:
(?:\\w+=\\w*|=\\w+) - either 1+ word chars followed with = and then 0+ word chars (obligatory key, optional value) or = followed with 1+ word chars (optional key)
(?:&(?:\\w+=\\w*|=\\w+))* - zero or more of such sequences as above.
Java demo:
String s = "one=1&two=2&=3&tr=";
Pattern patt = Pattern.compile("(?:\\w+=\\w*|=\\w+)(?:&(?:\\w+=\\w*|=\\w+))*");
Matcher m = patt.matcher(s);
if(m.matches()) {
System.out.println("true");
} else {
System.out.println("false");
}
// => true
To allow whitespace, add \\s* where needed. If you also need to allow non-word chars, use, say, [\\w.-] instead of \\w to match word chars, . and - (keep the - at the end of the character class).
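As a sketch of the character-class suggestion, the pattern can be assembled from a single key=value fragment so the widened class is written only once. The names QueryStringCheck, PAIR and isValid are hypothetical, and the sample inputs are made up.

```java
import java.util.regex.Pattern;

class QueryStringCheck {
    // One key=value pair in which the key or the value (but not both) may be empty.
    // [\\w.-] widens \\w to also accept '.' and '-' (the '-' stays last in the class).
    private static final String PAIR = "(?:[\\w.-]+=[\\w.-]*|=[\\w.-]+)";

    // A first pair followed by zero or more &-separated pairs.
    private static final Pattern QUERY = Pattern.compile(PAIR + "(?:&" + PAIR + ")*");

    static boolean isValid(String s) {
        return QUERY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("one=1&two=2"));              // true
        System.out.println(isValid("user.name=john-doe&=x&k=")); // true
        System.out.println(isValid("one=1&&two=2"));             // false: empty pair between the ampersands
        System.out.println(isValid("="));                        // false: key and value both empty
    }
}
```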

Splitting a nested string keeping quotation marks

I am working on a project in Java that requires having nested strings.
For an input string that in plain text looks like this:
This is "a string" and this is "a \"nested\" string"
The result must be the following:
[0] This
[1] is
[2] "a string"
[3] and
[4] this
[5] is
[6] "a \"nested\" string"
Note that I want the \" sequences to be kept.
I have the following method:
public static String[] splitKeepingQuotationMarks(String s);
and I need to create an array of strings out of the given s parameter by the given rules, without using the Java Collection Framework or its derivatives.
I am unsure about how to solve this problem.
Can a regex expression be made that would get this solved?
UPDATE based on questions from comments:
each unescaped " has its closing unescaped " (they are balanced)
each escaping character \ also must be escaped if we want to create literal representing it (to create text representing \ we need to write it as \\).

You can use the following regex:
"[^"\\]*(?:\\.[^"\\]*)*"|\S+
Java demo:
String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
Pattern ptrn = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Explanation:
"[^"\\]*(?:\\.[^"\\]*)*" - a double quote that is followed with any 0+ characters other than a " and \ ([^"\\]) followed with 0+ sequences of any escaped sequence (\\.) followed with any 0+ characters other than a " and \
| - or...
\S+ - 1 or more non-whitespace characters
NOTE
@Pshemo's suggestion, "\"(?:\\\\.|[^\"])*\"|\\S+" (or, more correctly, "\"(?:\\\\.|[^\"\\\\])*\"|\\S+"), is the same expression but much less efficient, since it uses an alternation group quantified with *. That construct involves much more backtracking, because the regex engine has to test each position and there are two possibilities at each one. My unroll-the-loop version matches chunks of text at once, and is thus much faster and more reliable.
UPDATE
Since a String[] is required as output, you need two passes over the matches: count them, create the array, and then reset the matcher and run it again to fill the array:
int cnt = 0;
String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
Pattern ptrn = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
cnt++;
}
System.out.println(cnt);
String[] result = new String[cnt];
matcher.reset();
int idx = 0;
while (matcher.find()) {
result[idx] = matcher.group(0);
idx++;
}
System.out.println(Arrays.toString(result));
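The count-then-fill steps can also be wrapped into the splitKeepingQuotationMarks signature from the question. The sketch below does that and additionally runs the alternation form of the pattern through the same helper to show that both forms produce the same tokens on the sample input. The class name QuotedSplit and the helper split are assumptions for illustration; Arrays is used only for printing.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplit {
    // Unrolled form: quote, run of ordinary chars, then (escape pair + run of ordinary chars)*, quote.
    static final Pattern UNROLLED =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");

    // Alternation form: each character inside the quotes is either an escape pair or an ordinary char.
    static final Pattern ALTERNATION =
            Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\S+");

    public static String[] splitKeepingQuotationMarks(String s) {
        return split(UNROLLED, s);
    }

    // Two passes (count, then fill) so that no collection class is needed.
    static String[] split(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        String[] out = new String[count];
        m.reset();
        for (int i = 0; m.find(); i++) {
            out[i] = m.group();
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        System.out.println(Arrays.toString(splitKeepingQuotationMarks(s)));
        System.out.println(Arrays.toString(split(ALTERNATION, s)));
    }
}
```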

Another regex approach that works uses a negative lookbehind: "words" (\w+) OR "quote followed by anything up to the next quote that ISN'T preceded by a backslash", and set your match to "global" (don't return on first match)
(\w+|".*?(?<!\\)")
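In Java there is no "global" flag; calling find() in a loop plays that role. A minimal sketch of this lookbehind pattern, with a hypothetical class name LookbehindTokens, might look like the following. Note that \w+ splits bare tokens at any non-word character (for example at an apostrophe), and that a quoted string ending in an escaped backslash would not be closed correctly, so this is only suitable if the input avoids those cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindTokens {
    // \w+ for bare words, or a quote up to the next quote that is not preceded by a backslash.
    static final Pattern TOKEN = Pattern.compile("\\w+|\".*?(?<!\\\\)\"");

    // Prints every token and returns how many were found.
    static int printTokens(String s) {
        Matcher m = TOKEN.matcher(s);
        int count = 0;
        while (m.find()) { // each find() call continues after the previous match
            System.out.println("[" + count + "] " + m.group());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        System.out.println(printTokens(s) + " tokens");
    }
}
```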

An alternative method that does not use a regex:
import java.util.ArrayList;
import java.util.Arrays;
public class SplitKeepingQuotationMarks {
public static void main(String[] args) {
String pattern = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
System.out.println(Arrays.toString(splitKeepingQuotationMarks(pattern)));
}
public static String[] splitKeepingQuotationMarks(String s) {
ArrayList<String> results = new ArrayList<>();
StringBuilder last = new StringBuilder();
boolean inString = false;
boolean wasBackSlash = false;
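for (char c : s.toCharArray()) {
if (Character.isSpaceChar(c) && !inString) {
wasBackSlash = false;
if (last.length() > 0) {
results.add(last.toString());
last.setLength(0); // Clears the s.b.
}
} else if (c == '"') {
last.append(c);
if (!wasBackSlash) {
inString = !inString; // only an unescaped quote opens or closes a string
}
wasBackSlash = false;
} else if (c == '\\') {
wasBackSlash = !wasBackSlash; // a backslash escaped by another backslash escapes nothing
last.append(c);
} else {
wasBackSlash = false; // the escape applies only to the character right after it
last.append(c);
}
}
if (last.length() > 0) {
results.add(last.toString()); // add the final token, if any
}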
return results.toArray(new String[results.size()]);
}
}
Output:
[This, is, "a string", and, this, is, "a \"nested\" string"]

Regex to add digit between delimiter characters if missing

I haven't used regex much and I need a little help. I have digits separated by the dot character, something like this:
0.0.1
1.1.12.1
20.3.4.00.1
Now I would like to ensure that each number between . has two digits:
00.00.01
01.01.12.01
20.03.04.00.01
How can I accomplish that? Thank you for your help.

You can use String.split() to accomplish this:
public static void main(String[] args) {
String[] splitString = "20.3.4.00.1".split("\\.");
String output = "";
for(String a : splitString)
{
if(a.length() < 2)
{
a = "0" + a;
}
output += a + ".";
}
output = output.substring(0, output.length() - 1);
System.out.println(output);
}

Use this pattern
\b(?=\d(?:\.|$))
and replace with 0
\b # <word boundary>
(?= # Look-Ahead
\d # <digit 0-9>
(?: # Non Capturing Group
\. # "."
| # OR
$ # End of string/line
) # End of Non Capturing Group
) # End of Look-Ahead
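In Java the pattern above can be passed to String.replaceAll, with the backslashes doubled. Because the match is zero-width, replacing it with "0" simply inserts the zero. The following is a minimal sketch; the class name ZeroPadLookahead and the method pad are made up.

```java
class ZeroPadLookahead {
    static String pad(String s) {
        // Zero-width match at a word boundary before a single digit that is followed
        // by '.' or the end of the input; replacing the empty match inserts the "0".
        return s.replaceAll("\\b(?=\\d(?:\\.|$))", "0");
    }

    public static void main(String[] args) {
        System.out.println(pad("0.0.1"));       // 00.00.01
        System.out.println(pad("1.1.12.1"));    // 01.01.12.01
        System.out.println(pad("20.3.4.00.1")); // 20.03.04.00.01
    }
}
```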

You can iterate over the matching groups retrieved from matching the following expression: /([^.]+)/g.
Example:
public class StackOverFlow {
public static String text;
public static String pattern;
static {
text = "20.3.4.00.1";
pattern = "([^.]+)";
}
public static String appendLeadingZero(String text) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);
StringBuilder sb = new StringBuilder();
while (m.find()) {
String firstMatchingGroup = m.group(1);
if (firstMatchingGroup.length() < 2) {
sb.append("0" + firstMatchingGroup);
} else {
sb.append(firstMatchingGroup);
}
sb.append(".");
}
return sb.substring(0, sb.length() - 1);
}
public static void main(String[] args) {
System.out.println(appendLeadingZero(text));
}
}

I am going with the assumption that you want to ensure every integer is at least two digits, both between . and on the ends. This is what I came up with
public String ensureTwoDigits(String original){
return original.replaceAll("(?<!\\d)(\\d)(?!\\d)","0$1");
}
Test case
public static void main(String[] args) {
Foo f = new Foo();
List<String> values = Arrays.asList("1",
"1.1",
"01.1",
"01.01.1.1",
"01.2.01",
"01.01.01");
values.forEach(s -> System.out.println(s + " -> " + f.ensureTwoDigits(s)));
}
Test output
1 -> 01
1.1 -> 01.01
01.1 -> 01.01
01.01.1.1 -> 01.01.01.01
01.2.01 -> 01.02.01
01.01.01 -> 01.01.01
The regex (?<!\\d)(\\d)(?!\\d) uses both a negative lookbehind and a negative lookahead to check whether a single digit has other digits around it. If it has none, a zero is put in front of it. The replacement string "0$1" says to put a 0 in front of the first capturing group. There is only one group, (\\d), the single-digit occurrence.
EDIT: I should note that this is not a strict match to the original requirements. It doesn't matter what appears between single digits: letters, various punctuation, etc. will all pass through, with a zero put in front of any single digit. If you want it to fail on or skip strings that contain characters other than digits and ., the regex would need to be changed.
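One possible way to get the stricter behaviour, rather than changing the replacement regex itself, is to validate the whole string first and only then pad. The sketch below assumes that valid input is nothing but digit groups separated by single dots and that invalid input should raise an exception; both are assumptions, and the names StrictZeroPad and ensureTwoDigitsStrict are hypothetical.

```java
class StrictZeroPad {
    static String ensureTwoDigitsStrict(String original) {
        // Assumption: valid input is one or more digit groups separated by single dots.
        if (!original.matches("\\d+(?:\\.\\d+)*")) {
            throw new IllegalArgumentException("Not a dot-separated list of numbers: " + original);
        }
        // Same replacement as above: pad every digit that has no digit on either side.
        return original.replaceAll("(?<!\\d)(\\d)(?!\\d)", "0$1");
    }

    public static void main(String[] args) {
        System.out.println(ensureTwoDigitsStrict("20.3.4.00.1")); // 20.03.04.00.01
        try {
            ensureTwoDigitsStrict("1a.2");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```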

You can use this simple regex:
\\b\\d\\b
and replace with 0$0
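A minimal sketch of that replacement in Java, where $0 in the replacement string refers to the whole match (here, the lone digit). The class name ZeroPadWordBoundary and the method pad are made up for the example.

```java
class ZeroPadWordBoundary {
    static String pad(String s) {
        // \b\d\b matches a digit with a word boundary on both sides, that is, a single-digit group.
        return s.replaceAll("\\b\\d\\b", "0$0");
    }

    public static void main(String[] args) {
        System.out.println(pad("0.0.1"));       // 00.00.01
        System.out.println(pad("1.1.12.1"));    // 01.01.12.01
        System.out.println(pad("20.3.4.00.1")); // 20.03.04.00.01
    }
}
```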

String does not match regex

I want the regex to check that there is a "word" ( [a-zA-Z0-9]+ ), optionally followed by a bracket of the form (id = [0-9]+ ), then one of these six "relations", followed by another word. There has to be at least one whitespace between the words and the relation. There can be more whitespace between the words and the relation, and between (, ), id, = and the number. The three Strings are just there to shorten the lines.
This code always prints false no matter what I try:
String first = "\\s[a-zA-Z0-9]+(\\s\\(\\sid\\s=\\s[0-9]+\\s\\))?\\s+";
String second = "(contains|contained-in|part-of|has-part|successor-of|predecessor-of)\\s+";
String third = "[a-zA-Z0-9]+(\\s\\(\\sid\\s=\\s[0-9]+\\s\\))?\\s";
Pattern linePattern = Pattern.compile(first + second + third);
public static void main(String[] args) {
String a = "apple(id=107) contained-in tree";
Matcher matcher = linePattern.matcher(a);
boolean matches = matcher.matches();
System.out.println(matches);
}

Your pattern requires exactly one whitespace character at every \s, including one before the first word and one after the last word, and your input does not have them. Try using \s* instead:
\s*[a-zA-Z0-9]+(\s*\(\s*id\s*=\s*[0-9]+\s*\))?\s*(contains|contained-in|part-of|has-part|successor-of|predecessor-of)\s+[a-zA-Z0-9]+(\s*\(\s*id\s*=\s*[0-9]+\s*\))?\s*
Pattern linePattern = Pattern.compile("\\s*[a-zA-Z0-9]+(\\s*\\(\\s*id\\s*=\\s*[0-9]+\\s*\\))?\\s*(contains|contained-in|part-of|has-part|successor-of|predecessor-of)\\s+[a-zA-Z0-9]+(\\s*\\(\\s*id\\s*=\\s*[0-9]+\\s*\\))?\\s*");
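To keep the long pattern readable, it can be assembled from named fragments, much like the three Strings in the question. This is only a sketch: the names RelationLine, WORD, RELATION and isRelationLine are hypothetical, and the groups are made non-capturing because nothing is extracted here.

```java
import java.util.regex.Pattern;

class RelationLine {
    // A word, optionally followed by "(id = <digits>)" with optional whitespace around each part.
    private static final String WORD =
            "[a-zA-Z0-9]+(?:\\s*\\(\\s*id\\s*=\\s*[0-9]+\\s*\\))?";

    private static final String RELATION =
            "(?:contains|contained-in|part-of|has-part|successor-of|predecessor-of)";

    static final Pattern LINE =
            Pattern.compile("\\s*" + WORD + "\\s*" + RELATION + "\\s+" + WORD + "\\s*");

    static boolean isRelationLine(String s) {
        // matches() requires the whole line to fit the pattern.
        return LINE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRelationLine("apple(id=107) contained-in tree"));             // true
        System.out.println(isRelationLine("apple ( id = 107 ) contains tree ( id = 3 )")); // true
        System.out.println(isRelationLine("apple tree"));                                  // false
    }
}
```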

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
