I need to parse raw data and allow only strings that contain letters and at most ONE punctuation character.
Here is what I have done so far:
public class ProcessRawData {
public static void main(String[] args) {
String myData = "Australia India# America#!";
ProcessRawData data = new ProcessRawData();
data.process(myData);
}
public void process(String rawData) {
String[] splitData = rawData.split(" ");
for (String s : splitData) {
System.out.println("My Data Elements: " + s);
Pattern pattern = Pattern.compile("^[\\p{Alpha}\\p{Punct}]*$");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("Allowed");
} else {
System.out.println("Not allowed");
}
}
}
}
It prints the following:
My Data Elements: Australia
Allowed
My Data Elements: India#
Allowed
My Data Elements: America#!
Allowed
I expect America#! to be reported as not allowed, since it contains more than one punctuation character.
I guess I might need to use quantifiers, but I am not sure where to place them so that ONLY one punctuation character is allowed.
Can someone help?
You should compile your Pattern outside the loop.
When using matches(), there's no need for ^ and $, since it'll match against the entire string anyway.
If you need at most one punctuation character, you need to match a single optional punctuation character, preceded and/or followed by optional alphabet characters.
Note that using \\p{Alpha} and \\p{Punct} excludes digits. No digit will be allowed. If you want to consider a digit as a special character, replace \\p{Punct} with \\P{Alpha} (uppercase P means not Alpha).
public static void main(String[] args) {
process("Australia India# Amer$ca America#! America1");
}
public static void process(String rawData) {
Pattern pattern = Pattern.compile("\\p{Alpha}*\\p{Punct}?\\p{Alpha}*");
for (String s : rawData.split(" ")) {
System.out.println("My Data Elements: " + s);
if (pattern.matcher(s).matches()) {
System.out.println("Allowed");
} else {
System.out.println("Not allowed");
}
}
}
Output
My Data Elements: Australia
Allowed
My Data Elements: India#
Allowed
My Data Elements: Amer$ca
Allowed
My Data Elements: America#!
Not allowed
My Data Elements: America1
Not allowed
You may use
^\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?$
Explanation:
^ - start of string
\\p{Alpha}* - zero or more letters
(?:\\p{Punct}\\p{Alpha}*)? - one or zero (due to the ? quantifier) sequences of:
\\p{Punct} - a single occurrence of a punctuation symbol
\\p{Alpha}* - zero or more letters
$ - end of string.
Using it with String#matches will allow dropping the ^ and $ anchors since the pattern will then be anchored by default:
if (input.matches("\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?")) { ... }
You can do it with a simple negative look-ahead:
((?!\\p{Punct}{2}).)*
So your code becomes simply:
public void process(String rawData) {
// matches() tests the whole input, so no ^ and $ anchors are needed
if (rawData.matches("((?!\\p{Punct}{2}).)*")) {
System.out.println("Allowed");
} else {
System.out.println("Not allowed");
}
}
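The regex asserts, at each position, that the next two characters are not both {Punct}. Note that it therefore rejects only adjacent punctuation characters: a string such as Am#eri!ca, or one containing digits, still matches.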
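To make the difference between the look-ahead form and the stricter letter/punctuation pattern from the earlier answer concrete, here is a minimal sketch that runs both on a few inputs. The class and method names (PunctuationCheckDemo, noAdjacent, atMostOne) are made up for illustration, and the extra sample inputs are assumptions, not part of the original question.

```java
import java.util.regex.Pattern;

public class PunctuationCheckDemo {
    // Look-ahead form: rejects only two punctuation characters in a row.
    private static final Pattern NO_ADJACENT =
            Pattern.compile("((?!\\p{Punct}{2}).)*");
    // Strict form: letters with at most one punctuation character anywhere.
    private static final Pattern AT_MOST_ONE =
            Pattern.compile("\\p{Alpha}*\\p{Punct}?\\p{Alpha}*");

    static boolean noAdjacent(String s) {
        return NO_ADJACENT.matcher(s).matches();
    }

    static boolean atMostOne(String s) {
        return AT_MOST_ONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hypothetical inputs chosen to show where the two patterns disagree.
        String[] samples = {"India#", "America#!", "Am#eri!ca", "America1"};
        for (String s : samples) {
            System.out.println(s + " -> lookahead=" + noAdjacent(s)
                    + ", strict=" + atMostOne(s));
        }
    }
}
```

Both patterns agree on India# and America#!, but only the strict pattern rejects Am#eri!ca and America1.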
I hope that would be helpful.
public static void process(String rawData) {
String[] splitData = rawData.split(" ");
for (String s : splitData) {
Pattern pNum = Pattern.compile("[0-9]");
Matcher match = pNum.matcher(s);
if (match.find()) {
System.out.println(s + ": Not Allowed");
continue;
}
Pattern p = Pattern.compile("[^a-z]", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(s);
int count = 0;
while (m.find()) {
count = count + 1;
}
if (count > 1) {
System.out.println(s + ": Not Allowed");
} else {
System.out.println(s + ": Allowed");
}
}
}
Output
Australia: Allowed
India#: Allowed
America#!: Not Allowed
America1: Not Allowed
You can use the following regex:
^[A-Za-z]*[!"\#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]?[A-Za-z]*$
This allows at most one punctuation character, at any position in the string.
How can I check for special characters in a string? I am checking only for empty spaces using a regex, but when I enter special characters it treats them as spaces. Below is my code:
private boolean emptySpacecheck(String msg){
return msg.matches(".*\\w.*");
}
How to check for special chars?
You can use a Pattern and Matcher to check for special characters, as in the example below:
Pattern regex = Pattern.compile("[$&+,:;=\\\\?@#|/'<>.^*()%!-]");
if (regex.matcher(your_string).find()) {
Log.d("TTT", "SPECIAL CHARS FOUND");
return;
}
Hope this helps. If you need more help, just ask.
An easy way is to check whether the string contains any non-alphanumeric characters.
Try this:
StringChecker.java
public class StringChecker {
public static void main(String[] args) {
String str = "abc$def^ghi#jkl";
Pattern p = Pattern.compile("[^a-z0-9 ]", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
System.out.println(str);
int count = 0;
while (m.find()) {
count = count+1;
System.out.println("position " + m.start() + ": " + str.charAt(m.start()));
}
System.out.println("There are " + count + " special characters");
}
}
The output looks like this:
$ java StringChecker
abc$def^ghi#jkl
position 3: $
position 7: ^
position 11: #
There are 3 special characters
You can pass your own pattern to Pattern.compile, depending on which special characters you need to check for:
Pattern.compile("[$&+,:;=\\\\?@#|/'<>.^*()%!-]");
...when i enter special chars it's considering them as space.
That means you only want to check whether the String contains a space.
You don't need a regular expression to check for a space; you can simply call the String#contains method.
private boolean emptySpacecheck(String msg){
return msg != null && msg.contains(" ");
}
You can use the following RegExp:
private boolean emptySpacecheck(String msg){
return msg.matches(".*\\s+.*");
}
\s matches any of these characters: [ \t\n\x0B\f\r]
Try it online: https://regex101.com/r/GztOoI/1
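As a rough sketch of keeping the two checks separate, the helper below tests for whitespace and for special characters with two different patterns. The class and method names are hypothetical, and "special" is assumed here to mean anything that is neither an ASCII letter, an ASCII digit, nor whitespace; adjust the character class if your definition differs.

```java
import java.util.regex.Pattern;

public class InputChecks {
    // Any whitespace character: space, tab, newline, and so on.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");
    // Assumption: "special" means not an ASCII letter/digit and not whitespace.
    private static final Pattern SPECIAL = Pattern.compile("[^\\p{Alnum}\\s]");

    static boolean hasWhitespace(String msg) {
        return msg != null && WHITESPACE.matcher(msg).find();
    }

    static boolean hasSpecialChars(String msg) {
        return msg != null && SPECIAL.matcher(msg).find();
    }

    public static void main(String[] args) {
        System.out.println(hasWhitespace("a b"));    // true
        System.out.println(hasWhitespace("a$b"));    // false
        System.out.println(hasSpecialChars("a$b"));  // true
        System.out.println(hasSpecialChars("a b1")); // false
    }
}
```

Using find() instead of matches() avoids wrapping the pattern in .* on both sides.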
I would like to remove single or double quotes from both ends of a string. The string may contain additional quotes or/and double quotes which shall remain untouched - so removeAll() is not an option.
String one = "\"some string\"";
String two = "'some \"other string\"'";
// expected result
// some string
// some "other string"
What I tried so far:
two = two.replace("/^[\"\'])|([\"\']$/g", "");
The following would work but there must be a much more elegant way to achieve this..
if ((one != null && one.length() > 1) && ((one.startsWith("\"") && one.endsWith("\"")) ||
(one.startsWith("\'") && one.endsWith("\'")))) {
one = one.substring(1, one.length() - 1);
}
Any ideas?
Update / clarification
My use case is the command line interface of an app, where the user can also drag files/paths into, instead of typing them.
Under Windows the dragged files are surrounded by double quotes, under Linux by single quotes. All I want to do is get rid of them. So in my use case the quotes are always symmetric (they match).
But I can perfectly well live with a solution that strips them even if they don't match, because in practice they always do.
Option 1: Removing all single and double quotes from start and end
You can use replaceAll which accepts a regular expression - replace doesn't - and do it twice, once for quotes at the start of the string and once for quotes at the end:
public class Test {
public static void main(String[] args) {
String text = "\"'some \"other string\"'";
String trimmed = text
.replaceAll("^['\"]*", "")
.replaceAll("['\"]*$", "");
System.out.println(trimmed);
}
}
The ^ in the first replacement anchors the quotes to the start of the string; the $ in the second anchors the quotes to the end of the string.
Note that this doesn't try to "match" quotes at all, unlike your later code.
Option 2: Removing a single quote character from start and end, if they match
String trimmed = text.replaceAll("^(['\"])(.*)\\1$", "$2");
This will trim exactly one character from the start and end, if they both match. Sample:
public class Test {
public static void main(String[] args) {
trim("\"foo\"");
trim("'foo'");
trim("\"bar'");
trim("'bar\"");
trim("\"'baz'\"");
}
static void trim(String text) {
String trimmed = text.replaceAll("^(['\"])(.*)\\1$", "$2");
System.out.println(text + " => " + trimmed);
}
}
Output:
"foo" => foo
'foo' => foo
"bar' => "bar'
'bar" => 'bar"
"'baz'" => 'baz'
To complement Jon Skeet's answer: if you want to remove the quotes only when there is one at the beginning AND one at the end, you can do:
public String removeQuotes(String str) {
Pattern pattern = Pattern.compile("^['\"](.*)['\"]$");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
return matcher.group(1);
} else {
return str;
}
}
If you are looking for a JavaScript version, try this:
function t(k){
var l="\"\'"; //you can add more chars here.
if (l.indexOf(k[0])>-1) {
return t(k.substr(1,k.length));
} else if (l.indexOf(k[k.length-1])>-1) {
return t(k.substr(0,k.length-1));
} else {
return k;
}
}
One possible way with using replaceFirst():
String one = "\"some string\"";
System.out.println("one: " + one);
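// remove the first quote in the string
one = one.replaceFirst("\"", "");
// reverse the string so that the last quote becomes the first one
String reversed = new StringBuilder(one).reverse().toString();
reversed = reversed.replaceFirst("\"", "");
// reverse back to restore the original order
one = new StringBuilder(reversed).reverse().toString();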
System.out.println("result: " + one);
I haven't used regex much and I need a little bit of help. I have numbers that are separated by dot characters, something like this:
0.0.1
1.1.12.1
20.3.4.00.1
Now I would like to ensure that each number between the dots has two digits:
00.00.01
01.01.12.01
20.03.04.00.01
How can I accomplish that? Thank you for your help.
You can use String.split() to accomplish this:
public static void main(String[] args) {
String[] splitString = "20.3.4.00.1".split("\\.");
String output = "";
for(String a : splitString)
{
if(a.length() < 2)
{
a = "0" + a;
}
output += a + ".";
}
output = output.substring(0, output.length() - 1);
System.out.println(output);
}
Use this pattern:
\b(?=\d(?:\.|$))
and replace each match with 0.
\b # <word boundary>
(?= # Look-Ahead
\d # <digit 0-9>
(?: # Non Capturing Group
\. # "."
| # OR
$ # End of string/line
) # End of Non Capturing Group
) # End of Look-Ahead
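In Java, the pattern broken down above could be applied roughly as follows. This is a sketch only: the class and method names (PadSegments, pad) are invented for illustration, and the backslashes are doubled because the pattern sits inside a Java string literal.

```java
public class PadSegments {
    // Zero-width match: a word boundary directly before a single digit
    // that is followed by a dot or by the end of the input.
    private static final String LONE_DIGIT = "\\b(?=\\d(?:\\.|$))";

    static String pad(String version) {
        // The match is empty, so replacing it with "0" inserts a zero
        // in front of the lone digit without removing anything.
        return version.replaceAll(LONE_DIGIT, "0");
    }

    public static void main(String[] args) {
        System.out.println(pad("0.0.1"));       // 00.00.01
        System.out.println(pad("1.1.12.1"));    // 01.01.12.01
        System.out.println(pad("20.3.4.00.1")); // 20.03.04.00.01
    }
}
```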
You can iterate over the matching groups retrieved from matching the following expression: /([^.]+)/g.
Example:
public class StackOverFlow {
public static String text;
public static String pattern;
static {
text = "20.3.4.00.1";
pattern = "([^.]+)";
}
public static String appendLeadingZero(String text) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);
StringBuilder sb = new StringBuilder();
while (m.find()) {
String firstMatchingGroup = m.group(1);
if (firstMatchingGroup.length() < 2) {
sb.append("0" + firstMatchingGroup);
} else {
sb.append(firstMatchingGroup);
}
sb.append(".");
}
return sb.substring(0, sb.length() - 1);
}
public static void main(String[] args) {
System.out.println(appendLeadingZero(text));
}
}
I am going with the assumption that you want to ensure every integer is at least two digits, both between . and on the ends. This is what I came up with
public String ensureTwoDigits(String original){
return original.replaceAll("(?<!\\d)(\\d)(?!\\d)","0$1");
}
Test case
public static void main(String[] args) {
Foo f = new Foo();
List<String> values = Arrays.asList("1",
"1.1",
"01.1",
"01.01.1.1",
"01.2.01",
"01.01.01");
values.forEach(s -> System.out.println(s + " -> " + f.ensureTwoDigits(s)));
}
Test output
1 -> 01
1.1 -> 01.01
01.1 -> 01.01
01.01.1.1 -> 01.01.01.01
01.2.01 -> 01.02.01
01.01.01 -> 01.01.01
The regex (?<!\\d)(\\d)(?!\\d) uses both a negative lookbehind and a negative lookahead so that it matches only a digit with no other digit immediately before or after it. Every such single digit gets a zero put in front of it. The replacement string "0$1" says put a 0 in front of the first capturing group. There really is only one, that being (\\d), the single-digit occurrence.
EDIT: I should note that I realize this is not a strict match to the original requirements. It won't matter what you use between single digits: letters, various punctuation, etc. will all come back unchanged, with a zero in front of any single digit. If you want it to fail or skip strings that contain characters other than digits and ., the regex would need to be changed.
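One possible way to get the stricter behaviour, sketched under the assumption that a valid input consists only of digit groups separated by single dots, is to validate the format first and pad afterwards. The class name, the method name and the choice of IllegalArgumentException are illustrative, not part of the answer above.

```java
public class StrictTwoDigits {
    static String ensureTwoDigitsStrict(String original) {
        // Assumed format: one or more digit groups separated by single dots.
        if (!original.matches("\\d+(?:\\.\\d+)*")) {
            throw new IllegalArgumentException(
                    "Not a dot-separated list of numbers: " + original);
        }
        // Same replacement as above: pad every isolated single digit.
        return original.replaceAll("(?<!\\d)(\\d)(?!\\d)", "0$1");
    }

    public static void main(String[] args) {
        System.out.println(ensureTwoDigitsStrict("1.1.12.1")); // 01.01.12.01
        try {
            ensureTwoDigitsStrict("1.a.2");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```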
You can use this simple regex:
\\b\\d\\b
and replace with 0$0
I have some PDF files in my Downloads folder whose names follow a particular pattern. I need to pick the most recently saved file.
My code is
public static void main(String args[])
{
String directory=System.getProperty("user.home")+"\\Downloads";
File dir=new File(directory);
for(File file:dir.listFiles())
{
if(file.getName().endsWith(".pdf"))
{
String res=file.getName();
match(res);
//System.out.println(file.getName());
}
}
}
private static void match(String res) {
String pattern="[a-zA-Z][0-9][0-9]CR[0-9][0-9][0-9][0-9]-[a-zA-Z][a-zA-Z][a-zA-Z]-[A-Z]-[0-9] \\(\\d+\\).pdf";
Pattern r=Pattern.compile(pattern);
Matcher m=r.matcher(res);
if(m.find())
{
System.out.println("******* Match *********"+m.group());
}
else
{
System.out.println("******No match*******");
}
}
And my output is like this
******* Match *********F90CR0010-HBR-C-4 (5).pdf
******* Match *********F90CR0010-HBR-C-4 (6).pdf
******* Match *********F90CR0010-HBR-C-4 (7).pdf
Now I need to find the file that has the greatest number inside the parentheses (). So in this case I need
******* Match *********F90CR0010-HBR-C-4 (7).pdf
How can I match the greatest integer with a regex?
Thanks
A simple strategy may be to retrieve the digit in parenthesis, to fill some sorted map where the mapping would be digit -> filename, and finally to get the filename associated to the greatest digit. I don't think it's possible simply with a REGEX.
You can add a group to your regex, and a counter to keep track of the number:
int greater = 0;
String greaterFile = "";
String pattern="[a-zA-Z][0-9][0-9]CR[0-9][0-9][0-9][0-9]-[a-zA-Z][a-zA-Z][a-zA-Z]-[A-Z]-[0-9] \\((\\d+)\\).pdf";
//^^^^^^^^
Pattern r=Pattern.compile(pattern);
Matcher m=r.matcher("F90CR0010-HBR-C-4 (7).pdf");
if(m.find())
{
System.out.println("******* Match *********"+m.group());
int number = Integer.parseInt(m.group(1));
if (number > greater)
{
greater = number;
greaterFile = m.group();
}
}
else
{
System.out.println("******No match*******");
}
System.out.println("Greater number is " + greater + " for " + greaterFile);
Notice that I did not escape the inner ( ) in \\((\\d+)\\).pdf. That is because of their function in the expression: they define a capturing group.
I can later retrieve the group by its index: group 0 is the entire match, and the next group, 1, is our number.
This is for one file, but you can easily transpose it to your context.
Edit: regarding your regex, it can be simplified like this:
String pattern="[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\).pdf";
\\d means a digit and {n} means the previous expression repeated n times.
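To carry the single-file example over to several files, one option is to loop over the names and remember the one with the largest captured number. The sketch below works on a plain list of names so that it runs without touching the file system; the class and method names are hypothetical, and the dot before pdf is escaped here so that it only matches a literal dot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestDownload {
    // Group 1 captures the number inside the parentheses.
    private static final Pattern NAME = Pattern.compile(
            "[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\)\\.pdf");

    static Optional<String> highestNumbered(List<String> fileNames) {
        String best = null;
        int bestNumber = -1;
        for (String name : fileNames) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) {
                int number = Integer.parseInt(m.group(1));
                if (number > bestNumber) {
                    bestNumber = number;
                    best = name;
                }
            }
        }
        // Empty when no name matches the pattern.
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "F90CR0010-HBR-C-4 (5).pdf",
                "F90CR0010-HBR-C-4 (7).pdf",
                "F90CR0010-HBR-C-4 (6).pdf",
                "notes.pdf");
        System.out.println(highestNumbered(names).orElse("no match"));
    }
}
```

In the original program the list of names would come from dir.listFiles(), for example by collecting file.getName() for each entry.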
I have an operation that deals with many space-delimited strings. I am looking for a regex for the String matches function that returns true if the first two space-separated substrings both start with a capital letter, and false if they do not.
Examples:
"AL_RIT_121 PA_YT_32 rit cell 22 pulse"
will return true, as the first two substrings, AL_RIT_121 and PA_YT_32, start with the capital letters A and P respectively
"AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri"
will return false as p is in lower case.
Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}")
will work with the .find() method. There's no reason to use matches on a prefix test, but if you have an external constraint, just do
Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}.*", Pattern.DOTALL)
To break this down:
^ matches the start of the string,
\\p{Lu} matches any upper-case letter,
\\S* matches zero or more non-space characters, including _
\\s+ matches one or more space characters, and
the second \\p{Lu} matches the upper-case letter starting the second word.
In the second variant, .* combined with Pattern.DOTALL matches the rest of the input.
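A minimal sketch of the find() variant, applied to the two sample lines from the question; the class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

public class PrefixCheck {
    // Upper-case letter, rest of the first word, whitespace,
    // then an upper-case letter starting the second word.
    private static final Pattern TWO_CAPITALIZED =
            Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}");

    static boolean firstTwoCapitalized(String line) {
        // find() is enough because the pattern is anchored at the start with ^.
        return TWO_CAPITALIZED.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(firstTwoCapitalized(
                "AL_RIT_121 PA_YT_32 rit cell 22 pulse"));    // true
        System.out.println(firstTwoCapitalized(
                "AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri")); // false
    }
}
```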
Simply string.matches("[A-Z]\\w+ [A-Z].*")
You can use a specific regex if those two examples demonstrate your input format:
^(?:[A-Z]+_[A-Z]+_\d+\s*){2}
Which means:
^ - Match the beginning of the string
(?: - Start a non-capturing group (used to repeat the following)
[A-Z]+ - Match one or more uppercase characters
_ - Match an underscore
[A-Z]+ - Match one or more uppercase characters
_ - Match an underscore
\d+ - Match one or more decimal digits (0-9)
\s* - Match zero or more whitespace characters
){2} - Repeat the above group exactly twice, once for each of the first two substrings
You would use it in Java like this:
Pattern pattern = Pattern.compile("^(?:[A-Z]+_[A-Z]+_\\d+\\s*){2}");
Matcher matcher = pattern.matcher(inputString);
// find() rather than matches(): the pattern describes only the prefix, not the whole string
if (matcher.find()) {
System.out.println("Match found.");
}
Check this out:
public static void main(String[] args)
{
String text = "AL_RIT_121 pA_YT_32 rit cell 22 pulse";
boolean areFirstTwoWordsCapitalized = areFirstTwoWordsCapitalized(text);
System.out.println("areFirstTwoWordsCapitalized = <" + areFirstTwoWordsCapitalized + ">");
}
private static boolean areFirstTwoWordsCapitalized(String text)
{
boolean rslt = false;
String[] words = text.split("\\s");
int wordIndx = 0;
boolean frstWordCap = false;
boolean scndWordCap = false;
for(String word : words)
{
wordIndx++;
//System.out.println("word = <" + word + ">");
Pattern ptrn = Pattern.compile("^[A-Z].+");
Matcher mtchr = ptrn.matcher(word);
while(mtchr.find())
{
String match = mtchr.group();
//System.out.println("\tMatch = <" + match + ">");
if(wordIndx == 1)
{
frstWordCap = true;
}
else if(wordIndx == 2)
{
scndWordCap = true;
}
}
}
rslt = frstWordCap && scndWordCap;
return rslt;
}
Try this:
public class RegularExp
{
/**
* @param args
*/
public static void main(String[] args) {
String regex = "[A-Z][^\\s.]*\\s[A-Z].*";
String str = "APzsnnm lmn Dlld";
System.out.println(str.matches(regex));
}
}