Regex expression to help count only the zeros in a string

I'm trying to count the number of 0s in a string of numbers: not just the character 0, but the number zero. For example, I want to count 0, 0.0, 0.000, etc. The numbers are separated by spaces, e.g.:
1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005
A simple search for " 0" in the string nearly does the job (I first have to insert a leading space into the string for this to work, in case the first number is a zero). However, it doesn't work for numbers of the form 0.x, e.g. 0.1, 0.02, etc. I suppose I need to check for 0, then check whether there is a decimal point followed by non-zero digits, but I have no idea how to do that. Something like:
" 0*|(0\\.(?!\\[1-9\\]))"
Does anyone have any ideas how I might accomplish this? Preferably using a regular expression. Or, if it's easier, I'm happy to count the number of non-zero elements. Thank you.
NOTE: I'm using split in Java to do this (split the string using the regular expression and then count with .length()).

How about this:
(?<=^|\s)[0.]+(?=\s|$)
Explanation:
(?<=^|\s) # Assert position after a space or the start of the string
[0.]+ # Match one or more zeroes/decimal points
(?=\s|$) # Assert position before a space or the end of the string
Remember to double the backslashes in Java strings.
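Tim's regex is meant to be used with find() in a loop rather than with split(). A minimal sketch of that usage, assuming the input is a single space-separated line (the class and method names are hypothetical). Note that [0.]+ also accepts a token made only of points, such as ".".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimZeroCount {
    // Tim's pattern: a whole whitespace-delimited token made only of zeros and points.
    // Caveat: [0.]+ also accepts a token that is nothing but "." characters.
    private static final Pattern ZERO_TOKEN =
            Pattern.compile("(?<=^|\\s)[0.]+(?=\\s|$)");

    // Counts the tokens that match ZERO_TOKEN, using find() instead of split().
    static int countZeros(String numbers) {
        Matcher m = ZERO_TOKEN.matcher(numbers);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005";
        System.out.println(countZeros(input)); // prints 1: only "0.0" is zero-valued
    }
}
```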

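You should instead split by whitespace, call Double.parseDouble() on each fragment, and, if it parses as a double, compare it to 0.
String[] parts = numbers.split("\\s+"); // numbers is the input string
int numZeros = 0;
for (String s : parts) {
    try {
        // 0, 0.0, 0.000 and -0 all parse to a value equal to 0
        if (Double.parseDouble(s) == 0) {
            numZeros++;
        }
    } catch (NumberFormatException e) {
        // not a number (e.g. an empty first fragment after leading whitespace): skip it
    }
}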
There is no easy regex solution anyway. The obvious idea would be the \b word-boundary anchor, but it fails badly: . is not a word character, so \b0\b also matches the 0 in 1.0. Also, using Double.parseDouble means that values like -0 are supported too.

split() is not the solution to this problem, though it can be part of the solution, as Antti's answer demonstrated. You'll find it much easier to match the zero-valued numbers with find() in a loop and count the matches, like this:
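import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountZeros {
    public static void main(String[] args) {
        String s = "1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005 0. .0 0000 -0.0";
        // whole tokens only: optional minus, then zeros with at most one decimal point
        Pattern p = Pattern.compile("(?<!\\S)-?(?:0+(?:\\.?0*)|\\.0+)(?!\\S)");
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            System.out.printf("%n%s ", m.group());
            n++;
        }
        System.out.printf("%n%n%d zeroes total%n", n);
    }
}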
output:
0.0
0.
.0
0000
-0.0
5 zeroes total
This is how Tim meant for you to use the regex in his answer, too (I think). Breaking down my regex, we have:
(?<!\\S) is a negative lookbehind that matches a position that's not preceded by a non-whitespace character. It's equivalent to Tim's positive lookbehind, (?<=^|\s), which explicitly matches the beginning of the string or right after a whitespace character.
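-?(?:0+(?:\\.?0*)|\\.0+) matches an optional minus sign followed by either one or more zeros with an optional decimal point and optional further zeros, or a decimal point followed by one or more zeros. Either way, it requires at least one zero and allows at most one decimal point.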
(?!\\S) is equivalent to (?=\s|$) - it matches right before a whitespace character or at the end of the string.
The lookbehind and lookahead ensure that you always match the whole token, just like you would if you were splitting on whitespace. Without them, the regex would also match zeros that are part of non-zero tokens like 1230.0456.
EDIT (in response to a comment): My main objection to using split() is that it's needlessly convoluted. You're creating an array of strings comprising all the parts of the string you don't care about, then doing some math on the array's length to get the information you want. Sure, it's only one line of code, but it does a very poor job of communicating its intent. Anyone who's not already familiar with the idiom could have a very difficult time sussing out what it does.
Then there's the trailing empty tokens issue: if you use the split technique on my revised sample string you'll get a count of 4, not 5. That's because the last chunk of the string matches the split regex, meaning the last token should be an empty string. But Java (following Perl's lead) silently drops trailing empty tokens by default. You can override that behavior by passing a negative integer as the second argument, but what if you forget to do that? It's a very easy mistake to make, and potentially a very difficult one to troubleshoot.
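To see the trailing-empty-token issue concretely, here is a small sketch that counts zeros by splitting on the zero-token pattern, once with the default limit and once with a negative limit. The class and method names are hypothetical; Pattern.split(CharSequence) and Pattern.split(CharSequence, int) are the standard java.util.regex methods.

```java
import java.util.regex.Pattern;

class SplitCountDemo {
    // The zero-token pattern from the answer above.
    static final Pattern ZERO =
            Pattern.compile("(?<!\\S)-?(?:0+(?:\\.?0*)|\\.0+)(?!\\S)");

    // Default limit (0): trailing empty strings are removed from the result.
    static int countBySplitDefault(String s) {
        return ZERO.split(s).length - 1;
    }

    // Negative limit: trailing empty strings are kept.
    static int countBySplitKeepTrailing(String s) {
        return ZERO.split(s, -1).length - 1;
    }

    public static void main(String[] args) {
        String s = "1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005 0. .0 0000 -0.0";
        System.out.println(countBySplitDefault(s));      // 4: the final empty token is dropped
        System.out.println(countBySplitKeepTrailing(s)); // 5
    }
}
```

The find() loop needs no such limit argument, which is the point the answer makes.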
As for performance, the two approaches are virtually identical in speed (I haven't compared their memory use). It's not likely to be a problem when working with reasonably sized texts.

Related

Validate if input string is a number between 0-255 using regex

I'm having a problem matching an input string with a regex. I want to validate that the input number is between 0 and 255 and at most 3 characters long. The code mostly works, but when I input 000000 (zeros of any length), it shows true instead of false.
Here is my code :-
String IP = "000000000000000";
System.out.println(IP.matches("(0*(?:[0-9][0-9]?|[0-2][0-5][0-5]))"));

Tested this:
static String pattern = "^(([0-1]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\\.){3}([0-1]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$";
It works for the following:
IP Addresses xxx.xxx.xxx.xxx / xx.xx.xx.xx / x.x.x.x / mix of these.
Leading zeros are allowed.
Range 0-255 / maximum 3 digits.

You may use this regex:
boolean valid = IP.matches("^(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])$");
// since .matches expects complete match you can omit anchors
boolean valid = IP.matches("(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])");
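A quick way to check this pattern against the input from the question, as a sketch (the class and isValid helper are hypothetical). Because String.matches() must match the whole input, 000000 is rejected. Short zero-padded values such as 07 are still accepted, while 007 is not.

```java
class ByteRangeCheck {
    // 0-199 (an optional 1 followed by one or two digits), 200-249, or 250-255.
    // String.matches() requires the whole input to match, so no anchors are needed.
    static boolean isValid(String ip) {
        return ip.matches("(?:1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "99", "255", "256", "000000"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```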

You can use this pattern which matches "0", "1", ... "255":
"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

Using word boundaries to ensure that only numbers from 0 to 255 are matched, the optimized pattern I have to offer is:
\b(?:1\d{2}|2[0-4]\d|[1-9]?\d|25[0-5])\b
Pattern demo (in PHP/PCRE, to show the step count): 4010 steps when checking a list from 0 to 256.
This pattern will not match 01 or 001. (no match on one or more leading zeros)
Considerations:
Use quantifiers on consecutive duplicate characters.
Order the alternatives not sequentially (single-digit, double-digit, 100s, 200-249, 250-255), but with the quickest mismatches first.
Avoid non-essential capturing (or non-capturing) groups, even though it may seem logical to condense the "two hundreds" portion of the pattern.

Please try this
"^(((\d|0\d|00\d|\d{2}|0\d{2}|1\d{2}|2[0-4]\d|2[0-5]{2})\.){3})(\d|0\d|00\d|\d{2}|0\d{2}|1\d{2}|2[0-4]\d|2[0-5]{2})$"
It also works with leading zeros.

boolean valid = IP.matches("(0?[0-9]{1,2}|1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])");

Complete IPv4 match (JavaScript):
/(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])/g.exec(myIp);
https://regex101.com/r/tU3gC3/12
Minified :
/((1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.){3}(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])/g.exec(myIp);
https://regex101.com/r/tU3gC3/13

This works for the following pattern, including IPs whose parts have leading zeros,
e.g. 023.45.12.56:
pattern=(\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]);

If you need leading zeros, try this:
"((\\d{1,2}|[01]\\d{1,2}|[0-2][0-4]\\d|25[0-5])\\.){3}(\\d{1,2}|[01]\\d{1,2}|[0-2][0-4]\\d|25[0-5])"
It satisfies the following conditions: the IP address is a string of the form "A.B.C.D", where the values of A, B, C, and D may range from 0 to 255. Leading zeros are allowed. The length of A, B, C, or D can't be greater than 3.
Maybe somebody can help simplify it further?

If you want to validate an IPv4 address with a mask ('ip/mask'), the regex looks like this:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\/([0-9]|[1-2][0-9]|3[0-2]))$
Just the IP:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
JavaScript code to test it:
function isMatchByRegexp(stringToValidate, regexp) {
var re = new RegExp(regexp);
return re.test(stringToValidate);
}

(2[0-4][0-9])|([0-1]?[0-9]?[0-9])
To match 0 to 249 specifically

You can simplify it by thinking in terms of four cases that can occur:
String zeroTo255 = "((0|1)\\d{2}|2[0-4]\\d|25[0-5]|\\d{1,2})";
String validIpPattern = zeroTo255 + "\\." + zeroTo255 + "\\." + zeroTo255 + "\\." + zeroTo255;
(0|1)\d{2} catches any three digit number starting with 0 or 1.
2[0-4]\d catches numbers between 200 and 249.
25[0-5] catches numbers between 250 and 255.
\d{1,2} catches any one or two digit number
You can test it at https://regexr.com/
To test it there, you need to remove one of each pair of backslashes
(\\d --> \d):
((0|1)\d{2}|2[0-4]\d|25[0-5]|\d{1,2})
Note that \d represents digits in regular expressions, same as [0-9]
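Putting the pieces together, a sketch of how zeroTo255 and validIpPattern might be used with String.matches() (the class and helper names are hypothetical; the pattern strings are the ones from the answer):

```java
class Ipv4Check {
    // One part: three digits starting with 0 or 1, 200-249, 250-255, or any one- or two-digit number.
    static final String ZERO_TO_255 = "((0|1)\\d{2}|2[0-4]\\d|25[0-5]|\\d{1,2})";
    static final String VALID_IP_PATTERN =
            ZERO_TO_255 + "\\." + ZERO_TO_255 + "\\." + ZERO_TO_255 + "\\." + ZERO_TO_255;

    // String.matches() requires the entire string to match, so no ^...$ is needed.
    static boolean isValidIp(String s) {
        return s.matches(VALID_IP_PATTERN);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"192.168.0.1", "023.45.12.56", "256.1.1.1", "1.2.3"}) {
            System.out.println(s + " -> " + isValidIp(s));
        }
    }
}
```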

Which is the right regular expression to use for Numbers and Strings?

I am trying to create a simple IDE and color my JTextPane based on:
Strings (" ")
Comments (// and /* */)
Keywords (public, int ...)
Numbers (integers like 69 and floats like 1.5)
I color my source code by overriding the insertString and removeString methods inside the StyledDocument.
After much testing, I have finished comments and keywords.
Q1: For string coloring, I color my strings based on this regular expression:
Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);
while (matcherS.find()) {
setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}
This works 99% of the time, except when my string contains an escaped quote (\") inside the code. This messes up my whole color coding.
Can anyone correct my regular expression to fix my error?
Q2: For integer and decimal coloring, numbers are detected with this regular expression:
Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}
By using the regular expression "\d+", I am only handling integers and not floats. Also, integers that are part of another string are matched, which is not what I want inside an IDE. Which is the correct expression to use for number color coding?
Thank you for any help in advance!

For the strings, this is probably the fastest regex:
"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""
Formatted:
" [^"\\]*
(?: \\ . [^"\\]* )*
"
For integers and decimal numbers, the only foolproof expression I know of is this:
"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"
Formatted:
(?:
\d+
(?: \. \d* )?
| \. \d+
)
As a side note, if you run each pattern independently from the start of the string, the highlights could overlap.
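A sketch of how the two patterns might be run over a line of source code with find() (the class and helper names are hypothetical, and no Swing styling is involved; it only collects the matched text). The escaped quotes inside the first literal do not end the match, and the number pattern picks up both 1.5 and .25.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightPatterns {
    // String literal: a backslash escapes the next character, so \" does not close it.
    static final Pattern STRING =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
    // Integer or decimal: 12, 1.5, 3. or .25
    static final Pattern NUMBER =
            Pattern.compile("(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Collects every non-overlapping match of p in text, left to right.
    static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The Java source being scanned: String s = "say \"hi\""; double d = 1.5 + .25;
        String code = "String s = \"say \\\"hi\\\"\"; double d = 1.5 + .25;";
        System.out.println(findAll(STRING, code));
        System.out.println(findAll(NUMBER, code));
    }
}
```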

Try:
\\b\\d+(\\.\\d+)?\\b for int, float and double
"(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for strings

For integers, go with:
(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))

Match a string, ignoring escaped \" sequences:
".*?(?<!\\)"
The above starts a match once it sees a " and continues matching anything until it reaches the next " that is not preceded by a \. This is achieved with the lookbehind feature, explained very well at http://www.regular-expressions.info/lookaround.html
Match all numbers with & without decimal points
(\d+)(\.\d+)? matches one or more digits, optionally followed by a decimal point and one or more further digits.
The problem of matching numbers inside strings can be solved in 2 ways:
a. Modify the above so that the number must be surrounded by separators, \W(\d+)(\.\d+)?\W, which I don't think will be satisfactory in mathematical expressions (e.g. 10+10) or at the end of an expression (e.g. 10;).
b. Make it a matter of precedence. If string coloring is applied after number coloring, a number inside a string is first colored pink but then immediately overwritten with red, so string coloring takes precedence.

R1: I believe there is no regex-based answer to non-escaped " characters in the middle of an ongoing string. You'd need to actively process the text to eliminate or circumvent the false-positives for characters that are not meant to be matched, based on your specific syntax rules (which you didn't specify).
However:
If you mean to simply ignore escaped ones (\"), like Java does, then I believe you can include the escape+quote pair as an alternative inside the group, and the greedy * will take care of the rest:
\"((\\\\\")|[^\"])*\"
R2: I believe the following regex would work for finding both integers and fractions:
\\d+(\\.\\d+)?
You can expand it to find other kinds of numerals too. For example, \\d+([./]\\d+)? would additionally match numerals like "1/4".

regex to strip leading zeros treated as string

I have numbers like this that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)

You're almost there. You just need quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string with an empty string. The + quantifier means "one or more" (similarly, the * quantifier means "zero or more"), and the caret, ^, anchors the match to the beginning of the string.

The accepted solution fails if you need to get "0" from "00". This is the right one:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) means match one or more zeros if it is not followed by end of string.
Thanks to the commenter: I have updated the regex to match the author's description.
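A small sketch contrasting the two patterns on the all-zeros edge case (the class and helper names are hypothetical):

```java
class StripLeadingZeros {
    // "^0+" removes every leading zero, so "00" becomes the empty string.
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    // "^0+(?!$)" may not run to the end of the string, so an all-zero input keeps its last 0.
    static String stripKeepLastZero(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00000004334300343", "0003030435243", "00"}) {
            System.out.println(s + ": [" + stripAll(s) + "] vs [" + stripKeepLastZero(s) + "]");
        }
    }
}
```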

If you know the input strings contain only digits, you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
Converting to Long automatically strips all leading zeros, provided the value fits in a long (at most 9223372036854775807).

Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of the input (or of a line, in MULTILINE mode)
0+ - match the zero digit character one or more times
You can find an exhaustive list of constructs in the Pattern documentation.

\b0+\B will do the job. \b anchors the match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end at a position that is not a word boundary, so the last 0 is not removed when the input consists only of zeros (00...000).
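A sketch of the \b0+\B variant (the class and helper name are hypothetical). Because \b also occurs next to spaces, running it through replaceAll strips the leading zeros of every space-separated number in a line, as the last call in main shows.

```java
class BoundaryStrip {
    // \b: start at a word boundary; 0+: the leading zeros;
    // \B: end where there is no word boundary, so the final 0 of a token is never consumed.
    static String strip(String s) {
        return s.replaceAll("\\b0+\\B", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("00000004334300343"));
        System.out.println(strip("000"));
        System.out.println(strip("007 0 0042"));
    }
}
```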

The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex will match 0 character in quantity of one and more at the string beginning.
There is no reason to worry about using the replaceAll method here: the ^ (beginning of input) anchor ensures the replacement happens at most once.
Alternatively, you can use Java's built-in parsing to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
System.out.println(number); // prints 4334300343
The leading zeros will be stripped for you automatically.

I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
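A sketch checking the three cases described above (the class and helper names are hypothetical):

```java
class GroupStrip {
    // Group 1 (optional) takes the leading zeros; group 2 must keep at least one digit,
    // so an all-zero input backtracks until group 2 gets the final 0.
    static String strip(String s) {
        return s.replaceAll("(^0+)?(\\d+)", "$2");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"00000004334300343", "000", "100"}) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```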

In JavaScript, here is a simple and proper solution:
str = str.replaceAll(/^0+/g, "");
The global flag g is required when passing a regex to replaceAll; without it, replaceAll throws a TypeError.

Java Regex of String start with number and fixed length

I made a regular expression to check the length of a string whose characters are all digits and which starts with a given number, e.g. 123.
Following is my expression:
REGEX = "^123\\d+{9}$";
But it doesn't check the length of the string. It should validate only strings whose length is 9 and that start with 123.
But if I pass the string 1234567891, it also validates it. What am I doing wrong, and how should I do it?

As already answered here, the simplest way is to just remove the +:
^123\\d{9}$
or
^123\\d{6}$
Depending on what you need exactly.
You can also use another, a bit more complicated and generic approach, a negative lookahead:
(?!.{10,})^123\\d+$
Explanation:
(?!.{10,}) is a negative lookahead ((?=...) would be a positive lookahead). If the text at that position matches the lookahead's pattern, the overall match fails; in other words, the regex only succeeds if the pattern inside the negative lookahead does not match.
In this case, the string matches only if .{10,} doesn't match, i.e. the string is not 10 or more characters long, so it only matches if the pattern after the lookahead matches at most 9 characters.
A positive lookahead does the opposite: it only matches if the pattern in the lookahead also matches.
I'm just putting this here for curiosity's sake; it's more complex than what you need here.
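A sketch comparing the exact-length pattern with the negative-lookahead version (the class and helper names are hypothetical). It shows the practical difference: the lookahead version enforces a maximum length rather than an exact one, so 1234 also passes.

```java
class FixedLengthCheck {
    // Exactly nine characters: "123" followed by exactly six digits.
    static boolean exactNine(String s) {
        return s.matches("^123\\d{6}$");
    }

    // Rejects 10 or more characters, so this is a maximum length, not an exact one.
    static boolean atMostNine(String s) {
        return s.matches("(?!.{10,})^123\\d+$");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123456789", "1234567891", "1234"}) {
            System.out.println(s + ": exact=" + exactNine(s) + ", atMost=" + atMostNine(s));
        }
    }
}
```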

Try using this one:
^123\\d{6}$
I changed it to 6 because 1, 2, and 3 should probably still count as digits.
Also, I removed the +. With it, \d+ would match one or more digits (that is, an unlimited number of digits).

Based on your comment below Doorknob's answer, you can do this:
int length = 9;
String prefix = "123"; // or whatever
// e.g. ^123\d{6}$ for a 9-character string that starts with 123
String regex = "^" + prefix + "\\d{" + (length - prefix.length()) + "}$";
if (input.matches(regex)) {
    // good
} else {
    // bad
}
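A self-contained version of the same idea, as a sketch: the class and helper name are hypothetical, and wrapping the prefix in Pattern.quote() is our own assumption so that a prefix containing regex metacharacters is matched literally. A requested length shorter than the prefix is simply rejected.

```java
import java.util.regex.Pattern;

class FixedLengthPrefix {
    // True if input is `prefix` followed only by digits, with total length `length`.
    static boolean matchesFixedLength(String input, String prefix, int length) {
        int digits = length - prefix.length();
        if (digits < 0) {
            return false; // the prefix alone is already longer than allowed
        }
        // Pattern.quote makes the prefix literal; matches() anchors both ends
        String regex = Pattern.quote(prefix) + "\\d{" + digits + "}";
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesFixedLength("123456789", "123", 9));
        System.out.println(matchesFixedLength("1234567891", "123", 9));
    }
}
```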

Java: Change an var's value (String) according to the value of an regex

I would like to know if it is possible (and if so, how to implement it) to manipulate a String value in Java using a single regex.
For example:
String content = "111122223333";
String regex = "?";
Expected result: "1111 2222 3333 ##";

With one regex only, I don't think it is possible. But you can:
first, replace (?<=(.))(?!\1|$) with a space;
then, use string concatenation to append " ##".
That is:
Pattern p = Pattern.compile("(?<=(.))(?!\\1|$)");
String ret = p.matcher(input).replaceAll(" ") + " ##";
If what you meant was to separate all groups, then drop the second operation.
Explanation: (?<=...) is a positive lookbehind, and (?!...) a negative lookahead. Here, you are looking for a position with one character behind it, which is captured, where the same character does not follow and the input has not ended (without the |$, a space would also be added at the very end). If so, replace it with a space. Lookaheads and lookbehinds are anchors, and like all anchors (including ^, $, \A, etc.), they do not consume characters, which is why this works.
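A runnable sketch of this approach (the class and helper names are hypothetical). Note the |$ inside the negative lookahead: without it, (?!\1) also succeeds at the very end of the input, because nothing is left there for \1 to match, which would put an extra space before " ##".

```java
import java.util.regex.Pattern;

class GroupRuns {
    // A position with a character behind it (captured as group 1) where that same
    // character does not follow and the input has not ended.
    static final Pattern RUN_BOUNDARY = Pattern.compile("(?<=(.))(?!\\1|$)");

    // Inserts a space between runs of identical characters.
    static String separateRuns(String input) {
        return RUN_BOUNDARY.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String content = "111122223333";
        System.out.println(separateRuns(content) + " ##");
    }
}
```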

OK, since the OP has redefined the problem (i.e., a group of 12 digits that should be split into 3 groups of 4, followed by ##), the solution becomes this:
Pattern p = Pattern.compile("(?<=\\d)(?=(?:\\d{4})+$)");
String ret = p.matcher(input).replaceAll(" ") + " ##";
The regex changes quite a bit:
(?<=\d): there should be one digit behind;
(?=(?:\d{4})+$): there should be one or more groups of 4 digits afterwards, up to the end of the input (the (?:...) is a non-capturing group; not sure it really makes a difference for Java).
Validating that the input is 12 digits long can easily be done with methods that are not regex-related at all. And this validation is, in fact, necessary: unfortunately, this regex will also turn 12345 into 1 2345, and there is no way around that, because Java lookbehinds cannot match arbitrary-length patterns. The .NET languages are an exception; with them, you could have written:
(?<=^(?:\d{4})+)(?=(?:\d{4})+$)
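A sketch of the digit-grouping approach together with the non-regex length check the answer recommends (the class and method names are hypothetical). The 12345 case shows why that check is needed.

```java
import java.util.regex.Pattern;

class GroupOfFour {
    // A position preceded by a digit and followed by one or more complete groups of 4 digits up to the end.
    static final Pattern BEFORE_GROUP_OF_4 =
            Pattern.compile("(?<=\\d)(?=(?:\\d{4})+$)");

    // Non-regex validation: exactly 12 characters, all ASCII digits.
    static boolean isTwelveDigits(String s) {
        return s.length() == 12 && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    static String format(String input) {
        return BEFORE_GROUP_OF_4.matcher(input).replaceAll(" ") + " ##";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"111122223333", "12345"}) {
            System.out.println(isTwelveDigits(s) ? format(s) : s + " rejected: not 12 digits");
        }
    }
}
```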
