Find substring between two strings

Find substring between two strings - java

I have a string such as:
file = "UserTemplate324.txt"
I'd like to extract "324". How can I do this without any external libraries like Apache StringUtils?

Assuming you want "the digits just before the dot":
String number = file.replaceAll(".*?(\\d+)\\..*", "$1");
This uses a regex to capture the digits and replace the entire input with them. If the input does not match the pattern, replaceAll returns it unchanged.
Of minor note is the non-greedy quantifier .*?, which consumes as little leading input as possible, so that it does not swallow the first digits of the number.

If you want to exclude the non-digit characters:
String number = file.replaceAll("\\D+", "");
\\D+ matches a run of one or more non-digit characters (anything other than 0-9). Replacing every such run with "", i.e. nothing, leaves only the digits.
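The two answers above give the same result for the sample name but differ once the name contains other digits. Here is a minimal sketch of the difference; the class and method names (DigitExtraction, digitsBeforeDot, allDigits) are made up for illustration. The first helper uses Matcher.find() instead of replaceAll, so a non-matching input yields an empty string rather than the unchanged input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitExtraction {
    // One or more digits immediately followed by a dot; group 1 holds the digits.
    private static final Pattern BEFORE_DOT = Pattern.compile("(\\d+)\\.");

    // Hypothetical helper: returns the digits just before the dot, or "" if none.
    static String digitsBeforeDot(String file) {
        Matcher m = BEFORE_DOT.matcher(file);
        return m.find() ? m.group(1) : "";
    }

    // Hypothetical helper: removes every non-digit character.
    static String allDigits(String file) {
        return file.replaceAll("\\D+", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsBeforeDot("UserTemplate324.txt"));  // 324
        System.out.println(allDigits("UserTemplate324.txt"));        // 324
        System.out.println(digitsBeforeDot("User2Template324.txt")); // 324
        System.out.println(allDigits("User2Template324.txt"));       // 2324
    }
}
```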

Related

Modifying part of a regex in replaceAll call

I am trying to format a string with a regex as follows:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0[^0-9]","");
What I think will happen is the string will become:
5.07+122.14 //the regex will delete the .0+ next to the 12
How can I create the regex so that it deletes only the .0 not the + sign?
I would prefer to do everything in the same call to "replaceAll"
thanks for any suggestions

Matched characters will be replaced. So, instead of matching the non-digit at the end, you can use lookahead, which will perform the desired check but won't consume any characters. Also, the shorthand for a non-digit is \D, which is a bit nicer to read than [^0-9]:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0(?=\\D)","");
If you want to replace all trailing zeros (for example, replace 5.00 with 5 instead of 50, which you probably don't want), then repeat the 0 one or more times with + to ensure that all zeros after the decimal point get replaced:
String string = "5.07+12.000+2.14";
string = string.replaceAll("\\.0+(?=\\D)","");
If the string never contains alphabetical or underscore _ characters (those and numeric characters count as word characters), then you can make it even prettier with a word boundary instead of a lookahead. A word boundary, as it sounds, will match a position with a word character on one side and a non-word character on the other side, with \b:
string = string.replaceAll("\\.0+\\b","");
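The lookahead and word-boundary variants differ in one case: when the .0 sits at the very end of the string. The lookahead (?=\D) needs an actual non-digit character to follow, while \b also matches at the end of the input. Here is a small sketch of this, with made-up helper names:

```java
class TrailingZeroStripper {
    // Variant with a lookahead: a non-digit character must follow the zeros.
    static String withLookahead(String s) {
        return s.replaceAll("\\.0+(?=\\D)", "");
    }

    // Variant with a word boundary: also matches at the end of the string.
    static String withWordBoundary(String s) {
        return s.replaceAll("\\.0+\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("5.07+12.000+2.14"));    // 5.07+12+2.14
        System.out.println(withWordBoundary("5.07+12.000+2.14")); // 5.07+12+2.14
        System.out.println(withLookahead("5.07+12.0"));           // 5.07+12.0 (unchanged)
        System.out.println(withWordBoundary("5.07+12.0"));        // 5.07+12
    }
}
```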

Regex for a random set of chars and digits

I am looking for a regex that matches only when it sees a string that is randomly filled by digits and chars.
For example, adfak332arg3 is allowed but 332352 and fagaaah are not allowed. .*[^\\s] looks fine for strings with only chars, but how can I fix it so that it accepts the desired strings and rejects the other two types?

Use a positive lookahead (?=) to ensure that the string contains required characters.
^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]+$
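To try the lookahead pattern in Java, the backslashes have to be doubled in the string literal. Here is a minimal sketch using the three sample strings from the question; the class and method names are only illustrative.

```java
import java.util.regex.Pattern;

class MixedAlnumCheck {
    // At least one letter, at least one digit, and nothing but letters and digits.
    private static final Pattern MIXED =
            Pattern.compile("^(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z\\d]+$");

    // Hypothetical helper: true only for strings mixing letters and digits.
    static boolean isMixed(String s) {
        return MIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMixed("adfak332arg3")); // true
        System.out.println(isMixed("332352"));       // false
        System.out.println(isMixed("fagaaah"));      // false
    }
}
```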

You can try this regex (letters are written as [a-zA-Z] because \w also matches digits and _):
"[a-zA-Z\\d]*\\d[a-zA-Z][a-zA-Z\\d]*|[a-zA-Z\\d]*[a-zA-Z]\\d[a-zA-Z\\d]*"

If you need a mixed string with at least one character from each of A-Z, a-z and 0-9, you can use:
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).+$
If you want to enforce a minimum length (e.g. at least 8 characters):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{8,}$
If you want to bound the length between a minimum and a maximum (e.g. at least 5 and at most 20 characters):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{5,20}$

To ensure that an input contains digits as well as characters, you could use this regex:
^(?:[A-Za-z]+\\d+|\\d+[A-Za-z]+)[A-Za-z\\d]*$
The regex ensures that the input contains at least one digit and one letter, and allows only digits and letters (no special characters etc.):
(?:[A-Za-z]+\d+|\d+[A-Za-z]+) ensures that the input starts with one or more letters followed by one or more digits or, alternatively (|\d+[A-Za-z]+), with one or more digits followed by one or more letters
[A-Za-z\d]* allows any number of letters or digits after the previous check
^ and $ anchor the match to the start and end of the input
Hope this helps!

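Try this regex. It only checks that a letter and a digit appear next to each other somewhere in the string, so use it with find() rather than matches().
[A-Za-z][0-9]|[0-9][A-Za-z]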

Merge multiple regex in Java

I have written a regex to omit the characters after the first occurrence of some characters (, and #)
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to get only numbers (remove spaces and other non numeric characters)
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regexes into one, like piping the output of the first regex into the second?

You can combine both regex in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");

So you need to remove all symbols other than digits, plus the whole rest of the string after the first hash symbol or comma.
You cannot just concatenate the two patterns with the | operator as they are, because one of the patterns implicitly runs to the end of the string.
Since the regex engine processes the string from left to right, one alternative must remove everything except digits, hashes and commas, and the other must match a comma or hash together with all the text after it. Use the DOTALL modifier (?s) in case your input contains newlines.
Use
(?s)[,#].*$|[^#,0-9]+
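Here is a quick sketch comparing the chained calls with the merged pattern on the question's input; the class and method names are made up for illustration. Both give the same result for single-line input. Because of (?s), the merged pattern also drops text after a newline that follows the first , or #, which the two-step version does not.

```java
class PhoneDigits {
    // The two-step version from the question: cut at the first , or #, then keep digits.
    static String twoPass(String s) {
        return s.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
    }

    // The merged version: one alternation handles both jobs in a single pass.
    static String onePass(String s) {
        return s.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        String number = "(123) (456) (7890)#123";
        System.out.println(twoPass(number)); // 1234567890
        System.out.println(onePass(number)); // 1234567890
    }
}
```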

regex to strip leading zeros treated as string

I have numbers like this that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)

You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string (given by the caret, ^) with the empty string. The + quantifier means one or more; similarly, the * quantifier means zero or more.

The accepted solution fails if you need to get "0" from "00". This is the right one:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) matches one or more leading zeros, provided the match is not followed by the end of the string, so the last zero of an all-zero input is kept.
Thank you to the commenter - I have updated the formula to match the description from the author.
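The difference between the two patterns only shows up for all-zero input. Here is a minimal sketch, with helper names chosen for illustration:

```java
class LeadingZeros {
    // Plain version: removes every leading zero, so "00" becomes "".
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    // Version with the negative lookahead: never consumes the final character.
    static String stripKeepLast(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripKeepLast("00000004334300343")); // 4334300343
        System.out.println(stripKeepLast("0003030435243"));     // 3030435243
        System.out.println(stripKeepLast("00"));                // 0
        System.out.println(stripAll("00").isEmpty());           // true
    }
}
```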

If you know the input strings contain only digits and the values fit in a long, you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
By converting to Long it will automatically strip off all leading zeroes.

Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of the input (or of each line in MULTILINE mode)
0+ - match the zero digit character one or more times
An exhaustive list of pattern constructs can be found in the Javadoc for Pattern.

\b0+\B will do the work. \b anchors the match at a word boundary, 0+ matches a sequence of one or more zeros, and \B requires that the match does not end at a word boundary, so the last 0 is kept when the input consists only of zeros (00...000).

The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex matches one or more 0 characters at the beginning of the string.
There is no reason to worry about the replaceAll method: the ^ (beginning of input) anchor ensures that the replacement happens only once.
Alternatively, you can use a built-in Java feature to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
// number is 4334300343
The leading zeros will be stripped for you automatically, provided the value fits in a long.
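Long.parseLong throws a NumberFormatException once the value no longer fits in a long. If the inputs can be longer than that, one option (not part of the answer above, just a sketch assuming digit-only, non-empty input) is java.math.BigInteger, which has no such limit. The class and method names are made up for illustration.

```java
import java.math.BigInteger;

class NumericStrip {
    // Assumption: s is non-empty and contains only digits;
    // otherwise the BigInteger constructor throws NumberFormatException.
    static String stripLeadingZeros(String s) {
        return new BigInteger(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("00000004334300343"));        // 4334300343
        System.out.println(stripLeadingZeros("00"));                       // 0
        System.out.println(stripLeadingZeros("000099999999999999999999")); // 99999999999999999999
    }
}
```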

I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.

Here is a simple solution for JavaScript (this is not Java syntax; Java has no regex literals):
str = str.replaceAll(/^0+/g, "");
In JavaScript, the global flag g is required when calling replaceAll with a regex.

How can I express such requirement using Java regular expression?

I need to check that a file contains some amounts that match a specific format:
between 1 and 15 characters (numbers or ",")
may contain at most one "," separator for decimals
must at least have one number before the separator
this amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude the malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not match with the requirement as I might catch up to 30 characters here.
Unfortunately, for reasons that are too long to explain here, I cannot simply pick a substring or use any other Java call. The match has to be in a single, Java-compatible regular expression.

^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
(?=.{1,15}$) positive lookahead to make sure that the total length of string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and more digits
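$ end of the string (still required: the lookahead only limits the length, it does not check that \d+(,\d+)? reaches the end).
In a Java string literal the backslashes must be escaped: ^(?=.{1,15}$)\\d+(,\\d+)?$
update: If you're looking for this in the middle of another string, use word boundaries \b instead of string boundaries (^ and $). Note that \b does not match between a letter and a digit, so this works only when the amount is delimited by non-word characters such as spaces.
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For Java: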
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"
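For amounts that sit directly between letters, as the question describes, one possible variant replaces each \b with a lookaround that only rules out a neighbouring digit or comma. This is a sketch under that assumption, not the answerer's pattern; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountFinder {
    // Assumption: an amount is delimited by anything that is not a digit or a comma.
    // (?<![\d,])                 no digit or comma right before the amount
    // (?=[\d,]{1,15}(?![\d,]))   the whole run of digits and commas is 1 to 15 long
    // \d+(,\d+)?                 digits, optionally one comma and more digits
    // (?![\d,])                  no digit or comma right after the amount
    private static final Pattern AMOUNT = Pattern.compile(
            "(?<![\\d,])(?=[\\d,]{1,15}(?![\\d,]))\\d+(,\\d+)?(?![\\d,])");

    // Hypothetical helper: returns every well-formed amount found in the text.
    static List<String> findAmounts(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAmounts("abc123,45def"));           // [123,45]
        System.out.println(findAmounts("abc1234567890123456def")); // [] (16 digits)
        System.out.println(findAmounts("abc1,2,3def"));            // [] (two commas)
    }
}
```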
