Find index of a non-digit using regex in Java


This is probably an easy question, but I haven't been able to figure it out. I want to find the next letter (A to Z) in a string after a certain point in the string. In the code below, I want the string money to be "$5. 00", but num2 always comes up as -1.
String text = "hello$5. 00Bla bla words that don't matter";
int num1 = text.indexOf('$');
int num2 = text.indexOf("[a-zA-Z]" , num1 + 1); // Always results in -1
String money = text.substring(num1, num2);

To find the first letter following a $ dollar sign, using regex, you can use the following regex:
\$\P{L}*\p{L}
Explanation:
\$ Match a $ dollar sign
\P{L}* Match 0 or more characters that are not Unicode letters
\p{L} Match a Unicode letter
The index of the letter is then the last character of the matched substring, i.e. one character before the end() of the match.
Example
String text = "hello$5. 00Bla bla words that don't matter";
Matcher m = Pattern.compile("\\$\\P{L}*\\p{L}").matcher(text);
if (m.find()) {
int idx = m.end() - 1;
System.out.println("Letter found at index " + idx + ": '" + text.substring(idx) + "'");
}
Output
Letter found at index 11: 'Bla bla words that don't matter'
UPDATE
It seems the actual question was slightly different than answered above, so to capture the text from $ dollar sign (inclusive) and all following characters up to first letter (exclusive) or end of string, use this regex:
\$\P{L}*
Example
String text = "hello$5. 00Bla bla words that don't matter";
Matcher m = Pattern.compile("\\$\\P{L}*").matcher(text);
if (m.find()) {
String money = m.group();
System.out.println("money = \"" + money + "\"");
}
Output
money = "$5. 00"
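The original attempt returns -1 because String.indexOf treats its argument as a literal string, not as a regex, so it searches for the text "[a-zA-Z]" itself. A minimal sketch of a regex-aware replacement is shown below. The class name RegexIndexSketch and the helper indexOfRegex are hypothetical names introduced only for illustration; the helper relies on Matcher.find(int), which starts searching at the given index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexIndexSketch {
    // Hypothetical helper: index of the first regex match at or after fromIndex, or -1 if none.
    static int indexOfRegex(String text, String regex, int fromIndex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find(fromIndex) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "hello$5. 00Bla bla words that don't matter";
        int num1 = text.indexOf('$');
        // Search for the first ASCII letter after the dollar sign.
        int num2 = indexOfRegex(text, "[a-zA-Z]", num1 + 1);
        String money = text.substring(num1, num2);
        System.out.println("money = \"" + money + "\"");
    }
}
```

In real code, both num1 and num2 should be checked for -1 before calling substring.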

This is untested, as my workstation isn't set up for Java 9, but using that release, you should be able to do this:
String result = text.substring(text.indexOf('$'))
    .chars()
    .takeWhile(ch -> !Character.isAlphabetic(ch))
    .mapToObj(ch -> String.valueOf((char) ch))
    .collect(Collectors.joining());
result will evaluate to $5. 00
Note: IntStream#takeWhile is a Java 9 feature

Thanks for the help everyone. I found a way to do this without using regex.
String money = "";
while (!Character.isLetter(text.charAt(num1))) {
money = money + text.charAt(num1);
num1++;
}
It might need some work later but it seems to work.
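One piece of "work" the loop above may need: if no letter follows the dollar sign, charAt eventually runs past the end of the string and throws StringIndexOutOfBoundsException. The sketch below adds a bounds check and uses a StringBuilder instead of repeated string concatenation. The class MoneyPrefixSketch and the method takeUntilLetter are hypothetical names, not part of the original post.

```java
class MoneyPrefixSketch {
    // Hypothetical helper: copies characters from start until the first letter
    // or the end of the text, whichever comes first.
    static String takeUntilLetter(String text, int start) {
        StringBuilder money = new StringBuilder();
        for (int i = start; i < text.length() && !Character.isLetter(text.charAt(i)); i++) {
            money.append(text.charAt(i));
        }
        return money.toString();
    }

    public static void main(String[] args) {
        String text = "hello$5. 00Bla bla words that don't matter";
        int num1 = text.indexOf('$');
        System.out.println(takeUntilLetter(text, num1));
    }
}
```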

Related

finding out if the characters of a string exist in another string with the same order or not using regex in java

I want to write a Java program, using regex, that reads two strings from the input (the first one is shorter than the second). If the characters of the first string appear in the second string in the same order, not necessarily next to each other (so it is not a substring), it prints "true"; otherwise it prints "false". Here's an example:
example1:
input:
phantom
pphvnbajknzxcvbnatopopoim
output:
true
In the example above, the word "phantom" can be found in the second string (its characters appear in the same order).
example2:
input:
apple
fgayiypvbnltsrgte
output:
false
As you can see, "apple" does not exist in the second string under the conditions mentioned earlier, so the output is false.
import java.util.Scanner;
public class Main {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        String word1 = input.next();
        String word2 = input.next();
        String pattern = "";
        int n = word1.length();
        char[] word1CharArr = word1.toCharArray();
        for (int i = 0; i < n; i++) {
            pattern += "[:alnum:]" + word1CharArr[i] + "[:alnum:]";
            // pattern += ".*\\b|\\B" +word1CharArr[i]+"\\b|\\B";
        }
        pattern = "^" + pattern + "$";
        // pattern = "(?s)" + pattern + ".*";
        // System.out.println(pattern);
        System.out.println(word2.matches(pattern));
    }
}
Here is what I did: I broke the first string into its characters and want to put a regex before and after each character to build the pattern. I have searched a lot about regex and how to use it, but I still have a problem here. The commented-out part comes from one of my searches, but it did not work.
I emphasize that I want to solve this with regex, not any other way.

[:alnum:] isn't a thing in Java regex (the POSIX class is written \p{Alnum}). Even if it were, it would match exactly one character, not 'any number of them, from 0 to infinitely many'.
You just want phantom with .* in between: ^.*p.*h.*a.*n.*t.*o.*m.*$ is all you need. After all, phantom 'fits', and so does paahaanaataaoaamaa -
String pattern = word1.chars()
.mapToObj(c -> ".*" + (char) c)
.collect(Collectors.joining()) + ".*";
should get the job done.
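One caveat with building the pattern this way: if the first word contains regex metacharacters such as . or *, they are inserted into the pattern unescaped. The sketch below wraps each character in Pattern.quote to guard against that. The class SubsequenceSketch and the method isSubsequence are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SubsequenceSketch {
    // Hypothetical helper: true if the characters of word appear in text in the same order.
    static boolean isSubsequence(String word, String text) {
        // Builds .*c1.*c2...cn.* with every character quoted so it is matched literally.
        String pattern = word.codePoints()
                .mapToObj(cp -> ".*" + Pattern.quote(new String(Character.toChars(cp))))
                .collect(Collectors.joining()) + ".*";
        return text.matches(pattern);
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("phantom", "pphvnbajknzxcvbnatopopoim"));
        System.out.println(isSubsequence("apple", "fgayiypvbnltsrgte"));
    }
}
```

Since String.matches requires the whole input to match, the leading ^ and trailing $ can be omitted.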

Split a string contain hyphen and minus sign

This question is similar to my previous question, "Split a string contain dash and minus sign". I asked that one incorrectly, so it took on slightly different semantics and people answered from that perspective. Rather than modifying that question, I thought it better to ask a new one.
I have to split a string which contains both a hyphen-minus character and a minus sign. I tried to split on the Unicode character (https://en.wikipedia.org/wiki/Hyphen#Unicode), but the split still treats the minus sign the same as the hyphen-minus character. Is there a way I can solve this?
Expected output
(country)
(US)
-1
Actual output
(country)
(US)
// actually blank line will print here but SO editor squeezing the blank line
1
public static void main(String[] args) {
    char dash = '-';
    int i = -1;
    String a = "(country)" + dash + "(US)" + dash + i;
    Pattern p = Pattern.compile("-", Pattern.LITERAL);
    String[] m = p.split(a);
    for (String s : m) {
        System.out.println(s);
    }
}

char dash = '\u2010'; // 2010 is hyphen, 002D is hyphen-minus
int i = -1;
String a = "(country)" + dash + "(US)" + dash + i;
Pattern p = Pattern.compile("\u2010", Pattern.LITERAL);
String[] m = p.split(a);
for (String s : m) {
System.out.println(s);
}
The string representation of an integer always uses the hyphen-minus as the negative sign:
From Integer.toString:
If the first argument is negative, the first element of the result is the ASCII minus character '-' ('\u002D'). If the first argument is not negative, no sign character appears in the result.
so in the end your string has 3 hyphen-minus characters. That's why split can't distinguish between them.
Since you can't change the string representation of an integer, you need to change the dash variable to store a hyphen instead of hyphen-minus. Now there are 2 hyphens and 1 hyphen-minus in your string, making split able to distinguish between them.
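To make the difference concrete, here is a small self-contained sketch of the approach described above, using U+2010 (hyphen) as the separator so that the U+002D produced by the integer conversion survives the split. The class HyphenSplitSketch and the method splitOnHyphen are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class HyphenSplitSketch {
    // Hypothetical helper: splits only on U+2010 (hyphen), never on U+002D (hyphen-minus).
    static String[] splitOnHyphen(String s) {
        return Pattern.compile("\u2010", Pattern.LITERAL).split(s);
    }

    public static void main(String[] args) {
        char dash = '\u2010';
        int i = -1;
        // The negative sign of i is U+002D, so it differs from the separator.
        String a = "(country)" + dash + "(US)" + dash + i;
        System.out.println(Arrays.toString(splitOnHyphen(a)));
    }
}
```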

Regular expression for phrase contain literals and numbers but is not all phrase as a number only with fixed range length

I want a regular expression that checks that the input consists of a-z and 0-9, but I do not want to allow purely numeric input (it must have at least one alphabetic character).
For example:
413123123123131
is not allowed, but if it has even one alphabetic character anywhere in the phrase, it's OK.
While trying to define the correct regex for that, I ended up with
[0-9]*[a-z].*
but now I'm confused about how to define the {x,y} length of the phrase. I want {9,31}, but I cannot put a length block after the last *. I tried defining a group, but it did not work.
Tested at https://www.debuggex.com/
How can I add it?

What you seek is
String regex = "(?=.{9,31}$)\\p{Alnum}*\\p{Alpha}\\p{Alnum}*";
Use it with String#matches() / Pattern#matches() method to require a full string match:
if (s.matches(regex)) {
return true;
}
Details
^ - implicit in matches() - matches the start of string
(?=.{9,31}$) - a positive lookahead that requires 9 to 31 any chars other than line break chars from the start to end of the string
\\p{Alnum}* - 0 or more alphanumeric chars
\\p{Alpha} - an ASCII letter
\\p{Alnum}* - 0 or more alphanumeric chars
Java demo:
String lines[] = {"413123123123131", "4131231231231a"};
Pattern p = Pattern.compile("(?=.{9,31}$)\\p{Alnum}*\\p{Alpha}\\p{Alnum}*");
for(String line : lines)
{
Matcher m = p.matcher(line);
if(m.matches()) {
System.out.println(line + ": MATCH");
} else {
System.out.println(line + ": NO MATCH");
}
}
Output:
413123123123131: NO MATCH
4131231231231a: MATCH

This might be what you are looking for.
[0-9a-zA-Z]*[a-zA-Z][0-9a-zA-Z]*
To help explain it, think of the middle term as your one required character and the outer terms as any number of alpha numeric characters.
Edit: to restrict the length of the string as a whole, you may have to check that manually after matching, i.e.
if (str.length() >= 9 && str.length() <= 31)
Wiktor provides a solution that does this within the regex; please look at his answer for a better pattern.
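For completeness, the manual length check and the pattern from this answer can be combined as in the sketch below. It assumes the 9 to 31 range from the question is inclusive; the class AlnumWithLetterSketch and the method isValid are hypothetical names.

```java
class AlnumWithLetterSketch {
    // Hypothetical helper: alphanumeric only, at least one letter, length 9..31 inclusive.
    static boolean isValid(String s) {
        return s.length() >= 9 && s.length() <= 31
                && s.matches("[0-9a-zA-Z]*[a-zA-Z][0-9a-zA-Z]*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"413123123123131", "4131231231231a", "kjdfg34"}) {
            System.out.println(s + ": " + (isValid(s) ? "MATCH" : "NO MATCH"));
        }
    }
}
```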

Try this Regex:
^(?:(?=[a-z])[a-z0-9]{9,31}|(?=\d.*[a-z])[a-z0-9]{9,31})$
OR a bit shorter form:
^(?:(?=[a-z])|(?=\d.*[a-z]))[a-z0-9]{9,31}$
Explanation (for the 1st regex):
^ - position before the start of the string
(?=[a-z])[a-z0-9]{9,31} - if the string starts with a letter, match letters and digits, minimum 9 and maximum 31
| - OR
(?=\d.*[a-z])[a-z0-9]{9,31} - if the string starts with a digit and has a letter somewhere after it, match letters and digits, minimum 9 and maximum 31. This also ensures that a string that starts with a digit and contains no letter anywhere does not match
$ - position after the last character of the string
OUTPUT:
413123123123131 NO MATCH(no letters)
kjkhsjkf989089054835werewrew65 MATCH
kdfgfd4374985794379857984379857weorjijuiower NO MATCH(length more than 31)
9087erkjfg9080980984590p465467 MATCH
4131231231231a MATCH
kjdfg34 NO MATCH(Length less than 9)

Here's the regex:
[a-zA-Z\d]*[a-zA-Z][a-zA-Z\d]*
The trick here is to have something that is not optional. The leading and trailing [a-zA-Z\d] has a * quantifier, so they are optional. But the [a-zA-Z] in the middle there is not optional. The string must have a character that matches [a-zA-Z] in order to be matched.
However, you need to check the length of the string with length() afterwards and not with regex. I can't think of a way to do this in regex.
Actually, I think you can do this regexless pretty easily:
private static boolean matches(String input) {
    for (int i = 0; i < input.length(); i++) {
        if (Character.isLetter(input.charAt(i))) {
            return input.length() >= 9 && input.length() <= 31;
        }
    }
    return false;
}

How to find only exact cases of the word "is" within another string, not including "is" that is part of the word "this"

I'm new to Java and am trying to learn regular expressions. I'm trying to find the exact word "is" within another string. Below is the code I came up with.
int count = 0;
String text
= "This is the text which is to be searched "
+ "for occurrences of the word 'is'.";
String patternString = "is";
Pattern p1 = Pattern.compile(patternString);
Matcher m1 = p1.matcher(text);
while (m1.find()) {
count++;
System.out.printf("found %s %d: from index %d to index %d%n",
patternString, count, m1.start(), m1.end() );
}
However, not only did it find every standalone "is", it also found the "is" that is part of "This", which is not what I want. How can I find only exact cases of "is"?

Try with this : String patternString = "\\bis\\b";
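The \b in this pattern is a zero-width word boundary, so "is" only matches when it is not glued to another word character. A small sketch comparing the two patterns on the text from the question follows. The class WholeWordSketch and the method countMatches are hypothetical names introduced for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordSketch {
    // Hypothetical helper: counts non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "This is the text which is to be searched "
                + "for occurrences of the word 'is'.";
        // The plain pattern also counts the "is" inside "This".
        System.out.println("plain:      " + countMatches("is", text));
        // The word-boundary pattern counts only standalone occurrences.
        System.out.println("whole word: " + countMatches("\\bis\\b", text));
    }
}
```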

This can be done using negative lookarounds:
String patternString = "(?<!\\w)is(?!\\w)";
The regex
(?<!\w)is(?!\w)
matches the word is only if it's neither preceded nor followed by a word character (\w).
Output:
found (?<!\w)is(?!\w) 1: from index 5 to index 7
found (?<!\w)is(?!\w) 2: from index 23 to index 25
found (?<!\w)is(?!\w) 3: from index 70 to index 72

Iterating through String with .find() in Java regex

I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.
For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", not repeating the "i"
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.

This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This matches your edge case as a look ahead within a non-capturing group, then matches the usual (consuming) case.
Note that your requirements don't require iteration, only your question title assumes it's necessary, which it isn't.
Note also that to be absolutely safe, you should escape all characters in word in case any of them are special "regex" characters, so if you can't guarantee that, you need to use Pattern.quote(word) instead of word.
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij

Use positive lookbehind and positive lookahead, which are zero-width assertions:
(?<=(.)|^)1(?=(.)|$)
(?<=(.)|^) - looks for a character just before 1 and captures it in group 1. This is a zero-width assertion that doesn't move forward to match; it is just a test, and thus allows us to capture the values
1 - matches 1; you can replace it with any word
(?=(.)|$) - looks for a character after 1 and captures it in group 2
$1 and $2 contain your values. Go on finding till the end.
So this should be like
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
while(m.find()) s3 += m.group(1)+m.group(2);
//s3 now contains c13iij

Use regex as follows, where a is the input string, b is the word, and c collects the result:
String c = "";
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
while (m.find()) c += m.group(1) + m.group(2);
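The underlying problem in the question is that find() resumes after the text consumed by the previous match, so a character shared by two neighbouring occurrences can only be consumed once. One way around this, sketched below, is to match a zero-width lookahead so that nothing is consumed, and then read the neighbouring characters by index. This is an alternative sketch, not the code from any of the answers above; the class name WordEndsSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordEndsSketch {
    static String wordEnds(String str, String word) {
        // Zero-width lookahead: matches at every position where word starts,
        // without consuming any characters, so occurrences may share neighbours.
        Matcher m = Pattern.compile("(?=" + Pattern.quote(word) + ")").matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int before = m.start() - 1;
            int after = m.start() + word.length();
            if (before >= 0) {
                out.append(str.charAt(before)); // char just before the word, if any
            }
            if (after < str.length()) {
                out.append(str.charAt(after)); // char just after the word, if any
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY"));
        System.out.println(wordEnds("abc1xyz1i1j", "1"));
    }
}
```

Because every match is empty, the matcher advances one position at a time, so the loop visits each occurrence of the word exactly once.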
