Regular expression with & as separator

I was given a long text in which I need to find all the text that is enclosed in a pair of & (for example, in the text "&hello&&bye&", I need to find the words "hello" and "bye").
I tried using the regex ".*&([^&])*&.*" but it doesn't work, and I don't know what's wrong with it.
Any help?
Thanks

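Try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String data = "&hello&&bye&";
Matcher m = Pattern.compile("&([^&]*)&").matcher(data);
while (m.find())
    System.out.println(m.group(1)); // group 1 is the text between the two &s
Output:
hello
bye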
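
On Java 9 or later, the same pattern can also collect every enclosed word into a list in one expression via Matcher.results(). A minimal sketch; the class and method names (EnclosedWords, enclosed) are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class EnclosedWords {
    // Same pattern as above: '&', then any run of non-'&' characters (group 1), then '&'.
    private static final Pattern PAIR = Pattern.compile("&([^&]*)&");

    // Returns the text inside every &...& pair, in order (Matcher.results() needs Java 9+).
    static List<String> enclosed(String text) {
        return PAIR.matcher(text)
                .results()              // one MatchResult per successful find()
                .map(r -> r.group(1))   // keep only the captured text
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(enclosed("&hello&&bye&")); // [hello, bye]
    }
}
```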

No regex needed. Just iterate!
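import java.util.ArrayList;
import java.util.List;
String string = "&hello&&bye&";
List<String> list = new ArrayList<>();
boolean started = false; // true while we are between an opening and a closing &
int startIndex = 0;
for (int i = 0; i < string.length(); ++i) {
    if (string.charAt(i) != '&')
        continue;
    if (!started) {
        startIndex = i + 1; // the enclosed text starts right after the opening &
    } else {
        list.add(string.substring(startIndex, i)); // ...and ends right before the closing &
    }
    started = !started; // each & toggles between "outside" and "inside" a pair
}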
Or use split():
List<String> list = new ArrayList<>();
String[] parts = string.split("&");
for (int i = 1; i < parts.length; i += 2) { // odd indices hold the text inside each &...& pair
    list.add(parts[i]);
}

If you don't want to use regular expressions, here's a simple way. Note that it prints every non-empty token, so it assumes the &...& pairs follow each other directly, as in "&hello&&bye&".
String string = "&hello&&bye&"; // the string containing "hello", "bye", etc.
String[] tokens = string.split("&"); // split the string into the tokens separated by "&"
for (int i = 0; i < tokens.length; i++) {
    String token = tokens[i];
    if (token.length() > 0) {
        // Edge case: the last token is only enclosed if the string ends with '&'
        if (i == tokens.length - 1) {
            if (string.charAt(string.length() - 1) == '&')
                System.out.println(token);
        } else {
            System.out.println(token);
        }
    }
}

Two problems:
You're repeating the capturing group, so the group only captures the last character between the &s.
You will only match the last pair, because the leading greedy .* consumes as much of the string as it can and the trailing .* consumes the rest.
Use lookarounds instead:
(?<=&)[^&]+(?=&)
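Now the entire match will be hello (and bye when find() is called the second time), because the surrounding &s are no longer part of the match:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String subjectString = "&hello&&bye&";
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?<=&)[^&]+(?=&)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group()); // group() is the whole match, without the &s
}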
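
Lookarounds do not consume the &s, which matters if the long text also contains ordinary text between the pairs: the closing & of one pair can then be reused as the opening & of the next. A small comparison, using a hypothetical input "&a& b &c&" and a hypothetical helper findAll:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairCheck {
    // Hypothetical helper: collects the given group of every find() of regex in text.
    static List<String> findAll(String regex, String text, int group) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group(group));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "&a& b &c&";
        // Lookarounds leave each '&' unconsumed, so the closing '&' after "a"
        // also serves as an opening '&' for " b ".
        System.out.println(findAll("(?<=&)[^&]+(?=&)", text, 0)); // [a,  b , c]
        // Consuming both '&'s keeps the pairs apart.
        System.out.println(findAll("&([^&]*)&", text, 1));        // [a, c]
    }
}
```

With the pairs directly adjacent, as in "&hello&&bye&", both patterns give the same result.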

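The surrounding .* make no sense and are counterproductive. Just &([^&]*)& is sufficient; note that the * belongs inside the group, because &([^&])*& still captures only the last character of each pair.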
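
To see the effect of the repeated group, the following sketch prints group 1 for the question's original regex, for &([^&])*&, and for &([^&]*)& on the sample text. group1 is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupRepetition {
    // Hypothetical helper: group 1 of every find() of regex in text.
    static List<String> group1(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "&hello&&bye&";
        // Original regex: one match spanning the whole text; the repeated group keeps only its last character.
        System.out.println(group1(".*&([^&])*&.*", text)); // [e]
        // Without the .*s every pair is found, but the group still holds a single character.
        System.out.println(group1("&([^&])*&", text));     // [o, e]
        // With the * inside the group, the group holds the whole enclosed text.
        System.out.println(group1("&([^&]*)&", text));     // [hello, bye]
    }
}
```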

I would simplify it even further.
Check that the first char is &
Check that the last char is &
String.split("&&") on the substring between them
In code:
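static String[] splitPairs(String string) { // hypothetical name for this helper
    if (string.length() < 2)
        throw new IllegalArgumentException(string); // or return an empty array, whatever suits you
    if (string.charAt(0) != '&' || string.charAt(string.length() - 1) != '&')
        throw new IllegalArgumentException(string); // handle this, too
    String inner = string.substring(1, string.length() - 1);
    return inner.split("&&"); // "hello&&bye" -> ["hello", "bye"]
}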

Related

Restrict special variable in java

I want to restrict getGatewaySerialNumber from accepting special characters.
I have written the condition below, but the if block only seems to check the first condition; it is not checking the && condition.
How can I restrict the gateway serial number from accepting special characters?
if (manifestRequestEntity.getGatewaySerialNumber().length() > 16 && manifestRequestEntity.getGatewaySerialNumber().matches("[0-9a-fA-F]+"))
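
In a condition like the one above, && short-circuits: when the length check is false, Java never evaluates matches(), which is why only the first condition seems to run. Whenever the whole expression is true, both conditions have been checked. A minimal sketch, with a hypothetical isValidSerial helper that keeps the question's "> 16" rule as written (adjust it to the real requirement):

```java
class SerialCheck {
    // Hypothetical helper mirroring the question's condition: longer than 16 characters
    // AND made only of hex digits.
    static boolean isValidSerial(String serial) {
        // && short-circuits: matches() is evaluated only if everything to its left is true,
        // so a short serial is rejected without the hex check ever running.
        return serial != null
                && serial.length() > 16
                && serial.matches("[0-9a-fA-F]+"); // rejects any non-hex ("special") character
    }

    public static void main(String[] args) {
        System.out.println(isValidSerial("0123456789abcdef01")); // true: 18 hex digits
        System.out.println(isValidSerial("0123456789abcdef0#")); // false: contains '#'
        System.out.println(isValidSerial("ab#"));                // false: too short
    }
}
```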

You can use a helper method to check whether the string contains anything other than letters and digits, something like this:
public static boolean checkForSpecialChar(String stringToCheck) {
    char[] ch = stringToCheck.toCharArray();
    for (int i = 0; i < ch.length; i++) {
        // ASCII letters and digits are allowed; anything else counts as a special character
        if ((ch[i] >= 'A' && ch[i] <= 'Z') || (ch[i] >= 'a' && ch[i] <= 'z') || (ch[i] >= '0' && ch[i] <= '9')) {
            continue;
        }
        return true; // found a special character
    }
    return false; // only letters and digits
}

Try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String name = "abc";
Pattern special = Pattern.compile("[^a-z0-9 ]", Pattern.CASE_INSENSITIVE); // anything except letters, digits and space
Pattern number = Pattern.compile("[0-9]");
Matcher matcher = special.matcher(name);
Matcher matcherNumber = number.matcher(name);
boolean containsSymbols = matcher.find();
boolean containsNumber = matcherNumber.find();
if (containsSymbols) {
    // string contains a special symbol/character
} else if (containsNumber) {
    // string contains numbers
} else {
    // string doesn't contain special characters or numbers
}

Check for multiple occurrence of certain character in string

Edit: To those who downvoted me: this question is different from the duplicate question you linked. The other question is about returning the indexes. In my case I do not need the index; I just want to check whether there is a duplicate.
This is my code:
String word = "ABCDE<br>XYZABC";
String[] keywords = word.split("<br>");
for (int index = 0; index < keywords.length; index++) {
    if (keywords[index].toLowerCase().contains(word.toLowerCase())) {
        if (index != (keywords.length - 1)) {
            endText = keywords[index];
            definition.setText(endText);
        }
    }
}
My problem is that if the keyword is "ABC", the string endText only shows "ABCDE". However, "XYZABC" contains "ABC" as well. How can I check whether the string has multiple occurrences? If there are multiple occurrences, I would like to set the definition TextView with definition.setText(endText + " More");.
I tried the following. The code works, but it makes my app very slow. I guess the reason is that I get the String word through a TextWatcher.
String[] keywords = word.split("<br>");
for (int index = 0; index < keywords.length; index++) {
    if (keywords[index].toLowerCase().contains(word.toLowerCase())) {
        if (index != (keywords.length - 1)) {
            int i = 0;
            Pattern p = Pattern.compile(search.toLowerCase());
            Matcher m = p.matcher(word.toLowerCase());
            while (m.find()) {
                i++;
            }
            if (i > 1) {
                endText = keywords[index];
                definition.setText(endText + " More");
            } else {
                endText = keywords[index];
                definition.setText(endText);
            }
        }
    }
}
Is there any faster way?

It's a little hard for me to understand your question, but it sounds like:
You have some string (e.g. "ABCDE<br>XYZABC"). You also have some target text (e.g. "ABC"). You want to split that string on a delimiter (e.g. "<br>"), and then:
If exactly one substring contains the target, display that substring.
If more than one substring contains the target, display the last substring that contains it plus the suffix "More"
In your posted code, the performance is poor because of the Pattern.compile() call: re-compiling the Pattern on every loop iteration is costly. Luckily, there's no need for regular expressions here, so you can avoid that problem entirely.
String search = "ABC".toLowerCase();
String word = "ABCDE<br>XYZABC";
String[] keywords = word.split("<br>");
String endText = null; // the last keyword that contains the search text
int count = 0;         // how many keywords contain the search text
for (String keyword : keywords) {
    if (keyword.toLowerCase().contains(search)) {
        ++count;
        endText = keyword;
    }
}
if (count > 1) {
    definition.setText(endText + " More");
} else if (count == 1) {
    definition.setText(endText);
}
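
If what you really need is the number of times the search text appears in the whole string (which is what the Matcher loop in the question counts), String.indexOf() can do it without compiling any Pattern. A minimal sketch; countOccurrences is a hypothetical helper, and lower-casing with Locale.ROOT is an assumption for case-insensitive matching of plain ASCII text:

```java
import java.util.Locale;

class OccurrenceCounter {
    // Counts non-overlapping, case-insensitive occurrences of target in text with indexOf(),
    // so no Pattern has to be compiled at all.
    static int countOccurrences(String text, String target) {
        if (target.isEmpty())
            return 0; // an empty target would otherwise loop forever
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = target.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // continue after this match, so matches never overlap
        }
        return count;
    }

    public static void main(String[] args) {
        String word = "ABCDE<br>XYZABC";
        System.out.println(countOccurrences(word, "abc")); // 2
        System.out.println(countOccurrences(word, "XYZ")); // 1
    }
}
```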

You are doing it correctly, but you are doing an unnecessary check: if (index != (keywords.length - 1)). This ignores a match in the last element of the keywords array. I'm not sure whether that is part of your requirement.
To improve performance, break out of the loop as soon as you find the second match; there is no need to keep checking.
public static void main(String[] args) {
    String word = "ABCDE<br>XYZABC";
    String pattern = "ABC";
    String[] keywords = word.split("<br>");
    String endText = "";
    int count = 0;
    for (int index = 0; index < keywords.length; index++) {
        if (keywords[index].toLowerCase().contains(pattern.toLowerCase())) {
            // Reaching this point means a match was found.
            if (count == 1) {
                // This is the second match: keep the first matching string,
                // append " more" to it and stop; there is no need to check further.
                endText += " more";
                break;
            }
            endText = keywords[index];
            count++;
        }
    }
    System.out.println(endText);
}
This will print ABCDE more

Hi, you have to write your condition like this:
if (word.toLowerCase().contains(keywords[index].toLowerCase()))

You can use this:
String word = "ABCDE<br>XYZABC";
String[] keywords = word.split("<br>");
for (int i = 0; i < keywords.length - 1; i++) {
    int c = 0;
    // Pattern.quote() treats the keyword as literal text rather than as a regex
    Pattern p = Pattern.compile(Pattern.quote(keywords[i].toLowerCase()));
    Matcher m = p.matcher(word.toLowerCase());
    while (m.find()) {
        c++;
    }
    if (c > 1) {
        definition.setText(keywords[i] + " More");
    } else {
        definition.setText(keywords[i]);
    }
}
But as I mentioned in a comment, when you split "ABCDE<br>XYZABC" on <br>, no keyword occurs twice.
If you use "ABCDE<br>XYZABCDE" instead, "ABCDE" occurs twice.

void test() {
    String word = "ABCDE<br>XYZABC";
    String sequence = "ABC";
    if (word.replaceFirst(sequence, "{---}").contains(sequence)) {
        int startIndex = word.indexOf(sequence);
        int endIndex = word.indexOf("<br>");
        Log.v("test", word.substring(startIndex, endIndex) + " More");
    } else {
        // your code
    }
}
Try this
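
The replaceFirst() trick above only needs to know whether a second occurrence exists. The same check can be done with two indexOf() calls, which also avoids treating sequence as a regex (replaceFirst() interprets its first argument as a regular expression). A minimal sketch; occursMoreThanOnce is a hypothetical name:

```java
class SecondOccurrence {
    // True if `sequence` occurs at least twice in `word`; overlapping occurrences count,
    // e.g. "aa" occurs twice in "aaa". Plain String search, no regex involved.
    static boolean occursMoreThanOnce(String word, String sequence) {
        if (sequence.isEmpty())
            return false;
        int first = word.indexOf(sequence);
        return first != -1 && word.indexOf(sequence, first + 1) != -1;
    }

    public static void main(String[] args) {
        System.out.println(occursMoreThanOnce("ABCDE<br>XYZABC", "ABC")); // true
        System.out.println(occursMoreThanOnce("ABCDE<br>XYZABC", "XYZ")); // false
    }
}
```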

How to capitalize the first and last letters of every word in a string in java

How to capitalize the first and last letters of every word in a string
I have done it this way:
String cap = "";
for (int i = 0; i < sent.length() - 1; i++) {
    if (sent.charAt(i + 1) == ' ') {
        cap += Character.toUpperCase(sent.charAt(i)) + " " + Character.toUpperCase(sent.charAt(i + 2));
        i += 2;
    } else
        cap += sent.charAt(i);
}
cap += Character.toUpperCase(sent.charAt(sent.length() - 1));
System.out.print(cap);
It does not work when the first word has more than one character.
Please use simple functions, as I am a beginner.

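Using the Apache Commons Lang library it becomes very easy: capitalize every word, reverse the string, capitalize again (which now hits the former last letters), and reverse back:
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.text.WordUtils; // deprecated in newer releases; org.apache.commons.text.WordUtils replaces it
String testString = "this string is needed to be 1st and 2nd letter-uppercased for each word";
testString = WordUtils.capitalize(testString);
testString = StringUtils.reverse(testString);
testString = WordUtils.capitalize(testString);
testString = StringUtils.reverse(testString);
System.out.println(testString);
Output:
ThiS StrinG IS NeedeD TO BE 1sT AnD 2nD Letter-uppercaseD FoR EacH WorD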

You should rather split your String using whitespace as the separator, then for each token apply toUpperCase() to the first and last characters and build a new String as the result.
A very simple example:
String cap = "";
String sent = "hello world. again.";
String[] token = sent.split("\\s+|\\.$");
for (String currentToken : token) {
    if (currentToken.isEmpty()) {
        continue; // e.g. leading whitespace produces an empty first token
    }
    if (!cap.equals("")) {
        cap += " ";
    }
    if (currentToken.length() == 1) {
        cap += currentToken.toUpperCase(); // one-letter word: first and last letter are the same
        continue;
    }
    String firstChar = String.valueOf(Character.toUpperCase(currentToken.charAt(0)));
    String between = currentToken.substring(1, currentToken.length() - 1);
    String lastChar = String.valueOf(Character.toUpperCase(currentToken.charAt(currentToken.length() - 1)));
    cap += firstChar + between + lastChar;
}
Of course you should favor StringBuilder over String, as you perform many concatenations.
Output: HellO World. AgaiN

Your code misses the first letter of the first word. I would treat this as a special case, i.e.
cap = "" + Character.toUpperCase(sent.charAt(0));
for (int i = 1; i < sent.length() - 1; i++) {
    .....
Of course, there are much easier ways to do what you are doing.

Basically you just need to iterate over all characters and replace them if one of the following conditions is true:
it's the first character
it's the last character
the previous character was a whitespace (or whatever you want, e.g. punctuation - see below)
the next character is a whitespace (or whatever you want, e.g. punctuation - see below)
If you use a StringBuilder for performance and memory reasons (+= would create a new String in every iteration), it could look like this:
StringBuilder sb = new StringBuilder("some words in a list even with longer whitespace in between");
for (int i = 0; i < sb.length(); i++) {
    // || short-circuits, so rules 3 and 4 never index outside the string
    if (i == 0 ||                                   // rule 1
        i == (sb.length() - 1) ||                   // rule 2
        Character.isWhitespace(sb.charAt(i - 1)) || // rule 3
        Character.isWhitespace(sb.charAt(i + 1))) { // rule 4
        sb.setCharAt(i, Character.toUpperCase(sb.charAt(i)));
    }
}
Result: SomE WordS IN A LisT EveN WitH LongeR WhitespacE IN BetweeN
If you want to check for other rules as well (e.g. punctuation etc.) you could create a method that you call for the previous and next character and which checks for the required properties.
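
Such a rule-checking method could look like the following sketch. Which characters count as word boundaries is an assumption here (whitespace plus . , ! ? -), as are the names isBoundary and capitalizeFirstAndLast:

```java
class CapitalizeEnds {
    // Hypothetical rule: whitespace and a few punctuation marks separate words.
    static boolean isBoundary(char c) {
        return Character.isWhitespace(c) || c == '.' || c == ',' || c == '!' || c == '?' || c == '-';
    }

    static String capitalizeFirstAndLast(String text) {
        StringBuilder sb = new StringBuilder(text);
        for (int i = 0; i < sb.length(); i++) {
            // Same four rules as above, with isBoundary() in place of isWhitespace().
            boolean first = i == 0 || isBoundary(sb.charAt(i - 1));
            boolean last = i == sb.length() - 1 || isBoundary(sb.charAt(i + 1));
            if (first || last) {
                sb.setCharAt(i, Character.toUpperCase(sb.charAt(i)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeFirstAndLast("hello world. again.")); // HellO WorlD. AgaiN.
    }
}
```

Unlike the whitespace-only version, "world." now becomes "WorlD." because the period counts as a boundary.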

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String stringToSearch = "this string is needed to be first and last letter uppercased for each word";
// First letter upper case using regex: a lowercase letter right after a word boundary
Pattern firstLetterPtn = Pattern.compile("(\\b[a-z]{1})+");
Matcher m = firstLetterPtn.matcher(stringToSearch);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, m.group().toUpperCase());
}
m.appendTail(sb);
stringToSearch = sb.toString();
sb.setLength(0);
// Last letter upper case using regex: a lowercase letter right before a word boundary
Pattern lastLetterPtn = Pattern.compile("([a-z]{1}\\b)+");
m = lastLetterPtn.matcher(stringToSearch);
while (m.find()) {
    m.appendReplacement(sb, m.group().toUpperCase());
}
m.appendTail(sb);
System.out.println(sb.toString());
Output:
ThiS StrinG IS NeedeD TO BE FirsT AnD LasT LetteR UppercaseD FoR EacH WorD
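
On Java 9 or later, the two passes can be folded into one, because Matcher.replaceAll() also accepts a function that computes each replacement. A minimal sketch under that assumption; the alternation \b[a-z]|[a-z]\b matches a lowercase letter that starts or ends a word, and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class CapitalizeRegex {
    // A lowercase letter directly after a word boundary, or directly before one.
    private static final Pattern FIRST_OR_LAST = Pattern.compile("\\b[a-z]|[a-z]\\b");

    static String capitalizeFirstAndLast(String text) {
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+.
        return FIRST_OR_LAST.matcher(text).replaceAll(r -> r.group().toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(capitalizeFirstAndLast(
                "this string is needed to be first and last letter uppercased for each word"));
        // ThiS StrinG IS NeedeD TO BE FirsT AnD LasT LetteR UppercaseD FoR EacH WorD
    }
}
```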

Finding the longest "number sequence" in a string using only a single regex

I want to find a single regex which matches the longest numerical string in a URL.
For example, for the URL http://stackoverflow.com/1234/questions/123456789/ask, I would like it to return 123456789.
I thought I could use ([\d]+).
However, this returns the first match from the left, not the longest.
Any ideas :) ?
This regex will be used as an input to a strategy pattern, which extracts certain characteristics from URLs:
public static String parse(String url, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Matcher m = pattern.matcher(url);
    if (m.find()) {
        return m.group(1);
    }
    return null;
}
So it would be much tidier if I could use a single regex. :(

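Don't use regex. Just iterate over the characters:
static String longestNumber(String str) { // hypothetical name for this helper
    String longest = "";
    int i = 0;
    while (i < str.length()) {
        // skip everything that is not a digit
        while (i < str.length() && !Character.isDigit(str.charAt(i))) {
            ++i;
        }
        // consume one run of digits
        int start = i;
        while (i < str.length() && Character.isDigit(str.charAt(i))) {
            ++i;
        }
        if (i - start > longest.length()) {
            longest = str.substring(start, i);
        }
    }
    return longest;
}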

@Andy already gave a non-regex answer, which is probably faster, but if you want to use a regex you must, as @Jan points out, add some logic, e.g.:
public String findLongestNumber(String input) {
    String longestMatch = "";
    int maxLength = 0;
    Matcher m = Pattern.compile("([\\d]+)").matcher(input);
    while (m.find()) {
        String currentMatch = m.group();
        int currentLength = currentMatch.length();
        if (currentLength > maxLength) { // keep the longest match seen so far
            maxLength = currentLength;
            longestMatch = currentMatch;
        }
    }
    return longestMatch;
}

This is not possible with a pure regex; however, I would do it this way (using Stream.max() and a regex):
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String url = "http://stackoverflow.com/1234/questions/123456789/ask";
Pattern biggest = Pattern.compile("/(\\d+)/"); // numbers enclosed in slashes
Matcher m = biggest.matcher(url);
List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group(1));
}
System.out.println(matches.stream().max((a, b) -> Integer.compare(a.length(), b.length())).get());
This prints: 123456789
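
On Java 9 or later, the collect-then-max idea fits into one stream expression with Matcher.results(). This sketch uses \d+ instead of /(\d+)/, so it also considers digits that are not enclosed in slashes; longestNumber is a hypothetical name:

```java
import java.util.Comparator;
import java.util.regex.Pattern;

class LongestNumber {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the longest run of digits in url, or null if it contains no digits.
    // Matcher.results() requires Java 9+.
    static String longestNumber(String url) {
        return DIGITS.matcher(url)
                .results()                                     // every digit run, left to right
                .map(r -> r.group())                           // its text
                .max(Comparator.comparingInt(String::length))  // keep the longest one
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(longestNumber("http://stackoverflow.com/1234/questions/123456789/ask")); // 123456789
    }
}
```

Returning null (instead of calling get()) avoids a NoSuchElementException for URLs without any digits, which fits the parse() method in the question.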

Extract sub-string between two certain words using regex in java

I would like to extract the substring between two given words using Java.
For example:
This is an important example about regex for my work.
I would like to extract everything between "an" and "for".
What I have done so far:
String sentence = "This is an important example about regex for my work and for me";
Pattern pattern = Pattern.compile("(?<=an).*.(?=for)");
Matcher matcher = pattern.matcher(sentence);
boolean found = false;
while (matcher.find()) {
    System.out.println("I found the text: " + matcher.group().toString());
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}
It works well, but I want to do two additional things:
If the sentence is: This is an important example about regex for my work and for me.
I want to extract only up to the first "for", i.e. important example about regex
Sometimes I want to limit the number of words between the patterns to 3, i.e. important example about
Any ideas, please?

For your first question, make the quantifier lazy: put a question mark after it, and it will match as little as possible.
(?<=an).*?(?=for)
I have no idea what the additional . at the end of .*. is good for; it's unnecessary.
For your second question, you have to define what a "word" is. Here I would say it is probably just a sequence of non-whitespace characters followed by a whitespace character, something like this:
\S+\s
and repeat this 3 times like this
(?<=an)\s(\S+\s){3}(?=for)
To ensure that the pattern matches whole words, use word boundaries:
(?<=\ban\b)\s(\S+\s){1,5}(?=\bfor\b)
{3} matches exactly 3 words; for a minimum of 1 and a maximum of 3, use {1,3}.
Alternative:
As dma_k correctly stated, in your case it's not necessary to use lookbehind and lookahead. See the Matcher documentation about groups.
You can use capturing groups instead. Just put the part you want to extract in brackets and it will be put into a capturing group.
\ban\b(.*?)\bfor\b
You can then access this group like this:
System.out.println("I found the text: " + matcher.group(1).toString());
You have only one pair of brackets, so it's simple: just put a 1 into matcher.group(1) to access the first capturing group.
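
For the second requirement as stated in the question (keep only the first 3 words of "important example about regex"), an alternative to encoding the word count in the regex is to capture lazily and trim afterwards. A minimal sketch, assuming words are separated by whitespace; between and maxWords are hypothetical names:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenWords {
    // Returns at most maxWords (>= 0) whitespace-separated words found between the whole
    // words `start` and `end` (first `end` after `start`), or null if there is no such pair.
    static String between(String sentence, String start, String end, int maxWords) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(start) + "\\b\\s+(.*?)\\s+\\b"
                + Pattern.quote(end) + "\\b"); // lazy (.*?) stops at the first `end`
        Matcher m = p.matcher(sentence);
        if (!m.find())
            return null;
        String[] words = m.group(1).split("\\s+");
        return String.join(" ", Arrays.copyOf(words, Math.min(maxWords, words.length)));
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";
        System.out.println(between(sentence, "an", "for", 3));  // important example about
        System.out.println(between(sentence, "an", "for", 10)); // important example about regex
    }
}
```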

Your regex is "an\\s+(.*?)\\s+for". It extracts all characters between an and for, ignoring the surrounding whitespace (\s+). The question mark makes the quantifier lazy (reluctant) rather than greedy; it is needed to prevent .* from eating everything up to the last "for".

public class SubStringBetween {
    // Returns the text between the first occurrence of `before` and the first
    // occurrence of `after` that follows it, or null if either is missing.
    // Note: indexOf() also finds a delimiter inside a longer word (e.g. "an" in "and").
    public static String subStringBetween(String sentence, String before, String after) {
        int startSub = subStringStartIndex(sentence, before);
        if (startSub < 0)
            return null;
        int stopSub = subStringEndIndex(sentence, after, startSub);
        if (stopSub < 0)
            return null;
        return sentence.substring(startSub, stopSub);
    }
    // Index just after the first occurrence of delimiterBeforeWord, or -1 if it is absent.
    public static int subStringStartIndex(String sentence, String delimiterBeforeWord) {
        int index = sentence.indexOf(delimiterBeforeWord);
        return index < 0 ? -1 : index + delimiterBeforeWord.length();
    }
    // Index of the first occurrence of delimiterAfterWord at or after fromIndex, or -1 if it is absent.
    public static int subStringEndIndex(String sentence, String delimiterAfterWord, int fromIndex) {
        return sentence.indexOf(delimiterAfterWord, fromIndex);
    }
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
