How to use regular expression while searching in HashSet - java

I am writing a Java program in which I need to search for a particular word in a Set. The word to be searched for looks like "wo.d", where '.' can stand for any letter. I am using a regex to match such words.
This is what I have so far:
HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for(String setWord : words){
m = p.matcher(setWord);
if(m.matches())
match = true;
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else{
System.out.println("The word does not contain regex do other stuff");
}
The code above works, but it is not efficient: it is called many times per second, which produces a lag in the program.

You need to stop iterating as soon as you get a match. Assuming that you use Java 8, your for loop could be rewritten as follows:
boolean match = words.stream().anyMatch(w -> p.matcher(w).matches());
You could also parallelize the search by using parallelStream() instead of stream(), especially if your Set has a lot of words.
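As a minimal sketch of the stream-based variant (assuming Java 8 or later; the class name WildcardSearch and the sample words are made up for illustration), the pattern is compiled once and anyMatch stops at the first hit:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class WildcardSearch {
    // Returns true as soon as one word in the set fully matches the wildcard.
    static boolean anyMatch(Set<String> words, String wildcard) {
        Pattern p = Pattern.compile(wildcard); // compile once, reuse for every word
        return words.stream().anyMatch(w -> p.matcher(w).matches());
    }

    // Same check on a parallel stream; only worth trying for large sets.
    static boolean anyMatchParallel(Set<String> words, String wildcard) {
        Pattern p = Pattern.compile(wildcard); // Pattern is thread-safe, Matcher is not
        return words.parallelStream().anyMatch(w -> p.matcher(w).matches());
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        Set<String> words = new HashSet<>(Arrays.asList("test", "toast", "word"));
        System.out.println(anyMatch(words, "t.st"));
        System.out.println(anyMatchParallel(words, "w.rd"));
        System.out.println(anyMatch(words, "z.st"));
    }
}
```

Whether the parallel version is actually faster depends on the size of the set, so it is worth measuring before adopting it.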
If you don't use Java 8, it could still be done using FluentIterable from Google Guava, but unfortunately without the ability to parallelize the search.
boolean match = FluentIterable.from(words).anyMatch(
new Predicate<String>() {
@Override
public boolean apply(@Nullable final String w) {
return p.matcher(w).matches();
}
}
);
But in your case, I don't believe that using FluentIterable is more attractive than simply adding a break when you get a match, which will still be easier to read and maintain:
if (p.matcher(setWord).matches()) {
match = true;
break;
}
So, if you really need to use a regular expression and you cannot use Java 8, your best option is to use break as described above; there is no magic trick to consider.
Assuming that you will only have one character to replace, it could be done using startsWith(String) and endsWith(String), which should generally be faster than a regular expression. Something like this:
// Your words should be in a TreeSet to be already sorted alphabetically
// in order to get a match as fast as possible
Set<String> words = new TreeSet<String>(); //this set is already populated
int index = word.indexOf('.');
if (index != -1) {
String prefix = word.substring(0, index);
String suffix = word.substring(index + 1);
boolean match = false;
for (String setWord : words){
// From the fastest to the slowest thing to check
// to get the best possible performances
if (setWord.length() == word.length()
&& setWord.startsWith(prefix)
&& setWord.endsWith(suffix)) {
match = true;
break;
}
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else {
System.out.println("The word does not contain regex do other stuff");
}

Use a TreeSet instead of a HashSet, and test only a sub-range of the set.
TreeSet<String> words = new TreeSet<>();// this set is already populated
String word = "t.st";
if (word.contains(".")) {
String from = word.replaceFirst("\\..*", "");
String to = from + '\uffff';
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for (String setWord : words.subSet(from, to)) {
m = p.matcher(setWord);
if (m.matches()) {
match = true;
break;
}
}
if (match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
} else {
System.out.println("The word does not contain regex do other stuff");
}
In this case words.subSet(from, to) contains only the words that start with "t".
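To see what the sub-range looks like, here is a small sketch with made-up sample words (the class name PrefixRange is hypothetical). It assumes that no word contains the character '\uffff':

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixRange {
    // Returns the words that share the literal prefix in front of the first '.'.
    static SortedSet<String> candidates(TreeSet<String> words, String wildcard) {
        String from = wildcard.replaceFirst("\\..*", ""); // "t.st" -> "t"
        String to = from + '\uffff';                      // upper bound for that prefix
        return words.subSet(from, to);                    // from inclusive, to exclusive
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        TreeSet<String> words = new TreeSet<>(Arrays.asList("best", "tast", "test", "toast", "zest"));
        System.out.println(candidates(words, "t.st")); // [tast, test, toast]
    }
}
```

The regex then only has to run against these candidates. If the wildcard starts with '.', the prefix is empty and the sub-range covers practically the whole set, so nothing is gained in that case.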

Just break out of the loop as soon as you get a match, to stop further regex matching against your HashSet:
if(m.matches()) {
match = true;
break;
}
Full Code:
HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for(String setWord : words){
m = p.matcher(setWord);
if(m.matches()) {
match = true;
break;
}
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else{
System.out.println("The word does not contain regex do other stuff");
}

Use your own matching method, like this:
static boolean match(String wild, String s) {
int len = wild.length();
if (len != s.length())
return false;
for (int i = 0; i < len; ++i) {
char w = wild.charAt(i);
if (w == '.')
continue;
else if (w != s.charAt(i))
return false;
}
return true;
}
and
HashSet<String> words = new HashSet<>();// this set is already populated
String word = "t.st";
boolean match = false;
if (word.contains(".")) {
for (String setWord : words) {
if (match(word, setWord)) {
match = true;
break;
}
}
if (match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
} else {
System.out.println("The word does not contain regex do other stuff");
}

Related

Changing or to our

I've been working on a program for school for some time now that is supposed to change the ending "or" to "our" in every word that ends in "or" and does not have a vowel before the "or". However, I am running into an issue where it also changes words such as paper and letter into papeur and letteur. Here's my code:
import java.util.Scanner;
public class main {
public static void main(String[] args){
//Initializing Variables
Scanner input = new Scanner(System.in);
char[] vowels = new char[] {'a','e','i','o','u'};
boolean hasVowel = false;
boolean running = true;
boolean oR = false;
while(running){
//Check for word
System.out.println("Enter a word more than 4 letters long or type quit to stop");
String word = input.nextLine();
if(word.equalsIgnoreCase("quit")){
System.exit(0);
}
while(word.length() <= 4){
System.out.println("That word is not more than 4 letters long");
word = input.next();
}
//Used to insert words
StringBuilder stringBuilder = new StringBuilder(word);
//Check for the letters
char x = word.charAt(word.length()-2);
if(word.endsWith("r")){
if(x == 'o') {
oR = true;
System.out.println("Has or");
for (char c : vowels) {
if (c == word.charAt(word.length() - 3)) {
hasVowel = true;
oR = false;
}
}
}
}
//output
if (hasVowel){
System.out.println(word);
}
else{
if(oR = true) {
stringBuilder.insert(word.length() - 1, "u");
System.out.println(stringBuilder.toString());
}
else if (oR = false) {
System.out.println(word);
}
}
System.out.println(hasVowel);
System.out.println(word.charAt(word.length()-3));
}
}
}
If someone could help me out that would be amazing!
Your problem is this line:
if(oR = true) {
This will always be true, because it is an assignment rather than an equality check. You want == here.
Also note that the else clause that goes with this:
else if (oR = false) {
can be just:
else {
since if a boolean value isn't true, it has to be false.
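A tiny sketch (with a hypothetical class name) makes the difference visible: the assignment form overwrites the variable and always takes the branch, while the plain boolean test leaves the variable alone.

```java
class AssignmentVsEquality {
    // Mirrors the buggy check: assigns true to the flag, then tests the result.
    static boolean buggyCheck(boolean flag) {
        if (flag = true) { // assignment, not comparison: always true
            return true;
        }
        return false;
    }

    // The intended check: tests the flag without modifying it.
    static boolean correctCheck(boolean flag) {
        if (flag) { // same as flag == true
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(buggyCheck(false));   // true
        System.out.println(correctCheck(false)); // false
    }
}
```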
You can edit your code like this:
public static void main(String[] args){
//Initializing Variables
Scanner input = new Scanner(System.in);
char[] vowels = new char[] {'a','e','i','o','u'};
boolean hasVowel = false;
boolean running = true;
boolean oR = false;
while(running){
//Check for word
System.out.println("Enter a word more than 4 letters long or type quit to stop");
String word = input.nextLine();
if(word.equalsIgnoreCase("quit")){
System.exit(0);
}
while(word.length() <= 4){
System.out.println("That word is not more than 4 letters long");
word = input.next();
}
//Used to insert words
StringBuilder stringBuilder = new StringBuilder(word);
//Check for the letters
char x = word.charAt(word.length()-2);
if(word.endsWith("r")){
if(x == 'o') {
oR = true;
System.out.println("Has or");
for (char c : vowels) {
if (c == word.charAt(word.length() - 3)) {
hasVowel = true;
oR = false;
}
}
}
}
//output
if (hasVowel){
System.out.println(word);
}
else{
if(oR) { //if(oR == true)
stringBuilder.insert(word.length() - 1, "u");
System.out.println(stringBuilder.toString());
}
else { //else if (!oR) //else if (oR == false)
System.out.println(word);
}
}
System.out.println(hasVowel);
System.out.println(word.charAt(word.length()-3));
}
}
As mentioned in the comments, this task could be resolved using regular expressions because it is a natural way of searching and replacing substrings in strings.
The basic regular expression to replace suffix -or with suffix -our on the condition that -or is not preceded by a vowel is as follows:
public String simpleOrToOur(String word) {
return word.replaceAll("(\\w+[^aeiou])or$", "$1our"); // $ anchors the match to the end of the word
}
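To illustrate the simple rule and its limits, here is a sketch with a few sample words. The class name SimpleOrToOur is made up, and the pattern is anchored with $ so that only the end of the word is rewritten, which is my assumption about the intent:

```java
class SimpleOrToOur {
    // Rewrites a trailing "or" to "our" when it is preceded by at least two
    // characters, the last of which is not a lowercase vowel.
    static String simpleOrToOur(String word) {
        return word.replaceAll("(\\w+[^aeiou])or$", "$1our");
    }

    public static void main(String[] args) {
        System.out.println(simpleOrToOur("color"));    // colour
        System.out.println(simpleOrToOur("behavior")); // behavior (vowel before "or")
        System.out.println(simpleOrToOur("anchor"));   // anchour (an unwanted replacement)
        System.out.println(simpleOrToOur("paper"));    // paper (does not end in "or")
    }
}
```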
However, the initial rule does not seem to be correct, because the replacement should NOT occur in many cases where there is a consonant before or, and it may occur for the -ior ending:
bor: no rubour
cor: no decour, mucour, only rancour/succour
dor: only ardour, candour, odour, splendour are valid; no replacement needed in dor, condor, corridor, vendor, etc.
for: is not replaced with four
gor: only clangour, rigour, vigour, and no replacement in: mortgagor, pledgor, turgor
hor: not replaceable at all: anchor, author, camphor, etc.
ior: despite the vowel, replacement is needed in behavior, pavior, savior
lor: no replacement needed in bachelor, chancellor, counsellor, jailor, sailor, tailor etc.
mor: actually it should be a[r]mor or umor as in armor, clamor, glamor, humor, no replacement in mor, tremor
nor: replacement needed only in honor, demeanor and their derivatives, no replacement in donor, manor, minor, signor, tenor, etc.
por: needed in sapor, vapor, not in stupor, torpor
vor: should be avor/ervor as in flavor, fervor, not in salvor, survivor
Other endings with the remaining consonants [jkqstwxyz]or do not need replacement to -our either.
That being said, a better-matching regexp may be implemented, consisting of a prefix group that joins the subparts mentioned above with OR (|), followed by the suffix -or:
public static String changePrefixOrToOur(String word) {
return word.replaceAll("(((h?ar|[lt]a|neigh)b)|(ranc|succ)|((ar|can|o|splen)d)|((clan|[rv]i)g)|(vi)|(([cd]o|par|va)l)|((ar?|u)m)|((ho|demea)n)|([sv]ap)|((a|er)v))or",
"$1our"
);
}
Test:
String[] replaceableWords = {
"arbor", "harbor", "neighbor", "labor", "tabor",
"rancor", "succor",
"ardor", "candor", "odor", "splendor",
"clangor", "vigor", "rigor",
"misbehavior", "pavior", "savior",
"color", "dolor", "parlor", "valor",
"amor", "armor", "tumor", "humor", "clamor", "glamor",
"dishonor", "honor", "misdemeanor",
"vapor", "sapor",
"flavor", "endeavor", "favor", "savor", "disfavor"
};
int successCount = 0;
for (String word : replaceableWords) {
String replaced = changePrefixOrToOur(word);
successCount += replaced.endsWith("our") ? 1 : 0;
//System.out.printf("%s -> %s ? %s%n", word, replaced, replaced.endsWith("our") ? "OK" : "FAIL");
}
System.out.printf("Replaced `or` to `our`: %d of %d words%n---%n%n", successCount, replaceableWords.length);
String[] notReplaceableWords = {
"rubor",
"decor", "mucor",
"ambassador", "condor", "corridor", "conquistador", "dor", "matador", "picador", "vendor",
"meteor",
"for",
"mortgagor", "pledgor", "turgor", "tangor",
"abhor", "anaphor", "anchor", "author", "camphor", "metaphor",
"anterior", "prior", "superior", "warrior",
"Angkor",
"bachelor", "counselor", "chancellor", "squalor", "tailor", "sailor", "taylor",
"mor", "tremor",
"nor", "assignor", "donor", "governor", "signor", "minor", "manor", "tenor", "intervenor",
"door", "floor", "moor", "outdoor", "poor", "boor",
"por", "sopor", "torpor", "stupor",
"advisor", "censor", "professor", "processor", "sensor", "tensor",
"actor", "doctor", "director", "factor", "bettor",
"liquor", "languor", "fluor",
"survivor", "salvor",
"xor", "luxor", "taxor",
"mayor",
"razor", "seizor", "vizor"
};
for (String word : notReplaceableWords) {
String replaced = changePrefixOrToOur(word);
successCount += replaced.endsWith("or") ? 1 : 0;
//System.out.printf("%s -> %s ? %s%n", word, replaced, replaced.endsWith("or") ? "OK" : "FAIL");
}
System.out.printf("Not replaced `or` to `our`: %d of %d words%n", successCount - replaceableWords.length, notReplaceableWords.length);
Output:
Replaced `or` to `our`: 37 of 37 words
---
Not replaced `or` to `our`: 79 of 79 words

Restrict special variable in java

I want to restrict getGatewaySerialNumber from taking special characters.
I have written this condition, but the if block evaluates only the first condition; it does not check the condition after the &&.
How can I restrict the gateway serial number from taking special characters?
if (manifestRequestEntity.getGatewaySerialNumber().length() > 16 && manifestRequestEntity.getGatewaySerialNumber().matches("[0-9a-fA-F]+"))
You can use a helper method to check whether the string contains only letters and digits,
something like this:
public static boolean checkForSpecialChar(String stringToCheck){
char ch[] = stringToCheck.toCharArray();
for(int i=0; i<ch.length; i++)
{
if((ch[i]>='A' && ch[i]<='Z') || (ch[i]>='a' && ch[i]<='z') || (ch[i]>='0' && ch[i]<='9'))
{
continue;
}
return true;
}
return false;
}
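For the serial number in the question, the same kind of check can be folded into one method. This is only a sketch under assumptions: I am guessing that a valid serial number is 1 to 16 hexadecimal characters (the question does not say whether 16 is a minimum or a maximum), and SerialNumberCheck and isValidSerial are hypothetical names.

```java
import java.util.regex.Pattern;

class SerialNumberCheck {
    // Assumed rule: 1 to 16 hexadecimal characters, nothing else.
    private static final Pattern HEX_SERIAL = Pattern.compile("[0-9a-fA-F]{1,16}");

    static boolean isValidSerial(String serial) {
        // matches() must consume the whole string, so any special character fails
        return serial != null && HEX_SERIAL.matcher(serial).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidSerial("00A1b2C3"));          // true
        System.out.println(isValidSerial("00A1-b2C3"));         // false
        System.out.println(isValidSerial("0123456789abcdef0")); // false, 17 characters
    }
}
```

If the real rule is different (for example, exactly 16 characters), only the quantifier in the pattern needs to change.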
Try this:
String name = "abc";
Pattern special= Pattern.compile("[^a-z0-9 ]", Pattern.CASE_INSENSITIVE);
Pattern number = Pattern.compile("[0-9]", Pattern.CASE_INSENSITIVE);
Matcher matcher = special.matcher(name);
Matcher matcherNumber = number.matcher(name);
boolean containsSymbols = matcher.find();
boolean containsNumber = matcherNumber.find();
if(containsSymbols){
//string contains special symbol/character
}
else if(containsNumber){
//string contains numbers
}
else{
//string doesn't contain special characters or numbers
}

Java - Find the words before and after a given word in a string

Say I have a string
String str = "This problem sucks and is hard";
and I wanted to get the words before and after "problem", so "This" and "sucks". Is regex the best way to accomplish this (keeping in mind that I'm a beginner with regex), or does Java have some kind of library (i.e. StringUtils) that can accomplish this for me?
To find the words before and after a given word, you can use this regex:
(\w+)\W+problem\W+(\w+)
The capture groups are the words you're looking for.
In Java, that would be:
Pattern p = Pattern.compile("(\\w+)\\W+problem\\W+(\\w+)");
Matcher m = p.matcher("This problem sucks and is hard");
if (m.find())
System.out.printf("'%s', '%s'", m.group(1), m.group(2));
Output
'This', 'sucks'
If you want full Unicode support, add flag UNICODE_CHARACTER_CLASS, or inline as (?U):
Pattern p = Pattern.compile("(?U)(\\w+)\\W+problema\\W+(\\w+)");
Matcher m = p.matcher("Questo problema è schifoso e dura");
if (m.find())
System.out.printf("'%s', '%s'", m.group(1), m.group(2));
Output
'Questo', 'è'
For finding multiple matches, use a while loop:
Pattern p = Pattern.compile("(?U)(\\w+)\\W+problems\\W+(\\w+)");
Matcher m = p.matcher("Big problems or small problems, they are all just problems, man!");
while (m.find())
System.out.printf("'%s', '%s'%n", m.group(1), m.group(2));
Output
'Big', 'or'
'small', 'they'
'just', 'man'
Note: The use of \W+ allows symbols to occur between words, e.g. "No(!) problem here" will still find "No" and "here".
Also note that a number is considered a word: "I found 1 problem here" returns "1" and "here".
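The two notes above can be checked with a small helper. This is a sketch; the class and method names are made up, and Pattern.quote is used so that the search word is treated literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordNeighbors {
    // Returns {wordBefore, wordAfter} for the first hit, or null when there is none.
    static String[] neighbors(String text, String word) {
        Pattern p = Pattern.compile("(\\w+)\\W+" + Pattern.quote(word) + "\\W+(\\w+)");
        Matcher m = p.matcher(text);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] a = neighbors("No(!) problem here", "problem");
        System.out.println(a[0] + ", " + a[1]); // No, here
        String[] b = neighbors("I found 1 problem here", "problem");
        System.out.println(b[0] + ", " + b[1]); // 1, here
    }
}
```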
There is a StringUtils library by Apache which has methods to get the substring before and after a given string. Additionally, there is Java's own substring, which you can play with to get what you need.
Apache StringUtils library API:
https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html
The methods that you might need are substringBefore() and substringAfter().
https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#substringBefore(java.lang.String,%20java.lang.String)
Check this out if you want to explore Java's own APIs:
Java: Getting a substring from a string starting after a particular character
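As a sketch of the non-regex route using only String.indexOf, substring, and split from the JDK (the class and method names are hypothetical, and words are assumed to be separated by whitespace):

```java
class SubstringNeighbors {
    // Last whitespace-separated token before the first occurrence of word, or "" if none.
    static String wordBefore(String text, String word) {
        int idx = text.indexOf(word);
        if (idx < 0) return "";
        String[] tokens = text.substring(0, idx).trim().split("\\s+");
        return tokens[tokens.length - 1];
    }

    // First whitespace-separated token after the first occurrence of word, or "" if none.
    static String wordAfter(String text, String word) {
        int idx = text.indexOf(word);
        if (idx < 0) return "";
        return text.substring(idx + word.length()).trim().split("\\s+")[0];
    }

    public static void main(String[] args) {
        String str = "This problem sucks and is hard";
        System.out.println(wordBefore(str, "problem")); // This
        System.out.println(wordAfter(str, "problem"));  // sucks
    }
}
```

Note that indexOf also finds the search word inside longer words (for example "problem" inside "problems"), so this sketch is less strict than the regex with \W+ boundaries.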
A bit verbose but this gets the job done accurately and quickly:
import java.io.*;
import java.util.*;
public class HelloWorld{
public static void main(String []args){
String EntireString="Hello World this is a test";
String SearchWord="World";
System.out.println(getPreviousWordFromString(EntireString,SearchWord));
}
public static String getPreviousWordFromString(String EntireString, String SearchWord) {
List<Integer> IndicesOfWords = new ArrayList<>();
boolean isWord = false;
int indexOfSearchWord=-1;
if(EntireString.indexOf(SearchWord)!=-1) {
indexOfSearchWord = EntireString.indexOf(SearchWord)-1;
} else {
System.out.println("ERROR: SearchWord passed (2nd arg) does not exist in string EntireString. EntireString: "+EntireString+" SearchWord: "+SearchWord);
return "";
}
if(EntireString.indexOf(SearchWord)==0) {
System.out.println("ERROR: The search word passed is the first word in the search string, so there are no words before it.");
return "";
}
for (int i = 0; i < EntireString.length(); i++) {
if (Character.isLetter(EntireString.charAt(i)) && i != indexOfSearchWord) {
isWord = true;
} else if (!Character.isLetter(EntireString.charAt(i)) && isWord) {
IndicesOfWords.add(i);
isWord = false;
} else if (Character.isLetter(EntireString.charAt(i)) && i == indexOfSearchWord) {
IndicesOfWords.add(i);
}
}
if(IndicesOfWords.size()>0) {
boolean isFirstWordAWord=true;
for (int i = 0; i < IndicesOfWords.get(0); i++) {
if(!Character.isLetter(EntireString.charAt(i))) {
isFirstWordAWord=false;
}
}
if(isFirstWordAWord==true) {
String firstWord = EntireString.substring(0,IndicesOfWords.get(0));
IndicesOfWords.add(0,0);
}
} else {
return "";
}
String ResultingWord = "";
for (int i = IndicesOfWords.size()-1; i >= 0; i--) {
if (EntireString.substring(IndicesOfWords.get(i)).contains(SearchWord)) {
if (i > 0) {
ResultingWord=EntireString.substring(IndicesOfWords.get(i-1),IndicesOfWords.get(i));
break;
}
if (i==0) {
ResultingWord=EntireString.substring(IndicesOfWords.get(0),IndicesOfWords.get(1));
}
}
}
return ResultingWord;
}
}

Fastest way for replacing the i-th occurrence in a string

I have a string which contains n occurrences of the same substring in total. For example, the string "hellooo dddd" has 3 "dd" substrings (I say it has occurred 3 times). In the more general case where a substring occurs n times in a string, how can I replace its i-th occurrence? I am looking for a method like replace(), but for the i-th occurrence. I want to implement it in my Android code. (English isn't my first language, so please excuse any mistakes.)
public static String replace(String input, String pattern, int occurence, String replacement){
String result = input;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(result);
if(occurence == 0){
return result;
} else if(occurence == 1){
m.find();
result = result.substring(0,m.start()) + replacement + result.substring(m.end());
} else {
m.find();
int counter = 1;
try {
while((counter<occurence)&&m.find(m.start()+1)){
counter++;
}
result = result.substring(0,m.start()) + replacement + result.substring(m.end());
} catch(IllegalStateException ise){
throw new IllegalArgumentException("There are not this many occurences of the pattern in the String.");
}
}
return result;
}
This seems to do something similar to what you want, if I understand correctly.
Because it uses the Matcher/Pattern system, it is open to much more complex regexes.
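If the pattern is a plain substring rather than a regex, a sketch based on String.indexOf is also possible. The names below are made up; occurrences are counted from 1 and may overlap, as in the "dd" example from the question. Unlike the method above, this sketch returns the input unchanged instead of throwing when there are too few occurrences.

```java
class NthReplace {
    // Replaces the n-th (1-based) occurrence of target in input; returns input
    // unchanged when there are fewer than n occurrences.
    static String replaceNth(String input, String target, int n, String replacement) {
        if (n < 1 || target.isEmpty()) return input;
        int idx = -1;
        for (int count = 0; count < n; count++) {
            idx = input.indexOf(target, idx + 1); // step by 1 so overlapping hits count
            if (idx < 0) return input;
        }
        return input.substring(0, idx) + replacement + input.substring(idx + target.length());
    }

    public static void main(String[] args) {
        System.out.println(replaceNth("hellooo dddd", "dd", 2, "X")); // hellooo dXd
        System.out.println(replaceNth("hellooo dddd", "dd", 4, "X")); // hellooo dddd
    }
}
```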

Regular expression with & as separator

I was given a long text in which I need to find all the pieces of text that are enclosed in a pair of & characters (for example, in the text "&hello&&bye&", I need to find the words "hello" and "bye").
I tried using the regex ".*&([^&])*&.*", but it doesn't work, and I don't know what's wrong with it.
Any help?
Thanks
Try it this way:
String data = "&hello&&bye&";
Matcher m = Pattern.compile("&([^&]*)&").matcher(data);
while (m.find())
System.out.println(m.group(1));
output:
hello
bye
No regex needed. Just iterate!
boolean started = false;
List<String> list = new ArrayList<>();
int startIndex = 0;
for(int i = 0; i < string.length(); ++i){
if(string.charAt(i) != '&')
continue;
if(!started) {
started = true;
startIndex = i + 1;
}
else {
list.add(string.substring(startIndex, i)); // text between the opening and the closing &
}
started = !started;
}
or use split!
String[] parts = string.split("&");
for(int i = 1; i < parts.length; i += 2) { // every second
list.add(parts[i]);
}
If you don't want to use regular expressions, here's a simple way.
String string = "xyz...."; // the string containing "hello", "bye" etc.
String[] tokens = string.split("&"); // this will split the string into an array
// containing tokens separated by "&"
for(int i=0; i<tokens.length; i++)
{
String token = tokens[i];
if(token.length() > 0)
{
// handle edge case
if(i==tokens.length-1)
{
if(string.charAt(string.length()-1) == '&')
System.out.println(token);
}
else
{
System.out.println(token);
}
}
}
Two problems:
You're repeating the capturing group. This means that you'll only catch the last letter between &s in the group.
You will only match the last word because the .*s will gobble up the rest of the string.
Use lookarounds instead:
(?<=&)[^&]+(?=&)
Now the entire match will be hello (and bye when you apply the regex for the second time) because the surrounding &s won't be part of the match any more:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?<=&)[^&]+(?=&)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
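Both problems can be reproduced with the sample text. The sketch below (class name made up) runs the original pattern with matches() and, for comparison, a capturing-group pattern with the quantifier inside the group, iterated with find():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandDemo {
    // Original pattern: the repeated group keeps only its last iteration.
    static String originalGroup(String text) {
        Matcher m = Pattern.compile(".*&([^&])*&.*").matcher(text);
        return m.matches() ? m.group(1) : null;
    }

    // Quantifier moved inside the group, no surrounding .*, iterated with find().
    static List<String> allBetween(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("&([^&]*)&").matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(originalGroup("&hello&&bye&")); // e
        System.out.println(allBetween("&hello&&bye&"));    // [hello, bye]
    }
}
```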
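The surrounding .* don't make sense and are unproductive. Just &([^&]*)& is sufficient (with the * inside the group, so that the group captures the whole word rather than its last letter).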
I would simplify it even further.
Check that the first char is &
Check that the last char is &
String.split("&&") on the substring between them
In code:
if (string.length() < 2)
throw new IllegalArgumentException(string); // or return an empty array, whatever
if ((string.charAt(0) != '&') || (string.charAt(string.length() - 1) != '&'))
throw new IllegalArgumentException(string); // handle this, too
String inner = string.substring(1, string.length()-1);
return inner.split("&&");
