In Java, I want to iterate over an array to find any words that match in my input string.
If the word in the input string is attached to a number, it should still return true.
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin"; // --> should return false
String inputStr = "need to 444card pin"; // --> should return true, because "card" is attached to a number
I tried the code below, but it returns true for the first input because contains() finds "card" inside "discard". I need a whole-word match instead.
Arrays.stream(arr).anyMatch(inputStr::contains)
Try this:
String[] arr = {"card","creditcard","debitcard"}; // array that keeps the words
String inputStr = "need to discard pin"; // String that keeps the 'sentence'
String[] wordsToBeChecked = inputStr.split(" "); // We take the string and split it at each " " (space)
HashSet<String> matchingWords = new HashSet<>(); // This will keep the matching words
for (String s : arr)
{
for (String s1 : wordsToBeChecked)
{
if(s.equalsIgnoreCase(s1)) // If first word matches with the second
{
matchingWords.add(s1); // add it to our container
}
}
}
Or using Java 8 Streams:
List<String> wordList = Arrays.asList(arr);
List<String> sentenceWordList = Arrays.asList(inputStr.split(" "));
List<String> matchedWords = wordList.stream().filter(sentenceWordList::contains)
.collect(Collectors.toList());
The problem with most answers here is that they do not take punctuation into consideration. To solve this, you could use a regular expression like below.
String[] arr = { "card", "creditcard", "debitcard" };
String inputStr = "You need to discard Pin Card.";
Arrays.stream(arr)
.anyMatch(word -> Pattern
.compile("(?<![a-z-])" + Pattern.quote(word) + "(?![a-z-])", Pattern.CASE_INSENSITIVE)
.matcher(inputStr)
.find());
With Pattern.quote(word), we escape any character within each word which is a special character in the context of a regular expression. For instance, without quoting, the literal string a^b would never match, because ^ means the start of a string if used in a regular expression.
(?<![a-z-]) and (?![a-z-]) mean that no letter or hyphen immediately precedes or follows the word. For instance, discard will not match, even though it contains the word card, while 444card will match because digits are not in the character class. I have used only lowercase in these character classes because of the next point:
The flag CASE_INSENSITIVE passed to the compile method causes the pattern to be matched in a case-insensitive manner.
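To see how the lookarounds behave on the inputs from the question, here is a minimal sketch. The class name WordBoundaryMatch and the helper containsWord are made up for illustration; the pattern itself is the one from the answer above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class WordBoundaryMatch {
    // Hypothetical helper: true if any word occurs in the text without a
    // letter or hyphen directly before or after it (case-insensitive).
    static boolean containsWord(String[] words, String text) {
        return Arrays.stream(words)
                .anyMatch(word -> Pattern
                        .compile("(?<![a-z-])" + Pattern.quote(word) + "(?![a-z-])",
                                Pattern.CASE_INSENSITIVE)
                        .matcher(text)
                        .find());
    }

    public static void main(String[] args) {
        String[] arr = { "card", "creditcard", "debitcard" };
        System.out.println(containsWord(arr, "need to discard pin"));           // false
        System.out.println(containsWord(arr, "need to 444card pin"));           // true
        System.out.println(containsWord(arr, "You need to discard Pin Card.")); // true
    }
}
```

If the word list is fixed, compiling the patterns once and reusing them would avoid recompiling on every call.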
You could split the string using a regular expression
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin";
List<String> wordsToBeChecked = Arrays.asList(inputStr.split("[ 0-9]"));
Arrays.stream(arr).anyMatch(wordsToBeChecked::contains);
If your word list and input string are long, consider splitting your input string into a HashSet. Lookups will then be faster:
Set<String> wordsToBeChecked = new HashSet<>(Arrays.asList(inputStr.split(" ")));
You can create a Set of the words in inputStr and then check the words list against that Set.
Set<String> inputWords = uniqueWords(inputStr);
List<String> matchedWords = Arrays.stream(arr)
.filter(word -> inputWords.contains(word))
.collect(Collectors.toList());
Building the Set may be non-trivial if you have to account for hyphenation, numbers, punctuation, and so forth. I'll wave my hands and ignore that - here's a naive implementation of uniqueWords(String) that assumes they are separated by spaces.
public Set<String> uniqueWords(String string) {
return Arrays.stream(string.split(" "))
.collect(Collectors.toSet());
}
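A slightly less naive variant, as a sketch only: it treats every run of non-letter characters (spaces, digits, punctuation) as a separator, so "444card" yields the word "card". The class name UniqueWords is an assumption for illustration, and lowercasing the input is my addition, not part of the answer above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class UniqueWords {
    // Sketch: split on runs of non-letters and lowercase the result.
    static Set<String> uniqueWords(String text) {
        return Arrays.stream(text.toLowerCase().split("\\P{L}+"))
                .filter(word -> !word.isEmpty()) // a leading separator yields ""
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("need to 444card pin").contains("card")); // true
        System.out.println(uniqueWords("need to discard pin").contains("card")); // false
    }
}
```

Note that this also splits hyphenated words ("credit-card" becomes "credit" and "card"), which may or may not be what you want.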
One way would be
String[] arr = {"card","creditcard","debitcard"};
String inputStr = "need to discard pin";
var contains = Arrays.stream(inputStr.split(" ")).anyMatch(word -> Arrays.asList(arr).contains(word));
You can adjust the split regex to include all kinds of whitespace too.
Also: Consider an appropriate data structure for lookups. Array will be O(n), HashSet will be O(1).
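A rough sketch of that suggestion, assuming the keyword list is fixed and can be turned into a HashSet once. The names KeywordLookup, KEYWORDS and containsKeyword are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class KeywordLookup {
    // Built once; each contains() call is then an O(1) hash lookup on average.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("card", "creditcard", "debitcard"));

    static boolean containsKeyword(String sentence) {
        // "\\s+" splits on any run of whitespace, not only single spaces
        return Arrays.stream(sentence.split("\\s+"))
                .anyMatch(KEYWORDS::contains);
    }

    public static void main(String[] args) {
        System.out.println(containsKeyword("need to discard pin")); // false
        System.out.println(containsKeyword("need a\tcard now"));     // true
    }
}
```

Like the one-liner above, this only matches whole whitespace-separated tokens, so "444card" is not found unless the split expression is adjusted.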
I'm trying to take two sentences and see if they have words in common. Example:
A- "Hello world this is a test"
B- "Test to create things"
The common word here is "test"
I tried using .contains() but it doesn't work because I can only search for one word.
text1.toLowerCase ().contains(sentence1.toLowerCase ())
You can create HashSets from both sentences after splitting them on whitespace. You can then use Set#retainAll to find the intersection (the common words).
final String a = "Hello world this is a test", b = "Test to create things";
final Set<String> words = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
final Set<String> words2 = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
words.retainAll(words2);
System.out.println(words); //[test]
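Split the two sentences on spaces and add each word from the first string to a Set. Then, in a loop, try adding the words from the second string to the same set. If the add operation returns false, it is a common word. (This assumes the second sentence has no repeated words; a repeated word would also make add return false.)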
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class Sample {
public static void main(String[] args) {
String str1 = "Hello world this is a test";
String str2 = "Test to create things";
str1 = str1.toLowerCase();
str2 = str2.toLowerCase();
String[] str1words = str1.split(" ");
String[] str2words = str2.split(" ");
boolean flag = true;
Set<String> set = new HashSet<String>(Arrays.asList(str1words));
for(int i = 0;i<str2words.length;i++) {
flag = set.add(str2words[i]);
if(flag == false)
System.out.println(str2words[i]+" is common word");
}
}
}
You can split each sentence on spaces, collect the words, and then look up each word of one sentence in the other to collect the common words.
Here is an example using the Java Stream API. The words of the first sentence are collected into a Set so that each lookup is O(1).
String a = "Hello world this is a test";
String b = "Test to create things";
Set<String> aWords = Arrays.stream(a.toLowerCase().split(" "))
.collect(Collectors.toSet());
List<String> commonWords = Arrays.stream(b.toLowerCase().split(" "))
.filter(bw -> aWords.contains(bw))
.collect(Collectors.toList());
System.out.println(commonWords);
Output: [test]
Here's one approach:
// extract the words from the sentences by splitting on white space
String[] sentence1Words = sentence1.toLowerCase().split("\\s+");
String[] sentence2Words = sentence2.toLowerCase().split("\\s+");
// make sets from the two word arrays
Set<String> sentence1WordSet = new HashSet<String>(Arrays.asList(sentence1Words));
Set<String> sentence2WordSet = new HashSet<String>(Arrays.asList(sentence2Words));
// get the intersection of the two word sets
Set<String> commonWords = new HashSet<String>(sentence1WordSet);
commonWords.retainAll(sentence2WordSet);
This will yield a Set containing lower case versions of the common words between the two sentences. If it is empty there is no similarity. If you don't care about some words like prepositions you can filter those out of the final similarity set or, better yet, preprocess your sentences to remove those words first.
Note that a real-world (i.e. useful) implementation of similarity checking is usually far more complex, as you usually want to check for words that are similar but with minor discrepancies. Some useful starting points to look into for this type of string similarity checking are Levenshtein distance and metaphones.
Note there is a redundant copy of the Set in the code above where I create the commonWords set because intersection is performed in-place, so you could improve performance by simply performing the intersection on sentence1WordSet, but I have favoured code clarity over performance.
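For reference, Levenshtein distance can be sketched in a few lines. This is the textbook dynamic-programming version, not something taken from a library; the class name EditDistance is an assumption for illustration.

```java
class EditDistance {
    // Number of single-character insertions, deletions or substitutions
    // needed to turn a into b, computed with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or keep
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("test", "tests"));     // 1
        System.out.println(levenshtein("kitten", "sitting")); // 3
    }
}
```

You could then treat two words as "common" when their distance is below some small threshold; what threshold makes sense depends on your data.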
Try this.
static boolean contains(String text1, String text2) {
String text1LowerCase = text1.toLowerCase();
return Arrays.stream(text2.toLowerCase().split("\\s+"))
.anyMatch(word -> text1LowerCase.contains(word));
}
and
String text1 = "Hello world this is a test";
String text2 = "Test to create things";
System.out.println(contains(text1, text2));
output:
true
If I split a string, say like this:
List<String> words = Arrays.asList(input.split("\\s+"));
And I then wanted to modify those words in various ways, then reassemble them using the same logic, assuming no word lengths have changed, is there a way to do that easily? Humor me in that there's a reason I'm doing this.
Note: I need to match all whitespace, not just spaces. Hence the regex.
i.e.:
"Beautiful Country" -> ["Beautiful", "Country"] -> ["BEAUTIFUL", "COUNTRY"] -> "BEAUTIFUL COUNTRY"
If you use String.split, there is no way to be sure that the reassembled strings will be the same as the original ones.
In general (and in your case) there is no way to capture what the actual separators used were. In your example, "\\s+" will match one or more whitespace characters, but you don't know which characters were used, or how many there were.
When you use split, the information about the separators is lost. Period.
(On the other hand, if you don't care that the reassembled string may be a different length or may have different separators to the original, use the Joiner class ...)
Assuming you have a limit on how many words you can expect, you could try writing a regular expression like
(\S+)(\s+)?(\S+)?(\s+)?(\S+)?
(for the case in which you expect up to three words). You could then use the Matcher API methods groupCount(), group(n) to pull the individual words (the odd groups) or whitespace separators (the even groups >0), do what you needed with the words, and re-assemble them once again...
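If the number of words is not bounded, a Matcher loop over \S+ is one way to keep the separators without a fixed number of groups. Note this is a different technique from the grouped expression above: it rewrites each word in place and copies the whitespace through untouched. The names PreserveWhitespace and mapWords are made up for this sketch.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreserveWhitespace {
    private static final Pattern WORD = Pattern.compile("\\S+");

    // Applies transform to every run of non-whitespace characters and
    // leaves everything between the runs exactly as it was.
    static String mapWords(String input, UnaryOperator<String> transform) {
        Matcher m = WORD.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the new word literal
            m.appendReplacement(sb, Matcher.quoteReplacement(transform.apply(m.group())));
        }
        m.appendTail(sb); // copy whatever follows the last word
        return sb.toString();
    }

    public static void main(String[] args) {
        String result = mapWords("Beautiful \t Country", String::toUpperCase);
        System.out.println(result.equals("BEAUTIFUL \t COUNTRY")); // true
    }
}
```

Because the separators are never removed, this also works when the transformation changes the length of a word.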
I tried this:
import java.util.*;
import java.util.stream.*;
public class StringSplits {
private static List<String> whitespaceWords = new ArrayList<>();
public static void main(String [] args) {
String input = "What a Wonderful World! ...";
List<String> words = processInput(input);
// First transformation: ["What", "a", "Wonderful", "World!", "..."]
String first = words.stream()
.collect(Collectors.joining("\", \"", "[\"", "\"]"));
System.out.println(first);
// Second transformation: ["WHAT", "A", "WONDERFUL", "WORLD!", "..."]
String second = words.stream()
.map(String::toUpperCase)
.collect(Collectors.joining("\", \"", "[\"", "\"]"));
System.out.println(second);
// Final transformation: WHAT A WONDERFUL WORLD! ...
String last = IntStream.range(0, words.size())
.mapToObj(i -> words.get(i) + whitespaceWords.get(i))
.map(String::toUpperCase)
.collect(Collectors.joining());
System.out.println(last);
}
/*
* Accepts input string of words containing character words and
* whitespace(s) (as defined in the method Character#isWhitespace).
* Processes and returns only the character strings. Stores the
* whitespace 'words' (a single or multiple whitespaces) in a List<String>.
* NOTE: This method uses String concatenation in a loop. For processing
* large inputs consider using a StringBuilder.
*/
private static List<String> processInput(String input) {
List<String> words = new ArrayList<>();
String word = "";
String whitespaceWord = "";
boolean wordFlag = true;
for (char c : input.toCharArray()) {
if (! Character.isWhitespace(c)) {
if (! wordFlag) {
wordFlag = true;
whitespaceWords.add(whitespaceWord);
word = whitespaceWord = "";
}
word = word + String.valueOf(c);
}
else {
if (wordFlag) {
wordFlag = false;
words.add(word);
word = whitespaceWord = "";
}
whitespaceWord = whitespaceWord + String.valueOf(c);
}
} // end-for
whitespaceWords.add(whitespaceWord);
if (! word.isEmpty()) {
words.add(word);
}
return words;
}
}
I have a List of strings like "Taxi or bus driver". I need to convert the first letter of each word to a capital letter, except for the word "or". Is there an easy way to achieve this using Java streams?
I have tried the Pattern.compile(...).splitAsStream technique, but I could not concatenate the split tokens back to form the original string.
Any help will be appreciated. If anybody needs it, I can post my code here.
You need the right pattern to identify the locations where a change has to be made, and it must be a zero-width pattern when you want to use splitAsStream. Match locations which are
a word start
looking at a lower case character
not looking at the word “or”
Declare it like
static final Pattern WORD_START_BUT_NOT_OR = Pattern.compile("\\b(?=\\p{Ll})(?!or\\b)");
Then, using it to process the tokens is straight-forward with a stream and map. Getting a string back works via .collect(Collectors.joining()):
List<String> input = Arrays.asList("Taxi or bus driver", "apples or oranges");
List<String> result = input.stream()
.map(s -> WORD_START_BUT_NOT_OR.splitAsStream(s)
.map(w -> Character.toUpperCase(w.charAt(0))+w.substring(1))
.collect(Collectors.joining()))
.collect(Collectors.toList());
result.forEach(System.out::println);
Taxi or Bus Driver
Apples or Oranges
Note that when splitting, there will always be a first token, regardless of whether it matched the criteria. Since the word “or” usually never appears at the beginning of a phrase and the transformation is transparent to non-lowercase letter characters, this should not be a problem here. Otherwise, treating the first element specially with a stream would make the code too complicated. If that’s an issue, a loop would be preferable.
A loop based solution could look like
private static final Pattern FIRST_WORD_CHAR_BUT_NOT_OR
= Pattern.compile("\\b(?!or\\b)\\p{Ll}");
(now using a pattern that matches the character rather than looking at it)
public static String capitalizeWords(String phrase) {
Matcher m = FIRST_WORD_CHAR_BUT_NOT_OR.matcher(phrase);
if(!m.find()) return phrase;
StringBuffer sb = new StringBuffer();
do m.appendReplacement(sb, m.group().toUpperCase()); while(m.find());
return m.appendTail(sb).toString();
}
which, as a bonus, is also capable of handling characters which span multiple char units. Starting with Java 9, the StringBuffer can be replaced with StringBuilder to increase the efficiency. This method can be used like
List<String> result = input.stream()
.map(s -> capitalizeWords(s))
.collect(Collectors.toList());
Replacing the lambda expression s -> capitalizeWords(s) with a method reference of the form ContainingClass::capitalizeWords is also possible.
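On Java 9 or later there is also Matcher.replaceAll(Function), which does the append loop for you. The following is a sketch of the same method under that assumption, reusing the pattern from above; the class name CapitalizeJava9 is hypothetical.

```java
import java.util.regex.Pattern;

class CapitalizeJava9 {
    private static final Pattern FIRST_WORD_CHAR_BUT_NOT_OR =
            Pattern.compile("\\b(?!or\\b)\\p{Ll}");

    // Requires Java 9+: replaceAll accepts a function from MatchResult
    // to the replacement string.
    static String capitalizeWords(String phrase) {
        return FIRST_WORD_CHAR_BUT_NOT_OR.matcher(phrase)
                .replaceAll(match -> match.group().toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("Taxi or bus driver")); // Taxi or Bus Driver
        System.out.println(capitalizeWords("apples or oranges"));  // Apples or Oranges
    }
}
```

The function's result is interpreted as a replacement string, so a result containing $ or \ would need Matcher.quoteReplacement; for an uppercased letter that is not a concern.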
Here is my code:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public class ConvertToCapitalUsingStreams {
// collection holds all the words that are not to be capitalized
private static final List<String> EXCLUSION_LIST = Arrays.asList(new String[]{"or"});
public String convertToInitCase(final String data) {
String[] words = data.split("\\s+");
List<String> initUpperWords = Arrays.stream(words).map(word -> {
//first make it lowercase
return word.toLowerCase();
}).map(word -> {
//if word present in EXCLUSION_LIST return the words as is
if (EXCLUSION_LIST.contains(word)) {
return word;
}
//if the word not present in EXCLUSION_LIST, Change the case of
//first letter of the word and return
return Character.toUpperCase(word.charAt(0)) + word.substring(1);
}).collect(Collectors.toList());
// convert back the list of words into a single string
String finalWord = String.join(" ", initUpperWords);
return finalWord;
}
public static void main(String[] a) {
System.out.println(new ConvertToCapitalUsingStreams().convertToInitCase("Taxi or bus driver"));
}
}
Note:
You may also want to look at this SO post about using apache commons-text library to do this job.
Split your string into words, convert the first character of each word to uppercase, then join the words again to form the resulting string:
String input = "Taxi or bus driver";
String output = Stream.of(input.split(" "))
.map(w -> {
if (w.equals("or") || w.length() == 0) {
return w;
}
return Character.toUpperCase(w.charAt(0)) + w.substring(1);
})
.collect(Collectors.joining(" "));
I need to extract the name that is attached to the end of each string.
For example
pot-1_Sam
pot-22_Daniel
pot_444_Jack
pot_5434_Bill
I need to get the names from the above strings, i.e. Sam, Daniel, Jack and Bill.
The problem is that if I use substring, the position keeps changing because the number varies in length. How can I achieve this using a regex?
Update:
Some strings have an additional underscore in the prefix, like
pot_US-1_Sam
pot_RUS_444_Jack
Assuming your strings always follow the formats above, you do not need a regex at all; you can use the lastIndexOf and substring methods.
String result = yourString.substring(yourString.lastIndexOf("_")+1, yourString.length());
Your answer is:
String[] s = new String[4];
s[0] = "pot-1_Sam";
s[1] = "pot-22_Daniel";
s[2] = "pot_444_Jack";
s[3] = "pot_5434_Bill";
ArrayList<String> result = new ArrayList<String>();
for (String value : s) {
String[] splitedArray = value.split("_");
result.add(splitedArray[splitedArray.length-1]);
}
for(String resultingValue : result){
System.out.println(resultingValue);
}
You have 2 options:
Keep using substring, but use the lastIndexOf method to get the index of the last _ (this assumes that there is no _ in the names you are after). Once you have the index of the last _ character, you can use the substring method to get the bit you are after.
Use a regular expression. The strings you have shown essentially follow a pattern in which you have numbers, followed by an underscore, which is in turn followed by the word you are after. You can use a regular expression such as \\d+_ (which will match one or more digits followed by an underscore) in combination with the split method. The string you are after will be in the last array position.
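Both options can be sketched side by side. The class and method names here are made up, and the sketch assumes every input ends with digits, an underscore and then the name, as in the question.

```java
class NameExtractor {
    // Option 1: everything after the last underscore.
    static String nameByLastIndexOf(String value) {
        return value.substring(value.lastIndexOf('_') + 1);
    }

    // Option 2: split on "digits followed by underscore" and take the last part.
    static String nameBySplit(String value) {
        String[] parts = value.split("\\d+_");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String[] samples = { "pot-1_Sam", "pot_444_Jack", "pot_US-1_Sam", "pot_RUS_444_Jack" };
        for (String s : samples) {
            System.out.println(nameByLastIndexOf(s) + " " + nameBySplit(s));
        }
    }
}
```

Both helpers return the same name for the sample strings, including the ones with an extra underscore in the prefix.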
Use a string tokenizer based on '_' and get the last element. No need for REGEX.
Or use the split method on the string object like so :
String[] strArray = strValue.split("_");
String lastToken = strArray[strArray.length -1];
String[] s = {
"pot-1_Sam",
"pot-22_Daniel",
"pot_444_Jack",
"pot_5434_Bill"
};
for (String e : s)
System.out.println(e.replaceAll(".*_", ""));
The task is to split a string and print how many words it contains.
Scanner in = new Scanner(System.in);
String st = in.nextLine();
String[] tokens = st.split("[\\W]+");
When I entered an empty line as input and printed the number of tokens, I got one, but I want zero. What should I do? Here the delimiters are all non-word symbols.
Short answer: To get the tokens in str (determined by whitespace separators), you can do the following:
String str = ... //some string
str = str.trim() + " "; //modify the string for the reasons described below
String[] tokens = str.split("\\s+");
Longer answer:
First of all, the argument to split() is the delimiter - in this case one or more whitespace characters, which is "\\s+".
If you look carefully at the Javadoc of String#split(String, int) (which is what String#split(String) calls), you will see why it behaves like this.
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
This is why "".split("\\s+") would return an array with one empty string [""], so you need to append the space to avoid this. " ".split("\\s+") returns an empty array with 0 elements, as you want.
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
This is why " a".split("\\s+") would return ["", "a"], so you need to trim() the string first to remove whitespace from the beginning.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Since String#split(String) calls String#split(String, int) with the limit argument of zero, you can add whitespace to the end of the string without changing the number of words (because trailing empty strings will be discarded).
UPDATE:
If the delimiter is "\\W+", it's slightly different because you can't use trim() for that:
String str = ...
str = str.replaceAll("^\\W+", "") + " ";
String[] tokens = str.split("\\W+");
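Wrapped into a small helper, the \\W+ variant can be checked against the edge cases discussed above. The class name WordCount and method countWords are hypothetical; the two statements inside are the ones from this answer.

```java
class WordCount {
    // Counts tokens separated by runs of non-word characters.
    static int countWords(String line) {
        // Strip leading delimiters so no empty first token is produced, then
        // append one delimiter so an empty input splits into zero tokens.
        String prepared = line.replaceAll("^\\W+", "") + " ";
        return prepared.split("\\W+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords(""));              // 0
        System.out.println(countWords("hello, world!")); // 2
        System.out.println(countWords("  --hi there"));  // 2
    }
}
```

Keep in mind that \W treats digits and underscores as word characters, so "a_b 42" counts as two tokens.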
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String line = null;
while (!(line = in.nextLine()).isEmpty()) {
//logic
}
System.out.print("Empty Line");
}
output
Empty Line