Fastest way to check if a haystack contains set of needles


I have a haystack string and I would like to check if it contains any of the needle strings. Currently I do it this way:
Set<String> needles = ...;
...
String [] pieces = haystack.split(" ");
for (String piece: pieces) {
if (needles.contains(piece)) {
return true;
}
}
return false;
It works, but it is relatively slow.
Question: Is there a faster way to accomplish the task?
Example.
Haystack: I am a big tasty potato .
Needles: big, tasty
== RUN ==
I am a big tasty potato .
|
[tasty] got a match, we are good!

You should take a look at the Aho-Corasick algorithm. It suits your problem because it builds an automaton of all words (needles) and traverses the text (haystack) over that automaton to find all matching words. It basically constructs a finite state machine that resembles a trie.
The time complexity is O(n + m + z), where
z is the total number of occurrences of words in the text, n is the length of the text, and m is the total number of characters in all words.
Edit 2
Here is a straightforward implementation which stops traversing after finding the first occurrence of any needle.
import java.util.*;
class AhoCorasick {
static final int ALPHABET_SIZE = 256;
Node[] nodes;
int nodeCount;
public static class Node {
int parent;
char charFromParent;
int suffLink = -1;
int[] children = new int[ALPHABET_SIZE];
int[] transitions = new int[ALPHABET_SIZE];
boolean leaf;
{
Arrays.fill(children, -1);
Arrays.fill(transitions, -1);
}
}
public AhoCorasick(int maxNodes) {
nodes = new Node[maxNodes];
// create root
nodes[0] = new Node();
nodes[0].suffLink = 0;
nodes[0].parent = -1;
nodeCount = 1;
}
public void addString(String s) {
int cur = 0;
for (char ch : s.toCharArray()) {
int c = ch;
if (nodes[cur].children[c] == -1) {
nodes[nodeCount] = new Node();
nodes[nodeCount].parent = cur;
nodes[nodeCount].charFromParent = ch;
nodes[cur].children[c] = nodeCount++;
}
cur = nodes[cur].children[c];
}
nodes[cur].leaf = true;
}
public int suffLink(int nodeIndex) {
Node node = nodes[nodeIndex];
if (node.suffLink == -1)
node.suffLink = node.parent == 0 ? 0 : transition(suffLink(node.parent), node.charFromParent);
return node.suffLink;
}
public int transition(int nodeIndex, char ch) {
int c = ch;
Node node = nodes[nodeIndex];
if (node.transitions[c] == -1)
node.transitions[c] = node.children[c] != -1 ? node.children[c] : (nodeIndex == 0 ? 0 : transition(suffLink(nodeIndex), ch));
return node.transitions[c];
}
// Usage example
public static void main(String[] args) {
AhoCorasick ahoCorasick = new AhoCorasick(1000);
ahoCorasick.addString("big");
ahoCorasick.addString("tasty");
String s = "I am a big tasty potato";
int node = 0;
for (int i = 0; i < s.length(); i++) {
node = ahoCorasick.transition(node, s.charAt(i));
if (ahoCorasick.nodes[node].leaf) {
System.out.println("A match found! Needle ends at: " + i); // A match found! Needle ends at: 9
break;
}
}
}
}
Note that this code currently only finds the end position of an occurrence in the text. If you need the starting position and/or the needle itself, you can trace back from the end position until you find a space to get the matched word.
This doesn't guarantee speed in the worst case, but it should work better in the average and best cases.

With Java 8 or later you can use a parallel stream with the anyMatch function:
boolean hi = Arrays.stream(pieces).parallel().anyMatch(i -> needles.contains(i));

You should make sure needles is an instance of HashSet, which makes contains a "fast", constant-time operation. Next, don't process all of haystack if you don't have to... Try this:
int i, j, l = haystack.length();
for(i = 0; i < l; i = j + 1) {
j = haystack.indexOf(' ', i);
if(j == -1) {
j = l;
}
String hay = haystack.substring(i, j);
if(hay.length() > 0 && needles.contains(hay)) {
return true;
}
}
return false;
*note: this is untested and indexes might be off by +-1, as well as some edge cases might exist. use at your own risk.

Generally most of your slowdown is the split command. You are way better off searching the one string you have than allocating a crap ton of objects. You'd be better off doing regex, and avoiding new object construction. And using Aho would be quite effective. Assuming your lists are big enough to be troublesome.
public class NeedleFinder {
static final int RANGEPERMITTED = 26;
NeedleFinder next[];
public NeedleFinder() {
}
public NeedleFinder(String haystack) {
buildHaystack(haystack);
}
public void buildHaystack(String haystack) {
buildHaystack(this,haystack,0);
}
public void buildHaystack(NeedleFinder node, String haystack, int pos) {
if (pos >= haystack.length()) return;
char ch = haystack.charAt(pos);
if (ch == ' ') { // test the raw character; after the modulo it can never equal ' '
buildHaystack(this,haystack,pos+1);
return;
}
char digit = (char) (ch % RANGEPERMITTED);
if (node.next == null) node.next = new NeedleFinder[RANGEPERMITTED];
if (node.next[digit] == null) node.next[digit] = new NeedleFinder();
NeedleFinder nodeNext = node.next[digit];
buildHaystack(nodeNext,haystack,pos+1);
}
public boolean findNeedle(String needle) {
return findNeedle(this, needle,0);
}
private boolean findNeedle(NeedleFinder node, String needle, int pos) {
if (pos >= needle.length()) return true;
char digit = (char) (needle.charAt(pos) % RANGEPERMITTED);
if (node.next == null) return false;
if (node.next[digit] == null) return false;
return findNeedle(node.next[digit],needle,pos+1);
}
}
On success, check the contains to make sure it's not a false positive. But, it's fast. We're talking 1/5th the speed of binary search.
Speaking of which, binary search is a great idea. It's in the right time complexity class by itself. Just sort your list of haystack strings, then do a binary search for each needle. In Java both operations are readily available in Collections: the .sort() and the .binarySearch() methods. And it's going to be orders of magnitude better than brute force.
Collections.sort(haystackList, strcomp);
value = Collections.binarySearch(haystackList, needle, strcomp);
If value is non-negative (zero or greater), it was found.
With the strcomp:
public Comparator<String> strcomp = new Comparator<String>() {
@Override
public int compare(String s, String t1) {
if ((s == null) && (t1 == null)) return 0;
if (s == null) return 1;
if (t1 == null) return -1;
return s.compareTo(t1);
}
};

If it's really all about speed, and you want to search through a list of items instead of a solid string, you could divide the work into different threads (I'm not sure how many items you're checking with, but if it's not taking minutes, this might not be the way to go)
If you don't need to make the haystack into an array, you could instead iterate through needles, and test haystack via String.contains();
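A minimal sketch of the String.contains() idea, assuming the needles are available as a collection. The class and method names are made up for illustration. Note that this matches substrings, so the needle "big" would also match inside "bigger", which is different from the whole-word check in the question.

```java
import java.util.Collection;

class ContainsAnyNeedle {
    // Returns true as soon as any needle occurs somewhere in the haystack.
    // Hypothetical helper: substring semantics, not whole-word semantics.
    static boolean containsAny(String haystack, Collection<String> needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

This avoids allocating the split array, but it scans the haystack once per needle, so it is probably only attractive when the needle set is small.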

Related

Using for loop and charAt to check if each letter appears exactly twice in a word

I'm writing a program in Java that checks if every letter appears exactly twice. I was able to write it, but my problem is that for some words the code doesn't check whether each letter appears exactly twice.
here is my code:
public class Test {
public static void main(String[] args) {
isDoubloon("abba");
isDoubloon("Shanghaiingss");/*it still prints out true though 's' does not appear exactly twice*/}
//checks if every letter appears twice in a word
public static void isDoubloon(String s){
String l=s.toLowerCase();
int count=0;
for(int i= 0; i<l.length()-1;i++){
for(int j=i+1;j<l.length();j++){
if(l.charAt(i)==l.charAt(j)) count++;
}
}
if(count%2==0){
System.out.println("True, This is a doubloon");
}else
System.err.println("False, This is not a doubloon");
}}

Your whole logic is not correct. You have to check for every letter in your text if it occurs twice.
Try this:
String l=s.toLowerCase();
boolean check = true;
for(int i= 0; i<l.length();i++){
int count=0;
for(int j=0;j<l.length();j++){
if(l.charAt(i)==l.charAt(j)) count++;
}
if (count != 2) {
check = false;
break;
}
}
if(check==true){
System.out.println("True, This is a doubloon");
}else
System.out.println("False, This is not a doubloon");
}

Your code counts every pair of positions that hold the same letter and adds all these values up. If the result is even, you conclude that each letter occurs exactly twice in the word. That cannot work.
Simply try the word "aaaa": it contains 6 matching pairs, so your code thinks it is a doubloon.
So you need to check that every character occurs exactly twice, and do that for each character separately.
You could do it this way:
public static void main(String[] args) {
if(isDoubloon("Shanghaiingss")){
System.out.println("True, This is a doubloon");
}else{
System.err.println("False, This is not a doubloon");
}
}
public static boolean isDoubloon(final String s) {
final String l = s.toLowerCase();
for (int i = 0; i < l.length(); i++) {
int count = 0;
for (int j = 0; j < l.length(); j++) {
if (l.charAt(i) == l.charAt(j)) {
count++;
if (2 < count) {
return false; // more than twice
}
}
}
if (1 == count) {
return false; // character occurs only once
}
}
return true;
}
This algorithm is similar to yours, but at O(n²) it is far from fast. If you need to, you can implement it in O(n), but that requires some extra space.

The main flaw here is that you are using a single "count" variable when you want to do a count for each letter.
I would suggest using a map to hold a count for each letter: loop over the string, add each letter to your map, and finally iterate over the map and confirm all values are 2.
public static void isDoubloon(String s){
String l=s.toLowerCase();
Map<Character, Integer> counts = new HashMap<>();
for(int i= 0; i<l.length();i++){ // visit every character, including the last one
int prevValue = counts.getOrDefault(l.charAt(i), 0);
counts.put(l.charAt(i), prevValue + 1);
}
for (Map.Entry<Character, Integer> entry: counts.entrySet()) {
if (entry.getValue() != 2) {
System.err.println("False, This is not a doubloon");
return; // stop here so that "True" is not printed as well
}
}
System.out.println("True, This is a doubloon");
}

Other solution
private boolean isDoubloon(String s) {
String convertWord = s.toLowerCase();
char[] letter = convertWord.toCharArray();
int[] count = new int[26];
for (int letters = 0; letters < letter.length; letters++) {
char index = letter[letters];
count[index - 97]++;
}
for( int i = 0; i < 26; i++ ) {
if (count[i] != 0 && count[i] != 2) return false;
}
return true;
}

public static boolean isDoubloon(String s) {
if (s.length() %2 != 0)
return false;
String str = s.toLowerCase();
while (str.length() > 0) {
int index2 = str.indexOf(str.charAt(0), 1);
if (index2 == -1) {
return false;
}
int index3 = str.indexOf(str.charAt(0), index2 + 1);
if (index3 != -1) {
return false;
}
str = str.substring(1, index2) + str.substring(index2 + 1);
}
return true;
}

Obligatory Java Streams examples:
groupingBy() and counting()
public static boolean isDoubloon(String str) {
return
// Stream over chars, and box to Integer
// These will be the ASCII values of the chars
!str.chars().boxed()
// Group by identity
.collect(Collectors.groupingBy(Function.identity(),
// and map each key to the count of characters
Collectors.counting()))
// We now have a Map<Integer, Long>, the Integer being the character
// value and the Long being the number of occurrences.
// Stream over the Map's values
.values().stream()
// Retain all values unequal to 2
.filter(i -> !Objects.equals(i, 2L))
// Shortcut if found and check if a value is present
.findAny().isPresent();
// If a value is present, that means that there are one or more
// characters with less or more than two occurrences.
}
https://ideone.com/PT8sQi
distinct() and count() (caution: this is only a necessary condition; "aaab" passes it although 'a' occurs three times)
public static boolean isDoubloon(String str) {
long distinct = str.chars().distinct().count();
long length = str.length();
return (length % 2 == 0 && length / 2 == distinct);
}
https://ideone.com/UaOKDF

Dijkstra adjacency list

I have run into a problem converting pseudocode of Dijkstra's algorithm into actual code. I was given an adjacency list in the form "Location - adjacent location - distance to location", for example for one node: AAA AAC 180 AAD 242 AAH 40.
My task was to read a file organized as adjacency list as described, and compute the shortest path from one node to another.
Here is the Dijkstra pseudocode:
void dijkstra( Vertex s )
{
for each Vertex v
{
v.dist = INFINITY;
v.known = false;
}
s.dist = 0;
while( there is an unknown distance vertex )
{
Vertex v = smallest unknown distance vertex;
v.known = true;
for each Vertex w adjacent to v
if( !w.known )
{
DistType cvw = cost of edge from v to w;
if( v.dist + cvw < w.dist )
{
// Update w
decrease( w.dist to v.dist + cvw );
w.path = v;
}
}
}
}
I'm having the most trouble with the line "for each Vertex w adjacent to v".
Here is my nonworking code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
public class Dijkstra {
public static boolean isInteger(String s) {
return isInteger(s, 10);
}
public static boolean isInteger(String s, int radix) {
if (s.isEmpty())
return false;
for (int i = 0; i < s.length(); i++) {
if (i == 0 && s.charAt(i) == '-') {
if (s.length() == 1)
return false;
else
continue;
}
if (Character.digit(s.charAt(i), radix) < 0)
return false;
}
return true;
}
public static void dijkstra(Vertex[] a, Vertex s, int lineCount) {
int i = 0;
while (i < (lineCount)) // each Vertex v
{
a[i].dist = Integer.MAX_VALUE;
a[i].known = false;
i++;
}
s.dist = 0;
int min = Integer.MAX_VALUE; //
while (!(a[0].known == true && a[1].known == true && a[2].known == true && a[3].known == true
&& a[4].known == true && a[5].known == true && a[6].known == true && a[7].known == true
&& a[8].known == true && a[9].known == true && a[10].known == true && a[11].known == true
&& a[12].known == true)) {
System.out.println("here");
for (int b = 0; b < lineCount; b++) {
if (a[b].dist < min && a[b].known == false) {
min = a[b].dist;
}
}
int c = 0;
while (c < lineCount) {
if (a[c].dist == min && a[c].known == false) {
break;
}
c++;
}
System.out.println(min);
a[c].known = true;
int adjSize = a[c].adj.size();
int current = 0;
System.out.println(adjSize);
while (current < adjSize - 1) {
String currentAdjacent = (String) a[c].adj.get(current);
int p = 0;
while (p < lineCount) {
if (a[p].name.equals(currentAdjacent)) {
if (!a[p].known) {
String cvwString = (String) a[c].distance.get(current);
int cvw = Integer.parseInt(cvwString);
System.out.println(" This is cvw" + cvw);
System.out.println("Here2");
if (a[c].dist + cvw < a[p].dist) {
a[p].dist = a[c].dist + cvw;
a[p].path = a[c];
}
}
}
p++;
}
current++;
}
}
}
public static class Vertex {
public List adj; // Adjacency list
public List distance;
public boolean known;
public int dist; // DistType is probably int
public Vertex path;
public String name;
// Other fields and methods as needed
}
public static void printPath(Vertex v) {
if (v.path != null) {
printPath(v.path);
System.out.print(" to ");
}
System.out.print(v);
}
public static void main(String[] args) throws IOException {
int lineCounter = 0;
BufferedReader br = new BufferedReader(new FileReader("airport.txt"));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
lineCounter = lineCounter + 1;
}
Vertex[] arr = new Vertex[lineCounter];
for (int i = 0; i < lineCounter; i++) {
arr[i] = new Vertex();
arr[i].adj = new LinkedList<String>();
arr[i].distance = new LinkedList<Integer>();
}
;
//
int arrayCounter = 0;
String everything = sb.toString();
String[] lines = everything.split("\\s*\\r?\\n\\s*");
for (String line1 : lines) {
arr[arrayCounter] = new Vertex();
arr[arrayCounter].adj = new LinkedList<String>();
arr[arrayCounter].distance = new LinkedList<Integer>();
String[] result = line1.split("\\s+");
for (int x = 0; x < result.length; x++) {
if (x == 0) {
arr[arrayCounter].name = result[0];
continue;
} else if (isInteger(result[x])) {
arr[arrayCounter].distance.add(result[x]);
continue;
} else {
arr[arrayCounter].adj.add(result[x]);
continue;
}
}
arrayCounter++;
}
for (int i = 0; i < 12; i++) {
System.out.println(arr[i].name);
}
System.out.println(lineCounter);
dijkstra(arr, arr[3], lineCounter - 1);
printPath(arr[11]);
} finally {
br.close();
}
}
}
With my Vertex class as is, I used a series of while loops to traverse the adjacency strings stored in a linked list, comparing each one against every vertex to see which vertex matches. Is there a better way to code "for each Vertex w adjacent to v" using my Vertex class? Apologies in advance for the messy code and any other style sins I may have committed. Thanks!

To solve this problem you need a bunch of "Node" objects, stored in a HashMap, keyed on source location.
In the node, you need a collection of references to adjacent "Node" objects (or at least their "key", so you can write logic against it). The "Node" also needs to know its location and the distance to each "adjacent" node. Think London Underground Tube maps: each station connects to at least one other station, usually two or more. The adjacent nodes of a tube station are therefore the immediate next stops you can get to from that station.
Once you have that data structure in place, you can use a recursive routine to iterate through each individual node. It should then iterate through each child node (aka adjacent node) and track distances from the initial (source) node to the current node by storing this data in a HashMap and using the current accumulated distance whilst recursing (or "walking" the graph). This tracking information should be part of your method signature when recursing. You will also need to track the current path you have taken when recursing, in order to avoid circular loops (which will ultimately and ironically cause a StackOverflowError). You can do this by using a HashSet. This Set should track the source and current node's location as the entry key. If you see this present during your recursion, then you have already seen it, so don't continue processing.
I'm not going to code the solution for you, because I suspect that you will ask more specific questions as you work your way through understanding the answer, and those are very likely answered elsewhere.
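For reference, here is a rough sketch of the HashMap-keyed data structure. It is not the recursive walk described above. Instead it uses the iterative, priority-queue form of Dijkstra's algorithm, which follows the pseudocode in the question. The class name ShortestPaths and its methods are hypothetical, and it assumes every line has the form "NAME NEIGHBOUR DISTANCE NEIGHBOUR DISTANCE ..." with non-negative integer distances.

```java
import java.util.*;

class ShortestPaths {
    // Adjacency list: node name -> (neighbour name -> edge distance).
    private final Map<String, Map<String, Integer>> graph = new HashMap<>();

    // Parses one line such as "AAA AAC 180 AAD 242 AAH 40".
    void addLine(String line) {
        String[] t = line.trim().split("\\s+");
        Map<String, Integer> edges = graph.computeIfAbsent(t[0], k -> new HashMap<>());
        for (int i = 1; i + 1 < t.length; i += 2) {
            edges.put(t[i], Integer.parseInt(t[i + 1]));
            graph.computeIfAbsent(t[i], k -> new HashMap<>()); // neighbour is a node too
        }
    }

    // Returns the shortest distance from source to every reachable node.
    Map<String, Integer> dijkstra(String source) {
        Map<String, Integer> dist = new HashMap<>();
        Set<String> known = new HashSet<>();
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        dist.put(source, 0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0));
        while (!queue.isEmpty()) {
            String v = queue.poll().getKey(); // smallest unknown-distance vertex
            if (!known.add(v)) {
                continue; // stale queue entry, v was already finalized
            }
            // "for each Vertex w adjacent to v" becomes a plain map iteration
            for (Map.Entry<String, Integer> e
                    : graph.getOrDefault(v, Collections.emptyMap()).entrySet()) {
                String w = e.getKey();
                int candidate = dist.get(v) + e.getValue();
                if (!known.contains(w) && candidate < dist.getOrDefault(w, Integer.MAX_VALUE)) {
                    dist.put(w, candidate);
                    queue.add(new AbstractMap.SimpleEntry<>(w, candidate));
                }
            }
        }
        return dist;
    }
}
```

Because neighbours are looked up by key, the nested name-comparison loops from the question are no longer needed. Recording a predecessor map next to dist would allow the path itself to be printed.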

Turning a String into a LinkedList and browsing it with recursion

I'm very new to recursion (and I'm required to use it) and am having some serious logic trouble using one of my search methods. Please see below:
//these are methods within a Linked List ADT with StringBuilder functionality
//the goal here is to access the char (the Node data) at a certain index
public char charAt(int index)
{
if((firstNode == null) || (index < 0) || (index >= length + 1))
//firstNode is the 1st Node in the Linked List, where the search begins
{
System.out.println("Invalid Index or FirstNode is null");
IndexOutOfBoundsException e = new IndexOutOfBoundsException();
throw e;
}
else
{
char c = searchForChar(firstNode, index);
return c;
}
}
private char searchForChar(Node nodeOne, int index)
{
int i = 0;
if(nodeOne == null) //basecase --> end
{
i = 0;
System.out.println("nodeOne null, returning null Node data");
return 'n';
}
else if(i == index) //basecase --> found
{
i = 0;
return nodeOne.data; //nodeOne.data holds the char in the Node
}
else if(nodeOne != null) //search continues
{
searchForChar(nodeOne.next, index);
i++;
return nodeOne.data;
}
return nodeOne.data;
}
The output is length-1 prints of "nodeOne null, returning null Node data". I don't understand how the recursive statement in the last else-if is being reached when it seems like the null check in the first if statement is being reached as well.
I tried rearranging the if statements so that the if(nodeOne != null) is first, but that gives me a NullPointerException. Not sure what I'm doing wrong. Especially because I can print the data in the Nodes using a toString() method so I know the Nodes don't have null data.
Can anyone please help me understand?

I wrote a complete example; I hope this is what you need. If you loop over the string StackOverflow with i < 14, it will also print the null character \0; with i < 15 you get an IndexOutOfBoundsException. By reducing index by 1 on every call, you are effectively saying: I need (index - 1) more hops to reach my destination node.
public class CharTest {
public static class Node {
private char content;
private Node nextNode;
public Node () {
content = '\0';
nextNode = null;
}
public Node (String str) {
Node temp = this;
for (int i = 0; i < str.length(); i++) {
temp.content = str.charAt(i);
temp.nextNode = new Node();
temp = temp.nextNode;
}
}
public char charAt(int index) {
if (index == 0) {
return content;
} else if (index < 0 || nextNode == null) {
throw new IndexOutOfBoundsException();
}
return nextNode.charAt(index - 1);
}
}
public static void main(String[] args) {
Node test = new Node("StackOverflow");
for (int i = 0; i < 13; i++) {
System.out.print(test.charAt(i));
}
System.out.println();
}
}
I will leave writing a toString() method, either iteratively or recursively, as an exercise to the reader. Using a StringBuilder or a char[] would be a good idea for performance reasons.
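The same idea can be applied to the searchForChar method from the question. In that version the local counter i is reset to 0 on every call and the result of the recursive call is thrown away. The sketch below passes a decreasing index instead and returns the recursive result. The Node class here is a stand-in that assumes the data and next fields from the question.

```java
class CharList {
    // Stand-in for the question's Node type (fields data and next assumed).
    static class Node {
        char data;
        Node next;

        Node(char data, Node next) {
            this.data = data;
            this.next = next;
        }
    }

    // Walks `index` hops from `node` and returns the char stored there.
    static char searchForChar(Node node, int index) {
        if (node == null) {
            // base case: ran off the end of the list
            throw new IndexOutOfBoundsException("index is past the end of the list");
        }
        if (index == 0) {
            return node.data; // base case: found
        }
        // Pass the smaller index down and return what the recursion found.
        return searchForChar(node.next, index - 1);
    }
}
```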

Detecting if a word is valid when it contains a blank

I'm working on a phone based word game, and there could potentially be quite a few blanks (representing any letter) that a player could have the option to use.
I store all the possible words in a hashSet, so detecting if a word is valid when it has one blank is simply a matter of looping through the alphabet replacing the blank with a letter and testing the word. I have a recursive call so this will work with any number of blanks. The code is as follows:
public boolean isValidWord(String word) {
if (word.contains(" ")){
for (char i = 'A'; i <= 'Z'; i++) {
if (isValidWord(word.replaceFirst(" ", Character.toString(i))))
return true;
}
return false;
}
else
return wordHashSet.contains(word);
}
As the number of blanks increases, the number of words we have to test increases exponentially. By the time we get to 3 blanks we have to do 17576 lookups before we can reject a word, and this is affecting game play. Once there are 4 blanks the game will just freeze for a while.
What is the most efficient way for me to check words with multiple blanks? Should I just iterate through the hashset and check if we have a match against each word? If so, then what's the fastest way for me to compare two strings taking the blanks into account? I've tried doing this using a regular expression and String.matches(xx), but it's too slow. A straight String.equals(xx) is fast enough, but that obviously doesn't take blanks into account.

A very fast method, although somewhat challenging to implement, would be to store your words in a Trie - http://en.wikipedia.org/wiki/Trie
A trie is a tree structure that contains a char in every node and an array of pointers pointing to next nodes.
Without blank spaces it would be easy - just follow the trie structure, you can check this in linear time. When you have a blank, you will have a loop to search all possible routes.
This can sound complicated and difficult if you are not familiar with tries but if you get stuck I can help you with some code.
EDIT:
OK, here is some C# code for your problem using tries. I think you will have no problems converting it to Java. If you do, leave a comment and I will help.
Trie.cs
public class Trie
{
private char blank = '_';
public Node Root { get; set; }
public void Insert(String key)
{
Root = Insert(Root, key, 0);
}
public bool Contains(String key)
{
Node x = Find(Root, key, 0);
return x != null && x.NullNode;
}
private Node Find(Node x, String key, int d)
{ // Return value associated with key in the subtrie rooted at x.
if (x == null)
return null;
if (d == key.Length)
{
if (x.NullNode)
return x;
else
return null;
}
char c = key[d]; // Use dth key char to identify subtrie.
if (c == blank)
{
foreach (var child in x.Children)
{
var node = Find(child, key, d + 1);
if (node != null)
return node;
}
return null;
}
else
return Find(x.Children[c], key, d + 1);
}
private Node Insert(Node x, String key, int d)
{ // Change value associated with key if in subtrie rooted at x.
if (x == null) x = new Node();
if (d == key.Length)
{
x.NullNode = true;
return x;
}
char c = key[d]; // Use dth key char to identify subtrie.
x.Children[c] = Insert(x.Children[c], key, d + 1);
return x;
}
public IEnumerable<String> GetAllKeys()
{
return GetKeysWithPrefix("");
}
public IEnumerable<String> GetKeysWithPrefix(String pre)
{
Queue<String> q = new Queue<String>();
Collect(Find(Root, pre, 0), pre, q);
return q;
}
private void Collect(Node x, String pre, Queue<String> q)
{
if (x == null) return;
if (x.NullNode) q.Enqueue(pre);
for (int c = 0; c < 256; c++)
Collect(x.Children[c], pre + ((char)c), q);
}
}
Node.cs
public class Node
{
public bool NullNode { get; set; }
public Node[] Children { get; set; }
public Node()
{
NullNode = false;
Children = new Node[256];
}
}
Sample usage:
Trie tr = new Trie();
tr.Insert("telephone");
while (true)
{
string str = Console.ReadLine();
if( tr.Contains( str ) )
Console.WriteLine("contains!");
else
Console.WriteLine("does not contain!");
}
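A possible Java counterpart of the lookup part of the C# code above, as a sketch rather than a line-by-line port. It stores children in a HashMap instead of a 256-slot array and treats a space as the blank, as in the question (the C# version uses '_'). The names BlankTrie, insert and matches are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class BlankTrie {
    private static final char BLANK = ' '; // assumption: blanks are spaces, as in the question

    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfWord = true;
    }

    // True if some stored word matches pattern, where BLANK stands for any letter.
    boolean matches(String pattern) {
        return matches(root, pattern, 0);
    }

    private boolean matches(Node node, String pattern, int pos) {
        if (node == null) {
            return false;
        }
        if (pos == pattern.length()) {
            return node.endOfWord;
        }
        char c = pattern.charAt(pos);
        if (c == BLANK) {
            // A blank only branches into children that actually exist in the trie.
            for (Node child : node.children.values()) {
                if (matches(child, pattern, pos + 1)) {
                    return true;
                }
            }
            return false;
        }
        return matches(node.children.get(c), pattern, pos + 1);
    }
}
```

Each blank only explores letters that continue some real word, so the work is bounded by the shape of the dictionary rather than by 26 to the power of the number of blanks.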

A straight String.equals(xx) is fast enough, but that obviously
doesn't take blanks into account.
So I recommend to implement this simple solution, which is very close to String.equals(), and takes blanks into account:
public boolean isValidWord(String word) {
if (wordHashSet.contains(word)) {
return true;
}
for (String fromHashSet: wordHashSet){
if (compareIgnoreBlanks(fromHashSet, word)) {
return true;
}
}
return false;
}
/**
* Inspired by String.compareTo(String). Compares two String's, ignoring blanks in the String given as
* second argument.
*
* @param s1
* String from the HashSet
* @param s2
* String with potential blanks
* @return true if s1 and s2 match, false otherwise
*/
public static boolean compareIgnoreBlanks(String s1, String s2) {
int len = s1.length();
if (len != s2.length()) {
return false;
}
int k = 0;
while (k < len) {
char c1 = s1.charAt(k);
char c2 = s2.charAt(k);
if (c2 != ' ' && c1 != c2) {
return false;
}
k++;
}
return true;
}

public boolean isValidWord(String word) {
word = word.replaceAll(" ", "[a-z]");
Pattern pattern = Pattern.compile(word);
for (String wordFromHashSet: hashSet){
Matcher matcher = pattern.matcher(wordFromHashSet);
if (matcher.matches()) return true;
}
return false;
}

public boolean isValidWord(String word) {
ArrayList<Integer> pos = new ArrayList<Integer>();
for (int i=0; i!=word.length();i++){
if (word.charAt(i) == ' ') pos.add(i);
}
for (String hashSetWord: hashSet){
for (Integer i: pos){
hashSetWord = hashSetWord.substring(0,i)+" "+hashSetWord.substring(i+1);
}
if (hashSetWord.equals(word)) return true;
}
return false;
}

A kind of ugly, but I would guess fairly fast method would be to create a string containing all valid words like this:
WORD1
WORD2
WORD3
etc.
Then use a regex like (^|\n)A[A-Z]PL[A-Z]\n (i.e. replacing all blanks with [A-Z]), and match it on that string.
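A small sketch of this approach, assuming upper-case words and a space as the blank (as in the question). The class name JoinedWordList is hypothetical. The joined string here starts and ends with a newline, so the pattern can simply be anchored with \n on both sides instead of using (^|\n).

```java
import java.util.List;
import java.util.regex.Pattern;

class JoinedWordList {
    private final String allWords; // "\nWORD1\nWORD2\n...\n"

    JoinedWordList(List<String> words) {
        allWords = "\n" + String.join("\n", words) + "\n";
    }

    boolean isValidWord(String word) {
        // Quote literal characters and turn every blank into [A-Z].
        StringBuilder regex = new StringBuilder("\n");
        for (char c : word.toCharArray()) {
            regex.append(c == ' ' ? "[A-Z]" : Pattern.quote(String.valueOf(c)));
        }
        regex.append("\n");
        return Pattern.compile(regex.toString()).matcher(allWords).find();
    }
}
```

Whether this beats iterating over the set would need to be measured. The regex engine still scans the whole joined string for each query.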

How do I search for a String in an array of Strings using binarySearch or another method?

Using binarySearch never returns the right index
int j = Arrays.binarySearch(keys,key);
where keys is type String[] and key is type String
I read something about needing to sort the Array, but how do I even do that if that is the case?
Given all this I really just need to know:
How do you search for a String in an array of Strings (less than 1000) then?

From Wikipedia:
"In computer science, a binary search is an algorithm for locating the position of an element in a sorted list by checking the middle, eliminating half of the list from consideration, and then performing the search on the remaining half.[1][2] If the middle element is equal to the sought value, then the position has been found; otherwise, the upper half or lower half is chosen for search based on whether the element is greater than or less than the middle element."
So the prerequisite for binary search is that the data is sorted. It has to be sorted because it cuts the array in half and looks at the middle element. If the middle element is what it is looking for, it is done. If the middle element is larger, it takes the lower half of the array. If the middle element is smaller, it takes the upper half of the array. Then the process is repeated (look in the middle etc...) until the element is found (or not).
If the data isn't sorted the algorithm cannot work.
So you would do something like:
final String[] data;
final int index;
data = new String[] { /* init the elements here or however you want to do it */ };
Arrays.sort(data); // Collections.sort only accepts a List, not an array
index = Arrays.binarySearch(data, value);
or, if you do not want to sort it do a linear search:
int index = -1; // not found
for(int i = 0; i < data.length; i++)
{
if(data[i].equals(value))
{
index = i;
break; // stop looking
}
}
And for completeness here are some variations with the full method:
// strict one - disallow nulls for everything
public static <T> int linearSearch(final T[] data, final T value)
{
int index;
if(data == null)
{
throw new IllegalArgumentException("data cannot be null");
}
if(value == null)
{
throw new IllegalArgumentException("value cannot be null");
}
index = -1;
for(int i = 0; i < data.length; i++)
{
if(data[i] == null)
{
throw new IllegalArgumentException("data[" + i + "] cannot be null");
}
if(data[i].equals(value))
{
index = i;
break; // stop looking
}
}
return (index);
}
// allow null for everything
public static <T> int linearSearch(final T[] data, final T value)
{
int index;
index = -1;
if(data != null)
{
for(int i = 0; i < data.length; i++)
{
if(value == null)
{
if(data[i] == null)
{
index = i;
break;
}
}
else
{
if(value.equals(data[i]))
{
index = i;
break; // stop looking
}
}
}
}
return (index);
}
You can fill in the other variations, like not allowing a null data array, or not allowing null in the value, or not allowing null in the array. :-)
Based on the comments, this version behaves the same as the permissive one, and since most of the code is already written for you it is preferable to the version above. If you want to be paranoid and not allow null for anything, you are stuck with the strict version above. (This version is basically as fast as the others, since the overhead of the method call (asList) probably goes away at runtime.)
public static <T> int linearSearch(final T[] data, final T value)
{
final int index;
if(data == null)
{
index = -1;
}
else
{
final List<T> list;
list = Arrays.asList(data);
index = list.indexOf(value);
}
return (index);
}

java.util.Arrays.sort(myArray);
That's how binarySearch is designed to work - it assumes sorting so that it can find faster.
If you just want to find something in a list in O(n) time, don't use BinarySearch, use indexOf. All other implementations of this algorithm posted on this page are wrong because they fail when the array contains nulls, or when the item is not present.
public static int indexOf(final Object[] array, final Object objectToFind, int startIndex) {
if (array == null) {
return -1;
}
if (startIndex < 0) {
startIndex = 0;
}
if (objectToFind == null) {
for (int i = startIndex; i < array.length; i++) {
if (array[i] == null) {
return i;
}
}
} else {
for (int i = startIndex; i < array.length; i++) {
if (objectToFind.equals(array[i])) {
return i;
}
}
}
return -1;
}

To respond correctly to your question as you have put it: use brute force.

I hope it will help
public int find(String first[], int start, int end, String searchString){
if (start > end) {
return -1; // empty range: searchString is not in the array
}
int mid = start + (end-start)/2;
int cmp = first[mid].compareTo(searchString);
if(cmp == 0){
return mid;
}
if(cmp > 0){
return find(first, start, mid-1, searchString);
}
return find(first, mid+1, end, searchString);
}

None of the overloaded versions of binarySearch in Java declares a String parameter explicitly. However, there are three overloads that might be helpful in your situation (a String[] can be passed wherever an Object[] or T[] is expected):
static int binarySearch(char[] a, char key);
static int binarySearch(Object[] a, Object key);
static <T> int binarySearch(T[] a, T key, Comparator<? super T> c)
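As a short illustration, the Object[] overload works directly with a String[], provided the array is sorted first. The helper name indexInSorted is made up for this sketch. It sorts a copy, so the returned index refers to the sorted order and not to the original array.

```java
import java.util.Arrays;

class SortedLookup {
    // Sorts a copy of keys and returns the index of key within that sorted copy,
    // or a negative value if key is absent.
    static int indexInSorted(String[] keys, String key) {
        String[] sorted = keys.clone();
        Arrays.sort(sorted); // natural (lexicographic) String order
        return Arrays.binarySearch(sorted, key);
    }
}
```

For a one-off lookup in fewer than 1000 strings, a plain linear search is likely just as good, since sorting costs more than a single scan.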
