How to search an array for a part of string? - java

I have an ArrayList<String> of words. I sort it using Collections.sort(wordsList);
I'm using this list for an auto-suggest drop-down box, so that when the user types a letter, they are given a list of suggestions similar to what they are typing.
How do I search this list for a prefix? Say the user types "mount" and the list contains the word "mountain": how can I search the list and return the matching values?
Here's my code so far:
public List<Interface> returnSuggestedList(String prefix) {
String tempPrefix = prefix;
suggestedPhrases.clear();
//suggestedPhrases = new ArrayList<Interface>();
//Vector<String> list = new Vector<String>();
//List<Interface> interfaceList = new ArrayList<Interface>();
Collections.sort(wordsList);
System.out.println("Sorted Vector contains : " + wordsList);
int i = 0;
while (i != wordsList.size()) {
int index = Collections.binarySearch(wordsList, prefix);
String tempArrayString = wordsList.get(index).toString();
if (tempArrayString.toLowerCase().startsWith(prefix.toLowerCase())) {
ItemInterface itemInt = new Item(tempArrayString);
suggestedPhrases.add(itemInt);
System.out.println(suggestedPhrases.get(i).toString());
System.out.println("Element found at : " + index);
}
i++;
}
return suggestedPhrases;
}

The most basic approach would be
List<String> result = new ArrayList<String>();
for(String str: words){
if(str.contains(keyword)){
result.add(str);
}
}
You can improve on this version: if you only care about startsWith rather than contains, you can distribute the words in a HashMap to narrow the search.
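One way to read the HashMap suggestion is to bucket the words by their first letter, so that a prefix lookup only scans a single bucket. The following is a minimal sketch under that assumption; FirstLetterIndex and its method names are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: buckets lowercased words by their first character.
class FirstLetterIndex {
    private final Map<Character, List<String>> buckets = new HashMap<>();

    void add(String word) {
        if (word.isEmpty()) {
            return; // nothing to index
        }
        String lower = word.toLowerCase(Locale.ROOT);
        buckets.computeIfAbsent(lower.charAt(0), k -> new ArrayList<>()).add(lower);
    }

    List<String> startingWith(String prefix) {
        List<String> result = new ArrayList<>();
        if (prefix.isEmpty()) {
            return result; // assumption: an empty prefix yields no suggestions
        }
        String lower = prefix.toLowerCase(Locale.ROOT);
        // Only the bucket for the prefix's first letter needs to be scanned.
        List<String> bucket = buckets.getOrDefault(lower.charAt(0), Collections.emptyList());
        for (String word : bucket) {
            if (word.startsWith(lower)) {
                result.add(word);
            }
        }
        return result;
    }
}
```

This only narrows the scan; within a bucket the search is still linear, and results come back in insertion order.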

For this task, there are better data structures than a sorted array of strings. You might look e.g. at DAWG (Directed acyclic word graph).

If wordList is fixed (does not change from one method call to the other) you should sort it somewhere else, because sort is costly, and store it in lowercase.
In the rest of the method you would do something like:
List<String> selected = new ArrayList<String>();
for(String w:wordList){
if(w.startsWith(prefix.toLowerCase())) // or .contains(), depending on
selected.add(w); // what you want exactly
}
return selected;

Also see the trie data structure. This question has useful info. I should think its getPrefixedBy() will be more efficient than anything you can roll by hand quickly.
Of course, this will work for prefix searches only. Contains search is a different beast altogether.
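To make the trie idea concrete, here is a minimal hand-rolled sketch. It is not the library implementation referred to above; SimpleTrie, insert, and withPrefix are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal trie sketch; all names here are hypothetical.
class SimpleTrie {
    private static final class Node {
        // A TreeMap keeps children sorted, so results come out in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    List<String> withPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return result; // no stored word starts with this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first walk that records every complete word at or below the node.
    private void collect(Node node, StringBuilder current, List<String> out) {
        if (node.isWord) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            current.append(e.getKey().charValue());
            collect(e.getValue(), current, out);
            current.setLength(current.length() - 1); // backtrack
        }
    }
}
```

Reaching the prefix node costs time proportional to the prefix length, independent of how many words are stored; the rest of the work is proportional to the size of the subtree under that node.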

As @Jiri says, you can use a DAWG, but if you don't want to go that far you can do some simple and useful things.
Make use of the sorting
If you want to sort the array of words, do it beforehand; don't sort it each time.
Because it's sorted, you can find the first and the last words in the list that match. Then use list.subList(from, to) to return the sublist. That is a little more efficient than adding each match one by one.
Use a pre-sorted structure
Use a TreeSet<String> for storing the strings (they will be sorted internally).
Then use treeSet.subSet(from, true, to, false);
Where from is the prefix and to is the "prefix plus one char". For example, if you're looking for abc, to must be abd. If you don't want to do that character transformation, you can instead ask for treeSet.tailSet(from) and iterate over it until the elements no longer start with the prefix.
This is especially useful if you read more than you write. Keeping the strings ordered may be a little expensive, but once they are ordered you can find them very fast (O(log n)).
Case insensitive comparing
You can provide a Comparator<String> to the tree set to indicate how it must order the strings. You can implement it yourself or use the built-in String.CASE_INSENSITIVE_ORDER.
If you implement it yourself, its code should be:
int compare(String a, String b) {
return a.toLowerCase().compareTo(b.toLowerCase());
}
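The "prefix plus one char" range query described above can be sketched as follows. This is only an illustration: TreeSetPrefixSearch is a hypothetical name, and the sketch assumes natural String ordering, words stored in lowercase, and a non-empty prefix whose last character is not Character.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class TreeSetPrefixSearch {
    // Returns every word in the set that starts with the given prefix.
    // Assumes natural ordering and a non-empty prefix (see the lead-in).
    static List<String> withPrefix(NavigableSet<String> words, String prefix) {
        // "Prefix plus one char": bump the last character, e.g. "abc" -> "abd".
        char last = prefix.charAt(prefix.length() - 1);
        String upper = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        // Inclusive lower bound, exclusive upper bound.
        return new ArrayList<>(words.subSet(prefix, true, upper, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> words =
                new TreeSet<>(Arrays.asList("ab", "abc", "abcd", "abcz", "abd", "b"));
        System.out.println(withPrefix(words, "abc")); // prints [abc, abcd, abcz]
    }
}
```

The subSet call returns a view, so copying it into a new list is only needed if the caller wants a snapshot that is independent of later changes to the set.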

Here is a similar example:
-> http://samuelsjoberg.com/archive/2009/10/autocompletion-in-swing

Related

How to check String contains one of Strings in collection

I want to check whether the target string contains any string in a collection, and match the longest one. E.g.
Target string: str = "eignelaiwgn"
Collection strings: eig, a, eb, eigne, eignep
The result needs to be eigne
First I thought of a HashMap, but it is not sorted. So I tried putting the collection strings into an ArrayList, sorting the list by string length, and then using a for-each loop to check
if ( str.contains("eigne") )
This needs to loop over the list each time. Is there a better (faster) way to achieve this?
Seems pretty straightforward with streams:
String targetString = "eignelaiwgn";
Collection<String> collection = Arrays.asList("eig", "a", "eb", "eigne", "eignep");
Optional<String> longestMatch = collection.stream()
.filter(targetString::contains)
.max(Comparator.comparingInt(String::length));
longestMatch.ifPresent(System.out::println); // eigne
This reads as: For every string in the collection, check if the target string contains it. If true, return the string with the max length. (As the collection might be empty, or as no string in the collection might match the filter, max returns an Optional<String>).
You could use a TreeSet for the same.
String str = "eignelaiwgn";
// Assuming that the 'sub-strings' are stored in a list
List<String> myList = Arrays.asList("eig", "a", "eb", "eigne", "eignep");
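// Create a TreeSet sorted by descending length; the tie-breaker keeps
// distinct strings of equal length from being dropped as duplicates
Set<String> treeSet = new TreeSet<>(Comparator.comparingInt(String::length).reversed().thenComparing(Comparator.<String>naturalOrder()));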
treeSet.addAll(myList);
String containsSub = treeSet.stream().filter(e -> str.contains(e))
.findFirst()
.orElse("Not found");
Now we iterate over the TreeSet and find the first occurrence where the sub-string is present in the original string. Now since the TreeSet is sorted in descending order of length, iteration will start from the highest to the lowest.
You can use the getLevenshteinDistance() method of the StringUtils class in Apache Commons Lang, which tells you the number of changes needed to turn one String into another. You can print the string with the minimum number of changes, which is your answer. See the documentation -> LevenshteinDistance
Also look at the difference method of the same class, which returns the difference between two strings.
You could use a suffix tree. Please follow this link:
https://www.geeksforgeeks.org/pattern-searching-using-suffix-tree/

Collections Sort to sort both ArrayLists the same

My program has to use the Collections sort method to sort the ArrayList of Strings lexicographically but each String has a corresponding integer value stored in a separate ArrayList. I want to sort them both the same so the integer values stay with the correct Strings. And if you know a better way to store both values I'm all ears.
public class a5p1b {
public static void main(String[] args) {
Scanner input = new Scanner(System.in).useDelimiter("[^a-zA-Z]+");
// ArrayLists to store the Strings and the frequencies
ArrayList<String> lst = new ArrayList<String>();
ArrayList<Integer> intLst = new ArrayList<Integer>();
//loops through as long as there is user input
while (input.hasNext()) {
String str = input.next().toLowerCase();
// if the list already has the string it doesn't add it and it
// ups the count by 1
if (lst.contains(str)) {
int index = lst.indexOf(str);
intLst.set(index, intLst.get(index) + 1);
} else {
// if the word hasnt been found yet it adds it to the list
lst.add(str);
intLst.add(1);
}
}
}
}
You are getting your abstractions wrong. If that string and that number belong together, then do not keep them in two distinct lists.
Instead create a class (or maybe use one of the existing Pair classes) that holds those two values. You can then provide an equals method for that class; plus a specific comparator, that only compares the string elements.
Finally, you put objects of that class into a single list; and then you sort that list.
The whole idea of good OO programming is to create helpful abstractions!
For the record: as dnault suggests, if there is really no "tight" coupling between strings and numbers you could also use a TreeMap (to be used as TreeMap<String, Integer>) to take care of sorting strings that have a number with them.
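A minimal sketch of the "one class, one list" suggestion follows. WordCount and sortByWord are hypothetical names used only for illustration, and the equals method mentioned above is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class pairing a word with its frequency.
final class WordCount {
    final String word;
    final int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Sorts by the word only, so each count travels with its word.
    static void sortByWord(List<WordCount> list) {
        list.sort(Comparator.comparing((WordCount wc) -> wc.word));
    }

    public static void main(String[] args) {
        List<WordCount> list = new ArrayList<>();
        list.add(new WordCount("pear", 2));
        list.add(new WordCount("apple", 5));
        sortByWord(list);
        for (WordCount wc : list) {
            System.out.println(wc.word + " " + wc.count); // apple 5, then pear 2
        }
    }
}
```

Because the word and its count live in one object, no sort can separate them.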
Try
intLst.sort(Comparator.comparing(i -> i.toString()));
Although, I don't think keeping two lists is a good idea.
You should use a Map to associate each unique String key with an Integer value.
Then you can invoke Collections.sort on the map's set of keys returned by keySet().
Additionally, if you use a SortedMap such as TreeMap, it is not necessary to sort the keys. However that solution may not fulfill the requirements of your "Assignment 5 Problem 1b."
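As a rough illustration of the SortedMap approach, the word counting from the question could look like this. WordFrequencies and count are hypothetical names; the sketch assumes the words have already been split out of the input.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class WordFrequencies {
    // Counts words case-insensitively; a TreeMap keeps the keys sorted at all times.
    static Map<String, Integer> count(Iterable<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            // merge inserts 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(w.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("b", "a", "B"))); // prints {a=1, b=2}
    }
}
```

With this structure there is no separate sort step and no second list to keep in sync.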

Java Collection, Set, Map or Array to hold unique sorted non blank strings

I am fairly new to java and would like a container that I can use to hold strings that are not empty and have them sorted.
So far, I have mostly been using ArrayList, but this seems a bit limited for this case.
Thanks
Use TreeSet or TreeMap, depending on your requirements. Both are collections that accept unique elements and keep them sorted.
A Set is what you want, as the items in it have to be unique.
As the Strings should be sorted you'll need a TreeSet.
As for the non blank Strings you have to override the insertion methods like this:
Set<String> sortedSetOfStrings = new TreeSet<String>() {
@Override
public boolean add(String s) {
if(s.isEmpty())
return false;
return super.add(s);
}
};
EDIT: Simplified thanks to Peter Rader's comment.
Thanks for all the help. Here is what I eventually came up with, using TreeSet and Apache Commons StringUtils. My input is a CSV String, so I didn't use the check on the input.
String csvString = "Cat,Dog, Ball, Hedge,, , Ball, Cat";
String[] array = StringUtils.split(csvString, ",");
for (int i = 0; i < array.length; i++)
{
array[i] = array[i].trim(); //Remove unwanted whitespace
}
Set<String> set = new TreeSet<String>(Arrays.asList(array));
set.remove(""); //Remove the one empty string if it is there
set now contains: Ball,Cat,Dog,Hedge

Compare and retrieve elements from ArrayList

I'm trying to build a simple dictionary that compares a string to a word on the ArrayList and then returns a different value from the list. The ArrayList is laid out with the foreign word and then followed by the English equivalent so the idea is that I type in a word, use scanner to compare it to the array list and then return the index value +1 so if the word I type is 7th on the list, I want it to return the 8th word and print it out.
I've got the basic idea of inputting a string and comparing it, but I don't know how to return the following word from the ArrayList:
public void translateWords(){
String nameSearch;
nameSearch=input.nextLine();
for (Phrase c:phrases) {
if (c.getName().equals(nameSearch)) {
System.out.println( c.advancedToString());
return;
}
}
System.out.println("not on list");
}
I've tried playing about with the get method for the ArrayList but I'm unsure on how to use it so any feedback would be very appreciated here.
A for-each loop is not appropriate in this case because you cannot access the next element, which in your case is the translated word.
So I suggest you use an Iterator.
I assume phrases is of type List<Phrase>.
public void translateWords(){
String nameSearch;
nameSearch=input.nextLine();
Iterator<Phrase> it = phrases.iterator();
while(it.hasNext())
{
Phrase c = it.next();
if (c.getName().equals(nameSearch)) {
if (it.hasNext()) { // guard: the match may be the last element
System.out.println(it.next().advancedToString());
}
return;
}
}
System.out.println("not on list");
}
Would something like this do the job for you? Not tested, just from my head:
public void translateWords(){
String nameSearch;
nameSearch=input.nextLine();
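// c is assumed to be the List of words, alternating foreign word and translation
int index = c.indexOf(nameSearch);
if (index >= 0 && index + 1 < c.size()) {
System.out.println(c.get(index + 1));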
} else {
System.out.println("not on list");
}
}
Another way would be to use a dictionary, e.g., a Java HashMap. Without knowing how your Phrase class is implemented, this might look like this:
// create dictionary
Map<Phrase, Phrase> d = new HashMap<Phrase, Phrase>();
// add phrases
d.put(new Phrase("Guten Tag", "de"), new Phrase("Good morning", "en"));
// get translation
Phrase p = d.get(new Phrase(nameSearch, "de"));
(To make this work, Phrase.equals and Phrase.hashCode have to be implemented.)
This seems to be a better choice of data structure for your task. Using a HashMap, lookup times for items are O(1), as opposed to O(n) in your approach, which could be relevant if your dictionary contains thousands or millions of phrases, or if you have to do this very often. A possible downside is that the entries have to be really equal, e.g. you could not (easily) check for "similar" phrases or words.

Searching through Collections in Java

I have a java properties file containing a key/value pair of country names and codes. I will load the contents of this file into a Collection like List or HashMap.
Then, I want users to be able to search for a country, e.g if they type 'Aus' in a textbox and click submit, then I want to search through the collection I have, containing a key/value pair of country codes/names (e.g AUS=>Australia), and return those countries which are found matching.
Is there any more efficient way of doing this, other than looping through the elements of the collection and using charAt()?
If performance is important, you can use a TreeSet or TreeMap to hold the country names. The following can be used to identify countries that start with a given string.
NavigableMap<String, String> countries = new TreeMap<String, String>();
countries.put("australia", "Australia");
...
String userText = ...
String tmp = userText.toLowerCase();
List<String> hits = new ArrayList<String>();
Map.Entry<String, String> entry = countries.ceilingEntry(tmp);
while (entry != null && entry.getKey().startsWith(tmp)) {
hits.add(entry.getValue());
entry = countries.higherEntry(entry.getKey());
}
// hits now contains all country names starting with the value of `userText`,
// ignoring differences in letter case.
Finding the first match is O(log N), where N is the number of countries; each further hit costs another O(log N) step. By contrast, a linear search of a collection is O(N).
Looping with String.contains() is the way unless you want to move in some heavy artillery like Lucene.
Short of indexing the collection via something like Lucene, then you'd have to manually check by looping through all of the elements. You could use startsWith as opposed to looping over the string:
String userText = ...
for (Map.Entry<String, String> entry : map.entrySet()) {
boolean entryMatches = entry.getKey().startsWith(userText);
...
Or alternatively use regular expressions:
Pattern pattern = Pattern.compile(userText);
for (Map.Entry<String, String> entry : map.entrySet()) {
boolean entryMatches = pattern.matcher(entry.getKey()).find();
...
Since the list is small enough to load into memory, sort it and then do a binary search, using the static method java.util.Collections.binarySearch(). This returns an index, and works regardless of whether the exact string is in the list or not (although if it's not, it returns the negative value -(insertion point) - 1, so be sure to check for that and convert it). Then, starting from that index, just iterate forward to find all the strings with that prefix. As a nice side effect, the resulting output will be in alphabetical order.
To make the whole thing case insensitive, remember to convert to lowercase when loading the list and of course convert the prefix to lowercase before searching.
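The binary-search approach can be sketched like this. SortedListPrefixSearch is a hypothetical name, and the sketch assumes the list is already sorted in natural order, already lowercased, and free of duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedListPrefixSearch {
    // Assumes sortedLowercase is sorted, lowercased, and has no duplicates.
    static List<String> withPrefix(List<String> sortedLowercase, String prefix) {
        String p = prefix.toLowerCase();
        int index = Collections.binarySearch(sortedLowercase, p);
        if (index < 0) {
            index = -index - 1; // not found: convert to the insertion point
        }
        List<String> hits = new ArrayList<>();
        // Every string starting with p sits in one contiguous run from index onward.
        while (index < sortedLowercase.size() && sortedLowercase.get(index).startsWith(p)) {
            hits.add(sortedLowercase.get(index));
            index++;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "mount", "mountain", "mouse", "river");
        System.out.println(withPrefix(words, "Moun")); // prints [mount, mountain]
    }
}
```

Collections.binarySearch is only O(log N) on random-access lists such as ArrayList; on a LinkedList it falls back to a slower traversal.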
