Compare two arrayList and get longest matching String - java

I'm trying to read two text files and return the longest string that appears in both. I put both text files into ArrayLists, separated word by word. This is my code so far, but I'm wondering how I would return the longest String and not just the first one found.
for(int i = 0; i < file1Words.size(); i++)
{
for(int j = 0; j < file2Words.size(); j++)
{
if(file1Words.get(i).equals(file2Words.get(j)))
{
matchingString += file1Words.get(i) + " ";
}
}
}

String longest = "";
for (String s1: file1Words)
for (String s2: file2Words)
if (s1.length() > longest.length() && s1.equals(s2)) longest = s1;

If you are looking for better time and space performance than the answers above, you can use the code below.
System.out.println("Start time :" + System.currentTimeMillis());
String longestMatch = "";
for (int i = 0; i < file1Words.size(); i++) {
    String w = file1Words.get(i);
    // Only scan file2Words if w could beat the current best
    if (w.length() > longestMatch.length()) {
        for (int j = 0; j < file2Words.size(); j++) {
            if (w.equals(file2Words.get(j))) {
                longestMatch = w;
                break;
            }
        }
    }
}
System.out.println("End time :" + System.currentTimeMillis());

I'm not going to give you the code, but I'll help you with the main idea.
You will need a new string variable "curLargestString" to keep track of what is currently the largest string. Declare it outside of your for loops. Every time you get two matching words, compare the length of the matching word to the length of the word in "curLargestString". If the new matching word is longer, set "curLargestString" to the new word. Then, after your for loops have run, return curLargestString.
One more note: be sure to initialize curLargestString with an empty string. This prevents an error when you call the length method on it after you get your first matching word.

Assuming your files are small enough to fit in memory, sort them both with a custom comparator that puts longer strings before shorter ones and otherwise sorts lexicographically.
Then go through both lists in order, advancing only one index at a time (the one pointing to the "smaller" entry of the two under that comparator), and return the first match.
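A minimal sketch of the sort-and-walk idea described above. The class name LongestCommonWord and the comparator name LONGEST_FIRST are made up for illustration; file1Words and file2Words are assumed to be the word lists from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LongestCommonWord {
    // Hypothetical comparator: longer words sort first, ties are broken lexicographically.
    static final Comparator<String> LONGEST_FIRST = (x, y) -> {
        int byLength = Integer.compare(y.length(), x.length());
        return byLength != 0 ? byLength : x.compareTo(y);
    };

    static String longestCommonWord(List<String> file1Words, List<String> file2Words) {
        // Sort copies so the caller's lists are left untouched.
        List<String> a = new ArrayList<>(file1Words);
        List<String> b = new ArrayList<>(file2Words);
        a.sort(LONGEST_FIRST);
        b.sort(LONGEST_FIRST);

        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = LONGEST_FIRST.compare(a.get(i), b.get(j));
            if (cmp == 0) {
                return a.get(i); // first match in this order is the longest common word
            }
            // Advance whichever index points at the entry that sorts first.
            if (cmp < 0) i++; else j++;
        }
        return ""; // no common word
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("the", "quick", "brown", "fox");
        List<String> second = Arrays.asList("a", "brown", "quick", "dog");
        System.out.println(longestCommonWord(first, second)); // prints "brown"
    }
}
```

When several common words share the maximum length, this returns the lexicographically smallest of them. The cost is dominated by the two sorts rather than by a nested scan.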

You can use the following code:
String matchingString = "";
Set<String> intersection = new HashSet<>(file1Words);
intersection.retainAll(file2Words);
for (String word : intersection)
    if (word.length() > matchingString.length())
        matchingString = word;

private String getLongestString(List<String> list1, List<String> list2) {
    // Start from the empty string so length() is always safe to call
    String longestString = "";
    for (String list1String : list1) {
        if (list1String.length() > longestString.length()) {
            for (String list2String : list2) {
                if (list1String.equals(list2String)) {
                    longestString = list1String;
                    break;
                }
            }
        }
    }
    return longestString;
}

Related

Check for multiple occurrence of certain character in string

Edit: To those who downvoted, this question is different from the linked duplicate. The other question is about returning the indexes. In my case, I do not need the index; I just want to check whether there is a duplicate.
This is my code:
String word = "ABCDE<br>XYZABC";
String[] keywords = word.split("<br>");
for (int index = 0; index < keywords.length; index++) {
if (keywords[index].toLowerCase().contains(word.toLowerCase())) {
if (index != (keywords.length - 1)) {
endText = keywords[index];
definition.setText(endText);
}
}
}
My problem is that if the keyword is "ABC", the string endText will only show "ABCDE". However, "XYZABC" contains "ABC" as well. How can I check whether the string has multiple occurrences? I would like the definition TextView to show definition.setText(endText + "More"); if there are multiple occurrences.
I tried the following. The code works, but it makes my app very slow. I guess the reason is that I get the String word through a TextWatcher.
String[] keywords = word.split("<br>");
for (int index = 0; index < keywords.length; index++) {
if (keywords[index].toLowerCase().contains(word.toLowerCase())) {
if (index != (keywords.length - 1)) {
int i = 0;
Pattern p = Pattern.compile(search.toLowerCase());
Matcher m = p.matcher( word.toLowerCase() );
while (m.find()) {
i++;
}
if (i > 1) {
endText = keywords[index];
definition.setText(endText + " More");
} else {
endText = keywords[index];
definition.setText(endText);
}
}
}
}
Is there any faster way?
It's a little hard for me to understand your question, but it sounds like this:
You have some string (e.g. "ABCDE<br>XYZABC"). You also have some target text (e.g. "ABC"). You want to split that string on a delimiter (e.g. "<br>"), and then:
If exactly one substring contains the target, display that substring.
If more than one substring contains the target, display the last substring that contains it plus the suffix "More".
In your posted code, the performance is really slow because of the Pattern.compile() call. Re-compiling the Pattern on every loop iteration is very costly. Luckily, there's no need for regular expressions here, so you can avoid that problem entirely.
String search = "ABC".toLowerCase();
String word = "ABCDE<br>XYZABC";
String[] keywords = word.split("<br>");
int count = 0;
for (String keyword : keywords) {
if (keyword.toLowerCase().contains(search)) {
++count;
endText = keyword;
}
}
if (count > 1) {
definition.setText(endText + " More");
}
else if (count == 1) {
definition.setText(endText);
}
Your approach is correct, but there is an unnecessary check: if (index != (keywords.length - 1)). It ignores a match in the last element of the keywords array. I'm not sure whether that is part of your requirement.
To improve performance, break out of the loop when you find the second match. You don't need to check any further.
public static void main(String[] args) {
String word = "ABCDE<br>XYZABC";
String pattern = "ABC";
String[] keywords = word.split("<br>");
String endText = "";
int count = 0;
for (int index = 0; index < keywords.length; index++) {
if (keywords[index].toLowerCase().contains(pattern.toLowerCase())) {
// Reaching this point means a match was found.
if(count == 1) {
// This is the second match, so break out of the loop; no need to check further.
// Keep the first matching string and append the " more" suffix to it.
endText += " more";
break;
}
endText = keywords[index];
count++;
}
}
System.out.println(endText);
}
This will print ABCDE more
You have to write your condition like this:
if (word.toLowerCase().contains(keywords[index].toLowerCase()))
You can use this:
String word = "ABCDE<br>XYZABC";
String[] keywords = word.split("<br>");
for (int i = 0; i < keywords.length - 1; i++) {
int c = 0;
Pattern p = Pattern.compile(keywords[i].toLowerCase());
Matcher m = p.matcher(word.toLowerCase());
while (m.find()) {
c++;
}
if (c > 1) {
definition.setText(keywords[i] + " More");
} else {
definition.setText(keywords[i]);
}
}
But as I mentioned in the comments, there is no double occurrence in the word "ABCDE<br>XYZABC" when you split it by <br>.
If you use the word "ABCDE<br>XYZABCDE" instead, there are two occurrences of "ABCDE".
void test() {
String word = "ABCDE<br>XYZABC";
String sequence = "ABC";
if(word.replaceFirst(sequence,"{---}").contains(sequence)){
int startIndex = word.indexOf(sequence);
int endIndex = word.indexOf("<br>");
Log.v("test",word.substring(startIndex,endIndex)+" More");
}
else{
//your code
}
}
Try this

add space between list elements at odd position

I have a string:
str = "Hello there"
I am removing the whitespace:
String[] parts = str.split("\\s+");
Creating a List and populating it with the parts:
List<String> theParts = new ArrayList<String>();
for (int i = 0; i < parts.length; i++) {
theParts.add(parts[i]);
}
The size of the List is 2. Now I want to increase its size so that it matches the size of another list.
Let's say the other list has size 3.
So, I check:
if (otherList.size() > theParts.size()) {
and then I want to change the theParts list so that it contains an empty space between its parts (as many as needed to make up the difference in size with otherList).
So, I want theParts to be (add a space at every odd position):
theParts[0] = "Hello"
theParts[1] = " "
theParts[2] = "there"
I am not sure if this can be done with Lists, but I can't think of another solution.
Or use something like join (doesn't work, just an idea to use something like this):
if (otherList.size() > theParts.size()) {
for (int i = 0; i < otherList.size(); i++) {
if (i%2 !=0) {
String.join(" ", theParts);
}
}
}
Just insert the spaces as you're populating the list:
List<String> theParts = new ArrayList<>(2 * parts.length - 1);
for (int i = 0; i < parts.length; i++) {
if (i > 0) theParts.add(" ");
theParts.add(parts[i]);
}
You could use a word break regex:
public void test() throws Exception {
String str = "Hello there";
List<String> strings = Arrays.asList(str.split("\\b"));
for ( String s : strings ) {
System.out.println("'"+s+"'");
}
}
this will retain all of the spaces for you.
'Hello'
' '
'there'
for (String dis : theParts) {
    newParts.add(dis); // newParts is another list
    String last = parts[parts.length - 1]; // the last element of the original array
    // Add a separator after every element except the last one
    // (this assumes the last word does not also appear earlier in the list)
    if (!last.equals(dis)) {
        newParts.add(" ");
    }
}

How to locate simple words amongst compound/simple words using Java?

I have a list of words that have both 'simple' and 'compound' words in them, and would like to implement an algorithm that prints out a list of words without the compound words that are made up of the simple words.
Sample input:
chat, ever, snapchat, snap, salesperson, per, person, sales, son, whatsoever, what, so
Desired output:
chat, ever, snap, per, sales, son, what, so
I have written the following, but am stuck as to how to take it on from here:
private static String[] find(String[] words) {
ArrayList<String> alist = new ArrayList<String>();
Set<String> r1 = new HashSet<String>();
for(String s: words){
alist.add(s);
}
Collections.sort(alist,new Comparator<String>() {
public int compare(String o1, String o2) {
return o1.length()-o2.length();
}
});
int count= 0;
for(int i=0;i<alist.size();i++){
String check = alist.get(i);
r1.add(check);
for(int j=i+1;j<alist.size();j++){
String temp = alist.get(j);
//System.out.println(check+" "+temp);
if(temp.contains(check) ){
alist.remove(temp);
}
}
}
System.out.println(r1.toString());
String res[] = new String[r1.size()];
for(String i:words){
if(r1.contains(i)){
res[count++] = i;
}
}
return res;
}
Any guidance/insight or suggestions to a better approach would be appreciated.
I went through your code, and it looks like "son" is missing from your output. I believe it fails because of this check:
if(temp.contains(check)) { // <-- wrong check
alist.remove(temp);
}
So instead of simply checking temp.contains(check), you should have a small loop that does the following:
1) Does temp start with check?
2) If 1) passed, let temp = temp.substring(check.length()), then go back to 1) and repeat until temp is empty.
Another implementation would be to set up a trie (https://en.wikipedia.org/wiki/Trie) and check using that:
1) Sort the word list by word length.
2) For each word, if it is not in the trie, add it to the trie. Otherwise, it is either a duplicate or a compound word.
3) Output the trie as a list of words using DFS.
Step 1 makes sure that when you check a compound word, its simple words are already in the trie.
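A rough sketch of the sort-by-length idea, under some assumptions: it swaps the trie for a plain HashSet of the simple words seen so far and uses a small dynamic-programming check to decide whether a word splits entirely into those words. The names SimpleWords, canSegment and simpleWords are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SimpleWords {
    // True if s can be written as a concatenation of one or more words from dict.
    static boolean canSegment(String s, Set<String> dict) {
        // reachable[k] means the first k characters of s split cleanly into dict words.
        boolean[] reachable = new boolean[s.length() + 1];
        reachable[0] = true;
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end && !reachable[end]; start++) {
                reachable[end] = reachable[start] && dict.contains(s.substring(start, end));
            }
        }
        return reachable[s.length()];
    }

    static List<String> simpleWords(String[] words) {
        // Process shortest words first so the parts of a compound are known before it.
        String[] byLength = words.clone();
        Arrays.sort(byLength, Comparator.comparingInt(String::length));

        Set<String> simple = new HashSet<>();
        for (String w : byLength) {
            if (!canSegment(w, simple)) {
                simple.add(w);
            }
        }

        // Report the simple words in their original input order.
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (simple.contains(w)) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] input = {"chat", "ever", "snapchat", "snap", "salesperson", "per",
                "person", "sales", "son", "whatsoever", "what", "so"};
        // prints [chat, ever, snap, per, sales, son, what, so]
        System.out.println(simpleWords(input));
    }
}
```

Unlike a contains() check, this keeps "son" even though it contains "so", because the leftover "n" is not a word in the list. A trie would mainly speed up the lookups of candidate prefixes; the overall structure stays the same.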
I didn't try to find the bug in your code, but rather wrote my own impl using a simple loop and a recursive helper method:
private static String[] find(String[] array) {
Set<String> words = new LinkedHashSet<>(Arrays.asList(array));
Set<String> otherWords = new HashSet<>(words);
for (Iterator<String> i = words.iterator(); i.hasNext(); ) {
String next = i.next();
otherWords.remove(next);
if (isCompound(next, otherWords)) {
i.remove();
} else {
otherWords.add(next);
}
}
return words.stream().toArray(String[]::new);
}
private static boolean isCompound(String string, Set<String> otherWords) {
if (otherWords.contains(string)) {
return true;
}
for (String word : otherWords) {
if (string.startsWith(word)) {
return isCompound(string.replaceAll("^" + word, ""), otherWords);
}
if (string.endsWith(word)) {
return isCompound(string.replaceAll(word + "$", ""), otherWords);
}
}
return false;
}
See live demo.
This produces your desired output, which requires preserving word order.
Explanation
A compound word consists solely of other words in the list. Importantly, this implies that compound words both start and end with other words. Rather than searching for other words at every position in a word, we can use this fact to check only the start and end, which greatly simplifies the code.
Thus: for each word in the list, if it starts or ends with another word, remove that part and repeat the process until there's nothing left, at which point you know the word is compound.
A set of "other words", which is the full set with the current word removed, is passed to the helper method to further simplify the code.
Here is my straightforward n^2 solution:
static String[] simpleWords(String[] words) {
String[] result;
HashSet<Integer> map = new HashSet<>();
for(int i = 0; i < words.length; i++) {
String word = words[i];
for(int j = 0; j < words.length; j++) {
if(j != i) {
word = word.replaceAll(words[j], "");
}
}
if(!word.equals("")) {
map.add(i);
}
}
result = new String[map.size()];
int i = 0;
for(int index: map) {
result[i] = words[index];
i++;
}
return result;
}

Extract words from an array of Strings in java based on conditions

I am trying to do an assignment that works with Arrays and Strings. The code is almost complete, but I've run into a hitch. Every time the code runs, it replaces the value in the index of the output array instead of putting the new value in a different index. For example, if I was trying to search for the words containing a prefix "b" in the array of strings, the intended output is "bat" and "brewers" but instead, the output comes out as "brewers" and "brewers". Any suggestions? (ps. The static main method is there for testing purposes.)
--
public static void main(String[] args) {
String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf", "dastardly", "enigmatic", "frenetic",
"sycophant", "rattle", "zinc", "alloy", "tunnel", "nitrate", "sample", "yellow", "mauve", "abbey",
"thinker", "junk"};
String prefix = "b";
String[] output = new String[wordsStartingWith(words, prefix).length];
output = wordsStartingWith(words, prefix);
for (int i = 0; i < output.length; i++) {
System.out.println("Words: " + i + " " + output[i]);
}
}
public static String[] wordsStartingWith(String[] words, String prefix) {
// method that finds and returns all strings that start with the prefix
String[] returnWords;
int countWords = 0;
for (int i = 0; i < words.length; i++) {
// loop to count the number of words that actually have the prefix
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
countWords++;
}
}
// assign length of array based on number of words containing prefix
returnWords = new String[countWords];
for (int i = 0; i < words.length; i++) {
// loop to put strings containing prefix into new array
for (int j = 0; j < returnWords.length; j++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
}
}
}
return returnWords;
}
--
Thank You
Soul
Don't reinvent the wheel. Your code can be replaced by this single, easy to read, bug free, line:
String[] output = Arrays.stream(words)
.filter(w -> w.startsWith(prefix))
.toArray(String[]::new);
Or if you just want to print the matching words:
Arrays.stream(words)
.filter(w -> w.startsWith(prefix))
.forEach(System.out::println);
It's because of the code you have written. If you trace through it carefully, you will spot the mistake.
The culprit code:
for (int j = 0; j < returnWords.length; j++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
}
}
When you get a matching word, you set every element of your output array to that word. This means the last word that satisfies the condition replaces all the previous words in the array.
All elements of the array returnWords first get set to "bat", and then each element gets replaced by "brewers".
The corrected code looks like this:
int j = 0;
for (int i = 0; i < words.length; i++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
j++;
}
}
Also, you are doing multiple iterations that are not needed.
For example, this statement
String[] output = new String[wordsStartingWith(words, prefix).length];
output = wordsStartingWith(words, prefix);
can be simplified to
String[] output = wordsStartingWith(words, prefix);
The way you're doing this is looping through the same array multiple times.
You only need to check the values once:
public static void main(String[] args) {
String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf", "dastardly", "enigmatic", "frenetic",
"sycophant", "rattle", "zinc", "alloy", "tunnel", "nitrate", "sample", "yellow", "mauve", "abbey",
"thinker", "junk"};
String prefix = "b";
for (int i = 0; i < words.length; i++) {
if (words[i].toLowerCase().startsWith(prefix.toLowerCase())) {
System.out.println("Words: " + i + " " + words[i]);
}
}
}
Instead of scanning words in two separate loops, scan it just once:
String[] returnWords;
String[] foundWords = new String[words.length];
int index = 0;
int countWords = 0;
for (int i = 0; i < words.length; i++) {
    // loop to collect the words that actually have the prefix
    if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
        foundWords[index] = words[i];
        index++;
        countWords++;
    }
}
// assign length of array based on number of words containing prefix
returnWords = new String[countWords];
for (int i = 0; i < countWords; i++) {
    returnWords[i] = foundWords[i];
}
My method uses another array (foundWords) for all the words found during the first loop. It has the same size as words in case every single word starts with the prefix. index keeps track of where to place the next found word in foundWords. Lastly, you just go through the first countWords entries and assign each one to returnWords.
Not only will this fix your code, it will also make it run faster (very slightly; the bigger the word bank, the bigger the saving).
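The single-pass idea can also be sketched with a growable list in place of the oversized scratch array. This is only an illustration, with a made-up class name PrefixFilter. It also uses String.regionMatches instead of substring, so a word shorter than the prefix is skipped rather than causing a StringIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixFilter {
    static String[] wordsStartingWith(String[] words, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            // regionMatches(true, ...) compares case-insensitively and returns false
            // (rather than throwing) when the word is shorter than the prefix.
            if (word.regionMatches(true, 0, prefix, 0, prefix.length())) {
                matches.add(word);
            }
        }
        return matches.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"aardvark", "bat", "brewers", "cadmium", "abbey"};
        // prints [bat, brewers]
        System.out.println(Arrays.toString(wordsStartingWith(words, "b")));
    }
}
```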

Java String Array Mergesort

Hi all, I wrote a mergesort program for a string array that reads in .txt files from the user. What I want to do now is compare both files and print out the words that are in file one but not in file two; for example, apple is in file 1 but not file 2. I tried storing the result in a string array again and then printing that out at the end, but I just can't seem to implement it.
Here is what I have,
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
String x = null ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(!words[i].equals(list[j]))
{
x = words[i];
}
}
}
System.out.println(x);
Any help or suggestions would be appreciated!
If you want to find the words that are in the first array but do not exist in the second, you can do it like this:
boolean notEqual = true;
for (int i = 0; i < words.length; i++)
{
    for (int j = 0; j < list.length && notEqual; j++)
    {
        // If the word from file one exists in file two, set notEqual
        // to false, which terminates the inner loop
        if (words[i].equals(list[j]))
        {
            notEqual = false;
        }
    }
    // If notEqual remained true, the word from file one
    // does not exist in the second file, so print it
    if (notEqual)
        System.out.println(words[i]);
    notEqual = true; // reset for the next word of file one
}
Basically, you take a word from the first file (a string from the array) and check whether file two contains an equal word. If you find one, you set the control variable notEqual to false, which exits the inner for loop, and the word is not printed. Otherwise, if no word in file two matches the word from file one, notEqual remains true, so the word is printed after the inner loop.
You can replace the print statement with one that stores the unique word in an extra array, if you wish.
Another solution, although slower than the first one:
List <String> file1Words = Arrays.asList(words);
List <String> file2Words = Arrays.asList(list);
for(String s : file1Words)
if(!file2Words.contains(s))
System.out.println(s);
You convert your arrays to Lists using Arrays.asList, and use the contains method to check whether each word of the first file is in the second file.
Why not just convert the arrays to Sets? Then you can simply do
wordsSet.removeAll(listSet);
Afterwards, wordsSet will contain all the words that do not exist in list2.txt. (removeAll returns a boolean, not the resulting set, so there is nothing useful to assign.)
Also keep in mind that a Set removes duplicates ;)
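A small sketch of the Set approach, assuming words and list are the two String arrays from the question. The class and method names are made up. LinkedHashSet is used here only so the surviving words keep their original order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class SetDifference {
    // Returns the distinct words that occur in words but not in list.
    static Set<String> onlyInFirst(String[] words, String[] list) {
        Set<String> result = new LinkedHashSet<>(Arrays.asList(words));
        // removeAll mutates result in place; its boolean return value is ignored here.
        result.removeAll(new HashSet<>(Arrays.asList(list)));
        return result;
    }

    public static void main(String[] args) {
        String[] file1 = {"word1", "word2", "word3", "word4"};
        String[] file2 = {"word2", "word3"};
        System.out.println(onlyInFirst(file1, file2)); // prints [word1, word4]
    }
}
```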
You can also just go through the loop and add the word when you reach list.length-1 without a match.
If it matches, you can break out of the inner loop.
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
//never ever null
String x = "" ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
break;
if(j == list.length-1)
x += words[i] + " ";
}
}
System.out.println(x);
Here is a version (though it does not use sorting)
String[] file1 = {"word1", "word2", "word3", "word4"};
String[] file2 = {"word2", "word3"};
List<String> l1 = new ArrayList<>(Arrays.asList(file1));
List<String> l2 = Arrays.asList(file2);
l1.removeAll(l2);
System.out.println("Not in file2 " + l1);
it prints
Not in file2 [word1, word4]
This looks pretty close. For every string in words, you compare it to every word in list, so x gets set whenever even one string in list differs from the current word.
What I'd suggest is changing if(!words[i].equals(list[j])) to if(words[i].equals(list[j])). When that condition is true, you know that the string in words appears in list, so you don't need to display it. If you cycle completely through list without seeing the word, then you know you need to print it. So something like this:
for(int i = 0; i<words.length; i++)
{
boolean wordFoundInList = false;
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
{
wordFoundInList = true;
break;
}
}
if (!wordFoundInList) {
System.out.println(words[i]);
}
}
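Since both arrays in the question are already sorted, the nested loop could in principle be replaced by a single merge-style pass. The sketch below assumes that mergeSort orders the strings the same way String.compareTo does (the poster's mergeSort is not shown); SortedDifference and onlyInFirst are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class SortedDifference {
    // Both arrays must already be sorted in natural (compareTo) order.
    static List<String> onlyInFirst(String[] words, String[] list) {
        List<String> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < words.length) {
            if (j >= list.length) {
                result.add(words[i++]); // list is exhausted, the rest is unmatched
                continue;
            }
            int cmp = words[i].compareTo(list[j]);
            if (cmp < 0) {
                result.add(words[i++]); // words[i] cannot appear later in list
            } else if (cmp > 0) {
                j++;                    // list[j] is too small, move on
            } else {
                i++;                    // match: skip it, keep j for repeated words
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "banana", "cherry"};
        String[] list = {"banana", "date"};
        System.out.println(onlyInFirst(words, list)); // prints [apple, cherry]
    }
}
```

Each array is walked at most once, so after sorting the comparison takes time proportional to the combined length of the two arrays.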
