Java String Array Mergesort

Hi all. I wrote a merge sort program for a String array that reads in .txt files supplied by the user. What I want to do now is compare the two files and print the words that are in file one but not in file two; for example, apple is in file 1 but not in file 2. I tried storing the result in another String array and printing it at the end, but I just can't get it to work.
Here is what I have:
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
String x = null ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(!words[i].equals(list[j]))
{
x = words[i];
}
}
}
System.out.println(x);
Any help or suggestions would be appreciated!

If you want to find the words that are in the first array but do not exist in the second, you can do it like this:
boolean notEqual = true;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length && notEqual; j++)
{
if(words[i].equals(list[j])) // If the word from file one exists in
{ // file two, set notEqual to false,
notEqual = false; // which terminates the inner loop
}
}
if(notEqual) // If notEqual is still true,
System.out.println(words[i]); // print the element of file one,
// which does not exist in file two
notEqual = true; // reset the flag before checking
} // the next word of file one
Basically, you take a word from the first file (a string from the array) and check whether file two contains an equal word. If it does, you set the control variable notEqual to false, which ends the inner for loop, and the word is not printed. Otherwise, if no word in file two matches the word from file one, notEqual is still true after the inner loop, so the word is printed.
You can replace the print statement with one that stores the unique word in an extra array, if you wish.
Another solution, although slower than the first one:
List <String> file1Words = Arrays.asList(words);
List <String> file2Words = Arrays.asList(list);
for(String s : file1Words)
if(!file2Words.contains(s))
System.out.println(s);
You convert your arrays to Lists using Arrays.asList and use the contains method to check whether each word from the first file is in the second file.

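Why not just convert the arrays to Sets? Then you can simply do
wordsSet.removeAll(listSet);
Afterwards wordsSet contains all the words that do not exist in list2.txt. Note that removeAll returns a boolean (whether the set changed), not the resulting set, so its return value should not be assigned to a result variable.
Also keep in mind that a set will remove duplicates ;)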
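To make the Set idea concrete, here is a minimal sketch. The class name SetDifference and the helper notIn are made up for illustration, and the sample arrays stand in for whatever reader.load returns. It uses a LinkedHashSet so the surviving words keep their first-seen order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetDifference {
    // Hypothetical helper: words of 'first' that do not occur in 'second'.
    static Set<String> notIn(String[] first, String[] second) {
        // LinkedHashSet drops duplicates but keeps insertion order.
        Set<String> result = new LinkedHashSet<>(Arrays.asList(first));
        // removeAll modifies 'result' in place and returns only a boolean.
        result.removeAll(new HashSet<>(Arrays.asList(second)));
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "banana", "cherry", "apple"};
        String[] list = {"banana", "date"};
        System.out.println(notIn(words, list)); // prints [apple, cherry]
    }
}
```

With hash sets the comparison is roughly linear in the total number of words, instead of comparing every word of one file with every word of the other.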

You can also just go through the inner loop and add the word once you have reached list.length-1 without finding a match.
If the word matches, you can break out of the inner loop.
FileIO reader = new FileIO();
String words[] = reader.load("C:\\list1.txt");
String list[] = reader.load("C:\\list2.txt");
mergeSort(words);
mergeSort(list);
// initialize with an empty string, never null
String x = "" ;
for(int i = 0; i<words.length; i++)
{
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
break;
if(j == list.length-1)
x += words[i] + " ";
}
}
System.out.println(x);

Here is a version (though it does not use sorting)
String[] file1 = {"word1", "word2", "word3", "word4"};
String[] file2 = {"word2", "word3"};
List<String> l1 = new ArrayList<>(Arrays.asList(file1));
List<String> l2 = Arrays.asList(file2);
l1.removeAll(l2);
System.out.println("Not in file2 " + l1);
it prints
Not in file2 [word1, word4]

This looks pretty close. Right now you compare every string in words to every string in list, so x is set whenever words[i] differs from any single entry of list, which is almost always. You also keep only the last such word and print just once at the end.
What I'd suggest is changing if(!words[i].equals(list[j])) to if(words[i].equals(list[j])). When that condition is true, you know the string from words appears in list, so you don't need to display it. If you cycle through all of list without seeing the word, then you know you need to display it. So something like this:
for(int i = 0; i<words.length; i++)
{
boolean wordFoundInList = false;
for(int j = 0; j<list.length; j++)
{
if(words[i].equals(list[j]))
{
wordFoundInList = true;
break;
}
}
if (!wordFoundInList) {
System.out.println(words[i]);
}
}
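Since both arrays have already been sorted, a single merge-style pass can replace the nested loops. The sketch below assumes mergeSort orders the strings the same way String.compareTo does (that is a guess about the asker's implementation); the class name SortedDifference and the method notIn are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedDifference {
    // Assumes both arrays are sorted in natural (compareTo) order.
    static List<String> notIn(String[] sortedWords, String[] sortedList) {
        List<String> result = new ArrayList<>();
        int i = 0; // position in sortedWords
        int j = 0; // position in sortedList
        while (i < sortedWords.length) {
            if (j >= sortedList.length) {
                // Second array exhausted: every remaining word is missing from it.
                result.add(sortedWords[i++]);
                continue;
            }
            int cmp = sortedWords[i].compareTo(sortedList[j]);
            if (cmp == 0) {
                i++; // found in both; keep j so repeated words also match
            } else if (cmp < 0) {
                result.add(sortedWords[i++]); // cannot appear later in sortedList
            } else {
                j++; // sortedList[j] sorts before the word; move on
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "banana", "cherry"};
        String[] list = {"banana", "date"};
        System.out.println(notIn(words, list)); // prints [apple, cherry]
    }
}
```

Each index only moves forward, so the pass looks at every element at most once.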

Related

Java scanner reading null from my text file?

I'm writing some code to read an input file of book titles, and putting the read lines into an array and trying to print out the array. But when I try to print out the array, it just returns 'null' for each read line. I'm not sure what I'm doing wrong or what my code is doing. Any suggestions? Thanks!
Code:
import java.io.*;
import java.util.*;
public class LibraryInputandOutputs {
public static void main(String args[]) throws IOException{
int lineCount = 0;
File inputFile = new File("bookTitles.inp.txt");
Scanner reader = new Scanner(inputFile);
while(reader.hasNextLine()) {
reader.nextLine();
lineCount++;
}
String[] bookArray = new String[lineCount];
while (reader.hasNextLine()) {
for (int i = 0; i < lineCount; i++) {
bookArray[i] = reader.next();
}
}
for (int k = 0; k < lineCount; k++) {
System.out.println(bookArray[k]);
}
reader.close();
inputFile.close();
}
}
My text file I'm reading from is 20 book titles, all on different lines.
My output on the terminal is 20 lines of null.

Let's break this down:
This reads every line of the input file, counts each one, and then discards them:
while(reader.hasNextLine()) {
reader.nextLine();
lineCount++;
}
You are now at the end of file.
Allocate a string array that is large enough.
String[] bookArray = new String[lineCount];
Attempt to read more lines. The loop will terminate immediately because reader.hasNextLine() will return false. You are already at the end of file.
So the statement assigning to bookArray[i] won't be executed.
while (reader.hasNextLine()) {
for (int i = 0; i < lineCount; i++) {
bookArray[i] = reader.next();
}
}
Since bookArray[i] = ... was never executed above, all of the array elements will still be null.
for (int k = 0; k < lineCount; k++) {
System.out.println(bookArray[k]);
}
One solution is to open and read the file twice.
Another solution is to "reset" the file back to the beginning. (A bit complicated.)
Another solution would be to use a List rather than an array so that you don't need to read the file twice.
Another solution is to search the javadocs for a method that will read all lines of a file / stream as an array of strings.
(Some of these may be precluded by the requirements of your exercise. You work it out ... )
The nested loop in the third snippet is also wrong. You don't need a for loop inside a while loop. You need a single loop that iterates over the lines and also increments the array index (i). They don't both need to be done by the loop statement itself. You could do one or the other (or both) in the loop body. Also note that reader.next() returns the next whitespace-delimited token rather than a whole line, so use nextLine() if you want complete titles.
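For the "method that reads all lines" option, one candidate is java.nio.file.Files.readAllLines. The sketch below is only one way to use it; the class name ReadTitles and the helper readTitles are invented here, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadTitles {
    // Reads the whole file in one call and returns one array element per line.
    static String[] readTitles(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ReadTitles <file>");
            return;
        }
        for (String title : readTitles(Paths.get(args[0]))) {
            System.out.println(title);
        }
    }
}
```

Because the list is sized by the library, there is no need to count lines first or to read the file a second time.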

Stephen C has already pointed out the main problems with your logic. You're trying to loop twice through the file but you've already reached the end of the file the first time. Don't loop twice. "Merge" both the while loops into one, remove that for loop inside the while loop and collect all the book titles. You can then use the size of the list to print them later on. My Java might be rusty but here it goes -
import java.io.*;
import java.util.*;
public class LibraryInputandOutputs {
public static void main(String args[]) throws IOException {
// int lineCount = 0; - You don't need this.
File inputFile = new File("bookTitles.inp.txt");
Scanner reader = new Scanner(inputFile);
// Use an array list to collect book titles.
List<String> bookArray = new ArrayList<>();
// Loop through the file and add titles to the array list.
while(reader.hasNextLine()) {
bookArray.add(reader.nextLine());
// lineCount++; - not needed
}
// Not needed -
// while (reader.hasNextLine()) {
// for (int i = 0; i < lineCount; i++) {
// bookArray[i] = reader.next();
// }
// }
// Use the size method of the array list class to get the length of the list
// and use it for looping.
for (int k = 0; k < bookArray.size(); k++) {
System.out.println(bookArray.get(k));
}
reader.close();
// File has no close() method; closing the Scanner is enough.
}
}

I agree with Stephen C. In particular, using a List is usually better than an array because it's more flexible. If you need an array, you can always use toArray() after the List is filled.
Are your book titles on separate lines? If so, you might not need a Scanner, and could use something like a BufferedReader or LineNumberReader.
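A rough sketch of the BufferedReader route, combined with the List-then-toArray idea, could look like this. The class name ReadLines and the method readLines are placeholders; the method takes any Reader so it can be tried without a file, and in the real program you would pass a FileReader for bookTitles.inp.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {
    // Collects every line from the reader; readLine() returns null at end of input.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> titles = readLines(new StringReader("Dune\nEmma\nUlysses"));
        String[] bookArray = titles.toArray(new String[0]); // only if an array is required
        System.out.println(bookArray.length); // prints 3
    }
}
```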

Extract words from an array of Strings in java based on conditions

I am trying to do an assignment that works with Arrays and Strings. The code is almost complete, but I've run into a hitch. Every time the code runs, it replaces the value in the index of the output array instead of putting the new value in a different index. For example, if I was trying to search for the words containing a prefix "b" in the array of strings, the intended output is "bat" and "brewers" but instead, the output comes out as "brewers" and "brewers". Any suggestions? (ps. The static main method is there for testing purposes.)
--
public static void main(String[] args) {
String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf", "dastardly", "enigmatic", "frenetic",
"sycophant", "rattle", "zinc", "alloy", "tunnel", "nitrate", "sample", "yellow", "mauve", "abbey",
"thinker", "junk"};
String prefix = "b";
String[] output = new String[wordsStartingWith(words, prefix).length];
output = wordsStartingWith(words, prefix);
for (int i = 0; i < output.length; i++) {
System.out.println("Words: " + i + " " + output[i]);
}
}
public static String[] wordsStartingWith(String[] words, String prefix) {
// method that finds and returns all strings that start with the prefix
String[] returnWords;
int countWords = 0;
for (int i = 0; i < words.length; i++) {
// loop to count the number of words that actually have the prefix
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
countWords++;
}
}
// assign length of array based on number of words containing prefix
returnWords = new String[countWords];
for (int i = 0; i < words.length; i++) {
// loop to put strings containing prefix into new array
for (int j = 0; j < returnWords.length; j++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
}
}
}
return returnWords;
}
--
Thank You
Soul

Don't reinvent the wheel. Your code can be replaced by this single, easy-to-read line (note that startsWith is case-sensitive, unlike the equalsIgnoreCase check in your code):
String[] output = Arrays.stream(words)
.filter(w -> w.startsWith(prefix))
.toArray(String[]::new);
Or if you just want to print the matching words:
Arrays.stream(words)
.filter(w -> w.startsWith(prefix))
.forEach(System.out::println);

It's because of the code you have written. If you trace through it carefully, you will spot the mistake.
The culprit code:
for (int j = 0; j < returnWords.length; j++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
}
}
When you get a matching word, you set the whole output array to that word. This means the last word that satisfies the condition replaces all the previous words in the array.
All elements of returnWords are first set to "bat", and then each element is replaced by "brewers".
The corrected code looks like this:
int j = 0;
for (int i = 0; i < words.length; i++) {
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
returnWords[j] = words[i];
j++;
}
}
Also, you are doing more iterations than needed.
For example, these statements
String[] output = new String[wordsStartingWith(words, prefix).length];
output = wordsStartingWith(words, prefix);
can be simplified to a single statement
String[] output = wordsStartingWith(words, prefix);
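One thing neither the original loop nor the fix above guards against: substring(0, prefix.length()) throws a StringIndexOutOfBoundsException when a word is shorter than the prefix. A possible alternative, sketched here with an invented class name PrefixFilter, is String.regionMatches with ignoreCase set to true, which simply returns false for words that are too short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixFilter {
    // Case-insensitive prefix test that is safe for words shorter than the prefix.
    static String[] wordsStartingWith(String[] words, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            // Compares the first prefix.length() characters, ignoring case.
            if (word.regionMatches(true, 0, prefix, 0, prefix.length())) {
                matches.add(word);
            }
        }
        return matches.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"aardvark", "bat", "Brewers", "b"};
        System.out.println(Arrays.toString(wordsStartingWith(words, "br"))); // prints [Brewers]
    }
}
```

Collecting into a List also removes the need for the separate counting pass.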

The way you're doing this is looping through the same array multiple times.
You only need to check the values once:
public static void main(String[] args) {
String[] words = {"aardvark", "bat", "brewers", "cadmium", "wolf", "dastardly", "enigmatic", "frenetic",
"sycophant", "rattle", "zinc", "alloy", "tunnel", "nitrate", "sample", "yellow", "mauve", "abbey",
"thinker", "junk"};
String prefix = "b";
for (int i = 0; i < words.length; i++) {
if (words[i].toLowerCase().startsWith(prefix.toLowerCase())) {
System.out.println("Words: " + i + " " + words[i]);
}
}
}

Instead of checking the prefix in two separate loops, check it in just one:
String[] returnWords;
String[] foundWords = new String[words.length];
int countWords = 0;
for (int i = 0; i < words.length; i++) {
// collect the words that have the prefix, counting them as we go
if (words[i].substring(0, prefix.length()).equalsIgnoreCase(prefix)) {
foundWords[countWords] = words[i];
countWords++;
}
}
// assign length of array based on number of words containing prefix
returnWords = new String[countWords];
for (int i = 0; i < countWords; i++) {
returnWords[i] = foundWords[i];
}
My method uses another array (foundWords) for all the words found during the first loop; it has the same size as words in case every single word starts with the prefix. countWords keeps track of where to place the next found word in foundWords. Lastly, you just loop countWords times and copy each element into returnWords.
Not only will this fix your code, it will also run slightly faster, and the larger the word list, the bigger the saving.

Compare two arrayList and get longest matching String

So what I'm trying to do is take two text files and return the longest string that appears in both. I put both text files into ArrayLists, separated into individual words. This is my code so far, but I'm wondering how I would return the longest string and not just the first one found.
for(int i = 0; i < file1Words.size(); i++)
{
for(int j = 0; j < file2Words.size(); j++)
{
if(file1Words.get(i).equals(file2Words.get(j)))
{
matchingString += file1Words.get(i) + " ";
}
}
}

String longest = "";
for (String s1: file1Words)
for (String s2: file2Words)
if (s1.length() > longest.length() && s1.equals(s2)) longest = s1;

If you are looking for better performance in time and space than the replies above, you can use the code below.
System.out.println("Start time :" + System.currentTimeMillis());
String longestMatch = "";
for (int i = 0; i < file1Words.size(); i++) {
    String w = file1Words.get(i);
    // only scan file2Words for words that could beat the current best
    if (w.length() > longestMatch.length()) {
        for (int j = 0; j < file2Words.size(); j++) {
            if (w.equals(file2Words.get(j))) {
                longestMatch = w;
                break;
            }
        }
    }
}
System.out.println("End time :" + System.currentTimeMillis());

I'm not going to give you the code, but I'll help you with the main ideas...
You will need a new String variable "curLargestString" to keep track of the longest matching string found so far. Declare it outside of your for loops. Then, every time you find two matching words, compare the length of the matching word to the length of the word in "curLargestString". If the new matching word is longer, set "curLargestString" to the new word. After your for loops have run, return curLargestString.
One more note: be sure to initialize curLargestString with an empty string. This will prevent an error when you call length() on it after you get your first matching word.

Assuming your files are small enough to fit in memory, sort them both with a custom comparator that puts longer strings before shorter ones and otherwise sorts lexicographically.
Then go through both lists in order, advancing only one index at a time (the one pointing to the entry that sorts first), and return the first match.
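The sort-then-walk idea can be sketched roughly as follows. The names LongestCommonWord, LONGEST_FIRST and longestCommon are made up for the example, and returning an empty string when nothing matches is a choice made here, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LongestCommonWord {
    // Longer strings sort first; equal lengths fall back to lexicographic order.
    static final Comparator<String> LONGEST_FIRST = (a, b) -> {
        if (a.length() != b.length()) {
            return Integer.compare(b.length(), a.length());
        }
        return a.compareTo(b);
    };

    static String longestCommon(List<String> file1Words, List<String> file2Words) {
        // Sort copies so the callers' lists are left untouched.
        List<String> a = new ArrayList<>(file1Words);
        List<String> b = new ArrayList<>(file2Words);
        a.sort(LONGEST_FIRST);
        b.sort(LONGEST_FIRST);
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = LONGEST_FIRST.compare(a.get(i), b.get(j));
            if (cmp == 0) {
                return a.get(i); // first match in this order is a longest match
            }
            if (cmp < 0) {
                i++; // a's entry sorts first, so it cannot appear later in b
            } else {
                j++;
            }
        }
        return ""; // no common word
    }

    public static void main(String[] args) {
        List<String> f1 = Arrays.asList("elephant", "cat");
        List<String> f2 = Arrays.asList("dog", "cat", "giraffe");
        System.out.println(longestCommon(f1, f2)); // prints cat
    }
}
```

The sorting dominates the cost; the walk itself touches each word at most once.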

You can use the following code:
String matchingString = "";
Set<String> intersection = new HashSet<>(file1Words);
intersection.retainAll(file2Words);
for (String word : intersection)
    if (word.length() > matchingString.length())
        matchingString = word;

private String getLongestString(List<String> list1, List<String> list2) {
    String longestString = ""; // start empty so length() is safe before the first match
    for (String list1String : list1) {
        if (list1String.length() > longestString.length()) {
            for (String list2String : list2) {
                if (list1String.equals(list2String)) {
                    longestString = list1String;
                    break;
                }
            }
        }
    }
    return longestString;
}

removing duplicated words from an array

I am trying to remove duplicated words from an array, and I keep getting null values. I'm not allowed to use java sorting methods so I have to develop my own. Here's my code:
public class Duplicate{
public static void main(String[] args){
String[] test = {"a", "b", "abvc", "abccc", "a", "bbc", "ccc", "abc", "bbc"};
removeDuplicate(test);
}
public static String[] removeDuplicate(String[] words){
boolean [] isDuplicate = new boolean[words.length];
int i,j;
String[] tmp = new String[words.length];
for (i = 0; i < words.length ; i++){
if (isDuplicate[i])
continue;
for(j = 0; j < words.length ; j++){
if (words[i].equals(words[j])) {
isDuplicate[j] = true;
tmp[i] = words[i];
}
}
}
for(i=0;i<words.length;i++)
System.out.println(tmp[i]);
return tmp;
}
}
I tried doing
if(words == null)
words == "";
But it doesn't work. I also want to return the tmp array with a new size.
For example, test array length = 9; after removing the duplicates, I should get a new array with a length of 7. Thank you for your help.
EDIT:
The result I get:
a
b
abvc
abccc
null
bbc
ccc
abc
null

You're getting nulls because the result array contains fewer words than the input array, yet you construct both arrays with the same length.
You don't have to sort to solve this problem. However, if you're not allowed to use the tools provided by java.util, then this is either a poorly contrived test question or whoever told you not to use the Java utility classes is poorly informed.
You can solve it without sorting like this (assuming Java 1.5+):
public class Duplicate {
public static void main(String[] args) {
String[] test = {"a", "b", "abvc", "abccc", "a", "bbc", "ccc", "abc", "bbc"};
String[] deduped = removeDuplicate(test);
print(deduped);
}
public static String[] removeDuplicate(String[] words) {
Set<String> wordSet = new LinkedHashSet<String>();
for (String word : words) {
wordSet.add(word);
}
return wordSet.toArray(new String[wordSet.size()]);
}
public static void print(String[] words) {
for (String word : words) {
System.out.println(word);
}
}
}
The output will be:
a
b
abvc
abccc
bbc
ccc
abc

I would go for a HashSet to remove duplicates: equal strings produce the same hash code, so a duplicate is detected and not added a second time. Then you can convert the set back to a String array.

I would recommend doing this with a different approach. If you can use an ArrayList, why not just create one of those, and add the non-duplicate values to it, like this:
ArrayList<String> uniqueArrayList = new ArrayList<String>();
for(int i = 0; i < words.length; i++){
if(!uniqueArrayList.contains(words[i])){ // If the value isn't in the list already
uniqueArrayList.add(words[i]);
}
}
Now, you have an array list of all of your values without the duplicates. If you need to, you can work on converting that back to a regular array.
EDIT
I really think you should use the above option if you can, as there is no clean or decently efficient way to do this only using arrays. However, if you must, you can do something like this:
You can use the code you have to mark duplicates (starting the inner loop at j = i + 1, so that a word does not mark itself as a duplicate), and also create a counter to see how many unique values you have, like this:
int uniqueCounter = 0;
for(int i = 0; i < isDuplicate.length; i++){
if(!isDuplicate[i]){
uniqueCounter++;
}
}
Then, you can create a new array of the size of unique items, and loop through the words and add non-duplicate values.
String[] uniqueArray = new String[uniqueCounter];
int uniqueIndex = 0;
int wordsIndex = 0;
while(uniqueIndex < uniqueArray.length){
// Check if words index is not a duplicate
if(!isDuplicate[wordsIndex]){
// Add to array
uniqueArray[uniqueIndex] = words[wordsIndex];
uniqueIndex++; // Need to move to next spot in unique.
}
// Need to move to next spot in words
wordsIndex++;
}
Again, I HIGHLY recommend against something like this. It is very poor, and pains me to write, but for the sake of example on how it could be done using an array, you can try it.

I don't have the time to write functioning code, but I would recommend first sorting the array using Arrays.sort(stringArray) and then looping through the array, comparing each string to the previous one. Strings that match the previous one are duplicates.
Note: This method is probably not the fastest one, so it should only be used on small arrays or in tasks where performance does not matter.
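The sort-and-compare-neighbours idea might look roughly like this. It uses Arrays.sort, which the question says is off limits, so treat it as an illustration of the technique only; a hand-written sort could be swapped in. The class name SortedDedup is invented, and note that the result comes back in sorted order, not in the original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedDedup {
    static String[] removeDuplicates(String[] words) {
        String[] sorted = words.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);             // equal strings end up next to each other
        List<String> unique = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            // Keep a string only if it differs from the one just before it.
            if (i == 0 || !sorted[i].equals(sorted[i - 1])) {
                unique.add(sorted[i]);
            }
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] test = {"a", "b", "abvc", "abccc", "a", "bbc", "ccc", "abc", "bbc"};
        System.out.println(Arrays.toString(removeDuplicates(test)));
        // prints [a, abc, abccc, abvc, b, bbc, ccc]
    }
}
```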

What about this approach?
public static String[] removeDuplicate(String[] words){
// remember which word is a duplicate
boolean[] isDuplicate = new boolean[words.length];
// and count them
int countDuplicate = 0;
for (int i = 0; i < words.length ; i++){
// skip words already marked: their later copies were marked (and counted) earlier
if (isDuplicate[i])
continue;
// only check "forward", because duplicates of earlier words have already been marked
for(int j = i + 1; j < words.length ; j++){
if (words[i].equals(words[j])) {
isDuplicate[j] = true;
countDuplicate++;
}
}
}
// collect non-duplicate strings
String[] tmp = new String[words.length - countDuplicate];
int j = 0;
for (int i = 0; i < isDuplicate.length; i++) {
if (isDuplicate[i] == false) {
tmp[j] = words[i];
j++;
}
}
// and return them
return tmp;
}

how to split an arraylist of String whenever a space is encountered in java?

I have an input like this in an ArrayList<String>:
cat eats mouse
mouse eats cheese
cheese is tasty
(blank lines should be ignored since I will be reading this input from a file)
and I want to convert it into a 2-d array of String which will have dimensions [no. of elements in ArrayList][3].
The no. 3 is fixed i.e. each sentence will have 3 words.
like this:
"cat" "eats" "mouse"
"mouse" "eats" "cheese"
"cheese" "is" "tasty"
here's what I have tried:
public static int processData(ArrayList<String> array)
{
String str[]=new String[array.size()];
array.toArray(str);
String str1[][]=new String[str.length][5];
for(int i=0;i<str.length;i++)
{
str1[i][]=str.split("\\s+"); //i want to do something like this, but this is showing errors.
}
return 0; //this is temporary, I will be modifying it
}
Tell me if I am not clear.

You are close. In Java, you can't append elements to an array by using empty brackets []. The following code does the job. Note that the number of elements in each row is limited to 3, so any words after the first 3 on a line are ignored. If a line is shorter, there will be nulls at the end of its row.
public static int processData(ArrayList<String> array) {
    String[] str = new String[array.size()];
    array.toArray(str);
    String[][] str1 = new String[str.length][3];
    for (int i = 0; i < str.length; i++) {
        String[] parts = str[i].split("\\s+");
        // copy at most 3 words, and never read past the end of parts
        for (int j = 0; j < parts.length && j < 3; j++) {
            str1[i][j] = parts[j];
        }
    }
    // do something next
    return 0; // temporary, as in the question
}

A shorter and slightly more efficient version:
static int processData(ArrayList<String> array)
{
String str[][] = new String[array.size()][3];
for(int i = 0; i < str.length; ++i) {
str[i] = array.get(i).split("\\s+");
}
return 0;
}
There is no reason for the first array called str in your code, since you can access the Strings directly from the ArrayList.
Also, you don't have to copy the Strings; you can just put the String arrays returned by split into the array of arrays, as in my code.
Plus, if you have a fixed size of 3 and don't need to add any more to the arrays, why do you allocate space for 5 strings?

Since you mentioned "arraylist" in the subject:
try {
    BufferedReader br = new BufferedReader(new FileReader("filename"));
    ArrayList<String[]> l = new ArrayList<String[]>();
    String line;
    while ((line = br.readLine()) != null)
        l.add(line.split("\\s+"));
    br.close();
} catch (Exception e) { e.printStackTrace(); }

Change your for loop to:
String str1[][] = new String[str.length][3];
for(int i = 0; i < str.length; i++) {
str1[i] = str[i].split("\\s+");
}
You don't need 5 elements if you know that you have only 3 words; do not waste resources.
str is a String[], str[i] is a String, str[i].split() is a String[] and so is str1[i]. The types match.
Also, this way the code is clearer and easier to understand. I agree with Ongy that you should remove str if you do not need it, but I can't tell right now because you said you are going to change this method later (at least the return value).
Bonus: by the way, the names array, str and str1 are not the best choice for that piece of code; it is really easy to confuse what is what. Try finding better names such as lines, linesArray, words or something like that.
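The question also asks for blank lines to be ignored, which the loops above do not do. A small sketch that skips them while building the 2-D array might look like this; the class name LineSplitter and the method toWords are placeholders, and trimming each line before splitting is an added assumption so that leading spaces do not produce an empty first word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineSplitter {
    // One row per non-blank line, one column per whitespace-separated word.
    static String[][] toWords(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("cat eats mouse", "", "mouse eats cheese", "cheese is tasty");
        System.out.println(Arrays.deepToString(toWords(lines)));
        // prints [[cat, eats, mouse], [mouse, eats, cheese], [cheese, is, tasty]]
    }
}
```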
