reading text on file and verifying through an array on java

reading text on file and verifying through an array on java - java

So basically i have to verify how many times words are counted within a file on an array.
Edit#2 ( i will run this program through cmd after)
This program takes a number of command line arguments:
1) The file to open
2+) The words to search for and count
If you don’t give it any words to search for, it defaults to these words: "doctor",
"frankenstein", "the", "monster", "igor", "student", "college", "lightning",
"electricity", "blood", and "soul".
You may want to run it with fewer or different words, eg:
$ java ReadSearchAndSort frankenstein.txt frankenstein doctor igor monster
Edit/Update*:
If we consider the simple program:
public class CopyCat{
public static void main(String[] arguments){
for (int ii = 0; ii < arguments.length; ii++){
System.out.println("Argument "+ ii + " = " +
arguments[ii]);
}
}
}
Then if we run it as such:
$ java CopyCat a b c
we will get the following output
Argument 0 = a
Argument 1 = b
Argument 2 = c
Using this as your basis:
Get the first argument and store it in a String named filename
● Get the rest of the arguments and place them in a String array named
queryWords (this should be of the correct size to hold all of the words to count,
and there should be no maximum size — ignoring memory demands of your
system)
Loop through the scanner while things remain in the file ( i.e. hasNext() is true)
and get the next word in the file (i.e. call next())
● Put the word in an array — note, you will need to constantly resize the array —
○ So first instantiate an empty array, outside of the for loop
String[] words = new String[0]();
○ Then, when you read a word, resize the array using Arrays.copyof, i.e.
words = Arrays.copyOf(words, words.length+1)
○ Then place the word in the last spot of the array
■ … but first, we want to make it lower case
word = word.toLowerCase();
■ and remove all of the non-letters
word = word.replaceAll("[^a-zA-Z ]", "");
For each word in our words list, we need to go through the array of words we have read
in, and count each instance. To do this we will make a helper function:
Implementing the function countWordsInUnsorted
Create a counter variable to record how many times we have seen a word. Create a
loop that iterates over every word in the array. Each time a word matches the query word, increment the counter. We need to use the equals method, since we care about whether two strings have the same characters in them, not that they are the exact same
object.
Once you’ve completed the two tasks above, you can run your code. Verify that
"frankenstein" appears 26 times
Overall im having trouble understanding the directions and how i should order my code so its understandable. If you can help in anyway i would appreciate it. I know its quite long, but this isnt even whole assignment so its shortened.
EDIT 9/7/18 updates instructions
Timing things in Java
How long did your code take to run?
In this assignment, we will see some search algorithms run faster than others. To seehow much faster, we will record how long the code takes to run.
Here is an example of timing code:
// Look at the clock when we start
long t0 = (new Date()).getTime();
for (var i = 0; i < 100000; i++) {
// Do something that takes time
}
// Look at the clock when we are finished
long t1 = (new Date()).getTime();
long elapsed = t1 - t0;
System.out.println("My code took " + elapsed + "milliseconds to
run.")
This is already set up for you in main and will output how long your code took to run.
Your job is to implement and call your functions, and make sure that they are working
correctly. If you look in the starter code, you’ll see that we’re running your two different
search and count methods 100 times, so as to get a good average value for how long it
takes. We’re only do the sort once, as sorting the array of 75289 words already takes
awhile.
4. Sorting an array of words with Merge Sort
(35 points)
● Implement mergeSort (Do not use Java's built-in Array.sort for this assignment)
● Call your new method in main
● Checking if you are right: the words (after SORTED in the output) will be in
alphabetical order.
Implementing mergeSort
Implement a mergeSort method that sorts an array of strings. The signature for this
method should be:
public static void mergeSort(String[] arrayToSort, String[]
tempArray, int first, int last)
The first argument is the String[] to sort, the second argument is an empty temporary
array that should be the same size as the array to sort, the third argument is the starting
index for the portion of the array you want to sort, and the fourth argument is the ending
index for the portion of the array you want to sort.
Note that the version of mergeSort we’ll be implementing sorts arrayToSort in place.
This means that you pass an unsorted array in to mergeSort and after the call that
same array is now sorted.
Note that mergeSort involves comparing to values to see which is larger. With numbers
you can just use < or > to compare them.
if you have two Strings s1 and s2, then:
● s1.compareTo(s2) == 0 if s1 and s2 contain the same string.
● s1.compareTo(s2) < 0 if s1 would be alphabetically ordered before s2.
● s1.compareTo(s2) > 0 if s1 would be alphabetically ordered after s2.
Calling mergeSort in main
To call mergeSort in main, you first need to create a temporary string array that is the same length as allWords.
We want to sort the whole array, so the third argument should be the index of the first word in allWords and the fourth argument should be the
index of the last word in allWords. The four arguments to call mergeSort with are then:
● The array we’re sorting, which is allWords.
● Our new temporary array we’ve created that’s the same length as allWords.
● The index of the first word in allWords (hint: what’s always the index of the very first element of an array?).
● The index of the last word in allWords (hint: if you know the length of an array, you can easily compute the index of the last element).
Checking you are right
Did this sort the array of words? The program prints every 500th word. Do they look
sorted?
5. Counting words (with Binary Search) and timing it
(35 points)
● Implement
public static int binarySearch(String[] sortedWords,
String query, int startIndex, int endIndex)
● Implement
public static int getSmallestIndex(String[] words,
String query, int startIndex, int endIndex)
● Implement
public static int getLargestIndex(String[] words, String
query, int startIndex, int endIndex)
● Call getSmallestIndex and getLargestIndex in main to get the smallest and
largest indices at which the word you’re looking for appears in the sorted array.
Use these two values to compute how many times that word appears.
● Checking if you are right: you will get the same values as in the first search section, but much faster.
Implementing binarySearch
The arguments to binarySearch are:
● The sorted array of words to search in.
● The word to search for.
● The index in the array at which to start searching.
● The index in the array at which to stop searching.
Binary search returns the array index where it found the word. If the word only appears once in the array, then this index will be where it occurs.
But if the word appears multiple times in the sorted array (so all the instances of the word will be next to eachother in the array), then this index will be to one of the words in the middle of the group.
The binary search algorithm doesn’t guarantee that this will be the first element in the group or the last element in the group.
So we need to implement some other methods to do this.
Implementing getSmallestIndex
The method getSmallestIndex will be a recursive method that uses the
binarySearch method to find the smallest index for which a word is found in the array.
The outline for this method is:
● Use binarySearch to find an index to the word. If the index binarySearch
returns is -1, then the word wasn’t found and getSmallestIndex should just
return -1. This is the base case.
● If binarySearch did find the word, then recursively call getSmallestIndex on
the portion of the array before where the word was found. This is from index 0 up
to (but not including) the index where the word was found. If this returns -1 then
we know we already had the smallest index, otherwise the recursive call to
getSmallestIndex found the smallest index and we should return that. This is
the recursive case.
Implementing getLargestIndex
The method getLargestIndex will be a recursive method that uses the binarySearch
method to find the largest index for which a word is found in the array. The outline for
this method is very similar to the outline above for getSmallestIndex, except that the
recursive call should search the portion of the array starting after where binarySearch
has found the word.
Using getSmallestIndex and getLargestIndex in main to count words
Since the array that you’re working with has been sorted by this point, all the same
words appear next to each other (e.g. all 26 appearances of the word "frankenstein"
are next to each other in the array). So if you’ve found the smallest and largest index for
a word, then you can just subtract the two indices to count the words! But it’s not quite
word appears only once (so the first and last index are the same) and the case where
the word doesn’t appear at all.
Checking if you are right
Check that you’re getting the same answers as the naive approach that iterates through
the whole array. Also, look to see how much faster this approach is to the naive
approach!
Example arguments and output
$ java ReadSearchAndSort frankenstein.txt student college frankenstein blood the
Arguments: use ''student,college,frankenstein,blood,the'' words, time 100 iterations, search for words:
student,college,frankenstein,blood,the
NAIVE SEARCH:
student:2
college:3
frankenstein:26
blood:19
the:4194
96 ms for 500 searches, 0.192000 ms per search
SORTING:
38 ms to sort 75097 words
SORTED (every 498 word): a a aboard affection all although ancient and and and and and angel approached
as asked attended be been believe boat but but by cause clerval comprehensive continued country dante
decay desire died do duvillard end escape exception eyes favourite few flourishing for frankenstein from
girl grief had happiness have he heart her his histories however i i i i i ice impossible in in
inhabitants investigating it its know leaves limbs love man me me might mixture most my my my my nature
no not oatmeal of of of of of old on or over passed philosophy possession profoundly rain regular
resolve room saved seem shape should smiles some spirit straw sun tavernier that that the the the the
the the the the the then thick those time to to to to town uncle upon vessels was was was were when
which while will with with would you you
BINARY SEARCH:
student:2
college:3
frankenstein:26
blood:19
the:4194
6 ms for 500 searches, 0.012000 ms per search
Of course the actual timing for running the searches and sort will vary on your computer,
depending on how fast your computer is and how many other processes are running on it.
EDIT 9/7/18
import java.io.*;
import java.util.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
public class ReadSearchAndSort {
public static void main(String[] args) throws FileNotFoundException {
String filename = args[0];
int n = args.length - 1;
String[] queryWords = null;
String[] allWords = readWords("frankenstein.txt");
String[] tempAllWords = new String[allWords.length];
if (n > 0) { //if user provides the querywords
queryWords = new String[n];
for (int i = 1; i < args.length; i++) {
queryWords[i - 1] = args[i];
}
} else { //if user doesn't provide querywords
queryWords = new String[]{"doctor", "frankenstein", "the", "monster", "igor", "student", "college", "lightning", "electricity", "blood", "soul"};
}
ReadSearchAndSort.readWords("C:/Users/Jordles/IdeaProjects/TimeConversionToSecond.java/out/production/GettingStarted/cs141/frankenstein.txt");
int timingCount = 100;
System.out.println("\nArguments: use '" + String.join(",", queryWords) + "' words, time " + timingCount + " iterations, search for words: " + String.join(",", queryWords) + "\n");
System.out.println("NAIVE SEARCH:");
// Record the current time
long t0 = new Date().getTime();
// Time how long it takes to run timingCount loops
// for countWordsInUnsorted
for (int j = 0; j < timingCount; j++) {
for (int i = 0; i < queryWords.length; i++) {
int count = countWordsInUnsorted(allWords, queryWords[i]);
if (j == 0) {
System.out.println(queryWords[i] + ":" + count);
}
}
}
long t1 = (new Date()).getTime();
long timeToSearchNaive = t1 - t0;
int searchCount = timingCount * queryWords.length;
// Output how long the searches took, for how many searches
// (remember: searches = timingcount * the number of words searched)
int searches = timingCount * 500;
System.out.printf("%d ms for %d searches, %f ms per search\n", timeToSearchNaive, searchCount, timeToSearchNaive * 1.0f / searchCount);
// Sort the list of words
System.out.println("\nSORTING: ");
mergeSort(allWords, tempAllWords, 0, allWords.length);
long t2 = (new Date()).getTime();
// Output how long the sorting took
long timeToSort = t2 - t1;
System.out.printf("%d ms to sort %d words\n", timeToSort, allWords.length);
// Output every 1000th word of your sorted wordlist
int step = (int) (allWords.length * .00663 + 1);
System.out.print("\nSORTED (every " + step + " word): ");
for (int i = 0; i < allWords.length; i++) {
if (i % step == 0)
System.out.print(allWords[i] + " ");
}
System.out.println("\n");
System.out.println("BINARY SEARCH:");
// Run timingCount loops for countWordsInSorted
// for the first loop, output the count for each word
for (int j = 0; j < timingCount; j++) {
for (int i = 0; i < queryWords.length; i++) {
int count = countWordsInUnsorted(allWords, queryWords[i]);
if (j == 0) {
System.out.println(queryWords[i] + ":" + count);
}
}
}
long t3 = (new Date()).getTime();
long timeToSearchBinary = t3 - t2;
System.out.printf("%d ms for %d searches, %f ms per search\n", timeToSearchBinary, searchCount, timeToSearchBinary*1.0f/searchCount);
}
public static String[] readWords(String fileName) throws FileNotFoundException{
String[] words = new String[0];
int i = 0;
File file = new File(fileName);
Scanner input = new Scanner(file, "UTF-8");
input.useDelimiter("\\s+|\\-");
while(input.hasNext()){
String word = input.next();
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z ]", "");
words = Arrays.copyOf(words, words.length + 1);
words[i++] = word;
}
return null;
}
public static int countWordsInUnsorted(String[] WordsToCount, String countedWord){
if ((countedWord == null) || (WordsToCount == null)){
return 0;
}
int counter = 0;
for( String word : WordsToCount){
if (word.equals(countedWord)){
counter++;
}
}
return counter;
}
public static void mergeSort(String[] arrayToSort, String[] tempArray, int first, int last){
if (first == last){
return;
}
int mid = (first + last)/2;
mergeSort(arrayToSort, tempArray, first, mid);
mergeSort(arrayToSort, tempArray, mid + 1, last);
merge(arrayToSort, tempArray, first, mid, mid + 1, last);
}
public static void merge(String[] arrayToSort, String[] tempArray, int first, int mid, int mid1, int last){
int j = first;
int k = mid1;
int l = first;
do{
if (tempArray[j].compareTo(tempArray[k]) < 0){
arrayToSort[l] = tempArray[j];
j++;
}
else{
arrayToSort[l] = tempArray[k];
k++;
}
}
while (j <= mid && k <= last){
}
}
public static int binarySearch(String[] sortedWords, String query, int startIndex, int endIndex){
if (startIndex > endIndex){
}
int mid = (startIndex + endIndex)/2;
if(query.compareTo(sortedWords[mid]) == 0){
return;
}
if (query.compareTo(sortedWords[mid]) < 0){
binarySearch(sortedWords, query, startIndex, mid -1 );
}
if (query.compareTo(sortedWords[mid]) > 0){
binarySearch(sortedWords, query, mid + 1, endIndex);
}
return -1;
}
public static int getSmallestIndex(String[] words, String query, int startIndex, int endIndex){
return -1;
}
public static int getLargestIndex(String[] words, String query, int startIndex, int endIndex){
return -1;
}
public static int countWordsInSorted(String[] wordsTocount, String countedWord){
return 0;
}
}

In the readWords method you should call .toLowerCase() and .replaceAll() only after reading the word form the file.
Edit: The return null; must not be the only return of the method, otherwise you're just throwing away everything the method did. You could use it to handle error, but I strongly suggest to return an empty array.
If you return an empty array (or collection), you avoid surprising consumers (whoever call your method) of this method with an unintentional NullPointerException, so so they don't have to guard against it.
Take also a look at the tutorial for try-with-resources statement I've used.
public static String[] readWords(final String filename) {
// Check on input
if (filename == null) {
return null;
}
String[] words = new String[0];
int i = 0; // index of the first empty position in the array
final File inputFile = new File(filename);
try (Scanner in = new Scanner(inputFile)) {
in.useDelimiter("\\s+|\\-");
while (in.hasNext()) {
String word = in.next().toLowerCase();
word = word.replaceAll("[^a-z ]", ""); // No need of A-Z since we called .toLowerCase()
words = Arrays.copyOf(words, words.length + 1);
words[i++] = word; // Put the current word at the bottom of the array, then increment the index
}
} catch (final FileNotFoundException e) {
System.err.println("The file " + filename + " does not exist");
return null;
}
return words;
}
The method is very simple, I suggest using the enhanced for loop.
Please also consider Java naming conventions: variable name should start with lowercase letter.
Edit: same here about the return 0;. It must not be the only return of the method.
public static int countWordsInUnsorted(final String searchedWord, final String[] words) {
// Check on input
if ((searchedWord == null) || (words == null)) {
return 0;
}
int count = 0;
for (final String word : words) {
if (word.equals(searchedWord)) {
count++;
}
}
return count;
}
Edit4:
This program takes one command line argument: the words to search for and count.
java SearchAndSort "frankenstein,doctor,igor,monster"
Since the assignement uses a comma as separator, you can use String.split() to get the array of strings.
Due to the need of calculating the elapsed time, you cannot search and print the words: the print operation is very time expensive, so it would distort the result. That's why I used an array of int to store the count of every word.
int[] wordsCounter = new int[queryWords.length];
In order to calculate the time per search, you need a double, but if you do an arithmetic operation between long, you'll always get a long. That's why you need to cast at least one operand to double.
double timePerSearch = ((double) elapsedTime) / totalSearches;
Refactored main method:
public static void main(final String[] args) {
// Default words to search for
String queryWordsString = "doctor,frankenstein,the,monster,igor,student,college,lightning,electricity,blood,soul";
if (args.length > 0) { // If the user provided query words
queryWordsString = args[0];
}
final String[] queryWords = queryWordsString.split(","); // split the string into an array
System.out.println("SEARCH AND SORT");
System.out.println();
System.out.println("Searching and counting the words " + queryWordsString); // print the words to search for
System.out.println();
// Just the name of the file if it's in the same directory of program
// The absolute path if they are in different directory
final String filename = "frankenstein.txt";
// Read words from file
final String[] words = SearchAndSort.readWords(filename);
if (words == null) {
return;
}
final int[] wordsCounter = new int[queryWords.length];
// Store the starting time
final long startTime = System.currentTimeMillis();
// Search the words and store their count
for (int i = 0; i < queryWords.length; i++) {
final int count = SearchAndSort.countWordsInUnsorted(queryWords[i], words);
wordsCounter[i] = count;
}
final long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println("NAIVE SEARCH:");
// Print how many time each word appears
for (int i = 0; i < queryWords.length; i++) {
System.out.println(queryWords[i] + ": " + wordsCounter[i]);
}
final int totalSearches = queryWords.length * words.length;
final double timePerSearch = ((double) elapsedTime) / totalSearches;
// Print the elapsed time in ms
System.out.println(elapsedTime + " ms for " + totalSearches + " searches, " + timePerSearch + " ms per search");
}
Edit4:
long t0 = ( new Date()).getTime();
for (var i = 0 ; i < 100000 ; i++) {
// Do something that takes time
}
In this example the 100000 isn't a new variable, it's the standard upper limit of the for loop you're going to use. It's queryWords.length for NAIVE SEARCH: block and words.length for SORTING: block.
Edit:
// Record the current time
long t0 = (new Date()).getTime();
Don't do this. If you want to measure elapsed time, you should use long t0 = System.nanoTime(). From the documentation
Returns the current value of the running Java Virtual Machine's high-resolution time source, in nanoseconds.
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine; other virtual machine instances are likely to use a different origin.
This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis().
[...]
The values returned by this method become meaningful only when the difference between two such values, obtained within the same instance of a Java virtual machine, is computed.
For example, to measure how long some code takes to execute:
Probably you would read System.currentTimeMillis() documentation too.

Related

Finding the nth term in a sequence

I have a sequence, and I am trying to make a program to find the nth term of the sequence.
The sequence is as follows:
1, 11, 21, 1211, 111221, 312211...
In this sequence, each term describes the previous term. For example, "1211" means that the previous term; the previous term is "21" where there is one occurrence of a 2 and then one occurrence of a 1 (=1211). To get the third term, "21," you look at the second term: 11. There are two occurrences of a 1 which gives us "21."
import java.util.*;
class Main {
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
System.out.println( Main.num(n-1, "1"));
}
public static String num(int times, String x){
if(times == 0){
return x;
}else{
//System.out.println("meow");
String y = "" + x.charAt(0);
int counter = 0;
for(int i = 1; i < x.length(); i++){
if(x.charAt(i) == x.charAt(i-1)){
counter++;
}else{
y += "" + counter + x.charAt(i-1);
counter = 0;
}
}
return num(times--, y);
}
//return "";
}
}
My code uses recursion to find the nth term. But, it gives us errors :(
First, I start of the method "num" by passing it the number of terms-1 (since the first term is already given) and the first term (1).
In the method num, we start off by using a conditional to establish the base case (when you are done finding the nth term).
If the base case is false, then you find the next term in the sequence.

This is a very cool sequence! I like that it is English based and not mathematical, haha. (Though now I wonder ... is there a formula we could make for the nth term? I'm pretty sure it's impossible or uses some crazy-level math, but just something to think about ...)
In your solution, the recursive logic of your code is correct: after you find each term, you repeat the method with your knew number and find the next term using that element, and end when you have determined the first n elements. Your base case is also correct.
However, the algorithm you developed for determining the terms in the sequence is the issue.
To determine the next element in the sequence, we want to:
Logical Error:
Create a empty variable, y, for your next element. The variable, counter, should not start at 0, however. This is because every element will ALWAYS have an occurrence of at least 1, so we should initialize int counter = 1;
Iterate through the characters in x. (You did this step correctly) We begin at i = 1, because we compare each character to the previous one.
If the current character is equal to the previous character, we increment counter by 1.
Otherwise, we concatenate counter and the character being repeated, to y. Remember, to reinitialize counter to 1, not 0.
Technical Errors:
Once we reach the end of iterating x, we need to concatenate our final counter and character to y, since the else statement for the final characters will never run in our for loop.
This is done with the following code: y += "" + counter + x.charAt(x.length() - 1);
Finally, when you are doing your recursive call, you should do --times instead of times--. The difference between these two parameters is that with your original code, you are post-decrementing. This means the value of times is decreasing after the method call, when we want the decreased value to be sent into the method. To solve this, we need to pre-decrement, by doing --times.
import java.util.*;
class CoolSequence {
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
System.out.println(num(n, "1"));
}
public static String num(int times, String x){
if(times == 0){
return x;
}
else{
String y = "";
int counter = 1;
for(int i = 1; i < x.length(); i++){
if(x.charAt(i) == x.charAt(i - 1)){
counter++;
}
else{
y += "" + counter + x.charAt(i - 1);
counter = 1;
}
}
y += "" + counter + x.charAt(x.length() - 1);
return num(--times, y);
}
}
}
Testing:
6
13112221
An alternative approach would be using an iterative method:
import java.util.*;
class CoolSequence2 {
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
ArrayList<String> nums = new ArrayList<String>();
int n = scan.nextInt();
String val = "1";
for(int i = 0; i < n; i++){
String copy = val;
val = "";
while(!copy.equals("")){
char curr = copy.charAt(0);
int ind = 0;
int cons = 0;
while(ind < copy.length() && curr == copy.charAt(ind)){
cons += 1;
ind += 1;
}
val += String.valueOf(cons) + copy.charAt(cons - 1);
copy = copy.substring(cons);
}
nums.add(val);
}
System.out.println(nums.get(nums.size() - 1));
}
}
6
13112221
In this method, we use a for loop to iterate through n terms. To determine each element, we do a similar method to your logic:
We create an empty string, val, to hold the new element, and store our current element in copy. We also initialize a cons, similar to your counter.
While copy is not empty, we iterate through copy and increment cons until there is an element that is not equal to the next element.
When this occurs, we concatenate cons and the repeated element to val, like in your code. Then, we cut out the repeated elements from copy and continue the process.
We add the new value of val to nums, and keep iterating through the n elements.
I hope these two methods of approaching your problem helped! Please let me know if you have any further questions or clarifications :)

You can use Pattern with backreference.
The regular expression "(.)\\1*" matches any single character ("(.)") and zero or more sequences of the same character ("\\1*"). "\\1" is called a backreference, it refers to the string enclosed in parentheses 1st.
For example, it matches 111, 22 and 1 for 111221.
replaceAll() calls the lambda expression specified by the argument each time it matches. The lambda expression receives a MatchResult and returns a string. The matched string is replaced with the result.
The lambda expression in this case concatenates the length of the matched string (match.group().length()) and the first character (match.group(1)).
static final Pattern SEQUENCE_OF_SAME_CHARACTER = Pattern.compile("(.)\\1*");
static String num(int times, String x) {
for (int i = 0; i < times; ++i)
x = SEQUENCE_OF_SAME_CHARACTER.matcher(x).replaceAll(
match -> match.group().length() + match.group(1));
return x;
}
public static void main(String[] args) {
for (int i = 1; i <= 8; ++i)
System.out.print(num(i - 1, "1") + " ");
}
output:
1 11 21 1211 111221 312211 13112221 1113213211

How to sort strings by length

I have run into a problem of sorting strings from an Array. I was able to sort integers, but I don't really know how to sort the Strings from shortest to longest.
I have tried converting the Strings into integers by using Strings.length but I don't know how to convert them back into their original String.
String Handler;
System.out.println("\nNow we will sort String arrays.");
System.out.println("\nHow many words would you like to be sorted.");
Input = in.nextInt();
int Inputa = Input;
String[] Strings = new String [Input];
for (int a = 1; a <= Input; Input --) //will run however many times the user inputed it will
{
counter ++; //counter counts up
System.out.println("Number " + counter + ": "); //display
userinputString = in.next();
Strings[countera] = userinputString; //adds input to array
countera ++;
}
System.out.println("\nThe words you inputed are :");
System.out.println(Arrays.toString(Strings));
System.out.println("\nFrom shortest to longest the words are:");
counter = 0;
int[] String = new int [Strings.length];
for (int i = 0; i < Strings.length; i++) //will run however many times the user inputed it will
{
int a = Strings[i].length();
String[i] = a;
}
System.out.println(Arrays.toString(String));
I expected to be able to have the actual String to sort but the I'm getting numbers and am unable to find how to turn those numbers back into their string after sorting.

If you're allowed to use library functions, then you might want to do the following:
Arrays.sort(Strings, Comparator.comparing(String::length));
this works in Java 8 and above. Just make sure you import import java.util.*; at some point in your file.

It is not possible to convert them as you store only the length - there might be many different strings with same length.
Instead of this you can try to implement your own comparator and pass it to java's sort methods: given two strings returns 1 if first is longer, 0 if equal, -1 if shorter. You can also do this in a lambda comparator passed to the Arrays.sort().
(s1, s2) -> {
if (s1.length() < s2.length()) {
return -1;
}
return s1.length() > s2.length();
}

Do String methods in Java run in O(1) time?

This is my code to find the length of the longest substring without repeating characters. Does this code run in O(n) time? Or does it run in O(n^2) time? I am not sure if the String methods run in O(n) time or O(1) time.
A test case includes: "pwwkewo" returning 4 because "kewo" is the the longest substring inside the string that has unique characters.
public int lengthOfLongestSubstring(String s) {
int maxLength = 0;
String sub = "";
for(int i=0; i<s.length(); i++) {
//add that char to the sub string
sub = sub + s.charAt(i);
//if we find a char in our string
if(sub.indexOf(s.charAt(i)) != -1) {
//drop any replicated indexes before it
sub = sub.substring(sub.indexOf(s.charAt(i)) + 1);
}
if(sub.length() > maxLength) {
maxLength = sub.length();
}
}
return maxLength;
}

sub.indexOf() can take linear time, since it might have to check all the characters of sub. And the length of sub is O(n) in the worst case (assuming your input has no repeating characters).
Therefore your total running time is O(n2), since the loop has n iterations and in each iteration you call indexOf().

Your code runs in O(n^2) as there is a for loop and inside it you call an indexOf() method. This in turn is O(n).
Thus your method is O(n^2).

How does java.util.Collections.contains() perform faster than a linear search?

I've been fooling around with a bunch of different ways of searching collections, collections of collections, etc. Doing lots of stupid little tests to verify my understanding. Here is one which boggles me (source code further below).
In short, I am generating N random integers and adding them to a list. The list is NOT sorted. I then use Collections.contains() to look for a value in the list. I intentionally look for a value that I know won't be there, because I want to ensure that the entire list space is probed. I time this search.
I then do another linear search manually, iterating through each element of the list and checking if it matches my target. I also time this search.
On average, the second search takes 33% longer than the first one. By my logic, the first search must also be linear, because the list is unsorted. The only possibility I could think of (which I immediately discard) is that Java is making a sorted copy of my list just for the search, but (1) I did not authorize that usage of memory space and (2) I would think that would result in MUCH more significant time savings with such a large N.
So if both searches are linear, they should both take the same amount of time. Somehow the Collections class has optimized this search, but I can't figure out how. So... what am I missing?
import java.util.*;
public class ListSearch {
public static void main(String[] args) {
int N = 10000000; // number of ints to add to the list
int high = 100; // upper limit for random int generation
List<Integer> ints;
int target = -1; // target will not be found, forces search of entire list space
long start;
long end;
ints = new ArrayList<Integer>();
start = System.currentTimeMillis();
System.out.print("Generating new list... ");
for (int i = 0; i < N; i++) {
ints.add(((int) (Math.random() * high)) + 1);
}
end = System.currentTimeMillis();
System.out.println("took " + (end-start) + "ms.");
start = System.currentTimeMillis();
System.out.print("Searching list for target (method 1)... ");
if (ints.contains(target)) {
// nothing
}
end = System.currentTimeMillis();
System.out.println(" Took " + (end-start) + "ms.");
System.out.println();
ints = new ArrayList<Integer>();
start = System.currentTimeMillis();
System.out.print("Generating new list... ");
for (int i = 0; i < N; i++) {
ints.add(((int) (Math.random() * high)) + 1);
}
end = System.currentTimeMillis();
System.out.println("took " + (end-start) + "ms.");
start = System.currentTimeMillis();
System.out.print("Searching list for target (method 2)... ");
for (Integer i : ints) {
// nothing
}
end = System.currentTimeMillis();
System.out.println(" Took " + (end-start) + "ms.");
}
}
EDIT: Below is a new version of this code. The interesting thing is that now my manual linear loop performs 16% faster than the contains method (NOTE: both are designed to intentionally search the entire list space, so I know they are equal in number of iterations). I can't account for this 16% gain... more confusion.
import java.util.*;
public class ListSearch {
public static void main(String[] args) {
int N = 10000000; // number of ints to add to the list
int high = 100; // upper limit for random int generation
List<Integer> ints;
int target = -1; // target will not be found, forces search of entire list space
long start;
long end;
ints = new ArrayList<Integer>();
start = System.currentTimeMillis();
System.out.print("Generating new list... ");
for (int i = 0; i < N; i++) {
ints.add(((int) (Math.random() * high)) + 1);
}
end = System.currentTimeMillis();
System.out.println("took " + (end-start) + "ms.");
start = System.currentTimeMillis();
System.out.print("Searching list for target (method 1)... ");
if (ints.contains(target)) {
System.out.println("hit");
}
end = System.currentTimeMillis();
System.out.println(" Took " + (end-start) + "ms.");
System.out.println();
ints = new ArrayList<Integer>();
start = System.currentTimeMillis();
System.out.print("Generating new list... ");
for (int i = 0; i < N; i++) {
ints.add(((int) (Math.random() * high)) + 1);
}
end = System.currentTimeMillis();
System.out.println("took " + (end-start) + "ms.");
start = System.currentTimeMillis();
System.out.print("Searching list for target (method 2)... ");
for (int i = 0; i < N; i++) {
if (ints.get(i) == target) {
System.out.println("hit");
}
}
end = System.currentTimeMillis();
System.out.println(" Took " + (end-start) + "ms.");
}
}

Your comparison code is buggy, and this is distorting your results.
This does search for the target
if (ints.contains(target)) {
// nothing
}
But this does not!
for (Integer i : ints) {
// nothing
}
You are actually just iterating over the list elements without testing them.
Having said that, the second version is slower than the first version for one or more of the following reasons:
The first version will be iterating the backing array using a simple for loop and an index. The second version is equivalent to the following:
Iterator<Integer> it = ints.iterator();
while (it.hasNext()) {
Integer i = (Integer) it.next();
}
In other words, each time around the loop involves 2 method calls and a typecast1.
The first version will return true as soon as it gets a match. Because of the bug in your implementation, the second version will iterate the entire list every time. In fact, given the choice of N and high, this effect is the most likely the main cause of the difference in performance.
1 - Actually, it is not entirely clear what the JIT compiler will do with all of this. It could in theory inline the method calls, deduce that the typcaset is unnecessary, or even optimize away the entire loop. On the other hand, there are factors that may inhibit these optimizations. For instance, ints is declared as List<Integer>which could inhibit inlining ... unless the JIT is able to deduce that the actual type is always the same.
Your results are possibly also distorted for another reason. Your code does not take account of JVM warmup. Read this Question for more details: How do I write a correct micro-benchmark in Java?

Here is the difference:
When you use contains, it use the internal array of the object and does a search like this:
for (int i = 0; i < size; i++)
if (searchObject.equals(listObject[i]))
return true;
return false;
Here when it tries to get the ith element, it directly gets the ith element object form the internal array.
When you write your own, you are writing like:
for (Integer i : ints) {
// nothing
}
its equivalent to :
for(Iterator<Integer> iter= ints.iterator(); iter.hasNext(); ) {
Integer i = iter.next();
}
Which is performing much more steps than the contains.

So I'm not entirely sure you're testing anything. Javac (compiler) is smart enough to realize that you don't have any code inside your for loop and your if statement. In this case, Java will remove that code from it's compilation. The reason you might be getting time back is because you're actually counting the time it takes to print a string. System output time can vary drastically depending on what your system is doing. When writing timing tests, any I/O can create invalid tests.
First I would remove the string prints from inside your timing.
Second of all, ArrayList.contains is linear. It doesn't use the special for loop like you're doing. Your loop has some overhead of getting the iterator from the collection and then iterating over it. This is how the special for loop works behind the scenes.
Hope this helps.

Project Euler 14: Issue with array indexing in a novel solution

The problem in question can be found at http://projecteuler.net/problem=14
I'm trying what I think is a novel solution. At least it is not brute-force. My solution works on two assumptions:
1) The less times you have iterate through the sequence, the quicker you'll get the answer. 2) A sequence will necessarily be longer than the sequences of each of its elements
So I implemented an array of all possible numbers that could appear in the sequence. The highest number starting a sequence is 999999 (as the problem only asks you to test numbers less than 1,000,000); therefore the highest possible number in any sequence is 3 * 999999 + 1 = 2999998 (which is even, so would then be divided by 2 for the next number in the sequence). So the array need only be of this size. (In my code the array is actually 2999999 elements, as I have included 0 so that each number matches its array index. However, this isn't necessary, it is for comprehension).
So once a number comes in a sequence, its value in the array becomes 0. If subsequent sequences reach this value, they will know not to proceed any further, as it is assumed they will be longer.
However, when i run the code I get the following error, at the line introducing the "wh:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3188644
For some reason it is trying to access an index of the above value, which shouldn't be reachable as it is over the possible max of 29999999. Can anyone understand why this is happening?
Please note that I have no idea if my assumptions are actually sound. I'm an amateur programmer and not a mathematician. I'm experimenting. Hopefully I'll find out whether it works as soon as I get the indexing correct.
Code is as follows:
private static final int MAX_START = 999999;
private static final int MAX_POSSIBLE = 3 * MAX_START + 1;
public long calculate()
{
int[] numbers = new int[MAX_POSSIBLE + 1];
for(int index = 0; index <= MAX_POSSIBLE; index++)
{
numbers[index] = index;
}
int longestChainStart = 0;
for(int index = 1; index <= numbers.length; index++)
{
int currentValue = index;
if(numbers[currentValue] != 0)
{
longestChainStart = currentValue;
while(numbers[currentValue] != 0 && currentValue != 1)
{
numbers[currentValue] = 0;
if(currentValue % 2 == 0)
{
currentValue /= 2;
}
else
{
currentValue = 3 * currentValue + 1;
}
}
}
}
return longestChainStart;
}

Given that you can't (easily) put a limit on the possible maximum number of a sequence, you might want to try a different approach. I might suggest something based on memoization.
Suppose you've got an array of size 1,000,000. Each entry i will represent the length of the sequence from i to 1. Remember, you don't need the sequences themselves, but rather, only the length of the sequences. You can start filling in your table at 1---the length is 0. Starting at 2, you've got length 1, and so on. Now, say we're looking at entry n, which is even. You can look at the length of the sequence at entry n/2 and just add 1 to that for the value at n. If you haven't calculated n/2 yet, just do the normal calculations until you get to a value you have calculated. A similar process holds if n is odd.
This should bring your algorithm's running time down significantly, and prevent any problems with out-of-bounds errors.

You can solve this by this way
import java.util.LinkedList;
public class Problem14 {
public static void main(String[] args) {
LinkedList<Long> list = new LinkedList<Long>();
long length =0;
int res =0;
for(int j=10; j<1000000; j++)
{
long i=j;
while(i!=1)
{
if(i%2==0)
{
i =i/2;
list.add(i);
}
else
{
i =3*i+1;
list.add(i);
}
}
if(list.size()>length)
{
length =list.size();
res=j;
}
list.clear();
}
System.out.println(res+ " highest nuber and its length " + length);
}}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

reading text on file and verifying through an array on java - java

Related

Finding the nth term in a sequence

How to sort strings by length

Do String methods in Java run in O(1) time?

How does java.util.Collections.contains() perform faster than a linear search?

Project Euler 14: Issue with array indexing in a novel solution

Categories

Resources