My thoughts/Questions:
I'm working on a Java challenge (directions below). I have Part 1 finished (shown in the code below), and I'm very close to having Parts 2 and 3 finished.
As you'll see in my code, I have two for loops: the first iterates through my array of sorted names, and the second iterates through the characters in each name.
As stated in the directions, an int value is generated for each character, and these values are then added. So, A is 1, B is 2, C is 3... and ABC is 6. This int value is then multiplied by the index of the given String/name. So, if ABC (with a value of 6) was at index 2, its score would be 12.
After the above step is complete, I am to total all of the scores (each name's score).
The above is my understanding of the directions.
The problem is my output looks like this:
"AARON"
0
"ABBEY"
-25
"ABBIE"
-82
"ABBY"
-90
"ABDUL"
-80
"ABE"
-260
"ABEL"
-240
"ABIGAIL"
-133
"ABRAHAM"
-128
"ABRAM"
-225
"ADA"
-540
"ADAH"
-506
"ADALBERTO"
216
"ADALINE"
-182
"ADAM"
-574
"ADAN"
-600
"ADDIE"
-592
"ADELA"
-629
I've run through my logic a few times and it seems correct to me, but I don't know how I'm generating these numbers. My only thought is that the quotation marks (") are throwing off my calculations; they have an ASCII value of 34. I have attempted to remove them at multiple places in my code with both replace() and replaceAll(), but I have not been able to.
What am I doing wrong/how can I fix it/what do I need to do to complete this assignment/how can I improve my code?
Challenge Directions:
Use the names.txt file, a 46K text file containing over five-thousand first names found in the resources directory.
Part 1: Begin by sorting the list into alphabetical order. Save this new file as p4aNames.txt in the answers directory.
Part 2: Using p4aNames.txt, take the alphabetical value for each name, and multiply this value by its alphabetical position in the list to obtain a name score. For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714. Save the list of all name scores as p4bNames.txt.
Part 3: What is the total of all the name scores in the file?
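Part 2's scoring rule can be sketched in Java as follows. This is a minimal illustration, assuming each name is plain uppercase A to Z with the quotes already removed; the class and method names (NameScore, alphabeticalValue, nameScore) are hypothetical. It reproduces the COLIN example from the directions.

```java
class NameScore {
    // Sum of letter positions: A = 1, B = 2, ..., Z = 26 (assumes uppercase A-Z only).
    static int alphabeticalValue(String name) {
        int value = 0;
        for (char ch : name.toCharArray()) {
            value += ch - 'A' + 1;
        }
        return value;
    }

    // position is the 1-based rank of the name in the sorted list.
    static long nameScore(String name, int position) {
        return (long) alphabeticalValue(name) * position;
    }

    public static void main(String[] args) {
        System.out.println(alphabeticalValue("COLIN")); // 53
        System.out.println(nameScore("COLIN", 938));    // 49714
    }
}
```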
Pic Link Showing Output & Directory:
http://screencast.com/t/tiiBoyOpR
My Current Code:
package app;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
public class AlphabetizedList {
public static void main(String[] args) throws IOException {
new AlphabetizedList().sortingList();
}
public void sortingList() throws IOException {
FileReader fb = new FileReader("resources/names.txt");
BufferedReader bf = new BufferedReader(fb);
String out = bf.readLine();
out = out.substring(out.indexOf("\"")); // get rid of strange characters appearing before the first name
// System.out.println(out); // output:
// "MARY","PATRICIA","LINDA","BARBARA","ELIZABETH","JENNIFER","MARIA"...
String[] sortedStr = out.split(",");
Arrays.sort(sortedStr);
PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter("answers/p4aNames.txt")));
for (int i = 0; i < sortedStr.length; i++) {
pw.println(sortedStr[i]);
System.out.println(sortedStr[i]);// print to console just to see output
int score = 0;
// sortedStr[i].replaceAll("\"", ""); // I used this to try to remove the "s from my Strings
for (char ch: sortedStr[i].toUpperCase().toCharArray()) {
score += ((int)ch - 64); /* A is decimal 65 */
}
score = score * i; /* multiply by position in the list */
pw.println(score);
System.out.println(score);
}
bf.close();
fb.close();
pw.close();
}
}
You wrote
// sortedStr[i].replaceAll("\"", ""); // I used this to try to remove the "s from my Strings
Java Strings are immutable: replaceAll() returns a new string with the quotes removed and leaves the original unchanged. Assign the result back instead:
sortedStr[i] = sortedStr[i].replaceAll("\"", "");
and it should work fine.
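The quotes also explain the exact numbers in the output. The quote character has code 34, so each one contributes 34 - 64 = -30 and every name loses 60: AARON is 1 + 1 + 18 + 15 + 14 = 49, and (49 - 60) × 0 = 0; ABBEY is 35, and (35 - 60) × 1 = -25; ABBIE is 19, and (19 - 60) × 2 = -82, matching the first lines of output. Note also that the directions count positions from 1 (COLIN is the 938th name), while the loop multiplies by the 0-based index i. Below is a minimal sketch of the scoring loop with both fixes plus a running total for Part 3; the class name NameScores and method scoreAll are hypothetical, and it uses String.replace (literal) rather than replaceAll (regex), which behave the same for a plain quote.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;

class NameScores {
    // Writes each name and then its score on the next line (like the question's output)
    // and returns the Part 3 total. sortedNames may still contain the surrounding quotes.
    static long scoreAll(String[] sortedNames, PrintWriter pw) {
        long total = 0; // long to be safe: thousands of scores add up quickly
        for (int i = 0; i < sortedNames.length; i++) {
            // Strings are immutable, so keep the result of replace().
            String name = sortedNames[i].replace("\"", "").trim().toUpperCase();
            int value = 0;
            for (char ch : name.toCharArray()) {
                value += ch - 'A' + 1; // 'A' is 65, so 'A' maps to 1
            }
            long score = (long) value * (i + 1); // positions are 1-based
            pw.println(name);
            pw.println(score);
            total += score;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] names = {"\"ABC\"", "\"A\""};
        Arrays.sort(names);
        StringWriter buf = new StringWriter();
        long total = scoreAll(names, new PrintWriter(buf, true));
        System.out.print(buf);
        System.out.println("Total: " + total); // A: 1 * 1, ABC: 6 * 2, so 13
    }
}
```

In the original program the same loop body would write to the p4bNames.txt PrintWriter instead of a StringWriter.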
Say there is a file too big to fit in memory. How can I get a random line from it? Thanks.
Update:
I want the probability of getting each line to be equal.
Reading the entire file if you want only one line seems a bit excessive. The following should be more efficient:
Use RandomAccessFile to seek to a random byte position in the file.
Seek left and right to the next line terminator. Let L be the line between them.
With probability (MIN_LINE_LENGTH / L.length) return L. Otherwise, start over at step 1.
This is a variant of rejection sampling.
Line lengths include the line terminator character(s), hence MIN_LINE_LENGTH >= 1. (All the better if you know a tighter bound on line length).
It is worth noting that the runtime of this algorithm does not depend on file size, only on line length, i.e. it scales much better than reading the entire file.
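The three steps above can be sketched in Java as follows. This is a minimal, unoptimized sketch assuming '\n'-terminated lines (a trailing '\r' is stripped) and UTF-8 text; RejectionLineSampler and randomLine are hypothetical names. It reads one byte at a time for clarity, so a real implementation would buffer the scans.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Random;

class RejectionLineSampler {
    // Returns a uniformly random line (without its terminator).
    // minLineLength must be <= the byte length of every line, counting its '\n'; 1 is always safe.
    static String randomLine(RandomAccessFile f, int minLineLength, Random rnd) throws IOException {
        if (minLineLength < 1) {
            throw new IllegalArgumentException("minLineLength must be >= 1");
        }
        long len = f.length();
        if (len == 0) {
            throw new IOException("empty file");
        }
        while (true) {
            // Step 1: random byte offset in [0, len); the clamp guards against rounding up to len.
            long pos = Math.min(len - 1, (long) (rnd.nextDouble() * len));

            // Step 2a: scan left until the previous byte is '\n' (or the start of the file).
            long start = pos;
            while (start > 0) {
                f.seek(start - 1);
                if (f.read() == '\n') {
                    break;
                }
                start--;
            }

            // Step 2b: scan right through the next '\n' (inclusive) or to the end of the file.
            f.seek(pos);
            long end = pos;
            int b;
            while ((b = f.read()) != -1) {
                end++;
                if (b == '\n') {
                    break;
                }
            }
            long span = end - start; // >= 1, because the byte at pos belongs to this line

            // Step 3: long lines are hit more often, so accept with probability minLineLength / span.
            if (rnd.nextDouble() * span < minLineLength) {
                byte[] buf = new byte[(int) span];
                f.seek(start);
                f.readFully(buf);
                int n = buf.length;
                if (n > 0 && buf[n - 1] == '\n') n--;
                if (n > 0 && buf[n - 1] == '\r') n--; // tolerate CRLF files
                return new String(buf, 0, n, StandardCharsets.UTF_8);
            }
        }
    }
}
```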
Here's a solution. Take a look at the choose() method which does the real thing (the main() method repeatedly exercises choose(), to show that the distribution is indeed quite uniform).
The idea is simple: when you read the first line it has a 100% chance of being chosen as the result. When you read the 2nd line it has a 50% chance of replacing the first line as the result. When you read the 3rd line it has a 33% chance of becoming the result. The fourth line has a 25%, and so on....
import java.io.*;
import java.util.*;
public class B {
public static void main(String[] args) throws FileNotFoundException {
Map<String,Integer> map = new HashMap<String,Integer>();
for(int i = 0; i < 1000; ++i)
{
String s = choose(new File("g:/temp/a.txt"));
if(!map.containsKey(s))
map.put(s, 0);
map.put(s, map.get(s) + 1);
}
System.out.println(map);
}
public static String choose(File f) throws FileNotFoundException
{
String result = null;
Random rand = new Random();
int n = 0;
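try (Scanner sc = new Scanner(f)) {
while (sc.hasNextLine()) // hasNextLine() matches nextLine(); hasNext() would skip trailing blank lines
{
++n;
String line = sc.nextLine();
if(rand.nextInt(n) == 0) // the n-th line replaces the result with probability 1/n
result = line;
}
}
return result;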
}
}
Either you
read the file twice - once to count the number of lines, the second time to extract a random line, or
use reservoir sampling
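A minimal sketch of the first option, two sequential passes with constant memory, assuming the file does not change between passes and is UTF-8; TwoPassLineSampler and randomLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

class TwoPassLineSampler {
    static String randomLine(Path file, Random rnd) throws IOException {
        // Pass 1: count the lines.
        long count = 0;
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (r.readLine() != null) {
                count++;
            }
        }
        if (count == 0) {
            return null; // empty file
        }
        // Choose a line index uniformly in [0, count); the clamp guards against rounding.
        long target = Math.min(count - 1, (long) (rnd.nextDouble() * count));
        // Pass 2: skip to that line and return it.
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            for (long i = 0; i < target; i++) {
                r.readLine();
            }
            return r.readLine();
        }
    }
}
```

Reservoir sampling, as in choose() above, avoids the second pass at the cost of one random number per line.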
Looking over Itay's answer, it looks as though it reads the file a thousand times over, sampling one line each time, whereas true reservoir sampling should only go over the 'tape' once. I've written some code that goes over the file once with real reservoir sampling, based on this and various descriptions on the web.
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
public class reservoirSampling {
public static void main(String[] args) throws FileNotFoundException, IOException{
Sampler mySampler = new Sampler();
List<String> myList = mySampler.sampler(10);
for(int index = 0;index<myList.size();index++){
System.out.println(myList.get(index));
}
}
}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;
public class Sampler {
public Sampler(){}
public List<String> sampler (int reservoirSize) throws FileNotFoundException, IOException
{
String currentLine=null;
//reservoirList is where our selected lines stored
List <String> reservoirList= new ArrayList<String>(reservoirSize);
// we will use this counter to count the current line number while iterating
int count=0;
Random ra = new Random();
int randomNumber = 0;
try (Scanner sc = new Scanner(new File("Open_source.html")).useDelimiter("\n")) {
while (sc.hasNext())
{
currentLine = sc.next();
count ++;
if (count<=reservoirSize)
{
reservoirList.add(currentLine);
}
// after the reservoir is full, line number count replaces a random slot with probability reservoirSize/count
else if ((randomNumber = ra.nextInt(count))<reservoirSize)
{
reservoirList.set(randomNumber, currentLine);
}
}
}
return reservoirList;
}
}
The basic premise is that you fill up the reservoir with the first reservoirSize lines; after that, the i-th line overwrites a randomly chosen slot with probability reservoirSize/i. The file is read only once, so I hope this is more efficient. Please let me know if this doesn't work for you, as I've literally knocked it up in half an hour.
Use RandomAccessFile:
Construct a RandomAccessFile, file
Get the length of that file, filelen, by calling file.length()
Generate a random number, pos, between 0 and filelen
Call file.seek(pos) to seek to the random position
Call file.readLine() to get to the end of the current line
Read the next line by calling file.readLine() again
Using this method, I've been sampling lines from the Brown Corpus at random, and can easily retrieve 1000 random samples from randomly chosen files in a few seconds. If I tried to do the same by reading through each file line by line, it would take much longer.
The same principle can be used for selecting random elements from a list. Rather than reading through the list and stopping at a random place, if you generate a random number between 0 and the length of the list, then you can index directly into the list.
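For an in-memory list the idea reduces to one call; a minimal sketch with a hypothetical helper name. Direct indexing is O(1) only for random-access lists such as ArrayList; on a LinkedList, get(i) still walks the list.

```java
import java.util.List;
import java.util.Random;

class RandomElement {
    // Picks a uniformly random element by indexing directly instead of scanning.
    static <T> T pick(List<T> list, Random rnd) {
        if (list.isEmpty()) {
            throw new IllegalArgumentException("empty list");
        }
        return list.get(rnd.nextInt(list.size())); // index in [0, size)
    }
}
```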
Reading a random line from a file in java:
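import java.io.IOException;
import java.io.RandomAccessFile;
public String getRandomLineFromTheFile(String filePathWithFileName) throws IOException {
try (RandomAccessFile f = new RandomAccessFile(filePathWithFileName, "r")) {
final long randomLocation = (long) (Math.random() * f.length());
f.seek(randomLocation);
f.readLine(); // skip the rest of the (probably partial) line we landed in
String randomLine = f.readLine();
if (randomLine == null) { // landed in the last line: wrap around to the first line
f.seek(0);
randomLine = f.readLine();
}
// Caveats: a line's chance is proportional to the length of the line before it (not uniform),
// and RandomAccessFile.readLine() maps each byte to one char, so it does not decode UTF-8.
return randomLine;
}
}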
Use a BufferedReader and read line wise. Use the java.util.Random object to stop randomly ;)
I want to store some 0s and 1s into memory
I do not know how to explain this clearly but I will try my best to do so.
Let's say I have an image file of around 420 bytes.
I want to visualize its binary code, meaning I want to see the 0s and 1s. I ran this piece of code to do that, and it works just fine:
import java.util.Scanner;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
public class fileToBin {
public static void main(String[] args) throws Exception {
StringBuilder sb = new StringBuilder();
Scanner ana = new Scanner(System.in);
System.out.println("File?");
String fileName = ana.nextLine();
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(fileName))) {
for (int b; (b = is.read()) != -1;) {
String s = "0000000" + Integer.toBinaryString(b);
s = s.substring(s.length() - 8);
sb.append(s);
}
}
System.out.println(sb);
}
}
I passed FF0000.png as input and got the following output:
10001001010100000100111001000111000011010000101000011010000010100000000000000000000000000000110101001001010010000100010001010010000000000000000000000000100000000000000000000000000000001000000000001000000001100000000000000000000000001100001100111110011000011100101100000000000000000000000000000001011100110101001001000111010000100000000010101110110011100001110011101001000000000000000000000000000001000110011101000001010011010100000100000000000000001011000110001111000010111111110001100001000001010000000000000000000000000000100101110000010010000101100101110011000000000000000000001110110000110000000000000000000011101100001100000001110001110110111110101000011001000000000000000000000000010011100101001001010001000100000101010100011110000101111011101101110100100011000100000001000000000000000000001100110000111010000011111010001101111011110100001001000010010000011100001110110110000110110101000111100101110000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001
0000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010011001000010110110011110110101011011010001100001110111001011110010011001011111011001101010000000000000000000000000000000000100100101000101010011100100010010101110010000100110000010000010
I understand that this is the memory representation (please correct me if I'm wrong about any of these terms) of this particular file.
Now, let's say I don't have any image file and didn't retrieve the binary code of any image file. The only thing I have is these 0s and 1s, and I don't know whether they actually represent a file or not. I have no idea what they represent.
I want to insert/load these 0s and 1s into computer memory. How can I do that?
This is the reverse of my earlier process, where I retrieved the binary code from a file: now I want to load some 0s and 1s into memory and save them as a file. It does not need to be an image file; any file extension is okay, because I'm assuming I don't know that an image file exists.
So, my main task is I have some 0s and 1s and I want to load it to memory and save as a file. Is it possible to do that? How can I do this with Java or any other programming language? How does this memory and binary representation work?
Sorry for my noobness and thank you for your patience :)
Given a String of binary called str and some sort of OutputStream (e.g. a FileOutputStream) called out:
For every 8 characters in str, get the byte's numerical value with Integer.parseInt, and write it to out.
String str = ...;
OutputStream out = ...;
for (int i = 0; i < str.length(); i += 8) {
String byteStr = str.substring(i, i+8);
int byteVal = Integer.parseInt(byteStr, 2);
out.write(byteVal);
}
Note that this will throw an IndexOutOfBoundsException if str.length() isn't a multiple of 8.
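A self-contained version of this approach, as a sketch: BitsToFile, bitsToBytes and writeBits are hypothetical names. It strips whitespace first (copied bit strings are often wrapped across lines) and rejects input whose length is not a multiple of 8 up front instead of failing partway through. As a sanity check, the first 64 bits of the question's output decode to 89 50 4E 47 0D 0A 1A 0A, the PNG file signature, so writing the whole string back out as a .png should recreate the image, provided the string was copied intact.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BitsToFile {
    // Converts '0'/'1' characters into bytes, 8 bits per byte, most significant bit first.
    static byte[] bitsToBytes(String bits) {
        String clean = bits.replaceAll("\\s+", ""); // drop line breaks and spaces
        if (clean.length() % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8, got " + clean.length());
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(clean.length() / 8);
        for (int i = 0; i < clean.length(); i += 8) {
            // parseInt(..., 2) gives 0..255; write(int) keeps only the low 8 bits.
            out.write(Integer.parseInt(clean.substring(i, i + 8), 2));
        }
        return out.toByteArray();
    }

    // Writes the decoded bytes to a file; the name and extension are up to you.
    static void writeBits(String bits, Path target) throws IOException {
        Files.write(target, bitsToBytes(bits));
    }
}
```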
I'm having problems trying to keep score in my "guessing" game. I have to use a for loop or a while loop. I have it so 10 random numbers are created in a text file called mystery.txt, and a file reader reads these numbers from the text file.
Your score starts at 0. If the user guesses the correct number from the text file, they get -10 points. If they get the number wrong, the absolute difference between their guess and the number in the file is added to their score. The lower the final score, the better.
When I only run my if else statement once, it works correctly. Once I loop it more than once it starts to act up.
I have to use an if else statement and a for or while loop. Thanks!
Edit: It turns out I have to use a for loop, not a while loop. I'm completely lost now.
How it should work:
When you run the program, a text file gets generated with 10 different numbers (I already have the code for that ready). The user is asked to enter a number, and that number is compared to the first number in the text file. If it is the same number, they get -10 points added to their score. If they get it wrong, the difference between the number they guessed and the number in the text file is added to the score. This is supposed to repeat ten times.
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.Random;
import java.lang.Math;
public class lab4Fall15 {
public static void numberGuessingGame() throws IOException {
Scanner myscnr = new Scanner (System.in);
PrintWriter mywriter = new PrintWriter("mysteryNumber.txt");
int randomNumber;
int i = 0;
i=1;
while (i<=10) {
Random randomGen = new Random();
randomNumber= randomGen.nextInt(11);
// then store (write) it in the file
mywriter.println(randomNumber);
i = i + 1;
}
//Decided to use a loop to generate the numbers-------
mywriter.close();
FileReader fr = new FileReader("./mysteryNumber.txt");
BufferedReader textReader = new BufferedReader(fr);
int numberInFile;
// the number in your file is as follows:
numberInFile = Integer.valueOf(textReader.readLine());
int score= 0;
int a = 1;
while (a<=10) {
a=a+1;
System.out.print ("Please enter a number between 0 and 10: ");
int userNumber= myscnr.nextInt();
if (userNumber==numberInFile){
score = score-10;
}
else{
score = score + Math.abs(userNumber-numberInFile);
}
System.out.println ("current score is: "+score);
}
System.out.println ("your score is "+score);
textReader.close();
}
public static void main(String[] args) throws IOException {
// ...
numberGuessingGame();
}
}
if (userNumber==numberInFile){
score = score-10;
}
I'm not sure what you mean, but I can guess. The code above doesn't show any error by itself, so check the part that sets the variable numberInFile: sometimes the file reader returns whitespace or other unexpected text. First print that value, or assign it manually and check the output. If that works, fix the code that reads the file.
OK, first, let's just go over for loops, since that's what your question was asking about. From the code you provided, it seems that you already understand while loops, and that's good, because in Java, for loops are (usually) just while loops in disguise. In general, if you have this while loop,
int a = 0;
while (a < 10) {
// do stuff with a
a = a + 1; // or ++a or a++
}
You can always rewrite it like this:
for (int a = 0; a < 10; a = a + 1) {
// do stuff with a
}
By convention (and this convention is useful when you study arrays and Collection types) you'll want to index your loops from 0 rather than 1. Since you're just learning, take my word for it for now. Loop from 0 to n-1, not from 1 to n.
With that out of the way, let's tackle why you're getting the wrong answer (which, incidentally, has nothing at all to do with loops). Rewritten as a for loop, the ask-and-score part of your program looks like this.
for (int a = 0; a < 10; ++a) {
System.out.print ("Please enter a number between 0 and 10: ");
int userNumber = myscnr.nextInt();
if (userNumber == numberInFile){
score = score - 10;
} else {
score = score + Math.abs(userNumber - numberInFile);
}
System.out.println ("current score is: "+score);
}
You will note that nowhere in this section do you update the value of numberInFile. That means that every run of this loop is still looking at whatever value that variable had at the beginning of the loop. That value came from this line:
// the number in your file is as follows:
numberInFile = Integer.valueOf(textReader.readLine());
That line is executed exactly once, before the loop runs. If you want to load the next number every time the user guesses a number, you'll need to move it inside the loop. I'll leave that as an exercise to the reader.
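One way to apply that, as a minimal sketch: the read moves to the top of the for loop body, so each round compares against the next number in the file. The method name play and its parameters are hypothetical; passing in the reader and scanner (textReader and myscnr in the question) keeps the logic testable. It assumes the file holds at least `rounds` numbers, one per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Scanner;

class GuessingRounds {
    // Plays `rounds` rounds and returns the final score.
    static int play(BufferedReader numbers, Scanner input, int rounds) throws IOException {
        int score = 0;
        for (int a = 0; a < rounds; ++a) {
            // Read a new mystery number every round (this line moved inside the loop).
            int numberInFile = Integer.parseInt(numbers.readLine().trim());
            System.out.print("Please enter a number between 0 and 10: ");
            int userNumber = input.nextInt();
            if (userNumber == numberInFile) {
                score = score - 10;
            } else {
                score = score + Math.abs(userNumber - numberInFile);
            }
            System.out.println("current score is: " + score);
        }
        return score;
    }
}
```

In the question's program this would be called as play(textReader, myscnr, 10) after the file is written.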
You are not actually capturing the number the user is entering. Try this:
int userNumber = Integer.parseInt(KeyIn.readLine());
I'm new to Stack Overflow, and this is my first question/post! I'm working on a project for school using Java. The first part I'm having trouble with involves:
Read each line in a file (listed at the end of my post) one time
Create a "ragged" array of integers, 4 by X, where my 4 rows are the region numbers (the numbers in the "Nbr of Region" column), and fill each row with the state populations for that region.
So, for example, Row 1 would hold the state populations of Region 1 resulting in 6 columns, Row 2 represents Region 2 resulting in 7 columns, and so on resulting in a "ragged" array.
My question is how to populate, or what would be the best way to populate, my array with the results of my file read. I know how to declare the array, initialize it and create space in it, but I'm not sure how to write the method in my State class that puts the results of my file read into the array. Right now I'm getting an "out of bounds" error when I run this code in NetBeans.
Here is my code for Main and State. my input file is listed beneath it:
import java.util.*;
import java.io.*;
public class Main
{
public static void main(String[] args) throws IOException
{
// create new jagged array obj and fill it with some
// initial "dummy" values
int[][] arrPopulation =
{
{0,1,2,3,4,5},
{0,1,2,3,4,5,6},
{0,1,2,3,4,5,6,7,8,9},
{0,1,2,3,4,5,6,7,8,9,10}
};//end array declaration
// read in file States.txt, instantiate BufferedReader object,
// set new BufferedReader object to variable #newLine
FileReader f = new FileReader("States.txt");
BufferedReader br = new BufferedReader(f);
String newLine = br.readLine();
for (int rows = 0; rows < arrPopulation.length; rows++)
{
for (int col = 0; col < arrPopulation[col].length; col++) {
System.out.print(arrPopulation[rows][col] + " ");
}
// display on new lines; print out in a "table" format
System.out.println();
} // end for
State newState = new State(newLine);
int count = 0;
while(newLine != null)
{
newLine = br.readLine();
System.out.println(newState.getRegionNum());
}// end while
br.close();//close stream
} // end public static void main
} // end main
This is what I have for my State class so far:
import java.util.*;
public class State
{
private String statePop, stateNum, regionNum;
public State(String fileRead)
{
statePop = fileRead.substring(32,39);
regionNum = fileRead.substring(55,fileRead.length());
} // end constructor
public int getStatePop()
{
int population = Integer.parseInt(statePop);
return population;
} // #method getStatePop end method
public int getRegionNum()
{
int numOfRegion = Integer.parseInt(regionNum);
return numOfRegion;
}// end getRegionNum
public int getAvgPop()
{
int average = 2+2;
return average;
// total number of populations
// divide number of populations
}// #return the average population of states
public int getStateTotal()
{
//initialize static variable
int totalPopulation = 0;
int stateTotal = this.getStatePop() + totalPopulation;
return stateTotal;
} // #return stateTotal
public String toString()
{
return statePop + " " + stateNum + " ";
} // #method end toString method
} // end State class
The names of the columns (not used in the file read, just for explaining purposes) are:
State Capital Abbrev Population Region Nbr of Region
Washington Olympia WA 5689263West 6
Oregon Salem OR 3281974West 6
Massachusetts Boston MA 6147132New_England 1
Connecticut Hartford CT 3274069New_England 1
Rhode_Island Providence RI 988480New_England 1
New_York Albany NY18146200Middle_Atlantic2
Pennsylvania Harrisburg PA12001451Middle_Atlantic2
New_Jersey Trenton NJ 8115011Middle_Atlantic2
Maryland Annapolis MD 5134808Middle_Atlantic2
West_Virginia Charleston WV 1811156Middle_Atlantic2
Delaware Dover DE 743603Middle_Atlantic2
Virginia Richmond VA 6791345Middle_Atlantic2
South_Carolina Columbia SC 3835962South 3
Tennessee Nashville TN 5430621South 3
Maine Augusta ME 1244250New_England 1
Vermont Montpelier VT 588632New_England 1
New_Hampshire Concord NH 1185048New_England 1
Georgia Atlanta GA 7642207South 3
Florida Tallahassee FL14915980South 3
Alabama Montgomery AL 4351999South 3
Arkansas Little_Rock AR 2538303South 3
Louisiana Baton_Rouge LA 4368967South 3
Kentucky Frankfort KY 3936499South 3
Mississippi Jackson MS 2752092South 3
North_Carolina Raleigh NC 7546493South 3
California Sacramento CA32182118West 6
Idaho Boise ID 1228684West 6
Montana Helena MT 880453West 6
Wyoming Cheyenne WY 480907West 6
Nevada Carson_City NV 1746898West 6
Utah Salt_Lake_City UT 2099758West 6
Colorado Denver CO 3970971West 6
Alaska Juno AK 614010West 6
Hawaii Honolulu HI 1193001West 6
Am I on the right track here?
Thanks for any help in advance, and sorry for the long post!
The Collections framework (java.util) would be useful here.
Create a Map<Integer, List<Integer>>.
As you scan the file one line at a time, grab the region number and population.
Is the region number a key in the map?
No? Then create a new List<Integer> and add it to the map under that key. Now it is.
Add the population to the list.
Now the map should have four entries. Create the outer array like this: int[][] array = new int[4][];
Then iterate over the lists in the map and create each inner array: array[i] = new int[list.size()];
Then iterate over each list and fill its inner array: array[i][j] = list.get(j);
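The steps above can be sketched in Java as follows, assuming Java 8+ for computeIfAbsent. RaggedRegions, group and toRagged are hypothetical names; in the real program the two input arrays would be filled line by line from the question's State.getRegionNum() and State.getStatePop(). A TreeMap keeps the rows ordered by region number (1, 2, 3, 6), and sizing the outer array from the map gives the 4 rows described above for this data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RaggedRegions {
    // Groups populations by region number, creating each region's list on first use.
    static Map<Integer, List<Integer>> group(int[] regionNums, int[] populations) {
        Map<Integer, List<Integer>> byRegion = new TreeMap<>();
        for (int k = 0; k < regionNums.length; k++) {
            byRegion.computeIfAbsent(regionNums[k], r -> new ArrayList<>()).add(populations[k]);
        }
        return byRegion;
    }

    // Copies each list into its own row of a jagged int[][].
    static int[][] toRagged(Map<Integer, List<Integer>> byRegion) {
        int[][] array = new int[byRegion.size()][];
        int i = 0;
        for (List<Integer> list : byRegion.values()) {
            array[i] = new int[list.size()];
            for (int j = 0; j < list.size(); j++) {
                array[i][j] = list.get(j);
            }
            i++;
        }
        return array;
    }
}
```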
thanks for your quick response.
My professor hasn't gone over the Collections package yet (looking at some of the classes in the package, I think we'll go over those things next semester) so I'm not too familiar with the Map interface. Also, I think he wants us to use arrays specifically, although he did say that we could use ArrayList...
Up until I read your post, I was re-writing my code in another attempt to solve the problem. I'm going to look at the Map interface, but out of curiosity am I close with the following code? Seems like I just need to fix one line to correct my out of bounds error...
import java.util.*;
import java.io.*;
public class Main
{
public static void main(String[] args) throws IOException
{
// read in file States.txt, instantiate BufferedReader object,
// set new BufferedReader object to variable #newLine
FileReader f = new FileReader("States.txt");
BufferedReader br = new BufferedReader(f);
String newLine = br.readLine();
//declare jagged array
int[][] arrPopulation;
arrPopulation = new int[4][];
//create state object, and get
State newState = new State(newLine);
//read file
int col = 0;
arrPopulation[0] = new int[col];
arrPopulation[1] = new int[col];
arrPopulation[2] = new int[col];
arrPopulation[3] = new int[col];
int stateRegion = newState.getRegionNum();
while (newLine != null)
{
switch (stateRegion)
{
case 1:
arrPopulation[0][col] = newState.getStatePop();
System.out.println("Population added:" +
arrPopulation[0][col]);//test if col was created
col++; //increment columns
break;
case 2:
arrPopulation[1][col] = newState.getStatePop();
col++; //increment columns
break;
case 3:
arrPopulation[2][col] = newState.getStatePop();
col++;
break;
case 6:
arrPopulation[3][col] = newState.getStatePop();
System.out.println("Population added:" +
arrPopulation[3][col]);
col++; //increment columns
break;
}
br.readLine();
}//endwhile
br.close();//close stream
} // end public static void main
} // end main
Sorry if I'm supposed to place all this in the Comments section, however, I hit my character limit and couldn't post my code. Thanks again for your help!