Java Big File Sorter

I am doing an assignment for my class where I have a 7 MB file. I am supposed to process it in 2 phases.
Phase 1: I add each word from the file to an ArrayList and sort it alphabetically. I then write every 100,000 words to their own file, so I end up with 12 files in total, named as shown in the code below.
Phase 2: For every pair of files, I read one word at a time from each file and write whichever comes first alphabetically to a new file, until the two files have been merged into one sorted file. I do this in a loop, so the number of files is halved on each pass while staying sorted, until the whole 7 MB is sorted into a single file.
What I am having trouble with: in phase 2 I can read the files produced by phase 1, but my files seem to be copied repeatedly into multiple files rather than being sorted and merged. Any help is appreciated, thank you.
File: It seems I cannot upload the .txt file, but the code should work for a file with any number of lines; only the number-of-lines variable needs to be changed.
Summary: 1 big unsorted file is turned into multiple sorted files (i.e. 12). The first merge pass turns them into 6 files, the second into 3, the third into 2, and the fourth back into 1 big sorted file.
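The file counts in the summary follow the same rule as num_files_initial / 2 + num_files_initial % 2 in phase2(): each pass merges files in pairs and carries an odd file over unchanged. A quick sketch to check that schedule (MergeSchedule and schedule are hypothetical names, not part of the assignment code):

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSchedule {
    // Returns the number of files present before the first pass and after each merge pass.
    static List<Integer> schedule(int files) {
        List<Integer> counts = new ArrayList<>();
        counts.add(files);
        while (files > 1) {
            files = files / 2 + files % 2; // pairs are merged; an odd file is carried over
            counts.add(files);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(schedule(12)); // [12, 6, 3, 2, 1]
    }
}
```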
Code:
package Assignment11;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;
public class FileSorter_1
{
public static ArrayList<String> storyline = new ArrayList<String>();
public static int num_lines = 100000; //this number can be changed
public static int num_files_initial;
public static int num_files_sec;
public static void main(String[] args) throws IOException
{
phase1();
phase2();
}
public static void phase1() throws IOException
{
Scanner story = new Scanner(new File("Aesop_Shakespeare_Shelley_Twain.txt")); //file name
int f = 0;
while(story.hasNext())
{
int i = 0;
while(story.hasNext())
{
String temp = story.next();
storyline.add(temp);
i++;
if(i > num_lines)
{
break;
}
}
Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialze new file
for(int x = 0; x<num_lines;x++)
{
write2file.write(storyline.get(x));
write2file.newLine();
}
write2file.close();
f++;
}
num_files_initial = f;
}
public static void phase2() throws IOException
{
int file_n = 1;
int prev_fn = 0;
int t = 0;
int g = 0;
while(g<5)
{
System.out.println(num_files_initial);
if(t+1 > num_files_initial-1)
{
if(num_files_initial % 2 != 0)
{
BufferedWriter w = new BufferedWriter(new FileWriter("temp_"+file_n +"_" + g + ".txt"));
Scanner file1 = new Scanner(new File("temp_"+prev_fn +"_" + t + ".txt"));
String word1 = file1.next();
while(file1.hasNext())
{
w.write(word1);
w.newLine();
}
g++;
break;
}
num_files_initial = num_files_initial / 2 + num_files_initial % 2;
g = 0;
t = 0;
file_n++;
prev_fn++;
}
String s1="temp_"+file_n +"_" + g + ".txt";
String s2="temp_"+prev_fn +"_" + t + ".txt";
String s3="temp_"+prev_fn +"_" + (t+1) + ".txt";
System.out.println(s2);
System.out.println(s3);
BufferedWriter w = new BufferedWriter(new FileWriter(s1));
Scanner file1 = new Scanner(new File(s2));
Scanner file2 = new Scanner(new File(s3));
String word1 = file1.next();
String word2 = file2.next();
System.out.println(num_files_initial);
//System.out.println(t);
//System.out.println(g);
while(file1.hasNext() && file2.hasNext())
{
if(word1.compareTo(word2) == 1) //if word 1 comes first = 1
{
w.write(word1);
w.newLine();
file1.next();
}
if(word1.compareTo(word2) == 0) //if word 1 comes second = 0
{
w.write(word2);
w.newLine();
file2.next();
}
}
while(file1.hasNext())
{
w.write(word1);
w.newLine();
break;
}
while(file2.hasNext())
{
w.write(word2);
w.newLine();
break;
}
g++;
t+=2;
w.close();
file1.close();
file2.close();
}
}
}

You never clear storyline after writing a chunk to its file, so the list keeps growing: each new file gets the first num_lines words of everything read so far, which is why the same data ends up copied into multiple files. Here are some fixes:
...
int f = 0;
while(story.hasNext())
{
// re-initialize the list here so each chunk starts empty
storyline = new ArrayList<>();
int i = 0;
while(story.hasNext())
{
String temp = story.next();
storyline.add(temp);
i++;
// >= so that each chunk holds exactly num_lines words
if(i >= num_lines)
{
break;
}
}
Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialize new file
// use i (the number of words actually read) instead of num_lines; the last chunk may be shorter
for(int x = 0; x<i;x++)
{
write2file.write(storyline.get(x));
write2file.newLine();
}
write2file.close();
f++;
}
num_files_initial = f;
Hope this helps.
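That fixes phase 1, but phase 2 in the question has separate problems: compareTo returns any negative or positive number (not just 1 or 0), word1 and word2 are never reassigned after file1.next() or file2.next(), the two tail loops break after writing a single word, the odd-file branch never advances file1 so its while loop never ends, and phase 1 sorted with String.CASE_INSENSITIVE_ORDER while phase 2 compares case-sensitively. A minimal sketch of a two-way merge plus the pass loop, assuming one word per line and the temp_<pass>_<index>.txt naming from the question (MergeSketch, mergeTwo and mergeAll are hypothetical names):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;

public class MergeSketch {
    // Must match the ordering phase 1 used when sorting each chunk.
    static final Comparator<String> ORDER = String.CASE_INSENSITIVE_ORDER;

    /** Merges two sorted files (one word per line) into out. */
    static void mergeTwo(Path a, Path b, Path out) throws IOException {
        try (BufferedReader r1 = Files.newBufferedReader(a);
             BufferedReader r2 = Files.newBufferedReader(b);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String w1 = r1.readLine();
            String w2 = r2.readLine();
            while (w1 != null && w2 != null) {
                // compare() may return any negative, zero or positive value
                if (ORDER.compare(w1, w2) <= 0) {
                    w.write(w1); w.newLine();
                    w1 = r1.readLine(); // advance only the file the word came from
                } else {
                    w.write(w2); w.newLine();
                    w2 = r2.readLine();
                }
            }
            // Copy whatever remains in the file that did not run out first.
            for (; w1 != null; w1 = r1.readLine()) { w.write(w1); w.newLine(); }
            for (; w2 != null; w2 = r2.readLine()) { w.write(w2); w.newLine(); }
        }
    }

    /** Merges temp_p_0, temp_p_1, ... pairwise into temp_(p+1)_* until one file is left. */
    static Path mergeAll(Path dir, int numFiles) throws IOException {
        int pass = 0;
        while (numFiles > 1) {
            int next = 0;
            for (int t = 0; t < numFiles; t += 2, next++) {
                Path out = dir.resolve("temp_" + (pass + 1) + "_" + next + ".txt");
                Path first = dir.resolve("temp_" + pass + "_" + t + ".txt");
                if (t + 1 < numFiles) {
                    mergeTwo(first, dir.resolve("temp_" + pass + "_" + (t + 1) + ".txt"), out);
                } else {
                    // Odd file out: carry it to the next pass unchanged.
                    Files.copy(first, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            numFiles = next;
            pass++;
        }
        return dir.resolve("temp_" + pass + "_0.txt");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sorter");
        // Made-up input: three already-sorted chunk files.
        Files.write(dir.resolve("temp_0_0.txt"), List.of("apple", "Dog", "zebra"));
        Files.write(dir.resolve("temp_0_1.txt"), List.of("banana", "cat"));
        Files.write(dir.resolve("temp_0_2.txt"), List.of("Aardvark", "eel"));
        System.out.println(Files.readAllLines(mergeAll(dir, 3)));
        // prints [Aardvark, apple, banana, cat, Dog, eel, zebra]
    }
}
```

Reading with readLine() and checking for null avoids the hasNext()/next() bookkeeping that drops the last word in the original loop.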

Related

How to store Header in all splitted CSV files?

The Java code below splits a big .csv file into multiple .csv files. How can I store the header in all of the split files?
import java.io.*;
import java.util.Scanner;
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class split {
public static void main(String args[]) {
try {
String inputfile = "E:/Sumit/csv-splitting-2/Proposal_Details__c.csv";
System.out.println("Input Path is :- " + inputfile);
double nol = 100000.0;
File file = new File(inputfile);
Scanner scanner = new Scanner(file);
int count = 0;
while (scanner.hasNextLine()) {
scanner.nextLine();
count++;
}
System.out.println("Lines in the file: " + count);
double temp = (count / nol);
int temp1 = (int) temp;
int nof = 0;
if (temp1 == temp) {
nof = temp1;
} else {
nof = temp1 + 1;
}
System.out.println("No. of files to be generated :" + nof);
FileInputStream fstream = new FileInputStream(inputfile);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
for (int j = 1; j <= nof; j++) {
String outputpath = "E:/Sumit/csv-splitting-2/";
String outputfile = "File-2-Proposal_Details__c" + j + ".csv";
System.out.println(outputpath + outputfile);
FileWriter fstream1 = new FileWriter(outputpath + outputfile);
BufferedWriter out = new BufferedWriter(fstream1);
for (int i = 1; i <= nol; i++) {
strLine = br.readLine();
if (strLine != null) {
out.write(strLine);
if (i != nol) {
out.newLine();
}
}
}
out.close();
}
in.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
}
}
Assuming your first line is the header, you can store it in a String header by reading the first line, e.g. header = br.readLine();.
In your for loop over nof (which I assume means number_of_files), always write the header as the first line whenever you create a new file.
It would be something like this:
Before your for loop, save the header in a variable:
String header = br.readLine();
You have two for loops: one that creates each file, and one that writes each line to the newly created file.
Inside the first for loop, right after you create the file, write the header to it: out.write(header); out.newLine(); Note that count then includes the header line, so subtract 1 from it when computing nof.
General tips:
Use variable names that make sense. nol, nof, j... none of them are descriptive; you could call them numOfLines, numOfFiles and currentFile, for example.
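A sketch of the same idea that reads the header once and writes it at the top of each new file. Unlike the question's code, it does not count the lines first; it simply starts a new file every linesPerFile data rows, so the header is never counted as a data line. SplitWithHeader, split and the part- prefix are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SplitWithHeader {
    /** Splits input into files of at most linesPerFile data rows, each starting with the header. */
    static List<Path> split(Path input, Path outDir, String prefix, int linesPerFile) throws IOException {
        List<Path> written = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(input)) {
            String header = br.readLine(); // read once, before any output file is created
            if (header == null) return written; // empty input: nothing to split
            String line = br.readLine();
            int fileNo = 1;
            while (line != null) {
                Path out = outDir.resolve(prefix + fileNo + ".csv");
                try (BufferedWriter bw = Files.newBufferedWriter(out)) {
                    bw.write(header);
                    bw.newLine();
                    for (int i = 0; i < linesPerFile && line != null; i++) {
                        bw.write(line);
                        bw.newLine();
                        line = br.readLine();
                    }
                }
                written.add(out);
                fileNo++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split");
        Path in = dir.resolve("input.csv");
        Files.write(in, List.of("id,name", "1,a", "2,b", "3,c"));
        for (Path p : split(in, dir, "part-", 2)) {
            System.out.println(p.getFileName() + " " + Files.readAllLines(p));
        }
        // prints: part-1.csv [id,name, 1,a, 2,b]
        //         part-2.csv [id,name, 3,c]
    }
}
```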

Writing an object array to csv file in java

I am trying to take an initial CSV file and pass it through a class that checks another file for an A or a D, and then adds or deletes the associated entry in an array of objects.
example of pokemon.csv:
1, Bulbasaur
2, Ivysaur
3, venasaur
example of changeList.csv:
A, Charizard
A, Suirtle
D, 2
However, I am having a lot of trouble writing the contents of my new array to a new CSV file. I have checked that my array and class files are working properly, but every attempt to write the final contents of the "pokedex1" object array to the new CSV file has failed.
Main File
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class PokedexManager {
public static void printArray(String[] array) {
System.out.print("Contents of array: ");
for(int i = 0; i < array.length; i++) {
if(i == array.length - 1) {
System.out.print(array[i]);
}else {
System.out.print(array[i] + ",");
}
}
System.out.println();
}
public static void main(String[] args) {
try {
//output for pokedex1 using PokemonNoGaps class
PokemonNoGaps pokedex1 = new PokemonNoGaps();
//initializes scanner to read from csv file
String pokedexFilename = "pokedex.csv";
File pokedexFile = new File(pokedexFilename);
Scanner pokescanner = new Scanner(pokedexFile);
//reads csv file, parses it into an array, and then adds new pokemon objects to Pokemon class
while(pokescanner.hasNextLine()) {
String pokeLine = pokescanner.nextLine();
String[] pokemonStringArray = pokeLine.split(", ");
int id = Integer.parseInt(pokemonStringArray[0]);
String name = pokemonStringArray[1];
Pokemon apokemon = new Pokemon(id, name);
pokedex1.add(apokemon);
}
//opens changeList.csv file to add or delete entries from Pokemon class
String changeListfilename = "changeList.csv";
File changeListFile = new File(changeListfilename);
Scanner changeScanner = new Scanner(changeListFile);
//loads text from csv file to be parsed to PokemonNoGaps class
while(changeScanner.hasNextLine()) {
String changeLine = changeScanner.nextLine();
String[] changeStringArray = changeLine.split(", ");
String action = changeStringArray[0];
String nameOrId = changeStringArray[1];
//if changList.csv file line has an "A" in the first spot add this entry to somePokemon
if(action.equals("A")) {
int newId = pokedex1.getNewId();
String name = nameOrId;
Pokemon somePokemon = new Pokemon(newId, name);
pokedex1.add(somePokemon);
}
//if it has a "D" then send it to PokemonNoGaps class to delete the entry from the array
else { //"D"
int someId = Integer.parseInt(nameOrId);
pokedex1.deleteById(someId);
}
//tests the action being taken and the update to the array
//System.out.println(action + "\t" + nameOrId + "\n");
System.out.println(pokedex1);
//*(supposedly)* prints the resulting contents of the array to a new csv file
String[] pokemonList = changeStringArray;
try {
String outputFile1 = "pokedex1.csv";
FileWriter writer1 = new FileWriter(outputFile1);
writer1.write(String.valueOf(pokemonList));
} catch (IOException e) {
System.out.println("\nError writing to Pokedex1.csv!");
e.printStackTrace();
}
}
//tests final contents of array after being passed through PokemonNoGaps class
//System.out.println(pokedex1);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
PokemonNoGaps class file:
public class PokemonNoGaps implements ChangePokedex {
private Pokemon[] pokedex = new Pokemon[1];
private int numElements = 0;
private static int id = 0;
// add, delete, search
@Override
public void add(Pokemon apokemon) {
// if you have space
this.pokedex[this.numElements] = apokemon;
this.numElements++;
// if you don't have space
if(this.numElements == pokedex.length) {
Pokemon[] newPokedex = new Pokemon[ this.numElements * 2]; // create new array
for(int i = 0; i < pokedex.length; i++) { // transfer all elements from array into bigger array
newPokedex[i] = pokedex[i];
}
this.pokedex = newPokedex;
}
this.id++;
}
public int getNewId() {
return this.id + 1;
}
@Override
public void deleteById(int id) {
for(int i = 0; i < numElements; i++) {
if(pokedex[i].getId() == id) {
for(int j = i+1; j < pokedex.length; j++) {
pokedex[j-1] = pokedex[j];
}
numElements--;
pokedex[numElements] = null;
}
}
}
public Pokemon getFirstElement() {
return pokedex[0];
}
public int getNumElements() {
return numElements;
}
public String toString() {
String result = "";
for(int i = 0; i < this.numElements; i++) {
result += this.pokedex[i].toString() + "\n";
}
return result;
}
}
Expected output:
1, Bulbasaur
3, Venasaur
4, Charizard
5, Squirtle
Am I using the wrong file writer? Am I calling the file writer at the wrong time or incorrectly? In other words, I do not know why my output file is empty instead of containing the contents of my array. Can anybody help me out?
I spotted a few issues while running this. As mentioned in another answer, you want to open the file in append mode in the section of code that writes to the new pokedex1.csv:
try {
String outputFile1 = "pokedex1.csv";
FileWriter fileWriter = new FileWriter(outputFile1, true);
BufferedWriter bw = new BufferedWriter(fileWriter);
for(String pokemon : pokedex1.toString().split("\n")) {
System.out.println(pokemon);
bw.write(pokemon);
bw.newLine(); // one entry per line
}
bw.flush();
bw.close();
} catch (IOException e) {
System.out.println("\nError writing to Pokedex1.csv!");
e.printStackTrace();
}
I opted to use a BufferedWriter for the solution. Another issue I found is that you're reading pokedex.csv, but the file is named pokemon.csv.
String pokedexFilename = "pokemon.csv";
I made the above change to fix this issue.
On a side note, I noticed that you create several scanners to read the two files. With these types of resources it's good practice to call the close method once you have finished using them, as shown below.
Scanner pokescanner = new Scanner(pokedexFile);
// Use scanner code here
// Once finished with scanner
pokescanner.close();
String outputFile1 = "pokedex1.csv";
FileWriter writer1 = new FileWriter(outputFile1);
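appears to be inside your while loop, so a new (empty) file is created on every iteration.
Either use the FileWriter(File file, boolean append) constructor or create the writer before the loop. Note also that writer1 is never closed, so its buffered output may never reach the file, and String.valueOf(pokemonList) writes the array's type and hash code (something like [Ljava.lang.String;@...) rather than its contents.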
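Putting those points together, a minimal sketch that writes the pokedex once, after the changeList.csv loop has finished, and lets try-with-resources close (and therefore flush) the writer. Because the Pokemon and ChangePokedex classes are not shown, it takes the text that pokedex1.toString() would return as a plain String; WritePokedexSketch and writeEntries are hypothetical names:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WritePokedexSketch {
    /** Writes one entry per line; pokedexText stands in for pokedex1.toString(). */
    static void writeEntries(Path out, String pokedexText) throws IOException {
        // try-with-resources closes the writer, which flushes the buffered text to disk
        try (BufferedWriter bw = Files.newBufferedWriter(out)) {
            for (String entry : pokedexText.split("\n")) {
                bw.write(entry);
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // ... the while loop over changeList.csv would run here, updating pokedex1 ...
        String pokedexText = "1, Bulbasaur\n3, Venasaur\n4, Charizard\n5, Squirtle\n"; // made-up stand-in
        writeEntries(Paths.get("pokedex1.csv"), pokedexText); // called once, after the loop
    }
}
```

Writing once after the loop avoids both the truncation problem and the repeated copies that opening the file in append mode inside the loop would produce.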

Java - .csv file as input

My program simulates the FCFS scheduling algorithm. It takes a .csv file as input and outputs the average waiting time. I am having trouble reading the input file. This is the error I get when I run the code:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at main.FCFS.main(FCFS.java:16)
What am I doing wrong? I can't seem to figure it out. Please help.
package main;
//programming FCFS scheduling algorithm
import java.util.Scanner;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.File;
import java.io.FileInputStream;
public class FCFS {
public static void main(String[] args) throws FileNotFoundException {
// To Store Name of the file to be opened
String file = args[0];
int i = 0, n;
double AWT = 0, ATT = 0;
int AT[] = new int[100];
int BT[] = new int[100];
int WT[] = new int[100];
int TAT[] = new int[100];
int PID[] = new int[100];
// To open file in read mode
FileInputStream fin = null;
// To read input(file name) from standard input stream
Scanner s = new Scanner(new File("/Users/SLO/ex.csv"));
// To hold each single record obtained from CSV file
String oneRecord = "";
try {
// Open the CSV file for reading
fin = new FileInputStream(file);
// To read from CSV file
s = new Scanner(fin);
// Loop until all the records in CSV file are read
while (s.hasNextLine()) {
oneRecord = s.nextLine();
// Split record into fields using comma as separator
String[] details = oneRecord.split(",");
PID[i] = Integer.parseInt(details[0]);
AT[i] = Integer.parseInt(details[1]);
BT[i] = Integer.parseInt(details[2]);
System.out.printf("Process Id=%d\tArrival Time=%d\tBurst Time=%d\n", PID[i], AT[i], BT[i]);
i++;
}
WT[0] = 0;
for (n = 1; n < i; n++) {
WT[n] = WT[n - 1] + BT[n - 1];
WT[n] = WT[n] - AT[n];
}
for (n = 0; n < i; n++) {
TAT[n] = WT[n] + BT[n];
AWT = AWT + WT[n];
ATT = ATT + TAT[n];
}
System.out.println(" PROCESS BT WT TAT ");
for (n = 0; n < i; n++) {
System.out.println(" " + PID[n] + " " + BT[n] + " " + WT[n] + " " + TAT[n]);
}
System.out.println("Avg waiting time=" + AWT / i);
System.out.println("Avg waiting time=" + ATT / i);
} catch (FileNotFoundException e) {
System.out.printf("There is no CSV file with the name %s", file);
}
finally {
if (fin != null) {
try {
fin.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
args[0] throws an ArrayIndexOutOfBoundsException when the program is started without any command-line arguments, because args is then an empty array and index 0 does not exist. Add the following lines to check that the argument is passed:
...
public static void main(String[] args) throws FileNotFoundException {
if (args.length == 0)
throw new IllegalArgumentException("Missing mandatory file name in argument list");
// To Store Name of the file to be opened
String file = args[0];
...
If the missing argument is the reason for the failure, check out https://docs.oracle.com/javase/tutorial/essential/environment/cmdLineArgs.html to find out how to pass it properly.
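Separately from the missing argument, the waiting-time loop in the question computes WT[n] = WT[n - 1] + BT[n - 1] - AT[n]. Because WT[n - 1] already had AT[n - 1] subtracted, the result is wrong as soon as earlier processes have nonzero arrival times: for arrival times 0, 1, 2 and burst times 5, 3, 8 it gives 5 for the third process, although that process starts at 8 and waits 8 - 2 = 6. A minimal sketch that tracks when the CPU becomes free instead, assuming the rows are already sorted by arrival time (FcfsSketch and waitingTimes are hypothetical names, and the data is made up):

```java
import java.util.Arrays;

public class FcfsSketch {
    /**
     * FCFS waiting times, assuming processes are listed in ascending order of arrival.
     * A process starts once it has arrived and the previous process has finished.
     */
    static int[] waitingTimes(int[] arrival, int[] burst) {
        int[] wt = new int[arrival.length];
        int clock = 0; // time at which the CPU becomes free
        for (int n = 0; n < arrival.length; n++) {
            int start = Math.max(clock, arrival[n]); // CPU idles if the process has not arrived yet
            wt[n] = start - arrival[n];
            clock = start + burst[n];
        }
        return wt;
    }

    public static void main(String[] args) {
        int[] arrival = {0, 1, 2};
        int[] burst = {5, 3, 8};
        int[] wt = waitingTimes(arrival, burst);
        double total = 0;
        for (int w : wt) total += w;
        System.out.println("WT = " + Arrays.toString(wt)); // WT = [0, 4, 6]
        System.out.printf("Avg waiting time = %.2f%n", total / wt.length);
    }
}
```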

Benford's Law Java - Extracting first digit from a string array read from a file?

I am trying to create a program in Java that reads from a file, extracts the first digit of every number, determines the frequencies of the digits 0-9, and prints those frequencies as percentages. I already figured out how to read from my file ("lakes.txt"):
FileReader fr = new FileReader ("lakes.txt");
BufferedReader br = new BufferedReader(fr);
//for loop that traverses each line of the file
int count = 0;
for (String s = br.readLine(); s!= null; s = br.readLine()) {
System.out.println(s); //print out every term
count++;
}
String [] nums;
nums = new String[count];
//close and reset file readers
fr.close();
fr = new FileReader ("lakes.txt");
br = new BufferedReader(fr);
//read each line of the file
count = 0;
for (String s = br.readLine(); s!= null; s = br.readLine()) {
nums[count] = s;
count++;
}
I am currently printing out every term just to make sure it is working.
Now I am trying to figure out how to extract the first digit from each term in my string array.
For example, the first number in the array is 15,917, and I want to extract 1. The second number is 8,090 and I want to extract 8.
How can I do this?
To extract the first digit from a String:
1. Get the first character of the String.
2. Parse (1) into a number.
For example:
String firstLetter = Character.toString(s.charAt(0));//alternatively use s.substring(0,1)
int value = Integer.parseInt(firstLetter);
This would be placed inside the file reading loop, assuming each line of the file contains a numeric value (in other words, no further processing or error handling of the lines of the file is required).
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class TermReader {
public static void main(String[] args) throws IOException {
FileReader fr = new FileReader ("lakes.txt");
BufferedReader br = new BufferedReader(fr);
int[] tally = new int[]{0,0,0,0,0,0,0,0,0,0};
int total = 0;
for (String s = br.readLine(); s!= null; s = br.readLine()) {
char[] digits = s.toCharArray();
for(char digit : digits) {
if( Character.isDigit(digit)) {
total++;
tally[Integer.parseInt(Character.toString(digit))]++;
break;
}
}
}
br.close();
for(int index = 0; index < 10; index++) {
double average = tally[index] == 0 ? 0.0 : (((double)tally[index]) / total) * 100;
System.out.println("[" + index + "][" + tally[index] + "][" + total + "][" + Math.round(average * 100.0) / 100.0 + "]");
}
}
}

Storing integer characters from a read file into a 2-Dimensional array

I'm trying to write a program that reads an array from a file (the first line gives the number of rows and columns, followed by a grid of R x C numbers) and determines whether five values next to each other horizontally, vertically, or diagonally in either direction are the same, so that my GUI main program can color them differently.
The code is extremely slow and only seems to work for smaller arrays. I don't understand what I'm doing wrong.
The Files look like this:
5 4
1 2 3 4 5
1 2 3 4 5
7 3 2 0 1
6 1 2 3 5
Code:
public class fiveinarow
{
int[][] Matrix = new int [100][100];
byte[][] Tag = new byte [100][100];
int row, col;
String filepath, filename;
public fiveinarow()
{
row = 0;
col = 0;
filepath = null;
filename = null;
}
public void readfile()
{
JFileChooser chooser = new JFileChooser();
chooser.setDialogType(JFileChooser.OPEN_DIALOG );
chooser.setDialogTitle("Open Data File");
int returnVal = chooser.showOpenDialog(null);
if( returnVal == JFileChooser.APPROVE_OPTION)
{
filepath = chooser.getSelectedFile().getPath();
filename = chooser.getSelectedFile().getName();
}
try
{
Scanner inputStream = new Scanner(new FileReader(filepath));
int intLine;
row = scan.nextInt();
col = scan.nextInt();
for (int i=0; i < row; i++)
{
for (int j = 0 ; j < col; j++)
{
int[][]Matrix = new int[row][col];
Matrix[i][j] = inputStream.nextInt();
}
}
}
catch(IOException ioe)
{
System.exit(0);
}
}
When I process a 7x7 array, I get confirmation that the file was opened, but processing gives a 7x7 array of all zeroes.
When I process a 15x14 array, I get Exception in thread "AWT-EventQueue-0" errors and no array.
Some suggestions:
Create a method that returns int[][] to read in your input file.
After reading the row and column counts, create the matrix: int[][] result = new int[row][col];
Inside your loop, read each integer into the matrix (result[i][j] = scan.nextInt();).
If you only use nextInt(), there is no need to move to the next line after reading all the numbers on one line; nextInt() skips line breaks like any other whitespace.
You might use something like:
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import javax.swing.JFileChooser;
public class ReadMatrix {
static ReadMatrix mReadMatrix;
int[][] matrix;
int row, col;
String filepath, filename;
/**
* @param args
*/
public static void main(String[] args) {
mReadMatrix = new ReadMatrix();
mReadMatrix.readfile();
}
// int[][] Matrix = new int [100][100];
// byte[][] Tag = new byte [100][100];
public void readfile() {
JFileChooser chooser = new JFileChooser();
chooser.setDialogType(JFileChooser.OPEN_DIALOG);
chooser.setDialogTitle("Open Data File");
int returnVal = chooser.showOpenDialog(null);
if (returnVal == JFileChooser.APPROVE_OPTION) {
filepath = chooser.getSelectedFile().getPath();
filename = chooser.getSelectedFile().getName();
}
Scanner inputStream;
try {
inputStream = new Scanner(new FileReader(filepath));
row = inputStream.nextInt();
col = inputStream.nextInt();
System.out.println(" matrix is " + row + " rows and " + col + " columns");
matrix = new int[row][col];
for (int i = 0; i < row; i++) {
for (int j = 0; j < col; j++) {
matrix[i][j] = inputStream.nextInt();
System.out.println(" " + i + "," + j + ": " + matrix[i][j]);
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
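The slowness and the all-zero result both come from int[][]Matrix = new int[row][col]; inside the inner loop of the question's readfile(): it allocates a fresh row x col array for every single cell, and because it is a local variable that shadows the field, the field Matrix is never written. A sketch that allocates the array once, returns it as the suggestions recommend, and then tags runs of five equal values horizontally, vertically and along both diagonals. FiveInARowSketch, readMatrix and markRuns are hypothetical names, and the sample grid is made up:

```java
import java.util.Scanner;

public class FiveInARowSketch {
    /** Reads "rows cols" followed by rows*cols integers; line breaks are just whitespace to Scanner. */
    static int[][] readMatrix(Scanner in) {
        int rows = in.nextInt();
        int cols = in.nextInt();
        int[][] m = new int[rows][cols]; // allocate once, after the size is known
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                m[i][j] = in.nextInt();
        return m;
    }

    /** Marks every cell that belongs to a run of len equal values in any of the 4 directions. */
    static boolean[][] markRuns(int[][] m, int len) {
        int rows = m.length, cols = rows == 0 ? 0 : m[0].length;
        boolean[][] tag = new boolean[rows][cols];
        int[][] dirs = {{0, 1}, {1, 0}, {1, 1}, {1, -1}}; // right, down, down-right, down-left
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int[] d : dirs) {
                    int endI = i + (len - 1) * d[0], endJ = j + (len - 1) * d[1];
                    if (endI >= rows || endJ < 0 || endJ >= cols) continue; // run would leave the grid
                    boolean same = true;
                    for (int k = 1; k < len && same; k++)
                        same = m[i + k * d[0]][j + k * d[1]] == m[i][j];
                    if (same)
                        for (int k = 0; k < len; k++)
                            tag[i + k * d[0]][j + k * d[1]] = true;
                }
        return tag;
    }

    public static void main(String[] args) {
        String data = "3 6\n1 7 7 7 7 7\n2 3 4 5 6 8\n9 9 9 9 0 9\n";
        boolean[][] tag = markRuns(readMatrix(new Scanner(data)), 5);
        for (boolean[] row : tag) {
            StringBuilder sb = new StringBuilder();
            for (boolean t : row) sb.append(t ? '#' : '.');
            System.out.println(sb); // .#####  then  ......  then  ......
        }
    }
}
```

In the GUI, readMatrix can be given new Scanner(new FileReader(filepath)) instead of a String.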
You might want to check out something similar that I did:
Loading Tile Maps From Text Files In Slick2D
or
Why Aren't My Tile Maps Displaying Correctly?
