The Java code below splits a big .csv file into multiple .csv files. How can I write the header row to every split file?
import java.io.*;
import java.util.Scanner;
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class split {
public static void main(String args[]) {
try {
String inputfile = "E:/Sumit/csv-splitting-2/Proposal_Details__c.csv";
System.out.println("Input Path is :- " + inputfile);
double nol = 100000.0;
File file = new File(inputfile);
Scanner scanner = new Scanner(file);
int count = 0;
while (scanner.hasNextLine()) {
scanner.nextLine();
count++;
}
System.out.println("Lines in the file: " + count);
double temp = (count / nol);
int temp1 = (int) temp;
int nof = 0;
if (temp1 == temp) {
nof = temp1;
} else {
nof = temp1 + 1;
}
System.out.println("No. of files to be generated :" + nof);
FileInputStream fstream = new FileInputStream(inputfile);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
for (int j = 1; j <= nof; j++) {
String outputpath = "E:/Sumit/csv-splitting-2/";
String outputfile = "File-2-Proposal_Details__c" + j + ".csv";
System.out.println(outputpath + outputfile);
FileWriter fstream1 = new FileWriter(outputpath + outputfile);
BufferedWriter out = new BufferedWriter(fstream1);
for (int i = 1; i <= nol; i++) {
strLine = br.readLine();
if (strLine != null) {
out.write(strLine);
if (i != nol) {
out.newLine();
}
}
}
out.close();
}
in.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
}
}
Assuming your first line is the header, you can declare a String header; and assign it the result of reading the first line, e.g. header = br.readLine();.
In your for loop over nof (which I assume means number of files), always write the header as the first line whenever you create a new file.
It would be something like this:
Before your for loop, save the header in a variable:
String header = br.readLine();
You have two for loops: one that creates a file, and one that writes each line to the newly created file.
Inside the first for loop, right after you create the file, write the header to it: out.write(header); followed by out.newLine();
General tips:
Use variable names that make sense. nol, nof and j convey nothing; you could call them numOfLines, numOfFiles and currentFile, for example.
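Putting the advice above together, here is a minimal sketch of a splitter that repeats the header in every output file. The class name CsvSplitter, the split method, the "part-" prefix and the temporary directory are assumptions made for illustration, not part of the original code; the sketch also assumes the input is UTF-8 and that no quoted CSV field contains a line break.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CsvSplitter {
    /**
     * Splits input into numbered files in outputDir. Every output file starts
     * with the header row and holds at most linesPerFile data rows.
     * Returns the number of files written (hypothetical helper).
     */
    static int split(Path input, Path outputDir, String prefix, int linesPerFile) throws IOException {
        int fileCount = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String header = reader.readLine(); // the first line is assumed to be the header
            if (header == null) {
                return 0; // empty input, nothing to split
            }
            BufferedWriter writer = null;
            int linesInCurrentFile = 0;
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (writer == null) {
                        // Open the next output file and write the header first.
                        fileCount++;
                        Path target = outputDir.resolve(prefix + fileCount + ".csv");
                        writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
                        writer.write(header);
                        writer.newLine();
                        linesInCurrentFile = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    if (++linesInCurrentFile == linesPerFile) {
                        writer.close(); // this file is full; the next row opens a new one
                        writer = null;
                    }
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return fileCount;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("csv-split");
        Path input = dir.resolve("input.csv");
        Files.write(input, Arrays.asList("id,name", "1,a", "2,b", "3,c"));
        int files = split(input, dir, "part-", 2);
        System.out.println(files + " files written to " + dir);
    }
}
```

Because the files are opened lazily while reading, there is no need to count the lines in a first pass or to compute the number of files up front.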
After importing a CSV file and sorting it into a 2-dimensional array, I get a couple of weird characters in only the first and possibly the last cell.
Expected output: S1358_R1
Actual output: S1358_R1
Does anyone know why these extra characters show up? The code used to do this is included below:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class open2 {
public static void main(String[] args) {
String line = "";
String splitBy = ",";
try {
//parsing a CSV file into BufferedReader class constructor
int i = 0;
String[][] ss = new String[10000][10000];
BufferedReader br = new BufferedReader(new FileReader("C:\\Users\\micha\\Documents\\spreadsheet.csv"));
while ((line = br.readLine()) != null) // readLine() returns null at the end of the file
{
String[] cells = line.split(splitBy);
for (int j = 0; j < cells.length; j++) {
ss[i][j] = cells[j];
} // use comma as separator
i = i + 1;
}
System.out.println(ss[0][0]);
} catch (IOException e) {
e.printStackTrace();
}
}
}
I am doing an assignment for my class where I have a 7 MB file. Essentially I am supposed to process it in 2 phases.
Phase 1: I add each word from the file to an array list and sort it in alphabetical order. I then write every 100,000 words to one file, so I have 12 files in total, named as shown in the code below.
Phase 2: For every 2 files, I read one line from each file and write whichever comes first in alphabetical order to a new file (basically a merge), until the 2 files are merged into 1 sorted file. I do this in a loop, so that the number of files is halved on each pass, and eventually I have all 7 MB sorted in one file.
What I am having trouble with: In phase 2, I successfully read the phase 1 output, but my files seem to be copied repeatedly into multiple files rather than being sorted and merged. I appreciate any help, thank you.
File: It seems I cannot upload the .txt file, but the code should work for a file with any number of lines; only the number-of-lines variable needs to be changed.
Summary: 1 big unsorted file turns into multiple sorted files (e.g. 12); the first merge pass turns them into 6 files, the second into 3 files, the third into 2 files, and the fourth into 1 big file again.
Code:
package Assignment11;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;
public class FileSorter_1
{
public static ArrayList<String> storyline = new ArrayList<String>();
public static int num_lines = 100000; //this number can be changed
public static int num_files_initial;
public static int num_files_sec;
public static void main(String[] args) throws IOException
{
phase1();
phase2();
}
public static void phase1() throws IOException
{
Scanner story = new Scanner(new File("Aesop_Shakespeare_Shelley_Twain.txt")); //file name
int f = 0;
while(story.hasNext())
{
int i = 0;
while(story.hasNext())
{
String temp = story.next();
storyline.add(temp);
i++;
if(i > num_lines)
{
break;
}
}
Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialize new file
for(int x = 0; x<num_lines;x++)
{
write2file.write(storyline.get(x));
write2file.newLine();
}
write2file.close();
f++;
}
num_files_initial = f;
}
public static void phase2() throws IOException
{
int file_n = 1;
int prev_fn = 0;
int t = 0;
int g = 0;
while(g<5)
{
System.out.println(num_files_initial);
if(t+1 > num_files_initial-1)
{
if(num_files_initial % 2 != 0)
{
BufferedWriter w = new BufferedWriter(new FileWriter("temp_"+file_n +"_" + g + ".txt"));
Scanner file1 = new Scanner(new File("temp_"+prev_fn +"_" + t + ".txt"));
String word1 = file1.next();
while(file1.hasNext())
{
w.write(word1);
w.newLine();
}
g++;
break;
}
num_files_initial = num_files_initial / 2 + num_files_initial % 2;
g = 0;
t = 0;
file_n++;
prev_fn++;
}
String s1="temp_"+file_n +"_" + g + ".txt";
String s2="temp_"+prev_fn +"_" + t + ".txt";
String s3="temp_"+prev_fn +"_" + (t+1) + ".txt";
System.out.println(s2);
System.out.println(s3);
BufferedWriter w = new BufferedWriter(new FileWriter(s1));
Scanner file1 = new Scanner(new File(s2));
Scanner file2 = new Scanner(new File(s3));
String word1 = file1.next();
String word2 = file2.next();
System.out.println(num_files_initial);
//System.out.println(t);
//System.out.println(g);
while(file1.hasNext() && file2.hasNext())
{
if(word1.compareTo(word2) == 1) //if word 1 comes first = 1
{
w.write(word1);
w.newLine();
file1.next();
}
if(word1.compareTo(word2) == 0) //if word 1 comes second = 0
{
w.write(word2);
w.newLine();
file2.next();
}
}
while(file1.hasNext())
{
w.write(word1);
w.newLine();
break;
}
while(file2.hasNext())
{
w.write(word2);
w.newLine();
break;
}
g++;
t+=2;
w.close();
file1.close();
file2.close();
}
}
}
After writing data to each new file, you are not clearing the existing sorted list, which is why earlier words are copied into the later files. Here are some fixes:
...
int f = 0;
while(story.hasNext())
{
// initialize the list here.
storyline = new ArrayList<>();
int i = 0;
while(story.hasNext())
{
String temp = story.next();
storyline.add(temp);
i++;
if(i > num_lines)
{
break;
}
}
Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialize new file
// instead of num_lines use i
for(int x = 0; x<i;x++)
{
write2file.write(storyline.get(x));
write2file.newLine();
}
write2file.close();
f++;
}
num_files_initial = f;
Hope this helps.
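The fix above covers phase 1 only. For phase 2, note that in the question's code compareTo is tested against 1 and 0 (it actually returns any negative number, zero, or any positive number), and the results of file1.next() and file2.next() are discarded, so word1 and word2 never advance. Below is a minimal sketch of a two-way merge of two sorted files, assuming one word per line as phase 1 writes them and the same case-insensitive order used for sorting. The class name SortedFileMerger, the merge method and the file names in main are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SortedFileMerger {
    /** Merges two files that are already sorted (one word per line) into output. */
    static void merge(Path left, Path right, Path output) throws IOException {
        try (BufferedReader a = Files.newBufferedReader(left);
             BufferedReader b = Files.newBufferedReader(right);
             BufferedWriter out = Files.newBufferedWriter(output)) {
            String wordA = a.readLine();
            String wordB = b.readLine();
            // While both inputs still have words, write the smaller one and advance that input.
            while (wordA != null && wordB != null) {
                if (String.CASE_INSENSITIVE_ORDER.compare(wordA, wordB) <= 0) {
                    out.write(wordA);
                    out.newLine();
                    wordA = a.readLine();
                } else {
                    out.write(wordB);
                    out.newLine();
                    wordB = b.readLine();
                }
            }
            // One input is exhausted; copy whatever is left of the other.
            for (; wordA != null; wordA = a.readLine()) {
                out.write(wordA);
                out.newLine();
            }
            for (; wordB != null; wordB = b.readLine()) {
                out.write(wordB);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge-demo");
        Path left = dir.resolve("temp_0_0.txt");
        Path right = dir.resolve("temp_0_1.txt");
        Files.write(left, Arrays.asList("apple", "Cherry"));
        Files.write(right, Arrays.asList("banana", "date"));
        Path merged = dir.resolve("temp_1_0.txt");
        merge(left, right, merged);
        System.out.println(Files.readAllLines(merged)); // [apple, banana, Cherry, date]
    }
}
```

An odd file left over at the end of a pass can simply be copied (or renamed) into the next pass unchanged.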
I want to count how many times matching words occur.
I have two text files and want to compare their words with each other.
For example:
Text file A: a b c d e
Text file B: a a a a a
I want to see this output:
Output: a 5
But the code I wrote doesn't work.
Please help me.
I wrote the code for Java JDK 1.8 using Eclipse on Windows 8.1 64-bit.
The code is as follows:
package test1;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
public class ex01 {
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
FileReader fr = new FileReader("C:/Users/Hong/Desktop/승현연구/152-300/301.txt");
FileReader key_item = new FileReader("C:/Users/Hong/Desktop/승현연구/no-yes2500.txt");
BufferedReader br = new BufferedReader(fr);
BufferedReader br2 = new BufferedReader(key_item);
FileOutputStream file = new FileOutputStream("C:/Users/Hong/Desktop/승현연구/답변빈도/a301.txt");
List<String> key = new ArrayList<String>();
List<String> str = new ArrayList<String>();
String in = "";
String s = "";
String ss[];
while ((in = br2.readLine()) != null) {
key.add(in);
}
while ((s = br.readLine()) != null) {
str.add(s);
}
int cnt = 0;
int count = 0;
int cont = 0;
String txt = "";
for (int i = 0; i < key.size(); i++) {
for (int j = 0; j < str.size(); j++) {
System.out.println(j + " " + str.get(j));
if (str.get(j).lastindexOf(key.get(i))) {
cnt++;
//System.out.println(key.get(i) + " " + cnt);
}
if (cnt == 1){
//cont ++ 1;
//System.out.printf("%d",cont);
}
}
System.out.println(key.get(i) + " " + cnt);
txt = txt + key.get(i) + " " + cnt + "\n";
cnt = 0;
}
file.write(txt.getBytes());
}
//System.out.println("Hello Java");
}
In my code, this line causes the error:
if (str.get(j).lastindexOf(key.get(i)))
I don't know why.
To summarize the text files and what I want to do:
First, I'd like the code to compare the 301 text file with the no-yes2500 text file and output the count of each word that belongs to no-yes2500
(e.g. apple 3
banana 2)
301.txt is a text file consisting of sentences from Q&A community answers.
no-yes2500.txt is a keyword list.
Try this:
str.get(j).toString().lastIndexOf(key.get(i).toString())>=0
You have a typo in your code: replace lastindexOf with lastIndexOf, and make the expression in the if clause evaluate to a boolean; currently it's an int (i.e. an index).
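Note that a lastIndexOf(...) >= 0 test adds at most 1 per line, so it counts the lines that contain a keyword rather than the occurrences. To get "a 5" from the line "a a a a a", every occurrence has to be counted. Here is a minimal sketch under that assumption; the class KeywordCounter and its methods are hypothetical names, and matching is by substring, so "a" would also be counted inside "banana".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeywordCounter {
    /** Counts non-overlapping occurrences of keyword in text (substring match). */
    static int countOccurrences(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0; // an empty keyword would otherwise loop forever
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(keyword, from)) >= 0) {
            count++;
            from += keyword.length(); // continue searching after this match
        }
        return count;
    }

    /** Returns the total count of each keyword over all lines, in keyword order. */
    static Map<String, Integer> countAll(List<String> keywords, List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            int total = 0;
            for (String line : lines) {
                total += countOccurrences(line, keyword);
            }
            counts.put(keyword, total);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("a", "b", "c", "d", "e");
        List<String> lines = Arrays.asList("a a a a a");
        // Prints "a 5", then "b 0", "c 0", "d 0" and "e 0".
        countAll(keywords, lines).forEach((keyword, count) -> System.out.println(keyword + " " + count));
    }
}
```

If only whole words should match, splitting each line on whitespace and comparing tokens with equals would be the safer choice.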
I am trying to count the lines of a file using Java's LineNumberReader, but the output is wrong: only alternate lines are displayed (lines 1, 3, 5, ...), and the total line count is half the actual number of lines. Here is the code:
import java.lang.*;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
public class countLine{
File file=null;
public countLine(){
file =new File("E:\\test.txt");
getFileData();
}
public void getFileData(){
try{
if(file.exists()){
FileReader fr = new FileReader(file);
LineNumberReader lnr = new LineNumberReader(fr);
int linenumber = 0;
do{
System.out.println(lnr.readLine());
linenumber++;
}while (lnr.readLine() != null);
System.out.println("Total number of lines : " + linenumber);
lnr.close();
}else{
System.out.println("File does not exists!");
}
}
catch(Exception e){
e.printStackTrace();
}
}
public static void main(String h[]){
countLine cl = new countLine();
}
}
You read the line twice, once with System.out.println(lnr.readLine()); and once with while (lnr.readLine() != null);
Combining the two other answers into one gives correct line count as well as the ability of doing the System.out.println(...) with line content:
int linenumber = 0;
String tmp = new String();
while ((tmp = lnr.readLine()) != null) {
linenumber++;
System.out.println(tmp);
}
This would have been enough for counting:
FileReader fr = new FileReader(file);
LineNumberReader lnr = new LineNumberReader(fr);
while (lnr.readLine() != null);
System.out.println( lnr.getLineNumber() );
lnr.close();
Added later: or, if you need to print the lines (with line numbers):
String line = null;
while ((line = lnr.readLine()) != null){
System.out.println( lnr.getLineNumber() + " " + line );
}
You can get the number of lines with two lines of code, something like:
lineNumberReader.skip(Long.MAX_VALUE);
int count = lineNumberReader.getLineNumber();
E&OE
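A self-contained sketch of the skip-based count above, reading from a StringReader so it can be run without a file. The LineCounter class and countLines method are hypothetical names. The sample input ends every line with a terminator; how a final line without a terminator is counted is an assumption you should verify on your Java version.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

public class LineCounter {
    /** Counts lines by skipping to the end of the input and asking for the line number. */
    static int countLines(Reader source) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(source)) {
            // skip() reads through the input internally, so line terminators are still counted.
            reader.skip(Long.MAX_VALUE);
            return reader.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new StringReader("one\ntwo\nthree\n"))); // prints 3
    }
}
```

For a real file, pass new FileReader(file) instead of the StringReader.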
Every call to lnr.readLine() reads a line from the file and returns it.
You are making two reads per iteration:
1) In the System.out.println call
2) In the while condition
You need to call readLine once per iteration,
save the result in a variable, and stop when it is null, which marks the end of the file.
Take a look at:
int linenumber = 0;
String tmp = new String();
while ((tmp = lnr.readLine()) != null) {
linenumber++;
System.out.println(tmp);
}
I am trying to split the text files in a directory at each 'END OF CUSTOMER STATEMENT' line and store the resulting files in a temporary directory. The split happens only for the first file, while the other file is ignored. What is the problem with my code? I was expecting the for loop to process all the files in the directory. Here is my code.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
/**
*
* @author Administrator
*/
public class SplitFiles {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
File f = new File("D:/statements/");
String[] filenames = f.list();
File[] texts = f.listFiles();
String lines = "";
for (int m = 0; m < filenames.length; m++) {
try {
int count = 0;
FileInputStream fs = new FileInputStream("D:/statements/" + filenames[m]);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
FileOutputStream fos = new FileOutputStream("D:/DFCU Statements/statement" + count + ".RPT");
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos));
while ((lines = br.readLine()) != null) {
String mine = lines.trim();
if (mine.startsWith("END OF CUSTOMER STATEMENT")) {
bw.close();
count++;
fos = new FileOutputStream("D:/DFCU Statements/statement" + count + ".RPT");
bw = new BufferedWriter(new OutputStreamWriter(fos));
continue;
}
if (mine.isEmpty()) {
continue;
} else {
bw.write(lines);
bw.newLine();
bw.flush();
}
}
fos.close();
fs.close();
br.close();
bw.close();
} catch (Exception ag) {
System.out.println(ag);
}
}
}
}
I think you should start by declaring count before the loop (there may be more bugs):
int count = 0;
for (int m = 0; m < filenames.length; m++) {
...
UPDATE: also, remove your existing count++ and place it after each file creation:
FileOutputStream fos = new FileOutputStream("D:/DFCU Statements/statement" + count + ".RPT");
count++;
Then it will work as expected.
I assume that, since nothing distinguishes the target files from each other (they are all named statementX.RPT), the files written for the last input file are the ones you actually see in your output, but this is only a guess.
Try changing your output file name to "statement." + m + "." + count + ".RPT"; that way you will have unique output files.
Also, take note of the following comments:
When using the File class, the listFiles API is more useful (in my opinion): for each file you get, you can query getName and getPath.
About this line: FileInputStream fs = new FileInputStream("D:/statements/" + filenames[m]); - if you used the results from listFiles (the texts array in your code), you could replace it with FileInputStream fs = new FileInputStream(texts[m]); - no need to hard-code the path.
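The two suggestions above (unique output names built from m and count, and iterating over listFiles) could be combined roughly as follows. This is a sketch, not the original poster's code: the StatementSplitter class, the splitAll method and the MARKER constant are hypothetical, and the files are assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

public class StatementSplitter {
    static final String MARKER = "END OF CUSTOMER STATEMENT";

    /** Splits every file in inputDir at MARKER lines; returns the number of files written. */
    static int splitAll(File inputDir, File outputDir) throws IOException {
        File[] inputs = inputDir.listFiles(File::isFile);
        if (inputs == null) {
            throw new IOException("Not a readable directory: " + inputDir);
        }
        Arrays.sort(inputs); // listFiles() does not guarantee any order
        int written = 0;
        for (int m = 0; m < inputs.length; m++) {
            int count = 0; // statement index within the current input file
            BufferedWriter out = null;
            try (BufferedReader in = Files.newBufferedReader(inputs[m].toPath())) {
                String line;
                while ((line = in.readLine()) != null) {
                    String trimmed = line.trim();
                    if (trimmed.startsWith(MARKER)) {
                        if (out != null) {
                            out.close();
                            out = null; // the next non-empty line starts a new statement file
                            count++;
                        }
                        continue;
                    }
                    if (trimmed.isEmpty()) {
                        continue; // blank lines are dropped, as in the question
                    }
                    if (out == null) {
                        // m and count together make the name unique across input files.
                        File target = new File(outputDir, "statement." + m + "." + count + ".RPT");
                        out = Files.newBufferedWriter(target.toPath());
                        written++;
                    }
                    out.write(line);
                    out.newLine();
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        File in = Files.createTempDirectory("statements").toFile();
        File out = Files.createTempDirectory("split").toFile();
        Files.write(new File(in, "a.txt").toPath(), Arrays.asList("A1", MARKER, "", "A2", MARKER));
        System.out.println(splitAll(in, out) + " statement files written to " + out);
    }
}
```

Opening the output file only when the first non-empty line of a statement arrives avoids creating an empty trailing file after the last marker.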
You should modify your code; otherwise, instead of creating two output files it will create three. Here is the corrected code:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
/**
*
* @author Administrator
*/
public class SplitFiles {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
File f = new File("D:/statements/");
String[] filenames = f.list();
File[] texts = f.listFiles();
String lines = "";
int count = 0;
for (int m = 0; m < filenames.length; m++) {
try {
FileInputStream fs = new FileInputStream("D:/statements/" + filenames[m]);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
FileOutputStream fos = null;
BufferedWriter bw = null;
while ((lines = br.readLine()) != null) {
String mine = lines.trim();
if (mine.startsWith("END OF CUSTOMER STATEMENT")) {
if(bw!=null)
{
bw.close();
bw = null; // so the next statement opens a new output file instead of writing to a closed one
}
count++;
continue;
}
if (mine.isEmpty()) {
continue;
} else {
if(bw==null)
{
fos = new FileOutputStream("D:/DFCU Statements/statement" + count + ".RPT");
bw = new BufferedWriter(new OutputStreamWriter(fos));
}
bw.write(lines);
bw.newLine();
bw.flush();
}
}
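br.close(); // also closes the underlying FileInputStream
if (bw != null) {
bw.close(); // also closes the underlying FileOutputStream; null if nothing was written
}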
} catch (Exception ag) {
System.out.println(ag);
}
}
}
}