How to improve performance of writing DB data to CSV - java

I have the following code to read data from an Oracle database and write it into a CSV file. I used the OpenCSV jar for writing. It takes 230 seconds to write 1 MB of data. Is there any other way to increase the performance?
springJdbcTemplate.query(query, new ResultSetExtractor<ResultSet>() {
    @Override
    public ResultSet extractData(ResultSet rs) throws SQLException, DataAccessException {
        try {
            CSVWriter writer = new CSVWriter(new FileWriter("C:/csv/Sample.csv"), ';');
            writer.writeAll(rs, true);
        } catch (Exception e) {
            System.out.println("Exception -> " + e);
        }
        return rs;
    }
});

It is taking 7 seconds without writing.
I can't imagine why the CSVWriter is so slow unless it needs buffering.
Can you try
CSVWriter writer = new CSVWriter(
new BufferedWriter(new FileWriter("C:/csv/Sample.csv")), ';');
and add
writer.close();
or use Java 7+
try(CSVWriter writer = ...) {
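Putting both suggestions together, a minimal sketch (assuming the same CSVWriter(Writer, char) constructor and writeAll(ResultSet, boolean) call used in the question):
try (CSVWriter writer = new CSVWriter(
        new BufferedWriter(new FileWriter("C:/csv/Sample.csv")), ';')) {
    writer.writeAll(rs, true);
} // close() is called automatically and flushes the buffer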
Try this:
import java.io.*;

public class DumbCSVWriter {
    private final Writer writer;
    private final String sep;

    public DumbCSVWriter(Writer writer, String sep) {
        this.sep = sep;
        this.writer = writer instanceof BufferedWriter ? writer : new BufferedWriter(writer);
    }

    public void addRow(Object... values) throws IOException {
        for (int i = 0; i < values.length - 1; i++) {
            print(values[i]);
            writer.write(sep);
        }
        if (values.length > 0)
            print(values[values.length - 1]);
        writer.write("\n");
    }

    private void print(Object value) throws IOException {
        if (value == null) return;
        String str = value.toString();
        if (str.contains(sep) || str.contains("\"") || str.contains("\n")) {
            // quote the field and escape embedded quotes by doubling them
            str = '"' + str.replaceAll("\"", "\"\"") + '"';
        }
        writer.write(str);
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        File file = new File("/tmp/deleteme");
        DumbCSVWriter writer = new DumbCSVWriter(new FileWriter(file), ";");
        String[] words = "hello,0123456789,has a;semi-colon,has a \"quote".split(",");
        // file.length() only grows as the buffer is flushed, so this writes roughly 1 MB
        for (int i = 0; file.length() < 1024 * 1024; i++) {
            writer.addRow(words);
        }
        writer.close();
        long time = System.nanoTime() - start;
        System.out.printf("Time to write 1 MB %.3f%n", time / 1e9);
    }

    public void close() throws IOException {
        writer.close();
    }
}
prints
Time to write 1 MB 0.307

This is the standard way of doing it; there's nothing wrong with your code. The only suggestion I'd make is to wrap the writer in a BufferedWriter or OutputStreamWriter, like:
CSVWriter writer = new CSVWriter(new OutputStreamWriter(new FileOutputStream("C:/csv/Sample.csv")), ';');
It may help a little, but 230 seconds is not about writing a 1 MB CSV file; that has to be about your database connection. Try just looping through the result set without writing to a file; I bet you'd get nearly the same time.
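For example, a quick sketch to separate the database time from the file time, run inside the extractor (no writing at all):
long start = System.currentTimeMillis();
int rows = 0;
while (rs.next()) {
    rows++; // touch every row, write nothing
}
System.out.println(rows + " rows fetched in " + (System.currentTimeMillis() - start) + " ms");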
Try setting the fetchSize for your statement (Statement or PreparedStatement) like
stmt.setFetchSize(1000);
This can significantly reduce the result set fetching time.
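Since the question goes through Spring, the fetch size can also be set on the JdbcTemplate itself before running the query (1000 is an illustrative value; tune it for your row width):
springJdbcTemplate.setFetchSize(1000); // applies to every query run through this template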

Related

Changing the first line in a file

I'm having an issue with changing a line in a file. The purpose of this code is to change the first number of the file to itself + 1. For some reason the code doesn't seem to be functioning at all; any help would be appreciated!
public static void changenumber(String fileName)
{
    ArrayList<String> list = new ArrayList<String>();
    File temp = new File(fileName);
    Scanner sc;
    try {
        sc = new Scanner(temp);
        while (sc.hasNextLine())
        {
            list.add(sc.nextLine());
        }
        sc.close();
    }
    catch (FileNotFoundException e)
    {
        e.printStackTrace();
    }
    String first = list.get(0);
    int i = Integer.parseInt(first);
    i = i + 1;
    first = Integer.toString(i);
    list.set(0, first);
    writenumber(list, fileName);
}

public static void writenumber(ArrayList<String> list, String fileName)
{
    PrintWriter write;
    try {
        write = new PrintWriter(new FileWriter(fileName, true));
        for (int i = 0; i < list.size(); i++)
        {
            write.append(list.get(i));
        }
    }
    catch (IOException err)
    {
        err.printStackTrace();
    }
}
Your problem is that you never closed the FileWriter.
Use try-with-resources to ensure that file streams are closed correctly.
A few other improvements to your code:
Do not ignore exceptions. Continuing execution as if nothing bad happened will cause lots of problems. Let the exception bounce back to the caller, and let the caller decide what to do if the file cannot be updated.
Scanner is slow. Since all you're doing is reading lines, use BufferedReader instead.
The lines in memory don't end in newline characters, so you need to use the println() method when writing the lines back out, otherwise the result is a file with all the lines concatenated into a single line.
Variables renamed to be more descriptive.
public static void changenumber(String fileName) throws IOException {
    ArrayList<String> lines = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
        for (String line; (line = in.readLine()) != null; ) {
            lines.add(line);
        }
    }
    int i = Integer.parseInt(lines.get(0));
    i++;
    lines.set(0, Integer.toString(i));
    writenumber(lines, fileName);
}

public static void writenumber(List<String> lines, String fileName) throws IOException {
    // overwrite the file; append mode would leave the old content in place
    try (PrintWriter out = new PrintWriter(new FileWriter(fileName))) {
        for (String line : lines) {
            out.println(line);
        }
    }
}
Of course, you could simplify the code immensely by using the newer NIO.2 classes added in Java 7, in particular the java.nio.file.Files class.
public static void changenumber(String fileName) throws IOException {
    Path filePath = Paths.get(fileName);
    List<String> lines = Files.readAllLines(filePath);
    lines.set(0, Integer.toString(Integer.parseInt(lines.get(0)) + 1));
    Files.write(filePath, lines);
}

Java - Read large .txt data file in batch size of 10

I have a large data file, say dataset.txt, where the data is in the format -
1683492079 kyra maharashtra 18/04/2017 10:16:17
1644073389 pam delhi 18/04/2017 10:16:17
.......
The fields are id, name, state, and timestamp.
I have around 50,000 lines of data in the .txt data file.
My requirement is to read the data from this data file in batches of 10.
So in the first batch I need to read elements 0 to 9, in the next batch elements 10 to 19, and so on...
Using BufferedReader I have managed to read the whole file:
import java.io.*;

public class ReadDataFile {
    public static void main(String args[]) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("dataset.txt"));
        String line;
        while ((line = br.readLine()) != null)
        {
            System.out.println(line);
        }
        br.close();
    }
}
But my requirement is to read the file in batches of 10. I am new to Java, so I would really appreciate it if someone could help me in simple terms.
As per @GhostCat's answer - this is what I have got -
public class ReadDataFile {
    public static void main(String args[]) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("dataSetExample.txt"));
        readBatch(br, 10);
    }

    public static void readBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String line = reader.readLine();
            if (line != null) {
                // result.add(line);
                System.out.println(line);
            }
        }
        // return result;
        return;
    }
}
The file is read in the readBatch method, so how do I know in the main method that the end of the file has been reached, in order to call for the next 10 records? Kindly help.
Your requirements aren't really clear, but here is something simple to get you started:
A) your main method shouldn't do any reading; it just prepares the BufferedReader object
B) you use that reader with a method like:
private static List<String> readBatch(BufferedReader reader, int batchSize) throws IOException {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < batchSize; i++) {
        String line = reader.readLine();
        if (line != null) {
            result.add(line);
        } else {
            return result;
        }
    }
    return result;
}
To be used in your main:
BufferedReader reader = ...
int batchSize = 10;
boolean moreLines = true;
while (moreLines) {
    List<String> batch = readBatch(reader, batchSize);
    // ... do something with that list
    if (batch.size() < batchSize) {
        moreLines = false;
    }
}
This is meant as a "suggestion" for how you could approach this. One thing missing from my answer: you should probably use a distinct class and do the parsing right there (returning a List<DataClass> instead of moving around those raw "line strings").
And of course: 50,000 lines isn't really much data. Unless we are talking about an embedded device, there is not much point in "batch style" processing.
And finally: the term batch processing has a very distinct meaning, also in Java; if you intend to go there, see here for further reading.
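As a hypothetical sketch of that "distinct class" idea, using the field layout (id, name, state, timestamp) from the question's sample data:
public class DataRecord {
    final long id;
    final String name;
    final String state;
    final String timestamp;

    DataRecord(long id, String name, String state, String timestamp) {
        this.id = id;
        this.name = name;
        this.state = state;
        this.timestamp = timestamp;
    }

    // e.g. "1683492079 kyra maharashtra 18/04/2017 10:16:17"
    static DataRecord parse(String line) {
        String[] parts = line.split("\\s+", 4); // limit of 4 keeps the space inside the timestamp
        return new DataRecord(Long.parseLong(parts[0]), parts[1], parts[2], parts[3]);
    }
}
readBatch could then call DataRecord.parse(line) and return a List<DataRecord> instead of raw strings.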
For anybody in need of a working example ---
// Method that reads lines (using a BufferedReader) and accepts the batch size as an argument.
// It reads (batchSize - 1) lines, because the first line of each batch has already been
// consumed by the readLine() in main's while condition.
private static List<String> readBatch(BufferedReader br, int batchSize) throws IOException {
    // List that will hold the lines of this batch
    List<String> result = new ArrayList<>();
    for (int i = 1; i < batchSize; i++) {
        String line = br.readLine();
        if (line != null) {
            result.add(line); // add the line to this batch
        } else {
            return result; // end of file: return the partial batch
        }
    }
    return result;
}
public static void main(String[] args) throws IOException {
    // input file
    BufferedReader br = new BufferedReader(new FileReader("c://ldap//buffreadstream2.csv"));
    // output file
    BufferedWriter bw = new BufferedWriter(new FileWriter("c://ldap//buffreadstream3.csv"));
    // your batch size, i.e. how many lines you want in each batch
    int batchSize = 5;
    String line = null;
    long batchNumber = 1;
    try {
        List<String> mylist = null;
        while ((line = br.readLine()) != null) { // loop over all lines in the csv file
            bw.write("Batch Number # " + batchNumber + "\n");
            System.out.println("Batch Number # " + batchNumber);
            // br.readLine() has already consumed the first line of this batch,
            // so handle it here, or you will lose every batchSize-th line
            bw.write(line + "\n");
            System.out.println(line);
            // read the remaining lines of this batch, as returned from readBatch()
            mylist = readBatch(br, batchSize);
            for (int i = 0; i < mylist.size(); i++) {
                System.out.println(mylist.get(i));
                bw.write(mylist.get(i) + "\n"); // write/process the returned lines
            }
            batchNumber++;
        }
        System.out.println("Lines were successfully copied!");
        // once you are done, don't forget to close/flush your readers and writers
        br.close();
        br = null;
        bw.flush();
        bw.close();
        bw = null;
    } catch (Exception e) {
        System.out.println("Exception caught: " + e.getMessage());
    }
}

Limit file size while writing in java

I need to limit the file size to 1 GB while writing, preferably using BufferedWriter.
Is that possible using BufferedWriter, or do I have to use other libraries?
Something like
try (BufferedWriter writer = Files.newBufferedWriter(path)) {
    //...
    writer.write(lines.stream());
}
You can always write your own OutputStream to limit the number of bytes written.
The following assumes you want to throw an exception if the size is exceeded.
public final class LimitedOutputStream extends FilterOutputStream {

    private final long maxBytes;
    private long bytesWritten;

    public LimitedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureCapacity(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b) throws IOException {
        write(b, 0, b.length);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureCapacity(len);
        // write through to the underlying stream directly; going through
        // super.write() would re-enter the overridden methods and count
        // the same bytes twice
        out.write(b, off, len);
    }

    private void ensureCapacity(int len) throws IOException {
        long newBytesWritten = this.bytesWritten + len;
        if (newBytesWritten > this.maxBytes)
            throw new IOException("File size exceeded: " + newBytesWritten + " > " + this.maxBytes);
        this.bytesWritten = newBytesWritten;
    }
}
You will of course now have to set up the Writer/OutputStream chain manually.
final long SIZE_1GB = 1073741824L;
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        new LimitedOutputStream(Files.newOutputStream(path), SIZE_1GB),
        StandardCharsets.UTF_8))) {
    //
}
Hitting exactly 1 GB is very difficult when you are writing lines, because each line can contain an unknown number of bytes. I am assuming you want to write the data to the file line by line.
You can either check how many bytes a line has before writing it to the file, or check the file size after writing each line.
The following basic example writes the same line each time. Here the text This is just a test ! takes 21 bytes in UTF-8 encoding. After 49 writes the file reaches 1029 bytes and the loop stops writing.
public class Test {
    private static final int ONE_KB = 1024;

    public static void main(String[] args) {
        File file = new File("D:/test.txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
            while (file.length() < ONE_KB) {
                writer.write("This is just a test !");
                writer.flush();
            }
            System.out.println("1 KB Data is written to the file.!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
As you can see, we have already written past the limit of 1 KB: the program above writes 1029 bytes, not at most 1024 bytes.
The second approach is to check the byte count, according to a specific encoding, before writing to the file.
public class Test {
    private static final int ONE_KB = 1024;

    public static void main(String[] args) throws UnsupportedEncodingException {
        File file = new File("D:/test.txt");
        String data = "This is just a test !";
        int dataLength = data.getBytes("UTF-8").length;
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
            while (file.length() + dataLength < ONE_KB) {
                writer.write(data);
                writer.flush();
            }
            System.out.println("1 KB Data written to the file.!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In this approach we check the length in bytes before writing to the file, so it writes 1008 bytes and then stops.
Problems with the two approaches:
Write and check: you may end up with some extra bytes, and the file size may cross the limit.
Check and write: you may end up with fewer bytes than the limit if the next line contains a lot of data. You should also be careful about the encoding.
There are third-party libraries such as Apache Commons IO that can help with these validations, but I find them more cumbersome than the conventional Java ways.
int maxSize = 1_000_000_000;
Charset charset = StandardCharsets.UTF_8;
long size = 0;
int lineCount = 0;
while (lineCount < lines.size()) { // lines is a List<String>
    long size2 = size + (lines.get(lineCount) + "\r\n").getBytes(charset).length;
    if (size2 > maxSize) {
        break;
    }
    size = size2;
    ++lineCount;
}
List<String> linesToWrite = lines.subList(0, lineCount);
Path path = Paths.get("D:/test.txt");
Files.write(path, linesToWrite, charset);
Or a bit faster, while encoding only once:
int lineCount = 0;
try (FileChannel channel = new RandomAccessFile("D:/test.txt", "rw").getChannel()) {
    // note: mapping pre-sizes the file to maxSize bytes
    ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, maxSize);
    lineCount = lines.size();
    for (int i = 0; i < lines.size(); i++) {
        byte[] line = (lines.get(i) + "\r\n").getBytes(charset);
        if (line.length > buffer.remaining()) {
            lineCount = i;
            break;
        }
        buffer.put(line);
    }
}
IIUC, there are various ways to do it.
Keep writing data in chunks and flushing it, and check the file size after every flush (see the sketch after this list).
Use log4j (or some other logging framework), which can roll over to a new file after a certain size, time, or some other trigger point.
While BufferedWriter is great, there are some newer APIs in Java which could make it faster; see Fastest way to write huge data in text file Java.
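A minimal sketch of the first option, where hasMoreData() and nextChunk() are hypothetical stand-ins for your data source:
Path path = Paths.get("out.txt"); // illustrative file name
long limit = 1_073_741_824L;      // 1 GB
try (BufferedWriter writer = Files.newBufferedWriter(path)) {
    while (hasMoreData() && Files.size(path) < limit) { // hasMoreData() is a hypothetical helper
        writer.write(nextChunk()); // nextChunk() is a hypothetical helper returning a String
        writer.flush(); // flush so Files.size(path) reflects what has actually been written
    }
}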

How to remove a line of a file from another line in the same file?

I have a text file with the following format:
String1
String1String2
String1String2String3
....
String1String2String3.....String(i)...String(n)
I want to remove some parts of this file to have the following format(result file):
String1
String2
String3
...
String(i)
String(n)
I tried with this function but my output file is always empty:
public static void FileFormatted(String inputFile, String outputFile)
{
    String FileContent = readFile(inputFile, StandardCharsets.UTF_8);
    String[] FileSentences = FileContent.split("[\n]");
    for (int i = 0; i < FileSentences.length; i++)
    {
        StringBuilder builder = new StringBuilder();
        for (int j = 1; j < FileSentences.length; j++)
        {
            int index = FileSentences[j].indexOf("FileSentences[i]");
            String temp = FileSentences[j].substring(index);
            FileSentences[j] = FileSentences[j].replaceAll(temp, " ");
            builder.append(FileSentences[j] + "\n");
        }
        writeIntoFile(builder, outputFile, true);
    }
}

public static void writeIntoFile(StringBuilder stringBuilder, String txtFilePath, boolean append) {
    File file = new File(txtFilePath);
    // if file doesn't exist, then create it
    if (!file.exists()) {
        try {
            file.createNewFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    FileWriter fw;
    try {
        fw = new FileWriter(file.getAbsoluteFile(), append);
        BufferedWriter bw = new BufferedWriter(fw);
        bw.write(stringBuilder.toString());
        bw.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Can someone please help me.
Okay, first of all, reading the whole file in one go is bad practice. Imagine you have a 6 GB file: that means you need 6 GB of RAM to store it when you read it in. It is better to read the file line by line.
So the aim of the logic is to read line by line.
When we read the first line, we record its length.
When we read the second line, we know the length of the first line, so that is our starting point on the second line. This means you can use the substring method, passing the start position and the end position.
And repeat this logic for lines 3, 4, ..., n.
The benefit of this is that you don't waste memory; you only store the length of the previous line.
Update
I have written the code that I suggested earlier. It's pretty basic and there is no validation, so you will need to add to it, but it covers the basics.
public static void main(String[] args) throws IOException {
    FileReader fileReader = new FileReader("test.txt");
    BufferedReader br = new BufferedReader(fileReader);
    int startPosition = 0;
    String line;
    ArrayList<String> items = new ArrayList<String>();
    while ((line = br.readLine()) != null)
    {
        // everything before startPosition was already seen on the previous line
        items.add(line.substring(startPosition, line.length()));
        System.out.println(line.substring(startPosition, line.length()));
        startPosition = line.length();
    }
    write("test2.txt", items);
}

public static void write(String filename, ArrayList<String> items) throws IOException {
    BufferedWriter outputWriter = null;
    outputWriter = new BufferedWriter(new FileWriter(filename));
    for (String item : items) {
        outputWriter.write(item);
        outputWriter.newLine();
    }
    outputWriter.flush();
    outputWriter.close();
}
Be sure the pattern is consistent in the whole file, then do this:
public static void main(String[] args) {
    String wordTofind = "String";
    String st = "String1String2String3String4";
    String[] arra = st.split(wordTofind);
    // split leaves an empty first element, so start at 1;
    // run to arra.length so the last token is not dropped
    for (int i = 1; i < arra.length; i++) {
        System.out.println(wordTofind + arra[i]);
        // write to a file or similar.
    }
}
you can use regex too, but this is acceptable...
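For instance, a sketch of that regex alternative (using java.util.regex.Pattern and Matcher, and assuming the tokens really are "String" followed by digits as in the example):
Matcher m = Pattern.compile("String\\d+").matcher("String1String2String3String4");
while (m.find()) {
    System.out.println(m.group()); // prints String1, String2, String3, String4
}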

How do I remove all occurrences of "," and "[" from the output in java?

Here is my code. The input consists of names of anime (Japanese cartoons), which I have stored in a test file, anime.txt. I am arranging them in alphabetical order and writing the result back into another file named Animeout.txt.
The input file does not contain any commas or square brackets, but the output file has them.
public class Main {
    public static ArrayList<String> read(String filePath) throws IOException {
        ArrayList<String> names = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(filePath));
        int numRead = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            names.add(line + "\n");
            numRead++;
        }
        System.out.println("\n\n count " + numRead);
        reader.close();
        System.out.println(names);
        return names;
    }

    public static void write(ArrayList<String> input) throws IOException
    {
        File file = new File("Animeout.txt");
        file.createNewFile();
        FileWriter writer = new FileWriter(file);
        writer.write(input.toString()); // writes the list's toString(): "[a, b, c, ...]"
        writer.flush();
        writer.close();
    }

    public static void main(String args[]) throws IOException
    {
        ArrayList<String> names2 = new ArrayList<String>();
        String path = "anime.txt";
        String test;
        names2 = read(path);
        Collections.sort(names2, null);
        // System.out.println(names2);
        write(names2);
    }
}
Input file has about 200 lines. Below is just a small example
One piece
Naruto/naruto shippuden
Bleach
Fullmetal alchemist brotherhood
Fate/stay night
Fairy tale
Blue exorcist
Soul eater
Death note
Output file contains , and [
count 105
[11 eyes
, A certain magical index
, A certain magical index II
, Aldnoah.Zero
, Angel beats!
, Another
, Asu no yoichi
, Bay blade
, Beelzebub
, Ben-To
String str = "[12,34,45]";
String out = str.replaceAll(",|\\[|\\]","");
output:
123445
Why are you using an ObjectOutputStream? That is intended for serialising Java objects so they can be restored later; I don't see why you need it here.
Just use a FileWriter, like so:
public static void write(ArrayList<String> input) throws IOException
{
    // declare fw outside the try block so the finally block can see it
    FileWriter fw = null;
    try
    {
        File file = new File("Animeout.txt");
        fw = new FileWriter(file);
        for (int i = 0; i < input.size(); i++) {
            fw.append(input.get(i) + "\n");
        }
    }
    finally
    {
        try {
            if (fw != null)
                fw.close();
        }
        catch (Exception e) {
            // ignore
        }
    }
}
Your write method is unfortunate. Try something like this instead (and remove the + "\n" when reading the lines):
public static void write(ArrayList<String> lines) throws IOException
{
    File file = new File("Animeout.txt");
    PrintStream ps = null;
    try {
        ps = new PrintStream(file);
        for (final String line : lines) {
            ps.println(line);
        }
    } finally {
        if (ps != null) { ps.close(); }
    }
}
The ObjectOutputStream you are using is not appropriate for simply writing lines of text.
Finally, if all you want to do is sort the lines of a text file, at least on a POSIX system, you can just do it with
$ sort anime.txt > Animeout.txt
from the command line.
