I am a second-year computer science student and was asked to prepare a project on Huffman coding.
During the project I got stuck; I am at the stage of building the encoder. I get a file and I have to encode it in bytes according to the Huffman code.
My question is how to encode the file in bytes. What I did, for example:
I received the word "abracadabra" in the file, and into another file I put the encoding, but using a string and not bytes.
The relevant part of the code:
public static void writeOutputFile(String[] input_names, String[] output_names, Map<Character, String> codes)
{
    FileInputStream input;
    FileOutputStream output;
    try
    {
        input = new FileInputStream(input_names[0]);
        output = new FileOutputStream(output_names[0]);
        for (int i = 0; i < (int) input.getChannel().size(); i++)
        {
            int x = input.read();
            String codeOutput = codes.get((char) x);
            //output.write(Integer.parseInt(codeOutput, 2));
            for (int j = 0; j < codeOutput.length(); j++) {
                output.write((int) codeOutput.charAt(j));
            }
        }
        input.close();
        output.close();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}
How can I use bytes and not the string?
Thanks for the help.
public static void writeOutputFile(String[] input_names,
                                   String[] output_names,
                                   Map<Character, String> codes) {
    try (FileInputStream input = new FileInputStream(input_names[0]);
         FileOutputStream output = new FileOutputStream(output_names[0])) {
        StringBuilder toWrite = new StringBuilder();
        for (int i = 0; i < (int) input.getChannel().size(); i++) {
            toWrite.append(codes.get((char) input.read()));
        }
        output.write(toWrite.toString().getBytes());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Use String.getBytes() to write bytes to the file.
Use try-with-resources and don't worry about closing the resources. Use ; to separate multiple resources.
Don't write in a loop. Build the string first and then write it once. I/O is slow.
When concatenating in a loop, use StringBuilder to avoid creating new Strings.
I made your code a bit more concise; you can rewrite it as you like.
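Note that writing the '0' and '1' characters with getBytes() still uses one whole byte per bit, so the output file is not actually compressed. If you eventually want real compression, the code string has to be packed into bits; a minimal sketch (packBits is an illustrative helper, not part of your existing code):
// Packs a string of '0'/'1' characters into real bits, 8 per byte, most significant bit first.
// The last byte is zero-padded; a real encoder would also record the total bit count
// so the decoder knows where the padding starts.
static byte[] packBits(String bits) {
    byte[] out = new byte[(bits.length() + 7) / 8];
    for (int i = 0; i < bits.length(); i++) {
        if (bits.charAt(i) == '1') {
            out[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    return out;
}
You could then call output.write(packBits(toWrite.toString())) instead of writing the raw string bytes.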
I have a class which reads a CSV file, but when the file is large the program throws a Java heap space error, so I need to split that file into pieces and move the lines into other files according to a line count.
For example:
I have a file of 500,000 lines and I'm dividing it into 5 files of 100,000 lines each, so I have 5 files of 100,000 lines that I can read.
I couldn't find a way to do that, so it would be nice to see some example code.
public static void splitLargeFile(final String fileName,
                                  final String extension,
                                  final int maxLines,
                                  final boolean deleteOriginalFile) {
    try (Scanner s = new Scanner(new FileReader(String.format("%s.%s", fileName, extension)))) {
        int file = 0;
        int cnt = 0;
        BufferedWriter writer = new BufferedWriter(new FileWriter(String.format("%s_%d.%s", fileName, file, extension)));
        while (s.hasNextLine()) {
            // read whole lines, so maxLines really counts lines (not whitespace tokens)
            writer.write(s.nextLine() + System.lineSeparator());
            if (++cnt == maxLines && s.hasNextLine()) {
                writer.close();
                writer = new BufferedWriter(new FileWriter(String.format("%s_%d.%s", fileName, ++file, extension)));
                cnt = 0;
            }
        }
        writer.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    if (deleteOriginalFile) {
        try {
            File f = new File(String.format("%s.%s", fileName, extension));
            f.delete();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
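For the example in the question this could be called like so, producing big_0.csv, big_1.csv, and so on with 100,000 lines each (the file name is illustrative):
// Split big.csv into pieces of 100,000 lines each, keeping the original file.
splitLargeFile("big", "csv", 100000, false);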
If you're on Linux and you can run the CSV through a script first, then you can use "split":
$ split -l 100000 big.csv small-
This generates files named small-aa, small-ab, small-ac... To rename these to csv's if needed:
$ for a in small-*; do
mv $a $a.csv; # rename split files to .csv
java MyCSVProcessor $a.csv; # or just process them anyways
done
Try this for additional options:
$ split --help
-a, --suffix-length=N   use suffixes of length N (default 2)
-b, --bytes=SIZE        put SIZE bytes per output file
-C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
-d, --numeric-suffixes  use numeric suffixes instead of alphabetic
-l, --lines=NUMBER      put NUMBER lines per output file
This is, however, a poor mitigation for your problem: the reason your CSV reader module is running out of memory is that it either reads the whole file into memory before splitting it, or keeps your processed output in memory. To make your code more portable and universally runnable, you should consider processing one line at a time and splitting the input yourself, line by line. (From https://stackabuse.com/reading-and-writing-csvs-in-java/)
String row;
BufferedReader csvReader = new BufferedReader(new FileReader(pathToCsv));
while ((row = csvReader.readLine()) != null) {
    String[] data = row.split(",");
    // do something with the data
}
csvReader.close();
A caveat with the above code is that quoted commas will just be treated as new columns; you will have to add some additional processing if your CSV data contains quoted commas, for example as sketched below.
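One quick (not fully RFC 4180 compliant) way to ignore commas inside double quotes is a look-ahead regex; a sketch:
// Split only on commas that are outside quoted fields (assumes quotes are balanced).
String[] data = row.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);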
Of course, if you really want to use your existing code, and just want to split the file, you can adapt the above:
import java.io.*;

public class split {
    static String CSVFile = "test.csv";
    static String row;
    static BufferedReader csvReader;
    static PrintWriter csvWriter;

    public static void main(String[] args) throws IOException {
        csvReader = new BufferedReader(new FileReader(CSVFile));
        int line = 0;
        while ((row = csvReader.readLine()) != null) {
            if (line % 100000 == 0) { // maximum lines per file
                if (line > 0) { csvWriter.close(); }
                csvWriter = new PrintWriter("cut-" + Integer.toString(line) + CSVFile);
            }
            csvWriter.println(row);
            // String[] data = row.split(",");
            // do something with the data
            line++;
        }
        csvWriter.close();
        csvReader.close();
    }
}
I chose PrintWriter over FileWriter or BufferedWriter because it automatically prints the relevant newlines, and I would presume that it's buffered... I've not written anything in Java in 20 years, so I bet you can improve on the above.
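If you would rather not presume buffering, PrintWriter can also be wrapped around an explicit BufferedWriter; a minimal sketch (the file name is illustrative):
// Explicitly buffered output; println still appends the line separator for you.
PrintWriter csvWriter = new PrintWriter(new BufferedWriter(new FileWriter("cut-0test.csv")));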
I created a simple function to create a child CSV from the parent CSV based on a start and end range. It can be used as a splitter based on a line range.
public static void createcsv(String csvPath, String newcsvPath, int startRange, int lastRange) {
    csvPath = csvPath.trim();
    String childcsvPath = newcsvPath.trim();
    Scanner sc = null;
    FileWriter writer = null;
    int count = 0;
    // Iterate to startRange location
    try {
        sc = new Scanner(new File(csvPath));
        sc.useDelimiter(","); // sets the delimiter pattern
        ArrayList<String> newCsv = new ArrayList<String>();
        while (sc.hasNextLine()) // returns a boolean value
        {
            String value = sc.nextLine();
            count++;
            if (count > lastRange)
                break;
            else if (count >= startRange) {
                newCsv.add(value);
            } else
                continue;
        }
        writer = new FileWriter(childcsvPath);
        for (int j = 0; j < newCsv.size(); j++) {
            writer.append(newCsv.get(j));
            writer.append("\n");
        }
    } catch (Exception e) {
        System.out.print("Exception Found" + e);
    } finally {
        if (sc != null) {
            try {
                sc.close();
                writer.close();
            } catch (Exception e) {
            }
        }
    }
}
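For example, to copy the first 100,000 lines of the parent file into a child file (the paths are illustrative; both bounds are inclusive and counting starts at 1):
createcsv("D:/big.csv", "D:/big_part1.csv", 1, 100000);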
I'm doing something like this:
for (int i = 0; i < 100000; i++) {
    System.out.println( i );
}
Basically, I compute an integer and output a string about 10K-100K times and then need to write the result to System.out, each result separated by a newline.
What's the fastest way to achieve this?
Thank you for the suggestions. I created a test program to compare them:
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.lang.StringBuilder;

public class systemouttest {
    public static void main(String[] args) throws Exception {
        long starttime = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            System.out.println(i);
        }
        long printlntime = System.currentTimeMillis();

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append(i + "\n");
        }
        System.out.print(sb.toString());
        long stringbuildertime = System.currentTimeMillis();

        OutputStream out = new BufferedOutputStream(System.out);
        for (int i = 0; i < 100000; i++) {
            out.write((i + "\n").getBytes());
        }
        out.flush();
        long bufferedoutputtime = System.currentTimeMillis();

        BufferedWriter log = new BufferedWriter(new OutputStreamWriter(System.out));
        for (int i = 0; i < 100000; i++) {
            log.write(i + "\n");
        }
        log.flush();
        long bufferedwritertime = System.currentTimeMillis();

        System.out.println("System.out.println: " + (printlntime - starttime));
        System.out.println("StringBuilder: " + (stringbuildertime - printlntime));
        System.out.println("BufferedOutputStream: " + (bufferedoutputtime - stringbuildertime));
        System.out.println("BufferedWriter: " + (bufferedwritertime - bufferedoutputtime));
    }
}
Results (times in milliseconds):
Environment 1
System.out.println: 482
StringBuilder: 210
BufferedOutputStream: 86
BufferedWriter: 202
Environment 2
System.out.println: 1763
StringBuilder: 45
BufferedOutputStream: 76
BufferedWriter: 34
The suggestions all performed better than System.out.println. BufferedOutputStream seems to be the safest choice as it performed well in both test environments; BufferedWriter may be faster, though.
Please post further suggestions if anyone has some ideas. I'm sure someone can make it go faster :)
For large amounts of data, System.out.println might be inefficient as it does not do very good buffering. In that case, you can use a BufferedOutputStream or a BufferedWriter.
Keep in mind that I/O operations are very slow compared to in-memory processing (e.g. parsing of an Integer).
So, I would propose you create the whole string in advance and then print it out only once (if that is possible, of course):
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100000; i++) {
    sb.append(i).append("\n");
}
String printMe = sb.toString();
System.out.println(printMe);
There are various techniques like buffering at the level of the output stream you're using, but I assume that you prefer to stay with the most basic System.out.println.
Hope this helps.
This includes a fast input method as well as fast output:
import java.io.*;
import java.util.StringTokenizer;

public class templa {

    static class FastReader {
        BufferedReader br;
        StringTokenizer st;

        public FastReader() {
            br = new BufferedReader(new InputStreamReader(System.in));
        }

        String next() {
            while (st == null || !st.hasMoreElements()) {
                try {
                    st = new StringTokenizer(br.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            return st.nextToken();
        }

        int nextInt() {
            return Integer.parseInt(next());
        }

        long nextLong() {
            return Long.parseLong(next());
        }

        double nextDouble() {
            return Double.parseDouble(next());
        }

        String nextLine() {
            String str = "";
            try {
                str = br.readLine();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return str;
        }
    }

    public static void main(String... args) throws Exception {
        OutputStream outputStream = System.out;
        PrintWriter out = new PrintWriter(outputStream);
        FastReader in = new FastReader();
        int testcase = in.nextInt();
        while (testcase-- > 0) {
            // in object works same as Scanner Object but much faster
            // out.println() works faster than System.out.println()
            // Write your code here
        }
        out.close();
    }
}
The slowest part of writing to System.out is the time taken to display what you are writing, i.e. for every line you write the computer has to turn the information into pixels using a font and scroll a whole line. This is much more work than whatever you are likely to be doing to produce the text.
You can speed up writing to the console by:
writing less (usually the best idea)
writing to a file instead (this can be 5-10x faster; see the sketch below)
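For example, you can redirect System.out to a buffered file stream and keep the println calls unchanged; a minimal sketch (the file name is illustrative):
// All subsequent System.out.println calls go to run.log instead of the console.
// FileOutputStream throws FileNotFoundException, so handle or declare it.
PrintStream fileOut = new PrintStream(new BufferedOutputStream(new FileOutputStream("run.log"), 64 * 1024));
System.setOut(fileOut);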
I was asked to use Huffman coding to compress an input file and write it to an output file. I have finished implementing the Huffman tree structure and generating the Huffman codes, but I don't know how to write those codes into a file so that the file is smaller than the original file.
Right now I have the codes in string representation (e.g. the Huffman code for 'c' is "0100"). Could someone please help me write those bits into a file?
Here is a possible implementation for writing a stream of bits (the output of Huffman coding) into a file.
class BitOutputStream {

    private OutputStream out;
    private boolean[] buffer = new boolean[8];
    private int count = 0;

    public BitOutputStream(OutputStream out) {
        this.out = out;
    }

    public void write(boolean x) throws IOException {
        this.count++;
        this.buffer[8 - this.count] = x;
        if (this.count == 8) {
            int num = 0;
            for (int index = 0; index < 8; index++) {
                num = 2 * num + (this.buffer[index] ? 1 : 0);
            }
            // write(int) stores the low 8 bits, so a value in 0..255 can be written as-is
            this.out.write(num);
            this.buffer = new boolean[8];
            this.count = 0;
        }
    }

    public void close() throws IOException {
        // Flush the last, partially filled byte (zero-padded), if there is one
        if (this.count > 0) {
            int num = 0;
            for (int index = 0; index < 8; index++) {
                num = 2 * num + (this.buffer[index] ? 1 : 0);
            }
            this.out.write(num);
        }
        this.out.close();
    }
}
By calling the write method you will be able to write bit by bit into a file (OutputStream).
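For example, a minimal sketch of feeding the Huffman codes through it, assuming you already have a Map<Character, String> of code strings (the method and file names are illustrative):
// Writes each character's code string (e.g. "0100" for 'c') as real bits.
static void writeCodes(String text, Map<Character, String> codes, String outFile) throws IOException {
    BitOutputStream bits = new BitOutputStream(new FileOutputStream(outFile));
    for (char c : text.toCharArray()) {
        for (char bit : codes.get(c).toCharArray()) {
            bits.write(bit == '1'); // '1' -> true, '0' -> false
        }
    }
    bits.close(); // flushes the last partial byte and closes the underlying stream
}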
Edit
For your specific problem, to save each character's Huffman code you can simply use this, if you don't want to use some other fancy class:
String huffmanCode = "0100"; // let's say this is the Huffman coding output for 'c'
BitSet huffmanCodeBit = new BitSet(huffmanCode.length());
for (int i = 0; i < huffmanCode.length(); i++) {
    if (huffmanCode.charAt(i) == '1')
        huffmanCodeBit.set(i);
}
String path = Resources.getResource("myfile.out").getPath();
ObjectOutputStream outputStream = null;
try {
    outputStream = new ObjectOutputStream(new FileOutputStream(path));
    outputStream.writeObject(huffmanCodeBit);
} catch (IOException e) {
    e.printStackTrace();
}
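To get the bits back you would deserialize the BitSet with the matching ObjectInputStream; a minimal sketch:
// Reads the BitSet written above back from the same path.
try (ObjectInputStream inputStream = new ObjectInputStream(new FileInputStream(path))) {
    BitSet huffmanCodeBit = (BitSet) inputStream.readObject();
    System.out.println(huffmanCodeBit); // prints {1} for the "0100" example
} catch (IOException | ClassNotFoundException e) {
    e.printStackTrace();
}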
I need to read a text file into a 2D array. I can read files into the program perfectly fine (see my code below); however, I cannot get my head around how to read them into a 2D array. The array the function is reading into is a global array, which is why it's not declared in the function.
Also, I won't know the number of rows the array has at first (it's currently set to 300, as it won't be over this) and I know this could cause a problem. I've seen some people suggest using ArrayLists, but I have to have a 2D array, so I was also wondering whether there is a way to change an ArrayList into a 2D array and whether this would be more effective.
public static String readMaze(String fileName) {
    String line = null;
    try {
        FileReader fileReader = new FileReader(fileName);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println(line);
            for (int i = 0; i < mazeNew.length; i++) {
                for (int j = 0; j < mazeNew[i].length; j++) {
                    // mazeNew[i][j] = ; - this is where I think something needs to be added
                }
            }
        }
        bufferedReader.close();
    }
    catch (FileNotFoundException ex) {
        System.out.println("Unable to open file: " + fileName);
    }
    catch (IOException ex) {
        System.out.println("Error reading file: " + fileName);
    }
    return fileName;
}
example text file:
11 4
5 6
4 6
0 5
3 5
8 7
1 4
There are a few options here, but generally you'll want to use the Java Scanner class as it's designed for exactly this kind of thing. Alternatively, use an existing structured data format (like JSON or XML) and an existing parser to go with it; the advantage being you can make use of a vast number of tools and libraries which deal with those formats and don't have to re-invent anything.
However, following through with the scanner approach, it would be like so:
public static ArrayList<int[]> readMaze(String fileName) {
    // Number of ints per line:
    int width = 2;
    // This will be the output - a list of rows, each with 'width' entries:
    ArrayList<int[]> results = new ArrayList<int[]>();
    String line = null;
    try {
        FileReader fileReader = new FileReader(fileName);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        Scanner mazeRunner = new Scanner(bufferedReader);
        // While we've got another line..
        while (mazeRunner.hasNextLine()) {
            // Setup current row:
            int[] row = new int[width];
            // For each number..
            for (int i = 0; i < width; i++) {
                // Read the number and add it to the current row:
                row[i] = mazeRunner.nextInt();
            }
            // Add the row to the results:
            results.add(row);
            // Go to the next line (optional, but helps deal with erroneous input files):
            if (mazeRunner.hasNextLine()) {
                // Go to the next line:
                mazeRunner.nextLine();
            }
        }
        mazeRunner.close();
    }
    catch (FileNotFoundException ex) {
        System.out.println("Unable to open file: " + fileName);
    }
    catch (IOException ex) {
        System.out.println("Error reading file: " + fileName);
    }
    return results;
}
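Since you mentioned needing a 2D array in the end, the returned list of rows can be converted afterwards, for example:
ArrayList<int[]> rows = readMaze("maze.txt");
int[][] maze = rows.toArray(new int[0][]); // each list element becomes one row of the 2D array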
If you have a fixed number of columns you can use this, but make sure the input file follows the same number of columns.
FileReader fileReader = new FileReader(fileName);
Scanner sc = new Scanner(fileReader);
int row = 0, col = 0;
while (sc.hasNextInt()) {
    if (col < colSize) { // colSize is the number of columns
        mazeNew[row][col] = sc.nextInt();
        col++;
    } else {
        col = 0;
        row++;
    }
}
Below is the core logic; you would probably also want to handle some errors, such as how many elements each line splits into, whether there are empty lines, etc.
List<String[]> list = new ArrayList<>();
Pattern pattern = Pattern.compile("\\s+");
String line;
while ((line = bufferedReader.readLine()) != null) {
    list.add(pattern.split(line, -1));
}
String[][] mazeNew = list.toArray(new String[0][0]);
Something like this would work. It won't only read 2D text files; it should work fine with any dimensions.
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;

public class Utile {
    public static ArrayList<int[]> readMaze(String path) {
        ArrayList<int[]> result = new ArrayList<>();
        try {
            Scanner sc = new Scanner(new File(path));
            String[] temp;
            String line;
            while (sc.hasNextLine()) {
                line = sc.nextLine();
                if (line.length() != 0) { // if the line is empty it will cause NumberFormatException
                    temp = line.split(" ");
                    int[] val = new int[temp.length];
                    for (int i = 0; i < temp.length; i++) {
                        val[i] = Integer.parseInt(temp[i]);
                    }
                    result.add(val);
                }
            }
            sc.close();
        } catch (Exception e) {
            e.printStackTrace(); // just log it for now
        }
        return result;
    }
}
I am not a Java expert, but in PHP I would do it with explode(). I found an example of how to do the same in Java using String.split(). The result is the same: a 2D array of the content. If possible you should try to add a delimiter to the rows inside that text document, but you could also split the rows on the space character.
Example:
String foo = "This,that,other";
String[] split = foo.split(",");
StringBuilder sb = new StringBuilder();
for (int i = 0; i < split.length; i++) {
    sb.append(split[i]);
    if (i != split.length - 1) {
        sb.append(" ");
    }
}
String joined = sb.toString();
I need to limit the file size to 1 GB while writing, preferably using BufferedWriter.
Is it possible using BufferedWriter, or do I have to use other libraries?
Something like:
try (BufferedWriter writer = Files.newBufferedWriter(path)) {
//...
writer.write(lines.stream());
}
You can always write your own OutputStream to limit the number of bytes written.
The following assumes you want to throw an exception if the size is exceeded.
public final class LimitedOutputStream extends FilterOutputStream {

    private final long maxBytes;
    private long bytesWritten;

    public LimitedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureCapacity(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b) throws IOException {
        write(b, 0, b.length);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureCapacity(len);
        // write to the underlying stream directly; FilterOutputStream's default
        // implementations would re-enter the overridden methods and count bytes twice
        out.write(b, off, len);
    }

    private void ensureCapacity(int len) throws IOException {
        long newBytesWritten = this.bytesWritten + len;
        if (newBytesWritten > this.maxBytes)
            throw new IOException("File size exceeded: " + newBytesWritten + " > " + this.maxBytes);
        this.bytesWritten = newBytesWritten;
    }
}
You will of course now have to set up the Writer/OutputStream chain manually.
final long SIZE_1GB = 1073741824L;
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        new LimitedOutputStream(Files.newOutputStream(path), SIZE_1GB),
        StandardCharsets.UTF_8))) {
    //
}
Hitting exactly 1 GB is very difficult when you are writing lines, since each line may contain an unknown number of bytes. I am assuming you want to write the data line by line to the file.
However, you can check how many bytes a line has before writing it to the file; another approach is to check the file size after writing each line.
The following basic example writes the same line each time. Here the text This is just a test ! takes 21 bytes in the file in UTF-8 encoding. After 49 writes the file reaches 1029 bytes and the loop stops writing.
public class Test {

    private static final int ONE_KB = 1024;

    public static void main(String[] args) {
        File file = new File("D:/test.txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
            while (file.length() < ONE_KB) {
                writer.write("This is just a test !");
                writer.flush();
            }
            System.out.println("1 KB Data is written to the file.!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
As you can see, we have already written past the limit of 1 KB: the above program writes 1029 bytes, not at most 1024 bytes.
The second approach is checking the byte length, for a specific encoding, before writing to the file.
public class Test {

    private static final int ONE_KB = 1024;

    public static void main(String[] args) throws UnsupportedEncodingException {
        File file = new File("D:/test.txt");
        String data = "This is just a test !";
        int dataLength = data.getBytes("UTF-8").length;
        try (BufferedWriter writer = Files.newBufferedWriter(file.toPath())) {
            while (file.length() + dataLength < ONE_KB) {
                writer.write(data);
                writer.flush();
            }
            System.out.println("1 KB Data written to the file.!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In this approach we check the length in bytes prior to writing to the file, so it writes 1008 bytes and then stops.
Problems with both approaches:
Write and check: you may end up with some extra bytes and the file size may cross the limit.
Check and write: you may have fewer bytes than the limit if the next line has a lot of data in it. You should be careful about the encoding.
However, there are other ways to do this validation with third-party libraries like Apache Commons IO, and I find them more cumbersome than the conventional Java ways.
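For reference, here is a sketch of the Commons IO route (assuming commons-io is on the classpath, and reusing path and lines from above); CountingOutputStream keeps a running byte count that you can check between writes:
// Stop writing once the counted bytes reach the limit.
long limit = 1024L * 1024L * 1024L; // 1 GB
try (CountingOutputStream counting = new CountingOutputStream(Files.newOutputStream(path));
     BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(counting, StandardCharsets.UTF_8))) {
    for (String line : lines) {
        writer.write(line);
        writer.newLine();
        writer.flush(); // flush so the count reflects what has actually been written
        if (counting.getByteCount() >= limit) {
            break;
        }
    }
}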
// Assuming 'lines' is a List<String> of the lines to write.
int maxSize = 1_000_000_000;
Charset charset = StandardCharsets.UTF_8;
long size = 0;
int lineCount = 0;
while (lineCount < lines.size()) {
    long size2 = size + (lines.get(lineCount) + "\r\n").getBytes(charset).length;
    if (size2 > maxSize) {
        break;
    }
    size = size2;
    ++lineCount;
}
List<String> linesToWrite = lines.subList(0, lineCount);
Path path = Paths.get("D:/test.txt");
Files.write(path, linesToWrite, charset);
Or a bit faster, encoding each line only once:
int lineCount = 0;
try (FileChannel channel = new RandomAccessFile("D:/test.txt", "rw").getChannel()) {
    ByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, maxSize);
    lineCount = lines.size();
    for (int i = 0; i < lines.size(); i++) {
        byte[] line = (lines.get(i) + "\r\n").getBytes(charset);
        if (line.length > buf.remaining()) {
            lineCount = i;
            break;
        }
        buf.put(line);
    }
}
IIUC, there are various ways to do it:
Keep writing data in chunks and flushing it, and keep checking the file size after every flush.
Use log4j (or some logging framework) which can roll over to a new file after a certain size, time, or some other trigger point.
While BufferedWriter is great, there are some newer APIs in Java which could make it faster; see Fastest way to write huge data in text file Java.