How do I solve this?
File f=new File("d:/tester.txt");
long size=f.length(); // returns the size in bytes
char buff[]=new char[size]; // line of ERROR
// will not accept long in its argument
// I can't do the casting (loss of data)
Is it possible to use size as the length of buff without losing data?
If so, how can I do it?
My second question is:
Why am I not getting the actual number of bytes?
This is the program:
import java.io.*;
class tester {
public static void main(String args[]) {
File f=new File("c:/windows/system32/drivers/etc/hosts.File");
long x=f.length(); // returns the size of the file in bytes
System.out.println("long-> " + x );
}
}
The output is long-> 0, but the file is obviously not empty. Why do I get this result?
You need to cast the long to an int:
char buff[]=new char[(int) size];
This will only work for files less than 2 GB in size.
However, if you intend to use this to read the file, perhaps you meant:
byte[] buff=new byte[(int) size];
I would look at FileUtils and IOUtils from Apache Commons IO, which have lots of helper methods.
I doubt you have a file with that name. Perhaps you need to drop the .File at the end, which sounds like an odd extension.
File.length() returns 0L when the file does not exist, so I would check f.exists() first.
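Putting both answers together, here is a minimal sketch that refuses a missing file instead of trusting a 0 length, guards the long-to-int cast, and reads the bytes with RandomAccessFile.readFully. The class name ReadWholeFile and the Integer.MAX_VALUE - 8 cutoff are my own assumptions (some JVMs cannot allocate an array of exactly Integer.MAX_VALUE elements); it uses isFile() rather than exists() so that directories are rejected too.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ReadWholeFile {
    // Reads a whole file into memory, failing loudly instead of silently returning nothing.
    static byte[] readAll(File f) throws IOException {
        if (!f.isFile()) {
            // File.length() would just return 0L here, which is easy to mistake for an empty file.
            throw new FileNotFoundException("No such file: " + f.getPath());
        }
        long size = f.length();
        // Java arrays are indexed by int; stay a little under Integer.MAX_VALUE,
        // since some JVMs cannot allocate an array of exactly that length.
        if (size > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for one array: " + size + " bytes");
        }
        byte[] buff = new byte[(int) size]; // the cast is safe after the check above
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.readFully(buff); // loops until the array is full (or throws EOFException)
        }
        return buff;
    }

    public static void main(String[] args) throws IOException {
        File f = new File(args.length > 0 ? args[0] : "d:/tester.txt");
        System.out.println("Read " + readAll(f).length + " bytes");
    }
}
```

For files that do not fit in one array, read them in pieces instead, as in the next question below.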
Suppose my file is 2 GB and I want the data between one index and another (say, 300 MB between the two indices). What is the best way to do that? I tried substring, but it throws an OutOfMemoryError. Please suggest a better way to do this.
In general, assuming the 2 GB file is on disk and you want to read part of it into memory, you don't have to read the whole 2 GB into memory first.
The most straightforward solution is to use RandomAccessFile.
It provides a file pointer that can be moved back and forth over a large file; once the pointer is positioned, you can read bytes starting from where it points.
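try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
    file.seek(position);           // move the file pointer; nothing before it is read
    byte[] bytes = new byte[size]; // size is an int, so one array holds at most ~2 GB
    file.readFully(bytes);         // read() may return fewer bytes; readFully() loops until full
    // ... use bytes here; the file is closed automatically afterwards
}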
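If the 300 MB range should not sit in memory at all, the same seek can be combined with a small buffer that copies the range straight to another file. This is a sketch, assuming the indices are byte offsets (they equal character indices only for single-byte encodings such as ASCII); the class name, the 64 KB buffer size and the file names in main are illustrative.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class RangeCopy {
    // Copies bytes [start, end) of the source file to dest through a fixed-size buffer,
    // so memory use stays constant however large the range is. Returns the bytes copied.
    static long copyRange(String source, String dest, long start, long end) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("Bad range: " + start + ".." + end);
        }
        byte[] buf = new byte[64 * 1024];
        long copied = 0;
        try (RandomAccessFile in = new RandomAccessFile(source, "r");
             OutputStream out = new FileOutputStream(dest)) {
            in.seek(start); // jump straight to the start offset
            long remaining = end - start;
            while (remaining > 0) {
                int want = (int) Math.min(buf.length, remaining);
                int n = in.read(buf, 0, want);
                if (n == -1) {
                    break; // the source ended before 'end'
                }
                out.write(buf, 0, n);
                remaining -= n;
                copied += n;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names and offsets, for illustration only.
        long n = copyRange("big.txt", "big_substr.txt", 100L, 200L);
        System.out.println("Copied " + n + " bytes");
    }
}
```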
Reading the file character by character and writing the characters to an output file can solve the issue, since it never loads the whole file at once.
The process is: read the input file one character at a time, skip ahead to the desired substring's start index, then write to an output file until the end of the substring.
If you are getting Exception in thread "main" java.lang.OutOfMemoryError: Java heap space, you can try increasing the heap size (for example with -Xmx), but only if you really need to read the file at once and are sure the String will not exceed the maximum String length.
The following snippet shows the idea:
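import java.io.*;
public class LargeFileSubstr {
    public static void main(String[] args) throws IOException {
        long startIndex = 100; // first character to copy (inclusive)
        long endIndex = 200;   // stop before this character (exclusive, like String.substring)
        // Both streams are closed automatically, even if an exception is thrown.
        try (BufferedReader r = new BufferedReader(new FileReader("/Users/me/Downloads/big.txt"));
             PrintWriter wr = new PrintWriter(new BufferedWriter(new FileWriter("/Users/me/Downloads/big_substr.txt")))) {
            long pointer = 0; // long, so positions past 2^31 characters don't overflow
            int ch;
            // Stop as soon as the end index is reached, without reading the rest of the file.
            while (pointer < endIndex && (ch = r.read()) != -1) {
                if (pointer >= startIndex) {
                    wr.print((char) ch); // only characters inside the range are written
                }
                pointer++;
            }
        }
    }
}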
I have tried this to extract a 200 MB substring from a 2 GB file, and it runs reasonably fast.
I want to store some 0s and 1s in memory
I do not know how to explain this clearly, but I will try my best.
Let's say I have an image file of around 420 bytes.
I want to visualize its binary code, meaning I want to see the 0s and 1s. I ran this piece of code to do that, and it works just fine:
import java.util.Scanner;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
public class fileToBin {
public static void main(String[] args) throws Exception {
StringBuilder sb = new StringBuilder();
Scanner ana = new Scanner(System.in);
System.out.println("File?");
String fileName = ana.nextLine();
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(fileName))) {
for (int b; (b = is.read()) != -1;) {
String s = "0000000" + Integer.toBinaryString(b);
s = s.substring(s.length() - 8);
sb.append(s);
}
}
System.out.println(sb);
}
}
I passed FF0000.png as input and got the following output:
10001001010100000100111001000111000011010000101000011010000010100000000000000000000000000000110101001001010010000100010001010010000000000000000000000000100000000000000000000000000000001000000000001000000001100000000000000000000000001100001100111110011000011100101100000000000000000000000000000001011100110101001001000111010000100000000010101110110011100001110011101001000000000000000000000000000001000110011101000001010011010100000100000000000000001011000110001111000010111111110001100001000001010000000000000000000000000000100101110000010010000101100101110011000000000000000000001110110000110000000000000000000011101100001100000001110001110110111110101000011001000000000000000000000000010011100101001001010001000100000101010100011110000101111011101101110100100011000100000001000000000000000000001100110000111010000011111010001101111011110100001001000010010000011100001110110110000110110101000111100101110000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001000000101100010000001001000100000010011101000000100111000000000001110001
0000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010010001000000100111010000001001110000000000011100010000001011000100000010011001000010110110011110110101011011010001100001110111001011110010011001011111011001101010000000000000000000000000000000000100100101000101010011100100010010101110010000100110000010000010
I understand that this is the memory representation (please correct me if I am wrong about any of these terms) of this particular file.
Now, let's say I do not have any image file and I did not retrieve the binary code of any image file. The only thing I have is these 0s and 1s, and I do not know whether they actually represent a file or not. I have no idea what they represent.
I want to insert/load these 0s and 1s into computer memory. How can I do that?
This is the reverse of my earlier step, where I retrieved the binary code from a file. Now I want to load some 0s and 1s into memory and save them as a file. It does not need to be an image file; any file extension is fine, because I am assuming I do not know that any image file exists.
So my main task is this: I have some 0s and 1s, and I want to load them into memory and save them as a file. Is that possible? How can I do it in Java or any other programming language? How does this memory and binary representation work?
Sorry for my noobness, and thank you for your patience :)
Given a String of binary digits called str and an OutputStream (e.g. a FileOutputStream) called out:
For every 8 characters in str, get the byte's numerical value with Integer.parseInt, and write it to out.
String str = ...;
OutputStream out = ...;
for (int i = 0; i < str.length(); i += 8) {
    String byteStr = str.substring(i, i + 8);   // next 8 binary digits
    int byteVal = Integer.parseInt(byteStr, 2); // base 2: "11111111" -> 255
    out.write(byteVal);                         // writes the low 8 bits as one byte
}
Note that this will cause an IndexOutOfBoundsException if str.length() isn't a multiple of 8.
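A self-contained version of that loop, as a sketch: it strips whitespace (so a dump pasted across several lines still works), rejects input that is not all 0s and 1s or not a multiple of 8 characters before writing anything, and then saves the bytes to a file. The class name BinToFile and the file name output.bin are hypothetical. Feeding it the first 32 bits of the dump in the question yields the bytes 0x89 'P' 'N' 'G', the start of the PNG signature: the bits alone determine the file's content, and the extension is only a name.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BinToFile {
    // Turns a string of '0'/'1' characters into bytes, 8 characters per byte, most significant bit first.
    static byte[] toBytes(String str) {
        String bits = str.replaceAll("\\s", ""); // tolerate spaces and line breaks in a pasted dump
        if (!bits.matches("[01]*")) {
            throw new IllegalArgumentException("Only 0 and 1 are allowed");
        }
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("Length " + bits.length() + " is not a multiple of 8");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream(bits.length() / 8);
        for (int i = 0; i < bits.length(); i += 8) {
            // Radix 2: "00000000".."11111111" -> 0..255; write(int) keeps the low 8 bits.
            buf.write(Integer.parseInt(bits.substring(i, i + 8), 2));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The first 32 bits of the dump in the question.
        byte[] data = toBytes("10001001010100000100111001000111");
        try (OutputStream out = new FileOutputStream("output.bin")) { // hypothetical file name
            out.write(data);
        }
        System.out.println("Wrote " + data.length + " bytes");
    }
}
```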
Thanks for reading this.
What I'm trying to do is take a .wav file (just a short audio clip) and convert it to ints, where each int represents a tone of the audio.
If you're wondering why, it's because I'm doing an Arduino project: I want the Arduino to play a song, and for that I need an int array where every int is a tone.
So I thought: "If I write a little application that converts any .wav file to a .txt file storing the ints that represent the melody notes, I just need to copy those values into the Arduino project code."
So after all this, you may be asking, "What is your problem?"
I wrote the code and it is "working"; the only problem is that the .txt file only has "1024" on each line.
So obviously I have a problem: not all the tones are 1024 -_-
package WaveToText;
import java.io.*;
/**
*
* #author Luis Miguel Mejía Suárez
* #project This project converts WAV music files to an int array,
* which is printed to a .txt file to be used on an Arduino
* #serial 1.0.1 (05/11/201)
*/
public final class Converter
{
/**
*
* #Class Holds all the code for the application
*
* #Param Text is the .txt file where the ints will be stored
* #Param MyFile is the input stream for the .wav file to be converted
*/
PrintStream Text;
InputStream MyFile;
public Converter () throws FileNotFoundException, IOException
{
MyFile = new FileInputStream("C:\\Users\\luismiguel\\Dropbox\\ESTUDIO\\PROGRAMAS\\JAVA\\WavToText\\src\\WaveToText\\prueba.wav");
Text = new PrintStream(new File("Notes.txt"));
}
public void ConvertToTxt() throws IOException
{
BufferedInputStream in = new BufferedInputStream(MyFile);
int read;
byte[] buff = new byte[1024];
while ((read = in.read(buff)) > 0)
{
Text.println(read);
}
Text.close();
}
/**
* #param args the command line arguments
*/
public static void main(String[] args) throws IOException{
// TODO code application logic here
Converter Exc = new Converter();
Exc.ConvertToTxt();
}
}
Wait, wait, wait... a lot of things aren't right here.
You can't just read the bytes and send them to the Arduino because, as you say, the Arduino expects note numbers. A WAV file contains first a "header" with audio information, and then numbers representing discrete sample points of the signal (the waveform). If you want notes, you need an algorithm for pitch detection or music transcription.
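To make the "header, then samples" point concrete, here is a sketch using javax.sound.sampled, which parses the WAV header so the stream starts at the first sample. It assumes 16-bit signed mono PCM and rejects anything else; the class name WavSamples is hypothetical, and prueba.wav is just the file name from the question. What it returns is one int per waveform sample, not one per note, which is exactly why the raw numbers can't go straight to the Arduino.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavSamples {
    // Returns one int per sample point of the waveform (NOT one per note).
    // Assumes 16-bit signed PCM, mono; anything else is rejected.
    static int[] readSamples(File wav) throws IOException, UnsupportedAudioFileException {
        // getAudioInputStream parses the WAV header, so reading starts at the first sample.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat fmt = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                    || fmt.getSampleSizeInBits() != 16 || fmt.getChannels() != 1) {
                throw new UnsupportedAudioFileException("Expected 16-bit signed mono PCM, got " + fmt);
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096]; // a multiple of the 2-byte frame size
            int n;
            while ((n = in.read(chunk)) != -1) {
                bytes.write(chunk, 0, n);
            }
            byte[] raw = bytes.toByteArray();
            int[] samples = new int[raw.length / 2];
            for (int i = 0; i < samples.length; i++) {
                byte first = raw[2 * i], second = raw[2 * i + 1];
                int hi = fmt.isBigEndian() ? first : second;          // sign-extended high byte
                int lo = (fmt.isBigEndian() ? second : first) & 0xFF; // unsigned low byte
                samples[i] = (hi << 8) | lo;
            }
            return samples;
        }
    }

    public static void main(String[] args) throws Exception {
        int[] s = readSamples(new File(args.length > 0 ? args[0] : "prueba.wav"));
        System.out.println(s.length + " samples");
    }
}
```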
Pitch detection can work if your music is monophonic or close to it; for full-band songs it would be troublesome. I assume the Arduino will play monophonic music, so you need to extract the fundamental frequency of the signal at each moment in time. This is called pitch detection, and there are several ways to do it (autocorrelation, AMDF, spectral analysis). You must also keep track of the timing of the notes.
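As an illustration of the autocorrelation approach, here is a naive sketch for one frame of mono samples (for example, a slice of the ints from the WAV file converted to double). It tries every lag corresponding to a frequency between minFreq and maxFreq and takes the lag where the signal best matches a shifted copy of itself as one period. The class name, the search range and the unnormalized scoring are my own choices; real detectors add refinements such as interpolation between lags, since integer lags limit the frequency resolution (a 440 Hz tone at 8000 Hz comes out near 444 Hz).

```java
public class PitchDetector {
    // Estimates the fundamental frequency (Hz) of one frame of mono samples with plain
    // autocorrelation. Returns 0.0 when no lag has a positive correlation (e.g. silence).
    static double estimatePitch(double[] frame, double sampleRate, double minFreq, double maxFreq) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxFreq));
        int maxLag = Math.min((int) Math.ceil(sampleRate / minFreq), frame.length - 1);
        int bestLag = -1;
        double bestScore = 0.0; // only accept positive correlations
        for (int lag = minLag; lag <= maxLag; lag++) {
            double score = 0.0;
            for (int i = 0; i + lag < frame.length; i++) {
                score += frame[i] * frame[i + lag];
            }
            // The sum is left unnormalized: longer lags have fewer overlapping terms, which
            // favors the shortest period over its multiples (helps avoid octave errors).
            if (score > bestScore) {
                bestScore = score;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : 0.0;
    }

    public static void main(String[] args) {
        double fs = 8000.0;
        double[] frame = new double[1024];
        for (int i = 0; i < frame.length; i++) {
            frame[i] = Math.sin(2 * Math.PI * 440.0 * i / fs); // synthetic 440 Hz test tone
        }
        System.out.printf("%.1f Hz%n", estimatePitch(frame, fs, 80.0, 1000.0));
    }
}
```

Running it over successive frames (for example every 1024 samples) gives one pitch estimate per frame, which also preserves the timing information the answer mentions.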
Once you have extracted the frequencies, there is a formula to convert a frequency into an integer note number on a piano: n = 12 log2(f / 440) + 49, where n is the note number and f is the fundamental frequency of the note. Before calculating, you should also quantize the frequencies you get from the pitch detection algorithm to the closest note frequency (look up a table of exact note frequencies).
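The formula can be sketched in Java as follows. Rounding n to the nearest integer is one way to do the quantization step, since it snaps the frequency to the closest equal-tempered note. The class and method names are hypothetical, and the inverse function is included only to make the mapping easy to check.

```java
public class NoteNumber {
    // Piano key number for a fundamental frequency f, using n = 12 * log2(f / 440) + 49
    // (A4 = 440 Hz is key 49), rounded to the nearest key.
    static int noteNumber(double f) {
        if (f <= 0) {
            throw new IllegalArgumentException("Frequency must be positive: " + f);
        }
        double n = 12.0 * (Math.log(f / 440.0) / Math.log(2.0)) + 49.0;
        return (int) Math.round(n);
    }

    // Inverse: exact equal-temperament frequency of key n.
    static double frequency(int n) {
        return 440.0 * Math.pow(2.0, (n - 49) / 12.0);
    }

    public static void main(String[] args) {
        System.out.println(noteNumber(440.0));  // 49 (A4)
        System.out.println(noteNumber(444.4));  // 49: a slightly sharp A4 still maps to A4
        System.out.println(noteNumber(261.63)); // 40 (middle C)
    }
}
```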
However, I really suggest doing some more research. It would be very difficult to detect notes in music where several instruments, drums, and a singer are all playing together.
while ((read = in.read(buff)) > 0)
{
Text.println(read);
}
This bit of code reads up to 1024 bytes of data from in and assigns the number of bytes actually read to read; that number is 1024 on every pass except possibly the last one. You then print read to your text file.
You probably wanted to print buff to your text file, but that is going to write 1024 bytes, rather than the 1024 ints you want.
You will need to create a for loop to print the individual bytes as ints.
while ((read = in.read(buff)) > 0)
{
    for (int i = 0; i < read; i++)   // only the first 'read' bytes of buff are valid
        Text.println((int) buff[i]); // one value per line; bytes are signed (-128..127)
}
I want to read the first x bytes from a java.net.URLConnection (although I'm not tied to this class; other suggestions are welcome).
My code looks like this:
val head = new Array[Byte](2000)
new BufferedInputStream(connection.getInputStream).read(head)
IOUtils.toString(new ByteArrayInputStream(head), charset)
It works, but does this code load only the first 2000 bytes from the network?
Next attempt
As JB Nizet said, using a buffered input stream is pointless here, so I tried an InputStreamReader:
val head = new Array[Char](2000)
new InputStreamReader(connection.getInputStream, charset).read(head)
new String(head)
This code may be better, but the load times are about the same. So does this approach limit the number of bytes transferred?
No, it doesn't. It could read up to 8192 bytes (the default buffer size of BufferedInputStream). It could also read 0 bytes, or any number of bytes between 0 and 2000, since you don't check the number of bytes actually read, which is returned by the read() method.
Finally, depending on the value of charset and on the actual charset used by the HTTP response, this could return an incorrect String, or a String truncated in the middle of a multi-byte character. You should use a Reader to read text.
I suggest you read the Java IO tutorial.
You can use IOUtils.read(Reader, char[]) from Apache Commons IO. Just pass a 2000-character buffer to it and it will fill it with as many characters as possible, up to 2000.
Be sure you understand the objections in the other answers/comments, in particular:
Don't use Buffered... wrappers; they read ahead, which goes against your intentions.
If you are reading textual data, use a Reader to read 2000 characters instead of an InputStream reading 2000 bytes. The proper procedure is to determine the character encoding from the response's Content-Type header and pass that encoding to the InputStreamReader.
Calling plain read(char[]) on a Reader will not necessarily fill the array you give it. It can read as little as one character, no matter how big the array is!
Don't forget to close the reader afterwards.
I'd also strongly recommend using Apache HttpClient instead of java.net.URLConnection; it's much more flexible.
Edit: To understand the difference between Reader.read and IOUtils.read, it's worth examining the source of the latter:
public static int read(Reader input, char[] buffer,
int offset, int length)
throws IOException
{
if (length < 0) {
throw new IllegalArgumentException("Length must not be negative: " + length);
}
int remaining = length;
while (remaining > 0) {
int location = length - remaining;
int count = input.read(buffer, offset + location, remaining);
if (EOF == count) { // EOF
break;
}
remaining -= count;
}
return length - remaining;
}
Since Reader.read can read fewer characters than the requested length (we only know it reads at least 1 and at most length characters, unless it returns -1 at the end of the stream), we need to keep calling it until we get the amount we want.
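Without Commons IO, the same advice can be sketched in plain Java: a loop in the spirit of IOUtils.read that keeps calling Reader.read until the buffer is full or the stream ends, plus a small helper that takes the charset from the Content-Type header. All names here are hypothetical, and the ISO-8859-1 fallback (the old HTTP/1.1 default for text types) is an assumption you may want to change. Note that InputStreamReader itself decodes in chunks, so it may still pull more bytes from the connection than the characters it returns; this bounds what you read, not exactly what is transferred.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlHead {
    // Calls read() until 'max' chars have been read or the stream ends,
    // because a single Reader.read(char[]) may return fewer chars than requested.
    static String readUpTo(Reader in, int max) throws IOException {
        char[] buf = new char[max];
        int total = 0;
        while (total < max) {
            int n = in.read(buf, total, max - total);
            if (n == -1) {
                break; // end of stream before 'max' chars
            }
            total += n;
        }
        return new String(buf, 0, total);
    }

    // Picks the charset parameter out of a Content-Type value such as "text/html; charset=UTF-8".
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = URI.create("https://example.com/").toURL().openConnection(); // hypothetical URL
        Charset cs = charsetOf(conn.getContentType(), StandardCharsets.ISO_8859_1);
        // No Buffered... wrapper; try-with-resources closes the reader afterwards.
        try (Reader r = new InputStreamReader(conn.getInputStream(), cs)) {
            System.out.println(readUpTo(r, 2000));
        }
    }
}
```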
I'm working with a very big text file (755 MB).
I need to sort the lines (about 1,890,000) and then write them to another file.
I already found this discussion, whose starting file is very similar to mine:
Sorting Lines Based on words in them as keys
The problem is that I cannot store the lines in an in-memory collection because I get a "Java heap space" OutOfMemoryError, even after expanding the heap to the maximum (already tried!).
I can't open it in Excel and use its sort feature either, because the file is too large to be loaded completely.
I thought about using a database, but I think writing all the lines and then running a SELECT query would take too long to execute. Am I wrong?
Any hints appreciated
Thanks in advance
I think the solution here is to do a merge sort using temporary files:
Read the first n lines of the first file (n being the number of lines you can afford to store and sort in memory), sort them, and write them to file 1.tmp (or whatever you call it). Do the same with the next n lines and store them in 2.tmp. Repeat until all lines of the original file have been processed.
Read the first line of each temporary file. Determine the smallest one (according to your sort order), write it to the destination file, and read the next line from the corresponding temporary file. Repeat until all lines have been processed.
Delete all the temporary files.
This works with arbitrarily large files, as long as you have enough disk space.
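The three steps above can be sketched in Java as follows, assuming UTF-8 input and plain String.compareTo ordering (the same order Collections.sort gives for strings). The class name, the maxLines parameter and the temp-file names are illustrative. A PriorityQueue does the "determine the smallest one" step without scanning every temporary file for each output line; if the number of chunks ever exceeds the operating system's limit on open files, you would merge in several passes instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {
    // An open temporary file together with the line currently at its head.
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    // Sorts the lines of 'input' into 'output', keeping at most 'maxLines' lines in memory at once.
    static void sort(Path input, Path output, int maxLines) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<BufferedReader> readers = new ArrayList<>();
        try {
            // Step 1: sort chunks of maxLines lines and write each to its own temporary file.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> chunk = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == maxLines) {
                        runs.add(writeRun(chunk));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) {
                    runs.add(writeRun(chunk));
                }
            }
            // Step 2: repeatedly write the smallest head line across all temporary files.
            PriorityQueue<Run> queue = new PriorityQueue<>();
            for (Path p : runs) {
                BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8);
                readers.add(r);
                Run run = new Run(r);
                if (run.head != null) queue.add(run);
            }
            try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
                while (!queue.isEmpty()) {
                    Run smallest = queue.poll();
                    out.write(smallest.head);
                    out.newLine();
                    smallest.head = smallest.reader.readLine();
                    if (smallest.head != null) queue.add(smallest); // re-queue with its next line
                }
            }
        } finally {
            // Step 3: close and delete all temporary files.
            for (BufferedReader r : readers) r.close();
            for (Path p : runs) Files.deleteIfExists(p);
        }
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path tmp = Files.createTempFile("sort-run", ".tmp");
        Files.write(tmp, chunk, StandardCharsets.UTF_8); // one line per element
        return tmp;
    }
}
```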
You can run the following with
-mx1g -XX:+UseCompressedStrings # on Java 6 update 29
-mx1800m -XX:-UseCompressedStrings # on Java 6 update 29
-mx2g # on Java 7 update 2.
import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Main {
public static void main(String... args) throws IOException {
long start = System.nanoTime();
generateFile("lines.txt", 755 * 1024 * 1024, 189000);
List<String> lines = loadLines("lines.txt");
System.out.println("Sorting file");
Collections.sort(lines);
System.out.println("... Sorted file");
// save lines.
long time = System.nanoTime() - start;
System.out.printf("Took %.3f second to read, sort and write to a file%n", time / 1e9);
}
private static void generateFile(String fileName, int size, int lines) throws FileNotFoundException {
System.out.println("Creating file to load");
int lineSize = size / lines;
StringBuilder sb = new StringBuilder();
while (sb.length() < lineSize) sb.append('-');
String padding = sb.toString();
PrintWriter pw = new PrintWriter(fileName);
for (int i = 0; i < lines; i++) {
String text = (i + padding).substring(0, lineSize);
pw.println(text);
}
pw.close();
System.out.println("... Created file to load");
}
private static List<String> loadLines(String fileName) throws IOException {
System.out.println("Reading file");
BufferedReader br = new BufferedReader(new FileReader(fileName));
List<String> ret = new ArrayList<String>();
String line;
while ((line = br.readLine()) != null)
ret.add(line);
System.out.println("... Read file.");
return ret;
}
}
prints
Creating file to load
... Created file to load
Reading file
... Read file.
Sorting file
... Sorted file
Took 4.886 second to read, sort and write to a file
Divide and conquer is the best solution :)
Divide your file into smaller ones, sort each file separately, then merge them.
Links:
Sort a file with huge volume of data given memory constraint
http://hackerne.ws/item?id=1603381
Algorithm:
How much memory do we have available? Let’s assume we have X MB of memory available.
1. Divide the file into K chunks, where X * K equals the file size (2 GB in the linked example). Bring each chunk into memory and sort its lines as usual using any O(n log n) algorithm. Save the lines back to the file.
2. Now bring the next chunk into memory and sort it.
3. Once we’re done, merge them one by one.
The above algorithm is also known as external sort. Step 3 is known as an N-way merge.
Why don't you try multithreading and increasing the heap size of the program you are running? (This also requires some kind of merge sort, provided your system has more than 755 MB of memory.)
Maybe you can use Perl to format the file and load it into a database like MySQL, which is very fast. Then use an index to query the data and write it to another file.
You can set the JVM heap size with something like '-Xms256m -Xmx1024m'. I hope this helps, thanks.