I am trying to write Long using Java command for Random File IO as follows:
fstreamOut = new FileOutputStream(new File("C:\\Basmah","dataOutput.7"),true);
DataOutputStream out=new DataOutputStream(fstreamOut);
Long p= Long.parseLong(longNumberInString ); // Number of digits for this long key are 7-15
out.writeLong(p);
The problem is that when I write 7-15 digit number using writeLong ; it writes 8 bytes in file.
Then I am trying to read the same record into my program and decode it
Long l=in.readLong();
but I dont get the same number as I wrote ; Instead Iget EOF exception.
A long id 64-bit long. That makes 8 bytes. The DataOutputStream's writeLong method writes the binary representation of the long, not the textual one.
Without knowing the code used to read the long value, it's impossible to tell why it doesn't work.
The code given in your example and comment should work. The fact that it doesn't suggests that something else is going on here:
Maybe the writing and reading is happening on different files.
Maybe the file being written is not flushed / closed before you attempt to read it.
Maybe something else is overwriting the file.
Maybe the snippets of code you have provided are different enough to the real code to make a difference.
In the code that attempts to read the file, print what you get when you call f.length().
Related
Story
While conducting an experiment I was saving a stream of random Bytes generated by a hardware RNG device. After the experiment was finished, I realized that the saving method was incorrect. I hope I can find the way how to fix the corrupted file so that I obtain the correct stream of random numbers back.
Example
The story of the problem can be explained in the following simple example.
Let's say I have a stream of random numbers in an input file randomInput.bin. I will simulate the stream of random numbers coming from the hardware RNG device by sending the input file to stdout via cat. I found two ways how to save this stream to a file:
A) Harmless saving method
This method gives me exactly the original stream of random Bytes.
import scala.sys.process._
import java.io.File
val res = ("cat randomInput.bin" #> new File(outputFile))!
B) Saving method leading to corruption
Unfortunately, this is the original saving method I chose.
import scala.sys.process._
import java.io.PrintWriter
val randomBits = "cat randomInput.bin".!!
val out = new PrintWriter(outputFile)
out.println(randomBits)
if (out != null) {
out.close()
Seq("chmod", "600", outputFile).!
}
The file saved using method B) is still binary, however, is is approximately 2x larger that the file saved by method A). Further analysis shows that the stream of random Bits is significantly less random.
Summary
I suspect that the saving method B) adds something to almost every byte, however, the understanding of this is behind my expertise in Java/Scala I/O.
I would very much appreciate if somebody explained me the low-level difference between methods A) and B). The goal is to revert the changes created by saving method B) and obtain the original stream of random Bytes.
Thank you very much in advance!
The problem is probably that println is meant for text, and this text is being encoded as Unicode, which uses multiple bytes for some or all characters, depending on which version of Unicode.
If the file is exactly 2x larger than it should be, then you've probably got a null byte every other byte, which could be easy to fix. Otherwise, it may be harder to figure out what you would need to do to recover the binary data. Viewing the corrupted file in a hex editor may help you see what happened. Either way, I think it may be easier to just generate new random data and save it correctly.
Especially if this is for an experiment, if your random data has been corrupted and then fixed, it may be harder to justify that the data is truly random compared to just generating it properly in the first place.
This question already has answers here:
Reading/writing a BINARY File with Strings?
(2 answers)
Closed 8 years ago.
I'm working on a simple VM/interpreter kind of program for a simple toy language I implemented. Currently the compiler emits textual assembly instructions such as push and add to be executed by the VM.
Recently I figured it would be a better idea to actually compile to binary opcodes instead of textual ones, for performance and space. (It doesn't actually matter in this toy project, but this is for learning).
Soon I realized, even though generally I consider myself a decent Java programmer, that I have no idea how to work with binary data in Java. No clue.
Regarding this, I have two questions:
How can I save binary data to a file? For example say I want to save the byte 00000001 and then the byte 00100000 to a file (each byte will be an opcode for the VM) - how can I do that?
How can I read binary data from a file on disk, save it in variables, parse it and manipulate it? Do I use the usual I/0 and parsing techniques I use with regular Strings, or is this something different?
Thanks for your help
You should be able to use any OutputStream to do this, because they all have a write method that takes a byte. For example, you could use a FileOutputStream.
Similarly, you can use an InputStream's read method for reading bytes.
OutputStream and its subtypes have methods to write bytes to files. DataOutputStream has additional methods to write int, long, float, double, etc. if you need those, although it writes them in big endian byte order. The corresponding input streams have equivalent methods for reading.
You may want to try to use ByteArrayInputStream and DataOutputStream
Refer to the Oracle documentation for details.
Also, checkout this similar question: How to output binary data to a file in Java?
Use a FileOutputStream. It has a write(byte[] b) method to write an array of bytes to a file and a write(int b) method to write a single byte to the file.
So, to save 00000001 followed by 00100001, you can do :
FileOutputStream file = new FileOutputStream(new File(path));
file.write(1);
file.write(33);
Similarly you can read FileInputStream for reading from a binary file.
I'm attempting to read magic numbers/bytes to check the format of a file. Will reading a file byte by byte work in the same way on a Linux machine?
Edit: The following shows to get the magic bytes from a class file using an int. I'm trying to do the same for a variable number of bytes.
http://www.rgagnon.com/javadetails/java-0544.html
I'm not sure that I understand what you are trying to do, but it sounds like what you are trying to do isn't the same thing as what the code that you are linking to is doing.
The java class format is specified to start with a magic number, so that code can only be used to verify if a file might be a java class or not. You can't use the same logic and apply it to arbritraty file formats.
Edit: .. or do you only want to check for wav files?
Edit2: Everything in Java is in big endian, that means that you can use DataInputStream.readInt to read the first four bytes from the file, and then compare the returned int with 0x52494646 (RIFF as a big endian integer)
I have a program in C++ and it writes a binary file on disk. Then I use a Java program to read the number. The problem is the number read is different from the number written.
Say, I write an integer 4 using c++ and get back 67108864 when use JAVA to read it (using readint()). I suspect its due to big or small endian. Do you have any simple solutions to solve this?
Java's java.nio buffers let you specify the endianness.
See http://download.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html especially the order method which lets you specify endianness and the getInt method which lets you read an int.
To read a file using a ByteBuffer do something like:
ByteBuffer buffer = new RandomAccessFile(myFile, "r")
.getChannel.map(MapMode.READ, offset, length);
Remember to close it when you're done.
I am writing a utility in Java that reads a stream which may contain both text and binary data. I want to avoid having I/O wait. To do that I create a thread to keep reading the data (and wait for it) putting it into a buffer, so the clients can check avialability and terminate the waiting whenever they want (by closing the input stream which will generate IOException and stop waiting). This works every well as far as reading bytes out of it; as binary is concerned.
Now, I also want to make it easy for the client to read line out of it like '.hasNextLine()' and '.readLine()'. Without using an I/O-wait stream like buffered stream, (Q1) How can I check if a binary (byte[]) contain a valid unicode line (in the form of the length of the first line)? I look around the String/CharSet API but could not find it (or I miss it?). (NOTE: If possible I don't want to use non-build-in library).
Since I could not find one, I try to create one. Without being so complicated, here is my algorithm.
1). I look from the start of the byte array until I find '\n' or '\r' without '\n'.
2). Then, I cut the byte array from the start to that point and using it to create a string (with CharSet if specified) using 'new String(byte[])' or 'new String(byte[], CharSet)'.
3). If that success without exception, we found the first valid line and return it.
4). Otherwise, these bytes may not be a string, so I look further to another '\n' or '\r' w/o '\n'. and this process repeat.
5. If the search ends at the end of available bytes I stop and return null (no valid line found).
My question is (Q2)Is the following algorithm adequate?
Just when I was about to implement it, I searched on Google and found that there are many other codes for new line, for example U+2424, U+0085, U+000C, U+2028 and U+2029.
So my last question is (Q3), Do I really need to detect these code? If I do, Will it increase the chance of false alarm?
I am well aware that recognize something from binary is not absolute. I am just trying to find the best balance.
To sum up, I have an array of byte and I want to extract a first valid string line from it with/without specific CharSet. This must be done in Java and avoid using any non-build-in library.
Thanks you all in advance.
I am afraid your problem is not well-defined. You write that you want to extract the "first valid string line" from your data. But whether somet byte sequence is a "valid string" depends on the encoding. So you must decide which encoding(s) you want to use in testing.
Sensible choices would be:
the platform default encoding (Java property "file.encoding")
UTF-8 (as it is most common)
a list of encodings you know your clients will use (such as several Russian or Chinese encodings)
What makes sense will depend on the data, there's no general answer.
Once you have your encodings, the problem of line termination should follow, as most encodings have rules on what terminates a line. In ASCII or Latin-1, LF,CR-LF and LF-CR would suffice. On Unicode, you need all the ones you listed above.
But again, there's no general answer, as new line codes are not strictly regulated. Again, it would depend on your data.
First of all let me ask you a question, is the data you are trying to process a legacy data? In other words, are you responsible for the input stream format that you are trying to consume here?
If you are indeed controlling the input format, then you probably want to take a decision Binary vs. Text out of the Q1 algorithm. For me this algorithm has one troubling part.
`4). Otherwise, these bytes may not be a string, so I look further to
another '\n' or '\r' w/o '\n'. and this process repeat.`
Are you dismissing input prior to line terminator and take the bytes that start immediately after, or try to reevaluate the string with now 2 line terminators? If former, you may have broken binary data interface, if latter you may still not parse the text correctly.
I think having well defined markers for binary data and text data in your stream will simplify your algorithm a lot.
Couple of words on String constructor. new String(byte[], CharSet) will not generate any exception if the byte array is not in particular CharSet, instead it will create a string full of question marks ( probably not what you want ). If you want to generate an exception you should use CharsetDecoder.
Also note that in Java 6 there are 2 constructors that take charset
String(byte[] bytes, String charsetName) and String(byte[] bytes, Charset charset). I did some simple performance test a while ago, and constructor with String charsetName is magnitudes faster than the one that takes Charset object ( Question to Sun: bug, feature? ).
I would try this:
make the IO reader put strings/lines into a thread safe collection (for example some implementation of BlockingQueue)
the main code has only reference to the synced collection and checks for new data when needed, like queue.peek(). It doesn't need to know about the io thread nor the stream.
Some pseudo java code (missing exception & io handling, generics, imports++) :
class IORunner extends Thread {
IORunner(InputStream in, BlockingQueue outputQueue) {
this.reader = new BufferedReader(new InputStreamReader(in, "utf-8"));
this.outputQueue = outputQueue;
}
public void run() {
String line;
while((line=reader.readLine())!=null)
this.outputQueue.put(line);
}
}
class Main {
public static void main(String args[]) {
...
BlockingQueue dataQueue = new LinkedBlockingQueue();
new IORunner(myStreamFromSomewhere, dataQueue).start();
while(true) {
if(!dataQueue.isEmpty()) { // can also use .peek() != null
System.out.println(dataQueue.take());
}
Thread.sleep(1000);
}
}
}
The collection decouples the input(stream) more from the main code. You can also limit the number of lines stored/mem used by creating the queue with a limited capacity (see blockingqueue doc).
The BufferedReader handles the checking of new lines for you :) The InputStreamReader handles the charset (recommend setting one yourself since the default one changes depending on OS etc.).
The java.text namespace is designed for this sort of natural language operation. The BreakIterator.getLineInstance() static method returns an iterator that detects line breaks. You do need to know the locale and encoding for best results, though.
Q2: The method you use seems reasonable enough to work.
Q1: Can't think of something better than the algorithm that you are using
Q3: I believe it will be enough to test for \r and \n. The others are too exotic for usual text files.
I just solved this to get test stubb working for Datagram - I did byte[] varName= String.getBytes(); then final int len = varName.length; then send the int as DataOutputStream and then the byte array and just do readInt() on the rcv then read bytes(count) using the readInt.
Not a lib, not hard to do either. Just read up on readUTF and do what they did for the bytes.
The string should construct from the byte array recovered that way, if not you have other problems. If the string can be reconstructed, it can be buffered ... no?
May be able to just use read / write UTF() in DataStream - why not?
{ edit: per OP's request }
//Sending end
String data = new String("fdsfjal;sajssaafe8e88e88aa");// fingers pounding keyboard
DataOutputStream dataOutputStream = new DataOutputStream();//
final Integer length = new Integer(data.length());
dataOutputStream.writeInt(length.intValue());//
dataOutputStream.write(data.getBytes());//
dataOutputStream.flush();//
dataOutputStream.close();//
// rcv end
DataInputStream dataInputStream = new DataInputStream(source);
final int sizeToRead = dataInputStream.readInt();
byte[] datasink = new byte[sizeToRead.intValue()];
dataInputStream.read(datasink,sizeToRead);
dataInputStream.close;
try
{
// constructor
// String(byte[] bytes, int offset, int length)
final String result = new String(datasink,0x00000000,sizeToRead);//
// continue coding here
Do me a favor, keep the heat off of me. This is very fast right in the posting tool - code probably contains substantial errors - it's faster for me just to explain it writing Java ~ there will be others who can translate it to other code language ( s ) which you can too if you wish it in another codebase. You will need exception trapping an so on, just do a compile and start fixing errors. When you get a clean compile, start over from the beginnning and look for blunders. ( that's what a blunder is called in engineering - a blunder )