I have the following use case.
A process serializes certain objects to a file using BufferedOutputStream.
After writing each object, the process invokes flush().
The use case is this: if the process crashes while writing an object, I want to recover the file up to the previous object that was written successfully.
How can I deserialize such a file? How will Java behave while deserializing it?
Will it successfully deserialize up to the objects that were written successfully before the crash?
What will the behavior be while reading the last, partially written object, and how can I detect that?
Update1 -
I have tried to simulate a process crash by manually killing the process while objects are being written. I have tried around 10-15 times. Each time I am able to deserialize the file, and the file does not have any partial object.
I am not sure if my test is exhaustive enough and therefore need further advice.
Update2 - Adam pointed out a way to simulate such a test by truncating the file at random positions.
The following is the behavior observed over around 100 iterations -
From the truncated file (which should be equivalent to the state of the file when a process crashes), Java can successfully read up to the last complete object.
Upon reaching the last, partially written object, Java does not throw a StreamCorruptedException or IOException. It simply throws an EOFException indicating EOF and ignores the partial object.
Each object is deserialized (or not) before the next one is read, so an object is not affected by the fact that a later object failed to be written or will fail to deserialize.
I suspect you are misusing Java serialization - it's not intended to be a reliable and recoverable means of permanent storage. Use a database for that. If you must, you can use a database to store the serialized form of Java objects, but that would be pretty inefficient.
Yeah, testing such a scenario manually (by killing the process) may be difficult. I would suggest writing a test case where you:
Serialize a set of objects and write them to a file.
Open the file and truncate it at a random position.
Try to load and deserialize it (and see what happens).
Repeat 1. to 3. with several other truncate positions.
This way you are sure that you are loading a broken file and that your code handles it properly.
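Here is a minimal sketch of such a test, assuming String payloads and the file name objects.bin (both are placeholders, not anything from the question):

import java.io.*;
import java.util.concurrent.ThreadLocalRandom;

public class TruncationTest {
    public static void main(String[] args) throws Exception {
        File file = new File("objects.bin");

        // 1. Serialize a set of objects to the file, flushing after each one.
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int i = 0; i < 100; i++) {
                oos.writeObject("object-" + i);
                oos.flush();
            }
        }

        // 2. Truncate the file at a random position to mimic a crash mid-write.
        long cut = ThreadLocalRandom.current().nextLong(1, file.length());
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(cut);
        }

        // 3. Deserialize until the stream ends and count the survivors.
        int recovered = 0;
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                ois.readObject();
                recovered++;
            }
        } catch (EOFException e) {
            // Expected: a truncated trailing object surfaces as EOF.
        } catch (StreamCorruptedException e) {
            // Can also occur if the cut lands inside the stream header.
        }
        System.out.println("Truncated at " + cut + ", recovered " + recovered + " objects");
    }
}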
Have you tried appending to an ObjectOutputStream? You can find the solution HERE; just find the post that explains how to create an ObjectOutputStream with append.
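The trick that post describes is commonly sketched like this (a sketch, not the linked post's exact code): a subclass that skips the stream header when appending, so the file stays one readable stream.

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

// Use this only when the file already exists and has a header;
// otherwise use a plain ObjectOutputStream so the header is written once.
public class AppendingObjectOutputStream extends ObjectOutputStream {
    public AppendingObjectOutputStream(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    protected void writeStreamHeader() throws IOException {
        // Don't write a second header; just reset the handle table so the
        // appended objects don't reference handles from the earlier session.
        reset();
    }
}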
Related
I am dealing with an object in Java that is very expensive to compute and several megabytes in size. In order to preserve it across application restarts, I want to serialize it into a File, and re-load that file on startup (if present).
The problem is that most file systems are not transactional. The file writing process can be interrupted by exceptions, JVM termination and/or power failure. What I absolutely need to assert is that if the file is used, the information within is complete. I can throw away the information and recalculate if needed, but reading and relying on incomplete data must be avoided.
My attempt would be to serialize and write a "seal" object at the end of the file, like a checksum for example. The presence of this object during deserialization guarantees that the serialization process was complete. If the seal object is absent during deserialization, I know that I cannot trust the data in the file as it might be incomplete. I am looking for an OS-independent solution, and I do not need to consider "attacks" that maliciously modify the contents of the serialized file.
My question is: Is the seal object approach outlined above safe, or are there still some corner cases where I can end up reading an incomplete file without noticing it?
Just write the file under a different, temporary name. Once the file is complete, delete any previous version of the file and rename the new file to the real name.
If the program dies during the write, you're just left with an incomplete temp file. The real file is still as before (or missing), so you'll never see an incomplete file to load.
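A minimal sketch of that pattern, assuming the real file is named state.ser (the file names and the object being saved are placeholders):

import java.io.*;
import java.nio.file.*;

public class AtomicSave {
    static void save(Serializable expensiveObject) throws IOException {
        Path target = Paths.get("state.ser");
        Path temp = Paths.get("state.ser.tmp");

        // Write everything to the temp file first.
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(temp)))) {
            oos.writeObject(expensiveObject);
        }

        // Only after the write completed, move the temp file into place.
        // ATOMIC_MOVE asks the file system for an atomic rename where supported.
        Files.move(temp, target,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}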
I am receiving files through a socket and saving them to a database. So, I'm receiving the byte stream and passing it to a back-end process, say Process1, for the DB save.
I'm looking to do this without saving the stream on disk. So, rather than storing the incoming stream as a file on disk and then passing that file to Process1, I'm looking to pass it while it's still in memory. This is to eliminate the time-costly disk read and write.
One way I can do this is to pass the byte[] to Process1. I'm wondering whether there's a better way of doing this.
TIA.
You can use a ByteArrayOutputStream. It is, essentially, a growable byte[] which you can write into at will, within the limits of your available heap space.
After having written to it/flushed it/closed it (although those last two operations are essentially no-ops, that's no reason for ditching sane practices), you can obtain the underlying byte array using this class's .toByteArray().
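For example, a small sketch that drains the socket's InputStream into memory (readFully is a hypothetical helper name, not part of any API here):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamBuffer {
    // Drain the incoming stream into a growable in-memory buffer.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray(); // the byte[] to hand to Process1
    }
}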
Socket sounds like what you are looking for.
I've read some posts on stackoverflow about this topic but I'm still confused. When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
EDIT: sorry, I should have mentioned that the process writing the file is in C++ and the one reading it is in Java, so variables can't really be shared easily.
When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
The problem is that you can never be sure that the current last character of the file is the end of a line. If it is a line terminator, you are OK. If it isn't, BufferedReader.readLine() will interpret the partial line as a complete line without a line terminator ... and weird results will ensue.
What you need to do is implement your own line buffering. When you get an EOF, wait until the file grows some more and then resume reading the line.
Alternatively, if you are using Java 7 or later, the file watcher APIs allow you to watch for file writes without polling the file's size.
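A rough sketch of that hand-rolled line buffering (the file name and the poll interval are placeholders for illustration):

import java.io.RandomAccessFile;

public class TailReader {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("log.txt", "r")) {
            StringBuilder line = new StringBuilder();
            while (true) {
                int b = file.read();
                if (b == -1) {
                    Thread.sleep(250);        // EOF: wait for the writer to catch up
                } else if (b == '\n') {
                    System.out.println(line); // only now is the line complete
                    line.setLength(0);
                } else if (b != '\r') {
                    line.append((char) b);    // note: assumes a single-byte encoding
                }
            }
        }
    }
}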
By the way, there is an Apache commons class that is designed for doing this kind of thing:
http://commons.apache.org/io/api-2.0/org/apache/commons/io/input/Tailer.html
If I understand, the file is being written by some process in C++ and another Java process wants to read it while it is being written.
Look at the File Monitoring section on the tail command here. But I want to warn you that when I used the Cygwin tail on Windows recently to follow log files that were rolling over, it sometimes failed under heavy load. Other implementations may be more robust.
To have a count of the number of lines, just keep a counter on the side that's doing the writing.
So, every time you write a line, increment the counter, and make that counter readable via a method, something like public int getNumLinesWritten().
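A minimal sketch of that counter, shown in Java for illustration (the asker's writer is actually C++, so the same idea would have to be exposed over whatever channel the two processes share; all names here are made up):

import java.io.PrintWriter;

public class CountingWriter {
    private final PrintWriter out;
    private int linesWritten = 0;

    public CountingWriter(PrintWriter out) {
        this.out = out;
    }

    public synchronized void writeLine(String line) {
        out.println(line);
        out.flush();
        linesWritten++; // count only after the line has been flushed
    }

    public synchronized int getNumLinesWritten() {
        return linesWritten;
    }
}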
The obvious answer to me... why not use a buffer? Use a string or whatever you need. (You could use a list/array of strings if you want, one for each line, maybe?) Append to the string just as you would write to the file; then, instead of reading from the file, read from that string. Would that work for you?
The app I am working on needs to read a JSON file that may be anywhere from 1.5 to 3 MB in size. It seems to have no problem opening the file and converting the data to a string, but when it attempts to convert the string to a JSONArray, OutOfMemoryErrors are thrown. The exceptions look something like this:
E/dalvikvm-heap( 5307): Out of memory on a 280-byte allocation.
W/dalvikvm( 5307): Exception thrown (Ljava/lang/OutOfMemoryError;) while throwing internal exception (Ljava/lang/OutOfMemoryError;)
One strange thing about this is that the crash only occurs every 2nd or 3rd time the app is run, leading me to believe that the memory consumed by the app is not being garbage collected each time the app closes.
Any insight into how I might get around this issue would be greatly appreciated. I am open to the idea of loading the file in chunks, but I'm not quite sure what the best approach is for such a task.
Thank you
When you say "2nd or 3rd time it's run", do you mean each time you're starting with a fresh emulator? Or do you mean leaving the application and coming back (for instance, pressing Home, or calling finalize())?
If you're referring to leaving the application and relaunching it:
If you haven't set android:launchMode in your manifest to define the activity as singleInstance or singleTask, then each time the application is launched a new activity is created and added to the activity stack. You could easily have multiple copies of your activity running in your application process, eating a lot of memory.
If it's happening on the 2nd launch, you're still using a lot of memory and should break it down more.
One strange thing about this is that the crash only occurs every 2nd or 3rd time the app is run, leading me to believe that the memory consumed by the app is not being garbage collected each time the app closes.
That is certainly possible, and if it is the case then it is probably due to a memory leak that can be traced back to something that your application is doing. I think you should focus your initial efforts on investigating this aspect ... rather than loading the file in chunks. (I am not familiar with the Android tool-chain, but I am sure it includes memory usage profilers or memory dump analysers.)
EDIT
In response to your follow-up comment, the fact that it works 2 times in 3 suggests that your app ought to work roughly as-is. Admittedly, you don't have much leeway if the input file gets bigger.
A couple of ideas though:
Rather than reading the file into a String and running the JSON parser on the String, use a parser that can read directly from a stream. Your current solution needs space for two complete copies of the data in memory while you are doing the parsing.
If the file gets much bigger, you may need to think of a design that doesn't create a complete in-memory representation of the data.
I'm not sure that it is a good idea to read a JSON file in "chunks". This could present problems for parsing the JSON ... depending on exactly what you mean by reading in chunks.
EDIT 2
Maybe what you need is a "SAX like" JSON parser; e.g. http://code.google.com/p/async-json-library/
Try the approach from "Parse JSON Data Efficiently on Android", using JsonReader. It's like SAX parsing for XML.
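A minimal streaming sketch with android.util.JsonReader (available since API 11), assuming the file is a top-level array of objects with a "name" field (the field name is a placeholder): the file is never held in memory as one big String.

import android.util.JsonReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class StreamingJson {
    static List<String> readNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<String>();
        JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
        try {
            reader.beginArray();                 // top-level JSONArray
            while (reader.hasNext()) {
                reader.beginObject();
                while (reader.hasNext()) {
                    if (reader.nextName().equals("name")) {
                        names.add(reader.nextString());
                    } else {
                        reader.skipValue();      // skip fields we don't need
                    }
                }
                reader.endObject();
            }
            reader.endArray();
        } finally {
            reader.close();
        }
        return names;
    }
}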
We have a process that outputs the contents of a large XML file to System.out.
When this output is pretty printed (ie: multiple lines) everything works. But when it's on one line Eclipse crashes with an OutOfMemory error. Any ideas how to prevent this?
Sounds like it is the Console panel blowing up. Consider limiting its buffer size.
EDIT: It's in Preferences. Search for Console.
How do you print it on one line?
Using several System.out.print(String s) calls, or
using System.out.println(String verybigstring)?
In the second case, you need a lot more memory...
If you want more memory for Eclipse, you could try increasing it by changing the -Xmx value in eclipse.ini.
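For example, the tail end of a hypothetical eclipse.ini (the values are illustrative, not recommendations):

-vmargs
-Xms256m
-Xmx1024m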
I'm going to assume that you're building an org.w3c.dom.Document, and writing it using a serializer. If you're hand-building an XML string, you're all but guaranteed to be producing something that's almost-but-not-quite XML, and I strongly suggest fixing that first.
That said, if you're writing to a stream from the serializer (and System.out is a stream), then you should be writing directly to the stream rather than writing to a string and printing that (which you'd do with a StringWriter). The reason for this is that the XML serializer will properly handle character encodings, while going serializer to String to stream may not.
If you're not currently building a DOM, and are concerned about the memory requirements of doing so, then I suggest looking at the Practical XML library (which I maintain), in particular the builder package. It uses lightweight nodes, which are then output via a serializer using a SAX transform.
Edit in response to comment:
OK, you've got the serializer covered with XStream. I'm next going to assume that you are calling XStream.toXML(Object) to produce the string, and recommend that you call the variant toXML(Object, OutputStream) and pass it the actual output stream. The reason for this is that XML is very sensitive to character encoding, which is something that often breaks when converting strings to streams.
This may, of course, cause issues with building your POST request, particularly if you're using a library that doesn't provide you an OutputStream.
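A minimal sketch of that variant, with the POST-request plumbing assumed (payload and out are placeholder names):

import com.thoughtworks.xstream.XStream;
import java.io.OutputStream;

public class StreamingXml {
    static void writeTo(Object payload, OutputStream out) {
        XStream xstream = new XStream();
        // Serialize straight to the stream; XStream handles the character
        // encoding, and the whole document is never held as one String.
        xstream.toXML(payload, out);
    }
}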