Java: check if object exists in object file before writing

I have a file called "objects.txt" which contains some serializable objects.
I want to write some objects to the file.
Is there a way to check whether the objects I want to write to the file already exist in the file before writing? Or would it be better not to check, even if the objects already exist in the file?
Below is an example of writing an object to a file:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import javax.swing.JFrame;

public class WriteObjectsDemo {
    public static void main(String[] args)
    {
        // Putting the ObjectOutputStream in the try-with-resources as well
        // ensures it is flushed and closed along with the file stream
        try (FileOutputStream f = new FileOutputStream("objects.txt");
             ObjectOutputStream o = new ObjectOutputStream(f))
        {
            // Write objects to file
            JFrame j = new JFrame();
            o.writeObject(j);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}

I'd say the answer is maybe. It might be pushing the serialization machinery beyond its comfort zone, but if it's going to work at all it'll go something like this:
First, read through the file once using a FileInputStream wrapped in an ObjectInputStream in order to determine whether or not the file already contains your object. Close the stream when you're done.
Then, if you decide you want to write your object, open the file for appending with new FileOutputStream(file, true), wrap that stream in an ObjectOutputStream and write away.
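A rough sketch of that two-step check-then-append flow, assuming the stored objects implement equals() meaningfully (class and method names below are illustrative):

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class AppendIfAbsent {

    // Pass 1: read the whole file and look for an equal object.
    static boolean contains(File file, Object candidate) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            while (true) {
                if (candidate.equals(in.readObject())) {
                    return true;
                }
            }
        } catch (EOFException endOfStream) {
            return false;                 // reached the end without finding it
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    // Pass 2: append only if it wasn't found.
    static void writeIfAbsent(File file, Object candidate) throws IOException {
        if (file.exists() && contains(file, candidate)) {
            return;
        }
        // Caveat: every new ObjectOutputStream writes its own stream header, so a
        // plain ObjectInputStream will choke when it reaches the second header on
        // a later read; this is the "comfort zone" problem mentioned above.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file, true))) {
            out.writeObject(candidate);
        }
    }
}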
P.S.: I'd suggest reconsidering the .txt extension on your filename. The serialized object data is most definitely not text.

Is there a way to check if the objects I want to write to the file already exist in the file before writing?
Yes.
Read the entire file, deserialize every object in it, and see if the object you're about to write is already there.
Not very efficient, is it?
So here's a better way:
When your process starts, read all the objects in the file into a Set<>.
While you're processing, add objects to that Set<>. Since a Set<> only allows a single instance of any object, duplicate objects will be dropped.
When you're done processing, rewrite the entire file from your Set<>, serializing every object in it to the file.
Note that to implement this, your objects need to properly override the equals() method and the hashCode() method so equivalent objects compare as equals. See Compare two objects with .equals() and == operator to start - and read the accepted answer - all of it. Then read the links. Then think hard about what equals() means for your objects. Then implement equals() and hashCode() methods in your Java code that work.
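A minimal sketch of that approach, assuming the stored type implements Serializable and overrides equals()/hashCode() (the class name and file handling below are illustrative):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashSet;
import java.util.Set;

public class ObjectStore {
    private final File file;
    private final Set<Object> objects = new HashSet<>();

    // On startup, load everything the file already holds into the Set.
    public ObjectStore(File file) throws IOException, ClassNotFoundException {
        this.file = file;
        if (file.exists()) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    objects.add(in.readObject());
                }
            }
        }
    }

    // While processing, duplicates (as defined by equals/hashCode) are dropped silently.
    public void add(Object o) {
        objects.add(o);
    }

    // When done, rewrite the whole file from the Set.
    public void save() throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeInt(objects.size());
            for (Object o : objects) {
                out.writeObject(o);
            }
        }
    }
}

The count prefix is just one way of knowing when to stop reading; reading until EOFException works too.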

Related

Reading and writing objects via GZIP streams?

I am new to Java. I want to learn to use GZIP streams. I have already tried this:
ArrayList<SubImage> myObject = new ArrayList<SubImage>(); // SubImage is a Serializable class
ObjectOutputStream compressedOutput = new ObjectOutputStream(
        new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(
                new File("....")))));
compressedOutput.writeObject(myObject);
and
ObjectInputStream compressedInput = new ObjectInputStream(
        new BufferedInputStream(new GZIPInputStream(new FileInputStream(
                new File("....")))));
myObject = (ArrayList<SubImage>) compressedInput.readObject();
The program writes myObject to a file without throwing any exception, but when it reaches the line
myObject=(ArrayList<SubImage>)compressedInput.readObject();
it throws this exception:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
How can I solve this problem?
You have to flush and close your output stream. Otherwise, at the least, the BufferedOutputStream will not write everything to the file (it writes in big chunks to avoid hurting performance).
If you call compressedOutput.flush() and compressedOutput.close() it will suffice.
You can try writing a simple string object and checking whether the file is well written.
How? If you write an xxx.txt.gz file you can open it with your preferred zip app and look at the xxx.txt. If the app complains, then the content is not fully written.
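A minimal sketch of the fix: try-with-resources closes the whole chain automatically, which flushes the buffer and finishes the GZIP trailer that the reader was missing (SubImage is the asker's class; the file name is illustrative):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.zip.GZIPOutputStream;

public class GzipWriteDemo {
    static void write(ArrayList<SubImage> myObject) throws IOException {
        try (ObjectOutputStream compressedOutput = new ObjectOutputStream(
                new BufferedOutputStream(new GZIPOutputStream(
                        new FileOutputStream(new File("subimages.gz")))))) {
            compressedOutput.writeObject(myObject);
        } // close() propagates down the whole chain of streams
    }
}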
Extended answer to a comment: compressing the data even more
Changing the serialization
You could change the standard serialization of the SubImage object if it's a class of your own. Check the java.io.Serializable javadoc to learn how to do it. It's pretty straightforward.
Writing just what you need
Serialization has the drawback that it needs to write "it's a SubImage" just before every instance you write. That's not necessary if you know what's going to be there beforehand. So you could serialize it more manually.
To write your list, instead of writing an object, write directly the values that make up your list. You will need just a DataOutputStream (and ObjectOutputStream implements DataOutput, so you can use it anyway).
dos.writeInt(yourList.size()); // tell how many items
for (SubImage si : yourList) {
    // write every field, in order (this should be a method called writeSubImage :)
    dos.writeInt(...);
    dos.writeInt(...);
    ...
}

// to read the thing back, just:
int size = dis.readInt();
for (int i = 0; i < size; i++) {
    // read every field, in the same order (this should be a method called readSubImage :)
    dis.readInt();
    dis.readInt();
    ...
    // create the subimage
    // add it to the list you are recreating
}
This method is more manual, but if:
- you know what's going to be written
- you will not need this kind of serialization for many types
then it's pretty affordable and definitely more compact than the Serializable counterpart.
Keep in mind that there are alternative frameworks to serialize objects or create string messages (XStream for XML, Google Protocol Buffers for binary messages, and so on). Those frameworks can write binary directly, or produce a string that can then be written.
If your app will need more of this, or you're just curious, maybe you should look at them.
Alternative serialization frameworks
Just looked in SO and found several questions (and answers) addressing this issue:
https://stackoverflow.com/search?q=alternative+serialization+frameworks+java
I've found that XStream is pretty easy and straightforward to use. And JSON is a pretty readable and succinct format (and JavaScript compatible, which could be a plus :).
I would go for:
Object -> JSON -> OutputStreamWriter(UTF-8) -> GZippedOutputStream -> FileOutputStream
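A sketch of that pipeline, assuming Gson as the JSON library (the answer doesn't name one, so any JSON mapper would do; the class and file names are illustrative):

import com.google.gson.Gson;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class JsonGzipWriter {
    static void write(Object myObject) throws IOException {
        // Object -> JSON -> OutputStreamWriter(UTF-8) -> GZIPOutputStream -> FileOutputStream
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream("subimages.json.gz")),
                StandardCharsets.UTF_8)) {
            new Gson().toJson(myObject, out);
        }
    }
}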

Java ObjectOutputStream and updating a file

I am having trouble figuring out one implementation problem. I have one class that behaves like a list, but instead of holding its elements in some collection it saves them to disk.
The problem occurs when I want to add an element to my list. At the start of my file I have one int that tells me how many objects there are in my list, but I can't figure out an elegant way to update this value. I have something like this:
public boolean add(T element)
{
    try
    {
        out.writeObject(element);
        out.flush();
        //and here we need to update the int in my file
    } catch (IOException e)
    {
        e.printStackTrace();
    }
    return true;
}
I tried to use something like this:
ObjectOutputStream upd=new ObjectOutputStream(new FileOutputStream(data.getAbsolutePath(),true));
but as I observed it writes some data to the start of the file, some serialization header or something. How can I update a single entry in my file, or change the ObjectOutputStream "pointer" so that it writes at the beginning of the file?
Typically with stream based classes (especially higher order streams like OOS), you should rewrite the whole file, anytime you update it.
If you really INSIST on only updating part of a file, then you should think of the file as made up of N streams, where each 'stream' represents one object that you are writing. So I would use a RandomAccessFile for the base file, and then when I want to write an object I would wrap an ObjectOutputStream on top of a ByteArrayOutputStream, write the object into that, then take those bytes and write them into the RandomAccessFile at the position you want.
This probably won't be particularly efficient, as you will write N OOS headers, and N class descriptions for the object you are writing.
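A rough sketch of that idea applied to the count-update part of the question, assuming the count lives as a plain 4-byte int at offset 0 rather than as a serialized object (names are illustrative):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;

public class CountedObjectFile {

    // Serialize one element into its own self-contained mini-stream.
    static byte[] toBytes(Object element) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(element);
        }
        return buffer.toByteArray();
    }

    // Append the element's bytes at the end, then rewrite the count at offset 0.
    static void append(RandomAccessFile file, Object element, int newCount) throws IOException {
        byte[] block = toBytes(element);
        file.seek(file.length());
        file.write(block);
        file.seek(0);
        file.writeInt(newCount);
    }
}

As noted above, this writes a fresh stream header and class description for every element, so it is not particularly space-efficient.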

Read the file once and use the data multiple times

I wrote a Java class which reads a file and stores each line in an ArrayList. I want to access this ArrayList a large number of times. Every time the class is called to access the ArrayList, it reads the file again. I want the file to be read once and then access the ArrayList multiple times. How can I do this?
Store it in a field of the class. I.e.:
public class Foo {
    private List<String> list;

    public List<String> readData() {
        if (list != null) {
            return list;
        }
        // do the reading here, assign the result to list, then:
        return list;
    }
}
Note that if this is used in a multithreaded environment you'd have to take extra measures. For example put synchronized on the method.
As Peter noted, if you can read multiple files, then you can use a Map<String, List<String>>
Another note is that you should use only one instance of this class. If you create multiple instances you won't have the desired effect.
It sounds like you should be reading the file on construction of the class rather than when accessing it. That doesn't necessarily mean in the constructor, mind you - you may well want to have a static factory method that reads the files into an ArrayList, and then passes that list to the real constructor. This would make the class easier to test (and use in other tests).
Then you only need to create the class once, and make the rest of your code use the same instance. Note that this doesn't require use of the singleton pattern, which would itself make testing harder. It just means propagating the instance to all the code that needs it.
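A minimal sketch of that shape (the class name, field and path handling are illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public final class LineStore {
    private final List<String> lines;

    private LineStore(List<String> lines) {
        this.lines = lines;
    }

    // Static factory: reads the file exactly once, up front.
    public static LineStore fromFile(Path path) throws IOException {
        return new LineStore(Files.readAllLines(path));
    }

    // Every later access just returns the in-memory list.
    public List<String> lines() {
        return Collections.unmodifiableList(lines);
    }
}

The single LineStore instance is then passed to whatever code needs the data, rather than each caller re-reading the file.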
Maybe you need to make a singleton? Then the file will be read only once, when the instance of the class is first created.
If it's a web application, you might consider storing it in the ServletContext or in the user's HttpSession, depending on how often the file changes.

Writing many Java objects to a single file

How can I write many serializable objects to a single file and then read a few of the objects as and when needed?
You'd have to implement the indexing aspect yourself, but otherwise this could be done. When you serialize an object you're essentially just writing bytes to an OutputStream, which you can point wherever you want. Storing multiple objects in a file this way would be straightforward.
The tough part comes when you want to read "a few" objects back. How are you going to know how to seek to the position in the file that contains the specific object you want? If you're always reading objects back in the same order you wrote them, from the start of the file onwards, this will not be a problem. But if you want to have random access to objects in the "middle" of the stream, you're going to have to come up with some way to determine the byte offset of the specific object you're interested in.
(This method would have nothing to do with synchronization or even Java per se; you've got to design a scheme that will fit with your requirements and environment.)
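One possible scheme, sketched under the assumption that each object is written as its own length-prefixed block so its starting offset can be recorded (all names are illustrative):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class IndexedObjectFile {

    // Writes each object as [int length][serialized bytes]; returns each block's offset.
    static List<Long> writeAll(File file, List<? extends Serializable> objects) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            for (Serializable obj : objects) {
                offsets.add(raf.getFilePointer());
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
                    oos.writeObject(obj);
                }
                byte[] block = buffer.toByteArray();
                raf.writeInt(block.length);
                raf.write(block);
            }
        }
        return offsets;
    }

    // Random access: seek to a recorded offset and deserialize just that one object.
    static Object readAt(File file, long offset) throws IOException, ClassNotFoundException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] block = new byte[raf.readInt()];
            raf.readFully(block);
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(block))) {
                return ois.readObject();
            }
        }
    }
}

The offset list itself then has to be persisted somewhere (for example at a known place in the same file, or in a companion file) for the index to survive restarts.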
The writing part is easy. You just have to remember to write all objects 'at once'. You can't create a file with serialized objects, close it, and open it again to append more objects. If you try it, you'll get error messages on reading.
For deserializing, I think you have to process the complete file and keep the objects you're interested in. The others will be created but collected by the GC on the next occasion.
Make an Object[] for storing your objects. It worked for me.
I'd use a flat-file database (e.g. Berkeley DB Java Edition). Just write your nodes as rows in a table like:
Node
----
id
value
parent_id
To read the objects back from the file:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadObjectFromFile {
    public static Object[] readObject() {
        Object[] list = null;
        try {
            byte[] bytes = Files.readAllBytes(Paths.get("src/objectFile.txt"));
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            list = (Object[]) ois.readObject();
            ois.close();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return list;
    }
}

Java 1.4 singleton containing a mutable field

I'm working on a legacy Java 1.4 project, and I have a factory that instantiates a csv file parser as a singleton.
In my csv file parser, however, I have a HashSet that will store objects created from each line of my CSV file. All that will be used by a web application, and users will be uploading CSV files, possibly concurrently.
Now my question is: what is the best way to prevent my collection of objects from being modified by two users at the same time?
So far, I'm doing the following:
final class MyParser {
    private File csvFile = null;
    private Set myObjects = Collections.synchronizedSet(new HashSet());

    public synchronized void setFile(File file) {
        this.csvFile = file;
    }

    public void parse() {
        FileReader fr = null;
        try {
            fr = new FileReader(csvFile);
            synchronized (myObjects) {
                myObjects.clear();
                while (...) { // foreach line of my CSV, create a "MyObject"
                    myObjects.add(new MyObject(...));
                }
            }
        } catch (Exception e) {
            //...
        }
    }
}
Should I leave the lock only on the myObjects Set, or should I declare the whole parse() method as synchronized ?
Also, how should I synchronize both the setting of the csvFile and the parsing? I feel like my current design is broken because threads could modify the csv file several times while a possibly long parse process is running.
I hope I'm being clear enough, because I'm a bit confused myself by these multi-synchronization issues.
Thanks ;-)
Basically you are assuming callers need to call setFile first and then call parse(). Consider this:
t1 (with setFile XX) and t2 (with setFile YY) arrive at the same time, and t2 sets the file to YY. Then t1 calls parse() and starts getting records from YY. No amount of synchronization is going to solve this for you; the only way out is to have the parse method take a File parameter, or to remove the singleton constraint (so that each thread has its own parser object). So use:
public void parse(File file) // and add synchronized if you want
I think there are multiple issues in this code.
If this class is a singleton, it should be stateless, i.e. no state should be kept in the class. Therefore, having a setter for the file is not the right thing to do. Pass the file object into the parse method and let it work on the argument. This should fix your issue of synchronizing across various methods.
Though your myObjects Set is private, I am assuming you are not passing it to any other calling classes. In case you are, always return a clone of this set to avoid callers making changes to the original set.
Synchronizing on the set is good enough, as long as all your changes to the set happen within the synchronized block.
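A minimal sketch of the stateless shape suggested above, staying with pre-generics collections since the project is on Java 1.4 (MyObject is the asker's class, assumed here to be constructible from one CSV line):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

final class MyParser {
    // No instance state: each call builds and returns its own Set, so
    // concurrent uploads cannot clobber each other's results.
    public Set parse(File csvFile) throws IOException {
        Set myObjects = new HashSet();
        BufferedReader reader = new BufferedReader(new FileReader(csvFile));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                myObjects.add(new MyObject(line));
            }
        } finally {
            reader.close();
        }
        return myObjects;
    }
}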
Use a separate MyParser object for every parse request and you will not have to deal with concurrency (at least not in MyParser). Only then will you be able to truly serve multiple users at a time, without forcing them to wait or erasing the results of previous parsing jobs.
The singleton thing is mostly a red herring; it has nothing to do with the concurrency issues you are considering. As far as synchronization goes, I think you are OK. Making the method synchronized will also work: even though myObjects is an instance field, the class is a singleton, so locking on the instance is effectively global.
