Serialize object that contains a Dataset - java

I am using Spark 2.3.1 with Java.
I have an object that encapsulates a Dataset, and I want to be able to serialize and deserialize this object.
My code is as follows:
public class MyClass implements Serializable {
private static final long serialVersionUID = -189012460301698744L;
public Dataset<Row> dataset;
public MyClass(final Dataset<Row> dataset) {
this.dataset = dataset;
}
/**
* Save the current instance of MyClass into a file as a serialized object.
*/
public void save(final String filepath, final String filename) throws Exception{
File file = new File(filepath);
file.mkdirs();
file = new File(filepath+"/"+filename);
try (final ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
oos.writeObject(this);
}
}
/**
* Create a new MyClass from a serialized MyClass object
*/
public static MyClass load(final String filepath) throws Exception{
final File file = new File(filepath);
final MyClass myclass;
try (final ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
myclass = ((MyClass) ois.readObject());
}
System.out.println("test 1 : "+ myclass);
System.out.println("test 2 : "+ myclass.dataset);
myclass.dataset.printSchema();
return myclass;
}
// Some other functions
}
But the serialization does not seem to be done properly. The load() function gives me the following output:
test 1 : MyClass@520e6089
test 2 : Invalid tree; null:
null
It then throws a java.lang.NullPointerException on printSchema().
What am I missing to properly serialize my object?

Spark Datasets are meaningful only within the scope of the session that was used to create them, so serializing a Dataset with Java serialization is meaningless.
If you want to serialize the data, just write the Dataset to persistent storage.
If you want to "serialize" the pipeline, just keep the code (a method) that takes some form of input and returns the desired Dataset. Don't try to serialize the Dataset itself.

Related

Creating an object of Class, other than via the constructor

In Java, given
Class c = ...
We can make an object of this class by first obtaining a constructor. For example, if we want to use the default (no parameters) constructor,
c.getConstructor().newInstance()
This seems straightforward, and seems to match how things are done in Java source code.
But, curiously, it is not how things are done in JVM byte code. There, creating an object is done in two steps: new to actually create the object, then invokespecial to call an appropriate constructor.
Is there a way to bypass the constructor when what you have is a Class (with the actual class to be determined at runtime)? If not, was the rationale for the difference between how this works, and how the byte code works, ever documented?
You want to allocate an uninitialized object.
You can try the Objenesis library.
Otherwise, you can create an object through serialization. This is a widely used way to create an object without running its constructor.
public class Serialization {
static class TestSerialization implements Serializable {
int val = 0;
public TestSerialization() {
System.out.println("constructor");
val = 1;
}
@Override
public String toString() {
return "val is " + val;
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException {
TestSerialization testSerialization = new TestSerialization();
// constructor
// val is 1
System.out.println(testSerialization);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(testSerialization);
oos.close();
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
Object obj = ois.readObject();
// val is 1
System.out.println(obj);
}
}
Going one step further, you can use sun.reflect.ReflectionFactory to create an empty, uninitialized object without running the constructor of TestClass.
public class Main {
static class TestClass {
public int val = 0;
public TestClass() {
val = 1;
}
@Override
public String toString() {
return "value is " + val;
}
}
public static void main(String[] args) throws Exception {
// by constructor
TestClass obj = new TestClass();
// value is 1
System.out.println(obj);
// by reflect
Constructor<TestClass> constructor = TestClass.class.getConstructor();
obj = constructor.newInstance();
// value is 1
System.out.println(obj);
// by ReflectionFactory
ReflectionFactory reflectionFactory = ReflectionFactory.getReflectionFactory();
Constructor<Object> objectConstructor = Object.class.getDeclaredConstructor();
Constructor<?> targetConstructor = reflectionFactory.newConstructorForSerialization(TestClass.class, objectConstructor);
obj = (TestClass) targetConstructor.newInstance();
// value is 0
System.out.println(obj);
}
}

Deserialization of a singleton object

Hi, I'm trying to find a way to serialize and deserialize a singleton object so that I get back the state that was serialized. For example, after I add doctors to my Hospital object and serialize it, I want to get my doctors list back when it is deserialized.
I read that in order to serialize a singleton I need to add a readResolve() method,
but every time I rebuild my object I still get a new, empty instance, although I'm not getting any errors.
public class Hospital implements Serializable{
/**
*
*/
private static final long serialVersionUID = 1L;
private static Hospital theHospital = null;
private HashMap<Integer, Doctor> doctors;
private HashMap<Integer, Nurse> nurses;
private HashMap<Integer, PatientReport> reports;
private HashMap<Integer, Patient> patients;
private HashMap<Integer, Patient> hotelPatients;
private HashMap<Integer, Disease> diseases;
private HashMap<Integer, Department> departments;
private HashMap<String,Department> departmentsByName;
private HashMap<Patient, HashSet<Doctor>> doctorsList;
private HashMap<Patient, HashSet<Nurse>> nursesList;
private TreeMap<Integer, Nurse> nurseShiftSet;
private ArrayList<SubDepartment> subSet;
private HashMap<String,HashMap<String,Doctor>> docUser;
private HashMap<String,HashMap<String,Nurse>> nurseUser;
private TreeSet<Department> DepList;
public static Hospital getInstance() {
if (theHospital == null){
theHospital = new Hospital();
}
return theHospital;
}
That is the class with its getInstance() method,
and here are the methods that I have used to write and read the serialized file:
private static ObjectInputStream input;
public static void writeObject(Hospital h) {
try {
FileOutputStream file = new FileOutputStream("Hospital.ser");
ObjectOutputStream out = new ObjectOutputStream(file);
out.writeObject(h);
out.close();
file.close();
} catch (IOException e) {
System.out.println("Error in creating the file");
}
}
public static Hospital readObject() {
try {
FileInputStream file = new FileInputStream("Hospital.ser");
input = new ObjectInputStream(file);
Hospital h = (Hospital) input.readObject();
file.close();
input.close();
return h;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
Update
Example :
Hospital h = Hospital.getInstance();
Department department = new Department("department name");
HashMap<String,Department> map = h.getDepartmentsByName();
// adding the department to the hospital instance .
map.put(department.getName(),department);
Serializing.writeObject(h); // writing the given instance h.
map.remove(department.getName());
// removing from the current instance after serializing.
h = Serializing.readObject(); // deserializing from the file.
map = h.getDepartmentsByName();
map.get(department.getName()); // I'm expecting this to return the department, but it returns null.
Use object mappers. Also, your question doesn't explain exactly what you are looking for; share some examples that demonstrate the problem. The way you are serializing and deserializing doesn't look right either.
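For what it's worth, one common cause of this symptom is a readResolve() that returns the existing singleton, because that discards the state that was just read from the file. Below is a minimal sketch, assuming the goal is for the deserialized object to become the singleton. The Registry class, its entries map, and the sample values are hypothetical stand-ins for Hospital and its maps, and an in-memory byte array replaces Hospital.ser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class SingletonRoundTrip {

    // Hypothetical stand-in for Hospital: a lazily created singleton with one map.
    static class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        private static Registry instance;
        private final HashMap<String, String> entries = new HashMap<>();

        private Registry() {}

        static synchronized Registry getInstance() {
            if (instance == null) {
                instance = new Registry();
            }
            return instance;
        }

        HashMap<String, String> getEntries() {
            return entries;
        }

        // Invoked by ObjectInputStream once the fields have been restored.
        // Adopting "this" keeps the restored state; returning getInstance()
        // here would hand back the old (possibly empty) object instead.
        private Object readResolve() {
            synchronized (Registry.class) {
                instance = this;
            }
            return this;
        }
    }

    // Serializes the singleton, mutates it, then reads the saved copy back.
    static Registry roundTrip() throws IOException, ClassNotFoundException {
        Registry r = Registry.getInstance();
        r.getEntries().put("cardiology", "Dr. Example");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        }
        r.getEntries().remove("cardiology"); // change made after saving

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Registry) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Registry restored = roundTrip();
        System.out.println(restored.getEntries().get("cardiology")); // Dr. Example
        System.out.println(restored == Registry.getInstance());      // true
    }
}
```

With this design the file is the source of truth: whatever was last read replaces the in-memory singleton. If the opposite is wanted (the in-memory object always wins), readResolve() would return getInstance() and the saved state would be ignored.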

Getting same hashcode every time before serialization and after deserialization of object without using readResolve method in Java why?

Why do I get the same hash code every time, before serialization and after deserialization of an object, without using the readResolve() method in Java?
Here is my class
public class SerializedSingletonClass implements Serializable{
private static final long serialVersionUID = 18989987986L;
private SerializedSingletonClass(){}
private static class InstanceHelper {
private static SerializedSingletonClass obj = new SerializedSingletonClass();
}
public static SerializedSingletonClass getInstance(){
return InstanceHelper.obj;
}
}
Test Class --
public class TestSingleton {
public static void main(String[] args) throws FileNotFoundException,
IOException, ClassNotFoundException {
// Test Serialization for singleton pattern
SerializedSingletonClass instanse1 = SerializedSingletonClass
.getInstance();
ObjectOutputStream obs = new ObjectOutputStream(new FileOutputStream(
"filename1.ser"));
obs.writeObject(instanse1);
obs.close();
ObjectInputStream objInputStream = new ObjectInputStream(
new FileInputStream("filename1.ser"));
SerializedSingletonClass instance2 = (SerializedSingletonClass) objInputStream
.readObject();
objInputStream.close();
System.out.println("instance1==" + instanse1.getClass().hashCode());
System.out.println("instance2==" + instance2.getClass().hashCode());
}
}
Output ::
instance1==1175576547
instance2==1175576547
Your objects are instances of the same class, SerializedSingletonClass. You're getting the hashCode from the class, not from the instance. instanse1.getClass() evaluates to the same thing as instance2.getClass(), so of course they produce the same hashCode.
To find the hashCode of the objects, use instanse1.hashCode() and instance2.hashCode().
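To make the difference concrete, here is a small sketch that checks the class and the instances separately. Probe is a hypothetical stand-in for SerializedSingletonClass with no readResolve(). Note that the identity hash codes of two distinct objects usually differ but are not guaranteed to, so == is the reliable check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class HashCodeCheck {

    // Hypothetical serializable class without readResolve().
    static class Probe implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Returns a deserialized copy of the given object.
    static Probe copyViaSerialization(Probe original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Probe) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Probe first = new Probe();
        Probe second = copyViaSerialization(first);

        // Both objects share one Class object, so these hash codes always match.
        System.out.println(first.getClass().hashCode() == second.getClass().hashCode());
        // Deserialization produced a second, distinct instance.
        System.out.println(first == second);
        // Instance hash codes: typically different, but not guaranteed to be.
        System.out.println(first.hashCode() + " vs " + second.hashCode());
    }
}
```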

Is there a way to make current class point to the new object that has just been read in Java Serializable?

I am trying to read an object for a SocialNetwork simulation. The read method makes use of Java serialization (Serializable). The code looks like this:
public class SocialNetwork implements Serializable{
// lots of fields
public SocialNetwork(){
// lots of initialization
}
public void writethisObject() throws IOException{
ObjectOutputStream objectOutputStream = new ObjectOutputStream(new FileOutputStream("Simulation.bin"));
objectOutputStream.writeObject(this);
}
public void readfromObject(File f) throws IOException, ClassNotFoundException{
ObjectInputStream objectInputStream = new ObjectInputStream(new FileInputStream(f));
SocialNetwork newSocialNetwork = (SocialNetwork) objectInputStream.readObject();
this = newSocialNetwork;
}
}
However, as you can see, I am trying to make the current object point to the object that I just read by writing this = newSocialNetwork. This gives me an error, as expected. I could work around it by copying each and every field from newSocialNetwork into the current SocialNetwork, but I do not want to do that: there are tons of fields in my class and it would look very messy. I hope you get the idea of what I am trying to do.
Since you cannot reassign this in Java and you want to have only one instance of your class, use the Singleton pattern:
public class SocialNetwork implements Serializable{
// lots of fields
private static SocialNetwork myself = new SocialNetwork();
private SocialNetwork(){ // private constructor
// lots of initialization
}
public static SocialNetwork getInstance() {
return myself;
}
public void writethisObject() throws IOException{...}
public void readfromObject(File f) throws IOException, ClassNotFoundException{
ObjectInputStream objectInputStream = new ObjectInputStream(new FileInputStream(f));
myself = (SocialNetwork) objectInputStream.readObject();
}
}
You can't assign to this, but you can declare a static method that returns the deserialized object instead.
public static SocialNetwork readfromObject(File f) throws IOException, ClassNotFoundException{
ObjectInputStream objectInputStream = new ObjectInputStream(new FileInputStream(f));
return (SocialNetwork) objectInputStream.readObject();
}
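A sketch of how the static-method approach might look end to end, with try-with-resources so the streams are closed. The Network class, its name field, and the temp file are illustrative assumptions, not the asker's actual class.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticLoadExample {

    // Hypothetical stand-in for SocialNetwork with a single field.
    static class Network implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;

        Network(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        // Instance method: writes this object to the given file.
        void save(File f) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
                out.writeObject(this);
            }
        }

        // Static factory: returns the object that was read instead of
        // trying to overwrite an existing instance.
        static Network load(File f) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
                return (Network) in.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("network", ".bin");
        f.deleteOnExit();
        new Network("demo").save(f);

        // The caller simply reassigns its own reference.
        Network current = Network.load(f);
        System.out.println(current.getName()); // demo
    }
}
```

The design choice here is that the caller owns the reference: code that previously called network.readfromObject(f) would write network = Network.load(f) instead.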
We can use the Singleton pattern to achieve this. Classes outside SocialNetwork will use SocialNetwork.getInstance() to obtain the SocialNetwork. Here is the code:
public class SocialNetwork implements Serializable{
// lots of fields
private static SocialNetwork myself = null;
private SocialNetwork(){
// lots of initialization
}
public static SocialNetwork getInstance() {
if (myself == null){
myself = new SocialNetwork();
}
return myself;
}
public void writethisObject() throws IOException{
ObjectOutputStream objectOutputStream = new ObjectOutputStream(new FileOutputStream("Simulation.bin"));
objectOutputStream.writeObject(this);
}
public void readfromObject(File f) throws IOException, ClassNotFoundException{
ObjectInputStream objectInputStream = new ObjectInputStream(new FileInputStream(f));
SocialNetwork newSocialNetwork = (SocialNetwork) objectInputStream.readObject();
myself = newSocialNetwork;
}
}

Java: accessing transient object fields inside class

Access to private transient object fields from any method in the class must be guarded by some code, because the field may be null after deserialization. What is the best practice?
private transient MyClass object = null;
internal get method:
private MyClass getObject() {
if (object == null)
object = new MyClass();
return object;
}
// use...
getObject().someWhat();
or "make sure" method:
private void checkObject() {
if (object == null)
object = new MyClass();
}
// use...
checkObject();
object.someWhat();
or something cleverer, safer, or more powerful?
Transient fields are not written during serialization, so after deserialization they hold their default values (null for object references). Since you only need them after deserialization, restore them to what you need in the readObject method.
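A minimal sketch of that idea: a private readObject() that calls defaultReadObject() and then rebuilds the transient field. The Holder class and its StringBuilder field are hypothetical placeholders for the asker's class and MyClass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientRestore {

    // Hypothetical class with one serialized field and one transient field.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String label;
        private transient StringBuilder scratch = new StringBuilder();

        Holder(String label) {
            this.label = label;
        }

        // Serialization hook: restore the regular fields first, then rebuild
        // the transient one, which field initializers do not do on deserialization.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            scratch = new StringBuilder();
        }

        String describe() {
            return label + " (scratch ready: " + (scratch != null) + ")";
        }
    }

    // Serializes the holder to memory and reads it back.
    static Holder roundTrip(Holder h) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(h);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Holder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Holder copy = roundTrip(new Holder("hello"));
        System.out.println(copy.describe()); // hello (scratch ready: true)
    }
}
```

With this hook in place, the rest of the class can use the field directly, without a null check in every method.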
I have to post a new answer about transient because it is too long for a comment. The following code prints:
Before: HELLO FOO BAR
After: HELLO null null
public class Test {
public static void main(String[] args) throws Exception {
final Foo foo1 = new Foo();
System.out.println("Before:\t" + foo1.getValue1() + "\t" + foo1.getValue2() + "\t" + foo1.getValue3());
final File tempFile = File.createTempFile("test", null);
// to arrange for a file created by this method to be deleted automatically
tempFile.deleteOnExit();
final FileOutputStream fos = new FileOutputStream(tempFile);
final ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(foo1);
oos.close();
final FileInputStream fis = new FileInputStream(tempFile);
final ObjectInputStream ois = new ObjectInputStream(fis);
final Foo foo2 = (Foo) ois.readObject();
ois.close();
System.out.println("After:\t" + foo2.getValue1() + "\t" + foo2.getValue2() + "\t" + foo2.getValue3());
}
static class Foo implements Serializable {
private static final long serialVersionUID = 1L;
private String value1 = "HELLO";
private transient String value2 = "FOO";
private transient String value3;
public Foo() {
super();
this.value3 = "BAR";
}
public String getValue1() {
return this.value1;
}
public String getValue2() {
return this.value2;
}
public String getValue3() {
return this.value3;
}
}
}
The safest (and most common) way is to initialize it directly:
private transient MyClass object = new MyClass();
or using the constructor
public ParentClass() {
this.object = new MyClass();
}
Lazy loading in getters (as you did in your example) is only useful if the constructor and/or initialization blocks of MyClass do fairly expensive work, and as written it is not thread-safe.
The transient modifier doesn't make any difference. It only causes the field to be skipped whenever the object is serialized.
Edit: the above is not relevant anymore. As proven by someone else, transient fields indeed don't get reinitialized on deserialization (interesting thought though: it will actually only happen if they are declared static). I'd go ahead with the lazy loading approach, or reset them through their setters directly after deserialization.
