Is there a way to serialize java collections in Hadoop?
The Writable interface is for Java primitives only. I have the following class attributes.
private String keywords;
private List<Status> tweets;
private long queryTime = 0;
public TweetStatus(String keys, List<Status> tweets, long queryTime){
this.keywords = keys;
this.tweets = tweets;
this.queryTime = queryTime;
}
How can I serialize the List object?
The Writable interface is for Java primitives only.
Right. Basically you need to break down your object into a sequence of objects that you can serialize.
So, from first principles, to serialize a list you need to serialize the size of the list and then serialize each element of the list. This way, when you need to deserialize, you know how many elements you need to deserialize.
Something like this should get you on the write (pun!) track:
class TweetStatusWritable implements Writable {
private String keywords;
private List<Status> tweets;
private long queryTime;
// add getters for the above three fields
public void readFields(DataInput in) throws IOException {
this.keywords = in.readUTF();
int size = in.readInt();
this.tweets = new ArrayList<Status>();
for(int i = 0; i < size; i++) {
Status status = // deserialize an instance of Status
tweets.add(status);
}
this.queryTime = in.readLong();
}
public void write(DataOutput out) throws IOException {
out.writeUTF(this.keywords);
out.writeInt(this.tweets.size());
for(int i = 0; i < this.tweets.size(); i++) {
// serialize this.tweets.get(i) onto out
}
out.writeLong(queryTime);
}
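// Hadoop creates Writable instances reflectively, so a no-argument constructor is needed
public TweetStatusWritable() {
}
public TweetStatusWritable(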
String keywords,
List<Status> tweets,
long queryTime
) {
this.keywords = keywords;
this.tweets = tweets;
this.queryTime = queryTime;
}
}
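To make the "size first, then each element" idea concrete, here is a minimal sketch that fills in the two placeholder comments above. It uses only java.io, so it runs without Hadoop on the classpath; the same DataInput/DataOutput calls are what a Writable receives. SimpleStatus is a hypothetical stand-in for Status (just an id and the text), and StatusListCodec is an illustrative name, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Status: only an id and the tweet text.
class SimpleStatus {
    final long id;
    final String text;

    SimpleStatus(long id, String text) {
        this.id = id;
        this.text = text;
    }

    // Write the fields in a fixed order...
    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(text);
    }

    // ...and read them back in exactly the same order.
    static SimpleStatus read(DataInput in) throws IOException {
        long id = in.readLong();
        String text = in.readUTF();
        return new SimpleStatus(id, text);
    }
}

class StatusListCodec {
    // Size first, then each element.
    static void writeList(List<SimpleStatus> tweets, DataOutput out) throws IOException {
        out.writeInt(tweets.size());
        for (SimpleStatus status : tweets) {
            status.write(out);
        }
    }

    // Read the size, then exactly that many elements.
    static List<SimpleStatus> readList(DataInput in) throws IOException {
        int size = in.readInt();
        List<SimpleStatus> tweets = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            tweets.add(SimpleStatus.read(in));
        }
        return tweets;
    }

    // Round trip through an in-memory buffer, handy for trying the codec out.
    static List<SimpleStatus> roundTrip(List<SimpleStatus> tweets) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeList(tweets, new DataOutputStream(buffer));
            return readList(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real Writable, the bodies of writeList and readList would sit inside write and readFields respectively, with whatever fields of Status you actually need.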
Take a look at ArrayWritable. It lets you serialize an array of instances (all of the same Writable type). You could build one of those from your List.
If you have a lot of serialization stuff ahead, you might find Avro useful.
Related
I have an assignment for school and I am having trouble with some ArrayLists. I have an input file with one entry per line. Each entry has an integer and up to four strings. The file lists the locations where a film was shot: the integer is the movieID and the strings are the locations. However, not every film has 4 locations, so when my program tries to load the file it returns an error, because it expects 5 fields in every row and some movies have only 1, 2 or 3 locations. I use a data loader class because I have to load several different files. My other files have a fixed number of fields in each row, so loading those isn't a problem. The load process reads the file into an array list and then creates the objects needed. I know the program needs to recognize the empty fields and handle them dynamically (for example, a movie has 3 locations, so the 4th field is empty), but I haven't figured it out yet. Any suggestions? Thank you!
This is my LocationsLoader class.
package dataLoader;
import java.util.ArrayList;
import dataModel.Locations;
public class LocationsLoader extends AbstractFileLoader<Locations>{
public int constructObjectFromRow(String[] tokens, ArrayList<Locations> locations) {
int movieID;
List<String> loc = new List();
movieID = Integer.parseInt(tokens[0]);
loc = tokens[]; // What goes here?
Locations l;
l = new Locations(movieID, loc);
locations.add(l);
System.out.println(l);
//System.out.println(locations.toString());
return 0;
}
}
And this is my Locations class:
package dataModel;
public class Locations {
private int movieID;
private List<String> loc;
public Locations(int otherMovieID, List<String> otherLocations) {
this.movieID = otherMovieID;
this.loc = otherLocations;
}
public int getMovieID() {
return movieID;
}
public void setMovieID(int id) {
this.movieID = id;
}
public String getLocations(int index) {
return loc.get(index);
}
}
You fill an array here
String[] tokens = new String[numFields];
for (int i = 0; i < numFields; i++) {
tokens[i] = tokenizer.nextToken();
}
but arrays have a fixed length, so there's really no reason to use one when a row can have fewer values. Fill a list instead.
List<String> tokens = new ArrayList<>();
while (tokenizer.hasMoreTokens()) {
String token = tokenizer.nextToken().trim();
if (!token.isEmpty()) {
tokens.add(token);
}
}
In fact, I'm not sure why you would need to give the reader the number of expected tokens at all.
But as Dodgy pointed out, you might as well use String#split:
String[] tokens = line.split(delimiter);
which can yield empty Strings as well (trailing empty strings are dropped by default), but you can just ignore those in your constructObjectFromRow function.
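As a rough sketch of what "ignore the empty ones" could look like for the locations file: the helper below takes the first field as the movie ID and collects whatever non-empty fields follow. The class name LocationRowParser and the comma delimiter in the usage note are assumptions for illustration; note that String#split treats the delimiter as a regular expression, so a delimiter such as "|" would need escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the asker's AbstractFileLoader hierarchy.
class LocationRowParser {

    // The first field of the row is the movie ID.
    static int parseMovieId(String line, String delimiter) {
        return Integer.parseInt(line.split(delimiter)[0].trim());
    }

    // Every non-empty field after the first is a location, however many there are.
    static List<String> parseLocations(String line, String delimiter) {
        String[] tokens = line.split(delimiter);
        List<String> locations = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            String token = tokens[i].trim();
            if (!token.isEmpty()) {
                locations.add(token);
            }
        }
        return locations;
    }
}
```

With these, constructObjectFromRow could build its object with something like new Locations(parseMovieId(line, ","), parseLocations(line, ",")), regardless of whether a row carries one location or four.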
Disclaimer: I just learnt how to do this yesterday, and I am desperate
I have 3 classes (Student, ClassSchedule, ClassEnrolment)
public class ClassEnrolment {
ClassSchedule cs;
Student stud;
public ClassEnrolment(ClassSchedule cs, Student stud) {
super();
this.cs = cs;
this.stud = stud;
}
}
-----
public class Student{
private String name;
private String matricNo;
ClassEnrolment classroom;
ClassSchedule schedule;
}
-----
public class ClassSchedule {
String classType;
int index;
int group;
String day;
String time;
String venue;
String remark;
String courseCode;
}
I am trying to read/write a text file (database). I am having issues with 3 lines inside "/////////////////".
I am aware that the attributes declared in ClassEnrolment are not int nor string. How should I do this? How do I bring ClassSchedule and Student as part of StringTokenizer?
The textfile stores index from ClassSchedule and matricNo from Student. I have a feeling I am doing this wrongly.
public static ArrayList readEnrolment(String filename) throws IOException {
ArrayList stringArray = (ArrayList) read(filename);
ArrayList alr = new ArrayList();
for (int i = 0; i < stringArray.size(); i++) {
String st = (String) stringArray.get(i);
StringTokenizer star = new StringTokenizer(st, SEPARATOR);
///////////////////////////////////////////////////
int cs = Integer.parseInt(star.nextToken().trim());
String stud = star.nextToken().trim();
ClassEnrolment enrolment = new ClassEnrolment(cs, stud);
//////////////////////////////////////////////////
alr.add(enrolment);
}
return alr;
}
public static void saveEnrolment(String filename, List al) throws IOException {
List alw = new ArrayList();
for (int i = 0; i < al.size(); i++) {
ClassEnrolment enrolment = (ClassEnrolment) al.get(i);
StringBuilder st = new StringBuilder();
st.append(enrolment.getCs2());
st.append(SEPARATOR);
st.append(enrolment.getStud2());
alw.add(st.toString());
}
write(filename, alw);
}
Your problem here is one of serialization. You could very well declare your ClassEnrolment class as Serializable (do not forget the serialVersionUID) and let standard Java serialization write your objects to the file as bytes.
However, what you are trying to do here is design a custom serializer for your class. I'd advise you to use generics, along the lines of:
public interface Serializer<T> {
byte[] write(T objectToSerialize);
//to build a factory/service around it
boolean canDeserialize(byte[] serializedObject);
T read(byte[] serializedObject);
}
That way, your readEnrolment/saveEnrolment methods will simply create a Serializer<ClassEnrolment> and let it do its job (which will probably involve a Serializer<Student> and a Serializer<ClassSchedule>).
As for the serialization process itself, I have to say that StringTokenizer is a legacy class from JDK 1.0 whose use is discouraged by its own javadoc:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
And if you want to serialize objects of classes other than just your ClassEnrolment, it would be nice to be able to tell which class was used. There are many ways to do that, and as one commenter said, JSON serialization would work very well.
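For illustration only, here is one way the Serializer<T> interface above might be implemented for the two values the text file stores (the class index and the matriculation number). EnrolmentRecord, EnrolmentSerializer and the "|" separator are assumptions made for this sketch; the real ClassEnrolment would first have to expose those two values, and reading would have to look up the matching ClassSchedule and Student objects afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

interface Serializer<T> {
    byte[] write(T objectToSerialize);
    boolean canDeserialize(byte[] serializedObject);
    T read(byte[] serializedObject);
}

// Hypothetical flattened view of an enrolment: just the two stored values.
class EnrolmentRecord {
    final int classIndex;
    final String matricNo;

    EnrolmentRecord(int classIndex, String matricNo) {
        this.classIndex = classIndex;
        this.matricNo = matricNo;
    }
}

class EnrolmentSerializer implements Serializer<EnrolmentRecord> {
    private static final String SEPARATOR = "|"; // assumed separator

    @Override
    public byte[] write(EnrolmentRecord record) {
        String line = record.classIndex + SEPARATOR + record.matricNo;
        return line.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean canDeserialize(byte[] serializedObject) {
        String[] parts = split(serializedObject);
        if (parts.length != 2 || parts[1].trim().isEmpty()) {
            return false;
        }
        try {
            Integer.parseInt(parts[0].trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    @Override
    public EnrolmentRecord read(byte[] serializedObject) {
        String[] parts = split(serializedObject);
        return new EnrolmentRecord(Integer.parseInt(parts[0].trim()), parts[1].trim());
    }

    // String#split takes a regex, so the separator is quoted; -1 keeps empty fields.
    private static String[] split(byte[] serializedObject) {
        String line = new String(serializedObject, StandardCharsets.UTF_8);
        return line.split(Pattern.quote(SEPARATOR), -1);
    }
}
```

The design keeps the file format in one place: readEnrolment and saveEnrolment only loop over lines and delegate each one to the serializer.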
I have this class that serves as a container; I will use its instance variables for processing later.
class Data{
static int counter= 0;
boolean boolean1;
String string1;
public Data() {
counter++;
}
}
And I have this method that sets the values of Data
public Data setData()
{
Data data = null;
for (int i = 0; i < somecoutnerhere; i++) {
Data = new Data();
Data.boolean1 = some boolean put here;
Data.string1 = "some string to be put here";
}
return ProcessData(Data);
}
I also have this class ProcessData that will make use of Data and will construct the response
private class ProcessData
{
private final Map<String, List<?>> map = new HashMap<String, List<?>>();
int counter;
public ProcessData(Data data)
{
map.put("boolean1", data.boolean1);
map.put("String1", data.string1);
counter = data.counter;
}
public String someMethodToGenerateReturnData(){
// some code here to make use of the Data collected. Will basically use map to construct the return String
}
}
My problem is that I couldn't figure out how can I return all the instance variables created on the for-loop for Data on setData(). Any thoughts?
My problem is that I couldn't figure out how can I return all the instance variables created on the for-loop for Data on setData(). Any thoughts?
According to this, your problem is not "returning all instance variables in one call", as your title states, but rather how to return all the Data objects created in your for-loop, which is easier.
Your code is erroneous though, so I went ahead and corrected it (I hope I didn't mess up). I also renamed a few things.
The changes I made are:
renamed "boolean1" and "string1" to "trueOrFalse" and "string"
added a public, fully parameterized constructor to the Data-class
added a ProcessData-list to the setData()-method, which is filled in the for-loop
(+ a comment)
However, I'd strongly recommend that you review your architecture, and also learn a bit about naming conventions, or coding conventions in general. Names should point out the purpose or content of the method/variable/class, and "boolean1" isn't really doing that.
Regarding the architecture: The Data-class seems to exist solely for the counter, and you could easily change that, making the Data-class obsolete (unless it's used somewhere else).
Data class:
class Data {
static int counter = 0;
boolean trueOrFalse;
String string;
public Data() {
counter++;
}
public Data(boolean someBoolean, String someString) {
this.trueOrFalse= someBoolean;
this.string = someString;
counter++;
}
}
setData()-Method:
public List<ProcessData> setData() {
List<ProcessData> processedDataList = new ArrayList<ProcessData>();
for (int i = 0; i < someCounterHere; i++) {
processedDataList.add(new ProcessData(new Data(true, "testString")));
// a new Data-object is created (parameters true and "testString")
// a new ProcessData-object is created (parameter is the newly created Data-Object)
// the newly created ProcessData-object is added to the list
}
return processedDataList;
}
ProcessData-class:
private class ProcessData {
private final Map<String, Object> map = new HashMap<String, Object>();
int counter;
public ProcessData(Data data) {
map.put("trueOrFalse", data.trueOrFalse);
map.put("string", data.string);
counter = data.counter;
}
public String someMethodToGenerateReturnData() {
// some code here to make use of the Data collected. Will basically use map to construct the return String
}
}
I don't know if this is possible in Java but I was wondering if it is possible to use an object in Java to return multiple values without using a class.
Normally when I want to do this in Java I would use the following
public class myScript {
public static void main(String[] args) {
// initialize object class
cl_Object lo_Object = new cl_Object(0, null);
// populate object with data
lo_Object = lo_Object.create(1, "test01");
System.out.println(lo_Object.cl_idno + " - " + lo_Object.cl_desc);
//
// code to utilize data here
//
// populate object with different data
lo_Object = lo_Object.create(2, "test02");
System.out.println(lo_Object.cl_idno + " - " + lo_Object.cl_desc);
//
// code to utilize data here
//
}
}
// the way I would like to use (even though it's terrible)
class cl_Object {
int cl_idno = 0;
String cl_desc = null;
String cl_var01 = null;
String cl_var02 = null;
public cl_Object(int lv_idno, String lv_desc) {
cl_idno = lv_idno;
cl_desc = lv_desc;
cl_var01 = "var 01";
cl_var02 = "var 02";
}
public cl_Object create(int lv_idno, String lv_desc) {
cl_Object lo_Object = new cl_Object(lv_idno, lv_desc);
return lo_Object;
}
}
// the way I don't really like using because they get terribly long
class Example {
int idno = 0;
String desc = null;
String var01 = null;
String var02 = null;
public void set(int idno, String desc) {
this.idno = idno;
this.desc = desc;
var01 = "var 01";
var02 = "var 02";
}
public int idno() {
return idno;
}
public String desc() {
return desc;
}
public String var01() {
return var01;
}
public String var02() {
return var02;
}
}
Which seems like a lot of work considering in Javascript (I know they are different) I can achieve the same effect just doing
var lo_Object = f_Object();
console.log(lo_Object["idno"] + " - " + lo_Object["desc"]);
function f_Object() {
var lo_Object = {};
lo_Object = {};
lo_Object["idno"] = 1;
lo_Object["desc"] = "test01";
return lo_Object;
}
NOTE
I know the naming convention is wrong, but it is intentional: I have an Informix-4GL program that runs with this program, so the coding standards are those of the company I work for.
The best way to do this is to use HashMap<String, Object>
import java.util.HashMap;
public class Main {
public static void main(String[] args) {
HashMap<String, Object> person =
new HashMap<String, Object>();
// add elements dynamically
person.put("name", "Lem");
person.put("age", 46);
person.put("gender", 'M');
// prints the name value
System.out.println(person.get("name"));
// casts the age value (stored as Object) to int before
// printing
System.out.println((int)person.get("age"));
// prints the gender value
System.out.println(person.get("gender"));
// prints the whole map, e.g. {gender=M, name=Lem, age=46} (HashMap order is not guaranteed)
System.out.println(person);
}
}
The advantage of doing this is that you can add elements as you go.
The downside of this is that you will lose type safety like in the case of the age. Making sure that age is always an integer has a cost. So to avoid this cost just use a class.
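A class does not have to be long to keep type safety. As a minimal sketch (IdDesc and IdDescDemo are hypothetical names, not from the question), public final fields avoid both the casts needed with a map and the getter boilerplate the question complains about:

```java
// Hypothetical small value class: fields are set once and read directly.
final class IdDesc {
    final int idno;
    final String desc;

    IdDesc(int idno, String desc) {
        this.idno = idno;
        this.desc = desc;
    }

    @Override
    public String toString() {
        return idno + " - " + desc;
    }
}

class IdDescDemo {
    // Returns both values at once; the compiler knows idno is an int.
    static IdDesc load(int idno) {
        return new IdDesc(idno, "test0" + idno);
    }
}
```

Calling IdDescDemo.load(1) gives an object whose idno and desc can be read without casts. On Java 16 or later, a record declaration would express the same thing in a single line.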
No, there is no such feature; you have to type out the full type name (class name).
Or you may use Lombok's val:
https://projectlombok.org/features/val.html
Also, if you use IntelliJ IDEA
try this plugin :
https://bitbucket.org/balpha/varsity/wiki/Home
I am not sure it's possible with Java. A class is the basic structure from which objects are created, so we need a class to create an object. For the above code, I don't think there is a solution.
Java methods only allow one return value. If you want to return multiple objects/values consider returning one of the collections. Map, List, Queue, etc.
The one you choose will depend on your needs. For example, if you want to store your values as key-value pairs use a Map. If you just want to store values sequentially, use a list.
An example with a list:
List<Object> myList = new ArrayList<Object>();
myList.add("Some value");
return myList;
As a side note, your method create is redundant. You should use getters and setters to populate the object, or populate it through the constructor.
cl_Object lo_Object = new cl_Object(1, "test01");
The way you have it set up right now, you're creating one object to create another of the same type that has the values you want.
Your naming convention is also wrong. Please refer to Java standard naming convention:
http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367
I'm trying to figure out if there's some way for me to dynamically fill an array of objects within a class, without using array initialization. I'd really like to avoid filling the array line by line. Is this possible given the code I have here?
final class Attributes {
private final int TOTAL_ATTRIBUTES = 7;
Attribute agility;
Attribute endurance;
Attribute intelligence;
Attribute intuition;
Attribute luck;
Attribute speed;
Attribute strength;
private Attributes[] attributes; //array to hold objects
private boolean initialized = false;
public Attributes() {
initializeAttributes();
initialized = true;
store(); //method for storing objects once they've been initialized.
}
private void initializeAttributes() {
if (initialized == false) {
agility = new Agility();
endurance = new Endurance();
intelligence = new Intelligence();
intuition = new Intuition();
luck = new Luck();
speed = new Speed();
strength = new Strength();
}
}
private void store() {
//Dynamically fill "attributes" array here, without filling each element line by line.
}
}
attributes = new Attributes[sizeOfInput];
for (int i=0; i<sizeOfInput; i++) {
attributes[i] = itemList[i];
}
Also, FYI you can add things to an ArrayList and then call toArray() to get an Array of the object.
There is a short Array initialization syntax:
attributes = new Attribute[]{agility, endurance, intelligence, intuition, luck, speed, strength};
Field[] fields = getClass().getDeclaredFields();
ArrayList<Attribute> attributesList = new ArrayList<Attribute>();
for(Field f : fields)
{
if(f.getType() == Attribute.class)
{
// f.get can throw IllegalAccessException, which the caller must handle
attributesList.add((Attribute) f.get(this));
}
}
attributes = attributesList.toArray(new Attribute[0]);
You can use a HashMap<String,Attribute>
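A rough sketch of the map idea, under some assumptions: the Attribute class below is a hypothetical minimal stand-in (the question does not show the real hierarchy), AttributeRegistry is an illustrative name, and LinkedHashMap is swapped in for HashMap so that the attributes keep their insertion order when converted to an array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal Attribute; the real class is not shown in the question.
class Attribute {
    final String name;

    Attribute(String name) {
        this.name = name;
    }
}

class AttributeRegistry {
    // LinkedHashMap (an assumption here) preserves insertion order.
    private final Map<String, Attribute> byName = new LinkedHashMap<>();

    // One loop replaces the seven separate fields and the line-by-line array fill.
    AttributeRegistry(String... names) {
        for (String name : names) {
            byName.put(name, new Attribute(name));
        }
    }

    Attribute get(String name) {
        return byName.get(name);
    }

    // An array is still available whenever one is needed.
    Attribute[] toArray() {
        return byName.values().toArray(new Attribute[0]);
    }
}
```

For example, new AttributeRegistry("agility", "endurance", "luck") creates three attributes that can be looked up by name with get("luck") or obtained together with toArray().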