org.apache.spark.SparkException: Task not serializable in java - java

I'm trying to use Spark GraphX. Before that, I wanted to arrange my vertex and edge RDDs using DataFrames, so I used the JavaRDD map function, but I'm getting the above error. I've tried various ways to fix this issue: I made the whole class serializable, but it didn't work. I also combined Function and Serializable in one class and used it in the map function, but that didn't work either. Please help. Thanks in advance.
//add long unique id for vertex dataframe and get JavaRDD
JavaRDD<Row> ff = vertex_dataframe.javaRDD().zipWithIndex().map(new Function<Tuple2<Row, java.lang.Long>, Row>() {
    public Row call(Tuple2<Row, java.lang.Long> rowLongTuple2) throws Exception {
        return RowFactory.create(rowLongTuple2._1().getString(0), rowLongTuple2._2());
    }
});
I made the Function class serializable like below:
public abstract class SerialiFunJRdd<T1,R> implements Function<T1, R> , java.io.Serializable{
}

I suggest you read up on serializing non-static inner classes in Java. You are creating a non-static inner class here in your map, which is not serializable even if you mark it as Serializable. You have to make it static first:
JavaRDD<Row> ff = vertex_dataframe.javaRDD().zipWithIndex().map(mapFunc);

static SerialiFunJRdd<Tuple2<Row, java.lang.Long>, Row> mapFunc = new SerialiFunJRdd<Tuple2<Row, java.lang.Long>, Row>() {
    @Override
    public Row call(Tuple2<Row, java.lang.Long> rowLongTuple2) throws Exception {
        return RowFactory.create(rowLongTuple2._1().getString(0), rowLongTuple2._2());
    }
};
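If you are on Java 8, an alternative is to skip the named class entirely and pass a lambda. This is a minimal sketch, assuming the same vertex_dataframe as above; it works because Spark's Function interface is Serializable and the lambda captures no enclosing instance:

// Sketch: lambda instead of an anonymous inner class, so no outer "this" is captured
JavaRDD<Row> ff = vertex_dataframe.javaRDD()
        .zipWithIndex()
        .map(t -> RowFactory.create(t._1().getString(0), t._2()));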

Related

How do I properly write an ExtensionObject Array on an Eclipse Milo OpcUa Server?

I am trying to write an Array of ExtensionObject on an Eclipse Milo OpcUa Server.
I'm doing all this in Java 8 and on Milo 0.2.3.
My way to test what I wrote to my Server is the Unified Automation UaExpert Client and a little Python client. Both show the same results.
I have the following Structure (I named it MyStructure for this scenario). It is already present as an Array and I want to write it to the respective node.
@Getter
@Setter
@AllArgsConstructor
public class MyStructure implements UaStructure {

    private String name;
    private Integer dataType;
    private String stringValue;
    private Integer intValue;
    private Float floatValue;

    public static final String Description = "MyStructure";

    public static NodeId getNodeId() {
        return new NodeId(2, 3081);
    }

    @Override
    public NodeId getTypeId() {
        return getNodeId();
    }

    @Override
    public NodeId getBinaryEncodingId() {
        return getNodeId();
    }

    @Override
    public NodeId getXmlEncodingId() {
        return getNodeId();
    }

    public static class Codec extends GenericDataTypeCodec<MyStructure> {

        @Override
        public Class<MyStructure> getType() {
            return MyStructure.class;
        }

        @Override
        public MyStructure decode(SerializationContext context, UaDecoder reader) {
            return new MyStructure(
                    reader.readString("Name"),
                    reader.readInt32("DataType"),
                    reader.readString("StringValue"),
                    reader.readInt32("IntValue"),
                    reader.readFloat("FloatValue")
            );
        }

        @Override
        public void encode(SerializationContext context, MyStructure myStructure, UaEncoder writer) {
            writer.writeString("Name", myStructure.getName());
            writer.writeInt32("DataType", myStructure.getDataType());
            writer.writeString("StringValue", myStructure.getStringValue());
            writer.writeInt32("IntValue", myStructure.getIntValue());
            writer.writeFloat("FloatValue", myStructure.getFloatValue());
        }
    }
}
I write the node like this, where node is an instance of UaVariableNode and array is my Array object, created as shown below:
node.setValue(new DataValue(new Variant(array)));
Object array = Array.newInstance(MyStructure.class, myStructureList.size());
for (int i = 0; i < myStructureList.size(); i++) {
    Array.set(array, i, myStructureList.get(i));
}
I registered the MyStructure definition beforehand like this:
OpcUaBinaryDataTypeDictionary dictionary = new OpcUaBinaryDataTypeDictionary("mynamespace");
dictionary.registerStructCodec(
        new MyStructure.Codec().asBinaryCodec(),
        "MyStructure",
        new NodeId(2, 3081)
);
OpcUaDataTypeManager.getInstance().registerTypeDictionary(dictionary);
Whenever I set my node, the server doesn't complain. It actually sets something, to be precise it sets 42 Extension Objects. In UaExpert I see that the value, including its timestamp, changed, but I can't see the actual value. The value is just of the type Array of ExtensionObject and I can't read any of the nested values. But that is what I saw in other projects. They have custom structures, and the nested fields are human readable in UaExpert.
The problem doesn't change if I do it without the Array and just write one MyStructure.
Do you have an idea, what I am doing wrong or not doing at all?
Right now custom structures in Milo only work if the client reading/writing them knows about the structure in advance.
What you're missing (and isn't implemented by Milo yet) is all the complexity around creating a DataTypeDictionary, registering it in the address space, and linking your codec to an entry in that dictionary via a DataTypeEncoding.
If you were to use a tool like UaModeler and create a custom structure in it, then take a look at the generated XML, you'd see there's a whole bunch of other supporting nodes that go along with it.
When these things are in place, clients can learn how to decode custom structures without knowing about them in advance. Milo's client includes this functionality as well.
Also, for what it's worth, you should encode your array of structures as an ExtensionObject[], with each ExtensionObject holding one scalar structure value.
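For illustration only, here is a rough sketch of that wrapping. The exact ExtensionObject.encode(...) overload varies between Milo versions (some take the binary encoding NodeId explicitly), so treat the call below as an assumption and check the 0.2.3 API:

// Hedged sketch: one ExtensionObject per scalar structure, then write the array.
// ExtensionObject.encode(...) is assumed here; verify the overload available in Milo 0.2.3.
ExtensionObject[] xos = new ExtensionObject[myStructureList.size()];
for (int i = 0; i < myStructureList.size(); i++) {
    xos[i] = ExtensionObject.encode(myStructureList.get(i));
}
node.setValue(new DataValue(new Variant(xos)));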

Flatten processing result in spring batch

Does anyone know how, in Spring Batch (3.0.7), I can flatten the result of a processor that returns a list of entities?
Example:
I've got a processor that returns a List:
public class MyProcessor implements ItemProcessor<Long, List<Entity>> {
    public List<Entity> process(Long id)
}
Now all following processors/writers need to work on List<Entity>. Is there any way to flatten the result to a plain Entity so that the further processors in the given step can work on single entities?
The only way I can see is to persist the list somehow with a writer and then create a separate step that reads from the persisted data.
Thanks in advance!
As you know, processors in Spring Batch can be chained with a composite processor. Within the chain, you can change the processing type from processor to processor, but of course the input and output types of two neighbouring processors have to match.
However, an input or output type is always treated as one item. Therefore, if the output type of a processor is a List, this list is regarded as one item. Hence, the following processor needs to take that List as its input type, and if a writer follows, the writer's write method receives a list of lists.
Moreover, a processor cannot multiply its elements: there can be only one output item for every input item.
Basically, there is nothing wrong with having a chain like the following (a minimal CompositeItemProcessor wiring for such a chain is sketched right after the list):
Reader<Integer>
ProcessorA<Integer,List<Integer>>
ProcessorB<List<Integer>,List<Integer>>
Writer<List<Integer>> (which leads to a write method write(List<List<Integer>> items))
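For completeness, this is a minimal sketch of wiring such a chain with Spring Batch's CompositeItemProcessor; processorA and processorB are hypothetical placeholders for the two processors listed above:

import java.util.Arrays;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;

public class ChainConfig {

    // Chains ProcessorA (Integer -> List<Integer>) and ProcessorB (List<Integer> -> List<Integer>)
    // into a single ItemProcessor<Integer, List<Integer>> for the step.
    public static CompositeItemProcessor<Integer, List<Integer>> compositeProcessor(
            ItemProcessor<Integer, List<Integer>> processorA,
            ItemProcessor<List<Integer>, List<Integer>> processorB) throws Exception {

        CompositeItemProcessor<Integer, List<Integer>> composite = new CompositeItemProcessor<>();
        composite.setDelegates(Arrays.<ItemProcessor<?, ?>>asList(processorA, processorB));
        composite.afterPropertiesSet(); // validates that the delegate list is set
        return composite;
    }
}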
Depending on the context, there could be a better solution.
You could mitigate the impact (for instance on reusability) by using wrapper processors and a wrapper writer like the following code examples:
public class ListWrapperProcessor<I, O> implements ItemProcessor<List<I>, List<O>> {

    private ItemProcessor<I, O> delegate;

    public void setDelegate(ItemProcessor<I, O> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<O> process(List<I> itemList) throws Exception {
        List<O> outputList = new ArrayList<>();
        for (I item : itemList) {
            O outputItem = delegate.process(item);
            if (outputItem != null) {
                outputList.add(outputItem);
            }
        }
        if (outputList.isEmpty()) {
            return null;
        }
        return outputList;
    }
}
public class ListOfListItemWriter<T> implements InitializingBean, ItemStreamWriter<List<T>> {

    private ItemStreamWriter<T> itemWriter;

    @Override
    public void write(List<? extends List<T>> listOfLists) throws Exception {
        if (listOfLists.isEmpty()) {
            return;
        }
        List<T> all = listOfLists.stream().flatMap(Collection::stream).collect(Collectors.toList());
        itemWriter.write(all);
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(itemWriter, "The 'itemWriter' may not be null");
    }

    public void setItemWriter(ItemStreamWriter<T> itemWriter) {
        this.itemWriter = itemWriter;
    }

    @Override
    public void close() {
        this.itemWriter.close();
    }

    @Override
    public void open(ExecutionContext executionContext) {
        this.itemWriter.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        this.itemWriter.update(executionContext);
    }
}
Using such wrappers, you can still implement "normal" processors and writers and then use the wrappers to move the List handling out of them.
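For illustration, a hedged wiring sketch; singleEntityProcessor and singleEntityWriter are hypothetical placeholders for your existing per-Entity components:

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemStreamWriter;

public class ListStepWiring {

    // Adapts existing single-item components to the List<Entity> items emitted by MyProcessor.
    public static void wire(ItemProcessor<Entity, Entity> singleEntityProcessor,
                            ItemStreamWriter<Entity> singleEntityWriter) {
        ListWrapperProcessor<Entity, Entity> listProcessor = new ListWrapperProcessor<>();
        listProcessor.setDelegate(singleEntityProcessor);

        ListOfListItemWriter<Entity> listWriter = new ListOfListItemWriter<>();
        listWriter.setItemWriter(singleEntityWriter);

        // listProcessor (ItemProcessor<List<Entity>, List<Entity>>) and
        // listWriter (ItemStreamWriter<List<Entity>>) can now be plugged into the step.
    }
}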
Unless you can provide a compelling reason, there's no reason to send a list of lists to your ItemWriter. This is not the way the ItemProcessor was intended to be used. Instead, you should create/configure an ItemReader that returns one object containing the relevant objects.
For example, if you're reading from the database, you could use the HibernateCursorItemReader and a query that looks something like this:
"from ParentEntity parent left join fetch parent.childrenEntities"
Your data model SHOULD have a parent table with the Long id that you're currently passing to your ItemProcessor, so leverage that to your advantage. The reader would then pass back ParentEntity objects, each with a collection of ChildEntity objects that go along with it.
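For illustration, a minimal sketch of configuring such a reader; ParentEntity and the sessionFactory parameter are assumptions based on the query above:

import org.hibernate.SessionFactory;
import org.springframework.batch.item.database.HibernateCursorItemReader;

public class ParentEntityReaderFactory {

    // Builds a cursor-based reader that returns ParentEntity items, each already
    // carrying its children thanks to the join fetch in the query.
    public static HibernateCursorItemReader<ParentEntity> parentReader(SessionFactory sessionFactory)
            throws Exception {
        HibernateCursorItemReader<ParentEntity> reader = new HibernateCursorItemReader<>();
        reader.setSessionFactory(sessionFactory);
        reader.setQueryString("from ParentEntity parent left join fetch parent.childrenEntities");
        reader.afterPropertiesSet(); // verifies the session factory and query are set
        return reader;
    }
}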

reading/writing avro file in spark core using java

I need to access Avro file data in a program written in Java on Spark Core. I can use the MapReduce InputFormat class, but it gives me a tuple containing each line of the file as a key. It's very hard to parse, as I am not using Scala.
JavaPairRDD<AvroKey<GenericRecord>, AvroValue> avroRDD = sc.newAPIHadoopFile("dataset/testfile.avro", AvroKeyInputFormat.class, AvroKey.class, NullWritable.class, new Configuration());
Is there any utility class or jar available that I can use to map Avro data directly into Java classes? E.g. the codehaus.jackson package has a provision for mapping JSON to a Java class.
Otherwise, is there any other method to easily parse the fields present in an Avro file into Java classes or RDDs?
Consider that your Avro file contains serialized pairs, with the key being a String and the value being an Avro class. Then you could have a generic static function in some Utils class that looks like this:
public class Utils {

    public static <T> JavaPairRDD<String, T> loadAvroFile(JavaSparkContext sc, String avroPath) {
        JavaPairRDD<AvroKey, NullWritable> records = sc.newAPIHadoopFile(
                avroPath, AvroKeyInputFormat.class, AvroKey.class, NullWritable.class, sc.hadoopConfiguration());
        return records.keys()
                .map(x -> (GenericRecord) x.datum())
                .mapToPair(rec -> new Tuple2<>((String) rec.get("key"), (T) rec.get("value")));
    }
}
And then you could use the method this way:
JavaPairRDD<String, YourAvroClassName> records = Utils.<YourAvroClassName>loadAvroFile(sc, inputDir);
You might also need to use KryoSerializer and register your custom KryoRegistrator:
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator", "com.test.avro.MyKryoRegistrator");
And the registrator class would look like this:
public class MyKryoRegistrator implements KryoRegistrator {

    public static class SpecificInstanceCollectionSerializer<T extends Collection> extends CollectionSerializer {

        Class<T> type;

        public SpecificInstanceCollectionSerializer(Class<T> type) {
            this.type = type;
        }

        @Override
        protected Collection create(Kryo kryo, Input input, Class<Collection> type) {
            return kryo.newInstance(this.type);
        }

        @Override
        protected Collection createCopy(Kryo kryo, Collection original) {
            return kryo.newInstance(this.type);
        }
    }

    Logger logger = LoggerFactory.getLogger(this.getClass());

    @Override
    public void registerClasses(Kryo kryo) {
        // Avro POJOs contain java.util.List fields whose runtime type is GenericData.Array.
        // Because Kryo is not able to serialize them properly, we use this serializer for them.
        kryo.register(GenericData.Array.class, new SpecificInstanceCollectionSerializer<>(ArrayList.class));
        kryo.register(YourAvroClassName.class);
    }
}

Java: Passing different custom objects that implement same interface to method [duplicate]

This question already has answers here:
Java collections covariance problem
(3 answers)
Closed 6 years ago.
I'm trying to write a general method that writes different types of Java Beans (so List<JavaBean>) to a file. I'm currently constructing a FileManager utility class. Each Java Bean implements the same interface. Here's an example of what I'm trying to do.
public interface Data { method declarations }
public class RecipeData implements Data { class stuff goes here }
public class DemographicData implements Data { class stuff goes here }
final public class FileManager {
    public static void writeToCsvFile(String filename, List<Data> data) { file writing logic goes here }
}
I want to be able to pass a List<RecipeData> and a List<DemographicData> to this method. Obviously what I have does not work.
It doesn't seem like I can even do the following:
List<Data> data = new ArrayList<RecipeData>();
How would this normally be done? In Swift I might use the as? keyword to cast it to the correct type.
EDIT
Just to preface: I'm using the SuperCSV library to assist in parsing rows into a Java Bean, and I am using the accepted answer below for the method definition. So I have the following code:
Data dataset;
while ((dataset = beanReader.read(Data.class, nameMappings, processors)) != null) {
    container.add(dataset);
}
I get the following error:
The method add(capture#1-of ? extends Data) in the type List is not applicable for the arguments (Data)
dataset would need to be of type RecipeData or DemographicData for this to work, I'd assume. Is there an easy way to fix this so that it stays flexible if I add more Beans in the future?
final public class FileManager {
    public static void writeToCsvFile(String filename, List<? extends Data> data) { file writing logic goes here }
}
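With the bounded wildcard in place, both concrete lists can be passed directly. A small usage sketch (the file names are placeholders):

import java.util.ArrayList;
import java.util.List;

public class FileManagerDemo {
    public static void main(String[] args) {
        // Both calls compile because List<RecipeData> and List<DemographicData>
        // are subtypes of List<? extends Data>.
        List<RecipeData> recipes = new ArrayList<>();
        List<DemographicData> demographics = new ArrayList<>();

        FileManager.writeToCsvFile("recipes.csv", recipes);
        FileManager.writeToCsvFile("demographics.csv", demographics);
    }
}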
Additionally, instead of declaring
List<Data> data = new ArrayList<RecipeData>();
you can declare
List<Data> data = new ArrayList<Data>();
or, since Java 7,
List<Data> data = new ArrayList<>();
and just populate it with RecipeData, since either way you are losing the information that this List is meant to contain only RecipeData.

Generic getter in container object

A container class holds a list of objects of the base class Table (like WoodenTable, MetalTable...). Each Table subclass keeps its MaterialType (MaterialType.Wood, MaterialType.Metal...). The question is how to provide a proper getter method on the container class that can return each subclass of Table.
So far I've found the following ways:
1. Getter with the material type as a parameter. The danger here is a ClassCastException if the type of T doesn't correspond to materialType:
<T extends Table> T getTable(MaterialType materialtype)
WoodenTable table = getTable(MaterialType.Wood);
MetalTable table = getTable(MaterialType.Wood); // oops... exception
2. Getter with a Class parameter. Safe, but not as clear for the user (compared to MaterialType as a parameter):
<T extends Table> T getTable(Class<T> tableClass)
WoodenTable table = getTable(WoodenTable.class);
3. A getter for each Table subclass. Cumbersome to use, write, and extend with new Table subclasses:
WoodenTable getWoodenTable()
WoodenTable table = getWoodenTable();
4. A getter for just the Table interface. The cast is done outside the container class if necessary:
Table getTable(MaterialType materialType)
WoodenTable woodenTable = (WoodenTable) getTable(MaterialType.Wood);
Is there any other (better) way to do this? If not, which of these would be most appropriate or least smelly?
It should be just this simple:
public Table getTable()
{
    return this.table;
}
This will return a Table object; it is up to the invoker to do with it what they want. Similar to the following block:
public Collection<String> getCollection()
{
    return new ArrayList<String>();
}
The body returns an ArrayList but the function really returns a Collection. A well defined API that utilizes the most common interface between objects will give you this same flexibility.
EDIT

Yes, that could be one use case, but quite often I also need something like for (Table t : tableContainer) { ((SubClassA) t).doSomethingSpecificForA(); }, and that's where my problems begin.
Let us assume the following interface and implementations:
public interface Table
{
    Table getTable();
    void doSpecial();
}

public class WoodenTable implements Table
{
    ...

    public Table getTable()
    {
        return this;
    }

    public void doSpecial()
    {
        mySpecial();
    }

    private void mySpecial()
    {
        System.out.println("Wooden");
    }
}

public class MetalTable implements Table
{
    ...

    public Table getTable()
    {
        return this;
    }

    public void doSpecial()
    {
        mySpecial();
    }

    private void mySpecial()
    {
        System.out.println("Metal");
    }
}
and the following code:
public static void main(String[] args)
{
    Collection<Table> tables = new ArrayList<Table>();
    tables.add(new WoodenTable());
    tables.add(new MetalTable());

    for (Table table : tables)
    {
        table.doSpecial();
    }
}
The approach here is that there is a shared public API, and the internals of each class are not exposed, so the need to do something special for each class is hidden behind the common interface. This prevents having to do instanceof checks or any of the other messy ways to solve this type of problem.
I would recommend you stop thinking about tables as data structures with attributes (material) and start treating them as "persons" (Object Thinking). Don't get a "material" out of them; instead, let them expose their behavior.
When you change the design of the tables, you will automatically change the design of their container. It will become obvious that the container shouldn't care about the tables' materials, but should let the tables control whether they want to get out of the container or remain there.
