YamlSlurper cannot parse Enum value - java

I have YAML:

- flowId: "2021:6:10:20:22:7"
  flowType: "BIG"
  summary: "Description"
  flowStatus: "NEW"
  createdDate: "2021-06-10"
  lastModifiedDate: "2021-06-10"

and these classes:

class Flow {
    ...
    FlowType flowType;
    ...
}

enum FlowType {
    SMALL, MEDIUM, BIG;
}
Parsing the file using YamlSlurper:

def flowList = new YamlSlurper().parseText(new File(myFile).text).collect { it as Flow }

fails with:

java.lang.ClassCastException: java.lang.String cannot be cast to model.FlowType

Is there a way to solve this?

The YAML slurper is a cute tool to quickly read a YAML file or string and deal with it the way you would use the other slurpers: get some basic data types inside lists and maps and just use them. Your attempt to cast the map to the object only works for very basic objects. The cast basically gets unrolled to something like:
[a: 42] as X

becomes

def x = new X()
map.each { k, v ->
    x."$k" = v
}
This does:

- not coerce or convert types
- fail if the map contains keys that are not settable properties on the resulting object
If you need proper object mapping, the slurpers are not directly useful most of the time. You would rather switch to something made for that task -- e.g. Jackson.
Lucky for us, the YAML slurper just uses Jackson (it actually just
transforms the YAML into JSON and then uses the JSON slurper to give you
your data back).
Following is an example of how to load the YAML data into objects using Jackson (all the dependencies are already there if you can see YamlSlurper):
import com.fasterxml.jackson.core.type.TypeReference
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory

def yamlStr = """\
- flowId: "2021:6:10:20:22:7"
  flowType: "BIG"
  summary: "Description"
  flowStatus: "NEW"
  createdDate: "2021-06-10"
  lastModifiedDate: "2021-06-10"
"""

enum FlowType { SMALL, MEDIUM, BIG }

@groovy.transform.ToString
class Flow {
    FlowType flowType
}

def flows = new ObjectMapper(new YAMLFactory()).with {
    configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    readValue(yamlStr, new TypeReference<List<Flow>>(){})
}

println flows
// → [Flow(BIG)]

Related

Mapping POJOs to case classes with immutable lists during deserialisation

I am coming from a Java background and trying to understand how to model domain classes/POJOs in Scala.
I am trying to deserialize a JSON response from a REST API, and my Java POJOs are as follows:
@Data
public class ColumnResponse {
    private String id;
    private String name;
    private String type;
    ...
}

@Data
public class DataSetGetResponse {
    private String id;
    private List<ColumnResponse> columns;
    ...
}
Now I have created the following case classes:

case class DataSetGetResponse(id: String,
                              columns: List[ColumnResponse],
                              ....)

case class ColumnResponse(id: String, name: String ...)
I am trying to use the sttp library (https://sttp.readthedocs.io/en/latest/json.html#json4s) for HTTP communication and json4s for deserialization.
Questions:
1) In the DataSetGetResponse case class, the field "columns" is a List. By default this is an immutable list. How does the deserialization library add new DataColumnGetResponse objects to this immutable list? Do I have to declare it as mutable?
2) There is a field called 'type' in the ColumnResponse POJO. In Scala, 'type' is a reserved keyword. How do I handle this case?
Answer to the first one:
An immutable object cannot be changed in place, but you can create a modified copy with the copy function:

dataSet.copy(columns = newResp :: dataSet.columns)

For more complex tasks you can use Lenses.
Answer to the second one:
If it is a reserved word you can escape it with backticks:

case class ColumnResponse(id: String, name: String, `type`: String)
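
At call sites the backticks are needed as well whenever the member is referenced; a trivial sketch:

val col = ColumnResponse("c1", "name", "string")
println(col.`type`) // prints: string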
This answer addresses the following aspect of the question:

    How does the deserialization library add new DataColumnGetResponse objects to this immutable list?
Let us consider a simplified version of the problem:

JsonMethods.parse("""[1,2,3]""").extract[List[Int]]

How does json4s deserialise [1,2,3] into an immutable List[Int]? First it parses the raw string into an intermediary AST (abstract syntax tree) data structure, where it represents the list like so:

case class JArray(arr: List[JValue]) extends JValue

We see here that arr is an immutable list. The key line that builds it up after parse executes is in JsonParser:
def newValue(v: JValue): Unit = {
    ...
    case a: JArray => vals.replace(JArray(v :: a.arr))
    ...
}
Note how the :: operator in v :: a.arr adds an element at the beginning of the list and returns a new list with v prepended. This means that, since there are three elements in [1,2,3], the following three lists are created by json4s in the process of deserialisation:

JArray(List(JInt(1)))
JArray(List(JInt(2), JInt(1)))
JArray(List(JInt(3), JInt(2), JInt(1)))

Again, note that these are three separate lists.
Next, after the internal AST is created, the actual deserialisation to List[Int] takes place by calling extract[List[Int]]. The key component that does this for lists is CollectionBuilder:
private class CollectionBuilder(json: JValue, tpe: ScalaType)(implicit formats: Formats) {
    ...
    val array: Array[_] = json match {
        case JArray(arr) => arr.map(extractDetectingNonTerminal(_, typeArg)).toArray
        ...
    }
}
Note how we simply map over the AST's arr built up during the parsing step and convert each element to the model of type typeArg, which in our simple case is Int but in your case would be DataColumnGetResponse.
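
Putting it together for the case classes from the question, extraction is just parse plus extract with an implicit Formats in scope. A minimal sketch (the sample JSON and its field values are made up; the fields elided in the question are dropped here):

import org.json4s._
import org.json4s.jackson.JsonMethods

case class ColumnResponse(id: String, name: String, `type`: String)
case class DataSetGetResponse(id: String, columns: List[ColumnResponse])

implicit val formats: Formats = DefaultFormats

val json = """{"id": "ds1", "columns": [{"id": "c1", "name": "col1", "type": "string"}]}"""

// columns comes back as an immutable List[ColumnResponse],
// built exactly as described above
val dataSet = JsonMethods.parse(json).extract[DataSetGetResponse]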

Spark SQL - Encoders for Tuple Containing a List or Array as an Element

Using Spark 2.2 + Java 1.8.
I have two custom data types, "Foo" and "Bar". Each one implements Serializable. 'Foo' has a one-to-many relationship with 'Bar', so their relationship is represented as a Tuple:
Tuple2<Foo, List<Bar>>
Typically, when I have a 1:1 relationship, I can encode to my custom types like so:
Encoder<Tuple2<Foo,Bar>> fooBarEncoder = Encoders.tuple(Encoders.bean(Foo.class),Encoders.bean(Bar.class));
and then use to encode my Dataset
Dataset<Tuple2<Foo,Bar>> fooBarSet = getSomeData().as(fooBarEncoder);
But I am having trouble finding a way to encode the scenario where I have a list (or an array) as a Tuple2 element. What I would like to be able to do is to provide an encoder for the second element, like this:
Encoder<Tuple2<Foo,List<Bar>>> fooBarEncoder = Encoders.tuple(Encoders.bean(Foo.class), List<Bar>.class);
and then encode to my dataset:
Dataset<Tuple2<Foo,List<Bar>>> fooBarSet = getSomeData().as(fooBarEncoder)
But obviously I cannot invoke .class on a parameterized type like List.
I know that for String and primitive types, arrays are supported by Spark implicits, e.g.:
sparkSession.implicits().newStringArrayEncoder()
But how would I create an encoder for a List or Array of a custom class type?
I'm not sure how well this method could be implemented within your setup, but here goes. Create a wrapper class for your list and try it out:
public class BarList implements Serializable {
    List<Bar> list;

    public List<Bar> getList() {
        return list;
    }

    public void setList(List<Bar> l) {
        list = l;
    }
}
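
A sketch of how the wrapper might then be wired in, in Scala like the attempt below (this assumes Foo and BarList are bean-style classes with no-arg constructors, which Encoders.bean requires; getSomeData() stands in for the untyped Dataset from the question):

import org.apache.spark.sql.{Dataset, Encoder, Encoders}

// a tuple encoder built from two bean encoders, with BarList standing in
// for the List<Bar> that could not be encoded directly
val fooBarEncoder: Encoder[(Foo, BarList)] =
  Encoders.tuple(Encoders.bean(classOf[Foo]), Encoders.bean(classOf[BarList]))

val fooBarSet: Dataset[(Foo, BarList)] = getSomeData().as(fooBarEncoder)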
I don't know if it's possible. I tried the following Scala, trying to help, figuring that I could build up the encoder by first teaching Spark how to encode X, then List[X], and finally a tuple containing List[X] (not shown below):
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import scala.beans.BeanProperty

class X(@BeanProperty var field: String) extends Serializable
case class Z(field: String)

implicit val XEncoder1 = Encoders.bean(classOf[X])
implicit val ZEncoder = Encoders.product[Z]

val listXEncoder = ExpressionEncoder[List[X]] // doesn't work
val listZEncoder = ExpressionEncoder[List[Z]]
listZEncoder works fine.
Switching to using

implicit val XEncoder2 = org.apache.spark.sql.Encoders.kryo[X]

still doesn't work for listXEncoder.
The error ends up at a place in Catalyst's ScalaReflection, which is beyond me.

scala jackson dealing with any type, possible custom interceptor required

I'm currently rewriting one of our applications, which uses PostgreSQL and now should use Mongo.
The architecture is pretty simple
db => case class => rest api
We are using scala jackson for it and everything works fine, except for one minor annoyance, and I'm just looking for the right approach.
Consider this case class to understand my annoyance:
case class Test(val id:String, val value:Any)
The value in our application can be a number or a string at this point; a number is either an Integer or a Double.
So when we receive JSON like this from our legacy application:

{ "id": "a", "value": "test" }

it gets mapped correctly and results in the expected types of String, String.
But given the following:

{ "id": "b", "value": "1" }

we would like to have this mapped to String, Integer instead. But obviously Jackson thinks it's a String and maps it to String, String.
Is there some transparent way to do this? My thinking would be something like an interceptor for Jackson, which simply tests:
if type.isDouble == true return double value
else if type.isInteger == true return integer value
else return string value
so we would end up with
String,Double
String,Integer
String,String
in this example.
Obviously I can write a generic parser and bring the data format into the correct form beforehand, but I would rather have this done transparently, since we never know when a user will submit the legacy JSON format with this bug and possibly corrupt the system.
thx
OK, it looks like this is easier than expected: write a custom deserializer and annotate the one field which needs it.
Solution found here
Actual example:

import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.databind.{DeserializationContext, JsonDeserializer, JsonNode}

class NumberDeserializer extends JsonDeserializer[Any] {
  override def deserialize(jsonParser: JsonParser, deserializationContext: DeserializationContext): Any = {
    val jsonNode: JsonNode = jsonParser.getCodec.readTree(jsonParser)
    val content = jsonNode.textValue
    try {
      content.toInt
    } catch {
      case _: NumberFormatException =>
        try {
          content.toDouble
        } catch {
          case _: NumberFormatException => content
        }
    }
  }
}
Usage:

import com.fasterxml.jackson.databind.annotation.JsonDeserialize

case class Test(
  @JsonDeserialize(using = classOf[NumberDeserializer])
  value: Any
)
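
For completeness, a sketch of reading the legacy payload with a Scala-aware mapper; this assumes the jackson-module-scala dependency is on the classpath, and the sample JSON is made up:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

// "1" arrives as a JSON string, but the annotated deserializer turns it into an Int
val parsed = mapper.readValue("""{"value": "1"}""", classOf[Test])
println(parsed.value.getClass) // class java.lang.Integer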

How to convert scala.collection.Set to java.util.Set with serializable within an RDD

I have a scala.collection.Set, scalaSet: Set[Long].
How will I be able to convert it into a java.util.Set that is serializable? I tried the following code, but got:

java.io.NotSerializableException: scala.collection.convert.Wrappers$SetWrapper
import scala.collection.JavaConversions._

class MySerializableClass extends Serializable {

  // method to implement the Scala to Java operations on the given RDD
  def rddOps(dummyRDD: RDD[(Long, Set[Long])]) = {
    val dummyRDDWithJavaSet = dummyRDD.map {
      case (key, value) => (key, scalaToJavaSetConverter(value))
    }
  }

  // scala Set to Java Set Converters
  def scalaToJavaSetConverter(scalaSet: Set[Long]): java.util.Set[Long] = {
    val javaSet: java.util.Set[Long] = setAsJavaSet(scalaSet)
    javaSet
  }
}
I have seen the thread "notserializable exception when trying to serialize java map converted from scala" for an answer, but the solution there didn't work with serialization either.
The serialization issue with the scala.collection.JavaConversions/JavaConverters converters is that they are merely wrappers over the underlying (Scala/Java) object. For the result to be effectively serializable, the underlying structure must be serializable, and so must the wrapper class itself -- which, as the exception shows, scala.collection.convert.Wrappers$SetWrapper is not.
The easiest solution in your case is to implement a structural copy in your conversion method:
// scala Set to Java Set Converters
def scalaToJavaSetConverter(scalaSet: Set[Long]): java.util.Set[Long] = {
  val javaSet = new java.util.HashSet[Long]()
  scalaSet.foreach(entry => javaSet.add(entry))
  javaSet
}

How would I model a Gson object that can handle a dynamic field?

An API my application is communicating with sends responses that look like:

{
    Code: 200,
    Message: "HELLO",
    Data: []
}
The Data field is always an array of SOMETHING. But that something could be a single node of text, another array of something else, or any other of an assortment of different objects.
In the below example, the data node is an array of an array of car objects.
Data: [ [ {car:1}, {car:2} ] ]
Another return type could be an array of insect objects:
Data: [ {insect : spider} ]
I would like to design a Gson object to handle this and was wondering what the best way would be.
My first thought is to have an abstract class that holds the Code and Message fields, and then have many sub-types that each have their own Data field. Then I would just call .fromJson(), passing it the sub-class.
Is there a more optimal way to design it so that Gson would handle the differences?
I figured out what I believe is the best answer. Fairly straightforward!
Make the class generic and supply the type by creating a TypeToken before passing to Gson:
public class Response<T> {
    private String code;
    private String message;
    private List<T> data;
}
Then when using Gson:
Type myCarListResponse = new TypeToken<Response<List<Car>>>(){}.getType();
Response<List<Car>> response = gson.fromJson(json, myCarListResponse);
Replace List<Car> with the type you are expecting from the Data node. The above example satisfies the first example from the original post.
To satisfy the second example:
Type myInsectResponse = new TypeToken<Response<Insect>>(){}.getType();
Response<Insect> response = gson.fromJson(json, myInsectResponse);
In Jackson, you can use @JsonAnyGetter/@JsonAnySetter to achieve this.
Refer to http://www.cowtowncoder.com/blog/archives/2011/07/entry_458.html and http://wiki.fasterxml.com/JacksonFeatureAnyGetter
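
A minimal sketch of that Jackson approach, written in Scala to match the earlier Jackson examples (the class and method names here are made up): any property that is not explicitly declared is collected into a map on deserialization and written back out on serialization.

import com.fasterxml.jackson.annotation.{JsonAnyGetter, JsonAnySetter}
import com.fasterxml.jackson.databind.ObjectMapper

class DynamicResponse {
  // holds Code, Message and Data in whatever shape they arrive
  private val fields = new java.util.LinkedHashMap[String, AnyRef]()

  @JsonAnyGetter
  def any(): java.util.Map[String, AnyRef] = fields

  @JsonAnySetter
  def set(name: String, value: AnyRef): Unit = fields.put(name, value)
}

val mapper = new ObjectMapper()
val resp = mapper.readValue("""{"Code": 200, "Message": "HELLO", "Data": [[{"car": 1}, {"car": 2}]]}""", classOf[DynamicResponse])
// resp.any().get("Data") is a java.util.List mirroring the Data node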
