RDD to CSV JAVA - java

I'm working with Apache Spark and Apache Kylin, and I have to store a CSV file in HDFS so that I can build a cube from it in Kylin. The idea is to convert an RDD into a CSV file, and I tried writing it out like this:
bookingDF.write().format("com.databricks.spark.csv").option("header", "true").save("hdfs://10.7.30.131:8020/tmp/hfile/e.csv");
but I always get a rather long error like the one below, and I think it is caused by the Date field of the object I'm using:
17/01/19 14:50:24 ERROR Utils: Aborting task
scala.MatchError: Fri Dec 09 07:45:27 CET 2016 (of class java.util.Date)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)...
Below is the code of the Java object I'm using:
@JsonIgnoreProperties(ignoreUnknown = true)
public class Booking implements Serializable {
private String bPk;
private String type;
private String transactionId;
private Boolean revisit;
private String device;
@JsonProperty("serverTime")
private Date time;
private String trackingId;
private String browserFamily;
@JsonProperty("action")
private String measure;
private String userId;
public String getUserId() {
return userId;
}
public void setUserId(String userId) {
this.userId = userId;
}
public String getMeasure() {
return measure;
}
public void setMeasure(String measure) {
this.measure = measure;
}
public String getBrowserFamily() {
return browserFamily;
}
public void setBrowserFamily(String browserFamily) {
this.browserFamily = browserFamily;
}
public void setTime(Date time) {
this.time = time;
}
public String getTrackingId() {
return trackingId;
}
public void setTrackingId(String trackingId) {
this.trackingId = trackingId;
}
public Date getTime() {
return time;
}
....
I'm not sure what I'm doing wrong. I tried changing java.util.Date to java.sql.Date, but I get the same error, only with java.sql.Date in the message.

Can you try using java.text.SimpleDateFormat?
From the spark-csv source code:
dateFormat:
specifies a string that indicates the date format to use when reading
dates or timestamps. Custom date formats follow the formats at
java.text.SimpleDateFormat. This applies to both DateType and
TimestampType. By default, it is null which means trying to parse
times and date by java.sql.Timestamp.valueOf() and
java.sql.Date.valueOf().
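The quoted option describes reading dates. For the writing side, one possible workaround (an assumption, not something confirmed in this thread) is to render the Date field as text yourself before the rows reach the DataFrame. A minimal sketch in plain Java, where the class DateCsvSketch and the method toCsvField are hypothetical names and the UTC zone and pattern are illustrative choices:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DateCsvSketch {
    // Hypothetical helper: renders a java.util.Date as text in UTC,
    // so the value can be stored in a String column instead of a Date.
    static String toCsvField(Date date) {
        if (date == null) {
            return ""; // assumption: write missing dates as an empty field
        }
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // Epoch start rendered in UTC.
        System.out.println(toCsvField(new Date(0L)));
    }
}
```

In the Booking class this would mean exposing the time as a String getter that calls such a helper; whether that removes the MatchError depends on your Spark version.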

Deserialize part of JSON string to DateTime in POJO using Jackson

I am reading a json of the given form and storing it as a POJO.
{
"details" : [
{
"version" : 1,
"time" : "2021-01-01T00:00:00.000Z"
}
]
}
My POJO class looks like this:
public class Details
{
private int version;
private String time;
public Integer getVersion(){
return version;
}
public void setVersion(int version){
this.version = version;
}
public String getTime(){
return time;
}
public void setTime(String time){
this.time = time;
}
}
The time is being read as a string. How do I deserialize it to DateTime using Jackson?
You should be able to use the @JsonFormat annotation for your date. First change your time field from String to Date, then do the following (the pattern has to match the year-first layout of the input, with HH for a 24-hour clock):
@JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
private Date time;
The following link shows how to do other conversions, especially if it's a standard time format:
https://www.baeldung.com/jackson-serialize-dates
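Since @JsonFormat patterns follow the SimpleDateFormat syntax, a pattern can be checked against the sample value without Jackson. The sketch below is only that check, not Jackson code; the class name PatternCheck and the method parseUtc are hypothetical, and treating the trailing Z as a literal with the zone fixed to UTC is an assumption about the data:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class PatternCheck {
    // Year-first pattern with a 24-hour clock; 'T' and 'Z' are quoted literals.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

    // Hypothetical helper: strict parse of a UTC timestamp string.
    static Date parseUtc(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // reject values that do not fit the pattern exactly
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Does not match " + PATTERN + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        // Prints the epoch milliseconds of the timestamp from the question.
        System.out.println(parseUtc("2021-01-01T00:00:00.000Z").getTime());
    }
}
```

A day-first pattern such as dd-MM-yyyy would not describe the value 2021-01-01T00:00:00.000Z, which is why the pattern and the input layout have to agree.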
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new JodaModule());
Adding this worked for me.
In the POJO, I declared time as DateTime instead of String:
public class Details
{
private int version;
private DateTime time;
...
//getters & setters
}

DynamoDB converting ZonedDateTime in nested object

I'm switching from MongoDB to DynamoDB on a project. Now I'm trying to store this Post object in the db. I'm using the DynamoDBTypeConverter to convert the ZonedDateTime to a String, as DynamoDB doesn't support ZonedDateTime.
That works fine, but when I add a ZonedDateTime field to the Comment object and try to convert it too, it doesn't work. I've tried adding a converter to the Comment class, and tried using the converter in the Post class for Comment, but nothing seems to work. Is there a way to convert a field in a nested object for DynamoDB?
com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMappingException: Cannot marshall type class java.time.ZonedDateTime without a custom marshaler or @DynamoDBDocument annotation.
@DynamoDBTable(tableName = "Post")
public class Post {
@DynamoDBHashKey
private String postNumber;
private ZonedDateTime date;
private List<Comment> comments;
@DynamoDBTypeConverted(converter = ZonedDateTimeConverter.class)
@DynamoDBAttribute
public ZonedDateTime getDate() {
return date;
}
@DynamoDBAttribute(attributeName = "comments")
public List<Comment> getComments() {
return comments;
}
static public class ZonedDateTimeConverter implements DynamoDBTypeConverter<String, ZonedDateTime> {
@Override
public String convert(final ZonedDateTime time) {
return time.toString();
}
@Override
public ZonedDateTime unconvert(final String stringValue) {
return ZonedDateTime.parse(stringValue);
}
}
}
@DynamoDBDocument
public class Comment {
private String commentNumber;
@NotNull
private User user;
private ZonedDateTime date;
@DynamoDBTypeConverted(converter = ZonedDateTimeConverter.class)
@DynamoDBAttribute
public ZonedDateTime getDate(){
return this.date;
}
static public class ZonedDateTimeConverter implements DynamoDBTypeConverter<String, ZonedDateTime> {
@Override
public String convert(final ZonedDateTime time) {
return time.toString();
}
@Override
public ZonedDateTime unconvert(final String stringValue) {
return ZonedDateTime.parse(stringValue);
}
}
}

Timestamp invalid hour in Java

I am using JPA in my Java project, and the timestamp is not stored correctly: it only shows 2015-08-12 00:00:00.0 (the day is correct but the time of day is not).
@Entity
public class Session implements Serializable {
..
@Temporal(TemporalType.DATE)
private Date timestamp;
..
public Session(String sessionId) {
super();
this.sessionId = sessionId;
this.timestamp = new Date();
}
public Session() {
super();
this.timestamp = new Date();
}
}
Do you know how to fix this?
You should use TemporalType.TIMESTAMP, which maps the field to a java.sql.Timestamp and therefore keeps the time of day as well as the date. The type you used, TemporalType.DATE, maps to java.sql.Date, which only carries the day, month and year.
So your code becomes:
@Entity
public class Session implements Serializable {
..
@Temporal(TemporalType.TIMESTAMP)
private Date timestamp;
..
public Session(String sessionId) {
this.sessionId = sessionId;
this.timestamp = new Date();
}
public Session() {
this.timestamp = new Date();
}
}
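The difference between the two JDBC types can be seen without JPA. The sketch below only prints the same instant through java.sql.Date and java.sql.Timestamp; the class name TemporalSketch and its methods are hypothetical, and how a given JPA provider truncates the value on write is not shown here:

```java
import java.sql.Timestamp;

class TemporalSketch {
    // java.sql.Date prints only the calendar date (yyyy-mm-dd).
    static String asDate(long millis) {
        return new java.sql.Date(millis).toString();
    }

    // java.sql.Timestamp prints the date plus the time of day and fraction.
    static String asTimestamp(long millis) {
        return new Timestamp(millis).toString();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("DATE      -> " + asDate(now));
        System.out.println("TIMESTAMP -> " + asTimestamp(now));
    }
}
```

Both lines start with the same day in the JVM's default time zone, but only the second one keeps the hour, minute and second.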

Always getting value of date field as current date after converting json into java object using JSONSerializer.toJava

I am converting the following json into java object using JSONSerializer.toJava.
{
"sessionId": "d792-54fd8a87-ses-Administrator-2200-0",
"campaignId": 2,
"callBackTime": "2015-08-08 07:23:00",
"isSelfCallBack": "false",
"userId": "a1",
"callBackHandlerType": "voice.campaign.callback.handler",
"callBackProperties":
{
"customerId": "112",
"phone": "33334444"
}
}
And my root class for json config is described as below
public class ProxyAddCallbackRequestBean extends ProxySessionRequestBean {
private static final long serialVersionUID = 1L;
private Integer campaignId;
private Date callBackTime;
private boolean isSelfCallBack;
private String userId;
private String callBackHandlerType;
private Map<String, String> callBackProperties;
public Integer getCampaignId() {
return campaignId;
}
public void setCampaignId(Integer campaignId) {
this.campaignId = campaignId;
}
public Date getCallBackTime() {
return callBackTime;
}
public void setCallBackTime(Date callBackTime) {
this.callBackTime = callBackTime;
}
public boolean isSelfCallBack() {
return isSelfCallBack;
}
public void setSelfCallBack(boolean isSelfCallBack) {
this.isSelfCallBack = isSelfCallBack;
}
public String getUserId() {
return userId;
}
public void setUserId(String userId) {
this.userId = userId;
}
public String getCallBackHandlerType() {
return callBackHandlerType;
}
public void setCallBackHandlerType(String callBackHandlerType) {
this.callBackHandlerType = callBackHandlerType;
}
public Map<String, String> getCallBackProperties() {
return callBackProperties;
}
public void setCallBackProperties(Map<String, String> callBackProperties) {
this.callBackProperties = callBackProperties;
}
}
After converting to a Java object, the callBackTime value is set to the current time, while the other fields have the correct values.
I am new to JSON. Can you please help me find out what I am doing wrong?
Assuming you're using this json-lib, there's nothing from a quick scan of the documentation to suggest that it will auto-convert a String to a Date. Therefore, you're going to need to parse the date yourself. If you're happy pulling in the dependency, Joda Time has a good reputation. Otherwise, if the date you've shown is the expected format, something like:
private static final String CALLBACK_TIME_PATTERN = "yyyy-MM-dd HH:mm:ss";
public void setCallBackTime(String rawTime) throws ParseException {
// SimpleDateFormat is not thread-safe, so create one per call
this.callBackTime = new SimpleDateFormat(CALLBACK_TIME_PATTERN).parse(rawTime);
}
should get you started. (Javadoc for SimpleDateFormat)
(Note that the date you've quoted looks like ISO 8601 but isn't.)

How to format @Param String in Spring Data Rest

I have to convert an incoming parameter value for a repository interface method into the desired format. Is that possible? My domain class:
@DynamoDBTable(tableName = "test")
public class Test implements Serializable{
@Id
private String id;
private String name;
private String date;
@DynamoDBHashKey(attributeName = "id")
@DynamoDBAutoGeneratedKey
public String getId() {
return id;
}
@DynamoDBAttribute(attributeName = "name")
public String getName() {
return name;
}
@DynamoDBAttribute(attributeName = "date")
@JsonSerialize(using = StringDateSerializer.class)
public String getDate() {
return date;
}
public void setId(String id) {
this.id = id;
}
public void setName(String name) {
this.name = name;
}
@JsonDeserialize(using = StringDateDeserializer.class)
public void setDate(String date) {
this.date = date;
}
}
And my repository interface,
@EnableScan
@RestResource(path="test", rel="test")
public interface TestRepository extends PagingAndSortingRepository<Test, String>{
@RestResource(path="testsearch", rel="test")
public Page<Test> findByNameAndDateLessThan(@Param("name") String name, @Param("date") String date, Pageable pageable);
}
Here I have to convert the incoming date String to a time value using Java's getTime() method. Is it possible to do this without a controller? I don't want to send the converted value from the client side, because time zone problems may occur.
My converters:
public class StringDateSerializer extends JsonSerializer<String> {
private static final SimpleDateFormat dateFormat = new SimpleDateFormat("dd-MM-yyyy");
@Override
public void serialize(String time, JsonGenerator gen,
SerializerProvider provider) throws IOException,
JsonProcessingException {
Date date = new Date(Long.parseLong(time));
String formattedDate = dateFormat.format(date);
gen.writeString(formattedDate);
}
}
public class StringDateDeserializer extends JsonDeserializer<String> {
private static final SimpleDateFormat dateFormat = new SimpleDateFormat("dd-MM-yyyy");
@Override
public String deserialize(JsonParser parser, DeserializationContext context)
throws IOException, JsonProcessingException {
String dateReceived = parser.getText();
try {
Date date = dateFormat.parse(dateReceived);
return String.valueOf(date.getTime());
} catch (ParseException e) {
// Fail the request instead of continuing with a null date
throw new IOException("Unparseable date: " + dateReceived, e);
}
}
}
Here I have to call GET /test/search/test?name=xx&date=14-06-2014. I need to get all the names with a date earlier than 14-06-2014 and leave out the data dated on or after 14-06-2014.
For POST and GET, I convert the incoming and outgoing strings using the @JsonSerialize and @JsonDeserialize annotations, but when I fetch data through a finder method the parameter is not converted as I expected.
For example, if I save {"name": "Test", "date": "08-10-2014"}, it is stored in the DB as its equivalent time value, and I want to search for it using 08-10-2014 rather than the time value. I am new to Spring and can't find a way to do this. Thanks in advance.
What's the reason you use String as the type for the date in the first place? That's quite suboptimal (to phrase it politely) API design.
Spring Data REST supports the usage of @DateTimeFormat on query method parameters to turn the String-based representation you get from the HTTP request into a Date. So your repository interface might look something like this:
public interface TestRepository extends PagingAndSortingRepository<Test, String>{
public Page<Test> findByNameAndDate(@Param("name") String name,
@Param("date") @DateTimeFormat(iso = ISO.DATE) Date date, Pageable pageable);
}
This will cause Strings like 2014-06-08 to be turned into the appropriate Date.
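ISO.DATE expects the year-first yyyy-MM-dd layout, whereas the request in the question sends 14-06-2014. The plain-Java sketch below (not Spring code) illustrates the difference between the two layouts with java.time; the class IsoDateSketch and its methods are hypothetical names. In Spring, the pattern attribute of @DateTimeFormat is the usual way to accept a custom layout such as dd-MM-yyyy.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class IsoDateSketch {
    // True when the text is a valid ISO date such as 2014-06-08.
    static boolean parsesAsIsoDate(String text) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ISO_LOCAL_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Parses the day-first layout used in the question, e.g. 14-06-2014.
    static LocalDate parseDayFirst(String text) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern("dd-MM-yyyy"));
    }

    public static void main(String[] args) {
        System.out.println(parsesAsIsoDate("2014-06-08"));  // ISO layout
        System.out.println(parsesAsIsoDate("14-06-2014"));  // day-first layout
        System.out.println(parseDayFirst("14-06-2014"));    // printed in ISO form
    }
}
```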
If I'm understanding your issue correctly, there are two areas of concern: how Spring-Data-Rest handles date mapping, and how Spring-Data-DynamoDB handles date mapping.
With regard to Spring-Data-DynamoDB:
DynamoDB stores dates as Strings, so if you have a date attribute as part of your data model, you can either represent it as a String (as I think you are doing currently), or represent it as a Date and configure Spring-Data-DynamoDB to map the Date to a String. This can be done using custom marshallers from amazon-aws-sdk, and support has been added to handle this in the Spring Data DynamoDB module.
You can read about marshallers here: http://java.awsblog.com/post/Tx1K7U34AOZBLJ2/Using-Custom-Marshallers-to-Store-Complex-Objects-in-Amazon-DynamoDB
Note that this marshalling is separate from any mapping that you may be requiring Spring-Data-Rest to perform from JSON to objects; for that you will still need the @DateTimeFormat annotation.
If you want to represent the date as a java.util.Date in your data model, simply annotate the getter for the attribute in your domain class with @DynamoDBMarshalling, and pass in the marshaller class you wish to use, e.g.:
@DynamoDBRangeKey(attributeName = "ReplyDateTime")
@DynamoDBMarshalling(marshallerClass=DefaultDynamoDBDateMarshaller.class)
public Date getReplyDateTime() {
...
DefaultDynamoDBDateMarshaller here is a support class from Spring-Data-DynamoDB, but you can implement your own for custom date/string mapping.
With this in place, you can now change your repository finder methods so they expect Date parameters rather than Strings:
public Page<Reply> findByReplyDateTimeAfter(Date replyDateTime,Pageable pageable);
Hope this helps,
Cheers,
Michael
