Performance issue in converting Java object to JSON object

I tested the example below before doing my actual task of converting Java objects to JSON.
Converting Java objects to JSON with Jackson
I was looking for better performance (the conversion time should be as low as possible).
This article shows performance stats comparing different APIs, from this answer.
My finding, for example with the first link that I mentioned (with a few records):
ValueData object = new ValueData();
List<ValueItems> information = new ArrayList<ValueItems>();
ValueItems v1 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat1", 1, "data1");
ValueItems v2 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat2", 2, "data2");
ValueItems v3 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat3", 3, "data3");
ValueItems v4 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat4", 4, "data4");
ValueItems v5 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat5", 5, "data5");
ValueItems v6 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat6", 6, "data6");
ValueItems v7 = new ValueItems(String.valueOf(Calendar.getInstance().getTimeInMillis()), "feat7", 7, "data7");
information.add(v1);
information.add(v2);
information.add(v3);
information.add(v4);
information.add(v5);
information.add(v6);
information.add(v7);
object.setInformation(information);
And I'm going to convert this object by using Jackson:
long smili = Calendar.getInstance().getTimeInMillis();
ObjectWriter ow = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = ow.writeValueAsString(object);
long emili = Calendar.getInstance().getTimeInMillis();
System.out.println("taken time using jackson = " + (emili - smili) + " milli seconds");
And now I'm doing the same using StringBuilder:
smili = Calendar.getInstance().getTimeInMillis();
StringBuilder sb = new StringBuilder();
sb.append("{\n\"information\" : [\n");
for (ValueItems vi : object.getInformation()) {
sb.append("{\n\"timestamp\" : \""+vi.getTimestamp()+"\",");
sb.append("\"feature\" : \""+vi.getFeature()+"\",");
sb.append("\"ean\" : "+vi.getEan()+",");
sb.append("\"data\" : \""+vi.getData()+"\"\n},");
}
sb.deleteCharAt(sb.length() - 1);
sb.append("]\n}");
emili = Calendar.getInstance().getTimeInMillis();
System.out.println("taken time using StringBuilder = " + (emili - smili) + " milli seconds");
I got the timings below, just for a list of size 7:
taken time using jackson = 534 milli seconds
taken time using StringBuilder = 1 milli seconds
I want to convert an object whose information list has more than 10k entries, but the conversion time should stay very low.
Will creating the JSON by using StringBuilder help in this case?
Is there another API that provides what I require?
Please help me with this.

Thanks Sam B.
I have tried with jackson-afterburner:
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new AfterburnerModule());
ow = mapper.writer().withDefaultPrettyPrinter();
json = ow.writeValueAsString(object);
And I have tested with list sizes 7, 7000, 70000 and 700000:
The timings were:
For 7:
taken time using jackson = 217 milli seconds
taken time using StringBuilder = 1 milli seconds
taken time using after-burner = 25 milli seconds
For 7000:
taken time using jackson = 310 milli seconds
taken time using StringBuilder = 31 milli seconds
taken time using after-burner = 65 milli seconds
For 70000:
taken time using jackson = 469 milli seconds
taken time using StringBuilder = 149 milli seconds
taken time using after-burner = 101 milli seconds
For 700000:
taken time using jackson = 1120 milli seconds
taken time using StringBuilder = 705 milli seconds
taken time using after-burner = 623 milli seconds
As the list size increases, Afterburner becomes more efficient.
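One caveat worth noting (not from the original post): a large part of the Jackson time measured for the tiny list is most likely one-time setup cost (constructing the ObjectMapper, class introspection, JIT warm-up), which the StringBuilder version never pays. A minimal sketch of a fairer measurement, assuming the same ValueData object and AfterburnerModule, reuses a single mapper and performs a warm-up call before timing:
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

// Build the mapper once and reuse it; mapper construction and class introspection are the expensive parts.
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new AfterburnerModule()); // optional speed-up
ObjectWriter writer = mapper.writer(); // pretty-printing adds size and time; skip it for large payloads

writer.writeValueAsString(object); // warm-up run, not measured

long start = System.nanoTime();
String json = writer.writeValueAsString(object); // measured run
System.out.println("taken time using jackson (reused mapper) = "
        + (System.nanoTime() - start) / 1_000_000 + " milli seconds");
Also note that the StringBuilder version does not escape quotes, backslashes or newlines inside the data values, so it can produce invalid JSON for some inputs; a library like Jackson handles that automatically.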

Related

How can I get seconds and milliseconds together from time?

I am trying to get the page load time in an automation testing project.
pageLoad3.start();
WebDriverWait wait3 = new WebDriverWait(driver, 30);
wait3.until(ExpectedConditions.visibilityOfElementLocated(By.xpath(hotelNameDetailsXpath)));
pageLoad3.stop();
long pageLoadTime_ms5 = pageLoad3.getTime();
long pageLoadTime_Seconds5 = pageLoadTime_ms5 / 1000;
//System.out.println("Time taken to get load Total price element ::");
System.out.println("pageLoadTime_ms5 ::"+pageLoadTime_ms5);
System.out.println("Time taken to load DB Response :: " + pageLoadTime_Seconds5 + " seconds");
Output:
pageLoadTime_ms5 ::11479
Time taken to load DB Response :: 11 seconds
I am getting 11 seconds but am not able to get the 479 part. How can I get it in the format below?
Actual requirement: I want the time in a format like
11 seconds 47 milliseconds i.e. (00:11:47)
To get the output in seconds and milliseconds you can use the following solution:
Code Block:
public class division_by_1000 {
public static void main(String[] args) {
//long pageLoadTime_ms5 = pageLoad3.getTime();
// assuming pageLoadTime_ms5 = 11479
String pageLoadTime_ms5 = "11479";
System.out.println("pageLoadTime_ms5 ::"+pageLoadTime_ms5);
System.out.println("Time taken to load DB Response :: " + (Integer.parseInt(pageLoadTime_ms5)/1000) + " seconds " + (Integer.parseInt(pageLoadTime_ms5)%1000) + " millisseconds ");
}
}
Console Output:
pageLoadTime_ms5 ::11479
Time taken to load DB Response :: 11 seconds 479 milliseconds
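As an alternative sketch (not from the original answer; it assumes Java 8+ and that pageLoadTime_ms5 holds the elapsed milliseconds), java.time.Duration can split the value without the manual /1000 and %1000 arithmetic:
import java.time.Duration;

public class PageLoadFormat {
    public static void main(String[] args) {
        long pageLoadTimeMs = 11479L; // assumed sample value from the question
        Duration d = Duration.ofMillis(pageLoadTimeMs);
        long seconds = d.getSeconds();          // 11
        int millis = d.getNano() / 1_000_000;   // 479
        System.out.println("Time taken to load DB Response :: " + seconds + " seconds " + millis + " milliseconds");
        System.out.println(String.format("(00:%02d:%03d)", seconds, millis)); // prints (00:11:479)
    }
}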

Spark DataFrame java.lang.OutOfMemoryError: GC overhead limit exceeded on long loop run

I'm running a Spark application (Spark 1.6.3 cluster), which does some calculations on 2 small data sets, and writes the result into an S3 Parquet file.
Here is my code:
public void doWork(JavaSparkContext sc, Date writeStartDate, Date writeEndDate, String[] extraArgs) throws Exception {
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
S3Client s3Client = new S3Client(ConfigTestingUtils.getBasicAWSCredentials());
boolean clearOutputBeforeSaving = false;
if (extraArgs != null && extraArgs.length > 0) {
if (extraArgs[0].equals("clearOutput")) {
clearOutputBeforeSaving = true;
} else {
logger.warn("Unknown param " + extraArgs[0]);
}
}
Date currRunDate = new Date(writeStartDate.getTime());
while (currRunDate.getTime() < writeEndDate.getTime()) {
try {
SparkReader<FirstData> sparkReader = new SparkReader<>(sc);
JavaRDD<FirstData> data1 = sparkReader.readDataPoints(
inputDir,
currRunDate,
getMinOfEndDateAndNextDay(currRunDate, writeEndDate));
// Normalize to 1 hours & 0.25 degrees
JavaRDD<FirstData> distinctData1 = data1.distinct();
// Floor all (distinct) values to 6 hour windows
JavaRDD<FirstData> basicData1BySixHours = distinctData1.map(d1 -> new FirstData(
d1.getId(),
TimeUtils.floorTimePerSixHourWindow(d1.getTimeStamp()),
d1.getLatitude(),
d1.getLongitude()));
// Convert Data1 to Dataframes
DataFrame data1DF = sqlContext.createDataFrame(basicData1BySixHours, FirstData.class);
data1DF.registerTempTable("data1");
// Read Data2 DataFrame
String currDateString = TimeUtils.getSimpleDailyStringFromDate(currRunDate);
String inputS3Path = basedirInput + "/dt=" + currDateString;
DataFrame data2DF = sqlContext.read().parquet(inputS3Path);
data2DF.registerTempTable("data2");
// Join data1 and data2
DataFrame mergedDataDF = sqlContext.sql("SELECT D1.Id,D2.beaufort,COUNT(1) AS hours " +
"FROM data1 as D1,data2 as D2 " +
"WHERE D1.latitude=D2.latitude AND D1.longitude=D2.longitude AND D1.timeStamp=D2.dataTimestamp " +
"GROUP BY D1.Id,D1.timeStamp,D1.longitude,D1.latitude,D2.beaufort");
// Create histogram per ID
JavaPairRDD<String, Iterable<Row>> mergedDataRows = mergedDataDF.toJavaRDD().groupBy(md -> md.getAs("Id"));
JavaRDD<MergedHistogram> mergedHistogram = mergedDataRows.map(new MergedHistogramCreator());
logger.info("Number of data1 results: " + data1DF.select("lId").distinct().count());
logger.info("Number of coordinates with data: " + data1DF.select("longitude","latitude").distinct().count());
logger.info("Number of results with beaufort histograms: " + mergedDataDF.select("Id").distinct().count());
// Save to parquet
String outputS3Path = basedirOutput + "/dt=" + TimeUtils.getSimpleDailyStringFromDate(currRunDate);
if (clearOutputBeforeSaving) {
writeWithCleanup(outputS3Path, mergedHistogram, MergedHistogram.class, sqlContext, s3Client);
} else {
write(outputS3Path, mergedHistogram, MergedHistogram.class, sqlContext);
}
} finally {
TimeUtils.progressToNextDay(currRunDate);
}
}
}
public void write(String outputS3Path, JavaRDD<MergedHistogram> outputRDD, Class outputClass, SQLContext sqlContext) {
// Apply a schema to an RDD of JavaBeans and save it as Parquet.
DataFrame fullDataDF = sqlContext.createDataFrame(outputRDD, outputClass);
fullDataDF.write().parquet(outputS3Path);
}
public void writeWithCleanup(String outputS3Path, JavaRDD<MergedHistogram> outputRDD, Class outputClass,
SQLContext sqlContext, S3Client s3Client) {
String fileKey = S3Utils.getS3Key(outputS3Path);
String bucket = S3Utils.getS3Bucket(outputS3Path);
logger.info("Deleting existing dir: " + outputS3Path);
s3Client.deleteAll(bucket, fileKey);
write(outputS3Path, outputRDD, outputClass, sqlContext);
}
public Date getMinOfEndDateAndNextDay(Date startTime, Date proposedEndTime) {
long endOfDay = startTime.getTime() - startTime.getTime() % MILLIS_PER_DAY + MILLIS_PER_DAY ;
if (endOfDay < proposedEndTime.getTime()) {
return new Date(endOfDay);
}
return proposedEndTime;
}
The size of data1 is around 150,000 records and data2 around 500,000.
What my code basically does is some data manipulation, merges the 2 data sets, does a bit more manipulation, prints some statistics and saves to Parquet.
The Spark cluster has 25GB of memory per server, and the code runs fine.
Each iteration takes about 2-3 minutes.
The problem starts when I run it on a large set of dates.
After a while, I get an OutOfMemory:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.List.$colon$colon$colon(List.scala:127)
at org.json4s.JsonDSL$JsonListAssoc.$tilde(JsonDSL.scala:98)
at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:139)
at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:72)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:38)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:87)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:72)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:72)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:71)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:70)
Last time it ran, it crashed after 233 iterations.
The line it crashed on was this:
logger.info("Number of coordinates with data: " + data1DF.select("longitude","latitude").distinct().count());
Can anyone please tell me what could be the reason for these eventual crashes?
I'm not sure that everyone will find this solution viable, but upgrading the Spark cluster to 2.2.0 seems to have resolved the issue.
I have run my application for several days now and have had no crashes yet.
This error occurs when GC takes up over 98% of the process's total execution time. You can monitor the GC time in the Spark Web UI by going to the Stages tab at http://master:4040.
Try increasing the driver/executor memory (whichever is generating this error) using spark.{driver/executor}.memory via --conf when submitting the Spark application.
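For example, the submit command might look like this (a sketch with placeholder values; the memory sizes depend on your cluster and the class/jar names are purely illustrative):
spark-submit --conf spark.driver.memory=8g --conf spark.executor.memory=16g --class com.example.MyJob my-job.jar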
Another thing to try is to change the garbage collector that the JVM is using. Read this article for that: https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html. It explains very clearly why the GC overhead error occurs and which garbage collector is best for your application.
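For instance, switching the executors to G1GC (one of the collectors discussed in that article) is typically done through an extra JVM option; a sketch, with the exact flags depending on your setup:
spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" ...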

How to obtain and record the duration of app usage in minutes, seconds or hours

I have a block of code that allows me to retrieve all the apps/services running on my Android device, including the app that I am building. I am not entirely sure if I am on the right path, but because I am debugging on Android 4.3 I would like to use ActivityManager.RunningServiceInfo.activeSince (per service/app) and subtract it from SystemClock.elapsedRealtime(), which I understand is the total milliseconds since reboot. So for example, if the device was rebooted at 10:00 and WhatsApp was started at 10:15 and the current time is 10:30, I want to be able to use these values to get a close estimate of the amount of time spent on WhatsApp. I have a feeling that this is not the most elegant way to achieve this, and I am therefore very open to any advice. My code so far is below. For now I am using Android 4.3.
ActivityManager am = (ActivityManager)this.getSystemService(Context.ACTIVITY_SERVICE);
List<ActivityManager.RunningServiceInfo> services = am.getRunningServices(Integer.MAX_VALUE);
for (ActivityManager.RunningServiceInfo info : services) {
cal.setTimeInMillis(currentMillis - info.activeSince); // currentMillis is defined elsewhere (not shown in this snippet)
long millisSinceBoot = SystemClock.elapsedRealtime();
long appStartTime = info.activeSince; // activeSince is measured on the elapsedRealtime() clock
long appDuration = millisSinceBoot - appStartTime; // elapsed time since the service became active
//long time = ((millisSinceBoot - values.get(position).activeSince)/1000);
//long time = ((millisSinceBoot - currentMillis-info.activeSince)/1000);
//Log.i("HRHHRHRHRHR", "%%%%%%%%%%%%%%%%"+time);
//String time1 = String.valueOf(time);
int seconds = (int) (appDuration / 1000) % 60 ;
int minutes = (int) ((appDuration / (1000*60)) % 60);
int hours = (int) ((appDuration / (1000*60*60)) % 24);
String time11 = hours+":"+minutes+":"+seconds;
Log.i("Time", "Secs:- " + seconds + " " + "Mins:- " + minutes + " " + "Hours:- " + hours);
Log.i(TAG, String.format("Process %s with component %s has been running since %s (%d milliseconds)",
info.process, info.service.getClassName(), cal.getTime().toString(), info.activeSince ));
}
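Since activeSince is measured on the same clock as SystemClock.elapsedRealtime(), the core calculation can simply be the difference between the two. A minimal sketch of the per-service loop body, using TimeUnit for the formatting (this is an assumption on my side, not the asker's code, and it only logs the duration rather than persisting it):
// inside the for (ActivityManager.RunningServiceInfo info : services) loop
// requires import java.util.concurrent.TimeUnit
long runningMillis = SystemClock.elapsedRealtime() - info.activeSince; // time the service has been active
String formatted = String.format("%02d:%02d:%02d",
        TimeUnit.MILLISECONDS.toHours(runningMillis),
        TimeUnit.MILLISECONDS.toMinutes(runningMillis) % 60,
        TimeUnit.MILLISECONDS.toSeconds(runningMillis) % 60);
Log.i("Time", info.service.getClassName() + " running for " + formatted);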

Processing CQEngine ResultSet with Scala foreach is very slow

I'm trying to process CQEngine's ResultSet using Scala's foreach, but the result is very slow.
The following is a snippet of what I'm trying to do:
import collection.JavaConversions._
val query = existIn(myOtherCollection, REFERENCE, REFERENCE)
val resultSet = myIndexCollection.retrieve(query)
resultSet.foreach(r =>{
//do something here
})
Somehow the .foreach method is very slow. I tried to debug by adding a SimonMonitor and changing the .foreach to while(resultSet.hasNext()); surprisingly, every call to the hasNext method takes about 1-2 seconds. That's very slow.
I tried to create the same version using Java, and the Java version is super fast.
Please help
I am not able to reproduce your problem with the below test code. Can you try it on your system and let me know how it runs?
(Uncomment line 38, garages.addIndex(HashIndex.onAttribute(Garage.BRANDS_SERVICED)), to make BOTH the Scala and Java iterators run blazingly fast...)
The output first (elapsed time printed per result):
Done adding data
Done adding index
============== Scala ==============
Car{carId=4, name='BMW M3', description='2013 model', features=[radio, convertible]}
Time : 3 seconds
Car{carId=1, name='Ford Focus', description='great condition, low mileage', features=[spare tyre, sunroof]}
Time : 1 seconds
Car{carId=2, name='Ford Taurus', description='dirty and unreliable, flat tyre', features=[spare tyre, radio]}
Time : 2 seconds
============== Java ==============
Car{carId=4, name='BMW M3', description='2013 model', features=[radio, convertible]}
Time : 3 seconds
Car{carId=1, name='Ford Focus', description='great condition, low mileage', features=[spare tyre, sunroof]}
Time : 1 seconds
Car{carId=2, name='Ford Taurus', description='dirty and unreliable, flat tyre', features=[spare tyre, radio]}
Time : 2 seconds
Code below:
import collection.JavaConversions._
import com.googlecode.cqengine.query.QueryFactory._
import com.googlecode.cqengine.CQEngine;
import com.googlecode.cqengine.index.hash._;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.query.Query;
import java.util.Arrays.asList;
object CQTest {
def main(args: Array[String]) {
val cars: IndexedCollection[Car] = CQEngine.newInstance();
cars.add(new Car(1, "Ford Focus", "great condition, low mileage", asList("spare tyre", "sunroof")));
cars.add(new Car(2, "Ford Taurus", "dirty and unreliable, flat tyre", asList("spare tyre", "radio")));
cars.add(new Car(3, "Honda Civic", "has a flat tyre and high mileage", asList("radio")));
cars.add(new Car(4, "BMW M3", "2013 model", asList("radio", "convertible")));
// add cruft to try and slow down CQE
for (i <- 1 to 10000) {
cars.add(new Car(i, "BMW2014_" + i, "2014 model", asList("radio", "convertible")))
}
// Create an indexed collection of garages...
val garages: IndexedCollection[Garage] = CQEngine.newInstance();
garages.add(new Garage(1, "Joe's garage", "London", asList("Ford Focus", "Honda Civic")));
garages.add(new Garage(2, "Jane's garage", "Dublin", asList("BMW M3")));
garages.add(new Garage(3, "John's garage", "Dublin", asList("Ford Focus", "Ford Taurus")));
garages.add(new Garage(4, "Jill's garage", "Dublin", asList("Ford Focus")));
// add cruft to try and slow down CQE
for (i <- 1 to 10000) {
garages.add(new Garage(i, "Jill's garage", "Dublin", asList("DONT_MATCH_CARS_BMW2014_" + i)))
}
println("Done adding data")
// cars.addIndex(HashIndex.onAttribute(Car.NAME));
// garages.addIndex(HashIndex.onAttribute(Garage.BRANDS_SERVICED));
println("Done adding index")
val query = existsIn(garages, Car.NAME, Garage.BRANDS_SERVICED, equal(Garage.LOCATION, "Dublin"))
val resultSet = cars.retrieve(query)
var previous = System.currentTimeMillis()
println("============== Scala ============== ")
// Scala version
resultSet.foreach(r => {
println(r);
val t = (System.currentTimeMillis() - previous)
System.out.println("Time : " + t / 1000 + " seconds")
previous = System.currentTimeMillis()
})
println("============== Java ============== ")
previous = System.currentTimeMillis()
// Java version
val i: java.util.Iterator[Car] = resultSet.iterator()
while (i.hasNext) {
val r = i.next()
println(r);
val t = (System.currentTimeMillis() - previous)
System.out.println("Time : " + t / 1000 + " seconds")
previous = System.currentTimeMillis()
}
}
}
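The likely explanation (my assumption, not stated in the original answer): without an index on Garage.BRANDS_SERVICED, each existsIn check falls back to scanning the garages collection for every candidate car, which is what makes every hasNext()/foreach step appear to take seconds. Registering the hash indexes, as in the commented-out lines above, avoids that scan (shown here with the Java CQEngine API):
cars.addIndex(HashIndex.onAttribute(Car.NAME));
garages.addIndex(HashIndex.onAttribute(Garage.BRANDS_SERVICED));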

Convert seconds to W,D,H,M format in JAVA

I have a time in seconds and I want to convert it into a format like 6w 3d 9h 5m. Can someone please provide a method which can do this task? Thanks :)
w=weeks
d=days
h=hours
m=minutes
I have tried the code below but I don't get weeks using it:
int day = (int)TimeUnit.SECONDS.toDays(seconds);
long hours = TimeUnit.SECONDS.toHours(seconds) - TimeUnit.SECONDS.toHours(TimeUnit.SECONDS.toDays(seconds));
long minute = TimeUnit.SECONDS.toMinutes(seconds) - TimeUnit.SECONDS.toMinutes(TimeUnit.SECONDS.toHours(seconds));
System.out.println("Day :"+day+" Hours :"+hours+" Minutes :"+minute);
this will give you:
1w 4d 10h 20m
There should be a more elegant way, but this works:
long s = 987654L;
final long M=60,H=60*M, D=24*H, W=7*D;
long w = s/W;
s%=W;
long d = s/D;
s%=D;
long h = s/H;
s%=H;
long m = s/M;
System.out.printf("%dw %dd %dh %dm",w,d,h,m);
int seconds=98765410;
int weeks = (int) (TimeUnit.SECONDS.toDays(seconds) / 7);
int days = (int) (TimeUnit.SECONDS.toDays(seconds) - 7 * weeks);
long hours = TimeUnit.SECONDS.toHours(seconds) - TimeUnit.DAYS.toHours(days) - TimeUnit.DAYS.toHours(7*weeks);
long minutes = TimeUnit.SECONDS.toMinutes(seconds) - (TimeUnit.SECONDS.toHours(seconds) * 60);
System.out.println(weeks+"w "+days+"d "+hours+"h "+minutes+"m");
Will print out:
163w 2d 2h 50m
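As another sketch (not from the original answers, assuming Java 8+): java.time.Duration can do the unit arithmetic, although weeks still have to be derived from days because Duration has no "weeks" unit.
import java.time.Duration;

public class SecondsToWdhm {
    public static void main(String[] args) {
        long seconds = 98765410L;
        Duration d = Duration.ofSeconds(seconds);
        long weeks = d.toDays() / 7;
        long days = d.toDays() % 7;
        long hours = d.toHours() % 24;
        long minutes = d.toMinutes() % 60;
        System.out.println(weeks + "w " + days + "d " + hours + "h " + minutes + "m");
        // prints: 163w 2d 2h 50m
    }
}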
