How to use ogr2ogr in Java GDAL

I would like to code ogr2ogr -f "GeoJSON" destination.geojson source.geojson -s_srs EPSG:3068 -t_srs EPSG:4326 with the GDAL Java bindings. I tried to understand how it should work by looking at this example, but since ogr2ogr has many uses I could not quite figure out what is relevant for me. Here is my attempt:
public static void ogr2ogr(String name, String sourceSRS, String destSRS) {
    ogr.RegisterAll();
    String pszFormat = "GeoJSON";
    DataSource poDS = ogr.Open(name + "_" + sourceSRS + ".geojson", false);
    /* -------------------------------------------------------------------- */
    /* Try opening the output datasource as an existing, writable           */
    /* -------------------------------------------------------------------- */
    DataSource poODS = ogr.Open(name + "_" + destSRS + ".geojson", 0);
    Driver poDriver = ogr.GetDriverByName(pszFormat);
    SpatialReference poOutputSRS = new SpatialReference();
    poOutputSRS.SetFromUserInput( destSRS );
    SpatialReference poSourceSRS = new SpatialReference();
    poSourceSRS.SetFromUserInput( sourceSRS );
    CoordinateTransformation poCT = CoordinateTransformation.CreateCoordinateTransformation( poSourceSRS, poOutputSRS );
    Layer poDstLayer = poODS.GetLayerByName("bla");
    Feature poDstFeature = new Feature( poDstLayer.GetLayerDefn() );
    Geometry poDstGeometry = poDstFeature.GetGeometryRef();
    int eErr = poDstGeometry.Transform( poCT );
    poDstLayer.CommitTransaction();
    poDstGeometry.AssignSpatialReference(poOutputSRS);
}
I get this exception at poOutputSRS.SetFromUserInput( destSRS ); (that is line 99):
ERROR 3: Cannot open file 'C:\Users\Users\Desktop\test\target_EPSG4326.geojson'
Exception in thread "main" java.lang.RuntimeException: OGR Error: Corrupt data
at org.gdal.osr.osrJNI.SpatialReference_SetFromUserInput(Native Method)
at org.gdal.osr.SpatialReference.SetFromUserInput(SpatialReference.java:455)
at wmsRasterToGeojsonVector.GdalJava.ogr2ogr(GdalJava.java:99)
at wmsRasterToGeojsonVector.GdalJava.main(GdalJava.java:30)
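Two things stand out from the error output: the "Corrupt data" exception comes from SetFromUserInput, which suggests destSRS is being passed as "EPSG4326" (the form used in the file name) rather than "EPSG:4326", and the "Cannot open file" message shows the output datasource is being opened even though it does not exist yet. A minimal sketch of the whole reprojection with the OGR Java bindings, assuming the destination file should be created from scratch (method and variable names here are illustrative, not from any official sample), could look like:

import org.gdal.ogr.DataSource;
import org.gdal.ogr.Driver;
import org.gdal.ogr.Feature;
import org.gdal.ogr.Geometry;
import org.gdal.ogr.Layer;
import org.gdal.ogr.ogr;
import org.gdal.osr.CoordinateTransformation;
import org.gdal.osr.SpatialReference;

public static void reproject(String srcPath, String dstPath, String sourceSRS, String destSRS) {
    ogr.RegisterAll();
    DataSource poDS = ogr.Open(srcPath, false);            // source, read-only
    Layer poSrcLayer = poDS.GetLayer(0);
    SpatialReference poSourceSRS = new SpatialReference();
    poSourceSRS.SetFromUserInput(sourceSRS);                // e.g. "EPSG:3068" (note the colon)
    SpatialReference poOutputSRS = new SpatialReference();
    poOutputSRS.SetFromUserInput(destSRS);                  // e.g. "EPSG:4326"
    CoordinateTransformation poCT =
            CoordinateTransformation.CreateCoordinateTransformation(poSourceSRS, poOutputSRS);
    // Create the output datasource instead of trying to open a file that does not exist
    Driver poDriver = ogr.GetDriverByName("GeoJSON");
    DataSource poODS = poDriver.CreateDataSource(dstPath);
    Layer poDstLayer = poODS.CreateLayer(poSrcLayer.GetName(), poOutputSRS, poSrcLayer.GetGeomType());
    for (int i = 0; i < poSrcLayer.GetLayerDefn().GetFieldCount(); i++) {
        poDstLayer.CreateField(poSrcLayer.GetLayerDefn().GetFieldDefn(i));
    }
    // Copy features one by one, reprojecting each geometry
    Feature poSrcFeature;
    while ((poSrcFeature = poSrcLayer.GetNextFeature()) != null) {
        Feature poDstFeature = new Feature(poDstLayer.GetLayerDefn());
        poDstFeature.SetFrom(poSrcFeature);
        Geometry poDstGeometry = poDstFeature.GetGeometryRef();
        poDstGeometry.Transform(poCT);
        poDstLayer.CreateFeature(poDstFeature);
        poDstFeature.delete();
        poSrcFeature.delete();
    }
    poODS.delete();                                         // flush and close the output
    poDS.delete();
}

On GDAL 2.1 or newer the Java bindings also expose gdal.VectorTranslate, which wraps the whole ogr2ogr utility in a single call, so that may be an even shorter route if your build has it.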

Related

quarkus-redis-client java.util.concurrent.CompletionException: MOVED

I'm implementing a Redis client (standalone mode) on Quarkus.
I get this intermittent error when I query whether a key already exists in Redis:
Caused by: MOVED 3271 xxxx-memorydb-redis-cluster-0001-002.XXXX-memorydb-redis-cluster.xxxx.memorydb.us-east-1.amazonaws.com:6379
java.util.concurrent.CompletionException: MOVED 3814 xxxx-memorydb-redis-cluster-XXXX-002.xxx-memorydb-redis-cluster.xxxx.memorydb.us-east-1.amazonaws.com:6379
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:79)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.quarkus.redis.client.runtime.RedisClientImpl.await(RedisClientImpl.java:1046)
at io.quarkus.redis.client.runtime.RedisClientImpl.exists(RedisClientImpl.java:172)
at io.quarkus.redis.client.RedisClient_761b9a6e5f634178e3291b09c1921f229025da0c_Synthetic_ClientProxy.exists(Unknown Source)
PART OF CODE WHERE THE ERROR OCCURS:
public ObjectNode generateMobileDeviceToken(long idClient) {
    Integer mobileDeviceToken = null;
    while (mobileDeviceToken == null) {
        mobileDeviceToken = ThreadLocalRandom.current().nextInt(100000, 1000000);
        // ************************************
        // ***the error occurs on this line ***
        // Verify if token already exists
        int keysFound = this.redisClient.exists(List.of(REDIS_ENV + "." + MOBILE_DEVICE_TOKEN_GROUP_NAME + "." + mobileDeviceToken)).toInteger();
        // ***the error occurs on this line ***
        // ************************************
        if (keysFound != 0) {
            mobileDeviceToken = null;
        }
    }
    this.redisClient.setex(REDIS_ENV + "." + MOBILE_DEVICE_TOKEN_GROUP_NAME + "." + mobileDeviceToken, TOKEN_TTL_S.toString(), String.valueOf(idClient));
    ObjectNode mobileDeviceTokenJSON = OBJECT_MAPPER.createObjectNode();
    mobileDeviceTokenJSON.put("mobileDeviceToken", mobileDeviceToken);
    mobileDeviceTokenJSON.put("mobileDeviceTokenExpiration", LocalDateTime.now()
            .plus(TOKEN_TTL_S, ChronoUnit.SECONDS).toEpochSecond(ZoneOffset.UTC));
    return mobileDeviceTokenJSON;
}
APPLICATION PROPERTIES
# Redis
quarkus.redis.hosts=rediss://<MY-REDIS-ADDRESS>.amazonaws.com:6379
quarkus.redis.ssl.enabled=true
quarkus.redis.max-pool-size=25
quarkus.redis.max-pool-waiting=100
quarkus.redis.client-type=standalone
quarkus.redis.devservices.enabled=false
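For context, MOVED is the redirection reply a Redis Cluster node returns when the requested key's hash slot lives on another node, so the MemoryDB endpoint is behaving as a cluster even though the client is configured as standalone. A sketch of the corresponding cluster-mode configuration (assuming the endpoint really is cluster-enabled) might look like:

# Redis (cluster mode)
quarkus.redis.hosts=rediss://<MY-REDIS-ADDRESS>.amazonaws.com:6379
quarkus.redis.ssl.enabled=true
quarkus.redis.client-type=cluster
quarkus.redis.max-pool-size=25
quarkus.redis.max-pool-waiting=100
quarkus.redis.devservices.enabled=false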

How to launch an interactive process in Windows on Java?

I need to run an application in Windows with administrator rights on another user's desktop.
I could do it with PsExec -i (https://learn.microsoft.com/en-us/sysinternals/downloads/psexec), but I want to do it in my Java application without additional exe files.
I run my code as administrator with elevated rights.
I found this article (it describes how to do it in .NET):
https://www.codeproject.com/Articles/35773/Subverting-Vista-UAC-in-Both-32-and-64-bit-Archite
I translated the code from the article to Java, but Advapi32.CreateProcessAsUser returns false and I get error 1314 (a required privilege is not held by the client). Does anybody see what I missed in this code?
pom dependencies
<dependencies>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>5.2.0</version>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna-platform</artifactId>
        <version>5.2.0</version>
    </dependency>
</dependencies>
my code
import com.sun.jna.Native;
import com.sun.jna.platform.win32.*;

public class TestWinRunSessionId {
    public static void main(String[] args) {
        System.out.println(System.getProperty("user.name"));
        // id of the process which we use as a pointer to the target desktop (not administrator)
        // where we will open a new application from the current user (administrator)
        int procId = 18160;
        WinNT.HANDLE hProcess = Kernel32.INSTANCE.OpenProcess(
                WinNT.PROCESS_ALL_ACCESS,
                false,
                procId
        );
        System.out.println(hProcess);
        WinNT.HANDLEByReference hPToken = new WinNT.HANDLEByReference();
        boolean openProcessToken = Advapi32.INSTANCE.OpenProcessToken(
                hProcess,
                WinNT.TOKEN_DUPLICATE,
                hPToken
        );
        if (!openProcessToken) {
            Kernel32.INSTANCE.CloseHandle(hProcess);
            throw new RuntimeException("1");
        }
        System.out.println(hPToken);
        WinBase.SECURITY_ATTRIBUTES sa = new WinBase.SECURITY_ATTRIBUTES();
        sa.dwLength = new WinDef.DWORD(sa.size());
        WinNT.HANDLEByReference hUserTokenDup = new WinNT.HANDLEByReference();
        boolean duplicateTokenEx = Advapi32.INSTANCE.DuplicateTokenEx(
                hPToken.getValue(),
                WinNT.TOKEN_ALL_ACCESS,
                sa,
                WinNT.SECURITY_IMPERSONATION_LEVEL.SecurityIdentification,
                WinNT.TOKEN_TYPE.TokenPrimary,
                hUserTokenDup
        );
        if (!duplicateTokenEx) {
            Kernel32.INSTANCE.CloseHandle(hProcess);
            Kernel32.INSTANCE.CloseHandle(hPToken.getValue());
            throw new RuntimeException("2");
        }
        System.out.println(hUserTokenDup);
        WinBase.STARTUPINFO si = new WinBase.STARTUPINFO();
        si.cb = new WinDef.DWORD(si.size());
        si.lpDesktop = "winsta0\\default";
        boolean result = Advapi32.INSTANCE.CreateProcessAsUser(
                hUserTokenDup.getValue(),             // client's access token
                null,                                 // file to execute
                "C:\\Windows\\System32\\cmd.exe",     // command line
                sa,                                   // pointer to process SECURITY_ATTRIBUTES
                sa,                                   // pointer to thread SECURITY_ATTRIBUTES
                false,                                // handles are not inheritable
                WinBase.CREATE_UNICODE_ENVIRONMENT | WinBase.CREATE_NEW_CONSOLE, // creation flags ???
                null,                                 // pointer to new environment block ???
                null,                                 // name of current directory
                si,                                   // pointer to STARTUPINFO structure
                new WinBase.PROCESS_INFORMATION()     // receives information about new process
        );
        System.out.println("result: " + result);
        System.out.println("error: " + Native.getLastError());
    }
}
According to the documentation of the CreateProcessAsUser hToken parameter:
A handle to the primary token that represents a user. The handle must
have the TOKEN_QUERY, TOKEN_DUPLICATE, and TOKEN_ASSIGN_PRIMARY access
rights.
So you should call OpenProcessToken with TOKEN_QUERY | TOKEN_DUPLICATE | TOKEN_ASSIGN_PRIMARY.
The duplicated token also does not have enough permissions; you only specify READ_CONTROL.
According to the documentation of the DuplicateTokenEx dwDesiredAccess parameter:
To request the same access rights as the existing token, specify zero.
So you need to pass zero as the desired access, or just specify TOKEN_QUERY | TOKEN_DUPLICATE | TOKEN_ASSIGN_PRIMARY directly in the DuplicateTokenEx call, as sketched below.
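A sketch of those two corrected calls, reusing the handles and variables from the question's JNA code (hProcess, hPToken, sa, hUserTokenDup), might look like:

boolean openProcessToken = Advapi32.INSTANCE.OpenProcessToken(
        hProcess,
        // the access rights CreateProcessAsUser will later require on the token
        WinNT.TOKEN_QUERY | WinNT.TOKEN_DUPLICATE | WinNT.TOKEN_ASSIGN_PRIMARY,
        hPToken
);

boolean duplicateTokenEx = Advapi32.INSTANCE.DuplicateTokenEx(
        hPToken.getValue(),
        0, // zero = request the same access rights as the existing token
        sa,
        WinNT.SECURITY_IMPERSONATION_LEVEL.SecurityIdentification,
        WinNT.TOKEN_TYPE.TokenPrimary,
        hUserTokenDup
);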
According to the documentation, CreateProcessAsUser also requires two privileges:
SE_INCREASE_QUOTA_NAME
SE_ASSIGNPRIMARYTOKEN_NAME
These correspond to Control Panel\All Control Panel Items\Administrative Tools\Local Security Policy\Security Settings\Local Policies\User Rights Assignment:
Adjust memory quotas for a process
Replace a process level token
EDIT:
Finally, I found a way to do it (the error checking has been removed; pay attention to the comments inside):
#include <windows.h>
#include <iostream>
#include <stdio.h>

#pragma comment(lib, "Advapi32.lib")

int main()
{
    DWORD session_id = 0;
    //Get a system token from the System process id.
    //Why? Because the following call, "SetTokenInformation", needs the "Act as part of the operating system" privilege, and Local System has it.
    HANDLE hSys_Process = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, false, 588);
    HANDLE Sys_Token = 0;
    OpenProcessToken(hSys_Process, TOKEN_QUERY | TOKEN_DUPLICATE, &Sys_Token);
    CloseHandle(hSys_Process);
    HANDLE Sys_Token_Dup;
    if (!DuplicateTokenEx(Sys_Token, MAXIMUM_ALLOWED, NULL, SecurityIdentification, TokenPrimary, &Sys_Token_Dup))
    {
        printf("DuplicateTokenEx ERROR: %d\n", GetLastError());
        return FALSE;
    }
    //Enabling privileges "SE_INCREASE_QUOTA_NAME" and "SE_ASSIGNPRIMARYTOKEN_NAME" for CreateProcessAsUser().
    TOKEN_PRIVILEGES *tokenPrivs = (TOKEN_PRIVILEGES*)malloc(sizeof(DWORD) + 2 * sizeof(LUID_AND_ATTRIBUTES));
    tokenPrivs->PrivilegeCount = 2;
    LookupPrivilegeValue(NULL, SE_INCREASE_QUOTA_NAME, &tokenPrivs->Privileges[0].Luid);
    LookupPrivilegeValue(NULL, SE_ASSIGNPRIMARYTOKEN_NAME, &tokenPrivs->Privileges[1].Luid);
    tokenPrivs->Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    tokenPrivs->Privileges[1].Attributes = SE_PRIVILEGE_ENABLED;
    AdjustTokenPrivileges(Sys_Token_Dup, FALSE, tokenPrivs, 0, (PTOKEN_PRIVILEGES)NULL, 0);
    free(tokenPrivs);
    //Let the calling thread impersonate the local system, so that we can call SetTokenInformation().
    ImpersonateLoggedOnUser(Sys_Token_Dup);
    //Get the current process user token.
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, false, GetCurrentProcessId());
    HANDLE Token = 0, hTokenDup = 0;
    OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_DUPLICATE, &Token);
    CloseHandle(hProcess);
    if (!DuplicateTokenEx(Token, MAXIMUM_ALLOWED, NULL, SecurityIdentification, TokenPrimary, &hTokenDup))
    {
        printf("DuplicateTokenEx ERROR: %d\n", GetLastError());
        return FALSE;
    }
    //Set the session id on the token.
    if (!SetTokenInformation(hTokenDup, TokenSessionId, &session_id, sizeof(DWORD)))
    {
        printf("SetTokenInformation Error === %d\n", GetLastError());
        return FALSE;
    }
    //Init structs.
    STARTUPINFO si;
    ZeroMemory(&si, sizeof(STARTUPINFO));
    si.cb = sizeof(STARTUPINFO);
    char temp[] = "winsta0\\default";
    char applicationName[] = "C:\\Windows\\System32\\cmd.exe";
    si.lpDesktop = temp;
    PROCESS_INFORMATION procInfo;
    ZeroMemory(&procInfo, sizeof(PROCESS_INFORMATION));
    //Will return error 5 without CREATE_BREAKAWAY_FROM_JOB.
    //See https://blogs.msdn.microsoft.com/alejacma/2012/03/09/createprocessasuser-fails-with-error-5-access-denied-when-using-jobs/
    int dwCreationFlags = CREATE_BREAKAWAY_FROM_JOB | CREATE_NEW_CONSOLE;
    BOOL result = CreateProcessAsUser(
        hTokenDup,
        NULL,              // file to execute
        applicationName,   // command line
        NULL,              // pointer to process SECURITY_ATTRIBUTES
        NULL,              // pointer to thread SECURITY_ATTRIBUTES
        false,             // handles are not inheritable
        dwCreationFlags,   // creation flags
        NULL,              // pointer to new environment block
        NULL,              // name of current directory
        &si,               // pointer to STARTUPINFO structure
        &procInfo          // receives information about new process
    );
    RevertToSelf();
    return 0;
}

How to manipulate parameters sent to a remote object in CORBA using interceptors

I'm new to CORBA, but I could establish remote method invocation from a client to a server. When using interceptors and trying to encrypt parameters for the remote method, it throws the error below:
Failed to initialise ORB: org.omg.CORBA.NO_RESOURCES: vmcid: OMG minor code: 1 completed: No
org.omg.CORBA.NO_RESOURCES: vmcid: OMG minor code: 1 completed: No
    at com.sun.corba.se.impl.logging.OMGSystemException.piOperationNotSupported1(Unknown Source)
    at com.sun.corba.se.impl.interceptors.ClientRequestInfoImpl.arguments(Unknown Source)
    at orb.CustomClientInterceptor.send_request(CustomClientInterceptor.java:23)
From the interceptor I'm trying to access the arguments and encrypt them, like below.
public void send_request( ClientRequestInfo ri )
{
    System.out.println( ri.arguments() );
    System.out.println( "Arguments.." );
    logger( ri, "send_request" );
}
But I cannot even access them; it throws the above error. The interceptor methods themselves are being called fine. Could you guide me with some code or a link?
Thanks in advance.
I found the answer; leaving it here in case someone hits this in the future.
We cannot manipulate parameters in interceptors unless the call to the CORBA object is either a DII or a DSI call. So first you need to make the call in one of these ways. I did it via DII; the code is as follows.
//-ORBInitialPort 1050 -ORBInitialHost localhost
Properties p = new Properties();
p.put("org.omg.PortableInterceptor.ORBInitializerClass.orb.InterceptorORBInitializer", "");
//ORB orb = ORB.init(args, p);
String[] orbArgs = { "-ORBInitialHost", "localhost", "-ORBInitialPort", "1050" };
//NO_NEED ORB orb = ORB.init( orbArgs, null );
orb = ORB.init(orbArgs, p);
//objRef = orb.resolve_initial_references( "NameService" );
//ncRef = NamingContextExtHelper.narrow( objRef );
//DII Additional configs
org.omg.CORBA.Object ncRef = orb.resolve_initial_references ("NameService");
NamingContext nc = NamingContextHelper.narrow (ncRef);
NameComponent nComp = new NameComponent ("ABC", "");
NameComponent [] path = {nComp};
objRef = nc.resolve (path);
Then do the DII call. I have some mixed code here, but you will understand what to do:
NVList argList = orb.create_list (valueMap.size());
for (Map.Entry<String, String> entry : valueMap.entrySet()) {
    Any argument = orb.create_any ();
    argument.insert_string (entry.getValue());
    argList.add_value (entry.getKey().toLowerCase(), argument, org.omg.CORBA.ARG_IN.value);
}
//Result
Any result = orb.create_any ();
result.insert_string( null );
NamedValue resultVal = orb.create_named_value ("result", result, org.omg.CORBA.ARG_OUT.value);
//Invoking Method
Request thisReq = objRef._create_request (null, methodName, argList, resultVal);
thisReq.invoke ();
//Extract Result
result = thisReq.result().value ();
Now, from the interceptor, you need to filter for the DII call only and then access the parameters like below.
public void send_request( ClientRequestInfo ri )
{
    if (ri.operation().equals( "processPayment" ))
    {
        System.out.println( "################# CLIENT SIDE ###############" );
        int count = 0;
        for (Parameter param : ri.arguments())
        {
            System.out.println( "Arg : " + count );
            System.out.println( param.argument.extract_string());
            param.argument.insert_string( EncryptionDecryption.encrypt( param.argument.extract_string() ) );
            count++;
        }
    }
    System.out.println( "Arguments.." );
    logger( ri, "send_request" );
}

Spark DataFrame java.lang.OutOfMemoryError: GC overhead limit exceeded on long loop run

I'm running a Spark application (Spark 1.6.3 cluster), which does some calculations on 2 small data sets, and writes the result into an S3 Parquet file.
Here is my code:
public void doWork(JavaSparkContext sc, Date writeStartDate, Date writeEndDate, String[] extraArgs) throws Exception {
    SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
    S3Client s3Client = new S3Client(ConfigTestingUtils.getBasicAWSCredentials());
    boolean clearOutputBeforeSaving = false;
    if (extraArgs != null && extraArgs.length > 0) {
        if (extraArgs[0].equals("clearOutput")) {
            clearOutputBeforeSaving = true;
        } else {
            logger.warn("Unknown param " + extraArgs[0]);
        }
    }
    Date currRunDate = new Date(writeStartDate.getTime());
    while (currRunDate.getTime() < writeEndDate.getTime()) {
        try {
            SparkReader<FirstData> sparkReader = new SparkReader<>(sc);
            JavaRDD<FirstData> data1 = sparkReader.readDataPoints(
                    inputDir,
                    currRunDate,
                    getMinOfEndDateAndNextDay(currRunDate, writeEndDate));
            // Normalize to 1 hours & 0.25 degrees
            JavaRDD<FirstData> distinctData1 = data1.distinct();
            // Floor all (distinct) values to 6 hour windows
            JavaRDD<FirstData> basicData1BySixHours = distinctData1.map(d1 -> new FirstData(
                    d1.getId(),
                    TimeUtils.floorTimePerSixHourWindow(d1.getTimeStamp()),
                    d1.getLatitude(),
                    d1.getLongitude()));
            // Convert Data1 to DataFrames
            DataFrame data1DF = sqlContext.createDataFrame(basicData1BySixHours, FirstData.class);
            data1DF.registerTempTable("data1");
            // Read Data2 DataFrame
            String currDateString = TimeUtils.getSimpleDailyStringFromDate(currRunDate);
            String inputS3Path = basedirInput + "/dt=" + currDateString;
            DataFrame data2DF = sqlContext.read().parquet(inputS3Path);
            data2DF.registerTempTable("data2");
            // Join data1 and data2
            DataFrame mergedDataDF = sqlContext.sql("SELECT D1.Id,D2.beaufort,COUNT(1) AS hours " +
                    "FROM data1 as D1,data2 as D2 " +
                    "WHERE D1.latitude=D2.latitude AND D1.longitude=D2.longitude AND D1.timeStamp=D2.dataTimestamp " +
                    "GROUP BY D1.Id,D1.timeStamp,D1.longitude,D1.latitude,D2.beaufort");
            // Create histogram per ID
            JavaPairRDD<String, Iterable<Row>> mergedDataRows = mergedDataDF.toJavaRDD().groupBy(md -> md.getAs("Id"));
            JavaRDD<MergedHistogram> mergedHistogram = mergedDataRows.map(new MergedHistogramCreator());
            logger.info("Number of data1 results: " + data1DF.select("lId").distinct().count());
            logger.info("Number of coordinates with data: " + data1DF.select("longitude", "latitude").distinct().count());
            logger.info("Number of results with beaufort histograms: " + mergedDataDF.select("Id").distinct().count());
            // Save to parquet
            String outputS3Path = basedirOutput + "/dt=" + TimeUtils.getSimpleDailyStringFromDate(currRunDate);
            if (clearOutputBeforeSaving) {
                writeWithCleanup(outputS3Path, mergedHistogram, MergedHistogram.class, sqlContext, s3Client);
            } else {
                write(outputS3Path, mergedHistogram, MergedHistogram.class, sqlContext);
            }
        } finally {
            TimeUtils.progressToNextDay(currRunDate);
        }
    }
}

public void write(String outputS3Path, JavaRDD<MergedHistogram> outputRDD, Class outputClass, SQLContext sqlContext) {
    // Apply a schema to an RDD of JavaBeans and save it as Parquet.
    DataFrame fullDataDF = sqlContext.createDataFrame(outputRDD, outputClass);
    fullDataDF.write().parquet(outputS3Path);
}

public void writeWithCleanup(String outputS3Path, JavaRDD<MergedHistogram> outputRDD, Class outputClass,
                             SQLContext sqlContext, S3Client s3Client) {
    String fileKey = S3Utils.getS3Key(outputS3Path);
    String bucket = S3Utils.getS3Bucket(outputS3Path);
    logger.info("Deleting existing dir: " + outputS3Path);
    s3Client.deleteAll(bucket, fileKey);
    write(outputS3Path, outputRDD, outputClass, sqlContext);
}

public Date getMinOfEndDateAndNextDay(Date startTime, Date proposedEndTime) {
    long endOfDay = startTime.getTime() - startTime.getTime() % MILLIS_PER_DAY + MILLIS_PER_DAY;
    if (endOfDay < proposedEndTime.getTime()) {
        return new Date(endOfDay);
    }
    return proposedEndTime;
}
The size of data1 is around 150,000 and data2 is around 500,000.
What my code basically does is some data manipulation, merges the two data sets, does a bit more manipulation, prints some statistics, and saves to Parquet.
The Spark cluster has 25 GB of memory per server, and the code runs fine.
Each iteration takes about 2-3 minutes.
The problem starts when I run it on a large set of dates.
After a while, I get an OutOfMemoryError:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.List.$colon$colon$colon(List.scala:127)
at org.json4s.JsonDSL$JsonListAssoc.$tilde(JsonDSL.scala:98)
at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:139)
at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:72)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:38)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:87)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:72)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:72)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:71)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:70)
The last time it ran, it crashed after 233 iterations.
The line it crashed on was this:
logger.info("Number of coordinates with data: " + data1DF.select("longitude","latitude").distinct().count());
Can anyone please tell me what the reason for the eventual crashes might be?
I'm not sure that everyone will find this solution viable, but upgrading the Spark cluster to 2.2.0 seems to have resolved the issue.
I have run my application for several days now, and have had no crashes yet.
This error occurs when GC takes up over 98% of the total execution time of the process. You can monitor the GC time in the Spark Web UI by going to the Stages tab at http://master:4040.
Try increasing the driver/executor memory (whichever is generating this error) using spark.{driver/executor}.memory with --conf when submitting the Spark application.
Another thing to try is to change the garbage collector that the JVM is using. Read this article for that: https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html. It explains very clearly why the GC overhead error occurs and which garbage collector is best for your application. Both options can be passed on the spark-submit command line, as in the sketch below.
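For illustration only; the main class, memory sizes and JAR name are placeholders, not values from the question:

spark-submit \
  --class com.example.MyHistogramJob \
  --conf spark.driver.memory=8g \
  --conf spark.executor.memory=8g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  my-application.jar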

Weka output predictions

I've used the Weka GUI for training and testing a file (making predictions), but I can't do the same with the API. The error I'm getting says there is a different number of attributes in the train and test files. In the GUI, this can be solved by checking "Output predictions".
How can I do something similar using the API? Do you know of any samples out there?
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.Remove;

public class WekaTutorial
{
    public static void main(String[] args) throws Exception
    {
        DataSource trainSource = new DataSource("/tmp/classes - edited.arff"); // training
        Instances trainData = trainSource.getDataSet();
        DataSource testSource = new DataSource("/tmp/classes_testing.arff");
        Instances testData = testSource.getDataSet();
        if (trainData.classIndex() == -1)
        {
            trainData.setClassIndex(trainData.numAttributes() - 1);
        }
        if (testData.classIndex() == -1)
        {
            testData.setClassIndex(testData.numAttributes() - 1);
        }
        String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 "
                + "-tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
        Remove remove = new Remove();
        remove.setOptions(options);
        remove.setInputFormat(trainData);
        NominalToBinary filter = new NominalToBinary();
        NaiveBayes nb = new NaiveBayes();
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(filter);
        fc.setClassifier(nb);
        // train and make predictions
        fc.buildClassifier(trainData);
        for (int i = 0; i < testData.numInstances(); i++)
        {
            double pred = fc.classifyInstance(testData.instance(i));
            System.out.print("ID: " + testData.instance(i).value(0));
            System.out.print(", actual: " + testData.classAttribute().value((int) testData.instance(i).classValue()));
            System.out.println(", predicted: " + testData.classAttribute().value((int) pred));
        }
    }
}
Error:
Exception in thread "main" java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2 != 17152
This was not an issue for the GUI.
You need to ensure that the categories in the train and test sets are compatible. Try to:
combine the train and test sets
preprocess them
save them as ARFF
open two empty files
copy the header (everything down to the "@data" line) into both files
copy the training instances into the first file and the test instances into the second file
A sketch of the same combine-filter-split idea done directly through the API is shown below.
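A minimal sketch of that procedure with the Weka API, assuming both raw ARFF files share the same attributes (for example one string attribute plus a nominal class in the last position); the file paths are reused from the question and the StringToWordVector options are left at their defaults for brevity:

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class CompatibleTrainTest
{
    public static void main(String[] args) throws Exception
    {
        Instances trainData = new DataSource("/tmp/classes - edited.arff").getDataSet();
        Instances testData = new DataSource("/tmp/classes_testing.arff").getDataSet();

        // Combine train and test so they end up with one shared header
        Instances combined = new Instances(trainData);
        for (int i = 0; i < testData.numInstances(); i++)
        {
            combined.add(testData.instance(i));
        }
        combined.setClassIndex(combined.numAttributes() - 1);

        // Preprocess the combined data once, so the word vector is identical for both halves
        StringToWordVector s2wv = new StringToWordVector();
        s2wv.setInputFormat(combined);
        Instances combinedVec = Filter.useFilter(combined, s2wv);

        // Split back into train and test, preserving the original order
        Instances trainVec = new Instances(combinedVec, 0, trainData.numInstances());
        Instances testVec = new Instances(combinedVec, trainData.numInstances(), testData.numInstances());

        // Train and predict as before
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(trainVec);
        for (int i = 0; i < testVec.numInstances(); i++)
        {
            double pred = nb.classifyInstance(testVec.instance(i));
            System.out.println("predicted: " + testVec.classAttribute().value((int) pred));
        }
    }
}

Because the word vector is built from the combined data, the two halves end up with identical attribute sets, which is exactly what the "Src and Dest differ in # of attributes" check requires.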
