HBase put(util.List[Put]) does not work - java

I have been trying to insert a List of records into HBase using the HBase client library.
It works for a single Put with either Table or HTable (deprecated), but it does not accept a List of Puts.
The error says: Expected: util.List, but was List
I could not work out the meaning of this error, and converting to a Java List did not succeed either.
Any quick advice would be highly appreciated.
val quoteRecords = new ListBuffer[Put]()
val listQuotes = lines.foreachRDD(rdd => {
  rdd.foreach(record => addPutToList(buildPut(record)))
})
table.put(quoteRecords.toList)
quoteRecords.foreach(table.put)
println(listQuotes)

quoteRecords.toList returns a scala.List, but Table.put expects a java.util.List. You have to convert the Scala list into the Java type. This SO thread will give you some insight.
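For example, a minimal sketch of the conversion using Scala's JavaConverters (assuming the quoteRecords and table values from your snippet):
import scala.collection.JavaConverters._
// asJava wraps the scala.List as a java.util.List, which Table.put accepts
table.put(quoteRecords.toList.asJava)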


Passing data to Tensorflow model in Java

I'm trying to use a TensorFlow model that I trained in Python to score data in Scala (using the TF Java API). For the model, I've used this regression example, with the only change being that I dropped asText=True from export_savedmodel.
My snippet of Scala:
val b = SavedModelBundle.load("/tensorflow/tf-estimator-tutorials/trained_models/reg-model-01/export/1531933435/", "serve")
val s = b.session()
// output = predictor_fn({'csv_rows': ["0.5,1,ax01,bx02", "-0.5,-1,ax02,bx02"]})
val input = "0.5,1,ax01,bx02"
val inputTensor = Tensor.create(input.getBytes("UTF-8"))
val result = s.runner()
  .feed("csv_rows", inputTensor)
  .fetch("dnn/logits/BiasAdd")
  .run()
  .get(0)
When I run, I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Input to reshape is a tensor with 2 values, but the requested shape has 4
[[Node: dnn/input_from_feature_columns/input_layer/alpha_indicator/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _output_shapes=[[?,2]], _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/input_from_feature_columns/input_layer/alpha_indicator/Sum, dnn/input_from_feature_columns/input_layer/alpha_indicator/Reshape/shape)]]
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:298)
at org.tensorflow.Session$Runner.run(Session.java:248)
I figure that there's a problem with how I've prepared my input Tensor, but I'm stuck on how to best debug this.
The error message suggests that the shape of the input tensor in some operation isn't what is expected.
Looking at the Python notebook you linked to (particularly sections 8a and 8c), it seems that the input tensor is expected to be a "batch" of string tensors, not a single string tensor.
You can observe this by comparing the shapes of the tensors in your Scala and Python programs (inputTensor.shape() in Scala vs. the shape of csv_rows provided to predictor_fn in the Python notebook).
From that, it seems what you want is for inputTensor to be a vector of strings, not a single scalar string. To do that, you'd want to do something like:
val input = Array("0.5,1,ax01,bx02")
val inputTensor = Tensor.create(input.map(x => x.getBytes("UTF-8")))
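For instance, to feed the same two-row batch as the predictor_fn comment in your snippet (a sketch using the same API):
val input = Array("0.5,1,ax01,bx02", "-0.5,-1,ax02,bx02")
val inputTensor = Tensor.create(input.map(x => x.getBytes("UTF-8")))
// inputTensor.shape() is now [2], a batch of two string rows, rather than a scalar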
Hope that helps

Filter by array in Spark [duplicate]

I'm trying to filter a Spark DataFrame using a list in Java.
java.util.List<Long> selected = ....;
DataFrame result = df.filter(df.col("something").isin(????));
The problem is that the isin(...) method accepts a Scala Seq or varargs.
Passing in JavaConversions.asScalaBuffer(selected) doesn't work either.
Any ideas?
Use the stream method as follows:
df.filter(col("something").isin(selected.stream().toArray(Long[]::new)));
A bit shorter version would be:
df.filter(col("something").isin(selected.toArray()));
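Putting it together, a minimal self-contained sketch (assuming the DataFrame-era API from the question and a static import of col):
import static org.apache.spark.sql.functions.col;

java.util.List<Long> selected = java.util.Arrays.asList(1L, 2L, 3L);
// toArray() yields an Object[], which matches isin's Object... varargs signature
DataFrame result = df.filter(col("something").isin(selected.toArray()));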

Or operator inside When function Spark Java API

How do I use an OR operation inside the when function in the Spark Java API? I want something like this, but I get a compiler error.
Dataset<Row> ds = ds1.withColumn("Amount2", when(ds2.col("Type").equalTo("A") Or ds2.col("Type").equalTo("B"), "Amount1").otherwise(0))
Can somebody please guide me with a sample expression?
You should use the or method:
ds2.col("Type").equalTo("A").or(ds2.col("Type").equalTo("B"))
Instead of chaining equalTo, isin should work as well:
ds2.col("Type").isin("A", "B")
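Putting it together, the original statement might look like the following sketch (assuming static imports of when and lit; note that when(condition, value) treats a bare string as a literal, so to copy the Amount1 column's value you would pass a Column, which is an assumption about the intent here):
Dataset<Row> ds = ds1.withColumn("Amount2",
    when(ds2.col("Type").equalTo("A").or(ds2.col("Type").equalTo("B")),
        ds1.col("Amount1"))
    .otherwise(lit(0)));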

Getting the size of arraylist from a MongoDB table

I am trying to get the count of followers from the user collection, where followers is basically an array list, using Java.
I am using the query below to get the count from the command-line interface.
db.userinfo.aggregate([{ $project : {followersCount : {$size: "$followers"}}}])
But I am not able to create the same query in Java, as I am new to this. Below is the code I wrote in Java, and I am getting com.mongodb.MongoCommandException: Command failed with error 17124: 'The argument to $size must be an array, but was of type: int'.
AggregateIterable<Document> elements = collectionUserInfo.aggregate(
Arrays.asList(new BasicDBObject("$project", new BasicDBObject("followers",
new BasicDBObject("$size", new BasicDBObject("followers",1).put("followers",new BasicDBObject("$ifNull", "[]")))))));
Can anyone please help me with this?
You can try something like the following with Java 8 and MongoDB driver 3.x. Try not to use the old BasicDBObject type; where possible, use the API helper methods instead.
List<Integer> followersCount = collectionUserInfo.aggregate(
    Arrays.asList(Aggregates.project(
        Projections.computed("followersCount",
            Projections.computed("$size", "$followers")))))
    .map(follower -> follower.getInteger("followersCount"))
    .into(new ArrayList<>());
I tried this and it worked fine.
AggregateIterable<Document> elements = collectionUserInfo.aggregate(
    Arrays.asList(new BasicDBObject("$project",
        new BasicDBObject("followers", new BasicDBObject("$size", "$followers")))));
Thanks

Apache Flink : Extract TypeInformation of Tuple

I am using FlinkKafkaConsumer09, where I have a ByteArrayDeserializationSchema implementing KeyedDeserializationSchema<Tuple2<byte[], byte[]>>. Now, in getProducedType, how do I extract the TypeInformation?
I read in the documentation that the TypeExtractor.getForClass method does not support ParameterizedTypes. Which method of TypeExtractor should I use to achieve this?
I think we have to use the createTypeInfo method. Can you please tell me how to use it to return the TypeInformation?
If the returned type of your deserialization schema is a byte[] then you can use PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO as the return value of getProducedType.
If you want a TypeInformation of Tuple2<byte[], byte[]>, then something like
return TupleTypeInfo.getBasicAndBasicValueTupleTypeInfo(byte[].class, byte[].class)
could be possible.
Type Hints in the Java API
To help in cases where Flink cannot reconstruct the erased generic type information, the Java API offers so-called type hints from version 0.9 on. The type hints tell the system the type of the data set produced by a function. The following gives an example:
DataSet<SomeType> result = dataSet
    .map(new MyGenericNonInferrableFunction<Long, SomeType>())
    .returns(SomeType.class);
The returns statement specifies the produced type, in this case via a class. The hints support type definition through
Classes, for non-parameterized types (no generics)
Strings in the form of returns("Tuple2<Integer, my.SomeType>"), which are parsed and converted to a TypeInformation.
A TypeInformation directly
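For the Tuple2<byte[], byte[]> case in the question, the third option could look like this sketch (assuming a Flink version where TypeInformation.of(TypeHint) is available):
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;

@Override
public TypeInformation<Tuple2<byte[], byte[]>> getProducedType() {
    // the anonymous TypeHint subclass preserves the generic parameters at runtime
    return TypeInformation.of(new TypeHint<Tuple2<byte[], byte[]>>() {});
}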
You can try this one:
case class CaseClassForYourExpectedMessage(key:String,value:String)
/* getProducedType method */
override def getProducedType: TypeInformation[CaseClassForYourExpectedMessage] =
TypeExtractor.getForClass(classOf[CaseClassForYourExpectedMessage])
