I have the following dataframe:
+--------------+-------------------+---------------------+
|longitude_f | latitude_f | geom |
+--------------+-------------------+---------------------+
|7.0737816 |33.82666 | 00 00 00 00 01 0..|
|-7.6198783 |33.5942549 | 00 00 00 00 01 0..|
+--------------+-------------------+---------------------+
longitude_f : double
latitude_f : double
geom : geometry
When I execute the following code:
Dataset <Row> result_f = sparkSession.sql("select * from data_f where ST_Within(ST_GeomFromText(CONCAT('POINT(',longitude_f,' ',latitude_f,')',4326)),geom)");
result_f.show();
I get the following exception:
20/08/01 19:50:36 ERROR Executor: Exception in task 87.0 in stage 40.0 (TID 930)
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.sql.geosparksql.expressions.ST_PolygonFromEnvelope.eval(Constructors.scala:250)
at org.apache.spark.sql.geosparksql.expressions.ST_Within.eval(Predicates.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.And_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
at org.apache.spark.sql.execution.joins.CartesianProductExec$$anonfun$doExecute$1$$anonfun$2.apply(CartesianProductExec.scala:89)
at org.apache.spark.sql.execution.joins.CartesianProductExec$$anonfun$doExecute$1$$anonfun$2.apply(CartesianProductExec.scala:88)
I need your help. Thank you.
ST_PolygonFromEnvelope takes 4 arguments, and it has an assertion that is failing:
assert(inputExpressions.length == 4)
Provide 4 input arguments to this function and proceed.
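For illustration, here is a hedged sketch of what a four-argument call looks like; it is not a verified fix for the query above, and the envelope bounds are placeholder values:

// Sketch only: ST_PolygonFromEnvelope expects exactly four numeric arguments
// (minX, minY, maxX, maxY). The bounds below are placeholders, not values from the question.
Dataset<Row> byEnvelope = sparkSession.sql(
    "SELECT * FROM data_f "
  + "WHERE ST_Within("
  + "  ST_GeomFromText(CONCAT('POINT(', longitude_f, ' ', latitude_f, ')')), "
  + "  ST_PolygonFromEnvelope(-8.0, 33.0, 8.0, 34.0))");
byEnvelope.show();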
On a side note, I would suggest downloading the source from Git and checking these assertions yourself. Also refer to the documentation; I don't think you are doing that.
Imagine I've got this example Node buffer, Buffer.from([1,2,3]), which I've encoded into a QR code as specified by the qrcode npm package's documentation.
const buffer = Buffer.from([1, 2, 3]);
QRCode.toFile(`public/images/${product.name}.png`, [{ data: buffer, mode: 'byte' }], {
color: { light: '#0000' },
}, (err) => { if (err) throw err; });
This is the output:
Now I must recover those encoded bytes over in my Android application. I'm currently using ZXing, and the output looks like this. Unfortunately, there's a bunch of formatting bytes added into the mix at the beginning and end (ec 11 ec 11...).
A more complex example yields the same results. For buffer 16 76 20 42 3f 5d 4d d5 82 9a 45 19 32 90 19 53 00 00 00 02 00 00 00 63 0e 46 72 6f 7a 65 6e 20 4c 61 73 61 67 6e 65, here's the Zxing decoding:
What is that 427 value at the start and how do I get rid of it? Because otherwise the data looks good. How would I go about retrieving my ByteBuffer back?
Thank you!
After spending more time on this than I'd like to admit, I've found a way to fix it.
The solution was encoding the buffer into Base64.
const buffer = Buffer.from([1, 2, 3]);
QRCode.toFile(`public/images/${product.name}.png`, buffer.toString('base64'), {
color: { light: '#0000' },
}, (err) => { if (err) throw err; });
Then, over in Java, just decode it:
byte[] res = Base64.getDecoder().decode(rawResult.getText());
The resulting bytes no longer include the formatting bytes that they would have with the previous approach.
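For completeness, a minimal sketch of the Java side, assuming java.util.Base64 and that rawResult.getText() is the string ZXing hands back from the scan:

import java.util.Base64;

// For the Buffer.from([1, 2, 3]) example the QR code now carries the text "AQID",
// which decodes back to the bytes {1, 2, 3} with no QR padding bytes mixed in.
String scanned = rawResult.getText();
byte[] res = Base64.getDecoder().decode(scanned);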
If you try to pack the spring-context 5.0.1.RELEASE JAR with pack200, the packer complains that it does not know the class attribute SourceDebugExtension, which is used in a couple of classes in there that were compiled from Kotlin sources.
JSR-045 defines this attribute as follows:
The SourceDebugExtension attribute is an optional attribute in the attributes table of the ClassFile structure. There can be no more than one SourceDebugExtension attribute in the attributes table of a given ClassFile structure.
The SourceDebugExtension attribute has the following format:
SourceDebugExtension_attribute {
u2 attribute_name_index;
u4 attribute_length;
u1 debug_extension[attribute_length];
}
The items of the SourceDebugExtension_attribute structure are as follows:
attribute_name_index
The value of the attribute_name_index item must be a valid index into the constant_pool table. The constant_pool entry at that index must be a CONSTANT_Utf8_info structure representing the string "SourceDebugExtension".
attribute_length
The value of the attribute_length item indicates the length of the attribute, excluding the initial six bytes. The value of the attribute_length item is thus the number of bytes in the debug_extension[] item.
debug_extension[]
The debug_extension array holds a string, which must be in UTF-8 format. There is no terminating zero byte.
The string in the debug_extension item will be interpreted as extended debugging information. The content of this string has no semantic effect on the Java Virtual Machine.
The Network Transfer Format Spec describes how to define the format of such attributes so that pack200 can handle them.
Pack200 allows you to skip files with those attributes, throw those attributes out, or define their format according to the Network Transfer Format Spec. Unfortunately, I didn't get the format specifier right for the attribute to be parsed correctly. An example hex dump of the actual data that needs to be matched by the format specifier (i.e. the value of debug_extension[]) is:
00000b90: 53 4d 41 50 0a 42 65 61 6e 44 ; SMAP.BeanD
00000ba0: 65 66 69 6e 69 74 69 6f 6e 44 73 6c 2e 6b 74 0a ;efinitionDsl.kt.
00000bb0: 4b 6f 74 6c 69 6e 0a 2a 53 20 4b 6f 74 6c 69 6e ;Kotlin.*S Kotlin
00000bc0: 0a 2a 46 0a 2b 20 31 20 42 65 61 6e 44 65 66 69 ;.*F.+ 1 BeanDefi
00000bd0: 6e 69 74 69 6f 6e 44 73 6c 2e 6b 74 0a 6f 72 67 ;nitionDsl.kt.org
00000be0: 2f 73 70 72 69 6e 67 66 72 61 6d 65 77 6f 72 6b ;/springframework
00000bf0: 2f 63 6f 6e 74 65 78 74 2f 73 75 70 70 6f 72 74 ;/context/support
00000c00: 2f 42 65 61 6e 44 65 66 69 6e 69 74 69 6f 6e 44 ;/BeanDefinitionD
00000c10: 73 6c 24 62 65 61 6e 24 31 24 63 75 73 74 6f 6d ;sl$bean$1$custom
00000c20: 69 7a 65 72 24 31 0a 2a 4c 0a 31 23 31 2c 32 37 ;izer$1.*L.1#1,27
00000c30: 33 3a 31 0a 2a 45 0a ;3:1.*E.
Unfortunately, I have not been able to find the correct format yet. I hope someone here has either done this already or has more luck finding the right format.
Finally I found a working solution myself.
The format is a bit tricky, as the SourceDebugExtension attribute is defined as a direct UTF-8 string without any terminating character like \0, and in the format string you cannot express something like "take all remaining bytes" or "repeat until the end of the byte array is reached".
But after reading up a bit on the possibilities in the format string and on the format of the content of the SourceDebugExtension attribute, I came up with a format that should work in most cases.
The SourceDebugExtension attribute carries a resolved SMAP. Resolved is important here, because an unresolved SMAP can contain embedded SMAPs that already contain an end section, which would make matching a bit more complex, but not impossible. In a resolved SMAP you always have <line terminator>*E<line terminator> at the end, where <line terminator> can be any of the usual suspects \r, \n or \r\n, and this sequence cannot appear earlier in the SMAP if it is resolved.
Now we can use the union layout element with a recursive self-call to build the following format string, which will match the SMAPs correctly in most cases. The only assumption this format string makes is that if the line terminator before the *E is \r\n, then \r\n is also expected after it, and if only \r or \n is found before, only \r or \n (either one, just not \r\n) is expected after it. If that assumption is violated, the packing fails, complaining that one byte was not handled. But if we checked for both characters and only one were left, we would get an ArrayIndexOutOfBoundsException, and I think mixing different line terminators is the less likely case.
So here is my current approach:
[TB(10)[TB(42)[TB(69)[TB(13,10)[]()[(0)]]()[(0)]]()[(0)]](13)[TB(10)[TB(42)[TB(69)[TB(13)[TB(10)[]()[(0)]]()[(0)]]()[(0)]]()[(0)]](42)[TB(69)[TB(13,10)[]()[(0)]]()[(0)]]()[(0)]]()[(0)]]
For better understanding, here is the same format with some spacing and semantic comments. In this form it cannot be used directly; it has to be passed through com.sun.java.util.jar.pack.Attribute#normalizeLayoutString, which is a public static method in a package-private class and therefore not normally accessible. If you use reflection, or Groovy (which does this for you), or copy the method body into your own method, you can of course use this version in your code (a sketch of doing exactly that follows after the commented format below).
[
# covered likely cases:
# \\n*E\\n
# \\r\\n*E\\r\\n
# \\r*E\\r
#
# covered unlikely cases:
# \\n*E\\r
# \\r*E\\n
#
# uncovered unlikely cases:
# \\n*E\\r\\n
# \\r*E\\r\\n
# \\r\\n*E\\r
# \\r\\n*E\\n
TB
(\\\n) [
# covered cases:
# \\n*E\\r
# \\n*E\\n
TB
(\\*) [
TB
(\\E) [
TB
(\\\r, \\\n) []
() [(0)]
]
() [(0)]
]
() [(0)]
]
(\\\r) [
# covered cases:
# \\r\\n*E\\r\\n
# \\r*E\\r
# \\r*E\\n
TB
(\\\n) [
# covered cases:
# \\r\\n*E\\r\\n
TB
(\\*) [
TB
(\\E) [
TB
(\\\r) [
TB
(\\\n) []
() [(0)]
]
() [(0)]
]
() [(0)]
]
() [(0)]
]
(\\*) [
# covered cases:
# \\r*E\\r
# \\r*E\\n
TB
(\\E) [
TB
(\\\r, \\\n) []
() [(0)]
]
() [(0)]
]
() [(0)]
]
() [(0)]
]
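To put the pieces together, here is a hedged sketch (not the exact code behind this answer) of normalizing the verbose layout via reflection and registering it with the packer. The variable verboseLayout, the assumed normalizeLayoutString(String) signature, and the file names are placeholders, and this only works on JDKs that still ship the Pack200 API (it was removed in newer releases):

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.lang.reflect.Method;
import java.util.jar.JarFile;
import java.util.jar.Pack200;

public class PackWithSourceDebugExtension {
    public static void main(String[] args) throws Exception {
        // Placeholder: the commented layout from above, before normalization.
        String verboseLayout = "...";

        // normalizeLayoutString is public but declared in a package-private class,
        // so it is called via reflection here.
        Class<?> attr = Class.forName("com.sun.java.util.jar.pack.Attribute");
        Method normalize = attr.getMethod("normalizeLayoutString", String.class);
        normalize.setAccessible(true);
        String layout = (String) normalize.invoke(null, verboseLayout);

        // Register the layout for the SourceDebugExtension class attribute and pack the JAR.
        Pack200.Packer packer = Pack200.newPacker();
        packer.properties().put(
                Pack200.Packer.CLASS_ATTRIBUTE_PFX + "SourceDebugExtension", layout);
        try (OutputStream out = new FileOutputStream("spring-context-5.0.1.RELEASE.pack")) {
            packer.pack(new JarFile("spring-context-5.0.1.RELEASE.jar"), out);
        }
    }
}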
Not sure why I'm getting this error. I installed Hadoop 2.7.3 via brew on my MBP. I think I'm running it in single-node mode.
Everything I'm asking about is from this Hadoop tutorial site. I'm getting a NumberFormatException, but it says it's "null".
First, here's the input file:
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
There is only one space between each integer. The only weird thing is the single-digit number, but that's not null.
Next, here's the error message I get when running the program:
snip snip
snip snip
17/03/06 17:21:40 WARN mapred.LocalJobRunner: job_local1731001664_0001
java.lang.Exception: java.lang.NumberFormatException: null
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NumberFormatException: null // complains something is null here
at java.lang.Integer.parseInt(Integer.java:454)
at java.lang.Integer.parseInt(Integer.java:527)
at com.servicenow.bigdata.ProcessUtil$E_EMapper.map(ProcessUtil.java:35)
at com.servicenow.bigdata.ProcessUtil$E_EMapper.map(ProcessUtil.java:16)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
snip snip
snip snip
Lastly, here's a snippet from the offending line/function above:
public void map(LongWritable key, Text value, // offending line #16 here
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException
{
String line = value.toString();
String lasttoken = null;
StringTokenizer s = new StringTokenizer(line,"\t");
String year = s.nextToken();
while(s.hasMoreTokens())
{
lasttoken=s.nextToken();
}
int avgprice = Integer.parseInt(lasttoken); // offending line #35 here
output.collect(new Text(year), new IntWritable(avgprice));
Thanks in advance for your help. Hopefully I'm not wasting people's time if this is a simple mistake.
It seems s.hasMoreTokens() is false from the start, therefore lasttoken remains null, and hence the NumberFormatException: null when trying to parse it.
Also, if there is a space ' ' between each number and you are trying to split the tokens with a tab '\t', there won't be any tokens left after the first nextToken() call.
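A quick standalone check (no Hadoop needed) makes this visible; the sample line is taken from the input above:

import java.util.StringTokenizer;

public class TokenizerCheck {
    public static void main(String[] args) {
        String line = "1979 23 23 2 43 24 25 26 26 26 26 25 26 25";
        // With a tab delimiter on a space-separated line, the whole line is one token,
        // so after the first nextToken() nothing is left and lasttoken stays null.
        StringTokenizer s = new StringTokenizer(line, "\t");
        System.out.println(s.nextToken());     // prints the whole line
        System.out.println(s.hasMoreTokens()); // prints false
    }
}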
TutorialsPoint has outdated code. It tells you to download Hadoop 1.2.1? That is several years old... Go check the official Hadoop MapReduce tutorials.
You have no tabs in your data that you copied, just spaces.
You can test that same exact code outside of MapReduce.
You can replace all that string handling with this:
if (value == null) return;
String[] splits = value.toString().split("\\s+");
String year = splits[0];
String lasttoken = splits[splits.length - 1];
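If you want to see that in context, here is a hedged sketch of a corrected map method (same signature as in the question, splitting on whitespace instead of tabs):

public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
    if (value == null) return;
    // Split on any run of whitespace, since the input is space-delimited.
    String[] splits = value.toString().trim().split("\\s+");
    if (splits.length < 2) return;                  // skip blank or malformed lines
    String year = splits[0];
    int avgprice = Integer.parseInt(splits[splits.length - 1]);
    output.collect(new Text(year), new IntWritable(avgprice));
}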
Make sure that your text file uses only a space delimiter.
Changing the code as follows also works:
StringTokenizer s = new StringTokenizer(line, " ");
I'm implementing an API that reads data from a JSON response and writes the resulting objects to CSV.
Is there a way to convert an object in Java to a table format (row-column)?
E.g. assume I have these objects:
public class Test1 {
private int a;
private String b;
private Test2 c;
private List<String> d;
private List<Test2> e;
// getters-setters ...
}
public class Test2 {
private int x;
private String y;
private List<String> z;
// getters-setters ...
}
Let's say I have an instance with the following values:
Test1 c1 = new Test1();
c1.setA(11);
c1.setB("12");
c1.setC(new Test2(21, "21", Arrays.asList(new String[] {"211", "212"}) ));
c1.setD(Arrays.asList(new String[] {"111", "112"}));
c1.setE(Arrays.asList(new Test2[] {
new Test2(31, "32"),
new Test2(41, "42")
}));
I would like to see something like this returned as a List<Map<String, Object>> or some other object:
a b c.x c.y c.z d e.x e.y
---- ---- ------ ------- ------ ---- ------ ------
11 12 21 21 211 111 31 32
11 12 21 21 211 111 41 42
11 12 21 21 211 112 31 32
11 12 21 21 211 112 41 42
11 12 21 21 212 111 31 32
11 12 21 21 212 111 41 42
11 12 21 21 212 112 31 32
11 12 21 21 212 112 41 42
I have already implemented something to achieve this result using reflection, but my solution is too slow for larger objects.
I was thinking of using an in-memory database to convert the object into a database table and then select the result, something like MongoDB or ObjectDB, but I think it's overkill, and maybe slower than my approach. Also, these two do not support an in-memory database, and I do not want to use another disk database, since I'm already using MySQL with Hibernate. Using a ramdisk is not an option, since my server only has limited RAM. Is there an in-memory OODBMS that can do this?
I would prefer an algorithm as a solution, or even better, an existing library that can convert any object to a row-column format, something like Jackson or JAXB, which convert data to/from other formats.
Thanks for the help
Finally, after one week of banging my head against every possible thing available in my house, I managed to find a solution.
I shared the code on GitHub so that if anyone ever encounters this problem again, they can avoid a couple of migraines :)
You can get the code from here:
https://github.com/Sebb77/Denormalizer
Note: I had to use the getType() function and the FieldType enum for my specific problem.
In the future I will try to speed up the code with some caching, or something else :)
Note 2: this is just sample code that should be used only for reference. Lots of improvements can be made.
Anyone is free to use the code; just send me a thank-you email :)
Any suggestions, improvements or bug reports are very welcome.
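For readers who just want the core idea without pulling the repository, here is a minimal sketch, independent of the linked code (names and structure are mine), of the row-expansion step: given, per column, the list of values it can take (for example c.z -> [211, 212]), produce one row per combination, as in the expected table above. Fields that belong together, such as e.x and e.y of the same Test2 instance, should be expanded as one column value and flattened afterwards.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CartesianRows {
    // Expands a map of column name -> possible values into one row per combination.
    public static List<Map<String, Object>> expand(Map<String, List<Object>> columns) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(new LinkedHashMap<>());                 // start with a single empty row
        for (Map.Entry<String, List<Object>> col : columns.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                for (Object value : col.getValue()) {    // multiply rows by each value
                    Map<String, Object> copy = new LinkedHashMap<>(row);
                    copy.put(col.getKey(), value);
                    next.add(copy);
                }
            }
            rows = next;
        }
        return rows;
    }
}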
Strange test failure after converting code from Lucene 3.6 to Lucene 4.1
public void testIndexPuid() throws Exception {
addReleaseOne();
RAMDirectory ramDir = new RAMDirectory();
createIndex(ramDir);
IndexReader ir = IndexReader.open(ramDir);
Fields fields = MultiFields.getFields(ir);
Terms terms = fields.terms("puid");
TermsEnum termsEnum = terms.iterator(null);
termsEnum.next();
assertEquals("efd2ace2-b3b9-305f-8a53-9803595c0e38", termsEnum.term());
}
returns:
Expected :efd2ace2-b3b9-305f-8a53-9803595c0e38
Actual :[65 66 64 32 61 63 65 32 2d 62 33 62 39 2d 33 30 35 66 2d 38 61 35 33 2d 39 38 30 33 35 39 35 63 30 65 33 38]
It seems to be adding the field as a binary field rather than a text field, but I checked and the field is being added using the deprecated
new Field("puid", value, Field.Index.NOT_ANALYZED_NO_NORMS, new KeywordAnalyzer())
so shouldn't that work the same way as before?
Doh, my bad, I was missing utf8ToString(). The line should be: assertEquals("efd2ace2-b3b9-305f-8a53-9803595c0e38", termsEnum.term().utf8ToString()); – Paul Taylor Feb 19 at 22:20
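For context, in Lucene 4.x TermsEnum.term() returns a BytesRef (org.apache.lucene.util.BytesRef) rather than a String, which is why the failure message shows a byte dump. A minimal sketch of the corrected tail of the test, reusing the variable names from the question:

// termsEnum.term() returns a BytesRef in Lucene 4.x; convert it before comparing.
termsEnum.next();
BytesRef term = termsEnum.term();   // import org.apache.lucene.util.BytesRef
assertEquals("efd2ace2-b3b9-305f-8a53-9803595c0e38", term.utf8ToString());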