i tried to use Apache Avro on project... and i've met some difficulties
avro serialization/ deserialization work like a charm ... but i get decoder exceptions.. like unknown union branch blah-blah-blah... in case incomming json does't contain namepsace record ...
e.g.
"user":{"demo.avro.User":{"age":1000... //that's ok
"user":{"age":1000... //org.apache.avro.AvroTypeException: Unknown union branch age
I cannot put object in default namespace... but it is important to parse incoming json regardless it contains namespace node or not
Could you help me to fix it
If you use JSON, why are you using Avro decoders? There are tons of JSON libraries which are designed to work with JSON: with Avro, the idea is to Avro's own compact format, and JSON is mostly used for debugging (i.e. you can expose Avro data as JSON if necessary).
Related
I am building a tool in an Apache Beam pipeline which will ingest lots of different types of data (different Schemas, different filetypes, etc.) and will output the results as Avro files. Because there are many different types of output schemas, I'm using GenericRecords to write the Avro data. These GenericRecords include schemas generated during ingestion for each unique file / schema layout. In general, I have been using the built in Avro Schema class to handle these.
I tried using DecoderFactory to convert the Json data to Avro
DecoderFactory decoderFactory = new DecoderFactory();
Decoder decoder = decoderFactory.jsonDecoder(schema, content);
DatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
return reader.read(null, decoder);
Which works just fine for the most part, except for when I have a case of a schema that has nullable fields, because the data is being read in from a JSON format that does not include typed fields, so when it creates the Schema it knows whether or not that field can be nullable, or is required, etc. This produces a problem when it writes the data to Avro:
If I have a nullable record that looks like this:
{"someField": "someValue"}
Avro is expecting the JSON data to look like this:
{"someField": {"string": "someValue"}}. This presents a problem anytime this combination appears (which is very frequent).
One possible solution raised was to use an AvroMapper. I laid it out like it shows on that page, created the Schema object as an AvroSchema, packaged the data into a byte array with the schema using AvroMapper.writter()
static GenericRecord convertJsonToGenericRecord(String content, Schema schema)
throws IOException {
JsonNode node = ObjectMappers.defaultObjectMapper().readTree(content);
AvroSchema avroSchema = new AvroSchema(schema);
byte[] avroData =
mapper
.writer(avroSchema)
.writeValueAsBytes(node);
return mapper.readValue(avroData, GenericRecord.class);
Which may hopefully get around the typing problem with nullable records, but which is still giving me issues in the form of not recognizing that the AvroSchema is inside the actual byte array that I'm passing in (avroData). Here is the stack trace:
com.fasterxml.jackson.core.JsonParseException: No AvroSchema set, can not parse
at com.fasterxml.jackson.dataformat.avro.deser.MissingReader._checkSchemaSet(MissingReader.java:68)
at com.fasterxml.jackson.dataformat.avro.deser.MissingReader.nextToken(MissingReader.java:41)
at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:97)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4762)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4668)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3691)
When I checked the avroData byte array just to see what it looked like, it did not include anything other than the actual value I passed into it. It didn't include the schema, and it didn't even include the header or key. For the test, I'm using a single K/V pair as in the example above, and all I got back was the value.
An alternative route that I may pursue if this doesn't work is to manually format the JSON data as it comes in, but this is messy, and will require lots of recursion. I'm 99% sure that I can get it working that way, but would love to avoid it if at all possible.
To reiterate, what I'm trying to do is package incoming JSON-formatted data (string, byte array, node, whatever) with an Avro Schema to create GenericRecords which will be output to .avro files. I need to find a way to ingest the data and Schema such that it will allow for nullable records to be untyped in the JSON-string.
Thank you for your time, and don't hesitate to ask clarifying questions.
I have a json file that i want to sent it's contents to a Kafka consumer. The kafka uses Avro and a Schema that adheres to the Json i want to send .So is there a way to read the json and then send the whole contents of it through kafka without the need to first parse the json and then send everything separately with keys and values?
Thanks.
Assuming you're using the Schema Registry, sure, you can remove whitespace from the file, then read it as stdin
kafka-avro-console-consumer ... < $(cat file.json)
Otherwise, you would need to write your own producer to be able to plugin the Avro serializer
I have a scenario where to convert the messages present as Json object to Apache Parquet format using Java. Any sample code or examples would be helpful. As far as what I have found to convert the messages to Parquet either Hive, Pig, Spark are being used. I need to convert to Parquet without involving these only by Java.
To convert JSON data files to Parquet, you need some in-memory representation. Parquet doesn't have its own set of Java objects; instead, it reuses the objects from other formats, like Avro and Thrift. The idea is that Parquet works natively with the objects your applications probably already use.
To convert your JSON, you need to convert the records to Avro in-memory objects and pass those to Parquet, but you don't need to convert a file to Avro and then to Parquet.
Conversion to Avro objects is already done for you, see Kite's JsonUtil, and is ready to use as a file reader. The conversion method needs an Avro schema, but you can use that same library to infer an Avro schema from JSON data.
To write those records, you just need to use ParquetAvroWriter. The whole setup looks like this:
Schema jsonSchema = JsonUtil.inferSchema(fs.open(source), "RecordName", 20);
try (JSONFileReader<Record> reader = new JSONFileReader<>(
fs.open(source), jsonSchema, Record.class)) {
reader.initialize();
try (ParquetWriter<Record> writer = AvroParquetWriter
.<Record>builder(outputPath)
.withConf(new Configuration)
.withCompressionCodec(CompressionCodecName.SNAPPY)
.withSchema(jsonSchema)
.build()) {
for (Record record : reader) {
writer.write(record);
}
}
}
I had the same problem, and what I understood that there are not much samples available for parquet write without using avro or other frameworks. Finally I went with Avro. :)
Have a look at this, may help you.
I have a scenario where to convert the messages present as Json object to Apache Parquet format using Java. Any sample code or examples would be helpful. As far as what I have found to convert the messages to Parquet either Hive, Pig, Spark are being used. I need to convert to Parquet without involving these only by Java.
To convert JSON data files to Parquet, you need some in-memory representation. Parquet doesn't have its own set of Java objects; instead, it reuses the objects from other formats, like Avro and Thrift. The idea is that Parquet works natively with the objects your applications probably already use.
To convert your JSON, you need to convert the records to Avro in-memory objects and pass those to Parquet, but you don't need to convert a file to Avro and then to Parquet.
Conversion to Avro objects is already done for you, see Kite's JsonUtil, and is ready to use as a file reader. The conversion method needs an Avro schema, but you can use that same library to infer an Avro schema from JSON data.
To write those records, you just need to use ParquetAvroWriter. The whole setup looks like this:
Schema jsonSchema = JsonUtil.inferSchema(fs.open(source), "RecordName", 20);
try (JSONFileReader<Record> reader = new JSONFileReader<>(
fs.open(source), jsonSchema, Record.class)) {
reader.initialize();
try (ParquetWriter<Record> writer = AvroParquetWriter
.<Record>builder(outputPath)
.withConf(new Configuration)
.withCompressionCodec(CompressionCodecName.SNAPPY)
.withSchema(jsonSchema)
.build()) {
for (Record record : reader) {
writer.write(record);
}
}
}
I had the same problem, and what I understood that there are not much samples available for parquet write without using avro or other frameworks. Finally I went with Avro. :)
Have a look at this, may help you.
I am using Spring security oauth2. By default oauth2 returns it's own error format like {error : "Invalid_grant", error_description : "something"}. I want to change it my own custom format so in my application, it remains consistent. Can anyone please help me? I have gone through lots of links but didn't find any suitable solution till now.
What you get as a result is a JSON document.
Look at Jackson or Gson libraries for example to parse(deserialize) JSON documents. You can get data values 1 by 1 or deserialize into a class instance.
Once you parse modify the data as you wish
Use the same library Jackson or Gson to write(serialize) a new JSON document.
Jackson may also produce as output XML, YAML and CSV documents
https://github.com/FasterXML/jackson-core
https://github.com/google/gson