Sending whole JSON to Kafka with Avro serialization? - java

I have a JSON file whose contents I want to send to a Kafka consumer. Kafka uses Avro and a schema that adheres to the JSON I want to send. So is there a way to read the JSON and then send its whole contents through Kafka, without the need to first parse the JSON and then send everything separately as keys and values?
Thanks.

Assuming you're using the Schema Registry, sure: you can remove whitespace from the file (so each record sits on a single line), then feed it to the console producer as stdin
kafka-avro-console-producer ... < file.json
Otherwise, you would need to write your own producer to be able to plug in the Avro serializer.
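For illustration, here is a minimal sketch of such a producer, using Confluent's KafkaAvroSerializer and Avro's own JSON decoder to turn the file contents into a GenericRecord. The topic name, broker address, registry URL, and schema path are placeholder assumptions:
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonToAvroProducer {
    public static void main(String[] args) throws Exception {
        // The Avro schema the JSON file adheres to (path is a placeholder)
        Schema schema = new Schema.Parser().parse(new File("schema.avsc"));

        // Decode the raw JSON straight into an Avro GenericRecord.
        // Note: Avro's JSON decoder expects Avro's JSON encoding, e.g. union
        // values wrapped in their branch name.
        String json = new String(Files.readAllBytes(Paths.get("file.json")),
                StandardCharsets.UTF_8);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        GenericRecord record = reader.read(null,
                DecoderFactory.get().jsonDecoder(schema, json));

        // Standard producer config; broker and registry URLs are placeholders
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", record));
        }
    }
}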

Related

Can we send a flat file to IBM MQ, or do we have to create a string representation of the flat file so that it can be sent as a message?

I do not have a flat file which could be converted to a string directly. Instead, I have a Java object which I need to send as a flat file to MQ.
You can use IBM MQ with JMS.
The following options can be used to send your Java class data to an MQ queue:
TextMessage - can send any kind of text data, e.g. you can serialize your Java object to XML using JAXB (or JAX-WS in the case of SOAP), to JSON using something like GSON, Jackson, or JSON-P (RESTEasy or Jersey in the case of REST), or even to CSV (see the Jackson sketch after this list)
ObjectMessage - can send Java serializable objects; please note, this is risky
BytesMessage - can send any kind of data (including any kind of text format), e.g. you can serialize your Java objects into some binary format like Google Protocol Buffers or ASN.1
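As a small illustration of the TextMessage route, here is a sketch that serializes a plain Java object to a JSON string with Jackson; the Pet class and its fields are made up for the example:
import com.fasterxml.jackson.databind.ObjectMapper;

public class PetSerializationDemo {
    // Hypothetical payload class, standing in for "your Java object"
    public static class Pet {
        public String type;
        public String food;
        public Pet(String type, String food) { this.type = type; this.food = food; }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Produces {"type":"dog","food":"bone"} - this string becomes the TextMessage body
        String msgAsAString = mapper.writeValueAsString(new Pet("dog", "bone"));
        System.out.println(msgAsAString);
    }
}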
You could have a toString method on your object, or convert it into XML or JSON, which will serialise it.
So, for example, if you use JSON, your receiving app can deserialise the flattened string into JSON and then into an appropriate Java object.
The sending app -
import org.json.simple.JSONObject;

// Build a JSON representation of the object, property by property
JSONObject obj = new JSONObject();
obj.put("xxx", yourJavaObject.somepropertyormethod);
obj.put("yyy", yourJavaObject.someotherpropertyormethod);

// The flattened string that becomes the message body
String msgAsAString = obj.toString();
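To actually put that string on the queue, a minimal JMS sketch might look like the following. The connection-factory setup is omitted (with IBM MQ you would typically obtain it from JNDI or construct an MQConnectionFactory), and the queue name is a placeholder:
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

public class MqSender {
    public static void send(ConnectionFactory factory, String msgAsAString) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("DEV.QUEUE.1"); // placeholder queue name
            MessageProducer producer = session.createProducer(queue);
            // The flattened JSON string goes out as a TextMessage
            producer.send(session.createTextMessage(msgAsAString));
        } finally {
            connection.close();
        }
    }
}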

How to unmarshal JSON to multiple items with JBoss Fuse (Camel)?

I currently have the following simplified JSON (test.json):
{"type":"monkey","food":"banana"},{"type":"dog","food":"bone"},{"type":"cat","food":"fish"}
Doing the following:
from("file:/tmp/input").routeId("test")
.unmarshal().json(JsonLibrary.Jackson)
.log("After unmarshall:\n ${body}");
When dropping the JSON file into the input folder, I only get the following map:
{type=monkey, food=banana}
How to get all items into the mapping?
If you want to get all of them, put them into an array, like:
[{"type":"monkey","food":"banana"},{"type":"dog","food":"bone"},{"type":"cat","food":"fish"}]
Then set up the data format with the useList option set to true. The XML DSL that I tried is:
<dataFormats>
    <json id="json" library="Jackson" useList="true"/>
</dataFormats>
In Java it is:
import org.apache.camel.component.jackson.JacksonDataFormat;

JacksonDataFormat json = new JacksonDataFormat();
json.useList();
Then use unmarshal with the preceding data format.
XML DSL:
<unmarshal ref="json"/>
So in Java you should have:
from("file:/tmp/input").routeId("test")
.unmarshal().json(json)
.log("After unmarshall:\n ${body}");
And the output is a list:
[{type=monkey, food=banana}, {type=dog, food=bone}, {type=cat, food=fish}]
The JSON you're trying to parse is not valid JSON. If you have control over what's being sent to Camel, try wrapping it in an array, like this:
[{"type":"monkey","food":"banana"},{"type":"dog","food":"bone"},{"type":"cat","food":"fish"}]
This will net you an array of objects and should parse correctly.
If you can't control the input, you can also add the array brackets as part of Camel processing, just before the unmarshal() call, as shown below.
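A sketch of that idea, assuming the route lives in a RouteBuilder and reuses the json data format defined above; the Simple language wraps the body in brackets before unmarshalling:
from("file:/tmp/input").routeId("test")
    // Wrap the comma-separated objects in [ ] so Jackson sees a valid JSON array
    .setBody(simple("[${body}]"))
    .unmarshal(json)
    .log("After unmarshal:\n ${body}");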

How to convert a POJO to Parquet? [duplicate]

I have a scenario where I need to convert messages, present as JSON objects, to Apache Parquet format using Java. Any sample code or examples would be helpful. As far as I have found, to convert the messages to Parquet, either Hive, Pig, or Spark is used. I need to convert to Parquet without involving these, using only Java.
To convert JSON data files to Parquet, you need some in-memory representation. Parquet doesn't have its own set of Java objects; instead, it reuses the objects from other formats, like Avro and Thrift. The idea is that Parquet works natively with the objects your applications probably already use.
To convert your JSON, you need to convert the records to Avro in-memory objects and pass those to Parquet, but you don't need to convert a file to Avro and then to Parquet.
Conversion to Avro objects is already done for you, see Kite's JsonUtil, and is ready to use as a file reader. The conversion method needs an Avro schema, but you can use that same library to infer an Avro schema from JSON data.
To write those records, you just need to use AvroParquetWriter. The whole setup looks like this:
// Infer an Avro schema from the first 20 JSON records (Kite's JsonUtil)
Schema jsonSchema = JsonUtil.inferSchema(fs.open(source), "RecordName", 20);

// Read the JSON file as Avro records using the inferred schema
try (JSONFileReader<Record> reader = new JSONFileReader<>(
        fs.open(source), jsonSchema, Record.class)) {
  reader.initialize();

  // Write the records out as Snappy-compressed Parquet
  try (ParquetWriter<Record> writer = AvroParquetWriter
      .<Record>builder(outputPath)
      .withConf(new Configuration())
      .withCompressionCodec(CompressionCodecName.SNAPPY)
      .withSchema(jsonSchema)
      .build()) {
    for (Record record : reader) {
      writer.write(record);
    }
  }
}
I had the same problem, and what I understood is that there are not many samples available for writing Parquet without using Avro or other frameworks. In the end I went with Avro. :)
Have a look at this; it may help you.

Avro JSON decoder: ignore namespace

I tried to use Apache Avro on a project... and I've met some difficulties.
Avro serialization/deserialization works like a charm... but I get decoder exceptions, like "unknown union branch blah-blah-blah"... in case the incoming JSON doesn't contain the namespace record...
e.g.
"user":{"demo.avro.User":{"age":1000... //that's ok
"user":{"age":1000... //org.apache.avro.AvroTypeException: Unknown union branch age
I cannot put the object in the default namespace... but it is important to parse the incoming JSON regardless of whether it contains the namespace node or not.
Could you help me fix it?
If you use JSON, why are you using Avro decoders? There are tons of JSON libraries which are designed to work with JSON. With Avro, the idea is to use Avro's own compact format, and JSON is mostly used for debugging (i.e. you can expose Avro data as JSON if necessary).
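For context on the error itself: Avro's JSON encoding wraps union values in the name of the selected branch, which is why the decoder rejects the bare field. A minimal sketch illustrating this, with a one-field schema made up to mirror the question:
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class UnionBranchDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema: "age" is a union, ["null", "int"]
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"demo.avro\","
          + "\"fields\":[{\"name\":\"age\",\"type\":[\"null\",\"int\"]}]}");

        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);

        // Works: the union value is wrapped in its branch name
        GenericRecord ok = reader.read(null,
            DecoderFactory.get().jsonDecoder(schema, "{\"age\":{\"int\":1000}}"));

        // Throws AvroTypeException: the bare value does not name a union branch
        GenericRecord fails = reader.read(null,
            DecoderFactory.get().jsonDecoder(schema, "{\"age\":1000}"));
    }
}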
