object saved as binary in couchbase - java

I am using MappingCouchbaseConverter from the Spring library, like below:
CouchbaseDocument target = new CouchbaseDocument();
mappingCouchbaseConverter.write(myObject, target);
ctx.insert(couchbaseClientFactory.getCollection(
"mybucketName").reactive(), null, target.getContent());
I am using transactions.
The above code writes binary data instead of JSON. Can anyone please help? How can I write JSON instead of a binary object?

Related

How to convert JSON to AVRO GenericRecord in Java

I am building a tool in an Apache Beam pipeline which will ingest lots of different types of data (different Schemas, different filetypes, etc.) and will output the results as Avro files. Because there are many different types of output schemas, I'm using GenericRecords to write the Avro data. These GenericRecords include schemas generated during ingestion for each unique file / schema layout. In general, I have been using the built in Avro Schema class to handle these.
I tried using DecoderFactory to convert the JSON data to Avro:
DecoderFactory decoderFactory = new DecoderFactory();
Decoder decoder = decoderFactory.jsonDecoder(schema, content);
DatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
return reader.read(null, decoder);
Which works just fine for the most part, except for when I have a case of a schema that has nullable fields, because the data is being read in from a JSON format that does not include typed fields, so when it creates the Schema it knows whether or not that field can be nullable, or is required, etc. This produces a problem when it writes the data to Avro:
If I have a nullable record that looks like this:
{"someField": "someValue"}
Avro is expecting the JSON data to look like this:
{"someField": {"string": "someValue"}}. This presents a problem anytime this combination appears (which is very frequent).
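The mismatch described above comes from Avro's JSON encoding of unions: a non-null union value is written as an object with a single key naming the branch type, while null stays a plain JSON null. A minimal sketch of the rewrite for flat records follows, using only the standard library. The class name AvroJsonUnionWrapper and the nullableTypes map are hypothetical stand-ins for illustration; in a real pipeline the branch type names would be derived from the Avro Schema, and nested records would need the same step applied recursively.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: rewrite plain JSON values into Avro's JSON encoding for nullable unions. */
final class AvroJsonUnionWrapper {

    /**
     * @param record        parsed JSON object (field name -> value), flat records only
     * @param nullableTypes hypothetical stand-in for the schema: for each field declared
     *                      as ["null", T], the Avro type name of T (for example "string")
     * @return a copy of the record with non-null union values wrapped as {"T": value}
     */
    static Map<String, Object> wrap(Map<String, Object> record,
                                    Map<String, String> nullableTypes) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String branch = nullableTypes.get(e.getKey());
            Object value = e.getValue();
            if (branch == null || value == null) {
                // Fields that are not unions, and null values, are copied unchanged.
                out.put(e.getKey(), value);
            } else {
                // Non-null union values become a single-entry object keyed by the type name.
                Map<String, Object> wrapped = new LinkedHashMap<>();
                wrapped.put(branch, value);
                out.put(e.getKey(), wrapped);
            }
        }
        return out;
    }
}
```

With the record {"someField": "someValue"} and nullableTypes mapping someField to "string", wrap returns the equivalent of {"someField": {"string": "someValue"}}, which is the shape the JSON decoder expects.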
One possible solution raised was to use an AvroMapper. I laid it out like it shows on that page, created the Schema object as an AvroSchema, and packaged the data into a byte array with the schema using AvroMapper.writer():
static GenericRecord convertJsonToGenericRecord(String content, Schema schema)
    throws IOException {
  JsonNode node = ObjectMappers.defaultObjectMapper().readTree(content);
  AvroSchema avroSchema = new AvroSchema(schema);
  // mapper is an AvroMapper instance defined elsewhere
  byte[] avroData =
      mapper
          .writer(avroSchema)
          .writeValueAsBytes(node);
  return mapper.readValue(avroData, GenericRecord.class);
}
Which may hopefully get around the typing problem with nullable records, but which is still giving me issues in the form of not recognizing that the AvroSchema is inside the actual byte array that I'm passing in (avroData). Here is the stack trace:
com.fasterxml.jackson.core.JsonParseException: No AvroSchema set, can not parse
at com.fasterxml.jackson.dataformat.avro.deser.MissingReader._checkSchemaSet(MissingReader.java:68)
at com.fasterxml.jackson.dataformat.avro.deser.MissingReader.nextToken(MissingReader.java:41)
at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:97)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4762)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4668)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3691)
When I checked the avroData byte array just to see what it looked like, it did not include anything other than the actual value I passed into it. It didn't include the schema, and it didn't even include the header or key. For the test, I'm using a single K/V pair as in the example above, and all I got back was the value.
An alternative route that I may pursue if this doesn't work is to manually format the JSON data as it comes in, but this is messy, and will require lots of recursion. I'm 99% sure that I can get it working that way, but would love to avoid it if at all possible.
To reiterate, what I'm trying to do is package incoming JSON-formatted data (string, byte array, node, whatever) with an Avro Schema to create GenericRecords which will be output to .avro files. I need to find a way to ingest the data and Schema such that it will allow for nullable records to be untyped in the JSON-string.
Thank you for your time, and don't hesitate to ask clarifying questions.

Deserialize Avro to Map

Does anybody know the way to deserialize Avro without using any Pojo and Schemas?
The problem:
I have a data stream of different Avro files.
The goal is to group that data depending on the presence of some attributes (e.g. user.role, another.really.deep.attribute.with.specific.value and so on).
Each Avro entry might contain any number of matching attributes (from zero to all listed).
So, there is no need to do anything with data. Just to peek at some elements.
The question is, is there any way to convert that data to Map or Node? Like I can do it with JSON using Jackson or GSON.
I've tried to use GenericDatumReader, but it requires a Schema. So maybe all I need is to read the schema from avro (how?).
Also, I've tried to use something like this, but this approach doesn't work.
public Map deserialize(byte[] data) {
  DatumReader<LinkedHashMap> reader
      = new SpecificDatumReader<>(LinkedHashMap.class);
  Decoder decoder = null;
  try {
    decoder = DecoderFactory.get().binaryDecoder(data, null);
    return reader.read(null, decoder);
  } catch (IOException e) {
    logger.error("Deserialization error:" + e.getMessage());
  }
  return null; // reached only when deserialization failed
}
Since I have time to 'play' with the problem, I have created a utility class that generates schemas depending on keys. It works, but looks like a big overhead.
A reader schema is required to deserialize any message.
If you have the writer schema available, you can simply use that. Note that Avro container files include the schema they were written with, and you can extract it with the getschema command of avro-tools.jar.
Without these options, you'll need to figure out the schema on your own (maybe using a hexdump and knowing how Avro data gets encoded).
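To illustrate the point about files carrying their own schema, here is a minimal sketch that reads the writer schema out of an Avro object container file header using only the standard library. It assumes the layout given in the Avro specification: the magic bytes 'O', 'b', 'j', 1, followed by a metadata map whose "avro.schema" entry holds the schema as JSON, with lengths and counts stored as zig-zag variable-length longs. The class name AvroHeaderSchema is hypothetical. In practice the Avro library's DataFileReader or DataFileStream does this for you; the sketch only shows what is in the bytes.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: extract the writer schema JSON from an Avro container file header. */
final class AvroHeaderSchema {

    /** Reads one Avro long: variable-length (7 bits per byte), zig-zag encoded. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;          // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);         // undo zig-zag
    }

    /** Reads Avro bytes or string data: a long length followed by that many bytes. */
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    /** Returns the "avro.schema" metadata value, or null if the header has none. */
    static String readWriterSchema(InputStream in) throws IOException {
        byte[] magic = new byte[4];
        new DataInputStream(in).readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro container file");
        }
        String schema = null;
        // The metadata map is a series of blocks, terminated by a block with count 0.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {                     // negative count: a byte size follows
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }
}
```

Passing a FileInputStream over an .avro file to readWriterSchema yields the schema JSON, which can then be parsed into a Schema and handed to a GenericDatumReader. Note that this applies to container files only; a bare binary-encoded message carries no schema at all.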

How to convert Pojo to parquet? [duplicate]

I have a scenario where I need to convert messages that are JSON objects to Apache Parquet format using Java. Any sample code or examples would be helpful. As far as I have found, converting messages to Parquet is usually done with Hive, Pig, or Spark. I need to convert to Parquet using only Java, without involving any of these.
To convert JSON data files to Parquet, you need some in-memory representation. Parquet doesn't have its own set of Java objects; instead, it reuses the objects from other formats, like Avro and Thrift. The idea is that Parquet works natively with the objects your applications probably already use.
To convert your JSON, you need to convert the records to Avro in-memory objects and pass those to Parquet, but you don't need to convert a file to Avro and then to Parquet.
Conversion to Avro objects is already done for you, see Kite's JsonUtil, and is ready to use as a file reader. The conversion method needs an Avro schema, but you can use that same library to infer an Avro schema from JSON data.
To write those records, you just need to use AvroParquetWriter. The whole setup looks like this:
Schema jsonSchema = JsonUtil.inferSchema(fs.open(source), "RecordName", 20);
try (JSONFileReader<Record> reader = new JSONFileReader<>(
    fs.open(source), jsonSchema, Record.class)) {
  reader.initialize();
  try (ParquetWriter<Record> writer = AvroParquetWriter
      .<Record>builder(outputPath)
      .withConf(new Configuration())
      .withCompressionCodec(CompressionCodecName.SNAPPY)
      .withSchema(jsonSchema)
      .build()) {
    for (Record record : reader) {
      writer.write(record);
    }
  }
}
I had the same problem, and from what I understood, there are not many samples available for writing Parquet without using Avro or other frameworks. In the end I went with Avro. :)
Have a look at this; it may help you.

How to generate a JSON file from stored procedure(SQL) result

I want to get data from the DB using stored procedures, in the form of a JSON file.
In simple words, my output should be a JSON file containing the result of a stored procedure run against the DB. How can I move forward?
You have to create an object mapper that converts your data to objects (I think there's an Apache library that can do this). Then you can use an existing API, for example Google's Gson or Jackson, to convert your objects to a JSON string. Once you have the JSON string, you can write it to a file. Hope this helps.
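The steps in that answer can be sketched as follows, assuming plain JDBC and a stored procedure that takes no parameters and returns a single result set. The class name ProcedureToJson and its method names are hypothetical. To keep the sketch free of dependencies, it swaps in a small hand-written serializer where the answer suggests Gson or Jackson; with either library on the classpath, toJson would be replaced by a single library call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: run a stored procedure and write its result set to a JSON file. */
final class ProcedureToJson {

    /** Calls the procedure and copies each row into a column-label -> value map. */
    static List<Map<String, Object>> callProcedure(Connection conn, String procedureName)
            throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        // procedureName is concatenated into the call, so it must come from trusted code.
        try (CallableStatement stmt = conn.prepareCall("{call " + procedureName + "()}");
             ResultSet rs = stmt.executeQuery()) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int c = 1; c <= meta.getColumnCount(); c++) {
                    row.put(meta.getColumnLabel(c), rs.getObject(c));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    /** Serializes the rows as a JSON array of objects. */
    static String toJson(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, Object> e : rows.get(i).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    /** Numbers and booleans are written bare; everything else becomes a JSON string. */
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    /** Quotes a string, escaping quotes, backslashes, and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** Writes the JSON text to the target file as UTF-8. */
    static void writeJsonFile(List<Map<String, Object>> rows, Path target) throws IOException {
        Files.write(target, toJson(rows).getBytes(StandardCharsets.UTF_8));
    }
}
```

A caller would obtain a Connection from its usual DataSource, pass the rows from callProcedure to writeJsonFile, and end up with a file containing one JSON object per result row. Dates and other non-numeric column types are written via toString here, which is a simplification a real mapper handles more carefully.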
