Generate an Avro Schema from a Java Object

Apache Avro provides a compact, fast, binary data format and rich data structures for serialization. However, it requires the user to define a schema (in JSON) for each object that needs to be serialized.
In some cases, this is not possible (e.g. when the class of that Java object has members whose types are classes from external libraries). Hence, I wonder whether there is a tool that can get the information from the object's .class file and generate the Avro schema for that object (like Gson uses an object's .class information to convert it to a JSON string).

Take a look at Avro's Java reflection API (the org.apache.avro.reflect package).
Getting a schema looks like:
Schema schema = ReflectData.get().getSchema(MyClass.class); // MyClass is the class you want a schema for
See Doug's answer to another question for a working example.
Credit for this answer belongs to Sean Busby.
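ReflectData does this introspection for you. To illustrate the idea behind it (this is not Avro's implementation), here is a stdlib-only toy that walks a class's declared instance fields with java.lang.reflect and emits an Avro-style record schema as JSON text. The class name ToySchemaGenerator, its small type table, and the sample Address/Person classes are hypothetical; only a few primitive types are mapped, JDK types outside that table are rejected rather than guessed, and a class seen a second time is referenced by name, as Avro allows for named types.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ToySchemaGenerator {

    // Illustrative mapping of a few Java types to Avro primitive type names.
    private static final Map<Class<?>, String> PRIMITIVES = Map.of(
            String.class, "\"string\"",
            int.class, "\"int\"",
            long.class, "\"long\"",
            float.class, "\"float\"",
            double.class, "\"double\"",
            boolean.class, "\"boolean\"");

    // Builds an Avro-style record schema (JSON text) from a class's instance fields.
    public static String recordSchema(Class<?> type) {
        return recordSchema(type, new HashSet<>());
    }

    private static String recordSchema(Class<?> type, Set<String> defined) {
        String name = type.getSimpleName();
        if (!defined.add(name)) {
            return "\"" + name + "\""; // already defined: refer to the named type
        }
        List<String> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                continue; // not part of the object's serialized state
            }
            String fieldType = PRIMITIVES.get(f.getType());
            if (fieldType == null) {
                Class<?> t = f.getType();
                if (t.isPrimitive() || t.isArray() || t.getName().startsWith("java.")) {
                    throw new IllegalArgumentException("unsupported type: " + t);
                }
                fieldType = recordSchema(t, defined); // nested record
            }
            fields.add("{\"name\":\"" + f.getName() + "\",\"type\":" + fieldType + "}");
        }
        return "{\"type\":\"record\",\"name\":\"" + name
                + "\",\"fields\":[" + String.join(",", fields) + "]}";
    }

    // Sample classes standing in for the "external" types mentioned in the question.
    static class Address { String city; int zip; }
    static class Person { String name; int age; Address home; Person friend; }

    public static void main(String[] args) {
        System.out.println(recordSchema(Person.class));
    }
}
```

Field order in the output follows getDeclaredFields(), which the JDK does not formally guarantee to be declaration order.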

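Here's how to generate an Avro schema from a POJO definition with Jackson's Avro module (jackson-dataformat-avro):
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.avro.AvroFactory;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;
import com.fasterxml.jackson.dataformat.avro.schema.AvroSchemaGenerator;
ObjectMapper mapper = new ObjectMapper(new AvroFactory());
AvroSchemaGenerator gen = new AvroSchemaGenerator();
mapper.acceptJsonFormatVisitor(RootType.class, gen); // RootType is your POJO class
AvroSchema schemaWrapper = gen.getGeneratedSchema();
org.apache.avro.Schema avroSchema = schemaWrapper.getAvroSchema();
String asJson = avroSchema.toString(true); // pretty-printed JSON schema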

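Example
POJO class:
public class ExportData implements Serializable { // java.io.Serializable
private String body;
public ExportData() {} // no-arg constructor, needed by Avro reflection when reading
public ExportData(String body) { this.body = body; }
// ... getters and setters
}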
Serialize:
File file = new File(fileName);
DatumWriter<ExportData> writer = new ReflectDatumWriter<>(ExportData.class);
Schema schema = ReflectData.get().getSchema(ExportData.class);
// try-with-resources closes the writer even if an append fails
try (DataFileWriter<ExportData> dataFileWriter = new DataFileWriter<>(writer)) {
dataFileWriter.create(schema, file);
for (Row row : resultSet) {
String rec = row.getString(0);
dataFileWriter.append(new ExportData(rec));
}
}
Deserialize:
File file = new File(avroFilePath);
DatumReader<ExportData> datumReader = new ReflectDatumReader<>(ExportData.class);
try (DataFileReader<ExportData> dataFileReader = new DataFileReader<>(file, datumReader)) {
ExportData record = null;
while (dataFileReader.hasNext()) {
record = dataFileReader.next(record); // reuses the previous instance when possible
// process record
}
}

Related

Mapping a particular column of a CSV file to a particular POJO field

I have to map particular CSV columns, by index, to particular POJO attributes. The mapping is defined in a JSON file that pairs a column index with an attribute name, meaning that each listed column index in the CSV file must be mapped to the named attribute of the POJO class.
Below is a sample JSON file that shows the column mapping strategy for the POJO attributes:
[{"index":0,"columnname":"date"},{"index":1,"columnname":"deviceAddress"},{"index":7,"columnname":"iPAddress"},{"index":3,"columnname":"userName"},{"index":10,"columnname":"group"},{"index":5,"columnname":"eventCategoryName"},{"index":6,"columnname":"message"}]
I have tried the OpenCSV library, but the challenge I faced is that I cannot read only some of the columns with it. As you can see in the JSON above, we skip indexes 2, 4, 8 and 9 when reading the CSV file. Below is the code using OpenCSV:
public static List<BaseDataModel> readCSVFile(String filePath,List<String> columnListBasedOnIndex) {
List<BaseDataModel> csvDataModels = null;
File myFile = new File(filePath);
try (FileInputStream fis = new FileInputStream(myFile)) {
final ColumnPositionMappingStrategy<BaseDataModel> strategy = new ColumnPositionMappingStrategy<BaseDataModel>();
strategy.setType(BaseDataModel.class);
strategy.setColumnMapping(columnListBasedOnIndex.toArray(new String[0]));
final CsvToBeanBuilder<BaseDataModel> beanBuilder = new CsvToBeanBuilder<>(new InputStreamReader(fis));
beanBuilder.withMappingStrategy(strategy);
csvDataModels = beanBuilder.build().parse();
} catch (Exception e) {
e.printStackTrace();
}
return csvDataModels; // null if reading failed
}
List<ColumnIndexMapping> columnIndexMappingList = dataSourceModel.getColumnMappingStrategy();
List<String> columnNameList = columnIndexMappingList.stream().map(ColumnIndexMapping::getColumnname)
.collect(Collectors.toList());
List<BaseDataModel> dataModels = Utility
.readCSVFile(file.getAbsolutePath() + File.separator + fileName, columnNameList);
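A likely source of trouble in the OpenCSV code above: ColumnPositionMappingStrategy treats position i of the setColumnMapping array as CSV column i, but the list built from the JSON keeps the JSON's order (0, 1, 7, 3, ...) and has no gaps for the skipped columns. Below is a stdlib-only sketch that builds an index-ordered array instead, with null in the slots of skipped columns. ColumnIndexMapping is redeclared here as a hypothetical record standing in for the question's class, and whether your OpenCSV version ignores null entries should be checked against its documentation.

```java
import java.util.Arrays;
import java.util.List;

public class PositionalMapping {

    // Hypothetical stand-in for the question's ColumnIndexMapping class.
    record ColumnIndexMapping(int index, String columnname) {}

    // Returns an array where slot i holds the attribute name for CSV column i,
    // or null when column i is not mapped (i.e. should be skipped).
    static String[] toPositionalArray(List<ColumnIndexMapping> mappings) {
        int size = mappings.stream().mapToInt(ColumnIndexMapping::index).max().orElse(-1) + 1;
        String[] columns = new String[size];
        for (ColumnIndexMapping m : mappings) {
            if (columns[m.index()] != null) {
                throw new IllegalArgumentException("duplicate index " + m.index());
            }
            columns[m.index()] = m.columnname();
        }
        return columns;
    }

    public static void main(String[] args) {
        // Same entries as the JSON sample, in the same (unsorted) order.
        List<ColumnIndexMapping> json = List.of(
                new ColumnIndexMapping(0, "date"), new ColumnIndexMapping(1, "deviceAddress"),
                new ColumnIndexMapping(7, "iPAddress"), new ColumnIndexMapping(3, "userName"),
                new ColumnIndexMapping(10, "group"), new ColumnIndexMapping(5, "eventCategoryName"),
                new ColumnIndexMapping(6, "message"));
        // prints [date, deviceAddress, null, userName, null, eventCategoryName, message, iPAddress, null, null, group]
        System.out.println(Arrays.toString(toPositionalArray(json)));
    }
}
```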
I have also tried univocity-parsers, but I don't see how to map CSV columns to particular attributes with this library. Below is the code:
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically(); //detects the format
settings.getFormat().setLineSeparator("\n");
//extracts the headers from the input
settings.setHeaderExtractionEnabled(true);
settings.selectIndexes(0, 2); //rows will contain only values of columns at position 0 and 2
CsvRoutines routines = new CsvRoutines(settings); // Can also use TSV and Fixed-width routines
routines.parseAll(BaseDataModel.class, new File("/path/to/your.csv"));
List<String[]> rows = new CsvParser(settings).parseAll(new File("/path/to/your.csv"), "UTF-8");
Could someone please have a look and help me with this?
Author of univocity-parsers here. You can define mappings to your class attributes in code instead of annotations. Something like this:
public class BaseDataModel {
private String a;
private int b;
private String c;
private Date d;
}
Then, in your code, map the attributes to whatever column names you need:
ColumnMapper mapper = routines.getColumnMapper();
mapper.attributeToColumnName("a", "col1");
mapper.attributeToColumnName("b", "col2");
mapper.attributeToColumnName("c", "col3");
mapper.attributeToColumnName("d", "col4");
You can also use mapper.attributeToIndex("d", 3); to map attributes to a given column index.
Hope this helps.
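The attributeToIndex idea (binding attribute names to column positions) can also be illustrated without any library. This is a stdlib-only sketch, assuming the rows have already been split into String[] (for example by univocity's CsvParser.parseAll, as in the question) and that every target field is a String. The Event class and the map method are hypothetical, and this is not how univocity implements its mapper.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class IndexToAttributeMapper {

    // Sample target bean; only String fields are handled in this sketch.
    public static class Event {
        String date;
        String userName;
        String message;
    }

    // Creates one bean per row, copying row[index] into the field named by
    // each entry of attributeToIndex. Columns not in the map are ignored.
    static <T> List<T> map(List<String[]> rows, Class<T> type, Map<String, Integer> attributeToIndex)
            throws ReflectiveOperationException {
        List<T> result = new ArrayList<>();
        for (String[] row : rows) {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Integer> e : attributeToIndex.entrySet()) {
                int index = e.getValue();
                if (index >= 0 && index < row.length) { // short rows leave the field null
                    Field field = type.getDeclaredField(e.getKey());
                    field.setAccessible(true);
                    field.set(bean, row[index]);
                }
            }
            result.add(bean);
        }
        return result;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // One row as a CSV parser might return it; columns 1, 2 and 4 are never read.
        List<String[]> rows = List.<String[]>of(new String[] {"2020-01-01", "dev1", "x", "alice", "y", "login ok"});
        Map<String, Integer> mapping = Map.of("date", 0, "userName", 3, "message", 5);
        Event e = map(rows, Event.class, mapping).get(0);
        System.out.println(e.date + " " + e.userName + " " + e.message); // prints 2020-01-01 alice login ok
    }
}
```

The explicit List.<String[]>of(...) witness matters: without it, List.of(array) would spread the array into a List<String>.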

Inconsistency in deserializing objects with Jackson streaming API

I am trying to use the Jackson streaming API to deserialize huge objects from XML. The idea is to combine the streaming API and ObjectMapper to parse XML (or JSON) in small chunks. However, I see some inconsistent behavior with the XML parser.
With this code snippet:
try {
String xml1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>";
String xml2 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
XmlFactory xmlFactory = new XmlFactory();
JsonParser jp = xmlFactory.createParser(new ByteArrayInputStream(xml1.getBytes()));
JsonToken token = jp.nextToken();
while (token != null) {
System.out.println("xml1 token=" + token);
token = jp.nextToken();
}
jp = xmlFactory.createParser(new ByteArrayInputStream(xml2.getBytes()));
token = jp.nextToken();
while (token != null) {
System.out.println("xml2 token=" + token);
token = jp.nextToken();
}
} catch (IOException e) {
e.printStackTrace();
}
I am getting:
xml1 token=START_OBJECT
xml1 token=END_OBJECT
xml2 token=START_OBJECT
xml2 token=FIELD_NAME
xml2 token=VALUE_NULL
xml2 token=END_OBJECT
Why is the FIELD_NAME token missing for xml1? Why is there just one START_OBJECT token for the second XML? Is there any setting that would allow me to see the FIELD_NAME of the outer tag?
The problem is quite simple: the XML module differs from most other Jackson dataformat modules in that direct access via the Streaming API is not supported.
This is mentioned in the project README (along with a note that the "tree model" is similarly not supported).
Not supported does not necessarily mean "cannot be used at all", just that its behavior differs from the handling of JSON, so callers really need to know what they are doing above and beyond the API used for JSON content (and Smile, CBOR, YAML; even CSV content is represented in a way that is compatible with JSON access).
While you can try to use XmlFactory and the streaming parser/generator, their behavior is controlled by XmlMapper, based on metadata from Java classes, to make things work correctly via the databinding API (that is, XmlMapper).
With that, the reason for the observed tokens is that this translation is necessary to map to the expected Java object structure:
public class Foo {
public Bar bar;
}
which would map to JSON like:
{
"bar" : null
}
as well as XML of:
<foo>
<bar></bar>
</foo>
Another way to put this is that the XML and JSON data models are fundamentally different, and they cannot be trivially translated. Since Jackson's token model is based on JSON, some work is needed to translate XML elements and attributes into the structure that equivalent JSON would have.
The above is not to say that what you are trying to do is impossible. There are two ways you might be able to make things work:
1. Knowing the translation that XmlParser performs, read tokens with nextToken() while expecting that translation.
2. Instead of using XmlParser directly, construct an XMLStreamReader (the low-level StAX streaming parser), read "raw" tokens, then construct a separate XmlParser (via XmlFactory) at the expected location and use that for reading.
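The second option starts with plain StAX. As a first step, here is a JDK-only sketch showing that a raw XMLStreamReader does report the outer foo element that the translated token stream folds away. Handing the reader over to XmlFactory at a chosen element is not shown, since that needs Jackson; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RawStaxEvents {

    // Returns the raw element events of an XML document, e.g. "START foo", "END foo".
    static List<String> rawEvents(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        List<String> events = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    events.add("START " + reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    events.add("END " + reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        // prints [START foo, END foo]
        System.out.println(rawEvents("<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>"));
        // prints [START foo, START bar, END bar, END foo]
        System.out.println(rawEvents("<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>"));
    }
}
```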
I hope this helps.
A kid with a hammer...
I don't know much about Jackson; in fact, I just started using it, thinking of using JSON or YAML instead of XML. But for XML, we have been using XStream with success.
//Consumer side
FileInputStream fis = new FileInputStream(filename);
XStream xs = new XStream();
Object obj = xs.fromXML(fis);
fis.close();
Also, if you are the one producing the serialized data and the producer is Java, you could use Java serialization instead, for a smaller footprint and faster operation.
//producer side
try (ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(filename)))) {
oos.writeObject(yourVeryComplexObjectStructure); //I am writing a list of ten 1MB objects
} // closing oos flushes it and closes the underlying FileOutputStream
//Consumer side
try (ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new FileInputStream(filename)))) {
@SuppressWarnings("unchecked") // only needed if the target type is generic
final YourVeryComplexObjectStructureType object = (YourVeryComplexObjectStructureType) ois.readObject();
// use object here
}
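To try the Java serialization approach without touching the file system, the same ObjectOutputStream/ObjectInputStream pair can be pointed at in-memory byte arrays. This is a minimal sketch; the class and method names are hypothetical, and Class.cast is used in place of an unchecked cast.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationRoundTrip {

    // Serializes any Serializable object graph to a byte array.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads an object graph back; the caller states the expected type.
    static <T> T fromBytes(byte[] data, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(ois.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> original = new ArrayList<>(List.of("a", "b", "c"));
        ArrayList<?> copy = fromBytes(toBytes(original), ArrayList.class);
        System.out.println(copy.equals(original)); // prints true
    }
}
```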

How to write a union when creating an Avro file in Java

I'm trying to create an Avro file in Java (just test code at the moment). Everything works fine; the code looks roughly like this:
GenericRecord record = new GenericData.Record(schema);
File file = new File("test.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(record);
dataFileWriter.close();
The problem I'm facing now is: what kind of Java object do I instantiate when I want to write a union? Not necessarily at the top level; the union might be a field of a record being written. There are ready-made classes for some complex types, like GenericData.Record, GenericData.Array, etc. For the others, the right object is usually a standard Java object (a java.util.Map implementation for the Avro "map" type, etc.).
But I cannot figure out the right object to instantiate for writing a union.
This question refers to writing an Avro file WITHOUT code generation. Any help is very much appreciated.
Here's what I did:
Suppose the schema is defined like this (in Avro IDL-style notation):
record MyStructure {
...
record MySubtype {
int p1;
}
union {null, MySubtype} myField = null;
...
}
And this is the Java code:
Schema schema; // the schema of the main structure
// ....
GenericRecord rec = new GenericData.Record(schema);
int i = schema.getField("myField").schema().getIndexNamed("MySubtype");
GenericRecord myField = new GenericData.Record(schema.getField("myField").schema().getTypes().get(i));
myField.put("p1", 100);
rec.put("myField", myField);
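Why no special union object is needed: with the generic API, the field simply holds the value of one branch (null or a GenericRecord), and at write time GenericDatumWriter calls GenericData.resolveUnion to find which branch the value matches, then writes that branch's index ahead of the value. The stdlib-only toy below mimics that lookup for the null/record case only; NamedRecord and resolveBranch are hypothetical and far simpler than Avro's real resolution rules.

```java
import java.util.List;

public class ToyUnionResolver {

    // Toy stand-in for a record value: it only knows the name of its schema.
    record NamedRecord(String schemaName) {}

    // Picks the branch index for a datum: null matches the "null" branch,
    // a record matches the branch with the same schema name.
    static int resolveBranch(List<String> branches, Object datum) {
        String wanted;
        if (datum == null) {
            wanted = "null";
        } else if (datum instanceof NamedRecord r) {
            wanted = r.schemaName();
        } else {
            throw new IllegalArgumentException("unsupported datum: " + datum);
        }
        int index = branches.indexOf(wanted);
        if (index < 0) {
            throw new IllegalArgumentException("not in union " + branches + ": " + wanted);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> union = List.of("null", "MySubtype"); // union {null, MySubtype}
        System.out.println(resolveBranch(union, null));                         // prints 0
        System.out.println(resolveBranch(union, new NamedRecord("MySubtype"))); // prints 1
    }
}
```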

How to Serialize Objects to a String Using SimpleFramework

I'm working on a GWT application that uses the Simple framework to serialize objects to XML. I have POJO classes on the client side and use the serializer on the server side. I need to write the serialized object to a String variable instead of a file, because files are not allowed in GWT on App Engine: https://groups.google.com/forum/?fromgroups=#!topic/google-web-toolkit/M7Zo3U7CKD8.
The current code on the server side, in the GWT RPC ServiceImpl:
File result = new File("c:/myXMLFile.xml");
Serializer serializer = new Persister();
MyBeanToSerialize beanToSerialize = new MyBeanToSerialize("firstName","LastName");
serializer.write(beanToSerialize, result);
I found the solution: return a String from the serializer by writing to a Writer instead of a File. The code is as follows:
String parser() throws Exception { // Serializer.write declares throws Exception
StringWriter writer = new StringWriter();
Serializer serializer = new Persister();
MyBeanToSerialize beanToSerialize = new MyBeanToSerialize("firstName","LastName");
serializer.write(beanToSerialize, writer);
return writer.toString();
}

Problems converting from an object to XML in java

What I'm trying to do is convert an object to XML, then transfer it as a String via a web service so that another platform (.NET in this case) can read the XML and deserialize it into the same object. I've been reading this article:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php#start
I've been able to do everything without problems up to this point:
Serializer serializer = new Persister();
PacienteObj pac = new PacienteObj();
pac.idPaciente = "1";
pac.nomPaciente = "Sonia";
File result = new File("example.xml");
serializer.write(pac, result);
I know this will sound silly, but I can't find where Java creates the file for new File("example.xml"), so I can check its contents.
I'd also like to know whether there is any way to write that XML to a String instead of a File, because that's exactly what I need. I can't find that information in the article.
Thanks in advance.
Check out the JavaDoc. There is a method that writes to a Writer, so you can hook it up to a StringWriter (which writes into a String):
StringWriter result = new StringWriter();
serializer.write(pac, result);
String s = result.toString();
You can use an instance of StringWriter:
Serializer serializer = new Persister();
PacienteObj pac = new PacienteObj();
pac.idPaciente = "1";
pac.nomPaciente = "Sonia";
StringWriter result = new StringWriter();
serializer.write(pac, result);
String xml = result.toString(); // xml now contains the serialized data
Logging or printing the following will tell you where the file is on the file system:
System.out.println(result.getAbsolutePath()); // result is the File from the question
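The reason the file is hard to find: new File("example.xml") is a relative path, which java.io.File resolves against the current working directory given by the user.dir system property (when run from an IDE, that is typically the project directory). A small sketch that prints both; the class and method names are hypothetical.

```java
import java.io.File;

public class WhereIsMyFile {

    // Resolves a relative file name the same way new File(name) is resolved when used.
    static String resolvedPath(String name) {
        return new File(name).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println("example.xml is at: " + resolvedPath("example.xml"));
    }
}
```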
