jackson serialize csv property order - java

We have a table with 350+ columns. The POJO class is generated and the getter order gets messed up. We are trying to use CsvMapper from Jackson, but it generates the CSV based on getter order. @JsonPropertyOrder is also not feasible because of the number of columns. We maintain the column ordering in XML and can generate a field-order array at runtime. Can we override property ordering at runtime by providing an array of field names? Can we customize this using an annotation introspector?

What you are looking for is called a MapperFeature. You need to disable the alphabetical sorting of properties, which is enabled by default:
CsvMapper mapper = new CsvMapper();
mapper.disable(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY);
You can find more on this in the jackson-dataformat-csv issue: Add a feature in CsvSchema to allow definition of ordering #42
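Since your ordering comes from XML at runtime, you can also skip @JsonPropertyOrder entirely and build the CsvSchema from your field-name array. A minimal sketch, where loadColumnOrderFromXml() and pojoList are hypothetical placeholders for your XML loader and your data:
import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// columnOrder is the field-name array generated from your XML metadata (hypothetical helper)
String[] columnOrder = loadColumnOrderFromXml();

CsvMapper mapper = new CsvMapper();
// keep Jackson from re-sorting properties alphabetically
mapper.disable(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY);

// the schema's column order drives the CSV column order
CsvSchema.Builder schemaBuilder = CsvSchema.builder().setUseHeader(true);
for (String column : columnOrder) {
    schemaBuilder.addColumn(column);
}
CsvSchema schema = schemaBuilder.build();

// pojoList is a List of your generated POJOs with the 350+ columns
String csv = mapper.writer(schema).writeValueAsString(pojoList);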

Just in case you get here in 2020, this comes straight from the documentation.
So how do you get a CsvSchema instance to use? There are three ways:
Create a schema based on a Java class.
Build a schema manually.
Use the first line of the CSV document to get the names (no types) for the schema.
Here is code for the above cases:
// Schema from POJO (usually has @JsonPropertyOrder annotation)
CsvSchema schema = mapper.schemaFor(Pojo.class);

// Manually-built schema: one column with a type, the others default to "STRING"
CsvSchema schema = CsvSchema.builder()
        .addColumn("firstName")
        .addColumn("lastName")
        .addColumn("age", CsvSchema.ColumnType.NUMBER)
        .build();

// Read schema from the first line; start with a bootstrap instance
// to enable reading of the schema from the first line
// NOTE: reads the schema and uses it for binding
CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
ObjectMapper mapper = new CsvMapper();
mapper.readerFor(Pojo.class).with(bootstrapSchema).readValue(csvInput); // csvInput: the CSV content (String, File, ...)

Note that @JsonPropertyOrder does not necessarily have to include all properties, just the ones you want to include for serialization. But to indicate what is to be serialized you may need to use a combination of @JsonProperty (to indicate properties to serialize) and different visibility settings for inclusion (either via ObjectMapper.setVisibility() for defaults, or via @JsonAutoDetect per POJO).
But assuming you do not want to use @JsonPropertyOrder, you can:
Override the method in JacksonAnnotationIntrospector that reads the annotation and provide your own implementation that uses other sources (the ordering does not need to come from annotations at all), as sketched below.
If using Jackson 2.8.0, there is a new way to specify per-class defaults for some things (see the ObjectMapper.configOverride() object), including property order.
Similarly, you could override the methods that look for @JsonProperty (findNameForDeserialization() and/or findNameForSerialization()) if you want to use custom criteria for inclusion/exclusion.
There are other mechanisms for inclusion/exclusion as well, like JSON Views (@JsonView) and JSON Filters.
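A rough sketch of the first option, assuming a hypothetical loadOrderFromXml(Class<?>) helper that returns the field-name array kept in your XML:
import com.fasterxml.jackson.databind.introspect.AnnotatedClass;
import com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;

CsvMapper mapper = new CsvMapper();
mapper.setAnnotationIntrospector(new JacksonAnnotationIntrospector() {
    @Override
    public String[] findSerializationPropertyOrder(AnnotatedClass ac) {
        // hypothetical helper: look up the ordering kept in XML for this class
        String[] order = loadOrderFromXml(ac.getRawType());
        // fall back to the normal @JsonPropertyOrder handling when nothing is configured
        return order != null ? order : super.findSerializationPropertyOrder(ac);
    }
});
Because the introspector is consulted per class, the same mapper can then serve all of your generated POJOs.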

I believe your only choice here is uniVocity-parsers, as it allows you to choose which columns to write and in what order:
CsvWriterSettings settings = new CsvWriterSettings();
// Sets the file headers (used for selection only, these values won't be written automatically)
settings.setHeaders("Year", "Make", "Model", "Description", "Price");
// Selects which fields from the input should be written. In this case, fields "make" and "model" will be empty
// The field selection is not case sensitive
settings.selectFields("description", "price", "year");
// configures the writer to process java beans with annotations (assume TestBean has a few annotated fields)
settings.setRowWriterProcessor(new BeanWriterProcessor<TestBean>(TestBean.class));
// Creates a writer with the above settings;
CsvWriter writer = new CsvWriter(new File("/path/to/output.csv"), settings);
// Writes the headers specified in the settings
writer.writeHeaders();
//creates a bean instance for writing
TestBean bean = new TestBean();
bean.setPrice(new BigDecimal("500.33"));
bean.setDescription("Blah,blah");
bean.setYear(1997);
//writes it
writer.processRecord(bean);
writer.close();
Hope it helps.
Disclosure: I'm the author of this library; it's open-source and free (Apache 2.0 License).

Related

handle environment variable in .yml with jackson-dataformat-yaml

I'm using Jackson to read a .yml file into a Java POJO.
It's working fine.
How can I handle environment variables in the .yml when reading it into the POJO? Does Jackson have an implementation for this?
example:
attr1: ${ENV_ATTR} # read this value from environment when has ${}
Dependency
implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.13.1'
Implementation code
var fileYml = new File(getClass().getClassLoader().getResource("file.yml").toURI());
var mapper = new ObjectMapper(new YAMLFactory());
mapper.setPropertyNamingStrategy(PropertyNamingStrategies.KEBAB_CASE);
var entityFromYml = mapper.readValue(fileYml, MyEntity.class);
note: I'm not using spring or spring boot.
There's Apache Commons Text's StringSubstitutor, which does the job.
You can either pre-process the input string with it, or post-process each loaded string.
Post-processing obviously works only on values that load into Strings so you can't use it e.g. on a value that loads into an int.
Pre-processing, on the other hand, is dangerous because it doesn't protect you against YAML special characters in the env variable's value. In your example, set ENV_ATTR to
foo
bar: baz
and after substitution, you will have
attr1: foo
bar: baz
which might not be desired.
If you want to guard against that but also want to substitute in non-String values, you'll need to use SnakeYAML's API directly and specialize its Composer. Jackson is an abstraction API on top of SnakeYAML that restricts what you can do, so this is not possible with Jackson.
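For the simple cases, a minimal sketch of the pre-processing approach (with the caveat above about special characters), assuming commons-text is on the classpath and reusing MyEntity and file.yml from the question:
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.commons.text.StringSubstitutor;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.PropertyNamingStrategies;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;

// read the raw YAML, replace ${ENV_ATTR}-style placeholders from the environment,
// then hand the resolved text to Jackson
String rawYaml = Files.readString(
        Path.of(getClass().getClassLoader().getResource("file.yml").toURI()));
String resolvedYaml = new StringSubstitutor(System.getenv()).replace(rawYaml);

var mapper = new ObjectMapper(new YAMLFactory());
mapper.setPropertyNamingStrategy(PropertyNamingStrategies.KEBAB_CASE);
var entityFromYml = mapper.readValue(resolvedYaml, MyEntity.class);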

Java - Dynamically Change Kafka Data Type By Input for SpecificAvroSerde

import KafkaDataType;
...
...
final Serde<KafkaDataType> eventSchema = new SpecificAvroSerde<>();
...
...
StreamsBuilder builder = new StreamsBuilder();
KStream<String, KafkaDataType> eventStream = builder.stream(STREAM_TOPIC);
My KafkaDataType is an avro schema that is automatically generated from the associated .avsc file. My understanding is that KafkaDataType must be pre-defined. However are there existing methods that allow for a dynamic or generic KafkaDataType? If so a sample code block would be much appreciated.
The goal is to have the KafkaDataType be a generic data type such that different Kafka streams with different avro schemas can be swapped in and out and be processed by the Java code. Currently, for each different avro schema, I would need to change KafkaDataType to the specific Java auto generated classes from the .avsc schemas.
Let me know if things need more clarification.
My understanding is that KafkaDataType must be pre-defined.
For using a SpecificRecord / serde, yes
However are there existing methods that allow for a dynamic or generic KafkaDataType?
Avro SpecificRecord classes (such as those generated from .avsc files) also implement GenericRecord, so you can use a generic Avro serde (for example Confluent's GenericAvroSerde) instead of your specific type to work with multiple kinds of records in the same stream.
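A rough sketch of what that could look like with Confluent's GenericAvroSerde; the schema registry URL and the field name "someField" are placeholders, and STREAM_TOPIC is the constant from the question:
import java.util.Map;

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;

final GenericAvroSerde valueSerde = new GenericAvroSerde();
// the serde still needs the schema registry to resolve whichever schema a record uses
valueSerde.configure(Map.of("schema.registry.url", "http://localhost:8081"), false);

StreamsBuilder builder = new StreamsBuilder();
KStream<String, GenericRecord> eventStream =
        builder.stream(STREAM_TOPIC, Consumed.with(Serdes.String(), valueSerde));

// fields are looked up by name instead of via generated getters
eventStream.foreach((key, record) -> System.out.println(record.get("someField")));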

Jackson YAML parsing with parent objects & array of objects

I have a YAML file with four parent objects. The fourth object is an array of elements for which I want to create a class and populate it. Is there a way to use Jackson's object mapper to ignore the first three objects and then parse my list of "InspectionModules"?
My yaml file looks like this:
# Don't care about these elements
InspectionGroups:
InspecitonModes:
ThresholdTypes:
# This array is of interest
InspectionModules:
  - feature: FC_Trig1
    name: "first inspection"
    Channels:
      - id: ICI_01
        category: Dia_MC_Config
  - feature: FC_Trig2
    name: "Diagonal Missing Cap"
    Channels:
      - id: ICI_02
        category: Dia_MC_Config
Basically I want to create a class called InspectionModule and have the mapper map the elements of the InspectionModules array into this class. Is there a simple way to do this in jackson? If not would it be recommended to reorganize our YAML file so we can leverage the object mapper?
I assume you already have some familiarity with Jackson's ObjectMapper.
First you will need a Java class representing the entire
contents of your YAML file. Let's call it Root:
public class Root {
    @JsonProperty("InspectionModules")
    private List<InspectionModule> inspectionModules;

    // getters and setters omitted here for brevity
}
Notice that you'll need to use @JsonProperty to tell Jackson
that the YAML name InspectionModules corresponds to your
Java property inspectionModules in spite of their different
spellings.
Next you need a Java class representing one of the
YAML sections below the InspectionModules:.
public class InspectionModule {
    private String feature;
    private String name;

    @JsonProperty("Channels")
    private List<Channel> channels;

    // getters and setters omitted here for brevity
}
And finally you need a Channel class representing one of the
YAML sections below Channels:
You should be able to write this Java class by yourself already.
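For reference, a minimal version matching the id and category keys above could be:
public class Channel {
    private String id;
    private String category;

    // getters and setters omitted here for brevity
}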
Now you are ready to use Jackson's YAMLMapper for reading your YAML
file into a Root object.
File file = new File("example.yaml");
ObjectMapper objectMapper = new YAMLMapper();
objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
Root root = objectMapper.readValue(file, Root.class);
Notice that you need to tell Jackson that it is OK to encounter
unknown properties (like InspectionGroups) during reading.

Reading CSV data in Spring Batch (creating a custom LineMapper)

I've been doing a bit of work writing some batch processing code on CSV data. I found a tutorial online and so far have been using it without really understanding how or why it works, which means I'm unable to solve a problem I'm currently facing.
The code I'm working with is below:
@Bean
public LineMapper<Employee> lineMapper() {
    DefaultLineMapper<Employee> lineMapper = new DefaultLineMapper<Employee>();

    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    lineTokenizer.setNames(new String[] { "id", "firstName", "lastName" });
    lineTokenizer.setIncludedFields(new int[] { 0, 1, 2 });

    BeanWrapperFieldSetMapper<Employee> fieldSetMapper = new BeanWrapperFieldSetMapper<Employee>();
    fieldSetMapper.setTargetType(Employee.class);

    lineMapper.setLineTokenizer(lineTokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    return lineMapper;
}
I'm not entirely clear on what setNames or setIncludedFields is really doing. I've looked through the docs, but still don't know what's happening under the hood. Why do we need to give names to the lineTokenizer? Why can't it just be told how many columns of data there will be? Is its only purpose so that the fieldSetMapper knows which fields to map to which data objects (do they all need to be named the same as the fields in the POJO?)?
I have a new problem where I have CSVs with a large amount of columns (about 25-35) that I need to process. Is there a way to generate the columns in setNames programmatically with the variable names of the POJOs, rather than editing them in by hand?
Edit:
An example input file may be something like:
test.csv:
field1, field2, field3,
a,b,c
d,e,f
g,h,j
The DTO:
public class Test {
    private String field1;
    private String field2;
    private String field3;

    // setters and getters and constructor omitted
}
I see the confusion, so I will try to clarify how key interfaces work together. A LineMapper is responsible for mapping a single line from your input file to an instance of your domain type. The default implementation provided by Spring Batch is the DefaultLineMapper, which delegates the work to two collaborators:
LineTokenizer: which takes a String and tokenizes it into a FieldSet (which is similar to the ResultSet in the JDBC world, where you can get fields by index or name)
FieldSetMapper: which maps the FieldSet to an instance of your domain type
So the process is: String -> FieldSet -> Object.
Each interface comes with a default implementation, but you can provide your own if needed.
DelimitedLineTokenizer
The names attribute in DelimitedLineTokenizer is used to create named fields in the FieldSet. This allows you to get a field by name from the FieldSet (again, similar to ResultSet methods where you can get a field by name). The includedFields attribute allows you to select a subset of fields from your input file, just like in your use case where you have 25+ fields and only need to extract a subset of them.
BeanWrapperFieldSetMapper
This FieldSetMapper implementation expects a type and uses the JavaBean naming conventions for getters/setters to set fields on the target object from the FieldSet.
Is there a way to generate the columns in setNames programmatically with the variable names of the POJOs, rather than editing them in by hand?
This is what the BeanWrapperFieldSetMapper will do. If you provide field names in the FieldSet, the mapper will call the setter of each field having the same name. The name matching is "fuzzy" in the sense that it tolerates close matches; here is an excerpt from the Javadoc:
Property name matching is "fuzzy" in the sense that it tolerates close matches,
as long as the match is unique. For instance:
* Quantity = quantity (field names can be capitalised)
* ISIN = isin (acronyms can be lower case bean property names, as per Java Beans recommendations)
* DuckPate = duckPate (capitalisation including camel casing)
* ITEM_ID = itemId (capitalisation and replacing word boundary with underscore)
* ORDER.CUSTOMER_ID = order.customerId (nested paths are recursively checked)
This mapper is also configurable with a custom ConversionService if needed. If this still does not cover your use case, you need to provide a custom mapper.
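If you do want to derive the names array from the POJO instead of typing it in, a sketch like the following can work. Caveats: Class.getDeclaredFields() is not formally guaranteed to return fields in declaration order (it usually does on common JVMs), and this assumes the CSV columns appear in the same order as the fields of the Test DTO from the question:
import java.lang.reflect.Field;
import java.util.Arrays;

import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;

@Bean
public LineMapper<Test> testLineMapper() {
    // derive column names from the POJO's declared fields instead of hard-coding them
    String[] names = Arrays.stream(Test.class.getDeclaredFields())
            .map(Field::getName)
            .toArray(String[]::new);

    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(names);

    BeanWrapperFieldSetMapper<Test> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Test.class);

    DefaultLineMapper<Test> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    return lineMapper;
}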

Use of JsonIgnoreProperties for a specific property that exists only in the JSON

I stumbled upon some code that adds @JsonIgnoreProperties for a property that doesn't exist in the class but does exist in the JSON, e.g.:
@JsonIgnoreProperties({"ignoreprop"})
public class VO {
    public String prop;
}
When JSON is
{ "prop":"1", "ignoreprop":"9999"}
I wonder if ignoring properties has any advantage(s) performance-wise or is it just redundant code?
Annotation that can be used to either suppress serialization of properties (during serialization), or ignore processing of JSON properties read (during deserialization).
EDIT
Is there an advantage to ignoring a specific property rather than all unknown ones (with
@JsonIgnoreProperties(ignoreUnknown=true))?
I wonder if ignoring properties has any advantage
Yes, it is used a lot for forward compatibility in services. Let's say you have services A and B, and A currently sends requests to B with some JSON objects.
Now you want to support a new property in the JSON. With this feature you can let A start sending the new property before B knows how to handle it, decoupling the development processes of the two services.
ignoring specific property over all
This case does have some minor performance advantages. First, Jackson doesn't try to map this property, which can be a simple string or a complex object/array. Second, it helps you avoid handling an exception. Consider that all of the following can be valid calls and you only care about prop:
{ "prop":"1", "ignoreprop":"9999"}
{ "prop":"1", "ignoreprop":{ "a": { "key": "value", "foo": false }}}
{ "prop":"1", "ignoreprop":[1,2,3,4,5,6..... 1000000]}
From the documentation, the main purpose of this is "To ignore any unknown properties in JSON input without exception": it is better not to raise an exception when a property is found in the JSON but not in the class, and this may also make processing slightly faster (see the docs).
Example:
// to prevent specified fields from being serialized or deserialized
// (i.e. not include in JSON output; or being set even if they were included)
@JsonIgnoreProperties({ "internalId", "secretKey" })
// To ignore any unknown properties in JSON input without exception:
@JsonIgnoreProperties(ignoreUnknown=true)
Starting with 2.0, this annotation can be applied both to classes and to properties. If used for both, actual set will be union of all ignorals: that is, you can only add properties to ignore, not remove or override. So you can not remove properties to ignore using per-property annotation.
