I want to map structured data (microdata, JSON-LD) extracted from HTML text into a Java POJO. For the extraction I use the Apache Any23 library, configured with a JSONLDWriter that converts the structured data found in the HTML document into JSON-LD format.
This works as expected and gives me the following output:
[ {
"@graph" : [ {
"@id" : "_:node1gn1v4pudx1",
"@type" : [ "http://schema.org/JobPosting" ],
"http://schema.org/datePosted" : [ {
"@language" : "en-us",
"@value" : "Wed Jan 11 02:00:00 UTC 2023"
} ],
"http://schema.org/description" : [ {
"@language" : "en-us",
"@value" : "Comprehensive Job Description"
} ],
"http://schema.org/hiringOrganization" : [ {
"@language" : "en-us",
"@value" : "Org AG"
} ],
"http://schema.org/jobLocation" : [ {
"@id" : "_:node1gn1v4pudx2"
} ],
"http://schema.org/title" : [ {
"@language" : "en-us",
"@value" : "Recruiter (m/f/d)\n "
} ]
}, {
"@id" : "_:node1gn1v4pudx2",
"@type" : [ "http://schema.org/Place" ],
"http://schema.org/address" : [ {
"@id" : "_:node1gn1v4pudx3"
} ]
}, {
"@id" : "_:node1gn1v4pudx3",
"@type" : [ "http://schema.org/PostalAddress" ],
"http://schema.org/addressCountry" : [ {
"@language" : "en-us",
"@value" : "Company Country"
} ],
"http://schema.org/addressLocality" : [ {
"@language" : "en-us",
"@value" : "Company City"
} ],
"http://schema.org/addressRegion" : [ {
"@language" : "en-us",
"@value" : "Company Region"
} ]
}, {
"@id" : "https://career.company.com/job/Recruiter/",
"http://www.w3.org/1999/xhtml/microdata#item" : [ {
"@id" : "_:node1gn1v4pudx1"
} ]
} ],
"@id" : "https://career.company.com/job/Recruiter/"
} ]
Next I want to deserialize the JSON-LD object into a Java bean using Jackson. The POJO class should look something like this:
public class JobPosting {
private String datePosting;
private String hiringOrganization;
private String title;
private String description;
// Following members could be enclosed in a class too if easier
// Like class Place{private PostalAddress postalAddress;}
// private Place place;
private String addressCountry;
private String addressLocality;
private String addressRegion;
}
I would like to do it with the annotations provided by the Jackson library, but I struggle with a few things:
The @type value is wrapped in an array node
The actual data has an extra @value layer
Some objects only hold a reference to other objects in the graph via @id fields
How can I map these fields to my Java POJO properly?
The trick is to run the JSON-LD through a JSON-LD processor to get more developer-friendly JSON. The titanium-json-ld library provides such processors.
JsonDocument input = JsonDocument.of(jsonLdAsInputStream);
JsonObject frame = JsonLd.frame(input, URI.create("http://schema.org")).get();
The above code snippet resolves the references made via @id and resolves the JSON keys against the given IRI.
That leads to the following output, which is easy to parse with Jackson:
[{
"id": "_:b0",
"type": "JobPosting",
"datePosted": {
"@language": "en-us",
"@value": "Wed Jan 11 02:00:00 UTC 2023"
},
"description": {
"@language": "en-us",
"@value": "Comprehensive Job Description"
},
"hiringOrganization": {
"@language": "en-us",
"@value": "Org AG"
},
"jobLocation": {
"id": "_:b1",
"type": "Place",
"address": {
"id": "_:b2",
"type": "PostalAddress",
"addressCountry": {
"@language": "en-us",
"@value": "Company Country"
},
"addressLocality": {
"@language": "en-us",
"@value": "Company City"
},
"addressRegion": {
"@language": "en-us",
"@value": "Company Region"
}
}
},
"title": {
"@language": "en-us",
"@value": "Recruiter (m/f/d)\n "
}
}]
Looking at the elements you are interested in (for example the "datePosted" and "hiringOrganization" values), they are always labelled "@value" and contained in the array that corresponds to their name (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). They all live in a part of your JSON that can be read into a JsonNode in the following way:
JsonNode root = mapper.readTree(json)
.get(0)
.get("@graph")
.get(0);
So if you have a pojo like below:
@Data
public class JobPosting {
private String datePosted;
private String hiringOrganization;
}
and you want to retrieve the datePosted and hiringOrganization values, note that their relative position in the JSON is always the same, so they can be collected in a for loop:
JsonNode root = mapper.readTree(json)
.get(0)
.get("@graph")
.get(0);
String strSchema = "http://schema.org/";
String[] fieldNames = {"datePosted", "hiringOrganization"};
//creating a Map<String, String> that will be converted to the JobPosting obj
Map<String, String> map = new HashMap<>();
for (String fieldName: fieldNames) {
map.put(fieldName,
root.get(strSchema + fieldName)
.get(0)
.get("@value")
.asText()
);
}
JobPosting jobPosting = mapper.convertValue(map, JobPosting.class);
//it prints JobPosting(datePosted=Wed Jan 11 02:00:00 UTC 2023, hiringOrganization=Org AG)
System.out.println(jobPosting);
This requires some preprocessing to turn your graph with id pointers into a simplified tree before mapping it with Jackson:
Turn it into a tree by replacing the @id references with the actual objects themselves.
Flatten those troublesome object/array wrappers around @value.
Full code below, using Java 17 and a bit of recursion:
package org.example;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import static java.util.stream.Collectors.toMap;
class Main {
public static void main(String[] args) throws Exception {
var mapper = new ObjectMapper();
var node = mapper.readValue(new File("test.json"), Object.class);
// Build a lookup map of "@id" to the actual object.
var lookup = buildLookup(node, new HashMap<>());
// Replace "@id" references with the actual objects themselves
var referenced = lookupReferences(node, lookup);
// Flatten single-element arrays containing "@value" to just the "@value" itself
var flattened = flatten(referenced);
// Jackson should be able to understand our objects at this point, so convert them
var jobPostings =
mapper.convertValue(flattened, new TypeReference<List<RootObject>>() {}).stream()
.flatMap(it -> it.graph().stream())
.filter(it -> it instanceof JobPosting)
.map(it -> (JobPosting) it)
.toList();
System.out.println(jobPostings);
}
private static Map<String, Object> buildLookup(Object node, Map<String, Object> lookup) {
if (node instanceof List<?> list) {
for (var value : list) {
buildLookup(value, lookup);
}
} else if (node instanceof Map<?, ?> map) {
for (var value : map.values()) {
buildLookup(value, lookup);
}
if (map.size() > 1 && map.get("@id") instanceof String id) {
lookup.put(id, node);
}
}
return lookup;
}
private static Object lookupReferences(Object node, Map<String, Object> lookup) {
if (node instanceof List<?> list
&& list.size() == 1
&& list.get(0) instanceof Map<?, ?> map
&& map.size() == 1
&& map.get("@id") instanceof String id) {
return lookupReferences(lookup.get(id), lookup);
}
if (node instanceof List<?> list) {
return list.stream().map(value -> lookupReferences(value, lookup)).toList();
}
if (node instanceof Map<?, ?> map) {
return map.entrySet().stream()
.map(entry -> Map.entry(entry.getKey(), lookupReferences(entry.getValue(), lookup)))
.collect(toMap(Entry::getKey, Entry::getValue));
}
return node;
}
private static Object flatten(Object node) {
if (node instanceof List<?> list && list.size() == 1) {
if (list.get(0) instanceof String s) {
return s;
}
if (list.get(0) instanceof Map<?, ?> map) {
var value = map.get("@value");
if (value != null) {
return value;
}
}
}
if (node instanceof List<?> list) {
return list.stream().map(Main::flatten).toList();
}
if (node instanceof Map<?, ?> map) {
return map.entrySet().stream()
.map(entry -> Map.entry(entry.getKey(), flatten(entry.getValue())))
.collect(toMap(Entry::getKey, Entry::getValue));
}
return node;
}
}
@JsonIgnoreProperties(ignoreUnknown = true)
record RootObject(@JsonProperty("@graph") List<GraphObject> graph) {}
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type", defaultImpl = Ignored.class)
@JsonSubTypes({
@JsonSubTypes.Type(value = JobPosting.class, name = "http://schema.org/JobPosting"),
@JsonSubTypes.Type(value = Place.class, name = "http://schema.org/Place"),
@JsonSubTypes.Type(value = PostalAddress.class, name = "http://schema.org/PostalAddress"),
})
interface GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Ignored() implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record JobPosting(
@JsonProperty("http://schema.org/title") String title,
@JsonProperty("http://schema.org/description") String description,
@JsonProperty("http://schema.org/hiringOrganization") String hiringOrganization,
@JsonProperty("http://schema.org/datePosted") String datePosted,
@JsonProperty("http://schema.org/jobLocation") Place jobLocation)
implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Place(@JsonProperty("http://schema.org/address") PostalAddress address)
implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record PostalAddress(
@JsonProperty("http://schema.org/addressLocality") String locality,
@JsonProperty("http://schema.org/addressRegion") String region,
@JsonProperty("http://schema.org/addressCountry") String country)
implements GraphObject {}
I am looking for a solution that makes Jackson fall back to the toString method whenever it cannot serialize an object type.
Let me explain in more detail.
I have a class:
@AllArgsConstructor
public class TestJackson {
public String RequestId;
public AntiSerialize foo;
@JsonFormat(pattern = "yyyy-MM-dd HH:mm:ss")
public LocalDateTime dateRequest;
public Map<String, Object> headers;
private static class AntiSerialize {
@Override
public String toString() {
return "AntiSerialize " + ZonedDateTime.now().toEpochSecond();
}
}
public static TestJackson createSample() {
return new TestJackson(
"123",
new TestJackson.AntiSerialize(),
LocalDateTime.now(),
Map.of("content", 999,
"b3", new TestJackson.AntiSerialize(),
"b4", Arrays.asList(
new TestJackson.AntiSerialize(),
new TestJackson.AntiSerialize()
)
)
);
}
}
This is the code I use to test the serialization:
var OM = new ObjectMapper()
.configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false);
try {
//var f = TestJackson.createSample().foo;
//out.println(((Object)f).toString());
var json = OM.writerWithDefaultPrettyPrinter()
.writeValueAsString(TestJackson.createSample());
out.println(json);
} catch (Exception e) {
e.printStackTrace(out);
}
It prints
{
"RequestId" : "123",
"foo" : { },
"dateRequest" : "2022-08-04 09:04:14",
"headers" : {
"b3" : { },
"b4" : [ { }, { } ],
"content" : 999
}
}
But I expect:
{
"RequestId" : "123",
"foo" : "AntiSerialize 1659578741",
"dateRequest" : "2022-08-04 09:04:14",
"headers" : {
"b3" : "AntiSerialize 1659578752",
"b4" : [ "AntiSerialize 1659578763", "AntiSerialize 1659578774" ],
"content" : 999
}
}
Assume TestJackson is a third-party class that I can't modify.
AntiSerialize is also a private nested class.
Here is a simplified JSON payload that I want to be able to change (the original one is much longer)
{
"request": {
"jsonRequest": {
"Alpha": {
"Bravo": [
{
"Delta": "something"
}
],
"Desert": [
{
"id": 1,
"name": "Lena",
"age": "25",
"city": "Florida",
"street": "Florida Street"
},
{
"id": 2,
"name": "Moa",
"age": "21",
"city": "Mexico",
"street": "Mexico Street"
},
{
"id": 3,
"name": "Nils",
"age": "29",
"city": "Tampas",
"street": "Tampas Street"
}
]
}
}
}
}
Most of the values should be hardcoded and never change. However, there are some fields I need to be able to modify before I send the payload as a PUT request.
So, here is what I did:
I created a Java class with a method that returns this String. Then I use String.format() to fill in the fields that vary.
public class myClass {
/* The string I want to send - note the %s which is the sub-strings that I want to be able to modify */
private String payload = "{\r\n\"request\":{\r\n\"jsonRequest\":{\r\n\"Alpha\":{\r\n\"Bravo\":[\r\n{\r\n\"Delta\":\"something\"\r\n}\r\n],\r\n\"Desert\":[\r\n{\r\n\"id\":1,\r\n\"name\":\"%s\",\r\n\"age\":\"25\",\r\n\"city\":\"Florida\",\r\n\"street\":\"%s\"\r\n},\r\n{\r\n\"id\":2,\r\n\"name\":\"Moa\",\r\n\"age\":\"21\",\r\n\"city\":\"Mexico\",\r\n\"street\":\"%s\"\r\n},\r\n{\r\n\"id\":3,\r\n\"name\":\"Nils\",\r\n\"age\":\"29\",\r\n\"city\":\"Tampas\",\r\n\"street\":\"Tampas Street\"\r\n}\r\n]\r\n}\r\n}\r\n}\r\n}";
// Method to return this string
public String getPayload() {return this.payload;}
}
Then in the main class, I call this method like this:
String temp = new myClass().getPayload();
String payload = String.format(temp, "Lena", "Florida Street", "Mexico Street");
This solves the issue, but I think I've stepped on too many toes with my approach. I would gladly take advice on how to do this in a more efficient and readable way, because right now I am the only one who understands what is going on.
You can take a more structured approach with the Gson or Jackson libraries. Create your POJO class and annotate it with the library-specific annotations. How deeply you nest the POJOs depends on your design. The easiest way is to create a Map<String, Object> and fill it with what you want, as follows:
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
Map<String, Object> items = new LinkedHashMap<>();
items.put("request", Map.of(
"jsonRequest", Map.of(
"Alpha", Map.of(
"Bravo", List.of("Delta", "something"),
"Desert", List.of(
Map.of("id", 1),
Map.of("id", 2),
Map.of("id", 3),
Map.of("id", 4)
)
)
)));
System.out.println(mapper.writeValueAsString(items));
The ObjectMapper will produce something like the below:
{
"request" : {
"jsonRequest" : {
"Alpha" : {
"Bravo" : [ "Delta", "something" ],
"Desert" : [ {
"id" : 1
}, {
"id" : 2
}, {
"id" : 3
}, {
"id" : 4
} ]
}
}
}
}
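One caveat if some fields have to change after the structure is built: Map.of and List.of return unmodifiable collections, so calling put on them throws UnsupportedOperationException. The following is a minimal sketch, assuming plain LinkedHashMap and ArrayList instances are acceptable; the class MutablePayloadSketch and its helpers person, buildPayload and desert are hypothetical names introduced only for this illustration. The resulting map can be passed to ObjectMapper.writeValueAsString in the same way as above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MutablePayloadSketch {
    // Hypothetical helper: builds one "Desert" entry as a mutable, insertion-ordered map.
    static Map<String, Object> person(int id, String name, String age, String city, String street) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("id", id);
        p.put("name", name);
        p.put("age", age);
        p.put("city", city);
        p.put("street", street);
        return p;
    }

    // Hypothetical helper: builds the whole payload with its default values.
    static Map<String, Object> buildPayload() {
        List<Map<String, Object>> desert = new ArrayList<>();
        desert.add(person(1, "Lena", "25", "Florida", "Florida Street"));
        desert.add(person(2, "Moa", "21", "Mexico", "Mexico Street"));
        desert.add(person(3, "Nils", "29", "Tampas", "Tampas Street"));

        Map<String, Object> alpha = new LinkedHashMap<>();
        // This part never changes, so an unmodifiable collection is fine here.
        alpha.put("Bravo", List.of(Map.of("Delta", "something")));
        alpha.put("Desert", desert);

        Map<String, Object> jsonRequest = new LinkedHashMap<>();
        jsonRequest.put("Alpha", alpha);
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("jsonRequest", jsonRequest);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("request", request);
        return root;
    }

    // Hypothetical helper: navigates request -> jsonRequest -> Alpha -> Desert.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> desert(Map<String, Object> payload) {
        Map<String, Object> request = (Map<String, Object>) payload.get("request");
        Map<String, Object> jsonRequest = (Map<String, Object>) request.get("jsonRequest");
        Map<String, Object> alpha = (Map<String, Object>) jsonRequest.get("Alpha");
        return (List<Map<String, Object>>) alpha.get("Desert");
    }

    public static void main(String[] args) {
        Map<String, Object> payload = buildPayload();
        // Change only the fields that vary between requests.
        desert(payload).get(0).put("name", "Lara");
        desert(payload).get(1).put("street", "Another Street");
        System.out.println(payload);
    }
}
```

Keeping the defaults in one builder method and overriding individual entries afterwards avoids counting %s placeholders, at the cost of a few unchecked casts when navigating the nested maps.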
The stricter way is to design your POJOs in more detail. Jackson's ObjectMapper is again used to serialize them.
static class Request {
private Map<String, JsonRequest> items = new LinkedHashMap<>();
@JsonAnyGetter
public Map<String, JsonRequest> getItems() {
return items;
}
public void addItem(String property, JsonRequest value) {
items.put(property, value);
}
}
static class JsonRequest {
private Map<String, List<JsonRequestItem>> items = new LinkedHashMap<>();
@JsonAnyGetter
public Map<String, List<JsonRequestItem>> getItems() {
return items;
}
public void addItem(String property, List<JsonRequestItem> value) {
items.put(property, value);
}
}
static class JsonRequestItem {
private Map<String, Object> items = new LinkedHashMap<>();
@JsonAnyGetter
public Map<String, Object> getItems() {
return items;
}
public void addItem(String property, Object value) {
items.put(property, value);
}
}
You can test the structure using the code below:
public static void main(String[] args) throws JsonProcessingException {
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
Request request = new Request();
JsonRequest alpha = new JsonRequest();
alpha.addItem("Bravo", List.of(
new JsonRequestItem() {{
addItem("Delta", "something");
}}
));
alpha.addItem("Desert", List.of(
new JsonRequestItem() {{
addItem("id", 1);
addItem("name", "Lena");
addItem("age", "25");
addItem("city", "Florida");
addItem("street", "Florida Street");
}},
new JsonRequestItem() {{
addItem("id", 2);
addItem("name", "Moa");
addItem("age", "21");
addItem("city", "Mexico");
addItem("street", "Mexico Street");
}},
new JsonRequestItem() {{
addItem("id", 3);
addItem("name", "Nils");
addItem("age", "29");
addItem("city", "Tampas");
addItem("street", "Tampas Street");
}}
));
request.addItem("Alpha", alpha);
System.out.println(mapper.writeValueAsString(request));
}
The result will then be similar to the following:
{
"Alpha" : {
"Bravo" : [ {
"Delta" : "something"
} ],
"Desert" : [ {
"id" : 1,
"name" : "Lena",
"age" : "25",
"city" : "Florida",
"street" : "Florida Street"
}, {
"id" : 2,
"name" : "Moa",
"age" : "21",
"city" : "Mexico",
"street" : "Mexico Street"
}, {
"id" : 3,
"name" : "Nils",
"age" : "29",
"city" : "Tampas",
"street" : "Tampas Street"
} ]
}
}
When I call my API with a request body, it is deserialized using the property names declared in my POJO. I modify the same list and return it, but the lists appear twice in the response.
@JsonSerialize
@JsonIgnoreProperties(ignoreUnknown = true)
public class UASchema {
@JsonProperty("metric_id")
private ArrayList<String> fMetricId;
@JsonProperty("schema")
private ArrayList<String> fSchema;
@JsonProperty("hash")
private String fHash;
...
...
//getter and setters
}
The request body is:
{
"data" : [
{
"metric_id": ["k1", "ak2", "d90"],
"schema": ["s1", "s2"]
},
{
"metric_id": ["k21", "k22"],
"schema": ["a11", "s22"]
}
]
}
The response I get is (with the hash added):
{
"result": [
{
"fmetricId": [
"k1",
"ak2",
"d90"
],
"fschema": [
"s1",
"s2"
],
"metric_id": [
"k1",
"ak2",
"d90"
],
"schema": [
"s1",
"s2"
],
"hash": "389abc9093442cfd2aee1f20807ba467"
},
{
"fmetricId": [
"k21",
"k22"
],
"fschema": [
"a11",
"s22"
],
"metric_id": [
"k21",
"k22"
],
"schema": [
"a11",
"s22"
],
"hash": "5f366dde65b69fa679f95a81f3115b7f"
}
]
}
The lists are duplicated and not serialized correctly. I want the response to contain the same lists as the request body, plus the hash I added.
It looks like either your algorithm duplicates the entries or your manually written getters and setters cause the duplicate output. By default, Jackson does not add extra entries. The example below shows how you can do it; I generated the getters and setters in the IDE. The f-prefix naming pattern for fields is outdated and you should use regular names. See, for example, Google's Java Style Guide:
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.io.File;
import java.util.List;
import java.util.UUID;
public class JsonApp {
public static void main(String[] args) throws Exception {
File jsonFile = new File("./resource/test.json").getAbsoluteFile();
ObjectMapper mapper = new ObjectMapper();
mapper.enable(SerializationFeature.INDENT_OUTPUT);
Root root = mapper.readValue(jsonFile, Root.class);
root.getData().forEach(s -> s.setfHash(UUID.randomUUID().toString()));
System.out.println(mapper.writeValueAsString(root));
}
}
class Root {
private List<UASchema> data;
public List<UASchema> getData() {
return data;
}
public void setData(List<UASchema> data) {
this.data = data;
}
@Override
public String toString() {
return "Root{" +
"data=" + data +
'}';
}
}
class UASchema {
@JsonProperty("metric_id")
private List<String> fMetricId;
@JsonProperty("schema")
private List<String> fSchema;
@JsonProperty("hash")
private String fHash;
public List<String> getfMetricId() {
return fMetricId;
}
public void setfMetricId(List<String> fMetricId) {
this.fMetricId = fMetricId;
}
public List<String> getfSchema() {
return fSchema;
}
public void setfSchema(List<String> fSchema) {
this.fSchema = fSchema;
}
public String getfHash() {
return fHash;
}
public void setfHash(String fHash) {
this.fHash = fHash;
}
@Override
public String toString() {
return "UASchema{" +
"fMetricId=" + fMetricId +
", fSchema=" + fSchema +
", fHash='" + fHash + '\'' +
'}';
}
}
Above code prints:
{
"data" : [ {
"metric_id" : [ "k1", "ak2", "d90" ],
"schema" : [ "s1", "s2" ],
"hash" : "80dcf06d-1d83-463c-afb8-edef8efdc71f"
}, {
"metric_id" : [ "k21", "k22" ],
"schema" : [ "a11", "s22" ],
"hash" : "a83d7981-4b80-4318-a632-f3c91d14379b"
} ]
}
I have a Java list whose data looks like this:
[
{ "type" : "Error", "name" : "xyz" },
{ "type" : "Success", "name" : "abc" },
{ "type" : "none", "name" : "prq" },
{ "type" : "Success", "name" : "" },
{ "type" : "Success", "name" : "xyz" },
{ "type" : "Warning", "name" : "efc" }
.
.
.
]
(This is only a partial representation.)
I also have an object representation like this:
public class Node {
List<String> errorNames;
List<String> SuccessNames;
List<String> WarningNames;
}
I want to use Java streams to separate the three kinds of names based on their type and add each name to the respective list.
What is the best way (Stream.filter, collect, map, or anything else) to split the list so that, at the end, all of the lists in Node hold the corresponding data?
Assuming your Node class actually looks something like this:
public class Node {
private String type;
private String name;
public Node(String type, String name) {
this.type = type;
this.name = name;
}
public String getType() {
return type;
}
public String getName() {
return name;
}
}
You can use Collectors#groupingBy in combination with Collectors#mapping to create a Map<String, List<String>> where the key is the type and the value is the List of names of every Node of that type:
var nodes = List.of(
new Node("Error", "xyz"), new Node("Success", "abc"),
new Node("none", "prq"), new Node("Success", ""),
new Node("Success", "xyz"), new Node("Warning", "efc")
);
var map = nodes.stream()
.collect(Collectors.groupingBy(Node::getType,
Collectors.mapping(Node::getName, Collectors.toList())));
System.out.println(map);
Output:
{Warning=[efc], Error=[xyz], none=[prq], Success=[abc, , xyz]}
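If the grouped names should end up in three separate lists, as in the question, the map can be unpacked afterwards. This is a minimal sketch under a few assumptions: the records Item and GroupedNames and the class NameGroupingSketch are hypothetical names chosen for the illustration, the type keys are matched case-sensitively, and a type with no entries yields an empty list.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NameGroupingSketch {
    // Hypothetical input element: one (type, name) pair from the list.
    record Item(String type, String name) {}

    // Hypothetical holder for the three target lists from the question.
    record GroupedNames(List<String> errorNames, List<String> successNames, List<String> warningNames) {}

    static GroupedNames group(List<Item> items) {
        // Group the names by their type in a single pass over the list.
        Map<String, List<String>> byType = items.stream()
                .collect(Collectors.groupingBy(Item::type,
                        Collectors.mapping(Item::name, Collectors.toList())));
        // Types other than the three below (for example "none") are simply ignored.
        return new GroupedNames(
                byType.getOrDefault("Error", List.of()),
                byType.getOrDefault("Success", List.of()),
                byType.getOrDefault("Warning", List.of()));
    }

    public static void main(String[] args) {
        var items = List.of(
                new Item("Error", "xyz"), new Item("Success", "abc"),
                new Item("none", "prq"), new Item("Success", ""),
                new Item("Success", "xyz"), new Item("Warning", "efc"));
        System.out.println(group(items));
    }
}
```

Grouping once and then reading the three keys avoids streaming over the list three times with separate filter calls.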
My repository implements the following interface:
QueryDslPredicateExecutor<Rule>
The structure of the MongoDB document (the Rule object) is:
{
"_id" : ObjectId("5aa924242a73bec4ce581717"),
"name" : "test-name",
"expressionGroups" : [
{
"type" : "WHEN",
"expressions" : [
{
"name" : "name1",
"values" : ["VAL1", "VAL2", "VAL3"]
},
{
"name" : "name2",
"values" : ["VAL4", "VAL5"]
}
]
},
{
"type" : "FOR",
"expressions" : [
{
"name" : "name3",
"values" : ["VAL6", "VAL7"]
}
]
}
]
}
I want to use the following method to find a particular rule in MongoDB:
Iterable<T> findAll(Predicate predicate);
I've prepared a mongo shell query:
db.rule.find({
'expressionGroups.expressions': {
$all: [
{
'$elemMatch': {
'name': "name1",
'values': "VAL2"
}
},
{
'$elemMatch': {
'name': "name3",
'values': "VAL7"
}
}
]
}
}).pretty()
How can I create a com.querydsl.core.types.Predicate based on the above query?
There is still no answer and I ran into the same issue, so I wrote the following code:
private List<Predicate> toPredicates(String root, Map<String, Object> map) {
List<Predicate> predicates = new ArrayList<>();
for (Map.Entry<String, Object> entry: map.entrySet()) {
String path = root+'.'+entry.getKey();
Object value = entry.getValue();
if (entry.getValue() instanceof Map) {
predicates.addAll(toPredicates(path, (Map<String, Object>) value));
} else {
predicates.add(new SimplePath(path).eq(value.toString()));
}
}
return predicates;
}
private static class SimplePath extends StringPath {
protected SimplePath(String var) {
super(var);
}
}
So you can parse the JSON into a Map, pass it to this method, and that is it.
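To see which dotted paths such a recursive walk produces, here is a minimal sketch that uses only the JDK and collects path/value pairs instead of building QueryDSL predicates. The class PathFlatteningSketch and the method flatten are hypothetical names for this illustration, and the input map is hand-built rather than parsed from JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PathFlatteningSketch {
    // Walks the nested map and records one "root.key.subkey" path per leaf value.
    static Map<String, String> flatten(String root, Map<String, Object> map) {
        Map<String, String> paths = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            String path = root + '.' + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map<?, ?> nested) {
                // Assumption: nested maps come from parsed JSON, so their keys are Strings.
                @SuppressWarnings("unchecked")
                Map<String, Object> typed = (Map<String, Object>) nested;
                paths.putAll(flatten(path, typed));
            } else {
                paths.put(path, String.valueOf(value));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        Map<String, Object> group = new LinkedHashMap<>();
        group.put("type", "WHEN");
        Map<String, Object> rule = new LinkedHashMap<>();
        rule.put("name", "test-name");
        rule.put("group", group);
        // Each entry corresponds to one equality predicate in the code above.
        flatten("rule", rule).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Each path/value pair maps to one eq predicate on a string path, so values held in lists are compared by their toString() form rather than matched element by element.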