Microdata Java Parser from HTML - java

I am trying to parse schema.org Microdata from HTML source pages using a Java library. After trying many sites, I found Apache Any23 and Mf2j on GitHub. I could not find proper documentation for Any23, so I set it aside. I then built the Mf2j project, got the jar, and created a sample project that runs Mf2j to parse Microdata. Here is the sample code I wrote.
package org.mypro;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Map;
import java.util.Properties;
import java.util.Map.Entry;
import com.kylewm.mf2j.Mf2Parser;
public class MySample {
/**
* @param args
* @throws URISyntaxException
* @throws IOException
*/
public static void main(String[] args) throws IOException, URISyntaxException {
// TODO Auto-generated method stub
Mf2Parser parser = new Mf2Parser()
.setIncludeAlternates(true)
.setIncludeRelUrls(true);
URL server = new URL("http://blogbasics.com/examples-of-blogs/");
Properties systemProperties = System.getProperties();
systemProperties.setProperty("http.proxyHost","pp.xyz.com");
systemProperties.setProperty("http.proxyPort","1234");
HttpURLConnection connection = (HttpURLConnection)server.openConnection();
connection.connect();
Map<String,Object> parsed = parser.parse(new URI("http://blogbasics.com/examples-of-blogs/"));
for (Entry<String, Object> string : parsed.entrySet()) {
System.out.println(string.getKey() +" = "+string.getValue());
}
}
}
But I got the output below. I got only image URLs, but I need all the Microdata metadata.
rels = {"Icon":["http://blogbasics.com/wp-content/uploads/cropped-Blog-Basics-favicon.png"],"Shortcut":["http://blogbasics.com/wp-content/uploads/cropped-Blog-Basics-favicon.png"],"apple-touch-icon-precomposed":["http://blogbasics.com/wp-content/uploads/apple-touch-icon-144x144.png","http://blogbasics.com/wp-content/uploads/apple-touch-icon-114x114.png","http://blogbasics.com/wp-content/uploads/apple-touch-icon-72x72.png","http://blogbasics.com/wp-content/uploads/apple-touch-icon-57x57.png"],"canonical":["http://blogbasics.com/examples-of-blogs/"],"external":["http://blogbasics.com","http://allisondduncan.com","http://www.kuldipsonsjewellers.com/Earring.html","http://jin6000.tumblr.com/","http://ckckarate.com","http://www.ferrypress.com/profile.php?u=Herman1879"],"nofollow":["http://bit.ly/1EQF6HG","https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/","http://blogbasics.com/examples-of-blogs/#comment-261","http://blogbasics.com","http://blogbasics.com/examples-of-blogs/#comment-262","http://allisondduncan.com","http://blogbasics.com/examples-of-blogs/#comment-270","http://blogbasics.com/examples-of-blogs/#comment-736","http://www.kuldipsonsjewellers.com/Earring.html","http://blogbasics.com/examples-of-blogs/#comment-1407","http://blogbasics.com/examples-of-blogs/#comment-3036","http://blogbasics.com/examples-of-blogs/#comment-5682","http://blogbasics.com/examples-of-blogs/#comment-6877","http://blogbasics.com/examples-of-blogs/#comment-8615","http://jin6000.tumblr.com/","http://blogbasics.com/examples-of-blogs/#comment-8684","http://ckckarate.com","http://blogbasics.com/examples-of-blogs/#comment-18326","http://www.ferrypress.com/profile.php?u=Herman1879","http://blogbasics.com/examples-of-blogs/#comment-22883","http://blogbasics.com/examples-of-blogs/#comment-26672","http://blogbasics.com/examples-of-blogs/#respond","http://www.blogbasics.com","http://www.blogbasics.com/privacy","http://www.blogbasics.com/terms-conditions/","
http://www.blogbasics.com/contact/"],"publisher":["https://www.google.com/+Blogbasics"],"stylesheet":["http://blogbasics.com/wp-content/themes/tru/style.css?v=1427736009&ver=2.1.3","http://fonts.googleapis.com/css?family=Open+Sans%3A300%2C400italic%2C400%2C600%2C700%7CRokkitt&ver=1.5","http://optinskin.com/src-4/min/normalize.min.css?ver=4.3","http://blogbasics.com/wp-content/plugins/OptinSkin/skins/1/style.css?ver=4.3","http://blogbasics.com/wp-content/themes/tru/includes/lib/spyr_slidingshare/style.css?ver=0.9.3"]}
items = []
rel-urls = {"http://allisondduncan.com":{"rels":["external","nofollow"],"text":"Allison Duncan"},"http://bit.ly/1EQF6HG":{"rels":["nofollow"]},"http://blogbasics.com":{"rels":["external","nofollow"],"text":"Paul Odtaa"},"http://blogbasics.com/examples-of-blogs/":{"rels":["canonical"]},"http://blogbasics.com/examples-of-blogs/#comment-1407":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-18326":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-22883":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-261":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-262":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-26672":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-270":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-3036":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-5682":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-6877":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-736":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-8615":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-8684":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#respond":{"rels":["nofollow"],"text":"Cancel 
reply"},"http://blogbasics.com/wp-content/plugins/OptinSkin/skins/1/style.css?ver=4.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://blogbasics.com/wp-content/themes/tru/includes/lib/spyr_slidingshare/style.css?ver=0.9.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://blogbasics.com/wp-content/themes/tru/style.css?v=1427736009&ver=2.1.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-114x114.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-144x144.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-57x57.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-72x72.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/cropped-Blog-Basics-favicon.png":{"rels":["Shortcut","Icon"],"type":"image/x-icon"},"http://ckckarate.com":{"rels":["external","nofollow"],"text":"Ed JP"},"http://fonts.googleapis.com/css?family=Open+Sans%3A300%2C400italic%2C400%2C600%2C700%7CRokkitt&ver=1.5":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://jin6000.tumblr.com/":{"rels":["external","nofollow"],"text":"Edward Carty"},"http://optinskin.com/src-4/min/normalize.min.css?ver=4.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://www.blogbasics.com":{"rels":["nofollow"],"text":"Blog Basics"},"http://www.blogbasics.com/contact/":{"rels":["nofollow"],"text":"Contact"},"http://www.blogbasics.com/privacy":{"rels":["nofollow"],"text":"Privacy Policy"},"http://www.blogbasics.com/terms-conditions/":{"rels":["nofollow"],"text":"Terms and Conditions"},"http://www.ferrypress.com/profile.php?u=Herman1879":{"rels":["external","nofollow"],"text":"hgh xl"},"http://www.kuldipsonsjewellers.com/Earring.html":{"rels":["external","nofollow"],"text":"Jewellery Shop in 
Ranchi"},"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/":{"rels":["nofollow"],"text":"(Click Here)"},"https://www.google.com/+Blogbasics":{"rels":["publisher"]}}
I need the output in the form shown below, with data content like this, not in the JSON format above. I produced it with the JavaScript Microdata parser at http://foolip.org/microdatajs/live/, but I need the same kind of parser in Java. Please suggest one. Thanks.
"type": [ "http://schema.org/WebPage" ], "properties": { "mainContentOfPage": [ { "type": [ "http://schema.org/Blog" ], "properties": { "blogPost": [ { "type": [ "http://schema.org/BlogPosting" ], "properties": { "headline": [ "Examples of Blogs" ], "author": [ { "type": [ "http://schema.org/Person" ], "properties": { "name": [ "Kenneth Byrd" ] } } ],
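A note on the empty items list: as far as I can tell, Mf2j is a microformats2 parser, so it looks for class names such as h-entry, while schema.org Microdata is expressed with itemscope, itemtype and itemprop attributes. The following is only a minimal sketch of the attribute walk using the JDK's XML parser. It assumes the markup is well-formed XHTML (real pages usually need a lenient HTML parser first), and the class and method names (MicrodataSketch, extractItems) are made up for illustration. It also ignores itemref, itemid and value attributes such as content or href.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Mf2j or Any23.
class MicrodataSketch {

    /** Parses well-formed XHTML and returns every top-level Microdata item. */
    static List<Map<String, Object>> extractItems(String xhtml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<Map<String, Object>> items = new ArrayList<>();
            collectTopLevel(root, items);
            return items;
        } catch (Exception e) {
            throw new IllegalStateException("Input is not well-formed XHTML", e);
        }
    }

    // An element with itemscope but without itemprop starts a top-level item.
    private static void collectTopLevel(Element el, List<Map<String, Object>> out) {
        if (el.hasAttribute("itemscope") && !el.hasAttribute("itemprop")) {
            out.add(readItem(el));
            return;
        }
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                collectTopLevel((Element) n, out);
            }
        }
    }

    // Builds the {"type": [...], "properties": {...}} shape for one item.
    private static Map<String, Object> readItem(Element scope) {
        Map<String, List<Object>> props = new LinkedHashMap<>();
        readProperties(scope, props);
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("type", Arrays.asList(scope.getAttribute("itemtype").trim().split("\\s+")));
        item.put("properties", props);
        return item;
    }

    // Collects itemprop values below parent; a nested itemscope becomes a nested item.
    private static void readProperties(Element parent, Map<String, List<Object>> props) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) {
                continue;
            }
            Element child = (Element) n;
            boolean nested = child.hasAttribute("itemscope");
            if (child.hasAttribute("itemprop")) {
                Object value = nested ? readItem(child) : child.getTextContent().trim();
                props.computeIfAbsent(child.getAttribute("itemprop"), k -> new ArrayList<>()).add(value);
            }
            if (!nested) {
                readProperties(child, props); // keep looking for properties deeper in this item
            }
        }
    }
}
```

Calling MicrodataSketch.extractItems on a small XHTML fragment returns nested maps and lists in the same type/properties shape as the output of the JavaScript parser, which can then be printed or converted further.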

Related

JSON validation returns different results

I have created a minimal application in order to debug the following problem:
The application creates a Java object which includes a GeoJsonPoint. When the object is validated against an automatically created JSON schema, I get different results when starting the application multiple times. For example, I start the application 5 times and the result is "Json not valid!". When I start the application another time, I get the result "Json successfully validated".
The error message when the json is not valid tells me:
/geoPosition/coordinates: instance type (array) does not match any allowed primitive type (allowed: [object])
The following line is randomly returning a different json schema:
JsonNode fstabSchema = schemaFactory.createSchema(inputObj.getClass());
I do not understand why this happens randomly. Has anyone seen this behavior before?
To exclude dependency problems at runtime, I have created a jar with all dependencies (jar-with-dependencies).
Below are my files:
pom.xml dependencies:
<dependencies>
<dependency>
<groupId>com.github.reinert</groupId>
<artifactId>jjschema</artifactId>
<version>1.16</version>
</dependency>
<dependency>
<groupId>com.github.java-json-tools</groupId>
<artifactId>json-schema-validator</artifactId>
<version>2.2.12</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
<version>1.18.16</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-mongodb</artifactId>
<version>3.2.2</version>
</dependency>
</dependencies>
App.java:
package de.s2.json.test;
import java.util.ArrayList;
import com.github.fge.jsonschema.core.exceptions.ProcessingException;
import org.springframework.data.mongodb.core.geo.GeoJsonPoint;
public class App
{
public static void main( String[] args )
{
Address address = new Address();
address.setCountry("Deutschland");
GeoJsonPoint geoPoint = new GeoJsonPoint(12, 23);
address.setGeoPosition(geoPoint);
ArrayList<String> ret = null;
try {
ret = Toolbox.validateJson(address);
} catch (ProcessingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
if(ret != null) {
System.out.println("Json not valid!");
for (int i = 0; i < ret.size(); i++) {
System.out.println(ret.get(i));
}
} else {
System.out.println("Json successfully validated");
}
}
}
Toolbox.java:
package de.s2.json.test;
import java.util.ArrayList;
import java.util.Iterator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.reinert.jjschema.v1.JsonSchemaV4Factory;
import com.github.fge.jsonschema.core.exceptions.ProcessingException;
import com.github.fge.jsonschema.core.report.ProcessingMessage;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;
public class Toolbox {
public static <T> ArrayList<String> validateJson(T inputObj) throws ProcessingException {
com.github.reinert.jjschema.v1.JsonSchemaFactory schemaFactory = new JsonSchemaV4Factory();
schemaFactory.setAutoPutDollarSchema(true);
JsonNode fstabSchema = schemaFactory.createSchema(inputObj.getClass()); // <= here I get different results
final JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
final JsonSchema schema = factory.getJsonSchema(fstabSchema);
ObjectMapper objectMapper = new ObjectMapper();
JsonNode baseReceiptJson = objectMapper.convertValue(inputObj, JsonNode.class);
ProcessingReport report;
report = schema.validate(baseReceiptJson);
ArrayList<String> validationErrorDetails = new ArrayList<String>();
if (!report.isSuccess()) {
StringBuilder builder = new StringBuilder();
builder.append("Not all required fields are filled with data");
builder.append(System.getProperty("line.separator"));
for (Iterator<ProcessingMessage> i = report.iterator(); i.hasNext();) {
ProcessingMessage msg = i.next();
builder.append(msg.asJson().findValue("instance").findValue("pointer").toString());
builder.append(": ");
builder.append(msg.getMessage());
String detail = msg.asJson().findValue("instance").findValue("pointer").toString() + ": " + msg.getMessage();
detail = detail.replace("\"", "");
validationErrorDetails.add(detail);
builder.append("\n");
}
return validationErrorDetails;
}
return null;
}
}
Address.java:
package de.s2.json.test;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.github.reinert.jjschema.Attributes;
import org.springframework.data.mongodb.core.geo.GeoJsonPoint;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Address
{
@JsonProperty("country")
private String country = "";
@JsonProperty("geoPosition")
@Attributes(required=false, description="longitude and latitude (it is initialized with 0,0)")
private GeoJsonPoint geoPosition = new GeoJsonPoint(0, 0);
}
Thank you very much for your support!
Update 1:
Answer to Hiran Chaudhuri
If the validation fails, then the schema looks like this:
{
"type": "object",
"properties": {
"country": {
"type": "string"
},
"geoPosition": {
"type": "object",
"properties": {
"coordinates": {
"type": "object"
},
"TYPE": {
"type": "string"
},
"x": {
"type": "number"
},
"y": {
"type": "number"
}
},
"description": "longitude and latitude (it is initialized with 0,0 which is inside the ocean)"
}
},
"$schema": "http://json-schema.org/draft-04/schema#"
}
If it is successful, then it looks like this:
{
"type": "object",
"properties": {
"country": {
"type": "string"
},
"geoPosition": {
"type": "object",
"properties": {
"coordinates": {
"type": "array",
"items": {
"type": "number"
}
},
"TYPE": {
"type": "string"
},
"x": {
"type": "number"
},
"y": {
"type": "number"
}
},
"description": "longitude and latitude (it is initialized with 0,0 which is inside the ocean)"
}
},
"$schema": "http://json-schema.org/draft-04/schema#"
}
As the error message already indicates, the coordinates are typed as 'object' in one run and as 'array' in the other.
Update 2
As this issue is blocking my project, I have added an "ugly hack" that checks whether the schema contains a GeoJsonPoint. If it does, the hack patches the coordinates entry with the correct values.
package de.s2.json.test;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.github.reinert.jjschema.v1.JsonSchemaV4Factory;
import com.github.fge.jsonschema.core.exceptions.ProcessingException;
import com.github.fge.jsonschema.core.report.ProcessingMessage;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;
public class Toolbox {
public static <T> ArrayList<String> validateJson(T inputObj) throws ProcessingException {
com.github.reinert.jjschema.v1.JsonSchemaFactory schemaFactory = new JsonSchemaV4Factory();
schemaFactory.setAutoPutDollarSchema(true);
JsonNode fstabSchema = schemaFactory.createSchema(inputObj.getClass());
ArrayList<String> validationErrorDetails = new ArrayList<String>();
// *****************************************************************************************
// /!\ this is an ugly hack /!\
// Randomly the function createSchema() returns an 'array' or 'object' type for coordinates
// of the geoJsonPoint. The correct value should be array. The following code checks if a
// geoJsonPoint is inside the schema and overwrites it with the correct value.
// TODO: fix this issue correctly
// *****************************************************************************************
try {
String jsonCoordinate = "{\"type\": \"array\",\"items\": {\"type\": \"number\"}}";
ObjectMapper mapper = new ObjectMapper();
JsonNode jsonNodeCoordinate = mapper.readTree(jsonCoordinate);
JsonNode coordJsonNode = fstabSchema.findValue("geoPosition").get("properties");
ObjectNode coordObjNode = (ObjectNode) coordJsonNode;
coordObjNode.set("coordinates", jsonNodeCoordinate);
} catch (JsonProcessingException e) {
validationErrorDetails.add("Could not patch geoPosition");
return validationErrorDetails;
} catch (IOException e) {
validationErrorDetails.add("Could not patch geoPosition");
return validationErrorDetails;
} catch (NullPointerException e) {
// this means that geoPosition could not be found
// we do nothing ...
}
// ******************
final JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
final JsonSchema schema = factory.getJsonSchema(fstabSchema);
ObjectMapper objectMapper = new ObjectMapper();
JsonNode baseReceiptJson = objectMapper.convertValue(inputObj, JsonNode.class);
ProcessingReport report;
report = schema.validate(baseReceiptJson);
if (!report.isSuccess()) {
StringBuilder builder = new StringBuilder();
builder.append("Not all required fields are filled with data");
builder.append(System.getProperty("line.separator"));
for (Iterator<ProcessingMessage> i = report.iterator(); i.hasNext();) {
ProcessingMessage msg = i.next();
builder.append(msg.asJson().findValue("instance").findValue("pointer").toString());
builder.append(": ");
builder.append(msg.getMessage());
String detail = msg.asJson().findValue("instance").findValue("pointer").toString() + ": " + msg.getMessage();
detail = detail.replace("\"", "");
validationErrorDetails.add(detail);
builder.append("\n");
}
return validationErrorDetails;
}
return null;
}
}
Now that you have serialized the generated JSON schemas, we can compare them and see whether they are the same or how much they actually differ. Keep in mind that the method that generates a schema requires an input parameter.
I was never a friend of generating a schema based on actual data.
Why not strictly define the schema you want and let your code check whether your data really matches that structure? This way you can use the schema as a contract between application components, since it is documented and not generated on the fly.
This still does not answer the question of why this happens randomly. But it may be a better practice to follow for you and others reading along.

How to get the last entry of a field in JSON

I use an API from api.nasa.gov that I query using the following Java class to read magnitude information on the latest solar flare of the day.
The response is used to fill in a field on a weather station that informs about current solar hazards.
The problem is that this code only returns the "classType" (magnitude) of the first "flrID" (solar flare) event of the day. On July 3, for example, there was more than one event, and the most relevant event is the latest one.
I am trying to find out how to get the "classType" of the last "flrID" in the JSON string, given that the flrID text contains an unknown arbitrary time of day.
Secondly, I am not sure how to get event driven updates from NASA, other than from pushed email messages. I am allowed an API request every 3.6 seconds, so that will work, but I am inquiring if there is a less expensive method to get near real time updates from NASA.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.json.JSONException;
import org.json.JSONObject;
public class NasaFlrApiReader {
public static final String API_KEY = "DEMO_KEY";
private static String readAll(Reader rd) throws IOException {
StringBuilder sb = new StringBuilder();
int cp;
while ((cp = rd.read()) != -1) {
sb.append((char) cp);
}
return sb.toString();
}
public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
try (InputStream is = new URL(url).openStream()) {
BufferedReader rd = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
String jsonText = readAll(rd);
int i = jsonText.indexOf("{");
jsonText = jsonText.substring(i);
JSONObject json = new JSONObject(jsonText);
return json;
}
}
public static void main(String[] args) throws IOException, JSONException {
JSONObject json = readJsonFromUrl("https://api.nasa.gov/DONKI/FLR?startDate=2021-07-27-03&endDate=2021-07-03&api_key=" + API_KEY);
System.out.println("Class Type: " + json.get("classType"));
}
}
This is the JSON text that is returned from the above code:
[
{
"flrID": "2021-07-03T02:14:00-FLR-001",
"instruments": [
{
"displayName": "GOES-P: EXIS 1.0-8.0"
}
],
"beginTime": "2021-07-03T02:14Z",
"peakTime": "2021-07-03T02:31Z",
"endTime": "2021-07-03T02:39Z",
"classType": "C5.7",
"sourceLocation": "N23W75",
"activeRegionNum": 12838,
"linkedEvents": [
{
"activityID": "2021-07-03T03:48:00-CME-001"
}
],
"link": "https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/FLR/17197/-1"
},
{
"flrID": "2021-07-03T07:04:00-FLR-001",
"instruments": [
{
"displayName": "GOES-P: EXIS 1.0-8.0"
}
],
"beginTime": "2021-07-03T07:04Z",
"peakTime": "2021-07-03T07:17Z",
"endTime": "2021-07-03T07:22Z",
"classType": "M2.7",
"sourceLocation": "N23W78",
"activeRegionNum": 12838,
"linkedEvents": [
{
"activityID": "2021-07-03T08:00:00-CME-001"
}
],
"link": "https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/FLR/17189/-1"
},
{
"flrID": "2021-07-03T14:18:00-FLR-001",
"instruments": [
{
"displayName": "GOES-P: EXIS 1.0-8.0"
}
],
"beginTime": "2021-07-03T14:18Z",
"peakTime": "2021-07-03T14:29Z",
"endTime": "2021-07-03T14:34Z",
"classType": "X1.5",
"sourceLocation": "N23W80",
"activeRegionNum": 12838,
"linkedEvents": [
{
"activityID": "2021-07-03T14:48:00-CME-001"
}
],
"link": "https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/FLR/17201/-1"
},
{
"flrID": "2021-07-03T16:59:00-FLR-001",
"instruments": [
{
"displayName": "GOES-P: EXIS 1.0-8.0"
}
],
"beginTime": "2021-07-03T16:59Z",
"peakTime": "2021-07-03T17:03Z",
"endTime": "2021-07-03T17:14Z",
"classType": "M1.0",
"sourceLocation": "N23W82",
"activeRegionNum": 12838,
"linkedEvents": [
{
"activityID": "2021-07-03T17:36:00-CME-001"
}
],
"link": "https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/FLR/17208/-1"
}
]
The JSON contains an array, so during deserialization you need to use JSONArray and then get the last element of the array using length() - 1:
JSONArray jsonArr = new JSONArray(jsonText);
JSONObject lastObject = (JSONObject) jsonArr.get(jsonArr.length()-1);
If length() - 1 does not give you the latest record, you need to sort the JSONArray by the required field and then take the first or last element, depending on the sort order. You can check the thread below for sorting a JSONArray:
How can I sort a JSONArray in JAVA
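As a rough sketch of the "sort by the required field" idea: instead of relying on array order, you can pick the entry with the greatest beginTime. The example below uses only the JDK and assumes the beginTime and classType strings have already been read out of each JSONObject (for example with getString). The LatestFlare and Flare names are invented for this illustration.

```java
import java.time.OffsetDateTime;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; the field values would come from the parsed JSONArray.
class LatestFlare {

    /** Holds the two fields of one flare event that matter here. */
    static final class Flare {
        final String beginTime; // e.g. "2021-07-03T16:59Z"
        final String classType; // e.g. "M1.0"

        Flare(String beginTime, String classType) {
            this.beginTime = beginTime;
            this.classType = classType;
        }
    }

    /** Returns the classType of the flare with the latest beginTime, or null for an empty list. */
    static String latestClassType(List<Flare> flares) {
        return flares.stream()
                // OffsetDateTime.parse accepts ISO-8601 timestamps with optional seconds.
                .max(Comparator.comparing((Flare f) -> OffsetDateTime.parse(f.beginTime)))
                .map(f -> f.classType)
                .orElse(null);
    }
}
```

With the four events from the response above, this picks the 16:59Z event regardless of the order in which the API returns them.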

Manipulate a JSON file and parse it into an Object after manipulation is done - in memory

My main goal is to get the information out of a JSON file and manipulate it in memory (trying to solve syntax issues in memory). Afterward, parse the manipulated value into an Object with com.google.gson.JsonParser and leave the old File as it was in the beginning (with the syntax errors included).
I managed to manipulate the File with RandomAccessFile and Parse it afterward while using "new FileReader". But after running through the code I noticed that the "old" file was modified and that shouldn't be the case. I don't want to change the file itself but the content and parse it into an Object with JsonParser or whatever Parser there is available for this.
import com.google.gson.JsonArray;
import com.google.gson.JsonParser;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
public class JsonLoggingParser {
public static final void main(String[] args) {
String path = "C:\\CEC\\Dev\\logs\\BIMBO\\2019-04-12asdf.json";
parseJsonLogFile(path);
}
public static void parseJsonLogFile(String filePath){
try {
File f = new File(filePath);
RandomAccessFile randomAccessFile = new RandomAccessFile(f, "rw");
randomAccessFile.seek(0);
randomAccessFile.write("[{".getBytes());
randomAccessFile.seek(f.length());
randomAccessFile.write("]".getBytes());
randomAccessFile.close();
JsonParser jsonParser = new JsonParser();
Object object = jsonParser.parse(new FileReader(f));
JsonArray jsonArray = (JsonArray)object;
System.out.println(jsonArray.toString());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
This is what my JSON file looks like:
{
"788ad1bc-e9c8-4be5-b5b6-26ba011dc40e-request": {
"date": "2019-04-15 10:28:58.943",
"url": "/BIMBO/credit",
"handleType": "request",
"status": "200",
"request": "Hello There",
"response": "",
"performanceDuration": "",
"principalID": "123456789"
}
},{
"788ad1bc-e9c8-4be5-b5b6-26ba011dc40e-response": {
"date": "2019-04-15 10:28:59.125",
"url": "/BIMBO/credit",
"handleType": "response",
"status": "422",
"request": "",
"response": "Hello Back",
"performanceDuration": "218.696979ms",
"principalID": ""
}
}
Basically I need to add a "[" and "]" at the beginning/end of the file to solve the syntax issue.
The JSON is not malformed in any way; it is just a map, like Map<String, Object>. A TypeToken might help solve your problem:
gson.fromJson(reader, new TypeToken<HashMap<String, Object>>() {}.getType());
If you write to an existing file with a file writer, it overwrites the data in place.
Example data:
Hello world
If you write a K at position 0, it will look like this:
Kello world
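To keep the original file untouched, one option is to read it into a String and add the brackets in memory instead of writing through RandomAccessFile. This is a minimal sketch using only the JDK; the class and method names are invented, and the resulting String would then be handed to the parser (for example JsonParser.parseString in recent Gson versions, or new JsonParser().parse(text) in older ones).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
class InMemoryJsonFix {

    /** Wraps a comma-separated sequence of JSON objects in [ ] so it becomes a JSON array. */
    static String wrapAsArray(String raw) {
        return "[" + raw.trim() + "]";
    }

    /** Reads the file without opening it for writing and returns the wrapped text. */
    static String readWrappedAsArray(Path file) {
        try {
            String raw = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return wrapAsArray(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the file is only ever read, the syntax errors stay on disk exactly as they were, while the parser sees a valid array.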

Is it possible to read and write Parquet using Java without a dependency on Hadoop and HDFS?

I've been hunting around for a solution to this question.
It appears to me that there is no way to embed reading and writing Parquet format in a Java program without pulling in dependencies on HDFS and Hadoop. Is this correct?
I want to read and write on a client machine, outside of a Hadoop cluster.
I started to get excited about Apache Drill, but it appears that it must run as a separate process. What I need is an in-process ability to read and write a file using the Parquet format.
You can write the Parquet format outside a Hadoop cluster using the Java Parquet client API.
Here is sample Java code that writes a Parquet file to the local disk.
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.avro.AvroWriteSupport;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.schema.MessageType;
public class Test {
void test() throws IOException {
// Load the Avro schema and convert it to a Parquet schema.
final String schemaLocation = "/tmp/avro_format.json";
final Schema avroSchema = new Schema.Parser().parse(new File(schemaLocation));
final MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);
final WriteSupport<GenericRecord> writeSupport = new AvroWriteSupport(parquetSchema, avroSchema);
final String parquetFile = "/tmp/parquet/data.parquet";
final Path path = new Path(parquetFile);
// The original used undefined BLOCK_SIZE and PAGE_SIZE; the library defaults are assumed here.
ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);
// Build one record that matches the Avro schema and write it.
final GenericRecord record = new GenericData.Record(avroSchema);
record.put("id", 1);
record.put("age", 10);
record.put("name", "ABC");
record.put("place", "BCD");
parquetWriter.write(record);
parquetWriter.close();
}
}
avro_format.json:
{
"type":"record",
"name":"Pojo",
"namespace":"com.xx.test",
"fields":[
{
"name":"id",
"type":[
"int",
"null"
]
},
{
"name":"age",
"type":[
"int",
"null"
]
},
{
"name":"name",
"type":[
"string",
"null"
]
},
{
"name":"place",
"type":[
"string",
"null"
]
}
]
}
Hope this helps.

Elasticsearch FunctionScore query using Java API

I need to implement the function_score query below using the Java API. I couldn't find any official documentation for the function_score query in the Java API section of the Elasticsearch docs.
"function_score": {
"functions": [
{
"boost_factor": "3",
"filter": {
"terms" : {"course_cd" : ["writ100", "writ112", "writ113"] }
}
}
],
"query": {
"match" : {
"party_id" : "12"
}
}
}
Please help!
Thanks to Jörg
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import java.util.Arrays;
import static org.elasticsearch.client.Requests.searchRequest;
import static org.elasticsearch.index.query.FilterBuilders.termsFilter;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import static org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders.factorFunction;
import static org.elasticsearch.search.builder.SearchSourceBuilder.searchSource;
public class FunctionScoreTest {
public void testFunctionScore() {
SearchRequest searchRequest = searchRequest()
.source(searchSource().query(new FunctionScoreQueryBuilder(matchQuery("party_id", "12"))
.add(termsFilter("course_cd", Arrays.asList("writ100", "writ112", "writ113")), factorFunction(3.0f))));
}
}
