Boilerpipe - How do I output JSON? - java

I am using boilerpipe and it seems great, but I want to output JSON. I am using the Java version and testing in NetBeans as follows:
final URL url = new URL("http://mashable.com/2012/09/26/worlds-best-father-kickstarter-calendar");
System.out.println(ArticleExtractor.INSTANCE.getText(url));
Can anyone tell me how I go about this?

Boilerpipe does not come with a JSON serializer.
You can, however, do this (assuming you already extracted all data):
public String articleTextToJson(String article, String title, String sourceUrl) {
    // No article text means extraction failed: report an error object
    if (null == article) {
        return "{ \"error\" : { " +
               " \"message\" : \"Article did not extract\", " +
               " \"code\" : 1 " +
               " }, " +
               " \"status\" : \"error\" " +
               "}";
    }
    // Note: title, article and sourceUrl are inserted as-is, without JSON escaping
    return "{ \"response\" : { " +
           " \"title\" : \"" + title + "\", " +
           " \"content\" : \"" + article + "\", " +
           " \"source\" : \"" + sourceUrl + "\" " +
           " }, " +
           " \"status\" : \"success\" " +
           "}";
}
The tricky part will of course be getting the title...
Better yet, use a JSON library such as JSONObject (org.json), which also escapes quotes and newlines in the article text for you.
Hope that helps.
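If you stay with plain string concatenation, the extracted text has to be escaped first, because an article will almost certainly contain quotes and line breaks. Below is a minimal sketch using only the JDK; the class name JsonText and the method quote are made up for this example and are not part of boilerpipe.

```java
class JsonText {
    /** Returns s as a quoted JSON string literal with special characters escaped. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;   // double quote
                case '\\': sb.append("\\\\"); break;   // backslash
                case '\n': sb.append("\\n");  break;   // line feed
                case '\r': sb.append("\\r");  break;   // carriage return
                case '\t': sb.append("\\t");  break;   // tab
                default:
                    if (c < 0x20) {
                        // remaining control characters become 4-digit hex escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the extracted title and text
        String title = "World's \"Best\" Father";
        String article = "First line.\nSecond line.";
        System.out.println("{ \"title\" : " + quote(title)
                + ", \"content\" : " + quote(article) + " }");
    }
}
```

In articleTextToJson you would then append quote(title) instead of wrapping title in escaped quotes by hand.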

Related

How to validate array item types with networknt/json-schema-validator?

It's probably a very simple thing that I'm missing, but my json-schema-validator (networknt/json-schema-validator) doesn't validate the type of the items inside an array.
This Java-snippet should yield errors:
var schema = "{\n" +
" \"$schema\": \"http://json-schema.org/draft/2020-12/schema#\",\n" +
" \"type\": \"array\",\n" +
" \"items\": [\n" +
" {\n" +
" \"type\": \"integer\"\n" +
" }\n" +
" ]\n" +
"}";
var input = "[true,false,{},\"foo\"]";
var factory = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V202012);
var jSchema = factory.getSchema(schema);
var errors = jSchema.validate(new ObjectMapper().readTree(input));
System.out.println("Errors: '"+errors.stream().map(ValidationMessage::getMessage).collect(joining(", ")) +"'");
but 'errors' is empty. Over on www.jsonschemavalidator.net the same input is reported as invalid, as expected.
What obvious thing am I overlooking here?
Thanks!

Having syntax issue for Spring Mongo @Query while trying to validate null values coming from the front end

I am trying to receive data from the front end and pass it to my @Query annotation. If a value is null, the query should skip it; otherwise it should look for that value in the Mongo database.
So far I have tried this @Query:
@Query("{$and: [{"
+ " $or: [{"
+ " $where: '?0 == null'"
+ " }, {"
+ " name: '?0'"
+ " }]"
+ " }, {"
+ " $or: [{"
+ " $where: '?1 == null'"
+ " }, {"
+ " batch: '?1'"
+ " }]"
+ " }, {"
+ " $or: [{"
+ " $where: '?2 == null'"
+ " }, {"
+ " city: '?2'"
+ " }]"
+ " }]}")
For the above query I am getting the following syntax error:
SyntaxError: identifier starts immediately after numeric literal on server
I know the problem is in how I am writing the $where part, but I don't really understand it.
Thanks.

Parse list of list to a JSON object

I have the following data, which is a list of lists:
"segmentation": [[239.97,260.24,222.04,270.49,199.84,253.41,213.5,227.79,259.62,200.46,274.13,202.17,277.55,210.71,249.37,253.41,237.41,264.51,242.54,261.95,228.87,271.34]]
What I need to do is parse this into a JSON object without removing the inner pair of brackets.
I tried it with Jackson, but it fails with every data type I tried.
Do you have any idea how to handle this?
Parsing to a JsonNode will work. I think you are trying to parse invalid JSON: the fragment needs to be wrapped in braces to be a valid object. Check:
String value = "{\n" +
" \"segmentation\": [\n" +
" [\n" +
" 239.97,\n" +
" 260.24,\n" +
" 222.04,\n" +
" 270.49,\n" +
" 199.84,\n" +
" 253.41,\n" +
" 213.5,\n" +
" 227.79,\n" +
" 259.62,\n" +
" 200.46,\n" +
" 274.13,\n" +
" 202.17,\n" +
" 277.55,\n" +
" 210.71,\n" +
" 249.37,\n" +
" 253.41,\n" +
" 237.41,\n" +
" 264.51,\n" +
" 242.54,\n" +
" 261.95,\n" +
" 228.87,\n" +
" 271.34\n" +
" ]\n" +
" ]\n" +
"}";
JsonNode jsonNode = new ObjectMapper().readTree(value);

Java - Obtain text within script tag

How can I get the value https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35 from this script?
The first link value, https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886, just does not work.
String html ="<script>function getcookie(Name){\n" +
" var search=Name+\"=\";\n" +
" if(document.cookie.length>0){\n" +
" offset=document.cookie.indexOf(search)\n" +
" if(offset!=-1){\n" +
" offset+=search.length\n" +
" end=document.cookie.indexOf(\";\",offset)\n" +
" if(end==-1){end=document.cookie.length}\n" +
" return unescape(document.cookie.substring(offset, en))\n" +
" }\n" +
" }\n" +
"}\n" +
" var player = new Playerjs({\n" +
" \"id\":\"player\",\n" +
" \"poster\":\"https://media.example.com/img/2147414277.jpg\",\n" +
" \"file\":\"[SD (480p)]https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886 or https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35,[HD (720р)]https://example2.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886 or https://exampl3.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886&ip=190.9.117.35\",\n" +
" \"default_quality\":\"SD (480p)\"\n" +
" });\n" +
" function PlayerjsEvents(event,id,data){\n" +
" if(event==\"start\"){\n" +
" var xhttp = new XMLHttpRequest();\n" +
" xhttp.open(\"GET\", \"stat.php?id=2147414277\", true);\n" +
" xhttp.send();\n" +
" }\n" +
" if(event==\"end\"){\n" +
" \n" +
" }\n" +
" }</script>";
Pattern p = Pattern.compile("file\"",Pattern.DOTALL);
String url = "";
for (Element element : script) {
Matcher m = p.matcher(element.data());
if (m.find()){
url = m.group(1);
}
}
System.out.println(url);
I don't really understand how Java regex patterns work. I tried many times to find this URL but failed.
I would be grateful if someone could help with that, or at least give me a link to a Java regex guide for a newbie. Thanks.
It seems you are trying to extract URLs from JavaScript code using Java.
There are many regex playgrounds on the web you can try, such as:
http://buildregex.com
https://regex101.com
and of course Google Search.
For your specific case:
import java.util.*;
import java.util.regex.*;
public class HelloWorld {
public static void main(String[] args) {
String jsCode ="<script>function getcookie(Name){\n" +
" var search=Name+\"=\";\n" +
" if(document.cookie.length>0){\n" +
" offset=document.cookie.indexOf(search)\n" +
" if(offset!=-1){\n" +
" offset+=search.length\n" +
" end=document.cookie.indexOf(\";\",offset)\n" +
" if(end==-1){end=document.cookie.length}\n" +
" return unescape(document.cookie.substring(offset, en))\n" +
" }\n" +
" }\n" +
"}\n" +
" var player = new Playerjs({\n" +
" \"id\":\"player\",\n" +
" \"poster\":\"https://media.example.com/img/2147414277.jpg\",\n" +
" \"file\":\"[SD (480p)]https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886 or https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35,[HD (720р)]https://example2.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886 or https://exampl3.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886&ip=190.9.117.35\",\n" +
" \"default_quality\":\"SD (480p)\"\n" +
" });\n" +
" function PlayerjsEvents(event,id,data){\n" +
" if(event==\"start\"){\n" +
" var xhttp = new XMLHttpRequest();\n" +
" xhttp.open(\"GET\", \"stat.php?id=2147414277\", true);\n" +
" xhttp.send();\n" +
" }\n" +
" if(event==\"end\"){\n" +
" \n" +
" }\n" +
" }</script>";
List<String> urls = new ArrayList<String>();
// Matches http, https and file URLs; m.group() returns the whole match
String myUrlPattern = "((https?|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
Pattern p = Pattern.compile(myUrlPattern);
Matcher m = p.matcher(jsCode);
while (m.find()) {
urls.add(m.group());
}
for(String s: urls)
System.out.println(s);
}
}
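This prints the following (note the trailing comma on the third URL: the range +-= inside the character class also matches a comma):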
https://media.example.com/img/2147414277.jpg
https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886
https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35,
https://example2.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886
https://exampl3.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886&ip=190.9.117.35
Hope this helps!
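To pick out one specific URL rather than every URL in the script, one option is to capture only the value of the "file" property and then split it. The sketch below assumes the format seen in the question: entries separated by a comma before the next "[label]", and alternative links separated by " or ". The class name FileUrlExtractor and the shortened sample script are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileUrlExtractor {
    // Captures the string value of the "file" property in group 1
    private static final Pattern FILE_VALUE =
            Pattern.compile("\"file\"\\s*:\\s*\"([^\"]*)\"");
    // Matches a leading quality label such as "[SD (480p)]"
    private static final Pattern QUALITY_LABEL = Pattern.compile("^\\[[^\\]]*\\]");

    static List<String> extract(String script) {
        List<String> urls = new ArrayList<>();
        Matcher m = FILE_VALUE.matcher(script);
        if (!m.find()) {
            return urls; // no "file" property found
        }
        // Split on the comma that comes right before the next "[label]"
        for (String entry : m.group(1).split(",(?=\\[)")) {
            String withoutLabel = QUALITY_LABEL.matcher(entry).replaceFirst("");
            // Alternative links for the same quality are joined with " or "
            for (String url : withoutLabel.split("\\s+or\\s+")) {
                urls.add(url.trim());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the script text from the question
        String script = "var player = new Playerjs({\"id\":\"player\","
                + "\"file\":\"[SD (480p)]https://example.com/a.mp4?time=1"
                + " or https://example1.com/a.mp4?time=1&ip=1.2.3.4,"
                + "[HD (720p)]https://example2.com/720/a.mp4?time=1\"});";
        List<String> urls = extract(script);
        System.out.println(urls.get(1)); // second link of the first quality entry
    }
}
```

Splitting this way also avoids the trailing comma, because the comma is consumed as a separator instead of being matched as part of a URL.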

Json object, convert json array into a json object in java

I have a JSON array like this:
"values":[
{
"locale":"en_US",
"source_key":"book_format",
"value":"Hardback",
"display_attr_name":"Book Format",
"source_value":"Hardback",
"isPrimary":"true"
},
{
"isFacetValue":"true",
"facet_version":"1.1",
"locale":"en_US",
"value":"Hardcover"
}
]
I need to get only the distinct keys from the above JSON array:
{
"locale":"en_US",
"source_key":"book_format",
"value":"Hardback",
"display_attr_name":"Book Format",
"source_value":"Hardback",
"isPrimary":"true",
"isFacetValue":"true",
"facet_version":"1.1"
}
The output should be a JSON object.
You can do this with Josson (https://github.com/octomix/josson):
Josson josson = Josson.fromJsonString(
"{" +
" \"values\": [" +
" {" +
" \"locale\": \"en_US\"," +
" \"source_key\": \"book_format\"," +
" \"value\": \"Hardback\"," +
" \"display_attr_name\": \"Book Format\"," +
" \"source_value\": \"Hardback\"," +
" \"isPrimary\": \"true\"" +
" }," +
" {" +
" \"isFacetValue\": \"true\"," +
" \"facet_version\": \"1.1\"," +
" \"locale\": \"en_US\"," +
" \"value\": \"Hardcover\"" +
" }" +
" ]" +
"}");
JsonNode node = josson.getNode("values.mergeObjects()");
System.out.println(node.toPrettyString());
Output:
{
"locale" : "en_US",
"source_key" : "book_format",
"value" : "Hardcover",
"display_attr_name" : "Book Format",
"source_value" : "Hardback",
"isPrimary" : "true",
"isFacetValue" : "true",
"facet_version" : "1.1"
}
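Note that the output above keeps "Hardcover" for "value" (the later object wins), while the desired result in the question keeps "Hardback". If the first occurrence of each key should win, the merge itself is only a few lines. This is a minimal sketch with plain JDK maps, assuming the array has already been parsed into a list of maps by whatever JSON library you use; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeObjects {
    /** Merges all objects into one; the first value seen for a key is kept. */
    static Map<String, String> mergeFirstWins(List<Map<String, String>> objects) {
        // LinkedHashMap keeps keys in the order they were first encountered
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> object : objects) {
            for (Map.Entry<String, String> e : object.entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("locale", "en_US");
        first.put("value", "Hardback");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("isFacetValue", "true");
        second.put("value", "Hardcover");

        List<Map<String, String>> values = new ArrayList<>();
        values.add(first);
        values.add(second);

        System.out.println(mergeFirstWins(values));
    }
}
```

Replacing putIfAbsent with put gives the last-wins behaviour shown in the Josson output.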
