I am using boilerpipe and it seems great, but I want to output JSON. I am using the Java version and testing in NetBeans as follows:
final URL url = new URL("http://mashable.com/2012/09/26/worlds-best-father-kickstarter-calendar");
System.out.println(ArticleExtractor.INSTANCE.getText(url));
Can anyone tell me how I go about this?
Boilerpipe does not come with a JSON serializer.
You can, however, do this (assuming you already extracted all data):
public String articleTextToJson(String article, String title, String sourceUrl) {
if (null == article) {
return "{ \"error\" : { " +
" \"message\" : \"Article did not extract\", " +
" \"code\" : 1 " +
" }, " +
" \"status\" : \"error\" " +
"}";
}
return "{ \"response\" : { " +
" \"title\" : \"" + title + "\", " +
" \"content\" : \"" + article + "\", " +
" \"source\" : \"" + sourceUrl + "\" " +
" }, " +
" \"status\" : \"success\" " +
"}";
}
The tricky part will be of course getting the title...
Or, better yet, use a JSON serializer such as JSONObject.
Hope that helps.
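One thing to watch with a hand-built string like the one above: if the title or article text contains a double quote, a backslash, or a newline, the result is no longer valid JSON. Below is a minimal sketch of an escaping helper that uses only the JDK. The class and method names are made up for illustration, and a real JSON library will cover more edge cases than this does.

```java
class JsonEscapeSketch {

    // Escapes characters that may not appear raw inside a JSON string literal.
    // A null input is treated as an empty string (an assumption of this sketch).
    static String escapeJson(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \u00XX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Same response layout as the answer above, but with escaped values.
    static String articleToJson(String article, String title, String sourceUrl) {
        if (article == null) {
            return "{\"error\":{\"message\":\"Article did not extract\",\"code\":1},"
                    + "\"status\":\"error\"}";
        }
        return "{\"response\":{"
                + "\"title\":\"" + escapeJson(title) + "\","
                + "\"content\":\"" + escapeJson(article) + "\","
                + "\"source\":\"" + escapeJson(sourceUrl) + "\""
                + "},\"status\":\"success\"}";
    }

    public static void main(String[] args) {
        // Placeholder inputs; in practice these would come from the extractor.
        System.out.println(articleToJson("He said \"hi\"\nbye", "A title", "http://example.com/a"));
    }
}
```

The extracted text would be passed in as the first argument, for example the result of ArticleExtractor.INSTANCE.getText(url).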
It's probably a very simple thing that I'm missing, but my JSON schema validator (networknt/json-schema-validator) doesn't validate the type of the items inside an array.
This Java snippet should yield errors:
var schema = "{\n" +
" \"$schema\": \"http://json-schema.org/draft/2020-12/schema#\",\n" +
" \"type\": \"array\",\n" +
" \"items\": [\n" +
" {\n" +
" \"type\": \"integer\"\n" +
" }\n" +
" ]\n" +
"}";
var input = "[true,false,{},\"foo\"]";
var factory = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V202012);
var jSchema = factory.getSchema(schema);
var errors = jSchema.validate(new ObjectMapper().readTree(input));
System.out.println("Errors: '"+errors.stream().map(ValidationMessage::getMessage).collect(joining(", ")) +"'");
but errors is empty. Over on www.jsonschemavalidator.net, this input is reported as invalid, as expected.
What obvious thing am I overlooking here?
Thanks!
I am trying to receive data from the front end and pass it to my @Query annotation. If a value is null, the query should skip that condition; otherwise it should look for the value in the Mongo database.
So far I have tried this @Query:
@Query("{$and: [{"
+ " $or: [{"
+ " $where: '?0 == null'"
+ " }, {"
+ " name: '?0'"
+ " }]"
+ " }, {"
+ " $or: [{"
+ " $where: '?1 == null'"
+ " }, {"
+ " batch: '?1'"
+ " }]"
+ " }, {"
+ " $or: [{"
+ " $where: '?2 == null'"
+ " }, {"
+ " city: '?2'"
+ " }]"
+ " }]}")
For the above query I am getting the following syntax error:
SyntaxError: identifier starts immediately after numeric literal on server
I know the issue is in how I am writing the $where part, but I don't really understand it.
Thanks.
I have the following data, which is a list of lists:
"segmentation": [[239.97,260.24,222.04,270.49,199.84,253.41,213.5,227.79,259.62,200.46,274.13,202.17,277.55,210.71,249.37,253.41,237.41,264.51,242.54,261.95,228.87,271.34]]
What I need to do is parse this into a JSON object without losing the inner brackets.
I tried it with Jackson, but it fails with every data type I tried.
Do you have any idea how to handle this?
Parsing to a JsonNode will work. I think you are trying it with invalid JSON: the fragment has to be wrapped in an object. Check:
String value = "{\n" +
" \"segmentation\": [\n" +
" [\n" +
" 239.97,\n" +
" 260.24,\n" +
" 222.04,\n" +
" 270.49,\n" +
" 199.84,\n" +
" 253.41,\n" +
" 213.5,\n" +
" 227.79,\n" +
" 259.62,\n" +
" 200.46,\n" +
" 274.13,\n" +
" 202.17,\n" +
" 277.55,\n" +
" 210.71,\n" +
" 249.37,\n" +
" 253.41,\n" +
" 237.41,\n" +
" 264.51,\n" +
" 242.54,\n" +
" 261.95,\n" +
" 228.87,\n" +
" 271.34\n" +
" ]\n" +
" ]\n" +
"}";
JsonNode jsonNode = new ObjectMapper().readTree(value);
How can I get the value https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35 from this script?
The first link, https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886, just does not work.
String html ="<script>function getcookie(Name){\n" +
" var search=Name+\"=\";\n" +
" if(document.cookie.length>0){\n" +
" offset=document.cookie.indexOf(search)\n" +
" if(offset!=-1){\n" +
" offset+=search.length\n" +
" end=document.cookie.indexOf(\";\",offset)\n" +
" if(end==-1){end=document.cookie.length}\n" +
" return unescape(document.cookie.substring(offset, en))\n" +
" }\n" +
" }\n" +
"}\n" +
" var player = new Playerjs({\n" +
" \"id\":\"player\",\n" +
" \"poster\":\"https://media.example.com/img/2147414277.jpg\",\n" +
" \"file\":\"[SD (480p)]https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886 or https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35,[HD (720р)]https://example2.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886 or https://exampl3.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886&ip=190.9.117.35\",\n" +
" \"default_quality\":\"SD (480p)\"\n" +
" });\n" +
" function PlayerjsEvents(event,id,data){\n" +
" if(event==\"start\"){\n" +
" var xhttp = new XMLHttpRequest();\n" +
" xhttp.open(\"GET\", \"stat.php?id=2147414277\", true);\n" +
" xhttp.send();\n" +
" }\n" +
" if(event==\"end\"){\n" +
" \n" +
" }\n" +
" }</script>";
Pattern p = Pattern.compile("file\"",Pattern.DOTALL);
String url = "";
for (Element element : script) {
Matcher m = p.matcher(element.data());
if (m.find()){
url = m.group(1);
}
}
System.out.println(url);
I don't really understand how Java regex patterns work. I have tried to extract this URL many times, but failed.
I would be grateful if someone could help with that, or at least point me to a Java regex guide for a newbie. Thanks.
It seems you are trying to extract URLs from JS code using Java.
There are many regex playgrounds on the web you can try, such as:
http://buildregex.com
https://regex101.com
and, of course, Google Search.
For your specific case:
import java.util.*;
import java.util.regex.*;
import java.net.*;
public class HelloWorld{
public static void main(String []args){
String jsCode ="<script>function getcookie(Name){\n" +
" var search=Name+\"=\";\n" +
" if(document.cookie.length>0){\n" +
" offset=document.cookie.indexOf(search)\n" +
" if(offset!=-1){\n" +
" offset+=search.length\n" +
" end=document.cookie.indexOf(\";\",offset)\n" +
" if(end==-1){end=document.cookie.length}\n" +
" return unescape(document.cookie.substring(offset, en))\n" +
" }\n" +
" }\n" +
"}\n" +
" var player = new Playerjs({\n" +
" \"id\":\"player\",\n" +
" \"poster\":\"https://media.example.com/img/2147414277.jpg\",\n" +
" \"file\":\"[SD (480p)]https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886 or https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35,[HD (720р)]https://example2.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886 or https://exampl3.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886&ip=190.9.117.35\",\n" +
" \"default_quality\":\"SD (480p)\"\n" +
" });\n" +
" function PlayerjsEvents(event,id,data){\n" +
" if(event==\"start\"){\n" +
" var xhttp = new XMLHttpRequest();\n" +
" xhttp.open(\"GET\", \"stat.php?id=2147414277\", true);\n" +
" xhttp.send();\n" +
" }\n" +
" if(event==\"end\"){\n" +
" \n" +
" }\n" +
" }</script>";
List<String> urls = new ArrayList<String>();
String myUrlPattern = "((https?|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
Pattern p = Pattern.compile(myUrlPattern);
Matcher m = p.matcher(jsCode);
while (m.find()) {
urls.add(m.group());
}
for(String s: urls)
System.out.println(s);
}
}
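This prints the following (the third URL keeps its trailing comma, because the range +-= in the character class also matches ','):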
https://media.example.com/img/2147414277.jpg
https://example.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886
https://example1.com/2147414277.mp4?md5=OFvyZ55egHb4A5hUZJvSEQ&time=1580513886&ip=176.9.117.35,
https://example2.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886
https://exampl3.com/720/2147414277.mp4?md5=GXD4cKVnM5RVY363Uxn9ww&time=1580513886&ip=190.9.117.35
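If only one specific link is needed, such as the second SD link from the question, a capturing group can narrow the match. The question's pattern has no group at all, so m.group(1) has nothing to return. Here is a rough sketch, assuming the "file" value always has the shape [label]url or url,[label]url or url and that the URLs themselves contain no commas. The class name, method name, and the shortened sample URLs are stand-ins made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlayerFileUrls {

    // Group 1 captures everything between the quotes of the "file" property.
    private static final Pattern FILE_VALUE =
            Pattern.compile("\"file\"\\s*:\\s*\"([^\"]*)\"");

    // One entry looks like "[label]url or url": group 1 is the label,
    // group 2 is everything up to the next comma.
    private static final Pattern ENTRY = Pattern.compile("\\[([^\\]]+)\\]([^,]+)");

    // Returns quality label -> list of URLs, in the order they appear.
    static Map<String, List<String>> parse(String script) {
        Map<String, List<String>> byQuality = new LinkedHashMap<>();
        Matcher file = FILE_VALUE.matcher(script);
        if (!file.find()) {
            return byQuality; // no "file" property in this script
        }
        Matcher entry = ENTRY.matcher(file.group(1));
        while (entry.find()) {
            List<String> urls = new ArrayList<>();
            for (String url : entry.group(2).split("\\s+or\\s+")) {
                urls.add(url.trim());
            }
            byQuality.put(entry.group(1), urls);
        }
        return byQuality;
    }

    public static void main(String[] args) {
        // Abbreviated stand-in for the script text from the question.
        String script = "var player = new Playerjs({\"id\":\"player\","
                + "\"file\":\"[SD (480p)]https://example.com/1.mp4?t=1"
                + " or https://example1.com/1.mp4?t=1&ip=1.2.3.4,"
                + "[HD (720p)]https://example2.com/720/1.mp4?t=1"
                + " or https://exampl3.com/720/1.mp4?t=1&ip=1.2.3.4\","
                + "\"default_quality\":\"SD (480p)\"});";
        List<String> sd = parse(script).get("SD (480p)");
        System.out.println(sd.get(1)); // the second SD link
    }
}
```

Splitting on the comma first also avoids the trailing comma that the generic URL pattern picks up.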
Hope this helps!
I have a JSON array like this:
"values":[
{
"locale":"en_US",
"source_key":"book_format",
"value":"Hardback",
"display_attr_name":"Book Format",
"source_value":"Hardback",
"isPrimary":"true"
},
{
"isFacetValue":"true",
"facet_version":"1.1",
"locale":"en_US",
"value":"Hardcover"
}
]
I need to merge these into a single object that contains each distinct key only once:
{
"locale":"en_US",
"source_key":"book_format",
"value":"Hardback",
"display_attr_name":"Book Format",
"source_value":"Hardback",
"isPrimary":"true",
"isFacetValue":"true",
"facet_version":"1.1"
}
The output should be a JSON object.
One option is Josson (https://github.com/octomix/josson):
Josson josson = Josson.fromJsonString(
"{" +
" \"values\": [" +
" {" +
" \"locale\": \"en_US\"," +
" \"source_key\": \"book_format\"," +
" \"value\": \"Hardback\"," +
" \"display_attr_name\": \"Book Format\"," +
" \"source_value\": \"Hardback\"," +
" \"isPrimary\": \"true\"" +
" }," +
" {" +
" \"isFacetValue\": \"true\"," +
" \"facet_version\": \"1.1\"," +
" \"locale\": \"en_US\"," +
" \"value\": \"Hardcover\"" +
" }" +
" ]" +
"}");
JsonNode node = josson.getNode("values.mergeObjects()");
System.out.println(node.toPrettyString());
Output
{
"locale" : "en_US",
"source_key" : "book_format",
"value" : "Hardcover",
"display_attr_name" : "Book Format",
"source_value" : "Hardback",
"isPrimary" : "true",
"isFacetValue" : "true",
"facet_version" : "1.1"
}
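Note that in the output above "value" ends up as "Hardcover" (the later object wins), while the question's expected result shows "Hardback" (the earlier object wins). To make the difference concrete, here is a small sketch in plain Java, assuming the array elements have already been parsed into maps. The class and method names are hypothetical and not part of Josson.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeObjectsSketch {

    // Later objects overwrite earlier ones when a key repeats ("last wins").
    static Map<String, String> mergeLastWins(List<Map<String, String>> objects) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> o : objects) {
            merged.putAll(o);
        }
        return merged;
    }

    // The first value seen for a key is kept ("first wins").
    static Map<String, String> mergeFirstWins(List<Map<String, String>> objects) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> o : objects) {
            for (Map.Entry<String, String> e : o.entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("locale", "en_US");
        first.put("source_key", "book_format");
        first.put("value", "Hardback");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("isFacetValue", "true");
        second.put("locale", "en_US");
        second.put("value", "Hardcover");

        List<Map<String, String>> values = Arrays.asList(first, second);
        System.out.println(mergeLastWins(values).get("value"));  // Hardcover
        System.out.println(mergeFirstWins(values).get("value")); // Hardback
    }
}
```

A LinkedHashMap keeps each key at the position where it was first inserted, which is why "value" stays third in the merged result in both variants.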