Using Alchemy Entity Extraction to retrieve JSON output - java

I am running the EntityTest.java file from the Alchemy API Java SDK which can be found here. The program works just fine, but there seems to be no way to change the output format to JSON.
I tried executing this code:
// Create an AlchemyAPI object.
AlchemyAPI alchemyObj = AlchemyAPI.GetInstanceFromFile("api_key.txt");
// Force the output type to be JSON
AlchemyAPI_NamedEntityParams params = new AlchemyAPI_NamedEntityParams();
params.setOutputMode("json");
// Extract a ranked list of named entities for a web URL.
Document doc = alchemyObj.URLGetRankedNamedEntities("http://www.techcrunch.com/", params);
System.out.println(getStringFromDocument(doc));
But the code throws a RuntimeException and prints the following on the console:
Exception in thread "main" java.lang.RuntimeException: Invalid setting json for parameter outputMode
at com.alchemyapi.api.AlchemyAPI_Params.setOutputMode(AlchemyAPI_Params.java:42)
at com.alchemyapi.test.EntityTest.main(EntityTest.java:29)
Also, here is the setOutputMode method from the AlchemyAPI_Params.java file:
public void setOutputMode(String outputMode) {
if( !outputMode.equals(AlchemyAPI_Params.OUTPUT_XML) && !outputMode.equals(OUTPUT_RDF) )
{
throw new RuntimeException("Invalid setting " + outputMode + " for parameter outputMode");
}
this.outputMode = outputMode;
}
As is evident from the code, the only two acceptable output formats appear to be XML and RDF. Is that so? Is there no way to get the output in JSON?
Can anybody please help me with this?

You will need to add a new constant, OUTPUT_JSON, in AlchemyAPI_Params and modify the setOutputMode method to accept it.
After that, in AlchemyAPI, you will need to modify the doRequest method to handle the new OUTPUT_JSON case.
You can use:
http://www.oracle.com/technetwork/articles/java/json-1973242.html
to create the new content.
Hope it helps.
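A minimal sketch of the first step might look like the following. The class name OutputModeParams is a hypothetical stand-in for the SDK's AlchemyAPI_Params, and the constant values "xml" and "rdf" are assumptions; only "json" is confirmed by the question's error message. Note that the question's code receives an XML Document from the SDK, so the doRequest change is still needed separately and is not shown here.

```java
/** Hypothetical stand-in for AlchemyAPI_Params; only the output-mode check is shown. */
final class OutputModeParams {
    static final String OUTPUT_XML = "xml";   // assumed value
    static final String OUTPUT_RDF = "rdf";   // assumed value
    static final String OUTPUT_JSON = "json"; // the new constant

    private String outputMode = OUTPUT_XML;

    /** Accepts XML, RDF and (new) JSON; rejects anything else, as the original method does. */
    void setOutputMode(String outputMode) {
        if (!OUTPUT_XML.equals(outputMode)
                && !OUTPUT_RDF.equals(outputMode)
                && !OUTPUT_JSON.equals(outputMode)) {
            throw new RuntimeException("Invalid setting " + outputMode + " for parameter outputMode");
        }
        this.outputMode = outputMode;
    }

    String getOutputMode() {
        return outputMode;
    }
}
```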

I solved the problem by resorting to a completely different approach. Instead of using the already available Java SDK, I made an HTTP connection to the endpoint of URLGetRankedNamedEntities API, and retrieved the response.
Here is a code sample that demonstrates how to do this-
URL urlObj = new URL("http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities?apikey=" + API_KEY_HERE + "&url=http://www.smashingmagazine.com/2015/04/08/web-scraping-with-nodejs/&outputMode=json");
System.out.println(urlObj.toString() + "\n");
URLConnection connection = urlObj.openConnection();
connection.connect();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
StringBuilder builder = new StringBuilder();
while ((line = reader.readLine()) != null) {
builder.append(line + "\n");
}
System.out.println(builder);
Similar endpoints are available for other APIs as well, which can be found here.
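The sample above concatenates the target page URL into the query string without encoding it. That works for this particular URL, but a page URL containing characters such as & or # would corrupt the request. A small sketch of building the same request URL with URLEncoder follows; the class and method names are illustrative, not part of any SDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EntityRequestUrl {
    private static final String ENDPOINT =
            "http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities";

    /** Builds the request URL, percent-encoding each query parameter value. */
    static String build(String apiKey, String pageUrl) {
        try {
            return ENDPOINT
                    + "?apikey=" + URLEncoder.encode(apiKey, "UTF-8")
                    + "&url=" + URLEncoder.encode(pageUrl, "UTF-8")
                    + "&outputMode=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The result can be passed to new URL(...) exactly as in the sample above.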

Related

Parse a String java

I have a StringBuilder that contains the same result as shown at this link:
https://hadoop.apache.org/docs/current/hadoop-project-dist/
I want to extract the values of `pathSuffix` and return a list of Strings containing all the file names.
My code is:
try {
HttpURLConnection conHttp = (HttpURLConnection) url.openConnection();
conHttp.setRequestMethod("GET");
conHttp.setDoInput(true);
InputStream in = conHttp.getInputStream();
int ch;
StringBuilder sb = new StringBuilder();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
How can I parse the JSON to get all the values of pathSuffix and return a list of strings containing the file names?
Could you please give me a suggestion? Thanks
That is JSON-formatted data. JSON is not a regular language; therefore, parsing it with a regular expression is impossible, and parsing it with substring and friends will take you a week and be very error-prone.
Read up on what JSON is (no worries; it's very simple to understand!), then get a good JSON library (the standard json.org library absolutely sucks, don't get that one), such as Jackson or GSON, and the code to extract what you need will be robust and easy to write and test.
The good option
Do the following steps:
Parse the string into a JSON object
Get the array using (Gson API): jsonObject.get("FileStatuses").getAsJsonObject().get("FileStatus").getAsJsonArray()
Iterate over all objects in the array to get the value you want
The bad option
Although, as mentioned, it is not recommended, if you want to stay with Strings you can use:
// Assumes there is no whitespace around the colon in the JSON text.
String str_to_find = "\"pathSuffix\":\"";
while (str.indexOf(str_to_find) != -1) {
str = str.substring(str.indexOf(str_to_find) + str_to_find.length());
String value = str.substring(0, str.indexOf("\""));
System.out.println("Value is " + value);
}
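For completeness, the string-scanning idea can be wrapped into a method that returns a list, which is what the question asks for. This is only a sketch of the "bad option": it tolerates whitespace around the colon but does not handle escaped quotes inside values, so a real JSON library remains the safer choice. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class PathSuffixScanner {
    /** Collects every string value that follows a "pathSuffix" key in the given JSON text. */
    static List<String> extract(String json) {
        List<String> names = new ArrayList<>();
        String key = "\"pathSuffix\"";
        int pos = 0;
        while ((pos = json.indexOf(key, pos)) != -1) {
            int colon = json.indexOf(':', pos + key.length());
            if (colon == -1) break;
            int open = json.indexOf('"', colon + 1);   // opening quote of the value
            if (open == -1) break;
            int close = json.indexOf('"', open + 1);   // closing quote (escaped quotes not handled)
            if (close == -1) break;
            names.add(json.substring(open + 1, close));
            pos = close + 1;
        }
        return names;
    }
}
```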
I would not recommend to build from scratch an API binding for hadoop.
This binding exist already for the Java language:
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#listLocatedStatus-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.PathFilter-

Google drive api to get all children is not working if I dynamically pass fileId to query

I am using the Google Drive API to search for the children of a folder. In the search query I have to pass the file ID dynamically instead of hard-coding it. I tried the code below, but I get a "file not found" JSON response.
It does not seem to take fileId as a value; I think it is treated as a literal string.
If I hard-code the value, it works.
FileList result = service.files().list().setQ("name='testfile' ").execute();
for (com.google.api.services.drive.model.File file : result.getFiles()) {
System.out.printf("Found file: %s (%s)\n",
file.getName(), file.getId());
String fileId =file.getId();
FileList childern = service.files().list().setQ(" + \"file.getId()\" in parents").setFields("files(id, name, modifiedTime, mimeType)").execute();
This should help:
String fileId = file.getId();
service.files().list().setQ("'" + fileId + "' in parents").setFields("files(id, name, modifiedTime, mimeType)").execute();
Make sure file.getId() returns a valid ID.
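The fix comes down to building the query string from the variable's value rather than from its name. A small helper can make that explicit; the class and method names below are illustrative and not part of the Drive client library. The escaping of backslashes and single quotes follows the Drive search-query rules and is purely defensive here, since file IDs normally contain neither.

```java
final class DriveQueries {
    /** Builds a Drive search query matching files whose parent is the given folder ID. */
    static String childrenOf(String folderId) {
        // Escape backslashes first, then single quotes, so the ID cannot break out of the literal.
        String escaped = folderId.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "' in parents";
    }
}
```

It would then be used as service.files().list().setQ(DriveQueries.childrenOf(file.getId())).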
I know your question asks about Java, but the only working sample I have is in C#. Another issue is that, as far as I know, PageStreamer.cs does not have an equivalent in the Java client library.
I am hoping that C# and Java are close enough that this might give you some ideas for getting it working in Java. My Java knowledge is quite basic, but I may be able to help you debug it if you want to try converting this.
try
{
// Initial validation.
if (service == null)
throw new ArgumentNullException("service");
// Building the initial request.
var request = service.Files.List();
// Applying optional parameters to the request.
request = (FilesResource.ListRequest)SampleHelpers.ApplyOptionalParms(request, optional);
var pageStreamer = new Google.Apis.Requests.PageStreamer<Google.Apis.Drive.v3.Data.File, FilesResource.ListRequest, Google.Apis.Drive.v3.Data.FileList, string>(
(req, token) => request.PageToken = token,
response => response.NextPageToken,
response => response.Files);
var allFiles = new Google.Apis.Drive.v3.Data.FileList();
allFiles.Files = new List<Google.Apis.Drive.v3.Data.File>();
foreach (var result in pageStreamer.Fetch(request))
{
allFiles.Files.Add(result);
}
return allFiles;
}
catch (Exception Ex)
{
throw new Exception("Request Files.List failed.", Ex);
}

JAVA Delete API with Array String Body

Sorry in advance for my Google-translated English.
I work with an API and I am writing Java software that uses it.
I need to perform a DELETE from that software.
With the tool supplied to test the API, I can see that I have to put the line to remove in the request body, like this:
["email","Termine","13/03/2018 09:52:20",etc...,""].
The body must contain a String array with all the contents of the line to delete.
I can make it work in the test tool.
However, I cannot work out how to make a DELETE with a body in Java. This is what I have so far:
public static String delete(String json, String nomUrl) throws IOException {
URL url = new URL(baseUrl + "survey/"+ nomUrl + "/data");
//String json = "[\"Marc#Houdijk.nl\",\"Contacte\",\"10/04/2018 11:30:05\",\"Avoriaz\",\"Office de Tourisme\",\"Accueil OT\",\"Neerlandais\",\"Semaine 6\",\"Periode 2\",\"16\",\"\",\"Hiver 2018\",\"BJBR-CDQB\",\"04/12/2018 14:15:13\",\"04/12/2018 14:15:13\",\"04/12/2018 14:15:13\",\"\",\"Direct\",\"\",\"\",\"\"]\n";
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("DELETE");
con.setRequestProperty("Content-Type","application/json");
con.setRequestProperty("Accept","application/json");
con.setRequestProperty("Authorization","Bearer "+token);
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(json);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
StringBuilder responce = new StringBuilder();
responce.append("\nSending 'DELETE' request to URL : ").append(url);
responce.append("\nResponse Code : ").append(responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
responce.append("\n").append(inputLine);
}
in.close();
return responce.toString();
}
I was inspired by what I did for POST and GET, but I do not see how to correctly add a body containing my String array to my delete function, because it does not work, and the internet did not help me...
Thank you in advance for your help!
EDIT: Finally, my code works, so if you want to send a DELETE with a body, you can use this code. The problem came from the JSON: I'm French, so there were accents and special characters in my words. After cleaning my string, everything works.
You can create a POJO class with the fields required by the request body and send it to the API by serializing the object (serialization here means converting Java objects into JSON, which can be done with the GSON library). On the API side you can easily get the ArrayList or whatever you need: create the same POJO class on the server side, and the request body will be deserialized from JSON into that class. Through an object of that class you can then read whichever fields you want. Hope this helps.
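The EDIT in the question points at accented characters. One plausible cause (an assumption; the question does not confirm it) is DataOutputStream.writeBytes, which writes only the low eight bits of each char and therefore does not produce valid UTF-8 for characters such as é. The sketch below compares the two ways of turning the body into bytes; the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BodyEncoding {
    /** Bytes produced by DataOutputStream.writeBytes: one byte per char, high byte discarded. */
    static byte[] viaWriteBytes(String body) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Bytes produced by encoding the body explicitly as UTF-8. */
    static byte[] asUtf8(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }
}
```

In the question's method, this would mean replacing wr.writeBytes(json) with wr.write(json.getBytes(StandardCharsets.UTF_8)), so that accented strings no longer need to be cleaned first.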

how to exclude tag from XML String in java

I am writing a piece of code to send and receive data to and from a web page. I am doing this in Java. But when I receive the XML data, it is still between tags, like this:
<?xml version='1.0'?>
<document>
<title> TEST </title>
</document>
How can I get the data without the tags in Java?
This is what I tried. The function writes the data, then should get the response and use it in a System.out.println.
public static String User_Select(String username, String password) {
String mysql_type = "1"; // 1 = Select
try {
String urlParameters = "mysql_type=" + mysql_type + "&username=" + username + "&password=" + password;
URL url = new URL("http://localhost:8080/HTTP_Connection/index.php");
URLConnection conn = url.openConnection();
conn.setDoOutput(true);
OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream());
writer.write(urlParameters);
writer.flush();
String line;
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while ((line = reader.readLine()) != null) {
System.out.println(line);
//System.out.println("Het werkt!!");
}
writer.close();
reader.close();
return line;
} catch (IOException iox) {
iox.printStackTrace();
return null;
}
}
Thanks in advance
I would suggest simply using RegEx to read the XML, and get the tag content that you are after.
That simplifies what you need to do, and limits the inclusion of additional (unnecessary) libraries.
And then there are lots of Stack Overflow questions on this topic: "Regex for xml parsing" and "In RegEx, I want to find everything between two XML tags", just to mention two of them.
Use a DOM parser in Java.
Check the Java docs for further details.
Use an XML Parser to Parse your XML. Here is a link to Oracle's Tutorial
Oracle Java XML Parser Tutorial
Simply pass the InputStream from URLConnection
Document doc = DocumentBuilderFactory.
newInstance().
newDocumentBuilder().
parse(conn.getInputStream());
From there you could use XPath to query the contents of the document or simply walk the document model.
Take a look at the Java API for XML Processing (JAXP) for more details.
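As a sketch of the JAXP approach applied to the XML from the question, the following parses a string and returns the trimmed text of the first title element. The class and method names are illustrative; to read from the connection instead, conn.getInputStream() can be passed to parse directly, as shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class TitleExtractor {
    /** Returns the trimmed text of the first <title> element, or null if there is none. */
    static String titleOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList titles = doc.getElementsByTagName("title");
            if (titles.getLength() == 0) {
                return null;
            }
            return titles.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Parser configuration, I/O and malformed-XML errors are wrapped for brevity.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```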
You have to use an XML parser. In your case a good choice is jsoup, which fetches data from the web and parses XML and HTML. It loads the data, parses it, and gives you what you want. Here is an example of how it works:
1. XML from a URL
String xml = Jsoup.connect("http://localhost:8080/HTTP_Connection/index.php")
.get().toString();
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
String myTitle = doc.select("title").first().text(); // myTitle now contains TEST
Edit:
To send GET or POST parameters with your request, use this code:
String xml = Jsoup.connect("http://localhost:8080/HTTP_Connection/index.php")
.data("param1Name", "param1Value")
.data("param2Name", "param2Value").get().toString();
You can use get() to invoke the HTTP GET method or post() to invoke the HTTP POST method.
2. XML from a String
You can use jsoup to parse XML data in a String:
String xmlData="<?xml version='1.0'?><document> <title> TEST </title> </document>" ;
Document doc = Jsoup.parse(xmlData, "", Parser.xmlParser());
String myTitle = doc.select("title").first().text(); // myTitle now contains TEST

App engine Url request utf-8 characters becoming '??' or '???'

I have an error where I am loading data from a web-service into the datastore. The problem is that the XML returned from the web-service has UTF-8 characters and app engine is not interpreting them correctly. It renders them as ??.
I'm fairly sure I've tracked this down to the URL Fetch request. The basic flow is: Task queue -> fetch the web-service data -> put data into datastore so it definitely has nothing to do with request or response encoding of the main site.
I put log messages before and after Apache Digester to see if that was the cause, but determined it was not. This is what I saw in logs:
string from the XML: "Doppelg��nger"
After digester processed: "Doppelg??nger"
Here is my url fetching code:
public static String getUrl(String pageUrl) {
StringBuilder data = new StringBuilder();
log.info("Requesting: " + pageUrl);
for(int i = 0; i < 5; i++) {
try {
URL url = new URL(pageUrl);
URLConnection connection = url.openConnection();
connection.connect();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
data.append(line);
}
reader.close();
break;
} catch (Exception e) {
log.warn("Failed to load page: " + pageUrl, e);
}
}
String resp = data.toString();
if(resp.isEmpty()) {
return null;
}
return resp;
}
Is there a way I can force this to recognize the input as UTF-8? I tested the page I am loading, and the W3C validator recognized it as valid UTF-8.
The issue is only on app engine servers, it works fine in the development server.
Thanks
try
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
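The reason this helps is that an InputStreamReader created without a charset uses the JVM's default charset, which can differ between a development machine and the production servers. A self-contained sketch of the effect follows; the class and method names are illustrative, and US-ASCII merely stands in for a mismatched default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class CharsetDemo {
    /** Reads the whole stream as text, decoding it with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder data = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data.toString();
    }
}
```

Decoding the UTF-8 bytes of "Doppelgänger" with StandardCharsets.UTF_8 returns the original word, while decoding the same bytes with a mismatched charset such as US-ASCII replaces the two bytes of "ä" with replacement characters.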
I ran into the same issue three months back, Mike. It looks like your problem is the same as mine.
Let me recollect and write it down here. Feel free to add anything I miss.
My setup was Tomcat and Struts.
I resolved it through the correct configuration in Tomcat.
Basically, Tomcat itself has to support UTF-8 characters: set useBodyEncodingForURI on the connector. This covers GET parameters.
In addition, you can use a filter for POST parameters.
A good resource that collects all of this in one place is linked here.
I had a problem in production afterwards, where an Apache web server was forwarding requests to Tomcat :). UTF-8 had to be enabled there too. The moral of the story: resolve each problem as it comes :)
