Get token string from tokenID using Stanford Parser in GATE - java

I am trying to use some Java RHS to get the string value of dependent tokens using Stanford dependency parser in GATE, and add them as features of a new annotation.
I am having problems targeting just the 'dependencies' feature of the token, and getting the string value from the tokenID.
Using the code below and asking only for 'dependencies' also throws a Java NullPointerException:
for(Annotation lookupAnn : tokens.inDocumentOrder())
{
FeatureMap lookupFeatures = lookupAnn.getFeatures();
token = lookupFeatures.get("dependencies").toString();
}
I can use the call below to get all the features of a token,
gate.Utils.inDocumentOrder
but it returns all features, including the IDs of the dependent tokens, e.g.:
dependencies = [nsubj(8390), dobj(8394)]
I would like to get just each dependent token's string value from these token IDs.
Is there any way to access the dependent tokens' string values and add them as features of the annotation?
Many thanks for your help

Here is a working JAPE example. It only prints to GATE's message window (standard output); it doesn't create the new annotations with the features you asked for. Please finish it yourself...
The Stanford_CoreNLP plugin has to be loaded in GATE for this JAPE file to load. Otherwise you will get a class-not-found exception for the DependencyRelation class.
Imports: {
import gate.stanford.DependencyRelation;
}
Phase: GetTokenDepsPhase
Input: Token
Options: control = all
Rule: GetTokenDepsRule
(
{Token}
): token
-->
:token {
//note that tokenAnnots contains only a single annotation so the loop could be avoided...
for (Annotation token : tokenAnnots) {
Object deps = token.getFeatures().get("dependencies");
//sometimes the dependencies feature is missing - skip it
if (deps == null) continue;
//token.getFeatures().get("string") could be used instead of gate.Utils.stringFor(doc,token)...
System.out.println("Dependencies for token " + gate.Utils.stringFor(doc, token));
//the dependencies feature has to be typed to List<DependencyRelation>
List<DependencyRelation> typedDeps = (List<DependencyRelation>) deps;
for (DependencyRelation r : typedDeps) {
//use DependencyRelation.getTargetId() to get the id of the target token
//use inputAS.get(id) to get the annotation for its id
Annotation targetToken = inputAS.get(r.getTargetId());
//use DependencyRelation.getType() to get the dependency type
System.out.println(" " +r.getType()+ ": " +gate.Utils.stringFor(doc, targetToken));
}
}
}
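To finish the rule, the remaining step is to collect each relation type together with the string of its target token, so the result can be stored as features. Here is a minimal sketch of that lookup in plain Java, outside GATE. Relation and DependentStrings are hypothetical stand-ins (Relation plays the role of DependencyRelation, the map plays the role of inputAS.get(id) plus gate.Utils.stringFor), and the token strings are made up; only the IDs come from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DependentStrings {
    private DependentStrings() {}

    // Hypothetical stand-in for a dependency relation: a type plus a target token id.
    static final class Relation {
        final String type;
        final int targetId;

        Relation(String type, int targetId) {
            this.type = type;
            this.targetId = targetId;
        }
    }

    // Maps each relation type to the string of its target token.
    // Ids with no known token are skipped; a repeated type overwrites the earlier entry.
    static Map<String, String> resolve(List<Relation> deps, Map<Integer, String> tokenStringsById) {
        Map<String, String> features = new LinkedHashMap<>();
        for (Relation r : deps) {
            String target = tokenStringsById.get(r.targetId);
            if (target != null) {
                features.put(r.type, target);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Made-up token strings for the ids shown in the question.
        Map<Integer, String> tokens = new LinkedHashMap<>();
        tokens.put(8390, "dog");
        tokens.put(8394, "ball");
        List<Relation> deps = Arrays.asList(new Relation("nsubj", 8390), new Relation("dobj", 8394));
        System.out.println(resolve(deps, tokens)); // prints {nsubj=dog, dobj=ball}
    }
}
```

Inside the JAPE right-hand side, the resulting map would presumably be copied into the feature map of the new annotation instead of being printed.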

Related

Convert json to dynamically generated protobuf in Java

Given the following json response:
{
"id" : "123456",
"name" : "John Doe",
"email" : "john.doe@example.com"
}
And the following user.proto file:
message User {
string id = 1;
string name = 2;
string email = 3;
}
I would like to have the possibility to dynamically create the protobuf message class (compile a .proto at runtime), so that if the json response gets enhanced with a field "phone" : "+1234567890" I could just upload a new version of the protobuf file to contain string phone = 4 and get that field exposed in the protobuf response, without a service restart.
If I were to pull these classes from a hat, I would like to be able to write something along the lines of the following code.
import com.googlecode.protobuf.format.JsonFormat;
import com.googlecode.protobuf.Message;
import org.apache.commons.io.FileUtils;
...
public Message convertToProto(InputStream jsonInputStream){
// get the latest user.proto file
String userProtoFile = FileUtils.readFileToString("user.proto");
Message userProtoMessage = com.acme.ProtobufUtils.compile(userProtoFile);
Message.Builder builder = userProtoMessage.newBuilderForType();
new JsonFormat().merge(jsonInputStream, Charset.forName("UTF-8"), builder);
return builder.build();
}
Is there an existing com.acme.ProtobufUtils.compile(...) method, or how would I implement one? Running protoc and then loading the generated class seems like overkill, but I'm willing to do it if there is no other option...
You cannot compile the .proto file (at least not in Java). However, you can pre-compile the .proto into a descriptor (.desc) file:
protoc --descriptor_set_out=user.desc user.proto
and then use the DynamicMessage's parser:
DynamicMessage.parseFrom(Descriptors.Descriptor type, byte[] data)
Source: google groups thread

Conversation ID leads to unknown path in graph-api

I have code that fetches conversations and the messages inside them (a specific number of pages). It works most of the time, but for certain conversations it throws an exception, such as:
Exception in thread "main" com.restfb.exception.FacebookOAuthException: Received Facebook error response of type OAuthException: Unknown path components: /[id of the message]/messages (code 2500, subcode null)
at com.restfb.DefaultFacebookClient$DefaultGraphFacebookExceptionMapper.exceptionForTypeAndMessage(DefaultFacebookClient.java:1192)
at com.restfb.DefaultFacebookClient.throwFacebookResponseStatusExceptionIfNecessary(DefaultFacebookClient.java:1118)
at com.restfb.DefaultFacebookClient.makeRequestAndProcessResponse(DefaultFacebookClient.java:1059)
at com.restfb.DefaultFacebookClient.makeRequest(DefaultFacebookClient.java:970)
at com.restfb.DefaultFacebookClient.makeRequest(DefaultFacebookClient.java:932)
at com.restfb.DefaultFacebookClient.fetchConnection(DefaultFacebookClient.java:356)
at test.Test.main(Test.java:40)
After debugging I found the ID that doesn't work and tried to access it directly in the Graph API, which results in an "unknown path components" error. I also tried to find the conversation manually in me/conversations and click the next-page link in the Graph API Explorer, which led to the same error.
Is there a different way to retrieve a conversation than by ID? If not, could someone show me how to check first whether a conversation ID is valid, so that I can skip conversations I can't retrieve instead of getting an error? Here's my current code:
Connection<Conversation> fetchedConversations = fbClient.fetchConnection("me/Conversations", Conversation.class);
int pageCnt = 2;
for (List<Conversation> conversationPage : fetchedConversations) {
for (Conversation aConversation : conversationPage) {
String id = aConversation.getId();
//The line of code which causes the exception
Connection<Message> messages = fbClient.fetchConnection(id + "/messages", Message.class, Parameter.with("fields", "message,created_time,from,id"));
int tempCnt = 0;
for (List<Message> messagePage : messages) {
for (Message msg : messagePage) {
System.out.println(msg.getFrom().getName());
System.out.println(msg.getMessage());
}
if (tempCnt == pageCnt) {
break;
}
tempCnt++;
}
}
}
Thanks in advance!
Update: As a temporary solution I wrapped the problematic part in a try/catch. I also counted the occurrences, and it only affects 3 out of 53 conversations. I printed all the IDs, and these 3 are the only ones that contain a "/" symbol, so I'm guessing that has something to do with the exception.
The IDs that work look something like t_[text] (sometimes with a "." or a ":" symbol), and the ones that cause an exception always look like t_[text]/[text].
conv_id/messages is not a valid graph api call.
messages is a field of conversation.
Here is what you do (single call to api):
Connection<Conversation> conversations = facebookClient.fetchConnection("me/conversations", Conversation.class);
for (Conversation conv : conversations.getData()) {
// To get list of messages for given conversation
LinkedList<Message> allConvMessagesStorage = new LinkedList<Message>();
Connection<Message> messages25 = facebookClient.fetchConnection(conv.getId()+"/messages", Message.class);
//Add messages returned
allConvMessagesStorage.addAll(messages25.getData());
//Check if there is next page to fetch
boolean progress = messages25.hasNext();
while(progress){
messages25 = facebookClient.fetchConnectionPage(messages25.getNextPageUrl(), Message.class);
//Append next page of messages
allConvMessagesStorage.addAll(messages25.getData());
progress = messages25.hasNext();
}
}
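Going by the update in the question (only the IDs containing "/" fail), the affected conversations could also be filtered out before any request is made, instead of relying on try/catch. This is only a sketch of that heuristic in plain Java: ConversationIds and its methods are hypothetical names, the sample IDs are made up in the shapes described above, and the "/" rule is the asker's observation, not a documented Graph API rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConversationIds {
    private ConversationIds() {}

    // Heuristic only: ids containing "/" were the ones that triggered
    // "Unknown path components" in the question.
    static boolean looksFetchable(String id) {
        return id != null && !id.isEmpty() && !id.contains("/");
    }

    // Keeps the ids that pass the heuristic, preserving their order.
    static List<String> fetchable(List<String> ids) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (looksFetchable(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up ids in the shapes described in the question.
        List<String> ids = Arrays.asList("t_abc.123", "t_abc:456", "t_abc/def");
        System.out.println(fetchable(ids)); // prints [t_abc.123, t_abc:456]
    }
}
```

In the fetching loop, looksFetchable(aConversation.getId()) would be checked before calling fetchConnection, and the skipped IDs logged so they can be investigated separately.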

SYSTEM repository not showing repository details

On my machine, the base/data directory contains multiple repositories, but when I access this data directory from a Java program it gives me only the SYSTEM repository record.
Code to retrieve the repositories :
String dataDir = "D:\\SesameStorage\\data\\";
LocalRepositoryManager localManager = new LocalRepositoryManager(new File(dataDir));
localManager.initialize();
// Get all repositories
Collection<Repository> repos = localManager.getAllRepositories();
System.out.println("LocalRepositoryManager All repositories : "
+ repos.size());
for (Repository repo : repos) {
System.out.println("This is : " + repo.getDataDir());
RepositoryResult<Statement> idStatementIter = repo
.getConnection().getStatements(null,
RepositoryConfigSchema.REPOSITORYID, null,
true, new Resource[0]);
Statement idStatement;
try {
while (idStatementIter.hasNext()) {
idStatement = (Statement) idStatementIter.next();
if ((idStatement.getObject() instanceof Literal)) {
Literal idLiteral = (Literal) idStatement
.getObject();
System.out.println("idLiteral.getLabel() : "
+ idLiteral.getLabel());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
Output :
LocalRepositoryManager All repositories : 1
This is : D:\SemanticStorage\data\repositories\SYSTEM
idLiteral.getLabel() : SYSTEM
Adding repository to LocalRepositoryManager :
String repositoryName = "data.ttl";
RepositoryConfig repConfig = new RepositoryConfig(repositoryName);
SailRepositoryConfig config = new SailRepositoryConfig(new MemoryStoreConfig());
repConfig.setRepositoryImplConfig(config);
manager.addRepositoryConfig(repConfig);
Getting the repository object :
Repository repository = manager.getRepository(repositoryName);
repository.initialize();
I have successfully added a new repository to the LocalRepositoryManager, and the repository count goes up to 2. But when I restart the application it shows only one repository: the SYSTEM repository.
My SYSTEM repository is not getting updated. Please suggest how I should load that data directory into my LocalRepositoryManager object.
You haven't provided a comprehensive test case, just individual snippets of code with no clear indication of the order in which they get executed, which makes it somewhat hard to figure out what exactly is going wrong.
I would hazard a guess, however, that the problem is that you don't properly close and shut down resources. First of all you are obtaining a RepositoryConnection without ever closing it:
RepositoryResult<Statement> idStatementIter = repo
.getConnection().getStatements(null,
RepositoryConfigSchema.REPOSITORYID, null,
true, new Resource[0]);
You will need to change this to something like this:
RepositoryConnection conn = repo.getConnection();
try {
RepositoryResult<Statement> idStatementIter =
conn.getStatements(null,
RepositoryConfigSchema.REPOSITORYID, null,
true, new Resource[0]);
// ... do something with the result here ...
}
finally {
conn.close();
}
As an aside: if your goal is to retrieve repository meta-information (id, title, location), the above code is far too complex. There is no need to open a connection to the SYSTEM repository to read this information at all; you can obtain it directly from the RepositoryManager. For example, you can retrieve a list of repository identifiers simply by doing:
Collection<String> repoIds = localManager.getRepositoryIDs();
for (String id: repoIds) {
System.out.println("repository id: " + id);
}
Or if you want to also get the file location and/or description, use:
Collection<RepositoryInfo> infos = localManager.getAllRepositoryInfos();
for (RepositoryInfo info: infos) {
System.out.println("id: " + info.getId());
System.out.println("description: " + info.getDescription());
System.out.println("location: " + info.getLocation());
}
Another problem with your code is that I suspect you never properly call manager.shutDown() or repository.shutDown(). Calling these when your program exits allows the manager and the repository to properly close resources, save state, and exit gracefully. Since you are creating a RepositoryManager object yourself, you need to take care of this on program exit yourself as well.
An alternative to creating your own RepositoryManager object is to use a RepositoryProvider instead (see also the relevant section in the Sesame Programmers Manual). This is a utility class that comes with a built-in shutdown hook, saving you from having to deal with these manager/repository shutdown issues.
So instead of this:
LocalRepositoryManager localManager = new LocalRepositoryManager(new File(dataDir));
localManager.initialize();
Do this:
LocalRepositoryManager localManager =
RepositoryProvider.getRepositoryManager(new File(dataDir));
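If you do keep constructing the manager yourself, one way to make sure the shutDown() calls happen is a JVM shutdown hook. Below is a minimal sketch in plain Java. ShutdownHooks and Stoppable are hypothetical stand-ins, not Sesame classes; in real code the two lambdas would call manager.shutDown() and repository.shutDown().

```java
import java.util.ArrayList;
import java.util.List;

final class ShutdownHooks {
    private ShutdownHooks() {}

    // Hypothetical stand-in for anything that exposes a shutDown() method.
    interface Stoppable {
        void shutDown() throws Exception;
    }

    // Builds (but does not register) a thread that shuts the given resources
    // down in reverse order, so the last one passed in is closed first.
    static Thread hookFor(Stoppable... resources) {
        return new Thread(() -> {
            for (int i = resources.length - 1; i >= 0; i--) {
                try {
                    resources[i].shutDown();
                } catch (Exception e) {
                    // Keep going so one failure does not block the remaining shutdowns.
                    e.printStackTrace();
                }
            }
        }, "shutdown-hook");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Stoppable manager = () -> log.add("manager");
        Stoppable repository = () -> log.add("repository");
        // The hook runs when the JVM exits normally or is interrupted (e.g. Ctrl+C).
        Runtime.getRuntime().addShutdownHook(hookFor(manager, repository));
        System.out.println("hook registered");
    }
}
```

Passing the manager first and the repository second means the repository is shut down before the manager that owns it.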

createUserDefinedFunction : if already exists?

I'm using the azure-documentdb Java SDK to create and use User Defined Functions (UDFs).
From the official documentation I finally found how to create a UDF with the Java client:
String regexUdfJson = "{"
+ "id:\"REGEX_MATCH\","
+ "body:\"function (input, pattern) { return input.match(pattern) !== null; }\","
+ "}";
UserDefinedFunction udfREGEX = new UserDefinedFunction(regexUdfJson);
getDC().createUserDefinedFunction(
myCollection.getSelfLink(),
udfREGEX,
new RequestOptions());
And here is a sample query :
SELECT * FROM root r WHERE udf.REGEX_MATCH(r.name, "mytest_.*")
I only had to create the UDF once, because I get an exception if I try to recreate an existing UDF:
DocumentClientException: Message: {"Errors":["The input name presented is already taken. Ensure to provide a unique name property for this resource type."]}
How can I find out whether the UDF already exists?
I tried to use the readUserDefinedFunctions function without success. Any examples or other ideas?
Maybe, for the long term, we should suggest a createOrReplaceUserDefinedFunction(...) on Azure feedback.
You can check for existing UDFs by running a query with queryUserDefinedFunctions.
Example:
List<UserDefinedFunction> udfs = client.queryUserDefinedFunctions(
myCollection.getSelfLink(),
new SqlQuerySpec("SELECT * FROM root r WHERE r.id=@id",
new SqlParameterCollection(new SqlParameter("@id", myUdfId))),
null).getQueryIterable().toList();
if (udfs.size() > 0) {
// Found UDF.
}
An answer for .NET users.
var collectionAltLink = documentCollections["myCollection"].AltLink; // Target collection's AltLink
var udfLink = $"{collectionAltLink}/udfs/{sampleUdfId}"; // sampleUdfId is your UDF Id
var result = await _client.ReadUserDefinedFunctionAsync(udfLink);
var resource = result.Resource;
if (resource != null)
{
// The UDF with udfId exists
}
Here _client is Azure's DocumentClient and documentCollections is a dictionary of your documentDb collections.
If there's no such UDF in the mentioned collection, the _client throws a NotFound exception.

Using a resource loader for FileWritingMessageHandler

When using a directory-expression for an <int-file:outbound-gateway> endpoint, the method below is called on org.springframework.integration.file.FileWritingMessageHandler:
private File evaluateDestinationDirectoryExpression(Message<?> message) {
final File destinationDirectory;
final Object destinationDirectoryToUse = this.destinationDirectoryExpression.getValue(
this.evaluationContext, message);
if (destinationDirectoryToUse == null) {
throw new IllegalStateException(String.format("The provided " +
"destinationDirectoryExpression (%s) must not resolve to null.",
this.destinationDirectoryExpression.getExpressionString()));
}
else if (destinationDirectoryToUse instanceof String) {
final String destinationDirectoryPath = (String) destinationDirectoryToUse;
Assert.hasText(destinationDirectoryPath, String.format(
"Unable to resolve destination directory name for the provided Expression '%s'.",
this.destinationDirectoryExpression.getExpressionString()));
destinationDirectory = new File(destinationDirectoryPath);
}
else if (destinationDirectoryToUse instanceof File) {
destinationDirectory = (File) destinationDirectoryToUse;
} else {
throw new IllegalStateException(String.format("The provided " +
"destinationDirectoryExpression (%s) must be of type " +
"java.io.File or be a String.", this.destinationDirectoryExpression.getExpressionString()));
}
validateDestinationDirectory(destinationDirectory, this.autoCreateDirectory);
return destinationDirectory;
}
Based on this code I see that if the directory to use evaluates to a String, it uses that String to create a new java.io.File object.
Is there a reason that a ResourceLoader couldn't/shouldn't be used instead of directly creating a new file?
I ask because my expression was evaluating to a String of the form 'file://path/to/file/' which of course is an invalid path for the java.io.File(String) constructor. I had assumed that Spring would treat the String the same way as it treats the directory attribute on <int-file:outbound-gateway> and pass it through a ResourceLoader.
Excerpt from my configuration file:
<int-file:outbound-gateway
request-channel="inputChannel"
reply-channel="updateTable"
directory-expression="
'${baseDirectory}'
+
T(java.text.MessageFormat).format('${dynamicPathPattern}', headers['Id'])
"
filename-generator-expression="headers.filename"
delete-source-files="true"/>
Here baseDirectory is a per-environment property of the form 'file://hostname/some/path/'.
There's no particular reason that this is the case; it probably just wasn't considered at the time of implementation.
The request sounds reasonable to me and will benefit others (even though you have found a work-around), by providing simpler syntax. Please open an 'Improvement' JIRA issue; thanks.
While not directly answering the question, I wanted to post the workaround that I used.
In my XML configuration, I changed the directory-expression to evaluate to a file through the DefaultResourceLoader instead of a String.
So this is what my new configuration looked like:
<int-file:outbound-gateway
request-channel="inputChannel"
reply-channel="updateTable"
directory-expression=" new org.springframework.core.io.DefaultResourceLoader().getResource(
'${baseDirectory}'
+
T(java.text.MessageFormat).format('${dynamicPathPattern}', headers['Id'])).getFile()
"
filename-generator-expression="headers.filename"
delete-source-files="true"/>
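The underlying issue can be reproduced with the plain JDK: new File(String) treats a 'file:...' string as an ordinary path, while new File(URI) understands the scheme. The sketch below illustrates that difference only and is not how Spring resolves resources; DirectoryLocations and toDirectory are made-up names. Note that new File(URI) rejects URIs that carry a host part, such as 'file://hostname/some/path/', so the example uses the empty-host form 'file:///...'.

```java
import java.io.File;
import java.net.URI;

final class DirectoryLocations {
    private DirectoryLocations() {}

    // Converts a location string into a File, parsing "file:" URIs instead of
    // passing them to new File(String) as if they were plain paths.
    static File toDirectory(String location) {
        if (location.startsWith("file:")) {
            // Throws IllegalArgumentException if the URI has a host (authority) part.
            return new File(URI.create(location));
        }
        return new File(location);
    }

    public static void main(String[] args) {
        String location = "file:///tmp/outbound/";
        // The String constructor keeps "file:" as part of a relative path.
        System.out.println(new File(location).getPath());
        // The URI constructor drops the scheme and yields the real directory path.
        System.out.println(toDirectory(location).getPath());
    }
}
```

A string in the 'file://hostname/...' form would need extra handling (for example mapping it to a UNC path), which this sketch deliberately leaves out.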
