I've recently found out about UIMA (http://uima.apache.org/). It looks promising for simple NLP tasks, such as tokenizing, sentence splitting, part-of-speech tagging etc.
I've managed to get my hands on an already configured minimal Java sample that uses OpenNLP components for its pipeline.
The code looks like this:
public void ApplyPipeline() throws IOException, InvalidXMLException,
        ResourceInitializationException, AnalysisEngineProcessException {
    XMLInputSource in = new XMLInputSource(
            "opennlp/OpenNlpTextAnalyzer.xml");
    ResourceSpecifier specifier = UIMAFramework.getXMLParser()
            .parseResourceSpecifier(in);
    AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
    JCas jcas = ae.newJCas();
    jcas.setDocumentText("This is my text.");
    ae.process(jcas);
    this.doSomethingWithResults(jcas);
    jcas.reset();
    ae.destroy();
}
private void doSomethingWithResults(JCas jcas) {
    AnnotationIndex<Annotation> idx = jcas.getAnnotationIndex();
    FSIterator<Annotation> it = idx.iterator();
    while (it.hasNext()) {
        System.out.println(it.next().toString());
    }
}
Excerpt from OpenNlpTextAnalyzer.xml:
<delegateAnalysisEngine key="SentenceDetector">
<import location="SentenceDetector.xml" />
</delegateAnalysisEngine>
<delegateAnalysisEngine key="Tokenizer">
<import location="Tokenizer.xml" />
</delegateAnalysisEngine>
The java code produces output like this:
Token
sofa: _InitialView
begin: 426
end: 435
pos: "NNP"
I'm trying to get the same information from each Annotation object that the toString() method uses. I've already looked into UIMA's source code to understand where the values come from. My attempts to retrieve them sort of work, but they aren't smart in any way.
I'm struggling to find easy examples that extract information from the JCas objects.
I'm looking for a way to get, for instance, all annotations produced by my PosTagger or by the SentenceSplitter for further use.
I guess
List<Feature> feats = it.next().getType().getFeatures();
is a start for getting the values, but because UIMA has its own classes for primitive types, even the source code of the toString method in the Annotation class reads like a slap in the face.
Where can I find Java code that uses basic UIMA features, and where are good tutorials (other than the framework's own Javadoc)?
Generate JCas wrapper classes for your annotation types (you can do this using the type system editor UIMA plugin for Eclipse that comes with UIMA). This will provide you with Java classes that you can use to access the annotations - these offer getters and setters for features.
You should have a look at uimaFIT, which provides a more convenient API including convenience methods to retrieve annotations from the JCas, e.g. select(jcas, Token.class) (where Token.class is one of the classes you generated with the type system editor).
You could find some quick-starting Groovy scripts and a collection of UIMA components on the DKPro Core page.
There is material from the UIMA@GSCL 2013 tutorial (slides and sample code) which might be useful for you. Go here and scroll down to "Tutorial".
Disclosure: I'm a developer on UIMA, uimaFIT, and DKPro Core, and a co-organizer of the UIMA@GSCL 2013 workshop.
I'm going to use lots of custom documentation notes throughout the code base of a Kotlin and Java project. Annotations seem like a reasonable choice for this.
As far as I know, annotations are some sort of magic handled by build tools. But I need to work with them in the Kotlin/Java code itself.
So we have two files. The first is /src/some/thing.kt
package some
annotation class Doc
@Doc
private fun some_doc() {
println("some doc")
}
and /src/generate_docs.kt
fun main() {
    // Find and call all the functions over all
    // the codebase marked with @Doc
    // And then run some other code to process
    // those notes and output HTML docs
}
How could it be done? Basically I can do it by manually writing all those calls, but I hope there's a better way.
fun main() {
    some_doc()
    another_doc()
    yet_another_doc()
    // ...
    // couple tens or hundreds more lines
    // And then run some other code to process
    // those notes and output HTML docs
}
If possible I would like to avoid Maven plugins and magic and just have plain old Java/Kotlin code I can run as java GenerateDocsKt.
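One approach that needs no build plugin is plain runtime reflection: keep the annotation visible at run time, then walk the declared methods of the classes that hold the notes and invoke the ones that carry it. Below is a minimal sketch in Java. The names `Notes` and `DocRunner` are made up for illustration, and the classes to scan are passed in explicitly, because reflection alone cannot enumerate every class on the classpath. For the Kotlin file above, the top-level functions of thing.kt should end up in a class named `some.ThingKt`, so that is the class you would pass in.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Runtime retention keeps the annotation visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Doc {}

// Hypothetical holder of documentation notes (illustration only).
class Notes {
    @Doc
    private static String someDoc() { return "some doc"; }

    @Doc
    private static String anotherDoc() { return "another doc"; }

    private static String notADoc() { return "ignored"; }
}

class DocRunner {
    // Invokes every static, no-argument @Doc method on the given classes
    // and collects the returned values as strings.
    static List<String> collectDocs(Class<?>... classes) {
        List<String> out = new ArrayList<>();
        for (Class<?> c : classes) {
            Method[] methods = c.getDeclaredMethods();
            // getDeclaredMethods() has no guaranteed order, so sort by name.
            Arrays.sort(methods, Comparator.comparing(Method::getName));
            for (Method m : methods) {
                if (m.isAnnotationPresent(Doc.class)
                        && Modifier.isStatic(m.getModifiers())
                        && m.getParameterCount() == 0) {
                    try {
                        m.setAccessible(true); // the note methods are private
                        out.add(String.valueOf(m.invoke(null)));
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException("Cannot call " + m, e);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        collectDocs(Notes.class).forEach(System.out::println);
    }
}
```

The list of classes still has to come from somewhere (a hand-written list, or a classpath scan), which is the part this sketch leaves out.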
I am trying to create a plugin that generates some Java code and writes it back to the main source module. I was able to create a simple POJO class using JavaPoet and write it to src/main/java.
To make this useful, it should read the code from the src/main/java folder and analyze the classes using reflection: look for some annotation, then generate some code. Do I use a SourceTask for this? It looks like I can only access the classes as files. Is it possible to load the Java classes as classes and analyze them using reflection?
Since you specified what you want to do:
You'll need to implement an annotation processor. This has absolutely nothing to do with Gradle, and a Gradle plugin is actually the wrong way to go about this. Please look into Java annotation processors and come back with more questions if any come up.
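For orientation, here is a rough sketch of what such a processor can look like using only the standard `javax.annotation.processing` API. The annotation name `com.example.GenerateData`, the class name `DataClassProcessor`, and the generated class body are placeholders I made up; a real processor would inspect the annotated element's fields and would also need to be registered with the compiler.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// "com.example.GenerateData" is a hypothetical annotation name.
@SupportedAnnotationTypes("com.example.GenerateData")
class DataClassProcessor extends AbstractProcessor {

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                // Generated class goes into the unnamed package to keep the sketch short.
                String name = element.getSimpleName() + "Data";
                try (Writer writer = processingEnv.getFiler()
                        .createSourceFile(name, element).openWriter()) {
                    writer.write(renderSource(name));
                } catch (IOException e) {
                    processingEnv.getMessager().printMessage(
                            Diagnostic.Kind.ERROR, String.valueOf(e), element);
                }
            }
        }
        return true; // claim the annotation so no other processor handles it
    }

    // Placeholder source text; a real processor would emit fields and accessors.
    static String renderSource(String className) {
        return "public class " + className + " {}\n";
    }
}
```

The compiler calls `process` during compilation, so the generated sources are compiled in the same build without reflection over already-compiled classes.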
With JavaForger you can read input classes and generate source code based on them. It also provides an API to insert the generated code into existing classes or to create new classes based on the input file. In contrast to JavaPoet, JavaForger has a clear separation between the code to be generated and the settings for where and how to insert it. An example of a template for a POJO can look like this:
public class ${class.name}Data {
<#list fields as field>
private ${field.type} ${field.name};
</#list>
<#list fields as field>
public ${field.type} ${field.getter}() {
return ${field.name};
}
public void ${field.setter}(${field.type} ${field.name}) {
this.${field.name} = ${field.name};
}
</#list>
}
The example below uses a template called "myTemplate.javat" and adds some extra settings, like creating the file if it does not exist and changing the path where the file will be created from */path/* to */pathToPojo/*. The path to the input class is given so that the class name, fields, and more can be read from it.
JavaForgerConfiguration config = JavaForgerConfiguration.builder()
.withTemplate("myTemplate.javat")
.withCreateFileIfNotExists(true)
.withMergeClassProvider(ClassProvider.fromInputClass(s -> s.replace("path", "pathToPojo")))
.build();
JavaForger.execute(config, "MyProject/path/inputFile.java");
If you are looking for a framework that allows changing the code more programmatically, you can also look at JavaParser. With this framework you can construct an abstract syntax tree from a Java class and make changes to it.
Intro:
I'm asking this before I try, fail and get frustrated as I have 0 experience with Apache Ant. A simple 'yes this will work' may suffice, or if it won't please tell me what will.
Situation:
I'm working on a project that uses JavaFX to create a GUI. JavaFX relies on Java Bean-like objects that require a lot of boilerplate code for their properties. For example, all the functionality I want is a String called name with default value "Unnamed", or in minimal Java syntax:
String name = "Unnamed";
In JavaFX the minimum amount of code increases a lot to give the same functionality (where functionality in this case means to me that I can set and get a certain variable to use in my program):
private StringProperty name = new SimpleStringProperty("Unnamed");
public final String getName() { return name.get(); }
public final void setName(String value) { name.set(value); }
Question: Can I use Ant to generate this boilerplate code?
It seems possible to make Ant scripts that function as (Java) preprocessors. For instance by using the regex replace (https://ant.apache.org/manual/Tasks/replaceregexp.html) functions. I'm thinking of lines of code similar to this in my code, which then will be auto-replaced:
<TagToSignifyReplaceableLine> StringProperty person "Unnamed"
Final remark: As I've said before I have never used Ant before, so I want to check with you if 1) this can be done and 2) if this is a good way to do it or if there are better ways.
Thanks!
Yes, it's possible. You can even implement your own Ant task that does this job quite easily.
Something like so in ant:
<taskdef name="codegen" classpath="bin/" classname="com.example.CodeGen" />
and then
<codegen className="Test.java">
<Property name="StringProperty.name" value="Unnamed"/>
</codegen>
The CodeGen.java then like so:
public class CodeGen extends Task {
    private String className = null;
    private List<Property> properties = new ArrayList<>();
    public void setClassName(String className) {
        this.className = className;
    }
    /**
     * Called by Ant for every <property> tag of the task.
     *
     * @param property The property.
     */
    public void addConfiguredProperty(Property property) {
        properties.add(property);
    }
    @Override
    public void execute() throws BuildException {
        // here we go!
    }
}
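The execute() body is left open above. As a rough idea of the text generation it could do, here is a sketch in plain Java with no Ant dependency, so it can be tried on its own. The class and method names (`FxPropertyBoilerplate`, `stringProperty`) are hypothetical, it only handles String properties, and it emits `SimpleStringProperty` because `StringProperty` itself is abstract.

```java
import java.util.Locale;

final class FxPropertyBoilerplate {

    // Renders the field, getter, and setter for one JavaFX String property.
    static String stringProperty(String name, String defaultValue) {
        String cap = name.substring(0, 1).toUpperCase(Locale.ROOT) + name.substring(1);
        // Escape backslashes and quotes so the default is a valid Java literal.
        String literal = "\""
                + defaultValue.replace("\\", "\\\\").replace("\"", "\\\"")
                + "\"";
        return String.join("\n",
                "private StringProperty " + name
                        + " = new SimpleStringProperty(" + literal + ");",
                "public final String get" + cap + "() { return " + name + ".get(); }",
                "public final void set" + cap + "(String value) { " + name + ".set(value); }");
    }

    public static void main(String[] args) {
        System.out.println(stringProperty("name", "Unnamed"));
    }
}
```

An Ant task could call a method like this for each nested property element and write the result into the target source file; where exactly to insert it is the harder part and is not covered here.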
I know it can be done because my previous firm used ant to generate model objects in java.
The approach they used was to define model objects in an XML file and run an ant task to generate the pojo and dto.
I quickly googled and saw that there are tools that allow you to generate Java from XML. You could probably give your schema, default values, etc. in XML and have an Ant task run the tool.
I would look at JSR-269, and specifically at genftw, which makes JSR-269 easier...
And yes, it will work with Ant without even having to write a plugin, and it will work better than a brittle regex.
The other option, if you're really adventurous, is to check out Xtext for code generation, but it is rather complicated.
Yes, it can be done :-)
I once wrote a web services adapter that used a WSDL document (an XML file describing a SOAP-based web service) to generate the POJO Java class that implemented the functional interface to my product. What led me to do this was the mindlessly repetitive Java code that was necessary to talk to our proprietary system.
The technical solution used an XSLT stylesheet to transform the input XML document into an output Java text file which was subsequently compiled by ANT.
<!-- Generate the implementation classes -->
<xslt force="true" style="${resources.dir}/javaServiceStub.xsl" in="${src.dir}/DemoService.wsdl" out="${build.dir}/DemoService/src/com/myspotontheweb/DemoServiceSkeleton.java" classpathref="project.path">
<param name="package" expression="com.myspotontheweb"/>
..
..
</xslt>
Unfortunately XSLT is the closest thing to a templating engine supported by native ANT.
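To try the same idea outside of Ant, the JDK's own `javax.xml.transform` API can run a stylesheet with `method="text"` and a parameter. The sketch below is only an illustration: the input XML (a `service` element with `operation` children) and the stylesheet are made-up stand-ins for a real WSDL and javaServiceStub.xsl, and the class name `XsltCodeGen` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltCodeGen {

    // Hypothetical stylesheet: emits a Java interface with one method per <operation>.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:param name='package'/>"
          + "<xsl:template match='/service'>"
          + "package <xsl:value-of select='$package'/>;&#10;"
          + "public interface <xsl:value-of select='@name'/> {&#10;"
          + "<xsl:for-each select='operation'>"
          + "    void <xsl:value-of select='@name'/>();&#10;"
          + "</xsl:for-each>"
          + "}&#10;"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Transforms the service description into Java source text.
    static String generate(String serviceXml, String packageName) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setParameter("package", packageName); // same role as <param> in Ant
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(serviceXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<service name='DemoService'>"
                + "<operation name='ping'/><operation name='status'/></service>";
        System.out.print(generate(xml, "com.myspotontheweb"));
    }
}
```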
Best of luck!
I've defined a simple Xtext grammar which looks like this (simplified):
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
System:
'Define System'
(
'Define Components' '{' components+=Component+ '}'
)
'End'
;
Component:
'Component' name=ID 'Value' value=Double ';'
;
Double returns ecore::EDouble:
'-'? INT? '.' INT
;
The problem I'd like to solve is: how can I convert a simple Java object to a valid Xtext file?
To simplify my problem, let's say we create a list of components in Java:
List<Component> components = new ArrayList<Component>();
components.add(new Component("FirstComponent", 1.0));
components.add(new Component("SecondComponent", 2.0));
components.add(new Component("ThirdComponent", 3.0));
The output file I'd like to create should look like this:
Define System
Define Components {
Component FirstComponent Value 1.0;
Component SecondComponent Value 2.0;
Component ThirdComponent Value 3.0;
}
End
It is important that this file is checked against the Xtext grammar, so that it's valid.
I hope you have some ideas for me. Here are some of mine, but so far I don't know how to implement them:
Idea #1:
I know how to read and write a file. In my head one solution could look like this:
I have the list in my Java code, and now I'd like to write a file that looks like the output file above. Afterwards I'd like to read this file and check it for errors against the grammar. How can I do this?
Idea #2:
If I were creating an XML file from Java code using JDOM, I would build it up element by element, and I wish I could do the same in Xtext: just define a parent "Define System" which ends with "End" (see my output file), then add a child "Define Components {" which ends with "}", and then add the children to this, e.g. "Component FirstComponent Value 1.0;". I hope this isn't confusing :-)
Idea #3:
I could use a template like the following and add children between the braces "{" ... "}":
Define System
Define Components { ... }
End
Btw: I already tried Linking Xtext with StringTemplate code generator, but it is kind of another problem. Hope you have any ideas.
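For what it's worth, the writing half of Idea #1 and Idea #3 can be sketched in plain Java as below. The `Component` class here is a minimal stand-in for the one used in the question, and `DslWriter` is a hypothetical name. The number format is chosen so that the output always contains a decimal point and never uses exponent notation, since the Double rule in the grammar requires '.' followed by INT. This sketch only produces the text; checking it against the grammar still means loading it through Xtext.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Minimal stand-in for the Component class from the question.
final class Component {
    final String name;
    final double value;

    Component(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

final class DslWriter {

    // Renders the components in the concrete syntax shown in the question.
    static String render(List<Component> components) {
        // At least one fraction digit, '.' as separator, no exponent notation.
        DecimalFormat number = new DecimalFormat(
                "0.0##########", DecimalFormatSymbols.getInstance(Locale.ROOT));
        StringBuilder sb = new StringBuilder();
        sb.append("Define System\n");
        sb.append("Define Components {\n");
        for (Component c : components) {
            sb.append("Component ").append(c.name)
              .append(" Value ").append(number.format(c.value)).append(";\n");
        }
        sb.append("}\n");
        sb.append("End\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList(
                new Component("FirstComponent", 1.0),
                new Component("SecondComponent", 2.0),
                new Component("ThirdComponent", 3.0))));
    }
}
```

Note that this does not guard against component names that are not valid ID tokens; a hand-written writer like this has to stay in sync with the grammar manually.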
You can use Xtext's serialization for this. Unlike Java's default Serialization API, Xtext's implementation creates the DSL.
The code would look like so:
Injector injector = Guice.createInjector(new my.dsl.MyDslRuntimeModule());
Serializer serializer = injector.getInstance(Serializer.class);
String s = serializer.serialize(eobj);
where eobj is an instance of System.
If you have written a formatter for your DSL, the output will also look nice.
Related blog post: Implement toString with Xtext's Serializer
Xtext provides an EMF-based AST for you. This AST features classes like System and Component together with their corresponding attributes, such as the Value attribute of Component. These classes are available in the src-gen folder of your language project.
To instantiate these objects, you have to use a factory class, also available in the same package.
To serialize such an AST, it is possible to reuse standard EMF tooling by creating a resource, and saving the contents. During serialization the AST is validated.
System system = ...; //Creating the AST programmatically
ResourceSet set = new ResourceSetImpl();
Resource resource = set.createResource(URI.create...URI("filename")); //Initializing an EMF resource that represents a file
resource.getContents().add(system); //adding your AST to the file resource
resource.save(Collections.emptyMap()); //saving with default options
Minor remark: if you are not developing an Eclipse plug-in, you have to initialize the Xtext tooling by calling the generated static method «YourLanguage»StandaloneSetup.doSetup().
For other programmatic validation options, you can have a look at the ParseHelper and ValidatorTester classes used by the Xtext test framework.
A while ago I wrote a Java application that processes XML with XSLT using Xalan. Now I'm trying to move towards Spring.
I've been having trouble accessing components. As far as I can tell my XML, XSLT and Java objects are correct, but Spring cannot seem to find and reference the components I want to access.
...
<axslt:component prefix="oni" functions="say">
<axslt:script lang="javaclass" src="xslt.components.TestComponent" />
</axslt:component>
...
I also tried with a JavaScript component (with bsf.jar and js.jar) and that also fails.
...
<axslt:component prefix="js" functions="say">
<xalan:script lang="javascript">
function say() { return "Hello from JavaScript"; }
</xalan:script>
</axslt:component>
...
I consistently get this error:
javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(Unknown Source)
org.springframework.web.servlet.view.xslt.XsltView.loadTemplates(XsltView.java:417)
...
I've looked online and haven't found a lot to go on. Spring+XSLT doesn't seem to be a very prominent topic. Any suggestions on something in Spring I need to configure, or something I would need to extend?
The source code for Spring's XsltView class is freely available. I suggest reading it to see how it uses the XSLT API, and compare that with how your own code did it.
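When comparing the two, one detail worth checking is which TransformerFactory implementation each setup ends up with; the stack trace in the question names com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl. A minimal check, assuming it is run on the same classpath as the application (the class name `WhichTransformer` is made up):

```java
import javax.xml.transform.TransformerFactory;

final class WhichTransformer {

    // Returns the class name of the factory that JAXP resolves by default.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(factoryClassName());
    }
}
```

Running this once in the old standalone application and once inside the Spring web application shows whether both resolve the same factory class.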