I'm new to named entity recognition and still trying to learn more about it.
I have a project where I need to recognize: Table names, People, Departments.
I tried using Stanford NER with its 3-class model, and it did recognize people's names.
For the department names, I tried to train the NER to recognize the departments as organizations, since I found nothing on how to create a new annotation class for them.
I followed the instructions from their website.
First I created a txt file with the following content:
Ahmad works in Customer Service department. The department name is
Customer Service. It has started in 1997, it was called Customer
Service since then. Customer Service has one manager and many
employees. The number of Customer Service department is 1122D. Ahmad
works in Development department. The department name is Development.
It has started in 1997, it was called Development since then.
Development has one manager and many employees. The number of
Development department is 1122D. Ahmad works in Finance department.
The department name is Finance. It has started in 1997, it was called
Finance since then. Finance has one manager and many employees. The
number of Finance department is 1122D. Ahmad works in Human Resources
department. The department name is Human Resources. It has started in
1997, it was called Human Resources since then. Human Resources has
one manager and many employees. The number of Human Resources
department is 1122D. Ahmad works in Marketing department. The
department name is Marketing. It has started in 1997, it was called
Marketing since then. Marketing has one manager and many employees.
The number of Marketing department is 1122D.
Then I used these commands:
java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer corpus.txt > corpus.tok
perl -ne 'chomp; print "$_\tO\n"' corpus.tok > corpus.tsv
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop corpus.prop
Then I got the following error:
CRFClassifier invoked on Mon Dec 01 09:38:10 AST 2014 with arguments:
-prop corpus.prop
argsToProperties could not read properties file: null
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "corpus.prop" as either class path, filename or URL
at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:879)
at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:818)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2869)
Caused by: java.io.IOException: Unable to resolve "corpus.prop" as either class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:448)
at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:866)
... 2 more
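The exception suggests that corpus.prop cannot be resolved relative to the working directory, so presumably running the command from the folder that contains corpus.prop, or passing an absolute path (assumed here to match the trainFile directory in the UPDATE below), would get past this:
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop /Users/ha/stanford-ner-2014-10-26/corpus.prop
Note also that the perl one-liner above labels every token O; the department tokens in corpus.tsv still need to be relabeled by hand (e.g. as DEPARTMENT) before training, or the CRF has nothing to learn.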
How can I train the classifier properly?
Many Thanks
UPDATE:
Here is my .prop file
#location of the training file
trainFile = /Users/ha/stanford-ner-2014-10-26/corpus.tsv
#location where you would like to save (serialize to) your
#classifier; adding .gz at the end automatically gzips the file,
#making it faster and smaller
serializeTo = dept-model.ser.gz
#structure of your training file; this tells the classifier
#that the word is in column 0 and the correct answer is in
#column 1
map = word=0,answer=1
#these are the features we'd like to train with
#some are discussed below, the rest can be
#understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
useNGrams=true
#no ngrams will be included that do not contain either the
#beginning or end of the word
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
#the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
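For completeness, once training succeeds, here is a minimal sketch of loading the serialized model and tagging text in Java (a sketch under the assumption that stanford-ner.jar is on the classpath and the model was saved to dept-model.ser.gz, per serializeTo above):

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class DeptTagger {
    public static void main(String[] args) throws Exception {
        // Load the model produced by the training run (path per serializeTo above)
        CRFClassifier<CoreLabel> classifier =
                CRFClassifier.getClassifier("dept-model.ser.gz");
        // Prints tokens inline with their predicted labels, e.g. Customer/DEPARTMENT
        System.out.println(classifier.classifyToString(
                "Ahmad works in Customer Service department."));
    }
}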
I convert geodata (coordinates, attributes, ...) to a DXF file.
I write attributes into extended data, but under group code 1001 there must be an application name. I tried writing "Test" and some other words, but nothing works.
I receive the error message:
Invalid application name in 1001 group on line 50.
What is the application name in this context, and where can I get it?
You are correct that DXF group 1001 should contain the Application ID of the Extended Entity Data (xData) attached to your entity.
This application ID may be an arbitrary name which fulfils the requirements of a symbol table name (which are documented as part of the AutoLISP snvalid function). When specifying an Application ID, you should try to ensure that it is unique, and you should AVOID using ACAD, as this is reserved and used internally by AutoCAD.
The key point that is causing your file to fail to be parsed is that every Application ID referenced by xData within the file must also appear as a symbol table name within the APPID symbol table.
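For illustration, a minimal sketch of the two fragments involved, assuming the application name TEST (handles and subclass markers required by newer DXF versions are omitted). First, register the name in the APPID symbol table of the TABLES section:
0
TABLE
2
APPID
70
1
0
APPID
2
TEST
70
0
0
ENDTAB
Then the xData attached to your entity may reference it:
1001
TEST
1000
your attribute value here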
I'm using Elasticsearch 6.x with the ingest plugin to let me query inside documents.
I managed to insert a record with an attachment document, and I'm able to query it against various fields.
When I query the content of the file I'm doing this:
boolQuery.filter(new MatchPhrasePrefixQueryBuilder("attachment.content", "St. Anna Church"))
It works, but now I want to query with "Church Wall People", which is not a complete phrase; I want back all the documents that contain the words Church, Wall and People.
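A minimal sketch of what I have in mind, assuming the same 6.x Java query builders as above: a match query with the AND operator requires every term to appear, without demanding a phrase:

import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.Operator;

// require all three terms (in any order) in attachment.content
boolQuery.filter(new MatchQueryBuilder("attachment.content", "Church Wall People")
        .operator(Operator.AND));

Would that be the right approach here?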
I'm using the BabelNet Java API to query the online service:
1) I get the synsets of some words by calling .getSynsets() on a BabelNet instance.
2) I also have a method to get the categories (domains) of these words via .getDomains().
I want to get the synsets, but only those related to a specific domain. For example, the word "apple" generates the synsets {Apple (music group), Apple, Ariane Passenger PayLoad Experiment, Apple (album)} and the domains {MUSIC, COMPUTING, PHYSICS AND ASTRONOMY}.
Specifying the MUSIC domain, I want to end up with the synsets {Apple (music group), Apple (album)}. So: is there any way to filter the synsets I get in step 1) by a specific domain obtained through step 2)?
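A minimal sketch of the filtering I have in mind, assuming the BabelNet 3.x API (exact method signatures vary between versions; getSynsets and getDomains are the calls mentioned above):

import java.util.ArrayList;
import java.util.List;

import it.uniroma1.lcl.babelnet.BabelNet;
import it.uniroma1.lcl.babelnet.BabelSynset;
import it.uniroma1.lcl.babelnet.data.BabelDomain;
import it.uniroma1.lcl.jlt.util.Language;

BabelNet bn = BabelNet.getInstance();
List<BabelSynset> filtered = new ArrayList<>();
for (BabelSynset synset : bn.getSynsets("apple", Language.EN)) {
    // keep only the synsets whose domain map contains the target domain
    if (synset.getDomains().containsKey(BabelDomain.MUSIC)) {
        filtered.add(synset);
    }
}

Is there a built-in way to do this, or is a manual filter like this the expected approach?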
I am new to UIMA ...
I want to connect to a database, extract data and process it using UIMA regex annotator and write back to database.
Example:
Table: emp
Name Department EmpId
AB-C Sale's 2134[3]
XYZ, Fina&nce 23423
PQ#R Marketing 234(47
To be transformed using UIMA regex annotator
Desired Output
Name Department EmpId
ABC Sales 21343
XYZ Finance 23423
PQR Marketing 23447
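In plain Java, the cleanup rule amounts to stripping every non-alphanumeric character from each field, for example:

// "AB-C" -> "ABC", "Sale's" -> "Sales", "2134[3]" -> "21343"
String clean = raw.replaceAll("[^A-Za-z0-9]", "");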
I have installed UIMA, Eclipse and the relevant JDBC drivers to connect to the database.
Thanks in advance
There are a couple of ways to achieve this.
The simplest (though not very extensible) way would be to write 3 classes (use uimaFIT http://uima.apache.org/uimafit.html#Documentation to make coding easier):
CollectionReader:
- read all the data into objects
- iterate over the objects and create a JCas from each object; you can store the primary key in an annotation
Analysis Engine:
- use the UIMA regex annotator to manipulate the JCas's documentText
Consumer:
- read the JCas documentText and use the primary key to update the database
A better way would be to abstract the reading and writing by creating an external resource (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources) that connects to the database (provide a hasNext() and next() method - this is very convenient for use in the CollectionReader and Consumer). This has the advantage that all initialisation logic can be isolated. When using uimaFIT, you can use configuration parameter injection (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.configurationparameters), for example to make the connection string and the search query configurable.
Use the SimplePipeline class in uimaFIT to run your pipeline: http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.pipelines
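A minimal sketch of wiring this together with uimaFIT (DbCollectionReader, RegexCleaner and DbConsumer are hypothetical classes implementing the reader, analysis engine and consumer described above; the connection-string parameter is likewise hypothetical):

import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.CollectionReaderFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;

// read rows from the database, clean them, write them back
SimplePipeline.runPipeline(
        CollectionReaderFactory.createReaderDescription(
                DbCollectionReader.class,
                DbCollectionReader.PARAM_CONNECTION_STRING, "jdbc:mysql://localhost/emp"),
        AnalysisEngineFactory.createEngineDescription(RegexCleaner.class),
        AnalysisEngineFactory.createEngineDescription(DbConsumer.class));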
I'm trying to create invoices using the Java SDK (2.2.1) with the v3 API. My API calls to create invoices are failing because of a Business Validation Error stating that my transaction line does not have a tax code associated with it (error code: 6000).
I'm trying to set the tax for the transaction (sales line) like this:
TaxLineDetail taxLineDetail = new TaxLineDetail();
taxLineDetail.setPercentBased(true);
taxLineDetail.setTaxPercent( getTaxPercent() );
salesLine.setTaxLineDetail(taxLineDetail);
I've also tried using Invoice#setTxnTaxDetail(TxnTaxDetail) but it fails in both cases. It seems the API wants a tax code, presumably one that I set with:
TaxLineDetail#setTaxRateRef(ReferenceType)
But I don't understand where I get this tax rate code from. My QB account does have two taxes configured (23% and 0%) but how do I associate one of these with an invoice? Where is this (integer?) code that I need?
For what it's worth, this is a non-US account.
I think the bigger problem here is that you're looking at the completely wrong object type.
The error message complains that the line "does not have a tax code associated with it", but you're trying to set a TaxLineDetail, and a TaxLineDetail#setTaxRateRef(ReferenceType).
Bottom line here - tax codes are NOT the same thing as tax rates. They are related, but totally separate entities.
What QuickBooks is complaining about is that you're not setting a tax code on your line items. See the line item documentation, and look for the TaxCodeRef node which you should be setting.
https://developer.intuit.com/docs/0025_quickbooksapi/0050_data_services/030_entity_services_reference/invoice#SalesItemLineDetail
You should query for tax codes, using a query like this:
SELECT * FROM TaxCode
And then set your TaxCodeRef value.
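A minimal sketch of that flow, assuming the standard v3 Java SDK (ipp-v3-java-devkit) and an already-configured DataService (dataService and salesLine stand in for your own objects):

import com.intuit.ipp.data.ReferenceType;
import com.intuit.ipp.data.SalesItemLineDetail;
import com.intuit.ipp.data.TaxCode;
import com.intuit.ipp.services.QueryResult;

// query the tax codes configured in the company file
QueryResult result = dataService.executeQuery("SELECT * FROM TaxCode");
TaxCode taxCode = (TaxCode) result.getEntities().get(0); // e.g. pick your 23% code

// reference that tax code from the sales line item detail
ReferenceType taxCodeRef = new ReferenceType();
taxCodeRef.setValue(taxCode.getId());

SalesItemLineDetail lineDetail = new SalesItemLineDetail();
lineDetail.setTaxCodeRef(taxCodeRef); // the TaxCodeRef the 6000 error is asking for
salesLine.setSalesItemLineDetail(lineDetail);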
You can query TaxCode and TaxRate to get details and use the corresponding Ids at the time of invoice creation.
https://developer.intuit.com/docs/0025_quickbooksapi/0050_data_services/030_entity_services_reference/taxcode
https://developer.intuit.com/docs/0025_quickbooksapi/0050_data_services/030_entity_services_reference/taxrate
ApiExplorer - https://developer.intuit.com/apiexplorer?apiname=V3QBO#TaxCode
To get the correct XML structure of an Invoice object with TaxCode and TaxRate, you can create an invoice (with tax) from the QBO UI and retrieve it using the API.
Thanks