Training data set for named entity recognition using Stanford NLP - Java

I have used OpenNLP to create a model, but now I am looking into Stanford NLP, which uses Conditional Random Fields. I want to know how to prepare training data for NER using Stanford NLP.
For OpenNLP we use START and END tags, but I do not know how to format training data for Stanford NLP. Please give me an example.

Look at http://nlp.stanford.edu/software/crf-faq.shtml#b
That should explain what you need to know to get started. There are many further options, most of which are documented only in the code.
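In short, the format that FAQ describes is one token per line with a tab-separated label and a blank line between sentences, plus a properties file that drives training. A minimal sketch (the file names, labels, and the small feature-flag subset here are illustrative; the FAQ lists many more options):

```
# train.tsv -- one token per line, TAB, label; blank line between sentences:
#   Joe	PERSON
#   Smith	PERSON
#   lives	O
#   in	O
#   California	LOCATION
#   .	O

# train.prop -- minimal CRFClassifier configuration:
trainFile = train.tsv
serializeTo = my-ner-model.ser.gz
map = word=0,answer=1
useClassFeature = true
useWord = true
useNGrams = true
usePrev = true
useNext = true
wordShape = chris2useLC

# Train with:
#   java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop train.prop
```

Unlike OpenNLP's inline START/END markup, Stanford's trainer expects the label in a separate column next to each token.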

Related

How to create a simple Italian Model for a Named Entity Extraction of Persons using OpenNLP?

I have to do a project with OpenNLP, strictly in the Italian language. Since it is almost impossible to find existing models for this language, my idea is to create a simple model myself. After reading some posts on this platform, my plan is to try the model-builder addon.
First of all, is it possible to achieve my goal with this addon?
If so, referring to this other post, what kind of file is meant by "modelOutFile"? In my case I don't have an existing model.
N.B.: the addon uses some deprecated functions (such as nameFinderME.train()).
Naively, I tried to pass a simple empty file "model.bin" as the "modelOutFile", but of course I ran into an error:
Cannot invoke "java.util.Properties.getProperty(String)" because "manifest" is null
Furthermore, I used only a few names and sentences for the test (I just wanted to see whether it worked), not the large amount recommended (at least 15,000 sentences).
I'm open to suggestions other than the model-builder addon.
I hope someone can help me.
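As an alternative to the addon, you can train a model directly with the OpenNLP API, which avoids the deprecated calls. Training data uses `<START:type> ... <END>` markup, one sentence per line. A sketch against the opennlp-tools 1.8+ API (the file names are illustrative, and you will still need a large corpus for usable accuracy):

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainItalianNer {
    public static void main(String[] args) throws Exception {
        // it-person.train contains lines like:
        //   <START:person> Mario Rossi <END> vive a Roma .
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("it-person.train")),
                StandardCharsets.UTF_8);
        ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

        TokenNameFinderModel model = NameFinderME.train(
                "it", "person", samples,
                TrainingParameters.defaultParams(),
                new TokenNameFinderFactory());

        // This output file is what the addon calls the "modelOutFile":
        // it is created by training, not supplied beforehand.
        model.serialize(new File("it-person.bin"));
    }
}
```

This also explains the "manifest is null" error: the model file is an output of training, so passing an empty file as an existing model cannot work.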

Neural Net use for finding specific types of websites?

So I'm working on my first project and I'm trying to incorporate a neural net somehow. At the moment I have created a web crawler that takes a word as input, performs a Google search, and retrieves the HTML of the resulting links.
Now I am trying to use only the HTML from specific types of websites, in my case websites that offer free educational content/courses, for example this site: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-092-java-preparation-for-6-170-january-iap-2006/index.htm
I'm new to neural nets: is this something a neural net can do, or would another method be better?
The rest of my code, such as the web crawler, is in Java, so if a neural net is applicable here, what library or tool would you recommend for building/training it? I was thinking of Neuroph, but I would love to hear some suggestions.
Neural networks are used for prediction: for example, you give an image as input and the output tells you the nature of the image, such as whether it shows a cat or a dog.
About the web crawler:
The web crawler you describe does not necessarily need a neural network, but if you want to add predictions you can use one: for example, take a word as input, run a Google search on it, and then predict the nature of the content.
I don't know exactly what you want to predict or what kind of prediction you need (classification or regression), but I can first suggest how to take HTML as input.
Taking HTML content as input:
First of all, neural networks do not process characters; they process numbers. So to feed HTML content to a network you need an encoding mechanism, and that is not an easy step. There is a field called NLP (Natural Language Processing) that offers good ways to process text; you can apply it to HTML content as well (or adapt it if you want).
I previously built a text-suggestion project with a recurrent neural network using one of NLP's methods; you can check it on my GitHub, where the README explains all the steps in detail: https://github.com/KaramMed/Modele-de-Suggestion-du-Texte
About the library:
I recommend TensorFlow for Java; it is one of the best deep learning libraries, and you can find many tutorials for it.
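To make the "networks process numbers, not characters" point concrete: before any network sees a page, the HTML is typically stripped and reduced to a numeric representation, such as a bag-of-words count vector. A minimal stdlib-only sketch (the regex tag stripping is a deliberate simplification; a real crawler should use a proper HTML parser):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BagOfWords {
    // Strip tags and lower-case; crude, for illustration only.
    static String toText(String html) {
        return html.replaceAll("<[^>]*>", " ").toLowerCase();
    }

    // Count word occurrences; these counts (or TF-IDF weights derived
    // from them) are what would feed the network's input layer.
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> bag = new LinkedHashMap<>();
        for (String tok : text.split("[^a-z]+")) {
            if (!tok.isEmpty()) bag.merge(tok, 1, Integer::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Free Java course</h1>"
                    + "<p>Learn Java free.</p></body></html>";
        System.out.println(counts(toText(html)));
        // prints {free=2, java=2, course=1, learn=1}
    }
}
```

With features like these, "is this page a free course site?" becomes a binary classification problem, which is exactly the kind of task a simple feed-forward network (or even logistic regression, a good first baseline) can handle.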

Exact Dictionary based Named Entity Recognition with Stanford

I have a dictionary of named entities extracted from Wikipedia, and I want to use it as the dictionary for an NER. How can I use Stanford NER with this data of mine?
I have also downloaded LingPipe, although I have no idea how to use it. I would appreciate any kind of information.
Thanks for your help.
You can use dictionary-based (or regular-expression-based) named entity recognition with Stanford CoreNLP: see the RegexNER annotator. For some applications, we run this with quite large dictionaries of entities. Nevertheless, for us this is typically a secondary tool alongside statistical (CRF-based) NER.
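A RegexNER mapping file is plain tab-separated text: a token-sequence pattern, a tab, then the NER class; an optional third column lists the existing tags the entry may overwrite, and a fourth a priority. A minimal sketch (the entries and class names are illustrative):

```
Barack Obama	PERSON
University of Lisbon	ORGANIZATION
( CRF | CRFs )	TECHNIQUE	MISC	1.0
```

You wire it in by adding regexner to the CoreNLP annotators list (after ner) and pointing the regexner.mapping property at the file, so a Wikipedia-derived dictionary can be converted to this format with a one-line-per-entity dump.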
Stanford NER is based on CRFs, which is a statistical model. I'm afraid it doesn't support an extra dictionary or lexicon out of the box. However, you can train a new model for your own task.
You can use MER (http://labs.fc.ul.pt/mer/), a minimal entity recognizer developed in bash (https://github.com/lasigeBioTM/MER) that only requires a lexicon (text file) as input.

Apache OpenNLP Part of Speech Tagger: Trained on which data set?

I am using the Apache OpenNLP part-of-speech tagger for word-class recognition in a collection of texts.
I am trying to evaluate the tagger's performance, and I wonder which data it might have been trained on.
The names of the models that exist for English give no hint about the training data used.
The Apache OpenNLP documentation mentions several corpora which might potentially have been used for training the POS tagger, too:
http://opennlp.apache.org/documentation/manual/opennlp.html#tools.corpora
Does anyone know how to find out which training data the English POS models have been trained on?
Yes, you are right that several corpora are used in OpenNLP.
But if you look at the OpenNLP models page, it states which dataset each model was trained on.
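If you want to check a model file yourself: an OpenNLP .bin model is a zip archive containing a manifest.properties with metadata such as the language and training timestamp (though typically not the corpus name, which is why the models page remains the authoritative source). A stdlib-only sketch that reads such a manifest, using a synthetic zip to demonstrate the mechanism; with a real model you would pass `new FileInputStream("en-pos-maxent.bin")` instead:

```java
import java.io.*;
import java.util.Properties;
import java.util.zip.*;

public class ModelManifest {
    // An OpenNLP .bin model is a zip; pull out its manifest.properties.
    static Properties readManifest(InputStream modelIn) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(modelIn)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals("manifest.properties")) {
                    Properties p = new Properties();
                    p.load(zip);
                    return p;
                }
            }
        }
        return null;  // no manifest entry found
    }

    // A tiny synthetic "model" zip, just to demonstrate the mechanism.
    static byte[] fakeModel() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("manifest.properties"));
            zip.write("Language=en\n".getBytes("ISO-8859-1"));
            zip.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Properties manifest = readManifest(new ByteArrayInputStream(fakeModel()));
        System.out.println(manifest.getProperty("Language"));  // prints "en"
    }
}
```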

Classification using Bernoulli classifier in Lingpipe

I want to classify my data with a Bernoulli classifier in LingPipe.
If someone has a working method for it, please share.
http://java2s.com/Open-Source/Java/Natural-Language-Processing/LingPipe/com/aliasi/test/unit/classify/BernoulliClassifierTest.java.htm
I.e.:

```java
FeatureExtractor FEATURE_EXTRACTOR
    = new TokenFeatureExtractor(IndoEuropeanTokenizerFactory.INSTANCE);
...
BernoulliClassifier classifier
    = new BernoulliClassifier(FEATURE_EXTRACTOR);
```
Then use handle() to add training data, and classify() to get answers out.
To find this, I googled "Bernoulli classifier in Lingpipe" (without the quotes). I found the API docs, saw no example usage, and found them to be poor quality. So I guessed there might be a unit test, since Java programmers are quite thorough about testing, and then googled "bernoulliClassifier lingpipe test" (again without the quotes).
(By "poor quality" docs, I mean the function descriptions just repeat the function names and add no information.)
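Putting the pieces together, a training/classification round-trip looks roughly like this. This is a sketch against the LingPipe 4 API as I understand it (the category names and example texts are illustrative, and the exact handle() signature should be checked against your LingPipe version):

```java
import com.aliasi.classify.BernoulliClassifier;
import com.aliasi.classify.Classification;
import com.aliasi.classify.Classified;
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
import com.aliasi.tokenizer.TokenFeatureExtractor;
import com.aliasi.util.FeatureExtractor;

public class BernoulliDemo {
    public static void main(String[] args) {
        FeatureExtractor<CharSequence> extractor =
                new TokenFeatureExtractor(IndoEuropeanTokenizerFactory.INSTANCE);
        BernoulliClassifier<CharSequence> classifier =
                new BernoulliClassifier<CharSequence>(extractor);

        // Training: wrap each example together with its category.
        classifier.handle(new Classified<CharSequence>(
                "cheap pills buy now", new Classification("spam")));
        classifier.handle(new Classified<CharSequence>(
                "meeting agenda for tomorrow", new Classification("ham")));

        // Classification: ask for the best category of a new text.
        System.out.println(classifier.classify("buy cheap pills").bestCategory());
    }
}
```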
