Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
I am working on a natural language parser that examines a sentence in English and extracts information such as names and dates.
For example: "Lets meet next tuesday at 5 PM at the beach."
The output would be something like: "Lets meet 15/09/2009 at 1700 hr at the beach"
So basically, what I want to know is whether there is any framework or library available for Java that can do this kind of thing: parse dates out of a sentence and give output in a specified format.
Regards,
Pranav
Thanks for the replies. I have looked at a few NLP libraries such as LingPipe, OpenNLP, and Stanford NLP. Do they have anything for date parsing in Java?
Natty is a really good replacement for JChronic.
You can use JChronic, the Java port of Chronic.
Have you tried JChronic? However, I doubt any library could work directly with whole sentences: you'd have to extract sentence fragments and feed them to an NLP date parsing framework yourself, perhaps on a trial-and-error basis (larger and larger fragments until the framework throws an error).
I don't think there's any framework out there that does that out of the box. What you can do is create a set of regular expressions to match those patterns.
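As a rough illustration of the regular-expression approach, here is a minimal sketch that rewrites the example sentence from the question. The class name SimpleDateRewriter, the two patterns, and the choice to pass in the reference date are all assumptions made for this example. It only understands "next <weekday>" and whole-hour "N AM/PM" times, so it is nowhere near a general date parser.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles only "next <weekday>" and "<hour> AM/PM".
class SimpleDateRewriter {
    private static final Pattern NEXT_DAY = Pattern.compile(
            "\\bnext\\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\\b",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern CLOCK = Pattern.compile(
            "\\b(1[0-2]|0?[1-9])\\s*(AM|PM)\\b", Pattern.CASE_INSENSITIVE);
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // 'today' is the reference date against which "next tuesday" is resolved.
    static String rewrite(String sentence, LocalDate today) {
        // Pass 1: replace "next <weekday>" with a concrete date.
        Matcher day = NEXT_DAY.matcher(sentence);
        StringBuffer withDates = new StringBuffer();
        while (day.find()) {
            DayOfWeek dow = DayOfWeek.valueOf(day.group(1).toUpperCase(Locale.ROOT));
            LocalDate date = today.with(TemporalAdjusters.next(dow));
            day.appendReplacement(withDates, Matcher.quoteReplacement(date.format(DATE)));
        }
        day.appendTail(withDates);

        // Pass 2: replace "5 PM" style times with 24-hour "1700 hr".
        Matcher clock = CLOCK.matcher(withDates.toString());
        StringBuffer out = new StringBuffer();
        while (clock.find()) {
            int hour = Integer.parseInt(clock.group(1)) % 12; // 12 AM -> 0, 12 PM -> 12
            if (clock.group(2).equalsIgnoreCase("PM")) {
                hour += 12;
            }
            clock.appendReplacement(out, String.format("%02d00 hr", hour));
        }
        clock.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // 10 September 2009 was a Thursday, so "next tuesday" is 15/09/2009.
        System.out.println(rewrite("Lets meet next tuesday at 5 PM at the beach.",
                LocalDate.of(2009, 9, 10)));
    }
}
```

Every new phrasing ("tomorrow", "in two weeks", "5:30 PM") needs another pattern, which is where a dedicated library starts to pay off.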
I would suggest using UIMA with OpenNLP connectors and some handmade regexp rules.
I wrote an NLP script in Python's NLTK and fed the results to Ruby's chronic.
For my use case, I had more luck with chrono-java. Sadly it looks stale and is not available in any Maven repository (nor via https://jitpack.io/, since the build is broken), so you have to fix and build it yourself.
However, checking out the code and fixing a dependency (maven-javadoc-plugin was missing its groupId, and I updated the version) allowed me to build and run a simple example successfully:
List<ParsedResult> results = Chrono.Parse("Datum Freitag, 08.04. bis einschl. Sonntag 10.04.2016");
results.forEach(result -> System.out.println(result));
This resulted in two dates being extracted:
ParsedResult: " 08.04" > 04/08/2018 12:00
ParsedResult: "10.04.2016" > 04/10/2016 12:00
Pretty old question, but PrettyTime::NLP is another option to try.
Closed 1 year ago.
So, I have four lists of strings, each of which corresponds to a specific category. Each string is a job title, such as "web-developer", which corresponds to the category "IT".
The input string is going to be another job title, and the idea is to sort that job title into the appropriate category based on how well it matches the strings in each list.
Does anyone know a good library to accomplish this? Sadly, I do not have enough source material to properly train a machine learning system... All the libraries I've found so far seem to be based on machine learning.
Alternatively, if no such library exists, does anyone have any suggestions on how to accomplish this? My best idea so far has been to just... search through all the strings and do a string.contains(searchString) and match it like that. I dunno how to handle multiple matches though...
Ideally the library should be in Java, but this is not a necessity.
You could use an algorithm like Levenshtein string distance to achieve this. The algorithm gives you the number of single-character edits needed to change one string into another: the fewer steps needed, the more similar the strings are.
There is an implementation within the StringUtils Apache Commons library.
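To make the idea concrete, here is a minimal sketch of nearest-match classification by edit distance. The distance function is written out by hand so the example has no dependencies; the class name JobTitleClassifier, the classify method, and the sample categories are assumptions made for illustration, not part of any library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: picks the category whose known title is closest to the input.
class JobTitleClassifier {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the category containing the closest title, or null if there are no titles.
    // On a tie, the first category encountered wins.
    static String classify(String title, Map<String, List<String>> categories) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        String needle = title.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> entry : categories.entrySet()) {
            for (String known : entry.getValue()) {
                int d = levenshtein(needle, known.toLowerCase(Locale.ROOT));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = entry.getKey();
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> categories = new LinkedHashMap<>();
        categories.put("IT", Arrays.asList("web-developer", "sysadmin"));
        categories.put("Health", Arrays.asList("nurse", "surgeon"));
        System.out.println(classify("web developer", categories));
    }
}
```

This also answers the multiple-matches worry in the question: every candidate gets a score, so you keep the minimum instead of stopping at the first contains() hit. You may want a maximum acceptable distance so that unrelated titles are rejected rather than forced into a category.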
Closed 9 months ago.
I have some arbitrary text content in a String variable. I want to find all inflections of a specific word that the user specifies.
Example: If the user is looking for the word "assist" then it should grab all "assist, assists, assisted, assisting" occurrences in the String.
Is there a Java library available to detect such inflections automatically in the specified String?
Note: I have seen a Java library called WolframAlpha that claims to do this, and it has a web interface, but I can't get the library working and there is no guide for using it.
First of all, it is not a Java library; it is the Wolfram Language, previously known as Mathematica. It does have J/Link and can be called from Java, but you must have a Wolfram kernel running to execute the code.
This is called Natural Language Processing, and it's a huge, complex field. I have fiddled about with a few problems, and all I can say is that it is harder than just complex if you want a reliable solution.
Something you might want to take a look at: Stanford NLP.
It is called word stemming. First you need to derive the stem (for a specific language):
assisting -> assist, by stripping endings such as -ance, -ing, -ly, -s, -ed, etc.
sought -> seek, using an exception list
Then do a search, maybe with a regular expression (Matcher.find). Patterns:
"\\bassist\\p{L}*"
"\\b(seek|sought)\\p{L}*"
For prefixes (un-, dis-, inter-) the case would be more complicated still, but in general inflections in English are word endings. Then there is synonym searching.
Dictionaries out there are often called corpora. A search for "free English corpus" will yield results.
\\b = word boundary
\\p{L}* = 0 or more (*) letters
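The search step can be sketched as follows. This is a variant of the pattern above: instead of \\p{L}*, which would also match words like "assistant", it accepts only a short, fixed list of endings. The class name InflectionFinder and the suffix list are assumptions for this example, and irregular forms would still need their own exception list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds a stem followed by one of a few regular English endings.
class InflectionFinder {

    static List<String> find(String text, String stem) {
        // Pattern.quote guards against regex metacharacters in user input.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(stem) + "(?:s|es|ed|ing)?\\b",
                Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "She assists him; he assisted her, assisting daily. An assistant can assist.";
        // "assistant" is skipped because "ant" is not in the suffix list.
        System.out.println(find(text, "assist"));
    }
}
```

A stem whose spelling changes when inflected (for example "try" and "tried") will not be caught by simple suffixing, which is where a real stemmer or lemmatizer comes in.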
Check this out.
I don't know how big your requirement is, but you can always use Wiktionary and parse the data.
Check this question; it may be of help.
Closed 4 years ago.
When I can use it, I am a big fan of using Guava's Preconditions. However, the Guava jar is 2 MB, which can be quite sizeable...
I have a project whose jar weighs 26k, therefore the question is, is there a lightweight library having such a utility class, with no dependencies other than the JDK (6+)? While I could create one, I'd rather not reinvent the wheel!
I would use the whole library as suggested in the comments, but if you really want the small size, there is a recommended way described in Guava's docs: Shrinking JARs with ProGuard.
You can use valid4j with hamcrest-matchers instead (found on Maven Central as org.valid4j:valid4j).
For input validation (throwing a recoverable exception):
import static org.valid4j.Validation.*;
validate(argument, isValid(), otherwiseThrowing(InvalidException.class));
Or for preconditions (like assertions really):
import static org.valid4j.Assertive.*;
require(x, greaterThan(0)); // throws RequireViolation extends AssertionError
Links:
http://www.valid4j.org/
https://github.com/helsing/valid4j
Take a look at the Requirements API that I authored. The upcoming 3.0.0 release weighs in at 167k. It is very well maintained and is very easy to use:
Requirements API
String actual = "foosball";
String expected = "ballroom";
requireThat(actual, "actual").isEqualTo(expected, "expected");
gives you this:
(If your terminal does not support colors, you will get a textual diff instead)
You might want to check Fernando Cejas's Arrow library. It has Preconditions, Guava-style Strings, Collections, etc.
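For scale, rolling your own is only a handful of lines. Below is a minimal sketch of a dependency-free class that borrows the method names of Guava's Preconditions (checkArgument, checkState, checkNotNull); the class name MiniPreconditions is an assumption, and the sketch omits Guava's message templating and index checks. It avoids anything newer than Java 6.

```java
// Hypothetical stand-in for Guava's Preconditions; JDK only.
final class MiniPreconditions {

    private MiniPreconditions() {
        // static utility class, not meant to be instantiated
    }

    // Throws IllegalArgumentException when a caller-supplied argument is invalid.
    static void checkArgument(boolean expression, String message) {
        if (!expression) {
            throw new IllegalArgumentException(message);
        }
    }

    // Throws IllegalStateException when the object is in the wrong state for the call.
    static void checkState(boolean expression, String message) {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }

    // Returns the reference unchanged so it can be used inline in an assignment.
    static <T> T checkNotNull(T reference, String message) {
        if (reference == null) {
            throw new NullPointerException(message);
        }
        return reference;
    }

    public static void main(String[] args) {
        String name = checkNotNull("widget", "name must not be null");
        checkArgument(name.length() > 0, "name must not be empty");
        System.out.println("ok: " + name);
    }
}
```

The trade-off is the usual one: you give up the extras a library offers in exchange for a few hundred bytes of class file.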
Closed 4 years ago.
I have thousands of pieces of address data and I want to parse them so I can separate street from country from postal code and so on.
Is there any way to do that in Java?
I know that Google open-sourced their international address and phone number parsing library. I'd suggest you check their presentation and Javadoc.
If you simply have addresses from all over the world in the form they appear on letters, and you later want to send letters there, you had better leave them in this format (maybe after splitting off the country, which usually comes last).
The internal formats differ greatly between individual countries (even if only comparing Germany, Great Britain, and Russia), and having a database with the individual components afterwards requires individual (country-specific) logic to put them together again.
(I once had an application which took input of the individual fields and later created an address list from them (in the "German way to do this"), and I always received complaints from the British users that I formatted their addresses in the wrong order. So in a later version I simply created a multi-line "address" input field, which I then output without any change.)
You could probably use regular expressions if you don't want to add 3rd party dependencies.
See: http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
and http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html
Usage is basically:
private static final Pattern PAT_NAME = Pattern.compile("my\\sregex");
...
Matcher matcher = PAT_NAME.matcher("my address");
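A slightly fuller sketch of the regex route, using named groups to pull the parts out. The single-line layout assumed here ("street, postal-code city, country" with a 4- or 5-digit postal code) and the class name AddressRegexDemo are made up for this example; real-world addresses vary far too much for one pattern, so treat this as one country-specific rule among many.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for ONE assumed layout: "street, postal-code city, country".
class AddressRegexDemo {
    private static final Pattern ADDRESS = Pattern.compile(
            "^\\s*(?<street>[^,]+?)\\s*,\\s*(?<postal>\\d{4,5})\\s+(?<city>[^,]+?)\\s*,\\s*(?<country>.+?)\\s*$");

    // Returns the named parts, or an empty map when the line does not fit the layout.
    static Map<String, String> parse(String line) {
        Matcher matcher = ADDRESS.matcher(line);
        if (!matcher.matches()) {
            return Collections.emptyMap();
        }
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("street", matcher.group("street"));
        parts.put("postal", matcher.group("postal"));
        parts.put("city", matcher.group("city"));
        parts.put("country", matcher.group("country"));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("Hauptstrasse 5, 10115 Berlin, Germany"));
    }
}
```

Lines that come back empty can be routed to a different pattern or flagged for manual review rather than guessed at.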
There is an older library here: http://jgeocoder.sourceforge.net/parser.html. It is dated, but it works for most cases.
If you want to use an API, I've used SmartyStreets in the past and they work decently well (https://smartystreets.com/).
Closed 6 years ago.
In Java, I have a set of expressions like cond1 AND (cond2 OR cond3) AND (cond4 OR cond5). I would like to convert each one into a tree and then evaluate the final boolean answer. I have searched a lot for Java BDD libraries but have not been able to find anything usable. Any suggestions with sample code?
A 5-second Google search returned some reasonable-looking results:
JavaBDD
Java Decision Diagram Libraries
What is the best Binary Decision Diagram library for Java?
Is this not what you're looking for?
He means Binary Decision Diagrams.
I've been tinkering with JavaBDD and JBDD/JDD. Both are based on BuDDy (a C library) -- JBDD actually uses the C DLLs for a marginal performance boost.
It looks to me like JavaBDD is more fully featured (e.g. it supports composing BDDs, which is what I need). But there is no tutorial for it, and while the class docs aren't terrible, frankly I can't figure out how to use it for even the most basic boolean operations (like the problem you pose).
JBDD/JDD requires you to use manual garbage collection, and does weird things like store BDD objects in Java integers -- clearly carry-overs from C. But it has a set of tutorials.
If you want to run your own parser, check out JavaCC.
Here is a nice tutorial to get you started. A bit older, but still valid:
http://www.javaworld.com/jw-12-2000/jw-1229-cooltools.html
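For expressions as small as the one in the question, a hand-written recursive-descent parser may be enough, without a generator. The sketch below is not JavaCC output and not a BDD; it is a plain expression tree, and the class name BoolExprParser, the Node interface, and the grammar (OR binds looser than AND, parentheses group, no NOT) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser. Grammar: or := and ("OR" and)* ; and := atom ("AND" atom)* ;
// atom := NAME | "(" or ")"
class BoolExprParser {

    // A tree node; evaluation looks condition names up in the supplied map.
    interface Node {
        boolean eval(Map<String, Boolean> vars);
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private BoolExprParser(String input) {
        // Tokens are parentheses and words; any other character is silently skipped.
        Matcher m = Pattern.compile("\\(|\\)|\\w+").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    static Node parse(String input) {
        BoolExprParser parser = new BoolExprParser(input);
        Node root = parser.parseOr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return root;
    }

    private boolean peekIs(String token) {
        return pos < tokens.size() && tokens.get(pos).equals(token);
    }

    private Node parseOr() {
        Node left = parseAnd();
        while (peekIs("OR")) {
            pos++;
            Node l = left;
            Node r = parseAnd();
            left = vars -> l.eval(vars) || r.eval(vars);
        }
        return left;
    }

    private Node parseAnd() {
        Node left = parseAtom();
        while (peekIs("AND")) {
            pos++;
            Node l = left;
            Node r = parseAtom();
            left = vars -> l.eval(vars) && r.eval(vars);
        }
        return left;
    }

    private Node parseAtom() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            Node inner = parseOr();
            if (!peekIs(")")) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        if (token.equals(")") || token.equals("AND") || token.equals("OR")) {
            throw new IllegalArgumentException("Unexpected token: " + token);
        }
        return vars -> {
            Boolean value = vars.get(token);
            if (value == null) {
                throw new IllegalArgumentException("No value for " + token);
            }
            return value;
        };
    }

    public static void main(String[] args) {
        Map<String, Boolean> vars = new HashMap<>();
        vars.put("cond1", true);
        vars.put("cond2", false);
        vars.put("cond3", true);
        vars.put("cond4", false);
        vars.put("cond5", true);
        Node tree = parse("cond1 AND (cond2 OR cond3) AND (cond4 OR cond5)");
        System.out.println(tree.eval(vars));
    }
}
```

The tree is parsed once and can be evaluated many times against different condition values. If the grammar grows (NOT, comparisons, quoted strings), that is the point where a generator like JavaCC becomes worthwhile.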