Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I have thousands of pieces of address data and I want to parse them so I can separate street from country from postal code and so on.
Is there any way to do that in Java?
I know that Google open-sourced their international address and phone number parsing library. I'd suggest checking out their presentation and the Javadoc.
If you simply have addresses from all over the world in the form they appear on letters, and you later want to send letters there, you had better leave them in this format (perhaps after splitting off the country, which usually comes last).
The internal formats differ greatly between countries (even when comparing only Germany, Great Britain and Russia), and storing the individual components in a database means you later need country-specific logic to put them back together.
(I once had an application that took input in individual fields and later created an address list from them (the "German way" of doing this), and I always received complaints from British users that I had formatted their addresses in the wrong order. So in a later version I simply created a multi-line "address" input field, which I then output without any change.)
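If you keep the addresses as free text but still want the country separately, splitting it off can be as simple as taking the last non-blank line, as suggested above. The following is a minimal Java sketch under that assumption (the class and method names are hypothetical). It will misfire on addresses that have no country line, so treat it as a starting point only.

```java
import java.util.ArrayList;
import java.util.List;

final class CountrySplitter {

    // Returns {rest of the address, country}, assuming the last non-blank line
    // of a free-form, multi-line address is the country (a heuristic, not a rule).
    static String[] splitOffCountry(String address) {
        List<String> lines = new ArrayList<>();
        for (String line : address.split("\\R")) {
            if (!line.trim().isEmpty()) {
                lines.add(line.trim());
            }
        }
        if (lines.isEmpty()) {
            return new String[] {"", ""};
        }
        String country = lines.remove(lines.size() - 1);
        // Re-join the remaining (trimmed) lines in their original order.
        return new String[] {String.join("\n", lines), country};
    }

    public static void main(String[] args) {
        String[] parts = splitOffCountry("221B Baker Street\nLondon NW1 6XE\nUnited Kingdom");
        System.out.println(parts[1]); // United Kingdom
    }
}
```

The rest of the address stays a single block of text, so it can be printed back exactly as the user entered it.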
You could probably use regular expressions if you don't want to add third-party dependencies.
See: http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
and http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html
Usage is basically:
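import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Compile once and reuse; "my\\sregex" is only a placeholder pattern
private static final Pattern PAT_NAME = Pattern.compile("my\\sregex");
...
Matcher matcher = PAT_NAME.matcher("my address");
while (matcher.find()) System.out.println(matcher.group()); // print each match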
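As a concrete illustration of the regex approach, here is a minimal sketch that assumes a single, US-style one-line format ("street, city, ST 12345"). The pattern, the named groups and the class name are assumptions for illustration only. Data in other formats would need additional patterns, which is exactly where this approach gets hard.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UsAddressParser {

    // Assumed format: "street, city, ST 12345" or "street, city, ST 12345-6789".
    private static final Pattern US_ADDRESS = Pattern.compile(
            "\\s*(?<street>[^,]+),\\s*(?<city>[^,]+),\\s*(?<state>[A-Z]{2})\\s+(?<zip>\\d{5}(?:-\\d{4})?)\\s*");

    // Returns street/city/state/zip, or an empty map if the text does not fit the format.
    static Map<String, String> parse(String address) {
        Matcher m = US_ADDRESS.matcher(address);
        if (!m.matches()) {
            return Collections.emptyMap();
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String group : new String[] {"street", "city", "state", "zip"}) {
            parts.put(group, m.group(group).trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("123 Main St, Springfield, IL 62701"));
        // {street=123 Main St, city=Springfield, state=IL, zip=62701}
    }
}
```

Named groups (Java 7+) keep the extraction code readable as more fields are added.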
There is an older library, JGeocoder (http://jgeocoder.sourceforge.net/parser.html), but it still works for most cases.
If you want to use an API, I've used SmartyStreets in the past and they work decently well (https://smartystreets.com/).
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 1 year ago.
So, I have four lists of strings, each of which corresponds to a specific category. Each string is a job title, such as "web-developer", which corresponds to the category "IT".
The input string is going to be another job title, and the idea is to sort that job title into the appropriate category based on how well it matches the strings in each list.
Does anyone know a good library to accomplish this? Sadly, I do not have enough source material to properly train a machine learning system, and all the libraries I've found so far seem to be based on machine learning.
Alternatively, if no such library exists, does anyone have any suggestions on how to accomplish this? My best idea so far has been to just search through all the strings, call string.contains(searchString) and match it like that. I don't know how to handle multiple matches, though.
Ideally the library should be in Java, but this is not a necessity.
You could use an algorithm like Levenshtein string distance to achieve this. It gives you the number of single-character edits (insertions, deletions and substitutions) needed to change one string into another: the fewer edits needed, the more similar the strings are.
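There is an implementation in the Apache Commons Lang library: StringUtils.getLevenshteinDistance (newer releases deprecate it in favor of LevenshteinDistance in Apache Commons Text).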
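A minimal Java sketch of the idea, using hypothetical category names and sample titles: compute the Levenshtein distance from the input to every known title and pick the category containing the closest one. Taking the minimum over all titles is one simple way to deal with multiple matches; ties go to the category that was added first. The distance function is hand-rolled here only to keep the example dependency-free; the Apache Commons implementation could be used instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class JobTitleClassifier {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Picks the category containing the title closest to the input (case-insensitive).
    // Multiple candidates are resolved by smallest distance; ties keep the earlier category.
    static String categorize(String title, Map<String, List<String>> categories) {
        String query = title.toLowerCase(Locale.ROOT);
        String bestCategory = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> entry : categories.entrySet()) {
            for (String known : entry.getValue()) {
                int d = levenshtein(query, known.toLowerCase(Locale.ROOT));
                if (d < bestDistance) {
                    bestDistance = d;
                    bestCategory = entry.getKey();
                }
            }
        }
        return bestCategory;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; LinkedHashMap keeps insertion order for tie-breaking.
        Map<String, List<String>> categories = new LinkedHashMap<>();
        categories.put("IT", List.of("web-developer", "software engineer", "system administrator"));
        categories.put("Finance", List.of("accountant", "financial analyst", "auditor"));
        System.out.println(categorize("web developer", categories)); // IT
        System.out.println(categorize("acountant", categories));     // Finance
    }
}
```

A raw edit distance tends to favor short titles; dividing it by the length of the longer string is one common normalization if that becomes a problem.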
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 months ago.
I have some arbitrary text in a String variable. I want to find all inflections of a specific word that the user specifies.
Example: if the user is looking for the word "assist", it should find all occurrences of "assist", "assists", "assisted" and "assisting" in the String.
Is there a Java library available to detect such inflections automatically in a given String?
Note: I have seen a Java library called WolframAlpha that claims to do this, and it also has a web interface, but I can't get the library working, and there is no guide for using it.
First of all, it is not a Java library; it is the Wolfram Language, previously known as Mathematica. It does have J/Link and can be called from Java, but you must have a Wolfram Kernel running to execute the code.
This is called natural language processing (NLP), and it's a huge, complex field. I have fiddled with a few problems, and all I can say is that it is much harder than it looks if you want a reliable solution.
Something you might want to take a look at is Stanford NLP.
This is called word stemming. First you need to derive the stem (for a specific language):
assisting -> assist, by stripping endings such as -ance, -ing, -ly, -s, -ed, etc.
sought -> seek, using an exception list
Then do a search, maybe with a regular expression (Matcher.find). Patterns:
"\\bassist\\p{L}*"
"\\b(seek|sought)\\p{L}*"
\\b = word boundary
\\p{L}* = zero or more (*) letters
Prefixes such as un-, dis- and inter- make things more complicated, but in general English inflections are word endings. Then there is synonym searching as well.
Large collections of words and texts are usually called corpora; a search for "free English corpus" will yield results.
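The stem-then-search approach could be sketched in Java as follows. The suffix list comes from the answer above; the exception map, the three-letter minimum stem length and the class name are assumptions. This is a naive suffix stripper for illustration, not a real stemmer such as Porter's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InflectionSearch {

    // Suffixes from the answer, checked in this order; the first match wins.
    private static final List<String> SUFFIXES = List.of("ance", "ing", "ed", "ly", "s");
    // Hypothetical exception list: irregular form -> stem.
    private static final Map<String, String> IRREGULAR = Map.of("sought", "seek");

    // Naive suffix-stripping stemmer (English only, no real morphology).
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String irregular = IRREGULAR.get(w);
        if (irregular != null) {
            return irregular;
        }
        for (String suffix : SUFFIXES) {
            // Keep at least three letters so short words are left alone.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Finds words that start at a word boundary with the stem (or an irregular form
    // mapped to it), followed by zero or more letters; matching is case-insensitive.
    static List<String> findInflections(String text, String word) {
        String s = stem(word);
        StringBuilder alternatives = new StringBuilder(Pattern.quote(s));
        for (Map.Entry<String, String> e : IRREGULAR.entrySet()) {
            if (e.getValue().equals(s)) {
                alternatives.append('|').append(Pattern.quote(e.getKey()));
            }
        }
        Pattern p = Pattern.compile("\\b(?:" + alternatives + ")\\p{L}*",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findInflections("I assist, he assists, they assisted while assisting.", "assisting"));
        // [assist, assists, assisted, assisting]
    }
}
```

Note that this prefix matching also hits different words that share the stem (for example "assistant" matches the "assist" pattern), which is one reason real NLP tooling is usually needed for reliable results.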
Check this out.
I don't know how big your requirements are, but you could always use Wiktionary and parse its data.
Check this question; it may be of help.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I am looking for an open-source dictionary. The user of my application enters a city name in Hebrew, and I want to translate it to English, because another component I have can only use English city names.
Where can I find such a component?
I guess that your main problem is not how to match Hebrew names to English ones (once you have a list of pairs), but how to obtain the data that pairs Hebrew and English names in the first place. I'll also guess that you are aiming at cities in Israel if you are talking about Hebrew names. In that case I'd try the Central Bureau of Statistics: they have a table with Hebrew city names and their corresponding English names. You can find it here: http://www.cbs.gov.il/ishuvim/ishuvim_print.htm
Have you tried using one of the Map implementations, such as HashMap or TreeMap? I'd go even further: just store all localized names in resource files (properties format) and use ResourceBundle to access them.
For example, call your file towns.properties:
tel-aviv = Tel Aviv
jerusalem = Jerusalem
The Hebrew version of your file is towns_iw.properties (ResourceBundle expects an underscore between the base name and the locale suffix).
tel-aviv = תל אביב
jerusalem = ירושלים
Now use the ResourceBundle API to build a Map that associates each Hebrew name directly with its English name. Then just use it.
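A minimal sketch of building that Map from the two bundles, keyed by the Hebrew name so the user's input can be looked up directly. To keep the example self-contained, it creates the bundles from in-memory strings with PropertyResourceBundle; in an application you would more likely obtain them with ResourceBundle.getBundle("towns", ...) for the base and Hebrew locales. The class and helper names are hypothetical, and the Hebrew names are written as Unicode escapes so the source file's encoding does not matter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class TownNames {

    // Maps each Hebrew name to the English name stored under the same key.
    static Map<String, String> hebrewToEnglish(ResourceBundle english, ResourceBundle hebrew) {
        Map<String, String> result = new HashMap<>();
        for (String key : english.keySet()) {
            if (hebrew.containsKey(key)) {
                result.put(hebrew.getString(key), english.getString(key));
            }
        }
        return result;
    }

    // Demo helper (hypothetical): builds a bundle from in-memory .properties text.
    static ResourceBundle bundleOf(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle en = bundleOf("tel-aviv = Tel Aviv\njerusalem = Jerusalem\n");
        // Hebrew names for Tel Aviv and Jerusalem, written as Unicode escapes.
        ResourceBundle he = bundleOf("tel-aviv = \u05EA\u05DC \u05D0\u05D1\u05D9\u05D1\n"
                + "jerusalem = \u05D9\u05E8\u05D5\u05E9\u05DC\u05D9\u05DD\n");
        Map<String, String> toEnglish = hebrewToEnglish(en, he);
        System.out.println(toEnglish.get("\u05D9\u05E8\u05D5\u05E9\u05DC\u05D9\u05DD")); // Jerusalem
    }
}
```

If you load real .properties files, keep in mind that before Java 9 ResourceBundle read them as ISO-8859-1, so Hebrew text had to be stored as Unicode escapes; since Java 9 they are read as UTF-8 by default.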
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 1 year ago.
I am working on a natural language parser that examines an English sentence and extracts information such as names, dates, etc.
For example: "Let's meet next Tuesday at 5 PM at the beach."
The output should then be something like: "Let's meet 15/09/2009 at 1700 hr at the beach."
So basically, what I want to know is whether there is any framework or library available for Java that does this kind of operation, i.e. parsing dates from a sentence and outputting them in a specified format.
Regards,
Pranav
Thanks for the replies. I have looked at a few NLP toolkits such as LingPipe, OpenNLP and Stanford NLP. I wanted to ask whether they have anything for date parsing in Java.
Natty is a really good replacement for JChronic.
You can use JChronic, the Java port of Chronic.
Have you tried JChronic? However, I doubt any library can work directly with whole sentences: you'd have to extract sentence fragments and feed them to an NLP date-parsing framework yourself, perhaps on a trial-and-error basis (larger and larger fragments until the framework throws an error).
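The trial-and-error idea can be sketched generically: split the sentence into words and, for each starting word, try longer and longer fragments, keeping the longest one the parser accepts. In this sketch the parser is any function returning an Optional; toyParser, which only understands dates like "15 September 2009", is a hypothetical stand-in for a real library such as JChronic or Natty, and the maximum fragment length is an arbitrary assumption. Unlike the description above, it does not stop at the first failure, because a strict parser typically rejects the shorter fragments as well.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

final class FragmentDateExtractor {

    // Hypothetical stand-in for a real date parser: only accepts text like "15 September 2009".
    private static final DateTimeFormatter TOY_FORMAT =
            DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH);

    static Optional<LocalDate> toyParser(String fragment) {
        try {
            return Optional.of(LocalDate.parse(fragment, TOY_FORMAT));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // the "framework throws an error" case
        }
    }

    // For each start word, tries fragments of 1..maxWords words and keeps the longest
    // one the parser accepts; scanning then resumes after that fragment.
    static <T> List<T> extract(String sentence, Function<String, Optional<T>> parser, int maxWords) {
        String[] words = sentence.trim().split("\\s+");
        List<T> found = new ArrayList<>();
        int start = 0;
        while (start < words.length) {
            T best = null;
            int bestEnd = start;
            int limit = Math.min(words.length, start + maxWords);
            for (int end = start + 1; end <= limit; end++) {
                String fragment = String.join(" ", Arrays.copyOfRange(words, start, end));
                Optional<T> parsed = parser.apply(fragment);
                if (parsed.isPresent()) {
                    best = parsed.get();
                    bestEnd = end;
                }
            }
            if (best != null) {
                found.add(best);
                start = bestEnd;
            } else {
                start++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = extract("Let's meet on 15 September 2009 at the beach.",
                FragmentDateExtractor::toyParser, 6);
        System.out.println(dates); // [2009-09-15]
    }
}
```

The number of parser calls grows with sentence length times maxWords, which is usually fine for short sentences but worth keeping in mind for long texts.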
I don't think there's any framework out there that does that out of the box. What you can do is create a set of regular expressions to match those patterns.
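As an illustration of the regex approach, the following sketch handles only the two patterns from the question's example, "next <weekday>" and "<hour> AM/PM", using java.time to resolve the weekday relative to a reference date. The patterns, the output formats and the reference date in the usage note are assumptions; a real rule set would need many more patterns.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexDateRewriter {

    // Rule 1: "next <weekday>" -> the first such weekday strictly after the reference date.
    private static final Pattern NEXT_WEEKDAY = Pattern.compile(
            "\\bnext (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\\b",
            Pattern.CASE_INSENSITIVE);
    // Rule 2: "<hour> AM/PM" -> 24-hour "HH00 hr" (whole hours only).
    private static final Pattern HOUR_AM_PM = Pattern.compile(
            "\\b(\\d{1,2}) ?(am|pm)\\b", Pattern.CASE_INSENSITIVE);
    private static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    static String rewrite(String sentence, LocalDate reference) {
        StringBuffer dated = new StringBuffer();
        Matcher m = NEXT_WEEKDAY.matcher(sentence);
        while (m.find()) {
            DayOfWeek day = DayOfWeek.valueOf(m.group(1).toUpperCase(Locale.ROOT));
            LocalDate date = reference.with(TemporalAdjusters.next(day));
            m.appendReplacement(dated, date.format(OUT));
        }
        m.appendTail(dated);

        StringBuffer timed = new StringBuffer();
        Matcher t = HOUR_AM_PM.matcher(dated);
        while (t.find()) {
            int hour = Integer.parseInt(t.group(1)) % 12; // 12 AM -> 0; 12 PM -> 12 below
            if (t.group(2).equalsIgnoreCase("pm")) {
                hour += 12;
            }
            t.appendReplacement(timed, String.format(Locale.ROOT, "%02d00 hr", hour));
        }
        t.appendTail(timed);
        return timed.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Let's meet next Tuesday at 5 PM at the beach.",
                LocalDate.of(2009, 9, 10)));
    }
}
```

With a reference date of 10 September 2009 (a Thursday), the call in main returns "Let's meet 15/09/2009 at 1700 hr at the beach.", the output shown in the question.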
I would suggest using UIMA with OpenNLP connectors and some hand-made regexp rules.
I wrote a NLP script in Python's NLTK and fed the results to Ruby's chronic.
For my use case, I had more luck with chrono-java. Sadly, it looks stale and is not available in any Maven repository (not even via https://jitpack.io/, since the build is broken), so you have to fix and build it yourself.
However, after checking out the code and fixing a dependency (maven-javadoc-plugin was missing its groupId, and I updated the version), I was able to build and run a simple example successfully:
List<ParsedResult> results = Chrono.Parse("Datum Freitag, 08.04. bis einschl. Sonntag 10.04.2016");
results.forEach(result -> System.out.println(result));
This extracted two dates:
ParsedResult: " 08.04" > 04/08/2018 12:00
ParsedResult: "10.04.2016" > 04/10/2016 12:00
Pretty old question, but PrettyTime::NLP is another option to try.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Somewhat related to this question, but in the absence of any answer about QuickBooks specifically: does anyone know of an address parser for Java? Something that can take unstructured address information and parse out address lines 1 and 2, city, state, postal code and country?
I do know that the Google Maps web service is great at doing this. So, if you want to use that, you could save a lot of effort.
The real issue here is that you need a worldwide database of city/country/province names to effectively parse UNSTRUCTURED addresses.
Here is how I build a URL for use by the Google Maps API in C#:
string url = "http://maps.google.com/maps/geo?key=" + HttpUtility.UrlEncode(this.apiKey) + "&sensor=false&output=xml&oe=utf8&q=" + HttpUtility.UrlEncode(location);
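For reference, a Java sketch of building a similar request URL. The maps/geo endpoint in the C# snippet belongs to Google's old (v2) geocoding service, which has since been shut down, so this sketch targets the current Geocoding API JSON endpoint instead; the class name is hypothetical, you still need your own API key, and the URLEncoder overload used requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class GeocodeUrl {

    // Builds a Geocoding API request URL; apiKey and location are supplied by the caller.
    // URLEncoder form-encodes the values (spaces become '+', commas become %2C).
    static String build(String apiKey, String location) {
        return "https://maps.googleapis.com/maps/api/geocode/json?address="
                + URLEncoder.encode(location, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("YOUR_API_KEY", "1600 Amphitheatre Pkwy, Mountain View, CA"));
    }
}
```

The JSON response lists address_components tagged with types such as "route", "locality", "postal_code" and "country", which is what makes the service useful for splitting an unstructured address into parts.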
The SourceForge JGeocoder has an address parser that you may find useful. See http://jgeocoder.sourceforge.net/parser.html.
You might want to read this Stack Overflow question:
"Parse usable Street Address, City, State, Zip from a string". No actual Java code to do the job (just some VB), but there is some discussion of the problem and more info on the alternative John Gietzen mentions, of using a web service to interpret it for you.
The Mural project has an address parser: https://mural.dev.java.net/. I haven't figured out how to extract it from the larger Mural engine, but it does work, based on some very limited tests.
See www.address-parser.com; they offer a web service for parsing international addresses.