Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I am looking for a good library or some project that has been done in the area of SMS text normalization. I have found some good research projects like this one.
I am using Java as the programming language.
The concept, in a nutshell, is to take SMS-style text like "tel him 2 go home nw" and convert it to standard English: "tell him to go home now".
Why not just download a dictionary from a site like this one: http://smsdictionary.co.uk/abbreviations and use string replacement?
Dictionary substitution does not cut it, since it ignores context. For example, do you translate '2' to 'to', 'too' or 'two'?
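To make the ambiguity concrete, here is a minimal sketch of plain token-by-token dictionary substitution. The class name and the three-entry dictionary are made up for illustration; a real abbreviation list would be much larger, but it would have the same limitation.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical example class: naive, context-free dictionary substitution.
final class SmsDictionaryNormalizer {
    // Tiny illustrative dictionary (assumed entries, not taken from any real list).
    private static final Map<String, String> DICTIONARY = Map.of(
            "tel", "tell",
            "nw", "now",
            "2", "to"); // one fixed choice: never "too" or "two"

    // Replaces each whitespace-separated token that has a dictionary entry.
    static String normalize(String sms) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : sms.trim().split("\\s+")) {
            out.add(DICTIONARY.getOrDefault(token.toLowerCase(), token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("tel him 2 go home nw"));
        System.out.println(normalize("i have 2 dogs"));
    }
}
```

The first call gives "tell him to go home now", but the second gives "i have to dogs", because the lookup cannot see that '2' means 'two' there. That is the gap a statistical model trained on a corpus is meant to close.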
You can get a corpus and train a statistical model yourself, either using Moses (http://www.statmt.org/moses/) or Phrasal (http://nlp.stanford.edu/software/phrasal/).
As an author of the Stanford one (http://www-nlp.stanford.edu/sms/translate.php), I could be convinced to offer a REST-based API for such a service, but I don't know the demand for it...
Closed 3 years ago.
I'm writing an application that stores people's street addresses.
I'm aware that addresses can be complicated (especially when parsing user input), so I'm concerned that writing a simplistic class myself might cause issues I'll have to fix later.
Is there a decent model in the JDK or an external library that I could utilise instead?
No. Java does not provide such a model out of the box, and no language does. If anything, this kind of thing is provided by libraries or frameworks.
If you need to support international addresses, I recommend looking at the field types and lengths that FedEx and/or UPS use. If you model addresses based only on what you know from your own country, you are bound to make many changes later.
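As a rough starting point, a deliberately loose model might look like the sketch below (Java 16+ for records). Every field name here is my own assumption, not something taken from the FedEx or UPS specifications, and I have left out length limits on purpose since those should come from whichever carrier documentation you follow.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical address model; field names are illustrative only.
record PostalAddress(List<String> addressLines, // free-form lines, no assumed street/number split
                     String city,
                     String stateOrProvince,    // may be null: not every country has one
                     String postalCode,         // may be null: not every country uses one
                     String countryCode) {      // two-letter country code, e.g. "GB"

    PostalAddress {
        if (addressLines == null || addressLines.isEmpty()) {
            throw new IllegalArgumentException("at least one address line is required");
        }
        if (countryCode == null || !countryCode.matches("[A-Za-z]{2}")) {
            throw new IllegalArgumentException("countryCode must be a two-letter code");
        }
        addressLines = List.copyOf(addressLines);           // defensive, immutable copy
        countryCode = countryCode.toUpperCase(Locale.ROOT); // normalise case
    }
}
```

Keeping the address lines free-form and making state and postal code optional avoids baking one country's conventions into the type.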
Closed 7 years ago.
Is there any library in Java or PHP that could help retrieve all the derivations of a word?
E.g. happiness and unhappy from the root word happy.
Thanks
I don't know of such a library. This is really a matter of dictionary data, and I don't know of a suitable data set either.
Maybe you could use a plain dictionary (a list of all words and their derivations) and find similar words by calculating the Levenshtein distance from your original word. I think it could work in general, but it will also give you some errors.
Here you can find the algorithm for Java: http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
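A minimal sketch of that idea, using the standard dynamic-programming Levenshtein distance. The class name, the tiny word list and the distance threshold of 2 are assumptions chosen only to show the behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds dictionary words within a given edit distance of a root word.
final class SimilarWords {

    // Levenshtein distance, keeping only two rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns every dictionary word (other than root itself) within maxDistance edits.
    static List<String> candidates(String root, List<String> dictionary, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (!word.equals(root) && levenshtein(root, word) <= maxDistance) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("unhappy", "happiness", "happily", "harpy", "house");
        System.out.println(candidates("happy", words, 2));
    }
}
```

With this word list and threshold the result is [unhappy, happily, harpy]: "happiness" is missed because it is five edits away, and "harpy" slips in although it is unrelated. Those are exactly the kinds of errors mentioned above.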
I have found some dictionary data here: http://dumps.wikimedia.org/enwiktionary/latest/
Maybe you can find out there some data you can use and adapt :)
There is a project called Snowball. It is a string-processing language for creating stemming algorithms. You can see a little demo here: http://snowball.tartarus.org/demo.php
Closed 4 years ago.
Is there an API for finding the category of a domain? I'm looking for an alternative to reading the website's meta tags with jsoup.
For example a query with allmusic.com should yield music, while one for arsenal.com should yield sports.
The ideal candidate can be used easily with Java. :-)
P.S.: The system does not have to be perfect.
These should help:
http://www.similarweb.com/pricing
https://www.webshrinker.com/
https://zvelo.com/zvelo-products/zvelodb/
https://aws.amazon.com/awis/
All of these let you get categories for a given domain. They are not free, but then that was not specified as a requirement.
It seems that I will have to rely on my old idea, which scans the website's meta tags and its URL.
Maybe I could use Wikipedia's API to find an article and then look at its categories.
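For the Wikipedia idea, the MediaWiki Action API can list an article's categories via action=query with prop=categories. The sketch below only builds the request URL. The class and method names are hypothetical, and it assumes you have already mapped the domain to an article title (e.g. allmusic.com to "AllMusic"), which is the hard part and is not solved here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a MediaWiki API URL that lists an article's categories.
final class WikipediaCategoryQuery {

    // articleTitle is assumed to be an existing English Wikipedia article title.
    static URI categoriesUri(String articleTitle) {
        String title = URLEncoder.encode(articleTitle, StandardCharsets.UTF_8);
        return URI.create("https://en.wikipedia.org/w/api.php"
                + "?action=query&prop=categories&format=json&titles=" + title);
    }
}
```

The JSON response would then need to be fetched and parsed, and the raw Wikipedia categories mapped to coarse labels like "music" or "sports".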
Closed 5 years ago.
Does anyone know a good library/jar to parse RSS/Atom Feeds?
I'd like to pass in a URL and get the news items back in a homogeneous way. (That is, it should not matter whether the source contains an Atom or an RSS feed. I just want "items" back.)
After browsing on SO, I came across ROME, but it seems to be chaotic at the moment (no download, for example), and it is a pure Java solution. Scala would be preferred, but Java is quite okay if nothing Scala-specific exists. Also, it should be a single-JAR library, since I don't use Maven etc.
PS: It's not for Android, just for a good old desktop app.
EDIT: To be more clear, I already know how to get the content from resources as XML. I want to parse them! Autodetect whether it's Atom or some RSS and give me back a uniform list of items.
Have you checked out Dispatch? It doesn't have any RSS/Atom features built in per se, but you should be able to handle the response text as a scala.xml.NodeSeq and do whatever you need.
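If you end up handling the XML yourself, the autodetection part is small. Here is a rough sketch using only the JDK's DOM parser (in Java rather than Scala, since it needs no extra JAR). The class and the Item type are hypothetical, only title and link are extracted, and for Atom it simply takes the first link element of each entry, which is a simplification.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns RSS or Atom XML into one uniform list of items.
final class FeedItems {

    static final class Item {
        final String title;
        final String link;
        Item(String title, String link) { this.title = title; this.link = link; }
    }

    static List<Item> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations; feeds come from untrusted sources.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // Atom documents have a <feed> root; anything else is treated as RSS.
            boolean atom = "feed".equals(root.getLocalName());
            NodeList entries = root.getElementsByTagNameNS("*", atom ? "entry" : "item");
            List<Item> items = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                Element titleEl = firstChild(entry, "title");
                Element linkEl = firstChild(entry, "link");
                String title = titleEl == null ? null : titleEl.getTextContent().trim();
                // Atom keeps the URL in the href attribute, RSS in the element text.
                String link = linkEl == null ? null
                        : atom ? linkEl.getAttribute("href") : linkEl.getTextContent().trim();
                items.add(new Item(title, link));
            }
            return items;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable feed", e);
        }
    }

    // First direct child element with the given local name, or null if there is none.
    private static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

Real feeds have more corner cases (dates, multiple Atom links with different rel values, feeds that do carry a DOCTYPE), so treat this as a starting point rather than a replacement for a feed library.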
Closed 7 years ago.
I want to develop an eBook reader app. What are some good libraries available to parse formats like .azw, .mobi, .pdf etc.?
As Ranhiru said, here and here you can see how PDFs are parsed.
For .mobi, however, there is no library, so you'll have to parse the format yourself. A full specification of the format can be read on the mobileread wiki.
With .azw files, it's different: if the Kindle ebook is DRM-free, then its format coincides with the .mobi one, i.e. they are absolutely interchangeable. Otherwise, it's very difficult to do, since you'll also have to generate a Kindle PID and perform the de-DRM-ing of the .azw file. There's a guide on how to do that on the desktop here. However, it is strongly not recommended, since it breaks the whole point of DRM and is illegal pretty much everywhere.
For .mobi there isn't a complete spec sheet available, but you should start directly with the PDB format, which MOBI extends and uses:
http://jola.comm.pl/palm/opispdb.htm
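To give an idea of what reading the PDB layer involves, here is a small sketch that pulls a few fields out of the fixed 78-byte PDB header. The class name is hypothetical, and the offsets (name at 0, type at 60, creator at 64, record count at 76, all big-endian) are the ones commonly documented for the Palm database format; please verify them against the format description before relying on them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a few fields from the start of a PDB (Palm database) file.
final class PdbHeader {
    static final int HEADER_LENGTH = 78; // assumed fixed header size

    final String name;     // database name, NUL-terminated within 32 bytes
    final String type;     // 4-character type code
    final String creator;  // 4-character creator code
    final int recordCount; // number of records that follow the header

    private PdbHeader(String name, String type, String creator, int recordCount) {
        this.name = name;
        this.type = type;
        this.creator = creator;
        this.recordCount = recordCount;
    }

    static PdbHeader parse(byte[] data) {
        if (data.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " bytes");
        }
        int nameLength = 0;
        while (nameLength < 32 && data[nameLength] != 0) {
            nameLength++; // stop at the first NUL byte
        }
        String name = new String(data, 0, nameLength, StandardCharsets.ISO_8859_1);
        String type = new String(data, 60, 4, StandardCharsets.US_ASCII);
        String creator = new String(data, 64, 4, StandardCharsets.US_ASCII);
        // ByteBuffer is big-endian by default, which matches the PDB layout.
        int recordCount = ByteBuffer.wrap(data).getShort(76) & 0xFFFF;
        return new PdbHeader(name, type, creator, recordCount);
    }

    // MOBI files are usually identified by type "BOOK" and creator "MOBI".
    boolean isMobi() {
        return "BOOK".equals(type) && "MOBI".equals(creator);
    }
}
```

After the header comes the record list, and the MOBI-specific headers live inside the first record, so this only covers the very first step.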