Convert Numbers written as Words to Integers? [duplicate] - java

This question already has answers here:
How to convert words to a number? [closed]
(3 answers)
Closed 9 years ago.
Is there an open-source Java library for converting String numbers into their equivalent Integers (for example, converting "ten" into 10)? I know how to do it, but I'd rather not waste my customer's time writing one from scratch if there's already a library available.

I doubt that such a library exists.
If you're only looking to convert a limited set of numbers (such as zero through ten), then it would probably take you more time to ask this question here than to just implement it yourself.
If you're looking at converting more complex numbers such as "one hundred twenty four and fifty-one hundredths", then what you're looking for is a natural language recognizer, which is extremely complicated and unlikely to have a good library in any language.
In the end, it's normally best for back-end values and user-consumable content not to be coupled.

For "twenty-seven" or "twenty and seven"? For "twenty seven" or "score and seven"? Baker's dozen anyone? A pair of dice, or two dice? One short of a six pack? The trifecta of number processing routines? The 21st century (year 20xx)?
Your requirements are a bit broader than I imagine you considered them. I'd recommend that you work with a framework that gives you the flexibility to add new representations instead of assuming a single representation. Apache OpenNLP (Apache's open natural language processing framework) might be a good choice.
After a few attempts, you might build the trinity of number processing routines. Or at least have a plethora of ideas.

Related

how to replace variables like two-third, one-fifth, two-hundredth number form using java

I want to take input in the name form like two-third or one-fifth and I want my system to convert it into numerical form and give the answer.
Que: two-third of thirty is?
The system should output 20
How can I program it?
As a general problem, natural language processing (NLP) - which is what you're talking about - is a difficult, open-ended problem.
There are lots of libraries for this stuff. If you want background look here:
Is there a good natural language processing library
Or look up Natural Language Processing in Wikipedia.
However you said you want to do this and you're new to programming.
The first thing you need to do is break the problem down. That's how we solve programming problems.
So first try writing a program that can read a string containing a single word and map it to a number.
For example "One" outputs 1, "Two" outputs 2, "Thirty" outputs 30.
Next try and write a program that cuts a string into its constituent words.
You probably want to use an array here.
That's a process called tokenizing and Java has a built in StringTokenizer to do that.
You might want to code that yourself, but you're learning and it might be the moment to start learning using library code.
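Here is a minimal sketch of the tokenizing step, to show roughly what it looks like. The class and method names (Tokenizing, withTokenizer, withSplit) are made up for illustration. Note that the Java documentation describes StringTokenizer as a legacy class and suggests String.split for new code, so both are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class Tokenizing {
    // Splits on spaces and hyphens using the legacy StringTokenizer class.
    static List<String> withTokenizer(String text) {
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " -");
        while (st.hasMoreTokens()) {
            words.add(st.nextToken().toLowerCase());
        }
        return words;
    }

    // Similar result with String.split and a regular expression
    // (any whitespace or hyphen run counts as one separator).
    static String[] withSplit(String text) {
        return text.trim().toLowerCase().split("[\\s-]+");
    }

    public static void main(String[] args) {
        System.out.println(withTokenizer("Thirty Seven"));            // [thirty, seven]
        System.out.println(String.join("|", withSplit("forty-two"))); // forty|two
    }
}
```

Lower-casing the words here means the later lookup step only needs one spelling of each word.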
When you've got those try combining them so your program can convert "Thirty Seven" into 37 (i.e. numbers under 100).
That new program should combine the ideas of your program that can convert "Thirty" and "Seven" and the one that can split words up.
This is the other thing we do in programming - combining things.
We break it down to smaller problems, solve them, and then build them back up to solve the bigger problems.
(I apologize if I'm patronizing you but I have no idea of your experience).
After that you might add logic that handles "Five Hundred And Thirty Seven".
Again, notice how spotting Five followed by Hundred is like converting Five and then finding a token that tells you to multiply what you just saw by 100.
You could go on to handle Thousands, Hundred Thousand etc.
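The steps so far can be put together roughly as follows. This is only a sketch under my own assumptions: the names (WordNumbers, parse) are invented, "and" is simply skipped, and there is no grammar checking, so nonsense like "seven thirty" is accepted and summed.

```java
import java.util.HashMap;
import java.util.Map;

class WordNumbers {
    // Words that simply add their value: zero..nineteen and twenty..ninety.
    private static final Map<String, Integer> SMALL = new HashMap<>();
    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) SMALL.put(units[i], i);
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
                "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) SMALL.put(tens[i], (i + 2) * 10);
    }

    static long parse(String text) {
        long total = 0;   // finished groups (thousands, millions)
        long current = 0; // group being built, e.g. "five hundred thirty seven"
        for (String word : text.trim().toLowerCase().split("[\\s-]+")) {
            if (word.equals("and")) continue;
            Integer small = SMALL.get(word);
            if (small != null) {
                current += small;
            } else if (word.equals("hundred")) {
                current *= 100;               // multiply what we just saw by 100
            } else if (word.equals("thousand")) {
                total += current * 1_000;     // close the group and start a new one
                current = 0;
            } else if (word.equals("million")) {
                total += current * 1_000_000;
                current = 0;
            } else {
                throw new IllegalArgumentException("Unknown number word: " + word);
            }
        }
        return total + current;
    }

    public static void main(String[] args) {
        System.out.println(parse("Thirty Seven"));                  // 37
        System.out.println(parse("Five Hundred And Thirty Seven")); // 537
    }
}
```

The "hundred" branch is the multiply-what-you-just-saw idea from above; "thousand" and "million" work the same way but also close off the current group.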
Or you could branch off into the fractions.
That's similar but you just have a different vocabulary.
Seven Forty-Seconds = 7/42.
As a learning challenge I would suggest you'll have come a long way if your program handles things like "forty two ninety-thirds of eight hundred and eighty nine".
The easy solution outputs 401.4839 - the floating point answer to (42/93)*889.
The extra credit solution outputs 12446/31 - (42/93)*889 can be simplified as a rational number to 12446/31.
To be honest, you'll be doing well if you can handle "nine-ninths of ninety nine".
Notice that the first word is the numerator (n). The second is the denominator (d). The third is always 'of'. The fourth word is either the tens (t) or the units (u). If the fourth was the units you're done; otherwise, if there is a fifth word, it's the units.
The answer in that case is n/d*(t*10+u), where t is the tens digit (9 for "ninety"). If the tens or units are missing they're zero - obviously.
PS: You might need special handling for zero if you object to someone typing in ninety zero. It obviously means ninety but we don't say it in English!
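As a rough sketch of that fixed word layout (numerator, denominator, 'of', tens and/or units), something like the following would do. Everything named here (FractionOf, evaluate, the two lookup tables) is made up for illustration, the vocabulary only covers one to nine, twenty to ninety and halves to ninths, and nothing checks that the fifth word really is a units word. The result is returned as a simplified fraction.

```java
import java.util.HashMap;
import java.util.Map;

class FractionOf {
    private static final Map<String, Integer> CARDINALS = new HashMap<>();
    private static final Map<String, Integer> ORDINALS = new HashMap<>();
    static {
        String[] units = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};
        for (int i = 0; i < units.length; i++) CARDINALS.put(units[i], i + 1);
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) CARDINALS.put(tens[i], (i + 2) * 10);
        String[] parts = {"half", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth"};
        for (int i = 0; i < parts.length; i++) ORDINALS.put(parts[i], i + 2);
    }

    // Returns {numerator, denominator} of the simplified answer.
    static long[] evaluate(String text) {
        String[] w = text.trim().toLowerCase().split("[\\s-]+");
        if (w.length < 4 || w.length > 5 || !w[2].equals("of")) {
            throw new IllegalArgumentException("Expected: <n> <d> of <tens> [<units>]");
        }
        long n = lookup(CARDINALS, w[0]);
        long d = lookup(ORDINALS, singular(w[1]));
        long whole = lookup(CARDINALS, w[3]);           // tens or units
        if (w.length == 5) whole += lookup(CARDINALS, w[4]); // units after the tens
        long num = n * whole;
        long g = gcd(num, d);
        return new long[] {num / g, d / g};
    }

    // "thirds" -> "third", "halves" -> "half"
    private static String singular(String word) {
        if (word.equals("halves")) return "half";
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    private static int lookup(Map<String, Integer> table, String word) {
        Integer value = table.get(word);
        if (value == null) throw new IllegalArgumentException("Unknown word: " + word);
        return value;
    }

    private static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        long[] r = evaluate("nine-ninths of ninety nine");
        System.out.println(r[0] + "/" + r[1]); // 99/1
    }
}
```

Keeping the answer as a numerator and denominator pair avoids floating point entirely; dividing the two at the end gives the "easy" decimal answer.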
You could try a mapping from
one ->1
two ->2
three ->3
four ->4
and so on
and on the other hand:
half ->2
third ->3
fourth ->4
Then create a double by dividing the first value by the second.
Finally, multiply this value by the third (you can use the first mapping for this value) and you have the result.
Even so, it is not easy, because you have to build the mapping between string and int manually.

Simple physical quantity measurement unit parser for Java

I want to be able to parse expressions representing physical quantities like
g/l
m/s^2
m/s/kg
m/(s*kg)
kg*m*s
°F/(lb*s^2)
and so on. In the simplest way possible. Is it possible to do so using something like Pyparsing (if such a thing exists for Java), or should I use more complex tools like Java CUP?
EDIT: To answer MrD's question, the goal is to convert between quantities, for example g to kg (this one is simple...), or maybe °F/(kg*s^2) to K/(lb*h^2), supposing h stands for hour and lb for pounds.
This is harder than it looks. (I have done a fair amount of work here). The main problem is there is no standard (I have worked with NIST on units and although they have finally created a markup language few people use it). So it's really a form of natural language processing and has to deal with:
ambiguity (what does "M" mean - meters or mega)
inconsistent punctuation
abbreviations
symbols (e.g. "mu" for micro)
unclear semantics (e.g. is kg/m/s the same as kg/(m*s)?)
If you are just creating a toy system then you should create a BNF for the system and make sure that all examples adhere to it. This will use common punctuation ("/", "*", "(", ")", "^"). Character fields can be of variable length ("m", "kg", "lb"). Algebra on these strings ("kg" -> 1000 "g") has problems, as kg is a fundamental unit.
If you are doing it seriously then ANTLR (@Yaugen) is useful, but be aware that units in the wild will not follow a regular grammar due to the inconsistencies above.
If you are REALLY serious (i.e. prepared to put in a solid month), I'd be interested to know. :-)
My current approach (which is outside the scope of your question) is to collect a large number of examples from the literature automatically and create a number of heuristics.
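For the toy-system route, a hand-written recursive-descent parser over a small BNF is probably enough. The sketch below is only one possible grammar, and the class name UnitParser and its output format (unit symbol mapped to integer exponent) are my own invention. It assumes "*" and "/" are left-associative, so m/s/kg is read as m/(s*kg), and it does no unit conversion or prefix handling.

```java
import java.util.Map;
import java.util.TreeMap;

class UnitParser {
    private final String src;
    private int pos = 0;

    private UnitParser(String src) {
        this.src = src.replaceAll("\\s+", "");
    }

    // Parses a unit expression into a map of unit symbol -> exponent.
    static Map<String, Integer> parse(String text) {
        UnitParser p = new UnitParser(text);
        Map<String, Integer> result = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return result;
    }

    // expr := factor (('*' | '/') factor)*      (left-associative)
    private Map<String, Integer> expr() {
        Map<String, Integer> acc = factor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            int sign = src.charAt(pos++) == '*' ? 1 : -1;
            for (Map.Entry<String, Integer> e : factor().entrySet()) {
                acc.merge(e.getKey(), sign * e.getValue(), Integer::sum);
            }
        }
        acc.values().removeIf(v -> v == 0); // units that cancel out, e.g. m/m
        return acc;
    }

    // factor := primary ('^' integer)?
    private Map<String, Integer> factor() {
        Map<String, Integer> base = primary();
        if (pos < src.length() && src.charAt(pos) == '^') {
            pos++;
            int start = pos;
            if (pos < src.length() && src.charAt(pos) == '-') pos++;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            int exp = Integer.parseInt(src.substring(start, pos));
            base.replaceAll((unit, e) -> e * exp);
        }
        return base;
    }

    // primary := symbol | '(' expr ')'
    private Map<String, Integer> primary() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            Map<String, Integer> inner = expr();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        // A symbol is a run of letters; '\u00B0' is the degree sign.
        while (pos < src.length()
                && (Character.isLetter(src.charAt(pos)) || src.charAt(pos) == '\u00B0')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("Expected a unit at position " + pos);
        Map<String, Integer> single = new TreeMap<>();
        single.put(src.substring(start, pos), 1);
        return single;
    }

    public static void main(String[] args) {
        System.out.println(parse("m/s^2"));    // {m=1, s=-2}
        System.out.println(parse("m/(s*kg)")); // {kg=-1, m=1, s=-1}
    }
}
```

Reducing every expression to a map of exponents makes two spellings comparable: under this grammar m/s/kg and m/(s*kg) produce the same map. Whether that is the reading people intend in the wild is exactly the ambiguity described above.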

Java Double Comparison [duplicate]

This question already has answers here:
comparing float/double values using == operator
(9 answers)
Closed 5 years ago.
Are there any java libraries for doing double comparison?
e.g.
public static boolean greaterThanOrEqual(double a, double b, double epsilon) {
    // true when a >= b, or when a falls short of b by less than epsilon
    return a - b > -epsilon;
}
Every project I start I end up re-implementing this and copy-pasting code and test.
NB a good example of why it's better to use 3rd party JARs is that IBM recommends the following:
"If you don't know the scale of the underlying measurements, using the
test "abs(a/b - 1) < epsilon" is likely to be more robust than simply
comparing the difference"
I doubt many people would have thought of this, and it illustrates that even simple code can be sub-optimal.
Guava has DoubleMath.fuzzyCompare().
The standard Java library has no method that handles this. I suggest you follow Joachim's link and use that library, which should suit your needs. My own suggestion, though, would be to create a utils library to which you add frequently used methods like the one in your question. For different implementations of the comparison, consider looking into this:
Java double comparison epsilon
Feel free to ask about anything that is still unclear.
You should abstain from any library that uses the naive "maximum absolute difference" approach (like Guava). As detailed in Bruce Dawson's excellent article Comparing Floating Point Numbers, 2012 Edition, it is highly error-prone because it only works for a very limited range of values. A much more robust approach is to use relative differences or ULPs (units in the last place) for approximate comparisons.
The only library I know of that does implement a correct approximate comparison algorithm is Apache Commons Math.
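If you would rather not add a dependency, the two approaches can be sketched with the standard library alone. This is not the Apache Commons Math implementation; the class and method names (FuzzyDoubles, nearlyEqualRelative, nearlyEqualUlps) are made up, and treating values of opposite sign as unequal is a choice I made to keep it short. Relative comparison also breaks down for values very close to zero, where an extra absolute tolerance is usually needed.

```java
class FuzzyDoubles {
    // Relative comparison: the tolerance scales with the larger magnitude.
    static boolean nearlyEqualRelative(double a, double b, double relEps) {
        if (a == b) return true; // also covers equal infinities and +0.0 == -0.0
        if (Double.isNaN(a) || Double.isNaN(b)) return false;
        if (Double.isInfinite(a) || Double.isInfinite(b)) return false;
        double diff = Math.abs(a - b);
        return diff <= relEps * Math.max(Math.abs(a), Math.abs(b));
    }

    // ULP comparison: counts how many representable doubles lie between a and b.
    static boolean nearlyEqualUlps(double a, double b, int maxUlps) {
        if (a == b) return true;
        if (Double.isNaN(a) || Double.isNaN(b)) return false;
        if (Double.isInfinite(a) || Double.isInfinite(b)) return false;
        if ((a < 0) != (b < 0)) return false; // opposite signs: treated as unequal
        return Math.abs(orderedBits(a) - orderedBits(b)) <= maxUlps;
    }

    // Maps the sign-magnitude bit pattern onto a scale that increases with the value.
    private static long orderedBits(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits < 0 ? Long.MIN_VALUE - bits : bits;
    }

    public static void main(String[] args) {
        System.out.println(nearlyEqualRelative(0.1 + 0.2, 0.3, 1e-12));   // true
        System.out.println(nearlyEqualRelative(1e20, 1e20 + 1e5, 1e-12)); // true
        System.out.println(nearlyEqualUlps(0.1 + 0.2, 0.3, 4));           // true
        System.out.println(nearlyEqualUlps(1.0, 1.0001, 4));              // false
    }
}
```

The second call is the kind of case where a fixed absolute epsilon such as 1e-9 fails: neighbouring doubles near 1e20 are thousands apart, so no two distinct values there can ever be within 1e-9 of each other.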

Conversion of CharSequence to maths expression

In my app I have two TextViews. One contains an expression, e.g. 3 + 4 =
And the second contains an answer, e.g. 7
How would I go about turning this into a valid maths expression so the app could calculate it and return the answer as an int?
Depending on the complexity of the expressions you expect in your TextViews, you might need to construct a parser/interpreter for them. If that's the case, I heartily recommend ANTLR. For more information about using ANTLR on Android, see this question.
Another parser generator that I know of is JavaCC, but ANTLR is a lot more flexible and powerful.
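If the expressions stay as simple as the one in the question, a parser generator may be more than you need, and a small hand-written evaluator could do. The following is a sketch only: TinyCalc and its methods are names I made up, it handles non-negative integer literals with + - * / and parentheses, uses integer division, and strips a trailing "=". It takes a CharSequence because that is what TextView.getText() returns.

```java
class TinyCalc {
    private final String src;
    private int pos = 0;

    private TinyCalc(String src) {
        this.src = src;
    }

    static int evaluate(CharSequence text) {
        // Drop whitespace and a trailing '=' such as in "3 + 4 ="
        String s = text.toString().replaceAll("\\s+", "");
        if (s.endsWith("=")) s = s.substring(0, s.length() - 1);
        TinyCalc calc = new TinyCalc(s);
        int value = calc.sum();
        if (calc.pos != s.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + calc.pos);
        }
        return value;
    }

    // sum := product (('+' | '-') product)*
    private int sum() {
        int value = product();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int rhs = product();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // product := atom (('*' | '/') atom)*
    private int product() {
        int value = atom();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            int rhs = atom();
            value = (op == '*') ? value * rhs : value / rhs; // integer division
        }
        return value;
    }

    // atom := integer | '(' sum ')'
    private int atom() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int value = sum();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a number at position " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3 + 4 ="));         // 7
        System.out.println(evaluate("2 + 3 * (4 - 1)")); // 11
    }
}
```

Once the grammar grows beyond this (functions, variables, decimals, error recovery), a generated parser starts to pay for itself.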

Natural language processing to recognise numerical data

My requirement is to recognize and extract numerical data from a natural language sentence (English only) in response to queries. Platform is Java. For example if the user query is "What is the height of mount Everest" and we have a paragraph as:
In 1856, the Great Trigonometric Survey of British India established the first published height of Everest, then known as Peak XV, at 29,002 ft (8,840 m). In 1865, Everest was given its official English name by the Royal Geographical Society upon recommendation of Andrew Waugh, the British Surveyor General of India at the time, who named it after his predecessor in the post, and former chief, Sir George Everest.[4] Chomolungma had been in common use by Tibetans for centuries, but Waugh was unable to propose an established local name because Nepal and Tibet were closed to foreigners. (Pasted from wikipedia)
For a user query "Height of mount Everest" from the paragraph I need to get 29002 ft or 8840 m as the answer. Can anyone please suggest any possible ways of doing it in Java? Are there any open source libraries for the same?
Obviously, doing this well is extremely difficult. If it's an assignment though, then I'm guessing the expectation is a bit lower. Here are some thoughts to hopefully get you started:
I'd split the problem into 2 parts: parsing the question block and then parsing the answer block. From the question block, you need 2 pieces of information: the noun you're searching for, and the type of the answer. In this case the noun is Everest and the type is height. For the "types" of data, you can fairly quickly build a dictionary to search your input string for (e.g. "height", "weight", "distance", "age"). The nouns are more difficult, so I'd say to just assume that every non-type word in the question is a potential noun, perhaps after removing a dictionary of known non-nouns (such as "at", "the", "of" etc.).
Once you've identified the noun and type from the question, you can begin scanning your answer block. I'd begin by breaking that up into sentences. Then scan each sentence for each of your nouns. If one is found in that sentence, you need to scan the sentence again for numbers (taking into account possible whitespace or comma delimiting). Finally, you need to look "around" any numbers you find for a measurement type. So in this case, your "type" that we parsed from the question was "height". You would need to create a mapping of types to measurements, so "height" would map "km, ft, in, cm, m" etc. If the number has one of these types around it, then return the number and measurement type as the answer.
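To make that concrete, here is a rough sketch of the whole pipeline using only regular expressions. All names (MeasurementFinder, answer, the two dictionaries) are invented for illustration, the dictionaries are tiny, and the matching is deliberately crude: for instance "5 in the" would be read as five inches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MeasurementFinder {
    // Answer type -> measurement units that may follow a number of that type.
    private static final Map<String, List<String>> UNITS_BY_TYPE = new HashMap<>();
    static {
        UNITS_BY_TYPE.put("height", Arrays.asList("km", "ft", "in", "cm", "m"));
        UNITS_BY_TYPE.put("weight", Arrays.asList("kg", "lb", "g"));
    }
    // Known non-nouns to ignore in the question.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("what", "is", "the", "of", "at", "a", "an"));

    static List<String> answer(String question, String paragraph) {
        // Part 1: find the answer type and the candidate nouns in the question.
        String type = null;
        Set<String> nouns = new HashSet<>();
        for (String word : question.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            if (UNITS_BY_TYPE.containsKey(word)) type = word;
            else if (!STOP_WORDS.contains(word)) nouns.add(word);
        }
        List<String> found = new ArrayList<>();
        if (type == null) return found;

        // A number (with optional thousands separators or decimals) followed by a unit.
        Pattern numberWithUnit = Pattern.compile(
                "\\b(\\d{1,3}(?:,\\d{3})+|\\d+(?:\\.\\d+)?)\\s*("
                        + String.join("|", UNITS_BY_TYPE.get(type)) + ")\\b");

        // Part 2: scan each sentence that mentions one of the nouns.
        for (String sentence : paragraph.split("(?<=[.!?])\\s+")) {
            Set<String> words = new HashSet<>(Arrays.asList(sentence.toLowerCase().split("\\W+")));
            if (Collections.disjoint(words, nouns)) continue;
            Matcher m = numberWithUnit.matcher(sentence);
            while (m.find()) {
                found.add(m.group(1).replace(",", "") + " " + m.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "In 1856, the Great Trigonometric Survey of British India established "
                + "the first published height of Everest, then known as Peak XV, at 29,002 ft "
                + "(8,840 m). In 1865, Everest was given its official English name.";
        System.out.println(answer("What is the height of mount Everest", text));
        // [29002 ft, 8840 m]
    }
}
```

Requiring a unit directly after the number is what keeps the years 1856 and 1865 out of the result.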
Hope that gets you started. As stated above, this is not intended to be a robust, commercial solution. It's homework-level.
