diff implementation in Java [duplicate] - java

This question already has answers here:
How to perform string Diffs in Java?
(9 answers)
Closed 5 years ago.
I'm looking for a diff implementation in Java. I've seen that Python has its own SequenceMatcher (with difflib), which is exactly what I need... in Java.
Is there a port? Or is there any other class or library that does the same in Java?
If not, where can I find the source code of difflib (if free as in speech) so I can write my own implementation of SequenceMatcher in Java?
Unfortunately, Apache Commons Lang doesn't help me much.
Thanks!

This library seems to be what you're after: google-diff-match-patch.
It has the following main features:
Diff: Compare two blocks of plain text and efficiently return a list of differences.
Match: Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
Patch: Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.
In case you want an alternative, you could also try this: java-diff-utils
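To make "return a list of differences" concrete, here is a minimal sketch of a line-based diff built on a longest-common-subsequence table. It is not the algorithm used by google-diff-match-patch or java-diff-utils, and the class name LineDiff and the "- " / "+ " prefixes are made up for illustration. It uses O(n*m) memory, so it is only reasonable for small inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library mentioned above.
final class LineDiff {
    private LineDiff() {}

    /** Returns lines prefixed with "  " (in both), "- " (only in a) or "+ " (only in b). */
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start, emitting one edit per step.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i));
                i++;
            } else {
                out.add("+ " + b.get(j));
                j++;
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }
}
```

Calling LineDiff.diff on the lists [a, b, c] and [a, x, c] keeps "a" and "c" and reports "b" as removed and "x" as added. For real work, one of the libraries above is the better choice.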

Hi. You can run an MR job that uses https://code.google.com/p/google-diff-match-patch/ to do the required job. I don't think there are any tools that do this out of the box.

Related

Java isEmpty or "".equals for performance [duplicate]

This question already has answers here:
Should I use string.isEmpty() or "".equals(string)?
(6 answers)
Closed 7 years ago.
I'm writing a lot of components in Adobe CQ, so I have to deal a lot with user-set properties. And I'm getting a little tired of all the null checks before I can do an isEmpty check.
I'd like to do something like:
"".equals(string);
This would be a lot more readable, but how would it compare performance-wise? And yes, I would create the "" as a constant if there were multiple checks.
Thanks
D
Personally, I use Apache's StringUtils, e.g.:
if (StringUtils.isEmpty(someString)) {
...
or
if (StringUtils.isNotEmpty(someString)) {
...
Also, I really wouldn't worry about the performance of this unless you have benchmarked and identified it as an issue.
Prefer the isEmpty() method: the source code is simpler and faster.
Another efficient way to check for an empty string in Java is:
string.length() == 0;
You should not care about performance here. Both versions have similar speed. Even if they compile differently, the JITted code is unlikely to differ by more than a few CPU cycles (especially given that String.equals is a JVM intrinsic). It is not something to worry about when programming in Java.
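If the real annoyance is the repeated null check, a tiny helper is probably enough without pulling in a library. This is only a sketch: the names EmptyChecks and isNullOrEmpty are made up. Note that "".equals(string) is null-safe but returns false for null, so it does not treat an unset property as empty, whereas this helper does.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class EmptyChecks {
    private EmptyChecks() {}

    /** Returns true when s is null or has length zero. */
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    /** The variant from the question: never throws, but null counts as "not empty". */
    static boolean equalsEmpty(String s) {
        return "".equals(s);
    }
}
```

Both methods return true for "" and false for " ". They differ only for null, where isNullOrEmpty returns true and equalsEmpty returns false.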

Convert Numbers written as Words to Integers? [duplicate]

This question already has answers here:
How to convert words to a number? [closed]
(3 answers)
Closed 9 years ago.
Is there an open-source Java library for converting String numbers into their equivalent Integers (for example, converting "ten" into 10)? I know how to do it, but I'd rather not waste my customer's time writing one from scratch if there's already a library available.
I doubt that such a library exists.
If you're only looking to convert a limited set of numbers (such as zero through ten), then it would probably take you more time to ask this question here than to just implement it yourself.
If you're looking at converting more complex numbers such as "one hundred twenty four and fifty-one hundredths", then what you're looking for is a natural language recognizer, which is extremely complicated and unlikely to have a good library in any language.
In the end, it's normally best for back-end values and user-consumable content not to be coupled.
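For the limited case (zero through ten), a lookup table is about all it takes. A minimal sketch, assuming lowercase English words and nothing fancier; the class name SmallNumberWords is made up for illustration:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper covering only "zero" through "ten".
final class SmallNumberWords {
    private static final String[] NAMES = {
        "zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine", "ten"
    };
    private static final Map<String, Integer> VALUES = new HashMap<>();

    static {
        // The array index doubles as the numeric value.
        for (int i = 0; i < NAMES.length; i++) {
            VALUES.put(NAMES[i], i);
        }
    }

    private SmallNumberWords() {}

    /** Returns the value for a known word, or an empty OptionalInt otherwise. */
    static OptionalInt parse(String word) {
        if (word == null) {
            return OptionalInt.empty();
        }
        Integer value = VALUES.get(word.trim().toLowerCase(Locale.ROOT));
        return value == null ? OptionalInt.empty() : OptionalInt.of(value);
    }
}
```

Anything outside the table (including "eleven" or "twenty-seven") comes back empty, which is exactly where the natural-language problem described above begins.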
For "twenty-seven" or "twenty and seven"? For "twenty seven" or "score and seven"? Baker's dozen anyone? A pair of dice, or two dice? One short of a six pack? The trifecta of number processing routines? The 21st century (year 20xx)?
Your requirements are a bit broader than I imagine you considered them. I'd recommend working with a framework that gives you the flexibility to add new representations instead of assuming a single one. Apache's OpenNLP (Open Natural Language Processing) framework might be a good choice.
After a few attempts, you might build the trinity of number processing routines. Or at least have a plethora of ideas.

Java Double Comparison [duplicate]

This question already has answers here:
comparing float/double values using == operator
(9 answers)
Closed 5 years ago.
Are there any java libraries for doing double comparison?
e.g.
public static boolean greaterThanOrEqual(double a, double b, double epsilon) {
    return a - b > -epsilon;
}
Every project I start I end up re-implementing this and copy-pasting code and test.
N.B. A good example of why it's better to use third-party JARs is that IBM recommends the following:
"If you don't know the scale of the underlying measurements, using the
test "abs(a/b - 1) < epsilon" is likely to be more robust than simply
comparing the difference"
I doubt many people would have thought of this, and it illustrates that even simple code can be sub-optimal.
Guava has DoubleMath.fuzzyCompare().
The standard Java library has no methods that handle this problem. I suggest you follow Joachim's link and use that library, which is quite good for your needs. That said, my own suggestion would be to create a utils library to which you add frequently used methods such as the one in your question. For different implementations of your problem, consider looking into this:
Java double comparison epsilon
Feel free to ask about anything that is still unclear.
You should abstain from any library that uses the naive "maximum absolute difference" approach (like Guava). As detailed in Bruce Dawson's excellent article Comparing Floating Point Numbers, 2012 edition, it is highly error-prone because it only works for a very limited range of values. A much more robust approach is to use relative differences or ULPs for approximate comparisons.
The only library I know of that implements a correct approximate comparison algorithm is Apache Commons Math.
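As a rough illustration of the relative-difference idea, here is a minimal sketch that combines a relative tolerance with a small absolute tolerance for values near zero. It is not the algorithm of Commons Math or Guava; the class name ApproxDouble, the method names and the tolerance parameters are all assumptions made for this example, and sensible tolerance values depend on your data.

```java
// Hypothetical helper; not taken from any library mentioned above.
final class ApproxDouble {
    private ApproxDouble() {}

    /**
     * True when a and b differ by at most absTol, or by at most
     * relTol times the larger of their magnitudes.
     */
    static boolean nearlyEqual(double a, double b, double relTol, double absTol) {
        if (a == b) {
            return true; // exact match, including equal infinities
        }
        if (Double.isNaN(a) || Double.isNaN(b)
                || Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // NaN equals nothing; an infinity only equals itself
        }
        double diff = Math.abs(a - b);
        if (diff <= absTol) {
            return true; // absolute check, needed when both values are near zero
        }
        return diff <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    /** Tolerant version of a >= b. */
    static boolean greaterThanOrEqual(double a, double b, double relTol, double absTol) {
        return a > b || nearlyEqual(a, b, relTol, absTol);
    }
}
```

With relTol = 1e-9 and absTol = 1e-12, the values 0.1 + 0.2 and 0.3 compare as equal, and so do 1e10 and 1e10 + 1, while 1.0 and 1.1 do not. A fixed absolute epsilon cannot handle both magnitudes at once, which is the point made above.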

Way to detect if String content is a DateTime - RegExp? [duplicate]

This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
generically parsing String to date
Following situation:
I need to detect if a String contains a DateTime/Timestamp. The problem is that those DateTimes come in various formats and granularity such as:
2011-09-12
12-09-2011
12.09.2011
2011-09-01-14:15
... and many many more variations
I don't need to understand the semantics (e.g. distinguish between day and month); I just need to detect, let's say, 80% of the most common DateTime variations.
My first thought was using RegExp, which I'm far from being familiar with, and I would also need to familiarize myself with all the variations in which DateTimes can come.
So my questions:
Does anybody know a canned RegExp to achieve this?
Is there maybe some Java library that could do this task?
Thanks!!
There is another question on the same topic; I hope this link helps: Dynamic regex for date time formats
You're going to struggle to find a generic match. For the day-month-year section you could possibly use a pattern like (\d{1,2}.){2}\d{4}, which would match dates in the format dd*mm*yyyy, where * is any single separator character.
DateFormat would be a better choice, I think. As John B suggested above, create a list of valid formats and try to match against each one.
Use Java's DateFormat.
You can set up as many formats as you want and iterate through them looking for a match. You will have to catch exceptions for the formats that don't parse, so this solution is not efficient, but it will work.
Edit per comment:
If you don't want exceptions for performance reasons, then you would need to set up a list of regular expressions (one for each format you will support). Find the regex (if any) that matches your input and convert it to a date based on the matching format. What I would suggest is to pair a DateFormat with each regex and let the appropriate DateFormat do the parsing once you have identified it. This would reduce the chance of errors in using the groups from the regex to produce the date. Personally, I don't know if this would actually be more efficient than try/catch, so I would opt for the more straightforward mechanism (using DateFormat directly).
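A minimal sketch of the "iterate through formats" idea, using SimpleDateFormat with the parse(String, ParsePosition) overload, which returns null on failure instead of throwing. The class name DateSniffer and the pattern list are assumptions for this example (the patterns just mirror the samples in the question), and the check requires the whole string to be a date rather than merely contain one.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper; extend PATTERNS with whatever formats you need to support.
final class DateSniffer {
    private static final String[] PATTERNS = {
        "yyyy-MM-dd-HH:mm", "yyyy-MM-dd", "dd-MM-yyyy", "dd.MM.yyyy"
    };

    private DateSniffer() {}

    /** True when the trimmed input parses completely under one of the patterns. */
    static boolean looksLikeDate(String input) {
        if (input == null) {
            return false;
        }
        String text = input.trim();
        for (String pattern : PATTERNS) {
            // SimpleDateFormat is not thread-safe, so create one per call.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
            format.setLenient(false); // reject values such as month 13 or February 30
            ParsePosition position = new ParsePosition(0);
            boolean parsed = format.parse(text, position) != null;
            if (parsed && position.getIndex() == text.length()) {
                return true;
            }
        }
        return false;
    }
}
```

The four sample strings from the question are all accepted, while "hello" and "2011-02-30" are rejected. Order matters when patterns overlap, so list the more specific pattern first.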

Is there a Java equivalent of Python's printf hash replacement?

Specifically, I am converting a Python script into a Java helper method. Here is a snippet (slightly modified for simplicity).
# hash of values
vals = {}
vals['a'] = 'a'
vals['b'] = 'b'
vals['1'] = 1
output = sys.stdout
file = open(filename).read()
print >>output, file % vals,
So in the file there are %(a), %(b), %(1) etc that I want substituted with the hash keys. I perused the API but couldn't find anything. Did I miss it or does something like this not exist in the Java API?
You can't do this directly without some additional templating library. I recommend StringTemplate. Very lightweight, easy to use, and very optimized and robust.
I doubt you'll find a pure Java solution that'll do exactly what you want out of the box.
With this in mind, the best answer depends on the complexity and variety of Python formatting strings that appear in your file:
If they're simple and not varied, the easiest way might be to code something up yourself.
If the opposite is true, one way to get the result you want with little work is by embedding Jython into your Java program. This will enable you to use Python's string formatting operator (%) directly. What's more, you'll be able to give it a Java Map as if it were a Python dictionary (vals in your code).
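For the simple case, "coding something up yourself" could look roughly like the sketch below. It assumes the placeholders in the file are written in Python's full form, %(key)s, and handles only that one conversion (no %(key)d, no %% escapes). The class name NamedTemplate and the method render are made up for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; supports only Python-style %(name)s placeholders.
final class NamedTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\(([^)]+)\\)s");

    private NamedTemplate() {}

    /** Replaces every %(key)s in the template with String.valueOf(vals.get(key)). */
    static String render(String template, Map<String, ?> vals) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            if (!vals.containsKey(key)) {
                throw new IllegalArgumentException("No value for key: " + key);
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            String value = String.valueOf(vals.get(key));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With a map holding a = "a", b = "b" and 1 = 1, rendering "%(a)s-%(b)s-%(1)s" gives "a-b-1". A missing key raises an exception here, loosely mirroring Python's KeyError.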
