Generate formatted diff output in Java

Are there any libraries out there for Java that will accept two strings, and return a string with formatted output as per the *nix diff command?
e.g. feed in
test 1,2,3,4
test 5,6,7,8
test 9,10,11,12
test 13,14,15,16
and
test 1,2,3,4
test 5,6,7,8
test 9,10,11,12,13
test 13,14,15,16
as input, and it would give you
test 1,2,3,4         test 1,2,3,4
test 5,6,7,8         test 5,6,7,8
test 9,10,11,12    | test 9,10,11,12,13
test 13,14,15,16     test 13,14,15,16
Exactly the same as if I had passed the files to diff -y expected actual
I found this question, and it gives some good advice on general libraries for giving you programmatic output, but I'm wanting the straight string results.
I could call diff directly as a system call, but this particular app will be running on unix and windows and I can't be sure that the environment will actually have diff available.

java-diff-utils
The DiffUtils library for computing diffs, applying patches, and generating side-by-side views in Java.
Diff Utils is an open-source library for performing comparison operations between texts: computing diffs, applying patches, generating unified diffs or parsing them, generating diff output for easy future display (such as a side-by-side view), and so on.
The main reason for building this library was the lack of easy-to-use libraries with all the usual features you need when working with diff files. It was originally inspired by the JRCS library and its nice design of the diff module.
Main Features
computing the difference between two texts.
capable of handling more than plain ASCII: arrays or lists of any type that implements hashCode() and equals() correctly can be diffed using this library
patch and unpatch the text with the given patch
parsing the unified diff format
producing human-readable differences

I ended up rolling my own. Not sure if it's the best implementation, and it's ugly as hell, but it passes against test input.
It uses java-diff to do the heavy diff lifting (and Apache Commons StrBuilder and StringUtils instead of the stock Java StringBuilder).
public static String diffSideBySide(String fromStr, String toStr){
// this is equivalent of running unix diff -y command
// not pretty, but it works. Feel free to refactor against unit test.
String[] fromLines = fromStr.split("\n");
String[] toLines = toStr.split("\n");
List<Difference> diffs = (new Diff(fromLines, toLines)).diff();
int padding = 3;
int maxStrWidth = Math.max(maxLength(fromLines), maxLength(toLines)) + padding;
StrBuilder diffOut = new StrBuilder();
diffOut.setNewLineText("\n");
int fromLineNum = 0;
int toLineNum = 0;
for(Difference diff : diffs) {
int delStart = diff.getDeletedStart();
int delEnd = diff.getDeletedEnd();
int addStart = diff.getAddedStart();
int addEnd = diff.getAddedEnd();
boolean isAdd = (delEnd == Difference.NONE && addEnd != Difference.NONE);
boolean isDel = (addEnd == Difference.NONE && delEnd != Difference.NONE);
boolean isMod = (delEnd != Difference.NONE && addEnd != Difference.NONE);
//write out unchanged lines between diffs
// guard the loop so no empty row is emitted when a diff starts right away
while (fromLineNum < delStart || toLineNum < addStart) {
String left = "";
String right = "";
if (fromLineNum < delStart) {
left = fromLines[fromLineNum];
fromLineNum++;
}
if (toLineNum < addStart) {
right = toLines[toLineNum];
toLineNum++;
}
diffOut.append(StringUtils.rightPad(left, maxStrWidth));
diffOut.append(" "); // no operator to display
diffOut.appendln(right);
}
if (isDel) {
//write out a deletion
for(int i=delStart; i <= delEnd; i++) {
diffOut.append(StringUtils.rightPad(fromLines[i], maxStrWidth));
diffOut.appendln("<");
}
fromLineNum = delEnd + 1;
} else if (isAdd) {
//write out an addition
for(int i=addStart; i <= addEnd; i++) {
diffOut.append(StringUtils.rightPad("", maxStrWidth));
diffOut.append("> ");
diffOut.appendln(toLines[i]);
}
toLineNum = addEnd + 1;
} else if (isMod) {
// write out a modification
while(true){
String left = "";
String right = "";
if (fromLineNum <= (delEnd)){
left = fromLines[fromLineNum];
fromLineNum++;
}
if (toLineNum <= (addEnd)) {
right = toLines[toLineNum];
toLineNum++;
}
diffOut.append(StringUtils.rightPad(left, maxStrWidth));
diffOut.append("| ");
diffOut.appendln(right);
if( (fromLineNum > (delEnd)) && (toLineNum > (addEnd))) {
break;
}
}
}
}
//we've finished displaying the diffs, now we just need to run out all the remaining unchanged lines
// guard the loop so no empty row is emitted when nothing is left to print
while (fromLineNum < fromLines.length || toLineNum < toLines.length) {
String left = "";
String right = "";
if (fromLineNum < fromLines.length) {
left = fromLines[fromLineNum];
fromLineNum++;
}
if (toLineNum < toLines.length) {
right = toLines[toLineNum];
toLineNum++;
}
diffOut.append(StringUtils.rightPad(left, maxStrWidth));
diffOut.append(" "); // no operator to display
diffOut.appendln(right);
}
return diffOut.toString();
}
private static int maxLength(String[] fromLines) {
int maxLength = 0;
for (int i = 0; i < fromLines.length; i++) {
if (fromLines[i].length() > maxLength) {
maxLength = fromLines[i].length();
}
}
return maxLength;
}

Busybox has a diff implementation that is very lean, should not be hard to convert to java, but you would have to add the two-column functionality.

http://c2.com/cgi/wiki?DiffAlgorithm I found this on Google, and it gives some good background and links. If you care about the algorithm beyond just doing the project, look for a book on basic algorithms that covers dynamic programming, or a book devoted to it. Algorithm knowledge is always good :)
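To make the dynamic-programming idea concrete, here is a minimal sketch of a side-by-side diff built on a longest-common-subsequence (LCS) table, using only the standard library. It is not the algorithm GNU diff uses, and the class name SideBySideDiff, the method names, and the exact column layout are assumptions made for illustration; the markers only loosely imitate diff -y.

```java
final class SideBySideDiff {

    /** Returns a side-by-side view of two texts, loosely modelled on diff -y. */
    static String diff(String from, String to) {
        String[] a = from.split("\n");
        String[] b = to.split("\n");

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // the left column is padded to the longest left-hand line
        int width = 0;
        for (String line : a) {
            width = Math.max(width, line.length());
        }

        StringBuilder out = new StringBuilder();
        int i = 0;
        int j = 0;
        while (i < a.length || j < b.length) {
            boolean both = i < a.length && j < b.length;
            if (both && a[i].equals(b[j])) {
                row(out, a[i++], ' ', b[j++], width);       // unchanged line
            } else if (both && lcs[i + 1][j + 1] == lcs[i][j]) {
                row(out, a[i++], '|', b[j++], width);       // pairing costs nothing: changed line
            } else if (j < b.length && (i == a.length || lcs[i][j + 1] >= lcs[i + 1][j])) {
                row(out, "", '>', b[j++], width);           // line only on the right
            } else {
                row(out, a[i++], '<', "", width);           // line only on the left
            }
        }
        return out.toString();
    }

    private static void row(StringBuilder out, String left, char marker, String right, int width) {
        out.append(left);
        for (int k = left.length(); k < width; k++) {
            out.append(' ');
        }
        out.append(' ').append(marker).append(' ').append(right).append('\n');
    }
}
```

The table costs O(n*m) time and memory in the number of lines, which is fine for small inputs but is exactly what the more refined algorithms linked above avoid.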

You can use the Apache Commons Text library to achieve this. It provides a diff capability based on a "very efficient algorithm from Eugene W. Myers".
It lets you write your own visitor so that you can process the diff however you want and output it to the console, HTML, etc. Here is one article which walks through a nice, simple example that outputs a side-by-side diff in HTML format using Apache Commons Text and simple Java code.

Related

Foobar code working in IDE but not in Solution file

I've written code for a foobar challenge that works in my IDE but not in the solution file provided by foobar. Also, is there any way to show the output even if the test fails? Could it be to do with it being a static method, or with the input being {1, 2, 3, 4} whereas mine works with new int[] {1,2,3,4,5}? My code is:
public static int solution(int[] l) {
List<Integer> numberList = Arrays.stream(l).boxed().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
while (true) {
StringBuilder number = new StringBuilder();
int i = 0;
while (i < numberList.size()) {
number.append(numberList.get(i));
i++;
}
List<Integer> startingList = Arrays.stream(l).boxed().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
int testValue = numberList.size();
for (Integer integer : numberList) {
if (startingList.contains(integer)) {
startingList.remove(integer);
testValue--;
}
}
if (testValue == 0) {
int f = 0;
int total = 0;
while (f < numberList.size()) {
total = total + numberList.get(f);
f++;
}
if (total % 3 == 0) {
StringBuilder answer = new StringBuilder();
int c = 0;
while (c < numberList.size()) {
answer.append(numberList.get(c));
c++;
}
return Integer.parseInt(answer.toString());
}
}
Integer nextNumber = Integer.parseInt(number.toString()) - 1;
String[] stringArray = valueOf(nextNumber).split("");
numberList = new ArrayList<>();
for (String s : stringArray) {
numberList.add(Integer.parseInt(s));
}
}
}
Pretty rubbish but it does the job (at least in my IDE!)
As mentioned in a comment on the question, you should undoubtedly give some more context for your questions (since it is pretty unclear what your code is intended to do). I'm pretty sure I've inferred the actual question from context though, and I can suggest a couple of problems. In short (and a pretty good assumption for coding in general) the issue is not the environment running your code incorrectly, but rather your code having missed bugs due to lack of comprehensive testing. If you had presented a number of sample inputs and results I would guess you would have seen that your solution does not work locally.
Be careful with List.remove(): it is overloaded as remove(int index) and remove(Object o) (https://docs.oracle.com/javase/8/docs/api/java/util/List.html). Because your sample passes an Integer, the by-value overload is selected, but the same call with an int would remove by index and can throw IndexOutOfBoundsException. Proper testing would have made the difference obvious (and will pick up most of your problems).
What happens if there is no solution? For example, an input of {1, 1} is going to get into a pretty messy state as the 'nextNumber' value slips below 0. You should know what the desired behavior is in this situation, and your tests should cover it before you try to upload a solution
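For reference, the two List.remove overloads behave quite differently on a List<Integer>. The sketch below is only an illustration; the class and helper names (RemoveOverloads, removeValue, removeAt) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveOverloads {

    /** An Integer argument selects remove(Object): the first equal element is removed. */
    static List<Integer> removeValue(List<Integer> source, Integer value) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(value);
        return copy;
    }

    /** An int argument selects remove(int): the element at that position is removed. */
    static List<Integer> removeAt(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> digits = Arrays.asList(9, 4, 1);
        System.out.println(removeValue(digits, 1)); // [9, 4]
        System.out.println(removeAt(digits, 1));    // [9, 1]
    }
}
```

Calling removeAt with an index outside the list throws IndexOutOfBoundsException, which is the failure mode to watch for whenever the argument is a primitive int.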
This happened to me as well, but I then realized that my code was not compiling because I had not imported the packages I was using at the top of the source file, as every Java program must.

Find all valid words when given a string of characters (Recursion / Binary Search)

I'd like some feedback on a method I tried to implement that isn't working 100%. I'm making an Android app for practice where the user is given 20 random letters. The user then uses these letters to make a word of whatever size. It then checks a dictionary to see if it is a valid English word.
The part that's giving me trouble is showing a "hint". If the user is stuck, I want to display the possible words that can be made. I initially thought of recursion. However, with 20 letters this can take quite a long time to execute. So I also implemented a binary search to check whether the current recursion path is a prefix of anything in the dictionary. I do get valid hints as output, but it's not returning all possible words. Do I have a mistake in my recursion thinking? Also, is there a recommended, faster algorithm? I've seen a method in which you check each word in a dictionary and see if the characters can make that word. However, I'd like to know how effective my method is compared to that one.
private static void getAllWords(String letterPool, String currWord) {
//Add to possibleWords when valid word
if (letterPool.equals("")) {
//System.out.println("");
} else if(currWord.equals("")){
for (int i = 0; i < letterPool.length(); i++) {
String curr = letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)){
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
} else {
//Every time we add a letter to currWord, delete from letterPool
//Attach new letter to curr and then check if in dict
for(int i=0; i<letterPool.length(); i++){
String curr = currWord + letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)) {
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
}
}
private static boolean binarySearch(String word){
int max = dict.size() - 1;
int min = 0;
int currIndex = 0;
boolean result = false;
while(min <= max) {
currIndex = (min + max) / 2;
if (dict.get(currIndex).startsWith(word)) {
result = true;
break;
} else if (dict.get(currIndex).compareTo(word) < 0) {
min = currIndex + 1;
} else if(dict.get(currIndex).compareTo(word) > 0){
max = currIndex - 1;
} else {
result = true;
break;
}
}
return result;
}
The simplest way to speed up your algorithm is probably to use a Trie (a prefix tree).
A Trie offers two relevant methods, isWord(String) and isPrefix(String), both of which take O(n) comparisons to determine whether a word or prefix exists in the dictionary (where n is the number of letters in the argument). This is really fast because the cost doesn't depend on how large your dictionary is.
For comparison, your method for checking if a prefix exists in your dictionary using binary search is O(n*log(m)) where n is the number of letters in the string and m is the number of words in the dictionary.
I coded up a similar algorithm to yours using a Trie and compared it to the code you posted (with minor modifications) in a very informal benchmark.
With 20-char input, the Trie took 9ms. The original code didn't complete in reasonable time so I had to kill it.
Edit:
As to why your code doesn't return all hints, you don't want to break if the prefix is not in your dict. You should continue to check the next prefix instead.
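A minimal sketch of the idea, assuming a HashMap-based Trie (this is not the code that was benchmarked above; the class layout and the findWords helper are assumptions for illustration). Note the continue where the posted code had break:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class Trie {
    private final Map<Character, Trie> children = new HashMap<>();
    private boolean word; // true if the path from the root to this node spells a word

    void insert(String s) {
        Trie node = this;
        for (char c : s.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Trie());
        }
        node.word = true;
    }

    /** Follows s from the root; returns null if the path does not exist. */
    private Trie find(String s) {
        Trie node = this;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.children.get(s.charAt(i));
        }
        return node;
    }

    boolean isPrefix(String s) {
        return find(s) != null;
    }

    boolean isWord(String s) {
        Trie node = find(s);
        return node != null && node.word;
    }

    /** Collects every dictionary word that can be built from the letters in pool. */
    static Set<String> findWords(Trie dict, String pool) {
        Set<String> found = new TreeSet<>();
        collect(dict, pool, "", found);
        return found;
    }

    private static void collect(Trie dict, String pool, String current, Set<String> found) {
        for (int i = 0; i < pool.length(); i++) {
            String next = current + pool.charAt(i);
            if (!dict.isPrefix(next)) {
                continue; // dead end for this letter only; keep trying the others
            }
            if (dict.isWord(next)) {
                found.add(next);
            }
            collect(dict, pool.substring(0, i) + pool.substring(i + 1), next, found);
        }
    }
}
```

For example, with the words cat, act, at and tab inserted, findWords(dict, "tca") yields act, at and cat. Using a Set also takes care of duplicates when the pool contains repeated letters.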
Is there a recommended, faster algorithm?
See Wikipedia article on "String searching algorithm", in particular the section named "Algorithms using a finite set of patterns", where "finite set of patterns" is your dictionary.
The Aho–Corasick algorithm listed first might be a good choice.

Guava's CharMatcher removeFrom

There are two questions here actually. The first one:
1) Is Java smart enough not to copy one array element into itself?
What I mean by that is :
int i = 1;
char[] chars = ... // some chars
chars[1] = chars[i]; // copy the element onto itself
2) Are the benchmarks that Guava ran available to the public?
And what I mean by that is: I was looking into the source code of CharMatcher removeFrom method and saw this :
// This unusual loop comes from extensive benchmarking
OUT: while (true) {
pos++;
while (true) {
if (pos == chars.length) {
break OUT;
}
if (isLetter(chars[pos])) {
break;
}
chars[pos - spread] = chars[pos];
pos++;
}
spread++;
}
return new String(chars, 0, pos - spread);
I really liked the idea, but coded my own method:
public static String removeMine(String input){
char [] chars = input.toCharArray();
int howManyLetters = 0;
for(int i=0;i<chars.length;++i){
if(isLetter(chars[i])) {
chars[howManyLetters++] = chars[i];
}
else {
if(i == (chars.length - 1)) break;
chars[i] = chars[i+1];
}
}
return new String(chars, 0, howManyLetters);
}
I then added some benchmarks (I will put them on github if needed), here are the results:
https://microbenchmarks.appspot.com/runs/61e76bdc-b0d6-4145-8b8b-c1683287f038#r:scenario.benchmarkSpec.parameters.input,scenario.benchmarkSpec.methodName
I (seriously?) doubt that the creators of Guava did not try a version like that, and I assume there was a strong reason to drop it (the comment is more than obvious). What I would like to see is either the actual benchmarks they ran or some solid reason for having such a method. I assume it has to do with the type of JVM the code runs on, but actual proof would be appreciated.
P.S. I am still to test this with jmh also, will provide results once I'm done.

Parse an Expression to its components and sub components

I need to parse an expression such as: neg(and(X,Y))
I need it to come out with the Abstract Stack Machine Code Such as for the example above:
LOAD X;
LOAD Y;
EXEC and;
EXEC neg;
But for now the machine code is not the issue. How can I parse / break up my input expression string into all its sub-expressions?
I have tried finding the first bracket and then taking the substring from there to the last bracket, but that gives issues if there is an inner expression.
Code that I have tried (please note it is still very much in the development phase):
private boolean evaluateExpression(String expression) {
int brackets = 0;
int beginIndex = -1;
int endIndex = -1;
for (int i = 0; i < expression.length(); i++) {
if (expression.charAt(i) == '(') {
brackets++;
if (brackets == 0) {
endIndex = i;
System.out.println("the first expression ends at " + i);
}
}
if (expression.charAt(i) == ')') {
brackets--;
if (brackets == 0) {
endIndex = i;
System.out.println("the first expression ends at " + i);
}
}
}
// Check for 1st bracket
for (int i = 0; i < expression.length(); i++) {
if (expression.charAt(i) == '(') {
beginIndex = i;
break;
}
}
String subExpression = expression.substring(beginIndex, endIndex);
System.out.println("Sub expression: " + subExpression);
evaluateExpression(subExpression);
return false;
}
I am just looking for a basic solution, It only has to do: and, or, neg
The expressions you are trying to parse form a context-free language, which can be described by a context-free grammar (CFG).
You can create a context-free grammar that represents this language of expressions, and use a CFG parser to parse it.
One existing Java tool that does this (and more) is JavaCC, though it could be overkill here.
Another algorithm for parsing sentences using a CFG is CYK, which is fairly easy to program and use.
Here, the CFG representing the available expressions is:
S -> or(S,S)
S -> and(S,S)
S -> neg(S)
S -> x | for each variable x
Note that although this is a relatively simple CFG, the language it describes is not regular, so if you were hoping to use a regex, that's probably not the way to go.
Actually, if you want your parser to be strong enough to deal with most cases, you should use a tokenizer (Java has a built-in tokenizer class) to tokenize the string first, then recognize each expression, store operands and operators in a tree structure, and evaluate them recursively.
If you only want to deal with some simple situations, remember to use recursion; that is the core part.
Parsing things like this is typically done using syntax trees, with some kind of precedence rule for the order of operations. An example for what you have posted would be as follows:
Processing items left to right, the tree would be populated like this:
1arg_fcall(neg)
  2arg_fcall(and)
    Load Y
    Load X
Now we can recursively visit this tree bottom to top to get
Load X
Load Y
EXEC and //on X and Y
EXEC neg //on result of and
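The recursive approach described in these answers can be sketched as a small recursive-descent compiler. This is only one possible shape, assuming the operators and, or and neg from the question; the class name ExpressionCompiler and its methods are invented for the example. Because the instruction for an operator is emitted after its arguments have been parsed, the output comes out in the bottom-up order shown above without building an explicit tree.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpressionCompiler {
    private final String src;
    private final List<String> code = new ArrayList<>();
    private int pos = 0;

    private ExpressionCompiler(String src) {
        this.src = src.replaceAll("\\s+", ""); // ignore whitespace
    }

    /** Compiles e.g. "neg(and(X,Y))" into stack-machine instructions. */
    static List<String> compile(String expression) {
        ExpressionCompiler c = new ExpressionCompiler(expression);
        c.parseExpression();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + c.pos);
        }
        return c.code;
    }

    // S -> and(S,S) | or(S,S) | neg(S) | variable
    private void parseExpression() {
        String name = readName();
        if (pos < src.length() && src.charAt(pos) == '(') {
            int arity = arityOf(name);
            expect('(');
            parseExpression();
            for (int i = 1; i < arity; i++) {
                expect(',');
                parseExpression();
            }
            expect(')');
            code.add("EXEC " + name + ";"); // operator comes after its operands
        } else {
            code.add("LOAD " + name + ";"); // a bare name is a variable
        }
    }

    private static int arityOf(String name) {
        switch (name) {
            case "neg":
                return 1;
            case "and":
            case "or":
                return 2;
            default:
                throw new IllegalArgumentException("Unknown operator: " + name);
        }
    }

    private String readName() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Name expected at position " + pos);
        }
        return src.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }
}
```

Calling ExpressionCompiler.compile("neg(and(X,Y))") returns the four lines LOAD X; LOAD Y; EXEC and; EXEC neg; in that order.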

Bad results using Hopfield network

I'm writing a program that will recognize traffic signs using neural networks, and I have a problem with the Hopfield network. I'm using this example to make my own Hopfield network.
As input, I use those traffic signs after normalization; each is a 50x50 matrix of 0s and 1s.
The problem is that when the Hopfield network learns 2 patterns it recognizes them well, but when I train it with more than 2 patterns it returns a pattern that doesn't match any of those it was trained on, and it returns that same pattern for any input I provide.
Here is my code, quite similar to the one from the official Encog examples:
public BiPolarNeuralData convertPattern(double[][] data, int index)
{
int resultIndex = 0;
BiPolarNeuralData result = new BiPolarNeuralData(WIDTH*HEIGHT);
for(int i=0;i<(WIDTH*HEIGHT);i++)
{
// a cell value of 1.0 maps to true, anything else to false
result.setData(resultIndex++,data[index][i]==1.0);
}
return result;
}
public void display(BiPolarNeuralData pattern1,BiPolarNeuralData pattern2)
{
int index1 = 0;
int index2 = 0;
for(int row = 0;row<HEIGHT;row++)
{
StringBuilder line = new StringBuilder();
for(int col = 0;col<WIDTH;col++)
{
if(pattern1.getBoolean(index1++))
line.append('O');
else
line.append(' ');
}
line.append(" -> ");
for(int col = 0;col<WIDTH;col++)
{
if(pattern2.getBoolean(index2++))
line.append('O');
else
line.append(' ');
}
System.out.println(line.toString());
}
}
public void evaluate(HopfieldNetwork hopfieldLogic, double[][] pattern)
{
for(int i=0;i<pattern.length;i++)
{
BiPolarNeuralData pattern1 = convertPattern(pattern,i);
hopfieldLogic.setCurrentState(pattern1);
int cycles = hopfieldLogic.runUntilStable(100);
BiPolarNeuralData pattern2 = hopfieldLogic.getCurrentState();
System.out.println("Cycles until stable(max 100): " + cycles + ", result=");
display( pattern1, pattern2);
System.out.println("----------------------");
}
}
public BasicNetwork trainHopfieldNetwork(){
HopfieldNetwork hopfieldLogic = new HopfieldNetwork(HEIGHT*WIDTH);
for(int i=0;i<inputData.length;i++)
{
hopfieldLogic.addPattern(convertPattern(inputData,i));
System.out.println("Pattern : "+i);
}
evaluate(hopfieldLogic,inputData);
return null;
}
Where each row of inputData is an array of 2500 doubles.
What I've tried so far is:
Changing size of patterns to be smaller (10x10, 20x20).
Trying to learn different numbers of patterns (from 2 to 20). I always get strange results that don't match any of the patterns the network was trained on.
So after all, the problem was the network's learning rule. The Encog framework implements only the Hebb learning rule, which isn't very useful for complex networks, so I had to implement the pseudo-inverse learning rule. After that, the Hopfield network started to recognize patterns without trouble.
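For readers wondering what the pseudo-inverse (projection) rule looks like, here is a rough sketch on plain arrays. It is not Encog code and not the implementation mentioned above; the class name PseudoInverseHopfield and its methods are assumptions. With the bipolar (+1/-1) patterns stored as the rows of X, the weights are W = X^T (X X^T)^-1 X, which makes every stored pattern a fixed point as long as the patterns are linearly independent.

```java
final class PseudoInverseHopfield {

    /** Builds W = X^T (X X^T)^-1 X, where each row of patterns is one bipolar vector. */
    static double[][] train(double[][] patterns) {
        int p = patterns.length;
        int n = patterns[0].length;
        double[][] gram = new double[p][p]; // gram = X X^T
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                for (int k = 0; k < n; k++)
                    gram[a][b] += patterns[a][k] * patterns[b][k];
        double[][] inv = invert(gram);
        double[][] tmp = new double[p][n]; // tmp = gram^-1 X
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                for (int k = 0; k < n; k++)
                    tmp[a][k] += inv[a][b] * patterns[b][k];
        double[][] w = new double[n][n]; // w = X^T tmp
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int a = 0; a < p; a++)
                    w[i][j] += patterns[a][i] * tmp[a][j];
        return w;
    }

    /** Gauss-Jordan inversion with partial pivoting. */
    static double[][] invert(double[][] m) {
        int n = m.length;
        double[][] a = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(m[i], 0, a[i], 0, n);
            a[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            if (Math.abs(a[pivot][col]) < 1e-12)
                throw new IllegalArgumentException("Patterns are linearly dependent");
            double[] swap = a[col];
            a[col] = a[pivot];
            a[pivot] = swap;
            double d = a[col][col];
            for (int c = 0; c < 2 * n; c++) a[col][c] /= d;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = a[r][col];
                for (int c = 0; c < 2 * n; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(a[i], n, inv[i], 0, n);
        return inv;
    }

    /** Synchronous updates until the state stops changing or maxCycles is reached. */
    static double[] recall(double[][] w, double[] state, int maxCycles) {
        double[] s = state.clone();
        for (int cycle = 0; cycle < maxCycles; cycle++) {
            double[] next = new double[s.length];
            boolean changed = false;
            for (int i = 0; i < s.length; i++) {
                double sum = 0;
                for (int j = 0; j < s.length; j++) sum += w[i][j] * s[j];
                next[i] = sum > 0 ? 1 : (sum < 0 ? -1 : s[i]); // keep the old value on a tie
                if (next[i] != s[i]) changed = true;
            }
            s = next;
            if (!changed) break;
        }
        return s;
    }
}
```

Note that the 0/1 inputs from the question would first have to be mapped to -1/+1, and that for 50x50 images the weight matrix has 2500 x 2500 entries, so a real implementation would want something more economical than this dense version.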
