Guava's CharMatcher removeFrom

Guava's CharMatcher removeFrom - java

There are two questions here actually. The first one:
1) Is Java smart enough not to copy one array element into itself?
What I mean by that is :
int i = 1;
char [] chars=... //some chars
char[1] = char[i]; // first element into itself
2) Are the benchmarking's that guava made available to the public?
And what I mean by that is: I was looking into the source code of CharMatcher removeFrom method and saw this :
// This unusual loop comes from extensive benchmarking
OUT: while (true) {
pos++;
while (true) {
if (pos == chars.length) {
break OUT;
}
if (isLetter(chars[pos])) {
break;
}
chars[pos - spread] = chars[pos];
pos++;
}
spread++;
}
return new String(chars, 0, pos - spread);
I really liked the idea, but coded my own method:
public static String removeMine(String input){
char [] chars = input.toCharArray();
int howManyLetters = 0;
for(int i=0;i<chars.length;++i){
if(isLetter(chars[i])) {
chars[howManyLetters++] = chars[i];
}
else {
if(i == (chars.length - 1)) break;
chars[i] = chars[i+1];
}
}
return new String(chars, 0, howManyLetters);
}
I then added some benchmarks (I will put them on github if needed), here are the results:
https://microbenchmarks.appspot.com/runs/61e76bdc-b0d6-4145-8b8b-c1683287f038#r:scenario.benchmarkSpec.parameters.input,scenario.benchmarkSpec.methodName
I have (serious?) doubts that creators of guava did not have a version like that and I assume there was a strong reason to drop it (the comment is more then obvious). What I would like to see is either the actual benchmarks that they made or some serious reason to have such a method. I assume it has to do with the type of JVM that you run your code into, but actual proof would be appreciated.
P.S. I am still to test this with jmh also, will provide results once I'm done.

Related

Foobar code working in IDE but not in Solution file

I've written code for a foobar challenge that works in my IDE but not in the solutions file provided by foobar. Also, is there anyway to show the output even if the test fails? Possibly to with it being a static method or the input being {1, 2, 3, 4} whereas mine is working with new int {1,2,3,4,5}? My code is:
public static int solution(int[] l) {
List<Integer> numberList = Arrays.stream(l).boxed().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
while (true) {
StringBuilder number = new StringBuilder();
int i = 0;
while (i < numberList.size()) {
number.append(numberList.get(i));
i++;
}
List<Integer> startingList = Arrays.stream(l).boxed().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
int testValue = numberList.size();
for (Integer integer : numberList) {
if (startingList.contains(integer)) {
startingList.remove(integer);
testValue--;
}
}
if (testValue == 0) {
int f = 0;
int total = 0;
while (f < numberList.size()) {
total = total + numberList.get(f);
f++;
}
if (total % 3 == 0) {
StringBuilder answer = new StringBuilder();
int c = 0;
while (c < numberList.size()) {
answer.append(numberList.get(c));
c++;
}
return Integer.parseInt(answer.toString());
}
}
Integer nextNumber = Integer.parseInt(number.toString()) - 1;
String[] stringArray = valueOf(nextNumber).split("");
numberList = new ArrayList<>();
for (String s : stringArray) {
numberList.add(Integer.parseInt(s));
}
}
}
Pretty rubbish but it does the job (at least in my IDE!)

As mentioned in a comment on the question, you should undoubtedly give some more context for your questions (since it is pretty unclear what your code is intended to do). I'm pretty sure I've inferred the actual question from context though, and I can suggest a couple of problems. In short (and a pretty good assumption for coding in general) the issue is not the environment running your code incorrectly, but rather your code having missed bugs due to lack of comprehensive testing. If you had presented a number of sample inputs and results I would guess you would have seen that your solution does not work locally.
The Java List.remove() method takes an index rather than a value to be removed (https://docs.oracle.com/javase/8/docs/api/java/util/List.html). The way it is used in your sample will result in throwing exceptions in a number of circumstances. Proper testing would have identified this (and will pick up most of your problems if fixed)
What happens if there is no solution? For example, an input of {1, 1} is going to get into a pretty messy state as the 'nextNumber' value slips below 0. You should know what the desired behavior is in this situation, and your tests should cover it before you try to upload a solution

This happened to me as well, but I then realized that my compilation was not successful because I have not imported the package that I am using at the top of the source code file like all java programs are write

Find all valid words when given a string of characters (Recursion / Binary Search)

I'd like some feedback on a method I tried to implement that isn't working 100%. I'm making an Android app for practice where the user is given 20 random letters. The user then uses these letters to make a word of whatever size. It then checks a dictionary to see if it is a valid English word.
The part that's giving me trouble is with showing a "hint". If the user is stuck, I want to display the possible words that can be made. I initially thought recursion. However, with 20 letters this can take quite a long time to execute. So, I also implemented a binary search to check if the current recursion path is a a prefix to anything in the dictionary. I do get valid hints to be output however it's not returning all possible words. Do I have a mistake here in my recursion thinking? Also, is there a recommended, faster algorithm? I've seen a method in which you check each word in a dictionary and see if the characters can make each word. However, I'd like to know how effective my method is vs. that one.
private static void getAllWords(String letterPool, String currWord) {
//Add to possibleWords when valid word
if (letterPool.equals("")) {
//System.out.println("");
} else if(currWord.equals("")){
for (int i = 0; i < letterPool.length(); i++) {
String curr = letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)){
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
} else {
//Every time we add a letter to currWord, delete from letterPool
//Attach new letter to curr and then check if in dict
for(int i=0; i<letterPool.length(); i++){
String curr = currWord + letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)) {
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
}
private static boolean binarySearch(String word){
int max = dict.size() - 1;
int min = 0;
int currIndex = 0;
boolean result = false;
while(min <= max) {
currIndex = (min + max) / 2;
if (dict.get(currIndex).startsWith(word)) {
result = true;
break;
} else if (dict.get(currIndex).compareTo(word) < 0) {
min = currIndex + 1;
} else if(dict.get(currIndex).compareTo(word) > 0){
max = currIndex - 1;
} else {
result = true;
break;
}
}
return result;
}

The simplest way to speed up your algorithm is probably to use a Trie (a prefix tree)
Trie data structures offer two relevant methods. isWord(String) and isPrefix(String), both of which take O(n) comparisons to determine whether a word or prefix exist in a dictionary (where n is the number of letters in the argument). This is really fast because it doesn't matter how large your dictionary is.
For comparison, your method for checking if a prefix exists in your dictionary using binary search is O(n*log(m)) where n is the number of letters in the string and m is the number of words in the dictionary.
I coded up a similar algorithm to yours using a Trie and compared it to the code you posted (with minor modifications) in a very informal benchmark.
With 20-char input, the Trie took 9ms. The original code didn't complete in reasonable time so I had to kill it.
Edit:
As to why your code doesn't return all hints, you don't want to break if the prefix is not in your dict. You should continue to check the next prefix instead.

Is there a recommended, faster algorithm?
See Wikipedia article on "String searching algorithm", in particular the section named "Algorithms using a finite set of patterns", where "finite set of patterns" is your dictionary.
The Aho–Corasick algorithm listed first might be a good choice.

Java - Complex Recursive Backtracking

For Java practice I started working on a method countBinary that accepts an integer n as a parameter that prints all binary numbers that have n digits in ascending order, printing each value on a separate line. Assuming n is non-negative and greater than 0, some example outputs would look like this.
I am getting pretty much nowhere with this. I am able to write a program that finds all possible letter combinations of a String and similar things, but I have been unable to make almost any progress with this specific problem using binary and integers.
Apparently the best way to go about this issue is by defining a helper method that accepts different parameters than the original method and by building up a set of characters as a String for eventual printing.
Important Note: I am NOT supposed to use for loops at all for this exercise.
Edit - Important Note: I need to have trailing 0's so that all outputs are the same length.
So far this is what I have:
public void countBinary(int n)
{
String s = "01";
countBinary(s, "", n);
}
private static void countBinary(String s, String chosen, int length)
{
if (s.length() == 0)
{
System.out.println(chosen);
}
else
{
char c = s.charAt(0);
s = s.substring(1);
chosen += c;
countBinary(s, chosen, length);
if (chosen.length() == length)
{
chosen = chosen.substring(0, chosen.length() - 1);
}
countBinary(s, chosen, length);
s = c + s;
}
}
When I run my code my output looks like this.
Can anyone explain to me why my method is not running the way I expect it to, and if possible show me a solution to my issue so that I might get the correct output? Thank you!

There are more efficient ways to do it, but this will give you a start:
public class BinaryPrinter {
static void printAllBinary(String s, int n) {
if (n == 0) System.out.println(s);
else {
printAllBinary(s + '0', n - 1);
printAllBinary(s + '1', n - 1);
}
}
public static void main(String [] args) {
printAllBinary("", 4);
}
}
I'll let you work out the more efficient way.

Common Substring of two strings

This particular interview-question stumped me:
Given two Strings S1 and S2. Find the longest Substring which is a Prefix of S1 and suffix of S2.
Through Google, I came across the following solution, but didnt quite understand what it was doing.
public String findLongestSubstring(String s1, String s2) {
List<Integer> occurs = new ArrayList<>();
for (int i = 0; i < s1.length(); i++) {
if (s1.charAt(i) == s2.charAt(s2.length()-1)) {
occurs.add(i);
}
}
Collections.reverse(occurs);
for(int index : occurs) {
boolean equals = true;
for(int i = index; i >= 0; i--) {
if (s1.charAt(index-i) != s2.charAt(s2.length() - i - 1)) {
equals = false;
break;
}
}
if(equals) {
return s1.substring(0,index+1);
}
}
return null;
}
My questions:
How does this solution work?
And how do you get to discovering this solution?
Is there a more intuitive / easier solution?

Part 2 of your question
Here is a shorter variant:
public String findLongestPrefixSuffix(String s1, String s2) {
for( int i = Math.min(s1.length(), s2.length()); ; i--) {
if(s2.endsWith(s1.substring(0, i))) {
return s1.substring(0, i);
}
}
}
I am using Math.min to find the length of the shortest String, as I don't need to and cannot compare more than that.
someString.substring(x,y) returns you the String you get when reading someString beginning from character x and stopping at character y. I go backwards from the biggest possible substring (s1 or s2) to the smallest possible substring, the empty string. This way the first time my condition is true it will be biggest possible substring the fulfills it.
If you prefer you can go the other way round, but you have to introduce a variable saving the length of the longest found substring fulfilling the condition so far:
public static String findLongestPrefixSuffix(String s1, String s2) {
if (s1.equals(s2)) { // this part is optional and will
return s1; // speed things up if s1 is equal to s2
} //
int max = 0;
for (int i = 0; i < Math.min(s1.length(), s2.length()); i++) {
if (s2.endsWith(s1.substring(0, i))) {
max = i;
}
}
return s1.substring(0, max);
}
For the record: You could start with i = 1 in the latter example for a tiny bit of extra performance. On top of this you can use i to specify how long the suffix has at least to be you want to get. ;) If you writ Math.min(s1.length(), s2.length()) - x you can use x to specify how long the found substring may be at most. Both of these things are possible with the first solution, too, but the min length is a bit more involving. ;)
Part 1 of your question
In the part above the Collections.reverse the author of the code searches for all positions in s1 where the last letter of s2 is and saves this position.
What follows is essentially what my algorithm does, the difference is, that he doesn't check every substring but only those that end with the last letter of s2.
This is some sort of optimization to speed things up. If speed is not that important my naive implementation should suffice. ;)

Where did you find that solution? Was it written by a credible, well-respected coder? If you're not sure of that, then it might not be worth reading it. One could write really complex and inefficient code to accomplish something really simple, and it will not be worth understanding the algorithm.
Rather than trying to understand somebody else's solution, it might be easier to come up with it on your own. I think you understand the problem much better that way, and the logic becomes your own. Over time and practice the thought process will start to come more naturally. Practice makes perfect.
Anyway, I put a more simple implementation in Python here (spoiler alert!). I suggest you first figure out the solution on your own, and compare it to mine later.

Apache commons lang3, StringUtils.getCommonPrefix()
Java is really bad in providing useful stuff via stdlib. On the plus side there's almost always some reasonable tool from Apache.

I converted the #TheMorph's answer to javascript. Hope this helps js developer
if (typeof String.prototype.endsWith !== 'function') {
String.prototype.endsWith = function(suffix) {
return this.indexOf(suffix, this.length - suffix.length) !== -1;
};
}
function findLongestPrefixSuffix(s2, s1) {
for( var i = Math.min(s1.length, s2.length); ; i--) {
if(s2.endsWith(s1.substring(0, i))) {
return s1.substring(0, i);
}
}
}
console.log(findLongestPrefixSuffix('abc', 'bcd')); // result: 'bc'

Generate formatted diff output in Java

Are there any libraries out there for Java that will accept two strings, and return a string with formatted output as per the *nix diff command?
e.g. feed in
test 1,2,3,4
test 5,6,7,8
test 9,10,11,12
test 13,14,15,16
and
test 1,2,3,4
test 5,6,7,8
test 9,10,11,12,13
test 13,14,15,16
as input, and it would give you
test 1,2,3,4 test 1,2,3,4
test 5,6,7,8 test 5,6,7,8
test 9,10,11,12 | test 9,10,11,12,13
test 13,14,15,16 test 13,14,15,16
Exactly the same as if I had passed the files to diff -y expected actual
I found this question, and it gives some good advice on general libraries for giving you programmatic output, but I'm wanting the straight string results.
I could call diff directly as a system call, but this particular app will be running on unix and windows and I can't be sure that the environment will actually have diff available.

java-diff-utils
The DiffUtils library for computing
diffs, applying patches, generationg
side-by-side view in Java
Diff Utils library is an OpenSource
library for performing the comparison
operations between texts: computing
diffs, applying patches, generating
unified diffs or parsing them,
generating diff output for easy future
displaying (like side-by-side view)
and so on.
Main reason to build this library was
the lack of easy-to-use libraries with
all the usual stuff you need while
working with diff files. Originally it
was inspired by JRCS library and it's
nice design of diff module.
Main Features
computing the difference between two texts.
capable to hand more than plain ascci. Arrays or List of any type that
implements hashCode() and equals()
correctly can be subject to
differencing using this library
patch and unpatch the text with the given patch
parsing the unified diff format
producing human-readable differences

I ended up rolling my own. Not sure if it's the best implementation, and it's ugly as hell, but it passes against test input.
It uses java-diff to do the heavy diff lifting (any apache commons StrBuilder and StringUtils instead of stock Java StringBuilder)
public static String diffSideBySide(String fromStr, String toStr){
// this is equivalent of running unix diff -y command
// not pretty, but it works. Feel free to refactor against unit test.
String[] fromLines = fromStr.split("\n");
String[] toLines = toStr.split("\n");
List<Difference> diffs = (new Diff(fromLines, toLines)).diff();
int padding = 3;
int maxStrWidth = Math.max(maxLength(fromLines), maxLength(toLines)) + padding;
StrBuilder diffOut = new StrBuilder();
diffOut.setNewLineText("\n");
int fromLineNum = 0;
int toLineNum = 0;
for(Difference diff : diffs) {
int delStart = diff.getDeletedStart();
int delEnd = diff.getDeletedEnd();
int addStart = diff.getAddedStart();
int addEnd = diff.getAddedEnd();
boolean isAdd = (delEnd == Difference.NONE && addEnd != Difference.NONE);
boolean isDel = (addEnd == Difference.NONE && delEnd != Difference.NONE);
boolean isMod = (delEnd != Difference.NONE && addEnd != Difference.NONE);
//write out unchanged lines between diffs
while(true) {
String left = "";
String right = "";
if (fromLineNum < (delStart)){
left = fromLines[fromLineNum];
fromLineNum++;
}
if (toLineNum < (addStart)) {
right = toLines[toLineNum];
toLineNum++;
}
diffOut.append(StringUtils.rightPad(left, maxStrWidth));
diffOut.append(" "); // no operator to display
diffOut.appendln(right);
if( (fromLineNum == (delStart)) && (toLineNum == (addStart))) {
break;
}
}
if (isDel) {
//write out a deletion
for(int i=delStart; i <= delEnd; i++) {
diffOut.append(StringUtils.rightPad(fromLines[i], maxStrWidth));
diffOut.appendln("<");
}
fromLineNum = delEnd + 1;
} else if (isAdd) {
//write out an addition
for(int i=addStart; i <= addEnd; i++) {
diffOut.append(StringUtils.rightPad("", maxStrWidth));
diffOut.append("> ");
diffOut.appendln(toLines[i]);
}
toLineNum = addEnd + 1;
} else if (isMod) {
// write out a modification
while(true){
String left = "";
String right = "";
if (fromLineNum <= (delEnd)){
left = fromLines[fromLineNum];
fromLineNum++;
}
if (toLineNum <= (addEnd)) {
right = toLines[toLineNum];
toLineNum++;
}
diffOut.append(StringUtils.rightPad(left, maxStrWidth));
diffOut.append("| ");
diffOut.appendln(right);
if( (fromLineNum > (delEnd)) && (toLineNum > (addEnd))) {
break;
}
}
}
}
//we've finished displaying the diffs, now we just need to run out all the remaining unchanged lines
while(true) {
String left = "";
String right = "";
if (fromLineNum < (fromLines.length)){
left = fromLines[fromLineNum];
fromLineNum++;
}
if (toLineNum < (toLines.length)) {
right = toLines[toLineNum];
toLineNum++;
}
diffOut.append(StringUtils.rightPad(left, maxStrWidth));
diffOut.append(" "); // no operator to display
diffOut.appendln(right);
if( (fromLineNum == (fromLines.length)) && (toLineNum == (toLines.length))) {
break;
}
}
return diffOut.toString();
}
private static int maxLength(String[] fromLines) {
int maxLength = 0;
for (int i = 0; i < fromLines.length; i++) {
if (fromLines[i].length() > maxLength) {
maxLength = fromLines[i].length();
}
}
return maxLength;
}

Busybox has a diff implementation that is very lean, should not be hard to convert to java, but you would have to add the two-column functionality.

http://c2.com/cgi/wiki?DiffAlgorithm I found this on Google and it gives some good background and links. If you care about the algorithm beyond just doing the project, a book on basic algorithm that covers Dynamic Programming or a book just on it. Algorithm knowledge is always good:)

You can use Apache Commons Text library to achieve this. This library provides 'diff' capability based on "very efficient algorithm from Eugene W. Myers".
This provides you ability to create your own visitor so that you can process the diff in the way you want & may be output to console or HTML etc. Here is one article which walks through nice & simple example to output side by side diff in HTML format using Apache Commons Text library & simple Java code.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Guava's CharMatcher removeFrom - java

Related

Foobar code working in IDE but not in Solution file

Find all valid words when given a string of characters (Recursion / Binary Search)

Java - Complex Recursive Backtracking

Common Substring of two strings

Generate formatted diff output in Java

Categories

Resources