Suggestion on equalsIgnoreCase in Java to C implementation

Suggestion on equalsIgnoreCase in Java to C implementation - java

All,
In a bid to improve my C skills, I decided to start implementing various Java libraries/library functions to C code. This would ensure that everyone knows the functionality of my implementation at least. Here is the link to the C source code that simulates the equalsIgnoreCase() of String class in Java : C source code. I have tested the code and it looks fine as per my testing skills are concerned. My aim was to use as much basic operations and datatypes as possible. Though, it would be great if the gurus here can:
1 > Give me any suggestion to improve the code quality
2 > Enlighten me with any missing coding standard/practices
3 > Locate bugs in my logic.

100 lines of code is not too long to post here.
You calculate the string length twice. In C, the procedure to calculate the string length starts at the beginning of the string and runs along all of it (not necessarily in steps of 1 byte) until it finds the terminating null byte. If your strings are 2Mbyte long, you "walk" along 4Mbyte unnecessarily.
in <ctype.h> there are the two functions tolower() and toupper() declared. You can use one of them (tolower) instead of extractFirstCharacterASCIIVal(). The advantage of using the library function is that it is not locked in to ASCII and may even work with foreign characters when you go 'international'.
You use awkward (very long) names for your variables (and functions too). eg: ch1 and ch2 do very well for characters in file 1 and file 2 respectively :-)
return 1; at the end of main usually means something went wrong with the program. return 0; is idiomatic for successful termination.
Edit: for comparison with tcrosley version
#include <ctype.h>
int cmpnocase(const char *s1, const char *s2) {
while (*s1 && *s2) {
if (tolower((unsigned char)*s1) != tolower((unsigned char)*s2)) break;
s1++;
s2++;
}
return (*s1 != *s2);
}

With C++, you could replace performComparison(char* string1, char * string2) with stricmp.
However stricmp is not part of the standard C library. Here is a version adapted to your example. Note you don't need the extractFirstCharacterASCIIVal function, use tolower instead. Also note there is no need to explicitly calculate the string length ahead of time, as strings in C are terminated by the NULL character '\0'.
int performComparison(char* string1, char * string2)
{
char c1, c2;
int v;
do {
c1 = *string1++;
c2 = *string2++;
v = (UINT) tolower(c1) - (UINT) tolower(c2);
} while ((v == 0) && (c1 != '\0') && (c2 != '\0') );
return v != 0;
}
If you do want to use your own extractFirstCharacterASCIIVal function instead of the tolower macro, to make the code more transparent then you should code it like so:
if ((str >= 'a') && (str <= 'z'))
{
returnVal = str - ('a' - 'A');
}
else
{
returnVal = str;
}
to make it more obvious what you are doing. Also you should include a comment that this assumes the characters a..z and A..Z are contiguous. (They are in ASCII, but not always in other encodings.)

Related

Get parameters of nested functions

I am trying to implement a parser in Java to extract the arguments of some functions.
When I have a function like:
max(1, 2, 3)
I just simply can use a Regular Expresion to extract the args.
But all my functions are not like that. If I have some nested function, eg:
max(sum(1, max(1,2,sum(2,5)), 3, 5, mult(3,3))
I would like to obtain:
sum(1, max(1,2,sum(2,5))
3
5
mult(3,3)
I tried using a Regular Expression, but I asume the language is not regular. Another approach was splliting by ',', but did not work as well.
Is there any method for extracting the arguments of a function? I do not really know how this type of problem can be solved since there is no a pattern to use for extracting the arguments.
Any help or insight would be really appreciated. Thanks!!

Parsing a source code into an some abstract model is quite complex topic, depending on the language complexity.
But first step is usually tokenization, where you read one character at a time and detect full tokens (like variable names, function names, operators, literals etc).
Since you presented only very limited scope for the problem , you have very small set of tokens:
name of a function
( and ) to indicate method call
, to separate arguments
numbers
Reading one symbol at the time, you should be able to very easily detect when one token ends and the next one begins. Also your tokens are very distinct (i.e. you don't have to differentiate function name from variable name), you can very easily categorize them.
Once you have a token, you know the grammar (you have only function calls), you can easily build a syntax tree (where at the root you have top level function call with its arguments being children nodes).
From that structure you can easily fetch whichever parts you wish.
If you are more interested in how it works in the javac compiler, you can always check out its source code (it's open source after all):
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavaTokenizer.java
However, that may be quite a long read.

Finally found a method that works:
public List<String> parseArgs(String l){
int startIdx = l.indexOf("(") + 1;
int endIdx = l.lastIndexOf(")") - 1;
int count = 0;
int argIdx = startIdx;
List<String> args = new ArrayList<>();
for (int i = startIdx; i < endIdx; i++) {
if (l.charAt(i) == '(')
count -= 1;
else if (l.charAt(i) == ')'){
count += 1;
}
else if (l.charAt(i) == ',' && count == 0){
args.add(l.substring(argIdx, i).trim());
argIdx = i + 1;
}
}
args.add(l.substring(argIdx, endIdx + 1).trim());
return args;
}
String l = "max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))";
parseArgs(l).forEach(System.out::println);
//Prints
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)

Treating an integer as a boolean in Java

I'm working on updating an old Java application to work on modern operating systems and I've run into an error that I can't figure out. I don't have much experience with Java but from what I've read, you can't store boolean values in an integer (1 or 0) like in C++.
Here's the bit of code where the error is:
public static double a(cr paramcr, int paramInt) {
double d = 0.0D;
int i = ++b;
if (b <= l.length()) {
if (paramInt < 0) {
int j = b; // <----------- defined again as an integer.
d = e();
if (e == true && d >= h && d + f - g >= 1.0D && d + f - g <= d) {
String str1 = String.valueOf((int)d);
d = d + f - g;
String str2 = String.valueOf((int)d);
l = a(j, l, str1, str2, i);
} else if (j == false) { // <---------------------------- error
if (((d < 1.0D || d > d) ? false : true) == false)
c = 7;
}
With error message:
The operator == is undefined for the argument type(s) int, boolean
The variable j is defined as a static boolean at the top of the program, but later redefined as an integer inside of this 'a' class. I notice that there are other integer variables being used to compare to true and false statements in this class, but they are all being compared to some condition in order to get a true or false result. That's obviously not the case here, and my problem is that I can't think of a way that this program would have ever functioned in the past if this is how it was written. Any ideas as to why this could be or suggestions on my next move?
It doesn't help that none of this has any documentation or intuitive variable names.

In Java you cannot treat an int as a boolean. Period.
I was sent some files that I was told contained the source code. It did not. I was then told to decompile it.
What you are apparently looking at is some code that has been decompiled from a ".class" file. Decompilation is NOT guaranteed to produce valid (compilable) Java source code. And in this case, it appears that it hasn't. Indeed, there are clues in that code that imply that the original bytecodes were obfuscated ... to deliberately make it hard for the decompiler to generate readable / valid Java source code.
(The problem is that the same bytecodes are used dealing with boolean and integer types up to int. In this case, the decompiler has assumed that the local variable is an int, and not been able to figure out that its assumption was incorrect. A better decompiler might be able to figure it out ...)
So what you will need to do is figure out how to modify that (not-really-Java) code to make it 1) compilable, and 2) do the correct thing1.
It doesn't help that none of this has any documentation or intuitive variable names.
Well ... that what happens when you try to use decompiled code. All local variable names and comments (including javadocs) are discarded by the compiler, and the decompiler has no way to reconstruct them.
The alternative is to go back to the people who were supposed to give you the source code and ask them to provide it to you ... for real!
1 - This assumes that you can figure out what this method is really supposed to be doing. I don't think we can help you with that. For a start, it would probably be necessary to read the disassembled bytecodes to figure out what the code really does.

JVM String methods implementation

String class has some methods that i cannot understand why they were implemented like this... replace is one of them.
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Are there some significant advantages over a simpler and more efficient (fast!) method?
public static String replace(String string, String searchFor, String replaceWith) {
StringBuilder result=new StringBuilder();
int index=0;
int beginIndex=0;
while((index=string.indexOf(searchFor, index))!=-1){
result.append(string.substring(beginIndex, index)+replaceWith);
index+=searchFor.length();
beginIndex=index;
}
result.append(string.substring(beginIndex, string.length()));
return result.toString();
}
Stats with Java 7:
1,000,000 iterations
replace "b" with "x" in "a.b.c"
result: "a.x.c"
Times:
string.replace: 485ms
string.replaceAll: 490ms
optimized replace = 180ms
Code like the Java 7 split method is heavily optimized to avoid pattern compile / regex processing when possible:
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0)
while (resultSize > 0 && list.get(resultSize - 1).length() == 0)
resultSize--;
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
Following the logic of the replace method:
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
The split implementation should be:
public String[] split(String regex, int limit) {
return Pattern.compile(regex).split(this, limit);
}
The performance losses are not far from the ones found on the replace methods. For some reason Oracle gives a fastpath approach on some methods and not others.

Are you sure your proposed method is indeed faster than the regex-based one used by the String class - not just for your own test input, but for every possible input that a program might throw at it? It relies on String.indexOf to do substring matching, which is itself a naive implementation that is subject to bad worst-case performance. It's entirely possible that Pattern implements a more sophisticated matching algorithm such as KMP to avoid redundant comparisons.
In general, the Java team takes performance of the core libraries very seriously, and maintains lots of internal benchmarks using a wide range of real-world data. I've never encountered a situation where regex processing was a bottleneck. My standing advice is to start by writing the simplest possible code that works correctly, and don't even begin to think about rewriting the Java built-ins until profiling proves that it's a bottleneck, and you've exhausted all other avenues of optimization.
Regarding your latest edit - first, I would not describe the split method as heavily optimized. It handles one special case that happens to be extremely common and is guaranteed not to suffer from the poor worst-case complexity described above for the naive string matching algorithm - that of splitting on a single-character, literal token.
It may very well be that the same special case could be optimized for replace, and would provide some measurable improvement. But look what it took to achieve that simple optimization - about 50 lines of code. Those lines of code come at a cost, especially when they're a part of what's probably the most widely-used class in the Java library. Cost comes in many forms:
Resources - That's 50 lines of code that some developer must spend time writing, testing, documenting, and maintaining for the lifetime of the Java language.
Risk - That's 50 opportunities for subtle bugs that slip past the initial testing.
Complexity - That's 50 extra lines of code that any developer who wants to understand how the method works must now take time to read and understand.
Your question now boils down to "why was this one method optimized to handle a special case, but not the other one?" or even more generally "why was this particular feature not implemented?" Nobody but the original author can answer that definitively, but the answer is almost always that either there is not sufficient demand for that feature, or that the benefit derived from having the feature is deemed not worth the cost of adding it.

Must be an array type but resolved to string

I am getting the "Must be an array type but it resolved to string" error in my code. It also says that i (in the code below) cannot be resolved to a variable which I don't get.
public class DNAcgcount{
public double ratio(String dna){
int count=0;
for (int i=0;i<dna.length();i++);
if (dna[i]== "c"){
count+= 1;
if (dna[i]=="g"){
count+=1;
double answer = count/dna.length();
return answer;
}
}
}
}
Could you guys please help me figure out where the problem lies? I'm new to coding in Java so I am not entirely comfortable with the format yet.
Thanks a lot,
Junaid

You cannot access a String's character using subscript (dna[i]). Use charAt instead:
dna.charAt(i) == 'c'
Also, "c" is a String, 'c' is a char.
One more thing - integer division ( e.g. int_a / int_b ) results in an int, and so you lose accuracy, instead - cast one of the ints to double:
double answer = count/(double)dna.length();

Use {} to define the scope of the loop. Also, as others already pointed out, use charAt instead of [] and use ' for characters, and use floating point division for the ratio.
for (int i = 0; i < dna.length(); i++) {
if (dna.charAt(i) == 'c') {
count += 1;
}
if (dna.charAt(i) == 'g') {
count += 1;
}
}
Or a bit shorter, use || to or the two clauses together
if (dna.charAt(i) == 'c' || dna.charAt(i) == 'g') {
count += 1;
}

I think you are currently a bit weak at brackets , this is what i understood from your code and corrected it;
public class DNAcgcount{
public double ratio(String dna){
int count=0;
for (int i=0;i<dna.length();i++){
if (dna.charAt(i)== 'c')
count+= 1;
if (dna.charAt(i)=='g')
count+=1;
}
double answer = count/(double)dna.length();
return answer;
}
}
After if we have to close the brackets when what you want in if is finished . I think you wanted count to be the number of time c or g is present in the dna.
You also did some other mistakes like you have to use 'c' and 'g' instead of "c" and "g" if you are using .charAt(i) because it will be treated like a character and then only you can compare .
You may view this link
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/if.html
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/for.html
and you may also have a look at works you can do with string like charAt.

It seems like that you have a few problems with the main syntax of basic java functions like loops or if-else statement. Click here for a good tutorial on these.
You must correct your for-loop and your if-statement:
for(int i=0;i<dna.length();i++){
if(...){
...;
}
if(...){
...;
}
}
Now you wont get the Cant be resolved to a variable... exception.
Second thing is the usage of your string. You have to use it like this:
for(int i=0;i<dna.length();i++){
if(dna.charAt(i) == 'c'){
count += 1;
}
if(dna.charAt(i) == 'g'){
count += 1;
}
}
Now all your exceptions should be eleminated.

Your problem is with syntax dna[i], dna is a string and you access it as it would be an array by []. Use dna.charAt(i); instead.

You using String incorrectly. Instead of accessing via [] use dna.charAt(i).
Altough logically a string is an array of characters in Java a String type is a class (which means it has attributes and methods) and not a typical array.
And if you want to compare a single character to another enclose it with '' instead of "":
if (dna.charAt(i) == 'c')
.
.

There are two errors:
count should be double or should be casted do double answer = (double)count / dna.length();
and as mentioned above you should replace dna[i] with dna.charAt(i)

Recursive method to determine if a string is a hex number - Java

This is a homework question that I am having a bit of trouble with.
Write a recursive method that determines if a String is a hex number.
Write javadocs for your method.
A String is a hex number if each and every character is either
0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9
or a or A or b or B or c or C or d or D or e or E or f or f.
At the moment all I can see to test this is if the character at 0 of the string is one of these values he gave me then that part of it is a hex.
Any hints or suggestions to help me out?
This is what I have so far: `
public boolean isHex(String string){
if (string.charAt(0)==//what goes here?){
//call isHex again on s.substring(1)
}else return false;
}
`

If you're looking for a good hex digit method:
boolean isHexDigit(char c) {
return Character.isDigit(c) || (Character.toUpperCase(c) >= 'A' && Character.toUpperCase(c) <= 'F');
}
Hints or suggestions, as requested:
All recursive methods call themselves with a different input (well, hopefully a different input!)
All recursive methods have a stop condition.
Your method signature should look something like this
boolean isHexString(String s) {
// base case here - an if condition
// logic and recursion - a return value
}
Also, don't forget that hex strings can start with "0x". This might be (more) difficult to do, so I would get the regular function working first. If you tackle it later, try to keep in mind that 0xABCD0x123 shouldn't pass. :-)
About substring: Straight from the String class source:
public String substring(int beginIndex, int endIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
if (endIndex > count) {
throw new StringIndexOutOfBoundsException(endIndex);
}
if (beginIndex > endIndex) {
throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
}
return ((beginIndex == 0) && (endIndex == count)) ? this :
new String(offset + beginIndex, endIndex - beginIndex, value);
}
offset is a member variable of type int
value is a member variable of type char[]
and the constructor it calls is
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}
It's clearly an O(1) method, calling an O(1) constructor. It can do this because String is immutable. You can't change the value of a String, only create a new one. (Let's leave out things like reflection and sun.misc.unsafe since they are extraneous solutions!) Since it can't be changed, you also don't have to worry about some other Thread messing with it, so it's perfectly fine to pass around like the village bicycle.

Since this is homework, I only give some hints instead of code:
Write a method that always tests the first character of a String if it fulfills the requirements. If not, return false, if yes, call the method again with the same String, but the first character missing. If it is only 1 character left and it is also a hex character then return true.
Pseudocode:
public boolean isHex(String testString) {
If String has 0 characters -> return true;
Else
If first character is a hex character -> call isHex with the remaining characters
Else if the first character is not a hex character -> return false;
}

When solving problems recursively, you generally want to solve a small part (the 'base case'), and then recurse on the rest.
You've figured out the base case - checking if a single character is hex or not.
Now you need to 'recurse on the rest'.
Here's some pseudocode (Python-ish) for reversing a string - hopefully you will see how similar methods can be applied to your problem (indeed, all recursive problems)
def ReverseString(str):
# base case (simple)
if len(str) <= 1:
return str
# recurse on the rest...
return last_char(str) + ReverseString(all_but_last_char(str))

Sounds like you should recursively iterate the characters in string and return the boolean AND of whether or not the current character is in [0-9A-Fa-f] with the recursive call...

You have already received lots of useful answers. In case you want to train your recursive skills (and Java skills in general) a bit more I can recommend you to visit Coding Bat. You will find a lot of exercises together with automated tests.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.