Problem with underscore(_) in Collections.binarySearch (Java)

Problem:
I am using Java Tutorials™ sourcecode for this. This is the source code.
I tried this:
--following with another section of sorted words--
words.add("count");
words.add("cvs");
words.add("dce");
words.add("depth");
--following with another section of sorted words--
and it works perfectly. However when I use this:
--just a section of sorted words--
words.add("count");
words.add("cvs");
words.add("dce_iface");
words.add("dce_opnum");
words.add("dce_stub_data");
words.add("depth");
--following with another section of sorted words--
It does suggest dce_iface when I type dce, but when I then type _ followed by o or s, it shows something else, such as dce_offset, where the offset comes from words.add("fragoffset"); elsewhere in the list.
What can I do to solve this problem? Thank you in advance.

It's probably because of these lines in the code:
for (w = pos; w >= 0; w--) {
if (! Character.isLetter(content.charAt(w))) {
break;
}
}
_ is not a letter character, so it treats it the same way as a space. You can try changing the condition to:
char c = content.charAt(w);
if (! (Character.isLetter(c) || c == '_')) {

I guess you have to add the underscore as "letter" here
// Find where the word starts
int w;
for (w = pos; w >= 0; w--) {
if (!Character.isLetter(content.charAt(w))) {
break;
}
}

It has to do with this section in insertUpdate():
// Find where the word starts
int w;
for (w = pos; w >= 0; w--) {
if (! Character.isLetter(content.charAt(w))) {
break;
}
}
Specifically, Character.isLetter() returns false for the underscore character. That means that the word starts after the underscore position.
To solve it, you need to modify the if statement to allow any non-letter characters you want to use in the words. You could explicitly check for '_', or test Character.isWhitespace() instead, so that every character that isn't a space, tab or newline counts as part of the word.
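A minimal sketch of the explicit '_' check, pulled out into a standalone method so it can be tried in isolation. The class name WordStart and the helpers isWordChar and findWordStart are made up for this example; returning w + 1 assumes the surrounding code takes the prefix from the character after the one that stopped the loop.

```java
class WordStart {
    // Hypothetical helper: a character belongs to a word if it is a letter or '_'.
    static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '_';
    }

    // Walks backwards from pos and returns the index where the current word begins.
    static int findWordStart(String content, int pos) {
        int w;
        for (w = pos; w >= 0; w--) {
            if (!isWordChar(content.charAt(w))) {
                break;
            }
        }
        return w + 1;
    }

    public static void main(String[] args) {
        String content = "alert dce_o";
        int start = findWordStart(content, content.length() - 1);
        System.out.println(content.substring(start)); // prints "dce_o"
    }
}
```

With the underscore accepted, the prefix handed to the search is dce_o rather than just o, which is why the unrelated fragoffset no longer matches.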

Related

Valid parentheses in Java

Code:
public static void main(String[] args) {
Arrays.asList("a+(b*c)-2-a", "(a+b*(2-c)-2+a)*2", "(a*b-(2+c)", "2*(3-a))", ")3+b*(2-c)(")
.stream().forEach((expression) -> {
if (replaceAll(expression, "[(]") == replaceAll(expression, "[)]")) {
System.out.println("correct");
} else {
System.out.println("incorrect");
}
});
}
private static int replaceAll(String word, String regex) {
int count = word.length() - word.replaceAll(regex, "").length();
return count;
}
I have to check if the expression is valid or not. What determine if an expression is valid or not are the parentheses. If it's self closed, is valid, otherwise, not.
My code is almost correct, it's printing:
correct
correct
incorrect
incorrect
correct
But it must print
correct
correct
incorrect
incorrect
incorrect -> the last expression isn't valid.

You need not only to check if the number of opening parentheses matches the number of closed, but also if each closing parenthesis goes after opening one which isn't "closed" yet:
static boolean checkParentheses(String s) {
int opened = 0;
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) == '(')
opened++;
else if (s.charAt(i) == ')') {
if (opened == 0) // no unclosed '(' is left for this ')' to match
return false;
opened--;
}
}
return opened == 0;
}
If you strictly need regex to be involved, do the following:
static boolean checkParentheses(String s) {
// capture a text starting with one opening parenthesis,
// ending with one closing and having no parentheses inside
Pattern p = Pattern.compile("\\([^()]*\\)");
Matcher m;
while ((m = p.matcher(s)).find())
s = m.replaceAll("");
return !(s.contains("(") || s.contains(")"));
}

Your issue is that it's not enough just to count parentheses; you also need to spot where a ')' comes too early. For example ")(" is not valid even though there are an equal number of opening and closing parentheses.
One approach is to keep a count. Start at zero. Each time you see '(', count++. Each time you see ')', count--.
After a decrement, if(count<0) the input is invalid.
At the end of input, if(count != 0) the input is invalid.
It's been pointed out that this can't be done in a single regex. That's because a regex represents a finite state machine. count could in principle increase infinitely.
If you pick a maximum nesting depth, you can write a regex to check it. For example, for a maximum depth of 3:
x*(<x*(<x*(<x*>x*)*>x*)*>x*)*
(I've used 'x' instead of arbitrary chars here, for readability. Replace it with [^<>] to actually match other chars. I've also used <> instead of \(\), again for readability. The () here are for grouping.)
You can always make it work one level deeper by replacing the innermost x* with x*(<x*>x*)* -- but you can never make a regex that doesn't stop working at a certain depth.
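The depth-limited pattern can be generated rather than typed by hand. The sketch below is only an illustration: the class and method names are made up, real parentheses are used instead of <>, and the trailing x* sits inside each repeated group so that ordinary characters may appear between sibling groups such as (a)b(c).

```java
import java.util.regex.Pattern;

class BoundedDepthRegex {
    // Builds a pattern that accepts balanced parentheses nested at most maxDepth deep.
    static Pattern forDepth(int maxDepth) {
        String other = "[^()]*";   // the "x*" of the text: any run of non-parenthesis chars
        String regex = other;      // depth 0: no parentheses allowed at all
        for (int d = 0; d < maxDepth; d++) {
            // One more level: x* followed by any number of "( previous level ) x*"
            regex = other + "(?:\\(" + regex + "\\)" + other + ")*";
        }
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Pattern depth3 = forDepth(3);
        for (String e : new String[] {"a+(b*c)-2-a", "(a*b-(2+c)", ")3+b*(2-c)("}) {
            System.out.println(e + " -> " + depth3.matcher(e).matches());
        }
    }
}
```

Matcher.matches() has to cover the whole input, so a stray ')' or an unclosed '(' makes the match fail. Input nested deeper than maxDepth is rejected as well, which is the limitation described above.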
An alternative method is closer to what a real statement parser would do with nested structures: recurse. Something like (pseudocode):
def consumeBlock() {
switch(next char)
case end-of-input
throw error -- reached end of input inside some parentheses
case '('
consumeBlock() -- go down a nesting level
break;
case ')'
return -- go up a nesting level
default
It's an uninteresting character. Do nothing.
(a real parser compiler would do something more interesting)
}
Here consumeBlock() assumes you've just consumed a '(' and you intend to read until its pair.
Some of your inputs don't begin with a '(', so prime it by first appending a ')' to the end, as the pair to a "silent" '(' that you pretend has already been consumed.
The pseudocode already shows that if you hit end-of-input mid-block, it's invalid input. Also if you are not at end-of-input when the top-level call to consumeBlock() returns, it's invalid input.
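One way the pseudocode might look in Java is sketched below. The class name, the pos field and the isValid wrapper are assumptions made for this example; the appended ')' and the two failure conditions follow the description above.

```java
class RecursiveParenChecker {
    private final String input;
    private int pos = 0;

    RecursiveParenChecker(String expression) {
        // Append the ')' that pairs with the "silent" '(' we pretend was already consumed.
        this.input = expression + ")";
    }

    // Assumes a '(' has just been consumed; reads up to and including its matching ')'.
    private void consumeBlock() {
        while (true) {
            if (pos >= input.length()) {
                throw new IllegalStateException("reached end of input inside parentheses");
            }
            char c = input.charAt(pos++);
            if (c == '(') {
                consumeBlock();   // go down a nesting level
            } else if (c == ')') {
                return;           // go up a nesting level
            }
            // Any other character is uninteresting: do nothing.
        }
    }

    static boolean isValid(String expression) {
        RecursiveParenChecker checker = new RecursiveParenChecker(expression);
        try {
            checker.consumeBlock();
        } catch (IllegalStateException e) {
            return false;         // hit end of input mid-block
        }
        // Valid only if the top-level call stopped exactly at the appended ')'.
        return checker.pos == checker.input.length();
    }
}
```

Because each nesting level costs one stack frame, extremely deep nesting could overflow the stack; the counter approach does not have that problem.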

You could go through it char by char and use a counter to track the parenthesis level in the statement.
boolean valid = true;
int level = 0;
for(int i=0; i < expr.length(); i++) {
if(expr.charAt(i) == '(') level++;
if(expr.charAt(i) == ')') level--;
if(level < 0) { // ) with no (
valid = false;
break;
}
}
if(level > 0) valid = false; // ( with no )
return valid; // true if level returned to 0

Java - See if a string contains any characters in it

The problem I'm having is that when I check whether the string contains any non-digit characters, it only looks at the first character, not the whole string. For instance, I would like to be able to input "123abc" and have the letters recognized so that the input fails. I also need the string to be 11 characters long, and since my program only works with 1 character, it cannot go any further.
Here is my code so far:
public static int phoneNumber(int a)
{
while (invalidinput)
{
phoneNumber[a] = myScanner.nextLine();
if (phoneNumber[a].matches("[0-9+]") && phoneNumber[a].length() == 11 )
{
System.out.println("Continue");
invalidinput = false;
}
else
{
System.out.print("Please enter a valid phone number: ");
}
}
return 0;
}
For instance, why is it that if I take away the phoneNumber.length() check, it still only accepts 1 character, so that entering "12345" still fails? I can only enter "1" for the program to continue.
If someone could explain how this works to me that would be great

Your regex and if condition are wrong. Use it like this:
if ( phoneNumber[a].matches("^[0-9]{11}$") ) {
System.out.println("Continue");
invalidinput = false;
}
This will only allow phoneNumber[a] to be exactly 11 characters long, consisting only of the digits 0-9.

The + should be outside the set, or you could specifically try to match 11 digits like this: ^[0-9]{11}$ (the ^ and $ anchor the match to the start and end of the string).

You need to put the "+" after the "]" in your regex. So, you would change it to:
phoneNumber[a].matches("[0-9]+")

Why not try using a for loop to go through each character?
Like:
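public static int phoneNumber(int a)
{
while (invalidinput)
{
phoneNumber[a] = myScanner.nextLine(); // read the next attempt
int x = 0; // number of digit characters found
for(int i = 0; i < phoneNumber[a].length(); i++)
{
char c = phoneNumber[a].charAt(i);
// a char has no matches() method, so compare it directly
if(c >= '0' && c <= '9'){
x++;
}
}
// valid only if every character was a digit
if (x == phoneNumber[a].length()){
System.out.println("Continue");
invalidinput = false;
}
else
{
System.out.print("Please enter a valid phone number: ");
}
}
return 0;
}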

Are the legal characters in your phone numbers 0..9 and +? If so, then you should use the regular expression [0-9+]*, which matches zero or more legal characters. (If not, you probably meant [0-9]+.) Also, you can use [0-9+]{11} instead of your explicit check for a length of 11.
The reason that your current code fails, is that String#matches() does not check whether the regular expression matches part of the string, but whether it matches all of the string. You can see this in the JavaDoc, which points you to Matcher#matches(), which "Attempts to match the entire region against the pattern."
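The difference between matching the whole string and finding a match somewhere inside it can be seen in a small sketch. The class and method names here are made up for illustration; String.matches, Pattern.compile and Matcher.find are the standard library calls.

```java
import java.util.regex.Pattern;

class MatchesDemo {
    // Whole-string match: true only for exactly 11 digits.
    static boolean isElevenDigits(String s) {
        return s.matches("[0-9]{11}");
    }

    // Partial match: true if a digit occurs anywhere in the string.
    static boolean containsDigit(String s) {
        return Pattern.compile("[0-9]").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println("1".matches("[0-9+]"));          // true: one legal character
        System.out.println("12345".matches("[0-9+]"));      // false: the pattern covers one character only
        System.out.println(isElevenDigits("12345678901"));  // true
        System.out.println(isElevenDigits("123abc"));       // false
        System.out.println(containsDigit("123abc"));        // true
    }
}
```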

Finding Palindromes in a word list

I'm working on a program for Java on how to find a list of palindromes that are embedded in a word list file. I'm in an intro to Java class so any sort of help or guidance will be greatly appreciated!
Here is the code I have so far:
import java.util.Scanner;
import java.io.File;
class Palindromes {
public static void main(String[] args) throws Exception {
String pathname = "/users/abrick/resources/american-english-insane";
File dictionary = new File(pathname);
Scanner reader = new Scanner(dictionary);
while (reader.hasNext()) {
String word = reader.nextLine();
for (int i = 0; i > word.length(); i++) {
if (word.charAt(word.indexOf(i) == word.charAt(word.indexOf(i)) - 1) {
System.out.println(word);
}
}
}
}
}
There are 3 words that are 7 letters or longer in the list that I am importing.

You have a few ways to solve this problem.
A word is considered a palindrome if:
It can be read the same way backwards as forwards.
The first element is the same as the last element, up until we reach the middle.
Half of the word is the same as the other half, reversed.
A word of length 1 is trivially a palindrome.
Ultimately, your method isn't doing much of that. In fact, you're not doing any validation at all - you're only printing the word if the first and last character match.
Here's a proposal: Let's read each end of the String, and see if it's a palindrome. We have to take into account the case that it could potentially be empty, or be of length 1. We also want to get rid of any white space in the string, as that can cause errors on validation - we use replaceAll("\\s", "") to solve that.
public boolean isPalindrome(String theString) {
if(theString.length() == 0) {
throw new IllegalStateException("I wouldn't expect a word to be zero-length");
}
if(theString.length() == 1) {
return true;
} else {
char[] wordArr = theString.replaceAll("\\s", "").toLowerCase().toCharArray();
for(int i = 0, j = wordArr.length - 1; i < wordArr.length / 2; i++, j--) {
if(wordArr[i] != wordArr[j]) {
return false;
}
}
return true;
}
}

I'm assuming that you're reading in strings. Use string.toCharArray() to convert each string to a char[]. Iterate through the character array using a for loop as follows: on iteration 1, if the first character is equal to the last character, then proceed to the next iteration, else return false. On iteration 2, if the second character is equal to the second-to-last character then proceed to the next iteration, else return false. And so on, until you reach the middle of the string, at which point you return true. Be careful of off-by-one errors; some strings will have an even length, some will have an odd length.
If your palindrome checker is case insensitive, then use string.toLowerCase().toCharArray() to preprocess the character array.
You can use string.charAt(i) instead of string.toCharArray() in the for loop; in this case, if the palindrome checker is case insensitive then preprocess the string with string = string.toLowerCase()
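The charAt variant described above might look like the following sketch. The class name is made up, and treating an empty string as a palindrome is an assumption of this example, not something stated above.

```java
import java.util.Locale;

class PalindromeCheck {
    // Case-insensitive check that compares characters from both ends, moving inwards.
    static boolean isPalindrome(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        int i = 0;
        int j = s.length() - 1;
        while (i < j) {              // stops at the middle for odd and even lengths alike
            if (s.charAt(i) != s.charAt(j)) {
                return false;
            }
            i++;
            j--;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("Racecar")); // true
        System.out.println(isPalindrome("abc"));     // false
    }
}
```

Using two indices that meet in the middle avoids the off-by-one question: for an odd length the middle character is simply never compared.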

Let's break the problem down: In the end, you are checking if the reverse of the word is equal to the word. I'm going to assume you have all of the words stored in an array called wordArray[].
I have some code for getting the reverse of the word (copied from here):
public String reverse(String str) {
if ((null == str) || (str.length() <= 1)) {
return str;
}
return new StringBuffer(str).reverse().toString();
}
So, now we just need to call that on every word. So:
for(int count = 0; count<wordArray.length;count++) {
String currentWord = wordArray[count];
if(currentWord.equals(reverse(currentWord))) {
//it's a palindrome, do something
}
}

Since this is homework, i'll not supply you with code.
When i code, the first thing i do is take a step back and ask myself,
"what am i trying to get the computer to do that i would do myself?"
Ok, so you've got this huuuuge string. Probably something like this: "lkasjdfkajsdf adda aksdjfkasdjf ghhg kajsdfkajsdf oopoo"
etc..
A string's length will either be odd or even. So, first, check that.
The odd/even will be used to figure out how many letters to read in.
If the word is odd, read in ((length-1)/2) characters.
if even (length/2) characters.
Then, compare those characters to the last characters. Notice that you'll need to skip the middle character for an odd-lengthed string.
Instead of what you have above, which checks the 1st and 2nd, then 2nd and 3rd, then 3rd and fourth characters, check from the front and back inwards, like so.
while (reader.hasNext()) {
String word = reader.nextLine();
// skip words that are too short to be interesting
if (word.length() < 2) { continue; }
boolean checker = true;
// compare from the front and the back, moving inwards
for (int i = 0; i < word.length() / 2; i++) {
if (word.charAt(i) != word.charAt(word.length() - 1 - i)) {
checker = false;
break;
}
}
if (checker)
{System.out.println(word);}
}

Retrieve method source code from class source code file

I have here a String that contains the source code of a class. Now i have another String that contains the full name of a method in this class. The method name is e.g.
public void (java.lang.String test)
Now I want to retrieve the source code of this method from the string with the class's source code. How can I do that? With String#indexOf(methodName) I can find the start of the method source code, but how do I find the end?
====EDIT====
I used the count curly-braces approach:
internal void retrieveSourceCode()
{
int startPosition = parentClass.getSourceCode().IndexOf(this.getName());
if (startPosition != -1)
{
String subCode = parentClass.getSourceCode().Substring(startPosition, parentClass.getSourceCode().Length - startPosition);
for (int i = 0; i < subCode.Length; i++)
{
String c = subCode.Substring(0, i);
int open = c.Split('{').Count() - 1;
int close = c.Split('}').Count() - 1;
if (open == close && open != 0)
{
sourceCode = c;
break;
}
}
}
Console.WriteLine("SourceCode for " + this.getName() + "\n" + sourceCode);
}
This works more or less fine. However, if a method is defined without a body, it fails. Any hints on how to solve that?

Counting braces and stopping when the count decreases to 0 is indeed the way to go. Of course, you need to take into account braces that appear as literals and should thus not be counted, e.g. braces in comments and strings.
Overall this is kind of a thankless endeavour, comparable in complexity to say, building a command line parser if you want to get it working really reliably. If you know you can get away with it you could cut some corners and just count all the braces, although I do not recommend it.
Update:
Here's some sample code to do the brace counting. As I said, this is a thankless job and there are tons of details you have to get right (in essence, you're writing a mini-lexer). It's in C#, as this is the closest to Java I can write code in with confidence.
The code below is not complete and probably not 100% correct (for example: verbatim strings in C# do not allow spaces between the @ and the opening quote, but did I know that for a fact or just forgot about it?)
// sourceCode is a string containing all the source file's text
var sourceCode = "...";
// startIndex is the index of the char AFTER the opening brace
// for the method we are interested in
var methodStartIndex = 42;
var openBraces = 1;
var insideLiteralString = false;
var insideVerbatimString = false;
var insideBlockComment = false;
var lastChar = ' '; // White space is ignored by the C# parser,
// so a space is a good "neutral" character
for (var i = methodStartIndex; openBraces > 0; ++i) {
var ch = sourceCode[i];
switch (ch) {
case '{':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
++openBraces;
}
break;
case '}':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
--openBraces;
}
break;
case '"':
if (insideBlockComment) {
continue;
}
if (insideLiteralString) {
// "Step out" of the string unless this quote is escaped by a backslash
insideLiteralString = lastChar == '\\';
}
else if (insideVerbatimString) {
// If this quote is part of a two-quote pair, do NOT step out
// (it means the string contains a literal quote)
// This can throw, but only for source files with syntax errors.
// I'm ignoring this possibility here...
var nextCh = sourceCode[i + 1];
if (nextCh == '"') {
++i; // skip that next quote
}
else {
insideVerbatimString = false;
}
}
else {
if (lastChar == '@') {
insideVerbatimString = true;
}
else {
insideLiteralString = true;
}
}
break;
case '/':
if (insideLiteralString || insideVerbatimString) {
continue;
}
// TODO: parse this
// It can start a line comment, if followed by /
// It can start a block comment, if followed by *
// It can end a block comment, if preceded by *
// Line comments are intended to be handled by just incrementing i
// until you see a CR and/or LF, hence no insideLineComment flag.
break;
}
lastChar = ch;
}
// From the values of methodStartIndex and i we can now do sourceCode.Substring and get the method source
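For Java source, the same idea can be sketched more compactly, with the comment handling filled in. This is a rough illustration only: the class and method names are made up, it assumes openIndex points at the method's opening '{', and it ignores Java text blocks ("""...""").

```java
class MethodBodyFinder {
    // Returns the index of the '}' that closes the '{' at openIndex, or -1 if none is found.
    static int findClosingBrace(String src, int openIndex) {
        int depth = 0;
        int i = openIndex;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (src.startsWith("//", i)) {
                // Line comment: skip to the end of the line.
                int eol = src.indexOf('\n', i);
                if (eol < 0) return -1;
                i = eol + 1;
                continue;
            }
            if (src.startsWith("/*", i)) {
                // Block comment: skip past the closing "*/".
                int end = src.indexOf("*/", i + 2);
                if (end < 0) return -1;
                i = end + 2;
                continue;
            }
            if (ch == '"' || ch == '\'') {
                // String or char literal: skip to the matching unescaped quote.
                i++;
                while (i < src.length() && src.charAt(i) != ch) {
                    if (src.charAt(i) == '\\') i++;   // skip the escaped character
                    i++;
                }
                i++;                                  // step past the closing quote
                continue;
            }
            if (ch == '{') {
                depth++;
            } else if (ch == '}') {
                depth--;
                if (depth == 0) return i;
            }
            i++;
        }
        return -1;
    }
}
```

The method body is then src.substring(openIndex, closeIndex + 1). For a method declared without a body, one possible check (an assumption, not covered above) is whether a ';' appears after the method name before any '{'; in that case there is nothing to extract.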

Have a look at:- Parser for C#
It recommends using NRefactory to parse and tokenise source code, you should be able to use that to navigate your class source and pick out methods.

You will probably have to know the order in which the methods are listed in the code file. That way, you can look for the method's closing }, which should be the last one before the start of the next method.
So your code might look like:
nStartOfMethod = String.indexOf(methodName)
nStartOfNextMethod = String.indexOf(NextMethodName)
Look for .LastIndexOf(yourMethodTerminator /*probably a}*/,...) between a string of nStartOfMethod and nStartOfNextMethod
In this case, if you don't know the order of the methods, you might end up skipping over a method in between while looking for the ending brace.

Can I improve this Pig-Latin converter? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I'm brand-spanking new to Java and I made this little translator for PigLatin.
package stringmanipulation;
public class PigLatinConverter {
public String Convert(String word){
int position = 0;
if (!IsVowel(word.charAt(0))) {
for (int i= 0; i < word.length(); i++) {
if (IsVowel(word.charAt(i))) {
position = i;
break;
}
}
String first = word.substring(position, word.length());
String second = word.substring(0, position) + "ay";
return first + second;
} else {
return word + "way";
}
}
public boolean IsVowel(char c){
if (c == 'a')
return true;
else if(c == 'e')
return true;
else if(c == 'i')
return true;
else if(c == 'o')
return true;
else if(c == 'u')
return true;
else
return false;
}
}
Are there any improvements I can make?
Are there any nifty Java tricks that are in the newest Java version I might not be aware of? I come from a C# background.
Thank you!

I'd rewrite isVowel(char ch) as follows:
return "aeiou".indexOf(ch) != -1;
And I'd write the following instead:
// String first = word.substring(position, word.length());
String first = word.substring(position);
I'd also rename method names to follow coding convention.
And of course, being me, I'd also use regex instead of substring and for loop.
System.out.println("string".replaceAll("([^aeiou]+)(.*)", "$2$1ay"));
// ingstray
References
Java Coding Convention - Naming Convention

Disclaimer: I don't know Java.
Inverted logic is confusing; please write your if statement like this:
if (IsVowel(word.charAt(0))) {
return word + "way";
} else {
for (int i= 0; i < word.length(); i++) {
// ...
return first + second;
}
You can even drop the else.
IsVowel may need to be private. It can also be rewritten using a single || chain, or as a "".indexOf (or whatever it is in Java).
Your for logic can be simplified into a short while loop:
while (position < word.length() && !IsVowel(word.charAt(position))) {
++position;
}
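Putting these suggestions together, the converter might end up looking like the sketch below. The class name is made up, the method names are lower-cased, and returning an empty word unchanged is an assumption of this example; otherwise it is meant to behave like the original for lower-case input.

```java
class PigLatinSketch {
    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) != -1;
    }

    static String convert(String word) {
        if (word.isEmpty()) {
            return word;              // assumption: nothing to translate
        }
        if (isVowel(word.charAt(0))) {
            return word + "way";      // early return, so no else branch is needed
        }
        int position = 0;
        // Advance to the first vowel; the bounds check covers words without any vowel.
        while (position < word.length() && !isVowel(word.charAt(position))) {
            ++position;
        }
        return word.substring(position) + word.substring(0, position) + "ay";
    }

    public static void main(String[] args) {
        System.out.println(convert("string")); // ingstray
        System.out.println(convert("apple"));  // appleway
    }
}
```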

Here's a complete rewrite that makes the code more readable if you know how to read regex:
String[] words =
"nix scram stupid beast dough happy question another if".split(" ");
for (String word : words) {
System.out.printf("%s -> %s%n", word,
("w" + word).replaceAll(
"w(qu|[^aeiou]+|(?<=(w)))([a-z]*)",
"$3-$1$2ay"
)
);
}
This prints (as seen on ideone.com):
nix -> ix-nay
scram -> am-scray
stupid -> upid-stay
beast -> east-bay
dough -> ough-day
happy -> appy-hay
question -> estion-quay
another -> another-way
if -> if-way
Note that question becomes estion-quay, which is the correct translation according to Wikipedia article. In fact, the above words and translations are taken from the article.
The way the regex works is as follows:
First, all words are prefixed with w just in case it's needed
Then, skipping that w, look for either qu or a non-empty sequence of consonants. If neither can be found, then the actual word starts with a vowel, so grab the w using capturing lookbehind
Then just rearrange the components to get the translation
That is:
"skip" dummy w
|
w(qu|[^aeiou]+|(?<=(w)))([a-z]*) --> $3-$1$2ay
 \                2\_/ /\______/
  \_________1_________/    3
References
regular-expressions.info
Character class:[…], Alternation: |, Repetition:+,*, Lookaround:(?<=…), and Capturing:(…)

I know this question is well over a year old, but I thought I would put my modification of it. There are several improvements in this code.
public String convert(String word)
{
int position = 0;
while(!isVowel(word.charAt(position)))
{
++position;
}
if(position == 0)
{
return word + "-way";
}
else if(word.charAt(0) == 'q')
{
++position;
}
return word.substring(position) + "-" + word.substring(0, position) + "ay";
}
public boolean isVowel(char character)
{
switch(character)
{
case 'a': case 'e': case 'i': case 'o': case 'u':
return true;
default:
return false;
}
}
First the code will find the position of the first vowel, and then jump out of the loop. This is simpler than using a for loop to iterate through each letter and using break; to jump out of the loop. Secondly, this will match all the testcases on the Wikipedia site. Lastly, since chars are actually a limited range int, a switch statement can be used to improve performance and readability.

Not strictly an improvement as such, but Java convention dictates that methods should start with a lowercase letter.

I'm years removed from Java, but overall, your code looks fine. If you wanted to be nitpicky, here are some comments:
Add comments. It doesn't have to follow the Javadoc specification, but at least explicitly describe the accepted argument and the expected return value and perhaps give some hint as to how it works (behaving differently depending on whether the first character is a vowel)
You might want to catch IndexOutOfBoundsException, which I think might happen if you pass it a zero length string.
Method names should be lower case.
IsVowel can be rewritten return c == 'a' || c == 'e' and so on. Due to short-circuiting, the performance in terms of number of comparisons should be similar.

Is this homework? If so, tag it as such.
Unclear what expected behaviour is for "honest" or "ytterbium".
It doesn't respect capitals ("Foo" should turn into "Oofay", and AEIOU are also vowels).
It crashes if you pass in the empty string.
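The last two points could be handled along the lines of the sketch below. It is only one possible reading: the class name is made up, an empty word is returned unchanged, and a capitalised word is lower-cased and then re-capitalised on its first letter, so an all-caps word such as "FOO" would also come out as "Oofay".

```java
import java.util.Locale;

class PigLatinCaseAware {
    static boolean isVowel(char c) {
        return "aeiouAEIOU".indexOf(c) != -1;
    }

    static String convert(String word) {
        if (word.isEmpty()) {
            return word;                          // no crash on the empty string
        }
        int position = 0;
        while (position < word.length() && !isVowel(word.charAt(position))) {
            position++;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        String result = (position == 0)
                ? lower + "way"
                : lower.substring(position) + lower.substring(0, position) + "ay";
        // Restore the capital if the original word started with one.
        if (Character.isUpperCase(word.charAt(0))) {
            result = Character.toUpperCase(result.charAt(0)) + result.substring(1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convert("Foo"));   // Oofay
        System.out.println(convert("Apple")); // Appleway
    }
}
```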
