Splitting string algorithm in Java


I'm trying to make the following algorithm work. What I want to do is split the given string into substrings consisting of either a series of numbers or an operator.
So for the string "22+2", I would get an array in which [0]="22", [1]="+" and [2]="2".
This is what I have so far, but I get an index out of bounds exception:
public static void main(String[] args) {
    String string = "114+034556-2";
    int k, a, j;
    k = 0; a = 0; j = 0;
    String[] subStrings = new String[string.length()];
    while (k < string.length()) {
        a = k;
        while (((int) string.charAt(k)) <= 57 && ((int) string.charAt(k)) >= 48) {
            k++;
        }
        subStrings[j] = String.valueOf(string.subSequence(a, k - 1)); // exception here
        j++;
        subStrings[j] = String.valueOf(string.charAt(k));
        j++;
    }
}
I would rather be told what's wrong with my reasoning than be offered an alternative, but of course I will appreciate any kind of help.

I'm deliberately not answering this question directly, because it looks like you're trying to figure out a solution yourself. I'm also assuming that you're purposefully not using the split or the indexOf functions, which would make this pretty trivial.
A few things I've noticed:
If your input string is long, you'd probably be better off working with a char array and a StringBuilder, so you avoid the memory overhead of creating many intermediate immutable strings.
Have you tried catching the exception, or printing out what the value of k is that causes your index out of bounds problem?
Have you thought through what happens when your string terminates? For instance, have you run this through a debugger when the input string is "454" or something similarly trivial?
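For readers who would rather compare against a working version, here is one possible corrected form of the question's loop. It is a sketch, not the answerer's code, and it assumes a List<String> is acceptable in place of the fixed-size String[] (which would otherwise be left with null slots). It fixes three things: the inner loop checks the bound before calling charAt, substring's end index is exclusive (so k, not k - 1), and k is stepped past the operator so the outer loop makes progress. Character.isDigit replaces the 48 to 57 range check.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitTokens {
    // Splits e.g. "114+034556-2" into [114, +, 034556, -, 2].
    static List<String> split(String string) {
        List<String> subStrings = new ArrayList<>();
        int k = 0;
        while (k < string.length()) {
            int a = k;
            // check the bound first so the inner loop cannot run past the end
            while (k < string.length() && Character.isDigit(string.charAt(k))) {
                k++;
            }
            if (k > a) {
                // substring's end index is exclusive: use k, not k - 1
                subStrings.add(string.substring(a, k));
            }
            if (k < string.length()) {
                // treat the current char as a one-character operator and step past it
                subStrings.add(String.valueOf(string.charAt(k)));
                k++;
            }
        }
        return subStrings;
    }

    public static void main(String[] args) {
        System.out.println(split("114+034556-2")); // [114, +, 034556, -, 2]
    }
}
```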

You could use a regular expression to split the numbers from the operators using lookahead and lookbehind assertions on the operators +, -, * and /:
import java.util.Arrays;
String equation = "22+2";
String[] tmp = equation.split("(?=[-+*/])|(?<=[-+*/])");
System.out.println(Arrays.toString(tmp)); // [22, +, 2]

If you're interested in the general problem of parsing, then I'd recommend thinking about it on a character-by-character level, moving through a finite state machine with each new character. (Often you'll need a terminator character that cannot occur in the input, such as the \0 in C strings, but we can get around that.)
In this case, you might have the following states:
1. initial state
2. just parsed a number
3. just parsed an operator
The characters determine the transitions from state to state:
You start in state 1.
Numbers transition into state 2.
Operators transition into state 3.
The current state can be tracked with something like an enum, changing the state after each character is consumed.
With that setup, you just need to loop over the input string and switch on the current state.
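// Compilable version of the state-machine idea (consecutive operators end up in one token).
import java.util.ArrayList;
import java.util.List;

class StateMachineSplitter {
    enum State { INIT_STATE, JUST_NUM, JUST_OP }

    static List<String> parse(String inputString) {
        State state = State.INIT_STATE;
        StringBuilder acc = new StringBuilder();
        List<String> subStrs = new ArrayList<>();
        for (char c : inputString.toCharArray()) {
            State next;
            if (Character.isDigit(c)) {
                next = State.JUST_NUM;
            } else {
                next = State.JUST_OP;
            }
            if (state != next && state != State.INIT_STATE) {
                // state change, so save and reset the accumulator
                subStrs.add(acc.toString());
                acc.setLength(0);
            }
            // the current character always belongs to the token being built
            acc.append(c);
            // update the state
            state = next;
        }
        // save whatever is left in the accumulator at the end of the input
        if (acc.length() > 0) {
            subStrs.add(acc.toString());
        }
        return subStrs;
    }
}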
With a structure like that, you can more easily add new features / constructs by adding new states and updating the behavior depending on the current state and incoming character. For example, you could add a check to throw errors if letters appear in the string (and include offset locations, if you wanted to track that).
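As one illustration of extending the state machine, here is a hypothetical, self-contained variant. The class name CheckedTokenizer and the operator set +-*/ are assumptions. It throws on any character that is neither a digit nor an operator, reporting the offset, and it emits every operator as its own token, so "1+-2" becomes [1, +, -, 2].

```java
import java.util.ArrayList;
import java.util.List;

public class CheckedTokenizer {
    enum State { INIT, JUST_NUM, JUST_OP }

    static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder acc = new StringBuilder();
        State state = State.INIT;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            State next;
            if (Character.isDigit(c)) {
                next = State.JUST_NUM;
            } else if ("+-*/".indexOf(c) >= 0) {
                next = State.JUST_OP;
            } else {
                // reject anything else, reporting where it occurred
                throw new IllegalArgumentException("Unexpected '" + c + "' at offset " + i);
            }
            // digits accumulate; a state change or any operator starts a new token
            if (state != State.INIT && (next != state || next == State.JUST_OP)) {
                tokens.add(acc.toString());
                acc.setLength(0);
            }
            acc.append(c);
            state = next;
        }
        if (acc.length() > 0) {
            tokens.add(acc.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("22+2")); // [22, +, 2]
        try {
            parse("22+x");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unexpected 'x' at offset 3
        }
    }
}
```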

If your criterion is simply "anything that is not a number", then you can use some simple regex if you don't mind working with parallel arrays:
String[] operands = string.split("\\D"); // split around anything that is NOT a digit
char[] operators = string.replaceAll("\\d", "").toCharArray(); // remove all digits and turn the rest into a char array
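To put the two parallel arrays back into a single token list, operand i is followed by operator i. This is a sketch with made-up class and method names, and it assumes a well-formed expression that starts with a number and never has two operators in a row; either case makes split("\\D") produce empty operands.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelArrays {
    static List<String> tokens(String string) {
        String[] operands = string.split("\\D");                        // the numbers
        char[] operators = string.replaceAll("\\d", "").toCharArray(); // the operators
        List<String> result = new ArrayList<>();
        for (int i = 0; i < operands.length; i++) {
            result.add(operands[i]);
            // there is one operator fewer than operands
            if (i < operators.length) {
                result.add(String.valueOf(operators[i]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("114+034556-2")); // [114, +, 034556, -, 2]
    }
}
```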

import java.util.ArrayList;
import java.util.List;

public class SplitExpression {
    public static void main(String[] args) {
        String input = "22+2-3*212/21+23";
        StringBuilder number = new StringBuilder();
        List<String> numbers = new ArrayList<>();
        List<String> operators = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                number.append(c); // still inside a number
            } else if (c == '+' || c == '-' || c == '*' || c == '/') {
                operators.add(String.valueOf(c));
                numbers.add(number.toString()); // an operator ends the current number
                number.setLength(0);
            }
        }
        if (number.length() > 0) {
            numbers.add(number.toString()); // the last number has no operator after it
        }
        for (String x : numbers) {
            System.out.print("number=" + x + ",");
        }
        for (String x : operators) {
            System.out.print("operator=" + x + ",");
        }
    }
}
This will be the output:
number=22,number=2,number=3,number=212,number=21,number=23,operator=+,operator=-,operator=*,operator=/,operator=+,

Related

I am trying to write a recursive function that checks if one word matches the reverse word, but I'm not sure if it's recursion

All I really need to know is whether the function I am using is recursive, or whether the method simply doesn't get called within itself.
In my code, I have a helper function to reverse the second word and I put a toLowerCase in order to be able to compare words even if there are any random capitals.
Is this recursion or is it just a function that compares the two?
import java.util.Scanner;

public class isReverse {
    public static void main(String[] args) {
        isReverse rev = new isReverse();
        Scanner in = new Scanner(System.in);
        System.out.println("Please enter a word: ");
        String a = in.nextLine();
        System.out.println("Please Enter a second word to compare: ");
        String b = in.nextLine();
        System.out.println(rev.isReverse(a, b));
    }

    String rev = "";

    public boolean isReverse(String wordA, String wordB) {
        String fword = wordA.replaceAll("\\s+", "").toLowerCase();
        String clean2 = wordB.replaceAll("\\s+", "").toLowerCase();
        String reverse = revString(clean2);
        if (fword.length() == 0) {
            return false;
        }
        if (fword.equals(reverse)) {
            return true;
        }
        if (!reverse.equals(fword)) {
            return false;
        }
        else
            return isReverse(fword, reverse);
    }

    public String revString(String sequence) {
        String input = sequence;
        StringBuilder order = new StringBuilder();
        order.append(input);
        order = order.reverse();
        rev = order.toString();
        return rev;
    }
}

As far as your question is concerned, your code is not behaving like a recursive function because your code is not entering into the last else condition. For recursion to work you need to have:
a base case (without a base case, the recursion would go on forever)
a recursive case (this is where you reduce the original problem to a smaller one)
A comment about your code, though:
If you're doing the actual reverse logic, you don't need recursion just to check whether the original string and the reversed string are the same. This is purely an algorithm problem, so here is one way to solve it:
If the length of the given input is 0 or 1, then the reverse is the same.
else:
check the first and last chars of the string, if they are equal, then you need to remove those two chars and check if the rest of the string is a palindrome. This is the actual recursive step.
else the string is not a palindrome.
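The steps above translate fairly directly into Java. This sketch checks a single string; the isReverse adapter on top is an addition of my own, not from the answer, and relies on the fact that for two strings of equal length, a + b is a palindrome exactly when b is the reverse of a.

```java
public class PalindromeCheck {
    static boolean isPalindrome(String s) {
        // base case: 0 or 1 characters read the same both ways
        if (s.length() <= 1) {
            return true;
        }
        // first and last chars differ: not a palindrome
        if (s.charAt(0) != s.charAt(s.length() - 1)) {
            return false;
        }
        // recursive step: drop the first and last chars and check the rest
        return isPalindrome(s.substring(1, s.length() - 1));
    }

    // Hypothetical adapter: equal lengths plus a palindromic a + b means b is a reversed
    static boolean isReverse(String a, String b) {
        return a.length() == b.length() && isPalindrome(a + b);
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("racecar"));            // true
        System.out.println(isReverse("stressed", "desserts")); // true
    }
}
```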

Technically? Well, you are calling a method from within itself, so, technically, yeah.
Pragmatically? No. The recursive call part will never be invoked.
Your code does this: I have 2 words. If the words are equal to each other, stop and do something. If they are not equal to each other, stop and do something. Otherwise, recurse.
And that's where it falls apart: It'll never recurse - either the words are equal, or they are not.
The general idea behind a recursive function is three-fold:
1. The method (Java-ese for 'function') calls itself.
2. Upon each call to itself, the parameters passed progress towards an end state: they become smaller or trend towards a stop value such as 0.
3. There are edge cases where the function does not call itself and returns instead (the answer for the smallest/zero-est inputs does not require recursion and is trivial).
You're missing the #2 part here. Presumably, this is what you'd want for a recursive approach. Forget about revString, delete that entirely. Do this instead:
If both inputs are completely empty, return true (That's the #3 - edge cases part).
If one of the two inputs is empty but the other one is not, false. (Still working on #3)
If the first character of the first input is NOT equal to the last character of the second input, false. (Still #3).
Now lop the first char off of the first input and the last off of the latter (Working on #2 now - by shortening the strings we're inevitably progressing towards an end no matter what)
now call ourself, with these new lopped-down strings (That'll be #1).
That would be a recursive approach to the problem. It's more complicated than for loops, but, then, recursive functions often are.
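A sketch of those five steps in Java (the class name is made up). The whitespace-stripping and toLowerCase normalization from the question could be applied to both words before the first call.

```java
public class ReverseCheck {
    static boolean isReverse(String first, String second) {
        // #3: both empty, so every character has been matched
        if (first.isEmpty() && second.isEmpty()) {
            return true;
        }
        // #3: only one is empty, so the lengths differ
        if (first.isEmpty() || second.isEmpty()) {
            return false;
        }
        // #3: first char of the first word must equal last char of the second
        if (first.charAt(0) != second.charAt(second.length() - 1)) {
            return false;
        }
        // #2 and #1: lop off those two chars and call ourselves on shorter strings
        return isReverse(first.substring(1), second.substring(0, second.length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(isReverse("stressed", "desserts")); // true
        System.out.println(isReverse("abc", "cab"));           // false
    }
}
```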

Actually, this is not recursion. All you need is:
Check that both strings have the same length.
Iteratively check letters from 0 to n in the first string and from n to 0 in the second string. If they are equal, go to the next iteration (recursion); otherwise return false.
// argument validation (e.g. null checks) is omitted in the public method
public static boolean isReverse(String one, String two) {
    // strings of different lengths cannot be reverses of each other
    if (one.length() != two.length())
        return false;
    return isReverse(one, 0, two, two.length() - 1);
}

// support method has two additional counters to check letters to be equal
private static boolean isReverse(String one, int i, String two, int j) {
    // base case: all letters of the first string have been matched
    if (i == one.length())
        return j == -1;
    // the second string ran out before the first one
    if (j < 0)
        return false;
    // if the letters differ, the strings are not reverses of each other
    if (one.charAt(i) != two.charAt(j))
        return false;
    // go to the next recursion to check the next letters
    return isReverse(one, i + 1, two, j - 1);
}

Java StringTokenizer - Problems with nextToken() usage with substring

I have a text file I must iterate through and want to move certain elements of each line into an ArrayList. Each line of the file is in the format: number. String number. decimal decimal
As the two numbers have a full stop (.) at the end, I need to read them as Strings, remove the . using substring, and then convert them to a primitive data type (int or short).
Example on file:
294. ABC123 66. .00 .00
I get a string index out of range error if I try this (temp is a String):
while (fileLine.hasMoreTokens())
{
    oneNumber = Integer.valueOf(fileLine.nextToken().substring(0,
            fileLine.nextToken().indexOf('.')));
    twoString = fileLine.nextToken();
    threeNumber = Short.valueOf(fileLine.nextToken().substring(0,
            fileLine.nextToken().indexOf('.')));
    temp = fileLine.nextToken(); //Handle attributes not required
    temp = fileLine.nextToken(); //Handle attributes not required
}
I believe this happens because the nextToken() call inside substring's arguments is confusing the StringTokenizer. So I fixed it like this:
while (fileLine.hasMoreTokens())
{
    temp = fileLine.nextToken();
    oneNumber = Integer.valueOf(temp.substring(0, temp.indexOf('.')));
    twoString = fileLine.nextToken();
    temp = fileLine.nextToken();
    threeNumber = Short.valueOf(temp.substring(0, temp.indexOf('.')));
    temp = fileLine.nextToken();
    temp = fileLine.nextToken();
}
While this works it feels a bit redundant. Is there something I can try to make this cleaner, while retaining use of the StringTokenizer?

This is the intended behavior of .nextToken(): it returns the token and moves past the current token. When you use Integer.valueOf(fileLine.nextToken().substring(0, fileLine.nextToken().indexOf('.'))), you are calling .nextToken() twice, which means you are dealing with two distinct tokens. It has nothing to do with how String#substring works. You need to store the token in a variable if you need to perform additional operations on it. This exact same problem can also be caused by using BufferedReader#readLine twice when one should be storing the value.
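To see the double consumption concretely, the sketch below (class and method names are illustrative) runs the question's one-liner on the sample line from the file. Java evaluates the receiver fileLine.nextToken() before the argument, so substring is called on "294." with the index of '.' in "ABC123", which is -1, and the tokenizer has already moved on by two tokens.

```java
import java.util.StringTokenizer;

public class NextTokenDemo {
    // Runs the question's one-liner, then returns the token the tokenizer hands out next.
    static String tokenAfterDoubleCall(String line) {
        StringTokenizer fileLine = new StringTokenizer(line);
        try {
            // receiver consumes "294.", the argument consumes "ABC123" (indexOf('.') == -1),
            // so this is "294.".substring(0, -1), which throws
            Integer.valueOf(fileLine.nextToken().substring(0, fileLine.nextToken().indexOf('.')));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught " + e.getClass().getSimpleName());
        }
        // two tokens are already gone, so this returns the third one
        return fileLine.nextToken();
    }

    public static void main(String[] args) {
        System.out.println(tokenAfterDoubleCall("294. ABC123 66. .00 .00")); // 66.
    }
}
```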

Yup. nextToken() is stateful, calling it changes things, so using it twice in a single line would consume two tokens.
Your second snippet seems much easier to read to me, so I'm not sure what the problem is. Presumably you want your code to be more readable.
An easy fix is to make helper methods:
while (fileLine.hasMoreTokens()) {
    oneNumber = fetchHeadingNumber(fileLine);
    twoString = fileLine.nextToken();
    threeNumber = fetchHeadingNumber(fileLine);
    fileLine.nextToken(); // no need to assign it.
    fileLine.nextToken();
}
With this method:
int fetchHeadingNumber(StringTokenizer t) {
    String token = t.nextToken();
    return Integer.parseInt(token.substring(0, token.indexOf('.')));
}
You can go even further and make a class representing a line, which has all the code needed to parse it (I made up names; your snippet doesn't make clear what kind of thing the line represents):
@lombok.Value class InventoryItem {
    int warehouse;
    String name;
    int shelf;

    public static InventoryItem read(StringTokenizer tokenizer) {
        int warehouse = num(tokenizer);
        String name = tokenizer.nextToken();
        int shelf = num(tokenizer);
        tokenizer.nextToken();
        tokenizer.nextToken();
        return new InventoryItem(warehouse, name, shelf);
    }

    private static int num(StringTokenizer t) {
        String token = t.nextToken();
        return Integer.parseInt(token.substring(0, token.indexOf('.')));
    }
}
and then reading a line and retrieving, say, the location where it is stored is so much nicer: Now things actually have names!
InventoryItem item = InventoryItem.read(fileLine);
System.out.println("This item is in warehouse " + item.getWarehouse());
NB: Uses Lombok's @Value to avoid putting a lot of boilerplate in this answer.
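If Lombok isn't available, roughly the same class can be written in plain Java. This sketch keeps the answer's made-up names, spells out the constructor and getters that @Value would generate (Lombok also generates equals, hashCode and toString, omitted here), and runs it on the sample line from the question.

```java
import java.util.StringTokenizer;

public class InventoryItemDemo {
    // Plain-Java stand-in for the @lombok.Value class (field names are still invented)
    static final class InventoryItem {
        private final int warehouse;
        private final String name;
        private final int shelf;

        InventoryItem(int warehouse, String name, int shelf) {
            this.warehouse = warehouse;
            this.name = name;
            this.shelf = shelf;
        }

        int getWarehouse() { return warehouse; }
        String getName() { return name; }
        int getShelf() { return shelf; }

        static InventoryItem read(StringTokenizer tokenizer) {
            int warehouse = num(tokenizer);
            String name = tokenizer.nextToken();
            int shelf = num(tokenizer);
            tokenizer.nextToken(); // skip the first decimal
            tokenizer.nextToken(); // skip the second decimal
            return new InventoryItem(warehouse, name, shelf);
        }

        // "294." -> 294: strip everything from the '.' onwards, then parse
        private static int num(StringTokenizer t) {
            String token = t.nextToken();
            return Integer.parseInt(token.substring(0, token.indexOf('.')));
        }
    }

    public static void main(String[] args) {
        StringTokenizer fileLine = new StringTokenizer("294. ABC123 66. .00 .00");
        InventoryItem item = InventoryItem.read(fileLine);
        System.out.println("This item is in warehouse " + item.getWarehouse()); // ... 294
    }
}
```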

recursion moving char to the end of the string

I need to take a string and a char, and use recursion to move every occurrence of that char to the end of the string,
like "Hello world!", 'l' => "Heo word!lll"
I have problems understanding the recursive way of thinking,
so I started with this:
public static String ChToLast (String str, char ch){
if(str.indexOf(ch)== -1){
return str;
}else{
if(str.indexOf(0) == ch){
return str;
}
}
Thank you for your help :)

Recursion is the practice of calling a method from inside itself. In this case, I will provide a solution to explain what happens:
public static String chrToLast(String str, char ch) {
//This if statement details the end condition
if(str.length() < 1) {
return "";
}
String newString = str.substring(1); //Create new string without first character
if(str.indexOf(ch) == 0) { //This happens when your character is found
return chrToLast(newString, ch) + ch;
} else { //This happens with all other characters
return str.charAt(0) + chrToLast(newString, ch);
}
}
If you execute:
chrToLast("Hello, World!", 'l')
This will result in the desired result: Heo, Word!lll
Process
In general, this method works by checking which character is currently the first in the given string, and then deciding what to do. If the first character is the same as the one you're looking for (l), it removes that character from the string and calls chrToLast on the new string, but it also adds the character it found to the end of the result by using + ch. It continues to do this until there are no more characters left, which is what the end condition is for.
The end condition
The end condition returns an empty string "" because that is what is called the base case of the algorithm. You can think of a recursive algorithm as something that solves a problem by calling itself a number of times. By calling themselves, recursive algorithms move towards a base case. In this particular case, it does that by removing one character from the string each time the method is executed. Once there are no characters left, it reaches the base case "", where the string is empty and no more characters can be removed. (Hence it returns an empty string as its final state.)
I hope this answers your question. It's important to understand this concept, as it is very powerful. Try to study the code and comment if something's not clear.
It can also help to execute this code in an IDE and use the debugger to walk through its execution. You can then see for yourself what the flow of the program is, and see the values of the variables in play.

If you use recursion, it will be a fairly expensive way to get the result you expect: each call creates a new substring, so there is a lot of copying of String or char array elements either way you do it. I don't see it as the wiser choice. I would do it this way, which uses O(n) extra space (two char arrays of length n) and O(n) time.
import java.util.Arrays;

public class Solve {
    public static void main(String[] args) {
        System.out.println(ChToLast("Hello world!", 'l'));
    }

    public static String ChToLast(String str, char ch) {
        char[] chars = str.toCharArray();
        char[] modChars = new char[chars.length];
        int i = 0;
        // copy every character that is not ch, keeping the original order
        for (char element : chars) {
            if (ch != element) {
                modChars[i++] = element;
            }
        }
        // the remaining slots correspond exactly to the removed occurrences of ch
        Arrays.fill(modChars, i, chars.length, ch);
        return new String(modChars);
    }
}

If you use a while loop and write a method that checks whether the rearranged string forms a meaningful statement, that may work for you.
Here you would need some help from NLP (natural language processing) to check each time whether the arranged chars form a grammatically correct statement.
This will help

Using a user inputted string of characters find the longest word that can be made

Basically, I want to create a program which simulates the 'Countdown' game on Channel 4. In effect, a user must input 9 letters and the program will search for the longest word in the dictionary that can be made from these letters. I think a tree structure would be a better choice than hash tables. I already have a file which contains the words in the dictionary and will be using file IO.
This is my file IO class:
public static void main(String[] args){
FileIO reader = new FileIO();
String[] contents = reader.load("dictionary.txt");
}
This is what I have so far in my Countdown class
public static void main(String[] args) throws IOException {
    Scanner scan = new Scanner(System.in);
    String letters = scan.nextLine();
}
I get totally lost from here. I know this is only the start, but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm new to Java, found this question in an interview book and thought I should give it a go.
Thanks in advance

Welcome to the world of Java :)
The first thing I see is that you have two main methods; you don't actually need that. In most cases your program has a single entry point, which then does all its logic and handles user input and everything else.
You're thinking of a tree structure which is good, though there might be a better idea to store this. Try this: http://en.wikipedia.org/wiki/Trie
What your program has to do is read all the words from the file line by line, and in this process build your data structure, the tree. When that's done you can ask the user for input and after the input is entered you can search the tree.
Since you asked specifically not to provide answers I won't put code here, but feel free to ask if you're unclear about something

There are only about 800,000 words in the English language, so an efficient solution would be to store those 800,000 words as 800,000 arrays of 26 1-byte integers that count how many times each letter is used in the word, and then for an input 9 characters you convert to similar 26 integer count format for the query, and then a word can be formed from the query letters if the query vector is greater than or equal to the word-vector component-wise. You could easily process on the order of 100 queries per second this way.
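A sketch of the letter-count idea in Java, under stated assumptions: words and input are lowercase a-z, and for brevity each word's 26 counts are computed on the fly rather than precomputed once as the answer suggests (storing an int[] per word is a straightforward change). A word can be formed when none of its counts exceeds the corresponding count in the query. The class and method names are made up.

```java
import java.util.List;

public class LetterCounts {
    // 26 counts, one per letter a..z (assumes lowercase a-z only)
    static int[] counts(String word) {
        int[] c = new int[26];
        for (char ch : word.toCharArray()) {
            c[ch - 'a']++;
        }
        return c;
    }

    // component-wise comparison: the query must have at least as many of each letter
    static boolean canForm(int[] word, int[] available) {
        for (int i = 0; i < 26; i++) {
            if (word[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    static String longestWord(List<String> dictionary, String letters) {
        int[] available = counts(letters);
        String best = "";
        for (String w : dictionary) {
            if (w.length() > best.length() && canForm(counts(w), available)) {
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("ban", "banal", "banana", "ball"); // tiny stand-in dictionary
        System.out.println(longestWord(dict, "bananaxyz")); // banana
    }
}
```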

I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.
When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check it against the dictionary. Try to write it in a way that means you can re-use the same code for the three-letter words.
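A minimal sketch of that recursive pick-a-letter approach, assuming a small in-memory Set<String> as a stand-in dictionary (the class and method names are made up). Each level of recursion picks one of the remaining tiles, so a letter is never used twice. For nine letters this visits just under a million arrangements in total across all lengths, which is manageable, but the count grows factorially with more letters.

```java
import java.util.Set;
import java.util.TreeSet;

public class CountdownSearch {
    // Collect every dictionary word of exactly `length` letters that can be
    // spelled from `letters`, using each tile at most once.
    static Set<String> wordsOfLength(String letters, int length, Set<String> dict) {
        Set<String> found = new TreeSet<>(); // sorted, duplicates removed
        build("", letters, length, dict, found);
        return found;
    }

    // Recursive step: pick one letter from `remaining`, append it to `prefix`,
    // and recurse with that letter removed from the pool.
    private static void build(String prefix, String remaining, int length,
                              Set<String> dict, Set<String> found) {
        if (prefix.length() == length) {
            if (dict.contains(prefix)) {
                found.add(prefix);
            }
            return;
        }
        for (int i = 0; i < remaining.length(); i++) {
            String rest = remaining.substring(0, i) + remaining.substring(i + 1);
            build(prefix + remaining.charAt(i), rest, length, dict, found);
        }
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("at", "cat", "act", "tact");
        String letters = "catx";
        // two-letter words first, then three-letter words, and so on
        for (int len = 2; len <= letters.length(); len++) {
            System.out.println(len + " letters: " + wordsOfLength(letters, len, dict));
        }
    }
}
```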

I believe, the power of Regular Expressions would come in handy in your case:
1) Create a regular expression string with a character class like /^[abcdefghi]*$/, with your letters inside instead of "abcdefghi".
2) Use that regular expression as a filter to get a strings array from your text file.
3) Sort it by length. The longest word is what you need!
Check the Regular Expressions Reference for more information.
UPD: Here is a good Java Regex Tutorial.
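A sketch of the regex filter in Java, assuming the letters are lowercase a-z so they can be pasted directly into a character class (class and method names are made up). One limitation worth knowing: a character class only checks which letters appear, not how many times, so with the letters "ab" it also accepts "abba". For Countdown, where each tile can be used once, the matches would still need a letter-count check.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class RegexFilter {
    // Keep only words made entirely of the given letters, then take the longest.
    static Optional<String> longestMatch(List<String> words, String letters) {
        Pattern p = Pattern.compile("^[" + letters + "]*$"); // e.g. ^[abcdef]*$
        return words.stream()
                .filter(w -> p.matcher(w).matches())
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> words = List.of("bad", "bead", "faced", "xyz");
        System.out.println(longestMatch(words, "abcdef").orElse("(none)")); // faced
    }
}
```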

A first approach could be using a tree with all the letters present in the wordlist.
If a node is the end of a word, then it is marked as an end-of-word node.
In the picture above, the longest word is banana. But there are other words, like ball, ban, or banal.
So, a node must have:
A character
If it is the end of a word
A list of children. (max 26)
The insertion algorithm is very simple: In each step we "cut" the first character of the word until the word has no more characters.
public class TreeNode {
public char c;
private boolean isEndOfWord = false;
private TreeNode[] children = new TreeNode[26];
public TreeNode(char c) {
this.c = c;
}
public void put(String s) {
if (s.isEmpty())
{
this.isEndOfWord = true;
return;
}
char first = s.charAt(0);
int pos = position(first);
if (this.children[pos] == null)
this.children[pos] = new TreeNode(first);
this.children[pos].put(s.substring(1));
}
public String search(char[] letters) {
String word = "";
String w = "";
for (int i = 0; i < letters.length; i++)
{
TreeNode child = children[position(letters[i])];
if (child != null)
w = child.search(letters);
//this is not efficient. It should be optimized.
if (w.contains("%")
&& w.substring(0, w.lastIndexOf("%")).length() > word
.length())
word = w;
}
// if a node its end-of-word we add the special char '%'
return c + (this.isEndOfWord ? "%" : "") + word;
}
//if 'a' returns 0, if 'b' returns 1...etc
public static int position(char c) {
return ((byte) c) - 97;
}
}
Example:
public static void main(String[] args) {
//root
TreeNode t = new TreeNode('R');
//for skipping words with "'" in the wordlist
Pattern p = Pattern.compile(".*\\W+.*");
int nw = 0;
try (BufferedReader br = new BufferedReader(new FileReader(
"files/wordsEn.txt")))
{
for (String line; (line = br.readLine()) != null;)
{
if (p.matcher(line).find())
continue;
t.put(line);
nw++;
}
// br is closed automatically by the try-with-resources statement
System.out.println("number of words : " + nw);
String res = null;
// substring (1) because of the root
res = t.search("vuetsrcanoli".toCharArray()).substring(1);
System.out.println(res.replace("%", ""));
}
catch (Exception e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Output:
number of words : 109563
counterrevolutionaries
Notes:
The wordlist is taken from here
The reading part is based on another SO question: How to read a large text file line by line using Java?
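One thing to be aware of with the search above: it passes the full letters array down to every child, so a letter can be reused any number of times, which is how a 12-letter input yields a 22-letter word. A sketch of a search that consumes letters, using per-letter counts and backtracking, might look like this. It assumes lowercase a-z words and is not the answerer's code; the class and method names are made up.

```java
public class LetterTrie {
    private final LetterTrie[] children = new LetterTrie[26];
    private boolean isEndOfWord;

    // insert a lowercase a-z word
    void put(String s) {
        LetterTrie node = this;
        for (char ch : s.toCharArray()) {
            int pos = ch - 'a';
            if (node.children[pos] == null) {
                node.children[pos] = new LetterTrie();
            }
            node = node.children[pos];
        }
        node.isEndOfWord = true;
    }

    // Longest word below this node spellable with the remaining letter counts.
    // Counts are decremented on the way down and restored on the way back
    // (backtracking), so each tile is used at most once per word.
    String longest(String prefix, int[] counts) {
        String best = isEndOfWord ? prefix : "";
        for (int i = 0; i < 26; i++) {
            if (children[i] != null && counts[i] > 0) {
                counts[i]--;
                String candidate = children[i].longest(prefix + (char) ('a' + i), counts);
                counts[i]++;
                if (candidate.length() > best.length()) {
                    best = candidate;
                }
            }
        }
        return best;
    }

    static String longestFrom(String[] words, String letters) {
        LetterTrie root = new LetterTrie();
        for (String w : words) {
            root.put(w);
        }
        int[] counts = new int[26];
        for (char ch : letters.toCharArray()) {
            counts[ch - 'a']++;
        }
        return root.longest("", counts);
    }

    public static void main(String[] args) {
        String[] words = {"ball", "ban", "banal", "banana"};
        System.out.println(longestFrom(words, "bnaal")); // banal (banana needs three a's)
    }
}
```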

Removing duplicate chars from a string passed as a parameter

I am a little confused about how to approach this problem. The userKeyword is passed as a parameter from a previous section of the code. My task is to remove any duplicate chars from the inputted keyword (whatever it is). We have just finished while loops in class, so some hints regarding these would be appreciated.
private String removeDuplicates(String userKeyword){
String first = userKeyword;
int i = 0;
while(i < first.length())
{
if (second.indexOf(first.charAt(i)) > -1){
}
i++;
return "";
Here's an update of what I have tried so far - sorry about that.

This is the perfect place to use java.util.Set, a construct which is designed to hold unique elements. By trying to add each word to a set, you can check if you've seen it before, like so:
import java.util.HashSet;
import java.util.Set;

static String removeDuplicates(final String str)
{
    final Set<String> uniqueWords = new HashSet<>();
    final String[] words = str.split(" ");
    final StringBuilder newSentence = new StringBuilder();
    for (int i = 0; i < words.length; i++)
    {
        if (uniqueWords.add(words[i]))
        {
            //Word is unique
            if (newSentence.length() > 0)
            {
                //Add the space back in before every word except the first
                newSentence.append(" ");
            }
            newSentence.append(words[i]);
        }
    }
    return newSentence.toString();
}
public static void main(String[] args)
{
    final String str = "Words words words I love words words WORDS!";
    System.out.println(removeDuplicates(str)); //Words words I love WORDS!
}
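The question asks about duplicate characters rather than duplicate words; the same Set.add trick works per character. A sketch (the class name is made up) that keeps the first occurrence of each char, in order:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateChars {
    static String removeDuplicateChars(String str) {
        Set<Character> seen = new HashSet<>();
        StringBuilder result = new StringBuilder();
        for (char c : str.toCharArray()) {
            // add() returns false if c was already in the set
            if (seen.add(c)) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateChars("programming")); // progamin
    }
}
```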

Have a look at this answer.
You might not understand this, but it does the job (it cleverly uses a HashSet that doesn't allow duplicate values).
I think your teacher might be looking for a solution using loops however - take a look at William Morisson's answer for this.
Good luck!

For future reference, StackOverflow normally requires you to post what you have, and ask for suggestions for improvement.
As it's not an active day and I'm bored, I've done this for you. This code is pretty efficient and uses no advanced data structures. I did this so you could understand it more easily.
Please do try to understand what I'm doing. Learning is what StackOverflow is for.
I've added comments in the code to assist you in learning.
private String removeDuplicates(String keyword){
    //stores whether a character has been encountered before
    //(one slot for every char value from 0 to Character.MAX_VALUE inclusive)
    //a hashset would likely use less memory.
    boolean[] usedValues = new boolean[Character.MAX_VALUE + 1];
    //Look into using a StringBuilder. Using += operator with strings
    //is potentially wasteful.
    String output = "";
    //looping over every character in the keyword...
    for(int i=0; i<keyword.length(); i++){
        char charAt = keyword.charAt(i);
        //characters are just numbers. if the value in usedValues array
        //is true for this char's number, we've seen this char.
        boolean shouldRemove = usedValues[charAt];
        if(!shouldRemove){
            output += charAt;
            //now this character has been used in output. Mark that in
            //usedValues array
            usedValues[charAt] = true;
        }
    }
    return output;
}
Example:
//output will be the alphabet.
System.out.println(removeDuplicates(
"aaaabcdefghijklmnopqrssssssstuvwxyyyyxyyyz"));
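Since the question mentions while loops and the attempt already uses indexOf, here is a loop-only sketch in that style (the method name comes from the question; the class name is made up): build the result one char at a time and only append a char if result.indexOf(c) is -1. It is O(n²) in the worst case, which is fine for short keywords.

```java
public class WhileLoopDedup {
    static String removeDuplicates(String userKeyword) {
        String result = "";
        int i = 0;
        while (i < userKeyword.length()) {
            char c = userKeyword.charAt(i);
            // keep c only if it has not been added to the result yet
            if (result.indexOf(c) == -1) {
                result += c;
            }
            i++;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("programming")); // progamin
    }
}
```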

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
