Tricky Java problem to split and manipulate String

I need to create a function translate_string() in Java
that "translates" a string letter by letter.
The function takes a String as its argument and returns a String[] array:
public static String[] translate_string(String string)
I am given an existing function translate_letter(),
which takes a String as its argument and returns a String[] array:
public static String[] translate_letter(String letter)
It translates a single letter, and I need to translate
the whole string, letter by letter, into every combination of output strings.
The complication is that one character can be translated into multiple
sets of characters. The relationship between
input string and output string is not one-to-one but one-to-many:
one input letter can produce multiple combinations
(i.e. an array) as output.
N.B.: The term "translation" has nothing to do with actual translation from
one language to another; it is just the substitution of one character
with a set of other characters.
Below is the code of a simplified version of the function
translate_letter() (no modification to it is required):
//------------------------------------------------------------
// translate a letter - return an array of its possible translations
// (requires import java.util.ArrayList)
public static String[] translate_letter(String s) {
ArrayList<String> o = new ArrayList<>();
if (s.equals("a")) {
o.add("a1");
} else if (s.equals("b")) {
o.add("b1"); o.add("b2");
} else if (s.equals("c")) {
o.add("c1"); o.add("c2"); o.add("c3");
} else if (s.equals("d")) {
o.add("d1"); o.add("d2"); o.add("d3"); o.add("d4");
} else {
o.add(s); // default: no translation known, keep the letter as-is
}
// Convert ArrayList o to a String[] array
return o.toArray(new String[0]);
} // end translate_letter()
//------------------------------------------------------------
So, how do I translate the whole string?
Let's first look at a simple version of the translation, where
translate_letter() returns just one string.
Say letter "a" is translated to "a1", letter "b"
to "b1", and "c" to "c1".
Input string "a" is translated to "a1", simple.
Input string "ab" is translated to "a1b1", simple.
Input string "abc" is translated to "a1b1c1", simple.
I don't need to write this simple version; there is nothing to it:
just split the input string and translate it letter by letter.
What I want to write (and cannot) is the complicated version of translate_string(),
where translate_letter() returns multiple combinations,
i.e. an array of output combinations.
For example, according to the code of translate_letter() above,
letter "a" is translated to "a1", simple.
But letter "b" is translated by translate_letter() into 2 combinations,
"b1" and "b2",
so the output is String[] array = { "b1", "b2" }.
So string "a" should be translated by translate_string() into an array of
just 1 element: R[] = { "a1" }
String "ab" should be translated into an array of 2 elements:
R[] = { "a1b1", "a1b2" }
String "abc" should be translated into:
R[] = { "a1b1c1", "a1b1c2", "a1b1c3", "a1b2c1", "a1b2c2", "a1b2c3" }
String "db" should be translated into:
R[] = { "d1b1", "d2b1", "d3b1", "d4b1", "d1b2", "d2b2", "d3b2", "d4b2" }
This task is more complicated than it looks at first
sight. I have tried two approaches and failed with both: plain arrays
and an array of ArrayLists. I cannot loop through two arrays (with different
indexes) at once, and I need some outside help or ideas on how to accomplish this.

You need a recursive translateString method (named in camel case without any underscore, according to Java naming conventions). The algorithm is:
1. If the string is the empty string, return an array of one element, the empty string.
2. From translateLetter(), obtain all possible translations of the first letter.
3. From a recursive call to translateString(), obtain all possible translations of the remainder of the string after the first letter. In other words: just call translateString(), passing the part of the string that comes after the first character as the argument.
4. In two nested loops, concatenate each possible translation of the first letter with each possible translation of the remainder of the string. One loop iterates over the possible translations of the first letter obtained in step 2. The other loop iterates over the possible translations of the remainder of the string obtained in step 3. Add the concatenated strings to an ArrayList<String>.
5. Convert the list to an array and return it.
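Following these five steps, a minimal sketch could look like this. Translator is a hypothetical class name, and translateLetter() here is simply the question's translate_letter() mapping rewritten with a switch and renamed to camel case.

```java
import java.util.ArrayList;
import java.util.List;

public class Translator {
    // Same mapping as the question's translate_letter(), in camel case.
    static String[] translateLetter(String s) {
        switch (s) {
            case "a": return new String[] {"a1"};
            case "b": return new String[] {"b1", "b2"};
            case "c": return new String[] {"c1", "c2", "c3"};
            case "d": return new String[] {"d1", "d2", "d3", "d4"};
            default:  return new String[] {s}; // unknown letter: keep as-is
        }
    }

    static String[] translateString(String s) {
        // Step 1: the empty string has exactly one translation, itself
        if (s.isEmpty()) {
            return new String[] {""};
        }
        // Step 2: all translations of the first letter
        String[] firstTranslations = translateLetter(s.substring(0, 1));
        // Step 3: all translations of the remainder, by recursion
        String[] restTranslations = translateString(s.substring(1));
        // Step 4: combine every first-letter translation with every remainder translation
        List<String> result = new ArrayList<>();
        for (String first : firstTranslations) {
            for (String rest : restTranslations) {
                result.add(first + rest);
            }
        }
        // Step 5: convert the list to an array
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", translateString("abc")));
    }
}
```

For "abc" this produces exactly the six strings listed in the question, in the same order. For "db" it yields the same eight strings, but ordered d1b1, d1b2, d2b1, ..., because the first letter's translations drive the outer loop. s.substring(0, 1) assumes each letter is a single char (no surrogate pairs), which holds for these inputs.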

This is one of the rare occasions I'll answer homework, though Ole V.V. is right.
When someone else does the homework, the learning effect is limited, and it is not fair to those who struggled with it.
// requires: import java.util.LinkedHashSet; import java.util.Set;
public static String[] translate_string(String s) {
// Start with a single empty translation, so the first letter has something to extend
Set<String> translations = new LinkedHashSet<>(); // keeps insertion order
translations.add("");
s.codePoints().forEach(cp -> {
String letter = new String(new int[] {cp}, 0, 1);
String[] more = translate_letter(letter);
if (more.length == 0) {
return; // no translation for this letter: skip it
}
// N prior translations x L letter translations = N*L new translations
Set<String> priorTranslations = new LinkedHashSet<>(translations);
translations.clear();
for (String prior : priorTranslations) {
for (String letterTranslation : more) {
translations.add(prior + letterTranslation);
}
}
});
return translations.toArray(new String[0]);
}
The trick is: if you have N translations so far and the next letter has L translations, the result is N×L translations.
Ole V.V.'s answer explains the idea better.

Related

Java extracting substring from sentences

There are combinations of words like "is", "is not" and "does not contain". We have to find these words in a sentence and split the sentence after them.
Input: if name is tom and age is not 45 or name does not contain tom then let me know.
Expected output:
If name is
tom and age is not
45 or name does not contain
tom then let me know
I tried the code below to split and extract, but "is" also occurs inside "is not", and my code cannot tell them apart:
public static void loadOperators(){
operators.add("is");
operators.add("is not");
operators.add("does not contain");
}
public static void main(String[] args) {
loadOperators();
for(String s : operators){
System.out.println(str.split(s).length - 1);
}
}

Since a word can occur several times, and "is" and "is not" are different operators for you, split() won't solve your use case. Ideally you would iterate:
1. Find the index of the operator.
2. Search for the next space or word.
3. Then update your string to the substring from that index to the end.
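The index-based iteration described above can be sketched like this. It is a minimal sketch under a few assumptions: words are separated by single spaces, operators are plain text, and when two operators start at the same position the longer one wins. OperatorSplitter, splitAfterOperators and indexOfWholeWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class OperatorSplitter {
    // Splits the input just after each operator, scanning from left to right.
    static List<String> splitAfterOperators(String input, List<String> operators) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (true) {
            // 1. Find the earliest operator at or after 'start'. On a tie, prefer the
            //    longer operator, so "is not" wins over "is" at the same position.
            int bestIndex = -1;
            String bestOp = null;
            for (String op : operators) {
                int idx = indexOfWholeWords(input, op, start);
                if (idx >= 0 && (bestIndex < 0 || idx < bestIndex
                        || (idx == bestIndex && op.length() > bestOp.length()))) {
                    bestIndex = idx;
                    bestOp = op;
                }
            }
            if (bestIndex < 0) {
                break; // no operator left
            }
            // 2./3. Keep everything up to the end of the operator, then continue after it
            int end = bestIndex + bestOp.length();
            parts.add(input.substring(start, end).trim());
            start = end;
        }
        parts.add(input.substring(start).trim()); // the rest of the sentence
        return parts;
    }

    // Like indexOf, but only accepts matches bounded by spaces or the string ends,
    // so "is" is not found inside a word such as "this".
    static int indexOfWholeWords(String s, String op, int from) {
        int idx = s.indexOf(op, from);
        while (idx >= 0) {
            int end = idx + op.length();
            boolean startOk = idx == 0 || s.charAt(idx - 1) == ' ';
            boolean endOk = end == s.length() || s.charAt(end) == ' ';
            if (startOk && endOk) {
                return idx;
            }
            idx = s.indexOf(op, idx + 1);
        }
        return -1;
    }
}
```

Called with the question's sentence and the operators "is", "is not" and "does not contain", this returns the four expected parts, with "tom then let me know." (including the period) as the last one.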

I am not entirely sure what you are trying to achieve, but let's give it a shot.
For your case, a simple workaround might work just fine:
Sort the operators by length, descending. This way the "largest" match is found first. You can define "largest" either literally as the longest string or, preferably, as the number of words (number of spaces contained), so "is a" takes precedence over "contains".
You'll need to make sure that no matches overlap, though. You can do this by comparing all matches' start and end indices and discarding overlaps by some criterion, such as "first match wins".
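A minimal sketch of the longest-first idea, assuming the operators are plain text: sort them by length (characters rather than words, for simplicity), join them into one regex alternation, and let Matcher.find() walk the input. A regex alternation tries its alternatives from left to right, and find() never returns overlapping matches, so the overlap handling comes for free. LongestFirstSplit and split() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestFirstSplit {
    static List<String> split(String input, List<String> operators) {
        // Sort longest first, so "is not" is tried before "is"
        List<String> sorted = new ArrayList<>(operators);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder alternation = new StringBuilder();
        for (String op : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(op)); // match the operator text literally
        }
        // \b keeps "is" from matching inside words such as "this"
        Matcher m = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(input);

        List<String> parts = new ArrayList<>();
        int start = 0;
        while (m.find()) { // left to right, non-overlapping matches
            parts.add(input.substring(start, m.end()).trim());
            start = m.end();
        }
        parts.add(input.substring(start).trim()); // rest of the sentence
        return parts;
    }
}
```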

This code does what you seem to want (or what I guessed you want):
public static void main(String[] args) {
List<String> operators = new ArrayList<>();
operators.add("is");
operators.add("is not");
operators.add("does not contain");
String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
List<String> output = new ArrayList<>();
int lastFoundOperatorsEndIndex = 0; // First start at the beginning of input
for (String operator : operators){
int indexOfOperator = input.indexOf(operator, lastFoundOperatorsEndIndex); // Find current operator's position after the previous match
if (indexOfOperator > -1) { // If operator was found
int thisOperatorsEndIndex = indexOfOperator + operator.length(); // Add the operator's length to the index to include the operator
output.add(input.substring(lastFoundOperatorsEndIndex, thisOperatorsEndIndex).trim()); // Add text up to and including the operator (trimming surrounding spaces)
lastFoundOperatorsEndIndex = thisOperatorsEndIndex; // Update startindex for next operator
}
}
output.add(input.substring(lastFoundOperatorsEndIndex, input.length()).trim()); // Add rest of input as last entry to output
for (String part : output) { // Output to console
System.out.println(part);
}
}
But it is highly dependent on the order of the sentence and of the operators. If we're talking about user input, the task will be much more complicated.
A better method using regular expressions (regex) would be:
public static void main(String... args) {
// Define inputs
String input1 = "if name is tom and age is not 45 or name does not contain tom then let me know.";
String input2 = "the name is tom and he is 22 years old but the name does not contain jack, but merry is 24 year old.";
// Output split strings
for (String part : split(input1)) {
System.out.println(part.trim());
}
System.out.println();
for (String part : split(input2)) {
System.out.println(part.trim());
}
}
private static String[] split(String input) {
// Define list of operators - 'is not' has to precede 'is'!!
String[] operators = { "\\sis not\\s", "\\sis\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
// Concatenate operators to regExp-String for search
StringBuilder searchString = new StringBuilder();
for (String operator : operators) {
if (searchString.length() > 0) {
searchString.append("|");
}
searchString.append(operator);
}
// Replace all operators by operator+\n and split resulting string at \n-character
return input.replaceAll("(" + searchString.toString() + ")", "$1\n").split("\n");
}
Notice the order of the operators! "is not" has to come before "is", or every "is not" will be split right after "is".
You can prevent this by using a negative lookahead for the operator "is".
So "\\sis\\s" would become "\\sis(?! not)\\s" (read as: "is", not followed by " not").
A minimalist version (Java 8+, since String.join() was added in Java 8) could look like this:
private static String[] split(String input) {
String[] operators = { "\\sis(?! not)\\s", "\\sis not\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
return input.replaceAll("(" + String.join("|", operators) + ")", "$1\n").split("\n");
}

Generate new word from wildcard [duplicate]

This question already has answers here:
Returning a list of wildcard matches from a HashMap in java
(3 answers)
Closed 7 years ago.
I'm trying to generate a word from a pattern with wildcards and check whether this word is stored in the dictionary database. For example, "appl*" should return apply or apple. However, the problem comes when I have 2 wildcards: "app**" generates words like appaa, appbb, ..., appzz instead of apple. The second if condition is just for a regular string that contains no wildcard "*".
public static boolean printWords(String s) {
String tempString, tempChar;
if (s.contains("*")) {
for (char c = 'a'; c <= 'z'; c++) {
tempChar = Character.toString(c);
tempString = s.replace("*", tempChar);
if (myDictionary.containsKey(tempString) == true) {
System.out.println(tempString);
}
}
}
if (myDictionary.containsKey(s) == true) {
System.out.println(s);
return true;
} else {
return false;
}
}

You're only using a single for loop over characters, and replacing all instances of * with that character (see the API documentation for String.replace). So it's no surprise that you're getting strings like appaa, appbb, etc.
If you want to actually use regular expressions, then you shouldn't be doing any String.replace or contains calls like this. See Anubian's answer for how to handle your problem that way.
If you're treating this as a String exercise and don't want to use regular expressions, the easiest way to do what you're actually trying to do (try all combinations of letters for each wildcard) is recursion. If there are no wildcards left in the string, check whether it is a word and, if so, print it. If there are wildcards, try each replacement of the first wildcard with a character, and recursively call the function on the resulting string.
public static void printWords(String s) {
int firstAsterisk = s.indexOf("*");
if (firstAsterisk == -1) { // no asterisk left: check the finished word
if (myDictionary.containsKey(s)) {
System.out.println(s);
}
return;
}
for (char c = 'a'; c <= 'z'; c++) {
// replace only the first asterisk, then recurse for the remaining ones
String s2 = s.substring(0, firstAsterisk) + c + s.substring(firstAsterisk + 1);
printWords(s2);
}
}
The base case relies on the indexOf function: when indexOf returns -1, the given substring (in our case "*") does not occur in the string, so there are no more wildcards to replace.
The substring part recreates the original string with the first asterisk replaced by a character. Suppose s = "abcd**ef" and c = 'z'; then firstAsterisk = 4 (strings are 0-indexed, and index 4 holds the first "*"). Thus,
String s2 = s.substring(0, firstAsterisk) + c + s.substring(firstAsterisk + 1);
= "abcd" + 'z' + "*ef"
= "abcdz*ef"

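You can let a regular expression do the matching, but note that in a regex * is not a single-character wildcard: it means "zero or more of the preceding element" (so "appl*" matches "app", "appl", "appll", ..., but not "apple"). The regex wildcard for exactly one character is ., so convert the input first and then treat it as a regular expression:
String regex = s.replace('*', '.');
for (String word : myDictionary.keySet()) { // myDictionary is a Map in the question
if (word.matches(regex)) {
System.out.println(word);
}
}
Let the libraries do the heavy lifting for you ;)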

With your approach you have to check all possible combinations.
The better way is to make a regex out of your input string: replace every * with . (a dot, which matches any single character in a regex).
Then you can loop over your myDictionary and check for every entry whether it matches the regex.
Something like this:
Set<String> dict = new HashSet<String>();
dict.add("apple");
String word = "app**";
Pattern pattern = Pattern.compile(word.replace('*', '.'));
for (String entry : dict) {
if (pattern.matcher(entry).matches()) {
System.out.println("matches: " + entry);
}
}
Take care if your input string already contains a . (or any other special regex character): you have to escape it with a backslash (written "\\." in a Java string literal) or quote it with Pattern.quote().
See also
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html and
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html
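One way to handle the escaping is to quote every literal run with Pattern.quote() and emit . only for each *. A minimal sketch, assuming * stands for exactly one character as in the question; WildcardMatch, wildcardToPattern and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardMatch {
    // Builds a regex in which each '*' matches exactly one character and
    // all other characters are matched literally.
    static Pattern wildcardToPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString())); // escape ., +, ( etc.
                    literal.setLength(0);
                }
                regex.append('.'); // one arbitrary character
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns every dictionary word that the wildcard pattern matches completely.
    static List<String> matches(String wildcard, Iterable<String> dictionary) {
        Pattern pattern = wildcardToPattern(wildcard);
        List<String> found = new ArrayList<>();
        for (String word : dictionary) {
            if (pattern.matcher(word).matches()) {
                found.add(word);
            }
        }
        return found;
    }
}
```

With the question's map, the call would be matches(s, myDictionary.keySet()).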

Java String.contains() to take care of natural numbers

I'm a computer science student learning Java, and as an exercise we're writing a permutation algorithm.
Now I'm stuck at a point where I need to search for a natural number within a String of numbers separated by commas:
String myString = "0,1,2,10,14,";
The problem is that I'm using...
myString.contains(String.valueOf(anInteger));
...to check for the presence of a specific number. This works for numbers from 0 to 9, but when looking for a number with more than one digit, the program does not recognize it as a whole natural number.
In other words, and as an example: "14" is not the integer 14, it's just a string with a "1" and a "4"; so if I run...
String myString = "0,1,2,10,14,";
if (myString.contains(String.valueOf(4))) { doSomething(); }
...the if statement will be true, since "4" is present in the string as part of the natural number "14".
At this point, I've searched through StackOverflow and other pages for a solution, and learnt that I should use Pattern and Matcher.
My question is: what's the best way to use them?
Relevant part of my code:
for (int i = 0; i<r; i++)
{
if (!act.contains(String.valueOf(i)))
{
...
}
...
}
I use this method several times in my code, so an exact substitution would be nice.
Thank you all in advance!

You only need a method call to matches():
if (myString.matches(".*\\b" + anInteger + "\\b.*"))
// string contains the number
This works by creating a regex that has a word boundary (\b) at either end of the target number. The leading and trailing .* are required because matches() must match the whole string to return true.
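Since you use the check several times, it may help to wrap it in a helper. This sketch uses Pattern and Matcher, as you asked, with find() instead of matches(), so the leading and trailing .* are unnecessary. ContainsNumber and containsNumber are hypothetical names.

```java
import java.util.regex.Pattern;

public class ContainsNumber {
    // True if n appears as a whole number in csv, e.g. 14 in "0,1,2,10,14,"
    // but not 4. The \b word boundaries stop a match inside a longer number.
    static boolean containsNumber(String csv, int n) {
        return Pattern.compile("\\b" + n + "\\b").matcher(csv).find();
    }

    public static void main(String[] args) {
        String myString = "0,1,2,10,14,";
        System.out.println(containsNumber(myString, 4));  // false
        System.out.println(containsNumber(myString, 14)); // true
    }
}
```

In your loop, act.contains(String.valueOf(i)) would then become ContainsNumber.containsNumber(act, i).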

Look into how to split a String into an array of Strings. So:
String[] splitStrings = myString.split(",");
ArrayList<Integer> parsedInts = new ArrayList<Integer>();
for (String str : splitStrings) {
parsedInts.add(Integer.parseInt(str));
}
then in your for loop:
if (parsedInts.contains(i)) {
// body
}

Something like this:
String myString = "0,1,2,10,14,";
String[] split = myString.split(",");
for (String string : split) {
int num = Integer.parseInt(string);
if (num == 4) {
System.out.println(num);
// ...
}
}

String myString = "0,1,2,10,14,2323232";
String[] allList = myString.split(",");
for (String string : allList) {
if (string.matches("[0-9]+")) // one or more digits; "[0-9]*" would also accept ""
{
System.out.println("It's a number with value " + string);
}
}

I think you need to pick out all the numbers in the given string and find the permutation.
I think you need to tokenize the given string with the comma as separator.
When I write such a program, I split my logic: one method parses the String, and another method contains the logic. Below is the snippet (StringTokenizer is a legacy class; String.split() is generally preferred in new code):
String myString = "0,1,2,10,14,";
StringTokenizer st2 = new StringTokenizer(myString, ",");
while (st2.hasMoreTokens()) {
doSomething(st2.nextToken()); // nextToken() returns a String, nextElement() only an Object
}
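Following that idea of separating the parsing from the logic, here is a sketch that parses the string once into a Set<Integer> and then uses contains() in the loop. NumberSet, parseNumbers and the loop bound r are hypothetical; because whole numbers are compared, 4 no longer matches inside 14.

```java
import java.util.HashSet;
import java.util.Set;

public class NumberSet {
    // Parses a string such as "0,1,2,10,14," once into a set of whole numbers.
    static Set<Integer> parseNumbers(String csv) {
        Set<Integer> numbers = new HashSet<>();
        for (String token : csv.split(",")) {
            String t = token.trim();
            if (!t.isEmpty()) { // skip blanks such as the one in ",,"
                numbers.add(Integer.parseInt(t));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        Set<Integer> act = parseNumbers("0,1,2,10,14,");
        int r = 16; // hypothetical upper bound
        for (int i = 0; i < r; i++) {
            if (!act.contains(i)) { // O(1) lookup, whole numbers only
                System.out.println(i + " is not in the list");
            }
        }
    }
}
```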

split string only on first instance - java

I want to split a string on the '=' character, but only on its first occurrence. How can I do that? Here is a JavaScript example for the '_' character, but it doesn't work for me:
split string only on first instance of specified character
Example:
apple=fruit table price=5
When I try String.split("="), it gives
[apple],[fruit table price],[5]
But I need
[apple],[fruit table price=5]
Thanks

string.split("=", 2); // limit = 2: split at most once
As the Javadoc of String.split(java.lang.String regex, int limit) explains:
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
The string boo:and:foo, for example, yields the following results with these parameters:
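Regex   Limit   Result
:        2      { "boo", "and:foo" }
:        5      { "boo", "and", "foo" }
:       -2      { "boo", "and", "foo" }
o        5      { "b", "", ":and:f", "", "" }
o       -2      { "b", "", ":and:f", "", "" }
o        0      { "b", "", ":and:f" }
A non-positive limit applies the pattern as many times as possible; a limit of zero (what split(regex) without a limit uses) additionally discards trailing empty strings.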

Yes, you can: just pass the integer limit parameter to the split method:
String stSplit = "apple=fruit table price=5";
stSplit.split("=", 2);
Here is a Javadoc reference: String#split(java.lang.String, int)

As many other answers suggest the limit approach, here is another way:
You can use the indexOf method of String, which returns the index of the first occurrence of the given character. Using that index, you can get the desired output:
String target = "apple=fruit table price=5";
int x = target.indexOf("=");
System.out.println(target.substring(x + 1)); // prints "fruit table price=5"
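The snippet above only prints the part after the '='. A minimal sketch that returns both parts, and mirrors split("=", 2) by returning the whole string as a single element when there is no '=' (SplitFirst and splitAtFirst are hypothetical names):

```java
public class SplitFirst {
    // Splits s at the first occurrence of delimiter only.
    static String[] splitAtFirst(String s, char delimiter) {
        int x = s.indexOf(delimiter);
        if (x < 0) {
            return new String[] {s}; // no delimiter: one element, like split(regex, 2)
        }
        return new String[] {s.substring(0, x), s.substring(x + 1)};
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirst("apple=fruit table price=5", '=');
        System.out.println(parts[0]); // apple
        System.out.println(parts[1]); // fruit table price=5
    }
}
```

Unlike split(), this treats the delimiter as a plain character, so no regex escaping is needed.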

String string = "This is test string on web";
String splitData[] = string.split("\\s", 2);
Result:
splitData[0] => This
splitData[1] => is test string on web
String string = "This is test string on web";
String splitData[] = string.split("\\s", 3);
Result:
splitData[0] => This
splitData[1] => is
splitData[2] => test string on web
By default, split() splits the string at every match of the given regex. If you want to restrict the number of array elements it creates, pass the limit as a second, integer argument.

This works:
public class Split
{
public static void main(String...args)
{
String a = "%abcdef&Ghijk%xyz";
String b[] = a.split("%", 2);
System.out.println("Value = "+b[1]);
}
}

String[] func(String apple) {
String[] tmp = new String[2];
for (int i = 0; i < apple.length(); i++) {
if (apple.charAt(i) == '=') {
tmp[0] = apple.substring(0, i);
tmp[1] = apple.substring(i + 1, apple.length());
break;
}
}
return tmp; // both elements stay null if there is no '='
}
// returns a String array!
I like writing my own methods :)

why did it not split?

I am confused about why it does not split the string. My string array exp does not contain anything when I debug it. Is the split wrong? What I am trying to do is split a very simple expression like 1+2+3 and then parse the values, to build a calculator.
EDIT
Hi, I am splitting on each character because I am writing a calculator and have read about converting infix to postfix, so I need to split the string, then loop through each of the strings and do the checking shown below. However, when I debug, exp[] is empty.
For each token in turn in the input infix expression:
* If the token is an operand, append it to the postfix output.
* If the token is an operator A then:
  - While there is an operator B with precedence higher than or equal to A's at the top of the stack, pop B off the stack and append it to the output.
  - Push A onto the stack.
* If the token is an opening bracket, then push it onto the stack.
* If the token is a closing bracket:
  - Pop operators off the stack and append them to the output, until the operator at the top of the stack is an opening bracket.
  - Pop the opening bracket off the stack.
When all the tokens have been read:
* While there are still operator tokens on the stack:
  - Pop the operator on the top of the stack, and append it to the output.
// the main class
public class Main {
public static void main(String[] args) {
calcExpChecker calc = new calcExpChecker("1+2+3+4");
calc.legitExp();
calc.displayPostfix();
}
}
//the class
package javaapplication4;
import java.util.*;
public class calcExpChecker {
private String originalExp; // the orginal display passed
private boolean isItLegitExp; // the whole expression is it legit
private boolean isItBlank; // is the display blank?
private StringBuilder expression = new StringBuilder(50);
private Stack stack = new Stack();//stack for making a postfix string
calcExpChecker(String original)
{
originalExp = original;
}
//check for blank expression
public void isitBlank()
{
if(originalExp.equals(""))
{
isItBlank = true;
}
else
{
isItBlank = false;
}
}
//check for extra operators
public void legitExp()
{
String[] exp = originalExp.split(".");
for(int i = 0 ; i < exp.length ; i++)
{
if(exp[i].matches("[0-9]"))
{
expression.append(exp[i]);
}
else if(exp[i].matches("[+]"))
{
if(stack.empty())
{
stack.push(exp[i]);
}
else
{
while(stack.peek().equals("+"))
{
expression.append(stack.pop());
}
stack.push(exp[i]);
}
}
if (!stack.empty())
{
expression.append(stack.pop());
}
}
}
public void displayPostfix()
{
System.out.print(expression.toString());
}
}

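If you make every character a delimiter (the regex "." matches any character), what is between them? Nothing.
For example, in
1+2+3+4
is 1 a delimiter? Yes, so capture everything between it and the next delimiter. Next delimiter? +. Nothing captured. Next delimiter? 2. And so on. Every token is an empty string, and split() removes trailing empty strings, so the resulting array is empty.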
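A small demonstration of this behavior (SplitDotDemo is a hypothetical class name; the lengths in the comments assume Java 8 or later):

```java
public class SplitDotDemo {
    public static void main(String[] args) {
        String exp = "1+2+3+4";
        // "." means "any character": every char is a delimiter, all tokens are
        // empty, and split() drops trailing empty strings, leaving nothing.
        System.out.println(exp.split(".").length);   // 0
        // "\\." matches a literal dot, which this string does not contain.
        System.out.println(exp.split("\\.").length); // 1
        // The empty regex splits between characters.
        System.out.println(exp.split("").length);    // 7
    }
}
```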

You want to split on every character, so use string.split("") instead (on Java 8 and later; older versions also return a leading empty string).
for (String part : string.split("")) {
// ...
}
Or better, just iterate over every character returned by string.toCharArray().
for (char c : string.toCharArray()) {
// ...
}
With chars you can use a switch statement, which is cleaner than a large if/else block.
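Combining toCharArray() and switch with the infix-to-postfix steps quoted in the question gives something like the following minimal sketch. It assumes a well-formed expression of non-negative integers, + - * / and parentheses; InfixToPostfix, toPostfix and precedence are hypothetical names. Consecutive digits are grouped so that "10" stays one operand.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class InfixToPostfix {
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return 0; // '(' has the lowest precedence, so it is never popped here
        }
    }

    // Converts e.g. "1+2*3" to "1 2 3 * +" (space-separated tokens).
    static String toPostfix(String infix) {
        StringBuilder out = new StringBuilder();
        Deque<Character> stack = new ArrayDeque<>();
        char[] chars = infix.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (Character.isDigit(c)) {
                // Operand: take all consecutive digits and append them to the output
                int start = i;
                while (i + 1 < chars.length && Character.isDigit(chars[i + 1])) {
                    i++;
                }
                out.append(infix, start, i + 1).append(' ');
                continue;
            }
            switch (c) {
                case '+': case '-': case '*': case '/':
                    // Pop operators of higher or equal precedence, then push this one
                    while (!stack.isEmpty() && precedence(stack.peek()) >= precedence(c)) {
                        out.append(stack.pop()).append(' ');
                    }
                    stack.push(c);
                    break;
                case '(':
                    stack.push(c);
                    break;
                case ')':
                    // Pop until the matching opening bracket, then discard the bracket
                    while (stack.peek() != '(') {
                        out.append(stack.pop()).append(' ');
                    }
                    stack.pop();
                    break;
                default:
                    break; // ignore spaces and any other characters
            }
        }
        // Pop the remaining operators
        while (!stack.isEmpty()) {
            out.append(stack.pop()).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("1+2+3+4")); // 1 2 + 3 + 4 +
    }
}
```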

Why do you need to split on each character, rather than just looping over each character of the String? That way you don't have to reference exp[i] either.
Anyway, you can split using "" instead of ".".

Confession:
Okay, I guess my answer is bad because there are subtle differences between Java and C# here. Still, maybe it'll help someone with the same problem in C#!
Btw, in C#, if you pass in the regex "." you don't get an empty array; instead you get an array of blanks (""), one for each character boundary in the string.
Edit
You can pass a regular expression to Regex.Split() (from System.Text.RegularExpressions); captured groups are included in the result:
string expressions = "10+20*4/2";
/* this will separate the string into an array of numbers and operators;
the numbers stay together rather than being split into individual characters
as "" or "." would do;
this should make processing the expression easier
gives you: {"10", "+", "20", "*", "4", "/", "2"} */
foreach (string exp in Regex.Split(expressions, @"(\u002A)|(\u002B)|(\u002D)|(\u002F)"))
{
//process each number or operator from the array in this loop
}
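For Java readers: String.split() drops the delimiters, but zero-width lookbehind and lookahead assertions can split just before and just after each operator, which keeps the operators as tokens, much like the captured groups in the C# version. A minimal sketch (TokenizeExpression and tokenize are hypothetical names; a leading minus sign would also be split off as an operator):

```java
import java.util.Arrays;

public class TokenizeExpression {
    // Split at every position directly after (?<=...) or directly before (?=...)
    // an operator; nothing is consumed, so the operators survive as tokens.
    static String[] tokenize(String expression) {
        return expression.split("(?<=[-+*/])|(?=[-+*/])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("10+20*4/2")));
        // prints [10, +, 20, *, 4, /, 2]
    }
}
```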
Original
String[] exp = originalExp.split(".");
You should get at least one string from the return value of split() (the original un-split string). If the array of strings is empty, the original string was probably empty.
