How to search for substrings - java

I'm looking for patterns like "tip" and "top" in the string -- length-3, starting with 't' and ending with 'p'. The goal is to return a string where for all such words, the middle letter is gone. So for example, "tipXtap" yields "tpXtp".
So far, I've thought about using recursion, and the replace() method, but am not sure if that is the best way to approach this problem.
Here is my code thus far:
String result = "";
if(str.length() < 3)
return str;
for(int i = 0; i <= str.length() - 2; i++){
if(str.charAt(i) == 't' && str.charAt(i + 2) == 'p'){
str.replaceAll(str.substring(i + 1, i + 2), "");
}
return str;
}
return str;

Use this Java code:
String str = "tipXtap";
str = str.replaceAll("t.p", "tp");
This uses regular expressions and the String.replaceAll function. The . (dot) character is a regex metacharacter that matches any single character.

One way of doing this.
Convert the String to a char array.
Use if conditions inside a loop that traverses the char array: first check whether the current char is 't', then check whether the char two positions later is 'p'.
If the condition is true, remove the middle element. You will have to shift the remaining elements in the char array.
Convert the char array to a String and return it.
Hope this helps.
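The steps above can be sketched roughly as follows. This is only a sketch, not the answerer's code: the names TpRemover and removeMiddle are made up for illustration, and it appends to a StringBuilder instead of shifting elements inside a char array.

```java
class TpRemover {
    // Hypothetical helper: drops the middle character of every
    // three-character run that starts with 't' and ends with 'p'.
    static String removeMiddle(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            // Only look two characters ahead when they actually exist.
            if (i + 2 < str.length()
                    && str.charAt(i) == 't'
                    && str.charAt(i + 2) == 'p') {
                sb.append("tp"); // keep 't' and 'p', skip the middle char
                i += 3;
            } else {
                sb.append(str.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeMiddle("tipXtap")); // tpXtp
    }
}
```

Strings shorter than three characters fall through the else branch unchanged, so no separate length check is needed.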

Here's a JavaScript solution to this problem using regular expressions:
foo = 'tipXtop'
foo.replace(/t\wp/g, 'tp')
The \w regex operator matches a word character like a-z, A-Z, 0-9 or _.
The g regex flag will match all instances of the regex in the string.

Related

Writing one regular expression for string in java

I am trying to write a regular expression for a string. Say there is a string RBY_YBR, where _ represents an empty slot, so we can repeatedly swap a letter with _ until the result is RRBBYY_. Groups of two or more identical letters can be formed, for example RRR.
Conditions
1). The letter to the left or right of each letter should be the same.
2). If there is no _, the letters must already be grouped, like RRBBYY, not RBRBYY or RBYRBY etc.
3). There can be more than one underscore _.
Using a regular expression, I am trying to determine whether the given string can be turned into a pattern of consecutive identical letters by swapping letters with _.
The regular expression which I wrote is
String regEx = "[A-ZA-Z_]";
But this regular expression fails for RBRB, since there is no empty slot to move the letters into and RBRB is not already in a grouped pattern.
How could I write an effective regular expression to solve this?
Ok, as I understand it, a matching string shall either consist only of same characters being grouped together, or must contain at least one underscore.
So, RRRBBR would be invalid, while RRRRBB, RRRBBR_, and RRRBB_R_ would all be valid.
After a comment from the asker, an additional condition: every character must occur 0 times, or 2 or more times.
As far as I know, this is not possible with Regular Expressions, as Regular Expressions are finite-state machines without "storage". You would have to "store" each character found in the string to check that it won't appear later again.
I would suggest a very simple method for verifying such strings:
public static boolean matchesMyPattern(String s) {
boolean withUnderscore = s.contains("_");
int[] found = new int[26];
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
if (ch != '_' && (ch < 'A' || ch > 'Z')) {
return false;
}
if (ch != '_' && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0
&& !withUnderscore) {
return false;
}
if (ch != '_') {
found[ch - 'A']++;
}
}
for (int i = 0; i < found.length; i++) {
if (found[i] == 1) {
return false;
}
}
return true;
}
Please take my answer with a grain of salt, since it's a bit of a "Fastest gun in the West" post.
It follows the same assumptions as Florian Albrecht's answer. (thanks)
I believe that this will solve your problem:
(([A-Za-z])(\2|_)+)+
https://regex101.com/r/7TfSVc/1
It works by using the second capturing group and ensuring that more of it follow, or there are underscores.
Known bug: it does not work if an underscore starts a string.
EDIT
This one is better, though I forgot what I was doing by the end of it.
(([A-Za-z_])(\2|_)+|_+[A-Za-z]_*)+
https://regex101.com/r/7TfSVc/4

Java Regex from beginning to first char

How can I find any word from beginning of string to first char "~" using java?
Example:
Worddjjfdskfjsdkfjdsj ~ Word ~ Word
I want it to capture
Worddjjfdskfjsdkfjdsj
You can also do it without regex in a very simple way.
First of all, use the indexOf() String method to find the index of the "~" character. Then use the substring() method to extract the string you are looking for.
Here is an example:
String stringToProcess = "hello~world";
int charIndex = stringToProcess.indexOf('~');
String finalString = stringToProcess.substring(0, charIndex);
You can use this regex to capture all character from start of string ^ to first occurrence of ~:
^[^~]*
[^~]* is a negated character class that matches 0 or more of any character except ~
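A minimal sketch of how that pattern might be applied in Java. The class name BeforeTilde and the method firstSegment are invented for this example, and the trim() call is an assumption that the space before the ~ in the sample input is unwanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeforeTilde {
    // Everything from the start of the input up to (not including) the first '~'.
    private static final Pattern UP_TO_TILDE = Pattern.compile("^[^~]*");

    // Hypothetical helper: returns the text before the first '~', trimmed.
    static String firstSegment(String input) {
        Matcher m = UP_TO_TILDE.matcher(input);
        // The pattern can match the empty string, so find() succeeds for any input.
        return m.find() ? m.group().trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstSegment("Worddjjfdskfjsdkfjdsj ~ Word ~ Word"));
    }
}
```

If the input contains no ~ at all, the whole string is returned.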
Without regex it can be solved
Simply split your string by ~.
String str[] = "Worddjjfdskfjsdkfjdsj ~ Word ~ Word".split("~");
System.out.println(str[0]);
Here is a regular expression that you can use: ^(.*?)~ (group 1 holds the text before the first ~).
However in your simple case you do not need regular expressions at all. Use indexOf() and substring():
int tilda = str.indexOf('~');
if (tilda >= 0) {
word = str.substring(0, tilda);
}

Use regex to replace sequences in a string with modified characters

I am trying to solve a codingbat problem using regular expressions whether it works on the website or not.
So far, I have the following code which does not add a * between the two consecutive equal characters. Instead, it just bulldozes over them and replaces them with a set string.
public String pairStar(String str) {
Pattern pattern = Pattern.compile("([a-z])\\1", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
if(matcher.find())
matcher.replaceAll(str);//this is where I don't know what to do
return str;
}
I want to know how I could keep using regex and replace the whole string. If needed, I think a recursive system could help.
This works:
while(str.matches(".*(.)\\1.*")) {
str = str.replaceAll("(.)\\1", "$1*$1");
}
return str;
Explanation of the regex:
The search regex (.)\\1:
(.) means "any character" (the .) and the brackets create a group - group 1 (the first left bracket)
\\1, which in regex is \1 (a java literal String must escape a backslash with another backslash) means "the first group" - this kind of term is called a "back reference"
So together (.)\1 means "any repeated character"
The replacement string $1*$1:
The $1 term means "the content captured as group 1"
Recursive solution:
Technically, the solution called for on that site is a recursive solution, so here is recursive implementation:
public String pairStar(String str) {
if (!str.matches(".*(.)\\1.*")) return str;
return pairStar(str.replaceAll("(.)\\1", "$1*$1"));
}
FWIW, here's a non-recursive solution:
public String pairStar(String str) {
int len = str.length();
StringBuilder sb = new StringBuilder(len*2);
char last = '\0';
for (int i=0; i < len; ++i) {
char c = str.charAt(i);
if (c == last) sb.append('*');
sb.append(c);
last = c;
}
return sb.toString();
}
I don't know Java, but I believe there is a replace function for strings in Java that works with regular expressions. Your match string would be
([a-z])\\1
And the replace string would be
$1*$1
After some searching I think you are looking for this,
str.replaceAll("([a-z])\\1", "$1*$1").replaceAll("([a-z])\\1", "$1*$1");
These are my own solutions.
Recursive solution (which is probably more or less the solution that the problem is designed for)
public String pairStar(String str) {
if (str.length() <= 1) return str;
else return str.charAt(0) +
(str.charAt(0) == str.charAt(1) ? "*" : "") +
pairStar(str.substring(1));
}
If you want to complain about substring, then you can write a helper function pairStar(String str, int index) which does the actual recursion work.
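One possible shape for such a helper is sketched below. The class name PairStarIndexed is made up, and the sketch still calls substring once in the base case to return the remaining tail.

```java
class PairStarIndexed {
    static String pairStar(String str) {
        return pairStar(str, 0);
    }

    // Hypothetical helper: processes the part of str that starts at index.
    private static String pairStar(String str, int index) {
        // Zero or one character left: nothing to compare, return the rest.
        if (index >= str.length() - 1) return str.substring(index);
        char c = str.charAt(index);
        // Insert a star only when the next character is identical.
        String star = (c == str.charAt(index + 1)) ? "*" : "";
        return c + star + pairStar(str, index + 1);
    }

    public static void main(String[] args) {
        System.out.println(pairStar("aaaa")); // a*a*a*a
    }
}
```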
Regex one-liner one-function-call solution
public String pairStar(String str) {
return str.replaceAll("(.)(?=\\1)", "$1*");
}
Both solutions have the same spirit. They both check whether the current character is the same as the next character. If they are the same, a * is inserted between the 2 identical characters. Then we move on to check the next character. This produces the expected output a*a*a*a from input aaaa.
The normal regex solution of "(.)\\1" has a problem: it consumes 2 characters per match. As a result, we fail to compare the 2nd character of the match with the character after it. The look-ahead is used to resolve this problem: it compares with the next character without consuming it.
This is similar to the recursive solution, where we compare with the next character using str.charAt(0) == str.charAt(1), while calling the function recursively on the substring with only the current character removed: pairStar(str.substring(1)).
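To make the difference concrete, here is a small side-by-side sketch of a single replaceAll pass with each pattern. The class and method names are made up for illustration.

```java
class PairStarRegexDemo {
    // Consuming back-reference: each match swallows both identical characters.
    static String consuming(String s) {
        return s.replaceAll("(.)\\1", "$1*$1");
    }

    // Look-ahead: only the first character is consumed, the second is just peeked at.
    static String lookahead(String s) {
        return s.replaceAll("(.)(?=\\1)", "$1*");
    }

    public static void main(String[] args) {
        System.out.println(consuming("aaaa")); // a*aa*a  (middle pair is missed)
        System.out.println(lookahead("aaaa")); // a*a*a*a
    }
}
```

This is why the consuming pattern needs a loop or a second pass, while the look-ahead version gets by with one call.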

How to remove leading and trailing whitespace from the string in Java?

I want to remove the leading and trailing whitespace from string:
String s = " Hello World ";
I want the result to be like:
s == "Hello World";
s.trim()
see String#trim()
Without any internal method, use regex like
s.replaceAll("^\\s+", "").replaceAll("\\s+$", "")
or
s.replaceAll("^\\s+|\\s+$", "")
or just use pattern in pure form
String s=" Hello World ";
Pattern trimmer = Pattern.compile("^\\s+|\\s+$");
Matcher m = trimmer.matcher(s);
StringBuffer out = new StringBuffer();
while(m.find())
m.appendReplacement(out, "");
m.appendTail(out);
System.out.println(out+"!");
String s="Test ";
s= s.trim();
I prefer not to use regular expressions for trivial problems. This would be a simple option:
public static String trim(final String s) {
final StringBuilder sb = new StringBuilder(s);
while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0)))
sb.deleteCharAt(0); // delete from the beginning
while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1)))
sb.deleteCharAt(sb.length() - 1); // delete from the end
return sb.toString();
}
Use the String class trim method. It will remove all leading and trailing whitespace.
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
String s=" Hello World ";
s = s.trim();
For more information See This
Simply use trim(). It only removes the excess whitespace at the start and end of a string.
String fav = " I like apple ";
fav = fav.trim();
System.out.println(fav);
Output:
I like apple //no extra space at start and end of the string
String.trim() answers the question but was not an option for me.
As stated here :
it simply regards anything up to and including U+0020 (the usual space character) as whitespace, and anything above that as non-whitespace.
This results in it trimming the U+0020 space character and all “control code” characters below U+0020 (including the U+0009 tab character), but not the control codes or Unicode space characters that are above that.
I am working with Japanese, where we have full-width characters. For example, the full-width space (U+3000) is not trimmed by String.trim().
I therefore made a function which, like xehpuk's snippet, use Character.isWhitespace().
However, this version is not using a StringBuilder and instead of deleting characters, finds the 2 indexes it needs to take a trimmed substring out of the original String.
public static String trimWhitespace(final String stringToTrim) {
int endIndex = stringToTrim.length();
// Return the string if it's empty
if (endIndex == 0) return stringToTrim;
int firstIndex = -1;
// Find first character which is not a whitespace, if any
// (increment from beginning until either first non whitespace character or end of string)
while (++firstIndex < endIndex && Character.isWhitespace(stringToTrim.charAt(firstIndex))) { }
// If firstIndex did not reach end of string, Find last character which is not a whitespace,
// (decrement from end until last non whitespace character)
while (--endIndex > firstIndex && Character.isWhitespace(stringToTrim.charAt(endIndex))) { }
// Return substring using indexes
return stringToTrim.substring(firstIndex, endIndex + 1);
}
s = s.trim();
More info:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()
Why do you not want to use predefined methods? They are usually most efficient.
See String#trim() method
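Since Java 11, the String class has a strip() method, which returns a string whose value is this string with all leading and trailing white space removed. It was introduced to overcome the limitation of trim(): strip() uses Character.isWhitespace() to decide what counts as white space, so it also removes Unicode white space above U+0020.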
Docs: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#strip()
Example:
String str = " abc ";
// public String strip()
str = str.strip(); // Returns abc
There are two more useful methods in Java 11+ String class:
stripLeading() : Returns a string whose value is this string,
with all leading white space removed.
stripTrailing() : Returns a string whose value is this string,
with all trailing white space removed.
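A short sketch contrasting trim() with the strip methods, assuming Java 11 or later. The class name StripDemo and the sample string, which is wrapped in full-width spaces (U+3000), are chosen only for illustration.

```java
class StripDemo {
    // "abc" surrounded by full-width (ideographic) spaces, U+3000.
    static final String SAMPLE = "\u3000abc\u3000";

    public static void main(String[] args) {
        // trim() only removes characters up to U+0020, so U+3000 survives.
        System.out.println(SAMPLE.trim().length());          // 5
        // strip() relies on Character.isWhitespace, which covers U+3000.
        System.out.println(SAMPLE.strip().length());         // 3
        System.out.println(SAMPLE.stripLeading().length());  // 4
        System.out.println(SAMPLE.stripTrailing().length()); // 4
    }
}
```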
@xehpuk's method is good if you want to avoid using regex, but it has O(n^2) worst-case time complexity, because each deleteCharAt(0) shifts all the remaining characters. The following solution also avoids regex, but is O(n):
if(s.length() == 0)
return "";
char left = s.charAt(0);
char right = s.charAt(s.length() - 1);
int leftWhitespace = 0;
int rightWhitespace = 0;
boolean leftBeforeRight = leftWhitespace < s.length() - 1 - rightWhitespace;
while ((left == ' ' || right == ' ') && leftBeforeRight) {
if(left == ' ') {
leftWhitespace++;
left = s.charAt(leftWhitespace);
}
if(right == ' ') {
rightWhitespace++;
right = s.charAt(s.length() - 1 - rightWhitespace);
}
leftBeforeRight = leftWhitespace < s.length() - 1 - rightWhitespace;
}
String result = s.substring(leftWhitespace, s.length() - rightWhitespace);
return result.equals(" ") ? "" : result;
This counts the whitespace characters at the beginning and at the end of the string, until either the "left" and "right" indices obtained from the whitespace counts meet, or both indices have reached a non-whitespace character. Afterwards, we return either the substring obtained using the whitespace counts, or the empty string if the result is a single space (needed to account for all-whitespace strings with an odd number of characters). Note that this version only treats the space character ' ' as whitespace.

RegEx to match strings that have only one C

I am looking for some tips on how I can take a string like:
KIGABCCA TQABCCAXT
GABCCASZYU GZTTABCCA MHNBABCCA CLZGABCA ABCCALZH
ABCCADQRNS VIZABCCA GABCCAG
UEKABCCA KBTOABCCA GABCCAMFFJ HABCCAISOJ OFJJABCCA HPABCCA
WBXRABCCA
ABCCAKH
VABCCAJX WBDOABCCA ABCCAWM GCABCA QHRABCCA
ABCCAMDDD WPABCCAD OGABCCA
TVABCCA JGLABCA
IUABCCA
and to return any entire string with only one C in it.
PLEASE NOTE: I AM NOT LOOKING FOR A SOLUTION!
Just some pointers or a description of the sort of constructs I should be looking at.
I have been labouring over it for ages, and have come close to hurting someone because of this. It is a homework question and I'm not looking to cheat, just some guidance.
I have read extensively about Reg Ex and I understand them.
I'm not looking for a beginners guide.
You want to first put a word boundary at the start and end. Then match any character that isn't a C or a word boundary 0 or more times, then a C, then again any character that isn't a C or a word boundary 0 or more times. So it'll match a C on its own, or a C with any non-C characters on either (or both) sides of it.
You could express the "no C or word boundary" part in two ways: say "any character that isn't a C or word boundary", or say "I want A, B or anything from D-Z". Up to you.
Search for a pattern that has the following elements, in order:
The beginning of the string or any whitespace.
Zero or more non-whitespace non-C characters.
A "C"
Zero or more non-whitespace non-C characters.
The end of the string or any whitespace.
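For readers who do want to see it spelled out, the five elements above could be translated into a Java pattern roughly like this. It is a sketch under assumptions: the names SingleCFinder and wordsWithOneC are invented, and the "beginning of the string or whitespace" and "end of the string or whitespace" elements are written as look-arounds so that the surrounding whitespace is not part of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleCFinder {
    // (?<!\S)   : preceded by start of input or whitespace
    // [^\sC]*   : zero or more non-whitespace, non-C characters
    // C         : exactly one C
    // [^\sC]*   : zero or more non-whitespace, non-C characters
    // (?!\S)    : followed by end of input or whitespace
    private static final Pattern ONE_C =
            Pattern.compile("(?<!\\S)[^\\sC]*C[^\\sC]*(?!\\S)");

    // Hypothetical helper: collects every whitespace-separated word with exactly one C.
    static List<String> wordsWithOneC(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ONE_C.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordsWithOneC("KIGABCCA JGLABCA\nCLZGABCA C"));
    }
}
```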
You can create a counting function and then pass each string to it. Just an example:
String string = "KIGABCCA";
// Returns true if ch occurs exactly once in string.
public static boolean countChar(String string, char ch) {
    int count = 0;
    for (int i = 0; i < string.length(); i++) {
        if (string.charAt(i) == ch) {
            count++;
        }
    }
    return count == 1;
}
