How do I extract data between two characters in Java?


String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
String[] body = text.split("/|,");
String b1 = body[1];
String b2 = body[2];
String b3 = body[3];
Desired results:
b1 = 'Team1 = 6'
b2 = 'Team2 = 4'
b3 = 'Team3 = 2'

Use regex. Something like this:
String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
Matcher m = Pattern.compile("(\\w+\\s=\\s\\d+)").matcher(text);
// \w+ matches the team name (eg: Team1). \s=\s matches " = " and \d+ matches the score.
while (m.find()){
System.out.print(m.group(1)+"\n");
}
This prints:
Team1 = 6
Team2 = 4
Team3 = 2

There are a few ways you can do this, but in your case I'd use regex.
I don't know Java, but I think something like this regex pattern should work:
Pattern.compile("/'(.*?)'")
A random regex tester site with this pattern is here: https://regex101.com/r/MCRfMm/1
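A minimal Java sketch of that pattern, assuming every value in the input is wrapped as /'...'. The class and method names (QuotedSegments, extract) are made up for illustration. In a Java string literal the forward slash needs no escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSegments {
    // Matches /'...' and lazily captures the text between the quotes.
    private static final Pattern SEGMENT = Pattern.compile("/'(.*?)'");

    // Returns the captured text of every /'...' segment, in order.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SEGMENT.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
        for (String segment : extract(text)) {
            System.out.println(segment);
        }
    }
}
```

Group 1 excludes the surrounding quotes. If the quotes should be kept, as in the desired results of the question, the parentheses can be moved outward: "/('.*?')".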

I'm going to say "friends don't let friends use regex" and recommend parsing this out. The built-in class StreamTokenizer will handle the job.
private static void testTok( String in ) throws Exception {
System.out.println( "Input: " + in );
StreamTokenizer tok = new StreamTokenizer( new StringReader( in ) );
tok.resetSyntax();
tok.wordChars( 'a', 'z' );
tok.wordChars( 'A', 'Z' );
tok.wordChars( '0', '9' );
tok.whitespaceChars( 0, ' ' );
String prevToken = null;
for( int type; (type = tok.nextToken()) != StreamTokenizer.TT_EOF; ) {
// System.out.println( tokString( type ) + ": nval=" + tok.nval + ", sval=" + tok.sval );
if( type == '=' ) {
tok.nextToken();
System.out.println( prevToken + "=" + tok.sval );
}
prevToken = tok.sval;
}
}
Output:
Input: /'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'
Team1=6
Team2=4
Team3=2
One advantage of this technique is that the individual tokens like "Team1", "=" and "6" are all parsed separately, whereas the regex presented so far is already complex to read and would have to be made even more complex to isolate each of those tokens separately.

You can split on "a slash, optionally preceded by a comma followed by zero or more non-slash characters":
String[] body = text.split("(?:,[^/]*)?/");

public class MyClass {
public static void main(String args[]) {
String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
char []textArr = text.toCharArray();
char st = '/';
char ed = ',';
boolean lookForEnd = false;
int st_idx =0;
for(int i =0; i < textArr.length; i++){
if(textArr[i] == st){
st_idx = i+1;
lookForEnd = true;
}
else if(lookForEnd && textArr[i] == ed){
System.out.println(text.substring(st_idx,i));
lookForEnd = false;
}
}
// we still didn't find ',' therefore print everything from lastFoundIdx of '/'
if(lookForEnd){
System.out.println(text.substring(st_idx));
}
}
}
/*
'Team1 = 6'
'Team2 = 4'
'Team3 = 2'
*/

You could use split with a regex built from an alternation. It matches either a forward slash at the start of the string, or a comma followed by one or more non-comma characters and then a forward slash. A positive lookahead after the alternation asserts that what follows is a '
(?:^/|,[^,]+/)(?=')
Explanation
(?: Start non capturing group
^/ Assert the start of the string followed by forward slash
| Or
,[^,]+/ Match a comma, then one or more characters other than a comma using a negated character class, then a forward slash
) Close non capturing group
(?=') Positive lookahead to assert what follows is '
Regex demo - Java demo
Getting a match instead of split
If you want to match a pattern like 'Team1 = 6', you could use:
'[^=]+=[^']+'
Regex demo - Java demo
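As a rough sketch of the matching variant in Java (the class and method names QuotedMatches and findAll are illustrative, not from the linked demos), the pattern can be run with Matcher.find(), taking the whole match so that the quotes are kept:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedMatches {
    // A quote, some non-'=' characters, an '=', some non-quote characters, a quote.
    private static final Pattern PAIR = Pattern.compile("'[^=]+=[^']+'");

    // Returns every full match, including the surrounding quotes.
    static List<String> findAll(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
        for (String match : findAll(text)) {
            System.out.println(match);
        }
    }
}
```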

Related

Regex to capture the string starting with a specific word or character and ending with either one of the words

Want to capture the string after the last slash and before either a (; sid=) word or a (?) character.
sample data:
sessionId=30a793b1-ed7e-464a-a630; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;
sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;
sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; Url=https://www.example.com/mybook/order/newbooking/; sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;
Expecting the below output:
1. itemSummary
2. itemList
3. ''(empty string)
I have built the regex below to capture it, but it's not 100% accurate. It is capturing some additional part.
Regex
Url=.*\/(.*)(; sid|\?)
Could you please help me to improve the regex to get desired output?
Thanks in advance!

You may use this regex in Java with a greedy match after Url=:
\bUrl=\S+/([^?;/]+)(?=; sid|\?)
RegEx Demo
RegEx Details:
\b: Word boundary
Url=: Match text Url=
\S+/: Match 1+ non-whitespace characters followed by a /
([^?;/]+): Match 1+ of any character that is not ?, ; or /
(?=; sid|\?): Lookahead to assert that we have ; sid or ? ahead
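One caveat: because the capture group uses +, it needs at least one character, so as far as I can tell the third sample (where the URL ends in a slash) produces no match rather than an empty string. A hedged variant that swaps + for * in the group is sketched below. The class and method names are illustrative, and the inputs are shortened versions of the sample data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastUrlSegment {
    // Same pattern as above, but with * instead of + in the capture group,
    // so a URL that ends in a slash yields an empty string.
    private static final Pattern LAST_SEGMENT =
            Pattern.compile("\\bUrl=\\S+/([^?;/]*)(?=; sid|\\?)");

    // Returns the captured segment, or null when the line does not match.
    static String lastSegment(String line) {
        Matcher m = LAST_SEGMENT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "sessionId=1; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=abc; targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;",
            "sessionId=2; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&type=new; sid=abc targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=123;",
            "sessionId=3; Url=https://www.example.com/mybook/order/newbooking/; sid=abc; targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;"
        };
        for (String line : lines) {
            System.out.println("'" + lastSegment(line) + "'");
        }
    }
}
```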

Alternative solution:
Used regex:
"^Url=.*/(\\w+|)$"
Regex in test bench and context:
public static void main(String[] args) {
String input1 = "sessionId=30a793b1-ed7e-464a-a630; "
+ "Url=https://www.example.com/mybook/order/newbooking/itemSummary; "
+ "sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; "
+ "targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;";
String input2 = "sessionId=sfdsdfsd-ba57-4e21-a39f-34; "
+ "Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; "
+ "sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW "
+ "targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=123;";
String input3 = "sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; "
+ "Url=https://www.example.com/mybook/order/newbooking/; "
+ "sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; "
+ "targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;";
List<String> inputList = Arrays.asList(input1, input2, input3);
// Pre-compiled Patterns should not be in loops - that is why they are placed outside the loops
Pattern replaceWithNewLinePattern = Pattern.compile(";?\\s|\\?");
Pattern extractWordFromUrlPattern = Pattern.compile("^Url=.*/(\\w+|)$", Pattern.MULTILINE);
int count = 0;
for(String input : inputList) {
String inputWithNewLines = replaceWithNewLinePattern.matcher(input).replaceAll("\n");
// System.out.println(inputWithNewLines); // Check the change...
Matcher matcher = extractWordFromUrlPattern.matcher(inputWithNewLines);
while (matcher.find()) {
System.out.printf( "%d. '%s'%n", ++count, matcher.group(1));
}
}
}
Output:
1. 'itemSummary'
2. 'itemList'
3. ''

Regex to capture groups and ignore last two characters where one is optional

I need to capture two groups from an input string. The values differ in structure as they come in.
The following are examples of the incoming strings:
Comment = "This is a comment";
NumericValue = 123456;
What I am trying to accomplish is to capture the string value from the left of the equals sign as one group and the value after the equals sign as a second group. The semicolon should never be included.
The caveat is that if the second group is a string, the quotes from each end must not be included in that capture group.
The expected results would be:
Comment = "This is a comment";
key group => Comment
value group => This is a comment
NumericValue = 123456;
key group => NumericValue
value group => 123456
The following is what I have so far. This works fine for capturing the numeric value, but leaves the end double quote when capturing the string value.
(?<key>\w+)\s*=\s*(?:[\"]?)(?<group>.+(?:(?=[\"]?;)))
EDIT
When applying the regex against a string value, it must allow capture of semicolons and double quotes within the string and ignore only the closing ones.
So, if we have an input of:
Comment = "This is a "comment"; This is still a comment";
The second capture group should be:
This is a "comment"; This is still a comment

An option is to use an alternation where you would have to check for group 2 or group 3:
(?<key>\w+)\h*=\h*(?:"(.*?)"|([^"\r\n]+));$
(?<key>\w+) Group key match 1+ word chars
\h*=\h* Match an = between optional horizontal whitespace chars
(?: Non capturing group
"(.*?)" Capture in group 2 0+ times any char between "
| Or
([^"\r\n]+) Capture group 3, match 1+ times any char except " or a newline
); Close non capturing group and match ;
$ End of string
Regex demo
In Java
String regex = "(?<key>\\w+)\\h*=\\h*(?:\"(.*?)\"|([^\"\\r\\n]+));$";

Edited based on comment to include ; and " in the comments as per the examples given:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<value>((")(?!;?$)|;(?!$)|[^;"])+)"?;?$
The following one additionally doesn't allow ; or " to appear in the numeric text. However, to include this, I had to rename the capturing groups because the name cannot be used for more than one group.
(?<key>\w+)\s*=\s*((?:")(?<valueT>((")(?!;?$)|;(?!$)|[^;"])+)";?$|(?<valueN>[^;"]+);?$)
Here is a class that tests it.
For readability, I have separated the key and value regexes in the class. I have added the test cases in a method within the class. However, this still doesn't handle the case of a numeric text containing ; or ". Also, the line needs to be trimmed before being subjected to the pattern test (which I think is feasible).
public class NameValuePairRegex{
public static void main( String[] args ){
String SPACE = "\\s*";
String EQ = "=";
String OR = "|";
/* The original regex tried by you (for comparison). */
String orig = "(?<key>\\w+)\\s*=\\s*(?:[\\\"]?)(?<value>.+(?:(?=;)))";
String key = "(?<key>\\w+)";
String valuePatternForText = "(?:\")(?<valueT>((\")(?!;?$)|;(?!$)|[^;\"])+)\";?$";
String valuePatternForNumbers = "(?<valueN>[^;\"]+);?$";
String p = key + SPACE + EQ + SPACE + "(" + valuePatternForText + OR + valuePatternForNumbers + ")";
Pattern nvp = Pattern.compile( p );
System.out.println( nvp.pattern() );
print( input(), nvp );
}
private static void print( List<String> input, Pattern ep ) {
for( String e : input ) {
System.out.println( e );
Matcher m = ep.matcher( e );
boolean found = m.find();
if( !found ) {
System.out.println( "\t\tNo match" );
continue;
}
String valueT = m.group( "valueT" );
String valueN = m.group( "valueN" );
System.out.print( "\t\t" + m.group( "key" ) + " -> " + ( valueT == null ? "" : valueT ) + " " + ( valueN == null ? "" : valueN ) );
System.out.println( );
}
}
private static List<String> input(){
List<String> neg = new ArrayList<>();
Collections.addAll( neg,
"Comment = \"This is a comment\";",
"Comment = \"This is a comment with semicolon ;\";",
"Comment = \"This is a comment with semicolon ; and quote\"\";",
"Comment = \"This is a comment\"",
"Comment = \"This is a \"comment\"; This is still a comment\";",
"NumericValue = 123456;",
"NumericValue = 123;456;",
"NumericValue = 123\"456;",
"NumericValue = 123456" );
return neg;
}
}
Original answer:
The following changed regex is fulfilling the requirements you mentioned. I added the exclusion of ; and " from the value part.
Original that you tried:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<group>.+(?:(?=[\"]?;)))
The changed one:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<value>[^;"]+)

Regular expressions are fun, but look how clean and easy to read this would be without using a regular expression:
int equals = s.indexOf('=');
String key = s.substring(0, equals).trim();
String value = s.substring(equals + 1).trim();
if (value.endsWith(";")) {
value = value.substring(0, value.length() - 1).trim();
}
if (value.startsWith("\"") && value.endsWith("\"")) {
value = value.substring(1, value.length() - 1);
}
Don’t assume that because this uses more lines of code than a regular expression that it’s slower. The lines of code executed internally by a regex engine will far exceed the above code.
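For reference, here is the same logic wrapped in a self-contained class so it can be run against the inputs from the question. The class and method names (KeyValueLine, parse) and the two guard checks are additions for illustration, not part of the snippet above.

```java
public class KeyValueLine {
    // Splits "key = value;" at the first '=', then strips the trailing
    // semicolon and one pair of surrounding double quotes, if present.
    // Returns a two-element array: { key, value }.
    static String[] parse(String s) {
        int equals = s.indexOf('=');
        if (equals < 0) {
            throw new IllegalArgumentException("no '=' in: " + s);
        }
        String key = s.substring(0, equals).trim();
        String value = s.substring(equals + 1).trim();
        if (value.endsWith(";")) {
            value = value.substring(0, value.length() - 1).trim();
        }
        // The length check avoids stripping a value that is a single quote character.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Comment = \"This is a comment\";",
            "Comment = \"This is a \"comment\"; This is still a comment\";",
            "NumericValue = 123456;"
        };
        for (String input : inputs) {
            String[] kv = parse(input);
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Because only the outermost quotes and the final semicolon are stripped, quotes and semicolons inside the string value are preserved.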

Remove pattern from string in Java

I am currently working on a tool, which helps me to analyze a constantly growing String, that can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is to find a sequence of A's and B's, do something and then remove that sequence from the original String.
My code looks like this:
while (someBoolean){
if(Pattern.matches("A+B+", s)) {
//Do stuff
//Remove the found pattern
}
if(Pattern.matches("C+D+", s)) {
//Do other stuff
//Remove the found pattern
}
}
return s;
Also, how could I remove the three sequences, so that s just contains "Q" at the end of the calculation, without an endless loop?

You should use a regex replacement loop, i.e. the methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find one of many patterns, use the | alternation operator, and capture each pattern separately.
You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
if (m.start(1) != -1) { // Group 1 found
System.out.println("Found AB: " + m.group(1));
m.appendReplacement(buf, ""); // Replace matched substring with ""
} else if (m.start(2) != -1) { // Group 2 found
System.out.println("Found CD: " + m.group(2));
m.appendReplacement(buf, ""); // Replace matched substring with ""
}
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q

This solution assumes that the string always ends in Q.
String s="AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
while (s.length() > 1){
Matcher abMatcher = abPattern.matcher(s);
if (abMatcher.find()) {
s = abMatcher.replaceFirst("");
//Do other stuff
}
Matcher cdMatcher = cdPattern.matcher(s);
if (cdMatcher.find()) {
s = cdMatcher.replaceFirst("");
//Do other stuff
}
}
System.out.println(s);

You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
if (result.contains(ch)) {
String pattern = "[" + ch + "]+";
result = result.replaceAll(pattern, ch);
}
}
System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"
This basically replaces each run of a repeated character with a single one.
If you want to remove the sequence completely, replace ch with "" as the second argument of replaceAll inside the if body.

Skip first occurrence and split the string in Java

I want to skip the first occurrence if the number of occurrences is more than 4. For now the input will have at most 5 underscores. I need to produce the output A_B, C, D, E, F, and I did it using the code below. I would like a better solution. Please check and let me know. Thanks in advance.
String key = "A_B_C_D_E_F";
int occurance = StringUtils.countOccurrencesOf(key, "_");
System.out.println(occurance);
String[] keyValues = null;
if(occurance == 5){
key = key.replaceFirst("_", "-");
keyValues = StringUtils.tokenizeToStringArray(key, "_");
keyValues[0] = replaceOnce(keyValues[0], "-", "_");
}else{
keyValues = StringUtils.tokenizeToStringArray(key, "_");
}
for(String keyValue : keyValues){
System.out.println(keyValue);
}

You can use this regex to split:
String s = "A_B_C_D_E_F";
String[] list = s.split("(?<=_[A-Z])_");
Output:
[A_B, C, D, E, F]
The idea is to match only the underscores that are preceded by "_[A-Z]", which effectively skips only the first one.
If the strings you are considering have a different format between the underscores, you have to replace [A-Z] with the appropriate regex.

Well, it is relatively "simple":
String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?<!^[^_]*)_|_(?=(?:[^_]*_){0,3}[^_]*$)");
System.out.println(Arrays.toString(result));
Here a version with comments for better understanding that can also be used as is:
String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?x) # enable embedded comments \n"
+ " # first alternative splits on all but the first underscore \n"
+ "(?<! # next character should not be preceded by \n"
+ " ^[^_]* # only non-underscores since beginning of input \n"
+ ") # so this matches only if there was an underscore before \n"
+ "_ # underscore \n"
+ "| # alternatively split if an underscore is followed by at most three more underscores to match the less than five underscores case \n"
+ "_ # underscore \n"
+ "(?= # preceding character must be followed by \n"
+ " (?:[^_]*_){0,3} # at most three groups of non-underscores and an underscore \n"
+ " [^_]*$ # only more non-underscores until end of line \n"
+ ")");
System.out.println(Arrays.toString(result));

You can use this regex based on \G and instead of splitting use matching:
String str = "A_B_C_D_E_F";
Pattern p = Pattern.compile("(^[^_]*_[^_]+|\\G[^_]+)(?:_|$)");
Matcher m = p.matcher(str);
List<String> resultArr = new ArrayList<>();
while (m.find()) {
resultArr.add( m.group(1) );
}
System.err.println(resultArr);
\G asserts position at the end of the previous match or the start of the string for the first match.
Output:
[A_B, C, D, E, F]
RegEx Demo

I would do it after the split.
public void test() {
String key = "A_B_C_D_E_F";
String[] parts = key.split("_");
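if (parts.length > 5) { // more than four underscores means more than five parts
String[] newParts = new String[parts.length - 1];
newParts[0] = parts[0] + "_" + parts[1]; // rejoin the first two parts with the original underscore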
System.arraycopy(parts, 2, newParts, 1, parts.length - 2);
parts = newParts;
}
System.out.println("parts = " + Arrays.toString(parts));
}

Although Java does not say that officially, you can use * and + in the lookbehind as they are implemented as limiting quantifiers: * as {0,0x7FFFFFFF} and + as {1,0x7FFFFFFF} (see Regex look-behind without obvious maximum length in Java). So, if your strings are not too long, you can use
String key = "A_B_C_D"; // => [A, B, C, D]
//String key = "A_B_C_D_E_F"; // => [A_B, C, D, E, F]
String[] res = null;
if (key.split("_").length > 4) {
res = key.split("(?<!^[^_]*)_");
} else {
res = key.split("_");
}
System.out.println(Arrays.toString(res));
See the JAVA demo
DISCLAIMER: Since this is an exploit of the current Java 8 regex engine, the code may break in the future when the bug is fixed in Java.
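If relying on undocumented lookbehind behaviour is a concern, the same result can be obtained without any lookbehind by locating the second underscore with indexOf. This is only a sketch: the class and method names are hypothetical, and it uses the "more than 4 underscores" threshold from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipFirstUnderscore {
    // Keeps the first underscore when the key has more than four
    // underscores, otherwise splits on every underscore.
    static List<String> split(String key) {
        long underscores = key.chars().filter(c -> c == '_').count();
        if (underscores <= 4) {
            return Arrays.asList(key.split("_"));
        }
        int first = key.indexOf('_');
        int second = key.indexOf('_', first + 1);
        List<String> result = new ArrayList<>();
        // Everything before the second underscore stays together, e.g. "A_B".
        result.add(key.substring(0, second));
        result.addAll(Arrays.asList(key.substring(second + 1).split("_")));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("A_B_C_D"));
        System.out.println(split("A_B_C_D_E_F"));
    }
}
```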

Java how to replace 2 or more spaces with single space in string and delete leading and trailing spaces

Looking for quick, simple way in Java to change this string
"  hello     there   "
to something that looks like this
"hello there"
where I replace all those multiple spaces with a single space, except I also want the one or more spaces at the beginning of string to be gone.
Something like this gets me partly there
String mytext = "  hello     there   ";
mytext = mytext.replaceAll("( )+", " ");
but not quite.

Try this:
String after = before.trim().replaceAll(" +", " ");
See also
String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
regular-expressions.info/Repetition
No trim() regex
It's also possible to do this with just one replaceAll, but this is much less readable than the trim() solution. Nonetheless, it's provided here just to show what regex can do:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$|( )+", "$1")
);
}
There are 3 alternates:
^_+ : any sequence of spaces at the beginning of the string
Match and replace with $1, which captures the empty string
_+$ : any sequence of spaces at the end of the string
Match and replace with $1, which captures the empty string
(_)+ : any sequence of spaces that matches none of the above, meaning it's in the middle
Match and replace with $1, which captures a single space
See also
regular-expressions.info/Anchors

You just need a:
replaceAll("\\s{2,}", " ").trim();
where you match two or more whitespace characters and replace them with a single space, and then trim whitespace at the beginning and end (you could actually invert this by trimming first and then matching, which makes the regex quicker, as someone pointed out).
To test this out quickly try:
System.out.println(new String(" hello there ").trim().replaceAll("\\s{2,}", " "));
and it will return:
"hello there"

Use the Apache commons StringUtils.normalizeSpace(String str) method. See docs here

This worked perfectly for me : sValue = sValue.trim().replaceAll("\\s+", " ");

The trim() method removes the leading and trailing spaces, and replaceAll("regex", "string to replace") with the regex "\\s+" matches one or more whitespace characters and replaces each run with a single space.
myText = myText.trim().replaceAll("\\s+"," ");

The following code will compact any whitespace between words and remove any at the string's beginning and end
String input = "\n\n\n a string with many spaces, \n"+
" a \t tab and a newline\n\n";
String output = input.trim().replaceAll("\\s+", " ");
System.out.println(output);
This will output a string with many spaces, a tab and a newline
Note that all whitespace characters, including spaces, tabs and newlines, will be compacted or removed.
For more information see the respective documentation:
String#trim() method
String#replaceAll(String regex, String replacement) method
For information about Java's regular expression implementation see the documentation of the Pattern class
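To make the difference between the space-only pattern " +" and the whitespace pattern "\\s+" concrete, here is a small sketch. The class and method names are illustrative only.

```java
public class CollapseWhitespace {
    // Collapses runs of the space character only; tabs and newlines are kept.
    static String collapseSpaces(String s) {
        return s.trim().replaceAll(" +", " ");
    }

    // Collapses runs of any whitespace (spaces, tabs, newlines) into one space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "  hello    there \t world\n";
        System.out.println("[" + collapseSpaces(input) + "]");
        System.out.println("[" + collapseWhitespace(input) + "]");
    }
}
```

With this input the first method leaves the tab between "there" and "world" in place, while the second reduces it to a single space.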

"[ ]{2,}"
This will match more than one space.
String mytext = " hello there ";
//without trim -> " hello there"
//with trim -> "hello there"
mytext = mytext.trim().replaceAll("[ ]{2,}", " ");
System.out.println(mytext);
OUTPUT:
hello there

To eliminate spaces at the beginning and at the end of the String, use String#trim() method. And then use your mytext.replaceAll("( )+", " ").

You can first use String.trim(), and then apply the regex replace command on the result.

Try this one.
Sample Code
String str = " hello there ";
System.out.println(str.replaceAll("( +)"," ").trim());
OUTPUT
hello there
First it replaces every run of spaces with a single space. Then the string has to be trimmed, because a run of spaces at the start or end of the string is also reduced to a single space rather than removed. After that you get the desired string.

String blogName = "how to do in java . com";
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ");

trim()
Removes only the leading & trailing spaces.
From Java Doc,
"Returns a string whose value is this string, with any leading and trailing whitespace removed."
System.out.println(" D ev Dum my ".trim());
"D ev Dum my"
replace(), replaceAll()
Removes all the spaces in the string:
System.out.println(" D ev Dum my ".replace(" ",""));
System.out.println(" D ev Dum my ".replaceAll(" ",""));
System.out.println(" D ev Dum my ".replaceAll("\\s+",""));
Output:
"DevDummy"
"DevDummy"
"DevDummy"
Note: "\\s+" is a regular expression that matches one or more whitespace characters.
Reference : https://www.codedjava.com/2018/06/replace-all-spaces-in-string-trim.html

In Kotlin it would look like this
val input = "\n\n\n a string with many spaces, \n"
val cleanedInput = input.trim().replace(Regex("(\\s)+"), " ")

Many correct answers have been provided so far, and I see a lot of upvotes. The approaches mentioned will work, but they are not really optimized or very readable.
I recently came across a solution that every developer will like (StringUtils here is from Apache Commons Lang).
String nameWithProperSpacing = StringUtils.normalizeSpace( stringWithLotOfSpaces );
You are done.
This is a readable solution.

You could use lookarounds also.
test.replaceAll("^ +| +$|(?<= ) ", "");
OR
test.replaceAll("^ +| +$| (?= )", "")
<space>(?= ) matches a space character that is followed by another space character. So in a run of consecutive spaces, it matches every space except the last, because the last one isn't followed by a space. This leaves a single space for each run of consecutive spaces after the removal.
Example:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$| (?= )", "")
);
}

See String.replaceAll.
Use the regex "\\s+" and replace with " ".
Then use String.trim.

String str = " hello world";
reduce spaces first
str = str.trim().replaceAll(" +", " ");
capitalize the first letter and lowercase everything else
str = str.substring(0,1).toUpperCase() +str.substring(1,str.length()).toLowerCase();

you should do it like this
String mytext = " hello there ";
mytext = mytext.replaceAll("( +)", " ");
put + inside round brackets.

String str = " this is string ";
str = str.replaceAll("\\s+", " ").trim();

This worked for me
scan= filter(scan, " [\\s]+", " ");
scan= scan.trim();
where filter is following function and scan is the input string:
public String filter(String scan, String regex, String replace) {
StringBuffer sb = new StringBuffer();
Pattern pt = Pattern.compile(regex);
Matcher m = pt.matcher(scan);
while (m.find()) {
m.appendReplacement(sb, replace);
}
m.appendTail(sb);
return sb.toString();
}

A simple method for collapsing repeated spaces and removing leading and trailing white space.
public String removeWhiteSpaces(String returnString){
returnString = returnString.trim().replaceAll("^ +| +$|( )+", " ");
return returnString;
}

check this...
public static void main(String[] args) {
String s = "A B C D E F G\tH I\rJ\nK\tL";
System.out.println("Current : "+s);
System.out.println("Single Space : "+singleSpace(s));
System.out.println("Space count : "+spaceCount(s));
System.out.format("Replace all = %s", s.replaceAll("\\s+", ""));
// Example where it uses the most.
String s1 = "My name is yashwanth . M"; // renamed: s is already declared above
String s2 = "My nameis yashwanth.M";
System.out.println("Normal : "+s1.equals(s2));
System.out.println("Replace : "+s1.replaceAll("\\s+", "").equals(s2.replaceAll("\\s+", "")));
}
If the String contains only single spaces, the replacement does nothing.
If there is more than one consecutive space, the replacement removes the spaces.
public static String singleSpace(String str){
return str.replaceAll(" +| +|\t|\r|\n","");
}
To count the number of spaces in a String.
public static String spaceCount(String str){
int i = 0;
while(str.indexOf(" ") > -1){
//str = str.replaceFirst(" ", ""+(i++));
str = str.replaceFirst(Pattern.quote(" "), ""+(i++));
}
return str;
}
Pattern.quote("?") returns literal pattern String.

My method before I found the second answer using regex as a better solution. Maybe someone needs this code.
private String replaceMultipleSpacesFromString(String s){
if(s.length() == 0 ) return "";
int timesSpace = 0;
String res = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(c == ' '){
timesSpace++;
if(timesSpace < 2)
res += c;
}else{
res += c;
timesSpace = 0;
}
}
return res.trim();
}

Stream version, filters spaces and tabs.
Stream.of(str.split("[ \\t]")).filter(s -> s.length() > 0).collect(Collectors.joining(" "))

I know replaceAll method is much easier but I wanted to post this as well.
public static String removeExtraSpace(String input) {
input= input.trim();
ArrayList <String> x= new ArrayList<>(Arrays.asList(input.split("")));
for(int i=0; i<x.size()-1;i++) {
if(x.get(i).equals(" ") && x.get(i+1).equals(" ")) {
x.remove(i);
i--;
}
}
String word="";
for(String each: x)
word+=each;
return word;
}

String myText = " Hello World ";
myText = myText.trim().replaceAll(" +(?= )", "");
// Output: "Hello World"

string.replaceAll("\\s+", " ");

If you already use Guava (v. 19+) in your project you may want to use this:
CharMatcher.whitespace().trimAndCollapseFrom(input, ' ');
or, if you need to remove exactly the SPACE character (U+0020) and no other whitespace, use:
CharMatcher.anyOf(" ").trimAndCollapseFrom(input, ' ');

public class RemoveExtraSpacesEfficient {
public static void main(String[] args) {
String s = "my name is mr space ";
char[] charArray = s.toCharArray();
char prev = s.charAt(0);
for (int i = 0; i < charArray.length; i++) {
char cur = charArray[i];
if (cur == ' ' && prev == ' ') {
} else {
System.out.print(cur);
}
prev = cur;
}
}
}
The above solution is the algorithm with the complexity of O(n) without using any java function.
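A variant of the same single-pass idea that returns the result instead of printing it, and also drops leading and trailing spaces, might look like the sketch below. The class and method names are made up for illustration, and it treats only the space character as a separator.

```java
public class SinglePassCollapse {
    // Walks the string once, emitting a single space between words and
    // nothing for leading or trailing spaces.
    static String collapse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // Remember the gap, but only once some output exists,
                // which skips leading spaces.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                out.append(c);
                pendingSpace = false;
            }
        }
        // A pending space at the end is never emitted, so trailing spaces vanish.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  my   name is  mr space  ") + "]");
    }
}
```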

Please use below code
package com.myjava.string;
import java.util.StringTokenizer;
public class MyStrRemoveMultSpaces {
public static void main(String a[]){
String str = "String With Multiple Spaces";
StringTokenizer st = new StringTokenizer(str, " ");
StringBuffer sb = new StringBuffer();
while(st.hasMoreElements()){
sb.append(st.nextElement()).append(" ");
}
System.out.println(sb.toString().trim());
}
}

