split string method - java

I have written the following code in Java:
public class sstring
{
public static void main(String[] args)
{
String s="a=(b+c); string st='hello adeel';";
String[] ss=s.split("\\b");
for(int i=0;i<ss.length;i++)
System.out.println(ss[i]);
}
}
The output of this code is:
a
=(
b
+
c
);
string
st
='
hello
adeel
';
What should I do to split =( or ); etc. into two separate elements rather than a single element in this array? That is, my output should look like this:
a
=
(
b
+
c
)
;
string
st
=
'
hello
adeel
'
;
Is it possible?

With every find, this matches either a word \\w+ (small w) or a single non-word character \\W (capital W). Note that \\W also matches whitespace, so each space becomes its own element.
It is taken from an unaccepted answer to "Can split string method of Java return the array with the delimiters as well", linked in the comment above by @RohitJain.
public String[] getParts(String s) {
List<String> parts = new ArrayList<String>();
Pattern pattern = Pattern.compile("(\\w+|\\W)");
Matcher m = pattern.matcher(s);
while (m.find()) {
parts.add(m.group());
}
return parts.toArray(new String[parts.size()]);
}

Use this code instead:
Pattern pattern = Pattern.compile("(\\w+|\\W)");
Matcher m = pattern.matcher("a=(b+c); string st='hello adeel';");
while (m.find()) {
System.out.println(m.group());
}
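
The pattern above also emits each space as its own element, because \\W matches whitespace. If the spaces should be dropped, as in the desired output in the question, one possible variation is to match either a run of word characters or a single character that is neither a word character nor whitespace. This is only a sketch; the names Tokens and tokenize are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Tokens {
    // A word (\w+), or one character that is neither a word character nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Hypothetical helper: returns the tokens of s, skipping all whitespace.
    static List<String> tokenize(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : tokenize("a=(b+c); string st='hello adeel';")) {
            System.out.println(part);
        }
    }
}
```

Because whitespace matches neither alternative, find() simply skips over it, so no filtering step is needed afterwards.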

Related

Retrieve a Sub-string from a string after the first occurrence of a character from a range of characters

I am trying to retrieve a substring that starts at the first occurrence of any character in the ranges A-Z and a-z.
For example:
if the string is 13BHO1234FO
then the substring should be BHO1234FO
i.e. the string from the first occurrence of the character 'B'.

Try this. It simply deletes the first part of the string you don't want and returns the rest. The original string is unchanged.
String[] testCases =
{ "13BHO1234FO", "ARSTOP123!", "133KSLK", "122222" };
for (String s : testCases) {
String sub = s.replaceFirst("^[^A-Za-z]+", "");
System.out.println("'" + sub + "'");
}
This prints each substring surrounded by single quotes so that an empty result is visible.
'BHO1234FO'
'ARSTOP123!'
'KSLK'
''

You can use a regex and Matcher to find the index of the first alphabetical character and make a substring starting from the index:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String text = "13BHO1234FO";
Pattern pattern = Pattern.compile("[A-Za-z]");
Matcher matcher = pattern.matcher(text);
// find() returns false when the text has no letter; calling start() then would throw
if (matcher.find()) {
int index = matcher.start();
String substr = text.substring(index);
System.out.println(substr);
}
}
}

Try this one:
public static void main(String[] args) {
String test = "13BHO1234FO";
System.out.println(test.replaceFirst("^.*?(?=[A-Za-z])", ""));
}
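
For comparison, the same result can be had without a regex by scanning for the first ASCII letter. This is only a sketch: the names FirstLetter and fromFirstLetter are invented here, and returning an empty string when the input has no letter is an assumption chosen to match the replaceFirst("^[^A-Za-z]+", "") answer above.

```java
final class FirstLetter {
    // Hypothetical helper: substring starting at the first character in A-Z or a-z.
    static String fromFirstLetter(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                return s.substring(i);
            }
        }
        return ""; // assumption: no letter at all means an empty result
    }

    public static void main(String[] args) {
        String[] testCases = { "13BHO1234FO", "ARSTOP123!", "133KSLK", "122222" };
        for (String s : testCases) {
            System.out.println("'" + fromFirstLetter(s) + "'");
        }
    }
}
```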

Splitting string on spaces unless in double quotes but double quotes can have a preceding string attached

I need to split a string in Java: first remove the whitespace between quotes, then split at whitespace.
"abc test=\"x y z\" magic=\" hello \" hola"
becomes:
firstly:
"abc test=\"xyz\" magic=\"hello\" hola"
and then:
abc
test="xyz"
magic="hello"
hola
Scenario:
I receive a string like the one above as input and want to break it into parts as shown. One approach is to first remove the spaces between quotes and then split at spaces, but the string attached before the quotes complicates it. A second approach is to split at spaces that are not inside quotes and then remove the spaces from each piece. I tried capturing quotes with "\"([^\"]+)\"" but I'm not able to capture just the spaces inside the quotes. I tried a few other patterns, with no luck.

We can do this using a formal pattern matcher. The secret sauce of the answer below is the rarely used Matcher#appendReplacement method. We pause at each match, and then append a custom replacement for whatever appears inside a pair of quotes. The custom method removeSpaces() strips all whitespace from each quoted term.
public static String removeSpaces(String input) {
return input.replaceAll("\\s+", "");
}
String input = "abc test=\"x y z\" magic=\" hello \" hola";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer("");
while (m.find()) {
m.appendReplacement(sb, "\"" + removeSpaces(m.group(1)) + "\"");
}
m.appendTail(sb);
String[] parts = sb.toString().split("\\s+");
for (String part : parts) {
System.out.println(part);
}
abc
test="xyz"
magic="hello"
hola
The big caveat here, as the comments above hinted, is that we are really using a regex engine as a rudimentary parser. To see where this solution fails fast, just remove one of the quotes from a quoted term. But if you are sure your input is well formed, as in the example you showed us, this answer might work for you.
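
To make that caveat concrete, here is a minimal sketch of a character scanner that does the same job and reports an unbalanced quote instead of silently producing odd output. The names QuoteAwareSplitter and split are hypothetical, and throwing IllegalArgumentException on malformed input is an assumption about what the caller would want.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplitter {
    // Splits on whitespace outside double quotes and drops whitespace inside them.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;      // toggle state, keep the quote itself
                current.append(c);
            } else if (Character.isWhitespace(c)) {
                if (inQuotes) {
                    continue;              // whitespace inside quotes is removed
                }
                if (current.length() > 0) { // whitespace outside quotes ends a part
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unbalanced double quote");
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("abc test=\"x y z\" magic=\" hello \" hola")) {
            System.out.println(part);
        }
    }
}
```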

I wanted to mention Java 9's Matcher.replaceAll overload that takes a lambda:
// Find quoted strings and remove their whitespace:
s = Pattern.compile("\"[^\"]*\"").matcher(s)
.replaceAll(mr -> mr.group().replaceAll("\\s", ""));
// Turn the remaining whitespace into commas and wrap everything in braces.
s = '{' + s.trim().replaceAll("\\s+", ", ") + '}';
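
A self-contained variant of the same idea, returning the parts as an array rather than a braced string, might look like the sketch below. It needs Java 9 or later. The names QuotedWhitespace and split are invented for illustration, and the Matcher.quoteReplacement call is an added precaution, since the string returned by the lambda is otherwise interpreted for $ and \ group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedWhitespace {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: strip whitespace inside quotes, then split on the rest.
    static String[] split(String s) {
        String compact = QUOTED.matcher(s)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s", "")));
        return compact.trim().split("\\s+");
    }

    public static void main(String[] args) {
        for (String part : split("abc test=\"x y z\" magic=\" hello \" hola")) {
            System.out.println(part);
        }
    }
}
```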

The other answer is probably better, but since I have already written this I will post it here ;) It takes a different approach:
public static void main(String[] args) {
String test="abc test=\"x y z\" magic=\" hello \" hola";
Pattern pattern = Pattern.compile("([^\\\"]+=\\\"[^\\\"]+\\\" )");
Matcher matcher = pattern.matcher(test);
int lastIndex=0;
while(matcher.find()) {
String[] parts=matcher.group(0).trim().split("=");
boolean newLine=false;
for (String string : parts[0].split("\\s+")) {
if(newLine)
System.out.println();
newLine=true;
System.out.print(string);
}
System.out.println("="+parts[1].replaceAll("\\s",""));
lastIndex=matcher.end();
}
System.out.println(test.substring(lastIndex).trim());
}
Result is
abc
test="xyz"
magic="hello"
hola

It sounds like you want to write a basic parser/tokenizer. My bet is that once you have something that can pretty-print this structure, you will soon want to validate that there aren't any mismatched quotes.
In essence, this particular problem has a few stages, and Java has a built-in tokenizer that can prove useful.
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;
public class Q50151376{
private static class Whitespace{
Whitespace(){ }
@Override
public String toString() {
return "\n";
}
}
private static class QuotedString {
public final String string;
QuotedString(String string) {
this.string = "\"" + string.trim() + "\"";
}
@Override
public String toString() {
return string;
}
}
public static void main(String[] args) {
String test = "abc test=\"x y z\" magic=\" hello \" hola";
StringTokenizer tokenizer = new StringTokenizer(test, "\"");
boolean inQuotes = false;
List<Object> out = new LinkedList<>();
while (tokenizer.hasMoreTokens()) {
final String token = tokenizer.nextToken();
if (inQuotes) {
out.add(new QuotedString(token));
} else {
out.addAll(TokenizeWhitespace(token));
}
inQuotes = !inQuotes;
}
System.out.println(joinAsStrings(out));
}
private static String joinAsStrings(List<Object> out) {
return out.stream()
.map(Object::toString)
.collect(Collectors.joining());
}
public static List<Object> TokenizeWhitespace(String in){
List<Object> out = new LinkedList<>();
StringTokenizer tokenizer = new StringTokenizer(in, " ", true);
boolean ignoreWhitespace = false;
while (tokenizer.hasMoreTokens()){
String token = tokenizer.nextToken();
boolean whitespace = token.equals(" ");
if(!whitespace){
out.add(token);
ignoreWhitespace = false;
} else if(!ignoreWhitespace) {
out.add(new Whitespace());
ignoreWhitespace = true;
}
}
return out;
}
}

Java Regex compress String

I have a random string, for example "aaaaaaBccccCCCCd". I need a regex that finds the groups of repeated characters so that I get "a6B1c4C4d1". My regex is "(\\D+)\\D*\\1", but it loses single letters, in this sample B and d.
Does anyone have an idea?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Compress {
public static void main(String[] args) {
String text = "aaaaaaBccccCCCCd";
String regex = "(\\D+)\\D*\\1"; // or (.+).*\\1
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
String result = new String();
while (matcher.find()) {
String letter = matcher.group().substring(0, 1);
String numberOfLetter = String.valueOf(matcher.group().length());
result = result + letter + numberOfLetter;
}
System.out.println(result);
}
}
Thank you.

Use the following approach based on Matcher#appendReplacement:
String text = "aaaaaaBccccCCCCd"; //a6B1c4C4d1
String regex = "(.)(\\1*)";
Pattern r = Pattern.compile(regex);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// quoteReplacement keeps a literal '$' or '\' in the input from being read as a group reference
m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)) + (m.group(2).length()+1));
}
m.appendTail(sb);
System.out.println(sb);
The (.)(\1*) will capture any char into Group 1 and then will capture into Group 2 zero or more repetitions of the same content. In the "callback", Group 1 is concatenated with the length of Group 2 incremented to account for the Group 1 length.
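
The same run-length idea can also be written as a plain loop, which may be easier to follow than the regex. This is a sketch only; the names RunLength and compress are hypothetical and not part of the answer above.

```java
final class RunLength {
    // Hypothetical helper: emits each character followed by the length of its run.
    static String compress(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int runEnd = i + 1;
            // Advance runEnd past every immediate repetition of c.
            while (runEnd < text.length() && text.charAt(runEnd) == c) {
                runEnd++;
            }
            out.append(c).append(runEnd - i);
            i = runEnd;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("aaaaaaBccccCCCCd"));
    }
}
```

Here runEnd - i plays the role of "length of Group 2 plus one" in the regex version.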

Uppercase all characters but not those in quoted strings

I have a String and I would like to uppercase everything that is not quoted.
Example:
My name is 'Angela'
Result:
MY NAME IS 'Angela'
Currently, I am matching every quoted string then looping and concatenating to get the result.
Is it possible to achieve this in one regex expression maybe using replace?

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
String format = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());
System.out.println(input);
System.out.println("----------------------");
System.out.println(format);
Input: 's'Hello This is 'Java' Not '.NET'
Output: 's'HELLO THIS IS 'Java' NOT '.NET'

You could use a regular expression like this:
([^'"]+)(['"]+[^'"]+['"]+)(.*)
# match and capture everything up to a single or double quote (but not including)
# match and capture a quoted string
# match and capture any rest which might or might not be there.
This will only work with one quoted string, obviously.
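
For input with any number of single-quoted strings, one possible sketch is to match either a complete quoted string or a run of unquoted text, and uppercase only the latter. The names UpperOutsideQuotes and convert are invented for illustration, and treating a stray unmatched quote as ordinary text is an assumption about the desired behavior.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UpperOutsideQuotes {
    // A complete '...' string, or a run of non-quote text, or a stray unmatched quote.
    private static final Pattern PIECE = Pattern.compile("'[^']*'|[^']+|'");

    static String convert(String input) {
        StringBuilder out = new StringBuilder(input.length());
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            String piece = m.group();
            boolean quoted = piece.length() >= 2
                    && piece.startsWith("'") && piece.endsWith("'");
            // Quoted pieces are copied verbatim; everything else is uppercased.
            out.append(quoted ? piece : piece.toUpperCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("My name is 'Angela'"));
    }
}
```

Because the three alternatives together cover every character, text after the last quoted string is kept as well.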

OK, this will do it for you. It is not efficient, and it only works when each quoted string consists of word characters (the pattern is '\\w+'). I don't actually recommend this solution, as it will be too slow.
public static void main(String[] args) {
String s = "'Peter' said, My name is 'Angela' and I will not change my name to 'Pamela'.";
Pattern p = Pattern.compile("('\\w+')");
Matcher m = p.matcher(s);
List<String> quotedStrings = new ArrayList<>();
while(m.find()) {
quotedStrings.add(m.group(1));
}
s=s.toUpperCase();
// System.out.println(s);
for (String str : quotedStrings)
s= s.replaceAll("(?i)"+str, str);
System.out.println(s);
}
Output:
'Peter' SAID, MY NAME IS 'Angela' AND I WILL NOT CHANGE MY NAME TO 'Pamela'.

Adding to the answer by @jan_kiran: we need to call the appendTail() method, otherwise any text after the last quoted string is dropped. The updated code is:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
regexMatcher.appendTail(sb);
String formatted_string = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());

I did not find my luck with these solutions, as they seemed to remove trailing non-quoted text.
This code works for me and handles both ' and " by remembering the type of the last opening quotation mark. Replace toLowerCase with toUpperCase as needed, of course.
It may be extremely slow; I don't know:
private static String toLowercaseExceptInQuotes(String line) {
StringBuffer sb = new StringBuffer(line);
boolean nowInQuotes = false;
char lastQuoteType = 0;
for (int i = 0; i < sb.length(); ++i) {
char cchar = sb.charAt(i);
if (cchar == '"' || cchar == '\''){
if (!nowInQuotes) {
nowInQuotes = true;
lastQuoteType = cchar;
}
else {
if (lastQuoteType == cchar) {
nowInQuotes = false;
}
}
}
else if (!nowInQuotes) {
sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
}
}
return sb.toString();
}

find substrings inside string

How can I find substrings inside a string, and then remember each one and delete it once it is found?
EXAMPLE:
select * from (select a.iid_organizacijske_enote,
a.sifra_organizacijske_enote "Sifra OE",
a.naziv_organizacijske_enote "Naziv OE",
a.tip_organizacijske_enote "Tip OE"
I would like to get all the words inside " ", so
Sifra OE
Naziv OE
TIP OE
and return
select * from (select a.iid_organizacijske_enote,
a.sifra_organizacijske_enote,
a.naziv_organizacijske_enote,
a.tip_organizacijske_enote
I tried with regex and indexOf(), but neither works correctly.

String.replace(..):
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
str = str.replace(wordToRemove, "");
If you don't know the words in advance, you can use the regex version:
str = str.replaceAll("\"[^\"]+\"", "");
This means, that all strings starting and ending with quotes, with any character except quotes between them, will be replaced with empty string.

Consider using regex with capturing groups. With Java's Matcher class, you can find the first match, and then use replaceFirst(String).
--EDIT--
example (not efficient for long inputs):
String in = "hello \"there\", \"friend!\"";
Pattern p = Pattern.compile("\\\"([^\"]*)\\\"");
Matcher m = p.matcher(in);
while(m.find()){
System.out.println(m.group(1));
in = m.replaceFirst("");
m = p.matcher(in);
}
System.out.println(in);
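
A single-pass variant that both remembers and removes the quoted names could look like the following sketch. The names AliasStripper and stripAliases are hypothetical; also note that the pattern removes the whitespace in front of each quoted name, which is an assumption made so that the result matches the expected output in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AliasStripper {
    // Optional leading whitespace, then a double-quoted name captured in group 1.
    private static final Pattern QUOTED = Pattern.compile("\\s*\"([^\"]*)\"");

    // Adds every quoted name to aliases and returns sql with those names removed.
    static String stripAliases(String sql, List<String> aliases) {
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            aliases.add(m.group(1));
        }
        // replaceAll resets the matcher first, so it starts again from the beginning.
        return m.replaceAll("");
    }

    public static void main(String[] args) {
        List<String> aliases = new ArrayList<>();
        String sql = "a.sifra_organizacijske_enote \"Sifra OE\",\n"
                + "a.naziv_organizacijske_enote \"Naziv OE\"";
        System.out.println(stripAliases(sql, aliases));
        System.out.println(aliases);
    }
}
```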

I tried this and created the function below. It works and returns the output you want (note that this is C#, not Java):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Program p = new Program();
string s = p.mystring("select * from (select a.iid_organizacijske_enote, a.sifra_organizacijske_enote 'Sifra OE', "
+"a.naziv_organizacijske_enote 'Naziv OE', "+
"a.tip_organizacijske_enote 'Tip OE'");
}
public string mystring(string s)
{
if (s.IndexOf("'") > 0)
{
string test = s.Substring(0, s.IndexOf("'"));
s = s.Replace(test+"'", "");
s = s.Remove(0, s.IndexOf("'") + 1);
test = test.Replace("'", "");
test = test + s;
return mystring(test);
}
else
{
return s;
}
}
}
}

A simple loop that checks whether one string is a substring of another:
public static void main(String[] args) {
    String mainStr = "abcdefgh";
    String ipStr = "efg";
    boolean substr = false;
    // Try every start position; the characters must match contiguously,
    // otherwise a mere subsequence such as "aebfcg" would be accepted.
    for (int i = 0; i + ipStr.length() <= mainStr.length(); i++) {
        if (mainStr.regionMatches(i, ipStr, 0, ipStr.length())) {
            substr = true;
            break;
        }
    }
    System.out.println("its a substring:" + substr);
}
