find substrings inside string - java

How can I find substrings inside a string, remember them, and then delete them from the string once found?
EXAMPLE:
select * from (select a.iid_organizacijske_enote,
a.sifra_organizacijske_enote "Sifra OE",
a.naziv_organizacijske_enote "Naziv OE",
a.tip_organizacijske_enote "Tip OE"
I would like to get all the words inside " ", so
Sifra OE
Naziv OE
Tip OE
and return
select * from (select a.iid_organizacijske_enote,
a.sifra_organizacijske_enote,
a.naziv_organizacijske_enote,
a.tip_organizacijske_enote
I tried regex and indexOf(), but neither works properly.

String.replace(..):
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
str = str.replace(wordToRemove, "");
If you don't know the words in advance, you can use the regex version:
str = str.replaceAll("\"[^\"]+\"", "");
This means that every substring that starts and ends with a double quote, with one or more non-quote characters in between, is replaced with the empty string.
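Since the question also asks to remember the quoted words, here is a minimal sketch that collects them with a capturing group before removing them. The class and method names (`QuotedAliasStripper`, `extractQuoted`, `stripQuoted`) are made up for illustration, and the pattern also swallows the whitespace in front of each quoted run, which is an assumption about the desired output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedAliasStripper {
    // Optional leading whitespace, then a double-quoted run; group 1 is the text inside the quotes.
    private static final Pattern QUOTED = Pattern.compile("\\s*\"([^\"]+)\"");

    /** Collects the text found between each pair of double quotes. */
    public static List<String> extractQuoted(String sql) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /** Returns the input with every quoted run (and the whitespace before it) removed. */
    public static String stripQuoted(String sql) {
        return QUOTED.matcher(sql).replaceAll("");
    }

    public static void main(String[] args) {
        String sql = "select a.sifra_organizacijske_enote \"Sifra OE\",\n"
                   + "a.naziv_organizacijske_enote \"Naziv OE\"";
        System.out.println(extractQuoted(sql)); // [Sifra OE, Naziv OE]
        System.out.println(stripQuoted(sql));
    }
}
```

This assumes the quotes are balanced and that quoted text never contains an escaped quote.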

Consider using regex with capturing groups. With Java's Matcher class, you can find the first match, and then use replaceFirst(String).
--EDIT--
example (not efficient for long inputs):
String in = "hello \"there\", \"friend!\"";
Pattern p = Pattern.compile("\\\"([^\"]*)\\\"");
Matcher m = p.matcher(in);
while(m.find()){
System.out.println(m.group(1));
in = m.replaceFirst("");
m = p.matcher(in);
}
System.out.println(in);

I tried this and wrote the function below (in C#, using single quotes in the input). It works fine and returns the output you want.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Program p = new Program();
string s = p.mystring("select * from (select a.iid_organizacijske_enote, a.sifra_organizacijske_enote 'Sifra OE', "
+"a.naziv_organizacijske_enote 'Naziv OE', "+
"a.tip_organizacijske_enote 'Tip OE'");
}
public string mystring(string s)
{
if (s.IndexOf("'") > 0)
{
string test = s.Substring(0, s.IndexOf("'"));
s = s.Replace(test+"'", "");
s = s.Remove(0, s.IndexOf("'") + 1);
test = test.Replace("'", "");
test = test + s;
return mystring(test);
}
else
{
return s;
}
}
}
}

Here is code that checks whether one string is a substring of another:
public static void main(String[] args){
String mainStr = "abcdefgh";
String ipStr = "efg";
boolean substr = false;
// Try every possible start position and compare the characters from there.
for(int i = 0; i + ipStr.length() <= mainStr.length(); i++){
if(mainStr.regionMatches(i, ipStr, 0, ipStr.length())){
substr = true;
break;
}
}
System.out.println("its a substring:"+substr);
}

Related

Splitting string on spaces unless in double quotes but double quotes can have a preceding string attached

I need to split a string in Java (first remove whitespaces between quotes and then split at whitespaces.)
"abc test=\"x y z\" magic=\" hello \" hola"
becomes:
firstly:
"abc test=\"xyz\" magic=\"hello\" hola"
and then:
abc
test="xyz"
magic="hello"
hola
Scenario:
I get a string like the one above as input and want to break it into parts as shown. One approach is to first remove the spaces between quotes and then split at spaces; the string attached before the quotes complicates this. Another is to split at spaces unless they are inside quotes, and then remove the spaces from each part. I tried capturing quotes with "\"([^\"]+)\"", but I'm not able to capture just the spaces inside the quotes. I tried some more variations but had no luck.

We can do this using a formal pattern matcher. The secret sauce of the answer below is the not-much-used Matcher#appendReplacement method. We pause at each match, and then append a custom replacement for whatever appears inside a pair of quotes. The custom method removeSpaces() strips all whitespace from each quoted term.
public static String removeSpaces(String input) {
return input.replaceAll("\\s+", "");
}
String input = "abc test=\"x y z\" magic=\" hello \" hola";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer("");
while (m.find()) {
m.appendReplacement(sb, "\"" + removeSpaces(m.group(1)) + "\"");
}
m.appendTail(sb);
String[] parts = sb.toString().split("\\s+");
for (String part : parts) {
System.out.println(part);
}
abc
test="xyz"
magic="hello"
hola
The big caveat here, as the above comments hinted at, is that we are really using a regex engine as a rudimentary parser. To see where my solution would fail fast, just remove one of the quotes by accident from a quoted term. But if you are sure your input is well formed, as you have shown us, this answer might work for you.
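One possible guard against the missing-quote case described above is to count the quote characters before running the regex. This is only a sketch: `QuoteCheck` and `hasBalancedQuotes` are hypothetical names, and it assumes quotes are never escaped inside a quoted term.

```java
public class QuoteCheck {
    /** Returns true when the input contains an even number of double quotes. */
    public static boolean hasBalancedQuotes(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '"') {
                count++;
            }
        }
        return count % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(hasBalancedQuotes("abc test=\"x y z\" hola")); // true
        System.out.println(hasBalancedQuotes("abc test=\"x y z hola"));   // false
    }
}
```

An even count does not prove the input is well formed, but an odd count is a cheap signal to reject it before splitting.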

I wanted to mention Java 9's Matcher.replaceAll overload that takes a lambda:
// Find quoted strings and remove their whitespace:
s = Pattern.compile("\"[^\"]*\"").matcher(s)
.replaceAll(mr -> mr.group().replaceAll("\\s", ""));
// Turn the remaining whitespace into commas and wrap everything in braces.
s = '{' + s.trim().replaceAll("\\s+", ", ") + '}';
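For completeness, a self-contained sketch of the same idea that ends with the split the question asks for. It needs Java 9 or later; `QuotedWhitespace` and `splitOutsideQuotes` are illustrative names, and `Matcher.quoteReplacement` is added here as a precaution so that a `$` or backslash inside the quotes is not treated as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedWhitespace {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    /** Removes whitespace inside quoted terms, then splits on the whitespace that remains. */
    public static String[] splitOutsideQuotes(String s) {
        String compact = QUOTED.matcher(s).replaceAll(
                mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s", "")));
        return compact.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = "abc test=\"x y z\" magic=\" hello \" hola";
        for (String part : splitOutsideQuotes(input)) {
            System.out.println(part);
        }
    }
}
```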

Probably the other answer is better but still I have written it so I will post it here ;) It takes a different approach
public static void main(String[] args) {
String test="abc test=\"x y z\" magic=\" hello \" hola";
Pattern pattern = Pattern.compile("([^\\\"]+=\\\"[^\\\"]+\\\" )");
Matcher matcher = pattern.matcher(test);
int lastIndex=0;
while(matcher.find()) {
String[] parts=matcher.group(0).trim().split("=");
boolean newLine=false;
for (String string : parts[0].split("\\s+")) {
if(newLine)
System.out.println();
newLine=true;
System.out.print(string);
}
System.out.println("="+parts[1].replaceAll("\\s",""));
lastIndex=matcher.end();
}
System.out.println(test.substring(lastIndex).trim());
}
Result is
abc
test="xyz"
magic="hello"
hola

It sounds like you want to write a basic parser/tokenizer. My bet is that after you make something that can deal with pretty printing in this structure, you will soon want to start validating that there aren't any mismatched quotes.
But in essence, you have a few stages for this particular problem, and Java has a built in tokenizer that can prove useful.
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;
public class Q50151376{
private static class Whitespace{
Whitespace(){ }
@Override
public String toString() {
return "\n";
}
}
private static class QuotedString {
public final String string;
QuotedString(String string) {
this.string = "\"" + string.trim() + "\"";
}
@Override
public String toString() {
return string;
}
}
public static void main(String[] args) {
String test = "abc test=\"x y z\" magic=\" hello \" hola";
StringTokenizer tokenizer = new StringTokenizer(test, "\"");
boolean inQuotes = false;
List<Object> out = new LinkedList<>();
while (tokenizer.hasMoreTokens()) {
final String token = tokenizer.nextToken();
if (inQuotes) {
out.add(new QuotedString(token));
} else {
out.addAll(TokenizeWhitespace(token));
}
inQuotes = !inQuotes;
}
System.out.println(joinAsStrings(out));
}
private static String joinAsStrings(List<Object> out) {
return out.stream()
.map(Object::toString)
.collect(Collectors.joining());
}
public static List<Object> TokenizeWhitespace(String in){
List<Object> out = new LinkedList<>();
StringTokenizer tokenizer = new StringTokenizer(in, " ", true);
boolean ignoreWhitespace = false;
while (tokenizer.hasMoreTokens()){
String token = tokenizer.nextToken();
boolean whitespace = token.equals(" ");
if(!whitespace){
out.add(token);
ignoreWhitespace = false;
} else if(!ignoreWhitespace) {
out.add(new Whitespace());
ignoreWhitespace = true;
}
}
return out;
}
}

Uppercase all characters but not those in quoted strings

I have a String and I would like to uppercase everything that is not quoted.
Example:
My name is 'Angela'
Result:
MY NAME IS 'Angela'
Currently, I am matching every quoted string then looping and concatenating to get the result.
Is it possible to achieve this in one regex expression maybe using replace?

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
String format = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());
System.out.println(input);
System.out.println("----------------------");
System.out.println(format);
Input: 's'Hello This is 'Java' Not '.NET'
Output: 's'HELLO THIS IS 'Java' NOT '.NET'

You could use a regular expression like this:
([^'"]+)(['"]+[^'"]+['"]+)(.*)
# match and capture everything up to a single or double quote (but not including)
# match and capture a quoted string
# match and capture any rest which might or might not be there.
This will only work with one quoted string, obviously.
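For inputs with several quoted strings, one possible single-pass alternative (Java 9 or later) is to match either a quoted run or a run of unquoted text and uppercase only the latter. This is a sketch under assumptions: only single quotes are handled, quotes are balanced, and apostrophes inside words would be misread as quote marks. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UppercaseOutsideQuotes {
    // Either a complete single-quoted run, or a run of characters containing no quote.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|[^']+");

    public static String upperExceptQuoted(String input) {
        return TOKEN.matcher(input).replaceAll(mr -> {
            String token = mr.group();
            // Quoted tokens are kept as-is; everything else is uppercased.
            String result = token.startsWith("'") ? token : token.toUpperCase();
            return Matcher.quoteReplacement(result);
        });
    }

    public static void main(String[] args) {
        System.out.println(upperExceptQuoted("My name is 'Angela'"));
        // MY NAME IS 'Angela'
    }
}
```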

OK, this will do it for you. It is not efficient, and it only works when each quoted string consists of word characters (the pattern only matches '\w+'). I don't actually recommend this solution, as it will be too slow.
public static void main(String[] args) {
String s = "'Peter' said, My name is 'Angela' and I will not change my name to 'Pamela'.";
Pattern p = Pattern.compile("('\\w+')");
Matcher m = p.matcher(s);
List<String> quotedStrings = new ArrayList<>();
while(m.find()) {
quotedStrings.add(m.group(1));
}
s=s.toUpperCase();
// System.out.println(s);
for (String str : quotedStrings)
s= s.replaceAll("(?i)"+str, str);
System.out.println(s);
}
Output:
'Peter' SAID, MY NAME IS 'Angela' AND I WILL NOT CHANGE MY NAME TO 'Pamela'.

Adding to the answer by @jan_kiran: we also need to call the appendTail() method, otherwise any text after the last quoted string is dropped. Updated code is:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
regexMatcher.appendTail(sb);
String formatted_string = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());

I did not find my luck with these solutions, as they seemed to remove trailing non-quoted text.
This code works for me, and treats both ' and " by remembering the last opening quotation mark type. Replace toLowerCase appropriately, of course...
Maybe this is extremely slow; I don't know:
private static String toLowercaseExceptInQuotes(String line) {
StringBuffer sb = new StringBuffer(line);
boolean nowInQuotes = false;
char lastQuoteType = 0;
for (int i = 0; i < sb.length(); ++i) {
char cchar = sb.charAt(i);
if (cchar == '"' || cchar == '\''){
if (!nowInQuotes) {
nowInQuotes = true;
lastQuoteType = cchar;
}
else {
if (lastQuoteType == cchar) {
nowInQuotes = false;
}
}
}
else if (!nowInQuotes) {
sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
}
}
return sb.toString();
}

Filtering a group of words from a String in Java using Regular Expressions

I have this string.
String longText = "Associatepm: 4654-8199-9146";
and
String longText2 = "Associatepm: 465481999146";
I want to check if longText contains 4654-8199-9146 or 465481999146
using regular expressions, and retrieve it.
String newLongText = "4654-8199-9146";
System.out.println("The value of new long text" + newLongText);
//prints 4654-8199-9146 or 465481999146
This is the code I've tried:
if(this.text.contains("associate 4444-4444-4444")){
//print 4444-4444-4444
} else if(this.text.contains("associatepm 444444444444")){
//print 444444444444
}

This should match both of your scenarios.
[0-9]{4}-?[0-9]{4}-?[0-9]{4}
https://regex101.com/r/gH2hN8/1
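A minimal sketch of applying this pattern in Java to retrieve the number rather than just test for it. `AssociateNumber` and `findNumber` are hypothetical names; `Matcher.find()` and `Matcher.group()` are the standard calls.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssociateNumber {
    private static final Pattern NUMBER =
            Pattern.compile("[0-9]{4}-?[0-9]{4}-?[0-9]{4}");

    /** Returns the first 12-digit number (with or without dashes), if any. */
    public static Optional<String> findNumber(String text) {
        Matcher m = NUMBER.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findNumber("Associatepm: 4654-8199-9146").orElse("not found"));
        System.out.println(findNumber("Associatepm: 465481999146").orElse("not found"));
    }
}
```

The match is returned exactly as it appears in the text, so the dashes are kept when present.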

This is the answer to my question:
public static void main(String[] args){
String text = "Associatepm: 4654-8199-9146";
Pattern p = Pattern.compile("[0-9]{4}-?[0-9]{4}-?[0-9]{4}");
Matcher m = p.matcher(text);
boolean b = m.find();
if(b == true){
text = text.replaceAll("\\D+","");
}
System.out.println(text);
}
Special thanks to Ryan Reimer.

SubString replacement in text

I want to replace all the occurrences of a string in a text except the first one.
for eg:
input: Example [2] This is a sample text. This is a sample text. This is a sample text.
replaced word: sample (sImple)
output: Example [2] This is a sample text. This is a sImple text. This is a sImple text.
In string functions what I see is replace, replaceAll, replaceFirst.
How should I handle this case.
Thanks in advance.

You can use this regex to search:
((?:\bsample\b|(?<!^)\G).*?)\bsample\b
And this for the replacement:
$1simple
Java Code:
String r = input.replaceAll("((?:\\bsample\\b|(?<!^)\\G).*?)\\bsample\\b", "$1simple");

replaceAll and replace will replace all substrings (difference between them is that replaceAll uses regular expression as argument, while replace uses literals).
replaceFirst will replace only first substring which will match pattern you want to find.
What you can do is
use indexOf(String str, int fromIndex) method to determine indexes of first and second sample word,
then substring(int beginIndex) on index of second sample to get part of string from which you want to let replacing possible
and call your replace method on this part
when replacement is done you can concatenate part which shouldn't be changed (before index of second sample word) and part with replaced values
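The indexOf/substring steps above could be sketched roughly as follows. `ReplaceAfterFirst` and `replaceAllButFirst` are hypothetical names, and the early returns for an empty or missing target are assumptions about the desired behaviour.

```java
public class ReplaceAfterFirst {
    /** Replaces every occurrence of target except the first one. */
    public static String replaceAllButFirst(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // nothing sensible to search for
        }
        int first = text.indexOf(target);
        if (first < 0) {
            return text; // target does not occur at all
        }
        // Start looking for the second occurrence just past the first one.
        int second = text.indexOf(target, first + target.length());
        if (second < 0) {
            return text; // only one occurrence, nothing to replace
        }
        // Keep everything before the second occurrence, replace literally in the rest.
        return text.substring(0, second)
                + text.substring(second).replace(target, replacement);
    }

    public static void main(String[] args) {
        String s = "Example [2] This is a sample text. This is a sample text. This is a sample text.";
        System.out.println(replaceAllButFirst(s, "sample", "sImple"));
    }
}
```

Because it uses String.replace, the target is treated as a literal, not as a regular expression.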
Another solution is to use appendReplacement and appendTail from the Matcher class, and start substituting the replacement value once you have found the second sample word. The code for it can look like this:
String yourString = "Example [2] This is a sample text. This is a sample text. This is a sample text.";
Pattern p = Pattern.compile("sample", Pattern.LITERAL);
Matcher m = p.matcher(yourString);
StringBuffer sb = new StringBuffer();
boolean firstWordAlreadyFound = false;
while (m.find()) {
if (firstWordAlreadyFound) {
m.appendReplacement(sb, "sImple");
} else {
m.appendReplacement(sb, m.group());
firstWordAlreadyFound = true;
}
}
m.appendTail(sb);
String result = sb.toString();
System.out.println(result);
Output:
Example [2] This is a sample text. This is a sImple text. This is a sImple text.

Here is a naive approach:
public static String replaceAllButFirst(String text, String toReplace, String replacement) {
String[] parts = text.split(toReplace, 2);
if(parts.length == 2) { //Found at least one match
return parts[0] + toReplace + parts[1].replaceAll(toReplace, replacement);
} else { //no match found giving original text
return text;
}
}
public static void main(String[] args) {
String x = "This is a sample test. This is a sample test. This is a sample test";
System.out.println(replaceAllButFirst(x, "sample", "simple"));
}
Which will give:
This is a sample test. This is a simple test. This is a simple test

Try using the substring and indexOf methods to break the text into two strings, then replace in the second string, and finally join both strings back together.
Sample code:
String str = "Example [2] This is a sample text. This is a sample text. This is a sample text.";
String findWhat = "sample";
int index = str.indexOf(findWhat) + findWhat.length(); // index just past the first occurrence
String temp = str.substring(0, index); // first string
str = str.substring(index); // second string
//replace in second string and combine back
str = temp + str.replace(findWhat, "simple"); // final string
System.out.println(str);
Or combine it all into a few statements:
int index = str.indexOf(findWhat) + findWhat.length();
str = str.substring(0, index) + str.substring(index).replace(findWhat, "simple");

There is no built-in function that does exactly what you want, either in the String or StringBuilder classes. You'll need to write your own. Here's a quickie:
private string ReplaceText(string originalText, string textToReplace, string replacementText)
{
string tempText;
int firstIndex, lastIndex;
tempText = originalText;
firstIndex = originalText.IndexOf(textToReplace);
lastIndex = tempText.LastIndexOf(textToReplace);
while (firstIndex >= 0 && lastIndex > firstIndex)
{
tempText = tempText.Substring(0,lastIndex) + replacementText + tempText.Substring(lastIndex + textToReplace.Length);
lastIndex = tempText.LastIndexOf(textToReplace);
}
return tempText;
}

Another option:
(?<=\bsample\b)(.*?)\bsample\b
And replacement:
$1yourstring
Java Code:
String s=input.replaceAll("(?<=\\bsample\\b)(.*?)\\bsample\\b", "$1yourString");

split string method

I have written the following code in Java:
public class sstring
{
public static void main(String[] args)
{
String s="a=(b+c); string st='hello adeel';";
String[] ss=s.split("\\b");
for(int i=0;i<ss.length;i++)
System.out.println(ss[i]);
}
}
and the output of this code is
a
=(
b
+
c
);
string
st
='
hello
adeel
';
What should I do to split =( or ); etc. into two separate elements of the array rather than a single element? That is, my output should look like this:
a
=
(
b
+
c
)
;
string
st
=
'
hello
adeel
'
;
Is it possible?

With every find, this matches either a word \\w+ (lowercase w) or a single non-word character \\W (capital W).
It comes from an unaccepted answer to "can split string method of java return the array with the delimiters as well", the question mentioned in the comment above by @RohitJain.
public String[] getParts(String s) {
List<String> parts = new ArrayList<String>();
Pattern pattern = Pattern.compile("(\\w+|\\W)");
Matcher m = pattern.matcher(s);
while (m.find()) {
parts.add(m.group());
}
return parts.toArray(new String[parts.size()]);
}

Use this code:
Pattern pattern = Pattern.compile("(\\w+|\\W)");
Matcher m = pattern.matcher("a=(b+c); string st='hello adeel';");
while (m.find()) {
System.out.println(m.group());
}
