Search a space after a particular string in Java

String text = "select ename from emp";
I want to know the space index after the word from. How to do it?

If you're specifically looking for the index of the first space after the word "from", you can use:
text.substring(text.indexOf("from")).indexOf(' ');
If you're trying to do something more general, then you'll need to give a bit more information. But the indexOf() method will probably be very useful to you.
Edit: This should actually be
text.indexOf(' ', text.indexOf("from"));
The first version returns the index relative to "from", whereas the second returns the index relative to the original string. (thanks @jpm)
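To make the corrected one-liner a bit safer, here is a minimal sketch that wraps it in a helper. The class and method names (SpaceAfterWord, indexOfSpaceAfter) are made up for illustration. It guards against the word being absent, because String.indexOf(int, int) treats a negative fromIndex as 0 and would otherwise quietly return the first space in the whole string.

```java
// Hypothetical helper; the names are illustrative and not from the answers above.
class SpaceAfterWord {

    // Returns the index (relative to text) of the first space after the first
    // occurrence of word, or -1 if the word or a following space is missing.
    static int indexOfSpaceAfter(String text, String word) {
        int wordStart = text.indexOf(word);
        if (wordStart < 0) {
            return -1; // word not present; do not fall through to indexOf(' ', -1)
        }
        // Start searching just past the word so spaces inside it are skipped.
        return text.indexOf(' ', wordStart + word.length());
    }

    public static void main(String[] args) {
        String text = "select ename from emp";
        System.out.println(indexOfSpaceAfter(text, "from"));           // 17
        System.out.println(indexOfSpaceAfter("select ename", "from")); // -1
    }
}
```

Starting the search at wordStart + word.length() rather than at wordStart only matters when the search word itself contains a space, but it costs nothing.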
This loop will find all space characters in the given string:
int index = text.indexOf(' ');
while (index >= 0) {
System.out.println(index);
index = text.indexOf(' ', index + 1);
}

The very basic answer might look something like...
String text = "select ename from emp";
text = text.toLowerCase();
if (text.contains("from ")) {
int index = text.indexOf("from ") + "from".length();
System.out.println("Found space # " + index);
System.out.println(text.substring(index));
} else {
System.out.println(text + " does not contain `from `");
}
Or you could use a regular expression (this is a rather poor example, but hey)
Pattern pattern = Pattern.compile("from ");
Matcher matcher = pattern.matcher(text);
String match = null;
int endIndex = -1;
if (matcher.find()) {
endIndex = matcher.end();
}
if (endIndex > -1) {
endIndex--;
System.out.println("Found space # " + endIndex);
System.out.println(text.substring(endIndex));
} else {
System.out.println(text + " does not contain `from `");
}
To find the index of each space you could do something like...
Pattern pattern = Pattern.compile(" ");
Matcher matcher = pattern.matcher(text);
String match = null;
while (matcher.find()) {
System.out.println(matcher.start());
}
Which will output
6
12
17

Use the indexOf() method.

Related

Looking for punctuation marks in a string and then finding their index to do a substring

I have some strings from which I need to extract a substring, based on either the first occurrence of a punctuation mark or the first occurrence of a digit. E.g.,
from Taltz 80mg autoinjector I need to extract Taltz, and from Trulicity 0.75mg, weekly I need to extract Trulicity.
Here's my code:
char [] punctuations = {'.' , ',' , ';' , ':','"' , '\'' ,'/', ')' , '('};
String value = "Taltz, 80mg autoinjector";
int pos = value.replaceFirst("^(\\D+).*$", "$1").length();
for(int j = 0; j < value.length(); j++) {
for (int k = 0; k < punctuations.length;k++){
if(value.charAt(j) == punctuations[k]){
value = value.substring(0,value.indexOf(punctuations[k]));
break;
}
}
}
if(value.matches(".*\\d+.*")){
value = value.substring(0, pos);
}
System.out.println(value);
Is there a more efficient way to do this?
You can define the part that you want to keep and capture it with a regex :
String s = "Taltz test 80mg autoinjector";
Pattern pattern = Pattern.compile("([a-zA-Z ]+).*");
Matcher matcher = pattern.matcher(s);
if(matcher.matches()) {
System.out.println("matches : " + matcher.group(1).trim());
} else {
System.out.println("Does not match");
}
Output :
matches : Taltz test
You can also capture everything that is "neither a punctuation sign nor a digit" with the following regex :
Pattern pattern = Pattern.compile("([^0-9;,:.?]+).*");
(same output)
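For the inputs in the question, the same idea can be packed into a small helper. This is only a sketch: the class and method names (LeadingName, extract) are made up, and the character class simply lists the digits plus the punctuation marks from the question's own array.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the answer above.
class LeadingName {

    // Matches everything from the start of the string up to (not including)
    // the first digit or one of the punctuation marks . , ; : " ' / ( )
    private static final Pattern PREFIX = Pattern.compile("^[^0-9.,;:\"'/()]+");

    // Returns the leading text before the first digit or punctuation mark,
    // trimmed, or an empty string if the input starts with one of them.
    static String extract(String value) {
        Matcher m = PREFIX.matcher(value);
        return m.find() ? m.group().trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("Taltz 80mg autoinjector"));  // Taltz
        System.out.println(extract("Trulicity 0.75mg, weekly")); // Trulicity
        System.out.println(extract("Taltz, 80mg autoinjector")); // Taltz
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every value, which is probably where most of the saving over the nested loops comes from.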

How to extract multiple quoted substrings in Java

I have a string containing multiple substrings that have to be extracted. The strings to extract are enclosed in ' characters.
I could only extract the first or the last one when I used indexOf or regex.
How can I extract all of them and put them into an array or list without parsing the same string repeatedly?
resultData = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
I have tried the following:
protected static String errorsTPrinted(String errStr, int errCode) {
if (errCode== 202 ) {
ArrayList<String> ar = new ArrayList<String>();
Pattern p = Pattern.compile("'(.*?)'");
Matcher m = p.matcher(errStr);
String text;
for (int i = 0; i < errStr.length(); i++) {
m.find();
text = m.group(1);
ar.add(text);
}
return errStr = "Err 202: " + ar.get(0) + " ... " + ar.get(1) + " ..." + ar.get(2) + " ... " + ar.get(3);
}
Edit
I used @MinecraftShamrock's approach.
if (errCode== 202 ) {
List<String> getQuotet = getQuotet(errStr, '\'');
return errStr = "Err 202: " + getQuotet.get(0) + " ... " + getQuotet.get(1) + " ..." + getQuotet.get(2) + " ... " + getQuotet.get(3);
}
You could use this very straightforward algorithm to do so and avoid regex (as one can't be 100% sure about its complexity):
public List<String> getQuotet(final String input, final char quote) {
final ArrayList<String> result = new ArrayList<>();
int n = -1;
for(int i = 0; i < input.length(); i++) {
if(input.charAt(i) == quote) {
if(n == -1) { //not currently inside quote -> start new quote
n = i + 1;
} else { //close current quote
result.add(input.substring(n, i));
n = -1;
}
}
}
return result;
}
This works with any desired quote character and has a runtime complexity of O(n). If the string ends with an open quote, that unterminated part will not be included. However, this can be added quite easily.
I think this is preferable over regex as you can be absolutely sure about its complexity. Also, it works with a minimum of library classes. If you care about efficiency for big inputs, use this.
And last but not least, it does not care at all about what is between two quote characters, so it works with any input string.
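Here is one way the "open quote at the end" case could be added. It is a sketch rather than the original author's code: the class name, the method name and the keepUnterminated flag are all assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the loop above; the names and the flag are illustrative.
class QuoteExtractor {

    // Collects the text between pairs of quote characters in a single pass.
    // If keepUnterminated is true, text after a trailing unmatched quote is
    // added as a final element instead of being dropped.
    static List<String> quoted(String input, char quote, boolean keepUnterminated) {
        List<String> result = new ArrayList<>();
        int start = -1; // index just after the opening quote, or -1 when outside a quote
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != quote) {
                continue;
            }
            if (start == -1) {
                start = i + 1;                          // opening quote
            } else {
                result.add(input.substring(start, i));  // closing quote
                start = -1;
            }
        }
        if (keepUnterminated && start != -1) {
            result.add(input.substring(start));         // unmatched trailing quote
        }
        return result;
    }

    public static void main(String[] args) {
        String resultData = "Error 205: 'x' data is not crawled yet. "
                + "Check 'y' and 'z' data and update dataset 't'";
        System.out.println(quoted(resultData, '\'', false)); // [x, y, z, t]
        System.out.println(quoted("a 'b' 'c", '\'', true));  // [b, c]
    }
}
```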
Simply use the pattern:
'([^']++)'
And a Matcher like so:
final Pattern pattern = Pattern.compile("'([^']++)'");
final Matcher matcher = pattern.matcher(resultData);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
This loops through each match in the String and prints it.
Output:
x
y
z
t
Here is a simple approach (assuming there are no escaping characters etc.):
// Compile a pattern to find the wanted strings
Pattern p = Pattern.compile("'([^']+)'");
// Create a matcher for given input
Matcher m = p.matcher(resultData);
// A list to put the found strings into
List<String> list = new ArrayList<String>();
// Loop over all occurrences
while(m.find()) {
// Retrieve the matched text
String text = m.group(1);
// Do something with the text, e.g. add it to a List
list.add(text);
}

Matcher to avoid words ending with s, ing, or words in the middle

I am trying to match a text against a glossary list. The problem is that my pattern shows different behaviour for one text.
For example, here is my text:
\nfor Sprints \nSprints \nSprinting \nAccount Accounts Accounting\nSprintsSprints
With the following pattern I try to find only exact word matches from the glossary and avoid matching words that end with s, ing, etc. It returns the right answer only for the word "Account"; if I try "Sprint", it also matches Sprints, Sprinting, etc., which is not right:
Pattern findTerm = Pattern.compile("(" + item.getTerm() + ")(\\W)",Pattern.DOTALL);
and here is my code :
private static String findGlossaryTerms(String response, List<Glossary> glossary) {
StringBuilder builder = new StringBuilder();
for (int offset = 0; offset < response.length(); offset++) {
boolean match = false;
if (response.startsWith("<", offset)) {
String newString = response.substring(offset);
Pattern findHtmlTag = Pattern.compile("\\<.*?\\>");
Matcher matcher = findHtmlTag.matcher(newString);
if (matcher.find()) {
String htmlTag = matcher.group(0);
builder.append(htmlTag);
offset += htmlTag.length() - 1;
match = true;
}
}
for (Glossary item : glossary) {
if (response.startsWith(item.getTerm(), offset)) {
String textFromOffset = response.substring(offset - 1);
Pattern findTerm = Pattern.compile("(" + item.getTerm() + ")(\\W)",Pattern.DOTALL);
Matcher matcher = findTerm.matcher(textFromOffset);
if (matcher.find()) {
builder.append("<span class=\"term\">").append(item.getTerm()).append("</span>");
offset += item.getTerm().length() - 1;
match = true;
break;
}
}
}
if (!match)
builder.append(response.charAt(offset));
}
return builder.toString();
}
What is the \\W in your pattern good for? If it is just to ensure that the word ends, then use word boundaries instead:
Pattern findTerm = Pattern.compile("(\\b" + item.getTerm() + "\\b)",Pattern.DOTALL);
Those word boundaries ensure that you are really matching the complete word and don't get partial matches.
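A small self-contained check of the word-boundary behaviour on the text from the question might look like this. The class and method names are made up. Pattern.quote is an addition of mine, not part of the answer: it keeps regex metacharacters in a glossary term from being interpreted. Note that \b only behaves as expected when the term starts and ends with a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the answer above.
class WholeWord {

    // Builds a pattern that matches term only as a complete word.
    static Pattern wholeWord(String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b");
    }

    // Counts the whole-word occurrences of term in text.
    static int count(String text, String term) {
        Matcher m = wholeWord(term).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "\nfor Sprints \nSprints \nSprinting \n"
                + "Account Accounts Accounting\nSprintsSprints";
        System.out.println(count(text, "Sprint"));  // 0: only Sprints/Sprinting occur
        System.out.println(count(text, "Account")); // 1
        System.out.println(count(text, "Sprints")); // 2: SprintsSprints is not counted
    }
}
```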

Find the last index of the first match in Java

Let's say I have this string
String s = "stackjomvammssastackvmlmvlrstack";
and I want to find the last index of the first match of the substring "stack", which is index 4 in my example.
How will I do that?
Here's what I have done so far
Matcher m = pattern.matcher(s);
int i=0;
while (m.find())
{
System.out.println(m.start());
System.out.println(m.end());
}
But that displays the last index of the last match.
You can simply find the position of the word and add its length:
String s = "stackjomvammssastackvmlmvlrstack";
String match = "stack";
int start = s.indexOf(match);
int end = (start + match.length() - 1);
System.out.println(match + " found at index " + start);
System.out.println("Index of last character of first match is " + end);
If you need to use a regex, your code is close to the solution - you could do this:
String s = "stackjomvammssastackvmlmvlrstack";
String match = "s.*?k";
Matcher m = Pattern.compile(match).matcher(s);
if (m.find()) {
System.out.println(m.end() - 1);
}
Try this code:
if (m.find())
System.out.println(m.end() - 1);
From the Javadoc of Matcher.end():
Returns the offset after the last character matched.
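Since end() is exclusive, the "last index" is end() - 1. A minimal sketch that wraps this up, with a made-up class and method name, and with Pattern.quote added on the assumption that the search text should be taken literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the answers above.
class FirstMatchEnd {

    // Returns the index of the last character of the first occurrence of
    // literal in s, or -1 if there is no occurrence.
    static int lastIndexOfFirstMatch(String s, String literal) {
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(s);
        // find() is called once, so only the first match is considered;
        // end() is the offset after the match, hence the - 1.
        return m.find() ? m.end() - 1 : -1;
    }

    public static void main(String[] args) {
        String s = "stackjomvammssastackvmlmvlrstack";
        System.out.println(lastIndexOfFirstMatch(s, "stack"));     // 4
        System.out.println(lastIndexOfFirstMatch("abc", "stack")); // -1
    }
}
```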
If I understood your question right, you are trying to find the first match of "stack" and then get its last index.
You can accomplish this by doing:
String toFind = "stack";
int firstOccurrence = s.indexOf(toFind); // s is the string you posted above
int whatYoureLookingFor = firstOccurrence + toFind.length() - 1;
Hope it helps! ;)
This way?
String s ="stackjomvammssastackvmlmvlrstack";
String word = "stack";
int index = s.indexOf(word) + word.length() - 1;
final String input = "stackjomvammssastackvmlmvlrstack";
final String stack = "stack";
int start = input.indexOf(stack);
int end = start + stack.length() - 1;
String s = "stackjomvammssastackvmlmvlrstack";
String stack = "stack";
int index = s.indexOf(stack) + stack.length() - 1; // indexOf + length() alone is the exclusive end (5), so subtract 1
I have edited my previous answer:
String s = "stackjomvammssastackvmlmvlrstack";
Pattern pattern = Pattern.compile("stack");
Matcher m = pattern.matcher(s);
if (m.find()) // if rather than while, so only the first match is reported
{
System.out.println(m.start());
System.out.println(m.end() - 1); // end() is exclusive, so subtract 1
}

How to Split a string in java based on limit

I have the following String and I want to split it into a number of substrings (taking ',' as the delimiter) when its length reaches 36. The split does not happen exactly at the 36th position.
String message = "This is some(sampletext), and has to be splited properly";
I want to get the output as two substrings follows:
1. 'This is some (sampletext)'
2. 'and has to be splited properly'
Thanks in advance.
A solution based on regex:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile(".{1,15}\\b");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(0).trim());
}
Update:
trim() can be dropped by changing the pattern to end in a space or the end of the string:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile("(.{1,15})\\b( |$)");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(1));
}
group(1) means that I only need the first part of the pattern (.{1,15}) as output.
.{1,15} - a sequence of any characters (".") with any length between 1 and 15 ({1,15})
\b - a word boundary (the position between a word character and a non-word character)
( |$) - space or end of string
In addition I've added () surrounding .{1,15} so I can use it as a whole group (m.group(1)).
Depending on the desired result, this expression can be tweaked.
Update:
If you want to split the message by comma only if its length would be over 36, try the following expression:
Pattern splitPattern = Pattern.compile("(.{1,36})\\b(,|$)");
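One caveat for the message in the question: a \b placed directly before the comma does not match there, because ')' and ',' are both non-word characters and so there is no word boundary between them. The sketch below therefore leaves \b out. It is a variant under that assumption, not the answer's exact pattern, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the answer above.
class CommaSplit {

    // Up to 36 characters, followed by a comma or the end of the string.
    private static final Pattern CHUNK = Pattern.compile("(.{1,36})(,|$)");

    // Returns the chunks found by the pattern, with surrounding spaces trimmed.
    static List<String> split(String message) {
        List<String> parts = new ArrayList<>();
        Matcher m = CHUNK.matcher(message);
        while (m.find()) {
            parts.add(m.group(1).trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String message = "This is some(sampletext), and has to be splited properly";
        // Prints: [This is some(sampletext), and has to be splited properly]
        // (a list of two elements; the ", " in the output is the list separator)
        System.out.println(split(message));
    }
}
```

Be aware that find() skips any text it cannot match, so a stretch longer than 36 characters with no comma would lose its leading characters; whether that is acceptable depends on the input.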
The best solution I can think of is to make a function that iterates through the string. In the function you could keep track of whitespace characters, and for each 16th position you could add a substring to a list based on the position of the last encountered whitespace. After it has found a substring, you start anew from the last encountered whitespace. Then you simply return the list of substrings.
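The idea described above could be sketched roughly like this. The names (WordWrap, wrap, maxLen) are made up, and the hard split for a word longer than the limit is my own assumption about what should happen in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative and not from the answer above.
class WordWrap {

    // Splits text into chunks of at most maxLen characters, breaking at the
    // last whitespace character that still keeps the chunk within the limit.
    static List<String> wrap(String text, int maxLen) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (text.length() - start > maxLen) {
            // Scan backwards from the limit for a whitespace character.
            int breakAt = -1;
            for (int i = start + maxLen; i > start; i--) {
                if (Character.isWhitespace(text.charAt(i))) {
                    breakAt = i;
                    break;
                }
            }
            if (breakAt == -1) {
                // Assumption: a word longer than maxLen is cut at the limit.
                lines.add(text.substring(start, start + maxLen));
                start += maxLen;
            } else {
                lines.add(text.substring(start, breakAt));
                start = breakAt + 1; // continue after the whitespace
            }
        }
        lines.add(text.substring(start)); // whatever is left fits in one chunk
        return lines;
    }

    public static void main(String[] args) {
        String message = "This is some sample text and has to be splited properly";
        // Prints: [This is some, sample text and, has to be, splited properly]
        System.out.println(wrap(message, 16));
    }
}
```

This assumes words are separated by single whitespace characters; runs of several spaces would leave leading spaces in a chunk.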
Here's a tidy answer:
String message = "This is some sample text and has to be splited properly";
String[] temp = message.split("(?<=^.{1,16}) ");
String part1 = message.substring(0, message.length() - temp[temp.length - 1].length() - 1);
String part2 = message.substring(message.length() - temp[temp.length - 1].length());
This should work on all inputs, except when there are sequences of chars without whitespace longer than 16. It also creates the minimum amount of extra Strings by indexing into the original one.
public static void main(String[] args) throws IOException
{
String message = "This is some sample text and has to be splited properly";
List<String> result = new ArrayList<String>();
int start = 0;
while (start + 16 < message.length())
{
int end = start + 16;
while (!Character.isWhitespace(message.charAt(end--)));
result.add(message.substring(start, end + 1));
start = end + 2;
}
result.add(message.substring(start));
System.out.println(result);
}
If you have a simple text as the one you showed above (words separated by blank spaces) you can always think of StringTokenizer. Here's some simple code working for your case:
public static void main(String[] args) {
String message = "This is some sample text and has to be splited properly";
while (message.length() > 0) {
String token = "";
StringTokenizer st = new StringTokenizer(message);
while (st.hasMoreTokens()) {
String nt = st.nextToken();
String foo = "";
if (token.length()==0) {
foo = nt;
}
else {
foo = token + " " + nt;
}
if (foo.length() < 16)
token = foo;
else {
System.out.print("'" + token + "' ");
message = message.substring(token.length() + 1, message.length());
break;
}
if (!st.hasMoreTokens()) {
System.out.print("'" + token + "' ");
message = message.substring(token.length(), message.length());
}
}
}
}
