Splitting a String that has a particular structure

I have a string that looks something like this:
"330 Daniel T92435"
Now I need to obtain the name "Daniel", and I could simply type
string.substring(4,10);
But the position of the name ("Daniel") could vary.
And I don't want to use the split() method.
I was wondering if there is a way to make the substring method read data until a whitespace is found.

If the input string always has the structure "someSymbols Name someSymbols", you can use the following regular expression to extract the name:
"[^\\s]+\\s+(\\p{Alpha}+)\\s+[^\\s]+"
\\p{Alpha} - alphabetic character;
\\s - white space;
[^\\s] - any symbol apart from the white space.
In the code below, Pattern is an object representing the compiled regular expression. Matcher is an object that scans the given string and finds the parts of it that match the pattern.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static String findName(String source) {
Pattern pattern = Pattern.compile("[^\\s]+\\s+(\\p{Alpha}+)\\s+[^\\s]+");
Matcher matcher = pattern.matcher(source);
String result = "no match was found";
if (matcher.find()) {
result = matcher.group(1); // group 1 corresponds to the first element enclosed in parentheses (\\p{Alpha}+)
}
return result;
}
main()
public static void main(String[] args) {
System.out.println(findName("330 Daniel T92435"));
}
Output
Daniel

You can use the str.indexOf(" ") function.
int start = string.indexOf(" ") + 1;
String name = string.substring(start, start + 6); // "Daniel" has 6 characters
Edit: if you want to select the first word after the space and don't know how long it will be, you can use
int start = string.indexOf(" ") + 1;
int end = string.indexOf(" ", start);
String name = string.substring(start, end >= 0 ? end : string.length());
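As a rough sketch, the indexOf/substring approach can be wrapped in a small helper. The class and method names (SecondToken, secondToken) are hypothetical, and the sketch assumes tokens are separated by a single space; it returns null when there is no space at all.

```java
// Sketch: the indexOf/substring approach wrapped in a helper.
// SecondToken and secondToken are hypothetical names for this example.
class SecondToken {

    // Returns the text after the first space, up to the next space or the end
    // of the string. Returns null if the string contains no space.
    static String secondToken(String s) {
        int firstSpace = s.indexOf(' ');
        if (firstSpace < 0) {
            return null; // no space, so there is no second token
        }
        int start = firstSpace + 1;       // first character after the space
        int end = s.indexOf(' ', start);  // next space, or -1 if there is none
        return s.substring(start, end >= 0 ? end : s.length());
    }

    public static void main(String[] args) {
        System.out.println(secondToken("330 Daniel T92435")); // Daniel
        System.out.println(secondToken("7 Alexandra X1"));    // Alexandra
    }
}
```

With two consecutive spaces this sketch would return an empty string, so it only fits input with exactly one space between tokens.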

Related

Delete some part of the string in beginning and some at last in java

I want dynamic code that trims off part of the String at the beginning and part at the end. I am able to trim the end, but not the beginning up to a specific point: only the first character is deleted in the output.
public static String removeTextAndLastBracketFromString(String string) {
StringBuilder str = new StringBuilder(string);
int i=0;
do {
str.deleteCharAt(i);
i++;
} while(string.equals("("));
str.deleteCharAt(string.length() - 2);
return str.toString();
}
This is my code. When I pass Awaiting Research(5056) as an argument, the output is waiting Research(5056. I want to trim the initial part of such a string up to ( and get only the digits as output. My expected output here is 5056. Please help.

You don't need loops (in your code), you can use String.substring(int, int) in combination with String.indexOf(char):
public static void main(String[] args) {
// example input
String input = "Awaiting Research(5056)";
// find the braces and use their indexes to get the content
String output = input.substring(
input.indexOf('(') + 1, // index is exclusive, so add 1
input.indexOf(')')
);
// print the result
System.out.println(output);
}
Output:
5056
Hint:
Only use this if you are sure the input will always contain a ( and a ) with indexOf('(') < indexOf(')'), or handle the StringIndexOutOfBoundsException that substring throws for most Strings that don't meet this constraint.
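Following the hint, one possible defensive variant checks both indexes before calling substring and returns null instead of throwing. This is a sketch; ParenContent and between are hypothetical names, and returning null is just one design choice (an Optional or an exception with a clear message would also work).

```java
// Sketch: a defensive version of the indexOf/substring answer.
// ParenContent and between are hypothetical names for this example.
class ParenContent {

    // Returns the text between the first '(' and the first ')' after it,
    // or null when the input contains no such pair.
    static String between(String input) {
        int open = input.indexOf('(');
        if (open < 0) {
            return null;                          // no opening parenthesis
        }
        int close = input.indexOf(')', open + 1); // only look after the '('
        if (close < 0) {
            return null;                          // no closing parenthesis after it
        }
        return input.substring(open + 1, close);  // end index is exclusive
    }

    public static void main(String[] args) {
        System.out.println(between("Awaiting Research(5056)")); // 5056
        System.out.println(between("no braces here"));          // null
    }
}
```

Searching for ')' starting after the '(' also handles input such as ")x(", which would otherwise give substring a start index larger than its end index.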

If your goal is just to extract one numeric value from the string, match the numeric part with a regex; then you'll have the number separated from the rest of the string, e.g.:
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("somestringwithnumberlike123");
if(matcher.find()) {
System.out.println(matcher.group());
}

Using a regex to extract what you need is a better option:
String test = "Awaiting Research(5056)";
Pattern p = Pattern.compile("([0-9]+)");
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}

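For your case, it is better to use a regular expression to extract the part you are interested in. The lookbehind (?<=\\() and lookahead (?=\\)) match the positions right after ( and right before ) without including the parentheses in the result:
static String extract(String input) {
Pattern pattern = Pattern.compile("(?<=\\().*(?=\\))");
Matcher matcher = pattern.matcher(input); // e.g. "Awaiting Research(5056)"
if(matcher.find())
{
return matcher.group();
}
return null; // no parenthesized part found
}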

It is much easier to solve the problem using e.g. String.indexOf(..) and String.substring(from, to). But if for some reason you want to stick to your approach, here are some hints:
Your code does what it does because:
string.equals("(") is only true if the whole string is exactly "("
a do {code} while (condition) loop executes code once even if condition is false -> think about using a while (condition) {code} loop instead
if you change the condition to check the character at i, your code would remove the first, third, fifth character and so on: after the first iteration i is 1, and the char at i is now the third char of the original string (because the first has already been removed) -> think about always checking and removing charAt(0).
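Applying those hints to the question's StringBuilder code might look roughly like this. It is a sketch, not the asker's final code: the method name is adapted from the question, and it assumes the text ends with ')' when there is one to remove.

```java
// Sketch: the question's StringBuilder approach reworked along the hints:
// a while loop that always inspects and deletes index 0.
class TrimToParen {

    static String removeTextAndLastBracket(String string) {
        StringBuilder str = new StringBuilder(string);
        // Delete leading characters until the first one is '(' (or nothing is left).
        while (str.length() > 0 && str.charAt(0) != '(') {
            str.deleteCharAt(0);
        }
        // Delete the '(' itself, if present.
        if (str.length() > 0) {
            str.deleteCharAt(0);
        }
        // Delete a trailing ')' if the remaining text ends with one.
        if (str.length() > 0 && str.charAt(str.length() - 1) == ')') {
            str.deleteCharAt(str.length() - 1);
        }
        return str.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeTextAndLastBracket("Awaiting Research(5056)")); // 5056
    }
}
```

Each deleteCharAt(0) shifts the remaining characters left, which is why checking index 0 every time avoids the skip-every-other-character problem described above.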

How to substring before nth occurrence of a separator?

first;snd;3rd;4th;5th;6th;...
How can I cut the above after the third occurrence of the ; separator? Especially without having to value.split(";") the whole string into an array, as I won't need the values separated. I just need the first part of the string up to the nth occurrence.
Desired output would be:
first;snd;3rd
I just need that as a substring, not as split-up values.

Use StringUtils.ordinalIndexOf() from Apache Commons Lang:
Finds the n-th index within a String, handling null. This method uses String.indexOf(String).
Parameters:
str - the String to check, may be null
searchStr - the String to find, may be null
ordinal - the n-th searchStr to find
Returns:
the n-th index of the search String, -1 (INDEX_NOT_FOUND) if no match or null string input
Or this way, no libraries required:
public static int ordinalIndexOf(String str, String substr, int n) {
int pos = str.indexOf(substr);
while (--n > 0 && pos != -1)
pos = str.indexOf(substr, pos + 1);
return pos;
}
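To get the substring itself, the library-free helper above can be combined with substring. In this sketch BeforeNth and prefixBeforeNth are hypothetical names, and returning the whole string when there are fewer than n separators is an assumed design choice.

```java
// Sketch: substring before the n-th separator, using the ordinalIndexOf helper
// from this answer. BeforeNth and prefixBeforeNth are hypothetical names.
class BeforeNth {

    // Index of the n-th occurrence of substr in str, or -1 (same logic as the answer).
    static int ordinalIndexOf(String str, String substr, int n) {
        int pos = str.indexOf(substr);
        while (--n > 0 && pos != -1) {
            pos = str.indexOf(substr, pos + 1);
        }
        return pos;
    }

    // Everything before the n-th occurrence of sep; the whole string if there
    // are fewer than n occurrences (a design choice for this sketch).
    static String prefixBeforeNth(String str, String sep, int n) {
        int pos = ordinalIndexOf(str, sep, n);
        return pos >= 0 ? str.substring(0, pos) : str;
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeNth("first;snd;3rd;4th;5th;6th;", ";", 3)); // first;snd;3rd
    }
}
```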

I would go with this, easy and basic:
String test = "first;snd;3rd;4th;5th;6th;";
int result = 0;
for (int i = 0; i < 3; i++) {
result = test.indexOf(";", result) +1;
}
System.out.println(test.substring(0, result-1));
Output:
first;snd;3rd
You can of course change the 3 in the loop to the number of elements you need. This assumes the string contains at least that many ; separators.

If you want to use regular expressions, it is pretty straightforward (this example is in Python):
import re
value = "first;snd;3rd;4th;5th;6th;"
reg = r'^([\w]+;[\w]+;[\w]+)'
re.match(reg, value).group()
Outputs:
"first;snd;3rd"

You could use a regex with a negated character class to match, from the start of the string, characters that are not a semicolon.
Then repeat a grouping structure 2 times that matches a semicolon followed by 1+ non-semicolon characters.
^[^;]+(?:;[^;]+){2}
Explanation
^ Assert the start of the string
[^;]+ Negated character class matching any character except a semicolon, 1+ times
(?: Start non-capturing group
;[^;]+ Match a semicolon followed by 1+ non-semicolon characters
){2} Close non capturing group and repeat 2 times
For example:
String regex = "^[^;]+(?:;[^;]+){2}";
String string = "first;snd;3rd;4th;5th;6th;...";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(0)); // first;snd;3rd
}
See the Java demo

If you don't want to use split, just use indexOf in a for loop to find the index of the 3rd ";", then take the substring from the start of the string up to that index.
You can also split with a regex that matches the 3rd ;, but that is probably not the best solution.

If you need to do this frequently it is best to compile the regex upfront in a static Pattern instance:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class NthOccurance {
static Pattern pattern=Pattern.compile("^(([^;]*;){3}).*");
public static void main(String[] args) {
String in="first;snd;3rd;4th;5th;6th;";
Matcher m=pattern.matcher(in);
if (m.matches())
System.out.println(m.group(1));
}
}
Replace the 3 with the number of elements you want. Note that group(1) includes the trailing ; (here it prints first;snd;3rd;).
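If n is not fixed, the repetition count can be built into the pattern at runtime. This sketch generalizes the ^[^;]+(?:;[^;]+){2} idea; NthPrefixRegex and firstFields are hypothetical names. It compiles the pattern on every call, so for a fixed n a static Pattern, as suggested above, is preferable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: generalizes the ^[^;]+(?:;[^;]+){2} idea to any n.
// NthPrefixRegex and firstFields are hypothetical names for this example.
class NthPrefixRegex {

    // Returns the first n ';'-separated fields (without a trailing ';'),
    // or null if the input contains fewer than n - 1 semicolons.
    static String firstFields(String input, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // ^[^;]* matches the first field; (?:;[^;]*){n-1} matches the next n-1 fields.
        // [^;]* (rather than +) also accepts empty fields such as "a;;b".
        Pattern p = Pattern.compile("^[^;]*(?:;[^;]*){" + (n - 1) + "}");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstFields("first;snd;3rd;4th;5th;6th;", 3)); // first;snd;3rd
    }
}
```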

The code below finds the index of the 3rd occurrence of the ';' character and takes the substring before it.
String s = "first;snd;3rd;4th;5th;6th;";
String splitted = s.substring(0, s.indexOf(";", s.indexOf(";", s.indexOf(";") + 1) + 1));

attempting to return a range of words in a string using regex java

I am trying to return a range of words from a string using regular expressions, but I am lost. This is my attempt:
private static final String REGEX = "\\hello";
private static final String INPUT = "I love holidays hello how are you today during summer";
public static void main( String args[] ) {
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
int count = 0;
while(m.find()) {
count++;
System.out.println("Match number "+count);
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
System.out.println("value of m >> "+m);//to print >>> hello how are you
}
}
Does anyone know how I can return the words hello how are you from the string using regex?

Since the requirement is to start at hello and end at the first you, you could use the following expression: hello.*?you.
Here .*? means match anything, but as little as possible, so the match stops at the first you rather than the last.
If you want to prevent matches where hello and you are only parts of other words, surround them with the word boundary \b: \bhello\b.*?\byou\b.
If you want to match only within a sentence, i.e. the match should not include ., ! or ?, you could use a negated character class like [^.!?], i.e. replace .*? with [^.!?]*?. Note that inside the character class . has its literal meaning: it is the dot, not the "any character" wildcard.
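A small Java sketch of these expressions, with the backslashes doubled as Java string literals require. RangeOfWords and firstRange are hypothetical names; firstRange simply returns the first match or null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: lazy vs. sentence-limited ranges. RangeOfWords and firstRange are
// hypothetical names for this example.
class RangeOfWords {

    // Returns the first substring of input matching regex, or null.
    static String firstRange(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "I love holidays hello how are you today during summer";
        // Lazy .*? stops at the first whole-word "you" after "hello".
        System.out.println(firstRange(input, "\\bhello\\b.*?\\byou\\b")); // hello how are you
        // Sentence-limited variant: the range may not cross '.', '!' or '?'.
        System.out.println(firstRange("hello there. Are you ok", "\\bhello\\b[^.!?]*?\\byou\\b")); // null
    }
}
```

Compare greedy and lazy on "hello you and you": hello.*you returns the whole string, while hello.*?you returns "hello you".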

Parsing text using Regex

So I am trying to parse a String that contains two key components: one gives the timing option, and the other the position.
Here is what the text looks like:
KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif
The {iiii} is the position and the {ttt} is the timing options.
I need to separate out the {ttt} and {iiii} so I can build a full file name, for example position 1 and time slice 1 = KB_H9Oct4GFP_20130305_p000001t00000001z001c02.tif
So far here is how I am parsing them:
int startTimeSlice = 1;
int startTile = 1;
String regexTime = "([^{]*)\\{([t]+)\\}(.*)";
Pattern patternTime = Pattern.compile(regexTime);
Matcher matcherTime = patternTime.matcher(filePattern);
if (!matcherTime.find() || matcherTime.groupCount() != 3)
{
throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
}
String timePrefix = matcherTime.group(1);
int tCount = matcherTime.group(2).length();
String timeSuffix = matcherTime.group(3);
String timeMatcher = timePrefix + "%0" + tCount + "d" + timeSuffix;
String timeFileName = String.format(timeMatcher, startTimeSlice);
String regex = "([^{]*)\\{([i]+)\\}(.*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(timeFileName);
if (!matcher.find() || matcher.groupCount() != 3)
{
throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
}
String prefix = matcher.group(1);
int iCount = matcher.group(2).length();
String suffix = matcher.group(3);
String nameMatcher = prefix + "%0" + iCount + "d" + suffix;
String fileName = String.format(nameMatcher, startTile);
Unfortunately my code is not working: it fails when checking whether the second matcher finds anything in timeFileName.
After the first regex check, timeFileName is 000000001z001c02.tif, so the beginning portions, including the {iiii}, are cut off.
Unfortunately I cannot assume which group comes first ({iiii} or {ttt}), so I am trying to devise a solution that handles {ttt} first and then processes {iiii}.
Also, here is another example of valid text that I am also trying to parse: F_{iii}_{ttt}.tif

Steps to follow:
Find string {ttt...} in file name
Form a number format based on the number of "t"s in the string
Find string {iiii...} in file name
Form a number format based on the number of "i"s in the string
Use the String.replace() method to replace the time and position
Here is the code:
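import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String filePattern = "KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif";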
int startTimeSlice = 1;
int startTile = 1;
Pattern patternTime = Pattern.compile("(\\{[t]*\\})");
Matcher matcherTime = patternTime.matcher(filePattern);
if (matcherTime.find()) {
String timePattern = matcherTime.group(0);// {ttt}
NumberFormat timingFormat = new DecimalFormat(timePattern.replaceAll("t", "0")
.substring(1, timePattern.length() - 1));// 000
Pattern patternPosition = Pattern.compile("(\\{[i]*\\})");
Matcher matcherPosition = patternPosition.matcher(filePattern);
if (matcherPosition.find()) {
String positionPattern = matcherPosition.group(0);// {iiii}
NumberFormat positionFormat = new DecimalFormat(positionPattern
.replaceAll("i", "0").substring(1, positionPattern.length() - 1));// 0000
System.out.println(filePattern.replace(timePattern,
timingFormat.format(startTimeSlice)).replace(positionPattern,
positionFormat.format(startTile)));
}
}

Okay, so after a bit of testing I found a way to handle this case:
For parsing the {ttt} I can use the regex: (.*)\\{t([t]+)\\}(.*)
This means I have to increment tCount by one to account for the t I grab with \\{t
The same goes for {iii}: (.*)\\{i([i]+)\\}(.*)

Your first pattern looks like this:
String regexTime = "([^{]*)\\{([t]+)\\}(.*)";
This finds a string consisting of a sequence of zero or more non-{ characters, followed by {t...t}, followed by other characters.
When your input is
KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif
the first substring that matches is
iiii}t00000{ttt}z001c02.tif
The { before the i's can't match, because you told it only to match non-{ characters. The result is that when you re-form the string to do the second match, it will start with iiii} and therefore won't match {iiii} like you're trying to do.
When you're looking for {ttt...}, I don't see any reason to exclude { or any other character from the first part of the string. So changing the regex to
"^(.*)\\{(t+)\\}(.*)$"
may be a simple way to fix this. Note that if you want to make sure you include the entire beginning of the string and the entire end of the string in your groups, you should include ^ and $ to match the beginning and end of the string, respectively; otherwise the matcher engine may decide not to include everything. In this case, it won't, but it's a good habit to get into anyway, because that makes things explicit and doesn't require anyone to know the difference between "greedy" and "reluctant" matching. Or use matches() instead of find(), since matches() automatically tries to match the entire string.
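A sketch of the two-pass approach with the corrected pattern and matches(): the greedy (.*) prefix may contain {, so the order of the placeholders doesn't matter. FilePatternFill and fill are hypothetical names. The sketch assumes each placeholder letter appears in exactly one {…} group and that the letter is not a regex metacharacter. It formats only the number and concatenates the prefix and suffix, so a % in the file name cannot break String.format.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: fill a {ccc} placeholder with a zero-padded number.
// FilePatternFill and fill are hypothetical names for this example.
class FilePatternFill {

    // Replaces the {c...c} placeholder with value, zero-padded to the
    // placeholder's width. Returns the input unchanged if there is none.
    static String fill(String filePattern, char c, int value) {
        // Greedy (.*) may contain '{', so earlier placeholders stay in the prefix.
        Pattern p = Pattern.compile("^(.*)\\{(" + c + "+)\\}(.*)$");
        Matcher m = p.matcher(filePattern);
        if (!m.matches()) {
            return filePattern;
        }
        int width = m.group(2).length();
        // Locale.ROOT keeps ASCII digits regardless of the default locale.
        String number = String.format(Locale.ROOT, "%0" + width + "d", value);
        return m.group(1) + number + m.group(3);
    }

    public static void main(String[] args) {
        String name = "KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif";
        System.out.println(fill(fill(name, 't', 1), 'i', 1));
        // KB_H9Oct4GFP_20130305_p000001t00000001z001c02.tif
        System.out.println(fill(fill("F_{iii}_{ttt}.tif", 't', 7), 'i', 12)); // F_012_007.tif
    }
}
```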

Perhaps an easier way to do this (as confirmed by http://regex101.com/r/vG7kY7) is
(\{i+\}).*(\{t+\})
You don't need the [] around a single character you are matching. Keep it simple: i+ means "one or more i's", and as long as the placeholders are in this order, the expression will work (with the first group being {iiii} and the second {ttt}).
In a Java string literal you need to escape the backslashes: "(\\{i+\\}).*(\\{t+\\})".
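Here is a sketch of that pattern in Java. It also shows the order limitation: if {ttt} comes before {iii}, find() fails. PlaceholderPair and placeholders are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the (\{i+\}).*(\{t+\}) pattern in Java.
// PlaceholderPair and placeholders are hypothetical names for this example.
class PlaceholderPair {

    // Java string literal for (\{i+\}).*(\{t+\}): every backslash is doubled.
    static final Pattern PAIR = Pattern.compile("(\\{i+\\}).*(\\{t+\\})");

    // Returns {positionPlaceholder, timePlaceholder}, or null if they are
    // missing or appear in the other order.
    static String[] placeholders(String name) {
        Matcher m = PAIR.matcher(name);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] r = placeholders("KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif");
        System.out.println(r[0] + " " + r[1]);                  // {iiii} {ttt}
        System.out.println(placeholders("F_{ttt}_{iii}.tif"));  // null
    }
}
```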

How to find the text between ( and )

I have a few strings which are like this:
text (255)
varchar (64)
...
I want to find the number between ( and ) and store it in a string, i.e. store these lengths as strings.
I have the rest of it figured out except for the regex parsing part.
I'm having trouble figuring out the regex pattern.
How do I do this?
The code is going to look like this:
Matcher m = Pattern.compile("<I CANT FIGURE OUT WHAT COMES HERE>").matcher("text (255)");
Also, is there a regex cheat sheet from which one can directly pick up patterns?

I would use a plain string match:
String s = "text (255)";
int start = s.indexOf('(')+1;
int end = s.indexOf(')', start);
if (end < 0) {
// not found
} else {
int num = Integer.parseInt(s.substring(start, end));
}
You can use a regex, as sometimes that makes your code simpler, but that doesn't mean you should in all cases. I suspect this is one where a simple indexOf and substring will be not only faster and shorter but, more importantly, easier to understand.

You can use this pattern to match any text between parentheses:
\(([^)]*)\)
Or this to match just numbers (with possible whitespace padding):
\(\s*(\d+)\s*\)
Of course, to use this in a string literal, you have to escape the \ characters:
Matcher m = Pattern.compile("\\(\\s*(\\d+)\\s*\\)")...
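Completed into a runnable sketch, the whitespace-tolerant pattern might be used like this. ColumnLength and length are hypothetical names, and returning -1 when there is no number is an assumed convention. Integer.parseInt would throw for numbers that don't fit in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract the length from types like "varchar (64)".
// ColumnLength and length are hypothetical names for this example.
class ColumnLength {

    // \(\s*(\d+)\s*\) written as a Java string literal.
    static final Pattern LENGTH = Pattern.compile("\\(\\s*(\\d+)\\s*\\)");

    // Returns the number inside the parentheses, or -1 if there is none.
    static int length(String type) {
        Matcher m = LENGTH.matcher(type);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(length("text (255)"));     // 255
        System.out.println(length("varchar ( 64 )")); // 64
        System.out.println(length("int"));            // -1
    }
}
```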

Here is some example code:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="varchar (64)";
String re1=".*?"; // Non-greedy match on filler
String re2="\\((\\d+)\\)"; // Round Braces 1
Pattern p = Pattern.compile(re1+re2,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String rbraces1=m.group(1);
System.out.print("("+rbraces1.toString()+")"+"\n");
}
}
}
This prints the first (int) it finds in the input string txt, here (64).
The core regex is \((\d+)\), which matches the digits between ( and ).

int index1 = string.indexOf("(");
int index2 = string.indexOf(")");
String intValue = string.substring(index1 + 1, index2); // end index is exclusive

Matcher m = Pattern.compile("\\((\\d+)\\)").matcher("text (255)");
if (m.find()) {
int len = Integer.parseInt (m.group(1));
System.out.println (len);
}
