java create variable from regex findings - java

I'm pretty new to Java, and I am looking to create a String variable from a regex match, but I am not sure how.
Basically I need: previous_identifier = (all the text in the next line up to the third comma);
Something maybe like this?
previous_identifier = line.split("^(.+?),(.+?),(.+?),");
Or:
line = reader.readLine();
Pattern courseColumnPattern = Pattern.compile("^(.+?),(.+?),(.+?),");
previous_identifier = (courseColumnPattern.matcher(line).find());
But I know that won't work. What should I do differently?

You can use split to return an array of Strings, then use a StringBuilder to build your return string. An advantage of this approach is being able to easily return the first four strings, two strings, ten strings, etc.
int limit = 3, current = 0;
StringBuilder sb = new StringBuilder();
// Used as an example of input
String str = "test,west,best,zest,jest";
String[] strings = str.split(",");
for(String s : strings) {
if(++current > limit) {
// We've reached the limit; bail
break;
}
if(current > 1) {
// Add a comma if it's not the first element. Alternative is to
// append a comma each time after appending s and remove the last
// character
sb.append(",");
}
sb.append(s);
}
System.out.println(sb.toString()); // Prints "test,west,best"
If you don't need to use the three elements separately (you truly want just the first three elements in a chunk), you can use a Matcher with the following regular expression:
String str = "test, west, best, zest, jest";
// Matches against "non-commas", then a comma, then "non-commas", then
// a comma, then "non-commas". This way, you still don't have a trailing
// comma at the end.
Matcher match = Pattern.compile("^([^,]*,[^,]*,[^,]*)").matcher(str);
if(match.find())
{
// Print out the output!
System.out.println(match.group(1));
}
else
{
// We didn't have a match. Handle it here.
}
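The Matcher approach above can be generalized to any number of fields. The following is only a sketch: the class name FirstFields and the helper firstFields are made up for illustration, and it assumes that a line with fewer than n fields should yield null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstFields {
    // Hypothetical helper (not from the answers above): returns the first n
    // comma-separated fields of line as one string, without a trailing comma,
    // or null if line has fewer than n fields.
    static String firstFields(String line, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // One run of non-commas, then (n - 1) repetitions of ",non-commas"
        Pattern p = Pattern.compile("^([^,]*(?:,[^,]*){" + (n - 1) + "})");
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String previousIdentifier = firstFields("test,west,best,zest,jest", 3);
        System.out.println(previousIdentifier);         // prints "test,west,best"
        System.out.println(firstFields("only,two", 3)); // prints "null"
    }
}
```

Compiling the pattern on every call keeps the sketch short; if the field count is fixed, the Pattern could be compiled once and reused.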

Your regex will work, but could be expressed more briefly. This is how you can "extract" it:
String head = str.replaceAll("((.+?,){3}).*", "$1");
This matches the whole string, while capturing the target, with the replacement being the captured input using a back reference to group 1.
Despite the downvote, here's proof the code works!
String str = "foo,bar,baz,other,stuff";
String head = str.replaceAll("((.+?,){3}).*", "$1");
System.out.println(head);
Output:
foo,bar,baz,

Try an online regex tester to work out the regex. I think you need fewer brackets to get the entire text; I'd guess something like:
([^,]+,[^,]+,[^,]+)
which says: find everything except a comma, then a comma, then everything but a comma, then a comma, then everything else that isn't a comma. I suspect this can be improved dramatically; I am not a regex expert.
Then your Java just needs to compile it and match against your string:
line = reader.readLine();
Pattern courseColumnPattern = Pattern.compile("([^,]+,[^,]+,[^,]+)");
Matcher matcher = courseColumnPattern.matcher(line);
if (matcher.find()) {
previous_identifier = matcher.group(1);
}

Related

Delete some part of the string in beginning and some at last in java

I want dynamic code which will trim off some part of the String at the beginning and some part at the end. I am able to trim the last part, but I am not able to trim the initial part of the String up to a specific point completely. Only the first character is deleted in the output.
public static String removeTextAndLastBracketFromString(String string) {
StringBuilder str = new StringBuilder(string);
int i=0;
do {
str.deleteCharAt(i);
i++;
} while(string.equals("("));
str.deleteCharAt(string.length() - 2);
return str.toString();
}
This is my code. When I pass Awaiting Research(5056) as an argument, the output given is waiting Research(5056. I want to trim the initial part of such string till ( and I want only the digits as my output. My expected output here is - 5056. Please help.
You don't need loops (in your code), you can use String.substring(int, int) in combination with String.indexOf(char):
public static void main(String[] args) {
// example input
String input = "Awaiting Research(5056)";
// find the parentheses and use their indexes to get the content
String output = input.substring(
input.indexOf('(') + 1, // begin index is inclusive, so add 1 to skip the '('
input.indexOf(')')
);
// print the result
System.out.println(output);
}
Output:
5056
Hint:
Only use this if you are sure the input will always contain a ( and a ) with indexOf('(') < indexOf(')'), or handle the StringIndexOutOfBoundsException that substring throws for most Strings that do not meet this constraint.
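A guarded variant of the same idea, as a minimal sketch: the class name ParenContent and the method between are hypothetical, and returning an Optional for the "no match" case is just one possible design.

```java
import java.util.Optional;

class ParenContent {
    // Hypothetical helper: returns the text between the first '(' and the
    // first ')' after it, or an empty Optional if there is no such pair.
    static Optional<String> between(String input) {
        int open = input.indexOf('(');
        // Search for ')' only after the '(' so input like ")(" is rejected
        int close = (open < 0) ? -1 : input.indexOf(')', open + 1);
        if (close < 0) {
            return Optional.empty();
        }
        return Optional.of(input.substring(open + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(between("Awaiting Research(5056)").orElse("no match")); // prints "5056"
        System.out.println(between("No brackets here").orElse("no match"));        // prints "no match"
    }
}
```

Because the indexes are checked before calling substring, no exception handling is needed.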
If your goal is just to find one numeric value in the string, try matching the number itself with a regex, and you will have the number separated from the rest of the string,
e.g.:
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("somestringwithnumberlike123");
if(matcher.find()) {
System.out.println(matcher.group());
}
Using a regexp to extract what you need is a better option :
String test = "Awaiting Research(5056)";
Pattern p = Pattern.compile("([0-9]+)");
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
For your case, it is better to use a regular expression to extract the part you are interested in.
Pattern pattern = Pattern.compile("(?<=\\().*(?=\\))");
Matcher matcher = pattern.matcher("Awaiting Research(5056)");
if(matcher.find())
{
return matcher.group();
}
It is much easier to solve the problem using, e.g., String.indexOf(..) and String.substring(from,to). But if for some reason you want to stick to your approach, here are some hints:
Your code does what it does because:
string.equals("(") is only true if the given string is exactly "("
the do {code} while (condition) loop executes code once even if condition is false -> think about using the while (condition) {code} loop instead
if you change the condition to check for the character at i, your code would remove the first, third, fifth character and so on: after the first execution i is 1, and the char at i is now the third char of the original string (because the first has already been removed) -> think about always checking and removing charAt(0).
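Putting these hints together, the asker's method could look roughly like the sketch below. The class name TrimToParen is made up, and the behaviour for input without a '(' (everything is deleted, so the result is an empty string) is an assumption, not something the question specifies.

```java
class TrimToParen {
    // Sketch of the asker's method with the loop fixed: always inspect and
    // delete the character at index 0 until the '(' has been removed.
    static String removeTextAndLastBracketFromString(String string) {
        StringBuilder str = new StringBuilder(string);
        while (str.length() > 0) {
            char removed = str.charAt(0);
            str.deleteCharAt(0);
            if (removed == '(') {
                break; // everything up to and including '(' is gone
            }
        }
        // Drop the closing ')' if it is the last remaining character
        if (str.length() > 0 && str.charAt(str.length() - 1) == ')') {
            str.deleteCharAt(str.length() - 1);
        }
        return str.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeTextAndLastBracketFromString("Awaiting Research(5056)")); // prints "5056"
    }
}
```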

How to use regex with pattern matcher against multiple strings?

I'm reading in a list of strings from a List<String>. The strings look like this:
blah1
blah2
blah3
blah4
In java, I'd like to build a regex that checks for a pattern like this (myString/|yourString) and concatenate that to each of the strings in the list above while doing a pattern match against the lines of a file.
So I do this (the code below is just snippets):
String pattern = "(myString/|yourString.)";
private String listAsString;
private void createListAsStrings() {
StringBuilder sb = new StringBuilder();
for(String string : stringList) {
sb.append(string + "|"); // using the pipe hoping it will do an OR in the regex
}
listAsString = sb.toString();
}
To build the pattern, I'm trying to do the following:
Pattern p = Pattern.compile(pattern + listAsString);
But when I get to running the matcher it doesn't go through each string in the list of strings from my stringbuilder. And then the last problem is that my last string will contain a |.
Is there a way to match myString/blah1 or yourString.blah1 or myString/blah2 etc.. using a regex against each line in a file?
There is a lot of code, so I just posted what seemed relevant.
The expression that you are looking to build should be as follows:
myString/(?:\Qblah1\E|\Qblah2\E)
You need to wrap the strings blah1, blah2, etc. in \Q - \E in case the strings contain regex metacharacters. To avoid adding a trailing |, use a boolean variable that indicates whether this is the first iteration through the loop, and add the | before every word except the first:
StringBuilder sb = new StringBuilder();
boolean isFirst = true;
for(String word : stringList) {
if (!isFirst) {
sb.append('|');
} else {
isFirst = false;
}
sb.append("\\Q");
sb.append(word);
sb.append("\\E");
}
String regex = "myString/" + "(?:" + sb + ")";
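The same expression can be built more compactly with Pattern.quote and a joining collector, which also handles both prefixes from the question. This is a sketch only: the class name PrefixedAlternatives is made up, and it assumes the dot after yourString is meant as a literal dot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrefixedAlternatives {
    // Builds (?:myString/|yourString\.)(?:\Qblah1\E|\Qblah2\E|...)
    static Pattern build(List<String> words) {
        String alternatives = words.stream()
                .map(Pattern::quote)               // wraps each word in \Q...\E
                .collect(Collectors.joining("|")); // no leading or trailing bar
        // Assumption: the '.' after yourString is a literal dot, so it is escaped
        return Pattern.compile("(?:myString/|yourString\\.)(?:" + alternatives + ")");
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList("blah1", "blah2", "blah3", "blah4"));
        System.out.println(p.matcher("myString/blah1").matches());   // prints "true"
        System.out.println(p.matcher("yourString.blah3").matches()); // prints "true"
        System.out.println(p.matcher("blah2").matches());            // prints "false"
    }
}
```

To test lines of a file that contain other text around the match, find() could be used instead of matches(). An empty list would produce an empty group that matches right after the prefix, so that case may need a separate check.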
I think the basic problem is that your pattern (ignoring the trailing | problem) is something like
(myString/|yourString.)blah1|blah2|blah3
which will match one of these
myString/blah1
yourString.blah1
blah2
blah3
That's how the operator precedence works in regexes. You need an extra set of parentheses around the lines from the file (plus see the other answers about \Q..\E and avoiding the bar at the end of the string).

Problems with building this regex [1,2,3]

i have a problem to build following regex:
[1,2,3,4]
i found a work-around, but i think its ugly
String stringIds = "[1,2,3,4]";
stringIds = stringIds.replaceAll("\\[", "");
stringIds = stringIds.replaceAll("\\]", "");
String[] ids = stringIds.split("\\,");
Can someone help me please to build one regex, which i can use in the split function
Thanks for help
edit:
i want to get from this string "[1,2,3,4]" to an array with 4 entries. the entries are the 4 numbers in the string, so i need to eliminate "[","]" and ",". the "," isn't the problem.
The first and last numbers contain [ or ], so I needed the fix with replaceAll. But I think that if I can pass split a regex for ",", I can also pass a regex which eliminates "[" and "]" too. But I can't figure out how this regex should look.
This is almost what you're looking for:
String q = "[1,2,3,4]";
String[] x = q.split("\\[|\\]|,");
The problem is that it produces an extra element at the beginning of the array due to the leading open bracket. You may not be able to do what you want with a single regex sans shenanigans. If you know the string always begins with an open bracket, you can remove it first.
The regex itself means "(split on) any open bracket, OR any closed bracket, OR any comma."
Punctuation characters frequently have additional meanings in regular expressions. The double leading backslashes... ugh, the first backslash tells the Java String parser that the next backslash is not a special character (example: \n is a newline...) so \\ means "I want an honest to God backslash". The next backslash tells the regexp engine that the next character ([ for example) is not a special regexp character. That makes me lol.
Maybe substring [ and ] from beginning and end, then split the rest by ,
String stringIds = "[1,2,3,4]";
String[] ids = stringIds.substring(1,stringIds.length()-1).split(",");
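If the numbers are needed as integers, the substring-then-split idea extends naturally. A minimal sketch, assuming the input always starts with [ and ends with ]; the class name BracketedIds and the method parse are hypothetical.

```java
import java.util.Arrays;

class BracketedIds {
    // Hypothetical helper: parses a string such as "[1,2,3,4]" into an int[].
    // Assumes the input starts with '[' and ends with ']'.
    static int[] parse(String stringIds) {
        String inner = stringIds.substring(1, stringIds.length() - 1).trim();
        if (inner.isEmpty()) {
            return new int[0]; // "[]" has no entries
        }
        String[] parts = inner.split("\\s*,\\s*"); // tolerate spaces around commas
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ids[i] = Integer.parseInt(parts[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("[1,2,3,4]"))); // prints "[1, 2, 3, 4]"
    }
}
```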
Looks to me like you're trying to make an array (not sure where you got 'regex' from; that means something different). In this case, you want:
String[] ids = {"1","2","3","4"};
If it's specifically an array of integer numbers you want, then instead use:
int[] ids = {1,2,3,4};
Your problem is not amenable to splitting by delimiter. It is much safer and more general to split by matching the integers themselves:
static String[] nums(String in) {
final Matcher m = Pattern.compile("\\d+").matcher(in);
final List<String> l = new ArrayList<String>();
while (m.find()) l.add(m.group());
return l.toArray(new String[l.size()]);
}
public static void main(String args[]) {
System.out.println(Arrays.toString(nums("[1, 2, 3, 4]")));
}
If the first line of your code is the following:
String stringIds = "[1,2,3,4]";
and you're trying to iterate over all the number items, then the following code fragment should work:
try {
Pattern regex = Pattern.compile("\\b(\\d+)\\b", Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(stringIds);
while (regexMatcher.find()) {
for (int i = 1; i <= regexMatcher.groupCount(); i++) {
// matched text: regexMatcher.group(i)
// match start: regexMatcher.start(i)
// match end: regexMatcher.end(i)
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

Splitting strings based on a delimiter

I am trying to break apart a very simple collection of strings that come in the forms of
0|0
10|15
30|55
etc etc. Essentially numbers that are separated by pipes.
When I use Java's string split function with .split("|"), I get somewhat unpredictable results: white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and give me advice on how I can use a reg exp to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); it discards trailing empty tokens by default. If you first get rid of everything in the string except pipes and digits, you can easily use split() to get what you want. Or you can pass multiple delimiters to split (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
Alternatively, match the numbers directly with a regex:
import java.util.regex.*;
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), you need to escape it. Depending on the java version you are using this could well explain your unpredictable results.
class t {
public static void main(String[] args)
{
String temp = "0|0";
String[] splitString = temp.split("\\|");
for (int i=0; i<splitString.length; i++)
System.out.println("splitString["+i+"] is " + splitString[i]);
}
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
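If the double backslash is hard to read, Pattern.quote can produce the escaped delimiter instead. The sketch below assumes each line holds exactly two integers separated by a pipe; the class name PipeSplit and the method parsePair are made up for illustration.

```java
import java.util.regex.Pattern;

class PipeSplit {
    // Splits a "number|number" line and parses both halves.
    static int[] parsePair(String line) {
        // Pattern.quote("|") yields a regex that matches a literal pipe,
        // equivalent to writing "\\|" by hand
        String[] parts = line.split(Pattern.quote("|"));
        return new int[] {
            Integer.parseInt(parts[0].trim()),
            Integer.parseInt(parts[1].trim())
        };
    }

    public static void main(String[] args) {
        int[] pair = parsePair("10|15");
        System.out.println(pair[0] + " and " + pair[1]); // prints "10 and 15"
    }
}
```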
You can replace the white space with pipes and then split on the pipe.
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|"); // the pipe must be escaped in the regex
Hope this helps for you..
You can use StringTokenizer.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|"); // "|" is the delimiter; the default would be whitespace
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course you can always nest this inside of a while loop if you have multiple strings.
Also, you need to import java.util.* for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character, unfortunately '\' is a special character in Java so you need to do a kind of double escape maneuver e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example, you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
This should work for you:
([0-9]+)
Consider a scenario where we have read a line from a CSV or XLS file as a string and need to separate the columns into an array of strings based on delimiters.
Below is a code snippet that does this.
{ ...
....
String line = new BufferedReader(new FileReader("your file")).readLine();
String[] splittedString = StringSplitToArray(line,"\"");
...
....
}
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
StringBuffer token = new StringBuffer();
Vector tokens = new Vector();
char[] chars = stringToSplit.toCharArray();
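for (int i=0; i < chars.length; i++) {
if (delimiter.indexOf(chars[i]) != -1) {
// chars[i] is a delimiter: close the current token, if any
if (token.length() > 0) {
tokens.addElement(token.toString());
token.setLength(0);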
}
} else {
token.append(chars[i]);
}
}
if (token.length() > 0) {
tokens.addElement(token.toString());
}
// convert the vector into an array
String[] preparedArray = new String[tokens.size()];
for (int i=0; i < preparedArray.length; i++) {
preparedArray[i] = (String)tokens.elementAt(i);
}
return preparedArray;
}
The code snippet above contains a call to the method StringSplitToArray, which converts the line into a string array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in

Regarding String manipulation

I have a String str which can have values like the ones below. I want the first letter in the string to be uppercase, and if an underscore appears in the string, I need to remove it and make the letter after it uppercase. All the remaining letters should be lowercase.
""
"abc"
"abc_def"
"Abc_def_Ghi12_abd"
"abc__de"
"_"
Output:
""
"Abc"
"AbcDef"
"AbcDefGhi12Abd"
"AbcDe"
""
Well, without showing us that you put any effort into this problem this is going to be kinda vague.
I see two possibilities here:
Split the string at underscores, apply the answer from this question to each part and re-combine them.
Create a StringBuilder, walk through the string and keep track of whether you are
at the start of the string
after an underscore or
somewhere else
and act appropriately on the current character before appending it to the StringBuilder instance.
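The second possibility (walking through the string with a StringBuilder) can be sketched as follows. The class and method names are illustrative only; the start of the string and "after an underscore" are folded into a single flag, since both call for an uppercase letter.

```java
class Camelize {
    // Walks the input once: underscores are dropped, the first character and
    // every character after an underscore are upper-cased, the rest lower-cased.
    static String camelize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        boolean upperNext = true; // true at the start and after an underscore
        for (char ch : input.toCharArray()) {
            if (ch == '_') {
                upperNext = true; // skip the underscore itself
            } else {
                sb.append(upperNext ? Character.toUpperCase(ch) : Character.toLowerCase(ch));
                upperNext = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(camelize("Abc_def_Ghi12_abd")); // prints "AbcDefGhi12Abd"
        System.out.println(camelize("abc__de"));           // prints "AbcDe"
    }
}
```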
replace _ with space (str.replace("_", " "))
use WordUtils.capitalizeFully(str); (from commons-lang)
replace space with nothing (str.replace(" ", ""))
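You can use the following regexp-based code:
public static String camelize(String input) {
// Lower-case everything first; the loop below upper-cases the word starts
String lower = input.toLowerCase();
char[] c = lower.toCharArray();
// Group 1 is a letter at the start of the string or right after an underscore
Pattern pattern = Pattern.compile("(?:^|_)([a-z])");
Matcher m = pattern.matcher(lower);
while ( m.find() ) {
int index = m.start(1);
c[index] = Character.toUpperCase(c[index]);
}
return String.valueOf(c).replace("_", "");
}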
Use Pattern/Matcher in the java.util.regex package:
For each string in your array, do the following:
StringBuffer output = new StringBuffer();
// Lower-case first, then upper-case the character at the start and after each underscore
Matcher match = Pattern.compile("(?:^|_)([a-z0-9])").matcher(inStr.toLowerCase());
while(match.find()) {
match.appendReplacement(output, match.group(1).toUpperCase());
}
match.appendTail(output);
// Remove underscores that were not followed by a letter or digit (e.g. "__" or a lone "_")
String capitalized = output.toString().replace("_", "");
The regular expression looks for either the start of the string or an underscore: "(?:^|_)"
Then it puts the following character into a group: "([a-z0-9])"
The code then goes through each of the matches in the input string, replacing the match (including the underscore) with the upper-cased group.
