Suppose that I want to build a very large regex with capture groups at run time, based on the user's choices.
Simple example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
static boolean findTag, findWordA, findOtherWord, findWordX;
static final String TAG = "(<[^>]+>)";
static final String WORD_A = "(wordA)";
static final String OTHER_WORD = "(anotherword)";
static final String WORD_X = "(wordX)";
static int tagCount = 0;
static int wordACount = 0;
static int otherWordCount = 0;
static int wordXCount = 0;
public static void main(String[] args) {
// Boolean options that will be supplied by the user
// make them all true in this example
findTag = true;
findWordA = true;
findOtherWord = true;
findWordX = true;
String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
StringBuilder regex = new StringBuilder();
if (findTag)
regex.append(TAG + "|");
if (findWordA)
regex.append(WORD_A + "|");
if (findOtherWord)
regex.append(OTHER_WORD + "|");
if (findWordX)
regex.append(WORD_X + "|");
if (regex.length() > 0) {
regex.setLength(regex.length() - 1);
Pattern pattern = Pattern.compile(regex.toString());
System.out.println("\nWHOLE REGEX: " + regex.toString());
System.out.println("\nINPUT STRING: " + input);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
// only way I know of to find out which group was matched:
if (matcher.group(1) != null) tagCount++;
if (matcher.group(2) != null) wordACount++;
if (matcher.group(3) != null) otherWordCount++;
if (matcher.group(4) != null) wordXCount++;
}
System.out.println();
System.out.println("Group1 matches: " + tagCount);
System.out.println("Group2 matches: " + wordACount);
System.out.println("Group3 matches: " + otherWordCount);
System.out.println("Group4 matches: " + wordXCount);
} else {
System.out.println("No regex to build.");
}
}
}
The problem is that I can count each group's matches only when I know beforehand which regexes/groups the user wants to find.
Note that the full regex will contain a lot more capture groups and they will be more complex.
How can I determine which capture group was matched so that I can count each group's occurrences, without knowing beforehand which groups the user wants to find?
Construct the regex to use named groups, then test each name with matcher.group(name) != null:
(?<worda>wordA)|(?<wordx>wordX)|(?<anotherword>anotherword)
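A minimal sketch of the named-group idea, assuming Java 7 or later (where named groups are available). The class name NamedGroupCounter, the count method and the map of alternatives are hypothetical names chosen for illustration; only the alternatives the user enabled are put into the map, so the counting loop never needs to know them in advance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupCounter {
    // Hypothetical helper: keys are group names (letters and digits, starting
    // with a letter), values are the sub-patterns the user switched on.
    static Map<String, Integer> count(Map<String, String> alternatives, String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringBuilder regex = new StringBuilder();
        for (Map.Entry<String, String> e : alternatives.entrySet()) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            // Wrap each alternative in a named group: (?<name>pattern)
            regex.append("(?<").append(e.getKey()).append('>')
                 .append(e.getValue()).append(')');
            counts.put(e.getKey(), 0);
        }
        if (regex.length() == 0) {
            return counts; // nothing selected, nothing to count
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        while (m.find()) {
            // Exactly one top-level alternative took part in this match.
            for (String name : alternatives.keySet()) {
                if (m.group(name) != null) {
                    counts.merge(name, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> alternatives = new LinkedHashMap<>();
        alternatives.put("tag", "<[^>]+>");
        alternatives.put("wordA", "wordA");
        alternatives.put("otherWord", "anotherword");
        alternatives.put("wordX", "wordX");
        String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
        System.out.println(count(alternatives, input));
    }
}
```

Because the groups are looked up by name, sub-patterns that contain their own numbered capture groups do not shift anything, which is the main reason to prefer names over group(1), group(2), and so on here.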
I just wrote my program in C#, but I want to rewrite it in Java. I want to create spintax text.
My C# code:
static string spintax(Random rnd, string str)
{
// Loop over string until all patterns exhausted.
string pattern = "{[^{}]*}";
Match m = Regex.Match(str, pattern);
while (m.Success)
{
// Get random choice and replace pattern match.
string seg = str.Substring(m.Index + 1, m.Length - 2);
string[] choices = seg.Split('|');
str = str.Substring(0, m.Index) + choices[rnd.Next(choices.Length)] + str.Substring(m.Index + m.Length);
m = Regex.Match(str, pattern);
}
// Return the modified string.
return str;
}
I've updated my code to:
static String Spintax(Random rnd,String str)
{
String pat = "\\{[^{}]*\\}";
Pattern ma;
ma = Pattern.compile(pat);
Matcher mat = ma.matcher(str);
while(mat.find())
{
String segment = str.substring(mat.start() + 1, mat.end() - 1);
String[] choices = segment.split("\\|", -1);
str = str.substring(0, mat.start()) + choices[rnd.nextInt(choices.length)] + str.substring(mat.end());
mat = ma.matcher(str);
}
return str;
}
works like a charm :D thanks all for your support..
You need to escape the braces, because { and } are quantifier syntax in a Java regex:
String pat = "\\{[^{}]*\\}";
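To illustrate the escaped pattern in context, here is a small sketch of the spintax loop, assuming the same innermost-first replacement strategy as the C# original. The class name Spintax and method name spin are hypothetical. As far as I know, the standard JDK rejects the unescaped pattern "{[^{}]*}" with a PatternSyntaxException, which is why the escapes are needed.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Spintax {
    // Escaped braces: matches one innermost {a|b|c} block with no nested braces.
    private static final Pattern BLOCK = Pattern.compile("\\{[^{}]*\\}");

    // Replaces blocks until none are left; nested blocks are resolved
    // from the inside out because only brace-free blocks can match.
    static String spin(Random rnd, String text) {
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            String inner = text.substring(m.start() + 1, m.end() - 1);
            String[] choices = inner.split("\\|", -1); // -1 keeps empty choices
            String pick = choices[rnd.nextInt(choices.length)];
            text = text.substring(0, m.start()) + pick + text.substring(m.end());
            m = BLOCK.matcher(text); // the string changed, so start over
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(spin(new Random(), "{Hello|Hi} {world|there}!"));
    }
}
```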
In Java, I need to find all occurrences of a String inside of a String.
e.g.
String myString;
myString = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> "
+ "More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/> More_XML_Stuff"; // etc.
So I need to be able to grab the contents of all the road names. In the first example, I need to be able to set a String to "Example Road".
Pseudo-Code:
String streets = "";
while(more occurrences of street names exist)
{
streets = streets + "," + (street.occurrence of street name);
}
In the above example, the string would have the contents "Example Road, Another name".
You could use something like this String[] parseValue(String) function
public static String[] parseValue(String in) {
String openTag = "<tag k=\"name\" v=\"";
int p1 = in.indexOf(openTag);
java.util.List<String> al = new java.util.ArrayList<String>();
while (p1 > -1) {
int p2 = in.indexOf("\"/>", p1 + 1);
if (p2 > -1) {
al.add(in.substring(p1 + openTag.length(), p2));
} else {
break;
}
p1 = in.indexOf(openTag, p2 + 1);
}
String[] out = new String[al.size()];
return al.toArray(out);
}
public static void main(String[] args) {
String myString = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> "
+ "More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/>";
System.out.println(java.util.Arrays
.toString(parseValue(myString)));
}
which outputs
[Example Road, Another name]
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class welcome {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
String text = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/> More_XML_Stuff";
String road_string = GetRoadString(text);
System.out.println(road_string);
}
static String GetRoadString(String text)
{
Pattern pattern = Pattern.compile("<tag\\s+k=\"name\"\\s+v=\"(.*?)\"/>");
Matcher matcher = pattern.matcher(text);
// using Matcher find(), group(), start() and end() methods
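StringBuilder road_string = new StringBuilder();
while (matcher.find()) {
// separate entries with a comma without leaving a trailing one
if (road_string.length() > 0) road_string.append(",");
road_string.append(matcher.group(1));
}
// returns an empty string (instead of throwing) when nothing matched
return road_string.toString();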
}
}
Since you have a fixed XML format, you can use a regular expression match to find the road names mentioned.
Pattern roadNamePattern = Pattern.compile("v=\"(.*?)\"/>");
Matcher matcher = roadNamePattern.matcher(xmlString);
while (matcher.find()) {
String roadName = matcher.group(1);
}
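The loop above keeps only the last road name. A sketch that collects every match and joins them with commas might look like this; the class name RoadNames and method name extract are hypothetical, and the pattern assumes the k attribute always comes before v, as in the question's sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoadNames {
    // Only tags whose key is "name" are considered; group 1 is the v value.
    private static final Pattern NAME_TAG =
            Pattern.compile("<tag\\s+k=\"name\"\\s+v=\"([^\"]*)\"\\s*/>");

    static List<String> extract(String xml) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAME_TAG.matcher(xml);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "Random_XML_Stuff_Here <tag k=\"name\" v=\"Example Road\"/> "
                + "More_Random_XML_Stuff <tag k=\"name\" v=\"Another name\"/> More_XML_Stuff";
        // String.join avoids the leading or trailing comma of manual concatenation.
        System.out.println(String.join(",", extract(xml)));
    }
}
```

For anything beyond a fixed, simple format, a real XML parser is the safer choice; the regex is only reasonable because the input shape is known.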
Answering the following question using Python:
Find each run of consecutive identical characters in a string, together with its length, e.g.:
Input: str='aababbdf'
Output: [('a', 2), ('b', 1), ('a', 1), ('b', 2), ('d', 1), ('f', 1)]
str = 'aababbdf'
i = j = total = 0
data = []
for i in range(0, len(str)):
    count = 0
    i = j  # continue from where the previous run ended
    if total != len(str):
        for j in range(i, len(str)):
            if str[i] == str[j]:
                count = count + 1
                total = total + 1
                char = str[i]
            else:
                break
        data.append((char, count))
print("\n data", data)
I have a string,
String s = "test string (67)";
I want to get the number 67, which is the string between ( and ).
Can anyone please tell me how to do this?
There's probably a really neat RegExp, but I'm noob in that area, so instead...
String s = "test string (67)";
s = s.substring(s.indexOf("(") + 1);
s = s.substring(0, s.indexOf(")"));
System.out.println(s);
A very useful solution that doesn't require you to call indexOf yourself is to use the Apache Commons Lang library:
StringUtils.substringBetween(s, "(", ")");
This method also copes with multiple occurrences of the closing string, which is awkward to handle with indexOf alone.
You can download this library from here:
https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4
Try it like this
String s="test string(67)";
String requiredString = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
The method's signature for substring is:
s.substring(int start, int end);
Using a regular expression:
String s = "test string (67)";
Pattern p = Pattern.compile("\\(.*?\\)");
Matcher m = p.matcher(s);
if(m.find())
System.out.println(m.group().subSequence(1, m.group().length()-1));
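As a variation on the snippet above, a capture group can return the inner text directly, so no trimming of the parentheses is needed afterwards. This is a sketch only; the class name ParenExtractor and method name firstInParens are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenExtractor {
    // \( and \) are literal parentheses; the inner ( ... ) is capture group 1.
    private static final Pattern PAREN = Pattern.compile("\\(([^)]*)\\)");

    // Returns the text inside the first "(...)" pair, if there is one.
    static Optional<String> firstInParens(String s) {
        Matcher m = PAREN.matcher(s);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstInParens("test string (67)").orElse("no match"));
    }
}
```

Returning an Optional makes the "no parentheses at all" case explicit instead of throwing or returning the whole input.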
Java supports Regular Expressions, but they're kind of cumbersome if you actually want to use them to extract matches. I think the easiest way to get at the string you want in your example is to just use the Regular Expression support in the String class's replaceAll method:
String x = "test string (67)".replaceAll(".*\\(|\\).*", "");
// x is now the String "67"
This simply deletes everything up to and including the ( (the last one in the string, because .* is greedy), and likewise the ) and everything after it. This leaves just the text between the parentheses.
However, the result of this is still a String. If you want an integer result instead then you need to do another conversion:
int n = Integer.parseInt(x);
// n is now the integer 67
In a single line, I suggest:
String input = "test string (67)";
input = input.substring(input.indexOf("(")+1, input.lastIndexOf(")"));
System.out.println(input);
You could use apache common library's StringUtils to do this.
import org.apache.commons.lang3.StringUtils;
...
String s = "test string (67)";
s = StringUtils.substringBetween(s, "(", ")");
....
Given the test string "test string (67)", you need to get the String that is nested between two other Strings.
String str = "test string (67) and (77)", open = "(", close = ")";
Some possible ways are listed below. Simple generic solution:
String subStr = str.substring(str.indexOf( open ) + 1, str.indexOf( close ));
System.out.format("String[%s] Parsed IntValue[%d]\n", subStr, Integer.parseInt( subStr ));
Apache Software Foundation commons.lang3:
The StringUtils class's substringBetween() function gets the String that is nested between two Strings. Only the first match is returned.
String substringBetween = StringUtils.substringBetween(str, open, close);
System.out.println("Commons Lang3 : "+ substringBetween);
Replace the given String with the String that is nested between two Strings:
Pattern with Regular-Expressions: (\()(.*?)(\)).*
The Dot Matches (Almost) Any Character
.? = .{0,1}, .* = .{0,}, .+ = .{1,}
String patternMatch = patternMatch(generateRegex(open, close), str);
System.out.println("Regular expression Value : "+ patternMatch);
Regular-Expression with the utility class RegexUtils and some functions.
Pattern.DOTALL: makes . match any character, including a line terminator.
Pattern.MULTILINE: makes ^ and $ match at the start and end of each line, not only at the start and end of the whole input sequence.
public static String generateRegex(String open, String close) {
return "(" + RegexUtils.escapeQuotes(open) + ")(.*?)(" + RegexUtils.escapeQuotes(close) + ").*";
}
public static String patternMatch(String regex, CharSequence string) {
final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
final Matcher matcher = pattern .matcher(string);
String returnGroupValue = null;
if (matcher.find()) { // while() { Pattern.MULTILINE }
System.out.println("Full match: " + matcher.group(0));
System.out.format("Character Index [Start:End]«[%d:%d]\n",matcher.start(),matcher.end());
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
if( i == 2 ) returnGroupValue = matcher.group( 2 );
}
}
return returnGroupValue;
}
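RegexUtils.escapeQuotes in the code above is that answer's own helper and is not shown here. As a sketch under the assumption that its job is to make the delimiters literal, the standard Pattern.quote can be used instead. The class name Between and method name all are hypothetical, and this version returns every match rather than only the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Between {
    // Collects every substring found between an open and a close delimiter.
    static List<String> all(String text, String open, String close) {
        // Pattern.quote turns the delimiters into literals, so "(" or "<<"
        // need no manual escaping; (.*?) is lazy and stops at the first close.
        Pattern pattern = Pattern.compile(
                Pattern.quote(open) + "(.*?)" + Pattern.quote(close),
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(all("test string (67) and (77)", "(", ")"));
    }
}
```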
String s = "test string (67)";
int start = 0; // '(' position in string
int end = 0; // ')' position in string
for(int i = 0; i < s.length(); i++) {
if(s.charAt(i) == '(') // Looking for '(' position in string
start = i;
else if(s.charAt(i) == ')') // Looking for ')' position in string
end = i;
}
String number = s.substring(start+1, end); // you take value between start and end
String result = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
public String getStringBetweenTwoChars(String input, String startChar, String endChar) {
try {
int start = input.indexOf(startChar);
if (start != -1) {
int end = input.indexOf(endChar, start + startChar.length());
if (end != -1) {
return input.substring(start + startChar.length(), end);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return input; // return null; || return "" ;
}
Usage :
String input = "test string (67)";
String startChar = "(";
String endChar = ")";
String output = getStringBetweenTwoChars(input, startChar, endChar);
System.out.println(output);
// Output: "67"
Another way of doing it, using the split method:
public static void main(String[] args) {
String s = "test string (67)";
String[] ss;
ss= s.split("\\(");
ss = ss[1].split("\\)");
System.out.println(ss[0]);
}
Use Pattern and Matcher
public class Chk {
public static void main(String[] args) {
String s = "test string (67)";
ArrayList<String> arL = new ArrayList<String>();
ArrayList<String> inL = new ArrayList<String>();
Pattern pat = Pattern.compile("\\(\\w+\\)");
Matcher mat = pat.matcher(s);
while (mat.find()) {
arL.add(mat.group());
System.out.println(mat.group());
}
for (String sx : arL) {
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(sx);
while (m.find()) {
inL.add(m.group());
System.out.println(m.group());
}
}
System.out.println(inL);
}
}
The "generic" way of doing this is to parse the string from the start, throwing away all the characters before the first bracket, recording the characters after the first bracket, and throwing away the characters after the second bracket.
I'm sure there's a regex library or something to do it though.
The least generic way I found to do this with Regex and Pattern / Matcher classes:
String text = "test string (67)";
String START = "\\("; // A literal "(" character in regex
String END = "\\)"; // A literal ")" character in regex
// Captures the word character(s) between the above two delimiters
String regex = START + "(\\w+)" + END;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
// group(1) is the captured text without the surrounding delimiters
System.out.println(matcher.group(1));
}
This may help for more complex regex problems where you want to get the text between two set of characters.
Another possible solution is to use lastIndexOf, which searches for a character or String starting from the end.
In my scenario, I had following String and I had to extract <<UserName>>
1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc
So indexOf and StringUtils.substringBetween were not helpful, as they start looking for the character from the beginning.
So, I used lastIndexOf
String str = "1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc";
String userName = str.substring(str.lastIndexOf("_") + 1, str.lastIndexOf("."));
And, it gives me
<<UserName>>
String s = "test string (67)";
System.out.println(s.substring(s.indexOf("(")+1,s.indexOf(")")));
Something like this:
public static String innerSubString(String txt, char prefix, char suffix) {
if(txt != null && txt.length() > 1) {
int start = 0, end = 0;
char token;
for(int i = 0; i < txt.length(); i++) {
token = txt.charAt(i);
if(token == prefix)
start = i;
else if(token == suffix)
end = i;
}
if(start + 1 < end)
return txt.substring(start+1, end);
}
return null;
}
A simple \D+ regex does the job.
It matches every run of non-digit characters, so replacing those runs with an empty string leaves only the digits. No need to complicate things.
/\D+/
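In Java the /\D+/ notation above becomes a string literal with a doubled backslash. A minimal sketch, with the hypothetical class and method name DigitsOnly.digitsOnly:

```java
class DigitsOnly {
    // Removes every run of non-digit characters, keeping only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D+", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("test string (67)"));
    }
}
```

Note that this ignores the parentheses entirely: digits anywhere in the input are kept and run together, so "test string (67)(69)" yields "6769". It fits only when the number in parentheses is the sole number in the string.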
This returns the original string if the regex does not match:
var iAm67 = "test string (67)".replaceFirst("test string \\((.*)\\)", "$1");
To guard against that, add a matches() check to the code:
String str = "test string (67)";
String regx = "test string \\((.*)\\)";
if (str.matches(regx)) {
var iAm67 = str.replaceFirst(regx, "$1");
}
---EDIT---
I use https://www.freeformatter.com/java-regex-tester.html#ad-output to test regexes.
It turns out it's better to add ? after * to make the match lazy (as short as possible). Something like this:
String str = "test string (67)(69)";
String regx1 = "test string \\((.*)\\).*";
String regx2 = "test string \\((.*?)\\).*";
String ans1 = str.replaceFirst(regx1, "$1");
String ans2 = str.replaceFirst(regx2, "$1");
System.out.println("ans1:"+ans1+"\nans2:"+ans2);
// ans1:67)(69
// ans2:67
String s = "(69)";
System.out.println(s.substring(s.lastIndexOf('(')+1,s.lastIndexOf(')')));
Little extension to top (MadProgrammer) answer
public static String getTextBetween(final String wholeString, final String str1, String str2){
String s = wholeString.substring(wholeString.indexOf(str1) + str1.length());
s = s.substring(0, s.indexOf(str2));
return s;
}
So say I have a string called x that = "Hello world". I want to somehow make it so that it will flip those two words and instead display "world Hello". I am not very good with loops or arrays and obviously am a beginner. Could I accomplish this somehow by splitting my string? If so, how? If not, how could I do this? Help would be appreciated, thanks!
1) split string into String array on space.
String myArray[] = x.split(" ");
2) Create new string with words in reverse order from array.
String newString = myArray[1] + " " + myArray[0];
Bonus points for using a StringBuilder instead of concatenation.
String abc = "Hello world";
String cba = abc.replace( "Hello world", "world Hello" );
abc = "This is a longer string. Hello world. My String";
cba = abc.replace( "Hello world", "world Hello" );
If you want, you can split your string into words as well:
String[] pieces = abc.split(" ");
for( int i=0; i<pieces.length-1; ++i )
if( pieces[i].equals("Hello") && pieces[i+1].equals("world") ) {
// compare Strings with equals(), not ==, and swap through a temporary
String tmp = pieces[i]; pieces[i] = pieces[i+1]; pieces[i+1] = tmp;
}
There are many other ways you can do it too. Be careful with capitalization. You can use .toUpperCase() in your if statements and then make your matching conditionals uppercase, but leave the results with their original capitalization, etc.
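The capitalization point can be sketched with equalsIgnoreCase, which compares without altering the words themselves. The class name WordSwap and method name swapAdjacent are hypothetical, and the sketch assumes words are separated by single spaces with no attached punctuation.

```java
class WordSwap {
    // Swaps each adjacent pair (first, second), ignoring case when comparing
    // but keeping the original capitalization in the result.
    static String swapAdjacent(String text, String first, String second) {
        String[] pieces = text.split(" ");
        for (int i = 0; i < pieces.length - 1; i++) {
            if (pieces[i].equalsIgnoreCase(first)
                    && pieces[i + 1].equalsIgnoreCase(second)) {
                String tmp = pieces[i];
                pieces[i] = pieces[i + 1];
                pieces[i + 1] = tmp;
                i++; // skip the word that was just moved
            }
        }
        return String.join(" ", pieces);
    }

    public static void main(String[] args) {
        System.out.println(swapAdjacent("hello World", "Hello", "world"));
    }
}
```

A word with punctuation attached, such as "world.", would not match here; stripping or tokenizing punctuation would be a separate step.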
Here's the solution:
import java.util.*;
public class ReverseWords {
public String reverseWords(String phrase) {
List<String> wordList = Arrays.asList(phrase.split("[ ]"));
Collections.reverse(wordList);
StringBuilder sbReverseString = new StringBuilder();
for(String word: wordList) {
sbReverseString.append(word + " ");
}
return sbReverseString.substring(0, sbReverseString.length() - 1);
}
}
The above solution was coded by me, for Google Code Jam and is also blogged here: Reverse Words - GCJ 2010
Just use this method: call it and pass in the string whose words you want to reverse.
static String reverseWords(String str) {
// Specifying the pattern to be searched
Pattern pattern = Pattern.compile("\\s");
// splitting String str with a pattern
// (i.e.) splitting the string wherever there
// is whitespace and storing the parts in the temp array.
String[] temp = pattern.split(str);
String result = "";
// Iterate over the temp array and store
// the string in reverse order.
for (int i = 0; i < temp.length; i++) {
if (i == temp.length - 1) {
result = temp[i] + result;
} else {
result = " " + temp[i] + result;
}
}
return result;
}
Depending on your exact requirements, you may want to split on other forms of whitespace (tabs, multiple spaces, etc.):
static Pattern p = Pattern.compile("(\\S+)(\\s+)(\\S+)");
public String flipWords(String in)
{
Matcher m = p.matcher(in);
if (m.matches()) {
// reverse the groups we found
return m.group(3) + m.group(2) + m.group(1);
} else {
return in;
}
}
If you want to get more complex see the docs for Pattern http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Try something as follows:
String input = "how is this";
List<String> words = Arrays.asList(input.split(" "));
Collections.reverse(words);
String result = "";
for(String word : words) {
if(!result.isEmpty()) {
result += " ";
}
result += word;
}
System.out.println(result);
Output:
this is how
Too much?
private static final Pattern WORD = Pattern.compile("^(\\p{L}+)");
private static final Pattern NUMBER = Pattern.compile("^(\\p{N}+)");
private static final Pattern SPACE = Pattern.compile("^(\\p{Z}+)");
public static String reverseWords(final String text) {
final StringBuilder sb = new StringBuilder(text.length());
final Matcher wordMatcher = WORD.matcher(text);
final Matcher numberMatcher = NUMBER.matcher(text);
final Matcher spaceMatcher = SPACE.matcher(text);
int offset = 0;
while (offset < text.length()) {
wordMatcher.region(offset, text.length());
numberMatcher.region(offset, text.length());
spaceMatcher.region(offset, text.length());
if (wordMatcher.find()) {
final String word = wordMatcher.group();
sb.insert(0, reverseCamelCase(word));
offset = wordMatcher.end();
} else if (numberMatcher.find()) {
sb.insert(0, numberMatcher.group());
offset = numberMatcher.end();
} else if (spaceMatcher.find()) {
sb.insert(0, spaceMatcher.group(0));
offset = spaceMatcher.end();
} else {
sb.insert(0, text.charAt(offset++));
}
}
return sb.toString();
}
private static final Pattern CASE_REVERSAL = Pattern
.compile("(\\p{Lu})(\\p{Ll}*)(\\p{Ll})$");
private static String reverseCamelCase(final String word) {
final StringBuilder sb = new StringBuilder(word.length());
final Matcher caseReversalMatcher = CASE_REVERSAL.matcher(word);
int wordEndOffset = word.length();
while (wordEndOffset > 0 && caseReversalMatcher.find()) {
sb.insert(0, caseReversalMatcher.group(3).toUpperCase());
sb.insert(0, caseReversalMatcher.group(2));
sb.insert(0, caseReversalMatcher.group(1).toLowerCase());
wordEndOffset = caseReversalMatcher.start();
caseReversalMatcher.region(0, wordEndOffset);
}
sb.insert(0, word.substring(0, wordEndOffset));
return sb.toString();
}