Recognizing a pattern for regular expressions [closed] - java

Please help me use a Java regular expression for the scenario described below; specifically, how do I recognize a particular pattern to match?
I have an input string that may look something like this:
something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount >
{SomeProductSet3}.count + {SomeUSOC4}.amount
I need to replace everything in the {} with something like this
something + [abc].count+[xyz].count+[something].count + [xom].amount+
[ytkd].amount > [d].count
Basically, everything between "{..}" has its equivalent as a list of things that I put in later using "[..]".
I have the list of things for [..], but how can I "recognize" the '{...}' part? It is of variable length and contains a variable set of characters.
What pattern would I use if I go with regular expressions?
Thank you !! Much appreciated.

In Java there are lots of ways to get the contents between brackets (or between any two specific characters, for that matter). You want to use a regular expression, which can be somewhat slower than a hand-written scan for this sort of thing, especially when the supplied string contains many bracket pairs.
The basic code you want to gather up the contents contained between Curly Brackets could be something like this:
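import java.util.regex.Matcher;
import java.util.regex.Pattern;
String myString = "something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount > \n" +
        " {SomeProductSet3}.count + {SomeUSOC4}.amount";
// Capture group 1 holds the text between each pair of curly brackets.
Matcher match = Pattern.compile("\\{([^}]+)\\}").matcher(myString);
while (match.find()) {
    System.out.println(match.group(1));
}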
What the above Regular Expression means:
\\{   literal Open Curly Bracket character {
(     start capture group 1
[     start a character class (one of these characters)
^     negate the class: "not the following characters"
}     with the previous ^, this means "every character except the Close Curly Bracket }"
]     end the character class
+     one or more characters from the [] set
)     end capture group 1
\\}   literal Close Curly Bracket character }
If it were me, I would put this code in a method so that it can also be used for other bracket types, such as Parentheses (), Square Brackets [], Curly Brackets {} (as shown in the code), Chevron Brackets <>, or even between any two supplied characters like /.../, %...% or even A...A. See the example method below, which demonstrates this.
In the example code above, the while loop is where you would handle each substring found between a pair of brackets. You will of course need a mechanism to decide which replacement goes with each detected substring, such as a two-dimensional array, or even a custom dialog that displays each found substring and lets the user select its replacement from a Combo Box, with a "Do All" Check Box. There are several options for how you handle each substring found between each pair of brackets.
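One possible mechanism for that loop, sketched under the assumption that a Map<String, String> from each braced name to its replacement is acceptable in place of a 2D array: Matcher.appendReplacement builds the result in a single pass, replacing each {name} exactly where it was found and leaving unknown names untouched. The class and method names (BraceReplacer, replaceBraced) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceReplacer {
    // "{", then capture everything up to the next "}", then "}".
    private static final Pattern BRACED = Pattern.compile("\\{([^}]+)\\}");

    // Replaces each {name} with replacements.get(name); unknown names are left as they were.
    public static String replaceBraced(String input, Map<String, String> replacements) {
        Matcher m = BRACED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // Fall back to the whole original match (braces included) when there is no mapping.
            String replacement = replacements.getOrDefault(key, m.group());
            // quoteReplacement keeps "$" or "\" in the replacement from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("SomeProductSet1", "[abc]");
        map.put("SomeUSOC4", "[ytkd]");
        // prints: x + [abc].count > [ytkd].amount
        System.out.println(replaceBraced("x + {SomeProductSet1}.count > {SomeUSOC4}.amount", map));
    }
}
```

Unlike calling String.replace inside the loop, this never rescans text that has already been replaced.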
Here is a method example which demonstrates what we've discussed here. It is well commented:
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
public static String replaceBetween(String inputString, String openChar,
                                    String closeChar, String[][] replacements) {
    // If the supplied input String is null or empty,
    // return an empty String ("").
    if (inputString == null || inputString.isEmpty()) { return ""; }
    // Work on a separate variable so the original input
    // string is never touched.
    String inString = inputString;
    // Escape any RegEx special characters supplied in the
    // openChar and closeChar parameters (e.g. "(", "[" or "{")
    // so that they are matched literally. We use RegEx to do this.
    Pattern regExChars = Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");
    String opnChar = regExChars.matcher(openChar).replaceAll("\\\\$0");
    String clsChar = regExChars.matcher(closeChar).replaceAll("\\\\$0");
    // Build the Pattern that captures the items between the
    // supplied openChar and closeChar. The escaped close character
    // is also used inside the negated character class, so that a
    // closeChar such as "]" does not break the pattern.
    Matcher m = Pattern.compile(opnChar + "([^" + clsChar + "]+)" + clsChar).matcher(inString);
    // Iterate through the located items...
    while (m.find()) {
        String found = m.group(1);
        // Let's see if the found item is contained within
        // our supplied 2D replacement items Array...
        for (int i = 0; i < replacements.length; i++) {
            // Is the item found in the array?
            if (replacements[i][0].equals(found)) {
                // Yup... so replace the found item (brackets included)
                // in our working string with the related element in
                // the replacement array.
                inString = inString.replace(openChar + found + closeChar, replacements[i][1]);
                break;
            }
        }
    }
    // Return the modified input string.
    return inString;
}
To use this method you might do this:
// Our 2D replacement array. In the first column we place
// the substrings we would like to find within our input
// string and in the second column we place what we want
// to replace the item in the first column with if it's
// found.
String[][] replacements = {{"SomeProductSet1", "[abc]"},
{"SomeOtherProductSet2", "[xyz]"},
{"SomeProductSet3", "[xom]"},
{"SomeUSOC4", "[ytkd]"}};
// The string we want to modify (the input string):
String example = "something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount > \n" +
" {SomeProductSet3}.count + {SomeUSOC4}.amount";
// Let's make the call and place the returned result
// into a new variable...
String newString = replaceBetween(example, "{", "}", replacements);
// Display the new string variable contents
// in Console.
System.out.println(newString);
The Console should display:
something + [abc].count + [xyz].amount >
[xom].count + [ytkd].amount
Notice that it also replaces the Curly Brackets. This appears to be one of your requirements, but the method can easily be modified to replace only the substring between the brackets. Perhaps you can modify it (if you like) to do this optionally and, as a further option, to ignore letter case.
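The question actually asks for each {set}.property to expand into several bracketed items joined with + (for example [abc].count+[xyz].count+[something].count). A minimal sketch of that, assuming each set name maps to a List<String> of members and that the property after the dot consists of word characters (\\w+). SetExpander and expand are hypothetical names, and the set contents are simply the sample values from the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SetExpander {
    // Group 1: the set name between the braces; group 2: the property after the dot.
    private static final Pattern SET_REF = Pattern.compile("\\{([^}]+)\\}\\.(\\w+)");

    public static String expand(String input, Map<String, List<String>> sets) {
        Matcher m = SET_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<String> members = sets.get(m.group(1));
            String replacement;
            if (members == null) {
                replacement = m.group(); // unknown set: keep the original text
            } else {
                String property = m.group(2);
                // ["abc", "xyz"] with property "count" becomes "[abc].count+[xyz].count".
                replacement = members.stream()
                        .map(member -> "[" + member + "]." + property)
                        .collect(Collectors.joining("+"));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> sets = new HashMap<>();
        sets.put("SomeProductSet1", Arrays.asList("abc", "xyz", "something"));
        sets.put("SomeOtherProductSet2", Arrays.asList("xom", "ytkd"));
        // prints: something + [abc].count+[xyz].count+[something].count + [xom].amount+[ytkd].amount
        System.out.println(expand("something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount", sets));
    }
}
```

Capturing the property in its own group is what lets one pattern handle both .count and .amount without special cases.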

Related

How can I check if ArrayMap.keySet() contains a certain variable + Regex?

I have an ArrayMap, of which the keys are something like tag - randomWord. I want to check if the tag part of the key matches a certain variable.
I have tried messing around with Patterns, but to no success. The only way I can get this working at this moment, is iterating through all the keys in a for loop, then splitting the key on ' - ', and getting the first value from that, to compare to my variable.
for (String s : testArray) {
if ((s.split("(\\s)(-)(\\s)(.*)")[0]).equals(variableA)) {
// Do stuff
}
}
This seems very convoluted to me, especially since I only need to know whether the keySet contains the variable; that's all I'm interested in. I was thinking about using the contains() method with (variableA + "(\\s)(-)(\\s)(.*)"), but that doesn't seem to work.
Is there a way to use the .contains() method for this case, or do I have to loop the keys manually?
You should split these tasks into two steps - first extract the tag, then compare it. Your code should look something like this:
for (String s : testArray) {
    if (variableA.equals(extractTag(s))) {
        // Do stuff
    }
}
Notice that we've separated our concerns into two steps, making it easier to verify each step behaves correctly individually. So now the question is "How do we implement extractTag()?"
The ( ) symbols in a regular expression create a capturing group, whose match you can retrieve via Matcher.group(int). If you only care about the tag, you could use a Pattern like this:
"(\\S+)\\s-\\s.*"
In which case your extractTag() method would look like:
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// Group 1 captures the tag: the non-space characters before " - ".
private static final Pattern TAG_PATTERN = Pattern.compile("(\\S+)\\s-\\s.*");
private static String extractTag(String s) {
    Matcher m = TAG_PATTERN.matcher(s);
    if (m.matches()) {
        return m.group(1);
    }
    throw new IllegalArgumentException(
            "'" + s + "' didn't match " + TAG_PATTERN.pattern());
}
If you'd rather use String.split(), you just need to define a regular expression that matches the delimiter, in this case " - "; you could use the following regular expression in a split() call:
"\\s-\\s"
It's often a good idea to use + after \\s to support one or more spaces, but it depends on what inputs you need to process. If you know it's always exactly one-space-followed-by-one-dash-followed-by-one-space, you could just split on:
" - "
In which case your extractTag() method would look like:
private static String extractTag(String s) {
    String[] parts = s.split(" - ");
    if (parts.length > 1) {
        // The tag is everything before the first " - ".
        return parts[0];
    }
    throw new IllegalArgumentException("Could not extract tag from '" + s + "'");
}
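If all you need is a yes/no answer, no regex is required. keySet().contains() cannot do this by itself, because it compares whole keys with equals(), but you can check whether any key starts with the tag followed by " - ". A sketch, assuming every key really has the form tag - randomWord with single spaces around the dash; containsTag is a hypothetical helper, and java.util.HashMap stands in for ArrayMap (both implement java.util.Map, so the same code should apply).

```java
import java.util.HashMap;
import java.util.Map;

public class TagLookup {
    // True if any key has the form "<tag> - <anything>".
    public static boolean containsTag(Map<String, ?> map, String tag) {
        String prefix = tag + " - ";
        return map.keySet().stream().anyMatch(key -> key.startsWith(prefix));
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("fruit - apple", 1);
        map.put("veg - carrot", 2);
        System.out.println(containsTag(map, "fruit")); // true
        System.out.println(containsTag(map, "meat"));  // false
    }
}
```

java.util.stream requires Android API level 24 or higher; on older levels, a plain for loop over keySet() with the same startsWith check does the same job.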

Making only the first letter of a word uppercase

I have a method that converts all the first letters of the words in a sentence into uppercase.
public static String toTitleCase(String s)
{
String result = "";
String[] words = s.split(" ");
for (int i = 0; i < words.length; i++)
{
result += words[i].replace(words[i].charAt(0)+"", Character.toUpperCase(words[i].charAt(0))+"") + " ";
}
return result;
}
The problem is that the method also converts to uppercase every other occurrence of the word's first letter within that word. For example, the word "title" comes out as "TiTle".
For the input "this is a title", the output becomes "This Is A TiTle".
I've tried lots of things. A nested loop that checks every letter in each word, and if there is a recurrence, the second is ignored. I used counters, booleans, etc. Nothing works and I keep getting the same result.
What can I do? I only want the first letter in upper case.
Instead of using the replace() method, try replaceFirst().
result += words[i].replaceFirst(words[i].charAt(0)+"", Character.toUpperCase(words[i].charAt(0))+"") + " ";
Will output:
This Is A Title
The problem is that you are using the replace method, which replaces all occurrences of the given character. To solve this you can either:
1. use replaceFirst instead;
2. take the first letter, create its uppercase version, and concatenate it with the rest of the string, which you can get with a little help from the substring method;
3. use a regex-based method such as replaceAll and add ^ before the character you want to replace, like replaceAll("^a", "A"). ^ means start of input, so only an a at the very start is replaced. (Note that replace(CharSequence, CharSequence) itself does not use regex; it matches its argument literally.)
I would probably use the second approach.
Also, in each iteration of your loop, result += ... builds a completely new String (older compilers do this by creating a new StringBuilder from result, appending the new word and calling toString()).
This is inefficient. Instead, create a single StringBuilder before the loop to represent your result, append each new word to it inside the loop, and once the loop ends, get its String version with the toString() method.
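A sketch of the second approach with a single StringBuilder, as described above. It assumes words are separated by single spaces (as in the original split(" ")); it keeps the method name toTitleCase from the question but is otherwise an illustration, not the asker's code. Unlike the original, it does not leave a trailing space.

```java
public class TitleCase {
    // Uppercases the first character of each space-separated word; the rest is unchanged.
    public static String toTitleCase(String s) {
        StringBuilder result = new StringBuilder();
        String[] words = s.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                result.append(' '); // separator between words, no trailing space
            }
            String word = words[i];
            if (!word.isEmpty()) { // split yields "" for consecutive spaces
                result.append(Character.toUpperCase(word.charAt(0)))
                      .append(word.substring(1));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTitleCase("this is a title")); // This Is A Title
    }
}
```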
Doing some Regex-Magic can simplify your task:
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
public static void main(String[] args) {
    final String test = "this is a Test";
    final StringBuffer buffer = new StringBuffer(test);
    // Start of a word followed by a lower-case letter.
    final Pattern pattern = Pattern.compile("\\b(\\p{javaLowerCase})");
    final Matcher matcher = pattern.matcher(buffer);
    while (matcher.find()) {
        buffer.replace(matcher.start(), matcher.end(), matcher.group().toUpperCase());
    }
    System.out.println(buffer);
}
The expression \\b(\\p{javaLowerCase}) matches "the beginning of a word followed by a lower-case letter". matcher.group() returns the whole match, which here is the same as the part inside the () (group 1), because \\b itself consumes no characters. Example: applied to "test", it matches "t", so start is 0, end is 1 and group is "t". This runs efficiently even over a large amount of text and replaces every letter that needs replacing.
In addition, it is generally a good idea to use a StringBuffer (or similar) for repeated String manipulation, because Strings in Java are immutable. If you write result += stringPart, you actually create a new String (equal to result + stringPart) each time. So if you do this with 10 parts, you will create at least 10 different Strings, while you only need one: the final one.
A StringBuffer instead keeps a resizable internal character array, so changing or appending characters usually requires no new String. StringBuilder is its unsynchronized equivalent and is usually preferred when only one thread is involved.
Note that a Pattern only needs to be compiled once, so you can keep it in a static final field of the class.
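On Java 9 or later, Matcher.replaceAll(Function<MatchResult, String>) performs the same replacement without mutating a buffer while it is being matched. A minimal sketch reusing the same expression; capitalizeWords is a hypothetical name.

```java
import java.util.regex.Pattern;

public class TitleCaseRegex {
    // Compiled once and reused, as suggested above.
    private static final Pattern WORD_START = Pattern.compile("\\b(\\p{javaLowerCase})");

    public static String capitalizeWords(String text) {
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9 or later.
        return WORD_START.matcher(text).replaceAll(mr -> mr.group(1).toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("this is a Test")); // This Is A Test
    }
}
```

One caveat that applies to both versions: \\b also sits between an apostrophe and a letter, so "don't" would become "Don'T".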

Error when splitting a string in java

I am trying to split a string according to a certain set of delimiters.
My delimiters are: ,"():;.!? single spaces or multiple spaces.
This is the code I'm currently using:
String[] arrayOfWords= inputString.split("[\\s{2,}\\,\"\\(\\)\\:\\;\\.\\!\\?-]+");
which works fine for most cases, but I'm having a problem when the first word is surrounded by quotation marks. For example,
String inputString = "\"Word\" some more text.";
is giving me this output:
arrayOfWords[0] = ""
arrayOfWords[1] = "Word"
arrayOfWords[2] = "some"
arrayOfWords[3] = "more"
arrayOfWords[4] = "text"
I want the output to give me an array with
arrayOfWords[0] = "Word"
arrayOfWords[1] = "some"
arrayOfWords[2] = "more"
arrayOfWords[3] = "text"
This code has been working fine when quotation marks are used in the middle of the sentence, I'm not sure what the trouble is when it's at the beginning.
EDIT: I just realized I have the same problem when any of the delimiters is used as the first character of the string.
Unfortunately you won't be able to remove this empty first element using split alone. You should first remove any delimiters at the start of your string, and split after that. Also, your regex seems to be incorrect because:
1. by adding {2,} inside [...] you are making the characters {, 2, , and } delimiters;
2. you don't need to escape the rest of your delimiters inside a character class (note that you don't even have to escape - when it is at the end of the character class [], because there it can't act as a range operator).
Try something like this:
String regexDelimiters = "[\\s,\"():;.!?\\-]+";
String inputString = "\"Word\" some more text.";
String[] arrayOfWords = inputString.replaceAll(
"^" + regexDelimiters,"").split(regexDelimiters);
for (String s : arrayOfWords)
System.out.println("'" + s + "'");
output:
'Word'
'some'
'more'
'text'
A delimiter is interpreted as separating the strings on either side of it, thus the empty string on its left is added to the result as well as the string to its right ("Word"). To prevent this, you should first strip any leading delimiters, as described here:
How to prevent java.lang.String.split() from creating a leading empty string?
So in short form you would have:
String delim = "[\\s,\"():;.!?\\-]+";
String[] arrayOfWords = inputString.replaceFirst("^" + delim, "").split(delim);
Edit: Looking at Pshemo's answer, I realize he is correct about your regex. Inside the brackets it's unnecessary to specify the number of space characters, as they will be caught by the + quantifier.
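As an alternative to stripping leading delimiters first, you can split and then drop the empty strings. A sketch using Pattern.splitAsStream (Java 8+) with the corrected delimiter class from the answers above; SplitWords and words are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitWords {
    private static final Pattern DELIMS = Pattern.compile("[\\s,\"():;.!?\\-]+");

    // Splits on the delimiters and discards the empty string a leading delimiter produces.
    public static String[] words(String input) {
        return DELIMS.splitAsStream(input)
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints [Word, some, more, text]
        System.out.println(Arrays.toString(words("\"Word\" some more text.")));
    }
}
```

The filter also makes the result independent of how many delimiters appear at the start.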

Remove curly brace in Java

I have a text file in which each line begins and ends with a curly brace:
{aaa,":"bbb,ID":"ccc,}
{zzz,":"sss,ID":"fff,}
{ggg,":"hhh,ID":"kkk,} ...
Between the characters there are no spaces. I'm trying to remove the curly braces and replace them with white space as follows:
String s = "{aaa,":"bbb,ID":"ccc,}";
String n = s.replaceAll("{", " ");
I've tried escaping the curly brace using:
String n = s.replaceAll("/{", " ");
String n = s.replaceAll("'{'", " ");
None of this works, as it comes up with an error. Does anyone know a solution?
You cannot define a String like this:
String s = "{aaa,":"bbb,ID":"ccc,}";
The error is here, you have to escape the double quotes inside the string, like this:
String s = "{aaa,\":\"bbb,ID\":\"ccc,}";
Now there will be no error if you call
s.replaceAll("\\{", " ");
If you use an IDE (a program like Eclipse), you will notice that a string literal is colored differently from the standard black (for example, the color of a method name or a semicolon [;]). If the whole string has the same color (usually brown, sometimes blue), you should be OK; if you notice some black inside it, you are doing something wrong. Usually the only thing you would put after a closing double quote ["] is a plus [+] followed by something to be added to the string. For example:
String firstPiece = "This is a ";
// this is ok:
String all = firstPiece + "String";
// if you call:
System.out.println(all);
// the output will be: This is a String
// this is not ok:
String allWrong = firstPiece "String";
// Even though the two strings are placed one after the other, this is forbidden and is a syntax error.
String.replaceAll() takes a regex, and regex requires escaping of the '{' character. So, replace:
s.replaceAll("{", " ");
with:
s.replaceAll("\\{", " ");
Note the double-escapes - one for the Java string, and one for the regex.
However, you don't really need a regex here since you're just matching a single character. So you could use the replace method instead:
s.replace("{", " "); // Replace string occurrences
s.replace('{', ' '); // Replace character occurrences
Or, use the regex version to replace both braces in one fell swoop:
s.replaceAll("[{}]", " ");
No escaping is needed here since the braces are inside a character class ([]).
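To compare the options side by side, here is a small sketch (the method names are hypothetical). It also shows Pattern.quote, which escapes a literal for use as a regex so you don't have to add the backslashes yourself.

```java
import java.util.regex.Pattern;

public class BraceStrip {
    // Pattern.quote wraps the text in \Q...\E, so "{" is taken literally by the regex engine.
    public static String openBraceQuoted(String s) {
        return s.replaceAll(Pattern.quote("{"), " ");
    }

    // Character class: both braces handled by one regex, no backslashes needed.
    public static String bothBracesRegex(String s) {
        return s.replaceAll("[{}]", " ");
    }

    // No regex at all: replace(char, char) matches literally.
    public static String bothBracesLiteral(String s) {
        return s.replace('{', ' ').replace('}', ' ');
    }

    public static void main(String[] args) {
        String s = "{aaa,\":\"bbb,ID\":\"ccc,}";
        System.out.println("[" + openBraceQuoted(s) + "]"); // [ aaa,":"bbb,ID":"ccc,}]
        System.out.println("[" + bothBracesRegex(s) + "]"); // [ aaa,":"bbb,ID":"ccc, ]
        System.out.println(bothBracesRegex(s).equals(bothBracesLiteral(s))); // true
    }
}
```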
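Just adding to the answer above: String.contains() matches its argument literally (it does not take a regex), so if somebody tries something like the code below, it won't work, because contains("\\{") looks for a backslash followed by a brace: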
if(values.contains("\\{")){
values = values.replaceAll("\\{", "");
}
if(values.contains("\\}")){
values = values.replaceAll("\\}", "");
}
If you are using contains(), use the code below instead: pass the plain character to contains() and the escaped regex only to replaceAll():
if(values.contains("{")){
values = values.replaceAll("\\{", "");
}
if(values.contains("}")){
values = values.replaceAll("\\}", "");
}
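A short sketch of the underlying difference: contains() and replace() treat their arguments as plain text, while replaceAll() and Pattern treat them as regular expressions. stripBraces is a hypothetical helper showing that with replace() neither the contains() guard nor any escaping is needed.

```java
import java.util.regex.Pattern;

public class LiteralVsRegex {
    // replace() works on literal text, so no escaping and no contains() check are needed.
    public static String stripBraces(String values) {
        return values.replace("{", "").replace("}", "");
    }

    public static void main(String[] args) {
        String values = "{a,b}";
        System.out.println(values.contains("{"));   // true: literal "{"
        System.out.println(values.contains("\\{")); // false: looks for a backslash followed by "{"
        // For a regex search, go through Pattern/Matcher instead of contains().
        System.out.println(Pattern.compile("\\{").matcher(values).find()); // true
        System.out.println(stripBraces(values)); // a,b
    }
}
```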

Regular expression, value in between quotes

I'm having a little trouble constructing the regular expression using java.
The constraint is that I need to split a string separated by !. The two strings will be enclosed in double quotes.
For example:
"value"!"value"
If I performed a java split() on the string above, I want to get:
value
value
However, the catch is that a value can contain any characters: punctuation, digits, spaces, etc.
So here's a more concrete example. Input:
""he! "l0"!"wor!"d1"
Java's split() should return:
"he! "l0
wor!"d1
Any help is much appreciated. Thanks!
Try this expression: (".*")\s*!\s*(".*")
Although it would not work with split, it should work with Pattern and Matcher and return the 2 strings as groups.
String input = "\" \"he\"\"\"\"! \"l0\" ! \"wor!\"d1\"";
Pattern p = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");
Matcher m = p.matcher(input);
if(m.matches())
{
String s1 = m.group(1); //" "he""""! "l0"
String s2 = m.group(2); //"wor!"d1"
}
Edit:
This would not work for all cases; e.g. for "he"!"llo" ! "w" ! "orld" it would produce the wrong groups. In that case it would be really hard to determine which ! should be the separator. That's why rarely used characters are often chosen to separate parts of a string, like @ in email addresses :)
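Strip the outer quotes, then have the value split on "!" (quote, exclamation mark, quote) instead of just !:
String REGEX = "\"!\"";
String INPUT = "\"\"he! \"l0\"!\"wor!\"d1\"";
// Remove the leading and trailing double quote, then split on "!"
String[] items = INPUT.substring(1, INPUT.length() - 1).split(REGEX);
// items[0] is "he! "l0 and items[1] is wor!"d1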
It feels like you need to parse on:
DOUBLEQUOTE = "
OTHER = anything that isn't a double quote
EXCLAMATION = !
ITEM = DOUBLEQUOTE (OTHER | (DOUBLEQUOTE OTHER DOUBLEQUOTE))* DOUBLEQUOTE
LINE = ITEM (EXCLAMATION ITEM)*
It feels like it's possible to create a regular expression for the above (assuming the double quotes in an ITEM can't be nested any further), BUT it might be better served by a very simple grammar.
This might work... excusing missing escapes and the like
^"([^"]*|"[^"]*")*"(!"([^"]*|"[^"]*")*")*$
Another option would be to match against the first part, then, if there's a ! and more, prune off the ! and keep matching (excuse the no-particular-language pseudocode; I'm just trying to illustrate the idea):
resultList = []
while(string matches \^"([^"]*|"[^"]*")*(.*)$" => match(1)) {
resultList += match
string = match(2)
if(string.beginsWith("!")) {
string = string[1:end]
} elseif(string.length > 0) {
// throw an error, since there was no exclamation and the string isn't done
}
}
if(string.length > 0) {
// throw an exception since the string isn't done
}
resultsList == the list of items in the string
EDIT: I realized that my answer doesn't really work. You can have a single double quote inside the strings, as well as exclamation marks. So the one thing you really CAN'T have inside one of the strings is "!". Given that, the idea of 1) pulling the quotes off the ends and 2) splitting on '"!"' is really the right way to go.
