Using "Predefined character classes" on a substring

I want to check the last character of the String for characters that are not a non-word character using '\W' and allow certain symbols like ". , ! etc". Off the top of my head, I thought of using code similar to this:
Boolean notCompleted = true;
int deduct = 1;
while(notCompleted){
if(string.charAt(string.length() -deduct) == '\W'){ // '\W' <-- doesn't work since it accepts anything other than "escape sequences".
if(string.charAt(string.length() -deduct) == '.'||string.charAt(string.length() -deduct) == ','||string.charAt(string.length() -deduct) == '!'){
//Do nothing and move on to the while loop
}else{
//Replace the non word character with ' '.
}
}
deduct++;
if(deduct >= html.length()){
notCompleted = false;
}
}
The reason this doesn't work is that a char literal only accepts valid escape sequences, and '\W' is not one of them.
My question: is there another way to pull this off, other than doing the following?
string.replaceAll("\\W", "");
All suggestions are greatly appreciated. Thank you.
Thanks to the tip npinti gave me, I built the code below. However, I am getting a NullPointerException on this line:
fakeNewString = sb.toString(); // NullPointerException
The desired result for fakeNewString, as requested, is "! asdsdefwef.,a,,sda.sd".
public static void test5(){
Boolean notCompleted = true;
String fakeNewString = "!##$%^&*( asdsdefwef.,a,,sda.sd";
int start = 0, end = 1;
StringBuilder sb = null;
try{
while(notCompleted){
start++;
String tempString = fakeNewString.substring(start, end);
if(Pattern.matches("\\W$", tempString)){
if(Pattern.matches("!", tempString)||Pattern.matches(".", tempString)||Pattern.matches(",", tempString)||Pattern.matches("\"", tempString)){
//do nothing
sb.append(tempString);
}else{
//Change it to spaces.
tempString = " ";
sb.append(tempString);
}
}
end++;
if(end >= fakeNewString.length()){
notCompleted = false;
fakeNewString = sb.toString();
System.out.println(fakeNewString);
}
}
}catch (Exception e) {
// TODO: handle exception
e.printStackTrace();
}
}

You can do something like so:
Pattern pattern = Pattern.compile("\\W$");
Matcher matcher = pattern.matcher(string);
if (matcher.find())
{
//do something when the string ends with a non word character
}
Take a look at this tutorial for more information on regular expressions.

You can use String.replaceAll in a slightly different way to do this. It achieves the same effect as the code you're trying to write, which seems like a complex solution to a simple problem. Try this code:
string = string.replaceAll("[^\\w!,.]", " ");
Every invalid character is now replaced by a space, so a run of consecutive invalid characters becomes a run of spaces. Note that replaceAll returns a new String, so the result has to be assigned back.
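As a rough sketch of how this behaves on the sample string from the question (the class and method names are made up for illustration): the first method replaces each disallowed character with its own space, while the second adds a + quantifier, which is an assumption about what the asker wants, so that a whole run collapses into a single space.

```java
class CleanDemo {
    // One space per disallowed character; the length of the string is preserved.
    static String replaceEach(String s) {
        return s.replaceAll("[^\\w!,.]", " ");
    }

    // One space per run of disallowed characters (note the added +).
    static String replaceRuns(String s) {
        return s.replaceAll("[^\\w!,.]+", " ");
    }

    public static void main(String[] args) {
        String s = "!##$%^&*( asdsdefwef.,a,,sda.sd";
        System.out.println(replaceEach(s));
        System.out.println(replaceRuns(s)); // ! asdsdefwef.,a,,sda.sd
    }
}
```

The second variant produces the desired result quoted in the question; the first keeps every position of the original string intact, which may matter if offsets are used elsewhere.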

Let's try to break down the question (desire) and answer it:
I want to check the last character of the String for characters that are not a non-word character using '\W' and allow certain symbols like ". , ! etc"
First we have:
I want to check the last character of the String
Expression for character X at end of string:
X$
Then:
for characters that are not a non-word character
Expression:
[^\W] i.e. \w
And also:
allow certain symbols like ". , ! etc"
Added to the expression above:
[\w.,!]
And the combined final result is:
[\w.,!]$
Ta-da! (Although I'm guessing the OP is looking for something else; I did it for the lulz.)
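A small sketch of how that final expression could be used from Java, assuming the goal is a yes/no check on the last character. The helper name endsWithAllowed is made up. Because the pattern is anchored only at the end, find() is used rather than matches().

```java
import java.util.regex.Pattern;

class LastCharDemo {
    // A word character or one of . , ! at the end of the input.
    private static final Pattern ALLOWED_END = Pattern.compile("[\\w.,!]$");

    static boolean endsWithAllowed(String s) {
        // find() searches for the pattern anywhere; the $ pins it to the end.
        return ALLOWED_END.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithAllowed("hello!")); // true
        System.out.println(endsWithAllowed("hello#")); // false
    }
}
```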

Related

String match of only 3 specific words

I want a regex that matches a string made up only of the words A, I and D, in any order.
Also, if the string contains a letter that is not one of these, it should not go into the if.
I have tried with || and other symbols, but I still can't get it to work.
It doesn't have to be a regex; I'm just trying to find a way to solve it.
String message = "AIDDDAAIDAAA";
if(message.matches("(A|D|I)")){
System.out.println("Matches");
}

You can include all the characters you are interested in within square brackets. To match one or more occurrences of these characters, append + to the bracket expression. The message string must consist entirely of these characters to be considered a match.
Try this.
String message = "ADAIAIAIAIAIADDDAI";
if(message.matches("[ADI]+")) {
System.out.println("Matches");
}

Since you're asking about "words", I guess A, D and I stand for words with more than one letter, and that this is the reason why you're not using the character class [ADI]. You just have to add a + because the message consists of several words.
String message = "AIDDDAAIDAAA";
if (message.matches("(A|D|I)+")) {
System.out.println("Matches");
}
scig's answer works as well.
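If the three tokens really are multi-letter words, the same alternation still applies. A minimal sketch, with ADD, INS and DEL as purely hypothetical stand-ins for the real words:

```java
class WordsDemo {
    static boolean onlyKnownWords(String message) {
        // (?:...) groups the alternatives without capturing; + requires at
        // least one word, and matches() requires the whole string to fit.
        return message.matches("(?:ADD|INS|DEL)+");
    }

    public static void main(String[] args) {
        System.out.println(onlyKnownWords("ADDDELINSADD")); // true
        System.out.println(onlyKnownWords("ADDX"));         // false
    }
}
```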

You could use string replace to do something similar:
String message = "AIDDDAAIDAAA";
message = message.replace("A","")
.replace("I","")
.replace("D","");
if (message.equals("")) {
//do your thing
}

You could just check each char composing the String:
String message = "AIDDDAAIDAAA";
boolean matches = true;
for (int i = 0; i < message.length(); i++) {
if (message.charAt(i)!='A' && message.charAt(i)!='I' && message.charAt(i)!='D'){
matches = false;
break;
}
}
if (matches) System.out.println("Matches");

Replacing Strings with a number in it without a for loop

I currently have this code:
for (int i = 1; i <= this.max; i++) {
in = in.replace("{place" + i + "}", this.getUser(i)); // Get the place of a user.
}
This works well, but I would like to keep it simple by using Pattern matching, so I used this code to check whether the placeholder matches:
System.out.println(StringUtil.matches("{place5}", "\\{place\\d\\}"));
Here is StringUtil's matches method:
public static boolean matches(String string, String regex) {
if (string == null || regex == null) return false;
Pattern compiledPattern = Pattern.compile(regex);
return compiledPattern.matcher(string).matches();
}
That returns true. The next part is where I need help: replacing {place5} so that I can parse the number. I could strip "{place" and "}", but if the string contains several placeholders (for example "{place5} {username}"), that no longer works, as far as I'm aware. If you know a simple way to do this, please let me know; if not, I can just stick with the for loop.

then comes the next part I need help with, replacing the {place5} so I can parse the number
In order to obtain the number after {place, you can use
s = s.replaceAll(".*\\{place(\\d+)}.*", "$1");
The regex matches an arbitrary number of characters before the string we are searching for, then {place, then matches and captures one or more digits with (\d+), then }, and finally the rest of the string with .*. Note that if the string contains newline characters, you should prepend (?s) to the pattern so that . matches them too. $1 in the replacement pattern "restores" the value we need.
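When a string holds several placeholders, one possible way to replace them all in a single pass is Matcher.appendReplacement. This is only a sketch under assumptions: getUser here is a hypothetical stand-in for the asker's this.getUser(i), and the class name is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderDemo {
    private static final Pattern PLACE = Pattern.compile("\\{place(\\d+)\\}");

    // Hypothetical stand-in for the asker's this.getUser(i).
    static String getUser(int place) {
        return "user" + place;
    }

    static String fill(String in) {
        Matcher m = PLACE.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int place = Integer.parseInt(m.group(1)); // digits captured by (\d+)
            // quoteReplacement guards against $ or \ in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(getUser(place)));
        }
        m.appendTail(out); // copy whatever follows the last placeholder
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("{place5} {username} {place12}"));
        // user5 {username} user12
    }
}
```

Other placeholders such as {username} are left untouched because they do not match the pattern.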

Split comma separated string with quotes and commas within quotes and escaped quotes within quotes

I searched as far as page 3 on Google for this problem, but there seems to be no proper solution.
The following string
"zhg,wimö,'astor wohnideen','multistore 2002',yonza,'asdf, saflk','marc o\'polo'"
should be split on commas in Java. The quotes can be double or single. I tried the following regex:
,(?=([^\"']*[\"'][^\"']*[\"'])*[^\"']*$)
but because of the escaped quote at 'marc o\'polo' it fails...
Can somebody help me out?
Code for tryout:
String checkString = "zhg,wimö,'astor wohnideen','multistore 2002',yonza,'asdf, saflk','marc o\\'polo'";
Pattern COMMA_PATTERN = Pattern.compile(",(?=([^\"']*[\"'][^\"']*[\"'])*[^\"']*$)");
String[] splits = COMMA_PATTERN.split(checkString);
for (String split : splits) {
System.out.println(split);
}

You can do it like this:
List<String> result = new ArrayList<String>();
Pattern p = Pattern.compile("(?>[^,'\"]++|(['\"])(?>[^\"'\\\\]++|\\\\.|(?!\\1)[\"'])*\\1|(?<=,|^)\\s*(?=,|$))+", Pattern.DOTALL);
Matcher m = p.matcher(checkString);
while(m.find()) {
result.add(m.group());
}

Splitting CSV with a regex is not the right solution... which is probably why you are struggling to find one with split/csv/regex search terms.
Using a dedicated library with a state machine is typically the best solution. There are a number of them:
This closed question seems relevant: https://stackoverflow.com/questions/12410538/which-is-the-best-csv-parser-in-java
I have used opencsv in the past, and I believe the Apache CSV tool is good too. I am sure there are others. I am deliberately not linking to any library because you should do your own research on what to use.
I have been involved in a number of commercial projects where the CSV parser was custom-built, but I see no reason why that should still be done.
What I can say is that regex and CSV get very, very complicated relatively quickly (as you have discovered), and that for performance reasons alone, a 'raw' parser is better.

If you are parsing CSV (or something very similar), then using one of the established frameworks is normally a good idea, as they cover most corner cases and are tested by a wider audience through usage in different projects.
If, however, libraries are not an option, you could go with something like this:
import java.util.ArrayList;
import java.util.List;
public class Curios {
public static void main(String[] args) {
String checkString = "zhg,wimö,'astor wohnideen','multistore 2002',yonza,'asdf, saflk','marc o\\'polo'";
List<String> result = splitValues(checkString);
System.out.println(result);
System.out.println(splitValues("zhg\\,wi\\'mö,'astor wohnideen','multistore 2002',\"yo\\\"nza\",'asdf, saflk\\\\','marc o\\'polo',"));
}
public static List<String> splitValues(String checkString) {
List<String> result = new ArrayList<String>();
// Used for reporting errors and detecting quotes
int startOfValue = 0;
// Used to mark the next character as being escaped
boolean charEscaped = false;
// Is the current value quoted?
boolean quoted = false;
// Quote-character in use (only valid when quoted == true)
char quote = '\0';
// All characters read from current value
final StringBuilder currentValue = new StringBuilder();
for (int i = 0; i < checkString.length(); i++) {
final char charAt = checkString.charAt(i);
if (i == startOfValue && !quoted) {
// We have not yet decided if this is a quoted value, but we are right at the beginning of the next value
if (charAt == '\'' || charAt == '"') {
// This will be a quoted String
quote = charAt;
quoted = true;
startOfValue++;
continue;
}
}
if (!charEscaped) {
if (charAt == '\\') {
charEscaped = true;
} else if (quoted && charAt == quote) {
if (i + 1 == checkString.length()) {
// So we don't throw an exception
quoted = false;
// Last value will be added to result outside loop
break;
} else if (checkString.charAt(i + 1) == ',') {
// Ensure we don't parse , again
i++;
// Add the value to the result
result.add(currentValue.toString());
// Prepare for next value
currentValue.setLength(0);
startOfValue = i + 1;
quoted = false;
} else {
throw new IllegalStateException(String.format(
"Value was quoted with %s but prematurely terminated at position %d " +
"maybe a \\ is missing before this %s or a , after? " +
"Value up to this point: \"%s\"",
quote, i, quote, checkString.substring(startOfValue, i + 1)));
}
} else if (!quoted && charAt == ',') {
// Add the value to the result
result.add(currentValue.toString());
// Prepare for next value
currentValue.setLength(0);
startOfValue = i + 1;
} else {
// a boring character
currentValue.append(charAt);
}
} else {
// So we don't forget to reset for next char...
charEscaped = false;
// Here we can do interpolations
switch (charAt) {
case 'n':
currentValue.append('\n');
break;
case 'r':
currentValue.append('\r');
break;
case 't':
currentValue.append('\t');
break;
default:
currentValue.append(charAt);
}
}
}
if(charEscaped) {
throw new IllegalStateException("Input ended with a stray \\");
} else if (quoted) {
throw new IllegalStateException("Last value was quoted with "+quote+" but there is no terminating quote.");
}
// Add the last value to the result
result.add(currentValue.toString());
return result;
}
}
Why not simply a regular expression?
Regular expressions don't understand nesting very well. While the regular expression by Casimir certainly does a good job, the differences between quoted and unquoted values are easier to model in some form of state machine. You can see how difficult it was to ensure that you don't accidentally match an escaped or quoted comma. Also, since you are already evaluating every character, it is easy to interpret escape sequences like \n.
What to watch out for?
My function was not written to handle whitespace around values (this can be changed).
My function interprets the escape sequences \n, \r, \t and \\ like most C-style language interpreters, while reading \x as x (this can easily be changed).
My function accepts quotes and escapes inside unquoted values (this can easily be changed).
I did only a few tests and tried my best to get good memory management and timing, but you will need to see whether it fits your needs.

inserting parentheses and asterisks into string according to some conditions

I have the following method which is used to insert parentheses and asterisks into a boolean expression when dealing with multiplication. For instance, an input of A+B+AB will give A+B+(A*B).
However, I also need to take into account the primes (apostrophes). The following are some examples of input/output:
A'B'+CD should give (A'*B')+(C*D)
A'B'C'D' should give (A'*B'*C'*D')
(A+B)'+(C'D') should give (A+B)'+(C'*D')
I have tried the following code, but it seems to have errors. Any thoughts?
public static String modify(String expression)
{
String temp = expression;
StringBuilder validated = new StringBuilder();
boolean inBrackets=false;
for(int idx=0; idx<temp.length()-1; idx++)
{
//no prime
if((Character.isLetter(temp.charAt(idx))) && (Character.isLetter(temp.charAt(idx+1))))
{
if(!inBrackets)
{
inBrackets = true;
validated.append("(");
}
validated.append(temp.substring(idx,idx+1));
validated.append("*");
}
//first prime
else if((Character.isLetter(temp.charAt(idx))) && (temp.charAt(idx+1)=='\'') && (Character.isLetter(temp.charAt(idx+2))))
{
if(!inBrackets)
{
inBrackets = true;
validated.append("(");
}
validated.append(temp.substring(idx,idx+2));
validated.append("*");
idx++;
}
//second prime
else if((Character.isLetter(temp.charAt(idx))) && (temp.charAt(idx+2)=='\'') && (Character.isLetter(temp.charAt(idx+1))))
{
if(!inBrackets)
{
inBrackets = true;
validated.append("(");
}
validated.append(temp.substring(idx,idx+1));
validated.append("*");
idx++;
}
else
{
validated.append(temp.substring(idx,idx+1));
if(inBrackets)
{
validated.append(")");
inBrackets=false;
}
}
}
validated.append(temp.substring(temp.length()-1));
if(inBrackets)
{
validated.append(")");
inBrackets=false;
}
return validated.toString();
}
Your help will be greatly appreciated. Thank you in advance! :)

I would suggest starting with the positions of the + characters in your string. If two positions differ by 1, you don't do anything. If they differ by 2, there are two possibilities: AB or A', so you check which one it is. If they differ by more than 2, just check for the ' symbol and insert the required symbols.

You can do it in 2 passes using regular expressions:
StringBuilder input = new StringBuilder("A'B'+(CDE)+A'B");
Pattern pattern1 = Pattern.compile("[A-Z]'?(?=[A-Z]'?)");
Matcher matcher1 = pattern1.matcher(input);
while (matcher1.find()) {
input.insert(matcher1.end(), '*');
matcher1.region(matcher1.end() + 1, input.length());
}
Pattern pattern2 = Pattern.compile("([A-Z]'?[*])+[A-Z]'?");
Matcher matcher2 = pattern2.matcher(input);
while (matcher2.find()) {
int start = matcher2.start();
int end = matcher2.end();
if (start==0||input.charAt(start-1) != '(') {
input.insert(start, '(');
end++;
}
if (input.length() == end || input.charAt(end) != ')') {
input.insert(end, ')');
end++;
}
matcher2.region(end, input.length());
}
It works as follows: the regex [A-Z]'? matches a capital letter from A to Z that may be followed by an optional apostrophe, so it conveniently takes care of whether there is an apostrophe or not. The regex [A-Z]'?(?=[A-Z]'?) then means "look for a capital letter followed by an optional apostrophe, and then look ahead for (but don't consume) a capital letter followed by an optional apostrophe". These are all the places after which you want to put an asterisk. We then create a Matcher and find every match, inserting an asterisk after each one. Since we modified the string, we need to update the Matcher's region for it to keep working properly.
In the second pass, we use the regex ([A-Z]'?[*])+[A-Z]'?, which looks for "a capital letter followed by an optional apostrophe and then an asterisk, at least once, and then a capital letter followed by an optional apostrophe". These are all the groups that need parentheses around them. So we create a Matcher and find the matches. We then check whether there is already a parenthesis there (making sure not to go out of bounds). If not, we add one. We again need to update the Matcher since we inserted characters. Once this is over, we have our final string.
For more on regex:
Pattern documentation
Regex tutorial
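The two-pass idea can also be sketched with String.replaceAll for the first pass and a single scan for the second. This is a variation on the approach described in this answer, not the answerer's exact code: it builds a new string instead of inserting into a StringBuilder, and it skips wrapping only when a product is already directly enclosed in parentheses on both sides. The class and method names are illustrative, and inputs such as (A+B)C are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImplicitProduct {
    // A run of variables (each with an optional prime) joined by '*'.
    private static final Pattern PRODUCT =
            Pattern.compile("[A-Z]'?(?:\\*[A-Z]'?)+");

    static String modify(String expression) {
        // Pass 1: put '*' after every variable that is directly
        // followed by another variable.
        String starred = expression.replaceAll("([A-Z]'?)(?=[A-Z])", "$1*");

        // Pass 2: wrap each product in parentheses unless it already is.
        Matcher m = PRODUCT.matcher(starred);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            boolean open = m.start() > 0
                    && starred.charAt(m.start() - 1) == '(';
            boolean close = m.end() < starred.length()
                    && starred.charAt(m.end()) == ')';
            out.append(starred, last, m.start());
            if (open && close) {
                out.append(m.group());
            } else {
                out.append('(').append(m.group()).append(')');
            }
            last = m.end();
        }
        out.append(starred.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(modify("A'B'+CD")); // (A'*B')+(C*D)
    }
}
```

Run against the examples from the question, this sketch gives (A'*B')+(C*D), (A'*B'*C'*D') and (A+B)'+(C'*D').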

Java regex and pattern matching: finding "blanks" in pattern which do not include them?

So, I need to write a compiler scanner for a homework assignment, and thought it'd be "elegant" to use regex. The fact is, I have seldom used them before, and that was a long time ago. So I forgot most of the stuff about them and needed to have a look around. I used them successfully for the identifiers (or at least I think so; I still need to do some further tests, but for now they all look OK), but I have a problem with number recognition.
The function nextCh() reads the next character on the input (lookahead char). What I'd like to do here is to check if this char matches the regex [0-9]*. I append every matching char in the str field of my current token, then I read the int value of this field. It recognizes a single number input such as "123", but the problem I have is that for the input "123 456", the final str will be "123 456" while I should get 2 separate tokens with fields "123" and "456". Why is the " " being matched?
private void readNumber(Token t) {
t.str = "" + ch; // force conversion char --> String
final Pattern pattern = Pattern.compile("[0-9]*");
nextCh(); // get next char and check if it is a digit
Matcher match = pattern.matcher("" + ch);
while (match.find() && ch != EOF) {
t.str += ch;
nextCh();
match = pattern.matcher("" + ch);
}
t.kind = Kind.number;
try {
int value = Integer.parseInt(t.str);
t.val = value;
} catch(NumberFormatException e) {
error(t, Message.BIG_NUM, t.str);
}
}
Thank you!
PS: I did solve my problem using the code below. Nevertheless, I'd like to understand where the flaw is in my regex expression.
t.str = "" + ch;
nextCh(); // get next char and check if it is a number
while (ch>='0' && ch<='9') {
t.str += ch;
nextCh();
}
t.kind = Kind.number;
try {
int value = Integer.parseInt(t.str);
t.val = value;
} catch(NumberFormatException e) {
error(t, Message.BIG_NUM, t.str);
}
EDIT: turns out my regex also doesn't work for the identifiers recognition (again, includes blanks), so I had to switch to a system similar to my "solution" (while with a lot of conditions). Guess I'll need to study the regex again :O

I'm not 100% sure whether this is relevant in your case, but this:
Pattern.compile("[0-9]*");
matches zero or more digits anywhere in the string, because of the asterisk. I think the space gets matched because find() can succeed with an empty match, i.e. 'zero digits'. If you want to make sure the char is a digit, you have to match one or more, using the plus sign:
Pattern.compile("[0-9]+");
or, since you are only comparing a single char at a time, just match exactly one digit:
Pattern.compile("^[0-9]$");

You should be using the matches method rather than the find method. From the documentation:
The matches method attempts to match the entire input sequence against the pattern
The find method scans the input sequence looking for the next subsequence that matches the pattern.
So in other words, with find you get a match as soon as any part of the string matches the pattern (and [0-9]* can even match an empty part), whereas with matches the entire string must match the pattern.
For example, try this:
Pattern p = Pattern.compile("[0-9]*");
Matcher m123abc = p.matcher("123 abc");
System.out.println(m123abc.matches()); // prints false
System.out.println(m123abc.find()); // prints true
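To connect this back to the question's single-character input, here is a small sketch (class and method names are made up) that runs the same pattern against a lone space. find() succeeds because [0-9]* is allowed to match the empty string, while matches() fails because the space itself is not a digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    static boolean findStar(String s) {
        return Pattern.compile("[0-9]*").matcher(s).find();
    }

    static boolean matchesStar(String s) {
        return Pattern.compile("[0-9]*").matcher(s).matches();
    }

    static boolean findPlus(String s) {
        return Pattern.compile("[0-9]+").matcher(s).find();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[0-9]*").matcher(" ");
        System.out.println(m.find());            // true
        System.out.println(m.group().isEmpty()); // true: an empty match
        System.out.println(matchesStar(" "));    // false
        System.out.println(findPlus(" "));       // false
    }
}
```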

Use a simpler regex like
/\d+/
Where
\d means a digit
+ means one or more
In code:
final Pattern pattern = Pattern.compile("\\d+");
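A hedged sketch of how \d+ could pull separate number tokens out of a whole input line, rather than testing one character at a time as the question's scanner does. The class and method names are invented for illustration, and this is not a drop-in replacement for readNumber.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberTokens {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static List<String> numbers(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        // Each find() resumes after the previous match, so the blank
        // between the numbers is skipped rather than matched.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(numbers("123 456")); // [123, 456]
    }
}
```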
