Regex pattern for ^A - java

I need to remove ^A from an incoming string, I'm looking for a regex pattern for it
Don’t want to use \\p{cntrl} , I don’t want to delete the other control characters coming in the string

You should use escaping for '^A':
public static void main(String[] args) {
String value = "remove special char ^A A and B";
System.out.println(value.replaceAll("\\^A", ""));
}
Output:
remove special char A and B

I suggest you avoid using regex and instead use basic string manipulation :
String toRemove = "^A";
yourString.replace(toRemove, "");
You can try it here.
Be careful not to use String methods that work on regexs, especially since the name of the methods aren't very informative (replace replaces all occurences of a fixed string, replaceAll replaces all occurences that match a regex pattern).

Related

Java regex to match the start of the word?

Objective: for a given term, I want to check if that term exist at the start of the word. For example if the term is 't'. then in the sentance:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing trailing and leading slashes and adding flags as (?<flags>) at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b"
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround regex with / so instead of "/regex/flags" we just write regex. If you want to add flags you can do it with (?flags) syntax and place it in regex at position from which flag should apply, for instance a(?i)a will be able to find aa and aA but not Aa because flag was added after first a.
You can also compile your regex into Pattern like this
Pattern pattern = Pattern.compile(regex, flags);
where regex is String (again not enclosed with /) and flag is integer build from constants from Pattern like Pattern.DOTALL or when you need more flags you can use Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
Next thing which may confuse you is matches method. Most people are mistaken by its name, because they assume that it will try to check if it can find in string element which can be matched by regex, but in reality, it checks if entire string can be matched by regex.
What you seem to want is mechanism to test of some regex can be found at least once in string. In that case you may either
add .* at start and end of your regex to let other characters which are not part of element you want to find be matched by regex engine, but this way matches must iterate over entire string
use Matcher object build from Pattern (representing your regex), and use its find() method, which will iterate until it finds match for regex, or will find end of string. I prefer this approach because it will not need to iterate over entire string, but will stop when match will be found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
In case your term could contain some regex special characters but you want regex engine to treat them as normal characters you need to make sure that they will be escaped. To do this you can use Pattern.quote method which will add all necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^[+"+term+"].*",Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

Case insensitive String split() method

When I perform
String test="23x34 ";
String[] array=test.split("x"); //splitting using simple letter
I got two items in array as 23 and 34
but when I did
String test="23x34 ";
String[] array=test.split("X"); //splitting using capitalletter
I got one item in array 23x34
So is there any way I can use the split method as case insensitive or whether there is any other method that can help?
split uses, as the documentation suggests, a regexp. a regexp for your example would be :
"[xX]"
Also, the (?i) flag toggles case insensitivty. Therefore, the following is also correct :
"(?i)x"
In this case, x can be any litteral properly escaped.
Use regex pattern [xX] in split
String x = "24X45";
String[] res = x.split("[xX]");
System.out.println(Arrays.toString(res));
You can also use an embedded flag in your regex:
String[] array = test.split("(?i)x"); // splits case insensitive
I personally prefer using
String modified = Pattern.compile("x", Pattern.CASE_INSENSITIVE).matcher(stringContents).replaceAll(splitterValue);
String[] parts = modified.split(splitterValue);
In this way you can ensure any regex will work, as long as you have a unique splitter value
In addition to the existing answers, you can use Pattern.CASE_INSENSITIVE flag to convert your regex pattern into a case-insensitive pattern which you can directly use to split your string e.g.
String[] arr = Pattern.compile("x", Pattern.CASE_INSENSITIVE).split("23x34 ");
Demo:
import java.util.Arrays;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("x", Pattern.CASE_INSENSITIVE);
System.out.println(Arrays.toString(pattern.split("23x34 ")));
System.out.println(Arrays.toString(pattern.split("23X34 ")));
}
}
Output:
[23, 34 ]
[23, 34 ]
Java's String class' split method also accepts regex.
To keep things short, this should help you: http://www.coderanch.com/t/480781/java/java/String-split
For JavaScript:
var test="23x34 ";
var array = test.split(\x\i);
It's a bit complex, but here's how it could be implemented:
Lowercase both the strings (overall text and search term)
Run the text.split(searchTerm)
This gives you an array of strings that are NOT search terms
By walking through this array, you're getting lengths of each of these strings
Between each of those strings, there must be a search term (with known length)
By figuring out indexes, you can now .slice() the pieces from the original string
You could use a regex as an argument to split, like this:
"32x23".split("[xX]");
Or you could use a StringTokenizer that lets you set its set of delimiters, like this:
StringTokenizer st = new StringTokenizer("32x23","xX");
// ^^ ^^
// string delimiter
This has the advantage that if you want to build the list of delimiters programatically, for example for each lowercase letter in the delimiter list add its uppercase corespondent, you can do this and then pass the result to the StringTokenizer.

Java using regex to verify an input string

g.:
String string="Marc Louie, Garduque Bautista";
I want to check if a string contains only words, a comma and spaces. i have tried to use regex and the closest I got is this :
String pattern = "[a-zA-Z]+(\\s[a-zA-Z]+)+";
but it doesnt check if there is a comma in there or not. Any suggestion ?
You need to use the pattern
^[A-Za-z, ]++$
For example
public static void main(String[] args) throws IOException {
final String input = "Marc Louie, Garduque Bautista";
final Pattern pattern = Pattern.compile("^[A-Za-z, ]++$");
if (!pattern.matcher(input).matches()) {
throw new IllegalArgumentException("Invalid String");
}
}
EDIT
As per Michael's astute comment the OP might mean a single comma, in which case
^[A-Za-z ]++,[A-Za-z ]++$
Ought to work.
Why not just simply:
"[a-zA-Z\\s,]+"
Use this will best
"(?i)[a-z,\\s]+"
If you mean "some words, any spaces and one single comma, wherever it occurs to be" then my feeling is to suggest this approach:
"^[^,]* *, *[^,]*$"
This means "Start with zero or more characters which are NOT (^) a comma, then you could find zero or more spaces, then a comma, then again zero or more spaces, then finally again zero or more characters which are NOT (^) a comma".
To validate String in java where No special char at beginning and end but may have some special char in between.
String strRGEX = "^[a-zA-Z0-9]+([a-zA-Z0-9-/?:.,\'+_\\s])+([a-zA-Z0-9])$";
String toBeTested= "TesADAD2-3t?S+s/fs:fds'f.324,ffs";
boolean testResult= Pattern.matches(strRGEX, toBeTested);
System.out.println("Test="+testResult);

How to remove special characters from a string?

I want to remove special characters like:
- + ^ . : ,
from an String using Java.
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".
Another note: the - character needs to be the first or last one on the list, otherwise you'd have to escape it or it would define a range ( e.g. :-, would mean "all characters in the range : to ,).
So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex: \p{P}\p{S} (keep in mind that in Java strings you'd have to escape back slashes: "\\p{P}\\p{S}").
A third way could be something like this, if you can exactly define what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.
Here's less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which almost matches everything, since letters are not whitespace and vice versa.
Additional information on Unicode
Some unicode characters seem to cause problems due to different possible ways to encode them (as a single code point or a combination of code points). Please refer to regular-expressions.info for more information.
This will replace all the characters except alphanumeric
replaceAll("[^A-Za-z0-9]","");
As described here
http://developer.android.com/reference/java/util/regex/Pattern.html
Patterns are compiled regular expressions. In many cases, convenience methods such as String.matches, String.replaceAll and String.split will be preferable, but if you need to do a lot of work with the same regular expression, it may be more efficient to compile it once and reuse it. The Pattern class and its companion, Matcher, also offer more functionality than the small amount exposed by String.
public class RegularExpressionTest {
public static void main(String[] args) {
System.out.println("String is = "+getOnlyStrings("!&(*^*(^(+one(&(^()(*)(*&^%$##!#$%^&*()("));
System.out.println("Number is = "+getOnlyDigits("&(*^*(^(+91-&*9hi-639-0097(&(^("));
}
public static String getOnlyDigits(String s) {
Pattern pattern = Pattern.compile("[^0-9]");
Matcher matcher = pattern.matcher(s);
String number = matcher.replaceAll("");
return number;
}
public static String getOnlyStrings(String s) {
Pattern pattern = Pattern.compile("[^a-z A-Z]");
Matcher matcher = pattern.matcher(s);
String number = matcher.replaceAll("");
return number;
}
}
Result
String is = one
Number is = 9196390097
Try replaceAll() method of the String class.
BTW here is the method, return type and parameters.
public String replaceAll(String regex,
String replacement)
Example:
String str = "Hello +-^ my + - friends ^ ^^-- ^^^ +!";
str = str.replaceAll("[-+^]*", "");
It should remove all the {'^', '+', '-'} chars that you wanted to remove!
To Remove Special character
String t2 = "!##$%^&*()-';,./?><+abdd";
t2 = t2.replaceAll("\\W+","");
Output will be : abdd.
This works perfectly.
Use the String.replaceAll() method in Java.
replaceAll should be good enough for your problem.
You can remove single char as follows:
String str="+919595354336";
String result = str.replaceAll("\\\\+","");
System.out.println(result);
OUTPUT:
919595354336
If you just want to do a literal replace in java, use Pattern.quote(string) to escape any string to a literal.
myString.replaceAll(Pattern.quote(matchingStr), replacementStr)

String splitting

I have a string in what is the best way to put the things in between $ inside a list in java?
String temp = $abc$and$xyz$;
how can i get all the variables within $ sign as a list in java
[abc, xyz]
i can do using stringtokenizer but want to avoid using it if possible.
thx
Maybe you could think about calling String.split(String regex) ...
The pattern is simple enough that String.split should work here, but in the more general case, one alternative for StringTokenizer is the much more powerful java.util.Scanner.
String text = "$abc$and$xyz$";
Scanner sc = new Scanner(text);
while (sc.findInLine("\\$([^$]*)\\$") != null) {
System.out.println(sc.match().group(1));
} // abc, xyz
The pattern to find is:
\$([^$]*)\$
\_____/ i.e. literal $, a sequence of anything but $ (captured in group 1)
1 and another literal $
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
(…) is used for grouping. (pattern) is a capturing group and creates a backreference.
The backslash preceding the $ (outside of character class definition) is used to escape the $, which has a special meaning as the end of line anchor. That backslash is doubled in a String literal: "\\" is a String of length one containing a backslash).
This is not a typical usage of Scanner (usually the delimiter pattern is set, and tokens are extracted using next), but it does show how'd you use findInLine to find an arbitrary pattern (ignoring delimiters), and then using match() to access the MatchResult, from which you can get individual group captures.
You can also use this Pattern in a Matcher find() loop directly.
Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(text);
while (m.find()) {
System.out.println(m.group(1));
} // abc, xyz
Related questions
Validating input using java.util.Scanner
Scanner vs. StringTokenizer vs. String.Split
Just try this one:temp.split("\\$");
I would go for a regex myself, like Riduidel said.
This special case is, however, simple enough that you can just treat the String as a character sequence, and iterate over it char by char, and detect the $ sign. And so grab the strings yourself.
On a side node, I would try to go for different demarkation characters, to make it more readable to humans. Use $ as start-of-sequence and something else as end-of-sequence for instance. Or something like I think the Bash shell uses: ${some_value}. As said, the computer doesn't care but you debugging your string just might :)
As for an appropriate regex, something like (\\$.*\\$)* or so should do. Though I'm no expert on regexes (see http://www.regular-expressions.info for nice info on regexes).
Basically I'd ditto Khotyn as the easiest solution. I see you post on his answer that you don't want zero-length tokens at beginning and end.
That brings up the question: What happens if the string does not begin and end with $'s? Is that an error, or are they optional?
If it's an error, then just start with:
if (!text.startsWith("$") || !text.endsWith("$"))
return "Missing $'s"; // or whatever you do on error
If that passes, fall into the split.
If the $'s are optional, I'd just strip them out before splitting. i.e.:
if (text.startsWith("$"))
text=text.substring(1);
if (text.endsWith("$"))
text=text.substring(0,text.length()-1);
Then do the split.
Sure, you could make more sophisticated regex's or use StringTokenizer or no doubt come up with dozens of other complicated solutions. But why bother? When there's a simple solution, use it.
PS There's also the question of what result you want to see if there are two $'s in a row, e.g. "$foo$$bar$". Should that give ["foo","bar"], or ["foo","","bar"] ? Khotyn's split will give the second result, with zero-length strings. If you want the first result, you should split("\$+").
If you want a simple split function then use Apache Commons Lang which has StringUtils.split. The java one uses a regex which can be overkill/confusing.
You can do it in simple manner writing your own code.
Just use the following code and it will do the job for you
import java.util.ArrayList;
import java.util.List;
public class MyStringTokenizer {
/**
* #param args
*/
public static void main(String[] args) {
List <String> result = getTokenizedStringsList("$abc$efg$hij$");
for(String token : result)
{
System.out.println(token);
}
}
private static List<String> getTokenizedStringsList(String string) {
List <String> tokenList = new ArrayList <String> ();
char [] in = string.toCharArray();
StringBuilder myBuilder = null;
int stringLength = in.length;
int start = -1;
int end = -1;
{
for(int i=0; i<stringLength;)
{
myBuilder = new StringBuilder();
while(i<stringLength && in[i] != '$')
i++;
i++;
while((i)<stringLength && in[i] != '$')
{
myBuilder.append(in[i]);
i++;
}
tokenList.add(myBuilder.toString());
}
}
return tokenList;
}
}
You can use
String temp = $abc$and$xyz$;
String array[]=temp.split(Pattern.quote("$"));
List<String> list=new ArrayList<String>();
for(int i=0;i<array.length;i++){
list.add(array[i]);
}
Now the list has what you want.

Categories

Resources