remove all special characters in java [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Replacing all non-alphanumeric characters with empty strings
import java.util.Scanner;
import java.util.regex.*;
public class io{
public static void main(String args[]){
Scanner scan = new Scanner(System.in);
String c;
if((c=scan.nextLine())!=null)
{
Pattern pt = Pattern.compile("[^a-zA-Z0-9]");
Matcher match= pt.matcher(c);
while(match.find()){
c=c.replace(Character.toString(c.charAt(match.start())),"");
}
System.out.println(c);
}
}
}
Case 1
Input : hjdg$h&jk8^i0ssh6
Expect : hjdghjk8i0ssh6
Output : hjdgh&jk8^issh6
Case 2
Input : hjdgh&jk8i0ssh6
Expect : hjdghjk8i0ssh6
Output : hjdghjk8i0ssh6
Case 3
Input : hjdgh&j&k8i0ssh6
Expect : hjdghjk8i0ssh6
Output : hjdghjki0ssh6
Can anyone please help me figure out what is wrong with my code logic?

Use "\\W+" or "[^a-zA-Z0-9]" as the regex to match any special characters, and use String.replaceAll(regex, String) to replace the special characters with an empty string. Remember that the first argument of String.replaceAll is a regex, so a character you pass in has to be escaped with a backslash to be treated as a literal character.
String c= "hjdg$h&jk8^i0ssh6";
Pattern pt = Pattern.compile("[^a-zA-Z0-9]");
Matcher match= pt.matcher(c);
while(match.find())
{
String s= match.group();
c=c.replaceAll("\\"+s, "");
}
System.out.println(c);

You can read the lines and replace all special characters safely this way.
Keep in mind that if you use \\W you will not replace underscores.
Scanner scan = new Scanner(System.in);
while(scan.hasNextLine()){
System.out.println(scan.nextLine().replaceAll("[^a-zA-Z0-9]", ""));
}
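The underscore caveat can be checked with a small sketch. The class and method names here (UnderscoreDemo, stripNonWord, stripNonAlnum) are made up for illustration; only String.replaceAll and the two patterns come from the answers above.

```java
class UnderscoreDemo {
    // \W means "not a word character"; the underscore counts as a word character,
    // so it survives this replacement.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // The explicit negated class removes the underscore as well.
    static String stripNonAlnum(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("a_b$c"));  // prints a_bc
        System.out.println(stripNonAlnum("a_b$c")); // prints abc
    }
}
```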

Your problem is that the indices returned by match.start() correspond to the position of the character as it appeared in the original string when you matched it; however, as you rewrite the string c every time, these indices become incorrect.
The best approach to solve this is to use replaceAll, for example:
System.out.println(c.replaceAll("[^a-zA-Z0-9]", ""));
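If the same cleanup runs on many lines, one option is to compile the pattern once and let the matcher do the whole replacement in a single pass, so no index from the original string is ever applied to a modified one. This is only a sketch; SpecialCharStripper and strip are hypothetical names, and the regex is the one from the question.

```java
import java.util.regex.Pattern;

class SpecialCharStripper {
    // Compiled once and reused; matches any character that is not an ASCII letter or digit.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Removes all matches in one pass over the input.
    static String strip(String input) {
        return NON_ALPHANUMERIC.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("hjdg$h&jk8^i0ssh6")); // prints hjdghjk8i0ssh6
        System.out.println(strip("hjdgh&j&k8i0ssh6"));  // prints hjdghjk8i0ssh6
    }
}
```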

Related

kotlin/java match a number in a string with a regular expression [duplicate]

This question already has answers here:
How to extract numbers from a string and get an array of ints?
(13 answers)
Closed 1 year ago.
For example, if I have these strings, is there any way I can get 123 from all of them, or 777 or 888?
https://www.example.com/any/123/ and
https://www.example.com/any/777/123/ and
https://www.example.com/any/777/123/888
What I mean is how to match the last, second-to-last, or third-to-last number in the string.
You can use capture groups to solve this as
val strList = listOf("https://www.example.com/any/777/123/888", "https://www.example.com/any/123/", "https://www.example.com/any/777/123/")
val intList = mutableListOf<Int>()
val regex = Regex("/?(\\d+)")
strList.forEach { str ->
regex.findAll(str).forEach {
intList.add(it.groupValues[1].toInt())
}
}
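Assuming every number directly follows a slash and only further "/number" segments (plus an optional trailing slash) come after it,
(?<=/)\d+(?=(?:/\d+){0}/?$) matches the last number
(?<=/)\d+(?=(?:/\d+){1}/?$) matches the second-to-last number
(?<=/)\d+(?=(?:/\d+){2}/?$) matches the third-to-last number,
etc. The segment count and the $ go inside a single lookahead because a lookahead consumes nothing, so the end of the string has to be checked within it.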
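The counting idea can be sketched in Java as follows. NthFromLast and numberFromEnd are hypothetical names, and this version puts the whole tail, (?:/\d+){n}/?$, inside one lookahead and tolerates a trailing slash; treat it as one possible formulation under the same "numbers follow slashes" assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthFromLast {
    // Returns the number that is followed by exactly n more "/digits" segments
    // (and an optional trailing slash), or null if there is none.
    // n = 0 is the last number, n = 1 the second-to-last, and so on.
    static String numberFromEnd(String url, int n) {
        Pattern p = Pattern.compile("(?<=/)\\d+(?=(?:/\\d+){" + n + "}/?$)");
        Matcher m = p.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "https://www.example.com/any/777/123/888";
        System.out.println(numberFromEnd(url, 0)); // prints 888
        System.out.println(numberFromEnd(url, 1)); // prints 123
        System.out.println(numberFromEnd(url, 2)); // prints 777
    }
}
```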
With Java, you can make use of the Pattern and Matcher classes from the java.util.regex package.
For your case above, you want to match integers, so use the predefined character class \d, which matches digits.
String str = "https://www.example.com/any/777/123/";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
In the above you loop through the String finding matches, and printing each subsequent found match.

Regex for Delimeter [duplicate]

This question already has answers here:
Is it possible to match nested brackets with a regex without using recursion or balancing groups?
(2 answers)
Closed 5 years ago.
I am trying to write a regex for the delimiters "(", ")" and ",". I tried to write one, but it is not correct for these delimiters.
Let's say the input is mult(3,add(2,subs(4,3))). The output with my delimiter regex is: 3,add(2,subs(4,3.
public class Practice {
private static final String DELIMETER = "\\((.*?)\\)";
public static void main(String[] args) {
Scanner reader = new Scanner(System.in);
String arg = reader.next();
Pattern p = Pattern.compile(DELIMETER);
Matcher m = p.matcher(arg);
while (m.find()) {
System.out.println(m.group(1));
}
}
}
What is the correct regex to get the string between the delimiters?
In general, you cannot use a regex to match anything which can nest recursively. However, if you removed the ? from your regex, it would match from the first ( to the last ), which might be good enough, depending on what you expect the input to look like.
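The difference between the lazy and the greedy quantifier can be seen with the input from the question. This is a small sketch; GreedyVsLazy and firstGroup are hypothetical names used only for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "mult(3,add(2,subs(4,3)))";
        // Lazy: stops at the first closing parenthesis.
        System.out.println(firstGroup("\\((.*?)\\)", input)); // prints 3,add(2,subs(4,3
        // Greedy: runs to the last closing parenthesis.
        System.out.println(firstGroup("\\((.*)\\)", input));  // prints 3,add(2,subs(4,3))
    }
}
```

Neither form understands nesting; the greedy one only happens to capture the outermost argument list when the input is a single call.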

Java (Regex?) split string between number/letter combination

I've been looking through pages and pages of Google results but haven't come across anything that could help me.
What I'm trying to do is split a string like Bananas22Apples496Pears3, and break it down into some kind of readable format. Since String.split() cannot do this, I was wondering if anyone could point me to a regex snippet that could accomplish this.
Expanding a bit: the above string would be split into (String[] for simplicity's sake):
{"Bananas:22", "Apples:496", "Pears:3"}
Try this
String s = "Bananas22Apples496Pears3";
String[] res = s.replaceAll("(?<=\\p{L})(?=\\d)", ":").split("(?<=\\d)(?=\\p{L})");
for (String t : res) {
System.out.println(t);
}
The first step inserts a ":" at every position that has a letter on its left, checked with the lookbehind assertion (?<=\\p{L}), and a digit on its right, checked with the lookahead assertion (?=\\d).
Then the result is split at every position that has a digit on its left and a letter on its right.
\\p{L} is a Unicode property that matches every letter in every language.
You need to replace and then split the string. You can't do it with split alone.
1> Replace every match of the following regex
(\\w+?)(\\d+)
with
$1:$2
2> Now split the result with this regex
(?<=\\d)(?=[a-zA-Z])
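Put together, the two steps might look like the sketch below. ReplaceThenSplit and pair are made-up names; the two regexes are the ones listed in the steps.

```java
import java.util.Arrays;

class ReplaceThenSplit {
    static String[] pair(String s) {
        // Step 1: put a colon between each name and the number that follows it.
        String withColons = s.replaceAll("(\\w+?)(\\d+)", "$1:$2");
        // Step 2: split wherever a digit is directly followed by a letter.
        return withColons.split("(?<=\\d)(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        // prints [Bananas:22, Apples:496, Pears:3]
        System.out.println(Arrays.toString(pair("Bananas22Apples496Pears3")));
    }
}
```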
This should do what you want:
import java.util.regex.*;
String d = "Bananas22Apples496Pears3";
Pattern p = Pattern.compile("[A-Za-z]+|[0-9]+");
Matcher m = p.matcher(d);
while (m.find()) {
System.out.println(m.group());
}
// Bananas
// 22
// Apples
// 496
// Pears
// 3
String myText = "Bananas22Apples496Pears3";
System.out.println(myText.replaceAll("([A-Za-z]+)([0-9]+)", "$1:$2,"));
Replace \d+ by :$0 and then split at (?<=\d)(?=[a-zA-Z]).

Problems with building this regex [1,2,3]

I have a problem building a regex for the following string:
[1,2,3,4]
I found a workaround, but I think it's ugly:
String stringIds = "[1,2,3,4]";
stringIds = stringIds.replaceAll("\\[", "");
stringIds = stringIds.replaceAll("\\]", "");
String[] ids = stringIds.split("\\,");
Can someone please help me build one regex that I can use in the split function?
Thanks for the help.
Edit:
I want to get from the string "[1,2,3,4]" to an array with 4 entries. The entries are the 4 numbers in the string, so I need to eliminate "[", "]" and ",". The "," isn't the problem.
The first and last numbers contain [ or ], so I needed the fix with replaceAll. But I think that if I pass split a regex for ",", I can also pass a regex that eliminates "[" and "]" too. I just can't figure out what this regex should look like.
This is almost what you're looking for:
String q = "[1,2,3,4]";
String[] x = q.split("\\[|\\]|,");
The problem is that it produces an extra element at the beginning of the array due to the leading open bracket. You may not be able to do what you want with a single regex sans shenanigans. If you know the string always begins with an open bracket, you can remove it first.
The regex itself means "(split on) any open bracket, OR any closed bracket, OR any comma."
Punctuation characters frequently have additional meanings in regular expressions. The double leading backslashes... ugh, the first backslash tells the Java String parser that the next backslash is not a special character (example: \n is a newline...) so \\ means "I want an honest to God backslash". The next backslash tells the regexp engine that the next character ([ for example) is not a special regexp character. That makes me lol.
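The extra leading element, and the effect of dropping the open bracket first, can be seen in a short sketch. BracketSplit, naive and stripped are hypothetical names, and stripped assumes the input always starts with "[".

```java
import java.util.Arrays;

class BracketSplit {
    // Splits on "[", "]" or ","; the leading "[" produces an empty first element.
    static String[] naive(String s) {
        return s.split("\\[|\\]|,");
    }

    // Assumes s starts with "[": drop it, then split on "]" or ",".
    // The empty string after the final "]" is a trailing empty element, which split discards.
    static String[] stripped(String s) {
        return s.substring(1).split("\\]|,");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naive("[1,2,3,4]")));    // prints [, 1, 2, 3, 4]
        System.out.println(Arrays.toString(stripped("[1,2,3,4]"))); // prints [1, 2, 3, 4]
    }
}
```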
Maybe substring [ and ] from beginning and end, then split the rest by ,
String stringIds = "[1,2,3,4]";
String[] ids = stringIds.substring(1,stringIds.length()-1).split(",");
Looks to me like you're trying to make an array (not sure where you got 'regex' from; that means something different). In this case, you want:
String[] ids = {"1","2","3","4"};
If it's specifically an array of integer numbers you want, then instead use:
int[] ids = {1,2,3,4};
Your problem is not amenable to splitting by delimiter. It is much safer and more general to split by matching the integers themselves:
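import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
static String[] nums(String in) {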
final Matcher m = Pattern.compile("\\d+").matcher(in);
final List<String> l = new ArrayList<String>();
while (m.find()) l.add(m.group());
return l.toArray(new String[l.size()]);
}
public static void main(String args[]) {
System.out.println(Arrays.toString(nums("[1, 2, 3, 4]")));
}
If the first line of your code is the following:
String stringIds = "[1,2,3,4]";
and you're trying to iterate over all the number items, then the following code fragment should work:
try {
Pattern regex = Pattern.compile("\\b(\\d+)\\b", Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(stringIds);
while (regexMatcher.find()) {
for (int i = 1; i <= regexMatcher.groupCount(); i++) {
// matched text: regexMatcher.group(i)
// match start: regexMatcher.start(i)
// match end: regexMatcher.end(i)
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

Splitting strings based on a delimiter

I am trying to break apart a very simple collection of strings that come in the forms of
0|0
10|15
30|55
etc. Essentially, numbers that are separated by pipes.
When I use Java's String split function with .split("|"), I get somewhat unpredictable results: white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and give me advice on how I can use a reg exp to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); it drops trailing empty tokens by default. If you get rid of the non-numeric characters in the string and keep only pipes and numbers, you can easily use split() to get what you want. Or you can pass multiple delimiters to split (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
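Or match the numbers directly with a regex (each number must be followed by a pipe, white space, or the end of the string):
import java.util.regex.*;
Pattern pattern = Pattern.compile("\\d+(?=[|\\s]|$)");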
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), you need to escape it. Depending on the java version you are using this could well explain your unpredictable results.
class t {
public static void main(String[] args)
{
String temp = "0|0";
String[] splitString = temp.split("\\|");
for (int i=0; i<splitString.length; i++)
System.out.println("splitString["+i+"] is " + splitString[i]);
}
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
You can replace the white space with pipes and then split on the (escaped) pipe.
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|");
Hope this helps for you..
You can use StringTokenizer.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|"); // "|" is the delimiter; the default would be white space
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course you can always nest this inside of a while loop if you have multiple strings.
Also, you need to import java.util.* for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character, unfortunately '\' is a special character in Java so you need to do a kind of double escape maneuver e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
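An alternative to writing the escape by hand is Pattern.quote, which turns any string into a regex that matches it literally. A minimal sketch, with PipeSplit and splitOnPipe as made-up names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplit {
    // Pattern.quote("|") yields a regex that matches a literal pipe,
    // so no manual backslash escaping is needed.
    static String[] splitOnPipe(String s) {
        return s.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipe("0|0")));   // prints [0, 0]
        System.out.println(Arrays.toString(splitOnPipe("10|15"))); // prints [10, 15]
    }
}
```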
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
This should work for you:
([0-9]+)
Consider a scenario where we have read a line from a csv or xls file as a string and need to separate the columns into an array of strings, depending on the delimiters.
Below is a code snippet that does this.
{ ...
....
String line = new BufferedReader(new FileReader("your file")).readLine();
String[] splittedString = StringSplitToArray(line, "\"");
...
....
}
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
StringBuffer token = new StringBuffer();
Vector tokens = new Vector();
char[] chars = stringToSplit.toCharArray();
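for (int i = 0; i < chars.length; i++) {
// assumed condition (lost in the original): the current character is one of the delimiter characters
if (delimiter.indexOf(chars[i]) != -1) {
if (token.length() > 0) {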
tokens.addElement(token.toString());
token.setLength(0);
i++;
}
} else {
token.append(chars[i]);
}
}
if (token.length() > 0) {
tokens.addElement(token.toString());
}
// convert the vector into an array
String[] preparedArray = new String[tokens.size()];
for (int i=0; i < preparedArray.length; i++) {
preparedArray[i] = (String)tokens.elementAt(i);
}
return preparedArray;
}
The snippet above calls StringSplitToArray, which converts the line into a string array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in
