Android split string with regex not working

I'm trying to split a string at every "." or "?" and I use this regular expression:
(?<=(?!.[0-9])[?.])
In theory the code also prevents splitting if the point is followed by a number so things like 3.000 are not split and it also includes the point in the new string.
For example, if I have this text: "Hello. What's your favourite number? It's 3.560." I want to get this: "Hello.","What's your favourite number?","It's 3.560."
I've made a simple java program on my computer and it works exactly like I want:
String[] x = c.split("(?<=(?!.[0-9])[?.])");
for(String v : x){
System.out.println(v);
}
However when I use this same regex in my Android app it doesn't split anything...
x = txt.split("(?<=(?!.[0-9])[?.])");
//This, in my android app, returns an array with only one entry which is the whole string without splitting.
PS. Using (?<=[?.]) works so the problem must be in the (?!.[0-9]) part which is meant to exclude points followed by a number.

Use regex pattern
(?:(?<=[.])(?![0-9])|(?<=[?]))
str.split("(?:(?<=[.])(?![0-9])|(?<=[?]))");
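As a rough check of the pattern above, the sketch below applies it to the sample sentence from the question. The class and method names are made up for illustration, and the trim() call is an addition: a zero-width split keeps the space that follows each delimiter at the start of the next piece.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceSplitDemo {
    // Split after '.' unless a digit follows, or after '?'.
    private static final String SPLIT_REGEX = "(?:(?<=[.])(?![0-9])|(?<=[?]))";

    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split(SPLIT_REGEX)) {
            // The split is zero-width, so each piece after the first
            // still starts with the space that followed the delimiter.
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello. What's your favourite number? It's 3.560.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

The lookbehind and the lookahead sit side by side here instead of one being nested inside the other, which is the structural difference from the expression in the question.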

Remember that outside a square bracket character class, . in a regular expression means any single character. You need \. to match a literal dot, which in turn means you need \\. in the string literal.
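A small sketch of that point, using a made-up helper (countParts) that reports how many pieces String.split produces for a given regex. With an unescaped dot every character is a delimiter, so only empty strings remain and split drops them all.

```java
import java.util.regex.Pattern;

final class DotEscapeDemo {
    // Hypothetical helper: number of pieces produced by String.split.
    static int countParts(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        String text = "a.b.c";
        // Unescaped dot: matches any character, so nothing is left over.
        System.out.println(countParts(text, "."));
        // Escaped dot: "\\." in the string literal is the regex \.
        System.out.println(countParts(text, "\\."));
        // Inside a character class the dot is already literal.
        System.out.println(countParts(text, "[.]"));
        // Pattern.quote turns any string into a literal pattern.
        System.out.println(countParts(text, Pattern.quote(".")));
    }
}
```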

Try this.
public class Tester {
public static void main(String[] args){
String regex = "[?.][^\\d]";
String tester = "Testing 3.015 dsd . sd ? sds";
String[] arr = tester.split(regex);
for (String s : arr){
System.out.println(s);
}
}
}
Output:
Testing 3.015 dsd
sd
sds

Related

How to remove all non-alphabetic character excluding space using meta character from a string?

Is there a way to achieve this using meta character?
I can do this by the following regular expression : (note the space in between)
s.replaceAll("[^A-Z a-z]","")
But how to do this using meta character as I can't implement AND.
Following code obviously doesn't work , but how to do a similar thing?
s.replaceAll("\\S&\\d|\\W", "")

You can do as follows:
s = s.replaceAll("[^a-zA-Z\\s]", "");
To keep number too, you can do:
s = s.replaceAll("[^a-zA-Z0-9\\s]", "");

You can use the regex, [^A-Za-z\\s].
Demo:
public class Main {
public static void main(String[] args) {
String s = "123 & Hello * World!";
s = s.replaceAll("[^A-Za-z\\s]", "");
System.out.println(s);
}
}
Output (the spaces next to the removed characters are kept):
  Hello  World

[^A-Z a-z] is called a character class (negated here).
\S, \w, etc. are called shorthand character classes.
You ask if you can match any character different from a letter and space with a shorthand character class.
The answer is: there is no such shorthand class, but there are alternatives:
[^\\p{Alpha} ]
[^\\p{Alpha}\\p{javaSpaceChar}]
[\\p{N}\\p{P}\\p{S}]
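To compare the three alternatives, here is a minimal sketch that runs each one through replaceAll on the sample string used earlier in this thread. The class and helper names are illustrative only. Note that the third pattern lists what to remove (numbers, punctuation, symbols) rather than what to keep, so it is not an exact complement of the first two; control characters, for instance, would survive it.

```java
final class NonLetterStripDemo {
    // Hypothetical helper: delete everything the given regex matches.
    static String strip(String s, String regex) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String s = "123 & Hello * World!";
        String[] patterns = {
            "[^\\p{Alpha} ]",                   // keep ASCII letters and the space
            "[^\\p{Alpha}\\p{javaSpaceChar}]",  // keep letters and Character.isSpaceChar chars
            "[\\p{N}\\p{P}\\p{S}]"              // remove numbers, punctuation, symbols
        };
        for (String p : patterns) {
            // Brackets make the surviving spaces visible.
            System.out.println("[" + strip(s, p) + "]");
        }
    }
}
```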

Filtering string between double or single quotations with varying spaces

I have these two variations of this string
name='Anything can go here'
name="Anything can go here"
where name= can have spaces like so
name=(text)
name =(text)
name = (text)
I need to extract the text between the quotes. I'm not sure what the best approach is: should I cut the string at the quotes (do you have an example that avoids a lot of case handling?), or should I use a regex?

I'm not sure I understand the question exactly but I'll give it my best shot:
If you want to just assign a variable name2 to the string inside the quotation marks then you can easily do :
String name = "'Anything can go here'";
String name2= name.replace("'","");
name2 = name2.replace("\"","");

You're wanting to get Anything can go here whether it's in between single quotes or double quotes. Regex has the capabilities of doing this regardless of the spaces before or after the "=" by using the following pattern:
"[\"'](.+)[\"']"
Breakdown:
[\"'] - Character class consisting of a double or single quote
(.+) - One or more of any character (may or may not match line terminators), stored in capture group 1
[\"'] - Character class consisting of a double or single quote
In short, we are trying to capture anything between single or double quotes.
Example:
public static void main(String[] args) {
List<String> data = new ArrayList<>(Arrays.asList(
"name='Anything can go here'",
"name = \"Really! Anything can go here\""
));
for (String d : data) {
Matcher matcher = Pattern.compile("[\"'](.+)[\"']").matcher(d);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
}
Results:
Anything can go here
Really! Anything can go here
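One possible refinement, sketched under the assumption that the key is always the literal word name: anchor the pattern on name, allow optional whitespace around the =, and use a backreference so the closing quote has to be the same kind as the opening one. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedValueDemo {
    // "name", optional spaces around '=', an opening quote (group 1),
    // the value (group 2, reluctant), then the same quote again via \1.
    private static final Pattern NAME_VALUE =
            Pattern.compile("name\\s*=\\s*([\"'])(.*?)\\1");

    // Returns the quoted value, or null if the input does not match.
    static String extract(String input) {
        Matcher m = NAME_VALUE.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("name='Anything can go here'"));
        System.out.println(extract("name = \"Really! Anything can go here\""));
        // A single quote inside a double-quoted value is kept.
        System.out.println(extract("name =\"It's fine\""));
    }
}
```

Because of the backreference, a value wrapped in double quotes may contain a single quote and vice versa, which the [\"'](.+)[\"'] form does not guarantee.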

Word not preceded by a regular expression

There are plenty of these questions but they all focus on having a couple of characters.
In a text file I have TXX and txx, and I need to find those. But the file also contains Base64-encoded pictures.
Meaning I have
"picture":"/9j/4AAQSkTXX . . .
Basically TXX, txx can appear randomly in Base64-encoded pictures.
I used the following regular expression:
(?<!"picture":")(?:(\w|\/|\+)+)(TXX|txx)
I also realized it should probably be changed into:
(?<!"picture":")(?:(\d|\w|\/|\+|\=)+)(TXX|txx)
But I get a catastrophic backtracking error, and even without the non-capturing group (?:) it still doesn't work: it just skips "picture":" and the first character and matches everything else.
Since I cannot put a regular expression inside the negative look-behind with a quantifier like
(?<!"picture":".+)TXX|txx
How should I form that regular expression so that these pass
"something-txx": "somerandomstring"
value not picture: "some other stringtxxsome string"
But this doesn't
"picture":"txxl5l71JGwnxMXAmJGOt8ZPwN24JNgtZpYHPBQLTViqVatk4ZoZhY+husj7Pgv3ag4NmpJ4CBlXudzydA5c+5QecmgaPz9vLrSbzRa+tNns0GjUfD+NSa5ZHo9KRf2nCWLl7360x2Kx8zA6dquNqubjoElpVRo2Dq0GOmZ8HMycktxxH08veKg84OPlCZvdDqvNxkPhOB0sn5wly+vdgx1Di82KzMxMlAoJQZkSJdGjZ0+UrlCJi/Xysc5GCPETtxxgUAgEAieNoQQLygg/P8K8VLaFCVVez+/SfMmPo74sNyxGz+/0YI8QKBQCAQCP4DPG6MeLrZcQvihFar46L6govdPE69movlMhIPh0NYaRJTtu2e+FQWyPkqDSsLqker0fKJVR0Oe5ap1RqoWD+pfuo7hefhbVJcfA8VlK42ycudJlIlMd1iMrnakePok5BPDyoUSvnhBMsEs9XMQ+PYrDQRqwd0Oj2vh/eVleXj5OMF7BSqhq2YjEa2TQ83nNDrPeHp5YWQEmXg4+vPPeLzIoR4gUAgEAcvvgETxtCiBcI/ifY2Y2aA57eWu7lJBAIBAKBQCB4eP62EC/JYWmoPBnFeieRnGKnk7e3yWTiYjN5fZPYLId5kcV67sHtcLBt+vZG4VzIu93lVe8SqUmsdzpsrDz7jse2tZrs+O/kxc7z5oGE/PtB+XOWs7tCtpB4z9NIkGf9YU3JeSmb0yV422np5AI8eaTXX"
Sample input is on :
http://pastebin.com/5XJVNqGS
(I know pastebin is bad because of the expiration, but I'm having trouble pasting that amount of text here as the page gets stuck.)
And the results should be:
Result1: "some-txx": value
Result2: hereisTXX: "1235"
Result3: "GROUPDATA" : "{DATA1: sample, TXX-value:12312 ,DATA2: sample2}"

I believe you can use a rather useful Java "to-some-extent" variable-width look-behind:
(?<!"picture":"[^"]{0,10000})(?i:txx)
You can adjust the 10000 value in case you have longer Base64-encoded strings.
Tested on RegexPlanet
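A minimal sketch of the bounded look-behind in use, run against shortened versions of the sample lines from the question. The class and method names are invented for the example, and the picture value is truncated to a few characters.

```java
import java.util.regex.Pattern;

final class PictureLookbehindDemo {
    // txx in any letter case, unless it is preceded by "picture":" plus
    // up to 10000 non-quote characters (i.e. it sits inside that value).
    private static final Pattern TXX_OUTSIDE_PICTURE =
            Pattern.compile("(?<!\"picture\":\"[^\"]{0,10000})(?i:txx)");

    static boolean containsTxx(String line) {
        return TXX_OUTSIDE_PICTURE.matcher(line).find();
    }

    public static void main(String[] args) {
        // Should be found: txx is part of an ordinary key or value.
        System.out.println(containsTxx("\"something-txx\": \"somerandomstring\""));
        System.out.println(containsTxx("value not picture: \"some other stringtxxsome string\""));
        // Should be skipped: TXX is inside the Base64 picture value.
        System.out.println(containsTxx("\"picture\":\"/9j/4AAQSkTXX"));
    }
}
```

The upper bound is what makes this legal in Java: the look-behind needs a finite maximum length, so an unbounded + or * in that position is rejected.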
In case you have very large images, use a reverse-string trick with a reversed regex (look-aheads can be of undefined variable size):
String rx = "(?i)\"[^\"]*\"\\s*:\\s*\"[^\"]*xxt[^\"]*\"(?![^\"]*\":\"erutcip\")";
Sample Java program on Ideone:
import java.util.regex.*;
class HelloWorld{
public static void main(String []args){
String str = "THE_HUIGE_STRING_THAT_CAUSED_Body is limited to 30000 characters;you entered 53501_ISSUE";
str = new StringBuilder(str).reverse().toString();
String rx = "\"?[^\"]*\"?\\s*\"?[^\"\\n\\r]*(?:xxt|XXT)[^\"\\n\\r]*(?![^\"]*\":\"erutcip\")";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(new StringBuilder(m.group(0)).reverse().toString());
}
m = ptrn.matcher(new StringBuilder("\"something-txx\": \"somerandomstring\"").reverse().toString());
while (m.find()) {
System.out.println(new StringBuilder(m.group(0)).reverse().toString());
}
}
}

java strings with numbers

I have a group of strings in an ArrayList.
1) I want to remove all the strings with only numbers
and also strings like this: (0.75%), $1.5 ... basically everything that does not contain the characters.
2) I want to remove all special characters in the string before I write to the console.
"God should be printed God.
&quot;Including should be printed: quoteIncluding
'find should be find

Java boasts a very nice Pattern class that makes use of regular expressions. You should definitely read up on that. A good reference guide is here.
I was going to post a coding solution for you, but styfle beat me to it! The only thing I was going to do different here was within the for loop, I would have used the Pattern and Matcher class, as such:
for(int i = 0; i < myArray.size(); i++){
    Pattern p = Pattern.compile("[a-zA-Z]");
    Matcher m = p.matcher(myArray.get(i));
    boolean match = m.find(); // true if the string contains at least one letter
    //more code to get the string you want
}
But that is too bulky. styfle's solution is succinct and easy.

When you say "characters," I'm assuming you mean only "a through z" and "A through Z." You probably want to use Regular Expressions (Regex) as D1e mentioned in a comment. Here is an example using the replaceAll method.
import java.util.ArrayList;
public class Test {
public static void main(String[] args) {
ArrayList<String> list = new ArrayList<String>(5);
list.add("\"God");
list.add("&quot;Including");
list.add("'find");
list.add("24No3Numbers97");
list.add("w0or5*d;");
for (String s : list) {
s = s.replaceAll("[^a-zA-Z]",""); //use whatever regex you wish
System.out.println(s);
}
}
}
The output of this code is as follows:
God
quotIncluding
find
NoNumbers
word
The replaceAll method uses a regex pattern and replaces all the matches with the second parameter (in this case, the empty string).
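The question also asks to drop strings that contain no letters at all. A possible way to combine both steps is sketched below; the class and method names are made up, and "letters" is assumed to mean a-z and A-Z as in the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LetterFilterDemo {
    private static final Pattern HAS_LETTER = Pattern.compile("[a-zA-Z]");

    // Drops entries without any letter, then strips non-letters from the rest.
    static List<String> clean(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (!HAS_LETTER.matcher(s).find()) {
                continue; // e.g. "(0.75%)" or "$1.5"
            }
            result.add(s.replaceAll("[^a-zA-Z]", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("\"God", "(0.75%)", "$1.5", "'find", "123");
        System.out.println(clean(list));
    }
}
```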

String splitting

I have a string. What is the best way to put the things between the $ signs into a list in Java?
String temp = "$abc$and$xyz$";
How can I get all the variables within the $ signs as a list in Java?
[abc, xyz]
I can do it using StringTokenizer but want to avoid it if possible.
Thanks

Maybe you could think about calling String.split(String regex) ...

The pattern is simple enough that String.split should work here, but in the more general case, one alternative for StringTokenizer is the much more powerful java.util.Scanner.
String text = "$abc$and$xyz$";
Scanner sc = new Scanner(text);
while (sc.findInLine("\\$([^$]*)\\$") != null) {
System.out.println(sc.match().group(1));
} // abc, xyz
The pattern to find is:
\$([^$]*)\$
  \_____/     i.e. literal $, a sequence of anything but $ (captured in group 1)
     1        and another literal $
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
(…) is used for grouping. (pattern) is a capturing group and creates a backreference.
The backslash preceding the $ (outside of character class definition) is used to escape the $, which has a special meaning as the end of line anchor. That backslash is doubled in a String literal ("\\" is a String of length one containing a backslash).
This is not a typical usage of Scanner (usually the delimiter pattern is set, and tokens are extracted using next), but it does show how you'd use findInLine to find an arbitrary pattern (ignoring delimiters), and then use match() to access the MatchResult, from which you can get individual group captures.
You can also use this Pattern in a Matcher find() loop directly.
Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(text);
while (m.find()) {
System.out.println(m.group(1));
} // abc, xyz
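Since the question asks for a list rather than console output, the same find() loop can collect the captures. This is only a sketch with illustrative class and method names. Each match consumes both of its $ delimiters, which is why the and between $abc$ and $xyz$ is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DollarTokenDemo {
    // Literal $, anything but $ (group 1), literal $.
    private static final Pattern TOKEN = Pattern.compile("\\$([^$]*)\\$");

    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("$abc$and$xyz$")); // [abc, xyz]
    }
}
```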
Related questions
Validating input using java.util.Scanner
Scanner vs. StringTokenizer vs. String.Split

Just try this one: temp.split("\\$");

I would go for a regex myself, like Riduidel said.
This special case is, however, simple enough that you can just treat the String as a character sequence, and iterate over it char by char, and detect the $ sign. And so grab the strings yourself.
On a side note, I would try to go for different demarcation characters, to make it more readable to humans. Use $ as start-of-sequence and something else as end-of-sequence, for instance. Or something like what I think the Bash shell uses: ${some_value}. As said, the computer doesn't care, but you debugging your string just might :)
As for an appropriate regex, something like (\\$.*\\$)* or so should do. Though I'm no expert on regexes (see http://www.regular-expressions.info for nice info on regexes).

Basically I'd ditto Khotyn as the easiest solution. I see you post on his answer that you don't want zero-length tokens at beginning and end.
That brings up the question: What happens if the string does not begin and end with $'s? Is that an error, or are they optional?
If it's an error, then just start with:
if (!text.startsWith("$") || !text.endsWith("$"))
return "Missing $'s"; // or whatever you do on error
If that passes, fall into the split.
If the $'s are optional, I'd just strip them out before splitting. i.e.:
if (text.startsWith("$"))
text=text.substring(1);
if (text.endsWith("$"))
text=text.substring(0,text.length()-1);
Then do the split.
Sure, you could make more sophisticated regex's or use StringTokenizer or no doubt come up with dozens of other complicated solutions. But why bother? When there's a simple solution, use it.
PS There's also the question of what result you want to see if there are two $'s in a row, e.g. "$foo$$bar$". Should that give ["foo","bar"], or ["foo","","bar"] ? Khotyn's split will give the second result, with zero-length strings. If you want the first result, you should split("\\$+").
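Putting the optional-$ variant and the PS together, a sketch might look like the following. The class and method names are made up, and returning an empty list for input that contains nothing but $ signs is an assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DollarSplitDemo {
    static List<String> tokens(String text) {
        // Strip one optional leading and one optional trailing '$'.
        if (text.startsWith("$")) {
            text = text.substring(1);
        }
        if (text.endsWith("$")) {
            text = text.substring(0, text.length() - 1);
        }
        List<String> result = new ArrayList<>();
        if (text.isEmpty()) {
            return result; // assumption: nothing between the $ signs means no tokens
        }
        // "\\$+" treats a run of '$' as one delimiter, so "$$" yields no empty token.
        result.addAll(Arrays.asList(text.split("\\$+")));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("$abc$and$xyz$")); // [abc, and, xyz]
        System.out.println(tokens("$foo$$bar$"));    // [foo, bar]
    }
}
```

Without the stripping step, split would return an empty first element, because a string that starts with a delimiter has an empty piece before it.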

If you want a simple split function then use Apache Commons Lang which has StringUtils.split. The java one uses a regex which can be overkill/confusing.

You can do it in a simple manner by writing your own code.
Just use the following code and it will do the job for you:
import java.util.ArrayList;
import java.util.List;
public class MyStringTokenizer {
    /**
     * @param args
     */
    public static void main(String[] args) {
        List<String> result = getTokenizedStringsList("$abc$efg$hij$");
        for (String token : result) {
            System.out.println(token);
        }
    }

    private static List<String> getTokenizedStringsList(String string) {
        List<String> tokenList = new ArrayList<String>();
        char[] in = string.toCharArray();
        int stringLength = in.length;
        for (int i = 0; i < stringLength;) {
            // Skip ahead to the next '$' and step past it.
            while (i < stringLength && in[i] != '$')
                i++;
            i++;
            // Collect characters up to the following '$' (or the end of the input).
            StringBuilder myBuilder = new StringBuilder();
            while (i < stringLength && in[i] != '$') {
                myBuilder.append(in[i]);
                i++;
            }
            // Skip the empty token that would otherwise be added after the final '$'.
            if (myBuilder.length() > 0)
                tokenList.add(myBuilder.toString());
        }
        return tokenList;
    }
}

You can use
String temp = "$abc$and$xyz$";
String array[]=temp.split(Pattern.quote("$"));
List<String> list=new ArrayList<String>();
for(int i=0;i<array.length;i++){
    if(!array[i].isEmpty()){ // split leaves an empty first element because temp starts with $
        list.add(array[i]);
    }
}
Now the list has what you want.
