Java regex substring not followed by comma character - java

I have a string that I want to break into a String[] using String.split(regex).
The string format is
January,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,AJanuary,WEEKEND,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,B,B,BJanuary,HOLIDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,C,C,CFebruary,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,AFebruary,WEEKEND,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,B,B,BFebruary,HOLIDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,AMarch,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,C,C,C
The first string after the split should be
January,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A
So I think I need to split after ,A or ,B or ,C when it is not followed by a comma.
To test the first case, I tried the regex "(?<!,)A," but that didn't work.
Any ideas?

It seems you're looking for something like the following:
String[] parts = s.split("(?<=,[ABC](?!,))");
Or you can use a word/non-word boundary here as well:
String[] parts = s.split("(?<=\\b[ABC]\\B)");

You can also use (?<=,[ABC])(?=[^,]) to split.
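A minimal sketch of the lookbehind/lookahead split above, assuming a shortened, made-up input in the same shape as the question's data. The class and method names (MonthSplitDemo, splitRecords) are hypothetical and only serve the illustration.

```java
class MonthSplitDemo {
    // Split at the zero-width position that comes right after ",A", ",B" or ",C"
    // and right before any character other than a comma.
    static String[] splitRecords(String s) {
        return s.split("(?<=,[ABC])(?=[^,])");
    }

    public static void main(String[] args) {
        // Shortened sample input (an assumption, not the question's full string).
        String s = "January,WEEKDAY,A,A,AJanuary,WEEKEND,A,B,BFebruary,HOLIDAY,A,C,C";
        for (String part : splitRecords(s)) {
            System.out.println(part);
        }
        // Prints:
        // January,WEEKDAY,A,A,A
        // January,WEEKEND,A,B,B
        // February,HOLIDAY,A,C,C
    }
}
```

Because both assertions are zero-width, no characters are consumed, so each record keeps its trailing letter and the next one keeps its month name.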

Related

Splitting a sentence

I'm trying to split a string. Repeated characters such as !!!, ?? or ... mark the end of a sentence, so anything after them should go on a new line. For example, the sentence hey.. hello split !!! example me. should be turned into:
hey..
hello split !!!
example me.
What I tried:
String myStr= "hey.. hello split !!! example me.";
String [] split = myStr.split("(?<=\\.{2,})");
This works fine when I have multiple dots but doesn't work for anything else. I can't add exclamation marks to this expression either: "(?<=[\\.{2,}!{2,}])" splits after each dot and each exclamation mark. Is there any way to combine those?
Ideally I wanted the app to split after a SINGLE dot too (anything that denotes the end of the sentence), but I don't think this is possible in a single pass. Thanks.
Just do it like this:
String [] split = myStr.split("(?<=([?!.])\\1+)");
or
String [] split = myStr.split("(?<=([?!.])\\1{1,99})");
It captures the first character from the list [?.!] and expects the same character to be repeated one or more times. If so, the split occurs right after that run.
or
String[] split = s.split("(?<=\\.{2,}+)|(?<=\\?{2,}+)|(?<=!{2,}+)");
Ideally I wanted the app to split after a SINGLE dot too (anything that denotes the end of the sentence)
To do this, you first have to decide which cases count as the end of a sentence. Runs of multiple special symbols are not a standard way to end a sentence (as far as I know).
But if you want to allow for careless users or casual typos that make runs of special symbols act as sentence endings, then at least list those cases before proceeding.
For your situation, where you want to split the string on runs of special symbols, a lookbehind built on a backreference won't be of much help because, as Wiktor noted:
The problem is in the backreference whose length is not known from the start.
So we need to find the zero-width position where the split should happen. The following regexes do that.
Regex:
(?<=[.!?])(?=[^.!?])
(?<=[.!?]) (?=[^.!?])
Note the space between the two assertions in the second regex; use it if you want to consume the space before the start of the next line.
Explanation:
This splits at the zero-width position that is preceded by a special symbol and not followed by one.
hey..¦ hello split !!!¦ example me. ( ¦ denotes the zero-width)
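The two regexes above can be compared side by side in a short sketch. The class and method names (SentenceSplitDemo, splitKeepSpace, splitDropSpace) are hypothetical; the patterns and the sample sentence come from the answer and the question.

```java
class SentenceSplitDemo {
    // Zero-width split: preceded by '.', '!' or '?', followed by anything else.
    // The space after the punctuation stays at the start of the next piece.
    static String[] splitKeepSpace(String s) {
        return s.split("(?<=[.!?])(?=[^.!?])");
    }

    // Same idea, but the single space between the two assertions is consumed.
    static String[] splitDropSpace(String s) {
        return s.split("(?<=[.!?]) (?=[^.!?])");
    }

    public static void main(String[] args) {
        String s = "hey.. hello split !!! example me.";
        for (String line : splitDropSpace(s)) {
            System.out.println(line);
        }
        // Prints:
        // hey..
        // hello split !!!
        // example me.
    }
}
```

Since the lookbehind only needs one punctuation character, both variants also split after a single dot, which covers the "SINGLE dot" case from the question.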
A lookbehind, with a negative lookahead to prevent a split inside the run of punctuation:
String[] lines = s.split("(?<=[?!.]{2,3})(?![?!.])");
Some test code:
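import java.util.Arrays;

public class SplitTest {
    public static void main(String[] args) {
        String s = "hey..hello split !!!example me.";
        // Split after a run of 2-3 punctuation marks, but never inside the run.
        String[] lines = s.split("(?<=[?!.]{2,3})(?![?!.])");
        Arrays.stream(lines).forEach(System.out::println);
    }
}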
Output:
hey..
hello split !!!
example me.

Split String if it has number

Hi guys, it's been a while since I asked a question.
I have this String, which consists of names and numbers.
Ex.
String myString = "give11arrow123test2356read809cell1245cable1257give222...";
What I am trying to do is split it wherever a number ends,
so that I get a result like this:
give11, arrow123, test2356, read809, cell1245, cable1257, give222, ....
I could use this code, but I can't find the right regex:
String[] arrayString = myString.split("Regex");
Thanks for your help.
You can use a combination of lookarounds to split your string.
Lookarounds are zero-width assertions: they check whether a pattern can be matched immediately ahead of or behind the current position, without consuming any characters or adding them to the overall match.
String s = "give11arrow123test2356read809cell1245cable1257give222...";
String[] parts = s.split("(?<=\\d)(?=\\D)");
System.out.println(Arrays.toString(parts));
Output
[give11, arrow123, test2356, read809, cell1245, cable1257, give222, ...]
Use this regex for splitting:
String regex = "(?<=\\d)(?=\\D)";
I am unfamiliar with using regex in java, but this expression matches what you need on www.rubular.com
([A-Za-z]+[0-9]+)
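Since the answer above says it was not tried in Java, here is a hedged sketch of how that expression could be applied with java.util.regex by collecting matches instead of splitting. The class and method names (NameNumberTokens, tokens) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameNumberTokens {
    // One or more letters followed by one or more digits.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+[0-9]+");

    // Collect every letters-then-digits run found in the input.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("give11arrow123test2356read809"));
        // Prints: [give11, arrow123, test2356, read809]
    }
}
```

Unlike the split-based answers, this approach silently drops anything that does not fit the letters-then-digits shape, such as a trailing "...".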

Using regex to separate individual words?

I have the following line to split a sentence into words and store it into an array based on white spaces: string[] s = Regex.Split(input, @"\s+");
The problem is at the end of the sentence, it also picks up the period. For example: C# is cool.
The code would store:
C#
is
cool.
The question is: how do I get it not to pick up the period?
You can use a character class [] to add in the dot . or other characters that you need to split on.
string[] s = Regex.Split(input, @"[\s.]+");
You can add dot (and other punctuation marks as needed) to the regular expression, like this:
string[] s = Regex.Split(input, @"(\s|[.;,])+");
string[] s = Regex.Split(input, @"[^\w#]+");
You may need to add more characters to set [^\w#], so it will work for you based on your requirements...
Use the non-word character pattern: \W
string[] s = Regex.Split(input, @"\W+");
Consider using Regex.Matches as an alternative for your requirement:
string[] outputMessage = Regex.Matches(inputMessage, @"\w+").Cast<Match>().Select(match => match.Value).ToArray();
Good Luck!

String splitting with different character

I think this is a weird question. Here is my splitting code:
String s = "asd#asd";
String[] raw1 = s.split("#"); // this has size two: raw1[0] and raw1[1] are both "asd"
However,
String s = "asd$asd";
String[] raw2 = s.split("$"); // this has size of ONE
raw2 is not split. Does anyone know why?
Because split() takes a regexp, and $ is the end-of-line anchor, so it matches only at the end of the input and nothing gets split. If you need to split on a character that is a regexp metacharacter, you'll need to escape it.
See Pattern for the regexp metacharacters.
You may find that StringTokenizer is more appropriate for your needs. It takes a list of characters to split on and does not interpret them as regular expression metacharacters. However, it's a little more verbose and unwieldy to use. As Nandkumar notes below, the latest docs state that it is discouraged in new code.
Because split() takes a regex and $ matches the end of a line.
You have to escape it :
s.split("\\$");
See Pattern documentation for more information on regexes.
You have to escape it:
String s = "asd$asd";
String[] raw2 = s.split("\\$"); // this has size of TWO
You need to escape the special character:
s.split("\\$");
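A small sketch contrasting the unescaped and escaped forms discussed above. As an addition not mentioned in the answers, it also shows Pattern.quote, a standard java.util.regex method that turns any string into a literal pattern. The class and method names (DollarSplitDemo and its helpers) are hypothetical.

```java
import java.util.regex.Pattern;

class DollarSplitDemo {
    // "$" is the end-of-input anchor here, so no '$' character is ever matched.
    static String[] splitUnescaped(String s) {
        return s.split("$");
    }

    // "\\$" in Java source is the regex \$, which matches a literal dollar sign.
    static String[] splitEscaped(String s) {
        return s.split("\\$");
    }

    // Pattern.quote wraps the text so every character is treated literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String s = "asd$asd";
        System.out.println(splitUnescaped(s).length); // 1
        System.out.println(splitEscaped(s).length);   // 2
        System.out.println(splitQuoted(s).length);    // 2
    }
}
```

Pattern.quote can be the safer choice when the delimiter comes from user input or configuration, since it does not require knowing which characters are metacharacters.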

Java Split on Regex

I can't seem to split on a simple regex.
If I have a string [data, data2] and I attempt to split it like so (I tried to escape the brackets):
String regex = "\\[,\\]";
String[] notifySplit = notifyWho.split(regex);
Looping through notifySplit shows that this regex does not work:
notify: [Everyone, Teachers only]
Any help with the proper regex is appreciated. I am expecting an array like this:
data, data2
where I can ignore the surrounding bracket characters.
First, you don't want to split on the brackets. You just want to exclude them from your end result. So first thing you'll probably want to do is strip those out:
notifyWho = notifyWho.replace("[", "").replace("]", "");
Then you can do a basic split on the comma:
String[] notifySplit = notifyWho.split(",");
I would do it in one line, first removing the square brackets, then splitting:
String[] notifySplit = notifyWho.replaceAll("[\\[\\]]", "").split(",");
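A runnable sketch of the strip-then-split approach, using the sample value from the question. Splitting on ",\\s*" so that the space after each comma is dropped is an assumption added here, not part of the original answers. The class and method names (BracketListDemo, parse) are hypothetical.

```java
class BracketListDemo {
    static String[] parse(String s) {
        // String.replace treats its arguments literally, so the brackets need no escaping.
        String stripped = s.replace("[", "").replace("]", "").trim();
        // Split on a comma plus any whitespace that follows it (assumed behavior).
        return stripped.split(",\\s*");
    }

    public static void main(String[] args) {
        for (String item : parse("[Everyone, Teachers only]")) {
            System.out.println(item);
        }
        // Prints:
        // Everyone
        // Teachers only
    }
}
```

With a plain split(","), the second element would keep its leading space (" Teachers only"), which may or may not matter for the caller.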
