Using regex to separate individual words? - java

I have the following line to split a sentence into words and store them in an array, based on whitespace: string[] s = Regex.Split(input, @"\s+");
The problem is that at the end of the sentence it also picks up the period. For example: C# is cool.
The code would store:
C#
is
cool.
The question is: how do I get it not to pick up the period?

You can use a character class [] to add the dot . or any other characters that you need to split on.
string[] s = Regex.Split(input, @"[\s.]+");

You can add the dot (and other punctuation marks as needed) to the regular expression, like this:
string[] s = Regex.Split(input, @"(\s|[.;,])+");

string[] s = Regex.Split(input, @"[^\w#]+");
You may need to add more characters to the set [^\w#] so that it fits your requirements.

Use the non-word character pattern \W (note that it also treats # as a separator, so C# becomes C):
string[] s = Regex.Split(input, @"\W+");

Consider using Regex.Matches as an alternative:
string[] outputMessage = Regex.Matches(inputMessage, @"\w+").Cast<Match>().Select(match => match.Value).ToArray();
Good Luck!

Related

Java regex substring not followed by comma character

I have a string that I want to split with String.split(regex) to eventually get a String[].
The string format is
January,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,AJanuary,WEEKEND,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,B,B,BJanuary,HOLIDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,C,C,CFebruary,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,AFebruary,WEEKEND,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,B,B,BFebruary,HOLIDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,AMarch,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,C,C,C
The first string after the split should be
January,WEEKDAY,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A
So I'm thinking I need to do a split at either ,A or ,B or ,C that is not followed by a ,
To test the first one, I tried making my regex "(?<!,)A," but that didn't work.
Any ideas?
It seems you're looking for something like the following:
String[] parts = s.split("(?<=,[ABC](?!,))");
Or you can use a word/non-word boundary here as well:
String[] parts = s.split("(?<=\\b[ABC]\\B)");
You can also use (?<=,[ABC])(?=[^,]) to split.
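A quick way to check these patterns is to run one against a shortened version of the input. The sketch below uses the last pattern; the class name, the method name and the shortened sample string are made up for illustration and are not from the question.

```java
import java.util.Arrays;

class SplitRecords {
    // Split after ",A", ",B" or ",C", but only where the next character
    // exists and is not a comma (i.e. where a new month name starts).
    static String[] splitRecords(String s) {
        return s.split("(?<=,[ABC])(?=[^,])");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample in the same shape as the question's input.
        String s = "January,WEEKDAY,A,A,AJanuary,WEEKEND,A,B,BFebruary,HOLIDAY,A,C,C";
        System.out.println(Arrays.toString(splitRecords(s)));
    }
}
```

Because both lookarounds are zero-width, no characters are removed by the split; each record keeps its trailing ,A / ,B / ,C.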

Split String if it has number

Hi guys, it's been a while since I last asked a question.
I have this String, which consists of names and numbers.
Ex.
String myString = "give11arrow123test2356read809cell1245cable1257give222...";
What I am trying to do is split it wherever a number is attached to a name,
so that I get a result like this:
give11, arrow123, test2356, read809, cell1245, cable1257, give222, ....
I could use this code, but I can't find the right regex:
String[] arrayString = myString.split("Regex");
Thanks for your help.
You can use a combination of lookarounds to split your string.
Lookarounds are zero-width assertions: they don't consume any characters of the string. They only check whether a pattern can or cannot be matched looking ahead or looking behind from the current position, without adding anything to the overall match. Here the split happens at every position that has a digit behind it and a non-digit ahead of it.
String s = "give11arrow123test2356read809cell1245cable1257give222...";
String[] parts = s.split("(?<=\\d)(?=\\D)");
System.out.println(Arrays.toString(parts));
Output
[give11, arrow123, test2356, read809, cell1245, cable1257, give222, ...]
Use this regex for splitting:
String regex = "(?<=\\d)(?=\\D)";
I am unfamiliar with using regex in Java, but this expression matches what you need on www.rubular.com:
([A-Za-z]+[0-9]+)
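In Java, a matching pattern like this is used with Pattern and Matcher rather than split. A minimal sketch, assuming each token is one or more ASCII letters followed by one or more digits (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenMatcher {
    // Letters followed by digits; the capturing group is not needed here.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+[0-9]+");

    // Collect every non-overlapping match, scanning left to right.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("give11arrow123test2356read809cell1245cable1257give222..."));
    }
}
```

Unlike the split approach, this skips anything that does not look like a name plus number, so the trailing ... is not part of the result.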

for some reason my split() works in one line of my code but not the other line

This line of code is messing with me:
String[] parseEmailDomain = parseEmail[1].split(".");
When I do System.out.println(parseEmailDomain.length), the size of the array ends up being 0, but the output of System.out.println(parseEmail[1]) is
cs.uh.edu
Does anyone have any idea why the split produces nothing, even though printing the string itself works perfectly fine?
I am able to do this:
String[] parseEmail = parseLn[i].split("@");
The output of System.out.println(parseEmail[0]); is hanak,
and parseLn[i] is an entire line from a text file.
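Because split takes a regex, and . matches any character. Every character is treated as a delimiter, which leaves only empty strings, and split drops trailing empty strings, so the array ends up empty.
You will need to escape the dot with \: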
String[] parseEmailDomain = parseEmail[1].split("\\.");
See also the related answer here: Java RegEx meta character (.) and ordinary dot?
Try putting two backslashes before the dot:
String[] parseEmailDomain = parseEmail[1].split("\\.");
A dot in regexes means any character.
The parameter of split(String) is actually a regular expression, so you could use Scanner or StringTokenizer instead.
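If you would rather not escape by hand, Pattern.quote turns any string into a literal pattern. This is a different route from the answers above, shown only as a sketch; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DotSplit {
    // Pattern.quote makes the dot a literal, so it is no longer "any character".
    static String[] splitOnDot(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String domain = "cs.uh.edu";
        // Unescaped dot: every character is a delimiter, the result is empty.
        System.out.println(domain.split(".").length);
        // Quoted dot: splits into the three labels.
        System.out.println(Arrays.toString(splitOnDot(domain)));
    }
}
```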

String splitting with different character

I think this is a weird question. Here is my split:
String s = "asd#asd";
String[] raw1 = s.split("#"); // this has size two: raw1[0] = raw1[1] = "asd"
However,
String s = "asd$asd";
String[] raw2 = s.split("$"); // this has size ONE
raw2 is not split. Does anyone know why?
Because split() takes a regexp, and $ is the end-of-line anchor. It only matches at the very end of "asd$asd", so nothing is actually split off. If you need to split on a character that is a regexp metacharacter, you'll need to escape it.
See Pattern for the regexp metacharacters.
You may find that StringTokenizer is more appropriate for your needs. It takes a list of characters to split on and doesn't interpret them as regular expression metacharacters. However, it's a little more verbose and unwieldy to use. As Nandkumar notes below, the latest docs state that it is discouraged in new code.
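For comparison, here is roughly what the StringTokenizer version looks like. It is only a sketch with made-up class and method names, and the caveat about new code still applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DollarTokens {
    // Each character in delims is a literal delimiter; no regex is involved.
    static List<String> tokens(String s, String delims) {
        StringTokenizer st = new StringTokenizer(s, delims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("asd$asd", "$"));
    }
}
```

Note that StringTokenizer never returns empty tokens, so consecutive delimiters are collapsed.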
Because split() takes a regex and $ matches the end of a line.
You have to escape it:
s.split("\\$");
See Pattern documentation for more information on regexes.
You have to escape it:
String s = "asd$asd";
String[] raw2 = s.split("\\$"); // this has size of TWO
You need to escape the special character:
s.split("\\$");

Java Split on Regex

I can't seem to split on a simple regex.
I have a string [data, data2] and I attempt to split it like so (I tried to escape the brackets):
String regex = "\\[,\\]";
String[] notifySplit = notifyWho.split(regex);
Looping through notifySplit shows that this regex is not working:
notify: [Everyone, Teachers only]
Any help on what the proper regex is would be appreciated. I am expecting an array like so:
data, data2
where I could possibly ignore these two characters [ ,
First, you don't want to split on the brackets. You just want to exclude them from your end result. So first thing you'll probably want to do is strip those out:
notifyWho = notifyWho.replace("[", "").replace("]", "");
Then you can do a basic split on the comma:
String[] notifySplit = notifyWho.split(",");
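I would do it in one line, first removing the square brackets, then splitting (both brackets are escaped inside the character class, since Java treats an unescaped [ there as the start of a nested class):
String[] notifySplit = notifyWho.replaceAll("[\\[\\]]", "").split(",");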
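Putting the two steps together on the string from the question, and assuming you also want to drop the space after each comma, a sketch might look like this (the class and method names are hypothetical):

```java
import java.util.Arrays;

class BracketSplit {
    static String[] parseList(String s) {
        // String.replace works on literal text, so the brackets need no escaping.
        String stripped = s.replace("[", "").replace("]", "");
        // Split on a comma plus any whitespace that follows it.
        return stripped.split(",\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseList("[Everyone, Teachers only]")));
    }
}
```

With a plain split(",") the second element would keep its leading space (" Teachers only"), which is why the pattern here includes \\s*.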
