I have some Strings. They contain some data. Example: "Alberto Macano. Here is description." And another example: "Pablo Don Carlo. Description here."
What I need: a method to split the name from the description, i.e. getting the name in one string and the description in another. It would be easier if I knew how many words the name contains, but it can contain up to 5-6 words, so I don't know how many there will be. The one thing I do know is that a period separates them.
You can use the .split(String regex) method to split the string into an array of strings. So for instance:
String line = "Alberto Macano. Here is description.";
String[] words = line.split("\\.");
The 'words' variable will contain the following (note the leading space in the second element, left over from the text after the period):
{0}: "Alberto Macano"
{1}: " Here is description"
You might notice that there are two backslashes before the period. This is because the period is a metacharacter in regular expressions (it matches any character), so it has to be escaped with a backslash, and that backslash must itself be escaped inside a Java string literal. You might want to look at the Java Regex Documentation for more information.
Use the split(String regex) method in the String class to obtain an array of String objects by splitting a String up based on some regular expression.
String.split will give you an array of Strings divided on regular expression matches. There's a summary of regular expression constructs in the java.util.regex.Pattern API documentation.
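Since the description itself may contain periods, it can help to split only on the first one. The following is a minimal sketch, assuming the name never contains a period; the class and method names are made up for illustration. It uses the two-argument String.split(regex, limit) overload and also swallows the whitespace after the period.

```java
class NameDescriptionSplit {
    // Splits on the first period (plus any whitespace after it) only.
    // Assumption: the name itself never contains a period.
    static String[] splitNameAndDescription(String line) {
        // A limit of 2 means the pattern is applied at most once,
        // so the description keeps any later periods.
        return line.split("\\.\\s*", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitNameAndDescription("Pablo Don Carlo. Description here.");
        System.out.println(parts[0]); // Pablo Don Carlo
        System.out.println(parts[1]); // Description here.
    }
}
```

If the input has no period at all, the returned array has a single element, so check its length before reading parts[1].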
Related
I have a String input as well as a String pattern, and either could contain any of the characters that have special meaning in a regex. I would like an exact replacement to take place, ignoring any such special meaning. I won't know at compile time how many special characters might be present in either the input string or the pattern.
So here is the formal problem statement:-
Assume the object input_string is the input, of type String.
Then we have another string input_pattern which is also an object of type String.
Now I want to perform the following:-
String result=input_string.replaceFirst(input_pattern,"replacewithsomethingdoesntmatter");
The replacement should take place as an exact (literal) match, without considering any regex special meaning of characters in the strings. How can I make this happen?
You can use the Pattern.quote() method to escape characters that have a special meaning in regular expressions:
String pattern = "^(.*)$";
String quotedPattern = Pattern.quote(pattern);
System.out.println(quotedPattern);
This will wrap the pattern in quotation markers (\Q and \E), indicating that the wrapped sequence needs to be matched literally.
Alternatively, you can wrap the pattern in quotation markers manually:
String pattern = "^(.*)$";
String quotedPattern = "\\Q" + pattern + "\\E";
System.out.println(quotedPattern);
The first approach is probably safer, because it will also make accommodations for expressions that already contain quotation markers.
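Applied to the replaceFirst call from the question, this might look like the sketch below. The class and method names are hypothetical. Note that the replacement string has its own special characters ($ and \), so the sketch also passes it through Matcher.quoteReplacement().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Replaces the first literal occurrence of target in input.
    static String replaceFirstLiteral(String input, String target, String replacement) {
        return input.replaceFirst(
                Pattern.quote(target),                  // no regex meaning in the search text
                Matcher.quoteReplacement(replacement)); // no $1-style meaning in the replacement
    }

    public static void main(String[] args) {
        // Parentheses, dot and dollar sign are all treated as plain characters.
        System.out.println(replaceFirstLiteral("price: $5 (approx.)", "(approx.)", "$USD"));
        // prints: price: $5 $USD
    }
}
```

If every occurrence should be replaced rather than only the first, String.replace(CharSequence, CharSequence) already works literally and needs no quoting.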
I have a String "Student1Student2Student3", for example. Is there any way I can turn this String into a list with three elements, e.g. ["Student1", "Student2", "Student3"]?
The above string is just an example, "Student1" could be a random name of variable length. Thanks in advance.
Based on your comment
I think in the final code it would be ideal to split on a lowercase letter followed by an uppercase letter.
it looks like you are looking for split("(?<=[a-z])(?=[A-Z])").
Look-around mechanisms are zero-length, which means the characters they test are not included in the match (here, the delimiter). This lets us test for a lower-case character on the left and an upper-case character on the right without consuming either while splitting.
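A small sketch of that split follows; the class and method names are made up. One caveat: in the literal example from the question each name ends in a digit, so the boundary there is digit-then-uppercase rather than lowercase-then-uppercase. The second method widens the look-behind to [a-z0-9] to cover that case, which is an assumption about what the data looks like.

```java
import java.util.Arrays;

class BoundarySplit {
    // Splits between a lower-case letter and an upper-case letter.
    static String[] splitLowerUpper(String s) {
        return s.split("(?<=[a-z])(?=[A-Z])");
    }

    // Variant (assumption): also split after a digit, as in "Student1Student2".
    static String[] splitLowerOrDigitUpper(String s) {
        return s.split("(?<=[a-z0-9])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLowerUpper("AliceBobCarol")));
        // prints: [Alice, Bob, Carol]
        System.out.println(Arrays.toString(splitLowerOrDigitUpper("Student1Student2Student3")));
        // prints: [Student1, Student2, Student3]
    }
}
```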
You can use the regex ([a-zA-Z]+[0-9]+) to match specific parts of the string. For example, in the string Student1Student2Student3, the first match would be Student1, and then Student2.
Here's a hint: use the Pattern class and the Matcher class.
:) hopefully it will be enough to help you out.
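To spell the hint out, here is a minimal sketch that collects every match with Matcher.find(). It assumes each element is one or more letters followed by one or more digits; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollector {
    private static final Pattern ITEM = Pattern.compile("[a-zA-Z]+[0-9]+");

    // Returns every substring that looks like letters followed by digits.
    static List<String> findItems(String s) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(s);
        while (m.find()) {          // advance to the next match, if any
            items.add(m.group());   // the text of the current match
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(findItems("Student1Student2Student3"));
        // prints: [Student1, Student2, Student3]
    }
}
```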
Sample Input:
a:b
a.in:b
asds.sdsd:b
a:b___a.sds:bc___ab:bd
Sample Output:
a:replaced
a.in:replaced
asds.sdsd:replaced
a:replaced___a.sds:replaced___ab:replaced
The string that comes after : should be replaced with the result of a custom function.
I have done this without regex, but I feel it could be done with regex, since we are trying to extract a string out of a specific pattern.
For the first three cases it's simple enough to extract the string after :, but I couldn't find a way to deal with the last case unless I split the string on ___, apply the approach for the first type of pattern, and concatenate the pieces again.
Just replace the letters that come right after : with the string replaced.
string.replaceAll("(?<=:)[A-Za-z]+", "replaced");
or
If you also want to deal with digits, then add \d inside the char class.
string.replaceAll("(?<=:)[A-Za-z\\d]+", "replaced");
(:)[a-zA-Z]+
You can simply do this with string.replaceAll. Replace with $1replaced. See demo.
https://regex101.com/r/fX3oF6/18
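The question asks for the replacement to come from a custom function rather than a fixed string. One way to sketch that, assuming Java 9 or later, is Matcher.replaceAll(Function). The class name, the method name and the upper-casing function in main are stand-ins, not part of the original answers.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterColonReplacer {
    // Letters or digits that directly follow a colon.
    private static final Pattern AFTER_COLON = Pattern.compile("(?<=:)[A-Za-z\\d]+");

    // Replaces each run of letters/digits after ':' with fn(run).
    static String replaceAfterColon(String input, Function<String, String> fn) {
        Matcher m = AFTER_COLON.matcher(input);
        // quoteReplacement keeps '$' and '\' in the function's result literal.
        return m.replaceAll(r -> Matcher.quoteReplacement(fn.apply(r.group())));
    }

    public static void main(String[] args) {
        // Stand-in custom function: upper-case the matched text.
        System.out.println(replaceAfterColon("a:b___a.sds:bc___ab:bd", String::toUpperCase));
        // prints: a:B___a.sds:BC___ab:BD
    }
}
```

On Java 8, the same effect needs a manual loop with Matcher.appendReplacement and Matcher.appendTail.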
I'm using this regex:
x.split("[^a-zA-Z0-9']+");
This returns an array of strings with letters and/or numbers.
If I use this:
String name = "CEN01_Automated_TestCase.java";
String[] names = name.split("[^a-zA-Z0-9']+");
I got:
CEN01
Automated
TestCase
java
But if I use this:
String name = "CEN01_Automação_Caso_Teste.java";
String[] names = name.split("[^a-zA-Z0-9']+");
I got:
CEN01
Automa
o
Caso
Teste
java
How can I modify this regex to include accented characters? (á,ã,õ, etc...)
From http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
Since Character class contains isAlphabetic method you can use
name.split("[^\\p{IsAlphabetic}0-9']+");
You can also use
name.split("(?U)[^\\p{Alpha}0-9']+");
This relies on the UNICODE_CHARACTER_CLASS flag, which is enabled by the (?U) at the start of the regex.
I would check out the Java documentation on regular expressions. There is a Unicode section which I believe is what you may be looking for.
EDIT: Example
Another way would be to match on the character code you are looking for. For example
\uFFFF where FFFF is the hexadecimal number of the character you are trying to match.
Example: \u00E0 matches à
Realize that the backslash will need to be escaped in Java if you are using it as a string literal.
Read more about it here.
You can use this:
String[] names = name.split("[^a-zA-Z0-9'\\p{L}]+");
System.out.println(Arrays.toString(names));
This will output:
[CEN01, Automação, Caso, Teste, java]
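For reference, a self-contained sketch of the \p{L} approach follows. Once \p{L} is in the class, the a-zA-Z range is redundant, so the sketch drops it. The accented letters are written as Unicode escapes only to make the example independent of the source file encoding; the class and method names are hypothetical.

```java
import java.util.Arrays;

class UnicodeSplit {
    // Splits on runs of anything that is not a Unicode letter, an ASCII digit or an apostrophe.
    static String[] splitWords(String name) {
        return name.split("[^\\p{L}0-9']+");
    }

    public static void main(String[] args) {
        // "\u00e7" is c-cedilla and "\u00e3" is a-tilde, as in the question's file name.
        String name = "CEN01_Automa\u00e7\u00e3o_Caso_Teste.java";
        // Yields five elements; the second one keeps its accented letters.
        System.out.println(Arrays.toString(splitWords(name)));
    }
}
```

This assumes the accented letters are precomposed characters. Text that stores accents as separate combining marks would still be split, since combining marks are not in \p{L}.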
See this for more information.
Why not split on the separator characters?
String[] names = name.split("[_.]");
Instead of blacklisting all the characters you don't want, you could always whitelist the characters you want, like:
^[^<>%$]*$
The expression [^(many characters here)] just matches any character that is not listed.
But that is a personal opinion.
If I have a string "Sc_bookstore.PKG_book.SP_Harrypotter", for example.
How would I separate the schema, package and StoredProc? I.e., I want to get the following output:
Sc_bookstore
PKG_book
SP_Harrypotter
I am going to be using this regex on multiple StoredProc calls, and all of them follow the standard of "Sc_" "PKG_" "SP_"
Thanks in advance!
Use "\\." or "[.]" if a regular expression split delimiter is required (as with String.split).
A . in a Java regular expression means "match any character" (roughly, depending on options); the two forms above escape/prevent this meaning so it matches a literal period.
Any time you need to divide a string by a delimiter, you should consider using the String split method:
String s = "Sc_bookstore.PKG_book.SP_Harrypotter";
String[] parts = s.split("\\.");
// parts[0] holds "Sc_bookstore"
// parts[1] holds "PKG_book"
// parts[2] holds "SP_Harrypotter"
If you really want/need a regex, something along the lines of:
^(Sc_\w+)\.(PKG_\w+)\.(SP_\w+)$
would allow you to capture the group values.
For input Sc_bookstore.PKG_book.SP_Harrypotter:
Group 1: "Sc_bookstore"
Group 2: "PKG_book"
Group 3: "SP_Harrypotter"
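In Java, reading those groups might look like the sketch below. The class and method names are invented for illustration, and throwing on a non-matching input is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StoredProcParser {
    // Same expression as above, with backslashes doubled for a Java string literal.
    private static final Pattern CALL =
            Pattern.compile("^(Sc_\\w+)\\.(PKG_\\w+)\\.(SP_\\w+)$");

    // Returns {schema, package, storedProc}.
    static String[] parse(String call) {
        Matcher m = CALL.matcher(call);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + call);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("Sc_bookstore.PKG_book.SP_Harrypotter");
        System.out.println(parts[0]); // Sc_bookstore
        System.out.println(parts[1]); // PKG_book
        System.out.println(parts[2]); // SP_Harrypotter
    }
}
```

Unlike a plain split, this also validates that the three prefixes are present and in the expected order.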
But, in your case, just splitting the String by . would suit your needs very well
You may use split (note that its argument is still a regex, so the dot must be escaped):
String[] values;
values = str.split("\\.");
Assuming that a single string always contains exactly one such record, use this:
^Sc_([^.]+)\.PKG_([^.]+)\.SP_([^.]+)$
...note that this removes the Sc_, PKG_ and SP_ parts of the string and only returns "content".
For testing purposes, I created a simple JavaRegexTester application to test regular expressions.