Regular Expression for a Java String That Contains Alphabet and : /

I have a set of Strings like "04/21 01:55 P ", "1", "10/21". I wrote a regex as follows:
^([[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2} P|A ]+)
It should accept only Strings in the format "04/21 01:55 P ", but it also accepts strings like "1" and "10/21".
Could anyone tell me where I went wrong?

Replace the surrounding [] with ().
You'll need to change the P|A part too, to either (P|A) or [PA].

You've put everything in one big character class, which is why single digits are being matched as well. You can try something like:
^(\d{2}/\d{2} \d{2}:\d{2} (?:P|A) )+
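
As a rough sketch of how the corrected pattern behaves in Java (the class and method names here are made up for illustration, and [PA] is used in place of (?:P|A)), the three sample strings from the question can be checked with matches(), which requires the whole input to fit the pattern:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answers.
final class TimestampCheck {
    // The whole format sits in a group, not a character class;
    // only [PA] is a character class.
    private static final Pattern FORMAT =
            Pattern.compile("(\\d{2}/\\d{2} \\d{2}:\\d{2} [PA] )+");

    static boolean isValid(String s) {
        // matches() succeeds only if the entire string fits the pattern.
        return FORMAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("04/21 01:55 P ")); // true
        System.out.println(isValid("1"));              // false
        System.out.println(isValid("10/21"));          // false
    }
}
```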

Related

how to convert one line containing several sentences into lines according to dot(.) [duplicate]

I am wondering if I am going about splitting a string on a . the right way? My code is:
String[] fn = filename.split(".");
return fn[0];
I only need the first part of the string, that's why I return the first item. I ask because I noticed in the API that . means any character, so now I'm stuck.

split() accepts a regular expression, so you need to escape . so that it is not treated as a regex metacharacter. Here's an example:
String[] fn = filename.split("\\.");
return fn[0];

I see only solutions here but no full explanation of the problem, so I decided to post this answer.
Problem
You need to know a few things about text.split(delim). The split method:
accepts as its argument a regular expression (regex) that describes the delimiter we want to split on,
if delim occurs at the end of text, as in a,b,c,, (where the delimiter is ,), split first creates an array like ["a" "b" "c" "" ""], but since in most cases we don't need these trailing empty strings, it removes them automatically and returns an array without them.
You also need to know that the dot . is a special character in regex. It represents any character (except line separators, but this can be changed with the Pattern.DOTALL flag).
So for a string like "abc", if we split on ".", the split method will
create an array like ["" "" "" ""],
but since this array contains only empty strings and they are all trailing, they will be removed (as described in the second point above),
which means we get an empty array [] as the result (no elements, not even an empty string), so we can't use fn[0] because there is no index 0.
Solution
To solve this problem you simply need a regex that represents a literal dot. To get one we need to escape the dot. There are a few ways to do it, but the simplest is probably to use \ (which in a String needs to be written as "\\", because \ is also special there and requires another \ to escape it).
So the solution to your problem may look like
String[] fn = filename.split("\\.");
Bonus
You can also escape the dot in other ways, such as
using a character class: split("[.]")
wrapping it in a quote: split("\\Q.\\E")
using a proper Pattern instance with the Pattern.LITERAL flag
or simply using split(Pattern.quote(".")) and letting regex do the escaping for you.
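
The behaviour described above can be seen in a small sketch. The class name, method names and the sample file name "report.txt" are assumptions made for illustration only:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the unescaped dot with three escaped forms.
final class SplitOnDot {
    static String[] unescaped(String s) { return s.split("."); }       // "." = any character
    static String[] escaped(String s)   { return s.split("\\."); }     // literal dot
    static String[] charClass(String s) { return s.split("[.]"); }     // literal dot
    static String[] quoted(String s)    { return s.split(Pattern.quote(".")); }

    public static void main(String[] args) {
        String filename = "report.txt";
        // Every character is a delimiter, so only trailing empty strings remain
        // and all of them are dropped.
        System.out.println(Arrays.toString(unescaped(filename))); // []
        System.out.println(Arrays.toString(escaped(filename)));   // [report, txt]
        System.out.println(Arrays.toString(charClass(filename))); // [report, txt]
        System.out.println(Arrays.toString(quoted(filename)));    // [report, txt]
    }
}
```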

Split uses regular expressions, where '.' is a special character meaning anything. You need to escape it if you actually want it to match the '.' character:
String[] fn = filename.split("\\.");
(one '\' to escape the '.' in the regular expression, and the other to escape the first one in the Java string)
Also, I wouldn't suggest returning fn[0], since if you have a file named something.blabla.txt, which is a valid name, you won't be returning the actual file name. Instead I think it's better to use:
int idx = filename.lastIndexOf('.');
return filename.substring(0, idx);
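
One thing to watch with this approach: lastIndexOf returns -1 when the name contains no dot, and substring(0, -1) throws. A minimal sketch with a guard (the class and method names are hypothetical) might look like this:

```java
// Hypothetical helper, not from the original answer.
final class BaseName {
    static String stripExtension(String filename) {
        int idx = filename.lastIndexOf('.');
        // No dot at all: return the name unchanged instead of calling substring(0, -1).
        return idx < 0 ? filename : filename.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(stripExtension("something.blabla.txt")); // something.blabla
        System.out.println(stripExtension("README"));               // README
    }
}
```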

The String#split(String) method uses regular expressions.
In regular expressions, the "." character means "any character".
You can avoid this behavior by either escaping the "."
filename.split("\\.");
or telling the split method to split at a character class:
filename.split("[.]");
Character classes are collections of characters. You could write
filename.split("[-.;ld7]");
and filename would be split at every "-", ".", ";", "l", "d" or "7". Inside character classes, the "." is not a special character ("metacharacter").

As the dot ( . ) is treated as a special character and the split method of String expects a regular expression, you need to do it like this:
String[] fn = filename.split("\\.");
return fn[0];
In a regex, special characters need to be escaped with a "\", but since "\" is also a special character in Java string literals, you need to escape it again with another "\".

String str="1.2.3";
String[] cats = str.split(Pattern.quote("."));

Wouldn't it be more efficient to use
filename.substring(0, filename.indexOf("."))
if you only want what's up to the first dot?

Usually it's NOT a good idea to escape it by hand. There is a method in the Pattern class for this task:
java.util.regex.Pattern
static String quote(String s)

split takes a regex as its argument. Simply change "." to "\\."

The solution that worked for me is the following
String[] fn = filename.split("[.]");

Note: Further care should be taken with this snippet, even after the dot is escaped!
If filename is just the string ".", then fn will still end up with length 0 and fn[0] will still throw an exception!
This is because, if the pattern matches at least once, split discards all trailing empty strings (and so also the one before the dot!) from the array, leaving an empty array to be returned.
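
A small sketch of this edge case, with a length check before indexing. The helper name firstPart and the choice to return an empty string are assumptions for illustration:

```java
// Hypothetical helper showing the "." edge case and one way to guard against it.
final class FirstPart {
    static String firstPart(String filename) {
        String[] fn = filename.split("\\.");
        // For ".", every element is an empty trailing string, so the array is empty.
        return fn.length > 0 ? fn[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(".".split("\\.").length);     // 0
        // A negative limit keeps trailing empty strings.
        System.out.println(".".split("\\.", -1).length); // 2
        System.out.println(firstPart("a.b"));            // a
    }
}
```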

Using Apache Commons IO it's simplest:
File file = ...
FilenameUtils.getBaseName(file.getName());
Note that it also extracts the file name from a full path.

split takes a regex as its argument, so you should pass "\\." instead of ".", because "." is a metacharacter in regex.

Split on non arabic characters

I have a String like this
أصبح::ينال::أخذ::حصل (على)::أحضر
I want to split it on non-Arabic characters using Java.
Here's my code:
String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
String[] arr = s.split("^\\p{InArabic}+");
System.out.println(Arrays.toString(arr));
And the output was
[, ::ينال::أخذ::حصل (على)::أحضر]
But I expect the output to be
[ينال,أخذ,حصل,على,أحضر]
So I don't know what's wrong with this?

You need a negated class, and to do that, you need square brackets [ ... ]. Try to split with this:
"[^\\p{InArabic}]+"
If \\p{InArabic} matches any Arabic character, then [^\\p{InArabic}] will match any non-Arabic character.
Another option you can consider is an equivalent syntax, using P instead of p to indicate the opposite of the \\p{InArabic} character class, as @Pshemo mentioned:
"\\P{InArabic}+"
This works just as \\W is the opposite of \\w.
The only possible advantage of the first syntax over the second (again, as @Pshemo mentioned) is flexibility: if you want to add other characters to the list of characters that shouldn't match, for example to match all non-\\p{InArabic} characters except periods, the first one is easier to extend:
"[^\\p{InArabic}.]+"
Otherwise, if you really want to use \\P{InArabic}, you'll need an intersection inside the class (which acts as a subtraction here):
"[\\P{InArabic}&&[^.]]+"

The expression you want is "\\P{InArabic}+"
This means match any (non-zero) number of characters that are not Arabic.
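
A minimal sketch of the split with \\P{InArabic}+ follows. The class name is hypothetical, and the sample is a shortened version of the question's string, written with Unicode escapes so the result does not depend on the source file encoding:

```java
import java.util.Arrays;

// Hypothetical demo class, not from the original answers.
final class ArabicSplit {
    static String[] arabicWords(String s) {
        // One or more characters outside the Arabic Unicode block act as the delimiter.
        return s.split("\\P{InArabic}+");
    }

    public static void main(String[] args) {
        // Three Arabic words separated by "::" and " (", with a trailing ")".
        String s = "\u0623\u0635\u0628\u062D::\u064A\u0646\u0627\u0644 (\u0639\u0644\u0649)";
        String[] words = arabicWords(s);
        System.out.println(words.length);           // 3
        System.out.println(Arrays.toString(words));
    }
}
```

Note that if the input starts with a non-Arabic character, the resulting array begins with an empty string, so a caller may want to skip that element.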

Regex excluding square brackets

I am new to regex. I have this regex:
\[(.*[^(\]|\[)].*)\]
Basically it should take this:
[[a][b][[c]]]
And be able to replace with:
[dd[d]]
a, b, c and d are unrelated. Needless to say, the regex bit isn't working: it replaces the entire string with "d" in this case.
Any explanation or aid would be great!
EDIT:
I tried another regex,
\[([^\]]{0})\]
This one worked for the case where brackets contain no inner brackets and nothing else inside. But it doesn't work for the described case.

You need to know that the dot . is a special character that represents "any character except a line terminator", and that * is greedy, so it will try to find the longest possible match.
In your regex \[(.*[^(\]|\[)].*)\] the first .* matches the largest possible run of characters after the opening [. The rest, [^(\]|\[)].*\], can be read as: one character that is not [ or ] (inside a character class, (, | and ) are just literal characters, so they are excluded too), then optional other characters (.*), and finally ]. So this regex matches your entire input.
To get rid of that problem, remove both .* from your regex. You also don't need | or ( ) inside [^...].
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^\\]\\[]\\]", "d"));
Output: [dd[d]]
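
The same idea can be wrapped in a small helper. This is only a sketch: the class and method names are made up, and the + quantifier is an addition (an assumption that the bracket contents may be longer than one character), not part of the answer above:

```java
import java.util.regex.Matcher;

// Hypothetical helper that replaces every innermost [...] group.
final class InnerBrackets {
    static String replaceInnermost(String input, String replacement) {
        // "[", then one or more characters that are neither "[" nor "]", then "]".
        return input.replaceAll("\\[[^\\]\\[]+\\]",
                Matcher.quoteReplacement(replacement)); // treat the replacement literally
    }

    public static void main(String[] args) {
        System.out.println(replaceInnermost("[[a][b][[c]]]", "d")); // [dd[d]]
    }
}
```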

\[(\[a\])(\[b\])\[(\[c\])\]\]
If you need to double the backslashes in the current context (for example, because you are placing the pattern in a "..." string literal):
\\[(\\[a\\])(\\[b\\])\\[(\\[c\\])\\]\\]
An example replacement for a, b and c is [^\]]*, or if you need to escape backslashes [^\\]]*.
Now you can replace capture one, capture two and capture three each with d.
If the string you are replacing in is not exactly of that format, then you want to do a global replacement with
(\[a\])
or, with the pattern above substituted for a,
(\[[^\]]*\])
or, with doubled backslashes,
(\\[[^\\]]*\\])

Try this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]]", "d"));
If a, b and c are more than one character in the real world, use this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]++]", "d"));
The idea is to use a character class that contains all characters except [ and ]. The class is [^]\\[], and the other square brackets in the pattern are literals.
Note that a literal closing square bracket doesn't need to be escaped at the first position in a character class or outside a character class.

if then condition using regex in java

I have a pattern which goes like this:
String1 :"String2",
I have to validate this pattern. There are two cases: String1 can contain special characters only if it is enclosed in double quotes.
e.g. "xxxx-xxx" :"yyyyyyyy", --------> is valid
but xxxx-xxx :"yyyyyyyy", --------> is not valid
"xxxx-xxx :"yyyyyyyy", --------> is not valid
So I need to create a regex that checks whether the double quotes are closed properly if they are present in String1.

Short answer: Regex doesn't work like that.
What you can do however, is to use two separate patterns to validate:
\"[^\"]+?\" :.*
To check the one that can contain special characters, and:
[a-zA-Z]+? :.*
To check the one that can't
EDIT:
Thinking some more about it, you could combine the two patterns above like so:
^(\"[^\"]+?\"|[a-zA-Z]+?) :.*$
This will match something :"something" and "some-thing" :"something", but not "some-thing : "something" or some-thing : "something", assuming the string contains only the given text.
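
As a sketch, the combined pattern can be tried against the three examples from the question. The class and method names are hypothetical, and note that only the String1 part is really checked; everything after " :" is accepted by .*:

```java
import java.util.regex.Pattern;

// Hypothetical validator built around the combined pattern.
final class KeyValueCheck {
    // Either a properly closed quoted key, or an unquoted key made of letters only.
    private static final Pattern LINE =
            Pattern.compile("^(\"[^\"]+?\"|[a-zA-Z]+?) :.*$");

    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\"xxxx-xxx\" :\"yyyyyyyy\",")); // true
        System.out.println(isValid("xxxx-xxx :\"yyyyyyyy\","));     // false
        System.out.println(isValid("\"xxxx-xxx :\"yyyyyyyy\","));   // false
    }
}
```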

If I understand your question right, this simple regex should work
\"string1\" :\"string2\"

Maybe something like this?
(?<normalString>^[a-zA-Z]+$)|(?<specialString>^".*?"$)
This will capture a string made up only of letters and put it in the "normalString" group, or, if there's a string within quotation marks, capture that and put it in the "specialString" group.

Need some help getting some stuff off a string

I want to get some info out of my string, but there are two possible "expressions" for the string. I want to get "a" & "b" out of the string. This is how they look:
Format one:
http://default.com/default/a/b
Format two:
http://default.com/#!default|1|a|b|1
How can I do this?

If the strings always look like this, you could do the following:
Search for the # character to decide whether you have type 1 or 2.
In case of type 1, split on the delimiter '/' and take the last element and the one before it. For type 2, also split on '/' first, then split the last part again on the delimiter '|' and take results[2] and results[3].
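
A sketch of this split-based approach in Java. The class and method names are hypothetical, and the code assumes the input always has one of the two formats from the question (there is no error handling):

```java
// Hypothetical helper following the split-based steps described above.
final class SplitExtract {
    static String[] extract(String url) {
        String[] parts = url.split("/");
        if (url.contains("#")) {
            // Type 2: the last '/' segment looks like "#!default|1|a|b|1".
            // '|' is a regex metacharacter, so it has to be escaped.
            String[] results = parts[parts.length - 1].split("\\|");
            return new String[] { results[2], results[3] };
        }
        // Type 1: take the last two '/' segments.
        return new String[] { parts[parts.length - 2], parts[parts.length - 1] };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", extract("http://default.com/default/a/b")));       // a,b
        System.out.println(String.join(",", extract("http://default.com/#!default|1|a|b|1"))); // a,b
    }
}
```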

Use a regex to split the string.
Split on "default"
Regex Split

There are many ways you can do this; regular expressions are the most common.
In pseudo code:
if the string contains "/#!default" then:
Use the regular expression ^.*\|([^|]+)\|([^|]+)\|1$
else if the string contains "/default/" then:
Use the regular expression ^.*/([^/]+)/([^/]+)$
Take the 1st and 2nd groups from the matcher
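
A sketch of the regex approach in Java, with the | characters escaped and a + inside each group so that the patterns work on the two sample URLs. The class and method names are hypothetical, and returning an empty array for unrecognised input is an arbitrary choice:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper using one pattern per URL format.
final class RegexExtract {
    private static final Pattern SLASHES = Pattern.compile("^.*/([^/]+)/([^/]+)$");
    private static final Pattern PIPES = Pattern.compile("^.*\\|([^|]+)\\|([^|]+)\\|1$");

    static String[] extract(String url) {
        // Check for the "#!" form first, because it is the more specific one.
        Pattern p = url.contains("/#!default") ? PIPES : SLASHES;
        Matcher m = p.matcher(url);
        if (!m.matches()) {
            return new String[0]; // input has neither format
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", extract("http://default.com/default/a/b")));       // a,b
        System.out.println(String.join(",", extract("http://default.com/#!default|1|a|b|1"))); // a,b
    }
}
```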
