Regex to identify characters other than special characters, numbers and alphabets - java

I am trying to write an Informatica regex to identify non-English characters. I have the regex below, but it is not working. Can anyone please help?
IIF(REG_MATCH(Input_data, "[a-zA-Z0-9!##$%^&*()_+|\-=\\{}\[\]:"";'<>?,./ ]+"),'ENGLISH','NON-ENGLISH')

Try the following. Some characters need "\" escaping: [a-zA-Z0-9!##$%^&*()_+|\-=\{\}\[\]:\";'<>?,.\/ ]+?
See https://regex101.com/r/NEZicz/1
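Since the question is tagged java, here is a minimal Java sketch of the same whitelist idea. The class name `EnglishCheck`, the method `classify`, and the exact set of allowed characters are assumptions made for illustration; adjust the character class to whatever you consider "English" input. Note that inside a Java string literal each regex backslash has to be doubled.

```java
import java.util.regex.Pattern;

public class EnglishCheck {
    // Assumed whitelist: ASCII letters, digits, space and common punctuation.
    // "\\-" is a literal hyphen, "\\\\" a literal backslash, "\\[" and "\\]" literal brackets.
    private static final Pattern ALLOWED =
            Pattern.compile("[a-zA-Z0-9!#$%^&*()_+|\\-=\\\\{}\\[\\]:\";'<>?,./ ]+");

    // Returns "ENGLISH" only if the whole input consists of allowed characters.
    static String classify(String input) {
        return ALLOWED.matcher(input).matches() ? "ENGLISH" : "NON-ENGLISH";
    }

    public static void main(String[] args) {
        System.out.println(classify("Hello, World! 123")); // ENGLISH
        System.out.println(classify("h\u00e9llo"));        // NON-ENGLISH (contains é)
    }
}
```

`matches()` requires the entire string to fit the pattern, so a single character outside the class is enough to flag the input.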

Related

Remove everything from a string up to a certain character and optionally a string if it follows too

I am looking to write a regex that can remove any characters up to the first &emsp and, if a (new section) follows the &emsp, remove that as well. But the following regex doesn't seem to work. Why? How do I correct this?
String removeEmsp =" “[<centd>[</centd>]§ 431:10A–126 (new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
Pattern removeEmspPattern1 = Pattern.compile("(.*( (\\(new section\\)))?)(.*)", Pattern.MULTILINE);
System.out.println(removeEmspPattern1.matcher(removeEmsp).replaceAll("$2"));
Have you tried String.split()? It creates an array of strings from a string, based on a delimiter.
Once you have the string split, just select the elements of the array that you need for print statement.
Your regex is very long and I do not want to debug it. The tip, however, is that some characters have special meaning in regular expressions. For example, . matches any character, parentheses define groups, and square brackets define character classes. Such characters must be escaped if you want them to be interpreted as literal characters and not as regex commands. To escape a special character, write \ in front of it. But \ is also the escape character in Java string literals, so it must be doubled.
For example, to replace an ampersand with the letter A you can write str.replaceAll("\\&", "A"). The ampersand does not strictly need escaping, but escaping a non-alphanumeric character is harmless.
Now you have all information you need. Try to start from simpler regex and then expand it to what you need. Good luck.
EDIT
BTW, parsing XML and/or HTML with regular expressions is possible but strongly discouraged. Use a dedicated parser for such formats.
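To make the escaping rule from this answer concrete, here is a small sketch comparing an unescaped and an escaped metacharacter. The class and method names (`EscapeDemo`, `unescaped`, `escaped`) are made up for the illustration.

```java
public class EscapeDemo {
    // Unescaped: "." is a regex metacharacter that matches any character.
    static String unescaped(String s) {
        return s.replaceAll(".", "-");
    }

    // Escaped: "\\." in Java source is the regex \. and matches only a literal dot.
    static String escaped(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(unescaped("a.b")); // ---
        System.out.println(escaped("a.b"));   // a-b
    }
}
```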
Try this:
String removeEmsp =" “[<centd>[</centd>]§ 431:10A–126 (new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
System.out.println(removeEmsp.replaceFirst("^.*?\\ (\\(new\\ssection\\))?", ""));
System.out.println(removeEmsp.replaceAll("^.*?\\ (\\(new\\ssection\\))?", ""));
Output:
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
It will remove everything up to " " and optionally, the following "(new section)" text if any.

Java regex - replace substrings between delimiters with same substrings without delimiters

I'm trying to figure out how to replace with Java 1.6 in strings like
hello ${world }! ${txt + '_t'}<br/> ${do_not_replace
any substring identified between '${' and '}' with the same substring without these delimiters.
So the output for the string above should be
hello world ! txt + '_t'<br/> ${do_not_replace
I identified a working pattern that allows me to replace the substrings with a fixed string
str.replaceAll('[${](.*?)}', '_')
and I know that I cannot use named groups with this version of Java.
Any suggestions for a simple solution to this problem are highly appreciated! Many thanks
Try:
s = s.replaceAll("\\$\\{(.+?)}", "$1");
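A self-contained sketch of that one-liner applied to the string from the question. The class name `StripDelimiters` and the method `strip` are assumptions for the example; nothing here needs more than Java 1.6.

```java
public class StripDelimiters {
    static String strip(String s) {
        // \\$\\{ matches a literal "${"; (.+?) lazily captures up to the next "}".
        // "$1" in the replacement refers to the captured group (numbered, not named).
        return s.replaceAll("\\$\\{(.+?)}", "$1");
    }

    public static void main(String[] args) {
        String input = "hello ${world }! ${txt + '_t'}<br/> ${do_not_replace";
        // Prints: hello world ! txt + '_t'<br/> ${do_not_replace
        System.out.println(strip(input));
    }
}
```

The trailing `${do_not_replace` is left alone because there is no closing `}` for the pattern to match.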

Why does this regex not work?

I am trying to match a string with a Java regex and I cannot succeed. I'm pretty new to Java, and with most of my experience being Linux-based regex, I've had no success. Can someone help me?
Below is the code that I'm using.
The regex is-
//vod//final\_\d{0,99}.\d{0,99}\\-Frag\d{0,99}
The line that I'm trying to match is
/vod/final_1.3Seg1-Frag1
where I want 1.3, 1 and 1 to be wildcarded.
Someone please help me out... :(
You are missing the Seg1 part. You are also escaping characters that do not need to be escaped. Try this regexp: /vod/final_\\d+\\.\\d+Seg1-Frag\\d+
This should work:
Pattern p = Pattern.compile( "/vod/final_\\d+\\.\\d+Seg\\d+-Frag\\d+" );
Notes: To protect special characters, you can use Pattern.quote()
When running into problems like this, start with a simple text and pattern and build from there. I.e. first try to match /, then /vod/, then /vod/final_1, etc.
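As a rough sketch, the pattern above can be checked against the line from the question like this. The capturing groups, the class name `VodMatch`, and the method `parse` are additions made for illustration, to show how the three "wildcarded" values could be pulled out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VodMatch {
    // Same pattern as above, with capturing groups added around the variable parts.
    private static final Pattern VOD =
            Pattern.compile("/vod/final_(\\d+\\.\\d+)Seg(\\d+)-Frag(\\d+)");

    // Returns {version, segment, fragment}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = VOD.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("/vod/final_1.3Seg1-Frag1");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // 1.3 1 1
    }
}
```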
You're escaping too much. Don't escape /, _, -.
Something like:
/vod/final_\d{0,99}\.\d{0,99}Seg\d{0,99}-Frag\d{0,99}
Does this work?
/\/vod\/final\_\d{0,99}.\d{0,99}Seg\d-Frag\d{0,99}
Also, here's what I used to edit the regex you provided above: http://rubular.com/
It says it's for ruby, but it also mentions that it works for java too.

How do I write this regex in Java?

Basically, for this regex
{(\(\(("\w{1,}",{0,1}){2}\),\(("[^:=;#"\)\(\{\}\[\]]{1,}",{0,1}){2}"[LR]{1}"\)\),{0,1}){1,}}
Which I've tested on Regexpal for this input:
{(("st0","sy0"),("st1","sy3","L")),(("st0","sy0"),("st1","^","L"))}
I now need it in Java. I can't seem to figure out how to convert it. Can somebody show me how to?
You need to escape the special chars - specifically the backslashes and the quote marks.
The regular expression should work in Java; the only thing you have to do is escape the backslashes.
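A sketch of what the converted pattern might look like as a Java string literal: every backslash is doubled and every double quote becomes `\"`. One further assumption: Java's `Pattern` is, as far as I know, stricter than some other engines about a bare `{` outside a quantifier, so the outer braces are escaped here as well. The class name `TransitionRegex` and the method `isValid` are made up for the example.

```java
import java.util.regex.Pattern;

public class TransitionRegex {
    // The regex from the question with \ doubled, " written as \",
    // and the outer { } escaped so they are treated as literals.
    private static final Pattern P = Pattern.compile(
            "\\{(\\(\\((\"\\w{1,}\",{0,1}){2}\\),"
          + "\\((\"[^:=;#\"\\)\\(\\{\\}\\[\\]]{1,}\",{0,1}){2}\"[LR]{1}\"\\)\\),{0,1}){1,}\\}");

    static boolean isValid(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "{((\"st0\",\"sy0\"),(\"st1\",\"sy3\",\"L\")),"
                     + "((\"st0\",\"sy0\"),(\"st1\",\"^\",\"L\"))}";
        System.out.println(isValid(input)); // true
    }
}
```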

How to use regex to remove punctuation in a sentence

I am trying to take from a file all the valid words. Valid words are defined as normal characters that can appear like so:
don't won't can't
and I have to ignore commas, periods and exclamation points.
I have gotten the expression to match just letters, but now it won't match words like don't, can't or won't.
This is the expression I am using: "[^A-Za-z]+". I have also tried "\'[^A-Za-z]+", but this breaks and allows all characters. Does anyone have any idea what I can use to get normal words, including don't, won't, can't and similar words?
Thank you very much
[^A-Za-z] matches any character NOT in those ranges! Try this:
[A-Za-z']
You may need to escape the single quote, in which case you'll probably need to escape the backslash that escapes it:
[A-Za-z\\']
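A short sketch of how that character class could be used to pull words out of a line. It adds a `+` quantifier so whole words are matched rather than single characters; the class name `WordExtractor` and the method `words` are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // One or more letters or apostrophes; commas, periods and "!" are skipped.
    private static final Pattern WORD = Pattern.compile("[A-Za-z']+");

    static List<String> words(String text) {
        List<String> out = new ArrayList<String>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: [Don't, stop, won't, quit, Can't]
        System.out.println(words("Don't stop, won't quit! Can't."));
    }
}
```

Note that a stray apostrophe on its own would also count as a "word" with this pattern, so extra filtering may be needed depending on the input.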
Another way (using abbreviations) is: \b[\w']+
This will match letters from any language and exclude numbers.
\b[\p{L}\!\'\?]+
Here is a very good resource for regular expressions.
http://www.regular-expressions.info/
