Java regex pattern for number - java

I know this question may seem stupid, but I am trying to extract some information from text, and you are my last hope after three hours of trying.
DIC: C/40764176 IC: 407641'6
Dekujerne a t8ime se na shledanou
I need to get, for example, 40764176 from this.
I need to get a string of length 8-10. Sometimes it contains special characters (I, i, G, S, O, ó, l), but I have tried a lot of patterns for this and none of them works.
I tried:
String generalDicFormatPattern = "([0-9IiGSOól]{8,10})";
String generalDicFormatPattern = ".*([0-9IiGSOól]{8,10}).*";
String generalDicFormatPattern = "\\b([0-9IiGSOól]{8,10})\\b";
Nothing works. Do you know where the problem is?
Edit:
I use the regex this way:
private List<String> getGeneralDicFromLine(String concreteLine) {
List<String> allMatches = new ArrayList<String>();
Pattern pattern = Pattern.compile(generalDicFormatPattern);
Matcher matcher = pattern.matcher(concreteLine);
while (matcher.find()) {
allMatches.add(matcher.group(1));
}
return allMatches;
}

If your string's pattern is fixed you can use the regex
C/([^\s]{8,10})\sIC:
Sample code:
String s = "DIC: C/40764176 IC: 407641'6";
Pattern p = Pattern.compile("C/([^\\s]{8,10})\\sIC:");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(1)); // 40764176
}
The capture group accepts any character (including the special ones shown in your examples) except whitespace.

Maybe you can split your string on whitespace (string.split("\\s");); then you should have an array like this:
DIC:
C/40764176
IC:
407641'6
...
shledanou
Get the second string, split it on "/", and take the second element.
I hope this helps.
Tip: you can validate the result afterwards using a regex such as ([0-9IiGSOól]{8,10}).
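The split-based idea above can be sketched roughly as follows. This is only an illustration: the names SplitDicExtractor and extractDic are made up, and it assumes the identifier is always the token that starts with "C/" (rather than always the second token).

```java
import java.util.regex.Pattern;

public class SplitDicExtractor {
    // The asker's character class: digits plus the look-alike characters they listed.
    private static final Pattern DIC = Pattern.compile("[0-9IiGSOól]{8,10}");

    // Returns the part after "C/" if it looks like a DIC, otherwise null.
    static String extractDic(String line) {
        for (String token : line.split("\\s+")) {
            if (token.startsWith("C/")) {
                String candidate = token.substring(2);
                // matches() requires the whole candidate to fit the pattern.
                if (DIC.matcher(candidate).matches()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractDic("DIC: C/40764176 IC: 407641'6")); // 40764176
    }
}
```

Lines without a "C/" token simply yield null, so the caller can skip them.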

Related

Pattern (string) allows characters only one time

I want to check whether my string contains only allowed characters. Everything works properly for, say, 7B, 77B or 7BBBB, but when I input something like 7B7 or 7BB2 it does not match.
In other words, it works fine until a digit is the last character.
Could you tell me what is wrong with this code?
pattern = Pattern.compile("[0-9]*[a-f]*[A-F]*");
matcher = pattern.matcher(stNumber);
if (matcher.matches()) {...}
If you want to mix digits and letters in any order, you need something like:
Pattern pattern = Pattern.compile("[\\da-fA-F]*");
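To see the difference between the two patterns, here is a small sketch that runs both against the inputs from the question. The class and method names (HexCheck, matchesOrdered, matchesAnyOrder) are invented for this example.

```java
import java.util.regex.Pattern;

public class HexCheck {
    // Original pattern: digits, then lowercase a-f, then uppercase A-F, in that fixed order.
    static final Pattern ORDERED = Pattern.compile("[0-9]*[a-f]*[A-F]*");
    // Single character class: any mix of those characters, in any order.
    static final Pattern ANY_ORDER = Pattern.compile("[\\da-fA-F]*");

    static boolean matchesOrdered(String s) {
        return ORDERED.matcher(s).matches();
    }

    static boolean matchesAnyOrder(String s) {
        return ANY_ORDER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"7B", "7BBBB", "7B7", "7BB2"}) {
            System.out.println(s + " ordered=" + matchesOrdered(s)
                    + " anyOrder=" + matchesAnyOrder(s));
        }
    }
}
```

The ordered pattern rejects 7B7 and 7BB2 because once the letter part has started, no sub-pattern is left to consume the trailing digit.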
Why not try it this way?
// Compile this pattern; the trailing [0-9]* also allows digits at the end.
Pattern pattern = Pattern.compile("[0-9]*[a-f]*[A-F]*[0-9]*");
// See if this String matches.
Matcher m = pattern.matcher("7B7");
if (m.matches()) {
System.out.println(true);
}
Are you trying to verify that the string only has digits and letters and nothing else?
If so try using the following:
pattern = Pattern.compile("^[a-zA-Z\\d]*$");
matcher = pattern.matcher(stNumber);
if (matcher.matches()) {...}

Regex to split around a word

I am struggling to get the String.split() to do what I would like it to do.
I have an Input of a string of words separated by spaces. Some words have a special function. They look something like this: "special:word".
The input string I am using to test my regex looks like this:
String str = "Hello wonderful special:world what a great special:day";
The result I would like to get from str.split(regex) is an array with the words "world" and "day".
I tried doing it with a lookahead, (?<=special\:)(\w+), but this splits the string at the words I am looking for. How do I invert this expression to get the result I am looking for, and what exactly do lookaheads and reverse lookaheads do?
Using split in this case would create a few problems.
First, you would need an overcomplicated regex to match the parts that we should split on:
Hello wonderful special:world what a great special:day
^^^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^
Second, after the split your first element would be the empty string "", because split does not remove leading empty elements the way it removes trailing ones, so your result would be
["", "world", "day"]
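The leading empty string is easy to reproduce. In the sketch below, the delimiter regex is only a guess at what such a split pattern might look like, and SplitDemo and splitOut are made-up names.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical delimiter: start of input or one whitespace character,
    // then any number of "word + whitespace" pairs, then the literal "special:".
    static final String DELIMITER = "(^|\\s)(\\w+\\s)*special:";

    static String[] splitOut(String s) {
        return s.split(DELIMITER);
    }

    public static void main(String[] args) {
        String str = "Hello wonderful special:world what a great special:day";
        // The first delimiter starts at index 0 and has positive width,
        // so an empty string is kept in front of it.
        System.out.println(Arrays.toString(splitOut(str))); // [, world, day]
    }
}
```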
To avoid this, use a more intuitive approach: instead of finding everything that is NOT the part you want, find only the part you are interested in. To do this, use the Pattern and Matcher classes. Here is an example of how you can find all your special words:
String str = "Hello wonderful special:world what a great special:day";
Pattern p = Pattern.compile("\\b\\w+:(\\w+)\\b");//word after : will be in group 1
Matcher m = p.matcher(str);
while(m.find()){//this will iterate over all found substrings
//here we can use found substrings
System.out.println(m.group(1));
}
Output:
world
day
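The question also asks what lookarounds do. As a rough sketch, the asker's lookbehind works as intended once it is used with Matcher.find() instead of split(): the lookbehind only checks what is to the left of the match and is not part of it. LookbehindDemo and specialWords are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // (?<=special:) requires "special:" immediately to the left of the match
    // but does not consume it, so group() returns only the word itself.
    static final Pattern SPECIAL_WORD = Pattern.compile("(?<=special:)\\w+");

    static List<String> specialWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = SPECIAL_WORD.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String str = "Hello wonderful special:world what a great special:day";
        System.out.println(specialWords(str)); // [world, day]
    }
}
```

A lookahead, written (?=...), is the mirror image: it checks what follows the current position without consuming it.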
Use Pattern and Matcher. A simple example:
public static ArrayList<String> parseOut(String s)
{
ArrayList<String> list = new ArrayList<String>();
String regex = "([:])(\\w+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
list.add(matcher.group().substring(1));
}
return list;
}
Try Pattern and Matcher
String searchPattern = "Hello wonderful special:world what a great special:day";
Pattern pa = Pattern.compile(":[a-zA-Z0-9]+");
Matcher ma = pa.matcher(searchPattern);
while(ma.find()){
System.out.println(ma.group().replaceFirst(":",""));
}
output:
world
day
Using split(), we can do it like this:
String searchPattern1 = "Hello wonderful special:world what a great special:day";
for(String i:searchPattern1.split("\\s")){
if(i.contains(":")){
System.out.println(i.split(":")[1]);
}
}
This gives the same output as above.

Extracting a word containing a symbol from a string in Java

The basic idea is that I want to pull out any part of the string with the form "text1.text2". Some examples of the input and output of what I'd like to do would be:
"employee.first_name" ==> "employee.first_name"
"2 * employee.salary AS double_salary" ==> "employee.salary"
Thus far I have just used .split(" "), found what I needed, and then used .split("."). Is there any cleaner way?
I would go with an actual Pattern and an iterative find, instead of splitting the String.
For instance:
String test = "employee.first_name 2 * ... employee.salary AS double_salary blabla e.s blablabla";
// searching for a number of word characters or punctuation, followed by a dot,
// followed by a number of word characters or punctuation
// note also that we're avoiding the "..." pitfall
Pattern p = Pattern.compile("[\\w\\p{Punct}&&[^\\.]]+\\.[\\w\\p{Punct}&&[^\\.]]+");
Matcher m = p.matcher(test);
while (m.find()) {
System.out.println(m.group());
}
Output:
employee.first_name
employee.salary
e.s
Note: to simplify the Pattern, you could list only the punctuation allowed in your "."-separated words in the character classes.
For instance:
Pattern p = Pattern.compile("[\\w_]+\\.[\\w_]+");
This way, foo.bar*2 would be matched as foo.bar
You need to use split to break the string into fragments. Then search for "." in each fragment using the contains method to get the desired fragments.
Here you go:
public static void main(String args[]) {
String str = "2 * employee.salary AS double_salary";
String arr[] = str.split("\\s");
for (int i = 0; i < arr.length; i++) {
if (arr[i].contains(".")) {
System.out.println(arr[i]);
}
}
}
String mydata = "2 * employee.salary AS double_salary";
Pattern pattern = Pattern.compile("(\\w+\\.\\w+)");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
I'm not an expert in Java, but since I have used regex in Python, and based on internet tutorials, I suggest using r'(\S*)\.(\S*)' as the pattern. I tried it in Python and it worked well on your example.
It misbehaves, however, when a name contains several dots. If you try to match something like first.second.third, this pattern returns ('first.second', 'third') as the matched groups, which I think is due to greedy matching.
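A possible Java counterpart that keeps chains such as first.second.third together is sketched below. It assumes the names consist of word characters only, and DottedNames and findDotted are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DottedNames {
    // One word, followed by one or more ".word" parts, so a.b.c stays in one match.
    static final Pattern DOTTED = Pattern.compile("\\w+(?:\\.\\w+)+");

    static List<String> findDotted(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DOTTED.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDotted("2 * employee.salary AS double_salary")); // [employee.salary]
        System.out.println(findDotted("first.second.third")); // [first.second.third]
    }
}
```

Because \w also matches digits, a decimal number such as 2.5 would be reported too; tighten the character class if that matters.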

forming correct regular expression in dynamic string

I have a FileInputStream that reads a file which somewhere contains a string subset looking like:
...
OperatorSpecific(XXX)
{
Customer(someContent)
SaveImage()
{
...
I would like to identify the Customer(someContent) part of the string and switch the someContent inside the parenthesis for something else.
someContent will be a dynamic parameter and will contain a string of maybe 5-10 chars.
I have used regEx before, like once or twice, but I feel that in a context such as this where I don't know what value will be inside the parenthesis I'm at a loss of how I should express it...
In summary I want to have a string returned to me which has my someContent value inside the Customer-parenthesis.
Does anyone have any bright ideas of how to get this done?
Try this one (double the escaping backslashes for use in Java!)
(?<=Customer\()[^\)]*
And replace with your content.
(?<=Customer\() is a lookbehind assertion. At every position it checks whether "Customer(" is immediately to the left; if so, [^\)]* matches all following characters that are not ")". That matched part is what gets replaced.
Some working Java code:
Pattern p = Pattern.compile("(?<=Customer\\()[^\\)]*");
String original = "Customer(someContent)";
String replacement = "NewContent";
Matcher m = p.matcher(original);
String result = m.replaceAll(replacement);
System.out.println(result);
This will print
Customer(NewContent)
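Since the new content is dynamic, one thing to watch out for is that replaceAll treats "$" and "\" in the replacement string specially. A hedged sketch of a helper that guards against this with Matcher.quoteReplacement follows; CustomerReplace and replaceCustomer are names made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomerReplace {
    // Same lookbehind as above: match everything between "Customer(" and the next ")".
    static final Pattern CUSTOMER_ARG = Pattern.compile("(?<=Customer\\()[^)]*");

    static String replaceCustomer(String text, String newContent) {
        // quoteReplacement escapes '$' and '\', which replaceAll would otherwise
        // read as group references or escape characters.
        return CUSTOMER_ARG.matcher(text).replaceAll(Matcher.quoteReplacement(newContent));
    }

    public static void main(String[] args) {
        String text = "OperatorSpecific(XXX)\n{\nCustomer(someContent)\nSaveImage()\n{";
        System.out.println(replaceCustomer(text, "A$1"));
    }
}
```

Without the quoting, a value such as A$1 would make replaceAll throw, because the pattern has no group 1.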
Using a capturing group with a non-greedy quantifier also works:
String s =
"OperatorSpecific(XXX)\n {\n" +
" Customer(someContent)\n" +
" SaveImage() {";
Pattern p = Pattern.compile("Customer\\((.*?)\\)");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
will print
someContent
Untested, but something like the following should work:
Pattern pattern = Pattern.compile("\\s+Customer\\(\\s*(\\w+)\\s*\\)\\s*");
Matcher matcher = pattern.matcher(input);
// find() searches inside the input; matches() would require the whole input to match
if (matcher.find()) {
System.out.println(matcher.group(1));
}
EDIT
This of course won't work in all possible cases:
// legal variable names
Customer(_someContent)
Customer($some_Content)

extract substring in java using regex

I need to extract "URPlus1_S2_3" from the string:
"Last one: http://abc.imp/Basic2#URPlus1_S2_3,"
using regular expression in Java language.
Can someone please help me? I am using regex for the first time.
Try
Pattern p = Pattern.compile("#([^,]*)");
Matcher m = p.matcher(myString);
if (m.find()) {
doSomethingWith(m.group(1)); // The matched substring
}
String s = "Last one: http://abc.imp/Basic2#URPlus1_S2_3,";
Matcher m = Pattern.compile("(URPlus1_S2_3)").matcher(s);
if (m.find()) System.out.println(m.group(1));
You gotta learn how to specify your requirements ;)
You haven't really defined what criteria you need to use to find that string, but here is one way to approach based on '#' separator. You can adjust the regex as necessary.
expr: .*#([^,]*)
extract: \1
Go here for syntax documentation:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
String s = "Last one: http://abc.imp/Basic2#URPlus1_S2_3,";
String result = s.replaceAll(".*#", "");
The above returns the full String in case there's no "#". There are better ways using regex, but the best solution here is using no regex. There are classes URL and URI doing the job.
Since this is your first time using regular expressions, I would suggest going another way, which is easier to understand for now (until you master regular expressions ;) and easy to modify if you ever need to:
String yourPart = yourString.split("#")[1];
Here's a long version:
String url = "http://abc.imp/Basic2#URPlus1_S2_3,";
String anchor = null;
String ps = "#(.+),";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(url);
if (m.find()) { // find(), not matches(): the pattern covers only part of the URL
anchor = m.group(1);
}
The main point to understand is the use of parentheses: they create groups that can be extracted from a match. In the Matcher object, the group method returns them in order starting at index 1, while index 0 returns the full match.
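A minimal sketch of the difference between group 0 and group 1, reusing the "#([^,]*)" pattern from an earlier answer; GroupDemo and groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns {full match, first capturing group}, or an empty array if nothing matches.
    static String[] groups(String s) {
        Matcher m = Pattern.compile("#([^,]*)").matcher(s);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("Last one: http://abc.imp/Basic2#URPlus1_S2_3,");
        System.out.println(g[0]); // #URPlus1_S2_3
        System.out.println(g[1]); // URPlus1_S2_3
    }
}
```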
If you just want everything after the #, use split:
String s = "Last one: http://abc.imp/Basic2#URPlus1_S2_3," ;
System.out.println(s.split("#")[1]);
Alternatively, if you want to parse the URI and get the fragment component you can do:
URI u = new URI("http://abc.imp/Basic2#URPlus1_S2_3,");
System.out.println(u.getFragment());
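Note that new URI(...) declares the checked URISyntaxException, and that the trailing comma in the sample is a legal fragment character, so it stays in the result. A small sketch that handles both, assuming the comma should be dropped (FragmentDemo and fragment are invented names):

```java
import java.net.URI;

public class FragmentDemo {
    static String fragment(String url) {
        // URI.create wraps the checked URISyntaxException in an IllegalArgumentException.
        String fragment = URI.create(url).getFragment();
        // Assumption: a trailing comma is not wanted, so trim it if present.
        if (fragment != null && fragment.endsWith(",")) {
            fragment = fragment.substring(0, fragment.length() - 1);
        }
        return fragment;
    }

    public static void main(String[] args) {
        System.out.println(fragment("http://abc.imp/Basic2#URPlus1_S2_3,")); // URPlus1_S2_3
    }
}
```

This only works on the URL itself; the "Last one: " prefix from the question would have to be removed first, since spaces are not valid in a URI.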
