Splitting a string into an array of words using specific words in Java

I want to split this string:
sinXcos(b+c)
so that the output is:
sinX
cos(b+c)
I know how to split a string like
200XY
using
String token = "200XY";
String[] mix_token = token.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
But how can I use something like this on a string like
sinXcos(b+c)
or a String like
sinXcos(b+c)tan(z)

This will work:
public static void main(String[] args) {
String text = "sinXcos(b+c)tan(z)";
String patternString1 = "(sin|cos|tan)(?![a-z])\\(?\\w(\\+\\w)?\\)?";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
O/P:
sinX
cos(b+c)
tan(z)
2. Input :"sinabc(X+y)cos(b+c)tan(z)";
O/P :
cos(b+c)
tan(z)
Explanation:
String patternString1 = "(sin|cos|tan)(?![a-z])\\(?\\w(\\+\\w)?\\)?";
1. (sin|cos|tan) --> the match starts with sin, cos or tan.
2. (?![a-z]) --> negative lookahead: the next character must not be a lowercase letter a to z. This is why "sinabc(X+y)" is skipped in the second input.
3. \\(?\\w(\\+\\w)?\\)? --> an optional opening parenthesis, one word character, optionally a "+" and another word character, then an optional closing parenthesis.
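If you really want an array, as the title asks, a zero-width lookahead passed to split is another option. This is only a sketch: the class and method names are made up for illustration, and it assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;

class TrigSplit {
    // Splits before every occurrence of sin, cos or tan. The lookahead is
    // zero-width, so the function names stay in the resulting pieces.
    static String[] splitOnFunctions(String expression) {
        return expression.split("(?=sin|cos|tan)");
    }

    public static void main(String[] args) {
        // prints [sinX, cos(b+c), tan(z)]
        System.out.println(Arrays.toString(splitOnFunctions("sinXcos(b+c)tan(z)")));
    }
}
```

Unlike the Matcher version, this does not validate what follows the function name, so "sinabc(X+y)" would be kept as a piece, and a name that merely contains sin, cos or tan (for example "arcsin") would be split in the middle.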

Related

Parsing comma separated string with prefix

I am getting a comma-separated string in the format below:
String codeList1 = "abc,pqr,100101,P101001,R108972";
or
String codeList2 = "mno, 100101,108972";
Expected result: check whether the code is numeric after removing its first letter. If yes, remove the prefix and return the number. If not, return the code unchanged.
codeList1 = "abc,pqr,100101,101001,108972";
or
codeList2 = "mno, 100101,108972";
As you can see, codes can arrive either with a prefix (P101001, R108972) or without one (101001, 108972). There will be at most one prefix letter.
If I get P101001, I want to remove the 'P' prefix and return the number 101001.
If I get 101001, nothing should change.
Below is the working code. Is there an easier or more efficient way of achieving this? Please help.
for (String code : codeList.split(",")) {
if(StringUtils.isNumeric(code)) {
codes.add(code);
} else if(StringUtils.isNumeric(code.substring(1))) {
codes.add(Integer.toString(Integer.parseInt(code.substring(1))));
} else {
codes.add(code);
}
}
If you want to remove the prefixes from the numbers, you can simply use:
String[] codes = {"abc,pqr,100101,P101001,R108972", "mno, 100101,108972"};
for (String code : codes){
System.out.println(
code.replaceAll("\\b[A-Z](\\d+)\\b", "$1")
);
}
Outputs
abc,pqr,100101,101001,108972
mno, 100101,108972
If you are using Java 8+, and want to extract only the numbers, you can just use :
String codeList1 = "abc,pqr,100101,P101001,R108972";
List<Integer> results = Arrays.stream(codeList1.split("\\D")) // split on non-digits
.filter(c -> !c.isEmpty()) // keep only the non-empty pieces
.map(Integer::valueOf) // convert each String to an Integer
.collect(Collectors.toList()); // collect the results into a list
Outputs
100101
101001
108972
You can use regex to do it
String str = "abc,pqr,100101,P101001,R108972";
String regex = ",?[a-zA-Z]{0,}(\\d+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output
100101
101001
108972
Updated:
In reply to your comment ("I want to add all the codes; if a single-letter prefix is found, remove it and add the rest"), you can use the code below:
String str = "abc,pqr,100101,P101001,R108972";
String regex = "(?=,?)[a-zA-Z]{0,}(?=\\d+)|\\s";// \\s is used to remove space
String[] strs = str.replaceAll(regex,"").split(",");
Output:
abc
pqr
100101
101001
108972
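Note that the regex above strips any run of letters in front of the digits, not just one. If exactly one prefix letter should be removed, a per-code variant could look like the sketch below. The class and method names are hypothetical, and it assumes that surrounding spaces can be trimmed and that the digits should be kept as text (so leading zeros survive, unlike with Integer.parseInt).

```java
import java.util.ArrayList;
import java.util.List;

class CodeCleaner {
    // Keeps every code, but drops a single leading letter when the rest is all digits.
    static List<String> clean(String codeList) {
        List<String> codes = new ArrayList<>();
        for (String raw : codeList.split(",")) {
            String code = raw.trim(); // tolerate input such as "mno, 100101"
            // Codes that do not look like letter + digits are left unchanged.
            codes.add(code.replaceFirst("^[A-Za-z](\\d+)$", "$1"));
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(clean("abc,pqr,100101,P101001,R108972")); // [abc, pqr, 100101, 101001, 108972]
        System.out.println(clean("mno, 100101,108972"));             // [mno, 100101, 108972]
    }
}
```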
How about this:
String codeList1 = "abc,pqr,100101,P101001,R108972";
String[] codes = codeList1.split(",");
for (String code : codes) {
if (code.matches("[A-Z]?\\d{6}")) {
String codeF = code.replaceAll("[A-Z]+", "");
System.out.println(codeF);
}
}
100101
101001
108972

How to write regEx to get value .4318. from String app.abc.4318.20161017223456.log.gz in Java

I am using the pattern below to get a four-digit number from the string below.
Required output is .4318.
public static void main(String args[]) {
String path = "app.xyz.4318.20161017223456.log.gz";
Pattern p = Pattern.compile("(\\d{4})");
Matcher m = p.matcher(path);
if (m.find()) {
System.out.println(m.group(1));
}
}
The output is 4318, but I need .4318. (including both dots).
Please suggest a pattern that produces .4318. as the output.
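Include both dots in the pattern:
Pattern p = Pattern.compile("(\\.\\d{4}\\.)");
"\\." is an escaped period (the regex \. written as a Java string literal), so it matches a literal dot instead of any character.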
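For completeness, here is a small self-contained sketch that returns the four digits together with both surrounding dots. The class and method names are just for illustration, and returning null when nothing matches is an assumption about what the caller wants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DottedNumber {
    // A literal dot, exactly four digits, and another literal dot.
    private static final Pattern FOUR_DIGITS_IN_DOTS = Pattern.compile("\\.\\d{4}\\.");

    // Returns the first ".dddd." segment, or null when there is none.
    static String extract(String path) {
        Matcher m = FOUR_DIGITS_IN_DOTS.matcher(path);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("app.xyz.4318.20161017223456.log.gz")); // .4318.
    }
}
```

Because the closing dot is part of the pattern, the longer timestamp segment is not picked up by accident.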

Creating regex to extract 4 digit number from string using java

Hi, I am trying to build a regex to extract a 4-digit number from a given string using Java. I tried it in the following ways:
String mydata = "get the 0025 data from string";
Pattern pattern = Pattern.compile("^[0-9]+$");
//Pattern pattern = Pattern.compile("^[0-90-90-90-9]+$");
//Pattern pattern = Pattern.compile("^[\\d]+$");
//Pattern pattern = Pattern.compile("^[\\d\\d\\d\\d]+$");
Matcher matcher = pattern.matcher(mydata);
String val = "";
if (matcher.find()) {
System.out.println(matcher.group(1));
val = matcher.group(1);
}
But it's not working properly. How can I do this? Any help is appreciated. Thank you.
Change your pattern to:
Pattern pattern = Pattern.compile("(\\d{4})");
\d matches a digit, and the number in {} is how many digits you want to match.
If you want to end up with 0025,
String mydata = "get the 0025 data from string";
mydata = mydata.replaceAll("\\D", ""); // Replace all non-digits
Pattern pattern = Pattern.compile("\\b[0-9]+\\b");
This should do it for you. ^ and $ anchor the pattern to the whole string, so the original pattern only matches a string that consists entirely of digits.
Remove the anchors, and add parentheses if you want the digits in group 1:
Pattern pattern = Pattern.compile("([0-9]+)"); // use "[0-9]{4}" for a 4-digit number
Then read the value with matcher.group(1).
There are many better answers, but if you still want to keep the same approach:
String mydata = "get the 0025 data from string";
Pattern pattern = Pattern.compile("(?<![-.])\\b[0-9]+\\b(?!\\.[0-9])");
Matcher matcher = pattern.matcher(mydata);
String val = "";
if (matcher.find()) {
System.out.println(matcher.group(0));
val = matcher.group(0);
}
Note that matcher.group(1) was changed to matcher.group(0), because this pattern has no capturing group.
You can go with \d{4} or [0-9]{4} but note that by specifying the ^ at the beginning of regex and $ at the end you're limiting yourself to strings that contain only 4 digits.
My recommendation: learn some regex basics.
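To illustrate the difference between anchoring and searching, here is a minimal sketch that uses find() with word boundaries instead of ^ and $. The names are hypothetical, and the \b boundaries are an assumption: they make sure a 4-digit run inside a longer number is not reported, but they also mean digits glued to letters (such as "ab1234") are skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourDigits {
    // Exactly four digits that are not part of a longer word or number.
    private static final Pattern FOUR_DIGIT_NUMBER = Pattern.compile("\\b\\d{4}\\b");

    // Returns the first standalone 4-digit number, or null if there is none.
    static String first(String text) {
        Matcher m = FOUR_DIGIT_NUMBER.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(first("get the 0025 data from string")); // 0025
        System.out.println(first("order 123456 code 0042"));        // 0042
    }
}
```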
Scanner sc=new Scanner(System.in);
HashMap<String,String> a=new HashMap<>();
ArrayList<String> b=new ArrayList<>();
String s=sc.nextLine();
Pattern p=Pattern.compile("\\d{4}");
Matcher m=p.matcher(s);
while(m.find())
{
String x="";
x=x+(m.group(0));
a.put(x,"0");
b.add(x);
}
System.out.println(a.size());
System.out.println(b);
This finds all matched digit patterns (the list b) as well as the unique ones; for the unique patterns use the keys of the map: Set<String> k = a.keySet();
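The same idea can be sketched without the placeholder HashMap values by collecting the matches into a list and then into a LinkedHashSet, which keeps first-seen order. The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniqueMatches {
    private static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");

    // Every 4-digit match, in order of appearance, duplicates included.
    static List<String> allMatches(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = FOUR_DIGITS.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // The distinct matches, keeping the order in which they first appeared.
    static Set<String> uniqueMatches(String text) {
        return new LinkedHashSet<>(allMatches(text));
    }

    public static void main(String[] args) {
        String input = "1234 abcd 5678 1234";
        System.out.println(allMatches(input));    // [1234, 5678, 1234]
        System.out.println(uniqueMatches(input)); // [1234, 5678]
    }
}
```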
If you want to match any number of digits then use pattern like the following:
^\D*(\d+)\D*$
And for exactly 4 digits go for
^\D*(\d{4})\D*$

Regex after a special character in Java

I am using regex in java to get a specific output from a list of rooms at my University.
An excerpt from the list looks like this:
(A55:G260) Laboratorium 260
(A55:G292) Grupperom 292
(A55:G316) Grupperom 316
(A55:G366) Grupperom 366
(HDS:FLØYEN) Fløyen (appendix)
(ODO:PC-STUE) Pulpakammeret (PC-stue)
(SALEM:KONF) Konferanserom
I want to get the value that comes between the colon and the parenthesis.
The regex I am using at the moment is:
pattern = Pattern.compile("[:]([A-Za-z0-9ÆØÅæøå-]+)");
matcher = pattern.matcher(room.text());
I've included ÆØÅ, because some of the rooms have Norwegian letters in them.
Unfortunately the regex includes the building code also (e.g. "A55") in the output... Comes out like this:
A55
A55
A55
:G260
:G292
:G316
Any ideas on how to solve this?
The problem is not your regular expression. You need to reference group(1) for the match result.
while (matcher.find()) {
System.out.println(matcher.group(1));
}
However, you may consider using a negated character class instead.
pattern = Pattern.compile(":([^)]+)");
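Put together, a minimal sketch could look like this. The class and method names are hypothetical, and it assumes each line contains at most one room code of interest, so only the first match per line is taken. Since the negated character class accepts anything except ")", there is no need to list ÆØÅ explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoomCodes {
    // A colon, then everything up to (but not including) the next ')'.
    private static final Pattern AFTER_COLON = Pattern.compile(":([^)]+)");

    static List<String> roomCodes(List<String> lines) {
        List<String> codes = new ArrayList<>();
        for (String line : lines) {
            Matcher m = AFTER_COLON.matcher(line);
            if (m.find()) {
                codes.add(m.group(1)); // group(1) excludes the colon itself
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "(A55:G260) Laboratorium 260",
                "(ODO:PC-STUE) Pulpakammeret (PC-stue)",
                "(SALEM:KONF) Konferanserom");
        System.out.println(roomCodes(lines)); // [G260, PC-STUE, KONF]
    }
}
```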
You can try a regex like this:
public static void main(String[] args) {
String s = "(HDS:FLØYEN) Fløyen (appendix)";
// select everything after ":" up to the first ")" and replace the whole input with the selected data
System.out.println(s.replaceAll(".*?:(.*?)\\).*", "$1"));
String s1 = "(ODO:PC-STUE) Pulpakammeret (PC-stue)";
System.out.println(s1.replaceAll(".*?:(.*?)\\).*", "$1"));
}
O/P :
FLØYEN
PC-STUE
You can also try plain String operations, as follows:
String val = "(HDS:FLØYEN) Fløyen (appendix)";
if(val.contains(":")){
String valSub = val.split("\\s")[0];
System.out.println(valSub);
valSub = valSub.substring(1, valSub.length()-1);
String valA = valSub.split(":")[0];
String valB = valSub.split(":")[1];
System.out.println(valA);
System.out.println(valB);
}
Output :
(HDS:FLØYEN)
HDS
FLØYEN
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class test
{
public static void main( String args[] ){
// String to be scanned to find the pattern.
String line = "(HDS:FLØYEN) Fløyen (appendix)";
String pattern = ":([^)]+)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
System.out.println(m.group(1));
}
}
}

Java: Find a specific pattern using Pattern and Matcher

This is the string that I have:
KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007
This is a weather report. I need to extract the following numbers from the report: 10/M13. It is temperature and dewpoint, where M means minus. So, the place in the String may differ and the temperature may be presented as M10/M13 or 10/13 or M10/13.
I have done the following code:
public String getTemperature (String metarIn){
Pattern regex = Pattern.compile(".*(\\d+)\\D+(\\d+)");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 1) {
temperature = matcher.group(1);
System.out.println(temperature);
}
return temperature;
}
Obviously, the regex is wrong, since the method always returns null. I have tried tens of variations but to no avail. Thanks a lot if someone can help!
This will extract the String you seek, and it's only one line of code:
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "KLAS 282356Z 32010KT 10SM FEW090 M01/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
System.out.println(tempAndDP);
}
Output:
M01/M13
The regex should look like:
M?\d+/M?\d+
For Java this will look like:
"M?\\d+/M?\\d+"
You might want to require whitespace before and after the match:
"\\sM?\\d+/M?\\d+\\s"
However, this will not match when the group is at the very start or end of the string, so a safer version is:
"(^|\\s)M?\\d+/M?\\d+($|\\s)"
This says that where there is no whitespace before or after the group, the start or the end of the string must be there instead.
Example code used to test:
Pattern p = Pattern.compile("(^|\\s)M?\\d+/M?\\d+($|\\s)");
String test = "gibberish M130/13 here";
Matcher m = p.matcher(test);
if (m.find())
System.out.println(m.group().trim());
This returns: M130/13
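If you need the two values as numbers rather than as text, the same pattern can be extended with capturing groups. The following is only a sketch with hypothetical names: it uses lookarounds instead of (^|\\s) so the whitespace is not part of the match, and it assumes that "M" always means a negative value, as described in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetarTemperature {
    // Optional M, digits, slash, optional M, digits; delimited by whitespace
    // or by the start/end of the report.
    private static final Pattern TEMP_GROUP =
            Pattern.compile("(?<!\\S)(M?)(\\d+)/(M?)(\\d+)(?!\\S)");

    // Returns {temperature, dewpoint}, or null when no such group is found.
    static int[] parse(String metar) {
        Matcher m = TEMP_GROUP.matcher(metar);
        if (!m.find()) {
            return null;
        }
        int temperature = Integer.parseInt(m.group(2));
        int dewpoint = Integer.parseInt(m.group(4));
        if (!m.group(1).isEmpty()) temperature = -temperature; // leading M means minus
        if (!m.group(3).isEmpty()) dewpoint = -dewpoint;
        return new int[] {temperature, dewpoint};
    }

    public static void main(String[] args) {
        int[] t = parse("KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145");
        System.out.println(t[0] + " / " + t[1]); // 10 / -13
    }
}
```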
Try:
Pattern regex = Pattern.compile(".*\\sM?(\\d+)/M?(\\d+)\\s.*");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 2) {
temperature = matcher.group(1);
System.out.println(temperature);
}
An alternative to regex.
Sometimes a regex is not the only solution. It seems that in your case you need the 6th block of text, and the blocks are separated by a space character. So all you need to do is count the blocks.
Considering that each block of text does NOT HAVE fixed length
Example:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
int spaces = 5;
int begin = 0;
while(spaces-- > 0){
begin = s.indexOf(' ', begin)+1;
}
int end = s.indexOf(' ', begin+1);
String result = s.substring(begin, end);
System.out.println(result);
Considering that each block of text does HAVE fixed length
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String result = s.substring(33, s.indexOf(' ', 33));
System.out.println(result);
A prettier alternative, as pointed out by Adrian:
String result = rawString.split(" ")[5];
Note that split actually takes a regex pattern as its parameter.
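Since the question says the position of the group may differ, a fixed index is fragile. One way to combine both ideas is to split on whitespace and pick the first token with the expected shape. This is a sketch under that assumption only, with made-up names; it does not try to cover every group a real report may contain.

```java
class MetarToken {
    // Returns the first whitespace-separated token shaped like 10/M13,
    // M10/M13, 10/13 or M10/13, or null if no token has that shape.
    static String findTemperatureGroup(String metar) {
        for (String token : metar.trim().split("\\s+")) {
            // matches() requires the whole token to fit the pattern.
            if (token.matches("M?\\d+/M?\\d+")) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String report = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2";
        System.out.println(findTemperatureGroup(report)); // 10/M13
    }
}
```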
