Java: Find a specific pattern using Pattern and Matcher

This is the string that I have:
KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007
This is a weather report. I need to extract the following numbers from the report: 10/M13. It is the temperature and dewpoint, where M means minus. The position in the String may differ, and the temperature may be presented as M10/M13, 10/13 or M10/13.
I have done the following code:
public String getTemperature (String metarIn){
Pattern regex = Pattern.compile(".*(\\d+)\\D+(\\d+)");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 1) {
temperature = matcher.group(1);
System.out.println(temperature);
}
return temperature;
}
Obviously, the regex is wrong, since the method always returns null. I have tried tens of variations but to no avail. Thanks a lot if someone can help!

This will extract the String you seek, and it's only one line of code:
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "KLAS 282356Z 32010KT 10SM FEW090 M01/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
System.out.println(tempAndDP);
}
Output:
M01/M13
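One thing to keep in mind with the one-liner: replaceAll returns the input unchanged when nothing matches, so the caller cannot tell a hit from a miss. If the two values are needed as numbers, something along these lines might work. It is only a sketch: the class name MetarTemperature and the helper names are made up for illustration, and it assumes the first token of the form M?digits/M?digits is the temperature group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetarTemperature {
    // Same idea as the replaceAll pattern, but with one capture group per value.
    // (?<![M\d]) makes sure the match does not start in the middle of a token.
    private static final Pattern TEMP_DEW =
            Pattern.compile("(?<![M\\d])(M?\\d+)/(M?\\d+)");

    // Converts "M13" to -13 and "10" to 10.
    static int toCelsius(String token) {
        return token.startsWith("M")
                ? -Integer.parseInt(token.substring(1))
                : Integer.parseInt(token);
    }

    // Returns {temperature, dewpoint}, or null when the report has no such group.
    static int[] parse(String metar) {
        Matcher m = TEMP_DEW.matcher(metar);
        if (!m.find()) {
            return null;
        }
        return new int[] { toCelsius(m.group(1)), toCelsius(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] t = parse("KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145");
        System.out.println(t[0] + " / " + t[1]); // prints "10 / -13"
    }
}
```

find() is used instead of matches() because the group sits somewhere in the middle of the report.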

The regex should look like:
M?\d+/M?\d+
For Java this will look like:
"M?\\d+/M?\\d+"
You might want to add a check for white space on the front and end:
"\\sM?\\d+/M?\\d+\\s"
But this will depend on where you think you are going to find the pattern, as it will not be matched if it is at the end of the string, so instead we should use:
"(^|\\s)M?\\d+/M?\\d+($|\\s)"
This specifies that if there isn't any whitespace at the end or front we must match the end of the string or the start of the string instead.
Example code used to test:
Pattern p = Pattern.compile("(^|\\s)M?\\d+/M?\\d+($|\\s)");
String test = "gibberish M130/13 here";
Matcher m = p.matcher(test);
if (m.find())
System.out.println(m.group().trim());
This returns: M130/13
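A possible variation on the whitespace check is to use lookarounds, so the surrounding whitespace is never part of the match and no trim() is needed. This is a sketch under that assumption; BoundaryDemo and first are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // (?<!\S): not preceded by a non-whitespace character (start of string or whitespace)
    // (?!\S):  not followed by a non-whitespace character (end of string or whitespace)
    static final Pattern P = Pattern.compile("(?<!\\S)M?\\d+/M?\\d+(?!\\S)");

    // Returns the first standalone token of the form M?digits/M?digits, or null.
    static String first(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(first("gibberish M130/13 here")); // M130/13
        System.out.println(first("M10/13"));                 // M10/13
        System.out.println(first("visibility 1/2SM"));       // null
    }
}
```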

Try:
Pattern regex = Pattern.compile(".*\\s(M?\\d+)/(M?\\d+)\\s.*");
Matcher matcher = regex.matcher(metarIn);
// the M is kept inside the groups so that the minus sign is not lost;
// group(1) is the temperature, group(2) the dewpoint
if (matcher.matches()) {
    temperature = matcher.group(1);
    System.out.println(temperature);
}

Alternative for regex.
Sometimes a regex is not the only solution. It seems that in your case, you must get the 6th block of text. Each block is separated by a space character. So, what you need to do is count the blocks.
Considering that each block of text does NOT HAVE fixed length
Example:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
int spaces = 5;
int begin = 0;
while(spaces-- > 0){
begin = s.indexOf(' ', begin)+1;
}
int end = s.indexOf(' ', begin+1);
String result = s.substring(begin, end);
System.out.println(result);
Considering that each block of text does HAVE fixed length
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String result = s.substring(33, s.indexOf(' ', 33));
System.out.println(result);
Prettier alternative, as pointed out by Adrian:
String result = rawString.split(" ")[5];
Note that split actually receives a regex pattern as its parameter.
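Because the argument is a regex, two things follow: a pattern such as \s+ can absorb repeated spaces, and a separator that is a regex metacharacter has to be quoted. A small sketch of both (SplitDemo and block are hypothetical names):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Returns the block at the given zero-based position, or null if there are too few blocks.
    static String block(String text, int index) {
        // "\\s+" treats any run of whitespace as a single separator
        String[] parts = text.trim().split("\\s+");
        return index < parts.length ? parts[index] : null;
    }

    public static void main(String[] args) {
        // note the double space after the second block
        String s = "KLAS 282356Z  32010KT 10SM FEW090 10/M13 A2997";
        System.out.println(block(s, 5)); // 10/M13

        // A regex metacharacter must be quoted to act as a literal separator.
        System.out.println(Arrays.toString("a.b.c".split(Pattern.quote(".")))); // [a, b, c]
        System.out.println("a.b.c".split(".").length); // 0, because "." matches every character
    }
}
```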

Related

N-th indexOf in String?

I need to extract a sub-string of a URL.
URLs
/service1/api/v1.0/foo -> foo
/service1/api/v1.0/foo/{fooId} -> foo/{fooId}
/service1/api/v1.0/foo/{fooId}/boo -> foo/{fooId}/boo
And some of those URLs may have request parameters.
Code
String str = request.getRequestURI();
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1);
str = str.substring(str.indexOf("/") + 1, str.indexOf("?"));
Is there a better way to extract the sub-string instead of recurrent usage of indexOf method?
There are many alternative ways:
Use the Java Stream API on the String split by the / delimiter:
String str = "/service1/api/v1.0/foo/{fooId}/boo";
String[] split = str.split("\\/");
String url = Arrays.stream(split).skip(4).collect(Collectors.joining("/"));
System.out.println(url);
With the elimination of the parameter, the Stream would be like:
String url = Arrays.stream(split)
.skip(4)
.map(i -> i.replaceAll("\\?.+", ""))
.collect(Collectors.joining("/"));
This is also where Regex takes its place! Use the classes Pattern and Matcher.
String str = "/service1/api/v1.0/foo/{fooId}/boo";
Pattern pattern = Pattern.compile("\\/.*?\\/api\\/v\\d+\\.\\d+\\/(.+)");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
If you rely on the indexOf(..) usage, you might want to use the while-loop.
String str = "/service1/api/v1.0/foo/{fooId}/boo?parameter=value";
String string = str;
while(!string.startsWith("v1.0")) {
string = string.substring(string.indexOf("/") + 1);
}
System.out.println(string.substring(string.indexOf("/") + 1, string.indexOf("?")));
Other answers include a way for when the prefix never changes: you might want to use only one call of the indexOf(..) method (@JB Nizet):
string.substring("/service1/api/v1.0/".length(), string.indexOf("?"));
All these solutions rely on your input and on the fact that the pattern is known, or at least on the number of preceding sections delimited by / or on the version v1.0 as a checkpoint. The best solution might not appear here, since there are unlimited combinations of the URL. You have to know all the possible combinations of the input URL to find the best way to handle it.
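One combination worth handling explicitly is a URL without request parameters: indexOf("?") then returns -1 and substring throws. A minimal sketch that guards against this, assuming a fixed prefix (UriSuffix and afterPrefix are made-up names, not part of any API):

```java
class UriSuffix {
    // Returns the part of the request URI after the given prefix, without any query string.
    // Returns null when the URI does not start with the prefix.
    static String afterPrefix(String uri, String prefix) {
        int q = uri.indexOf('?');
        // indexOf gives -1 when there is no '?', so keep the whole string in that case
        String path = (q == -1) ? uri : uri.substring(0, q);
        return path.startsWith(prefix) ? path.substring(prefix.length()) : null;
    }

    public static void main(String[] args) {
        String prefix = "/service1/api/v1.0/";
        System.out.println(afterPrefix("/service1/api/v1.0/foo/{fooId}/boo?parameter=value", prefix)); // foo/{fooId}/boo
        System.out.println(afterPrefix("/service1/api/v1.0/foo", prefix));                             // foo
    }
}
```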
Path is quite useful for that :
public static void main(String[] args) {
Path root = Paths.get("/service1/api/v1.0/foo");
Path relativize = root.relativize(Paths.get("/service1/api/v1.0/foo/{fooId}/boo"));
System.out.println(relativize);
}
Output :
{fooId}/boo
How about this:
String s = "/service1/api/v1.0/foo/{fooId}/boo";
String[] sArray = s.split("/");
StringBuilder sb = new StringBuilder();
for (int i = 4; i < sArray.length; i++) {
sb.append(sArray[i]).append("/");
}
sb.deleteCharAt(sb.length() - 1);
System.out.println(sb.toString());
Output:
foo/{fooId}/boo
If the url prefix is always /service1/api/v1.0/, you just need to do s.substring("/service1/api/v1.0/".length()).
There are a few good options here.
1) If you know "foo" will always be the 4th token, then you have the right idea already. The only issue with your way is that you have the information you need to be efficient, but you aren't using it. Instead of copying the String multiple times and looping anew from the beginning of the new String, you could just continue from where you left off, 4 times, to find the starting point of what you want.
String str = "/service1/api/v1.0/foo/{fooId}/boo";
// start at the beginning
int start = 0;
// get the 4th index of '/' in the string
for (int i = 0; i != 4; i++) {
// get the next index of '/' after the index 'start'
start = str.indexOf('/',start);
// increase the pointer to the next character after this slash
start++;
}
// get the substring
str = str.substring(start);
This will be far, far more efficient than any regex pattern.
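The loop can be wrapped into a small helper that also copes with strings containing fewer separators than expected. This is just a sketch; nthIndexOf is not a JDK method, and the name is chosen here for illustration.

```java
class NthIndexOf {
    // Returns the index of the n-th occurrence (1-based) of ch in s,
    // or -1 if s contains fewer than n occurrences.
    static int nthIndexOf(String s, char ch, int n) {
        int index = -1;
        for (int i = 0; i < n; i++) {
            // continue searching right after the previous hit
            index = s.indexOf(ch, index + 1);
            if (index == -1) {
                return -1;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String str = "/service1/api/v1.0/foo/{fooId}/boo";
        int slash = nthIndexOf(str, '/', 4);
        System.out.println(slash == -1 ? str : str.substring(slash + 1)); // foo/{fooId}/boo
    }
}
```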
2) Regex: (java.util.regex.*). This will work if what you want is always preceded by "service1/api/v1.0/". There may be other directories before it, e.g. "one/two/three/service1/api/v1.0/".
// the dot in "v1.0" is escaped so that it only matches a literal '.'
// (.+) will capture the matched text at that position
// $ marks the end of the string (technically it matches just before '\n')
Pattern pattern = Pattern.compile("/service1/api/v1\\.0/(.+)$");
// get a matcher for it
Matcher matcher = pattern.matcher(str);
// if there is a match
if (matcher.find()) {
// get the captured text
str = matcher.group(1);
}
If your path can vary somewhat, you can use the regex to account for it. E.g. "service/api/v3/foo/{bar}/baz/" (note the varying number format and the trailing '/') could be matched as well by changing the regex to "/service\\d*/api/v\\d+(?:\\.\\d+)?/(.+?)/?$". The capture is lazy here because a greedy (.+) would swallow the trailing '/'.
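The varying-path pattern can be checked against both shapes of input. The sketch below uses a lazy capture followed by an optional trailing slash; VersionedPath and tail are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionedPath {
    // "service" with optional digits, /api/, a version such as "v3" or "v1.0", then the remainder.
    // (.+?) is lazy so that an optional trailing '/' is left out of the capture.
    static final Pattern P =
            Pattern.compile("/service\\d*/api/v\\d+(?:\\.\\d+)?/(.+?)/?$");

    // Returns the part after the version segment, or null if the path does not match.
    static String tail(String path) {
        Matcher m = P.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(tail("/service1/api/v1.0/foo/{fooId}/boo")); // foo/{fooId}/boo
        System.out.println(tail("/service/api/v3/foo/{bar}/baz/"));     // foo/{bar}/baz
    }
}
```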

JAVA Get text from String

Hi I get this String from server :
id_not="autoincrement"; id_obj="-"; id_tr="-"; id_pgo="-"; typ_not=""; tresc="Nie wystawił"; datetime="-"; lon="-"; lat="-";
I need to create a new String, e.g. String word, and give it the value I get from tresc="Nie wystawił"
As @Jan suggests in a comment, you can use a regex, for example:
String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; tresc=\"Nie wystawił\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
Pattern p = Pattern.compile("tresc(.*?);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
tresc="Nie wystawił";
If you want to get only the value of tresc you can use :
Pattern p = Pattern.compile("tresc=\"(.*?)\";");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Output
Nie wystawił
Something along the lines of
Pattern p = Pattern.compile("tresc=\"([^\"]+)\"");
Matcher m = p.matcher(stringFromServer);
if(m.find()) {
String whatYouWereLookingfor = m.group(1);
}
should do the trick. JSON parsing might be much better in the long run if you need additional values.
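If several of the values are needed, the same regex idea can collect every key="value" pair into a map in one pass. This is only a sketch, assuming values never contain a double quote; KeyValueParser and parse are made-up names, and \u0142 is the escape for the last letter of the sample value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // key: word characters; value: anything except a double quote (may be empty)
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps the order of the fields
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "id_not=\"autoincrement\"; typ_not=\"\"; tresc=\"Nie wystawi\u0142\"; lat=\"-\";";
        Map<String, String> fields = parse(str);
        System.out.println(fields.get("tresc"));  // prints the value of tresc
        System.out.println(fields.get("id_not")); // autoincrement
    }
}
```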
Your question is unclear, but I think you get a string from the server and from that string you want the value for tresc. You can first search for tresc in the string you get, like:
serverString.substring(serverString.indexOf("tresc") + x, serverString.length());
Here, replace x with how many characters past the start of tresc you want to skip.
Read up on substring and delimiters.
As the values are separated by semicolons, another solution could be:
int start = serverString.indexOf("tresc");
// index of the first ";" at or after start, i.e. the one that ends the tresc field
int end = serverString.indexOf(";", start);
// Either index can be -1 if "tresc" or ";" is not found in the string at all.
// Check and account for it.
if (start != -1 && end != -1) {
    String subString = serverString.substring(start, end);
}
Here start is the index where tresc begins and end is the index of the semicolon that closes it, so subString gives you the tresc part.
You can then use it any way you want.

How would I search a certain word after a character in java?

So I am doing some cw, and I want to search a string for words after a hashtag, "#".
How would I go about this?
Say for example the string was 'Hello World #me'? how would i return the word "me"?
kind regards
Use a regex and prepare a Matcher to find hashtags iteratively as
String input = "Hello #World! #Me";
Pattern pattern = Pattern.compile("#(\\S+)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output :
World!
Me
Split the String on basis of that character say
String []splittedString=inputString.split("#");
System.out.println(splittedString[1]);
So for Input String
Hello World #me
Output
me
Use this
example.substring(example.indexOf("#") + 1);
Using regex:
// Matches a string of word characters preceded by a '#'
Pattern p = Pattern.compile("(?<=#)\\w*");
Matcher m = p.matcher("Hello World #me");
String hashtag = "";
if(m.find())
{
hashtag = m.group(); //me
}
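When a string can hold several hashtags, the same kind of pattern can be run in a loop and the words collected into a list. A minimal sketch, assuming a hashtag consists of word characters only (Hashtags and find are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Hashtags {
    // '#' followed by at least one word character; the word is captured in group 1
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    // Returns every word that directly follows a '#', in order of appearance.
    static List<String> find(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(find("Hello #World! #Me")); // [World, Me]
        System.out.println(find("no tags here"));      // []
    }
}
```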
So then John, let me guess. You're a computer Science student at the university of Warwick. Here you go,
String s = "hello #yolo blaaa";
if (s.contains("#")) {
    int hash = s.indexOf("#") + 1;     // first character after the '#'
    int space = s.indexOf(' ', hash);  // next space after the hashtag, or -1 if none
    s = (space == -1) ? s.substring(hash) : s.substring(hash, space);
}
remove the + 1 if you want to include the #
A simple way of doing it would be to use indexOf, and then the overloaded indexOf that takes a starting index, together with substring.
E.g.:
String myString = originalString.substring(originalString.indexOf("#"), originalString.indexOf(" ", originalString.indexOf("#")));
Please note that this can throw an out-of-bounds exception if the characters are not found. Read the Javadoc for indexOf and substring to understand in detail what this is doing.

How do I read and remove a number from a string?

So for example, I have this string:
0no1no2yes3yes4yes
The first 0 here should be removed and used as an index of an array. I am doing so by this statement:
string = string.replaceFirst(dataLine.substring(0, 1), "");
However, when I have say this string:
10yes11no12yes13yes14no
My code fails, since I want to process the 10 but my code extracts just the 1.
So in short, single digits work fine, but double or triple digits cause an IndexOutOfBounds error.
Here's the code: http://pastebin.com/uspYp1FK
And here's some sample data: http://pastebin.com/kTQx5WrJ
Here's the output for the sample data:
Enter filename: test.txt
Data before cleanUp: {"assignmentID":"2CCYEPLSP75KTVG8PTFALQES19DXRA","workerID":"AGMJL8K9OMU64","start":1359575990087,"end":"","elapsedTime":"","itemIndex":0,"responses":[{"jokeIndex":0,"response":"no"},{"jokeIndex":1,"response":"no"},{"jokeIndex":2,"response":"yes"},{"jokeIndex":3,"response":"yes"},{"jokeIndex":4,"response":"yes"}],"mturk":"yes"},
Data after cleanUp: 0no1no2yes3yes4yes
Data before cleanUp: {"assignmentID":"2118D8J3VE7W013Z4273QCKAGJOYID","workerID":"A2P0GYVEKGM8HF","start":1359576154789,"end":"","elapsedTime":"","itemIndex":3,"responses":[{"jokeIndex":15,"response":"no"},{"jokeIndex":16,"response":"no"},{"jokeIndex":17,"response":"no"},{"jokeIndex":18,"response":"no"},{"jokeIndex":19,"response":"no"}],"mturk":"yes"},
Data after cleanUp: 15no16no17no18no19no
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.lang.String.substring(String.java:1907)
at jokes.main(jokes.java:34)
Basically, what the code is supposed to do is strip the data down into strings as shown above, then read the number, and if it's followed by yes, increase its index's value in dataYes, or if followed by no, increase the value in dataNo. Makes sense?
What can I do? How can I make my code more flexible?
An alternative, more specific attempt: -
String regex = "^(\\d+)(yes|no)";
String myStr = "10yes11no";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(myStr);
while (m.find())
{
String all = m.group();
String digits = m.group(1);
String bool = m.group(2);
// do not try and combine the next 2 lines ... it doesn't work!
myStr = myStr.substring(all.length());
m.reset(myStr);
System.out.println(String.format("all = %s, digits = %s, bool = %s", all, digits, bool));
}
does it work for you?
string = string.replaceAll("^\\d+","");
Try this
System.out.println("10yes11no12yes13yes14no".replaceFirst("^\\d+",""));
How about: -
String regex = "^\\d+";
String myStr = "10abc11def";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(myStr);
if(m.find())
{
String digits = m.group();
myStr = m.replaceFirst("");
}
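Another option is to leave the string untouched and let find() walk over it, reading each number and its yes/no in one pass. The sketch below follows the dataYes/dataNo description from the question; the array size of 20 and the names ResponseTally and tally are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResponseTally {
    // one entry: a number of any length followed by "yes" or "no"
    private static final Pattern ENTRY = Pattern.compile("(\\d+)(yes|no)");

    // Increments dataYes[index] or dataNo[index] for every "<index><yes|no>" entry in the line.
    static void tally(String line, int[] dataYes, int[] dataNo) {
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)); // "10" is read as 10, not as 1
            if (m.group(2).equals("yes")) {
                dataYes[index]++;
            } else {
                dataNo[index]++;
            }
        }
    }

    public static void main(String[] args) {
        int[] dataYes = new int[20]; // size is an assumption for this example
        int[] dataNo = new int[20];
        tally("0no1no2yes3yes4yes", dataYes, dataNo);
        tally("15no16no17no18no19no", dataYes, dataNo);
        System.out.println(dataYes[2] + " " + dataNo[15]); // 1 1
    }
}
```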

Substring to remove everything before first period and after second

So I have a filename that looks like this:
myFile.12345.txt
If I wanted to end up with just the "12345" how would I go about removing that from the filename if the 12345 could be anywhere between 1 and 5 numbers in length?
If you are sure that there will always be exactly 2 periods:
String fileName = string.split("\\.")[1];
you can use this
String s="ghgj.7657676.jklj";
String p = s.substring(s.indexOf(".")+1,s.lastIndexOf("."));
Assuming you want to extract all the numbers, you could use a simple regex to remove all the non-digits characters:
String s = "myFile.12345.txt";
String numbers = s.replaceAll("[^\\d]","");
System.out.println(numbers); //12345
Note: It would not work with file12.12345.txt for example
static final Pattern P = Pattern.compile("^(.*?)\\.(.*?)\\.(.*?)$");
...
...
...
Matcher m = P.matcher(input);
if (m.matches()) {
//String first = m.group(1);
String middle = m.group(2);
//String last = m.group(3);
...
}
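If the middle part must really be 1 to 5 digits, the pattern can say so directly, which also rejects names that do not have that shape. A sketch under that assumption (MiddleNumber and middle are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleNumber {
    // name without dots, a dot, 1 to 5 digits (captured), a dot, extension without dots
    private static final Pattern P = Pattern.compile("^[^.]*\\.(\\d{1,5})\\.[^.]*$");

    // Returns the digits between the two periods, or null if the name has a different shape.
    static String middle(String fileName) {
        Matcher m = P.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(middle("myFile.12345.txt")); // 12345
        System.out.println(middle("file12.7.txt"));     // 7
        System.out.println(middle("myFile.txt"));       // null
    }
}
```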
