Java Regex Return Last Word - java

String regex = "(some|text|)";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(input);
while (matcher.find())
{
int start = matcher.start();
int end = matcher.end();
System.out.print("Start index: " + start);
System.out.print(" End index: " + end + " ");
System.out.println(matcher.group());
}
Hi, I would like to return the full substring, up to the end of the word. For example,
if the input is:
String input = "I am a texte";
I would expect it to return 7 11; that is, I want the match to extend to the final "e" rather than stop at "t". Is this possible? If so, how can it be implemented?

Why use a regexp for this? String has lastIndexOf to find the last index of a delimiter (like a space), and it looks like you're not trying to find a "word" but "the substring after the last space" (which are not the same thing in many, many languages). Given that, just use:
String last = input.substring(input.lastIndexOf(' ') + 1);
(optionally as two lines, with a check that lastIndexOf returned a sensible position), and done?
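The two-line variant with the check could look roughly like the sketch below. The class and method names (LastToken, lastToken) are made up for illustration; only String.lastIndexOf and String.substring come from the answer above.

```java
class LastToken {
    // Returns the text after the last space, or the whole input when it contains no space.
    static String lastToken(String input) {
        int lastSpace = input.lastIndexOf(' '); // -1 when there is no space at all
        if (lastSpace < 0) {
            return input;
        }
        return input.substring(lastSpace + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastToken("I am a texte")); // prints: texte
    }
}
```

Note that an input ending in a space yields an empty string here; whether that is acceptable depends on the caller.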

You can use the following regex instead:
String regex = "(.[^\\s+].*some*.[^\\s]+|.[^\\s+].*text*.[^\\s]+)";
This will take all the words that start with some or text. For example: someone.
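If the goal is only to extend each keyword match to the end of its word, a simpler pattern may be enough: the keyword followed by \w*. The sketch below assumes that a "word" means a run of \w characters and that the keyword must sit at a word boundary (\b); both are my assumptions, not something stated in the question. The names WordFromKeyword and firstSpan are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFromKeyword {
    // Keyword at a word boundary, then any remaining word characters.
    static final Pattern KEYWORD_WORD =
            Pattern.compile("\\b(?:some|text)\\w*", Pattern.CASE_INSENSITIVE);

    // Returns {start, end} of the first match (end exclusive), or null if there is none.
    static int[] firstSpan(String input) {
        Matcher m = KEYWORD_WORD.matcher(input);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String input = "I am a texte";
        Matcher m = KEYWORD_WORD.matcher(input);
        while (m.find()) {
            // prints: Start index: 7 End index: 12 texte
            System.out.println("Start index: " + m.start() + " End index: " + m.end() + " " + m.group());
        }
    }
}
```

Matcher.end() is exclusive, so the index of the last matched character is end() - 1, which is the 11 the question asks for.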

Related

Find out number of words in a string with a lot of special character

I need to find out the number of words in a string. However, this is not a normal string: it contains a lot of special characters such as <, /em, /p and many more, so most of the methods suggested on StackOverflow do not work. As a result, I need to define a regular expression myself.
What I intend to do is define what a word is using a regular expression and count the number of times a word appears.
This is how I define a word:
It must start with a letter and end with one of these characters: : , ! ? ' - ) . "
This is how I define my regular expression:
pattern = Pattern.compile("^[a-zA-Z](:|,|!|?|'|-|)|.|")$");
matcher = pattern.matcher(line);
while (matcher.find())
wordCount++;
However, there is an error with the first line
pattern = Pattern.compile("^[a-zA-Z](:|,|!|?|'|-|)|.|")$");
How can I fix this problem?
In fact you also want to remove tags, like <em> (HTML emphasized), which otherwise would count as words. If you then consider full tags with attributes:
<span font="Consolas"> then it is easier to remove tags:
public static int wordCount(String s) {
    return s.replaceAll("<[A-Za-z/][^>]*>", " ") // Tags as space
            .replaceAll("[^\\p{L}\\p{M}\\d]+", " ") // Non-letters, -accents, -digits as blank
            .trim() // Not before or after (empty words)
            .split(" ").length;
}
This is quite inefficient because of the repeated replaceAll calls and the trim. Precompiling the expressions with Pattern would be nicer, but it is probably not worth it.
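For what it is worth, a precompiled variant might look like the sketch below. The class name WordCounter and the constants TAG and NON_WORD are hypothetical; the two expressions are the ones from the answer. The sketch also returns 0 for input without any words, since splitting an empty string would otherwise report a length of 1.

```java
import java.util.regex.Pattern;

class WordCounter {
    // Compiled once and reused across calls.
    private static final Pattern TAG = Pattern.compile("<[A-Za-z/][^>]*>");
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{M}\\d]+");

    static int wordCount(String s) {
        String noTags = TAG.matcher(s).replaceAll(" ");                  // tags become a space
        String words = NON_WORD.matcher(noTags).replaceAll(" ").trim();  // separators collapse to one space
        return words.isEmpty() ? 0 : words.split(" ").length;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("<p>Hello, <em>big</em> world!</p>")); // prints: 3
    }
}
```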
Does this help?
String line = "so.this:is,what)you!wanted?";
int wordCount = 0;
Pattern pattern = Pattern.compile("([a-zA-Z]++[:',\\.!\\?\")-])"); // '-' is placed last so it is a literal, not a range
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
wordCount++;
}
System.out.println(wordCount); // Prints 6

Extracting a string using Regex

I have the following code to extract the string within double quotes using Regex.
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
The output I get now is Java programming. But from the String str I want only the content of the second pair of double quotes, which is programming. Can anyone tell me how to do that using regex?
If you take your example, and change it slightly to:
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
int i = 0;
while (matcher.find()) {
    System.out.println("match " + ++i + ": " + matcher.group(1));
}
You should find that it prints:
match 1: Java
match 2: programming
This shows that you are able to loop over all of the matches. If you only want the last match, then you have a number of options:
Store the match in the loop, and when the loop is finished, you have the last match.
Change the regex to ignore everything until your pattern, with something like: Pattern.compile(".*\"([^\"]*)\"")
If you really want explicitly the second match, then the simplest solution is something like Pattern.compile("\"([^\"]*)\"[^\"]*\"([^\"]*)\""). This gives two matching groups.
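The first option (remember the match inside the loop) could be sketched as follows. LastQuoted and lastQuoted are hypothetical names; the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastQuoted {
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the content of the last double-quoted token, or null if there is none.
    static String lastQuoted(String s) {
        Matcher m = QUOTED.matcher(s);
        String last = null;
        while (m.find()) {
            last = m.group(1); // overwritten on every match, so the final value is the last match
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(lastQuoted("\"Java\",\"programming\"")); // prints: programming
    }
}
```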
If you want the last token inside double quotes, add an end-of-line anchor ($):
final Pattern pattern = Pattern.compile("\"([^\"]*)\"$");
In this case, you can replace while with if if your input is a single line.
Great answer from Paul. You can also try this pattern:
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
Java program
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Explanation
,\": matches a comma, followed by a quotation mark "
(\\w+): matches one or more word characters (letters, digits, or underscore)
\": matches the last quotation mark "
The group (\\w+) is then captured as group 1
Output
programming

match ;ABC12;10;250.3 using regex java

String regex = "^;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}.[\\d]{1,}";
String str = ";ABC12;10;250.3";
System.out.println(str.matches(regex));
The above regex works fine.
Consider the following strings
str1=";ABC12;10;250.3"
str2=;ABB62;5;2.3
str3=;ABF02;8;25120.3
str4=;AKC12;11;2504.303
Now I have the string String strToMatch = str1,str2,str3,str4 (the values above joined with commas).
How do I change my regex above in order to match this string?
Note: there can be any number of comma-separated values in the string, and I also need to make sure that strToMatch does not end with a comma.
You can capture the regex with round brackets and repeat one or more times:
String regex = "^(;[A-Z0-9]{5};\\d+;\\d+\\.\\d+){1,}";
Try this pattern instead: (;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,},?)+
This has two differences to your pattern: first I use \\. to denote that this has to be a . because a single dot means "any character" in regex.
Then I used the grouping brackets (...) and the + at the end to say: "Look for this once or more". As the , is optional at the end, I added a ?
If you want to get single matches to process using a Matcher later on, a simple modification should do the trick: (;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,}),?
The + is gone and the ,? is outside the grouping brackets, because those are now capturing brackets (as well).
Example:
final Pattern pattern = Pattern.compile("(;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,}),?");
final Matcher matcher = pattern.matcher(";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303");
while (matcher.find()) {
System.out.println("Whole match: " + matcher.group());
for (int i = 1; i <= matcher.groupCount(); ++i) {
System.out.println("Group #" + i + ": " + matcher.group(i));
}
}
I have found below way of solving the problem.
String strToMatch = ";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303";
String regex = ";[A-Z0-9]{5};\\d+;\\d+\\.\\d+";
if (strToMatch.endsWith(",") || strToMatch.startsWith(",")) {
    return false;
}
for (String s : strToMatch.split(",")) {
    if (!s.matches(regex)) { // reject as soon as one record is malformed
        return false;
    }
}
return true;
Any simpler way than this?
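One possibly simpler route is a single expression of the form "record, then zero or more (comma, record)", checked with matches(). Because matches() must consume the whole input, a leading or trailing comma is rejected without a separate check. This is only a sketch; RecordList, RECORD, LIST and isValid are names I made up.

```java
import java.util.regex.Pattern;

class RecordList {
    // One record, e.g. ;ABC12;10;250.3
    static final String RECORD = ";[A-Z0-9]{5};\\d+;\\d+\\.\\d+";
    // One record followed by any number of ",record" repetitions.
    static final Pattern LIST = Pattern.compile(RECORD + "(?:," + RECORD + ")*");

    static boolean isValid(String s) {
        return LIST.matcher(s).matches(); // matches() requires the entire string to match
    }

    public static void main(String[] args) {
        System.out.println(isValid(";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303")); // prints: true
        System.out.println(isValid(";ABC12;10;250.3,")); // prints: false
    }
}
```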

How do I find multiple substrings from one string using regex in Java?

I want to find every instance of a number, followed by a comma (no space), followed by any number of characters in a string. I was able to get a regex to find all the instances of what I was looking for, but I want to print them individually rather than all together. I'm new to regex in general, so maybe my pattern is wrong?
This is my code:
String test = "1 2,A 3,B 4,23";
Pattern p = Pattern.compile("\\d+,.+");
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("found: " + m.group());
}
This is what it prints:
found: 2,A 3,B 4,23
This is what I want it to print:
found: 2,A
found: 3,B
found: 4,23
Thanks in advance!
Try this regex:
Pattern p = Pattern.compile("\\d+,.+?(?= |$)");
You could take an easier route and split by space, then ignore anything without a comma:
String[] values = test.split(" ");
for (String value : values) {
    if (value.contains(",")) {
        System.out.println("found: " + value);
    }
}
What you apparently left out of your requirements statement is where "any number of characters" is supposed to end. As it stands, it ends at the end of the string; from your sample output, it seems you want it to end at the first space.
Try this pattern: "\\d+,[^\\s]*"
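A small runnable sketch of that last pattern, written with the equivalent \S shorthand in place of [^\s], collecting the matches into a list. The names NumberCommaTokens and findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberCommaTokens {
    // Digits, a comma, then everything up to the next whitespace.
    static final Pattern TOKEN = Pattern.compile("\\d+,\\S*");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : findAll("1 2,A 3,B 4,23")) {
            System.out.println("found: " + s); // found: 2,A / found: 3,B / found: 4,23
        }
    }
}
```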

ignore newline for finding a needle in haystack and preserve the text positions

I'm trying to 'wrap around' a search, basically ignoring \n when using either indexOf or a regex Pattern. I can't just remove all newline chars, as the indexes found would then be wrong.
For example:
Matcher matcher = Pattern.compile("dog").matcher("cat\n do\ng cow");
matcher.find();
int start = matcher.start();
int end = matcher.end();
System.out.println("Start: "+start+" End: "+end);
Should output:
Start: 5 End: 9
If I remove the newlines,
Matcher matcher = Pattern.compile("dog").matcher("cat\n do\ng cow".replaceAll("\n",""));
Then the indexes would be messed up:
Start: 4 End: 7
Note: I'm also going to be using more complex regex than I used in the example.
I'm implementing the find function in a text editor and am trying to create a 'wrap around' option.
Any ideas?
You need to take the search keyword and prepare it by inserting an optional line break after every character before you search the haystack. Consider this code:
String needle = "dog";
String regex = needle.replaceAll("(.(?!$))", "$1\n?"); // inserts line breaks
// regex now becomes "d\n?o\n?g"
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher("cat do\ng cow");
if (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("Start: "+start+" End: "+end);
}
else
System.err.println("No match available");
OUTPUT:
Start: 4 End: 8
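Note that the sample string here omits the \n after "cat". With the question's original input "cat\n do\ng cow", the same pattern reports Start: 5 End: 9, which is the output the question expects.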
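Since the question mentions more complex regexes, where inserting \n? after every character of the pattern would not be safe, another possible approach is to search a newline-free copy of the text while recording, for every kept character, its index in the original. The sketch below is my own assumption of how that could look (NewlineAgnosticSearch and find are hypothetical names). It only strips '\n', treats an empty match as "not found", and patterns that rely on line anchors would behave differently on the stripped copy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NewlineAgnosticSearch {
    // Returns {start, end} in the ORIGINAL text (end exclusive), or null when nothing matches.
    static int[] find(Pattern pattern, String text) {
        StringBuilder stripped = new StringBuilder(text.length());
        int[] originalIndex = new int[text.length()]; // stripped index -> original index
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\n') {
                originalIndex[stripped.length()] = i;
                stripped.append(c);
            }
        }
        Matcher m = pattern.matcher(stripped);
        if (!m.find() || m.start() == m.end()) {
            return null; // no match, or an empty match that has no character to map back
        }
        int start = originalIndex[m.start()];
        int end = originalIndex[m.end() - 1] + 1; // map the last matched character, then step past it
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] span = find(Pattern.compile("dog"), "cat\n do\ng cow");
        System.out.println("Start: " + span[0] + " End: " + span[1]); // prints: Start: 5 End: 9
    }
}
```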
Try this one:
myString.replaceAll("\n","");
