Search on a particular line using Regular Expression in Java

I am new to regular expressions, so my question may be a very basic one.
I want to create a regular expression that searches for an expression on a particular line number.
E.g.
I have data
"\nerferf erferfre erferf 12545" +
"\ndsf erf" +
"\nsdsfd refrf refref" +
"\nerferf erferfre erferf 12545" +
"\ndsf erf" +
"\nsdsfd refrf refref" +
"\nerferf erferfre erferf 12545" +
"\ndsf erf" +
"\nsdsfd refrf refref" +
"\nerferf erferfre erferf 12545" +
And I want to search for the number 1234 on the 7th line. It may or may not also be present on other lines.
I have tried with
"\\n.*\\n.*\\n.*\\n.*\\n.*\\n.*\\d{4}"
but I am not getting the result.
Please help me out with the regular expression.

Firstly, your newline characters should be placed at the end of the lines. That way, picturing a particular line is easier. The explanation below is based on this modification.
Now, to get to the 7th line, you first need to skip the first 6 lines, which you can do with the {n} quantifier. You don't need to write .*\n 6 times. So, that would look like this:
(.*\n){6}
And then you are at 7th line, where you can match your required digit. That part would be something like this:
.*?1234
And then match rest of the text, using .*
So, your final regex would look like:
(?s)(.*\n){6}.*?1234.*
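So, just use the String#matches(regex) method with this regex.
P.S. (?s) enables single-line (DOTALL) mode, because by default the dot (.) does not match the newline character. It is needed here so that the trailing .* can consume the remaining lines, which matches() requires. Be aware that in this mode every dot can also cross line breaks, so the pattern as written accepts 1234 on the 7th line or on any later line.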
To print something you matched, you can use capture groups:
(?s)(?:.*\n){6}.*?(1234).*
This will capture 1234 in group 1 if it matches. Capturing an exact string that you are already matching is unusual, though: you already know the match is 1234. A capture group is more useful when you match against \\d, where you may want to know exactly which digits were found.
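If the match has to stay strictly on the 7th line, one possible sketch is to drop (?s) and use find() with a pattern anchored by \A, so that each .*\n consumes exactly one line. The class and method names here are made up for illustration, and the sample data assumes newlines at the end of each line, as suggested above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SeventhLineSearch {
    // \A anchors at the start of the input. Without DOTALL, '.' stops at '\n',
    // so each (?:.*\n) consumes exactly one line and the search cannot drift.
    private static final Pattern LINE7_DIGITS =
            Pattern.compile("\\A(?:.*\\n){6}.*?(\\d{4})");

    /** Returns the first 4-digit run on line 7, or null if line 7 has none. */
    static String findOnLine7(String text) {
        Matcher m = LINE7_DIGITS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String data = "a 1\nb\nc\nd\ne\nf\nerferf 1234 x\nh 9999\n";
        System.out.println(findOnLine7(data));
    }
}
```

Because find() does not need to match the whole input, no trailing .* (and therefore no DOTALL) is required.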

Try
Pattern p = Pattern.compile("^(\\n.*){6}\\n.*\\d{4}" );
System.out.println(p.matcher(s).find());

This problem is better not solved with regex alone. Start by splitting the string on a newline character, to get an array of lines:
String[] lines = data.split("\\n");
Then, to execute the regex on line 7:
try {
String line7 = lines[6];
// do something with it
} catch (IndexOutOfBoundsException ex) {
System.err.println("Line not found");
}
Hope this is a start for you.
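Put together, the split-then-match idea could look roughly like the sketch below. The helper name findOnLine and the \d{4} pattern are assumptions for illustration. Note that if the data starts with "\n", as in the question, the first array element is an empty string, so the line numbering shifts by one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitThenMatch {
    private static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");

    /** lineNumber is 1-based; returns the first 4-digit run on that line, or null. */
    static String findOnLine(String data, int lineNumber) {
        // The -1 limit keeps trailing empty strings, so line counts stay predictable.
        String[] lines = data.split("\\r?\\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return null; // line does not exist; no exception handling needed
        }
        Matcher m = FOUR_DIGITS.matcher(lines[lineNumber - 1]);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String data = "a\nb\nc\nd\ne\nf\ng 1234\nh";
        System.out.println(findOnLine(data, 7));
    }
}
```

Checking the index up front avoids relying on IndexOutOfBoundsException for control flow.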
Edit: I'm not a pro in Regex but I would try with this one:
"(\\n.*){5}(.*)"
Sorry if this isn't the correct Java syntax but this should capture 5 new lines + data first, so that's six lines gone, and the data itself should be available in the second capture group (including newline). If you want to exclude the newline in front:
"(\\n.*){5}\\n(.*)"

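You can use the following, which assumes Windows (\r\n) line endings and needs MULTILINE mode (e.g. a leading (?m)) so that each ^ matches at the start of a line: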
(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*)(1234)

Related

How to spot * in regular expressions?

I want to spot and delete all lines that have *** in them. How can I do this?
I tried to use regex but got
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 6
Here is my regular expression: (?m)^**.*.
.........text...........
***..........text....... //want to delete this line
........................
The * character has a special meaning in a regular expression. To tell Pattern that you don't intend this special meaning, you have to "escape" it. The easiest way to do that is to put your expression through Pattern.quote().
For example:
String searchFor = Pattern.quote("***");
Then use that string in your search.
Note that * is a special character in regex, so you have to write it as \\* in a Java string literal.
Your expression will be: (?m)^\\*\\*\\*.*
This is not perfect, but it'll get you started:
// 4 lines, 2 of them containing "***" at random locations
String input = "abc***def\nghijkl\n***mnop\n**blah";
// replacing multiline pattern starting with any character 0 or more times,
// followed by 3 escaped "*"s,
// followed by any character 0 or more times
System.out.println(input.replaceAll("(?m).*\\*{3}.*", ""));
Output:
ghijkl
**blah
If the three asterisks are not always at the beginning of the line, you can use this pattern that removes newlines too:
(\r?\n)?[^\r\n*]*\Q***\E.*((1)?|\r?\n?)
If all you're doing is looking for three specific characters together in a string, you don't need a regex at all:
if (line.contains("***")) {
...
}
(But if things get more complicated and you do need a regex, then use a backslash or Pattern.quote as the other answers say.)
(This is assuming you're reading lines one at a time, instead of having one big long buffer containing all the lines with newline characters. Some of the other answers handle the latter case.)
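For the one-big-buffer case, a minimal sketch that deletes every line containing *** together with its line terminator might look like this. The class and method names are hypothetical, and the pattern is just one way to combine the escaping advice above with MULTILINE mode.

```java
import java.util.regex.Pattern;

final class StarLineRemover {
    // (?m) lets ^ match at each line start; \\*{3} is three literal asterisks;
    // the optional (?:\\r?\\n)? also consumes the removed line's terminator.
    private static final Pattern STAR_LINE =
            Pattern.compile("(?m)^.*\\*{3}.*(?:\\r?\\n)?");

    static String removeStarLines(String input) {
        return STAR_LINE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "abc***def\nghijkl\n***mnop\n**blah";
        System.out.println(removeStarLines(input));
    }
}
```

If the last line of the input is one of the deleted lines, the newline before it is left in place, since only the terminator after a removed line is consumed.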

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when the character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and the regex only starts matching after finding an underscore.
Is there an easier way not involving regex?
I don't really care about the leading characters (although handling them would be nice); I only need to ignore the repeated character at the end. According to this regex tester, just using /()_+$/ would seem to work: the empty parentheses match anything before a single or repeated match at the end of the line. Would that be correct?
Thank you!
There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
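Both options can be sketched in Java as follows for single-line strings. The class and method names are invented for this example, and no flags are set, so ^ and $ refer to the ends of the whole string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnderscoreTrim {
    // Option 1: delete runs of underscores at the very start or very end.
    private static final Pattern EDGE_UNDERSCORES = Pattern.compile("^_+|_+$");
    // Option 2: capture everything between optional leading/trailing underscores.
    private static final Pattern CORE = Pattern.compile("^_*(.*?)_*$");

    static String trimByReplace(String s) {
        return EDGE_UNDERSCORES.matcher(s).replaceAll("");
    }

    static String trimByCapture(String s) {
        Matcher m = CORE.matcher(s);
        // Fall back to the input unchanged if it contains a line break.
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(trimByReplace("__this_is_my___example_line_4__"));
        System.out.println(trimByCapture("_this_is_my_ _example_line_3_"));
    }
}
```

Underscores in the middle of the string are left alone in both variants, because only the anchored runs are removed or skipped.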
How about [^_\n\r](.*[^_\n\r])??
Demo
String data=
"this_is_my_example_line_0\n" +
"this_is_my_example_line_1_\n" +
"this_is_my_example_line_2___\n" +
"_this_is_my_ _example_line_3_\n" +
"__this_is_my___example_line_4__";
Pattern p=Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m=p.matcher(data);
while(m.find()){
System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

Using java.util.regex.Pattern

I'm not a programmer, so I'm a newbie in this field. I must create a regular expression to check two lines. Between these two lines, A and B, there could be one, two or more different lines.
I've been reviewing http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html but I haven't reached a solution, although I think I'm very close.
I am testing the expression
^(.*$)
and this gets an entire line. If I write this expression twice it gets two lines. So it seems that this expression matches as many entire lines as there are occurrences of the expression.
But I would like to allow an undetermined number of lines between A and B. I know that there will be at least one line.
If I write ^(.*$){1,} it doesn't work.
Does anyone know what the mistake could be?
Thank you for your time
Andres
DOT . in regex matches any character except newline character.
You're looking for DOTALL or s flag here that makes dot match any character including newline character as well. So if you want to match all the lines between literals A and B then use this regex:
(?s)A.*?B
(?s) is for DOTALL that will make .*? match all the characters including newline characters between A and B.
? is to make above regex non-greedy.
Read More: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
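As a small sketch of the DOTALL approach, the example below uses the Pattern.DOTALL flag, which is the flag form of the inline (?s). The literals A and B stand in for whatever your real marker lines are, and the class name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BetweenMarkers {
    // DOTALL makes '.' match line terminators too; .*? is reluctant,
    // so the match ends at the first B after A rather than the last one.
    private static final Pattern A_TO_B = Pattern.compile("A.*?B", Pattern.DOTALL);

    /** Returns the first span from A through B, or null if there is none. */
    static String firstBlock(String text) {
        Matcher m = A_TO_B.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstBlock("x\nA\none\ntwo\nB\ny\nB"));
    }
}
```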
Why don't you use Scanner ? It might be more related to what you want:
Scanner sc = new ...
while (sc.nextLine().compareTo(strB)!=0) {
whatYouWantToDo
}
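A runnable version of the Scanner idea might look like the sketch below. It assumes the text is already in a String and that A and B each occupy a whole line; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class LinesBetween {
    /** Collects the lines strictly between startLine and endLine; empty if either is missing. */
    static List<String> linesBetween(String text, String startLine, String endLine) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                if (!inside) {
                    inside = line.equals(startLine); // wait for the A line
                } else if (line.equals(endLine)) {
                    return result;                   // reached the B line
                } else {
                    result.add(line);
                }
            }
        }
        return new ArrayList<>(); // B never appeared after A
    }

    public static void main(String[] args) {
        System.out.println(linesBetween("x\nA\none\ntwo\nB\ny", "A", "B"));
    }
}
```

Checking hasNextLine() before each read avoids the NoSuchElementException that nextLine() throws when the input runs out.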
You could try to search for line terminators \r and \n. Depending on the source of the file you maybe have to experiment a bit.
As far as I understood it, you want to match the lines, with at least one empty line in between? Try ^(.*)$\n{2,}^(.*)$
If you want to find two equal lines, using regex:
Pattern pattern = Pattern.compile("^(?:.*\n)*(.*\n)(?:.*\n)*\\1");
// Skip some lines, capture a line as group 1, skip some lines,
// then match the same line again via the backreference \\1
Matcher m = pattern.matcher(text);
while (m.find()) {
System.out.println("Double: " + m.group(1));
}
The (?: ...) is a non-capturing group; that is, not available through m.group(#).
However this won't find line B in: "A\nB\nA\nB\n".

How can I write a regex in Java that will perform a .replaceFirst on a group that is not in a comment?

So I need to return a modified String in which the first instance of a token is replaced with another token, while skipping comments. Here's an example of what I'm talking about:
This whole quote is one big String
-- I don't want to replace this ##
But I want to replace this ##!
Being a former .NET developer, I thought this was easy. I'd just do a negative lookbehind like this:
(?<!--.*)##
But then I learned Java can't do this. So upon learning that the curly braces are okay, I tried this:
(?<!--.{0,9001})##
That didn't throw an exception, but it did match the ## in the comment.
When I test this regex with a Java regex tester, it works as expected. About the only thing I can think of is that I'm using Java 1.5. Is it possible that Java 1.5 has a bug in its regex engine? Assuming it does, how do I get Java 1.5 to do what I want it to do without breaking up my string and reassembling it?
EDIT I changed the # to the -- operator since it looks like the regex will be more complex with two chars instead of one. I originally did not reveal that I was modifying a query in order to avoid off topic discussion on "Well you shouldn't modify queries that way!" I have a very good reason for doing this. Please don't discuss query modification good practices. Thanks
You really don't need a negative look-behind here. You can do it without that too.
It would be like this:
String str = "I don't want to replace this ##";
str = str.replaceAll("^([^#].*?)##", "$1");
So, it replaces the first occurrence of ## in a string that does not start with # with the part of the string before the ##. In effect, the ## is removed. replaceAll works here because the pattern is anchored with ^ and uses a reluctant quantifier, .*?, so it stops at the first ##.
As @nhahtdh correctly pointed out in the comments, this might fail if your comment is at the end of the line. So, you can use this one instead:
String str = "I don't want to # replace this ##";
str = str.replaceAll("^([^#]*?)##", "$1");
This one will work for any case. And in the given example case, it won't replace the ##, as it is a part of the comment.
If your comment start is denoted by two characters, then negated character class won't work. You would need to use negative look-ahead like this:
String str = "This whole quote ## is one big String -- asdf ##\n" +
"-- I don't want to replace this ##\n" +
"But I want to replace this ##!";
str = str.replaceAll("(?m)^(((?!--).)*?)##", "$1");
System.out.println(str);
Output:
This whole quote is one big String -- asdf ##
-- I don't want to replace this ##
But I want to replace this !
(?m) at the beginning of the pattern enables MULTILINE mode, so that ^ matches at the start of each line rather than only at the start of the entire input.
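If only the first ## outside a comment should change, as the question title asks, the same pattern can be used with replaceFirst. This is a sketch under the assumption that a comment runs from -- to the end of its line; the class name, method name and the TOKEN replacement are invented for the example.

```java
import java.util.regex.Matcher;

final class ReplaceOutsideComment {
    /** Replaces the first ## that is not preceded by -- on the same line. */
    static String replaceFirstToken(String text, String replacement) {
        // ((?:(?!--).)*?) walks forward one character at a time, refusing to
        // step onto a "--", so a ## after a comment start can never be reached.
        return text.replaceFirst("(?m)^((?:(?!--).)*?)##",
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "This whole quote is one big String\n"
                + "-- I don't want to replace this ##\n"
                + "But I want to replace this ##!";
        System.out.println(replaceFirstToken(text, "TOKEN"));
    }
}
```

Matcher.quoteReplacement keeps any $ or backslash in the replacement text from being read as a group reference.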
You can use something like this:
String string = "This whole quote is one big String\n" +
"# I don't want to replace this ##\n" +
"And I also # don't want to replace this ##\n" +
"But I want to replace this ##!\n" +
"But not this ##!";
Matcher m =
Pattern.compile (
"^((?:[^##]|#[^#]|#[^\n]*)*)##", Pattern.MULTILINE).
matcher (string);
StringBuffer result = new StringBuffer ();
if (m.find ())
m.appendReplacement (result, "$1FOO");
m.appendTail (result);
System.out.println (result.toString ());

Regular expressions: all words after my current one are gone

I need to remove all strings such as the following from my text file:
flickr:user=32jdisffs
flickr:user=acssd
flickr:user=asddsa89
I'm currently using fields[i] = fields[i].replaceAll(" , flickr:user=.*", "");
However, the issue with this approach is that every word after flickr:user= is removed from the content, even the words after the next space.
Thanks
You probably need
replaceAll("flickr:user=[0-9A-Za-z]+", "");
flickr:user=\w+ should do it:
String noFlickerIdsHere = stringWithIds.replaceAll("flickr:user=\\w+", "");
Reference:
\w = A word character: [a-zA-Z_0-9]
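To see the difference between the unbounded .* and the bounded \w+, here is a small sketch; the sample string and the class and method names are made up for illustration.

```java
final class FlickrIdStripper {
    // The original pattern: .* runs to the end of the line, taking every later word with it.
    static String stripGreedy(String s) {
        return s.replaceAll(" , flickr:user=.*", "");
    }

    // \w+ stops at the first non-word character, e.g. the space after the id.
    static String stripBounded(String s) {
        return s.replaceAll("flickr:user=\\w+", "");
    }

    public static void main(String[] args) {
        String field = "photo , flickr:user=32jdisffs sunset beach";
        System.out.println("[" + stripGreedy(field) + "]");
        System.out.println("[" + stripBounded(field) + "]");
    }
}
```

The bounded version leaves the surrounding " , " and spaces in place, so you may want to include them in the pattern if they should go as well.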
Going by the question as stated, chances are that you want:
fields[i] = fields[i].replaceAll(" , flickr:user=[^ ]* ", ""); // or " "
This will match the string, including the value of user up to but not including the first space, followed by a space, and replace it either by a blank string, or a single space. However this will (barring the comma) net you an empty result with the input you showed. Is that really what you want?
I'm also not sure where the " , " at the beginning fits into the example you showed.
The reason for your difficulties is that an unbounded .* will match everything from that point up until the end of the input (even if that amounts to nothing; that's what the * is for). For a line-based regular expression parser, that's to the end of the line.
