I am getting file names as string as follows:
file_g001
file_g222
g_file_z999
I would like to return the files that contain "g_x", where x is any number (as a string). Note that the last file should not appear, because its g_ is followed by a letter and not by a number as in the first two.
I tried: file.contains("_g[0-9]*$") but this didn't work.
Expected results:
file_g001
file_g222
Are you using the contains method of String?
If so, it does not work with regular expressions.
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#contains-java.lang.CharSequence-
public boolean contains(CharSequence s)
Returns true if and only if this string contains the specified sequence of char values.
Consider using the matches method instead. Note that matches succeeds only if the entire string matches the pattern.
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#matches(java.lang.String)
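The difference between the two methods can be seen in a small sketch. The class name ContainsVsMatches and the helper endsWithGNumber are made up for illustration. Because matches has to cover the whole string, the pattern here starts with .* and does not need a trailing $.

```java
class ContainsVsMatches {
    // Hypothetical helper: true if the name ends in "_g" followed by at least one digit.
    static boolean endsWithGNumber(String name) {
        // String.matches requires the entire string to match the pattern.
        return name.matches(".*_g[0-9]+");
    }

    public static void main(String[] args) {
        String file = "file_g001";
        // contains looks for the literal characters, not for a regex match.
        System.out.println(file.contains("_g[0-9]*$"));     // false
        System.out.println(file.contains("_g0"));           // true
        System.out.println(endsWithGNumber(file));          // true
        System.out.println(endsWithGNumber("g_file_z999")); // false
    }
}
```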
Your regular expression is almost fine; we'd just slightly improve it (using + instead of *, so that at least one digit is required) to:
^.*_g[0-9]+$
or
^.*_g\d+$
and it would likely work.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(String[] args) {
        final String regex = "^.*_g[0-9]+$";
        // One candidate file name per line
        final String string = "file_g001\n"
                + "file_g222\n"
                + "file_some_other_words_g222\n"
                + "file_g\n"
                + "g_file_z999";
        // MULTILINE makes ^ and $ match at the start and end of every line
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
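If the file names arrive one at a time or as a list rather than as one multi-line string, a filter along these lines might be more convenient. This is only a sketch: the class name GFileFilter and the helper filter are hypothetical, and it uses find() so that the pattern does not need a leading .*.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class GFileFilter {
    // Compiled once; find() searches anywhere in the name and "$" anchors the match at the end.
    private static final Pattern G_NUMBER = Pattern.compile("_g[0-9]+$");

    // Hypothetical helper: keeps only the names ending in "_g" plus at least one digit.
    static List<String> filter(List<String> names) {
        return names.stream()
                .filter(name -> G_NUMBER.matcher(name).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("file_g001", "file_g222", "g_file_z999");
        System.out.println(filter(names)); // [file_g001, file_g222]
    }
}
```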
I have the following string;
String s = "Hellow world,how are you?\"The other day, where where you?\"";
I want to replace the comma, but only the one inside the quotation marks: \"The other day, where where you?\".
Is it possible with regex?
String s = "Hellow world,how are you?\"The other day, where where you?\"";
Pattern pattern = Pattern.compile("\"(.*?)\"");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
s = s.substring(0, matcher.start()) + matcher.group().replace(',','X') +
s.substring(matcher.end(), s.length());
}
If there are more than two quotes, this splits the text into quoted and unquoted sections and only processes the quoted ones. However, if there is an odd number of quotes (an unmatched quote), the last quote is ignored.
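The same idea can be sketched with Matcher.appendReplacement and appendTail, which rebuild the string piece by piece, so the replacement does not have to be the same length as the comma. The class name QuotedCommaReplacer and the helper replaceInsideQuotes are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedCommaReplacer {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: replaces every comma inside a "..." section with the given text.
    static String replaceInsideQuotes(String input, String replacement) {
        Matcher matcher = QUOTED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String changed = matcher.group().replace(",", replacement);
            // quoteReplacement keeps "$" and "\" from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(changed));
        }
        matcher.appendTail(out); // copy whatever follows the last quoted section
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Hellow world,how are you?\"The other day, where where you?\"";
        System.out.println(replaceInsideQuotes(s, "X"));
        // prints: Hellow world,how are you?"The other dayX where where you?"
    }
}
```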
If you are sure the comma you want is always the last "," in the string, you can do this:
String s = "Hellow world,how are you?\"The other day, where where you?\"";
int index = s.lastIndexOf(",");
if( index >= 0 )
s = new StringBuilder(s).replace(index , index + 1,"X").toString();
System.out.println(s);
Hope it helps.
I'm having trouble accomplishing a few things with my program, I'm hoping someone is able to help out.
I have a String containing the source code of a HTML page.
What I would like to do is extract all instances of the following HTML and place it in an array:
<img src="http://*" alt="*" style="max-width:460px;">
So I would then have an array of X size containing values similar to the above, obviously with the src and alt attributes updated.
Is this possible? I know there are XML parsers, but the formatting is ALWAYS the same.
Any help would be greatly appreciated.
I suggest using an ArrayList instead of a fixed-size array, since it looks like you don't know how many matches you are going to have.
It is also not a good idea to use a regex on HTML, but if you are sure the tags always use the same format, then I recommend:
Pattern pattern = Pattern.compile(".*<img src=\"http://(.*)\" alt=\"(.*)\"\\s+sty.*>", Pattern.MULTILINE);
Here is an example:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) throws Exception {
        String web;
        String result = "";
        // Build sample input: ten img tags, one per line
        for (int i = 0; i < 10; i++) {
            web = "<img src=\"http://image" + i +".jpg\" alt=\"Title of Image " + i + "\" style=\"max-width:460px;\">";
            result += web + "\n";
        }
        System.out.println(result);
        Pattern pattern = Pattern.compile(".*<img src=\"http://(.*)\" alt=\"(.*)\"\\s+sty.*>", Pattern.MULTILINE);
        List<String> imageSources = new ArrayList<String>();
        List<String> imageTitles = new ArrayList<String>();
        Matcher matcher = pattern.matcher(result);
        while (matcher.find()) {
            // Group 1 is the src value after "http://", group 2 is the alt text
            String imageSource = matcher.group(1);
            String imageTitle = matcher.group(2);
            imageSources.add(imageSource);
            imageTitles.add(imageTitle);
        }
        for (int i = 0; i < imageSources.size(); i++) {
            System.out.println("url: " + imageSources.get(i));
            System.out.println("title: " + imageTitles.get(i));
        }
    }
}
As you're getting an ArrayIndexOutOfBoundsException, it is most likely that the String array imageTitles is not big enough to hold all the ALT values found by the regex search. In this case it is probably a zero-size array.
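A sketch that avoids the fixed-size array altogether by collecting the matches in a List, which grows as needed. It assumes the tags have exactly the format given in the question; the class name ImgTagExtractor and the helper extract are made up for illustration. It uses [^"]* instead of .* so that two tags on the same line are not swallowed by one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgTagExtractor {
    // [^\"]* stops at the next quote, so neighbouring tags are matched separately.
    private static final Pattern IMG = Pattern.compile(
            "<img src=\"(http://[^\"]*)\" alt=\"([^\"]*)\" style=\"max-width:460px;\">");

    // Hypothetical helper: returns {src, alt} pairs in document order.
    static List<String[]> extract(String html) {
        List<String[]> images = new ArrayList<>();
        Matcher matcher = IMG.matcher(html);
        while (matcher.find()) {
            images.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return images;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"http://a.jpg\" alt=\"A\" style=\"max-width:460px;\">"
                + "<img src=\"http://b.jpg\" alt=\"B\" style=\"max-width:460px;\"></p>";
        for (String[] image : extract(html)) {
            System.out.println("url: " + image[0] + ", title: " + image[1]);
        }
    }
}
```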
I want to parse a line from a CSV(comma separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file and replace the commas between the quotation marks with ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code but it doesn't parse it well:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
the output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
anyone have any idea how to fix this?
This is my solution for replacing , with ; inside quotes. It assumes that if a " appears inside a quoted string, it is escaped by another ". This property ensures that, counting from the start of the line to the current character, if the number of quotes " is odd, then that character is inside a quoted string.
// Test string, with the tricky case """", which resolves to
// a length-1 string consisting of a single " character
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
// System.out.println(m.group() + "\n " + m.start() + " " + m.end());
output
.append(line.substring(start, matcher.start())) // Append unrelated contents
.append(matcher.group().replaceAll(",", ";")); // Append replaced string
start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find a case where replacing the matched group, as you did in line = line.replace(matcher.group(), replacedMatch);, fails, I feel safer rebuilding the string from scratch.
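The odd-quote-count property described above can also be applied directly, without a regex, by walking through the line and toggling a flag at every quote. This is only a sketch under the same assumption (a " inside a quoted field is escaped as ""); the class name QuoteParityReplacer and the helper commasToSemicolons are made up for illustration.

```java
class QuoteParityReplacer {
    // Hypothetical helper: replaces commas that sit inside a quoted CSV field with semicolons.
    static String commasToSemicolons(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false; // true while an odd number of quotes has been seen
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;
            }
            out.append(c == ',' && insideQuotes ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(commasToSemicolons(line));
        // prints: Bosh,"""",mark#gmail.com,"3; Institute","83; 1; 2",1,21
    }
}
```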
Here's a way:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
Matcher matcher = Pattern.compile(regex).matcher(in);
StringBuilder out = new StringBuilder();
while(matcher.find()) {
out.append(matcher.group().replace(',', ';')).append(',');
}
out.deleteCharAt(out.length() - 1);
System.out.println(in + "\n" + out);
}
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is what you need:
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while (matcher.find()) {
    // Replace the commas inside the matched quoted section only
    String replacedMatch = matcher.group().replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}
Afterwards, line will have the value you need.
Have you tried to make the RegExp lazy?
Another idea: inside the [] you should use a " too. If you do that, you should have the expected output with global flag set.
Your regex is faulty. Why would you want to make sure there are no ] within the "..." expression? You'd rather make the regex reluctant (the default is greedy, which means it catches as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right: you should use a proper CSV library to parse it and replace , with ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.
I am trying to find a certain tag in an HTML page with Java. All I know is what kind of tag it is (div, span ...) and its id. I don't know how it looks, how much whitespace there is and where, or what else is in the tag. So I thought about using pattern matching, and I have the following code:
// <tag[any character may be there or not]id="myid"[any character may be there or not]>
String str1 = "<" + Tag + "[.*]" + "id=\"" + search + "\"[.*]>";
// <tag[any character may be there or not]id="myid"[any character may be there or not]/>
String str2 = "<" + Tag + "[.*]" + "id=\"" + search + "\"[.*]/>";
Pattern p1 = Pattern.compile( str1 );
Pattern p2 = Pattern.compile( str2 );
Matcher m1 = p1.matcher( content );
Matcher m2 = p2.matcher( content );
int start = -1;
int stop = -1;
String Anfangsmarkierung = null;
int whichMatch = -1;
while( m1.find() == true || m2.find() == true ){
if( m1.find() ){
//System.out.println( " ... " + m1.group() );
start = m1.start();
//ende = m1.end();
stop = content.indexOf( "<", start );
whichMatch = 1;
}
else{
//System.out.println( " ... " + m2.group() );
start = m2.start();
stop = m2.end();
whichMatch = 2;
}
}
But I get an exception from m1.start() (or m2.start()) when I enter the actual tag without the [.*], and I don't get anything when I enter the regular expression :( ... I really haven't found an explanation for this. I haven't worked with Pattern or Matcher at all yet, so I am a little lost. It would be awesome if anyone could explain what I am doing wrong or how I can do it better ...
thnx in advance :)
... dg
I know that I am broadening your question, but I think that using a dedicated library for parsing HTML documents (such as: http://htmlparser.sourceforge.net/) will be much easier and more accurate than regexps.
Here is an example for what you're trying to do adapted from one of my notes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String tag = "thetag";
String id = "foo";
String content = "<tag1>\n"+
"<thetag name=\"Tag Name\" id=\"foo\">Some text</thetag>\n" +
"<thetag name=\"AnotherTag\" id=\"foo\">Some more text</thetag>\n" +
"</tag1>";
String patternString = "<" + tag + ".*?name=\"(.*?)\".*?id=\"" + id + "\".*?>";
System.out.println("Content:\n" + content);
System.out.println("Pattern: " + patternString);
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(content);
boolean found = false;
while (matcher.find()) {
System.out.format("I found the text \"%s\" starting at " +
"index %d and ending at index %d.%n",
matcher.group(), matcher.start(), matcher.end());
System.out.println("Name: " + matcher.group(1));
found = true;
}
if (!found) {
System.out.println("No match found.");
}
}
}
You'll notice that the pattern string becomes something like <thetag.*?name="(.*?)".*?id="foo".*?> which will search for tags named thetag where the id attribute is set to "foo".
Note the following:
It uses .*? to reluctantly (lazily) match zero or more of anything (if you don't understand, try removing the ? to see what I mean).
It uses a capturing group in parentheses (the name="(.*?)" part) to extract the contents of the name attribute (as an example).
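To see the effect of removing the ?, here is a small sketch comparing the greedy and the reluctant quantifier on a line containing two tags. The class name GreedyVsReluctant and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns the first match of the regex in the input, or null.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "<thetag id=\"foo\">one</thetag><thetag id=\"foo\">two</thetag>";
        // Greedy: .* runs to the last '>' on the line, so the whole input is matched.
        System.out.println(firstMatch("<thetag.*>", input));
        // Reluctant: .*? stops at the first '>' that lets the match succeed.
        System.out.println(firstMatch("<thetag.*?>", input)); // <thetag id="foo">
    }
}
```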
I think each call to find advances through your matches. Calling m1.find() again inside the loop body moves the matcher on to the next match, and if there is none, m1.start() throws (I'm guessing) an IllegalStateException. Calling find once per iteration and storing the result in a flag avoids this problem.
boolean m1Matched = m1.find();
boolean m2Matched = m2.find();
while( m1Matched || m2Matched ) {
    if( m1Matched ){
        ...
    }
    // Advance each matcher exactly once per iteration
    m1Matched = m1.find();
    m2Matched = m2.find();
}
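Separately from the find() issue, note that "[.*]" in the question's pattern is a character class that matches a single literal '.' or '*', not "any characters". A minimal sketch that combines both fixes, assuming the id value is written in double quotes; the class name TagByIdFinder and the helper findTagStart are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagByIdFinder {
    // Hypothetical helper: returns the start index of the first <tag ... id="id" ...>, or -1.
    static int findTagStart(String content, String tag, String id) {
        // [^>]* means "any characters except '>'", so the match stays inside one tag.
        // Pattern.quote keeps special characters in tag or id from being read as regex syntax.
        String regex = "<" + Pattern.quote(tag) + "\\b[^>]*\\bid=\""
                + Pattern.quote(id) + "\"[^>]*>";
        Matcher matcher = Pattern.compile(regex).matcher(content);
        // find() is called once, and start() is used only if it returned true.
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        String content = "<span id=\"a\">x</span><div class=\"c\" id=\"myid\">text</div>";
        System.out.println(findTagStart(content, "div", "myid"));  // index of "<div"
        System.out.println(findTagStart(content, "div", "other")); // -1
    }
}
```

Because [^>]* also accepts a '/', the same pattern covers the self-closing form (<tag ... />), so a second pattern is not needed in this sketch.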