I'd like to extract two arguments from a given string using a regex. For example:
C:\Users "C:\Program files"
C:\mytext.txt mytext2.txt
The output would be C:\Users and C:\Program files for the first line,
and C:\mytext.txt and mytext2.txt for the second.
If an argument is enclosed in double quotes, it can contain whitespace; otherwise it cannot. So far I've managed to extract the arguments between quotes, but I can't figure out how to extract them when one argument is quoted and the other isn't (as in the example above).
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(string);
while (m.find()) {
    System.out.println(m.group(1));
}
You can use this regex for matching:
Pattern p = Pattern.compile("\"[^\"]*\"|\\S+");
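That pattern keeps the surrounding quotes in the quoted matches. If you want C:\Program files without them, one option (a sketch, not part of the original answer; the class and method names are illustrative) is to put each alternative in its own capturing group and take whichever one participated in the match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedArgs {
    // Group 1: contents of a "..." argument (may contain spaces)
    // Group 2: an unquoted argument (no whitespace)
    private static final Pattern ARG = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ARG.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups is non-null for each match
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("C:\\Users \"C:\\Program files\""));
        System.out.println(parse("C:\\mytext.txt mytext2.txt"));
    }
}
```

This prints [C:\Users, C:\Program files] and [C:\mytext.txt, mytext2.txt].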
Can you suggest an approach for splitting a String like this:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
I tried to parse that string with a regular expression like
:[A-za-z]|\\d:
but it doesn't work. Please suggest a regular expression that splits the string so that 20, 31C, 31D, etc. are the keys and 150318, 150425 IN BANGLADESH, etc. are the values.
Using string.split(":") doesn't serve my purpose. If a string looks like:
:20: MY VALUES : ARE HERE
then it is split at every colon, so key 20 gets associated with "MY VALUES", and "ARE HERE" is not associated with key 20.
You can use matching instead of splitting, since you need to match a specific colon in the string.
A regex that captures the text between the first and second colon into Group 1, and everything after the second colon into Group 2, looks like
^:([^:]*):(.*)$
The ^ asserts the beginning of the string, ([^:]*) matches and captures into Group 1 zero or more characters other than :, and (.*) matches and captures into Group 2 the rest of the string. $ asserts the position at the end of a single-line string (. matches any character except a newline unless the Pattern.DOTALL flag is set).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE
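Your input contains several key/value lines. One way to handle all of them in a single pass (a sketch, assuming each entry sits on its own line and the whole text is in one string; the class name is illustrative) is to compile the same pattern with Pattern.MULTILINE, so that ^ and $ match at every line break. A LinkedHashMap keeps the keys in input order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagParser {
    // MULTILINE: ^ and $ match at the start/end of each line, not only of the whole input
    private static final Pattern TAG = Pattern.compile("^:([^:]*):(.*)$", Pattern.MULTILINE);

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            // Group 1 is the key, group 2 is everything after the second colon on that line
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = ":31C:150318\n:31D:150425 IN BANGLADESH\n:20: MY VALUES : ARE HERE";
        parse(text).forEach((k, v) -> System.out.println("Key: " + k + ", Value: " + v));
    }
}
```

Because . does not match a newline without DOTALL, each value stops at the end of its own line, while later colons on that line stay in the value.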
You can use the following to split:
^[:]+([^:]+):
Try the split method of the String class:
String[] parts = string.split(":");
For your requirements:
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa
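The split approach can also keep later colons in the value if you pass a limit: String.split(regex, limit) with a limit of 2 produces at most two pieces, so everything after the first separator stays together. A minimal sketch, assuming every entry starts with a colon; the helper name is hypothetical.

```java
public class SplitLimit {
    // Returns {key, value}; assumes the input looks like ":KEY:VALUE"
    static String[] keyValue(String entry) {
        // Drop the leading ':' and split only at the first remaining ':'
        return entry.substring(1).split(":", 2);
    }

    public static void main(String[] args) {
        String[] kv = keyValue(":31D:150425 IN BANGLADESH:todasdsa");
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

This prints key=31D value=150425 IN BANGLADESH:todasdsa, the same result as the substring version.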
I want to extract only the file name from the complete file name + timestamp. Below is the input:
String filePath = "fileName1_20150108.csv";
The expected output is "fileName1".
String filePath2 = "fileName1_filedesc1_20150108_002_20150109013841.csv";
And the expected output is "fileName1_filedesc1".
I wrote the code below in Java to get the file name, but it works for the first input (filePath) and not for filePath2.
Pattern pattern = Pattern.compile(".*.(?=_)");
String filePath = "fileName1_20150108.csv";
String filePath2 = "fileName1_filedesc1_20150108_002_20150109013841.csv";
Matcher matcher = pattern.matcher(filePath);
while (matcher.find()) {
    System.out.print("Start index: " + matcher.start());
    System.out.print(" End index: " + matcher.end() + " ");
    System.out.println(matcher.group());
}
Can somebody please help me correct the regex so I can parse both file paths with the same regex?
Thanks
Anchor the start, and make the .* non-greedy:
^.*?(_\D.*?)?(?=[_.])
Update: I made the group for the file description optional and required it to start with a non-digit character. This works as long as your file description strings never start with a digit.
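Applied to both inputs from the question, the updated pattern could be used like this (a sketch; the class and method names are hypothetical, and the method returns null when nothing matches):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BaseName {
    // Lazy prefix, optional "_<non-digit>..." description part, stop before the next '_' or '.'
    private static final Pattern BASE = Pattern.compile("^.*?(_\\D.*?)?(?=[_.])");

    static String baseName(String fileName) {
        Matcher m = BASE.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(baseName("fileName1_20150108.csv"));
        System.out.println(baseName("fileName1_filedesc1_20150108_002_20150109013841.csv"));
    }
}
```

This prints fileName1 and fileName1_filedesc1.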
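You can get the characters before the first underscore, the first underscore itself, and then the characters up to the next underscore:
^[^_]*_[^_]*
Note that this assumes a description part is present: for fileName1_20150108.csv, which has only one underscore, it matches the whole string.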
This should work: "^(.*?)_([0-9_]*)\\.([^.]*)$"
It returns three groups:
the base name (assuming no single part of it is all digits)
the timestamp information
the extension.
You can test here: http://fiddle.re/v0hne6 (RegexPlanet)
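In Java, the three groups from that pattern can be read like this (a minimal sketch; the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileParts {
    // Group 1: base name, group 2: timestamp digits/underscores, group 3: extension
    private static final Pattern PARTS = Pattern.compile("^(.*?)_([0-9_]*)\\.([^.]*)$");

    static String[] parts(String fileName) {
        Matcher m = PARTS.matcher(fileName);
        if (!m.matches()) {
            return null; // the name does not follow the base_timestamp.ext layout
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] p = parts("fileName1_filedesc1_20150108_002_20150109013841.csv");
        System.out.println(p[0] + " | " + p[1] + " | " + p[2]);
    }
}
```

This prints fileName1_filedesc1 | 20150108_002_20150109013841 | csv. The lazy (.*?) keeps growing until the remainder is only digits and underscores followed by the extension.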
This is related to: RegEx: Grabbing values between quotation marks.
If there is a String like this:
HYPERLINK "hyperlink_funda.docx" \l "Sales"
The regex given in that question
(["'])(?:(?=(\\?))\2.)*?\1
gives me
[" HYPERLINK ", " \l ", " "]
What regex will return the values enclosed in quotation marks (specifically between the \" marks)?
["hyperlink_funda.docx", "Sales"]
I'm using Java's String.split(String regex).
You're not supposed to use that regex with the .split() method. Instead, use a Pattern with capturing groups:
Pattern pattern = Pattern.compile("([\"'])((?:(?=(\\\\?))\\3.)*?)\\1");
Matcher matcher = pattern.matcher(" HYPERLINK \"hyperlink_funda.docx\" \\l \"Sales\" ");
while (matcher.find()) {
    System.out.println(matcher.group(2));
}
Output:
hyperlink_funda.docx
Sales
I think you are misunderstanding the nature of the String.split method. Its job is to find a way of splitting a string by matching the features of the separator, not by matching features of the strings you want returned.
Instead you should use a Pattern and a Matcher:
String txt = " HYPERLINK \"hyperlink_funda.docx\" \\l \"Sales\" ";
String re = "\"([^\"]*)\"";
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(txt);
List<String> matches = new ArrayList<>();
while (m.find()) {
    String match = m.group(1);
    matches.add(match);
}
System.out.println(matches);
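On Java 9 or later, the same loop can be written with Matcher.results(), which returns a stream of MatchResult objects. This is a sketch of an alternative to the loop above, not part of the original answer; the class and method names are illustrative.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuotedValues {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collects the text between each pair of double quotes (requires Java 9+)
    static List<String> quoted(String text) {
        return QUOTED.matcher(text)
                .results()
                .map(r -> r.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String txt = " HYPERLINK \"hyperlink_funda.docx\" \\l \"Sales\" ";
        System.out.println(quoted(txt)); // [hyperlink_funda.docx, Sales]
    }
}
```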
I'm trying to extract part of a URL from text files.
For example:
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed" class="search_bin"><span>Closed Tickets</span></a>
I would like to extract only
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed
How can I do that using a regular expression? I tried the regex
"/p/*./bugs/*."
but it didn't work.
Try this:
"\/p.*\/bugs[^"]*"
It means: "/p",
then any characters,
then "/bugs",
then any characters except ".
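In a Java string literal the forward slashes need no escaping, while the double quote inside the character class does. A sketch using the line from the question (the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugsUrl {
    // "/p", any characters (greedy), "/bugs", then everything up to the next double quote
    private static final Pattern URL = Pattern.compile("/p.*/bugs[^\"]*");

    static String extract(String line) {
        Matcher m = URL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix"
                + "+or+status%3Aclosed\" class=\"search_bin\"><span>Closed Tickets</span></a>";
        System.out.println(extract(line));
    }
}
```

Because .* is greedy, a line containing several /bugs links would be matched from the first /p up to the last /bugs.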
You can use:
(\/p\/.*\/bugs\/.*?(?="))
Java code:
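// line holds one line of the text file, here the sample from the question
String line = "/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed\" class=\"search_bin\"><span>Closed Tickets</span></a>";
String REGEX = "(\\/p\\/.*\\/bugs\\/.*?(?=\"))";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(line);
while (m.find()) {
    String matched = m.group();
    System.out.println("Matched : " + matched);
}
OUTPUT
Matched : /p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed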
Here's another way:
(?i)/p/[a-z/]+bugs/[^ "]+
The (?i) at the beginning makes the regex case-insensitive, so you don't have to worry about letter case. After bugs/, it continues until it reaches either a space or a ".
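As a Java string literal the pattern becomes "(?i)/p/[a-z/]+bugs/[^ \"]+". Compiling with the Pattern.CASE_INSENSITIVE flag is equivalent to the inline (?i). A sketch (the class name and the mixed-case sample are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugsUrlCaseInsensitive {
    // Same as the inline (?i) form; [^ "]+ stops at the first space or double quote
    private static final Pattern URL =
            Pattern.compile("/p/[a-z/]+bugs/[^ \"]+", Pattern.CASE_INSENSITIVE);

    static String extract(String line) {
        Matcher m = URL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Mixed-case path still matches thanks to CASE_INSENSITIVE
        System.out.println(extract("<a href=\"/P/GnomeCatalog/Bugs/search/?q=x\">"));
    }
}
```

This prints /P/GnomeCatalog/Bugs/search/?q=x.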
Is it possible in Java to write a regex that matches up to the end of the string, not just the end of a line, using the Pattern.DOTALL option when the text contains \n?
Examples:
1)
aaa\n==test==\naaa\nbbb\naaa
2)
bbb\naaa==toast==cccdd\nb\nc
3)
aaa\n==trick==\naaaDDDaaa\nbbb
I want to match
\naaa\nbbb\naaa
and
cccdd\nb\nc
but in the third example I don't want to match the text after DDD; there I want only:
\naaa
Yes. For example, (?-m)}$ will match a closing brace at the very end of a Java source file. The point is to disable multiline mode. You can disable it inline as I've shown, or by not passing the Pattern.MULTILINE flag when you compile the Pattern.
UPDATE: Multiline mode is off by default when you compile a Pattern; I believe it is on in Eclipse's find by regex.
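A small sketch of how the flag changes what $ matches (the class and helper names are illustrative). \z is shown as well because it always means the absolute end of input, whatever the flags.

```java
import java.util.regex.Pattern;

public class EndAnchorDemo {
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "aaa\nbbb";
        // Default: $ matches only at the end of input (or just before a final line terminator)
        System.out.println(found("aaa$", 0, text));                       // false
        // MULTILINE: $ also matches just before every line terminator
        System.out.println(found("aaa$", Pattern.MULTILINE, text));       // true
        // Inline (?-m) switches multiline mode back off for the rest of the pattern
        System.out.println(found("(?-m)aaa$", Pattern.MULTILINE, text));  // false
        // \z is the absolute end of input in any mode
        System.out.println(found("bbb\\z", Pattern.MULTILINE, text));     // true
    }
}
```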
A regex that produces the desired matches for these inputs is:
"(?s)==(?!.*?==)([^(?:DDD)]*)"
Here is the full code:
String[] sarr = {"aaa\n==test==\naaa\nbbb\naaa", "bbb\naaa==toast==cccdd\nb\nc",
"aaa\n==trick==\naaaDDDaaa\nbbb"};
Pattern pt = Pattern.compile("(?s)==(?!.*?==)([^(?:DDD)]*)");
for (String s : sarr) {
    Matcher m = pt.matcher(s);
    System.out.print("For input: [" + s + "] => ");
    if (m.find())
        System.out.println("Matched: [" + m.group(1) + ']');
    else
        System.out.println("Didn't Match");
}
OUTPUT:
For input: [aaa\n==test==\naaa\nbbb\naaa] => Matched: [\naaa\nbbb\naaa]
For input: [bbb\naaa==toast==cccdd\nb\nc] => Matched: [cccdd\nb\nc]
For input: [aaa\n==trick==\naaaDDDaaa\nbbb] => Matched: [\naaa]
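One caveat: [^(?:DDD)] is a character class, not a group, so it means "any character except (, ?, :, D or )". It happens to stop at DDD in these three inputs, but it would also stop at a single D. If the intent is to stop only at the sequence DDD, a tempered dot such as ((?:(?!DDD).)*) does that. A sketch (the class and method names, and the fourth sample string, are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AfterLastMarker {
    // (?s): '.' also matches newlines
    // ==(?!.*?==): an "==" with no further "==" after it, i.e. the last one
    // ((?:(?!DDD).)*): any characters, stopping just before the sequence DDD
    private static final Pattern P = Pattern.compile("(?s)==(?!.*?==)((?:(?!DDD).)*)");

    static String afterLastMarker(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {"aaa\n==test==\naaa\nbbb\naaa", "bbb\naaa==toast==cccdd\nb\nc",
                "aaa\n==trick==\naaaDDDaaa\nbbb", "x==y==aDaDDDz"};
        for (String s : samples) {
            System.out.println("[" + afterLastMarker(s) + "]");
        }
    }
}
```

For the last sample this captures aDa, whereas the character-class version would stop at the first D and capture only a.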