How to match Linux parent directories with Java Regex? - java

Linux path is ../../test/test/mydirectory/.....
I tried to remove all the ../ with this regex
[s{/././///}]
But this removes all special characters.
I only want to remove the leading ../../../../../ and leave the real path:
String result = path.replaceAll("[s{/././///}]","");
I expect the regex to match all the leading ../../../../ parent-directory segments and leave only the real path, starting where the letters start.

You may use
s.replaceFirst("^(?:\\.{2}/)+", "")
The pattern matches
^ - start of string
(?:\\.{2}/)+ - one or more repetitions of:
\.{2} - two dots
/ - slash.
The .replaceFirst will find the first occurrence of the pattern and will replace it with an empty string.
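As a minimal sketch of the answer above, the pattern can be precompiled and wrapped in a helper. The class name ParentDirStripper and the sample paths are assumptions made for illustration; only the regex comes from the answer.

```java
import java.util.regex.Pattern;

final class ParentDirStripper {
    // One or more "../" segments, anchored at the start of the string.
    private static final Pattern LEADING_PARENTS = Pattern.compile("^(?:\\.{2}/)+");

    // Removes the leading "../" run; a "../" later in the path is left alone.
    static String strip(String path) {
        return LEADING_PARENTS.matcher(path).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("../../test/test/mydirectory")); // test/test/mydirectory
        System.out.println(strip("test/../mydirectory"));         // test/../mydirectory
    }
}
```

Because of the ^ anchor, replaceFirst and replaceAll behave the same here; replaceFirst simply states the intent more directly.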

Related

Regex for partial path

I have paths like these (single lines):
/
/abc
/def/
/ghi/jkl
/mno/pqr/
/stu/vwx/yz
/abc/def/ghi/jkl
I just need patterns that match up to the third "/". In other words, paths containing just "/" and up to the first 2 directories. However, some of my directories end with a "/" and some don't. So the result I want is:
/
/abc
/def/
/ghi/jkl
/mno/pqr/
/stu/vwx/
/abc/def/
So far, I've tried (\/|.*\/) but this doesn't get the path ending without a "/".
I would recommend this pattern:
/^(\/[^\/]+){0,2}\/?$/gm
It works like this:
^ searches for the beginning of a line
(\/[^\/]+) searches for a path element
( starts a group
\/ searches for a slash
[^\/]+ searches for some non-slash characters
{0,2} says that 0 to 2 of those path elements should be found
\/? allows a trailing slash
$ searches for the end of the line
Use these modifiers:
g to search for several matches within the input
m to treat every line as a separate input
You need a pattern like ^(\/\w+){0,2}\/?$. It checks that a slash followed by a name occurs no more than 2 times and that the path can end with /
Details :
^ : beginning of the string
(\/\w+) : slash (escaped) and word-char, all in a group
{0,2} : the group can occur 0, 1 or 2 times
\/? : slash (escaped) can occur 0 or 1 time
Your regex (\/|.*\/) uses an alternation that matches either a forward slash, or any character zero or more times (greedily) followed by a forward slash.
So in /ghi/jkl, for example, the first match will be the first forward slash. Then the .* part of the second alternative will match from the first g until the end of the string. The engine will then backtrack to the last forward slash to fulfill the whole .*\/ pattern.
The trailing jkl can no longer be matched by either alternative.
Note that you don't have to escape the forward slash.
You could use:
^/(?:\w+/?){0,2}$
In Java:
String regex = "^/(?:\\w+/?){0,2}$";
Explanation
^ Start of the string
/ Match forward slash
(?: Non capturing group
\w+ Match 1+ word characters (If you want to match more than \w you could use a character class and add to that what you want match)
/? Match optional forward slash
){0,2} Close non capturing group and repeat 0 - 2 times
$ End of the string
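The following sketch tries that pattern on the sample paths from the question. Note that with the trailing $ it only accepts lines that already have at most two elements. The second pattern, without the $, is an assumption added here (it is not part of the answer) to extract the leading part of longer paths such as /stu/vwx/yz. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathPrefix {
    // The answer's pattern: the whole line must consist of at most two elements.
    private static final Pattern WHOLE = Pattern.compile("^/(?:\\w+/?){0,2}$");
    // Assumed variant without "$": matches a prefix of at most two elements.
    private static final Pattern PREFIX = Pattern.compile("^/(?:\\w+/?){0,2}");

    static boolean isShortPath(String path) {
        return WHOLE.matcher(path).matches();
    }

    // Returns the part up to the third slash, or "" if the path does not start with "/".
    static String firstTwo(String path) {
        Matcher m = PREFIX.matcher(path);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] samples = {"/", "/abc", "/def/", "/ghi/jkl", "/mno/pqr/",
                            "/stu/vwx/yz", "/abc/def/ghi/jkl"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstTwo(s) + " (short: " + isShortPath(s) + ")");
        }
    }
}
```

As in the answer, \w+ only covers letters, digits and underscores; a character class such as [^/]+ could be substituted if directory names contain other characters.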
^((/[^/]+){0,2}/?)$
To break it down
^ is the start of the string
{0,2} means repeat the previous between 0 and 2 times.
Then it ends with an optional slash by using a ?
String end is $ so it doesn't match longer strings.
() Around the whole thing to capture it.
But I'll point out that a regex is almost always the wrong answer for directory matching. Some directories have special meaning, like /../.. which actually goes up two directories, not down. It is better to use the system's directory API for more robust results.
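A rough sketch of that suggestion, assuming java.nio.file is the directory API meant here (the answer does not name one). The helper firstTwo is hypothetical; it normalizes the path first so that ".." segments are resolved before the first two elements are taken.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class NioPathPrefix {
    // Returns the root (if any) plus at most the first two name elements.
    static Path firstTwo(String raw) {
        Path p = Paths.get(raw).normalize();   // resolves "." and ".." where possible
        int n = Math.min(2, p.getNameCount());
        if (n == 0) {
            return p;                          // just the root, e.g. "/"
        }
        Path head = p.subpath(0, n);           // relative path made of the first n names
        Path root = p.getRoot();
        return root == null ? head : root.resolve(head);
    }

    public static void main(String[] args) {
        System.out.println(firstTwo("/abc/def/ghi/jkl"));
        System.out.println(firstTwo("/abc/def/../ghi/jkl"));
    }
}
```

Unlike the regex versions, this does not preserve a trailing slash, since Path does not keep one.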

Java Regex search two file names

I need some help with Java regular expressions. I have two files:
file_201111.txt.gz
file_2_201111.txt.gz
I need a regular expression to search both the files.
If I use file_[0-9]+.txt.gz I get the first file; if I use file_[0-9]_[0-9]+.txt.gz I get the second file.
How can I combine both search patterns to search for the two files?
Thanks
Brief
Since you haven't specified the actual format for all the files, I'll present you with a couple of regular expressions and you can use whichever best matches your needs.
Code
Method 1
This matches an arbitrary number of _ and digits.
file[_\d]+\.txt\.gz
For all the haters, yes it will match file_.txt.gz, so to prevent that you can use file(?:_\d+)+\.txt\.gz instead.
Method 2
This matches one or two of the _number pattern where number represents any number (1+ digits).
Both patterns below accomplish the same thing.
file(?:_\d+){1,2}\.txt\.gz
file_\d+(?:_\d+)?\.txt\.gz
Explanation
Method 1
file Match this literally
[_\d]+ Match one or more of any character in the set (_ or digit)
\.txt\.gz Match this literally (note that \. matches a literal dot character .)
Method 2
file Match this literally
(?:_\d+){1,2} Match _\d+ (underscore followed by one or more digits) once or twice
Note that the second option _\d+(?:_\d+)? is essentially the same.
\.txt\.gz Match this literally (note that \. matches a literal dot character .)
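A small sketch of how the two methods could be used from Java, where each backslash has to be doubled inside the string literal. The class name and the extra test file names are assumptions for illustration.

```java
import java.util.regex.Pattern;

final class FileNamePatterns {
    // Method 1 (stricter form): any number of "_digits" groups, at least one.
    static final Pattern METHOD_1 = Pattern.compile("file(?:_\\d+)+\\.txt\\.gz");
    // Method 2: exactly one or two "_digits" groups.
    static final Pattern METHOD_2 = Pattern.compile("file(?:_\\d+){1,2}\\.txt\\.gz");

    static boolean matches(Pattern p, String name) {
        return p.matcher(name).matches();   // the whole name must match
    }

    public static void main(String[] args) {
        String[] names = {"file_201111.txt.gz", "file_2_201111.txt.gz",
                          "file_.txt.gz", "file_1_2_3.txt.gz"};
        for (String n : names) {
            System.out.println(n + " method1=" + matches(METHOD_1, n)
                    + " method2=" + matches(METHOD_2, n));
        }
    }
}
```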
You have to indicate that the optional unit is optional with ?. And since it is a multi-character unit, you should group it with (). Try this:
file(_[0-9])?_[0-9]+\\.txt\\.gz

Escape symbol while splitting string using regex in java

I have a string that I received while parsing an XML document:
"ListOfItems/Item[Name='Model/Id']/Price"
And I need to split it by the delimiter "/"
String[] nodes = path.split("/") , but with one condition:
"If a slash is present in the name of an item, as in the example above, I must skip that block and not split it."
i.e. after splitting I must get the following array of nodes:
ListOfItems, Item[Name='Model/Id'], Price
How can I do it using regex expression?
Thanks for help!
You can split using this regex:
/(?=(?:(?:[^']*'){2})*[^']*$)
This regex splits only on forward slashes / that are followed by an even number of single quotes, which means that a / inside single quotes is not matched for splitting.
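A minimal sketch applying this split pattern to the string from the question. The class name is hypothetical; the pattern contains no backslashes, so it needs no extra escaping in a Java string literal.

```java
import java.util.Arrays;

final class QuoteAwareSplit {
    // Splits on "/" only when an even number of single quotes follows,
    // i.e. when the slash is not inside a quoted section.
    static String[] split(String path) {
        return path.split("/(?=(?:(?:[^']*'){2})*[^']*$)");
    }

    public static void main(String[] args) {
        String path = "ListOfItems/Item[Name='Model/Id']/Price";
        System.out.println(Arrays.toString(split(path)));
        // [ListOfItems, Item[Name='Model/Id'], Price]
    }
}
```

The lookahead rescans the rest of the string at every slash, so this is best suited to short inputs.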
Another approach is to use the following pattern with the find method and to check whether the last match is empty. The advantage is that you don't need a lookahead that tests the string up to the end at each candidate position. The items you need are in capture group 1:
\\G/?((?>[^/']+|'[^']*')*)|$
The \G is an anchor that matches either the start of the string or the position after the previous match. Using this forces all the matches to be contiguous.
(?>[^/']+|'[^']*')* defines the possible content of an item: all that is not a / or a ', or a string between quotes.
Note that the description of a string between quotes can be improved to deal with escaped quotes: '(?>[^'\\]+|\\.)*' (with the s modifier)
The alternation with the $ is only here to ensure that you have parsed the whole string up to the end. Capture group 1 of the last match must be empty. If it is null, this means that the global search stopped before the end (for example in case of unbalanced quotes).
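One possible way to drive this pattern from Java is sketched below. The loop structure, the class name and the choice to throw on a null group are assumptions made here; the answer only specifies the pattern and the meaning of group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContiguousItemScanner {
    private static final Pattern ITEM =
            Pattern.compile("\\G/?((?>[^/']+|'[^']*')*)|$");

    static List<String> items(String path) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(path);
        while (m.find()) {
            String item = m.group(1);
            if (item == null) {
                // Only the "$" branch matched: the contiguous scan stopped early.
                throw new IllegalArgumentException("could not parse: " + path);
            }
            if (m.start() == path.length()) {
                break;  // empty closing match at the end of the input
            }
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(items("ListOfItems/Item[Name='Model/Id']/Price"));
    }
}
```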

java regex to strip root element in xpath string

What's the easiest way to strip the root element from an xpath string, i.e. anything matching /\w/, as long as the path starts with that pattern? For example:
/root/foo/bar/sushi becomes foo/bar/sushi
/my/t/fine/path becomes t/fine/path
I got this working:
String path = '/root/foo/bar/sushi'
path.replaceFirst('\\/(.*?)\\/', '')
but if path='root/foo/bar/sushi', I don't want anything changed, since that doesn't start with /, but it still strips out the first occurrence of /element/, resulting in rootbar/sushi. I understand why, just having trouble validating the start pattern.
You need the ^ anchor to specify that we are looking for /root/ at the beginning of the string. At the simplest, this regex will do it:
^/[^/]*/
In Java code, this can look like:
String replaced = your_original_string.replaceAll("^/[^/]*/", "");
This works if you know that what you are looking at is a path in the first place.
Explain Regex
^ # the beginning of the string
/ # '/'
[^/]* # any character except '/' (0 or more times, matching as many as possible)
/ # '/'
Option 2: validate at the same time
On the other hand, if you are not sure that the string is a path, then this regex is not adequate because it will accept any character after the /root/.
In that case, you can specify your characters, for instance with
^/[^/]*/([\w-/]+)
for digits, letters, underscores and hyphens. This validation can be further refined to ensure that the characters occur in the right order.
For this regex, you would replace with:
String replaced = your_original_string.replaceAll("^/[^/]*/([\\w-/]+)", "$1");
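Both options can be tried with a sketch like the one below. The class and method names are hypothetical, replaceFirst is used instead of replaceAll (equivalent here because of the ^ anchor), and the hyphen in the character class is moved to the end ([\w/-]) as a precaution so it cannot be read as a range.

```java
final class XPathRootStripper {
    // Option 1: drop "/<first element>/" only when the string starts with "/".
    static String stripRoot(String path) {
        return path.replaceFirst("^/[^/]*/", "");
    }

    // Option 2: additionally require word characters, hyphens or slashes after the root.
    static String stripRootValidated(String path) {
        return path.replaceFirst("^/[^/]*/([\\w/-]+)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripRoot("/root/foo/bar/sushi"));          // foo/bar/sushi
        System.out.println(stripRoot("/my/t/fine/path"));              // t/fine/path
        System.out.println(stripRoot("root/foo/bar/sushi"));           // root/foo/bar/sushi
        System.out.println(stripRootValidated("/root/foo/bar/sushi")); // foo/bar/sushi
    }
}
```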

Differentiating between slashes in a string using a regular expression

A program that I'm writing (in Java) gets input data made up of three kinds of parts, separated by a slash /. The parts can be one of the following:
A name matching the regular expression \w*
A call matching the expression \w*\(.*\)
A path matching the expression <.*>|\".*\". A path can contain slashes.
An example string could look like this:
bar/foo()/foo(bar)/<foo/bar>/bar/"foo/bar"/foo()
which has the following structure
name/call/call/path/name/path/call
I want to split this string into parts, and I'm trying to do this using a regular expression. My current expression captures slashes after calls and paths, but I'm having trouble getting it to capture slashes after names without also including slashes that may exist within paths. My current expression, just capturing slashes after paths and calls looks like this:
(?<=[\)>\"])/
How can I expand this expression to also capture slashes after names without including slashes within paths?
(\w+|\w+\([^/]*\)(?:/\w+\([^/]*\))*|<[^>]*>|"[^"]*")(?=/|$)
captures this from the string 'bar/foo()/foo(bar)/<foo/bar>/bar/"foo/bar"/foo()'
'bar'
'foo()/foo(bar)'
'<foo/bar>'
'bar'
'"foo/bar"'
'foo()'
It does not capture the separating slashes, though (what for? - just assume they are there).
The simpler (\w+|\w+\([^/]*\)|<[^>]*>|"[^"]*")(?=/|$) would capture calls separately:
"foo()"
"foo(bar)"
EDIT: Usually, I do a regex breakdown:
( # begin group 1 (for alternation)
\w+ # at least one word character
| # or...
\w+ # at least one word character
\( # a literal "("
[^/]* # anything but a "/", as often as possible
\) # a literal ")"
| # or...
< # a "<"
[^>]* # anything but a ">", as often as possible
> # a ">"
| # or...
" # a '"'
[^"]* # anything but a '"', as often as possible
" # a '"'
) # end group 1
(?=/|$) # look-ahead: ...followed by a slash or the end of string
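In Java, the simpler pattern could be applied with a Matcher loop along the following lines. This is only a sketch: the class and method names are assumptions, and the pattern is the simpler variant from above with the escaping a Java string literal requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartTokenizer {
    // name | call | <path> | "path", each followed by a slash or the end of input.
    private static final Pattern PART = Pattern.compile(
            "(\\w+|\\w+\\([^/]*\\)|<[^>]*>|\"[^\"]*\")(?=/|$)");

    static List<String> parts(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            out.add(m.group(1));   // the separating slashes are skipped, not captured
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parts("bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()"));
        // [bar, foo(), foo(bar), <foo/bar>, bar, "foo/bar", foo()]
    }
}
```

Because [^/]* is used inside the parentheses, a call whose arguments contain a slash would not be recognized as a single part.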
My first thought was to match slashes with an even number of quotes to the left of them (i.e., having a positive look-behind of something like (".*")*), but this ends up in an exception saying
Look-behind group does not have an obvious maximum length
Honestly I think you'd be better off with a Matcher, using an OR-ed together version of your components (something like \w*|\w*\(.*\)|(<.*>|\".*\")), and do while (matcher.find()).
Having the delimiter of your string appear unescaped inside your input might not be the best choice. However, you do have the luxury of the "false" slash being inside a regular pattern. What I suggest...
Split the whole string on "/"
Parse each part until you get to the start of the path
Put the path elements into a list until the end of the path
Rejoin the path back on "/"
I highly recommend you consider escaping the "/" in your paths to make your life easier.
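The split-and-rejoin steps above could be sketched roughly as follows. The names are hypothetical, and the sketch assumes that a path always starts with < or " at the beginning of a piece and ends with the matching > or " at the end of a piece.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitAndRejoin {
    static List<String> parts(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder open = null;  // non-null while we are inside a path
        char closer = 0;            // the character that ends the current path
        for (String piece : input.split("/", -1)) {
            if (open != null) {
                open.append('/').append(piece);          // rejoin on "/"
                if (piece.endsWith(String.valueOf(closer))) {
                    out.add(open.toString());
                    open = null;
                }
            } else if (piece.startsWith("<") && !piece.endsWith(">")) {
                open = new StringBuilder(piece);
                closer = '>';
            } else if (piece.startsWith("\"")
                    && (piece.length() == 1 || !piece.endsWith("\""))) {
                open = new StringBuilder(piece);
                closer = '"';
            } else {
                out.add(piece);                          // name, call, or slash-free path
            }
        }
        if (open != null) {
            throw new IllegalArgumentException("unterminated path in: " + input);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parts("bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()"));
    }
}
```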
This pattern captures all parts of your example string separately without including the delimiter into the results:
\w+\(.*?\)|<.*>|\".*\"|\w+
