conditional replaceAll java - java

I have HTML code with img src tags pointing to URLs. Some have mysite.com/myimage.png as src; others have mysite.com/1234/12/12/myimage.png. I want to replace these URLs with a cache file path. I'm looking for something like this:
String website = "mysite.com";
String text = webContent.replaceAll(website+ "\\d{4}\\/\\d{2}\\/\\d{2}", String.valueOf(cacheDir));
This code, however, does not work when the URL does not have the extra date stamp. Does anyone know how I might achieve this? Thanks!

Try this one
mysite\.com/(\d{4}/\d{2}/\d{2}/)?
Here ? means zero or one occurrence, so the date group is optional.
Note: escape the dot as \. to match a literal dot, because an unescaped . in a regex matches any character.
Sample code :
String[] webContents = new String[] { "mysite.com/myimage.png",
"mysite.com/1234/12/12/myimage.png" };
for (String webContent : webContents) {
String text = webContent.replaceAll("mysite\\.com/(\\d{4}/\\d{2}/\\d{2}/)?",
String.valueOf("mysite.com/abc/"));
System.out.println(text);
}
output:
mysite.com/abc/myimage.png
mysite.com/abc/myimage.png

You are missing a forward slash between the website.com and the first 4 digits.
String text = webContent.replaceAll(Pattern.quote(website) + "/\\d{4}\\/\\d{2}\\/\\d{2}", String.valueOf(cacheDir));
I'd also recommend treating your website value as a literal (the Pattern.quote part), so that the dot in mysite.com is not interpreted as a regex wildcard.
Finally you are also missing the last forward slash after the last two digits so it won't be replaced, but that may be on purpose...
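Putting the suggestions above together, here is a minimal sketch. It assumes the date segment always has the form /dddd/dd/dd/ and that the cache directory is a plain string. The class name, the rewrite helper and the /data/cache value are made up for illustration. Matcher.quoteReplacement is used because a cache path containing \ or $ would otherwise be interpreted specially by replaceAll.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CacheRewrite {

    // Replaces "website/" plus an optional "dddd/dd/dd/" date segment with "cacheDir/".
    static String rewrite(String html, String website, String cacheDir) {
        // Pattern.quote makes the dot in the host name literal;
        // the non-capturing group followed by ? makes the date segment optional.
        String regex = Pattern.quote(website) + "/(?:\\d{4}/\\d{2}/\\d{2}/)?";
        // quoteReplacement keeps any '\' or '$' in cacheDir literal.
        return html.replaceAll(regex, Matcher.quoteReplacement(cacheDir + "/"));
    }

    public static void main(String[] args) {
        String html = "<img src=\"mysite.com/myimage.png\">"
                + "<img src=\"mysite.com/1234/12/12/myimage.png\">";
        // "/data/cache" is a hypothetical cache directory.
        System.out.println(rewrite(html, "mysite.com", "/data/cache"));
    }
}
```

Both img tags should end up pointing at /data/cache/myimage.png, whether or not the original URL carried the date segment.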

Try:
String text = webContent.replaceAll("(?<="+website+")(.*)(?=\\/)",
String.valueOf(cacheDir));

Related

Java regex for Windows file path

I'm trying to build a Java regex to search a .txt file for a Windows formatted file path, however, due to the file path containing literal backslashes, my regex is failing.
The .txt file contains the line:
C\Windows\SysWOW64\ntdll.dll
However, some of the filenames in the text file are formatted like this:
C\Windows\SysWOW64\ntdll.dll (some developer stuff here...)
So I'm unable to use String.equals
To match this line, I'm using the regex:
String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
String read = bufferedReader.readLine();
if (Pattern.compile(Pattern.quote(filename), Pattern.CASE_INSENSITIVE).matcher(read).find()) {
I've tried escaping the literal backslashes, using the replace method, i.e:
filename.replace("\\", "\\\\");
However, this fails to find a match. I'm guessing this is because I need to escape the backslashes further after the Pattern has been built. I'm thinking I might need up to four additional backslashes, i.e.:
Pattern.replaceAll("\\\\", "\\\\\\\\");
However, each time I try, the pattern doesn't get matched. I'm certain it's a problem with the backslashes, but I'm not sure where to do the replacement, or if there's a better way of building the pattern.
I think the problem is compounded by the fact that the replaceAll method also uses a regex, which means the pattern will have its own backslashes in there to deal with the case insensitivity.
Any input or advice would be appreciated.
Thanks
It seems like you're attempting to do a direct comparison of one String against another. For exact matches, you could do
if (read.equalsIgnoreCase(filename)) {
or simply
if (read.startsWith(filename)) {
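As a hedged illustration of the two approaches discussed here: Pattern.quote already treats every backslash in the file name as a literal, so no additional escaping should be needed, and String.regionMatches gives a regex-free, case-insensitive prefix check. The class and method names below are invented for this sketch, and the sample line is an assumption based on the question.

```java
import java.util.regex.Pattern;

final class PathMatchDemo {

    // Regex-based: true if the line contains the file name anywhere, ignoring case.
    static boolean containsIgnoreCase(String line, String filename) {
        return Pattern.compile(Pattern.quote(filename), Pattern.CASE_INSENSITIVE)
                .matcher(line)
                .find();
    }

    // Regex-free: true if the line starts with the file name, ignoring case.
    static boolean startsWithIgnoreCase(String line, String filename) {
        return line.regionMatches(true, 0, filename, 0, filename.length());
    }

    public static void main(String[] args) {
        // In Java source, "\\" is a single backslash at runtime.
        String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
        String line = "c\\windows\\syswow64\\NTDLL.dll (some developer stuff here...)";
        System.out.println(containsIgnoreCase(line, filename));
        System.out.println(startsWithIgnoreCase(line, filename));
    }
}
```

If both checks succeed on a hand-written line but fail on the real file, the line read from disk probably differs from the expected text (for example a drive colon or different separators), which is worth printing and comparing first.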
Try this :
While reading each line from the file, replace '\' by '\\'.
Then :
String lLine = "C\\Windows\\SysWOW64\\ntdll.dll";
Pattern lPattern = Pattern.compile("C\\\\Windows\\\\SysWOW64\\\\ntdll\\.dll");
Matcher lMatcher = lPattern.matcher(lLine);
if(lMatcher.find()) {
System.out.println(lMatcher.group());
}
lLine = "C\\Windows\\SysWOW64\\ntdll.dll (some developer stuff here...)";
lMatcher = lPattern.matcher(lLine);
if(lMatcher.find()) {
System.out.println(lMatcher.group());
}
The correct usage will be:
String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
String file = filename.replace('\\', ' ');

Replacing special character from a String in Android

I have a String that serves as a folder/file name, and I create a folder or file from it. The string may contain characters that are not allowed in a folder or file name,
e.g.
String folder = "ArslanFolder 20/01/2013";
So I want to replace these characters with "_".
Here are the characters:
private static final String ReservedChars = "|\\?*<\":>+[]/'";
What would the regular expression for that be? I know about replaceAll(), but I need a regular expression to pass to it.
Use this code:
String folder = "ArslanFolder 20/01/2013 ? / '";
String result = folder.replaceAll("[|?*<\":>+\\[\\]/']", "_");
And the result would be:
ArslanFolder 20_01_2013 _ _ _
You didn't say that spaces should be replaced, so they are left in place; you could add the space to the character class if necessary.
I used one of these:
String alphaOnly = input.replaceAll("[^\\p{Alpha}]+","");
String alphaAndDigits = input.replaceAll("[^\\p{Alpha}\\p{Digit}]+","");
See this link:
Replace special characters
Try this :
replaceAll("[\\W]", "_");
It will replace every non-word character (anything other than letters, digits and the underscore itself) with an underscore.
This is the correct solution:
String result = inputString.replaceAll("[\\\\|?\u0000*<\":>+\\[\\]/']", "_");
Kent's answer is good, but it doesn't include the NUL and \ characters.
This is also a safer solution when renaming files based on user input, for example.
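For reference, a small sketch of how the character-class approach from the answers above could be wrapped in a reusable helper. The class name, the sanitize method and the precompiled RESERVED pattern are assumptions made for this example; the character set is the one from the question plus backslash and NUL (written here as the regex escape \x00).

```java
import java.util.regex.Pattern;

final class FileNameSanitizer {

    // Reserved characters: \ | ? NUL * < " : > + [ ] / '
    private static final Pattern RESERVED =
            Pattern.compile("[\\\\|?\\x00*<\":>+\\[\\]/']");

    // Replaces every reserved character with an underscore.
    static String sanitize(String name) {
        return RESERVED.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("ArslanFolder 20/01/2013"));
        System.out.println(sanitize("a\\b:c?d"));
    }
}
```

Compiling the pattern once avoids recompiling the regex on every call, which String.replaceAll would otherwise do.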

Why this regex not giving expected output?

I have a string containing the value given below. I want to replace the HTML img tags that contain a specific customerId with some new text. I tried a small Java program, but it does not give me the expected output. Here are the details.
My input string is
String inputText = "Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p>"
+ "<p>someText</p><img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456/> ..Ending here";
Regex is
String regex = "(?s)\\<img.*?customerId=3340.*?>";
new text i want to put inside input string
EDIT Starts:
String newText = "<img src=\"getCustomerNew.do\">";
EDIT ENDS:
now i am doing
String outputText = inputText.replaceAll(regex, newText);
output is
Starting here.. Replacing Text ..Ending here
but my expected output is
Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p><p>someText</p>Replacing Text ..Ending here
Please note that in my expected output only the img tag containing customerId=3340 is replaced with Replacing Text. I don't understand why both img tags are replaced in the actual output.
You've got "wildcard"/"any" patterns (.*?) in there, which can match any character, including >. Even though they are lazy, the match starts at the first <img and the first wildcard keeps expanding past the end of that tag until it reaches customerId=3340, so both img tags end up inside a single match.
You should be able to fix this by changing the .* parts to something like [^>]+ so that the matching won't span past the first > character.
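A sketch of that fix applied to the input from the question. It uses [^>]* rather than [^>]+ so that an empty gap is also accepted, and adds a (?!\d) guard so that customerId=3340 does not also match 33401; both choices, as well as the class and method names, are assumptions of this example rather than part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImgReplaceDemo {

    // Replaces only the img tag whose attributes mention the given customerId.
    static String replaceCustomerImg(String html, String customerId, String newText) {
        // [^>]* cannot cross a '>' character, so the match stays inside one tag.
        String regex = "<img[^>]*customerId=" + Pattern.quote(customerId) + "(?!\\d)[^>]*>";
        return html.replaceAll(regex, Matcher.quoteReplacement(newText));
    }

    public static void main(String[] args) {
        String inputText = "Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p>"
                + "<p>someText</p><img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456/> ..Ending here";
        System.out.println(replaceCustomerImg(inputText, "3340", "Replacing Text"));
    }
}
```

With this pattern the first img tag (customerId=3334) should be left untouched and only the second one replaced.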
Parsing HTML with regular expressions is bound to cause pain.
As other people have told you in the comments, HTML is not a regular language so using regex for manipulating it is usually painful. Your best option is to use an HTML parser. I haven't used Jsoup before, but googling a little bit it seems you need something like:
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;
public class MyJsoupExample {
public static void main(String args[]) {
String inputText = "<html><head></head><body><p><img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123\"/></p>"
+ "<p>someText <img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456\"/></p></body></html>";
Document doc = Jsoup.parse(inputText);
Elements myImgs = doc.select("img[src*=customerId=3340]");
for (Element element : myImgs) {
element.replaceWith(new TextNode("my replaced text", ""));
}
System.out.println(doc.toString());
}
}
Basically, the code gets the list of img nodes with a src attribute containing a given string:
Elements myImgs = doc.select("img[src*=customerId=3340]");
then loops over the list and replaces those nodes with some text.
UPDATE
If you don't want to replace the whole img node with text, but instead need to give a new value to its src attribute, you can replace the body of the for loop with:
element.attr("src", "my new value");
or, if you want to change just a part of the src value, you can do:
String srcValue = element.attr("src");
element.attr("src", srcValue.replace("getCustomers.do", "getCustomerNew.do"));
which is very similar to what I posted in this thread.
What happens is that your regex starts matching at the first img tag, then consumes everything (regardless of whether the quantifier is greedy or not) until it finds customerId=3340, and then continues consuming everything until it finds >.
If you want it to consume just the img with customerId=3340, think about what distinguishes this tag from the other tags it may match.
In this particular case, one possible solution is to look at what precedes that img tag using a look-behind operator (which doesn't consume any input). This regex will work:
String regex = "(?<=</p>)<img src=\".*?customerId=3340.*?>";

Replace "\\" with "/" in Java

I am trying to replace '\\' with '/' in Java (Android) and this does not seem to work!
String rawPath = filePath.replace("\\\\", "/");
What is wrong with this? I have escaped "\" and tried escaping '/', but to no avail. Nothing happens to the original string.
filePath = abc\\xyz (note: this is not after escaping; the original string really contains two \\)
rawPath = abc \ xyz
expected = abc/xyz
What's the correct way of doing this? (Another Windows file path to Android path conversion problem.)
When using String.replace(String, String) the backslash doesn't need to be escaped twice (that's only needed with replaceAll, which deals with regex). So:
String rawPath = filePath.replace("\\", "/");
Or using char version:
String rawPath = filePath.replace('\\', '/');
You do not need the quadruple escape \\\\; simply \\ is enough.
Escaping with a single backslash should be enough. The following works fine for me.
String rawPath = filePath.replace("\\", "/");
public static void main(String[] args) {
String s = "foo\\\\bar";
System.out.println(s);
System.out.println(s.replace("\\\\", "/"));
}
will print
foo\\bar
foo/bar
If you want to replace a sequence of 2 backslashes in your original string with a single forward slash, this should work:
String filePath = "abc\\\\xyz";
String rawPath = filePath.replace("\\\\", "/");
System.out.println(filePath);
System.out.println(rawPath);
outputs:
abc\\xyz
abc/xyz
Do you really have two backslashes in the String in the first place? That only appears in Java source code. At runtime there will only be one backslash. So the task reduces to changing backslashes to forward slashes (why?). For which you need a regex if you are using replaceAll(), which would require four of them: two for the compiler, and two for the regex, but you aren't using that, you are using replace(), which isn't a regex, so you only need two, one for the compiler and one for itself.
Why are you doing this? It is never necessary to use a backslash in a File path in Java at all, and it is also never necessary to translate them to / unless you are doing URL-like things with them, in which case there are File.toURI() methods and URI and URL classes for that.
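To make the escaping levels described above concrete, here is a short sketch comparing the literal replace with the regex-based replaceAll on a string that holds a single backslash at runtime. The class and method names are invented for illustration.

```java
final class SlashDemo {

    // Literal replacement: "\\" in source is one backslash at runtime.
    static String viaReplace(String path) {
        return path.replace("\\", "/");
    }

    // Regex replacement: "\\\\" in source is the two-character regex \\,
    // which in turn matches one literal backslash.
    static String viaReplaceAll(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        String filePath = "abc\\xyz"; // abc\xyz at runtime
        System.out.println(filePath);
        System.out.println(viaReplace(filePath));
        System.out.println(viaReplaceAll(filePath));
        System.out.println(filePath.replace('\\', '/')); // char overload
    }
}
```

All three replacement variants should give abc/xyz; the only difference is how many levels of escaping the source code needs.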
Here is a very small method to get the desktop path and show you how to replace them in the return statement.
public static String getDesktopPath() {
String desktopPath = System.getProperty("user.home") + "/Desktop";
return desktopPath.replace("\\", "/");
}

Java Regex - Changing path with an alias

I have a path called $SERVER/public_html/ab1/ab2/.
I want to change it so that instead of $SERVER it just replaces it with my user directory. So I do
path = path.replaceFirst("\\$SERVER", System.getProperty("user.dir"));
but when I run it, it removes my \ in the new string.
F:Programming ProjectsJava Project/public_html/ab1/ab2/
Pattern has a String quote(String) function that will help you for the first string and Matcher has String quoteReplacement(String) for the second:
path = path.replaceFirst(java.util.regex.Pattern.quote("$SERVER"), java.util.regex.Matcher.quoteReplacement(System.getProperty("user.dir")));
edit: the reason you have to escape anything is that the second string has the semantics of Matcher.appendReplacement, which treats a backslash as "escape the next character" and a dollar sign as "insert a captured group".
from the doc:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
A more explicit solution is to double the backslashes yourself (be careful with the escaping this requires):
path = path.replaceFirst("\\$SERVER", System.getProperty("user.dir").replaceAll("\\\\","\\\\\\\\"));
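The difference can be reproduced with a fixed Windows-style directory instead of user.dir. This is a sketch only: the directory string is taken from the output shown in the question, and the class and method names are made up. The last variant uses String.replace, which treats both arguments literally (note that it replaces every occurrence, not just the first).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AliasDemo {

    // Backslashes in dir act as escapes in the replacement and are dropped.
    static String withoutQuoting(String path, String dir) {
        return path.replaceFirst("\\$SERVER", dir);
    }

    // quoteReplacement keeps backslashes and dollar signs in dir literal.
    static String withQuoting(String path, String dir) {
        return path.replaceFirst(Pattern.quote("$SERVER"), Matcher.quoteReplacement(dir));
    }

    // No regex involved at all.
    static String withLiteralReplace(String path, String dir) {
        return path.replace("$SERVER", dir);
    }

    public static void main(String[] args) {
        String path = "$SERVER/public_html/ab1/ab2/";
        String dir = "F:\\Programming Projects\\Java Project"; // stand-in for user.dir
        System.out.println(withoutQuoting(path, dir));
        System.out.println(withQuoting(path, dir));
        System.out.println(withLiteralReplace(path, dir));
    }
}
```

The first line printed should show the backslashes missing, as in the question, while the other two keep them.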
Yes, you are completely right. I am trying to figure out why this happens.
At the moment, the only thing I can suggest is to go with a solution like this:
public class RegExTest
{
public static void main(String[] args)
{
String path = "$SERVER/public_html/ab1/ab2";
System.out.println("path before="+path);
String user = System.getProperty("user.dir");
System.out.println("user="+user);
System.out.println("replaceFirst using user="+path.replaceFirst("\\$SERVER", user));
path = path.replaceFirst("\\$SERVER", "");
path = user +path;
System.out.println("path after="+path);
}
}
EDIT: Why does it do that?
From what I see in the code of the method (lines 701 to 708), this is deliberate: backslashes are simply skipped. As to the reason why, I am still not sure.
EDIT2:
OK, reading the documentation for the method answers it all. They do it so that special characters can be interpreted accordingly. When the algorithm reads the replacement and spots a backslash, it assumes the backslash is escaping the next character, so it skips the backslash and appends only the character that follows.
if (nextChar == '\\') {
cursor++;
nextChar = replacement.charAt(cursor);
result.append(nextChar);
cursor++;
} else if (nextChar == '$') {
// Skip past $
cursor++;
OK, so on Windows the default path separator is the backslash '\', whereas on *nix it is the forward slash '/'. The simplest way around this problem is to invoke the replace function with the parameters '\\' and '/'. That way your path will have all its slashes facing the same way.
