Java regex expression to sanitize an uploaded file name

I'm trying to sanitize a String that contains an uploaded file's name. I'm doing this because the files will be downloaded from the web and I also want to normalize the names. This is what I have so far:
private String pattern = "[^0-9_a-zA-Z\\(\\)\\%\\-\\.]";
//Class methods & stuff
private String sanitizeFileName(String badFileName) {
    StringBuffer cleanFileName = new StringBuffer();
    Pattern filePattern = Pattern.compile(pattern);
    Matcher fileMatcher = filePattern.matcher(badFileName);
    boolean match = fileMatcher.find();
    while(match) {
        fileMatcher.appendReplacement(cleanFileName, "");
        match = fileMatcher.find();
    }
    return cleanFileName.substring(0, cleanFileName.length() > 250 ? 250 : cleanFileName.length());
}
This works OK, but for some strange reason the extension of the file is erased, e.g. "p%Z_-...#!$()=¡¿&+.jpg" ends up being "p%Z_-...()".
Any idea how I should tune my regex?

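You need to call Matcher#appendTail after your loop. appendReplacement only copies the input up to the end of the last match, so everything after it (here, ".jpg") is dropped unless appendTail appends it.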
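A minimal sketch of the fixed method, assuming the same character class and the 250-character limit from the question. The class name FileNameSanitizer and the constant names are only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameSanitizer {
    // Same whitelist as in the question: every character outside it is dropped.
    private static final Pattern UNSAFE = Pattern.compile("[^0-9_a-zA-Z()%\\-.]");
    private static final int MAX_LENGTH = 250;

    static String sanitizeFileName(String badFileName) {
        StringBuffer clean = new StringBuffer();
        Matcher matcher = UNSAFE.matcher(badFileName);
        while (matcher.find()) {
            // Copies the text between the previous match and this one,
            // then appends the (empty) replacement.
            matcher.appendReplacement(clean, "");
        }
        // Copies whatever follows the last match; without this call ".jpg" is lost.
        matcher.appendTail(clean);
        return clean.substring(0, Math.min(clean.length(), MAX_LENGTH));
    }

    public static void main(String[] args) {
        // ASCII-only variant of the question's example
        System.out.println(sanitizeFileName("p%Z_-...#!$()=&+.jpg")); // p%Z_-...().jpg
    }
}
```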

One line solution:
return badFileName.replaceAll("[^0-9_a-zA-Z\\(\\)\\%\\-\\.]", "");
If you want to restrict it to just alphanumeric and space:
return badFileName.replaceAll("[^a-zA-Z0-9 ]", "");
Cheers :)

Related

Split a string in java based on custom logic

I have a string
"target/abcd12345671.csv"
and I need to extract
"abcd12345671"
from the string using Java. Can anyone suggest me a clean way to extract this.

Core Java
import java.nio.file.Paths;
String fileName = Paths.get("target/abcd12345671.csv").getFileName().toString();
// Strip the last extension: a dot followed by non-dot characters at the end
fileName = fileName.replaceFirst("[.][^.]+$", "");
Using Apache Commons IO
import java.nio.file.Paths;
import org.apache.commons.io.FilenameUtils;
String fileName = Paths.get("target/abcd12345671.csv").getFileName().toString();
String fileNameWithoutExt = FilenameUtils.getBaseName(fileName);

I like a regex replace approach here:
String filename = "target/abcd12345671.csv";
String output = filename.replaceAll("^.*/|\\..*$", "");
System.out.println(output); // abcd12345671
Here we use a regex alternation to remove all content up to and including the final forward slash, as well as all content from the first dot after it to the end of the filename. This leaves behind the content you actually want.
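One thing to watch with this alternation: \\..*$ cuts from the first dot, so a name with several dots loses more than its extension. A small sketch comparing it with a variant that strips only the last extension; the class and method names are made up for illustration.

```java
class BaseNameDemo {
    // Alternation from the answer: drops the directory part and everything from the first dot.
    static String stripFromFirstDot(String path) {
        return path.replaceAll("^.*/|\\..*$", "");
    }

    // Variant: drops the directory part and only the last ".ext" suffix.
    static String stripLastExtension(String path) {
        return path.replaceAll("^.*/|\\.[^.]*$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripFromFirstDot("target/abcd12345671.csv"));  // abcd12345671
        System.out.println(stripFromFirstDot("target/archive.tar.gz"));    // archive
        System.out.println(stripLastExtension("target/archive.tar.gz"));   // archive.tar
    }
}
```

Which of the two is right depends on whether "archive.tar.gz" should count as having one extension or two.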

Here is an approach using a regex with a capturing group:
String filename = "target/abcd12345671.csv";
// The dot before csv is escaped so it matches a literal '.'
var pattern = Pattern.compile("target/(.*)\\.csv");
var matcher = pattern.matcher(filename);
if (matcher.find()) {
    // Whole matched expression -> "target/abcd12345671.csv"
    System.out.println(matcher.group(0));
    // Matched in the first group -> in regex it is the (.*) expression
    System.out.println(matcher.group(1));
}

Regex Redirect URL excludes token

I'm trying to create a redirect URL for my client. We have a service where you specify "fromUrl" -> "toUrl", and it uses a Java regex Matcher. But I can't get it to include the token when it converts the URL. For example:
/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
Should be:
/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
but it excludes the token so the result I get is:
/fromurl/login/
/tourl/login/
I tried various regex patterns such as " ?.* and [%5E//?]+)/([^/?]+)/(?.*)?$ and (/*) etc", but none of them seems to work.
I'm not that familiar with regex. How can I solve this?

This can be easily done using simple string replace but if you insist on using regular expressions:
Pattern p = Pattern.compile("fromurl");
String originalUrlAsString = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf ";
String newRedirectedUrlAsString = p.matcher(originalUrlAsString).replaceAll("tourl");
System.out.println(newRedirectedUrlAsString);

If I understand you correctly, you need something like this:
String from = "/my/old/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceAll("\\/(.*)\\/", "/my/new/url/");
System.out.println(to); // /my/new/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
This replaces everything from the first to the last forward slash, including both slashes. It works here because the slashes inside the token are percent-encoded as %2F.

Can you describe more exactly what the original expression looks like? This is necessary because the regular expression depends on it.
Assuming the first occurrence of fromurl should simply be replaced, the following code is enough:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceFirst("fromurl", "tourl");
But if it is necessary to use more complex rules to determine the substring to replace, you can use:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = "";
String regularExpresion = "(<<pre>>)(fromurl)(<<pos>>)";
Pattern pattern = Pattern.compile(regularExpresion);
Matcher matcher = pattern.matcher(from);
if (matcher.matches()) {
to = from.replaceAll(regularExpresion, "$1tourl$3");
}
NOTE: the pre and pos parts are placeholders, because I don't know the real expression of the URL
NOTE 2: $1 and $3 refer to the first and the third group
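To make the placeholders concrete, here is one possible reading, assuming the URL always starts with /fromurl/ (so pre is the leading slash and pos is the rest of the path and query). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedirectDemo {
    // Group 1: leading slash, group 2: the segment to swap, group 3: rest of the URL.
    // These groups are an assumption about the URL layout, not the service's real rule.
    private static final Pattern FROM_URL = Pattern.compile("^(/)(fromurl)(/.*)$");

    static String redirect(String from) {
        Matcher matcher = FROM_URL.matcher(from);
        if (matcher.matches()) {
            // $1 and $3 copy the first and third group unchanged, token included.
            return matcher.replaceFirst("$1tourl$3");
        }
        return from; // leave URLs that do not match untouched
    }

    public static void main(String[] args) {
        System.out.println(redirect("/fromurl/login?token=abc%2Fdef%2Bghi"));
    }
}
```

Because group 3 captures everything after the segment, the query string survives the rewrite.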

The existing answers should solve the issue, and some are similar, but the solution below may also help. It uses a fairly simple regex (assuming your input always has the same format as your example):
private static String replaceUrl(String inputUrl){
String regex = "/.*(/login\\?token=.*)";
String toUrl = "/tourl";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(inputUrl);
if (matcher.find()) {
return toUrl + matcher.group(1);
} else
return null;
}
You can write a test to check whether it works for other expected inputs/outputs, in case you want to change the format and adjust the regex:
String inputUrl = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String expectedUrl = "/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
if (expectedUrl.equals(replaceUrl(inputUrl))){
System.out.println("Success");
}

Parse string value from URL

I have a string (which is a URL) in this pattern: https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg
Now I want to clip it to this:
media/93939393hhs8.jpeg
I want to remove all the characters before the second-to-last slash /.
I'm a newbie in Java, but in Swift (iOS) this is how we do it:
if let url = NSURL(string:"https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg"), pathComponents = url.pathComponents {
let trimmedString = pathComponents.suffix(2).joinWithSeparator("/")
print(trimmedString) // "output = media/93939393hhs8.jpeg"
}
Basically, I'm removing everything from this URL except the last 2 items, and then
joining those 2 items using /.

String ret = url.substring(url.indexOf("media"));

Are you familiar with regex? Try this regex, which captures the last 2 items separated by /:
.*?\/([^\/]+?\/[^\/]+?$)
Here is the example in Java (don't forget to escape each backslash as \\):
Pattern p = Pattern.compile("^.*?\\/([^\\/]+?\\/[^\\/]+?$)");
Matcher m = p.matcher(string);
if (m.find()) {
System.out.println(m.group(1));
}
Alternatively, there is the split(..) function, although I recommend the approach above. Finally, join the separated strings with a StringBuilder:
String part[] = string.split("/");
int l = part.length;
StringBuilder sb = new StringBuilder();
String result = sb.append(part[l-2]).append("/").append(part[l-1]).toString();
Both give the same result: media/93939393hhs8.jpeg

// + 1 skips the slash itself, so the result does not start with "/"
String result = url.substring(url.substring(0, url.lastIndexOf('/')).lastIndexOf('/') + 1);
or
use split and join the last 2 items:
String[] arr = url.split("/");
String result = arr[arr.length-2] + "/" + arr[arr.length-1];

public static String parseUrl(String str) {
return (str.lastIndexOf("/") > 0) ? str.substring(1+(str.substring(0,str.lastIndexOf("/")).lastIndexOf("/"))) : str;
}
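For something closer to the Swift pathComponents version, a sketch using java.net.URI to isolate the path before splitting it. The class and method names are hypothetical, and note that URI.create throws IllegalArgumentException if the string is not a valid URI.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class LastTwoSegments {
    static String lastTwoSegments(String url) {
        // getPath() excludes scheme, host, query and fragment; it is null for opaque URIs.
        String path = URI.create(url).getPath();
        List<String> segments = new ArrayList<>();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {   // skip the empty piece before the leading slash
                    segments.add(segment);
                }
            }
        }
        // Keep at most the last two segments, like suffix(2) in the Swift code.
        int from = Math.max(0, segments.size() - 2);
        return String.join("/", segments.subList(from, segments.size()));
    }

    public static void main(String[] args) {
        System.out.println(lastTwoSegments(
                "https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg"));
        // media/93939393hhs8.jpeg
    }
}
```

Unlike the plain substring versions, this does not fail when the path has fewer than two segments, and a query string never ends up in the result.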

regex or string parsing

I am trying to parse a string which has a specific pattern. An example valid string is as follows:
<STX><DATA><ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX></DATA><ETX>
<STX>A?234<ETX>
<STX><DATA><ETX>
<STX>name!abc<ETX>
<STX>age!24y<ETX>
<STX></DATA><ETX>
<STX>A?345<ETX>
<STX><DATA><ETX>
<STX>name!bac<ETX>
<STX>age!22y<ETX>
<STX></DATA><ETX>
<STX>OK<ETX>
<STX></DATA><ETX>
This data is sent by a device. All I need is to parse this string into records such as id:123, name:xyz, age:27y.
I am trying to use this regex:
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);
This does output the required data:
<ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX>
How can I loop over the string repeatedly to copy all the matches into a list of strings?
I am trying to loop over the string and delete each extracted pattern, but it doesn't get deleted.
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);// Q?(.*?)
final StringBuffer buff = new StringBuffer(frame);
final Matcher matcher = regex.matcher(buff);
while (matcher.find())
{
final String dataElements = matcher.group();
System.out.println("Data:" + dataElements);
}
}
Are there any better ways to do this?
This is the output I am currently getting:
Data:<DATA><ETX><STX>A?123<ETX><STX><DATA><ETX><STX>name!xyz<ETX><STX>age!27y<ETX><STX> </DATA>
Data:<DATA><ETX><STX>name!abc<ETX><STX>age!24y<ETX><STX></DATA>
Data:<DATA><ETX><STX>name!bac<ETX><STX>age!22y<ETX><STX></DATA>
I am missing the A?234 and A?345 in the next two matches.

I really don't know exactly what you want to achieve, but if you want to remove the occurrences of that pattern, this line:
buff.toString().replace(dataElements, "")
doesn't look right. It only edits a String copy of buff: String#replace returns a new String and leaves buff untouched. You have to write the edited version back into buff.

Using this regex solves my issue:
<STX>(A*)(.*?)<DATA>(.*?)</DATA>
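For collecting every record into a list, a sketch along these lines might work. The pattern is my own guess at the frame layout (an A?<digits> line followed by name! and age! lines), not the regex above, and the class name is made up. If a record can lack a name or age line, the lazy .*? would run on into the next record, so the pattern would need tightening.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FrameParser {
    // Assumed layout: id line, then the next name! value, then the next age! value.
    // DOTALL lets .*? span the line breaks between the <STX>...<ETX> lines.
    private static final Pattern RECORD = Pattern.compile(
            "A\\?(\\d+).*?name!(.*?)<ETX>.*?age!(.*?)<ETX>", Pattern.DOTALL);

    static List<String> parse(String frame) {
        List<String> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(frame);
        // Each find() resumes where the previous match ended, so nothing has to be deleted.
        while (matcher.find()) {
            records.add("id:" + matcher.group(1)
                    + " name:" + matcher.group(2)
                    + " age:" + matcher.group(3));
        }
        return records;
    }

    public static void main(String[] args) {
        String frame = String.join("\n",
                "<STX><DATA><ETX>",
                "<STX>A?123<ETX>", "<STX><DATA><ETX>",
                "<STX>name!xyz<ETX>", "<STX>age!27y<ETX>", "<STX></DATA><ETX>",
                "<STX>A?234<ETX>", "<STX><DATA><ETX>",
                "<STX>name!abc<ETX>", "<STX>age!24y<ETX>", "<STX></DATA><ETX>",
                "<STX>OK<ETX>", "<STX></DATA><ETX>");
        parse(frame).forEach(System.out::println);
        // id:123 name:xyz age:27y
        // id:234 name:abc age:24y
    }
}
```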

How to extract word from string?

Suppose I have a string:
String message = "you should try http://google.com/";
Now, I want to send "http://google.com/" to a new
String url
What I want to do is:
check if a "word" in the string begins with "http://" and extract that word, where a word is
something that's surrounded by spaces (the general English definition of a word).
I have no idea how to extract the string, and the best I can do is use startsWith on the whole string. How do I use startsWith on a word, and extract the word?
Sorry if this is a little bit difficult to explain.
Thanks in advance!
EDIT: Also, what should I do to extract the word from the REGEX operation? And how should I handle it if there is more than 1 url in the string?

Use the Pattern and Matcher classes.
String str = "blabla http://www.mywebsite.com blabla";
String regex = "((https?:\\/\\/)?(www.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_/.0-9#:+?%=&;,]*)?)?)";
Matcher m = Pattern.compile(regex).matcher(str);
if (m.find()) {
String url = m.group(); //value "http://www.mywebsite.com"
}
This regex works for http://..., https://... and even www... URLs. Other regexes can easily be found on the net.

You can try this:
String str = "blabla http://www.mywebsite.com blabla";
Matcher m = Pattern.compile("(http://.*)").matcher(str);
if (m.find()) {
String url = (new StringTokenizer(m.group(), " ")).nextToken();
}

The "correct" way to perform this task is to split the String on whitespace -- String#split("\\s+") -- and then pass each word to the URL constructor. If a word starts with your prefix and a MalformedURLException is thrown, it is invalid. The URL class constructor is far better tested and more robust than any solution that you or I could come up with. So use it, please, and don't reinvent the wheel.
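A rough sketch of that idea, which also handles more than one URL in the message. One substitution to be aware of: it goes through new URI(word).toURL() rather than calling the URL constructor directly, since the URL constructors are deprecated in recent JDK releases. The class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class UrlExtractor {
    static List<URL> extractUrls(String message) {
        List<URL> urls = new ArrayList<>();
        // A "word" is anything delimited by whitespace.
        for (String word : message.trim().split("\\s+")) {
            if (!word.startsWith("http://") && !word.startsWith("https://")) {
                continue;
            }
            try {
                // Let the JDK decide whether the word is a well-formed URL.
                urls.add(new URI(word).toURL());
            } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
                // Has the prefix but is not a valid URL: skip it.
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extractUrls("you should try http://google.com/"));
        // [http://google.com/]
    }
}
```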

You can use a Java regex for this.
The following regex catches any string starting with http:// or https:// up to the next whitespace character:
Pattern urlPattern = Pattern.compile("https?://\\S*");
Matcher matcher = urlPattern.matcher(myString);
// Use while instead of if to collect every URL in the string
if (matcher.find()) {
    String url = matcher.group();
}
