How to concatenate string or word inside each String index - java

I am having trouble encoding a url with combined Non-ASCII and spaces. For example, http://xxx.xx.xx.xx/resources/upload/pdf/APPLE ははは.pdf. I've read here that you need to encode only the last part of the path of the url.
Here's the code:
public static String getLastPathFromUrl(String url) {
return url.replaceFirst(".*/([^/?]+).*", "$1");
}
So now I have already APPLE ははは.pdf, next step is to replace spaces with %20 for the link to work BUT the problem is that if I encode APPLE%20ははは.pdf it becomes APPLE%2520%E3%81%AF%E3%81%AF%E3%81%AF.pdf. I should have APPLE%20%E3%81%AF%E3%81%AF%E3%81%AF.pdf.
So I decided to:
1. Separate each word from the link
2. Encode it
3. Concatenate the new encoded words, for example:
3.A. APPLE (APPLE)
3.B. %E3%81%AF%E3%81%AF%E3%81%AF.pdf (ははは.pdf)
with the (space) converted to %20, now becomes APPLE%20%E3%81%AF%E3%81%AF%E3%81%AF.pdf
Here's my code:
public static String[] splitWords(String sentence) {
String[] words = sentence.split(" ");
return words;
}
The calling code:
String urlLastPath = getLastPathFromUrl(pdfUrl);
String[] splitWords = splitWords(urlLastPath);
for (String word : splitWords) {
String urlEncoded = URLEncoder.encode(word, "utf-8"); //STUCKED HERE
}
I now want to concatenate each unicoded string(urlEncoded) inside the indices to finally form like APPLE%20%E3%81%AF%E3%81%AF%E3%81%AF.pdf. How do I do this?

actually the %20 is encoded as %2520 so just call URLEncoder.encode(word, "utf-8"); so you will get result like this APPLE+%E3%81%AF%E3%81%AF%E3%81%AF.pdf and in final result replace + with %20.

Do you want to do something like this:
// Get the whole url as string
Stirng urlString = pdfUrl.toString();
// get the string before the last path segment
String result = urlString.substring(0, urlString.lastIndexOf("/"));
String urlLastPath = getLastPathFromUrl(pdfUrl);
String[] splitWords = splitWords(urlLastPath);
for (String word : splitWords) {
String urlEncoded = URLEncoder.encode(word, "utf-8");
// add the encoded part to the url
result += urlEncoded;
}
Now the string result is your encoded URL as a string.

Possibly easy with org.apache.commons.io.FilenameUtils.
Split your url into baseUrl and the file name and extension.
Encode the file name and extension
Join them together
String url = "http://xxx.xx.xx.xx/resources/upload/pdf/APPLE ははは.pdf";
String baseUrl = FilenameUtils.getPath(url); // GIVES: http://xxx.xx.xx.xx/resources/upload/pdf/
String myFile = FilenameUtils.getBaseName(url)
+ "." + FilenameUtils.getExtension(url); // GIVES: APPLE ははは.pdf
String encoded = URLEncoder.encode(myFile, "UTF-8"); //GIVES: APPLE+%E3%81%AF%E3%81%AF%E3%81%AF.pdf
System.out.println(baseUrl + encoded);
Output:
http://xxx.xx.xx.xx/resources/upload/pdf/APPLE+%E3%81%AF%E3%81%AF%E3%81%AF.pdf

Don't reinvent the wheel. Use URLEncoder for encoding the URL.
URLEncoder.encode(yourArgumentsHere, "utf-8");
Moreover, where do you get your URL from, so that you have to split it before encoding? You should first build the arguments (last part), then just append it onto the base URL.

Related

How to include a ? in Java 11 HTTP Client URL? [duplicate]

Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
URLEncoder is the way to go. You only need to keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, for sure not the query string parameter separator character & nor the parameter name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
When you're still not on Java 10 or newer, then use StandardCharsets.UTF_8.toString() as charset argument, or when you're still not on Java 7 or newer, then use "UTF-8".
Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually to be used to represent spaces in URI itself (the part before the URI-query string separator character ?), not in query string (the part after ?).
Also note that there are three encode() methods. One without Charset as second argument and another with String as second argument which throws a checked exception. The one without Charset argument is deprecated. Never use it and always specify the Charset argument. The javadoc even explicitly recommends to use the UTF-8 encoding, as mandated by RFC3986 and W3C.
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
See also:
What every web developer must know about URL encoding
I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs), inefficient (it uses a StringBuffer instead of Builder and does a couple of other things that are slow) Its also way too easy to screw it up.
Instead I would use URIBuilder or Spring's org.springframework.web.util.UriUtils.encodeQuery or Commons Apache HttpClient.
The reason being you have to escape the query parameters name (ie BalusC's answer q) differently than the parameter value.
The only downside to the above (that I found out painfully) is that URL's are not a true subset of URI's.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query");
ub.addParameter("q", "random word £500 bank \$");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24
You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf"
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded. First we did simple URL encoding and then we converted it to an ASCII string to make sure no character outside US-ASCII remained in the string. This is exactly how browsers do it.
Guava 15 has now added a set of straightforward URL escapers.
The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the hostname!
4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded Unicode - (better would be NFKC!). For more information, see: How to encode properly this URL
In some cases it is advisable to check if the URL is already encoded. Also replace '+' encoded spaces with '%20' encoded spaces.
Here are some examples that will also work properly
{
"in" : "http://نامه‌ای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
Using Spring's UriComponentsBuilder:
UriComponentsBuilder
.fromUriString(url)
.build()
.encode()
.toUri()
The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil
Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
if (parameters == null) {
return url;
}
for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
if (!url.contains("?")) {
url += "?" + encodedKey + "=" + encodedValue;
} else {
url += "&" + encodedKey + "=" + encodedValue;
}
}
return url;
}
In Android, I would use this code:
Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word A3500 bank 24").build();
Where Uri is a android.net.Uri
In my case I just needed to pass the whole URL and encode only the value of each parameters.
I didn't find common code to do that, so (!!) so I created this small method to do the job:
public static String encodeUrl(String url) throws Exception {
if (url == null || !url.contains("?")) {
return url;
}
List<String> list = new ArrayList<>();
String rootUrl = url.split("\\?")[0] + "?";
String paramsUrl = url.replace(rootUrl, "");
List<String> paramsUrlList = Arrays.asList(paramsUrl.split("&"));
for (String param : paramsUrlList) {
if (param.contains("=")) {
String key = param.split("=")[0];
String value = param.replace(key + "=", "");
list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
}
else {
list.add(param);
}
}
return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.
Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // No change
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Changed
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Changed
System.out.println("url1 " + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);

ISO-8858-1 to UTF-8 only in URL, only invalid characters

Problem: sometimes we are getting links/phrases with invalid(for us) encoding.
Examples and my first solution below
Description:
I have to fix invalid encoded strings in one part of the application. Sometimes it is a word or phrase, but somtimes also a url. When its a URL I would like to change only wrongly encoded characters. If I decode with ISO and encode to UTF-8 the special url characters are also encoded (/ : ? = &). I coded a solution, which is working for my cases just fine, but those hashes you will see below are smelling badly to me.
Do you had a similar problem or do you know a library which allows to decode a phrase except some characters? Something like this:
decode(String value, char[] ignored)
I also though about braking URL into pieces and fix only path and query but it would be even more mess with parsing them etc..
TLDR: Decode ISO-8858-1 encoded URL and encode it to UTF-8. Dont touch URL specific characters (/ ? = : &)
Input/Output examples:
// wrong input
"http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4"
"t%E9l%E9phone"
// good output
"http://some.url/xxx/a/%C3%A4t%C3%BCr%C3%A4/b/%C3%A4t%C3%BCr%C3%A4"
"t%C3%A9l%C3%A9phone"
// very wrong output
"http%3A%2F%2Fsome.url%2Fxxx%2Fa%2F%C3%A4t%C3%BCr%C3%A4%2Fb%2F%C3%A4t%C3%BCr%C3%A4"
My first solution:
class EncodingFixer {
private static final String SLASH_HASH = UUID.randomUUID().toString();
private static final String QUESTION_HASH = UUID.randomUUID().toString();
private static final String EQUALS_HASH = UUID.randomUUID().toString();
private static final String AND_HASH = UUID.randomUUID().toString();
private static final String COLON_HASH = UUID.randomUUID().toString();
EncodingFixer() {
}
String fix(String value) {
if (isBlank(value)) {
return value;
}
return tryFix(value);
}
private String tryFix(String str) {
try {
String replaced = replaceWithHashes(str);
String fixed = java.net.URLEncoder.encode(java.net.URLDecoder.decode(replaced, ISO_8859_1), UTF_8);
return replaceBack(fixed);
} catch (Exception e) {
return str;
}
}
private String replaceWithHashes(String str) {
return str
.replaceAll("/", SLASH_HASH)
.replaceAll("\\?", QUESTION_HASH)
.replaceAll("=", EQUALS_HASH)
.replaceAll("&", AND_HASH)
.replaceAll(":", COLON_HASH);
}
private String replaceBack(String fixed) {
return fixed
.replaceAll(SLASH_HASH, "/")
.replaceAll(QUESTION_HASH, "?")
.replaceAll(EQUALS_HASH, "=")
.replaceAll(AND_HASH, "&")
.replaceAll(COLON_HASH, ":");
}
}
Or it should be more like: ???
Check if input is an URL
Create URL
Get path
Split by /
Fix every part
Put it back together
Same for query but little more complicated
??
I also though about it but it seems even more messy than those replaceAlls above :/
If you are able to recognize clearly that some string is an URL, then following user's #jschnasse answer in similar question on SO, this might be the solution you need:
URL url= new URL("http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL=uri.toASCIIString();
System.out.println(correctEncodedURL);
outputs:
http://some.url/xxx/a/%25e4t%25fcr%25E4/b/%25e4t%25fcr%25E4

How to encode the arabic words in a Url [duplicate]

How do you encode a URL in Android?
I thought it was like this:
final String encodedURL = URLEncoder.encode(urlAsString, "UTF-8");
URL url = new URL(encodedURL);
If I do the above, the http:// in urlAsString is replaced by http%3A%2F%2F in encodedURL and then I get a java.net.MalformedURLException when I use the URL.
You don't encode the entire URL, only parts of it that come from "unreliable sources".
Java:
String query = URLEncoder.encode("apples oranges", Charsets.UTF_8.name());
String url = "http://stackoverflow.com/search?q=" + query;
Kotlin:
val query: String = URLEncoder.encode("apples oranges", Charsets.UTF_8.name())
val url = "http://stackoverflow.com/search?q=$query"
Alternatively, you can use Strings.urlEncode(String str) of DroidParts that doesn't throw checked exceptions.
Or use something like
String uri = Uri.parse("http://...")
.buildUpon()
.appendQueryParameter("key", "val")
.build().toString();
I'm going to add one suggestion here. You can do this which avoids having to get any external libraries.
Give this a try:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.
This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.
The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
For android, I would use
String android.net.Uri.encode(String s)
Encodes characters in the given string as '%'-escaped octets using the UTF-8 scheme. Leaves letters ("A-Z", "a-z"), numbers ("0-9"), and unreserved characters ("_-!.~'()*") intact. Encodes all other characters.
Ex/
String urlEncoded = "http://stackoverflow.com/search?q=" + Uri.encode(query);
Also you can use this
private static final String ALLOWED_URI_CHARS = "##&=*+-_.,:!?()/~'%";
String urlEncoded = Uri.encode(path, ALLOWED_URI_CHARS);
it's the most simple method
try {
query = URLEncoder.encode(query, "utf-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
you can use below methods
public static String parseUrl(String surl) throws Exception
{
URL u = new URL(surl);
return new URI(u.getProtocol(), u.getAuthority(), u.getPath(), u.getQuery(), u.getRef()).toString();
}
or
public String parseURL(String url, Map<String, String> params)
{
Builder builder = Uri.parse(url).buildUpon();
for (String key : params.keySet())
{
builder.appendQueryParameter(key, params.get(key));
}
return builder.build().toString();
}
the second one is better than first.
Find Arabic chars and replace them with its UTF-8 encoding.
some thing like this:
for (int i = 0; i < urlAsString.length(); i++) {
if (urlAsString.charAt(i) > 255) {
urlAsString = urlAsString.substring(0, i) + URLEncoder.encode(urlAsString.charAt(i)+"", "UTF-8") + urlAsString.substring(i+1);
}
}
encodedURL = urlAsString;

Java convert string into url title characters only [duplicate]

How do you encode a URL in Android?
I thought it was like this:
final String encodedURL = URLEncoder.encode(urlAsString, "UTF-8");
URL url = new URL(encodedURL);
If I do the above, the http:// in urlAsString is replaced by http%3A%2F%2F in encodedURL and then I get a java.net.MalformedURLException when I use the URL.
You don't encode the entire URL, only parts of it that come from "unreliable sources".
Java:
String query = URLEncoder.encode("apples oranges", Charsets.UTF_8.name());
String url = "http://stackoverflow.com/search?q=" + query;
Kotlin:
val query: String = URLEncoder.encode("apples oranges", Charsets.UTF_8.name())
val url = "http://stackoverflow.com/search?q=$query"
Alternatively, you can use Strings.urlEncode(String str) of DroidParts that doesn't throw checked exceptions.
Or use something like
String uri = Uri.parse("http://...")
.buildUpon()
.appendQueryParameter("key", "val")
.build().toString();
I'm going to add one suggestion here. You can do this which avoids having to get any external libraries.
Give this a try:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.
This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.
The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
For android, I would use
String android.net.Uri.encode(String s)
Encodes characters in the given string as '%'-escaped octets using the UTF-8 scheme. Leaves letters ("A-Z", "a-z"), numbers ("0-9"), and unreserved characters ("_-!.~'()*") intact. Encodes all other characters.
Ex/
String urlEncoded = "http://stackoverflow.com/search?q=" + Uri.encode(query);
Also you can use this
private static final String ALLOWED_URI_CHARS = "##&=*+-_.,:!?()/~'%";
String urlEncoded = Uri.encode(path, ALLOWED_URI_CHARS);
it's the most simple method
try {
query = URLEncoder.encode(query, "utf-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
you can use below methods
public static String parseUrl(String surl) throws Exception
{
URL u = new URL(surl);
return new URI(u.getProtocol(), u.getAuthority(), u.getPath(), u.getQuery(), u.getRef()).toString();
}
or
public String parseURL(String url, Map<String, String> params)
{
Builder builder = Uri.parse(url).buildUpon();
for (String key : params.keySet())
{
builder.appendQueryParameter(key, params.get(key));
}
return builder.build().toString();
}
the second one is better than first.
Find Arabic chars and replace them with its UTF-8 encoding.
some thing like this:
for (int i = 0; i < urlAsString.length(); i++) {
if (urlAsString.charAt(i) > 255) {
urlAsString = urlAsString.substring(0, i) + URLEncoder.encode(urlAsString.charAt(i)+"", "UTF-8") + urlAsString.substring(i+1);
}
}
encodedURL = urlAsString;

Regex to extract the filename and drop the file timestamp from complete path in Java

i have complete file path and i just need to extract the filename and just extension. So my output would be fileName.csv.
For ex: complete path is:
/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv
My output of Regex should be fileName.csv.
Extension and level of directories are not fixed.
As part of my requirement i need single regex that can extract fileName.csv not fileName_20150108_002_20150109013841.csv.how can i do it in single regular expression ?
Without using regex this can be solved as -public static String getFileName(String args){
args = args.substring(args.lastIndexOf('/')+1);
return args.substring(0,args.indexOf('_')) + args.substring(args.indexOf('.'));
}
Below would work for you might be
[^\\/:*?"<>|\r\n]+$
This regex has been tested on these two examples:
\var\www\www.example.com\index.jsp
\index.jsp
or rather you should use File.getName() for better approach.
String filename = new File("Payload/brownie.app/Info.plist").getName();
System.out.println(filename)
another way is
int index = path.lastIndexOf(File.separatorChar);
String filename = path.substring(index+1);
finally after getting the full filename use below code snippet
String str = filename;// in your case filename will be fileName_20150108_002_20150109013841.csv
str = str.substring(0,str.indexOf('_'))+str.substring(str.lastIndexOf('.'));
System.out.println("filename is ::"+str); // output will be fileName.csv
In the below code, group one will be fileName_timestamp.extension. I've replaced numerics and underscores with empty string. This may look ugly, but still will server your purpose. If the file name contains numerics, we need go for a different approach.
public static void main(String[] args) {
String toBeSplitted = "/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv";
Pattern r = Pattern.compile("(/[a-zA-Z0-9_.-]+)+/?");
Matcher m = r.matcher(toBeSplitted);
if(m.matches()){
String s = m.group(1).replaceAll("(/|[0-9]|_)", "");
System.out.println(s);
}
}

Categories

Resources