I got this error message:
java.net.URISyntaxException: Illegal character in query at index 31: http://finance.yahoo.com/q/h?s=^IXIC
My_Url = http://finance.yahoo.com/q/h?s=^IXIC
When I paste it into a browser's address bar it shows the correct page, so it is a valid URL, but I can't parse it with new URI(My_Url).
I tried My_Url = My_Url.replace("^", "\\^"), but:
It is no longer the URL I need.
It doesn't work either.
How should I handle this?
Frank
You need to encode the URI so that illegal characters are replaced with legal percent-encoded ones. If you first make a URL (so you don't have to do the parsing yourself) and then make a URI using the five-argument constructor (scheme, authority, path, query, fragment), the constructor will do the encoding for you.
import java.net.*;
public class Test {
public static void main(String[] args) {
String myURL = "http://finance.yahoo.com/q/h?s=^IXIC";
try {
URL url = new URL(myURL);
String nullFragment = null;
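// The second argument is the authority; getAuthority() keeps any user info and port that getHost() would drop
URI uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), nullFragment);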
System.out.println("URI " + uri.toString() + " is OK");
} catch (MalformedURLException e) {
System.out.println("URL " + myURL + " is a malformed URL");
} catch (URISyntaxException e) {
System.out.println("URI " + myURL + " is a malformed URI");
}
}
}
Use % encoding for the ^ character, viz. http://finance.yahoo.com/q/h?s=%5EIXIC
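As a quick check of the percent-encoded form above, the following sketch shows that java.net.URI rejects the raw ^ but accepts %5E, and that getQuery() decodes the escape again. The class and method names are illustrative only and do not come from the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CaretEncodingDemo {
    // Returns true if java.net.URI accepts the string as-is.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "http://finance.yahoo.com/q/h?s=^IXIC";
        String encoded = "http://finance.yahoo.com/q/h?s=%5EIXIC";
        System.out.println(parses(raw));       // false: '^' is illegal in a query
        System.out.println(parses(encoded));   // true
        URI uri = URI.create(encoded);
        System.out.println(uri.getRawQuery()); // s=%5EIXIC (escape kept)
        System.out.println(uri.getQuery());    // s=^IXIC (escape decoded)
    }
}
```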
You have to encode your parameters.
Something like this will do:
import java.net.*;
import java.io.*;
public class EncodeParameter {
public static void main( String [] args ) throws URISyntaxException ,
UnsupportedEncodingException {
String myQuery = "^IXIC";
URI uri = new URI( String.format(
"http://finance.yahoo.com/q/h?s=%s",
URLEncoder.encode( myQuery , "UTF-8" ) ) );
System.out.println( uri );
}
}
http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html
Rather than encoding the URL beforehand, you can do the following:
String link = "http://example.com";
URL url = null;
URI uri = null;
try {
url = new URL(link);
} catch(MalformedURLException e) {
e.printStackTrace();
}
try {
uri = new URI(url.toString());
} catch (URISyntaxException e) {
// Parsing failed, so rebuild the URI from its parts; the multi-argument constructor quotes illegal characters
try {
uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
url.getPort(), url.getPath(), url.getQuery(),
url.getRef());
} catch (URISyntaxException e1) {
e1.printStackTrace();
}
}
try {
url = uri.toURL();
} catch (MalformedURLException e) {
e.printStackTrace();
}
String encodedLink = url.toString();
A general solution requires parsing the URL into an RFC 2396 compliant URI (note that this is an old version of the URI standard, which is the one java.net.URI uses).
I have written a Java URL parsing library that makes this possible: galimatias. With this library, you can achieve your desired behaviour with this code:
String urlString = //...
URLParsingSettings settings = URLParsingSettings.create()
.withStandard(URLParsingSettings.Standard.RFC_2396);
URL url = URL.parse(settings, urlString);
Note that galimatias is in a very early stage and some features are experimental, but it is already quite solid for this use case.
A space is encoded to %20 in URLs, and to + in forms submitted data (content type application/x-www-form-urlencoded). You need the former.
Using Guava:
dependencies {
compile 'com.google.guava:guava:28.1-jre'
}
You can use UrlEscapers:
String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);
Don't use String.replace, this would only encode the space. Use a library instead.
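To see the two space encodings side by side without any extra dependency, here is a small sketch that uses only the JDK. The class name, the method names and the example.com host are made up for illustration. URLEncoder produces the form-style +, while the multi-argument URI constructor produces %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class SpaceEncodingDemo {
    // Form-style encoding (application/x-www-form-urlencoded): a space becomes '+'.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI-style encoding: the five-argument constructor quotes a space as %20.
    static String uriEncode(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("Incontinence Liners"));             // Incontinence+Liners
        System.out.println(uriEncode("example.com", "/my docs", "q=a b")); // http://example.com/my%20docs?q=a%20b
    }
}
```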
I couldn't come up with anything better for a URL such as
http://server.ru:8080/template/get?type=mail&format=html&key=ecm_task_assignment&label=Согласовать с контрагентом&descr=Описание&objectid=2231
than this:
public static boolean checkForExternal(String str) {
int length = str.length();
for (int i = 0; i < length; i++) {
if (str.charAt(i) > 0x7F) {
return true;
}
}
return false;
}
private static final Pattern COLON = Pattern.compile("%3A", Pattern.LITERAL);
private static final Pattern SLASH = Pattern.compile("%2F", Pattern.LITERAL);
private static final Pattern QUEST_MARK = Pattern.compile("%3F", Pattern.LITERAL);
private static final Pattern EQUAL = Pattern.compile("%3D", Pattern.LITERAL);
private static final Pattern AMP = Pattern.compile("%26", Pattern.LITERAL);
public static String encodeUrl(String url) {
if (checkForExternal(url)) {
try {
String value = URLEncoder.encode(url, "UTF-8");
value = COLON.matcher(value).replaceAll(":");
value = SLASH.matcher(value).replaceAll("/");
value = QUEST_MARK.matcher(value).replaceAll("?");
value = EQUAL.matcher(value).replaceAll("=");
return AMP.matcher(value).replaceAll("&");
} catch (UnsupportedEncodingException e) {
throw LOGGER.getIllegalStateException(e);
}
} else {
return url;
}
}
I hit this exception in a test that checks URLs actually accessed by users.
Those URLs sometimes contain an illegal character, and the test fails with this error.
So I wrote a function that encodes only the illegal characters in the URL string, like this:
String encodeIllegalChar(String uriStr,String enc)
throws URISyntaxException,UnsupportedEncodingException {
String _uriStr = uriStr;
int retryCount = 17;
while(true){
try{
new URI(_uriStr);
break;
}catch(URISyntaxException e){
String reason = e.getReason();
if(reason == null ||
!(
reason.contains("in path") ||
reason.contains("in query") ||
reason.contains("in fragment")
)
){
throw e;
}
if(0 > retryCount--){
throw e;
}
String input = e.getInput();
int idx = e.getIndex();
String illChar = String.valueOf(input.charAt(idx));
_uriStr = input.replace(illChar,URLEncoder.encode(illChar,enc));
}
}
return _uriStr;
}
test:
String q = "\\'|&`^\"<>)(}{][";
String url = "http://test.com/?q=" + q + "#" + q;
String eic = encodeIllegalChar(url, "UTF-8");
System.out.println(String.format(" original:%s",url));
System.out.println(String.format(" encoded:%s",eic));
System.out.println(String.format(" uri-obj:%s",new URI(eic)));
System.out.println(String.format("re-decoded:%s",URLDecoder.decode(eic, "UTF-8")));
If you're using RestangularV2 to post to a spring controller in java you can get this exception if you use RestangularV2.one() instead of RestangularV2.all()
Replace the spaces in the URL with +. For example, if the URL contains dimension1=Incontinence Liners, replace it with dimension1=Incontinence+Liners.
Related
I need a library to extract a file's full name from its URL (a direct download link). I want a powerful library. I use FileNameUtils from Apache Commons, but that class does not handle a lot of URLs.
I want a library that supports these URLs:
https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1
name : file_795f32460d111df334849ee8336e56ca.mp4
http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip
name : Jozve-Kamele-arbi.abp.zip
http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar
name : dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar
https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j
name: pdf.pdf
Can anyone help me? Thanks.
I apologize in advance if my grammar is not correct; I don't speak English well.
You could also try to solve this problem with a regular expression (e.g. (?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b) if you have a list of the file extensions you are interested in.
Here is an example of such a method, which extracts a file name from a URL:
private static final String EXTENSIONS = "ez|aw|atom|atomcat|atomsvc|ccxml|cdmia|cdmic|cdmid|cdmio|cdmiq|cu|davmount|dbk|dssc|xdssc|ecma|emma|epub|exi|pfr|gml|gpx|gxf|stk|ipfix|jar|ser|class|js|json|jsonml|lostxml|hqx|cpt|mads|mrc|mrcx|mathml|mbox|mscml|metalink|meta4|mets|mods|mp4s|mp4|mxf|oda|opf|ogx|omdoc|oxps|xer|pdf|pgp|prf|p10|p7s|p8|ac|cer|crl|pkipath|pki|pls|cww|pskcxml|rdf|rif|rnc|rl|rld|rs|gbr|mft|roa|rsd|rss|rtf|sbml|scq|scs|spq|spp|sdp|setpay|setreg|shf|rq|srx|gram|grxml|sru|ssdl|ssml|tfi|tsd|plb|psb|pvb|tcap|pwn|aso|imp|acu|air|fcdt|xdp|xfdf|ahead|azf|azs|azw|acc|ami|apk|cii|fti|atx|mpkg|m3u8|swi|iota|aep|mpm|bmi|rep|cdxml|mmd|cdy|cla|rp9|c11amc|c11amz|csp|cdbcmsg|cmc|clkx|clkk|clkp|clkt|clkw|wbs|pml|ppd|car|pcurl|dart|rdz|fe_launch|dna|mlp|dpg|dfac|kpxx|ait|svc|geo|mag|nml|esf|msf|qam|slt|ssf|ez2|ez3|fdf|mseed|gph|ftc|fnc|ltf|fsc|oas|oa2|oa3|fg5|bh2|ddd|xdw|xbd|fzs|txd|ggb|ggt|gxt|g2w|g3w|gmx|kml|kmz|gac|ghf|gim|grv|gtm|tpl|vcg|hal|zmm|hbci|les|hpgl|hpid|hps|jlt|pcl|pclxl|sfd-hdstx|mpy|irm|sc|igl|ivp|ivu|igm|i2g|qbo|qfx|rcprofile|irp|xpr|fcs|jam|rms|jisp|joda|karbon|chrt|kfo|flw|kon|ksp|htke|kia|sse|lasxml|lbd|lbe|123|apr|pre|nsf|org|scm|lwp|portpkg|mcd|mc1|cdkey|mwf|mfm|flo|igx|mif|daf|dis|mbk|mqy|msl|plc|txf|mpn|mpc|xul|cil|cab|xlam|xlsb|xlsm|xltm|eot|chm|ims|lrm|thmx|cat|stl|ppam|pptm|sldm|ppsm|potm|docm|dotm|wpl|xps|mseq|mus|msty|taglet|nlu|nnd|nns|nnw|ngdat|n-gage|rpst|rpss|edm|edx|ext|odc|otc|odb|odf|odft|odg|otg|odi|oti|odp|otp|ods|ots|odt|odm|ott|oth|xo|dd2|oxt|pptx|sldx|ppsx|potx|xlsx|xltx|docx|dotx|mgp|dp|esa|paw|str|ei6|efif|wg|plf|pbd|box|mgz|qps|ptid|bed|mxl|musicxml|cryptonote|cod|rm|rmvb|link66|st|see|sema|semd|semf|ifm|itp|iif|ipk|mmf|teacher|dxp|sfs|sdc|sda|sdd|smf|sgl|smzip|sm|sxc|stc|sxd|std|sxi|sti|sxm|sxw|sxg|stw|svd|xsm|bdm|xdm|tao|tmo|tpt|mxs|tra|utz|umj|unityweb|uoml|vcx|vis|vsf|wbxml|wmlc|wmlsc|wtb|nbp|wpd|wqd|stf|xar|xfdl|hvd|hvs|hvp|osf|osfpvg|saf|spf|cmp|zaz|vxml|wgt|hlp|wsdl|wspolicy|7z|abw|ace|dmg|aam|aas|bcpio|torrent|bz|
vcd|cfs|chat|pgn|nsc|cpio|csh|dgc|wad|ncx|dtb|res|dvi|evy|eva|bdf|gsf|psf|pcf|snf|arc|spl|gca|ulx|gnumeric|gramps|gtar|hdf|install|iso|jnlp|latex|mie|application|lnk|wmd|wmz|xbap|mdb|obd|crd|clp|mny|pub|scd|trm|wri|nzb|p7r|rar|ris|sh|shar|swf|xap|sql|sit|sitx|srt|sv4cpio|sv4crc|t3|gam|tar|tcl|tex|tfm|obj|ustar|src|fig|xlf|xpi|xz|xaml|xdf|xenc|dtd|xop|xpl|xslt|xspf|yang|yin|zip|adp|s3m|sil|eol|dra|dts|dtshd|lvp|pya|ecelp4800|ecelp7470|ecelp9600|rip|weba|aac|caf|flac|mka|m3u|wax|wma|rmp|wav|xm|cdx|cif|cmdf|cml|csml|xyz|ttc|otf|ttf|woff|woff2|bmp|cgm|g3|gif|ief|ktx|png|btif|sgi|psd|sub|dwg|dxf|fbs|fpx|fst|mmr|rlc|mdi|wdp|npx|wbmp|xif|webp|3ds|ras|cmx|ico|sid|pcx|pnm|pbm|pgm|ppm|rgb|tga|xbm|xpm|xwd|dae|dwf|gdl|gtw|mts|vtu|appcache|css|csv|n3|dsc|rtx|tsv|ttl|vcard|curl|dcurl|mcurl|scurl|sub|fly|flx|gv|3dml|spot|jad|wml|wmls|java|nfo|opml|etx|sfv|uu|vcs|vcf|3gp|3g2|h261|h263|h264|jpgv|ogv|dvb|fvt|pyv|viv|webm|f4v|fli|flv|m4v|mng|vob|wm|wmv|wmx|wvx|avi|movie|smv|ice";
private static final Pattern FILE_DETECT = Pattern.compile("(?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b");
public static Optional<String> extractFileFrom(String url) {
Matcher matcher = FILE_DETECT.matcher(url);
return (matcher.find()) ? Optional.of(matcher.group(1)) : Optional.empty();
}
And here is a test which demonstrates how to use the method above:
public static void main(String[] args) {
List<String> strings = Arrays.asList(
"https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1",
"http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip",
"http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar",
"https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j",
"https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.PDF&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j");
strings.stream().map(s -> extractFileFrom(s)).collect(Collectors.toList())
.forEach(System.out::println);
}
If you execute the main method you will see this on the console:
Optional[file_795f32460d111df334849ee8336e56ca.mp4]
Optional[Jozve-Kamele-arbi.abp.zip]
Optional[Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar]
Optional[pdf.pdf]
Optional[pdf.PDF]
I use this method; I hope it helps you too. It also cuts the name off at a question mark or hash.
public static String parseFileNameFromUrl(String url) {
if (url == null) {
return "";
}
try {
URL res = new URL(url);
String resHost = res.getHost();
if (resHost.length() > 0 && url.endsWith(resHost)) {
// handle ...example.com
return "";
}
} catch (MalformedURLException e) {
e.printStackTrace();
return "";
}
int startIndex = url.lastIndexOf('/') + 1;
int length = url.length();
// find end index for ?
int lastQuestionMarkPos = url.lastIndexOf('?');
if (lastQuestionMarkPos == -1) {
lastQuestionMarkPos = length;
}
// find end index for #
int lastHashPos = url.lastIndexOf('#');
if (lastHashPos == -1) {
lastHashPos = length;
}
// calculate the end index
int endIndex = Math.min(lastQuestionMarkPos, lastHashPos);
return url.substring(startIndex, endIndex);
}
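An alternative sketch lets java.net.URI do the splitting, so that slashes inside the query string (as in the google.com/url example from the question) cannot be mistaken for path separators. The class and method names are illustrative. It only looks at the path, so it does not find a file name that lives in a query parameter.

```java
import java.net.URI;

class PathFileName {
    // Returns the last path segment of the URL, or "" if there is none.
    static String lastPathSegment(String url) {
        String path;
        try {
            path = URI.create(url).getPath(); // excludes the query and fragment
        } catch (IllegalArgumentException e) {
            return ""; // not a syntactically valid URI
        }
        if (path == null) {
            return "";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastPathSegment(
            "https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&download=1"));
        System.out.println(lastPathSegment("http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip"));
        // The path here is just /url, so the nested URL in the query is ignored
        System.out.println(lastPathSegment("https://www.google.com/url?sa=t&url=http://www.pdf995.com/samples/pdf.pdf"));
    }
}
```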
I have a "moreinfo" directory that contains some HTML files and other folders. I am searching for the file in the moreinfo directory (not in its subdirectories) that matches toolId*. The file's name is the same as the toolId.
Below is a snippet showing how I wrote it. When toolId = delegatedAccess, the list returns two files (delegatedAccess.html and delegatedAccess.shopping.html) because of the wildcard filter (toolId*).
Is there a better way of writing the regular expression so that it checks up to the last period and returns only the file that matches my toolId exactly?
infoDir =/Users/moreinfo
private String getMoreInfoUrl(File infoDir, String toolId) {
String moreInfoUrl = null;
try {
Collection<File> files = FileUtils.listFiles(infoDir, new WildcardFileFilter(toolId+"*"), null);
if (files.isEmpty()==false) {
File mFile = files.iterator().next();
moreInfoUrl = libraryPath + mFile.getName(); // toolId;
}
} catch (Exception e) {
M_log.info("unable to read moreinfo" + e.getMessage());
}
return moreInfoUrl;
}
This is what I ended up doing, thanks to all the great comments. I solved my problem with string manipulation, as a regex was not the right tool for it.
private String getMoreInfoUrl(File infoDir, String toolId) {
String moreInfoUrl = null;
try {
Collection<File> files = FileUtils.listFiles(infoDir, new WildcardFileFilter(toolId+"*"), null);
if (files.isEmpty()==false) {
for (File mFile : files) {
String name = mFile.getName();
int lastIndexOfPeriod = name.lastIndexOf('.');
// A name without a period has no extension to strip
String fNameWithOutExtension = lastIndexOfPeriod < 0 ? name : name.substring(0, lastIndexOfPeriod);
if(fNameWithOutExtension.equals(toolId)) {
moreInfoUrl = libraryPath + mFile.getName();
break;
}
}
}
} catch (Exception e) {
M_log.info("unable to read moreinfo" + e.getMessage());
}
return moreInfoUrl;
}
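The exact-match check can also be pulled out into a small helper that is easy to test on its own. This is only a sketch with hypothetical class and method names, and it handles file names that contain no period as well.

```java
class ToolIdMatcher {
    // Strips everything from the last period onward; names without a period are returned unchanged.
    static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // True only when the name without its extension equals the toolId exactly.
    static boolean matchesToolId(String fileName, String toolId) {
        return stripExtension(fileName).equals(toolId);
    }

    public static void main(String[] args) {
        System.out.println(matchesToolId("delegatedAccess.html", "delegatedAccess"));          // true
        System.out.println(matchesToolId("delegatedAccess.shopping.html", "delegatedAccess")); // false
    }
}
```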
I need to pass an encoded URL, but it needs to avoid special characters as well. How do I encode it? None of the answers on Stack Overflow worked for me. Can anyone help? I want to do it in Java.
public String tweet_quote_func(Map attrs, FilterParam param) {
String url = param.getShorturi();
String text = attrs.get("display");
if (url != null && text != null) {
try {
text = StringEscapeUtils.unescapeHtml(text); // for double quotes
String encodedurl = "https://twitter.com/intent/tweet?url="+URLEncoder.encode(url, "UTF-8");
encodedurl = encodedurl + "&text=" + StringUtil.escapeUrl(text);
return "<span class=\"tweet_quote\"> " + text.trim() + "<span></span></span>";
} catch (UnsupportedEncodingException e) {
System.err.println(e);
return null;
}
} else
return null;
}
You should use URLEncoder for all URL query argument names and their values, not StringUtil.escapeUrl(), whatever that is.
I don't know what you think is different about 'the special characters ".", "-", "*", and "_"', but URLEncoder is defined to do the right thing.
For the URL paths themselves you should use new URI(null, path, null).toASCIIString().
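As a rough sketch of that advice applied to the question's tweet link: every query value goes through URLEncoder, and a path goes through the three-argument URI constructor. The class and method names are invented for illustration; only the intent URL is taken from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class TweetLinkBuilder {
    // Encodes a single query argument name or value.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes a URL path, leaving the '/' separators intact.
    static String encodePath(String path) {
        try {
            return new URI(null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Builds the intent link with both query values encoded the same way.
    static String tweetIntent(String url, String text) {
        return "https://twitter.com/intent/tweet?url=" + encodeQueryValue(url)
                + "&text=" + encodeQueryValue(text);
    }

    public static void main(String[] args) {
        System.out.println(tweetIntent("http://t.co/x", "say \"hi\" & bye"));
        System.out.println(encodePath("/my docs/file.txt"));
    }
}
```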
I'm new to Java. I want to get all of the URLs in the text below:
WEBSITE1 https://localhost:8080/admin/index.php?page=home
WEBSITE2 https://192.168.0.3:8084/index.php
WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home
WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum
the result that I want is:
https://localhost:8080
https://192.168.0.3:8084
https://192.168.0.5
https://192.168.0.1:8080
I want to store them in a LinkedList or an array too.
Can somebody teach me?
Thank You
This is how you can do this. I did one for you and you do the rest :)
try {
ArrayList<String> urls = new ArrayList<String>();
URL aURL = new URL("https://localhost:8080/admin/index.php?page=home");
// Rebuild scheme://host:port; note that getPort() returns -1 when the URL has no explicit port
String origin = aURL.getProtocol() + "://" + aURL.getHost() + ":" + aURL.getPort();
System.out.println("origin = " + origin);
urls.add(origin);
} catch (MalformedURLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Use a simple regexp to locate whatever starts with https?:// and then extract everything up to the first /
Matcher m = Pattern.compile("(https?://[^/]+)").matcher(//
"WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n" + //
"WEBSITE2 https://192.168.0.3:8084/index.php\r\n" + //
"WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n" + //
"WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum");
List<String> urls = new ArrayList<String>();
while (m.find()) {
urls.add(m.group(1));
}
System.out.println(urls);
Now if you want only the WEBSITE part, just replace the regular expression "(https?://[^/]+)" with the following one: "(.*?)\\s+https?". The rest of the code stays untouched.
Let's say the line represents a single line (probably in a loop):
//get the index of "https" in the string
int indexOfHTTPS= line.indexOf("https://");
//get the index of the first "/" after the "https"
int indexOfFirstSlashAfterHTTPS= line.indexOf("/", indexOfHTTPS + "https://".length());
//take a string between "https" and the first "/"
String url = line.substring(indexOfHTTPS, indexOfFirstSlashAfterHTTPS);
Later on, add this url to an ArrayList<String>:
ArrayList<String> urlList= new ArrayList<String>();
urlList.add(url);
You can do it with the help of URL class.
public static void main(String[] args) throws MalformedURLException {
String string ="https://192.168.0.5:9090/controller/index.php?page=home";
URL url= new URL(string);
String result ="https://"+url.getHost()+":"+url.getPort();
System.out.println(result);
}
Output :https://192.168.0.5:9090
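One thing to watch for with this approach is that getPort() returns -1 when the URL has no explicit port. A sketch that handles that case and collects the results from the question's text into a list might look like this. The class and method names are illustrative, and java.net.URI is used instead of URL purely as a choice for this sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class OriginExtractor {
    // Builds scheme://host[:port] from a URL string; the port is omitted when not given.
    static String origin(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort(); // -1 when the URL has no explicit port
        return uri.getScheme() + "://" + uri.getHost() + (port == -1 ? "" : ":" + port);
    }

    // Each line looks like "WEBSITE1 https://..."; the URL is the last whitespace-separated token.
    static List<String> origins(String text) {
        List<String> result = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            result.add(origin(tokens[tokens.length - 1]));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php\n"
                + "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\n"
                + "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum";
        System.out.println(origins(text));
    }
}
```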
You could either try to find the index of the protocol substring ("http[s]") in the Strings, or use a simple Pattern (only for matching the "website[0-9]" head, not to apply to the URLs).
Here's a solution with the Pattern.
String webSite1 = "WEBSITE1 https://localhost:8080/admin/index.php?page=home";
String webSite2 = "WEBSITE2 https://192.168.0.3:8084/index.php";
String webSite3 = "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home";
String webSite4 = "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum";
ArrayList<URI> uris = new ArrayList<URI>();
Pattern pattern = Pattern.compile("^website\\d+\\s+?(.+)", Pattern.CASE_INSENSITIVE);
Matcher matcher;
matcher = pattern.matcher(webSite1);
if (matcher.find()) {
try {
uris.add(new URI(matcher.group(1)));
}
catch (URISyntaxException use) {
use.printStackTrace();
}
}
matcher = pattern.matcher(webSite2);
if (matcher.find()) {
try {
uris.add(new URI(matcher.group(1)));
}
catch (URISyntaxException use) {
use.printStackTrace();
}
}
matcher = pattern.matcher(webSite3);
if (matcher.find()) {
try {
uris.add(new URI(matcher.group(1)));
}
catch (URISyntaxException use) {
use.printStackTrace();
}
}
matcher = pattern.matcher(webSite4);
if (matcher.find()) {
try {
uris.add(new URI(matcher.group(1)));
}
catch (URISyntaxException use) {
use.printStackTrace();
}
}
System.out.println(uris);
Output:
[https://localhost:8080/admin/index.php?page=home, https://192.168.0.3:8084/index.php, https://192.168.0.5:9090/controller/index.php?page=home, https://192.168.0.1:8080/home/index.php?page=forum]
I merge two URLs with the following code.
String strUrl1 = "http://www.domainname.com/path1/2012/04/25/file.php";
String arg = "?page=2";
URL url1;
try {
url1 = new URL(strUrl1);
URL reconUrl1 = new URL(url1,arg);
System.out.println(" url : " + reconUrl1.toString());
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
I'm surprised by the result: http://www.domainname.com/path1/2012/04/25/?page=2
I expected it to be (as a browser does): http://www.domainname.com/path1/2012/04/25/file.php?page=2
The Javadoc for the constructor URL(URL context, String spec) explains that it should follow the RFC.
Am I doing something wrong?
Thanks
UPDATE :
This is the only problem I encountered with the function.
The code already works in all other cases, as a browser does:
"domain.com/folder/sub" + "/test" -> "domain.com/test"
"domain.com/folder/sub/" + "test" -> "domain.com/folder/sub/test"
"domain.com/folder/sub/" + "../test" -> "domain.com/folder/test"
...
You can always merge the strings first and then create the URL from the merged string.
StringBuffer buf = new StringBuffer();
buf.append(strUrl1);
buf.append(arg);
URL url1 = new URL(buf.toString());
Try:
String k = strUrl1 + arg;
URL url1;
try {
url1 = new URL(k);
//URL reconUrl1 = new URL(url1,arg);
System.out.println(" url : " + url1.toString());
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
I haven't read through the RFC, but the context (as mentioned in the Javadoc for URL) is presumably the directory of a URL, which means that the context of
"http://www.domainname.com/path1/2012/04/25/file.php"
is
"http://www.domainname.com/path1/2012/04/25/"
which is why
new URL(url1,arg);
yields
"http://www.domainname.com/path1/2012/04/25/?page=2"
The "workaround" is obviously to concatenate the parts yourself, using +.
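If you go the concatenation route, a small helper can keep the other cases from the question working by handing everything except a query-only spec to java.net.URI.resolve. This is a sketch under that assumption, with hypothetical class and method names.

```java
import java.net.URI;

class UrlMerger {
    // Resolves spec against base; a spec that is only a query ("?page=2") keeps the base's file name.
    static String merge(String base, String spec) {
        if (spec.startsWith("?")) {
            // Drop any existing query or fragment from the base, then append the new query.
            int end = base.length();
            int q = base.indexOf('?');
            int h = base.indexOf('#');
            if (q >= 0) {
                end = Math.min(end, q);
            }
            if (h >= 0) {
                end = Math.min(end, h);
            }
            return base.substring(0, end) + spec;
        }
        // Everything else (absolute paths, relative paths, "..") goes through URI.resolve.
        return URI.create(base).resolve(spec).toString();
    }

    public static void main(String[] args) {
        System.out.println(merge("http://www.domainname.com/path1/2012/04/25/file.php", "?page=2"));
        System.out.println(merge("http://domain.com/folder/sub/", "../test"));
    }
}
```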
You are using the URL constructor that takes the parameters URL(URL context, String spec). So don't pass the PHP page as part of the context URL; pass it in the spec string instead. The context needs to be the directory. The proper way to do this would be:
String strUrl1 = "http://www.domainname.com/path1/2012/04/25";
String arg = "/file.php?page=2";
URL url1;
try {
url1 = new URL(strUrl1);
URL reconUrl1 = new URL(url1,arg);
System.out.println(" url : " + reconUrl1.toString());
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
Try this
String strUrl1 = "http://www.domainname.com/path1/2012/04/25/";
String arg = "file.php?page=2";
URL url1;
try {
url1 = new URL(strUrl1);
URL reconUrl1 = new URL(url1,arg);
System.out.println(" url : " + reconUrl1.toString());
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
The Javadoc talks about the context of the specified URL,
which is the domain and the path:
"http://www.domainname.com" + "/path1/2012/04/25/"
Here "file.php" is treated as the spec, which is resolved against the context mentioned above.
This two-parameter overloaded constructor uses the context of a URL as the base and adds the second parameter to create a complete URL, which is not what you need.
So it's better to split the string into these two parts and then create the URL from them:
String contextURL = "http://www.domainname.com/path1/2012/04/25/";
String textURL = "file.php?page=2";
URL url;
try {
url = new URL(contextURL);
URL reconUrl = new URL(url, textURL);
System.out.println(" url : " + reconUrl.toString());
} catch (MalformedURLException murle) {
murle.printStackTrace();
}