Get Specific URL from the Text/String Using Java


I'm new to Java. I want to extract all of the URLs from the text below:
WEBSITE1 https://localhost:8080/admin/index.php?page=home
WEBSITE2 https://192.168.0.3:8084/index.php
WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home
WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum
The result I want is:
https://localhost:8080
https://192.168.0.3:8084
https://192.168.0.5
https://192.168.0.1:8080
I also want to store the results in a LinkedList or an array.
Can somebody show me how?
Thank you.

This is how you can do it. I did one for you; you can do the rest :)
try {
    ArrayList<String> urls = new ArrayList<String>();
    URL aURL = new URL("https://localhost:8080/admin/index.php?page=home");
    // Rebuild scheme://host:port; the "://" and ":" separators must be added by hand
    String base = aURL.getProtocol() + "://" + aURL.getHost() + ":" + aURL.getPort();
    System.out.println("base = " + base);
    urls.add(base);
} catch (MalformedURLException e) {
    e.printStackTrace();
}

Use a simple regular expression to locate whatever starts with https?:// and capture everything up to the first /.
Matcher m = Pattern.compile("(https?://[^/]+)").matcher(//
"WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n" + //
"WEBSITE2 https://192.168.0.3:8084/index.php\r\n" + //
"WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n" + //
"WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum");
List<String> urls = new ArrayList<String>();
while (m.find()) {
urls.add(m.group(1));
}
System.out.println(urls);
If you want only the WEBSITE label instead, replace the regular expression "(https?://[^/]+)" with "(.*?)\\s+https?". The rest of the code stays untouched.

Let's say the variable line holds a single line of the input (probably inside a loop):
//get the index of "https" in the string
int indexOfHTTPS= line.indexOf("https://");
//get the index of the first "/" after the "https"
int indexOfFirstSlashAfterHTTPS= line.indexOf("/", indexOfHTTPS + "https://".length());
//take a string between "https" and the first "/"
String url = line.substring(indexOfHTTPS, indexOfFirstSlashAfterHTTPS);
Later on, add this url to an ArrayList<String>:
ArrayList<String> urlList= new ArrayList<String>();
urlList.add(url);
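The indexOf approach above assumes that every line contains "https://" and a later "/". A minimal sketch of a more defensive variant follows; the class and method names are hypothetical and only illustrate how the two missing cases could be handled.

```java
class IndexOfExtractor {
    // Returns the "https://host[:port]" part of a line, or null when the line has no https URL.
    static String baseUrl(String line) {
        String prefix = "https://";
        int start = line.indexOf(prefix);
        if (start < 0) {
            return null; // no https URL on this line
        }
        // Look for the first "/" after the scheme prefix
        int slash = line.indexOf('/', start + prefix.length());
        // Without a path, the URL runs to the end of the line
        return slash < 0 ? line.substring(start) : line.substring(start, slash);
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("WEBSITE2 https://192.168.0.3:8084/index.php"));
    }
}
```

Returning null for lines without a URL is one possible choice; skipping such lines in the calling loop would work just as well.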

You can do it with the help of URL class.
public static void main(String[] args) throws MalformedURLException {
String string ="https://192.168.0.5:9090/controller/index.php?page=home";
URL url= new URL(string);
String result ="https://"+url.getHost()+":"+url.getPort();
System.out.println(result);
}
Output: https://192.168.0.5:9090
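To apply the same idea to the whole input and collect the results in a list, something like the following sketch could work. It uses java.net.URI instead of URL, and the class name BaseUrlExtractor and method name extractBaseUrls are made up for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class BaseUrlExtractor {
    // Collects scheme://host[:port] for every whitespace-separated token that looks like an http(s) URL.
    static List<String> extractBaseUrls(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.startsWith("http://") && !token.startsWith("https://")) {
                continue; // skip labels such as WEBSITE1
            }
            URI uri = URI.create(token); // throws IllegalArgumentException on invalid input
            String base = uri.getScheme() + "://" + uri.getHost();
            if (uri.getPort() != -1) {
                base += ":" + uri.getPort(); // the port is -1 when the URL does not specify one
            }
            result.add(base);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php";
        System.out.println(extractBaseUrls(text));
    }
}
```

The returned List can be copied into a LinkedList or converted with toArray if one of those containers is required.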

You could either try to find the index of the protocol substring ("http[s]") in the Strings, or use a simple Pattern (only for matching the "website[0-9]" head, not to apply to the URLs).
Here's a solution with the Pattern.
String[] lines = {
    "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
    "WEBSITE2 https://192.168.0.3:8084/index.php",
    "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
    "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum"
};
ArrayList<URI> uris = new ArrayList<URI>();
Pattern pattern = Pattern.compile("^website\\d+\\s+?(.+)", Pattern.CASE_INSENSITIVE);
// Apply the same pattern to each line in turn
for (String line : lines) {
    Matcher matcher = pattern.matcher(line);
    if (matcher.find()) {
        try {
            uris.add(new URI(matcher.group(1)));
        }
        catch (URISyntaxException use) {
            use.printStackTrace();
        }
    }
}
System.out.println(uris);
Output:
[https://localhost:8080/admin/index.php?page=home, https://192.168.0.3:8084/index.php, https://192.168.0.5:9090/controller/index.php?page=home, https://192.168.0.1:8080/home/index.php?page=forum]

Related

Append a string in front of line in java?

I am creating a pattern-lock based project in Android.
I have a file called category.txt.
The content of the file is as follows:
Sports:Race:Arcade:
Now what I want is that whenever the user draws a pattern for a specific game category, the pattern should be appended in front of that category.
E.g.:
Sports:Race:"string/pattern string to be appended here for race"Arcade:
I have used the following code, but it is not working.
private void writefile(String getpattern,String category)
{
String str1;
try {
file = new RandomAccessFile(filewrite, "rw");
while((str1 = file.readLine()) != null)
{
String line[] = str1.split(":");
if(line[0].toLowerCase().equals(category.toLowerCase()))
{
String colon=":";
file.write(category.getBytes());
file.write(colon.getBytes());
file.write(getpattern.getBytes());
file.close();
Toast.makeText(getActivity(),"In Writefile",Toast.LENGTH_LONG).show();
}
}
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch(IOException io)
{
io.printStackTrace();
}
}
Please help!

With RandomAccessFile you have to calculate the write position yourself. I think it's much easier to just replace the file content with a little help from Apache Commons IO FileUtils. This might not be the best idea if you have a very large file, but it's quite simple.
String givenCategory = "Sports";
String pattern = "stringToAppend";
final String colon = ":";
try {
List<String> lines = FileUtils.readLines(new File("someFile.txt"));
String modifiedLine = null;
int index = 0;
for (String line : lines) {
String[] categoryFromLine = line.split(colon);
if (givenCategory.equalsIgnoreCase(categoryFromLine[0])) {
modifiedLine = new StringBuilder().append(pattern).append(colon).append(givenCategory).append(colon).toString();
break;
}
index++;
}
if (modifiedLine != null) {
lines.set(index, modifiedLine);
FileUtils.writeLines(new File("someFile.txt"), lines);
}
} catch (IOException e1) {
// do something
}
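The same read-modify-write idea can be sketched with only the JDK's java.nio.file API. The question's example is ambiguous about separators, so this sketch assumes the pattern should become its own colon-terminated field right after the matching category; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CategoryFileEditor {
    // Inserts pattern as a new field directly after the matching category (assumed layout).
    static String insertAfterCategory(String line, String category, String pattern) {
        StringBuilder out = new StringBuilder();
        boolean changed = false;
        for (String field : line.split(":")) {
            out.append(field).append(':');
            if (field.equalsIgnoreCase(category)) {
                out.append(pattern).append(':');
                changed = true;
            }
        }
        // Leave lines without the category exactly as they were
        return changed ? out.toString() : line;
    }

    // Reads the whole file, rewrites every line, and writes the result back.
    static void rewriteFile(Path file, String category, String pattern) throws IOException {
        List<String> updated = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            updated.add(insertAfterCategory(line, category, pattern));
        }
        Files.write(file, updated, StandardCharsets.UTF_8);
    }
}
```

As with the FileUtils version, the whole file is held in memory, which is fine for a small category file but not for very large inputs.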

Java applet on website does not scan other websites

Hello, I have just finished my very first applet in Java:
http://st.fri.uniza.sk/~mudrak3/index2
Basically, it goes through a website's source code, finds any links, and appends them to a textArea.
If I put that website's link into the textField (http://st.fri.uniza.sk/~mudrak3/index2) and hit the button, it all works. The button event handler:
private void button1ActionPerformed(java.awt.event.ActionEvent evt) {
textArea1.setText("\f");
try {
ArrayList<String> array = new ArrayList<String>();
ArrayList<String> vystup = new ArrayList<String>();
URL adresa;
adresa = new URL(textField1.getText());
BufferedReader kod = new BufferedReader(new InputStreamReader(adresa.openStream()));
String riadok;
while ((riadok = kod.readLine()) != null) {
array.add(riadok);
String[] pom = riadok.split(" ");
String xxx;
Pattern pattern = Pattern.compile("http://[^ \"]+");
for (int i = 0; i < pom.length; i++) {
xxx = pom[i];
Matcher matcher = pattern.matcher(xxx);
if (matcher.find()) {
textArea1.append(matcher.group(0) + "\n");
}
}
}
textArea1.append("---------Koniec---------");
} catch (MalformedURLException ex) {
JOptionPane.showMessageDialog(null, "Zle zadana URL !");
} catch (IOException ex) {
JOptionPane.showMessageDialog(null, "IOException !");
}
}
No other website works. The app works in NetBeans when I run the applet, but not on the website. Any help?

In order to reach across domains, an applet needs to be:
Digitally signed by you.
Accepted by the user when prompted.

I need to contain all matches of a Regex into a text file; I'm new to java programming

I'm trying to write all the matches found to a text document. I have been banging my head on my desk for the past 3 hours and figured it was time to ask for help.
My current issue is with the List<String>, and I'm not sure whether it is because the information entered is wrong or because of my file-printing methods. It does not print to the file. With other means of printing, such as writer.println(returnvalue), it still only writes one of the matches and not all of them. I also print the matches to the console to make sure they are found, and they are.
Edit2: Sorry, this is my first question on Stack Overflow. I guess my question is: how would you print all the data from a list to a text file?
Edit3: My newest problem is printing out all matches; I am currently stuck printing out only the last match. Any advice?
public static void RegexChecker(String TheRegex, String line){
String Result= "";
List<String> returnvalue = new ArrayList<String>();
Pattern checkRegex = Pattern.compile(TheRegex);
Matcher regexMatcher = checkRegex.matcher(line);
int count = 0 ;
FileWriter writer = null;
try {
writer = new FileWriter("output.txt");
} catch (IOException e1) {
e1.printStackTrace();
}
while ( regexMatcher.find() ){
if (regexMatcher.group().length() != 0){
returnvalue.add(regexMatcher.group());
System.out.println( regexMatcher.group().trim() );
}
for(String str: returnvalue) {
try {
out.write(String.valueOf(returnvalue.get(i)));
writer.write(str);
} catch (IOException e) {
e.printStackTrace();
}
}
}
}

Move the for loop out of the while loop. You want to write to the file only after all matches have been added to the list. The for-each block needs some modifications as well.
The for-each construct hands you each value as it iterates over the collection, so you do not need to fetch the values again by index.
Try this:
while (regexMatcher.find()) {
if (regexMatcher.group().length() != 0) {
returnvalue.add(regexMatcher.group());
System.out.println(regexMatcher.group().trim());
}
}
try {
for (String str : returnvalue) {
writer.write(str + "\n");
}
writer.flush();
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
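For completeness, here is a self-contained sketch of the same two-phase approach (collect first, write afterwards) using java.nio.file.Files, which opens and closes the file for you. The class name MatchWriter and its method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchWriter {
    // Phase 1: collect every non-empty match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            if (!matcher.group().isEmpty()) {
                matches.add(matcher.group());
            }
        }
        return matches;
    }

    // Phase 2: write one match per line; Files.write closes the file itself.
    static void writeMatches(String regex, String input, Path target) throws IOException {
        Files.write(target, findAll(regex, input), StandardCharsets.UTF_8);
    }
}
```

A call such as MatchWriter.writeMatches("\\d+", line, java.nio.file.Paths.get("output.txt")) would then replace both the loop and the explicit flush and close.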

Merge two URL in JAVA

I merge two URLs with the following code.
String strUrl1 = "http://www.domainname.com/path1/2012/04/25/file.php";
String arg = "?page=2";
URL url1;
try {
url1 = new URL(strUrl1);
URL reconUrl1 = new URL(url1,arg);
System.out.println(" url : " + reconUrl1.toString());
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
I'm surprised by the result: http://www.domainname.com/path1/2012/04/25/?page=2
I expected it to be (what browsers do): http://www.domainname.com/path1/2012/04/25/file.php?page=2
The Javadoc for the constructor URL(URL context, String spec) explains that it should respect the RFC.
Am I doing something wrong?
Thanks
UPDATE:
This is the only problem I have encountered with the function.
The code already works in all other cases, as browsers do:
"domain.com/folder/sub" + "/test" -> "domain.com/test"
"domain.com/folder/sub/" + "test" -> "domain.com/folder/sub/test"
"domain.com/folder/sub/" + "../test" -> "domain.com/folder/test"
...

You can always merge the Strings first and then create the URL from the merged String.
StringBuffer buf = new StringBuffer();
buf.append(strUrl1);
buf.append(arg);
URL url1 = new URL(buf.toString());

Try:
// Concatenate the two strings before building the URL
String k = strUrl1 + arg;
URL url1;
try {
    url1 = new URL(k);
    //URL reconUrl1 = new URL(url1,arg);
    System.out.println(" url : " + url1.toString());
} catch (MalformedURLException ex) {
    ex.printStackTrace();
}

I haven't read through the RFC, but the context (as mentioned in the Javadoc for URL) is presumably the directory of a URL, which means that the context of
"http://www.domainname.com/path1/2012/04/25/file.php"
is
"http://www.domainname.com/path1/2012/04/25/"
which is why
new URL(url1,arg);
yields
"http://www.domainname.com/path1/2012/04/25/?page=2"
The "workaround" is obviously to concatenate the parts yourself, using +.

You are using the URL constructor that takes the parameters URL(URL context, String spec). The context needs to be the directory, so don't pass the PHP page as part of the context URL; pass it in the spec string instead. The context must end with "/" and the spec must not start with "/", otherwise the spec is resolved against the server root. The proper way to do this would be:
String strUrl1 = "http://www.domainname.com/path1/2012/04/25/";
String arg = "file.php?page=2";
URL url1;
try {
    url1 = new URL(strUrl1);
    URL reconUrl1 = new URL(url1,arg);
    System.out.println(" url : " + reconUrl1.toString());
} catch (MalformedURLException ex) {
    ex.printStackTrace();
}

Try this
String strUrl1 = "http://www.domainname.com/path1/2012/04/25/";
String arg = "file.php?page=2";
URL url1;
try {
url1 = new URL(strUrl1);
URL reconUrl1 = new URL(url1,arg);
System.out.println(" url : " + reconUrl1.toString());
} catch (MalformedURLException ex) {
ex.printStackTrace();
}

The Javadoc talks about the context of the specified URL,
which is the domain plus the directory path:
"http://www.domainname.com" + "/path1/2012/04/25/"
"file.php" is treated as the spec, which is resolved relative to that context.
This two-parameter constructor uses the context of a URL as the base and resolves the second parameter against it, so a query-only spec replaces the file name instead of extending it.
So it's better to pass the directory as the context and the file name together with the query as the spec:
String contextURL = "http://www.domainname.com/path1/2012/04/25/";
String textURL = "file.php?page=2";
URL url;
try {
url = new URL(contextURL);
URL reconUrl = new URL(url, textURL);
System.out.println(" url : " + reconUrl.toString());
} catch (MalformedURLException murle) {
murle.printStackTrace();
}
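If the base URL already contains the file name and only a query such as "?page=2" needs to be attached, one option is to special-case query-only specs and leave everything else to normal relative resolution. The following is only a sketch under that assumption; UrlMerger and merge are hypothetical names, and it uses java.net.URI.resolve for the non-query cases.

```java
import java.net.URI;

class UrlMerger {
    // Attaches a query-only spec to the full base path; resolves any other spec relative to base.
    static String merge(String base, String spec) {
        if (spec.startsWith("?")) {
            // Keep the whole path (including the file name), drop any old query or fragment
            int cut = base.length();
            int query = base.indexOf('?');
            int fragment = base.indexOf('#');
            if (query >= 0) {
                cut = Math.min(cut, query);
            }
            if (fragment >= 0) {
                cut = Math.min(cut, fragment);
            }
            return base.substring(0, cut) + spec;
        }
        return URI.create(base).resolve(spec).toString();
    }

    public static void main(String[] args) {
        System.out.println(merge("http://www.domainname.com/path1/2012/04/25/file.php", "?page=2"));
    }
}
```

Handling the query case with plain string operations keeps the behaviour independent of how a particular URL or URI implementation treats empty-path references.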

How to deal with the URISyntaxException

I got this error message :
java.net.URISyntaxException: Illegal character in query at index 31: http://finance.yahoo.com/q/h?s=^IXIC
My_Url = http://finance.yahoo.com/q/h?s=^IXIC
When I copied it into a browser address field, it showed the correct page, it's a valid URL, but I can't parse it with this: new URI(My_Url)
I tried My_Url = My_Url.replace("^", "\\^"), but:
It won't be the URL I need.
It doesn't work either.
How should I handle this?
Frank

You need to encode the URI to replace illegal characters with legal encoded characters. If you first make a URL (so you don't have to do the parsing yourself) and then make a URI using the five-argument constructor, then the constructor will do the encoding for you.
import java.net.*;
public class Test {
public static void main(String[] args) {
String myURL = "http://finance.yahoo.com/q/h?s=^IXIC";
try {
URL url = new URL(myURL);
String nullFragment = null;
URI uri = new URI(url.getProtocol(), url.getHost(), url.getPath(), url.getQuery(), nullFragment);
System.out.println("URI " + uri.toString() + " is OK");
} catch (MalformedURLException e) {
System.out.println("URL " + myURL + " is a malformed URL");
} catch (URISyntaxException e) {
System.out.println("URI " + myURL + " is a malformed URI");
}
}
}

Use % encoding for the ^ character, viz. http://finance.yahoo.com/q/h?s=%5EIXIC

You have to encode your parameters.
Something like this will do:
import java.net.*;
import java.io.*;
public class EncodeParameter {
public static void main( String [] args ) throws URISyntaxException ,
UnsupportedEncodingException {
String myQuery = "^IXIC";
URI uri = new URI( String.format(
"http://finance.yahoo.com/q/h?s=%s",
URLEncoder.encode( myQuery , "UTF8" ) ) );
System.out.println( uri );
}
}
http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html

Rather than encoding the URL beforehand, you can do the following:
String link = "http://example.com";
URL url = null;
URI uri = null;
try {
    url = new URL(link);
} catch (MalformedURLException e) {
    e.printStackTrace();
}
try {
    uri = new URI(url.toString());
} catch (URISyntaxException e) {
    try {
        // The multi-argument constructor quotes illegal characters
        uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                url.getPort(), url.getPath(), url.getQuery(),
                url.getRef());
    } catch (URISyntaxException e1) {
        e1.printStackTrace();
    }
}
try {
    url = uri.toURL();
} catch (MalformedURLException e) {
    e.printStackTrace();
}
String encodedLink = url.toString();

A general solution requires parsing the URL into a RFC 2396 compliant URI (note that this is an old version of the URI standard, which java.net.URI uses).
I have written a Java URL parsing library that makes this possible: galimatias. With this library, you can achieve your desired behaviour with this code:
String urlString = //...
URLParsingSettings settings = URLParsingSettings.create()
.withStandard(URLParsingSettings.Standard.RFC_2396);
URL url = URL.parse(settings, urlString);
Note that galimatias is in a very early stage and some features are experimental, but it is already quite solid for this use case.

A space is encoded as %20 in URLs and as + in submitted form data (content type application/x-www-form-urlencoded). You need the former.
Using Guava:
dependencies {
compile 'com.google.guava:guava:28.1-jre'
}
You can use UrlEscapers:
String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);
Don't use String.replace; that would only encode the space. Use a library instead.
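The difference between the two encodings can also be seen with the JDK alone. The sketch below contrasts URLEncoder (form encoding) with the multi-argument java.net.URI constructor (URI quoting); the class and method names and the host example.com are placeholders chosen for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class SpaceEncodingDemo {
    // Form encoding (application/x-www-form-urlencoded): a space becomes "+".
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI quoting via the (scheme, host, path, fragment) constructor: a space becomes "%20".
    static String pathUri(String host, String path) {
        try {
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("Incontinence Liners"));
        System.out.println(pathUri("example.com", "/a b"));
    }
}
```

URLEncoder is meant for individual form parameter values, which is why it should not be applied to a complete URL.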

I couldn't come up with anything better for
http://server.ru:8080/template/get?type=mail&format=html&key=ecm_task_assignment&label=Согласовать с контрагентом&descr=Описание&objectid=2231
than this:
public static boolean checkForExternal(String str) {
int length = str.length();
for (int i = 0; i < length; i++) {
if (str.charAt(i) > 0x7F) {
return true;
}
}
return false;
}
private static final Pattern COLON = Pattern.compile("%3A", Pattern.LITERAL);
private static final Pattern SLASH = Pattern.compile("%2F", Pattern.LITERAL);
private static final Pattern QUEST_MARK = Pattern.compile("%3F", Pattern.LITERAL);
private static final Pattern EQUAL = Pattern.compile("%3D", Pattern.LITERAL);
private static final Pattern AMP = Pattern.compile("%26", Pattern.LITERAL);
public static String encodeUrl(String url) {
if (checkForExternal(url)) {
try {
String value = URLEncoder.encode(url, "UTF-8");
value = COLON.matcher(value).replaceAll(":");
value = SLASH.matcher(value).replaceAll("/");
value = QUEST_MARK.matcher(value).replaceAll("?");
value = EQUAL.matcher(value).replaceAll("=");
return AMP.matcher(value).replaceAll("&");
} catch (UnsupportedEncodingException e) {
throw LOGGER.getIllegalStateException(e);
}
} else {
return url;
}
}

I got this exception in a test that checks URLs actually accessed by users.
The URLs sometimes contain an illegal character and fail with this error.
So I wrote a function that encodes only the illegal characters in the URL string, like this:
String encodeIllegalChar(String uriStr,String enc)
throws URISyntaxException,UnsupportedEncodingException {
String _uriStr = uriStr;
int retryCount = 17;
while(true){
try{
new URI(_uriStr);
break;
}catch(URISyntaxException e){
String reason = e.getReason();
if(reason == null ||
!(
reason.contains("in path") ||
reason.contains("in query") ||
reason.contains("in fragment")
)
){
throw e;
}
if(0 > retryCount--){
throw e;
}
String input = e.getInput();
int idx = e.getIndex();
String illChar = String.valueOf(input.charAt(idx));
_uriStr = input.replace(illChar,URLEncoder.encode(illChar,enc));
}
}
return _uriStr;
}
test:
String q = "\\'|&`^\"<>)(}{][";
String url = "http://test.com/?q=" + q + "#" + q;
String eic = encodeIllegalChar(url,"UTF-8");
System.out.println(String.format(" original:%s",url));
System.out.println(String.format(" encoded:%s",eic));
System.out.println(String.format(" uri-obj:%s",new URI(eic)));
System.out.println(String.format("re-decoded:%s",URLDecoder.decode(eic)));

If you're using RestangularV2 to post to a spring controller in java you can get this exception if you use RestangularV2.one() instead of RestangularV2.all()

Replace spaces in the URL with +. For example, if the URL contains dimension1=Incontinence Liners, replace it with dimension1=Incontinence+Liners.
