modify regexp to detect all url links [duplicate]


This question already has answers here:
Validating URL in Java
(11 answers)
Closed 9 years ago.
I have a method that returns a list of the links found in a string, but it only works if a link has an 'http' or 'www' prefix (http://site.com or www.site.com). I also need it to detect links without a prefix, such as site.com.
Please help me.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> retrieveLinks(String text) {
    List<String> links = new ArrayList<>();
    // An optional "(" followed by a link that starts with http://, https:// or www.
    String regex = "\\(?\\b(http://|https://|www[.])[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(text);
    while (m.find()) {
        String urlStr = m.group();
        // Strip surrounding parentheses, e.g. "(http://site.com)" -> "http://site.com"
        if (urlStr.startsWith("(") && urlStr.endsWith(")")) {
            urlStr = urlStr.substring(1, urlStr.length() - 1);
        }
        links.add(urlStr);
    }
    return links;
}

Not commenting on the rest of the source code: make the prefix optional by putting a ? after the group that declares the possible prefixes.
String regex = "\\(?\\b(http://|https://|www[.])?[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
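Be aware that once the prefix is optional, the pattern above also matches ordinary words such as "hello", because nothing forces a dot into the match. A minimal sketch of one way around that, assuming a prefix-less link must look like dot-separated labels ending in a top-level domain of two or more letters (a heuristic, not a full URL grammar; the class name LinkFinder is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkFinder {
    // Optional http://, https:// or www. prefix, then one or more "label." parts,
    // then a letters-only TLD, then an optional path using the same character
    // classes as the original regex. Heuristic only: it will not match every valid URL.
    private static final Pattern LINK = Pattern.compile(
            "\\b(?:https?://|www\\.)?(?:[A-Za-z0-9-]+\\.)+[A-Za-z]{2,}"
            + "(?:/[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|])?");

    public static List<String> retrieveLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // prints [http://site.com, www.site.com, site.com/page]
        System.out.println(retrieveLinks("Visit http://site.com, www.site.com or site.com/page today."));
    }
}
```

Requiring a dot plus a letters-only TLD rules out plain words, at the cost of missing hosts like localhost or bare IP addresses.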

Related

Deleting exact match String from text file [duplicate]

This question already has answers here:
Find a line in a file and remove it
(17 answers)
Closed 1 year ago.
I am trying to remove a specific string from my text file. My current code reads the file line by line until it has read 50 lines. I want to remove the string (EXACT MATCH!) from the file, but I am not sure how to do so.
Scanner input = new Scanner(new File("C:\\file.txt"));
int counter = 0;
while (input.hasNextLine() && counter < 50) {
    counter++;
    String tempName = input.nextLine();
    // perform my custom code here
    // somehow delete tempName from the text file (exact match)
}
I have tried input.nextLine().replaceFirst(tempName, ""); without any luck.

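If you are using Java 8, you can do something like the following with the java.nio.file API. Note that replaceFirst interprets its first argument as a regular expression, so Pattern.quote is used to match the string literally:
Path p = Paths.get("PATH-TO-FILE");
List<String> lines;
// try-with-resources closes the file handle opened by Files.lines
try (Stream<String> stream = Files.lines(p, StandardCharsets.UTF_8)) {
    lines = stream
            .map(str -> str.replaceFirst(Pattern.quote("STRING-TO-DELETE"), ""))
            .filter(str -> !str.isEmpty()) // drop lines that are now empty
            .collect(Collectors.toList());
}
Files.write(p, lines, StandardCharsets.UTF_8);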
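If "exact match" means a whole line that is equal to the string, comparing lines with equals avoids both regex interpretation and partial matches. A minimal sketch under that assumption (the class and method names are hypothetical; the demo uses List.of, which needs Java 9+):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class RemoveExactLine {
    // Rewrites the file without the lines that are exactly equal to target.
    // Lines that merely contain target (e.g. "alphabet" for "alpha") are kept.
    // Assumes the file is UTF-8 and small enough to read into memory.
    public static void removeExactLine(Path file, String target) throws IOException {
        List<String> kept = Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .filter(line -> !line.equals(target))
                .collect(Collectors.toList());
        Files.write(file, kept, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, List.of("alpha", "beta", "alphabet"), StandardCharsets.UTF_8);
        removeExactLine(tmp, "alpha");
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8)); // [beta, alphabet]
        Files.delete(tmp);
    }
}
```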

The code below replaces every occurrence of "the", ignoring case, with "**":
Pattern pattern = Pattern.compile("the", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(tempName);
String replacedValue = matcher.replaceAll("**");

How to split URL and take only the host names [duplicate]

This question already has answers here:
What is the fastest way to get the domain/host name from a URL?
(8 answers)
Closed 3 years ago.
I'm trying to split URLs such as https://stackoverflow.com/questions/ and keep only the host, stackoverflow.com. How can I do this in Java without using the built-in getHost() method?

If you have the URL as a String, you can split it on "/":
public static void main(String[] args) {
    String str = "https://stackoverflow.com/questions/";
    String[] parts = str.split("/");
    String part1 = parts[0]; // "https:"
    String part2 = parts[1]; // "" (empty: nothing between the two slashes)
    String part3 = parts[2]; // "stackoverflow.com"
    String part4 = parts[3]; // "questions"
}

One option is String#replaceAll. I know it's not exactly what you want, but offhand it's a decent way to do it.
String uri = "https://stackoverflow.com/questions/";
// Remove the scheme prefix, if present
if (uri.contains("https://")) {
    uri = uri.replaceAll("https://", "");
}
if (uri.contains("http://")) {
    uri = uri.replaceAll("http://", "");
}
// Keep everything before the first (forward) slash: the host
int indexOfSlash = uri.indexOf("/");
if (indexOfSlash != -1) {
    uri = uri.substring(0, indexOfSlash);
}
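Use java.net.URI. Note that URI#getPath returns the path component (/questions/ for the example URL), not the host; the host is what URI#getHost returns.
URI uri = URI.create("https...");
String path = uri.getPath(); // the path component
String host = uri.getHost(); // the host name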
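For comparison, this is what the standard java.net.URI class returns for the example URL: getHost gives the host the question asks for, and getPath gives the path. The class and helper names are only for illustration:

```java
import java.net.URI;

public class UriParts {
    // Host component, e.g. "stackoverflow.com"
    public static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Path component, e.g. "/questions/"
    public static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String url = "https://stackoverflow.com/questions/";
        System.out.println(hostOf(url)); // stackoverflow.com
        System.out.println(pathOf(url)); // /questions/
    }
}
```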

You could also use regular expressions:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class UrlRegex {
    public static void main(String[] args) {
        String url = "https://stackoverflow.com/questions/";
        Pattern pat = Pattern.compile("//([^/]*)"); // capture everything between "//" and the next "/"
        Matcher m = pat.matcher(url);
        if (m.find()) {
            String hostname = m.group(1); // "stackoverflow.com"
        }
    }
}

Here you go:
Pattern pattern = Pattern.compile("^(?:(?:http)s?://)?(?<hostGroup>[^/:]+).*");
Matcher matcher = pattern.matcher("https://stackoverflow.com/questions/");
if (matcher.matches()) {
    System.out.println(matcher.group("hostGroup"));
} else {
    System.out.println("Not found! Invalid URL ^^");
}
The pattern above finds stackoverflow.com in all of the following URL strings:
https://stackoverflow.com/questions/
http://stackoverflow.com/questions/
stackoverflow.com/questions/
stackoverflow.com:80/questions/
stackoverflow.com/
stackoverflow.com
I guess this is for practicing regex? Otherwise, prefer the standard APIs whenever possible (in your case, URI#getHost()).
Cheers!

If you are sure the URL always has the proper format, you can simply take substrings at the right places:
public static void main(String[] args) {
    String url = "https://stackoverflow.com/questions/";
    System.out.println(getHostFast(url));
}
public static String getHostFast(String url) {
    // Skip past the "//" that follows the scheme
    String subbed = url.substring(url.indexOf('/') + 2);
    // Keep everything up to the next '/'
    return subbed.substring(0, subbed.indexOf('/'));
}
An error-proof method would need additional checks (for example, whether another '/' exists after dropping http://).
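One way to add those checks without calling getHost() is sketched below. It is a hypothetical helper, assuming the scheme (if any) comes first and that the host is not an IPv6 literal such as [::1]:

```java
public class HostExtractor {
    // Hypothetical helper: extracts the host without URI#getHost().
    // Handles an optional scheme ("xxx://"), optional user info ("user@"),
    // an optional port and a missing path. IPv6 literals are not handled.
    public static String extractHost(String url) {
        int schemeEnd = url.indexOf("://");
        // Only treat "://" as the scheme separator if what precedes it looks like a scheme
        boolean hasScheme = schemeEnd > 0
                && url.substring(0, schemeEnd).matches("[A-Za-z][A-Za-z0-9+.-]*");
        int start = hasScheme ? schemeEnd + 3 : 0;

        // The authority ends at the first '/', '?' or '#' after the scheme, or at the end
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String authority = url.substring(start, end);

        // Drop user info and port
        int at = authority.lastIndexOf('@');
        if (at != -1) {
            authority = authority.substring(at + 1);
        }
        int colon = authority.indexOf(':');
        return (colon == -1) ? authority : authority.substring(0, colon);
    }

    public static void main(String[] args) {
        System.out.println(extractHost("https://stackoverflow.com/questions/")); // stackoverflow.com
        System.out.println(extractHost("stackoverflow.com:80/questions/"));      // stackoverflow.com
        System.out.println(extractHost("stackoverflow.com"));                    // stackoverflow.com
    }
}
```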

Dynamic incrementation of all numbers within a String [duplicate]

This question already has answers here:
How to increment string variable? [closed]
(4 answers)
Closed 6 years ago.
Is there any Java solution for replacing a number in a String other than getting the number with a Matcher, incrementing it by one and replacing it?
"REPEAT_FOR_4" will return "REPEAT_FOR_5"
"REPEAT_FOR_10" will return "REPEAT_FOR_11"
I would like to do it in one line with a regex and a replace, not by recomposing the String as "REPEAT_FOR_" plus the incremented number.
Thank you!
Later edit: I would like to know how to replace a number with the following one in a String.

I didn't use regex, but here is a one-line solution, assuming the prefix is always "REPEAT_FOR_" (11 characters):
public String getIncrementedString(String str) {
    return "REPEAT_FOR_" + (Integer.parseInt(str.substring(11)) + 1);
}

Yes, of course it's possible. Using the regex classes Pattern and Matcher, here's what you need to do:
String str = "REPEAT_FOR_4";
Pattern p = Pattern.compile("([0-9]+)");
Matcher m = p.matcher(str);
StringBuffer s = new StringBuffer();
while (m.find()) {
    // Replace each number with its value plus one
    m.appendReplacement(s, String.valueOf(1 + Integer.parseInt(m.group(1))));
}
// Copy the text after the last number (otherwise "_FOR_ME" in "REPEAT_5_FOR_ME" would be lost)
m.appendTail(s);
String updated = s.toString();
System.out.println(updated);
This is a working example that prints REPEAT_FOR_5.
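On Java 9 or later, Matcher also has a replaceAll overload that takes a function of the MatchResult, which gives the single regex-and-replace call the question asks for. A sketch (the class and method names are hypothetical; it assumes every number fits in a long):

```java
import java.util.regex.Pattern;

public class IncrementNumbers {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Replaces every run of digits with that number plus one.
    // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+.
    public static String increment(String s) {
        return NUMBER.matcher(s).replaceAll(mr -> String.valueOf(Long.parseLong(mr.group()) + 1));
    }

    public static void main(String[] args) {
        System.out.println(increment("REPEAT_FOR_4"));  // REPEAT_FOR_5
        System.out.println(increment("REPEAT_FOR_10")); // REPEAT_FOR_11
        System.out.println(increment("A1_B99"));        // A2_B100
    }
}
```

The replacement text is built only from digits, so the $ and \ escaping rules of replacement strings never come into play here.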

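You can try this (note that this first version only works when the number is a single digit at the end of the string):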
String ss = "REPEAT_FOR_4";
int vd = Integer.valueOf(ss.substring(ss.length() - 1));
String nss = ss.replaceAll("\\d",String.valueOf(vd+1));
System.out.println(nss);
output:
REPEAT_FOR_5
With regex, if the number is not at the end of the string:
String ss = "REPEAT_5_FOR_ME";
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(ss);
m.find();
String strb = m.group();
int vd = Integer.valueOf(strb);
// "\\d+" replaces the whole number once, instead of each digit separately
String nss = ss.replaceFirst("\\d+", String.valueOf(vd + 1));
System.out.println(nss);
output:
REPEAT_6_FOR_ME
Based on the issue raised in the comments, I think this regex solution will help (it increments the first number in the string):
public static String convStr(String str) {
    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher(str);
    if (!m.find()) {
        return str; // no number to increment
    }
    int vd = Integer.parseInt(m.group());
    return str.replaceFirst("\\d+", String.valueOf(vd + 1));
}

Parse string value from URL

I have a string (a URL) with this pattern: https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg
and I want to trim it to this:
media/93939393hhs8.jpeg
I want to remove all the characters before the second-to-last slash (/).
I'm a newbie in Java, but in Swift (iOS) this is how we do it:
if let url = NSURL(string:"https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg"), pathComponents = url.pathComponents {
    let trimmedString = pathComponents.suffix(2).joinWithSeparator("/")
    print(trimmedString) // "output = media/93939393hhs8.jpeg"
}
Basically, I remove everything from the URL except the last two items, and then join those two items with /.

String ret = url.substring(url.indexOf("media")); // assumes "media" occurs only once in the URL

Are you familiar with regex? Try this regex, which captures the last two items separated by /:
.*?\/([^\/]+?\/[^\/]+?$)
Here is the example in Java (don't forget to escape the backslashes as \\):
Pattern p = Pattern.compile("^.*?\\/([^\\/]+?\\/[^\\/]+?$)");
Matcher m = p.matcher(string);
if (m.find()) {
    System.out.println(m.group(1));
}
Alternatively, there is the split(..) method, though I recommend the approach above. (Afterwards, concatenate the parts correctly with a StringBuilder.)
String part[] = string.split("/");
int l = part.length;
StringBuilder sb = new StringBuilder();
String result = sb.append(part[l-2]).append("/").append(part[l-1]).toString();
Both give the same result: media/93939393hhs8.jpeg

String result = url.substring(url.substring(0, url.lastIndexOf('/')).lastIndexOf('/') + 1);
or
use split and join the last two items:
String[] arr = url.split("/");
String result = arr[arr.length - 2] + "/" + arr[arr.length - 1];

public static String parseUrl(String str) {
return (str.lastIndexOf("/") > 0) ? str.substring(1+(str.substring(0,str.lastIndexOf("/")).lastIndexOf("/"))) : str;
}
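A closer Java counterpart of the Swift pathComponents.suffix(2) approach is to let java.net.URI extract the path, split it on "/", and join the last two non-empty segments. A sketch, with hypothetical class and method names:

```java
import java.net.URI;
import java.util.Arrays;

public class LastPathSegments {
    // Splits the URL's path on "/", drops empty segments,
    // and joins the last n segments back together with "/".
    public static String lastSegments(String url, int n) {
        String[] segments = Arrays.stream(URI.create(url).getPath().split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        int from = Math.max(0, segments.length - n);
        return String.join("/", Arrays.copyOfRange(segments, from, segments.length));
    }

    public static void main(String[] args) {
        String url = "https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg";
        System.out.println(lastSegments(url, 2)); // media/93939393hhs8.jpeg
    }
}
```

Working on the parsed path means the host and any query string never end up in the result.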

url and name spaces java convertion

I need to be able to convert:
(url) http://www.joe90.com/showroom
to
(namespace) com.joe90.showroom
I can do this using tokens etc. and an enforced rule set.
However, is there a Java package that will do this for me,
or do I need to write one myself?
Thanks

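java.net.URL url = new java.net.URL("http://www.joe90.com/showroom");
String[] tokens = url.getHost().split("\\."); // split() takes a regex, so the dot must be escaped
StringBuilder sb = new StringBuilder();
// Append the host labels in reverse order, skipping a leading "www"
for (int i = tokens.length - 1; i >= 0; i--) {
    if (i == 0 && tokens[i].equals("www")) {
        continue;
    }
    if (sb.length() > 0) {
        sb.append('.');
    }
    sb.append(tokens[i]);
}
// Append the path segments, e.g. "/showroom" -> ".showroom"
for (String segment : url.getPath().split("/")) {
    if (!segment.isEmpty()) {
        sb.append('.').append(segment);
    }
}
String namespace = sb.toString(); // "com.joe90.showroom"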
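Alternatively, you can parse the host name out with a regex:
Pattern p = Pattern.compile("^(\\w+://)?(.*?)/");
Matcher m = p.matcher(url); // url is the String here
if (m.find()) { // find(), not matches(): the pattern does not cover the rest of the URL
    String[] tokens = m.group(2).split("\\.");
    // etc
}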
Of course, that regex doesn't handle all URLs, for example:
http://username@hostname.com/...
That's why I suggested using java.net.URL: it does all the URL validation and parsing for you.

Your best bet would be to split the string on the . and / characters (e.g. using String.split()) and then concatenate the pieces in reverse order, skipping any you don't want to include (e.g. www).
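A sketch of a variant of that split-and-reverse idea: only the host labels are reversed, while the path segments keep their original order, which is what produces com.joe90.showroom. It assumes an absolute URL without user info, port, query or fragment, drops every "www" label, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UrlToNamespace {
    public static String toNamespace(String url) {
        String rest = url.replaceFirst("^[A-Za-z][A-Za-z0-9+.-]*://", ""); // strip the scheme
        int slash = rest.indexOf('/');
        String host = (slash == -1) ? rest : rest.substring(0, slash);
        String path = (slash == -1) ? "" : rest.substring(slash + 1);

        // Host labels, without "www", in reverse order
        List<String> parts = new ArrayList<>();
        for (String label : host.split("\\.")) {
            if (!label.equals("www")) {
                parts.add(label);
            }
        }
        Collections.reverse(parts);

        // Path segments in their original order
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                parts.add(segment);
            }
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(toNamespace("http://www.joe90.com/showroom")); // com.joe90.showroom
    }
}
```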
