I want to get all the .mp4 URLs from this String using a regex.
I also want to know how to get only the last .mp4 URL with a regex.
Thanks
contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8},
Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd},
Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4},
Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4},
Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]";
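Regex:
https?://\S+?\.mp4
Literal http
Followed by an optional 's': s? (remove the question mark if they will all use HTTPS)
Followed by a literal ://
Followed by as few non-whitespace characters as possible: \S+?
Followed by an mp4 extension (literal dot): \.mp4
Using \S+? rather than .*? matters here: with .*?, a match that starts at the .m3u8 URL can run on past the .mpd URL until it reaches the first .mp4 (whenever they are on the same line). In Java source, write it as "https?://\\S+?\\.mp4".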
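For the second part of the question (only the last URL), one option is to collect every match and take the last one. A minimal Java sketch, assuming the whole text is in a single String: a find() loop collects every match of https?://\S+?\.mp4 (\S+? so that a match cannot start at the .m3u8 URL and run on into a later .mp4). The class and method names (Mp4Urls, allMp4, lastMp4) and the shortened sample URLs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Mp4Urls {
    // \S+? stops at whitespace, so a match cannot start at the .m3u8/.mpd URL
    // and run on into the next .mp4 URL.
    private static final Pattern MP4_URL = Pattern.compile("https?://\\S+?\\.mp4");

    // Returns every .mp4 URL in the text, in the order they appear.
    static List<String> allMp4(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = MP4_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    // Returns the last .mp4 URL in the text, or null if there is none.
    static String lastMp4(String text) {
        List<String> urls = allMp4(text);
        return urls.isEmpty() ? null : urls.get(urls.size() - 1);
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the string in the question.
        String text = "contentType=application/x-mpegURL, url=https://video.twimg.com/pl/a.m3u8}, "
                + "Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/vid/320x180/b.mp4}, "
                + "Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/vid/1280x720/c.mp4}]}]";
        System.out.println(allMp4(text));
        System.out.println(lastMp4(text));
    }
}
```

If you prefer a single regex for the last URL, a greedy prefix such as (?s).*(https?://\S+?\.mp4), reading group 1, should also work: the greedy .* backtracks from the end of the text to the last position where a URL starts.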
2 approaches:
If you're sure the URLs will always begin with https:// and that no other "mp4" appears after the URL on the same line (the greedy .* runs to the last "mp4"), then you can use
pattern = "https://.*mp4";
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String[] arr = {
    "contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}",
    "Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}",
    "Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}",
    "Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}",
    "Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]"
};
String pattern = "https://.*mp4";
Pattern r = Pattern.compile(pattern);
// Each array element is one line of the input and is matched separately
for (String line : arr) {
    Matcher m = r.matcher(line);
    if (m.find()) {
        System.out.println(m.group(0));
    } else {
        System.out.println("NO MATCH");
    }
}
If not, to support all kinds of URLs, change your pattern to a general-purpose URL regex, with a small modification at the end ("mp4"):
String pattern =
    "(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www.)" +
    "(\\w+:\\w+@)?(([-\\w]+\\.)+(com|org|net|gov" +
    "|mil|biz|info|mobi|name|aero|jobs|museum" +
    "|travel|[a-z]{2}))(:[\\d]{1,5})?" +
    "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
    "((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" +
    "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
    "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" +
    "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
    "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b" + "mp4";
Output:
NO MATCH
NO MATCH
https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4
How can I match an mp3 URL with a regex?
This mp3 URL, for example:
https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
This is what I've tried so far, but I want it to accept only URLs that end with '.mp3'.
(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]
This expression would likely accept your desired inputs:
^(https?|ftp|file):\/\/(www.)?(.*?)\.(mp3)$
If you wish to add more boundaries, you can; for instance, you can use a list of allowed characters instead of .*.
I have added several capturing groups, just so the parts are easy to reference if necessary.
RegEx
If this wasn't your desired expression, you can modify/change it on regex101.com.
RegEx Circuit
You can also visualize your expressions on jex.im.
JavaScript Demo
const regex = /^(https?|ftp|file):\/\/(www.)?(.*?)\.(mp3)$/gm;
const str = `https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
http://soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
http://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
ftp://soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
file://localhost/examples/mp3/SoundHelix-Song-1.mp3
file://localhost/examples/mp3/SoundHelix-Song-1.wav
file://localhost/examples/mp3/SoundHelix-Song-1.avi
file://localhost/examples/mp3/SoundHelix-Song-1.m4a`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Java Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        final String regex = "^(https?|ftp|file):\\/\\/(www.)?(.*?)\\.(mp3)$";
        final String string = "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3\n"
                + "http://soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3\n"
                + "http://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3\n"
                + "ftp://soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3\n"
                + "file://localhost/examples/mp3/SoundHelix-Song-1.mp3\n"
                + "file://localhost/examples/mp3/SoundHelix-Song-1.wav\n"
                + "file://localhost/examples/mp3/SoundHelix-Song-1.avi\n"
                + "file://localhost/examples/mp3/SoundHelix-Song-1.m4a";
        // MULTILINE lets ^ and $ match at the start and end of every line
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
If you want it to match only inputs ending with '.mp3', add \.mp3$ at the end of your regex.
$ asserts the end of the input.
(https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#\/%=~_|]\.mp3$
Matching:
https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3 **=> Match**
https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp4 **=> No Match**
You could use anchors to assert the start (^) and the end ($) of the string, and end the pattern with \.mp3:
^https?://\S+\.mp3$
Explanation
^ Assert start of string
https?:// Match http with optional s and ://
\S+ Match one or more non-whitespace characters
\.mp3 Match .mp3
$ Assert end of string
For example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String regex = "^https?://\\S+\\.mp3$";
String[] strings = {
    "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3",
    "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp4"
};
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        System.out.println(matcher.group(0));
    }
}
Result
https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
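In Java, String.matches and Matcher.matches() require the pattern to cover the entire input, so the ^ and $ anchors become optional when you use them instead of find(). A small sketch, assuming you also want to accept an upper-case extension such as .MP3 (the (?i) flag and the helper name isMp3Url are additions for illustration, not part of the answer above):

```java
import java.util.regex.Pattern;

class Mp3UrlCheck {
    // (?i) makes the whole pattern case-insensitive (scheme and extension);
    // matches() anchors at both ends, so no ^ or $ is needed.
    private static final Pattern MP3_URL = Pattern.compile("(?i)https?://\\S+\\.mp3");

    static boolean isMp3Url(String s) {
        return MP3_URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3",
            "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp4",
            "HTTP://example.com/song.MP3"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isMp3Url(s));
        }
    }
}
```

Drop the (?i) if only lower-case ".mp3" should be accepted.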
I have these four types of file names:
Filename with double extension
Filename with no extension
Filename with dot at the end, and no extension
Filename with a proper name.
Like this:
String doubleexsension = "doubleexsension.pdf.pdf";
String noextension = "noextension";
String nameWithDot = "nameWithDot.";
String properName = "properName.pdf";
String extension = "pdf";
My aim is to sanitize all of them and output only filename.filetype. I wrote a small script for this post:
ArrayList<String> app = new ArrayList<String>();
app.add(doubleexsension);
app.add(properName);
app.add(noextension);
app.add(nameWithDot);
System.out.println("------------");
for(String i : app) {
    // Ends with .
    if (i.endsWith(".")) {
        String m = i + extension;
        System.out.println(m);
        break;
    }
    // Double extension
    String p = i.replaceAll("(\\.\\w+)\\1+$", "$1");
    System.out.println(p);
}
This outputs:
------------
doubleexsension.pdf
properName.pdf
noextension
nameWithDot.pdf
I don't know how to handle the noextension one. How can I do it? When there's no extension, it should take the extension value and append it to the end of the string.
My desired output would be:
------------
doubleexsension.pdf
properName.pdf
noextension.pdf
nameWithDot.pdf
Thanks in advance.
You may add alternatives to the regex to match all kinds of scenarios:
(?:(\.\w+)\1*|\.|([^.]))$
And replace with $2.pdf. See the regex demo.
EDIT: In case the extensions that can be duplicated are known, you may use the whitelisting approach via an alternation group:
(?:(\.(?:pdf|gif|jpe?g))\1*|\.|([^.]))$
See another regex demo.
Details:
(?: - start of a non-capturing group; the $ end-of-string anchor after it applies to all the alternatives below (each must be at the end of the string)
(\.\w+)\1* - an extension (. + 1+ word chars) captured into Group 1 and then repeated zero or more times, so duplicated extensions are matched too (with the whitelisting approach, only the listed extensions count: (?:pdf|gif|jpe?g) matches pdf, gif, jpeg or jpg, and you can add more alternatives)
| - or
\. - a dot
| - or
([^.]) - any char that is not a dot captured into Group 2
) - end of the outer grouping
$ - end of string.
See Java demo:
import java.util.Arrays;
import java.util.List;
List<String> strs = Arrays.asList("doubleexsension.pdf.pdf", "noextension", "nameWithDot.", "properName.pdf");
for (String str : strs) {
    System.out.println(str.replaceAll("(?:(\\.\\w+)\\1*|\\.|([^.]))$", "$2.pdf"));
}
Easy:
if (-1 == i.indexOf('.')) {
    System.out.println(i + "." + extension);
}
I would avoid the complexity (and reduced readability) of regular expressions:
String m = i;
if (m.endsWith(".")) {
    m = m + extension;
}
if (m.endsWith("." + extension + "." + extension)) {
    m = m.substring(0, m.length() - extension.length() - 1);
}
if (!m.endsWith("." + extension)) {
    m = m + "." + extension;
}
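The checks above can be wrapped in a reusable method. A sketch, assuming the target extension is passed in without its leading dot; the names FileNames and withExtension are hypothetical, and it uses while instead of if so that more than one duplicate (e.g. a.pdf.pdf.pdf) is collapsed.

```java
class FileNames {
    // Hypothetical helper: makes the name end with exactly one "." + extension,
    // using only String methods (no regex).
    static String withExtension(String name, String extension) {
        String suffix = "." + extension;
        String m = name;
        if (m.endsWith(".")) {
            m = m + extension;                                 // "nameWithDot." -> "nameWithDot.pdf"
        }
        while (m.endsWith(suffix + suffix)) {
            m = m.substring(0, m.length() - suffix.length());  // drop one duplicate per pass
        }
        if (!m.endsWith(suffix)) {
            m = m + suffix;                                    // "noextension" -> "noextension.pdf"
        }
        return m;
    }

    public static void main(String[] args) {
        String[] names = {"doubleexsension.pdf.pdf", "properName.pdf", "noextension", "nameWithDot."};
        for (String n : names) {
            System.out.println(withExtension(n, "pdf"));
        }
    }
}
```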
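Why so complex? Just do str.replaceAll("\\..*", "") + "." + extension (note that this removes everything from the first dot on, so a name such as my.report.pdf would become my.pdf).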
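Java 7 NIO has a way to check this with PathMatcher (it only tells you whether a name already ends in .pdf; you still append the extension yourself when it doesn't):
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:*.pdf");
Path filename = Paths.get("nameWithDot.pdf");
if (matcher.matches(filename)) {
    System.out.println(filename);
}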
I have a String that contains some URLs. How can I find all the hrefs with a regular expression?
prodotto di prova
Right now I have this regex, which finds all Amazon links; now I need to add the href to it:
String regex="(http|www\\.)(amazon|AMAZON)\\.(com|it|uk|fr|de)\\/(?:gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp|[^\\/]+\\/product-reviews)\\/([^\\/]{10})";
This pattern works for me in Java (tested on IDEONE):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
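String input = "<a href=\"http://www.amazon.it/Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp/B003LQSHBO/ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler\">prodotto di prova</a>";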
String pattern = "href=(?<link>['\\\"](?:https?:\\/\\/)?(?:www\\.)?(?:amazon|AMAZON)\\.(?:com|it|uk|fr|de)\\/(?<product>gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp|[^\\/]+\\/product-reviews)\\/(?<productID>[^\\/]{10})\\/(?<queryString>.*?)\\\")";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
    System.out.println("Amazon link: " + m.group(0));
    System.out.println("product: " + m.group("product"));
    System.out.println("productID: " + m.group("productID"));
    System.out.println("querystring: " + m.group("queryString"));
} else {
    System.out.println("NO MATCH");
}
output:
Amazon link: href="http://www.amazon.it/Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp/B003LQSHBO/ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler"
product: Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp
productID: B003LQSHBO
querystring: ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler
Java's rules for backslashes and escapes in strings are absolutely infuriating to me, and I never get them right. You may find it helpful to go to http://www.regexplanet.com/advanced/java/index.html and enter a regex; it will convert it into a Java string with the proper escapes. (I couldn't get mine working until I did this!)
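The core of that conversion is small: inside a Java string literal, every backslash and every double quote of the regex needs one extra backslash. A minimal sketch of that rule (the class and method names are hypothetical, and it assumes the regex contains no newlines or other control characters):

```java
class JavaRegexLiteral {
    // Hypothetical helper: turns a regex as typed into a regex tester into the
    // text of a Java string literal, escaping backslashes and double quotes.
    static String toJavaLiteral(String regex) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : regex.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // The regex \.mp3$ (held here in a Java string) prints as "\\.mp3$",
        // which is how it has to be written in Java source.
        System.out.println(toJavaLiteral("\\.mp3$"));
    }
}
```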
I have this problem: I need a regular expression that matches these URLs:
http://www.amazon.it/TP-LINK-TL-WR841N-Wireless-300Mbps-Ethernet/dp/B001FWYGJS?ie=UTF8&redirect=true&ref_=s9_simh_gw_p147_d0_i2
http://www.amazon.it/gp/product/B014KMQWU0/
http://www.amazon.it/gp/product/glance/B014KMQWU0/
I need a regular expression that matches the full URL up to the ASIN of the product (the ASIN is a 10-character code of capital letters and digits, e.g. B001FWYGJS).
I have written this regex, but it doesn't do what I want:
String regex="http:\\/\\/(?:www\\.|)amazon\\.com\\/(?:gp\\ product|| gp\\ product\\ glance || [^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
while (urlAmazonMatcher.find()) {
    System.out.println("PROVA " + urlAmazonMatcher.group(0));
}
This is my solution. Finally it works :D
String regex="(?:https?://)?(?:www\\.)?amazon\\.(com|it|uk|fr|de)\\/(?:gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
String toReturn = null;
// Keeps the last match; group(0) runs from the scheme up to and including the ASIN
while (urlAmazonMatcher.find()) {
    toReturn=urlAmazonMatcher.group(0);
}
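A sketch of the same idea with the ASIN in its own capturing group, tested against the three URLs from the question. Assumptions: an ASIN is 10 upper-case letters or digits, and only the listed domains matter (the UK store is amazon.co.uk, so co\.uk is used here instead of uk). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmazonAsin {
    // Group 1 is the ASIN; "gp/product/glance" is listed before "gp/product".
    private static final Pattern AMAZON_URL = Pattern.compile(
            "https?://(?:www\\.)?amazon\\.(?:com|it|co\\.uk|fr|de)/"
            + "(?:gp/product/glance|gp/product|[^/]+/dp|dp)/([A-Z0-9]{10})");

    // Returns the URL up to and including the ASIN, or null if nothing matches.
    static String urlUpToAsin(String url) {
        Matcher m = AMAZON_URL.matcher(url);
        return m.find() ? m.group() : null;
    }

    // Returns just the ASIN (capturing group 1), or null if nothing matches.
    static String asin(String url) {
        Matcher m = AMAZON_URL.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.amazon.it/TP-LINK-TL-WR841N-Wireless-300Mbps-Ethernet/dp/B001FWYGJS?ie=UTF8&redirect=true&ref_=s9_simh_gw_p147_d0_i2",
            "http://www.amazon.it/gp/product/B014KMQWU0/",
            "http://www.amazon.it/gp/product/glance/B014KMQWU0/"
        };
        for (String url : urls) {
            System.out.println(urlUpToAsin(url) + "  ASIN=" + asin(url));
        }
    }
}
```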
How about
/[^/?]{10}(/$|\?)
This matches 10 characters that are neither / nor ? following a slash if those characters are followed by a final slash or a question mark.
You can then get the part that precedes or follows the ASIN from the match position, e.g. url.substring(0, m.start()) and url.substring(m.end()).
Here is my work from a previous project whose goal was to extract URLs from text:
private Pattern uriPattern; // cached, compiled on first use
private Pattern getUriPattern() {
    if (uriPattern == null) {
        // taken from http://labs.apache.org/webarch/uri/rfc/rfc3986.html
        // TODO implement the full URI syntax
        String genDelims = "\\:\\/\\?\\#\\[\\]\\@";
        String subDelims = "\\!\\$\\&\\'\\*\\+\\,\\;\\=";
        String reserved = genDelims + subDelims;
        String unreserved = "\\w\\-\\.\\~"; // i.e. ALPHA / DIGIT / "-" / "." / "_" / "~"
        String allowed = reserved + unreserved;
        // ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
        uriPattern = Pattern.compile("((?:[^\\:/\\?\\#]+:)?//[" + allowed + "&&[^\\?\\#]]*(?:\\?([" + allowed + "&&[^\\#]]*))?(?:\\#[" + allowed + "]*)?).*");
    }
    return uriPattern;
}
You can use the above method as follows:
import java.net.URI;
Matcher uriMatcher = getUriPattern().matcher(text);
if (uriMatcher.matches()) {
    String candidateUriString = uriMatcher.group(1);
    try {
        new URI(candidateUriString); // check once again that you matched a valid URI
        // your code here
    } catch (Exception e) {
        // error handling
    }
}
This will capture the whole URL, including query parameters. You can then split it at the first occurrence of '?' (if any) and keep the first part. Of course, you can also rework the regex.
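The "split at the first '?'" step needs no further regex; plain String methods are enough. A sketch (the helper name withoutQuery is hypothetical; a #fragment without a preceding '?' is left untouched):

```java
class UrlParts {
    // Hypothetical helper: returns everything before the first '?', or the
    // whole string when there is no query part.
    static String withoutQuery(String url) {
        int q = url.indexOf('?');
        return q >= 0 ? url.substring(0, q) : url;
    }

    public static void main(String[] args) {
        System.out.println(withoutQuery("http://www.amazon.it/dp/B001FWYGJS?ie=UTF8&redirect=true"));
        System.out.println(withoutQuery("http://www.amazon.it/gp/product/B014KMQWU0/"));
    }
}
```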
I want to replace all :variable (word starting with :) with ${variable}$.
For example,
:aks_num with ${aks_num}$
:brn_num with ${brn_num}$
Following is my code, which does not work:
public static void main(String[] argv) throws Exception
{
    CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
    // replaceAll also not working
    //String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
    Pattern p = Pattern.compile(":\\([a-z_]*\\)");
    Matcher m = p.matcher(chSeq);
    if (m.find()) {
        System.out.println("Found value: " + m.group(0) );
        System.out.println("Found value: " + m.group(1) );
        System.out.println("Found value: " + m.group(2) );
    } else {
        System.out.println("NO MATCH");
    }
}
In a shell script, on the other hand, the following sed substitution works perfectly:
s/:\([a-z_]*\)/${\1}$/g
:\\([a-z_]*\\) (with escaped parentheses) means that you want to match expressions like :(aks_num). There is no such expression in the input string, which explains why there are no matches.
Instead, if you want to use parentheses to capture part of the match, do not escape them.
Example:
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
Pattern p = Pattern.compile(":([a-z_]*)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
    System.out.println("Found value: " + m.group(0) + ". Captured : " + m.group(1));
}
Output:
Found value: :aks_num. Captured : aks_num
Found value: :aks_num. Captured : aks_num
Found value: :brn_num. Captured : brn_num
Found value: :brn_num. Captured : brn_num
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":(\\w+)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
    System.out.println("Found value: " + m.group(1) );
}
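Working fine with replaceAll (keep the colon outside the group, otherwise $1 includes it and you get ${:aks_num}$):
Pattern p = Pattern.compile(":(\\w+)");
String x = chSeq.toString(); // the input string from the question
Matcher m = p.matcher(x);
x = m.replaceAll("\\${$1}\\$");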
You don't need to escape the parentheses, so
Pattern.compile(":([a-z_]*)");
should work.
I believe you confused Java's regex syntax with sed's basic regular expression syntax. In Java you do not need to escape parentheses to make them "special" grouping operators; conversely, escaped parentheses match literal ( and ) characters.
In the replacement pattern, $ must be escaped for the regex engine to replace with literal $ symbols, but you do not need to escape braces there.
So, just use
.replaceAll(":([a-z_]+)", "\\${$1}\\$")
See the IDEONE demo
I suggest the + quantifier because I doubt you want to match a : that is followed by a space, a digit or any other non-letter; with *, such a bare : would be replaced with ${}$.
BTW, you do not need any /g flag in Java since replaceAll will replace all matches with the provided replacement pattern.
NOTE: you can further adjust the pattern to match all letters/digits/underscores with ":(\\w+)". Or just alphanumerics/underscore: ":([\\p{Alnum}_]+)".
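A self-contained sketch that runs the replacement on the string from the question. It shows the escaped replaceAll from this answer and, as an alternative (Java 9+), Matcher.replaceAll with a function whose result is passed through Matcher.quoteReplacement so no manual $ escaping is needed. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedParams {
    private static final Pattern PARAM = Pattern.compile(":([a-z_]+)");

    // Variant 1: "\\$" in the replacement is a literal '$'; "$1" is the captured name.
    static String viaReplaceAll(String sql) {
        return sql.replaceAll(":([a-z_]+)", "\\${$1}\\$");
    }

    // Variant 2 (Java 9+): build each replacement in code; quoteReplacement
    // escapes the '$' characters so they are inserted literally.
    static String viaFunction(String sql) {
        return PARAM.matcher(sql)
                .replaceAll(r -> Matcher.quoteReplacement("${" + r.group(1) + "}$"));
    }

    public static void main(String[] args) {
        String sql = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
        System.out.println(viaReplaceAll(sql));
        System.out.println(viaFunction(sql));
    }
}
```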