I'm trying to extract the domain + subdomain from any URL (without the full URL suffix or http and www prefix).
I have the following list of inputs and expected results:
p.io -> p.io
amazon.com -> amazon.com
d.amazon.ca -> d.amazon.ca
domain.amazon.co.uk -> domain.amazon.co.uk
https://regex101.com/ -> regex101.com
www.regex101.comdddd -> regex101.com
www.wix.com.co -> wix.com.co
https://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions -> stackoverflow.com
smile.amazon.com -> smile.amazon.com
I'm using the following regex to extract domain + subdomain:
[^w.\:\/]+[a-zA-Z\.]?\.[a-zA-Z]{1,3}(\.[a-zA-Z]{1,3})?(\.[a-zA-Z]{0,3})?|[w]{1,2}[^w.]+\.[a-zA-Z]{1,3}(\.[a-zA-Z]{1,3})?
The issue is that it splits several domains in two, for example d.amazon.ca -> d.ama + zon.ca, and it also matches non-domain text such as what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions.
How can I force the regex to be greedy in the sense that it matches the full domain as a single match?
I'm using Java.
I'd use the standard URI class instead of a regular expression to parse out the domain:
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;
public class Demo {
private static Optional<String> getHostname(String domain) {
try {
// Add a scheme if missing
if (domain.indexOf("://") == -1) {
domain = "https://" + domain;
}
URI uri = new URI(domain);
return Optional.ofNullable(uri.getHost()).map(s -> s.startsWith("www.") ? s.substring(4) : s);
} catch (URISyntaxException e) {
return Optional.empty();
}
}
public static void main(String[] args) {
String[] domains = new String[] {
"p.io",
"amazon.com",
"d.amazon.ca",
"domain.amazon.co.uk",
"https://regex101.com/",
"www.regex101.comdddd", // .comdddd is (potentially) a valid TLD; not sure why your output removes the d's
"www.wix.com.co",
"https://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions",
"smile.amazon.com"
};
for (String domain : domains) {
System.out.println(getHostname(domain).orElse("hostname not found"));
}
}
}
outputs
p.io
amazon.com
d.amazon.ca
domain.amazon.co.uk
regex101.com
regex101.comdddd
wix.com.co
stackoverflow.com
smile.amazon.com
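If a regular expression is still preferred, a much simpler pattern than the one in the question can produce the same results for the listed inputs. The following is only a sketch under assumptions: the class name HostRegex and the method extractHost are made up for illustration, and the pattern does not handle user info (user@host) or bracketed IPv6 hosts.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostRegex {
    // Optional scheme, optional "www.", then everything up to the
    // first '/', ':', '?', '#' or whitespace character.
    private static final Pattern HOST = Pattern.compile(
            "^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?(?:www\\.)?([^/:?#\\s]+)");

    // Hypothetical helper: returns the host part, or empty if nothing matches.
    public static Optional<String> extractHost(String input) {
        Matcher m = HOST.matcher(input.trim());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] inputs = { "d.amazon.ca", "https://regex101.com/", "www.wix.com.co" };
        for (String input : inputs) {
            System.out.println(input + " -> " + extractHost(input).orElse("no match"));
        }
    }
}
```

Because the capture group is a single greedy character class, the host is always returned as one match and is never split in the middle.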
I am new to Geb, Spock and Groovy. I have a Groovy class that contains my JSON. In that class I count how many objects the JSON holds and read the key values of each object. I also have a separate Spock/Geb spec containing a very simple login test for the application.
The scenario I am trying to achieve is I want to generate data table in spock test based on data present in json file.
Here is what I have achieved so far.
My InputDataJson.groovy file
package resources
import geb.spock.GebSpec
import groovy.json.JsonSlurper
import spock.lang.Shared
class InputDataJson extends GebSpec{
@Shared
def inputJSON,
idValue, passwordValue, jsonSize
@Shared
def credsList = []
def setup() {
inputJSON = '''{
"validLogin":{
"username" : "abc",
"password" : "correcttest"
},
"invalidLogin":{
"username" : "xyz",
"password" : "badtest"
}
}'''
def JsonSlurper slurper = new JsonSlurper()
def TreeMap parsedJson = slurper.parseText(inputJSON)
jsonSize = parsedJson.size()
Set keySet = parsedJson.keySet()
int keySetCount = keySet.size()
for(String key : keySet){
credsList.add(new Creds(username: parsedJson[key].username,password:
parsedJson[key].password))
}
}
}
and here is my sample spock geb test
package com.test.demo
import grails.test.mixin.TestMixin
import grails.test.mixin.support.GrailsUnitTestMixin
import pages.LoginPage
import resources.InputDataJson
/**
* See the API for {@link grails.test.mixin.support.GrailsUnitTestMixin} for usage instructions
*/
@TestMixin(GrailsUnitTestMixin)
class SampleTest1Spec extends InputDataJson {
def credentialsList = []
def setup() {
credentialsList = credsList
}
def cleanup() {
}
void "test something"() {
}
def "This LoginSpec test"() {
given:
to LoginPage
when:'I am entering username and password'
setUsername(username)
setPassword(password)
login()
then: "I am being redirected to the homepage"
println("Hello")
where:
[username,password]<< getCreds()
//credsList[0]['username'] | credsList[0]['password']
}
def getCreds(){
println(" CREDS inside " + credsList)
println(" credentialsList : " + credentialsList)
}
}
The problem is that when I run this test in debug mode (I understand that Spock evaluates the where clause first), credsList and credentialsList are both null. When execution reaches the "when" section, it does fetch the correct user name and password. I am not sure where I am making a mistake.
Any help is well appreciated.
Leonard Brünings said:
try replacing setup with setupSpec
Exactly, this is the most important thing. You want something that is initialised before any feature method or iteration thereof starts. So if you want to initialise static or shared fields, this is the way to go.
Additionally, credsList contains Creds objects, not just pairs of user names and passwords. Therefore, if you want those in separate data variables, you need to dereference them in the Creds objects. Here is a simplified version of your Spock tests without any Grails or Geb, because your question is really just a plain Spock question:
package de.scrum_master.stackoverflow.q71122575
class Creds {
String username
String password
@Override
String toString() {
"Creds{" + "username='" + username + '\'' + ", password='" + password + '\'' + '}'
}
}
package de.scrum_master.stackoverflow.q71122575
import groovy.json.JsonSlurper
import spock.lang.Shared
import spock.lang.Specification
class InputDataJson extends Specification {
@Shared
List<Creds> credsList = []
def setupSpec() {
def inputJSON = '''{
"validLogin" : {
"username" : "abc",
"password" : "correcttest"
},
"invalidLogin" : {
"username" : "xyz",
"password" : "badtest"
}
}'''
credsList = new JsonSlurper().parseText(inputJSON)
.values()
.collect { login -> new Creds(username: login.username, password: login.password) }
}
}
package de.scrum_master.stackoverflow.q71122575
import spock.lang.Unroll
class CredsTest extends InputDataJson {
@Unroll("verify credentials for user #username")
def "verify parsed credentials"() {
given:
println "$username, $password"
expect:
username.length() >= 3
password.length() >= 6
where:
cred << credsList
username = cred.username
password = cred.password
}
}
The result in IntelliJ IDEA looks like this:
Try it in the Groovy web console
I am creating a Twitter sentiment analysis tool in Java. I am using the Twitter4J API to search tweets by hashtag and then run sentiment analysis on them. From my research, the best approach seems to be part-of-speech (POS) tagging with TreeTagger for Java.
At the moment, I am using the examples provided to see how the code works, although I am encountering some problems.
This is the code
import org.annolab.tt4j.*;
import static java.util.Arrays.asList;
public class Example {
public static void main(String[] args) throws Exception {
// Point TT4J to the TreeTagger installation directory. The executable is expected
// in the "bin" subdirectory - in this example at "/opt/treetagger/bin/tree-tagger"
System.setProperty("treetagger.home", "/opt/treetagger");
TreeTaggerWrapper<String> tt = new TreeTaggerWrapper<String>();
try {
tt.setModel("/opt/treetagger/models/english.par:iso8859-1");
tt.setHandler(new TokenHandler<String>() {
public void token(String token, String pos, String lemma) {
System.out.println(token + "\t" + pos + "\t" + lemma);
}
});
tt.process(asList(new String[] { "This", "is", "a", "test", "." }));
}
finally {
tt.destroy();
}
}
}
At the moment, when this is run, I receive an error which says
TreeTaggerWrapper cannot be resolved to a type
TokenHandler cannot be resolved to a type
I will be grateful for any help given
Thank you
I'm wondering if there are any nice simple ways to validate a Diameter URI (description below) using Java?
Note, a Diameter URI must have one of the forms:
aaa://FQDN[:PORT][;transport=TRANS][;protocol=PROT]
aaas://FQDN[:PORT][;transport=TRANS][;protocol=PROT]
The FQDN (mandatory) has to be replaced with the fully qualified host name (or IP), the PORT (optional, default is 3868) with the port number, TRANS (optional) with the transport protocol (can be TCP or SCTP) and PROT (optional) with diameter.
Some examples of the acceptable forms are:
aaa://server.com
aaa://127.0.0.1
aaa://server.com:1234
aaas://server.com:1234;transport=tcp
aaas://[::1]
aaas://[::1]:1234
aaas://[::1]:1234;transport=tcp;protocol=diameter
Note, as shown above, if using an IPv6 address, the address must be placed in box brackets, whereas the port number (if specified), with its colon separator, should be outside of the brackets.
I think doing this using regular expressions would be quite messy and difficult to understand, and other examples I have seen which don't use regex are just as awkward looking (such as https://code.google.com/p/cipango/source/browse/trunk/cipango-diameter/src/main/java/org/cipango/diameter/util/AAAUri.java?r=763).
So I was wondering whether there is a nicer way to do this, e.g. something like a URI validator library, which takes some rules (such as those for the Diameter URI above) and then applies them to some input to validate it?
I've had a look at the Google Guava libraries as well to see if there was anything that could help but I couldn't see anything offhand.
Many thanks!
Since the URI class is not sufficient, and in fact will throw exceptions for valid Diameter URIs, this is not such a trivial task.
I think a regex is the way to go here, but because of the complexity you might be better off placing it in a helper class. I agree that the code you linked to did not look very good -- you can do better! :)
Take a look at the following code example, where I've broken the regex down into its individual parts as a way to "document" what's happening.
It is by no means complete; it was written to conform to your examples. The IPv6 handling in particular needs work. You might also want the validation to report more information, such as why it failed.
But at least it's a beginning, and I think it is quite a bit better than the code you linked to. It might seem like an awful lot of code, but most of it is actually print statements and tests... :) In addition, since each part is broken down and kept in a field, you can create simple getters to access each part (if that is important to you).
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DiameterUri {
private String diameterUri;
private String protocol;
private String host;
private String port;
private String[] params;
public DiameterUri(String diameterUri) throws URISyntaxException {
this.diameterUri = diameterUri;
validate();
System.out.println(diameterUri);
System.out.println(" protocol=" + protocol);
System.out.println(" host=" + host);
System.out.println(" port=" + port);
System.out.println(" params=" + Arrays.toString(params));
}
private void validate() throws URISyntaxException {
String protocol = "(aaa|aaas)://"; // protocol- required
String ip4 = "[A-Za-z0-9.]+"; // host name or IPv4 address - part of "host"
String ip6 = "\\[::1\\]"; // IPv6 address (only the loopback ::1 so far) - part of "host"
String host = "(" + ip4 + "|" + ip6 + ")"; // host - required
String port = "(:\\d+)?"; // port - optional (one occurrence)
String params = "((;[a-zA-Z0-9]+=[a-zA-Z0-9]+)*)"; // params - optional (multiple occurrences)
String regEx = protocol + host + port + params;
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(diameterUri);
if (matcher.matches()) {
this.protocol = matcher.group(1);
this.host = matcher.group(2);
this.port = matcher.group(3) == null ? null : matcher.group(3).substring(1);
String paramsFromUri = matcher.group(4);
if (paramsFromUri != null && paramsFromUri.length() > 0) {
this.params = paramsFromUri.substring(1).split(";");
} else {
this.params = new String[0];
}
} else {
throw new URISyntaxException(diameterUri, "invalid");
}
}
public static void main(String[] args) throws URISyntaxException {
new DiameterUri("aaa://server.com");
new DiameterUri("aaa://127.0.0.1");
new DiameterUri("aaa://server.com:1234");
new DiameterUri("aaas://server.com:1234;transport=tcp");
new DiameterUri("aaas://[::1]");
new DiameterUri("aaas://[::1]:1234");
new DiameterUri("aaas://[::1]:1234;transport=tcp;protocol=diameter");
try {
new DiameterUri("127.0.0.1");
throw new RuntimeException("Expected URISyntaxException");
} catch (URISyntaxException ignore) {}
}
}
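The class above accepts any key=value parameter and any port number. As a possible next step, the rules stated in the question (transport is TCP or SCTP, protocol is diameter) could be checked after the regex matches. This is a sketch only: the class name DiameterUriCheck, the method isValid, the 1-65535 port range, the case-insensitive value comparison and the loose bracketed-IPv6 pattern are all assumptions, not part of the answer above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiameterUriCheck {
    // scheme, host (name/IPv4 or loosely matched bracketed IPv6),
    // optional port, optional ";key=value" parameters
    private static final Pattern URI_PATTERN = Pattern.compile(
            "(?<scheme>aaas?)://"
          + "(?<host>\\[[0-9A-Fa-f:.]+\\]|[A-Za-z0-9.-]+)"
          + "(?::(?<port>\\d{1,5}))?"
          + "(?<params>(?:;[A-Za-z]+=[A-Za-z0-9]+)*)");

    public static boolean isValid(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        if (!m.matches()) {
            return false;
        }
        String port = m.group("port");
        if (port != null) {
            int p = Integer.parseInt(port);
            if (p < 1 || p > 65535) {   // assumed valid port range
                return false;
            }
        }
        String params = m.group("params");
        if (params.isEmpty()) {
            return true;
        }
        for (String param : params.substring(1).split(";")) {
            String[] kv = param.split("=", 2);
            String value = kv[1].toLowerCase(Locale.ROOT);
            if (kv[0].equals("transport")) {
                if (!value.equals("tcp") && !value.equals("sctp")) return false;
            } else if (kv[0].equals("protocol")) {
                if (!value.equals("diameter")) return false;
            } else {
                return false;           // unknown parameter name
            }
        }
        return true;
    }
}
```

Duplicate parameters and the exact IPv6 syntax are deliberately not checked here; they would need extra rules on top of this sketch.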
I would like to use JNDI to look up Kerberos SRV records in a local network. I try to guess the local domain in hopefully clever ways. If that fails I would like to look up the plain entry, e.g. _kerberos._tcp without any suffix and rely on the DNS domain search list to find the right entry. This works on Windows with nslookup -type=srv _kerberos._tcp and Linux with host -t srv _kerberos._tcp. The domain example.test is appended and the entry is found.
Here is an example program to do DNS lookups via JNDI:
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
public class JndiDnsTest {
public static void main(String[] args) {
if (args.length < 2) {
System.out.println("Usage: " + JndiDnsTest.class.getName() +
" name record-types...");
return;
}
String name = args[0];
String[] recordTypes = new String[args.length - 1];
System.arraycopy(args, 1, recordTypes, 0, args.length - 1);
Hashtable<String, String> env = new Hashtable<String,String>();
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
try {
DirContext ctx = new InitialDirContext(env);
Attributes dnsQueryResult = ctx.getAttributes(name, recordTypes);
if (dnsQueryResult == null) {
System.out.println("Not found: '" + name + "'");
}
for (String rrType: recordTypes) {
Attribute rr = dnsQueryResult.get(rrType);
if (rr != null) {
for (NamingEnumeration<?> vals = rr.getAll(); vals.hasMoreElements();) {
System.out.print(rrType + "\t");
System.out.println(vals.nextElement());
}
}
}
} catch (NamingException e) {
e.printStackTrace(System.err);
}
System.out.println("\nThe DNS search list:");
for (Object entry: sun.net.dns.ResolverConfiguration.open().searchlist()) {
System.out.println(entry);
}
System.out.println("\nsun.net.spi.nameservice.domain = " +
System.getProperty("sun.net.spi.nameservice.domain"));
}
}
It appears to me that JNDI only does a single lookup for the name exactly as given. No entry is found where the above commands succeed, so it seems that it does not use the DNS search list. The contents of the search list are printed correctly at the bottom, though.
On the other hand the Networking properties documentation says that
If the sun.net.spi.nameservice.domain property is not defined then the provider will use any domain or domain search list configured in the platform DNS configuration.
(The property is not set.) The Java version is Sun Java 1.6.0_20.
Does JNDI use the DNS search list or not?
It's a known bug - http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6427214
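If the JNDI DNS provider does ignore the search list, one possible workaround is to expand the name yourself and query each candidate in turn. The sketch below only builds the candidate names. The class and method names are hypothetical, and the ordering (suffixed names first, bare name last) is an assumption, since real resolvers also take options such as ndots into account.

```java
import java.util.ArrayList;
import java.util.List;

public class SearchListExpander {
    // Builds the names to try for a lookup, given a DNS search list.
    // A name ending in '.' is treated as absolute and is not expanded.
    public static List<String> candidates(String name, List<String> searchList) {
        List<String> result = new ArrayList<>();
        if (name.endsWith(".")) {
            result.add(name);
            return result;
        }
        for (String domain : searchList) {
            result.add(name + "." + domain);
        }
        result.add(name); // finally try the name as given
        return result;
    }
}
```

Each candidate could then be passed to ctx.getAttributes(candidate, recordTypes) as in the program above, moving on to the next one when a NamingException is thrown.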
How can I create a regex to detect whether a URL string contains a file extension (.pdf, .jpeg, .asp, .cfm, ...)?
Valid (without extensions):
http://www.yahoo.com
http://dbpedia.org/ontology/
http://www.rdf.com.br
Invalid (with extensions):
http://www.thesis.com/paper.pdf
http://pics.co.uk/mypic.png
http://jpeg.com/images/cool/the_image.JPEG
Thanks,
Celso
In Java, you are better off using String.endsWith(). This is faster and easier to read.
Example:
"file.jpg".endsWith(".jpg") == true
Alternative version without a regexp, using the URI class:
import java.net.*;
class IsFile {
public static void main( String ... args ) throws Exception {
URI u = new URI( args[0] );
for( String ext : new String[] {".png", ".pdf", ".jpg", ".html" } ) {
if( u.getPath().endsWith( ext ) ) {
System.out.println("Yeap");
break;
}
}
}
}
Works with:
java IsFile "http://download.oracle.com/javase/6/docs/api/java/net/URI.html#getPath()"
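Note that endsWith is case-sensitive, so the loop above would miss the question's the_image.JPEG example. A minimal variation, assuming a hypothetical class ExtensionCheck and an illustrative (not exhaustive) extension list, lower-cases the path before comparing:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExtensionCheck {
    // Illustrative list only; extend it as needed.
    private static final List<String> EXTENSIONS =
            Arrays.asList(".pdf", ".png", ".jpg", ".jpeg", ".asp", ".cfm");

    public static boolean hasKnownExtension(String url) {
        String path;
        try {
            path = new URI(url).getPath(); // excludes host, query and fragment
        } catch (URISyntaxException e) {
            return false;                  // treat unparsable input as "no extension"
        }
        if (path == null) {
            return false;
        }
        String lower = path.toLowerCase(Locale.ROOT);
        for (String ext : EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```

Because only the path is inspected, a host such as www.rdf.com.br or jpeg.com is never mistaken for a file name.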
How about this?
// assuming the file extension is either 3 or 4 characters long
public boolean hasFileExtension(String s) {
return s.matches("^[\\w\\d\\:\\/\\.]+\\.\\w{3,4}(\\?[\\w\\W]*)?$");
}
@Test
public void testHasFileExtension() {
assertTrue("3-character extension", hasFileExtension("http://www.yahoo.com/a.pdf"));
assertTrue("3-character extension", hasFileExtension("http://www.yahoo.com/a.htm"));
assertTrue("4-character extension", hasFileExtension("http://www.yahoo.com/a.html"));
assertTrue("3-character extension with param", hasFileExtension("http://www.yahoo.com/a.pdf?p=1"));
assertTrue("4-character extension with param", hasFileExtension("http://www.yahoo.com/a.html?p=1&p=2"));
assertFalse("2-character extension", hasFileExtension("http://www.yahoo.com/a.co"));
assertFalse("2-character extension with param", hasFileExtension("http://www.yahoo.com/a.co?p=1&p=2"));
assertFalse("no extension", hasFileExtension("http://www.yahoo.com/hello"));
assertFalse("no extension with param", hasFileExtension("http://www.yahoo.com/hello?p=1&p=2"));
assertFalse("no extension with param ends with .htm", hasFileExtension("http://www.yahoo.com/hello?p=1&p=a.htm"));
}
Not a Java developer anymore, but you could define what you're looking for with the following regex
"/\.(pdf|jpe{0,1}g|asp|docx{0,1}|xlsx{0,1}|cfm)$/i"
Not certain what the function would look like.
If the following code returns true, the URL ends with a file extension:
urlString.matches("\\p{Graph}+\\.\\p{Alpha}{2,4}$");
Assuming that a file extension is a dot followed by 2, 3 or 4 alphabetic chars.
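One caveat when a pattern like this is applied to the whole URL: a bare host such as http://www.yahoo.com also ends in a dot followed by alphabetic characters, so it is reported as having an extension, although the question lists it as a URL without one. A possible variation, sketched here with a hypothetical class name, applies the same idea to the path component only:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathExtensionRegex {
    // Same "dot plus 2-4 letters" assumption as above, but checked against
    // the URI path only, so the host name and query string are ignored.
    public static boolean hasExtension(String url) {
        try {
            String path = new URI(url).getPath();
            return path != null && path.matches(".*\\.\\p{Alpha}{2,4}");
        } catch (URISyntaxException e) {
            return false; // treat unparsable input as "no extension"
        }
    }

    public static void main(String[] args) {
        // The whole-URL pattern accepts a bare host name:
        System.out.println("http://www.yahoo.com".matches("\\p{Graph}+\\.\\p{Alpha}{2,4}$")); // true
        // The path-only check does not:
        System.out.println(hasExtension("http://www.yahoo.com")); // false
        System.out.println(hasExtension("http://www.thesis.com/paper.pdf")); // true
    }
}
```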