In Java, how do I extract the domain of a URL?

I'm using Java 8. I want to extract the domain portion of a URL. In case I'm using the word "domain" incorrectly: what I want is that if my server name is
test.javabits.com
I want to extract "javabits.com". Similarly, if my server name is
firstpart.secondpart.lastpart.org
I want to extract "lastpart.org". I tried the below
final String domain = request.getServerName().replaceAll(".*\\.(?=.*\\.)", "");
but it's not extracting the domain properly. Then I tried what this guy has on his site -- https://www.mkyong.com/regular-expressions/domain-name-regular-expression-example/, e.g.
private static final String DOMAIN_NAME_PATTERN = "^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$";
but that is also not extracting what I want. How can I extract the domain name portion properly?

Summary: Do not use regex for this. Use whois.
If I try to extrapolate from your question, to find out what you really want to do, I guess you want to find the domain belonging to some non-infrastructural owner from the host part of a URL. Additionally, from the tag of your question, you want to do it with the help of a regex.
The task you are undertaking is at best impractical, but probably impossible.
There are a number of corner cases that you would have to weed out. Apart from the list of infrastructural domains kindly provided by Lennart in https://publicsuffix.org/list/public_suffix_list.dat, you also have the cases of an empty host field in the URL or an IP-address forming the host part.
So, is there a better approach to this? Of course there is. What you do want to do is query a public database for the data you need. The protocol for such queries is called WHOIS.
Apache Commons Net provides an easy way to access WHOIS information through its WhoisClient class. From there you can query the domain field, and find some more information that may be useful to you.
It shouldn't be harder than
import org.apache.commons.net.whois.WhoisClient;
import java.io.IOException;

public class CommonsTest {
    public static void main(String[] args) {
        // The name to look up; replace it with the host taken from your URL.
        String domain = "stackexchange.com";
        WhoisClient c = new WhoisClient();
        try {
            c.connect(WhoisClient.DEFAULT_HOST);
            System.out.println(c.query(domain));
            c.disconnect();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Using this will get you the whois information about the domain you are asking for. If the name you query is unregistered, that is, it is a private subdomain, as in the case of www.stackexchange.com, you will get an error saying that no such domain is registered. Remove the first part of the address and try again. Once you have found the registered domain, you will also find the registrar and the registrant.
Now, unfortunately, whois is not as simple as one would think. Read further on https://manpages.debian.org/jessie/whois/whois.1.en.html for an elaboration on how to use it and what information you can expect from different sources.
Also, check related questions here.
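The "remove the first part and try again" step can be sketched without any WHOIS library. This is only an illustration: the class and method names are made up here, and the actual lookup of each candidate (for example with WhoisClient) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache Commons Net.
final class WhoisCandidates {
    // Lists the names to query in order: the full host first, then the host
    // with one more leading label removed each time. The bare TLD is skipped.
    static List<String> candidates(String host) {
        List<String> result = new ArrayList<>();
        String current = host;
        while (current.indexOf('.') >= 0) {
            result.add(current);
            current = current.substring(current.indexOf('.') + 1);
        }
        return result;
    }
}
```

You would query each candidate in turn and stop at the first one the WHOIS server reports as registered.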

try it like this:
String[] parts = longDomain.split("\\."); // split() takes a regex, so the dot must be escaped
String domain = parts[parts.length - 2] + "." + parts[parts.length - 1];
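A slightly more defensive version of the split approach might look like the sketch below. The class and method names are illustrative. It assumes the wanted domain is always the last two labels, which is wrong for suffixes such as co.uk and for IP addresses; handling those needs the public suffix list mentioned in the other answer.

```java
import java.net.URI;

// Hypothetical helper; naive "last two labels" rule only.
final class DomainExtractor {
    // Returns the last two dot-separated labels of the host, or the host
    // unchanged when it has fewer than two labels.
    static String lastTwoLabels(String host) {
        String[] parts = host.split("\\.");
        if (parts.length < 2) {
            return host;
        }
        return parts[parts.length - 2] + "." + parts[parts.length - 1];
    }

    // Pulls the host out of a full URL first; returns null when the URL has no host.
    static String fromUrl(String url) {
        String host = URI.create(url).getHost();
        return host == null ? null : lastTwoLabels(host);
    }
}
```

In a servlet you would pass request.getServerName() to lastTwoLabels directly.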

Related

Enumerate Custom Slot Values from Speechlet

Is there any way to inspect or enumerate the Custom Slot Values that are set-up in your interaction model? For Instance, Say you have an intent schema with the following intent:
{
"intent": "MySuperCoolIntent",
"slots":
[
{
"name": "ShapesNSuch",
"type": "LIST_OF_SHAPES"
}
]
}
Furthermore, you've defined the LIST_OF_SHAPES Custom Slot to have the following Values:
SQUARE
TRIANGLE
CIRCLE
ICOSADECAHECKASPECKAHEDRON
ROUND
HUSKY
Question: is there a method I can call from my Speechlet or my RequestStreamHandler that will give me an enumeration of those Custom Slot Values??
I have looked through the Alexa Skills Kit's SDK Javadocs Located Here
And I'm not finding anything.
I know I can get the Slot's value that is sent in with the intent:
String slotValue = incomingIntentRequest.getIntent().getSlot("ShapesNSuch").getValue(); // getSlot takes the slot name, not its type
I can even enumerate ALL the incoming Slots (and with it their values):
Map<String, Slot> slotMap = incomingIntentRequest.getIntent().getSlots();
for (Map.Entry<String, Slot> entry : slotMap.entrySet()) {
    String key = entry.getKey();
    Slot slot = entry.getValue();
    String slotName = slot.getName();
    String slotValue = slot.getValue();
    //do something nifty with the current slot info....
}
What I would really like is something like:
String myAppId = "amzn1.echo-sdk-ams.app.<TheRestOfMyID>";
List<String> possibleSlotValues = SomeMagicAlexaAPI.getAllSlotValues(myAppId, "LIST_OF_SHAPES");
With this information I wouldn't have to maintain two separate "Lists" or "Enumerations"; One within the interaction Model and another one within my Request Handler. Seems like this should be a thing right?

No, the API does not allow you to do this.
However, since your interaction model is intimately tied with your development, I would suggest you check in the model with your source code in your source control system. If you are going to do that, you might as well put it with your source. Depending on your language, that also means you can probably read it during run-time.
Using this technique, you can gain access to your interaction model at run-time. Instead of doing it automatically through an API, you do it by best practice.
You can see several examples of this in action for Java in TsaTsaTzu's examples.
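Reading the checked-in slot values at run time could look roughly like this. It is a sketch under assumptions: the values are kept one per line in a text file shipped with the skill, and the class name and file name are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the Alexa Skills Kit SDK.
final class SlotValues {
    // Reads one slot value per line, skipping blank lines and trimming whitespace.
    static List<String> read(Reader source) {
        List<String> values = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String value = line.trim();
                if (!value.isEmpty()) {
                    values.add(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

In the Speechlet you might wrap a classpath resource such as /LIST_OF_SHAPES.txt (an assumed name) in an InputStreamReader and pass it to read, so the interaction model and the handler share one list.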

No - there is nothing in the API that allows you to do that.
You can see the full extent of the Request Body structure Alexa gives you to work with. It is very simple and available here:
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference#Request%20Format
Please note, the Request Body is not to be confused with the request, which is a structure in the request body, with two siblings: version and session.

I am coding in Android Studio, and I need to fetch and display a specific line of data from a specific webpage

I am very new to coding in Java/Android Studio. I have everything set up that I have been able to figure out thus far. I have a button, and I need to put code inside the button click event that will fetch information from a website, convert it to a string, and display it. I figured I would have to use the HTML source code to do this, so I have installed the Jsoup HTML parser. All of the help with Jsoup I have found only gets me as far as loading the HTML into a "Document", and I am not sure that is the best way to accomplish what I need. Can anyone tell me what code to use to fetch the HTML from the website, search through it for a specific match, and convert that match to a string? Or can anyone tell me if there is a better way to do this? I only need to grab one piece of information and display it.
Here is the piece of html code that contains the value I want:
writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015','All',' ',' ',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);
I need to grab and display whatever value represents the quotes['KEH15'], in that html code.
Thank you in advance for your help.
Keith

Grabbing raw HTML is an extremely tedious way to access information from the web, bad practice, and difficult to maintain in the case that wherever you are fetching the info from changes their HTML.
I don't know your specific situation and what the data is that you are fetching, but if there is another way for you to fetch that data via an API, use that instead.
Since you say you are pretty new to Android and Java, let me explain something I wish had been explained to me very early on (although I am mostly self taught).
The way people access information across the Internet is traditionally through HTML and JavaScript (which is interpreted by your browser like Chrome or Firefox to look pretty), which are transferred over the internet using the protocol called HTTP. This is a great way for humans to communicate with computers that are far away, and the average person probably doesn't realize that there is more to the internet than this--your browser and the websites you can go to.
Although there are multiple methods, for the purpose of what I think you're looking for, applications communicate over the internet a slightly different way:
When an android application asks a server for some information, rather than returning HTML and JavaScript which is intended for human consumption, the server will (traditionally) return what's called JSON (or sometimes XML, which is very similar). JSON is a very simple way to get information about an object, and put it into a form that is readable easily by both humans (developers) and computers, and can be transmitted over the internet easily. For example, let's say you ask a server for some kind of "Video" object for an app that plays video, it may give you something like this:
{
  "name": "Gangnam Style",
  "metadata": {
    "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
    "views": 2000000000,
    "ageRestricted": false,
    "likes": 43434,
    "dislikes": 124
  },
  "comments": [
    {
      "username": "John",
      "comment": "10/10 would watch again"
    },
    {
      "username": "Jane",
      "comment": "12/10 with rice"
    }
  ]
}
That is very readable by us humans, but also by computers! We know the name is "Gangnam Style", the link of the video, etc.
A super helpful way to work with JSON in Java and Android is Google's Gson library, which lets you serialize a Java object to JSON or parse JSON into a Java object.
To get this information in the first place, you have to make a network call to an API, Application Programming Interface. Just a fancy term for communication between a server and a client. One very cool, free, and easy to understand API that I will use for this example is the OMDB API, which just spits back information about movies from IMDB. So how do you talk to the API? Well luckily they've got some nice documentation, which says that to get information on a movie we need to use some parameters in the url, like perhaps
http://www.omdbapi.com/?t=Interstellar
They want a title with the parameter "t". We could put a year, or return type, but this should be good to understand the basics. If you go to that URL in your browser, it spits back lots of information about Interstellar in JSON form. That stuff we were talking about! So how would you get this information from your Android application?
Well, you could use Android's built-in HttpURLConnection class and research for a few hours on why your calls aren't working. But doesn't essentially every app now use networking? Why reinvent the wheel when virtually every valuable app out there has probably done this work before? Perhaps we can find some code online to do this work for us.
Or even better, a library! In particular, Retrofit, an open source library developed by Square. There are multiple libraries like it (go ahead and research them; it's best to find the right fit for your project), but the idea is that they do all the hard work for you, like low-level network programming. Following their guides, you can reduce a lot of code to just a few lines. So for our OMDB API example, we can set up our network calls like this:
//OMDB API (Retrofit 1.x with Gson)
import com.google.gson.JsonObject;
import retrofit.Callback;
import retrofit.RestAdapter;
import retrofit.http.GET;
import retrofit.http.Query;

public class ApiClient {
    //the shared instance of the API interface
    private static OmdbApiInterface sOmdbApiInterface;

    //if the omdbApiInterface object has been instantiated, return it, but if not, build it then return it.
    public static OmdbApiInterface getOmdbApiClient() {
        if (sOmdbApiInterface == null) {
            RestAdapter restAdapter = new RestAdapter.Builder()
                    .setEndpoint("http://www.omdbapi.com")
                    .build();
            sOmdbApiInterface = restAdapter.create(OmdbApiInterface.class);
        }
        return sOmdbApiInterface;
    }

    public interface OmdbApiInterface {
        @GET("/")
        void getInfo(@Query("t") String title, Callback<JsonObject> callback);
    }
}
After you have researched and understood what's going on up there using their documentation, you can use this class anywhere in your application to call the API:
//you could get a user input string and pass it in as movieName
ApiClient.getOmdbApiClient().getInfo(movieName, new Callback<JsonObject>() {
    //the nice thing here is that Retrofit deals with the JSON for you, so you can just get information right here from the JSON object
    @Override
    public void success(JsonObject movie, Response response) {
        Log.i("TAG", "Movie name is " + movie.get("Title").getAsString());
    }

    @Override
    public void failure(RetrofitError error) {
        Log.e("TAG", error.getMessage());
    }
});
Now you've made an API call to get info from across the web! Congratulations! Now do what you want with the data. In this case we used Omdb but you can use anything that has this method of communication. For your purposes, I don't know exactly what data you are trying to get, but if it's possible, try to find a public API or something where you can get it using a method similar to this.
Let me know if you've got any questions.
Cheers!

As @caleb-allen said, if an API is available to you, it's better to use that.
However, I'm assuming that the web page is all you have to work with.
There are many libraries that can be used on Android to get the content of a URL.
Choices range from the bare-bones HttpURLConnection, to the slightly higher-level HttpClient, to robust libraries like Retrofit. I personally recommend Retrofit. Whatever you do, make sure that your HTTP access is asynchronous, and not done on the UI thread. Retrofit will handle this for you by default.
For parsing the results, I've had good results in the past using the open-source HTMLCleaner library - see http://htmlcleaner.sourceforge.net
Similar to JSoup, it takes a possibly-badly-formed HTML document and creates a valid XML document from it.
Once you have a valid XML document, you can use HTMLCleaner's implementation of the XML DOM to parse the document to find what you need.
Here, for example, is a method that I use to parse the names of 'projects' from a <table> element on a web page where projects are links within the table:
private List<Project> parseProjects(String html) throws Exception {
    List<Project> parsedProjects = new ArrayList<Project>();
    HtmlCleaner pageParser = new HtmlCleaner();
    TagNode node = pageParser.clean(html);
    // Select the table that holds the project links.
    String xpath = "//table[@class='listtable']";
    Object[] tables = node.evaluateXPath(xpath);
    TagNode tableNode;
    if (tables.length > 0) {
        tableNode = (TagNode) tables[0];
    } else {
        throw new Exception("projects table not found in html");
    }
    // Every <a> inside the table is one project; its id is the value after '=' in the href.
    TagNode[] projectLinks = tableNode.getElementsByName("a", true);
    for (int i = 0; i < projectLinks.length; i++) {
        TagNode link = projectLinks[i];
        String projectName = link.getText().toString();
        String href = link.getAttributeByName("href");
        String projectIdString = href.split("=")[1];
        int projectId = Integer.parseInt(projectIdString);
        Project project = new Project(projectId, projectName);
        parsedProjects.add(project);
    }
    return parsedProjects;
}
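For the line in the question, which sits inside a script rather than in an HTML element, a regular expression over the raw page source may be simpler than a DOM parser. The sketch below assumes the page source is already in a string and that the row always starts with writeBidRow('Wheat'. The class name is made up. It only pulls out the key inside quotes[...], because the question does not show where the page defines the value behind that key.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the writeBidRow line shown in the question.
final class BidRowParser {
    // Captures the text between quotes[' and '] on the Wheat row.
    private static final Pattern QUOTE_KEY =
            Pattern.compile("writeBidRow\\('Wheat'.*?quotes\\['([^']+)'\\]");

    // Returns the quote key (for example KEH15), or null when the row is not found.
    static String quoteKey(String pageSource) {
        Matcher m = QUOTE_KEY.matcher(pageSource);
        return m.find() ? m.group(1) : null;
    }
}
```

Once you have the key, a second pattern of the same kind could look for the place where the page assigns that entry of quotes.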

If you have permission to edit the web page, you can add an anchor at the line you want and link straight to it.
First, add an element with an id at the start of the line you want to jump to:
<a id="target">heading text, if you want any</a>
Then, in your Android app, put this in the click handler of your control:
this.mWebView.loadUrl("https://#######.com.html#target");
Put the address of your web page to the left of the #, followed by #target; in this example the id is target.
Excuse me if my English isn't good.

Handling multiple parameters in a URI (RESTfully) in Java

I've been working on a small scale web service in Java/Jersey which reads lists of user information from clients contained in XML files. I currently have this functioning in all but one aspect: using multiple parameters in the URI to denote pulling multiple sets of user information or multiple sets of client information. I have a version which currently works, but is not the best way nor what the project description calls for.
Currently, my code looks like this:
@Path("Client/{client}/users")
public class UserPage
{
    @GET
    @Produces(MediaType.TEXT_HTML)
    public String userChoice(@PathParam(value = "client") final String client)
    {****Method here which handles a list of 'users'****}

    @GET
    @Path("{name}")
    @Produces(MediaType.TEXT_HTML)
    public String userPage(@PathParam(value = "client") final String client, @PathParam(value = "name") final String name)
    {****Method here which handles 'user' information****}
}
The first method handles a list of users from a 'client' denoted by "{client}" in the URI. The second method delivers 'user' information denoted by "{name}" in the URI. Both will function with a single argument. Currently, in order to handle multiple 'users' I have "{name}" comma separated like "Client/Chick-Fil-A/users/Phil,Bradley". I can parse this after using @PathParam and create an array of these 'users', but again, I feel this is not the best way to handle this, and the project description calls for something different.
Is there a way to accomplish this same task with a URI formatted as "Client/Chick-Fil-A;cd=Phil,Bradley"? (The ;cd= is what's giving me the most trouble.)
I also need to be able to use this format for multiple clients, i.e. "Client;cd=Chick-Fil-A,Subway/users;cd=Phil,Bradley".
Edit: To clarify the project:
The client information is contained in 6 separate files. Each of these files has the same 3 users (this is a proof of concept, effectively). I need to be able to pull different subsets of information, for instance, user Phil from McDonalds and Chick-Fil-A, or users Phil and Peter from McDonalds, or users named Peter from all clients, etc.

I would avoid '=' in the URL path, since it is a reserved character. There are many other characters you can use as delimiters, such as '-' and ','. So instead of '=' you can use '-'. If you really want to use '=', URL-encode it; however, I would strongly recommend against this because it may make things more complicated than they should be.
You can see the grammar of the URL string here:
http://www.w3.org/Addressing/URL/url-spec.txt
Copy and search the following string to skip to the path grammar:
path void | segment [ / path ]
segment xpalphas
That said, I believe an HTTP request is usually meant to fetch a single resource. So my personal opinion is to not implement the service the way you have. For getting multiple clients I would use query parameters as filters, like this:
Client/{cName}/users?filters=<value1>,<value2> ...
Edit: From the business case you describe, it seems you probably need services like
/users?<filters>
/clients?<filters>
So if you want to get Peter from all clients, you can send a request of this form:
/users?name=Peter
Similarly, if you want to get Jack and Peter from Starbucks, you can do:
/users?name=Peter,Jack&client=Starbucks
Hopefully this helps.

Query strings have the following syntax and you can have multiple parameters with the same name:
http://server/path/program?<query_string>
where query_string has the following syntax:
field1=value1&field1=value2&field1=value3…
For more details check out this entry in Wikipedia: http://en.wikipedia.org/wiki/Query_string
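To make the repeated-name syntax concrete, here is a small stdlib-only sketch that collects every value given for the same field. The class name is illustrative. In a JAX-RS resource you would normally not parse this by hand, since a @QueryParam parameter can be declared as a List<String>.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; shows how repeated fields in a query string group together.
final class QueryStrings {
    // Parses "a=1&a=2&b=3" into {a=[1, 2], b=[3]}, keeping first-seen field order.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

With this, name=Peter&name=Jack&client=Starbucks yields two values for name and one for client.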

How to mask credit card numbers in log files with Log4J?

Our web app needs to be made PCI compliant, i.e. it must not store any credit card numbers. The app is a frontend to a mainframe system which handles the CC numbers internally and, as we have just found out, occasionally still spits out a full CC number on one of its response screens. By default, the whole content of these responses is logged at debug level, and the content parsed from them can also be logged in lots of different places. So I can't hunt down every source of such data leaks; I must make sure that CC numbers are masked in our log files.
The regex part is not an issue, I will reuse the regex we already use in several other places. However I just can't find any good source on how to alter a part of a log message with Log4J. Filters seem to be much more limited, only able to decide whether to log a particular event or not, but can't alter the content of the message. I also found the ESAPI security wrapper API for Log4J which at first sight promises to do what I want. However, apparently I would need to replace all the loggers in the code with the ESAPI logger class - a pain in the butt. I would prefer a more transparent solution.
Any idea how to mask out credit card numbers from Log4J output?
Update: Based on @pgras's original idea, here is a working solution:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.spi.LoggingEvent;

public class CardNumberFilteringLayout extends PatternLayout {
    // Keep the first four digits and replace the rest with a fixed run of '+'.
    private static final String MASK = "$1++++++++++++";
    private static final Pattern PATTERN = Pattern.compile("([0-9]{4})([0-9]{9,15})");

    @Override
    public String format(LoggingEvent event) {
        if (event.getMessage() instanceof String) {
            String message = event.getRenderedMessage();
            Matcher matcher = PATTERN.matcher(message);
            if (matcher.find()) {
                String maskedMessage = matcher.replaceAll(MASK);
                @SuppressWarnings({ "ThrowableResultOfMethodCallIgnored" })
                Throwable throwable = event.getThrowableInformation() != null ?
                        event.getThrowableInformation().getThrowable() : null;
                // Build a copy of the event that carries the masked message.
                LoggingEvent maskedEvent = new LoggingEvent(event.fqnOfCategoryClass,
                        Logger.getLogger(event.getLoggerName()), event.timeStamp,
                        event.getLevel(), maskedMessage, throwable);
                return super.format(maskedEvent);
            }
        }
        return super.format(event);
    }
}
Notes:
I mask with + rather than *, because I want to tell apart cases when the CID was masked by this logger, from cases when it was done by the backend server, or whoever else
I use a simplistic regex because I am not worried about false positives
The code is unit tested so I am fairly convinced it works properly. Of course, if you spot any possibility to improve it, please let me know :-)

You could write your own layout and configure it for all appenders...
Layout has a format method which makes a String from a loggingEvent that contains the logging message...

A better implementation of credit card number masking is at http://adamcaudill.com/2011/10/20/masking-credit-cards-for-pci/ .
You want to log the issuer and the checksum, but not the PAN (Primary Account Number).
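One way to keep the issuer and the check digit while hiding the rest is sketched below. It is not the linked article's code. It assumes card numbers appear as unbroken runs of 13 to 19 digits, keeps the first six digits (the issuer prefix) and the last four (which include the check digit), and masks the middle with + as in the question. The class name is made up.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; masks the middle digits of anything that looks like a PAN.
final class PanMasker {
    // 6 leading digits, 3 to 9 middle digits, 4 trailing digits: 13 to 19 in total.
    private static final Pattern PAN =
            Pattern.compile("\\b(\\d{6})(\\d{3,9})(\\d{4})\\b");

    static String mask(String message) {
        Matcher m = PAN.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // One '+' per hidden digit, so the masked number keeps its length.
            char[] fill = new char[m.group(2).length()];
            Arrays.fill(fill, '+');
            String masked = m.group(1) + new String(fill) + m.group(3);
            m.appendReplacement(out, Matcher.quoteReplacement(masked));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The mask method could replace the replaceAll call inside a custom layout like the one in the question.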

How to designate resources as do-not-translate?

I work on the localization of Java software, and my projects have both .properties files and XML resources. We currently use comments to instruct translators to not translate certain strings, but the problem with comments is that they are not machine-readable.
The only solution I can think of is to prefix each do-not-translate key with something like _DNT_ and train our translation tools to ignore these entries. Does anyone out there have a better idea?

Could you break the files up into ones to be translated and ones not to be translated, and then only send out the ones that are to be translated? (I don't know the structure, so it's hard to tell whether that is practical.)
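If the keys were marked with the _DNT_ prefix proposed in the question, that split could be automated. A rough sketch, with an invented class name and assuming a standard .properties file:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; separates do-not-translate entries by key prefix.
final class DntSplitter {
    static final String DNT_PREFIX = "_DNT_";

    // Returns a two-element array: index 0 holds the translatable entries,
    // index 1 holds the entries whose key starts with the DNT prefix.
    static Properties[] split(Reader source) {
        Properties all = new Properties();
        try {
            all.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Properties translatable = new Properties();
        Properties doNotTranslate = new Properties();
        for (String key : all.stringPropertyNames()) {
            Properties target = key.startsWith(DNT_PREFIX) ? doNotTranslate : translatable;
            target.setProperty(key, all.getProperty(key));
        }
        return new Properties[] { translatable, doNotTranslate };
    }
}
```

Only the first Properties object would then be written out and sent to the translators.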

The Eclipse JDT also uses comments to prevent the translation of certain Strings:
How to write Eclipse plug-ins for the international market
I think your translation tool should work in a similar way?

The simplest solution is to not put do-not-translate strings (DNTs) in your resource files.
.properties files don't offer much in the way of metadata handling, and since you don't need the data at runtime, its presence in .properties files would be a side-effect rather than something that is desirable. Consider too, partial DNTs where you have something that cannot be translated contained in a translatable string (e.g. a brand name or URI).
"IDENTIFIER english en en en" -> "french fr IDENTIFIER fr fr"
As far as I am aware, even standards like XLIFF do not take DNTs into consideration and you'll have to manage them through custom metadata files, terminology files and/or comments (such as the note element in XLIFF).

As axelclk posted in his link, Eclipse provides the
//$NON-NLS-1$
comment to mark the first string on that line as one that should not be translated. You can find all other strings by calling
Source->Externalize Strings
The externalized strings cover all the languages you want to support.
The file that holds the translations looks like:
PluginPage.Error1 = text1
PluginPage.Error2 = text2
The class that reads the translations:
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class PluginMessages {
    private static final String BUNDLE_NAME = "com.plugin.name"; //$NON-NLS-1$
    private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle.getBundle(BUNDLE_NAME);

    private PluginMessages() {
    }

    public static String getString(String key) {
        try {
            return RESOURCE_BUNDLE.getString(key);
        } catch (MissingResourceException e) {
            // Make missing keys easy to spot in the UI.
            return '!' + key + '!';
        }
    }
}
And you can call it like:
String msg = PluginMessages.getString("PluginPage.Error2"); //$NON-NLS-1$
EDIT:
When a string is externalized and you want to use the original string, you can delete the externalized string from every properties file except the default one. When the bundle cannot find a message file matching the local language, the default is used.
Note that this does not work at runtime.

If you do decide to use do-not-translate comments in your properties files, I would recommend you follow the Eclipse convention. It's nothing special, but life will be easier if we all use the same magic string!
(Eclipse doesn't actually support DO-NOT-TRANSLATE comments yet, as far as I know, but Tennera Ant-Gettext has an implementation of the above scheme which is used when converting from resource bundles to Gettext PO files.)
