Find a String with some keys in Java

Consider a map as below:
Map("PDF","application/pdf")
Map("XLSX","application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
Map("CVS","application/csv")
....
There is an export method which takes the export button name and finds the export type and the content type:
public void setExport(String exportBtn) {
    for (String key : exportTypes.keySet()) {
        if (exportBtn.contains(key)) {
            this.export = key;
            this.exportContentType = exportTypes.get(key);
            LOG.debug("Exporting to {} ", this.export);
            return;
        }
    }
}
This method can be called as
setExport("PDF") >> export=PDF, exportContentType=application/pdf
setExport("Make and PDF") >> export=PDF, exportContentType=application/pdf
setExport("PDF Maker") >> export=PDF, exportContentType=application/pdf
I am not happy with this approach. I suspect some library, for example StringUtils, can do something like:
String[] keys = {"PDF", "XLSX", "CVS"};
String input = "Make the PDF";
selectedKey = StringUtils.xxx(input, keys);
That would simplify my method somewhat, but I could not find anything. Any comments?

You could use a regex to solve this, something like this:
final Pattern pattern = Pattern.compile("(PDF|XLSX|CVS)");
final Matcher matcher = pattern.matcher("Make the PDF");
if (matcher.find()) {
    setExportType(matcher.group());
}
You would then build the pattern programmatically, once, so that it includes all the keys, and of course use the button's name instead of "Make the PDF".
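Building the pattern from the map keys could look roughly like the sketch below. The class name ExportKeyMatcher and the method findKey are made up for illustration, and the map is assumed to be non-empty; each key is passed through Pattern.quote so that a key containing regex metacharacters is still matched literally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: compiles one alternation pattern from the map keys.
final class ExportKeyMatcher {
    private final Pattern keyPattern;

    ExportKeyMatcher(Map<String, String> exportTypes) {
        // Quote every key, then join the quoted keys with '|' into one alternation.
        String alternation = exportTypes.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.keyPattern = Pattern.compile(alternation);
    }

    /** Returns the leftmost key found in the button name, or empty if none occurs. */
    Optional<String> findKey(String buttonName) {
        Matcher matcher = keyPattern.matcher(buttonName);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> exportTypes = new LinkedHashMap<>();
        exportTypes.put("PDF", "application/pdf");
        exportTypes.put("XLSX", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        exportTypes.put("CVS", "application/csv");

        ExportKeyMatcher matcher = new ExportKeyMatcher(exportTypes);
        // Prints "PDF -> application/pdf"
        matcher.findKey("Make the PDF")
               .ifPresent(key -> System.out.println(key + " -> " + exportTypes.get(key)));
    }
}
```

The pattern is compiled once in the constructor, so each later lookup is a single scan of the button name rather than one contains() call per key.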

A Map is the simplest and best way to store key-value pairs.
Why can't you use the map's get method directly with the key? Note that this only works when the button name is exactly the key.
exportContentType = exportTypes.get(exportBtn);
if (exportContentType == null || exportContentType.isEmpty()) {
    throw new IllegalArgumentException("Unknown export type: " + exportBtn);
} else {
    export = exportBtn;
}

Related

Cannot get '#' symbol in Controller using Spring @RequestParam

I have the following request Url /search?charset=UTF-8&q=C%23C%2B%2B.
My controller looks like
@RequestMapping(method = RequestMethod.GET, params = "q")
public String refineSearch(@RequestParam("q") final String searchQuery,....
and here I get searchQuery = 'CC++'.
'#' is encoded as '%23' and '+' as '%2B'.
Why does searchQuery not contain '#'?

I resolved a similar problem by URL-encoding the hash part. We have a Spring web server and a mix of JS and VueJS clients. This fixed my problem:
const location = window.location;
const redirect = location.pathname + encodeURIComponent(location.hash);

The main cause is what is known as the "fragment identifier". The documentation on fragment identifiers says:
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document.
Everything after a literal # sign is information for the client side, so put only what the browser needs there. The same kind of problem can occur with any reserved URI character; see percent-encoding for details. In my opinion the simplest solution is character replacement, which you could do on the server side.
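To make the percent-encoding point concrete, here is a small sketch using only the JDK (it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder). The class name QueryEncodingDemo and the example.com URL are illustrative, not taken from the question.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how '#' and '+' travel in a query string.
final class QueryEncodingDemo {
    /** Percent-encodes a raw query value. */
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    /** Decodes a percent-encoded query value. */
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    /** Returns the fragment of a URL, i.e. the part after an unencoded '#'. */
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    public static void main(String[] args) {
        System.out.println(encode("C#C++"));        // C%23C%2B%2B
        System.out.println(decode("C%23C%2B%2B"));  // C#C++
        // An unencoded '#' starts the fragment; an encoded one stays in the query.
        URI uri = URI.create("http://example.com/search?q=C%23#top");
        System.out.println(uri.getRawQuery());      // q=C%23
        System.out.println(uri.getFragment());      // top
    }
}
```

In the question the value was already encoded correctly as %23, so encoding alone does not explain the missing character.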

Finally I found the problem. In the filter chain the ServletRequest is wrapped in an XSSRequestWrapper with a DefaultXSSValueTranslator, whose method String stripXSS(String value) iterates through a list of patterns; whenever the value matches a pattern, the method deletes the match.
The pattern list contains the pattern "\u0023", so '#' is replaced with "".
DefaultXSSValueTranslator:
private String stripXSS(String value) {
    Pattern scriptPattern;
    if (value != null && value.length() > 0) {
        for (Iterator var3 = this.patterns.iterator(); var3.hasNext(); value = scriptPattern.matcher(value).replaceAll("")) {
            scriptPattern = (Pattern) var3.next();
        }
    }
    return value;
}
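The effect can be reproduced outside the filter with a few lines of plain Java. This is only a sketch of the same loop: the class name StripXssDemo and the contents of the pattern list are assumptions, apart from the pattern for U+0023 that the answer mentions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Re-creation of the stripping loop with an assumed pattern list.
final class StripXssDemo {
    /** Removes every match of every pattern from the value. */
    static String stripXss(String value, List<Pattern> patterns) {
        if (value == null || value.isEmpty()) {
            return value;
        }
        for (Pattern pattern : patterns) {
            value = pattern.matcher(value).replaceAll("");
        }
        return value;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = Arrays.asList(
                // Hypothetical entry standing in for the real XSS patterns.
                Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
                // Regex escape for the code point 0x23, which is the '#' character.
                Pattern.compile("\\u0023"));

        // The already decoded request parameter loses its '#'.
        System.out.println(stripXss("C#C++", patterns)); // CC++
    }
}
```

This matches what the question observed: the value arrives as C#C++ after decoding and leaves the wrapper as CC++.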

How to read the public URL in GWT?

I'm new to GWT and I'm building a web application in which I have to create a public URL.
In this public URL I have to pass a hashtag (#) and some parameters.
I am having difficulty with the following:
Extracting the hashtag from the URL.
Extracting the userid from the URL.
An example of my public URL is: http://www.xyz.com/#profile?userid=10003

To access the URL in GWT you can use the History.getToken() method. It gives you the entire string that follows the hashtag ("#").
In your case (http://www.xyz.com/#profile?userid=10003) it returns the string "profile?userid=10003". Once you have this you can parse it however you want: check whether it contains "?" and split it on "?", or take a substring. How you extract the information is really up to you.
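One way to do that parsing is sketched below in plain Java, so it runs without GWT; in the application the token would come from History.getToken(). The class name HistoryTokenParser and its methods are hypothetical, and the sketch assumes the "place?key=value&key=value" layout of the example URL with no percent-encoded values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for tokens shaped like "profile?userid=10003".
final class HistoryTokenParser {
    /** Returns the part before the first '?', e.g. "profile". */
    static String place(String token) {
        int question = token.indexOf('?');
        return question < 0 ? token : token.substring(0, question);
    }

    /** Returns the key=value pairs after the first '?', in order of appearance. */
    static Map<String, String> parameters(String token) {
        Map<String, String> params = new LinkedHashMap<>();
        int question = token.indexOf('?');
        if (question < 0 || question == token.length() - 1) {
            return params; // no parameter part at all
        }
        for (String pair : token.substring(question + 1).split("&")) {
            int equals = pair.indexOf('=');
            if (equals > 0) { // skip fragments without a key
                params.put(pair.substring(0, equals), pair.substring(equals + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String token = "profile?userid=10003"; // what History.getToken() would return
        System.out.println(place(token));                    // profile
        System.out.println(parameters(token).get("userid")); // 10003
    }
}
```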

I guess you already have the URL. I'm not that good at regex, but this should work as long as the user ID is the only number in the URL:
String yourURL = "http://www.xyz.com/#profile?userid=10003";
// Split on every letter and punctuation character so that only runs of digits remain.
String[] array = yourURL.split("[\\p{Lower}\\p{Upper}\\p{Punct}]");
int userID = 0;
for (String string : array) {
    if (!string.isEmpty()) {
        userID = Integer.parseInt(string);
    }
}
System.out.println(userID);

To get the parameters:
String userId = Window.Location.getParameter("userid");
To get the anchor / hash tag:
I don't think there is a dedicated method for that; you can parse the URL yourself using the methods provided by Window.Location.

get a substring with regex [duplicate]

I need a regex pattern for finding web page links in HTML.
I first use @"(<a.*?>.*?</a>)" to extract links (<a>), but I can't fetch the href from that.
My strings are:
<a href="www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="http://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="https://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="www.example.com/page.php/404" ....></a>
1, 2 and 3 are valid and I need them, but number 4 is not valid for me
(? and = are essential).
Thanks everyone, but I don't need to parse <a>. I have a list of links in href="abcdef" format.
I need to fetch the href of each link and filter it; the URLs I want must contain ? and =, like page.php?id=5
Thanks!

I'd recommend using an HTML parser over a regex, but here's a regex that creates a capturing group over the value of the href attribute of each link. It matches whether double or single quotes are used.
<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1
You can view a full explanation of this regex here.
Snippet playground:
const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;
const textToMatchInput = document.querySelector('[name=textToMatch]');
document.querySelector('button').addEventListener('click', () => {
console.log(textToMatchInput.value.match(linkRx));
});
<label>
Text to match:
<input type="text" name="textToMatch" value='<a href="google.com"'>
<button>Match</button>
</label>
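The same pattern can be combined with the question's filter (the URL must contain ? followed somewhere by =). The sketch below is in Java rather than C# or JavaScript; the class name HrefExtractor and the method queryLinks are made up for illustration, and a real HTML parser remains the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts href values and keeps those with a query string.
final class HrefExtractor {
    // Group 1 is the opening quote, group 2 is the href value up to the same quote.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+(?:[^>]*?\\s+)?href=([\"'])(.*?)\\1");

    /** Returns the href values that contain '?' followed later by '='. */
    static List<String> queryLinks(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String href = matcher.group(2);
            int question = href.indexOf('?');
            if (question >= 0 && href.indexOf('=', question + 1) >= 0) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
                "<a href=\"www.example.com/page.php?id=1&name=a\"></a>"
              + "<a href='http://www.example.com/page.php?id=2&name=b'></a>"
              + "<a href=\"www.example.com/page.php/404\"></a>";
        // Prints the first two links; the /404 link has no query string.
        queryLinks(html).forEach(System.out::println);
    }
}
```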

Using regex to parse HTML is not recommended.
Regex is meant for regularly occurring patterns. HTML is not regular in its format (except XHTML). For example, HTML files are valid even if you don't have a closing tag! This could break your code.
Use an HTML parser like HtmlAgilityPack.
You can use this code to retrieve all hrefs in anchor tags using HtmlAgilityPack:
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);
var hrefList = doc.DocumentNode.SelectNodes("//a")
    .Select(p => p.GetAttributeValue("href", "not found"))
    .ToList();
hrefList contains all the hrefs.

Thanks everyone (especially @plalx)
I find it quite overkill to enforce the validity of the href attribute with such a complex and cryptic pattern while a simple expression such as
<a\s+(?:[^>]*?\s+)?href="([^"]*)"
would suffice to capture all URLs. If you want to make sure they contain at least a query string, you could just use
<a\s+(?:[^>]*?\s+)?href="([^"]+\?[^"]+)"
My final regex string:
First use one of these:
st = @"((www\.|https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+ \w\d:##%/;$()~_?\+-=\\\.&]*)";
st = @"<a href[^>]*>(.*?)</a>";
st = @"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%#.\w_]*)#?(?:[\w]*))?)";
st = @"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:##%/;$()~_?\+,\-=\\.&]+)";
st = @"(?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)";
st = @"(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)|(www\.)[\w\d:##%/;$()~_?\+-=\\\.&]*)";
st = @"href=[""'](?<url>(http|https)://[^/]*?\.(com|org|net|gov))(/.*)?[""']";
st = @"(<a.*?>.*?</a>)";
st = @"(?:hrefs*=)(?:[s""']*)(?!#|mailto|location.|javascript|.*css|.*this.)(?.*?)(?:[s>""'])";
st = @"http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
st = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
st = @"(http|https)://([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
st = @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)";
st = @"http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
st = @"http(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$";
st = @"(?<Protocol>\w+):\/\/(?<Domain>[\w.]+\/?)\S*";
My choice is
@"(?<Protocol>\w+):\/\/(?<Domain>[\w.]+\/?)\S*"
Second, use this (with the ? escaped so that it matches a literal question mark):
st = @"(.*)\?(.*)=(.*)";
Problem solved. Thanks everyone :)

Try this:
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }
    private void Form1_Load(object sender, EventArgs e)
    {
        var res = Find(html);
    }
    public static List<LinkItem> Find(string file)
    {
        List<LinkItem> list = new List<LinkItem>();
        // 1.
        // Find all matches in file.
        MatchCollection m1 = Regex.Matches(file, @"(<a.*?>.*?</a>)",
            RegexOptions.Singleline);
        // 2.
        // Loop over each match.
        foreach (Match m in m1)
        {
            string value = m.Groups[1].Value;
            LinkItem i = new LinkItem();
            // 3.
            // Get href attribute.
            Match m2 = Regex.Match(value, @"href=\""(.*?)\""",
                RegexOptions.Singleline);
            if (m2.Success)
            {
                i.Href = m2.Groups[1].Value;
            }
            // 4.
            // Remove inner tags from text.
            string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                RegexOptions.Singleline);
            i.Text = t;
            list.Add(i);
        }
        return list;
    }
    public struct LinkItem
    {
        public string Href;
        public string Text;
        public override string ToString()
        {
            return Href + "\n\t" + Text;
        }
    }
}
Input:
string html = "<a href=\"www.aaa.xx/xx.zz?id=xxxx&name=xxxx\" ....></a> 2.<a href=\"http://www.aaa.xx/xx.zz?id=xxxx&name=xxxx\" ....></a> ";
Result:
[0] = {www.aaa.xx/xx.zz?id=xxxx&name=xxxx}
[1] = {http://www.aaa.xx/xx.zz?id=xxxx&name=xxxx}
C# Scraping HTML Links
Scraping HTML extracts important page elements. It has many legal uses
for webmasters and ASP.NET developers. With the Regex type and
WebClient, we implement screen scraping for HTML.
Edited
Another easy way: you can use a WebBrowser control to get the href from each <a> tag, like this (see my example):
public Form1()
{
    InitializeComponent();
    webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void Form1_Load(object sender, EventArgs e)
{
    webBrowser1.DocumentText = "<a href=\"www.aaa.xx/xx.zz?id=xxxx&name=xxxx\" ....></a><a href=\"http://www.aaa.xx/xx.zz?id=xxxx&name=xxxx\" ....></a><a href=\"https://www.aaa.xx/xx.zz?id=xxxx&name=xxxx\" ....></a><a href=\"www.aaa.xx/xx.zz/xxx\" ....></a>";
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    List<string> href = new List<string>();
    foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("a"))
    {
        href.Add(el.GetAttribute("href"));
    }
}

Try this regex:
"href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))"
You will find more help in these discussions:
Regular expression to extract URL from an HTML link
and
Regex to get the link in href. [asp.net]
Hope it's helpful.

HTMLDocument DOC = this.MySuperBrowser.Document as HTMLDocument;
public IHTMLAnchorElement imageElementHref;
imageElementHref = DOC.getElementById("idfirsticonhref") as IHTMLAnchorElement;
Simply try this code

I came up with this one, that supports anchor and image tags, and supports single and double quotes.
<[a|img]+\\s+(?:[^>]*?\\s+)?[src|href]+=[\"']([^\"']*)['\"]
So
<a href="/something.ext">click here</a>
Will match:
Match 1: /something.ext
And
<a href='/something.ext'>click here</a>
Will match:
Match 1: /something.ext
Same goes for img src attributes

I took a much simpler approach. This one simply looks for href attributes and captures the value that follows (between single or double quotes) into a group named url:
href=['"](?<url>.*?)['"]

I think in this case this is one of the simplest patterns for preg_match:
/<a\s*(.*?id[^"]*")/g
It gets links that have the variable id in the address.
The capture starts at href (inclusive), takes all characters (. excludes newline characters)
up to and including the first occurrence of id, and then everything up to the next " sign ([^"]*).

url harvester string manipulation

I'm doing a recursive URL harvest. When I find a link in the source that doesn't start with "http", I append it to the current URL. The problem is that on a dynamic site a link without "http" is usually a new parameter for the current URL. For example, the current URL is something like http://www.somewebapp.com/default.aspx?pageid=4088 and the source for that page contains the link default.aspx?pageid=2111. In this case I need to do some string manipulation; this is where I need help.
Pseudocode:
if part of the link found contains a substring of the current url
    save the substring
    save the unique part of the link found
    replace whatever is after the substring in the current url with the unique saved part
What would this look like in Java? Any ideas for doing this differently? Thanks.
As per the comment, here's what I've tried:
if (!matched.startsWith("http")) {
    String[] splitted = url.toString().split("/");
    java.lang.String endOfURL = splitted[splitted.length-1];
    boolean b = false;
    while (!b && endOfURL.length() > 5) { // f.bar shortest val
        endOfURL = endOfURL.substring(0, endOfURL.length()-2);
        if (matched.contains(endOfURL)) {
            matched = matched.substring(endOfURL.length()-1);
            matched = url.toString().substring(url.toString().length() - matched.length()) + matched;
            b = true;
        }
    }
}
It's not working well.

I think you are doing this the wrong way. Java has two classes, URL and URI, which are capable of parsing URL/URI strings much more accurately than a "string bashing" solution. For example, the URL constructor URL(URL, String) creates a new URL object in the context of an existing one, without you needing to worry whether the String is an absolute URL or a relative one. You would use it something like this:
URL currentPageUrl = ...
String linkUrlString = ...
// (Exception handling not included ...)
URL linkUrl = new URL(currentPageUrl, linkUrlString);
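For a version that can be run as-is, the sketch below uses java.net.URI.resolve instead of the URL(URL, String) constructor named above; it performs the same kind of relative resolution but throws no checked exception. The class name LinkResolver is made up, and the example.org link is only an illustration.

```java
import java.net.URI;

// Hypothetical helper: resolves a harvested link against the page it was found on.
final class LinkResolver {
    /**
     * Returns the absolute form of the link. Absolute links are returned unchanged.
     * URI.create throws IllegalArgumentException for syntactically invalid input.
     */
    static String resolve(String currentPageUrl, String link) {
        return URI.create(currentPageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.somewebapp.com/default.aspx?pageid=4088";
        // Relative link: replaces the last path segment and the query.
        System.out.println(resolve(page, "default.aspx?pageid=2111"));
        // -> http://www.somewebapp.com/default.aspx?pageid=2111
        // Absolute link: comes back as it is.
        System.out.println(resolve(page, "http://example.org/a"));
        // -> http://example.org/a
    }
}
```

Harvested links that contain spaces or other illegal characters would need to be cleaned or encoded before they are passed to URI.create.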

How can I use named parameters in a messages.properties file?

Is there any way to have message.properties records as follows
message.myMessage=This message is for ${name} in ${location}
as opposed to
message.myMessage = This message is for {0} in {1}
When I am creating the messages, I don't necessarily know the order or how many parameters are needed, but I am able to pass in several properties by name, and only the ones that are needed would be used.

After facing the very same question and poking around in the source code I found a "loophole" that makes it possible in a very easy way:
message.myMessage = This message is for {0,,name} in {1,,location}
This approach doesn't eliminate the use of numbers. The reason to use it is to give hints to the translators.

I am afraid not; the parameters are an Object array, so there is no way to define names for them. If you always pass in the array of parameters in the same order, though, you could use them like this:
message.myMessage = This message is for {0} in {1}
message.myNameMessage = This message is for {0}
message.myLocationMessage = This message is for people in {1}
message.myAlternateMessage = The message params are location: {1}; name: {0}
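A short sketch of how such patterns behave with java.text.MessageFormat when the arguments are always passed in the same order (name first, location second). The class name IndexedMessageDemo and the sample values are illustrative.

```java
import java.text.MessageFormat;

// Illustrative only: one fixed argument order, several patterns using parts of it.
final class IndexedMessageDemo {
    /** Formats the pattern with the arguments in their fixed order. */
    static String format(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        Object[] params = {"Mithu", "Dhaka"}; // {0} = name, {1} = location

        System.out.println(format("This message is for {0} in {1}", params));
        // -> This message is for Mithu in Dhaka
        System.out.println(format("This message is for people in {1}", params));
        // -> This message is for people in Dhaka
        System.out.println(format("The message params are location: {1}; name: {0}", params));
        // -> The message params are location: Dhaka; name: Mithu
    }
}
```

A pattern may skip or reorder indexes freely; arguments that are not referenced are simply ignored.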

Take a look at ICU4J.
It allows for something like this:
message.myMessage=This message is for {name} in {location}.
It is also far more powerful than the simple replacements suggested, because it can do locale-aware formatting of the parameters (e.g. "Subscription expires on: {expirationDate, date, long}").
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html

Unfortunately the MessageFormat API does not support named parameters, only argument-index:
Patterns and Their Interpretation
MessageFormat uses patterns of the following form:
MessageFormatPattern:
String
MessageFormatPattern FormatElement String
FormatElement:
{ ArgumentIndex }
{ ArgumentIndex , FormatType }
{ ArgumentIndex , FormatType , FormatStyle }

Everything is possible for those who try... I have never heard of something like that for Java, but you can write it yourself.
Please take a look at this example:
public String format(String message, String... arguments) {
    for (String argument : arguments) {
        // Split on the first '=' only, so that values may contain '='.
        String[] keyValue = argument.split("=", 2);
        if (keyValue.length != 2)
            throw new IllegalArgumentException("Incorrect argument: " + argument);
        String placeholder = "${" + keyValue[0] + "}";
        if (!message.contains(placeholder))
            throw new IllegalArgumentException(keyValue[0] + " does not exist.");
        // String.replace already replaces every occurrence of the placeholder.
        message = message.replace(placeholder, keyValue[1]);
    }
    return message;
}
It is not ideal, as you would call it with hardcoded strings (which is generally a bad idea) and you would be limited to Strings, but it can be done. The only question is whether it is practical.
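A variation on the same idea that takes the values from a Map and scans the message once with a regex could look like the sketch below. The class name NamedMessageFormatter is made up, and the ${name} syntax is the one from the question; entries in the map that the message does not reference are ignored, which is what the question asks for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical formatter for ${name}-style placeholders.
final class NamedMessageFormatter {
    // Matches ${...} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with the value stored under that name. */
    static String format(String message, Map<String, ?> values) {
        Matcher matcher = PLACEHOLDER.matcher(message);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!values.containsKey(name)) {
                throw new IllegalArgumentException("No value for placeholder: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(values.get(name))));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("name", "Mithu");
        values.put("location", "Dhaka");
        values.put("unused", 42); // not referenced by the message, so ignored

        System.out.println(format("This message is for ${name} in ${location}", values));
        // -> This message is for Mithu in Dhaka
    }
}
```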

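It is possible using the Apache Commons Lang library (newer releases deprecate StrSubstitutor in favor of StringSubstitutor from Apache Commons Text).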
https://commons.apache.org/proper/commons-lang/
Properties messages = ...
Map<String, String> m = new HashMap<>();
m.put("name", "Mithu");
m.put("location", "Dhaka");
StrSubstitutor sub = new StrSubstitutor(m);
String msg = sub.replace(messages.getProperty("message.myMessage"));
// msg = This message is for Mithu in Dhaka
