Get specific words from a string in Java

If I have the following URL:
http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0
How can I get the name of the plugin (wordpressplugin in this URL) and the version, so that the output is: wordpressplugin ver 1.0?

I am posting my comment as an answer
String s = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
String[] ary = s.split("/");
System.out.println(ary[5] + " " + ary[7]);
According to your question, this is the easiest way; for more dynamic searching you would have to use a regex.
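A minimal self-contained version of the split approach, assuming the URL always has exactly this layout. The class name PluginInfo and the method describe are hypothetical. It also turns "ver=1.0" into "ver 1.0" to match the output asked for in the question.

```java
public class PluginInfo {

    // Hypothetical helper: assumes the plugin name is the 6th "/"-separated
    // element and the version the 8th, exactly as in the example URL.
    static String describe(String url) {
        String[] parts = url.split("/");
        // parts: "http:", "", "www.example.com", "wordpress", "plugins",
        //        "wordpressplugin", "123", "ver=1.0"
        String plugin = parts[5];
        String version = parts[7].replace("=", " "); // "ver=1.0" -> "ver 1.0"
        return plugin + " " + version;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
        System.out.println(describe(url)); // prints "wordpressplugin ver 1.0"
    }
}
```

The fixed indices are the weak point: any extra path segment shifts them, which is why the other answers reach for a regex or a search for "plugins/".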

You may do it like so, using Regex support in Java.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
// Group 2 captures the plugin name, group 4 the "ver=..." part
Pattern pattern = Pattern.compile("(.*plugins/)(.*)(/\\d{3}/)(ver.*)");
Matcher matcher = pattern.matcher(url);
if (matcher.matches()) {
    System.out.println("Plugin: " + matcher.group(2));
    System.out.println("Version: " + matcher.group(4));
}
Notice the use of capture groups. Here's the output.
Plugin: wordpressplugin
Version: ver=1.0
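The same idea can be written with named groups, which read more clearly than group(2)/group(4). This is a sketch assuming the layout .../plugins/<name>/<numeric id>/ver=<version>; it uses \d+ instead of \d{3} so that ids of any length match. The class PluginRegex and method describe are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluginRegex {

    // Assumed layout: .../plugins/<name>/<numeric id>/ver=<version>
    private static final Pattern PLUGIN_URL =
            Pattern.compile(".*/plugins/(?<name>[^/]+)/\\d+/ver=(?<version>[^/]+)");

    static String describe(String url) {
        Matcher m = PLUGIN_URL.matcher(url);
        if (!m.matches()) {
            return null; // URL does not follow the assumed layout
        }
        return m.group("name") + " ver " + m.group("version");
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0"));
        // prints "wordpressplugin ver 1.0"
    }
}
```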

You should look at regular expressions (see the Oracle tutorials), which are the general tool in any programming language for getting or matching substrings out of a larger string that follows a more or less fixed format.

Because you say you are new to Java, here is a very simple answer that should suit your skills:
String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
String search = "plugins/";
int index = url.indexOf(search);
String pluginName, version;
if (index > -1)
{
    index += search.length();   // String.length() is a method, not a field
    pluginName = url.substring(index, url.indexOf("/", index + 1));
    search = "ver=";
    index = url.indexOf(search);
    if (index > -1)
    {
        version = url.substring(index + search.length());
        System.out.println(pluginName + " " + version); // prints: wordpressplugin 1.0
    }
}

PS: This only works as long as your URL format always stays the same!
The simplest way to solve this problem is to use the split method of String. Study the method below carefully; it's basic.
public String getVersionNumber(String url){
String[] arr0 = url.split("//");
//The code above returns an array of two strings: "http:" and "www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0"
String[] arr1 = arr0[1].split("/");
//The code above returns an array of six strings: "www.example.com", "wordpress", "plugins", "wordpressplugin", "123" and "ver=1.0".
return String.format("%s %s", arr1[3], arr1[5]);
//OUTPUT: wordpressplugin ver=1.0
//I simply returned what I needed.
}
I hope this helps. Merry coding!
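A variation that avoids hard-coded array indices: let java.net.URI strip the scheme and host, then look for the segment after "plugins". This swaps in URI parsing, which none of the answers above use; it assumes the version segment always sits three segments after "plugins", and PluginFromPath/describe are hypothetical names.

```java
import java.net.URI;

public class PluginFromPath {

    // Parse the URL with java.net.URI so scheme and host are handled by the
    // parser, then search the path segments instead of relying on fixed indices.
    static String describe(String url) {
        String path = URI.create(url).getPath(); // "/wordpress/plugins/wordpressplugin/123/ver=1.0"
        String[] segments = path.split("/");
        for (int i = 0; i + 3 < segments.length; i++) {
            if (segments[i].equals("plugins") && segments[i + 3].startsWith("ver=")) {
                String version = segments[i + 3].substring("ver=".length());
                return segments[i + 1] + " ver " + version;
            }
        }
        return null; // assumed layout not found
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0"));
        // prints "wordpressplugin ver 1.0"
    }
}
```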

Related

How to replace string values with "XXXXX" in java?

I want to replace particular string values with "XXXX". The problem is that the input data is very dynamic and does not follow a fixed pattern.
My input data
https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme
I need to replace the values of userId and password with "XXXX".
My output should be:
https://internetbanking.abc.co.uk/personal/logon/login/?userId=XXXX&password=XXXX&reme
This is a one-off example. In other cases only userId and password are present:
userId=12345678&password=stackoverflow&rememberID=
I am using regex in Java to achieve this, but have not been successful yet. I'd appreciate any guidance.
[&]([^\\/?&;]{0,})(userId=|password=)=[^&;]+|((?<=\\/)|(?<=\\?)|(?<=;))([^\\/?&;]{0,})(userId=|password=)=[^&]+|(?<=\\?)(userId=|password=)=[^&]+|(userId=|password=)=[^&]+
PS : I am not an expert in Regex. Also, please do let me know if there are any other alternatives to achieve this apart from Regex.

This should cover both given cases.
static String maskUserNameAndPassword(String input) {
    // "$1" puts back the matched parameter name (userId or password)
    return input.replaceAll("(userId|password)=[^&]+", "$1=XXXXX");
}
String inputUrl1 =
    "https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme";
String inputUrl2 =
    "userId=12345678&password=stackoverflow&rememberID=";
String maskedUrl1 = maskUserNameAndPassword(inputUrl1);
System.out.println("Mask url1: " + maskedUrl1);
String maskedUrl2 = maskUserNameAndPassword(inputUrl2);
System.out.println("Mask url2: " + maskedUrl2);
The above prints:
Mask url1: https://internetbanking.abc.co.uk/personal/logon/login/?userId=XXXXX&password=XXXXX&reme
Mask url2: userId=XXXXX&password=XXXXX&rememberID=

String url = "https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme";
String masked = url.replaceAll("(userId|password)=[^&]+", "$1=XXXX");
Note that sending sensitive data in the query string is a serious security issue.

I would rather use a URL parser than a regex. The example below uses the standard java.net.URL class, but third-party libraries can do this much better.
import java.net.URL;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Requires Java 9+ (Map.entry, List.of); new URL(...) throws MalformedURLException
// Replace the value of userId/password entries with "XXXX", keep other entries as-is
Function<Map.Entry<String, String>, Map.Entry<String, String>> maskUserPasswordEntries = e ->
    (e.getKey().equals("userId") || e.getKey().equals("password")) ? Map.entry(e.getKey(), "XXXX") : e;
// Turn ["key", "value"] into an entry; a key without a value gets ""
Function<List<String>, Map.Entry<String, String>> transformParamsToMap = p ->
    Map.entry(p.get(0), p.size() == 1 ? "" : p.get(p.size() - 1));
URL url = new URL("https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme");
String maskedQuery = Stream.of(url.getQuery().split("&"))
    .map(s -> List.of(s.split("=")))
    .map(transformParamsToMap)
    .map(maskUserPasswordEntries)
    .map(e -> e.getKey() + "=" + e.getValue())
    .collect(Collectors.joining("&"));
System.out.println(url.getProtocol() + "://" + url.getAuthority() + url.getPath() + "?" + maskedQuery);
Output:
https://internetbanking.abc.co.uk/personal/logon/login/?userId=XXXX&password=XXXX&reme=

Just use the replace/replaceAll methods of the String class: replace works with CharSequence literals, and replaceAll takes a regex.
String url = "https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme";
// Each pattern requires a '&' after the value, so a parameter at the very end of the URL is not masked
url = url.replaceAll("(userId=.+?&)", "userId=XXXX&");
url = url.replaceAll("(password=.+?&)", "password=XXXX&");
System.out.println(url);
I'm not a regex expert either, but if you find it useful, I usually use this website to test my expressions and as an online cheat sheet:
https://regexr.com

Use:
(?<=(\?|&))(userId|password)=(.*?)(?=(&|$))
(?<=(\?|&)) makes sure it’s preceded by ? or & (but not part of the match)
(userId|password)= matches either userId or password, then =
(.*?) lazily matches as few characters as possible, until the lookahead that follows can succeed
(?=(&|$)) makes sure the next char is either & or end of the string, (but not part of the match)
Then replace with $2=xxxxx (to keep the name userId or password), using replaceAll.
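The regex above, written as a Java string literal (backslashes doubled) and used with replaceAll. Group 2 is (userId|password) because the capturing group inside the lookbehind counts as group 1. MaskParams and mask are hypothetical names for this sketch.

```java
public class MaskParams {

    // Lookbehind group (\?|&) is group 1, so the parameter name is group 2.
    private static final String REGEX = "(?<=(\\?|&))(userId|password)=(.*?)(?=(&|$))";

    static String mask(String url) {
        return url.replaceAll(REGEX, "$2=xxxxx");
    }

    public static void main(String[] args) {
        System.out.println(mask(
            "https://internetbanking.abc.co.uk/personal/logon/login/?userId=Templ3108&password=Got1&reme"));
        // prints https://internetbanking.abc.co.uk/personal/logon/login/?userId=xxxxx&password=xxxxx&reme
    }
}
```

Because the lookbehind insists on a preceding ? or &, a parameter at the very start of the string (the question's second example, userId=12345678&...) is left unmasked; only the password that follows an & is replaced.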

masking of email address in java

I am trying to mask email address with "*" but I am bad at regex.
input : nileshxyzae#gmail.com
output : nil********#gmail.com
My code is
String maskedEmail = email.replaceAll("(?<=.{3}).(?=[^#]*?.#)", "*");
but it gives me nil*******e#gmail.com. I don't understand what is going wrong here. Why is the last character not converted?
Also, can someone explain what each part of this regex means?

Your look-ahead (?=[^#]*?.#) requires at least 1 character to be there in front of # (see the dot before #).
If you remove it, you will get all the expected symbols replaced:
(?<=.{3}).(?=[^#]*?#)
Here is the regex demo (replace with *).
However, the regex is not a proper regex for the task. You need a regex that will match each character after the first 3 characters up to the first #:
(^[^#]{3}|(?!^)\G)[^#]
Replace with $1*. Here, [^#] matches any character that is not #, so addresses with only 3 characters before the #, like abc#example.com, are not matched. Only emails that have 4+ characters in the username part will be masked.
Java example:
String s = "nileshkemse#gmail.com";
System.out.println(s.replaceAll("(^[^#]{3}|(?!^)\\G)[^#]", "$1*"));

If you're bad at regular expressions, don't use them :) I don't know if you've ever heard the quote:
Some people, when confronted with a problem, think
"I know, I'll use regular expressions." Now they have two problems.
You might get a working regular expression here, but will you understand it today? tomorrow? in six months' time? And will your colleagues?
An easy alternative is using a StringBuilder, and I'd argue that it's a lot more straightforward to understand what is going on here:
StringBuilder sb = new StringBuilder(email);
for (int i = 3; i < sb.length() && sb.charAt(i) != '#'; ++i) {
sb.setCharAt(i, '*');
}
email = sb.toString();
"Starting at the fourth character (index 3), replace each character with * until you reach the end of the string or #."
(You don't even need to use StringBuilder: you could simply manipulate the elements of email.toCharArray(), then construct a new string at the end).
Of course, this doesn't work correctly for email addresses whose local part is shorter than 3 characters: it would then mask the domain instead.
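One way to close that gap, sketched under the assumption that # is the separator (as throughout this question) and that an address without it should be returned unchanged: find the # first and only loop up to it. EmailMask, mask and keepVisible are hypothetical names.

```java
public class EmailMask {

    // Look up the '#' first so the loop can never run past it into the domain.
    static String mask(String email, int keepVisible) {
        int at = email.indexOf('#');
        if (at < 0) {
            return email; // no separator: leave unchanged (an assumption)
        }
        StringBuilder sb = new StringBuilder(email);
        for (int i = keepVisible; i < at; i++) {
            sb.setCharAt(i, '*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("nileshkemse#gmail.com", 3)); // nil********#gmail.com
        System.out.println(mask("ab#gmail.com", 3));          // ab#gmail.com
    }
}
```

With a local part of 3 characters or fewer the loop simply does not run, so the domain is never touched.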

Your lookahead is more complicated than it needs to be. Try this code:
public static void main(String... args) throws Exception {
String s = "nileshkemse#gmail.com";
s= s.replaceAll("(?<=.{3}).(?=.*#)", "*");
System.out.println(s);
}
Output:
nil********#gmail.com

I like this one because I only want to hide 4 characters, and it dynamically reduces the number of hidden characters to 2 if the email address is too short:
public static String maskEmailAddress(final String email) {
final String mask = "*****";
final int at = email.indexOf("#");
if (at > 2) {
final int maskLen = Math.min(Math.max(at / 2, 2), 4);
final int start = (at - maskLen) / 2;
return email.substring(0, start) + mask.substring(0, maskLen) + email.substring(start + maskLen);
}
return email;
}
Sample outputs:
my.email#gmail.com > my****il#gmail.com
info#mail.com > i**o#mail.com

//In Kotlin
val email = "nileshkemse#gmail.com"
val maskedEmail = email.replace(Regex("(?<=.{3}).(?=.*#)"), "*")

//In C#
public static string GetMaskedEmail(string emailAddress)
{
string _emailToMask = emailAddress;
try
{
if (!string.IsNullOrEmpty(emailAddress))
{
var _splitEmail = emailAddress.Split(Char.Parse("#"));
var _user = _splitEmail[0];
var _domain = _splitEmail[1];
if (_user.Length > 3)
{
var _maskedUser = _user.Substring(0, 3) + new String(Char.Parse("*"), _user.Length - 3);
_emailToMask = _maskedUser + "#" + _domain;
}
else
{
_emailToMask = new String(Char.Parse("*"), _user.Length) + "#" + _domain;
}
}
}
catch (Exception) { }
return _emailToMask;
}

Extract text from string Java

Given the string "ADACADABRA", how do I extract "CADA" from it in Java?
Also, how do I extract the id between "/" and "?" from the link below?
http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0
output should be: zaaU9lJ34c5
The solution should use "/" and "?" to locate the id.

For the id between "/" and "?", it should be:
String url = "http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0";
String str = url.substring(url.lastIndexOf("/") + 1, url.indexOf("?"));
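The same lastIndexOf/indexOf idea wrapped in a small method. It is a sketch that assumes there is no "/" after the "?"; when there is no "?" at all it returns everything after the last "/" instead of throwing, which is an assumption about what a caller would want. VideoId and between are hypothetical names.

```java
public class VideoId {

    // Text between the last '/' and the next '?'.
    // Assumes the query string itself contains no '/'.
    static String between(String url) {
        int start = url.lastIndexOf('/') + 1;
        int end = url.indexOf('?', start);
        return end < 0 ? url.substring(start) : url.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(between("http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0"));
        // prints zaaU9lJ34c5
    }
}
```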

String s = "ADACADABRA";
String s2 = s.substring(3,7);
Here 3 specifies the beginning index, and 7 specifies the stopping point.
The string returned contains all the characters from the beginning index, up to, but not including, the ending index.

I'm not entirely sure what you mean by "extract", so I've provided code that removes "CADA" from the String; I'm not certain this is what you want.
public static void main (String args[]){
String string = "ADACADABRA";
string = string.replace("CADA", "");
System.out.println(string);
}

This is untested, but something like this may help for the YouTube part:
String youtubeUrl = "http://www.youtube-nocookie.com/embed/zaaU9lJ34c5?rel=0";
String[] urlParts = youtubeUrl.split("/");
String videoId = urlParts[urlParts.length - 1];
videoId = videoId.substring(0, videoId.indexOf("?"));
Extracting CADA from the string makes no sense as stated. You will need to specify how you have determined that CADA is the string to extract.
E.g. is it because it is the middle 4 characters? Is it because you are stripping off 3 characters on each side? Are you just looking for the String "CADA"? Is it characters 3 to 7 of the String? Is it the first 4 of the last 7 characters of the String? Is it because it contains 2 vowels and 2 consonants? I could go on.

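import java.util.regex.Matcher;
import java.util.regex.Pattern;

String originalText = "ADACADABRA";
String regex = "CADA";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(originalText);
while (m.find()) {
    // The pattern has no capture groups, so use group() (group 0), not group(1)
    String outputThis = m.group();
    System.out.println(outputThis + " found at index " + m.start());
}
You can test Java regexes with this tool: http://www.regexplanet.com/advanced/java/index.html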

You are probably not taking into account that java.lang.String is immutable: substring() returns a new String and leaves the original unchanged, so you need to assign the result to a variable.
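A small demonstration of that point: calling substring() without using its result changes nothing, while assigning the result keeps it. Immutability and demo are hypothetical names for this sketch.

```java
public class Immutability {

    // Returns {s after the discarded call, the assigned result}.
    static String[] demo() {
        String s = "ADACADABRA";
        s.substring(3, 7);               // a new String is created and discarded; s is unchanged
        String part = s.substring(3, 7); // keep the result by assigning it
        return new String[] {s, part};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0]); // ADACADABRA
        System.out.println(result[1]); // CADA
    }
}
```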

Trim() in Java not working the way I expect? [duplicate]

Closed 10 years ago.
Possible Duplicate:
Query about the trim() method in Java
I am parsing a site's usernames and other information, and each one has a bunch of spaces after it (as well as spaces between the words).
For example: "Bob the Builder " or "Sam the welder ". The number of spaces varies from name to name. I figured I'd just use .trim(), since I've used it before.
However, it's giving me trouble. My code looks like this:
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).trim());
}
The result is just the same; no spaces are removed at the end.
Thank you in advance for your excellent answers!
UPDATE:
The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:
for (String s : splitSource2) {
if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
splitSource3.set(i, splitSource3.get(i).trim());
System.out.println(i + ": " + splitSource3.get(i));
}
UPDATE:
Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help solve my issue. Note the phrase "my issue" and not "java's issue". I have actually had the code printing out
System.out.println(i + ": " + splitSource3.get(i) + "*");
in a for each loop afterward.
This is how I knew I had a problem.
By the way, the problem has still not been fixed.
UPDATE:
Sample output (minus single quotes):
'0: Olin D. Kirkland                                          '
'1: Sophomore                                          '
'2: Someplace, Virginia  12345<br />VA SomeCity<br />'
'3: Undergraduate                                          '
EDIT: The OP rephrased the question at Query about the trim() method in Java, where the issue turned out to be Unicode whitespace characters, which String.trim() does not remove.
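To illustrate the Unicode whitespace issue: String.trim() only strips characters up to U+0020, so a non-breaking space (U+00A0, what &nbsp; in scraped HTML becomes) survives it. Java 11's strip() does not help either, because Character.isWhitespace excludes non-breaking spaces. A sketch that also strips the Unicode space-separator category \p{Z} at both ends; which characters the page actually contains is an assumption, and UnicodeTrim/trimUnicode are hypothetical names.

```java
public class UnicodeTrim {

    // Remove leading/trailing ASCII whitespace (\s) and Unicode space
    // separators (\p{Z}, which includes the non-breaking space U+00A0).
    static String trimUnicode(String s) {
        return s.replaceAll("^[\\s\\p{Z}]+|[\\s\\p{Z}]+$", "");
    }

    public static void main(String[] args) {
        String scraped = "Bob the Builder\u00A0\u00A0 ";
        System.out.println("[" + scraped.trim() + "]");       // non-breaking spaces are still there
        System.out.println("[" + trimUnicode(scraped) + "]"); // [Bob the Builder]
    }
}
```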

It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that downloaded HTML sources sometimes contain non-printable characters that are not whitespace either. These are very difficult to copy and paste into a browser. I suspect this may have happened to you.
If my assumption is correct, then you have two choices:
Use a binary reader to figure out what those characters are, then delete them with String.replace(). E.g.:
private static String cutCharacters(String fromHtml) {
    String result = fromHtml;
    char[] problematicCharacters = {'\000', '\001', '\003'}; // this could be a private static final constant too
    for (char ch : problematicCharacters) {
        // replace(CharSequence, CharSequence) removes every occurrence of ch
        result = result.replace(String.valueOf(ch), "");
    }
    return result;
}
If you find some sort of recurring pattern in the HTML to be parsed, then you can use regexes and substrings to cut out the unwanted parts. E.g.:
private String getImportantParts(String fromHtml) {
Pattern p = Pattern.compile("(\\w*\\s*)"); //this could be a private static final constant as well.
Matcher m = p.matcher(fromHtml);
StringBuilder buff = new StringBuilder();
while (m.find()) {
buff.append(m.group(1));
}
return buff.toString().trim();
}

Works without a problem for me.
Here is your code, slightly refactored and (maybe) more readable:
import java.util.ArrayList;
import java.util.List;

final String openingTag = "<td class=\"dddefault\">";
final String closingTag = "</td>";
List<String> splitSource2 = new ArrayList<String>();
splitSource2.add(openingTag + "Bob the Builder " + closingTag);
splitSource2.add(openingTag + "Sam the welder " + closingTag);
for (String string : splitSource2) {
System.out.println("|" + string + "|");
}
List<String> splitSource3 = new ArrayList<String>();
for (String s : splitSource2) {
if (s.length() > openingTag.length() && s.startsWith(openingTag)) {
String nameWithoutOpeningTag = s.substring(openingTag.length());
splitSource3.add(nameWithoutOpeningTag);
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
String name = splitSource3.get(i);
int closingTagBegin = splitSource3.get(i).length() - closingTag.length();
String nameWithoutClosingTag = name.substring(0, closingTagBegin);
String nameTrimmed = nameWithoutClosingTag.trim();
splitSource3.set(i, nameTrimmed);
System.out.println("|" + splitSource3.get(i) + "|");
}
I know that's not a real answer, but I cannot post comments and this code wouldn't fit in a comment, so I made it an answer so that Olin Kirkland can check his code.

Regular expression string search in Java

I know this can be done in many ways, but I'm curious what the regex would be to pick out all strings not containing a particular substring, say GDA, from
strings like GADSA, GDSARTCC, , THGDAERY.

You can use a negative lookahead:
"^((?!GDA).)*$"
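A quick check of the negative-lookahead idea with GDA, the substring from the question: every character is consumed only at a position where "GDA" does not start, so a full match means the string never contains it. WithoutGda and lacksGda are hypothetical names.

```java
import java.util.List;

public class WithoutGda {

    // String.matches() requires the whole string to match the pattern.
    static boolean lacksGda(String s) {
        return s.matches("^((?!GDA).)*$");
    }

    public static void main(String[] args) {
        for (String s : List.of("GADSA", "GDSARTCC", "", "THGDAERY")) {
            System.out.println("'" + s + "' -> " + lacksGda(s));
        }
    }
}
```

Note that "." does not match line terminators by default, so this sketch assumes single-line inputs.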

You don't need a regex. Just use string.contains("GDA") to check whether a string contains a particular substring; it returns false if it doesn't.

If your input is one long string then you have to decide how you define a substring. If it's separated by spaces then:
String[] split = mylongstr.split(" ");
for (String s : split) {
if (!s.contains("GDA")) {
// do whatever
}
}

String regex = ".*GDA.*";
List<String> testStrings = populateStrings();
for (String s : testStrings)
{
if (!s.matches(regex))
System.out.println("String " + s + " does not match " + regex);
}

Give this a shot:
java.util.regex.Pattern p = java.util.regex.Pattern.compile("(?!\\w*GDA\\w*)\\b\\w+\\b");
java.util.regex.Matcher m = p.matcher("GADSA, GDSARTCC, , THGDAERY");
while (m.find()) {
System.out.println("Found: " + m.group());
}