How to select specific fragment from the whole text? - java

After registering on the site, I receive the credentials by mail in the format:
some text /
login: example#mail.com /
password: example123 /
some text
I need to select and copy only the login and password, without the surrounding text. All of the text is located in one table. I have no idea how to do this and would be very grateful for any suggestions.

You could either split the string (quick and dirty):
String input = "some text / login: example#mail.com / password: example123 / some text";
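// iterate over lines if necessary, or join them into one string first
// split(...) returns an array: element [1] is the text after the label, and its first word is the value
String username = input.split("login: ")[1].split(" ")[0];
String password = input.split("password: ")[1].split(" ")[0];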
There are probably many other ways others can suggest also.
Or use a regex and pattern matching:
String input = "some text / login: example#mail.com / password: example123 / some text";
String emailRegex = "login: (\\S+#\\S+\\.\\S+)";
Pattern pattern = Pattern.compile(emailRegex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String loginEmail = matcher.group(1); // captured address only, without the "login: " label
    System.out.println(loginEmail);
}
Do the same for the password.
If you want something that is scalable and easier to manage, use a regex. If you're not concerned about performance, I think the first option is relatively straightforward, but if the message format changes significantly you may need to rework it.
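As a rough sketch of the regex route, both values can be captured in one pass with named groups. The class and method names below are made up for illustration, and the sketch assumes each value is a single run of non-whitespace characters following its label, as in the sample message.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CredentialExtractor {
    // Assumption: "login:" appears before "password:", and each value has no spaces.
    // DOTALL lets the lazy ".*?" cross line breaks if the mail body is multi-line.
    private static final Pattern CREDENTIALS = Pattern.compile(
            "login:\\s*(?<login>\\S+).*?password:\\s*(?<password>\\S+)",
            Pattern.DOTALL);

    /** Returns {login, password}, or empty if the text does not contain both labels. */
    static Optional<String[]> extract(String text) {
        Matcher m = CREDENTIALS.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group("login"), m.group("password") });
    }
}
```

Calling CredentialExtractor.extract(input) on the sample string should yield the two values without the labels or the surrounding text. Returning an Optional makes the "labels not found" case explicit instead of throwing.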

Related

Regex Redirect URL excludes token

I'm trying to create a redirect URL for my client. We have a service in which you specify "fromUrl" -> "toUrl", and it uses a Java regex Matcher. But I can't get it to include the token when it converts the URL. For example:
/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
Should be:
/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
but it excludes the token so the result I get is:
/fromurl/login/
/tourl/login/
I tried various regex patterns like: " ?.* and [%5E//?]+)/([^/?]+)/(?.*)?$ and (/*) etc" but none of them seems to work.
I'm not that familiar with regex. How can I solve this?
This can be easily done using simple string replace but if you insist on using regular expressions:
Pattern p = Pattern.compile("fromurl");
String originalUrlAsString = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String newRedirectedUrlAsString = p.matcher(originalUrlAsString).replaceAll("tourl");
System.out.println(newRedirectedUrlAsString);
If I understand you correctly you need something like this?
String from = "/my/old/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceAll("\\/(.*)\\/", "/my/new/url/");
System.out.println(to); // /my/new/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
This will replace everything between the first and the last forward slash.
Can you describe the original expression in more detail? This is necessary because the regular expression depends on it.
Assuming that the first occurrence of fromurl should simply be replaced, the following code is enough:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceFirst("fromurl", "tourl");
But if it is necessary to use more complex rules to determine the substring to replace, you can use:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = "";
String regularExpression = "(<<pre>>)(fromurl)(<<pos>>)";
Pattern pattern = Pattern.compile(regularExpression);
Matcher matcher = pattern.matcher(from);
if (matcher.matches()) {
    to = from.replaceAll(regularExpression, "$1tourl$3");
}
NOTE: the pre and pos groups are placeholders, because I don't know the real expression for the URL.
NOTE 2: $1 and $3 refer to the first and the third group.
Although the existing answers should solve the issue, and some are similar, the solution below may also help. It uses a fairly simple regex (assuming your input always has the same format as your example):
private static String replaceUrl(String inputUrl){
String regex = "/.*(/login\\?token=.*)";
String toUrl = "/tourl";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(inputUrl);
if (matcher.find()) {
return toUrl + matcher.group(1);
} else
return null;
}
You can write a test to check whether it works for other expected inputs and outputs, in case you want to change the format and adjust the regex:
String inputUrl = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String expectedUrl = "/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
if (expectedUrl.equals(replaceUrl(inputUrl))){
System.out.println("Success");
}
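One more variation on the answers above, as a minimal sketch: anchor the pattern to the first path segment and carry everything after it over unchanged, so the query string (token included) survives. The class name and the hard-coded /fromurl and /tourl prefixes are assumptions taken from the example URLs, not from the actual redirect service.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the prefixes are assumed from the question's example.
final class RedirectRewriter {
    // Group 1 captures the rest of the URL after "/fromurl", including "?token=...".
    private static final Pattern FROM = Pattern.compile("^/fromurl(/.*)?$");

    /** Rewrites "/fromurl..." to "/tourl..."; any other URL is returned unchanged. */
    static String rewrite(String url) {
        Matcher m = FROM.matcher(url);
        if (!m.matches()) {
            return url;
        }
        String rest = m.group(1) == null ? "" : m.group(1);
        // Plain concatenation avoids having to escape '$' or '\' in a replacement string.
        return "/tourl" + rest;
    }
}
```

Because the remainder is concatenated rather than passed through replaceAll, special characters in the token cannot be misread as group references.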

Regex: how to extract a JSESSIONID cookie value from cookie string?

I might receive the following cookie string.
hello=world;JSESSIONID=sdsfsf;Path=/ei
I need to extract the value of JSESSIONID
I use the following pattern but it doesn't seem to work. However https://regex101.com shows it's correct.
Pattern PATTERN_JSESSIONID = Pattern.compile(".*JSESSIONID=(?<target>[^;\\n]*)");
You can reach your goal with a simpler regex, (^|;)JSESSIONID=(.*);. Here is the demo on Regex101 (you forgot to link your regular expression using the save button). Take a look at the following code. You have to extract the matched value using the Matcher class, reading the second capturing group rather than the whole match:
String cookie = "hello=world;JSESSIONID=sdsfsf;Path=/ei";
Pattern PATTERN_JSESSIONID = Pattern.compile("(^|;)JSESSIONID=(.*);");
Matcher m = PATTERN_JSESSIONID.matcher(cookie);
if (m.find()) {
    System.out.println(m.group(2)); // group(0) would be the whole match, ";JSESSIONID=sdsfsf;"
}
Output value:
sdsfsf
Of course, the result depends on the possible variations of the input text. The snippet above works when the value is followed by exactly one ; character. Because .* is greedy, a string with further semicolons after the value would need [^;]* in place of .*, and a JSESSIONID at the very end of the string (with no trailing ;) would not match at all.
You can try below regex:
JSESSIONID=([^;]+)
String cookies = "hello=world;JSESSIONID=sdsfsf;Path=/ei;submit=true";
Pattern pat = Pattern.compile("\\bJSESSIONID=([^;]+)");
Matcher matcher = pat.matcher(cookies);
boolean found = matcher.find();
System.out.println("Session ID: " + (found ? matcher.group(1): "not found"));
You can also get what you are aiming for by splitting the string and replacing the prefix. Below is what works for me; it looks for the part that starts with JSESSIONID= instead of hard-coding the value:
String s = "hello=world;JSESSIONID=sdsfsf;Path=/ei";
for (String part : s.split(";")) {
    if (part.startsWith("JSESSIONID=")) {
        System.out.println(part.replace("JSESSIONID=", "")); // sdsfsf
    }
}
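If more than one value is needed from the same string, the splitting idea can be generalised into a small lookup table. This is only a sketch: the class name is invented, and it assumes pairs are separated by ; and that each pair has the form name=value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a replacement for a real cookie parser.
final class CookieParser {
    /** Splits "a=b;c=d" into an ordered name -> value map; parts without '=' are skipped. */
    static Map<String, String> parse(String cookieString) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : cookieString.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // trim() tolerates the optional space that often follows ';'
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }
}
```

CookieParser.parse(s).get("JSESSIONID") then returns the session id, or null when the cookie is absent.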

regex to find email address from a String

My intention is to get email addresses from a web page. I have the page source and am reading it line by line. Now I want to get the email address from the line I am currently reading. This line may or may not contain an email address. I have seen a lot of regexp examples, but most of them are for validating an email address. I want to extract the email address from the page source, not validate it. It should work the way http://emailx.discoveryvip.com/ works.
Some example input lines are:
1)<p>Send details to neeraj#yopmail.com</p>
2)<p>Interested should send details directly to www.abcdef.com/abcdef/. Should you have any questions, please email neeraj#yopmail.com.
3)Note :- Send your queries at neeraj#yopmail.com for more details call Mr. neeraj 012345678901.
I want to get neeraj#yopmail.com from examples 1,2 and 3.
I am using Java and I am not good at regexp. Please help me.
You can validate e-mail address formats according to RFC 2822 with this:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
and here's an explanation from regular-expressions.info:
This regex has two parts: the part before the #, and the part after the #. There are two alternatives for the part before the #: it can either consist of a series of letters, digits and certain symbols, including one or more dots. However, dots may not appear consecutively or at the start or end of the email address. The other alternative requires the part before the # to be enclosed in double quotes, allowing any string of ASCII characters between the quotes. Whitespace characters, double quotes and backslashes must be escaped with backslashes.
And you can check this out here: Rubular example.
The correct code is
Pattern p = Pattern.compile("\\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
Pattern.CASE_INSENSITIVE);
Matcher matcher = p.matcher(input);
Set<String> emails = new HashSet<String>();
while(matcher.find()) {
emails.add(matcher.group());
}
This will give you the set of email addresses found in your long text / HTML input.
You need something like this regex:
".*?(\\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,4}\\b).*"
When it matches, you can extract the first group and that will be your email. The leading .*? is lazy so that it does not swallow the beginning of an address that contains a dot or a hyphen.
String regex = ".*?(\\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,4}\\b).*";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("your text here");
if (m.matches()) {
    String email = m.group(1);
    // do something with your email
}
This is a simple way to extract all emails from an input String on Android, using Patterns.EMAIL_ADDRESS (from android.util.Patterns):
public static List<String> getEmails(@NonNull String input) {
List<String> emails = new ArrayList<>();
Matcher matcher = Patterns.EMAIL_ADDRESS.matcher(input);
while (matcher.find()) {
int matchStart = matcher.start(0);
int matchEnd = matcher.end(0);
emails.add(input.substring(matchStart, matchEnd));
}
return emails;
}
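Outside Android, the same find() loop can be written against a hand-rolled pattern. The sketch below is an approximation for extraction, not validation: the class name is made up, the pattern is deliberately loose, and it accepts # as well as @ as the separator only because the sample lines in the question are written with #.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is a loose approximation, not RFC-compliant.
final class EmailExtractor {
    // local part, separator, dot-separated domain labels, alphabetic TLD of 2+ letters.
    // Ending on the TLD keeps a sentence-final '.' out of the match.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+[@#][A-Z0-9-]+(?:\\.[A-Z0-9-]+)*\\.[A-Z]{2,}",
            Pattern.CASE_INSENSITIVE);

    /** Returns every address-like token in the line, in order of appearance. */
    static List<String> extract(String line) {
        List<String> emails = new ArrayList<>();
        Matcher m = EMAIL.matcher(line);
        while (m.find()) {
            emails.add(m.group());
        }
        return emails;
    }
}
```

Calling extract on each line as it is read returns an empty list for lines without an address, which covers the "may or may not have email" case.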

How to extract word from string?

Suppose I have a string:
String message = "you should try http://google.com/";
Now, I want to put "http://google.com/" into a new String url.
What I want to do is:
check if a "word" in the string begins with "http://" and extract that word, where a word is
something that's surrounded by spaces (general english definition of word).
I have no idea how to extract the string, and the best I can do is use startsWith on the whole string. How do I use startsWith on a word, and extract the word?
Sorry if this is a little bit difficult to explain.
Thanks in advance!
EDIT: Also, what should I do to extract the word from the REGEX operation? And how should I handle it if there is more than 1 url in the string?
Use Pattern & Matcher classes.
String str = "blabla http://www.mywebsite.com blabla";
String regex = "((https?:\\/\\/)?(www.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_/.0-9#:+?%=&;,]*)?)?)";
Matcher m = Pattern.compile(regex).matcher(str);
if (m.find()) {
String url = m.group(); //value "http://www.mywebsite.com"
}
This regex will work for http://..., https://... and even www... URLs. Other regexes can easily be found on the net.
You can try this:
String str = "blabla http://www.mywebsite.com blabla";
Matcher m = Pattern.compile("(http://.*)").matcher(str);
if (m.find()) {
String url = (new StringTokenizer(m.group(), " ")).nextToken();
}
The "correct" way to perform this task is to split the String by whitespace -- String#split("\\s") -- and then pass each word to the URL constructor. If a word starts with your prefix and a MalformedURLException is thrown, it is invalid. The URL class constructor is far better tested and more robust than any solution that you or I could come up with. So please use it, and don't reinvent the wheel.
You can use Java Regex for this:
The following regex catches any string starting with http:// or https:// up to the next whitespace character:
Pattern urlPattern = Pattern.compile("(https?://\\S*)");
Matcher matcher = urlPattern.matcher(myString);
if (matcher.find()) {
String url = matcher.group();
}
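To cover the case from the question's EDIT, where a message contains more than one URL, the split-by-whitespace approach can collect every matching word. This is a sketch under a few assumptions: the class name is invented, only the http:// and https:// prefixes are considered, and java.net.URI is used for a syntax check here in place of the URL constructor mentioned above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "split into words, keep the ones that look like URLs".
final class UrlWordExtractor {
    /** Returns every whitespace-delimited word that starts with http(s):// and parses as a URI. */
    static List<String> extractUrls(String message) {
        List<String> urls = new ArrayList<>();
        for (String word : message.trim().split("\\s+")) {
            if (!word.startsWith("http://") && !word.startsWith("https://")) {
                continue;
            }
            try {
                new URI(word); // throws URISyntaxException if the word is not a well-formed URI
                urls.add(word);
            } catch (URISyntaxException e) {
                // not well-formed: skip this word
            }
        }
        return urls;
    }
}
```

The result is a list, so a message with zero, one, or several URLs is handled the same way by the caller.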

Regular Expression Search On String

I am having great difficulty searching a string for particular parameters that are needed in my application. I am under the assumption that the only real way to do this is with regular expressions, but they are giving me a huge headache! I don't usually write them myself but get them from other websites; however, what I need isn't simple enough to be covered there :(
Here is the string:
10 50 u E2U+pstn:tel "!^(.*)$!tel:\\1;spn=42180;mcc=234;mnc=33!" .
I need to extract the spn, mcc, and mnc from this string. Unfortunately, the API I call changes the location of these in the string for some requests, which makes indexing into the string difficult. What I really need is to name what I want to grab, spn= for example, and then read the number that follows it, but nothing I try works.
I wouldn't use a regex, but simply split the string:
int spn = -1; // stays -1 if no "spn=" token is found
String[] tokens = str.split(";");
for (int i=0; i<tokens.length; i++) {
if (tokens[i].startsWith("spn=")) {
spn = Integer.parseInt(tokens[i].substring("spn=".length()));
}
}
Of course you could objectify this a little, or use constants for "spn=".
A solution using Pattern and Matcher:
String s = "10 50 u E2U+pstn:tel \"!^(.*)$!tel:\\\\1;spn=42180;mcc=234;mnc=33!\"";
Pattern p = Pattern.compile("^.*spn=([0-9]+);mcc=([0-9]*);mnc=([0-9]*)!.*$");
Matcher matcher = p.matcher(s);
matcher.matches(); // true
String spn = matcher.group(1); // 42180
String mcc = matcher.group(2); // 234
String mnc = matcher.group(3); // 33
Edit: You can use named-capturing groups, too:
Pattern p =
Pattern.compile("^.*spn=(?<spn>[0-9]+);mcc=(?<mcc>[0-9]*);mnc=(?<mnc>[0-9]*)!.*$");
Matcher matcher = p.matcher(s);
matcher.matches(); // true
String spn = matcher.group("spn");
String mcc = matcher.group("mcc");
String mnc = matcher.group("mnc");
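Since the question says the position of the parameters can change between requests, a pattern that fixes the order spn;mcc;mnc may miss some inputs. One possible alternative, sketched below with an invented class name, is to search for each key=digits pair independently with find() and collect whatever is present. It assumes the values are always plain digits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the key names come from the question's sample string.
final class ParamExtractor {
    // Matches one "key=digits" pair at a time, wherever it appears in the string.
    private static final Pattern PARAM = Pattern.compile("\\b(spn|mcc|mnc)=(\\d+)");

    /** Returns the spn, mcc and mnc values that are present, keyed by name. */
    static Map<String, String> parse(String record) {
        Map<String, String> params = new HashMap<>();
        Matcher m = PARAM.matcher(record);
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }
}
```

A missing key simply does not appear in the map, so the caller can decide how to treat incomplete records.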
