Java Best way to extract parts from a string

I have the following string;
[Username [rank] -> me] message
The characters of the rank, username, and message will vary each time. What is the best way I can break this into three separate variables (Username, rank and message)?
I have experimented with:
String[] parts = text.split("] ");
But it throws errors. Thanks in advance!

Use Java's support for regular expressions (java.util.regex) and let a regex match the 3 parts.
For example this one: ^\[([\w]+) \[([\w]+)\] -> \w+\] (.*)$
Java code snippet, slightly adapted from Ian F. Darwin's "Java Cookbook" (O'Reilly):
import java.util.regex.*;
class Test
{
    public static void main(String[] args)
    {
        // Groups: 1 = username, 2 = rank, 3 = message
        String pat = "^\\[([\\w]+) \\[([\\w]+)\\] -> \\w+\\] (.*)$";
        Pattern rx = Pattern.compile(pat);
        String text = "[Username [rank] -> me] message";
        Matcher m = rx.matcher(text);
        if(m.find())
        {
            System.out.println("Match found:");
            // Group 0 is the whole match; groups 1..n are the captured parts
            for(int i=0; i<=m.groupCount(); i++)
            {
                System.out.println(" Group " + i + ": " + m.group(i));
            }
        }
    }
}
Output:
Match found:
Group 0: [Username [rank] -> me] message
Group 1: Username
Group 2: rank
Group 3: message

String input = "[whatever [Is] -> me] Input";
String user, rank, message;
int rankStart = input.indexOf('[', 1);           // opening bracket of the rank
int rankEnd = input.indexOf(']', rankStart);     // closing bracket of the rank
int headerEnd = input.indexOf(']', rankEnd + 1); // bracket that closes the header
user = input.substring(1, rankStart).trim();
rank = input.substring(rankStart + 1, rankEnd);
message = input.substring(headerEnd + 1).trim();
This should work, but if you really want it done right, you should make a separate object that holds all of this and can output it as a string. It depends on where the input comes from and where it is going.
-axon
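As a rough illustration of the "separate object" idea, here is a minimal sketch that combines it with the regex from the first answer. The class name ChatLine, its field names, and the extra capture for the recipient are assumptions made for this example, not something from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class; the name and fields are illustrative only.
final class ChatLine {
    // Same shape as the regex answer above: [user [rank] -> recipient] message
    private static final Pattern FORMAT =
            Pattern.compile("^\\[(\\w+) \\[(\\w+)\\] -> (\\w+)\\] (.*)$");

    final String user;
    final String rank;
    final String recipient;
    final String message;

    private ChatLine(String user, String rank, String recipient, String message) {
        this.user = user;
        this.rank = rank;
        this.recipient = recipient;
        this.message = message;
    }

    // Parses one line, or throws if the line does not have the expected shape.
    static ChatLine parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + text);
        }
        return new ChatLine(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    // Rebuilds the original textual form from the parsed parts.
    @Override
    public String toString() {
        return "[" + user + " [" + rank + "] -> " + recipient + "] " + message;
    }

    public static void main(String[] args) {
        ChatLine line = ChatLine.parse("[Username [rank] -> me] message");
        System.out.println(line.user + " / " + line.rank + " / " + line.message);
        System.out.println(line);
    }
}
```

Because \w only matches letters, digits and underscores, names containing other characters would need a wider character class.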

Related

Regex to remove line break within double quote in CSV

Hi, I have a CSV file with an error in it that I want to correct with a regular expression. Some of the fields contain a line break, as in the example below:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy
California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
The two lines above should be a single line:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre PkwyCalifornia",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
I tried the regex below, but it didn't help:
%s/\\([^\"]\\)\\n/\\1/

Try this:
public static void main(String[] args) {
    String input = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
            + ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
            + "California\",,\"Mountain View\",,\"United\n"
            + "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
    // Find quoted fields that contain at least one line break
    Matcher matcher = Pattern.compile("\"([^\"]*[\n\r].*?)\"").matcher(input);
    Pattern patternRemoveLineBreak = Pattern.compile("[\n\r]");
    String result = input;
    while(matcher.find()) {
        String quoteWithLineBreak = matcher.group(1);
        String quoteNoLineBreaks = patternRemoveLineBreak.matcher(quoteWithLineBreak).replaceAll(" ");
        // replace() treats both arguments as literal text; replaceFirst() would
        // interpret the field content as a regex and break on characters like ( or $
        result = result.replace(quoteWithLineBreak, quoteNoLineBreaks);
    }
    //Output
    System.out.println(result);
}
Output:
"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"

Write a regex that wraps the parts you want to keep in parentheses; each pair of parentheses creates a capturing group. Then build the replacement string from the group references.
String test = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
+ ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
+ "California\",,\"Mountain View\",,\"United\n"
+ "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
System.out.println(test.replaceAll("(\"[^\"]*)\n([^\"]*\")", "$1$2"));
So when we replace the matched text ("United\nStates") with $1$2, the line break is removed because it does not belong to either group:
$1 => the first group (\"[^\"]*), which matches "United
$2 => the second group ([^\"]*\"), which matches States"
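One caveat with the pure-regex versions: a pattern that starts at any double quote can also start at a closing quote, so on input with several records it may join the end of one record to the start of the next, and a greedy [^\"]* removes only the last line break of a field that contains several. A minimal sketch of an alternative, assuming standard CSV quoting (a literal quote inside a field is written as two quotes), is to scan the text once and track whether the current position is inside a quoted field. The class and method names are made up for this example.

```java
// Hypothetical helper; names are illustrative only.
class CsvLineJoiner {
    // Replaces line breaks that occur inside double-quoted fields with a space.
    // Line breaks outside quotes (the record separators) are kept as they are.
    static String joinQuotedLineBreaks(String csv) {
        StringBuilder out = new StringBuilder(csv.length());
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state stays correct.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                // Treat \r\n as one line break so it becomes a single space.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(' ');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\"1600 Amphitheatre Pkwy\nCalifornia\",\"United\nStates\"\n"
                + "\"second\",\"record\"\n";
        System.out.print(joinQuotedLineBreaks(input));
    }
}
```

For anything beyond a one-off cleanup, a real CSV parser is probably the safer choice.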

You can also try matching the line breaks directly with this pattern, which covers \r\n, \n and \r:
/\r?\n|\r/
It seemed to work fine when I tested it.

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a Java regex to capture some groups of words from a String using a Matcher.
Say I have this string: "Hello, we are #happy# to see you today".
I would like to get two groups of matches, one having
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.

From comment:
I may incur in a string where I got more than one instances of words wrapped by #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different styling, the code needs to process each block of text separately and needs to know whether the text is inside or outside a #..# section.
Note that the following code silently skips the last # if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
    // Group 1 matched: text outside #..#; otherwise group 2 holds the text inside
    if (m.start(1) != -1) {
        String outsideText = m.group(1);
        System.out.println("Outside: \"" + outsideText + "\"");
    } else {
        String insideText = m.group(2);
        System.out.println("Inside: \"" + insideText + "\"");
    }
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"
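To show how the inside/outside distinction could feed the styling step, here is a small sketch that reuses the same pattern and wraps the text inside #..# in a tag. The class name, the method name and the choice of <b> as the "style" are assumptions for illustration only; the question does not say what the styling should be.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and the <b> tag are illustrative only.
class HashStyler {
    // Alternative 1: text outside #..#; alternative 2: text inside #..#
    private static final Pattern SEGMENT = Pattern.compile("([^#]+)|#([^#]+)#");

    // Copies outside text unchanged and wraps inside text in <b>...</b>.
    // As with the loop above, an unpaired # is silently dropped.
    static String toHtml(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = SEGMENT.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1));
            } else {
                out.append("<b>").append(m.group(2)).append("</b>");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("Hello, we are #happy# to see you today"));
        System.out.println(toHtml("#Hello# kind #stranger#"));
    }
}
```

The sketch does not escape HTML special characters in the input, which real output code would need to do.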

The best I could do is split the string into groups and then merge groups 1 and 4:
(^.*)(\#(.+?)\#)(.*)
EDIT: Taking remarks from the comments into account:
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to @Lino, we no longer capture the useless group that includes the # characters, and the groups around it now capture anything except #, instead of any non-whitespace character.

Is this solution fine?
Pattern pattern = Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher = pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
    if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
    if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string : notBetween) {
    System.out.println(string);
}
System.out.println("Printing group 2");
for (String string : between) {
    System.out.println(string);
}

Regex to match a fully qualified hostname or URL with optional https

2 possible strings contained in a log file:
1) "some text then https://myhost.ab.us2.myDomain.com and then some more text"
OR:
2) "some text then myhost.ab.us2.myDomain.com and then some more text"
The "myDomain.com" is constant, so we can look for that hard-coded in the regex.
In both cases, they are not at the start of the line, but in the middle.
Need to extract "myhost" out of the line, if it matches.
I've tried positive look behind using "https://" OR "\\s{1}". The https:// by itself works:
Matcher m = Pattern.compile("https://(.+?)\\.(.+?)\\.(.+?)\\.myDomain\\.com\\s").matcher(input);
I want to add an "or" in there so it matches with either "https://" or "<space>" ("https://|\\s{1}"), but it always grabs the entire string up to the start of the first space.
For now, I've settled on splitting the string into a String[] and checking whether an element contains "myDomain". I worked on this for so long that I wanted to learn what the best answer is.

I just put in a non-regex approach:
public static String extractHost(String logEntry, String domain)
{
    // logEntry = logEntry.toLowerCase(); // not needed, just a reminder that the search is case-sensitive ;)
    int protocolIndex = logEntry.indexOf("https://");
    if(protocolIndex != -1)
    {
        // contains protocol, must be variant one
        int hostStart = protocolIndex + 8; // skip past "https://"
        // look for the first dot after the protocol, not the first dot in the whole line
        return logEntry.substring(hostStart, logEntry.indexOf(".", hostStart));
    }
    // has to be variant two
    int domainIndex = logEntry.indexOf(domain);
    if(domainIndex == -1) return null;
    int previousDotIndex = -1;
    // walk backwards from the domain to the space that precedes the host name
    for(int i = domainIndex; i>= 0; i--)
    {
        if(logEntry.charAt(i) == '.') previousDotIndex = i;
        if(logEntry.charAt(i) == ' ') return logEntry.substring(++i,previousDotIndex);
    }
    return null;
}
Variant #2 is actually the more difficult one. In this approach you iterate backwards from the domain's index to the first whitespace, remembering the position of the most recently seen dot. Then it's just a simple substring.

I'd use something like
\b(?:https?:\/\/)?(\w+)\.(?:\w+\.)*myDomain\.com
This matches an optional https:// prefix followed by your host which is captured, followed by some other subdomains (you could specify how many with {2} or hardcode them in, if you know it's always ab.us2), then myDomain.com.
In Java 10:
import java.util.Arrays;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        var text = "some text then https://myhost.ab.us2.myDomain.com " +
                   "and then some more text some text then " +
                   "myhost.ab.us2.myDomain.com and then some more text";
        var pat = "\\b(?:https?://)?(\\w+)\\.(?:\\w+\\.)*myDomain\\.com";
        // results() streams every match; group 1 is the host name
        var matches = Pattern.compile(pat)
                             .matcher(text)
                             .results()
                             .map((m) -> m.group(1))
                             .toArray(String[]::new);
        System.out.println(Arrays.toString(matches)); // => [myhost, myhost]
    }
}
In Java 8:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String text = "some text then https://myhost.ab.us2.myDomain.com " +
                      "and then some more text some text then " +
                      "myhost.ab.us2.myDomain.com and then some more text";
        String pat = "\\b(?:https?://)?(\\w+)\\.(?:\\w+\\.)*myDomain\\.com";
        Matcher matcher = Pattern.compile(pat).matcher(text);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // => myhost myhost
        }
    }
}
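If the goal is to pull the host out of one log line at a time, the same regex can be wrapped in a small helper. This is only a sketch: the class name HostExtractor, the method name firstHost and the decision to return an Optional are assumptions for the example. The domain is passed through Pattern.quote so that its dots are matched literally.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class HostExtractor {
    // Returns the first label of the first host name in the line that ends
    // with the given domain, with or without an http:// or https:// prefix.
    static Optional<String> firstHost(String line, String domain) {
        Pattern p = Pattern.compile(
                "\\b(?:https?://)?(\\w+)\\.(?:\\w+\\.)*" + Pattern.quote(domain));
        Matcher m = p.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String withProtocol =
                "some text then https://myhost.ab.us2.myDomain.com and then some more text";
        String withoutProtocol =
                "some text then myhost.ab.us2.myDomain.com and then some more text";
        System.out.println(firstHost(withProtocol, "myDomain.com"));
        System.out.println(firstHost(withoutProtocol, "myDomain.com"));
    }
}
```

Note that \w does not match hyphens, so host names such as my-host would need a wider character class.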

How to replace a word with specific word

I have a String:
String s="<p>Dear <span>{customerName}, your {accountName} is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
I want to find the customerName and accountName placeholders and replace them with the customer's details. Can anyone please tell me how to do this? The placeholder names change dynamically, because they are database columns and the columns can differ. So I want to find every word between { and } and replace it with the data from the matching column.

Use the following code
s = s.replace("{customerName}", realCustomerName);
s = s.replace("{accountName}", realAccountName);
With String's replace function, the first argument is the string you want to replace, and the second argument is the string you want to insert.

Try:
s = s.replace("{customerName}", customerName).replace("{accountName}", accountName);
where customerName and accountName are the strings holding your customer's details.

If you simply want to replace the words, you could do the following:
String s="<p>Dear <span>{customerName}, your {accountName} is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
// Strings are immutable, so the result has to be assigned back
s = s.replace( "{customerName}", customer.getName() );
s = s.replace( "{accountName}", account.getName() );
Or, if you are building the string yourself and you can modify it, it might be better to do the following:
String s="<p>Dear <span>%1$s, your %2$s is actived </span></p><p> </p><p><span>Congrats!.....</span></p>";
// You may also just create a new String object...
s = String.format( s, customer.getName(), account.getName() );

Finally, I found a way to replace the words using regular expressions. Here the words between ~ characters need to be replaced; they are not fixed and are added to the string dynamically from a UI text area.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularEx {
    /**
     * @param args
     */
    public static void main(String args[]) {
        Pattern pattern = Pattern.compile("\\~.*?\\~");
        String input = "~ABCD~~BBCc~All the best ~ABCD~~BBCc~~in~~Raja~ Such kind of people ~in~~Raja~~ABCD~~BBCc~~in~~Raja~rajasekhar~ABCD~~BBCc~~in~~Raja~ Bayanapalli ~Chinthalacheruvu~";
        Matcher matcher = pattern.matcher(input);
        // using Matcher find(), group(), start() and end() methods
        String s1 = input;
        while (matcher.find()) {
            String grp = matcher.group();
            int si = matcher.start();
            int ei = matcher.end();
            System.out.println("Found the text \"" + grp
                    + "\" starting at " + si + " index and ending at index " + ei);
            // replace() treats grp as literal text; replaceAll() would interpret it as a regex
            s1 = s1.replace(grp, "Raja");
            //System.out.println("FinalString" + s1);
        }
        System.out.println("------------------------------------\nFinalString" + s1);
    }
}

s = s.replace("{customerName}", "John Doe");
s = s.replace("{accountName}", "jdoe");
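Since the question says the placeholder names are not known in advance, a more general sketch is to find every {name} with a regex and look the name up in a map. The class name TemplateFiller, the method name fill, and the use of a Map to stand in for the database row are assumptions for this example. Placeholders without a value are left unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class TemplateFiller {
    // Matches {name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces each {name} with values.get(name); unknown names stay as they are.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps $ and \ in the data from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>(); // stands in for a database row
        row.put("customerName", "John Doe");
        row.put("accountName", "jdoe");
        String s = "<p>Dear <span>{customerName}, your {accountName} is actived </span></p>";
        System.out.println(fill(s, row));
    }
}
```

StringBuffer is used here because the appendReplacement overload that accepts a StringBuilder only exists from Java 9 onwards.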

Use RegEx to extract number from coordinates

I am a beginner in the Java programming language.
When I input (1,2) into the console (brackets included), how can I write code that extracts the first and the second number using a regex?
If there is no expression that extracts the first and second number from within the brackets, I will have to change the input format to x,y without the brackets, which should make the numbers a lot easier to extract.

Try this code:
public static void main(String[] args) {
    String searchString = "(7,32)";
    Pattern compile1 = Pattern.compile("\\(\\d+,");   // "(7,"
    Pattern compile2 = Pattern.compile(",\\d+\\)");   // ",32)"
    Matcher matcher1 = compile1.matcher(searchString);
    Matcher matcher2 = compile2.matcher(searchString);
    while (matcher1.find() && matcher2.find()) {
        String group1 = matcher1.group();
        String group2 = matcher2.group();
        // strip the surrounding bracket and comma from each match
        System.out.println("value 1: " + group1.substring(1, group1.length() - 1 ) + " value 2: " + group2.substring(1, group2.length() - 1 ));
    }
}
Not that I think regex is the best tool here. If you know the input will always have the form (number,number), I would first get rid of the brackets:
String stringWithoutBrackets = searchString.substring(1, searchString.length() - 1);
and then tokenize it with split:
String[] coordinates = stringWithoutBrackets.split(",");
I looked through the regex API, and you can also do something like this:
public static void main(String[] args) {
    String searchString = "(7,32)";
    Pattern compile1 = Pattern.compile("(?<=\\()\\d+(?=,)");
    Pattern compile2 = Pattern.compile("(?<=,)\\d+(?=\\))");
    Matcher matcher1 = compile1.matcher(searchString);
    Matcher matcher2 = compile2.matcher(searchString);
    while (matcher1.find() && matcher2.find()) {
        String group1 = matcher1.group();
        String group2 = matcher2.group();
        System.out.println("value 1: " + group1 + " value 2: " + group2);
    }
}
The main change is that I used the lookarounds (?<=\(), (?=,), (?<=,) and (?=\)) to match the brackets and commas without capturing them. But I really think it's overkill for this task.
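Another option, sketched here under the assumption that the coordinates are integers, is a single pattern with two capturing groups, so that one match yields both numbers. The class name CoordinateParser, the method name parse, and the tolerance for spaces and negative numbers are choices made for this example, not requirements from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class CoordinateParser {
    // Group 1 is the first number, group 2 the second; spaces are tolerated.
    private static final Pattern COORD =
            Pattern.compile("\\(\\s*(-?\\d+)\\s*,\\s*(-?\\d+)\\s*\\)");

    // Returns {x, y}, or throws if the input is not of the form (x,y).
    static int[] parse(String input) {
        Matcher m = COORD.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected (x,y) but got: " + input);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] point = parse("(7,32)");
        System.out.println("value 1: " + point[0] + " value 2: " + point[1]);
    }
}
```

Using matches() rather than find() rejects input with extra text around the coordinates, which seems appropriate for validating console input.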
