How to pattern match and transform string to generate certain output?

The code below takes input that contains a lot of whitespace between, before, and after the important strings. So far I have been able to filter the whitespace out. After preparing the string, I want to process it.
Here is an example of the input I may get and the output I want:
Input
+--------------+
EDIT example.mv Starter web-onyx-01.example.net.mv
Notice the whitespace before and after the domain; the amount of whitespace is arbitrary.
Output
+--------------+
example.mv. in ns web-onyx-01.example.net.mv.
In the output, the important part is the whitespace between the domain (Example.), the keyword (in), the keyword (ns), and the host (web-onyx-01.example.net.mv.).
Also notice the period (".") after the domain and the host. In addition, if the domain has the (.mv) ccTLD, that part has to be removed from the string.
What I would like to achieve is this transformation for multiple lines of text: I want to batch process an unordered, chaotic list of strings and produce clean output.
The code is by no means well designed, but it is what I have come up with so far. NOTE: I am a beginner who is still learning to program. I would like your suggestions to improve the code as well as to solve the problem at hand, i.e. transforming the input into the desired output.
P.S. The output is for DNS zone files, so errors can be very problematic.
So far my code accepts text from one textarea and writes the result to another textarea.
My code works as long as the array length is 2 or 3 but fails for anything larger. How can I process the input dynamically, however large the list/array may become in the future?
String s = jTextArea1.getText();
Pattern p = Pattern.compile("ADD|EDIT|DELETE|Domain|Starter|Silver|Gold|ADSL Business|Pro|Lite|Standard|ADSL Multi|Pro Plus", Pattern.MULTILINE);
Matcher m = p.matcher(s);
s = m.replaceAll("");
String ms = s.replaceAll("(?m)(^\\s+|[\\t\\f ](?=[\\t\\f ])|[\\t\\f ]$|\\s+\\z)", "");
String[] last = ms.split(" ");
for (String test : last){
System.out.println(test);
}
System.out.println("The length of array is: " +last.length);
if (str.isContain(last[0], ".mv")) {
if (last.length == 2) {
for(int i = 0; i < last.length; i++) {
last[0] = last[0].replaceFirst(".mv", "");
System.out.println(last[0]);
last[i] += ".";
if (last[i] == null ? last[0] == null : last[i].equals(last[0])) {
last[i]+= " in ns ";
}
String str1 = String.join("", last);
jTextArea2.setText(str1);
System.out.println(str1);
}
}
else if (last.length == 3) {
for(int i = 0; i < last.length; i++) {
last[0] = last[0].replaceFirst(".mv", "");
System.out.println(last[0]);
last[i] += ".";
if (last[i] == null ? last[0] == null : last[i].equals(last[0])) {
last[i]+= " in ns ";
}
if (last[i] == null ? last[1] == null : last[i].equals(last[1])){
last[i] += "\n";
}
if (last[i] == null ? last[2] == null : last[i].equals(last[2])){
last[i] = last[0] + last[2];
}
String str1 = String.join("", last);
jTextArea2.setText(str1);
System.out.println(str1);
}
}
}

As I understand your question you have multiple lines of input in the following form:
whitespace[command]whitespace[domain]whitespace[label]whitespace[target-domain]whitespace
You want to convert that to the following form such that multiple lines are aligned nicely:
[domain]. in ns [target-domain].
To do that I'd suggest the following:
1. Split your input into multiple lines.
2. Use a regular expression to check the line format (e.g. for a valid command) and extract the domains.
3. Store the maximum length of both domains separately.
4. Build a format string using the maximum lengths.
5. Iterate over the extracted domains and build a string for each line using the format defined in step 4.
Example:
String input = " EDIT domain1.mv Starter example.domain1.net.mv \n" +
" DELETE long-domain1.mv Silver long-example.long-domain1.net.mv \n" +
" ADD short-domain1.mv ADSL Business ex.sdomain1.net.mv \n";
//step 1: split the input into lines
String[] lines = input.split( "\n" );
//step 2: build a regular expression to check the line format and extract the domains - which are the (\S+) parts
Pattern pattern = Pattern.compile( "^\\s*(?:ADD|EDIT|DELETE)\\s+(\\S+)\\s+(?:Domain|Starter|Silver|Gold|ADSL Business|Pro|Lite|Standard|ADSL Multi|Pro Plus)\\s+(\\S+)\\s*$" );
List<String[]> lineList = new LinkedList<>();
int maxLengthDomain = 0;
int maxLengthTargetDomain = 0;
for( String line : lines )
{
//step 2: check the line
Matcher matcher = pattern.matcher( line );
if( matcher.matches() ) {
//step 2: extract the domains
String domain = matcher.group( 1 );
String targetDomain = matcher.group( 2 );
//step 3: get the maximum length of the domains
maxLengthDomain = Math.max( maxLengthDomain, domain.length() );
maxLengthTargetDomain = Math.max( maxLengthTargetDomain, targetDomain.length() );
lineList.add( new String[] { domain, targetDomain } );
}
}
//step 4: build the format string with variable lengths
String formatString = String.format( "%%-%ds in ns %%-%ds", maxLengthDomain + 5, maxLengthTargetDomain + 2 );
//step 5: build the output
for( String[] line : lineList ) {
System.out.println( String.format( formatString, line[0] + ".", line[1] + "." ) );
}
Result:
domain1.mv.           in ns example.domain1.net.mv.
long-domain1.mv.      in ns long-example.long-domain1.net.mv.
short-domain1.mv.     in ns ex.sdomain1.net.mv.
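The format string built in step 4 may be easier to follow in isolation. The following is a minimal sketch, not part of the original answer; the class name ColumnFormatDemo, the helper buildFormat and the sample domains are made up for illustration. It shows that "%%" yields a literal percent sign, so the first String.format call produces a second format string with a fixed column width.

```java
class ColumnFormatDemo {

    // Builds a format such as "%-12s in ns %s" for a given column width.
    // Hypothetical helper; the name is not from the original answer.
    static String buildFormat(int domainWidth) {
        // "%%" is an escaped percent sign, "%d" is replaced by the width.
        return String.format("%%-%ds in ns %%s", domainWidth);
    }

    public static void main(String[] args) {
        String format = buildFormat(12);
        System.out.println(format); // %-12s in ns %s
        // "%-12s" left-aligns the value and pads it to 12 characters.
        System.out.println(String.format(format, "a.mv.", "ns1.a.net.mv."));
        System.out.println(String.format(format, "longer.mv.", "ns2.longer.net.mv."));
    }
}
```

Both printed records start their "in ns" column at the same position, which is the alignment the answer aims for.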

Related

How to splitting different special characters in Java with using if contains?

I have "-" characters in my strings, as shown below.
I check whether a string contains "-" and split on it, which works for most values. But some strings contain more than one "-", at different positions.
I tried adding a second check for ".-", but that did not solve the issue either.
So how can I get the correct output without the "-" characters?
13-adana-demirspor -> has 2 "-" characters.
15-y.-malatyaspor -> has 2 "-" characters too.
These two strings cause problems when splitting.
The others have only one "-" character and cause no issue.
My Code is:
final String [] URL = {
"13-adana-demirspor",
"14-fenerbahce",
"15-y.-malatyaspor",
"16-trabzonspor",
"17-sivasspor",
"18-konyaspor",
"19-giresunspor",
"20-galatasaray"
};
for (int i = 0; i < URL.length; i++) {
    String team;
    if (URL[i].contains("-")) {
        String[] divide = URL[i].split("-");
        team = divide[1];
        System.out.println(" " + team.toUpperCase());
    } else if (URL[i].contains(".-")) {
        String[] divide = URL[i].split(".-");
        team = divide[2];
        System.out.println(" " + team.toUpperCase());
    } else {
        team = null;
    }
}
My Output is:
ADANA ** missing second word
FENERBAHCE
Y. ** missing second word
TRABZONSPOR
SIVASSPOR
KONYASPOR
GIRESUNSPOR
GALATASARAY
Thanks for your help.

It looks like you just want to split on the first occurrence. For this you can use the second parameter of split and set it to 2, like so:
if (URL[i].contains("-")) {
String[] divide = URL[i].split("-", 2);
team = divide[1];
System.out.println(" " + team.toUpperCase());
} else {
team = null;
}
To get the last part instead, you could do:
if (URL[i].contains("-")) {
String[] divide = URL[i].split("-");
team = divide[divide.length - 1];
System.out.println(" " + team.toUpperCase());
} else {
team = null;
}
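As a self-contained illustration of the split limit, here is a minimal sketch; the class name FirstDashSplitDemo and the helper teamName are hypothetical. It also uses Locale.ROOT for the upper-casing, which is an addition to the snippets above: with a Turkish default locale, a plain toUpperCase() would turn "i" into a dotted capital "İ". Note as well that split takes a regular expression, so the ".-" in the question means "any character followed by a dash", not a literal dot.

```java
import java.util.Locale;

class FirstDashSplitDemo {

    // Returns everything after the first "-" in upper case, or null when
    // the input contains no "-". Hypothetical helper for illustration.
    static String teamName(String slug) {
        // Limit 2 means: split at most once, so there are at most two parts.
        String[] parts = slug.split("-", 2);
        return parts.length == 2 ? parts[1].toUpperCase(Locale.ROOT) : null;
    }

    public static void main(String[] args) {
        String[] urls = {"13-adana-demirspor", "14-fenerbahce", "15-y.-malatyaspor"};
        for (String url : urls) {
            System.out.println(teamName(url));
        }
        // Prints:
        // ADANA-DEMIRSPOR
        // FENERBAHCE
        // Y.-MALATYASPOR
    }
}
```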

Can not count how many number of unique date are available in every part of string

I divided my string into three parts using newline ('\n'). The output I want to achieve: the number of unique dates in each part of the string.
In the code below, the first part contains two unique dates, the second part contains two, and the third part contains three. So the output should look like this: 2,2,3,
But when I run the code below, I get this output: 5,5,5,5,1,3,1,
How do I get the output 2,2,3,?
Thanks in advance.
String strH;
String strT = null;
StringBuilder sbE = new StringBuilder();
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
String[] strG = strA.split("\n");
for(int h=0; h<strG.length; h++){
strH = strG[h];
String[] words=strH.split(",");
int wrc=1;
for(int i=0;i<words.length;i++) {
for(int j=i+1;j<words.length;j++) {
if(words[i].equals(words[j])) {
wrc=wrc+1;
words[j]="0";
}
}
if(words[i]!="0"){
sbE.append(wrc).append(",");
strT = String.valueOf(sbE);
}
wrc=1;
}
}
Log.d("TAG", "Output: "+strT);

I would use a set here to count the distinct dates:
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11" + "\n" +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15" + "\n" +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09";
String[] lines = strA.split("\n");
List<Integer> counts = new ArrayList<>();
for (String line : lines) {
counts.add(new HashSet<String>(Arrays.asList(line.split(","))).size());
}
System.out.println(counts); // [2, 2, 3]
Note that I have done a minor cleanup of the strA input by removing the trailing comma from each line.
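For what it is worth, the set-based approach appears to cope with the trailing commas as well, because String.split(",") discards trailing empty strings. The following is a minimal sketch under that assumption; the class name DistinctDateCountDemo, the helper distinctPerLine and the shortened sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class DistinctDateCountDemo {

    // Counts the distinct comma-separated values on each line of the input.
    static List<Integer> distinctPerLine(String text) {
        List<Integer> counts = new ArrayList<>();
        for (String line : text.split("\n")) {
            // split(",") drops trailing empty strings, so "a,b," yields [a, b].
            counts.add(new HashSet<>(Arrays.asList(line.split(","))).size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened sample data, with the trailing commas left in place.
        String text = "2021-03-02,2021-03-02,2021-03-11,\n"
                + "2021-03-07,2021-03-15,\n"
                + "2021-03-02,2021-03-09,2021-03-07,2021-03-09,";
        System.out.println(distinctPerLine(text)); // [2, 2, 3]
    }
}
```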

With Java 8 Streams, this can be done in a single statement:
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
String strT = Pattern.compile("\n").splitAsStream(strA)
.map(strG -> String.valueOf(Pattern.compile(",").splitAsStream(strG).distinct().count()))
.collect(Collectors.joining(","));
System.out.println(strT); // 2,2,3
Note that Pattern.compile("\n").splitAsStream(strA) can also be written as Arrays.stream(strA.split("\n")), which is shorter to write but creates an unnecessary intermediate array. Which one is better is a matter of personal preference.
String strT = Arrays.stream(strA.split("\n"))
.map(strG -> String.valueOf(Arrays.stream(strG.split(",")).distinct().count()))
.collect(Collectors.joining(","));
The first version can be further micro-optimized by only compiling the regex once:
Pattern patternComma = Pattern.compile(",");
String strT = Pattern.compile("\n").splitAsStream(strA)
.map(strG -> String.valueOf(patternComma.splitAsStream(strG).distinct().count()))
.collect(Collectors.joining(","));

Regex to capture groups and ignore last two characters where one is optional

I need to capture two groups from an input string. The values differ in structure as they come in.
The following are examples of the incoming strings:
Comment = "This is a comment";
NumericValue = 123456;
What I am trying to accomplish is to capture the string value from the left of the equals sign as one group and the value after the equals sign as a second group. The semicolon should never be included.
The caveat is that if the second group is a string, the quotes from each end must not be included in that capture group.
The expected results would be:
Comment = "This is a comment";
key group => Comment
value group => This is a comment
NumericValue = 123456;
key group => NumericValue
value group => 123456
The following is what I have so far. This works fine for capturing the numeric value, but leaves the end double quote when capturing the string value.
(?<key>\w+)\s*=\s*(?:[\"]?)(?<group>.+(?:(?=[\"]?;)))
EDIT
When applying the regex against a string value, it must allow capture of semicolons and double quotes within the string and ignore only the closing ones.
So, if we have an input of:
Comment = "This is a "comment"; This is still a comment";
The second capture group should be:
This is a "comment"; This is still a comment

An option is to use an alternation where you would have to check for group 2 or group 3:
(?<key>\w+)\h*=\h*(?:"(.*?)"|([^"\r\n]+));$
(?<key>\w+)    Group key (group 1), match 1+ word chars
\h*=\h*        Match an = between optional horizontal whitespace chars
(?:            Non capturing group
"(.*?)"        Capture in group 2 any chars (as few as possible) between "
|              Or
([^"\r\n]+)    Capture group 3, match 1+ times any char except " or a newline
);             Close non capturing group and match ;
$              End of string
In Java
String regex = "(?<key>\\w+)\\h*=\\h*(?:\"(.*?)\"|([^\"\\r\\n]+));$";
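A possible way to use this pattern from Java, picking whichever of group 2 or group 3 took part in the match, is sketched below. The class name KeyValueRegexDemo and the helper parse are hypothetical; only the regex itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegexDemo {

    private static final Pattern LINE = Pattern.compile(
            "(?<key>\\w+)\\h*=\\h*(?:\"(.*?)\"|([^\"\\r\\n]+));$");

    // Returns {key, value}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Group 2 is set for quoted values, group 3 for unquoted ones.
        String value = m.group(2) != null ? m.group(2) : m.group(3);
        return new String[] {m.group("key"), value};
    }

    public static void main(String[] args) {
        String[] lines = {
            "Comment = \"This is a comment\";",
            "NumericValue = 123456;",
            "Comment = \"This is a \"comment\"; This is still a comment\";"
        };
        for (String line : lines) {
            String[] kv = parse(line);
            System.out.println(kv[0] + " => " + kv[1]);
        }
    }
}
```

Because the quoted alternative is lazy and must be followed by ";" at the end of the input, inner quotes and semicolons stay inside the captured value.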

Edited based on comment to include ; and " in the comments as per the examples given:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<value>((")(?!;?$)|;(?!$)|[^;"])+)"?;?$
The following one additionally doesn't allow ; or " to appear in the numeric text. However, to include this, I had to rename the capturing groups because the name cannot be used for more than one group.
(?<key>\w+)\s*=\s*((?:")(?<valueT>((")(?!;?$)|;(?!$)|[^;"])+)";?$|(?<valueN>[^;"]+);?$)
Here is a class that tests it.
For readability, I have separated the key and value regexes in the class. I have added the test cases in a method within the class. However, this still doesn't handle the case of a numeric text containing ; or ". Also, the line needs to be trimmed before being subjected to the pattern test (which I think is feasible).
public class NameValuePairRegex{
public static void main( String[] args ){
String SPACE = "\\s*";
String EQ = "=";
String OR = "|";
/* The original regex tried by you (for comparison). */
String orig = "(?<key>\\w+)\\s*=\\s*(?:[\\\"]?)(?<value>.+(?:(?=;)))";
String key = "(?<key>\\w+)";
String valuePatternForText = "(?:\")(?<valueT>((\")(?!;?$)|;(?!$)|[^;\"])+)\";?$";
String valuePatternForNumbers = "(?<valueN>[^;\"]+);?$";
String p = key + SPACE + EQ + SPACE + "(" + valuePatternForText + OR + valuePatternForNumbers + ")";
Pattern nvp = Pattern.compile( p );
System.out.println( nvp.pattern() );
print( input(), nvp );
}
private static void print( List<String> input, Pattern ep ) {
for( String e : input ) {
System.out.println( e );
Matcher m = ep.matcher( e );
boolean found = m.find();
if( !found ) {
System.out.println( "\t\tNo match" );
continue;
}
String valueT = m.group( "valueT" );
String valueN = m.group( "valueN" );
System.out.print( "\t\t" + m.group( "key" ) + " -> " + ( valueT == null ? "" : valueT ) + " " + ( valueN == null ? "" : valueN ) );
System.out.println( );
}
}
private static List<String> input(){
List<String> neg = new ArrayList<>();
Collections.addAll( neg,
"Comment = \"This is a comment\";",
"Comment = \"This is a comment with semicolon ;\";",
"Comment = \"This is a comment with semicolon ; and quote\"\";",
"Comment = \"This is a comment\"",
"Comment = \"This is a \"comment\"; This is still a comment\";",
"NumericValue = 123456;",
"NumericValue = 123;456;",
"NumericValue = 123\"456;",
"NumericValue = 123456" );
return neg;
}
}
Original answer:
The following changed regex is fulfilling the requirements you mentioned. I added the exclusion of ; and " from the value part.
Original that you tried:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<group>.+(?:(?=[\"]?;)))
The changed one:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<value>[^;"]+)

Regular expressions are fun, but look how clean and easy to read this would be without using a regular expression:
int equals = s.indexOf('=');
String key = s.substring(0, equals).trim();
String value = s.substring(equals + 1).trim();
if (value.endsWith(";")) {
value = value.substring(0, value.length() - 1).trim();
}
if (value.startsWith("\"") && value.endsWith("\"")) {
value = value.substring(1, value.length() - 1);
}
Don't assume this is slower just because it uses more lines of code than a regular expression. The lines of code executed internally by a regex engine will far exceed the above code.
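Wrapped into a method, the same idea could look like the sketch below. The class name KeyValueParseDemo and the helper parse are made up for illustration, and the two guards (a missing '=' and a value consisting of a single quote) are additions that the snippet above does not have.

```java
class KeyValueParseDemo {

    // Splits "key = value;" at the first '=' and strips an optional
    // trailing ';' and one pair of surrounding double quotes.
    // Returns {key, value}, or null when there is no '='.
    static String[] parse(String s) {
        int equals = s.indexOf('=');
        if (equals < 0) {
            return null;
        }
        String key = s.substring(0, equals).trim();
        String value = s.substring(equals + 1).trim();
        if (value.endsWith(";")) {
            value = value.substring(0, value.length() - 1).trim();
        }
        // The length check avoids treating a lone quote as a quoted string.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new String[] {key, value};
    }

    public static void main(String[] args) {
        String[] kv = parse("Comment = \"This is a \"comment\"; This is still a comment\";");
        System.out.println(kv[0] + " => " + kv[1]);
        kv = parse("NumericValue = 123456;");
        System.out.println(kv[0] + " => " + kv[1]);
    }
}
```

Only the outermost quotes and the final semicolon are removed, so quotes and semicolons inside the value are kept.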

Insert a space after every given character - java

I need to insert a space after every given character in a string.
For example "abc.def..."
Needs to become "abc. def. . . "
So in this case the given character is the dot.
My search on google brought no answer to that question
I really should go and get some serious regex knowledge.
EDIT : ----------------------------------------------------------
String test = "0:;1:;";
test.replaceAll( "\\:", ": " );
System.out.println(test);
// output: 0:;1:;
// so didnt do anything
SOLUTION: -------------------------------------------------------
String test = "0:;1:;";
test = test.replaceAll( "\\:", ": " ); // assign the result back to test
System.out.println(test);

You could use String.replaceAll():
String input = "abc.def...";
String result = input.replaceAll( "\\.", ". " );
// result will be "abc. def. . . "
Edit:
String test = "0:;1:;";
result = test.replaceAll( ":", ": " );
// result will be "0: ;1: ;" (test is still unmodified)
Edit:
As said in other answers, String.replace() is all you need for this simple substitution. Only if it's a regular expression (like you said in your question), you have to use String.replaceAll().

You can use replace.
text = text.replace(".", ". ");
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace%28java.lang.CharSequence,%20java.lang.CharSequence%29
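To make the difference between the two methods concrete, here is a minimal sketch; the class name SpaceAfterCharDemo and both helper names are hypothetical. It assumes the character to follow with a space arrives as a parameter, which is why the regex variant quotes it with Pattern.quote and Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceAfterCharDemo {

    // Literal replacement: no regex involved, so "." needs no escaping.
    static String withReplace(String text, char target) {
        String t = String.valueOf(target);
        return text.replace(t, t + " ");
    }

    // Regex replacement: quote the target so "." is not "any character",
    // and quote the replacement so "$" and "\" are taken literally.
    static String withReplaceAll(String text, char target) {
        String t = String.valueOf(target);
        return text.replaceAll(Pattern.quote(t), Matcher.quoteReplacement(t + " "));
    }

    public static void main(String[] args) {
        System.out.println(withReplace("abc.def...", '.'));    // abc. def. . . (with a trailing space)
        System.out.println(withReplaceAll("abc.def...", '.')); // same result
        System.out.println(withReplace("0:;1:;", ':'));        // 0: ;1: ;
    }
}
```

Strings are immutable, so both helpers return a new string and leave the argument unchanged.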

If you want a simple brute-force technique, the following code will do it:
String input = "abc.def...";
char target = '.';
StringBuilder output = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
    char c = input.charAt(i);
    output.append(c);
    // add the space only after the given character
    if (c == target) {
        output.append(' ');
    }
}
String result = output.toString(); // "abc. def. . . "

regex; for to capture a specific group which is repeated number of times

Compare how you would accomplish the two tasks mentioned below with and without regular expressions. The problem:
The format for an SMS-based food delivery will be:
PABUSOG slash or comma repeated an infinite number of times #
// The quantity can only be numeric. For simplicity, assume that quantity is always an integer
e.g. PABUSOG STRFRY_SMAI/2 HSHBRWN_BRGR/1 COFEEFLT/1 #En311
it will capture the following:
STRFRY_SMAI - 2
HSHBRWN_BRGR - 1
COFEEFLT - 1
this is my sample code: // doing with regex
String message = "PABUSOG ASD_ASD/1 ASD_ASA/2";
Pattern pattern = Pattern.compile("PABUSOG(\\s+([A-Z]+_[A-Z]+)(/|,)([0-9]))+"
,Pattern.CASE_INSENSITIVE);
Matcher m = pattern.matcher(message);
try
{
if (m.matches())
{
String food = m.group(2);
String quantity = m.group(4);
System.out.println(food + " -- " + quantity + "\\n");
}
}
catch (NullPointerException e)
{
}
It displays only ASD_ASA -- 2; it overrides the first item, ASD_ASD/1.
It must display:
ASD_ASD -- 1
ASD_ASA -- 2

You cannot accomplish that with a single regex match that gives you all the data inside groups, because a repeated capturing group only keeps its last match. There is no great need for a complex regex either. If you still prefer a regex, try searching for the item pattern iteratively:
if (!message.startsWith("PABUSOG")) {
return;
}
Pattern pattern = Pattern.compile("([A-Z_]+)[/,]([0-9]+)", Pattern.CASE_INSENSITIVE);
Matcher m = pattern.matcher(message);
while (m.find()) {
String food = m.group(1);
String quantity = m.group(2);
System.out.println(food + " -- " + quantity);
}
Without complex regex you can do the following by using String API:
// Check for correct header
if (!message.startsWith("PABUSOG")) {
return;
}
// split by whitespaces
String[] items = message.split("\\s+");
// skip header and iterate over remaining items
for (String item : Arrays.asList(items).subList(1, items.length)) {
// split each item by / or ,
String[] foodQuantity = item.split("[/,]");
assert foodQuantity.length == 2;
String food = foodQuantity[0];
String quantity = foodQuantity[1];
System.out.println(food + " -- " + quantity);
}
To skip items starting with #, you can either add
if (item.startsWith("#")) {
break; // or continue if it may not be the last item
}
inside the loop, or limit the subList as follows if you are sure that such an item is always present and terminates the sequence: Arrays.asList(items).subList(1, items.length - 1).
By the way, your pattern [A-Z]+_[A-Z]+ won't match COFEEFLT from your example.
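Put together as a runnable program, the iterative search might look like the sketch below. The class name OrderParserDemo and the helper parse are hypothetical, and the quantity is captured as [0-9]+ so that multi-digit quantities are kept whole. Collecting into a map is a design choice of this sketch; it means a food code that appears twice keeps only its last quantity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderParserDemo {

    // One order item: food code, then "/" or ",", then an integer quantity.
    private static final Pattern ITEM =
            Pattern.compile("([A-Z_]+)[/,]([0-9]+)", Pattern.CASE_INSENSITIVE);

    // Returns food code -> quantity in message order.
    // The map is empty when the PABUSOG header is missing.
    static Map<String, Integer> parse(String message) {
        Map<String, Integer> order = new LinkedHashMap<>();
        if (!message.startsWith("PABUSOG")) {
            return order;
        }
        Matcher m = ITEM.matcher(message);
        // Each find() call advances to the next item in the message.
        while (m.find()) {
            order.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return order;
    }

    public static void main(String[] args) {
        String message = "PABUSOG STRFRY_SMAI/2 HSHBRWN_BRGR/1 COFEEFLT/1 #En311";
        for (Map.Entry<String, Integer> item : parse(message).entrySet()) {
            System.out.println(item.getKey() + " -- " + item.getValue());
        }
    }
}
```

The trailing "#En311" token is ignored here because it contains no "/" or "," followed by digits.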
