Regex is trimming the first two characters - Java

I'm trying to extract two words from a line with a regex, using Matcher in Java.
My line looks like this: BROWSER=Firefox
I'm using the code below:
currentLine = currentLine.trim();
System.out.println("Current Line: "+ currentLine);
Pattern p = Pattern.compile("(.*?)=(.*)");
Matcher m = p.matcher(currentLine);
if(m.find(1) && m.find(2)){
System.out.println("Key: "+m.group(1)+" Value: "+m.group(2));
}
The output I get is
Key: OWSER Value: FireFox
The leading BR is being trimmed off. This seems weird to me, since the same regex works perfectly in Perl. Can someone explain why it behaves this way?

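Calling m.find(2) does not strip anything from the string; it resets the matcher and starts searching at index 2, so the match no longer includes the first two characters. From the JavaDocs: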
public boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
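So just use m.find(), which takes no argument; if it returns true, both groups are available through m.group(1) and m.group(2):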
String currentLine = "BROWSER=FireFox";
System.out.println("Current Line: "+ currentLine);
Pattern p = Pattern.compile("(.*?)=(.*)");
Matcher m = p.matcher(currentLine);
if (m.find()) {
System.out.println("Key: "+m.group(1)+" Value: "+m.group(2));
}
Output:
Current Line: BROWSER=FireFox
Key: BROWSER Value: FireFox
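To make the effect of the start index visible, here is a minimal sketch that runs the same pattern with different start values. The class name FindStartDemo and the helper keyFrom are made up for this illustration; only Pattern, Matcher, find(int) and group(int) come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindStartDemo {
    // Hypothetical helper: returns group(1) of the first match found at or
    // after index `start`, or null when nothing matches.
    static String keyFrom(String line, int start) {
        Matcher m = Pattern.compile("(.*?)=(.*)").matcher(line);
        return m.find(start) ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "BROWSER=FireFox";
        System.out.println(keyFrom(line, 0)); // BROWSER
        System.out.println(keyFrom(line, 1)); // ROWSER
        System.out.println(keyFrom(line, 2)); // OWSER
    }
}
```

The argument to find(int) is a position in the input, not a group number, which is why find(2) yields OWSER.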

You can use String.indexOf to find the location of the = and then String.substring to get your two values:
String currentLine = "BROWSER=Firefox";
int indexOfEq = currentLine.indexOf('=');
String myKey = currentLine.substring(0, indexOfEq);
String myVal = currentLine.substring(indexOfEq + 1);
System.out.println(myKey + ":" + myVal);
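One thing the snippet above does not cover is a line without any =, where indexOf returns -1 and substring(0, -1) throws. A possible sketch with that guard added follows; the class name KeyValueSplit and the choice to return null for such lines are assumptions made for this example, not part of the original answer.

```java
class KeyValueSplit {
    // Hypothetical helper: splits "KEY=VALUE" at the first '='.
    // Returns null when the line contains no '=' (an assumed convention).
    static String[] split(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            return null;
        }
        return new String[] { line.substring(0, eq), line.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("BROWSER=Firefox");
        System.out.println(kv[0] + ":" + kv[1]); // BROWSER:Firefox
        System.out.println(split("no separator here")); // null
    }
}
```

Because only the first = is used, a value that itself contains = (for example A=b=c) stays intact.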

Regex lazy solution for java?

I have a string "hooRayNexTcapItaLnextcapitall"
I want to capture the first instance of "next" (NexT - in this case)
My solution:
(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)
With my solution, the group returns next instead of NexT.
How can I correct my regex so that it captures the first next instead of the last one?
Edit 1:
Let me put my question properly,
If the string contains any combination of upper and lower case letters that spell "NextCapital", reverse the characters of the word "Next". Case should be preserved. If "NextCapital" occurs multiple times, only update the first occurrence.
So, I am using group to capture. But my group is capturing the last occurrence of "nextCapital" instead of first occurrence.
Ex:
Input: hooRayNexTcapItaLnextcapitall
output: hooRayTxeNcapItaLnextcapitall
Edit 2:
Please correct my code.
My java code:
Pattern ptn = Pattern.compile("(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)");
//sb = hooRayNexTcapItaLnextcapitall
Matcher mtc = ptn.matcher(sb);
StringBuilder c = new StringBuilder();
if(mtc.find()){
StringBuilder d = new StringBuilder();
StringBuilder e = new StringBuilder();
d.append(mtc.group(1));
e.append(mtc.group(2));
e.reverse();
d.append(e);
d.append(mtc.group(3));
d.append(mtc.group(4));
sb = d;
}
Your regex does match, and group 2 holds the captured "next", but because the leading (.*) is greedy it is the last occurrence rather than the first. Your regex does not need to be that complicated.
Your regex can just be this:
next
If you use Matcher.find and turn on CASE_INSENSITIVE option, you can find the first substring of the string that matches the pattern. Then, use group() to get the actual string:
Matcher matcher = Pattern.compile("next", Pattern.CASE_INSENSITIVE).matcher("hooRayNexTcapItaLnextcapitall");
if (matcher.find()) {
System.out.println(matcher.group());
}
EDIT:
After seeing your requirements, I wrote this code:
String input = "hooRayNexTcapItaLnextcapitall";
Matcher m = Pattern.compile("next(?=capital)", Pattern.CASE_INSENSITIVE).matcher(input);
if (m.find()) {
StringBuilder outputBuilder = new StringBuilder(input);
StringBuilder reverseBuilder = new StringBuilder(input.substring(m.start(), m.end()));
outputBuilder.replace(m.start(), m.end(), reverseBuilder.reverse().toString());
System.out.println(outputBuilder);
}
I used a lookahead to match next only if there is capital after it. After a match is found, I created a string builder with the input, and another string builder with the matched portion of the input. Then, I replaced the matched range with the reverse of the second string builder.
String target = "next";
int index = line.toLowerCase().indexOf(target);
if (index != -1) {
line = line.substring(index, index + target.length());
System.out.println(line);
} else {
System.out.println("Not Found");
}
This would be my first attempt; it leaves room to change the string you are looking for.
Alternatively, you can use this regex solution to achieve the same effect:
Pattern pattern = Pattern.compile("(?i)next");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(matcher.group());
}
The pattern "(?i)next" finds the substring matching "next" ignoring case.
Edit: This reverses the characters of next in the first occurrence of nextcapital.
String input = "hooRayNexTcapItaLnextcapitall";
String target = "nextcapital";
int index = input.toLowerCase().indexOf(target);
if (index != -1) {
String first = input.substring(index, index + target.length());
first = new StringBuilder(first.substring(0, 4)).reverse().toString() + first.substring(4, first.length());
input = input.substring(0, index) + first + input.substring(index + target.length(), input.length());
}
Edit Again : Here is a "fixed" form of your code.
String input = "hooRayNexTcapItaLnextcapitall";
Pattern ptn = Pattern.compile("([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])");
Matcher mtc = ptn.matcher(input);
if(mtc.find()){
StringBuilder d = new StringBuilder(mtc.group(1));
StringBuilder e = new StringBuilder(mtc.group(2));
input = input.replaceFirst(d.toString() + e.toString(), d.reverse().toString() + e.toString());
System.out.println(input);
}
Your regex is grabbing the second potential match for your group due to the default greedy nature of regex. Effectively, the first (.*) is grabbing as much as it can while still satisfying the rest of your regex.
To get what you intend, you can add a question mark to the first group, making it (.*?). This will make it non-greedy, grabbing the smallest string possible while still satisfying the rest of your regex.
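As a quick check of the greedy versus reluctant behaviour described above, this sketch compiles the asker's pattern twice, once with (.*) and once with (.*?) as the first group, and prints where the "next" group starts. The class name GreedyVsLazy and the helper nextStart are invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // The part of the asker's pattern that follows the first group.
    static final String TAIL =
            "([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)";

    // Hypothetical helper: start index of group 2 (the "next" part),
    // or -1 when the pattern does not match.
    static int nextStart(String firstGroup, String input) {
        Matcher m = Pattern.compile(firstGroup + TAIL).matcher(input);
        return m.find() ? m.start(2) : -1;
    }

    public static void main(String[] args) {
        String input = "hooRayNexTcapItaLnextcapitall";
        System.out.println(nextStart("(.*)", input));  // 17, the last "next"
        System.out.println(nextStart("(.*?)", input)); // 6, the first "NexT"
    }
}
```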

Regex in Java not working while same regex is working in shell

I want to replace all :variable (word starting with :) with ${variable}$.
For example,
:aks_num with ${aks_num}$
:brn_num with ${brn_num}$
Following is my code, which does not work:
public static void main(String[] argv) throws Exception
{
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":\\([a-z_]*\\)");
Matcher m = p.matcher(chSeq);
if (m.find()) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
} else {
System.out.println("NO MATCH");
}
}
While in shell script the following regex works perfectly:
s/:\([a-z_]*\)/${\1}$/g
:\\([a-z_]*\\) (with escaped parentheses) matches literal parentheses, i.e. text like :(aks_num). There is no such text in the input string, which is why nothing matches.
If you want the parentheses to capture part of the match instead, do not escape them.
Example :
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
Pattern p = Pattern.compile(":([a-z_]*)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
System.out.println("Found value: " + m.group(0)+". Captured : "+m.group(1));
}
Output:
Found value: :aks_num. Captured : aks_num
Found value: :aks_num. Captured : aks_num
Found value: :brn_num. Captured : brn_num
Found value: :brn_num. Captured : brn_num
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":(\\w+)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
System.out.println("Found value: " + m.group(1) );
}
It also works fine with replaceAll, as long as the colon stays outside the capturing group:
Pattern p = Pattern.compile(":(\\w+)");
Matcher m = p.matcher(x);
x = m.replaceAll("\\${$1}\\$");
You don't need to escape the parentheses, so
Pattern.compile(":([a-z_]*)");
should work.
I believe you confused Java's regex syntax with sed's basic regex syntax, which is different. In Java you do not escape parentheses to make them grouping operators; conversely, escaped parentheses match literal ( and ) characters.
In the replacement string, $ must be escaped so that it is inserted as a literal $, but braces do not need escaping there.
So, just use
.replaceAll(":([a-z_]+)", "\\${$1}\\$")
I suggest the + quantifier because * also matches a bare : followed by a space, a digit, or any other non-letter, which is probably not what you want.
BTW, Java has no /g flag and does not need one: replaceAll already replaces every match with the replacement string.
NOTE: you can further adjust the pattern to match all letters/digits/underscores with ":(\\w+)". Or just alphanumerics/underscore: ":([\\p{Alnum}_]+)".
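Putting the pieces of this answer together, a small runnable sketch might look like the following. The class name PlaceholderRewrite and the method rewrite are made up for the example; the pattern and replacement string are the ones suggested above.

```java
class PlaceholderRewrite {
    // Hypothetical helper: turns every :name into ${name}$.
    static String rewrite(String sql) {
        // The group captures the name without the colon. In the replacement,
        // "\\$" is a literal dollar sign and "$1" is the captured name.
        return sql.replaceAll(":([a-z_]+)", "\\${$1}\\$");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("(brn_num = :brn_num)"));
        // prints: (brn_num = ${brn_num}$)
    }
}
```

Names that are not preceded by a colon, such as the first brn_num, are left untouched.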

Extracting two substrings from one string

I am trying to extract the 00 and 02 from the line below into Strings.
invokestatic:indexbyte1=00 indexbyte2=02
I am using this code, but it's not working correctly:
String parse = "invokestatic:indexbyte1=00 indexbyte2=02";
String first = parse.substring(parse.indexOf("=") + 1);
String second= parse.substring(parse.lastIndexOf("=") + 1);
This seems to work for the second string, but the first string's value is
00 indexbyte2=02
I want to catch just the two digits and not the rest of the string.
If you don't pass a second argument to substring, it returns everything from the start index to the end of the string; that is why first is "00 indexbyte2=02".
To extract only the two digits, also pass an end index when computing first:
String first = parse.substring(parse.indexOf("=") + 1, parse.indexOf("=") + 3);
You can use a regex pattern with groups, like this:
public static void main(String[] args) {
String input = "invokestatic:indexbyte1=00 indexbyte2=02";
Pattern pattern = Pattern.compile(".*indexbyte1=(\\d*) indexbyte2=(\\d*)");
Matcher m = pattern.matcher(input);
if (m.matches()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
}
Try this:
String first = parse.substring(parse.indexOf("=") + 1, parse.indexOf("=") + 3);
parse.indexOf("=") + 3 is the exclusive end index, so the substring covers only the two digits 00. Your current code passes no end index, so substring does not know where to stop and runs to the end of the string, taking indexbyte2=02 along with it.
String parse = "invokestatic:indexbyte1=00 indexbyte2=02";
String first = parse.substring(parse.indexOf("=") + 1,
parse.indexOf("=") + 3);
String second = parse.substring(parse.lastIndexOf("=") + 1);
System.out.println(first + ", " + second);
You could use the Pattern and Matcher classes.
Matcher m = Pattern.compile("(?<==)\\d+").matcher(string);
while(m.find())
{
System.out.println(m.group());
}
substring also has an endIndex. See the docs: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int,%20int)
If the input has the basic form invokestatic:indexbyte1=00 indexbyte2=02 ... indexbyte99=99 you could use a regex:
Pattern p = Pattern.compile("indexbyte\\d+=([a-fA-F0-9]{2})");
Matcher m = p.matcher(input);
while( m.find() ) {
String idxByte = m.group(1);
//handle the byte here
}
This assumes that the identifier for those bytes is indexbyteN but this can be replaced with another identifier. Further this assumes the bytes are provided in hex, i.e. 2 hex characters (case insensitive here).
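If the values really are two-digit hex bytes, as assumed above, the "handle the byte here" step could be a call to Integer.parseInt with radix 16. The sketch below collects the parsed values into a list; the class name IndexBytes and the method parse are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexBytes {
    // Hypothetical helper: returns the numeric value of every
    // indexbyteN=XX pair, reading XX as two hex digits.
    static List<Integer> parse(String input) {
        List<Integer> values = new ArrayList<>();
        Matcher m = Pattern.compile("indexbyte\\d+=([a-fA-F0-9]{2})").matcher(input);
        while (m.find()) {
            values.add(Integer.parseInt(m.group(1), 16));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("invokestatic:indexbyte1=00 indexbyte2=02")); // [0, 2]
    }
}
```

If you need the original text ("00", "02") rather than numbers, collect m.group(1) into a List<String> instead.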

How do I read and remove a number from a string?

So for example, I have this string:
0no1no2yes3yes4yes
The first 0 here should be removed and used as an array index. I am doing so with this statement:
string = string.replaceFirst(dataLine.substring(0, 1), "");
However, when I have say this string:
10yes11no12yes13yes14no
My code fails, since I want to process the 10 but my code extracts just the 1.
In short, single digits work fine, but double or triple digits cause a StringIndexOutOfBoundsException.
Here's the code: http://pastebin.com/uspYp1FK
And here's some sample data: http://pastebin.com/kTQx5WrJ
Here's the output for the sample data:
Enter filename: test.txt
Data before cleanUp: {"assignmentID":"2CCYEPLSP75KTVG8PTFALQES19DXRA","workerID":"AGMJL8K9OMU64","start":1359575990087,"end":"","elapsedTime":"","itemIndex":0,"responses":[{"jokeIndex":0,"response":"no"},{"jokeIndex":1,"response":"no"},{"jokeIndex":2,"response":"yes"},{"jokeIndex":3,"response":"yes"},{"jokeIndex":4,"response":"yes"}],"mturk":"yes"},
Data after cleanUp: 0no1no2yes3yes4yes
Data before cleanUp: {"assignmentID":"2118D8J3VE7W013Z4273QCKAGJOYID","workerID":"A2P0GYVEKGM8HF","start":1359576154789,"end":"","elapsedTime":"","itemIndex":3,"responses":[{"jokeIndex":15,"response":"no"},{"jokeIndex":16,"response":"no"},{"jokeIndex":17,"response":"no"},{"jokeIndex":18,"response":"no"},{"jokeIndex":19,"response":"no"}],"mturk":"yes"},
Data after cleanUp: 15no16no17no18no19no
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.lang.String.substring(String.java:1907)
at jokes.main(jokes.java:34)
Basically, the code is supposed to strip the data down to strings as shown above, then read each number and, if it is followed by yes, increase the value at that index in dataYes, or, if it is followed by no, increase the value at that index in dataNo. Does that make sense?
What can I do? How can I make my code more flexible?
An alternative, more specific attempt:
String regex = "^(\\d+)(yes|no)";
String myStr = "10yes11no";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(myStr);
while (m.find())
{
String all = m.group();
String digits = m.group(1);
String bool = m.group(2);
// do not try and combine the next 2 lines ... it doesn't work!
myStr = myStr.substring(all.length());
m.reset(myStr);
System.out.println(String.format("all = %s, digits = %s, bool = %s", all, digits, bool));
}
Does this work for you?
string = string.replaceAll("^\\d+","");
Try this
System.out.println("10yes11no12yes13yes14no".replaceFirst("^\\d+",""));
How about: -
String regex = "^\\d+";
String myStr = "10abc11def";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(myStr);
if(m.find())
{
String digits = m.group();
myStr = m.replaceFirst("");
}
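For the tallying described in the question, one option is to skip the remove-and-repeat step entirely and let Matcher.find walk through the string. This is only a sketch: the class name JokeTally, the method tally and the array size of 20 are assumptions, and the pastebin code is not reproduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JokeTally {
    // Hypothetical helper: for each "<digits>yes" or "<digits>no" token,
    // adds one to yes[index] or no[index] respectively.
    static void tally(String data, int[] yes, int[] no) {
        Matcher m = Pattern.compile("(\\d+)(yes|no)").matcher(data);
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)); // one or more digits
            if (m.group(2).equals("yes")) {
                yes[index]++;
            } else {
                no[index]++;
            }
        }
    }

    public static void main(String[] args) {
        int[] yes = new int[20]; // assumed size, large enough for the sample
        int[] no = new int[20];
        tally("10yes11no12yes13yes14no", yes, no);
        System.out.println(yes[10] + " " + no[11]); // 1 1
    }
}
```

Since \d+ consumes all consecutive digits, 10 is read as one index rather than as 1 followed by 0.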

Regex composition

I want to parse a line from a CSV(comma separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file and, instead of the commas between the quotation marks, I want ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code but it doesn't parse it well:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
The output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
Does anyone have any idea how to fix this?
This is my solution for replacing , with ; inside quoted strings. It assumes that if " appears inside a quoted string, it is escaped by another ". This property ensures that, counting from the start of the line up to the current character, the character is inside a quoted string whenever the number of quotes " seen so far is odd.
// Test string, with the tricky case """", which resolves to
// a length 1 string of single quote "
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
// System.out.println(m.group() + "\n " + m.start() + " " + m.end());
output
.append(line.substring(start, matcher.start())) // Append unrelated contents
.append(matcher.group().replaceAll(",", ";")); // Append replaced string
start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find a case where replacing the matched group the way you did in line = line.replace(matcher.group(), replacedMatch); fails, I feel safer rebuilding the string from scratch.
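The odd/even quote-count property mentioned in this answer can also be applied directly, without a regex, by toggling a flag on every ". This is a sketch under the same assumption (quotes inside a quoted value are doubled); the class name QuotedCommas and the method replaceInsideQuotes are made up for the illustration.

```java
class QuotedCommas {
    // Hypothetical helper: replaces ',' with ';' only while the number of
    // quotes seen so far is odd, i.e. inside a quoted value.
    static String replaceInsideQuotes(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // parity flips on every quote
            }
            out.append(insideQuotes && c == ',' ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(replaceInsideQuotes(line));
        // prints: Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
    }
}
```

A doubled quote "" flips the flag twice, so the characters after it are still treated as being inside the quoted value.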
Here's a way:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
Matcher matcher = Pattern.compile(regex).matcher(in);
StringBuilder out = new StringBuilder();
while(matcher.find()) {
out.append(matcher.group().replace(',', ';')).append(',');
}
out.deleteCharAt(out.length() - 1);
System.out.println(in + "\n" + out);
}
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is what you need:
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while(matcher.find()){
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
line will then hold the value you need.
Have you tried to make the RegExp lazy?
Another idea: inside the [] you should use a " too. If you do that, you should have the expected output with global flag set.
Your regex is faulty. Why would you want to make sure there is no ] within the "..." expression? You would rather make the regex reluctant (the default is greedy, which means it matches as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right: you should use a proper CSV library to parse the line, and replace , with ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.
