Java pattern to delimit string and find field - java

I once again need some help with this as I cannot figure out how to get a regex going.
Heres the string I want to split:
String str = "ERR||||TEST|GET|POST|UPDATE|"
I have a function which given a field index starting with 1 will return the string at that position after splitting the string. However the problem is that the regex does not return the empty strings between the delimiters as this is counted towards the field index. How can I modify this regex to include these empty fields ?
private static String extractField(String strt, int fieldNo)
{
Pattern pattern = Pattern.compile("[^|]+");
String str = strt.replaceAll("\\r", "");
Matcher matcher = pattern.matcher(str);
int i = 1;
while (matcher.find()) {
String fS = matcher.group().trim();
System.out.println("Result: \"" + fS + "\"");
if (i++==fieldNo) {
return fS;
}
}
return "";
}

Why not use split? ...
String[] x = str.split("\\|");
System.out.println(Arrays.toString(x));

As you are looking for a Pattern-based solution, you can use the regex:
(?<=(^|\|))(.*?)(?=(\||$))
Those are a positive lookahead ((?=X)) and a positive look-behind ((?<=X)). This mill match anything between two |s, or between the start of the string (^) and a |, or between a | and the end of the string ($). As lookaheads and look-behinds are zero-width assertions, they will not include the | in the groups. Also, the ? in .*? makes it non-greedy.
Code:
private static String extractField(String strt, int fieldNo)
{
Pattern pattern = Pattern.compile("(?<=(^|\\|))(.*?)(?=(\\||$))");
String str = strt.replaceAll("\\r", "");
Matcher matcher = pattern.matcher(str);
int i = 1;
while (matcher.find()) {
String fS = matcher.group().trim();
System.out.println("Result: \"" + fS + "\"");
if (i++==fieldNo) {
return fS;
}
}
return "";
}
Results for "ERR||||TEST|GET|POST|UPDATE|":
Result: "ERR"
Result: ""
Result: ""
Result: ""
Result: "TEST"
Result: "GET"
Result: "POST"
Result: "UPDATE"
Result: ""

Doing a split on | would yield what you want.
Something like this:
String s = "ERR||||TEST|GET|POST|UPDATE|";
String [] a = s.split("\\|");
for (int i=0; i<a.length; i++) {
System.out.println(a[i]);
}

You can easily upgrade your regex using look-around mechanism.
Try maybe this way [^|]+|(?<=[|]). Additional part |(?<=[|]) means "OR empty string that have | before it".
Thanks to this regex for ERR||||TEST|GET|POST|UPDATE| you will find
"ERR", "", "", "", "TEST", "GET", "POST", "UPDATE", ""
In case you don't want last "" you may use [^|]+|(?<=[|])(?=[|]). (?<=[|])(?=[|]) means "empty string that exists between two |", so result of this pattern will be
"ERR", "", "", "", "TEST", "GET", "POST", "UPDATE"

Related

Java regex extract specific values in long log

I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
#Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}
Maybe your pattern should be like this ^FPS\\(FramesPerSecond\\)\\[ValMin: . I've tried it and it works for me.
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(line.substring(matcher.end(), line.length()-1));
}
}
In that way, you get the offset of the line that you want to extract data and using the substring function you can get all characters starting from offset until the size of the line-1 (because you dont want to get also the ] character)
The following regular expression will match and capture the name, min and max:
Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000

How do I properly use Matcher to retrieve first 30 chars of a String?

My goal is to return the first 30 characters of a user entered String and its returned in an email subject line.
My current solution is this:
Matcher matcher = Pattern.compile(".{1,30}").matcher(Item.getName());
String subject = this.subjectPrefix + "You have been assigned to Item Number " + Item.getId() + ": " + matcher + "...";
What is being returned for matcher is "java.util.regex.Matcher[pattern=.{1,30} region=0,28 lastmatch=]"
I think it's better to use String.substring():
public static String getFirstChars(String str, int n) {
if(str == null)
return null;
return str.substring(0, Math.min(n, str.length()));
}
In case you really want to use regexp, then this is an example:
public static String getFirstChars(String str, int n) {
if (str == null)
return null;
Pattern pattern = Pattern.compile(String.format(".{1,%d}", n));
Matcher matcher = pattern.matcher(str);
return matcher.matches() ? matcher.group(0) : null;
}
I personally would use the substring method of the String class too.
However, don't take it for granted that your string is at least 30 chars long, I'd guess that this may have been part of your problem:
String itemName = "lorem ipsum";
String itemDisplayName = itemName.substring(0, itemName.length() < 30 ? itemName.length() : 30);
System.out.println(itemDisplayName);
This makes use of the ternary operator, where you have a boolean condition, then and else. So if your string is shorter than 30 chars, we'll use the whole string and avoid a java.lang.StringIndexOutOfBoundsException.
Well, if you really need to use Matcher, then try:
Matcher matcher = Pattern.compile(".{1,30}").matcher("123456789012345678901234567890");
if (matcher.find()) {
String subject = matcher.group(0);
}
But it would be better to use the substring method:
String subject = "123456789012345678901234567890".substring(0, 30);
use substring instead.
String str = "....";
String sub = str.substring(0, 30);

Remove pattern from string in Java

I am currently working on a tool, which helps me to analyze a constantly growing String, that can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is to find a sequence of A's and B's, do something and then remove that sequence from the original String.
My code looks like this:
while (someBoolean){
if(Pattern.matches("A+B+", s)) {
//Do stuff
//Remove the found pattern
}
if(Pattern.matches("C+D+", s)) {
//Do other stuff
//Remove the found pattern
}
}
return s;
Also, how I could remove the three sequences, so that s just contains "Q" at the end of the calculation, without and endless loop?
You should use a regex replacement loop, i.e. the methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find one of many patterns, use the | regex matcher, and capture each pattern separately.
You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
if (m.start(1) != -1) { // Group 1 found
System.out.println("Found AB: " + m.group(1));
m.appendReplacement(buf, ""); // Replace matched substring with ""
} else if (m.start(2) != -1) { // Group 2 found
System.out.println("Found CD: " + m.group(2));
m.appendReplacement(buf, ""); // Replace matched substring with ""
}
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q
This solution assumes that the string always ends in Q.
String s="AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
while (s.length() > 1){
Matcher abMatcher = abPattern.matcher(s);
if (abMatcher.find()) {
s = abMatcher.replaceFirst("");
//Do other stuff
}
Matcher cdMatcher = cdPattern.matcher(s);
if (cdMatcher.find()) {
s = cdMatcher.replaceFirst("");
//Do other stuff
}
}
System.out.println(s);
You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
if (result.contains(ch)) {
String pattern = "[" + ch + "]+";
result = result.replaceAll(pattern, ch);
}
}
System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"
This basically replace sequence of each character for single one.
If you want to remove the sequence completely, just replace ch to "" in replaceAll method parameters inside if body.

How to search word in String text, this word end "." or "," in java

someone can help me with code?
How to search word in String text, this word end "." or "," in java
I don't want search like this to find it
String word = "test.";
String wordSerch = "I trying to tasting the Artestem test.";
String word1 = "test,"; // here with ","
String word2 = "test."; // here with "."
String word3 = "test"; //here without
//after i make string array and etc...
if((wordSearch.equalsIgnoreCase(word1))||
(wordSearch.equalsIgnoreCase(word2))||
(wordSearh.equalsIgnoreCase(word3))) {
}
if (wordSearch.contains(gramer))
//it's not working because the word Artestem will contain test too, and I don't need it
You can use the matches(Regex) function with a String
String word = "test.";
boolean check = false;
if (word.matches("\w*[\.,\,]") {
check = true;
}
You can use regex for this
Matcher matcher = Pattern.compile("\\btest\\b").matcher(wordSearch);
if (matcher.find()) {
}
\\b\\b will match only a word. So "Artestem" will not match in this case.
matcher.find() will return true if there is a word test in your sentence and false otherwise.
String stringToSearch = "I trying to tasting the Artestem test. test,";
Pattern p1 = Pattern.compile("test[.,]");
Matcher m = p1.matcher(stringToSearch);
while (m.find())
{
System.out.println(m.group());
}
You can transform your String in an Array divided by words(with "split"), and search on that array , checking the last character of the words(charAt) with the character that you want to find.
String stringtoSearch = "This is a test.";
String whatIwantToFind = ",";
String[] words = stringtoSearch.split("\\s+");
for (String word : words) {
if (whatIwantToFind.equalsignorecas(word.charAt(word.length()-1);)) {
System.out.println("FIND");
}
}
What is a word? E.g.:
Is '5' a word?
Is '漢語' a word, or two words?
Is 'New York' a word, or two words?
Is 'Kraftfahrzeughaftpflichtversicherung' (meaning "automobile liability insurance") a word, or 3 words?
For some languages you can use Pattern.compile("[^\\p{Alnum}\u0301-]+") for split words. Use Pattern#split for this.
I think, you can find word by this pattern:
String notWord = "[^\\p{Alnum}\u0301-]{0,}";
Pattern.compile(notWord + "test" + notWord)`
See also: https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

Regex composion

I want to parse a line from a CSV(comma separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file, and instead of the commas between the apostrophes I wanna have ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code but it doesn't parse it well:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
the output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
anyone have any idea how to fix this?
This is my solution to replace , inside quote to ;. It assumes that if " were to appear in a quoted string, then it is escaped by another ". This property ensures that counting from start to the current character, if the number of quotes " is odd, then that character is inside a quoted string.
// Test string, with the tricky case """", which resolves to
// a length 1 string of single quote "
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
// System.out.println(m.group() + "\n " + m.start() + " " + m.end());
output
.append(line.substring(start, matcher.start())) // Append unrelated contents
.append(matcher.group().replaceAll(",", ";")); // Append replaced string
start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find any case that will fail the method of replace the matched group like you did in line = line.replace(matcher.group(), replacedMatch);, I feel safer to rebuild the string from scratch.
Here's a way:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
Matcher matcher = Pattern.compile(regex).matcher(in);
StringBuilder out = new StringBuilder();
while(matcher.find()) {
out.append(matcher.group().replace(',', ';')).append(',');
}
out.deleteCharAt(out.length() - 1);
System.out.println(in + "\n" + out);
}
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is the what you need
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while(matcher.find()){
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
line will have value you needed.
Have you tried to make the RegExp lazy?
Another idea: inside the [] you should use a " too. If you do that, you should have the expected output with global flag set.
Your regex is faulty. Why would you want to make sure there are no ] within the "..." expression? You'd rather make the regex reluctant (default is eager, which means it catches as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right, you should use a proper CSV library to parse it and replace , to ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.

Categories

Resources