Parse a string line by opening a file using Regex - java

This is the text file (log.txt) that I am opening; I need to match each line using a regular expression.
Jerty|gas|petrol|2.42
Tree|planet|cigar|19.00
Karie|entertainment|grocery|9.20
So I wrote this regular expression, but it is not matching.
public static String pattern = "(.*?)|(.*?)|(.*?)|(.*?)";
public static void main(String[] args) {
File file = new File("C:\\log.txt");
try {
Pattern regex = Pattern.compile(pattern);
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
Matcher m = regex.matcher(line);
if(m.matches()) {
System.out.println(m.group(1));
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Any suggestions will be appreciated.

The | is a special regex symbol that means 'or', so your pattern is read as four alternatives. To match a literal pipe, you have to escape it.
public static String pattern = "(.*?)\\|(.*?)\\|(.*?)\\|(.*?)";
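As a quick check, here is a minimal sketch that applies the escaped pattern to one of the sample lines. The class name PipeRegexDemo and the helper fields are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeRegexDemo {
    // Each \\| in the Java literal is the regex \| , i.e. a literal pipe character.
    static final Pattern LINE = Pattern.compile("(.*?)\\|(.*?)\\|(.*?)\\|(.*?)");

    // Returns the four captured fields, or null if the line does not match.
    static String[] fields(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] f = fields("Jerty|gas|petrol|2.42");
        System.out.println(String.join(", ", f));
    }
}
```

A line without three pipes makes fields return null, so callers should check for that before using the result.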

You can greatly simplify the regex for this. Since the data appears to be pipe-separated, you should just split on the pipe character. You'll end up with an array of fields which you can further parse as needed:
String[] fields = line.split("\\|");
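A small sketch of the split approach on one sample line follows. The class name PipeSplitDemo is hypothetical, the limit argument -1 is an addition that keeps trailing empty fields, and parsing the fourth field as a double assumes it is always numeric, as in the sample data.

```java
import java.util.Arrays;

class PipeSplitDemo {
    // Splits one pipe-separated record; the limit -1 keeps trailing empty fields.
    static String[] splitLine(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String[] fields = splitLine("Tree|planet|cigar|19.00");
        // Assumption: the 4th field is numeric, as in the sample file.
        double amount = Double.parseDouble(fields[3]);
        System.out.println(Arrays.toString(fields) + " -> " + amount);
    }
}
```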

Related

Replace a specific character in a regex match

How can I replace a specific character that occurs multiple times inside a regex match?
I found a post about it which helped me a lot, but its solution replaces only one character.
I need it to replace all of them.
For example from this:
href="https://Link.com/test/Release+This+Has+Some+Pluses
Link.com/test/Release+This+Has+Some+Pluses
To:
href="https://Link.com/test/Release%20This%20Has%20Some%20Pluses
Link.com/test/Release+This+Has+Some+Pluses
Here is what I got so far
Line.replaceAll("(https://Link.com/test/)(\w+)\+" , "$1%20")
But as I already mentioned, this replaces only one character instead of all of them. The result looks like this:
href="https://Link.com/display/release5/Release%20This+Has+Some+Pluses
Link.com/test/Release+This+Has+Some+Pluses
How would I replace all?
EDIT
Here is the code snippet from Java:
public class ExchangeLink {
public static void main(String[] args) {
try {
Path path = Paths.get("C:\\Users\\Mati\\Desktop\\test.txt");
Stream<String> lines = Files.lines(path);
List<String> replaced = lines.map(line -> line.replaceAll("(href=\"https://link.com/test/)(\\w+)\\+", "$1$2%20")).collect(Collectors.toList());
Files.write(path, replaced);
lines.close();
System.out.println("Find and Replace done!!!");
} catch (IOException e) {
e.printStackTrace();
}
}
}
Just do it in two steps: first find each link, then replace every + inside the matched text.
Pattern links = Pattern.compile("href=\"https://link.com/test/((\\w+)\\+?)+");
Matcher matcher = links.matcher(line);
while (matcher.find()) {
line = line.replace(matcher.group(), matcher.group().replace("+", "%20"));
}
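The same two-step idea can also be written as a single pass with Matcher.appendReplacement, which avoids rescanning the line for every match. This is only a sketch: the class name PlusToPercent20, the CASE_INSENSITIVE flag (the question writes both Link.com and link.com) and the character class that stops at a quote or whitespace are assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusToPercent20 {
    // Matches the href prefix and everything up to the closing quote or whitespace.
    private static final Pattern LINK = Pattern.compile(
            "href=\"https://link\\.com/test/[^\"\\s]*", Pattern.CASE_INSENSITIVE);

    static String encodePluses(String line) {
        Matcher m = LINK.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = m.group().replace("+", "%20");
            // quoteReplacement guards against '$' or '\' in the matched text.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out); // copy the rest of the line unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<a href=\"https://Link.com/test/Release+This+Has+Some+Pluses\">"
                + "Link.com/test/Release+This+Has+Some+Pluses</a>";
        System.out.println(encodePluses(line));
    }
}
```

Only the pluses inside the href value are rewritten; the visible link text keeps its + characters.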

how to grab and show multiple lines between two string(pattern) from a file in java

I want to grab and show a multi-line string from a file (it has more than 20,000 lines of text) between two desired strings (patterns) using Java.
Example: file.txt (has more than 20,000 lines of text)
pattern1
string
that i
want
to grab
pattern2
I want to grab and show the text between these two patterns (pattern1 and pattern2), which in this case is "string \n that i \n want \n to grab".
How can I do that?
I tried BufferedReader, File, String and a few more things, but nothing worked.
Sorry, I'm a noob.
Is your pattern spread over several lines?
One easy solution would be to store the content of your file and then check for your pattern with a regular expression:
try (BufferedReader reader = new BufferedReader(new FileReader(new File("test.txt")))) {
    final StringBuilder contents = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) { // read the file content
        // readLine() strips the line break, so put it back
        contents.append(line).append('\n');
    }
    // DOTALL lets . match line breaks; .+? stops at the first PATTERN2
    Pattern p = Pattern.compile("PATTERN1(.+?)PATTERN2", Pattern.DOTALL); // prepare your regex
    Matcher m = p.matcher(contents.toString());
    while (m.find()) { // for each match
        String b = m.group(1);
        System.out.println(b);
    }
} catch (Exception e) {
    e.printStackTrace();
}
You can use this :
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void textBetweenTwoPatterns(String pattern1, String pattern2, String text){
// [^<]+ assumes the text between the two patterns never contains a '<' character
Pattern p = Pattern.compile(pattern1+"([^<]+)"+pattern2, Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find())
{
System.out.println(m.group(1));
}
}
public static void main(String []args){
textBetweenTwoPatterns("<b>", "</b>", "Test <b>regex</b> <i>Java</i> for \n\n<b>Stackoverflow</b> ");
}
}
It returns :
regex
Stackoverflow
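If pattern1 and pattern2 each sit on a line of their own, as in the question's sample, a regex is not strictly needed. The sketch below walks over the lines and toggles a flag. The class name BetweenMarkers and the method linesBetween are hypothetical, and it assumes the markers are compared after trimming whitespace. For a real file the lines could come from Files.readAllLines(Paths.get("file.txt")), which is fine for a file of about 20,000 lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BetweenMarkers {
    // Collects the lines strictly between a line equal to 'start'
    // and the next line equal to 'end' (markers themselves are excluded).
    static List<String> linesBetween(Iterable<String> lines, String start, String end) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            if (!inside && line.trim().equals(start)) {
                inside = true;
            } else if (inside && line.trim().equals(end)) {
                inside = false;
            } else if (inside) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "header", "pattern1", "string", "that i", "want", "to grab", "pattern2", "footer");
        System.out.println(String.join("\n", linesBetween(sample, "pattern1", "pattern2")));
    }
}
```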

Java Pattern/ Matcher

This is a sample text: \1f\1e\1d\020028. I cannot modify the input text, I am reading long string of texts from a file.
I want to extract the following: \1f, \1e, \1d, \02
For this, I have written the following regular expression pattern: "\\[a-fA-F0-9]"
I am using the Pattern and Matcher classes, but my matcher is not able to find the pattern using the mentioned regular expression. I have tested this regex with the text on some online regex websites and, surprisingly, it works there.
Where am I going wrong?
Original code:
public static void main(String[] args) {
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
inputText = inputText.replace("\\", "\\\\");
String regex = "\\\\[a-fA-F0-9]{2}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(inputText);
while (m.find()) {
System.out.println(m.group());
}
}
Output: Nothing is printed
(answer changed after OP added more details)
Your string
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
doesn't actually contain any \ literals, because according to the Java Language Specification, section 3.10.6 (Escape Sequences for Character and String Literals), \xxx is interpreted as the character whose Unicode code point is the octal (base 8) value xxx.
Example: \123 = 1*8^2 + 2*8^1 + 3*8^0 = 1*64 + 2*8 + 3*1 = 64+16+3 = 83, which represents the character S.
If the string you presented in your question is written exactly like that in your text file, then in Java source you should write it as
String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";
(with each \ escaped, so that it now represents a literal backslash).
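The difference between the two literals is easy to verify with a short sketch. The class name OctalEscapeDemo and the helper findEscapes are made up for illustration; the regex is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OctalEscapeDemo {
    // Regex \\[a-fA-F0-9]{2}: one literal backslash followed by two hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\[a-fA-F0-9]{2}");

    static List<String> findEscapes(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ESCAPE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String octal = "\123";     // one character: octal 123 = decimal 83 = 'S'
        String unescaped = "\1f";  // two characters: U+0001 followed by 'f'
        String escaped = "\\1f";   // three characters: '\', '1', 'f'

        System.out.println(octal + " " + unescaped.length() + " " + escaped.length());
        System.out.println(findEscapes("\1f\1e\1d\020028"));     // no backslashes in this string
        System.out.println(findEscapes("\\1f\\1e\\1d\\020028")); // four literal backslashes
    }
}
```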
(older version of my answer)
It is hard to tell what exactly you did wrong without seeing your code. You should be able to find at least \1, \1, \1, \0 since your regex can match one \ and one hexadecimal character placed after it.
Anyway this is how you can find results you mentioned in question:
String text = "\\1f\\1e\\1d\\020028";
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}");
// ^^^--we want to find two hexadecimal
// characters after \
Matcher m = p.matcher(text);
while (m.find())
System.out.println(m.group());
Output:
\1f
\1e
\1d
\02
You need to read the file properly. Backslashes that come from a file are already literal characters, so they do not need to be replaced with '\\'; only the regex written in Java source needs the doubled escaping. Assume that there is a file called test_file.txt in your project (on the classpath) with this content:
\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d
Here is the code to read the file and extract values:
public static void main(String[] args) throws IOException {
    Test t = new Test();
    t.test();
}
public void test() throws IOException {
    // compile the regex once, outside the loop
    Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");
    try (BufferedReader br =
            new BufferedReader(
                new InputStreamReader(
                    getClass().getResourceAsStream("/test_file.txt"), "UTF-8"))) {
        String inputText;
        while ((inputText = br.readLine()) != null) {
            Matcher match = pattern.matcher(inputText);
            while (match.find()) {
                System.out.println(match.group());
            }
        }
    }
}
Try adding a . at the end, like:
\\[a-fA-F0-9].
If you don't want to modify the input string, you could try something like:
static public void main(String[] argv) {
String s = "\1f\1e\1d\020028";
Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
Matcher match = regex.matcher(s);
while (match.find()) {
char[] c = match.group().toCharArray();
System.out.println(String.format("\\%d%s",c[0]+0, c[1])) ;
}
}
Yes, it's not perfect, but you get the idea.

how to use escape characters for patterns read from file in java

I am reading some patterns from a file and using them in the String matches method, but when the patterns are read from the file, the escape characters do not work.
For example, I have some data such as "abc.1", "abcd.1", "abce.1", "def.2".
I want to do some activity if the string matches "abc.1", i.e. abc. followed by any characters or numbers.
I have a file that stores the pattern to be matched, e.g. the pattern abc\..*
But when I read the pattern from the file and use it in the String matches method, it does not work.
Any suggestions?
A sample Java program to demonstrate the issue:
package com.test.resync;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class TestPattern {
public static void main(String args[]) {
// raw data against which the pattern is to be matched
String[] data = { "abc.1", "abcd.1", "abce.1", "def.2" };
String regex_data = ""; // variable to hold the regexpattern after
// reading from the file
// regex.txt the file containing the regex pattern
File file = new File(
"/home/ekhaavi/Documents/WORKSPACE/TESTproj/src/com/test/regex.txt");
try {
BufferedReader br = new BufferedReader(new FileReader(file));
String str = "";
while ((str = br.readLine()) != null) {
if (str.startsWith("matchedpattern")) {
regex_data = str.split("=")[1].toString(); // setting the
// regex pattern
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
/*if the regex is set by the below String literal it works fine*/
//regex_data = "abc\\..*";
for (String st : data) {
if (st.matches(regex_data)) {
System.out.println(" data matched "); // this is not printed when the pattern is read from the file instead of setting it through literals
}
}
}
}
The regex.txt file has the below entry
matchedpattern=abc\..*
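Use the Pattern.quote(String) method if the text read from the file should be matched literally. Note that it switches off every regex metacharacter, so a quoted abc\..* matches only the exact string abc\..*: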
if (st.matches(Pattern.quote(regex_data))) {
System.out.println(" data matched "); // this is not printed when the pattern is read from the file instead of setting it through literals
}
There are some other issues that you should consider resolving:
You're overwriting the value of regex_data in the while loop. Did you intend to store all the regex patterns in a list?
Each element of the array returned by String#split() is already a String, so you don't need to invoke toString() on it.
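To check how a pattern behaves once it has been read from the file, the following sketch simulates the line that readLine() would return for the regex.txt entry. It assumes the file really contains a single backslash, as shown in the question. The class name PatternFromFileDemo and the helper valueOf are hypothetical, and the split limit of 2 is an addition so that an = inside the regex is not cut off.

```java
import java.util.regex.Pattern;

class PatternFromFileDemo {
    // Returns everything after the first '=' of a key=value line.
    static String valueOf(String propertyLine) {
        return propertyLine.split("=", 2)[1];
    }

    public static void main(String[] args) {
        // The Java literal needs \\ to produce the single backslash stored in the file.
        String fileLine = "matchedpattern=abc\\..*";
        String regex = valueOf(fileLine);

        String[] data = { "abc.1", "abcd.1", "abce.1", "def.2" };
        for (String st : data) {
            System.out.println(st
                    + " regex=" + st.matches(regex)
                    + " quoted=" + st.matches(Pattern.quote(regex)));
        }
    }
}
```

Under that assumption only "abc.1" matches the unquoted pattern, and nothing matches the quoted one, so if the real program prints nothing it is worth printing regex_data to see exactly what was read.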

Cant match Srt subtitle using Regex in Java

I am trying to parse an SRT subtitle file with this code:
public class MatchArray {
public static void main(String args[]) {
File file = new File(
"C:/Users/Thiago/workspace/SubRegex/src/Dirty Harry VOST - Clint Eastwood.srt");
{
try {
Scanner in = new Scanner(file);
try {
String contents = in.nextLine();
while (in.hasNextLine()) {
contents = contents + "\n" + in.nextLine();
}
String pattern = "([\\d]+)\r([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})\r(([^|\r]+(\r|$))+)";
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(contents);
ArrayList<String> start = new ArrayList<String>();
while (m.find()) {
start.add(m.group(1));
start.add(m.group(2));
start.add(m.group(3));
start.add(m.group(4));
start.add(m.group(5));
start.add(m.group(6));
start.add(m.group(7));
System.out.println(start);
}
}
finally {
in.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
But when I execute it, it doesn't capture any group. When I try to capture only the time with this pattern:
([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})
It works. So how do I make it capture the entire subtitle?
I do not quite understand what you need, but I think this may help.
Please try the regex:
(\\d+?)\\s*(\\d+?:\\d+?:\\d+?,\\d+?)\\s+-->\\s+(\\d+?:\\d+?:\\d+?,\\d+?)\\s+(.+)
I tried it on http://www.myregextester.com/index.php and it worked.
I hope this can help.
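One thing worth noting about the question's code: Scanner.nextLine() strips the line terminator and the lines are joined with "\n", so a pattern that looks for "\r" can never match that string. A minimal sketch that uses \R (any line break, available since Java 8) and captures multi-line subtitle text follows. The class name SrtSketch, the method parse and the sample cues are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrtSketch {
    // index, start time, end time, then one or more non-empty text lines
    private static final Pattern CUE = Pattern.compile(
            "(\\d+)\\R"
            + "(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s*-->\\s*(\\d{2}:\\d{2}:\\d{2},\\d{3})\\R"
            + "((?:.+(?:\\R|$))+)");

    // Each result is { index, start, end, text }.
    static List<String[]> parse(String contents) {
        List<String[]> cues = new ArrayList<>();
        Matcher m = CUE.matcher(contents);
        while (m.find()) {
            cues.add(new String[] { m.group(1), m.group(2), m.group(3), m.group(4).trim() });
        }
        return cues;
    }

    public static void main(String[] args) {
        String contents = "1\n00:00:01,000 --> 00:00:04,000\nHello\nworld\n\n"
                + "2\n00:00:05,000 --> 00:00:06,500\nBye\n";
        for (String[] cue : parse(contents)) {
            System.out.println(cue[0] + " [" + cue[1] + " - " + cue[2] + "] " + cue[3]);
        }
    }
}
```

A blank line ends a cue because .+ needs at least one character, which is how the text group stops before the next index.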
