Java Pattern/Matcher

This is a sample text: \1f\1e\1d\020028. I cannot modify the input text; I am reading long strings of text from a file.
I want to extract the following: \1f, \1e, \1d, \02
For this, I have written the following regular expression pattern: "\\[a-fA-F0-9]"
I am using the Pattern and Matcher classes, but my matcher is not able to find the pattern using this regular expression. I have tested the regex against the text on some online regex testers and, surprisingly, it works there.
Where am I going wrong?
Original code:
public static void main(String[] args) {
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
inputText = inputText.replace("\\", "\\\\");
String regex = "\\\\[a-fA-F0-9]{2}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(inputText);
while (m.find()) {
System.out.println(m.group());
}
}
Output: Nothing is printed

(answer changed after OP added more details)
Your string
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
doesn't actually contain any \ literals, because according to the Java Language Specification, section 3.10.6 (Escape Sequences for Character and String Literals), \xxx is interpreted as the character whose Unicode code point is the octal (base 8) value xxx.
Example: \123 = 1*8^2 + 2*8^1 + 3*8^0 = 1*64 + 2*8 + 3*1 = 64+16+3 = 83, which represents the character S.
If the string you presented in your question appears exactly like that in your text file, then in Java source code you should write it as
String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";
(with each \ escaped, so that it now represents a literal backslash).
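To see the difference between the two literals, here is a minimal sketch. The class name OctalEscapeDemo and the helper codePoints are made up for illustration; the rest is plain Java.

```java
class OctalEscapeDemo {
    // Hypothetical helper: lists the numeric value of each char, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String octal = "\1f";    // octal escape \1 (U+0001) followed by the letter f
        String literal = "\\1f"; // a real backslash, then '1', then 'f'

        System.out.println(octal.length() + " chars: " + codePoints(octal));     // 2 chars: 1 102
        System.out.println(literal.length() + " chars: " + codePoints(literal)); // 3 chars: 92 49 102
        System.out.println((int) "\123".charAt(0));                              // 83, i.e. 'S'
    }
}
```

The first string has no backslash in it at all, which is why a regex looking for one finds nothing.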
(older version of my answer)
It is hard to tell what exactly you did wrong without seeing your code. You should be able to find at least \1, \1, \1, \0 since your regex can match one \ and one hexadecimal character placed after it.
Anyway, this is how you can find the results you mentioned in the question:
String text = "\\1f\\1e\\1d\\020028";
// {2} means we want to find exactly two hexadecimal characters after the \
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}");
Matcher m = p.matcher(text);
while (m.find())
System.out.println(m.group());
Output:
\1f
\1e
\1d
\02

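You need to read the file properly. Text read from a file is not subject to Java's escape processing, so every '\' in the file arrives in the string as a literal backslash; the replace call in the code below doubles each '\' but is not required for the pattern to match. Assume that there is a file called test_file.txt on your classpath with this content: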
\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d
Here is the code to read the file and extract values:
public static void main(String[] args) throws IOException, URISyntaxException {
Test t = new Test();
t.test();
}
public void test() throws IOException {
BufferedReader br =
new BufferedReader(
new InputStreamReader(
getClass().getResourceAsStream("/test_file.txt"), "UTF-8"));
String inputText;
while ((inputText = br.readLine()) != null) {
inputText = inputText.replace("\\", "\\\\");
Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");
Matcher match = pattern.matcher(inputText);
while (match.find()) {
System.out.println(match.group());
}
}
}
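As a sanity check, here is a small sketch that writes the sample text to a temporary file and reads it back without any replace step. The class name FileEscapeDemo, the helper extractFromLine and the temp file are assumptions for illustration only; the pattern is the one from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileEscapeDemo {
    // One literal backslash followed by exactly two hex digits.
    static final Pattern HEX_ESCAPE = Pattern.compile("\\\\[a-fA-F0-9]{2}");

    // Hypothetical helper: collects every match found in one line of text.
    static List<String> extractFromLine(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX_ESCAPE.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the real input file.
        Path file = Files.createTempFile("test_file", ".txt");
        // "\\" in Java source is ONE backslash in the file on disk.
        Files.write(file, "\\1f\\1e\\1d\\020028".getBytes(StandardCharsets.UTF_8));

        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            System.out.println(extractFromLine(line)); // [\1f, \1e, \1d, \02]
        }
        Files.delete(file);
    }
}
```

The backslashes only need doubling when the text is typed into Java source as a string literal, not when it comes from a file.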

Try adding a . at the end, like:
\\[a-fA-F0-9].

If you don't want to modify the input string, you could try something like:
static public void main(String[] argv) {
String s = "\1f\1e\1d\020028";
Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
Matcher match = regex.matcher(s);
while (match.find()) {
char[] c = match.group().toCharArray();
System.out.println(String.format("\\%d%s",c[0]+0, c[1])) ;
}
}
Yes, it's not perfect, but you get the idea.

Related

How to grab and show multiple lines between two strings (patterns) from a file in Java

I want to grab and show a multi-line string from a file (which has more than 20,000 lines of text in it) between two desired strings (patterns) using Java.
Example: file.txt (has more than 20,000 lines of text)
pattern1
string
that i
want
to grab
pattern2
I want to grab and show the text between these two patterns (pattern1 and pattern2), which in this case is "string \n that i \n want \n to grab".
How can I do that?
I tried BufferedReader, File, String and a few more things, but nothing worked.
Sorry, I'm a noob.
Is your pattern spread over several lines?
One easy solution would be to store the content of your file and then check for your pattern with a regular expression:
try (BufferedReader reader = new BufferedReader(new FileReader(new File("test.txt")))) {
    final StringBuilder contents = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) { // read the file content
        contents.append(line).append('\n'); // keep the line breaks that readLine() strips
    }
    // DOTALL lets '.' match line breaks; the lazy '+?' stops at the first PATTERN2
    Pattern p = Pattern.compile("PATTERN1(.+?)PATTERN2", Pattern.DOTALL); // prepare your regex
    Matcher m = p.matcher(contents.toString());
    while (m.find()) { // for each match
        String b = m.group(1).trim();
        System.out.println(b);
    }
} catch (Exception e) {
    e.printStackTrace();
}
You can use this :
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void textBeetweenTwoPattern(String pattern1, String pattern2, String text){
Pattern p = Pattern.compile(pattern1+"([^<]+)"+pattern2, Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find())
{
System.out.println(m.group(1));
}
}
public static void main(String []args){
textBeetweenTwoPattern("<b>", "</b>", "Test <b>regex</b> <i>Java</i> for \n\n<b>Stackoverflow</b> ");
}
}
It returns :
regex
Stackoverflow
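If the two markers each sit on a line of their own, as in the question's example, a regex is not strictly needed. Here is a rough sketch of a line-by-line scan that avoids holding the whole file in one string. The class name BetweenMarkers and the method linesBetween are hypothetical, and the assumption that a marker occupies a whole line is mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BetweenMarkers {
    // Collects the lines strictly between the first line equal to start
    // and the next line equal to end (surrounding whitespace is ignored).
    static List<String> linesBetween(Iterable<String> lines, String start, String end) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            if (!inside) {
                inside = line.trim().equals(start); // wait for the opening marker
            } else if (line.trim().equals(end)) {
                break;                              // stop at the closing marker
            } else {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the file; for a real file the lines could come
        // from java.nio.file.Files.readAllLines(path).
        List<String> lines = Arrays.asList(
                "header", "pattern1", "string", "that i", "want", "to grab", "pattern2", "footer");
        linesBetween(lines, "pattern1", "pattern2").forEach(System.out::println);
    }
}
```

This stops reading as soon as the closing marker is seen, which may matter for large inputs.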

Java, How can I find a pattern in a File and read the whole line?

I want to find a particular character sequence in a file and read the whole line in which each occurrence appears.
The following code only checks the first line and fetches that (first) line.
How can I fix it?
Scanner scanner = new Scanner(file);
String output = "";
output = output + scanner.findInLine(pattern) + scanner.next();
pattern and file are parameters
UPDATED ANSWER according to the comments on this very answer
What is actually used is Scanner#findWithHorizon, which in turn calls the Pattern#compile method with a set of flags (Pattern#compile(String, int)).
The effect seems to be that the pattern is applied repeatedly to the input, line by line; this assumes, of course, that a pattern never needs to match across multiple lines.
Therefore:
public static final String findInFile(final Path file, final String pattern,
final int flags)
throws IOException
{
final StringBuilder sb = new StringBuilder();
final Pattern p = Pattern.compile(pattern, flags);
String line;
Matcher m;
try (
final BufferedReader br = Files.newBufferedReader(file);
) {
while ((line = br.readLine()) != null) {
m = p.matcher(line);
while (m.find())
sb.append(m.group());
}
}
return sb.toString();
}
For completeness I should add that I have developed some time ago a package which allows a text file of arbitrary length to be read as a CharSequence and which can be used to great effect here: https://github.com/fge/largetext. It would work beautifully here since a Matcher matches against a CharSequence, not a String. But this package needs some love.
One example returning a List of matching strings in a file can be:
private static List<String> findLines(final Path path, final String pattern)
throws IOException
{
final Predicate<String> predicate = Pattern.compile(pattern).asPredicate();
try (
final Stream<String> stream = Files.lines(path);
) {
return stream.filter(predicate).collect(Collectors.toList());
}
}

Java sequentially parse information from file

Let's say I have a file with a structure like this:
Line 0:
354858 Some String That Is Important AA OTHER STUFF SOMESTUFF
THAT SHOULD BE IGNORED
Line 1:
543788 Another String That Is Important AA OTHER STUFF
SOMESTUFF THAT SHOULD BE IGNORED
and so on...
Now I would like to get the information that is marked in my example (the number and the text before AA, shown with a gray background in the original post). The sequence AA is always present (and could be used as a break to skip to the next line), while the information string varies in length.
What would be the best way to parse the information? A buffered reader with if/then/else, or is there some kind of parser that you can tell: read a number of length XYZ, then read everything into a String until you find AA, then skip to the next line?
To tell you which is best for your problem is not possible without more information.
One solution might be
String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));
output
split = [354858, Some String That Is Important]
You can read the file line by line and exclude the part which contains the AA charSequence:
final String charSequence = "AA";
String line;
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")));
try {
while ((line = r.readLine()) != null) {
int pos = line.indexOf(charSequence);
if (pos > 0) {
String myImportantStuff = line.substring(0, pos);
//do something with your useful string
}
}
} finally {
r.close();
}
I would read the file line by line and match each line against a regular expression. I hope my comments in the code below will be detailed enough.
// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");
// Read file line by line
BufferedReader br = new BufferedReader(new FileReader(myFile));
String line;
while((line = br.readLine()) != null) {
// Match line against our pattern
Matcher m = p.matcher(line);
if(m.find()) {
// Line is valid, process it however you want
// m.group(1) contains the number
// m.group(2) contains the text between number and AA
} else {
// Line has invalid format (pattern does not match)
}
}
Explanation of the regular expression (Pattern) I used:
^([0-9]+)\s+(([^A]|A[^A])+)AA
^ matches the start of the line
([0-9]+) matches any integral number
\s+ matches one or more whitespace characters
(([^A]|A[^A])+) matches a run of text in which each step is either a character that is not A, or an A followed by a character that is not A
AA matches the terminating AA
Update as a reply to comment:
If every line has a preceding | character, the expression looks like this:
^\|([0-9]+)\s+(([^A]|A[^A])+)AA
In Java, you need to escape it like this:
"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"
The character | has a special meaning in regular expressions and has to be escaped.
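Here is a small sketch that runs both expressions against the sample lines from the question. The class name ImportantPartDemo and the parse helper are made up for illustration; the two patterns are copied from above. Note that group 2 keeps the space in front of AA, so the sketch trims it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportantPartDemo {
    static final Pattern PLAIN = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");
    static final Pattern PIPED = Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns "number|text", or null if the line does not match.
    static String parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) still ends with the whitespace that precedes AA
        return m.group(1) + "|" + m.group(2).trim();
    }

    public static void main(String[] args) {
        System.out.println(parse(PLAIN, "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF"));
        // 354858|Some String That Is Important
        System.out.println(parse(PIPED, "|543788 Another String That Is Important AA OTHER STUFF"));
        // 543788|Another String That Is Important
        System.out.println(parse(PLAIN, "THAT SHOULD BE IGNORED"));
        // null
    }
}
```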
Here is a solution for you:
public static void main(String[] args) {
InputStream source; //select a text source (should be a FileInputStream)
{
String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
"543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
}
try(BufferedReader stream = new BufferedReader(new InputStreamReader(source))) {
Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
while(true) {
String line = stream.readLine();
if(line == null) {
break;
}
Matcher matcher = pattern.matcher(line);
if(matcher.matches()) {
String someNumber = matcher.group(1);
String someText = matcher.group(2);
//do something with someNumber and someText
} else {
throw new ParseException(line, 0);
}
}
} catch (IOException | ParseException e) {
e.printStackTrace(); // TODO ...
}
}
You could use a regular expression, but if you know every line contains AA and you want the content up to AA, you can simply use substring(int,int) to get the part of the line up to AA:
public List<String> read(Path path) throws IOException {
return Files.lines(path)
.map(this::parseLine)
.collect(Collectors.toList());
}
public String parseLine(String line){
int index = line.indexOf("AA");
return line.substring(0,index);
}
Here's the non-Java 8 version of read:
public List<String> read(Path path) throws IOException {
List<String> content = new ArrayList<>();
try(BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))){
String line;
while((line = reader.readLine()) != null){
content.add(parseLine(line));
}
}
return content;
}
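One caveat with the substring approach: indexOf returns -1 when a line has no AA, and substring(0, -1) then throws a StringIndexOutOfBoundsException. If that can happen in your file, a guarded variant might look like the sketch below. The class name SafeParse and the method parseAll are hypothetical, and skipping such lines (rather than reporting them) is just one possible choice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class SafeParse {
    // Returns the part of the line before "AA", or empty if the marker is missing.
    static Optional<String> parseLine(String line) {
        int index = line.indexOf("AA");
        return index < 0
                ? Optional.empty()
                : Optional.of(line.substring(0, index).trim());
    }

    // Parses every line and silently skips the ones without the marker.
    static List<String> parseAll(List<String> lines) {
        return lines.stream()
                .map(SafeParse::parseLine)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "354858 Some String That Is Important AA OTHER STUFF",
                "THAT SHOULD BE IGNORED");
        System.out.println(parseAll(lines)); // [354858 Some String That Is Important]
    }
}
```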
Use the regex .+?(?=AA), which lazily matches everything up to (but not including) the first AA.

Replace a particular String from a text file

I'm trying to replace the occurrences of a certain String in a given text file. Here's the code I've written:
BufferedReader tempFileReader = new BufferedReader(new InputStreamReader(new FileInputStream(tempFile)));
File tempFileBuiltForUse = new File("C:\\testing\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents ;
while( (lineContents = tempFileReader.readLine()) != null)
{
Pattern pattern = Pattern.compile("/.");
Matcher matcher = pattern.matcher(lineContents);
String lineByLine = null;
while(matcher.find())
{
lineByLine = lineContents.replaceAll(matcher.group(),System.getProperty("line.separator"));
changer.write(lineByLine);
}
}
changer.close();
tempFileReader.close();
Suppose the contents of my tempFile are:
This/DT is/VBZ a/DT sample/NN text/NN ./.
I want the anotherTempFile to contain :
This/DT is/VBZ a/DT sample/NN text/NN .
with a new line.
But I'm not getting the desired output. And I'm not able to see where I'm going wrong. :-(
Kindly help. :-)
A dot means "any character" in regular expressions. Try escaping it:
Pattern pattern = Pattern.compile("\\./\\.");
(You need two backslashes to escape the backslash itself inside the String literal, so that Java knows you want a literal backslash and not an escape sequence such as the newline character \n.)
In a regex, the dot (.) matches any character (except newlines), so it needs to be escaped if you want it to match a literal dot. Also, you appear to be missing the first dot in your regex since you want the pattern to match ./.:
Pattern pattern = Pattern.compile("\\./\\.");
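To make the difference visible, here is a quick sketch that applies the unescaped and the escaped pattern to a shortened version of the sample line, using # as a visible replacement. The class and method names are made up for illustration. Pattern.quote is shown as an alternative way to get a literal match without hand-written backslashes.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // "/." as a regex: a slash followed by ANY character.
    static String unescaped(String s) {
        return s.replaceAll("/.", "#");
    }

    // "\\./\\." as a regex: a literal dot, a slash, a literal dot.
    static String escaped(String s) {
        return s.replaceAll("\\./\\.", "#");
    }

    // Pattern.quote turns the whole string into a literal pattern.
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("./."), "#");
    }

    public static void main(String[] args) {
        String line = "This/DT is/VBZ ./.";
        System.out.println(unescaped(line)); // This#T is#BZ .#
        System.out.println(escaped(line));   // This/DT is/VBZ #
        System.out.println(quoted(line));    // This/DT is/VBZ #
    }
}
```

The unescaped version also eats the first letter of every tag, which explains the unexpected output in the question.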
Your regular expression has a problem. Also, you don't have to use Pattern and Matcher at all. Simply use the replaceAll() method of the String class for the replacement; it is easier. Try the code below:
BufferedReader tempFileReader = new BufferedReader(
new InputStreamReader(new FileInputStream("c:\\test.txt")));
File tempFileBuiltForUse = new File("C:\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents;
while ((lineContents = tempFileReader.readLine()) != null) {
String lineByLine = lineContents.replaceAll("\\./\\.", System.getProperty("line.separator"));
changer.write(lineByLine);
}
changer.close();
tempFileReader.close();
/. is the regular expression for a slash followed by any character.
Change it to "/\\."

url encode matched groups

I've got a regex that's matching a given pattern (obviously, that's what regexes do) and replacing that pattern with an anchor tag that includes a captured group. That part is working lovely.
String substituted = content.asString().replaceAll("\\[{2}((?:.)*?)\\]{2}",
"$1");
What I can't figure out is how to url encode the captured group before using it in the href attribute.
Example inputs
[[a]]
[[a b]]
[[a&b]]
desired outputs
a
a b
a&b
Is there any way to do this? I haven't found anything that looks useful yet, though once I ask I usually find an answer.
Replace all special chars with what you want first,
then match that inside the double [ and replace it in the <a href=..> tag.
That, or extract the url part inside the [ and pass it through a URL encoder before placing it in the <a href=..> tag.
Java seems to offer java.net.URLEncoder by default. So I think getting the url from the pattern, and passing though the encoder, and then placing it in the <a href=..> tag is your best choice.
Sure 'nough, found my answer.
Started with the code from Matcher.appendReplacement
Pure java:
Pattern p = Pattern.compile("\\[{2}((?:.)*?)\\]{2}" );
Matcher m = p.matcher(content.asString());
StringBuffer sb = new StringBuffer();
while (m.find()) {
String one = m.group(1);
try {
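// Assumed anchor markup: URL-encode the captured group for the href.
// quoteReplacement keeps any '$' or '\' in the text from being treated specially.
String link = "<a href=\"" + URLEncoder.encode(one, "UTF-8") + "\">" + one + "</a>";
m.appendReplacement(sb, Matcher.quoteReplacement(link));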
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
m.appendTail(sb);
GWT:
RegExp p = RegExp.compile("\\[{2}((?:.)*?)\\]{2}", "g");
MatchResult m;
StringBuffer sb = new StringBuffer();
int beginIndex = 0;
while ((m = p.exec(content.asString())) != null) {
String one = m.getGroup(1);
int endIndex = m.getIndex();
sb.append(content.asString().substring(beginIndex, endIndex));
sb.append("" + one + "");
beginIndex = p.getLastIndex();
}
sb.append(content.asString().substring(beginIndex));
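For plain Java (not GWT) on a recent JDK, the same idea can be written more compactly. This is only a sketch under a few assumptions: it needs Java 9 or later for Matcher.replaceAll with a function and Java 10 or later for URLEncoder.encode(String, Charset); the class name WikiLinkDemo, the method linkify and the exact anchor markup are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkDemo {
    // [[ ... ]] with the inner text captured lazily as group 1.
    static final Pattern LINK = Pattern.compile("\\[{2}(.*?)\\]{2}");

    // Replaces every [[text]] with an anchor whose href is the URL-encoded text.
    static String linkify(String content) {
        return LINK.matcher(content).replaceAll(m -> {
            String text = m.group(1);
            String href = URLEncoder.encode(text, StandardCharsets.UTF_8);
            // Assumed markup; quoteReplacement keeps '$' and '\' literal.
            return Matcher.quoteReplacement("<a href=\"" + href + "\">" + text + "</a>");
        });
    }

    public static void main(String[] args) {
        System.out.println(linkify("[[a]]"));   // <a href="a">a</a>
        System.out.println(linkify("[[a b]]")); // <a href="a+b">a b</a>
        System.out.println(linkify("[[a&b]]")); // <a href="a%26b">a&b</a>
    }
}
```

The link text itself is inserted unchanged here; in real HTML output it would probably need HTML escaping as well.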
