Java Matcher overreplacing groups

public class HexASCIITest {
    public static void main(String[] args) throws DecoderException, UnsupportedEncodingException {
        String test = "src=\"test/__test/path/path2/AA_5F00_20140915_5F00_15_5F00_11_5F00_55_5F00_image_5F005F00_name.jpg\"";
        Pattern patternImages = Pattern.compile("src=\"[^\"]*?/__test/[^/]*?/[^/]*?/([^\"/]*?)\"");
        Matcher matcherImages = patternImages.matcher(test);
        while(matcherImages.find()) {
            String imageName = matcherImages.group(1);
            Pattern pattern = Pattern.compile("_((?:[01234567890ABCDEF]{4}){1,})_");
            Matcher matcher = pattern.matcher(imageName);
            while(matcher.find()) {
                byte[] bytes = Hex.decodeHex(matcher.group(1).toCharArray());
                String imagePath = new String(bytes, "latin1");
                imagePath = imagePath.replaceAll("\0", "");
                imageName = imageName.replaceFirst("_((?:[01234567890ABCDEF]{4}){1,})_", imagePath.trim());
            }
            System.out.println(imageName);
        }
    }
}
Hi guys, this program of mine is supposed to turn the hex codes in an image name into ASCII, but I seem to have a logic problem. Could anyone assist me?
The initial image name is: AA_5F00_20140915_5F00_15_5F00_11_5F00_55_5F00_image_5F005F00_name.jpg
After all of the replacements: AA_15_11_55__image_5F005F00_name.jpg
This is not how it is supposed to work: the date 20140915 is gone and 5F005F00 is still there. Thank you for your help!

Found it.
The regex should be: Pattern pattern = Pattern.compile("(([0123456789ABCDEF]{4}){1,})");
Then: byte[] bytes = Hex.decodeHex(matcher.group(2).toCharArray());
And finally the replace:
imageName = imageName.replaceFirst(matcher.group(1), imagePath.trim());
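
A note on why the question's code loses the date: the inner matcher keeps scanning the original imageName, while replaceFirst re-scans the already modified string from the start, so the two drift apart and "_20140915_" (eight hex-looking digits between underscores) ends up being replaced. One way to avoid this is to do the whole replacement in a single pass with Matcher.appendReplacement. The sketch below is only an illustration: the class and method names are made up, the hex decoding is done by hand instead of with Commons Codec, and it assumes every "_XXXX..._" run really is an encoded sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexNameDecoder {
    // An underscore, one or more groups of four hex digits, and a closing underscore.
    private static final Pattern HEX_RUN = Pattern.compile("_((?:[0-9A-F]{4})+)_");

    static String decode(String name) {
        Matcher m = HEX_RUN.matcher(name);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hex = m.group(1);
            byte[] bytes = new byte[hex.length() / 2];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            // Same decoding as in the question: Latin-1 bytes with the NUL characters dropped.
            String text = new String(bytes, StandardCharsets.ISO_8859_1).replace("\0", "");
            // quoteReplacement keeps '$' and '\' in the decoded text from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("AA_5F00_20140915_5F00_15_5F00_11_5F00_55_5F00_image_5F005F00_name.jpg"));
    }
}
```

Because each match is replaced exactly where it was found, the matcher and the output can no longer get out of step. A name that legitimately contains something like "_20140915_" with no encoded delimiters around it would still be misread, so the pattern may need tightening for real data.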

Related

How to grab and show multiple lines between two strings (patterns) from a file in Java

I want to grab and show a multi-line string from a file (it has more than 20,000 lines of text) between two desired strings (patterns) using Java.
Example: file.txt (has more than 20,000 lines of text)
pattern1
string
that i
want
to grab
pattern2
I want to grab and show the text between these two patterns (pattern1 and pattern2), which in this case is "string \n that i \n want \n to grab".
How can I do that?
I tried BufferedReader, File, String and a few more things, but nothing worked.
Sorry, I'm a noob.

Is your pattern on several lines?
One easy solution would be to store the content of your file and then check for your pattern with a regular expression:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

try (BufferedReader reader = new BufferedReader(new FileReader("test.txt"))) {
    StringBuilder contents = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) { // read the file content
        contents.append(line).append('\n'); // readLine() strips the line break, so put it back
    }
    // DOTALL lets . match line breaks; the reluctant .+? stops at the first pattern2
    Pattern p = Pattern.compile("pattern1(.+?)pattern2", Pattern.DOTALL);
    Matcher m = p.matcher(contents);
    while (m.find()) { // for each block between the two markers
        String b = m.group(1);
        System.out.println(b);
    }
} catch (IOException e) {
    e.printStackTrace();
}

You can use this :
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class HelloWorld{
    public static void textBeetweenTwoPattern(String pattern1, String pattern2, String text){
        Pattern p = Pattern.compile(pattern1+"([^<]+)"+pattern2, Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        while (m.find())
        {
            System.out.println(m.group(1));
        }
    }

    public static void main(String []args){
        textBeetweenTwoPattern("<b>", "</b>", "Test <b>regex</b> <i>Java</i> for \n\n<b>Stackoverflow</b> ");
    }
}
It returns :
regex
Stackoverflow
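
For a file with tens of thousands of lines, another option is to skip the regex entirely and stream the file line by line, switching a flag on at the start marker and off at the end marker. This is a minimal sketch, not taken from either answer: the class and method names are hypothetical, it assumes each marker sits on a line of its own (matched with contains), and it reads from any Reader so the sample can use a StringReader instead of a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class BetweenMarkers {
    // Collects every line that lies between a start-marker line and the next end-marker line.
    static List<String> linesBetween(Reader source, String startMarker, String endMarker)
            throws IOException {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!inside && line.contains(startMarker)) {
                    inside = true;           // start collecting after this line
                } else if (inside && line.contains(endMarker)) {
                    inside = false;          // stop collecting, marker itself is skipped
                } else if (inside) {
                    result.add(line);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "header\npattern1\nstring\nthat i\nwant\nto grab\npattern2\nfooter\n";
        List<String> lines = linesBetween(new StringReader(sample), "pattern1", "pattern2");
        System.out.println(String.join("\n", lines));
    }
}
```

To read an actual file, pass something like new FileReader("file.txt") as the source. Only the wanted lines are kept in memory, which is the main reason to prefer this over loading the whole file into one string.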

Apply regex on url string

I have a url like this,
http://abc-xyz.com/AppName/service?id=1234&isStudent&stream&shouldUpdateRecord=Y&courseType
I want to apply a regex before making a REST call to a third-party system. The regex should remove all the keys without a value, i.e., from the given URL it should remove "&isStudent", "&stream" and "&courseType", and I should be left with:
http://abc-xyz.com/AppName/service?id=1234&shouldUpdateRecord=Y
Any pointers?

I can't do it in one regex, because the number of key-only parameters is variable. But I can do it with a short program like this
public class Playground {
    public static void main(String[] args) {
        String testInput = "http://abc-xyz.com/AppName/service?id=1234&isStudent&stream&shouldUpdateRecord=Y&courseType";
        String[] tokens = testInput.split("\\?");
        String urlPrefix = tokens[0];
        String paramString = tokens[1];
        String[] params = paramString.split("&");
        StringBuilder sb = new StringBuilder();
        sb.append(urlPrefix + "?");
        String keyValueRegex = "(\\w+)=(\\w+)";
        String amp = ""; // first time special
        for (String param : params) {
            if (param.matches(keyValueRegex)) {
                sb.append(amp + param);
                amp = "&"; // second time and onwards
            }
        }
        System.out.println(sb.toString());
    }
}
The output of this program is this:
http://abc-xyz.com/AppName/service?id=1234&shouldUpdateRecord=Y
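
For URLs shaped like the one in the question, a single replaceAll with a lookahead may also be enough. The following is a sketch under assumptions, not a general URL parser: it assumes the first parameter after "?" has a value, that keys consist of word characters only, and that there is no "#fragment" part. The class and method names are hypothetical.

```java
class KeyOnlyParams {
    // Removes every "&key" that is directly followed by another "&" or by the end of the string.
    // The lookahead (?=&|$) checks the following character without consuming it,
    // so consecutive key-only parameters are each matched in turn.
    static String stripKeyOnlyParams(String url) {
        return url.replaceAll("&\\w+(?=&|$)", "");
    }

    public static void main(String[] args) {
        String url = "http://abc-xyz.com/AppName/service?id=1234&isStudent&stream&shouldUpdateRecord=Y&courseType";
        System.out.println(stripKeyOnlyParams(url));
    }
}
```

A parameter such as "&shouldUpdateRecord=Y" is left alone because the key is followed by "=", so the lookahead fails. If the first parameter can be key-only, the split-and-rebuild program above is the safer choice.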

Java Pattern/ Matcher

This is a sample text: \1f\1e\1d\020028. I cannot modify the input text, I am reading long string of texts from a file.
I want to extract the following: \1f, \1e, \1d, \02
For this, I have written the following regular expression pattern: "\\[a-fA-F0-9]"
I am using the Pattern and Matcher classes, but my matcher is not able to find the pattern using this regular expression. I have tested the regex against the text on some online regex testers and, surprisingly, it works there.
Where am I going wrong?
Original code:
public static void main(String[] args) {
    String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
    inputText = inputText.replace("\\", "\\\\");
    String regex = "\\\\[a-fA-F0-9]{2}";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(inputText);
    while (m.find()) {
        System.out.println(m.group());
    }
}
Output: Nothing is printed

(answer changed after OP added more details)
Your string
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
doesn't actually contain any \ literals, because according to the Java Language Specification, section 3.10.6 (Escape Sequences for Character and String Literals), \xxx is interpreted as the character whose index in the Unicode table is the octal (base 8) value xxx.
Example: \123 = 1*8^2 + 2*8^1 + 3*8^0 = 1*64 + 2*8 + 3*1 = 64+16+3 = 83, which represents the character S.
If the string in your question is written exactly like this in your text file, then in Java source you should write it as
String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";
(with each \ escaped, so that it now represents a literal backslash).
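
The difference can be checked directly by listing the characters each literal produces. This is a small illustrative sketch; the class name and the describe helper are made up for the demonstration.

```java
class OctalEscapeDemo {
    // Lists the characters of a string as U+XXXX code units, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("\1f"));   // octal escape \1 followed by the letter f
        System.out.println(describe("\\1f"));  // a real backslash, then 1 and f
        System.out.println(describe("\123"));  // octal 123 is decimal 83, the letter S
        System.out.println("\\1f".matches("\\\\[a-fA-F0-9]{2}"));
    }
}
```

Only the second literal contains a backslash character (U+005C), which is why the regex can match it and cannot match the first one.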
(older version of my answer)
It is hard to tell what exactly you did wrong without seeing your code. You should be able to find at least \1, \1, \1, \0 since your regex can match one \ and one hexadecimal character placed after it.
Anyway this is how you can find results you mentioned in question:
String text = "\\1f\\1e\\1d\\020028";
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}");
//                                          ^^^-- we want to find two hexadecimal
//                                                characters after \
Matcher m = p.matcher(text);
while (m.find())
    System.out.println(m.group());
Output:
\1f
\1e
\1d
\02

You need to read the file properly and replace '\' characters with '\\'. Assume that there is a file called test_file.txt on your project's classpath with this content:
\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d
Here is the code to read the file and extract values:
public static void main(String[] args) throws IOException, URISyntaxException {
    Test t = new Test();
    t.test();
}

public void test() throws IOException {
    BufferedReader br =
        new BufferedReader(
            new InputStreamReader(
                getClass().getResourceAsStream("/test_file.txt"), "UTF-8"));
    String inputText;
    while ((inputText = br.readLine()) != null) {
        inputText = inputText.replace("\\", "\\\\");
        Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");
        Matcher match = pattern.matcher(inputText);
        while (match.find()) {
            System.out.println(match.group());
        }
    }
}

Try adding a . at the end, like:
\\[a-fA-F0-9].

If you don't want to modify the input string, you could try something like:
static public void main(String[] argv) {
    String s = "\1f\1e\1d\020028";
    Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
    Matcher match = regex.matcher(s);
    while (match.find()) {
        char[] c = match.group().toCharArray();
        System.out.println(String.format("\\%d%s",c[0]+0, c[1])) ;
    }
}
Yes, it's not perfect, but you get the idea.

Can't match SRT subtitle using regex in Java

I am trying to parse an SRT subtitle file with this code:
public class MatchArray {
    public static void main(String args[]) {
        File file = new File(
                "C:/Users/Thiago/workspace/SubRegex/src/Dirty Harry VOST - Clint Eastwood.srt");
        {
            try {
                Scanner in = new Scanner(file);
                try {
                    String contents = in.nextLine();
                    while (in.hasNextLine()) {
                        contents = contents + "\n" + in.nextLine();
                    }
                    String pattern = "([\\d]+)\r([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})\r(([^|\r]+(\r|$))+)";
                    Pattern r = Pattern.compile(pattern);
                    // Now create matcher object.
                    Matcher m = r.matcher(contents);
                    ArrayList<String> start = new ArrayList<String>();
                    while (m.find()) {
                        start.add(m.group(1));
                        start.add(m.group(2));
                        start.add(m.group(3));
                        start.add(m.group(4));
                        start.add(m.group(5));
                        start.add(m.group(6));
                        start.add(m.group(7));
                        System.out.println(start);
                    }
                }
                finally {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
But when I execute it, it doesn't capture any group. When I try to capture only the time with this pattern:
([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})[\\s]*-->[\\s]*([\\d]{2}:[\\d]{2}:[\\d]{2}),([\\d]{3})
it works. So how do I make it capture the entire subtitle?

I don't quite understand your requirement, but I think this may help.
Please try this regex:
(\\d+?)\\s*(\\d+?:\\d+?:\\d+?,\\d+?)\\s+-->\\s+(\\d+?:\\d+?:\\d+?,\\d+?)\\s+(.+)
I tried it on http://www.myregextester.com/index.php and it worked.
I hope this helps.
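
One likely reason the full pattern in the question never matches: Scanner.nextLine() strips the line terminators and the code rejoins the lines with "\n", so the "\r" characters the pattern requires are never present in contents. The sketch below uses \R (any line break, available since Java 8) instead, and relies on the fact that . does not match line terminators by default, so a blank line ends the cue text. The class name, method name, and sample subtitle are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrtSketch {
    private static final Pattern CUE = Pattern.compile(
            "(\\d+)\\R"                                        // cue number on its own line
            + "(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s*-->\\s*"       // start time
            + "(\\d{2}:\\d{2}:\\d{2},\\d{3})\\R"               // end time
            + "((?:.+(?:\\R|$))+)");                           // one or more non-empty text lines

    // Returns one array per cue: {number, start, end, text}.
    static List<String[]> parse(String contents) {
        List<String[]> cues = new ArrayList<>();
        Matcher m = CUE.matcher(contents);
        while (m.find()) {
            cues.add(new String[] { m.group(1), m.group(2), m.group(3), m.group(4).trim() });
        }
        return cues;
    }

    public static void main(String[] args) {
        String sample = "1\n00:00:01,000 --> 00:00:02,500\nHello\nworld\n\n"
                + "2\n00:00:03,000 --> 00:00:04,000\nBye\n";
        for (String[] cue : parse(sample)) {
            System.out.println(cue[0] + " [" + cue[1] + " .. " + cue[2] + "] " + cue[3]);
        }
    }
}
```

Unlike the regex suggested above, this keeps multi-line cue text together in one group, because the last group repeats over every non-empty line until it reaches a blank line or the end of the input.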

java pattern to obtain the pagename with extension

For the URL http://questions/ask/stackoverflow.xhtml, the requirement is to obtain stackoverflow.
What pattern can be used to obtain this page name?
substring could be used, but I read that Pattern/Matcher would perform better.

I would guess that a regular expression solution would be more complicated (and likely slower). Here's how I would do it without them:
public static String getFilename(String s) {
    int lastSlash = s.lastIndexOf("/");
    if (lastSlash < 0) return null;
    int nextDot = s.indexOf(".", lastSlash);
    return s.substring(lastSlash+1, (nextDot<0) ? s.length() : nextDot);
}
String url = "http://questions/ask/stackoverflow.xhtml";
getFilename(url); // => "stackoverflow"
Of course, if the URL doesn't have a filename then you'll get the hostname instead. You're probably best off parsing a URL, extracting the file part of it, and removing the path and extension. Something like this:
public static String getFilename2(String s) {
    URL url = null;
    try {
        url = new URL(s);
    } catch (MalformedURLException mue) { return null; }
    String filePart = url.getFile();
    if (filePart.equals("")) return "";
    File f = new File(filePart);
    String filename = f.getName();
    int lastDot = filename.lastIndexOf(".");
    return (lastDot<0) ? filename : filename.substring(0, lastDot);
}
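
A variant of the same idea can be sketched with java.net.URI, whose getPath() returns the path without the query string, so no File object is needed. This is only an illustration under assumptions: the class and method names are hypothetical, and the input is assumed to be a syntactically valid URI (URI.create throws IllegalArgumentException otherwise).

```java
import java.net.URI;

class PageName {
    // Returns the last path segment without its extension, or "" if there is none.
    static String pageName(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";                       // opaque URIs such as mailto: have no path
        }
        String file = path.substring(path.lastIndexOf('/') + 1);
        int lastDot = file.lastIndexOf('.');
        return (lastDot < 0) ? file : file.substring(0, lastDot);
    }

    public static void main(String[] args) {
        System.out.println(pageName("http://questions/ask/stackoverflow.xhtml"));
    }
}
```

Because only the path is inspected, a URL without a file part yields an empty string rather than the hostname.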

For that particular URL you can use:
String url = "http://questions/ask/stackoverflow.xhtml";
String pname = url.split("/")[4].split("\\.")[0];
For a more general Pattern-based solution (more useful in terms of regex, not performance), consider this:
String url = "http://questions/ask/stackoverflow.xhtml";
Pattern pt = Pattern.compile("/(?![^/]*/)([^.]*)\\.");
Matcher matcher = pt.matcher(url);
if(matcher.find()) {
    System.out.println("Matched: [" + matcher.group(1) + ']');
    // prints Matched: [stackoverflow]
}
