Scanner - parsing code values using delimiter regex

I'm trying to use a Scanner to read in lines of code from a string of the form "p.addPoint(x,y);"
The delimiter regex I'm after is: anything followed by .addPoint( and optional spaces, OR a comma followed by optional spaces.
What I've tried so far isn't working: [[.]+\\.addPoint(&&[\\s]*[,[\\s]*]]
Any ideas what I'm doing wrong?

I tested this in Python, but the regexp should be transferable to Java:
>>> import re
>>> regex = r'(\w+\.addPoint\(\s*|\s*,\s*|\s*\)\s*)'
>>> re.split(regex, 'poly.addPoint(3, 7)')
['', 'poly.addPoint(', '3', ', ', '7', ')', '']
Your regexp seems seriously malformed. Even if it weren't, matching an unbounded number of repetitions of the . wildcard character at the beginning of the string would probably make huge swaths of irrelevant text match.
Edit: I misunderstood the original spec; the current regexp should be correct.
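For reference, here is a rough Java equivalent using Scanner.useDelimiter, as the question asks. It is only a sketch: it assumes both arguments are integer literals, and the names AddPointScanner and parsePoints are made up for illustration. One difference from the Python version: Scanner skips only one delimiter match before each token, so the alternatives are wrapped in a single repeated group. That way one match swallows ");", the newline and the next "p.addPoint(" together instead of producing empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AddPointScanner {
    // One delimiter match consumes any run of whitespace, commas, ")" or ");"
    // and "<identifier>.addPoint(", leaving only the arguments as tokens.
    private static final String DELIMITER = "(?:\\s|,|\\);?|\\w+\\.addPoint\\()+";

    // Hypothetical helper: returns each (x, y) pair as a two-element array.
    public static List<int[]> parsePoints(String code) {
        List<int[]> points = new ArrayList<int[]>();
        Scanner s = new Scanner(code).useDelimiter(DELIMITER);
        while (s.hasNextInt()) {
            int x = s.nextInt();
            int y = s.nextInt();
            points.add(new int[] {x, y});
        }
        s.close();
        return points;
    }

    public static void main(String[] args) {
        String code = "p.addPoint(3, 7);\npoly.addPoint( 10 ,20 );";
        for (int[] pt : parsePoints(code)) {
            System.out.println(pt[0] + "," + pt[1]);
        }
    }
}
```

With the sample input this should print 3,7 and 10,20. Non-integer arguments such as x or y would stop the loop, because hasNextInt() returns false for them.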

Another way:
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class MyPattern {
    private static final Pattern ADD_POINT;
    static {
        // Group 1: the receiver variable, e.g. "p"
        String varName = "[\\p{Alnum}_]++";
        // Groups 2 and 3: one argument each, possibly padded with whitespace
        String argVal = "([\\p{Alnum}_\\p{Space}]++)";
        String regex = "(" + varName + ")\\.addPoint\\(" +
                argVal + "," +
                argVal + "\\);";
        ADD_POINT = Pattern.compile(regex);
        System.out.println("The Pattern is: " + ADD_POINT.pattern());
    }

    public void findIt(String filename) throws FileNotFoundException {
        Scanner s = new Scanner(new FileReader(filename));
        try {
            // A horizon of 0 means the search is not bounded
            while (s.findWithinHorizon(ADD_POINT, 0) != null) {
                final MatchResult m = s.match();
                System.out.println(m.group(0));
                System.out.println(" arg1=" + m.group(2).trim());
                System.out.println(" arg2=" + m.group(3).trim());
            }
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        MyPattern p = new MyPattern();
        final String fname = "addPoint.txt";
        p.findIt(fname);
    }
}

Related

Java: Getting a substring from a string in Text file starting after a special word

I would like to extract the name and age from the text file. Can someone please help?
The text content :
fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk
Name: Max Pain
Age: 99 Years
and they df;ml dk fdj,nbfdlkn ......
Code:
package myclass;
import java.io.*;
public class ReadFromFile2 {
public static void main(String[] args)throws Exception {
File file = new File("C:\\Users\\Ss\\Desktop\\s.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null)
System.out.println(st.substring(st.lastIndexOf("Name:")));
// System.out.println(st);
}
}

Please try the code below.
public static void main(String[] args) throws Exception {
    File file = new File("/root/test.txt");
    // try-with-resources (Java 7+) closes the reader; needs the java.io imports
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String st;
        while ((st = br.readLine()) != null) {
            if (st.contains("Name:") || st.contains("Age:")) {
                // Print what follows the last colon, without the leading space
                System.out.println(st.substring(st.lastIndexOf(':') + 1).trim());
            }
        }
    }
}

You can use the replace method of the String class. Because String is immutable, replace returns a new string, so assign the result to a variable.
while ((st = br.readLine()) != null) {
    if (st.startsWith("Name:")) {
        String name = st.replace("Name:", "").trim();
        // The age is expected on the line right after the name
        st = br.readLine();
        String age = "";
        if (st != null && st.startsWith("Age:")) {
            age = st.replace("Age:", "").trim();
        }
        // now you should have the name and the age in those variables
    }
}

This will do the job:
public static void main(String[] args) {
    String str = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk Name: Max Pain Age: 99 Years and they df;ml dk fdj,nbfdlkn";
    String[] split = str.split("(\\b: \\b)");
    // \b is an anchor, similar to ^ and $: it matches a position where one
    // side is a word character (like \w) and the other side is not
    // (for instance the beginning of the string or a space character).
    // split[1] is "Max Pain Age", split[2] starts with "99 Years"
    System.out.println(split[1].replace("Age", "").trim());
    // remove every non-digit character, leaving only the age
    System.out.println(split[2].replaceAll("\\D+", ""));
}
Output:
Max Pain
99

If both can occur on the same line and you want a pattern that does not overmatch them, you could use a capturing group and a tempered greedy token.
\b(?:Name|Age):\h*((?:.(?!(?:Name|Age):))+)
For example
final String regex = "\\b(?:Name|Age):\\h*((?:.(?!(?:Name|Age):))+)";
final String string = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
+ "Name: Max Pain\n"
+ "Age: 99 Years\n"
+ "and they df;ml dk fdj,nbfdlkn ......\n\n"
+ "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
+ "Name: Max Pain Age: 99 Years\n"
+ "and they df;ml dk fdj,nbfdlkn ......";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
Output
Max Pain
99 Years
Max Pain
99 Years

Java 6 converting utf8 to iso88591 charset and ignoring unmappable characters

I have written the following function, which gets rid of characters in a string that can't be represented in ISO-8859-1:
public static String convert(String str) {
if (str.length()==0) return str;
str = str.replace("–","-");
str = str.replace("“","\"");
str = str.replace("”","\"");
return new String(str.getBytes(),iso88591charset);
}
My problem is that this doesn't behave the way I need.
When it comes across a character that has no representation, that character is converted to multiple bytes. I want it to simply be omitted from the result.
I would also like to avoid having all those replace calls.
I have been researching CharsetEncoder. It has methods like:
CharsetEncoder encoder = iso88591charset.newEncoder();
encoder.onMalformedInput(CodingErrorAction.IGNORE);
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
which seem to be what I want, but I haven't even managed to write a function that mimics what I already have using a CharsetEncoder, let alone set those options.
Also I am restricted to Java 6 :(
Update:
I came up with a nasty solution for this, but there must be a better way to do it:
public static String convert(String str) {
if (str.length()==0) return str;
str = str.replace("–","-");
str = str.replace("“","\"");
str = str.replace("”","\"");
String str2 = "";
for (int c=0;c<str.length();c++) {
String cur = (new Character(str.charAt(c))).toString();
if (cur.equals(new String(cur.getBytes(),iso88591charset))) str2 += cur;
}
return new String(str2.getBytes(),iso88591charset);
}

One possible way could be:
// U+2126 - omega sign
// U+2013 - en dash
// U+201c - left double quotation mark
// U+201d - right double quotation mark
String str = "\u2126\u2013\u201c\u201d";
System.out.println("original = " + str);
str = str.replace("–", "-");
str = str.replace("“", "\"");
str = str.replace("”", "\"");
System.out.println("replaced = " + str);
StringBuilder sb = new StringBuilder();
for (char c : str.toCharArray()) {
if (c <= '\u00ff') {
sb.append(c);
}
}
System.out.println("stripped = " + sb);
output
original = Ω–“”
replaced = Ω-""
stripped = -""
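Since the question mentions CharsetEncoder, here is a sketch of how the IGNORE actions could be wired up while staying within Java 6 (no StandardCharsets, no try-with-resources). It is an alternative to the loop above, not the answerer's code; the class name Latin1Filter is made up. Note that unmappable characters such as the en dash are dropped rather than mapped, so the replace calls are still needed if you want "–" to become "-".

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class Latin1Filter {
    // Charset.forName works on Java 6; StandardCharsets only arrived in Java 7
    private static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    public static String convert(String str) {
        // Encoders keep internal state and are not thread-safe, so use a fresh one
        CharsetEncoder encoder = ISO_8859_1.newEncoder();
        // Silently skip anything ISO-8859-1 cannot represent (and broken surrogates)
        encoder.onMalformedInput(CodingErrorAction.IGNORE);
        encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(str));
            // Each ISO-8859-1 byte decodes back to exactly one char
            return ISO_8859_1.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but encode() declares it
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+2126 (omega sign) and U+2013 (en dash) are outside ISO-8859-1; U+00E9 is inside
        System.out.println(convert("a\u2126b\u2013c\u00e9"));
    }
}
```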

Extracting a value from a file name based on regex in Java

Suppose my file name pattern is something like %#_Report_%$_for_%&.xls, where %# and %$ can contain any characters but %& is a date.
How can I get the actual values of those placeholders from a file name in Java?
For example, if the actual file name is Genr_Report_123_for_20151105.xls, how do I get:
%# value is Genr
%$ value is 123
%& value is 20151105

You can do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Rgx {
private String str1 = "", str2 = "", date = "";
public static void main(String[] args) {
String fileName = "Genr_Report_123_for_20151105.xls";
Rgx rgx = new Rgx();
rgx.extractValues(fileName);
System.out.println(rgx.str1 + " " + rgx.str2 + " " + rgx.date);
}
private void extractValues(String fileName) {
Pattern pat = Pattern.compile("([^_]+)_Report_([^_]+)_for_([\\d]+)\\.xls");
Matcher m = pat.matcher(fileName);
if (m.find()) {
str1 = m.group(1);
str2 = m.group(2);
date = m.group(3);
}
}
}
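If the template itself may change, the regex could instead be built from it. The sketch below is one way to do that, under assumptions not stated in the question: %# and %$ match any non-empty text, %& is an 8-digit yyyyMMdd date, and everything else in the template is literal text (quoted with Pattern.quote so "." and "_" match themselves). FileNameTemplate and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameTemplate {
    // Assumed placeholder rules; adjust to the real naming conventions
    private static final Map<String, String> RULES = new LinkedHashMap<String, String>();
    static {
        RULES.put("%#", "(.+?)");
        RULES.put("%$", "(.+?)");
        RULES.put("%&", "(\\d{8})"); // date such as 20151105
    }

    /** Returns placeholder -> value, or null if fileName does not fit the template. */
    public static Map<String, String> extract(String template, String fileName) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        List<String> placeholders = new ArrayList<String>();
        int i = 0;
        while (i < template.length()) {
            String token = template.substring(i, Math.min(i + 2, template.length()));
            String group = RULES.get(token);
            if (group != null) {
                // Flush pending literal text, quoted so it has no regex meaning
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(group);
                placeholders.add(token);
                i += 2;
            } else {
                literal.append(template.charAt(i));
                i++;
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }

        Matcher m = Pattern.compile(regex.toString()).matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> values = new LinkedHashMap<String, String>();
        for (int g = 0; g < placeholders.size(); g++) {
            values.put(placeholders.get(g), m.group(g + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("%#_Report_%$_for_%&.xls", "Genr_Report_123_for_20151105.xls"));
    }
}
```

For the example file name this should print {%#=Genr, %$=123, %&=20151105}. The lazy (.+?) groups work here because matches() forces the whole name to fit, so each group stops at the next literal part of the template.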

How to match the text file against multiple regex patterns and count the number of occurrences of these patterns?

I want to find and count all occurrences of the words unit, device, method and module in each line of the text file separately. Here is what I've done so far, but I don't know how to use multiple patterns or how to count the occurrences of each word in a line separately. Right now it only counts the occurrences of all the words together for each line. Thank you in advance!
private void countPaterns() throws IOException {
Pattern nom = Pattern.compile("unit|device|method|module|material|process|system");
String str = null;
BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"));
while ((str = r.readLine()) != null) {
Matcher matcher = nom.matcher(str);
int countnomen = 0;
while (matcher.find()) {
countnomen++;
}
//intList.add(countnomen);
System.out.println(countnomen + " davon ist das Wort System");
}
r.close();
//return intList;
}

It's better to use word boundaries and a map that keeps a count for each matched keyword.
Pattern nom = Pattern.compile("\\b(unit|device|method|module|material|process|system)\\b");
String str = null;
BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"));
Map<String, Integer> counts = new HashMap<>();
while ((str = r.readLine()) != null) {
    Matcher matcher = nom.matcher(str);
    while (matcher.find()) {
        // group(1) is whichever keyword matched
        String key = matcher.group(1);
        int c = 0;
        if (counts.containsKey(key))
            c = counts.get(key);
        counts.put(key, c + 1);
    }
}
r.close();
System.out.println(counts);

Here's a Java 9 (and above) solution:
public static void main(String[] args) {
List<String> expressions = List.of("(good)", "(bad)");
String phrase = " good bad bad good good bad bad bad";
for (String regex : expressions) {
Pattern gPattern = Pattern.compile(regex);
Matcher matcher = gPattern.matcher(phrase);
long count = matcher.results().count();
System.out.println("Pattern \"" + regex + "\" appears " + count + (count == 1 ? " time" : " times"));
}
}
Outputs
Pattern "(good)" appears 3 times
Pattern "(bad)" appears 5 times
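The map-based answer above totals the counts over the whole file, while the question asks for counts per line. A minimal sketch of the per-line variant, assuming Java 8 (for Map.merge) and taking a Reader so it can be tried on a string; in real use you would pass new FileReader("D:/test/test1.txt"). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineKeywordCounts {
    private static final Pattern KEYWORDS = Pattern.compile(
            "\\b(unit|device|method|module|material|process|system)\\b");

    /** Returns one keyword -> count map per input line (empty map if none match). */
    public static List<Map<String, Integer>> countPerLine(Reader in) throws IOException {
        List<Map<String, Integer>> result = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                // A fresh map for every line keeps the counts separate
                Map<String, Integer> counts = new TreeMap<>();
                Matcher m = KEYWORDS.matcher(line);
                while (m.find()) {
                    counts.merge(m.group(1), 1, Integer::sum);
                }
                result.add(counts);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String text = "the unit and the device\nsystem, system and method\nnothing here";
        List<Map<String, Integer>> perLine = countPerLine(new StringReader(text));
        for (int i = 0; i < perLine.size(); i++) {
            System.out.println("line " + (i + 1) + ": " + perLine.get(i));
        }
    }
}
```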

JAVA regex failing

I have a string in this format:
;1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;
I have written the following code to extract the string 2011-10-23T16:16:53+0530 from ;1=2011-10-23T16:16:53+0530;
Pattern pattern = Pattern.compile("(;1+)=(\\w+);");
String strFound= "";
Matcher matcher = pattern.matcher(strindData);
while (matcher.find()) {
strFound= matcher.group(2);
}
But it is not working as expected. Can you please give me any hint?

Can you please give me any hint?
Yes. None of -, : or + is matched by \w, which only covers letters, digits and the underscore.
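Following that hint, a minimal fix is to replace \w+ with a class that accepts everything up to the next semicolon. The sketch below keeps the asker's group layout but uses the first match instead of the last; DateAfterKey and findDate are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateAfterKey {
    // [^;]+ means "one or more characters other than ';'", so unlike \w+
    // it also accepts the '-', ':' and '+' that appear in the timestamp
    private static final Pattern KEY_1 = Pattern.compile("(;1)=([^;]+);");

    public static String findDate(String data) {
        Matcher matcher = KEY_1.matcher(data);
        return matcher.find() ? matcher.group(2) : "";
    }

    public static void main(String[] args) {
        String data = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;"
                + "3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;";
        System.out.println(findDate(data)); // 2011-10-23T16:16:53+0530
    }
}
```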

Do you have to use a regex? Why not call String.split() to break up the string on semicolon boundaries, then call it again to split each chunk at the equals sign? At that point you'll have an integer and the date in string form, and from there you can parse the date string.
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
public final class DateScan {
private static final String INPUT = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;";
public static void main(final String... args) {
final SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
final String[] pairs = INPUT.split(";");
for (final String pair : pairs) {
if ("".equals(pair)) {
continue;
}
final String[] integerAndDate = pair.split("=");
final Integer integer = Integer.parseInt(integerAndDate[0]);
final String dateString = integerAndDate[1];
try {
final Date date = parser.parse(dateString);
System.out.println(integer + " -> " + date);
} catch (final ParseException pe) {
System.err.println("bad date: " + dateString + ": " + pe);
}
}
}
}

I've changed the input a bit, but only for presentation purposes.
You can try this:
String input = " ;1=2011-10-23T16:16:53+0530; 2=2011-10-23T16:17:53+0530;3=2011-10-23T16:18:53+0530;4=2011-10-23T16:19:53+0530;";
Pattern p = Pattern.compile("(;\\d+?)?=(.+?);");
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group(2));
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
