Extracting a value from a file name based on regex in Java

Suppose my file name pattern is something like %#_Report_%$_for_%&.xls, where %# and %$ can match any characters and %& is a date.
How can I get the actual values of those placeholders from a file name in Java?
For example, if the actual file name is Genr_Report_123_for_20151105.xls, how do I get:
%# value is Genr
%$ value is 123
%& value is 20151105

You can do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Rgx {
private String str1 = "", str2 = "", date = "";
public static void main(String[] args) {
String fileName = "Genr_Report_123_for_20151105.xls";
Rgx rgx = new Rgx();
rgx.extractValues(fileName);
System.out.println(rgx.str1 + " " + rgx.str2 + " " + rgx.date);
}
private void extractValues(String fileName) {
Pattern pat = Pattern.compile("([^_]+)_Report_([^_]+)_for_([\\d]+)\\.xls");
Matcher m = pat.matcher(fileName);
if (m.find()) {
str1 = m.group(1);
str2 = m.group(2);
date = m.group(3);
}
}
}
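
If the template itself can vary, the pattern can be built from it instead of being hard-coded. The following is only a minimal sketch, assuming that %# and %$ stand for any run of characters other than an underscore and that %& stands for an eight-digit date; the names TemplateMatcher, toPattern and extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateMatcher {
    // Assumption: %# and %$ match anything except '_', %& matches an 8-digit date.
    static Pattern toPattern(String template) {
        StringBuilder regex = new StringBuilder();
        Matcher m = Pattern.compile("%[#$&]").matcher(template);
        int last = 0;
        while (m.find()) {
            // Quote the literal text between placeholders so '.' is not treated as a wildcard.
            if (m.start() > last) {
                regex.append(Pattern.quote(template.substring(last, m.start())));
            }
            regex.append(m.group().equals("%&") ? "(\\d{8})" : "([^_]+)");
            last = m.end();
        }
        if (last < template.length()) {
            regex.append(Pattern.quote(template.substring(last)));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the captured values in template order, or an empty list if the name does not match.
    static List<String> extract(String template, String fileName) {
        Matcher m = toPattern(template).matcher(fileName);
        List<String> values = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [Genr, 123, 20151105]
        System.out.println(extract("%#_Report_%$_for_%&.xls", "Genr_Report_123_for_20151105.xls"));
    }
}
```

Using matches() rather than find() means the whole file name has to fit the template, so a name with extra text before or after it is rejected.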

Related

How to replace special Character with a String replacer

I have the following Code:
@Test
public void testReplace(){
int asciiVal = 233;
String str = new Character((char) asciiVal).toString();
String oldName = "Fr" + str + "d" + str + "ric";
System.out.println(oldName);
String newName = oldName.replace("é", "_");
System.out.println(newName);
Assert.assertNotEquals(oldName, newName); // Its still equal. Howto Replace with a String
String notTheWayILike = oldName.replace((char) 233 + "", "_"); // I don't want to do this.
Assert.assertNotEquals(oldName, notTheWayILike);
}
How can I replace the character using a String?
I need this because the replacements should be definable in a user-friendly way, as Strings or chars.
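
One likely cause, assuming the source file is saved in one encoding (for example UTF-8) but compiled with another, is that the literal "é" no longer represents the single character U+00E9 by the time the compiler reads it. The sketch below avoids the literal by using a Unicode escape, which is still a String; the class and method names are hypothetical.

```java
class AccentReplacer {
    // "\u00e9" is the String form of the character with code 233 and does not depend on the file encoding.
    static final String E_ACUTE = "\u00e9";

    static String replaceEAcute(String input, String replacement) {
        return input.replace(E_ACUTE, replacement);
    }

    public static void main(String[] args) {
        String oldName = "Fr" + (char) 233 + "d" + (char) 233 + "ric";
        System.out.println(replaceEAcute(oldName, "_")); // prints Fr_d_ric
    }
}
```

If the literal "é" has to stay in the source, telling the compiler which encoding the file uses (for javac, the -encoding option) is the other route, provided the editor really saves the file in that encoding.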

Split filename into groups

Input:
"MyPrefix_CH-DE_ProductName.pdf"
Desired output:
["MyPrefix", "CH", "DE", "ProductName"]
CH is a country code, and it should come from a predefined list, eg. ["CH", "IT", "FR", "GB"]
Edit: prefix can contain _ and - as well but not CH or DE.
DE is a language code, and it should come from a predefined list, eg. ["EN", "IT", "FR", "DE"]
How do I do that?
I'm looking for a regex based solution here.

I'll assume that the extension is always pdf
String str = "MyPref_ix__CH-DE_ProductName.pdf";
String regex = "(.*)_(CH|IT|FR|GB)-(EN|IT|FR|DE)_(.*)\\.pdf";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
String[] res = new String[4];
if(matcher.matches()) {
res[0] = matcher.group(1);
res[1] = matcher.group(2);
res[2] = matcher.group(3);
res[3] = matcher.group(4);
}

You can try the following
String input = "MyPrefix_CH-DE_ProductName.pdf";
String[] segments = input.split("_");
String prefix = segments[0];
String countryCode = segments[1].split("-")[0];
String languageCode = segments[1].split("-")[1];
String fileName = segments[2].substring(0, segments[2].length() - 4);
System.out.println("prefix " + prefix);
System.out.println("countryCode " + countryCode);
System.out.println("languageCode " + languageCode);
System.out.println("fileName " + fileName);

This code does the split and creates an object from the result, which is a more object-oriented approach.
package com.local;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
/**
* Hello world!
*
*/
public class App
{
public static void main( String[] args )
{
List<String> countries = Arrays.asList("CH", "IT", "FR", "GB");
List<String> languages = Arrays.asList("EN", "IT", "FR", "DE");
String filename = "MyPrefix_CH-DE_ProductName.pdf";
// Remove the extension (everything after the first '.')
filename = filename.split("\\.")[0];
List<String> result = Arrays.asList(filename.split("[_\\-]"));
FileNameSplitResult resultOne = new FileNameSplitResult(result.get(0), result.get(1), result.get(2), result.get(3));
System.out.println(resultOne);
}
static class FileNameSplitResult{
String prefix;
String country;
String language;
String productName;
public FileNameSplitResult(String prefix, String country, String language, String productName) {
this.prefix = prefix;
this.country = country;
this.language = language;
this.productName = productName;
}
@Override
public String toString() {
return "FileNameSplitResult{" +
"prefix='" + prefix + '\'' +
", country='" + country + '\'' +
", language='" + language + '\'' +
", productName='" + productName + '\'' +
'}';
}
}
}
Result of execution:
FileNameSplitResult{prefix='MyPrefix', country='CH', language='DE', productName='ProductName'}

You can use String.split two times so you can first split by '_' to get the CH-DE string and then split by '-' to get the CountryCode and LanguageCode.
Updated after your edit, with input containing '_' and '-':
The following code scans through the input String to find a country match. I changed the input to "My-Pre_fix_CH-DE_ProductName.pdf"
Check the following code:
public static void main(String[] args) {
String [] countries = {"CH", "IT", "FR", "GB"};
String input = "My-Pre_fix_CH-DE_ProductName.pdf";
//First scan to find country position
int index = -1;
for (int i=0; i<input.length()-4; i++){
for (String country:countries){
String match = "_" + country + "-";
String toMatch = input.substring(i, match.length()+i);
if (match.equals(toMatch)){
//Found index
index=i;
break;
}
}
}
String prefix = input.substring(0,index);
String remaining = input.substring(index+1);//remaining is CH-DE_ProductName.pdf
String [] countryLanguageProductCode = remaining.split("_");
String country = countryLanguageProductCode[0].split("-")[0];
String language = countryLanguageProductCode[0].split("-")[1];
String productName = countryLanguageProductCode[1].split("\\.")[0];
System.out.println("[\"" + prefix +"\", \"" + country + "\", \"" + language +"\", \"" + productName+"\"]");
}
It outputs:
["My-Pre_fix", "CH", "DE", "ProductName"]

You can use the following regex :
^(.*?)_(CH|IT|FR|GB)-(EN|IT|FR|DE)_(.*)$
Java code :
Pattern p = Pattern.compile("^(.*?)_(CH|IT|FR|GB)-(EN|IT|FR|DE)_(.*)$");
Matcher m = p.matcher(input);
if (m.matches()) {
String[] result = { m.group(1), m.group(2), m.group(3), m.group(4) };
}
Note that it would still fail if the prefix could contain a substring like _CH-EN_, and I don't think there's much that can be done about it besides sanitizing the inputs.
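
Since the country and language codes come from predefined lists, the alternations in the regex can be generated from those lists instead of being typed by hand. This is a sketch only, assuming the extension is always .pdf as in the earlier answer; FileNameParser, alternation and parse are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FileNameParser {
    static final List<String> COUNTRIES = Arrays.asList("CH", "IT", "FR", "GB");
    static final List<String> LANGUAGES = Arrays.asList("EN", "IT", "FR", "DE");

    // Joins the codes into an alternation; quoting guards against regex metacharacters in a code.
    static String alternation(List<String> codes) {
        return codes.stream().map(Pattern::quote).collect(Collectors.joining("|"));
    }

    // Lazy prefix, then _COUNTRY-LANGUAGE_, then the product name and the assumed .pdf extension.
    static final Pattern FILE_NAME = Pattern.compile(
            "^(.*?)_(" + alternation(COUNTRIES) + ")-(" + alternation(LANGUAGES) + ")_(.*)\\.pdf$");

    // Returns {prefix, country, language, product}, or null when the name does not match.
    static String[] parse(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // prints [My-Pre_fix, CH, DE, ProductName]
        System.out.println(Arrays.toString(parse("My-Pre_fix_CH-DE_ProductName.pdf")));
    }
}
```

The caveat above still applies: a prefix that itself contains something like _CH-EN_ will be split at that first occurrence.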

One more alternative, which is pretty much the same as @billal GHILAS's and @Aaron's answers but uses named groups. I find named groups handy because anyone who looks at the code later, including me, can see immediately what the regex does.
String str = "My_Prefix_CH-DE_ProductName.pdf";
Pattern filePattern = Pattern.compile("(?<prefix>\\w+)_"
+ "(?<country>CH|IT|FR|GB)-"
+ "(?<language>EN|IT|FR|DE)_"
+ "(?<product>\\w+)\\.");
Matcher file = filePattern.matcher(str);
if (file.find()) { // group() throws IllegalStateException when no match was found
System.out.println("Prefix: " + file.group("prefix"));
System.out.println("Country: " + file.group("country"));
System.out.println("Language: " + file.group("language"));
System.out.println("Product: " + file.group("product"));
}

Separate into column without using split function

I am trying to separate these values into ID, FullName and Phone. I know we can split them using Java's split function, but are there any other ways to separate them? Values:
1 Peater John 2522523254
10 Neal Tom 2522523254
11 Tom Jackson 2522523254
111 Jack Smith 2522523254
12 Brownson Black 2522523254
I tried to use the substring method, but it doesn't work properly.
String id = line.substring(0, 3);
If I do this, it works up to the 4th line, but the others don't work properly.

If it is fixed length you can use String.substring(). But you should also trim() the result before you try to convert it to numeric:
String idTxt=line.substring(0,4);
Long id=Long.parseLong(idTxt.trim());
String name=line.substring(5,25).trim(); // or whatever the size is of name column.

You can use regex and Pattern
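// The name group is lazy so that it does not swallow the trailing phone digits
Pattern pattern = Pattern.compile("^\\s*(\\d+)\\s+(.+?)\\s+(\\d+)\\s*$");
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
// group(0) is the whole match; the captured values start at group(1)
String id = matcher.group(1);
String name = matcher.group(2);
String phone = matcher.group(3);
}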

package Generic;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main
{
public static void main(String[] args)
{
String txt=" 12 Brownson Black 2522523254";
String re1=".*?"; // Non-greedy match on filler
String re2="(\\d+)"; // Integer Number 1
String re3="(\\s+)"; // White Space 1
String re4="((?:[a-z][a-z]+))"; // Word 1
String re5="(\\s+)"; // White Space 2
String re6="((?:[a-z][a-z]+))"; // Word 2
String re7="(\\s+)"; // White Space 3
String re8="(\\d+)"; // Integer Number 2
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
int id = Integer.parseInt(m.group(1));
String name =m.group(3) + " ";
name = name+m.group(5);
long phone = Long.parseLong(m.group(7));
System.out.println(id);
System.out.println(name);
System.out.println(phone);
}
}
}

What about this:
int first_space;
int last_space;
first_space = my_string.indexOf(' ');
last_space = my_string.lastIndexOf(' ');
if ((first_space > 0) && (last_space > first_space))
{
long id;
String full_name;
String phone;
id = Long.parseLong(my_string.substring(0, first_space));
full_name = my_string.substring(first_space + 1, last_space);
phone = my_string.substring(last_space + 1);
}

Use a regexp:
private static final Pattern RE = Pattern.compile(
"^\\s*(\\d+)\\s+(\\S+(?: \\S+)*)\\s+(\\d+)\\s*$");
Matcher matcher = RE.matcher(s);
if (matcher.matches()) {
System.out.println("ID: " + matcher.group(1));
System.out.println("FullName: " + matcher.group(2));
System.out.println("Phone: " + matcher.group(3));
}

You can use a StringTokenizer for this. You won't have to worry about amount of spaces and/or tabs before or after your values, and no need for complex regex expressions:
String line = " 1 Peater John\t2522523254 ";
StringTokenizer st = new StringTokenizer(line, " \t");
String id = "";
String name = "";
String phone = "";
// The first token is your id, you can parse it to an int if you like or need it
if(st.hasMoreTokens()) {
id = st.nextToken();
}
// Loop over the remaining tokens
while(st.hasMoreTokens()) {
String token = st.nextToken();
// As long as there are other tokens, you're processing the name
if(st.hasMoreTokens()) {
if(name.length() > 0) {
name = name + " ";
}
name = name + token;
}
// If there are no more tokens, you've reached the phone number
else {
phone = token;
}
}
System.out.println(id);
System.out.println(name);
System.out.println(phone);

Scanner - parsing code values using delimiter regex

I'm trying to use a Scanner to read in lines of code from a string of the form "p.addPoint(x,y);"
The regex format I'm after is:
*anything*.addPoint(*spaces or nothing* OR ,*spaces or nothing*
What I've tried so far isn't working: [[.]+\\.addPoint(&&[\\s]*[,[\\s]*]]
Any ideas what I'm doing wrong?

I tested this in Python, but the regexp should be transferable to Java:
>>> regex = '(\w+\.addPoint\(\s*|\s*,\s*|\s*\)\s*)'
>>> re.split(regex, 'poly.addPoint(3, 7)')
['', 'poly.addPoint(', '3', ', ', '7', ')', '']
Your regexp seems seriously malformed. Even if it wasn't, matching infinitely many repetitions of the . wildcard character at the beginning of the string would probably result in huge swaths of text matching that aren't actually relevant/desired.
Edit: Misunderstood the original spec., current regexp should be correct.
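
For reference, a Java version of the same idea might look like the sketch below, which hands the delimiter regex to a Scanner as the question intended. It assumes the arguments are integers and that the variable name consists of word characters; AddPointReader and readPoint are made-up names.

```java
import java.util.Scanner;

class AddPointReader {
    // Delimiters: "<name>.addPoint(", a comma, or the closing ")" with an optional ';',
    // each with optional surrounding whitespace.
    static final String DELIMITER =
            "\\s*\\w+\\.addPoint\\(\\s*|\\s*,\\s*|\\s*\\)\\s*;?\\s*";

    // Reads the two integer arguments from a line such as "p.addPoint(3, 7);".
    static int[] readPoint(String line) {
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter(DELIMITER);
            int x = scanner.nextInt();
            int y = scanner.nextInt();
            return new int[] { x, y };
        }
    }

    public static void main(String[] args) {
        int[] point = readPoint("poly.addPoint(3, 7);");
        System.out.println(point[0] + " " + point[1]); // prints 3 7
    }
}
```

Note that parentheses and the dot have to be escaped in the Java string, and that none of the alternatives can match an empty string, which keeps the Scanner from producing empty tokens.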

Another way:
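import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
public class MyPattern {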
private static final Pattern ADD_POINT;
static {
String varName = "[\\p{Alnum}_]++";
String argVal = "([\\p{Alnum}_\\p{Space}]++)";
String regex = "(" + varName + ")\\.addPoint\\(" +
argVal + "," +
argVal + "\\);";
ADD_POINT = Pattern.compile(regex);
System.out.println("The Pattern is: " + ADD_POINT.pattern());
}
public void findIt(String filename) throws FileNotFoundException {
Scanner s = new Scanner(new FileReader(filename));
while (s.findWithinHorizon(ADD_POINT, 0) != null) {
final MatchResult m = s.match();
System.out.println(m.group(0));
System.out.println(" arg1=" + m.group(2).trim());
System.out.println(" arg2=" + m.group(3).trim());
}
}
public static void main(String[] args) throws FileNotFoundException {
MyPattern p = new MyPattern();
final String fname = "addPoint.txt";
p.findIt(fname);
}
}

JAVA regex failing

I have a string in the following format:
;1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;
I have written the following code to extract the string 2011-10-23T16:16:53+0530 from (;1=2011-10-23T16:16:53+0530;)
Pattern pattern = Pattern.compile("(;1+)=(\\w+);");
String strFound= "";
Matcher matcher = pattern.matcher(strindData);
while (matcher.find()) {
strFound= matcher.group(2);
}
But it is not working as expected. Can you please give me any hint?

Can you please give me any hint?
Yes. Neither -, nor :, nor + are part of \w.
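
A minimal way to act on that hint is to replace \w+ with a class that accepts everything up to the next semicolon. The sketch below assumes the values never contain ';' themselves; TimestampExtractor and extract are illustrative names, not part of the question's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampExtractor {
    // [^;]+ accepts '-', ':' and '+', which \w does not; the value ends at the next ';'.
    static final Pattern ENTRY = Pattern.compile("(\\d+)=([^;]+)");

    // Maps each numeric key to its raw timestamp string, in input order.
    static Map<String, String> extract(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(data);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;";
        System.out.println(extract(data).get("1")); // prints 2011-10-23T16:16:53+0530
    }
}
```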

Do you have to use a regex? Why not call String.split() to break up the string on semi-colon boundaries. Then call it again to break up the chunks by the equals sign. At that point you'll have an integer and the date in string form. From there you can parse the date string.
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
public final class DateScan {
private static final String INPUT = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;";
public static void main(final String... args) {
final SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
final String[] pairs = INPUT.split(";");
for (final String pair : pairs) {
if ("".equals(pair)) {
continue;
}
final String[] integerAndDate = pair.split("=");
final Integer integer = Integer.parseInt(integerAndDate[0]);
final String dateString = integerAndDate[1];
try {
final Date date = parser.parse(dateString);
System.out.println(integer + " -> " + date);
} catch (final ParseException pe) {
System.err.println("bad date: " + dateString + ": " + pe);
}
}
}
}

I've changed the input a bit, but only for presentation reasons.
You can try this:
String input = " ;1=2011-10-23T16:16:53+0530; 2=2011-10-23T16:17:53+0530;3=2011-10-23T16:18:53+0530;4=2011-10-23T16:19:53+0530;";
Pattern p = Pattern.compile("(;\\d+?)?=(.+?);");
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group(2));
}
