Split filename into groups - java

Input:
"MyPrefix_CH-DE_ProductName.pdf"
Desired output:
["MyPrefix", "CH", "DE", "ProductName"]
CH is a country code, and it should come from a predefined list, e.g. ["CH", "IT", "FR", "GB"]
DE is a language code, and it should come from a predefined list, e.g. ["EN", "IT", "FR", "DE"]
Edit: the prefix can contain _ and - as well, but not CH or DE.
How do I do that?
I'm looking for a regex based solution here.

I'll assume that the extension is always pdf
String str = "MyPref_ix__CH-DE_ProductName.pdf";
String regex = "(.*)_(CH|IT|FR|GB)-(EN|IT|FR|DE)_(.*)\\.pdf";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
String[] res = new String[4];
if(matcher.matches()) {
res[0] = matcher.group(1);
res[1] = matcher.group(2);
res[2] = matcher.group(3);
res[3] = matcher.group(4);
}

You can try the following (it assumes the prefix itself contains no '_' or '-'):
String input = "MyPrefix_CH-DE_ProductName.pdf";
String[] segments = input.split("_");
String prefix = segments[0];
String countryCode = segments[1].split("-")[0];
String languageCode = segments[1].split("-")[1];
String fileName = segments[2].substring(0, segments[2].length() - 4);
System.out.println("prefix " + prefix);
System.out.println("countryCode " + countryCode);
System.out.println("languageCode " + languageCode);
System.out.println("fileName " + fileName);

This code does the split and creates an object from the result, which is more OOP.
package com.local;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class App
{
public static void main( String[] args )
{
List<String> countries = Arrays.asList("CH", "IT", "FR", "GB");
List<String> languages = Arrays.asList("EN", "IT", "FR", "DE");
String filename = "MyPrefix_CH-DE_ProductName.pdf";
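//Remove the extension (".pdf")
filename = filename.split("\\.")[0];
//Split on '_' and '-'; this assumes the prefix contains neither
List<String> result = Arrays.asList(filename.split("[_\\-]"));
//Check the codes against the predefined lists
if (!countries.contains(result.get(1)) || !languages.contains(result.get(2))) {
throw new IllegalArgumentException("Unknown country or language code: " + filename);
}
FileNameSplitResult resultOne = new FileNameSplitResult(result.get(0), result.get(1), result.get(2), result.get(3));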
System.out.println(resultOne);
}
static class FileNameSplitResult{
String prefix;
String country;
String language;
String productName;
public FileNameSplitResult(String prefix, String country, String language, String productName) {
this.prefix = prefix;
this.country = country;
this.language = language;
this.productName = productName;
}
@Override
public String toString() {
return "FileNameSplitResult{" +
"prefix='" + prefix + '\'' +
", country='" + country + '\'' +
", language='" + language + '\'' +
", productName='" + productName + '\'' +
'}';
}
}
}
Result of execution:
FileNameSplitResult{prefix='MyPrefix', country='CH', language='DE', productName='ProductName'}

You can use String.split two times so you can first split by '_' to get the CH-DE string and then split by '-' to get the CountryCode and LanguageCode.
Updated after your edit, with input containing '_' and '-':
The following code scans through the input String to find the country match. I changed the input to "My-Pre_fix_CH-DE_ProductName.pdf".
Check the following code:
public static void main(String[] args) {
String [] countries = {"CH", "IT", "FR", "GB"};
String input = "My-Pre_fix_CH-DE_ProductName.pdf";
//First scan to find country position
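int index = -1;
//Labeled so that "break search" leaves both loops at the first match
search:
for (int i=0; i<input.length()-4; i++){
for (String country:countries){
String match = "_" + country + "-";
String toMatch = input.substring(i, match.length()+i);
if (match.equals(toMatch)){
//Found index
index=i;
break search;
}
}
}
//Without this check, substring(0, -1) below would throw
if (index == -1) {
throw new IllegalArgumentException("No country code found in: " + input);
}
String prefix = input.substring(0,index);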
String remaining = input.substring(index+1);//remaining is CH-DE_ProductName.pdf
String [] countryLanguageProductCode = remaining.split("_");
String country = countryLanguageProductCode[0].split("-")[0];
String language = countryLanguageProductCode[0].split("-")[1];
String productName = countryLanguageProductCode[1].split("\\.")[0];
System.out.println("[\"" + prefix +"\", \"" + country + "\", \"" + language +"\", \"" + productName+"\"]");
}
It outputs:
["My-Pre_fix", "CH", "DE", "ProductName"]

You can use the following regex:
^(.*?)_(CH|IT|FR|GB)-(EN|IT|FR|DE)_(.*)\.pdf$
Java code:
Pattern p = Pattern.compile("^(.*?)_(CH|IT|FR|GB)-(EN|IT|FR|DE)_(.*)\\.pdf$");
Matcher m = p.matcher(input);
if (m.matches()) {
// groups 1-4: prefix, country, language, product name (without ".pdf")
String[] result = { m.group(1), m.group(2), m.group(3), m.group(4) };
}
Note that it would still fail if the prefix could contain a substring like _CH-EN_, and I don't think there's much that can be done about that besides sanitizing the inputs.
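If it is the prefix that may contain such a token, one option is to make the prefix group greedy, so that the last _XX-YY_ occurrence is taken as country and language (the first answer's (.*) already behaves this way). The sketch below also builds the alternations from the predefined lists instead of hard-coding them. It assumes the product name never contains a _XX-YY_ token and that the extension is always .pdf; the class and method names (FileNameParser, parse) are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FileNameParser {
    // Example lists from the question; real code would load them from configuration.
    static final List<String> COUNTRIES = List.of("CH", "IT", "FR", "GB");
    static final List<String> LANGUAGES = List.of("EN", "IT", "FR", "DE");

    // Builds an alternation such as "CH|IT|FR|GB", quoting each code
    // in case it ever contains regex metacharacters.
    static String alternation(List<String> codes) {
        return codes.stream().map(Pattern::quote).collect(Collectors.joining("|"));
    }

    // Greedy (.*) for the prefix: if several _XX-YY_ tokens occur,
    // the LAST one is used as country/language.
    static final Pattern FILE = Pattern.compile(
            "(.*)_(" + alternation(COUNTRIES) + ")-(" + alternation(LANGUAGES) + ")_(.*)\\.pdf");

    /** Returns {prefix, country, language, product}, or null if the name does not match. */
    static String[] parse(String fileName) {
        Matcher m = FILE.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // prints "My_CH-EN_Prefix, CH, DE, ProductName"
        System.out.println(String.join(", ", parse("My_CH-EN_Prefix_CH-DE_ProductName.pdf")));
    }
}
```

The trade-off is symmetric: a greedy prefix tolerates such tokens in the prefix, a lazy one (.*?) tolerates them in the product name, and neither can handle both.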

One more alternative, which is pretty much the same as @billal GHILAS's and @Aaron's answers but uses named groups. I find them handy: when I, or someone else, look at my code after a while, the group names make it immediately clear what the regex does.
String str = "My_Prefix_CH-DE_ProductName.pdf";
// [\\w-] also allows '-' in the prefix, which the question's edit permits
Pattern filePattern = Pattern.compile("(?<prefix>[\\w-]+)_"
+ "(?<country>CH|IT|FR|GB)-"
+ "(?<language>EN|IT|FR|DE)_"
+ "(?<product>\\w+)\\.");
Matcher file = filePattern.matcher(str);
// group() throws IllegalStateException if nothing matched, so check find() first
if (file.find()) {
System.out.println("Prefix: " + file.group("prefix"));
System.out.println("Country: " + file.group("country"));
System.out.println("Language: " + file.group("language"));
System.out.println("Product: " + file.group("product"));
}

Related

How to replace special Character with a String replacer

I have the following Code:
@Test
public void testReplace(){
int asciiVal = 233;
String str = new Character((char) asciiVal).toString();
String oldName = "Fr" + str + "d" + str + "ric";
System.out.println(oldName);
String newName = oldName.replace("é", "_");
System.out.println(newName);
Assert.assertNotEquals(oldName, newName); // It's still equal. How do I replace it with a String?
String notTheWayILike = oldName.replace((char) 233 + "", "_"); // I don't want to do this.
Assert.assertNotEquals(oldName, notTheWayILike);
}
How can I replace the character with a String ?
I need this because the replacements should be defined in a user-friendly way, as Strings or chars.
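No answer is given for this one here. One common cause, and this is an assumption rather than something the question confirms, is an encoding mismatch: if the source file is saved as UTF-8 but javac reads it with a different encoding (its -encoding option or the platform default), the literal "é" in replace("é", "_") is compiled into different characters than (char) 233, so nothing is replaced. A minimal sketch that sidesteps this by writing the character as the Unicode escape \u00e9; the names ReplaceAccent, E_ACUTE and replaceEAcute are hypothetical:

```java
public class ReplaceAccent {
    // Unicode escape for LATIN SMALL LETTER E WITH ACUTE (code point 233).
    // Escapes are pure ASCII in the source file, so they do not depend on
    // the encoding the file was saved or compiled with.
    static final String E_ACUTE = "\u00e9";

    // Replaces every e-acute in s with the given replacement String.
    static String replaceEAcute(String s, String replacement) {
        return s.replace(E_ACUTE, replacement);
    }

    public static void main(String[] args) {
        String oldName = "Fr" + (char) 233 + "d" + (char) 233 + "ric";
        String newName = replaceEAcute(oldName, "_");
        System.out.println(newName); // Fr_d_ric
    }
}
```

If the literal "é" should stay readable in the source, compiling with javac -encoding UTF-8 (matching how the file is saved) is the other way to make the two characters agree.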

Separate into column without using split function

I am trying to separate these values into ID, FullName and Phone. I know I can do it with Java's split function, but are there any other ways to separate them? Values:
1 Peater John 2522523254
10 Neal Tom 2522523254
11 Tom Jackson 2522523254
111 Jack Smith 2522523254
12 Brownson Black 2522523254
I tried to use the substring method, but it doesn't work properly:
String id = line.substring(0, 3);
If I do this, it works up to the 4th line, but the other lines don't work properly.
If the columns are fixed-width, you can use String.substring(), but you should trim() the result before converting it to a number:
String idTxt=line.substring(0,4);
Long id=Long.parseLong(idTxt.trim());
String name=line.substring(5,25).trim(); // or whatever the size is of name column.
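You can use a regex with Pattern and Matcher:
Pattern pattern = Pattern.compile("(\\d+)\\s+([\\w\\s]+?)\\s+(\\d+)\\s*$");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
// group(0) is the whole match; the captured fields start at group(1)
String id = matcher.group(1);
// lazy +? stops the name before the trailing phone number
String name = matcher.group(2);
String phone = matcher.group(3);
}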
package Generic;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main
{
public static void main(String[] args)
{
String txt=" 12 Brownson Black 2522523254";
String re1=".*?"; // Non-greedy match on filler
String re2="(\\d+)"; // Integer Number 1
String re3="(\\s+)"; // White Space 1
String re4="((?:[a-z][a-z]+))"; // Word 1
String re5="(\\s+)"; // White Space 2
String re6="((?:[a-z][a-z]+))"; // Word 2
String re7="(\\s+)"; // White Space 3
String re8="(\\d+)"; // Integer Number 2
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
int id = Integer.parseInt(m.group(1));
String name =m.group(3) + " ";
name = name+m.group(5);
long phone = Long.parseLong(m.group(7));
System.out.println(id);
System.out.println(name);
System.out.println(phone);
}
}
}
What about this:
int first_space;
int last_space;
first_space = my_string.indexOf(' ');
last_space = my_string.lastIndexOf(' ');
if ((first_space > 0) && (last_space > first_space))
{
long id;
String full_name;
String phone;
id = Long.parseLong(my_string.substring(0, first_space));
full_name = my_string.substring(first_space + 1, last_space);
phone = my_string.substring(last_space + 1);
}
Use a regexp:
private static final Pattern RE = Pattern.compile(
"^\\s*(\\d+)\\s+(\\S+(?: \\S+)*)\\s+(\\d+)\\s*$");
Matcher matcher = RE.matcher(s);
if (matcher.matches()) {
System.out.println("ID: " + matcher.group(1));
System.out.println("FullName: " + matcher.group(2));
System.out.println("Phone: " + matcher.group(3));
}
You can use a StringTokenizer for this. You won't have to worry about the number of spaces and/or tabs before or after your values, and there's no need for complex regular expressions:
String line = " 1 Peater John\t2522523254 ";
StringTokenizer st = new StringTokenizer(line, " \t");
String id = "";
String name = "";
String phone = "";
// The first token is your id, you can parse it to an int if you like or need it
if(st.hasMoreTokens()) {
id = st.nextToken();
}
// Loop over the remaining tokens
while(st.hasMoreTokens()) {
String token = st.nextToken();
// As long as there are other tokens, you're processing the name
if(st.hasMoreTokens()) {
if(name.length() > 0) {
name = name + " ";
}
name = name + token;
}
// If there are no more tokens, you've reached the phone number
else {
phone = token;
}
}
System.out.println(id);
System.out.println(name);
System.out.println(phone);

Extracting a value from a file name base on regex in Java

Suppose my file name pattern is something like %#_Report_%$_for_%&.xls, where %# and %$ can contain any characters but %& is a date.
Now how can I get the actual values of those placeholders from a file name in Java?
For example, if the actual filename is Genr_Report_123_for_20151105.xls, how do I get:
%# value is Genr
%$ value is 123
%& value is 20151105
You can do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Rgx {
private String str1 = "", str2 = "", date = "";
public static void main(String[] args) {
String fileName = "Genr_Report_123_for_20151105.xls";
Rgx rgx = new Rgx();
rgx.extractValues(fileName);
System.out.println(rgx.str1 + " " + rgx.str2 + " " + rgx.date);
}
private void extractValues(String fileName) {
Pattern pat = Pattern.compile("([^_]+)_Report_([^_]+)_for_([\\d]+)\\.xls");
Matcher m = pat.matcher(fileName);
if (m.find()) {
str1 = m.group(1);
str2 = m.group(2);
date = m.group(3);
}
}
}
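If the template itself should drive the match, it can be translated into a regex by quoting the literal parts and turning each placeholder into a capture group. A sketch under stated assumptions: %# and %$ match any non-empty text (lazily), %& is an 8-digit yyyyMMdd date, and the names TemplateMatcher and compileTemplate are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMatcher {
    // Converts e.g. "%#_Report_%$_for_%&.xls" into a Pattern with one group per placeholder.
    static Pattern compileTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        int literalStart = 0;
        for (int i = 0; i < template.length() - 1; i++) {
            if (template.charAt(i) != '%') {
                continue;
            }
            char kind = template.charAt(i + 1);
            String group;
            if (kind == '#' || kind == '$') {
                group = "(.+?)";       // assumed: any non-empty text
            } else if (kind == '&') {
                group = "(\\d{8})";    // assumed: date as yyyyMMdd
            } else {
                continue;              // not a placeholder: keep as literal text
            }
            // Quote the literal text before the placeholder so '.', '_' etc. match themselves
            regex.append(Pattern.quote(template.substring(literalStart, i)));
            regex.append(group);
            literalStart = i + 2;
            i++; // skip the placeholder character
        }
        regex.append(Pattern.quote(template.substring(literalStart)));
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = compileTemplate("%#_Report_%$_for_%&.xls");
        Matcher m = p.matcher("Genr_Report_123_for_20151105.xls");
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3)); // Genr 123 20151105
        }
    }
}
```

Because the groups are lazy and matches() must consume the whole name, an underscore inside a value (say "Gen_r") still works as long as it does not form the literal "_Report_" or "_for_".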

String replace not working, or am I dumb?

I have this:
for (String[] aZkratkyArray1 : zkratkyArray) {
String oldString = " " + aZkratkyArray1[0] + " ";
String firstString = aZkratkyArray1[0] + " ";
String newString = " " + aZkratkyArray1[1] + " ";
System.out.println(newString);
System.out.println(fileContentsSingle);
fileContentsSingle = fileContentsSingle.replaceAll(oldString, newString);
if (fileContentsSingle.startsWith(firstString)) {
fileContentsSingle = aZkratkyArray1[1] + " " + fileContentsSingle.substring(firstString.length(),fileContentsSingle.length());
}
}
fileContentsSingle is just a regular string; zkratkyArray is an array of abbreviations, e.g.:
ht, hello there
wru, who are you
So when fileContentsSingle = ht I am robot
it should end up : hello there I am robot
or when fileContentsSingle = I am robot hru
it should end up : I am robot who are you
But when I print fileContentsSingle with System.out.println during or after this loop, the string is never changed.
I tried both replace and replaceAll, I tried probably everything I could think of.
Where is the mistake?
EDIT:
This is how I load the array:
String[][] zkratkyArray;
try {
LineNumberReader lineNumberReader = new LineNumberReader(new FileReader("zkratky.csv"));
lineNumberReader.skip(Long.MAX_VALUE);
int lines = lineNumberReader.getLineNumber();
lineNumberReader.close();
FileReader fileReader = new FileReader("zkratky.csv");
BufferedReader reader = new BufferedReader(fileReader);
zkratkyArray = new String[lines + 1][2];
String line;
int row = 0;
while ((line = reader.readLine()) != null) {
String[] array = line.split(",");
for (int i = 0; i < array.length; i++) {
zkratkyArray[row][i] = array[i];
}
row++;
}
reader.close();
fileReader.close();
} catch (FileNotFoundException e) {
System.out.println("Soubor se zkratkami nenalezen."); // "File with abbreviations not found."
zkratkyArray = new String[0][0];
}
Your code will work correctly for "ht I am robot". If you print fileContentsSingle after your for loop, it will print what you expect it to print:
final String[][] zkratkyArray = new String[2][];
zkratkyArray[0] = new String[] { "ht", "hello there" };
zkratkyArray[1] = new String[] { "wru", "who are you" };
String fileContentsSingle = "ht I am robot";
for (String[] aZkratkyArray1 : zkratkyArray) {
String oldString = " " + aZkratkyArray1[0] + " ";
String firstString = aZkratkyArray1[0] + " ";
String newString = " " + aZkratkyArray1[1] + " ";
fileContentsSingle = fileContentsSingle.replaceAll(oldString, newString);
if (fileContentsSingle.startsWith(firstString)) {
fileContentsSingle = aZkratkyArray1[1] + " "
+ fileContentsSingle.substring(firstString.length(), fileContentsSingle.length());
}
}
System.out.println(fileContentsSingle); // prints "hello there I am robot"
Concerning "I am robot hru", it will not work because "hru" is at the end of the String, and not followed by a space, and the String you are replacing is " hru " (with spaces before and after).
As you don't use regexps, you don't need replaceAll(), and you can use replace() instead.
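To make the difference concrete, a small sketch (the sample strings are made up for illustration) of how the two methods treat the same arguments:

```java
import java.util.regex.Matcher;

public class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        String s = "a.b";
        // replace() treats its arguments as plain text: only the literal '.' is replaced
        System.out.println(s.replace(".", "_"));    // a_b
        // replaceAll() compiles its first argument as a regex, and '.' matches any character
        System.out.println(s.replaceAll(".", "_")); // ___
        // In replaceAll(), '$' in the replacement starts a group reference,
        // so literal text containing '$' has to go through Matcher.quoteReplacement()
        System.out.println("price".replaceAll("price", Matcher.quoteReplacement("$5"))); // $5
    }
}
```

Both methods return a new String and leave the original untouched, so the result has to be assigned back, as the loop above does.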
Using regexps, you can do a more generic solution working everywhere in the line:
final String[][] zkratkyArray = new String[2][];
zkratkyArray[0] = new String[] { "ht", "hello there" };
zkratkyArray[1] = new String[] { "wru", "who are you" };
String fileContentsSingle = "ht I am robot wru";
for (String[] aZkratkyArray1 : zkratkyArray) {
fileContentsSingle = fileContentsSingle.replaceAll("\\b" + Pattern.quote(aZkratkyArray1[0]) + "\\b",
Matcher.quoteReplacement(aZkratkyArray1[1]));
}
System.out.println(fileContentsSingle); // hello there I am robot who are you
I don't think you are using any regex here. You are just replacing one substring with another.
Just use the other version, which does not use regex (and assign the result, because Strings are immutable):
fileContentsSingle = fileContentsSingle.replace(oldString, newString);
In the end, I found out that I had BOMs (byte order marks) in the input.csv file.
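A UTF-8 BOM at the start of a file is decoded by Java as an invisible U+FEFF character at the start of the first line, so any comparison or replace against that line or its first field silently fails. A sketch of one way to guard against it when loading the abbreviations; the names AbbreviationLoader, stripBom and load are hypothetical, and it assumes one abbreviation,expansion pair per line:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class AbbreviationLoader {
    // U+FEFF is the byte order mark. Java's UTF-8 decoder does not remove it,
    // so it survives as the first char of the first line read.
    static String stripBom(String line) {
        return line.startsWith("\uFEFF") ? line.substring(1) : line;
    }

    // Reads "abbreviation,expansion" lines into a map, dropping a leading BOM.
    static Map<String, String> load(Reader in) {
        Map<String, String> map = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                if (firstLine) {
                    line = stripBom(line);
                    firstLine = false;
                }
                // Limit 2: everything after the first comma is the expansion
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    map.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        // Simulated CSV content that starts with a BOM. For a real file, a reader such as
        // Files.newBufferedReader(Paths.get("zkratky.csv"), StandardCharsets.UTF_8) could be passed.
        Map<String, String> abbreviations = load(new StringReader("\uFEFFht,hello there\nwru,who are you"));
        System.out.println(abbreviations.get("ht")); // hello there
    }
}
```

Passing an explicit charset also avoids depending on the platform default, which FileReader uses on older JDKs.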

Split mathematical string in Java

I have this string: "23+43*435/675-23". How can I split it? The result I want is:
String 1st=23
String 2nd=435
String 3rd=675
String 4th=23
I already tried this:
String s = "hello+pLus-minuss*multi/divide";
String[] split = s.split("\\+");
String[] split1 = s.split("\\-");
String[] split2 = s.split("\\*");
String[] split3 = s.split("\\/");
String plus = split[1];
String minus = split1[1];
String multi = split2[1];
String div = split3[1];
System.out.println(plus+"\n"+minus+"\n"+multi+"\n"+div+"\n");
But it gives me this result:
pLus-minuss*multi/divide
minuss*multi/divide
multi/divide
divide
But I require result in this form
pLus
minuss
multi
divide
Try this:
public static void main(String[] args) {
String s ="23+43*435/675-23";
String[] ss = s.split("[-+*/]");
for(String str: ss)
System.out.println(str);
}
Output:
23
43
435
675
23
I don't know why you want to store them in variables and then print them. Anyway, try the code below:
public static void main(String[] args) {
String s = "hello+pLus-minuss*multi/divide";
String[] ss = s.split("[-+*/]");
String first =ss[1];
String second =ss[2];
String third =ss[3];
String forth =ss[4];
System.out.println(first+"\n"+second+"\n"+third+"\n"+forth+"\n");
}
Output:
pLus
minuss
multi
divide
Try this out:
String data = "23+43*435/675-23";
Pattern pattern = Pattern.compile("[^\\+\\*\\/\\-]+");
Matcher matcher = pattern.matcher(data);
List<String> list = new ArrayList<String>();
while (matcher.find()) {
list.add(matcher.group());
}
for (int index = 0; index < list.size(); index++) {
System.out.println(index + " : " + list.get(index));
}
Output:
0 : 23
1 : 43
2 : 435
3 : 675
4 : 23
I think it is only an indexing issue: take index 0 of each split result as the value, and split index 1 (the remainder) further.
String[] split = s.split("\\+");
String[] split1 = split[1].split("\\-");
String[] split2 = split1[1].split("\\*");
String[] split3 = split2[1].split("\\/");
String hello= split[0];//split[0]=hello, split[1]=pLus-minuss*multi/divide
String plus= split1[0];//split1[0]=pLus, split1[1]=minuss*multi/divide
String minus= split2[0];//split2[0]=minuss, split2[1]=multi/divide
String multi= split3[0];//split3[0]=multi, split3[1]=divide
String div= split3[1];
If the order of operators matters, change your code to this:
String s = "hello+pLus-minuss*multi/divide";
String[] split = s.split("\\+");
String[] split1 = split[1].split("\\-");
String[] split2 = split1[1].split("\\*");
String[] split3 = split2[1].split("\\/");
String plus = split1[0];
String minus = split2[0];
String multi = split3[0];
String div = split3[1];
System.out.println(plus + "\n" + minus + "\n" + multi + "\n" + div + "\n");
Otherwise, to split on any operator and store the parts in variables, do this:
public static void main(String[] args) {
String s = "hello+pLus-minuss*multi/divide";
String[] ss = s.split("[-+*/]");
String plus = ss[1];
String minus = ss[2];
String multi = ss[3];
String div = ss[4];
System.out.println(plus + "\n" + minus + "\n" + multi + "\n" + div + "\n");
}
