I have an input file with the following format:
Ontario:Brampton:43° 41' N:79° 45' W
Ontario:Toronto:43° 39' N:79° 23' W
Quebec:Montreal:45° 30' N:73° 31' W
...
I have a class named City where the values will go.
example:
Province: Ontario
City: Brampton
LatDegrees: 43
LatMinutes: 41
LatDirection: N
LongDegrees: 79 .... etc
I have already completed a method that parses this correctly, but I'm trying to learn whether this can be done better in Java 8 using streams and lambdas.
If I start with the following:
Files.lines(Paths.get(inputFile))
.map(line -> line.split("\\b+")) //this delimits everything
//.filter(x -> x.startsWith(":"))
.flatMap(Arrays::stream)
.forEach(System.out::println);
Can someone please help me reproduce the following?
private void parseLine(String data) {
int counter1 = 1; //1-2 province or city
int counter2 = 1; //1-2 LatitudeDirection,LongitudeDirection
int counter3 = 1; //1-4 LatitudeDegrees,LatitudeMinutes,LongitudeDegrees,LongitudeMinutes
City city = new City(); //create City object
//String read = Arrays.toString(data); //convert array element to String
String[] splited = data.split(":"); //set delimiter
for (String part : splited) {
//System.out.println(part);
char firstChar = part.charAt(0);
if(Character.isDigit(firstChar)){ //if the first char is a digit, then this part needs to be split again
String[] splited2 = part.split(" "); //split second time with space delimiter
for (String part2: splited2){
firstChar = part2.charAt(0);
if (Character.isDigit(firstChar)){ //if the first char is a digit, then needs trimming
String parseDigits = part2.substring(0, part2.length()-1); //trim trailing degrees or radians character
switch(counter2++){
case 1:
city.setLatitudeDegrees(Integer.parseInt(parseDigits));
//System.out.println("LatitudeDegrees: " + city.getLatitudeDegrees());
break;
case 2:
city.setLatitudeMinutes(Integer.parseInt(parseDigits));
//System.out.println("LatitudeMinutes: " + city.getLatitudeMinutes());
break;
case 3:
city.setLongitudeDegrees(Integer.parseInt(parseDigits));
//System.out.println("LongitudeDegrees: " + city.getLongitudeDegrees());
break;
case 4:
city.setLongitudeMinutes(Integer.parseInt(parseDigits));
//System.out.println("LongitudeMinutes: " + city.getLongitudeMinutes());
counter2 = 1; //reset counter2
break;
}
}else{
if(counter3 == 1){
city.setLatitudeDirection(part2.charAt(0));
//System.out.println("LatitudeDirection: " + city.getLatitudeDirection());
counter3++; //increment counter3 to use longitude next
}else{
city.setLongitudeDirection(part2.charAt(0));
//System.out.println("LongitudeDirection: " + city.getLongitudeDirection());
counter3 = 1; //reset counter 3
//System.out.println("Number of cities: " + cities.size());
cities.add(city);
}
}
}
}else{
if(counter1 == 1){
city.setProvince(part);
//System.out.println("\nProvince: " + city.getProvince());
counter1++;
}else if(counter1 == 2){
city.setCity(part);
//System.out.println("City: " + city.getCity());
counter1 = 1; //reset counter1
}
}
}
}
There's probably a better solution to my parseLine() method no doubt, but I would really like to condense that as outlined above.
Thanks !!
Let’s start with some general notes.
Your sequence .map(line -> line.split("\\b+")).flatMap(Arrays::stream) isn’t recommended. These two steps will first create an array before creating another stream wrapping that array. You can skip the array step by using splitAsStream though this requires you to deal with Pattern explicitly instead of hiding it within String.split:
.flatMap(Pattern.compile("\\b+")::splitAsStream)
but note that in this case, splitting into words doesn’t really pay off.
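As a minimal sketch of the splitAsStream variant, here it is applied to the colon delimiter of the input file rather than to word boundaries. The class name SplitAsStreamDemo and the method fields are made up for illustration; only Pattern.splitAsStream comes from the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitAsStreamDemo {
    // Compile the delimiter once; a Pattern can be reused for every line.
    private static final Pattern COLON = Pattern.compile(":");

    // Flattens all lines into one list of fields without creating
    // an intermediate String[] per line.
    static List<String> fields(List<String> lines) {
        return lines.stream()
                .flatMap(COLON::splitAsStream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fields(Arrays.asList("Ontario:Toronto", "Quebec:Montreal")));
    }
}
```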
If you want to keep your original parseLine method, you can simply do
Files.lines(Paths.get(inputFile))
.forEach(this::parseLine);
and you’re done.
But seriously, that is not a real solution. To do pattern matching, you should use a library designed for pattern matching, e.g. the regex package. You are already using it when you split via split("\\b+"), but that barely scratches the surface of what it can do for you.
Let's define the pattern:
(…) forms a group that allows capturing the matching part so we can extract it for our result
[^:]* specifies a token consisting of arbitrary characters except the colon ([^:]) of arbitrary length (*)
\d+ defines a number (d = numeric digit, + = one or more)
[NS] and [WE] match a single character being either N or S, or either W or E, respectively
so the entire pattern you are looking for is
([^:]*):([^:]*):(\d+)° (\d+)' ([NS]):(\d+)° (\d+)' ([WE])
and the entire parse routine will be:
static Pattern CITY_PATTERN=Pattern.compile(
"([^:]*):([^:]*):(\\d+)° (\\d+)' ([NS]):(\\d+)° (\\d+)' ([WE])");
static City parseCity(String line) {
Matcher matcher = CITY_PATTERN.matcher(line);
if(!matcher.matches())
throw new IllegalArgumentException(line+" doesn't match "+CITY_PATTERN);
City city=new City();
city.setProvince(matcher.group(1));
city.setCity(matcher.group(2));
city.setLatitudeDegrees(Integer.parseInt(matcher.group(3)));
city.setLatitudeMinutes(Integer.parseInt(matcher.group(4)));
city.setLatitudeDirection(line.charAt(matcher.start(5)));
city.setLongitudeDegrees(Integer.parseInt(matcher.group(6)));
city.setLongitudeMinutes(Integer.parseInt(matcher.group(7)));
city.setLongitudeDirection(line.charAt(matcher.start(8)));
return city;
}
and I really hope you won't call your hard-to-read method “condensed” anymore…
Using the routine above, a clean Stream-based processing solution would look like
List<City> cities = Files.lines(Paths.get(inputFile))
.map(ContainingClass::parseCity).collect(Collectors.toList());
to collect a file into a new list of cities.
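One detail worth adding: the stream returned by Files.lines keeps the file open until the stream is closed, so it is usually wrapped in try-with-resources. The following is a sketch only; LineFileReader and readAll are hypothetical names, and the parser argument stands in for a method such as parseCity above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LineFileReader {
    // Reads every line of the file, converts it with the given parser and
    // closes the underlying file when done, even if the parser throws.
    static <T> List<T> readAll(Path file, Function<String, T> parser) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(parser).collect(Collectors.toList());
        }
    }
}
```

With the routine above, the call would then look something like LineFileReader.readAll(Paths.get(inputFile), ContainingClass::parseCity).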
Currently I am having a hard time trying to figure out if there is a better way to refactor the following code.
Given the following:
String detail = "POTATORANDOMFOOD";
Let's say I want to assign different parts of detail to variables; the end result would look something like this.
String title = detail.substring(0, 6); // POTATO
String label = detail.substring(6, 12); // RANDOM
String tag = detail.substring(12, 16); // FOOD
Now let's say the length of detail keeps changing: sometimes it only contains "POTATORANDOM" and no "FOOD", and sometimes it contains even more characters, such as "POTATORANDOMFOODTODAY", so another variable would be used.
String title = detail.substring(0, 6); // POTATO
String label = detail.substring(6, 12); // RANDOM
String tag = detail.substring(12, 16); // FOOD
...
String etc = detail.substring(30, 40); // etc value from detail string
The issue with this is that, since the string is sometimes shorter or longer, we would run into a StringIndexOutOfBoundsException, which is not good.
So currently I have a naive way to handle this:
if (detail != null && !detail.isEmpty()) {
if (detail.length() >= 6) {
title = detail.substring(0, 6);
if (detail.length() >= 12) {
label = detail.substring(6, 12);
if (detail.length() >= 16) {
tag = detail.substring(12, 16);
.
.
.
}
}
}
}
This can get really messy, especially if the string were to grow even more.
So my question is: what would be a good design pattern for this type of problem? I have tried the chain of responsibility pattern, but the issue with it is that it only returns a single value, while I am trying to return multiple ones if possible. This way I can assign multiple variables depending on the length of the string.
Any help or hints are greatly appreciated!
Edited:
The order and length are always the same. So title will always be first and it will always contain 6 characters. label will always be second and it will always contain 6 characters. tag will always be third and it will always contain 4 characters, etc.
If I were you, I would do the following:
Define a class to hold a Word definition
public class Word {
private final String name;
private final int startIndex;
private final int endIndex;
public Word(String name, int startIndex, int endIndex) {
this.name = name;
this.startIndex = startIndex;
this.endIndex = endIndex;
}
public String getName() { return name; }
public int getStartIndex() { return startIndex; }
public int getEndIndex() { return endIndex; }
}
Create a static list which holds all the possible words
public static final List<Word> WORDS = List.of(
new Word("title", 0, 6),
new Word("label", 6, 12),
new Word("tag", 12, 16),
...
);
Create a function that parses the String detail by walking this list until the string is exhausted
... and of course storing the elements into a Map<String, String> so that you can access them later.
public Map<String, String> parseDetail(String detail) {
    Map<String, String> receivedWords = new LinkedHashMap<>(); //<-- map respecting insertion order
    if (detail == null || detail.isEmpty()) {
        return receivedWords;
    }
    for (Word word : WORDS) {
        if (word.getEndIndex() > detail.length()) {
            break; //<-- exit the loop when the string is too short to contain this word
        }
        receivedWords.put(word.getName(), detail.substring(word.getStartIndex(), word.getEndIndex())); //<-- store the current word
    }
    return receivedWords;
}
To sum up:
Map<String, String> receivedWords = parseDetail(detail);
receivedWords.forEach((k, v) -> {
System.out.println("Key: " + k + ", value: " + v);
});
Output:
Key: title, value: POTATO
Key: label, value: RANDOM
Key: tag, value: FOOD
...
Tip 1: The input you receive looks pretty weird. I understand that you cannot change it but I would try to negotiate with the caller (if possible) a better way to send you their input (ideally a structured object, if not possible at least a string with some separator so that you can simply split by that character).
Tip 2: I have defined the list of words statically in the code. But I would instead define an external file (e.g. a Json file, or an Xml, or even a simple text file) that you parse dynamically to create the list. That will allow someone else to configure this file with the words/start index/end index without you having to do it in the code each time there is a change.
You could simply check the length of the total string to see if it has the RANDOM and the FOOD attributes before using substring()
String title = "", label = "", tag = "";
if (detail.length() >= 6)
title = detail.substring(0, 6);
if (detail.length() >= 12)
label = detail.substring(6, 12);
if (detail.length() >= 16)
tag = detail.substring(12,16);
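The repeated length checks can be folded into one small helper. This is only a sketch: SafeSlice and slice are hypothetical names, and returning an empty string for a missing field is an assumption about what the caller wants.

```java
class SafeSlice {
    // Returns detail[start, end), or "" when the string is null or too
    // short to contain the whole field.
    static String slice(String detail, int start, int end) {
        if (detail == null || detail.length() < end) {
            return "";
        }
        return detail.substring(start, end);
    }

    public static void main(String[] args) {
        String detail = "POTATORANDOM";
        String title = slice(detail, 0, 6);
        String label = slice(detail, 6, 12);
        String tag = slice(detail, 12, 16); // string is too short, so tag stays empty
        System.out.println(title + " / " + label + " / " + tag);
    }
}
```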
I would suggest a regex approach:
public static void main(String[] args) {
String detail = "POTATORANDOMFOODTODAY";
Pattern p = Pattern.compile("(.{0,6})(.{0,6})(.{0,4})(.{0,5})");
Matcher m = p.matcher(detail);
m.find();
String title = m.group(1);
String label = m.group(2);
String tag = m.group(3);
String day = m.group(4);
System.out.println("title: " + title + ", label: " + label + ", tag: " + tag + ", day: " + day);
}
//output: title: POTATO, label: RANDOM, tag: FOOD, day: TODAY
If you have a lot of groups, I would suggest using named capturing groups. The approach above can be particularly difficult to maintain, as adding or removing a group in the middle of the regex shifts the numbers of the groups after it, which are accessed via Matcher#group(int groupNumber). Using named capturing groups:
public static void main(String[] args) {
String detail = "POTATORANDOMFOODTODAY";
Pattern p = Pattern.compile("(?<title>.{0,6})(?<label>.{0,6})(?<tag>.{0,4})(?<day>.{0,5})");
Matcher m = p.matcher(detail);
m.find();
String title = m.group("title");
String label = m.group("label");
String tag = m.group("tag");
String day = m.group("day");
System.out.println("title: " + title + ", label: " + label + ", tag: " + tag + ", day: " + day);
}
//output: title: POTATO, label: RANDOM, tag: FOOD, day: TODAY
If the string is dynamic, it can contain essentially anything, and since there may be no whitespace in it, the only way to know what a specific word (substring) might be is to check the string against a word list. You quickly come to realize how pivotal even a single whitespace (or separator character) can be within a string. Using the String#substring() method only works if you already know what all the words within the detail string happen to be.
The simple solution would be to set acceptable rules as to how a specific string should be received. After all, why would you want to accept a string that contains multiple words without a separator character of some type to begin with? If the string has whitespace in it to separate the words, a mere:
String[] words = string.split("\\s+");
line of code would do the trick. Bottom line: get rid of that nonsense of accepting strings containing multiple words with no separation mechanism, even if that mechanism is just the underscore ( _ ) character (or some other character). Well... if you can.
I suppose sometimes we just can't modify how we're dealt things (something like taxes), and how we receive specific strings is simply out of our control. If this is the case, then one way to deal with this dilemma is to work against an established word list. This word list can range in size from a few words to hundreds of thousands of words. The situation you need to deal with will determine the word list size. If it is small enough, the word list can be held in a String array or a collection such as an ArrayList. If it is really large, however, the word list would most likely be kept in a text file. The word list I most commonly use contains well over 370,000 individual words.
Here is an example of using a small word list contained within a List:
String detail = "POTATORANDOMFOODTODAY";
List<String> wordList = Arrays.asList(new String[] {
"pumpkin", "carrot", "potato", "tomato", "lettuce", "radish", "bean",
"pea", "food", "random", "today", "yesterday", "tomorrow",
});
// See if the detail string 'contains' any word-list words...
List<String> found = new ArrayList<>();
for (int i = 0; i < wordList.size(); i++) {
String word = wordList.get(i);
if (detail.toLowerCase().contains(word.toLowerCase())) {
found.add(word.toUpperCase());
}
}
/* Ensure the words within the list are in proper order.
That is, the same order as they are received within the
detail String. This is necessary since words from the
word-List can be found anywhere within the detail string. */
int startIndex = 0;
List<String> foundWords = new ArrayList<>();
String tmpStrg = "";
while (!tmpStrg.equals(detail)) {
for (int i = 0; i < found.size(); i++) {
String word = found.get(i);
if (detail.indexOf(word) == startIndex) {
foundWords.add(word);
startIndex = startIndex + word.length();
String procStrg = foundWords.toString().replace(", ", "");
tmpStrg = procStrg.substring(1, procStrg.length() - 1);
}
}
}
//Format and Display the required data
if (foundWords.isEmpty()) {
System.err.println("Couldn't find any required words!");
return; // or whatever...
}
String title = foundWords.get(0);
String label = foundWords.size() > 1 ? foundWords.get(1) : "N/A";
String[] tag = new String[1];
if (foundWords.size() > 2) {
tag = new String[foundWords.size()-2];
for (int i = 0; i < foundWords.size() - 2; i++) {
tag[i] = foundWords.get(i + 2);
}
}
else {
tag[0] = "N/A";
}
System.out.println("Title:\t" + title);
System.out.println("Label:\t" + label);
System.out.println("Tags:\t"
+ Arrays.toString(tag).substring(1, Arrays.toString(tag).length() - 1));
When the above code is run, the console displays:
Title: POTATO
Label: RANDOM
Tags: FOOD, TODAY
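Note that the ordering loop above only terminates when the detail string consists entirely of word-list entries. As a hedged alternative, here is a sketch of a greedy longest-match segmentation that stops at the first position where no entry matches. This is a swapped-in technique, not the code above: WordSegmenter and segment are hypothetical names, and greedy matching can still pick the wrong split when one entry is a prefix of another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordSegmenter {
    // Greedy left-to-right split: at each position take the longest
    // word-list entry that starts there; stop when nothing matches.
    static List<String> segment(String detail, List<String> wordList) {
        List<String> result = new ArrayList<>();
        String upper = detail.toUpperCase();
        int pos = 0;
        while (pos < upper.length()) {
            String best = null;
            for (String word : wordList) {
                String candidate = word.toUpperCase();
                if (upper.startsWith(candidate, pos)
                        && (best == null || candidate.length() > best.length())) {
                    best = candidate;
                }
            }
            if (best == null || best.isEmpty()) {
                break; // unknown text: stop instead of looping forever
            }
            result.add(best);
            pos += best.length();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("potato", "food", "random", "today");
        System.out.println(segment("POTATORANDOMFOODTODAY", words));
    }
}
```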
You can use the Stream API with its filter() method.
Then use map() to apply your existing logic; that should do the trick.
A switch statement could be an alternative: it adds more lines of code, but it reduces the arrow code caused by all the nested ifs.
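One possible reading of the filter() suggestion for the fixed-width detail string, as a sketch: describe each field by name and offsets, filter out the fields that do not fit into the string, and collect the rest into an ordered map. The Field class, FIELDS and parse are hypothetical names, and the sketch collects with Collectors.toMap instead of a plain map() step.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamFieldParser {
    // Hypothetical field description: a name plus [start, end) offsets.
    static final class Field {
        final String name;
        final int start;
        final int end;

        Field(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    static final List<Field> FIELDS = Arrays.asList(
            new Field("title", 0, 6),
            new Field("label", 6, 12),
            new Field("tag", 12, 16));

    static Map<String, String> parse(String detail) {
        return FIELDS.stream()
                .filter(f -> f.end <= detail.length()) // keep only fields that fit
                .collect(Collectors.toMap(
                        f -> f.name,
                        f -> detail.substring(f.start, f.end),
                        (first, second) -> first,      // field names are unique, never called
                        LinkedHashMap::new));          // keep definition order
    }
}
```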
I have the reverse part done, but I'm having trouble with the hyphen. Any help is appreciated! Here is the code so far:
public static void main(String[] args) {
Scanner kbd = new Scanner(System.in);
System.out.print( "Enter a string of words that contains a hyphen: ");
String word = kbd.next();
for (int i = word.length()-1; i >= 0; i--) {
System.out.print(word.charAt(i));
}
}
Example input:
low-budget
Required output:
tegdub (the reverse of the part after the hyphen)
This is the simplest solution I can think of (of course there are other, better solutions, but this is my implementation):
public static void main(String[] args) {
Scanner kbd = new Scanner(System.in);
System.out.print( "Enter a string of words that contains a hyphen: ");
String word = kbd.next();
int loc = word.indexOf('-'); //Here I am trying to find the location of that hyphen
for (int i = word.length()-1; i > loc; i--) { //Now print the rest of the String in reverse, down to the location where we found the hyphen. Notice i > loc
System.out.print(word.charAt(i));
}
System.out.print(" ");
for (int i = loc + 1; i < word.length(); i++) { //Now print the original String starting after the hyphen. Notice int i = loc + 1
System.out.print(word.charAt(i));
}
}
I would do it this way (in one line):
System.out.println(new StringBuilder(word.replaceAll(".*-", "")).reverse());
Edge cases handled for free:
If there's no hyphen, the whole string is printed reversed
If there's more than one hyphen, the last one is used. To use the first one, change the match regex to "^.*?-"
If the string is blank, a blank is printed
Think about all the code that didn't need to be written to handle these (valid) input cases
Breaking down how this works:
word.replaceAll(".*-", "") does a replacement of all matches to the regex .*-, which means "everything up to and including the (last) hyphen", with a blank - effectively deleting the match
new StringBuilder(...) creates a StringBuilder initialized with the String passed into the constructor (from point 1). The only reason we need a StringBuilder is to use the reverse() method (String doesn't have it)
reverse() reverses the StringBuilder's contents and returns it ready for the next call (see Fluent Interface)
Passing a non-String to System.out.println causes String.valueOf() to be invoked on the object, which in turn invokes the object's toString() method, which for a StringBuilder returns its contents
Voila!
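The breakdown above can be wrapped in a small method to try the edge cases. HyphenReverse and reverseAfterLastHyphen are names chosen for this sketch only.

```java
class HyphenReverse {
    // Deletes everything up to and including the last hyphen,
    // then reverses what is left.
    static String reverseAfterLastHyphen(String word) {
        String tail = word.replaceAll(".*-", "");
        return new StringBuilder(tail).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseAfterLastHyphen("low-budget")); // tegdub
        System.out.println(reverseAfterLastHyphen("a-b-c"));      // c
    }
}
```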
Here's a (one-line) Java 8 stream-based solution for interest:
word.chars().skip(word.indexOf('-') + 1).mapToObj(c -> String.valueOf((char)c))
.reduce((a, b) -> b + a).ifPresent(System.out::println);
Edge case treatment:
Conveniently, if there's no hyphen, the whole string is printed in reverse. This is due to indexOf(char) returning -1 in the case of not found, so the end result is skipping zero (-1 + 1)
If more than one hyphen is present, only the first will be used to split the word
A blank string prints nothing, because the chars() stream is empty
To print a blank when the input is blank, use this code instead:
System.out.println(word.chars().skip(word.indexOf('-') + 1)
.mapToObj(c -> String.valueOf((char)c)).reduce("", (a, b) -> b + a));
Notice the use of the alternate form of the reduce() method, wherein an identity value of a blank ("") is passed in, which is used in the case of an empty stream to guarantee a reduction result.
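To make the difference between the two reduce() forms concrete, here is a minimal side-by-side sketch (ReduceForms and its method names are made up for illustration). The form without an identity returns an Optional<String>, the form with an identity returns a plain String.

```java
import java.util.Optional;

class ReduceForms {
    // Without an identity value: an empty stream yields Optional.empty().
    static Optional<String> reversedOrEmpty(String s) {
        return s.chars()
                .mapToObj(c -> String.valueOf((char) c))
                .reduce((a, b) -> b + a);
    }

    // With an identity value: always a String, "" for an empty stream.
    static String reversed(String s) {
        return s.chars()
                .mapToObj(c -> String.valueOf((char) c))
                .reduce("", (a, b) -> b + a);
    }

    public static void main(String[] args) {
        System.out.println(reversed("budget"));               // tegdub
        System.out.println(reversedOrEmpty("").isPresent());  // false
    }
}
```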
First, split it based on the -.
Then, go over the second part in reverse...
String s = "low-budget";
String[] t = s.split("-");
for (int i = t[1].length() - 1; i >= 0; --i) {
System.out.print(t[1].charAt(i));
}
I have a serious problem extracting terms from each line of a file. To be more specific, I have a file with a .csv extension that is not actually in CSV format (every line ends up as a single value in line[0]).
Here are a few example lines out of thousands:
test.csv
line1 : "31451 CID005319044 15939353 C8H14O3S2 beta-lipoic acid C1CS#S[C##H]1CCCCC(=O)O "
line2 : "12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O "
line3 : "9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are located in the 5th position.
You can see there are big gaps between the terms, which is why I say 5th position.
In this case, how can I extract the term located in the 5th position of each line?
One more thing:
the amount of whitespace between the six terms is not always the same.
It could be one, two, three, four or five characters, and so on.
Another try:
import java.io.File;
import java.util.Scanner;
public class HelloWorld {
// The number of columns per row, where each column is separated by an arbitrary number
// of spaces or tabs
final static int COLS = 7;
public static void main(String[] args) {
System.out.println("Tokens:");
try (Scanner scanner = new Scanner(new File("input.txt")).useDelimiter("\\s+")) {
// Counts the current column index
int n = 0;
String tmp = "";
StringBuilder item = new StringBuilder();
// Operating on a stream of tokens
while (scanner.hasNext()) {
tmp = scanner.next();
n += 1;
// If we have reached the fifth column, take its content and append the
// sixth column too, as the name we want consists of space-separated
// expressions. Feel free to customize this if your name layout varies.
if (n % COLS == 5) {
item.setLength(0);
item.append(tmp);
item.append(" ");
item.append(scanner.next());
n += 1;
System.out.println(item.toString()); // Doing some stuff with that
//expression we got
}
}
}
catch(java.io.IOException e){
System.out.println(e.getMessage());
}
}
}
If the type of line[] is String:
String s = line[0];
String[] split = s.split(" ");
return split[4]; //which is the fifth item
For the delimiter, if you want to be more precise, you can use a regular expression.
How are the columns separated? For example, if the columns are separated by a tab character, I believe you can use the split method. Try the following:
String[] parts = str.split("\\t");
Your expected result will be in parts[4].
Just use String.split() with a regex that matches at least 2 whitespace characters:
String foo = "31451  CID005319044  15939353  C8H14O3S2  beta-lipoic acid  C1CS#S[C##H]1CCCCC(=O)O";
String[] bar = foo.split("\\s{2,}");
String name = bar[4]; // beta-lipoic acid
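A slightly more defensive sketch of the same idea follows. It assumes that columns are always separated by at least two whitespace characters and that a name never contains more than one consecutive space; ColumnExtractor and column are hypothetical names.

```java
import java.util.regex.Pattern;

class ColumnExtractor {
    // Two or more whitespace characters separate columns; a single
    // space is treated as part of the value (e.g. "beta-lipoic acid").
    private static final Pattern GAP = Pattern.compile("\\s{2,}");

    // Returns the column at the given zero-based index,
    // or null if the line has fewer columns.
    static String column(String line, int index) {
        String[] cols = GAP.split(line.trim());
        return index < cols.length ? cols[index] : null;
    }

    public static void main(String[] args) {
        String line = "12232  COD05374044   23439353  C924O3S2    saponin  CCCC(=O)O  ";
        System.out.println(column(line, 4)); // saponin
    }
}
```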
I need to develop a new method that replaces all umlauts (ä, ö, ü) in an input string with the corresponding HTML escape codes, and it has to be fast. According to statistics, only 5% of all input strings contain umlauts. As the method is expected to be used extensively, any unnecessary instantiation should be avoided.
Could someone show me a way to do it?
These are the HTML escape codes. Additionally, HTML features arbitrary escaping with numeric codes of the format &#NNN; (decimal) and equivalently &#xHHH; (hexadecimal).
A simple string-replace is not going to be efficient with so many strings to replace. I suggest you split the string by entity matches, such as this:
String[] parts = str.split("&(#[0-9]+|#x[A-Fa-f0-9]+|[A-Za-z]+);", -1); //Limit -1 keeps trailing empty parts.
if(parts.length <= 1) return str; //No matched entities.
Then you can re-build the string with the replaced parts inserted.
StringBuilder result = new StringBuilder(str.length());
result.append(parts[0]); //First part always exists.
int pos = parts[0].length() + 1; //Skip past the first part and the ampersand.
for(int i = 1;i < parts.length;i++) {
    String entityName = str.substring(pos,str.indexOf(';',pos));
    if(entityName.matches("#x[A-Fa-f0-9]{1,4}")) {
        result.append((char)Integer.parseInt(entityName.substring(2), 16)); //Hexadecimal code.
    } else if(entityName.matches("#[0-9]{1,4}")) {
        result.append((char)Integer.parseInt(entityName.substring(1))); //Decimal code.
    } else {
        switch(entityName) {
            case "euml": result.append('ë'); break;
            case "auml": result.append('ä'); break;
            ...
            default: result.append("&" + entityName + ";"); //Unknown entity. Give the original string.
        }
    }
    result.append(parts[i]); //Append the text after the entity.
    pos += entityName.length() + parts[i].length() + 2; //Skip past the entity name, the semicolon, the following part and the next ampersand.
}
return result.toString();
Rather than copy-pasting this code, type it in your own project by hand. This gives you the opportunity to look at how the code actually works. I didn't run this code myself, so I can't guarantee it being correct. It can also be made slightly more efficient by pre-compiling the regular expressions.
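The code above converts entities back into characters. For the direction the question asks about (characters into entities), here is a minimal sketch that allocates nothing for the roughly 95% of strings without umlauts. UmlautEscaper and its method names are hypothetical, and handling the uppercase umlauts Ä, Ö, Ü as well is an assumption beyond what the question lists.

```java
class UmlautEscaper {
    // Returns the same String instance when nothing needs escaping,
    // so the common case creates no new objects.
    static String escapeUmlauts(String s) {
        int first = -1;
        for (int i = 0; i < s.length(); i++) {
            if (entityFor(s.charAt(i)) != null) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return s; // fast path: no umlaut found
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        sb.append(s, 0, first); // copy the untouched prefix in one go
        for (int i = first; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = entityFor(c);
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Maps an umlaut to its HTML entity, or returns null for any other character.
    private static String entityFor(char c) {
        switch (c) {
            case '\u00E4': return "&auml;"; // ä
            case '\u00F6': return "&ouml;"; // ö
            case '\u00FC': return "&uuml;"; // ü
            case '\u00C4': return "&Auml;"; // Ä (assumption: uppercase wanted too)
            case '\u00D6': return "&Ouml;"; // Ö
            case '\u00DC': return "&Uuml;"; // Ü
            default: return null;
        }
    }
}
```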
I have a list of lines that looks somewhat like the one below.
Here's an example:
A-NUMBER  ROUTINF  ACO  AO  L     MISCELL
0-0                0        1-20
0-00
0-01      FDS               3-20
0-02               6    7   3-20
0-03                    4   3-20
1-0                               F=PRE
                                  ANT=3
                                  NAPI=1
1-1                               F=PRE
                                  ANT=3
I need to parse the lines column by column, skipping the columns that have blank values, and create new lines like the ones below:
ANUM = 0-0, ACO=0, L=1-20;
ANUM = 0-00;
ANUM = 0-01, ROUTINF=FDS, L=3-20;
ANUM = 0-02, ACO=6, AO=7, L=3-20;
ANUM = 0-03, AO=4,L=3-20;
ANUM = 1-0, F=PRE, ANT=3, NAPI=1;
ANUM = 1-1, F=PRE, ANT=3;
I can split the line, but my code can't remember which column a value belongs to or when to skip a value.
String[] splitted = null;
for (Integer i = 0; i < lines.size(); i++) {
splitted = lines.get(i).split("\\s+");
for(String str : splitted)
if(!(splitted.length == 1)){
anum = splitted[0];
routinf = splitted[1];
aco = splitted[2];
ao = splitted[3];
l = splitted[4];
}else {
miscell = splitted[0];
}
}
The columns in your file seem to be of fixed length (I don't see any other way to distinguish each column). If that is the case, then I would recommend using substring(start, end) instead of split.
Create a class to hold one single record.
class Record {
    String aNumber;
    List<String> routingf, aco, ao, l, miscell;
    public Record(String aNumber) {
        this.aNumber = aNumber;
        this.routingf = new ArrayList<>();
        // init other lists like above ...
    }
    public void addRoutingf(String routingf) {
        // add only if not null and not empty after trimming
        if(routingf != null && routingf.trim().length() > 0) {
            this.routingf.add(routingf);
        }
    }
    // implement add-methods for other lists like above ...
}
While parsing each line, remember the last created record. If A-NUMBER is empty in the current line, use the last created record to store the values; otherwise create a new record and remember it as the current one so you can use it for the upcoming lines if necessary.
Save all records in a list:
List<Record> records = new ArrayList<>();
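Putting the fixed-width idea and the "remember the last record" rule together might look like the sketch below. The column offsets in STARTS are assumptions that have to be measured on the real file, the header line is expected to be removed by the caller, and FixedWidthParser, cell and parse are hypothetical names. For brevity the sketch builds the output lines directly instead of filling Record objects.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthParser {
    // Assumed layout: output name and start offset of each column;
    // a column ends where the next one begins.
    static final String[] NAMES = {"ANUM", "ROUTINF", "ACO", "AO", "L", "MISCELL"};
    static final int[] STARTS = {0, 10, 19, 24, 28, 34};

    // Returns the trimmed content of one column, "" if the line is too short.
    static String cell(String line, int col) {
        int start = STARTS[col];
        if (start >= line.length()) {
            return "";
        }
        int end = col + 1 < STARTS.length ? STARTS[col + 1] : line.length();
        return line.substring(start, Math.min(end, line.length())).trim();
    }

    // Lines with an empty A-NUMBER column continue the previous record.
    static List<String> parse(List<String> dataLines) {
        List<StringBuilder> records = new ArrayList<>();
        for (String line : dataLines) {
            String anum = cell(line, 0);
            StringBuilder current;
            if (!anum.isEmpty()) {
                current = new StringBuilder("ANUM = ").append(anum);
                records.add(current);
            } else if (!records.isEmpty()) {
                current = records.get(records.size() - 1);
            } else {
                continue; // continuation line without a preceding record
            }
            for (int col = 1; col < NAMES.length; col++) {
                String value = cell(line, col);
                if (value.isEmpty()) {
                    continue; // skip blank columns
                }
                boolean miscell = col == NAMES.length - 1;
                // MISCELL values already have the form KEY=VALUE
                current.append(", ").append(miscell ? value : NAMES[col] + "=" + value);
            }
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder record : records) {
            out.add(record.append(';').toString());
        }
        return out;
    }
}
```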
What is the common separator? Just split on that... Your + at the moment will consume any amount of whitespace. \s{1,4} will limit it to between 1 and 4 characters. Find the right numbers for your data.
If your input file uses exactly one whitespace character (for instance a tab) between columns, your code is almost OK:
String[] splitted = null;
for (Integer i = 0; i < lines.size(); i++) {
splitted = lines.get(i).split("\\s");
if(!(splitted.length == 1)){
anum = splitted[0];
routinf = splitted[1];
aco = splitted[2];
ao = splitted[3];
l = splitted[4];
}else {
miscell = splitted[0];
}
}
//print only not empty fields
Please note that the unnecessary inner for loop has been removed and the split pattern has been changed from \s+ to \s.
Just a thought, but you could also experiment if it helps to keep the whitespaces in the result for defining which column it belongs to.
lines.get(i).split(yourDelimiter, -1);
It's hard to tell whether this helps without knowing exactly what your original files look like, but you could give it a try.
E.g. if the values always sit at a certain position in the split result once empty strings are kept, you could easily tell which column each value belongs to and extract it.
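A small sketch of what the limit argument changes, assuming the file uses exactly one tab between columns (SplitLimitDemo and columns are hypothetical names). Without a limit, split drops trailing empty strings; with -1 it keeps them, so the array index always equals the column index.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Keeps empty leading, inner and trailing columns.
    static String[] columns(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "0-03\t\t\t4\t3-20\t"; // last column (MISCELL) is empty
        System.out.println(line.split("\t").length); // 5: trailing empty column dropped
        System.out.println(columns(line).length);    // 6: one entry per column
        System.out.println(Arrays.toString(columns(line)));
    }
}
```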