java read csv + specific sum of subarray - most efficient way


I need to read ints from a large CSV file and then compute specific sums with them. Currently I have this algorithm:
String csvFile = "D:/input.csv";
String line;
String cvsSplitBy = ";";
Vector<int[]> converted = new Vector<int[]>();
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        String[] a = line.split(cvsSplitBy, -1);
        int[] b = new int[a.length];
        for (int n = 0; n < a.length; n++) {
            b[n] = Integer.parseInt(a[n]);
        }
        converted.add(b);
    }
} catch (IOException e) {
    e.printStackTrace();
}
int x = 7;
int y = 5;
for (int m = 0; m < converted.size(); m++) {
    int[] row = converted.get(m);
    // sum of the first x members
    int sum = 0;
    for (int n = 0; n < x; n++) {
        sum = sum + row[n];
    }
    System.out.print(sum + " ");
    // sum of the next x members, each window starting y columns after the previous one
    for (int start = y; start + x <= row.length; start = start + y) {
        sum = 0;
        for (int o = start; o < start + x; o++) {
            sum = sum + row[o];
        }
        System.out.print(sum + " ");
    }
    System.out.println("");
}
What I am trying to do is get the sum of the first x members of a CSV row, and then the sum of x members every y columns further on. In this case (x = 7, y = 5) that is the sum of columns 0-6, then the sum of the next 7 columns starting 5 columns later (5-11), then (10-16), and so on, written out for every row. In the end I collect the line number with the greatest (sum of 0-6), the greatest (sum of 5-11), and so on, so the final result should be, for example, 5, 9, 13, 155, ..., which would mean that line 5 had the greatest sum of 0-6, line 9 the greatest sum of 5-11, and so on. As you can see, this is quite an inefficient approach. First I read the whole CSV into String[], then convert it to int[] and save it in a Vector. Then I run a rather inefficient loop to do the work. I need this to run as fast as possible, as I will be using a very large CSV with lots of different x and y values. What I was thinking about, but don't know how to do, is:
do these sums in the reading loop
do the sum differently, instead of always looping over x members (either keep the last sum, subtract the old members and add the new ones, or use some other faster way to compute subarray sums)
use IntStream and parallelism (parallel might be tricky, as in the end I am looking for a max)
use a different input format than CSV?
all of the above?
How can I do this as fast as possible? Thank you

As the sums are per line, you do not need to read everything into memory first.
Path csvFile = Paths.get("D:/input.csv");
try (BufferedReader br = Files.newBufferedReader(csvFile, StandardCharsets.ISO_8859_1)) {
String line;
while ((line = br.readLine()) != null) {
int[] b = lineToInts(line);
int n = b.length;
// Sum while reading:
int sum = 0;
for (int i = 0; i < 7; ++i) {
sum += b[i];
}
System.out.print(sum + " ");
sum = 0;
for (int i = n - 5; i < n; ++i) {
sum += b[i];
}
System.out.print(sum + " ");
System.out.println();
}
}
private static int[] lineToInts(String line) {
    // Using split is slow; the implementation could be optimized.
    String[] a = line.split(";", -1);
    int[] b = new int[a.length];
    for (int n = 0; n < a.length; n++) {
        b[n] = Integer.parseInt(a[n]);
    }
    return b;
}
A faster version:
private static int[] lineToInts(String line) {
    // count the separators so the result array has the exact size
    int semicolons = 0;
    for (int i = 0; (i = line.indexOf(';', i)) != -1; ++i) {
        ++semicolons;
    }
    int[] b = new int[semicolons + 1];
    int pos = 0;
    for (int i = 0; i < b.length; ++i) {
        int pos2 = line.indexOf(';', pos);
        if (pos2 < 0) {
            pos2 = line.length(); // the last field runs to the end of the line
        }
        b[i] = Integer.parseInt(line.substring(pos, pos2));
        pos = pos2 + 1;
    }
    return b;
}
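Going one step further than the indexOf/substring version, a line can be parsed without creating any intermediate String objects by accumulating the digits directly. This is a minimal sketch, assuming every field is a plain, optionally negative decimal integer with no spaces and no empty fields; unlike Integer.parseInt it does no overflow or format checking. The class name LineParser is hypothetical.

```java
import java.util.Arrays;

final class LineParser {
    // Parses a ';'-separated line of integers without split() or substring().
    // Assumes well-formed input: an optional '-' followed by digits, no empty fields.
    static int[] lineToInts(String line) {
        // first pass: count fields so the result array has the exact size
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ';') {
                fields++;
            }
        }
        int[] b = new int[fields];
        int field = 0;
        int value = 0;
        boolean negative = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == ';') {
                // end of the current field: store it and reset the accumulator
                b[field++] = negative ? -value : value;
                value = 0;
                negative = false;
            } else if (ch == '-') {
                negative = true;
            } else {
                value = value * 10 + (ch - '0');
            }
        }
        b[field] = negative ? -value : value; // the last field has no trailing ';'
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lineToInts("3;-12;0;450"))); // [3, -12, 0, 450]
    }
}
```

Whether this beats the substring version on a given file is something to measure; it mainly saves allocations on very wide lines.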
As an aside: Vector is a legacy, synchronized class; prefer List and ArrayList.
List<int[]> converted = new ArrayList<>(10_000);
Here the optional initial-capacity argument is given: ten thousand.
The try-with-resources syntax try (BufferedReader br = ...) { ensures that br is always closed automatically, even on an exception or a return.
Parallelism, after the question was reformatted:
You could read all lines:
List<String> lines = Files.readAllLines(csvFile, StandardCharsets.ISO_8859_1);
and then experiment with parallel streams, for example:
OptionalInt max = lines.parallelStream()
.mapToInt(line -> {
int[] b = lineToInts(line);
...
return sum;
}).max();
or:
IntStream.range(0, lines.size()).parallel()
.mapToObj(i -> {
String line = lines.get(i);
...
return new int[] { i, sum5, sum7 };
});
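On the concern from the question that parallelism is tricky when looking for a maximum: a maximum is a reduction, and a reduction parallelizes safely as long as the combining function is associative. This is a minimal sketch, assuming the rows are already parsed into a random-access List<int[]> of equal-length rows; the class and method names are hypothetical, and line indices are 0-based. Ties go to the lowest index because the reducer keeps its left operand and IntStream.range is ordered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class ParallelMax {
    // Sum of row[start .. start + x - 1].
    static int windowSum(int[] row, int start, int x) {
        int sum = 0;
        for (int i = start; i < start + x; i++) {
            sum += row[i];
        }
        return sum;
    }

    // 0-based index of the row with the greatest sum of the window starting at 'start'.
    static int bestRow(List<int[]> rows, int start, int x) {
        // the per-row sums are independent, so they can be computed in parallel
        int[] sums = IntStream.range(0, rows.size()).parallel()
                .map(i -> windowSum(rows.get(i), start, x))
                .toArray();
        // "keep the larger, or the left one on ties" is associative and the
        // stream is ordered, so the parallel result equals the sequential one
        return IntStream.range(0, sums.length).parallel()
                .reduce((a, b) -> sums[b] > sums[a] ? b : a)
                .orElse(-1);
    }

    // Best row for every window start 0, y, 2y, ... (assumes equal-length rows).
    static int[] bestRows(List<int[]> rows, int x, int y) {
        int width = rows.isEmpty() ? 0 : rows.get(0).length;
        int windows = width < x ? 0 : (width - x) / y + 1;
        int[] best = new int[windows];
        for (int w = 0; w < windows; w++) {
            best[w] = bestRow(rows, w * y, x);
        }
        return best;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
                new int[] {1, 2, 3, 4},
                new int[] {5, 0, 0, 0},
                new int[] {0, 0, 9, 9});
        System.out.println(Arrays.toString(bestRows(rows, 2, 2))); // [1, 2]
    }
}
```

Whether the parallel version is actually faster than a plain loop depends on the data size and has to be measured.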

You could probably compute some of your sums while reading the input. It might also be feasible to use HashMaps of type <Integer, Integer>.
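Computing the sums while reading can be combined with the question's second idea (not re-adding x members for every window). A minimal sketch, using prefix sums (a swapped-in technique, not taken from the answers): each line is turned into a prefix-sum array once, after which any window sum is a single subtraction, and the best line per window is tracked on the fly, so nothing has to be kept in memory. It assumes every line has as many columns as the first one; the class name is hypothetical, and for a real file the reader would come from Files.newBufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

final class StreamingWindows {
    // For each window (start = 0, y, 2y, ...; length x) returns the 0-based index
    // of the line whose window sum is greatest. Lines are processed one at a time.
    static int[] bestLines(BufferedReader br, int x, int y) throws IOException {
        int[] bestIndex = null;
        long[] bestSum = null;
        int lineNo = 0;
        String line;
        while ((line = br.readLine()) != null) {
            String[] a = line.split(";", -1);
            // prefix[i] = sum of the first i values, so each window sum is one subtraction
            long[] prefix = new long[a.length + 1];
            for (int i = 0; i < a.length; i++) {
                prefix[i + 1] = prefix[i] + Integer.parseInt(a[i]);
            }
            if (bestIndex == null) {
                // assumes every line has as many columns as the first one
                int windows = a.length < x ? 0 : (a.length - x) / y + 1;
                bestIndex = new int[windows];
                bestSum = new long[windows];
                Arrays.fill(bestSum, Long.MIN_VALUE);
            }
            for (int w = 0; w < bestIndex.length; w++) {
                int start = w * y;
                long sum = prefix[start + x] - prefix[start];
                if (sum > bestSum[w]) { // strict '>' keeps the earliest line on ties
                    bestSum[w] = sum;
                    bestIndex[w] = lineNo;
                }
            }
            lineNo++;
        }
        return bestIndex == null ? new int[0] : bestIndex;
    }

    public static void main(String[] args) throws IOException {
        String csv = "1;2;3;4\n5;0;0;0\n0;0;9;9\n";
        int[] best = bestLines(new BufferedReader(new StringReader(csv)), 2, 2);
        System.out.println(Arrays.toString(best)); // [1, 2]
    }
}
```

Because the prefix array is independent of x and y, the same array can serve many (x, y) combinations for one line.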

Related

Trie Data Structure in Finding an Optimal Solution

This question is part of an ongoing competition. My solution passes 75% of the test data, but the remaining 25% gives TLE. I am asking why it gets TLE, since I am sure my complexity is O(n*n). Question:
Given a string S consisting of N lowercase English letters, we have prepared a list L consisting of all non-empty substrings of S.
Now you are asked Q questions. For the i-th question, you need to count the number of ways to choose exactly Ki equal strings from the list L.
For Example:
String = ababa
L = {"a", "b", "a", "b", "a", "ab", "ba", "ab", "ba", "aba", "bab", "aba", "abab", "baba", "ababa"}.
k1 = 2: There are seven ways to choose two equal strings ("a", "a"), ("a", "a"), ("a", "a"), ("b", "b"), ("ab", "ab"), ("ba", "ba"), ("aba", "aba").
k2 = 1: We can choose any string from L (15 ways).
k3 = 3: There is one way to choose three equal strings - ("a", "a", "a").
k4 = 4: There are no four equal strings in L.
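To check a faster solution against this example, a brute-force reference can help: count how often each distinct substring occurs (say c times); it then contributes C(c, k) ways to choose k equal strings. This is a sketch for small inputs only: it builds all O(n^2) substrings explicitly and ignores the modulus the real problem presumably uses (the question's code refers to a mod constant whose value is not shown). SubstringChoices is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class SubstringChoices {
    // Brute force: ways to choose exactly k equal strings from all substrings of s.
    static long ways(String s, int k) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            for (int j = i + 1; j <= s.length(); j++) {
                occurrences.merge(s.substring(i, j), 1, Integer::sum);
            }
        }
        long total = 0;
        for (int c : occurrences.values()) {
            total += binomial(c, k); // choose k of the c equal copies
        }
        return total;
    }

    // C(n, k), computed exactly with the multiplicative formula.
    static long binomial(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 0; i < k; i++) {
            result = result * (n - i) / (i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++) {
            System.out.println("k = " + k + ": " + ways("ababa", k)); // 15, 7, 1, 0
        }
    }
}
```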
My approach
I build a trie of all suffixes of S and compute an array F, where F[i] is the number of distinct substrings that occur exactly i times.
My trie:
static class Batman{
int value;
Batman[] next = new Batman[26];
public Batman(int value){
this.value = value;
}
}
My insert function:
public static void Insert(String S,int[] F , int start){
Batman temp = Root;
for(int i=start;i<S.length();i++){
int index = S.charAt(i)-'a';
if(temp.next[index]==null){
temp.next[index] = new Batman(1);
F[1]+=1;
}else{
temp.next[index].value+=1;
int xx = temp.next[index].value;
F[xx-1]-=1;
F[xx]+=1;
// Calculating The Frequency of I equal Strings
}
temp = temp.next[index];
}
}
My main function:
public static void main(String args[] ) throws java.lang.Exception {
Root = new Batman(0);
int n = in.nextInt();
int Q = in.nextInt();
String S = in.next();
int[] F = new int[n+1];
for(int i=0;i<n;i++)
Insert(S,F,i);
long[] ans = new long[n+1];
for(int i=1;i<=n;i++){
for(int j=i;j<=n;j++){
ans[i]+= F[j]*C[j][i]; // C[n][k] is the Binomial Coffecient
ans[i]%=mod;
}
}
while(Q>0){
Q--;
int cc = in.nextInt();
long o =0;
if(cc<=n) o=ans[cc];
System.out.println(o);
}
}
Why does my approach give TLE when its time complexity is O(N*N) and the length of the string is N <= 5000? Please help; working code would be appreciated.

One reason this program gets TLE (keep in mind that the time limit is 1 second):
Each time you create a Batman object, it allocates an array of length 26, which is equivalent to adding a loop with n = 26 (the array has to be zero-initialized).
So your time complexity is 26*5000*5000 = 650000000 = 6.5*10^8 operations. In theory that could still fit into the time limit if the CPU does 10^9 operations per second, but keep in mind that there is also some heavy computation after this, so this should be the reason.
To solve this problem, I used the Z-algorithm and got accepted: Link
The actual code is quite complex, so here is the idea: keep a table count[i][j], the number of substrings that match substring (i, j). Using the Z-algorithm, you can achieve a time complexity of O(n^2).
For each string s:
int n = in.nextInt();
int q = in.nextInt();
String s = in.next();
int[][] cur = new int[n][];
int[][] count = new int[n][n];
int[] length = new int[n];
for (int i = 0; i < n; i++) {
cur[i] = Z(s.substring(i).toCharArray());//Applying Z algorithm
for (int j = 1; j < cur[i].length; j++) {
if (cur[i][j] > length[j + i]) {
for (int k = i + length[j + i]; k < i + cur[i][j]; k++) {
count[i][k]++;
}
length[j + i] = cur[i][j];
}
}
}
int[] F = new int[n + 1];
for(int i = 0; i < n; i++){
for(int j = i; j < n; j++){
int v = count[i][j] + (length[i] < (j - i + 1) ? 1 : 0);
F[v]++;
}
}
Z-algorithm method:
public static int[] Z(char[] s) {
int[] z = new int[s.length];
int n = s.length;
int L = 0, R = 0;
for (int i = 1; i < n; i++) {
if (i > R) {
L = R = i;
while (R < n && s[R - L] == s[R])
R++;
z[i] = R - L;
R--;
} else {
int k = i - L;
if (z[k] < R - i + 1) {
z[i] = z[k];
} else {
L = i;
while (R < n && s[R - L] == s[R])
R++;
z[i] = R - L;
R--;
}
}
}
return z;
}
Actual code: http://ideone.com/5GYWeS
Explanation:
First, we have an array length, where length[i] is the length of the longest substring starting at index i that matches a substring starting at an earlier index.
For each index i, after calculating the Z function, if cur[i][j] > length[j + i], then there is a matched substring at index j + i that is longer than any previously matched there, and the extra lengths have not been counted in our result yet, so we need to count them.
So even though there are 3 nested for loops, each substring is counted only once, which makes the whole time complexity O(n^2).
for (int j = 1; j < cur[i].length; j++) {
if (cur[i][j] > length[j + i]) {
for (int k = i + length[j + i]; k < i + cur[i][j]; k++) {
count[i][k]++;
}
length[j + i] = cur[i][j];
}
}
For the loop below, notice that if substring (i, j) also occurs at an earlier index, then length[i] >= the length of substring (i, j). If there is no such match, this is the first occurrence of substring (i, j), so we add 1 to count it.
for(int j = i; j < n; j++){
int v = count[i][j] + (length[i] < (j - i + 1) ? 1 : 0);
F[v]++;
}
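Putting the fragments of this answer together, here is a self-contained sketch that computes F and then the answers for every k. The counting part follows the answer's loops; two things are assumptions of this sketch rather than part of the answer: the modulus is a parameter because its value is not given in the question (1_000_000_007 is used only as an example), and Pascal's triangle is built one row at a time, so the binomials need O(n) memory instead of a full two-dimensional C table. It still uses the n × n count table from the answer. The class and method names are hypothetical.

```java
final class ZSubstringCounts {
    // Z-function from the answer: z[i] = length of the longest common prefix of s and s[i..].
    static int[] zFunction(char[] s) {
        int n = s.length;
        int[] z = new int[n];
        int left = 0, right = 0;
        for (int i = 1; i < n; i++) {
            if (i > right) {
                left = right = i;
                while (right < n && s[right - left] == s[right]) right++;
                z[i] = right - left;
                right--;
            } else if (z[i - left] < right - i + 1) {
                z[i] = z[i - left];
            } else {
                left = i;
                while (right < n && s[right - left] == s[right]) right++;
                z[i] = right - left;
                right--;
            }
        }
        return z;
    }

    // f[v] = number of distinct substrings occurring exactly v times (f[0] is scratch).
    static int[] occurrenceHistogram(String s) {
        int n = s.length();
        int[][] count = new int[n][n];
        int[] length = new int[n];
        for (int i = 0; i < n; i++) {
            int[] cur = zFunction(s.substring(i).toCharArray()); // only the current row is kept
            for (int j = 1; j < cur.length; j++) {
                if (cur[j] > length[j + i]) {
                    // substrings s[i..k] newly matched at position j + i
                    for (int k = i + length[j + i]; k < i + cur[j]; k++) {
                        count[i][k]++;
                    }
                    length[j + i] = cur[j];
                }
            }
        }
        int[] f = new int[n + 1];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                // +1 for the occurrence at i itself if no earlier index matched it
                int v = count[i][j] + (length[i] < j - i + 1 ? 1 : 0);
                f[v]++;
            }
        }
        return f;
    }

    // ans[k] = sum over v >= k of f[v] * C(v, k), modulo mod.
    static long[] answers(int[] f, long mod) {
        int n = f.length - 1;
        long[] ans = new long[n + 1];
        long[] row = new long[n + 1]; // row[k] = C(v, k) mod mod for the current v
        row[0] = 1;
        for (int v = 1; v <= n; v++) {
            for (int k = v; k >= 1; k--) {
                row[k] = (row[k] + row[k - 1]) % mod; // right to left, in place
            }
            for (int k = 1; k <= v; k++) {
                ans[k] = (ans[k] + f[v] * row[k]) % mod;
            }
        }
        return ans;
    }

    public static void main(String[] args) {
        long[] ans = answers(occurrenceHistogram("ababa"), 1_000_000_007L);
        for (int k = 1; k <= 4; k++) {
            System.out.println("k = " + k + ": " + ans[k]); // 15, 7, 1, 0
        }
    }
}
```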

Inserting number into random array

I need some help inserting the number 8 into an array of random values. The result must be in order. For example, if I had the array (1, 5, 10, 15), I would have to insert 8 between 5 and 10. My problem is figuring out how to find the index where 8 should be placed, because the array is random and can contain anything. Here is my code so far:
public class TrickyInsert {
public static void main(String[] args) {
int[] mysteryArr = generateRandArr();
//print out starting state of mysteryArr:
System.out.print("start:\t");
for ( int a : mysteryArr ) {
System.out.print( a + ", ");
}
System.out.println();
//code starts below
// insert value '8' in the appropriate place in mysteryArr[]
int[] tmp = new int[mysteryArr.length + 1];
int b = mysteryArr.length;
for(int i = 0; i < mysteryArr.length; i++) {
tmp[i] = mysteryArr[i];
}
tmp[b] = 8;
for(int i =b ; i<mysteryArr.length; i++) {
tmp[i+1] = mysteryArr[i];
}
mysteryArr = tmp;
Any tips? Thanks!

Simply append the number, then use the Arrays.sort method:
int b = mysteryArr.length;
int[] tmp = new int[b + 1];
for (int i = 0; i < b; i++) {
    tmp[i] = mysteryArr[i];
}
tmp[b] = 8;
Arrays.sort(tmp); // sorts in place; it returns void
mysteryArr = tmp;

In your example the random array is sorted. If this is the case, just insert 8 and sort again.

Simply copy the array over, add 8, and sort again.
int[] a = generateRandArr();
int[] b = Arrays.copyOf(a, a.length + 1);
b[a.length] = 8;
Arrays.sort(b);

int findPosition(int a, int[] inputArr)
{
    // index of the first element greater than a, or -1 if there is none
    for (int i = 0; i < inputArr.length; ++i)
        if (inputArr[i] > a)
            return i;
    return -1;
}
int[] tmpArr = new int[mysteryArr.length + 1];
int a = 8; // or any other number
int x = findPosition(a, mysteryArr);
if (x == -1) {
    // no larger element: copy everything and append a at the end
    int i = 0;
    for (; i < mysteryArr.length; ++i)
        tmpArr[i] = mysteryArr[i];
    tmpArr[i] = a;
} else {
    for (int i = 0; i < mysteryArr.length + 1; ++i)
        if (i < x)
            tmpArr[i] = mysteryArr[i];
        else if (i == x)
            tmpArr[i] = a;
        else
            tmpArr[i] = mysteryArr[i - 1];
}

I suggest using binary search to find the appropriate index. Once you have located the index, you can use
System.arraycopy(Object src, int srcPos, Object dest, int destPos, int length)
to copy the left half into your new array (with length one more than the existing one), then the new element, and finally the right half. This avoids sorting the whole array every time you insert an element.
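A minimal sketch of this suggestion, assuming the array is already sorted (as in the example), using Arrays.binarySearch to find the insertion point and two System.arraycopy calls around the new element. insertSorted is a hypothetical helper name. Finding the position is O(log n) and copying is O(n), which is still cheaper than re-sorting.

```java
import java.util.Arrays;

final class SortedInsert {
    // Returns a new array with value inserted so that a sorted input stays sorted.
    static int[] insertSorted(int[] sorted, int value) {
        int pos = Arrays.binarySearch(sorted, value);
        if (pos < 0) {
            pos = -pos - 1; // binarySearch returns -(insertion point) - 1 when value is absent
        }
        int[] result = new int[sorted.length + 1];
        System.arraycopy(sorted, 0, result, 0, pos);                          // left part
        result[pos] = value;                                                  // new element
        System.arraycopy(sorted, pos, result, pos + 1, sorted.length - pos); // right part
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(insertSorted(new int[] {1, 5, 10, 15}, 8))); // [1, 5, 8, 10, 15]
    }
}
```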
Also, the following portion does not do anything.
for(int i =b ; i<mysteryArr.length; i++) {
tmp[i+1] = mysteryArr[i];
}
Since int b = mysteryArr.length;, after setting int i = b;, the condition i < mysteryArr.length is false from the start, so the body of this for loop never executes.

Finding the greatest common divisor (GCD) of an array excluding some elements in minimum time

I was doing a competitive programming question in which you are given an array of numbers and then a certain number of queries. For each query, you are given 2 integers, 'a' and 'b', and you have to output the GCD of the remaining elements of the array (excluding positions a and b and all the elements in between).
For example, if the array is 16, 8, 24, 15, 20 and there are 2 queries, (2, 3) and (1, 3), then output 1 is 1 and output 2 is 5.
Note that the indexing is 1 based.
Here is my code, in which I've implemented the basic idea with a function for finding the GCD of an array passed to it.
public static void main(String args[]) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int t = Integer.parseInt(br.readLine());
while (t-- > 0) { //This is the number of test cases
String[] s1 = br.readLine().split(" ");
int n = Integer.parseInt(s1[0]); //Number of elements in array
int q = Integer.parseInt(s1[1]); //Number of queries
String[] s2 = br.readLine().split(" ");
int[] arr = new int[n];
for (int i = 0; i < n; i++) {
arr[i] = Integer.parseInt(s2[i]);
}
for (int i = 0; i < q; i++) { //for each query
String[] s3 = br.readLine().split(" ");
int a = Integer.parseInt(s3[0]) - 1;
int b = Integer.parseInt(s3[1]) - 1;
int[] copy = new int[n - b + a - 1]; //this is so that the original array doesn't get messed up
int index = 0;
for (int j = 0; j < n; j++) { //filing the array without the elements of the query
if (j < a || j > b) {
copy[index] = arr[j];
index++;
}
}
int fin = gcd(copy);
System.out.println(fin);
}
}
}
private static int gcd(int a, int b) {
while (b > 0) {
int temp = b;
b = a % b; // % is remainder
a = temp;
}
return a;
}
private static int gcd(int[] input) { //simple GCD calculator using the fact that GCD(a,b,c) === GCD((a,b),c)
int result = input[0];
for (int i = 1; i < input.length; i++)
result = gcd(result, input[i]);
return result;
}
The problem is that I'm getting AC on some of the parts (6 out of 10), and a TLE on the rest. Can someone suggest a better method to solve this problem, as my approach seems too slow, and almost impossible to be optimized any further?

You can simply precompute the GCD of every prefix and every suffix. The elements remaining after a query are exactly a prefix plus a suffix, so each query needs a single GCD computation, i.e. O(log MAX_A) time. Here is my code:
import java.util.*;
import java.io.*;
public class Solution {
static int gcd(int a, int b) {
while (b != 0) {
int t = a;
a = b;
b = t % b;
}
return a;
}
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(
new InputStreamReader(System.in));
PrintWriter out = new PrintWriter(System.out);
int tests = Integer.parseInt(br.readLine());
for (int test = 0; test < tests; test++) {
String line = br.readLine();
String[] parts = line.split(" ");
int n = Integer.parseInt(parts[0]);
int q = Integer.parseInt(parts[1]);
int[] a = new int[n];
parts = br.readLine().split(" ");
for (int i = 0; i < n; i++)
a[i] = Integer.parseInt(parts[i]);
int[] gcdPrefix = new int[n];
int[] gcdSuffix = new int[n];
for (int i = 0; i < n; i++) {
gcdPrefix[i] = a[i];
if (i > 0)
gcdPrefix[i] = gcd(gcdPrefix[i], gcdPrefix[i - 1]);
}
for (int i = n - 1; i >= 0; i--) {
gcdSuffix[i] = a[i];
if (i < n - 1)
gcdSuffix[i] = gcd(gcdSuffix[i], gcdSuffix[i + 1]);
}
for (int i = 0; i < q; i++) {
parts = br.readLine().split(" ");
int left = Integer.parseInt(parts[0]);
int right = Integer.parseInt(parts[1]);
left--;
right--;
int res = 0;
if (left > 0)
res = gcd(res, gcdPrefix[left - 1]);
if (right < n - 1)
res = gcd(res, gcdSuffix[right + 1]);
out.println(res);
}
}
out.flush();
}
}

"Almost impossible to optimize further"? Pshaw:
1. Add a cache of computed GCDs of adjacent input elements so they don't need to be recomputed. For example, keep a table that holds the GCD of input[i] and input[j]. Note that this will be no more than half the size of the original input.
2. Compute the GCD of successive pairs of inputs (so you can take advantage of #1).
This could be extended to larger groups, at the cost of more space.

What is crucial here is that the GCD of a set of numbers A is equal to the GCD of the GCDs of any partition of A. For example,
GCD(16, 8, 24, 15, 20) = GCD(GCD(16, 8), GCD(24, 15, 20))
I would exploit this fact by building a tree-like structure. Let's write GCD[i, j] for the GCD of the set of elements with indices between i and j. For a given input of size n, I would store:
GCD[1, n]
GCD[1, n/2], GCD[n/2+1, n]
...
GCD[1, 2], GCD[3, 4] ... GCD[n-1, n]
That is, at every level of the tree the number of GCDs doubles and the size of the sets over which they are computed halves. Note that you will store n-1 numbers this way, so you need linear extra storage. Computing them bottom-up, you will need to do n-1 GCD operations as preprocessing.
For querying, you need to combine the GCDs such that exactly the two query indices are left out. As an example, let's take an array A with n = 8 and query (2, 4).
We cannot use GCD[1, 8], because we need to exclude 2 and 4, so we go one level deeper in the tree.
We cannot use GCD[1, 4], but we can use GCD[5, 8], because neither of the indices to exclude is in there. For the first half we need to go deeper.
We cannot use GCD[1, 2], nor GCD[3, 4], so we go one level deeper.
We simply use the elements A[1] and A[3].
We now need to compute the GCD of GCD[5, 8], A[1], and A[3]. For the query, we need to do only 2 GCD calculations, instead of 5 in the naive way.
In general, you will spend O(log n) time searching the structure and will need O(log n) GCD calculations per query.
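A minimal sketch of such a tree, written as an iterative segment tree (a common array layout, not necessarily the answerer's exact scheme). Since the question excludes the whole range a..b, a query here combines the GCD of everything before a with the GCD of everything after b. Unused leaves are padded with 0, because gcd(0, x) = x. GcdTree is a hypothetical name, and positive inputs are assumed.

```java
final class GcdTree {
    private final int size;
    private final int[] tree; // tree[1] is the root, leaves start at index size

    GcdTree(int[] values) {
        int s = 1;
        while (s < values.length) s *= 2;
        size = s;
        tree = new int[2 * s]; // unused leaves stay 0
        for (int i = 0; i < values.length; i++) tree[s + i] = values[i];
        // bottom-up preprocessing: one GCD per inner node
        for (int i = s - 1; i >= 1; i--) tree[i] = gcd(tree[2 * i], tree[2 * i + 1]);
    }

    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // GCD of values[from .. to - 1] (0-based); 0 for an empty range.
    int query(int from, int to) {
        int result = 0;
        for (int lo = from + size, hi = to + size; lo < hi; lo /= 2, hi /= 2) {
            if ((lo & 1) == 1) result = gcd(result, tree[lo++]);
            if ((hi & 1) == 1) result = gcd(result, tree[--hi]);
        }
        return result;
    }

    // GCD of all elements except positions a..b (1-based, inclusive), as in the question.
    int gcdExcluding(int a, int b) {
        return gcd(query(0, a - 1), query(b, size));
    }

    public static void main(String[] args) {
        GcdTree tree = new GcdTree(new int[] {16, 8, 24, 15, 20});
        System.out.println(tree.gcdExcluding(2, 3)); // 1
        System.out.println(tree.gcdExcluding(1, 3)); // 5
    }
}
```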

Constructor of a 2d array only contains the last value

I am reading in a file using a scanner. The file is formatted with the first line being the dimensions of the array. The next line contains the 1st row, next line the second, etc. Example file:
3 3
1 2 3
4 5 6
7 8 9
The problem I keep running into is that my array values all seem to be 9. The file contains ints, but I need them as doubles. I also have a lot of print statements in there, as I am trying to debug what is going on. My exception handling isn't finished; I will go back and beef it up after I can instantiate the array correctly. Any pointers would be appreciated, for example a better way of getting the dimensions and instantiating the array while opening the file only once.
Update: now I'm getting a NullPointerException
public class Help implements TopoMapInterface {
private String filename;
private File mapfile;
public double[][] baseMap;
public double[][] enMap;
public int enhancementLevel;
public Help(String filename) throws FileNotFoundException,
InvalidFileFormatException {
this.filename = filename;
System.out.println("Reading in file: " + filename);
String number = "";
int row = 0;
int col = 0;
int count = 0;
try {
Scanner inputFile = new Scanner(new File(filename));
while (inputFile.hasNextInt()) {
row = Integer.parseInt(inputFile.next());
col = Integer.parseInt(inputFile.next());
System.out.println("Row : " + row);
System.out.println("Col : " + col);
baseMap = new double[row][col];
System.out.println(baseMap[2][4]);
for (int i = 0; i < baseMap.length; i++){
for (int j = 0; j < baseMap[i].length; j++){
baseMap[i][j] = Double.parseDouble(inputFile.next());
}
}
}
} catch (Exception e) {
System.out.println(e.toString());
}

Assuming that your first line contains the number of rows first and then the number of columns, I would do this:
int row = Integer.parseInt(inputFile.next());
int col = Integer.parseInt(inputFile.next());
baseMap = new double[row][col];
for (int i = 0; i < row; i++) {
    for (int j = 0; j < col; j++) {
        baseMap[i][j] = Double.parseDouble(inputFile.next());
    }
}
This stores all your values as doubles, as you want, and I think it is an easier way to fill the array after reading from the file.
I hope I understood your question correctly!

Every single time you read in a number when count1 > 2, you're then iterating over your entire matrix and inserting doubleVal into every single cell; the last value you see is 9, and so that's what you have everywhere.
More generally, if you're guaranteed to have correct input (i.e., you have a school exercise like this one), then you shouldn't be reading your dimension specifications inside a loop; you should read the first two ints, create the array based on that size, and then use a nested for loop to insert the values into the array.
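The approach described above (read the two dimensions once, allocate, then fill with a nested loop) can be sketched as a small self-contained reader. It assumes well-formed input in the question's format; MatrixReader is a hypothetical name, and for a real file the Scanner would be created with new Scanner(new File(filename)) inside the same try-with-resources.

```java
import java.util.Arrays;
import java.util.Scanner;

final class MatrixReader {
    // Reads "rows cols" followed by rows*cols numbers, filling the array row by row.
    static double[][] read(Scanner in) {
        int rows = in.nextInt();
        int cols = in.nextInt();
        double[][] map = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                map[r][c] = in.nextDouble(); // each value goes into exactly one cell
            }
        }
        return map;
    }

    public static void main(String[] args) {
        try (Scanner in = new Scanner("3 3\n1 2 3\n4 5 6\n7 8 9")) {
            System.out.println(Arrays.deepToString(read(in)));
        }
    }
}
```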

for (int r = 0; r < baseMap.length; r++){
for (int c = 0; c < baseMap[r].length; c++){
baseMap[r][c] = doubleVal;
}
}
is where the problem lies.
This iterates through every row and column of the array, setting every element to the current number.
Since the last number you see is 9, that's what gets left in the array.
What you really need is some other way of keeping track of the row and column counts, rather than iterating through the array. Perhaps a pair of counters?
int intVal = Integer.parseInt(number);
double doubleVal = (double) intVal;
should probably be replaced with
double doubleVal = Double.parseDouble(number);
The file-reading improvements given by No Idea For Name would significantly improve this code, though using try-with-resources is a Java 7 and later only construct.
For earlier versions, upgrade if you can; otherwise, remember to close the resources yourself.

First of all, when using a Scanner or any other stream, you should close it at the end in a finally block:
Scanner inputFile = null;
try {
    inputFile = new Scanner(new File(filename));
    // read from inputFile here
}
catch (FileNotFoundException e) {
    e.printStackTrace();
}
finally {
    if (inputFile != null)
        inputFile.close();
}
This ensures that you release the scanner when you are done reading and don't hold it open into the next loop.
Also, in your code the second while loop does not seem to be closed in the right place, so this code is called on every iteration:
if (count1 > 2){
for (int r = 0; r < baseMap.length; r++){
for (int c = 0; c < baseMap[r].length; c++){
baseMap[r][c] = doubleVal;
}
}
Last thing: you are opening the file twice! There is no need for that. At the very minimum, you can change your code to:
public class Help implements TopoMapInterface {
    private String filename;
    private File mapfile;
    public double[][] baseMap;
    public double[][] enMap;
    public int enhancementLevel;

    public Help(String filename) throws FileNotFoundException,
            InvalidFileFormatException {
        this.filename = filename;
        System.out.println("Reading in file: " + filename);
        int row = 0;
        int col = 0;
        Scanner inputFile = null;
        try {
            // open the file only once
            inputFile = new Scanner(new File(filename));
            // the first two numbers are the dimensions
            row = Integer.parseInt(inputFile.next());
            col = Integer.parseInt(inputFile.next());
            baseMap = new double[row][col];
            // each following number goes into exactly one cell, row by row
            for (int r = 0; r < row; r++) {
                for (int c = 0; c < col; c++) {
                    baseMap[r][c] = Double.parseDouble(inputFile.next());
                }
            }
        } catch (Exception e) {
            System.out.println(e.toString());
        } finally {
            // release the scanner even if reading failed
            if (inputFile != null)
                inputFile.close();
        }
        try {
            System.out.println("Row = " + row + " Col = " + col);
            for (int r = 0; r < baseMap.length; r++) {
                for (int c = 0; c < baseMap[r].length; c++) {
                    System.out.print(baseMap[r][c] + " ");
                }
                System.out.println();
            }
        } catch (Exception e) {
            System.out.println(e.toString());
        }
    }
}

I am not sure why you are scanning the file two times, but if you are then here is the problem in your code
for (int r = 0; r < baseMap.length; r++){
for (int c = 0; c < baseMap[r].length; c++){
baseMap[r][c] = doubleVal;
}
}
Following code will solve your problem:
try {
Scanner inputFile = new Scanner(new File(filename));
int count1 = 0;
int r = -1, c = -1;
baseMap = new double[row][col];
while (inputFile.hasNextInt()) {
count1++;
number = inputFile.next();
int intVal = Integer.parseInt(number);
double doubleVal = (double) intVal;
if (count1 > 2){
int index = count1 - 3; // position among the matrix values (the first two ints are the dimensions)
r = index / col;
c = index % col;
baseMap[r][c] = doubleVal;
System.out.println(doubleVal+"*");
}
}
inputFile.close();
System.out.println("Row = " + row + " Col = " + col);
for (r = 0; r < baseMap.length; r++) {
for (c = 0; c < baseMap[r].length; c++) {
System.out.print(baseMap[r][c] + " ");
}
}
} catch (Exception e) {
System.out.println(e.toString());
}
Also, don't forget to put a break in your first scanning loop:
if (count == 1) {
row = Integer.parseInt(number);
} else if (count == 2) {
col = Integer.parseInt(number);
break;
}

Getting all combinations of values from many lists

I'm trying to generate all the combinations of elements based on a given string.
The string looks like this:
String result="1,2,3,###4,5,###6,###7,8,";
The number of elements between ### (separated by commas) is not fixed, and neither is the number of "lists" (the parts separated by ###).
NB: I use numbers in this example, but they can be strings too.
The expected result in this case is a string containing:
String result = "1467, 1468, 1567, 1568, 2467, 2468, 2567, 2568, 3467, 3468, 3567, 3568"
As you can see, each element of the result must start with an element of the first list, continue with an element of the second list, and so on.
So far I have written this algorithm, which works but is slow:
String [] parts = result.split("###");
if(parts.length>1){
result="";
String stack="";
int i;
String [] elmts2=null;
String [] elmts = parts[0].split(",");
for(String elmt : elmts){ //Browse root elements
if(elmt.trim().isEmpty())continue;
/**
* This array is used to store the next index to use for each row.
*/
int [] elmtIdxInPart= new int[parts.length];
//Loop until the root element index change.
while(elmtIdxInPart[0]==0){
stack=elmt;
//Add to the stack an element of each row, chosen by index (elmtIdxInPart)
for(i=1 ; i<parts.length;i++){
if(parts[i].trim().isEmpty() || parts[i].trim().equals(","))continue;
String part = parts[i];
elmts2 = part.split(",");
stack+=elmts2[elmtIdxInPart[i]];
}
//rollback i to previous used index
i--;
if(elmts2 == null){
elmtIdxInPart[0]=elmtIdxInPart[0]+1;
}
//Check if all elements in the row have been used.
else if(elmtIdxInPart[i]+1 >=elmts2.length || elmts2[elmtIdxInPart[i]+1].isEmpty()){
//Make evolve previous row that still have unused index
int j=1;
while(elmtIdxInPart[i-j]+1 >=parts[i-j].split(",").length ||
parts[i-j].split(",")[elmtIdxInPart[i-j]+1].isEmpty()){
if(j+1>i)break;
j++;
}
int next = elmtIdxInPart[i-j]+1;
//Init the next row to 0.
for(int k = (i-j)+1 ; k <elmtIdxInPart.length ; k++){
elmtIdxInPart[k]=0;
}
elmtIdxInPart[i-j]=next;
}
else{
//Make evolve index in current row, init the next row to 0.
int next = elmtIdxInPart[i]+1;
for(int k = (i+1) ; k <elmtIdxInPart.length ; k++){
elmtIdxInPart[k]=0;
}
elmtIdxInPart[i]=next;
}
//Store full stack
result+=stack+",";
}
}
}
else{
result=parts[0];
}
I'm looking for a more performant algorithm, if possible. I wrote this one from scratch without basing it on any known mathematical algorithm, so I think it is tricky and slow and can be improved.
Thanks for your suggestions and thanks for trying to understand what I've done :)
EDIT
Using Svinja's proposal, the execution time is halved:
StringBuilder res = new StringBuilder();
String input = "1,2,3,###4,5,###6,###7,8,";
String[] lists = input.split("###");
int N = lists.length;
int[] length = new int[N];
int[] indices = new int[N];
String[][] element = new String[N][];
for (int i = 0; i < N; i++){
element[i] = lists[i].split(",");
length[i] = element[i].length;
}
// solve
while (true)
{
// output current element
for (int i = 0; i < N; i++){
res.append(element[i][indices[i]]);
}
res.append(",");
// calculate next element
int ind = N - 1;
for (; ind >= 0; ind--)
if (indices[ind] < length[ind] - 1) break;
if (ind == -1) break;
indices[ind]++;
for (ind++; ind < N; ind++) indices[ind] = 0;
}
System.out.println(res);

This is my solution. It's in C# but you should be able to understand it (the important part is the "calculate next element" section):
static void Main(string[] args)
{
// parse the input, this can probably be done more efficiently
string input = "1,2,3,###4,5,###6,###7,8,";
string[] lists = input.Replace("###", "#").Split('#');
int N = lists.Length;
int[] length = new int[N];
int[] indices = new int[N];
for (int i = 0; i < N; i++)
length[i] = lists[i].Split(',').Length - 1;
string[][] element = new string[N][];
for (int i = 0; i < N; i++)
{
string[] list = lists[i].Split(',');
element[i] = new string[length[i]];
for (int j = 0; j < length[i]; j++)
element[i][j] = list[j];
}
// solve
while (true)
{
// output current element
for (int i = 0; i < N; i++) Console.Write(element[i][indices[i]]);
Console.WriteLine(" ");
// calculate next element
int ind = N - 1;
for (; ind >= 0; ind--)
if (indices[ind] < length[ind] - 1) break;
if (ind == -1) break;
indices[ind]++;
for (ind++; ind < N; ind++) indices[ind] = 0;
}
}
It seems similar to your solution. Does yours really perform badly? This seems clearly optimal to me, as the complexity is linear in the size of the output, and that is always optimal since the output has to be written anyway.
Edit: by "similar" I mean that you also seem to do the counting-with-indexes thing. Your code is too complicated for me to dig into after work. :D
My index adjustment works very simply: starting from the right, find the first index we can increase without overflowing, increase it by one, and set all the indexes to its right (if any) to 0. It's basically counting in a number system where each digit is in a different base. Once we can't even increase the first index any more (which means we can't increase any, as we started checking from the right), we're done.

Here is a somewhat different approach:
static void Main(string[] args)
{
string input = "1,2,3,###4,5,###6,###7,8,";
string[] lists = input.Replace("###", "#").Split('#');
int N = lists.Length;
int[] length = new int[N];
string[][] element = new string[N][];
int outCount = 1;
// get each string for each position
for (int i = 0; i < N; i++)
{
string list = lists[i];
// fix the extra comma at the end
if (list.Substring(list.Length - 1, 1) == ",")
list = list.Substring(0, list.Length - 1);
string[] strings = list.Split(',');
element[i] = strings;
length[i] = strings.Length;
outCount *= length[i];
}
// prepare the output array
string[] outstr = new string[outCount];
// produce all of the individual output strings
string[] position = new string[N];
for (int j = 0; j < outCount; j++)
{
// working value of j:
int k = j;
for (int i = 0; i < N; i++)
{
int c = length[i];
int q = k / c;
int r = k - (q * c);
k = q;
position[i] = element[i][r];
}
// combine the chars
outstr[j] = string.Join("", position);
}
// join all of the strings together
//(note: joining them all at once is much faster than doing it
//incrementally, if a mass concatenate facility is available
string result = string.Join(", ", outstr);
Console.Write(result);
}
I am not a Java programmer either, so I adapted Svinja's C# answer to my algorithm, assuming you can convert it to Java as well (thanks to Svinja).
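For readers who want the index-decoding approach in Java, here is a sketch assuming every list contains at least one element. One deliberate difference from the C# code above: it decodes j starting from the last list, so the last list varies fastest and the output order matches the expected result in the question. Combinations.combine is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class Combinations {
    // Builds every combination (one element from each "###"-separated list),
    // joined with ", ". The last list varies fastest, matching the question.
    static String combine(String input) {
        String[] lists = input.split("###");
        int n = lists.length;
        String[][] element = new String[n][];
        int total = 1;
        for (int i = 0; i < n; i++) {
            // String.split drops trailing empty strings, so "1,2,3," gives [1, 2, 3]
            element[i] = lists[i].split(",");
            total *= element[i].length;
        }
        List<String> out = new ArrayList<>(total);
        String[] parts = new String[n];
        for (int j = 0; j < total; j++) {
            // decode j as a mixed-radix number: list i is a digit in base element[i].length
            int k = j;
            for (int i = n - 1; i >= 0; i--) {
                int base = element[i].length;
                parts[i] = element[i][k % base];
                k /= base;
            }
            out.add(String.join("", parts));
        }
        return String.join(", ", out);
    }

    public static void main(String[] args) {
        System.out.println(combine("1,2,3,###4,5,###6,###7,8,"));
    }
}
```

Joining everything once at the end, as the C# answer also notes, avoids repeated string concatenation.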
