I'm trying to solve this problem:
Given an array of positive integers, and an integer Y, you are allowed to replace at most Y array-elements with lesser values. Your goal is for the array to end up with as large a subset of identical values as possible. Return the size of this largest subset.
The array is originally sorted in increasing order, but you do not need to preserve that property.
So, for example, if the array is [10,20,20,30,30,30,40,40,40] and Y = 3, the result should be 6, because you can get six 30s by replacing the three 40s with 30s. If the array is [20,20,20,40,50,50,50,50] and Y = 2, the result should be 5, because you can get five 20s by replacing two of the 50s with 20s.
Below is my solution, which I believe has O(n log n) time complexity (is that right?). Can this solution be optimized further?
Thanks in advance.
import java.util.*;
import java.util.stream.Collectors;
public class Nails {
public static int Solutions(int[] A, int Y) {
int N = A.length;
TreeMap < Integer, Integer > nailMap = new TreeMap < Integer, Integer > (Collections.reverseOrder());
for (int i = 0; i < N; i++) {
if (!nailMap.containsKey(A[i])) {
nailMap.put(A[i], 1);
} else {
nailMap.put(A[i], nailMap.get(A[i]) + 1);
}
}
List < Integer > nums = nailMap.values().stream().collect(Collectors.toList());
if (nums.size() == 1) {
return nums.get(0);
}
//else
int max = nums.get(0);
int longer = 0;
for (int j = 0; j < nums.size(); j++) {
int count = 0;
if (Y < longer) {
count = Y + nums.get(j);
} else {
count = longer + nums.get(j);
}
if (max < count) {
max = count;
}
longer += nums.get(j);
}
return max;
}
public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
while (scanner.hasNext()) {
String[] input = scanner.nextLine().replaceAll("\\[|\\]", "").split(",");
System.out.println(Arrays.toString(input));
int[] A = new int[input.length - 1];
int Y = Integer.parseInt(input[input.length - 1]);
for (int i = 0; i < input.length; i++) {
if (i < input.length - 1) {
A[i] = Integer.parseInt(input[i]);
} else {
break;
}
}
int result = Solutions(A, Y);
System.out.println(result);
}
}
}
A C++ implementation would look like the following, where A is the sorted array of pin sizes and K is the number of times the pins can be hammered:
{1,1,3,3,4,4,4,5,5}, K=2 should give 5 as the answer
{1,1,3,3,4,4,4,5,5,6,6,6,6,6,6}, K=2 should give 6 as the answer
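#include <algorithm>
#include <vector>
using namespace std;
int maxCount(const vector<int>& A, int K) {
int n = A.size();
int best = 0;
int count = 1;
// count runs of equal values among the first n-K elements only,
// so every counted element still has K elements after it
for (int i = 0; i < n-K-1; i++) {
if (A[i] == A[i + 1])
count = count + 1;
else
count = 1;
if (count > best)
best = count;
}
// clamp to n: if K >= n the loop never runs and best+K alone would exceed n
int result = min(n, max(best + K, K + 1));
return result;
}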
Since the array is sorted to begin with, a reasonably straightforward O(n) solution is, for each distinct value, to count how many elements have that value (by iteration) and how many elements have a greater value (by subtraction).
public static int doIt(final int[] array, final int y) {
int best = 0;
int start = 0;
while (start < array.length) {
int end = start;
while (end < array.length && array[end] == array[start]) {
++end;
}
// array[start .. (end-1)] is now the subarray consisting of a
// single value repeated (end-start) times.
best = Math.max(best, end - start + Math.min(y, array.length - end));
start = end; // skip to the next distinct value
}
assert best >= Math.min(y + 1, array.length); // sanity-check
return best;
}
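The value `doIt` maximizes can be written compactly. Writing c(v) for the number of elements equal to a distinct value v and g(v) for the number of elements greater than v (notation introduced here, not part of the answer above):

```latex
\text{answer} = \max_{v}\,\bigl(c(v) + \min(Y,\ g(v))\bigr)
```

For [10,20,20,30,30,30,40,40,40] with Y = 3 this gives 1+3 = 4 for v = 10, 2+3 = 5 for v = 20, 3+3 = 6 for v = 30 and 3+0 = 3 for v = 40, so the answer is 6, matching the example in the question.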
First, iterate through all the nails and create a hash H that stores the number of nails for each size. For [1,2,2,3,3,3,4,4,4], H should be:
size count
1 : 1
2 : 2
3 : 3
4 : 3
Now create a little algorithm to evaluate the best total for each size S, given Y (a size with no nails has H[S] = 0):
BestForSize(S, Y){
total = H[S]
while(Y > 0 and S < biggestNailSize){
S++
take = min(Y, H[S]) // hammer down at most Y nails of size S
total += take
Y -= take
}
return total;
}
Your answer should be max(BestForSize(0, Y), BestForSize(1, Y), ..., BestForSize(biggestNailSize, Y)).
The complexity is O(n²) in the number of sizes. A tip for optimizing it is to start from the end: once you have the best total for size 4, how can you reuse that work to find the best total for size 3?
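A minimal sketch of that tip in Java, assuming integer nail sizes: walk the sizes from largest to smallest and keep a running total of the nails bigger than the current size, so each size costs O(1) instead of a rescan. The class name `NailsFromTheEnd` and the method `largestGroup` are hypothetical, and a `TreeMap` stands in for the hash H so that the sizes come out in order (which makes the whole thing O(n log n)).

```java
import java.util.Map;
import java.util.TreeMap;

class NailsFromTheEnd {
    // Hypothetical helper: largest group of equal sizes after lowering at most y nails.
    static int largestGroup(int[] nails, int y) {
        // H: size -> number of nails of that size
        TreeMap<Integer, Integer> h = new TreeMap<>();
        for (int size : nails) {
            h.merge(size, 1, Integer::sum);
        }
        int best = 0;
        int bigger = 0; // nails strictly larger than the current size
        for (Map.Entry<Integer, Integer> e : h.descendingMap().entrySet()) {
            int count = e.getValue();
            // at most y of the bigger nails can be hammered down to this size
            best = Math.max(best, count + Math.min(y, bigger));
            // reuse the running total for the next smaller size
            bigger += count;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largestGroup(new int[]{1, 2, 2, 3, 3, 3, 4, 4, 4}, 3)); // prints 6
    }
}
```

The answer for size 3 is obtained from the answer for size 4 simply by adding H[4] to the running total `bigger`.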
Here is my Java implementation. First I build a reverse-ordered map from each integer to its number of occurrences; for example, {1,1,1,1,3,3,4,4,5,5} gives {5=2, 4=2, 3=2, 1=4}. Then, for each integer, I calculate the maximum number of occurrences we can get for it, given K and the occurrences of the larger integers in the array.
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
public static int ourFunction(final int[] A, final int K) {
int length = A.length;
int a = 0;
int result = 0;
int b = 0;
int previousValue = 0;
TreeMap < Integer, Integer > ourMap = new TreeMap < Integer, Integer > (Collections.reverseOrder());
for (int i = 0; i < length; i++) {
if (!ourMap.containsKey(A[i])) {
ourMap.put(A[i], 1);
} else {
ourMap.put(A[i], ourMap.get(A[i]) + 1);
}
}
for (Map.Entry<Integer, Integer> entry : ourMap.entrySet()) {
if( a == 0) {
a++;
result = entry.getValue();
previousValue = entry.getValue();
} else {
if( K < previousValue)
b = K;
else
b = previousValue;
if ( b + entry.getValue() > result )
result = b + entry.getValue();
previousValue += entry.getValue();
}
}
return result;
}
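Since the array is sorted, we can have an O(n) solution by iterating and checking whether the current element is equal to the previous one, keeping track of the best run length plus the number of later elements that can be lowered.
static int findMax(int[] a, int y) {
int n = a.length, current = 0, max = 0;
for (int i = 0; i < n; i++) {
// length of the run of equal values ending at index i
current = (i > 0 && a[i] == a[i - 1]) ? current + 1 : 1;
// the n-i-1 later elements are never smaller, so up to y of them can join the group
int diff = Math.min(y, n - i - 1);
max = Math.max(max, current + diff);
}
return max;
}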
If the given array is not sorted, sort it first:
import java.util.Arrays;
import java.util.List;
public static int findMax(int[] A, int K) {
// sort a copy so the caller's array is left untouched
int[] sorted = A.clone();
Arrays.sort(sorted);
int n = sorted.length, current = 0, max = 0;
for (int i = 0; i < n; i++) {
current = (i > 0 && sorted[i] == sorted[i - 1]) ? current + 1 : 1;
int diff = Math.min(K, n - i - 1);
max = Math.max(max, current + diff);
}
return max;
}
public static void main(String args[]) {
List<Integer> A = Arrays.asList(3,1,5,3,4,4,3,3,5,5,5,1);
int[] Al = A.stream().mapToInt(Integer::intValue).toArray();
int result=findMax(Al, 5);
System.out.println(result);
}
The task is:
A non-empty zero-indexed string S is given. String S consists of N characters from the set of upper-case English letters A, C, G, T.
This string actually represents a DNA sequence, and the upper-case letters represent single nucleotides.
You are also given non-empty zero-indexed arrays P and Q consisting of M integers. These arrays represent queries about minimal nucleotides. We represent the letters of string S as integers 1, 2, 3, 4 in arrays P and Q, where A = 1, C = 2, G = 3, T = 4, and we assume that A < C < G < T.
Query K requires you to find the minimal nucleotide from the range (P[K], Q[K]), 0 ≤ P[K] ≤ Q[K] < N.
For example, consider string S = GACACCATA and arrays P, Q such that:
P[0] = 0 Q[0] = 8
P[1] = 0 Q[1] = 2
P[2] = 4 Q[2] = 5
P[3] = 7 Q[3] = 7
The minimal nucleotides from these ranges are as follows:
(0, 8) is A identified by 1,
(0, 2) is A identified by 1,
(4, 5) is C identified by 2,
(7, 7) is T identified by 4.
Write a function:
class Solution { public int[] solution(String S, int[] P, int[] Q); }
that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
The sequence should be returned as:
a Results structure (in C), or
a vector of integers (in C++), or
a Results record (in Pascal), or
an array of integers (in any other programming language).
For example, given the string S = GACACCATA and arrays P, Q such that:
P[0] = 0 Q[0] = 8
P[1] = 0 Q[1] = 2
P[2] = 4 Q[2] = 5
P[3] = 7 Q[3] = 7
the function should return the values [1, 1, 2, 4], as explained above.
Assume that:
N is an integer within the range [1..100,000];
M is an integer within the range [1..50,000];
each element of array P, Q is an integer within the range [0..N − 1];
P[i] ≤ Q[i];
string S consists only of upper-case English letters A, C, G, T.
Complexity:
expected worst-case time complexity is O(N+M);
expected worst-case space complexity is O(N) (not counting the storage required for input arguments).
Elements of input arrays can be modified.
My solution is:
class Solution {
public int[] solution(String S, int[] P, int[] Q) {
final char c[] = S.toCharArray();
final int answer[] = new int[P.length];
int tempAnswer;
char tempC;
for (int iii = 0; iii < P.length; iii++) {
tempAnswer = 4;
for (int zzz = P[iii]; zzz <= Q[iii]; zzz++) {
tempC = c[zzz];
if (tempC == 'A') {
tempAnswer = 1;
break;
} else if (tempC == 'C') {
if (tempAnswer > 2) {
tempAnswer = 2;
}
} else if (tempC == 'G') {
if (tempAnswer > 3) {
tempAnswer = 3;
}
}
}
answer[iii] = tempAnswer;
}
return answer;
}
}
It is not optimal: in the worst case it scans the whole range for every query, which is O(N*M). I believe it's supposed to be done within one loop; any hints on how I can achieve that?
You can check the quality of your solution at https://codility.com/train/; the test name is Genomic-range-query.
Here is a solution that got 100 out of 100 on codility.com. Please read about prefix sums to understand it:
public static int[] solveGenomicRange(String S, int[] P, int[] Q) {
//used jagged array to hold the prefix sums of each A, C and G genoms
//we don't need to get prefix sums of T, you will see why.
int[][] genoms = new int[3][S.length()+1];
//if the char is found in the index i, then we set it to be 1 else they are 0
//3 short values are needed for this reason
short a, c, g;
for (int i=0; i<S.length(); i++) {
a = 0; c = 0; g = 0;
if ('A' == (S.charAt(i))) {
a=1;
}
if ('C' == (S.charAt(i))) {
c=1;
}
if ('G' == (S.charAt(i))) {
g=1;
}
//here we calculate prefix sums. To learn about prefix sums, see https://codility.com/media/train/3-PrefixSums.pdf
genoms[0][i+1] = genoms[0][i] + a;
genoms[1][i+1] = genoms[1][i] + c;
genoms[2][i+1] = genoms[2][i] + g;
}
int[] result = new int[P.length];
//here we go through the provided P[] and Q[] arrays as intervals
for (int i=0; i<P.length; i++) {
int fromIndex = P[i];
//we need to add 1 to Q[i],
//because our genoms[0][0], genoms[1][0] and genoms[2][0]
//have 0 values by default, look above genoms[0][i+1] = genoms[0][i] + a;
int toIndex = Q[i]+1;
if (genoms[0][toIndex] - genoms[0][fromIndex] > 0) {
result[i] = 1;
} else if (genoms[1][toIndex] - genoms[1][fromIndex] > 0) {
result[i] = 2;
} else if (genoms[2][toIndex] - genoms[2][fromIndex] > 0) {
result[i] = 3;
} else {
result[i] = 4;
}
}
return result;
}
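For readers new to prefix sums, here is a minimal Java sketch of the building block `solveGenomicRange` relies on, reduced to a single nucleotide. The class and method names (`PrefixCountDemo`, `prefixCounts`, `countInRange`) are hypothetical and only for illustration:

```java
class PrefixCountDemo {
    // prefix[k] = number of occurrences of `letter` in s[0..k-1]; prefix[0] = 0
    static int[] prefixCounts(String s, char letter) {
        int[] prefix = new int[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            prefix[i + 1] = prefix[i] + (s.charAt(i) == letter ? 1 : 0);
        }
        return prefix;
    }

    // occurrences of the letter in the inclusive range [from, to], in O(1)
    static int countInRange(int[] prefix, int from, int to) {
        return prefix[to + 1] - prefix[from];
    }

    public static void main(String[] args) {
        int[] a = prefixCounts("GACACCATA", 'A');
        System.out.println(countInRange(a, 4, 5)); // prints 0: "CC" has no A
        System.out.println(countInRange(a, 0, 8)); // prints 4
    }
}
```

`solveGenomicRange` keeps three such arrays (A, C, G) and checks them in order of impact factor; T needs no array because it is the fallback when none of the three occurs in the range.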
Simple, elegant, domain specific, 100/100 solution in JS with comments!
function solution(S, P, Q) {
var N = S.length, M = P.length;
// dictionary to map nucleotide to impact factor
var impact = {A : 1, C : 2, G : 3, T : 4};
// nucleotide total count in DNA
var currCounter = {A : 0, C : 0, G : 0, T : 0};
// how many times nucleotide repeats at the moment we reach S[i]
var counters = [];
// result
var minImpact = [];
var i;
// count nucleotides
for(i = 0; i <= N; i++) {
counters.push({A: currCounter.A, C: currCounter.C, G: currCounter.G});
currCounter[S[i]]++;
}
// for every query
for(i = 0; i < M; i++) {
var from = P[i], to = Q[i] + 1;
// compare the count of A at the start of the query with the count at the end of the query
// if the counter changed, the query contains A
if(counters[to].A - counters[from].A > 0) {
minImpact.push(impact.A);
}
// same things for C and others nucleotides with higher impact factor
else if(counters[to].C - counters[from].C > 0) {
minImpact.push(impact.C);
}
else if(counters[to].G - counters[from].G > 0) {
minImpact.push(impact.G);
}
else { // one of the counters MUST have changed, so it's T
minImpact.push(impact.T);
}
}
return minImpact;
}
Java, 100/100, but with no cumulative/prefix sums! I stash the last occurrence index of each of the lower three nucleotides (A, C, G) in an array "map". Later I check whether that last index lies between P and Q. If so, the query returns that nucleotide; if none is found, it's the top one (T):
class Solution {
int[][] lastOccurrencesMap;
public int[] solution(String S, int[] P, int[] Q) {
int N = S.length();
int M = P.length;
int[] result = new int[M];
lastOccurrencesMap = new int[3][N];
int lastA = -1;
int lastC = -1;
int lastG = -1;
for (int i = 0; i < N; i++) {
char c = S.charAt(i);
if (c == 'A') {
lastA = i;
} else if (c == 'C') {
lastC = i;
} else if (c == 'G') {
lastG = i;
}
lastOccurrencesMap[0][i] = lastA;
lastOccurrencesMap[1][i] = lastC;
lastOccurrencesMap[2][i] = lastG;
}
for (int i = 0; i < M; i++) {
int startIndex = P[i];
int endIndex = Q[i];
int minimum = 4;
for (int n = 0; n < 3; n++) {
int lastOccurence = getLastNucleotideOccurrence(startIndex, endIndex, n);
if (lastOccurence != 0) {
minimum = n + 1;
break;
}
}
result[i] = minimum;
}
return result;
}
int getLastNucleotideOccurrence(int startIndex, int endIndex, int nucleotideIndex) {
int[] lastOccurrences = lastOccurrencesMap[nucleotideIndex];
int endValueLastOccurenceIndex = lastOccurrences[endIndex];
if (endValueLastOccurenceIndex >= startIndex) {
return nucleotideIndex + 1;
} else {
return 0;
}
}
}
Here is the solution, in case anyone is still interested.
class Solution {
public int[] solution(String S, int[] P, int[] Q) {
int[] answer = new int[P.length];
char[] chars = S.toCharArray();
int[][] cumulativeAnswers = new int[4][chars.length + 1];
for (int iii = 0; iii < chars.length; iii++) {
if (iii > 0) {
for (int zzz = 0; zzz < 4; zzz++) {
cumulativeAnswers[zzz][iii + 1] = cumulativeAnswers[zzz][iii];
}
}
switch (chars[iii]) {
case 'A':
cumulativeAnswers[0][iii + 1]++;
break;
case 'C':
cumulativeAnswers[1][iii + 1]++;
break;
case 'G':
cumulativeAnswers[2][iii + 1]++;
break;
case 'T':
cumulativeAnswers[3][iii + 1]++;
break;
}
}
for (int iii = 0; iii < P.length; iii++) {
for (int zzz = 0; zzz < 4; zzz++) {
if ((cumulativeAnswers[zzz][Q[iii] + 1] - cumulativeAnswers[zzz][P[iii]]) > 0) {
answer[iii] = zzz + 1;
break;
}
}
}
return answer;
}
}
In case anyone cares about C:
#include <stdlib.h>
#include <string.h>
/* struct Results (with fields A and M) is assumed to be declared by the Codility C template. */
struct Results solution(char *S, int P[], int Q[], int M) {
int i, a, b, N, *pA, *pC, *pG;
struct Results result;
result.A = malloc(sizeof(int) * M);
result.M = M;
// prefix sums with a leading 0: pX[k] is the number of X's in S[0..k-1]
N = strlen(S);
pA = calloc(N + 1, sizeof(int));
pC = calloc(N + 1, sizeof(int));
pG = calloc(N + 1, sizeof(int));
for (i = 0; i < N; i++) {
pA[i + 1] = pA[i] + (S[i] == 'A' ? 1 : 0);
pC[i + 1] = pC[i] + (S[i] == 'C' ? 1 : 0);
pG[i + 1] = pG[i] + (S[i] == 'G' ? 1 : 0);
}
for (i = 0; i < M; i++) {
// occurrences of X in S[P[i]..Q[i]] are pX[Q[i] + 1] - pX[P[i]]
a = P[i];
b = Q[i] + 1;
if ((pA[b] - pA[a]) > 0) {
result.A[i] = 1;
} else if ((pC[b] - pC[a]) > 0) {
result.A[i] = 2;
} else if ((pG[b] - pG[a]) > 0) {
result.A[i] = 3;
} else {
result.A[i] = 4;
}
}
free(pA);
free(pC);
free(pG);
return result;
}
Here is my solution using a segment tree: O(N) to build the tree plus O(log N) per query, i.e. O(N + M log N) time in total.
public class DNAseq {
public static void main(String[] args) {
String S="CAGCCTA";
int[] P={2, 5, 0};
int[] Q={4, 5, 6};
int [] results=solution(S,P,Q);
System.out.println(results[0]);
}
static class segmentNode{
int l;
int r;
int min;
segmentNode left;
segmentNode right;
}
public static segmentNode buildTree(int[] arr,int l,int r){
if(l==r){
segmentNode n=new segmentNode();
n.l=l;
n.r=r;
n.min=arr[l];
return n;
}
int mid=l+(r-l)/2;
segmentNode le=buildTree(arr,l,mid);
segmentNode re=buildTree(arr,mid+1,r);
segmentNode root=new segmentNode();
root.left=le;
root.right=re;
root.l=le.l;
root.r=re.r;
root.min=Math.min(le.min,re.min);
return root;
}
public static int getMin(segmentNode root,int l,int r){
if(root.l>r || root.r<l){
return Integer.MAX_VALUE;
}
if(root.l>=l&& root.r<=r) {
return root.min;
}
return Math.min(getMin(root.left,l,r),getMin(root.right,l,r));
}
public static int[] solution(String S, int[] P, int[] Q) {
int[] arr=new int[S.length()];
for(int i=0;i<S.length();i++){
switch (S.charAt(i)) {
case 'A':
arr[i]=1;
break;
case 'C':
arr[i]=2;
break;
case 'G':
arr[i]=3;
break;
case 'T':
arr[i]=4;
break;
default:
break;
}
}
segmentNode root=buildTree(arr,0,S.length()-1);
int[] result=new int[P.length];
for(int i=0;i<P.length;i++){
result[i]=getMin(root,P[i],Q[i]);
}
return result;
} }
Here is a C# solution. The basic idea is pretty much the same as in the other answers, but it may be cleaner:
using System;
class Solution
{
public int[] solution(string S, int[] P, int[] Q)
{
int N = S.Length;
int M = P.Length;
char[] chars = {'A','C','G','T'};
//Calculate accumulates
int[,] accum = new int[3, N+1];
for (int i = 0; i <= 2; i++)
{
for (int j = 0; j < N; j++)
{
if(S[j] == chars[i]) accum[i, j+1] = accum[i, j] + 1;
else accum[i, j+1] = accum[i, j];
}
}
//Get minimal nucleotides for the given ranges
int diff;
int[] minimums = new int[M];
for (int i = 0; i < M; i++)
{
minimums[i] = 4;
for (int j = 0; j <= 2; j++)
{
diff = accum[j, Q[i]+1] - accum[j, P[i]];
if (diff > 0)
{
minimums[i] = j+1;
break;
}
}
}
return minimums;
}
}
Here is my solution; it scored 100%. Of course, I first needed to study prefix sums a little.
public int[] solution(String S, int[] P, int[] Q){
int[] result = new int[P.length];
int[] factor1 = new int[S.length()];
int[] factor2 = new int[S.length()];
int[] factor3 = new int[S.length()];
int[] factor4 = new int[S.length()];
int factor1Sum = 0;
int factor2Sum = 0;
int factor3Sum = 0;
int factor4Sum = 0;
for(int i=0; i<S.length(); i++){
switch (S.charAt(i)) {
case 'A':
factor1Sum++;
break;
case 'C':
factor2Sum++;
break;
case 'G':
factor3Sum++;
break;
case 'T':
factor4Sum++;
break;
default:
break;
}
factor1[i] = factor1Sum;
factor2[i] = factor2Sum;
factor3[i] = factor3Sum;
factor4[i] = factor4Sum;
}
for(int i=0; i<P.length; i++){
int start = P[i];
int end = Q[i];
if(start == 0){
if(factor1[end] > 0){
result[i] = 1;
}else if(factor2[end] > 0){
result[i] = 2;
}else if(factor3[end] > 0){
result[i] = 3;
}else{
result[i] = 4;
}
}else{
if(factor1[end] > factor1[start-1]){
result[i] = 1;
}else if(factor2[end] > factor2[start-1]){
result[i] = 2;
}else if(factor3[end] > factor3[start-1]){
result[i] = 3;
}else{
result[i] = 4;
}
}
}
return result;
}
If someone is still interested in this exercise, I share my Python solution (100/100 in Codility)
def solution(S, P, Q):
    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))
    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + ( i =='A')
        count[1][index+1] = count[1][index] + ( i =='C')
        count[2][index+1] = count[2][index] + ( i =='G')
    result = []
    for i in range(len(P)):
        start = P[i]
        end = Q[i]+1
        if count[0][end] - count[0][start]:
            result.append(1)
        elif count[1][end] - count[1][start]:
            result.append(2)
        elif count[2][end] - count[2][start]:
            result.append(3)
        else:
            result.append(4)
    return result
This is my JavaScript solution that got 100% across the board on Codility:
function solution(S, P, Q) {
let total = [];
let min;
for (let i = 0; i < P.length; i++) {
const substring = S.slice(P[i], Q[i] + 1);
if (substring.includes('A')) {
min = 1;
} else if (substring.includes('C')) {
min = 2;
} else if (substring.includes('G')) {
min = 3;
} else if (substring.includes('T')) {
min = 4;
}
total.push(min);
}
return total;
}
import java.util.Arrays;
import java.util.HashMap;
class Solution {
static HashMap<Character, Integer > characterMapping = new HashMap<Character, Integer>(){{
put('A',1);
put('C',2);
put('G',3);
put('T',4);
}};
public static int minimum(int[] arr) {
if (arr.length ==1) return arr[0];
int smallestIndex = 0;
for (int index = 0; index<arr.length; index++) {
if (arr[index]<arr[smallestIndex]) smallestIndex=index;
}
return arr[smallestIndex];
}
public int[] solution(String S, int[] P, int[] Q) {
final char[] characterInput = S.toCharArray();
final int[] integerInput = new int[characterInput.length];
for(int counter=0; counter < characterInput.length; counter++) {
integerInput[counter] = characterMapping.get(characterInput[counter]);
}
int[] result = new int[P.length];
//assuming P and Q have the same length
for(int index =0; index<P.length; index++) {
if (P[index]==Q[index]) {
result[index] = integerInput[P[index]];
continue; // answer already known, go to the next query
}
final int[] subArray = Arrays.copyOfRange(integerInput, P[index], Q[index]+1);
final int minimumValue = minimum(subArray);
result[index]= minimumValue;
}
return result;
}
}
Here's a 100% Scala solution:
def solution(S: String, P: Array[Int], Q: Array[Int]): Array[Int] = {
val resp = for(ind <- 0 to P.length-1) yield {
val sub= S.substring(P(ind),Q(ind)+1)
var factor = 4
if(sub.contains("A")) {factor=1}
else{
if(sub.contains("C")) {factor=2}
else{
if(sub.contains("G")) {factor=3}
}
}
factor
}
return resp.toArray
}
And performance: https://codility.com/demo/results/trainingEUR4XP-425/
Hope this helps.
public int[] solution(String S, int[] P, int[] K) {
// write your code in Java SE 8
char[] sc = S.toCharArray();
int[] A = new int[sc.length];
int[] G = new int[sc.length];
int[] C = new int[sc.length];
int prevA =-1,prevG=-1,prevC=-1;
for(int i=0;i<sc.length;i++){
if(sc[i]=='A')
prevA=i;
else if(sc[i] == 'G')
prevG=i;
else if(sc[i] =='C')
prevC=i;
A[i] = prevA;
G[i] = prevG;
C[i] = prevC;
//System.out.println(A[i]+ " "+G[i]+" "+C[i]);
}
int[] result = new int[P.length];
for(int i=0;i<P.length;i++){
//System.out.println(A[P[i]]+ " "+A[K[i]]+" "+C[P[i]]+" "+C[K[i]]+" "+P[i]+" "+K[i]);
if(A[K[i]] >=P[i] && A[K[i]] <=K[i]){
result[i] =1;
}
else if(C[K[i]] >=P[i] && C[K[i]] <=K[i]){
result[i] =2;
}else if(G[K[i]] >=P[i] && G[K[i]] <=K[i]){
result[i] =3;
}
else{
result[i]=4;
}
}
return result;
}
Python Solution with explanation
The idea is to keep an auxiliary array per nucleotide X in which position i holds how many times X occurs among the first i characters of S (position 0 is always 0). The number of occurrences of X from position f to position t, inclusive, is then:
aux(t + 1) - aux(f)
Each query therefore needs only a constant number of lookups, so the time complexity is:
O(N+M)
def solution(S, P, Q):
    n = len(S)
    m = len(P)
    aux = [[0 for i in range(n+1)] for i in [0,1,2]]
    for i,c in enumerate(S):
        aux[0][i+1] = aux[0][i] + ( c == 'A' )
        aux[1][i+1] = aux[1][i] + ( c == 'C' )
        aux[2][i+1] = aux[2][i] + ( c == 'G' )
    result = []
    for i in range(m):
        fromIndex , toIndex = P[i] , Q[i] +1
        if aux[0][toIndex] - aux[0][fromIndex] > 0:
            r = 1
        elif aux[1][toIndex] - aux[1][fromIndex] > 0:
            r = 2
        elif aux[2][toIndex] - aux[2][fromIndex] > 0:
            r = 3
        else:
            r = 4
        result.append(r)
    return result
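As a worked check with the example from the task statement (S = GACACCATA), using the code's fromIndex = P[i] and toIndex = Q[i] + 1, the A and C rows of aux and the query (4, 5) evaluate as follows:

```latex
\begin{aligned}
\mathrm{aux}_A &= [0,0,1,1,2,2,2,3,3,4] \\
\mathrm{aux}_C &= [0,0,0,1,1,2,3,3,3,3] \\
(P,Q) = (4,5):\quad \mathrm{aux}_A[6] - \mathrm{aux}_A[4] &= 2 - 2 = 0 \\
\mathrm{aux}_C[6] - \mathrm{aux}_C[4] &= 3 - 1 = 2 > 0 \;\Rightarrow\; \text{answer } 2
\end{aligned}
```

The range S[4..5] is "CC", so there is no A but there is a C, which matches the expected answer 2 for that query.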
This is a Swift 4 solution to the same problem. It is based on @codebusta's solution above:
public func solution(_ S : inout String, _ P : inout [Int], _ Q : inout [Int]) -> [Int] {
var impacts = [Int]()
var prefixSum = [[Int]]()
for _ in 0..<3 {
let array = Array(repeating: 0, count: S.count + 1)
prefixSum.append(array)
}
for (index, character) in S.enumerated() {
var a = 0
var c = 0
var g = 0
switch character {
case "A":
a = 1
case "C":
c = 1
case "G":
g = 1
default:
break
}
prefixSum[0][index + 1] = prefixSum[0][index] + a
prefixSum[1][index + 1] = prefixSum[1][index] + c
prefixSum[2][index + 1] = prefixSum[2][index] + g
}
for tuple in zip(P, Q) {
if prefixSum[0][tuple.1 + 1] - prefixSum[0][tuple.0] > 0 {
impacts.append(1)
}
else if prefixSum[1][tuple.1 + 1] - prefixSum[1][tuple.0] > 0 {
impacts.append(2)
}
else if prefixSum[2][tuple.1 + 1] - prefixSum[2][tuple.0] > 0 {
impacts.append(3)
}
else {
impacts.append(4)
}
}
return impacts
}
Here is a Python solution with a little explanation; I hope it helps someone.
Python codility 100%
def solution(S, P, Q):
    """
    https://app.codility.com/demo/results/training8QBVFJ-EQB/
    100%
    Idea: treat the sequence as a one-dimensional array and use prefix sums,
    i.e. for each of A, C and G store how often it has occurred up to every position.
    Example -
    # [0, 0, 1, 1, 1, 1, 1, 2] - prefix sum of A - A occurs 2 times in total
    # [0, 1, 1, 1, 2, 3, 3, 3] - prefix sum of C - C occurs 3 times in total
    # [0, 0, 0, 1, 1, 1, 1, 1] - prefix sum of G - G occurs 1 time in total
    # To answer a query, take the difference of the prefix sums at the two ends of the range
    S = CAGCCTA
    P[0] = 2 Q[0] = 4
    P[1] = 5 Q[1] = 5
    P[2] = 0 Q[2] = 6
    Given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting
    of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
    The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
    The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
    The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.
    N is an integer within the range [1..100,000];
    M is an integer within the range [1..50,000];
    each element of arrays P, Q is an integer within the range [0..N − 1];
    P[K] ≤ Q[K], where 0 ≤ K < M;
    string S consists only of upper-case English letters A, C, G, T.
    Ref - https://github.com/ghanan94/codility-lesson-solutions/blob/master/Lesson%2005%20-%20Prefix%20Sums/PrefixSums.pdf
    :return: return the values [2, 4, 1]
    """
    # 2-D array with 3 rows (A, C, G), each of length len(S) + 1;
    # T needs no row because it is the fallback in the else branch below
    prefix_sum_two_d_array = [[0 for i in range(len(S) + 1)] for j in range(3)]
    # compute the prefix sums of A, C and G over the sequence
    for i, nucleotide in enumerate(S):
        # (nucleotide == 'A') counts as 1 if true, 0 if false
        prefix_sum_two_d_array[0][i + 1] = prefix_sum_two_d_array[0][i] + (nucleotide == 'A')
        prefix_sum_two_d_array[1][i + 1] = prefix_sum_two_d_array[1][i] + (nucleotide == 'C')
        prefix_sum_two_d_array[2][i + 1] = prefix_sum_two_d_array[2][i] + (nucleotide == 'G')
    # now answer each query from the prefix sums
    query_answers = []
    for position in range(len(P)):
        # start index from P, end index (exclusive) from Q
        start_index = P[position]
        end_index = Q[position] + 1
        # a non-zero difference means the nucleotide occurs in the range
        if prefix_sum_two_d_array[0][end_index] - prefix_sum_two_d_array[0][start_index]:
            query_answers.append(1)
        elif prefix_sum_two_d_array[1][end_index] - prefix_sum_two_d_array[1][start_index]:
            query_answers.append(2)
        elif prefix_sum_two_d_array[2][end_index] - prefix_sum_two_d_array[2][start_index]:
            query_answers.append(3)
        else:
            query_answers.append(4)
    return query_answers
result = solution("CAGCCTA", [2, 5, 0], [4, 5, 6])
print("Sol " + str(result))
# Sol [2, 4, 1]
My 100% JavaScript solution with O(N + M) time complexity and no use of advanced built-in methods such as .includes, .substring, etc:
function solution(S, P, Q) {
// initialize prefix sums for A, C, G (you don't need T)
const A = [0];
const C = [0];
const G = [0];
// calculate prefix sums for A, C, G
for (let i = 0, len = S.length; i < len; i++) {
A.push(A[i] + Number("A" === S[i]));
C.push(C[i] + Number("C" === S[i]));
G.push(G[i] + Number("G" === S[i]));
}
// calculate the result using prefix sums
const result = [];
for (let i = 0, len = P.length; i < len; i++) {
const from = P[i];
const to = Q[i] + 1;
if (A[to] - A[from] > 0) {
result.push(1);
} else if (C[to] - C[from] > 0) {
result.push(2);
} else if (G[to] - G[from] > 0) {
result.push(3);
} else {
result.push(4); // this is why you don't need T
}
}
return result;
}
pshemek's solution stays within the required space complexity (O(N)), even with the 2-D array and the answer array, because the 2-D array has a constant number (4) of rows. It also meets the required time complexity, whereas mine is O(N^2), though in practice mine does less work because it stops scanning a range as soon as it reaches the minimal value.
I gave it a try, but mine ends up using more space (an N x N table, i.e. O(N^2) memory); still, it makes more intuitive sense to me (C#):
using System;
using System.Collections.Generic;
public static int[] solution(String S, int[] P, int[] Q)
{
const int MinValue = 1;
Dictionary<char, int> stringValueTable = new Dictionary<char,int>(){ {'A', 1}, {'C', 2}, {'G', 3}, {'T', 4} };
char[] inputArray = S.ToCharArray();
// minRangeTable[x, y] holds the minimal impact factor of S[x..y], where x is the start index and y the end index;
// a 0 entry means the range contains an 'A', i.e. the minimum is MinValue
int[,] minRangeTable = new int[S.Length, S.Length];
for (int startIndex = 0; startIndex < S.Length; ++startIndex)
{
int currentMinValue = 4;
for (int endIndex = startIndex; endIndex < S.Length; ++endIndex)
{
int currentValue = stringValueTable[inputArray[endIndex]];
if (currentValue < currentMinValue)
currentMinValue = currentValue;
if (currentMinValue == MinValue) // We can stop iterating: every longer range from startIndex also contains this 'A', so its entries stay 0
break;
minRangeTable[startIndex, endIndex] = currentMinValue;
}
}
int[] result = new int[P.Length];
for (int outputIndex = 0; outputIndex < result.Length; ++outputIndex)
{
result[outputIndex] = minRangeTable[P[outputIndex], Q[outputIndex]];
if (result[outputIndex] == 0) // We could avoid this if we initialized our 2-d array with 1's
result[outputIndex] = MinValue;
}
return result;
}
In pshemek's answer, the "trick" in the second loop is simply that once you have determined that a range contains the minimal value, you don't need to continue iterating. Not sure if that helps.
A PHP 100/100 solution:
function solution($S, $P, $Q) {
$S = str_split($S);
$len = count($S);
$lep = count($P);
$arr = array();
$result = array();
$clone = array_fill(0, 4, 0);
for($i = 0; $i < $len; $i++){
$arr[$i] = $clone;
switch($S[$i]){
case 'A':
$arr[$i][0] = 1;
break;
case 'C':
$arr[$i][1] = 1;
break;
case 'G':
$arr[$i][2] = 1;
break;
default:
$arr[$i][3] = 1;
break;
}
}
for($i = 1; $i < $len; $i++){
for($j = 0; $j < 4; $j++){
$arr[$i][$j] += $arr[$i - 1][$j];
}
}
for($i = 0; $i < $lep; $i++){
$x = $P[$i];
$y = $Q[$i];
for($a = 0; $a < 4; $a++){
$sub = 0;
if($x - 1 >= 0){
$sub = $arr[$x - 1][$a];
}
if($arr[$y][$a] - $sub > 0){
$result[$i] = $a + 1;
break;
}
}
}
return $result;
}
This program scored 100, and performance-wise it has an edge over the other Java solutions listed above!
The code can be found here.
import java.util.Arrays;
public class GenomicRange {
final int Index_A=0, Index_C=1, Index_G=2, Index_T=3;
final int A=1, C=2, G=3, T=4;
public static void main(String[] args) {
GenomicRange gen = new GenomicRange();
int[] M = gen.solution( "GACACCATA", new int[] { 0,0,4,7 } , new int[] { 8,2,5,7 } );
System.out.println(Arrays.toString(M));
}
public int[] solution(String S, int[] P, int[] Q) {
int[] M = new int[P.length];
char[] charArr = S.toCharArray();
int[][] occCount = new int[3][S.length()+1];
int charInd = getChar(charArr[0]);
if(charInd!=3) {
occCount[charInd][1]++;
}
for(int sInd=1; sInd<S.length(); sInd++) {
charInd = getChar(charArr[sInd]);
if(charInd!=3)
occCount[charInd][sInd+1]++;
occCount[Index_A][sInd+1]+=occCount[Index_A][sInd];
occCount[Index_C][sInd+1]+=occCount[Index_C][sInd];
occCount[Index_G][sInd+1]+=occCount[Index_G][sInd];
}
for(int i=0;i<P.length;i++) {
int a,c,g;
if(Q[i]+1>=occCount[0].length) continue;
a = occCount[Index_A][Q[i]+1] - occCount[Index_A][P[i]];
c = occCount[Index_C][Q[i]+1] - occCount[Index_C][P[i]];
g = occCount[Index_G][Q[i]+1] - occCount[Index_G][P[i]];
M[i] = a>0? A : c>0 ? C : g>0 ? G : T;
}
return M;
}
private int getChar(char c) {
return ((c=='A') ? Index_A : ((c=='C') ? Index_C : ((c=='G') ? Index_G : Index_T)));
}
}
Here's a simple JavaScript solution which got 100%.
function solution(S, P, Q) {
var A = [];
var C = [];
var G = [];
var T = [];
var result = [];
var i = 0;
S.split('').forEach(function(a) {
if (a === 'A') {
A.push(i);
} else if (a === 'C') {
C.push(i);
} else if (a === 'G') {
G.push(i);
} else {
T.push(i);
}
i++;
});
// true if any recorded position of this nucleotide lies in [start, end]
function hasNucl(typeArray, start, end) {
return typeArray.some(function(a) {
return a >= start && a <= end;
});
}
for(var j=0; j<P.length; j++) {
if (hasNucl(A, P[j], Q[j])) {
result.push(1);
} else if (hasNucl(C, P[j], Q[j])) {
result.push(2);
} else if (hasNucl(G, P[j], Q[j])) {
result.push(3);
} else {
result.push(4);
}
}
return result;
}
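The JavaScript version above keeps, for each nucleotide, the list of positions where it occurs, but scans that list linearly for every query, so one query can cost O(N). Because the positions are collected in increasing order, a binary search can replace the scan: find the first position ≥ P and check whether it is ≤ Q. Below is a minimal Java sketch of that variant (my substitution, not the original poster's code); the class and helper names are hypothetical, and it assumes S contains only A/C/G/T.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names: sorted position lists per nucleotide, queried by binary search.
class PositionLists {
    static int[] solution(String S, int[] P, int[] Q) {
        // positions.get(k) = ascending indices of nucleotide k (0=A, 1=C, 2=G, 3=T)
        List<List<Integer>> positions = new ArrayList<>();
        for (int k = 0; k < 4; k++) {
            positions.add(new ArrayList<>());
        }
        for (int i = 0; i < S.length(); i++) {
            positions.get("ACGT".indexOf(S.charAt(i))).add(i); // assumes only A/C/G/T
        }
        int[] result = new int[P.length];
        for (int j = 0; j < P.length; j++) {
            for (int k = 0; k < 4; k++) {
                if (hasPositionIn(positions.get(k), P[j], Q[j])) {
                    result[j] = k + 1;
                    break;
                }
            }
        }
        return result;
    }

    // True if the ascending list holds some value in [start, end]
    static boolean hasPositionIn(List<Integer> sorted, int start, int end) {
        int lo = 0, hi = sorted.size();
        // Lower-bound search: first index whose value is >= start
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid) < start) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < sorted.size() && sorted.get(lo) <= end;
    }

    public static void main(String[] args) {
        int[] r = solution("CAGCCTA", new int[] {2, 5, 0}, new int[] {4, 5, 6});
        System.out.println(Arrays.toString(r)); // [2, 4, 1]
    }
}
```

Each query then costs O(log N) instead of O(N); the prefix-sum answers in this thread get O(1) per query, so this is mainly a fix for the linear scan rather than a faster alternative to them.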
Perl 100/100 solution:
sub solution {
my ($S, $P, $Q) = @_;
my @P = @$P;
my @Q = @$Q;
# prefix counts: $_X[k] = number of X in the first k characters
my @_A = (0);
my @_C = (0);
my @_G = (0);
my @ret = ();
foreach (split //, $S)
{
push @_A, $_A[-1] + ($_ eq 'A' ? 1 : 0);
push @_C, $_C[-1] + ($_ eq 'C' ? 1 : 0);
push @_G, $_G[-1] + ($_ eq 'G' ? 1 : 0);
}
foreach my $i (0..$#P)
{
my $from_index = $P[$i];
my $to_index = $Q[$i] + 1;
if ( $_A[$to_index] - $_A[$from_index] > 0 )
{
push @ret, 1;
next;
}
if ( $_C[$to_index] - $_C[$from_index] > 0 )
{
push @ret, 2;
next;
}
if ( $_G[$to_index] - $_G[$from_index] > 0 )
{
push @ret, 3;
next;
}
push @ret, 4;
}
return @ret;
}
Java 100/100
class Solution {
public int[] solution(String S, int[] P, int[] Q) {
int qSize = Q.length;
int[] answers = new int[qSize];
char[] sequence = S.toCharArray();
int[][] occCount = new int[3][sequence.length+1];
int[] geneImpactMap = new int['G'+1];
geneImpactMap['A'] = 0;
geneImpactMap['C'] = 1;
geneImpactMap['G'] = 2;
// occCount[k][i+1] counts nucleotide k (0=A, 1=C, 2=G) in sequence[0..i]; T is implied
for(int i = 0; i < sequence.length; i++) {
occCount[0][i+1] = occCount[0][i];
occCount[1][i+1] = occCount[1][i];
occCount[2][i+1] = occCount[2][i];
if(sequence[i] != 'T') {
occCount[geneImpactMap[sequence[i]]][i+1]++;
}
}
for(int j = 0; j < qSize; j++) {
answers[j] = 4; // default when no A, C or G is in the range
for(int k = 0; k < 3; k++) {
if(occCount[k][Q[j]+1] - occCount[k][P[j]] > 0) {
answers[j] = k+1;
break;
}
}
}
return answers;
}
}
In Ruby (100/100):
def interval_sum x,y,p
p[y+1] - p[x]
end
def solution(s,p,q)
#Hash of arrays with prefix sums
p_sums = {}
respuesta = []
%w(A C G T).each do |letter|
p_sums[letter] = Array.new s.size+1, 0
end
(0...s.size).each do |count|
%w(A C G T).each do |letter|
p_sums[letter][count+1] = p_sums[letter][count]
end if count > 0
case s[count]
when 'A'
p_sums['A'][count+1] += 1
when 'C'
p_sums['C'][count+1] += 1
when 'G'
p_sums['G'][count+1] += 1
when 'T'
p_sums['T'][count+1] += 1
end
end
(0...p.size).each do |count|
x = p[count]
y = q[count]
if interval_sum(x, y, p_sums['A']) > 0 then
respuesta << 1
next
end
if interval_sum(x, y, p_sums['C']) > 0 then
respuesta << 2
next
end
if interval_sum(x, y, p_sums['G']) > 0 then
respuesta << 3
next
end
if interval_sum(x, y, p_sums['T']) > 0 then
respuesta << 4
next
end
end
respuesta
end
A simple PHP 100/100 solution:
function solution($S, $P, $Q) {
$result = array();
for ($i = 0; $i < count($P); $i++) {
$from = $P[$i];
$to = $Q[$i];
$length = $from >= $to ? $from - $to + 1 : $to - $from + 1;
$new = substr($S, $from, $length);
if (strpos($new, 'A') !== false) {
$result[$i] = 1;
} else {
if (strpos($new, 'C') !== false) {
$result[$i] = 2;
} else {
if (strpos($new, 'G') !== false) {
$result[$i] = 3;
} else {
$result[$i] = 4;
}
}
}
}
return $result;
}
Here's my Java (100/100) solution:
class Solution {
private ImpactFactorHolder[] mHolder;
private static final int A=0,C=1,G=2,T=3;
public int[] solution(String S, int[] P, int[] Q) {
mHolder = createImpactHolderArray(S);
int queriesLength = P.length;
int[] result = new int[queriesLength];
for (int i = 0; i < queriesLength; ++i ) {
int value = 0;
if( P[i] == Q[i]) {
value = lookupValueForIndex(S.charAt(P[i])) + 1;
} else {
value = calculateMinImpactFactor(P[i], Q[i]);
}
result[i] = value;
}
return result;
}
public int calculateMinImpactFactor(int P, int Q) {
int minImpactFactor = 3;
for (int nucleotide = A; nucleotide <= T; ++nucleotide ) {
int qValue = mHolder[nucleotide].mOcurrencesSum[Q];
// mOcurrencesSum[i] counts occurrences in S[0..i] (inclusive), so the
// occurrences strictly before P are at index P-1 (none when P == 0)
int pValue = P > 0 ? mHolder[nucleotide].mOcurrencesSum[P-1] : 0;
if ( qValue - pValue > 0) {
minImpactFactor = nucleotide;
break;
}
}
return minImpactFactor + 1;
}
public int lookupValueForIndex(char nucleotide) {
int value = 0;
switch (nucleotide) {
case 'A' :
value = A;
break;
case 'C' :
value = C;
break;
case 'G':
value = G;
break;
case 'T':
value = T;
break;
default:
break;
}
return value;
}
public ImpactFactorHolder[] createImpactHolderArray(String S) {
int length = S.length();
ImpactFactorHolder[] holder = new ImpactFactorHolder[4];
holder[A] = new ImpactFactorHolder(1,'A', length);
holder[C] = new ImpactFactorHolder(2,'C', length);
holder[G] = new ImpactFactorHolder(3,'G', length);
holder[T] = new ImpactFactorHolder(4,'T', length);
int i =0;
for(char c : S.toCharArray()) {
int nucleotide = lookupValueForIndex(c);
++holder[nucleotide].mAcum;
holder[nucleotide].mOcurrencesSum[i] = holder[nucleotide].mAcum;
holder[A].mOcurrencesSum[i] = holder[A].mAcum;
holder[C].mOcurrencesSum[i] = holder[C].mAcum;
holder[G].mOcurrencesSum[i] = holder[G].mAcum;
holder[T].mOcurrencesSum[i] = holder[T].mAcum;
++i;
}
return holder;
}
private static class ImpactFactorHolder {
public ImpactFactorHolder(int impactFactor, char nucleotide, int length) {
mImpactFactor = impactFactor;
mNucleotide = nucleotide;
mOcurrencesSum = new int[length];
mAcum = 0;
}
int mImpactFactor;
char mNucleotide;
int[] mOcurrencesSum;
int mAcum;
}
}
Link: https://codility.com/demo/results/demoJFB5EV-EG8/
I'm looking forward to implementing a segment tree similar to @Abhishek Kumar's solution.
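For comparison, here is a minimal segment-tree sketch in Java (my own, not @Abhishek Kumar's code, which isn't reproduced here; the class name `SegmentTreeGenomic` is hypothetical). Each leaf stores the impact factor of one nucleotide and each internal node the minimum of its two children, so construction is O(N) and each query is O(log N). It assumes S contains only A/C/G/T and 0 ≤ P[i] ≤ Q[i] < N.

```java
import java.util.Arrays;

// Hypothetical class name: iterative (bottom-up) min segment tree over impact factors.
class SegmentTreeGenomic {
    static int[] solution(String S, int[] P, int[] Q) {
        int n = S.length();
        int[] tree = new int[2 * n]; // tree[0] unused
        // Leaves live at tree[n..2n-1]: impact factor 1..4 of each nucleotide
        for (int i = 0; i < n; i++) {
            tree[n + i] = "ACGT".indexOf(S.charAt(i)) + 1;
        }
        // Internal node i holds the minimum of its children 2i and 2i+1
        for (int i = n - 1; i >= 1; i--) {
            tree[i] = Math.min(tree[2 * i], tree[2 * i + 1]);
        }
        int[] result = new int[P.length];
        for (int j = 0; j < P.length; j++) {
            result[j] = rangeMin(tree, n, P[j], Q[j]);
        }
        return result;
    }

    // Minimum impact factor over the inclusive index range [l, r]
    static int rangeMin(int[] tree, int n, int l, int r) {
        int best = Integer.MAX_VALUE;
        // Walk up from the half-open leaf interval [l + n, r + n + 1)
        for (int lo = l + n, hi = r + n + 1; lo < hi; lo >>= 1, hi >>= 1) {
            if ((lo & 1) == 1) best = Math.min(best, tree[lo++]);
            if ((hi & 1) == 1) best = Math.min(best, tree[--hi]);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] r = solution("CAGCCTA", new int[] {2, 5, 0}, new int[] {4, 5, 6});
        System.out.println(Arrays.toString(r)); // [2, 4, 1]
    }
}
```

For this static problem the prefix-sum answers are simpler and O(1) per query; a segment tree mainly pays off when S can change between queries, since a point update only touches O(log N) nodes (not implemented in this sketch).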
My C++ solution:
#include <string>
#include <vector>
using namespace std;
vector<int> solution(string &S, vector<int> &P, vector<int> &Q) {
vector<int> impactCount_A(S.size()+1, 0);
vector<int> impactCount_C(S.size()+1, 0);
vector<int> impactCount_G(S.size()+1, 0);
int lastTotal_A = 0;
int lastTotal_C = 0;
int lastTotal_G = 0;
for (int i = (signed)S.size()-1; i >= 0; --i) {
switch(S[i]) {
case 'A':
++lastTotal_A;
break;
case 'C':
++lastTotal_C;
break;
case 'G':
++lastTotal_G;
break;
};
impactCount_A[i] = lastTotal_A;
impactCount_C[i] = lastTotal_C;
impactCount_G[i] = lastTotal_G;
}
vector<int> results(P.size(), 0);
for (int i = 0; i < P.size(); ++i) {
int pIndex = P[i];
int qIndex = Q[i];
int numA = impactCount_A[pIndex]-impactCount_A[qIndex+1];
int numC = impactCount_C[pIndex]-impactCount_C[qIndex+1];
int numG = impactCount_G[pIndex]-impactCount_G[qIndex+1];
if (numA > 0) {
results[i] = 1;
}
else if (numC > 0) {
results[i] = 2;
}
else if (numG > 0) {
results[i] = 3;
}
else {
results[i] = 4;
}
}
return results;
}
/* 100/100 solution C++.
Using prefix sums. First, convert each char to an integer in the nuc variable. Then, in a two-dimensional vector, record the running count of each nucleotide x in S in its respective prefix_sum[s][x]. After that, we just have to find the lowest nucleotide that occurs in each interval K.
*/
vector<int> solution(string &S, vector<int> &P, vector<int> &Q) {
int n=S.size();
int m=P.size();
vector<vector<int> > prefix_sum(n+1,vector<int>(4,0));
int nuc;
//prefix occurrence sum
for (int s=0;s<n; s++) {
nuc = S.at(s) == 'A' ? 1 : (S.at(s) == 'C' ? 2 : (S.at(s) == 'G' ? 3 : 4) );
for (int u=0;u<4;u++) {
prefix_sum[s+1][u] = prefix_sum[s][u] + ((u+1)==nuc?1:0);
}
}
//find minimal impact factor in each interval K
int lower_impact_factor;
for (int k=0;k<m;k++) {
lower_impact_factor=4;
for (int u=2;u>=0;u--) {
if (prefix_sum[Q[k]+1][u] - prefix_sum[P[k]][u] != 0)
lower_impact_factor = u+1;
}
P[k]=lower_impact_factor;
}
return P;
}
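To make the interval arithmetic of the C++ prefix-sum answer above concrete, take S = "CAGCCTA" and the query (P, Q) = (2, 4), using that code's convention that prefix_sum[k][x] counts nucleotide x in S[0..k-1]:

$$
\begin{aligned}
\text{count}_x(P,Q) &= \text{prefix\_sum}[Q+1][x] - \text{prefix\_sum}[P][x] \\
\text{count}_A(2,4) &= \text{prefix\_sum}[5][A] - \text{prefix\_sum}[2][A] = 1 - 1 = 0 \\
\text{count}_C(2,4) &= \text{prefix\_sum}[5][C] - \text{prefix\_sum}[2][C] = 3 - 1 = 2 > 0
\end{aligned}
$$

No A occurs in S[2..4] = "GCC", but C does, so the minimal impact factor for this query is 2.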
static public int[] solution(String S, int[] P, int[] Q) {
// write your code in Java SE 8
int A[] = new int[S.length() + 1], C[] = new int[S.length() + 1], G[] = new int[S.length() + 1];
int last_a = 0, last_c = 0, last_g = 0;
int results[] = new int[P.length];
int p = 0, q = 0;
for (int i = S.length() - 1; i >= 0; i -= 1) {
switch (S.charAt(i)) {
case 'A': {
last_a += 1;
break;
}
case 'C': {
last_c += 1;
break;
}
case 'G': {
last_g += 1;
break;
}
}
A[i] = last_a;
G[i] = last_g;
C[i] = last_c;
}
for (int i = 0; i < P.length; i++) {
p = P[i];
q = Q[i];
if (A[p] - A[q + 1] > 0) {
results[i] = 1;
} else if (C[p] - C[q + 1] > 0) {
results[i] = 2;
} else if (G[p] - G[q + 1] > 0) {
results[i] = 3;
} else {
results[i] = 4;
}
}
return results;
}
Scala solution (100/100):
import scala.annotation.switch
import scala.collection.mutable
object Solution {
def solution(s: String, p: Array[Int], q: Array[Int]): Array[Int] = {
val n = s.length
def arr = mutable.ArrayBuffer.fill(n + 1)(0L)
val a = arr
val c = arr
val g = arr
val t = arr
for (i <- 1 to n) {
def inc(z: mutable.ArrayBuffer[Long]): Unit = z(i) = z(i - 1) + 1L
def shift(z: mutable.ArrayBuffer[Long]): Unit = z(i) = z(i - 1)
val char = s(i - 1)
(char: @switch) match {
case 'A' => inc(a); shift(c); shift(g); shift(t);
case 'C' => shift(a); inc(c); shift(g); shift(t);
case 'G' => shift(a); shift(c); inc(g); shift(t);
case 'T' => shift(a); shift(c); shift(g); inc(t);
}
}
val r = mutable.ArrayBuffer.fill(p.length)(0)
for (i <- p.indices) {
val start = p(i)
val end = q(i) + 1
r(i) =
if (a(start) != a(end)) 1
else if (c(start) != c(end)) 2
else if (g(start) != g(end)) 3
else if (t(start) != t(end)) 4
else 0
}
r.toArray
}
}