I have an ArrayList of Dico and I am trying to extract distinct strings from it.
This is the Dico class.
public class Dico implements Comparable<Dico> {

    private final String m_term;
    private double m_weight;
    private final int m_Id_doc;

    public Dico(int Id_Doc, String Term, double tf_ief) {
        this.m_Id_doc = Id_Doc;
        this.m_term = Term;
        this.m_weight = tf_ief;
    }

    public String getTerm() {
        return this.m_term;
    }

    public double getWeight() {
        return this.m_weight;
    }

    public void setWeight(double weight) {
        this.m_weight = weight;
    }

    public int getDocId() {
        return this.m_Id_doc;
    }

    // compareTo is required by Comparable; ordering by term is assumed here,
    // since the list is described as sorted so that equal terms are adjacent.
    @Override
    public int compareTo(Dico other) {
        return this.m_term.compareTo(other.m_term);
    }
}
I use this function to extract 1000 distinct values from the middle of this list:
I start from the middle and take only distinct values in both directions, left and right.
public static List<String> get_sinificativ_term(List<Dico> dico) {
    List<String> term = new ArrayList<>();
    int pos_median = dico.size() / 2;
    int count = 0;
    int i = 0;
    int j = 0;
    String temp_d = dico.get(pos_median).getTerm();
    String temp_g = temp_d;
    term.add(temp_d);
    while (count < 999) { // count of elements
        if (!temp_d.equals(dico.get(pos_median + i).getTerm())) {
            temp_d = dico.get(pos_median + i).getTerm(); // save current term in temp
            term.add(temp_d);                            // add term to list
            i++;                                         // go to the next value --> right
            count++;
        } else {
            i++;                                         // go to the next value --> right
        }
        if (!temp_g.equals(dico.get(pos_median + j).getTerm())) {
            temp_g = dico.get(pos_median + j).getTerm();
            term.add(temp_g);                            // add term to list
            j--;                                         // go to the next value --> left
            count++;
        } else {
            j--;                                         // go to the next value --> left
        }
    }
    return term;
}
I would like to make my solution faster than this function. If possible, can I do this with Java SE 8 streams?
Streams will not make it faster but can make it much simpler and clearer.
Here's the simplest version. It will take all list indexes, sort them by distance to the middle of the list, get the corresponding term, filter out duplicates and limit to 1000 elements. It will certainly be slower than your iterative code, but much easier to follow because the code neatly mirrors its English description:
public static List<String> get_sinificativ_term(List<Dico> dicolist) {
    // requires: import static java.util.Comparator.comparing;
    //           import static java.util.stream.Collectors.toList;
    int size = dicolist.size();
    return IntStream.range(0, size)
            .boxed()
            .sorted(comparing(i -> Math.abs(size / 2 - i)))
            .map(dicolist::get)
            .map(Dico::getTerm)
            .distinct()
            .limit(1000)
            .collect(toList());
}
If your list is really huge and you want to avoid sorting it, you can trade away some simplicity for performance. This version does a bit of math to go right-left-right-left from center:
public static List<String> get_sinificativ_term(List<Dico> dicolist) {
    int size = dicolist.size();
    return IntStream.range(0, size)
            .map(i -> i % 2 == 0 ? (size + i) / 2 : (size - i - 1) / 2)
            .mapToObj(i -> dicolist.get(i).getTerm())
            .distinct()
            .limit(1000)
            .collect(toList());
}
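To make the index arithmetic concrete, here is a small hedged check (size 7 is an arbitrary example); it prints the visiting order 3 2 4 1 5 0 6, i.e. the middle index first and then alternating between the two sides:

int size = 7;
IntStream.range(0, size)
         .map(i -> i % 2 == 0 ? (size + i) / 2 : (size - i - 1) / 2)
         .forEach(i -> System.out.print(i + " ")); // prints: 3 2 4 1 5 0 6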
Can't you do something like this?
public static List<String> get_sinificativ_term(List<Dico> dico) {
    List<String> list = dico.stream()
            .map(Dico::getTerm)
            .distinct()
            .limit(1000)
            .collect(Collectors.toList());
    if (list.size() != 1000) {
        throw new IllegalStateException("Need at least 1000 distinct values");
    }
    return list;
}
You need to check the size because there can be fewer than 1000 distinct values. If efficiency is a concern, you can try to run the pipeline in parallel and measure whether it's faster.
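If you do try that, a hedged sketch of the parallel variant might look like this; note that distinct() and limit() on an ordered parallel stream can be costly, so only a benchmark can tell whether it pays off:

List<String> list = dico.parallelStream()    // same pipeline, parallel source
        .map(Dico::getTerm)
        .distinct()                          // ordered distinct is more expensive in parallel
        .limit(1000)
        .collect(Collectors.toList());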
First method:
public static List<List<Integer>> applicationPairs(int deviceCapacity,
        List<List<Integer>> foregroundAppList, List<List<Integer>> backgroundAppList) {
    List<List<Integer>> result = new ArrayList<>();
    int max = Integer.MIN_VALUE;
    for (List<Integer> foregroundApp : foregroundAppList) {
        for (List<Integer> backgroundApp : backgroundAppList) {
            int memoryRequired = foregroundApp.get(1) + backgroundApp.get(1);
            if (memoryRequired <= deviceCapacity) {
                if (memoryRequired > max) {
                    result.clear();
                    max = memoryRequired;
                    result.add(Arrays.asList(foregroundApp.get(0), backgroundApp.get(0)));
                } else if (memoryRequired == max) {
                    result.add(Arrays.asList(foregroundApp.get(0), backgroundApp.get(0)));
                }
            }
        }
    }
    /*
     return empty pair if no pair is found
    */
    if (result.size() == 0) {
        return new ArrayList<>(Collections.emptyList());
    }
    return result;
}
Second method:
public static List<String> sortOrders(List<String> orderList) {
    // Write your code here
    var result = orderList.stream()
            .filter(e -> e.split(" ")[1].matches("[a-z]+"))
            .sorted((s, t1) -> {
                int r = s.substring(s.indexOf(" ")).compareTo(t1.substring(t1.indexOf(" ")));
                if (r == 0) {
                    return s.substring(0, s.indexOf(" ")).compareTo(t1.substring(0, t1.indexOf(" ")));
                }
                return r;
            }) //.thenComparing(s -> s.substring(0, s.indexOf(" "))))
            .collect(toList());
    return Stream.concat(result.stream(), orderList.stream().filter(e -> !result.contains(e)))
            .collect(toList());
}
I was asked to write these solutions for Amazon's online assessment. The question also asked for the time complexity of the solutions, which I am not good at calculating. Can anyone help?
In the first method you are iterating over two lists, foregroundAppList and backgroundAppList. Since you iterate over every element of backgroundAppList for each element of foregroundAppList, the time complexity is clearly O(n*m), where n is the size of backgroundAppList and m is the size of foregroundAppList.
In the second method you have implemented a custom sort. As you may be aware, Java's sort uses a dual-pivot quicksort for primitive arrays and a merge-sort variant (Timsort) for objects; in both cases the average time complexity is O(n*log(n)), where n is the size of the list.
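As an aside, the custom comparator in the second method could also be written with Comparator.comparing(...).thenComparing(...), which the commented-out fragment hints at; this is only a readability change and the sorting cost stays O(n*log(n)). A hedged sketch (assumes import java.util.Comparator):

// Equivalent ordering to the lambda in sortOrders: compare the part after the
// first space, then break ties on the prefix before it.
Comparator<String> bySuffixThenPrefix = Comparator
        .comparing((String s) -> s.substring(s.indexOf(" ")))
        .thenComparing(s -> s.substring(0, s.indexOf(" ")));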
By a flat tree I mean a table where each child has a link to its parent. The tree also has to be limited by depth and size (max number of nodes). When an object's id and parent id are equal, it is one of the root nodes. So I want to generate a data structure like this:
id | parent id
1 | 1
2 | 2
3 | 2
4 | 1
I've solved this task, but the resulting code is somewhat cumbersome:
private static final ThreadLocalRandom RANDOM = ThreadLocalRandom.current();

public static <I, T extends Node<I>> List<T> generateTree(int count,
                                                           int maxDepth,
                                                           Supplier<T> supplier) {
    List<T> result = new ArrayList<>(count);
    int remainingDepth = maxDepth;
    while (remainingDepth > 0 && result.size() < count) {
        final boolean firstStep = result.isEmpty();
        final boolean lastStep = remainingDepth == 1;
        final int remainingCount = count - result.size();
        final int generatedCount = !lastStep ?
                RANDOM.nextInt(1, remainingCount + 1) :
                remainingCount;
        List<T> generatedNodes = IntStream.range(0, generatedCount).boxed()
                .map(i -> {
                    T value = supplier.get();
                    value.parentId = firstStep ?
                            value.id :                                       // root node, id = parent id
                            result.get(RANDOM.nextInt(0, result.size())).id; // child node, find random parent
                    return value;
                })
                .collect(Collectors.toList());
        result.addAll(generatedNodes);
        remainingDepth--;
    }
    return result;
}
static class Node<I> {
    public I id;
    public I parentId;

    public Node(I id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return id + " | " + parentId;
    }
}
@Test
public void test() {
    List<Node<Integer>> result = generateTree(100, 4, () -> new Node<>(RANDOM.nextInt(100, 1000)));
    result.forEach(System.out::println);
}
Yes, it can be simplified by removing a couple of unnecessary variables that only improve readability, but in general I don't see a way to improve the most complex part: obtaining a random parent id.
So I wonder whether it is possible to rewrite this implementation with the Stream API (I mean reducers, of course). Would it be simpler? I've tried to do so, but the functional paradigm just blows my brain. Could someone please help me?
I don't see a way to rewrite the assignment of parent IDs using the Stream API into simpler code.
Instead, separate the two different concerns of the code: 1) generate count Node objects and 2) assign them parent IDs.
public static <I, T extends Node<I>>
        List<T> generateTree(int count, int maxDepth, Supplier<T> supplier) {
    ThreadLocalRandom r = ThreadLocalRandom.current();
    List<T> result = IntStream.range(0, count)
            .mapToObj(i -> supplier.get())
            .collect(Collectors.toList());
    for (int index = 0, level = 0, numItemsInThisLevel;
            level < maxDepth && index < count; level++, index += numItemsInThisLevel) {
        int remaining = count - index;
        numItemsInThisLevel = level < maxDepth - 1 ? r.nextInt(remaining) + 1 : remaining;
        for (T value : result.subList(index, index + numItemsInThisLevel))
            value.parentId = index == 0 ? value.id : result.get(r.nextInt(0, index)).id;
    }
    return result;
}
Generating count objects is straightforward and should not need further explanation. Your algorithm, as far as I understood it, iterates over up to maxDepth ranges of the objects and assigns them parent IDs taken from the IDs of random objects before that range. I wrote this as a loop over ranges, reflecting exactly that. Note that it would be easy to adapt this to use only IDs from the previous level, to get the specified depth exactly.
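A hedged sketch of that adaptation (the method name is mine); the only change is remembering where the previous level starts and drawing parents from that range only:

public static <I, T extends Node<I>>
        List<T> generateTreeStrictDepth(int count, int maxDepth, Supplier<T> supplier) {
    ThreadLocalRandom r = ThreadLocalRandom.current();
    List<T> result = IntStream.range(0, count)
            .mapToObj(i -> supplier.get())
            .collect(Collectors.toList());
    int prevStart = 0; // start index of the previous level's range
    for (int index = 0, level = 0, numItemsInThisLevel;
            level < maxDepth && index < count; level++, index += numItemsInThisLevel) {
        int remaining = count - index;
        numItemsInThisLevel = level < maxDepth - 1 ? r.nextInt(remaining) + 1 : remaining;
        for (T value : result.subList(index, index + numItemsInThisLevel))
            value.parentId = index == 0
                    ? value.id                                    // root level: id == parent id
                    : result.get(r.nextInt(prevStart, index)).id; // parent only from the previous level
        prevStart = index; // the current level becomes the previous one for the next iteration
    }
    return result;
}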
One important note: ThreadLocalRandom is, as the name suggests, local to the thread and should always be acquired via current() by the using thread. So storing it in a static final variable means that only the thread that executed the class initializer would be allowed to use that instance. On the other hand, current() is cheap, so there would be no advantage in caching the result anyway.
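A tiny illustrative snippet of that point (the bound is arbitrary):

// Correct: acquire the instance per call, in whatever thread runs this code.
int roll = ThreadLocalRandom.current().nextInt(1, 7);

// Problematic: a cached instance belongs to the thread that initialized the field.
// private static final ThreadLocalRandom RANDOM = ThreadLocalRandom.current();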
OK, I've figured out a better solution. It's a custom Collector. Also, it's easier to track depth when it's just one of the node's properties, which can easily be done via a wrapper. Here's an example:
static class Node<I> {
    public I id;
    public I parentId;
    public int depth;

    public Node(I id) {
        this.id = id;
        this.parentId = id;
    }

    @Override
    public String toString() {
        return id + " | " + parentId + " | depth: " + depth;
    }
}
public static <I, T extends Node<I>> List<T> generateTreeStream(int count,
                                                                int maxDepth,
                                                                Supplier<T> supplier) {
    Collector<T, List<T>, List<T>> collector = Collector.of(
            ArrayList::new,
            (list, node) -> {
                if (!list.isEmpty()) {
                    int random = ThreadLocalRandom.current().nextInt(0, list.size());
                    T parent = list.get(random);
                    if (parent.depth < maxDepth) {
                        node.parentId = parent.id;
                        node.depth = parent.depth + 1;
                    }
                }
                list.add(node);
            },
            (left, right) -> {
                left.addAll(right);
                return left;
            }
    );
    return IntStream.range(0, count)
            .mapToObj(i -> supplier.get())
            .collect(collector);
}
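For completeness, a usage sketch mirroring the earlier test (the id range is arbitrary, and ThreadLocalRandom.current() is acquired inline, as noted above):

List<Node<Integer>> tree = generateTreeStream(100, 4,
        () -> new Node<>(ThreadLocalRandom.current().nextInt(100, 1000)));
tree.forEach(System.out::println);   // prints "id | parent id | depth: n" per node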
I need help on this problem:
Write a method mode that accepts an array of test grades (0–100) and returns the mode, the grade that occurs most frequently. If there are multiple modes, return the smallest.
My code does not work. Please help, thanks!
public static int mode(int[] marks) {
    int count = 0;
    int count0 = 0;
    int mostFrequent = marks[0];
    for (int x = 0; x < marks.length; x++) {
        if (marks[x] == mostFrequent) {
            count++;
        }
    }
    if (count > count0) {
        count0 = count;
        mostFrequent = marks[count];
    }
    return mostFrequent;
}
E.g.:
If marks = {1, 2, 4, 6, 1, 1, 1}, it works.
If marks = {1, 2, 4, 1, 3, 2, 3, 5, 5}, it does not work.
Your code only counts how often marks[0] occurs (and then reads marks[count], which is not what you want), so it only works when the first element happens to be the mode. You need to count the occurrences of all numbers. Try the following.
public static int mode(int[] marks) {
    Map<Integer, Integer> myMap = new HashMap<>();
    IntStream.of(marks).forEach(x -> myMap.merge(x, 1, Integer::sum)); // count the occurrences of each number
    Integer maxOccurrence = Collections.max(myMap.values());           // take the maximum count
    return myMap.entrySet().stream()
            .filter(entry -> entry.getValue().equals(maxOccurrence))   // keep the entries that hit the maximum
            .min(Map.Entry.comparingByKey())                           // the smallest such key is the mode
            .get()
            .getKey();
}
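A quick sanity check with the examples from the question:

System.out.println(mode(new int[]{1, 2, 4, 6, 1, 1, 1}));       // 1 — occurs four times
System.out.println(mode(new int[]{1, 2, 4, 1, 3, 2, 3, 5, 5})); // 1 — 1, 2, 3 and 5 all occur twice; smallest wins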
This should do the trick:
public static int mode(int[] marks) {
    return Arrays.stream(marks)
            .boxed()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet()
            .stream()
            .sorted(Comparator.comparing(Map.Entry<Integer, Long>::getValue)
                    .reversed()
                    .thenComparing(Map.Entry<Integer, Long>::getKey))
            .findFirst()
            .map(Map.Entry::getKey).orElse(0);
}
Note: the last line sets 0 as a default value in case the array which is passed is empty.
.map(Map.Entry::getKey).orElse(0);
If you want to throw an exception in such a case change it to:
.map(Map.Entry::getKey).orElseThrow(RuntimeException::new);
I hope someone can help me. I am trying to find a way to filter a list based on a condition.
public class Prices {
    private String item;
    private double price;
    //......
}
For example, I have a list of the above objects with the following data:

item | price
a    | 100
b    | 200
c    | 250
d    | 350
e    | 450
Is there a way to use streams to filter the list so that at the end we are left with only the objects whose prices sum to less than a given input value?
Say the input value is 600: the resulting list would only have a, b, c, d, because adding their prices one by one is what first reaches 600. So e would not be included in the final filtered list.
If the input/given value is 300, then the filtered list would only have a and b.
The list is already sorted; I will start from the top and keep adding prices until the given value is reached.
Thanks
Regards
You can write this static method, which creates a suitable predicate:
public static Predicate<Prices> byLimitedSum(int limit) {
    return new Predicate<Prices>() {
        private double sum = 0;

        @Override
        public boolean test(Prices prices) {
            if (sum < limit) {
                sum += prices.getPrice();
                return true;
            }
            return false;
        }
    };
}
And use it:
List<Prices> result = prices.stream()
        .filter(byLimitedSum(600))
        .collect(Collectors.toList());
But it is a bad solution for parallelStream().
Anyway, I think using stream and filter here is not such a good decision, because readability suffers. A better way, I think, is to write a static utility method like this:
public static List<Prices> filterByLimitedSum(List<Prices> prices, int limit) {
    List<Prices> result = new ArrayList<>();
    double sum = 0;
    for (Prices price : prices) {
        if (sum < limit) {
            result.add(price);
            sum += price.getPrice();
        } else {
            break;
        }
    }
    return result;
}
Or you can write a wrapper for List<Prices> and add a public method to the new class, as sketched below. Use streams wisely.
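A hedged sketch of that wrapper idea (class and method names are mine, and a getPrice() accessor is assumed on Prices):

final class PricesList {
    private final List<Prices> prices;

    PricesList(List<Prices> prices) {
        this.prices = prices;
    }

    // Leading prices whose running sum is still below the limit when they are added.
    List<Prices> headUnderLimit(int limit) {
        List<Prices> result = new ArrayList<>();
        double sum = 0;
        for (Prices p : prices) {
            if (sum >= limit) {
                break;
            }
            result.add(p);
            sum += p.getPrice();
        }
        return result;
    }
}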
Given your requirements, you can use Java 9's takeWhile.
You'll need to define a Predicate that has state:
Predicate<Prices> pred = new Predicate<Prices>() {
    double sum = 0.0;
    boolean reached = false;

    public boolean test(Prices p) {
        sum += p.getPrice();
        if (sum >= 600.0) {      // reached the sum
            if (reached) {       // already reached the sum before, reject element
                return false;
            } else {             // first time we reach the sum, so the current element is still accepted
                reached = true;
                return true;
            }
        } else {                 // haven't reached the sum yet, accept the current element
            return true;
        }
    }
};

List<Prices> sublist =
        input.stream()
             .takeWhile(pred)
             .collect(Collectors.toList());
The simplest solution for this kind of task is still a loop, e.g.
double priceExpected = 600;
int i = 0;
for(double sumCheck = 0; sumCheck < priceExpected && i < list.size(); i++)
sumCheck += list.get(i).getPrice();
List<Prices> resultList = list.subList(0, i);
A Stream solution fulfilling all formal criteria for correctness is much more elaborate:
double priceThreshold = 600;
List<Prices> resultList = list.stream().collect(
    () -> new Object() {
        List<Prices> current = new ArrayList<>();
        double accumulatedPrice;
    },
    (o, p) -> {
        if (o.accumulatedPrice < priceThreshold) {
            o.current.add(p);
            o.accumulatedPrice += p.getPrice();
        }
    },
    (a, b) -> {
        if (a.accumulatedPrice + b.accumulatedPrice <= priceThreshold) {
            a.current.addAll(b.current);
            a.accumulatedPrice += b.accumulatedPrice;
        }
        else for (int i = 0; a.accumulatedPrice < priceThreshold && i < b.current.size(); i++) {
            a.current.add(b.current.get(i));
            a.accumulatedPrice += b.current.get(i).getPrice();
        }
    }).current;
This would even work in parallel by just replacing stream() with parallelStream(). But not only would it require a sufficiently large source list to gain a benefit: since the loop can stop at the first element exceeding the threshold, the result list must also be significantly larger than 1/n of the source list (where n is the number of cores) before the parallel processing can have an advantage at all.
Also, the loop solution shown above is non-copying, as subList only returns a view of the original list.
Using a simple for loop would be much, much simpler, and this is indeed abusive, as Holger mentions; I took it only as an exercise.
Seems like you need a stateful filter or a short-circuit reduce. I can think of this:
static class MyException extends RuntimeException {
    private final List<Prices> prices;

    public MyException(List<Prices> prices) {
        this.prices = prices;
    }

    public List<Prices> getPrices() {
        return prices;
    }

    // make it a cheap "stack-trace-less" exception
    @Override
    public Throwable fillInStackTrace() {
        return this;
    }
}
This is needed to break from the reduce when we are done. From here the usage is probably trivial:
List<Prices> result;
try {
    result = List.of(
            new Prices("a", 100),
            new Prices("b", 200),
            new Prices("c", 250),
            new Prices("d", 350),
            new Prices("e", 450))
        .stream()
        .reduce(new ArrayList<>(),
                (list, e) -> {
                    double total = list.stream().mapToDouble(Prices::getPrice).sum();
                    ArrayList<Prices> newL = new ArrayList<>(list);
                    if (total < 600) {
                        newL.add(e);
                        return newL;
                    }
                    throw new MyException(newL);
                },
                (left, right) -> {
                    throw new RuntimeException("Not for parallel");
                });
} catch (MyException e) {
    e.printStackTrace();
    result = e.getPrices();
}
result.forEach(x -> System.out.println(x.getItem()));
Let's say I have a list of words and I want to create a method which takes the size of the new list as a parameter and returns the new list. How can I get random words from my original sourceList?
public List<String> createList(int listSize) {
    Random rand = new Random();
    List<String> wordList = sourceWords.stream()
            .limit(listSize)
            .collect(Collectors.toList());
    return wordList;
}
So how and where can I use my Random?
I've found a proper solution.
Random provides a few methods that return a stream, for example ints(size), which creates a stream of random integers.
public List<String> createList(int listSize) {
    Random rand = new Random();
    List<String> wordList = rand.ints(listSize, 0, sourceWords.size())
            .mapToObj(i -> sourceWords.get(i))
            .collect(Collectors.toList());
    return wordList;
}
I think the most elegant way is to have a special collector.
I am pretty sure the only way you can guarantee that each item has an equal chance of being picked, is to collect, shuffle and re-stream. This can be easily done using built-in Collectors.collectingAndThen(...) helper.
Sorting by a random comparator or using a randomized reducer, as suggested in some other answers, will result in very biased randomness.
List<String> wordList = sourceWords.stream()
        .collect(Collectors.collectingAndThen(Collectors.toList(), collected -> {
            Collections.shuffle(collected);
            return collected.stream();
        }))
        .limit(listSize)
        .collect(Collectors.toList());
You can move that shuffling collector to a helper function:
public class CollectorUtils {

    public static <T> Collector<T, ?, Stream<T>> toShuffledStream() {
        return Collectors.collectingAndThen(Collectors.toList(), collected -> {
            Collections.shuffle(collected);
            return collected.stream();
        });
    }
}
I assume that you are looking for a way to integrate nicely with other stream processing functions, so the following straightforward solution is not what you are looking for :)
Collections.shuffle(wordList);
return wordList.subList(0, limitSize);
This is my one line solution:
List<String> st = Arrays.asList("aaaa","bbbb","cccc");
st.stream().sorted((o1, o2) -> RandomUtils.nextInt(0, 2)-1).findFirst().get();
RandomUtils is from Commons Lang 3.
Here's a solution I came up with which seems to differ from all the other ones, so I figured why not add it to the pile.
Basically it works by using the same kind of trick as one iteration of Collections.shuffle each time you ask for the next element - pick a random element, swap that element with the first one in the list, move the pointer forwards. Could also do it with the pointer counting back from the end.
The caveat is that it does mutate the list you passed in, but I guess you could just take a copy as the first thing if you didn't like that. We were more interested in reducing redundant copies.
private static <T> Stream<T> randomStream(List<T> list) {
    int characteristics = Spliterator.SIZED;
    // If you know your list is also unique / immutable / non-null:
    // int characteristics = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.SIZED;
    Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(list.size(), characteristics) {
        private final Random random = new SecureRandom();
        private final int size = list.size();
        private int frontPointer = 0;

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (frontPointer == size) {
                return false;
            }
            // Same logic as one iteration of Collections.shuffle, so people talking about it not being
            // fair randomness can take that up with the JDK project.
            int nextIndex = random.nextInt(size - frontPointer) + frontPointer;
            T nextItem = list.get(nextIndex);
            // Technically the value we end up putting into frontPointer
            // is never used again, but using swap anyway, for clarity.
            Collections.swap(list, nextIndex, frontPointer);
            frontPointer++;
            // All items from frontPointer onwards have not yet been chosen.
            action.accept(nextItem);
            return true;
        }
    };
    return StreamSupport.stream(spliterator, false);
}
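For context, a possible way to wire this into the question's use case (the word list and sizes are made up); note that the source list gets mutated, as the caveat above says:

List<String> words = new ArrayList<>(List.of("alpha", "beta", "gamma", "delta", "epsilon"));
List<String> picked = randomStream(words)
        .limit(3)                          // lazily draws only as many elements as requested
        .collect(Collectors.toList());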
Try something like this:
List<String> getSomeRandom(int size, List<String> sourceList) {
    List<String> copy = new ArrayList<String>(sourceList);
    Collections.shuffle(copy);
    List<String> result = new ArrayList<String>();
    for (int i = 0; i < size; i++) {
        result.add(copy.get(i));
    }
    return result;
}
The answer is very simple (with a stream):
List<String> a = src.stream().sorted((o1, o2) -> {
if (o1.equals(o2)) return 0;
return (r.nextBoolean()) ? 1 : -1;
}).limit(10).collect(Collectors.toList());
You can test it:
List<String> src = new ArrayList<String>();
for (int i = 0; i < 20; i++) {
src.add(String.valueOf(i*10));
}
Random r = new Random();
List<String> a = src.stream().sorted((o1, o2) -> {
if (o1.equals(o2)) return 0;
return (r.nextBoolean()) ? 1 : -1;
}).limit(10).collect(Collectors.toList());
System.out.println(a);
If you want non-repeated items in the result list and your initial list is immutable:
There isn't a direct way to get this from the current Streams API.
It's not possible to use a random Comparator because it's going to break the compare contract.
You can try something like:
public List<String> getStringList(final List<String> strings, final int size) {
    if (size < 1 || size > strings.size()) {
        throw new IllegalArgumentException("Out of range size.");
    }
    final List<String> stringList = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
        getRandomString(strings, stringList)
                .ifPresent(stringList::add);
    }
    return stringList;
}

private Optional<String> getRandomString(final List<String> stringList, final List<String> excludeStringList) {
    final List<String> filteredStringList = stringList.stream()
            .filter(c -> !excludeStringList.contains(c))
            .collect(toList());
    if (filteredStringList.isEmpty()) {
        return Optional.empty();
    }
    final int randomIndex = new Random().nextInt(filteredStringList.size());
    return Optional.of(filteredStringList.get(randomIndex));
}
@kozla13's improved version:
List<String> st = Arrays.asList("aaaa","bbbb","cccc");
st.stream().min((o1, o2) -> o1 == o2 ? 0 : (ThreadLocalRandom.current().nextBoolean() ? -1 : 1)).orElseThrow();
It uses the Java built-in class ThreadLocalRandom.
In the original version, nextInt generates a value from [-1, 0, 1], but returning 0 from the compare function means the elements are treated as equal, and a side effect of this is that the first element (o1) will always be taken in that case.
This version properly handles the case where the objects are equal.
A stream is probably overkill. Copy the source list so you're not creating side-effects, then give back a sublist of the shuffled copy.
public static List<String> createList(int listSize, List<String> sourceList) {
    if (listSize > sourceList.size()) {
        throw new IllegalArgumentException("Not enough words for new list.");
    }
    List<String> copy = new ArrayList<>(sourceList);
    Collections.shuffle(copy);
    return copy.subList(0, listSize);
}
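A quick illustrative call (the word list is made up; output order and content vary per run):

List<String> source = List.of("alpha", "beta", "gamma", "delta", "epsilon");
List<String> picked = createList(3, source);   // e.g. [delta, alpha, gamma]
System.out.println(picked);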
If the source list is generally much larger than the new list, you might gain some efficiencies by using a BitSet to get random indices:
List<String> createList3(int listSize, List<String> sourceList) {
    if (listSize > sourceList.size()) {
        throw new IllegalArgumentException("Not enough words in the source list.");
    }
    List<String> newWords = randomWords(listSize, sourceList);
    Collections.shuffle(newWords); // optional, for random order
    return newWords;
}

private List<String> randomWords(int listSize, List<String> sourceList) {
    int endExclusive = sourceList.size();
    BitSet indices = new BitSet(endExclusive);
    Random rand = new Random();
    while (indices.cardinality() < listSize) {
        indices.set(rand.nextInt(endExclusive));
    }
    return indices.stream().mapToObj(i -> sourceList.get(i))
            .collect(Collectors.toList());
}
A one-liner to randomize a stream:
Stream.of(1, 2, 3, 4, 5).sorted(Comparator.comparingDouble(x -> Math.random()))