What would be a good data structure in Java to hold millisecond price data?
About 1,000,000 elements or more in the container.
What is the specific container (vector, array, hash, etc.) designed to hold a lot of data?
I do not intend to filter or sort the data, but I do need the ability to assign a key (possibly a time stamp) to get the corresponding price.
If you need to assign a value (price) to a key (the time stamp), you should use a collection implementing the Map interface: TreeMap, HashMap, Hashtable or LinkedHashMap. You can find an overview of the collections here.
If a key can have more than one value, you will need a Multimap, i.e. a map where the value is a collection (see an example here).
Assuming that is not the case, I'd suggest using a TreeMap, i.e. a map whose keys are sorted. Here is an example:
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Map;
import java.util.TreeMap;

Map<Long, Double> priceHistory = new TreeMap<>();

ZonedDateTime zdt1 = ZonedDateTime.now();
ZonedDateTime zdt2 = ZonedDateTime.of(
        LocalDate.of(2015, 9, 26),
        LocalTime.of(11, 10, 9),
        ZoneId.systemDefault());

priceHistory.put(zdt1.toInstant().toEpochMilli(), 10.5);
priceHistory.put(zdt2.toInstant().toEpochMilli(), 11.5);
It should not be a problem to hold 1 million elements and it should be fast to get the price for a given time stamp.
You probably don't really need millisecond precision, you just need to be able to get something for any given millisecond. Guava's RangeMap is great for this, you can construct a RangeMap<LocalDateTime, Price> and populate it with a value per second (for instance) with Range.closedOpen(second, second.plusSeconds(1)) as the key. You can then call .get() on an arbitrary LocalDateTime and it will return the value associated with the Range containing that time. You get arbitrary lookup precision but only use as much storage as you care to.
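If pulling in Guava is not an option, a plain NavigableMap from the standard library can approximate the same "value for an arbitrary instant" lookup with floorEntry(), which returns the entry at or before a given key. This is a minimal sketch; the timestamps and prices are invented for illustration:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NearestPriceLookup {
    // Returns the price recorded at or before the given epoch-millis timestamp,
    // or null if the map has no entry that early.
    static Double priceAtOrBefore(NavigableMap<Long, Double> history, long epochMilli) {
        Map.Entry<Long, Double> entry = history.floorEntry(epochMilli);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, Double> history = new TreeMap<>();
        history.put(1_000L, 10.5);   // price recorded at t=1000 ms
        history.put(2_000L, 11.5);   // price recorded at t=2000 ms

        // A query at t=1500 ms falls back to the entry at t=1000 ms.
        System.out.println(priceAtOrBefore(history, 1_500L)); // prints 10.5
    }
}
```

Like the RangeMap approach, this gives arbitrary lookup precision while storing only the samples you actually have.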
If you really do want millisecond precision you're going to have a hard time storing anything more than a few days of data efficiently. Instead you should look into storing your data in a database. Databases are designed to efficiently store and query large datasets, far beyond what standard in-memory data structures are designed for.
I am currently working with DynamoDB and I saw there is no built-in way to apply an offset and limit as in SQL queries. The only option I have seen is to get the lastEvaluatedKey and pass it as exclusiveStartKey. But in our scenario we can't capture the lastEvaluatedKey and reuse it, because we sort the data and process it in parallel streams.
So there will be difficulties. I just need to know the clean or best way to pass an offset and limit and fetch the data without iterating over all of it. Right now, even though the bounded iterator applies an offset and limit, the query underneath still reads all the data in DynamoDB, which consumes a lot of read capacity even though we don't need the rest of the data.
Map<String, AttributeValue> valueMap = new HashMap<>();
valueMap.put(":v1", new AttributeValue().withS(id));

Map<String, String> nameMap = new HashMap<>();
nameMap.put("#PK", "PK");

DynamoDBQueryExpression<TestClass> expression = new DynamoDBQueryExpression<TestClass>()
        .withKeyConditionExpression("#PK = :v1")
        .withExpressionAttributeNames(nameMap)
        .withExpressionAttributeValues(valueMap)
        .withConsistentRead(false);

// Lazily loaded list: iterating it pages through the whole partition.
PaginatedQueryList<TestClass> testList = dynamoDBMapper.query(TestClass.class, expression);

// The bounded iterator still consumes offset + limit items from the underlying query.
Iterator<TestClass> iterator = IteratorUtils.boundedIterator(testList.iterator(), offset, limit);
return IteratorUtils.toList(iterator);
What will be the best way to handle this issue?
A Query request has a Limit option, just as you wanted. As for the offset, you have the ExclusiveStartKey option, which says at which sort key inside the long partition you want to start. Although usually one pages through a long partition by setting ExclusiveStartKey to the LastEvaluatedKey of the previous page, you don't strictly need to do this: you can actually pass any existing or even non-existing key, and the query will start right after it (that's the meaning of exclusive: the key itself is excluded).
But when you said "offset", you probably meant a numeric offset, e.g., start at the 1000th item in the partition. Unfortunately, this is not supported by DynamoDB. You can approximate it if your sort key (or LSI key) is a numeric offset of the item (only practical if you only append to the partition...) or by using some additional data structures, but it's not supported by DynamoDB itself.
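The exclusive-start mechanics can be illustrated with a plain in-memory sorted map; this is only a sketch of the concept (the keys and items are invented), not DynamoDB client code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ExclusiveStartPaging {
    // Returns up to `limit` items whose sort key is strictly greater than
    // `exclusiveStartKey` (pass null to start from the beginning) --
    // the same contract as DynamoDB's ExclusiveStartKey + Limit.
    static List<String> page(NavigableMap<String, String> partition,
                             String exclusiveStartKey, int limit) {
        NavigableMap<String, String> tail = exclusiveStartKey == null
                ? partition
                : partition.tailMap(exclusiveStartKey, false); // false = exclusive
        List<String> result = new ArrayList<>();
        for (String value : tail.values()) {
            if (result.size() == limit) break;
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> partition = new TreeMap<>();
        partition.put("a", "itemA");
        partition.put("b", "itemB");
        partition.put("c", "itemC");
        partition.put("d", "itemD");

        // Start after "b" (which need not even exist), take 2 items.
        System.out.println(page(partition, "b", 2)); // prints [itemC, itemD]
    }
}
```

Note that passing a non-existing key such as "bb" works the same way: the page simply starts at the next key after it.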
I have an ArrayList which stores 0...4 Dates.
The number of Dates in the list depends on business logic.
How can I get the earliest date in this list? Of course I could write a loop to find the earliest date, but is there a cleaner/quicker way of doing this, especially considering that this list may grow later on?
java.util.Date implements Comparable<Date>, so you can simply use:
Date minDate = Collections.min(listOfDates);
This relies on there being at least one element in the list. If the list might be empty (amongst many other approaches):
Optional<Date> minDate = listOfDates.stream().min(Comparator.naturalOrder());
SortedSet
Depending on your application you can think about using a SortedSet (like TreeSet). This allows you to change the collection and always receive the lowest element easily.
Adding elements to the collection is more expensive, though.
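A minimal sketch of the TreeSet approach (the dates are invented for illustration, using epoch-millis constructors for brevity):

```java
import java.util.Date;
import java.util.TreeSet;

public class EarliestDate {
    public static void main(String[] args) {
        TreeSet<Date> dates = new TreeSet<>();
        dates.add(new Date(2_000L));
        dates.add(new Date(1_000L));
        dates.add(new Date(3_000L));

        // first() is the smallest element of a sorted set:
        // here, the earliest date, with no explicit search needed.
        System.out.println(dates.first()); // the Date for epoch-millis 1000
    }
}
```

The set stays sorted as you add or remove dates, so the earliest element is always available in O(log n) insertion, O(1) retrieval.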
If you don't mind changing the insertion order, sort the list and get the element at index 0.
I have a String that I need to search for in a collection of Strings. I'll need to do searches for multiple representations of the required String(original representation, trimmed, UTF-8 encoded, non ASCII characters encoded). The collection size will be in the order of thousands.
I'm trying to figure out what's the best representation to use for the collection in order to have the best performance:
ArrayList - iterate over the array and check if any of the elements match any of the String's representations
HashMap - check if the map contains any of my String's representations
Any other?
Generally speaking, a HashMap (or any other hashtable-based data structure) is much preferable for a "lookup" exercise. The reason is simple: those data structures support lookup in constant time (independent of collection size).
But... in your scenario (single query for collection), you probably will not gain any performance improvements from using HashMap instead of ArrayList. Reasons:
Putting data inside HashMap will take some time. Not significant time, but comparable to one full pass of the initial list.
Your collection is pretty small: iterating over 5,000 elements is a matter of a couple of milliseconds (or less). Since you need to "search" only once, you will not save much time there.
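To make the trade-off concrete: the hash-based collection only pays off once it is reused for many lookups, because building it costs one full pass. A sketch, with the sample strings invented for illustration:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupComparison {
    public static void main(String[] args) {
        // Several representations of a string, as described in the question.
        List<String> candidates = List.of("foo", "foo ", "FOO", "f\u00f6o");

        // One-off search: a linear scan of the list is perfectly fine.
        boolean foundLinear = candidates.contains("FOO");

        // Repeated searches: build the set once (O(n)), after which each
        // contains() call is O(1) on average.
        Set<String> lookup = new HashSet<>(candidates);
        boolean foundHashed = lookup.contains("FOO");

        System.out.println(foundLinear + " " + foundHashed); // prints "true true"
    }
}
```

For membership tests a HashSet is the more natural fit than a HashMap, since only the keys matter.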
I want a data structure that contain the following:
Two Objects that represent a timeslot (e.g. LocalDateTime in Java 8)
A boolean variable
The Timeslot needs to be connected to a boolean. The program needs to know if a timeslot is available. It will be used in a knapsack problem solving algorithm for work scheduling.
What I've got so far:
ArrayList<Map<List<LocalDateTime>, Boolean>>
But it looks pretty complicated, and a Map might not be the best structure to iterate through if I don't know the key. I thought about an ArrayList instead of a Map, but I don't know how to initialize it with different data types.
Table from Google Guava sounds like a perfect fit for this.
Table<LocalDateTime, LocalDateTime, Boolean> dateTable
= TreeBasedTable.create();
The reason for it is that it gives you access to row, column, and value (in parameter order), and will allow you to do relatively straightforward lookups.
An example: If you want to find all of the values for a given LocalDateTime row, then you would do this:
LocalDateTime today = LocalDateTime.of(2014, Month.JULY, 26, 0, 0);
// prints a map of all of today's values
System.out.println(dateTable.row(today));
I have c.1,000,000 objects that need to be stored in some form of data structure. They must be unique by a key (ID) - but sorted according to their date. I'm therefore trying to think of a best way of storing them in some form of data structure.
Performance (in terms of time taken to execute) is the primary goal, and then memory usage. My idea was to put the objects into a tree, so they are sorted according to their date as they enter the data structure, and I can then return them in order. However, I think this is going to be horrendously slow for finding a single object based on its ID. One thought that did occur to me was to have a secondary structure which linked IDs to dates so I can reduce the time taken to find a single object, or just store everything by ID anyway (perhaps in a HashTable) and then sort all 1,000,000 objects when I want to return them (although this seems to take a very long time).
Key Points:
Objects may be added afterwards so the c.1,000,000 objects ARE NOT fixed. They WILL NOT be updated or removed.
I MAY NOT use Java's built in Comparator.
I am optimising for efficiency of returning the data, whether this be the complete set in order (by date), or a single object obtained from its ID.
If performance is your chief concern over memory usage, I'd go with two data structures:
ArrayList<YourClass> instancesByDate;
and
HashMap<SomeId,YourClass> instancesById;
This gives you the fastest traversal by date and O(1) lookup by ID (depending on hashCode(), obviously).
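A minimal sketch of keeping the two structures in sync behind one class. The class and field names are invented for illustration, and it assumes objects arrive in date order (as the question implies for append-only data); otherwise the list would need sorted insertion:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventStore {
    public record Event(String id, long dateMillis) {}

    // Insertion order == date order, assuming chronological arrival.
    private final List<Event> byDate = new ArrayList<>();
    private final Map<String, Event> byId = new HashMap<>();

    public void add(Event e) {
        byDate.add(e);       // O(1) append preserves date order
        byId.put(e.id(), e); // enables O(1) lookup by ID later
    }

    public Event findById(String id) { return byId.get(id); }

    public List<Event> allByDate() { return byDate; }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        store.add(new Event("a", 1_000L));
        store.add(new Event("b", 2_000L));
        System.out.println(store.findById("b").dateMillis()); // prints 2000
    }
}
```

Since the question says objects are never updated or removed, the two views can never drift apart after an add().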
How about using a hashtable of ID => your object for the ID lookups, and a secondary hashtable of date (at some level of granularity) => Vector<yourobject>? You could choose the granularity of the date to ensure you have a moderate number of objects in each vector, and sort each by date.
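A sketch of that granularity idea, bucketing by day; the bucket size and the use of String as the stored object are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DateBuckets {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    // Maps day-number -> objects whose timestamp falls within that day.
    private final Map<Long, List<String>> buckets = new HashMap<>();

    public void add(long timestampMillis, String obj) {
        buckets.computeIfAbsent(timestampMillis / DAY_MILLIS, k -> new ArrayList<>())
               .add(obj);
    }

    // All objects stamped on the same day as the given timestamp.
    public List<String> sameDayAs(long timestampMillis) {
        return buckets.getOrDefault(timestampMillis / DAY_MILLIS, List.of());
    }

    public static void main(String[] args) {
        DateBuckets db = new DateBuckets();
        db.add(1_000L, "early");              // day 0
        db.add(DAY_MILLIS + 5_000L, "later"); // day 1
        System.out.println(db.sameDayAs(2_000L)); // prints [early]
    }
}
```

Coarser buckets (weeks, months) mean fewer map entries but more per-bucket sorting; the right granularity depends on how the 1,000,000 timestamps are distributed.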