which data structure should I use for trading system? - java

I have finished studying the basic structure of a trading system and am going to build one now. I plan to load a table of historical data (with columns such as day high and day low, covering the trading days of many years) into an array or some other data structure, then run analyses on it. Some analyses will need to store their results in another array (or another data structure).
So basically it would be a few "tables", each with around 6 columns and around 2000-5000 rows. I would do calculations within these tables and then store the results in another table of similar size.
Is an array good enough, or should I choose another data structure such as a linked list?

It really depends on your requirements.
If you are just doing simple analysis on relatively small amounts of data (sounds like this is the case?) then there's no need to be too fancy. Probably a simple ArrayList of rows would work well. Keep it simple.
If performance is a concern then you need to understand the usage pattern a lot more. For example, if you are doing a lot of read-only lookups then you may want to create indexes into the data (e.g. using HashMaps). But that's getting quite advanced.
So for now, I suggest sticking with an ArrayList.
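As a rough illustration of the "ArrayList of rows" idea, here is a minimal sketch. The `DailyBar` record, its field names, and the two helper methods are assumptions made up for this example; they are not from the question.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class PriceTable {
    // One row of the historical table. The field names are illustrative only.
    record DailyBar(LocalDate date, double open, double high, double low, double close, long volume) {}

    // Builds a second "table" from the first: the high-low range of each day.
    static List<Double> dailyRanges(List<DailyBar> bars) {
        List<Double> ranges = new ArrayList<>(bars.size());
        for (DailyBar bar : bars) {
            ranges.add(bar.high() - bar.low());
        }
        return ranges;
    }

    // Simple moving average of closing prices over `window` rows.
    // Element i of the result covers rows i .. i + window - 1.
    static List<Double> movingAverage(List<DailyBar> bars, int window) {
        List<Double> result = new ArrayList<>();
        double sum = 0;
        for (int i = 0; i < bars.size(); i++) {
            sum += bars.get(i).close();
            if (i >= window) {
                sum -= bars.get(i - window).close(); // drop the row that left the window
            }
            if (i >= window - 1) {
                result.add(sum / window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<DailyBar> bars = new ArrayList<>();
        bars.add(new DailyBar(LocalDate.of(2020, 1, 2), 10, 12, 9, 11, 1000));
        bars.add(new DailyBar(LocalDate.of(2020, 1, 3), 11, 14, 10, 13, 1200));
        bars.add(new DailyBar(LocalDate.of(2020, 1, 6), 13, 15, 12, 12, 900));
        System.out.println(dailyRanges(bars));
        System.out.println(movingAverage(bars, 2));
    }
}
```

With a few thousand rows, index-based access like `bars.get(i)` is constant time on an ArrayList, which is why it suits this kind of windowed calculation better than a linked list.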

Related

Is 'hashing' more efficient than 'linear' search?

I decided to revise the Java collections framework, so I started with the internal implementation. One question came to mind that I can't answer. I hope someone can give a clear explanation of the following.
Searching an ArrayList uses linear or binary search (both have pros and cons), and we can do anything with them. My question is: why do all the 'hashing' classes (HashMap, for example) use the hashing principle? Couldn't they settle for linear or binary search? Why not just store the key/value pairs inside an array? And conversely, why isn't an ArrayList, for example, stored in a hash table?
The intention of the collections framework is that the programmer will choose the data structure appropriate to the use case. Depending on what you're using it for, different data structures are appropriate.
Hashing classes use the hashing principle, as you put it, because if you choose them, then that's what you want to use. (Hashing is generally the best choice for simple, straightforward lookups.) A screwdriver uses the screwing principle because if you pick up a screwdriver, you want to screw something in; if you had a nail you needed to put in, you would have picked up the hammer instead.
But if you're not going to be performing lookups, or if linear search is good enough for you, then an ArrayList is what you want. It's not worth adding a hash table to a collection that's never going to use it, and it costs CPU and memory to do things you aren't going to need.
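To make the trade-off concrete, here is a small sketch that answers the same lookup both ways. The `Entry` record and the method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupComparison {
    // Hypothetical key/value pair stored in a plain list.
    record Entry(String key, int value) {}

    // Linear search: checks entries one by one, so each lookup is O(n).
    static Integer findLinear(List<Entry> entries, String key) {
        for (Entry e : entries) {
            if (e.key().equals(key)) {
                return e.value();
            }
        }
        return null; // not found
    }

    // Hash index: costs O(n) time and extra memory to build,
    // after which each lookup is O(1) on average.
    static Map<String, Integer> buildIndex(List<Entry> entries) {
        Map<String, Integer> index = new HashMap<>();
        for (Entry e : entries) {
            index.put(e.key(), e.value());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("apple", 1));
        entries.add(new Entry("pear", 2));
        System.out.println(findLinear(entries, "pear"));
        System.out.println(buildIndex(entries).get("pear"));
    }
}
```

Both calls return the same answer; the index only pays for itself if you look things up often enough to amortize the cost of building and storing it.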
I had a large hash of values (about 1,500). The nature of the code was that once the hashmap was loaded it would never be altered. The hashmap was accessed many times per web page, and I had wondered if it could be sped up for faster page loading.
One day I had some time, so I did a series of time tests (using the nano time function). I then reworked the hashmap use over to an array. Not an ArrayList, but an actual array[]. I stored the index with the key class used to get the hash value.
There was a difference: the array lookup was faster. I calculated that over a day's worth of activity I would have saved almost a full second!
So yes, using an array is faster than using a hash, YMMV :-)
And I reverted my code back to using a hashmap, as it was easier to maintain...

Sorted Array vs Hashtable: Which data structure would be more efficient in searching over a range of dates in a calendar app?

I have about a year of experience in coding in Java. To hone my skills I'm trying to write a Calendar/journal entry desktop app in Java. I've realized that I still have no experience in data persistence and still don't really understand what the data persistence options would be for this program -- So perhaps I'm jumping the gun, and the design choices that I'm hoping to implement aren't even applicable once I get into the nitty gritty.
I mainly want to write a calendar app that allows you to log daily journal entries with associated activity logs for time spent on daily tasks. In terms of adding, editing and viewing the journal entries, using a hash table with the dates of the entries as keys and the entries themselves as the values seems most Big-Oh efficient (O(1) average case for each using a hash table).
However, I'm also hoping to implement a feature that could, given a certain range of dates, provide a simple analysis of average amount of time spent on certain tasks per day. If this is one of the main features I'm interested in, am I wrong in thinking that perhaps a sorted array would be more Big-Oh efficient? Especially considering that the data entries are generally expected to already be added date by date.
Or perhaps there's another option I'm unaware of?
The reason I'm asking is because of the answer provided by this following question: Why not use hashing/hash tables for everything?
And the reason I'm unsure if I'm even asking the right question is because of the answer to the following question: Whats the best data structure for a calendar / day planner?
If so, I would really appreciate being directed to other resources on data persistence in Java.
Thank you for the help!
Use a NavigableMap interface (implemented by TreeMap, a red-black tree).
This allows you to easily and efficiently select date ranges and traverse over events in key order.
As an aside, if you consider time or date intervals to be "half-open" it will make many problems easier. That is, when selecting events, include the lower bound in results, but exclude the upper. The methods of NavigableMap, like subMap(), are designed to work this way, and it's a good practice when you are working with intervals of any quantity, as it's easy to define a sequence of intervals without overlap or gaps.
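A minimal sketch of the half-open range query with a TreeMap follows. Storing the minutes spent per day as a plain `Integer`, and the `averageMinutes` helper itself, are assumptions for the example; a real journal entry would be a richer object.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

class JournalRange {
    // Average minutes per logged day over the half-open range [from, toExclusive).
    // Days without an entry are simply absent from the map and do not count.
    static double averageMinutes(NavigableMap<LocalDate, Integer> minutesByDay,
                                 LocalDate from, LocalDate toExclusive) {
        return minutesByDay.subMap(from, true, toExclusive, false)
                .values().stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0); // empty range
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, Integer> minutesByDay = new TreeMap<>();
        minutesByDay.put(LocalDate.of(2021, 1, 1), 30);
        minutesByDay.put(LocalDate.of(2021, 1, 2), 60);
        minutesByDay.put(LocalDate.of(2021, 2, 1), 90);

        // January only: 1 February is the excluded upper bound.
        System.out.println(averageMinutes(minutesByDay,
                LocalDate.of(2021, 1, 1), LocalDate.of(2021, 2, 1)));
    }
}
```

Because the upper bound is excluded, consecutive months can be queried as [1 Jan, 1 Feb), [1 Feb, 1 Mar) and so on without counting any day twice. The two-argument `subMap(from, to)` uses the same half-open convention.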
It depends on how serious you want your project to be. In all cases, be careful of premature optimization. This is when you try too hard to make your code "efficient" and sacrifice readability/maintainability in the process. For example, there is likely a way of doing manual memory management with native code to make a more efficient implementation of a data structure for your calendar, but it likely does not outweigh the benefits of using familiar APIs etc. It might, but you will only know once you run your code.
1. Write readable code
2. Run it, test for performance issues
3. Use a profiler (e.g. JProfiler) to identify the code that is responsible for poor performance
4. Optimise that code
5. Repeat
For code that will "work", but will not be very scalable, a simple List will usually do fine. You can use JSON to store your objects, and a library such as Jackson Databind to map between a List and JSON. You could then simply save it to a file for persistence.
For an application that you want to be more robust and protected against data corruption, a database is probably better. With this, you can guarantee that, for example, data is not partially written and that concurrent access to the same data will not result in corruption, along with a whole host of other benefits. However, you will usually need to have a database server running alongside your application (unless you use an embedded database). You can use JDBC and suitable drivers for your database vendor (e.g. MySQL) to connect to, read from and write to the database.
For a serious application, you will probably want to create an API for your persistence. A framework like Spring is very helpful for this, as it allows you to declare REST endpoints using annotations, and introduces useful programming concepts, such as containers, IoC/Dependency Injection, Testing (unit tests and integration tests), JPA/ORM systems and more.
Like I say, this is all context dependent, but above all else, avoid premature optimization.
This thread might give you some ideas what data structure to use for Range Queries.
Data structure for range query
It might even be easier to use a database and query for the desired range through its API.
If you are using (or are able to use) Guava, you might consider using RangeMap (*).
This would allow you to use, say, a RangeMap<Instant, Event>, which you could then query to say "what event is occurring at time T".
One drawback is that you wouldn't be able to model concurrent events (e.g. when you are double-booked in two meetings).
(*) I work for Google, Guava is Google's open-sourced Java library. This is the library I would use, but others with similar range map offerings are available.

Performance: List containing lots of objects VS lots of objects containing smaller lists

I am currently developing my own app (just another remake of the famous "Game of Life") and I want to add a "revert" button. My game basically consists of a two dimensional array: Cell[][]...
So: my idea was to create an ArrayList to which that array is added on every update (with a limit of 50 entries).
But then I thought that would be a lot of objects in that list... So:
Would it be more performant to have an ArrayList in each cell in the 2-dimensional array, containing the history of itself, or to have a huge ArrayList containing entire game states as a history?
(I don't think you need any of my code to answer this question, but I will post it if you do)
Instead of storing cell state, why not keep the events that transform the cell state? An event would simply encapsulate the previous state info and the next state, or maybe even a coding enum that can be executed in reverse. This would already significantly reduce the data you need to store.
Also, it makes a difference whether you're working natively or on a garbage-collecting VM platform such as Java or C#. The former tends to be faster and is more forgiving when addressing large amounts of storage, while the latter can start thrashing wildly if you store lots of objects.
But the real proof of the pudding is in the eating. TINSTATFC: "there is no such thing as the fastest code" (Michael Abrash). In other words, benchmark your code and find out (and come tell us)!
In my opinion, an ArrayList per cell would work fine; however, if you only want to store 50 states, you'll have to create a custom ADT or utilize Guava's Collections API.
I recommend using a Stack rather than an ArrayList, as reverting every cell would be a simple operation. In Java, the correct ADT to use is an ArrayDeque, but you'll have to handle capacity yourself.
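Handling the capacity yourself could look something like the sketch below. The `BoundedHistory` class and its method names are invented for this example; it is one possible way to wrap an ArrayDeque, not a standard API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BoundedHistory<T> {
    private final Deque<T> states = new ArrayDeque<>();
    private final int capacity;

    BoundedHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Records a snapshot; once the limit is reached the oldest one is dropped.
    // ArrayDeque rejects null elements, so state must be non-null.
    void push(T state) {
        if (states.size() == capacity) {
            states.removeLast(); // oldest snapshot sits at the tail
        }
        states.addFirst(state);
    }

    // Returns the most recent snapshot, or null if there is nothing to revert to.
    T pop() {
        return states.pollFirst();
    }

    int size() {
        return states.size();
    }

    public static void main(String[] args) {
        BoundedHistory<String> history = new BoundedHistory<>(2);
        history.push("generation 1");
        history.push("generation 2");
        history.push("generation 3"); // evicts "generation 1"
        System.out.println(history.pop());
        System.out.println(history.size());
    }
}
```

For the game you would use something like `BoundedHistory<Cell[][]>` with a capacity of 50, and push a copy of the grid rather than the live array, since otherwise every entry would point at the same mutable cells.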

Data structure for continuous additions and cheap deletions

I am reading this blog post about making animations with Gnuplot and the Cairo terminal, whose plan is simply
to save PNG images to the working directory, and
to save the latest video to the working directory.
I would like to have something more such that the user can also browse the images real time when the images are being converted:
Data-parallelism model - data structure regularly arranged in an array
to give the user some list in some interface which the user can browse by arrow buttons
in this interface, new images are being added to the end of the list
the user can also remove bad images from the stream in real time
which may work well in Data parallelism model of Parallel programming i.e. a data set regularly structured in an array.
The operations (additions, deletions) can operate on this data, but independently on distinct processes.
Let's assume that there is no need for efficient searches for simplicity in Version 1.
However, if you come with a model which can do that also, I am happy to consider it - let's call it Version 2.
I think a list is not a good data structure here because of the wanted opportunity for deletions and continuous easy addition to the end of the data structure.
The data structure stack is not going to work either because of deletions.
I think some sort of tree data structure can work because of rather cheap deletions and cheap search there.
However, a simple array in the Data-parallelism model can be sufficient.
Languages
I think Java is a good option here because of parallelism.
However, any language and pseudocode are good too.
Frontend
I have an intuition that the frontend for such a system should be Qt as a terminal emulator.
What is a better data structure for cheap deletions and continuous additions to the end?
Java's LinkedList seems to be what you could use for Version 1. You can use its single-parameter add() to append to the list in constant time. If by "real time" you mean that the image is on the user's display and is therefore already pointed to somehow (for example, by an iterator), you can delete it in constant time as well.
It allocates memory only per element, with no re-instantiation of a backing array as you'd have with an ArrayList.
Any doubly linked list implemented on objects (as opposed to an array) would do.
Your second version isn't clear enough.
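Here is a small sketch of the append and delete operations on a LinkedList. Using file names as `String` elements, and the `ImageList` helper class itself, are assumptions for the example.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;

class ImageList {
    // Appends a newly rendered image to the end; constant time for LinkedList.
    static void append(LinkedList<String> images, String fileName) {
        images.add(fileName);
    }

    // Walks the list once and unlinks every image the user marked as bad.
    // Each it.remove() is constant time because the iterator is already
    // positioned at the node being removed.
    static int removeMarked(LinkedList<String> images, Set<String> bad) {
        int removed = 0;
        Iterator<String> it = images.iterator();
        while (it.hasNext()) {
            if (bad.contains(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        LinkedList<String> images = new LinkedList<>();
        for (int i = 1; i <= 4; i++) {
            append(images, "frame" + i + ".png");
        }
        removeMarked(images, Set.of("frame2.png", "frame4.png"));
        System.out.println(images);
    }
}
```

Two caveats: LinkedList is not thread-safe, and its iterators are fail-fast, so an iterator held by the browsing UI becomes invalid as soon as another part of the program appends to the list directly. If images are produced and removed by separate threads, the accesses need to be synchronized or a concurrent collection used instead.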

Suggestions for data structure for keyword search

I need to put together a data structure that will efficiently provide keyword search facilities.
My metrics are:
Circa 500,000 products.
Circa 20+ keywords per product (a guess).
Products are identified by an ID of about 10 digits but may be any ASCII codes going forward.
I would like to try to fit the data structure in memory if possible. I will be on a server so I can assume some significant memory availability.
Speed is important. Using LIKE database queries will not be an acceptable solution.
Any ideas for a data structure?
My thoughts:
TrieMap
Very efficient for the keywords but there would need to be a list of product IDs hanging off any leaf so seriously memory hungry. Any ideas that could help with that?
Compression
Various compression schemes come to mind but none jump out as of significant value.
Has anyone else put something like this together? Could you share your experiences?
The data may change but not often. It would be reasonable to rebuild the structure on a daily basis to accommodate changes.
Have you thought about using Lucene, either in memory or as a file system index?
It is quite fast and has lots of room for further requirements that might arise in the future.
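To give a feel for the underlying idea, here is a toy inverted index in plain Java. This is not Lucene's API; the `KeywordIndex` class and its methods are made up for the illustration, and a real library adds tokenization, ranking, compression and on-disk storage on top of this.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordIndex {
    // keyword -> IDs of the products tagged with it
    private final Map<String, Set<String>> productsByKeyword = new HashMap<>();

    void add(String productId, Collection<String> keywords) {
        for (String keyword : keywords) {
            productsByKeyword
                    .computeIfAbsent(keyword.toLowerCase(Locale.ROOT), k -> new HashSet<>())
                    .add(productId);
        }
    }

    // Returns the IDs of products that carry ALL of the given keywords.
    Set<String> search(String... keywords) {
        Set<String> result = null;
        for (String keyword : keywords) {
            Set<String> ids = productsByKeyword.getOrDefault(
                    keyword.toLowerCase(Locale.ROOT), Set.of());
            if (result == null) {
                result = new HashSet<>(ids); // copy so the index itself is not modified
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.add("0000000001", List.of("red", "shoe"));
        index.add("0000000002", List.of("red", "hat"));
        System.out.println(index.search("red", "shoe"));
    }
}
```

Each keyword is stored once with a set of product IDs, so lookups are hash-map speed; for 500,000 products with 20+ keywords each, the memory cost is dominated by the ID sets, which is where more compact representations (sorted int arrays, for instance) would come in.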
