I am writing a simple book management (library) application, but I am confused about the right data representation.
I have classes like:
Book (isbn, author, title, libraryIdNumber)
BookDatabase (collection of Books)
In this example the main data structure inside BookDatabase is a
Map<Integer, Book> booksById;
where key is a unique library identifier of the book (for example incremental int value), and the value is just Book instance.
It is now easy to find a book by ID, but it is hard to find all books with the same title. So I would need another data structure like
Map<String, List<Book>> booksByTitle;
where the key is title of the book, and the value is a list of all books with this title in library (for example 10 copies of "Lion King").
I know that the best way to store such data would be a database, but my question is more general:
Is it OK to store the same data in several data structures inside one storage class, or is it memory inefficient? Or is it better to build an additional representation of the data only when it is actually needed (for example by invoking the Map.values() method)?
What is the best approach in such case in your opinion?
Is it OK to store the same data in several data structures inside one storage class, or is it memory inefficient?
Yes, it is OK. You'll just be copying the references to your Book objects, not the actual Book objects themselves, so the extra memory is limited to the bookkeeping of the additional map.
P.S. You are right that a database would be better: it makes querying very easy, for example.
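To make the point about shared references concrete, here is a minimal sketch, assuming a Book class with the fields from the question. The method names (add, remove, findById, findByTitle) are made up for illustration. Both maps point at the same Book instances, and every write has to keep the two maps in sync.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Book {
    final int libraryId;
    final String isbn;
    final String author;
    final String title;

    Book(int libraryId, String isbn, String author, String title) {
        this.libraryId = libraryId;
        this.isbn = isbn;
        this.author = author;
        this.title = title;
    }
}

class BookDatabase {
    // Both maps hold references to the same Book objects; nothing is copied.
    private final Map<Integer, Book> booksById = new HashMap<>();
    private final Map<String, List<Book>> booksByTitle = new HashMap<>();

    void add(Book book) {
        remove(book.libraryId); // drop a stale entry if the ID is reused
        booksById.put(book.libraryId, book);
        booksByTitle.computeIfAbsent(book.title, t -> new ArrayList<>()).add(book);
    }

    boolean remove(int libraryId) {
        Book book = booksById.remove(libraryId);
        if (book == null) {
            return false;
        }
        // Keep the title index consistent with the primary map.
        List<Book> sameTitle = booksByTitle.get(book.title);
        sameTitle.remove(book);
        if (sameTitle.isEmpty()) {
            booksByTitle.remove(book.title);
        }
        return true;
    }

    Book findById(int libraryId) {
        return booksById.get(libraryId);
    }

    List<Book> findByTitle(String title) {
        return Collections.unmodifiableList(
                booksByTitle.getOrDefault(title, Collections.emptyList()));
    }
}
```

The price of the second map is one extra reference per book plus the map's own bookkeeping, and the obligation to update both maps on every add and remove.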
I want to build a data structure to store the information of multiple houses, and later user can retrieve desirable housing information through a search query. In order to achieve a fast search, I will use red black tree. The problem I am facing is that the key of each node only contains one attribute of the house i.e. price, as for the others such as number of beds, land size etc they can not be stored in a single tree. What would be a good data structure for this problem, initially I thought a tree nested in a tree, is this viable or considered good?
The problem you are facing can be solved using secondary indexes on top of your data. Secondary indexes are a concept studied intensely in the database world and you should have no trouble finding resources to help you understand how they are implemented in real databases.
So, you currently have a primary key for your data: the object's memory reference, or maybe an index into a collection of references. For each attribute that you want to query, you will need a fast way of looking up matching objects. The exact data structure you use will depend on the type of queries you perform, but some kind of search tree is a good general-purpose choice and will usually be efficient for updates, which is very important for a lot of databases. Your data structure should take in a query relating to the specific attribute and return references, or primary keys, to all the objects which match that query.
In your example you might have one red-black tree for price and another for number-of-beds. If you are answering a query for "price = 30 or number-of-beds = 4" then all you need to do is query your price data structure and then your number-of-beds data structure and then since you have an "or" in your query you simply take the union of the primary keys returned from your data structures (take the intersection for "and"s).
Notice that if you add to or update your objects then you will also need to update all the indexes that change. This is a trade-off you also see in real databases; faster reads for slower writes.
A nested tree approach might work depending on what kind of queries you are making but will quickly become unsuitable if the data structure is not static - it will be very slow to update the tree if you update your objects.
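As a rough sketch of the secondary-index idea, the following uses one java.util.TreeMap (a red-black tree) per attribute, each mapping an attribute value to the set of primary keys that have it. The House fields and the HouseIndex method names are assumptions made for the example. It also assumes IDs are unique and houses are not modified after being added.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class House {
    final int id;     // primary key
    final int price;
    final int beds;

    House(int id, int price, int beds) {
        this.id = id;
        this.price = price;
        this.beds = beds;
    }
}

class HouseIndex {
    private final Map<Integer, House> byId = new HashMap<>();
    // Secondary indexes: attribute value -> primary keys of matching houses.
    private final TreeMap<Integer, Set<Integer>> idsByPrice = new TreeMap<>();
    private final TreeMap<Integer, Set<Integer>> idsByBeds = new TreeMap<>();

    void add(House house) {
        byId.put(house.id, house);
        idsByPrice.computeIfAbsent(house.price, k -> new HashSet<>()).add(house.id);
        idsByBeds.computeIfAbsent(house.beds, k -> new HashSet<>()).add(house.id);
    }

    House get(int id) {
        return byId.get(id);
    }

    Set<Integer> withPrice(int price) {
        return new HashSet<>(idsByPrice.getOrDefault(price, Collections.emptySet()));
    }

    Set<Integer> withBeds(int beds) {
        return new HashSet<>(idsByBeds.getOrDefault(beds, Collections.emptySet()));
    }

    // Range query on the sorted index; requires low <= high.
    Set<Integer> withPriceBetween(int low, int high) {
        Set<Integer> result = new HashSet<>();
        for (Set<Integer> ids : idsByPrice.subMap(low, true, high, true).values()) {
            result.addAll(ids);
        }
        return result;
    }

    // "or" in a query: union of the key sets.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // "and" in a query: intersection of the key sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }
}
```

A query such as "price = 30 or number-of-beds = 4" then becomes HouseIndex.or(index.withPrice(30), index.withBeds(4)), and the returned keys are resolved with get(id).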
I have abstract super class and some sub classes. My question is how is the best way to keep objects of those classes so I can easily find them using all the different parameters.
For example if I want to look up with resourceCode (every object is with unique resource code) I can use HashMap with key value resourceCode. But what happens if I want to look up with genre - there are many games with the same genre so I will get all those games. My first idea was with ArrayList of those objects, but isn’t it too slow if we have 1 000 000 games (about 1 000 000 operations).
My other idea is to have a HashTable keyed by the product code, so that lookup by code takes constant time. After that I create as many HashSets as I have fields in the classes, and for each field I get the product code(s) of the objects that are in the HashSet under that certain field (for example game promoter). With those unique codes I can get everything I want from the HashTable. Is this a good idea? It seems a lot of space will be needed to store the data, but it will be fast.
So my question is what Data Structure should I use so I can implement fast finding of custom object, searching by its attributes (fields)
Please see the attachment: Classes Example
Thank you in advance.
Stefan Stefanov
You can use Sorted or Ordered data structures to optimize search complexity.
You can introduce your own search index for custom data.
But it is better to use database or search engine.
Have a look at Elasticsearch, Apache Solr, PostgreSQL
It sounds like most of your fields can be mapped to a string (name, genre, promoter, description, year of release, ...). You could put all these strings in a single large index that maps each keyword to all objects that contain the word in any of their fields. Then if you search for certain keywords it will return a list of all entries that contain that word. For example searching for 'mine' should return 'minecraft' (because of title), as well as all mine craft clones (having 'minecraft-like' as genre) and all games that use the word 'mine' in the 'info text' field.
You can code this yourself, but I suppose a full-text indexer such as Lucene may be useful. I haven't used Lucene myself, but I suppose it would also allow you to search for multiple keywords at once, even if they occur in different fields.
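If you do want to code it yourself, a minimal sketch of such a keyword index could look like the following. The class name KeywordIndex and its methods are made up for the example, and the tokenisation (lower-casing and splitting on anything that is not a letter or digit) is an assumption. It is not how Lucene works internally. Tokens are kept in a sorted TreeMap so that a prefix such as 'mine' also finds 'minecraft'.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeMap;

class KeywordIndex<T> {
    // token -> all items that contain the token in any indexed field
    private final TreeMap<String, Set<T>> postings = new TreeMap<>();

    // Index an item under every word found in the given text fields.
    void add(T item, String... fields) {
        for (String field : fields) {
            String[] tokens = field.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            for (String token : tokens) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(item);
                }
            }
        }
    }

    // Return every item that has at least one token starting with the prefix.
    Set<T> searchPrefix(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<T> hits = new LinkedHashSet<>();
        // All keys in [p, p + '\uffff') start with p, because the map is sorted.
        for (Set<T> items : postings.subMap(p, true, p + Character.MAX_VALUE, false).values()) {
            hits.addAll(items);
        }
        return hits;
    }
}
```

This only matches prefixes of whole words. Matching 'mine' in the middle of a word would need a different structure, which is one reason to reach for an existing indexer.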
This is not a very appealing answer.
Start with a database. Maybe an embedded database (like h2database).
Easy set of fixed develop/test data; can be easily changed. (The database dump.)
Too many indices (hash maps) do harm
Developing and optimizing queries is easier (declarative) than with data structures
Database tables are less coupled than data structures with help structures (maps)
The resulting system is far less complex and better scalable
After development has stabilized the set of queries, you can think of doing away with the DB part. Use at least a two-tier separation of the database and the classes.
Then you might find a stable and best fitting data model.
Should you still intend to do it all with pure objects, then work them out in detail as design documentation before you start programming. Example stories, and how one solves them.
I am trying to make a simple music database that will generate a playlist based off of song tempo and genre. However, I am stuck trying to decide the best data structure to use to parse my CSV file to.
If I use an ArrayList, I would create a Song class, and the ArrayList would be my music database. The song class would have getters and setters for Title Artist Tempo and Genre. I would get my resultant playlist by using conditionals to whittle down the contents of our ArrayList (if tempo isn't at least a certain number, etc)
If I use a HashMap, I would set the keys to a pair , and values to pair (So it would be HashMap<,>).
What is the most efficient way to organize this information?
It depends on how much data this database will contain and what fields will be searched. I think the most general way is a List for the database itself, where items are appended and removed (maybe mark deleted) by the implicit ID that may be the index in the list.
Then, for performance reasons you'll likely need to keep some form of indices. The data structure used in real databases is the tree, but you may also use an ordered list for each field and perform a binary search to keep it simple.
Also, consider that for Java there are many databases (Derby, H2) that are really small, lightweight, can run in-process and have an SQL interface.
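For the do-it-yourself variant, here is a minimal sketch of a List used as the database plus one ordered index searched with a binary search. The Song fields follow the question. SongDatabase, lowerBound and withTempoAtLeast are names invented for the example, and only tempo is indexed while genre is filtered afterwards.

```java
import java.util.ArrayList;
import java.util.List;

class Song {
    final String title;
    final String artist;
    final int tempo;
    final String genre;

    Song(String title, String artist, int tempo, String genre) {
        this.title = title;
        this.artist = artist;
        this.tempo = tempo;
        this.genre = genre;
    }
}

class SongDatabase {
    // The database itself: a song's position in this list is its implicit ID.
    private final List<Song> songs = new ArrayList<>();
    // Index: song IDs kept sorted by tempo.
    private final List<Integer> idsByTempo = new ArrayList<>();

    int add(Song song) {
        int id = songs.size();
        songs.add(song);
        // Insert the ID at its sorted position (O(n) shift in an ArrayList).
        idsByTempo.add(lowerBound(song.tempo), id);
        return id;
    }

    // Binary search: first index position whose song has tempo >= the given value.
    private int lowerBound(int tempo) {
        int lo = 0;
        int hi = idsByTempo.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (songs.get(idsByTempo.get(mid)).tempo < tempo) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Playlist query: jump to the first fast-enough song, then filter by genre.
    List<Song> withTempoAtLeast(int minTempo, String genre) {
        List<Song> result = new ArrayList<>();
        for (int i = lowerBound(minTempo); i < idsByTempo.size(); i++) {
            Song song = songs.get(idsByTempo.get(i));
            if (song.genre.equals(genre)) {
                result.add(song);
            }
        }
        return result;
    }
}
```

For a few thousand songs a plain scan of the ArrayList would likely be fast enough as well; the index only starts to pay off on larger collections or frequent queries.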
After starting to read a book on OO programming, I am attempting to make my android app more OO. However I am stumped on a simple scenario.
I have a Book object, which can have many say Chapter objects. I also have a search function which searches across multiple books, 97 of them. I end up with many Chapter objects from the Sqlite table.
I felt that it would be useful to the user to be able to see the title of the book on each result, otherwise it might be confusing if there are say two "chapter 5" results.
For that to happen, I need the book title. Should I make it part of my chapter object, like :
chapter.getBookTitle()
Which kind of does not seem right, as I have glued the book name onto a chapter... The alternative is to instantiate a book object for each chapter and somehow reference it, which has its own problems including in android with regards to not being able to pass a reference to an in-memory object to another activity.
Also a book may have many other chapters which were not results in the search, and it may seem like they would return if I was to just instantiate the book.
What is the correct OO solution to this seemingly simple issue? Is it just a matter of learning when not to be dogmatic about the whole OO thing?
More Info:
I am using FTS4 in Sqlite, which accounts for over half of my actual DB size of 80mb. What I am storing is text from 97 books, with chapters in 4 languages. So my FTS at the moment stores:
ChapterId, ChapterNo (withinBook), Lang1, Lang2, Lang3, Lang4, Tags, Notes
The searching is very fast, I retrieve only 50 results. I match any column with a string term, and not one column in particular. So if I type "apple" it will search all the fields above.
Currently, as part of my FTS query, I am doing a join onto Book, fetching the BookId, which I later use to get the title of the book. However, it's all in a procedural style, with no regard to where the information "belongs".
I need the title so I can display it in the results, just for user convenience.
It works well, however I am wanting similar performance or slightly less but with an OO approach as I think that will make more sense to me when I come back to this project after a long pause.
The Chapter object should have reference to the book it came from so I would suggest something like
chapter.getBook().getTitle();
Your database table structure should have a books table and a chapters table with columns like:
books
id
book specific info
etc
chapters
id
book_id
chapter specific info
etc
Then, to reduce the number of queries, use a join between the two tables in your search query.
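A minimal sketch of what that could look like on the Java side, assuming the joined query returns one row per matching chapter with the book id and title alongside. SearchRow and SearchResultMapper are hypothetical names, and no actual SQLite or Android API is shown. Each book is instantiated once and shared by all of its chapters in the result, and the Book object carries only the title, not a list of all its chapters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Book {
    final long id;
    private final String title;

    Book(long id, String title) {
        this.id = id;
        this.title = title;
    }

    String getTitle() {
        return title;
    }
}

class Chapter {
    final long id;
    final int number;
    private final Book book;

    Chapter(long id, int number, Book book) {
        this.id = id;
        this.number = number;
        this.book = book;
    }

    Book getBook() {
        return book;
    }
}

// One row of the joined search result (hypothetical shape).
class SearchRow {
    final long chapterId;
    final int chapterNo;
    final long bookId;
    final String bookTitle;

    SearchRow(long chapterId, int chapterNo, long bookId, String bookTitle) {
        this.chapterId = chapterId;
        this.chapterNo = chapterNo;
        this.bookId = bookId;
        this.bookTitle = bookTitle;
    }
}

class SearchResultMapper {
    // Turn joined rows into Chapter objects that share one Book per book id.
    static List<Chapter> toChapters(List<SearchRow> rows) {
        Map<Long, Book> booksById = new HashMap<>();
        List<Chapter> chapters = new ArrayList<>();
        for (SearchRow row : rows) {
            Book book = booksById.get(row.bookId);
            if (book == null) {
                book = new Book(row.bookId, row.bookTitle);
                booksById.put(row.bookId, book);
            }
            chapters.add(new Chapter(row.chapterId, row.chapterNo, book));
        }
        return chapters;
    }
}
```

Displaying a result is then chapter.getBook().getTitle(), and if only an id can be passed to another activity, the chapter id is enough to reload the pair there.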
The approach I would take is: when reading the chapters from the database, instead of a collection of chapters, use a collection of books. This will have your chapters organised into books and you'll be able to use information from both classes to present the information to the user (you can even present it in a hierarchical way easily when using this approach).
You might implement your class model by composition, having the book object have a map of chapter objects contained within it (map chapter number to chapter object). Your search function could be given a list of books into which to search by asking each book to search its chapters. The book object would then iterate over each chapter, invoking the chapter.search() function to look for the desired key and return some kind of index into the chapter. The book's search() would then return some data type which could combine a reference to the book and some way to reference the data that it found for the search. The reference to the book could be used to get the name of the book object that is associated with the collection of chapter search hits.
I'm trying to come up with a simple way of organizing some objects, in terms of what classes to create. Let's say I'm trying to keep track of books. A book can fall under a number of different genres and subgenres. I want to be able to recognize a book as one book and yet have it fall under these different categories. I have a genre class which keeps track of all the subgenres, and a subgenre class which has all of the books in it. I want the book to know all of the genres and subgenres that it falls under. I also want to keep track of some statistics (reviews, comments, number of times read, etc.) based on genre and subgenre and then be able to aggregate them to get numbers for the entire book. In this way, a user could select a book and know each genre/subgenre that the book belongs to, and some statistics about that book for each category.
What are some ideas for how I can design this?
My thought was to have each Book define a class called BookGroup, and the BookGroup would contain the Genre and Subgenre, along with any relevant information for that category (assuming that subgenres can only belong to one genre). Then in the Book class I would keep a set of bookgroups that the book belongs in. I can add up stats from all the different bookgroups. The only thing I don't like about this is that I feel like a BookGroup should contain Books, not the other way around.
Any other ideas?
Thanks!
Edit:
All you guys gave really good tips. I think for simplicity reasons, I might do something like this for now:
class Book
{
Genre myGenre;
SubGenre mySubGenre;
String myTitle;
}
class Library
{
Map<String,Set<Book>> allBooks = new HashMap<String,Set<Book>>();
//where allBooks contains a mapping from book title, to all of the book objects which actually represent the same book but may contain different information related to their specific genre/subgenre
}
I'd imagine you would want your classes to look something like this:
public class Book
{
String name;
List<Review> reviews;
Set<Genre> genres;
public Book(String name, Set<Genre> genres){}
}
public class Genre
{
String name;
Set<Book> books;
public Genre(String name, Set<Book> books){}
}
I am making an assumption here that you will be utilizing a database, in turn you would have a DAO to query on all known books that match a criteria and subsequently perform CRUD operations across the datasets. I feel a bit off by suggesting that the Genre constructor takes a Set of Book objects, but at the moment I can't think of another way to do this right now.
So, the problem is to do with inverse relationships, really. It's quite difficult to avoid this and maintain efficiency. A relational database sidesteps this issue by optimising in the background, using efficient query operations, and never storing the inverse relationship in the first place.
If you use a relational database in the background, you can create methods that get the book groups using a relational query without ever storing the information in Java.
I would just make two enums, one BookGenre = {scifi, novel, ...} and a similar one for subgenres. When creating a new Book object, add a reference to it to a list that keeps track of all scifi books, etc. (i.e. make an EnumMap<BookGenre, List<Book>> which maps each genre to a list of books); in this way you can easily access all the books of a genre.
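A minimal sketch of the enum idea, assuming a BookGenre enum with a few sample constants and a hypothetical GenreCatalog class that owns the map. The book keeps its own genres in an EnumSet, and the catalog keeps the reverse direction in an EnumMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum BookGenre { SCIFI, NOVEL, FANTASY }

class Book {
    final String title;
    final Set<BookGenre> genres;

    Book(String title, BookGenre first, BookGenre... rest) {
        this.title = title;
        this.genres = EnumSet.of(first, rest);
    }
}

class GenreCatalog {
    // Reverse direction: genre -> all books registered under it.
    private final Map<BookGenre, List<Book>> booksByGenre = new EnumMap<>(BookGenre.class);

    // Register the same Book object under each of its genres.
    void add(Book book) {
        for (BookGenre genre : book.genres) {
            booksByGenre.computeIfAbsent(genre, g -> new ArrayList<>()).add(book);
        }
    }

    List<Book> booksIn(BookGenre genre) {
        return Collections.unmodifiableList(
                booksByGenre.getOrDefault(genre, Collections.emptyList()));
    }
}
```

Because every list holds a reference to the same Book object, a book with several genres is still one book; per-genre statistics would need their own structure keyed by book and genre.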
There have been good suggestions from the other posters, but your original idea might work as well. The biggest problem for you, if I understand you correctly, appears to be one of naming: your 'BookGroup' is not really a grouping as such, but a descriptor of which group (genre/subgenre) it belongs to plus associated statistics. If you renamed it to e.g. 'BookGenreStatistics', the question of who contains what would go away.
I think you want collections pointing to each other. And when adding a book to a changre you would also add the changre to the book. Then just iterate as needed to obtain what you wanted. A changre and a sub changre should really be the same class, no need to have different classes here.
An alternative to this would be not to have references in a book to what changres it belongs to, instead if you need to know you would have to iterate through all changres and see if the book is in them. Depends on how many changres there are and how usual it is for a book to belong to a changre. Let's say if most changres have over half of all the books in them. The obvious third option is not to have books in changres, in that case you would have to iterate through the books to obtain the changres, the question is if most books belong to almost all changres, or if changres are unusual and only contain few books.
If you chose option number one, then a changre would be able to contain books and other changres, and a book would be able to contain changres but not other books. Sounds similar doesn't it? Well, it is, a changre and a book is the same thing, well, almost. The main difference is how you use them. Imagine a tree where the changres on top point down to subchangres and so forth, then they in turn point down to books who in turn point back up to the subchangres they're part of. Then in order to find all books in a changre for instance you would just have to traverse the tree from root up, except when you're at a book you stop. If a book can belong to several changres (yes, it can, right?), then you just need a loop variable in the book that's set when iterating and if the book is reached a second time you know because the variable has already been set.
For instance finding all the books in a changre:
1. Construct collection object that is to hold the result.
2. (in subclass changre) Iterate through all changres and books (they might be stored in the same collection object)
2. (same method as above, but in subclass book) Check if iteration field is set, if so just return, else add this to the result collection object.
3. Unset iteration field in all books of the result collection object to make it possible to redo from step one. (the alternative to having such an iteration field is of course use a collection that doesn't matter if you put in duplicates)
-Done, a book simply instead of iterating through the changres it has (like a changre does) knows that it has to add itself to the result.
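The steps above can be sketched roughly as follows. This takes the alternative mentioned in step 3: instead of an iteration field on each book, the result is collected into a Set, so a book reached through several paths is stored only once. The class names (CatalogNode, Genre, Book) are chosen for the example, with Genre standing in for what is called a changre above, and the structure is assumed to be a tree of genres with no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Common supertype: both genres and books can be visited by the traversal.
abstract class CatalogNode {
    final String name;

    CatalogNode(String name) {
        this.name = name;
    }

    abstract void collectBooks(Set<Book> result);
}

class Book extends CatalogNode {
    Book(String name) {
        super(name);
    }

    // A book ends the traversal: it just adds itself to the result.
    @Override
    void collectBooks(Set<Book> result) {
        result.add(this);
    }
}

class Genre extends CatalogNode {
    // Sub-genres and books are stored in the same collection.
    private final List<CatalogNode> children = new ArrayList<>();

    Genre(String name) {
        super(name);
    }

    Genre add(CatalogNode child) {
        children.add(child);
        return this;
    }

    // A genre passes the traversal on to everything it contains.
    @Override
    void collectBooks(Set<Book> result) {
        for (CatalogNode child : children) {
            child.collectBooks(result);
        }
    }

    // Step 1: create the result collection, then start the traversal here.
    Set<Book> allBooks() {
        Set<Book> result = new LinkedHashSet<>();
        collectBooks(result);
        return result;
    }
}
```

Since the Set does the de-duplication, there is no flag to reset afterwards and the query can be repeated at any time.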
Now that I think about it, I think there's a tool that automatically generates code where you can specify things like a changre can have books and so on, and then to find all book reviews in a changre you can specify to traverse from the changre, pass at most one book on your path through the graph, and end in a review, and then aggregate the results, and it generates code that does that. I don't remember the name or what language it was, but I think code like this can be generated from only a few lines, but of course writing it yourself shouldn't hurt either.