How to implement a simple scenario the OO way

After starting to read a book on OO programming, I am attempting to make my Android app more OO. However, I am stumped on a simple scenario.
I have a Book object, which can have many Chapter objects. I also have a search function which searches across multiple books, 97 of them. I end up with many Chapter objects from the SQLite table.
I felt that it would be useful to the user to be able to see the title of the book on each result, otherwise it might be confusing if there are, say, two "chapter 5" results.
For that to happen, I need the book title. Should I make it part of my chapter object, like :
chapter.getBookTitle()
That does not seem right, as I would have glued the book name onto a chapter... The alternative is to instantiate a Book object for each chapter and somehow reference it, which has its own problems, including on Android, where I cannot pass a reference to an in-memory object to another activity.
Also, a book may have many other chapters which were not results in the search, and it may seem as if they had been returned too if I were to just instantiate the book.
What is the correct OO solution to this seemingly simple issue? Is it just a matter of learning when not to be dogmatic about the whole OO thing?
More Info:
I am using FTS4 in SQLite, which accounts for over half of my actual DB size of 80 MB. What I am storing is text from 97 books, with chapters in 4 languages. So my FTS table at the moment stores:
ChapterId, ChapterNo (withinBook), Lang1, Lang2, Lang3, Lang4, Tags, Notes
The searching is very fast; I retrieve only 50 results. I match any column against a search term, not one column in particular. So if I type "apple" it will search all the fields above.
Currently, as part of my FTS query I do a join onto Book, fetching the BookId, which I later use to get the title of the book. However, it is all in a procedural style, with no regard to where the information "belongs".
I need the title so I can display it in the results, just for user convenience.
It works well; however, I want similar or only slightly lower performance with an OO approach, as I think that will make more sense to me when I come back to this project after a long pause.

The Chapter object should have a reference to the book it came from, so I would suggest something like
chapter.getBook().getTitle();
Your database should have a books table and a chapters table with columns like:
books
    id
    book-specific info
    etc.
chapters
    id
    book_id
    chapter-specific info
    etc.
Then, to reduce the number of queries, join the two tables in your search query.
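As a rough sketch of the idea above (not the asker's actual code), each row of the joined query could be turned into a Chapter that points at a shared Book. The class ChapterMapper, its fromRow method and the column order are assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class Book {
    private final long id;
    private final String title;

    Book(long id, String title) {
        this.id = id;
        this.title = title;
    }

    long getId() { return id; }
    String getTitle() { return title; }
}

final class Chapter {
    private final long id;
    private final int number;
    private final Book book;

    Chapter(long id, int number, Book book) {
        this.id = id;
        this.number = number;
        this.book = book;
    }

    long getId() { return id; }
    int getNumber() { return number; }
    Book getBook() { return book; }
}

// Hypothetical helper: builds Chapter objects from rows of the joined query.
final class ChapterMapper {
    // Cache so that chapters of the same book share one Book instance.
    private final Map<Long, Book> booksById = new HashMap<>();

    Chapter fromRow(long chapterId, int chapterNo, long bookId, String bookTitle) {
        Book book = booksById.computeIfAbsent(bookId, id -> new Book(id, bookTitle));
        return new Chapter(chapterId, chapterNo, book);
    }
}
```

Here the Book carries only its id and title, so it does not pretend to hold chapters that were not part of the search results.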

The approach I would take is: when reading the chapters from the database, instead of a collection of chapters, use a collection of books. This will have your chapters organised into books and you'll be able to use information from both classes to present the information to the user (you can even present it in a hierarchical way easily when using this approach).
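One possible way to get that shape from a flat result list is to group the hits by book after the query. This is only a sketch; ChapterHit and ResultGrouper are invented names, and grouping by title (rather than by a book id) is a simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flat search result: one matched chapter plus its book title.
final class ChapterHit {
    final String bookTitle;
    final int chapterNo;

    ChapterHit(String bookTitle, int chapterNo) {
        this.bookTitle = bookTitle;
        this.chapterNo = chapterNo;
    }
}

final class ResultGrouper {
    // Groups hits by book title, keeping books in the order they first appear.
    static Map<String, List<ChapterHit>> groupByBook(List<ChapterHit> hits) {
        Map<String, List<ChapterHit>> byBook = new LinkedHashMap<>();
        for (ChapterHit hit : hits) {
            byBook.computeIfAbsent(hit.bookTitle, t -> new ArrayList<>()).add(hit);
        }
        return byBook;
    }
}
```

Each group contains only the chapters that matched, which makes a hierarchical "book, then chapters" display straightforward.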

You might implement your class model by composition, having the book object have a map of chapter objects contained within it (map chapter number to chapter object). Your search function could be given a list of books into which to search by asking each book to search its chapters. The book object would then iterate over each chapter, invoking the chapter.search() function to look for the desired key and return some kind of index into the chapter. The book's search() would then return some data type which could combine a reference to the book and some way to reference the data that it found for the search. The reference to the book could be used to get the name of the book object that is associated with the collection of chapter search hits.
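A minimal in-memory sketch of that composition might look like the following. The Hit type and the use of String.indexOf as the chapter-level search are assumptions for illustration; a real app would probably still let SQLite FTS do the searching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Chapter {
    private final String text;

    Chapter(String text) { this.text = text; }

    // Returns the index of the first occurrence of term, or -1 if absent.
    int search(String term) { return text.indexOf(term); }
}

// Hypothetical result type: ties a match back to its book and chapter.
final class Hit {
    final Book book;
    final int chapterNo;
    final int offset;

    Hit(Book book, int chapterNo, int offset) {
        this.book = book;
        this.chapterNo = chapterNo;
        this.offset = offset;
    }
}

final class Book {
    private final String title;
    // Chapter number -> chapter, kept sorted by chapter number.
    private final Map<Integer, Chapter> chapters = new TreeMap<>();

    Book(String title) { this.title = title; }

    String getTitle() { return title; }

    void addChapter(int number, String text) {
        chapters.put(number, new Chapter(text));
    }

    // Asks every chapter to search itself and collects the matches.
    List<Hit> search(String term) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<Integer, Chapter> entry : chapters.entrySet()) {
            int offset = entry.getValue().search(term);
            if (offset >= 0) {
                hits.add(new Hit(this, entry.getKey(), offset));
            }
        }
        return hits;
    }
}
```

The book title for any hit is then available as hit.book.getTitle().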

Related

Data structure for fast searching of custom object using its attributes (fields) in Java

I have an abstract superclass and some subclasses. My question is what the best way is to store objects of those classes so that I can easily find them by any of their different parameters.
For example, if I want to look up by resourceCode (every object has a unique resource code), I can use a HashMap keyed by resourceCode. But what happens if I want to look up by genre? There are many games with the same genre, so I would get all those games. My first idea was an ArrayList of those objects, but isn't that too slow if we have 1,000,000 games (about 1,000,000 operations per search)?
My other idea is to have a Hashtable keyed by the product code, so a lookup has constant complexity. Then I create as many HashSets as there are fields in the classes, and for each field value the HashSet holds the product codes of the objects with that value (for example, a given game promoter). With those unique codes I can get everything I want from the Hashtable. Is this a good idea? It seems a lot of space will be needed to store the data, but it will be fast.
So my question is: what data structure should I use to implement fast lookup of custom objects by their attributes (fields)?
Thank you in advance.
Stefan Stefanov

You can use sorted or ordered data structures to reduce search complexity.
You can introduce your own search index for custom data.
But it is better to use a database or a search engine.
Have a look at Elasticsearch, Apache Solr, PostgreSQL.
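If you do want to roll your own index, the idea from the question (one primary map by code, plus one secondary map per searchable field) could be sketched roughly as below. Game and GameCatalog are made-up names, and only the genre field is indexed here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class Game {
    final String resourceCode;
    final String genre;

    Game(String resourceCode, String genre) {
        this.resourceCode = resourceCode;
        this.genre = genre;
    }
}

final class GameCatalog {
    // Primary index: unique resource code -> game.
    private final Map<String, Game> byCode = new HashMap<>();
    // Secondary index: genre -> resource codes of the games in that genre.
    private final Map<String, Set<String>> codesByGenre = new HashMap<>();

    void add(Game game) {
        byCode.put(game.resourceCode, game);
        codesByGenre.computeIfAbsent(game.genre, g -> new HashSet<>())
                    .add(game.resourceCode);
    }

    Game findByCode(String code) {
        return byCode.get(code);
    }

    Set<Game> findByGenre(String genre) {
        Set<Game> result = new HashSet<>();
        Set<String> codes = codesByGenre.getOrDefault(genre, Collections.emptySet());
        for (String code : codes) {
            result.add(byCode.get(code));
        }
        return result;
    }
}
```

Note that this sketch never removes or updates entries; keeping the secondary maps consistent on updates is exactly the bookkeeping a database would do for you.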

It sounds like most of your fields can be mapped to a string (name, genre, promoter, description, year of release, ...). You could put all these strings in a single large index that maps each keyword to all objects that contain the word in any of their fields. Then if you search for certain keywords it will return a list of all entries that contain that word. For example, searching for 'mine' should return 'minecraft' (because of the title), as well as all Minecraft clones (having 'minecraft-like' as genre) and all games that use the word 'mine' in the 'info text' field.
You can code this yourself, but I suppose a full-text indexer such as Lucene may be useful. I haven't used Lucene myself, but I suppose it would also allow you to search for multiple keywords at once, even if they occur in different fields.
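A hand-coded version of such a keyword index might look something like this. It is a sketch, not how Lucene works internally: KeywordIndex is an invented class, words are split on anything that is not a lowercase letter or digit, and the 'mine' to 'minecraft' match is done as a prefix lookup on a sorted map.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class KeywordIndex {
    // Sorted map: word -> ids of all objects containing that word in any field.
    private final TreeMap<String, Set<String>> idsByWord = new TreeMap<>();

    // Indexes every word of every field under the given object id.
    void add(String id, String... fields) {
        for (String field : fields) {
            String[] words = field.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    idsByWord.computeIfAbsent(word, w -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    // Returns the ids of all objects with a word that starts with the prefix.
    Set<String> searchPrefix(String prefix) {
        String from = prefix.toLowerCase(Locale.ROOT);
        // All keys starting with 'from' sort between 'from' and 'from' + '\uffff'.
        String to = from + Character.MAX_VALUE;
        Set<String> result = new TreeSet<>();
        for (Set<String> ids : idsByWord.subMap(from, to).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

Searching for several keywords at once could then be done by intersecting the result sets of the individual lookups.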

This is not a very appealing answer.
Start with a database. Maybe an embedded database (like h2database).
Easy to keep a fixed set of development/test data that can be changed easily (the database dump).
Too many indices (hash maps) do harm.
Developing and optimizing queries is easier (declarative) than with data structures.
Database tables are less coupled than data structures with helper structures (maps).
The resulting system is far less complex and scales better.
After development has stabilized the set of queries, you can think of doing away with the DB part. Use at least a two-tier separation between the database and the classes.
Then you might find a stable and best fitting data model.
Should you still intend to do it all with pure objects, then work them out in detail as design documentation before you start programming. Example stories, and how one solves them.

The same data in multiple data structures - the right approach

I am writing a simple book management (library) application, but I am confused about the right data representation.
I have classes like:
Book (isbn, author, title, libraryIdNumber)
BookDatabase (collection of Books)
In this example the main data structure inside BookDatabase is a
Map<Integer, Book> booksById;
where key is a unique library identifier of the book (for example incremental int value), and the value is just Book instance.
It is now easy to find a book by ID, but it is hard to find all books with the same title. So I would need another data structure like
Map<String, List<Book>> booksByTitle;
where the key is title of the book, and the value is a list of all books with this title in library (for example 10 copies of "Lion King").
I know that the best way to store such data would be a database, but the question is more general:
Is it OK to store the same data in many data structures inside one storage class, or is that memory inefficient? Or is it better to build an additional representation of the data only when it is actually needed (for example by invoking the Map.values() method)?
What is the best approach in such case in your opinion?

Is it ok to store the same data in many data structures inside one storage class or it is memory inefficient?
Yes, it is OK. You'll just be copying the references to your Book objects, not the actual Book objects themselves, so it is memory-efficient.
P.S. You are right that a database would be better: it makes querying very easy, for example.
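To illustrate the point about references, here is a small sketch of a BookDatabase that keeps both maps from the question in sync. The method names (add, findById, findByTitle) are assumptions, and the Book class is cut down to two fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Book {
    final int libraryId;
    final String title;

    Book(int libraryId, String title) {
        this.libraryId = libraryId;
        this.title = title;
    }
}

final class BookDatabase {
    private final Map<Integer, Book> booksById = new HashMap<>();
    private final Map<String, List<Book>> booksByTitle = new HashMap<>();

    // Both maps store a reference to the same Book object; nothing is copied.
    void add(Book book) {
        booksById.put(book.libraryId, book);
        booksByTitle.computeIfAbsent(book.title, t -> new ArrayList<>()).add(book);
    }

    Book findById(int id) {
        return booksById.get(id);
    }

    List<Book> findByTitle(String title) {
        return booksByTitle.getOrDefault(title, Collections.emptyList());
    }
}
```

The extra cost per book is one map entry and one list slot, not a second copy of the book's data.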

What is an index in Elasticsearch

What is an index in Elasticsearch? Does one application have multiple indexes or just one?
Let's say you built a system for some car manufacturer. It deals with people, cars, spare parts, etc. Do you have one index named manufacturer, or do you have one index for people, one for cars and a third for spare parts? Could someone explain?

Good question, and the answer is a lot more nuanced than one might expect. You can use indices for several different purposes.
Indices for Relations
The easiest and most familiar layout clones what you would expect from a relational database. You can (very roughly) think of an index like a database.
MySQL => Databases => Tables => Rows/Columns
ElasticSearch => Indices => Types => Documents with Properties
An ElasticSearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns).
So in your car manufacturing scenario, you may have a SubaruFactory index. Within this index, you have three different types:
People
Cars
Spare_Parts
Each type then contains documents that correspond to that type (e.g. a Subaru Imprezza doc lives inside of the Cars type. This doc contains all the details about that particular car).
Searching and querying takes the format of: http://localhost:9200/[index]/[type]/[operation]
So to retrieve the Subaru document, I may do this:
$ curl -XGET localhost:9200/SubaruFactory/Cars/SubaruImprezza
Indices for Logging
Now, the reality is that Indices/Types are much more flexible than the Database/Table abstractions we are used to in RDBMSs. They can be considered convenient data organization mechanisms, with added performance benefits depending on how you set up your data.
To demonstrate a radically different approach, a lot of people use ElasticSearch for logging. A standard format is to assign a new index for each day. Your list of indices may look like this:
logs-2013-02-22
logs-2013-02-21
logs-2013-02-20
ElasticSearch allows you to query multiple indices at the same time, so it isn't a problem to do:
$ curl -XGET 'localhost:9200/logs-2013-02-22,logs-2013-02-21/Errors/_search?q=%22Error%20Message%22'
This searches the logs from the last two days at the same time. This format has advantages due to the nature of logs: most logs are never looked at, and they are organized in a linear flow of time. Making an index per day is more logical and offers better performance for searching.
Indices for Users
Another radically different approach is to create an index per user. Imagine you have some social networking site, and each user has a large amount of random data. You can create a single index for each user. Your structure may look like:
Zach's Index
    Hobbies Type
    Friends Type
    Pictures Type
Fred's Index
    Hobbies Type
    Friends Type
    Pictures Type
Notice how this setup could easily be done in a traditional RDBMS fashion (e.g. a "Users" index, with hobbies/friends/pictures as types). All users would then be thrown into a single, giant index.
Instead, it sometimes makes sense to split data apart for data organization and performance reasons. In this scenario, we are assuming each user has a lot of data, and we want them separate. ElasticSearch has no problem letting us create an index per user.

@Zach's answer is valid for Elasticsearch 5.x and below. Since Elasticsearch 6.x, types have been deprecated, and they will be completely removed in 7.x. Quoting the Elasticsearch docs:
Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”.
This was a bad analogy that led to incorrect assumptions.
To explain further: two columns with the same name in two different SQL tables can be independent of each other. In an Elasticsearch index that is not possible, since fields with the same name in different types are backed by the same Lucene field. Thus an "index" in Elasticsearch is not quite the same as a "database" in SQL. If the same field name appears with different types in one index, the field types conflict. To avoid this, the Elasticsearch documentation recommends one index per document type.
Refer: Removal of mapping types

An index is a data structure for storing the mapping of fields to the corresponding documents. The objective is to allow faster searches, often at the expense of increased memory usage and preprocessing time.
The number of indexes you create is a design decision that you should make according to your application requirements. You can have an index for each business concept... You can have an index for each month of the year...
You should invest some time getting acquainted with Lucene and Elasticsearch concepts.
Take a look at the introductory video and at this one with some data design patterns.

The answer above is very detailed; in short, an index could be defined as:
Index: a collection of different types of documents and document properties. An index also uses the concept of shards to improve performance. For example, a set of documents contains the data of a social networking application.
Answer from tutorialpoints.com
Since an index is a collection of different types of documents, the answer to the question depends on how you want to categorize your data.
Do you have one index named manufacturer?
Yes, we will keep one document with the manufacturer data.
Do you have one index for people, one for cars and a third for spare parts? Could someone explain?
Think of a car from the same manufacturer being driven by many people on the road. So there could be many indices, depending on the number of uses.
If we think about it carefully, we will find that all of the questions except the first are invalid.
Elasticsearch documents are very different from SQL rows, CSV files or spreadsheets: from one index, with its powerful query language, you can create millions of differently categorised document sets in CSV style.
Because it is blazingly fast and fully indexed, we create only one index per customer, and from that we create many types of documents as we need.
For example:
All old people using the same model, or one old person using all models.
The permutations are infinite.

Java program design question

I'm trying to come up with a simple way of organizing some objects, in terms of what classes to create. Let's say I'm trying to keep track of books. A book can fall under a number of different genres and subgenres. I want to be able to recognize a book as one book and yet have it fall under these different categories. I have a genre class which keeps track of all the subgenres, and a subgenre class which has all of the books in it. I want the book to know all of the genres and subgenres that it falls under. I also want to keep track of some statistics (reviews, comments, number of times read, etc.) based on genre and subgenre and then be able to aggregate them to get numbers for the entire book. In this way, a user could select a book and know each genre/subgenre that the book belongs to, and some statistics about that book for each category.
What are some ideas for how I can design this?
My thought was to have each Book define a class called BookGroup, and the BookGroup would contain the Genre and Subgenre, along with any relevant information for that category (assuming that subgenres can only belong to one genre). Then in the Book class I would keep a set of bookgroups that the book belongs in. I can add up stats from all the different bookgroups. The only thing I don't like about this is that I feel like a BookGroup should contain Books, not the other way around.
Any other ideas?
Thanks!
Edit:
All you guys gave really good tips. I think for simplicity reasons, I might do something like this for now:
class Book
{
    Genre myGenre;
    SubGenre mySubGenre;
    String myTitle;
}
class Library
{
    // Maps a book title to all of the Book objects which actually represent the same book
    // but may contain different information related to their specific genre/subgenre.
    Map<String, Set<Book>> allBooks = new HashMap<String, Set<Book>>();
}

I'd imagine you would want your classes to look something like this:
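import java.util.List;
import java.util.Set;

public class Book
{
    String name;
    List<Review> reviews;
    Set<Genre> genres;

    public Book(String name, Set<Genre> genres)
    {
        this.name = name;
        this.genres = genres;
    }
}
public class Genre
{
    String name;
    Set<Book> books;

    public Genre(String name, Set<Book> books)
    {
        this.name = name;
        this.books = books;
    }
}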
I am making the assumption here that you will be using a database; in turn, you would have a DAO to query for all known books that match some criteria and subsequently perform CRUD operations across the datasets. I feel a bit uneasy suggesting that the Genre constructor take a Set of Book objects, but right now I can't think of another way to do this.

So, the problem is to do with inverse relationships, really. It's quite difficult to avoid this and maintain efficiency. A relational database sidesteps this issue by optimising in the background, using efficient query operations, and never storing the inverse relationship in the first place.
If you use a relational database in the background, you can create methods that get the book groups using a relational query without ever storing the information in Java.

I would just make two enums, one BookGenre = {scifi, novel, ...} and a similar one for subgenres. When creating a new Book object, add a reference to it to a list which keeps track of all scifi books, etc. (i.e. make an EnumMap<BookGenre, List<Book>> which maps each genre to a list of books); in this way you can easily access all the books of a genre.
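A small sketch of the EnumMap idea, with a made-up GenreRegistry class holding the map and three example genres standing in for the real list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Example genres only; the real enum would list whatever genres the app needs.
enum BookGenre { SCIFI, NOVEL, HISTORY }

final class Book {
    final String title;

    Book(String title) { this.title = title; }
}

final class GenreRegistry {
    // EnumMap is a compact Map implementation specialised for enum keys.
    private final Map<BookGenre, List<Book>> booksByGenre = new EnumMap<>(BookGenre.class);

    // Registers one book under any number of genres.
    void register(Book book, BookGenre... genres) {
        for (BookGenre genre : genres) {
            booksByGenre.computeIfAbsent(genre, g -> new ArrayList<>()).add(book);
        }
    }

    List<Book> booksIn(BookGenre genre) {
        return booksByGenre.getOrDefault(genre, Collections.emptyList());
    }
}
```

The same Book object can sit in several lists, so it stays "one book" while belonging to several genres.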

There have been good suggestions from the other posters, but your original idea might work as well. The biggest problem for you, if I understand you correctly, appears to be one of naming: your 'BookGroup' is not really a grouping as such, but a descriptor of which group (genre/subgenre) it belongs to plus associated statistics. If you renamed it to e.g. 'BookGenreStatistics', the question of who contains what would go away.

I think you want collections pointing to each other. When adding a book to a genre you would also add the genre to the book. Then just iterate as needed to obtain what you want. A genre and a subgenre should really be the same class; there is no need for different classes here.
An alternative would be not to have references in a book to the genres it belongs to; instead, if you need to know, you would iterate through all genres and see if the book is in them. Whether that is reasonable depends on how many genres there are and how usual it is for a book to belong to a genre, for example if most genres have over half of all the books in them. The obvious third option is not to have books in genres; in that case you would have to iterate through the books to obtain the genres. The question is whether most books belong to almost all genres, or whether genres are unusual and only contain a few books.
If you choose option number one, then a genre would be able to contain books and other genres, and a book would be able to contain genres but not other books. Sounds similar, doesn't it? Well, it is: a genre and a book are almost the same thing. The main difference is how you use them. Imagine a tree where the genres on top point down to subgenres and so forth, which in turn point down to books, which in turn point back up to the subgenres they are part of. Then, in order to find all books in a genre, for instance, you would just traverse the tree from the root down, stopping whenever you reach a book. If a book can belong to several genres (yes, it can, right?), then you just need a marker field in the book that is set during iteration; if the book is reached a second time you know, because the field has already been set.
For instance, finding all the books in a genre:
1. Construct the collection object that is to hold the result.
2a. (In subclass genre) Iterate through all contained genres and books (they might be stored in the same collection object).
2b. (Same method as above, but in subclass book) Check if the iteration field is set; if so just return, else add this book to the result collection object.
3. Unset the iteration field in all books of the result collection to make it possible to redo from step one. (The alternative to having such an iteration field is of course to use a collection that ignores duplicates.)
Done. Instead of iterating through the genres it has (like a genre does), a book simply knows that it has to add itself to the result.
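The steps above could be sketched as follows, using the alternative mentioned in step 3: a Set as the result collection, so no iteration field is needed. The Node base class and the method names are invented for this sketch, and the back-references from books up to their genres are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Common supertype, so genres and books can be stored in the same collection.
abstract class Node {
    final String name;

    Node(String name) { this.name = name; }

    abstract void collectBooks(Set<Book> result);
}

final class Book extends Node {
    Book(String name) { super(name); }

    // A book ends the traversal by adding itself; the Set ignores duplicates.
    @Override
    void collectBooks(Set<Book> result) {
        result.add(this);
    }
}

final class Genre extends Node {
    private final List<Node> children = new ArrayList<>();

    Genre(String name) { super(name); }

    // Adds a subgenre or a book; returns this genre to allow chaining.
    Genre add(Node child) {
        children.add(child);
        return this;
    }

    // A genre passes the traversal on to its subgenres and books.
    @Override
    void collectBooks(Set<Book> result) {
        for (Node child : children) {
            child.collectBooks(result);
        }
    }

    Set<Book> allBooks() {
        Set<Book> result = new LinkedHashSet<>();
        collectBooks(result);
        return result;
    }
}
```

A book reachable through several subgenres shows up only once in the result, which is what the iteration field was meant to guarantee.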
Now that I think about it, I think there is a tool that automatically generates such code: you specify things like "a genre can have books", and then, to find all book reviews in a genre, you specify a traversal that starts from the genre, passes at most one book on its path through the graph, and ends in a review, then aggregates the results; the tool generates code that does that. I don't remember its name or what language it was for, but I think code like this can be generated from only a few lines of specification. Of course, writing it yourself shouldn't hurt either.

Multilingual fields in DB tables

I have an application that needs to support a multilingual interface, five languages to be exact. For the main part of the interface the standard ResourceBundle approach can be used to handle this.
However, the database contains numerous tables whose elements contain human readable names, descriptions, abstracts etc. It needs to be possible to enter each of these in all five languages.
While I suppose I could simply have fields on each table like
NameLang1
NameLang2
...
I feel that that leads to a significant amount of largely identical code when writing the beans that represent each table.
From a purely object-oriented point of view, however, the solution is simple. Each class simply has a Text object that contains the relevant text in each of the languages. This is further helpful in that only one of the languages is mandatory; the others have fallback rules (e.g. if language 4 is missing, return language 2, which falls back to language 1, which is mandatory).
Unfortunately, mapping this back to a relational database means that I wind up with a single table that some 10-12 other tables have FKs to (some tables in fact have more than one FK to it).
This approach seems to work and I've been able to map the data to POJOs with Hibernate. About the only thing you can't do is map from a Text object to its parent (since you have no way of knowing which table you should link to), but then there is hardly any need to do that.
So, overall this seems to work but it just feels wrong to have multiple tables reference one table like this. Anyone got a better idea?
If it matters I'm using MySQL...

I had to do that once... multilingual text for some tables... I don't know if I found the best solution but what I did was have the table with the language-agnostic info and then a child table with all the multilingual fields. At least one record was required in the child table, for the default language; more languages could be added later.
With Hibernate you can map the info from the child tables as a Map, and get the info for the language you want, implementing the fallback in your POJO like you said. You can have different getters for the multilingual fields that internally call the fallback method to get the appropriate child object for the needed language and then just return the required field.
This approach uses more tables (one extra table for every table that needs multilingual info), but the performance is much better, as is the maintainability, I think...
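Leaving the Hibernate mapping aside, the fallback part could be sketched in plain Java roughly like this. LocalizedText, setFallback and the language codes are assumptions for illustration; in the real POJO the map would be populated from the child table.

```java
import java.util.HashMap;
import java.util.Map;

final class LocalizedText {
    // Language code -> text in that language (one entry per child-table row).
    private final Map<String, String> byLanguage = new HashMap<>();
    // Language code -> language to try next when the text is missing.
    private final Map<String, String> fallback = new HashMap<>();
    private final String defaultLanguage;

    // The default language is mandatory, so its text is required up front.
    LocalizedText(String defaultLanguage, String defaultText) {
        this.defaultLanguage = defaultLanguage;
        byLanguage.put(defaultLanguage, defaultText);
    }

    void put(String language, String text) {
        byLanguage.put(language, text);
    }

    void setFallback(String language, String fallbackLanguage) {
        fallback.put(language, fallbackLanguage);
    }

    // Follows the fallback chain; the loop bound guards against cyclic rules.
    String get(String language) {
        String current = language;
        for (int i = 0; current != null && i <= fallback.size(); i++) {
            String text = byLanguage.get(current);
            if (text != null) {
                return text;
            }
            current = fallback.get(current);
        }
        return byLanguage.get(defaultLanguage);
    }
}
```

A getter such as getName(language) on the entity would then simply delegate to get(language) on the corresponding text object.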

The standard translation approach as used, for example, in gettext is to use a single string to describe the concept and make a call to a translate method which translates to the destination language.
This way you only need to store a single string in the database (the canonical representation) and then call the translate method in your application to get the translated string. No FKs and total flexibility, at the cost of a little runtime performance (and maybe a bit more maintenance trouble, but with some thought there's no need for maintenance to become a problem in this scenario).

The approach I've seen in an application with a similar problem is that we use a "text id" column to store a reference, and we have a single table with all the translations. This provides some flexibility also in reusing the same keys to reduce the amount of required translations, which is an expensive part of the project.
It also provides a good separation between the data and the translations, which in my opinion are more of a UI thing.
If it is the case that the strings you require are not that many after all, then you can just load them all in memory once and use some method to provide translations by checking a data structure in memory.
With this approach, your beans won't have getters for each language, but you would use some other translator object:
MyTranslator.translate(myBean.getNameTextId());
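For the "load them all in memory once" variant, a translator could be sketched as below. Unlike the one-line call above, this version is instance-based and takes the language explicitly; the load method, the integer text ids and the default-language fallback are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class MyTranslator {
    // Text id -> (language code -> translation), filled once from the translations table.
    private final Map<Integer, Map<String, String>> translations = new HashMap<>();
    private final String defaultLanguage;

    MyTranslator(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    // Registers one row of the (hypothetical) translations table.
    void load(int textId, String language, String text) {
        translations.computeIfAbsent(textId, id -> new HashMap<>()).put(language, text);
    }

    // Returns the translation, the default-language text, or null for an unknown id.
    String translate(int textId, String language) {
        Map<String, String> byLanguage = translations.get(textId);
        if (byLanguage == null) {
            return null;
        }
        String text = byLanguage.get(language);
        return text != null ? text : byLanguage.get(defaultLanguage);
    }
}
```

The beans then only carry text ids, and the same id can be reused wherever the same wording appears.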

Depending on your requirements, it may be best to have a separate label table for each table which needs to be multilingual, e.g. you have an XYZ table with an xyz_id column, and an XYZ_Label table with xyz_id, language_code, label, other_label, etc.
The advantage of this over having a single huge labels table is that you can put unique constraints on the XYZ_Label table (e.g. the English name for XYZ must be unique), and you can do indexed lookups much more efficiently, since the index will only be covering a single table at a time (e.g. if you need to look up XYZ entities by English name).

What about this:
http://rob.purplerockscissors.com/2009/07/24/internationalizing-websites/
...that is what user "Chochos" says in response #2
