Java class that models a book

Before asking the question I have to say that this is homework and I am only looking for ideas on how to implement a class that models a book.
A book has a title and many chapters, each chapter has a title and multiple subchapters. Every subchapter has a title and a list of paragraphs. Every paragraph can have a number and a text. I am looking to implement in OOP the following functionalities: add/remove: chapters, subchapters and paragraphs and display the book.
I am having some trouble finding the right structure for the class.
This is how I'm thinking of implementing it but it seems kind of redundant and complicated. Are there simpler and more correct ways to do it?
public class Element<T> {
int nr;
String Title;
ArrayList<T> x = new ArrayList<T>();
Element() {
nr = 0;
Title = "";
}
public void removeFromElement(int index) {
if (index <= 0 && index > x.size())
System.out.println("The chapter doesn't exist");
else {
x.remove(index);
System.out.println("Succesful deletion");
}
}
public void addToElement(T elem) {
x.add(elem);
}
}
public class Paragraph {
int nr;
String text;
}
public class Subchapter extends Element<Paragraph> {
}
public class Chapter extends Element<Subchapter> {
}
public class Book extends Element<Chapter> {
}

Your first approach is pretty good but does not take into account the possibility of evolution.
What would happen if we asked you to add the possibility for paragraphs to have 0-n quotes (or anything else)?
What would happen if we asked you to add, in your book, a type of chapter which can't have subchapters?
This will need huge changes. You should take a look at composite pattern.
Following this pattern, you'll be way more flexible as far as evolutions are concerned.
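As a rough illustration of the composite pattern applied to this problem, here is a minimal sketch. The names BookComponent, TextParagraph and Section are hypothetical and do not come from the question; the idea is only that a leaf (paragraph) and a container (book, chapter, subchapter) share one interface, so adding a new kind of element does not force changes to the others.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface shared by leaves and containers.
interface BookComponent {
    // Renders this component as text, indented by two spaces per depth level.
    String render(int depth);
}

// Leaf: a paragraph holds text and has no children.
final class TextParagraph implements BookComponent {
    private final String text;

    TextParagraph(String text) {
        this.text = text;
    }

    @Override
    public String render(int depth) {
        return "  ".repeat(depth) + text + "\n";
    }
}

// Composite: a titled element (book, chapter, subchapter) holding any components.
final class Section implements BookComponent {
    private final String title;
    private final List<BookComponent> children = new ArrayList<>();

    Section(String title) {
        this.title = title;
    }

    void add(BookComponent child) {
        children.add(child);
    }

    // Returns true if the child was present and has been removed.
    boolean remove(BookComponent child) {
        return children.remove(child);
    }

    @Override
    public String render(int depth) {
        StringBuilder out = new StringBuilder("  ".repeat(depth)).append(title).append("\n");
        for (BookComponent child : children) {
            out.append(child.render(depth + 1));
        }
        return out.toString();
    }
}
```

With this shape, a chapter that cannot have subchapters is simply a Section that only ever receives TextParagraph children; whether that restriction should be enforced by the type system is a separate design decision.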
When you think in OOP terms, always keep in mind that interfaces can play a huge role in how you design your code.
Many of these problems have already been solved and have a conventional solution (design patterns). You should take the time to learn the most common ones. It will definitely change your approach to coding.
("Head First Design Patterns" would be a good book to read.)
Moreover, one of the most important features of OOP is encapsulation.
It provides a very powerful way to control access to a class's attributes.
You MUST use it. Start by adding private or protected modifiers to your class's attributes and create the getters/setters needed to access or modify them.
Another thing to note:
You should not use System.out.println() to log from your code. Use a logging API (log4j, for instance) and exceptions.
public void removeFromElement(int index) throws ChapterNotFoundException {
    // Valid indices are 0 .. x.size() - 1. A test of the form
    // (index <= 0 && index > x.size()) can never be true.
    if (index < 0 || index >= x.size()) {
        throw new ChapterNotFoundException(String.format("Chapter %s does not exist", index));
    }
    x.remove(index);
    logger.info(String.format("Chapter %s has been removed", index));
}

This is actually not the correct design (but I like how you're thinking about minimizing the amount of code you write).
A Book is not extended by chapters or "chapter elements". A book contains chapters, and it can also contain or do other things.
Therefore the correct design is the simplest one:
public class Book{
private String title;
private List<Chapter> chapters;
}
//other classes would look similar
This approach is much more stable, and it allows easy modification (or even replacement) in the future.
This "phenomenon" also has a name: composition over inheritance.
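To make the composition idea a little more concrete, here is a minimal sketch of Book and Chapter with add/remove operations. The method names (addChapter, removeChapter, getChapters) are assumptions for illustration; Subchapter and Paragraph would follow the same shape.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Chapter {
    private final String title;

    Chapter(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }
}

final class Book {
    private final String title;
    private final List<Chapter> chapters = new ArrayList<>();

    Book(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }

    void addChapter(Chapter chapter) {
        chapters.add(chapter);
    }

    // Removes the chapter at a zero-based index and returns it.
    // Throws IndexOutOfBoundsException if the index is invalid.
    Chapter removeChapter(int index) {
        return chapters.remove(index);
    }

    // Read-only view, so callers cannot bypass addChapter/removeChapter.
    List<Chapter> getChapters() {
        return Collections.unmodifiableList(chapters);
    }
}
```

Because Book owns its list rather than inheriting it, the book can later gain other members (an author, an index, appendices) without affecting Chapter.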

Your overall model actually makes a fair amount of sense. You've identified the repeated code and separated it out into a class. Your use of generics helps keep that clean. Whether it's worth explicitly separating out layers of the tree (for that's what this essentially is) as Subchapters, Chapters, etc depends on your exact requirements.
It might be simpler just to define tree nodes and leaves, but if you do need different behavior at each layer and don't need flexibility to add or remove more layers then this is fine. Consider for example if you will ever have an omnibus with Omnibus->Book->Chapter->Subchapter->Paragraph, or a book with Book->Chapter->Section->Subchapter->Paragraph. Could your model support those? If not does it need to?
Some of the naming could be clearer (for example nr as number) or doesn't follow style conventions (Title should be title).
The main mistake I'd say would be to store the number inside the object at all. That is fragile as it means you constantly have to update it as things are added, removed, etc.
You can always find out the number by just looking at the position in the parent. Essentially you are duplicating that information.
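A small sketch of deriving the number from the position instead of storing it. The class NumberedChapters and its use of plain String titles are simplifications assumed for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class NumberedChapters {
    private final List<String> titles = new ArrayList<>();

    void add(String title) {
        titles.add(title);
    }

    void remove(String title) {
        titles.remove(title);
    }

    // One-based number computed from the current position in the list;
    // returns -1 if the title is not present.
    int numberOf(String title) {
        int index = titles.indexOf(title);
        return index < 0 ? -1 : index + 1;
    }
}
```

After a removal, later chapters are renumbered automatically because nothing was stored in the first place.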
As Grégory pointed out in the comments all variables should be private and accessed via getters and setters.

Your design is good. There are a few bugs in the code; I suggest writing a few unit tests to find them (hint: try to delete pages).
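One likely candidate for such a bug is the bounds check in removeFromElement. The sketch below isolates that condition into two hypothetical helper methods (the names are made up for illustration) so that it can be exercised by a test without building a whole book.

```java
final class BoundsChecks {
    // The condition as written in the question. For a non-negative size,
    // no index is both <= 0 and > size, so this never rejects anything.
    static boolean originalRejects(int index, int size) {
        return index <= 0 && index > size;
    }

    // A corrected check for zero-based indices into a list of the given size.
    static boolean correctedRejects(int index, int size) {
        return index < 0 || index >= size;
    }
}
```

A unit test that tries to delete element 5 from a list of 3 would pass the original check and then fail inside ArrayList.remove with an IndexOutOfBoundsException.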
I've written software to handle books in several languages (Java, Python, Groovy) and found that it's harder than it looks. The problem is that all elements (book, chapter, sub-chapter, paragraph) have many common features but subtle differences. Most of them are containers for specific types of other elements (so a book is made of chapter but not paragraphs). Only a paragraph isn't a container. All of them have a text (title or text of the paragraph) but the semantics of the text is different. Also, titles for sub-chapters don't make sense, or do they?
That makes it so hard to come up with a good API to handle all the corner cases without too much duplicated code, especially in a language like Java where you don't have mixins.

Related

Efficient comparison between two POJOs to determine the extent of how similar they are

I need to make a comparison between 2 POJOs, but instead of checking for equality, I'm trying to determine how similar they are even though I know they are not the same. For instance, out of the 20 fields they have, I need to determine how many are the same/different.
ex:
public class Objekt {
private int field1;
private String field2;
private String field3;
...
private List<Integer> field4;
public Objekt () {
...
}
public void compareWith(Objekt other) {
if (field1 != other.field1)
System.out.println("Field 1 is different");
if (!field2.equals(other.field2))
System.out.println("Field 2 is different");
...
// etc
}
}
Having to compare each field manually seems like a lot of extra boilerplate code, and it's also not scalable if I were to need a similar method for other objects. I was curious if there's a solution out there to do something similar, or if anyone has any ideas on how I could make this more efficient?
New to StackOverflow, thanks for any suggestions! :)

The simple answer is: Java isn't built for "unstructured" code like you have in mind.
Meaning: there is reflection (see https://www.oracle.com/technetwork/articles/java/javareflection-1536171.html), for example, which allows you to write code that inspects the fields of objects of arbitrary classes. But that is extremely cumbersome and error-prone, and of course it comes with a high performance penalty at runtime.
There are libraries built around that, though, such as EqualsBuilder (https://commons.apache.org/proper/commons-lang/javadocs/api-3.5/org/apache/commons/lang3/builder/EqualsBuilder.html), that make things a bit easier to use.
But as said: the real answer is that you normally strive to avoid such designs. You should rather step back and find ways to solve the underlying problem that have better solutions in Java. Or ask yourself why you want to use a statically typed language like Java for a problem that dynamic languages are much better suited for.
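For reference, the reflection approach mentioned above could look roughly like the following sketch. FieldDiff and Sample are hypothetical names; the sketch only inspects fields declared directly on the class (not inherited ones) and compares values with Objects.equals, so arrays are compared by reference. It carries the caveats already described: it is slower than hand-written comparisons and bypasses encapsulation.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class FieldDiff {
    // Returns the names of the declared instance fields whose values differ.
    static List<String> differingFields(Object a, Object b) {
        if (a.getClass() != b.getClass()) {
            throw new IllegalArgumentException("Objects must be of the same class");
        }
        List<String> differing = new ArrayList<>();
        for (Field field : a.getClass().getDeclaredFields()) {
            // Skip static and compiler-generated fields.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true); // needed to read private fields
            try {
                if (!Objects.equals(field.get(a), field.get(b))) {
                    differing.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return differing;
    }
}

// Hypothetical sample class used only to exercise the sketch.
final class Sample {
    private final int id;
    private final String name;

    Sample(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
```

The size of the returned list, relative to the number of fields, gives a crude similarity measure.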

Interface versus implementing Queue

Queue is an interface that implements many different classes. I am confused about why my textbook gives this sample code using Queue. I have also found this in an example on the Princeton website. Is this a customary way to provide code so it can be edited later to the programmer's preferred type of queue?
This is code taken from an algorithm for a Binary Search Symbol Table.
public Iterable<Key> keys(Key lo, Key hi) {
Queue<Key> q = new Queue<Key>();
for (int i = rank(lo); i < rank(hi); i++) {
q.enqueue(keys[i]);
}
if (contains(hi)) {
q.enqueue(keys[rank(hi)]);
}
return q;
}

So first: there are many classes that implement Queue, not "Queue is an interface that implements many different classes".
Second, yes, the convention is to use the interface wherever possible, to make code more flexible in case a different implementation is desired. This is especially true of method signatures, but it is also good practice for variable and field declarations.
Third, looks like a typo. Line should be Queue<Key> q = new LinkedList<Key>();
See answer here: Why are variables declared with their interface name in Java?
Also, What does it mean to “program to an interface”?
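As a small illustration of declaring the variable with the interface type, here is a sketch using the standard java.util.Queue. Note that java.util.Queue has no enqueue method (it offers add and offer), so this sketch uses add; the KeyRange class and the simplified index-based signature are assumptions for illustration and are not the textbook's code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class KeyRange {
    // Collects sorted[lo] .. sorted[hi - 1] into a queue and returns it as an Iterable.
    static Iterable<String> keys(String[] sorted, int lo, int hi) {
        // Declared as the interface; ArrayDeque could be swapped for
        // LinkedList without touching any other line.
        Queue<String> q = new ArrayDeque<>();
        for (int i = lo; i < hi; i++) {
            q.add(sorted[i]);
        }
        return q;
    }
}
```

Callers only see Iterable<String>, so they are unaffected by which queue implementation is chosen.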

I do this often for other interfaces where the specific implementation doesn't provide anything over the interface, like List.
List<String> strings = new ArrayList<>();
In that case I retain the most flexibility with the least amount of hassle.
Granted, you won't change implementations very often, but there are some specific ones where for example inserting takes O(n) time and lookup takes O(log n) time or vice versa and in those cases you have to do some research how you're using the interface.
Do you change the contents of the collection a lot, but read it sparingly? Or is it the other way around and do you insert just once and read often?

Your question is very broad. This technique is called programming to an interface, and it's the result of years of accumulated good coding practice. It just makes things easier in so many ways. It revolutionised how we write software today. If you're an eager learner and want to know more, check this article as a starter.

Database information -> object: how should it be done?

My application will upon request retrieve information from a database and produce an object from that information. I'm currently considering two different techniques (but I'm open to others as well!) to complete this task:
Method one:
class Book {
private int id;
private String author;
private String title;
public Book(int id) {
ResultSet book = getBookFromDatabaseById(id);
this.id = book.id;
this.author = book.author;
// ...
}
}
Method two:
public class Book {
private HashMap<String, Object> propertyContainer;
public Book(int id) {
this.propertyContainer = getBookFromDatabaseById(id);
}
public Object getProperty(String propertyKey) {
return this.propertyContainer.get(propertyKey);
}
}
With method one, I believe that it's easier to control, limit and possibly access properties, adding new properties, however, becomes smoother with method two.
What's the proper way to do this?

I think this problem has been solved in many ways: ORM, DAO, row and table mapper, lots of others. There's no need to redo it again.
One issue you have to think hard about is coupling and cyclic dependencies between packages. You might think you're doing something clever by telling a model object how to persist itself, but one consequence of this design choice is coupling between model objects and the persistence tier. You can't use model objects without persistence if you do this. They really become one big, unwieldy package. There's no layering.
Another choice is to have model objects remain oblivious to whether or not they're persisted. It's a one way dependence that way: persistence knows about model objects, but not the other way around.
Google for those other solutions. There's no need to beat that dead horse again.
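A minimal sketch of the one-way dependency described above: the model object knows nothing about persistence, and a separate repository knows about the model. BookRepository and InMemoryBookRepository are hypothetical names; a real implementation would talk to the database (via JDBC, an ORM, and so on) behind the same interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain model object: no database code, typed accessors.
final class Book {
    private final int id;
    private final String author;
    private final String title;

    Book(int id, String author, String title) {
        this.id = id;
        this.author = author;
        this.title = title;
    }

    int getId() { return id; }
    String getAuthor() { return author; }
    String getTitle() { return title; }
}

// Persistence depends on the model, not the other way around.
interface BookRepository {
    Optional<Book> findById(int id);
    void save(Book book);
}

// Stand-in implementation backed by a map, useful for tests.
final class InMemoryBookRepository implements BookRepository {
    private final Map<Integer, Book> books = new HashMap<>();

    @Override
    public Optional<Book> findById(int id) {
        return Optional.ofNullable(books.get(id));
    }

    @Override
    public void save(Book book) {
        books.put(book.getId(), book);
    }
}
```

Book can now be used, and unit-tested, without any database being available.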

The first method gives you type safety for the associated accessors, so you know what type of object you are getting back and don't have to cast to the type that you are expecting (this becomes more important when providing anything other than primitives).
For that reason (plus that it will make the resulting code simpler and easier to read) I would pick the first one. In any large applications you will also be able to quickly, easily and neatly get parameter values back in the code for debug etc. within the object itself.
If anyone else is going to be working on this code (or you're planning on working on it after you've forgotten about it), the first one will also help, as you know the parameters etc. The second one will only give you this with extensive javadoc.

The first one is the classical way. The second one is really tricky for nothing.

Design pattern to progressively "fill out" an object (beginner question)

I need to process a bunch of data - each element (datum?) is essentially just a dictionary of textual attributes. So say the class is Book - it might have an author, title, genre, reading difficulty level, and recommended price. Now, I start off only knowing the first two, and for each book, need to infer or estimate the next three (in my problem it is more than that).
So the approach that is natural to me is to do this iteratively for each book. My design would look something along the lines of (this is in Java)
public class Book
{
public String author;
public String title;
/* ... */
public double price;
public Book(String author,String title)
{
this.author = author;
this.title = title;
}
public void setGenre(DataProvider dp,...)
{
/* some sort of guess, resulting in genreGuess */
this.genre = genreGuess;
}
/* .. and so on for price, etc */
}
And then, my code would look like:
for (Book book : bookList)
{
setGenre(book);
setPrice(book);
/* and so on */
}
However, I am trying to learn how to design programs better, in a less iterative fashion, using less mutable state. Does anyone have any recommendations on how I might go about this?

I'm NOT an OO-design guru... Here's one way which I personally think is better.
Inject an implementation of the GenreGuesser interface into Book... this is best done via a BookFactory. The factory is configured ONCE, and then used to create "like" books. I'm thinking of using dependency injection here (like Spring's DI framework, or Google's Guice), which dramatically cuts down the overhead of "wiring" the factories into the things which depend on them ;-)
Then we could retrieve AND CACHE the calculated attribute on the fly. Note that caching the result implies that a Book object's IDENTITY (e.g. author & title) is final, or at least fixed once set.
public String getGenre()
{
if (this.genre==null)
this.genre = genreGuesser.getGuess();
return this.genre;
}
So basically you're doing your own "late binding" for each calculated field. There's also nothing stopping you (or the user) from setting each field manually if the default "guess" is off-base.
This achieves a simple "rich" interface on the Book class, at the cost of making Book aware of the concept of "guesses"... and I'm not a fan of "intelligent" transfer-objects, per se, which brings to mind another approach.
If we're going to accept all the overhead of having a BookFactory class, and we CAN limit ourselves to ONLY EVER creating books through the factory, then why not just let the BookFactory (which by definition knows all about Book and its attributes) populate all the calculated fields with (guessed) default values. Then the Book class is back to being a simple, dumb transfer object, which does exactly what it needs to, AND NOTHING ELSE.
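A minimal sketch of this second approach, in which the factory fills in the guessed fields and Book stays an immutable transfer object. The GenreGuesser signature (author and title in, genre out) is an assumption, since the original only names the interface; wiring through a DI framework is left out.

```java
// Hypothetical strategy interface; the guess is based on the book's identity.
interface GenreGuesser {
    String guess(String author, String title);
}

// Dumb, immutable transfer object: all fields are set once at construction.
final class Book {
    private final String author;
    private final String title;
    private final String genre;

    Book(String author, String title, String genre) {
        this.author = author;
        this.title = title;
        this.genre = genre;
    }

    String getAuthor() { return author; }
    String getTitle() { return title; }
    String getGenre() { return genre; }
}

// Configured once with a guesser, then used to create "like" books.
final class BookFactory {
    private final GenreGuesser genreGuesser;

    BookFactory(GenreGuesser genreGuesser) {
        this.genreGuesser = genreGuesser;
    }

    Book create(String author, String title) {
        return new Book(author, title, genreGuesser.guess(author, title));
    }
}
```

Further calculated fields (difficulty, price) would be added the same way, each with its own injected estimator, which also removes the mutable setters from Book.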
I'll be interested to read others' suggestions.
Cheers. Keith.

The key thing here is that the class you're describing is a very simple one, so it's hard to see how it could be improved.
What happens in real systems, however, is that your Author class would, for example, be a connection to a Person and a Contract, or the Book would have a Publisher. In a library, it might have a history of when it was purchased, when it was loaned out and returned, and something like ISBN and Library of Congress records.
Behind the objects would be some kind of persistent store -- from something as simple as Python's "pickling" to a relational data base or a "NoSQL" table store.
That's where the complexity starts to show up.
So here are some things to think about:
How many objects do you mean to store? Decisions for 10 Books are very different from what you need to store 10 million.
If you have a complicated tree of objects -- with Publisher, Author, Person, Contract, LC records, inventory and so on -- then creating (or "rehydrating") the object from persistent store can take a long time. Back when OO was first catching on, this was a traditional issue in the first systems: the object model was wonderful, but it took a half-hour to load an object and all its connected objects.
At that point, you need to start thinking about lazy evaluation. Another useful pattern is Flyweight -- instead of making many copies, you cache one copy and simply refer to it.
What are the use cases? You can't just say "I want to model a Book" because you don't know what the book is for. Start with use cases, and work down to the goal of having the methods of your class make it easy to write code.
The best way to handle that is, basically, to write code. Write out, sketch, actual examples of code using your objects and see if they are easy to use.
As Fred Brooks says, "plan to throw one away; you will anyway." As in writing prose, writing code is rewriting.

First thing I notice is that setGenre and setPrice are member methods on the Book object. In that case, you shouldn't be passing in a book, but rather calling
book.setGenre();
book.setPrice();
But I'm not sure you should even be doing that. If you're trying to infer Genre and Difficulty and ultimately Price from the author and title, you shouldn't be explicitly calling setGenre().
Instead, you could call
book.getPrice();
or
book.calculatePrice();
Then that method could infer genre and difficulty before returning the final price.

Application design for processing data prior to database

I have a large collection of data in an Excel file (and CSV files). The data needs to be placed into a database (MySQL). However, before it goes into the database it needs to be processed. For example, if column 1 is less than column 3, add 4 to column 2. There are quite a few rules that must be followed before the information is persisted.
What would be a good design to follow to accomplish this task? (using java)
Additional notes
The process needs to be automated. In the sense that I don't have to manually go in and alter the data. We're talking about thousands of lines of data with 15 columns of information per line.
Currently, I have a sort of chain of responsibility design set up. One class(Java) for each rule. When one rule is done, it calls the following rule.
More Info
Typically there are about 5000 rows per data sheet. Speed isn't a huge concern because this large input doesn't happen often.
I've considered Drools, however I wasn't sure the task was complicated enough for Drools.
Example rules:
All currency (data in specific columns) must not contain currency symbols.
Category names must be uniform (e.g. book case = bookcase)
Entry dates can not be future dates
Text input can only contain [A-Z 0-9 \s]
etc..
Additionally, if any column of information is invalid it needs to be reported when processing is complete (or maybe stop processing).
My current solution works. However, I think there is room for improvement, so I'm looking for ideas as to how it can be improved and/or how other people have handled similar situations.
I've considered (very briefly) using drools but I wasn't sure the work was complicated enough to take advantage of drools.

If I didn't care to do this in 1 step (as Oli mentions), I'd probably use a pipe and filters design. Since your rules are relatively simple, I'd probably do a couple delegate based classes. For instance (C# code, but Java should be pretty similar...perhaps someone could translate?):
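using System;
using System.Collections.Generic;

interface IFilter {
    // Interface members are implicitly public and have no body.
    IEnumerable<string> Filter(IEnumerable<string> file);
}
class PredicateFilter : IFilter {
    private readonly Predicate<string> predicate;
    public PredicateFilter(Predicate<string> predicate) { this.predicate = predicate; }
    // Passes through only the lines that satisfy the predicate.
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            if (predicate(s)) {
                yield return s;
            }
        }
    }
}
class ActionFilter : IFilter {
    private readonly Action<string> action;
    public ActionFilter(Action<string> action) { this.action = action; }
    // Runs a side effect (e.g. logging) on each line and passes it through unchanged.
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            action(s);
            yield return s;
        }
    }
}
class ReplaceFilter : IFilter {
    private readonly Func<string, string> replace;
    public ReplaceFilter(Func<string, string> replace) { this.replace = replace; }
    // Replaces each line with its transformed version.
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            yield return replace(s);
        }
    }
}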
From there, you could either use the delegate filters directly, or subclass them for the specifics. Then, register them with a Pipeline that will pass them through each filter.
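Since a Java translation was asked for, here is one possible sketch. It is not a line-for-line port: Java has no yield, so each filter eagerly returns a new list instead of a lazy sequence, and the ActionFilter is omitted for brevity. The Pipeline class and its register/run methods are assumptions, as the original only mentions a pipeline in passing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

interface Filter {
    List<String> apply(List<String> lines);
}

// Keeps only the lines that satisfy the predicate.
final class PredicateFilter implements Filter {
    private final Predicate<String> predicate;

    PredicateFilter(Predicate<String> predicate) {
        this.predicate = predicate;
    }

    @Override
    public List<String> apply(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (predicate.test(line)) {
                kept.add(line);
            }
        }
        return kept;
    }
}

// Replaces each line with its transformed version.
final class ReplaceFilter implements Filter {
    private final UnaryOperator<String> replace;

    ReplaceFilter(UnaryOperator<String> replace) {
        this.replace = replace;
    }

    @Override
    public List<String> apply(List<String> lines) {
        List<String> replaced = new ArrayList<>();
        for (String line : lines) {
            replaced.add(replace.apply(line));
        }
        return replaced;
    }
}

// Passes the lines through every registered filter, in registration order.
final class Pipeline {
    private final List<Filter> filters = new ArrayList<>();

    Pipeline register(Filter filter) {
        filters.add(filter);
        return this;
    }

    List<String> run(List<String> lines) {
        List<String> current = lines;
        for (Filter filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }
}
```

For example, new Pipeline().register(new PredicateFilter(l -> !l.isBlank())).register(new ReplaceFilter(l -> l.replace("$", ""))) drops blank lines and then strips dollar signs. For a few thousand rows the eager lists should be fine; java.util.stream.Stream would be the closer analogue if laziness mattered.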

I think your method is OK. Especially if you use the same interface on every processor.
You could also look at something called Drools, currently JBoss Rules. I used it some time ago for a rule-heavy part of my app, and what I liked about it is that the business logic can be expressed in, for instance, a spreadsheet or DSL, which then gets compiled to Java (at run time, and I think there's also a compile-time option). It makes rules a bit more succinct and thus readable. It's also very easy to learn (2 days or so).
Here's a link to the open-source JBoss Rules. At jboss.com you can undoubtedly purchase an officially maintained version if that's more to your company's taste.

Just create a function to enforce each rule, and call every applicable function for each value. I don't see how this requires any exotic architecture.
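A rough sketch of the function-per-rule idea, using three of the example rules from the question. RowRules, its method names, the set of currency symbols stripped and the error messages are all assumptions for illustration; errors are collected in a list so they can be reported when processing is complete.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class RowRules {
    // Rule: currency values must not contain currency symbols.
    // Assumes only $, euro and pound signs need stripping.
    static String stripCurrency(String value) {
        return value.replaceAll("[$\u20AC\u00A3]", "").trim();
    }

    // Rule: entry dates cannot be in the future.
    static boolean isNotFuture(LocalDate entryDate, LocalDate today) {
        return !entryDate.isAfter(today);
    }

    // Rule: text input may only contain A-Z, 0-9 and whitespace.
    static boolean isAllowedText(String value) {
        return value.matches("[A-Z0-9\\s]*");
    }

    // Applies the validation rules to one row and returns the problems found.
    static List<String> validate(String text, LocalDate entryDate, LocalDate today) {
        List<String> errors = new ArrayList<>();
        if (!isAllowedText(text)) {
            errors.add("Invalid characters in text: " + text);
        }
        if (!isNotFuture(entryDate, today)) {
            errors.add("Entry date is in the future: " + entryDate);
        }
        return errors;
    }
}
```

Passing today in as a parameter, rather than calling LocalDate.now() inside the rule, keeps the rules deterministic and easy to test.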

A class for each rule? Really? Perhaps I'm not understanding the quantity or complexity of these rules, but I would (semi-pseudo-code):
public class ALine {
private int col1;
private int col2;
private int coln;
// ...
public ALine(string line) {
// read row into private variables
// ...
this.Process();
this.Insert();
}
public void Process() {
// do all your rules here working with the local variables
}
public void Insert() {
// write to DB
}
}
foreach line in csv
new ALine(line);

Your methodology of using classes for each rule does sound a bit heavy weight but it has the advantage of being easy to modify and expand should new rules come along.
As for loading the data, bulk loading is the way to go. I have read some information which suggests it may be as much as 3 orders of magnitude faster than loading using insert statements. You can find some information on it here

Bulk load the data into a temp table, then use SQL to apply your rules.
Use the temp table as the basis for the insert into the real table.
Drop the temp table.

You can see that all the different answers are coming from their own experience and perspective.
Since we don't know much about the complexity and number of rows in your system, we tend to give advice based on what we have done earlier.
If you want to narrow it down to one or two solutions for your implementation, try giving more details.
Good luck

It may not be what you want to hear, it isn't the "fun way" by any means, but there is a much easier way to do this.
So long as your data is evaluated line by line... you can set up another worksheet in your Excel file and use spreadsheet-style functions to do the necessary transforms, referencing the data from the raw data sheet. For more complex functions you can use the VBA embedded in Excel to write custom operations.
I've used this approach many times and it works really well; it's just not very sexy.
