I need to process a bunch of data - each element (datum?) is essentially just a dictionary of textual attributes. So say the class is Book - it might have an author, title, genre, reading difficulty level, and recommended price. Now, I start off only knowing the first two, and for each book, need to infer or estimate the next three (in my problem it is more than that).
So the approach that is natural to me is to do this iteratively for each book. My design would look something along the lines of (this is in Java)
public class Book
{
public String author;
public String title;
/* ... */
public double price;
public Book(String author,String title)
{
this.author = author;
this.title = title;
}
public void setGenre(DataProvider dp,...)
{
/* some sort of guess, resulting in genreGuess */
this.genre = genreGuess;
}
/* .. and so on for price, etc */
}
And then, my code would look like:
for (Book book : bookList)
{
setGenre(book);
setPrice(book);
/* and so on */
}
However, I am trying to learn how to design programs better, in a less iterative fashion, using less mutable state. Does anyone have any recommendations on how I might go about this?
I'm NOT an OO-design guru... Here's one way which I personally think is better.
Inject an implementation of the GenreGuesser interface into Book... this is best done via a BookFactory. The factory is configured ONCE, and then used to create "like" books. I'm thinking of using dependency injection here (like Spring's DI framework, or Google's Guice), which dramatically cuts down the overhead of "wiring" the factories into the things which depend on them ;-)
Then we could retrieve AND CACHE the calculated attribute on the fly. Note that caching the result implies that a Book object's IDENTITY (e.g. author & title) is final, or at least fixed once set.
public String getGenre()
{
if (this.genre==null)
this.genre = genreGuesser.getGuess();
return this.genre;
}
So basically you're doing your own "late binding" for each calculated field. There's also nothing stopping you (or the user) from setting each field manually if the default "guess" is off-base.
This achieves a simple "rich" interface on the Book class, at the cost of making Book aware of the concept of "guesses"... and I'm not a fan of "intelligent" transfer-objects, per se, which brings to mind another approach.
If we're going to accept all the overhead of having a BookFactory class, and we CAN limit ourselves to ONLY EVER creating books through the factory, then why not just let the BookFactory (which by definition knows all about Book and its attributes) populate all the calculated fields with (guessed) default values? Then the Book class is back to being a simple, dumb, transfer object, which does exactly what it needs to, AND NOTHING ELSE.
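As a rough sketch of that second approach (the guess method on GenreGuesser and the create method on BookFactory are names I'm assuming here, and only genre is shown; price and the rest would follow the same shape):

```java
// Hypothetical interface: the real guessing logic is not described in the question.
interface GenreGuesser {
    String guess(String author, String title);
}

// A dumb, immutable transfer object: every field is set once, in the constructor.
final class Book {
    private final String author;
    private final String title;
    private final String genre;

    Book(String author, String title, String genre) {
        this.author = author;
        this.title = title;
        this.genre = genre;
    }

    String getAuthor() { return author; }
    String getTitle() { return title; }
    String getGenre() { return genre; }
}

// Configured once with its guesser, then used to create fully populated books.
final class BookFactory {
    private final GenreGuesser genreGuesser;

    BookFactory(GenreGuesser genreGuesser) {
        this.genreGuesser = genreGuesser;
    }

    Book create(String author, String title) {
        return new Book(author, title, genreGuesser.guess(author, title));
    }
}
```

Since Book has no setters in this version, nothing mutates after construction, which is what the original question was after.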
I'll be interested to read others' suggestions.
Cheers. Keith.
The key thing here is that the class you're describing is a very simple one, so it's hard to see how it could be improved.
What happens in real systems, however, is that your Author class would, for example, be a connection to a Person and a Contract, or the Book would have a Publisher. In a library, it might have a history of when it was purchased, when it was loaned out and returned, and something like ISBN and Library of Congress records.
Behind the objects would be some kind of persistent store -- from something as simple as Python's "pickling" to a relational data base or a "NoSQL" table store.
That's where the complexity starts to show up.
So here are some things to think about:
How many objects do you mean to store? Decisions for 10 Books are very different from what you need to store 10 million.
If you have a complicated tree of objects -- with Publisher, Author, Person, Contract, LC records, inventory and so on -- then creating (or "rehydrating") the object from the persistent store can take a long time. Back when OO was first catching on, this was a traditional issue in the first systems: the object model was wonderful, but it took a half-hour to load an object and all its connected objects.
At that point, you need to start thinking about lazy evaluation. Another useful pattern is Flyweight -- instead of making many copies, you cache one copy and simply refer to it.
What are the use cases? You can't just say "I want to model a Book" because you don't know what the book is for. Start with use cases, and work down to the goal of having the methods of your class make it easy to write code.
The best way to handle that is, basically, to write code. Write out, sketch, actual examples of code using your objects and see if they are easy to use.
As Fred Brooks says, "plan to throw one away; you will anyway." As in writing prose, writing code is rewriting.
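To illustrate the Flyweight idea mentioned above, here is a minimal sketch. Publisher and PublisherRegistry are names I'm assuming for the example; the point is only that every Book asking for the same publisher gets the same shared instance instead of its own copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared value object: many books refer to the same publisher.
final class Publisher {
    private final String name;

    Publisher(String name) {
        this.name = name;
    }

    String getName() { return name; }
}

// Flyweight registry: hands out one shared Publisher per name.
final class PublisherRegistry {
    private final Map<String, Publisher> byName = new HashMap<>();

    Publisher get(String name) {
        // Creates the Publisher on first request, returns the cached one afterwards.
        return byName.computeIfAbsent(name, Publisher::new);
    }
}
```

With 10 million books and a few hundred publishers, this keeps a few hundred Publisher objects in memory rather than 10 million.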
First thing I notice is that setGenre and setPrice are member methods on the Book object. In that case, you shouldn't be passing in a book, but rather calling
book.setGenre();
book.setPrice();
But I'm not sure you should even be doing that. If you're trying to infer Genre and Difficulty and ultimately Price from the author and title, you shouldn't be explicitly calling setGenre().
Instead, you could call
book.getPrice();
or
book.calculatePrice();
Then that method could infer genre and difficulty before returning the final price.
Consider this database model:
Book
isbn primary key
title
In a RDBMS, the database makes sure that two identical rows don't exist for the above model.
Similarly, in Java consider this object model:
Book
- isbn: int
- title: String
+ Book(isbn)
Let's say we are creating a Book object:
Book b = new Book(123456);
Later, in some other part of the code we are creating again an identical Book object:
Book c = new Book(123456);
Can Java make sure that no two objects exist in the JVM heap if they are identical? Just like a RDBMS does?
There's no built-in mechanism in Java that automatically does this for you. You could build something for this, but probably shouldn't. And if you do, then probably not in the way that you show in your question.
First: let's assume that these objects are immutable, so the problem is reduced to "let no two objects be constructed that have the same attributes". This is not a necessary restriction, but this way I can already demonstrate the issues with this approach.
The first issue is that it requires you to keep track of each Book instance in your program in a single central place. You can do that quite easily by having a collection that you fill when an object is constructed.
However, this basically builds a massive memory leak into your program because if nothing else hangs on to this Book object, that collection still will reference it, preventing it from being garbage collected.
You can work around that issue by using WeakReference objects to hold on to your Book objects.
Next, if you want to avoid duplicates, you almost certainly want a way to fetch the "original" instance of a Book if you can't create a new one. You can't do that if you simply use the constructor, since the constructor can't "return another object", it will always create and return a new object.
So instead of new Book(12345) you want something like BookFactory.getOrCreateBook(12345). That factory can then either fetch the existing Book object with the given id or create a new one, as required.
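Putting the two points above together, a rough sketch could look like this. It assumes a single static registry and reduces Book to just its ISBN; stale map entries whose Book has been collected are not purged here (that would need a ReferenceQueue), and in real code the Book constructor would be hidden from everyone but the factory.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class Book {
    private final int isbn;

    Book(int isbn) {
        this.isbn = isbn;
    }

    int getIsbn() { return isbn; }
}

final class BookFactory {
    // Values are weak, so the map alone does not keep a Book alive.
    private static final Map<Integer, WeakReference<Book>> BOOKS = new HashMap<>();

    private BookFactory() {}

    static synchronized Book getOrCreateBook(int isbn) {
        WeakReference<Book> ref = BOOKS.get(isbn);
        Book book = (ref == null) ? null : ref.get();
        if (book == null) {
            // Either never created, or the earlier instance was garbage collected.
            book = new Book(isbn);
            BOOKS.put(isbn, new WeakReference<>(book));
        }
        return book;
    }
}
```

As long as some other part of the program still holds the Book, every call with the same ISBN returns that same instance.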
One way to make the memory leak issue easier to handle (and also to potentially allow multiple parallel sessions each with their own set of unique Book objects) is to make the BookFactory be a BookSession: i.e. you instantiate one and it keeps track of its books. Now that BookSession is the "root" of all Books and if it no longer gets referenced it (and all the books it created) can potentially be garbage collected.
All of this doesn't even get into thread safety which is solvable reasonably easily for immutable objects but can get quite convoluted if you want to allow modifications while still maintaining uniqueness.
A simple BookSession could look a little like this (note that I use a record for book only for brevity of this sample code, this would leave the constructor visible. In "real" code I'd use an equivalent normal class where the constructor isn't accessible to others):
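import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

record Book(int isbn, String title) {}

class BookSession {
    private final ConcurrentHashMap<Integer, Book> books = new ConcurrentHashMap<>();

    // Returns the book with this ISBN if this session has already created it.
    public Optional<Book> get(int isbn) {
        return Optional.ofNullable(books.get(isbn));
    }

    // Returns the existing book for this ISBN, or atomically creates and registers one.
    public Book getOrCreate(int isbn, String title) {
        return books.computeIfAbsent(isbn, (i) -> new Book(i, title));
    }
}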
You can easily add other methods to the session (such as findByTitle or something like that).
And if you only ever want a single BookSession you could even have a public static final BookSession BOOKS somewhere, if you wanted (but at that point you have re-created the memory leak).
I do not know of a JVM internals specific way of doing this, but it is not that hard to achieve the basic goal. Joachim Sauer's answer goes into depth on why this might not be the greatest idea without some additional forethought :)
If you forgo thread safety, the code is basically just about creating a private constructor and using a factory method that keeps tabs on the created objects.
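A sketch in Java follows:
import java.util.HashMap;
import java.util.Map;

public class Book {
    // potential memory leak, see Joachim Sauer's answer (WeakReference)
    private static final Map<String, Book> created = new HashMap<>();
    // other internal fields follow
    private final String isbn;

    // can only be invoked from the factory method
    private Book(String isbn) { this.isbn = isbn; /* internals */ }

    // not thread safe: returns the existing instance for this ISBN, or creates one
    public static Book get(String isbn) {
        if (created.containsKey(isbn)) return created.get(isbn);
        Book b = new Book(isbn);
        created.put(isbn, b);
        return b;
    }
}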
Converting this to a thread-safe implementation is just about adding some details (*) and is another question. Avoiding the potential memory leak means reading up on weak references.
(*) i.e. locks (synchronized), mutexes, Concurrent*, Atomic*, etc.
Neither of the other answers is technically correct.
They will often work, but in situations where multiple ClassLoaders are in play they will both fail.
Any object instance can ever only be unique within the context of a specific ClassLoader, thus 2 instances of the same Book can exist, even if you guard against multiples being created within a specific ClassLoader.
Usually this won't be a problem as many (especially simpler) programs will never have to deal with multiple ClassLoaders existing at the same time.
There is btw no real way to protect against this.
Before asking the question I have to say that this is homework and I am only looking for ideas on how to implement a class that models a book.
A book has a title and many chapters, each chapter has a title and multiple subchapters. Every subchapter has a title and a list of paragraphs. Every paragraph can have a number and a text. I am looking to implement in OOP the following functionalities: add/remove: chapters, subchapters and paragraphs and display the book.
I am having some troubles finding the right structure for the class.
This is how I'm thinking of implementing it but it seems kind of redundant and complicated. Are there simpler and more correct ways to do it?
public class Element<T> {
int nr;
String Title;
ArrayList<T> x = new ArrayList<T>();
Element() {
nr = 0;
Title = "";
}
public void removeFromElement(int index) {
if (index <= 0 && index > x.size())
System.out.println("The chapter doesn't exist");
else {
x.remove(index);
System.out.println("Succesful deletion");
}
}
public void addToElement(T elem) {
x.add(elem);
}
}
public class Paragraph {
int nr;
String text;
}
public class Subchapter extends Element<Paragraph> {
}
public class Chapter extends Element<Subchapter> {
}
public class Book extends Element<Chapter> {
}
Your first approach is pretty good but does not take into account the possibility of evolution.
What would happen if we asked you to add the possibility for paragraphs to have 0-n quotes (or anything else) ?
What would happen if we asked you to add, in your book, a type of chapter which can't have subchapters ?
This will need huge changes. You should take a look at the Composite pattern.
Following this pattern, you'll be way more flexible as far as evolutions are concerned.
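A minimal sketch of how the Composite pattern could apply here. BookElement, Section and render are names I'm assuming for the example; a single Section class stands in for book, chapter and subchapter, which may or may not fit your homework requirements.

```java
import java.util.ArrayList;
import java.util.List;

// Common interface for everything that can appear in a book.
interface BookElement {
    String render(String indent);
}

// Leaf: has no children.
final class Paragraph implements BookElement {
    private final String text;

    Paragraph(String text) {
        this.text = text;
    }

    @Override
    public String render(String indent) {
        return indent + text + "\n";
    }
}

// Composite: a titled container of other elements (book, chapter, subchapter...).
final class Section implements BookElement {
    private final String title;
    private final List<BookElement> children = new ArrayList<>();

    Section(String title) {
        this.title = title;
    }

    Section add(BookElement child) {
        children.add(child);
        return this;
    }

    boolean remove(BookElement child) {
        return children.remove(child);
    }

    @Override
    public String render(String indent) {
        StringBuilder out = new StringBuilder(indent).append(title).append("\n");
        for (BookElement child : children) {
            out.append(child.render(indent + "  "));
        }
        return out.toString();
    }
}
```

Adding quotes to paragraphs, or a chapter type without subchapters, then means adding one more BookElement implementation rather than reworking the hierarchy.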
When you think OOP, you must always keep in mind that interfaces can have a huge role in the way you'll design your code.
Many of these problems have already been solved and have a conventional solution (design patterns). You should take the time to learn the most common ones. It will definitely change your approach to coding.
("Head First Design Patterns" would be a good book to read.)
Moreover, one of the most important features of OOP is encapsulation.
It provides a very powerful way to control the accessibility of a class's attributes.
You MUST use it. Start by adding private or protected modifiers to your class's attributes and create the getters/setters needed to access/modify them.
Another thing to note:
You should not use the System.out.println() method to log from your code. Use a logging API (log4j, for instance) and exceptions.
public void removeFromElement(int index) throws ChapterNotFoundException {
    // valid indices are 0 .. x.size() - 1
    if (index < 0 || index >= x.size()) {
        throw new ChapterNotFoundException(String.format("Chapter %s does not exist", index));
    }
    x.remove(index);
    logger.info(String.format("Chapter %s has been removed", index));
}
This is actually not the correct design (but I like how you think about minimizing the amount of code you write).
The Book is not extended by chapter or "chapter elements". The book contains chapters and it also can contain/do something else.
Therefore the correct design is the simplest one:
public class Book{
private String title;
private List<Chapter> chapters;
}
//other classes would look similar
This approach is much more stable and it allows easy modification (or even replacement) in future.
This "phenomenon" also has a name: composition over inheritance.
Your overall model actually makes a fair amount of sense. You've identified the repeated code and separated it out into a class. Your use of generics helps keep that clean. Whether it's worth explicitly separating out layers of the tree (for that's what this essentially is) as Subchapters, Chapters, etc depends on your exact requirements.
It might be simpler just to define tree nodes and leaves, but if you do need different behavior at each layer and don't need flexibility to add or remove more layers then this is fine. Consider for example if you will ever have an omnibus with Omnibus->Book->Chapter->Subchapter->Paragraph, or a book with Book->Chapter->Section->Subchapter->Paragraph. Could your model support those? If not does it need to?
Some of the naming could be clearer (for example nr as number) or doesn't follow style conventions (Title should be title).
The main mistake I'd say would be to store the number inside the object at all. That is fragile as it means you constantly have to update it as things are added, removed, etc.
You can always find out the number by just looking at the position in the parent. Essentially you are duplicating that information.
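For example, a small sketch of deriving the number from the position (numberOf is a method name I'm assuming, and only the chapter level is shown):

```java
import java.util.ArrayList;
import java.util.List;

final class Chapter {
    private final String title;

    Chapter(String title) {
        this.title = title;
    }

    String getTitle() { return title; }
}

final class Book {
    private final List<Chapter> chapters = new ArrayList<>();

    void addChapter(Chapter chapter) { chapters.add(chapter); }

    void removeChapter(Chapter chapter) { chapters.remove(chapter); }

    // 1-based chapter number, derived from the position in the list;
    // 0 if the chapter is not part of this book.
    int numberOf(Chapter chapter) {
        return chapters.indexOf(chapter) + 1;
    }
}
```

Removing the first chapter renumbers all the later ones automatically, with no stored nr field to keep in sync.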
As Grégory pointed out in the comments all variables should be private and accessed via getters and setters.
Your design is good. There are a few bugs in the code; I suggest writing a few unit tests to find them (hint: Try to delete pages).
I've written software to handle books in several languages (Java, Python, Groovy) and found that it's harder than it looks. The problem is that all elements (book, chapter, sub-chapter, paragraph) have many common features but subtle differences. Most of them are containers for specific types of other elements (so a book is made of chapters but not paragraphs). Only a paragraph isn't a container. All of them have a text (title or text of the paragraph) but the semantics of the text is different. Also, titles for sub-chapters don't make sense, or do they?
That makes it so hard to come up with a good API to handle all the corner cases without too much duplicated code, especially in a language like Java where you don't have mixins.
Imagine that you are editing a big back office Enterprise Java app, where other people might poke around years from now. That means you have to keep the code clean and easy to understand; performance might not be the #1 priority.
There is a module that needs to
Extract data from objects
Map data parameters, for example SE -> Sweden [this only applies and is used in this module, for now]
Send these new parameters to somewhere (for example via email/xml)
For a small set of data I'd use a small HashMap, but the custom table of data that has to be transformed has grown to 3 HashMaps with ~100 elements in some. I have them in a file called Translater.Java
and there I have a method:
public String getCountryCode(String country) {
return countryCodes.get(country);
}
which is initialized with
countryCodes = new HashMap<String, String>() {{
put("Andorra", "AD");
put("Afghanistan", "AF");
...
}};
it looks ugly! But my choices seem to be:
Make a database table in a new database, which would add another layer of obfuscation when a coder just wants to see what maps to what. This data also never needs to change, and if it did, it's better done as a code change since the db is not source code controlled! (we use hibernate)
Store this static data as a config file, the application uses a database table for configuration options, this would add to the maintenance.
Use the config database table to store this, that would work but could also make the rest of the configuration options harder to find since the other types of data in the configuration table are relatively small and cohesive.
Try a simple enum for this; it is very effective and easy to maintain.
Example:
public enum Country {
ANDORRA("AD"),
AFGHANISTAN("AF"),
...;
private String code;
private Country(String code) {
this.code = code;
}
public static String findCountryCode(String country) {
return valueOf(country.toUpperCase()).getCode();
}
public String getCode() {
return code;
}
}
public class CountryTest {
@Test
public void testGetCode() throws Exception {
assertThat(Country.findCountryCode("Andorra")).isEqualTo("AD");
}
}
Edit: I'm not fully sure from your question which way the mapping should go, or if you need to be able to do lookups both ways. The following assumes that you are looking up country code as the value by the key of country name.
In my experience, number 3 is the best option. In a lot of system architectures you would have to redeploy the application if you need to change hard coded mappings.
I have seen from your comments to the first answer that your mappings are only likely to change once every 3 years or so. However, you can't guarantee that; requirements can change, and so too can international relations.
Your reservation about number 3 was:
that would work but could also make the rest of the configuration options harder to find since the other types of data in the configuration table are relatively small and cohesive.
The solution to this point is to have a well defined and clear naming convention for keys in the configuration database. You could, for instance, use multiple levels of prefixes in the key name to narrow down the intended scope/place of use of the configuration values. For example:
general.translation.countrycodes.Andorra
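With such a convention, the lookup code only needs to know the prefix. A minimal sketch, assuming the configuration table has already been loaded into a Map (the CountryCodes class and its constructor are illustrative and not part of any framework):

```java
import java.util.Map;

final class CountryCodes {
    // Prefix taken from the example key above.
    private static final String PREFIX = "general.translation.countrycodes.";

    private final Map<String, String> config;

    // Assumes the configuration table was loaded elsewhere into key -> value pairs.
    CountryCodes(Map<String, String> config) {
        this.config = config;
    }

    // Returns the code for the country, or null if no such key is configured.
    String getCountryCode(String country) {
        return config.get(PREFIX + country);
    }
}
```

Unrelated configuration entries can live in the same table without getting in the way, since they never share the prefix.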
My application will upon request retrieve information from a database and produce an object from that information. I'm currently considering two different techniques (but I'm open to others as well!) to complete this task:
Method one:
class Book {
private int id;
private String author;
private String title;
public Book(int id) {
ResultSet book = getBookFromDatabaseById(id);
this.id = book.id;
this.author = book.author;
// ...
}
}
Method two:
public class Book {
private HashMap<String, Object> propertyContainer;
public Book(int id) {
this.propertyContainer = getBookFromDatabaseById(id);
}
public Object getProperty(String propertyKey) {
return this.propertyContainer.get(propertyKey);
}
}
With method one, I believe that it's easier to control, limit and possibly access properties; adding new properties, however, becomes smoother with method two.
What's the proper way to do this?
I think this problem has been solved in many ways: ORM, DAO, row and table mapper, lots of others. There's no need to redo it again.
One issue you have to think hard about is coupling and cyclic dependencies between packages. You might think you're doing something clever by telling a model object how to persist itself, but one consequence of this design choice is coupling between model objects and the persistence tier. You can't use model objects without persistence if you do this. They really become one big, unwieldy package. There's no layering.
Another choice is to have model objects remain oblivious to whether or not they're persisted. It's a one way dependence that way: persistence knows about model objects, but not the other way around.
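A minimal sketch of that one-way dependence. BookRepository and InMemoryBookRepository are names I'm assuming for the example; a real implementation would sit on top of JDBC or an ORM rather than a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain model object: it has no knowledge of any database.
final class Book {
    private final int id;
    private final String author;
    private final String title;

    Book(int id, String author, String title) {
        this.id = id;
        this.author = author;
        this.title = title;
    }

    int getId() { return id; }
    String getAuthor() { return author; }
    String getTitle() { return title; }
}

// The persistence tier depends on Book, not the other way around.
interface BookRepository {
    Optional<Book> findById(int id);

    void save(Book book);
}

// Stand-in implementation for illustration only.
final class InMemoryBookRepository implements BookRepository {
    private final Map<Integer, Book> rows = new HashMap<>();

    @Override
    public Optional<Book> findById(int id) {
        return Optional.ofNullable(rows.get(id));
    }

    @Override
    public void save(Book book) {
        rows.put(book.getId(), book);
    }
}
```

Book can now be constructed and tested without any database at all.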
Google for those other solutions. There's no need to beat that dead horse again.
The first method will provide you with type safety for the associated accessors, so you will know what type of object you are getting back and don't have to cast to the type you are expecting (this becomes more important when providing anything other than primitives).
For that reason (plus that it will make the resulting code simpler and easier to read) I would pick the first one. In any large applications you will also be able to quickly, easily and neatly get parameter values back in the code for debug etc. within the object itself.
If anyone else is going to be working on this code (or you're planning on working on it after you've forgotten about it), the first one will also help, as you know the parameters etc. The second one will only give you this with extensive Javadoc.
The first one is the classical way. The second one is really tricky for nothing.
Let's say there is a method that searches for book authors by book id. What should be passed as a parameter to such a method - only book.id (int) or the whole book object?
Or another example. In Java I need to do some work with the current URL of the page. What should be passed to such a method - only request.getRequestURL() or the whole request?
I kind of see benefits from each method but can't come up with good rule when to use what.
Thanks.
I am not sure if there is a "rule" to what is best, but I most often pass just the parameters I need into the method. So in your first example I would only pass in the book.id and in your second example I would only pass in the request.getRequestURL().
I try to avoid passing in more than I need.
I'm going to be a dissenter and argue for passing the entire Book object.
Reason 1: Type checking. If you just pass an integer ID, there's no way to know, looking at code, if you've got the correct "kind" of integer ID. Maybe you've been passing around an integer variable that you think is the Book ID, but it's actually the Author ID. The compiler is not going to help you catch this mistake, and the results are going to be buggy in unexpected ways.
Reason 2: Future proofing. Some have made the argument that if you just pass the ID, you give yourself the option to change the structure of the Book object later, without breaking the doSomethingWithBook(int ID) method. And that's true. On the other hand, if you pass the entire Book object, you give yourself the option to change the internals of doSomethingWithBook(Book book) (maybe it will want to search based on some other field in the future) without breaking any of the (possibly numerous) places you've called doSomethingWithBook. I'd argue that the latter helps you more.
In the case of the Request, I would give a different answer, since I would consider a Request object to be tightly linked to a certain type of interface (web) and therefore would want to limit the use of that object. One question I like to ask myself: if I wanted to switch this web application to be, say, a command-line application, how many classes would have to change? If I'm passing around the Request, that's going to "infect" more classes with web-specific logic.
Looser coupling is preferred unless there are specific reasons otherwise. When you pass only the book id to the search method, you are free to change the Book interface without worrying that it might affect other functions. At some point in the future you may discover that you need to do exactly the same job with some URL outside a request handler, so avoiding an unneeded dependency on the request is good. But note that if you frequently call do_smth(request.getRequestURL()), it may become quite annoying.
This is related to the Law of Demeter, which basically states that objects and methods should only receive exactly what they need, rather than going through another object to get what they actually need. If you need to use multiple fields from a Book in your method, it might be better to just take a book. But in general, you'll have less coupling in a system if you only depend on exactly what you need.
In both your examples, just using the ID or URL would probably be preferable. Particularly in the case of the URL, where (if you want to test the method) it's easy to create a URL to test with but harder (and completely unnecessary) to create a request to pass to the method which will then only use the URL anyway. The method also becomes more generally applicable to other situations than one in which you have a request object.
I would give each method only as much as necessary (so for the second question: just give it request.getRequestURL()).
For the first one I would think about defining both methods (but prefer the id-one, as you can easily get the ID if you have a Book, but not the other way around).
findAuthorsForBookId(int bookId)
findAuthorsForBook(Book b)
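One way to keep the two in sync is to let the Book overload delegate to the id one. A rough sketch, where AuthorFinder, register and the in-memory map are stand-ins I made up for whatever really does the lookup, and authors are plain strings for brevity:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Book {
    private final int id;

    Book(int id) {
        this.id = id;
    }

    int getId() { return id; }
}

final class AuthorFinder {
    // Hypothetical stand-in for the real lookup (database, service, ...).
    private final Map<Integer, List<String>> authorsByBookId = new HashMap<>();

    void register(int bookId, List<String> authors) {
        authorsByBookId.put(bookId, new ArrayList<>(authors));
    }

    List<String> findAuthorsForBookId(int bookId) {
        return authorsByBookId.getOrDefault(bookId, new ArrayList<>());
    }

    // Convenience overload: delegates, so the lookup logic lives in one place.
    List<String> findAuthorsForBook(Book b) {
        return findAuthorsForBookId(b.getId());
    }
}
```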
Call book.authors().
(Note: this is a dissenting view regarding the accepted answer.)
Well, there is an implicit rule set in context of domain modeling. If the receiver is performing tasks independent of the domain model then you pass the field. Otherwise, you should pass the object and the model specific action is made explicit by the act of the receiver accessing the id property of the 'Book' object. Most importantly, if accessing the property ever evolves beyond simply returning the reference of a field (e.g. certain actions in the property accessor) then clearly you do NOT want to chase all instances in your code where you dereferenced the property before passing it into various methods.
Further considerations are the consequences (if any) of accessing the field before the call site or inside the receiver.
There's no rule actually; you should be straightforward with the info you need, in that case the book.id. If you consider extending / sharing your search in the future, then you can have an overloaded method that accepts a book object so that you can search by other attributes of the book object.
Think about maintaining the code in the long run. Any method you expose is a method you'll have to support for your users going forward. If bookId is all that's needed for the foreseeable future, then I'd go with just passing in that: that way, anyone who has a bookId can use your method, and it becomes more powerful.
But if there's a good chance that you may need to refactor the lookup to use some other attributes of Book, then pass in Book.
If you're writing a DAO of sorts, you should consider having a BookSelector which can be built up like: new BookSelector().byId(id).bySomethingElse(somethingElse) and pass this selector instead of having a proliferation of findByXYZ methods.
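A minimal sketch of such a selector. The byId name comes from the line above; byTitle and matches are assumptions of mine, and a real DAO would more likely translate the criteria into a query than test objects in memory.

```java
import java.util.function.Predicate;

final class Book {
    private final int id;
    private final String title;

    Book(int id, String title) {
        this.id = id;
        this.title = title;
    }

    int getId() { return id; }
    String getTitle() { return title; }
}

final class BookSelector {
    // Starts out matching everything; each by...() call narrows it down.
    private Predicate<Book> criteria = book -> true;

    BookSelector byId(int id) {
        criteria = criteria.and(book -> book.getId() == id);
        return this;
    }

    BookSelector byTitle(String title) {
        criteria = criteria.and(book -> book.getTitle().equals(title));
        return this;
    }

    boolean matches(Book book) {
        return criteria.test(book);
    }
}
```

The DAO then needs a single find(BookSelector) method, and new criteria are added to the selector without touching existing callers.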
I agree with the previous posters. I wanted to add that if you find yourself needing multiple properties of the object (id, title, author) then I'd suggest passing the object (or an interface to the object). Short parameter lists are generally preferable.
Let's say there is a method that searches for book authors by book id. What should be passed as a parameter to such a method - only book.id (int) or the whole book object?
I am making the assumption that "book authors" is an attribute of a book. Therefore, I imagine something like the following class:
import java.util.ArrayList;
import java.util.List;

class Book {
    private int id;
    private List<Author> authors;
    // ... maybe some other book information

    public int getID() {
        return this.id;
    }

    public void setID(int value) {
        this.id = value;
    }

    public List<Author> getAuthors() {
        // defensive copy, so callers cannot modify the book's own list
        return new ArrayList<>(this.authors);
    }
    // ...
}
Given an instantiated Book object (aBook), to determine the list of authors, I would expect that I can call aBook.getAuthors(), which requires no parameters.
I would discourage the creation of partially instantiated domain objects. In other words, given a bookid, and looking for a list of authors, I would want the client code to look more like this:
Book aBook = library.findBook(bookid);
List<Author> authors = aBook.getAuthors();
and less like this:
Book bookQuery = new Book(); // partially instantiated Book
bookQuery.setID(bookid);
Book aBook = library.findBook(bookQuery);
List<Author> authors = aBook.getAuthors();
The first version reduces the number of throwaway objects that are created by the client code. (In this case, bookQuery, which isn't a real book.)
It also makes the code easier to read--and therefore to maintain. This is because bookQuery is not doing what the maintenance programmer would expect. For example, I'd expect two Books with the same ID to have the same title, authors, ISBN, etc. These assertions would fail for bookQuery and aBook.
Thirdly, it minimizes the chance that you will someday pass an invalid (partially instantiated) Book object to a method that is expecting a real Book. This is a bug where the failure (in the method) may happen far away from the cause (the partial instantiation).