I'm working on a spaghetti monster (unfortunately not of the flying variety), and I've got a question about proper design.
I'm in the process of taking a gigantic static Java method that returns an object and splitting it into reusable (and readable) components. Right now, the method reads an XML document, and then appends summary and detailed information from the document to a "dataModule", and the dataModule is then returned from the method.
In breaking my code up into a getSummaryData and getDetailedData method, I noticed I'd done the following:
dataModule = getSummaryData(xmlDocument);
setDetailedData(xmlDocument, dataModule);
(Pass by reference, append detailed data to dataModule within method)
This mostly has to do with the fact that the detailed data requires business logic based on the summary data in order to be parsed properly, and the fact that changing the structure of the dataModule involves lots of changing the front end of the application.
Is this approach any better than:
dataModule = getSummaryData(xmlDocument);
dataModule = setDetailedData(xmlDocument, dataModule);
(Pass by reference, append detailed data to dataModule within method, return dataModule)
I can't share much more of the code without revealing "teh secretz", but is there a strong reason to go with one approach over the other? Or, am I just getting caught up in which shade of lipstick I'm putting on my pig, here?
Thanks,
IVR Avenger
I find your second approach, where you return the same object, more confusing - because it implies to the calling function that a different object might be returned. If you're modifying the object, your first solution looks fine to me.
One principle I'd use to answer your question is that you want as many things as possible to be final, so that you have less trouble reasoning about state. By that principle, you'd want to avoid a meaningless reassignment.
final DataModule dataModule = getSummaryData(xmlDocument);
setDetailedData(xmlDocument, dataModule);
But that's wrong too. Why should the summary and detailed data be separate steps? Will you ever do one without the other? If not, those steps should be private to the DataModule. Really, the data module should probably know how to construct itself from the xml data.
final DataModule dataModule = new DataModule(xmlDocument);
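A minimal sketch of that idea follows. The element names "summary" and "detail", the two private parsing steps, and the parse helper are assumptions made for illustration; the real business logic is not shown in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical DataModule that builds itself from the XML document.
final class DataModule {
    private final String summary;
    private final String detail;

    DataModule(Document xml) {
        // The summary is parsed first because the detail step depends on it.
        this.summary = parseSummary(xml);
        this.detail = parseDetail(xml, this.summary);
    }

    private static String parseSummary(Document xml) {
        return xml.getElementsByTagName("summary").item(0).getTextContent();
    }

    private static String parseDetail(Document xml, String summary) {
        String raw = xml.getElementsByTagName("detail").item(0).getTextContent();
        // Placeholder for the business logic that needs the summary.
        return summary.isEmpty() ? raw : summary + ": " + raw;
    }

    String getSummary() { return summary; }
    String getDetail() { return detail; }

    // Demo helper only: turns an XML string into a DOM Document.
    static Document parse(String xmlText) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlText)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid XML", e);
        }
    }
}
```

With this shape the caller never sees a half-built module, and every field can be final.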
The (arguable) advantage of the second approach is that it permits method chaining.
Say you had, in addition to setDetailedData(), setMoreData(), and that both functions were written to return the object. You could then write:
dataModule = getSummaryData(xmlDocument);
dataModule = dataModule.setDetailedData(xmlDocument).setMoreData();
I don't think the example you've provided benefits much from a method chaining syntax, but there are examples where it can lead to truly beautiful, expressive code. It permits what Martin Fowler calls a Fluent Interface.
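As a rough illustration of chaining, here is a sketch in which each setter mutates the object and returns this. The class name FluentDataModule, the String parameters, and the contents() accessor are hypothetical and not taken from the question.

```java
// Hypothetical data module whose setters return the same instance for chaining.
final class FluentDataModule {
    private final StringBuilder contents = new StringBuilder("summary");

    FluentDataModule setDetailedData(String detail) {
        contents.append("|").append(detail);
        return this; // same object, returned only so calls can be chained
    }

    FluentDataModule setMoreData(String more) {
        contents.append("|").append(more);
        return this;
    }

    String contents() {
        return contents.toString();
    }
}
```

Because the returned reference is the object itself, reassigning the result to dataModule is optional; the chain works either way.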
Related
I have object customerSummary at line #2 and accessing it at lines #11 & #12. Does it lead to data corruption in production?
private CustomerSummary enrichCustomerIdentifiers(CustomerSummaryDTO customerSummaryDTO) {
    CustomerSummary customerSummary = customerSummaryDTO.getCustomerSummary();
    List<CustomerIdentifier> customerIdentifiers = customerSummary
            .getCustomerIdentifiers().stream()
            .peek(customerIdentifier -> {
                if (getCustomerReferenceTypes().contains(customerIdentifier.getIdentifierType())) {
                    customerIdentifier.setRefType(RefType.REF.toString());
                } else {
                    customerIdentifier.setRefType(RefType.TAX.toString());
                    Country country = new Country();
                    country.setIsoCountryCode(customerSummary.getCustomerAddresses().get(0).getIsoCountryCode());
                    country.setCountryName(customerSummary.getCustomerAddresses().get(0).getCountryName());
                    customerIdentifier.setCountry(country);
                }
            }).collect(Collectors.toList());
    customerSummary.setCustomerIdentifiers(customerIdentifiers);
    return customerSummary;
}
The literal answer to your question is No ... assuming that the access is thread-safe.
But your code probably doesn't do what you think it does.
The peek() method returns the precise stream of objects that it is called on. So your code is effectively doing this:
summary.setCustomerIdentifiers(
new SomeListClass<>(summary.getCustomerIdentifiers()));
... while doing some operations on the identifier objects.
You are (AFAIK unnecessarily) copying the list and reassigning it to the field of the summary object.
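To make the copying visible, here is a small sketch with the same pipeline shape, using StringBuilder elements instead of the question's classes (the class and method names are made up for the demo). The collected list is a different list object, but it holds the very same element objects, which have been mutated in place.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PeekCopyDemo {
    // Same shape as the question's pipeline: peek mutates, collect copies.
    static List<StringBuilder> peekAndCollect(List<StringBuilder> source) {
        return source.stream()
                .peek(sb -> sb.append("!"))    // mutates each element in place
                .collect(Collectors.toList()); // new list holding the same objects
    }
}
```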
It would be simpler AND more efficient to write it as:
for (CustomerIdentifier id : summary.getCustomerIdentifiers()) {
    if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
        id.setRefType(RefType.REF.toString());
    } else {
        id.setRefType(RefType.TAX.toString());
        Country country = new Country();
        Address address = summary.getCustomerAddresses().get(0);
        country.setIsoCountryCode(address.getIsoCountryCode());
        country.setCountryName(address.getCountryName());
        id.setCountry(country);
    }
}
You could do the above using a list.stream().forEach(), or a list.forEach(), but the code is (IMO) neither simpler nor substantially more concise than a plain loop.
summary.getCustomerIdentifiers().forEach(
    id -> {
        if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
            id.setRefType(RefType.REF.toString());
        } else {
            id.setRefType(RefType.TAX.toString());
            Country country = new Country();
            Address address = summary.getCustomerAddresses().get(0);
            country.setIsoCountryCode(address.getIsoCountryCode());
            country.setCountryName(address.getCountryName());
            id.setCountry(country);
        }
    }
);
(A final micro-optimization would be to declare and initialize address outside of the loop.)
Java 8 streams are not the solution to all problems.
The direct answer to your question is a resounding 'no', but you're misusing streams, which presumably is part of why you are even asking this question. You're operating on mutables in stream code, which you shouldn't be doing. That is why I'm saying 'misusing': this code compiles and works, but it leads to hard-to-read and hard-to-maintain code that will fail in weird ways as you use more and more of the stream API. The solution is not to go against the grain so much.
You're also engaging in stringly based typing which is another style mistake.
Finally, your collect call is misleading.
So, to answer the question:
Does it lead to data corruption in production?
No. How would you imagine it would?
Style mistake #1: mutables
Streams don't work nearly as well when you're working with mutables. The general idea is that you have immutable classes (classes without any setters; the instances of these classes cannot change after construction). String is immutable, so is Integer, and so is BigDecimal. There is no .setValue() on an Integer instance, there is no setChar() on a String, or even a clear() or an append(): all operations on immutables that appear to modify things actually return a new instance that contains the result of the operation. someBigDecimal.add() doesn't change what someBigDecimal is pointing at; it constructs a new BigDecimal instance and returns that.
With immutables, if you want to change things, Stream's map method is the right one to use: For example, if you have a stream of BigDecimal objects and you want to, say, print them all, but with 2.5 added to them, you'd be calling map: You want to map each input BigDecimal into an output BD by asking the BD instance to make a new BD instance by adding 2.5 to itself.
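The BigDecimal case described above could be sketched like this. The class and method names are made up for the example; BigDecimal.add and Stream.map are the standard library calls.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.stream.Collectors;

final class BigDecimalMapDemo {
    private static final BigDecimal INCREMENT = new BigDecimal("2.5");

    // Maps every input to a new BigDecimal; the inputs are left untouched.
    static List<BigDecimal> addIncrement(List<BigDecimal> inputs) {
        return inputs.stream()
                .map(bd -> bd.add(INCREMENT)) // add() returns a new instance
                .collect(Collectors.toList());
    }
}
```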
With mutables, both map and peek are more relevant. Style debates are rife on what to do. peek just lets you witness what's going through a stream pipeline. It can be misleading because stream pipelines don't process anything until you stick a terminator on the end (something like collect, or max() or whatnot, those are 'terminators'). When talking about mutables, peek in theory works just as well as map does, and some (evidently including IntelliJ's auto-suggest authors) are of the belief that a map operation that really just mutates the underlying object in the stream and returns the same reference is a style violation and should be replaced with a peek operation instead.
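The laziness mentioned above can be observed directly. In this sketch (names are made up for the demo) the peek action records each element it sees; nothing is recorded until a terminal operation runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class LazyPeekDemo {
    // Returns how many elements peek had seen before and after the terminal operation.
    static int[] seenBeforeAndAfterTerminal() {
        List<String> seen = new ArrayList<>();
        Stream<String> pipeline = Stream.of("a", "b", "c").peek(seen::add);
        int before = seen.size();    // nothing has flowed through the pipeline yet
        pipeline.forEach(s -> { });  // the terminal operation pulls the elements
        int after = seen.size();
        return new int[] { before, after };
    }
}
```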
But the far more relevant observation is that stream operations should not be mutating anything at all. Do not call setters.
You have 2 options:
Massively refactor this code, make CustomerIdentifier immutable (get rid of the setters, make all fields final, consider adding with-ers and builders and the like), change your peek code to something like:
.map(identifier -> {
    if (....) return identifier.with(RefType.REF);
    return identifier.withCountry(new Country(summary.get..., summary.get...));
})
Note that Country also needs this treatment.
Do not use streams.
This is much simpler. This code is vastly less confusing and better style if you just write a foreach loop. I have no idea why you thought streams were appropriate here. Streams are not 'better'. A problem is that adherents of functional style are so incredibly convinced they are correct they spread copious FUD (Fear, Uncertainty, Doubt) about non-functional approaches and strongly insinuate that functional style is 'just better'. This is not true - it's merely a different style that is more suitable to some domains and less to others. This style goes a lot further than just 'turn for loops into streams', and unawareness of what 'functional style' really means just leads to hard to maintain, hard to read, weird code like what you pasted.
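For comparison, the first option (immutable classes with with-ers) might look roughly like the sketch below. Everything here is a simplified, hypothetical stand-in for the question's classes: the fields, the with-er names, and the enrich signature are assumptions, and the TAX branch sets the ref type as well as the country, as the original code does.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ImmutableSketch {
    enum RefType { REF, TAX }

    static final class Country {
        final String isoCountryCode;
        final String countryName;
        Country(String isoCountryCode, String countryName) {
            this.isoCountryCode = isoCountryCode;
            this.countryName = countryName;
        }
    }

    static final class CustomerIdentifier {
        private final String identifierType;
        private final RefType refType; // null until classified
        private final Country country; // null unless a country was attached

        CustomerIdentifier(String identifierType) {
            this(identifierType, null, null);
        }

        private CustomerIdentifier(String identifierType, RefType refType, Country country) {
            this.identifierType = identifierType;
            this.refType = refType;
            this.country = country;
        }

        String getIdentifierType() { return identifierType; }
        RefType getRefType() { return refType; }
        Country getCountry() { return country; }

        // With-ers return a modified copy instead of mutating this instance.
        CustomerIdentifier withRefType(RefType newRefType) {
            return new CustomerIdentifier(identifierType, newRefType, country);
        }

        CustomerIdentifier withCountry(Country newCountry) {
            return new CustomerIdentifier(identifierType, refType, newCountry);
        }
    }

    // map() produces new identifiers; the input list and its elements are untouched.
    static List<CustomerIdentifier> enrich(List<CustomerIdentifier> identifiers,
                                           List<String> referenceTypes,
                                           Country firstAddressCountry) {
        return identifiers.stream()
                .map(id -> referenceTypes.contains(id.getIdentifierType())
                        ? id.withRefType(RefType.REF)
                        : id.withRefType(RefType.TAX).withCountry(firstAddressCountry))
                .collect(Collectors.toList());
    }
}
```

The amount of supporting code is the point: this is a real refactoring, not a drop-in change, which is why the plain loop is usually the cheaper choice.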
I really, really want to use streams here
This is just a bad idea here (unless you do the full rewrite to immutables), but if you MUST, the actual right answer is not what IntelliJ said, it's to use forEach. This is peek and the terminal operation in one package. It gets rid of the pointless collect call (which just recreates a list with exactly the same elements as what customerSummary.getCustomerIdentifiers() returns) and properly represents what is actually happening (which is NOT that you're writing code that witnesses what is flowing through the stream pipe; you're writing code that you intend to execute on each element in the stream).
But that's still much worse than this:
CustomerSummary summary = customerSummaryDTO.getCustomerSummary();
for (CustomerIdentifier identifier : summary.getCustomerIdentifiers()) {
    if (getCustomerReferenceTypes().contains(identifier.getIdentifierType())) {
        identifier.setRefType(RefType.REF.toString());
    } else {
        identifier.setRefType(RefType.TAX.toString());
        Country country = new Country();
        country.setIsoCountryCode(summary.getCustomerAddresses().get(0).getIsoCountryCode());
        country.setCountryName(summary.getCustomerAddresses().get(0).getCountryName());
        identifier.setCountry(country);
    }
}
return summary;
Style mistake #2: stringly typing
Why isn't the refType field in CustomerIdentifier just RefType? Why are you converting RefType instances to strings and back?
DB engines support enums and if they don't, the in-between layer (your DTO) should support marshalling enums into strings and back.
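A sketch of what that could look like: the field holds the enum, and conversion to and from text happens only at the storage boundary. The class layout and the two conversion helpers are assumptions for illustration; Enum.name() and valueOf() are standard Java.

```java
final class RefTypeSketch {
    enum RefType { REF, TAX }

    // Simplified stand-in for the question's CustomerIdentifier.
    static final class CustomerIdentifier {
        private RefType refType;

        void setRefType(RefType refType) { this.refType = refType; }
        RefType getRefType() { return refType; }
    }

    // Boundary helpers: the only place where the enum becomes a string and back.
    static String toColumnValue(RefType type) {
        return type.name();
    }

    static RefType fromColumnValue(String value) {
        return RefType.valueOf(value); // throws IllegalArgumentException for unknown text
    }
}
```

A mistyped value now fails at the boundary (or at compile time inside the model) instead of travelling through the code as an arbitrary string.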
I have been given a review comment telling me not to use a variable in the return statement and to put the condition directly in the return statement instead.
Is there any difference between line 3 and 4 in the code below?
String str = "Hello Sir";
boolean flag = str.contains("Hello");
return(flag);
// instead ask to use below
return(str.contains("Hello"));
I prefer to use variable, as in complex calculations those are helpful in debugging.
There is really no difference here. That variable lives on the stack, so does the value that is returned directly.
So, theoretically, there might be a minor performance difference between them.
But rest assured: readability is much more important here, therefore I am with you: you can use such an additional variable when it helps the reader. But when you follow clean code principles, another option would be to have a method that only computes that condition and returns the result.
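The extracted-method option might look like this small sketch (the class and method names are made up). The method name does the documenting that the local flag variable would otherwise do.

```java
final class GreetingCheck {
    // The name states the condition, so callers need no intermediate flag.
    static boolean containsGreeting(String text) {
        return text.contains("Hello");
    }
}
```

When debugging, a breakpoint on the call site or inside this method serves the same purpose as inspecting a local variable.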
Please note: the "common" practice is to avoid additional variables, so many tools such as PMD, and even IDEs, suggest returning the value directly (see here for a discussion of this aspect).
And finally, coming back on performance. If your method is invoked often enough, the JIT will inline/compile it anyway, and optimize it. If the method isn't invoked often enough, what would we care about a nanosecond more or less of execution time ...
i don't see a difference..
basically it is returning the value directly vs returning a variable containing the value..
Edit: OK the answer looked like a rewrite of the question.. what i meant is that it's passing a value (true/false) or passing a variable for the system to unwrap its value (var -> true/false)
so, better performance for the first option.. but nothing worth going against your personal preference for..
I overheard two of my colleagues arguing about whether or not to create a new data model class which only contains one string field and a setter and a getter for it. A program will then create a few objects of the class and put them in an array list. The guy who is storing them argues that there should be a new type, while the guy who is getting the data said there is no point going through all this trouble when you can simply store a string.
Personally I prefer creating a new type so we know what's being stored in the array list, but I don't have strong arguments to persuade the 'getting' data guy. Do you?
Sarah
... a new data model class which only contains one string field and a setter and a getter for it.
If it was just a getter, then it is not possible to say in general whether a String or a custom class is better. It depends on things like:
consistency with the rest of your data model,
anticipating whether you might want to change the representation,
anticipating whether you might want to implement validation when creating an instance, add helper methods, etc,
implications for memory usage or persistence (if they are even relevant).
(Personally, I would be inclined to use a plain String by default, and only use a custom class if, for example, I knew that it was likely that a future representation change / refinement would be needed. In most situations, it is not a huge problem to change a String into a custom class later ... if the need arises.)
However, the fact that there is proposed to be a setter for the field changes things significantly. Instances of the class will be mutable, where instances of String are not. On the one hand this could possibly be useful; e.g. where you actually need mutability. On the other hand, mutability would make the class somewhat risky for use in certain contexts; e.g. in sets and as keys in maps. And in other contexts you may need to copy the instances. (This would be unnecessary for an immutable wrapper class or a bare String.)
(The simple answer is to get rid of the setter, unless you really need it.)
There is also the issue that the semantics of equals will be different for a String and a custom wrapper. You may therefore need to override equals and hashCode to get a more intuitive semantic in the custom wrapper case. (And that relates back to the issue of a setter, and use of the class in collections.)
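The points about mutability and equals can be illustrated with a sketch. Label and MutableLabel are hypothetical names; the mutable variant matches the getter-plus-setter class from the question, with value-based equals and hashCode added.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class WrapperSketch {
    // Immutable wrapper: no setter, value-based equals and hashCode.
    static final class Label {
        private final String value;

        Label(String value) { this.value = Objects.requireNonNull(value); }
        String getValue() { return value; }

        @Override public boolean equals(Object o) {
            return o instanceof Label && value.equals(((Label) o).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // Mutable variant with a setter, as proposed in the question.
    static final class MutableLabel {
        private String value;

        MutableLabel(String value) { this.value = value; }
        void setValue(String value) { this.value = value; }

        @Override public boolean equals(Object o) {
            return o instanceof MutableLabel && Objects.equals(value, ((MutableLabel) o).value);
        }
        @Override public int hashCode() { return Objects.hashCode(value); }
    }

    // Shows the hazard: after mutation the set can no longer find its own element.
    static boolean foundAfterMutation() {
        Set<MutableLabel> labels = new HashSet<>();
        MutableLabel label = new MutableLabel("a");
        labels.add(label);
        label.setValue("b"); // the hash code changes while the object sits in the set
        return labels.contains(label);
    }
}
```

Without the equals and hashCode overrides, two wrappers holding the same text would not be equal at all, which is the other half of the problem described above.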
Wrap it in a class, if it matches the rest of your data model's design.
It gives you a label for the string so that you can tell what it represents at run time.
It makes it easier to take your entity and add additional fields and behavior. (Which can be a likely occurrence.)
That said, the key is if it matches the rest of your data model's design... be consistent with what you already have.
Counterpoint to mschaef's answer:
Keep it as a string, if it matches the rest of your data model's design. (See how the opening sounds so important, even if I temper it with a sentence that basically says we don't know the answer?)
If you need a label saying what it is, add a comment. Cost = one line, total. Heck, for that matter, you need a line (or three) to comment your new class, anyway, so what's the class declaration for?
If you need to add additional fields later, you can refactor it then. You can't design for everything, and if you tried, you'd end up with a horrible mess.
As Yegge says, "the worst thing that can happen to a code base is size". Add a class declaration, a getter, a setter, now call those from everywhere that touches it, and you've added size to your code without an actual (i.e., non-hypothetical) purpose.
I disagree with the other answers:
It depends whether there's any real possibility of adding behavior to the type later [Matthew Flaschen]
No, it doesn’t. …
Never hurts to future-proof the design [Alex]
True, but not relevant here …
Personally, I would be inclined to use a plain String by default [Stephen C]
But this isn’t a matter of opinion. It’s a matter of design decisions:
Is the entity you store logically a string, a piece of text? If yes, then store a string (ignoring the setter issue).
If not – then do not store a string. That data may be stored as a string is an implementation detail, it should not be reflected in your code.
For the second point it’s irrelevant whether you might want to add behaviour later on. All that matters is that in a strongly typed language, the data type should describe the logical entity. If you handle things that are not text (but may be represented by text, may contain text …) then use a class that internally stores said text. Do not store the text directly.
This is the whole point of abstraction and strong typing: let the types represent the semantics of your code.
And finally:
As Yegge says, "the worst thing that can happen to a code base is size". [Ken]
Well, this is so ironic. Have you read any of Steve Yegge’s blog posts? I haven’t, they’re just too damn long.
It depends whether there's any real possibility of adding behavior to the type later. Even if the getters and setters are trivial now, a type makes sense if there is a real chance they could do something later. Otherwise, clear variable names should be sufficient.
In the time spent discussing whether to wrap it in a class, it could be wrapped and done with. Never hurts to future-proof the design, especially when it only takes minimal effort.
I see no reason why the String should be wrapped in a class. The premise of the discussion is that what is needed right now is a String object. If it needs to be augmented later, refactor it then. Why add unnecessary code in the name of future-proofing?
Wrapping it in a class provides you with more type safety - in your model you can then only use instances of the wrapper class, and you can't easily make a mistake where you put a string that contains something different into the model.
However, it does add overhead, extra complexity and verbosity to your code.
I'm creating a cell editor, but I've done (and seen) this in other code. I'm creating an object and then dropping it on the floor like this:
ButtonCellEditor buttonColumn = new ButtonCellEditor(table, 2);
This class takes the table and sets a TableColumnModel and custom cell renderers to it. Then, the method ends and I don't reference the buttonColumn object anymore.
So, is there a difference between doing the above and doing this (which also works)?
new ButtonCellEditor(table, 2);
Anything really wrong with doing this?
You shouldn't have unused variables in your code; they make it less clear. Also, a constructor is (as its name states) a method for initializing the object, which is not what yours is being used for.
I suggest having a static method instead:
ButtonCellEditor.niceNameHere(table, 2);
The only case I can think of in which a constructor would be adequate is when it takes parameters to initialize the object, which then performs some action later. It is not adequate for performing the action inside the constructor itself, which is what yours does.
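A sketch of the static-method suggestion, using Swing's real JTable, TableColumnModel and DefaultCellEditor classes. The ButtonCellEditor here is a hypothetical stand-in (it wraps a JCheckBox only because DefaultCellEditor needs some component), and the method name installOn is made up.

```java
import javax.swing.DefaultCellEditor;
import javax.swing.JCheckBox;
import javax.swing.JTable;

// Hypothetical stand-in for the question's ButtonCellEditor.
final class ButtonCellEditor extends DefaultCellEditor {
    private ButtonCellEditor() {
        super(new JCheckBox()); // the constructor only initializes the editor
    }

    // The side effect on the table lives in a named static method.
    static ButtonCellEditor installOn(JTable table, int columnIndex) {
        ButtonCellEditor editor = new ButtonCellEditor();
        table.getColumnModel().getColumn(columnIndex).setCellEditor(editor);
        return editor;
    }
}
```

A call such as ButtonCellEditor.installOn(table, 2) reads as an action, so nobody is tempted to delete it as an unused object.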
There's nothing wrong with either of those ways of creating a ButtonCellEditor. However, if you later want to reference that object, with method two you have no way of doing so. With method one you can at least say buttonColumn.method().
No tangible difference, as far as I know.
Nothing wrong either -- I would prefer shorter form, if the only reason really is to get side effects of constructing the object (which is not necessarily a very good API design in itself, IMO, but that's irrelevant here).
There is no real difference between the two cases. In the second case the object is created without being assigned to a variable, and it is garbage collected as usual once nothing references it. The second case will also save you some typing and is somewhat more readable. A reader may expect to find a later use of the reference to the created object (if you choose the first version) and be surprised not to find one.
In any case, a static method may be more suitable for such cases.
they are the same, but a comment about why you are doing it might be in order. otherwise someone might come along and delete it, thinking it is a no-op without investigating.
you could also be more explicit and call
table.getColumnModel().getColumn(2).setCellEditor(new ButtonCellEditor());
For example:
I'm creating a CSV file and I have a CsvOptions object which holds several parameters for the CSV file creation.
if (cells.hasNext()) {
output.write(csvOptions.getDelimiter());
}
The csvOptions is just a simple container object, the kind where you are tempted not to use any getters at all. The getter just returns the value from a private field.
Is it worth caching the return value of a trivial getter, like getDelimiter() in my example, in terms of
... execution speed?
... coding style?
The JIT should be able to inline all trivial getters at run-time. So this is not really a concern.
Regarding the coding style, I would prefer the direct call -- unless it makes the line where it occurs too long -- when you need the value only once.
Caching shouldn't be an option.
If it is a container, make it immutable, i.e. make all its fields public final.
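A sketch of such an immutable container, with the delimiter read directly at the point of use. The quote field and the joinRow helper are assumptions added to make the example self-contained; only the delimiter appears in the question.

```java
import java.util.Iterator;
import java.util.List;

final class CsvSketch {
    // Immutable options holder: public final fields, no getters or setters.
    static final class CsvOptions {
        public final char delimiter;
        public final char quote; // hypothetical second option

        CsvOptions(char delimiter, char quote) {
            this.delimiter = delimiter;
            this.quote = quote;
        }
    }

    // Joins one row, reading the delimiter from the options where it is needed.
    static String joinRow(List<String> row, CsvOptions csvOptions) {
        StringBuilder output = new StringBuilder();
        Iterator<String> cells = row.iterator();
        while (cells.hasNext()) {
            output.append(cells.next());
            if (cells.hasNext()) {
                output.append(csvOptions.delimiter); // direct field read, no local cache
            }
        }
        return output.toString();
    }
}
```

Since the fields can never change after construction, there is nothing for a cached local copy to protect against.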
I believe that Martin Fowler's Refactoring explicitly recommends not doing this
char delimiter = csvOptions.getDelimiter();
// code here to use delimiter
instead he favours using
csvOptions.getDelimiter()
directly. The argument is two-fold. First there should be minimal performance overhead, compiler and JITers can optimise the function call. Second, by using the method we actually make the code easier to refactor in future.
See Fowler's book referenced here