So, say I've got lots of values and flags passed into my program from the command line, all stored as variables in some configuration object, config. These variables default to null if they are not provided by the user.
I've then got some other object, say an instance of Dog, which has lots of methods. Depending on the value of a specific command line argument, I may or may not want to call a specific method, possibly passing the argument value to the method.
At the moment I'm doing that like this:
Dog dog = new Dog();
if (config.argumentA != null) {
    dog.methodA(config.argumentA);
}
if (config.argumentB != null) {
    dog.methodB(config.argumentB);
}
if (config.boolArgument) {
    dog.methodC();
}
// ... ... ...
if (config.argumentZ != null) {
    dog.methodZ(config.argumentZ);
}
Now I've tried to look for a more elegant way of doing this, since this feels very dirty, but Google and Java jargon have me stumped.
I'm imagining making a map from the arguments' names to the function names, then looping through, checking each argument's value and calling the corresponding method. Does such a mapping exist in Java? Is there any way to do this nicely, or am I going about the problem completely wrong?
P.S.: I'm a bit of a beginner with both Java and problems like this, so pls be gentle :)
Actually, this question is about programming practices. As Alan Kay said, OOP is basically message passing. So your code should not be making these decisions itself, but rather passing this information on to some other method of some other class until the point where it's actually needed. If you couple this concept with suitable design patterns, you'll get an elegant piece of code.
Also, it's difficult to suggest a particular solution to a problem as abstract as yours.
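To make the "map from arguments to methods" idea from the question a bit more concrete, here is a minimal sketch, not a definitive design. It assumes Java 9+ (for List.of). Binding, ConfigDispatch and configure are hypothetical names, and Config and Dog are cut-down stand-ins for the question's classes. Each binding pairs a getter on the config with the Dog method that should receive the value, so the null check is written only once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

final class ConfigDispatch {
    // Hypothetical stand-ins for the question's classes.
    static class Config {
        String argumentA;
        Integer argumentB;
    }

    static class Dog {
        final List<String> calls = new ArrayList<>();
        void methodA(String value) { calls.add("A:" + value); }
        void methodB(Integer value) { calls.add("B:" + value); }
    }

    // Pairs a getter on Config with the Dog method that consumes its value.
    static final class Binding<T> {
        private final Function<Config, T> getter;
        private final BiConsumer<Dog, T> action;

        Binding(Function<Config, T> getter, BiConsumer<Dog, T> action) {
            this.getter = getter;
            this.action = action;
        }

        void apply(Config config, Dog dog) {
            T value = getter.apply(config);
            if (value != null) {           // the single null check
                action.accept(dog, value);
            }
        }
    }

    static final List<Binding<?>> BINDINGS = List.of(
            new Binding<String>(c -> c.argumentA, Dog::methodA),
            new Binding<Integer>(c -> c.argumentB, Dog::methodB));

    // Applies every binding whose argument was actually provided.
    static Dog configure(Config config) {
        Dog dog = new Dog();
        for (Binding<?> binding : BINDINGS) {
            binding.apply(config, dog);
        }
        return dog;
    }
}
```

Adding a new argument then means adding one line to BINDINGS rather than another if block. Whether that is worth the extra indirection depends on how many arguments there really are.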
When I see code from others, I mainly see two types of method styling.
One looks like this, having many nested ifs:
void doSomething(Thing thing) {
    if (thing.hasOwner()) {
        Entity owner = thing.getOwner();
        if (owner instanceof Human) {
            Human humanOwner = (Human) owner;
            if (humanOwner.getAge() > 20) {
                //...
            }
        }
    }
}
And the other style, looks like this:
void doSomething(Thing thing) {
    if (!thing.hasOwner()) {
        return;
    }
    Entity owner = thing.getOwner();
    if (!(owner instanceof Human)) {
        return;
    }
    Human humanOwner = (Human) owner;
    if (humanOwner.getAge() <= 20) {
        return;
    }
    //...
}
My question is: are there names for these two code styles? And if so, what are they called?
The early-returns in the second example are known as guard clauses.
Prior to the actual thing the method is going to do, some preconditions are checked, and if they fail, the method immediately returns. It is a kind of fail-fast mechanism.
There's a lot of debate around those return statements. Some think that it's bad to have multiple return statements within a method. Others think that it avoids wrapping your code in a bunch of if statements, like in the first example.
My own humble opinion is in line with this post: minimize the number of returns, but use them if they enhance readability.
Related:
Should a function have only one return statement?
Better Java syntax: return early or late?
Guard clauses may be all you need
I don't know if there is a recognized name for the two styles, but in structured programming terms, they can be described as "single exit" versus "multiple exit" control structures. (This also includes continue and break statements in loop constructs.)
The classical structured programming paradigm advocated single exit over multiple exit, but most programmers these days are happy with either style, depending on the context. Even classically, relaxation of the "single exit" rule was acceptable when the resulting code was more readable.
(One needs to remember that structured programming was viewed as the antidote to "spaghetti" programming, particularly in assembly language, where the sole control constructs were conditional and unconditional branches.)
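Since continue counts as an extra exit from a loop body, the same contrast shows up inside loops. The following is only an illustrative sketch; ExitStyles and both method names are made up for this example. Both methods return the same result and differ only in control structure:

```java
import java.util.ArrayList;
import java.util.List;

final class ExitStyles {
    // Multiple exit: each `continue` acts as a guard clause for the loop body.
    static List<Integer> positiveEvensWithGuards(List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (Integer value : values) {
            if (value == null) continue;
            if (value <= 0) continue;
            if (value % 2 != 0) continue;
            result.add(value);
        }
        return result;
    }

    // Single exit: the loop body has exactly one path to its end.
    static List<Integer> positiveEvensNested(List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (Integer value : values) {
            if (value != null) {
                if (value > 0) {
                    if (value % 2 == 0) {
                        result.add(value);
                    }
                }
            }
        }
        return result;
    }
}
```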
I would say it's about readability. The second style, which I prefer, gives you the opportunity to send, for example, messages to the user/program for any check that should stop the program.
One could call it "multiple returns" and "single return". But I wouldn't call it a style, you may want to use both approaches, depending on readability in any particular case.
Single return is considered a better practice in general, since it allows you to write more readable code with the least surprise for the reader. In a complex method, it may be quite complicated to understand at which point the program will exit for any particular arguments, and what side effects may occur.
But if in any particular case you feel multiple returns improve readability of your code, there's nothing wrong with using them.
I have a method that will process a Collection<Nodes> that is passed in as a parameter. This Collection will be modified, therefore I thought it would be good to first make a copy of it. How do I name the parameter and local variable, e.g. nodes in the example below?
List<Nodes> process(Collection<Nodes> nodes) {
List<Nodes> nodes2 = new ArrayList<>(nodes);
...
}
As another example consider the following where the variable is an int parsed from a String parameter:
public void processUser(final String userId) {
final int userId2 = Integer.parseInt(userId);
...
A good approach to the variable-naming problem is to use names that convey the actual meaning of the variable. In your example, you are using names that say nothing about what the method does or what the variables mean; that's why it is hard to pick a name.
There are many cases like yours in the JDK, e.g. Arrays#copyOf:
public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
    @SuppressWarnings("unchecked")
    T[] copy = ((Object)newType == (Object)Object[].class)
        ? (T[]) new Object[newLength]
        : (T[]) Array.newInstance(newType.getComponentType(), newLength);
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}
In this case they call the parameter original and the local variable copy which perfectly expresses that the returned value is a copy of the parameter. Precisely, copying is what this method does and it is named accordingly.
Using the same reasoning for your case (consider refactoring to give more meaningful names to your method and variables) I would name your local copy of nodes something like processedNodes, to express what that variable is and to be consistent with your method's name.
Edit:
The name of the new method you added in your edit does not provide hints about what it does either. I'll assume that it modifies some properties (maybe in a database) of the user whose id is passed via parameter.
If that is the case (or similar), I think an appropriate principle to apply is that every method should have a single responsibility. According to its name, your method should process the user, and for that it needs an int userId. The responsibility of parsing a String userId should be outside the scope of this method.
Using the proposed approach has, among others, the following advantages:
Your class won't change if you have to add additional validation to your input.
Your class won't be responsible for handling NumberFormatException, which should be the application's responsibility.
Your processUser method won't change if you have to handle different types of inputs (e.g. float userId).
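As a rough sketch of that split (UserIdParsing and parseUserId are hypothetical names, and the body of processUser is a placeholder since the question doesn't say what it does), the parsing could live at the boundary of the application while the processing method only ever sees an int:

```java
import java.util.OptionalInt;

final class UserIdParsing {
    // Hypothetical boundary helper: raw input in, valid id or "empty" out.
    // This is the only place that knows about NumberFormatException.
    static OptionalInt parseUserId(String rawUserId) {
        if (rawUserId == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(rawUserId.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Placeholder for the real processing; it no longer needs two names
    // for the same id because it never sees the String form.
    static String processUser(int userId) {
        return "processed user " + userId;
    }
}
```

With this layout the naming question disappears inside processUser: there is only one userId in scope.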
It ultimately comes down to what you want to communicate to future programmers. The computer obviously doesn't care; it's other people you're talking to. So the biggest factor is going to be what those people need to know:
What is the logical (abstract, conceptual) meaning of this variable?
What aspects of how this variable is used could be confusing to programmers?
What are the most important things about this variable?
Looking at your first example, it's kind of hard to understand enough about your program to really choose a good name. The method is called process; but methods generally speaking implement computational processes, so this name really doesn't tell me anything at all. What are you processing? What is the process? Who are you processing it for, and why? Knowing what the method does, and the class it's in, will help to inform your variable name.
Let's add some assumptions. Let's say you're building an application that locates Wi-Fi access points in a building. The Node in question is a wireless node, with subclasses Repeater, AccessPoint, and Client. Let's also say it's an online-processed dataset, so the collection of nodes given may change at any time in response to a background thread receiving updates about which nodes are currently visible. Your reason for copying the collection at the head of the method is to isolate yourself from those changes for the duration of local processing. Finally, let's assume that your method is sorting the nodes by ping time (explaining why the method takes a generic Collection but returns the more specific List type).
Now that we better understand your system, let's use that understanding to choose some names that communicate the logical intention of your system to future developers:
class NetworkScanner {
List<Node> sortByPingTime(Collection<Node> networkNodes) {
final ArrayList<Node> unsortedSnapshot;
synchronized(networkNodes) {
unsortedSnapshot = new ArrayList<>(networkNodes);
}
return Utils.sort(unsortedSnapshot, (x,y) -> x.ping < y.ping);
}
}
So the method is sortByPingTime to define what it does; the argument is networkNodes to describe what kind of node we're looking at. And the variable is called unsortedSnapshot to express two things about it that aren't visible just by reading the code:
It's a snapshot of something (implying that the original is somehow volatile); and
It has no order that matters to us (suggesting that it might have, by the time we're done with it).
We could put nodes in there, but that's immediately visible from the input argument. We could also call this snapshotToSort but that's visible in the fact that we hand it off to a sort routine immediately below.
This example remains kind of contrived. The method is really too short for the variable name to matter much. In real life I'd probably just call it out, because picking a good name would take longer than anyone will ever waste figuring out how this method works.
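Note that Utils.sort in the snippet above is not a JDK method. For anyone who wants to run the idea, here is a sketch using only the standard library. The Node class with an int ping field is an assumption made for this example, and the synchronized block only helps if whoever mutates the collection locks on the same object:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class NetworkScannerSketch {
    // Hypothetical minimal Node; the real class is not shown in the answer.
    static final class Node {
        final String name;
        final int ping;

        Node(String name, int ping) {
            this.name = name;
            this.ping = ping;
        }
    }

    static List<Node> sortByPingTime(Collection<Node> networkNodes) {
        final List<Node> snapshot;
        // Copy under the collection's lock so later changes don't affect us.
        synchronized (networkNodes) {
            snapshot = new ArrayList<>(networkNodes);
        }
        // Sort the private copy in place, lowest ping first.
        snapshot.sort(Comparator.comparingInt((Node node) -> node.ping));
        return snapshot;
    }
}
```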
Other related notes:
Naming is inherently a bit subjective. My name will never work for everyone, especially when multiple human languages are taken into account.
I find that the best name is often no name at all. If I can get away with making something anonymous, I will--this minimizes the risk of the variable being reused, and reduces symbols in IDE 'find' boxes. Generally this also pushes me to write tighter, more functional code, which I view as a good thing.
Some people like to include the variable's type in its name; I've always found that a bit odd because the type is generally immediately obvious, and the compiler will usually catch me if I get it wrong anyway.
"Keep it Simple" is in full force here, as everywhere. Most of the time your variable name will not help someone avoid future work. My rule of thumb is, name it something dumb, and if I ever end up scratching my head about what something means, choose that occasion to name it something good.
I tend to give names that reflect and emphasize the important things, so that a potential reader (including myself after a couple of months) can immediately see what is done inside the method just from its signature.
The API under discussion receives an input, does some processing and returns the output. These are the three main things here.
If it is not important what processing is done or what the type of the input is, the most generic form is this:
List<Nodes> process(Collection<Nodes> input) {
List<Nodes> output = new ArrayList<>(input);
...
}
and
public void process(final String input) {
final int output = Integer.parseInt(input);
...
If it is important to provide more information about the processing and the type of the input, names like processCollection, inputCollection and processUser, inputUserId are more appropriate, but the local variable is still output, which is a clear and self-explanatory name:
List<Nodes> processCollection(Collection<Nodes> inputCollection) {
List<Nodes> output = new ArrayList<>(inputCollection);
...
}
and
public void processUser(final String inputUserId) {
final int output = Integer.parseInt(inputUserId);
...
It depends on the use case, and sometimes it is even more appropriate to spell out the processing that is done: asArray or asFilteredArray etc. instead of processCollection.
Some may prefer source-destination terminology to input-output; I do not see a major difference between them. If the name tells the method's story, it is good enough.
It depends on what you are going to do with the local variable.
For example, in the first example it seems likely that the variable nodes2 will actually be the value returned in the end. My advice is then to simply call it result or output.
In the second example it is less clear what you want to achieve. I guess that userIdAsInt should be fine for the local. However, if an int is always expected here and you still want to keep the parameter as a String (perhaps you want to push that validation out of the method), I think it is more appropriate to name the local variable userId and the parameter userIdAsString or userIdString, which hints that String, although accepted here, is not the canonical representation of a userId, which is an int.
For sure, it depends on the actual context. I would not use approaches from other programming languages, such as _, which is good, for instance, for naming bash scripts. IMO my is also not a good choice: it looks like a piece of code copied from a tutorial (at least in Java).
The simplest solution is to name the method parameter nodesParam or nodesBackup; then you can simply use nodes for the copy or, to be more specific, call it nodesCopy.
Anyway, your method process has some tasks to do and maybe it is not the best place for making copies of the nodes list. You can make a copy in the place where you invoke the method, then you can simply use nodes as a name of your object:
List<Nodes> process(Collection<Nodes> nodes) {
// do amazing things here
// ...
}
// ...
process(new ArrayList<>(nodes))
// ...
Just a guess: if you have a collection and want to keep the original version while modifying a copy, maybe the real solution for you is to use java.util.stream.Stream.
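A small sketch of what the Stream suggestion could look like. The element type (String) and the processing steps (dropping empty names, upper-casing the rest) are invented for illustration, since the question doesn't say what process actually does. The point is only that the input collection is never modified and no intermediate copy needs a name:

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

final class StreamCopySketch {
    // Hypothetical processing: drop empty names, upper-case the rest.
    // The input collection is only read, never changed.
    static List<String> process(Collection<String> nodes) {
        return nodes.stream()
                .filter(name -> !name.isEmpty())
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```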
Simply put, when naming the variable, I consider a few things.
How is the copy created? (Is it converted from one type to another?...)
What am I going to do with the variable?
Is the name short, but/and meaningful?
Considering the same examples you have provided in the question, I will name variables like this:
List<Nodes> process(Collection<Nodes> nodes) {
List<Nodes> nodesCopy = new ArrayList<>(nodes);
...
}
This is probably just a copy of the collection, hence the name nodesCopy. Meaningful and short. If you use nodesList, that can mean it is not just a Collection; but also a List (more specific).
public void processUser(final String userId) {
final int userIdInt = Integer.parseInt(userId);
...
The String userId is parsed and the result is an integer (int)! It is not just a copy. To emphasize this, I would name this as userIdInt.
It is better not to use an underscore _, because it often indicates instance variables. And the my prefix: not much of a meaning there, and it is nooby (local will do better).
When it comes to method parameter naming conventions, if the thing a method parameter represents will not be represented by any other variable, use a method parameter name that makes it very clear what that method parameter is in the context of the method body. For example, primaryTelephoneNumber may be an acceptable method parameter name in a JavaBean setter method.
If there are multiple representations of a thing in a method context (including method parameters and local variables), use names that make it clear to humans what that thing is and how it should be used. For example, providedPrimaryTelephoneNumber, requestedPrimaryTelephoneNumber, dirtyPrimaryTelephoneNumber might be used for the method parameter name and parsedPrimaryTelephoneNumber, cleanPrimaryTelephoneNumber, massagedPrimaryTelephoneNumber might be used for the local variable name in a method that persists a user-provided primary telephone number.
The main objective is to use names that make it clear to humans reading the source code today and tomorrow as to what things are. Avoid names like var1, var2, a, b, etc.; these names add extra effort and complexity in reading and understanding the source code.
Don't get too caught up in using long method parameter names or local variable names; the source code is for human readability and when the class is compiled method parameter names and local variable names are irrelevant to the machine.
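As a small illustration of the "multiple representations" case, here is a sketch where the parameter holds the number as the user typed it and the local holds the cleaned-up form. ContactNormalizer and the cleaning rule (keep only digits and a plus sign) are assumptions made up for this example:

```java
final class ContactNormalizer {
    // The parameter name says "as provided"; the local says "after cleaning".
    static String normalizePrimaryTelephoneNumber(String providedPrimaryTelephoneNumber) {
        // Assumed rule for the sketch: keep digits and '+', drop everything else.
        String cleanPrimaryTelephoneNumber =
                providedPrimaryTelephoneNumber.replaceAll("[^0-9+]", "");
        return cleanPrimaryTelephoneNumber;
    }
}
```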
I'm considering the option of using anonymous { } code blocks to logically distinguish "code blocks" inside the same method call, something that (theoretically) should improve readability of the code.
I'm wondering which of the following 2 code segments is better to your eyes?
Also, do the 2 code segments compile to the same bytecode? In other words, can using { } hurt the performance of the code in any way?
Option 1: Code block without { } indentation
public static String serviceMatch(HttpServletRequest servletRequest, RequestTypeEnum requestTypeEnum, ...censorsed..., RequestStatistics requestStatistics) {
Request request;
// We get the parser that fits the ...censorsed..., effectively transforming the HTTPReqeuest to application local "Request*" object
RequestParser parser = RequestParserFactory.getParser(...censorsed...);
// Populate basic parameters, the "heavy" data will be lazy loaded
request = parser.parse(servletRequest);
// Instead of polluting the parsers let's put it here... (unless we identify meaningful justifications for the other alternative of changing RequestParser.parse() interface.
request.requestType = requestTypeEnum;
// Store the request statistics object on the request, so that we have access to it from all over the code
request.requestStatistics = requestStatistics;
// Update timestamp when request was parsed
request.requestStatistics._1_end_parseRequest = System.currentTimeMillis();
/*
* ...censorsed...
*/
MatchResult matchResult = Matcher.findMatch(...censorsed...);
/*
* ...censorsed...
*/
String reply = ReplyFormatFactory.getFormatter(...censorsed...
// Update timestamp when reply finished construction
request.requestStatistics._6_end_formatReply = System.currentTimeMillis();
return reply;
}
Option 2: Code block with { } indentation
public static String serviceMatch(HttpServletRequest servletRequest, RequestTypeEnum requestTypeEnum, ...censorsed..., RequestStatistics requestStatistics) {
Request request;
/*
* Request parsing block
*/
{
// We get the parser that fits the ...censorsed..., effectively transforming the HTTPReqeuest to application local "Request*" object
RequestParser parser = RequestParserFactory.getParser(...censorsed...);
// Populate basic parameters, the "heavy" data will be lazy loaded
request = parser.parse(servletRequest);
// Instead of polluting the parsers let's put it here... (unless we identify meaningful justifications for the other alternative of changing RequestParser.parse() interface.
request.requestType = requestTypeEnum;
// Store the request statistics object on the request, so that we have access to it from all over the code
request.requestStatistics = requestStatistics;
}
// Update timestamp when request was parsed
request.requestStatistics._1_end_parseRequest = System.currentTimeMillis();
/*
* ...censorsed...
*/
MatchResult matchResult = Matcher.findMatch(...censorsed...);
/*
* ...censorsed...
*/
String reply = ReplyFormatFactory.getFormatter(...censorsed...
// Update timestamp when reply finished construction
request.requestStatistics._6_end_formatReply = System.currentTimeMillis();
return reply;
}
Thanks for the review, Maxim.
If you're looking into adding extra { }'s within the same method just for the sake of readability, my advice would be to consider refactoring your method into several smaller methods. These smaller methods have the advantage of being easier to understand by themselves, and being more reusable (if they are "loosely coupled"). See the single responsibility principle.
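A rough sketch of what that refactoring could look like. The types and method bodies here are simplified, hypothetical stand-ins (the question's real parser, matcher and formatter are elided, so they are not reproduced); the point is only that the "request parsing block" becomes a named method with visible inputs and output:

```java
final class ExtractMethodSketch {
    // Hypothetical, cut-down Request; the real class has more fields.
    static final class Request {
        String body;
        String requestType;
        long parsedAtMillis;
    }

    // What used to be the brace-delimited "request parsing block".
    static Request parseRequest(String rawBody, String requestType) {
        Request request = new Request();
        request.body = rawBody.trim();
        request.requestType = requestType;
        return request;
    }

    // Placeholder for the elided reply formatting.
    static String formatReply(Request request) {
        return request.requestType + ":" + request.body;
    }

    static String serviceMatch(String rawBody, String requestType) {
        Request request = parseRequest(rawBody, requestType);
        request.parsedAtMillis = System.currentTimeMillis();
        return formatReply(request);
    }
}
```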
If you come to the state that it would be handy to put the brackets around some part of code (like in Option 2), you should move it to its own method. That's what improves readability.
By the way, I also think you don't really need to comment every single line of your code. For example the timestamp update is self-explanatory even without the comment.
I don't generally add a brace-delimited block without some syntactic reason, but if a variable will only be needed within a limited scope, I'd rather create a nested scope than define the variable in the middle of a larger one (since in the latter case there's no clear indication when the variable goes out of 'useful' scope).
As for pulling out such a code block into another method, I think it's a good idea if the resulting method both (1) has a reasonable batch of parameters, and (2) can be given a name that describes its behavior as well as the actual code does. If using the method would require passing an excessive number of parameters, or if one would have to look at the code in the method to understand what its caller is doing, then I think it's better to use an anonymous scoping block.
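To illustrate the scoping argument, here is a tiny sketch (ScopedBlocks and describe are made-up names). Each temporary exists only inside its own block, so the compiler rejects any accidental use of it further down the method:

```java
final class ScopedBlocks {
    static String describe(int width, int height) {
        String areaText;
        {
            int area = width * height;          // visible only inside this block
            areaText = "area=" + area;
        }
        String perimeterText;
        {
            int perimeter = 2 * (width + height);
            perimeterText = "perimeter=" + perimeter;
            // Referring to `area` here would be a compile error.
        }
        return areaText + ", " + perimeterText;
    }
}
```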
I think this is a bit subjective, no right or wrong answer... my opinion is don't do it. Separate blocks of code with comment blocks that precede and explain why they are different, but don't use the braces. When I see braces, I immediately think there should be a leading if, while, or something... and not finding one is a little weird.
You should probably use separate methods instead. You can call the first block processRequest. Anyone who reads this code will be able to see which parameters are used, what data is returned, what it does (even without comments). Blocks don't provide such information.
Bytecode will likely be the same.
I sometimes prefer to use the second option. That happens when extracting separate methods would lead to mess with multiple return parameters (that is, wrapping them in artificial object).
Lighttpd has comment blocks in its configuration file, written in this style:
#{{{ module name
module.option = value;
module.option = value;
#}}}
So you can just comment instead of {}'ing your code.
In Perl, anything inside { }, sub { } or eval { } will be evaluated; however, keeping a large number of { } blocks inside a subroutine is considered bad enough to warrant splitting the code into smaller parts:
$html .= eval { $val = &getNextPiece(); return $val; };
So the practice is known.
Braces are usually used to group statements for control structures and the like. I find them jarring when used for anything else.
If I have an overlong function that (for whatever reason) I don't want to split up, I break it apart into blocks with comments.
Braces { } have their purpose (even more in Java 7) and I think they are rarely used just for readability. Personally, if they are used like in Option 2 the first thing that comes to my mind is that, "Is this a static block?". Hence, I find Option 1 "more normal" and readable.
If you are really keen on sticking with one method and not refactoring this chunk of code as suggested by many here, then use comments as line separators instead. Something like:
/* -------------------------------------------- */
/* describe in detail here why you don't want to put this in another method */
/* so other readers will know why! */
// We get the parser that fits the ...censorsed..., effectively transforming the HTTPReqeuest to application local "Request*" object
RequestParser parser = RequestParserFactory.getParser(...censorsed...);
// Populate basic parameters, the "heavy" data will be lazy loaded
request = parser.parse(servletRequest);
// Instead of polluting the parsers let's put it here... (unless we identify meaningful justifications for the other alternative of changing RequestParser.parse() interface.
request.requestType = requestTypeEnum;
// Store the request statistics object on the request, so that we have access to it from all over the code
request.requestStatistics = requestStatistics;
/* -------- END of confusing block ------------- */
IMHO, comments are probably the best way of making code readable.
If you're developing in C# I would advise you to use #region ... #endregion instead for readability purpose.
I have a large collection of data in an Excel file (and CSV files). The data needs to be placed into a database (MySQL). However, before it goes into the database it needs to be processed. For example, if column 1 is less than column 3, add 4 to column 2. There are quite a few rules that must be followed before the information is persisted.
What would be a good design to follow to accomplish this task (using Java)?
Additional notes
The process needs to be automated. In the sense that I don't have to manually go in and alter the data. We're talking about thousands of lines of data with 15 columns of information per line.
Currently, I have a sort of chain-of-responsibility design set up: one class (Java) for each rule. When one rule is done, it calls the following rule.
More Info
Typically there are about 5000 rows per data sheet. Speed isn't a huge concern because this large input doesn't happen often.
I've considered Drools, however I wasn't sure the task was complicated enough for Drools.
Example rules:
All currency (data in specific columns) must not contain currency symbols.
Category names must be uniform (e.g. book case = bookcase)
Entry dates can not be future dates
Text input can only contain [A-Z 0-9 \s]
etc..
Additionally, if any column of information is invalid, it needs to be reported when processing is complete (or maybe processing should stop).
My current solution works. However, I think there is room for improvement, so I'm looking for ideas as to how it can be improved and/or how other people have handled similar situations.
I've considered (very briefly) using drools but I wasn't sure the work was complicated enough to take advantage of drools.
If I didn't care to do this in 1 step (as Oli mentions), I'd probably use a pipes-and-filters design. Since your rules are relatively simple, I'd probably do a couple of delegate-based classes. For instance (C# code, but Java should be pretty similar...perhaps someone could translate?):
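using System;
using System.Collections.Generic;

interface IFilter {
    IEnumerable<string> Filter(IEnumerable<string> file);
}
// Passes through only the lines that satisfy the predicate.
class PredicateFilter : IFilter {
    private readonly Predicate<string> predicate;
    public PredicateFilter(Predicate<string> predicate) { this.predicate = predicate; }
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            if (predicate(s)) {
                yield return s;
            }
        }
    }
}
// Runs a side effect on every line and passes it through unchanged.
class ActionFilter : IFilter {
    private readonly Action<string> action;
    public ActionFilter(Action<string> action) { this.action = action; }
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            action(s);
            yield return s;
        }
    }
}
// Replaces every line with the result of the replace function.
class ReplaceFilter : IFilter {
    private readonly Func<string, string> replace;
    public ReplaceFilter(Func<string, string> replace) { this.replace = replace; }
    public IEnumerable<string> Filter(IEnumerable<string> file) {
        foreach (string s in file) {
            yield return replace(s);
        }
    }
}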
From there, you could either use the delegate filters directly, or subclass them for the specifics. Then, register them with a Pipeline that will pass them through each filter.
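Since a Java translation was asked for, here is one possible sketch rather than a line-by-line port. It leans on java.util.stream instead of yield return, and LinePipeline, Filter, keepIf, replaceWith and run are names invented for this example. It assumes Java 8+:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LinePipeline {
    // A filter turns one stream of lines into another.
    interface Filter extends UnaryOperator<Stream<String>> {}

    // Counterpart of PredicateFilter: keep only matching lines.
    static Filter keepIf(Predicate<String> predicate) {
        return lines -> lines.filter(predicate);
    }

    // Counterpart of ReplaceFilter: rewrite every line.
    static Filter replaceWith(UnaryOperator<String> replace) {
        return lines -> lines.map(replace);
    }

    // The "pipeline": pass the lines through each filter in order.
    static List<String> run(List<String> input, List<Filter> filters) {
        Stream<String> lines = input.stream();
        for (Filter filter : filters) {
            lines = filter.apply(lines);
        }
        return lines.collect(Collectors.toList());
    }
}
```

A call might look like run(rows, Arrays.asList(replaceWith(String::trim), keepIf(s -> !s.isEmpty()))), which trims every line and then drops the empty ones.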
I think your method is OK, especially if you use the same interface on every processor.
You could also look at something called Drools, currently JBoss Rules. I used it some time ago for a rule-heavy part of my app, and what I liked about it is that the business logic can be expressed in, for instance, a spreadsheet or DSL which then gets compiled to Java (at run time, and I think there's also a compile-time option). It makes rules a bit more succinct and thus readable. It's also very easy to learn (2 days or so).
Here's a link to the open-source JBoss Rules. At jboss.com you can undoubtedly purchase an officially maintained version if that's more to your company's taste.
Just create a function to enforce each rule, and call every applicable function for each value. I don't see how this requires any exotic architecture.
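For instance, a few of the example rules from the question could each be one small function, with a collector for the errors that have to be reported at the end. This is only a sketch: RowRules and all method names are made up, and the set of currency symbols is an assumption.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class RowRules {
    // Rule: currency values must not contain currency symbols.
    // Assumed symbols for this sketch: dollar, euro (\u20AC), pound (\u00A3).
    static String stripCurrencySymbols(String amount) {
        return amount.replaceAll("[$\u20AC\u00A3]", "").trim();
    }

    // Rule: entry dates cannot be future dates.
    static boolean isNotFuture(LocalDate entryDate, LocalDate today) {
        return !entryDate.isAfter(today);
    }

    // Rule: text input can only contain A-Z, 0-9 and whitespace.
    static boolean isPlainText(String text) {
        return text.matches("[A-Z0-9\\s]*");
    }

    // Collects every violation so it can be reported once processing is done.
    static List<String> validate(String text, LocalDate entryDate, LocalDate today) {
        List<String> errors = new ArrayList<>();
        if (!isPlainText(text)) {
            errors.add("invalid text: " + text);
        }
        if (!isNotFuture(entryDate, today)) {
            errors.add("future date: " + entryDate);
        }
        return errors;
    }
}
```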
A class for each rule? Really? Perhaps I'm not understanding the quantity or complexity of these rules, but I would (semi-pseudo-code):
public class ALine {
    private int col1;
    private int col2;
    private int coln;
    // ...
    public ALine(String line) {
        // read row into private variables
        // ...
        this.process();
        this.insert();
    }
    public void process() {
        // do all your rules here working with the local variables
    }
    public void insert() {
        // write to DB
    }
}
// elsewhere, for each line of the CSV:
for (String line : csv) {
    new ALine(line);
}
Your methodology of using a class for each rule does sound a bit heavyweight, but it has the advantage of being easy to modify and expand should new rules come along.
As for loading the data, bulk loading is the way to go. I have read some information which suggests it may be as much as 3 orders of magnitude faster than loading using insert statements. You can find some information on it here
Bulk load the data into a temp table, then use SQL to apply your rules.
Use the temp table as a basis for the insert into the real table.
Drop the temp table.
You can see that all the different answers are coming from their authors' own experience and perspective.
Since we don't know much about the complexity and number of rows in your system, we tend to give advice based on what we have done earlier.
If you want to narrow it down to one or two solutions for your implementation, try giving more details.
Good luck
It may not be what you want to hear, it isn't the "fun way" by any means, but there is a much easier way to do this.
So long as your data is evaluated line by line... you can set up another worksheet in your Excel file and use spreadsheet-style functions to do the necessary transforms, referencing the data from the raw data sheet. For more complex functions you can use the VBA embedded in Excel to write custom operations.
I've used this approach many times and it works really well; it's just not very sexy.