Over the years, I think I have seen and tried every conceivable way of generating stub data structures (fake data) for complex object graphs. It always gets hairy in Java.
* * * *
A---B----C----D----E
(Pardon cheap UML)
The key issue is that there are certain relationships between the values, so a certain instance of C may imply given values for E.
Any attempt I have seen at applying a single pattern or group of patterns to solve this problem in Java ultimately ends up being messy.
I am considering whether Groovy or any of the dynamic VM languages can do a better job. It should be possible to do things significantly more simply with closures.
Does anyone have any references/examples of this problem solved nicely with (preferably) Groovy or Scala?
Edit:
I did not know "Object Mother" was the name of the pattern, but it's the one I'm having trouble with: when the object structure to be generated by the Object Mother is sufficiently complex, you'll always end up with a fairly complex internal structure inside the Object Mother itself (or by composing multiple Object Mothers). Given a sufficiently large target structure (say, 30 classes), finding structured ways to implement the object mother(s) is really hard. Now that I know the name of the pattern I can Google it better, though ;)
You might find the Object Mother pattern to be useful. I've used this on my current Groovy/Grails project to help me create example data.
It's not Groovy-specific, but a dynamic language can often make it easier to create something like this using duck typing and closures.
I typically create object mothers using the builder pattern.
public class ItineraryObjectMother
{
    Status status;
    private long departureTime;

    public ItineraryObjectMother()
    {
        status = new Status("BLAH");
        departureTime = 123456L;
    }

    public Itinerary build()
    {
        Itinerary itinerary = new Itinerary(status);
        itinerary.setDepartureTime(departureTime);
        return itinerary;
    }

    public ItineraryObjectMother status(Status status)
    {
        this.status = status;
        return this;
    }

    public ItineraryObjectMother departs(long departureTime)
    {
        this.departureTime = departureTime;
        return this;
    }
}
Then it can be used like this:
Itinerary i1 = new ItineraryObjectMother().departs(1234L).status(someStatus).build();
Itinerary i2 = new ItineraryObjectMother().departs(1234L).build();
As Ted said, this can be improved/simplified with a dynamic language.
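When the target graph spans many classes, one way to keep the mothers manageable even in plain Java is to compose them: a mother for an aggregate holds mothers for its parts and only builds them at the end. A minimal sketch along those lines (the Trip/Status domain classes here are hypothetical, and ItineraryObjectMother is a condensed version of the one above):

```java
// Hypothetical domain classes, for illustration only.
class Status { final String code; Status(String code) { this.code = code; } }
class Itinerary { final Status status; Itinerary(Status status) { this.status = status; } }
class Trip { final Itinerary outbound; Trip(Itinerary outbound) { this.outbound = outbound; } }

// Condensed version of the mother above, with a sensible default.
class ItineraryObjectMother {
    private Status status = new Status("CONFIRMED"); // assumed default
    ItineraryObjectMother status(Status s) { this.status = s; return this; }
    Itinerary build() { return new Itinerary(status); }
}

// A mother for a larger graph delegates its defaults to nested mothers.
class TripObjectMother {
    private ItineraryObjectMother outbound = new ItineraryObjectMother();
    TripObjectMother outbound(ItineraryObjectMother m) { this.outbound = m; return this; }
    Trip build() { return new Trip(outbound.build()); }
}
```

A test that only cares about the trip can call new TripObjectMother().build() and get consistent defaults all the way down; a test that cares about one detail overrides just that one nested mother.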
In this big legacy project there is a core class MyObject that has an ID property, which has been coded as a String. This ID is accessed everywhere across the project.
public class MyObject {
    private String id;

    public String getId() {
        return id;
    }
}
I am looking at the possibility of refactoring this String property to a type class Id with the following methods:
class Id implements Comparable<Id> {
    String value
    Id(String value)
    String getValue()
    int hashCode()
    boolean equals(Object obj)
    int compareTo(Id o)
    String toString()
}
When refactoring, I need to keep in mind that while I can refactor our own project in any way, there are customers who use the project's API and the changes should preferably be backwards compatible. The current usages of the String IDs internally are:
Get an object ID, store it in a variable and do some comparisons on it later
Create a list or set and add object IDs to it or check if ID is contained already (without custom comparators)
Compare an object ID to a String value (e.g. user input) sometimes with equals() and sometimes with equalsIgnoreCase()
Compare two object IDs
Specifically, what I would like to do is:
Refactor type of ID from String to Id
Refactor the current method String getId() to Id getUniqueID()
Where ID is compared to a string directly using id.equals("String") or id.equalsIgnoreCase("String"), change it to id.equals(new Id("String"))
Add a new method (with the old name) String getId() that will return getUniqueID().getValue(). This is for backwards compatibility with customer code that relies on the old String IDs.
Of course, I could just list all usages of the property, its getters and setters, and go and replace them by hand. Sometimes I'd probably be able to get away with using a regex, but it probably isn't that great of an idea. Besides, it's plain daunting, as there are 500+ usages to edit across a couple of dozen classes, including sub-classes.
I have looked at IDEA's Refactor → Type Migration function, but it does not seem to be able to do it. I don't normally work in IDEA, so I might be doing something wrong, but it tells me that there are a few hundred conflicts, and the list of things it can't convert is rather long.
It does not look like I can provide a mapping of the old getter getId() to new getUniqueID().getValue(), e.g.
myObject.getId().equalsIgnoreCase("test")
would need to map to
myObject.getUniqueID().equals(new Id("test"))
I imagine that this sort of refactoring should be rather popular, but so far it looks like I will have to do it mostly by hand and search+replace.
Is there an automated way of doing it? Perhaps some refactoring tool that allows specifying how an old usage pattern maps to a new one?
A simple solution for backwards compatibility, which does not require major refactoring:
@Deprecated
String getId() { return getUniqueID().getValue(); }

Id getUniqueID() {
    //TODO
}
Manually edit occurrences to getUniqueID().getValue().equals("test") only where it is essential and has a clear benefit.
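For completeness, a minimal Id implementation consistent with the method list in the question might look like this (case-sensitive equality is an assumption; if the old equalsIgnoreCase() call sites matter, normalizing case in the constructor is one option):

```java
import java.util.Objects;

// Thin immutable value wrapper around the old String id.
final class Id implements Comparable<Id> {
    private final String value;

    Id(String value) {
        this.value = Objects.requireNonNull(value, "id value");
    }

    public String getValue() { return value; }

    @Override public boolean equals(Object obj) {
        // Case-sensitive comparison; an assumption, see note above.
        return obj instanceof Id && value.equals(((Id) obj).value);
    }

    @Override public int hashCode() { return value.hashCode(); }

    @Override public int compareTo(Id o) { return value.compareTo(o.value); }

    @Override public String toString() { return value; }
}
```

Because equals and hashCode delegate to the wrapped String, existing List.contains and Set membership checks keep working once the stored element type changes from String to Id.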
I think the general answer to the posed question is indeed to use IntelliJ IDEA's functionality: Refactor → Type migration.
The challenge is that it will probably migrate a lot of other variables to the new type, and not just the variable I want to migrate.
For example, we have another class NameDescription that takes two String parameters for name and description and is sometimes used as new NameDescription(myObject.getId(), "Some description"). In this case, IDEA figures that it probably needs to change NameDescription.name from String to Id too, which is actually wrong.
In order to avoid this, one needs to preview usages after requesting type migration and manually exclude this and other cases that should not be migrated.
Furthermore, another IDEA feature can be used to modify patterns in the code: Structural Search and Replace, which lets you define a search pattern with variables and is a bit easier than writing lots of regular expressions yourself.
For backward compatibility, as @c0der suggests, the old getter would return newIdObject.getValue().
I am writing an Android app, in Java, which uses an SQLite database containing dozens of tables. I have a few Datasource classes set up to pull data from these tables and turn them into their respective objects. My problem is that I do not know the most efficient way to structure code that accesses the database in Java.
The Datasource classes are getting very repetitive and taking a long time to write. I would like to refactor the repetition into a parent class that will abstract away most of the work of accessing the database and creating objects.
The problem is, I am a PHP (loosely typed) programmer and I'm having a very hard time solving this problem in a statically typed way.
Thinking in PHP, I'd do something like this:
public abstract class Datasource {
    protected String table_name;
    protected String entity_class_name;

    public function get_all () {
        // pseudo code -- assume db is a connection to our database, please.
        Cursor cursor = db.query( "select * from {this.table_name}");
        class_name = this.entity_class_name;
        entity = new $class_name;
        // loops through data in columns and populates the corresponding fields on each entity -- also dynamic
        entity = this.populate_entity_with_db_hash( entity, cursor );
        return entity;
    }
}

public class ColonyDatasource extends Datasource {
    public function ColonyDataSource( ) {
        this.table_name = 'colony';
        this.entity_class_name = 'Colony';
    }
}
Then new ColonyDatasource.get_all() would get all the rows in table colony and return a bunch of Colony objects, and creating the data source for each table would be as easy as creating a class that has little more than a mapping of table information to class information.
Of course, the problem with this approach is that I have to declare my return types and can't use variable class names in Java. So now I'm stuck.
What should one do instead?
(I am aware that I could use a third-party ORM, but my question is how someone might solve this without one.)
First: you don't want lines like these in your Java code:
class_name = this.entity_class_name;
entity = new $class_name;
It is possible to do what you are suggesting, and in languages such as Java it is called reflection.
https://en.wikipedia.org/wiki/Reflection_(computer_programming)
In this case (and many others), using reflection to do what you want is a bad idea for many reasons.
To list a few:
It is VERY expensive
You want the compiler to catch any mistakes, eliminating as many runtime errors as possible.
Java isn't really designed to quack like a duck: What's an example of duck typing in Java?
Your code should be structured in a different way to avoid this type of approach.
Sadly, because Java is statically typed, I do believe you can't automate this part of your code:
// loops through data in columns and populates the corresponding fields on each entity -- also dynamic
entity = this.populate_entity_with_db_hash( entity, cursor );
Unless you do it through means of reflection. Or shift approaches entirely and begin serializing your objects (¡not recommending, just saying it's an option!). Or do something similar to Gson https://code.google.com/p/google-gson/. I.e. turn the db hash into a json representation and then using gson to turn that into an object.
What you could do is automate the "get_all" portion in the abstract class, since that would be repetitive in nearly every instance, and use an interface so that the abstract class can rest assured it can call a method of its extending object. This will get you most of the way towards your "automated" approach, reducing the amount of code you must retype.
To do this we must consider the fact that Java has:
Generics (https://en.wikipedia.org/wiki/Generics_in_Java)
Function overloading.
Every Object in Java extends from the Object class, always.
Very Liskov-like https://en.wikipedia.org/wiki/Liskov_substitution_principle
Package scope: What is the default scope of a method in Java?
Try something like this (highly untested, and most likely won't compile) code:
// Notice default scoping
interface DataSourceInterface {
    // This is to allow our GenericDataSource to call a method that isn't defined yet.
    Object cursorToMe(Cursor cursor);
}

// Notice how we implement here, but with no implemented function declarations!
public abstract class GenericDataSource implements DataSourceInterface {
    protected SQLiteDatabase database;

    // And here we see Generics and Objects being friends to do what we want.
    // This basically says ? (wildcard) will hold a list of random things,
    // but we do know that these random things will extend from an Object.
    protected List<? extends Object> getAll(String table, String[] columns) {
        List<Object> items = new ArrayList<Object>();
        Cursor cursor = database.query(table, columns, null, null, null, null, null);
        cursor.moveToFirst();
        while (!cursor.isAfterLast()) {
            // And see how we can call "cursorToMe" without error!
            // Depending on the extending class, cursorToMe will return
            // all sorts of different objects, but it will be an Object nonetheless!
            Object object = this.cursorToMe(cursor);
            items.add(object);
            cursor.moveToNext();
        }
        // Make sure to close the cursor
        cursor.close();
        return items;
    }
}

// Here we extend the abstract class, which carries the implements clause,
// therefore we must implement the function "cursorToMe".
public class ColonyDataSource extends GenericDataSource {
    protected String[] allColumns = {
        ColonyOpenHelper.COLONY_COLUMN_ID,
        ColonyOpenHelper.COLONY_COLUMN_TITLE,
        ColonyOpenHelper.COLONY_COLUMN_URL
    };

    // Notice our function overloading!
    // This getAll is also changing the access modifier to allow more access.
    public List<Colony> getAll() {
        // See how we are casting to the proper list type?
        // We know that our getAll from super will return a list of Colonies.
        return (List<Colony>) super.getAll(ColonyOpenHelper.COLONY_TABLE_NAME, allColumns);
    }

    // Notice, here we actually implement our db-hash-to-object conversion.
    // This is the part that could otherwise only be done through reflection,
    // so it is better to just have your DataSource object do what it knows how to do.
    public Colony cursorToMe(Cursor cursor) {
        Colony colony = new Colony();
        colony.setId(cursor.getLong(0));
        colony.setTitle(cursor.getString(1));
        colony.setUrl(cursor.getString(2));
        return colony;
    }
}
If your queries are virtually identical except for certain parameters, consider using prepared statements and binding
In SQLite, do prepared statements really improve performance?
Another option that I have yet to explore fully is the Java Persistence API; there are projects that implement annotations very similar to this. The majority of these take the form of an ORM, which provides you with data access objects (http://en.wikipedia.org/wiki/Data_access_object).
An open source project called "Hibernate" seems to be one of the go-to solutions for ORM in Java, but I have also heard that it is a very heavy solution, especially when you start considering a mobile app.
An android specific ORM solution is called OrmLite (http://ormlite.com/sqlite_java_android_orm.shtml), this is based off of Hibernate, but is very much stripped down and without as many dependencies for the very purpose of putting it on an android phone.
I have read that people using one will transition to the other very nicely.
Here is my problem (big picture). I have a project which uses large and complicated Matlab structures (by which I mean they contain multiple levels of nested structures). This is predictably slow (especially when trying to load / save). I am attempting to improve runtimes by converting some of these structures into Java Objects. The catch is that the data in these Matlab structures is accessed in a LOT of places, so anything requiring a rewrite of access syntax would be prohibitive. Hence, I need the Java Objects to mimic as closely as possible the behavior of Matlab structures, particularly when it comes to accessing the values stored within them (the values are only set in one place, so the lack of operator overloading in Java for setting isn't a factor for consideration).
The problem (small picture) that I am encountering lies with accessing data from an array of these structures. For example,
person(1)
.age = 20
.name
.first = 'John'
.last = 'Smith'
person(2)
.age = 25
.name
.first = 'Jane'
.last = 'Doe'
Matlab will allow you to do the following,
>>age = [person(1:2).age]
age =
20 25
Attempting to accomplish the same with Java,
>>jperson = javaArray('myMatlab.Person', 2);
>>jperson(1) = Person(20, Name('John', 'Smith'));
>>jperson(2) = Person(25, Name('Jane', 'Doe'));
>>age = [jperson(1:2).age]
??? No appropriate method or public field age for class myMatlab.Person[]
Is there any way that I can get the Java object to mimic this behavior?
The first thought I had was to simply extend the Person[] class, but this doesn't appear to be possible because it is final. My second approach was to create a wrapper class containing an ArrayList of Person; however, I don't believe this will work either because calling
wrappedPerson(1:2)
would either be interpreted as a constructor call to a WrappedPerson class or an attempt to access elements of a non-existent array of WrappedPerson objects (since Java won't let me override the "()" operator). Any insight would be greatly appreciated.
The code I am using for my Java classes is:
public class Person {
    int _age;
    ArrayList<Name> _names = new ArrayList<Name>();

    public Person(int age, Name name) {
        _age = age;
        _names.add(name);
    }

    public int age() { return _age; }
    public void age(int age) { _age = age; }
    public Name[] name() { return _names.toArray(new Name[0]); }
    public void name(Name name) { _names.add(name); }
}
public class Name {
    String _first;
    String _last;

    public Name(String first, String last) {
        _first = first;
        _last = last;
    }

    public String first() { return _first; }
    public void first(String firstName) { _first = firstName; }
    public String last() { return _last; }
    public void last(String lastName) { _last = lastName; }
}
TL;DR: It's possible, with some fancy OOP M-code trickery. Altering the behavior of () and . can be done with a Matlab wrapper class that defines subsref on top of your Java wrapper classes. But because of the inherent Matlab-to-Java overhead, it probably won't end up being any faster than normal Matlab code, just a lot more complicated and fussy. Unless you move the logic in to Java as well, this approach probably won't speed things up for you.
I apologize in advance for being long-winded.
Before you go whole hog on this, you might benchmark the performance of Java structures as called from your Matlab code. While Java field access and method calls are much faster on their own than Matlab ones, there is substantial overhead to calling them from M-code, so unless you push a lot of the logic down into Java as well, you might well end up with a net loss in speed. Every time you cross the M-code to Java layer, you pay. Have a look at the benchmark over at this answer: Is MATLAB OOP slow or am I doing something wrong? to get an idea of scale. (Full disclosure: that's one of my answers.) It doesn't include Java field access, but it's probably on the order of method calls due to the autoboxing overhead. And if you are coding Java classes as in your example, with getter and setter methods instead of public fields (that is, in "good" Java style), then you will be incurring the cost of Java method calls with each access, and it's going to be bad compared to pure Matlab structures.
All that said, if you wanted to make that x = [foo(1:2).bar] syntax work inside M-code where foo is a Java array, it would basically be possible. The () and . are both evaluated in Matlab before calling to Java. What you could do is define your own custom JavaArrayWrapper class in Matlab OOP corresponding to your Java array wrapper class, and wrap your (possibly wrapped) Java arrays in that. Have it override subsref and subsasgn to handle both () and .. For (), do normal subsetting of the array, returning it wrapped in a JavaArrayWrapper. For the . case:
If the wrapped object is scalar, invoke the Java method as normal.
If the wrapped object is an array, loop over it, invoke the Java method on each element, and collect the results. If the results are Java objects, return them wrapped in a JavaArrayWrapper.
But. Due to the overhead of crossing the Matlab/Java barrier, this would be slow, probably an order of magnitude slower than pure Matlab code.
To get it to work at speed, you could provide a corresponding custom Java class that wraps Java arrays and uses the Java Reflection API to extract the property of each selected array member object and collect them in an array. The key is that when you do a "chained" reference in Matlab like x = foo(1:3).a.b.c and foo is an object, it doesn't do a stepwise evaluation where it evaluates foo(1:3), and then calls .a on the result, and so on. It actually parses the entire (1:3).a.b.c reference, turns that into a structured argument, and passes the entire thing to the subsref method of foo, which has responsibility for interpreting the entire chain. The implicit call looks something like this.
x = subsref(foo, [ struct('type','()','subs',{{[1 2 3]}}), ...
struct('type','.', 'subs','a'), ...
struct('type','.', 'subs','b'), ...
struct('type','.', 'subs','c') ] )
So, given that you have access to the entire reference "chain" up front, if foo were an M-code wrapper class that defined subsref, you could convert that entire reference to a Java argument and pass it in a single method call to your Java wrapper class, which would then use Java Reflection to dynamically go through the wrapped array, select the referenced elements, and do the chained references, all inside the Java layer. E.g. it would call getNestedFields() in a Java class like this.
public class DynamicFieldAccessArrayWrapper {
    private ArrayList _wrappedArray;

    public Object getNestedFields(int[] selectedIndexes, String[] fieldPath) {
        // Pseudo-code:
        ArrayList result = new ArrayList();
        if (selectedIndexes == null) {
            selectedIndexes = 1:_wrappedArray.length();  // i.e. all indexes
        }
        for (ix in selectedIndexes) {
            Object obj = _wrappedArray.get(ix-1);
            Object val = obj;
            for (fieldName in fieldPath) {
                java.lang.reflect.Field field = val.getClass().getField(fieldName);
                val = field.get(val);
            }
            result.add(val);
        }
        return result.toArray(); // Return as array so Matlab can auto-unbox it; will need more type detection to get array type right
    }
}
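The reflective field walk at the heart of that pseudo-code can be exercised in plain Java; this is just an illustration of java.lang.reflect.Field usage, with stand-in classes (fields must be public for getField to find them):

```java
import java.lang.reflect.Field;

class ReflectFieldWalk {
    // Stand-ins for the wrapped objects; public fields so Class.getField() finds them.
    public static class Name { public String first; public Name(String f) { first = f; } }
    public static class Person { public Name name; public Person(Name n) { name = n; } }

    // Follow a chain of field names (e.g. "name" then "first") from a root object.
    static Object getNestedField(Object root, String[] fieldPath) throws Exception {
        Object val = root;
        for (String fieldName : fieldPath) {
            Field field = val.getClass().getField(fieldName);
            val = field.get(val);
        }
        return val;
    }

    public static void main(String[] args) throws Exception {
        Person p = new Person(new Name("John"));
        System.out.println(getNestedField(p, new String[]{"name", "first"})); // prints "John"
    }
}
```

The per-call Field lookup is exactly where the reflection cost lives; caching Field objects per class/field pair would trim it, but the Matlab-to-Java crossing would still dominate.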
Then your M-code wrapper class would examine the result and decide whether it was primitive-ish and should be returned as a Matlab array or comma-separated list (i.e. multiple argouts, which get collected with [...]), or should be wrapped in another JavaArrayWrapper M-code object.
The M-code wrapper class would look something like this.
classdef MyMJavaArrayWrapper < handle
    % Inherit from handle because Java objects are reference-y
    properties
        jWrappedArray  % holds a DynamicFieldAccessArrayWrapper
    end
    methods
        function varargout = subsref(obj, s)
            if isequal(s(1).type, '()')
                indices = s(1).subs;
                s(1) = [];
            else
                indices = [];
            end
            % TODO: check for unsupported indexing types in remaining s
            fieldNameChain = parseFieldNamesFromArgs(s);
            out = getNestedFields(obj.jWrappedArray, indices, fieldNameChain);
            varargout = unpackResultsAndConvertIfNeeded(out);
        end
    end
end
The overhead involved in marshalling and unmarshalling the values for the subsref call would probably overwhelm any speed gain from the Java bits.
You could probably eliminate that overhead by replacing your M-code implementation of subsref with a MEX implementation that does the structure marshalling and unmarshalling in C, using JNI to build the Java objects, call getNestedFields, and convert the result to Matlab structures. This is way beyond what I could give an example for.
If this looks a bit horrifying to you, I totally agree. You're bumping up against the edges of the language here, and trying to extend the language (especially to provide new syntactic behavior) from userland is really hard. I wouldn't seriously do something like this in production code; I'm just trying to outline the area of the problem you're looking at.
Are you dealing with homogeneous arrays of these deeply nested structures? Maybe it would be possible to convert them to "planar organized" structures, where instead of an array of structs with scalar fields, you have a scalar struct with array fields. Then you can do vectorized operations on them in pure M-code. This would make things a lot faster, especially with save and load, where the overhead scales per mxarray.
I have an algorithm that alters the state of an object each generation, depending on some semi-random modifications made to a list. I have simplified it to be clearer, so assume I have two classes:
public class Archive {
    ...
}

public class Operation {
    ...
}
In another class, Algorithm, a method iterates and makes some adjustments to a List<Operation> (similar to genetic algorithm crossovers and mutations). This list, along with other related objects, is used to update an Archive object, performing a lot of calculations and modifications on the Archive.
In the current version of my code I have an ArchiveUpdate class that has an internal Archive object and a method that receives ALL the objects used in the update to change the Archive. I think this design is kind of fuzzy, but I can't think of a better way of doing it. Can anybody help?
Have you considered making the Archive immutable and providing methods that return new Archive instances based on an existing archive? That is, something like:
public class Archive {
    private final String field;

    public Archive(String field) { this.field = field; }

    public Archive changeField(String newField) { return new Archive(newField); }
}
If your objects are all immutable, it's much easier to reason about their state and you wouldn't need an ArchiveUpdate class. However, without more examples of exactly how these classes get used I can't suggest much else.
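To make the idea concrete, here is a tiny runnable sketch (the field name is invented; the real Archive would carry whatever state evolves per generation):

```java
class ImmutableArchiveDemo {
    // Immutable: every "change" returns a fresh instance instead of mutating.
    static final class Archive {
        private final String field;
        Archive(String field) { this.field = field; }
        Archive changeField(String newField) { return new Archive(newField); }
        String field() { return field; }
    }

    public static void main(String[] args) {
        Archive gen0 = new Archive("initial");
        Archive gen1 = gen0.changeField("after-crossover");
        // gen0 is untouched, so earlier generations can be kept and compared freely.
        System.out.println(gen0.field() + " -> " + gen1.field()); // prints "initial -> after-crossover"
    }
}
```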
It's hard to grasp completely... but from what I understood, you need a pattern that would allow you to be notified when a "monitored" state changes. If that is the case, you should look at the Observer pattern; it provides a simple way of monitoring state changes.
In Google's Protocol Buffer API for Java, they use these nice Builders that create an object (see here):
Person john =
    Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("jdoe@example.com")
        .addPhone(
            Person.PhoneNumber.newBuilder()
                .setNumber("555-4321")
                .setType(Person.PhoneType.HOME))
        .build();
But the corresponding C++ API does not use such Builders (see here)
The C++ and the Java APIs are supposed to be doing the same thing, so I'm wondering why they didn't use builders in C++ as well. Are there language reasons behind that, i.e. is it not idiomatic or is it frowned upon in C++? Or was it just the personal preference of the person who wrote the C++ version of Protocol Buffers?
The proper way to implement something like that in C++ would be to use setters that return a reference to *this:
class Person {
    std::string name;
public:
    Person &setName(std::string const &s) { name = s; return *this; }
    Person &addPhone(PhoneNumber const &n);
};
The class could be used like this, assuming similarly defined PhoneNumber:
Person p = Person()
    .setName("foo")
    .addPhone(PhoneNumber()
        .setNumber("123-4567"));
If a separate builder class is wanted, then that can be done too. Such builders should be allocated on the stack, of course.
I would go with the "not idiomatic", although I have seen examples of such fluent-interface styles in C++ code.
It may be because there are a number of ways to tackle the same underlying problem. Usually, the problem being solved here is that of named arguments (or rather their lack of). An arguably more C++-like solution to this problem might be Boost's Parameter library.
The difference is partially idiomatic, but is also the result of the C++ library being more heavily optimized.
One thing you failed to note in your question is that the Java classes emitted by protoc are immutable and thus must have constructors with (potentially) very long argument lists and no setter methods. The immutable pattern is used commonly in Java to avoid complexity related to multi-threading (at the expense of performance) and the builder pattern is used to avoid the pain of squinting at large constructor invocations and needing to have all the values available at the same point in the code.
The C++ classes emitted by protoc are not immutable and are designed so that the objects can be reused over multiple message receptions (see the "Optimization Tips" section on the C++ Basics Page); they are thus harder and more dangerous to use, but more efficient.
It is certainly the case that the two implementations could have been written in the same style, but the developers seemed to feel that ease of use was more important for Java and performance was more important for C++, perhaps mirroring the usage patterns for these languages at Google.
Your claim that "the C++ and the Java API are supposed to be doing the same thing" is unfounded. They're not documented to do the same things. Each output language can create a different interpretation of the structure described in the .proto file. The advantage of that is that what you get in each language is idiomatic for that language. It minimizes the feeling that you're, say, "writing Java in C++." That would definitely be how I'd feel if there were a separate builder class for each message class.
For an integer field foo, the C++ output from protoc will include a method void set_foo(int32 value) in the class for the given message.
The Java output will instead generate two classes. One directly represents the message, but only has getters for the field. The other class is the builder class and only has setters for the field.
The Python output is different still. The class generated will include a field that you can manipulate directly. I expect the plug-ins for C, Haskell, and Ruby are also quite different. As long as they can all represent a structure that can be translated to equivalent bits on the wire, they've done their jobs. Remember these are "protocol buffers," not "API buffers."
The source for the C++ plug-in is provided with the protoc distribution. If you want to change the return type for the set_foo function, you're welcome to do so. I normally avoid responses that amount to, "It's open source, so anyone can modify it" because it's not usually helpful to recommend that someone learn an entirely new project well enough to make major changes just to solve a problem. However, I don't expect it would be very hard in this case. The hardest part would be finding the section of code that generates setters for fields. Once you find that, making the change you need will probably be straightforward. Change the return type, and add a return *this statement to the end of the generated code. You should then be able to write code in the style given in Hrnt's answer.
To follow up on my comment...
struct Person
{
    int id;
    std::string name;

    struct Builder
    {
        int id;
        std::string name;

        Builder &setId(int id_)
        {
            id = id_;
            return *this;
        }

        Builder &setName(std::string name_)
        {
            name = name_;
            return *this;
        }
    };

    static Builder build(/* insert mandatory values here */)
    {
        return Builder(/* and then use mandatory values here */)/* or here: .setId(val) */;
    }

    Person(const Builder &builder)
        : id(builder.id), name(builder.name)
    {
    }
};

void Foo()
{
    Person p = Person::build().setId(2).setName("Derek Jeter");
}
This ends up getting compiled into roughly the same assembler as the equivalent code:
struct Person
{
    int id;
    std::string name;
};

Person p;
p.id = 2;
p.name = "Derek Jeter";
In C++ you have to explicitly manage memory, which would probably make the idiom more painful to use - either build() has to call the destructor for the builder, or else you have to keep it around to delete it after constructing the Person object.
Either is a little scary to me.