Create dynamic classes with reserved words as variables - java

This question was once asked without a satisfactory answer besides "why would you want to do this" at Reserved words as variable or method names. I'm going to ask it again, and provide context that explains why it is necessary, and even the direction to a proper solution.
I am writing code that builds classes dynamically to match the schema of a database, which I have no control over. For the most part, the code is working cleanly, but in about 0.1% of the columns, Java reserved words are used as column names. The following code is being used to create the dynamic field in the class:
evalClass.addField(CtField.make("public " + columnType + " " + columnName + ";", evalClass));
In the Java language this is a problem, but in JVM bytecode it should be perfectly legal, so there should be a way to create this field dynamically and access it using bytecode operations. Does anybody have any examples of how this would be done in a way that would support arbitrary strings, including spaces and reserved words? Thanks!

It's not clear which part you are stuck on. Any bytecode manipulation library should let you do this.
For example, using ASM, you just pass your string directly to visitField. There are no hoops to jump through.
Note that even at the bytecode level there are still a few restrictions on field names. In particular, a name cannot contain any of the characters . ; [ / and cannot be longer than 65535 bytes in modified UTF-8 (MUTF-8) encoding. Spaces and reserved words are fine.

You picked the only way where this doesn't work: Javassist's source-level API. It should be obvious that if you use the identifier to construct source code, the identifier must adhere to the source code rules. Besides, turning an already known structure into source code that has to be parsed again to recover that structure is the most inefficient way of processing bytecode.
You can use the Bytecode level API to overcome these limitations. As a side note, most other byte code processing libraries do not have a source level API at all, so with them you would have used a byte code level API right from the start.
That said, you should rethink your premise. Generated classes whose fields can only be accessed via Reflection or other generated code do not offer any advantage over, e.g., a HashMap mapping from identifiers to values, or arrays that associate columns with positions.
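To illustrate the HashMap alternative mentioned above, here is a minimal sketch. The class name RowAsMap and the column names are made up for the example; the point is only that a map key can be any string, so reserved words and spaces need no special handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowAsMap {
    // Builds one row keyed by column name. Any string works as a key,
    // including Java reserved words and names containing spaces.
    // The column names below are hypothetical.
    static Map<String, Object> sampleRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("class", "economy");
        row.put("int", 42);
        row.put("first name", "Ada");
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = sampleRow();
        System.out.println(row.get("class"));
        System.out.println(row.get("first name"));
    }
}
```

A LinkedHashMap keeps the columns in insertion order, which is usually convenient when the order mirrors the schema.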

Related

Java Runtime query - toString() and T

In the following expression:
T(org.apache.commons.io.IOUtils).toString(T(java.lang.Runtime)
.getRuntime().exec(T(java.lang.Character).toString(105)
.concat(T(java.lang.Character).toString(100))).getInputStream())
Does the '105' in toString(105) refer to an itemized object within the Character class?
and
Why is the 'T', which I believe expresses a generic type, and is used 4 times in this expression, a necessary feature of Java?
The toString() method that seems to be invoked here is actually the toString(char) (static) method of java.lang.Character. Quoting the documentation:
public static String toString(char c)
Returns a String object representing the specified char.
The result is a string of length 1 consisting solely of the specified char.
Parameters:
c - the char to be converted
Returns:
the string representation of the specified char
Since:
1.4
Note that 100 and 105 are also valid char values where 100 == 'd' and 105 == 'i'.
Update: after knowing the context, I am now confident that this code is intended to be injected into a template for a web page. The template engine used provides special syntax for accessing static methods where T(Classname) resolves to just Classname (not Classname.class!) in the resulting Java code.
So your code would be translated to:
org.apache.commons.io.IOUtils.toString(java.lang.Runtime
.getRuntime().exec(java.lang.Character.toString(105)
.concat(java.lang.Character.toString(100))).getInputStream())
The full qualification of the class names is necessary because we do not know if those classes are imported on the attacked site (or if the template engine even allows imports or class names must always be fully qualified).
A more readable version of the code that assumes imports is
IOUtils.toString(
Runtime.getRuntime().exec(
Character.toString(105).concat(Character.toString(100))
).getInputStream()
)
And after a little de-obfuscation...
IOUtils.toString(Runtime.getRuntime().exec("id").getInputStream())
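The de-obfuscation step can be checked in plain Java without running any command. This is only a sketch; the class and method names are invented for the example, and the explicit (char) cast is there so the call resolves to the toString(char) overload quoted above.

```java
public class DecodePayload {
    // Rebuilds the command string the same way the payload does:
    // one character per numeric code, concatenated in order.
    static String decode(int... codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(Character.toString((char) code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(105, 100)); // prints "id"
    }
}
```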
Whatever this is, it is definitely NOT meaningful Java code.
And the fact that you can provide it as a search query on some site is not evidence that it is Java either.
I suspect that this is actually some custom (site-specific?) query language. That makes it futile to try to understand it as a Java snippet.
Your theory that T could denote a generic type parameter doesn't work. Java would not allow you to write T(...) if that was the case.
Furthermore, if we assume that org.apache.commons.io.IOUtils, java.lang.Runtime and so on are intended to refer to Java class objects, then the correct Java syntax would be org.apache.commons.io.IOUtils.class, java.lang.Runtime.class and so on.
So what does it mean?
Well, a bit of Googling found me some other examples that look like yours. For instance:
https://github.com/VikasVarshney/ssti-payload
appears to generate "code" that is reminiscent of your example. This is SSTI (Server-Side Template Injection), and the T(...) type operator suggests that it is targeting the Spring Expression Language (SpEL) rather than Java EE Expression Language (EL).
And I think this particular example is an attempt to run the Linux id program ... which would output some basic information about the user and group ids for the account running your web server.
Does it matter? Well only if your site is vulnerable to SSTI attacks!
How would you know if your site is vulnerable?
By understanding the nature of SSTI with respect to EL and other potential attack vectors ... and auditing your codebase and configurations.
By using a vulnerability scanner to test your site and/or your code-base.
By employing the services of a trustworthy IT security company to do some penetration testing.
In this case, you could also try to use curl to repeat the attempted attack ... as the hacker would have done ... based on what is in your logs. Just see if it actually works. Note that running the id program does no actual damage to your system. The harm would be in the information that is leaked to a hacker ... if they succeed.
Note that if this hack did succeed, then the hacker would probably try to run other programs. These could do some damage to your system, depending on how well your server was hardened against such things.

Why doesn't Java compiler shorten names by default? (both for performance and obfuscation)

I cannot understand why the Java compiler does not shorten names of variables, parameters, method names, by replacing them with some unique IDs.
For instance, given the class
public class VeryVeryVeryVeryVeryLongClass {
private int veryVeryVeryVeryVeryLongInt = 3;
public void veryVeryVeryVeryVeryLongMethod(int veryVeryVeryVeryVeryLongParamName) {
this.veryVeryVeryVeryVeryLongInt = veryVeryVeryVeryVeryLongParamName;
}
}
the compiled file contains all these very long names.
Wouldn't simple unique IDs speed the parsing, and also provide a first obfuscation?
You assume that obfuscation is always desired, but it isn't:
Reflection would break, and with it JavaBeans and many frameworks reliant on it
Stack traces would become completely unreadable
If you tried to code against a compiled JAR, you'd end up with code like String name = p.a1() instead of String name = p.getName()
Obfuscation is normally the very last step taken, when you're delivering the finished app, and even then it's not used particularly often except when the target platform has severe memory constraints.
When you use a class, you refer to its methods by their name. Therefore, the compiler needs to preserve those names.
In any event, I don't see why the compiler should aim to obfuscate anything. Rather, it should aim to do exactly the opposite: be as transparent as possible.
The JVM does use numeric IDs internally.
Class files cannot be obfuscated like that because Java is dynamically linked: names of members must be publicly readable or other classes cannot use your code.
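A small sketch of why the names have to survive compilation: members are looked up by name at run time, both by the linker and by reflection. The class below is a shortened, hypothetical version of the one in the question.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class NamesSurviveCompilation {
    // Shortened stand-in for the class in the question.
    static class VeryLongClass {
        private int veryLongInt = 3;

        public void veryLongMethod(int veryLongParamName) {
            this.veryLongInt = veryLongParamName;
        }
    }

    // Looks the members up by the names written in the source and uses them.
    // This only works because the compiler kept those names in the class file.
    static int setViaReflection(int value) {
        try {
            VeryLongClass target = new VeryLongClass();
            Method m = VeryLongClass.class.getDeclaredMethod("veryLongMethod", int.class);
            m.invoke(target, value);
            Field f = VeryLongClass.class.getDeclaredField("veryLongInt");
            f.setAccessible(true);
            return f.getInt(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setViaReflection(7)); // prints 7
    }
}
```

If the compiler silently renamed veryLongMethod to some short ID, the getDeclaredMethod call would fail with NoSuchMethodException.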
Wouldn't simple unique IDs speed the parsing?
No. It would add a mapping that would probably slow it down.
and also provide a first obfuscation
Yes, but who wants the compiler to do obfuscation by default? Not me.
Your suggestion has no merit.

Forcing devs to explicitly define keys for configuration data

We are working in a project with multiple developers and currently the retrieval of values from a configuration file is somewhat "wild west":
Everybody uses some string to retrieve a value from the Config object
Those keys are spread across multiple classes and packages
Sometimes they are not even declared as constants
Naming of the keys is inconsistent and the config file (.properties) looks messy
I would like to sort that out and force everyone to explicitly define their configuration keys. Ideally in one place to streamline how config keys actually look.
I was thinking of using an Enum as a key and turning my retrieval method from
getConfigValue(String key)
into something like
getConfigValue(ConfigKey)
NOTE: I am using this approach since the Preferences API seems a bit overkill to me plus I would actually like to have the configuration in a simple file.
What are the cons of this approach?
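For reference, the enum-keyed lookup described above could be sketched roughly as follows. The class name, the two keys and their property names are hypothetical; only getConfigValue(ConfigKey) comes from the question.

```java
import java.util.Properties;

public class EnumKeyedConfig {
    // Hypothetical keys; every property name lives in this one place.
    enum ConfigKey {
        CACHE_SIZE("cache.size"),
        CACHE_TIMEOUT("cache.timeout");

        private final String propertyName;

        ConfigKey(String propertyName) {
            this.propertyName = propertyName;
        }

        String propertyName() {
            return propertyName;
        }
    }

    private final Properties properties;

    EnumKeyedConfig(Properties properties) {
        this.properties = properties;
    }

    // Only a ConfigKey is accepted, so ad-hoc string keys no longer compile.
    String getConfigValue(ConfigKey key) {
        return properties.getProperty(key.propertyName());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("cache.size", "128");
        EnumKeyedConfig config = new EnumKeyedConfig(props);
        System.out.println(config.getConfigValue(ConfigKey.CACHE_SIZE)); // prints 128
    }
}
```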
First off, FWIW, I think it's a good idea. But you did specifically ask what the "cons" are, so:
The biggest "con" is that it ties any class that needs to use configuration data to the ConfigKey class. Adding a config key used to mean adding a string to the code you were working on; now it means adding to the enum and to the code you were working on. This is (marginally) more work.
You're probably not markedly increasing inter-dependence otherwise, since I assume the class that getConfigValue is part of is the one on which you'd define the enum.
The other downside to consolidation is if you have multiple projects on different parts of the same code base. When you develop, you have to deal with delivery dependencies, which can be a PITA.
Say Project A and Project B are scheduled to get released in that order. Suddenly political forces change at the eleventh hour and you have to deliver B before A. Do you repackage the config to deal with it? Can your QA cycles deal with repackaging, or does it force a reset in their timeline?
Typical release issues, but just one more thing you have to manage.
From your question, it is clear that you intend to write a wrapper class for the raw Java Properties API, with the intention that your wrapper class provides a better API. I think that is a good approach, but I'd like to suggest some things that I think will improve your wrapper API.
My first suggested improvement is that an operation that retrieves a configuration value should take two parameters rather than one, and be implemented as shown in the following pseudocode:
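import java.util.Properties;

class Configuration {
    private final Properties properties = new Properties();

    // Builds the full property name as "<namespace>.<localName>".
    public String getString(String namespace, String localName) {
        return properties.getProperty(namespace + "." + localName);
    }
}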
You can then encourage each developer to define a string constant value to denote the namespace for whatever class/module/component they are developing. As long as each developer (somehow) chooses a different string constant for their namespace, you will avoid accidental name clashes and promote a somewhat organised collection of property names.
My second suggested improvement is that your wrapper class should provide type-safe access to property values. For example, provide getString(), but also provide methods with names such as getInt(), getBoolean(), getDouble() and getStringList(). The int/boolean/double variants should retrieve the property value as a string, attempt to parse it into the appropriate type, and throw a descriptive error message if that fails. The getStringList() method should retrieve the property value as a string and then split it into a list of strings based on using, say, a comma as a separator. Doing this will provide a consistent way for developers to get a list value.
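A minimal sketch of what such type-safe accessors might look like. The class name TypedConfiguration is invented, and throwing IllegalArgumentException for a missing or malformed value is an assumption of this sketch, not something the suggestion prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TypedConfiguration {
    private final Properties properties;

    public TypedConfiguration(Properties properties) {
        this.properties = properties;
    }

    // Assumption: a missing property is an error rather than a null result.
    public String getString(String namespace, String localName) {
        String key = namespace + "." + localName;
        String value = properties.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property '" + key + "'");
        }
        return value.trim();
    }

    // Parses the value as an int and reports the offending key on failure.
    public int getInt(String namespace, String localName) {
        String value = getString(namespace, localName);
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Property '" + namespace + "." + localName
                    + "' is not an integer: '" + value + "'", e);
        }
    }

    // Splits on commas and drops surrounding whitespace and empty items.
    public List<String> getStringList(String namespace, String localName) {
        List<String> items = new ArrayList<>();
        for (String part : getString(namespace, localName).split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }
}
```

getBoolean() and getDouble() would follow the same pattern as getInt().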
My third suggested improvement is that your wrapper class should provide some additional methods such as:
int getDurationMilliseconds(String namespace, String localName);
int getDurationSeconds(String namespace, String localName);
int getMemorySizeBytes(String namespace, String localName);
int getMemorySizeKB(String namespace, String localName);
int getMemorySizeMB(String namespace, String localName);
Here are some examples of their intended use:
cacheSize = cfg.getMemorySizeBytes(MY_NAMESPACE, "cache_size");
timeout = cfg.getDurationMilliseconds(MY_NAMESPACE, "cache_timeout");
The getMemorySizeBytes() method should convert string values such as "2048 bytes" or "32MB" into the appropriate number of bytes, and getMemorySizeKB() does something similar but returns the specified size in terms of KB rather than bytes. Likewise, the getDuration<units>() methods should be able to handle string values like "500 milliseconds", "2.5 minutes", "3 hours" and "infinite" (which is converted into, say, -1).
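The parsing behind those methods could look something like the sketch below. It works on the raw string only, so it can be dropped behind any lookup method. The accepted unit spellings, the 1 KB = 1024 bytes convention and the long return type (chosen to avoid overflow for large sizes) are assumptions of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitParsing {
    // A number (optionally with a fraction), optional spaces, then a unit word.
    private static final Pattern VALUE_AND_UNIT =
            Pattern.compile("\\s*([0-9]+(?:\\.[0-9]+)?)\\s*([A-Za-z]+)\\s*");

    static long parseMemorySizeBytes(String text) {
        Matcher m = match(text);
        double amount = Double.parseDouble(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "b": case "byte": case "bytes": return (long) amount;
            case "kb": return (long) (amount * 1024);
            case "mb": return (long) (amount * 1024 * 1024);
            default: throw new IllegalArgumentException("Unknown size unit in '" + text + "'");
        }
    }

    static long parseDurationMilliseconds(String text) {
        if (text.trim().equalsIgnoreCase("infinite")) {
            return -1; // sentinel for "no limit", as suggested above
        }
        Matcher m = match(text);
        double amount = Double.parseDouble(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "ms": case "milliseconds": return (long) amount;
            case "seconds": return (long) (amount * 1_000);
            case "minutes": return (long) (amount * 60_000);
            case "hours": return (long) (amount * 3_600_000);
            default: throw new IllegalArgumentException("Unknown time unit in '" + text + "'");
        }
    }

    private static Matcher match(String text) {
        Matcher m = VALUE_AND_UNIT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse '" + text + "'");
        }
        return m;
    }
}
```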
Some people may think that the above suggestions have nothing to do with the question that was asked. Actually, they do, but in a sneaky sort of way. The above suggestions will result in a configuration API that developers will find to be much easier to use than the "raw" Java Properties API. They will use it to obtain that ease-of-use benefit. But using the API will have the side effect of forcing the developers to adopt a namespace convention, which will help to solve the problem that you are interested in addressing.
Or to look at it another way, the main con of the approach described in the question is that it offers a win-lose situation: you win (by imposing a property-naming convention on developers), but developers lose because they swap the familiar Java Properties API for another API that doesn't offer them any benefits. In contrast, the improvements I have suggested are intended to provide a win-win situation.

Merits/Reasons for using "get" as a prefix in the name of an accessor method

I know that in Java, it is common practice to use "get" as a prefix to an accessor method. I was wondering what the reason for this is. Is it purely to be able to predict what it is returning?
To clarify: In some java classes (eg String) a variable like length can be accessed by calling "length()" rather than "size()". Why are these methods written like this, but others like "getSomeVariable()"?
Thank you for your time.
Edit: Good to see I'm not alone about the confusion & such about the size and length variables
The 'get' prefix (or 'is' for methods returning booleans) is part of the JavaBeans specification, which is used throughout Java but mostly in the view layer of web UIs.
length() and size() are historical artefacts from pre-javabean times; many a UI developer had lamented the fact that Collection has a size() method instead of getSize()
Because properties are nouns and methods are verbs. It is part of the bean pattern that is well-established and therefore expected by anyone using your class.
It might make sense to say:
String txt="I have " + car.GetFuelLevel() + " liters of petrol.";
or ...
String txt="I have " + car.FuelLevel + " liters of petrol.";
but not ...
String txt="I have " + car.FuelLevel() + " liters of petrol.";
I mean, it doesn't make sense to say "Hey, car. Go FuelLevel for me." But to say "Hey, car. Go GetFuelLevel for me." That's more natural.
Now, why did they break rank with String.length() and others? That's always bothered me, too.
The get prefix is particularly useful if you also have set, add, remove, etc., methods. Of course, it's generally better to have an interface full of gets or full of sets. If almost every method has get then it just becomes noise. So, I'd drop the get for immutables and the set for builders. For "fundamental" types, such as collections and strings, these little words are also noisy, IMO.
The get/set conventions stem from the JavaBeans specification, so people strongly tend to use them.
And the .size(), .length(), and even .length attribute of arrays are all examples of Java's failures to follow its own conventions. There are many more, it's "fun" to discover them!
They may be failures to follow the specification, but they improve readability. size and length allow you to read the following line of code:
for (int i=0; i<thing.size(); ++i){
As...
While i is less than the thing's size...
There's no real convention behind this, but it does make it easier to translate into a sentence directly.
The historical reason was that the JavaBean specification stated that accessors to class properties should be done with getPropertyName/setPropertyName. The benefit was that you could then use Introspection APIs to dynamically list the properties of an object, even one that you hadn't previously compiled into your program. An example of where this would be useful is in building a plug-in architecture that needs to load objects and provide the user access to the properties of the object.
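As a rough illustration of that introspection benefit, the sketch below lists the properties of a class purely from its get/is/set method names, using java.beans.Introspector from the JDK. The Car bean is a made-up example.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

public class BeanProperties {
    // Hypothetical bean that follows the get/is/set naming convention.
    public static class Car {
        private int fuelLevel;
        private boolean running;

        public int getFuelLevel() { return fuelLevel; }
        public void setFuelLevel(int fuelLevel) { this.fuelLevel = fuelLevel; }
        public boolean isRunning() { return running; }
        public void setRunning(boolean running) { this.running = running; }
    }

    // Returns the property names discovered from the accessor methods.
    // Object.class is the stop class, so the inherited "class" property is skipped.
    static Set<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(Car.class)); // prints [fuelLevel, running]
    }
}
```

A method named fuelLevel() instead of getFuelLevel() would simply not show up as a property here.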
You have different names to retrieve size in different classes simply because they were written by different people and there probably wasn't at the time a design guideline for naming class methods in a consistent manner. Once millions of lines of code had been written using these inconsistent names, it was too late to change.

Can I add and remove elements of enumeration at runtime in Java

Is it possible to add and remove elements from an enum in Java at runtime?
For example, could I read in the labels and constructor arguments of an enum from a file?
@saua, it's just a question of whether it can be done out of interest really. I was hoping there'd be some neat way of altering the running bytecode, maybe using BCEL or something. I've also followed up with this question because I realised I wasn't totally sure when an enum should be used.
I'm pretty convinced that the right answer would be to use a collection that ensured uniqueness instead of an enum if I want to be able to alter the contents safely at runtime.
No, enums are supposed to be a complete static enumeration.
At compile time, you might want to generate your enum .java file from another source file of some sort. You could even create a .class file like this.
In some cases you might want a set of standard values but allow extension. The usual way to do this is to have an interface for the type and an enum that implements that interface for the standard values. Of course, you lose the ability to switch when you only have a reference to the interface.
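A minimal sketch of that interface-plus-enum arrangement. All names here (Color, StandardColor, CustomColor) are invented for the example.

```java
import java.util.Locale;

public class ExtensibleEnumDemo {
    // The common type that client code depends on.
    interface Color {
        String label();
    }

    // The standard values, with all the usual enum guarantees.
    enum StandardColor implements Color {
        RED, GREEN, BLUE;

        @Override
        public String label() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    // An extension value that can be created at runtime, for example from a file.
    static final class CustomColor implements Color {
        private final String label;

        CustomColor(String label) {
            this.label = label;
        }

        @Override
        public String label() {
            return label;
        }
    }

    // Client code accepts the interface, so both kinds of value work.
    static String describe(Color color) {
        return "color=" + color.label();
    }

    public static void main(String[] args) {
        System.out.println(describe(StandardColor.RED));       // prints color=red
        System.out.println(describe(new CustomColor("teal"))); // prints color=teal
    }
}
```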
Behind the curtain, enums are POJOs with a private constructor and a bunch of public static final values of the enum's type. In fact, up until Java 5 it was considered best practice to build your own enumeration this way; Java 5 introduced the enum keyword as a shorthand. See the source for Enum<E> to learn more.
So it should be no problem to write your own 'TypeSafeEnum' with a public static final array of constants, that are read by the constructor or passed to it.
Also, do yourself a favor and override equals, hashCode and toString, and if possible create a values method.
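One possible shape for such a hand-rolled type-safe enumeration is sketched below. It deviates slightly from the description above: instead of a fixed static final array it keeps a static registry, so that constants can be registered at runtime (for instance while reading a file). The class name DynamicConstant is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class DynamicConstant {
    // Registry of all constants, in registration order.
    private static final Map<String, DynamicConstant> BY_NAME = new LinkedHashMap<>();

    private final String name;
    private final int ordinal;

    // Private constructor: instances only come from register().
    private DynamicConstant(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    // Registers a constant once; later calls with the same name return the same instance.
    public static synchronized DynamicConstant register(String name) {
        DynamicConstant existing = BY_NAME.get(name);
        if (existing != null) {
            return existing;
        }
        DynamicConstant created = new DynamicConstant(name, BY_NAME.size());
        BY_NAME.put(name, created);
        return created;
    }

    public static synchronized DynamicConstant valueOf(String name) {
        DynamicConstant found = BY_NAME.get(name);
        if (found == null) {
            throw new IllegalArgumentException("No constant named " + name);
        }
        return found;
    }

    // Snapshot of the constants registered so far.
    public static synchronized List<DynamicConstant> values() {
        return Collections.unmodifiableList(new ArrayList<>(BY_NAME.values()));
    }

    public int ordinal() {
        return ordinal;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof DynamicConstant && ((DynamicConstant) other).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Unlike a real enum, such a class cannot be used in a switch statement, and code cannot refer to a constant by a compile-time name; it has to go through valueOf.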
The question is how to use such a dynamic enumeration... you can't read the value "PI=3.14" from a file to create enum MathConstants and then go ahead and use MathConstants.PI wherever you want...
I needed to do something like this (for unit testing purposes), and I came across this - the EnumBuster:
http://www.javaspecialists.eu/archive/Issue161.html
It allows enum values to be added, removed and restored.
Edit: I've only just started using this, and found that some slight changes are needed for Java 1.5, which I'm currently stuck with:
Add array copyOf static helper methods (e.g. take these 1.6 versions: http://www.docjar.com/html/api/java/util/Arrays.java.html)
Change EnumBuster.undoStack to a Stack<Memento>
In undo(), change undoStack.poll() to undoStack.isEmpty() ? null : undoStack.pop();
The string VALUES_FIELD needs to be "ENUM$VALUES" for the java 1.5 enums I've tried so far
I faced this problem on the formative project of my young career.
The approach I took was to save the values and the names of the enumeration externally, and the end goal was to be able to write code that looked as close to a language enum as possible.
I wanted my solution to look like this:
enum HatType
{
BASEBALL,
BRIMLESS,
INDIANA_JONES
}
HatType mine = HatType.BASEBALL;
// prints "BASEBALL"
System.out.println(mine.toString());
// prints true
System.out.println(mine.equals(HatType.BASEBALL));
And I ended up with something like this:
// in a file somewhere:
// 1 --> BASEBALL
// 2 --> BRIMLESS
// 3 --> INDIANA_JONES
HatDynamicEnum hats = HatEnumRepository.retrieve();
HatEnumValue mine = hats.valueOf("BASEBALL");
// prints "BASEBALL"
System.out.println(mine.toString());
// prints true
System.out.println(mine.equals(hats.valueOf("BASEBALL")));
Since my requirements were that it had to be possible to add members to the enum at run-time, I also implemented that functionality:
hats.addEnum("BATTING_PRACTICE");
HatEnumRepository.storeEnum(hats);
hats = HatEnumRepository.retrieve();
HatEnumValue justArrived = hats.valueOf("BATTING_PRACTICE");
// file now reads:
// 1 --> BASEBALL
// 2 --> BRIMLESS
// 3 --> INDIANA_JONES
// 4 --> BATTING_PRACTICE
I dubbed it the Dynamic Enumeration "pattern", and you can read about the original design and its revised edition.
The difference between the two is that the revised edition was designed after I really started to grok OO and DDD. The first one I designed when I was still writing nominally procedural DDD, under time pressure no less.
You can load a Java class from source at runtime. (Using JCI, BeanShell or JavaCompiler)
This would allow you to change the Enum values as you wish.
Note: this wouldn't change any classes which referred to these enums so this might not be very useful in reality.
A working example in widespread use is in modded Minecraft. See the EnumHelper.addEnum() methods on GitHub.
However, note that in rare situations practical experience has shown that adding Enum members can lead to some issues with the JVM optimiser. The exact issues may vary with different JVMs. But broadly it seems the optimiser may assume that some internal fields of an Enum, specifically the size of the Enum's .values() array, will not change. See issue discussion. The recommended solution there is not to make .values() a hotspot for the optimiser. So if adding to an Enum's members at runtime, it should be done once and once only when the application is initialised, and then the result of .values() should be cached to avoid making it a hotspot.
The way the optimiser works and the way it detects hotspots is obscure and may vary between different JVMs and different builds of the JVM. If you don't want to take the risk of this type of issue in production code, then don't change Enums at runtime.
You could try to assign properties to the enum you're trying to create and statically construct it by using a loaded properties file. Big hack, but it works :)
