I have a large data set. I am creating a system which allows users to submit Java source files, which will then be applied to the data set. To be more specific, each submitted Java source file must contain a static method with a specific name, let's say toBeInvoked(). toBeInvoked() will take a row of the data set as an array parameter. I want to call the toBeInvoked() method of each submitted source file on each row in the data set. I also need to implement security measures (so toBeInvoked() can't do I/O, can't call exit, etc.).
Currently, my implementation is this: I have a list of the names of the Java source files. For each file, I create an instance of a custom secure ClassLoader which I coded, which compiles the source file and returns the compiled class. I use reflection to extract the static method toBeInvoked() (e.g. method = c.getMethod("toBeInvoked", double[].class)). Then, I iterate over the rows of the data set and invoke the method on each row.
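For reference, a minimal sketch of that loop (the secure ClassLoader, the submitted class name, and the dataSet variable are placeholders standing in for the description above):

Class<?> c = secureClassLoader.loadClass("SubmittedAnalysis"); // compiles and loads the submitted source
Method method = c.getMethod("toBeInvoked", double[].class);
for (double[] row : dataSet) {
    method.invoke(null, (Object) row); // static method, so the receiver is null
}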
There are at least two problems with my approach:
it appears to be painfully slow (I've heard reflection tends to be slow)
the code is more complicated than I would like
Is there a better way to accomplish what I am trying to do?
There is no significantly better approach given the constraints that you have set yourself.
For what it is worth, what makes this "painfully slow" is compiling the source files to class files and loading them. That is many orders of magnitude slower than the use of reflection to call the methods.
(Use of a common interface rather than static methods is not going to make a measurable difference to speed, and the reduction in complexity is relatively small.)
If you really want to simplify this and speed it up, change your architecture so that the code is provided as a JAR file containing all of the compiled classes.
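For example, if each submission arrives as a JAR of pre-compiled classes, loading it is straightforward with a URLClassLoader (the file and class names below are illustrative):

URLClassLoader jarLoader = new URLClassLoader(
        new URL[] { new File("submission.jar").toURI().toURL() });
Class<?> c = jarLoader.loadClass("com.example.SubmittedAnalysis");

You still need your security checks, but the compilation cost disappears from the hot path.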
Assuming your #toBeInvoked() could be defined in an interface rather than being static (it should be!), you could just load the class and cast it to the interface:
Class<? extends YourInterface> c =
        Class.forName("name", true, classLoader).asSubclass(YourInterface.class);
YourInterface i = c.getDeclaredConstructor().newInstance(); // Class.newInstance() is deprecated since Java 9
Afterwards, invoke #toBeInvoked() directly on the instance.
Also have a look into java.util.ServiceLoader, which could be helpful for finding the right class to load in case you have more than one source file.
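A minimal ServiceLoader sketch (this assumes the method has moved onto YourInterface and that each submission ships a META-INF/services file named after the fully qualified interface, listing its implementation class):

ServiceLoader<YourInterface> loader = ServiceLoader.load(YourInterface.class, classLoader);
for (YourInterface impl : loader) {
    impl.toBeInvoked(row); // each impl is an instance of a class declared in META-INF/services
}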
Personally, I would use an interface. This will allow you to have multiple instances with their own state (useful for multi-threading), and more importantly the interface both defines which methods must be implemented and gives you a direct way to call them.
Reflection is slow, but only relative to other options such as a direct method call. If you are scanning a large data set, the cost of pulling the data from main memory is likely to be much higher than the cost of the reflective call.
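As a sketch (the interface and variable names are made up; they simply mirror the toBeInvoked() contract from the question):

public interface RowAnalyzer {
    void toBeInvoked(double[] row);
}

RowAnalyzer analyzer = loadedClass.asSubclass(RowAnalyzer.class)
                                  .getDeclaredConstructor().newInstance();
for (double[] row : dataSet) {
    analyzer.toBeInvoked(row); // direct call, no per-row reflection
}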
I would suggest the following steps for your problem.
To check whether the submitted method contains any unwanted code, run a validation script that performs these checks at upload time.
Create an interface with a method toBeInvoked() (not a static method).
Every uploaded class must implement this interface and put its logic inside that method.
Your custom class loader can scan a particular folder for newly added classes and load them accordingly.
When a file is uploaded and successfully validated, compile it and copy the class file to the folder that the class loader scans.
Your processor class can look for new files and then call toBeInvoked() on the loaded class when required (a sketch follows below).
Hope this helps. (Note that I have used a similar mechanism to dynamically load workflow step classes in a workflow engine tool I developed.)
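A rough sketch of the folder-based loading step (the folder path and class name are placeholders, and RowAnalyzer stands for whatever interface you define in step 2):

File classDir = new File("/path/to/validated-classes");
URLClassLoader loader = new URLClassLoader(new URL[] { classDir.toURI().toURL() });
Class<? extends RowAnalyzer> c =
        loader.loadClass("UploadedAnalyzer").asSubclass(RowAnalyzer.class);
RowAnalyzer analyzer = c.getDeclaredConstructor().newInstance();
analyzer.toBeInvoked(row);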
I have been using premain() with addTransformer(). Since it gives javassist.ClassNotFound exceptions for certain classes when I run the agent against a server, I thought I would try agentmain() with redefineClasses(). I went through many links, but so far I am unable to find a piece of code that gives me a clear idea of how to set up a simple Java agent using these two methods. Some help would be really appreciated.
Can we use redefineClasses() with premain()? (When we use redefineClasses(), do we still need the transform method?)
I am trying to instrument a set of methods in a set of classes whose fully qualified names I know, such as com.test.Foo. I want to instrument them without going through the entire set of classes loaded into the JVM. I have been reading the documents back and forth, but I am still unable to get a clear idea of how to use the redefineClasses method.
You can call redefineClasses from anywhere, including from a premain method, which is nothing more than an extension to a normal Java program that the same JVM process runs before the main method.
A trivial example for running a redefinition is:
instrumentation.redefineClasses(new ClassDefinition(Foo.class, new byte[] {...}));
This way, Foo is redefined to be represented by the byte array, which must contain a valid class file for Foo in which all field and method signatures are the same as in the currently loaded Foo.class. You can use a tool like ASM for instrumenting the class.
If you really only want to instrument Foo, then this might just be the way to go instead of using a ClassFileTransformer.
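A minimal premain sketch (the manifest entries are required; where the new bytes come from is up to you, e.g. a class file pre-instrumented with ASM, so the file name here is just a placeholder):

import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RedefiningAgent {
    // The agent JAR's MANIFEST.MF must contain:
    //   Premain-Class: RedefiningAgent
    //   Can-Redefine-Classes: true
    public static void premain(String agentArgs, Instrumentation inst) throws Exception {
        byte[] newBytes = Files.readAllBytes(Paths.get("Foo.class")); // instrumented bytes for com.test.Foo
        inst.redefineClasses(new ClassDefinition(com.test.Foo.class, newBytes));
    }
}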
I have a little design issue on which I would like to get some advice:
I have several classes that inherit from the same base class, each one can accept the same data and analyze it in a slightly different way.
Analyzer
|
˪_ AnalyzerA
|
˪_ AnalyzerB
...
I have an input file (I do not have control over the file's format) that defines which analyzers should be invoked and their parameters. It also defines data extractors and other similar things in the same way (by "similar" I mean actions that can have several variations).
I have a module that iterates over the different analyzers in the file and calls a factory that constructs the correct analyzer. I have a factory for each of the archetypes the input file can define, and so far so good.
But what if I want to extend it and to add a new type of analyzer?
The solution I was thinking about is using a property file for each factory, named after that factory, that holds a mapping between the input file's definition of whatever it wants me to execute and the actual class that I use to execute the action.
This way I could load that class at run time, verify that it implements the right interface, and then execute it.
If some John Doe would like to create his own analyzer, he'd just need to add a new property to the correct file (I'm not quite sure what the best strategy would be to allow this kind of property customization).
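A rough sketch of the lookup I have in mind (the property file name, the key, and the variable holding the name from the input file are placeholders; Analyzer is the base class from the hierarchy above):

Properties mapping = new Properties();
mapping.load(new FileInputStream("AnalyzerFactory.properties")); // e.g. typeA=com.example.AnalyzerA
String className = mapping.getProperty(nameFromInputFile);
Class<?> c = Class.forName(className);
if (!Analyzer.class.isAssignableFrom(c)) {
    throw new IllegalArgumentException(className + " does not extend Analyzer");
}
Analyzer analyzer = (Analyzer) c.getDeclaredConstructor().newInstance();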
So in short:
Is my solution too flawed?
If not, what would be the most user-friendly/convenient way to allow customization of properties?
P.S.
Unfortunately I'm confined to using only built-in JDK classes in the existing solution, so I can't just drop SF in.
I hope this question is not out of line; I'm just not used to having my wings clipped this way, without SF or something similar to help me implement an elegant solution.
Have a look at the way the java.sql.DriverManager.getConnection(connectionString) method is implemented; the best way is to read the source code.
A very rough summary of the idea (it is hidden inside a lot of private methods): it is more or less an implementation of chain of responsibility, although there is no linked list of drivers.
DriverManager manages a list of drivers.
Each driver must register itself to the DriverManager by calling its method registerDriver().
Upon request for a connection, the getConnection(connectionString) method sequentially calls the drivers passing them the connectionString.
Each driver KNOWS if the given connection string is within its competence. If yes, it creates the connection and returns it. Otherwise the control is passed to the next driver.
Analogy:
drivers = your concrete Analyzers
connection strings = types of your files to be analyzed
Advantages:
There is no need to explicitly bind the analyzers to the type of file they are meant for. Let each analyzer decide itself whether it is able to analyze the file. If not, null is returned (or an exception, or whatever) to tell the AnalyzerManager that the next analyzer in the row should be asked.
Adding a new analyzer just means adding a new call to the register() method. Complete decoupling.
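A sketch of that manager (AnalyzerManager, Analyzer, Result, and InputFile are invented names; returning null means "not my file type"):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class AnalyzerManager {
    private static final List<Analyzer> ANALYZERS = new CopyOnWriteArrayList<>();

    public static void register(Analyzer analyzer) {
        ANALYZERS.add(analyzer);
    }

    public static Result analyze(InputFile file) {
        for (Analyzer analyzer : ANALYZERS) {
            Result result = analyzer.analyze(file); // the analyzer decides whether the file is "its" type
            if (result != null) {
                return result;
            }
        }
        throw new IllegalArgumentException("No registered analyzer accepts this file");
    }
}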
I am building a web crawler which uses two classes: a downloader class and an analyzer class. Due to my design of the program, I had some methods which I outsourced to a class of static methods named utils (finding the link suffix, determining whether I should download it given some variables, etc.). Since at any given time there is more than one downloader and more than one analyzer, I'm wondering whether they can get a wrong answer from some static method in the utils class.
For example, say the analyzer needs to know the link suffix, so it calls the utils.getSuffix(link) method. At that same time the OS switches to some downloader thread which also needs a link suffix and again calls utils.getSuffix(link). Now the OS switches back to the analyzer thread, which no longer gets the correct response.
Am I right?
In case I'm right, should I add synchronized to every method of the utils class? Or should I just duplicate the relevant methods in every thread to prevent that kind of scenario, even though I'm duplicating code?
This entirely depends on the implementation of the method. If the method uses only local variables and determines the suffix based on the parameter you give it, all is well. As soon as it needs any resource that is accessible from another thread (local variables and parameters are not) you'll need to worry about synchronization.
It seems to me you're using statics as utilities that don't need anything outside their own parameters; so you should be safe :)
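To make the distinction concrete, here is a hypothetical getSuffix in both flavors (the unsafe variant is only there to show what to avoid):

public final class Utils {

    // Thread-safe: uses only its parameter and local variables.
    public static String getSuffix(String link) {
        int dot = link.lastIndexOf('.');
        return dot >= 0 ? link.substring(dot + 1) : "";
    }

    // NOT thread-safe: shares mutable state between callers.
    private static String lastSuffix;

    public static String getSuffixUnsafe(String link) {
        lastSuffix = getSuffix(link); // another thread may overwrite this before it is read back
        return lastSuffix;
    }
}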
For the sake of an example, I have a class called FileIdentifier. This class:
Has the method identify which accepts a File and returns a String representing the type.
Requires external data since new file formats are a possibility.
How could this class be written so it could be used in any project while remaining unobtrusive? Overall, how is this aspect usually handled in standalone frameworks that require configuration?
That all depends on how you identify the file type. From your question I would assume that it's not a process as trivial as parsing the file extension...
That said, maybe you could just use an external XML file, INI file, DB table, etc. that maps file types, and have the class read that data and return the result. (You would actually want to use a few classes to keep things clean.) That way only the external data would need to be updated and the class would remain unchanged.
Try with a chain of responsibility.
Each instance in the chain is from a different class that manages a single file type. The file is passed down in the chain, and as soon as an instance decides to manage it, the chain stops and the results are returned back.
Then you would just have to build the chain in the desired order (maybe with the more common file types at the top) and provide default classes that manage some file types in your framework. This should also be easy to extend in your applications; it's just a matter of writing another subclass of the chain that manages your new user-defined file types.
Of course your base class for the chain (the Handler, as called by dofactory.com) could provide useful protected methods to its subclasses in order to make their work easier.
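A sketch of one chain link (the Handler base class and its method names are invented for illustration):

import java.io.File;

public abstract class FileTypeHandler {
    private FileTypeHandler next;

    public FileTypeHandler setNext(FileTypeHandler next) {
        this.next = next;
        return next;
    }

    public String identify(File file) {
        String type = tryIdentify(file);
        if (type != null) {
            return type; // this handler claimed the file
        }
        return next != null ? next.identify(file) : "unknown";
    }

    // Returns the type name, or null if this handler does not manage the file.
    protected abstract String tryIdentify(File file);
}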
Is there a feasible way to get my own code run whenever any class is loaded in Java, without forcing the user explicitly and manually loading all classes with a custom classloader?
Without going too much into the details: whenever a class implementing a certain interface is loaded, I want to read the annotation that links it with another class and hand the pair to a third class.
Edit: Heck, I'll go into details: I'm writing an event handling library. I have the client code define their own Listener/Event pairs, which need to be registered with my library as a pair. (Hm, that wasn't that long after all.)
Further Edit: Currently the client code needs to register the pair of classes/interfaces manually, which works pretty well. My intent is to automate this away, and I thought that linking the two classes with annotations would help. Next, I want to get rid of the client code needing to keep the list of registrations up to date.
PS: The static block won't do, since my interface is bundled into a library, and the client code will create further interfaces. Thus, abstract classes won't do either, since it must be an interface.
If you want to base the behavior on an interface, you could use a static initializer in that interface.
public interface Foo {
    static {
        // do initializing here
    }
}
I'm not saying it's good practice, but it will definitely initialize the first time one of the implementing classes is loaded.
Update: static blocks in interfaces are illegal. Use abstract classes instead!
Reference:
Initializers (Sun Java Tutorial)
But if I understand you correctly, you want the initialization to happen once per implementing class. That will be tricky. You definitely can't do that with an interface-based solution. You could do it with an abstract base class that has an instance initializer (or constructor) that checks whether the requested mapping already exists and adds it if it doesn't, but doing such things in constructors is quite a hack.
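For illustration only, the constructor-registration hack could look roughly like this (the annotation and registry are made-up names, and the annotation would need RUNTIME retention):

public abstract class AutoRegisteringListener {

    protected AutoRegisteringListener() {
        Class<?> self = getClass();
        HandlesEvent annotation = self.getAnnotation(HandlesEvent.class); // hypothetical annotation
        if (annotation != null && !ListenerRegistry.isRegistered(self)) {
            ListenerRegistry.register(self, annotation.value()); // pair: listener class -> event class
        }
    }
}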
I'd say your cleanest options are either to generate code at build time (through annotation processing with apt or through bytecode analysis with a tool like ASM) or to use an agent at class load time to dynamically create the mapping.
Ah, more input. Very good. So clients use your library and provide mappings based on annotations. Then I'd say your library should provide an initializer method, where client code can register classes. Something like this:
YourLibrary.getInstance().registerMappedClasses(
    CustomClass1.class,
    CustomClass2.class,
    CustomClass3.class,
    CustomClass4.class
);
Or, even better, a package scanning mechanism (example code to implement this can be found at this question):
YourLibrary.getInstance().registerMappedClassesFromPackages(
    "com.mycompany.myclientcode.abc",
    "com.mycompany.myclientcode.def"
);
Anyway, there is basically no way to avoid having your clients do that kind of work, because you can't control their build process or their classloaders (but you can of course provide guides for classloader or build configuration).
If you want some piece of code to be run on any class loading, you should:
Override ClassLoader, adding your own custom code in the loadClass methods (don't forget to forward to the parent ClassLoader before or after your custom code), as sketched below.
Define this custom ClassLoader as the default for your system (here is how to do it: How to set my custom class loader to be the default?).
Run and check it.
Depending on what kind of environment you are in, chances are that not all classes will be loaded through your custom ClassLoader (some utility packages use their own CL, some Java EE containers handle specific areas with specific class loaders, etc.), but it is an approximation to what you are asking.
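A minimal sketch of such a ClassLoader (the hook here just prints the class name; your custom code goes in its place):

public class NotifyingClassLoader extends ClassLoader {

    public NotifyingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        System.out.println("Loading: " + name); // custom hook, runs for every class requested here
        return super.loadClass(name, resolve);  // then delegate to the parent as usual
    }
}

It can then be installed as the system class loader with -Djava.system.class.loader=NotifyingClassLoader, which is what the linked question covers.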