I have been working alone on a project for a company for more than two years. It is a large project that uses RXTX to communicate with a hardware device, with Java 8 and JavaFX for the UI. It is now almost finished, and I am starting to look into how to deliver the end-user application that the company will distribute to its clients.
The problem is that the company wants the code to be inaccessible once the software is in the end clients' hands, because the Java code contains extremely sensitive information that could have very bad consequences for the company if clients learned it. Clients could literally perform actions they don't have the right to perform.
After a lot of searching and thinking about my case, I understood that shipping an obfuscated JAR isn't the solution. I then tried to generate a JAR and convert it to an EXE, but all I managed was to wrap the JAR in an EXE, which does not prevent someone from extracting the JAR and easily seeing all the code. Finally, I found that I should use AOT compilation, with a compiler like GCJ, to produce a native binary from my Java code. Here I am stuck: after watching videos and reading articles, I didn't manage to find a clear way to produce the native executable.
I am now confused, since I don't know whether I am on the right path or totally wrong and there is another way of protecting the code (at least from non-professional hackers; I understand that it is not possible to make it 100% safe, but I am looking for a reasonable approach). How should I manage this final step of my work?
I currently work for a company that has code that we don't want anyone to have access to for the security of our clients and-- less important-- for legal reasons. ;-)
One possible solution you could look into would be to rewrite the code you deem most sensitive into a C/C++ library. It would be possible to compile this into a .so/.dll/.dylib file for the respective OSs and it would make it difficult, not entirely impossible, but difficult to decompile.
The trouble would come from learning how to access native code from Java, as much of the documentation is unhelpful or simply nonexistent. This would use the Java Native Interface (JNI), which allows Java to, well, interface with the native (compiled C/C++) code. This would make it possible to create a JAR file that effectively becomes a Java library for you to access throughout the rest of your project. The native code, however, will still need to be loaded at runtime, but that's a part of learning how JNI works. A helpful link I found for JNI is http://jnicookbook.owsiak.org/ (for as long as it's still a functional link).
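To make the JNI idea a bit more concrete, here is a minimal sketch of the Java side only. The class name `SensitiveOps`, the library name `sensitiveops`, and the method `unlockCode` are all made up for illustration; the matching C/C++ function would have to be written and compiled separately.

```java
// Sketch only: SensitiveOps, "sensitiveops" and unlockCode are hypothetical names.
final class SensitiveOps {
    // Try to load the native library once, when the class is initialized.
    private static final boolean LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            // Resolves to sensitiveops.dll, libsensitiveops.so or libsensitiveops.dylib
            // somewhere on java.library.path, depending on the OS.
            System.loadLibrary("sensitiveops");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    static boolean isAvailable() {
        return LOADED;
    }

    // Implemented in C/C++. For a class in the default package, JNI expects
    // the exported function to be named Java_SensitiveOps_unlockCode.
    private static native int unlockCode(int deviceSerial);

    // Plain Java entry point that the rest of the application calls.
    static int unlock(int deviceSerial) {
        if (!LOADED) {
            throw new IllegalStateException(
                "native library 'sensitiveops' not found on java.library.path");
        }
        return unlockCode(deviceSerial);
    }
}
```

The rest of the project only ever sees `SensitiveOps.unlock(...)`; the sensitive logic itself lives in the compiled library.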
One of our clients here where I work has a project written in Java and needed to implement our code that is unfortunately all written in C. So we needed a way to access this C/C++ code from Java. This is the way we went about solving this issue without rewriting our code in Java. But we had the benefit (?) of having already written our code in C.
This solution, writing a bunch of extra code at the last minute in another language that I may or may not be familiar with, doesn't sound like a particularly fun time.
I would be curious to learn what possible problems others might see with this solution.
A bit of a noob-who-tries-to-get-a-glimpse-of-something-without-making-homeworks-first question...
Suppose I'd like to include a JVM on a closed-source OS/hardware platform so I can provide extended functionality to customers through add-on Java applets, and I want to be the only available source for developing and selling add-on apps. Is it feasible to implement such a mechanism simply by forcing the embedded JVM to only execute apps signed with my digital signature?
In other words I'd just like to know if this is an easy to implement, already proven to work, widely accepted path or just plain BS (for reasons you are free to not tell!) :)
It sounds like what you're wanting is class signing. The startup code for your application can install a SecurityManager to ensure that only classes signed by keys matching some particular criteria can be loaded.
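As a rough illustration of the signature check itself (not of a full SecurityManager setup, which has been deprecated for removal since Java 17), the sketch below verifies that every payload entry of a JAR is signed with at least one certificate from a trusted set. The class name `SignedJarCheck` and the idea of passing in a `Set<Certificate>` are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.security.cert.Certificate;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Sketch only: SignedJarCheck is a hypothetical helper, not a JDK class.
final class SignedJarCheck {
    static boolean isSignedBy(Path jarPath, Set<Certificate> trusted) throws IOException {
        // verify=true makes JarFile check signatures while entries are read;
        // a tampered entry causes a SecurityException during reading.
        try (JarFile jar = new JarFile(jarPath.toFile(), true)) {
            boolean sawPayload = false;
            byte[] buffer = new byte[8192];
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Directories and signature metadata are not themselves signed.
                if (entry.isDirectory() || entry.getName().startsWith("META-INF/")) {
                    continue;
                }
                // Certificates are only available after the entry was read fully.
                try (InputStream in = jar.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // discard the bytes; reading is what triggers verification
                    }
                }
                Certificate[] certs = entry.getCertificates();
                if (certs == null || Collections.disjoint(Arrays.asList(certs), trusted)) {
                    return false; // unsigned, or signed by someone we do not trust
                }
                sawPayload = true;
            }
            return sawPayload; // an empty JAR is not treated as trusted
        }
    }
}
```

A launcher could call this before handing the JAR to a class loader and refuse to start anything that fails the check.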
Adding my own answer to get feedback on the following solution, which seems to be the most fitting with my question:
Could Java system policy file be the answer?
As far as I can understand from reading the documentation at http://docs.oracle.com/javase/6/docs/technotes/guides/security/PolicyFiles.html you can basically implement code execution permission policy in 2 ways:
1) implementing and extending permission policy at runtime (what #chrylis refers to).
2) using a default system policy file (java.home\lib\security\java.policy)
The second approach seems easier to implement and more "static", which is a good thing given my use case: I only need the JVM to check that the app's digital signature is mine before allowing it to run, and I will never need to extend this policy in any way.
So I am not sure yet but given my prerequisites this approach might be what I was looking for in my question... If you have any thoughts just add them, thanks.
Is there any way other than obfuscation to protect JARs from being opened by someone else? The thing is that I don't want anyone to access the code, which is why I don't prefer Java. From the decompilers I used, programs made in C# and Java have EVERYTHING intact, like the names of the variables, which would make it easy to get access to programs that are not free. Worse, someone could give out the source code.
Most of these points are covered by comments above, but I'll expand on them a bit here:
If your code is running on the user's machine, the user can decompile your code. It doesn't matter what language it is. Java, C, whitespace, brainfuck, it doesn't matter. If the code runs on a computer, a human can read it. Even if you make your own homebrew language and compiler, the compiled code is still going to be a sequence of standard machine instructions, which decompilers will handily turn into readable code in C or whatever language you like.
No exceptions. Forget about it.
But there are ways to get what you want: protecting some secret business logic. An easy way to do this would be to place the business logic on your own machine and expose it with a web service. The user can still see the client requests and service responses but otherwise your logic is a black box.
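A minimal sketch of that split, using only classes that ship with the JDK. Everything here is illustrative: the `/price` endpoint, the `secretPrice` formula, and running both sides on localhost are assumptions for the demo, and a real deployment would add HTTPS and authentication.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: all names and the pricing rule are hypothetical.
final class RemoteLogicDemo {
    // Server side: this method stays on your machine and is never shipped.
    static int secretPrice(int quantity) {
        return quantity * 7 + 3;
    }

    // Starts a tiny HTTP service on a free local port.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/price", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "qty=5"
            int quantity = Integer.parseInt(query.substring("qty=".length()));
            byte[] body = Integer.toString(secretPrice(quantity))
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Client side: the only code the user gets. It sees requests and
    // responses, but not how the answer was computed.
    static int askPrice(int port, int quantity) throws IOException {
        URL url = URI.create("http://127.0.0.1:" + port + "/price?qty=" + quantity).toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return Integer.parseInt(reader.readLine().trim());
        } finally {
            conn.disconnect();
        }
    }
}
```

The shipped client would contain only something like `askPrice`; decompiling it reveals the URL and the request format, nothing more.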
You could also make your own machines, lock them down, and distribute them to users. Be aware that although this is possible, it's technically quite difficult to do correctly (think of all the hacked gaming consoles and smartphones), and will significantly increase the cost of your service.
As far as I know, JAR files (generated with NetBeans) may contain only .class files, which are Java bytecode, not source code. I don't know if there's a way to reverse-engineer a .class file, but it has very little usable ASCII text.
At a recent interview, I was asked:
Open source web app (say built on Struts/Spring) is more prone to hacking since anyone can access the source code and change it. How do you prevent it?
My response was:
The java source code is not directly accessible. It is compiled into class files, which are then bundled in a war file and deployed within a secure container like Weblogic app server.
The app server sits behind a corporate firewall and is not directly accessible.
At that time - I did not mention anything about XSS and SQL injection which can affect a COTS-based web app similar to an open source one.
My questions:
a) Is my response to the question correct?
b) What additional points can I add to the answer?
thanks in advance.
EDIT:
While I digest your replies - let me also point out the question was also meant towards frameworks such as Liferay and Apache OFBiz.
The question is a veiled argument towards Security through obscurity. I suggest you read up the usual arguments for and against and see how that fits:
Security through obscurity ( Wikipedia )
Hardening Wordpress
SSH server security (Putty)
My personal opinion is that obscurity is at best the weakest layer of defence against attack. It might help filter out automated attacks by uninformed attackers, but it does not help much against a determined assault.
a) Is my response to the question correct?
The part about the source not being accessible (to change it) because it is compiled and deployed where it cannot be touched is not a good answer. The same applies to non-open-source software. The point that was being made against an open source stack is that the source is accessible to read, which would make it easier to find vulnerabilities that can be exploited against the installed app (compiled or not).
The point about the firewall is good (even though it does not concern the open- or closedness of the software, either).
b) What additional points can I add to the answer?
The main counterargument against security through obscurity (which was the argument being made here) is that with open source software, many more people will be looking at the source in order to find and fix these problems.
since anyone can access the source code and change it.
Are you sure that is what they said? Change it? Not "study it"?
I don't see how anyone can just change the source code for Struts...
A popular open-source web framework/CMS/library is less likely to have horrible bugs in it for long, since there are lots of people looking at the code, finding the bugs, and fixing them. (Note, in order for this to matter, you'll need to keep your stuff up to date.)
Now, your friend does have a tiny point -- anyone who can fix the bugs could also introduce them, if the project is run by a bunch of idiots. If they take patches from any random schmuck without looking the patches over, or don't know what they're doing in the first place, it's possible to introduce bugs into the framework. (This doesn't matter unless you update regularly.) So it's important to use one that's decently maintained by people who have a clue.
Note, all of the problems with open-source frameworks/apps apply to COTS ones as well. You just won't know about bugs in the latter til after bugtraq and other such lists publish them, as big companies like to pretend there aren't any bugs in their software til forced to react.
a) Yes. Open source doesn't mean open binaries :) The sentence "anyone can change the source code" is simply incorrect (you can change your copy of the code, but can't edit Apache Struts code)
b) Maybe the fact that the source code is visible makes it easier for somebody to see its possible flaws and exploit them. But the same argument works the other way: because a lot of people review the code, flaws are found faster, so the code is more robust in the end.
I've used MyGeneration, and I love it for generating code that uses the Data Access Application Blocks from Microsoft for my Data Access Layer, and for keeping my database concepts in sync with the domain I am modeling. That said, the learning curve was steeper than expected; it took me a weekend to become productive with it.
I'm wondering what others are doing related to code generation.
http://www.mygenerationsoftware.com
http://www.codesmithtools.com/
Others?
Back in 2000, or so, the company I worked for used a product from Veritas Software (I believe it was) to model components and generate code that integrated components (dlls). I didn't get a lot of experience with it, but it seems that code generation has been the "holy grail" for a long time. Is it practical? How are others using it?
Thanks!
T4 is the CodeSmith killer for Microsoft!!!!
Go check it out. Microsoft doesn't want to destroy their partners so they don't advertise it, but it is a thing to be reckoned with and IT'S FREE and comes installed in Visual Studio 2008.
www.olegsych.com
codeplex.com/t4toolbox
www.t4editor.net
I have used LLBLGen and nHibernate successfully to generate Entity and DAL layers.
We use Codesmith and have had great success with it. I am now constantly trying to find where we can implement templates to speed up mundane processes.
I've done work with CSLA and used codesmith to generate my code using the CSLA templates.
codesmithtools.com
If your database is your model, SubSonic has an excellent code generator that as of v2.1, no longer requires ActiveRecord (you can use the Repository Pattern instead). It's less flexible than others, but there are customizations that can be made in the stock templates.
I have used CodeSmith and MyGeneration, wasn't overly keen on either, felt somewhat terse to use, learning template languages etc.
SubSonic is what we sometimes use here to generate a Data Access Layer. Used in the right size projects, it is a fantastic time-saving tool.
I see code generation as harmful as well, but only if you use third-party tools like CodeSmith and MyGeneration. I have two stored procedures that generate my domain objects and domain interfaces.
Example
GenerateDomainInterface 'TableName'
Then I just copy and paste it into visual studio. Works pretty awesome for those tasks I hate to do.
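The stored procedures themselves aren't shown, so the following is only a guess at the general idea, written in Java instead of SQL: given a table name and its columns, emit the text of a C# interface. The class name `DomainInterfaceGenerator`, the `I<TableName>` naming rule, and the column-to-type map are all assumptions for illustration.

```java
import java.util.Map;

// Sketch only: a hypothetical stand-in for a GenerateDomainInterface-style generator.
final class DomainInterfaceGenerator {
    // columns maps a column name to the C# type it should be exposed as;
    // pass a LinkedHashMap to keep the properties in column order.
    static String generate(String tableName, Map<String, String> columns) {
        StringBuilder source = new StringBuilder();
        source.append("public interface I").append(tableName).append("\n{\n");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            source.append("    ")
                  .append(column.getValue())   // property type
                  .append(' ')
                  .append(column.getKey())     // property name
                  .append(" { get; set; }\n");
        }
        source.append("}\n");
        return source.toString();
    }
}
```

Called with a table `Customer` and columns `Id` (int) and `Name` (string), it returns an `ICustomer` interface with two properties, ready to paste into the project.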
Two frameworks I use often.
Ragel
Something worth checking out is Ragel. It's used to generate code for state machines.
You just add some simple markup to your source code, then run a generator on
Ragel generates code for C, C++, Objective-C, D, Java and Ruby, and it's easy to mix it with your regular source.
Ragel even allows you to execute code on state transitions and such. It makes it easy to create file format and protocol parsers.
Some notable projects that use Ragel are Mongrel, a great Ruby web server, and Hpricot, a Ruby-based HTML parser, sort of inspired by jQuery.
Another great feature of Ragel is how it can generate graphviz-based charts that visualize your state machines. An example can be found in Zed Shaw's article on Ragel state charts (source: zedshaw.com).
XMLBeans
XMLBeans is a java-based xml-binding. It's got a great workflow and I use it often.
XMLBeans processes an XML schema that describes your model into a set of Java classes that represent that model. You can programmatically create models, then serialise them to and from XML.
I have used CodeSmith. Was pretty helpful.
I love to use SubSonic. Open source is the way to go with code generation I think because it is very easy to modify the templates and the core as they always tend to have bugs or one or two things you want to do that is not built in.
I've used code generation for swizzle functions in a vector math library. I used a custom Perl script for it. None of the FLOSS generators I looked at seemed well-suited to creating swizzle functions.
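For readers who haven't met swizzles: they are accessors such as `xy()` or `zx()` that return a vector's components in an arbitrary order, and there are many of them, which is why generating them pays off. The original script was Perl and isn't shown; the Java sketch below only illustrates the idea, and the emitted C++-style signature with a `Vec2` type is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: emits two-component swizzle accessors as source text.
final class SwizzleGenerator {
    // components is the list of field names, e.g. "xyz" for a 3D vector.
    static List<String> generate(String components) {
        List<String> functions = new ArrayList<>();
        for (char first : components.toCharArray()) {
            for (char second : components.toCharArray()) {
                // One accessor per ordered pair, repeats such as xx included.
                functions.add("Vec2 " + first + second + "() const { return Vec2("
                        + first + ", " + second + "); }");
            }
        }
        return functions;
    }
}
```

For `"xyz"` this yields nine accessors; the same nested-loop pattern extends to three- and four-component swizzles, where writing them by hand becomes tedious.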
I generally use C++ templates, rather than code generation.
I've primarily used LLBLGen Pro to generate code. It offers a variety of patterns to use for generation and you can supply your own patterns, just like CodeSmith. The customer support has been excellent.
Essentially, I generate my business objects and DAL using LLBLGen and keep them up to date. The code templates have sections where you can add your own logic that won't be wiped out during regeneration. It's definitely worth taking a look.
We custom build our code generation using linq and XML literals (VB).
We haven't found a way to break the solutions into templates yet; however, those two technologies make this task so trivial, I don't think we will.
I'd consider code generation harmful as it bloats the codebase without adding new logic or insight. Ideally one should raise the level of abstraction, use data files, templates or macros etc. to avoid generating large amounts of boiler plate code. It helps you get things done quickly but can hurt maintainability in the long run.
If your chosen programming language becomes much less painful by generating it from some template language, that seems to indicate you'd save even more time by doing the higher-level work in another, perhaps more dynamic language. YMMV.
LLBLGen Pro is an excellent tool which allows you to write a database agnostic solution. It's really quick to pick up the basic features. Advanced features aren't much more challenging. I highly recommend you check it out.
I worked for four years as the main developer in a web agency. After writing my first two or three websites from the ground up, I soon realized that it was going to be very boring to do that every time. So I started writing my own website generator engine.
My starting point was this site http://www.codegeneration.net/. I took one of their examples for simple CRUD generation and extended it to the point that I was generating entire sites with it.
I used xml for the definition of various parts of the website (pages, datalists, joins, tables, form management). The generated web sites were completely detached from the generator, so the generated website could also be modified by hand.
Here is their article http://www.codegeneration.net/tiki-read_article.php?articleId=19.
I've done several one-offs of code generation using Castor to create Java source code based on XSDs. The latest use was to create Java classes for an OpenTravel Alliance (OTA) implementation. The OTA schema is pretty hairy and would have been a bear to do by hand. Castor did a pretty good job given the complexity of the schema.
Python.
I have used MyGeneration which uses C# to write your code templates. However, I started using Python and I found that I can write code that generates other code faster in that language than I would if written in C#. Subsequently, I have used Python to code gen C#, TSQL, and VB.
Generally, code that generates other code tends to be harder to follow by its very nature. Python's cleaner syntax helps tremendously by making it more readable and more maintainable than the equivalent in C#.
codesmith for .net
I wrote a utility where you specify a table and it generates an Oracle trigger which records all changes to that table. Makes logging really simple.
There's another one I wrote that generates a Delphi class that models any database table you give it, but I consider it a code smell to do that, so I rarely use it.
At the company we've written our own to generate most of our entity/dalc/business classes and the related stored procedures as it took only a little time and we had some special requirements. Although I'm sure we could've achieved the same thing using an existing generator, it was a fun little project to work on.
Codesmith's been recommended by many people and it does seem to be a good one. Personally all I need from a code generator is to make it easy to amend templates.
I use the hibernate tools in myEclipse to generate domain models and DAO code from my data model. It seems to work pretty well (there are some issues if you write custom methods in your DAO's, these seem to get lost on over-writes), but generally it seems to work pretty well- especially in conjunction with Spring.
SubSonic is great!! The query capability is easy to grasp, and the stored procedure implementation is truly awesome. I could go on and on. It makes you productive instantly.
I mainly code in C#, and when I need code generation I do it in XSLT when the source can be simply converted to XML, or in a Ruby script when it's more complex.
If the code generation part needs frequent modifications by more than a few developers, CodeSmith works pretty well (and is easier for new developers to learn than XSLT or Ruby).
Outsystems' Agile Platform can be used to generate open source, well documented C# and Java applications. Because it has also several features related to deploying, managing and changing, most people end up using it not just to generate the code but actually to manage the full life-cycle of web applications.
For some time, I've used a home-grown script/template language for code generation. (I've used that language mostly for no other reason than to find a use for my little pet project.)
Recently, I've created some SQL*PLUS scripts to create database access code (no Hibernate for us...)
MyGeneration all the way!
MyGeneration is an extremely flexible template based code generator written in Microsoft.NET. MyGeneration is great at generating code for ORM architectures. The meta-data from your database is made available to the templates through the MyMeta API.