Migrating C++ project to Java, protecting implementation details

I have a fairly complex project to migrate from C++ (Linux) to Java.
Currently, the C++ version is distributed as a shared library (.so) accompanied by a top-level interface header. The implementation details are fully hidden from the end user.
This question is not about porting the C++ code to Java, but rather about creating similar distribution package.
Let's assume I have a very simple 'public' class in C++, topapi.h:
#include <string>

class TopApi
{
public:
    void doIt( const std::string& v );   // "do" is a reserved word in C++, so the method is named doIt
};
The actual implementation is hidden from the API user. The real project may contain another 100 or so files/classes that doIt() calls.
The distribution will contain two files: topapi.so and topapi.h.
Users will #include "topapi.h" in their code and link their applications against topapi.so.
The questions are:
1. How can I achieve a similar effect in Java (hiding the IP-related code)?
2. How do I show the public methods to the user (not related to code protection, just a Java version of the header file above)?

Check out ProGuard. It will at least obfuscate the jar file, which is otherwise basically human-readable. It's not absolutely safe from reverse engineering, but I guess neither is a .so file.
I'm not an expert with Java, but this is what we have done to protect implementations in the past.
I don't know exactly what the motivations for a Java port are, but if it is just to support Java end users, you could consider a JNI wrapper around the existing C++ library. I guess this probably isn't the case, but I thought I would mention it.
As far as exposing the interface to the user goes, you could write a Java interface (like a pure virtual abstract C++ class) and simply exclude that interface from ProGuard obfuscation.
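A minimal sketch of that idea, using hypothetical names (TopApi, TopApiImpl, TopApis) and renaming do to doIt, since do is a reserved word in Java as well: the interface plays the role of topapi.h, the implementation class has no access modifier so it stays package-private, and a small factory is the only documented way to obtain an instance. The processed() method is an assumption added only so the sketch has observable output. Everything sits in one file here so it compiles standalone; in a real library TopApi would be a public interface in its own TopApi.java.

```java
import java.util.ArrayList;
import java.util.List;

// The Java counterpart of topapi.h. In a real library this would be
// "public interface TopApi" in its own file, kept unobfuscated.
interface TopApi {
    void doIt(String v);        // "do" is a Java keyword, hence doIt
    List<String> processed();   // hypothetical accessor, only for visible output
}

// Hidden implementation: no access modifier, so it is package-private.
// This signals "internal" to users; it is not a security boundary.
final class TopApiImpl implements TopApi {
    private final List<String> seen = new ArrayList<>();

    @Override
    public void doIt(String v) {
        seen.add(v.trim()); // stand-in for the ~100 internal classes doing real work
    }

    @Override
    public List<String> processed() {
        return List.copyOf(seen); // unmodifiable copy (Java 10+)
    }
}

// Public entry point: users obtain instances only through this factory.
public final class TopApis {
    private TopApis() {}

    public static TopApi create() {
        return new TopApiImpl();
    }

    public static void main(String[] args) {
        TopApi api = TopApis.create();
        api.doIt("  hello ");
        System.out.println(api.processed()); // prints [hello]
    }
}
```

With this layout only TopApi and TopApis would need a ProGuard keep rule; TopApiImpl and everything behind it could be obfuscated.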

As for how to show the public methods to the user: this is usually done through a combination of declaring the internal classes without an access modifier, which makes them accessible only from within the same package, and not documenting them. Don't depend on the former, though; it's easily circumvented, but it signals to the user that those classes are internal.
Java 9 adds modules, which let you encapsulate entire packages, but it had not been released at the time of writing, and the encapsulation can still be circumvented.
One side effect of ahead-of-time compilation (usually the case with C++) is that the distributed code is already optimized and contains no metadata, so it's harder to reverse engineer. Java is distributed in an intermediate language (bytecode), and the actual machine code is generated at runtime (JIT compilation). The intermediate language is practically unoptimized, so it's easier to reverse engineer. Java also merges the idea of header files and source files: a .class file contains all the metadata you need to use it.

Related

Are there any Java Class Library "header files" containing all method descriptors in the standard library?

In order to create a valid .class file, every method has to have a full internal name and type descriptors associated with it. When procedurally creating these, is there some sort of lookup table one can use (outside of Java, where a ClassLoader can be used) to get these type descriptors from a method name? For example, how would one go from Scanner.hasNextByte to boolean java.util.Scanner.hasNextByte(int) / boolean java.util.Scanner.hasNextByte() (or even from java.util.Scanner.hasNextByte to boolean java.util.Scanner.hasNextByte(int) / boolean java.util.Scanner.hasNextByte())? The above example has overloading in it, which is another problem a human- but mostly computer-readable declarations file would hopefully address.
I've found many sources of human-readable documentation like https://docs.oracle.com/javase/8/docs/api/index.html containing uses of each method, hyperlinks to other places, etc. but never a simple text file or collection of files containing just declarations in any format. If there's no such file(s) don't worry about it, I can try and scrape some annoying HTML files, but if there is it would save a lot of time. Thanks!

The short answer is No.
There isn't a "header file" containing the class and method signatures for the Java class libraries. The Java tool chain has no need for such a thing. Nor do 3rd-party Java compilers, or compilers for other languages that rely on the Java SE class libraries.
AFAIK, there isn't a 3rd-party tool that builds such a file or an equivalent database or in-memory data structures.
You could create one though.
You could choose an existing Java parsing library, use it to build parse trees for all of the source files in the class library, and emit the information that you need.
You could potentially create a custom Javadoc "doclet" plugin to emit the information.
Having said that, I don't understand why you would need such a mapping. Surely your IDE does this already ... and exposes the information via some internal API. And if this is not for an IDE plugin, what is it for?
You commented:
I'm making a compiler for a JVM-based programming language ....
Ah ... so your compiler should do what other compilers do: get the information from the ".class" files. You can either load the class using a standard or custom class loader, or you can use a library like ASM, BCEL or Javassist ... which can read a ".class" file without loading it.
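A minimal sketch of the class-loader route, using only the standard library: it loads a class by name, collects every public method with the requested name (so overloads come back together), and builds each JVM type descriptor with java.lang.invoke.MethodType. DescriptorLookup and descriptors are made-up names; a library such as ASM would instead read the descriptors straight from the .class bytes without loading the class.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DescriptorLookup {

    /** Returns "name + JVM descriptor" for every public method called methodName. */
    static List<String> descriptors(String className, String methodName) {
        Class<?> cls;
        try {
            // Load (without initializing) the class through the application class loader.
            cls = Class.forName(className, false, DescriptorLookup.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
        List<String> result = new ArrayList<>();
        for (Method m : cls.getMethods()) { // public methods, including inherited ones
            if (m.getName().equals(methodName)) {
                String descriptor = MethodType
                        .methodType(m.getReturnType(), m.getParameterTypes())
                        .toMethodDescriptorString(); // e.g. "(I)Z" for boolean f(int)
                result.add(m.getName() + descriptor);
            }
        }
        Collections.sort(result); // list overloads in a stable order
        return result;
    }

    public static void main(String[] args) {
        System.out.println(descriptors("java.util.Scanner", "hasNextByte"));
        // prints [hasNextByte()Z, hasNextByte(I)Z]
    }
}
```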
(I haven't checked, but I think the standard javac compiler uses an internal API to do this.)
Note that your proposed approaches won't work for interfacing with 3rd-party Java libraries where the source code is not available and/or the javadoc is not scrapable.

What about building it from the source files for the standard library?
The Oracle Java 8 API web pages you referenced were created by running Javadoc over the source files of the Java standard library.
If you use an IDE with a debugger, there is a good chance you already have much of the standard library source code downloaded. After all, if you set a breakpoint and then follow the program step by step with "Step into", you can trace execution into standard library methods. The source files are part of the JDK (usually shipped as src.zip).
However, some parts of the standard library source might not be available, due to licensing restrictions.

How to break up an Android activity in multiple files

In Android, a lot of functionality is in the Activity derived class. When an activity gets big (with many event handlers and such), the Java file can get large and very cluttered.
Is there a way to "break up" a Java class code file, like C# has the partial keyword?

As others have pointed out, you cannot split the actual file (I view this as a good thing).
You can extract view related functionality in custom views and fragments. Everything else (business logic, Web service access, DB access, etc.) can be in 'helper' classes you use in your activity. Even though activities are the God objects in Android, you don't have to write everything inside the actual activity class. It should only coordinate stuff and implement necessary callbacks and event handlers (which technically can be in their own classes as well).
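A rough sketch of that split in plain Java, so it runs without the Android SDK: CheckoutScreen stands in for an Activity subclass, and CartCalculator and OrderRepository are hypothetical helper classes. They are nested here only so the example is a single runnable file; in a real project each would live in its own .java file, which is the whole point.

```java
import java.util.ArrayList;
import java.util.List;

public class ActivitySplitSketch {

    /** Business logic moved out of the activity into a plain, unit-testable class. */
    static final class CartCalculator {
        int totalCents(List<Integer> itemPricesCents) {
            int sum = 0;
            for (int price : itemPricesCents) sum += price;
            return sum;
        }
    }

    /** Data access moved into its own class (a DB or web service in a real app). */
    static final class OrderRepository {
        private final List<Integer> savedTotals = new ArrayList<>();
        void save(int totalCents) { savedTotals.add(totalCents); }
        int count() { return savedTotals.size(); }
    }

    /** Stand-in for the Activity: it only coordinates and handles the event. */
    static final class CheckoutScreen {
        private final CartCalculator calculator = new CartCalculator();
        private final OrderRepository repository = new OrderRepository();

        String onCheckoutClicked(List<Integer> itemPricesCents) {
            int total = calculator.totalCents(itemPricesCents);
            repository.save(total);
            return "Saved order #" + repository.count() + " for " + total + " cents";
        }
    }

    public static void main(String[] args) {
        CheckoutScreen screen = new CheckoutScreen();
        System.out.println(screen.onCheckoutClicked(List.of(250, 199)));
        // prints Saved order #1 for 449 cents
    }
}
```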

Short answer? No.
Quoted from Wikipedia:
The Sun Microsystems Java compiler requires that a source file name must match the only public class inside it, while C# allows multiple public classes in the same file, and puts no restrictions on the file name. C# 2.0 and later allows splitting a class definition into several files by using the partial keyword in the source code. In Java, a public class will always be in its own source file. In C#, source code files and logical units separation are not tightly related.
So while you may rework your design and relegate some code to utility classes to unclutter it, you cannot separate the code of a single class across two files in Java.

No. The source code of a Java class cannot be split across multiple files.
From http://en.wikipedia.org/wiki/Comparison_of_Java_and_C_Sharp:
The Sun Microsystems Java compiler requires that a source file name must match the only public class inside it, while C# allows multiple public classes in the same file, and puts no restrictions on the file name. C# 2.0 and later allows a class definition to be split into several files, by using the partial keyword in the source code. In Java, a public class will always be in its own source file. In C#, source code files and logical units separation are not tightly related.

When importing a java library class from jar, is this considered static linking? or dynamic?

Say I have jcifs-1.3.14.jar in my lib folder, and I have a class that imports from the library and uses its classes like this:
import jcifs.smb.*;
NtlmPasswordAuthentication auth = new NtlmPasswordAuthentication(domain, user, pass);
SmbFile file = new SmbFile(path, auth);
// do some operations with the file here
When using the library in this fashion is it considered to be: A) Static Linking OR B) Dynamic Linking OR C) something else?

If you are looking for information about applying various software licenses on Java programs, then searching Google for <license name> Java usually results in a useful hit.
E.g. for LGPL Java, this is the first hit. In this particular case, the bottom line is:
Applications which link to LGPL libraries need not be released under the LGPL. Applications need only follow the requirements in section 6 of the LGPL: allow new versions of the library to be linked with the application; and allow reverse engineering to debug this.
I.e. as long as the library is provided in a separate JAR file that can be easily replaced, LGPL allows it.
PS: I Am Not A Lawyer! If in doubt, consult one. As a matter of fact, depending on where you live, it might make sense to consult one regardless if you are in doubt or not.

Static vs. dynamic linking as in C++ doesn't exist in Java. All classes get loaded into the JVM as they are referenced, so you can think of all imports in Java (including reflective lookups) as dynamic.
As an aside, the .* wildcard import is discouraged because of the naming conflicts it can cause; that has nothing to do with how classes are referenced at runtime.

Well, you don't compile the code from the library into your Java classes. Your compiled classes refer to the classes from the other library by name. When needed, the class is loaded by a class loader. That is more similar to dynamic linking.
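A small sketch of that on-demand behaviour, with a nested class standing in for a class from a library jar (LazyLoadingDemo and LibraryClass are hypothetical names). Strictly speaking, what the JVM guarantees to defer is class initialization: the static initializer runs on the first active use. Exactly when the bytes are loaded can vary between JVMs, but it happens at runtime, resolved by name, not at build time.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyLoadingDemo {
    static final List<String> log = new ArrayList<>();

    // Stand-in for a class that would normally come from a library jar.
    static class LibraryClass {
        static {
            log.add("LibraryClass initialized"); // runs on first use, not at program start
        }

        static int answer() {
            return 42;
        }
    }

    public static void main(String[] args) {
        log.add("main started");
        int value = LibraryClass.answer(); // first active use triggers initialization here
        log.add("got " + value);
        System.out.println(log);
        // prints [main started, LibraryClass initialized, got 42]
    }
}
```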
From a licensing point of view (e.g. the LGPL), it should be considered dynamic linking. I've never heard of any legal proceedings on this point (though I've searched for them), but one seems quite probable, and I'm looking forward to it, because many developers are a bit anxious about the question.

Why is each public class in a separate file?

I recently started learning Java and found it very strange that every Java public class must be declared in a separate file. I am a C# programmer and C# doesn't enforce any such restriction.
Why does Java do this? Were there any design considerations?
Edit (based on a few answers):
Why is Java not removing this restriction now in the age of IDEs? This will not break any existing code (or will it?).

I recently took a C# solution and did just this (split up every file that had multiple public classes in it), breaking them out into individual files, and this has made life much easier.
If you have multiple public classes in a file you have a few issues:
What do you name the file? One of the public classes? Another name? People already have enough trouble with poor code organization and file naming conventions without one more issue.
Also, when you are browsing the file/project explorer, it's good that things aren't hidden. For example, you might see one file, drill down, and find 200 classes all mashed together. If you have one class per file, you can organize your tests better and get a feel for the structure and complexity of a solution.
I think Java got this right.

According to the Java Language Specification, Third Edition:
This restriction implies that there must be at most one such type per compilation unit. This restriction makes it easy for a compiler for the Java programming language or an implementation of the Java virtual machine to find a named class within a package; for example, the source code for a public type wet.sprocket.Toad would be found in a file Toad.java in the directory wet/sprocket, and the corresponding object code would be found in the file Toad.class in the same directory.
Emphasis is mine.
It seems like basically they wanted to translate the OS's directory separator into dots for namespaces, and vice versa.
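That translation can be sketched in a few lines. This is a simplification under stated assumptions: it uses / as the separator (as inside jars and URLs), and it treats the first $ as the start of a nested-class name, even though $ is also legal in ordinary identifiers. ClassNameToPath and its methods are made-up names.

```java
public class ClassNameToPath {

    /** Source file that must declare a public (or externally referenced) type. */
    static String sourcePathFor(String binaryName) {
        // Nested classes (Outer$Inner) are declared in the outer class's source file.
        int dollar = binaryName.indexOf('$');
        String topLevel = dollar >= 0 ? binaryName.substring(0, dollar) : binaryName;
        return topLevel.replace('.', '/') + ".java";
    }

    /** Compiled class file, relative to a classpath root or jar root. */
    static String classFilePathFor(String binaryName) {
        return binaryName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        System.out.println(sourcePathFor("wet.sprocket.Toad"));    // prints wet/sprocket/Toad.java
        System.out.println(classFilePathFor("wet.sprocket.Toad")); // prints wet/sprocket/Toad.class
    }
}
```

Because the lookup is pure string manipulation, a compiler or class loader can go straight to one location per classpath entry instead of scanning every file.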
So yes, it was a design consideration of some sort.

From Thinking in Java:
There can be only one public class per compilation unit (file).
The idea is that each compilation unit has a single public interface represented by that public class. It can have as many supporting “friendly” classes as you want. If you have more than one public class inside a compilation unit, the compiler will give you an error message.
From the specification (7.2.6)
When packages are stored in a file system (§7.2.1), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:
The type is referred to by code in other compilation units of the package in which the type is declared.
The type is declared public (and therefore is potentially accessible from code in other packages).
This restriction implies that there must be at most one such type per compilation unit.
This restriction makes it easy for a compiler for the Java programming language or an implementation of the Java virtual machine to find a named class within a package; for example, the source code for a public type wet.sprocket.Toad would be found in a file Toad.java in the directory wet/sprocket, and the corresponding object code would be found in the file Toad.class in the same directory.
In short: it may be about finding classes without having to load everything on your classpath.
Edit: "may choose" seems to leave open the possibility of not following that restriction, and the meaning of "may" is probably the one described in RFC 2119 (i.e. "optional").
In practice, though, this is enforced on so many platforms and relied upon by so many tools and IDEs that I do not see any "host system" choosing not to enforce that restriction.
From "Once upon an Oak ..."
It's pretty obvious - like most things are once you know the design reasons - the compiler would have to make an additional pass through all the compilation units (.java files) to figure out what classes were where, and that would make the compilation even slower.
(Note:
the Oak Language Specification for Oak version 0.2 (PostScript document): Oak was the original name of what is now commonly known as Java, and this manual is the oldest manual available for Oak (i.e. Java).
For more history on the origins of Java, please have a look at the Green Project and Java(TM) Technology: An Early History
)

It's just to avoid confusion in the sense that Java was created with simplicity in mind from the perspective of the developer. Your "primary" classes are your public classes and they are easy to find (by a human) if they are in a file with the same name and in a directory specified by the class's package.
You must recall that the Java language was developed in the mid-90s, in the days before IDEs made code navigation and searching a breeze.

If a class is only used by one other class, make it a private inner class. This way you have your multiple classes in a file.
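For example, assuming a hypothetical Inventory class that is the only user of an Item type, Item can be a private nested class in Inventory.java (static here, since it needs no reference to the enclosing instance):

```java
import java.util.ArrayList;
import java.util.List;

public class Inventory {

    // Only Inventory uses Item, so it lives in the same file as a private nested class.
    private static final class Item {
        final String name;
        final int quantity;

        Item(String name, int quantity) {
            this.name = name;
            this.quantity = quantity;
        }
    }

    private final List<Item> items = new ArrayList<>();

    public void add(String name, int quantity) {
        items.add(new Item(name, quantity));
    }

    public int totalQuantity() {
        int total = 0;
        for (Item item : items) total += item.quantity;
        return total;
    }

    public static void main(String[] args) {
        Inventory inventory = new Inventory();
        inventory.add("bolt", 3);
        inventory.add("nut", 5);
        System.out.println(inventory.totalQuantity()); // prints 8
    }
}
```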
If a class is used by multiple other classes, which of these classes would you put into the same file? All three? You would end up having all your classes in a single file...

That's just how the language designers decided to do it. I think the main reason was to optimize the compiler pass-throughs - the compiler does not have to guess or parse through files to locate the public classes. I think it's actually a good thing, it makes the code files much easier to find, and forces you to stay away from putting too much into one file. I also like how Java forces you to put your code files in the same directory structure as the package - that makes it easy to locate any code file.

It is technically legal to have multiple Java top level classes in one file. However this is considered to be bad practice (in most cases), and some Java tools may not work if you do this.
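A minimal illustration of what "technically legal" means, with hypothetical classes: one public class named after the file, plus a package-private top-level class in the same Toad.java. javac accepts this, but other files should not refer to Sprocket, because a compiler looking for Sprocket.java may not be able to locate it.

```java
// File: Toad.java (the one public top-level class is named after the file)
public class Toad {
    public String croak() {
        return new Sprocket().spin() + " ribbit";
    }

    public static void main(String[] args) {
        System.out.println(new Toad().croak()); // prints whirr ribbit
    }
}

// A second, package-private top-level class in the same file.
// Only Toad (in this file) should use it.
class Sprocket {
    String spin() {
        return "whirr";
    }
}
```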
The JLS says this:
When packages are stored in a file system (§7.2.1), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:
The type is referred to by code in other compilation units of the package in which the type is declared.
The type is declared public (and therefore is potentially accessible from code in other packages).
Note the use of may in the JLS text. This says that a compiler may reject this as invalid, or it may not. That is not a good situation if you are trying to build your Java code to be portable at the source code level. Thus, even if multiple classes in one source file works on your development platform, it is bad practice to do this.
My understanding is that this "permission to reject" is a design decision that is intended in part to make it easier to implement Java on a wider range of platforms. If (conversely) the JLS required all compilers to support source files containing multiple classes, there would be conceptual issues implementing Java on a platform which wasn't file-system based.
In practice, seasoned Java developers don't miss being able to do this at all. Modularization and information hiding are better done using an appropriate combination of packages, class access modifiers and inner or nested classes.

Why is java not removing this restriction now in the age of IDEs? This will not break any existing code (or will it?).
Now all code is uniform. When you see a source file, you know what to expect; it is the same for every project. If Java were to remove this convention, you would have to relearn the code structure for every project you work on, whereas now you learn it once and apply it everywhere. We should not be trusting IDEs for everything.

Not really an answer to the question, but a data point nonetheless.
I grepped the headers of my personal C++ utility library (you can get it yourself from here), and almost all of the header files that actually declare classes (some just declare free functions) declare more than one class. I like to think of myself as a pretty good C++ designer (though the library is a bit of a bodge in places; I'm its only user), so I suggest that for C++ at least, multiple classes in the same file are normal and even good practice.

It allows for simpler heuristics for going from Foobar.class to Foobar.java.
If Foobar could be in any Java file you have a mapping problem, which may eventually mean you have to do a full scan of all java files to locate the definition of the class.
Personally, I have found this to be one of those odd rules that, taken together, allow Java applications to grow very large and still stay sturdy.

Well, actually it is an optional restriction according to the Java Language Specification (Section 7.6, page 209), but the Oracle Java compiler treats it as mandatory. According to the Java Language Specification:
When packages are stored in a file system (§7.2.1), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:
The type is referred to by code in other compilation units of the package in which the type is declared.
The type is declared public (and therefore is potentially accessible from code in other packages).
This restriction implies that there must be at most one such type per compilation unit. This restriction makes it easy for a Java compiler to find a named class within a package. In practice, many programmers choose to put each class or interface type in its own compilation unit, whether or not it is public or is referred to by code in other compilation units.
For example, the source code for a public type wet.sprocket.Toad would be found in a file Toad.java in the directory wet/sprocket, and the corresponding object code would be found in the file Toad.class in the same directory.
To get a clearer picture, imagine there are two public classes, public class A and public class B, in the same source file, and class A has a reference to the not-yet-compiled class B. When we compile class A and the compiler needs to resolve class B, it will be forced to examine every *.java file in the current package, because class B doesn't have its own B.java file. So in this case it takes the compiler extra time to find which class lies in which source file, and in which class the main method lies.
So the reason behind keeping one public class per source file is to make compilation faster, because it enables a more efficient lookup of the source and compiled files during linking (import statements). The idea is that if you know the name of a class, you know where it should be found in each classpath entry, and no indexing is required.
Also, when we execute our application, the JVM by default looks for the public class (since it has no access restrictions and can be accessed from anywhere) and for public static void main(String args[]) in that public class. The public class acts as the initial class from which the JVM instance for the Java application (program) begins. So when we put more than one public class in a source file, the compiler stops us by reporting an error. This is so that the JVM cannot later be confused about which class is its initial class, because only the one public class with public static void main(String args[]) is the JVM's initial class.
You can read more on Why Single Java Source File Can Not Have More Than One public class

How does software update work?

I've been reading Liang's Introduction to Java Programming for a couple of weeks, and this question came up when the author said "There is no need for developers to create, and for users to install, major new software versions.".
How does software update work? For example, patches for games, new versions of products, and that kind of thing. In the book, there's an example that, as long as you keep the interface of a class the same, you don't need to make any changes in any of the classes that depend on the one you changed. That's fine, but still a little abstract (for example, how do I create an update patch with only that class?).
I'm also interested in books on the subject.
Thank you.

Have a look at the book
Practical API Design - Confessions of a Java Framework Architect (Jaroslav Tulach, Apress, 2008)
I think it covers most of the aspects you are asking about.
For the topic on shipping new software versions or updates, have a look at the technology Java Web Start for example.
Shipping an update to a web application could be considered implicit from the users' point of view, as the changes made on a centralized server (or a set of servers) are delivered by the web browser itself.

I think the concept you're trying to understand is using an interface as a type. In a Java program, a variable can be declared to have the type of some defined interface. Instances of classes that implement that interface can then be assigned to variables of the interface type. However, only methods declared on the interface can be called through such a variable. At compile time, the interface is used for type checking. At runtime, however, the bytecode that actually does the work comes from the class implementing the interface. An example:
public interface Foo {
    void bar();
}
public class A implements Foo {
    @Override
    public void bar() {
        // some code
    }
}
public class Example {
    public static void main(String[] args) {
        Foo aFoo = new A();
        aFoo.bar();
    }
}
In the class Example, a variable named aFoo is declared to be of type Foo, the interface. A, which implements the Foo interface, contains the code that does the work of method bar(). In Example, the variable aFoo gets an instance of class A, so whatever code is in A's bar() method is executed when aFoo.bar() is called, even though aFoo is declared to be of type Foo.
So, we've established that all the work is done in class A. The two classes and one interface above can each be defined in their own file. All three resulting .class files can be packaged into a jar and shipped to customers as version 1.0 of the program. Sometime later, a bug might be discovered in the implementation of bar() in class A. Once a fix is developed, assuming the changes are all contained inside the bar() method, just the .class file for A needs to be shipped to customers. The updated A.class can be inserted into the program's .jar file (.jar files are just .zip files after all) overwriting the previous, broken version of A.class. When the program is restarted the JVM will load the new implementation of A.bar() and the class Example will get the new behavior.
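Since a jar is just a zip, the "overwrite A.class inside the jar" step can be sketched with the JDK's zip file system provider (java.nio.file plus a jar: URI), which rewrites one entry and leaves the rest alone. JarPatcher and its methods are hypothetical names, and the entries hold placeholder text rather than real bytecode. Assumptions worth stating: the application should not be running from the jar while it is patched, and if the jar is signed, its signature will no longer match the modified entry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JarPatcher {

    /** Overwrites (or adds) one entry in an existing jar; other entries are untouched. */
    static void replaceEntry(Path jar, String entryName, byte[] content) throws IOException {
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, new HashMap<String, Object>())) {
            Path target = zipFs.getPath(entryName);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent()); // e.g. com/example/ for com/example/A.class
            }
            Files.write(target, content);
        } // closing the zip file system writes the updated archive back to disk
    }

    static String readEntry(Path jar, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile());
             InputStream in = zip.getInputStream(zip.getEntry(entryName))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    private static void addEntry(ZipOutputStream out, String name, String text) throws IOException {
        out.putNextEntry(new ZipEntry(name));
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
    }

    /** Builds a throwaway "version 1.0" jar, patches A.class, returns both entries' contents. */
    static String[] demo() {
        try {
            Path jar = Files.createTempFile("app", ".jar");
            try {
                try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
                    addEntry(out, "A.class", "broken v1"); // placeholder text, not real bytecode
                    addEntry(out, "Example.class", "unchanged");
                }
                replaceEntry(jar, "A.class", "fixed v2".getBytes(StandardCharsets.UTF_8));
                return new String[] { readEntry(jar, "A.class"), readEntry(jar, "Example.class") };
            } finally {
                Files.deleteIfExists(jar);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0] + " / " + result[1]); // prints fixed v2 / unchanged
    }
}
```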
As with almost everything, more complicated programs can get, well, more complicated. But the principles are the same.

Consider a game that has been architected to have a number of independent modules, each interacting with each other via interfaces only. The initial deployment of such a game could consist of 15 or 20 individual jar files.
Applying updates would then be a matter of connecting to a server, querying for the latest version of each jar file, comparing against the jar file that you already have, and downloading only the changed files.
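A minimal sketch of the comparison step, assuming the server publishes a manifest that maps each jar name to its SHA-256 hash. The manifest format, UpdateCheck and jarsToDownload are hypothetical, and fetching the manifest and the jars over the network is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpdateCheck {

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** Jars whose server hash differs from the local one, or that are missing locally. */
    static List<String> jarsToDownload(Map<String, String> localHashes, Map<String, String> remoteHashes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : remoteHashes.entrySet()) {
            if (!entry.getValue().equals(localHashes.get(entry.getKey()))) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        local.put("engine.jar", sha256Hex("engine v1".getBytes(StandardCharsets.UTF_8)));
        local.put("audio.jar", sha256Hex("audio v1".getBytes(StandardCharsets.UTF_8)));

        // Hypothetical manifest from the update server: one jar changed, one is new.
        Map<String, String> remote = new HashMap<>(local);
        remote.put("engine.jar", sha256Hex("engine v2".getBytes(StandardCharsets.UTF_8)));
        remote.put("maps.jar", sha256Hex("maps v1".getBytes(StandardCharsets.UTF_8)));

        System.out.println(jarsToDownload(local, remote)); // prints [engine.jar, maps.jar]
    }
}
```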
A poor man's version of this would be if you were working on a project that has 15 jars, and you make a critical bug fix to one of the jars. You wouldn't have to copy all 15 jars to your customer's system; you just copy the one that changed.
JNLP (of which Web Start and the new applet implementation are examples) does this using a well-defined XML file that specifies which versions of each library are required. In this case, initial deployment of the application can be performed by reading the XML (served by a web server) and dynamically downloading the required jar files to a local cache during a bootstrap operation. There actually is no initial installation at all.
It is probably important to mention that this benefit, while certainly valid and real, is pretty minor compared to the maintainability and ease of coding that comes from reducing dependencies between modules, and using well defined interfaces.

The simplest way to update a single class will be to stop the application, override the old class with the new one and then restart. Shutting down the application may not be appropriate for some cases. To overcome this you would need to have a framework/container of some kind and develop your own class loaders that make possible updating a class without restart. This is what Java Application Servers are doing.
For desktop/thick clients, there are some commercial and free solutions for software updates available. See this article for a comparison. It is rather old; newer solutions are bound to be available.

In addition to updating individual JAR files (or, in other software environments, shared library files or DLLs), there are binary patch tools that can be used in some cases to transmit a set of changes against the base file. A patch program will take this difference file and merge it with the base file to produce the new, changed file. This only works if the original file is not modified on the installed system, but can work in some cases to incrementally patch particular program components.
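To make the idea concrete, here is a toy patch format. It is not the format of any real tool; tools such as bsdiff or xdelta also handle inserted and deleted bytes, which this does not. The patch records the hash of the base file it was made against, so apply refuses to run when the installed file has been modified, which is exactly the limitation described above. ToyBinaryPatch and its types are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToyBinaryPatch {

    /** One run of changed bytes: write {@code bytes} starting at {@code offset}. */
    static final class Change {
        final int offset;
        final byte[] bytes;
        Change(int offset, byte[] bytes) { this.offset = offset; this.bytes = bytes; }
    }

    /** The hash of the base file the patch was made against, plus the changed runs. */
    static final class Patch {
        final byte[] baseHash;
        final List<Change> changes;
        Patch(byte[] baseHash, List<Change> changes) { this.baseHash = baseHash; this.changes = changes; }
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a patch. Toy limitation: base and target must have the same length. */
    static Patch diff(byte[] base, byte[] target) {
        if (base.length != target.length) {
            throw new IllegalArgumentException("toy format only handles equal-length files");
        }
        List<Change> changes = new ArrayList<>();
        int i = 0;
        while (i < base.length) {
            if (base[i] == target[i]) { i++; continue; }
            int start = i;
            while (i < base.length && base[i] != target[i]) i++; // extend the differing run
            changes.add(new Change(start, Arrays.copyOfRange(target, start, i)));
        }
        return new Patch(sha256(base), changes);
    }

    /** Applies a patch, refusing to touch a file that is not the expected base. */
    static byte[] apply(byte[] installed, Patch patch) {
        if (!MessageDigest.isEqual(sha256(installed), patch.baseHash)) {
            throw new IllegalStateException("installed file was modified; cannot apply patch");
        }
        byte[] result = installed.clone();
        for (Change c : patch.changes) {
            System.arraycopy(c.bytes, 0, result, c.offset, c.bytes.length);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] base = "hello world v1.0".getBytes(StandardCharsets.UTF_8);
        byte[] target = "hello World v1.1".getBytes(StandardCharsets.UTF_8);
        Patch patch = diff(base, target);
        System.out.println(patch.changes.size() + " changed runs"); // prints 2 changed runs
        System.out.println(new String(apply(base, patch), StandardCharsets.UTF_8)); // prints hello World v1.1
    }
}
```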

I had actually studied this briefly at university, and when I researched it specifically, I found it quite interesting. One way to specify updates is recorded versioning, where for each version you release you record what has changed. Say, for example, the difference between version 1.0.1 and 1.0.2 is that mylib.dll has been updated. Now say the difference between 1.0.2 and 1.0.4 is that myotherlib.dll has changed. If someone updates their software to the latest version (1.0.4) and they currently have 1.0.1, then both DLLs are included in the update.
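The recorded-versioning scheme can be sketched like this, assuming a hypothetical changelog that maps each release to the files changed in it; updating from the installed version to the target version means fetching the union of those sets. ChangelogUpdate and its methods are made-up names, and versions are assumed to be purely numeric dotted strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ChangelogUpdate {

    /** Compares dotted version strings numerically, so "1.0.10" is newer than "1.0.4". */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    /** Union of files changed in every release after 'installed', up to and including 'target'. */
    static Set<String> filesToUpdate(Map<String, Set<String>> changelog, String installed, String target) {
        Set<String> files = new TreeSet<>(); // sorted, no duplicates
        for (Map.Entry<String, Set<String>> entry : changelog.entrySet()) {
            String version = entry.getKey();
            if (compareVersions(version, installed) > 0 && compareVersions(version, target) <= 0) {
                files.addAll(entry.getValue());
            }
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> changelog = new HashMap<>();
        changelog.put("1.0.2", Set.of("mylib.dll"));      // 1.0.1 -> 1.0.2
        changelog.put("1.0.4", Set.of("myotherlib.dll")); // 1.0.2 -> 1.0.4
        System.out.println(filesToUpdate(changelog, "1.0.1", "1.0.4")); // prints [mylib.dll, myotherlib.dll]
        System.out.println(filesToUpdate(changelog, "1.0.2", "1.0.4")); // prints [myotherlib.dll]
    }
}
```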
I'm not sure whether this is practiced, though, or whether more intelligent methods are used. It really depends on the software maker; I assume quite a few different methods are used, as some developers encrypt application files and bundle them into a custom format.

Here's a simple batch file auto-updater to get you started with the concept...
You can run it in the background to update the files instead of writing the updater in Java (though I find it easier to write one in C++).
@echo off
if not exist "file.jar" goto FM
cls
echo [1] Update
echo [2] Exit
set /p "main=1 OR 2: "
if "%main%"=="1" goto U
rem any other choice (including 2) exits
exit /b
:U
cls
echo Updating...
del "file.jar"
wget "http://yourserver.com/file.(EXTENSION)"
if not exist "file.jar" goto FM
echo Update complete.
pause >nul
exit /b
:FM
cls
echo You are missing important files!
pause >nul
exit /b
I advise that you learn how to do this in Java, but if you want to get more advanced, you can put a file named "version.bat" on the server and download that to the user's machine too, so that the script can read it; if it is the same version as the one they already have, it skips the download and says "You have the latest version available!".
NOTE: in order to run this code, you need to search Google for wget, download the proper file and put it in the same folder as the script above; also save the script above as a .bat file (e.g. anything.bat).

Is that not referring to the Single Responsibility Principle? http://en.wikipedia.org/wiki/Single_responsibility_principle
In practice it doesn't seem to work that way unless you have something truly enormous, like Windows, which you can update in pieces, or a game that uses lots of textures which don't need to be changed.
