Mapping a Java program to an Operating System process


I am wondering how this happens: how is a Java program mapped to an OS process (like the one shown for Linux below):
In C, there is a straightforward association between how a program is written and how the whole call stack proceeds in the OS. How is this mapping achieved in Java? Does a method meth(), called on an object obj, just translate to locating the address of obj.meth(), after which the stack is used the way it is in C?
Thanks in advance!
Edit: I'd also be curious to know the model that other OOP languages use in general (C++, Python etc).

That's a pretty complex problem. Here is a pretty good article about this topic. In short, Java has two execution modes, which hugely affects the memory layout:
Some code is executed by the interpreter.
Some code is compiled to native code for better performance.
See this wiki page: http://en.wikipedia.org/wiki/Just-in-time_compilation.
The JVM also has more types of memory region than a C program, such as perm-gen (replaced by Metaspace since Java 8) and memory for JIT-compiled code.
This is well-discussed in other threads:
java and memory layout
jdk1.6 memory layout

Most JVMs are ordinary native programs written in C or C++ (HotSpot, for example, is C++). So the picture will be the same right up to the first class file being interpreted/executed.
After that it depends on the JVM implementation. Typically they would use the stack storage to keep track of control-type information such as which classes are loaded, which threads are running etc. For the actual "program" storage the interpreter and garbage collector will use plain "malloc"/"free" to allocate and free memory, plus some fairly complex control structures to enable the garbage collector to function.
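To make the "a JVM is just a process" point concrete, here is a minimal sketch (assuming Java 9 or later for ProcessHandle and StackWalker; the class and method names are made up for illustration). It prints the OS process ID of the running JVM and the Java-level call stack of the current thread. Note that these are the JVM's logical frames, which need not correspond one-to-one to native stack frames once code is JIT-compiled and inlined.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ProcessView {
    // The JVM runs as an ordinary OS process; this returns its process ID.
    static long pid() {
        return ProcessHandle.current().pid();
    }

    // Method names on the current thread's Java call stack, innermost frame first.
    static List<String> currentStack() {
        return StackWalker.getInstance()
                .walk(frames -> frames
                        .map(StackWalker.StackFrame::getMethodName)
                        .collect(Collectors.toList()));
    }

    // A hypothetical method, standing in for the meth() from the question.
    static List<String> meth() {
        return currentStack();
    }

    public static void main(String[] args) {
        System.out.println("JVM process id: " + pid());
        System.out.println("Java frames:    " + meth());
    }
}
```

You can compare the printed PID with what ps or top shows for the java process.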

Related

How to access JVM internal data structures using the Hotspot Dynamic Attach Mechanism?

According to the OpenJDK's website, it is possible to attach a thread to Hotspot (Dynamic Attach API) which can collect information about it. I couldn't find any material on the internet on how to obtain information about Hotspot's internal data structures such as the operand stack or the state of the bytecode interpreter(to know which bytecode is currently executing) or to retrieve the current Stack Frame etc.
Also, if this is not possible with the Dynamic Attach API, how can this be done using the Serviceability Agent? The only example I found on the internet is this gist from GitHub which shows how to attach to a running JVM and get the values of some fields. But how can I access the aforementioned internal data structures in the JVM?

The article Creating Your Own Debugging Tools briefly describes both Dynamic Attach and Serviceability Agent.
Dynamic Attach lets you connect to a running JVM and execute one of the predefined commands, such as:
Print stack traces
Dump heap
Query or set a VM flag
Load an agent library
etc.
Basically, standard jstack, jmap and jcmd tools cover nearly all functions provided by Dynamic Attach. This API is not for accessing internal JVM structures. I doubt it can help with your task, except for loading a custom JVM TI library.
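As a rough illustration of the kind of commands listed above, here is a sketch that queries a VM flag and takes a thread dump. Note the assumptions: this is not the Dynamic Attach API itself but the in-process management beans, which expose a similar set of operations from inside the JVM; it assumes a HotSpot-based JDK where com.sun.management.HotSpotDiagnosticMXBean is available; the class and method names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class InProcessDiagnostics {
    // Query a VM flag by name (similar in spirit to querying a flag with jcmd or jinfo).
    static String vmFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    // Take a thread dump (similar in spirit to jstack) and count the live threads.
    static int liveThreadCount() {
        ThreadInfo[] dump =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        return dump.length;
    }

    public static void main(String[] args) {
        System.out.println("MaxHeapSize  = " + vmFlag("MaxHeapSize"));
        System.out.println("Live threads = " + liveThreadCount());
    }
}
```

Like Dynamic Attach, this only reaches predefined diagnostics; it gives no access to operand stacks or interpreter state.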
Serviceability Agent is closer to the JVM internal structures. Indeed, it can read JVM memory and recover structures like Code Cache, Stack Frames, TLAB, Constant Pool etc.
SA javadoc is available here. There are some examples of SA-based tools in JDK sources.
However, SA does not meet your requirements either.
It is a read only interface.
It works out of the process. SA-based tools suspend the JVM process entirely and read its memory using ptrace.
It is rather slow. Its main purpose is to debug an unresponsive (or dead) JVM process.
Regarding operand stack, bytecode pointer etc. These notions exist only in the interpreter. Once a method is JIT-compiled, it no longer has structures you are asking about.
The locals and operands may be allocated in CPU registers or converted to constants.
The machine code does not always map one-to-one to the bytecode.
An inlined method may not even have its own stack frame, and so on.
Executing bytecodes one by one would mean giving up JIT compilation. JVM TI SingleStep indeed works only in the interpreter. A Java application may run 10-100 times slower in a purely interpreted mode.
If you want to keep performance of your debugger reasonable, processing each bytecode instruction one after another is not an option. As told before, instrumentation is the right way to go. Note that it's not necessary to intercept every single bytecode - instrumenting basic blocks should be enough.

Why operating systems are not written in java?

All the operating systems to date have been written in C/C++, while there is none in Java. There are tons of Java applications but not an OS. Why?

Because we have operating systems already, mainly. Java isn't designed to run on bare metal, but that's not as big of a hurdle as it might seem at first. As C compilers provide intrinsic functions that compile to specific instructions, a Java compiler (or JIT, the distinction isn't meaningful in this context) could do the same thing. Handling the interaction of GC and the memory manager would be somewhat tricky also. But it could be done. The result is a kernel that's 95% Java and ready to run jars. What's next?
Now it's time to write an operating system. Device drivers, a filesystem, a network stack, all the other components that make it possible to do things with a computer. The Java standard library normally leans heavily on system calls to do the heavy lifting, both because it has to and because running a computer is a pain in the ass. Writing a file, for example, involves the following layers (at least, I'm not an OS guy so I've surely missed stuff):
The filesystem, which has to find space for the file, update its directory structure, handle journaling, and finally decide what disk blocks need to be written and in what order.
The block layer, which has to schedule concurrent writes and reads to maximize throughput while maintaining fairness.
The device driver, which has to keep the device happy and poke it in the right places to make things happen. And of course every device is broken in its own special way, requiring its own driver.
And all this has to work fine and remain performant with a dozen threads accessing the disk, because a disk is essentially an enormous pile of shared mutable state.
At the end, you've got Linux, except it doesn't work as well because it doesn't have near as much effort invested into functionality and performance, and it only runs Java. Possibly you gain performance from having a single address space and no kernel/userspace distinction, but the gain isn't worth the effort involved.
There is one place where a language-specific OS makes sense: VMs. Let the underlying OS handle the hard parts of running a computer, and the tenant OS handles turning a VM into an execution environment. BareMetal and MirageOS follow this model. Why would you bother doing this instead of using Docker? That's a good question.

Indeed, there is a JavaOS: http://en.wikipedia.org/wiki/JavaOS
And here is a discussion of why there are not many operating systems written in Java: Is it possible to make an operating system using java?
In short, Java needs to run on a JVM, and a JVM needs to run on an OS, so writing an OS in Java is not a good choice.
An OS needs to deal with hardware, which is not doable in Java (except through JNI). That is because the JVM offers only a limited set of instructions to Java code, such as adding numbers or calling a method. Dealing with hardware requires instructions that operate directly on registers, memory, the CPU and hardware drivers. These are not supported directly by the JVM, so JNI is needed. That brings us back to the start: you still need to write the OS in C/assembly.
Hope this helps.

One of the main benefits of using Java is that it abstracts away a lot of low-level details that you usually don't really need to care about. It's those details which are required when you build an OS. So while you could work around this to write an OS in Java, it would have a lot of limitations, and you'd spend a lot of time fighting with the language and its initial design principles.

For operating systems you need to work really low-level. And that is a pain in Java. You do need e.g. unsigned data types, and Java only has signed data types. You need struct objects that have exactly the memory alignment the driver expects (and no object header like Java adds to every object).
Even key components of Java itself are no longer written in Java.
And this is by no means a temporary thing. More and more gets rewritten in native code to get better performance. The HotSpot VM adds "intrinsics" for performance-critical native code, and there is work underway to reduce the overall cost of native calls.
For example JavaFX: The reason why it is much faster than AWT/Swing ever were is because it contains/uses a huge amount of native code. It relies on native code for rendering, and e.g. if you add the "webview" browser component it is actually using the webkit C library to provide the browser.
There are a number of things Java does really well. It is a nicely structured language with a fantastic toolchain. Python is much more compact to write, but its toolchain is a mess, e.g. refactoring tools are disappointing. And where Java shines is at optimizing polymorphism at run-time. Where C++ compilers would need to do expensive virtual calls - because at compile time it is not known which implementation will be used - there HotSpot can aggressively inline code to get better performance. But for operating systems, you do not need this much. You can afford to manually optimize call sites and inlining.
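To illustrate the unsigned-type and struct-layout points from this answer, here is a small sketch of the usual workarounds. The record layout (a 16-bit id, 16-bit flags and a 32-bit length, little-endian) is a hypothetical example, not any real device's format, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class UnsignedAndLayout {
    // Java's byte is signed (-128..127); reinterpret it as an unsigned value 0..255.
    static int u8(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // Java's int is signed; reinterpret its 32 bits as an unsigned value held in a long.
    static long u32(int i) {
        return Integer.toUnsignedLong(i);
    }

    // Pack a hypothetical 8-byte record with an exact layout and no object header:
    // bytes 0-1 id (u16), bytes 2-3 flags (u16), bytes 4-7 length (u32), little-endian.
    static byte[] packRecord(int id, int flags, long length) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) id);
        buf.putShort((short) flags);
        buf.putInt((int) length);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(u8((byte) 0xFF));
        System.out.println(u32(-1));
        System.out.println(packRecord(0x1234, 0, 0xFFFFFFFFL).length);
    }
}
```

It works, but every field access goes through explicit conversions and offsets, which is the "pain" the answer is talking about compared with a C struct.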

This answer does not mean to be exhaustive in any way, but I'd like to share my thoughts on the (very vast) topic.
Although it is theoretically possible to write an OS in pure Java, there are practical matters that make this task really difficult. The main problem is that there is no (currently up-to-date and reliable) Java compiler able to compile Java to native machine code. So there is no existing tool that makes writing a whole OS from the ground up feasible in Java, at least as far as my knowledge goes.
Java was designed to run on some implementation of the Java virtual machine. There exist implementations for Windows, Mac, Linux, Android, etc. The design of the language is strongly based on the assumption that the JVM exists and will do some magic for you at runtime (think garbage collection, JIT compiler, reflection, etc.). This is most likely part of the reason why such a compiler does not exist: where would all this functionality go? Compiled down to machine code? It's possible, but at this point I believe it would be difficult to do. Even Android, whose SDK is purely Java based, runs Dalvik (a virtual machine with its own bytecode format that supports a subset of the Java platform) on a Linux kernel.

Accessing memory with Java

I have a program loaded in the memory. Now I want to access the memory directly and change the OPCODE and DATA in the memory for that program. For this I need to write a Java program.
Can you please tell me if this is feasible? If yes, please let me know how to write such a program.
Thanks in advance!

Java is not designed for this.
The main aim of Java is to let the JVM manage the memory for you. Thus, your programs are sandboxed.
However, there seems to be a backdoor in HotSpot JVM:
Java was initially designed as a safe, managed environment.
Nevertheless, Java HotSpot VM contains a “backdoor” that provides a
number of low-level operations to manipulate memory and threads
directly. This backdoor – sun.misc.Unsafe – is widely used by JDK
itself in the packages like java.nio or java.util.concurrent. It is
hard to imagine a Java developer who uses this backdoor in any regular
development because this API is extremely dangerous, non portable, and
volatile. Nevertheless, Unsafe provides an easy way to look into
HotSpot JVM internals and do some tricks. Sometimes it is simply
funny, sometimes it can be used to study VM internals without C++ code
debugging, sometimes it can be leveraged for profiling and development
tools.
Source: http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
The Unsafe class is, however, undocumented. You may want to have a look at this SO answer for more details: https://stackoverflow.com/questions/5574241/interesting-uses-of-sun-misc-unsafe
Unoffical Docs: http://mishadoff.github.io/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/
Absolute Beginners' Guide http://java-performance.info/string-packing-converting-characters-to-bytes/
http://javapapers.com/core-java/address-of-a-java-object/
P.S. I am aware that I must post some of the content of the link here but since the articles are really very detailed, I have skipped that part
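For a flavour of what the linked articles describe, here is a minimal sketch of raw memory access through sun.misc.Unsafe. Several assumptions apply: it relies on the private theUnsafe field, which is an undocumented HotSpot/OpenJDK detail; recent JDKs deprecate these memory-access methods and may print warnings or eventually refuse them; and it only touches memory that this JVM process allocated itself, so it does not let you modify another program's opcodes or data. The class and method names are made up.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeDemo {
    // Obtain the Unsafe singleton reflectively (undocumented; may break on future JDKs).
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Write a long to raw off-heap memory and read it back from the same address.
    static long roundTrip(long value) {
        Unsafe u = unsafe();
        long address = u.allocateMemory(8); // raw address, outside the Java heap
        try {
            u.putLong(address, value);
            return u.getLong(address);
        } finally {
            u.freeMemory(address); // no garbage collector here: free it yourself
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

Passing an arbitrary address to getLong/putLong can crash the whole JVM, which is why the quote above calls this API extremely dangerous.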

You cannot directly reference memory in Java, as there is no concept of pointers in Java like there is in C/C++.
You should read through this: referencing memory address
Hope it helps.

Calling Java from C program

How to call Java methods from C program? I.e. is it possible to embed java (not necessary Sun/Oracle JVM) in other language?

A full Oracle JVM is a very large chunk to pull into your existing program. It is perfectly doable, but I would recommend against it if any of the following apply:
You need to pull a lot of data in and out of the JVM on a frequent basis. This is expensive.
You are not in full control of the operating system and JVM to use.
You are not an experienced C programmer. Debugging these things can be hard.
You might find jamvm - http://jamvm.sourceforge.net/ - an interesting alternative. It is a very small interpreter written in C, which may be a lot easier to handle. I have not tried embedding it.

Any concept of shared memory in Java

AFAIK, memory in Java is based on heap from which the memory is allotted to objects dynamically and there is no concept of shared memory.
If there is no concept of shared memory, then communication between Java programs should be time consuming, whereas in C inter-process communication via shared memory is quicker than other modes of communication.
Correct me if I'm wrong. Also, what is the quickest way for two Java programs to talk to each other?

A few ways:
RAM Drive
Apache APR
OpenHFT Chronicle Core
Details here and here with some performance measurements.

Since there is no official API to create a shared memory segment, you need to resort to a helper library/DLL and JNI to use shared memory to have two Java processes talk to each other.
In practice, this is rarely an issue since Java supports threads, so you can have two "programs" run in the same Java VM. Those will share the same heap, so communication will be instantaneous. Plus you can't get errors because of problems with the shared memory segment.
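A minimal sketch of the "two programs in one JVM" approach, assuming a simple producer/consumer split (the class and method names are made up). Both threads share the same heap, so the queue hands over object references without any copying or serialization.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SameHeapMessaging {
    // A producer thread sends 1..count through a queue on the shared heap;
    // the calling thread consumes the values and returns their sum.
    static int sumViaQueue(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaQueue(100));
    }
}
```

The queue also takes care of synchronization and visibility between the threads, which you would otherwise have to handle yourself.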

Java Chronicle is worth looking at; both Chronicle-Queue and Chronicle-Map use shared memory.
These are some tests that I had done a while ago comparing various off-heap and on-heap options.

One thing to look at is using memory-mapped files, using Java NIO's FileChannel class or similar (see the map() method). We've used this very successfully to communicate (in our case one-way) between a Java process and a C native one on the same machine.
I'll admit I'm no filesystem expert (luckily we do have one on staff!) but the performance for us is absolutely blazingly fast -- effectively you're treating a section of the page cache as a file and reading + writing to it directly without the overhead of system calls. I'm not sure about the guarantees and coherency -- there are methods in Java to force changes to be written to the file, which implies that they are (sometimes? typically? usually? normally? not sure) written to the actual underlying file (somewhat? very? extremely?) lazily, meaning that some proportion of the time it's basically just a shared memory segment.
In theory, as I understand it, memory-mapped files CAN actually be backed by a shared memory segment (they're just file handles, I think) but I'm not aware of a way to do so in Java without JNI.
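Here is a rough sketch of the memory-mapped file approach using FileChannel.map(). To keep it self-contained, both mappings are created in one process against a temporary file; in real use the writer and the reader would be separate processes opening the same path. The class and method names are made up, and the Java API does not formally guarantee when changes become visible through another mapping, although shared mappings of the same file behave coherently on mainstream operating systems.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileDemo {
    // Map the first `size` bytes of the file read/write. The mapping stays
    // valid after the channel is closed.
    static MappedByteBuffer map(Path file, int size) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a value through one mapping and read it back through a second,
    // independent mapping of the same file.
    static long writeThenRead(long value) {
        try {
            Path file = Files.createTempFile("shared", ".bin");
            file.toFile().deleteOnExit();
            MappedByteBuffer writer = map(file, 8);
            MappedByteBuffer reader = map(file, 8);
            writer.putLong(0, value);
            return reader.getLong(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(123L));
    }
}
```

MappedByteBuffer.force() is the method that pushes changes out to the underlying file; plain puts and gets go through the page cache only.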

Shared memory is sometimes quick. Sometimes it's not - it hurts CPU caches, and synchronization is often a pain (and should it rely upon mutexes and such, can be a major performance penalty).
Barrelfish is an operating system that demonstrates that IPC using message passing is actually faster than shared memory as the number of cores increases (on conventional x86 architectures as well as the more exotic NUMA NUCA stuff you'd guess it was targeting).
So your assumption that shared memory is fast needs testing for your particular scenario and on your target hardware. It's not a generally sound assumption these days!

There's a couple of comparable technologies I can think of:
A few years back there was a technology called JavaSpaces but that never really seemed to take hold, a shame if you ask me.
Nowadays there are the distributed cache technologies, things like Coherence and Tangosol.
Unfortunately neither will have the out right speed of shared memory, but they do deal with the issues of concurrent access, etc.

The easiest way to do that is to have two processes map the same memory-mapped file. In practice they will be sharing the same off-heap memory space. You can grab the address of this memory and use sun.misc.Unsafe to write/read primitives. It supports concurrency through the putXXXVolatile/getXXXVolatile methods. Take a look at CoralQueue, which offers IPC easily as well as inter-thread communication inside the same JVM.
Disclaimer: I am one of the developers of CoralQueue.

Similar to Peter Lawrey's Java Chronicle, you can try Jocket.
It also uses a MappedByteBuffer but does not persist any data and is meant to be used as a drop-in replacement to Socket / ServerSocket.
Roundtrip latency for a 1kB ping-pong is around a half-microsecond.

MappedBus (http://github.com/caplogic/mappedbus) is a library I've added on GitHub which enables IPC between multiple (more than two) Java processes/JVMs by message passing.
The transport can be either a memory mapped file or shared memory. To use it with shared memory simply follow the examples on the github page but point the readers/writers to a file under "/dev/shm/".
It's open source and the implementation is fully explained on the github page.

The information provided by Cowan is correct. However, even shared memory won't always appear to be identical in multiple threads (and/or processes) at the same time. The key underlying reason is the Java memory model (which is built on the hardware memory model). See Can multiple threads see writes on a direct mapped ByteBuffer in Java? for a quite useful discussion of the subject.
