I have this function to read a doc file using Tika on Linux:
import os
import subprocess

def read_doc(doc_path):
    output_path = doc_path + '.txt'
    java_path = '/home/jdk1.7.0_17/jre/bin/'
    environ = os.environ.copy()
    environ['JAVA_HOME'] = java_path
    environ['PATH'] = java_path
    tika_path = java_path + 'tika-app-1.3.jar'
    # Extract plain text with Tika and redirect it into output_path
    shell_command = 'java -jar %s --text --encoding=utf-8 "%s" >"%s"' % (tika_path, doc_path, output_path)
    proc = subprocess.Popen(shell_command, shell=True, env=environ, cwd=java_path)
    proc.wait()
This function works fine when I run it from the command line, but when I call the same function using CGI, I get the following error:
Error occurred during initialization of VM
Could not reserve enough space for object heap
I checked previous answers for this particular error and they suggest increasing the memory, but this doesn't seem to work. I don't think this has to do with memory allocation, but rather with read/write/execute privileges of the CGI script. Any idea how to solve this problem?
You're loading an entire JVM instance within the memory & process space of each individual CGI invocation. That's bad. Very bad. For both performance and memory usage. Increasing memory allocation is a hack that doesn't address the real problem. Core java code should almost never be invoked via CGI.
You'd be better off:
Avoiding both CGI and Python by running a Java Servlet within your web server that invokes the appropriate Tika class directly with the desired arguments. Map the user URL directly to the servlet (via the @WebServlet("someURL") annotation on the Servlet class).
Running Tika in server mode and invoking it via REST from Python.
Running a core Java app separately as a server/daemon process and having it listen on a TCP ServerSocket. Invoke it from Python via a client socket.
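As a rough illustration of the second option (Tika in server mode, called over REST from Python), here is a minimal sketch. It assumes a Tika release that ships the server jar, that the server is already running on the same host on its default port 9998, and that the `/tika` endpoint is used with a PUT request and `Accept: text/plain`. The function names are made up for this example.

```python
import urllib.request

# Assumption: a Tika server is already running locally and listens on
# its default port 9998.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(doc_bytes, url=TIKA_URL):
    """Build a PUT request that asks the Tika server for plain text."""
    return urllib.request.Request(
        url,
        data=doc_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def read_doc_via_server(doc_path, url=TIKA_URL, timeout=60):
    """Send the document to the Tika server and return the extracted text."""
    with open(doc_path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With this approach the CGI script only opens an HTTP connection; the JVM is started once, outside the CGI process and its memory limits.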
Try adding -Xmx512m and -XX:MaxHeapSize=256m to the shell command, so that it looks like this:
shell_command = 'java -XX:MaxHeapSize=256m -Xmx512m -jar %s --text --encoding=utf-8 "%s" >"%s"'%(tika_path,doc_path,output_path)
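If you go the heap-flag route, it may be easier to debug with the command built as an argument list and stderr captured, so the JVM's error message is visible from the CGI script. The sketch below is one way to do that, not a guaranteed fix. It passes only `-Xmx`, since `-Xmx` and `-XX:MaxHeapSize` set the same limit. The helper names and the default of 256m are assumptions for illustration.

```python
import subprocess


def build_tika_command(tika_jar, doc_path, max_heap="256m"):
    """Return the java command as a list, so no shell quoting is needed."""
    return [
        "java",
        "-Xmx" + max_heap,  # cap the maximum heap size
        "-jar", tika_jar,
        "--text",
        "--encoding=utf-8",
        doc_path,
    ]


def read_doc_text(tika_jar, doc_path, max_heap="256m"):
    """Run Tika and return the extracted text, raising on JVM errors."""
    result = subprocess.run(
        build_tika_command(tika_jar, doc_path, max_heap),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        # Surface the JVM's own message, e.g. the heap reservation error
        raise RuntimeError(result.stderr.decode("utf-8", errors="replace"))
    return result.stdout.decode("utf-8")
```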
I am using a 2014 book on Jython-Java-Python in regards to music and computation.
...
I am trying to use a custom java command to run a shell script while telling Java to cap the heap at a maximum size in MB.
I understand that heap management in Java is already covered well on this site. I do not really need a general way to manage the heap, but a way to set the heap size while running a shell script through java with a command like this:
java -Xms60m sh jython.sh furElise.py
The shell script is a wrapper for handling python and java, Jython, and I am trying to make this work on a 32-bit Linux SBC all while output as sound resonates. #JythonMusic
So, because of Elliott Frisch's answer, I have changed the source of the .sh file called jython.sh to use a smaller heap size.
I have chosen 1024 so far and things are in working order. Otherwise I would have had to work with the allocated heap size of 4096, which is too large for my whole system once the other "add-ons" that use the heap outside of calling java via the jython.sh script are counted.
Now, on my BeagleBone Black Wireless thus far, I can run a vncserver to account for the #JythonMusic source working which in the end leaves my command prompt in the jython interpreter.
Once in the jython interpreter, one would simply leave it as though one was in the python interpreter, e.g. exit().
Usually, I use jstack to check whether the Java process is working normally. However, I found that when the /tmp/java_pid<num> socket file (where num is the PID of the Java process) has been deleted, jstack no longer works, like this:
[xxx]$ jstack -l 5509
5509: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
(PS. I didn't want to use the "-F", there may be other problems)
Is there any way to change the socket file location (not /tmp), or to generate the socket file again when it is found to be missing? What I do now is restart the Java process, which is a very bad solution.
Thanks!
The /tmp/.java_pid<pid> socket is used by the HotSpot Dynamic Attach mechanism. It is how jstack and other utilities communicate with the JVM.
You cannot change the path: it is hardcoded in the JVM source code. Nor can you force the JVM to regenerate it, because the Attach Listener is initialized only once in the HotSpot lifetime.
jstack -F works in a quite different way.
In order to check whether Java process works fine, I suggest using JMX remote.
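If you still want to know up front whether jstack has a chance of attaching, a monitoring script could check for the socket file before calling it. This is only a sketch: the function name is made up, and it assumes the socket lives at `/tmp/.java_pid<pid>` as described above.

```python
import os
import stat


def attach_socket_exists(pid, tmp_dir="/tmp"):
    """Return True if the HotSpot attach socket for `pid` is present."""
    path = os.path.join(tmp_dir, ".java_pid%d" % pid)
    try:
        # The entry must exist and actually be a Unix domain socket
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False
```

A missing socket does not always mean attach is broken: a JVM that nobody has attached to yet has no socket file until the first attach request.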
I am wondering how the jps tool gets the name of the main class being executed within a JVM process. For example:
jps -l
123 package.MainClass
456 /path/example.jar
I am talking specifically about Linux (I am not interested in Windows, and I have no Win machine to experiment on).
I could think of 2 ways
Connecting to the JVM in question which in turn tells it
From /proc file system
Regarding the first alternative, is it using local JMX connection? Still, it must go to /proc for the pids.
There is PID, so it must ask OS anyway
jps lists also itself
Regarding the second alternative, I feel this could be the correct one, because
On the command line, there is either -jar or MainClass
/proc knows the PID very well
Before jps starts doing anything, it has its own folder in /proc
But I am facing a small problem here. When the java command is very long (e.g. there is an extremely long -classpath parameter), the information about the command line does not fit into the space reserved for it in /proc. My system has 4 kB for it, and from what I learned elsewhere, this is hardwired in the OS code (changing it requires recompiling the kernel). However, even in this case jps is still able to get the main class from somewhere. How?
I need to find a quicker way to get the JVM processes than calling jps. When the system is heavily loaded (e.g. when a number of JVMs start), jps gets stuck for several seconds (I have seen it wait for ~30s).
jps scans through /tmp/hsperfdata_<username>/<pid> files that contain monitors and counters of running JVMs. The monitor named sun.rt.javaCommand contains the string you are looking for.
To find out the format of PerfData file you'll have to look into JDK source code.
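As a rough sketch of reading those files without starting a JVM, the snippet below scans the hsperfdata directories and pulls out the sun.rt.javaCommand value. It does not parse the PerfData format properly. It relies on a heuristic that should be checked against the JDK source: that the counter's string value follows its NUL-terminated name. The function names are made up for this example.

```python
import glob
import os

MARKER = b"sun.rt.javaCommand\x00"


def java_command_from_perfdata(data):
    """Heuristically pull the sun.rt.javaCommand string out of PerfData bytes."""
    start = data.find(MARKER)
    if start < 0:
        return None
    # Skip the counter name and any zero padding, then read up to the next NUL
    rest = data[start + len(MARKER):].lstrip(b"\x00")
    return rest.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


def list_jvms(tmp_dir="/tmp"):
    """Map PID -> java command for every readable hsperfdata file."""
    jvms = {}
    for path in glob.glob(os.path.join(tmp_dir, "hsperfdata_*", "*")):
        name = os.path.basename(path)
        if not name.isdigit():
            continue
        try:
            with open(path, "rb") as f:
                jvms[int(name)] = java_command_from_perfdata(f.read())
        except OSError:
            continue  # JVM exited or the file belongs to another user
    return jvms
```

Because this only reads small files from /tmp, it avoids the JVM startup cost of jps, which may help on a loaded system.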
Goal: I have a client-server program in which the client and the server run in different JVMs.
To test this, I want to start the server in a different JVM programmatically and then use the current JVM to run the client and execute different client/server tests.
Is there any way I can execute a method or run Java commands in a different JVM programmatically?
1) The most powerful tool in java to run process is ProcessBuilder:
ProcessBuilder pb = new ProcessBuilder("java", "-server", "-jar", "yourJar.jar");
Process p = pb.start();
Then, using the Process object, you can manipulate the child process, e.g. read its InputStream, destroy it, etc.
2) If you are able to edit both code bases, review this question to build efficient communication between JVMs on the same host.
If you cannot change the code, simply create your own loader that loads the Server, implements the inter-JVM communication and invokes the methods you need, since it runs in the same JVM space.
You can run virtually any command which you otherwise run manually using
Runtime.getRuntime().exec(command);
For more refer to Runtime.getRuntime().exec(...) documentation.
But also note that running any platform-specific command using exec will rob your program of its platform-independent nature.
Some time back I saw someone using "mv" to move a file. That made the entire program specific to Unix-based operating systems. The charm of Java, or of any virtual-machine-based language, is its platform-independent nature.
You can use command line:
Runtime.getRuntime().exec("java -server MyServer")
or if you want to build some more complicated call just use http://commons.apache.org/exec/ to build and run program.
From what I know, this is not possible with plain java. Probably grid enabled frameworks could provide a way of running a java program on multiple JVMs. A similar problem was resolved here:
how-to-run-a-java-file-project-in-remote-jvm-which-is-present-in-other-network
I have a question related to PHP and Java programming.
I am going to develop a web application for unit testing.
PHP is the language I'll use for the web side, and it will call the exec() function.
As I read, the function returns the output of the execution, which I do need.
But that's not enough: I would also like to know how much memory is used during the execution.
The web application will run on an Apache web server on a native Linux operating system (preferably Ubuntu).
This is the second case:
If there is a Java source containing a program that requires user input during execution, how can I execute it via a web server and also pass all the lines that should act as the user input?
The next problem is that the exec() function only accepts parameters inline.
How if I want
So, does anyone have an idea how to do these things?
The /usr/bin/time program (documented in time(1)) can return the amount of memory used during execution:
$ /usr/bin/time echo hello
hello
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 2512maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps
You can see that the echo(1) program required 2.5 megabytes of memory and ran very quickly. Larger programs will be more impressive:
$ /usr/bin/time jacksum --help
Unknown argument. Use -h for help. Exit.
Command exited with non-zero status 2
0.08user 0.03system 0:00.87elapsed 12%CPU (0avgtext+0avgdata 57456maxresident)k
25608inputs+64outputs (92major+4072minor)pagefaults 0swaps
jacksum is a Java-based program, so it took 57 megabytes to tell me I screwed up the command line arguments. That's more like it.
You might also find the BSD process accounting system worthwhile. See lastcomm(1), sa(8), and dump-acct(8) for more information.
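The same two ideas (feeding input lines to the program and reading its peak memory afterwards) can be sketched in a few lines. The example is in Python rather than PHP, purely to illustrate the mechanism. It assumes Linux, where ru_maxrss is reported in kilobytes, and the function name is hypothetical.

```python
import resource
import subprocess
import sys


def run_with_input(command, input_text):
    """Run `command`, feed `input_text` to its stdin, return (stdout, peak_kb)."""
    result = subprocess.run(
        command,
        input=input_text,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # High-water mark over all children waited for so far, not only this one
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return result.stdout, peak_kb


if __name__ == "__main__":
    # Stand-in for a Java program that reads one line of user input
    out, peak = run_with_input([sys.executable, "-c", "print(input())"], "hello\n")
    print(out.strip(), peak)
```

Since the value covers every child the process has waited for, a long-lived web worker would need a fresh process per measurement (or /usr/bin/time as above) to get a per-program figure.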