Parallel file I/O in Java

I have a Fortran program that is calling into Java using JNI. My Java function receives an array, writes the array to a file, makes a system call to a Python function that computes something and writes the result to a file which in turn is read by the Java function and passed back to Fortran. This works as expected.
Unfortunately, I cannot use Jython because Jython does not support NumPy yet.
The serial implementation of my program works as expected but when I run the parallel implementation of Fortran code that uses OpenMP, file I/O is messed up. Is there any way I can safely read/write from files with the parallel implementation?

I assume that you use hard-coded filenames. The problem is that all active threads are using the same files to pass data to the next program. Try to separate them. If you are running 3 OpenMP threads, then you need 3 files for data transfer.
For separation you could name your files based on UUIDs and pass that filename to your python program as a parameter.
String filename = "myFile" + UUID.randomUUID() + ".dat";
Process p = Runtime.getRuntime().exec("python myProgram.py " + filename);
p.waitFor();
Python program:
import sys
print('using file: ' + sys.argv[1])  # sys.argv[0] is the script name, sys.argv[1] the filename passed by Java
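To make the Python side of this scheme concrete, here is a minimal sketch. It assumes the data file holds one number per line and that the result goes to a sibling file with a hypothetical `.out` suffix; `process_file`, `unique_data_path`, and the doubling step are placeholders for illustration, not part of the original program.

```python
import os
import tempfile
import uuid


def unique_data_path(directory=None):
    """Build a collision-free file name, mirroring the UUID idea on the Java side."""
    directory = directory or tempfile.gettempdir()
    return os.path.join(directory, f"myFile{uuid.uuid4()}.dat")


def process_file(in_path):
    """Read one number per line from in_path and write results to in_path + '.out'.

    The '.out' suffix and the doubling step are illustrative assumptions.
    """
    with open(in_path) as src:
        values = [float(line) for line in src if line.strip()]
    out_path = in_path + ".out"
    with open(out_path, "w") as dst:
        for v in values:
            dst.write(f"{v * 2}\n")
    return out_path
```

In the real script the call would be `process_file(sys.argv[1])`. Because every thread hands over its own file name, two threads never read or write the same file.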

Related

Call a java function from a python script

I have a Java program with a function that I wrote: I pass it a list of strings, it encrypts each element, and it returns a list of the encrypted elements.
My problem is this:
I need to use this function in a python script (send a "list" Python object as input, and receive an "ArrayList" Java object).
How can I call a Java function - that I wrote, in a python script?
And are list objects compatible between Python and Java (list vs. ArrayList)?
A big thank you to all!
Edit: I'm going to use this entire package in an AWS Lambda function.
The main decisions in choosing a solution seem to be:
What do we use to execute the Java program?
How do we transfer the computed data from the Java program to the Python program?
For example, you could run the Java program in a JVM launched from Python through a call to the operating system.
The computed data could then be sent to standard output (in some suitable format) and read in and processed by Python. (See link for the OS call and I/O.)
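As a rough sketch of the "launch via the OS and read standard output" route, the helper below sends a Python list to a child process as a JSON array on stdin and parses a JSON array from its stdout. JSON as the exchange format and the name `call_external` are assumptions made for illustration; the Java program would have to read and print JSON itself.

```python
import json
import subprocess


def call_external(cmd, items):
    """Send items as a JSON array on stdin and parse a JSON array from stdout."""
    result = subprocess.run(
        cmd,
        input=json.dumps(items),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

A call might look like `call_external(["java", "-cp", "encryptor.jar", "Encryptor"], ["a", "b"])`, where the jar and class names are hypothetical. With this approach no Java object crosses the process boundary: Python only sees the text the Java program prints, so the list vs. ArrayList question reduces to agreeing on a text format.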

How to get output from JNI (native) method implemented in C back to Java

I have a third-party native subroutine in C that wants to output its results to a file descriptor. I can choose the file descriptor (including standard output, stdout, if I want) by passing it to the C subroutine, but I cannot have it output to anything other than a file descriptor. I have created my own C code to act as an interface between Java and this C subroutine.
How can I get the output of the C subroutine back to Java directly... that is, without writing it to a file then reading it back in?
The output is usually in the neighborhood of 20-30k and is plain human-readable text, so it isn't terribly large and is meaningful to my users. I'd like to display the output to the user in a console-like Java window as it is being output. Is this possible? And how?
I see Java has a FileDescriptor object... is this part of the solution? I do not want to plug my C fd into the FileDescriptor object (using a private field of FileDescriptor) as described here: http://www.kfu.com/~nsayer/Java/jni-filedesc.html
I'd like to do this using Java best practices and idioms.
If you're on a Unix-like system, consider creating a named pipe and then opening it in both native and Java code as though it were a normal file.
This is similar to using a temporary file, except that the data goes directly from the writer to the reader without being stored on disk.
The downside is that you will still have to manage a temporary named pipe.
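The mechanics of a named pipe can be sketched in a few lines of Python (Unix only). Here a writer thread stands in for the native code and the reading side stands in for the Java code; `read_through_fifo` and the file name `results.fifo` are made up for this illustration.

```python
import os
import tempfile
import threading


def read_through_fifo(produce):
    """Create a temporary named pipe, let produce(path) write to it, return the text read."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "results.fifo")
    os.mkfifo(fifo_path)  # Unix only
    try:
        # The writer runs in a thread because opening a FIFO blocks
        # until the other end has been opened as well.
        writer = threading.Thread(target=produce, args=(fifo_path,))
        writer.start()
        with open(fifo_path) as reader:
            text = reader.read()  # returns once the writer closes its end
        writer.join()
        return text
    finally:
        # This is the "manage a temporary named pipe" part: clean up afterwards.
        os.remove(fifo_path)
        os.rmdir(workdir)
```

The sketch assumes the writer always opens the pipe; if it fails before doing so, the reader blocks forever, so real code would need a timeout or another way out.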

Pass Value between Java and C

I have a Java GUI and a C program. The C program calculates a value from its arguments (args). I want to call the C calculator from Java and set a label in the GUI to the returned value (the label changes almost every second).
I thought of some ways:
Socket programming (probably not efficient on the same computer)
File operations (Java writes the input, C calculates, then Java reads the calculated value)
JNICALL stuff, but I don't know how to get a value from the C file.
Shared memory (I have no idea about Windows shm)
What is the best way to solve this problem?
You could take a look at JNA, if your C program is a DLL with exported functions.
I remember from a past project that it works very nicely, without having to do the JNI bits yourself.
I had to do something similar at my job, I wrote a helper program in C++ to access some custom made database and my main application was in Java. The database API was only available to C++ programs.
What I did was to start the C++ process from Java and communicate via command line args and stdout.
List<String> cmdPathAndArgs = Arrays.asList("/path/to/binary", "--arg1", "--arg2");
ProcessBuilder pb = new ProcessBuilder(cmdPathAndArgs);
Process pp = pb.start();
InputStream stream = pp.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(stream));
InputStream errorStream = pp.getErrorStream();
BufferedReader errorBr = new BufferedReader(new InputStreamReader(errorStream));
If your program generates output on both stdout and stderr, you may get into a deadlock: the child blocks as soon as one pipe's buffer fills up while you are waiting on the other. You should drain both streams regularly.
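The same pitfall exists in any language that spawns a child with two pipes. As a minimal Python sketch of the "drain both streams" advice, `communicate()` reads stdout and stderr concurrently; the helper name `run_and_collect` is made up for this example. In Java the usual equivalents are reading each stream on its own thread or merging them with `ProcessBuilder.redirectErrorStream(true)`.

```python
import subprocess


def run_and_collect(cmd):
    """Run cmd and return (exit_code, stdout_text, stderr_text).

    communicate() reads stdout and stderr concurrently, so the child cannot
    stall on a full pipe buffer while the parent waits on the other stream.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    out, err = proc.communicate()
    return proc.returncode, out, err
```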
If you can build your C code into a shared library, then JNI is probably your best bet. Sockets will be more than fast enough to handle 1 update a second. Using files will probably be slower than sockets. Shared memory is fine too, however I think sockets is probably a simpler interface for what you want to do.

How to avoid loading a large file repeatedly?

I'm trying to call a Java program (Stanford Chinese Word Segmenter) from within python. The Java program needs to load a large (100M) dictionary file (word list to assist segmentation) which takes 12+ seconds. I was wondering if it is possible to speed up the loading process, and more importantly, how to avoid loading it repeatedly when I need to call the python script multiple times?
Here's the relevant part of the code:
op = subprocess.Popen(['java',
'-mx2g',
'-cp',
'seg.jar',
'edu.stanford.nlp.ie.crf.CRFClassifier',
'-sighanCorporaDict',
'data',
'-testFile',
filename,
'-inputEncoding',
'utf-8',
'-sighanPostProcessing',
'true',
'ctb',
'-loadClassifier',
'./data/ctb.gz',
'-serDictionary',
'./data/dict-chris6.ser.gz',
'0'],
stdout = subprocess.PIPE,
stdin = subprocess.PIPE,
stderr = subprocess.STDOUT,
)
In the above code, './data/ctb.gz' is the place where the large word list file is loaded. I think this might be related to process, but I don't know much about it.
You might be able to use an OS-specific solution here. Most modern operating systems can keep a partition in memory (a RAM disk). For example, in Linux you could do
mkfs -q /dev/ram1 8192
mkdir -p /ramcache
mount /dev/ram1 /ramcache
Moving the file to that directory would greatly speed up I/O.
There might be many ways to speed up the loading of the word list, but it depends on the details. If IO (disk read speed) is the bottleneck, then a simple way might be to zip the file and use a ZipInputStream to read it - but you would need to benchmark this.
To avoid multiple loading, you probably need to keep the Java process running, and communicate with it from Python via files or sockets, to send it commands, rather than actually launching the Java process each time from Python.
However, both of these require modifying the Java code.
If the Java program produces output as soon as it receives input from the named pipe filename, and you can't change the Java program, then you could keep your Python script running instead and communicate with it via files/sockets, as @DNA suggested for the Java process (the same idea, but the Python program keeps running).
import os
from subprocess import Popen, PIPE
# ...
os.mkfifo(filename)
p = Popen([..., filename, ...], stdout=PIPE)
with open(filename, 'w') as f:
    while True:
        indata = read_input()  # read text to segment from files/sockets, etc.
        f.write(indata)
        f.flush()  # push the data into the pipe now instead of leaving it in Python's buffer
        # read response from java process
        outdata = p.stdout.readline()  # you need to figure out when to stop reading
        write_output(outdata)  # write response via files/sockets, etc.
You can run a single instance of the JVM and use named pipes to allow the python script to communicate with the JVM. This will work assuming that the program executed by the JVM is stateless and responds on its stdout (and stderr perhaps) to requests arriving via its stdin.
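A minimal sketch of that pattern from the Python side, using plain stdin/stdout pipes rather than named pipes, and assuming the long-running program answers every input line with exactly one output line and flushes after each answer. `LineWorker` is a hypothetical helper; whether the segmenter actually behaves this way would need to be checked.

```python
import subprocess


class LineWorker:
    """Keep one child process alive and exchange one line per request."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on the parent side
        )

    def ask(self, line):
        """Send one request line and block until one response line arrives."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()  # EOF tells the child to exit
        self.proc.stdout.close()
        self.proc.wait()
```

The expensive startup (loading the dictionary) then happens once in the constructor, and every later `ask()` call reuses the already running process.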
Why not track whether the file has already been read on the python side? I'm not a python whiz, but I'm sure you could have some list or map/dictionary of all the files that have been opened so far.

MATLAB - Delete elements of binary files without loading entire file

This may be a stupid question, but Google and MATLAB documentation have failed me. I have a rather large binary file (>10 GB) that I need to open and delete the last forty million bytes or so. Is there a way to do this without reading the entire file to memory in chunks and printing it out to a new file? It took 6 hours to generate the file, so I'm cringing at the thought of re-reading the whole thing.
EDIT:
The file is 14,440,000,000 bytes in size. I need to chop it to 14,400,000,000.
There is no ftruncate() in Matlab, but you've got access to the full Java standard library in the JVM embedded in Matlab, and can use java.io.RandomAccessFile or the Java NIO classes to truncate a file.
Here's a Matlab function that calls to Java to lop the last n bytes off a file. Should have minimal I/O cost.
function remove_last_n_bytes_from_file(file, n)
jFile = java.io.RandomAccessFile(file, 'rw');
currentLength = jFile.length();
wantLength = currentLength - n;
fprintf('Truncating file %s: Resizing to %d to remove %d bytes\n', file, wantLength, n);
jFile.setLength(wantLength);
jFile.close();
You could also do it as a one-liner. Note that here n is the desired final length in bytes, not the number of bytes to remove.
java.io.RandomAccessFile('/path/to/my/file.bin', 'rw').setLength(n);
I found Perl is much quicker to do this than MATLAB.
Here are two examples from Perl Cookbook:
truncate(HANDLE, $length)
or die "Couldn't truncate: $!\n";
truncate("/tmp/$$.pid", $length)
or die "Couldn't truncate: $!\n";
You can run a Perl script from MATLAB with the perl function.
Since you don't want to read the file into MATLAB (understandably), you are dealing with system-level commands. MATLAB can call system commands through its system function.
So now your problem is reduced to finding the shell command in your OS that will do it for you. Or you can write a program using truncate() (Unix, as KennyTM points out) or SetEndOfFile (Windows).
I don't know if MATLAB supports this, but see ftruncate() and truncate().
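For comparison, here is what a truncate()-based helper could look like as a small external program, sketched in Python with the standard library's `os.truncate`. The function name `remove_last_n_bytes` is made up for this example; the file's contents are never read.

```python
import os


def remove_last_n_bytes(path, n):
    """Shrink the file at path by n bytes without reading its contents."""
    current = os.path.getsize(path)
    if n < 0 or n > current:
        raise ValueError("n must be between 0 and the current file size")
    os.truncate(path, current - n)
    return current - n
```

For the file in the question, n would be 40,000,000 (14,440,000,000 minus 14,400,000,000).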
