I have covered lots of StackOverflow questions and Google search results and read many discussion topics, but I couldn't find a proper answer to my question. I have a sparse matrix in .mat format containing 36600 nodes (a 36600x36600 adjacency matrix) that I need to read and manipulate (e.g., matrix-vector multiplication) in a Java environment. I applied many of the answers discussed here, but I always got NullPointerException errors even though there was data in the .mat file. (Some say this is because of the size of the data.) I applied the following code to my .mat file, and it returned null and a NullPointerException.
// JMatIO imports: com.jmatio.io.MatFileReader, com.jmatio.types.MLArray
MatFileReader matfilereader = new MatFileReader("sourceData.mat");
MLArray mlArrayRetrieved = matfilereader.getMLArray("data"); // returns null if the file has no variable named "data"
System.out.println(mlArrayRetrieved);
System.out.println(mlArrayRetrieved.contentToString()); // NullPointerException when the lookup above returns null
I have also tried many times to convert the .mat file to .csv or .xls in MATLAB and in Python (in a Jupyter notebook), but I did not get any results that way either.
The .mat file holds an adjacency matrix that will be the source for a specific algorithm in a Cytoscape project. Hence, I must use it in a Java environment, and I have decided to use the COLT library for matrix manipulations. Suggestions and advice would help me greatly. Thanks for reading.
Just use find to get the rows, columns, and values of the nonzero elements, and save them as text, CSV, or similar:
[row, col, v] = find(my_sparse_matrix);
writematrix([row, col, v], 'triplets.csv');  % R2019a+; the file name is just an example
Below is a code snippet using MFL that produces a MATLAB-like printout of all values in your sparse matrix:
// MFL imports: us.hebi.matlab.mat.format.Mat5, us.hebi.matlab.mat.types.Sparse
Mat5.readFromFile("sourceData.mat")
        .getSparse("data")
        .forEach((row, col, real, imag) -> {
            System.out.println(String.format("(%d,%d) \t %1.4f", row + 1, col + 1, real));
        });
The CSV workaround will work fine for the mentioned 750KB matrix, but it would likely become difficult to work with once data sets grow beyond 50MB. MAT files store sparse data in a binary Compressed Sparse Column (CSC) format, which can be loaded with significantly less overhead than CSV files.
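To connect this back to the COLT plan mentioned in the question, here is a minimal sketch that loads the nonzero entries straight into a COLT sparse matrix and runs one matrix-vector multiplication. It assumes MFL and COLT are both on the classpath; the file name, variable name, and size come from the question.

import cern.colt.matrix.DoubleMatrix1D;
import cern.colt.matrix.impl.DenseDoubleMatrix1D;
import cern.colt.matrix.impl.SparseDoubleMatrix2D;
import us.hebi.matlab.mat.format.Mat5;
import us.hebi.matlab.mat.types.Sparse;

public class SparseAdjacencyDemo {
    public static void main(String[] args) throws Exception {
        int n = 36600; // 36600x36600 adjacency matrix, per the question
        SparseDoubleMatrix2D adjacency = new SparseDoubleMatrix2D(n, n);

        // Copy each nonzero entry from the MAT file into the COLT matrix
        Sparse data = Mat5.readFromFile("sourceData.mat").getSparse("data");
        data.forEach((row, col, real, imag) -> adjacency.setQuick(row, col, real));

        // Matrix-vector multiplication: y = A * x
        DoubleMatrix1D x = new DenseDoubleMatrix1D(n).assign(1.0);
        DoubleMatrix1D y = adjacency.zMult(x, null);
        System.out.println(y.zSum());
    }
}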
I am trying to program an auralization via ray tracing in Processing. To edit a sample using the information from the ray tracer, I need to convert a .wav file (file format: PCM-signed, 16-bit, stereo, 2 bytes/frame, little-endian) to a float array.
I loaded the audio into a byte array via an AudioInputStream and a DataInputStream.
Then I convert the byte array to a float array like this:
byte[] samples;
float[] audio_data = float(samples); // Processing's built-in array conversion
When I convert the float array back to a .wav file, I get the sound of the original audio file.
But when I add another float array to the original signal and convert it back to a .wav file via the method above (even if I add the same signal), I get a white-noise signal instead of the wanted signal (I can hear the original signal underneath the white noise, but very, very quietly).
I have read about this problem before: things can go wrong in the conversion from the float array back to a byte array. A float is a 32-bit datatype and a byte (in Java) is only 8 bits, so each 16-bit sample spans two bytes, and if the bytes get combined wrongly, white noise is the result. In Processing there is a datatype with signed 16-bit integers (named "short"), but then I can't modify the amplitude anymore, because for that I need float values, which I can't convert to short.
I also tried to handle the overflow (amplitude) in the float array by scaling the signal from 16-bit values (-32768/32767) to values from -1 to 1 and back again after mixing (adding) the signals. The result still gave me white noise, and when I added more than two signals it gave me nothing (nothing to hear).
The concrete problem I want to solve is to add many signals (more than 1000, each with a suitable delay, to create a kind of reverberation) in the form of float arrays. Then I want to combine them into one float array that I can save as an audio file without white noise.
I hope you guys can help me.
If you have true PCM data points, there should be no problem using simple addition. The only issue is that on rare occasions (assuming your audio is not too hot to begin with) the values will go out of range. This will tend to create a harsh distortion, not white noise. The fact that you are getting white noise suggests to me that maybe you are not converting your PCM sums back to bytes correctly for the format that you are outputting.
Here is some code I use in AudioCue to convert PCM back to bytes. The format is assumed to be 16-bit, 44100 fps, stereo, little-endian. I'm working with PCM as normalized floats. This algorithm does the conversion for a buffer's worth of data at a time.
for (int i = 0, n = buffer.length; i < n; i++)
{
    buffer[i] *= 32767;                                // scale normalized float to the 16-bit range
    audioBytes[i*2] = (byte) buffer[i];                // low byte first (little-endian)
    audioBytes[i*2 + 1] = (byte)((int)buffer[i] >> 8); // high byte
}
Sometimes, a function like Math.min(Math.max(audioval, -1), 1) or Math.min(Math.max(audioval, -32767), 32767) is used to keep the values in range. More sophisticated limiters or compressor algorithms will scale the volume to fit. But still, if this is not handled, the result should be distortion, not white noise.
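For illustration, here is a hypothetical mixing helper along those lines: it sums two normalized float buffers and hard-clamps the result to [-1, 1] before a byte conversion like the one above is applied (assuming both arrays have the same length).

// Mix two normalized PCM buffers; clamp to [-1, 1] to avoid wrap-around distortion.
public static float[] mix(float[] a, float[] b) {
    float[] out = new float[a.length];
    for (int i = 0; i < a.length; i++) {
        float sum = a[i] + b[i];
        out[i] = Math.min(Math.max(sum, -1f), 1f);
    }
    return out;
}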
If the error is happening at another stage, we will need to see more of your code.
All this said, I wish you luck with the 1000-point echo array reverb. I hadn't heard of this approach working. Maybe there are processors that can handle the computational load now? (Are you trying to do this in real time?) My only success with coding real-time reverberation has been to use the Schroeder method, plugging in the structure and values from the CCRMA Freeverb, working off of code from Craig Lindley's now-ancient (copyright 2001) book "Digital Audio with Java". Most of that book deals with obsolete GUI code (pre-Swing!), but the code he gives for all-pass and comb filters is still valid.
I recall that when I was working on this, I tracked down references for a better reverb to try to code, but I would have to do some real digging to find my notes. I was feeling over my head at the time, as the algorithm was presented via block diagrams, not coding details or even pseudo-code. I would like to work on this again, though, and get a better reverb than the Schroeder type working. The Schroeder was passable for sounds that were not too percussive.
Getting a solution for real-time ray tracing would be a valuable accomplishment, with many applications in AR/VR and games.
I need to massage the data in a 131072x1 int32 matrix into a Java List<Integer> from within MATLAB. So far, the only working conversion I've come up with is to loop through the values and add them to a LinkedList directly:
count = size(data_flattened, 1);
ll = java.util.LinkedList;
for i = 1:count
    ll.add(data_flattened(i));
end
This is slow in the extreme (5 seconds). I've tried several formulations that convert first to a Java array and then to a List, but I always end up with an array with 1 column and 131072 rows.
I need a way of quickly assigning an N-by-1 Matlab matrix of int32s to a Java List<Integer> type.
Convert to a cell
One way I found to get MATLAB to behave the way I want is to convert the matrix to cells:
cells = num2cell(data_flattened);
the_list = java.util.Arrays.asList(cells)
It is faster than looping through the array and appending to the list, but it is still too slow: on average 0.25 seconds per conversion, which is better but still too high.
Java 8 Stream
After some research and testing, I implemented a function in Java that handles the conversion from an int[] to a List<Integer> in reasonable time (0.001 seconds):
// Requires: java.util.List, java.util.stream.Collectors, java.util.stream.IntStream
public static List<Integer> flatten(int[] arr) {
    return IntStream.of(arr).parallel().boxed().collect(Collectors.toList());
}
To use Java 8 you'll need to point your MATLAB_JAVA environment variable at the newer JRE. On a Mac, the location of your JRE can be found using java_home:
/usr/libexec/java_home
Then in .bashrc or similar
export MATLAB_JAVA="$(/usr/libexec/java_home)/jre"
Launching MATLAB from the terminal will now correctly pick up the new JRE.
In MATLAB you can check your Java version:
version -java
and then, in MATLAB:
matlab_data_flattened = matlab_data(:);
java_list = com.my.package.ClassName.flatten(matlab_data_flattened);
I'm running into the "code too large" compiler error due to my extremely large lookup table, based on this definition:
// 92 x 182 array
private static final double[][] lookUpTable = new double[][]
{
    { numbers.... }
};
As I understand it, dividing it up is a solution, but it would be extremely difficult to split this array up accurately. I also believe I could move it out to a file, but I don't know if I could format it in a way that helps me, plus I don't want file reads every second.
Are there any other suggestions to help me get around this?
Convert your table to a file, embed the file as a resource, read it once in a static initialization block, and store it in a lookUpTable array. It will not be distinguishable from an array initialized through an aggregate, except there will be no 65535 limit. Storing it in a static variable will help you avoid "reads every second".
As far as the format is concerned, you can put each row of the matrix on a separate line of the resource file. Reading and maintaining this file would be simple, because there would be no other markup around your numbers.
Here is a link to an answer explaining how to read a file from a resource.
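A minimal sketch of that approach, assuming a whitespace-separated resource named lookUpTable.txt with one matrix row per line (both the name and the layout are assumptions):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class LookupTable {
    private static final double[][] lookUpTable;

    static {
        // Load the table once, at class initialization time.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                LookupTable.class.getResourceAsStream("/lookUpTable.txt")))) {
            List<double[]> rows = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.trim().split("\\s+");
                double[] row = new double[tokens.length];
                for (int i = 0; i < tokens.length; i++) {
                    row[i] = Double.parseDouble(tokens[i]);
                }
                rows.add(row);
            }
            lookUpTable = rows.toArray(new double[0][]);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}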
Read the file once on demand.
As you have a table/matrix, I suggest having one line per row. Read each line, split the numbers, and parse them individually.
You could keep the rows in strings (thus reducing the number of objects for Java to handle) as comma-separated values and, on program start, split each row to build up your table of doubles.
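A minimal sketch of that idea, with hypothetical values; each string literal holds one comma-separated row and is parsed once in a static block:

public class InlineTable {
    // One string per row keeps the constant count low; each string literal
    // has its own 65535-byte limit, but a row of 182 values fits easily.
    private static final String[] ROWS = {
        "1.25,2.5,3.75",  // hypothetical row 0
        "4.0,5.0,6.0"     // hypothetical row 1
    };

    private static final double[][] TABLE = new double[ROWS.length][];

    static {
        for (int r = 0; r < ROWS.length; r++) {
            String[] tokens = ROWS[r].split(",");
            TABLE[r] = new double[tokens.length];
            for (int c = 0; c < tokens.length; c++) {
                TABLE[r][c] = Double.parseDouble(tokens[c]);
            }
        }
    }
}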
I'm trying to read a matrix produced in MATLAB into a 2D array in Java.
I've been using jmatio so far for writing from Java to a .mat file (successfully), but now I can't manage to go the other way around.
I've managed to import a matrix into an MLArray object using this code:
matfilereader = new MatFileReader("filename.mat");
MLArray j = matfilereader.getMLArray("dataname");
But other than getting its string representation, I couldn't manage to access the data itself. I found no example of this or documentation on the library itself. I actually wrote a function to parse the entire string into a double[][] array, but that's only good if the matrix is smaller than 1000 items...
Would be grateful for any experience or tips,
thanks,
Amir
The MLArray returned by matfilereader.getMLArray has several subclasses for accessing the different kinds of data an MLArray object can hold.
To represent a double array, you can cast the MLArray to MLDouble:
MLDouble j = (MLDouble)matfilereader.getMLArray("dataname");
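From there, the values can be copied into a plain Java 2D array with getArray() (a short sketch reusing the variable from above):

double[][] matrix = j.getArray(); // copies the MATLAB matrix into a double[][]
System.out.println(matrix[0][0]); // first element, MATLAB's (1,1)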
I'm not familiar with that tool, but it's pretty old. Try saving to an older version of the .mat format and see if your results change. That is, add either the '-v7' or '-v6' flag when you save your .mat file.
Example code:
save filename var1 var2 -v7
or
save filename var1 var2 -v6
Task 1: Read each row from one CSV file into its own separate txt file.
Task 2: The reverse: in one folder, read the text from each txt file and write it as one row in a single CSV file. So, read all txt files into one CSV file.
How would you do this? Would Java or Python be good for getting this task done quickly?
Update:
For Java, there are already some quite useful libraries you can use, for example opencsv or javacsv. If you have no knowledge of CSV, better have a look at the Wikipedia article on CSV first. And this post tells you all the possibilities in Java.
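For example, a minimal sketch of Task 1 with opencsv; the file names are hypothetical, and newer opencsv versions also declare CsvValidationException on readNext():

import com.opencsv.CSVReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class SplitCsv {
    public static void main(String[] args) throws Exception {
        // Write each row of input.csv to its own text file.
        try (CSVReader reader = new CSVReader(new FileReader("input.csv"))) {
            String[] row;
            int rowNumber = 0;
            while ((row = reader.readNext()) != null) {
                try (PrintWriter out = new PrintWriter("row_" + rowNumber + ".txt")) {
                    out.println(String.join(",", row));
                }
                rowNumber++;
            }
        }
    }
}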
Note: Due to the simplicity of the question, someone presumed this is homework. I hereby declare it is not.
More background: I am working on my own machine-learning experiments and setting up a large-scale test set. I need crawling, scraping, and file-type transfer as the basic utilities for the experiment. I am building a lot of things by myself for now, and I suddenly want to learn Python due to some recent discoveries and the feeling that Python is more concise than Java for many parsing and file-handling situations. Hence this question.
I just want to save time for both you and me by getting to the gist without stating the not-so-related background. My question is really more about the second part, "Java vs Python": I ran into a few lines of Python code using some CSV library (I'm not sure which one, which is why I asked), but I just don't know how to use Python. Those are all the reasons why I asked this question. Thanks.
From what you write, there is little need to use something specific to CSV files. In particular, for Task 1 this is a pure data I/O operation on text files. In Python, for instance:
for i, l in enumerate(open(the_file)):
    f = open('new_file_%i.csv' % i, 'w')
    f.write(l)
    f.close()
For Task 2, if you can guarantee that each file has the same structure (the same number of fields per row), it is again a pure data I/O operation:
from glob import glob

files = glob('file_*.csv')
target = open('combined.csv', 'w')
for f in files:
    target.write(open(f).read())
    target.write(new_line_separator_for_your_platform)
target.close()
Whether you do this in Java or Python depends on the availability on the target system and your personal preference only.
In that case I would use Python, since it is often more concise than Java. Plus, CSV files are really easy to handle in Python without installing anything; I don't know whether the same is true for Java.
Task 1
It would roughly be this, based on an example from the official documentation:
import csv

with open('some.csv', 'r') as f:
    reader = csv.reader(f)
    for rownumber, row in enumerate(reader):
        g = open("anyfile" + str(rownumber) + ".txt", "w")
        g.write(','.join(row))  # row is a list of fields, so join it back into one line
        g.close()
Task 2
f = open("csvfile.csv","w")
dirList=os.listdir(path)
for fname in dirList:
if fname[-4::] == ".txt":
g = open("fname")
for line in g: f.write(line)
g.close
f.close()
In Python:
Task 1:
import csv

with open('file.csv', 'r') as df:
    reader = csv.reader(df)
    for rownumber, row in enumerate(reader):
        with open(str(rownumber) + '.txt', 'w') as f:
            f.write(','.join(row))
Task 2:
from glob import glob

with open('output.csv', 'w') as output:
    for f in glob('*.txt'):
        with open(f) as myFile:
            output.write(myFile.read())
You will need to adjust these for your use cases.