I have a TensorFlow program running in Python, and for convenience reasons I want to run the same program in Java, so I have to save my model and load it in my Java application.
My problem is that I don't know how to save a Tensor object. Here is my code:
class Main:
    def __init__(self, checkpoint):
        ...
        self.g = tf.Graph()
        self.sess = tf.Session()
        self.img_placeholder = tf.placeholder(tf.float32,
                                              shape=(1, 679, 1024, 3), name='img_placeholder')
        # self.preds is an instance of Tensor
        self.preds = transform(self.img_placeholder)
        self.saver = tf.train.Saver()
        self.saver.restore(self.sess, checkpoint)

    def ffwd(...):
        ...
        _preds = self.sess.run(self.preds, feed_dict={self.img_placeholder: self.X})
        ...
So since I can't create my Tensor myself (the transform function creates the NN behind the scenes...), I'm obliged to save it and reload it in Java. I have found ways of saving the session, but not Tensor instances.
Could someone give me some insight on how to achieve this?
Python Tensor objects are symbolic references to a specific output of an operation in the graph.
An operation in a graph can be uniquely identified by its string name. A specific output of that operation is identified by an integer index into the list of outputs of that operation. That index is typically zero since a vast majority of operations produce a single output.
To obtain the name of an Operation and the output index referred to by a Tensor object in Python you could do something like:
print(preds.op.name)
print(preds.value_index) # Most likely will be 0
And then in Java, you can feed/fetch nodes by name.
Let's say preds.op.name returned the string foo, and preds.value_index returned the integer 1, then in Java, you'd do the following:
// inputTensor is an org.tensorflow.Tensor built from your image data
session.runner().feed("img_placeholder", inputTensor).fetch("foo", 1).run()
(See javadoc for org.tensorflow.Session.Runner for details).
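For orientation, here is a minimal end-to-end sketch of what that looks like on the Java side, assuming the graph was exported from Python as a SavedModel; the export path, the "serve" tag, and the fetched name "foo" are placeholders, not values from your program:

import java.util.Arrays;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class StyleTransferClient {
    public static void main(String[] args) {
        // Load a graph previously exported from Python (path and tag are placeholders).
        try (SavedModelBundle bundle = SavedModelBundle.load("/path/to/exported_model", "serve")) {
            Session session = bundle.session();

            // Input matching the Python placeholder shape (1, 679, 1024, 3).
            float[][][][] image = new float[1][679][1024][3];

            try (Tensor<?> input = Tensor.create(image);
                 Tensor<?> output = session.runner()
                         .feed("img_placeholder", input)
                         .fetch("foo", 1)  // substitute preds.op.name and preds.value_index here
                         .run()
                         .get(0)) {
                System.out.println("Output shape: " + Arrays.toString(output.shape()));
            }
        }
    }
}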
You may find the slides linked to in https://github.com/tensorflow/models/tree/master/samples/languages/java along with the speaker notes in those slides useful.
Hope that helps.
I've programmed my own feed-forward network in Java, trained by backpropagation. The network is trained to learn the XOR problem. I have a 4x2 input matrix and a 4x1 target.
Inputs:
{{0,0},
{0,1},
{1,0},
{1,1}}
Outputs:
{0.95048}
{-0.06721}
{-0.06826}
{0.95122}
I have this trained network and now I want to test it on new inputs like:
{.1,.9} //should result in 1
However, I'm not sure how to implement a float predict(double[] input) method. From what I can see, my problem is that my training data has a different size than my input data.
Please suggest.
EDIT:
The way I have this worded, it sounds like I want a regression value. However, I'd like the output to be a probability vector (classification) which I can then analyze.
In your situation your neural network should have an input of dimension 2 and an output of dimension 1. So during training you provide each example as an input {x0, x1} and an output {y0} for it to learn. Then, when predicting, you can provide a vector such as {.1, .9} and get the desired output.
Basically, when you train a neural network, you get a set of parameters that can be used to predict the result. You get the result by taking the products of your features and weights in each layer, summing them, and then applying your activation function. For example, let's say your network has 3 layers (other than the features), each hidden layer has three neurons, and your output layer has one neuron. W1 denotes your weights for the first layer, so it has a shape of [3,2]. By the same argument, W2, the weights for your second layer, has a shape of [3,3]. Finally, W3, the weights for your output layer, has a shape of [1,3]. Now if we call your activation function g(z), you can calculate the result for an example like this:
Z1 = W1.X
A1 = g(Z1)
Z2 = W2.A1
A2 = g(Z2)
Z3 = W3.A2
A3 = g(Z3)
and A3 is your result, predicting the XOR of two numbers. Please note that I have not considered bias terms for this example.
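To tie this back to the float predict(double[] input) method from the question, here is a minimal sketch of that forward pass in Java. The class and field names, the sigmoid standing in for g(z) (your network may well use tanh, given the negative outputs above), and the absence of bias terms are my assumptions, mirroring the description above rather than the asker's actual code:

public class XorNetwork {
    // Trained weights, assumed shapes: W1 [3][2], W2 [3][3], W3 [1][3]
    private final double[][] w1, w2, w3;

    public XorNetwork(double[][] w1, double[][] w2, double[][] w3) {
        this.w1 = w1;
        this.w2 = w2;
        this.w3 = w3;
    }

    // Sigmoid activation, standing in for g(z) above
    private static double g(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Multiply a weight matrix by an activation vector and apply g element-wise
    private static double[] layer(double[][] w, double[] a) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double z = 0.0;
            for (int j = 0; j < a.length; j++) {
                z += w[i][j] * a[j];
            }
            out[i] = g(z);
        }
        return out;
    }

    // Forward pass for a single 2-element input, returning the single output A3
    public float predict(double[] input) {
        double[] a1 = layer(w1, input);
        double[] a2 = layer(w2, a1);
        double[] a3 = layer(w3, a2);
        return (float) a3[0];
    }
}

After training, calling new XorNetwork(w1, w2, w3).predict(new double[]{0.1, 0.9}) runs the same three layer computations as Z1 through A3 above.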
I'm new to Java programming, and I ran into this problem:
I'm creating a program that reads a .csv file, converts its lines into objects and then manipulates these objects.
Being more specific, the application reads every line, gives it an index, and also reads certain values from those lines and stores them in TRIE trees.
The application then can read indexes from the values stored in the trees and then retrieve the full information of the corresponding line.
My problem is that, even though I've been researching for the last couple of days, I don't know how to write these structures to binary files, nor how to read them back.
I want to write the lines (with their indexes) to a binary indexed file and read back only the exact index that I retrieved from the TRIEs.
For the tree writing, I was looking for something like this (in C)
fwrite(tree, sizeof(struct TrieTree), 1, file)
For the "binary indexed file", I was thinking on writing objects like the TRIEs, and maybe reading each object until I've read enough to reach the corresponding index, but this probably wouldn't be very efficient.
Recapitulating, I need help in writing and reading objects in binary files and solutions on how to create an indexed file.
I think you are (for starters) best off doing this with serialization.
Here is just one example from stackoverflow: What is object serialization?
(I think copy&paste of the code does not make sense, please follow the link to read)
Admittedly this does not yet solve your index creation problem.
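To give a rough feel for what native serialization involves (a minimal sketch, not the code from the linked answer; the Record class and the file name are invented for illustration):

import java.io.*;

public class SerializationDemo {
    // A record type to be stored; it must implement Serializable
    static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        long index;
        String line;

        Record(long index, String line) {
            this.index = index;
            this.line = line;
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Write one object to a binary file
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("records.bin"))) {
            out.writeObject(new Record(0, "first,csv,line"));
        }

        // Read it back
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("records.bin"))) {
            Record r = (Record) in.readObject();
            System.out.println(r.index + ": " + r.line);
        }
    }
}

ObjectOutputStream also lets you write many objects in sequence to the same stream, which is roughly what you would do per CSV line; the index-based lookup is still a separate problem, as noted above.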
Here is an alternative to Java native serialization, Google Protocol Buffers.
I am mostly going to quote directly from the documentation in this answer, so be sure to follow the link at the end of the answer if you are interested in more details.
What is it:
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
In other words, you can serialize your structures in Java and deserialize them in .NET, Python, etc. This is something you don't get with Java native serialization.
Performance:
This may vary according to use case, but in principle GPB should be faster, as it's built with performance and interchangeability in mind.
Here is stack overflow link discussing Java native vs GPB:
High performance serialization: Java vs Google Protocol Buffers vs ...?
How does it work:
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes.
You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
FileOutputStream output = new FileOutputStream(args[0]);
john.writeTo(output);
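Reading it back is the mirror image; parseFrom is another method generated by the protocol buffer compiler for every message type (a short sketch, with error handling omitted):

// Deserialize the message from the file written above
Person johnCopy = Person.parseFrom(new FileInputStream(args[0]));
System.out.println(johnCopy.getName() + " <" + johnCopy.getEmail() + ">");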
Read all about it here:
https://developers.google.com/protocol-buffers/
You could look at GPB as an alternative format to XSD describing XML structures, just more compact and with faster serialization.
I'm trying to read a matrix produced in Matlab into a 2D array in java.
I've been using jmatio so far for writing from Java to a .mat file (successfully), but now I can't manage to go the other way around.
I've managed to import a matrix into an MLArray object using this code:
matfilereader = new MatFileReader("filename.mat");
MLArray j = matfilereader.getMLArray("dataname");
But other than getting its string representation I couldn't manage to access the data itself. I found no example for this or documentation on the library itself, and I actually wrote a function to parse the entire string into a double[][] array, but that's only good if the matrix is smaller than 1000 items...
Would be grateful for any experience or tips,
thanks,
Amir
The MLArray returned by matfilereader.getMLArray has several subclasses that give access to the different kinds of data an MLArray object can hold.
To represent double array you can cast MLArray to MLDouble:
MLDouble j = (MLDouble)matfilereader.getMLArray("dataname");
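From what I remember of the JMatIO API, MLDouble then exposes the values directly; treat the exact method names as something to verify against your jmatio version:

double[][] matrix = j.getArray();  // the whole matrix as a Java 2D array
double first = j.get(0, 0);        // or element-by-element access by row and column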
I'm not familiar with that tool, but it's pretty old. Try saving to an older version of the *.mat format and see if your results change. That is, add either the '-v7' or '-v6' flag when you save your *.mat file.
Example code:
save filename var1 var2 -v7
or
save filename var1 var2 -v6
Okay, so I have been reading a lot about Hadoop and MapReduce, and maybe it's because I'm not as familiar with iterators as most, but I have a question I can't seem to find a direct answer to. Basically, as I understand it, the map function is executed in parallel by many machines and/or cores. Thus, whatever you are working on must not depend on prior code having been executed for the program to make any kind of speed gains. This works perfectly for me, but what I'm doing requires me to test information in small batches. Basically I need to send batches of lines of a .csv as arrays of 32, 64, 128 or however many lines each. Like lines 0-127 go to core1's execution of the map function, lines 128-255 go to core2's, etc. Also I need to have the contents of each batch available as a whole inside the function, as if I had passed it an array.
I read a little about how the new Java API allows for something called push and pull, and that this allows things to be sent in batches, but I couldn't find any example code. I don't know, I'm going to continue researching, and I'll post anything I find, but if anyone knows, could they please post in this thread? I would really appreciate any help I might receive.
EDIT:
If you could simply ensure that the chunks of the .csv are sent in sequence, you could do it this way. I guess this also assumes that there are globals in MapReduce.
//** concept not code **//

GLOBAL_COUNTER = 0;
GLOBAL_ARRAY = NEW ARRAY();

map()
{
    GLOBAL_ARRAY[GLOBAL_COUNTER] = ITERATOR_VALUE;
    GLOBAL_COUNTER++;

    if (GLOBAL_COUNTER == 127)
    {
        // EXECUTE TEST WITH AN ARRAY OF 128 VALUES FOR COMPARISON
        GLOBAL_COUNTER = 0;
    }
}
If you're trying to get a chunk of lines from your CSV file into the mapper, you might consider writing your own InputFormat/RecordReader and potentially your own WritableComparable object. With the custom InputFormat/RecordReader you'll be able to specify how objects are created and passed to the mapper based on the input you receive.
If the mapper is doing what you want, but you need these chunks of lines sent to the reducer, make the output key for the mapper the same for each line you want in the same reduce function.
The default TextInputFormat will give input to your mapper like this (the keys/offsets in this example are just random numbers):
0 Hello World
123 My name is Sam
456 Foo bar bar foo
Each of those lines will be read into your mapper as a key,value pair. Just modify the key to be the same for each line you need and write it to the output:
0 Hello World
0 My name is Sam
1 Foo bar bar foo
The first time the reduce function is called, it will receive a key/value pair with the key being "0" and the value being an Iterable object containing "Hello World" and "My name is Sam". You'll be able to access both of these values in the same reduce method call through the Iterable object.
Here is some pseudo code:
int count = 0

map (key, value) {
    int newKey = count / 2
    context.write(newKey, value)
    count++
}

reduce (key, values) {
    for value in values
        // Do something to each line
}
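Fleshed out a little, a new-API version of that idea might look like the sketch below; the class names, the IntWritable key type, and the batch size of 128 are my choices, untested:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class LineBatching {
    // Assigns the same key to every run of 128 consecutive lines seen by this mapper
    public static class BatchMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        private static final int BATCH_SIZE = 128;
        private long count = 0;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            int batchKey = (int) (count / BATCH_SIZE);
            context.write(new IntWritable(batchKey), value);
            count++;
        }
    }

    // Receives all lines of one batch together as an Iterable
    public static class BatchReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text line : values) {
                // test/compare the lines of this batch here
                context.write(key, line);
            }
        }
    }
}

Keep in mind that count is local to each map task, so lines are only batched within a single input split.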
Hope that helps. :)
If the end goal of what you want is to force certain sets to go to certain machines for processing, you want to look into writing your own Partitioner. Otherwise, Hadoop will split data automatically for you depending on the number of reducers.
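As a loose illustration only (the key type and the grouping rule are invented, not something Hadoop prescribes), a custom Partitioner is a single method that you register with job.setPartitionerClass:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Decides which reducer a given key/value pair is sent to;
// here, every four consecutive batch keys share a reducer.
public class BatchPartitioner extends Partitioner<IntWritable, Text> {
    @Override
    public int getPartition(IntWritable key, Text value, int numReduceTasks) {
        return (key.get() / 4) % numReduceTasks;
    }
}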
I suggest reading the tutorial on the Hadoop site to get a better understanding of M/R.
If you simply want to send N lines of input to a single mapper, you can use the NLineInputFormat class. You could then do the line parsing (splitting on commas, etc.) in the mapper.
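Setting that up only takes a couple of lines at job-creation time; a sketch using the new-API class, with 128 as an arbitrary batch size:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class BatchedJobSetup {
    public static Job createJob() throws IOException {
        Job job = Job.getInstance(new Configuration(), "batched-csv");
        // Hand each map task 128 consecutive lines of the input file
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 128);
        return job;
    }
}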
If you want to have access to the lines before and after the line the mapper is currently processing, you may have to write your own input format. Subclassing FileInputFormat is usually a good place to start. You could create an InputFormat that reads N lines, concatenates them, and sends them as one block to a mapper, which then splits the input into N lines again and begins processing.
As far as globals in Hadoop go, you can specify some custom parameters when you create the job configuration, but as far as I know, you cannot change them in a worker and expect the change to propagate throughout the cluster. To set a job parameter that will be visible to workers, do the following where you are creating the job:
job.getConfiguration().set(Constants.SOME_PARAM, "my value");
Then, to read the parameter's value in the mapper or reducer:
public void map(Text key, Text value, Context context) {
    Configuration conf = context.getConfiguration();
    String someParam = conf.get(Constants.SOME_PARAM);
    // use someParam in processing input
}
Hadoop has support for basic types such as int, long, string, bool, etc. to be used as parameters.
I have written a kernel density estimator in Java that takes input in the form of ESRI shapefiles and outputs a GeoTIFF image of the estimated surface. To test this module I need an example shapefile, and for whatever reason I have been told to retrieve one from the sample data included in R. Problem is that none of the sample data is a shapefile...
So I'm trying to use the shapefiles package's function convert.to.shapefile(4) to convert the bei dataset included in the spatstat package in R to a shapefile. Unfortunately this is proving to be harder than I thought. Does anyone have any experience in doing this? If you'd be so kind as to lend me a hand here, I'd greatly appreciate it.
Thanks,
Ryan
References:
spatstat,
shapefiles
There are converter functions for Spatial objects in the spatstat and maptools packages that can be used for this. A shapefile consists of at least points (or lines or polygons) and attributes for each object.
library(spatstat)
library(sp)
library(maptools)
data(bei)
Coerce bei to a Spatial object, here just points without attributes since there are no "marks" on the ppp object.
spPoints <- as(bei, "SpatialPoints")
A shapefile requires at least one column of attribute data, so create a dummy.
dummyData <- data.frame(dummy = rep(0, npoints(bei)))
Using the SpatialPoints object and the dummy data, generate a SpatialPointsDataFrame.
spDF <- SpatialPointsDataFrame(spPoints, dummyData)
At this point you should definitely consider what coordinate system bei uses and whether you can represent it with a WKT CRS (well-known text coordinate reference system). You can assign that to the Spatial object as another argument to SpatialPointsDataFrame, or after creation with proj4string(spDF) <- CRS("+proj=etc...") (but this is an entire problem all on its own that we could write pages on).
Load the rgdal package (this is the most general option as it supports many formats and uses the GDAL library, but it may not be available because of system dependencies).
library(rgdal)
(Use writePointsShape in the maptools package if rgdal is not available.)
The syntax is the object, then the "data source name" (here the current directory, this can be a full path to a .shp, or a folder), then the layer (for shapefiles the file name without the extension), and then the name of the output driver.
writeOGR(obj = spDF, dsn = ".", layer = "bei", driver = "ESRI Shapefile")
Note that the write would fail if "bei.shp" already existed, so it would have to be deleted first with unlink("bei.shp").
List any files that start with "bei":
list.files(pattern = "^bei")
[1] "bei.dbf" "bei.shp" "bei.shx"
Note that there is no general "as.Spatial" converter for ppp objects, since decisions must be made as to whether this is a point pattern with marks and so on - it might be interesting to try writing one that reports on whether dummy data was required and so on.
See the following vignettes for further information and details on the differences between these data representations:
library(sp); vignette("sp")
library(spatstat); vignette("spatstat")
A general solution is:
convert the "ppp" or "owin" classed objects to appropriate classed objects from the sp package
use the writeOGR() function from package rgdal to write the Shapefile out
For example, if we consider the hamster data set from spatstat:
require(spatstat)
require(maptools)
require(sp)
require(rgdal)
data(hamster)
first convert this object to a SpatialPointsDataFrame object:
ham.sp <- as.SpatialPointsDataFrame.ppp(hamster)
This gives us a sp object to work from:
> str(ham.sp, max = 2)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 303 obs. of 1 variable:
  ..@ coords.nrs : num(0)
  ..@ coords     : num [1:303, 1:2] 6 10.8 25.8 26.8 32.5 ...
  .. ..- attr(*, "dimnames")=List of 2
  ..@ bbox       : num [1:2, 1:2] 0 0 250 250
  .. ..- attr(*, "dimnames")=List of 2
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
This object has a single variable in the @data slot:
> head(ham.sp@data)
marks
1 dividing
2 dividing
3 dividing
4 dividing
5 dividing
6 dividing
So say we now want to write out this variable as an ESRI Shapefile, we use writeOGR()
writeOGR(ham.sp, "hamster", "marks", driver = "ESRI Shapefile")
This will create several marks.xxx files in directory hamster created in the current working directory. That set of files is the ShapeFile.
One of the reasons why I didn't do the above with the bei data set is that it doesn't contain any data and thus we can't coerce it to a SpatialPointsDataFrame object. There are data we could use, in bei.extra (loaded at the same time as bei), but these extra data are on a regular grid. So we'd have to:
convert bei.extra to a SpatialGridDataFrame object (say bei.spg)
convert bei to a SpatialPoints object (say bei.sp)
overlay() the bei.sp points on to the bei.spg grid, yielding values from the grid for each of the points in bei
that should give us a SpatialPointsDataFrame that can be written out using writeOGR() as above
As you see, that is a bit more involved just to give you a Shapefile. Will the hamster data example I show suffice? If not, I can hunt out my Bivand et al. tomorrow and run through the steps for bei.