I'm currently building a model in Python and consuming the results from a separate Java client.
I need to know how to get a float[][] or List<List<Float>> (or something similar) from a TensorProto that has more than one dimension.
In Python this is easy:
from tensorflow.python.framework import tensor_util
.
.
.
print(tensor_util.MakeNdarray(tensorProto))
===== UPDATE =====
Java's tensorProto.getFloatValList() also does not work if the proto was created by Python's tensor_util.make_tensor_proto(vector).
All of the cases above are solved by @Ash's answer below.
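For context, as far as I can tell make_tensor_proto stores NumPy-backed data in the packed tensor_content bytes field rather than in float_val, which is why getFloatValList() comes back empty in that case. A rough sketch of decoding tensor_content directly, assuming a little-endian DT_FLOAT tensor (class and method names here are mine, not a TensorFlow API):

import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import org.tensorflow.framework.TensorProto;

public final class TensorContentReader {
  // Returns the flat float values; reshape them yourself using proto.getTensorShape().
  public static float[] floatsFrom(TensorProto proto) {
    FloatBuffer fb = proto.getTensorContent()      // raw bytes as written by NumPy
        .asReadOnlyByteBuffer()
        .order(ByteOrder.LITTLE_ENDIAN)            // NumPy default on x86
        .asFloatBuffer();
    float[] values = new float[fb.remaining()];
    fb.get(values);
    return values;
  }
}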
As Allen mentioned in a comment, this is probably a good feature request.
But in the interim, a workaround would be to construct and run a graph that parses the encoded protobuf and returns a Tensor. It won't be particularly efficient, but you could do something like this:
import org.tensorflow.*;
import java.util.Arrays;

public final class ProtoToTensor {
  public static Tensor<Float> tensorFromSerializedProto(byte[] serialized) {
    // One may want to cache the Graph and Session as member variables to avoid paying the cost of
    // graph and session construction on each call.
    try (Graph g = buildGraphToParseProto();
        Session sess = new Session(g);
        Tensor<String> input = Tensors.create(serialized)) {
      return sess.runner()
          .feed("input", input)
          .fetch("output")
          .run()
          .get(0)
          .expect(Float.class);
    }
  }

  private static Graph buildGraphToParseProto() {
    Graph g = new Graph();
    // The graph construction process in Java is currently (as of TensorFlow 1.4) very verbose.
    // Once https://github.com/tensorflow/tensorflow/issues/7149 is resolved, this should become
    // *much* more convenient and succinct.
    Output<String> in =
        g.opBuilder("Placeholder", "input")
            .setAttr("dtype", DataType.STRING)
            .setAttr("shape", Shape.scalar())
            .build()
            .output(0);
    g.opBuilder("ParseTensor", "output").setAttr("out_type", DataType.FLOAT).addInput(in).build();
    return g;
  }

  public static void main(String[] args) {
    // Let's say you got a byte[] representation of the proto somehow.
    // In this case, I got it from Python from the following program
    // that serializes the 1x1 matrix:
    /*
    import tensorflow as tf
    list(bytearray(tf.make_tensor_proto([[1.]]).SerializeToString()))
    */
    byte[] bytes = {8, 1, 18, 8, 18, 2, 8, 1, 18, 2, 8, 1, 42, 4, 0, 0, (byte)128, 63};
    try (Tensor<Float> t = tensorFromSerializedProto(bytes)) {
      // You can now get a float[][] array using t.copyTo().
      // t.shape() gives shape information.
      System.out.println("Tensor: " + t);
      float[][] f = t.copyTo(new float[1][1]);
      System.out.println("float[][]: " + Arrays.deepToString(f));
    }
  }
}
As you can see, this is using some pretty low-level APIs to construct the graph and session. It would be reasonable to have a feature request that replaces all of this with a single line:
Tensor<Float> t = Tensor.createFromProto(serialized);
I'm new to modding Minecraft and I was wondering how to change mob spawn rates. Let's say we want to spawn lots of Endermen, for example.
So far I've found the code that seems to set the spawn frequency in net.minecraft.world.biome.DefaultBiomeFeatures:
public static void withHostileMobs(MobSpawnInfo.Builder builder) {
    ...
    builder.withSpawner(EntityClassification.MONSTER, new MobSpawnInfo.Spawners(EntityType.ENDERMAN, 10, 1, 4));
    ...
}
meaning Endermen spawn in most biomes, albeit rarely (10 is the weight, creepers and spiders have 100).
I know DefaultBiomeFeatures is then used by BiomeMaker.java in makeGiantTaigaBiome, makeBirchForestBiome, etc. My conclusion is that I need to change the biomes to change the spawn rates.
I can access the biomes using either BiomeRegistry or ForgeRegistries.BIOMES. I see two approaches here:
1. Replace the map of biomes completely. Sadly its register method is private, so I cannot add new biomes to replace the existing ones. I have also read that removing them is apparently not possible.
2. Modify the existing map of biomes. This would use biome.withMobSpawnSettings(MobSpawnInfo mobSpawnSettings) to modify the biome in place. But the MobSpawnInfo class once again does not have any public setters, so I don't see how I can get a modified MobSpawnInfo without re-creating the entire MobSpawnInfo object by hand.
Most solutions online (1, 2) seem to suggest the following, which sadly no longer works in the current 1.16.4:
ModLoader.addSpawn(YOURENTITY.class, 25, 1, 3);
EntityRegistry.addSpawn(...)
Any help would be greatly appreciated.
Do not try to modify the existing Minecraft package using Mixins -- that is called coremodding and is frowned upon for various reasons. The correct approach for 1.16 is to subscribe to a BiomeLoadingEvent and then monkey-patch all biomes after they have been loaded:
1.16
@Mod("example")
public class ExampleMod
{
    public ExampleMod() {
        MinecraftForge.EVENT_BUS.register(this);
    }

    @SubscribeEvent(priority = EventPriority.HIGH)
    public void onBiomeLoadingEvent(BiomeLoadingEvent event) {
        List<MobSpawnInfo.Spawners> spawns =
            event.getSpawns().getSpawner(EntityClassification.MONSTER);
        // Remove existing Enderman spawn information
        spawns.removeIf(e -> e.type == EntityType.ENDERMAN);
        // Make Enderman spawns more frequent and add Blaze spawns in all biomes
        spawns.add(new MobSpawnInfo.Spawners(EntityType.BLAZE, 200, 1, 4));
        spawns.add(new MobSpawnInfo.Spawners(EntityType.ENDERMAN, 200, 1, 4));
    }
}
1.15 (might also work in 1.14, 1.13, 1.12, ...)
@Mod("example")
public class ExampleMod
{
    public ExampleMod() {
        ForgeRegistries.BIOMES.forEach(biome -> {
            List<Biome.SpawnListEntry> spawns = biome.getSpawns(EntityClassification.MONSTER);
            spawns.removeIf(e -> e.entityType == EntityType.ENDERMAN);
            spawns.add(new Biome.SpawnListEntry(EntityType.BLAZE, 200, 1, 4));
            spawns.add(new Biome.SpawnListEntry(EntityType.ENDERMAN, 200, 1, 4));
        });
    }
}
Edit: Note that the InControl mod can be used to achieve a similar effect without any coding.
If I have a list of timestamps and the file path of an object that I want to convert, can I make a collection of converters that expect the constructor signature Converter(filePath, start, end)?
More detail (pseudo-code):
Some list of timestamps (imagine they're in seconds): path = somewhere, list = {0, 15, 15, 30}.
How can I do something like this:
list.stream.magic.map(start, end -> new Converter (path, start, end))?
Result: new Converter(path, 0, 15), new Converter(path, 15, 30)
Note: I'm aware of BiFunction, but to my knowledge, streams do not implement it.
There are many ways to get the required result using streams.
But first of all, you're not obliged to use the Stream API at all: for lists of tens or hundreds of elements I would suggest plain old list iteration.
Just for instance, try the code sample below.
We can easily see two surface problems arising from the nature of streams and their incompatibility with the very idea of pairing elements:
- it's necessary to apply a stateful function, which is really tricky to use in map() and should be considered dirty coding, and the mapping produces nulls in every other position that have to be filtered out afterwards;
- there are problems when the stream contains an odd number of elements, and you can never predict whether it will.
If you decide to use streams, then doing it cleanly requires a custom implementation of Iterator, Spliterator or Collector, depending on your demands.
In any case there are a couple of non-obvious corner cases you won't be happy to implement yourself, so you can try one of the many third-party stream libraries.
Two of the most popular are StreamEx and RxJava.
They definitely have tools for pairing stream elements... but don't forget to check the performance for your case!
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Stream;

public class Sample
{
    public static void main(String... arg)
    {
        String path = "somewhere";
        Stream<Converter> stream = Stream.of(0, 15, 25, 30).map(
            new Function<Integer, Converter>()
            {
                int previous;
                boolean even = true;

                @Override
                public Converter apply(Integer current)
                {
                    Converter converter = even ? null : new Converter(path, previous, current);
                    even = !even;
                    previous = current;
                    return converter;
                }
            }).filter(Objects::nonNull);

        stream.forEach(System.out::println);
    }

    static class Converter
    {
        private final String path;
        private final int start;
        private final int end;

        Converter(String path, int start, int end)
        {
            this.path = path;
            this.start = start;
            this.end = end;
        }

        public String toString()
        {
            return String.format("Converter[%s,%s,%s]", path, start, end);
        }
    }
}
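For completeness, if the timestamps are already materialised in a List (as in the question), a plain index-based stream sidesteps the stateful mapper entirely. A rough sketch, reusing the Converter class from the Sample above (same package) and assuming an even number of entries:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PairByIndex
{
    public static void main(String... arg)
    {
        String path = "somewhere";
        List<Integer> times = Arrays.asList(0, 15, 15, 30);
        // Each index i addresses one (start, end) pair: elements 2*i and 2*i+1.
        List<Sample.Converter> converters = IntStream.range(0, times.size() / 2)
                .mapToObj(i -> new Sample.Converter(path, times.get(2 * i), times.get(2 * i + 1)))
                .collect(Collectors.toList());
        converters.forEach(System.out::println); // Converter[somewhere,0,15], Converter[somewhere,15,30]
    }
}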
I have a generic question concerning the structuring of code in Java Spark applications. I want to separate the code that implements Spark transformations from the code that calls them on RDDs, so the source of the application stays clear even when it uses lots of transformations containing many lines of code.
I'll give you a short example first. In this scenario the implementation of a flatMap transformation is provided as an anonymous inner class. This is a simple application that reads an RDD of integers and then multiplies each element by every value of an integer array that was previously broadcast to all worker nodes:
public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("MyApp");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<Integer> result = sc.parallelize(Arrays.asList(5, 8, 9));

    final Broadcast<int[]> factors = sc.broadcast(new int[] { 1, 2, 3 });

    result = result.flatMap(new FlatMapFunction<Integer, Integer>() {
        public Iterable<Integer> call(Integer t) throws Exception {
            int[] values = factors.value();
            LinkedList<Integer> result = new LinkedList<Integer>();
            for (int value : values) result.add(t * value);
            return result;
        }
    });

    System.out.println(result.collect()); // [5, 10, 15, 8, 16, 24, 9, 18, 27]

    sc.close();
}
In order to structure the code I have extracted the implementation of the Spark functions into a different class. The class SparkFunctions provides the implementation of the flatMap transformation and has a setter method that receives a reference to the broadcast variable (in my real-world scenario there would be many operations in this class, all of which access the broadcast data).
In my experience, a method representing a Spark transformation can be static as long as it does not access a Broadcast or Accumulator variable. Why? Static members can only reference other static members, and a static reference to a Broadcast variable is always null on the workers (probably because it is not serialized when Spark sends the class SparkFunctions to the worker nodes).
@SuppressWarnings("serial")
public class SparkFunctions implements Serializable {

    private Broadcast<int[]> factors;

    public SparkFunctions() {
    }

    public void setFactors(Broadcast<int[]> factors) {
        this.factors = factors;
    }

    public final FlatMapFunction<Integer, Integer> myFunction = new FlatMapFunction<Integer, Integer>() {
        public Iterable<Integer> call(Integer t) throws Exception {
            int[] values = factors.value();
            LinkedList<Integer> result = new LinkedList<Integer>();
            for (int value : values) result.add(t * value);
            return result;
        }
    };
}
This is the second version of the application using the class SparkFunctions:
public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("MyApp");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<Integer> result = sc.parallelize(Arrays.asList(5, 8, 9));

    final Broadcast<int[]> factors = sc.broadcast(new int[] { 1, 2, 3 });

    // 1) Initializing
    SparkFunctions functions = new SparkFunctions();

    // 2) Pass reference of broadcast variable
    functions.setFactors(factors);

    // 3) Implementation is now in the class SparkFunctions
    result = result.flatMap(functions.myFunction);

    System.out.println(result.collect()); // [5, 10, 15, 8, 16, 24, 9, 18, 27]

    sc.close();
}
Both versions of the application work (locally and in a cluster setup), but I am wondering whether they are equally efficient.
Question 1: In my opinion, Spark serializes the class SparkFunctions, including the Broadcast variable, and sends it to the worker nodes so that the nodes can use the function in their tasks. Is the data sent to the worker nodes twice, first via the broadcast using SparkContext and then again when the class SparkFunctions is serialized? Or is it even sent once per element (plus once for the broadcast)?
Question 2: Can you give me suggestions on how the source code might be structured differently?
Please don't suggest ways to avoid the broadcast; my real-world application is much more complex.
Similar questions that I have found which were not really helpful:
Spark Java Code Structure
BroadCast Variables In Spark
Spark: passing broadcast variable to executors
Thanks in advance for your help!
This is regarding Question 1.
When a Spark job is submitted, it is divided into stages, which are in turn divided into tasks. The tasks carry out the execution of the transformations and actions on the worker nodes. The driver's submitTask() serializes the functions and the metadata about the broadcast variable to all nodes.
Anatomy of how broadcast works:
The driver creates a local directory to store the data to be broadcast and launches an HttpServer with access to that directory. The data is actually written into the directory when the broadcast is called (val bdata = sc.broadcast(data)). At the same time, the data is also written into the driver's block manager with a storage level of memory + disk. The block manager allocates a blockId (of type BroadcastBlockId) for the data.
The real data is only broadcast when an executor deserializes the task it has received; at that point it also gets the broadcast variable's metadata, in the form of a Broadcast object, and calls the readObject() method of that metadata object (the bdata variable). This method first checks the local block manager to see whether there is already a local copy. If not, the data is fetched from the driver. Once the data is fetched, it is stored in the local block manager for subsequent uses.
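To connect this back to Question 2: since only the Broadcast handle (not the broadcast data itself) is captured in the closure, one possible way to structure the code is a small factory method that takes the handle as a parameter instead of holding it in a field. This is just a sketch under that assumption; the class and method names are mine, and the FlatMapFunction signature matches the Spark 1.x API used in the question:

import java.util.LinkedList;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.broadcast.Broadcast;

public final class SparkFunctionFactory {
    // Returns a flatMap function that captures only the broadcast handle;
    // the broadcast data is fetched lazily on the executors, as described above.
    public static FlatMapFunction<Integer, Integer> multiplyByFactors(final Broadcast<int[]> factors) {
        return new FlatMapFunction<Integer, Integer>() {
            public Iterable<Integer> call(Integer t) throws Exception {
                LinkedList<Integer> out = new LinkedList<Integer>();
                for (int value : factors.value()) out.add(t * value);
                return out;
            }
        };
    }
}

It would be used as result.flatMap(SparkFunctionFactory.multiplyByFactors(factors)), which keeps the driver code short without the setter indirection.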
I decided to write my own read and write object methods by implementing the Json.Serializable interface, because I was unhappy with how Json does its automated object writing (it omits arrays). My write methods work properly; however, for some reason I get a NullPointerException when I try to read the values back, as if I were looking up a value by the wrong name, which I'm certain I'm not doing; the write and read names are identical. Below are my read and write methods and the Json output (the error occurs at the first readValue() call).
@Override
public void write(Json json)
{
    json.writeObjectStart(this.getName());
    json.writeValue("Level", level);
    json.writeValue("Health", health);
    json.writeValue("Allegiance", alle);
    json.writeValue("Stats", stats);
    json.writeValue("Has Moved", hasMoved);
    json.writeValue("Position", new Point((int)this.getX(), (int)this.getY()));
    json.writeObjectEnd();
}

@Override
public void read(Json json, JsonValue jsonData)
{
    level = json.readValue("Level", Integer.class, jsonData);
    health = json.readValue("Health", Integer.class, jsonData);
    alle = json.readValue("Allegiance", Allegiance.class, jsonData);
    stats = json.readValue("Stats", int[].class, jsonData);
    hasMoved = json.readValue("Has Moved", Boolean.class, jsonData);
    Point p = json.readValue("Position", Point.class, jsonData);
    this.setPosition(p.x, p.y);
}
/////////////////////////////////////////////////////////////////////
player: {
    party: {}
},
state: state1,
map: {
    foes: {
        units: [
            {
                class: com.strongjoshuagames.reverseblade.game.units.UnitWolf,
                Wolf: {
                    Level: 5,
                    Health: 2,
                    Allegiance: FOE,
                    Stats: [ 2, 3, 3, 4, 3, 4, 3, 5 ],
                    "Has Moved": false,
                    Position: {
                        x: 320,
                        y: 320
                    }
                }
            }
        ]
    }
}
Note that I've read objects from the same file this is being saved in before, so the file shouldn't be an issue.
I'm not 100% sure how the JSON library works, but I believe that since you do json.writeObjectStart(this.getName()); in your write function, you have to 'reverse' this in your read function like everything else you wrote. In order to do this, you need to get the JsonValue's first child and read its Level, Health, etc. from there. I'm not sure about the API so I can't give exact code, but it'd be something like this:
level = json.readValue("Level", Integer.class, jsonData.child());
Think of it like this: I make a box and put a dictionary in it. I can't just look up words in the box; I have to take the dictionary out first. Likewise, you need to get the object you wrote first before you can look up its fields.
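Put together, the read() from the question would then look roughly like this. This is an untested sketch: if child() is not what you want here (for example if a class tag is still present at this level), jsonData.get(this.getName()) should get you the same nested object.

@Override
public void read(Json json, JsonValue jsonData)
{
    // write() nested all fields under this.getName(), so unwrap that object first.
    JsonValue data = jsonData.child(); // or: jsonData.get(this.getName())
    level = json.readValue("Level", Integer.class, data);
    health = json.readValue("Health", Integer.class, data);
    alle = json.readValue("Allegiance", Allegiance.class, data);
    stats = json.readValue("Stats", int[].class, data);
    hasMoved = json.readValue("Has Moved", Boolean.class, data);
    Point p = json.readValue("Position", Point.class, data);
    this.setPosition(p.x, p.y);
}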
What is the simplest way to implement a parallel computation (e.g. on a multi-core processor) in Java?
I.e. the Java equivalent of this Scala code:
val list = aLargeList
list.par.map(_*2)
There is this library, but it seems overwhelming.
http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/
Don't give up so fast, snappy! ))
From the javadocs (adapted to your mapping function f), the essential part is really just this:
ParallelLongArray a = ... // you provide
a.replaceWithMapping(new LongOp() { public long op(long a) { return a * 2L; } });
which is pretty much this, right?
val list = aLargeList
list.par.map(_*2)
And if you are willing to live with a bit less terseness, the above can be a reasonably clean and clear three-liner (and of course, if you reuse functions, it's exactly the same thing as in Scala: inline functions):
ParallelLongArray a = ... // you provide
LongOp f = new LongOp() { public long op(long a){return a*2L;}};
a.replaceWithMapping (f);
[Edited the above to show the concise complete form, a la the OP's Scala variant.]
And here it is in maximally verbose form, where we start from scratch for the demo:
import java.util.Random;
import jsr166y.ForkJoinPool;
import extra166y.Ops.LongGenerator;
import extra166y.Ops.LongOp;
import extra166y.ParallelLongArray;

public class ListParUnaryFunc {

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);

        // create a parallel long array
        // with random long values
        ParallelLongArray a = ParallelLongArray.create(n - 1, new ForkJoinPool());
        a.replaceWithGeneratedValue(generator);

        // use it: apply unaryLongFuncOp in parallel
        // to all values in the array
        a.replaceWithMapping(unaryLongFuncOp);

        // examine it
        for (Long v : a.asList()) {
            System.out.format("%d\n", v);
        }
    }

    static final Random rand = new Random(System.nanoTime());

    static LongGenerator generator = new LongGenerator() {
        @Override final
        public long op() { return rand.nextLong(); }
    };

    static LongOp unaryLongFuncOp = new LongOp() {
        @Override final public long op(long a) { return a * 2L; }
    };
}
Final edit and notes:
Also note that a simple class such as the following (which you can reuse across your projects):
/**
 * The very basic form w/ TODOs on checks, concurrency issues, init, etc.
 */
final public static class ParArray {
    private ParallelLongArray parr;
    private final long[] arr;

    public ParArray(long[] arr) {
        this.arr = arr;
    }

    public final ParArray par() {
        if (parr == null)
            parr = ParallelLongArray.createFromCopy(arr, new ForkJoinPool());
        return this;
    }

    public final ParallelLongArray map(LongOp op) {
        return parr.replaceWithMapping(op);
    }

    public final long[] values() { return parr.getArray(); }
}
and something like that will allow you to write more fluid Java code (if terseness matters to you):
long[] arr = ... // you provide
LongOp f = ... // you provide
ParArray list = new ParArray(arr);
list.par().map(f);
And the above approach can certainly be pushed to make it even cleaner.
Doing that on one machine is pretty easy, but not as easy as Scala makes it. The library you posted is already a part of Java 5 and beyond. Probably the simplest thing to use is an ExecutorService: it represents a pool of threads that can run on any processor; you send it tasks and it gives you back their results.
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
http://www.fromdev.com/2009/06/how-can-i-leverage-javautilconcurrent.html
I'd suggest using ExecutorService.invokeAll(), which returns a list of Futures. Then you can check them to see whether they're done.
If you're using Java 7 then you can use the fork/join framework, which might save you some work. With any of these you can build something very similar to Scala's parallel arrays, so using it is fairly concise.
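A rough sketch of the invokeAll() route for the doubling example (pool size and batching are up to you; here every element becomes its own tiny task, which is wasteful but keeps the example short):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDouble {
    public static void main(String[] args) throws Exception {
        List<Long> list = Arrays.asList(1L, 2L, 3L, 4L);   // stand-in for aLargeList
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
        for (final Long v : list) {
            tasks.add(new Callable<Long>() {
                public Long call() { return v * 2L; }       // one tiny task per element; batch in practice
            });
        }
        List<Long> doubled = new ArrayList<Long>();
        for (Future<Long> f : pool.invokeAll(tasks)) {      // invokeAll blocks until all tasks complete
            doubled.add(f.get());
        }
        pool.shutdown();
        System.out.println(doubled);                        // [2, 4, 6, 8]
    }
}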
Using threads, Java doesn't have this sort of thing built-in.
There will be an equivalent in Java 8: http://www.infoq.com/articles/java-8-vs-scala
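For reference, the Java 8 version (parallel streams) looks very close to the Scala snippet:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Parallel {
    public static void main(String[] args) {
        List<Long> list = Arrays.asList(1L, 2L, 3L, 4L);    // stand-in for aLargeList
        List<Long> doubled = list.parallelStream()           // roughly list.par in Scala
                .map(x -> x * 2L)
                .collect(Collectors.toList());
        System.out.println(doubled);                         // [2, 4, 6, 8]
    }
}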