Why are my VBOs slower than display lists?


I created two simple voxel engines, literally just chunks that hold cubes. For the first one, I use display lists and can render hundreds of chunks at 60 FPS with no problem, despite the fact that the technology behind it is years old and deprecated by now. With my VBO version, I try to render 27 chunks and I suddenly drop to less than 50 FPS. What gives? I use shaders for my VBO version, but not for the display list one. Without shaders, the VBO version still gets the same frame rate. I'll post some relevant code:
VBO
Initialization of chunk:
public void initGL() {
rand = new Random();
sizeX = (int) pos.getX() + CHUNKSIZE;
sizeY = (int) pos.getY() + CHUNKSIZE;
sizeZ = (int) pos.getZ() + CHUNKSIZE;
tiles = new byte[sizeX][sizeY][sizeZ];
vCoords = BufferUtils.createFloatBuffer(CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (3 * 4 * 6));
cCoords = BufferUtils.createFloatBuffer(CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6));
createChunk();
verticeCount = CHUNKSIZE * CHUNKSIZE * CHUNKSIZE * (4 * 4 * 6);
vCoords.flip();
cCoords.flip();
vID = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, vID);
glBufferData(GL_ARRAY_BUFFER, vCoords, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
cID = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, cID);
glBufferData(GL_ARRAY_BUFFER, cCoords, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
private void createChunk() {
for (int x = (int) pos.getX(); x < sizeX; x++) {
for (int y = (int) pos.getY(); y < sizeY; y++) {
for (int z = (int) pos.getZ(); z < sizeZ; z++) {
if (rand.nextBoolean() == true) {
tiles[x][y][z] = Tile.Grass.getId();
} else {
tiles[x][y][z] = Tile.Void.getId();
}
vCoords.put(Shape.createCubeVertices(x, y, z, 1));
cCoords.put(Shape.getCubeColors(tiles[x][y][z]));
}
}
}
}
And then rendering:
public void render() {
glBindBuffer(GL_ARRAY_BUFFER, vID);
glVertexPointer(3, GL_FLOAT, 0, 0L);
glBindBuffer(GL_ARRAY_BUFFER, cID);
glColorPointer(4, GL_FLOAT, 0, 0L);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
shader.use();
glDrawArrays(GL_QUADS, 0, verticeCount);
shader.release();
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
}
I know I use quads, and that's bad, but I'm also using quads for my display list engine. The shaders are very simple, all they do is take a color and apply it to the vertices, I won't even post them they are that simple.
Display List
Initialization:
public void init() {
rand = new Random();
opaqueID = glGenLists(1);
tiles = new byte[(int) lPosition.x][(int) lPosition.y][(int) lPosition.z];
genRandomWorld();
rebuild();
}
public void rebuild() {
glNewList(opaqueID, GL_COMPILE);
glBegin(GL_QUADS);
for (int x = (int) sPosition.x; x < (int) lPosition.x; x++) {
for (int y = (int) sPosition.y; y < (int) lPosition.y; y++) {
for (int z = (int) sPosition.z; z < (int) lPosition.z; z++) {
if (checkCubeHidden(x, y, z)) {
// check if tiles hidden. if not, add vertices to
// display list
if (type != 0) {
Tile.getTile(tiles[x][y][z]).getVertices(x, y, z, 1, spritesheet.getTextureCoordsX(tiles[x][y][z]), spritesheet.getTextureCoordsY(tiles[x][y][z]));
} else {
Tile.getTile(tiles[x][y][z]).getVertices(x, y, z, 1);
}
}
}
}
}
glEnd();
glEndList();
spritesheet.bind();
}
I should note that in my display list version, I only add in the visible cubes. So, that may be an unfair advantage, but it should not bring the VBO version down to that FPS with just 27 chunks versus 500 chunks for the display list version.
I render like this:
public void render() {
if (tiles.length != -1) {
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glCallList(opaqueID);
}
}
So, after all of that code, I really still wonder why my VBO version is just so darn slow? I do have a one dimensional list of chunks in my display list version for when I'm calling them to render, and a 3 dimensional one in my VBO version, but I think the JVM pretty much eliminates any lag with the extra dimensions. So, what am I doing wrong?

It is hard to answer a question like this without the actual project and a profiler at hand, so these are theories:
You don't show your display list generation code in detail, so I'm assuming you are doing something like glColor(); glVertex3f(); in a loop (rather than declaring the color once and being done with it).
How a display list is stored is implementation-specific, but usually it is an interleaved array of vertex properties, because that is much friendlier to the cache (all of a vertex's properties sit next to each other, typically aligned to 16 bytes, instead of being separated by the size of a whole array). The VBOs you use, on the other hand, come as two non-interleaved blocks: coordinates and colors. This could cause cache-unfriendly access patterns (especially with large amounts of data).
As noted in the comments:
try interleaving your position and colour data in a single buffer. That is the usual recommendation for static data as it gives better memory access patterns during rendering. – GuyRT
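As a rough illustration of what interleaving means here, the sketch below packs separate position and colour arrays into one x, y, z, r, g, b, a buffer using only java.nio. The class name InterleavedBuffer and its constants are hypothetical and not part of the question's code; whether this actually closes the gap to display lists would still need to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, not taken from the question's code.
final class InterleavedBuffer {
    static final int POSITION_FLOATS = 3; // x, y, z
    static final int COLOR_FLOATS = 4;    // r, g, b, a
    static final int FLOATS_PER_VERTEX = POSITION_FLOATS + COLOR_FLOATS;
    // Distance in bytes from one vertex to the next (7 floats * 4 bytes).
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;
    // Byte offset of the colour inside one vertex (after 3 position floats).
    static final int COLOR_OFFSET_BYTES = POSITION_FLOATS * Float.BYTES;

    /** Packs separate position and colour arrays into one xyzrgba buffer. */
    static FloatBuffer interleave(float[] positions, float[] colors) {
        int vertexCount = positions.length / POSITION_FLOATS;
        if (colors.length / COLOR_FLOATS != vertexCount) {
            throw new IllegalArgumentException("position/colour vertex counts differ");
        }
        FloatBuffer out = ByteBuffer
                .allocateDirect(vertexCount * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int v = 0; v < vertexCount; v++) {
            out.put(positions, v * POSITION_FLOATS, POSITION_FLOATS);
            out.put(colors, v * COLOR_FLOATS, COLOR_FLOATS);
        }
        out.flip(); // make the buffer ready for reading / uploading
        return out;
    }
}
```

With the result uploaded to a single VBO via glBufferData, the pointers in render() would then presumably become glVertexPointer(3, GL_FLOAT, STRIDE_BYTES, 0L) and glColorPointer(4, GL_FLOAT, STRIDE_BYTES, COLOR_OFFSET_BYTES), with only one glBindBuffer call per chunk.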

Related

Android semantic segmentation post-processing is too slow

I'd really appreciate it if anyone can advise on a task I've been working on without success for the last week.
I have a semantic segmentation model (MobileNetV3 + Lightweight ASPP). Short info: the input is 1024x1024 and the output is the same size with 2 classes (bg and vehicle), so my output shape is (1, 1048576, 2). I'm not a mobile dev or a Java person, so I used a few complete Android examples for image segmentation to test it:
the one from google: https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation
and another one open-sourced: https://github.com/pillarpond/image-segmenter-android
I successfully converted it to tflite format, and its inference time on a OnePlus 7 with GPU enabled and 10 threads is 105-140ms at this size. But here I run into a problem: the overall execution time in these two Android examples (or any others you can find for semantic segmentation) is about 1050-1300ms, which is less than 1 FPS. The slowest part of this pipeline is image post-processing (~900-1150ms). You can see that part in the Deeplab#segment method. Since I have only one class besides bg, I don't have the third loop, but everything else is untouched and still very slow. The output size is not small compared to other common mobile sizes like 128/226/512, but still, I think it shouldn't take so much time to process a 1024x1024 matrix and draw rectangles on a canvas on a modern smartphone.
I tried different solutions, like splitting the matrix manipulations across threads, or creating all the objects like RectF and Recognition once up front and just filling their attributes with new data inside the nested loops, but I didn't succeed with either of them. On the desktop side I easily handle it with numpy and opencv, and I'm not even close to understanding how I can do the same on Android or whether it would even be efficient.
Here's code which I use in python:
CLASS_COLORS = [(0, 0, 0), (255, 255, 255)] # black for bg and white for mask
def get_image_array(image_input, width, height):
img = cv2.imread(image_input, 1)
img = cv2.resize(img, (width, height))
img = img.astype(np.float32)
img[:, :, 0] -= 128.0
img[:, :, 1] -= 128.0
img[:, :, 2] -= 128.0
img = img[:, :, ::-1]
return img
def get_segmentation_array(seg_arr, n_classes):
output_height = seg_arr.shape[0]
output_width = seg_arr.shape[1]
seg_img = np.zeros((output_height, output_width, 3))
for c in range(n_classes):
seg_arr_c = seg_arr[:, :] == c
seg_img[:, :, 0] += ((seg_arr_c)*(CLASS_COLORS[c][0])).astype('uint8')
seg_img[:, :, 1] += ((seg_arr_c)*(CLASS_COLORS[c][1])).astype('uint8')
seg_img[:, :, 2] += ((seg_arr_c)*(CLASS_COLORS[c][2])).astype('uint8')
return seg_img
interpreter = tf.lite.Interpreter(model_path=f"my_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
img_arr = get_image_array("input.png", 1024, 1024)
interpreter.set_tensor(input_details[0]['index'], np.array([img_arr]))
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
output = output.reshape((1024, 1024, 2)).argmax(axis=2)
seg_img = get_segmentation_array(output, 2)
cv2.imwrite("output.png", seg_img)
Maybe there's something more powerful than the current solution for post-processing.
I would really appreciate any help with this. I'm sure there's something that can improve post-processing and reduce its time to ~100ms, so I will have ~5FPS overall.

New update. Thanks to Farmaker, I used a piece of code found in his repo from the comment above, and now the pipeline looks like this:
int channels = 3;
int n_classes = 2;
int float_byte_size = 4;
int width = model.inputWidth;
int height = model.inputHeight;
int[] intValues = new int[width * height];
ByteBuffer inputBuffer = ByteBuffer.allocateDirect(width * height * channels * float_byte_size).order(ByteOrder.nativeOrder());
ByteBuffer outputBuffer = ByteBuffer.allocateDirect(width * height * n_classes * float_byte_size).order(ByteOrder.nativeOrder());
Bitmap input = textureView.getBitmap(width, height);
input.getPixels(intValues, 0, width, 0, 0, width, height);
inputBuffer.rewind();
outputBuffer.rewind();
for (final int value: intValues) {
// 128.0f keeps the arithmetic in float; putFloat does not accept a double
inputBuffer.putFloat(((value >> 16 & 0xff) - 128.0f) / 1.0f);
inputBuffer.putFloat(((value >> 8 & 0xff) - 128.0f) / 1.0f);
inputBuffer.putFloat(((value & 0xff) - 128.0f) / 1.0f);
}
tfLite.run(inputBuffer, outputBuffer);
final Bitmap output = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
outputBuffer.flip();
int[] pixels = new int[width * height];
for (int i = 0; i < width * height; i++) {
float max = outputBuffer.getFloat();
float val = outputBuffer.getFloat();
int id = val > max ? 1 : 0;
pixels[i] = id == 0 ? 0x00000000 : 0x990000ff;
}
output.setPixels(pixels, 0, width, 0, 0, width, height);
resultView.setImageBitmap(resizeBitmap(output, resultView.getWidth(), resultView.getHeight()));
public static Bitmap resizeBitmap(Bitmap bm, int newWidth, int newHeight) {
int width = bm.getWidth();
int height = bm.getHeight();
float scaleWidth = ((float) newWidth) / width;
float scaleHeight = ((float) newHeight) / height;
// CREATE A MATRIX FOR THE MANIPULATION
Matrix matrix = new Matrix();
// RESIZE THE BIT MAP
matrix.postScale(scaleWidth, scaleHeight);
// "RECREATE" THE NEW BITMAP
Bitmap resizedBitmap = Bitmap.createBitmap(
bm, 0, 0, width, height, matrix, false);
bm.recycle();
return resizedBitmap;
}
Right now post-processing time is ~70-130ms, with the 95th percentile around 90ms. Together with ~60ms of image pre-processing, ~140ms of inference and around 30-40ms for everything else, with GPU enabled and 10 threads, that gives me an overall execution time of around 330ms, which is 3 FPS! And this is for a large model at 1024x1024.
At this point, I'm more than satisfied and want to try different configurations for my model, including MobileNetV3 small as a backbone.
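One further variation that might be worth timing is copying the whole output into a float[] in a single bulk read instead of calling getFloat() twice per pixel. The sketch below is untested on a device and the class name MaskDecoder is made up for illustration; it assumes the same layout as above (two scores per pixel, background first) and may or may not turn out faster in practice.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the pipeline above.
final class MaskDecoder {
    static final int BACKGROUND = 0x00000000; // fully transparent
    static final int VEHICLE = 0x990000ff;    // semi-transparent blue

    /** Converts interleaved two-class scores (bg, vehicle per pixel) to ARGB pixels. */
    static int[] decode(ByteBuffer output, int pixelCount) {
        float[] scores = new float[pixelCount * 2];
        // duplicate() leaves the caller's position untouched, but the copy
        // starts out big-endian, so the byte order is applied again.
        ByteBuffer view = output.duplicate().order(output.order());
        view.rewind();
        view.asFloatBuffer().get(scores); // one bulk copy of all scores
        int[] pixels = new int[pixelCount];
        for (int i = 0; i < pixelCount; i++) {
            pixels[i] = scores[2 * i + 1] > scores[2 * i] ? VEHICLE : BACKGROUND;
        }
        return pixels;
    }
}
```

The returned array can be handed to output.setPixels(...) exactly as before.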

Fastest way to render 2D tiles using LWJGL?

I started watching these tutorials for creating a 2D top-down game using LWJGL, and I read that VBOs should be fast, but when rendering 48*48 tiles per frame I only get about 100 FPS. That is pretty slow, because I will add a lot more to the game than just some static tiles that never move or change.
What can I do to make this faster? Keep in mind that I just started learning lwjgl and opengl so I probably won't know many things.
Anyways, here are some parts of my code (I removed some parts from the code that were kinda meaningless and replaced them with some descriptions):
The main loop
double targetFPS = 240.0;
double targetUPS = 60.0;
long initialTime = System.nanoTime();
final double timeU = 1000000000 / targetUPS;
final double timeF = 1000000000 / targetFPS;
double deltaU = 0, deltaF = 0;
int frames = 0, updates = 0;
long timer = System.currentTimeMillis();
while (!window.shouldClose()) {
long currentTime = System.nanoTime();
deltaU += (currentTime - initialTime) / timeU;
deltaF += (currentTime - initialTime) / timeF;
initialTime = currentTime;
if (deltaU >= 1) {
// --- [ update ] ---
// (input handling for basic movement, closing the game, and turning vsync on and off using a method from the input handler class)
world.correctCamera(camera, window);
window.update();
updates++;
deltaU--;
}
if (deltaF >= 1) {
// --- [ render ] ---
glClear(GL_COLOR_BUFFER_BIT);
world.render(tileRenderer, shader, camera, window);
window.swapBuffers();
frames++;
deltaF--;
}
// (printing the FPS and UPS every second)
}
The input handler methods used:
I have this in my constructor:
this.keys = new boolean[GLFW_KEY_LAST];
for(int i = 0; i < GLFW_KEY_LAST; i++)
keys[i] = false;
And here are the methods:
public boolean isKeyDown(int key) {
return glfwGetKey(window, key) == 1;
}
public boolean isKeyPressed(int key) {
return (isKeyDown(key) && !keys[key]);
}
public void update() {
for(int i = 32; i < GLFW_KEY_LAST; i++)
keys[i] = isKeyDown(i);
}
This is the render method from the World class:
public void render(TileRenderer renderer, Shader shader, Camera camera, Window window) {
int posX = ((int) camera.getPosition().x + (window.getWidth() / 2)) / (scale * 2);
int posY = ((int) camera.getPosition().y - (window.getHeight() / 2)) / (scale * 2);
for (int i = 0; i < view; i++) {
for (int j = 0; j < view; j++) {
Tile t = getTile(i - posX, j + posY);
if (t != null)
renderer.renderTile(t, i - posX, -j - posY, shader, world, camera);
}
}
}
This is the renderTile() method from TileRenderer:
public void renderTile(Tile tile, int x, int y, Shader shader, Matrix4f world, Camera camera) {
shader.bind();
if (tileTextures.containsKey(tile.getTexture()))
tileTextures.get(tile.getTexture()).bind(0);
Matrix4f tilePosition = new Matrix4f().translate(new Vector3f(x * 2, y * 2, 0));
Matrix4f target = new Matrix4f();
camera.getProjection().mul(world, target);
target.mul(tilePosition);
shader.setUniform("sampler", 0);
shader.setUniform("projection", target);
model.render();
}
This is the constructor and render method from Model class:
public Model(float[] vertices, float[] texture_coords, int[] indices) {
draw_count = indices.length;
v_id = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, v_id);
glBufferData(GL_ARRAY_BUFFER, createBuffer(vertices), GL_STATIC_DRAW);
t_id = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, t_id);
glBufferData(GL_ARRAY_BUFFER, createBuffer(texture_coords), GL_STATIC_DRAW);
i_id = glGenBuffers();
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, i_id);
IntBuffer buffer = BufferUtils.createIntBuffer(indices.length);
buffer.put(indices);
buffer.flip();
glBufferData(GL_ELEMENT_ARRAY_BUFFER, buffer, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
public void render() {
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glBindBuffer(GL_ARRAY_BUFFER, v_id);
glVertexAttribPointer(0, 3, GL_FLOAT, false, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, t_id);
glVertexAttribPointer(1, 2, GL_FLOAT, false, 0, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, i_id);
glDrawElements(GL_TRIANGLES, draw_count, GL_UNSIGNED_INT, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glDisableVertexAttribArray(0);
glDisableVertexAttribArray(1);
}
I store the vertices, texture coords and indices in the tile renderer:
float[] vertices = new float[]{
-1f, 1f, 0, //top left 0
1f, 1f, 0, //top right 1
1f, -1f, 0, //bottom right 2
-1f, -1f, 0, //bottom left 3
};
float[] texture = new float[]{
0, 0,
1, 0,
1, 1,
0, 1,
};
int[] indices = new int[]{
0, 1, 2,
2, 3, 0
};
I don't know what else to put here but the full source code and resources + shader files are available on github here.

With your current system, what I would recommend doing is grouping your tiles based on texture. Create something like this:
Map<Texture, List<Tile>> tiles = new HashMap<Texture, List<Tile>>();
Then, when you render your map of tiles, you only need to bind the texture once per group of tiles rather than once per tile. This avoids redundant texture binds, which cuts down on state changes and driver overhead per draw call. You would achieve that like this (pseudocode):
for (Texture tex : tiles.keySet())
{
BIND TEXTURE
for (Tile tile : tiles.get(tex))
{
SET UNIFORMS
RENDER
}
}
Something else I see along these lines is that you are pushing the projection matrix for each tile individually. A uniform keeps its value in the shader program until you change it (or relink the program), so anything that is the same for every tile only needs to be set once per frame, not once per tile.
It also appears that you are recomputing camera.getProjection().mul(world, target) on every renderTile(...) call. Given that this value does not change during a frame, calculate it once before the render pass, then pass it in as a variable to the renderTile(...) method rather than passing in camera and world.
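To make the grouping idea concrete, here is a minimal, GL-free sketch. The Tile class is a hypothetical stand-in keyed by a texture name (the real code would use its own Tile and Texture types), and countBinds simply models "bind whenever the texture differs from the one currently bound" so the effect of batching can be seen without an OpenGL context.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not from the question's code.
final class TileBatcher {
    /** Stand-in for the question's Tile: a texture name plus grid coordinates. */
    static final class Tile {
        final String texture;
        final int x, y;
        Tile(String texture, int x, int y) {
            this.texture = texture;
            this.x = x;
            this.y = y;
        }
    }

    /** Groups tiles by texture so each texture needs binding only once per frame. */
    static Map<String, List<Tile>> groupByTexture(List<Tile> tiles) {
        Map<String, List<Tile>> groups = new LinkedHashMap<>();
        for (Tile t : tiles) {
            groups.computeIfAbsent(t.texture, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    /** Counts the binds needed for a draw order, binding only when the texture changes. */
    static int countBinds(List<Tile> drawOrder) {
        int binds = 0;
        String bound = null;
        for (Tile t : drawOrder) {
            if (!t.texture.equals(bound)) {
                bound = t.texture;
                binds++;
            }
        }
        return binds;
    }

    public static void main(String[] args) {
        List<Tile> tiles = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            tiles.add(new Tile(i % 2 == 0 ? "grass" : "stone", i, 0));
        }
        List<Tile> batched = new ArrayList<>();
        groupByTexture(tiles).values().forEach(batched::addAll);
        System.out.println(countBinds(tiles) + " binds in map order, "
                + countBinds(batched) + " binds when batched");
    }
}
```

Since the tile map is static, the grouping can be built once at load time rather than every frame.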

Box2D LibGDX Rope issue

I am creating a rope with a series of Box2D bodies with the following code:
public void create(float length, float ropeLength){
Array<Body> bodies = new Array<Body>();
bodies.add(BodyFactory.createBox(world, position.x, position.y, length, length, BodyType.StaticBody, 0, 0, 0, "RopeMain"));
for(int i = 1; i < ropeLength; i++){
bodies.add(BodyFactory.createBox(world, position.x, position.y - (((length/2) / Core.PPM) * i),
length, length, BodyType.DynamicBody, 0, 0, 0, "RopeBody" + i));
RopeJointDef rDef = new RopeJointDef();
rDef.bodyA = bodies.get(i - 1);
rDef.bodyB = bodies.get(i);
rDef.collideConnected = true;
rDef.maxLength = (length/2)/Core.PPM;
rDef.localAnchorA.set(position.x, -((length / 2) / Core.PPM));
rDef.localAnchorB.set(position.x, ((length / 2) / Core.PPM));
world.createJoint(rDef);
}
}
Allow me to share some parameters...
For BodyFactory.createBox it requires the following:
world, xPos, yPos, width, height, BodyType, density, friction, restitution, fixture user data. (length is used for both width and height because the segments are square boxes)
Core.PPM is the pixels per meter. Also note that the position is being divided by PPM in the constructor.
Question: why do the following lines shoot to the right?
Any info is very helpful, also how will density, friction, and restitution affect the rope? Thanks!

The joint's localAnchor is relative to the center of the body and isn't an absolute value. That means that if you want to set the joint to the center-bottom of bodyA and center-top of bodyB you need to use
rDef.localAnchorA.set(0, -((length / 2) / Core.PPM));
rDef.localAnchorB.set(0, ((length / 2) / Core.PPM));
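The arithmetic behind this can be sketched in plain Java. For an unrotated body, the world-space anchor is simply the body's centre plus the local anchor, so passing position.x as the local x shifts the anchor sideways by that amount. The class AnchorMath and the numbers below are made up for illustration and ignore rotation.

```java
// Hypothetical illustration; Box2D does this conversion internally.
final class AnchorMath {
    /** World-space point of a local anchor on an unrotated body: centre plus local offset. */
    static float[] toWorld(float bodyX, float bodyY, float localX, float localY) {
        return new float[] { bodyX + localX, bodyY + localY };
    }

    public static void main(String[] args) {
        float bodyX = 5f, bodyY = 10f, half = 0.25f;
        // Passing the body's own x as the local anchor lands 5 units to the right.
        float[] wrong = toWorld(bodyX, bodyY, bodyX, -half);
        // A local x of 0 keeps the anchor on the body's vertical centre line.
        float[] right = toWorld(bodyX, bodyY, 0f, -half);
        System.out.println("wrong anchor x = " + wrong[0] + ", correct anchor x = " + right[0]);
    }
}
```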

Distribute 32bit float over 4 integers (RGBA) in java

1. Consider a 32bit java float sample in (0.0f .. 1.0f) and four 32bit java integers r, g, b, a each in (0 .. 255) in a vector called RGBA.
2. The sample variable contains normalized measurement data that I wish to present in an ImageIcon in the form of a heat map. The target for the final RGBA values is an integer vector that is later passed as pixel data to a java BufferedImage.
3. The constraints are that when sample==0.0f then RGBA==[0,0,0,255] with uniform distribution so that sample==1.0f represents RGBA==[255,0,0,255] and with sample==0.5f is represented by RGBA==[255,255,255,255]. The alpha channel is always 255.
4. So far I have used a static method that divides the colors into three separate sections R, G, B, while A remains fixed at 255. Like so
/* BLUE */
if ( sample <= 0.340000f ){
localSample = (sample/(0.340000f/255.000000f));
sourceLinearData[localIndex] = 0; // R
sourceLinearData[localIndex+1] = 0; // G
sourceLinearData[localIndex+2] = Math.round(localSample); // B
}
5. My questions: A) Are there any suitable java api's/libraries that would help me do this? B) If not then I ask for suggestions to a solution.
6. Thoughts: Since each of the R, G, B, A are in (0 .. 255) I assume I can use bytes instead of integers and then possibly shift these bytes into one variable and then extract the float that way. Though I have not had any success with this method so far.
7. EDIT: Adding example heat map
SOLVED: So, like many other things in software development, this question too holds more than a single answer. In my case I wanted the most direct route with the least amount of additional work. Because of that I decided to go with the answer given by #haraldK. This said though, if you are looking for a formal solution with more control, precision and flexibility, the answer provided by #SashaSalauyou is the more correct one.

To elaborate on my comment above. This doesn't give exactly the colors in the map above, but it is pretty close, and extremely simple:
float sample = ...; // [0...1]
float hue = (1 - sample) * .75f; // limit hue to [0...0.75] to avoid color "loop"
int rgb = Color.getHSBColor(hue, 1, 1).getRGB();
If you want darker tints in "edges" of the scale, you could use a sine function to compute the brightness, for example:
float hue = (1 - sample) * .75f;
float brightness = .5f + (float) Math.sin(Math.PI * sample) / 2;
int rgb = Color.getHSBColor(hue, 1, brightness).getRGB();

I suggest some kind of interpolation in a path that value from 0 to 1 performs in 3D color space:
// black: c = 0.0
// blue: c = 0.3
// white: c = 0.5
// red: c = 1.0
// add more color mappings if needed, keeping c[] sorted
static float[] c = {0.0f, 0.3f, 0.5f, 1.0f};
static int[] r = {  0,   0, 255, 255}; // red components
static int[] g = {  0,   0, 255,   0}; // green components
static int[] b = {  0, 255, 255,   0}; // blue components
public int[] getColor(float f) {
    // find the first stop that is not below f, without running past the end of c[]
    int i = 0;
    while (i < c.length - 1 && c[i] < f)
        i++;
    if (i == 0)
        return new int[] {r[0], g[0], b[0]};
    // interpolate between stop i-1 and stop i
    float k = (f - c[i-1]) / (c[i] - c[i-1]);
    return new int[] {
        Math.round((r[i] - r[i-1]) * k + r[i-1]),
        Math.round((g[i] - g[i-1]) * k + g[i-1]),
        Math.round((b[i] - b[i-1]) * k + b[i-1])
    };
}
}

Lerp should do the trick:
public static void main(String[] args) {
    float value = 0.5f;
    float[] red = new float[] {1.0f, 0, 0};
    float[] white = new float[] {1.0f, 1.0f, 1.0f};
    float[] black = new float[] {0, 0, 0};
    if (value <= 0.5f) {
        // lower half: black at 0.0 fading to white at 0.5
        float gradientValue = value * 2;
        int[] color = new int[] {(int) (lerp(black[0], white[0], gradientValue) * 255), (int) (lerp(black[1], white[1], gradientValue) * 255),
                (int) (lerp(black[2], white[2], gradientValue) * 255), 255};
    } else {
        // upper half: white at 0.5 fading to red at 1.0
        float gradientValue = (value - 0.5f) * 2;
        int[] color = new int[] {(int) (lerp(white[0], red[0], gradientValue) * 255), (int) (lerp(white[1], red[1], gradientValue) * 255),
                (int) (lerp(white[2], red[2], gradientValue) * 255), 255};
    }
}
public static float lerp(float v0, float v1, float t) {
    return (1-t)*v0 + t*v1;
}
(the order of the lerp arguments might be different, I haven't tested)
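For reference, a self-contained sketch of the same lerp idea that returns the RGBA vector and matches the three constraints from the question (0.0 is black, 0.5 is white, 1.0 is red). The class name HeatMap and the clamping of out-of-range samples are my own assumptions, not something the question asked for.

```java
// Hypothetical helper illustrating the piecewise-linear mapping.
final class HeatMap {
    /** Linear interpolation between a and b for t in [0, 1]. */
    static float lerp(float a, float b, float t) {
        return (1 - t) * a + t * b;
    }

    /** Maps a sample in [0, 1] to {r, g, b, a}: black at 0, white at 0.5, red at 1. */
    static int[] toRgba(float sample) {
        float s = Math.max(0f, Math.min(1f, sample)); // clamp out-of-range input
        float r, g, b;
        if (s <= 0.5f) {
            float t = s * 2f;          // 0..1 across the black-to-white half
            r = g = b = lerp(0f, 1f, t);
        } else {
            float t = (s - 0.5f) * 2f; // 0..1 across the white-to-red half
            r = 1f;
            g = b = lerp(1f, 0f, t);
        }
        return new int[] {
            Math.round(r * 255), Math.round(g * 255), Math.round(b * 255), 255
        };
    }
}
```

Calling HeatMap.toRgba(sample) once per measurement gives the four values to write into the pixel array.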

Texture Buffers and glMultiDrawElements

Backstory:
I'm trying to draw as many squares to the screen as possible using a single draw call. I'm using a custom GLSL vertex shader that is specialized for 2D drawing and is supposed to pull the position data for the vertices of the squares from a samplerBuffer. Since I don't need to worry about rotating or scaling the squares, all I should need to do is load the position data into a buffer, bind a texture to that buffer, and then use the sampler to get each vertex's position in the shader. To get an index into the texture, I store each element's index as the z-component of its vertices.
Everything seems to work really well for a thousand or so squares, but after that I start to get weird blinking. It sort of seems like it's not drawing all of the squares every draw step, or possibly not using all of the positions so that many of the squares are overlapping.
The weird thing is that if I use glDrawElements instead of glMultiDrawElements, the blinking goes away (but of course then all the squares are drawn as one single object, which I don't want).
One question I have is if my position data is limited to the max texture size, or the max texture buffer size. And if I am limited to the much smaller max texture size, how do I get around it? There's got to be a reason all of that texture buffer space is there, but I obviously don't get how to properly use it.
I'm also thinking maybe glMultiDrawElements is doing something I'm not accounting for with the sampler somehow. Idk, I'm really lost at this point, and yet..it works perfectly for smaller numbers of squares, so I must be doing something right.
[EDIT] The code has been changed to reflect the suggestions below (and for readability), but the problem persists.
Ok, so here's some code. First the vertex shader:
uniform mat3 projection;
attribute vec3 vertex;
uniform samplerBuffer positionSampler;
attribute vec4 vertex_color;
varying vec4 color;
float positionFetch(int index)
{
// I've tried texelFetch here as well, same effect
float value = texelFetchBuffer(positionSampler, index).r;
return value;
}
void main(void)
{
color = vec4(1, 1, 1, 1);
// use the z-component of the vertex to look up the position of this instance in the texture
vec3 real_position = vec3(vertex.x + positionFetch(int(vertex.z)*2), vertex.y + positionFetch(int(vertex.z)*2+1), 1);
gl_Position = vec4(projection * real_position, 1);
}
And now my GLRenderer, sorry there is so much code, I just really want to make sure there's enough info here to get an answer. This has really been driving me nuts, and examples for java seem to be hard to come by (maybe this code will help someone else on their quest):
public class GLRenderer extends GLCanvas implements GLEventListener, WindowListener
{
private static final long serialVersionUID = -8513201172428486833L;
private static final int bytesPerFloat = Float.SIZE / Byte.SIZE;
private static final int bytesPerShort = Short.SIZE / Byte.SIZE;
public float viewWidth, viewHeight;
public float screenWidth, screenHeight;
private FPSAnimator animator;
private boolean didInit = false;
JFrame the_frame;
SquareGeometry geometry;
// Thought power of 2 might be required, doesn't seem to make a difference
private static final int NUM_THINGS = 2*2*2*2*2*2*2*2*2*2*2*2*2*2;
float[] position = new float[NUM_THINGS*2];
// Shader attributes
private int shaderProgram, projectionAttribute, vertexAttribute, positionAttribute;
public static void main(String[] args)
{
new GLRenderer();
}
public GLRenderer()
{
// setup OpenGL Version 2
super(new GLCapabilities(GLProfile.get(GLProfile.GL2)));
addGLEventListener(this);
setSize(1800, 1000);
the_frame = new JFrame("Hello World");
the_frame.getContentPane().add(this);
the_frame.setSize(the_frame.getContentPane().getPreferredSize());
the_frame.setVisible(true);
the_frame.addWindowListener(this);
animator = new FPSAnimator(this, 60);
animator.start();
}
// Called by the drivers when the gl context is first made available
public void init(GLAutoDrawable d)
{
final GL2 gl = d.getGL().getGL2();
IntBuffer asd = IntBuffer.allocate(1);
gl.glGetIntegerv(GL2.GL_MAX_TEXTURE_BUFFER_SIZE, asd);
System.out.println(asd.get(0));
asd = IntBuffer.allocate(1);
gl.glGetIntegerv(GL2.GL_MAX_TEXTURE_SIZE, asd);
System.out.println(asd.get(0));
shaderProgram = ShaderLoader.compileProgram(gl, "default");
gl.glLinkProgram(shaderProgram);
_getShaderAttributes(gl);
gl.glUseProgram(shaderProgram);
_checkGLCapabilities(gl);
_initGLSettings(gl);
// Calculate batch of vertex data from dirt geometry
geometry = new SquareGeometry(.1f);
geometry.buildGeometry(viewWidth, viewHeight);
geometry.finalizeGeometry(NUM_THINGS);
geometry.vertexBufferID = _generateBufferID(gl);
_loadVertexBuffer(gl, geometry);
geometry.indexBufferID = _generateBufferID(gl);
_loadIndexBuffer(gl, geometry);
geometry.positionBufferID = _generateBufferID(gl);
// initialize buffer object
int size = NUM_THINGS * 2 * bytesPerFloat;
System.out.println(size);
IntBuffer bla = IntBuffer.allocate(1);
gl.glGenTextures(1, bla);
geometry.positionTextureID = bla.get(0);
gl.glUniform1i(positionAttribute, 0);
gl.glActiveTexture(GL2.GL_TEXTURE0);
gl.glBindTexture(GL2.GL_TEXTURE_BUFFER, geometry.positionTextureID);
gl.glBindBuffer(GL2.GL_TEXTURE_BUFFER, geometry.positionBufferID);
gl.glBufferData(GL2.GL_TEXTURE_BUFFER, size, null, GL2.GL_DYNAMIC_DRAW);
gl.glTexBuffer(GL2.GL_TEXTURE_BUFFER, GL2.GL_R32F, geometry.positionBufferID);
}
private void _initGLSettings(GL2 gl)
{
gl.glClearColor(0f, 0f, 0f, 1f);
}
private void _loadIndexBuffer(GL2 gl, SquareGeometry geometry)
{
gl.glBindBuffer(GL2.GL_ELEMENT_ARRAY_BUFFER, geometry.indexBufferID);
gl.glBufferData(GL2.GL_ELEMENT_ARRAY_BUFFER, bytesPerShort*NUM_THINGS*geometry.getNumPoints(), geometry.indexBuffer, GL2.GL_STATIC_DRAW);
}
private void _loadVertexBuffer(GL2 gl, SquareGeometry geometry)
{
int numBytes = geometry.getNumPoints() * 3 * bytesPerFloat * NUM_THINGS;
gl.glBindBuffer(GL2.GL_ARRAY_BUFFER, geometry.vertexBufferID);
gl.glBufferData(GL2.GL_ARRAY_BUFFER, numBytes, geometry.vertexBuffer, GL2.GL_STATIC_DRAW);
gl.glEnableVertexAttribArray(vertexAttribute);
gl.glVertexAttribPointer(vertexAttribute, 3, GL2.GL_FLOAT, false, 0, 0);
}
private int _generateBufferID(GL2 gl)
{
IntBuffer bufferIDBuffer = IntBuffer.allocate(1);
gl.glGenBuffers(1, bufferIDBuffer);
return bufferIDBuffer.get(0);
}
private void _checkGLCapabilities(GL2 gl)
{
// TODO: Respond to this information in a meaningful way.
boolean VBOsupported = gl.isFunctionAvailable("glGenBuffersARB") && gl.isFunctionAvailable("glBindBufferARB")
&& gl.isFunctionAvailable("glBufferDataARB") && gl.isFunctionAvailable("glDeleteBuffersARB");
System.out.println("VBO Supported: " + VBOsupported);
}
private void _getShaderAttributes(GL2 gl)
{
vertexAttribute = gl.glGetAttribLocation(shaderProgram, "vertex");
projectionAttribute = gl.glGetUniformLocation(shaderProgram, "projection");
positionAttribute = gl.glGetUniformLocation(shaderProgram, "positionSampler");
}
// Called by me on the first resize call, useful for things that can't be initialized until the screen size is known
public void viewInit(GL2 gl)
{
for(int i = 0; i < NUM_THINGS; i++)
{
position[i*2] = (float) (Math.random()*viewWidth);
position[i*2+1] = (float) (Math.random()*viewHeight);
}
gl.glUniformMatrix3fv(projectionAttribute, 1, false, Matrix.projection3f, 0);
// Load position data into a texture buffer
gl.glBindBuffer(GL2.GL_TEXTURE_BUFFER, geometry.positionBufferID);
ByteBuffer textureBuffer = gl.glMapBuffer(GL2.GL_TEXTURE_BUFFER, GL2.GL_WRITE_ONLY);
FloatBuffer textureFloatBuffer = textureBuffer.order(ByteOrder.nativeOrder()).asFloatBuffer();
for(int i = 0; i < position.length; i++)
{
textureFloatBuffer.put(position[i]);
}
gl.glUnmapBuffer(GL2.GL_TEXTURE_BUFFER);
gl.glBindBuffer(GL2.GL_TEXTURE_BUFFER, 0);
}
public void display(GLAutoDrawable d)
{
if (!didInit || geometry.vertexBufferID == 0)
{
return;
}
//long startDrawTime = System.currentTimeMillis();
final GL2 gl = d.getGL().getGL2();
gl.glClear(GL2.GL_COLOR_BUFFER_BIT | GL2.GL_DEPTH_BUFFER_BIT);
// If we were drawing any other buffers here we'd need to set this every time
// but instead we just leave them bound after initialization, saves a little render time
// No combination of these seems to fix the problem
//gl.glBindBuffer(GL2.GL_ARRAY_BUFFER, geometry.vertexBufferID);
//gl.glVertexAttribPointer(vertexAttribute, 3, GL2.GL_FLOAT, false, 0, 0);
//gl.glBindBuffer(GL2.GL_ELEMENT_ARRAY_BUFFER, geometry.indexBufferID);
gl.glBindBuffer(GL2.GL_TEXTURE_BUFFER, geometry.positionBufferID);
//gl.glActiveTexture(GL2.GL_TEXTURE0);
//gl.glTexBuffer(GL2.GL_TEXTURE_BUFFER, GL2.GL_R32F, geometry.positionBufferID);
_render(gl, geometry);
// Also tried these
//gl.glFlush();
//gl.glFinish();
}
public void _render(GL2 gl, SquareGeometry geometry)
{
gl.glMultiDrawElements(geometry.drawMode, geometry.countBuffer, GL2.GL_UNSIGNED_SHORT, geometry.offsetBuffer, NUM_THINGS);
// This one works, but isn't what I want
//gl.glDrawElements(GL2.GL_LINE_LOOP, count, GL2.GL_UNSIGNED_SHORT, 0);
}
public void reshape(GLAutoDrawable d, int x, int y, int width, int height)
{
final GL2 gl = d.getGL().getGL2();
gl.glViewport(0, 0, width, height);
float ratio = (float) height / width;
screenWidth = width;
screenHeight = height;
viewWidth = 100;
viewHeight = viewWidth * ratio;
Matrix.ortho3f(0, viewWidth, 0, viewHeight);
if (!didInit)
{
viewInit(gl);
didInit = true;
}
else
{
// respond to view size changing
}
}
}
The final bit is the SquareGeometry class which holds all the bufferIDs and vertex data, but also is responsible for filling the vertex buffer correctly so that each vertex's z component can function as an index into the position texture:
public class SquareGeometry
{
public float[] vertices = null;
ShortBuffer indexBuffer;
IntBuffer countBuffer;
PointerBuffer offsetBuffer;
FloatBuffer vertexBuffer;
public int vertexBufferID = 0;
public int indexBufferID = 0;
public int positionBufferID = 0;
public int positionTextureID = 0;
public int drawMode;
protected float width = 0;
protected float height = 0;
public SquareGeometry(float size)
{
width = size;
height = size;
}
public void buildGeometry(float viewWidth, float viewHeight)
{
vertices = new float[4 * 2];
vertices[0] = -width/2;
vertices[1] = -height/2;
vertices[2] = -width/2;
vertices[3] = height/2;
vertices[4] = width/2;
vertices[5] = height/2;
vertices[6] = width/2;
vertices[7] = -height/2;
drawMode = GL2.GL_POLYGON;
}
public void finalizeGeometry(int numInstances)
{
if(vertices == null) return;
int num_vertices = this.getNumPoints();
int total_num_vertices = numInstances * num_vertices;
// initialize vertex buffer (# of coordinate values * 4 bytes per float)
// Float.BYTES is 4; Float.SIZE is 32 (bits) and would over-allocate 8x
ByteBuffer vbb = ByteBuffer.allocateDirect(total_num_vertices * 3 * Float.BYTES);
vbb.order(ByteOrder.nativeOrder());
vertexBuffer = vbb.asFloatBuffer();
for(int i = 0; i < numInstances; i++)
{
for(int v = 0; v < num_vertices; v++)
{
int vertex_index = v * 2;
vertexBuffer.put(vertices[vertex_index]);
vertexBuffer.put(vertices[vertex_index+1]);
vertexBuffer.put(i);
}
}
vertexBuffer.rewind();
// Create the indices
vbb = ByteBuffer.allocateDirect(total_num_vertices * Short.BYTES);
vbb.order(ByteOrder.nativeOrder());
indexBuffer = vbb.asShortBuffer();
for(int i = 0; i < total_num_vertices; i++)
{
indexBuffer.put((short) (i));
}
indexBuffer.rewind();
// Create the counts
vbb = ByteBuffer.allocateDirect(numInstances * Integer.BYTES);
vbb.order(ByteOrder.nativeOrder());
countBuffer = vbb.asIntBuffer();
for(int i = 0; i < numInstances; i++)
{
countBuffer.put(num_vertices);
}
countBuffer.rewind();
// create the offsets
offsetBuffer = PointerBuffer.allocateDirect(numInstances);
for(int i = 0; i < numInstances; i++)
{
offsetBuffer.put(num_vertices*i*2);
}
offsetBuffer.rewind();
}
public int getNumPoints()
{
return vertices.length/2;
}
}

OK, first things first: you are not setting gl_Color in the shader. Maybe that is the issue here, and you have only been lucky with small numbers. It is a varying, but do you also have a fragment shader that picks up the value?
At no point do you ensure that NUM_THINGS*2 < GL_MAX_TEXTURE_SIZE (for a buffer texture, the relevant limit is GL_MAX_TEXTURE_BUFFER_SIZE). I don't know how FloatBuffer.put reacts to writing past the end; this being Java, probably (hopefully) with an exception.
Also, you bind the positionBufferID buffer, then unbind it, but never rebind it.
You create positionTextureID but never put any data there. This is also what you put into the sampler positionSampler and try to access.
Yeah, well, lots of issues, but my gut tells me the last one may be the real issue here.
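On the FloatBuffer.put point: a relative put on a buffer with no space remaining throws java.nio.BufferOverflowException, so an overrun should at least be loud rather than silent. Below is a minimal sketch of checking the size before writing. It uses an ordinary direct buffer as a stand-in for the mapped texture buffer, and PositionBufferCheck, allocate, fill and maxTexels are hypothetical names, not anything from the question. The maxTexels value is assumed to have been queried beforehand with glGetIntegerv(GL_MAX_TEXTURE_BUFFER_SIZE); with a GL_R32F buffer texture, one float is one texel.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, for illustration only.
final class PositionBufferCheck {

    // Allocates a direct, native-order buffer for floatCount floats.
    // This stands in for the buffer that mapping the texture buffer would return.
    static FloatBuffer allocate(int floatCount) {
        return ByteBuffer.allocateDirect(floatCount * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Copies positions into target, failing early with a clear message
    // instead of relying on BufferOverflowException halfway through.
    // maxTexels is assumed to come from GL_MAX_TEXTURE_BUFFER_SIZE.
    static void fill(FloatBuffer target, float[] positions, int maxTexels) {
        if (positions.length > maxTexels) {
            throw new IllegalArgumentException(
                    positions.length + " floats exceed the texel limit of " + maxTexels);
        }
        if (positions.length > target.remaining()) {
            throw new IllegalArgumentException(
                    positions.length + " floats do not fit in " + target.remaining());
        }
        target.put(positions);
    }

    // Returns true if writing one float past the end throws BufferOverflowException.
    static boolean overflowThrows() {
        FloatBuffer fb = allocate(2);
        fb.put(1f).put(2f);
        try {
            fb.put(3f);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }
}
```

In the question's code, the equivalent check would go just before the loop that writes position[i] into textureFloatBuffer.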

Alright, I've got it solved, though I'm still not clear on what the original problem was. I fixed it by simplifying the drawing to use drawArrays instead of drawElements or multiDrawElements. I'm not sure why I thought I needed them, as I really don't in this case. I'm pretty sure I was messing up a few things with the indexes and offsets.
Furthermore, as far as the proper way to bind the texture buffer goes, neither the code I have above nor the example found at the link I posted in a comment is correct.
If anyone is interested in the correct way to use the texture buffer like this, I just did a pretty extensive write-up on it here: http://zebadiah.me/?p=44. Thanks, all, for the help.
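For anyone comparing the two draw paths, here is a rough sketch of the index and offset bookkeeping involved. It is not the code from the write-up, and DrawRanges and its methods are hypothetical names. It assumes every instance has the same number of vertices, as in SquareGeometry. glMultiDrawElements expects one byte offset into the index buffer per instance, while glMultiDrawArrays expects one first-vertex index per instance and needs no index buffer at all. GL_UNSIGNED_SHORT indices can address at most 65536 distinct vertices, which caps how many instances an indexed draw like the one in the question can reach.

```java
// Hypothetical helper, for illustration only.
final class DrawRanges {

    // Number of distinct vertices addressable with GL_UNSIGNED_SHORT indices.
    static final int MAX_USHORT_VERTICES = 1 << 16;

    // First-vertex index of each instance, as glMultiDrawArrays would take it.
    static int[] firstVertices(int numInstances, int verticesPerInstance) {
        int[] first = new int[numInstances];
        for (int i = 0; i < numInstances; i++) {
            first[i] = i * verticesPerInstance;
        }
        return first;
    }

    // Byte offset of each instance's indices inside an unsigned-short index
    // buffer, as glMultiDrawElements would take it.
    static long[] indexByteOffsets(int numInstances, int verticesPerInstance) {
        long[] offsets = new long[numInstances];
        for (int i = 0; i < numInstances; i++) {
            offsets[i] = (long) i * verticesPerInstance * Short.BYTES;
        }
        return offsets;
    }

    // True if every vertex of every instance can be reached by a 16-bit index.
    static boolean fitsUnsignedShort(int numInstances, int verticesPerInstance) {
        return (long) numInstances * verticesPerInstance <= MAX_USHORT_VERTICES;
    }
}
```

With four vertices per square, fitsUnsignedShort stops returning true past 16384 instances, so a larger NUM_THINGS would need GL_UNSIGNED_INT indices or the non-indexed path.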
