I'm working on small project which requires: Change clothes (shirt/pants etc.) of a person in any 2D image he chooses to upload. So somehow edges needs to be detected and relevant areas are supposed to be filled with new patterns. I do see a lot of other complications, but let's assume simple patterns have to be filled only.
For a web application, is it possible to do it in HTML5? Any other alternatives?
For a standalone application, what kind of technology would be preferred, C++/Java?
Update
Based on Bart's comment:
Any useful pointer like Bart's would be really useful
Assumption: Clear traceable 'standing' human figure in 2d image
Since it's an image, there is no real-time scenario
Assumption: Clear traceable 'standing' human figure in 2d image
A way to do this is to require the user to take two pictures. One picture is the one with the user in it, the other picture must be taken in the same camera position and orientation, but the user steps out of the frame for that one.
Since both pictures will have the same background you can compare pixel by pixel between the two images and flag those pixels that have a difference over some threshold. Of course the threshold must be selected so that camera noise isn't detected as a difference. Once you have the collection of pixels that are different you can filter them and calculate an approximate silhouette for the user from the pixels on the edge.
A simplification of the above method can be done if you have control over the background. You could use a bluescreen to avoid having to have a second picture with the background.
Related
Introduction
The title is a bit complicated so let's break it down:
I have an image submitted by a user
The image is a top view of a landscape featuring clearly marked regions. For example, if this was a park the image would be a top view of the parks layout.
I need to allow the user to classify different elements in the image and estimate the area occupied by those elements. Continuing with the park analogy; the park may have two pavilions and a sand volleyball court. I must allow the user to mark the points of interest (let's say the volleyball court) and compute their area (given the overall dimensions of the depicted park)
Current Ideas
I think I should create a buffered image and use that as the background of a canvas.
I'm not sure about the user input. My first idea was to have the users drag rectangles associated with a specific feature (ie. red rectangle for volleyball courts) to the region over the image. Rectangles work because the elements are mostly rectangular but I don't know that users can resize rectangles.
To reiterate, the main problem is determining the area occupied by physical structures in a given image. No machine vision, just plain old Mouse Events.
How should I be approaching the user input dilemma? Any APIs I should be digging through?
Please let me know if I can improve the question and explanation.
I'm having quite a bit of difficulty wrapping my head around the actual display side of things with libgdx. That is, it just seems fairly jumbled in terms of what needs to be done in order to actually put something up onto the screen. I guess my confusion can sort of be separated into two parts:
What exactly needs to be done in terms of creating an image? There's
Texture, TextureRegion, TextureAtlas, Sprite, Batch, and probably a
few other art related assets that I'm missing. How do these all
relate and tie into each other? What's the "production chain" among
these I guess would be a way of putting it.
In terms of putting
whatever is created from the stuff above onto the monitor or
display, how do the different coordinate and sizing measures relate
and translate to and from each other? Say there's some image X that
I want to put on the screen. IT's got it's own set of dimensions and
coordinates, but then there's also a viewport size (is there a
viewport position?) and a camera position (is there a camera size?).
On top of all that, there's also the overall dispaly size that's
from Gdx.graphics. A few examples of things I might want to do could
be as follow:
X is my "global map" that is bigger than my screen
size. I want to be able to scroll/pan across it. What are the
coordinates/positions I should use when displaying it?
Y is bigger
than my screen size. I want to scale it down and have it always be
in the center of the screen/display. What scaling factor do I use
here, and which coordinates/positions?
Z is smaller than my screen
size. I want to stick it in the upper left corner of my screen and
have it "stick" to the global map I mentioned earlier. Which
positioning system do I use?
Sorry if that was a bunch of stuff... I guess the tl;dr of that second part is just which set of positions/coordinates, sizes, and scales am I supposed to do everything in terms of?
I know this might be a lot to ask at once, and I also know that most of this stuff can be found online, but after sifting through tutorial after tutorial, I can't seem to get a straight answer as to how these things all relate to each other. Any help would be appreciated.
Texture is essentially the raw image data.
TextureRegion allows you to grab smaller areas from a larger texture. For example, it is common practice to pack all of the images for your game/app into a single large texture (the LibGDX “TexturePacker” is a separate program that does this) and then use regions of the larger texture for your individual graphics. This is done because switching textures is a heavy and slow operation and you want to minimize this process.
When you pack your images into a single large image with the TexturePacker it creates a “.atlas” file which stores the names and locations of your individual images. TextureAtlas allows you to load the .atlas file and then extract your original images to use in your program.
Sprite adds position and color capabilities to the texture. Notice that the Texture API has no methods for setting/getting position or color. Sprites will be your characters and other objects that you can actually move around and position on the screen.
Batch/SpriteBatch is an efficient way of drawing multiple sprites to the screen. Instead of making drawing calls for each sprite one at a time the Batch does multiple drawing calls at once.
And hopefully I’m not adding to the confusion, but another I option I really like is using the “Actor” and “Stage” classes over the “Sprite” and “SpriteBatch” classes. Actor is similar to Sprite but adds additional functionality for moving/animating, via the act method. The Stage replaces the SpriteBatch as it uses its own internal SpriteBatch so you do not need to use the SpriteBatch explicitly.
There is also an entire set of UI components (table, button, textfield, slider, progress bar, etc) which are all based off of Actor and work with the Stage.
I can’t really help with question 2. I stick to UI-based apps, so I don’t know the best practices for working with large game worlds. But hopefully someone more knowledgeable in that area can help you with that.
This was to long to reply as a comment so I’m responding as another answer...
I think both Sprite/SpriteBatch and Actor/Stage are equally powerful as you can still animate and move with Sprite/SpriteBatch, but Actor/Stage is easier to work with. The stage has two methods called “act” and “draw” which allows the stage to update and draw every actor it contains very easily. You override the act method for each of your actors to specify what kind of action you want it to do. Look up a few different tutorials for Stage/Actor with sample code and it should become clear how to use it.
Also, I was slightly incorrect before that “Actor” is equivalent to Sprite, because Sprite includes a texture, but Actor by itself does not have any kind of graphical component. There is an extension of Actor called “Image” that includes a Drawable, so the Image class is actually the equivalent to Sprite. Actor is the base class that provides the methods for acting (or “updating”), but it doesn’t have to be graphical. I've used Actors for other purposes such as triggering audio sounds at specific times.
Atlas creates the large Texture containing all of your png files and then allows you to get regions from it for individual png's. So the pipeline for getting a specific png graphic would be Atlas > Region > Sprite/Image. Both Image and Sprite classes have constructors that take a region.
I have written a program that takes a 'photo' and for every pixel it chooses to insert an image from a range of other photos. The image chosen is the photo of which the average colour is closest to the original pixel from the photograph.
I have done this by firstly averaging the rgb values from every pixel in 'stock' image and then converting it to CIE LAB so i could calculate the how 'close' it is to the pixel in question in terms of human perception of the colour.
I have then compiled an image where each pixel in the original 'photo' image has been replaced with the 'closest' stock image.
It works nicely and the effect is good however the stock image size is 300 by 300 pixels and even with the virtual machine flags of "-Xms2048m -Xmx2048m", which yes I know is ridiculus, on 555px by 540px image I can only replace the stock images scaled down to 50 px before I get an out of memory error.
So basically I am trying to think of solutions. Firstly I think the image effect itself may be improved by averaging every 4 pixels (2x2 square) of the original image into a single pixel and then replacing this pixel with the image, as this way the small photos will be more visible in the individual print. This should also allow me to draw the stock images at a greater size. Does anyone have any experience in this sort of image manipulation? If so what tricks have you discovered to produce a nice image.
Ultimately I think the way to reduce the memory errors would be to repeatedly save the image to disk and append the next line of images to the file whilst continually removing the old set of rendered images from memory. How can this be done? Is it similar to appending a normal file.
Any help in this last matter would be greatly appreciated.
Thanks,
Alex
I suggest looking into the Java Advanced Imaging (JAI) API. You're probably using BufferedImage right now, which does keep everything in memory: source images as well as output images. This is known as "immediate mode" processing. When you call a method to resize the image, it happens immediately. As a result, you're still keeping the stock images in memory.
With JAI, there are two benefits you can take advantage of.
Deferred mode processing.
Tile computation.
Deferred mode means that the output images are not computed right when you call methods on the images. Instead, a call to resize an image creates a small "operator" object that can do the resizing later. This lets you construct chains, trees, or pipelines of operations. So, your work would build a tree of operations like "crop, resize, composite" for each stock image. The nice part is that the operations are just command objects so you aren't consuming all the memory while you build up your commands.
This API is pull-based. It defers computation until some output action pulls pixels from the operators. This quickly helps save time and memory by avoiding needless pixel operations.
For example, suppose you need an output image that is 2048 x 2048 pixels, scaled up from a 512x512 crop out of a source image that's 1600x512 pixels. Obviously, it doesn't make sense to scale up the entire 1600x512 source image, just to throw away 2/3 of the pixels. Instead, the scaling operator will have a "region of interest" (ROI) based on it's output dimensions. The scaling operator projects the ROI onto the source image and only computes those pixels.
The commands must eventually get evaluated. This happens in a few situations, mostly relating to output of the final image. So, asking for a BufferedImage to display the output on the screen will force all the commands to evaluate. Similarly, writing the output image to disk will force evaluation.
In some cases, you can keep the second benefit of JAI, which is tile based rendering. Whereas BufferedImage does all its work right away, across all pixels, tile rendering just operates on rectangular sections of the image at a time.
Using the example from before, the 2048x2048 output image will get broken into tiles. Suppose these are 256x256, then the entire image gets broken into 64 tiles. The JAI operator objects know how to work a tile at a tile. So, scaling the 512x512 section of the source image really happens 64 times on 64x64 source pixels at a time.
Computing a tile at a time means looping across the tiles, which would seem to take more time. However, two things work in your favor when doing tile computation. First, tiles can be evaluated on multiple threads concurrently. Second, the transient memory usage is much, much lower than immediate mode computation.
All of which is a long-winded explanation for why you want to use JAI for this type of image processing.
A couple of notes and caveats:
You can defeat tile based rendering without realizing it. Anywhere you've got a BufferedImage in the workstream, it cannot act as a tile source or sink.
If you render to disk using the JAI or JAI Image I/O operators for JPEG, then you're in good shape. If you try to use the JDK's built-in image classes, you'll need all the memory. (Basically, avoid mixing the two types of image manipulation. Immediate mode and deferred mode don't mix well.)
All the fancy stuff with ROIs, tiles, and deferred mode are transparent to the program. You just make API call on the JAI class. You only deal with the machinery if you need more control over things like tile sizes, caching, and concurrency.
Here's a suggestion that might be useful;
Try segregating the two main tasks into individual programs. Your first task is to decide which images go where, and that can be a simple mapping from coordinates to filenames, which can be represented as lines of text:
0,0,image123.jpg
0,1,image542.jpg
.....
After that task is done (and it sounds like you have it well handled), then you can have a separate program handle the compilation.
This compilation could be done by appending to an image, but you probably don't want to mess around with file formats yourself. It's better to let your programming environment do it by using a Java Image object of some sort. The biggest one you can fit in memory pixelwise will be 2GB leading to sqrt(2x10^9) maximum height and width. From this number and dividing by the number of images you have for height and width, you will get the overall pixels per subimage allowed., and can paint them into the appropriate places.
Every time you 'append' are you perhaps implicitly creating a new object with one more pixel to replace the old one (ie, a parallel to the classic problem of repeatedly appending to a String instead of using a StringBuilder) ?
If you post the portion of your code that does the storing and appending, someone will probably help you find an efficient way of recoding it.
I'm trying to take an ongoing chess game and use image recognition to automatically transcribe it into a list of chess moves (1. e4 e5 2. Nf3 Nc6), automatically.
Given a 2D layout of the board and pieces, and standard images for each of the pieces, how would I go about in Java doing this?
Thanks!
I would imagine that depending on the set being played with, given a top down view of the board, it might prove difficult to distinguish between the different pieces.
Rather than relying on image recognition to determine which pieces are which, it would almost certainly be easier to simply track the pieces throughout the course of the game. You already know exactly where they started from, so after each turn it should be possible to deduce which square is now empty that wasn't previously empty, and which square is now occupied that wasn't previously occupied. This makes your image analysis much simpler as you're just determining whether each square on the board is empty or not.
Well if you have an image for each piece than just follow and save(using some arrays) all the moves. But first show us some code on what you have. More info.
For image classification in JAVA you can try Rapidminer and IMMI extension for image mining:
http://spl.utko.feec.vutbr.cz/en/component/content/article/46-image-processing-extension-for-rapidminer-5
For this purpose you should extract global features from images and than train some classifier (e.g. SVM). Another approach might be training Viola-Jones detector.
I'm trying to develop a 2D game to android using opengl.
I know how to print images on the screen and animate them. But in my game I have a map and a want to zoom in and out and scroll the map. But I can't figure out the best way of doing it.
Can anybody help me?
I don't have any api examples but I did games design at college so I'll give my two bits.
The method you use will depend on your map style, size, functionality and format.
For example if you are looking for a very static non changing map, use a simple picture image. You can use the API frame for a picture view, enabling you to zoom in and out as you do in the gallery and to scroll on zoomed images, or in this case, zoom locations on your map.
Alternatively, if your map is based off a tiling system, a good example of this is the original Pokémon and Legend of Zelda games from the old game boy, then each area stores a tile 'thumbnail' for itself as a bitmap. These are then put into their appropriate locations on a grid depending on what areas are discovered.
This is the probably the most flexible way to build your map as you are not relying on a set bitmap for the entirety your map meaning it can change its look efficiently; you can build it as desired to show areas of choice (useful for if the map only reveals places the gamer has covered) and it also means you can do tile based overlay:
ie - if a certain area should contain treasure, theres a treasure icon overlayed on that tiles x,y position on the map grid.
I used the tiling option in my game projects from college and it made everything else map related easier. It also made the map side of things smaller storage wise.
The simplest approach would be to just call glTranslatef(-scrollX,-scrollY,0) followed by glScalef(zoom,zoom,zoom) before you render your map.