I am implementing the sliding window technique to develop photo OCR, i.e., a rectangle of a specific size is cut from the picture and checked for whether it contains text; then the rectangle is shifted by some pixels and checked again. But this sliding window technique is taking a lot of time. For example, processing a 1366x768 picture takes 6 hours with a step size of 2 and a window size of 20x25. Is there another technique that could help, or a way to speed up the process?
I am coding in Java.
It is hard to give a specific recommendation without knowing any details of your algorithm/code. There are several potential performance improvements you could consider:
Minimize disk I/O and cache misses. You stated that a rectangle is "cut from the picture". If each "cut" is a separate read from disk, it is very inefficient and would contribute significantly to execution time. When you shift your window (by 2 pixels, it appears), most of the data in the new window is the same so try to avoid re-reading that data as much as possible.
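For instance, here is a minimal sketch of the read-once approach, assuming the whole image fits in memory; the file name and the classifier hookup are illustrative:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

public class WindowScan {
    public static void main(String[] args) throws Exception {
        // Read the image from disk exactly once (path is hypothetical).
        BufferedImage img = ImageIO.read(new File("photo.png"));
        int w = img.getWidth(), h = img.getHeight();

        // Copy all pixels into a flat array so each window "cut" is pure
        // in-memory index arithmetic, never another disk read.
        int[] pixels = img.getRGB(0, 0, w, h, null, 0, w);

        int winW = 20, winH = 25, step = 2;
        for (int y = 0; y + winH <= h; y += step) {
            for (int x = 0; x + winW <= w; x += step) {
                // Pass the window coordinates to your classifier here; it can
                // read pixels[(y + dy) * w + (x + dx)] for any offset directly.
                int topLeft = pixels[y * w + x];
            }
        }
    }
}
```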
Decrease your window size or increase your step size. This obviously affects your result but depending on the size of the characters you are trying to OCR, it might be an option.
If you are applying a convolution filter to do OCR, consider doing fast convolution via a 2D FFT of the image data.
Multithread your application, if it isn't already. While your problem is not embarrassingly parallel, it could be fairly easily multithreaded.
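A rough sketch of one way to split the scan across threads, where `containsText` stands in for your (hypothetical) window classifier; each task scans an interleaved band of rows, and since the pixel data is read-only, no synchronization is needed on it:

```java
import java.util.concurrent.*;

public class ParallelScan {
    // Hypothetical classifier: returns true if the window at (x, y) contains text.
    static boolean containsText(int[] pixels, int imgW, int x, int y) {
        return false; // placeholder
    }

    public static void main(String[] args) throws Exception {
        int w = 1366, h = 768, winW = 20, winH = 25, step = 2;
        int[] pixels = new int[w * h]; // filled from the image as shown earlier

        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                // Thread t handles rows offset, offset + threads, offset + 2*threads, ...
                for (int y = offset * step; y + winH <= h; y += step * threads) {
                    for (int x = 0; x + winW <= w; x += step) {
                        if (containsText(pixels, w, x, y)) {
                            // record the hit (e.g. in a ConcurrentLinkedQueue)
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```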
Sliding window approaches are brute force and, therefore, terribly slow by their nature. Perhaps you should take a look at salience-based techniques that use filters to prioritize which areas of the image to process.
Here is a paper I am somewhat familiar with: B. Draper and A. Lionelle, "Evaluation of Selective Attention under Similarity Transformations," Computer Vision and Image Understanding, 100:152-171, 2005.
Finally, what ANN library are you using? Make sure your ANN code is doing matrix/vector operations and that they are as optimized as possible!
I've been playing a bit with some image processing techniques to do HDR pictures and similar. I find it very hard to align pictures taken in bursts... I tried some naïve motion search algorithms, simply based on comparing small samples of pixels (like 16x16) between different pictures, that pretty much work like this (a minimal sketch follows the list):
- select one 16x16 block in the first picture, one with high contrast, then blur it to reduce noise
- compare it against blocks in a neighbouring radius of the second picture (also blurred against noise), usually using the averaged squared difference
- select the most similar one.
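For reference, a minimal sketch of that averaged-squared-difference search, assuming both pictures are already grayscale and stored as flat row-major arrays:

```java
public class BlockMatcher {
    /** Mean squared difference between the 16x16 block at (ax, ay) in a
     *  and the block at (bx, by) in b; both images are width w, row-major. */
    static double msd(int[] a, int[] b, int w, int ax, int ay, int bx, int by) {
        long sum = 0;
        for (int dy = 0; dy < 16; dy++) {
            for (int dx = 0; dx < 16; dx++) {
                int d = a[(ay + dy) * w + (ax + dx)] - b[(by + dy) * w + (bx + dx)];
                sum += (long) d * d;
            }
        }
        return sum / 256.0;
    }

    /** Exhaustive search in a square radius around (ax, ay); returns the
     *  offset (dx, dy) of the most similar block found in b. */
    static int[] bestMatch(int[] a, int[] b, int w, int h, int ax, int ay, int radius) {
        double best = Double.MAX_VALUE;
        int[] offset = {0, 0};
        for (int dy = -radius; dy <= radius; dy++) {
            for (int dx = -radius; dx <= radius; dx++) {
                int bx = ax + dx, by = ay + dy;
                if (bx < 0 || by < 0 || bx + 16 > w || by + 16 > h) continue;
                double d = msd(a, b, w, ax, ay, bx, by);
                if (d < best) { best = d; offset = new int[]{dx, dy}; }
            }
        }
        return offset;
    }
}
```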
I tried a few things to improve this, for example using these search algorithms (https://en.wikipedia.org/wiki/Block-matching_algorithm) to speed it up. The results, however, are not good, and when they are, they are not robust. They also remain computationally very intensive (which precludes usage on a mobile device, for example).
I looked into popular research-based algorithms like https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method, but they do not seem very suitable for big movements. With burst images taken on today's phones, which have sensors of more than 12 Mpix, even small hand movements easily produce displacements of 50-100 pixels. The Lucas-Kanade method seems more suitable for small amounts of motion.
It's a bit frustrating, as there seem to be hundreds of apps that do HDR, and they seem to be able to match pictures easily and reliably in a snap... I've tried to look into OpenCV, but all it offers seems to be the above Lucas-Kanade method. I've also seen projects like https://github.com/almalence/OpenCamera, which do this in pure Java, although the code is not easy (one class has 5k lines doing it all). Does anyone have any pointers to reliable resources?
Take a look at the HDR+ paper by Google. It uses a hierarchical algorithm for alignment that is very fast but not robust enough. Afterwards, it uses a merging algorithm that is robust to alignment failures.
But it may be a little tricky to use it for normal HDR, since it says:
We capture frames of constant exposure, which makes alignment more robust.
Here is another work that needs sub-pixel accurate alignment. It uses a refined version of the alignment introduced in the HDR+ paper.
HDR+ code
SURF feature matching spends a lot of processing time, so I decided to resize the bitmap in order to shorten SURF's processing time. Can I assume that making the bitmap smaller will reduce the processing time of SURF?
Sure, that's one way to speed up most image processing algorithms.
In OpenCV, you can also specify the parameters _nOctaveLayers and _nOctaves in the SURF constructor. These parameters dictate the number of different scales that the algorithm checks for feature points. If you decrease these, you will get a faster detection time, but you will also miss out on feature points at scales that aren't checked.
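Something along these lines, assuming OpenCV's Java bindings with the contrib (xfeatures2d) module; the parameter values are only illustrative:

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import org.opencv.xfeatures2d.SURF;

public class FastSurf {
    static MatOfKeyPoint detectFast(Mat image) {
        // Halve the resolution first; detection cost roughly follows pixel count.
        Mat small = new Mat();
        Imgproc.resize(image, small, new Size(), 0.5, 0.5, Imgproc.INTER_AREA);

        // Fewer octaves/layers = fewer scales checked = faster detection,
        // but feature points at the skipped scales are missed.
        SURF surf = SURF.create(400,   // hessianThreshold
                                2,     // nOctaves (default 4)
                                2,     // nOctaveLayers (default 3)
                                false, // extended descriptors
                                false  // upright
        );
        MatOfKeyPoint keypoints = new MatOfKeyPoint();
        surf.detect(small, keypoints);
        return keypoints;
    }
}
```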
These speedups are based around the detection of SURF points. If you are talking strictly about matching the points, then it is the number of points in the image that is the largest dictator of the running time.
Have you tried ORB? You can find ORB sample usage under samples/python2/plane_tracker.py. I haven't tried it on a phone, but on a PC it can match many targets simultaneously and quickly, while SURF struggles with just one.
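A minimal sketch of ORB detection and matching, assuming OpenCV's Java bindings and using defaults for all ORB parameters:

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.ORB;

public class OrbMatch {
    static MatOfDMatch match(Mat img1, Mat img2) {
        ORB orb = ORB.create();
        MatOfKeyPoint kp1 = new MatOfKeyPoint(), kp2 = new MatOfKeyPoint();
        Mat des1 = new Mat(), des2 = new Mat();
        orb.detectAndCompute(img1, new Mat(), kp1, des1);
        orb.detectAndCompute(img2, new Mat(), kp2, des2);

        // ORB descriptors are binary, so Hamming distance is the right metric.
        DescriptorMatcher matcher =
                DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);
        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(des1, des2, matches);
        return matches;
    }
}
```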
I am rendering a rather heavy object consisting of about 500k triangles. I use an OpenGL display list, and in the render method I just call glCallList. I thought that once the graphics primitives are compiled into a display list, the CPU's work is done and it just tells the GPU to draw. But now one CPU core is loaded up to 100%.
Could you give me some clues why this happens?
UPDATE: I have checked how long glCallList takes to run; it's fast, about 30 milliseconds.
Most likely you are hitting the limits on the list length, which are at 64k vertices per list. Try to split your 500k triangles (1500k vertices?) into smaller chunks and see what you get.
By the way, which graphics chip are you using? If the vertices are processed on the CPU, that might also be a problem.
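A rough sketch of such chunking, assuming LWJGL's GL11 bindings; pick maxVerts as a multiple of 3 so every chunk ends on a triangle boundary:

```java
import org.lwjgl.opengl.GL11;

public class ChunkedLists {
    /** Compile a triangle soup into several display lists of at most
     *  maxVerts vertices each; vertices holds x,y,z triples. */
    static int[] compileChunks(float[] vertices, int maxVerts) {
        int totalVerts = vertices.length / 3;
        int chunks = (totalVerts + maxVerts - 1) / maxVerts;
        int[] lists = new int[chunks];
        for (int c = 0; c < chunks; c++) {
            lists[c] = GL11.glGenLists(1);
            GL11.glNewList(lists[c], GL11.GL_COMPILE);
            GL11.glBegin(GL11.GL_TRIANGLES);
            int start = c * maxVerts;
            int end = Math.min(start + maxVerts, totalVerts);
            for (int v = start; v < end; v++) {
                GL11.glVertex3f(vertices[3 * v], vertices[3 * v + 1], vertices[3 * v + 2]);
            }
            GL11.glEnd();
            GL11.glEndList();
        }
        return lists;
    }

    static void render(int[] lists) {
        for (int list : lists) GL11.glCallList(list);
    }
}
```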
It's a bit of a myth that display lists magically offload everything to the GPU. If that was really the case, texture objects and vertex buffers wouldn't have needed to be added to OpenGL. All the display list really is, is a convenient way of replaying a sequence of OpenGL calls and hopefully saving some of the function call/data conversion overhead (see here). None of the PC HW implementations I've used seem to have done anything more than that so far as I can tell; maybe it was different back in the days of SGI workstations, but these days buffer objects are the way to go. (And modern OpenGL books like OpenGL Distilled give glBegin/glEnd etc the briefest of mentions only before getting stuck into the new stuff).
The one place I have seen display lists make a huge difference is the GLX/X11 case where your app is running remotely to your display (X11 "server"); in that case using a display list really does push all the display-list state to the display side just once, whereas a non-display-list immediate-mode app needs to send a bunch of stuff again each frame using lots more bandwidth.
However, display lists aside, you should be aware of some issues around vsync and busy waiting (or the illusion of it)... see this question/answer.
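For comparison, a minimal buffer-object sketch (again assuming LWJGL, fixed-function pipeline): the upload happens once, and each frame only issues the draw call against data already resident on the GPU:

```java
import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL15;
import java.nio.FloatBuffer;

public class VboMesh {
    /** Upload the vertex data (x,y,z triples) to a buffer object once. */
    static int upload(float[] vertices) {
        FloatBuffer buf = BufferUtils.createFloatBuffer(vertices.length);
        buf.put(vertices).flip();
        int vbo = GL15.glGenBuffers();
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
        GL15.glBufferData(GL15.GL_ARRAY_BUFFER, buf, GL15.GL_STATIC_DRAW);
        return vbo;
    }

    /** Draw every frame; no vertex data crosses the bus again. */
    static void draw(int vbo, int vertexCount) {
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
        GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
        GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0L);
        GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, vertexCount);
        GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);
    }
}
```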
So I need to understand how Swing allocates memory for buffering screen rendering. Obviously there are duplicates if you have double/triple/etc. buffering. However, I need to know when Swing allocates memory and how much of it. It would be very helpful to know, if I have multiple windows open (launched from the same JVM), how much memory is needed depending on whether windows are maximized to one screen or spread over multiple screens (I need it to go up to 6 screens), etc.
Does anyone know of any good readings, or have answers for how Java Swing/AWT allocates memory for rendering buffers?
At the end of the day, I am looking for a definitive formula so that, given the number of windows opened, the number of buffers in each window, and the location and size of each window, I can get an exact byte count required to render the application (just the buffering part; the rest of the memory is another problem).
I was assuming it was (single-buffered) x by y of each window = 1 buffer; add those together and you have the full memory requirement. But profiling the data, this appears to be far from the truth: some buffers are weak/soft references, some strong, and I cannot determine the way to calculate it (yet :)).
Edit: I am using JFrame objects (for better or worse) to do my top-level stuff.
Double buffering is a convenient feature of JPanel, but there will always be a significant platform-dependent contribution: Every visible JComponent belongs to a heavyweight peer whose memory is otherwise inaccessible to the JVM.
If you're trying to avoid running out of memory, pick a reasonable value for the startup parameters and instruct the user how to change them. ImageJ is a good example.
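For example, a tiny check of the ceiling those startup parameters impose (the -Xmx value shown is illustrative):

```java
public class MemoryCheck {
    public static void main(String[] args) {
        // Launch with e.g.:  java -Xms256m -Xmx1024m MemoryCheck
        // (values are illustrative; size them for your screen count/sizes)
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("JVM heap ceiling: %d MB%n", maxBytes / (1024 * 1024));
    }
}
```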
I'd recommend JConsole and the Swing source code for this kind of precision.
I assume you realize this will be extremely tedious to calculate since you have to consider every object created somewhere in the process, which of course will depend on the controls involved in the UI.
I am not aware of any automatic screen buffering support in Swing. If you need double buffering, you need to implement it yourself in which case you will know better how to calculate memory requirements :-)
See this answer: Java: how to do double-buffering in Swing? for more information and good pointers.
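A minimal sketch of hand-rolled double buffering in a JPanel, which also makes the buffer's size (width x height x 4 bytes for TYPE_INT_ARGB) explicit and accountable:

```java
import javax.swing.*;
import java.awt.*;
import java.awt.image.BufferedImage;

public class BufferedPanel extends JPanel {
    private BufferedImage buffer; // one full-size off-screen buffer

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        if (getWidth() == 0 || getHeight() == 0) return;
        // (Re)allocate the buffer whenever the panel size changes.
        if (buffer == null || buffer.getWidth() != getWidth()
                || buffer.getHeight() != getHeight()) {
            buffer = new BufferedImage(getWidth(), getHeight(),
                                       BufferedImage.TYPE_INT_ARGB);
        }
        Graphics2D bg = buffer.createGraphics();
        bg.setColor(Color.WHITE);
        bg.fillRect(0, 0, getWidth(), getHeight());
        // ... draw the scene into bg here ...
        bg.dispose();
        // Blit the finished frame in one operation to avoid flicker.
        g.drawImage(buffer, 0, 0, null);
    }
}
```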
I need a quick opinion from all the gurus!
I developed a program to do some matrix calculations. It works fine with small matrices. However, when I start calculating a BIG matrix with thousands of columns and rows, it kills the speed.
I was thinking of processing each row, writing the result to a file, then freeing the memory and processing the 2nd row, writing it to the file, and so on.
Will it help in improving speed? I would have to make big changes to implement this, which is why I need your opinion. What do you think?
Thanks
P.S.: I know about the Colt and JAMA matrix packages. I cannot use these packages due to company rules.
Edit:
In my program I store the whole matrix in a two-dimensional array, which is fine when the matrix is small. However, when it has thousands of columns and rows, keeping it all in memory for the calculation causes performance issues. The matrix contains floating-point values. For processing, I read the whole matrix into memory and then start calculating; after calculating, I write the result to a file.
Is memory really your bottleneck? Because if it isn't, then stopping to write things out to a file is always going to be much, much slower than the alternative. It sounds like you are probably experiencing some limitation of your algorithm.
Perhaps you should consider optimising the algorithm first.
And as I always say for all performance issues: asking people is one thing, but there is no substitute for trying it! Opinions don't matter if the real-world performance is measurable.
I suggest using profiling tools and timing statements in your code to work out exactly where your performance problem is before you start making changes.
You could spend a long time 'fixing' something that isn't the problem. I suspect that the file IO you suggest would actually slow your code down.
If your code effectively has a loop nested within another loop to process each element then you will see your speed drop away quickly as you increase the size of the matrix. If so, an area to look at would be processing your data in parallel, allowing your code to take advantage of multiple CPUs/cores.
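For example, here is a sketch of a row-parallel multiply using parallel streams (names are illustrative); each thread writes only its own rows of the result, so no locking is needed:

```java
import java.util.stream.IntStream;

public class ParallelMultiply {
    /** c = a * b for square n x n matrices; rows are computed in parallel. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int k = 0; k < n; k++) {   // k-before-j order keeps the
                double aik = a[i][k];       // access to b's rows sequential
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        });
        return c;
    }
}
```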
Consider a more efficient sparse matrix data structure rather than a multidimensional array (if you are using one now).
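A minimal sketch of such a structure, storing only the non-zero entries so memory scales with the number of non-zeros instead of rows x columns:

```java
import java.util.HashMap;
import java.util.Map;

public class SparseMatrix {
    // Outer map: row index -> (column index -> value); zeros are implicit.
    private final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    public void set(int row, int col, double value) {
        if (value == 0.0) return; // don't store zeros
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
    }

    public double get(int row, int col) {
        Map<Integer, Double> r = rows.get(row);
        return r == null ? 0.0 : r.getOrDefault(col, 0.0);
    }
}
```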
You need to remember that performing an NxN multiplied by an NxN takes 2xN^3 calculations. Even so, it shouldn't take hours. You should get an improvement by transposing the second matrix (about 30%; see the sketch below), but it really shouldn't be taking hours.
So as you double N, you increase the time by 8x. Worse than that, a matrix which fits into your cache is very fast, but at more than a few MB the data has to come from main memory, which slows your operations down by another 2-5x.
Putting the data on disk will really slow down your calculation; I only suggest you do this if your matrix doesn't fit in memory, but it will make things 10x-100x slower, so buying a little more memory is a good idea. (In your case your matrices should be small enough to fit into memory.)
I tried Jama, which is a very basic library that uses two-dimensional arrays instead of one-dimensional ones; on a 4-year-old laptop it took 7 minutes. You should be able to get half this time just by using the latest hardware, and with multiple threads cut it below one minute.
EDIT: Using a Xeon X5570, Jama multiplied two 5000x5000 matrices in 156 seconds. Using a parallel implementation I wrote, cut this time to 27 seconds.
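For reference, a sketch of the transposition trick mentioned above: transposing b first lets the inner loop walk both operands sequentially in memory, which is what buys the cache-friendliness:

```java
public class TransposedMultiply {
    /** c = a * b for square n x n matrices, with b transposed up front. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] bt = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                bt[j][i] = b[i][j];

        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * bt[j][k]; // both rows scanned left-to-right
                }
                c[i][j] = sum;
            }
        }
        return c;
    }
}
```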
Use the profiler in jvisualvm in the JDK to identify where the time is spent.
I would do some simple experiments to identify how your algorithm scales, because it sounds like you might be using one that has a higher runtime complexity than you think. If it runs in N^3 (which is common if you multiply matrices naively), then doubling the input size will increase the run time roughly eightfold.
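A simple way to run that experiment (a sketch; the sizes are illustrative): time the same multiply at n and 2n and compare the ratio, which should land near 8 for an N^3 algorithm:

```java
public class ScalingTest {
    public static void main(String[] args) {
        for (int n = 250; n <= 1000; n *= 2) {
            double[][] a = random(n), b = random(n);
            long start = System.nanoTime();
            multiply(a, b);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("n = %4d: %6d ms%n", n, ms);
        }
    }

    static double[][] random(int n) {
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = Math.random();
        return m;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }
}
```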