I am using SikuliX to check when a video on a website has ended.
I do this by comparing my region (which is my active web browser window) to a screen capture of the region I have taken while the video is playing.
If it doesn't match, that means the video is still playing and I will take a new screen capture, which will be run through the while loop again for comparison.
If it matches, it means the video has stopped and will exit the while loop.
It works the first time through the loop: exists() returns null, which means the video is playing. However, on the second iteration it exits the while loop and tells me the video has stopped, even though it clearly hasn't.
Is my logic flawed?
// Get dimensions of the bounding rectangle of the specified window
WinDef.HWND hwnd = User32.INSTANCE.GetForegroundWindow();
WinDef.RECT dimensions = new WinDef.RECT();
// Get screen coordinates of upper-left and lower-right corners of the window in dimensions
User32.INSTANCE.GetWindowRect(hwnd, dimensions);
Rectangle window = new Rectangle(dimensions.toRectangle());
int x = window.x;
int y = window.y;
int width = window.width;
int height = window.height;
// Initialize screen region for Sikuli to match
Region region = new Region(x, y, width, height);
Robot robot;
Image image;
Pattern p;
try {
robot = new Robot(); // Gets and saves a reference to a new Robot object
} catch (AWTException e) {
throw new RuntimeException(
"Failed to initialize robot...");
}
robot.delay(3000); // Delay robot for 3 seconds
// Take a screen capture of the region
BufferedImage capture = robot.createScreenCapture(dimensions.toRectangle());
image = new Image(capture);
p = new Pattern(image);
region.wait(1.0); // Wait 1 second
// Check if region content is still the same
while (region.exists(p.similar((float) 0.99), 0) == null) {
System.out.println("Video is playing");
// Take a new screen capture of the region
BufferedImage captureLoop = robot.createScreenCapture(dimensions.toRectangle());
image = new Image(captureLoop);
p = new Pattern(image);
region.wait(1.0); // Wait 1 second
}
System.out.println("Video has stopped");
Instead of while (region.exists(p.similar((float) 0.99), 0) == null), comparing the region against the screen capture with while (region.compare(image).getScore() == 1.0) gave the results I wanted.
I have been searching the whole day for a solution and have checked out several threads regarding my problem:
Custom detector object
Reduce bar code tracking window
and more...
But they didn't help me much. Basically, I want the camera preview to be fullscreen, but text should only be recognized in the center of the screen, where a rectangle is drawn.
Technologies I am using:
Google Mobile Vision API for optical character recognition (OCR)
Dependency: play-services-vision
My current state: I created a BoxDetector class:
public class BoxDetector extends Detector {
private Detector mDelegate;
private int mBoxWidth, mBoxHeight;
public BoxDetector(Detector delegate, int boxWidth, int boxHeight) {
mDelegate = delegate;
mBoxWidth = boxWidth;
mBoxHeight = boxHeight;
}
public SparseArray detect(Frame frame) {
int width = frame.getMetadata().getWidth();
int height = frame.getMetadata().getHeight();
int right = (width / 2) + (mBoxHeight / 2);
int left = (width / 2) - (mBoxHeight / 2);
int bottom = (height / 2) + (mBoxWidth / 2);
int top = (height / 2) - (mBoxWidth / 2);
YuvImage yuvImage = new YuvImage(frame.getGrayscaleImageData().array(), ImageFormat.NV21, width, height, null);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
yuvImage.compressToJpeg(new Rect(left, top, right, bottom), 100, byteArrayOutputStream);
byte[] jpegArray = byteArrayOutputStream.toByteArray();
Bitmap bitmap = BitmapFactory.decodeByteArray(jpegArray, 0, jpegArray.length);
Frame croppedFrame =
new Frame.Builder()
.setBitmap(bitmap)
.setRotation(frame.getMetadata().getRotation())
.build();
return mDelegate.detect(croppedFrame);
}
public boolean isOperational() {
return mDelegate.isOperational();
}
public boolean setFocus(int id) {
return mDelegate.setFocus(id);
}
@Override
public void receiveFrame(Frame frame) {
mDelegate.receiveFrame(frame);
}
}
And implemented an instance of this class here:
final TextRecognizer textRecognizer = new TextRecognizer.Builder(App.getContext()).build();
// Instantiate the created box detector in order to limit the Text Detector scan area
BoxDetector boxDetector = new BoxDetector(textRecognizer, width, height);
// Set the TextRecognizer's Processor, but via the BoxDetector
boxDetector.setProcessor(new Detector.Processor<TextBlock>() {
@Override
public void release() {
}
/*
Detect all the text from the camera using TextBlock,
append the values to a StringBuilder, and set the result on the TextView.
*/
@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
final SparseArray<TextBlock> items = detections.getDetectedItems();
if (items.size() != 0) {
mTextView.post(new Runnable() {
@Override
public void run() {
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < items.size(); i++) {
TextBlock item = items.valueAt(i);
stringBuilder.append(item.getValue());
stringBuilder.append("\n");
}
mTextView.setText(stringBuilder.toString());
}
});
}
}
});
mCameraSource = new CameraSource.Builder(App.getContext(), boxDetector)
.setFacing(CameraSource.CAMERA_FACING_BACK)
.setRequestedPreviewSize(height, width)
.setAutoFocusEnabled(true)
.setRequestedFps(15.0f)
.build();
On execution this Exception is thrown:
Exception thrown from receiver.
java.lang.IllegalStateException: Detector processor must first be set with setProcessor in order to receive detection results.
at com.google.android.gms.vision.Detector.receiveFrame(com.google.android.gms:play-services-vision-common@@19.0.0:17)
at com.spectures.shopendings.Helpers.BoxDetector.receiveFrame(BoxDetector.java:62)
at com.google.android.gms.vision.CameraSource$zzb.run(com.google.android.gms:play-services-vision-common@@19.0.0:47)
at java.lang.Thread.run(Thread.java:919)
If anyone has a clue what my mistake is, or has any alternatives, I would really appreciate it. Thank you!
This is what I want to achieve: a rectangular text-area scanner.
Google Vision detection takes a Frame as input. A Frame is image data with an associated width and height. You can process this frame (crop it to a smaller, centered frame) before passing it to the Detector. This processing must be fast and run alongside the camera's image processing.
Check out my GitHub below and search for FrameProcessingRunnable. You can see the frame input there and do the processing yourself.
CameraSource
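As a rough sketch of that idea (not the exact code from the repository): for a bitmap-backed Frame, cropping a centered box before detection could look like the snippet below. Note that frames coming straight from CameraSource carry NV21 data, where getBitmap() returns null, which is why the BoxDetector above goes through YuvImage instead; boxWidth and boxHeight are assumed to be defined elsewhere.
// Hypothetical pre-processing step: crop a centered boxWidth x boxHeight area
// out of a bitmap-backed Frame before handing it to the text recognizer.
Bitmap full = frame.getBitmap(); // null for NV21 camera frames!
int left = (full.getWidth() - boxWidth) / 2;
int top = (full.getHeight() - boxHeight) / 2;
Bitmap center = Bitmap.createBitmap(full, left, top, boxWidth, boxHeight);
Frame cropped = new Frame.Builder()
        .setBitmap(center)
        .setRotation(frame.getMetadata().getRotation())
        .build();
SparseArray<TextBlock> detections = textRecognizer.detect(cropped);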
You can try to pre-parse the CameraSource feed as @Thành Hà Văn mentioned (which I tried first myself, but discarded after trying to adjust for the old and new camera APIs), but I found it easier to just limit the search area and use the detections returned by the default Vision detector and CameraSource. You can do it in several ways. For example,
(1) limiting the area of the screen by setting bounds based on the screen/preview size
(2) creating a custom class that can be used to dynamically set the detection area
I chose option 2 (I can post my custom class if needed), and then, when handling detections, I filtered for those that fall within the specified area:
for (j in 0 until detections.size()) {
val textBlock = detections.valueAt(j) as TextBlock
for (line in textBlock.components) {
if((line.boundingBox.top.toFloat()*hScale) >= scanView.top.toFloat() && (line.boundingBox.bottom.toFloat()*hScale) <= scanView.bottom.toFloat()) {
canvas.drawRect(line.boundingBox, linePainter)
if(scanning)
if (((line.boundingBox.top.toFloat() * hScale) <= yTouch && (line.boundingBox.bottom.toFloat() * hScale) >= yTouch) &&
((line.boundingBox.left.toFloat() * wScale) <= xTouch && (line.boundingBox.right.toFloat() * wScale) >= xTouch) ) {
acceptDetection(line, scanCount)
}
}
}
}
The scanning section is just some custom code I used to let the user select which detections they wanted to keep. You would replace everything inside the if(line....) block with your own code so that it only acts on the cropped detection area. Note that this example only crops vertically, but you could also crop horizontally, or in both directions.
In google-vision you can get the coordinates of detected text as described in How to get position of text in an image using Mobile Vision API?
You get the TextBlocks from the TextRecognizer, then filter them by their coordinates, which can be determined with the getBoundingBox() or getCornerPoints() methods of the TextBlock class:
TextRecognizer
Recognition results are returned by detect(Frame). The OCR algorithm tries to infer the text layout and organizes each paragraph into TextBlock instances. If any text is detected, at least one TextBlock instance will be returned.
[..]
Public Methods
public SparseArray<TextBlock> detect (Frame frame) Detects and recognizes text in an image. Only supports bitmap and NV21 for now.
Returns mapping of int to TextBlock, where the int domain represents an opaque ID for the text block.
source : https://developers.google.com/android/reference/com/google/android/gms/vision/text/TextRecognizer
TextBlock
public class TextBlock extends Object implements Text
A block of text (think of it as a paragraph) as deemed by the OCR engine.
Public Method Summary
Rect getBoundingBox() Returns the TextBlock's axis-aligned bounding box.
List<? extends Text> getComponents() Smaller components that comprise this entity, if any.
Point[] getCornerPoints() 4 corner points in clockwise direction starting with top-left.
String getLanguage() Prevailing language in the TextBlock.
String getValue() Retrieve the recognized text as a string.
source : https://developers.google.com/android/reference/com/google/android/gms/vision/text/TextBlock
So you basically proceed as in How to get position of text in an image using Mobile Vision API?, except that you do not split each block into lines and each line into words, as in:
//Loop through each `Block`
foreach (TextBlock textBlock in blocks)
{
IList<IText> textLines = textBlock.Components;
//loop Through each `Line`
foreach (IText currentLine in textLines)
{
IList<IText> words = currentLine.Components;
//Loop through each `Word`
foreach (IText currentword in words)
{
//Get the Rectangle/boundingBox of the word
RectF rect = new RectF(currentword.BoundingBox);
rectPaint.Color = Color.Black;
//Finally Draw Rectangle/boundingBox around word
canvas.DrawRect(rect, rectPaint);
//Set image to the `View`
imgView.SetImageDrawable(new BitmapDrawable(Resources, tempBitmap));
}
}
}
Instead, you get the bounding box of each text block and then select the one whose coordinates are closest to the center of the screen/frame, or to the rectangle that you specify (see How can I get center x,y of my view in android?). For this you use the getBoundingBox() or getCornerPoints() methods of TextBlock.
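As a rough sketch of that selection step: here, detections stands for the Detector.Detections<TextBlock> passed to receiveDetections, and frameWidth/frameHeight are assumed placeholders for your frame or preview dimensions.
// Sketch: pick the TextBlock whose bounding-box center is closest to the frame center.
SparseArray<TextBlock> items = detections.getDetectedItems();
TextBlock best = null;
double bestDistance = Double.MAX_VALUE;
int centerX = frameWidth / 2;  // assumed frame/preview width
int centerY = frameHeight / 2; // assumed frame/preview height
for (int i = 0; i < items.size(); i++) {
    TextBlock block = items.valueAt(i);
    Rect box = block.getBoundingBox();
    double distance = Math.hypot(box.centerX() - centerX, box.centerY() - centerY);
    if (distance < bestDistance) {
        bestDistance = distance;
        best = block;
    }
}
// 'best' (if non-null) is the block nearest the center; ignore the rest.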
Currently I am developing a project in which it is a requirement that I screenshot the current active window on the screen (assuming one monitor) and save it as an image.
I have worked at the following code which screenshots the entire screen:
int x = 0,y = 0;
Color suit = new Robot().getPixelColor(x, y);
Rectangle fs = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
BufferedImage rank = new Robot().createScreenCapture(fs);
ImageIO.write(rank, "bmp", new File("hi.bmp"));
and I am of the understanding that to get the size of the current active window one must use a method such as:
public static long getHWnd(Frame f) {
return f.getPeer() != null ? ((WComponentPeer) f.getPeer()).getHWnd() : 0;
}
However, I am having trouble integrating this method into my code, and I have no previous experience working with frames or rectangles.
Could I be pointed in the right direction in terms of where to go next?
Thanks.
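For what it's worth, here is a minimal sketch of capturing only the foreground window on Windows, combining Robot with the same JNA User32 calls used in the first snippet on this page. It assumes the jna and jna-platform libraries are on the classpath; the class and file names are placeholders.
import com.sun.jna.platform.win32.User32;
import com.sun.jna.platform.win32.WinDef;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class ActiveWindowCapture {
    public static void main(String[] args) throws Exception {
        // Ask Windows for the currently focused (foreground) window and its bounds.
        WinDef.HWND hwnd = User32.INSTANCE.GetForegroundWindow();
        WinDef.RECT rect = new WinDef.RECT();
        User32.INSTANCE.GetWindowRect(hwnd, rect);
        Rectangle bounds = rect.toRectangle();

        // Capture only that rectangle and save it to disk.
        BufferedImage shot = new Robot().createScreenCapture(bounds);
        ImageIO.write(shot, "bmp", new File("activeWindow.bmp"));
    }
}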
I am writing a game using libGDX and I need to detect when the user touches a sprite.
I tried to do so with the following code.
This code sets each rect's position:
for(int i = 0;i<circlesArray.length;i++)
{
rect[i] = new Rectangle(circles.getPosition(i).x,circles.getPosition(i).y,height/8,height/8);
}
and this code represents the click as a rectangle and checks whether it overlaps the sprite:
if(Gdx.input.isTouched())
{
click = new Rectangle(Gdx.input.getX(),Gdx.input.getY(),Gdx.input.getX(),Gdx.input.getY());
if(Intersector.overlaps(click,rect[1]));
{
System.out.println("clicked");
x[1] = 0;
}
}
For some reason that I can't understand, the program detects a collision even when one hasn't happened: when I tap anywhere on the screen, it says that I pressed the sprite.
What should I do to fix it?
This is your issue:
click = new Rectangle(Gdx.input.getX(), Gdx.input.getY(), Gdx.input.getX(), Gdx.input.getY());
Should be:
click = new Rectangle(Gdx.input.getX(), Gdx.input.getY(), 1, 1);
The last two parameters of a Rectangle are width and height. Here we give the rectangle a width and a height of 1 pixel, but you can set it to whatever you like. You were setting those parameters to the input's x and y, which are different every time, giving you varying results.
Edit:
To translate the input coordinates to your game world's coordinates based on the camera you have to do this:
Vector3 clickPos = new Vector3(Gdx.input.getX(), Gdx.input.getY(), 0);
camera.unproject(clickPos);
click = new Rectangle(clickPos.x, clickPos.y, 1, 1);
When you use camera.unproject(Vector3), it translates your input coordinates based on where your camera is positioned in the game world. So now we're working with the clickPos vector.
The reason we use a Vector3 is that this is what the unproject function requires. We supply the vector with the input's x and y, and give it 0 for the third parameter z since this is a 2D world.
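Putting it together, a sketch of the whole touch check with unprojection, using the names from the question's code (also note the stray semicolon after the original if (Intersector.overlaps(...)) line, which makes its block run unconditionally):
if (Gdx.input.isTouched()) {
    Vector3 clickPos = new Vector3(Gdx.input.getX(), Gdx.input.getY(), 0);
    camera.unproject(clickPos); // screen -> world coordinates
    Rectangle click = new Rectangle(clickPos.x, clickPos.y, 1, 1);
    if (Intersector.overlaps(click, rect[1])) { // no ';' after the condition
        System.out.println("clicked");
        x[1] = 0;
    }
}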
Occasionally I have to display a popup or dialog relative to an existing component (prime example is a date input control with a calendar button beside it).
It worked beautifully for years, but it always had the bug that the calendar could partially appear outside the screen (it was hardcoded to appear just to the right of the field). Nobody ever noticed, because there was never a date control at the far right of a window. Well, that changed recently with the addition of a new window.
Well then, I thought, let's just fix a window's position (after I've positioned it where it should be) so that it is completely on screen. I wrote a simple utility method to do just that:
public static void correctWindowLocationForScreen(Window window) {
GraphicsConfiguration gc = window.getGraphicsConfiguration();
Rectangle screenRect = gc.getBounds();
Rectangle windowRect = window.getBounds();
Rectangle newRect = new Rectangle(windowRect);
if (windowRect.x + windowRect.width > screenRect.x + screenRect.width)
newRect.x = screenRect.x + screenRect.width - windowRect.width;
if (windowRect.y + windowRect.height > screenRect.y + screenRect.height)
newRect.y = screenRect.y + screenRect.height - windowRect.height;
if (windowRect.x < screenRect.x)
newRect.x = screenRect.x;
if (windowRect.y < screenRect.y)
newRect.y = screenRect.y;
if (!newRect.equals(windowRect))
window.setLocation(newRect.x, newRect.y);
}
Problem solved. Or not. I position my window using the on-screen coordinates from the triggering component (the button that makes the calendar appear):
JComponent invoker = ... // passed in from the date field (a JButton)
Window owner = SwingUtilities.getWindowAncestor(invoker);
JDialog dialog = new JDialog(owner);
dialog.setLocation(invoker.getLocationOnScreen());
correctWindowLocationForScreen(dialog);
Havoc breaks out if the "invoker" component is located in a window that spans two screens. Apparently window.getGraphicsConfiguration() returns whichever graphics configuration the window's top-left corner happens to be in. That's not necessarily the screen where the date component within the window is located.
So how can I position my dialog properly in this case?
One can iterate over all devices and find the monitor the point is in, then keep to that Rectangle.
See GraphicsEnvironment.getScreenDevices.
This will not use the current Window, but you have already found out that a window may span several monitors.
Component.getLocationOnScreen may also be useful.
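A small sketch of that idea (the method name is made up): given a point on screen, e.g. invoker.getLocationOnScreen(), return the bounds of the monitor containing it.
// Returns the bounds of the screen containing the given point,
// falling back to the default screen if no monitor contains it.
public static Rectangle screenBoundsAt(Point p) {
    GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
    for (GraphicsDevice device : ge.getScreenDevices()) {
        Rectangle bounds = device.getDefaultConfiguration().getBounds();
        if (bounds.contains(p)) {
            return bounds;
        }
    }
    return ge.getDefaultScreenDevice().getDefaultConfiguration().getBounds();
}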
Ok, here is what I ended up with (a wall of code to handle the odd edge case).
correctWindowLocationForScreen() will reposition a window if it is not completely within the visible screen area (simplest case: it's completely on one screen; hard case: it spans multiple screens). If the window leaves the complete screen area by even one pixel, it is repositioned using the first screen rectangle found. If the window doesn't fit the screen, it is positioned at the top left and extends past the screen to the bottom right (this is implied by the order in which positionInsideRectangle() checks/alters the coordinates).
It's quite complicated considering the requirement is pretty simple.
/**
* Check that the window is completely on screen; if not, correct its position.
* This will not ensure that the window fits completely onto the screen.
*/
public static void correctWindowLocationForScreen(final Window window) {
correctComponentLocation(window, getScreenRectangles());
}
/**
* Set the component location so that it is completely inside the available
* regions (if possible).
* Although the method will make some effort to place the component
* nicely, it may end up partially outside the regions (either because it
* doesn't fit at all, or the regions are placed badly).
*/
public static void correctComponentLocation(final Component component, final Rectangle ... availableRegions) {
// check the simple cases (component completely inside one region, no regions available)
final Rectangle bounds = component.getBounds();
if (availableRegions == null || availableRegions.length <= 0)
return;
final List<Rectangle> intersecting = new ArrayList<>(3);
for (final Rectangle region : availableRegions) {
if (region.contains(bounds)) {
return;
} else if (region.intersects(bounds)) {
// partial overlap
intersecting.add(region);
}
}
switch (intersecting.size()) {
case 0:
// position component in the first available region
positionInsideRectangle(component, availableRegions[0]);
return;
case 1:
// position component in the only intersecting region
positionInsideRectangle(component, intersecting.get(0));
return;
default:
// uuuh oooh...
break;
}
// build area containing all detected intersections
// and check if the bounds fall completely into the intersection area
final Area area = new Area();
for (final Rectangle region : intersecting) {
final Rectangle2D r2d = new Rectangle2D.Double(region.x, region.y, region.width, region.height);
area.add(new Area(r2d));
}
final Rectangle2D boundsRect = new Rectangle2D.Double(bounds.x, bounds.y, bounds.width, bounds.height);
if (area.contains(boundsRect))
return;
// bah, just place it in the first intersecting region...
positionInsideRectangle(component, intersecting.get(0));
}
/**
* Position the component so that it is completely inside the rectangle.
* If the component is larger than the rectangle, the component will
* exceed the rectangle's bounds to the right and bottom, i.e.
* the component is placed at the rectangle's x and y.
*/
public static void positionInsideRectangle(final Component component, final Rectangle region) {
final Rectangle bounds = component.getBounds();
int x = bounds.x;
int y = bounds.y;
if (x + bounds.width > region.x + region.width)
x = region.x + region.width - bounds.width;
if (y + bounds.height > region.y + region.height)
y = region.y + region.height - bounds.height;
if (x < region.x)
x = region.x;
if (y < region.y)
y = region.y;
if (x != bounds.x || y != bounds.y)
component.setLocation(x, y);
}
/**
* Gets the available display space as an array of rectangles
* (there is one rectangle for each screen; if the environment is
* headless, the resulting array will be empty).
*/
public static Rectangle[] getScreenRectangles() {
try {
Rectangle[] result;
final GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
final GraphicsDevice[] devices = ge.getScreenDevices();
result = new Rectangle[devices.length];
for (int i=0; i<devices.length; ++i) {
final GraphicsDevice gd = devices[i];
result[i] = gd.getDefaultConfiguration().getBounds();
}
return result;
} catch (final Exception e) {
return new Rectangle[0];
}
}
I'm doing a project which involves taking a live camera feed and displaying it on a window for the user.
As the camera image is the wrong way round by default, I'm flipping it using cvFlip (so the computer screen is like a mirror) like so:
while (true)
{
IplImage currentImage = grabber.grab();
cvFlip(currentImage,currentImage, 1);
// Image then displayed here on the window.
}
This works fine most of the time. However, for a lot of users (mostly on faster PCs), the camera feed flickers violently. Basically an unflipped image is displayed, then a flipped image, then unflipped, over and over.
So I then changed things a bit to detect the problem...
while (true)
{
IplImage currentImage = grabber.grab();
IplImage flippedImage = null;
cvFlip(currentImage,flippedImage, 1); // l-r = 90_degrees_steps_anti_clockwise
if(flippedImage == null)
{
System.out.println("The flipped image is null");
continue;
}
else
{
System.out.println("The flipped image isn't null");
continue;
}
}
The flipped image always seems to be null. Why? What am I doing wrong? This is driving me crazy.
If this is an issue with cvFlip(), what other ways are there to flip an IplImage?
Thanks to anyone who helps!
You need to initialise the flipped image with an empty image rather than NULL before you can store a result in it. Also, you should only create the image once and then re-use the memory for more efficiency. So a better way to do this would be something like below (untested):
IplImage current = null;
IplImage flipped = null;
while (true) {
current = grabber.grab();
// Initialise the flipped image once the source image information
// becomes available for the first time.
if (flipped == null) {
flipped = cvCreateImage(
current.cvSize(), current.depth(), current.nChannels()
);
}
cvFlip(current, flipped, 1);
}