I am trying to implement my own remote desktop solution in Java, using sockets and TCP/UDP.
I know I could use VNC or something else, but it's an assignment from school that I want to do myself.
So for moving the mouse and clicking I could use the Robot class. I have two questions about this:
What about sending the video? I know the Robot class can capture the screen too, so should I just send images in a sequence and display them in order on the other side of the connection? Is this the best way to implement remote desktop?
Also should I use TCP or UDP?
I think UDP would be harder to implement since I will have to figure out which image comes after the other.
What you are trying to do will work, but it will be incredibly slow. The images must be compressed before you send them over the net. Before compressing, the number of colors should be reduced, and only the portions of the image that have changed since the last update should be sent.
When transferring mouse coordinates, an update should only be sent if the new position is more than x pixels away from the last one, or if y seconds have passed since the last update. Otherwise you spend so much traffic on the mouse position that there is no room left for the images.
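A minimal sketch of that throttling idea (the 5-pixel and 200 ms thresholds are arbitrary assumptions, and sendMousePosition is a hypothetical method that writes to your socket):

int lastX, lastY;
long lastSentMillis;

void maybeSendMouse(int x, int y) {
    long now = System.currentTimeMillis();
    // Send only if the pointer moved far enough or enough time has passed.
    boolean movedEnough = Math.abs(x - lastX) > 5 || Math.abs(y - lastY) > 5;
    if (movedEnough || now - lastSentMillis > 200) {
        sendMousePosition(x, y); // hypothetical: writes x,y to the socket
        lastX = x;
        lastY = y;
        lastSentMillis = now;
    }
}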
UDP will be the best solution here, since it is fastest for video streaming (which is what you are effectively doing).
About question 2:
UDP would be much harder. Since it's a datagram-based protocol, there are limits on how much data you can send at a time, and it's not very likely that you'll be able to fit entire images into single datagrams. So you're going to have to work with differential/partial updates, which becomes complicated pretty quickly.
TCP, however, is stream-based and only delivers data in-order. If a packet in the middle disappears and needs to be re-sent, all following packets need to wait, even if they've been received by the target machine. This creates lag, which is often very undesirable in interactive applications.
So UDP is probably your best choice, but you can't design it around the assumption that you can send whole images at a time; you need to come up with a way to send just parts of images.
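For illustration, a rough sketch of splitting one compressed frame into numbered datagrams (the 1400-byte chunk size and the 8-byte header layout are assumptions, not a standard):

import java.io.IOException;
import java.net.*;
import java.nio.ByteBuffer;

// Each datagram carries: frame number (4 bytes), chunk index (2), chunk count (2).
static final int CHUNK = 1400; // stay under a typical Ethernet MTU

static void sendFrame(DatagramSocket sock, InetAddress addr, int port,
                      int frameNo, byte[] jpeg) throws IOException {
    int chunks = (jpeg.length + CHUNK - 1) / CHUNK;
    for (int i = 0; i < chunks; i++) {
        int len = Math.min(CHUNK, jpeg.length - i * CHUNK);
        ByteBuffer buf = ByteBuffer.allocate(8 + len);
        buf.putInt(frameNo).putShort((short) i).putShort((short) chunks);
        buf.put(jpeg, i * CHUNK, len);
        sock.send(new DatagramPacket(buf.array(), buf.position(), addr, port));
    }
}

The receiver then reassembles a frame once all of its chunks have arrived, and simply drops frames that are older than the newest complete one.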
Basically, a video is a sequence of images (frames) displayed per second. You should send as many as your bandwidth allows.
On the other hand, there is no point in sending the raw image; you should compress it as much as you can, and definitely consider losing a lot of resolution in the process.
You can take a look at this SO question about image compression; if you compress enough, you may get reasonably smooth video.
It would be better to use Google Protocol Buffers or Apache Thrift. You will send binary data, which is smaller, so your software will work faster.
Related
I have a general question about programming the client/server communication of a network game.
I use TCP as the protocol, and the communication works, but I'm not sure if it is an efficient way to do it.
In general, actions that happen on the client side will go through all these steps:
- Some action happens (e.g. a fireball is cast).
- [*] For this action I defined a string (e.g. #F#270#130#, where 'F' says it's a fireball, 270 is (for example) the angle in degrees, and 130 the speed of the fireball that was shot).
- The string goes into the client's output buffer and a waiting queue.
- The string is sent.
- The string is received by the server.
- [*] The server needs a line interpreter that can detect the meaning of the string (here: what does F mean? It's a fireball!) and adds a unique identity based on which client the command was received from.
- [*] The server needs to calculate logic based on the action that happened (the fireball does damage to someone; does it hit someone immediately, or does it just fly first?).
- The server sends an (updated) string for the action(s) that occur to all clients (e.g. maybe the fireball is slowed down for some reason; here an updated string would be sent: #F#12345#270#90#, where 12345 is the unique player identity).
- The clients receive the string.
- [*] The clients resolve the string to a command and handle it (fire an animation sequence...).
- The client that originally sent the command compares the received string with the string in its waiting queue; when they are equal, it does nothing (this smooths out the action, since otherwise, due to connection problems/delay, some actions would occur twice or jump from location to location, depending on ping).
Is it really necessary to go through all these steps? At all steps marked with [*] I need to define new line interpreters/actions for each command, so I'm coding each action twice, client- and server-side.
I read something about sending serializable objects, but in general the idea seems the same to me: I send an object that has to be interpreted and handled, and I send an object back...
Any hints on how to solve the whole communication more elegantly, with less coding? Or at least more organized? All these #F#, #M#, #H# tags for the different actions are making it more and more complicated :)
(In fact I currently have the following handlers/actions:
- move
- look/rotate
- hpchange
- firearrow
- spawn/disconnect
...)
Hope you understand what I mean. I know I could just continue coding like this, and it would work somehow, but it just seems more complicated than it needs to be.
Thanks!
You could do it in a more OO way if you:
- Define an object called Action or something like that, which has all of the above parameters: type of action, direction of action (or target), damage dealt, etc.
- Create those Action objects as your game normally executes.
- Use an ObjectOutputStream chained to your TCP socket to send the whole Action object to the server / pass it back to the client.
- On the server, interpret what happens by examining the received object from the ObjectInputStream.
I think this way would be cleaner and more flexible in case you add more logic than just analyzing strings, but not as fast (since objects going into an ObjectOutputStream need to be serialized).
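A minimal sketch of that idea (the field names are illustrative, not taken from the question's protocol):

import java.io.*;

public class Action implements Serializable {
    private static final long serialVersionUID = 1L;
    public String type;   // e.g. "FIREBALL"
    public int angle;     // degrees
    public int speed;
    public long playerId; // filled in by the server

    public Action(String type, int angle, int speed) {
        this.type = type;
        this.angle = angle;
        this.speed = speed;
    }
}

// Client side: send an action over the existing TCP socket.
// ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
// out.writeObject(new Action("FIREBALL", 270, 130));
// out.flush();
//
// Server side: read it back and dispatch on a.type.
// ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
// Action a = (Action) in.readObject();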
You need to look at several factors before you decide if your game requires any changes.
Firstly, is TCP the best communication channel? Have you compared it to UDP? Would it be better to implement the networking using UDP? Would it matter if a few packets went missing? What happens if the network is slow?
Secondly, look at how often you are polling/pushing to the server. Can this be reduced? Does the game have to be in real time? Do all parts of the game have to be in real time? Perhaps certain things can be non-real-time: a fireball will continue on a straight path, so you don't have to keep updating the server about its position; you can just tell it the direction and speed. What other aspects of the game can be non-real-time? The only things that need sending are player locations and actions; most other things, like collision detection, can be offloaded to the client.
Thirdly, does every keypress need to be sent to the server? If the user is against a wall and wants to move further, the client knows that they cannot, and thus will not send those keypresses to the server. If the user has moved to a new valid location, then update the server. Can certain things be buffered and sent to the server in one go instead of as several separate queries? I.e., if I move forward, jump, and throw a fireball, that's 3 keypresses on the client side; can they be buffered and sent at the next 500th millisecond?
If you are worried about the networking overhead, you should be working at the bit level. Instead of sending the long string "#F#270#130#", which is 11 bytes, it would make sense to send 3 consecutive bytes (24 bits):
7 bits represent the action (128 different actions).
9 bits represent the angle (0-511, of which you only need 0-360 degrees).
8 bits represent the force.
This or any other byte format is shorter and easier to use over the network, and produces tighter code. Binary comparison is also faster, so writing your action parser on the server becomes easier: instead of a large switch-case looking for #F#, you just look at the first 7 bits and compare them to an int.
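As a hedged sketch, packing and unpacking that 24-bit layout could look like this (the exact bit positions are an assumption):

// Pack a 7-bit action, a 9-bit angle and an 8-bit force into 3 bytes.
static byte[] pack(int action, int angle, int force) {
    int bits = (action & 0x7F) << 17 | (angle & 0x1FF) << 8 | (force & 0xFF);
    return new byte[] { (byte) (bits >> 16), (byte) (bits >> 8), (byte) bits };
}

// Unpack the 3 bytes back into { action, angle, force }.
static int[] unpack(byte[] b) {
    int bits = (b[0] & 0xFF) << 16 | (b[1] & 0xFF) << 8 | (b[2] & 0xFF);
    return new int[] { bits >>> 17, (bits >> 8) & 0x1FF, bits & 0xFF };
}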
Can you reduce other networking overheads? Instead of the force being decided by the client, could the server decide it? I.e., a standard force, or 2 levels of force (much better, as this can be represented by 1 bit). This also stops the client and malicious users from sending rubbish data to the server (like a force of 999); now the force can only be a 0 or a 1, i.e. a speed of 10 or 20, nothing silly.
As part of a learning experiment, I want to make a network application. I'm still learning about multithreading, and I've made a simple multithreaded "game" that involves drawing sprites on the screen, where the player can move them.
For this new project, I want to make something simple, just for the experience and learning. I want to create a server where multiple clients can connect to it. The server will contain "game objects": each is just an object containing an x and y position, a string name, an ID to identify it, and a velocity (dx and dy).
The "server" application will feature things like a worker thread updating the position of the objects (using their velocity), and it will also send the client the "state" of all game objects, so the clients can draw them for the players to see.
The clients will also send the server data whenever the player presses a button on the keyboard and generates a KeyEvent. The server will read this data and process it by updating the player's game object. For example, if the player presses the left arrow key, the client will send the server data indicating that the left key was pressed, and the server will find the game object associated with that client (maybe I will store the objects in a Map?).
The problem is how to do this. I've never done any networking on this scale before. The most I've made so far is a buggy 2-player tic-tac-toe game. Since that is turn-based, it is very different from the application I am now trying to make. As research, I read the entire Concurrency lesson on Oracle. I also read this tutorial on NIO. I've never used NIO before, but it was recommended to me by multiple people. I learned a lot about multithreading and concurrency from reading the code and figuring out how it works.
Since the SocketChannels communicate using ByteBuffers, one strategy I thought of is to take a List of GameObjects and serialize it to a byte array, then create a ByteBuffer from that array and send it to the client. The client would then have all the GameObjects loaded on the server and be able to draw them based on their type and state. One problem is that the byte array could get very large. I tested it with 256 GameObjects and got a byte array of up to 9 KB. If I compress the byte array, the size drops to around 96 bytes. Still, I don't know if this is too big to be sent to the client 15 times a second.
Another option that I know of is to send the info byte by byte. For example, the first byte sent could identify what type of game object I am sending (example: cat, car, person), the next two bytes could identify the X position, then the next two the Y position, then the last byte used to identify the state of the object. Using this data, the client can just draw a sprite on the position received.
A big problem with all that I've said is that the client won't be able to interact with any of the objects. The client will just be able to send simple commands to the server and "view" the game objects' sprites at their positions. If the server wanted to tell a game object to do a specific action, it would be hard to draw, since I would just be drawing static sprites to the screen instead of interacting with the sprites.
I'm also not sure whether NIO is the best idea. Most people I've talked to recommend it, but I don't quite understand the practical difference between implementing a non-blocking NIO server and a multithreaded regular java.net.* server. My guess is that the non-blocking server has better performance, and that I would run into far fewer problems with just one thread than with a thread per connection.
And finally, I've gotten mixed advice on whether to use TCP or UDP. I've never used UDP before either. Some strongly suggest TCP, while others strongly suggest UDP.
As you can see, I am pretty disorganized and not sure what to do. I feel like I have read a lot about networking and concurrency, and I already know enough about Swing to make graphical games. I just don't know how to put it all together.
Perhaps consider using Netty. I have found it extremely useful for rapid application development, and fast enough for most server and client applications. You can get faster with a raw implementation of the reactor design pattern on top of Java NIO, but it will take a lot more work and produce code that is hard to debug.
I'm working in a remote administration toy project. For now, I'm able to capture screenshots and control the mouse using the Robot class. The screenshots are BufferedImage instances.
First of all, my requirements:
- Only a server and a client.
- Performance is important, since the client might be an Android app.
I've thought of opening two socket connections: one for mouse and system commands, and a second one for the video feed.
How could I convert the screenshots to a video stream? Should I convert them to a known video format or would it be ok to just send a series of serialized images?
The compression is another problem. Sending the screen captures in full resolution would result in a low frame rate, according to my preliminary tests. I think I need at least 24 fps to perceive movement, so I have to both downscale and compress. I could convert the BufferedImages to JPEG and set the compression rate, but I don't want to store the files on disk; they should live in RAM only. Another possibility would be to serialize instances (representing an uncompressed screenshot) to a GZIPOutputStream. What is the correct approach for this?
To summarize:
In case you recommend the "series of images" approach, how would you serialize them to the socket's OutputStream?
If your proposition is to convert to a known video format, which classes or libraries are available?
Thanks in advance.
UPDATE: my tests, client and server on the same machine:
- Full-screen serialized BufferedImages (only dimension, type, and int[]), without compression: 1.9 fps.
- Full-screen images through GZip streams: 2.6 fps.
- Downscaled images (640 px wide) and GZip streams: 6.56 fps.
- Full-screen images and RLE encoding: 4.14 fps.
- Downscaled images and RLE encoding: 7.29 fps.
If it's just screen captures, I would not compress them using a video compression scheme; most likely you don't want lossy compression (blurred details in small text etc. are the most common defects).
For a workable "remote desktop" feel, remember the previously sent screenshot and send only the difference needed to get to the next one. If nothing (or very little) changes between frames, this is very efficient.
It will, however, not work well in certain situations, like playing a video or a game, or scrolling a lot in a document.
Compressing the difference between two BufferedImages can be done with more or less elaborate methods; a very simple yet reasonably effective method is to subtract one image from the other (resulting in zeros everywhere they are identical) and compress the result with simple RLE (run-length encoding).
Reducing the color precision can be used to further reduce the amount of data (depending on the use case, you could omit the least significant N bits of each color channel; most GUI applications look not much different if you reduce colors from 24 bits to 15 bits).
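A very rough sketch of that idea, using XOR instead of subtraction since it is trivially reversible (pixel arrays as returned by BufferedImage.getRGB; the run encoding is an assumption):

import java.util.ArrayList;
import java.util.List;

// XOR two frames so unchanged pixels become zero.
static int[] diff(int[] prev, int[] curr) {
    int[] d = new int[curr.length];
    for (int i = 0; i < curr.length; i++) {
        d[i] = prev[i] ^ curr[i];
    }
    return d;
}

// Naive RLE: emit (runLength, value) pairs; long zero runs compress very well.
static List<int[]> rle(int[] data) {
    List<int[]> runs = new ArrayList<int[]>();
    int i = 0;
    while (i < data.length) {
        int v = data[i];
        int run = 1;
        while (i + run < data.length && data[i + run] == v) {
            run++;
        }
        runs.add(new int[] { run, v });
        i += run;
    }
    return runs;
}

The client applies the decoded diff with the same XOR to reconstruct the new frame from the previous one.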
Break the screen up into a grid of squares (or strips), and only send a grid square if it differs from the previous one:
// server start
sendScreenMetaToClient(); // width, height, how many grid squares
...
// server loop
ImageBuffer[] prevScrnGrid = new ImageBuffer[gridSquareCount]; // size from the meta sent above
while (isRunning) {
    ImageBuffer scrn = captureScreen();
    ImageBuffer[] scrnGrid = screenToGrid(scrn);
    for (int i = 0; i < scrnGrid.length; i++) {
        if (!isSameImage(scrnGrid[i], prevScrnGrid[i])) {
            prevScrnGrid[i] = scrnGrid[i];
            // tell the client it will get grid square (i),
            // then send the bytes for grid square (i)
            sendGridSquareToClient(i, scrnGrid[i]);
        }
    }
}
Don't send serialized Java objects, just send the image data:
ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
ImageIO.write(bufferedImage, "jpg", imgBytes); // JPEG-compress in memory, no disk I/O
imgBytes.flush();
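One way (an assumption, not the only possible framing) to then put those bytes on the wire is to length-prefix each frame so the client knows how much to read:

DataOutputStream out = new DataOutputStream(socket.getOutputStream());
byte[] jpg = imgBytes.toByteArray();
out.writeInt(jpg.length); // client: readInt(), then readFully(new byte[length])
out.write(jpg);
out.flush();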
Firstly, I might suggest only capturing a small part of the screen rather than downscaling and potentially losing information, perhaps with something like a sliding window that can be moved around by pushing the edges with the cursor. This is really just a small design suggestion, though.
As for compression, I would think that a series of images would not compress separately as well as with a decent video compression scheme, especially as frames are likely to remain consistent between captures in this scenario.
One option would be to use Xuggle, which, as far as I understand, is capable of capturing the desktop via Robot in a number of video formats, but I can't tell if you can stream and decode with it.
For capturing JPEGs and converting them, you can also use this.
Streaming these videos seems to be a little more complicated, though.
Also, it seems that the abandoned Java Media Framework supports this functionality.
My knowledge in this area is not fantastic, tbh, so sorry if I have wasted your time, but it looks like some more useful information on the feasibility of using Xuggle as a screen sharer has been compiled here. This also appears to link to their own notes on existing approaches.
If it doesn't need to be pure Java, I reckon this would all be much easier just by interfacing with a native screen-capture tool...
Maybe it would be easiest just to send video as a series of jpegs after all! You could always implement your own compression scheme if you were feeling a little crazy...
I think you described a good solution in your question. Convert the images to JPEG, but don't write them to disk as files. If you want a known video format, use M-JPEG. M-JPEG is a stream of JPEG frames in a standard format; many digital cameras, especially older ones, save videos this way.
You can get some information about how to play an M-JPEG stream from this question's answers: Android and MJPEG
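If you serve it over HTTP, the usual M-JPEG framing is multipart/x-mixed-replace; here is a minimal, untested sketch (nextJpegFrame() is a hypothetical method returning one compressed screenshot):

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

OutputStream os = clientSocket.getOutputStream();
os.write(("HTTP/1.0 200 OK\r\n"
        + "Content-Type: multipart/x-mixed-replace; boundary=frame\r\n\r\n")
        .getBytes(StandardCharsets.US_ASCII));
while (streaming) {
    byte[] jpg = nextJpegFrame();
    os.write(("--frame\r\n"
            + "Content-Type: image/jpeg\r\n"
            + "Content-Length: " + jpg.length + "\r\n\r\n")
            .getBytes(StandardCharsets.US_ASCII));
    os.write(jpg);
    os.write("\r\n".getBytes(StandardCharsets.US_ASCII));
    os.flush();
}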
If network bandwidth is a problem, then you'll want to use an inter-frame compression scheme such as MPEG-2, H.264, or similar. That requires a lot more processing than M-JPEG but is far more efficient.
If you're trying to get 24fps video then there's no reason not to use modern video codecs. Why try and recreate that wheel?
Xuggler works fine for encoding H.264 video and sounds like it would serve your needs nicely.
So, I'm developing an Android app that communicates with a server via a socket. The phone will constantly be getting data from the server. The packets will not all be the same size: the first byte of each packet is the type, and then, if applicable, the next four bytes indicate the size. After that come that many floats (4 bytes each).
What's the best way to read these in: calling readByte(), readFloat(), readFloat(), etc., or using these Datagram things that I stumbled upon? Explain why. If you want more details to make a better suggestion, please ask.
Thanks,
Sounds like you want to wrap your socket's input stream in a BufferedInputStream and a DataInputStream.
You don't want Datagrams, because those are for UDP, not TCP sockets.
Then I would use readByte(), optionally readInt(), followed by readFloat() etc.
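A short sketch of reading the format described in the question, assuming the 4-byte size is a count of floats:

DataInputStream in = new DataInputStream(
        new BufferedInputStream(socket.getInputStream()));
byte type = in.readByte();  // 1 byte: packet type
int count = in.readInt();   // 4 bytes: how many floats follow (if applicable)
float[] values = new float[count];
for (int i = 0; i < count; i++) {
    values[i] = in.readFloat(); // 4 bytes each, big-endian
}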
IIUC, if you are talking about JDK datagrams, they are not sockets; they are for UDP communication, which is lighter than TCP but less reliable, and generally used for real-time audio/video streaming.
I'd go with DataInputStream/DataOutputStream and use readByte, readInt, readFloat. This will save you a lot of details about encoding multi-byte values on the wire, and also adds a (thin) layer of error detection.
If you have Java on the other side, it should be no problem at all. If you have C/C++ on the other side, mind that it could (but shouldn't) be using little-endian instead of big-endian byte order, in which case your numbers (except those read with readByte) will appear scrambled. You can find routines for reading little-endian numbers in Java quite easily, but it can cause a bit of a headache for the first couple of hours :)
Due to (quite annoying) limitations on many J2ME phones, audio files cannot be played until they are fully downloaded. So, in order to play live streams, I'm forced to download chunks at a time, and construct ByteArrayInputStreams, which I then feed to Players.
This works well, except that there's an annoying gap of about 1/4 of a second every time a stream ends and a new one is needed. Is there any way to solve this problem, or the problem above?
The only good way to play long (3 minutes and more) tracks moderately reliably with J2ME JSR 135, on the largest number of handsets out there, is to use a "file://" URL when you create the player, or to have the InputStream actually come from a FileConnection.
Recent BlackBerry phones can use a ByteArrayInputStream only when they have a large Java heap available.
A lot of phones running the Symbian operating system will allow you to put files in a private area for the J2ME application while still being able to play tracks from the same location.
Unfortunately you can't get rid of these gaps, at least not on any device I've tried it on. It's very annoying indeed. It's part of the spec that you can't stream audio or video over HTTP.
If you want to stream from a server, the only way to do it is to use an RTSP server instead, though you'll need to check support for this on your device.
And faking RTSP using a local server on the device (rtsp://localhost...) doesn't work either... I tried that too.
EDIT2: Or you could just look at this, which seems to be exactly what you want: http://java.sun.com/javame/reference/apis/jsr135/javax/microedition/media/protocol/DataSource.html
I would create two Player instances and make sure I had received enough chunks before starting playback. Then I would start playing the first chunk through player one and load the second chunk into player two. I would use the TimeBase class to keep track of how much time has passed, and when I knew the first chunk was about to end (you should know how long each chunk takes to play), I would start playing the second chunk through the second player and load the third chunk into the first, and so on until there are no more chunks to play.
The key here is using the TimeBase class properly to know when to make the transition. I think that should get rid of the annoying 1/4-second gap between chunks. I hope that works; let me know if it does, because it sounds really interesting.
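A rough, untested sketch of that double-Player idea (JSR 135); chunkStream(i) and the "audio/mpeg" content type are assumptions:

import javax.microedition.media.*;

// Prefetch the next chunk while the current one plays.
Player current = Manager.createPlayer(chunkStream(0), "audio/mpeg");
current.prefetch();
Player next = Manager.createPlayer(chunkStream(1), "audio/mpeg");
next.prefetch();

current.start();
long chunkMicros = current.getDuration(); // microseconds, if the player knows it
// ...poll current.getMediaTime(); when it approaches chunkMicros, call
// next.start(), then reuse `current` to create and prefetch chunk 2, and so on.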
EDIT: Player.prefetch() could also be useful here in reducing latency.