How to process videos in Java without any external API?
I have a mpeg4 or mpeg2 video which I want to view in a JPanel reading each frame one after another and then displaying the frames with paintComponent(). Displaying each file as BufferedImage in Graphics g.
My question is how to get an array of BufferedImage class from the video files?
You certainly can process videos in Java without relying on external libraries. It will take a fair amount of coding effort on your part, so be forewarned.
If you're decoding an MPEG-2 file, that will probably be a program stream, so you'll need to write the code to take that apart. MPEG-4 part 2 video will probably arrive in an MP4 container which will require a lot of code to take apart (not too hard, just a lot of details). So this is just the container; inside will be chunks of compressed video and audio.
Now you will need to decode either the MPEG-2 or MPEG-4 video. This will entail parsing variable length codes from the bitstream and recovering syntax elements. For intraframes, this will give you reconstruction data to apply to a stream of macroblocks. You will combine things like DCT coefficients, differential coding, dequantization factors, zigzag scan patterns, discrete cosine transforms, and possibly post-processing filters in order to recover the original image. That's just for intraframes; then there are the interframes which also apply motion vectors and copy data from previously decoded frames.
After getting a frame, you will find that it is in a YUV colorspace. You will probably need to manually convert it to an RGB colorspace in order to plot it onto a BufferedImage.
It's entirely possible that you could implement all of this on your own. Or you could find an appropriate Java-friendly multimedia library, complete with its own API, to do the heavy lifting for you.
Related
I am attempting to write a very simple DAW in Java but am having trouble playing an audio clip in a sequence. I have looked into both the sampled and MIDI classes in Java Sound but what I really need is a hybrid of the two.
It seems that with the MIDI classes you cannot use a sequencer for example, to play your own audio clip.
I have attempted to write my own sequencer using scheduling to play a javax.sound.sampled.Clip in a sequence but the timings vary far too much. It is not really a viable option as it doesn't keep time.
Does anybody have any suggestions of how I could get around this?
I can attest that an audio mixing system combining aspects of MIDI and samples can be written in Java, as I wrote my own and it currently works with samples and a couple real-time synths that I also wrote.
The key is making the audio data of the samples available on a per-frame basis and a frame-counting command-processor/audio-mixer that both manages the execution of "commands," and collects and mixes the audio frame data. With 44100 fps, that's accuracy in the vicinity of 0.02 milliseconds. I can describe in more detail if requested.
Another way to go, probably saner, though I haven't done it personally, would be to make use of a Java bridge to a system such as Jack.
EDIT: Answering questions in comment (12/8/19).
Audio sample data in Java is usually either held in memory (Java uses Clip) or read from a .wav file. Because the individual frames are not exposed by Clip, I wrote an alternate, and use it to hold the data as signed floats ranging -1 to 1. Signed floats are a common way to hold audio data that we are going to perform multiple operations upon.
For playback of .wav audio, Java combines reading the data with AudioInputStream and outputting with SourceDataLine. Your system will have to sit in the middle, intercepting the AudioInputStream, convert to PCM float frames, and counting the frames as you go.
A number of sources or tracks can be processed at the same time, and merged (simple addition of the normalized floats) to a single signal. This signal can be converted back to bytes and sent out for playback via a single SourceDataLine.
Counting output frames from an arbitrary 0th frame from the single SourceDataLine will help with keeping constituent incoming tracks coordinated, and will provide the frame number reference used to schedule any additional commands that you wish to execute prior to that frame being output (e.g., changing a volume/pan of a source, or a setting on a synth).
My personal alternate to a Clip is very similar to AudioCue which you are welcome to inspect and use. The main difference is that for better or worse, I'm processing everything one frame at a time in my system, and AudioCue and its "Mixer" process buffer loads. I've had several very credible people criticize my personal per-frame system as inefficient, so when I made the public API for AudioCue, I bowed to that preconception. [There are ways to add buffering to a per-frame system to recapture that efficiency, and per-frame makes scheduling simpler. So I'm sticking with my per-frame logical scheme.]
No, you can't use a sequencer to play your own clips directly.
In the MIDI world, you have to deal with samples, instruments, and soundbanks.
Very quickly, a sample is the audio data + informations such as looping points, note range covered by the sample, base volume and envelopes, etc.
An instrument is a set of samples, and a soundbank contain a set of instruments.
If you want to use your own sounds to play some music, you must make a soundbank out of them.
You will also need to use another implementation than the default provided by Java, because that default only read soundbanks in a proprietary format, which is gone since at least 15 and perhaps even 20 years.
Back in 2008-2009, there existed for example Gervill. It was able to read SF2 and DLS soundbanks. SF2 and DLS are two popular soundbank formats, several programs exist in the market, free or paid, to edit them.
If you want to go from the other way round, starting with sampled, that's also exact as you ahve noticed, you can't rely on timers, task schedule, Thread.sleep and the like to have enough precision.
The best precision you can achieve by using those is around 10ms, what's of course far too few to be acceptable for music.
The usual way to go here is to generate the audio of your music by mixing your audio clips yourself into the final clip. So you can achieve frame precision.
In fact that's very roughly what does a MIDI synthesizer.
So, lets say I want to recode some PNG to JPEG in Java. The image has extreme resolution, lets say for example 10 000 x 10 000px. Using "standard" Java image API Writers and Reader, you need at some point to have entire image decoded in RAM, which takes extreme amount of RAM space (hundreds of MB). I have been looking how other tools do this, and I found that ImageMagick uses disk pixel storage, but this seems to by way too slower for my needs. So what I need is tru streaming recoder. And by true streaming I mean read and process data by chuncks or bins, not just give stream as input but decode it whole beforehand.
Now, first the theory behind - is it even possible, given JPEG and PNG algorithms, to do this using streams, or lets say in bins of data? So there is no need to have entire image encoded in memory(or other storage)? In JPEG compression, first few stages could be done in streams, but I believe Huffman encoding needs to build entire tree of value probabilities after quantization, therefore it needs to analyze whole image - so whole image needs to be decoded beforehand, or somehow on demand by regions.
And the golden question, if above could be achieved, is there any Java library that can actually work in this way? And save large amount of RAM?
If I create a 10,000 x 10,000 PNG file, full of incompressible noise, with ImageMagick like this:
convert -size 10000x10000 xc:gray +noise random image.png
I see ImageMagick uses 675M of RAM to create the resulting 572MB file.
I can convert it to a JPEG with vips like this:
vips im_copy image.png output.jpg
and vips uses no more than 100MB of RAM while converting, and takes 7 seconds on a reasonable spec iMac around 4 years old - albeit with SSD.
I have thought about this for a while, and I would really like to implement such a library. Unfortunately, it's not that easy. Different image formats store pixels in different ways. PNG or GIFs may be interlaced. JPEGs may be progressive (multiple scans). TIFFs are often striped or tiled. BMPs are usually stored bottom up. PSDs are channeled. Etc.
Because of this, the minimum amount of data you have to read to recode to a different format, may in worst case be the entire image (or maybe not, if the format supports random access and you can live with a lot of seeking back and forth)... Resampling (scaling) the image to a new file using the same format would probably work in most cases though (probably not so good for progressive JPEGs, unless you can resample each scan separately).
If you can live with disk buffer though, as the second best option, I have created some classes that allows for BufferedImages to be backed by nio MappedByteBuffers (memory-mapped file Buffers, kind of like virtual memory). While performance isn't really like in-memory images, it's also not entirely useless. Have a look at MappedImageFactory and MappedFileBuffer.
I've written a PNG encoder/decoder that does that (read and write progressively, which only requires to store a row in memory) for PNG format: PNGJ
I don't know if there is something similar with JPEG
Android provides a default of 15 steps for its sound systems which you can access through Audio Manager. However, I would like to have finer control.
One method of doing so seems to be altering specific files within the Android system to divide the sound levels even further then default. I would like to programmatically achieve the same effect using Java.
Fine volume control is an example of the app being able to divide the sound levels into one hundred distinct intervals. How do I achieve this?
One way, in Java, to get very precise volume adjustment is to access the PCM data directly and multiply it by some factor, usually from 0 up to 1. Another is to try and access the line's volume control, if it has one. I've given up trying to do the latter. The precision is okay in terms of amplitude, but the timing is terrible. One can only have one volume change per audio buffer read.
To access the PCM data directly, one has to iterate through the audio read buffer, translate the bytes into PCM, perform the multiplication then translate back to bytes. But this gives you per-frame control, so very smooth and fast fades can be made.
EDIT: To do this in Java, first check out the sample code snippet at the start of this java tutorial link, in particular, the section with the comment
// Here, do something useful with the audio data that's now in the audioBytes array...
There are several StackOverflow questions that show code for the math to convert audio bytes to PCM and back, using Java. Should not be hard to uncover with a search.
Pretty late to the party, but I'm currently trying to solve this issue as well. IF you are making your own media player app and are running an instance of a MediaPlayer, then you can use the function setVolume(leftScalar, rightScalar) where leftScalar and rightScalar are floats in the range of 0.0 to 1.0. representing logarithmic scale volume for each respective ear.
HOWEVER, this means that you must have a reference to the currently active MediaPlayer instance. If you are making a music app, no biggie. If you're trying to run a background service that allows users to give higher precision over all media output, I'm not sure how to use this in that scenario.
Hope this helps.
I'm working in a remote administration toy project. For now, I'm able to capture screenshots and control the mouse using the Robot class. The screenshots are BufferedImage instances.
First of all, my requirements:
- Only a server and a client.
- Performance is important, since the client might be an Android app.
I've thought on opening two socket connections, one for mouse and system commands and the second one for the video feed.
How could I convert the screenshots to a video stream? Should I convert them to a known video format or would it be ok to just send a series of serialized images?
The compression is another problem. Sending the screen captures in full resolution would result in a low frame rate, according to my preliminary tests. I think I need at least 24 fps to perceive movement, so I've to both downscale and compress. I could convert the BufferedImages to jpg files and then set the compression rate, but I don't want to store the files on disk, they should live in RAM only. Another possibility would be to serialize instances (representing an uncompressed screenshot) to a GZipOutputStream. What is the correct approach for this?
To summarize:
In case you recommend the "series of images" approach, how would you serialize them to the socket OutputStream?
If your proposition is to convert to a know video format, which classes or libraries are available?
Thanks in advance.
UPDATE: my tests, client and server on same machine
-Full screen serialized BufferedImages (only dimension, type and int[]), without compression: 1.9 fps.
-full screen images through GZip streams: 2.6 fps.
-Downscaled images (640 width) and GZip streams: 6.56 fps.
-Full screen images and RLE encoding: 4.14 fps.
-Downscaled images and RLE encoding: 7.29 fps.
If its just screen captures, I would not compress them using a Video compression scheme, most likely you don't want lossy compression (blurred details in small text etc are the most common defects).
For getting a workable "remote Desktop" feel, remember the previously sent Screenshot and send only the difference to get to the next one. If nothing (or very little) changes between frames this is very efficient.
It will however not work well in certain situations like playing a video, game or scrolling a lot in a document.
Compressing the difference between two BufferedImage can be done with more or less elaborate methods, a very simple, yet reasonably effective method is simply to subtract one image from the other (resulting in zeros everywhere they are identical) and compressing the result with simple RLE (run length encoding).
Reducing the color precision can be used to further reduce the amount of data (depending on the use case you could omit the least significant N bits of each color channel, for most GUI applications look not much different if you reduce colors from 24 bits to 15 bits).
Break the screen up into a grid squares (or strips)
Only send the grid square if it's different from the previous
// server start
sendScreenMetaToClient(); // width, height, how many grid squares
...
// server loop ImageBuffer[] prevScrnGrid while(isRunning) {
ImageBuffer scrn = captureScreen();
ImageBuffer[] scrnGrid = screenToGrid(scrn);
for(int i = 0; i < scrnGrid.length; i++) {
if(isSameImage(scrnGrid[i], prevScrnGrid[i]) == false) {
prevScrnGrid[i] = scrnGrid[i];
sendGridSquareToClient(i, scrnGrid[i]); // send the client a message saying it will get grid square (i) then send the bytes for grid square (i)
}
} }
Don't send serialized java objects just send the image data.
ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
ImageIO.write( bufferedImage, "jpg", imgBytes );
imgBytes.flush();
Firstly, I might suggest only capturing a small part of the screen, rather than downscaling and potentially losing information, perhaps with something like a sliding window which can be moved around by pushing the edges with a cursor. This is really just a small design suggestion though.
As for compression, I would think that a series of images would not compress separately as well as with a decent video compression scheme, especially as frames are likely to remain consistent between captures in this scenario.
One option would be to use Xuggle, which is capable of capturing the desktop via Robot in a number of video formats afaiu, but I can't tell if you can stream and decode with this.
For capturing jpegs and converting them, then you can also use this.
Streaming these videos seems to be a little more complicated, though.
Also, it seems that the abandoned Java Media Framework supports this functionality.
My knowledge in this area is not fantastic tbh, so sorry if I have wasted your time, but it looks like some more useful information on the feasibility of using Xuggle as a screensharer has been compiled here. This also appears to link to their own notes on existing approaches.
If it doesn't need to be pure Java I reckon this would all be much easier using just by interfacing with a native screen capture tool...
Maybe it would be easiest just to send video as a series of jpegs after all! You could always implement your own compression scheme if you were feeling a little crazy...
I think you described a good solution in your question. Convert the images to jpeg, but don't write them as files to disk. If you want it to be a known video format, use M-JPEG. M-JPEG is a stream of jpeg frames in a standard format. Many digital cameras, especially older ones, save videos in this format.
You can get some information about how to play an M-JPEG stream from this question's answers: Android and MJPEG
If network bandwidth is a problem, then you'll want to use an inter-frame compression system such as MPEG-2, h.264, or similar. That requires a lot more processing than M-JPEG but is far more efficient.
If you're trying to get 24fps video then there's no reason not to use modern video codecs. Why try and recreate that wheel?
Xuggler works fine for encoding h264 video and sounds like it would serve your needs nicely.
I already have successfully coded my Steganography program in a PNG file using Java. My program works very well in both PNG and BMP files. But when I tried running my program in a JPG file, the revealed data is not the same as the original data. Certainly, the headers of each file type isn't the same. And so now I wonder; Do the data structures of PNG and JPG files aren't the same? I need to know exactly how to manipulate the bytes of a JPG file without affecting its header and the footer.
Thanks.
First of all you need to tell the exact method you are using for image steganography e.g. hiding the secret data in the lsb's of the pixels of the image, reading the file in a binary format etc.
If working with lsb's is your procedure then I hope the following answer satisfies your query-
'PNG' and 'BMP' are actually lossless file formats. After manipulating the bits of pixels of these formats when you create the new image no data is lost. This is the reason you are able to retrieve all the hidden data.
'JPG' formats however use a lossy compression technique due to which the data hidden in the pixels is lost. Even I faced this problem and the solution to this exists in handling the image in transform domain. You need to use the Direct Cosine Transform method for its implementation.
The transform domain involves manipulation of the algorithms and image transforms such as discrete cosine transformation (DCT) and wavelet transformation. These methods can hide information in more significant areas of the image and may also manipulate the properties of the image like luminance. These kinds of techniques are more effective than image domain bit-wise steganographic methods. The transform domain techniques can be applied to image of any format. Also, the conversion between lossless and lossly formats may survive.
How DCT works in steganography?
The image is broken down into 8x8 blocks of pixels. DCT is applied to each block from left to right, top to bottom. The quantization table compresses each block to scale the DCT coefficients and the message is embedded in the scaled DCT coefficients.
A lot of reasearch is still required in this method. I am working on its code and will post it ASAP.
It will be a pleasure to hear of other methods or different efficient techniques from other developers.