In my Android application I am recording the user's voice, which I save as a .3gp encoded audio file.
What I want to do is open it up and obtain the sequence x[n] of audio samples, in order to perform some audio signal analysis.
Does anyone know how I could go about doing this?
You can use the Android MediaCodec class to decode 3gp or other media files. The decoder output is a standard PCM byte array. You can send this output directly to the Android AudioTrack class for playback, or keep the byte array for further processing such as DSP. To apply a DSP algorithm, the byte array must be transformed into a float/double array. There are several steps to get the byte array output. In summary, they look like this:
Instantiate MediaCodec
String mMime = "audio/3gpp";
MediaCodec mMediaCodec = MediaCodec.createDecoderByType(mMime);
Create a media format and configure the media codec
// Use a MediaExtractor to read the actual format (sample rate, channel
// count) from the file; an empty MediaFormat has no such values to query
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(filePath); // path to the .3gp file
MediaFormat mMediaFormat = extractor.getTrackFormat(0);
extractor.selectTrack(0);
mMediaCodec.configure(mMediaFormat, null, null, 0);
mMediaCodec.start();
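Feed the codec with encoded input (this also belongs in the decoding loop). A minimal sketch using the MediaExtractor from above; the 10000 µs timeout is an arbitrary choice:
int inputBufferIndex = mMediaCodec.dequeueInputBuffer(10000); // timeout in microseconds
if (inputBufferIndex >= 0) {
    ByteBuffer inputBuffer = mMediaCodec.getInputBuffers()[inputBufferIndex];
    int sampleSize = extractor.readSampleData(inputBuffer, 0);
    if (sampleSize < 0) {
        // no more input: tell the codec the stream has ended
        mMediaCodec.queueInputBuffer(inputBufferIndex, 0, 0, 0,
                MediaCodec.BUFFER_FLAG_END_OF_STREAM);
    } else {
        mMediaCodec.queueInputBuffer(inputBufferIndex, 0, sampleSize,
                extractor.getSampleTime(), 0);
        extractor.advance();
    }
}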
Capture output from MediaCodec (this should be processed inside a thread)
MediaCodec.BufferInfo buf_info = new MediaCodec.BufferInfo();
int outputBufferIndex = mMediaCodec.dequeueOutputBuffer(buf_info, 0);
if (outputBufferIndex >= 0) {
    byte[] pcm = new byte[buf_info.size];
    // mOutputBuffers comes from mMediaCodec.getOutputBuffers()
    mOutputBuffers[outputBufferIndex].get(pcm, 0, buf_info.size);
    mMediaCodec.releaseOutputBuffer(outputBufferIndex, false);
}
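To run DSP on that output, convert the PCM byte array to doubles. A minimal sketch, assuming 16-bit signed little-endian samples, which is the usual MediaCodec PCM output (check MediaFormat.KEY_PCM_ENCODING if unsure):
// Reinterpret the bytes as 16-bit samples and normalize to [-1.0, 1.0)
short[] samples = new short[pcm.length / 2];
ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
double[] x = new double[samples.length];
for (int n = 0; n < samples.length; n++) {
    x[n] = samples[n] / 32768.0;
}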
This Google IO talk might be relevant here.
I have mp3 files I'd like to run through Google's Cloud Speech API [reference], but just the first 15 seconds of each audio file. I'm working in Scala with the JLayer, MP3SPI, and Tritonus libraries, imported as suggested by JavaZoom. My code so far looks like this:
val in = AudioSystem.getAudioInputStream(new URL("mySong.mp3"))
val baseFormat = in.getFormat
val decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
  16000,
  16,
  baseFormat.getChannels,
  baseFormat.getChannels * 2,
  16000,
  false)
val audioInputStream = AudioSystem.getAudioInputStream(decodedFormat, in)
val buffer = new Array[Byte](16000 * 4 * 15)
var i = 0
while (audioInputStream.available() > 0) {
  i += audioInputStream.read(buffer)
}
audioInputStream.close()
in.close()
// pass this to the API request:
lazy val recognitionConfig: RecognitionConfig = RecognitionConfig.newBuilder
  .setEncoding(AudioEncoding.LINEAR16)
  .setLanguageCode("en-US")
  .setSampleRateHertz(16000)
  .build
val request = RecognizeRequest.newBuilder()
  .setAudio(RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(buffer)).build())
  .setConfig(recognitionConfig)
  .build()
However, when I print out the value of the ByteString-copied buffer, it's all zeros, and the API call returns nothing. Any ideas on what I'm doing wrong? This is my first time manipulating audio in Java/Scala, so I may be missing something obvious...
I had the same problem. You get nothing back if the audio is unintelligible, if it is encoded in one format but decoded as another, or if it violates other constraints: for example, the audio file can't be stereo; it needs to be mono.
So I first converted the audio from .mp3 to .flac using the ffmpy module, as follows (this is Python; you would need to find a Scala equivalent):
import ffmpy

# convert the input audio to 16 kHz mono FLAC
ff = ffmpy.FFmpeg(inputs={input_file_path: None},
                  outputs={output_file_path: '-y -vn -acodec flac -ar 16000 -ac 1'})
ff.run()
input_file_path and output_file_path are strings containing the paths of the input and output audio files. Note: you can check whether the conversion succeeded by playing the output file with the play command.
Having done the above, you can now use AudioFormat.Encoding.FLAC and AudioEncoding.FLAC instead.
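For the request itself, the change would look roughly like this (shown in Java; the Scala version differs only cosmetically, and this is a sketch rather than a tested drop-in):
// Recognition config for a 16 kHz mono FLAC file, mirroring the LINEAR16 config above
RecognitionConfig flacConfig = RecognitionConfig.newBuilder()
    .setEncoding(AudioEncoding.FLAC)
    .setLanguageCode("en-US")
    .setSampleRateHertz(16000)
    .build();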
I'm simply trying to convert a .mov file into .webm using Xuggler, which should work since FFmpeg supports .webm files.
This is my code:
IMediaReader reader = ToolFactory.makeReader("/home/user/vids/2.mov");
reader.addListener(ToolFactory.makeWriter("/home/user/vids/2.webm", reader));
while (reader.readPacket() == null);
System.out.println( "Finished" );
On running this, I get this error:
[main] ERROR org.ffmpeg - [libvorbis @ 0x8d7fafe0] Specified sample_fmt is not supported.
[main] WARN com.xuggle.xuggler - Error: could not open codec (../../../../../../../csrc/com/xuggle/xuggler/StreamCoder.cpp:831)
Exception in thread "main" java.lang.RuntimeException: could not open stream com.xuggle.xuggler.IStream@-1921013728[index:1;id:0;streamcoder:com.xuggle.xuggler.IStreamCoder@-1921010088[codec=com.xuggle.xuggler.ICodec@-1921010232[type=CODEC_TYPE_AUDIO;id=CODEC_ID_VORBIS;name=libvorbis;];time base=1/44100;frame rate=0/0;sample rate=44100;channels=1;];framerate:0/0;timebase:1/90000;direction:OUTBOUND;]: Operation not permitted
at com.xuggle.mediatool.MediaWriter.openStream(MediaWriter.java:1192)
at com.xuggle.mediatool.MediaWriter.getStream(MediaWriter.java:1052)
at com.xuggle.mediatool.MediaWriter.encodeAudio(MediaWriter.java:830)
at com.xuggle.mediatool.MediaWriter.onAudioSamples(MediaWriter.java:1441)
at com.xuggle.mediatool.AMediaToolMixin.onAudioSamples(AMediaToolMixin.java:89)
at com.xuggle.mediatool.MediaReader.dispatchAudioSamples(MediaReader.java:628)
at com.xuggle.mediatool.MediaReader.decodeAudio(MediaReader.java:555)
at com.xuggle.mediatool.MediaReader.readPacket(MediaReader.java:469)
at com.mycompany.xugglertest.App.main(App.java:13)
Java Result: 1
Any ideas?
There's a funky thing going on with Xuggler where it doesn't always allow you to set the sample format of IAudioSamples. You'll need to use an IAudioResampler.
Took me a while to figure this out. This post by Marty helped a lot, though his code is outdated now.
Here's how you fix it.
Before encoding
I'm assuming here that audio input has been properly set up, resulting in an IStreamCoder called audioCoder.
After that's done, you are probably initializing an IMediaWriter and adding an audio stream like so:
final IMediaWriter oggWriter = ToolFactory.makeWriter(oggOutputFile);
// Using stream 1 'cause there is also a video stream.
// For an audio only file you should use stream 0.
oggWriter.addAudioStream(1, 1, ICodec.ID.CODEC_ID_VORBIS,
audioCoder.getChannels(), audioCoder.getSampleRate());
Now create an IAudioResampler:
IAudioResampler oggResampler = IAudioResampler.make(
        audioCoder.getChannels(),      // output channels
        audioCoder.getChannels(),      // input channels
        audioCoder.getSampleRate(),    // output sample rate
        audioCoder.getSampleRate(),    // input sample rate
        IAudioSamples.Format.FMT_FLT,  // output format (what libvorbis expects)
        audioCoder.getSampleFormat()); // input format
And tell your IMediaWriter's stream coder to use that sample format:
// The stream 1 here is consistent with the stream we added earlier.
oggWriter.getContainer().getStream(1).getStreamCoder().
setSampleFormat(IAudioSamples.Format.FMT_FLT);
During encoding
You are probably currently creating an IAudioSamples and filling it with audio data, like so:
IAudioSamples audioSample = IAudioSamples.make(512, audioCoder.getChannels(),
audioCoder.getSampleFormat());
int bytesDecoded = audioCoder.decodeAudio(audioSample, packet, offset);
Now create an IAudioSamples for the resampled data:
IAudioSamples vorbisSample = IAudioSamples.make(512, audioCoder.getChannels(),
IAudioSamples.Format.FMT_FLT);
Finally, resample the audio data and write the result:
oggResampler.resample(vorbisSample, audioSample, 0);
oggWriter.encodeAudio(1, vorbisSample);
Final thought
Just a hint to get your output files to play well:
If you use audio and video within the same container, audio and video packets must be written in order of non-decreasing timestamp: each packet's timestamp has to be at least that of the previous packet. So you are almost certainly going to need some kind of buffering mechanism that alternates between writing audio and video, as in the sketch below.
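A hypothetical shape for that buffer (the drain helper and queue types are illustrative assumptions, not part of the fix above; stream 0 is video and stream 1 is audio, matching the earlier setup):
import java.util.ArrayDeque;
import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.xuggler.IAudioSamples;
import com.xuggle.xuggler.IVideoPicture;

// Write whichever buffered unit has the smallest timestamp next, so packets
// reach the container in non-decreasing timestamp order.
static void drain(IMediaWriter writer,
                  ArrayDeque<IVideoPicture> videoQueue,
                  ArrayDeque<IAudioSamples> audioQueue) {
    while (!videoQueue.isEmpty() && !audioQueue.isEmpty()) {
        if (videoQueue.peek().getTimeStamp() <= audioQueue.peek().getTimeStamp()) {
            writer.encodeVideo(0, videoQueue.poll());
        } else {
            writer.encodeAudio(1, audioQueue.poll());
        }
    }
}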
I have a problem converting bytes to an .mp3 sound file. In my case I do it with a FileOutputStream, using its write(bytes) method, but that just creates a data file with an mp3 extension which I cannot play in any player on my PC.
Note: I'm recording it from the Flex Microphone and sending the ByteArray to Java.
Which libraries should I use to add MP3 sound file headers etc. in Java?
UPDATE: I couldn't even convert my raw data to the WAVE format supported by the Java Sound API. It creates a file containing the recorded sound, but with noise. Where's the problem?
Here's my code for wave:
AudioFormat format = new AudioFormat(Encoding.PCM_SIGNED, 44100, 16, 2, 2, 44100, true);
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
AudioInputStream stream = new AudioInputStream(bais, format, bytes.length/format.getFrameSize());
AudioSystem.write(stream, AudioFileFormat.Type.WAVE, new File(path+"zzz.wav"));
What's wrong with my AudioFormat? And which one do I have to use in the MP3 case?
Any help will be highly appreciated!
Just writing the raw bytes to a file with the name extension .mp3 doesn't magically convert the data to the MP3 format. You need to use an encoder to compress it.
A quick Google search found LAMEOnJ, which is a Java API for the popular LAME MP3 encoder.
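If you would rather not call LAME through a Java binding, one alternative is to shell out to the lame command-line encoder. A rough sketch, assuming lame is installed and the captured data is raw 16-bit PCM at 44.1 kHz stereo (adjust the flags to match your actual byte order and channel count):
// -r = raw PCM input, -s 44.1 = sample rate in kHz, -m s = stereo;
// add -x if your samples are big-endian.
ProcessBuilder pb = new ProcessBuilder(
        "lame", "-r", "-s", "44.1", "-m", "s", "recording.raw", "recording.mp3");
pb.inheritIO(); // pass LAME's console output through
int exitCode = pb.start().waitFor();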
I am building an export application using Xuggler that exports an H.264-encoded recording so that it can be played in an external player (writing the video recording to an .avi or .mp4 container).
I would like to know how one could create an IPacket from a byte array representing a video frame. Which parameters of the IPacket need to be set, and what values should they contain?
And likewise, which parameters should be set, and to what values, for the container that gathers the packets?
packet = IPacket.make( IBuffer.make( null, data, 0, data.length ));
packet.setTimeStamp( time );
packet.setTimeBase( IRational.make(1,1000) );
int pksz = packet.getSize();
packet.setComplete(true, pksz);
I am struggling with the transfer of a simple JPEG file inside an ID3v2 tag from C++ over a TCP socket to Java (Android). The taglib library lets you extract this file, and I am able to save the JPEG as a new file.
The send function looks like this:
char *parameter_full = new char[f3->picture().size()+2];
sprintf(parameter_full,"%s\n\0",f3->picture().data());
// send
result = send(c,parameter_full,strlen(parameter_full),0);
delete[] parameter_full;
where
f3->picture().data() returns a pointer to the internal data structure (it returns char*) and
f3->picture().size() returns the size of the array.
Then Android receives it with
String imageString = inFromServer.readLine();
byte[] imageBytes = imageString.getBytes();
Bitmap cover = BitmapFactory.decodeByteArray(imageBytes,0,imageBytes.length);
But somehow decodeByteArray always returns null. My guess is that Java doesn't receive the image correctly, because imageString consists of only 4 characters... while the extracted JPEG file has a size of 12.7 KB.
But what has gone wrong?
Martin
You shouldn't use string functions on byte data, because 0 values are taken as string terminators. Look into memcpy on the C++ side if you need to copy the char*, and into the byte[] read methods of InputStream on the Java side.
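For the Java side, here is a minimal sketch of reading the image as binary data instead of as a line of text. The 4-byte length prefix is an assumed protocol change: the C++ side would have to send the image size first and then the raw bytes, using send() with the known size rather than sprintf/strlen.
import java.io.DataInputStream;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

// readLine() is wrong for JPEG data: it is binary, so the first 0 or '\n'
// byte cuts the string short (hence the 4-character result).
DataInputStream in = new DataInputStream(socket.getInputStream());
int size = in.readInt();              // image length in bytes (big-endian)
byte[] imageBytes = new byte[size];
in.readFully(imageBytes);             // blocks until all bytes have arrived
Bitmap cover = BitmapFactory.decodeByteArray(imageBytes, 0, imageBytes.length);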