Playing multiple byte arrays simultaneously in Java - java

How can you play multiple (audio) byte arrays simultaneously? Each byte array is recorded with a TargetDataLine and transferred through a server.
What I've tried so far
Using SourceDataLine:
There is no way to play multiple streams using a SourceDataLine, because the write method blocks until the buffer has been written. Using threads does not fix this either, because only one SourceDataLine can write at a time.
Using the AudioPlayer Class:
ByteInputStream stream2 = new ByteInputStream(data, 0, data.length);
AudioInputStream stream = new AudioInputStream(stream2, VoiceChat.format, data.length);
AudioPlayer.player.start(stream);
This just plays noise on the clients.
EDIT
I don't receive the voice packets at exactly the same time, so it's not really simultaneous playback; the voices are more "overlapping".

Apparently Java's Mixer interface was not designed for this.
http://docs.oracle.com/javase/7/docs/api/javax/sound/sampled/Mixer.html:
A mixer is an audio device with one or more lines. It need not be designed for mixing audio signals.
And indeed, when I try to open multiple lines on the same mixer, this fails with a LineUnavailableException. However, if all your audio recordings have the same audio format, it's quite easy to mix them together manually. For example, if you have 2 inputs:
Convert both to the appropriate data type (for example byte[] for 8-bit audio, short[] for 16-bit, float[] for 32-bit floating point, etc.).
Sum them into another array. Make sure the summed values do not exceed the range of the data type.
Convert the output back to bytes and write that to the SourceDataLine.
See also How is audio represented with numbers?
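A minimal sketch of those three steps for 16-bit signed little-endian PCM (the format used in the sample below) might look like this. The class and method names (PcmMix, mix16) are made up for illustration, and the code assumes both inputs share the same AudioFormat. Instead of lowering the volume, it clamps the sum to the range of a short, which is one of several reasonable choices.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

public final class PcmMix {
    private PcmMix() {}

    // Hypothetical helper: mixes two buffers of 16-bit signed little-endian PCM
    // that share the same AudioFormat. Samples are summed as ints and clamped
    // to the short range so loud passages saturate instead of wrapping around.
    public static byte[] mix16(byte[] a, byte[] b) {
        int bytes = Math.min(a.length, b.length) & ~1; // mix whole 16-bit samples only
        // step 1: view both byte arrays as arrays of 16-bit samples
        ShortBuffer in1 = ByteBuffer.wrap(a, 0, bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
        ShortBuffer in2 = ByteBuffer.wrap(b, 0, bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
        byte[] out = new byte[bytes];
        // step 3 happens implicitly: writing to this view fills the output byte array
        ShortBuffer mixed = ByteBuffer.wrap(out).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
        for (int i = 0; i < bytes / 2; i++) {
            // step 2: sum in int (cannot overflow for two shorts), then clamp
            int sum = in1.get(i) + in2.get(i);
            int clamped = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
            mixed.put(i, (short) clamped);
        }
        return out;
    }
}
```

The result can be passed straight to `SourceDataLine.write`. Clamping keeps quiet voices at full volume but distorts when several loud voices coincide; scaling each input (as the sample below does with `* 0.5`) avoids that at the cost of overall loudness.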
Here's a sample that mixes down 2 recordings and outputs them as 1 signal, all in 16-bit 48 kHz stereo.
// print all devices (both input and output)
int i = 0;
Mixer.Info[] infos = AudioSystem.getMixerInfo();
for (Mixer.Info info : infos)
System.out.println(i++ + ": " + info.getName());
// select 2 inputs and 1 output
System.out.println("Select input 1: ");
int in1Index = Integer.parseInt(System.console().readLine());
System.out.println("Select input 2: ");
int in2Index = Integer.parseInt(System.console().readLine());
System.out.println("Select output: ");
int outIndex = Integer.parseInt(System.console().readLine());
// ugly java sound api stuff
try (Mixer in1Mixer = AudioSystem.getMixer(infos[in1Index]);
Mixer in2Mixer = AudioSystem.getMixer(infos[in2Index]);
Mixer outMixer = AudioSystem.getMixer(infos[outIndex])) {
in1Mixer.open();
in2Mixer.open();
outMixer.open();
try (TargetDataLine in1Line = (TargetDataLine) in1Mixer.getLine(in1Mixer.getTargetLineInfo()[0]);
TargetDataLine in2Line = (TargetDataLine) in2Mixer.getLine(in2Mixer.getTargetLineInfo()[0]);
SourceDataLine outLine = (SourceDataLine) outMixer.getLine(outMixer.getSourceLineInfo()[0])) {
// audio format 48 kHz 16-bit stereo (signed little-endian)
AudioFormat format = new AudioFormat(48000.0f, 16, 2, true, false);
// 4 bytes per frame (16 bit samples stereo)
int frameSize = 4;
int bufferSize = 4800;
int bufferBytes = frameSize * bufferSize;
// buffers for java audio
byte[] in1Bytes = new byte[bufferBytes];
byte[] in2Bytes = new byte[bufferBytes];
byte[] outBytes = new byte[bufferBytes];
// buffers for mixing
short[] in1Samples = new short[bufferBytes / 2];
short[] in2Samples = new short[bufferBytes / 2];
short[] outSamples = new short[bufferBytes / 2];
// how long to record & play
int framesProcessed = 0;
int durationSeconds = 10;
int durationFrames = (int) (durationSeconds * format.getSampleRate());
// open devices
in1Line.open(format, bufferBytes);
in2Line.open(format, bufferBytes);
outLine.open(format, bufferBytes);
in1Line.start();
in2Line.start();
outLine.start();
// start audio loop
while (framesProcessed < durationFrames) {
// record audio
in1Line.read(in1Bytes, 0, bufferBytes);
in2Line.read(in2Bytes, 0, bufferBytes);
// convert input bytes to samples
ByteBuffer.wrap(in1Bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(in1Samples);
ByteBuffer.wrap(in2Bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(in2Samples);
// mix samples - lower volume by 50% since we're mixing 2 streams
for (int s = 0; s < bufferBytes / 2; s++)
outSamples[s] = (short) ((in1Samples[s] + in2Samples[s]) * 0.5);
// convert output samples to bytes
ByteBuffer.wrap(outBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(outSamples);
// play audio
outLine.write(outBytes, 0, bufferBytes);
framesProcessed += bufferBytes / frameSize;
}
in1Line.stop();
in2Line.stop();
outLine.stop();
}
}

Alright, I put something together which should get you started. I'll post the full code below, but first I'll try to explain the steps involved.
The interesting part here is to create your own audio "mixer" class, which allows consumers of that class to schedule audio blocks at specific points in the (near) future. The specific-point-in-time part is important: I'm assuming you receive network voices in packets, where each packet needs to start exactly at the end of the previous one in order to play back a continuous sound for a single voice. Also, since you say voices can overlap, I'm assuming (yes, lots of assumptions) that a new one can come in over the network while one or more old ones are still playing. So it seems reasonable to allow audio blocks to be scheduled from any thread. Note that only one thread actually writes to the data line; it's just that any thread can submit audio packets to the mixer.
So for the submit-audio-packet part we now have this:
private final ConcurrentLinkedQueue<QueuedBlock> scheduledBlocks;
public void mix(long when, short[] block) {
scheduledBlocks.add(new QueuedBlock(when, Arrays.copyOf(block, block.length)));
}
The QueuedBlock class is just used to tag a short array (the audio buffer) with "when": the point in time at which the block should be played.
Points in time are expressed in frames, counted from the start of the audio stream. The position is set to zero when the stream is created and advanced by the buffer size each time an audio buffer is written to the data line:
private final AtomicLong position = new AtomicLong();
public long position() {
return position.get();
}
Apart from all the hassle of setting up the data line, the interesting part of the mixer class is obviously where the mixdown happens. Each scheduled audio block falls into one of 3 cases:
The block has already been played in its entirety. Remove it from the scheduledBlocks queue.
The block is scheduled to start at some point in time after the current buffer. Do nothing.
(Part of) the block should be mixed down into the current buffer. Note that the beginning of the block may or may not have been played already in previous buffer(s). Similarly, the end of the scheduled block may extend past the end of the current buffer, in which case we mix down the first part of it and leave the rest for the next round, until all of it has been played and the entire block is removed.
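The three cases boil down to a bit of interval arithmetic on frame positions. Here is a hypothetical, stand-alone helper (BlockOverlap is not part of the answer's code) that performs the same checks and offset calculations as the mixdown loop, which makes them easier to test in isolation. All positions are in frames, as in the mixer.

```java
public final class BlockOverlap {
    private BlockOverlap() {}

    // The part of a scheduled block that lands in the current buffer.
    public static final class Overlap {
        public final int blockOffset;  // first frame of the block to mix
        public final int bufferOffset; // frame in the buffer where mixing starts
        public final int frames;       // number of frames to mix
        Overlap(int blockOffset, int bufferOffset, int frames) {
            this.blockOffset = blockOffset;
            this.bufferOffset = bufferOffset;
            this.frames = frames;
        }
    }

    // Case 1: the whole block ended before the current buffer starts.
    public static boolean finished(long when, int blockFrames, long bufferStart) {
        return when + blockFrames <= bufferStart;
    }

    // Returns null for case 1 and for case 2 (block starts after this buffer);
    // otherwise returns the overlapping range for case 3.
    public static Overlap overlap(long when, int blockFrames, long bufferStart, int bufferFrames) {
        if (finished(when, blockFrames, bufferStart) || bufferStart + bufferFrames <= when) {
            return null;
        }
        int blockOffset = (int) Math.max(0, bufferStart - when);  // part already played earlier
        int bufferOffset = (int) Math.max(0, when - bufferStart); // block starts mid-buffer
        int frames = Math.min(blockFrames - blockOffset, bufferFrames - bufferOffset);
        return new Overlap(blockOffset, bufferOffset, frames);
    }
}
```

For example, a 50-frame block scheduled at frame 100 and a buffer starting at frame 120 give a block offset of 20, a buffer offset of 0 and 30 frames to mix; the remaining frames of the buffer are untouched.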
Also note that there's no reliable way to start playing audio data immediately. When you submit packets to the mixer, be sure to always have them start at least the duration of 1 audio buffer from now; otherwise you risk losing the beginning of your sound. Here's the mixdown code:
private static final double MIXDOWN_VOLUME = 1.0 / NUM_PRODUCERS;
private final List<QueuedBlock> finished = new ArrayList<>();
private final short[] mixBuffer = new short[BUFFER_SIZE_FRAMES * CHANNELS];
private final byte[] audioBuffer = new byte[BUFFER_SIZE_FRAMES * CHANNELS * 2];
private final AtomicLong position = new AtomicLong();
Arrays.fill(mixBuffer, (short) 0);
long bufferStartAt = position.get();
for (QueuedBlock block : scheduledBlocks) {
int blockFrames = block.data.length / CHANNELS;
// block fully played - mark for deletion
if (block.when + blockFrames <= bufferStartAt) {
finished.add(block);
continue;
}
// block starts after end of current buffer
if (bufferStartAt + BUFFER_SIZE_FRAMES <= block.when)
continue;
// mix in part of the block which overlaps current buffer
int blockOffset = Math.max(0, (int) (bufferStartAt - block.when));
int blockMaxFrames = blockFrames - blockOffset;
int bufferOffset = Math.max(0, (int) (block.when - bufferStartAt));
int bufferMaxFrames = BUFFER_SIZE_FRAMES - bufferOffset;
for (int f = 0; f < blockMaxFrames && f < bufferMaxFrames; f++)
for (int c = 0; c < CHANNELS; c++) {
int bufferIndex = (bufferOffset + f) * CHANNELS + c;
int blockIndex = (blockOffset + f) * CHANNELS + c;
mixBuffer[bufferIndex] += (short) (block.data[blockIndex] * MIXDOWN_VOLUME);
}
}
scheduledBlocks.removeAll(finished);
finished.clear();
ByteBuffer
.wrap(audioBuffer)
.order(ByteOrder.LITTLE_ENDIAN)
.asShortBuffer()
.put(mixBuffer);
line.write(audioBuffer, 0, audioBuffer.length);
position.addAndGet(BUFFER_SIZE_FRAMES);
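For network voices, each packet of a single voice then needs a "when" that continues where the previous packet of that voice ended, while never being closer than one buffer to the play cursor. A sketch of a per-voice helper under those assumptions follows. VoiceScheduler is a hypothetical name, and the restart policy (jump forward to one buffer ahead when packets arrive too late) is my own choice rather than something the answer prescribes.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: computes start positions (in frames) for the
// consecutive packets of ONE voice. Create one instance per voice.
public final class VoiceScheduler {
    private final LongSupplier position; // play cursor in frames, e.g. consumer::position
    private final int bufferFrames;      // size of one mixer buffer in frames
    private long nextWhen = Long.MIN_VALUE;

    public VoiceScheduler(LongSupplier position, int bufferFrames) {
        this.position = position;
        this.bufferFrames = bufferFrames;
    }

    // Returns the frame at which a packet of packetFrames frames should start.
    public synchronized long schedule(int packetFrames) {
        long earliest = position.getAsLong() + bufferFrames;
        if (nextWhen < earliest) {
            // first packet, or packets arrived late: start one buffer ahead
            nextWhen = earliest;
        }
        long when = nextWhen;
        nextWhen += packetFrames; // next packet continues seamlessly after this one
        return when;
    }
}
```

With the sample's AudioConsumer, each voice would own one scheduler, e.g. `new VoiceScheduler(consumer::position, BUFFER_SIZE_FRAMES)`, and call `consumer.mix(scheduler.schedule(block.length / CHANNELS), block)` for every incoming packet.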
And finally, a complete, self-contained sample which spawns a number of threads that submit audio blocks (sine waves of random duration and frequency) to the mixer (called AudioConsumer in this sample). Replace the sine waves with incoming network packets and you should be halfway to a solution.
package test;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.SourceDataLine;
public class Test {
public static final int CHANNELS = 2;
public static final int SAMPLE_RATE = 48000;
public static final int NUM_PRODUCERS = 10;
public static final int BUFFER_SIZE_FRAMES = 4800;
// generates some random sine wave
public static class ToneGenerator {
private static final double[] NOTES = {261.63, 311.13, 392.00};
private static final double[] OCTAVES = {1.0, 2.0, 4.0, 8.0};
private static final double[] LENGTHS = {0.05, 0.25, 1.0, 2.5, 5.0};
private double phase;
private int framesProcessed;
private final double length;
private final double frequency;
public ToneGenerator() {
ThreadLocalRandom rand = ThreadLocalRandom.current();
length = LENGTHS[rand.nextInt(LENGTHS.length)];
frequency = NOTES[rand.nextInt(NOTES.length)] * OCTAVES[rand.nextInt(OCTAVES.length)];
}
// make sound
public void fill(short[] block) {
for (int f = 0; f < block.length / CHANNELS; f++) {
double sample = Math.sin(phase * 2.0 * Math.PI);
for (int c = 0; c < CHANNELS; c++)
block[f * CHANNELS + c] = (short) (sample * Short.MAX_VALUE);
phase += frequency / SAMPLE_RATE;
}
framesProcessed += block.length / CHANNELS;
}
// true if length of tone has been generated
public boolean done() {
return framesProcessed >= length * SAMPLE_RATE;
}
}
// dummy audio producer, based on sinewave generator
// above but could also be incoming network packets
public static class AudioProducer {
final Thread thread;
final AudioConsumer consumer;
final short[] buffer = new short[BUFFER_SIZE_FRAMES * CHANNELS];
public AudioProducer(AudioConsumer consumer) {
this.consumer = consumer;
thread = new Thread(() -> run());
thread.setDaemon(true);
}
public void start() {
thread.start();
}
// repeatedly play random sine and sleep for some time
void run() {
try {
ThreadLocalRandom rand = ThreadLocalRandom.current();
while (true) {
long pos = consumer.position();
ToneGenerator g = new ToneGenerator();
// if we schedule at current buffer position, first part of the tone will be
// missed so have tone start somewhere in the middle of the next buffer
pos += BUFFER_SIZE_FRAMES + rand.nextInt(BUFFER_SIZE_FRAMES);
while (!g.done()) {
g.fill(buffer);
consumer.mix(pos, buffer);
pos += BUFFER_SIZE_FRAMES;
// we can generate audio faster than it's played
// sleep a while to compensate - this more closely
// corresponds to playing audio coming in over the network
double bufferLengthMillis = BUFFER_SIZE_FRAMES * 1000.0 / SAMPLE_RATE;
Thread.sleep((int) (bufferLengthMillis * 0.9));
}
// sleep a while in between tones
Thread.sleep(1000 + rand.nextInt(2000));
}
} catch (Throwable t) {
System.out.println(t.getMessage());
t.printStackTrace();
}
}
}
// audio consumer - plays continuously on a background
// thread, allows audio to be mixed in from arbitrary threads
public static class AudioConsumer {
// audio block with "when to play" tag
private static class QueuedBlock {
final long when;
final short[] data;
public QueuedBlock(long when, short[] data) {
this.when = when;
this.data = data;
}
}
// need not normally be so low but in this example
// we're mixing down a bunch of full scale sinewaves
private static final double MIXDOWN_VOLUME = 1.0 / NUM_PRODUCERS;
private final List<QueuedBlock> finished = new ArrayList<>();
private final short[] mixBuffer = new short[BUFFER_SIZE_FRAMES * CHANNELS];
private final byte[] audioBuffer = new byte[BUFFER_SIZE_FRAMES * CHANNELS * 2];
private final Thread thread;
private final AtomicLong position = new AtomicLong();
private final AtomicBoolean running = new AtomicBoolean(true);
private final ConcurrentLinkedQueue<QueuedBlock> scheduledBlocks = new ConcurrentLinkedQueue<>();
public AudioConsumer() {
thread = new Thread(() -> run());
}
public void start() {
thread.start();
}
public void stop() {
running.set(false);
}
// gets the play cursor. note - this is not accurate and
// must only be used to schedule blocks relative to other blocks
// (e.g., for splitting up continuous sounds into multiple blocks)
public long position() {
return position.get();
}
// put copy of audio block into queue so we don't
// have to worry about caller messing with it afterwards
public void mix(long when, short[] block) {
scheduledBlocks.add(new QueuedBlock(when, Arrays.copyOf(block, block.length)));
}
// better hope mixer 0, line 0 is output
private void run() {
Mixer.Info[] mixerInfo = AudioSystem.getMixerInfo();
try (Mixer mixer = AudioSystem.getMixer(mixerInfo[0])) {
Line.Info[] lineInfo = mixer.getSourceLineInfo();
try (SourceDataLine line = (SourceDataLine) mixer.getLine(lineInfo[0])) {
line.open(new AudioFormat(SAMPLE_RATE, 16, CHANNELS, true, false), BUFFER_SIZE_FRAMES);
line.start();
while (running.get())
processSingleBuffer(line);
line.stop();
}
} catch (Throwable t) {
System.out.println(t.getMessage());
t.printStackTrace();
}
}
// mix down single buffer and offer to the audio device
private void processSingleBuffer(SourceDataLine line) {
Arrays.fill(mixBuffer, (short) 0);
long bufferStartAt = position.get();
// mixdown audio blocks
for (QueuedBlock block : scheduledBlocks) {
int blockFrames = block.data.length / CHANNELS;
// block fully played - mark for deletion
if (block.when + blockFrames <= bufferStartAt) {
finished.add(block);
continue;
}
// block starts after end of current buffer
if (bufferStartAt + BUFFER_SIZE_FRAMES <= block.when)
continue;
// mix in part of the block which overlaps current buffer
// note that block may have already started in the past
// but extends into the current buffer, or that it starts
// in the future but before the end of the current buffer
int blockOffset = Math.max(0, (int) (bufferStartAt - block.when));
int blockMaxFrames = blockFrames - blockOffset;
int bufferOffset = Math.max(0, (int) (block.when - bufferStartAt));
int bufferMaxFrames = BUFFER_SIZE_FRAMES - bufferOffset;
for (int f = 0; f < blockMaxFrames && f < bufferMaxFrames; f++)
for (int c = 0; c < CHANNELS; c++) {
int bufferIndex = (bufferOffset + f) * CHANNELS + c;
int blockIndex = (blockOffset + f) * CHANNELS + c;
mixBuffer[bufferIndex] += (short) (block.data[blockIndex] * MIXDOWN_VOLUME);
}
}
scheduledBlocks.removeAll(finished);
finished.clear();
ByteBuffer.wrap(audioBuffer).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(mixBuffer);
line.write(audioBuffer, 0, audioBuffer.length);
position.addAndGet(BUFFER_SIZE_FRAMES);
}
}
public static void main(String[] args) {
System.out.print("Press return to exit...");
AudioConsumer consumer = new AudioConsumer();
consumer.start();
for (int i = 0; i < NUM_PRODUCERS; i++)
new AudioProducer(consumer).start();
System.console().readLine();
consumer.stop();
}
}

You can use the Tritonus library to do software audio mixing (it's old but still works quite well).
Add the dependency to your project:
<dependency>
<groupId>com.googlecode.soundlibs</groupId>
<artifactId>tritonus-all</artifactId>
<version>0.3.7.2</version>
</dependency>
Use org.tritonus.share.sampled.FloatSampleBuffer. Both buffers must have the same AudioFormat before calling #mix.
// TODO instantiate these variables with real data
byte[] audio1, audio2;
AudioFormat af1, af2;
SourceDataLine sdl = AudioSystem.getSourceDataLine(af1);
sdl.open(af1); // the line must be opened and started before writing
sdl.start();
FloatSampleBuffer fsb1 = new FloatSampleBuffer(audio1, 0, audio1.length, af1);
FloatSampleBuffer fsb2 = new FloatSampleBuffer(audio2, 0, audio2.length, af2);
fsb1.mix(fsb2);
byte[] result = fsb1.convertToByteArray(af1);
sdl.write(result, 0, result.length); // play it

Related

Given n files with their lengths, divide the total bytes among d threads such that any two threads differ by at most 1 byte

You are given a list of file names and their lengths in bytes.
Example:
File1: 200 File2: 500 File3: 800
You are given a number N. We want to launch N threads to read all the files in parallel such that each thread reads approximately the same number of bytes.
You should return N lists. Each list describes the work of one thread. For example, when N=2, there are two threads. In the above example, there is a total of 1500 bytes (200 + 500 + 800). A fair way to divide this is for each thread to read 750 bytes. So you will return:
Two lists
List 1: File1: 0 - 199 File2: 0 - 499 File3: 0-49 ---------------- Total 750 bytes
List 2: File3: 50-799 -------------------- Total 750 bytes
Implement the following method
List<List<FileRange>> getSplits(List<File> files, int N)
class File { String filename; long length; }
class FileRange { String filename; Long startOffset; Long endOffset; }
I tried the following, but it's not working. Any help would be highly appreciated.
List<List<FileRange>> getSplits(List<File> files, int n) {
List<List<FileRange>> al=new ArrayList<>();
long s=files.size();
long sum=0;
for(int i=0;i<s;i++){
long l=files.get(i).length;
sum+=(long)l;
}
long div=(long)sum/n; // no of bytes per thread
long mod=(long)sum%n;
long[] lo=new long[(long)n];
for(long i=0;i<n;i++)
lo[i]=div;
if(mod!=0){
long i=0;
while(mod>0){
lo[i]+=1;
mod--;
i++;
}
}
long inOffset=0;
for(long j=0;j<n;j++){
long val=lo[i];
for(long i=0;i<(long)files.size();i++){
String ss=files.get(i).filename;
long ll=files.get(i).length;
if(ll<val){
inOffset=0;
val-=ll;
}
else{
inOffset=ll-val;
ll=val;
}
al.add(new ArrayList<>(new File(ss,inOffset,ll-1)));
}
}
}
I'm having problems with startOffset and endOffset and their corresponding files. I also could not work out how to collect the ranges and return them as the required type, List<List<FileRange>>.
The essence of the problem is to simultaneously walk through two lists:
the input list, which is a list of files
the output list, which is a list of threads (where each thread has a list of ranges)
I find that the easiest approach to such problems is an infinite loop that looks something like this:
while (1)
{
move some information from the input to the output
decide whether to advance to the next input item
decide whether to advance to the next output item
if we've reached (the end of the input _OR_ the end of the output)
break
if we advanced to the next input item
prepare the next input item for processing
if we advanced to the next output item
prepare the next output item for processing
}
To keep track of the input, we need the following information
fileIndex the index into the list of files
fileOffset the offset of the first unassigned byte in the file, initially 0
fileRemain the number of bytes in the file that are unassigned, initially the file size
To keep track of the output, we need
threadIndex the index of the thread we're currently working on (which is the first index into the List<List<FileRange>> that the algorithm produces)
threadNeeds the number of bytes that the thread still needs, initially base or base+1
Side note: I'm using base as the minimum number of bytes assigned to each thread (sum/n), and extra as the number of threads that get an extra byte (sum%n).
So now we get to the heart of the algorithm: what information to move from input to output:
if fileRemain is less than threadNeeds then the rest of the file (which may be the entire file) gets assigned to the current thread, and we move to the next file
if fileRemain is greater than threadNeeds then a portion of the file is assigned to the current thread, and we move to the next thread
if fileRemain is equal to threadNeeds then the rest of the file is assigned to the thread, and we move to the next file, and the next thread
Those three cases are easily handled by comparing fileRemain and threadNeeds, and choosing a byteCount that is the minimum of the two.
With all that in mind, here's some pseudo-code to help get you started:
base = sum/n;
extra = sum%n;
// initialize the input control variables
fileIndex = 0
fileOffset = 0
fileRemain = length of file 0
// initialize the output control variables
threadIndex = 0
threadNeeds = base
if (threadIndex < extra)
threadNeeds++
while (1)
{
// decide how many bytes can be assigned, and generate some output
byteCount = min(fileRemain, threadNeeds)
add (file.name, fileOffset, fileOffset+byteCount-1) to the list of ranges for thread threadIndex
// decide whether to advance to the next input and output items
threadNeeds -= byteCount
fileRemain -= byteCount
fileOffset += byteCount
if (threadNeeds == 0)
threadIndex++
if (fileRemain == 0)
fileIndex++
// are we done yet?
if (threadIndex == n || fileIndex == files.size())
break
// if we've moved to the next input item, reinitialize the input control variables
if (fileRemain == 0)
{
fileOffset = 0
fileRemain = length of file fileIndex
}
// if we've moved to the next output item, reinitialize the output control variables
if (threadNeeds == 0)
{
threadNeeds = base
if (threadIndex < extra)
threadNeeds++
}
}
Debugging tip: Reaching the end of the input, and the end of the output, should happen simultaneously. In other words, you should run out of files at exactly the same time as you run out of threads. So during development, I would check both conditions, and verify that they do, in fact, change at the same time.
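For comparison, here is one possible Java translation of the pseudo-code. It is a sketch, not a reference solution: the File and FileRange classes are minimal stand-ins with final fields (the question's versions use boxed Long offsets), end offsets are inclusive, and n is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;

public final class SplitFiles {
    public static final class File {
        public final String filename;
        public final long length;
        public File(String filename, long length) { this.filename = filename; this.length = length; }
    }

    public static final class FileRange {
        public final String filename;
        public final long startOffset; // inclusive
        public final long endOffset;   // inclusive
        public FileRange(String filename, long startOffset, long endOffset) {
            this.filename = filename;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
        @Override public String toString() { return filename + ": " + startOffset + " - " + endOffset; }
    }

    // Walks the input (files) and the output (threads) at the same time.
    public static List<List<FileRange>> getSplits(List<File> files, int n) {
        long sum = 0;
        for (File f : files) sum += f.length;
        long base = sum / n;  // minimum bytes per thread
        long extra = sum % n; // the first 'extra' threads get one more byte

        List<List<FileRange>> result = new ArrayList<>();
        List<FileRange> ranges = new ArrayList<>(); // ranges of the current thread
        long threadNeeds = base + (extra > 0 ? 1 : 0);
        int fileIndex = 0;
        long fileOffset = 0; // first unassigned byte of the current file

        while (fileIndex < files.size() && result.size() < n) {
            File f = files.get(fileIndex);
            long byteCount = Math.min(f.length - fileOffset, threadNeeds);
            if (byteCount > 0) {
                ranges.add(new FileRange(f.filename, fileOffset, fileOffset + byteCount - 1));
            }
            fileOffset += byteCount;
            threadNeeds -= byteCount;
            if (fileOffset == f.length) { // input item exhausted: next file
                fileIndex++;
                fileOffset = 0;
            }
            if (threadNeeds == 0) {       // output item full: next thread
                result.add(ranges);
                ranges = new ArrayList<>();
                threadNeeds = base + (result.size() < extra ? 1 : 0);
            }
        }
        while (result.size() < n) {       // threads that get zero bytes (fewer bytes than threads)
            result.add(ranges);
            ranges = new ArrayList<>();
        }
        return result;
    }
}
```

For the example in the question (files of 200, 500 and 800 bytes, n = 2), this returns File1 0-199, File2 0-499 and File3 0-49 for the first thread, and File3 50-799 for the second.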
Here's a code solution for your problem (in Java):
The custom classes 'File' and 'FileRange' are as follows:
public class File{
String filename;
long length;
public File(String filename, long length) {
this.filename = filename;
this.length = length;
}
public String getFilename() {
return filename;
}
public void setFilename(String filename) {
this.filename = filename;
}
public long getLength() {
return length;
}
public void setLength(long length) {
this.length = length;
}
}
public class FileRange {
String filename;
Long startOffset;
Long endOffset;
public FileRange(String filename, Long startOffset, Long endOffset) {
this.filename = filename;
this.startOffset = startOffset;
this.endOffset = endOffset;
}
public String getFilename() {
return filename;
}
public void setFilename(String filename) {
this.filename = filename;
}
public Long getStartOffset() {
return startOffset;
}
public void setStartOffset(Long startOffset) {
this.startOffset = startOffset;
}
public Long getEndOffset() {
return endOffset;
}
public void setEndOffset(Long endOffset) {
this.endOffset = endOffset;
}
}
The main class is as follows:
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.atomic.AtomicInteger;
public class MainClass {
private static List<List<FileRange>> getSplits(List<File> files, int N) {
List<List<FileRange>> results = new ArrayList<>();
long sum = files.stream().mapToLong(File::getLength).sum(); // Total bytes in all the files
long div = sum/N;
long mod = sum%N;
// Storing how many bytes each thread gets to process
long thread_bytes[] = new long[N];
// At least 'div' number of bytes will be processed by each thread
for(int i=0;i<N;i++)
thread_bytes[i] = div;
// Leftover bytes: each of the first 'mod' threads processes one extra byte
for(int i=0;i<mod;i++)
thread_bytes[i] += 1;
int count = 0;
int len = files.size();
long processed_bytes[] = new long[len];
long temp = 0L;
int file_to_be_processed = 0;
while(count < N && sum > 0) {
temp = thread_bytes[count];
sum -= temp;
List<FileRange> internal = new ArrayList<>();
while (temp > 0) {
// Start from the file to be processed - Will be 0 in the first iteration
// Will be updated in the subsequent iterations
for(int j=file_to_be_processed;j<len && temp>0;j++){
File f = files.get(j);
if(f.getLength() - processed_bytes[j] <= temp){
// The rest of this file fits into the current thread
internal.add(new FileRange(f.getFilename(), processed_bytes[j], f.getLength()- 1));
temp -= f.getLength() - processed_bytes[j];
processed_bytes[j] = f.getLength();
file_to_be_processed++;
}
else{
// Only part of this file fits, so the current thread is now full
internal.add(new FileRange(f.getFilename(), processed_bytes[j], processed_bytes[j] + temp - 1));
// In this case, we won't update the number for file to be processed
processed_bytes[j] += temp;
temp = 0;
}
}
results.add(internal);
count++;
}
}
return results;
}
public static void main(String args[]){
Scanner scn = new Scanner(System.in);
int N = scn.nextInt();
// Inserting demo records in list
File f1 = new File("File 1",200);
File f2 = new File("File 2",500);
File f3 = new File("File 3",800);
List<File> files = new ArrayList<>();
files.add(f1);
files.add(f2);
files.add(f3);
List<List<FileRange>> results = getSplits(files, N);
final AtomicInteger result_count = new AtomicInteger();
// Displaying the results
results.forEach(result -> {
System.out.println("List "+result_count.incrementAndGet() + " : ");
result.forEach(res -> {
System.out.print(res.getFilename() + " : ");
System.out.print(res.getStartOffset() + " - ");
System.out.print(res.getEndOffset() + "\n");
});
System.out.println("---------------");
});
}
}
If some part is still unclear, consider a case and dry run the program.
Say 999 bytes have to be processed by 100 threads.
Each of the 100 threads gets 9 bytes, and the remaining 99 bytes are handed out one per thread to every thread except the 100th. This makes sure no two threads differ by more than 1 byte. Proceed with this idea and follow along with the code.
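The quota rule from this dry run can be checked in isolation with a tiny sketch (the Quotas class is hypothetical; it mirrors the div/mod loop in getSplits above):

```java
public final class Quotas {
    private Quotas() {}

    // Per-thread byte quotas: every thread gets total / threads bytes, and the
    // first total % threads threads get one extra, so quotas differ by at most 1.
    public static long[] quotas(long total, int threads) {
        long[] q = new long[threads];
        for (int i = 0; i < threads; i++) {
            q[i] = total / threads + (i < total % threads ? 1 : 0);
        }
        return q;
    }
}
```

For 999 bytes and 100 threads, the first 99 entries are 10 and the last one is 9, which adds up to 999.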

Raspberry Pi + Wiegand RFID

I have a problem with my Wiegand RFID reader (26-bit). I have written a simple Java app and everything seems fine, but after about 10 reads, for example, it starts to shift bits. Is Raspbian on the RPi too slow for the Wiegand timing protocol?
Here is the sample code and output:
package classes;
import com.pi4j.io.gpio.GpioController;
import com.pi4j.io.gpio.GpioFactory;
import com.pi4j.io.gpio.GpioPinDigitalInput;
import com.pi4j.io.gpio.PinPullResistance;
import com.pi4j.io.gpio.RaspiPin;
public class Test {
public static char[] s = new char[10000];
static int bits = 0;
public static void main(String[] args) {
// create gpio controller
final GpioController gpio = GpioFactory.getInstance();
// provision GPIO 0 and GPIO 1 as input pins with their internal pull-up
// resistors enabled
final GpioPinDigitalInput pin0 = gpio.provisionDigitalInputPin(RaspiPin.GPIO_00, PinPullResistance.PULL_UP);
final GpioPinDigitalInput pin1 = gpio.provisionDigitalInputPin(RaspiPin.GPIO_01, PinPullResistance.PULL_UP);
System.out.println("PINs ready");
Thread th = new Thread(new Runnable() {
@Override
public void run() {
while (true) {
if (pin0.isLow()) { // D0 on ground?
s[bits++] = '0';
while (pin0.isLow()) {
}
}
if (pin1.isLow()) { // D1 on ground?
s[bits++] = '1';
while (pin1.isLow()) {
}
}
if (bits == 26) {
bits=0;
Print();
}
}
}
});
th.setPriority(Thread.MAX_PRIORITY);
th.start();
System.out.println("Thread start");
for (;;) {
try {
Thread.sleep(500);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
protected static void Print() {
for (int i = 0; i < 26; i++) {
System.out.write(s[i]);
}
System.out.println();
bits = 0;
}
}
and output:
10100100111111110110011011
10100100111111110110011011
10100100111111110110011011
10100100111111110110011011
10100100111111110001101110
10010011111111011001101110
10010011111111011001101110
10010011111111011001101110
10010011111111011001101110
10010011111111011001101110
10010011111111011001101110
Your print statements may be causing the problem. Try storing the data and printing it at the end. Printing tends to be slow (it involves several context switches).
Also, it seems you have no way of detecting whether you missed a bit. I would try a timeout: if you don't get 26 bits in time, reset your counter. That way you're not looping around reading nothing and eventually getting misaligned data.
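One way to apply the timeout idea in Java is to keep the bit assembly separate from the GPIO code, so it can be tested without hardware. The sketch below is hypothetical (WiegandAssembler is not a pi4j class): the polling loop or a pin listener would call onBit with System.nanoTime() for every pulse, and the timeout should be chosen well above the reader's inter-bit interval, which depends on the reader.

```java
import java.util.function.Consumer;

// Hypothetical helper: assembles Wiegand pulses into fixed-size frames and
// discards a partial frame when the gap between pulses exceeds the timeout.
public final class WiegandAssembler {
    private final int frameBits;     // e.g. 26 for the Wiegand-26 format
    private final long timeoutNanos; // max gap allowed between pulses of one frame
    private final Consumer<String> onFrame;
    private final StringBuilder bits = new StringBuilder();
    private long lastBitNanos;

    public WiegandAssembler(int frameBits, long timeoutNanos, Consumer<String> onFrame) {
        this.frameBits = frameBits;
        this.timeoutNanos = timeoutNanos;
        this.onFrame = onFrame;
    }

    // Call once per D0 ('0') or D1 ('1') pulse with a monotonic timestamp.
    public synchronized void onBit(char bit, long nowNanos) {
        if (bits.length() > 0 && nowNanos - lastBitNanos > timeoutNanos) {
            // the previous frame stopped early (a bit was missed): drop it
            // instead of letting its bits shift into the next read
            bits.setLength(0);
        }
        bits.append(bit);
        lastBitNanos = nowNanos;
        if (bits.length() == frameBits) {
            onFrame.accept(bits.toString());
            bits.setLength(0);
        }
    }
}
```

Printing can then happen on another thread (e.g. by handing frames to a queue in onFrame), which keeps slow console output out of the bit-capturing path.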
I have done this in C and Python on a Pi and also on an Arduino. In my experience, as @woodrow douglas says, you need to capture the bits in a loop or (better) use interrupts, together with a timeout that you reset each time you receive a bit, and then print the bits once you are sure you have all of them (i.e. once it has timed out).
This is how I do this on arduino using interrupts.
void zero(){
bit_count ++;
bit_holder = (bit_holder << 1) + 0; //shift left one and add a 0
timeout = t;
}
void one(){
bit_count ++;
bit_holder = (bit_holder << 1) + 1; //shift left one and add a 1
timeout = t;
}
void loop() {
timeout --;
if (timeout == 0 && bit_count > 0){
lcd.clear();
lcd.print("Dec:");
lcd.print(bit_holder);
lcd.setCursor(0,1);
lcd.print("Hex:");
lcd.print(String(bit_holder,HEX));
Serial.print("bit count= ");
Serial.println(bit_count);
Serial.print("bits= ");
Serial.println(bit_holder,BIN);
oldbit = bit_holder; //store previous this value as previous
bit_count = 0; //reset bit count
bit_holder = 0; //reset badge number
}
}
I never had any issues using C on the Pi, but I did have issues using Python, as it is not as real-time. The only way it would work in Python was by using interrupts, and I managed to get the bad read rate down to something like 1 in 200 but never removed it completely.
What I did in the end was use some C to collect the bits then call my python script with the bits for processing.
If you are interested this is the C code I use:
#include <wiringPi.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>
int r1data0 = 6; //pin 22
int r1data1 = 7; // Pin 7
// green goes to same as relay
int r1beep = 0; //pin 11
int r2data0 = 10; /* P1-24 */
int r2data1 = 11; /* P1-26 */
//green goes to same as relay
int r2beep = 0;
int i1 = 0;
int i2 = 0;
//generic variables
int t = 5;
int blank; //blank variable to reset bits
int rel1time = 0;
int rel2time = 0;
int rel1 = 5;
int previoust;
//reader 1 variables
int r1bits; // Collected bits storage area
int r1bit_count = 0; //to measure the size of bits
int oldr1bit_count = 0; //somewhere to store bitcount to send to python
int r1timeout; //timeout to return correct value
char r1command[40];
int r1ret;
//reader 2 variables
int r2bits; // Collected bits storage area
int r2bit_count = 0; //to measure the size of bits
int oldr2bit_count = 0;
int r2timeout; //timeout to return correct value
char r2command[40];
pthread_t threads;
void access_denied(int red){
int pin;
if(red == 1){pin = r1beep;}
if(red == 2){pin = r2beep;}
pinMode(pin, OUTPUT);
digitalWrite(pin, LOW);
delay(300);
digitalWrite(pin, HIGH);
delay(200);
digitalWrite(pin, LOW);
delay(300);
digitalWrite(pin, HIGH);
delay(200);
digitalWrite(pin, LOW);
delay(300);
digitalWrite(pin, HIGH);
}
void *r1python_thread(void *val){
sprintf(r1command,"python access.py r1 %X %X", oldr1bit_count, val); //build python command
FILE* file = popen(r1command, "r"); //execute command using popen
char buffer[5];
fscanf(file, "%4s", buffer); //read command output (at most 4 chars + terminator fit in buffer[5])
pclose(file);
//printf("buffer is : %s\n", buffer);
rel1time = atoi(buffer); //convert returned string to int
if(rel1time == 0){access_denied(1);}
pthread_exit(NULL);
}
void *r2python_thread(void *val){
sprintf(r2command,"python access.py r2 %X",val); //build python command
FILE* file = popen(r2command, "r"); //execute command using popen
char buffer[5];
fscanf(file, "%100s", buffer); //read command output
pclose(file);
//printf("buffer is : %s\n", buffer);
rel2time = atoi(buffer); //convert returned string to int
pthread_exit(NULL);
}
//reader 1 bit functions
void onebit0(){ //adds a 0
r1bit_count ++; //increase bit count
r1bits = (r1bits << 1) + 0;
r1timeout = t; //reset timeout
}
void onebit1(){ //adds a 1
r1bit_count ++;
r1bits = (r1bits << 1) + 1;
r1timeout = t;
}
//reader 2 bit functions
void twobit0(){ //adds a 0
r2bit_count ++; //increase bit count
r2bits = (r2bits << 1) + 0;
r2timeout = t; //reset timeout
}
void twobit1(){ //adds a 1
r2bit_count ++;
r2bits = (r2bits << 1) + 1;
r2timeout = t;
}
int main(){
wiringPiSetup(); //initialise wiringPi
pinMode (r1data0, INPUT); // set reader 1 data0 as input
pinMode (r1data1, INPUT); // set reader 1 data1 as input
pinMode (r2data0, INPUT); // set reader 2 data0 as input
pinMode (r2data1, INPUT); // set reader 2 data1 as input
//reader 1
wiringPiISR(r1data0, INT_EDGE_FALLING, onebit0); // set interrupt on data 0 if it falls call bit0 function
wiringPiISR(r1data1, INT_EDGE_FALLING, onebit1); // set interrupt on data 1 if it falls call bit1 function
//reader 2
wiringPiISR(r2data0, INT_EDGE_FALLING, twobit0); // set interrupt on data 0 if it falls call bit0 function
wiringPiISR(r2data1, INT_EDGE_FALLING, twobit1); // set interrupt on data 1 if it falls call bit1 function
while (1){ //loop
if (r1bit_count > 0 ){ //if bits is not empty
r1timeout--; // reduce timeout by 1
if(r1timeout == 0){ //and it has timed out ie no more bits coming
//printf("%X\n",r1bits);
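oldr1bit_count = r1bit_count; //store the bit count before starting the thread that reads it
if(pthread_create(&threads, NULL, r1python_thread, (void *)(long)r1bits) == 0){ //start new thread for python program
pthread_detach(threads); //nobody joins these threads, so let each one free its resources on exit
}
r1bit_count = 0; //reset bit count
r1bits = blank; //clear bits variable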
r1timeout = t; //reset timeout
}
}
if (r2bit_count > 0 ){ //if bits is not empty
r2timeout--; // reduce timeout by 1
if(r2timeout == 0){ //and it has timed out ie no more bits coming
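if(pthread_create(&threads, NULL, r2python_thread, (void *)(long)r2bits) == 0){ //start new thread for python program
pthread_detach(threads); //nobody joins these threads, so let each one free its resources on exit
}
r2bit_count = 0; //reset bit count
r2bits = blank; //clear bits variable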
r2timeout = t; //reset timeout
}
}
if (rel1time > 0){
pinMode(rel1, OUTPUT);
int diff = time(NULL) - previoust;
if(diff >= 1){
previoust = time(NULL);
rel1time--;
}
}
else{
pinMode(rel1, INPUT);
previoust = time(NULL);
}
delay(1);
}
return 0;
}
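Since the rest of this page is Java, here is a rough Java sketch of the bit-collecting part of the C program above: each DATA0/DATA1 falling edge appends a 0 or 1, and a frame is treated as complete once no new bit has arrived for a fixed number of polling ticks (the C code's t). The class and method names are hypothetical, and the GPIO edge callbacks themselves (wiringPiISR in the C version) are not shown; they would have to come from whatever GPIO library you use on the Java side.

```java
/**
 * Hypothetical Java port of the bit-collecting logic in the C program:
 * onBit() is called from the DATA0/DATA1 edge handlers, tick() from a polling loop.
 */
final class BitCollector {
    private final int timeoutTicks;
    private long bits;      // collected bits, most recent bit in the lowest position
    private int bitCount;   // number of bits collected for the current frame
    private int ticksLeft;  // ticks remaining before the frame is considered complete

    BitCollector(int timeoutTicks) {
        this.timeoutTicks = timeoutTicks;
    }

    /** Appends one bit (0 from DATA0, 1 from DATA1) and restarts the timeout. */
    synchronized void onBit(int bit) {
        bits = (bits << 1) | (bit & 1);
        bitCount++;
        ticksLeft = timeoutTicks;
    }

    /**
     * Called once per polling tick (the C loop uses delay(1)).
     * Returns the completed frame as {bitCount, bits}, or null if none is ready.
     */
    synchronized long[] tick() {
        if (bitCount == 0) {
            return null;            // nothing collected yet
        }
        if (--ticksLeft > 0) {
            return null;            // bits may still be arriving
        }
        long[] frame = {bitCount, bits};
        bits = 0;                   // reset for the next card
        bitCount = 0;
        return frame;
    }
}
```

As in the C loop, with a timeout of 5 the frame is returned on the fifth tick after the last bit; using a long instead of an int leaves room for frames longer than 31 bits.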

Android app to record sound in real time and identify frequency

I need to develop an app that records sound in real time using the phone's mic and then displays its frequency (as text). I am posting my code here. The FFT and Complex classes are taken from http://introcs.cs.princeton.edu/java/97data/FFT.java.html and http://introcs.cs.princeton.edu/java/97data/Complex.java.html. The problem is that when I run this on the emulator, the frequency starts at some random value and keeps increasing until 7996, then the whole process repeats. Can someone please help me out?
public class Main extends Activity {
TextView disp;
private static int[] sampleRate = new int[] { 44100, 22050, 11025, 8000 };
short audioData[];
double finalData[];
int bufferSize,srate;
String TAG;
public boolean recording;
AudioRecord recorder;
Complex[] fftArray;
float freq;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
disp = (TextView) findViewById(R.id.display);
Thread t1 = new Thread(new Runnable(){
public void run() {
Log.i(TAG,"Setting up recording");
for (int rate : sampleRate) {
try{
Log.d(TAG, "Attempting rate " + rate);
bufferSize=AudioRecord.getMinBufferSize(rate,AudioFormat.CHANNEL_CONFIGURATION_MONO,
AudioFormat.ENCODING_PCM_16BIT)*3; //get the buffer size to use with this audio record
if (bufferSize != AudioRecord.ERROR_BAD_VALUE) {
recorder = new AudioRecord (MediaRecorder.AudioSource.MIC,rate,AudioFormat.CHANNEL_CONFIGURATION_MONO,
AudioFormat.ENCODING_PCM_16BIT,2048); //instantiate the AudioRecorder
Log.d(TAG, "BufferSize " +bufferSize);
srate = rate;
}
} catch (Exception e) {
Log.e(TAG, rate + "Exception, keep trying.",e);
}
}
bufferSize=2048;
recording=true; //variable to use start or stop recording
audioData = new short [bufferSize]; //short array that pcm data is put into.
Log.i(TAG,"Got buffer size =" + bufferSize);
while (recording) { //loop while recording is needed
Log.i(TAG,"in while 1");
if (recorder.getState()==android.media.AudioRecord.STATE_INITIALIZED) // check to see if the recorder has initialized yet.
if (recorder.getRecordingState()==android.media.AudioRecord.RECORDSTATE_STOPPED)
recorder.startRecording(); //check to see if the Recorder has stopped or is not recording, and make it record.
else {
Log.i(TAG,"in else");
// audiorecord();
finalData=convert_to_double(audioData);
Findfft();
for(int k=0;k<fftArray.length;k++)
{
freq = ((float)srate/(float) fftArray.length) *(float)k;
runOnUiThread(new Runnable(){
public void run()
{
disp.setText("The frequency is " + freq);
if(freq>=15000)
recording = false;
}
});
}
}//else recorder started
} //while recording
if (recorder.getState()==android.media.AudioRecord.RECORDSTATE_RECORDING)
recorder.stop(); //stop the recorder before ending the thread
recorder.release(); //release the recorders resources
recorder=null; //set the recorder to be garbage collected.
}//run
});
t1.start();
}
private void Findfft() {
// TODO Auto-generated method stub
Complex[] fftTempArray = new Complex[bufferSize];
for (int i=0; i<bufferSize; i++)
{
fftTempArray[i] = new Complex(finalData[i], 0);
}
fftArray = FFT.fft(fftTempArray);
}
private double[] convert_to_double(short data[]) {
// TODO Auto-generated method stub
double[] transformed = new double[data.length];
for (int j=0;j<data.length;j++) {
transformed[j] = (double)data[j];
}
return transformed;
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
// Inflate the menu; this adds items to the action bar if it is present.
getMenuInflater().inflate(R.menu.main, menu);
return true;
}
}
Your question has already been answered succinctly; however, to further your objectives and complete the loop:
Yes, an FFT is not the best choice on limited CPUs for pitch/frequency identification. A more suitable approach is the YIN algorithm described here. You may find an implementation in Tarsos.
One issue you will face is the lack of javax.sound.sampled on Android, which means you have to convert the shorts/bytes from AudioRecord into the floats that the referenced implementations require.
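As a rough illustration of both points, the sketch below converts AudioRecord's 16-bit samples to floats in [-1, 1) and then runs a minimal version of the YIN steps (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation). It is not the Tarsos code and does not use its API; the class name, the method names and the 0.15 threshold are assumptions for illustration.

```java
/** Minimal YIN pitch estimator sketch; not the Tarsos implementation. */
final class YinSketch {
    /** Converts 16-bit PCM samples (as filled by AudioRecord.read) to floats in [-1, 1). */
    static float[] toFloat(short[] pcm, int count) {
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = pcm[i] / 32768f;
        }
        return out;
    }

    /** Returns the estimated fundamental frequency in Hz, or -1 if none was found. */
    static float pitch(float[] x, int sampleRate, float threshold) {
        int maxLag = x.length / 2;
        float[] d = new float[maxLag];
        // Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2
        for (int tau = 1; tau < maxLag; tau++) {
            float sum = 0f;
            for (int j = 0; j < maxLag; j++) {
                float diff = x[j] - x[j + tau];
                sum += diff * diff;
            }
            d[tau] = sum;
        }
        // Step 2: cumulative mean normalized difference d'(tau)
        d[0] = 1f;
        float running = 0f;
        for (int tau = 1; tau < maxLag; tau++) {
            running += d[tau];
            d[tau] = running == 0f ? 1f : d[tau] * tau / running;
        }
        // Step 3: first lag below the threshold, then walk down to its local minimum
        int tau = 2;
        while (tau < maxLag && d[tau] >= threshold) {
            tau++;
        }
        if (tau >= maxLag) {
            return -1f;
        }
        while (tau + 1 < maxLag && d[tau + 1] < d[tau]) {
            tau++;
        }
        // Step 4: parabolic interpolation around the chosen lag
        float betterTau = tau;
        if (tau + 1 < maxLag) {
            float s0 = d[tau - 1], s1 = d[tau], s2 = d[tau + 1];
            float denom = 2f * (2f * s1 - s2 - s0);
            if (denom != 0f) {
                betterTau = tau + (s2 - s0) / denom;
            }
        }
        return sampleRate / betterTau;
    }
}
```

For a clean 440 Hz sine recorded at 44100 Hz into a 2048-sample buffer, pitch(toFloat(buffer, 2048), 44100, 0.15f) comes out close to 440. The nested difference loop is O(N²), which is the expensive part on a phone.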
Your problem is right here:
Findfft();
for(int k=0;k<fftArray.length;k++) {
freq = ((float)srate/(float) fftArray.length) *(float)k;
runOnUiThread(new Runnable() {
public void run() {
disp.setText("The frequency is " + freq);
if(freq>=15000) recording = false;
}
});
}
All this for loop does is go through your array of FFT values, convert the array index to a frequency in Hz, and print it.
If you want to output the frequency you're recording, you need to at least look at the data in your array. The crudest method is to compute the squared magnitude (re*re + im*im) of each bin and find the bin with the largest value.
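A minimal sketch of that crude peak-picking method, assuming the FFT output has been copied into plain re[] and im[] arrays (with the Princeton Complex class that would be fftArray[k].re() and fftArray[k].im()). Only bins 1 to N/2 are searched, since for a real input signal the upper half mirrors the lower half, and bin 0 (DC) is skipped. The class and method names are hypothetical.

```java
/** Crude dominant-frequency estimate: the bin with the largest squared magnitude. */
final class PeakBin {
    static double peakFrequency(double[] re, double[] im, int sampleRate) {
        int n = re.length;
        int best = 1;
        double bestPower = -1.0;
        // for real input, bins above n/2 mirror the lower half, so stop at n/2
        for (int k = 1; k <= n / 2; k++) {
            double power = re[k] * re[k] + im[k] * im[k];
            if (power > bestPower) {
                bestPower = power;
                best = k;
            }
        }
        // bin k is centred on k * sampleRate / n Hz
        return (double) best * sampleRate / n;
    }
}
```

The resolution is sampleRate / N, e.g. 8000 / 2048 ≈ 3.9 Hz with the question's emulator settings.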
In addition, I don't think the FFT implementation you're using does any precomputation (of the twiddle factors, for example). There are others that do, and since you're developing for a mobile device, you might want to take CPU usage and power consumption into account.
JTransforms is one library that does use precalculation to lower CPU load, and its documentation is very complete.
You may also find useful information on how to interpret the data returned from the FFT at Wikipedia - no offense, but it looks like you're not quite sure what you're doing, so I'm giving pointers.
Lastly, if you're looking to use this app for musical notes, I seem to remember lots of people saying that an FFT isn't the best way to do that, but I can't remember what is. Maybe someone else can add that bit?
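I found this solution after a few days; it is the best I found for getting the frequency in Hz:
Download JTransforms and also this JAR, which JTransforms needs.
Then I use this task (note that the frequency itself is estimated in calculate() by counting zero crossings; that method does not call JTransforms):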
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.os.AsyncTask;
import android.util.Log;
import java.util.Arrays;
public class MyRecorder extends AsyncTask<Void, short[], Void> {
int blockSize = 2048;// = 256;
private static final int RECORDER_SAMPLERATE = 8000;
private static final int RECORDER_CHANNELS = AudioFormat.CHANNEL_IN_MONO;
private static final int RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT;
int BufferElements2Rec = 1024; // want to play 2048 (2K) since 2 bytes we use only 1024
int BytesPerElement = 2;
volatile boolean started = true; // set to false from another thread to stop recording
@Override
protected Void doInBackground(Void... params) {
try {
final AudioRecord audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC,
RECORDER_SAMPLERATE, RECORDER_CHANNELS,
RECORDER_AUDIO_ENCODING, BufferElements2Rec * BytesPerElement);
// the constructor never returns null; a failed setup shows up in getState()
if (audioRecord.getState() != AudioRecord.STATE_INITIALIZED) {
audioRecord.release();
return null;
}
final short[] buffer = new short[blockSize];
final double[] toTransform = new double[blockSize];
audioRecord.startRecording();
while (started) {
Thread.sleep(100);
final int bufferReadResult = audioRecord.read(buffer, 0, blockSize);
if (bufferReadResult > 0) {
// publish a copy: the UI thread may still be using it when the next read() overwrites buffer
publishProgress(Arrays.copyOf(buffer, bufferReadResult));
}
}
audioRecord.stop();
audioRecord.release();
} catch (Throwable t) {
Log.e("AudioRecord", "Recording Failed", t);
}
return null;
}
@Override
protected void onProgressUpdate(short[]... buffer) {
super.onProgressUpdate(buffer);
float freq = calculate(RECORDER_SAMPLERATE, buffer[0]);
}
public static float calculate(int sampleRate, short [] audioData)
{
int numSamples = audioData.length;
int numCrossing = 0;
// count sign changes between neighbouring samples
for (int p = 0; p < numSamples-1; p++)
{
if ((audioData[p] > 0 && audioData[p + 1] <= 0) ||
(audioData[p] < 0 && audioData[p + 1] >= 0))
{
numCrossing++;
}
}
float numSecondsRecorded = (float)numSamples/(float)sampleRate;
float numCycles = numCrossing / 2f; // two crossings per cycle; float division avoids truncation
float frequency = numCycles/numSecondsRecorded;
return frequency;
}
}

Measuring sound intensity after analyzing the spectrum?

I am writing a program for a smartphone (Android).
It does the following:
Analyze the spectrum of a sound with an FFT algorithm.
Measure the intensity of the component at f = f0 (e.g. f0 = 18 kHz) in the spectrum obtained above.
Calculate the distance from the smartphone to the sound source from this intensity.
After the FFT, I get two arrays (real and imaginary parts). I calculate the sound intensity at f = 18000 Hz (I assume the 18000 Hz source does not change, which makes the intensity easier to measure) as follows:
The frequency of FFT bin i is:
if i <= [N/2] then i * SamplingFrequency / N
if i >= [N/2] then (N - i) * SamplingFrequency / N
therefore, for frequency = 18000 Hz, I choose i = 304
sound intensity = real_array[304] * real_array[304] + image_array[304] * image_array[304]
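Working the bin formula above through as a sketch, assuming N = 512 and Fs = 44100 Hz as in the code below: the bin spacing is 44100 / 512 ≈ 86.13 Hz, the nearest bin to 18 kHz is 209 (about 18001.8 Hz) and its mirror is 512 - 209 = 303, while bin 304 is centred at about 17915.6 Hz. The helper names are hypothetical.

```java
/** Frequency <-> bin helpers for an n-point FFT at sample rate fs (hypothetical names). */
final class BinIndex {
    /** Nearest bin to frequency f, valid for 0 <= f <= fs / 2. */
    static int binFor(double f, double fs, int n) {
        return (int) Math.round(f * n / fs);
    }

    /** For real input, bin n - k mirrors bin k and has the same magnitude. */
    static int mirrorOf(int k, int n) {
        return (n - k) % n;
    }

    /** Centre frequency of bin k, using the mirrored formula from the question above n / 2. */
    static double frequencyOf(int k, double fs, int n) {
        return (k <= n / 2 ? k : n - k) * fs / n;
    }
}
```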
However, the measured intensity varies a lot, which makes it difficult to measure the distance, and I have no idea how to explain this.
Also, what unit is the intensity I measured above expressed in?
Here is my code:
a. FFT algorithm (I use a 512-point FFT)
import define.define512;
public class fft {
private static float[] W_real;
private static float[] W_img;
private static final float[] input_real = new float[512];
private static final float[] input_img = new float[512];
//input_real1 is values from mic(smartphone)
//output is the magnitude sqrt(re*re + im*im) of each of the 512 bins
public static void FFT(float[] input_real1, float[] output)
{
//assumption: define512 also provides W_REAL (cosine table); the real twiddles must not come from W_IMAG
W_real = define512.W_REAL;
W_img = define512.W_IMAG;
int[] W_order = define512.ORDER; //bit-reversal ("inverse bit") permutation
//the recursion below is a decimation-in-time FFT: it needs bit-reversed INPUT and then
//produces output in natural order, so the permutation is applied to the input here
for(int i =0;i<512;i++)
{
input_real[i] = input_real1[W_order[i]];
input_img[i] = 0f;
}
fftradix2(0,511);
for(int i =0;i<512;i++)
{
output[i] = (float) Math.sqrt(input_real[i]*input_real[i] + input_img[i]*input_img[i]);
}
}
//fft algorithms: in-place recursive radix-2 butterflies on input_real/input_img[dau..cuoi]
private static void fftradix2(int dau,int cuoi)
{
int check = cuoi - dau;
if (check == 1)
{
float br = input_real[cuoi], bi = input_img[cuoi];
input_real[cuoi] = input_real[dau] - br;
input_img[cuoi] = input_img[dau] - bi;
input_real[dau] += br;
input_img[dau] += bi;
}
else
{
int index = 512/(cuoi - dau + 1);
int tg = (cuoi - dau)/2;
fftradix2(dau,(dau+tg));
fftradix2((cuoi-tg),cuoi);
for(int i = dau;i<=(dau+tg);i++)
{
int j = i + tg + 1;
float wr = W_real[(i-dau)*index], wi = W_img[(i-dau)*index];
//twiddled second-half value, computed once before either slot is overwritten
float tr = input_real[j]*wr - input_img[j]*wi;
float ti = input_real[j]*wi + input_img[j]*wr;
input_real[j] = input_real[i] - tr;
input_img[j] = input_img[i] - ti;
input_real[i] += tr;
input_img[i] += ti;
}
}
}
}
b. Code that reads the smartphone's mic
private static final int NumOverlapSample = 800; // bytes kept from the previous frame
private static final int NumNewSample = 224; // new bytes read per timer tick (800 + 224 = 1024 bytes = 512 samples)
private static int Fs = 44100;
private byte recorderAudiobuffer[] = new byte [1024];
AudioRecord recorder = new AudioRecord(AudioSource.MIC, Fs, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 4096);
//start recorder
recorder.startRecording();
timer.schedule(new task_update(), 1000, 10);
class task_update extends TimerTask
{
@Override
public void run() {
//slide the window: move the newest NumOverlapSample bytes to the front
for(int i=0;i<NumOverlapSample;i++)
recorderAudiobuffer[i] = recorderAudiobuffer[i+NumNewSample];
int bufferRead = recorder.read(recorderAudiobuffer,NumOverlapSample,NumNewSample);
convert.decode(recorderAudiobuffer, N, input);
fft.FFT(input, output);
}
}
And my source: https://www.box.com/s/zuppzkicymfsuv4kb65p
Thanks to all.
At 18 kHz, microphone type, position and direction, as well as sound reflections from the nearby acoustic environment will strongly influence the sound level.
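On the unit: re² + im² of an FFT bin has no physical unit; it only scales with the power of that component relative to the digital full scale of the recording. A common convention is to express it in dB relative to a full-scale sine (dBFS), which still needs microphone calibration before it means anything as a sound pressure level. The sketch below (hypothetical names, 16-bit PCM assumed) applies a Hann window to reduce leakage from neighbouring frequencies and evaluates a single DFT bin directly, which is the same quantity the Goertzel algorithm computes.

```java
/** Level of one DFT bin in dB relative to a full-scale 16-bit sine (sketch). */
final class ToneLevel {
    static double levelDbfs(short[] pcm, int k) {
        int n = pcm.length;
        double re = 0, im = 0, windowSum = 0;
        for (int i = 0; i < n; i++) {
            // periodic Hann window: reduces leakage from nearby frequencies
            double w = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / n);
            windowSum += w;
            double angle = 2 * Math.PI * k * i / n;
            re += w * pcm[i] * Math.cos(angle);
            im -= w * pcm[i] * Math.sin(angle);
        }
        double magnitude = Math.hypot(re, im);
        // an on-bin sine of amplitude A produces magnitude A * windowSum / 2
        double fullScale = 32767.0 * windowSum / 2;
        return 20 * Math.log10(magnitude / fullScale);
    }
}
```

A sine at half of full scale that falls exactly on bin k reads about -6 dBFS; a tone between bins reads lower, which is one more reason to keep the source frequency and the analysis bin aligned.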

Binary search in a sorted (memory-mapped ?) file in Java

I am struggling to port a Perl program to Java, and learning Java as I go. A central component of the original program is a Perl module that does string prefix lookups in a 500+ GB sorted text file using binary search
(essentially, "seek" to a byte offset in the middle of the file, backtrack to nearest newline, compare line prefix with the search string, "seek" to half/double that byte offset, repeat until found...)
I have experimented with several database solutions but found that nothing beats this in sheer lookup speed with data sets of this size. Do you know of any existing Java library that implements such functionality? Failing that, could you point me to some idiomatic example code that does random access reads in text files?
Alternatively, I am not familiar with the new (?) Java I/O libraries but would it be an option to memory-map the 500 GB text file (I'm on a 64-bit machine with memory to spare) and do binary search on the memory-mapped byte array? I would be very interested to hear any experiences you have to share about this and similar problems.
I am a big fan of Java's MappedByteBuffers for situations like this. They are blazing fast. Below is a snippet I put together for you that maps a buffer to the file, seeks to the middle, and then searches backwards to a newline character. This should be enough to get you going.
I have similar code (seek, read, repeat until done) in my own application, benchmarked java.io streams against MappedByteBuffer in a production environment, and posted the results on my blog (Geekomatic posts tagged 'java.nio') with raw data, graphs and all.
Two-second summary? My MappedByteBuffer-based implementation was about 275% faster. YMMV.
To work with files larger than ~2 GB, which is a problem because of the int cast and .position(int pos), I've crafted a paging algorithm backed by a list of MappedByteBuffers. You'll need to be working on a 64-bit system for this to work with files larger than 2-4 GB, because MappedByteBuffers use the OS's virtual memory system to work their magic.
import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
public class StusMagicLargeFileReader {
private static final long PAGE_SIZE = Integer.MAX_VALUE;
private List<MappedByteBuffer> buffers = new ArrayList<MappedByteBuffer>();
private final byte raw[] = new byte[1];
public static void main(String[] args) throws IOException {
File file = new File("/Users/stu/test.txt");
StusMagicLargeFileReader buffer;
try (FileChannel fc = new FileInputStream(file).getChannel()) {
buffer = new StusMagicLargeFileReader(fc); // mappings stay valid after the channel is closed
}
long position = file.length() / 2;
String candidate = buffer.getString(position--);
// compare with the String "\n": String.equals('\n') compares against a Character and is always false
while (position >=0 && !candidate.equals("\n"))
candidate = buffer.getString(position--);
//have newline position or start of file...do other stuff
}
StusMagicLargeFileReader(FileChannel channel) throws IOException {
long start = 0, length = 0;
for (long index = 0; start + length < channel.size(); index++) {
if ((channel.size() / PAGE_SIZE) == index)
length = (channel.size() - index * PAGE_SIZE) ;
else
length = PAGE_SIZE;
start = index * PAGE_SIZE;
buffers.add(channel.map(READ_ONLY, start, length)); // pages are added in order, so list index == page number
}
}
public String getString(long bytePosition) {
int page = (int) (bytePosition / PAGE_SIZE);
int index = (int) (bytePosition % PAGE_SIZE);
raw[0] = buffers.get(page).get(index);
return new String(raw);
}
}
I have the same problem. I am trying to find all lines that start with some prefix in a sorted file.
Here is a method I cooked up which is largely a port of Python code found here: http://www.logarithmic.net/pfh/blog/01186620415
I have tested it but not thoroughly just yet. It does not use memory mapping, though.
public static List<String> binarySearch(String filename, String string) {
List<String> result = new ArrayList<String>();
try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
long low = 0;
long high = raf.length();
long p = -1;
while (low < high) {
long mid = (low + high) / 2;
p = mid;
// walk back to the newline that precedes the line containing mid
while (p >= 0) {
raf.seek(p);
char c = (char) raf.readByte();
//System.out.println(p + "\t" + c);
if (c == '\n')
break;
p--;
}
if (p < 0)
raf.seek(0);
String line = raf.readLine();
//System.out.println("-- " + mid + " " + line);
// line is null when mid sits on the file's final '\n'; treat end of file as greater than anything
if (line != null && line.compareTo(string) < 0)
low = mid + 1;
else
high = mid;
}
p = low;
while (p >= 0) {
raf.seek(p);
if (((char) raf.readByte()) == '\n')
break;
p--;
}
if (p < 0)
raf.seek(0);
// collect every consecutive line that starts with the prefix
while (true) {
String line = raf.readLine();
if (line == null || !line.startsWith(string))
break;
result.add(line);
}
} catch (IOException e) {
System.out.println("IOException:");
e.printStackTrace();
}
return result;
}
I am not aware of any library that has that functionality. However, correct code for an external binary search in Java should be similar to this:
class ExternalBinarySearch {
final RandomAccessFile file;
final Comparator<String> test; // tests the element given as search parameter with the line. Insert a PrefixComparator here
public ExternalBinarySearch(File f, Comparator<String> test) throws FileNotFoundException {
this.file = new RandomAccessFile(f, "r");
this.test = test;
}
public String search(String element) throws IOException {
long l = file.length();
return search(element, -1, l-1);
}
/**
* Searches the given element in the range [low,high]. The low value of -1 is a special case to denote the beginning of a file.
* In contrast to every other line, a line at the beginning of a file doesn't need a \n directly before the line
*/
private String search(String element, long low, long high) throws IOException {
if(high - low < 1024) {
// search directly
long p = low;
while(p < high) {
String line = nextLine(p);
int r = test.compare(line,element);
if(r > 0) {
return null;
} else if (r < 0) {
p += line.length();
} else {
return line;
}
}
return null;
} else {
long m = low + ((high - low) / 2);
String line = nextLine(m);
int r = test.compare(line, element);
if(r > 0) {
return search(element, low, m);
} else if (r < 0) {
return search(element, m, high);
} else {
return line;
}
}
}
private String nextLine(long low) throws IOException {
if(low == -1) { // Beginning of file
file.seek(0);
} else {
file.seek(low);
}
int bufferLength = 65 * 1024;
byte[] buffer = new byte[bufferLength];
int r = file.read(buffer);
int lineBeginIndex = -1;
// search beginning of line
if(low == -1) { //beginning of file
lineBeginIndex = 0;
} else {
//normal mode
for(int i = 0; i < 1024; i++) {
if(buffer[i] == '\n') {
lineBeginIndex = i + 1;
break;
}
}
}
if(lineBeginIndex == -1) {
// no line begins within next 1024 bytes
return null;
}
int start = lineBeginIndex;
for(int i = start; i < r; i++) {
if(buffer[i] == '\n') {
// Found end of line
return new String(buffer, lineBeginIndex, i - lineBeginIndex + 1);
}
}
throw new IllegalArgumentException("Line too long");
}
}
Please note: I made up this code ad hoc. Corner cases are not tested nearly well enough, the code assumes that no single line is longer than about 64K, etc.
I also think that building an index of the offsets where lines start might be a good idea. For a 500 GB file, that index should be stored in an index file. You should gain a not-so-small constant factor with that index, because then there is no need to search for the next line in each step.
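A sketch of the index-file idea, assuming '\n' line endings: one sequential pass writes the byte offset of each line start as an 8-byte long, so the offset of line i sits at byte i * 8 of the index file and can be read with a single seek. The class and method names are hypothetical.

```java
import java.io.*;

/** Hypothetical line-offset index stored as big-endian longs in a separate file. */
final class LineOffsetIndex {
    /** Scans textFile once, writes one long per line start to indexFile, returns the line count. */
    static long build(File textFile, File indexFile) throws IOException {
        long lines = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(textFile), 1 << 16);
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(new FileOutputStream(indexFile), 1 << 16))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    out.writeLong(pos);   // this byte begins a new line
                    lines++;
                }
                atLineStart = (b == '\n');
                pos++;
            }
        }
        return lines;
    }

    /** Reads the start offset of line i (0-based) from the index file. */
    static long offsetOf(RandomAccessFile index, long i) throws IOException {
        index.seek(i * Long.BYTES);
        return index.readLong();
    }

    /** Reads line i from the text file with one seek into each file. */
    static String lineAt(RandomAccessFile text, RandomAccessFile index, long i) throws IOException {
        text.seek(offsetOf(index, i));
        return text.readLine();
    }
}
```

The number of lines is simply indexFile.length() / 8, so a binary search over line numbers 0 to that value needs no newline scanning at all.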
I know that was not the question, but building a prefix tree data structure like (Patricia) tries (on disk/SSD) might be a good idea for the prefix search.
This is a simple example of what you want to achieve. I would probably first index the file, keeping track of the file position for each string. I'm assuming the strings are separated by newlines (or carriage returns):
RandomAccessFile file = new RandomAccessFile("filename.txt", "r");
List<Long> indexList = new ArrayList<Long>();
long pos = 0;
while (file.readLine() != null)
{
indexList.add(pos); // start offset of the line just read (autoboxed to Long)
pos = file.getFilePointer();
}
int indexSize = indexList.size();
Long[] indexArray = new Long[indexSize];
indexList.toArray(indexArray);
The last step is to convert to an array for a slight speed improvement when doing lots of lookups. I would probably convert the Long[] to a long[] also, but I did not show that above. Finally the code to read the string from a given indexed position:
int i = 0; // Set this to the line number your algorithm needs.
file.seek(indexArray[i]);
String line = file.readLine();
// At this point, line contains the string #i.
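With the offsets in a long[] (the conversion mentioned above), the binary search itself could look roughly like this sketch: it finds the first line that sorts at or after the prefix and then collects the consecutive lines that start with it. It assumes the file is sorted by byte value and that RandomAccessFile.readLine()'s one-byte-per-char decoding is acceptable; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical prefix lookup over an in-memory array of line start offsets. */
final class IndexedPrefixSearch {
    static List<String> find(RandomAccessFile file, long[] offsets, String prefix) throws IOException {
        int lo = 0, hi = offsets.length;   // search the line numbers in [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            file.seek(offsets[mid]);
            String line = file.readLine();
            if (line.compareTo(prefix) < 0) {
                lo = mid + 1;              // line mid sorts before every match
            } else {
                hi = mid;                  // line mid may be the first match
            }
        }
        // all lines starting with the prefix are contiguous from here on
        List<String> matches = new ArrayList<>();
        for (int i = lo; i < offsets.length; i++) {
            file.seek(offsets[i]);
            String line = file.readLine();
            if (line == null || !line.startsWith(prefix)) {
                break;
            }
            matches.add(line);
        }
        return matches;
    }
}
```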
If you are dealing with a 500GB file, then you might want to use a faster lookup method than binary search - namely a radix sort which is essentially a variant of hashing. The best method for doing this really depends on your data distributions and types of lookup, but if you are looking for string prefixes there should be a good way to do this.
Below is an example I wrote of a radix-sort-style bucketing solution for integers (in VB.NET), but you can use the same idea: cut down the sort time by dividing the data into buckets, then use an O(1) lookup to retrieve the bucket of data that is relevant.
Option Strict On
Option Explicit On
Module Module1
Private Const MAX_SIZE As Integer = 100000
Private m_input(MAX_SIZE) As Integer
Private m_table(MAX_SIZE) As List(Of Integer)
Private m_randomGen As New Random()
Private m_operations As Integer = 0
Private Sub generateData()
' fill with random numbers between 0 and MAX_SIZE - 1
For i = 0 To MAX_SIZE - 1
m_input(i) = m_randomGen.Next(0, MAX_SIZE)
Next
End Sub
Private Sub sortData()
For i As Integer = 0 To MAX_SIZE - 1
Dim x = m_input(i)
If m_table(x) Is Nothing Then
m_table(x) = New List(Of Integer)
End If
m_table(x).Add(x)
' clearly this is simply going to be MAX_SIZE
m_operations = m_operations + 1
Next
End Sub
Private Sub printData(ByVal start As Integer, ByVal finish As Integer)
If start < 0 Or start > MAX_SIZE - 1 Then
Throw New Exception("printData - start out of range")
End If
If finish < 0 Or finish > MAX_SIZE - 1 Then
Throw New Exception("printData - finish out of range")
End If
For i As Integer = start To finish
If m_table(i) IsNot Nothing Then
For Each x In m_table(i)
Console.WriteLine(x)
Next
End If
Next
End Sub
' run the entire sort, but just print out the first 100 for verification purposes
Private Sub test()
m_operations = 0
generateData()
Console.WriteLine("Time started = " & Now.ToString())
sortData()
Console.WriteLine("Time finished = " & Now.ToString & " Number of operations = " & m_operations.ToString())
' print out a random 100 segment from the sorted array
Dim start As Integer = m_randomGen.Next(0, MAX_SIZE - 101)
printData(start, start + 100)
End Sub
Sub Main()
test()
Console.ReadLine()
End Sub
End Module
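For comparison, here is a rough Java sketch of the same bucketing idea applied to the sorted text file from the question (this is not a port of the VB code above). One pass records, for every possible first byte, where the lines starting with that byte begin, so a lookup can jump straight to that byte range and binary search only inside it. The class name is hypothetical, the file is assumed to be sorted by unsigned byte value, and a real index would likely bucket on two or more bytes.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical first-byte bucket index: lines whose first byte is b
 * occupy the byte range [start[b], start[b + 1]); start[256] is the file length.
 */
final class FirstByteBuckets {
    private final long[] start = new long[257];

    FirstByteBuckets(File sortedFile) throws IOException {
        int next = 0;                 // next bucket whose start has not been set yet
        long pos = 0;
        boolean atLineStart = true;
        try (InputStream in = new BufferedInputStream(new FileInputStream(sortedFile), 1 << 16)) {
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    // buckets not yet set, up to this line's first byte, start at this line
                    while (next <= b) {
                        start[next++] = pos;
                    }
                }
                atLineStart = (b == '\n');
                pos++;
            }
        }
        while (next <= 256) {
            start[next++] = pos;      // remaining (empty) buckets end at the file length
        }
    }

    /** Assumes a non-empty prefix whose first char fits in one byte (e.g. ASCII). */
    long bucketStart(String prefix) {
        return start[prefix.charAt(0) & 0xFF];
    }

    long bucketEnd(String prefix) {
        return start[(prefix.charAt(0) & 0xFF) + 1];
    }
}
```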
I posted a gist, https://gist.github.com/mikee805/c6c2e6a35032a3ab74f643a1d0f8249c, with a fairly complete example based on what I found on Stack Overflow and some blogs; hopefully someone else can use it.
import static java.nio.file.Files.isWritable;
import static java.nio.file.StandardOpenOption.READ;
import static org.apache.commons.io.FileUtils.forceMkdir;
import static org.apache.commons.io.IOUtils.closeQuietly;
import static org.apache.commons.lang3.StringUtils.isBlank;
import static org.apache.commons.lang3.StringUtils.trimToNull;
import java.io.File;
import java.io.IOException;
import java.nio.Buffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
public class FileUtils {
private FileUtils() {
}
private static boolean found(final String candidate, final String prefix) {
return isBlank(candidate) || candidate.startsWith(prefix);
}
private static boolean before(final String candidate, final String prefix) {
// compare against the whole candidate; substring(0, prefix.length()) throws for lines shorter than the prefix
return prefix.compareTo(candidate) < 0;
}
public static MappedByteBuffer getMappedByteBuffer(final Path path) {
FileChannel fileChannel = null;
try {
fileChannel = FileChannel.open(path, READ);
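// a single map() call is limited to Integer.MAX_VALUE bytes, so this only handles files up to about 2 GB
return fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size()).load();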
}
catch (Exception e) {
throw new RuntimeException(e);
}
finally {
closeQuietly(fileChannel);
}
}
public static String binarySearch(final String prefix, final MappedByteBuffer buffer) {
if (buffer == null) {
return null;
}
try {
long low = 0;
long high = buffer.limit();
while (low < high) {
int mid = (int) ((low + high) / 2);
final String candidate = getLine(mid, buffer);
if (found(candidate, prefix)) {
return trimToNull(candidate);
}
else if (before(candidate, prefix)) {
high = mid;
}
else {
low = mid + 1;
}
}
}
catch (Exception e) {
throw new RuntimeException(e);
}
return null;
}
private static String getLine(int position, final MappedByteBuffer buffer) {
// search backwards to the find the proceeding new line
// then search forwards again until the next new line
// return the string in between
final StringBuilder stringBuilder = new StringBuilder();
// walk it back
char candidate = (char)buffer.get(position);
while (position > 0 && candidate != '\n') {
candidate = (char)buffer.get(--position);
}
// we either are at the beginning of the file or a new line
if (position == 0) {
// we are at the beginning at the first char
candidate = (char)buffer.get(position);
stringBuilder.append(candidate);
}
// there is/are char(s) after new line / first char
if (isInBuffer(buffer, position)) {
//first char after new line
candidate = (char)buffer.get(++position);
stringBuilder.append(candidate);
//walk it forward
while (isInBuffer(buffer, position) && candidate != ('\n')) {
candidate = (char)buffer.get(++position);
stringBuilder.append(candidate);
}
}
return stringBuilder.toString();
}
private static boolean isInBuffer(final Buffer buffer, int position) {
return position + 1 < buffer.limit();
}
public static File getOrCreateDirectory(final String dirName) {
final File directory = new File(dirName);
try {
forceMkdir(directory);
isWritable(directory.toPath());
}
catch (IOException e) {
throw new RuntimeException(e);
}
return directory;
}
}
I had a similar problem, so I created a (Scala) library from the solutions provided in this thread:
https://github.com/avast/BigMap
It contains a utility for sorting a huge file and for binary search in the sorted file.
If you truly want to try memory mapping the file, I found a tutorial on how to use memory mapping in Java nio.
