I am managing audio capture and playback using the Java Sound API (TargetDataLine and SourceDataLine). Suppose that, in a conference environment, one participant's audio queue grows larger than the jitter size (due to processing or network delay), and I want to fast-forward the audio bytes I have for that participant to bring the queue back below the jitter size.
How can I fast-forward the audio byte array of that participant?
I can't do it during playback, since the player thread normally just dequeues one frame from every participant's queue and mixes them for playing. Is the only way to dequeue more than one frame from that participant and mix(?) them to fast-forward, before mixing the result with the single dequeued frame from each of the other participants?
Thanks in advance for any kind of help or advice.
There are two ways to speed up the playback that I know of. In one case, the faster pace creates a rise in pitch. The coding for this is relatively easy. In the other case, pitch is kept constant, but it involves a technique of working with sound granules (granular synthesis), and is harder to explain.
For the situation where maintaining the same pitch is not a concern, the basic plan is as follows: instead of advancing by single frames, advance by a frame plus a small increment. For example, let's say that advancing by 1.1 frames per step over the course of 44000 frames is sufficient to catch you up. (That would also mean a pitch increase of about 0.14 octave, since log2(1.1) ≈ 0.1375; that is a little over one and a half semitones.)
To advance a "fractional" frame, you first have to convert the bytes of the two bracketing frames to PCM. Then, use linear interpolation to get the intermediate value. Then convert that intermediate value back to bytes for the output line.
For example, if you are advancing from frame[0] to frame["1.1"] you will need to know the PCM for frame[1] and frame[2]. The intermediate value can be calculated using a weighted average:
value = PCM[1] * 9/10 + PCM[2] * 1/10
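To make the weighted average concrete, here is a minimal sketch of the whole read-interpolate-write loop. It assumes 16-bit signed, little-endian, mono PCM; the class and method names (SpeedUpSketch, speedUp, readSample, writeSample) are made up for this example, and a constant step is used rather than a ramped one.

```java
final class SpeedUpSketch {
    /** Reads one 16-bit little-endian sample (assumed format) as a signed int. */
    static int readSample(byte[] pcm, int frame) {
        return (pcm[2 * frame + 1] << 8) | (pcm[2 * frame] & 0xFF);
    }

    /** Writes one 16-bit little-endian sample. */
    static void writeSample(byte[] pcm, int frame, int value) {
        pcm[2 * frame] = (byte) value;
        pcm[2 * frame + 1] = (byte) (value >> 8);
    }

    /** Advances through the input by `step` frames per output frame (step > 1 plays faster). */
    static byte[] speedUp(byte[] in, double step) {
        int frames = in.length / 2;
        if (frames < 2) {
            return in.clone();
        }
        int outFrames = (int) Math.floor((frames - 1) / step) + 1;
        byte[] out = new byte[outFrames * 2];
        for (int n = 0; n < outFrames; n++) {
            double pos = n * step;      // fractional read position
            int i = (int) pos;          // frame just before the position
            double frac = pos - i;      // weight of the following frame
            int a = readSample(in, i);
            int b = readSample(in, Math.min(i + 1, frames - 1));
            // Weighted average of the two bracketing frames
            writeSample(out, n, (int) Math.round(a * (1.0 - frac) + b * frac));
        }
        return out;
    }
}
```

With step = 1.1, output frame 1 is read at position 1.1, which is PCM[1] * 9/10 + PCM[2] * 1/10, matching the formula above. For stereo you would interpolate each channel separately.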
I think it might be good to make the amount by which you advance change gradually. Take a few dozen frames to ramp up the increment and allow time to ramp down again when returning to normal dequeuing. If you suddenly change the rate at which you are reading the audio data, it is possible to introduce a discontinuity that will be heard as a click.
I have used this basic plan for dynamic control of playback speed, but I haven't had the experience of employing it for the situation that you are describing. Regulating the variable speed could be tricky if you also are trying to enforce keeping the transitions smooth.
The basic idea for using granules involves obtaining contiguous PCM (I'm not clear what the optimum number of frames would be for voice; 1 to 50 milliseconds is cited as commonly used with this technique in synthesis), and giving each granule a volume envelope that allows you to mix sequential granules end-to-end (they must overlap).
I think the envelopes for the granules make use of a Hann function or Hamming window--but I'm not clear on the details, such as the overlapping placement of the granules so that they mix/transition smoothly. I've only dabbled, and I'm going to assume folks at Signal Processing will be the best bet for advice on how to code this.
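For what it's worth, one common placement is a 50% overlap: with a periodic Hann envelope, granules spaced half a granule apart sum to a constant gain of 1. The following is only a rough sketch of plain overlap-add under that assumption, not a tuned implementation; the names (OlaSketch, timeCompress) and the float sample format are invented for the example, and the granule size must be even.

```java
final class OlaSketch {
    /** Periodic Hann envelope; copies offset by size/2 sum to 1. */
    static double[] hann(int size) {
        double[] w = new double[size];
        for (int i = 0; i < size; i++) {
            w[i] = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / size);
        }
        return w;
    }

    /**
     * Shortens `in` by roughly `speed` (> 1 plays faster) while keeping pitch.
     * Granules are read every (grain/2 * speed) frames and written every grain/2 frames.
     */
    static float[] timeCompress(float[] in, int grain, double speed) {
        if (in.length < grain) {
            return in.clone();
        }
        int writeHop = grain / 2;
        double readHop = writeHop * speed;
        double[] w = hann(grain);
        int grains = (int) ((in.length - grain) / readHop) + 1;
        float[] out = new float[(grains - 1) * writeHop + grain];
        for (int g = 0; g < grains; g++) {
            int src = Math.min((int) Math.round(g * readHop), in.length - grain);
            int dst = g * writeHop;
            for (int i = 0; i < grain; i++) {
                out[dst + i] += (float) (in[src + i] * w[i]); // enveloped granule, mixed in
            }
        }
        return out;
    }
}
```

The first and last half granule of the output fade in and out, because only one envelope covers them. Plain overlap-add like this does not align the phase of neighbouring granules, so voiced speech can sound rough; refinements such as WSOLA search for a better read position for each granule.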
I found a fantastic git repo (the sonic library, mainly intended for audio players) which does exactly what I wanted, with many controls. It accepts a whole .wav file or chunks of audio byte arrays, and after processing we get sped-up playback and more. For real-time processing I call it on every chunk of audio bytes.
I also found another approach: an algorithm that detects whether an audio chunk (byte array) is voice or not. Depending on its result, I can simply skip playing non-voice packets, which gives around a 1.5x speedup with less processing.
public class DTHVAD {
public static final int INITIAL_EMIN = 100;
public static final double INITIAL_DELTAJ = 1.0001;
private static boolean isFirstFrame;
private static double Emax;
private static double Emin;
private static int inactiveFrameCounter;
private static double Lamda; //
private static double DeltaJ;
static {
initDTH();
}
private static void initDTH() {
Emax = 0;
Emin = 0;
isFirstFrame = true;
Lamda = 0.950; // range is 0.950---0.999
DeltaJ = 1.0001;
}
public static boolean isAllSilence(short[] samples, int length) {
boolean r = true;
for (int l = 0; l < length; l += 80) {
if (!isSilence(samples, l, Math.min(l + 80, length))) { // do not read past the valid samples
r = false;
break;
}
}
return r;
}
public static boolean isSilence(short[] samples, int offset, int length) {
boolean isSilenceR = false;
long energy = energyRMSE(samples, offset, length);
// printf("en=%ld\n",energy);
if (isFirstFrame) {
Emax = energy;
Emin = INITIAL_EMIN;
isFirstFrame = false;
}
if (energy > Emax) {
Emax = energy;
}
if (energy < Emin) {
if ((int) energy == 0) {
Emin = INITIAL_EMIN;
} else {
Emin = energy;
}
DeltaJ = INITIAL_DELTAJ; // Resetting DeltaJ with initial value
} else {
DeltaJ = DeltaJ * 1.0001;
}
long thresshold = (long) ((1 - Lamda) * Emax + Lamda * Emin);
// printf("e=%ld,Emin=%f, Emax=%f, thres=%ld\n",energy,Emin,Emax,thresshold);
Lamda = (Emax - Emin) / Emax;
if (energy > thresshold) {
isSilenceR = false; // voice marking
} else {
isSilenceR = true; // noise marking
}
Emin = Emin * DeltaJ;
return isSilenceR;
}
// Root-mean-square energy of samples[offset .. end). The callers above pass an
// exclusive end index as the third argument, so the frame size is end - offset.
private static long energyRMSE(short[] samples, int offset, int end) {
int n = end - offset;
if (n <= 0) {
return 0;
}
double sumSquares = 0;
for (int i = offset; i < end; i++) {
sumSquares += (double) samples[i] * samples[i]; // x*x
}
return (long) Math.sqrt(sumSquares / n);
}
}
Here I can convert my byte array to short array and detect whether it is voice or non voice by
frame.silence = DTHVAD.isSilence(encodeShortBuffer, 0, shortLen);
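The byte-to-short conversion is not shown above, so here is one way it could look. This is a sketch that assumes 16-bit PCM; the byte order depends on your AudioFormat, so it is passed in. PcmBytes and toShorts are names made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmBytes {
    /** Converts the first byteLen bytes of 16-bit PCM into samples. */
    static short[] toShorts(byte[] audio, int byteLen, boolean bigEndian) {
        short[] samples = new short[byteLen / 2];
        ByteBuffer.wrap(audio, 0, byteLen)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

The resulting array and its length can then be passed to DTHVAD.isSilence as in the line above.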
This is a very long post, but only the first half is really relevant.
The second half describes only what I have tried to solve it, but what seemed to me too inefficient (it can perhaps help to get the idea of what I want). The relevant part ends with the line after the bolded question.
I am trying to simulate multiple productions in an imaginary factory to calculate what amount of goods of each type known will be available at the end. There are several different goods types, and they have all a specific maximum production capacity that can only be reached if enough ingredients are available. An example of how the production lines could look is here:
The goods at the bottom all have a known rate at which they are delivered to the factory, so for those, there is nothing to calculate, though this rate can change over time (also, the maximum production capacity can also change at any point in time, e.g., the capacity can be increased by adding workers or more machines).
As shown in the picture, for the other goods there are three things to look at:
Some lines produce a good out of a single other one.
Some lines produce a good out of two others.
Some lines have a good used for creation of more than one new good
(see, for example, "R8" in the middle of the illustration).
I have the following information:
Maximum production rate of each good (5 produced per hour, for example)
for the bottom goods we have the amount delivered to the factory (5 delivered per hour, for example)
how much of each is in stock now (so in case there is not enough delivered, if we still have some in stock, we don't need to reduce production)
At what times the delivery of a good will change (can happen to any good at the bottom)
At what times the maximum production rate of a good will change (can happen to any good not at the bottom)
With this information, I want to calculate the amount of each good at a given time in the future. I would like this to be as efficient as possible, since I need these calculations quite often.
I tried to make an implementation for this in Java, but I have some problems. Instead of fixing them I looked at my algorithm again and figured out it did not look as if it was very efficient anyway, so I wanted to know if someone has already seen or solved this kind of problem?
The way I tried to solve this is following:
I create maximum production (and delivery) intervals for each good using the known information when a production (or delivery) amount changes.
I put all resources at the bottom in a remaining Set and a checked Set (bottom goods are immediately checked ones).
I calculate the actual amount produced for each good. To do so, I take each good in remaining and check which goods can be produced from it. If everything that can be produced is made only of checked goods, I calculate the actual amount produced from the maximum rate and the available source goods (that is, the production of the source goods plus the amount in stock, if that is less than the maximum rate needs). In this step I also add production intervals when production has to be reduced because a source good is produced too slowly (although some of it was in stock at the beginning). When finished, the source goods are removed from the remaining Set, and the new goods are added to it as well as to the checked Set.
Now we have all the actual good productions for each good and we can calculate it. For this we loop over each good and take the actual production and add it up using the interval borders for time. We have now the amount of goods at the wanted time in the future.
Additional info: we cannot do step 4 without step 3, since the amount we calculate for a good can be consumed again in the production of the next one, so we need this step in between.
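The "only process a good once all of its sources are checked" rule in step 3 is a topological ordering of the production graph. As a sketch only, assuming the dependencies are available as a map from each produced good to the goods it is made from (ProductionOrder and order are invented names, and the graph is assumed to have no cycles), the order can be computed once up front instead of rescanning the remaining Set:

```java
import java.util.*;

final class ProductionOrder {
    /** madeFrom maps each produced good to its source goods; delivered goods need no entry. */
    static List<String> order(Map<String, List<String>> madeFrom) {
        Map<String, Integer> pending = new TreeMap<>();       // unchecked sources per good
        Map<String, List<String>> usedBy = new HashMap<>();   // source -> goods made from it
        for (Map.Entry<String, List<String>> e : madeFrom.entrySet()) {
            pending.merge(e.getKey(), e.getValue().size(), Integer::sum);
            for (String source : e.getValue()) {
                pending.putIfAbsent(source, 0);
                usedBy.computeIfAbsent(source, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());                         // bottom (delivered) goods
            }
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String good = ready.remove();
            result.add(good);
            for (String product : usedBy.getOrDefault(good, Collections.emptyList())) {
                if (pending.merge(product, -1, Integer::sum) == 0) {
                    ready.add(product);                        // all sources are now checked
                }
            }
        }
        if (result.size() != pending.size()) {
            throw new IllegalStateException("cycle in production graph");
        }
        return result;
    }
}
```

Each good is visited once and each dependency once, so the ordering itself is linear in the size of the graph; the interval arithmetic of step 3 would then run per good in that order.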
If it helps to understand what I have done (or wanted to do) I add my code (not working). The class is already initialized with the maximum production rate intervals of each produced good currently in production. Since other goods can be in stock, for all goods that are not included we initialize them to with a production of zero and one interval.
public class FactoryGoods {
private long future;
private long now;
private Map<String, Integer> availableGoods;
private Map<String, ArrayList<ProductionInterval>> hourlyGoodIncrease;
/**
*
* @param future long current time
* @param now long last time the factory's resources got updated
* @param availableGoods Map<String,Integer> of the goods in the factory
* @param hourlyGoodIncrease Map<String,ArrayList<ProductionInterval>> of the intervals of production quantities for the goods
*/
public FactoryGoods(long future, long now, Map<String,Integer> availableGoods, Map<String,ArrayList<ProductionInterval>> hourlyGoodIncrease) {
this.future = future;
this.now = now;
this.availableGoods = availableGoods;
this.hourlyGoodIncrease = hourlyGoodIncrease;
}
/**
* Calculates the resources present in a factory's storage
* @return a Map of quantities mapped on the String name of the good
*/
public Map<String,Integer> getResources() {
// Make sure all goods to have all goods inside the loop, to work on goods,
// that are produced, but also those which are only in store
HashMap<String, Boolean> goodChecked = new HashMap<String,Boolean>();
Set<String> remaining = new HashSet<String>();
for (Goods good: Goods.values()) {
String g = good.get();
if (hourlyGoodIncrease.get(g) == null) {
ArrayList<ProductionInterval> prods = new ArrayList<ProductionInterval>();
ProductionInterval start = new ProductionInterval(now, 0);
prods.add(start);
hourlyGoodIncrease.put(g, prods);
}
if (availableGoods.get(g) == null) {
availableGoods.put(g, 0);
}
if (good.isPrimary()) {
goodChecked.put(g, true);
} else {
goodChecked.put(g, false);
}
remaining.add(g);
}
// As long as goods are remaining to be checked loops over the goods, and
// recalculates hourly good increases for goods, that have all its sources
// already calculated
while (remaining.size() > 0) {
Set<String> removes = new HashSet<String>();
for (String good: remaining) {
if (goodChecked.get(good)) {
Good g = GoodFactory.get(good);
Set<String> to = new HashSet<String>();
Map<String,Float> from = new HashMap<String,Float>();
setUpFromAndToGoods(g, to, from, availableGoods);
if (areGoodsAlreadyCalculated(to, goodChecked)) {
//remaining.remove(good);
removes.add(good);
} else {
if (areGoodsReadyForCalculation(to, goodChecked)) {
// Get all resources we are working on now
Map<String,Float> fromDecrease = new HashMap<String,Float>();
for (String t: to) {
for (String f: GoodFactory.get(t).isMadeFrom().keySet()) {
from.put(f, (float) availableGoods.get(f));
}
}
// Get all interval borders
ArrayList<Long> intervalBorders = new ArrayList<Long>();
for (String wGood: from.keySet()) {
ArrayList<ProductionInterval> intervals = hourlyGoodIncrease.get(wGood);
for (ProductionInterval interval: intervals) {
long intervalStart = interval.getStartTime();
if (!intervalBorders.contains(intervalStart)) {
intervalBorders.add(intervalStart);
}
}
}
Collections.sort(intervalBorders);
intervalBorders.add(future);
for (String f: from.keySet()) {
hourlyGoodIncrease.put(f, createNewProductionIntervalls(intervalBorders, hourlyGoodIncrease.get(f)));
}
// For all intervals
int iLast = intervalBorders.size() - 1;
for (int i = 0; i < iLast; i++) {
long elapsedTime = intervalBorders.get(i + 1) - intervalBorders.get(i);
for (String t: to) {
Map<String, Float> source = GoodFactory.get(t).isMadeFrom();
for (String s: source.keySet()) {
Float decrease = fromDecrease.get(s);
fromDecrease.put(s, (decrease != null ? decrease : 0) + source.get(s));
}
}
// Calculate amount after normal maximum production
Set<String> negatives = new HashSet<String>();
Map<String,Float> nextFrom = new HashMap<String,Float>();
for (String f: from.keySet()) {
float delta = from.get(f) + (hourlyGoodIncrease.get(f).get(i).getHourlyIncrease() - fromDecrease.get(f)) * elapsedTime / (1000 * 60 * 60);
nextFrom.put(f, delta);
if (delta < 0) {
negatives.add(f);
}
}
// Check if got under zero
if (negatives.size() == 0) {
for (String f: from.keySet()) {
float newIncrease = hourlyGoodIncrease.get(f).get(i).getHourlyIncrease() - fromDecrease.get(f);
hourlyGoodIncrease.get(f).get(i).setHourlyIncrease(newIncrease);
from.put(f, nextFrom.get(f));
}
} else {
// TODO: handle case when more is used than exists
}
// Else calculate point where at least one from is zero and add an interval
// before its maximum, after needs to be adjusted
}
// Break to remove all calculated goods from the remaining set and rerun the loop
removes = to;
break;
}
}
}
}
for (String remove: removes) {
remaining.remove(remove);
}
}
// Final calculation of the goods amounts that are available in the factory
for (String good: goodChecked.keySet()) {
ArrayList<ProductionInterval> intervals = hourlyGoodIncrease.get(good);
intervals.add(new ProductionInterval(future, 0));
float after = availableGoods.get(good);
for (int i = 0; i < (intervals.size() - 1); i++) {
after += intervals.get(i).getHourlyIncrease() * (intervals.get(i + 1).getStartTime() - intervals.get(i).getStartTime()) / (1000 * 60 * 60);
}
availableGoods.put(good, (int) after);
}
return availableGoods;
}
private static ArrayList<ProductionInterval> createNewProductionIntervalls(ArrayList<Long> intervalBorders, ArrayList<ProductionInterval> hourlyIncreases) {
System.out.print("intervalBorders\n");
System.out.print(intervalBorders + "\n");
System.out.print("hourlyIncreases\n");
System.out.print(hourlyIncreases + "\n");
ArrayList<ProductionInterval> intervalls = new ArrayList<ProductionInterval>();
int i = 0;
long iTime = 0;
long nextTime = 0;
for (long l: intervalBorders) {
float increase = 0;
iTime = hourlyIncreases.get(i).getStartTime();
if (i + 1 < hourlyIncreases.size()) {
nextTime = hourlyIncreases.get(i + 1).getStartTime();
}
if (l == iTime) {
increase = hourlyIncreases.get(i).getHourlyIncrease();
} else if (iTime < l && l < nextTime) {
increase = hourlyIncreases.get(i).getHourlyIncrease();
} else if (l == nextTime) {
increase = hourlyIncreases.get(++i).getHourlyIncrease();
}
intervalls.add(new ProductionInterval(l, increase));
}
return intervalls;
}
private static void setUpFromAndToGoods(Good g, Set<String> to, Map<String,Float> from, Map<String,Integer> availableGoods) {
Set<String> unchecked = g.isUsedToCreate();
while (unchecked.size() > 0) {
String good = unchecked.iterator().next();
unchecked.remove(good);
to.add(good);
Set<String> madeFrom = GoodFactory.get(good).isMadeFrom().keySet();
for (String fromGood: madeFrom) {
if (!from.containsKey(fromGood)) {
from.put(fromGood, (float) availableGoods.get(fromGood));
Set<String> additions = GoodFactory.get(fromGood).isUsedToCreate();
for (String addition: additions) {
if (!to.contains(addition) && !unchecked.contains(addition)) {
unchecked.add(addition);
}
}
}
}
}
}
private static boolean areGoodsReadyForCalculation(Set<String> toGoods, Map<String,Boolean> goodChecked) {
for (String t: toGoods) {
Good toGood = GoodFactory.get(t);
for (String from: toGood.isMadeFrom().keySet()) {
if (!goodChecked.get(from)) {
return false;
}
}
}
return true;
}
private static boolean areGoodsAlreadyCalculated(Set<String> toGoods, Map<String,Boolean> goodChecked) {
for (String t: toGoods) {
if (!goodChecked.get(t)) {
return false;
}
}
return true;
}
}
public class ProductionInterval {
private long startTime;
private float hourlyIncrease;
public ProductionInterval(long startTime, float hourlyIncrease) {
this.setStartTime(startTime);
this.setHourlyIncrease(hourlyIncrease);
}
public float getHourlyIncrease() {
return hourlyIncrease;
}
public void setHourlyIncrease(float hourlyIncrease) {
this.hourlyIncrease = hourlyIncrease;
}
public long getStartTime() {
return startTime;
}
public void setStartTime(long startTime) {
this.startTime = startTime;
}
public String toString() {
return "<starttime=" + this.startTime + ", hourlyIncrease=" + this.hourlyIncrease + ">";
}
}
Does someone know an algorithm that can solve my problem, or have some ideas how I can change my algorithm so that it gets more efficient? (I know it does not work at all, but with all these loops, I don't think it will be efficient and I would like to know if someone sees something I could make better before I put the work into finishing it).
You can apply a max flow algorithm such as Edmonds-Karp with a few modifications. First you need to build the graph to feed to the algorithm:
Create a node for each good
You need one "source" node and one "sink" node
For each delivered good, create an arc from the source to respective node, with the capacity equal to delivery rate
For each final product, create an arc from its respective node to the sink, with capacity equal to production rate
For each dependency between goods, create an arc between respective nodes with capacity of one.
For each good, create an arc from source to the respective node with capacity equal to amount of the good in stock (for first iteration it's zero)
The results will be the flows from final goods nodes to the sink after the algorithm is finished. For your case, you need two modifications:
When calculating flow at a node, you take the minimum of the flows to it (since you require all dependencies to create a good), and then cap it at this good's maximum production rate for non-delivered goods
You need to account for change of goods in stock - will edit the answer later
Although this algorithm is offline, which means it is not suited to flows that change over time, it is relatively simple, and if you are not too constrained by performance requirements it may work: just run the algorithm again after adjusting the capacities. For online max flow, you can look at this,
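To illustrate just the first modification (minimum of the incoming flows, capped at the good's own maximum rate), here is a deliberately small sketch. It is not Edmonds-Karp: it assumes the goods are already listed so that every input comes before the goods made from it, it ignores stock, and it ignores that an input shared by several products has to be split between them. SteadyStateRates and its parameters are names invented for the example.

```java
import java.util.*;

final class SteadyStateRates {
    /**
     * orderedGoods: every good, inputs before the goods made from them (assumed).
     * madeFrom: produced good -> its inputs; maxRate: delivery or max production rate.
     */
    static Map<String, Double> rates(List<String> orderedGoods,
                                     Map<String, List<String>> madeFrom,
                                     Map<String, Double> maxRate) {
        Map<String, Double> rate = new LinkedHashMap<>();
        for (String good : orderedGoods) {
            double r = maxRate.get(good);                      // cap at the good's own maximum
            for (String input : madeFrom.getOrDefault(good, Collections.emptyList())) {
                r = Math.min(r, rate.get(input));              // limited by the slowest input
            }
            rate.put(good, r);
        }
        return rate;
    }
}
```

For example, with R1 delivered at 3 per hour, R2 at 1 per hour, R3 (max 2) made from R1 and R4 (max 5) made from R1 and R2, this gives 2 for R3 and 1 for R4.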
Working out my idea of fractional simulation in C++ (sorry). Please see heavily commented code below.
(I know the prioritization in the face of constrained resources isn't what you want. It's not trivial to get a fair implementation of Derivative that produces as much as it can, so I wanted to validate this approach before going down the rabbit hole.)
#include <algorithm>
#include <cassert>
#include <iostream>
#include <limits>
#include <utility>
#include <vector>
// Identifies a type of good in some Factory.
using Good = int;
// Simulates a factory. The simulation is crude, assuming continuous,
// fractional, zero-latency production. Nevertheless it may be accurate enough
// for high production volumes over long periods of time.
class Factory {
public:
// Declares a new raw material. `delivery_rate` is the amount of this good
// delivered per hour.
Good NewRawMaterial(double stock, double delivery_rate) {
assert(stock >= 0.0);
assert(delivery_rate >= 0.0);
return NewGood(stock, delivery_rate, {});
}
// Declares a new manufactured good. `max_production_rate` is the max amount
// of this good produced per hour. Producing one of this good consumes one
// `input`.
Good NewManufacturedGood(double stock, double max_production_rate,
Good input) {
assert(stock >= 0.0);
assert(max_production_rate >= 0.0);
return NewGood(stock, max_production_rate, {input});
}
// Declares a new manufactured good. `max_production_rate` is the max amount
// of this good produced per hour. Producing one of this good consumes one
// `input_a` and one `input_b`.
Good NewManufacturedGood(double stock, double max_production_rate,
Good input_a, Good input_b) {
assert(stock >= 0.0);
assert(max_production_rate >= 0.0);
return NewGood(stock, max_production_rate, {input_a, input_b});
}
// Returns the number of hours since the start of the simulation.
double Now() const { return current_time_; }
// Advances the current time to `time` hours since the start of the
// simulation.
void AdvanceTo(double time);
// Returns the amount of `good` in stock as of the current time.
double Stock(Good good) const { return stock_[good]; }
// Sets the delivery rate of `good` to `delivery_rate` as of the current time.
void SetDeliveryRate(Good good, double delivery_rate) {
assert(delivery_rate >= 0.0);
max_production_rate_[good] = delivery_rate;
}
// Sets the max production rate of `good` to `max_production_rate` as of the
// current time.
void SetMaxProductionRate(Good good, double max_production_rate) {
assert(max_production_rate >= 0.0);
max_production_rate_[good] = max_production_rate;
}
private:
// Floating-point tolerance.
static constexpr double kEpsilon = 1e-06;
// Declares a new good. We handle raw materials as goods with no inputs.
Good NewGood(double stock, double max_production_rate,
std::vector<Good> inputs) {
assert(stock >= 0.0);
assert(max_production_rate >= 0.0);
Good good = stock_.size();
stock_.push_back(stock);
max_production_rate_.push_back(max_production_rate);
inputs_.push_back(std::move(inputs));
return good;
}
// Returns the right-hand derivative of stock.
std::vector<double> Derivative() const;
// Returns the next time at which a good is newly out of stock, or positive
// infinity if there is no such time.
double NextStockOutTime(const std::vector<double> &derivative) const;
// The current time, in hours since the start of the simulation.
double current_time_ = 0.0;
// `stock_[good]` is the amount of `good` in stock at the current time.
std::vector<double> stock_;
// `max_production_rate_[good]` is the max production rate of `good` at the
// current time.
std::vector<double> max_production_rate_;
// `inputs_[good]` is the list of goods required to produce `good` (empty for
// raw materials).
std::vector<std::vector<Good>> inputs_;
// Derivative of `stock_`.
std::vector<double> stock_rate_;
};
void Factory::AdvanceTo(double time) {
assert(time >= current_time_);
bool caught_up = false;
while (!caught_up) {
auto derivative = Derivative();
double next_time = NextStockOutTime(derivative);
if (time <= next_time) {
next_time = time;
caught_up = true;
}
for (Good good = 0; good < stock_.size(); good++) {
stock_[good] += (next_time - current_time_) * derivative[good];
}
current_time_ = next_time;
}
}
std::vector<double> Factory::Derivative() const {
// TODO: this code prioritizes limited supply by the order in which production
// is declared. You probably want to use linear programming or something.
std::vector<double> derivative = max_production_rate_;
for (Good good = 0; good < stock_.size(); good++) {
for (Good input : inputs_[good]) {
if (stock_[input] <= kEpsilon) {
derivative[good] = std::min(derivative[good], derivative[input]);
}
}
for (Good input : inputs_[good]) {
derivative[input] -= derivative[good];
}
}
return derivative;
}
double Factory::NextStockOutTime(const std::vector<double> &derivative) const {
double duration = std::numeric_limits<double>::infinity();
for (Good good = 0; good < stock_.size(); good++) {
if (stock_[good] > kEpsilon && derivative[good] < -kEpsilon) {
duration = std::min(duration, stock_[good] / -derivative[good]);
}
}
return current_time_ + duration;
}
int main() {
Factory factory;
Good r1 = factory.NewRawMaterial(60.0, 3.0);
Good r2 = factory.NewRawMaterial(20.0, 1.0);
Good r3 = factory.NewManufacturedGood(0.0, 2.0, r1);
Good r4 = factory.NewManufacturedGood(0.0, 1.0, r1, r2);
auto print_stocks = [&]() {
std::cout << "t : " << factory.Now() << "\n";
std::cout << "r1: " << factory.Stock(r1) << "\n";
std::cout << "r2: " << factory.Stock(r2) << "\n";
std::cout << "r3: " << factory.Stock(r3) << "\n";
std::cout << "r4: " << factory.Stock(r4) << "\n";
std::cout << "\n";
};
print_stocks();
// Everything running smoothly
factory.AdvanceTo(24.0);
print_stocks();
// Uh oh, r1 supply cut off. Stock out at 44 hours.
factory.SetDeliveryRate(r1, 0.0);
factory.AdvanceTo(48.0);
print_stocks();
// r1 supply at 50%. r3 production prioritized.
factory.SetDeliveryRate(r1, 1.5);
factory.AdvanceTo(72.0);
print_stocks();
// r1 oversupplied.
factory.SetDeliveryRate(r1, 4.0);
factory.AdvanceTo(96.0);
print_stocks();
}
Output:
t : 0
r1: 60
r2: 20
r3: 0
r4: 0
t : 24
r1: 60
r2: 20
r3: 48
r4: 24
t : 48
r1: 0
r2: 24
r3: 88
r4: 44
t : 72
r1: 0
r2: 48
r3: 124
r4: 44
t : 96
r1: 24
r2: 48
r3: 172
r4: 68
I am writing some code that is intended to take a Wave file and write it out to an AudioTrack in stream mode. This is a minimum viable test to get AudioTrack stream mode working.
But once I write some buffer of audio to the AudioTrack, and subsequently call play(), the method getPlaybackHeadPosition() continually returns 0.
EDIT: If I ignore my available frames check, and just continually write buffers to the AudioTrack, the write method returns 0 (after the first buffer write), indicating that it simply did not write any more audio. So it seems that the AudioTrack just doesn't want to start playing.
My code is properly priming the audiotrack. The play method is not throwing any exceptions, so I am not sure what is going wrong.
When stepping through the code, everything on my end is exactly how I anticipate it, so I am thinking somehow I have the AudioTrack configured wrong.
I am running on an emulator, but I don't think that should be an issue.
The WavFile class I am using is a vetted class that I have up and running reliably in lots of Java projects, it is tested to work well.
Observe the following log write, which is a snippet from the larger chunk of code. This log write is never hitting...
if (headPosition > 0)
Log.e("headPosition is greater than zero!!");
..
public static void writeToAudioTrackStream(final WavFile wave)
{
Log.e("writeToAudioTrackStream");
Thread thread = new Thread()
{
public void run()
{
try {
final float[] data = wave.getData();
int format = -1;
if (wave.getChannel() == 1)
format = AudioFormat.CHANNEL_OUT_MONO;
else if (wave.getChannel() == 2)
format = AudioFormat.CHANNEL_OUT_STEREO;
else
throw new RuntimeException("writeToAudioTrackStatic() - unsupported number of channels value = "+wave.getChannel());
final int bufferSizeInFrames = 2048;
final int bytesPerSmp = wave.getBytesPerSmp();
final int bufferSizeInBytes = bufferSizeInFrames * bytesPerSmp * wave.getChannel();
AudioTrack audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, wave.getSmpRate(),
format,
AudioFormat.ENCODING_PCM_FLOAT,
bufferSizeInBytes,
AudioTrack.MODE_STREAM);
int index = 0;
float[] buffer = new float[bufferSizeInFrames * wave.getChannel()];
boolean started = false;
int framesWritten = 0;
while (index < data.length) {
// calculate the available space in the buffer
int headPosition = audioTrack.getPlaybackHeadPosition();
if (headPosition > 0)
Log.e("headPosition is greater than zero!!");
int framesInBuffer = framesWritten - headPosition;
int availableFrames = bufferSizeInFrames - framesInBuffer;
// once the buffer has no space, the prime is done, so start playing
if (availableFrames == 0) {
if (!started) {
audioTrack.play();
started = true;
}
continue;
}
int endOffset = Math.min(availableFrames * wave.getChannel(), data.length - index); // do not read past the end of data
for (int i = 0; i < endOffset; i++)
buffer[i] = data[index + i];
int samplesWritten = audioTrack.write(buffer , 0 , endOffset , AudioTrack.WRITE_BLOCKING);
// could return error values
if (samplesWritten < 0)
throw new RuntimeException("AudioTrack write error.");
framesWritten += samplesWritten / wave.getChannel();
index += samplesWritten; // advance by what was actually written
}
}
catch (Exception e) {
Log.e(e.toString());
}
}
};
thread.start();
}
Per the documentation,
For portability, an application should prime the data path to the maximum allowed by writing data until the write() method returns a short transfer count. This allows play() to start immediately, and reduces the chance of underrun.
With a strict reading, this might be seen to contradict the earlier statement:
...you can optionally prime the data path prior to calling play(), by writing up to bufferSizeInBytes...
(emphasis mine), but the intent is clear enough: You're supposed to get a short write first.
This is just to get play started. Once that takes place, you can, in fact, use getPlaybackHeadPosition() to determine when more space is available. I've used that technique successfully in my own code, on many different devices/API levels.
As an aside: You should be prepared for getPlaybackHeadPosition() to change only in large increments (if I remember correctly, it's getMinBufferSize()/2). This is the max resolution available from the system; onMarkerReached() cannot be used to do any better.
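The priming rule can be sketched independently of Android. In the example below, Sink is a hypothetical stand-in for the audio track (it is not the AudioTrack API); the only property assumed is that write() returns how many samples it accepted, which may be fewer than offered once the buffer is full.

```java
final class PrimeThenPlaySketch {
    /** Hypothetical stand-in for a streaming audio sink; not the Android API. */
    interface Sink {
        /** Returns the number of samples accepted, possibly fewer than count. */
        int write(float[] data, int offset, int count);

        void play();
    }

    /**
     * Writes chunks until the sink reports a short transfer count, then starts
     * playback. Returns the index of the first sample that still has to be written.
     */
    static int prime(Sink sink, float[] data, int chunk) {
        int index = 0;
        while (index < data.length) {
            int want = Math.min(chunk, data.length - index);
            int written = sink.write(data, index, want);
            if (written < 0) {
                throw new IllegalStateException("write error " + written);
            }
            index += written;
            if (written < want) {
                break; // short transfer: the data path is primed
            }
        }
        sink.play();
        return index;
    }
}
```

The point is that the decision to call play() comes from the short write itself, not from a separately computed count of free frames.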
I am writing my own audio format as part of a game console project. Part of the project requires me to write an emulator so I know exactly how to implement its functions in hardware. I am currently writing the DSP portion, but I am having trouble writing a decoding algorithm. Before I go further, I'll explain my format.
DST (Dingo Sound Track) Audio format
The audio format records only two pieces of data per sample: the amplitude and the number of frames since the last sample. I'll explain. When converting an audio file (WAV for example), the encoder compares the current sample with the previous one. If it detects that the current sample switches amplitude direction in relation to the previous sample, it records the previous sample and the number of frames since the last record. It keeps going until the end of the file. Here is a diagram to explain further:
What I need to do
I need my "DSP" to figure out the data between each sample, as accurately as possible using only the given information. I don't think it's my encoding algorithm, because when I play the file in Audacity, I can sort of make out the original song. But when I try to play it with my decoding algorithm, I get scattered clicks. I am able to play WAV files directly with a few mods to the algorithm with almost no quality drop, so I know it's definitely the algorithm and not the rest of the DSP.
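One plausible reading of "figure out the data between each sample" is a straight line from one recorded point to the next. The sketch below shows only that idea, under assumptions that may not match the real file layout: records are given as (amplitude, gap) pairs in an int array, where gap is the number of output frames from the previous point up to and including this one and is at least 1. DstDecodeSketch and decode are names made up for the example.

```java
final class DstDecodeSketch {
    /**
     * records = {amp0, gap0, amp1, gap1, ...} (assumed layout).
     * Each segment is a linear ramp from the previous amplitude to the recorded one.
     */
    static short[] decode(int[] records, int startAmplitude) {
        int total = 0;
        for (int i = 1; i < records.length; i += 2) {
            total += records[i];
        }
        short[] out = new short[total];
        int pos = 0;
        double prev = startAmplitude;
        for (int i = 0; i + 1 < records.length; i += 2) {
            int amp = records[i];
            int gap = records[i + 1];
            for (int k = 1; k <= gap; k++) {
                // Floating-point ramp; the last step lands exactly on the recorded amplitude
                out[pos++] = (short) Math.round(prev + (amp - prev) * k / gap);
            }
            prev = amp; // the next segment starts where this one ended
        }
        return out;
    }
}
```

Two details matter for avoiding clicks: the ramp starts from the previous point rather than from zero, and the slope is computed in floating point so that small amplitude differences are not truncated to zero.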
The Code
So now I got all of the basic info out of the way, here is my code (only the important parts).
Encoding algorithm:
FileInputStream s = null;
BufferedWriter bw;
try {
int bytes;
int previous = 0;
int unsigned;
int frames = 0;
int size;
int cursor = 0;
boolean dir = true;
int bytes2;
int previous2 = 0;
int unsigned2;
int frames2 = 0;
boolean dir2 = true;
s = new FileInputStream(selectedFile);
size = (int)s.getChannel().size();
File f = new File(Directory.getPath() + "\\" + (selectedFile.getName().replace(".wav", ".dts")));
System.out.println(f.getPath());
if(!f.exists()){
f.createNewFile();
}
bw = new BufferedWriter(new FileWriter(f));
try (BufferedInputStream b = new BufferedInputStream(s)) {
byte[] data = new byte[128];
b.skip(44);
System.out.println("Loading...");
while ((bytes = b.read(data)) > 0) {
// do something
for(int i=1; i<bytes; i += 4) {
unsigned = data[i] & 0xFF;
if (dir) {
if (unsigned < previous) {
bw.write(previous);
bw.write(frames);
dir = !dir;
frames = 0;
}else{
frames ++;
}
} else {
if (unsigned > previous) {
bw.write(previous);
bw.write(frames);
dir = !dir;
frames = 0;
}else{
frames ++;
}
}
previous = unsigned;
cursor ++;
unsigned2 = data[i + 2] & 0xFF;
if (dir2) {
if (unsigned2 < previous2) {
bw.write(previous2);
bw.write(frames2);
dir2 = !dir2;
frames2 = 0;
}else{
frames2 ++;
}
} else {
if (unsigned2 > previous2) {
bw.write(previous2);
bw.write(frames2);
dir2 = !dir2;
frames2 = 0;
}else{
frames2 ++;
}
}
previous2 = unsigned2;
cursor ++;
progress.setValue((int)(((float)(cursor / size)) * 100));
}
}
b.read(data);
}
bw.flush();
bw.close();
System.out.println("Done");
convert.setEnabled(true);
status.setText("finished");
} catch (Exception ex) {
status.setText("An error has occured");
ex.printStackTrace();
convert.setEnabled(true);
}
finally {
try {
s.close();
} catch (Exception ex) {
status.setText("An error has occured");
ex.printStackTrace();
convert.setEnabled(true);
}
}
The progress and status objects can be ignored for they are part of the GUI of my converter tool. This algorithm converts WAV files to my format (DST).
Decoding algorithm:
int start = bufferSize * (bufferNumber - 1);
short current;
short frames;
short count = 1;
short count2 = 1;
float jump;
for (int i = 0; i < bufferSize; i ++) {
current = RAM.read(start + i);
i++;
frames = RAM.read(start + i);
if (frames == 0) {
buffer[count - 1] = current;
count ++;
} else {
jump = current / frames;
for (int i2 = 1; i2 < frames; i2++) {
buffer[(2 * i2) - 1] = (short) (jump * i2);
count ++;
}
}
i++;
current = RAM.read(start + i);
i++;
frames = RAM.read(start + i);
if (frames == 0) {
buffer[count2] = current;
count2 ++;
} else {
jump = current / frames;
for (int i2 = 1; i2 < frames; i2++) {
buffer[2 * i2] = (short) (jump * i2);
count2 ++;
}
}
}
bufferNumber ++;
if(bufferNumber > maxBuffer){
bufferNumber = 1;
}
The RAM object is just a byte array. bufferNumber and maxBuffer refer to the amount of processing buffers the DSP core uses. buffer is the object that the resulting audio is written to. This algorithm set is designed to convert stereo tracks, which works the same way in my format but each sample will contain two sets of data, one for each track.
The Question
How do I figure out the missing audio between each sample, as accurately as possible, and how accurate will the approach be? I would love to simply use the WAV format, but my console is limited on memory (RAM). This format halves the RAM space required to process audio. I am also planning on implementing this algorithm in an ARM microcontroller, which will be the console's real DSP. The algorithm should also be fast, but accuracy is more important. If I need to clarify or explain anything further, let me know since this is my first BIG question and I am sure I forgot something. Code samples would be nice, but aren't needed that much.
EDIT:
I managed to get the DSP to output a song, but it's sped up and filled with static. The sped up part is due to a glitch in it not splitting the track into stereo (I think). And the static is due to the initial increment being too steep. Here is a picture of what I'm getting:
Here is the new code used in the DSP:
if (frames == 0) {
buffer[i - 1] = current;
//System.out.println(current);
} else {
for (int i2 = 1; i2 < frames + 1; i2++) {
jump = (float)(previous + ((float)(current - previous) / (frames - i2 + 1)));
//System.out.println((short)jump);
buffer[(2 * i2) - 1] = (short)(jump);
}
}
previous = current;
I need a way to smooth out those initial increments, and I'd prefer not to use complex arithmetic because I am limited on performance when I port this to hardware (preferably something that can run on a 100 MHz ARM controller while keeping up with a 44.1 kHz sample rate). Edit: the result wave should actually be backwards. Sorry.
Second Edit:
I got the DSP to output in stereo, but unfortunately that didn't fix anything else like I hoped it would. I also fixed some bugs with the encoder, so now it takes 8-bit unsigned audio. This has become more of a math issue, so I think I'll post a similar question on Mathematics Stack Exchange. Well, that was a waste of time. It got put on hold nearly instantly.
You have basically a record of the signal's local extrema and want to reconstruct the signal. The most straight-forward way would be to use some monotonic interpolation scheme. You can try if this fits your needs. But I guess, the result would be very inaccurate because the characteristics of the signal are ignored.
I am not an audio engineer, so my assumptions could be wrong. But maybe, you get somewhere with these thoughts.
The signal is basically a mixture of sines. Calculating a sine function for any segment between two key frames is quite easy. The period is given by twice their distance. The amplitude is given by half the amplitude difference. This will give you a sine that hits the two key samples exactly. Furthermore, it will give you a C1-continuous signal because the derivatives at the connection points are zero. For a nice signal, you probably need even more smoothness. So you could start to interpolate the two sines around a key frame with an appropriate window function. I would start with a simple triangle window but others may give better results. This procedure will preserve the extrema.
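A minimal sketch of the half-sine idea described above, without the windowing step. The class name `ExtremaInterpolator`, the method `fillSegment`, and the meaning of `frames` (the number of output samples from just after the previous extremum up to and including the next one, at least 1) are assumptions made for illustration, not part of the asker's format:

```java
/** Sketch: rebuild the samples between two stored extrema with half a cosine period. */
final class ExtremaInterpolator {

    /**
     * Writes 'frames' samples into out, starting at out[offset].
     * The curve starts just after 'from' (previous extremum, not written)
     * and ends exactly on 'to' (next extremum, written last).
     * Assumes frames >= 1; the caller handles any other case.
     */
    static void fillSegment(short[] out, int offset, int from, int to, int frames) {
        double mid = (from + to) / 2.0;   // centre line of the half wave
        double amp = (from - to) / 2.0;   // half the amplitude difference
        for (int k = 1; k <= frames; k++) {
            // Phase runs from 0 to pi across the segment, so the full period
            // is twice the distance between the two key samples.
            double phase = Math.PI * k / frames;
            out[offset + k - 1] = (short) Math.round(mid + amp * Math.cos(phase));
        }
    }
}
```

Because the slope of the cosine is zero at both ends, consecutive segments join without a kink at each extremum. On a small microcontroller the call to `Math.cos` could probably be replaced by a precomputed lookup table, but that is an untested suggestion.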
It is probably easier to tackle this problem visually (with a plot of the signal), so you can see the results.
If it's all about size, then maybe you want to look into established audio compression methods. They usually give much better compression ratio than 1:2. Also, I don't understand why this method saves RAM because you'll have to calculate all samples when decoding. Of course, this assumes that not the complete data are loaded into RAM but streamed in pieces.
I'm working on a voice recording app. In it, I have a Seekbar to change the input voice gain.
I couldn't find any way to adjust the input voice gain.
I am using the AudioRecord class to record voice.
recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
RECORDER_SAMPLERATE, RECORDER_CHANNELS,
RECORDER_AUDIO_ENCODING, bufferSize);
recorder.startRecording();
I've seen an app in the Google Play Store using this functionality.
As I understand it, you don't want any automatic adjustment, only manual adjustment from the UI. There is no built-in functionality for this in Android; instead, you have to modify your data manually.
Suppose you use read (short[] audioData, int offsetInShorts, int sizeInShorts) for reading the stream. So you should just do something like this:
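float gain = getGain(); // taken from the UI control, perhaps in range from 0.0 to 2.0
int numRead = read(audioData, 0, SIZE);
if (numRead > 0) {
    for (int i = 0; i < numRead; ++i) {
        // Scale as int, then clamp to the 16-bit range on both sides before narrowing.
        int scaled = (int) (audioData[i] * gain);
        audioData[i] = (short) Math.max(Short.MIN_VALUE, Math.min(scaled, Short.MAX_VALUE));
    }
}
Math.min and Math.max clamp the result to the range of short, which prevents overflow in either direction if gain is greater than 1.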
Dynamic microphone sensitivity is not a thing that the hardware or operating system is capable of as it requires analysis on the recorded sound. You should implement your own algorithm to analyze the recorded sound and adjust (amplify or decrease) the sound level on your own.
You can start by analyzing last few seconds and find a multiplier that is going to "balance" the average amplitude. The multiplier must be inversely proportional to the average amplitude to balance it.
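As a rough sketch of that idea, assuming 16-bit PCM samples: compute the mean absolute amplitude of a recent window and derive a multiplier that is inversely proportional to it. The class `AutoGain`, the `target` level, and the `maxGain` cap are hypothetical names and parameters chosen for illustration, not an Android API:

```java
/** Sketch of a simple level balancer for 16-bit PCM samples. */
final class AutoGain {

    /**
     * Returns the multiplier that would bring the mean absolute amplitude
     * of window[0 .. length-1] to 'target', capped at 'maxGain'.
     * Silence (or an empty window) yields 1.0 so nothing is amplified.
     */
    static double gainFor(short[] window, int length, double target, double maxGain) {
        long sum = 0;
        for (int i = 0; i < length; i++) {
            sum += Math.abs((int) window[i]);
        }
        if (length == 0 || sum == 0) {
            return 1.0;
        }
        double average = (double) sum / length;
        return Math.min(target / average, maxGain);
    }

    /** Scales samples in place and clamps them to the range of short. */
    static void apply(short[] samples, int length, double gain) {
        for (int i = 0; i < length; i++) {
            int scaled = (int) Math.round(samples[i] * gain);
            samples[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
    }
}
```

In practice you would probably smooth the multiplier between windows, since a gain that jumps from one buffer to the next can be audible.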
PS: If you still want to do it, the mic levels are accessible with root access, but I am not sure you can change the settings while recording, and I don't think it is possible. Hint: "/system/etc/snd_soc_msm" file.
Solution by OP.
I have done it using
final int USHORT_MASK = (1 << 16) - 1;
final ByteBuffer buf = ByteBuffer.wrap(data).order(
ByteOrder.LITTLE_ENDIAN);
final ByteBuffer newBuf = ByteBuffer.allocate(
data.length).order(ByteOrder.LITTLE_ENDIAN);
int sample;
while (buf.hasRemaining()) {
sample = (int) buf.getShort() & USHORT_MASK;
sample *= db_value_global;
newBuf.putShort((short) (sample & USHORT_MASK));
}
data = newBuf.array();
os.write(data);
This is a working implementation based on ByteBuffer for 16-bit audio. It's important to clamp the increased value on both sides, since short is signed. It's also important to set the ByteBuffer to the native byte order, since audioRecord.read() returns native-endian bytes.
You may also want to perform audioRecord.read() and following code in a loop, calling data.clear() after each iteration.
double gain = 2.0;
ByteBuffer data = ByteBuffer.allocateDirect(SAMPLES_PER_FRAME).order(ByteOrder.nativeOrder());
int audioInputLengthBytes = audioRecord.read(data, SAMPLES_PER_FRAME);
ShortBuffer shortBuffer = data.asShortBuffer();
for (int i = 0; i < audioInputLengthBytes / 2; i++) { // /2 because we need the length in shorts
short s = shortBuffer.get(i);
int increased = (int) (s * gain);
s = (short) Math.min(Math.max(increased, Short.MIN_VALUE), Short.MAX_VALUE);
shortBuffer.put(i, s);
}
There have been other questions and answers on this site suggesting that, to create an echo or delay effect, you need only add one audio sample with a stored audio sample from the past. As such, I have the following Java class:
public class DelayAMod extends AudioMod {
private int delay = 500;
private float decay = 0.1f;
private boolean feedback = false;
private int delaySamples;
private short[] samples;
private int rrPointer;
@Override
public void init() {
this.setDelay(this.delay);
this.samples = new short[44100];
this.rrPointer = 0;
}
public void setDecay(final float decay) {
this.decay = Math.max(0.0f, Math.min(decay, 0.99f));
}
public void setDelay(final int msDelay) {
this.delay = msDelay;
this.delaySamples = 44100 * this.delay / 1000; // multiply first: 1000/delay truncates, and is zero for delays above 1000 ms
System.out.println("Delay samples:"+this.delaySamples);
}
@Override
public short process(short sample) {
System.out.println("Got:"+sample);
if (this.feedback) {
//Delay should feed back into the loop:
sample = (this.samples[this.rrPointer] = this.apply(sample));
} else {
//No feedback - store base data, then add echo:
this.samples[this.rrPointer] = sample;
sample = this.apply(sample);
}
++this.rrPointer;
if (this.rrPointer >= this.samples.length) {
this.rrPointer = 0;
}
System.out.println("Returning:"+sample);
return sample;
}
private short apply(short sample) {
int loc = this.rrPointer - this.delaySamples;
if (loc < 0) {
loc += this.samples.length;
}
System.out.println("Found:"+this.samples[loc]+" at "+loc);
System.out.println("Adding:"+(this.samples[loc] * this.decay));
return (short)Math.max(Short.MIN_VALUE, Math.min(sample + (int)(this.samples[loc] * this.decay), (int)Short.MAX_VALUE));
}
}
It accepts one 16-bit sample at a time from an input stream, finds an earlier sample, and adds them together accordingly. However, the output is just horrible noisy static, especially when the decay is raised to a level that would actually cause any appreciable result. Reducing the decay to 0.01 barely allows the original audio to come through, but there's certainly no echo at that point.
Basic troubleshooting facts:
The audio stream sounds fine if this processing is skipped.
The audio stream sounds fine if decay is 0 (nothing to add).
The stored samples are indeed stored and accessed in the proper order and the proper locations.
The stored samples are being decayed and added to the input samples properly.
All numbers from the call of process() to return sample are precisely what I would expect from this algorithm, and remain so even outside this class.
The problem seems to arise from simply adding signed shorts together, and the resulting waveform is an absolute catastrophe. I've seen this specific method implemented in a variety of places - C#, C++, even on microcontrollers - so why is it failing so hard here?
EDIT: It seems I've been going about this entirely wrong. I don't know if it's FFmpeg/avconv, or some other factor, but I am not working with a normal PCM signal here. Through graphing of the waveform, as well as a failed attempt at a tone generator and the resulting analysis, I have determined that this is some version of differential pulse-code modulation; pitch is determined by change from one sample to the next, and halving the intended "volume" multiplier on a pure sine wave actually lowers the pitch and leaves volume the same. (Messing with the volume multiplier on a non-sine sequence creates the same static as this echo algorithm.) As this and other DSP algorithms are intended to work on linear pulse-code modulation, I'm going to need some way to get the proper audio stream first.
It should definitely work unless you have significant clipping.
For example, this is a text file with two columns. The leftmost column is the 16-bit input. The second column is the sum of the first and a version delayed by 4001 samples. The sample rate is 22 kHz.
Each sample in the second column is the result of summing x[k] and x[k-4001] (e.g. y[5000] = x[5000] + x[999] = -13840 + 9181 = -4659). You can clearly hear the echo signal when playing the samples in the second column.
Try this signal with your code and see if you get identical results.
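For comparison, here is a minimal sketch of the summation used for the second column, assuming plain 16-bit linear PCM held in an array. The class `EchoSum` and its `decay` parameter are illustrative names; with `decay` set to 1.0 it computes y[k] = x[k] + x[k - delay], treating samples before the start of the signal as 0:

```java
/** Sketch: single-tap echo on linear 16-bit PCM, without feedback. */
final class EchoSum {

    /** Returns y where y[k] = x[k] + decay * x[k - delay], clamped to the range of short. */
    static short[] addEcho(short[] x, int delay, float decay) {
        short[] y = new short[x.length];
        for (int k = 0; k < x.length; k++) {
            // Nothing has been recorded yet for the first 'delay' samples.
            int delayed = (k >= delay) ? x[k - delay] : 0;
            int sum = x[k] + Math.round(delayed * decay);
            y[k] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return y;
    }
}
```

If this produces a clean echo on a known PCM signal but your stream still turns to static, the input is most likely not linear PCM.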