January 21, 2020

A ring buffer for audio samples

At Fliva we work with audio (we have a C++ library that essentially works as an audio sequencer), video (we have a rendering engine which basically works as a scriptable After Effects clone) and we do transcoding, muxing and all that jazz...

When we began this journey we generated video and audio in separate processes and used ffmpeg to mux the files into a video file that could be used by our customer - or uploaded to our player platform.

Now though we do it all in one process, which requires that the various parts of the engine do not slow each other down. I.e. we run the whole thing in multiple threads: at least one for generating video, at least one for generating audio and at least one for muxing the resulting video file.

We use ring buffers for this, which absorb some of the fluctuation in generation speeds at the cost of extra memory.

Our audio data flows through such a ring buffer, which has a rather simple interface.

#include <atomic>
#include <cstdint>

class AudioRingBuffer {
  public:
    // size: samples per channel in one slot, count: number of slots
    AudioRingBuffer(int size = 128, int count = 375 , int channels = 2);
    ~AudioRingBuffer();

    bool empty() const { return writes == reads; }
    bool full() const { return size() == bufferCount; }
    // Number of filled slots; both counters only ever grow
    uint64_t size() const { return (writes - reads); }

    void write(const float* const* data, int numSamples);

    uint64_t availableSamples() const { return size() * bufferSize; }

    uint64_t read(float *outData);

    uint64_t nextWritePosition();
    uint64_t nextReadPosition();

  private:
    // Atomic, because the writer and the reader run on different threads
    std::atomic<uint64_t> writes;
    std::atomic<uint64_t> reads;
    uint16_t bufferSize;
    uint16_t bufferCount;
    int channelCount;
    float* buffer;
};

The sizes should make some sense. Our internal buffer size is set to 128 samples (per channel). The reason is that we run events in the sequencer after each write to the buffer, and doing this every 128 samples means there is a maximum delay of approximately 3ms between events (we run at 44100 samples per second internally, and 128/44100 is about 2.9ms). In other words, an audio event (play file, change effect param, change pan, change gain, stop file and so forth) will never be more than 3ms late.
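The delay figure can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names below are made up for illustration and are not part of our library.

```cpp
// Values taken from the text: 128 samples per block, 44100 samples per second.
constexpr int kBlockSize = 128;
constexpr int kSampleRate = 44100;

// Duration of one block in milliseconds (hypothetical helper).
// Events run once per block, so this bounds how late an event can be.
constexpr double blockDurationMs(int blockSize, int sampleRate) {
  return 1000.0 * blockSize / sampleRate;
}

// 128 / 44100 s is roughly 2.9 ms, which is where "never more than 3ms late" comes from.
static_assert(blockDurationMs(kBlockSize, kSampleRate) < 3.0,
              "one block should last less than 3 ms");
```

A smaller block size would tighten the bound further, at the price of running the event logic more often.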

The number of buffer slots is set to 375, which means we can hold at most about a second of audio in the buffer. (I know what you are thinking: 375 times 128 is 48000, not 44100 - and thus it is not exactly a second of audio.... Yes, you are correct, dear pedantic reader - but 44100/128 is not an integer, so I decided to use 48000 as the basis for this calculation instead.)

Finally, almost all our videos have 2 channel audio (stereo) - so we default to that. However, both our audio engine library and our muxer could potentially emit any number of channels...

The implementation is equally simple.
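For the pedantic reader, the capacity arithmetic can be spelled out like this. It is a sketch with hypothetical helper names, and the byte count assumes samples are stored as `float`, as in the class above.

```cpp
#include <cstddef>

// Total samples per channel the buffer can hold (hypothetical helper).
constexpr std::size_t capacitySamples(std::size_t blockSize, std::size_t blockCount) {
  return blockSize * blockCount;
}

// How long that many samples last at a given sample rate, in seconds.
constexpr double capacitySeconds(std::size_t samples, double sampleRate) {
  return static_cast<double>(samples) / sampleRate;
}

// Memory needed to store that many samples for every channel as floats.
constexpr std::size_t capacityBytes(std::size_t samples, std::size_t channels) {
  return samples * channels * sizeof(float);
}

// 375 slots of 128 samples are exactly one second at 48000 samples per second.
static_assert(capacitySamples(128, 375) == 48000, "375 * 128 == 48000");
```

At our internal rate of 44100 samples per second the same 48000 samples last a little over a second (roughly 1.09s), and with two channels of 4-byte floats the storage comes to about 384 kB.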

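#include "audio_ring_buffer.hpp"

#include <chrono>
#include <cstring>
#include <thread>

AudioRingBuffer::AudioRingBuffer(int size, int count, int channels) : writes(0), reads(0), bufferSize(size), bufferCount(count), channelCount(channels), buffer(new float[bufferSize*bufferCount*channelCount])  {
}
// Release the sample storage allocated in the constructor
AudioRingBuffer::~AudioRingBuffer() { delete[] buffer; }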

uint64_t AudioRingBuffer::nextWritePosition() {
  auto position = writes % bufferCount;
  return position * bufferSize * channelCount;
}

void AudioRingBuffer::write(const float* const* data, int numSamples) {
  // Block the writer until the reader has freed at least one slot
  while(full()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  // Every slot holds exactly bufferSize samples per channel, so numSamples is not used
  (void) numSamples;
  for(int i = 0;i < channelCount;++i) {
    std::memcpy((buffer + (nextWritePosition() + (i*bufferSize))), data[i], bufferSize * sizeof(float));
  }
  ++writes;
}

uint64_t AudioRingBuffer::nextReadPosition() {
  auto position = reads % bufferCount;
  return position * bufferSize * channelCount;
}

uint64_t AudioRingBuffer::read(float *outData) {
  if(!empty()) {
    std::memcpy(outData, (buffer + nextReadPosition()), bufferSize * channelCount * sizeof(float));
    ++reads;
    return bufferSize;
  }
  return 0;
}

The methods nextReadPosition and nextWritePosition could have been private, but we use them in tests.

The write method takes an array of channels of samples. If the ring buffer is full, it blocks the thread (by looping over a sleep call until the buffer is no longer full). Then it copies the audio data into the ring buffer as a single contiguous area of memory (each channel comes directly after the previous one).

The read method does almost the same thing, just in reverse. It never blocks: it either returns one block of samples or nothing. The return value is the number of samples per channel, so the caller's buffer must have room for that many samples times the channel count. The samples are written contiguously to the incoming pointer, channel after channel - it is the caller's responsibility to map this to channels again, if needed.
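To make the memory layout concrete, here is a small sketch of where a given sample ends up. The function `sampleOffset` is hypothetical and does not exist in our class; it only mirrors the arithmetic done by nextWritePosition and the memcpy in write.

```cpp
#include <cstdint>

// Offset (in floats) of one sample inside the backing array, given the
// number of writes done so far. Hypothetical helper for illustration only.
constexpr std::uint64_t sampleOffset(std::uint64_t writes,
                                     std::uint64_t bufferCount,
                                     std::uint64_t bufferSize,
                                     std::uint64_t channelCount,
                                     std::uint64_t channel,
                                     std::uint64_t sample) {
  // slot start + start of the channel inside the slot + sample index
  return (writes % bufferCount) * bufferSize * channelCount
         + channel * bufferSize
         + sample;
}

// With the defaults, the second write puts channel 1 at 256 + 128 = 384.
static_assert(sampleOffset(1, 375, 128, 2, 1, 0) == 384, "slot 1, channel 1");
```

Because the slot index is `writes % bufferCount`, the offset wraps back to 0 once every slot has been used.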
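As an example of mapping a block back to channels, the sketch below turns the channel-after-channel layout into interleaved frames (left, right, left, right, ...). Whether a consumer wants interleaved samples is an assumption made for this example, and `interleave` is a hypothetical helper, not part of our library.

```cpp
#include <cstddef>
#include <vector>

// Converts one block laid out as [all of channel 0][all of channel 1]...
// into interleaved frames: sample 0 of every channel, then sample 1, etc.
std::vector<float> interleave(const float* planar,
                              std::size_t samplesPerChannel,
                              std::size_t channels) {
  std::vector<float> out(samplesPerChannel * channels);
  for (std::size_t channel = 0; channel < channels; ++channel) {
    for (std::size_t sample = 0; sample < samplesPerChannel; ++sample) {
      out[sample * channels + channel] = planar[channel * samplesPerChannel + sample];
    }
  }
  return out;
}
```

With the default sizes, a caller would pass the 256 floats filled by read together with 128 samples per channel and 2 channels.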

This design makes this buffer easy to develop, easy to maintain and relatively easy to test as well...

The test file has a crazy amount of boilerplate, but it tests the most common issues that would crop up in such a ring buffer.

#define CATCH_CONFIG_MAIN  // This tells Catch to provide a main() - only do
// this in one cpp file
#include "audio_ring_buffer.hpp"
#include "catch.hpp"

TEST_CASE("Defaults to not be full", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.full() == false);
}

TEST_CASE("Defaults to be empty", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.empty() == true);
}

TEST_CASE("Default read position is the start pointer", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.nextReadPosition() == 0);
}

TEST_CASE("Default write position is the start pointer", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.nextWritePosition() == 0);
}


TEST_CASE("Writing changes size", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.full() == false);

  auto data = new float*[2];
  data[0] = new float[128];
  data[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data[channel][sample] = 0;
    }
  }

  subject.write(data, 128);

  REQUIRE(subject.size() == 1);
}


TEST_CASE("Writing changes write position by 256 samples", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.full() == false);

  auto data = new float*[2];
  data[0] = new float[128];
  data[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data[channel][sample] = 0;
    }
  }

  subject.write(data, 128);

  REQUIRE(subject.nextWritePosition() == 256);
}


TEST_CASE("Reading changes size", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.full() == false);

  auto data = new float*[2];
  data[0] = new float[128];
  data[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data[channel][sample] = 0;
    }
  }

  subject.write(data, 128);

  auto outData = new float[256];

  auto samples = subject.read(outData);

  REQUIRE(samples == 128);
  REQUIRE(subject.size() == 0);
}


TEST_CASE("Reading changes read position by 256 samples", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.full() == false);

  auto data = new float*[2];
  data[0] = new float[128];
  data[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data[channel][sample] = 0;
    }
  }

  subject.write(data, 128);

  auto outData = new float[256];

  auto samples = subject.read(outData);

  REQUIRE(samples == 128);
  REQUIRE(subject.nextReadPosition() == 256);
}


TEST_CASE("Data is read in order", "[audio ring buffer]") {
  AudioRingBuffer subject;
  REQUIRE(subject.full() == false);

  auto data1 = new float*[2];
  data1[0] = new float[128];
  data1[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data1[channel][sample] = 0;
    }
  }

  auto data2 = new float*[2];
  data2[0] = new float[128];
  data2[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data2[channel][sample] = channel * 128 + sample + 1;
    }
  }

  auto data3 = new float*[2];
  data3[0] = new float[128];
  data3[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data3[channel][sample] = 257;
    }
  }
  REQUIRE(subject.nextWritePosition() == 0);
  subject.write(data1, 128);
  REQUIRE(subject.nextWritePosition() == 256);
  subject.write(data2, 128);
  REQUIRE(subject.nextWritePosition() == 512);
  subject.write(data3, 128);
  REQUIRE(subject.nextWritePosition() == 768);

  auto outData = new float[256];

  REQUIRE(subject.nextReadPosition() == 0);
  auto samples = subject.read(outData);
  REQUIRE((int) outData[0] == 0); // LOWER BOUND FIRST READ
  REQUIRE((int) outData[255] == 0); // UPPER BOUND FIRST READ

  REQUIRE(subject.nextReadPosition() == 256);
  samples = subject.read(outData);
  REQUIRE((int) outData[0] == 1); // LOWER BOUND SECOND READ
  REQUIRE((int) outData[255] == 256); // UPPER BOUND SECOND READ

  REQUIRE(subject.nextReadPosition() == 512);
  samples = subject.read(outData);
  REQUIRE((int) outData[0] == 257); // LOWER BOUND THIRD READ
  REQUIRE((int) outData[255] == 257); // UPPER BOUND THIRD READ

  REQUIRE(subject.nextReadPosition() == 768);
}

TEST_CASE("Continuous writing and reading does not exhaust the buffer (i.e. does not block)", "[audio ring buffer]") {
  AudioRingBuffer subject(128, 3);
  REQUIRE(subject.full() == false);

  auto data1 = new float*[2];
  data1[0] = new float[128];
  data1[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data1[channel][sample] = 0;
    }
  }

  auto outData = new float[256];
  for(int i=0;i<100;++i) {
    subject.write(data1, 128);
    subject.read(outData);
  }
}


TEST_CASE("Data pointers get reset when wrapping", "[audio ring buffer]") {
  AudioRingBuffer subject(128, 3);
  REQUIRE(subject.full() == false);

  auto data1 = new float*[2];
  data1[0] = new float[128];
  data1[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data1[channel][sample] = 0;
    }
  }

  auto data2 = new float*[2];
  data2[0] = new float[128];
  data2[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data2[channel][sample] = channel * 128 + sample + 1;
    }
  }

  auto data3 = new float*[2];
  data3[0] = new float[128];
  data3[1] = new float[128];
  for(int channel = 0;channel < 2;++channel) {
    for(int sample = 0;sample < 128;++sample) {
      data3[channel][sample] = 257;
    }
  }
  REQUIRE(subject.nextWritePosition() == 0);
  subject.write(data1, 128);
  REQUIRE(subject.nextWritePosition() == 256);
  subject.write(data2, 128);
  REQUIRE(subject.nextWritePosition() == 512);
  subject.write(data3, 128);
  REQUIRE(subject.nextWritePosition() == 0); // Reset back to zero here!!!

  auto outData = new float[256];

  REQUIRE(subject.nextReadPosition() == 0);
  auto samples = subject.read(outData);
  REQUIRE((int) outData[0] == 0); // LOWER BOUND FIRST READ
  REQUIRE((int) outData[255] == 0); // UPPER BOUND FIRST READ

  REQUIRE(subject.nextReadPosition() == 256);
  samples = subject.read(outData);
  REQUIRE((int) outData[0] == 1); // LOWER BOUND SECOND READ
  REQUIRE((int) outData[255] == 256); // UPPER BOUND SECOND READ

  REQUIRE(subject.nextReadPosition() == 512);
  samples = subject.read(outData);
  REQUIRE((int) outData[0] == 257); // LOWER BOUND THIRD READ
  REQUIRE((int) outData[255] == 257); // UPPER BOUND THIRD READ

  REQUIRE(subject.nextReadPosition() == 0); // Reset back to zero here
}
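
Most of the boilerplate in these tests is the repeated allocation of channel data, which is also never freed. One possible way to trim it is a small helper like the sketch below. `TestBlock` is a hypothetical name and not something from our test suite; it only assumes that write accepts a `const float* const*`, as declared in the interface above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical test helper: owns the samples for every channel and exposes
// them in the pointer-to-pointer shape that write() expects. The vectors
// free their memory automatically when the test case ends.
struct TestBlock {
  std::vector<std::vector<float>> channels;  // one vector of samples per channel
  std::vector<const float*> pointers;        // pointers[i] == channels[i].data()

  TestBlock(std::size_t channelCount, std::size_t samples, float value)
      : channels(channelCount, std::vector<float>(samples, value)) {
    for (const auto& channel : channels) {
      pointers.push_back(channel.data());
    }
  }

  // Copying would leave `pointers` aimed at the original's storage.
  TestBlock(const TestBlock&) = delete;
  TestBlock& operator=(const TestBlock&) = delete;

  const float* const* data() const { return pointers.data(); }
};
```

A test could then create `TestBlock block(2, 128, 0.0f);` and call `subject.write(block.data(), 128);` instead of the nested loops.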