AVFoundation to Play and Reverse Video

In this tutorial I’ll show you how to achieve a cool reversed-video effect with AVFoundation.

Preview the final result on YouTube

This is the repo of the final project

Getting Started with AVFoundation

First, let’s set up the AVCaptureSession for the project and prepare the app to show the camera.

To display the video output on a UIView, let’s prepare an instance of AVCaptureSession:

func prepareSession() {
  /// 1
  captureSession = AVCaptureSession()

  /// 2
  if let input = try? AVCaptureDeviceInput(device: connectedDevice()),
     captureSession.canAddInput(input) {
    captureSession.addInput(input)
  }

  /// 3
  captureSession.sessionPreset = AVCaptureSessionPresetHigh
  captureSession.startRunning()

  /// 4
  dataOutput = AVCaptureVideoDataOutput()
  dataOutput.videoSettings = [
      String(kCVPixelBufferPixelFormatTypeKey) : Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
  ]

  if captureSession.canAddOutput(dataOutput) {
     captureSession.addOutput(dataOutput)
  }

  /// 5
  captureSession.commitConfiguration()

  /// 6
  let queue: DispatchQueue = DispatchQueue(label: "VideoOutputQueue")
  dataOutput.setSampleBufferDelegate(self, queue: queue)
  dataOutput.connection(withMediaType: AVMediaTypeVideo).videoOrientation = .portrait

  output.presentCaptureSession()
}

fileprivate func connectedDevice() -> AVCaptureDevice! {
  /// 7
  if #available(iOS 10.0, *) {
    return AVCaptureDevice.defaultDevice(withDeviceType: AVCaptureDeviceType.builtInWideAngleCamera, mediaType: AVMediaTypeVideo, position: .back)
  }
  else {
    return AVCaptureDevice.devices()
      .map { $0 as! AVCaptureDevice }
      .filter { $0.hasMediaType(AVMediaTypeVideo) && $0.position == AVCaptureDevicePosition.back }.first! as AVCaptureDevice
  }
}

To briefly go through this code:

  1. AVCaptureSession is the object that holds and manages all the manipulation of the input devices (camera and microphone) and brings the picture to the UIKit layer, so the user can see the output.
  2. To see actual camera output on the screen, we need to add an input device. For the sake of this tutorial we’ll add only the back camera. Note that you can swap the current input later and call commitConfiguration() on captureSession to apply the change (see the sketch after this list).
  3. Setting the high preset means that the captureSession will use the highest video resolution available on the current device. After that we can start the session, so it begins receiving data from the back camera.
  4. Next we need to define an AVCaptureOutput for the video data. By initialising an instance of AVCaptureVideoDataOutput we give our captureSession a way to hand us the raw frames as pixel buffers while the session is running. kCVPixelFormatType_420YpCbCr8BiPlanarFullRange is the best-suited pixel format for iPhone devices: it stores the luma, chroma blue, and chroma red components with 8 bits each.
    • NOTE: a capture session can include multiple outputs and inputs and process the merged data between them. This is out of scope for this tutorial, but if that’s something you’re interested in, let me know in the comments and I’ll write another tutorial on that topic.
  5. Commit the applied changes to the captureSession.
  6. In order to receive the set of captured raw frames as pixel buffers, we need to set the sampleBufferDelegate with a background queue, and remember that everything captured in the delegate’s methods runs on a background thread. The other thing is orientation: I set it to portrait only for this tutorial, but you can hook orientation handling up to the device’s actual rotation.
  7. With the announcement of the iPhone 7 Plus and its dual camera, Apple changed the AVFoundation API for obtaining a camera as an AVCaptureDevice. New methods were added for accessing the dual camera and switching between the wide-angle and telephoto cameras. So in order to support iOS 9 we have to branch with an #available check and use both the new and the old API to access the wide-angle camera (the default camera available on all iPhones).
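
As mentioned in step 2, you can replace the current input while the session is configured. Below is a minimal sketch of switching to the front camera; frontCamera() is a hypothetical helper that returns the front-facing AVCaptureDevice and is not part of the project:

func switchToFrontCamera() {
  captureSession.beginConfiguration()

  // Remove the existing camera input(s)
  captureSession.inputs
    .flatMap { $0 as? AVCaptureDeviceInput }
    .forEach { captureSession.removeInput($0) }

  // frontCamera() is a hypothetical helper returning the front AVCaptureDevice
  if let device = frontCamera(),
     let input = try? AVCaptureDeviceInput(device: device),
     captureSession.canAddInput(input) {
    captureSession.addInput(input)
  }

  // Apply the changes atomically
  captureSession.commitConfiguration()
}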

Preview Layer

To show the actual camera output in the UI, you need to create an AVCaptureVideoPreviewLayer instance and add it as a sublayer of a UIView’s layer. The code below shows how to do it.

func displayCaptureSession() {
  preview = AVCaptureVideoPreviewLayer(session: output.captureSession)
  preview.videoGravity = AVLayerVideoGravityResizeAspectFill
  preview.connection.videoOrientation = .portrait
  sessionLayer.layer.addSublayer(preview)
}

ResizeAspectFill works like UIImageView’s AspectFill content mode: it scales the image so that it fills the entire visible area while preserving the original aspect ratio of the video input. That said, if your view’s aspect ratio differs from the camera’s, part of the camera picture will be cropped off.
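
Since the preview layer doesn’t size itself, it’s worth keeping its frame in sync with the host view. A minimal sketch, assuming the preview layer lives in a view controller that owns the same preview and sessionLayer properties as the snippet above:

override func viewDidLayoutSubviews() {
  super.viewDidLayoutSubviews()

  // Keep the preview layer filling its host view after rotation or layout changes
  preview?.frame = sessionLayer.bounds
}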

Acquire Data

In order to access the Core Media sample buffers, we need to implement the captureOutput(_:didOutputSampleBuffer:from:) method of the video data output’s sample buffer delegate.

func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
  /// 1
  guard let cvBuf = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

  /// 2
  if captureOutput is AVCaptureVideoDataOutput, recordShallStart {
    /// 3
    let copiedCvBuf = cvBuf.deepcopy()
    buffers.append(copiedCvBuf)
    progress?(Float(buffers.count) / Float(totalFrames))

    /// 4
    if buffers.count >= totalFrames {
      recordShallStart = false
      merge()
    }
  }
}
  1. In this delegate method we have access to Core Media’s sample buffer, which carries information about a particular media type: audio device input, video device input, or muxed data from both. To get the video frame out of that buffer we grab the Core Video buffer with CMSampleBufferGetImageBuffer. This function returns a CVImageBuffer of media data, which is actually an alias of CVBuffer. It is very important to note that the returned buffer is not owned by the caller, which means it needs to be deep-copied into another instance. A deep copy means we allocate a new segment of memory for the copied buffer and drop our reference to the original. Otherwise the sample buffer from the delegate method would never be freed, which can easily cause two problems:
    • First, your app would crash from OOM very quickly.
    • Second, the delegate method would stop delivering new sample buffers from the video output in order to save memory, waiting until the retained buffer instances are freed. Either way, we wouldn’t have enough data to accomplish our goal.
  2. We check that the output is the video data output and not a still image, photo, audio, or anything else. recordShallStart is just a flag that is toggled to true when we indicate that processing should start.
  3. We deep-copy the CVImageBuffer and save it in an array. Later I’ll go through the deepcopy() extension function.
  4. When the number of saved frames is enough to build the video, we toggle recordShallStart back to false to stop saving buffers and call the method that builds the video.

Here is the deepcopy() extension on CVPixelBuffer:

extension CVPixelBuffer {
  func deepcopy() -> CVPixelBuffer {
    /// 1
    precondition(CFGetTypeID(self) == CVPixelBufferGetTypeID(), "copy() cannot be called on a non-CVPixelBuffer")

    /// 2
    let attr = CVBufferGetAttachments(self, .shouldPropagate)
    var _copy : CVPixelBuffer? = nil

    /// 3
    CVPixelBufferCreate(
      CFAllocatorGetDefault().takeRetainedValue(),
      CVPixelBufferGetWidth(self),
      CVPixelBufferGetHeight(self),
      CVPixelBufferGetPixelFormatType(self),
      attr,
      &_copy
    )

    guard let copy = _copy else { fatalError() }

    /// 4
    CVPixelBufferLockBaseAddress(self, .readOnly)
    CVPixelBufferLockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))

    /// 5
    let planeCount = CVPixelBufferGetPlaneCount(self)

    for plane in 0..<planeCount {
      let dest = CVPixelBufferGetBaseAddressOfPlane(copy, plane)
      let source = CVPixelBufferGetBaseAddressOfPlane(self, plane)
      let height = CVPixelBufferGetHeightOfPlane(self, plane)
      let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(self, plane)

      memcpy(dest, source, height * bytesPerRow)
    }

    /// 6
    CVPixelBufferUnlockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
    CVPixelBufferUnlockBaseAddress(self, .readOnly)

    return copy
  }
}
  1. Validate that the CVBuffer we’re trying to copy is actually an instance of CVPixelBuffer.
  2. Get a CFDictionary of all attachments of the CVBuffer.
  3. Create a new CVPixelBuffer with the same width, height, pixel format type, and attachments as the buffer being copied, and allocate the new instance into _copy.
  4. Lock the base addresses to avoid undefined behaviour. I lock self with .readOnly, since I’m not modifying it during the lock. By contrast, I lock copy with a zero flag, since I’m going to write to its memory.
  5. For each plane of self, copy all of its bytes into the corresponding plane of copy.
  6. Unlock the base addresses so self can be released once its strong reference count drops to zero. Unlocking must use exactly the same flags as locking, otherwise it may lead to undefined behaviour.
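
For completeness, here is a minimal sketch of the state the capture delegate above relies on. The property names mirror the earlier snippets, but the class name, frame rate, and clip length are assumptions, not the project’s actual configuration:

import CoreVideo

final class ReverseRecorder {
  // Hypothetical container for the state used in captureOutput(_:didOutputSampleBuffer:from:)
  var buffers: [CVPixelBuffer] = []   // deep-copied frames collected so far
  var recordShallStart = false        // toggled to true when recording should begin
  var progress: ((Float) -> Void)?    // reports capture progress to the UI

  let fps: Int32 = 30                 // assumed frame rate
  let captureSeconds = 3              // assumed clip length
  var totalFrames: Int { return Int(fps) * captureSeconds }

  func startRecording() {
    buffers.removeAll()
    recordShallStart = true           // the delegate starts accumulating frames
  }
}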

Movie Writer

Now, to glue together all the data we’ve grabbed so far, we need the magic of AVAssetWriter. It’s a powerful AVFoundation class that can write a video to an output location, encode it into a file format such as .mov or .mp4, and manage the metadata of the frames while recording.

let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0] as NSString
let videoOutputURL = URL(fileURLWithPath: documentsPath.appendingPathComponent("MergedVideo.mov"))

do {
  try FileManager.default.removeItem(at: videoOutputURL)
} catch {}

do {
  try videoWriter = AVAssetWriter(outputURL: videoOutputURL, fileType: AVFileTypeQuickTimeMovie)
} catch let writerError as NSError {
  error = writerError
  videoWriter = nil
}

Just to be sure the output video doesn’t already exist, I try to remove it before initialising the writer. The AVAssetWriter constructor takes two arguments: the output URL, which is the path where the resulting video will be stored, and the fileType, which can be the QuickTime format, the MPEG-4 format, or one of several other options.

let firstPixelBuffer = buffer.first!
let width = CVPixelBufferGetWidth(firstPixelBuffer)
let height = CVPixelBufferGetHeight(firstPixelBuffer)

let videoSettings: [String : Any] = [
  AVVideoCodecKey  : AVVideoCodecH264,
  AVVideoWidthKey  : width,
  AVVideoHeightKey : height
]

let videoWriterInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo, outputSettings: videoSettings)

let attr = CVBufferGetAttachments(firstPixelBuffer, .shouldPropagate) as! [String : Any]

let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(
  assetWriterInput: videoWriterInput,
  sourcePixelBufferAttributes: attr
)

assert(videoWriter.canAdd(videoWriterInput))
videoWriterInput.expectsMediaDataInRealTime = true
videoWriter.add(videoWriterInput)

Now, in order to tell our writer exactly how to handle the input data, we take the size from the first sample buffer and build a settings dictionary with the width and height keys, plus the codec the writer should use while encoding the video for us.

AVAssetWriterInput is a helper class that does the heavy lifting of appending video samples into a single video segment.

AVAssetWriterInputPixelBufferAdaptor is a class that stores CVPixelBuffer instances in a pixel buffer pool and provides samples to the AVAssetWriterInput whenever the latter needs one to append to the segment being written.

if videoWriter.startWriting() {
  videoWriter.startSession(atSourceTime: kCMTimeZero)
  assert(pixelBufferAdaptor.pixelBufferPool != nil)

  let media_queue = DispatchQueue(label: "mediaInputQueue")

  videoWriterInput.requestMediaDataWhenReady(on: media_queue) { [weak self] in
    guard let welf = self else { return }
    let frameDuration = CMTimeMake(1, welf.fps)
    let currentProgress = Progress(totalUnitCount: Int64(welf.buffer.count))

    var frameCount: Int64 = 0
    var remainingBuffers = welf.buffer

    while !remainingBuffers.isEmpty {
      let nextBuffer = remainingBuffers.remove(at: 0)
      let lastFrameTime = CMTimeMake(frameCount, welf.fps)
      let presentationTime = frameCount == 0 ? lastFrameTime : CMTimeAdd(lastFrameTime, frameDuration)

      while !videoWriterInput.isReadyForMoreMediaData {
        Thread.sleep(forTimeInterval: 0.1)
      }

      pixelBufferAdaptor.append(nextBuffer, withPresentationTime: presentationTime)

      frameCount += 1

      currentProgress.completedUnitCount = frameCount
      progress(currentProgress)
    }

    videoWriterInput.markAsFinished()
    videoWriter.finishWriting {
      if error == nil {
        success(videoOutputURL)
      }

      welf.videoWriter = nil
    }
  }
}

After all the preparation is done, we can start writing the video. First, try to start the writer by calling videoWriter.startWriting(). Then initialise a new session for the writer and set the source time to kCMTimeZero so the video has frames from the very beginning.

Then, by calling videoWriterInput.requestMediaDataWhenReady on a background queue, you mark the writer input as available for appending new media data until it is marked as finished. That block is called repeatedly until we stop it manually, once all the frames have been appended.

So in that block we append all the samples we gathered in the array, then mark the input as finished so we can call finishWriting() on our videoWriter.

NOTE: there is a workaround of sleeping the thread for 0.1 seconds, because sometimes videoWriterInput can’t process the data as fast as it arrives. This workaround makes sure no frame is dropped during writing, even though it can significantly increase the waiting time.
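
If you’d rather not block the queue, an alternative approach (not the one used in this project) is to rely on requestMediaDataWhenReady’s contract: append only while the input is ready and simply return from the block, letting AVFoundation call it again when the input can take more data. A rough sketch, assuming frameCount is an Int64 stored property so it survives between block invocations:

videoWriterInput.requestMediaDataWhenReady(on: media_queue) { [weak self] in
  guard let welf = self else { return }

  // Append as long as the input accepts data; when it stops, just return.
  // AVFoundation invokes this block again once the input is ready for more.
  while videoWriterInput.isReadyForMoreMediaData {
    guard welf.frameCount < Int64(welf.buffer.count) else {
      // All frames appended: close the input and finish the file
      videoWriterInput.markAsFinished()
      videoWriter.finishWriting { welf.videoWriter = nil }
      return
    }

    let nextBuffer = welf.buffer[Int(welf.frameCount)]
    let presentationTime = CMTimeMake(welf.frameCount, welf.fps)
    pixelBufferAdaptor.append(nextBuffer, withPresentationTime: presentationTime)
    welf.frameCount += 1
  }
}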

Reversing algorithm

Everything we’ve discussed above takes the frames from AVCaptureVideoDataOutput’s delegate and merges them into a video segment. You may ask where the reversing comes in. Well, that’s the easiest part of this tutorial: to get the reversed video, you append to the buffers array copies of its own elements in reversed order. The easiest way to achieve this in Swift is two lines of elegant code:

var requestBuffer = buffers
// Append reversed deep copies, skipping the last frame so the pivot frame isn’t duplicated
buffers[0 ... buffers.count - 2].reversed().forEach { requestBuffer.append($0.deepcopy()) }
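
Finally, to play the merged file (the “play” half of the title), here is a minimal sketch using AVPlayerViewController. It assumes you call it from a view controller once the writer’s success callback hands you the output URL; the function name is mine, not the project’s:

import UIKit
import AVKit
import AVFoundation

func play(videoAt url: URL, from presenter: UIViewController) {
  let player = AVPlayer(url: url)
  let controller = AVPlayerViewController()
  controller.player = player

  // Present the standard player UI and start playback once it’s on screen
  presenter.present(controller, animated: true) {
    player.play()
  }
}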

Check out the full code here

Happy Coding!