Context

This seminal paper was written in 2010 - just when phone cameras were starting to get decent and phones were beginning to ship with non-trivial amounts of RAM. There was a palpable excitement in the air, and people were coming up with techniques for using computation to get a high-quality photograph out of a bunch of middling ones.

The problem was that there was no consistent API for dealing with camera hardware - no coherent system that made this kind of work accessible. The prior art (the Canon Hack Development Kit and the Magic Lantern project) did not expose the lower levels of the camera, and was often tied to specific firmware versions and models.

Architecture

The authors talk about how their architecture is precise enough for “implementations to be built and verified for it”, yet high-level and general enough to allow for portability and evolution.

Doing the last reading taught me to pay special attention to the goals/constraints while reading a systems paper. The Frankencamera paper has 8 goals - they center around the system being usable and easy to program, being able to manipulate settings on a per-frame basis at video rate, allowing access to raw sensor pixels, and so on.

Nouns

The “nouns” in the Frankencamera abstract machine - the major moving parts of the system - are devices, the sensor, and processors. The authors do a great job explaining this in Figure 2. The sensor is not a device: it manages the imaging pipeline and forms the core of the camera. Devices are items like the lens, the flash and so on.
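To make the nouns concrete, here is a minimal sketch of how they might show up in an FCam-style program. The platform namespace, the type names, and the attach() call are my assumptions about the API, not something taken from this section of the paper.

#include <FCam/N900.h>              // assumed platform-specific header

FCam::N900::Sensor sensor;          // the sensor: manager of the imaging pipeline
FCam::N900::Lens lens;              // a device
FCam::N900::Flash flash;            // another device

// Assumed API: devices are attached to the sensor so their actions can be
// scheduled against captures and their metadata tagged onto returned frames.
sensor.attach(&lens);
sensor.attach(&flash);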

There is an application processor and an image processor. The image processor does all the “standard” stuff, while the application processor is where “fancier” optimizations and algorithms run. The image processor generates useful statistics from the raw image data and attaches them to the corresponding returned frame. It does things like demosaicking, white-balancing and resizing - effectively converting the sensor output into a “useful” form.
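As a rough sketch of what consuming those statistics might look like - assuming the returned frame exposes histogram and sharpness accessors, which is my guess at FCam-style naming rather than anything quoted here (frames themselves are covered under “Verbs” below):

// Assumed accessors: statistics computed by the image processor ride along
// with the frame that comes back from a capture.
FCam::Frame f = sensor.getFrame();
FCam::Histogram hist = f.histogram();      // luminance histogram over the requested regions
FCam::SharpnessMap sharp = f.sharpness();  // sharpness map, useful for autofocus loops

// The application processor can then run the "fancier" logic, e.g. metering
// or focus decisions, on top of these statistics.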

Verbs

A shot is a standard C++ class within the Frankencamera abstract machine implementation. It’s described as a “bundle of parameters that completely describe the capture and post-processing of a single output image”.

I think the example code does a good job of explaining the class members:

Shot shot;
shot.gain = 1.0;                                  // unity sensor gain
shot.exposure = 10000;                            // exposure time in microseconds (10 ms)
shot.frameTime = 33333;                           // total frame time in microseconds (~30 fps)
shot.image = Image(640, 480, UYVY);               // output resolution and pixel format
shot.histogram.regions = 1;                       // request one histogram from the image processor...
shot.histogram.region[0] = Rect(0, 0, 640, 480);  // ...computed over the whole frame

Other class members presumably cover things like white balance, frame rate and so on; output resolution and format are already visible above through the image member. A shot also specifies the series of actions to be taken by devices during its exposure (see the sketch below). I think “timeline” could be a good word to describe a shot.
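Here is a rough sketch of what scheduling one of those actions might look like, loosely modeled on the paper’s flash example. The FireAction type, its fields, and addAction() are my best recollection of the FCam API rather than something reproduced above.

Flash::FireAction fire(&flash);               // an action owned by the flash device (assumed type name)
fire.brightness = flash.maxBrightness();      // fire at full power
fire.duration = flash.minDuration();          // shortest pulse the flash supports
fire.time = shot.exposure - fire.duration;    // schedule it at the end of the exposure (second-curtain sync)
shot.addAction(fire);                         // attach the action to the shot's "timeline"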

The distinction between frames and shots is interesting. The API guarantees exactly one frame per shot requested, in the same order. As for why this distinction is made: it mirrors the request-response paradigm one sees in the client-server model. The shot is the request, and the frame is the response. The response (frame) can be invalid - it is possible the sensor was not able to achieve the requested parameters. Users can match a returned frame back to the shot that produced it through the id field.
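A minimal capture loop under this request-response reading might look like the following; capture(), getFrame() and the shot() accessor are my assumptions about FCam-style naming, not code from the paper.

sensor.capture(shot);                    // the request: enqueue the shot
FCam::Frame frame = sensor.getFrame();   // the response: blocks until the frame is ready

// Match the response back to its request via the id field, then check whether
// the sensor actually achieved the requested parameters before trusting the
// pixels (e.g. compare the exposure recorded in the frame against shot.exposure).
if (frame.shot().id == shot.id) {
    // process frame.image() ...
}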

The abstract machine is programmable as opposed to merely configurable. One can run programs on individual components of the abstract machine - more than just changing configuration parameters. The rephotography program in the paper is a great example of the type of program that can be run on this architecture. HDR+ would be great to implement on it as well.