I think you are right. I suspect the technique they use is similar to deferred rendering, but in this case the G-buffers (i.e. albedo, normals, depth and possibly ambient occlusion) are pre-computed. This would allow them to combine pre-computed and dynamic lighting in the same pass.
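To make the idea a bit more concrete, here is a minimal Python sketch of a generic deferred lighting pass over a G-buffer that is assumed to have been baked offline. The names `GBufferTexel`, `PointLight`, `shade` and `lighting_pass` are hypothetical, positions are stored directly rather than reconstructed from depth, and the shading is plain Lambert with inverse-square falloff. It only illustrates the textbook technique, not necessarily what they actually do.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class GBufferTexel:
    # Hypothetical pre-computed G-buffer entry for one pixel.
    albedo: tuple[float, float, float]    # baked surface colour
    normal: tuple[float, float, float]    # unit-length world-space normal
    position: tuple[float, float, float]  # world position (a real engine would rebuild this from depth)
    ao: float                             # baked ambient occlusion, 0..1


@dataclass
class PointLight:
    # Hypothetical dynamic light, evaluated every frame.
    position: tuple[float, float, float]
    color: tuple[float, float, float]
    intensity: float


def shade(texel: GBufferTexel, ambient: float, lights: list[PointLight]) -> tuple[float, ...]:
    # Static contribution: baked AO scales a constant ambient term.
    color = [a * ambient * texel.ao for a in texel.albedo]
    # Dynamic contribution: Lambertian point lights with inverse-square falloff.
    for light in lights:
        to_light = [lp - p for lp, p in zip(light.position, texel.position)]
        dist2 = sum(c * c for c in to_light)
        if dist2 == 0.0:
            continue  # light sits exactly on the surface point; skip it
        dist = math.sqrt(dist2)
        n_dot_l = max(0.0, sum(n * c / dist for n, c in zip(texel.normal, to_light)))
        atten = light.intensity / dist2
        for i in range(3):
            color[i] += texel.albedo[i] * light.color[i] * n_dot_l * atten
    return tuple(color)


def lighting_pass(gbuffer: list[list[GBufferTexel]], ambient: float,
                  lights: list[PointLight]) -> list[list[tuple[float, ...]]]:
    # One shading evaluation per pixel, reading only from the G-buffer.
    return [[shade(t, ambient, lights) for t in row] for row in gbuffer]
```

The point of the split is that the expensive surface data is fixed, so only the cheap per-light loop has to run at runtime, and adding a dynamic light doesn't touch the baked inputs.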
I must say I am very impressed by the tech, and it can only get better.