>> We need buffer, L2, core, device (CPU/GPU/DSP), and parameter management
>> and optimization while chaining a number of compute-intensive modules on large
>> amounts of mostly use-once data in, for some modes, a highly repetitive
>> environment.  All while keeping the functional code clean and highly reconfigurable
>> at compile or runtime with several alternate versions.
> That reminds me -- have we discussed Halide here?
> Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines
> Halide
> I've long thought we need a way to decouple the "expression" of the code's intent from the
> "mechanism" of its efficient implementation.  Halide is the closest I've seen to a real-world articulation of that philosophy.
> Do you know anyone else who's attempted this? Any thoughts on whether this will prove useful?

Link is here, for anyone too lazy to search:

Yeah, I've read over the Halide paper; I've not done anything with it
in particular, but I do think the general idea is a very interesting
one, and probably worth considering in terms of decoupling different

