Demo Programming Guide

This document describes the current interface of the zkCuda demo. The related code can be found in zkcuda. A complete example is available in zkcuda_1.rs.

Kernel Function Definition

An example of a kernel function is as follows:

fn add_2<C: Config>(api: &mut API<C>, inputs: &mut Vec<Vec<Variable>>) {
    let a = inputs[0][0];
    let b = inputs[0][1];
    let sum = api.add(a, b);
    inputs[1][0] = sum;
}

This function is similar to those used with memorized_simple_call in the standard compiler, except that inputs is a two-dimensional array. The inputs array serves as both input and output: here the kernel reads its operands from inputs[0] and writes the result to inputs[1][0].

It is roughly equivalent to the following CUDA function:

__global__ void add_2_kernel(int* input, int* output, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        int input_idx = idx * 2;
        output[idx] = input[input_idx] + input[input_idx + 1];
    }
}
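
To make the correspondence concrete, the computation performed across n zk threads can be modeled in plain Rust. This is only an illustrative sketch and not part of the zkCuda API: the name add_2_reference is hypothetical, and it assumes that circuit values are field elements, modeled here as integers modulo a caller-supplied modulus, and that each thread's two operands are stored next to each other as in the CUDA version.

```rust
/// Plain-Rust model of running `add_2` in `n` zk threads.
///
/// `input` holds two values per thread and the result holds one value per
/// thread. Addition is done modulo `modulus`, which is an assumption standing
/// in for the circuit field; this function is not part of the library.
fn add_2_reference(input: &[u64], n: usize, modulus: u64) -> Vec<u64> {
    assert_eq!(input.len(), 2 * n, "expected two inputs per thread");
    (0..n)
        .map(|idx| {
            // Thread `idx` reads the adjacent pair starting at `2 * idx`.
            let a = input[2 * idx] % modulus;
            let b = input[2 * idx + 1] % modulus;
            (a + b) % modulus
        })
        .collect()
}
```

For example, with modulus 2^31 - 1, add_2_reference(&[1, 2, 3, 4], 2, (1 << 31) - 1) returns [3, 7].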

Compilation

Before using the kernel, it needs to be compiled. The example kernel above can be compiled as follows:

let kernel_add_2: Kernel<M31Config> = compile_with_spec(
    add_2,
    &[
        IOVecSpec {
            len: 2,
            is_input: true,
            is_output: false,
        },
        IOVecSpec {
            len: 1,
            is_input: false,
            is_output: true,
        },
    ],
)
.unwrap();

The IOVecSpec entries describe the kernel's parameters, one entry per parameter. Each entry gives the length of the corresponding array within a single zk thread (len), whether it is an input (is_input), and whether it is an output (is_output).

For instance, the first parameter has a length of 2, is an input, and is not an output. When the kernel is called, each zk thread must be given an array of length 2 for this parameter, and the user gets no output from it after the kernel runs. The second parameter is the reverse: it has a length of 1, is not an input, and carries the kernel's output.

This syntax is somewhat verbose, but that applies only to the current demo version. We plan to use a more concise definition method in the future.

Context

The context automatically maintains the proof built so far and commits to the input variables. It provides the following functions:

impl<C: Config, P: ProvingSystem<C>> Default for Context<C, P> {
    fn default() -> Self {
        // Implementation
    }
}

impl<C: Config, P: ProvingSystem<C>> Context<C, P> {
    pub fn copy_to_device(&mut self, host_memory: &[C::CircuitField]) -> DeviceMemoryHandle {
        // Implementation
    }

    pub fn copy_to_host(&self, device_memory_handle: DeviceMemoryHandle) -> Vec<C::CircuitField> {
        // Implementation
    }

    pub fn call_kernel(
        &mut self,
        kernel: &Kernel<C>,
        ios: &mut [Option<DeviceMemoryHandle>],
        parallel_count: usize,
        is_broadcast: &[bool],
    ) {
        // Implementation
    }

    pub fn to_proof(self) -> CombinedProof<C, P> {
        // Implementation
    }
}

The call_kernel function takes several parameters in addition to the kernel itself. parallel_count specifies how many zk threads run in parallel. is_broadcast determines, for each parameter, how its input is distributed: if a parameter's is_broadcast entry is true, every zk thread receives the same input; otherwise, the input provided by the user is divided into parallel_count parts, and each zk thread receives one part.
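
The distribution rule can be sketched as a small helper that returns the portion of a host-side buffer a given zk thread would see. This is a model for illustration, not the library's implementation: thread_view is a hypothetical name, and it assumes that non-broadcast buffers are split into contiguous, equal chunks in thread order, which the text above does not specify.

```rust
/// Returns the part of `buffer` that zk thread `thread_idx` would see for one
/// kernel parameter whose per-thread length is `len`.
///
/// Assumption: a non-broadcast buffer is split into contiguous, equal chunks
/// in thread order. The guide only says the input is divided into
/// `parallel_count` parts, so the real layout may differ.
fn thread_view<T>(
    buffer: &[T],
    len: usize,
    parallel_count: usize,
    is_broadcast: bool,
    thread_idx: usize,
) -> &[T] {
    assert!(thread_idx < parallel_count, "thread index out of range");
    if is_broadcast {
        // Every thread shares the single copy supplied by the caller.
        assert_eq!(buffer.len(), len, "broadcast buffer holds one copy");
        buffer
    } else {
        // Each thread gets its own chunk of `len` elements.
        assert_eq!(buffer.len(), len * parallel_count, "one chunk per thread");
        &buffer[thread_idx * len..(thread_idx + 1) * len]
    }
}
```

Under that assumption, a 16-element buffer holding 0 through 15, split for 8 threads with a per-thread length of 2, gives thread 3 the elements 6 and 7, while a broadcast buffer is returned unchanged to every thread.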

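For example, suppose a kernel requires input lengths of 2, 4, and 4, with parallel_count = 8 and is_broadcast = [false, true, false]. In this case, the user needs to provide three inputs with lengths of 16, 4, and 32, respectively: the non-broadcast parameters need 2 × 8 = 16 and 4 × 8 = 32 elements, while the broadcast parameter needs only its per-thread length of 4.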
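
The same arithmetic can be written as a short helper for checking buffer sizes before calling a kernel. The function name required_lengths is hypothetical and not part of the zkCuda interface; it only restates the rule described above.

```rust
/// Total number of elements the caller must supply for each kernel parameter.
///
/// `lens[i]` is the per-thread length of parameter `i` (the `len` field of its
/// `IOVecSpec`), and `is_broadcast[i]` says whether all threads share one copy.
/// This helper is illustrative and not provided by the library.
fn required_lengths(lens: &[usize], parallel_count: usize, is_broadcast: &[bool]) -> Vec<usize> {
    assert_eq!(lens.len(), is_broadcast.len(), "one flag per parameter");
    lens.iter()
        .zip(is_broadcast)
        .map(|(&len, &shared)| if shared { len } else { len * parallel_count })
        .collect()
}
```

Calling required_lengths(&[2, 4, 4], 8, &[false, true, false]) returns [16, 4, 32], matching the example.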

Kernel API (ExpanderCompilerCollection)

The compiler APIs that can be used inside a kernel are the same as those used in regular circuits. You can learn more from Rust APIs.

Complete Example

For an example of how to use this CUDA-like circuit frontend, see zkcuda_1.rs.