Supplement: Native Parallel Renderer

All of our implementations so far have been designed to run as WebAssembly (WASM). Because WASM runs single-threaded and synchronously, rendering the final scene at 1200×800 with 500 samples/px takes over 30 minutes in the browser.

In this supplement we build a standalone native program based on r113-final-scene-hq, adding native execution, PNG output, and multi-threaded parallelism. The ray-tracing algorithms themselves are unchanged.

/
├── Cargo.toml
└── src/
    ├── main.rs          # CLI, parallel loop, PNG save
    └── rt/              # common crate ported inline
        ├── mod.rs
        ├── vec3.rs
        ├── ray.rs
        ├── utils.rs
        ├── hittable.rs
        ├── hittable_list.rs
        ├── sphere.rs
        ├── material.rs
        └── camera.rs

Changes from the WASM Version

New Dependencies

Cargo.toml

toml

[package]
name = "raytracing-book1-final-scene"
version = "0.1.0"
edition = "2024"

[dependencies]
clap  = { version = "4", features = ["derive"] }
image = "0.25"
rayon = "1"

Crate	Purpose
`clap`	Command-line option parsing
`image`	PNG file output
`rayon`	Data-parallel (multi-threaded) loops

The WASM version needed wasm-bindgen, which is omitted here. There is also no dependency on the common crate — the relevant code is ported directly into src/rt/.

Output Format: PPM → PNG

The WASM version returned PPM text as a String for JavaScript to render. The native version uses the image crate to write PNG directly to disk.

WASM version (PPM)

rust

let mut output = String::new();
output.push_str("P3\n");
output.push_str(&format!("{} {}\n", image_width, image_height));
output.push_str("255\n");

for j in (0..image_height).rev() {
    for i in 0..image_width {
        // ...accumulate samples...
        output.push_str(&format!("{}\n", write_color_gamma(pixel_color, samples)));
    }
}
output  // returned as String

Native version (PNG)

rust

fn color_to_rgb(pixel_color: Color, samples_per_pixel: u32) -> [u8; 3] {
    let scale = 1.0 / samples_per_pixel as f64;
    let r = (pixel_color.x() * scale).sqrt().clamp(0.0, 0.999);
    let g = (pixel_color.y() * scale).sqrt().clamp(0.0, 0.999);
    let b = (pixel_color.z() * scale).sqrt().clamp(0.0, 0.999);
    [(256.0 * r) as u8, (256.0 * g) as u8, (256.0 * b) as u8]
}

// ...after rendering...
let mut img = ImageBuffer::new(image_width, image_height);
for (j, row) in rows.iter().enumerate() {
    for (i, &[r, g, b]) in row.iter().enumerate() {
        img.put_pixel(i as u32, j as u32, Rgb([r, g, b]));
    }
}
img.save(&output_path).expect("Failed to save PNG");

The gamma correction and scaling that write_color_gamma performed are now handled by color_to_rgb, which returns [u8; 3] instead of formatted text. The algorithm (average the samples, then apply $\sqrt{\cdot}$ as gamma correction) is identical.
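As a worked check of the conversion, the sketch below applies the same scale, square root, clamp, and cast to a single channel. The helper name channel_to_u8 is hypothetical and not part of the renderer; it only isolates the per-channel arithmetic of color_to_rgb.

```rust
/// Hypothetical helper (not in the renderer): the per-channel arithmetic of
/// `color_to_rgb`, applied to one accumulated channel value.
fn channel_to_u8(accumulated: f64, samples_per_pixel: u32) -> u8 {
    // Average the accumulated samples.
    let scale = 1.0 / samples_per_pixel as f64;
    // Gamma 2 correction (square root), then clamp just below 1.0 so that
    // the scaled value stays under 256 and fits in a u8.
    let corrected = (accumulated * scale).sqrt().clamp(0.0, 0.999);
    (256.0 * corrected) as u8
}
```

For example, an accumulated value of 1.0 over 4 samples averages to 0.25, whose square root is 0.5, so the channel becomes 128. Any average of 1.0 or more is clamped to 0.999 and maps to 255.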

Multi-Threading

What We Parallelize

We parallelize at the row level. Each row can be computed independently, making it an ideal target for data parallelism.

rust

let rows: Vec<Vec<[u8; 3]>> = (0..image_height)
    .into_par_iter()            // rayon parallel iterator
    .map(|j| {
        let world_j = image_height - 1 - j;
        (0..image_width)
            .map(|i| {
                let mut pixel_color = Color::new(0.0, 0.0, 0.0);
                for _ in 0..samples_per_pixel {
                    let u = (i as f64 + random_double()) / (image_width - 1) as f64;
                    let v = (world_j as f64 + random_double()) / (image_height - 1) as f64;
                    let ray = camera.get_ray(u, v);
                    pixel_color += ray_color(&ray, &world, max_depth);
                }
                color_to_rgb(pixel_color, samples_per_pixel)
            })
            .collect()
    })
    .collect();

Adding .into_par_iter() is the only change to the loop itself. By default rayon creates a global thread pool with one thread per logical CPU and distributes the rows among those threads.

The image-space row index j (0 at top) is converted to the world-space $v$ coordinate via world_j = image_height - 1 - j (0 at bottom).
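To make the row-level split and the index flip concrete without pulling in rayon, here is a minimal sketch that uses std::thread::scope instead. It is an assumption-laden stand-in, not the renderer's code: shade is a hypothetical gradient function that replaces ray tracing, and rows are divided into fixed chunks, whereas rayon balances work dynamically.

```rust
use std::thread;

/// Hypothetical per-pixel function: a deterministic gradient standing in for ray tracing.
fn shade(i: u32, world_j: u32, width: u32, height: u32) -> [u8; 3] {
    let u = i as f64 / (width - 1) as f64;
    let v = world_j as f64 / (height - 1) as f64;
    [(255.0 * u) as u8, (255.0 * v) as u8, 64]
}

/// Renders image row `j` (0 = top), flipping it to the bottom-up coordinate first.
fn render_row(j: u32, width: u32, height: u32) -> Vec<[u8; 3]> {
    let world_j = height - 1 - j;
    (0..width).map(|i| shade(i, world_j, width, height)).collect()
}

/// Reference version: one row after another on the calling thread.
fn render_sequential(width: u32, height: u32) -> Vec<Vec<[u8; 3]>> {
    (0..height).map(|j| render_row(j, width, height)).collect()
}

/// Row-parallel version: each scoped thread fills its own disjoint chunk of rows.
fn render_parallel(width: u32, height: u32, threads: usize) -> Vec<Vec<[u8; 3]>> {
    let mut rows: Vec<Vec<[u8; 3]>> = vec![Vec::new(); height as usize];
    let threads = threads.max(1);
    // Ceiling division so that every row lands in some chunk.
    let chunk_len = ((height as usize + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for (chunk_idx, chunk) in rows.chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                for (offset, row) in chunk.iter_mut().enumerate() {
                    let j = (chunk_idx * chunk_len + offset) as u32;
                    *row = render_row(j, width, height);
                }
            });
        }
    });
    rows
}
```

Because no row reads another row's data, render_parallel(6, 4, 3) produces exactly the same buffer as render_sequential(6, 4). The top-left pixel has v = 1, which confirms that image row 0 maps to the top of the scene.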

Changes Required for Thread Safety

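rayon requires the closure passed to map to be Send + Sync, so everything the closure borrows must be safe to share between threads. The closure borrows the world (HittableList), which stores Box<dyn Hittable> values, and a trait object is only Send or Sync if the trait says so. We therefore add Send + Sync as supertraits on Hittable.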

WASM version (common/src/hittable.rs)

rust

pub trait Hittable {
    fn hit(&self, r: &Ray, t_min: f64, t_max: f64) -> Option<HitRecord>;
}

Native version (src/rt/hittable.rs)

rust

// Send + Sync supertrait bounds are required for sharing the world across rayon threads.
pub trait Hittable: Send + Sync {
    fn hit(&self, r: &Ray, t_min: f64, t_max: f64) -> Option<HitRecord>;
}

The Material trait already declared Send + Sync in the WASM version (common/src/material.rs), so no change is needed there.

rust

pub trait Material: Send + Sync {
    fn scatter(&self, r_in: &Ray, rec: &HitRecord) -> Option<(Color, Ray)>;
}
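The effect of the supertrait bound can be seen in isolation with a small sketch. Shape, Square, Rect, and total_area_parallel are hypothetical names that stand in for Hittable, its implementors, and the render loop; std::thread::scope is used in place of rayon so the example needs no external crates.

```rust
use std::thread;

/// Stand-in for `Hittable`: the `Send + Sync` supertraits make `dyn Shape`
/// itself `Send + Sync`, so `&[Box<dyn Shape>]` may be shared across threads.
trait Shape: Send + Sync {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Sums the areas with `threads` workers that all read the same shared slice.
fn total_area_parallel(shapes: &[Box<dyn Shape>], threads: usize) -> f64 {
    let threads = threads.max(1);
    thread::scope(|s| {
        // Spawn every worker first; worker `t` handles elements t, t + threads, ...
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                s.spawn(move || {
                    shapes
                        .iter()
                        .skip(t)
                        .step_by(threads)
                        .map(|shape| shape.area())
                        .sum::<f64>()
                })
            })
            .collect();
        // Then wait for all of them and combine the partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    })
}
```

If the `: Send + Sync` part is removed from the trait declaration, the spawn call no longer compiles, because the compiler cannot prove that `dyn Shape` is safe to share between threads. That is the same kind of error the renderer would hit with the unmodified Hittable trait.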

Thread Safety of the Random Number Generator

Both the WASM and native versions use a thread_local! Xorshift64. Each thread gets its own independent instance, so no locking is needed — multiple threads calling random_double() simultaneously never conflict.

rust

thread_local! {
    static RNG_STATE: Cell<u64> = Cell::new(0x123456789abcdef0);
}

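Every thread's generator starts from the same seed, 0x123456789abcdef0. The scene is built by random_scene on the main thread before any parallel work begins, so the small sphere placement is the same on every run, regardless of the thread count. Pixel sampling is different: which rows a worker thread renders, and in what order, depends on scheduling, so the sample noise can vary from run to run. Because all worker threads begin with the same random stream, the first rows rendered by different threads also draw from identical sequences.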
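The per-thread behaviour can be sketched as follows. Only the thread_local! declaration comes from the renderer; the shift constants (13, 7, 17) and the conversion to f64 are assumptions about how a Xorshift64 random_double might be written, and first_values_on_new_thread is a hypothetical helper for the demonstration. The real code in utils.rs may differ.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread lazily gets its own copy, initialized to the same seed.
    static RNG_STATE: Cell<u64> = Cell::new(0x123456789abcdef0);
}

/// Assumed Xorshift64 step; returns a value in [0, 1).
fn random_double() -> f64 {
    RNG_STATE.with(|state| {
        let mut x = state.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        state.set(x);
        // Keep the top 53 bits so the quotient is exactly representable.
        (x >> 11) as f64 / (1u64 << 53) as f64
    })
}

/// Hypothetical helper: the first `n` values seen by a freshly spawned thread.
fn first_values_on_new_thread(n: usize) -> Vec<f64> {
    thread::spawn(move || (0..n).map(|_| random_double()).collect::<Vec<f64>>())
        .join()
        .unwrap()
}
```

Two calls to first_values_on_new_thread(3) return the same three numbers, since each new thread starts from the shared seed and no state is carried between threads. No lock is involved at any point because a thread only ever touches its own Cell.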

Progress Display

We could not implement progress display in the WASM version. In the native version we use an AtomicUsize to share the completed row count across threads and print it to standard error.

rust

let completed = AtomicUsize::new(0);

// ...inside the par_iter map...
let done = completed.fetch_add(1, Ordering::Relaxed) + 1;
eprint!("\rRows: {}/{}", done, total_rows);

fetch_add is an atomic (indivisible) read-modify-write operation, so the counter increments correctly even when multiple threads execute it simultaneously. Ordering::Relaxed guarantees atomicity but imposes no ordering on surrounding memory operations, which is sufficient for a plain counter.
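A self-contained version of the counter pattern, again using std::thread::scope rather than rayon, might look like this. The function name count_rows_with_progress and the interleaved row assignment are assumptions made for the sketch; the row work itself is left as a placeholder.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Processes `total_rows` placeholder rows on `threads` workers and returns
/// the final value of the shared progress counter.
fn count_rows_with_progress(total_rows: usize, threads: usize) -> usize {
    let completed = AtomicUsize::new(0);
    let threads = threads.max(1);
    thread::scope(|s| {
        for t in 0..threads {
            let completed = &completed;
            s.spawn(move || {
                // Worker `t` takes rows t, t + threads, t + 2 * threads, ...
                for _row in (t..total_rows).step_by(threads) {
                    // ...render the row here...
                    // fetch_add returns the previous value, hence the + 1.
                    let done = completed.fetch_add(1, Ordering::Relaxed) + 1;
                    eprint!("\rRows: {}/{}", done, total_rows);
                }
            });
        }
    });
    eprintln!();
    // All workers have been joined by the end of the scope, so this is the total.
    completed.load(Ordering::Relaxed)
}
```

Every increment is counted exactly once, so the function returns total_rows for any thread count. The printed lines are not guaranteed to appear in counter order, because the fetch_add and the eprint! that follows it are separate steps; the display can briefly step backwards, which is harmless for a progress indicator.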

Command-Line Options

Options are defined using the clap derive macro.

rust

#[derive(Parser)]
struct Cli {
    #[arg(short, long, default_value = "output.png")]
    output: String,
    #[arg(long, default_value_t = 1200)]
    width: u32,
    #[arg(long)]
    height: Option<u32>,
    #[arg(short, long, default_value_t = 500)]
    samples: u32,
    #[arg(long, default_value_t = 11)]
    grid: i32,
    #[arg(long, default_value_t = 50)]
    max_depth: i32,
    #[arg(short = 'j', default_value_t = 0)]
    threads: usize,
}

--help output:

Usage: raytracing-book1-final-scene [OPTIONS]

Options:
  -o, --output <OUTPUT>        Output PNG file path [default: output.png]
      --width <WIDTH>          Image width in pixels [default: 1200]
      --height <HEIGHT>        Image height in pixels (default: width / 1.5)
  -s, --samples <SAMPLES>      Samples per pixel [default: 500]
      --grid <GRID>            Half-grid size N; 2N×2N grid [default: 11]
      --max-depth <MAX_DEPTH>  Maximum ray recursion depth [default: 50]
  -j <THREADS>                 Number of worker threads (0 = all cores) [default: 0]
  -h, --help                   Print help

Omitting --height automatically sets it to width / 1.5 (3:2 aspect ratio). -j 0 uses all available CPU cores.
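The default-height rule can be isolated in a small function. resolve_height is a hypothetical name used only for this sketch; main.rs performs the same computation inline with unwrap_or_else.

```rust
/// Hypothetical helper: picks the explicit height if given, otherwise
/// derives it from the width using the 3:2 aspect ratio (truncating).
fn resolve_height(width: u32, height: Option<u32>) -> u32 {
    let aspect_ratio = 3.0_f64 / 2.0;
    height.unwrap_or_else(|| (width as f64 / aspect_ratio) as u32)
}
```

With this rule, a width of 1200 gives 800, and a width of 400 gives 266 because the fractional part of 266.67 is truncated. Note that main.rs passes the fixed 3:2 aspect ratio to Camera::new regardless of --height, so a height that does not match width / 1.5 will probably produce a stretched image.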

Usage Examples

bash

# Default: 1200×800, 500 samples/px, all cores
./raytracing-book1-final-scene

# Preview: small size, few samples, 4 threads
./raytracing-book1-final-scene --width 400 -s 20 --grid 5 -j 4 -o preview.png

# High resolution
./raytracing-book1-final-scene --width 1920 --height 1280 -s 200 -o hi.png

Timings on Apple M2 Max

Measured with 12 threads (-j 0 = all cores):

Resolution	Samples/px	real (wall time)	user (total CPU)
300 × 200	100	3.1 s	26.5 s
600 × 400	100	11.9 s	1 min 48 s
1200 × 800	100	49.2 s	7 min 44 s
1200 × 800	500	3 min 49 s	37 min 46 s

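user is the sum of CPU time across all threads. The ratio user / real ranges from about 8.5 (300 × 200) to about 9.9 (1200 × 800 at 500 samples/px), so on average the 12 threads keep roughly 9 to 10 cores busy. Parallel utilization is good but short of the ideal 12, and it improves as the workload grows.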

For comparison, the WASM single-threaded version takes over 30 minutes for the equivalent 1200×800 at 500 samples/px.
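The user / real ratio for each table row can be recomputed with a few lines of arithmetic. The function names below are hypothetical; the numbers are the table's measurements converted to seconds.

```rust
/// Ratio of total CPU time to wall time; with N fully busy threads it approaches N.
fn cpu_to_wall_ratio(user_secs: f64, real_secs: f64) -> f64 {
    user_secs / real_secs
}

/// The four measurements from the table as (label, real seconds, user seconds).
fn timing_rows() -> [(&'static str, f64, f64); 4] {
    [
        ("300x200 @ 100", 3.1, 26.5),
        ("600x400 @ 100", 11.9, 108.0),    // user: 1 min 48 s
        ("1200x800 @ 100", 49.2, 464.0),   // user: 7 min 44 s
        ("1200x800 @ 500", 229.0, 2266.0), // real: 3 min 49 s, user: 37 min 46 s
    ]
}

/// One ratio per table row, in table order.
fn ratios() -> Vec<f64> {
    timing_rows()
        .iter()
        .map(|&(_, real, user)| cpu_to_wall_ratio(user, real))
        .collect()
}
```

The ratios come out at roughly 8.5, 9.1, 9.4, and 9.9, rising with the size of the job.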

Complete Implementation

Cargo.toml

toml

[package]
name = "raytracing-book1-final-scene"
version = "0.1.0"
edition = "2024"

[dependencies]
clap  = { version = "4", features = ["derive"] }
image = "0.25"
rayon = "1"

src/main.rs

rust

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

use clap::Parser;
use image::{ImageBuffer, Rgb};
use rayon::prelude::*;

mod rt;
use rt::{
    Camera, Color, Dielectric, Hittable, HittableList, Lambertian, Metal, Point3, Ray, Sphere,
    random_double, random_double_range, unit_vector,
};

#[derive(Parser)]
#[command(
    name = "raytracing-book1-final-scene",
    about = "Ray Tracing in One Weekend — final scene renderer (native, parallel PNG output)"
)]
struct Cli {
    /// Output PNG file path
    #[arg(short, long, default_value = "output.png")]
    output: String,

    /// Image width in pixels
    #[arg(long, default_value_t = 1200)]
    width: u32,

    /// Image height in pixels (default: width / 1.5, matching the 3:2 aspect ratio)
    #[arg(long)]
    height: Option<u32>,

    /// Samples per pixel
    #[arg(short, long, default_value_t = 500)]
    samples: u32,

    /// Half-grid size N; places small spheres on a 2N×2N grid (default: 11 → 22×22)
    #[arg(long, default_value_t = 11)]
    grid: i32,

    /// Maximum ray recursion depth
    #[arg(long, default_value_t = 50)]
    max_depth: i32,

    /// Number of worker threads (0 = use all available CPU cores)
    #[arg(short = 'j', default_value_t = 0)]
    threads: usize,
}

fn ray_color(r: &Ray, world: &dyn Hittable, depth: i32) -> Color {
    if depth <= 0 {
        return Color::new(0.0, 0.0, 0.0);
    }
    if let Some(rec) = world.hit(r, 0.001, f64::INFINITY) {
        if let Some(mat) = &rec.mat {
            if let Some((attenuation, scattered)) = mat.scatter(r, &rec) {
                return attenuation * ray_color(&scattered, world, depth - 1);
            }
        }
        return Color::new(0.0, 0.0, 0.0);
    }
    let unit_direction = unit_vector(r.direction());
    let t = 0.5 * (unit_direction.y() + 1.0);
    (1.0 - t) * Color::new(1.0, 1.0, 1.0) + t * Color::new(0.5, 0.7, 1.0)
}

fn random_scene(grid: i32) -> HittableList {
    let mut world = HittableList::new();

    // Ground: large Lambertian sphere (gray)
    world.add(Box::new(Sphere::with_material(
        Point3::new(0.0, -1000.0, 0.0),
        1000.0,
        Arc::new(Lambertian::new(Color::new(0.5, 0.5, 0.5))),
    )));

    // Place spheres randomly on a 2N×2N grid.
    for a in -grid..grid {
        for b in -grid..grid {
            let choose_mat = random_double();
            let center = Point3::new(
                a as f64 + 0.9 * random_double(),
                0.2,
                b as f64 + 0.9 * random_double(),
            );

            // Avoid overlapping the three large focal spheres.
            let focal_1 = Point3::new(4.0, 0.2, 0.0);
            let focal_2 = Point3::new(0.0, 1.0, 0.0);
            let focal_3 = Point3::new(-4.0, 0.2, 0.0);

            if (center - focal_1).length() > 0.9
                && (center - focal_2).length() > 0.9
                && (center - focal_3).length() > 0.9
            {
                if choose_mat < 0.8 {
                    // Lambertian (diffuse): 80%
                    let albedo = Color::new(
                        random_double() * random_double(),
                        random_double() * random_double(),
                        random_double() * random_double(),
                    );
                    world.add(Box::new(Sphere::with_material(
                        center,
                        0.2,
                        Arc::new(Lambertian::new(albedo)),
                    )));
                } else if choose_mat < 0.95 {
                    // Metal: 15%
                    let albedo = Color::new(
                        random_double_range(0.5, 1.0),
                        random_double_range(0.5, 1.0),
                        random_double_range(0.5, 1.0),
                    );
                    let fuzz = random_double_range(0.0, 0.5);
                    world.add(Box::new(Sphere::with_material(
                        center,
                        0.2,
                        Arc::new(Metal::new(albedo, fuzz)),
                    )));
                } else {
                    // Dielectric (glass): 5%
                    world.add(Box::new(Sphere::with_material(
                        center,
                        0.2,
                        Arc::new(Dielectric::new(1.5)),
                    )));
                }
            }
        }
    }

    // Three prominent focal spheres
    world.add(Box::new(Sphere::with_material(
        Point3::new(-4.0, 1.0, 0.0),
        1.0,
        Arc::new(Lambertian::new(Color::new(0.4, 0.2, 0.1))),
    )));
    world.add(Box::new(Sphere::with_material(
        Point3::new(0.0, 1.0, 0.0),
        1.0,
        Arc::new(Dielectric::new(1.5)),
    )));
    world.add(Box::new(Sphere::with_material(
        Point3::new(4.0, 1.0, 0.0),
        1.0,
        Arc::new(Metal::new(Color::new(0.7, 0.6, 0.5), 0.0)),
    )));

    world
}

/// Converts an accumulated pixel color to a gamma-corrected RGB triple.
fn color_to_rgb(pixel_color: Color, samples_per_pixel: u32) -> [u8; 3] {
    let scale = 1.0 / samples_per_pixel as f64;
    let r = (pixel_color.x() * scale).sqrt().clamp(0.0, 0.999);
    let g = (pixel_color.y() * scale).sqrt().clamp(0.0, 0.999);
    let b = (pixel_color.z() * scale).sqrt().clamp(0.0, 0.999);
    [(256.0 * r) as u8, (256.0 * g) as u8, (256.0 * b) as u8]
}

fn main() {
    let cli = Cli::parse();

    let image_width = cli.width;
    let aspect_ratio = 3.0_f64 / 2.0;
    let image_height = cli.height.unwrap_or_else(|| (image_width as f64 / aspect_ratio) as u32);
    let samples_per_pixel = cli.samples;
    let max_depth = cli.max_depth;
    let total_rows = image_height as usize;

    // Configure the rayon thread pool before any parallel work.
    if cli.threads > 0 {
        rayon::ThreadPoolBuilder::new()
            .num_threads(cli.threads)
            .build_global()
            .expect("Failed to build thread pool");
    }

    let thread_count = rayon::current_num_threads();
    eprintln!(
        "Rendering {}×{}, {} samples/px, grid=±{}, depth={}, threads={}",
        image_width, image_height, samples_per_pixel, cli.grid, max_depth, thread_count
    );

    let world = random_scene(cli.grid);

    let lookfrom = Point3::new(13.0, 2.0, 3.0);
    let lookat = Point3::new(0.0, 0.0, 0.0);
    let vup = Point3::new(0.0, 1.0, 0.0);
    let dist_to_focus = 10.0_f64;
    let aperture = 0.1_f64;

    let camera = Camera::new(
        lookfrom,
        lookat,
        vup,
        20.0,
        aspect_ratio,
        aperture,
        dist_to_focus,
    );

    let completed = AtomicUsize::new(0);

    // Render all rows in parallel; row index 0 is the top of the image.
    let rows: Vec<Vec<[u8; 3]>> = (0..image_height)
        .into_par_iter()
        .map(|j| {
            // Map image-space top-to-bottom row j to world-space bottom-to-top coordinate.
            let world_j = image_height - 1 - j;
            let row = (0..image_width)
                .map(|i| {
                    let mut pixel_color = Color::new(0.0, 0.0, 0.0);
                    for _ in 0..samples_per_pixel {
                        let u = (i as f64 + random_double()) / (image_width - 1) as f64;
                        let v = (world_j as f64 + random_double()) / (image_height - 1) as f64;
                        let ray = camera.get_ray(u, v);
                        pixel_color += ray_color(&ray, &world, max_depth);
                    }
                    color_to_rgb(pixel_color, samples_per_pixel)
                })
                .collect();

            let done = completed.fetch_add(1, Ordering::Relaxed) + 1;
            eprint!("\rRows: {}/{}", done, total_rows);

            row
        })
        .collect();

    eprintln!("\nWriting {}...", cli.output);

    let mut img = ImageBuffer::new(image_width, image_height);
    for (j, row) in rows.iter().enumerate() {
        for (i, &[r, g, b]) in row.iter().enumerate() {
            img.put_pixel(i as u32, j as u32, Rgb([r, g, b]));
        }
    }
    img.save(&cli.output).expect("Failed to save PNG");

    eprintln!("Done: {}", cli.output);
}
