Optimizing Concurrent Processing in Go

01/18/25

Common Approach (Using Channels)

Typically, we use a channel to collect results from multiple goroutines as follows:

package main

import "time"

func handleRequest(req int) int {
	time.Sleep(2 * time.Millisecond)
	return req * 2
}

func requestWithChannel(ch chan<- int, req int) {
	ch <- handleRequest(req)
}

func main() {
	requests := []int{1, 2, 3, 4, 5}
	responses := make([]int, len(requests))
	ch := make(chan int, len(requests))

	for _, req := range requests {
		go requestWithChannel(ch, req)
	}

	for i := range requests {
		responses[i] = <-ch
	}
}

In this code snippet, the handleRequest function simulates processing by sleeping for 2 milliseconds and then returning the input multiplied by 2. The requestWithChannel function sends the result of handleRequest to a channel. In main, we create a channel ch with a buffer size equal to the number of requests, then iterate over the requests, spawning a goroutine for each one so they are processed concurrently. Finally, we collect the responses from the channel. Note that results arrive in completion order, not request order, so responses[i] does not necessarily correspond to requests[i].

This approach works fine for small-scale applications, but it has some drawbacks:

  • Memory Usage: To keep senders from blocking, the channel is buffered with a capacity equal to the number of requests, which can lead to high memory usage when the number of requests is large.
  • Performance: The overhead of using channels can impact performance, especially when dealing with a large number of requests.
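The buffer is not strictly required for correctness: an unbuffered channel produces the same results, but each sender then blocks until the receiver reads, so the memory cost moves from the channel buffer to the blocked goroutines and their stacks. A minimal sketch of that variant (collectUnbuffered is a name introduced here for illustration):

```go
package main

import "time"

func handleRequest(req int) int {
	time.Sleep(2 * time.Millisecond)
	return req * 2
}

// collectUnbuffered gathers results over an unbuffered channel.
func collectUnbuffered(requests []int) []int {
	responses := make([]int, len(requests))
	ch := make(chan int) // unbuffered: each send blocks until a receive

	for _, req := range requests {
		go func(r int) { ch <- handleRequest(r) }(req)
	}

	// Each receive unblocks exactly one pending sender; the goroutines
	// (and their stacks) stay alive until their send completes.
	for i := range requests {
		responses[i] = <-ch
	}
	return responses
}

func main() {
	collectUnbuffered([]int{1, 2, 3, 4, 5})
}
```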

Optimized Approach (Using sync.WaitGroup and Pre-Allocated Slice)

Instead of using a channel, we can pre-allocate a slice and write results directly to it. This reduces memory usage and improves performance.

package main

import (
	"sync"
	"time"
)

func handleRequest(req int) int {
	time.Sleep(2 * time.Millisecond)
	return req * 2
}

func requestEfficiently(wg *sync.WaitGroup, res *int, req int) {
	defer wg.Done()
	*res = handleRequest(req)
}

func main() {
	var wg sync.WaitGroup
	requests := []int{1, 2, 3, 4, 5}
	responses := make([]int, len(requests))

	for i, req := range requests {
		wg.Add(1)
		go requestEfficiently(&wg, &responses[i], req)
	}

	wg.Wait()
}

In this optimized version, we pre-allocate a responses slice to store the results and use a sync.WaitGroup to wait for all goroutines to finish before reading them. The requestEfficiently function takes a pointer to the sync.WaitGroup, a pointer to its slot in the result slice, and the request to process. It processes the request, writes the result through the pointer, and signals the sync.WaitGroup that it has finished.

This approach has several advantages:

  • Memory Usage: The pre-allocated slice eliminates the intermediate channel and its buffer, reducing memory usage.
  • Performance: Better performance by avoiding the synchronization overhead of channels.
  • Safe Data Handling: No race conditions, as each goroutine writes to a unique element in the slice.

By replacing channels with a pre-allocated slice and using a sync.WaitGroup, we can optimize concurrent processing in Go, reducing memory usage, improving performance, and ensuring safe data handling. This approach is ideal for scenarios where efficiency and simplicity are key.

Compare Two Solutions

I ran both solutions and compared their performance and memory usage when processing 1,000,000 requests.

processWithChannel Duration:  457.1165 ms
processWithChannel Memory used:  16086 KB

processWithWaitGroup Duration:  333.071167 ms
processWithWaitGroup Memory used:  13529 KB
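The measurements above could be reproduced with a harness along these lines, using time.Since for duration and the TotalAlloc delta from runtime.ReadMemStats for allocations. This is a sketch, not the exact benchmark I used: the sleep is omitted and the request count lowered so it runs quickly, so the absolute numbers will differ from those above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func handleRequest(req int) int {
	return req * 2 // sleep omitted to keep the sketch fast
}

// processWithChannel collects results through a buffered channel.
func processWithChannel(requests []int) []int {
	responses := make([]int, len(requests))
	ch := make(chan int, len(requests))
	for _, req := range requests {
		go func(r int) { ch <- handleRequest(r) }(req)
	}
	for i := range requests {
		responses[i] = <-ch
	}
	return responses
}

// processWithWaitGroup writes results directly into a pre-allocated slice.
func processWithWaitGroup(requests []int) []int {
	var wg sync.WaitGroup
	responses := make([]int, len(requests))
	for i, req := range requests {
		wg.Add(1)
		go func(i, r int) {
			defer wg.Done()
			responses[i] = handleRequest(r)
		}(i, req)
	}
	wg.Wait()
	return responses
}

// measure reports wall-clock time and total bytes allocated by fn.
func measure(name string, fn func()) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)
	fmt.Printf("%s Duration: %v, Memory used: %d KB\n",
		name, elapsed, (after.TotalAlloc-before.TotalAlloc)/1024)
}

func main() {
	requests := make([]int, 100_000)
	for i := range requests {
		requests[i] = i
	}
	measure("processWithChannel", func() { processWithChannel(requests) })
	measure("processWithWaitGroup", func() { processWithWaitGroup(requests) })
}
```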

When Should You Use This Approach?

  • When you have a list of requests and only need to collect results without sharing data between goroutines.
  • When optimizing for performance and memory efficiency is a priority.
  • When you want to avoid the overhead of channels by pre-allocating memory upfront.

However, for more complex patterns like fan-in/fan-out, channels may still be the better option.
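To make that concrete, here is a minimal fan-out/fan-in sketch: a fixed pool of workers fans out over a shared jobs channel, and their results fan in to a single output channel. The function name fanOutFanIn and the worker count are illustrative choices, not part of the code above.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutFanIn distributes jobs across a fixed pool of workers (fan-out)
// and merges their results into one channel (fan-in).
func fanOutFanIn(jobs []int, workers int) []int {
	in := make(chan int)
	out := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- j * 2 // the "processing" step
			}
		}()
	}

	// Close out only after every worker has finished sending.
	go func() {
		wg.Wait()
		close(out)
	}()

	// Feed jobs in a separate goroutine so main can drain out
	// concurrently; feeding inline could deadlock.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
	}()

	var results []int
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(fanOutFanIn([]int{1, 2, 3, 4, 5}, 3))
}
```

Note that with this pattern the number of goroutines is bounded by the worker count rather than the number of requests, which is exactly the kind of flexibility the slice-based approach gives up.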