← All posts
3 min read

My remix: Two Sum, but it's 100M numbers and you get 512 MB

#leetcode#algorithms#memory

After solving Two Sum, I started asking my own follow-up questions — the kind an interviewer asks when you finish too fast.

Question 1: what about duplicates and negatives?

The one-pass hash map already handles both — [-3, 7, -3] with target -6 works because we check before inserting, so the second -3 finds the first. The trap people imagine isn't there. Lesson: test your mental model before "fixing" it.

Question 2 (the real remix): 100M numbers, 512 MB RAM

Now the interesting one. 100M × 8-byte integers = 800 MB raw. A Python dict of 100M entries? Several GB. The classic solution dies. What survives?

Idea 1: sort + two pointers — O(1) extra memory, but…

Sorting 800 MB in 512 MB means external merge sort — sort chunks that fit, spill to disk, merge. Then walk two pointers from both ends:

lo, hi = 0, n - 1
while lo < hi:
    s = a[lo] + a[hi]
    if s == target: return (a[lo], a[hi])
    if s < target: lo += 1
    else: hi -= 1

Works, but we lost the original indices in the sort — if the problem needs indices, we must carry them along (doubling what we sort).

Idea 2: partition by hash — the distributed-systems move

Stream the file once and route each number x into bucket hash(x) % 16 — 16 files of ~50 MB. Here's the key insight: if x + y = target, then x and target - y land in predictable bucket pairs. So process bucket pairs (i, j) where a complement in bucket j can match bucket i — each pair fits in RAM, and inside a pair it's the ordinary hash-map solution again.

Same trick MapReduce and every shuffle-based join uses. Two Sum at scale is a distributed join.

The trade-off table

ApproachTimeExtra RAMDisk I/OKeeps indices?
Hash map (classic)O(n)O(n) — too bignone
External sort + 2 pointersO(n log n)O(1)~3 passesneeds care
Hash partitioningO(n)O(n/k) per bucket2 passes

Why I do this

The original problem teaches a data structure. The remix teaches engineering — the constraint that breaks your favorite tool is where the real learning starts. More of these coming.