My remix: Two Sum, but it's 100M numbers and you get 512 MB
After solving Two Sum, I started asking my own follow-up questions — the kind an interviewer asks when you finish too fast.
Question 1: what about duplicates and negatives?
The one-pass hash map already handles both — [-3, 7, -3] with target -6 works because we check before inserting, so the second -3 finds the first. The trap people imagine isn't there. Lesson: test your mental model before "fixing" it.
Question 2 (the real remix): 100M numbers, 512 MB RAM
Now the interesting one. 100M × 8-byte integers = 800 MB raw. A Python dict of 100M entries? Several GB. The classic solution dies. What survives?
Idea 1: sort + two pointers — O(1) extra memory, but…
Sorting 800 MB in 512 MB means external merge sort — sort chunks that fit, spill to disk, merge. Then walk two pointers from both ends:
lo, hi = 0, n - 1
while lo < hi:
s = a[lo] + a[hi]
if s == target: return (a[lo], a[hi])
if s < target: lo += 1
else: hi -= 1Works, but we lost the original indices in the sort — if the problem needs indices, we must carry them along (doubling what we sort).
Idea 2: partition by hash — the distributed-systems move
Stream the file once and route each number x into bucket hash(x) % 16 — 16 files of ~50 MB. Here's the key insight: if x + y = target, then x and target - y land in predictable bucket pairs. So process bucket pairs (i, j) where a complement in bucket j can match bucket i — each pair fits in RAM, and inside a pair it's the ordinary hash-map solution again.
Same trick MapReduce and every shuffle-based join uses. Two Sum at scale is a distributed join.
The trade-off table
| Approach | Time | Extra RAM | Disk I/O | Keeps indices? |
|---|---|---|---|---|
| Hash map (classic) | O(n) | O(n) — too big | none | ✅ |
| External sort + 2 pointers | O(n log n) | O(1) | ~3 passes | needs care |
| Hash partitioning | O(n) | O(n/k) per bucket | 2 passes | ✅ |
Why I do this
The original problem teaches a data structure. The remix teaches engineering — the constraint that breaks your favorite tool is where the real learning starts. More of these coming.