Skip to main content

Command Palette

Search for a command to run...

Build a word finder in 60 lines of JavaScript: the data-structure tour

Updated
4 min read

Build a word finder in 60 lines of JavaScript: the data-structure tour

Building a word finder is a rite of passage for developers. It seems simple—just find words that match a set of letters—but it quickly becomes a masterclass in algorithmic trade-offs. Whether you are building a tool like a production word-finder that uses a variant of approach 3 or a simpler real-world example, the underlying data structure dictates your performance.

Let’s walk through four ways to build this, moving from "it works" to "it’s fast."

1. The Naive Approach: Array.filter

The most intuitive way to find words is to iterate through a dictionary array and check if every letter in a candidate word exists in your input.

const dictionary = ["apple", "banana", "cherry", "date"];
const findWords = (letters, dict) => {
  return dict.filter(word => {
    let temp = [...letters];
    return [...word].every(char => {
      const idx = temp.indexOf(char);
      return idx > -1 ? (temp.splice(idx, 1), true) : false;
    });
  });
};

The Verdict: This is \(O(N \times M)\) where $N$ is the dictionary size and $M$ is word length. It’s fine for 1,000 words, but once you hit a standard English dictionary (200k+ words), your UI will freeze.

2. The Set-Based Lookup

If you are looking for exact anagrams rather than subsets, a Set or a Map is your best friend. By storing words in a hash table, you turn an $O(N)$ search into an $O(1)$ lookup.

const dictSet = new Set(["apple", "banana", "cherry"]);
const isWord = (word) => dictSet.has(word);

The Verdict: This is lightning fast for validation, but useless for "finding" words from a jumble of letters. It doesn't help you discover that "cat" is inside "act."

3. The Canonical Key (Sorted Signature)

To find anagrams efficiently, we normalize the data. If you sort the letters of any word alphabetically, all anagrams result in the same string. "Listen" and "Silent" both become "eilnst".

const buildAnagramMap = (dict) => {
  const map = {};
  for (const word of dict) {
    const key = [...word].sort().join('');
    if (!map[key]) map[key] = [];
    map[key].push(word);
  }
  return map;
};

The Verdict: This is the industry standard for anagram solvers. You sort the input letters once, then perform a single hash-map lookup. It is incredibly fast, though it requires pre-processing the dictionary.

4. The Bitmask Subset Check

For "Scrabble-style" word finders where you need to find all possible subsets of letters, bitmasks are the secret weapon. By mapping each letter (a-z) to a prime number or a specific bit position, you can represent a word as a single integer. Checking if word A is a subset of word B becomes a simple bitwise AND operation.

Performance Benchmarks (Approximate)

Approach Time Complexity Memory Usage Best For
Array.filter $O(N)$ Low Small lists
Set Lookup $O(1)$ Medium Validation
Sorted Sig $O(1)$ High Anagrams
Bitmask $O(1)$ Very Low Subset search

Where each approach breaks down

Every optimization has a hidden cost. Understanding these limits is what separates a junior implementation from a production-grade engine.

1. Memory constraints (Approach 2 & 3): The Sorted Signature and Set approaches require keeping the entire dictionary in memory. If you are building a mobile web app, loading a 5MB dictionary into a Map can lead to significant memory pressure and slow initial load times.

2. Language-specific character sets (Approach 3 & 4): The "Sorted Signature" and "Bitmask" methods assume a standard Latin alphabet. If you need to support languages with diacritics (like French or German) or non-Latin scripts (like Kanji), the sorting logic becomes complex. Bitmasks also fail when you exceed 32 or 64 bits, requiring you to use BigInt or custom objects, which can negate the performance gains.

3. The "Subset" Problem: The Sorted Signature approach is perfect for finding exact anagrams, but it is terrible for finding words that are subsets of your input (e.g., finding "cat" inside "caterpillar"). For that, you need a Trie (Prefix Tree). A Trie allows you to traverse the dictionary letter-by-letter, pruning branches that don't match your available letters.

If you are just starting, stick to the Sorted Signature method. It provides the most "bang for your buck" in terms of code complexity versus performance. Once you hit the limits of that, it’s time to dive into Tries and recursive backtracking. Happy coding!

More from this blog

G

GWN Word Tools

6 posts