Why is my JavaScript string length not what I expect?

I’m working with user input in JavaScript and using the length property on strings, but the values I get don’t match what I see on screen, especially with emojis and special characters. This is breaking my validation logic and UI limits. Can someone explain how JavaScript string length really works and how to accurately count visible characters?

JavaScript string length is in UTF-16 code units, not “what you see on screen”.

So:

  1. Emojis
    Many emojis use 2 code units.

Example:

'😀'.length === 2
'👍🏽'.length === 4
'👨‍👩‍👧‍👦'.length === 11

Your UI shows 1 glyph. JS counts multiple units.
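
To see where the extra units come from, you can look at the individual code units. A small sketch, with the emoji written as escapes so nothing gets mangled by copy and paste (the variable names are just for illustration):

```javascript
// U+1F600 (grinning face) is outside the Basic Multilingual Plane,
// so UTF-16 stores it as a surrogate pair: two 16-bit code units.
const grin = '\u{1F600}';

console.log(grin.length);                      // 2
console.log(grin.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(grin.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log(grin.codePointAt(0).toString(16)); // 1f600

// Thumbs up + medium skin tone modifier: two code points, four code units.
const thumbs = '\u{1F44D}\u{1F3FD}';
console.log(thumbs.length);                    // 4
```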

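  2. Accents and combined chars
    Some characters can be written as a base char plus a combining mark, and that form counts as two units.

'e\u0301'.length === 2 // é written as "e" + combining acute accent (U+0301)
'\u00F1'.length === 1 // ñ written as one precomposed code point

Both render as a single character in a textbox, but JS length differs.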
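
The same letter can arrive in either form depending on the keyboard, OS, or where the text was pasted from. A quick sketch with é spelled out both ways (escapes are used so the two forms stay distinct in the source):

```javascript
const decomposed = 'e\u0301';  // "e" + COMBINING ACUTE ACCENT
const precomposed = '\u00E9';  // LATIN SMALL LETTER E WITH ACUTE

console.log(decomposed.length);          // 2
console.log(precomposed.length);         // 1
console.log(decomposed === precomposed); // false, even though both display as é
```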

  3. Zero width chars
    Stuff like zero width joiner (ZWJ) and directional marks are invisible but count.

You get weird lengths if the user input includes these.
Especially with emojis that combine with ZWJ.
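
If you suspect invisible characters, dumping the code points usually makes them obvious. A throwaway debugging sketch (dumpCodePoints is a name I made up, not a built-in):

```javascript
// Hypothetical debugging helper: list every code point in a string as U+XXXX.
function dumpCodePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const sneaky = 'ab\u200Bc'; // contains a ZERO WIDTH SPACE
console.log(sneaky.length);          // 4, although only "abc" is visible
console.log(dumpCodePoints(sneaky)); // [ 'U+0061', 'U+0062', 'U+200B', 'U+0063' ]
```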

  4. What you likely want
    You want:

• “User-perceived characters” (grapheme clusters)
or
• “Visible glyphs” count

For that, you need a grapheme splitter, not .length.

Example using Intl.Segmenter (modern browsers):

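function visibleLength(str) {
  // Prefer Intl.Segmenter, which splits on grapheme cluster boundaries
  if (typeof Intl !== 'undefined' && Intl.Segmenter) {
    const seg = new Intl.Segmenter('en', { granularity: 'grapheme' });
    return Array.from(seg.segment(str)).length;
  }
  // Fallback counts code points, so ZWJ sequences and skin tones are still overcounted
  return [...str].length;
}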

visibleLength('😀'); // 1
visibleLength('👍🏽'); // 1
visibleLength('👨‍👩‍👧‍👦'); // 1

If you want validation like “max 10 characters”, use visibleLength instead of .length.
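
If you ever need to trim input down to the limit instead of just rejecting it, cut on grapheme boundaries too, otherwise slice() can chop an emoji in half. A minimal sketch, assuming Intl.Segmenter is available (no fallback here); truncateGraphemes is a hypothetical helper name:

```javascript
// Hypothetical helper: keep at most `max` grapheme clusters of `str`.
// Assumes Intl.Segmenter exists in the runtime.
function truncateGraphemes(str, max) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  let out = '';
  let count = 0;
  for (const { segment } of segmenter.segment(str)) {
    if (count === max) break;
    out += segment;
    count += 1;
  }
  return out;
}

// Family emoji built from four people joined by ZWJ (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
const input = 'hi' + family + '!';

console.log(truncateGraphemes(input, 3) === 'hi' + family); // true
console.log(input.slice(0, 3) === 'hi\uD83D');              // true: slice left a lone surrogate
```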

  5. Library fallback (without Intl.Segmenter)
    Install a lib like graphemer:

npm install graphemer

Then:

import Graphemer from 'graphemer';
const splitter = new Graphemer();

function visibleLength(str) {
  return splitter.splitGraphemes(str).length;
}

This aligns much closer to what users see.

  6. Validation tips
    • Always count graphemes for UI limits.
    • Store raw input unchanged, do not normalize it away for storage.
    • For features like search, you can normalize using str.normalize('NFC') or 'NFD'.

Quick example for min/max length:

function isLengthBetween(str, min, max) {
  const length = visibleLength(str);
  return length >= min && length <= max;
}

If your validation logic compares .length to a design spec like “max 20 chars”, switch to grapheme counting or your users with emojis will hit strange errors.

The short version: .length is lying to you, but also technically telling the truth.

JavaScript strings are sequences of UTF‑16 code units, and .length counts those, not what a human would call “characters.” That’s why:

  • Many emojis: 2 code units
  • Skin tones + emojis: 4+ code units
  • Family emojis with ZWJ: tons of code units
  • Some accented characters: sometimes 1, sometimes 2, depending on how they’re encoded

So yes, '😀'.length === 2, '👍🏽'.length === 4, etc. Looks like 1 on screen, counts as multiple under the hood. Your UI spec says “max 20 characters,” but your validator is really enforcing “max 20 UTF‑16 code units,” which is not the same thing at all.

@nachtschatten already covered grapheme clusters and Intl.Segmenter nicely, and I agree that counting graphemes is usually what you want. Where I’ll slightly disagree is on the “just use grapheme splitters everywhere” idea: that can be overkill if:

  • Your input is not user‑visible (e.g., internal IDs, tokens).
  • You actually care about raw byte-ish limits (like a database column that might choke).

In those cases, .length or even the actual UTF‑8 byte length may matter more than what users see.

A few practical angles that might help your validation specifically:

  1. Decide what “length” really means in your app

    • For UX rules like “username must be 3–15 characters” you really want user‑perceived characters.
    • For storage rules like “field must fit in 255 bytes in DB” you want byte length, not graphemes.
      Mixing those two is exactly how you end up confused and/or yelling at your screen.
  2. Byte length check (for storage constraints)
    If your backend or DB has a strict byte limit, calculate it explicitly in JS:

    function utf8ByteLength(str) {
      return new TextEncoder().encode(str).length;
    }
    

    Then validate like:

    utf8ByteLength(input) <= 255
    

    This will still let emojis count heavier than plain chars, which is sometimes what you actually need.

  3. Normalization before comparison (not for storage)
    Accented characters like é can use different internal forms: one precomposed code point (U+00E9), or a plain e followed by a combining accent (U+0301). If you’re doing:

    if (input.length > max) { ... }
    

    and also doing equality checks or search, normalize first:

    const normalized = input.normalize('NFC');
    

    I wouldn’t normalize before storing (I agree with @nachtschatten there), but normalizing for validation and matching can avoid subtle “same text, different bytes” issues.

  4. Do not rely on [...str].length as a “fix”
    People often write:

    [...str].length
    

    thinking “this counts real characters.” It only counts code points, not grapheme clusters. Many emoji sequences still break this. Families, flags, skin‑tone modifiers, and some complex scripts will still be off. If you care about matching what the user sees, this is a halfway solution at best.

  5. UI hints and server agreement
    One more gotcha: even if you fix the client side with grapheme counting, your server might still be using “code units” or byte length. Then the user sees “you can type 20 chars,” enters 20 emojis, and the backend screams.
    So:

    • Decide on one rule (graphemes vs bytes vs code units).
    • Implement it both on client and server.
    • Make your copy and UX reflect the exact rule if it matters.
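
To make point 3 concrete, here is roughly what “normalize for comparison, not for storage” can look like. This is only a sketch; sameText is a made-up helper name, and the stored value is left untouched:

```javascript
// Hypothetical helper: compare two strings after normalizing both sides to NFC.
// Neither input is modified, so whatever you stored stays as the user typed it.
function sameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const typed = 'cafe\u0301';  // é as "e" + combining acute accent
const stored = 'caf\u00E9';  // é as one precomposed code point

console.log(typed === stored);                            // false
console.log(sameText(typed, stored));                     // true
console.log(typed.length, typed.normalize('NFC').length); // 5 4
```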

If this is “breaking your validation logic” right now, your first quick patch is probably:

  • For UX limits: switch to a grapheme‑aware count for the visible limit.
  • For DB safety: add a byte‑length check too, possibly with a slightly larger “visual” limit but a hard “storage” limit behind the scenes.

And yeah, it’s frustrating that .length is so useless for humans here, but we’re stuck with that legacy.

Your intuition is right: the browser shows “1 thing,” .length shows “something else.” That tension will never fully disappear, so the trick is to pick one definition of “length” per use case and stick to it.

1. Stop chasing “the one true length”

You actually have at least three different “lengths” in play:

  1. UTF‑16 code units

    • What .length returns.
    • Good for: legacy APIs, some JS internals.
    • Bad for: UX, human‑visible counts.
  2. Unicode code points

    • What [...str].length counts (string iteration walks code points, not code units).
    • Better than .length for emojis with surrogate pairs.
    • Still wrong for sequences like family emojis, flags, skin tones, complex scripts.
  3. Grapheme clusters

    • What a user would informally call “characters” on screen.
    • Best match for “max 20 characters” UX rules.
    • Needs Intl.Segmenter or a library, as @andarilhonoturno and @nachtschatten already walked through.

I partly disagree with the idea that you always want grapheme clusters: sometimes you really want bytes or code units, especially with strict storage or protocol limits.
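
It can help to print all of these side by side for a suspicious input. A small sketch, assuming Intl.Segmenter and TextEncoder are available in your runtime; measureLengths is a hypothetical name:

```javascript
// Hypothetical helper: report the different "lengths" of one string.
function measureLengths(str) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  return {
    codeUnits: str.length,                                // UTF-16 code units
    codePoints: [...str].length,                          // Unicode code points
    graphemes: Array.from(segmenter.segment(str)).length, // user-perceived characters
    utf8Bytes: new TextEncoder().encode(str).length,      // bytes when encoded as UTF-8
  };
}

const thumbs = '\u{1F44D}\u{1F3FD}'; // thumbs up + skin tone modifier
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}'; // ZWJ family

console.log(measureLengths('abc'));
// { codeUnits: 3, codePoints: 3, graphemes: 3, utf8Bytes: 3 }
console.log(measureLengths(thumbs));
// { codeUnits: 4, codePoints: 2, graphemes: 1, utf8Bytes: 8 }
console.log(measureLengths(family));
// { codeUnits: 11, codePoints: 7, graphemes: 1, utf8Bytes: 25 }
```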


2. Decide per rule what you care about

Instead of “my length is wrong,” write down:

  • Username rule: “3 to 20 visible characters” → grapheme clusters
  • DB column rule: “up to 255 bytes in UTF‑8” → byte length
  • API field rule: “limit 4k code units to avoid payload bloat” → .length

You can absolutely use two limits at once:

  • Show user: “Up to 20 characters” enforced by grapheme count
  • Also silently ensure: UTF‑8 byte length under your DB or API hard cap

Conflict between client and server goes away once they both use the same definitions.


3. Normalization is not a magic fix (and can hurt)

A common suggestion is “just normalize the string”:

str.normalize('NFC')

That helps in some situations (like search and equality), but it does not fix:

  • ZWJ sequences in emoji
  • Zero width control characters
  • Flags built from regional indicator symbols
  • Custom sequences like “person + skin tone + gender + ZWJ + profession”

Also, normalizing before storage can lose information. I agree with @nachtschatten here: normalize for comparison, not necessarily for persistence.
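
You can check this yourself: the lengths of these sequences do not move under normalization. A quick sketch using escapes for the emoji (the sample strings are my own picks):

```javascript
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}'; // ZWJ sequence
const flag = '\u{1F1E9}\u{1F1EA}'; // regional indicators D + E
const withZws = 'a\u200Bb';        // zero width space between two letters

for (const s of [family, flag, withZws]) {
  console.log(s.length, s.normalize('NFC').length, s.normalize('NFD').length);
}
// 11 11 11
// 4 4 4
// 3 3 3
```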


4. How to make your validation more predictable

Rather than repeating their grapheme splitter examples, here are a few patterns you can actually drop into your code.

A. Separate “UX length” and “storage length” functions

// UX-visible length: plugin / Intl.Segmenter / your existing grapheme helper
function uiLength(str) {
  // assume you already implemented visibleLength from earlier posts
  return visibleLength(str);
}

// Storage length: bytes in UTF-8
function storageLength(str) {
  return new TextEncoder().encode(str).length;
}

Then your rules become explicit:

function validateUsername(str) {
  const len = uiLength(str);
  if (len < 3 || len > 20) return false;
  if (storageLength(str) > 64) return false;  // hard backend cap
  return true;
}

Now the weird emoji behavior is not “a bug,” it is just a consequence of a clearly defined rule.

B. Be explicit in your UX copy

If you enforce bytes on the backend, don’t lie to users with “20 characters max” when you actually mean “up to N bytes; emojis count more.” Either:

  • Switch to a grapheme-based limit for user-facing rules, or
  • Make the wording honest: “Limit: 20 characters, emojis may count as more than one.”

Not pretty, but more honest than silently rejecting “short” emoji-only inputs.
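
One way to keep the copy honest is to have the validator report which rule failed, so the UI can show a message that matches. A rough sketch, assuming Intl.Segmenter and TextEncoder are available; checkUsername and the reason strings are made-up names, and the limits are just the ones from the example above:

```javascript
// Hypothetical validator: returns which rule failed instead of a bare boolean.
function checkUsername(str) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  const graphemes = Array.from(segmenter.segment(str)).length;
  const bytes = new TextEncoder().encode(str).length;

  if (graphemes < 3) return { ok: false, reason: 'too_short' };
  if (graphemes > 20) return { ok: false, reason: 'too_long' };
  if (bytes > 64) return { ok: false, reason: 'too_many_bytes' }; // hard backend cap
  return { ok: true, reason: null };
}

const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

console.log(checkUsername('al').reason);             // too_short
console.log(checkUsername('alice').ok);              // true
console.log(checkUsername(family.repeat(3)).reason); // too_many_bytes (3 graphemes, 75 bytes)
```

With a reason code in hand, the UI can say “too many emojis for our storage limit” instead of a generic “too long”.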


5. How invisible stuff sneaks in

Both previous answers mentioned zero width characters and joiners. You can actively strip some of them if your use case allows it.

Caution: this is destructive and not always safe, but for some inputs (like usernames), you might want:

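// Note: this range includes U+200C (ZWNJ) and U+200D (ZWJ), so it also splits
// joined emoji sequences and can alter text in scripts that rely on those characters.
const ZERO_WIDTH = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064]/g;

function sanitizeUserFacing(str) {
  return str.replace(ZERO_WIDTH, '');
}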

Use that before calling your uiLength. This can prevent troll inputs that are visually short but technically huge because of invisible junk.
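
Here is what that stripping does in practice, including the side effect on joined emoji. A sketch that repeats the regex so it runs on its own; strip and count are throwaway names, and Intl.Segmenter is assumed to be available:

```javascript
const ZERO_WIDTH = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064]/g;
const strip = (str) => str.replace(ZERO_WIDTH, '');

const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const count = (str) => Array.from(segmenter.segment(str)).length;

// Invisible padding is removed: 5 code units shrink to 2.
const padded = 'a\u200B\u200B\u200Bb';
console.log(padded.length, strip(padded).length); // 5 2

// Side effect: U+200D (ZWJ) is inside the range, so a joined family emoji
// falls apart into four separate people.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
console.log(count(family), count(strip(family))); // 1 4
```

So if emoji are allowed in the field, decide whether that trade-off is acceptable before stripping.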


6. Server and client must agree

One subtle problem: maybe you just fixed the browser side, but your server is still using JS .length, Java .length(), or SQL column size in bytes.

You need:

  • The same definition of “length” used in:
    • Frontend validation
    • Backend validation
    • Any database constraints

Otherwise, users see “OK” in the UI and “too long” as a 400 error.


In practice, the fix for your broken validation is:

  • Define which “length” each rule really cares about
  • Implement separate utility functions for each kind
  • Keep UI limits grapheme based, storage limits byte based
  • Make client and server use the same logic

Once you do that, .length will stop surprising you, because you’ll barely use it for user-facing rules.