Split a JavaScript string into words
Up until a few years ago, the go-to method for splitting a string into words was String.prototype.split()
. While it can still work just fine, it's a bit of a hassle to get right, especially for longer bodies of text. Yet, JavaScript has come up with a simpler way that takes care of all the nuances for us - Intl.Segmenter
.
Using the Intl.Segmenter()
constructor, we can create a segmenter for a given locale, with a specific granularity. In this case, we want to split a string into words, so we'll use the word
granularity. Then, we can use the Intl.Segmenter.prototype.segment()
method to split the string into segments.
A segment is an object with a handful of properties. The ones that are interesting for this task are segment
and isWordLike
. The former is the actual segment, while the latter is a boolean indicating whether the segment is word-like or not. This allows us to easily filter out non-word segments.
Putting everything together, we can create a function that splits a string into words, using the Intl.Segmenter
API.
const splitIntoWords = (str, locale) => [...new Intl.Segmenter(locale, { granularity: 'word' }).segment(str)].reduce( (acc, { segment, isWordLike }) => { if (isWordLike) acc.push(segment); return acc; }, [] ); splitIntoWords('I love javaScript!!', 'en-US'); // ['I', 'love', 'javaScript'] splitIntoWords('python, javaScript & coffee', 'en-US'); // ['python', 'javaScript', 'coffee']