dataprep/prep
dataprep/prep — infallible transformations on a single value.
Reach for this module when the operation always succeeds:
trim, lowercase, collapse whitespace, replace substrings, fall
back to a default. Compose with then / sequence.
For fallible checks ("is this non-empty?", "does this match a
pattern?") use dataprep/validator. The two compose cleanly —
see doc/architecture.md for the
decision table, the canonical Prep → Validator pipeline recipe,
and a worked end-to-end example.
Applying a Prep
Prep(a) is a type alias for fn(a) -> a, so applying a built
prep is just calling it like a function — no wrapper module
function is needed:
let pipeline = prep.then(first: prep.trim(), next: prep.uppercase())
let cleaned = pipeline(\" hello \") // \"HELLO\"
For readers who prefer a named entry point — pipeline-style or
when threading a prep through multiple call sites — prep.run/2
is a thin alias of the function call: prep.run(pipeline, value)
is identical to pipeline(value). Pick whichever reads better at
your call site; both compile to the same code.
Types
Prep(a) is an infallible transformation: fn(a) -> a. It always succeeds and never produces errors.
pub type Prep(a) =
fn(a) -> a
Reason a _checked constructor could not build a prep from the
supplied arguments. The panicking constructors (replace) reject the
same cases at construction time; the _checked siblings surface them
as data so callers with runtime-supplied arguments can recover.
pub type PrepError {
EmptyTarget
}
Constructors
-
EmptyTargetreplace_checkedwas called with an emptytarget. The empty string matches at every position, which the underlyingstring.replaceturns into a silent no-op — almost always a swapped-argument or unintended-runtime-value bug.
Values
pub fn collapse_space() -> fn(String) -> String
Collapse consecutive ASCII whitespace into a single space.
Matches the POSIX whitespace class [ \t\n\r\f\v] (space, tab,
linefeed, carriage return, form feed, vertical tab). Unicode
whitespace such as NO-BREAK SPACE (U+00A0) and IDEOGRAPHIC SPACE
(U+3000) is preserved — for those use collapse_unicode_space
(it matches the wider Unicode \s set and replaces every run with
a single ASCII space).
This split avoids the silent CJK-destruction footgun of replacing
姓 名 (with U+3000 between the names) with 姓 名 when the
caller only meant to normalise indentation.
Uses let assert for the regex compilation. The pattern is a
fixed, known-valid regular expression, so compilation cannot fail
at runtime. The assert is intentional and safe.
pub fn collapse_unicode_space() -> fn(String) -> String
Collapse consecutive Unicode whitespace into a single ASCII space.
Matches \s+ under the regex engine’s full Unicode rule, so it
recognises NO-BREAK SPACE (U+00A0), IDEOGRAPHIC SPACE (U+3000),
LINE / PARAGRAPH SEPARATOR (U+2028 / U+2029), the various EN/EM
SPACEs (U+2000..U+200A), etc. Each run — even one made entirely of
non-ASCII whitespace — is rewritten to a single ASCII U+0020.
Reach for collapse_space instead when the caller wants to keep
CJK / typographic whitespace intact and only fold ASCII runs.
Uses let assert for the regex compilation. The pattern \s+ is
a fixed, known-valid regular expression, so compilation cannot
fail at runtime. The assert is intentional and safe.
pub fn compose(
first p1: fn(a) -> a,
then p2: fn(a) -> a,
) -> fn(a) -> a
Sequential composition: same as then/2, exposed under the FP
compose name so callers coming from Haskell (.), Elm <<,
or lodash _.flow find the entry point on first grep. The label
reads compose(first:, then:) — the second label is then (not
next) to mirror the prose “first do f, then do g”. Output is
byte-identical to then(first:, next:). (#61)
pub fn default(fallback: String) -> fn(String) -> String
Replace the value with fallback when the input is exactly the
literal empty string "".
Whitespace-only inputs like " ", "\t", " \n " are
passed through unchanged — only s == "" triggers the
fallback. Reach for default_when_blank instead when you want
the broader "missing or whitespace-only" check, or compose with
trim:
prep.trim() |> prep.then(first: _, next: prep.default(“N/A”))
pub fn default_when_blank(
fallback: String,
) -> fn(String) -> String
Replace the value with fallback when the input is the literal
empty string "" or consists only of whitespace (per
string.trim).
Examples that fire the fallback: "", " ", "\t", "\r\n",
" \n ". Examples that do not: "a", " a ", "\t hello".
Equivalent to prep.trim() |> prep.then(prep.default(fallback))
when the trimmed value is what the caller wants to keep on the
non-blank path. The dedicated helper preserves the original
(un-trimmed) input on the non-blank path, which matches the
default posture: only substitute, never edit. Use the explicit
trim |> default composition when the trimmed form is the
desired output.
pub fn replace(
target target: String,
replacement replacement: String,
) -> fn(String) -> String
Replace all occurrences of target with replacement.
target must be non-empty. The empty string matches at every
position (including between every byte and at both ends of the
input), which the underlying string.replace handles by leaving
the input untouched — that is a silent no-op and almost always
indicates the caller swapped the arguments or fed in an
unintended runtime value, so the empty-target case is rejected
at construction time with a panic that names the function.
pub fn replace_checked(
target target: String,
replacement replacement: String,
) -> Result(fn(String) -> String, PrepError)
Result-returning companion to replace for callers
whose target comes from runtime input (user-typed search fields,
configuration, CSV columns) rather than a known-good literal. Returns
Error(EmptyTarget) instead of panicking when target is empty, so
the empty-target case can be handled as data:
case prep.replace_checked(target: user_find, replacement: user_with) {
Ok(prepper) -> Ok(prepper(input))
Error(prep.EmptyTarget) -> Error("search text must not be empty")
}
For a non-empty target the returned prep is identical to the one
replace builds — replace is defined in terms of this function, so
the two cannot drift apart.
pub fn run(prep prep: fn(a) -> a, value value: a) -> a
Apply a Prep(a) to a value. Thin alias of the function call:
prep.run(p, value) is identical to p(value). The two forms
compile to the same code; reach for run/2 when a named entry
point reads better at the call site (pipelines, currying, threading
the prep value through multiple call sites). Discoverability hook
for users who grep for “apply” / “run” before learning the type
alias trick (#60).
pub fn sequence(steps: List(fn(a) -> a)) -> fn(a) -> a
Compose a list of preps into a single prep.
identity() is the identity element of sequential composition,
so sequence([]) returns a prep that leaves every input
unchanged. This is a deliberate monoid law (see
test/dataprep/laws_test.gleam) and lets callers build prep
lists incrementally — for example via
list.filter(all_preps, by_feature_flag) — without a special
case when the resulting list happens to be empty.
pub fn then(
first p1: fn(a) -> a,
next p2: fn(a) -> a,
) -> fn(a) -> a
Sequential composition: apply p1, then apply p2 to the result.
FP-leaning users often grep for compose first; prep.compose/2
is a labelled alias of this function with the same semantics.
Both forms accept positional and labelled arguments.