Think RTF Is Just ASCII? Think Again!
RTF files are not limited to text. They can embed raw binary data, which is introduced by the \bin control word.
Details:
- The \binN control word specifies that the next N bytes in the file are to be interpreted as raw binary data, not as RTF control words or ASCII text.
- It is typically used for embedding objects such as images, OLE objects, or other non-text attachments.
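To make the \binN rule concrete, here is a minimal Python sketch that locates \binN control words in a byte string and slices out the N bytes that follow. The function name `find_bin_payloads` and the sample document are hypothetical and only serve as illustration. The sketch is not a full RTF tokenizer: it assumes at most one space delimiter after the control word and does not handle an escaped backslash in front of the word "bin".

```python
import re

# Matches \bin followed by a decimal byte count and one optional space delimiter.
# Simplification (assumption): no handling of escaped backslashes or negative counts.
BIN_RE = re.compile(rb"\\bin(\d+) ?")


def find_bin_payloads(rtf: bytes) -> list[bytes]:
    """Return the raw payload of every \\binN control word found in rtf."""
    payloads = []
    pos = 0
    while True:
        match = BIN_RE.search(rtf, pos)
        if match is None:
            break
        count = int(match.group(1))
        start = match.end()
        payloads.append(rtf[start:start + count])
        # Skip the payload so its bytes are never scanned as RTF syntax.
        pos = start + count
    return payloads


# Hypothetical sample: the 4 payload bytes include '}' (0x7D) and '\' (0x5C),
# which would confuse a reader that treated them as RTF.
sample = b"{\\rtf1 A{\\pict\\bin4 " + bytes([0x00, 0xFF, 0x7D, 0x5C]) + b"}B}"
payloads = find_bin_payloads(sample)
print(len(payloads), len(payloads[0]))
```

The important design point is the skip after each match: because the payload may contain braces and backslashes, a reader has to count bytes rather than look for a closing delimiter.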
Don’t Trust VARCHAR with Binary Data
Storing RTF documents that contain binary data directly in a VARCHAR database field is risky. Binary sections (introduced with the \bin control word) can contain any 8-bit value, including non-printable and control characters, so they may corrupt text-based storage, cause encoding issues, or break applications that expect valid character data. The result can be data loss, retrieval errors, or security vulnerabilities if binary content is interpreted incorrectly. A safer approach is to store the entire RTF document as a BLOB (Binary Large Object), which preserves the raw bytes exactly and supports both the textual and the binary parts of the document without risking corruption.
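The difference can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it ships with Python; the table name `documents`, the column name `body`, and the sample bytes are assumptions for this illustration, and other database systems have their own BLOB types and drivers.

```python
import sqlite3

# Hypothetical RTF document with a 4-byte \bin payload containing 0x00 and 0xFF.
rtf_doc = b"{\\rtf1 A{\\pict\\bin4 " + bytes([0x00, 0xFF, 0x7D, 0x5C]) + b"}B}"


def decodes_as_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Report whether data is valid text in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The byte 0xFF is never valid in UTF-8, so this document cannot be
# treated as character data without loss or an error.
text_safe = decodes_as_text(rtf_doc)

# Storing the document as a BLOB keeps every byte unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")
conn.execute("INSERT INTO documents (body) VALUES (?)", (rtf_doc,))
stored = conn.execute("SELECT body FROM documents WHERE id = 1").fetchone()[0]
conn.close()

print(text_safe, stored == rtf_doc)
```

Passing a `bytes` object as the parameter is what makes sqlite3 bind the value as a BLOB; the round trip returns `bytes` again, so no character-set conversion ever touches the payload.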
From the RTF 1.9.1 specification:
“RTF files are usually 7-bit ASCII plain text, consisting of control words, control symbols, and groups. RTF files are easily transmitted between most PC based operating systems because of their 7-bit ASCII characters. However, converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect data transfer as 8-bit characters and binary data (see \binN) can contain any 8-bit values.” (…)
[MSFT-RTF] Microsoft Corporation, “Rich Text Format (RTF) Specification”, version 1.9.1, March 2008, https://go.microsoft.com/fwlink/?LinkId=120924