Think RTF Is Just ASCII? Think Again!

RTF files can contain raw binary data, which is introduced by the \binN control word.

Details:

  • The \binN control word is used to specify that the next N bytes in the file should be interpreted as raw binary data, not as RTF control words or ASCII text.
  • This is typically used for embedding objects like images, OLE objects, or non-text attachments.
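
To make the rule above concrete, here is a minimal Python sketch that locates \binN runs in an RTF byte string and skips each payload so that its bytes are never parsed as RTF. The function name find_bin_payloads and the sample document are made up for illustration. The scanner is deliberately naive: it assumes a single space delimits the control word and it does not handle escaped backslashes or group nesting, which a real RTF parser would.

```python
import re
from typing import List, Tuple

# Matches the \binN control word plus the single space that delimits it.
BIN_RE = re.compile(rb"\\bin(\d+) ?")

def find_bin_payloads(rtf: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, payload) for each binary run found in the RTF bytes."""
    payloads = []
    pos = 0
    while True:
        match = BIN_RE.search(rtf, pos)
        if match is None:
            break
        length = int(match.group(1))   # N: number of raw bytes that follow
        start = match.end()
        payloads.append((start, rtf[start:start + length]))
        # Jump past the payload so its bytes are never treated as RTF.
        pos = start + length
    return payloads

# Hypothetical sample: the 4-byte payload contains NUL, 0xFF and two braces
# that must NOT be read as RTF group delimiters.
sample = b"{\\rtf1{\\pict\\bin4 " + bytes([0x00, 0xFF, 0x7B, 0x7D]) + b"}}"

for offset, payload in find_bin_payloads(sample):
    print(offset, payload)
```

Note that the payload in the sample contains { and }, which is exactly why a reader has to count bytes instead of looking for a closing brace.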

Don’t Trust VARCHAR with Binary Data

Storing RTF documents that contain binary data directly in a VARCHAR database field is risky. Binary sections (introduced with the \binN control word) can hold any byte value, including non-printable and control characters. Such bytes may corrupt text-based storage, cause encoding issues, or break applications that expect valid character data. The result can be data loss, retrieval errors, or security vulnerabilities if binary content is interpreted incorrectly. A safer approach is to store the entire RTF document as a BLOB (Binary Large Object), which preserves the raw bytes exactly and covers both the textual and the binary parts of the document.
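
As a rough illustration, the following Python sketch uses the standard sqlite3 module with an in-memory database. The table name documents, the column name body, and the sample bytes are assumptions made for this example, and other database engines and drivers will differ in detail. It shows that the sample document cannot be decoded as UTF-8 text, while a BLOB column returns it byte for byte.

```python
import sqlite3

# Hypothetical RTF document: the \bin payload holds bytes that are not valid text.
rtf_doc = b"{\\rtf1{\\pict\\bin4 " + bytes([0x00, 0xFF, 0x80, 0x7D]) + b"}}"

# Treating the document as text fails: 0xFF is never valid in UTF-8.
try:
    rtf_doc.decode("utf-8")
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False

# Storing the same bytes in a BLOB column keeps them intact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO documents (body) VALUES (?)", (rtf_doc,))
stored = conn.execute("SELECT body FROM documents").fetchone()[0]
conn.close()

print("decodes as text:", decodes_as_text)
print("round trip intact:", stored == rtf_doc)
```

Passing a bytes object as the parameter is what makes sqlite3 store the value as a BLOB; passing a str would store it as text instead.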

From the RTF 1.9.1 specification:

“RTF files are usually 7-bit ASCII plain text, consisting of control words, control symbols, and groups. RTF files are easily transmitted between most PC based operating systems because of their 7-bit ASCII characters. However, converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect data transfer as 8-bit characters and binary data (see \binN) can contain any 8-bit values.” (…)


[MSFT-RTF] Microsoft Corporation, “Rich Text Format (RTF) Specification”, version 1.9.1, March 2008, https://go.microsoft.com/fwlink/?LinkId=120924

