What is Base64 encoding?
Introduction
Base64 is an encoding designed to let binary data pass safely through channels that were built to carry text. It works by converting the binary data, 6 bits at a time, into characters drawn from a 64-entry lookup table. The result is a stream of ASCII characters, which can then be transmitted and decoded back into the original bytes. A Base64-encoded string is always larger than the original data, by roughly one third (i.e., Base64 is not a compression algorithm). Another important distinction is that Base64 does not encrypt any information: it uses a standard, publicly known table of characters to encode and decode. In other words, anyone can decode a Base64 string, as long as it was encoded with a character set that the decoder also understands.
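To make the size overhead concrete, here is a minimal Python sketch using the standard library's base64 module. The helper name encoded_length is hypothetical and introduced only for illustration; it assumes standard padded Base64, where every block of 3 input bytes becomes 4 output characters.

```python
import base64
import math

def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (4 chars per 3-byte block)."""
    return 4 * math.ceil(n_bytes / 3)

for size in (1, 2, 3, 100):
    data = bytes(size)                 # `size` zero bytes as stand-in binary data
    encoded = base64.b64encode(data)
    # The measured length agrees with the formula above
    assert len(encoded) == encoded_length(size)
    print(size, "bytes ->", len(encoded), "characters")
```

For large inputs the ratio approaches 4/3, which is where the "roughly one third larger" figure comes from.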
Uses for Base64
In addition to safely encoding image and media data (you may have seen images on the web encoded in the following format: data:image/png;base64,(...)), Base64 is used for SSL certificates, email attachments, and other transfers where raw binary data or special control characters would otherwise have to be escaped.
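As an illustration of the data: format mentioned above, the following Python sketch builds such a string from raw bytes. The function name to_data_uri and the sample input are assumptions made for this example; a real page would pass the full contents of an image file.

```python
import base64

def to_data_uri(payload: bytes, mime_type: str = "image/png") -> str:
    """Build a data URI of the form data:<mime>;base64,<encoded payload>."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# The 8-byte signature that starts every PNG file, used here as sample input
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
print(to_data_uri(png_signature))
```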
Encoding Base64
While many programming languages (e.g., Java, Python, JavaScript) include built-in functions for converting between binary data and Base64, the underlying algorithm is relatively simple. Starting with binary information, Base64 processes the input three bytes (24 bits) at a time and splits each block into four 6-bit groups (note: each zero or one in a binary string is one bit). Each 6-bit group is then replaced by one character from the Base64 lookup table. The result is an ASCII-readable string that can be safely transmitted and received.
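For reference, this is roughly what the built-in route looks like in Python, using the standard library's base64.b64encode and base64.b64decode. The sample bytes are arbitrary and chosen only to show that the input does not need to be valid text.

```python
import base64

raw = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary binary data, not valid text
encoded = base64.b64encode(raw)        # bytes containing only ASCII characters
decoded = base64.b64decode(encoded)    # recovers the original bytes

print(encoded.decode("ascii"))
assert decoded == raw                  # the round trip is lossless
```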
For the sake of simplicity, we’ll use a short string to demonstrate the encoding process. Suppose we have “hi” as our string, the ASCII equivalent being: 104 (h), 105 (i).
We need to convert the ASCII string to its binary representation:
01101000 01101001 (0110100001101001 without spaces)
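The same conversion can be reproduced with a couple of lines of Python, shown here as a small sketch; the variable names are arbitrary.

```python
text = "hi"
data = text.encode("ascii")  # the two bytes 104 and 105
bits = " ".join(f"{byte:08b}" for byte in data)  # each byte as 8 binary digits

print(list(data))  # [104, 105]
print(bits)        # 01101000 01101001
```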
Now, divide the binary representation into 6-bit groups. The 16 bits do not divide evenly by 6, so the last group (1001) is filled out with two zero bits:
011010 000110 100100
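A short Python sketch of this grouping step follows. It works on a string of "0" and "1" characters purely for readability; that representation is an assumption of the example, not how production encoders handle bits.

```python
bit_string = "0110100001101001"  # "hi" as 16 bits

# Pad on the right with zeros so the length is a multiple of 6
remainder = len(bit_string) % 6
if remainder:
    bit_string += "0" * (6 - remainder)

groups = [bit_string[i:i + 6] for i in range(0, len(bit_string), 6)]
print(groups)  # ['011010', '000110', '100100']
```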
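With our string split into 6-bit groups, we can look each group up in the Base64 lookup table, which maps the values 0 to 25 to A through Z, 26 to 51 to a through z, 52 to 61 to 0 through 9, 62 to + and 63 to /. We get the following characters for our string’s binary representation: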
- 011010 => “a”
- 000110 => “G”
- 100100 => “k”
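The lookup itself can be sketched in Python by building the standard Base64 alphabet as a 64-character string and indexing into it. The name ALPHABET is chosen for this example.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, then "+" and "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

groups = ["011010", "000110", "100100"]
for group in groups:
    index = int(group, 2)  # interpret the 6 bits as a number from 0 to 63
    print(group, "=>", index, "=>", ALPHABET[index])
```

The three groups evaluate to 26, 6, and 36, which select "a", "G", and "k".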
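Before we can finish the conversion process, it’s important to account for padding. Base64 encodes its input in blocks of 3 bytes, and each block produces 4 output characters, so a valid padded Base64 string always has a length that is a multiple of 4. Since our string "hi" has only 2 bytes, one short of a full block, we need to append a "=" to the three characters above to make it a valid Base64 string. (An input that is two bytes short of a full block gets "==" instead.)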
The final result is: “aGk=” (hi).
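Putting the steps together, here is a minimal sketch of an encoder in Python. The function name b64_encode is hypothetical, and the code is meant to mirror the walkthrough rather than to replace the standard library, which it uses as a cross-check.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(data: bytes) -> str:
    """Encode bytes as padded Base64, working on 3-byte blocks."""
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        missing = 3 - len(block)  # 0, 1, or 2 bytes short of a full block
        # Pack the block into a 24-bit integer, zero-filling any missing bytes
        value = int.from_bytes(block + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indices and look each one up
        chars = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that carry no input data are replaced with "=" padding
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(b64_encode(b"hi"))  # aGk=
# Cross-check against the standard library
assert b64_encode(b"hi") == base64.b64encode(b"hi").decode("ascii")
```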
Decoding Base64
Decoding Base64 strings follows a similar process, just in reverse. Using the result from the previous section (“aGk=”), we will use the same Base64 lookup table:
- “a” => 011010
- “G” => 000110
- “k” => 100100
- “=” => padding
This leaves us with three 6-bit binary groups:
011010 000110 100100
Regrouping these 18 bits into 8-bit bytes gives two full bytes. The two leftover zero bits were added as filler during encoding and can be ignored:
01101000 01101001
… which results in “hi” in ASCII.
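The decoding steps can be sketched in Python in the same spirit. The function name b64_decode is hypothetical, and the sketch assumes well-formed input in the standard alphabet: it performs no validation and does not handle whitespace or line breaks.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_decode(text: str) -> bytes:
    """Decode padded, standard-alphabet Base64 (no validation of the input)."""
    stripped = text.rstrip("=")  # padding characters carry no data
    # Map each character back to its 6-bit group and concatenate the bits
    bits = "".join(f"{ALPHABET.index(ch):06b}" for ch in stripped)
    # Drop leftover bits that do not fill a whole byte (the zero filler)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

print(b64_decode("aGk="))  # b'hi'
```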
Conclusion
Base64 appears in many applications, from email to embedding binary streams (images, videos, etc.) in text. While the resulting strings are larger than the original data, Base64 encoding is a reliable way to ensure that binary information is transmitted without being misinterpreted.