
What is URL Encoding?

What is URL encoding? How does it work - and what is it used for?

What Is URL (Uniform Resource Locator) Encoding?

Definition

URL encoding is an encoding format used in URLs. It allows arbitrary data to appear inside a Uniform Resource Identifier (a URI; typically a URL) while using only a narrow set of US-ASCII characters. The encoding exists because URLs and HTTP request parameters often contain characters or other data that fall outside that limited set (e.g. control characters, spaces, or non-ASCII text).

Reserved and unreserved characters

In general, a URI can contain characters that are either reserved or unreserved. Unreserved characters are characters that have no special meaning; they can be displayed as-is and require no special handling. These include uppercase and lowercase letters (A-Z, a-z), decimal digits (0-9), hyphen (-), period (.), underscore (_), and tilde (~).

Reserved characters, on the other hand, are characters that may delimit the URI into sub-components: characters such as / # & and others. The following is the list of all reserved characters: ! # $ & ' ( ) * + , / : ; = ? @ [ ].
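The two character sets above can be written down directly. The following is a minimal sketch, not a full URI parser; the helper name classifyChar is hypothetical and exists only for this illustration.

```javascript
// Unreserved characters as listed above: letters, digits, - . _ ~
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
// Reserved characters as listed above.
const RESERVED = new Set("!#$&'()*+,/:;=?@[]");

// classifyChar is a hypothetical helper; it expects a single character.
function classifyChar(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.has(ch)) return "reserved";
  return "other"; // e.g. space, %, or any non-ASCII character
}

console.log(classifyChar("a")); // unreserved
console.log(classifyChar("#")); // reserved
console.log(classifyChar(" ")); // other
```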

We cannot always use reserved characters as-is, because this would create ambiguous URIs. For instance, consider the URL http://example.com/foo#bar. Does this URL point to an anchor #bar inside resource /foo, or does it point to a resource /foo#bar, that is, a resource whose name contains the character #? Without URL encoding it would be impossible to tell.

We resolve such ambiguities by encoding reserved characters when they are used as data; when they are used as delimiters, we leave them as-is.

Percent encoding

To encode reserved characters, we use the percent-encoding scheme. In percent-encoding, each byte is encoded as a character triplet consisting of the percent character % followed by the two hexadecimal digits that represent the byte's numeric value. For instance, %23 is the percent-encoding for the binary octet 00100011, which in US-ASCII corresponds to the character #. Strictly speaking, the percent character (%) isn't a reserved character, but because it marks the start of a percent-encoded byte, it requires special handling too: when used as data, it must itself be percent-encoded (as %25).
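As a rough sketch of the triplet rule described above, a single byte can be turned into its percent-encoded form by hand. The helper name percentEncodeByte is an assumption made for this example; in practice the built-in encodeURIComponent does this for you.

```javascript
// percentEncodeByte is a hypothetical helper: one byte in, one "%XX" triplet out.
function percentEncodeByte(byte) {
  if (!Number.isInteger(byte) || byte < 0 || byte > 255) {
    throw new RangeError("expected an integer between 0 and 255");
  }
  // Two uppercase hexadecimal digits, left-padded with a zero if needed.
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeByte(0b00100011));        // %23 (the character #)
console.log(percentEncodeByte("%".charCodeAt(0))); // %25 (the percent character itself)
console.log(encodeURIComponent("#%"));             // %23%25
```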

So with percent-encoding, we know that URL http://example.com/foo#bar points to an anchor bar inside resource /foo while http://example.com/foo%23bar points to resource /foo#bar where character # is encoded as %23.
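The difference between the two URLs can be observed with the standard URL class, available in browsers and in Node.js. This is only an illustration of how one parser splits the two strings; the variable names are arbitrary.

```javascript
// "#" used as a delimiter: it separates the path from the fragment.
const withFragment = new URL("http://example.com/foo#bar");
console.log(withFragment.pathname); // /foo
console.log(withFragment.hash);     // #bar

// "#" used as data: it is percent-encoded and stays inside the path.
const withEncodedHash = new URL("http://example.com/foo%23bar");
console.log(withEncodedHash.pathname);                     // /foo%23bar
console.log(withEncodedHash.hash === "");                  // true
console.log(decodeURIComponent(withEncodedHash.pathname)); // /foo#bar
```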


Other characters

Percent encoding is also used to represent other characters, that is, characters that are neither reserved nor unreserved. As an example, imagine a GET request containing a non-ASCII string parameter, such as the search query zajec in jež, which is Slovenian for a rabbit and a hedgehog.

In such cases, we first have to encode the non-ASCII characters as UTF-8 and then percent-encode each byte of the resulting byte sequence. So if we send a GET request to the DuckDuckGo search engine containing the search query zajec in jež, we generate the following URL: https://duckduckgo.com/?q=zajec%20in%20je%C5%BE
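The two steps (UTF-8 first, then percent-encoding byte by byte) can be sketched as follows. The helper percentEncodeUtf8 is hypothetical and deliberately simple: it leaves only unreserved characters untouched and encodes every other byte. The built-in encodeURIComponent produces the same result for this query.

```javascript
// percentEncodeUtf8 is a hypothetical helper written for this illustration.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // step 1: UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      out += ch; // unreserved ASCII characters are kept as-is
    } else {
      // step 2: every other byte becomes a %XX triplet
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// The letter ž occupies two bytes in UTF-8.
console.log(Array.from(new TextEncoder().encode("ž"), (b) => b.toString(16))); // two bytes: c5, be

const query = "zajec in jež";
console.log(percentEncodeUtf8(query));  // zajec%20in%20je%C5%BE
console.log(encodeURIComponent(query)); // zajec%20in%20je%C5%BE
```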

Encoding the space character

You may have seen cases where the space character was encoded as +, even though percent-encoding suggests it should be encoded as %20 (in US-ASCII, the space character is 20 hexadecimal, or 32 decimal). So what is going on?

Such encodings are typically created by HTML forms. When a user submits an HTML form, the data is URL-encoded using an early version of the URI percent-encoding rules that contained a number of modifications such as replacing spaces with + and others.

Note, however, that using + instead of %20 is valid only when encoding application/x-www-form-urlencoded content, such as the query part of a URL. To make this clearer, consider the following cases.

  1. http://www.example.com/search+script.php?search+query=search+term

    In this URL, the resource being requested is search+script.php (the plus character (+) is part of the filename), while the parameter name is search query and its value is search term. In the name of the query parameter and in its value, the + sign is converted to a space; in the name of the resource, search+script.php, the + sign remains.

  2. http://www.example.com/search+script.php?search%20query=search%20term

    This case is identical to the example above. The difference—using %20 instead of the + sign in parameter name and value—is only superficial. Both URLs point to the same resource, search+script.php, and they contain the same parameters.

  3. http://www.example.com/search%20script.php?search%20query=search%20term

    This example, however, is different. Here the resource name contains an encoded space character (%20), so the name of the requested resource is search script.php; the request parameter names and values remain the same as above. Consequently, this URL is different from those above.
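The three cases above can be checked with the standard URL and URLSearchParams classes, which apply the form-encoding rules (+ means space) to the query only. This is a small sketch; the helper name describe is hypothetical.

```javascript
const urls = [
  "http://www.example.com/search+script.php?search+query=search+term",
  "http://www.example.com/search+script.php?search%20query=search%20term",
  "http://www.example.com/search%20script.php?search%20query=search%20term",
];

// describe is a hypothetical helper: it returns the decoded resource name
// and the first query parameter of the given URL.
function describe(href) {
  const url = new URL(href);
  // searchParams decodes both + and %20 in the query as a space.
  const [name, value] = [...url.searchParams][0];
  // decodeURIComponent leaves + in the path untouched.
  const resource = decodeURIComponent(url.pathname.slice(1));
  return { resource, name, value };
}

for (const href of urls) {
  const r = describe(href);
  console.log(`${r.resource} | ${r.name} = ${r.value}`);
}
// search+script.php | search query = search term
// search+script.php | search query = search term
// search script.php | search query = search term
```

Going the other way, new URLSearchParams({ "search query": "search term" }).toString() yields search+query=search+term, while encodeURIComponent("search term") yields search%20term.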

A URL encoder

The application below performs URL encoding and decoding on arbitrary strings. Feel free to test it out (HTML).

Input <br>
<input type="text" name="input" id="input"><br><br>

Output <br>
<input type="text" name="encoded" id="encoded">

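<script>
// References to the two text fields; assigned once the DOM is ready.
let input = null;
let encoded = null;

document.addEventListener("DOMContentLoaded", () => {
        input = document.querySelector("#input");
        encoded = document.querySelector("#encoded");
        // The "input" event also fires on paste and cut, which "keyup" misses.
        input.addEventListener("input", encode);
        encoded.addEventListener("input", decode);
});

// Percent-encode the plain text and show it in the output field.
function encode() {
        encoded.value = encodeURIComponent(input.value);
}

// Decode the output field back into plain text.
function decode() {
        try {
                input.value = decodeURIComponent(encoded.value);
        } catch (error) {
                // decodeURIComponent throws a URIError on malformed input such as a lone "%".
                input.value = "Invalid URI string";
        }
}
</script>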


Glossary

HTTP

Hypertext Transfer Protocol. A protocol that connects web browsers to web servers when they request content.

Encoding

The act of transferring or saving information into a usable file format.
