DESCRIPTION
A Uniform Resource Identifier (URI) is a short string of characters identifying an abstract or physical resource (for example, a web page). A Uniform Resource Locator (URL) is a URI that identifies a resource through its primary access mechanism (e.g., its network "location"), rather than by name or some other attribute of that resource. A Uniform Resource Name (URN) is a URI that must remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.
URIs are the standard way to name hypertext link destinations for tools such as web browsers. The string "http://www.kernelnotes.org" is a URL (and thus it is also a URI). Many people use the term URL loosely as a synonym for URI (though technically URLs are a subset of URIs).
URIs can be absolute or relative. An absolute identifier refers to a resource independent of context, while a relative identifier refers to a resource by describing the difference from the current context. Within a relative path reference, the complete path segments "." and ".." have special meanings: "the current hierarchy level" and "the level above this hierarchy level", respectively, just as they do in Unix-like systems. A path segment that contains a colon character cannot be used as the first segment of a relative URI path (e.g., "this:that"), because it would be mistaken for a scheme name; precede such segments with "./" (e.g., "./this:that"). Note that descendants of MS-DOS (e.g., Microsoft Windows) replace device-name colons with the vertical bar ("|") in URIs, so "C:" becomes "C|".
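The resolution rules above can be tried out with a short Python sketch. It relies on the standard library's urllib.parse.urljoin; the helper name resolve and the base URI http://example.org/a/b/c are made up for illustration and do not come from this page.

```python
from urllib.parse import urljoin

# Hypothetical base URI standing in for "the current context".
BASE = "http://example.org/a/b/c"


def resolve(reference, base=BASE):
    """Resolve a (possibly relative) URI reference against a base URI."""
    return urljoin(base, reference)


# "." is the current hierarchy level, ".." the level above it.
print(resolve("./d"))           # http://example.org/a/b/d
print(resolve("../d"))          # http://example.org/a/d

# A first segment containing a colon is read as a scheme name,
# so the reference is taken as absolute and returned unchanged.
print(resolve("this:that"))     # this:that

# Prefixing "./" keeps it a relative path.
print(resolve("./this:that"))   # http://example.org/a/b/this:that
```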
A fragment identifier, if included, refers to a particular named portion (fragment) of a resource; text after a # identifies the fragment. A URI beginning with # refers to that fragment in the current resource.
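As a small illustration of fragment handling, the following Python sketch splits a fragment from a URI and resolves a fragment-only reference against the current resource. It uses urldefrag and urljoin from the standard library; the helper names and the example.org document are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def split_fragment(uri):
    """Return (uri_without_fragment, fragment) for the given URI."""
    result = urldefrag(uri)
    return result.url, result.fragment


def fragment_in_current(current, reference):
    """Resolve a reference such as "#top" against the current resource."""
    return urljoin(current, reference)


# Hypothetical document, used only for illustration.
doc = "http://example.org/doc.html"

print(split_fragment(doc + "#section2"))
print(fragment_in_current(doc, "#top"))
```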
USAGE
There are many different URI schemes, each with specific additional rules and meanings, but they are intentionally made to be as similar as possible. For example, many URL schemes permit the authority to take the following format, called here an ip_server (square brackets show what is optional):
hostport
    the LDAP server to query, written as a hostname optionally followed by a colon and the port number. The default LDAP port is TCP port 389. If empty, the client determines which LDAP server to use.
dn
    the LDAP Distinguished Name, which identifies the base object of the LDAP search (see RFC 2253 section 3).
attributes
    a comma-separated list of attributes to be returned; see RFC 2251 section 4.1.5. If omitted, all attributes should be returned.
scope
    specifies the scope of the search, which can be one of "base" (for a base object search), "one" (for a one-level search), or "sub" (for a subtree search). If scope is omitted, "base" is assumed.
filter
    specifies the search filter (subset of entries to return). If omitted, all entries should be returned. See RFC 2254 section 4.
extensions
    a comma-separated list of type=value pairs, where the =value portion may be omitted for options not requiring it. An extension prefixed with a "!" is critical (must be supported to be valid); otherwise it is non-critical (optional).
LDAP queries are easiest to explain by example. Here is a query that asks ldap.itd.umich.edu for information about the University of Michigan in the U.S.:

ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US

To get just its postal address attribute, request:

ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
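To make the structure of such a URL concrete, here is a minimal Python sketch that splits an LDAP URL into the fields described above. It is not a full parser: the function name parse_ldap_url is hypothetical, and the component order (hostport, then dn, then attributes, scope, filter and extensions, each introduced by "?") is an assumption drawn from the field list and the postalAddress example.

```python
from urllib.parse import unquote


def parse_ldap_url(url):
    """Split an ldap:// URL into its fields (illustrative sketch only)."""
    prefix = "ldap://"
    if not url.lower().startswith(prefix):
        raise ValueError("not an ldap URL: %r" % url)
    rest = url[len(prefix):]

    # Everything up to the first "/" is the hostport (may be empty).
    hostport, _, tail = rest.partition("/")

    # Assumed order after the "/": dn?attributes?scope?filter?extensions
    parts = tail.split("?")
    parts += [""] * (5 - len(parts))   # pad omitted trailing fields
    dn, attributes, scope, search_filter, extensions = parts[:5]

    return {
        "hostport": hostport,
        "dn": unquote(dn),
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope or "base",      # "base" is assumed when omitted
        "filter": unquote(search_filter),
        "extensions": extensions.split(",") if extensions else [],
    }


fields = parse_ldap_url(
    "ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress"
)
print(fields["dn"])          # o=University of Michigan,c=US
print(fields["attributes"])  # ['postalAddress']
```

An empty attributes list or an empty filter here simply means the field was omitted, which the field descriptions say should be read as "return everything".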
Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear. For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL.
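The equivalence of "%7e" and "~" can be checked with a small Python sketch that decodes only those escapes which stand for unreserved characters and leaves every other escape untouched. The function name and the example URL are hypothetical, and the unreserved set used here (letters, digits, and "-", ".", "_", "~") is an assumption taken from RFC 3986; the specification this page follows may define the set differently.

```python
import re
import string

# Assumed unreserved set (RFC 3986): ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def unescape_unreserved(uri):
    """Decode %XX escapes only when they encode an unreserved character."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        # Keep the escape as written if decoding it could change meaning.
        return char if char in UNRESERVED else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, uri)


# "%7e" becomes "~", while the escaped slash "%2F" is left alone.
print(unescape_unreserved("http://example.org/%7euser/a%2Fb"))
# http://example.org/~user/a%2Fb
```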
For URIs which must handle characters outside the US ASCII character set, the HTML 4.01 specification (section B.2) and IETF RFC 2718 (section 2.2.5) recommend the following approach: