LDAP: A Gentle Introduction

The perception of LDAP (Lightweight Directory Access Protocol) is ambivalent. On the one hand, it is widely supported as a common authentication backend. On the other hand, documentation is scarce, often poor, and mostly targeted at one particular use case (for example, replacing NIS with LDAP).

Although I am mainly a developer, I’ve been regularly exposed to LDAP and would like to give a very gentle introduction to this field to make the first step easier for others who have to grok this technology.

Introduction

So, what is LDAP? I’m going to spare you the details of its history and jump right in:

An LDAP server is a database.

It is a database with some distinctive properties that make it a directory. The most fundamental one is that it is optimized for reading. You’ll need writing for it to be useful, but essentially, it’s about reading. Therefore, it’s perfect for any kind of white pages or configuration data. Accordingly, it’s mostly known for its usage as a centralized address book or for authentication.

So, we have a read-optimized database, and this database consists of objects, which have attributes. The next important feature is predefined schemas, which should make it easy to adopt LDAP by having conventions for object types that allow for interoperation between unrelated software.

For example, for address books, inetOrgPerson is fine: it contains attributes for most of the information about people you’ll ever need. Which schemas apply to an object is itself recorded in an attribute of that object called objectClass, so the objects are self-describing, which helps parsing. Notably, an object can belong to an arbitrary number of classes. That makes it possible to authenticate Windows and UNIX users against the same object, for example.
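To make the object model a bit more concrete, here is a minimal Python sketch of how you might picture such an entry: a mapping from attribute names to lists of values. The dictionary layout and the helper has_class are my own illustration, not the API of any LDAP library, and the numeric IDs and the home directory are made up.

```python
# An entry pictured as "attribute name -> list of values".
# The layout is an assumption for illustration; the values are invented.
entry = {
    "objectClass": ["top", "person", "organizationalPerson",
                    "inetOrgPerson", "posixAccount"],
    "cn": ["hynek"],
    "sn": ["Schlawack"],
    "givenName": ["Hynek"],
    # Attributes that the posixAccount class adds for UNIX logins:
    "uid": ["hynek"],
    "uidNumber": ["1000"],
    "gidNumber": ["1000"],
    "homeDirectory": ["/home/hynek"],
}


def has_class(entry, name):
    """Return True if the entry lists the given objectClass.

    Object class names are compared case-insensitively.
    """
    classes = (c.lower() for c in entry.get("objectClass", []))
    return name.lower() in classes


print(has_class(entry, "inetOrgPerson"))  # address book software looks for this
print(has_class(entry, "posixAccount"))   # UNIX login software looks for this
```

Because objectClass is just another multi-valued attribute, unrelated programs can each look for the class they care about on the same entry.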

Addressing

So, how do you access the data you fill in? LDAP databases are hierarchical and the addressing works from right to left. First, entries usually have a common name attribute called cn.

For example, cn=hynek. Now, you’re probably going to divide the (in this case) persons into groups called organizational units. Assuming our group is “users”, we’ve got cn=hynek,ou=users. These ou can be as deep and branched as you like. Finally, you define your top directory using domain components (dc): cn=hynek,ou=users,dc=ox,dc=cx. This would be the absolute address of an entry. These addresses are called distinguished names (in short dn) in LDAP. The aggressive use of non-obvious abbreviations (cn, dn, ou, dc, …) is one of the reasons why LDAP appears so confusing at first.
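If the right-to-left addressing feels abstract, the following Python sketch takes a distinguished name apart. It is deliberately simplified: it assumes there are no escaped commas and no multi-valued components, which real DNs may contain. The function names split_dn and parent_dn are my own.

```python
def split_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first.

    Simplified: assumes no escaped commas and no multi-valued components.
    """
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        pairs.append((attr, value))
    return pairs


def parent_dn(dn):
    """Return the DN one level up in the hierarchy."""
    _, _, rest = dn.partition(",")
    return rest


dn = "cn=hynek,ou=users,dc=ox,dc=cx"
pairs = split_dn(dn)
print(pairs)
# [('cn', 'hynek'), ('ou', 'users'), ('dc', 'ox'), ('dc', 'cx')]

print(parent_dn(dn))
# ou=users,dc=ox,dc=cx

# The domain components, read left to right, spell out the domain name.
print(".".join(value for attr, value in pairs if attr == "dc"))
# ox.cx
```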

Accessing

Now that we know what LDAP is and how to address specific entries…how do you actually access them? There are GUI tools for browsing and editing LDAP directories like Apache Directory Studio. In production, you’ll probably use one of the language bindings (for example, for Python, Go, or Java) or the LDIF format which allows manipulation using text files. The best way to understand it is to tinker a bit with all of them.

For querying the directory, there is a whole filter language including logical ands and ors. If you just search for a specific cn, the expression would be: (cn=foobar). If you search for someone with the cn “foo” or the cn “bar”, you might say (|(cn=foo)(cn=bar)).
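Writing these prefix-notation filters by hand gets fiddly quickly, so here is a small Python sketch that assembles them from pieces. The helper names (eq, any_of, all_of) are hypothetical and not taken from any library; the escaping follows the rules for special characters in filter values from RFC 4515.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside filter values (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def eq(attr, value):
    """Build an equality filter such as (cn=foobar)."""
    return f"({attr}={escape_filter_value(value)})"


def any_of(*filters):
    """Combine filters with a logical or."""
    return "(|" + "".join(filters) + ")"


def all_of(*filters):
    """Combine filters with a logical and."""
    return "(&" + "".join(filters) + ")"


print(eq("cn", "foobar"))
# (cn=foobar)
print(any_of(eq("cn", "foo"), eq("cn", "bar")))
# (|(cn=foo)(cn=bar))
print(all_of(eq("objectClass", "inetOrgPerson"), eq("sn", "Schlawack")))
# (&(objectClass=inetOrgPerson)(sn=Schlawack))
```

Escaping matters as soon as the value comes from user input: an unescaped * would turn an exact match into a wildcard search.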

One last thing you’ll encounter pretty often is the so-called search base. It defines the point in the hierarchy below which the filter expression is evaluated. In the example above, such a search base would be ou=users,dc=ox,dc=cx.
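One way to picture the search base: an entry is only considered if its DN ends with the base. The Python sketch below checks exactly that. It is a simplification that compares components case-insensitively and ignores escaping; the ou=devices entry is invented for the example, and is_under is my own helper name.

```python
def normalize(dn):
    """Split a DN into lower-cased, whitespace-stripped components."""
    return [part.strip().lower() for part in dn.split(",")]


def is_under(dn, base):
    """Return True if dn is the base itself or lies somewhere below it.

    Simplified: no escaped commas, all values treated as case-insensitive.
    """
    dn_parts = normalize(dn)
    base_parts = normalize(base)
    if len(dn_parts) < len(base_parts):
        return False
    return dn_parts[len(dn_parts) - len(base_parts):] == base_parts


base = "ou=users,dc=ox,dc=cx"
print(is_under("cn=hynek,ou=users,dc=ox,dc=cx", base))      # True
print(is_under("cn=printer,ou=devices,dc=ox,dc=cx", base))  # False
print(is_under("cn=printer,ou=devices,dc=ox,dc=cx", "dc=ox,dc=cx"))  # True
```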

Putting it all together

To add an example of how such a query might look, let’s search for the address book entry “hynek” inside ou=users,dc=ox,dc=cx:

$ ldapsearch -x '(cn=hynek)' -b 'ou=users,dc=ox,dc=cx'
# extended LDIF
#
# LDAPv3
# base <ou=users,dc=ox,dc=cx> with scope sub
# filter: (cn=hynek)
# requesting: ALL
#
# hynek, users, ox.cx
dn: cn=hynek,ou=users,dc=ox,dc=cx
cn: hynek
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top
sn: Schlawack
givenName: Hynek
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1

The output looks confusing but is harmless. The essential data begins after # hynek, users, ox.cx and ends with the givenName attribute. As I didn’t fill in many attributes (just “givenName” and “sn” (aka surname)), there isn’t much to see. The objectClass attributes that describe the object are interesting, though.
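To show how little is actually going on in that output, here is a rough Python sketch that turns it into a dictionary. It is not a full LDIF parser: it assumes no folded lines and no base64-encoded values (the "::" form), and it relies on the blank line that ldapsearch prints after each entry. The name parse_entries is my own.

```python
LDIF_OUTPUT = """\
# hynek, users, ox.cx
dn: cn=hynek,ou=users,dc=ox,dc=cx
cn: hynek
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top
sn: Schlawack
givenName: Hynek

# search result
search: 2
result: 0 Success
"""


def parse_entries(text):
    """Collect entries from ldapsearch-style output (simplified sketch)."""
    entries = []
    current = None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comments carry no data
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        name, _, value = line.partition(": ")
        if name == "dn":
            current = {"dn": value, "attributes": {}}
            entries.append(current)
        elif current is not None:
            # Attributes may repeat, so every name maps to a list.
            current["attributes"].setdefault(name, []).append(value)
    return entries


entries = parse_entries(LDIF_OUTPUT)
print(entries[0]["dn"])
# cn=hynek,ou=users,dc=ox,dc=cx
print(entries[0]["attributes"]["objectClass"])
# ['inetOrgPerson', 'organizationalPerson', 'person', 'top']
```

The trailing search and result lines are skipped because they appear after the blank line, when no entry is open anymore.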

A little pain for the end

Let’s wrap up with the more complicated parts. LDAP suffers a bit from its heritage. Everything is built around numerical object identifiers (OIDs), so you need an OID of your own before you can define valid schemas. To make it even more complicated, every object class (for example, “Person”) and every attribute (e.g., “Name”) you define has its own ID, which begins with your OID. The nice thing about this is that you can rename attributes painlessly. The bad thing is that it looks awful.

One last OID pain: the data types (e.g., “Unicode String”) are IDs too. So, for a Unicode string of up to 16 characters, you’ll write:

1.3.6.1.4.1.1466.115.121.1.15{16}
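That notation is less scary once you split it into its two parts: the OID of the data type and an optional length in braces. A short Python sketch, with parse_syntax being a made-up helper and the lookup table containing only the one syntax used here (which the standards call Directory String):

```python
import re

# Only the syntax from the example; real servers know many more.
SYNTAX_NAMES = {
    "1.3.6.1.4.1.1466.115.121.1.15": "Directory String",
}


def parse_syntax(spec):
    """Split a syntax like '1.2.3{16}' into (oid, length or None)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\{(\d+)\})?", spec)
    if match is None:
        raise ValueError(f"not a syntax specification: {spec!r}")
    oid, length = match.groups()
    return oid, int(length) if length else None


oid, length = parse_syntax("1.3.6.1.4.1.1466.115.121.1.15{16}")
print(SYNTAX_NAMES[oid], length)
# Directory String 16
```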

Here’s an example from the shipped schemas from OpenLDAP for an attribute:

attributetype ( 2.5.4.9 NAME ( 'street' 'streetAddress' )
    DESC 'RFC2256: street address of this object'
    EQUALITY caseIgnoreMatch
    SUBSTR caseIgnoreSubstringsMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{128} )

A very fine tutorial for writing your own schemas has been published in the Linux Gazette.

Hynek Schlawack