Problems with PAM (Pluggable Authentication Modules)

by Darren Tucker.

This is currently incomplete and has no links to it, so if I didn't send you a link to it and you still found it, I'm impressed!

Disclosure

I am, among other things, a sysadmin and an OpenSSH developer.

As a sysadmin, I was neutral to positive about PAM. Neutral because our environment (lots of individual machines on different networks separated by firewalls) didn't lend itself to the kind of schemes that PAM is often used for (eg Kerberos, RADIUS), but positive because it seemed to provide a bunch of neat capabilities, even if we couldn't use them much.

Since mid-2003, I have done a lot of the work on OpenSSH's PAM interface, and have grown to really dislike PAM. It's not any single thing that makes it so difficult to work with (in the context of OpenSSH), but a number of things that compound to make life far harder than it ought to be.

PAM and SSH don't play nice together

The PAM API and the SSH protocol are a poor match, and this mismatch makes it tricky for an SSH server to support PAM. Niels Möller (the author of the SSH implementation lsh) has also noted this.

From my point of view, the main reasons are:

PAM assumes it has complete control of the authentication, but sshd has additional authentication capabilities that PAM knows nothing about, eg SSH public keys.
PAM goes to great lengths to hide information from the application. There are no hints about what the messages passed to the conversation function mean, the PAM handle is a "blind" structure, and modules can stash their own data in it (with pam_set_data).
The SSH1 and SSH2 protocols are event-driven, but the PAM functions block, and they interact with the user through the conversation function, which is a callback. The combination makes it difficult to interact with the event loop from within the conversation function.
The SSH protocol has some requirements that PAM cannot easily satisfy. For example, at the start of an SSH2 authentication, the client typically sends a request for the "none" authentication method. If the user is permitted to log in without further authentication, the server must reply with success; otherwise it must reply with a list of the authentication methods that can continue.
There's no way to ask PAM directly whether the user is permitted without a password, so the only thing sshd can do is set up a conversation function that just returns an error if called, then call pam_authenticate. This works, but may result in spurious PAM failure messages in log files and/or unnecessary delays. Users tend to hate unnecessary delays, so sshd works around this by skipping the test if PermitEmptyPasswords is set to no.
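A minimal sketch of that probe, under stated assumptions: it does not use the real PAM API. fake_authenticate() is a hypothetical stand-in for pam_authenticate(), and the single-string callback is a simplification of the real conversation function signature, so the example compiles without PAM headers. What it shows is the shape of the trick: install a conversation function that refuses every prompt, and treat success as "no password needed".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified conversation callback (hypothetical): fill in *answer and
 * return 0, or return nonzero to refuse. Real PAM passes arrays of
 * pam_message and pam_response structures instead of a single string. */
typedef int (*conv_fn)(const char *prompt, const char **answer);

/* Hypothetical stand-in for pam_authenticate() backed by one account:
 * an empty stored password succeeds without any conversation, anything
 * else requires a prompt. Returns 0 on success. */
static int fake_authenticate(const char *stored_password, conv_fn conv)
{
    const char *answer = NULL;

    assert(stored_password != NULL && conv != NULL);
    if (stored_password[0] == '\0')
        return 0;                   /* nothing to ask */
    if (conv("Password: ", &answer) != 0)
        return -1;                  /* refused: reported as a failure */
    return (answer != NULL && strcmp(answer, stored_password) == 0) ? 0 : -1;
}

/* The probe's conversation function: never answers anything. */
static int refuse_conv(const char *prompt, const char **answer)
{
    (void)prompt;
    *answer = NULL;
    return -1;
}

/* Reply to the client's "none" request: success only if the stack
 * completed without needing to ask the user anything. */
static int permitted_without_password(const char *stored_password)
{
    return fake_authenticate(stored_password, refuse_conv) == 0;
}
```

With a real PAM stack the refused prompt surfaces as an ordinary authentication failure, which is where the spurious log entries and delays come from.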

PAM Architecture

Blocking functions and callbacks

This is the biggest problem with the PAM architecture with respect to an SSH implementation. The primary interface to PAM is a set of blocking library calls, but the mechanism for interacting with the user is the conversation function, which is a callback. When a PAM function such as pam_authenticate is called, it does not return until the authentication has run to completion and either succeeded or failed, but during this time the conversation function may be called any number of times.

This is a big drawback for event-driven applications (such as sshd): if an event requires a call to PAM, that call blocks until the PAM interaction is complete, and the event loop does not run in the meantime. This is a massive problem if, as in sshd, the interaction with the user itself requires the event loop.
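A common way to live with this, and broadly the approach OpenSSH's PAM support takes, is to run the blocking calls in a separate thread or process and relay the conversation over a socket that the event loop can poll. The sketch below is not OpenSSH's code: it assumes a fork()ed child, and blocking_authenticate() (standing in for pam_authenticate()), the simplified single-prompt callback, and the fixed 64-byte messages are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_LEN 64  /* fixed-size records keep the framing trivial */

/* Simplified callback type, loosely modelled on a PAM conversation function. */
typedef int (*conv_fn)(const char *prompt, char answer[MSG_LEN], void *appdata);

/* Hypothetical stand-in for a blocking call such as pam_authenticate():
 * it does not return until the whole exchange is over. 0 means success. */
static int blocking_authenticate(conv_fn conv, void *appdata)
{
    char answer[MSG_LEN];

    if (conv("Password: ", answer, appdata) != 0)
        return 1;
    return strcmp(answer, "secret") == 0 ? 0 : 1;
}

/* Read or write exactly MSG_LEN bytes. Returns -1 on error or EOF. */
static int xfer(int fd, char *buf, int writing)
{
    size_t done = 0;

    while (done < MSG_LEN) {
        ssize_t n = writing ? write(fd, buf + done, MSG_LEN - done)
                            : read(fd, buf + done, MSG_LEN - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Child-side conversation function: forward the prompt to the parent and
 * block until the parent's event loop sends back an answer. */
static int relay_conv(const char *prompt, char answer[MSG_LEN], void *appdata)
{
    char msg[MSG_LEN] = { 0 };

    assert(appdata != NULL);    /* the socket fd lives in the app context */
    snprintf(msg, sizeof(msg), "%s", prompt);
    if (xfer(*(const int *)appdata, msg, 1) == -1 ||
        xfer(*(const int *)appdata, answer, 0) == -1)
        return -1;
    answer[MSG_LEN - 1] = '\0';
    return 0;
}

/* Run the blocking call in a child so the parent's poll() loop stays live.
 * 'reply' stands in for whatever the client eventually sends. */
static int authenticate_in_child(const char *reply)
{
    int sv[2], status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {             /* child: free to block */
        close(sv[0]);
        _exit(blocking_authenticate(relay_conv, &sv[1]));
    }
    close(sv[1]);
    for (;;) {                  /* parent: the event loop */
        struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
        char msg[MSG_LEN];

        /* A real server would poll its other descriptors here as well. */
        if (poll(&pfd, 1, -1) == -1 || xfer(sv[0], msg, 0) == -1)
            break;              /* error, or EOF because the child is done */
        /* sshd would pass msg to the client and reply in a later event;
         * this sketch answers at once. */
        memset(msg, 0, sizeof(msg));
        snprintf(msg, sizeof(msg), "%s", reply);
        if (xfer(sv[0], msg, 1) == -1)
            break;
    }
    close(sv[0]);
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Here authenticate_in_child("secret") returns 0 and any other reply returns 1. One cost of the separate-process variant is that anything a module changes in the process (environment, credentials, kernel-level state) lands in the child rather than in the server itself.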

This could be improved greatly by implementing some kind of re-entrant conversation function.
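What a re-entrant interface might look like, as a purely hypothetical sketch: no PAM implementation provides auth_step() or these types, and the password and one-time code are hard-coded placeholders. Instead of calling back into the application, each call either finishes or hands back the next prompt and returns at once, so the event loop keeps control between prompts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical re-entrant interface: not part of any PAM implementation. */
enum auth_status { AUTH_PROMPT, AUTH_SUCCESS, AUTH_FAILURE };

struct auth_ctx {
    int step;             /* position in the exchange; start at 0 */
    const char *prompt;   /* valid after AUTH_PROMPT is returned */
};

/* Call once with answer == NULL, then once per prompt with the user's
 * reply. Never blocks and never calls back into the application. */
static enum auth_status auth_step(struct auth_ctx *ctx, const char *answer)
{
    assert(ctx != NULL);
    switch (ctx->step) {
    case 0:                               /* start: ask for a password */
        ctx->step = 1;
        ctx->prompt = "Password: ";
        return AUTH_PROMPT;
    case 1:                               /* check it, then ask for a code */
        if (answer == NULL || strcmp(answer, "secret") != 0)
            break;
        ctx->step = 2;
        ctx->prompt = "One-time code: ";
        return AUTH_PROMPT;
    case 2:                               /* check the code */
        if (answer == NULL || strcmp(answer, "123456") != 0)
            break;
        ctx->step = 3;
        ctx->prompt = NULL;
        return AUTH_SUCCESS;
    default:                              /* exchange already over */
        break;
    }
    ctx->step = -1;                       /* any failure is final */
    ctx->prompt = NULL;
    return AUTH_FAILURE;
}
```

An event-driven server would keep one struct auth_ctx per connection and call auth_step() again whenever the client's reply arrives, with nothing blocking in between.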

Unnecessary complexity

The PAM conversation mechanism also means that the application's interface to PAM is about twice as complicated as it needs to be.

Loadable modules and shared address space

In Firewalls and Internet Security (1994), the authors discuss adding hooks to Unix authentication binaries and offer this advice: "we do not recommend doing this via a shared library; such facilities have been responsible for security problems on a number of different platforms." PAM was first introduced in Solaris 2.3 (released November 1993) as a private interface and in Solaris 2.6 (1997) as a public interface. The security implications of using shared libraries were known in that timeframe, but PAM uses them anyway.

The PAM application and PAM modules are at each other's mercy due to their shared address space. Programming errors in either can trash memory belonging to the other, and there is no easy way to determine which is responsible for a problem: the place where it blows up may have nothing to do with the source of the error.

Along the same lines, because they share an address space, applications and modules are vulnerable to namespace collisions. This risk can be reduced by using unique prefixes for functions and global variables and making as many of them as possible static.
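A sketch of that discipline in a module source file. The module name pam_example and the entry point pam_example_init() are hypothetical; a real module would also export the pam_sm_* service functions, whose names are fixed by the PAM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Module-private state: 'static' gives it internal linkage, so it cannot
 * collide with a global of the same name in the application or in another
 * module loaded into the same process. */
static int debug;

/* Module-private helper, static for the same reason. A non-static
 * parse_args() could clash with an identically named symbol elsewhere. */
static int parse_args(int argc, const char **argv)
{
    int i;

    assert(argc == 0 || argv != NULL);
    debug = 0;
    for (i = 0; i < argc; i++)
        if (strcmp(argv[i], "debug") == 0)
            debug = 1;
    return debug;
}

/* Anything that must be visible outside the module carries a prefix that
 * is unique to it (hypothetical name). */
int pam_example_init(int argc, const char **argv)
{
    return parse_args(argc, argv);
}
```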

Loadable modules also bring with them a series of challenges in resolving conflicts during dynamic linking (eg, if the application is linked against the system's libcrypt.so for crypt() and the module is linked against OpenSSL's libcrypto, which gets used?).

To be fair, running in the application's address space does provide some advantages over alternatives (such as separate helpers as used by BSD auth). For example, the PAM modules can perform kernel-level magic that will "stick" with the application (eg AFS modules can set a PAG). I think the disadvantages outweigh the advantages.

Inconsistencies between implementations

Dereferencing message structures passed to conversation functions

XSSO (p. 89) specifies that the message parameter passed to the conversation function is "a pointer to an array". Solaris PAM treats it as such, but LinuxPAM (and OpenPAM?) treat it as an array of pointers. The two are equivalent for a single message but can blow up horribly for more than one. The LinuxPAM documentation recommends that modules either call the conversation function with only one message at a time, or doubly reference the structures so that both interpretations work.
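The doubly-referenced layout can be sketched as follows. struct message is a stand-in that mirrors the shape of struct pam_message so the example compiles without PAM headers, and the two reader functions model the two interpretations described above. The trick is to keep the messages in one contiguous array and pass an array of pointers into it: msg[i] then points at message i, and (*msg)[i] is message i.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in mirroring the shape of struct pam_message. */
struct message {
    int msg_style;
    const char *msg;
};

/* LinuxPAM's reading: an array of pointers to messages. */
static const char *read_as_pointer_array(const struct message **msg, int i)
{
    return msg[i]->msg;
}

/* Solaris's reading: a pointer to an array of messages. */
static const char *read_as_array_pointer(const struct message **msg, int i)
{
    return (*msg)[i].msg;
}

#define NMSG 3

/* Doubly-referenced layout: messages in one contiguous array, plus an
 * array of pointers where ptrs[i] == &msgs[i]. */
struct message_set {
    struct message msgs[NMSG];
    const struct message *ptrs[NMSG];
};

/* Fill the set and return the value to pass as the conversation
 * function's message parameter. */
static const struct message **message_set_init(struct message_set *s,
                                               const char *const *text)
{
    int i;

    assert(s != NULL && text != NULL);
    for (i = 0; i < NMSG; i++) {
        s->msgs[i].msg_style = 0;   /* style does not matter for layout */
        s->msgs[i].msg = text[i];
        s->ptrs[i] = &s->msgs[i];
    }
    return s->ptrs;
}
```

If the messages were allocated separately instead, the pointer-to-array reading of anything past the first message would walk off the end of message 0, which is the "blow up horribly" case.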

pam_setcred/pam_open_session call order

Solaris PAM and OpenPAM expect pam_open_session to be called before pam_setcred, whereas LinuxPAM expects the opposite.

The Solaris 8 pam_setcred(3PAM) man page says:

     It is typically called after the
     user has been authenticated and after  a  session  has  been
     opened.   See  pam_authenticate(3PAM),  pam_acct_mgmt(3PAM),
     and pam_open_session(3PAM).

Whereas the LinuxPAM pam_setcred(3) man page:

       This function is used to establish, maintain and delete the credentials
       of a user. It should be called after a user has been authenticated  and
       before a session is opened for the user (with pam_open_session(3)).

This seems to show up in third-party modules that assume one order and only work with it (Trusted HP-UX seems to be particularly picky).

Module names and parameters

There is little consistency in module names and parameters across platforms. pam_unix.so usually implements the traditional Unix account semantics, but even this is not consistent everywhere: on Solaris 8 and later it is split into pam_unix_auth.so, pam_unix_account.so and pam_unix_session.so.

Location of header files

This one is trivial but annoying because it's unnecessary. The PAM application header file is <security/pam_appl.h>. Except on Mac OS X, where in the interests of consistency it's <pam/pam_appl.h> instead.
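One way to paper over the difference in code, assuming a compiler that supports __has_include (GCC and Clang do, and it is standard in C23). Build systems often do the same job with configure-time checks instead; the macro name HAVE_PAM_APPL_H is made up for this sketch.

```c
/* Include the PAM application header from wherever the platform keeps it. */
#if defined(__has_include)
#  if __has_include(<security/pam_appl.h>)
#    include <security/pam_appl.h>
#    define HAVE_PAM_APPL_H 1
#  elif __has_include(<pam/pam_appl.h>)
#    include <pam/pam_appl.h>     /* the Mac OS X location */
#    define HAVE_PAM_APPL_H 1
#  endif
#endif

#ifndef HAVE_PAM_APPL_H
#  define HAVE_PAM_APPL_H 0       /* not found, or no __has_include support */
#endif
```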

pam_chauthtok() and real uids

Solaris will not enforce password restrictions if the real uid of the calling process is 0, on the assumption that it's passwd being run as root. Conversely, AIX's pam_chauthtok() won't work at all if the real uid is not 0.

Implementation errors

LinuxPAM will not change conversation function in some cases

Some versions of LinuxPAM will not permit the conversation function to be changed. The pam_set_item(pamh, PAM_CONV, [...]) call succeeds, but the previous conversation function is still called. This has been observed with Red Hat 9 (pam-0.75-48) and Fedora Core 2 (pam-0.77-40), but does not occur with Debian 3.0 (...). This can be demonstrated with this small test program. Note that even though pam_set_item succeeded, pam_chauthtok still uses the old conversation function. This has been mentioned on the pam list and reported as Fedora bug #126985.

Buggy modules

A PAM application may have to deal with any number of buggy modules. Some examples:

Modules that trash the application context: a PAM module may be passed an application context (appdata_ptr in the pam_conv structure, which seems to be present to allow a single conversation function to handle multiple concurrent authentication requests). Some modules, however, do not correctly handle this application data and pass NULL to the conversation function instead of the pointer.
This is a good example of the problems with shared address space and the lack of isolation between application and authenticator: although the bug is in the module, the likely outcome is that the application code will segfault when it attempts to dereference a NULL pointer.
Improper handling of PAM_TTY: Some versions of Solaris had PAM modules that would segfault if PAM_TTY was set to something that did not start with "/dev/", even though XSSO says that PAM_TTY might be anything including an X11 $DISPLAY. (In fairness, this has long since been fixed, but buggy modules still seem to show up occasionally.)
This will also crash the application, but at least the backtrace will point to the buggy module.
LinuxPAM pam_unix.so nullok flag overrides PAM_DISALLOW_NULL_AUTHTOK: see Fedora bug #127054.

The whole of the problem is greater than the sum of the parts.

[todo]

References

A brief introduction to PAM.
Original PAM RFC
X/Open Single Sign-On specification (XSSO)
Linux PAM Documentation
Operating System release dates
Firewalls and Internet Security, William R. Cheswick and Steven M. Bellovin, Addison Wesley, 1994.

Page last modified: $Date: 2022-05-25 $