LDAP3 Authentication

I am looking into using CAT-SOOP and see that it ships with several authentication managers. However, none of them can easily be configured against my university's authentication system, and I cannot find any information about developing additional ones. The easiest option would be a simple login page with an authentication manager that communicates with a remote server using LDAP3. Thus, I am wondering the following:

  1. How is the API for authentication managers defined? That is, how would one generally implement a new authentication manager?
  2. Is it possible to add custom authentication managers to a single subject, or do they have to be defined inside the __AUTH__ directory in the CAT-SOOP library?
  3. If I implemented a standardized LDAP3 login manager, would there be any interest in adding it to CAT-SOOP?

If you are wondering about my use case for CAT-SOOP, I am leaning towards using it for the introductory algorithms course at my university (NTNU). Initially, the aim is to use CAT-SOOP with the queue plugin this fall, where our only use case is a queue system for digital guidance with TAs (as a result of COVID-19). Eventually, I am considering porting our assignments to CAT-SOOP (probably in Fall 2021).

Exciting, thanks for reaching out! I apologize for the lack of documentation, but I'm sure we can help figure this out.

Yes, that would definitely be a welcome addition!

Since catsoop logins carry over across courses on the same catsoop instance, it's not possible to implement them at the level of an individual subject. Actually, it might technically be possible in 2020.2 (due to an oversight) by adding an __AUTH__ directory within your subject, but it's not intended, and future versions will remove that.

From a high-level perspective, what we need is a new subdirectory of the __AUTH__ directory (which it looks like you already discovered). Inside of that directory, there must be a Python file with the same name, for example __AUTH__/dummy/dummy.py. When cs_auth_type = 'dummy', catsoop will look there for how to do authentication.
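To make the naming convention concrete, here is a rough illustration of how a "one directory per type, file named after the directory" layout can be resolved from a cs_auth_type value. This is not catsoop's actual loader (which may work quite differently); load_auth_hook is a hypothetical name used only for illustration.

```python
import importlib.util
import os


def load_auth_hook(auth_root, auth_type):
    """Load <auth_root>/<auth_type>/<auth_type>.py and return its get_logged_in_user.

    Hypothetical illustration of the file-per-type layout; not catsoop's own loader.
    """
    path = os.path.join(auth_root, auth_type, auth_type + ".py")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no auth type {auth_type!r}: expected {path}")
    spec = importlib.util.spec_from_file_location(f"auth_{auth_type}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, which defines get_logged_in_user
    return module.get_logged_in_user
```

With cs_auth_type = 'dummy', the lookup would resolve to __AUTH__/dummy/dummy.py.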

'dummy' is a reasonable place to start looking, because it's by far the most minimal example we have of an authentication type. It implements the one required piece: get_logged_in_user, which takes in a dictionary with a bunch of information about the catsoop instance and returns a dictionary with some additional information: 'username' is the identifier that catsoop should use for this person, 'email' is an e-mail address we can use for that person, and 'name' (not required) is a full name we can use for the person.
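For reference, a minimal module in that shape might look like the following. It is a sketch, not the actual contents of dummy.py: the auth type name 'example' and the cs_example_* settings are hypothetical; only get_logged_in_user and the 'username'/'email'/'name' keys come from the description above.

```python
# __AUTH__/example/example.py  (hypothetical auth type, selected with cs_auth_type = 'example')
# Like 'dummy', it reads the identity from configuration.
# The cs_example_* settings below are hypothetical, not real catsoop options.


def get_logged_in_user(context):
    """Return a dict describing the current user.

    'username' and 'email' are expected; 'name' is optional.
    """
    user = {
        "username": context.get("cs_example_username", "testuser"),
        "email": context.get("cs_example_email", "testuser@example.com"),
    }
    name = context.get("cs_example_name")
    if name:
        user["name"] = name  # full name, only included when configured
    return user
```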

'dummy' just looks up this information from the catsoop configuration file, but other authentication types do more work. The OpenID Connect implementation, for example, does a bit more:

  • It first looks in the user's session data for user information. If that information exists there, it simply returns it.
  • If no user information exists in the session, and a special flag indicating a login attempt has not been set in the query string, we render the page (modifying context['cs_content'] to indicate that the user should log in).
  • If we set the flag asking to log in, we store some information in the session (related to the OpenID Connect request we're going to make), and redirect elsewhere (which then redirects back to catsoop to actually complete the login).
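A rough sketch of that three-step flow is below. It is not the actual OpenID Connect code. Assumptions to check against the catsoop source: session data lives in context["cs_session_data"] and query-string values in context["cs_form"]. The "loginaction" flag, the "oidc_state" session key, the provider URL, and context["example_redirect"] (a stand-in for whatever redirect mechanism catsoop really uses) are all hypothetical.

```python
import secrets

PROVIDER_URL = "https://idp.example.org/authorize"  # hypothetical identity provider


def get_logged_in_user(context):
    session = context.setdefault("cs_session_data", {})  # assumed session location

    # 1. Already logged in: return what the session remembers.
    if "username" in session:
        return {k: session[k] for k in ("username", "email", "name") if k in session}

    # 2. No login attempt requested: render the page with a login prompt.
    if context.get("cs_form", {}).get("loginaction") != "login":
        context["cs_content"] = "You must log in to view this page."
        return {}

    # 3. Login requested: remember request state in the session, then send the
    #    user to the provider, which later redirects back to finish the login.
    session["oidc_state"] = secrets.token_urlsafe(16)
    context["example_redirect"] = f"{PROVIDER_URL}?state={session['oidc_state']}"
    return {}
```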

I think the high-level approach to an LDAP authentication would be similar, but the last step, rather than performing a redirect, would show a login form. Then that form would submit to someplace where we could send the username and password to the LDAP server, and use its response to store the appropriate information in the session.
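A sketch of that LDAP variant, under the same assumptions about context["cs_session_data"] and context["cs_form"] as above (whether POSTed fields show up in cs_form is also an assumption). The "loginaction" values, the form field names, and check_credentials (any callable returning a user dict on success or None on failure) are hypothetical.

```python
LOGIN_FORM = (
    '<form method="POST" action="?loginaction=ldap_submit">'
    '<input name="username"> <input name="password" type="password">'
    '<button type="submit">Log in</button></form>'
)


def ldap_login_step(context, check_credentials):
    session = context.setdefault("cs_session_data", {})
    form = context.get("cs_form", {})

    # Already logged in: later requests short-circuit here.
    if "username" in session:
        return {k: session[k] for k in ("username", "email", "name") if k in session}

    # Form submitted: ask the LDAP server (via check_credentials) and cache the result.
    if form.get("loginaction") == "ldap_submit":
        user = check_credentials(form.get("username", ""), form.get("password", ""))
        if user is not None:
            session.update(user)  # store user info only, never the password
            return user
        context["cs_content"] = "<p>Invalid username or password.</p>" + LOGIN_FORM
        return {}

    # Default: show the login form instead of redirecting elsewhere.
    context["cs_content"] = LOGIN_FORM
    return {}
```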

I'm not at all familiar with LDAP, but it looks like there are at least a couple of packages in PyPI that implement LDAP (this one, for example, looks reasonable). If we were using that package, it does look like the login process would be pretty straightforward (following this example), though we would want to have a way to specify the arguments to the Server object, since different LDAP servers will, presumably, require different settings.
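Assuming the ldap3 package, the credential check itself could be a simple bind with its Server and Connection classes, roughly as sketched here. Assumptions: users map to a DN via a template such as "uid={username},ou=people,dc=example,dc=org" (hypothetical; many directories instead require searching for the user's DN first), and the classes can be injected so the logic can be exercised without a live server. ldap_password_ok is a hypothetical name.

```python
import re

# Conservative allowlist so a username cannot inject extra DN components
# (an assumption; adjust to the directory's actual naming rules).
SAFE_USERNAME = re.compile(r"[A-Za-z0-9._-]+")


def ldap_password_ok(username, password, server_options, user_dn_template,
                     server_cls=None, connection_cls=None):
    """Return True if a simple bind as the user succeeds.

    server_options is passed straight through as Server(**server_options).
    Network errors (e.g. an unreachable server) propagate as ldap3 exceptions.
    """
    if server_cls is None or connection_cls is None:
        from ldap3 import Connection, Server  # real classes unless injected
        server_cls = server_cls or Server
        connection_cls = connection_cls or Connection

    # An empty password would make this an unauthenticated bind, which many
    # servers accept, so reject it (and unsafe usernames) up front.
    if not password or not SAFE_USERNAME.fullmatch(username):
        return False

    server = server_cls(**server_options)
    user_dn = user_dn_template.format(username=username)
    conn = connection_cls(server, user=user_dn, password=password)
    try:
        return bool(conn.bind())  # with default settings, bind() returns False on bad credentials
    finally:
        conn.unbind()
```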

I'm not sure if that's enough to get started on, but I'm certainly happy to continue the conversation, and to help/advise as necessary.

It's maybe also worth mentioning that I'm planning on implementing a more general "single-sign-on" solution as part of version 2020.9 (probably following Discourse's SSO model verbatim), which would send some information off to an arbitrary external page and receive back authentication information.

Once that is done, that would allow you to implement an external LDAP-authenticated page somewhere, and to have it perform the authentication and send back the relevant information to catsoop.
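For a sense of what "following Discourse's SSO model" involves, here is a sketch of its general shape: each side sends a base64-encoded query string plus a hex HMAC-SHA256 signature over it (using a shared secret), and a nonce ties the reply to the original request. How catsoop will name the fields or wire this up is not settled in this thread, so everything beyond that general shape is an assumption; sign_payload and verify_payload are hypothetical names.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def sign_payload(fields, secret):
    """Encode fields as a base64 query string and sign it with HMAC-SHA256."""
    payload = base64.b64encode(urlencode(fields).encode()).decode()
    sig = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return payload, sig


def verify_payload(payload, sig, secret, expected_nonce):
    """Check the signature and nonce, then return the decoded fields."""
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):  # constant-time compare
        raise ValueError("signature mismatch")
    decoded = base64.b64decode(payload).decode()
    fields = {k: v[0] for k, v in parse_qs(decoded).items()}
    if fields.get("nonce") != expected_nonce:
        raise ValueError("nonce mismatch (possible replay)")
    return fields
```

In this model, catsoop would send sign_payload({"nonce": ..., "return_sso_url": ...}); the external LDAP-backed page would verify it, authenticate the user, and reply with a signed payload carrying the same nonce plus username/email.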

It's probably still good to have LDAP authentication built into catsoop (since I imagine a lot of institutions use it), but if it's preferable for your use case, the SSO option will exist in 2020.9 (which will be released by the end of this summer), and I can get it into the dev branch of the catsoop repo sooner rather than later if that's helpful.

Thanks for the detailed introduction. I think I have enough information to start working on this.

The ldap3 library seems to be the "standard" LDAP3 package for Python. We have used it before for this exact purpose, so I already have a reference implementation.

What would be the idiomatic way of handling these arguments in CAT-SOOP? Would it be appropriate to require some properties to be defined in the preload.py file of the course, e.g., cs_auth_ldap3_server, cs_auth_ldap3_ssl, etc.? In that case, how do you access these variables in the authentication manager?

Built-in LDAP would probably still be easier for our use case, but good to know. When will the 2020.9 version be released? I would guess just prior to the start of the term at MIT?

Sounds good! If other questions come up, I'm happy to help.

Yes, the usual way to do this is indeed to allow specifying these things in the global config.py or in the preload.py of a subject.

Regardless of whether they are specified in the config.py or in a preload.py file, those variables will be available from inside the context dictionary that is passed to get_logged_in_user. For example, if I defined foo = 'bar' in my preload.py, I could look that up from within get_logged_in_user via context['foo'].
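Since a missing setting would otherwise surface as a bare KeyError deep inside the login flow, an auth module might wrap the lookup in a small helper like this. require_setting is a hypothetical name; the only behavior taken from the thread is that config.py and preload.py variables appear as keys of the context dict.

```python
def require_setting(context, name):
    """Look up a config.py / preload.py variable from the context dict.

    Raises a descriptive KeyError instead of a bare one when it is missing.
    """
    try:
        return context[name]
    except KeyError:
        raise KeyError(
            f"{name} must be set in config.py or in the subject's preload.py"
        ) from None
```

With foo = 'bar' in preload.py, require_setting(context, 'foo') returns 'bar'.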

I think one could imagine the details going one of two different ways, either:

  1. Expect several different variables, cs_auth_ldap3_server, cs_auth_ldap3_ssl, ...
  2. Expect a single variable cs_ldap3_server_options containing a dictionary that maps the parameter names of the Server class's initializer to the values we want. For example, cs_ldap3_server_options = {'host': 'example.org', 'port': 636, 'get_info': 'ALL', 'use_ssl': True}
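Side by side, the two options might look like this in a preload.py (or config.py). cs_auth_ldap3_server and cs_auth_ldap3_ssl are the names proposed earlier in the thread; cs_auth_ldap3_port is a hypothetical addition for illustration. The option 2 keys are parameters of ldap3's Server initializer, and 'ALL' is the same value as the ldap3.ALL constant.

```python
# Option 1: one variable per setting.
cs_auth_ldap3_server = "ldap.example.org"
cs_auth_ldap3_port = 636  # hypothetical name, not proposed in the thread
cs_auth_ldap3_ssl = True

# Option 2: a single dict whose keys are ldap3.Server's keyword arguments.
cs_ldap3_server_options = {
    "host": "ldap.example.org",
    "port": 636,          # the usual LDAPS port when use_ssl is True
    "use_ssl": True,
    "get_info": "ALL",
}
```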

I tend to lean toward things like option 2, since that allows just using dictionary unpacking on that cs_ldap3_server_options object when creating your Server instance, e.g., s = Server(**context['cs_ldap3_server_options']). This would also mirror the way we're handling connection options when using PostgreSQL for log storage.
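A sketch of option 2 in use: the dict from the context is unpacked straight into the Server initializer. Because ldap3's Server takes explicit keyword parameters, a misspelled key raises TypeError at construction instead of being silently ignored. make_ldap3_server is a hypothetical name, and the server_cls parameter exists only so the logic can be tried without ldap3 installed.

```python
def make_ldap3_server(context, server_cls=None):
    """Build an LDAP Server object from context['cs_ldap3_server_options']."""
    if server_cls is None:
        from ldap3 import Server as server_cls  # real class unless injected
    options = context.get("cs_ldap3_server_options")
    if not isinstance(options, dict) or "host" not in options:
        raise ValueError("cs_ldap3_server_options must be a dict with at least a 'host' key")
    # Each key becomes a keyword argument: Server(host=..., port=..., use_ssl=...)
    return server_cls(**options)
```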

Yep, I agree that it is good to have it built in.

Yes, probably sometime in early- or mid-August, so that it's a little bit ahead of the actual start of our semester.

That said, I think working from the current dev branch is probably fairly safe right now, since I think that most of the big backwards-incompatible changes I'm planning for 2020.9 have already been merged in, and most of the remaining work for the summer will be adding features and fixing bugs. So if you wanted to work from the dev branch for now, you'd probably be in good shape (since whatever is on the dev branch when we decide to "release" will become v2020.9).

Another alternative would be to plan to use 2019.9 (our "long-term stable" release). That version won't get many fancy new features, but we'll generally backport bugfixes into it until September 2021, and there won't be any backwards-incompatible changes during that time. We could certainly backport LDAP3 authentication into the 2019.9 branch if you wanted to go that route.

At MIT, we have a mix of people using those two options (but I doubt anyone here will be on 2020.2 in the fall).