Installation and Configuration Quick-Start

This section of the document is designed to help you get an instance of CAT-SOOP up and running on a server you control. As a general rule, I only test using Debian GNU/Linux, but others have tested these instructions on macOS and on Windows (Cygwin or WSL).

This page is primarily intended for users who are setting up public-facing instances of CAT-SOOP. For instructions on setting up an instance for local debugging, see this page.

1) Install Necessary Software

CAT-SOOP depends on Python (version 3.5+, with pip). Many distributions have Python 3.5+ in their package managers, though it may be necessary to download the source from the official Python site.

On Debian Stretch, you will need the python3 and python3-pip packages (or a version of Python 3.5+ installed in some other way).
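
As a quick sanity check before continuing, the requirement above can be verified from Python itself. The following is only a minimal sketch; the helper names python_is_supported and pip_is_available are illustrative and are not part of CAT-SOOP.

```python
import importlib.util
import sys

MINIMUM_VERSION = (3, 5)  # minimum Python version stated above


def python_is_supported(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def pip_is_available():
    """Return True if the pip module can be found by this interpreter."""
    return importlib.util.find_spec("pip") is not None
```

Calling both functions from the python3 interpreter you intend to use for CAT-SOOP should return True; if either returns False, revisit the package installation above.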

1.1) (Cygwin Only) Patch _pyio

As of the time of this writing (December 2017), the Python version available through Cygwin ships with a broken version of _pyio, which cheroot uses. In order to run CAT-SOOP on a Cygwin host, edit the file /usr/lib/python3.6/_pyio.py so that the first conditional (about sys.platform) reads as follows:

if sys.platform == 'win32':
    from msvcrt import setmode as _setmode
elif sys.platform == 'cygwin':
    import ctypes
    _cygwin1 = ctypes.PyDLL('cygwin1.dll')
    def _setmode(fd, mode):
        return _cygwin1._setmode(ctypes.c_int(fd), ctypes.c_int(mode))
else:
    _setmode = None

2) Download CAT-SOOP

You will also need a copy of the CAT-SOOP source. You can get the most recent version of the code here, or the bleeding-edge version here.

That said, it may be a better idea to clone the development repository instead, with the following command:

$ hg clone https://catsoop.mit.edu/repo/cat-soop

or from the Git mirror:

$ git clone https://catsoop.mit.edu/gitrepo cat-soop

Cloning the repository typically makes it easier to update in the future.

Regardless of the method you use or the version you download, it is important that these files be in a location where the user who will be running the web server has read/write access.
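
One way to confirm the access requirement is a small check like the one below. This is a sketch, not part of CAT-SOOP: the function name can_read_and_write is made up for illustration, and because os.access reports permissions for the user running the script, it is only meaningful when run as the user who will run the web server.

```python
import os


def can_read_and_write(directory):
    """Return True if the current user can list, read, and write `directory`."""
    return os.path.isdir(directory) and os.access(
        directory, os.R_OK | os.W_OK | os.X_OK
    )
```

For example, can_read_and_write('/path/to/cat-soop') should return True when called as the web server's user, where /path/to/cat-soop stands in for wherever you placed the source.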

3) Install Python Dependencies

Install CAT-SOOP's Python dependencies by navigating to the source directory and running:

$ pip3 install -r requirements.txt

Note

You may need to preface the above with sudo, depending on the location of your Python interpreter.

4) Configure CAT-SOOP

In the scripts directory of the source distribution, there is a script called setup_catsoop.py. Run this script and answer the questions it poses:

$ python3 scripts/setup_catsoop.py

This will create a file called config.py in the catsoop directory of the source code. This file will contain system-wide configuration.

Note

You are strongly encouraged to enable encryption if the directory in which you are storing the logs is not already encrypted in some way (e.g., via `luks` or `gocryptfs` or `cryfs`, etc).

4.1) Check Web Settings

Double-check the values of cs_url_root and cs_checker_websocket in config.py. For example:

cs_url_root = 'http://localhost:6010'
cs_checker_websocket = 'ws://localhost:6011'

Typically, on a public-facing server, cs_url_root will start with https, and cs_checker_websocket will start with wss.
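
The pairing described above (http with ws, https with wss) can be checked mechanically. The snippet below is a minimal sketch under that assumption; schemes_consistent is a hypothetical helper, not something CAT-SOOP provides.

```python
from urllib.parse import urlsplit

# Each web scheme and the websocket scheme it is normally paired with.
EXPECTED_WEBSOCKET_SCHEME = {"http": "ws", "https": "wss"}


def schemes_consistent(cs_url_root, cs_checker_websocket):
    """Return True if the websocket URL's scheme matches the web URL's scheme."""
    expected = EXPECTED_WEBSOCKET_SCHEME.get(urlsplit(cs_url_root).scheme)
    return expected == urlsplit(cs_checker_websocket).scheme
```

With the local example values above, schemes_consistent('http://localhost:6010', 'ws://localhost:6011') is True, while mixing https with ws would return False.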

Double Check

Make sure that the cs_fs_root directory can be read from and written to by the web server's user.

Make sure that the cs_data_root directory is not web-accessible, and that the web server's user has read/write access.
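
A rough first check for the second point is to confirm that cs_data_root does not live underneath the directory your web server serves files from. The sketch below assumes you know that directory (for example, /var/www/html on many Debian nginx installations); is_inside is an illustrative name. Passing this check is necessary but not sufficient, since other server configuration could still expose the directory.

```python
import os


def is_inside(path, parent):
    """Return True if `path` is `parent` itself or lies somewhere beneath it."""
    path = os.path.realpath(path)
    parent = os.path.realpath(parent)
    try:
        return os.path.commonpath([path, parent]) == parent
    except ValueError:
        # Raised, for example, for paths on different drives on Windows.
        return False
```

You would want is_inside(cs_data_root, web_root) to be False, where web_root is your server's document root.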

By default, the start_catsoop.py script will start several processes. The most important are the WSGI server (default port 6010) and the websocket server (default port 6011). You can change these ports by setting the additional variables cs_wsgi_server_port and cs_checker_server_port, respectively, in your config.py.
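
For instance, the relevant lines of config.py might look like the following. The port numbers here are arbitrary values chosen for illustration, not recommendations.

```python
# Hypothetical port numbers, chosen only for illustration.
cs_wsgi_server_port = 7010     # port for the WSGI server (default: 6010)
cs_checker_server_port = 7011  # port for the websocket server (default: 6011)
```

Anything that forwards traffic to these servers needs to use the same port numbers.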

5) Configure nginx

Next, we will configure nginx to forward relevant traffic to the web server and the websocket server.

Start by creating a new file in /etc/nginx/sites-available with the following content, which will configure nginx to route certain requests to CAT-SOOP, and to redirect all HTTP traffic to HTTPS.

You can, of course, customize the endpoints (/cat-soop and /reporter in the example below) to change the base URL for both the WSGI server and the websocket server.

Note

If you do not already have one and you are planning to make a public-facing server, you should acquire an SSL certificate. If your server is running at MIT, you can follow the instructions on this page. Otherwise, SSL/TLS certificates are available gratis from Let's Encrypt.

# redirect all HTTP traffic to HTTPS
server {
    listen 80;
    listen [::]:80;
    return 301 https://$host$request_uri;
}

server {
    # listen on port 443 (standard port for HTTPS traffic) and enable SSL
    listen 443 ssl;
    listen [::]:443 ssl;

    # the following should reference your SSL cert and key file
    ssl_certificate     /path/to/certificate-chain.crt;
    ssl_certificate_key /path/to/keyfile.key;

    # by default, serve files from /var/www/html
    root /var/www/html;

    # set the server's name (change this to reflect your server's FQDN)
    server_name your.server.com;

    # try adding trailing slashes before 404'ing
    location / {
        try_files $uri $uri/ =404;
    }

    # ignore .ht* files
    location ~ /\.ht {
        deny all;
    }

    # the following will route requests to https://your.server.com/cat-soop
    # to the WSGI server.  change "cat-soop" in the following lines if you
    # want to use a different URL.
    location /cat-soop {
            rewrite /cat-soop/?(.*) /$1 break;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_cache_bypass $http_upgrade;
            proxy_pass http://localhost:6010/;
    }

    # the following will route websocket requests to
    # wss://your.server.com/reporter to CAT-SOOP's websocket server.
    location /reporter {
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_cache_bypass $http_upgrade;
            proxy_pass http://localhost:6011/;
    }
}

Double Check

Note that your cs_url_root and cs_checker_websocket should match the nginx configuration. In the example above, we should have cs_url_root = 'https://your.server.com/cat-soop' and cs_checker_websocket = 'wss://your.server.com/reporter' in the config.py file.

Also, make sure the ports (6010 and 6011 in the example above) match the port numbers you set in config.py, if any.
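
To make the correspondence concrete, the sketch below derives the two config.py values from the server name and the two nginx endpoints. The function expected_settings and its parameter names are hypothetical and exist only to illustrate the mapping; it assumes the HTTPS setup shown in the example configuration.

```python
def expected_settings(fqdn, web_endpoint="/cat-soop", websocket_endpoint="/reporter"):
    """Return the config.py values implied by an HTTPS nginx configuration."""
    return {
        "cs_url_root": "https://%s%s" % (fqdn, web_endpoint),
        "cs_checker_websocket": "wss://%s%s" % (fqdn, websocket_endpoint),
    }


settings = expected_settings("your.server.com")
print(settings["cs_url_root"])
print(settings["cs_checker_websocket"])
```

If you customize the location blocks in the nginx file, passing the same endpoint strings here shows what the two variables should become.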

Note

If you want the root of the webserver to point to the CAT-SOOP instance, you can remove the block labeled location / from the example above, change location /cat-soop to be location /, and comment out the rewrite line within that block.

5.1) (Optional) Client Certificate Authentication

If you would like to enable authentication based on client certificates (instead of username/password), add the following lines beneath the other ssl_ configuration variables in the NGINX configuration file:

    ssl_client_certificate /path/to/client_ca.pem;
    ssl_verify_client on;

where /path/to/client_ca.pem is the location on disk of the CA certificate with which client certificates are signed.

6) (Optional) Configure Workers

6.1) Web Server

By default, CAT-SOOP uses cheroot as its WSGI server. You can cause CAT-SOOP to launch more than one worker by setting cs_wsgi_server_port to a list of integers instead of a single integer. In this case, you will also likely want to configure NGINX to balance the load between the different processes by following the instructions on this page (and, importantly, including the ip_hash; directive so users' sessions are not lost).
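
As a rough illustration of what such a setup involves, the sketch below takes a list of worker ports and prints a matching nginx upstream block that includes ip_hash. It is not taken from the linked instructions: the helper nginx_upstream_block, the upstream name catsoop_workers, and the port numbers are all assumptions made for this example.

```python
def nginx_upstream_block(ports, name="catsoop_workers", host="localhost"):
    """Build an nginx `upstream` block with one `server` line per worker port."""
    lines = ["upstream %s {" % name, "    ip_hash;"]
    lines.extend("    server %s:%d;" % (host, port) for port in ports)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical worker ports; 6011 is skipped because it is the default
# port of the websocket server.
cs_wsgi_server_port = [6010, 6012, 6013]
print(nginx_upstream_block(cs_wsgi_server_port))
```

The printed block would go outside the server block in the nginx file, and the proxy_pass line for the CAT-SOOP location would then refer to the upstream by name (http://catsoop_workers/ in this sketch) rather than to a single port.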

It is also possible to use uWSGI instead of cheroot. To do so, set cs_wsgi_server = 'uwsgi' in your config.py file. To make uWSGI spawn multiple worker processes, set the cs_wsgi_server_min_processes and cs_wsgi_server_max_processes variables. When using uWSGI, you do not need to do any special NGINX configuration for load balancing.
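
In config.py, that might look like the following. The variable names come from the paragraph above; the process counts are placeholder values, not tuned recommendations.

```python
cs_wsgi_server = 'uwsgi'

# Hypothetical bounds on the number of uWSGI worker processes.
cs_wsgi_server_min_processes = 2
cs_wsgi_server_max_processes = 8
```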

6.2) Checker

By default, CAT-SOOP's checker will run at most 1 check at a time. If you have the resources available, you can configure the checker to run multiple checks in parallel by setting cs_checker_parallel_checks to a larger (integer) number in your config.py.
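
For example, to allow up to four checks at once (the value 4 is arbitrary and should be chosen to suit your hardware):

```python
# Hypothetical value; the default described above is 1.
cs_checker_parallel_checks = 4
```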

7) Start CAT-SOOP

From within the scripts directory of the CAT-SOOP source, run the following command to start CAT-SOOP:

$ python3 start_catsoop.py

On a typical webserver, it is a good idea to run the command using nohup so that the process does not die when you hang up. For example,

$ nohup python3 start_catsoop.py > /dev/null &

8) Test Configuration

Direct your web browser to your cs_url_root and you should now see the CAT-SOOP default page!
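
If the page does not load, one quick thing to rule out is that the WSGI and websocket servers are not listening locally at all. The following is a small diagnostic sketch, not a CAT-SOOP tool; port_is_open is an illustrative name, and 6010 and 6011 are the default ports mentioned earlier.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running port_is_open('localhost', 6010) and port_is_open('localhost', 6011) on the server should both return True once CAT-SOOP has started. If they do and the page still fails to load, the problem is more likely in the nginx configuration or in cs_url_root.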

9) (Optional) Sign Up for Mailing List

catsoop-users@mit.edu is a low-volume list used to announce updates to CAT-SOOP, as well as a place to ask questions about usage. You can subscribe here.

Note also that the way to report issues is by sending an e-mail to catsoop-dev@mit.edu. You are also welcome to subscribe to that list, and you are encouraged to do so if you plan on participating in CAT-SOOP development.

10) (Optional) Configure Backups

All of CAT-SOOP's data are stored in files on disk in a directory called __LOGS__ in the cs_data_root location specified above. CAT-SOOP itself will not back these files up, but there are many strategies for backups using common utilities.

I have used many approaches in the past, but my usual approach involves setting the __LOGS__ directory up as a Mercurial or Git repository, and then setting up a cron job to commit all files in that repository and push to several locations. This approach has several advantages over simply using rsync or scp to copy the folder to a remote machine. In particular, it allows you to roll back to any past backup while keeping size down by only storing diffs (instead of storing a complete copy of each file for each backup).

Here, we'll set up a backup using Git (which tends to be more efficient for this purpose, both in time and in memory, than Mercurial). To set this up, first move yourself to the __LOGS__ directory and run git init, followed by git add -A . and git gc --aggressive. This will set your __LOGS__ directory up as a Git repository. You can then set up a cron job to commit all changes and push these changes to an arbitrary number of backup locations (local or remote).

The following example script (/home/catsoop/do_backup.sh) was used by several classes in fall 2018. It commits local changes to a Git repository, and it then pushes those changes to one local location (on a separate disk) and to one remote location.

#!/bin/bash
cd /home/catsoop/cat-soop-data/__LOGS__;
git add -A;
git commit -m "$(date +'%Y-%m-%d:%H:%M')";
git push /storage2/backup master;
git push catsoop@cat-soop.org:backups/py master;

It can then be configured to run, for example, twice an hour (at xx:05 and xx:35) with the following crontab entry:

5,35 * * * * /usr/bin/flock -n /tmp/backup.lockfile /home/catsoop/do_backup.sh 2>&1 >/dev/null
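
If you would rather keep the backup logic in Python, the same steps could be sketched roughly as follows. This is an illustrative variant rather than the script those classes used: the function names are made up, the paths and remotes simply mirror the shell example, and unlike the shell script it skips the commit when nothing has changed and stops at the first git command that fails.

```python
import subprocess
from datetime import datetime

# Locations copied from the shell example above; adjust for your setup.
LOGS_DIR = "/home/catsoop/cat-soop-data/__LOGS__"
REMOTES = ("/storage2/backup", "catsoop@cat-soop.org:backups/py")


def run_git(args, cwd):
    """Run one git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git"] + list(args),
        cwd=cwd,
        check=True,  # raise CalledProcessError if git exits with an error
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


def backup(logs_dir=LOGS_DIR, remotes=REMOTES, git=run_git, now=None):
    """Stage and commit any changes in `logs_dir`, then push to each remote."""
    git(["add", "-A"], logs_dir)
    # `git status --porcelain` prints nothing when there is nothing to commit.
    if git(["status", "--porcelain"], logs_dir).strip():
        stamp = (now or datetime.now()).strftime("%Y-%m-%d:%H:%M")
        git(["commit", "-m", stamp], logs_dir)
    for remote in remotes:
        git(["push", remote, "master"], logs_dir)
```

Saving this in a file that ends with a call to backup() and pointing the crontab entry at that file (run with python3) would take the place of do_backup.sh. The git parameter exists only so the command runner can be swapped out, for example when testing.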

11) (Optional) Set Up Local Python Sandbox

See this page.