3 Managing

Mongrel2 is designed to be easy to deploy and to make that deployment easy to automate. This is why it stores its configuration in SQLite and uses m2sh as the interface for creating that configuration. Doing this lets you access the configuration using any language that works for you, augment it, alter it, migrate it, and automate it.

In this chapter, I’m going to show you how to make a basic configuration using m2sh and all the commands that are available. You’ll learn how the configuration system is structured so that you know what goes where, but in the end it’s just a simple storage mechanism.

Note 2: Apparently SQL Inspires FUD

When I first started talking about Mongrel2, I said I’d store the configuration in SQLite and do a Model-View-Controller kind of design. Immediately, people who can’t read flipped out and thought this meant they’d be back in “Windows registry hell”, but with SQL as their only way to access it. They thought that they’d be stuck writing configurations with SQL; that SQL couldn’t possibly configure a web server.
They were wrong on many levels. Nobody was ever going to make anyone use SQL. That was repeated over and over but, again, people don’t read and love spreading FUD. The SQLite config database is nothing like the Windows Registry. No other web server really uses a true hierarchy; they just cram a relational model into a weirdo configuration format. The real goal was to make a web server that was easy to manage from any language, and then give people a nice tool to get their job done without having to ever touch SQL. EVER!
In the end, what we got despite all this fear mongering is a bad ass configuration tool and a design that is simple, elegant, and works fantastically. If you read that Mongrel2 uses SQLite and thought this was weird, well, welcome to the future. Sometimes it’s weird out here (even though Postfix has been doing this for a decade or more).

3.1 Model-View-Controller

When you hear Model-View-Controller, you think about web applications. This is a design pattern where you place different concerns into different parts of your system and try not to mix them too much. For an interactive application, if you keep the part that stores data (Model) separated from the logic (Controller) and use another piece to display and interact with the user (View), then it’s easier to change the system and adapt it over time to new features.

The power of MVC is simply that these things really are separate orthogonal pieces that get ugly if they’re mixed together. There’s no math or theory that says why; just lots of experience has told us it’s usually a bad idea. When you start mixing them, you find out that it’s hard to change for new requirements later, because you’ve sprinkled logic all over your web pages. Or you can’t update your database because there’s all these stored procedures that assume the tables are a certain way.

Mongrel2 needed a way to allow you to use various languages and tools to automate its configuration. Letting you automate your deployments is the entire point of the server. The idea was that if we gave you the Controller and the Model, then you can craft any View you wanted, and there’s no better Model than a SQL database like SQLite: it’s embeddable, easily accessed from C or any language, portable, small, fast enough and full of all the features you need and then some.

What you are doing when you use m2sh (from tools/m2sh) to build a configuration for Mongrel2 is working with a View we’ve given you to create a Model for the Mongrel2 server to work with. That’s it, and you can create your own View if you want. It could be automated deployment scripts, a web interface, monitoring scripts, anything you need.

The point is, if you just want to get Mongrel2 up and running, then use m2sh. If you want to do more advanced stuff, then get into the configuration database schema and see what you can do. The structure of the database very closely matches Mongrel2’s internal structure, so understanding that means you understand how Mongrel2 works. This is a vast improvement over other web servers like Apache where you’ve got no idea why one stanza has to go in a particular place, or why information has to be duplicated.

With Mongrel2, it’s all right there.
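As a rough sketch of what building your own View could look like, the Python below reads servers and hosts out of a config database with the standard sqlite3 module. The table and column names here (server, host, server_id and friends) are assumptions for illustration, not a copy of the real schema, so the example builds a tiny stand-in database in memory. Check the actual schema of your config.sqlite before pointing a script like this at it.

```python
import sqlite3

# Hypothetical stand-in schema; the real config.sqlite may differ.
SCHEMA = """
CREATE TABLE server (id INTEGER PRIMARY KEY, name TEXT,
                     default_host TEXT, port INTEGER);
CREATE TABLE host (id INTEGER PRIMARY KEY, server_id INTEGER, name TEXT);
"""


def list_servers(conn):
    """Return (server name, port, [host names]) for every server row."""
    result = []
    for server_id, name, port in conn.execute(
            "SELECT id, name, port FROM server ORDER BY id"):
        hosts = [row[0] for row in conn.execute(
            "SELECT name FROM host WHERE server_id = ? ORDER BY id",
            (server_id,))]
        result.append((name, port, hosts))
    return result


# Build a throwaway database so the sketch runs without Mongrel2 installed.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO server VALUES (1, 'test', 'localhost', 6767)")
conn.execute("INSERT INTO host VALUES (1, 1, 'localhost')")

for name, port, hosts in list_servers(conn):
    print(name, port, hosts)
```

The same idea works from any language with a SQLite binding, which is the whole reason for keeping the Model in a plain database file.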

3.2 Trying m2sh

To give this configuration system a try, you just need to run the test configuration used in the unit tests. Let’s try doing a few of the most basic commands with this configuration.

First, make sure you are in the mongrel2 source and you’ve run the build so that you get the tests/config.sqlite file primed. This is our base test case that we use in unit testing. After you have that, do this:

Source 5: Sample m2sh Commands

# get list of the available servers to run 
m2sh servers -db tests/config.sqlite 
 
# see what hosts a server has 
m2sh hosts -db tests/config.sqlite -server test 
 
# find out if a server named 'test' is running
m2sh running -db tests/config.sqlite -name test

# start a server whose default host is 'localhost'
m2sh start -db tests/config.sqlite -host localhost

At this point, you should have seen lists of servers and hosts, seen that mongrel2 is not running, and then started it. You can find out about all the commands and get help for them with m2sh help or m2sh help --on command.

You can now try doing some simple starting, stopping and reloading using sudo (make sure you CTRL-c to exit from the previous start command):

Source 6: Starting, Stopping, Reloading

# start it so it runs in the background via sudo 
m2sh start -db tests/config.sqlite -host localhost -sudo 
tail logs/error.log 
 
# reload it 
m2sh reload -db tests/config.sqlite -host localhost 
tail logs/error.log 
 
# hit it with curl to see it do the reload
curl http://localhost:6767/ 
tail logs/error.log 
 
# see if it’s running then stop it 
m2sh running -db tests/config.sqlite -host localhost 
m2sh stop -db tests/config.sqlite -host localhost

Note 3: Warning: contents may be broken

If m2sh start runs fine, but m2sh start -sudo fails, you may need to link /proc in your chroot, using mkdir -p proc && sudo mount --bind /proc proc. You can also try installing ZeroMQ from source, if you’d rather avoid putting your /proc where it might get seen.

Awesome, right? Using just this one little management tool you are able to completely manage a Mongrel2 instance without having to hack on a config file at all. But you probably need to know how this is all working anyway.

3.2.1 What The Hell Just Happened?

You now have done nearly everything you can to a configuration, but you might not know exactly what’s going on. Here’s an explanation of what’s going on behind the scenes:

When you did m2sh start with the -sudo option, it actually runs sudo mongrel2 tests/config.sqlite localhost to start the server.
Mongrel2 is now running in the background as a daemon process, just like a regular server. However, what it did was chroot to the current directory and then drop privileges so that they match the owner of that directory (you). Use ps aux to take a look.
With Mongrel2 running, you can look in the logs/error.log file to see what it said. It should be a bunch of debug logging, but check out the messages: nice and detailed.
Next you did a soft reload with m2sh reload and you should notice that your mongrel2 process was able to load the new config without restarting.
However, there’s a slight bug that doesn’t do the reload until the next request is served. That’s what the curl http://localhost:6767/ was for.
Now that you can see this reload work in logs/error.log, you used m2sh running to see if it’s running. This command is just reading the config database to find out where the PID file is (run/mongrel2.pid) and then checking if that process is running.
Finally, you tell mongrel2 to stop, and since it dropped privileges to be owned by you, you can do that without having to use sudo.

All of this is happening by reading the tests/config.sqlite file and not reading any configuration files. You can now try building your own configuration that matches this one or some others.
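The check that m2sh running performs (find the PID file, then see whether that process is alive) can be sketched in Python like this. This is not m2sh’s actual code, the function name is made up, and it assumes a POSIX system where sending signal 0 only tests for existence. Don’t run it on Windows, where os.kill behaves differently.

```python
import os


def is_running(pid_file):
    """Return True if the PID stored in pid_file names a live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or garbage PID file

    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file, no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# With the test configuration the PID file lives at run/mongrel2.pid.
print(is_running("run/mongrel2.pid"))
```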

3.3 A Simple Configuration File

To configure a new config database you’ll write a file that looks a lot like a configuration file. It looks like a Python file, because it comes from the first m2sh we wrote in Python (living in examples/python), but now it’s written in C. Even though it was rewritten, we managed to keep the same format, and even made it a little easier by making commas optional in most places.

First you load your configuration into a fresh database using m2sh load. For our example, we’ll use the example configuration from examples/configs/sample.conf to make a simple one:

Source 7: Simple Little Config Example

main = Server(
    uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    default_host="localhost",
    name="test",
    pid_file="/run/mongrel2.pid",
    port=6767,
    hosts = [
        Host(name="localhost", routes={
            '/tests/': Dir(base='tests/', index_file='index.html',
                             default_ctype='text/plain')
        })
    ]
)

servers = [main]

If you aren’t familiar with Python, then this code might look freaky, but it’s really simple. We’ll get into how it’s structured in a second, but to load this file we would just do this:

Source 8: Loading The Simple Config

m2sh load -config examples/configs/sample.conf 
ls -l config.sqlite 
m2sh servers 
m2sh hosts -server test 
m2sh start -name test

Notice that we didn’t have to tell m2sh that the database was config.sqlite. It assumes that is the default, as well as that mongrel2.conf is the config file you want. If you use those two files, then you never have to type those parameters again.

With this sequence of commands you:

Create a raw fresh config database named config.sqlite and load the sample.conf into it.
List the servers it has configured.
List the hosts that server has, with what routes it has.
Start this server to try it out.

By now you should be getting the hang of the pattern here, which is to use m2sh and a configuration “script” to generate .sqlite files that Mongrel2 understands.
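If you want to drive that pattern from a deployment script, a thin wrapper around the command line is enough to start with. The sketch below only uses the single-dash options shown in this chapter (-config, -db, -name). The helper names m2sh_args and m2sh are hypothetical, and value-less switches such as -sudo are not handled.

```python
import subprocess


def m2sh_args(command, **options):
    """Build an argv list such as ['m2sh', 'load', '-config', 'x.conf']."""
    args = ["m2sh", command]
    for name, value in options.items():
        args += ["-" + name, str(value)]
    return args


def m2sh(command, **options):
    """Run m2sh and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(m2sh_args(command, **options), check=True)


# Print the commands instead of running them, so this works without m2sh.
steps = [
    m2sh_args("load", config="examples/configs/sample.conf", db="config.sqlite"),
    m2sh_args("start", db="config.sqlite", name="test"),
]
for step in steps:
    print(" ".join(step))
```

Swap the print loop for calls to m2sh(...) once m2sh is on your PATH.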

3.4 How A Config Is Structured

The base structure of a Mongrel2 configuration is:

Server

This is the root of a config, and you can have multiples of these in one database, even though each start command only runs one at a time.

Host

Servers have Hosts inside them, which say what DNS hostname Mongrel2 should answer for. You can have multiples of these in each Server.

Route

Hosts have Routes in them, which tell Mongrel2 what to do with URL paths and patterns that match them. Routes then have Dir, Handler or Proxy items in them.

Dir: A Dir serves files out of a directory, complete with 304 and ETag support, default content types, and most of the things you need to serve them.
Proxy: A Proxy takes requests matching the Route they’re attached to and sends them to another HTTP server somewhere else. Mongrel2 will then act as a full proxy and also try to keep connections open in keep-alive mode if the browser supports it.
Handler: A Handler is the best part of Mongrel2. It takes HTTP requests, and turns them into nicely packed and processed ZeroMQ messages for your asynchronous handlers.

Each of these nested “objects” then has a set of attributes you can use to configure them, and most of them have reasonable defaults.

3.4.1 Server

The server is all about telling Mongrel2 where to listen on its port, where to chroot, and general server specific deployment gear.

uuid: A UUID is used to make sure that each deployed server is unique in your infrastructure. You could easily use any string that’s letters, numbers, or - characters.
chroot: This is the directory that Mongrel2 should chroot to and drop privileges.
access_log: The access log file relative to the chroot. Usually starts with a ‘/’. Make sure you configure your server so that this and other files aren’t accessible, or make this owned by root.
error_log: The error log file, just like access_log.
pid_file: Where the PID file is stored within the chroot directory, just like the access log.
default_host: The server has a bunch of hosts listed, but it needs to know what the default host is. This is also used as a convenient way to refer to this Server.
bind_addr: The IP address to bind to. By default, it’s “0.0.0.0”.
port: The port the server should listen on for new connections.

3.4.2 Host

A host is matched using a kind of inverse route that matches the ending of Host: headers against a pattern. You’ll see how this works when we talk about routes, but for now you just need to know that requests to the Server.port are routed based on the Host configurations the Server contains.

name: The name that you use to talk about this Host in the server configuration.
matching: This is a pattern that’s used to match incoming Host: headers for routing purposes.
server: If you want to set the server separately, you can use this attribute.
maintenance: This will be a setting for the future that will let you have Mongrel2 throw up a maintenance page for this host.
routes: This is a dict (hashmap) of the URL patterns mapped to the targets that should be run.

3.4.3 Route

The Route is the workhorse of the whole system. It uses some very fancy but still simple code in Mongrel2 to translate Host: headers to Hosts and URL paths to Handlers, Dirs, and Proxies.

path: This is the path pattern that matches a route. The pattern uses the Mongrel2 pattern language, which is a reduced version of the Lua pattern matching system.
reversed: Determines if this pattern is reversed, which is useful for matching file extensions, hostnames, and other naming systems where the ending is really the prefix. Usually you don’t set this.
host: You can use this attribute to set the host manually.
target: This is the target that should handle the request, either a Dir, Handler or Proxy.

Later on, you’ll learn about the pattern matching that’s used. It’s basically a stripped-down version of your normal regular expressions, with a few convenient syntaxes for doing simple string matching. When you configure a route, you write something like /images/(.*.jpg) and the part before the ‘(’ is used as a fast matched prefix, while the part after it is considered a pattern to match. When a request comes in, Mongrel2 quickly finds the longest prefix that matches the URL, and then tests its pattern if there is one. If the pattern matches, the request goes through. If not, 404.
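The prefix/pattern split described above fits in a few lines of Python. The function name split_route is made up for illustration, and this is not how Mongrel2 does it internally; it only shows where the route string gets cut.

```python
def split_route(route):
    """Split a route into (prefix, pattern); pattern is None if absent."""
    prefix, paren, rest = route.partition("(")
    if not paren:
        return route, None  # no '(' means the whole route is a prefix
    return prefix, paren + rest


print(split_route("/images/(.*.jpg)"))
print(split_route("/users"))
```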

3.4.4 Dir

A Dir is a simple directory-serving route target that serves files out of a directory. It has caching built-in, handles if-modified-since, ETags, and all the various bizarre HTTP caching mechanisms as RFC-accurately as possible. It also has default content-types and index files.

base: This is the base directory from the chroot that is served. Files should not be served outside of this base directory, even if they’re in the chroot.
index_file: This is the default index file to use if a request doesn’t give one. The Dir also will do redirects if a request for a directory doesn’t end in a slash.
default_ctype: The default Content-Type to use if none matches the MIMEType table.

Currently, we don’t offer more parameters for configuration, but eventually you’ll be able to tweak more and more of the settings to control how Dirs work.

3.4.5 Proxy

A proxy is used so that you can use Mongrel2 but not have to throw out your existing infrastructure. Mongrel2 goes to great pains to make sure that it implements a fast and dead-accurate proxy system internally, but no matter how good it is, it can’t compete with ZeroMQ handlers. The idea with giving Proxy functionality is that you can point Mongrel2 at existing servers, and then slowly carve out pieces that will work as handlers.

addr: The DNS address of the server.
port: The port to connect to.

Requests that match a Proxy route are still parsed by Mongrel2’s incredibly accurate HTTP parser, so that your backend servers should not receive badly formatted HTTP requests. Responses from a Proxy server, however, are sent unaltered to the browser directly.

3.4.6 Handler

Now we get to the best part: the ZeroMQ Handlers that will receive asynchronous requests from Mongrel2. You need to use the ZeroMQ syntax for configuring them, but this means with one configuration format you can use handlers that are using UDP, TCP, Unix, or PGM transports. Most testing has been done with TCP transports.

send_spec: This is the 0MQ sender specification. Something like tcp://127.0.0.1:9999 will use TCP to connect to a server on 127.0.0.1 at port 9999. The type of socket used is a PUSH socket, so that handlers receive messages in round-robin style.
send_ident: This is an identifier (usually a UUID) that will be used to register the send socket. This makes it so that messages are persisted between crashes.
recv_spec: Same as the send spec, but it’s for receiving responses from Handlers. The type of socket used is a SUB socket, so that a cluster of Mongrel2 servers will receive handler responses but only the one with the right recv_ident will process it.
recv_ident: This is another UUID if you want the receive socket to subscribe to its messages. Handlers properly mention the send_ident on all returned messages, so you should either set this to nothing and not subscribe, or set it to the same value as send_ident.

The interesting thing about the Handler configuration is that you don’t have to say where the actual backend handlers live. Did you notice you aren’t declaring large clusters of proxies, proxy selection methods, or anything else, other than two 0MQ endpoints and some identifiers? This is because Mongrel2 is binding these sockets and listening. Mongrel2 doesn’t actively connect to backends; they connect to Mongrel2. This means, if you want to fire up 10 more handlers, you just start them; no need to restart or reconfigure Mongrel2 to make them active.

3.4.7 Others

There’s also Log, MIMEType, and Setting objects/tables you can work with, but we’ll get into those later, since you don’t need to know about them to understand the Mongrel2 structure.

3.5 A More Complex Example

All of this knowledge about the Mongrel2 configuration structure can now be used to take a look at a more complex example. We’ll take a look at this example and I’ll just say what’s going on, and you try to match what I’m saying to the code. Here’s the examples/configs/mongrel2.conf file:

Source 9: Mongrel2.org Config Script

# heres a sample directory
test_directory = Dir(base='tests/',
                     index_file='index.html',
                     default_ctype='text/plain')

# a sample proxy route
web_app_proxy = Proxy(addr='127.0.0.1', port=8080)

chat_demo_dir = Dir(base='examples/chat/static/',
                    index_file='index.html',
                    default_ctype='text/plain')

# a sample of doing some handlers
chat_demo = Handler(send_spec='tcp://127.0.0.1:9999',
                    send_ident='54c6755b-9628-40a4-9a2d-cc82a816345e',
                    recv_spec='tcp://127.0.0.1:9998', recv_ident='')

handler_test = Handler(send_spec='tcp://127.0.0.1:9997',
                       send_ident='34f9ceee-cd52-4b7f-b197-88bf2f0ec378',
                       recv_spec='tcp://127.0.0.1:9996', recv_ident='')

# your main host
mongrel2 = Host(name="mongrel2.org", routes={
    '@chat': chat_demo,
    '/handlertest': handler_test,
    '/chat/': web_app_proxy,
    '/': web_app_proxy,
    '/tests/': test_directory,
    '/testsmulti/(.*\.json)': test_directory,
    '/chatdemo/': chat_demo_dir,
    '/static/': chat_demo_dir,
    '/mp3stream': Handler(
        send_spec='tcp://127.0.0.1:9995',
        send_ident='53f9f1d1-1116-4751-b6ff-4fbe3e43d142',
        recv_spec='tcp://127.0.0.1:9994', recv_ident='')
})

# the server to run them all
main = Server(
    uuid="2f62bd5-9e59-49cd-993c-3b6013c28f05",
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    pid_file="/run/mongrel2.pid",
    default_host="mongrel2.org",
    name="main",
    port=6767,
    filters = [],
    hosts=[mongrel2]
)

settings = {"zeromq.threads": 1, "upload.temp_store":
    "/home/zedshaw/projects/mongrel2/tmp/upload.XXXXXX",
    "upload.temp_store_mode": "0666"
}

servers = [main]

If you haven’t guessed yet, this configuration is what’s used on http://mongrel2.org to configure the main test system. In it we’ve got the following things to check out:

Our basic server, with a default host of mongrel2.org.
The route targets are separated out into their own variables, unlike the sample.conf file where they’re just tossed into one big structure.
First target is a Dir that serves up files out of the tests directory and uses index.html as its default file.
Next we setup a Proxy pointing at the main website’s server for testing the proxy.
Then there’s a Dir target for the http://mongrel2.org:6767/chatdemo/ that we’ll look at later. You MUST have flash for this to work!
And you have the Handler for the same chat demo that does the actual logic of a chat system.
After that’s a little Handler for testing out doing HTTP requests to a handler. Notice how even though the chat demo and this handler use different protocols (chat demo is using JSSockets) you don’t have to tell mongrel2 that? It figures it out based on how they’re being used rather than by configurations.
With all those handler targets, we can now make the mongrel2 Host with all the routes assigned once, nice and clean. However, look how I was lazy and just tossed the mp3stream demo right into the routes dict? You can totally do this and m2sh will figure it out. Remember also that you can use the 'blah' string format so you don’t have to double up on your \ chars in the patterns.
We then assign this mongrel2 variable as the hosts for the main server.
There is also a settings feature, which is just a dict of global settings you can tweak. In this case, we’re setting the number of threads that 0MQ uses for its operations, along with where uploads are temporarily stored.
Finally, we commit the whole thing to the database by passing in the servers to save and the settings to use.

And that, my friends, is the most complex configuration we have so far.

3.6 Routing And Host Patterns

The pattern code was taken from Lua and is some of the simplest code for doing fast pattern matches. It is very much like regular expressions, except it removes a lot of features you don’t need for routes. Also, unlike regular expressions, URL patterns always match from the start. Mongrel2 uses them by breaking routes up into a prefix and pattern part. It then uses routes to find the longest matching prefix and then tests the pattern. If the pattern matches, then the route works. If the route doesn’t have a pattern, then it’s assumed to match, and you’re done.

The only caveat is that you have to wrap your pattern parts in parentheses, but these don’t mean anything other than to delimit where a pattern starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to work.

Here’s the list of characters you can use in your patterns:

. (period) All characters.
\a Letters.
\c Control characters.
\d Digits.
\l Lowercase letters.
\p Punctuation characters.
\s Space characters.
\u Uppercase letters.
\w Alphanumeric characters.
\x Hexadecimal digits.
\z The \0 character (null terminator).
[set] Just like a regex’s [] where set is a set of chars, like [0-9] for all digits.
[^set] Inverse character set, so [^0-9] is anything but digits.
* Longest match of 0 or more of the preceding character.
+ Longest match of 1 or more of the preceding character.
- Shortest match of 0 or more of the preceding character.
? 0 or 1 match of the preceding character.
\bxy Balanced match of a substring starting with x and ending in y. So \b() will match balanced parentheses.
$ End of the string.

Using the uppercase version of an escaped character makes it work the opposite way (e.g., \A matches any character that isn’t a letter). The backslash can be used to escape the following character, disabling its special abilities (e.g., \\ will match a backslash).

Anything that’s not listed here is matched literally.

Note 4: Sorry, Unicodians, It’s All ASCII

Yep, I get it. You think that everyone should use UTF-8 or some Unicode encoding for everything. You despise the dominance of the ‘A’ in ASCII and hate that you can’t put your spoken language right in a URL.
Well, I hate to say it, but tough. Protocols are hard enough without having to worry about the bewildering mess that is Unicode. When you sit down to write a network protocol, the last thing you need is a format that’s inconsistent, has multiple interpretations, can’t be properly capitalized or lowercased, and requires extra translation steps for every operation. With ASCII, every computer just knows what it is, and it’s the fastest for creating wire protocol formats.
This is why, on the Internet, you have to do things to URLs to make them ASCII, like encoding them with % signs. It’s in the standard, and it’s the smart thing to do. I don’t want to have to know the difference between the various accents in your spoken language to route a URL around. I just want to deal with a fixed set of characters and be done with it. Don’t blame me or Mongrel2 for this, it’s just the way the standard is and the way to get a server that is stable and works.
Protocols work better when there’s less politics in their design. This means you can’t put Unicode into your URL patterns. I mean, you can try; but the behavior is completely undefined.

Here are some example routes you can try to get a feel for the system:

"/images/" This will just match any path that starts with /images/, without any patterns.
"/" The fastest possible route you can have.
"/images/(.*.jpg)" Match only requests for jpg images in the images directory. Keep in mind that this isn’t actually looking in the directory, it’s just matching the (.*.jpg) pattern.
"/images/(\a-\-\d+\.jpg)" A more complex example that matches the shortest sequence of 0 or more letters (remember -), then a dash (\- escapes the -), then a sequence of 1 or more digits, and finally .jpg, with the \. escaping the period.

That should give you the idea of how you can use them. Notice also that I’m using the Python "blah" string syntax, which is interchangeable with the 'blah' syntax, so I don’t have to double escape everything.
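One way to play with these patterns without a running server is to translate them into Python regular expressions. The sketch below is only an approximation under a few assumptions: it is not the pattern code Mongrel2 uses, to_regex is a made-up name, [set] expressions are copied through unchanged on the assumption that simple ranges mean the same thing in both syntaxes, and the \b balanced match has no regex equivalent so it’s rejected.

```python
import re
import string

# Character classes from the table above as regex set bodies (ASCII only).
CLASSES = {
    "a": "A-Za-z",
    "c": r"\x00-\x1f\x7f",
    "d": "0-9",
    "l": "a-z",
    "p": re.escape(string.punctuation),
    "s": r" \t\n\r\f\v",
    "u": "A-Z",
    "w": "A-Za-z0-9",
    "x": "0-9A-Fa-f",
    "z": r"\x00",
}


def to_regex(pattern):
    """Translate a Mongrel2-style pattern into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "b":
                raise NotImplementedError("balanced match is not supported")
            if nxt.lower() in CLASSES:
                negate = "^" if nxt.isupper() else ""  # uppercase inverts
                out.append("[" + negate + CLASSES[nxt.lower()] + "]")
            else:
                out.append(re.escape(nxt))  # escaped literal such as \- or \.
            i += 2
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated set in %r" % pattern)
            out.append(pattern[i:end + 1])  # sets are copied through as-is
            i = end + 1
        elif ch == "-":
            out.append("*?")  # shortest match of 0 or more
            i += 1
        elif ch in ".*+?$()":
            out.append(ch)  # same meaning in both syntaxes
            i += 1
        else:
            out.append(re.escape(ch))  # everything else is literal
            i += 1
    return "".join(out)


regex = re.compile(to_regex(r"(\a-\-\d+\.jpg)"))
print(bool(regex.match("photo-123.jpg")))
print(bool(regex.match("photo.jpg")))
```

re.match anchors at the start of the string, which lines up with the rule that URL patterns always match from the start.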

3.6.1 How Routing Works

The routing algorithm is actually kind of simple, but it’s an unfamiliar algorithm to most programmers. I won’t go into the details of how a “Ternary Search Tree” works, but basically it lets you match one prefix against a bunch of other strings very fast. This data structure lets Mongrel2 very quickly determine the target for a route, and also know if it has a route at all. Typically, it can match a route in just a few characters, and reject a route in even fewer.

For practical usage, it’s better to just read how it works, rather than how it’s implemented. Here’s how Mongrel2 matches an incoming URL against routes you’ve given it:

Your configuration has a route for "/images/(.*.jpg)" and "/users".
Mongrel2 loads these and converts them to PREFIX/PATTERN pairs. For the first one the PREFIX="/images/", PATTERN="(.*.jpg)". For the second one, it’s PREFIX="/users" and PATTERN=None.
It stores these in the URL routes by their PREFIX, and there can be only one PREFIX at a time. This means you can’t put "/foo/(.*)" and "/foo/" in at the same time (that’s always redundant anyway).
A request comes in for /images/hello.jpg so Mongrel2 takes the whole URL and searches for the longest first route that can possibly match. In this case, that’s the /images route.
It checks if the route it found has a pattern, and if it does then it runs the pattern match code for the whole thing. If they match, then this is the target and it’s good. If not, it returns a 404. In this case the /images URL and patterns match so it’s good.
Next, a request comes in for /users/johndoe/1234.
Mongrel2 does the PREFIX search again, and the longest matching prefix is the route for "/users" so it gets that from the routing table.
Since the /users route doesn’t have a PATTERN, then this is the route and it passes by default. No pattern matching code is run.
Now for a slightly confusing result: A request comes in for /us. Since a PREFIX for "/users" exists, and it’s the longest “first” match, it will match that route. If you wanted this condition to fail, you’d need to be explicit and add on a pattern like "/users()$" to say you need an exact match. Another option is to give a "/" route for a default location (which usually happens).
Finally, a request comes in for /XRAY. This will match no prefix at all, so it gets a 404.

That example should show you how routes work, and the important thing to realize is that they’ll try to match the “longest first route” as what we call the “best” route. If you get unexpected routing behavior, then you’ll want to just make them explicit by putting a pattern at the end.

Finally, here’s some examples directly from the unit test that we have for the routing system. Imagine we have these routes:

"/" == handler0
"/users/([0-9]+)" == handler1
"/users" == handler2
"/users/people/([0-9]+)$" == handler3
"/cars-fast/([a-z]-)$" == handler4

Then this is how a set of example requests would match:

/users/1234/testing - handler1
/users - handler2
/users/people/1234 - handler3
/cars-fast/cadillac - handler4
/users/1234 - handler1
/ - handler0
/usersBLAHAHAHAHA - handler2
/us - handler2

Work through those in your head so you make sure you understand them.
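If you’d rather check your answers with code, here’s a simplified Python model of the prefix search. It is not the ternary search tree Mongrel2 uses, it skips the pattern test entirely (every pattern in the examples above happens to match), and the rule for paths that stop partway through a prefix, such as /us, is an assumption picked because it reproduces the listed results. The names find_route and prefix_of are made up.

```python
ROUTES = {
    "/": "handler0",
    "/users/([0-9]+)": "handler1",
    "/users": "handler2",
    "/users/people/([0-9]+)$": "handler3",
    "/cars-fast/([a-z]-)$": "handler4",
}


def prefix_of(route):
    """Everything before the first '(' is the fast-matched prefix."""
    return route.split("(", 1)[0]


def find_route(path, routes):
    """Pick a route by prefix only; pattern testing is left out."""
    by_prefix = {prefix_of(route): route for route in routes}

    # A path that stops partway through a prefix (such as /us) still lands
    # on a route; taking the shortest such prefix reproduces the examples.
    longer = [p for p in by_prefix if p.startswith(path)]
    if longer:
        return by_prefix[min(longer, key=len)]

    # Otherwise take the longest prefix that the path starts with.
    shorter = [p for p in by_prefix if path.startswith(p)]
    if shorter:
        return by_prefix[max(shorter, key=len)]

    return None  # no prefix matches at all, which means a 404


for path in ["/users/1234/testing", "/users", "/usersBLAHAHAHAHA", "/us"]:
    print(path, "-", ROUTES[find_route(path, ROUTES)])
```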

3.6.2 JSON/XML Message Routing Syntax

Mongrel2 works with Flash sockets out of the box (with WebSockets coming soon) and can handle either XML messages or special JSON messages. It does this by modifying the parser it has internally to parse out HTTP or (exclusive) XML and JSON messages. This feature can be used by any TCP client, not just Flash; it just happens to be a simple way to send simple async messages without using HTTP.

To make it work, there’s a slight modification to the routes used by JSON or XML messages. Basically, JSON routes start with a '@' and XML routes start with a '<' and both must be terminated with a NUL byte '\0'. When the parser sees these at the beginning of a request, it parses that message and sends it “as-is” to your target handler.

Let’s look at two examples from the chat demo and from some test suites:

"@chat": chat_demo
"<test": xml_demo

The first one will take any Flash (or just TCP connection) that sends lines like @chat {"msg": "hello"}\0 and route those to the chat_demo handler. You can connect, and then just stream these JSON messages all you want, and handlers can send back the same responses. In fact, as long as you don’t include a '\0' character, you could probably send anything you want.

The second route will take any XML that is wrapped in a <test> tag and send that to your handlers. That means you can send <test name="joe"><age>21</age></test> and it will send it to xml_demo.
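The JSON framing is simple enough to sketch from the client’s side: a route starting with '@', a space, the JSON body, and a terminating NUL byte. The helper names below are hypothetical, and the sketch says nothing about what Mongrel2 hands to your handler; it only builds and takes apart the bytes on the wire.

```python
import json


def frame_json(route, payload):
    """Build one JSON socket message: '@route {json}' plus a NUL byte."""
    text = "@%s %s" % (route, json.dumps(payload))
    return text.encode("ascii") + b"\0"


def split_frames(buffer):
    """Split received bytes into complete frames and a leftover tail."""
    *frames, tail = buffer.split(b"\0")
    return frames, tail


def parse_json_frame(frame):
    """Return (route, payload) for a frame built by frame_json."""
    route, _, body = frame.decode("ascii").partition(" ")
    if not route.startswith("@"):
        raise ValueError("not a JSON route: %r" % route)
    return route[1:], json.loads(body)


wire = frame_json("chat", {"msg": "hello"})
frames, tail = split_frames(wire)
print(parse_json_frame(frames[0]))
```

Keeping the leftover tail around matters when you read from a real socket, since a recv call can return half a message.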

This is powerful because Mongrel2 now becomes a generic XML or JSON messaging server very easily. For example, I wrote a simple little BBS demo with Mongrel2 and wrote a very basic terminal client in Python for people to use instead of the browser. Look at examples/bbs/client.py to see how that works in full, but the meat of it is:

Source 10: BBS Client JSON Socket Handling

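import json
import socket
from base64 import b64decode

# host and port are supplied by the rest of the client.
CONN = socket.socket()
CONN.connect((host, port))

def read_msg():
    # Replies are base64-encoded JSON terminated by a NUL byte.
    reply = b""

    ch = CONN.recv(1)
    while ch and ch != b'\0':  # an empty read means the server hung up
        reply += ch
        ch = CONN.recv(1)

    return json.loads(b64decode(reply))

def post_msg(data):
    # The @bbs route, a space, the JSON body, then the NUL terminator.
    msg = '@bbs %s\x00' % (
        json.dumps({'type': 'msg', 'msg': data}))
    CONN.sendall(msg.encode('ascii'))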

In that code, notice how (for historical reasons due to Flash sucking) the response is base64 encoded, but your handler doesn’t have to do that. You can just adopt the same protocol back. Other than that, the BBS example client is just opening a socket and sending messages, but Mongrel2 is converting them to messages to backend handlers for processing.

Finally, here are the grammar rules in the parser for handling these messages:

Source 11: JSON/XML Message Grammar

rel_path = ( path? (";" params)? ) ("?" query)?; 
SocketJSONStart = ("@" rel_path); 
SocketJSONData = "{" any* "}" :>> "\0";

SocketXMLData = ("<" [a-z0-9A-Z\-.]+)
    ("/" | space | ">") any* ">" :>> "\0";
 
SocketJSON = SocketJSONStart " " SocketJSONData; 
SocketXML = SocketXMLData; 
 
SocketRequest = (SocketXML | SocketJSON);

If you read that carefully, you’ll see you can actually pass query strings and path parameters to your JSON socket handlers. That’s currently not used, but we might use it in the future.
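
If Ragel isn’t your thing, here’s a loose approximation of those rules as Python 3 regular expressions. It’s only a sketch: rel_path is simplified to “anything without a space, '?' or NUL”, the names SOCKET_JSON, SOCKET_XML and parse are invented for the example, and the real parser is stricter about what a path may contain.

```python
import re

# Rough stand-in for SocketJSON: "@" rel_path " " "{" any* "}" then NUL.
# The optional "?query" part is what lets a query string ride along.
SOCKET_JSON = re.compile(
    rb"@(?P<path>[^ ?\0]*)(?:\?(?P<query>[^ \0]*))? (?P<body>\{[^\0]*\})\0"
)

# Rough stand-in for SocketXMLData: "<" and a tag name, then "/", whitespace
# or ">", then any bytes up to a ">" that is immediately followed by NUL.
SOCKET_XML = re.compile(
    rb"(?P<body><(?P<tag>[a-z0-9A-Z\-.]+)[/\s>][^\0]*>)\0"
)

def parse(frame):
    """Return (kind, fields) for one NUL-terminated frame, or None."""
    match = SOCKET_JSON.fullmatch(frame)
    if match:
        return "json", match.groupdict()
    match = SOCKET_XML.fullmatch(frame)
    if match:
        return "xml", match.groupdict()
    return None

print(parse(b'@chat?room=1 {"msg": "hello"}\0'))
print(parse(b'<test name="joe"><age>21</age></test>\0'))
```\0')
assert kind == "xml" and fields["tag"] == b"test"
assert parse(b"<test>\0") is None
assert parse(b"hello\0") is None
</test>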

One caveat to this whole feature is these targets can only be routed to the Server.default_host of the server. There’s not enough information in these routes to determine a target host (like the Host: header in HTTP) so you can only send it to the default target host.

3.7 Deployment Logs And Commits

A very nice feature for people doing operations work is that m2sh keeps track of all the commands you run on it while you work, and lets you add little commit logs to the log for documentation later. These commit logs are then maintained even across m2sh load commands so you can see what’s going on. They track who did something, what server they did it on, what time they did it and what they did.

To see the logs for your own tests, just do m2sh log -db simple.sqlite and then, if you want to add a commit log message, you use the m2sh commit command. Here’s an example from mongrel2.org:

Source 12: Example Commit Log

> m2sh log
[2010-07-18T04:14:53, mongrel2@zedshaw, init_command] /usr/bin/m2sh init
[2010-07-18T04:15:06, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:22:06, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:23:32, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:26:16, mongrel2@zedshaw, upgrade] Latest code for Mongrel2.
[2010-07-18T18:05:59, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T20:09:01, mongrel2@zedshaw, init_command] /usr/bin/m2sh config
[2010-07-18T20:09:02, mongrel2@zedshaw, load_command] /usr/bin/m2sh config
> m2sh commit -what mongrel2.org -why "Testing things out."

The motivation for this feature is the trend of ops storing server configurations in revision control systems like git or etckeeper. This works great for holding the configuration files, but it doesn’t tell you what happened on each server. In many cases, the configuration files also need to be reworked or altered for each deployment. With the m2sh log and commit system, you can augment your revision control with deployment action tracking.
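
If you want to feed that log into your own tooling, the output format is regular enough to pick apart. This is a small Python 3 sketch that parses lines shaped like the ones in Source 12. The field names (when, who, where, what, detail) are my reading of the bracketed columns, not names that m2sh itself documents, so treat them as assumptions.

```python
import re
from collections import Counter

# [timestamp, who@where, what] free-form detail
LOG_LINE = re.compile(
    r"^\[(?P<when>[^,]+), (?P<who>[^@,]+)@(?P<where>[^,]+), "
    r"(?P<what>[^\]]+)\] (?P<detail>.*)$"
)

def parse_log(text):
    """Return one dict per log line, skipping anything that doesn't match."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries

sample = """\
> m2sh log
[2010-07-18T04:14:53, mongrel2@zedshaw, init_command] /usr/bin/m2sh init
[2010-07-18T04:15:06, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:26:16, mongrel2@zedshaw, upgrade] Latest code for Mongrel2.
"""

entries = parse_log(sample)
# Count how often each kind of action shows up.
print(Counter(entry["what"] for entry in entries))
```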

Later versions of Mongrel2 will keep small amounts of statistics which will link these actions to changes in Mongrel2 behavior like frequent crashing, failures, slowness, or other problems.

Basically, there’s nowhere to hide. Mongrel2 will help operations figure out who needs to get fired the next time Twitter goes down.

3.8 Control Port

Just before the release of 1.0, we added a feature called the “Control Port”, which lets you connect to a running Mongrel2 server over a unix (domain) socket and give it control commands. These commands let you get the status of running tasks, list the currently connected sockets and how long they’ve been connected, get the server’s current time, and kill a connection. Using this control port, you can then implement any monitoring and timeout policies you want, and provide better status.

By default, the control port is in your chroot at run/control, but you can set the control_port setting to change this. You can actually change it to any valid ZeroMQ spec you want, although you’re advised to use IPC for security.

Once Mongrel2 starts, you can then use m2sh to connect to Mongrel2 and control it using the simple command language. Currently, what you get back is very raw, but it will improve as we work on the control port and what it does.

The commands you can issue are:

stop: Stops the server using a SIGINT.
reload: Reloads the server using a SIGHUP.
terminate: Terminates the server with SIGTERM.
help: Prints out a simple help message.
uuid: Gives you the server’s UUID.
info: More information about the server.
status what=tasks: Dumps a JSON formatted dict (object) of all the currently running tasks and what they’re doing. Think of it like an internal ps command.
status what=net: Dumps a JSON dict that matches connections IDs (same ones your handlers get) to the seconds since their last ping. In the case of an HTTP connection this is how long they’ve been connected. In the case of a JSON socket this is the last time a ping message was received.
time: Prints the unix time the server thinks it’s using. Useful for synching.
kill id=ID: Does a forced close on the socket that is at this ID from the status net command. This is a rather violent way to kill a connection so don’t do it that often, but if you’re overloaded then this is where to go.
control_stop: Shuts down the control port permanently in case you want to keep it from being accessed for some reason.

You then use the control port by running m2sh:

m2sh control -every 
 
m2 [test]> help 
name  help 
stop  stop the server (SIGINT) 
reload  reload the server 
help  this command 
control_stop  stop control port 
kill  kill a connection 
status  status, what=['net'|'tasks'] 
terminate  terminate the server (SIGTERM) 
time  the server’s time   
uuid  the server’s uuid 
info  information about this server 
 
m2 [test]> info 
port:  6767 
bind_addr:  0.0.0.0 
uuid:  f400bf85-4538-4f7a-8908-67e313d515c2 
chroot:  ./ 
access_log:  .//logs/access.log 
error_log:  /logs/error.log 
pid_file:  ./run/mongrel2.pid 
default_hostname:  localhost 
 
m2 [test]>
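
As an example of the kind of timeout policy you could build on this, here’s a Python 3 sketch that takes the result of status what=net and works out which connections to kill. It assumes the reply has already been decoded into a dict mapping connection ID to seconds since the last ping, which is how the command is described above; the function names and the sample numbers are invented.

```python
def idle_connections(net_status, max_idle_seconds):
    """Return IDs whose last ping is older than the limit, most idle first."""
    stale = [(seconds, conn_id)
             for conn_id, seconds in net_status.items()
             if seconds > max_idle_seconds]
    return [conn_id for seconds, conn_id in sorted(stale, reverse=True)]

def kill_commands(net_status, max_idle_seconds):
    """Build the 'kill id=ID' command lines for every stale connection."""
    return ["kill id=%s" % conn_id
            for conn_id in idle_connections(net_status, max_idle_seconds)]

# Made-up sample: connection 7 has been quiet for over five minutes.
sample = {"3": 4, "7": 310, "9": 125}
print(kill_commands(sample, 120))
```

Remember that kill is the violent option, so a real policy would probably want a generous limit.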

The protocol to and from the control socket is a simple tnetstring in and out that any language can read. Here’s a nearly complete Python client that is using the control port:

Source 13: Python Control Port Example

import zmq 
from mongrel2 import tnetstrings 
from pprint import pprint 
 
CTX = zmq.Context() 
 
addr = "ipc://run/control" 
 
ctl = CTX.socket(zmq.REQ) 
 
print "CONNECTING" 
ctl.connect(addr) 
 
while True: 
    cmd = raw_input("> ") 
    # will only work with simple commands that have no arguments 
    ctl.send(tnetstrings.dump([cmd, {}])) 
 
    resp = ctl.recv() 
 
    pprint(tnetstrings.parse(resp)) 
 
ctl.close()

You obviously don’t need to do this, but should you want to do something special like a management interface, this is your start.
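
The client above only handles commands without arguments. Here’s a Python 3 sketch of how a line like status what=net could be turned into the same [name, {args}] shape and encoded. The tnetstring encoder follows the published format (length, colon, payload, one-byte type tag), but sending every argument as a string value is my assumption, so check it against your server before relying on it. The names dump and request are local to this example.

```python
def dump(value):
    """Encode a value as a tnetstring: LENGTH:PAYLOAD plus a type tag."""
    if isinstance(value, bool):
        payload, tag = (b"true" if value else b"false"), b"!"
    elif value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), b","
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, (list, tuple)):
        payload, tag = b"".join(dump(item) for item in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(dump(k) + dump(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("cannot encode %r" % type(value))
    return str(len(payload)).encode("ascii") + b":" + payload + tag

def request(line):
    """Turn 'status what=net' into an encoded [name, {args}] request."""
    name, *pairs = line.split()
    # ASSUMPTION: argument values are passed through as plain strings.
    args = dict(pair.split("=", 1) for pair in pairs)
    return dump([name, args])

print(request("time"))
print(request("status what=net"))
```

You would hand the result of request() to ctl.send() in place of the tnetstrings.dump() call.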

3.9 Multiple Servers

A Mongrel2 process itself does not have any support for running multiple servers; instead, it takes two simple parameters: a sqlite config database and a server uuid that names the server to be launched. This is done to keep the mongrel2 code simple and workable.

However.

Mongrel2’s m2sh does support launching multiple servers from a single configuration database. By passing -every to many m2sh commands, you are able to perform actions on all configured servers at once. You can also perform actions on single servers by specifying their uuid, name or host. If any parameter given is ambiguous (that is if, for example, you search with -host localhost and your config contains two servers which attempt to bind to localhost), m2sh will list the matching servers and ask you to clarify your selection.

For example:

> m2sh start -db config.sqlite -every
Launching server localhost XXX on port 6768
...
Launching server localhost XXX on port 6767
...

> m2sh start -db config.sqlite -host localhost
Not sure which server to run, what I found:
NAME HOST UUID
--------------
localhost localhost XXX
localhost localhost XXX
* Use -every to run them all.

> m2sh start -db config.sqlite -uuid XXX
Launching server localhost XXX on port 6767
...

> m2sh running -db config.sqlite -every
Found server localhost XXX RUNNING at PID 28525
PID file run/mongrel2.pid not found for server localhost XXX

> m2sh stop -db config.sqlite -every
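
The selection rules are simple enough to sketch in a few lines of Python 3. This is not m2sh’s code, just an illustration of the behavior described above; the server records, the placeholder UUIDs AAA and BBB, and the select_servers function are all made up.

```python
def select_servers(servers, every=False, **criteria):
    """Pick servers to act on.

    Returns (chosen, candidates). chosen is the list to act on;
    candidates is non-empty only when the criteria were ambiguous
    and the user needs to clarify.
    """
    if every:
        return list(servers), []
    matches = [server for server in servers
               if all(server.get(key) == wanted
                      for key, wanted in criteria.items())]
    if len(matches) > 1:
        return [], matches
    return matches, []

# Hypothetical config with two servers that both bind to localhost.
servers = [
    {"uuid": "AAA", "name": "main", "host": "localhost", "port": 6767},
    {"uuid": "BBB", "name": "test", "host": "localhost", "port": 6768},
]

print(select_servers(servers, every=True))        # both servers
print(select_servers(servers, host="localhost"))  # ambiguous, nothing chosen
print(select_servers(servers, uuid="AAA"))        # exactly one server
```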

3.10 Tweakable Expert Settings

Many of Mongrel2’s internal settings are configurable using the settings system. Some of these are dangerous to mess with, so make sure you test any changes before you try to run them. Setting them to 0 or negative numbers isn’t checked, so if you make a setting and things go crazy, you need to not make that setting. All of these have good defaults so you can leave them alone unless you need to change them.
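
Since Mongrel2 won’t check for zero or negative values, you might want a tiny pre-flight check of your own. Here’s a Python 3 sketch, assuming you hold your settings in a dict before writing them into the config file. The ZERO_IS_FINE set lists the settings whose descriptions in this section say 0 is a legitimate value; the function name and the sample values are invented.

```python
# Settings documented in this section as accepting 0 (0 means "off" or is the default).
ZERO_IS_FINE = {
    "disable.access_logging",
    "limits.min_ping",
    "limits.min_write_rate",
    "limits.min_read_rate",
}

def suspicious_settings(settings):
    """Return (name, value, reason) for numeric settings that look dangerous."""
    problems = []
    for name, value in sorted(settings.items()):
        if isinstance(value, bool) or not isinstance(value, int):
            continue  # strings such as control_port are not range-checked here
        if value < 0:
            problems.append((name, value, "negative"))
        elif value == 0 and name not in ZERO_IS_FINE:
            problems.append((name, value, "zero"))
    return problems

candidate = {
    "zeromq.threads": 0,          # probably a mistake
    "limits.url_path": -1,        # definitely a mistake
    "limits.min_ping": 0,         # fine: disables the check
    "control_port": "ipc://run/control",
}
for name, value, reason in suspicious_settings(candidate):
    print("%s=%r looks wrong (%s)" % (name, value, reason))
```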

To configure your settings, you set the variable settings and you’re done:

Source 14: Changing Settings

settings = {"zeromq.threads": 1, "limits.url_path": 1024}

servers = [main]

Mongrel2 will read these on the fly and write INFO log messages telling you what the settings are so you can debug them if they cause problems. The available settings are:

control_port=ipc://run/control: This is where Mongrel2 will listen with 0MQ for control messages. You should use ipc:// for the spec, so that only a local user with file access can get at it.
disable.access_logging=0: Set this to 1 to turn off access logs. You still have to specify log files but they won’t be created.
limits.buffer_size=2 * 1024: Internal IO buffers, used for things like proxying and handling requests. This is a very conservative setting, so if you get HTTP headers greater than this, you’ll want to increase this setting. You’ll also want to shoot whoever is sending you those requests, because the average is 400-600 bytes.
limits.client_read_retries=5: How many times it will attempt to read a complete HTTP header from a client. This prevents attacks where a client trickles an incomplete request at you until you run out of resources.
limits.connection_stack_size=32 * 1024: Size of the stack used for connection coroutines. If you’re trying to cram a ton of connections into very little RAM, see how low this can go.
limits.content_length=20 * 1024: Maximum allowed content length on submitted requests. This is, right now, a hard limit so requests that go over it are rejected. Later versions of Mongrel2 will use an upload mechanism that will allow any size upload.
limits.dir_max_path=256: Max path length you can set for Dir handlers.
limits.dir_send_buffer=16 * 1024: Maximum buffer used for file sending when we need to use one.
limits.fdtask_stack=100 * 1024: Stack frame size for the main IO reactor task. There’s only one, so set it high if you can, but it could possibly go lower.
limits.handler_stack=100 * 1024: The stack frame size for any Handler tasks. You probably want this high, since there’s not many of these, but adjust and see what your system can handle.
limits.handler_targets=128: The maximum number of connection IDs a message from a Handler may target. It’s not smart to set this really high.
limits.header_count=128 * 10: Maximum number of allowed headers from a client connection.
limits.host_name=256: Maximum hostname for Host specifiers and other DNS related settings.
limits.mime_ext_len=128: Maximum length of MIME type extensions.
limits.proxy_read_retries=100: The number of read attempts Mongrel2 should make when reading from a backend proxy. Many backend servers don’t buffer their I/O properly and Mongrel2 will ditch their HTTP response if it doesn’t get a header after this many attempts.
limits.proxy_read_retry_warn=10: This is the threshold where you get a warning that a particular backend is having performance problems, useful for spotting potential errors before they become a problem.
limits.url_path=256: Max URL paths. Does not include query string, just path.
superpoll.hot_dividend=4: Ratio of the total (like 1/4th, 1/8th) that should be in the hot selection. Set this higher if you have lots of idle connections; set it lower if you have more active connections.
superpoll.max_fd=10 * 1024: Maximum possible open files. Do not set this above 64 * 1024, and expect it to take a bit while Mongrel2 sets up constant structures.
upload.temp_store=None: This is not set by default. If you want large requests to reach your handlers, then set this to a directory they can access, and make sure they can handle it. Read about it in the Hacking section under Uploads. The file has to end in XXXXXX chars to work (read man mkstemp).
upload.temp_store_mode=0666: The mode to chmod any files uploaded to upload.temp_store.
zeromq.threads=1: Number of 0MQ IO threads to run. Careful, we’ve experienced thread bugs in 0MQ sometimes with high numbers of these.
limits.tick_timer=10: Mongrel2 keeps an internal clock for efficiency and to run the timeouts. This is how often that clock updates, and defaults to 10 seconds.
limits.min_ping=120: Minimum time since last activity before considering closing a socket. Set to 0 to disable it.
limits.min_write_rate=300: Minimum bytes/second written before considering closing a socket. Set to 0 to disable it.
limits.min_read_rate=300: Minimum bytes/second read before considering closing a socket. Set to 0 to disable it.
limits.kill_limit=2: How many of min_ping, min_write_rate, and min_read_rate have to trigger before a socket is killed.
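
The last four settings work together, so here’s a Python 3 sketch of how I read them: each of the three checks can “trigger”, and a socket is only killed once at least kill_limit of them have. The comparison directions and the should_kill function are my assumptions based on the descriptions above, not a copy of Mongrel2’s implementation.

```python
def should_kill(idle_seconds, write_rate, read_rate,
                min_ping=120, min_write_rate=300, min_read_rate=300,
                kill_limit=2):
    """Count how many checks trigger and compare against kill_limit.

    A threshold of 0 disables that check, as the settings list says.
    """
    triggers = 0
    if min_ping and idle_seconds > min_ping:
        triggers += 1  # quiet for too long
    if min_write_rate and write_rate < min_write_rate:
        triggers += 1  # writing too slowly
    if min_read_rate and read_rate < min_read_rate:
        triggers += 1  # reading too slowly
    return triggers >= kill_limit

print(should_kill(idle_seconds=200, write_rate=1000, read_rate=1000))  # one trigger
print(should_kill(idle_seconds=200, write_rate=10, read_rate=1000))    # two triggers
```

With the defaults, a connection that is merely idle survives, while one that is both idle and slow does not.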

You can also update your mimetypes in the same way, just set a variable with them:

Source 15: Changing Mimetypes

settings = {"zeromq.threads": 1, "limits.url_path": 1024}
mimetypes = {".txt": "text/superawesome"}

servers = [main]

3.11 SSL Configuration

Mongrel2 now supports SSL, with preliminary support for SSL session caching. As of v1.8.0 (actually earlier) you can enable SSL very easily for your Mongrel2 server. Mongrel2 configures SSL certs with two options in settings, and then a directory of .crt and .key files named after the UUID of the servers that need them.

To get started, you can make a simple self-signed certificate with some weak encryption and set up your certs directory:

Source 16: Making A Self-Signed Certificate

# make a certs directory 
mkdir certs 
 
# list out your servers so you can get the UUID 
m2sh servers 
 
# go into the certs directory 
cd certs 
 
# make a self-signed weak cert to play with 
openssl genrsa -des3 -out server.key 512 
openssl req -new -key server.key -out server.csr 
cp server.key server.key.org 
openssl rsa -in server.key.org -out server.key 
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt 
 
# finally, copy the server.crt and server.key files over to the UUID for that 
# server configuration in your mongrel2.conf 
mv server.crt 2f62bd5-9e59-49cd-993c-3b6013c28f05.crt 
mv server.key 2f62bd5-9e59-49cd-993c-3b6013c28f05.key

I actually have a shell script kind of like this, since I can never remember how to set this stuff up with openssl. Also, you should really adjust the RSA key strength from 512 to something you’re comfortable with. I’m using a weak key here so you can do performance testing and thrashing, and then compare with your real key later.

Once you have that done, you just have to add three little settings to your mongrel2 conf:

Add the setting certdir pointed at ./certs/. Make sure it has the trailing slash!
Add the Server.use_ssl = 1 value to the Server that has the UUID you just created a cert for.
Optionally, set the setting ssl_ciphers to SSL_RSA_RC4_128_SHA so you can play with the performance of a weak cipher. If you unset this, Mongrel2 will use the best one the browser asks for.

After that, your config should look something like this:

Source 17: Minimal SSL Configuration

main = Server(
    uuid="2f62bd5-9e59-49cd-993c-3b6013c28f05",
    use_ssl=1,
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    pid_file="/run/mongrel2.pid",
    default_host="mongrel2.org",
    name="main",
    port=6767,
    hosts=[mysite]
)

settings = {
    "certdir": "./certs/",
    "ssl_ciphers": "SSL_RSA_RC4_128_SHA"
}
}

servers = [main]

Get that written, rerun m2sh config to make the new config, restart Mongrel2 (you can’t reload to enable SSL), and it should be working.

After you get this working you just have to get your own certificate, put it in the certs directory with the right filename, and you should be good to go.
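
Since the two easy mistakes here are a missing trailing slash and a misnamed file, a quick check before you restart can save some head scratching. This is a Python 3 sketch; check_certs is a made-up helper, and it only looks at the things this section mentions.

```python
import os

def check_certs(certdir, uuid):
    """Return a list of problems with the certdir setting and its files."""
    problems = []
    if not certdir.endswith("/"):
        problems.append("certdir %r is missing its trailing slash" % certdir)
    for ext in (".crt", ".key"):
        path = os.path.join(certdir, uuid + ext)
        if not os.path.isfile(path):
            problems.append("missing %s" % path)
    return problems

# Uses the certdir and UUID from the example config above.
for problem in check_certs("./certs/", "2f62bd5-9e59-49cd-993c-3b6013c28f05"):
    print(problem)
```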

3.11.1 SNI Support

Starting with 1.9.0, mongrel2 adds SNI support. On any request, mongrel2 will first search for files named hostname.crt and hostname.key; if found, those will be used, otherwise it will fall back to the UUID key and certificate files.
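
In other words, the lookup order is hostname first, UUID second. Here’s a Python 3 sketch of that order over a plain set of file names. pick_cert is an invented name, and requiring both the .crt and the .key to be present before using the hostname pair is my assumption.

```python
def pick_cert(available, hostname, uuid):
    """Return the (crt, key) file names to use, preferring the hostname pair."""
    for base in (hostname, uuid):
        crt, key = base + ".crt", base + ".key"
        if crt in available and key in available:
            return crt, key
    return None

# Hypothetical contents of the certs directory.
certs = {
    "mongrel2.org.crt", "mongrel2.org.key",
    "2f62bd5-9e59-49cd-993c-3b6013c28f05.crt",
    "2f62bd5-9e59-49cd-993c-3b6013c28f05.key",
}
uuid = "2f62bd5-9e59-49cd-993c-3b6013c28f05"

print(pick_cert(certs, "mongrel2.org", uuid))   # hostname pair wins
print(pick_cert(certs, "example.com", uuid))    # falls back to the UUID pair
```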

3.11.2 Experimental SSL Caching

We’ve got experimental SSL caching working, which will try to reuse the browser’s SSL session if it’s there. This is meant to be a trade-off between memory and performance, so it can chew a bunch of RAM if you have a lot of SSL traffic over a short period of time. We’ll be making the caching more configurable, but for now, it’s working and does speed up SSL clients that do it properly.

3.12 Configuring Filters (BETA)

The Mongrel2 v1.8.0 release also included working filters that you can configure and load dynamically. The filters are very fresh, and the only one available is the null filter found in tools/filters/null.c, but it does work and you can configure it. It’s also currently not hooked into the reload gear that we’ve recently done, so don’t expect it to work if you do frequent hot reloading.

Configuring a filter is fairly easy. Take a look at this example:

Source 18: Minimal Filter Configuration

null = Filter(name="/usr/local/lib/mongrel2/filters/null.so", settings={
        "extensions": ["*.html", "*.txt"],
        "min_size": 1000
        })

main = Server(
    uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    default_host="localhost",
    name="test",
    pid_file="/run/mongrel2.pid",
    port=6767,
    hosts = [
        Host(name="localhost", routes={
            "/tests/": Dir(base="tests/", index_file="index.html",
                             default_ctype="text/plain"),
            "/nulltest/": Proxy(addr="127.0.0.1", port=8080)
        })
    ],
    filters = [null]
)

servers = [main]

First you can see that we set up the null filter with some arbitrary settings and point to where the .so file is. Filters can be configured with any arbitrarily nested data structure that can fit into a tnetstring, so you can pass them pretty much anything that matters. Lists, dicts, numbers, and strings are the main ones. You can also use variables in the config file, so you could create different servers and share config options for Filters and other parts of the files.

After that, there’s simply a Server.filters which takes a list of filters to load. If you don’t set this variable, then the filter gear isn’t even loaded and your server behaves as normal. If you do set this variable, then the filters are installed and will work.

If you run this config, you’ll see the filter printing out its config as a tnetstring, and then closing the connection, but only if you go to /nulltest/. If you go to /tests/sample.html to get at a directory, it won’t even run.
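
If you generate filter settings from a script, it can be handy to check that they stick to the kinds of values mentioned above before they go into the config. This Python 3 sketch walks a nested structure and reports anything that isn’t a list, dict, number or string. It deliberately limits itself to those four kinds, restricts numbers to integers as a conservative choice of mine, and unsupported is just a name I picked for the helper.

```python
def unsupported(value, path="settings"):
    """Return descriptions of values that aren't lists, dicts, ints or strings."""
    if isinstance(value, (str, int)):
        return []
    if isinstance(value, list):
        found = []
        for index, item in enumerate(value):
            found += unsupported(item, "%s[%d]" % (path, index))
        return found
    if isinstance(value, dict):
        found = []
        for key, item in value.items():
            if not isinstance(key, str):
                found.append("%s: non-string key %r" % (path, key))
            else:
                found += unsupported(item, "%s[%r]" % (path, key))
        return found
    return ["%s: %s" % (path, type(value).__name__)]

# The settings from the null filter example pass cleanly.
print(unsupported({"extensions": ["*.html", "*.txt"], "min_size": 1000}))
# A float buried in a list is reported with its path.
print(unsupported({"a": [1, 2.5]}))
```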

We’ll have more documentation on actually writing filters in the Hacking section.
