RFC (2)822 - From, Sender, Disposition, what?

This section covers the plain text format in which email messages are usually transmitted and focuses on routing information like author and recipients. Only if you know the basics of RFC 822 and SMTP can you use TurboMail’s capabilities to their full extent.

The Anatomy of a message

The original format for email messages was defined in RFC 822 which was superseded by RFC 2822 [1]. The newest standard document about the format is currently RFC 5322. But the basics of RFC 822 still apply, so for the sake of readability we will just use ‘RFC 822’ to refer to all these RFCs. Please read the official standard documents if this text fails to explain some aspects.

Email messages consist of header lines which contain meta information like the subject and creation date and a body part where the actual message text and attachments are placed. Each header field has a name and a value, separated by a colon (“:”):

From: foo@example.com
Date: Sun, 26 Oct 2008 12:30:42 -0000
Subject: Test message

This should just give you an idea of what email headers look like, which hopefully leads to a better understanding of the following paragraphs. We did not talk about the format of the message body because it is not as strictly structured as the header lines, and generally you don’t need a deeper understanding of it because TurboMail’s Message class will take care of generating the message body. If you want to learn more, examine the emails in your inbox (use the ‘view source’ command of your email client), have a look at the Wikipedia article about e-mail, or read the standard documents cited above.
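To look at this structure programmatically, the following minimal sketch parses the example above with the email package from Python’s standard library (not with TurboMail itself). The body text ‘Hello world’ is an assumption added for illustration only.

```python
from email import message_from_string

# Header lines first, then one empty line, then the body.
raw_message = (
    "From: foo@example.com\n"
    "Date: Sun, 26 Oct 2008 12:30:42 -0000\n"
    "Subject: Test message\n"
    "\n"
    "Hello world\n"
)

parsed = message_from_string(raw_message)

# Each header field is a (name, value) pair; the colon belongs to neither.
header_fields = parsed.items()
body_text = parsed.get_payload()

for name, value in header_fields:
    print(f"{name!r} -> {value!r}")
```

The empty line is what separates the headers from the body; everything after it ends up in the payload.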

Originator fields

Email messages can have one or more authors, who appear in the ‘From’ line. The author is usually the one who created the contents of the message body. Sometimes the author is not the same as the one who actually sent the message, e.g. a secretary sends the message for his boss. In this case, the boss is still the author of the message (his name appears in the From: line) but the ‘Sender’ header tells who actually sent the mail [2], so here you would see the secretary’s name. RFC 5322 says: ‘The “Sender:” field specifies the mailbox of the agent responsible for the actual transmission of the message.’

If there are multiple authors of a message, the explicit specification of a sender is mandated by the RFCs. Otherwise the sender header may be omitted.
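The rule can be checked with a few lines of standard-library Python. This is only a sketch: the helper name sender_is_required and all addresses are made up for illustration and are not part of TurboMail.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def sender_is_required(message):
    """Return True if the message lists more than one author in From."""
    from_values = [str(value) for value in message.get_all("From", [])]
    return len(getaddresses(from_values)) > 1


# One author: the Sender header may be omitted.
single = EmailMessage()
single["From"] = "Boss <boss@example.com>"

# Two authors: a Sender header has to say who transmitted the message.
joint = EmailMessage()
joint["From"] = "Boss <boss@example.com>, Partner <partner@example.com>"
joint["Sender"] = "Secretary <secretary@example.com>"

print(sender_is_required(single), sender_is_required(joint))
```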

Who should receive the message?

There are multiple headers which encode which addresses should receive the message. The best known header is ‘To’, which lists the recipients to whom the message text is addressed. Sometimes it is useful to send other people a copy of the message so they get the information too. This is what the ‘CC’ header is for. If you want to send the message to someone else as well but don’t want to reveal this to the other recipients, you can put addresses in ‘BCC’, which does not appear as a header in the delivered message, but TurboMail will send the message to the addresses listed there too. None of these headers is actually mandated by the RFCs so you can omit any or all of them (e.g. for a newsletter you could decide to put people only in BCC so that subscribers don’t see each other’s email addresses).

Who should receive replies?

When you use email as a communication medium, you must be prepared for recipients to reply by email. So you should always provide a working email address which can receive the replies.

Sometimes the author should not receive replies to the message he sent. E.g. the project manager tells everyone in her team about some event but another employee organizes the event, so he should get all messages related to that event. You can set a header named ‘Reply-To’ so that the recipient’s email client will use the specified address as the recipient for a reply. Of course the user may change that, but he has to do this explicitly. By default his reply will go to the address specified in the ‘Reply-To’ header.

When no reply-to address was given, replies will be sent to the from address(es). The sender address is not used for replies.
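The default choice a mail client makes can be sketched as follows. The function reply_addresses and the addresses are hypothetical; real clients may offer further options such as ‘reply to all’.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def reply_addresses(message):
    """Default reply targets: Reply-To if present, otherwise From."""
    reply_to = message.get_all("Reply-To", [])
    source = reply_to if reply_to else message.get_all("From", [])
    values = [str(value) for value in source]
    return [address for _name, address in getaddresses(values)]


announcement = EmailMessage()
announcement["From"] = "manager@example.com"
announcement["Sender"] = "assistant@example.com"
announcement["Reply-To"] = "organizer@example.com"

plain = EmailMessage()
plain["From"] = "manager@example.com"
plain["Sender"] = "assistant@example.com"

print(reply_addresses(announcement))
print(reply_addresses(plain))
```

Note that the Sender address is never consulted, in line with the rule above.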

The SMTP envelope

It is important to differentiate between the message format as defined in RFC 822 and SMTP. In the sections above you read something about originator fields, recipients and the like. But when you send email messages with SMTP, none of these header fields has any effect! During the transport with SMTP your email (headers+body part) is just treated as data [3].

The mechanism works like letter envelopes for snail mail: The post man (SMTP server) will only look at the envelope address to find out to whom the mail should be delivered. In your letter (email header) you can address the letter to whoever you like, the post man will deliver it to the address written on the envelope [4]. So in an SMTP transaction, you have a list of recipients (“RCPT TO”) where all recipient email addresses are listed. When you send a message through TurboMail, all addresses from your “To”, “CC” and “BCC” header fields are collected and used for “RCPT TO”. The “RCPT TO” list is not visible in the delivered message. This mechanism is used for the BCC (“blind carbon copy”) property: all addresses in BCC appear only in “RCPT TO”, so the recipients don’t see whether others got the mail too.
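The collection step can be illustrated with the standard library alone. This sketch mirrors the behaviour described above; it is not TurboMail’s own code, and envelope_recipients and all addresses are made-up names.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_recipients(message):
    """Collect every address from To, CC and BCC for the RCPT TO list."""
    values = []
    for header in ("To", "Cc", "Bcc"):
        values.extend(str(value) for value in message.get_all(header, []))
    return [address for _name, address in getaddresses(values)]


message = EmailMessage()
message["From"] = "newsletter@example.com"
message["To"] = "alice@example.com"
message["Cc"] = "bob@example.com"
message["Bcc"] = "carol@example.com"
message.set_content("Hello")

recipients = envelope_recipients(message)

# BCC addresses live in the envelope only, so the header is dropped
# before the message text is handed over as data.
del message["Bcc"]
transmitted_text = message.as_string()

print(recipients)
```

With Python’s smtplib, the envelope list and the message text are separate arguments of SMTP.sendmail(from_addr, to_addrs, msg), which is the same split between envelope and letter.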

In contrast to normal letters, you can send the same message (with the same contents!) to multiple recipients: The SMTP server will take care of sending each recipient a copy of this message. This can potentially save quite a lot of bandwidth because the message (including attachments) can easily be several megabytes in size. But this only works if you send exactly the same message to everyone. If you want to use personalized messages which contain the recipient’s name in the message body, you have to generate individual messages for each recipient.

And last but not least there is the SMTP From address. If the post man cannot deliver a letter (e.g. the address is wrong), it will be returned to the sender. Exactly the same principle is true for email: There is one address used as sender in the SMTP communication. Normally the address in the ‘Sender’ header is used. If this is not present, the from address will be taken. All delivery failures (“bounces”, e.g. user unknown, mailbox is over quota) are sent to this address. For example, if you send out a newsletter you may want to use an address for SMTP From which is different from the ‘From’ header so all bounces will go to a mailbox which is not monitored by human operators. Instead you can use scripts to disable addresses in your database which produce bounces with a permanent error code [5]. In TurboMail you can change the default behaviour by setting the ‘smtp_from’ property of a message explicitly.
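The fallback order for the SMTP From address can be written down as a small function. This is a sketch of the rule as described here, not TurboMail’s implementation; the smtp_from parameter only borrows the name of the message property, and envelope_sender is a hypothetical helper.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_sender(message, smtp_from=None):
    """Pick the SMTP From address: explicit override, then Sender, then From."""
    if smtp_from:
        return smtp_from
    for header in ("Sender", "From"):
        values = [str(value) for value in message.get_all(header, [])]
        addresses = [address for _name, address in getaddresses(values)]
        if addresses:
            return addresses[0]
    return None


newsletter = EmailMessage()
newsletter["From"] = "news@example.com"
only_from = envelope_sender(newsletter)

newsletter["Sender"] = "mailer@example.com"
with_sender = envelope_sender(newsletter)

# Bounces go to a dedicated mailbox that scripts can process.
with_override = envelope_sender(newsletter, smtp_from="bounces@example.com")

print(only_from, with_sender, with_override)
```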

Please note that remote SMTP servers are likely to require a valid “SMTP From” address (otherwise they will reject your message immediately). There are many checks to ensure that you provide a valid address and of course you can “trick” all of them, but you really should provide an email address to which bounces can be delivered, even if this is just a black hole, which may be appropriate in some cases.

[1]There were several other extensions to these RFCs like RFC 2045 through RFC 2049 for MIME message bodies, but most of the message header lines are still unchanged since the publication of RFC 822 - ‘Standard for the Format of ARPA Internet Text Messages’.
[2]Gotcha: When you submit an email via authenticated SMTP to a mail server, some mail servers replace your sender header (or create a new one) if the From address is not the same as the authentication name. In the Exim MTA the configuration option is called ‘local_from_check’.
[3]Actually, this is not quite true because SMTP servers may look into the message to identify spam and other malicious mails. But in general the message routing is totally unaffected by the headers in your email.
[4]That’s why you cannot trust the “To” header in messages you receive: it has no effect on the message routing. So you may receive messages addressed to billg@microsoft.com (“To” header) because the sender put a different address in “RCPT TO” than in the message header.
[5]Please note that there is no standardized format for bounce messages and error codes. Therefore you likely need to write a “parser” for different mailer daemons.

Managers

Every manager component implements a specific strategy when to send messages after they were submitted via message.send(). TurboMail comes with two managers by default: The ImmediateManager and the DemandManager. However, there can be only one active manager in TurboMail. You can configure the manager to use by changing the ‘mail.manager’ configuration option.

ImmediateManager

The ImmediateManager is extremely simple: Every message is immediately handed over to the transport, all operations are synchronous. This is the manager which is used by default if you don’t configure anything.

DemandManager

The DemandManager behaves like TurboMail 2.x: It can spawn multiple threads so that your application is not blocked if you send many messages at once or your mail server is slow.

Configuration options:

  • mail.manager = ‘demand’: enable the demand manager
  • mail.demand.threads (integer, default: 4): maximum number of worker threads
  • mail.demand.divisor (integer, default: 10): divide the queue size by this number to estimate the number of required threads
  • mail.demand.timeout (integer, default: 60): number of seconds a worker thread can be idle before shutting down
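As a sketch, the options above could be collected in a configuration dictionary using their documented defaults. The helper estimated_threads is purely hypothetical: it shows one plausible reading of the divisor rule (queue size divided by the divisor, capped at the thread maximum) and is an assumption, not TurboMail’s actual formula.

```python
import math

demand_config = {
    "mail.on": True,
    "mail.manager": "demand",
    "mail.demand.threads": 4,    # upper bound for worker threads
    "mail.demand.divisor": 10,   # queued messages per worker
    "mail.demand.timeout": 60,   # idle seconds before a worker exits
}


def estimated_threads(queue_size, config):
    """Hypothetical reading of the divisor rule; not TurboMail's own code."""
    if queue_size <= 0:
        return 0
    wanted = math.ceil(queue_size / config["mail.demand.divisor"])
    return min(config["mail.demand.threads"], wanted)


for size in (0, 5, 25, 1000):
    print(size, estimated_threads(size, demand_config))
```

Under this reading, a larger divisor means fewer threads for the same queue size, while mail.demand.threads always caps the result.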

Once the DemandManager got a message, it will try to deliver it. You won’t get any notification about a failed delivery. Also the DemandManager holds the queue in memory so all unsent messages are lost if you stop the application. Currently fixing these shortcomings is planned for TurboMail 3.1 (patches welcome!).

Transports - Your Work Horses

Transports are the components that actually deliver your message so that it can reach the specified recipients. The SMTPTransport is one example. Other possible transports (not yet written) would be a SendmailTransport (calls /usr/bin/sendmail on the command line) or an AppEngineMailServiceTransport.

SMTPTransport

The SMTP transport sends messages to an SMTP server (smart host).

Configuration options:

  • mail.smtp.server (string, mandatory) host name or IP address of the SMTP server which will act as a relay for all mails
  • mail.smtp.username (string)
  • mail.smtp.password (string)
  • mail.smtp.tls (boolean, default: None): Use TLS for the server connection (if ‘None’ is set, TurboMail will attempt to auto-detect TLS, which might fail if your server has a bad configuration)
  • mail.smtp.debug (boolean, default: False): Print SMTP communication to STDERR
  • mail.smtp.max_messages_per_connection (integer, default: 1): The number of messages that are sent over one connection. Increasing the number can help your performance if there is a high latency on connections to your mail server but you might find more (generally harmless) log messages about connections which timed out.

If password and username were given, TurboMail will try to authenticate to the server with these credentials before sending messages.

Notes

Please be aware that your SMTP server can enforce additional restrictions. A very common example are checks for valid recipient and sender addresses (email address is syntactically valid and the domain exists). If you do not ensure that your message fulfills these criteria, the message won’t be delivered. If you use the ImmediateManager you may need to catch these exceptions (most notably SMTPSenderRefused and SMTPRecipientsRefused). Please note that SMTPRecipientsRefused exception is only raised if all recipients were rejected. If you send a message to two recipients and just one was rejected, you will not see any exception.
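A defensive wrapper around the send call might look like the sketch below. It assumes that the exceptions named above are the ones from Python’s smtplib module and that they propagate to the caller; send_and_report and the two stand-in functions are hypothetical and only simulate a server’s answer.

```python
import smtplib


def send_and_report(send):
    """Call send() and translate SMTP refusals into a short status string."""
    try:
        send()
    except smtplib.SMTPSenderRefused as error:
        return f"sender refused: {error.sender} ({error.smtp_code})"
    except smtplib.SMTPRecipientsRefused as error:
        # error.recipients maps each rejected address to (code, reply).
        rejected = ", ".join(sorted(error.recipients))
        return f"all recipients refused: {rejected}"
    return "accepted"


def refuse_sender():
    # Stand-in for a send call whose envelope sender is rejected.
    raise smtplib.SMTPSenderRefused(553, b"Domain unknown", "me@invalid.example")


def refuse_everyone():
    # Stand-in for a send call where every recipient is rejected.
    raise smtplib.SMTPRecipientsRefused(
        {"nobody@example.com": (550, b"User unknown")}
    )


print(send_and_report(refuse_sender))
print(send_and_report(refuse_everyone))
print(send_and_report(lambda: None))
```

In an application you would pass the real send call (for example message.send) instead of the stand-ins. A status of ‘accepted’ still does not prove that every recipient was accepted, for the reason given above.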

If your SMTP server does not listen on the standard SMTP port 25, you need to specify the port in the ‘mail.smtp.server’ option (e.g. ‘localhost:4711’).
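How such a ‘host:port’ value breaks down can be shown with a small helper. This is an illustrative sketch only: split_server is a made-up name, how TurboMail itself parses the option is not covered here, and IPv6 literals are deliberately not handled.

```python
def split_server(value, default_port=25):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    host, separator, port = value.rpartition(":")
    if not separator:
        # No colon at all: the whole value is the host name.
        return value, default_port
    return host, int(port)


print(split_server("localhost"))
print(split_server("localhost:4711"))
```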

DebugTransport

This is the default transport in case you did not configure a transport explicitly (in order to minimize the risk that you accidentally send out messages). The debug transport just collects all messages which were sent to TurboMail. No delivery takes place. This is very useful for unit tests where you don’t want to have interactions with external servers.

There are no configuration options for the DebugTransport, just a method to retrieve the sent messages:

  • get_sent_mails() - return a list of collected messages

TurboMail Extensions - Leverage Super Powers

TurboMail Extensions will be extended in a way that you can add custom behaviour to Messages easily (e.g. encryption). However due to time constraints the interface was not finalized for 3.0. Therefore this interface will probably be changed within the 3.x series.

UTF8qp

Unfortunately Python’s email module will encode the body part using base64 by default if you have a UTF-8 character set. As written before TurboMail provides a fake encoding called ‘UTF-8-QP’ which works around that.

This extension reconfigures the global UTF-8 character set to use quoted-printable encoding. Please note that this will affect other Python modules in the same process using the email module.
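The effect of such a global change can be reproduced with the standard library’s email.charset module. Whether the extension uses exactly this call is an assumption; the sketch only demonstrates the mechanism and why it affects the whole process.

```python
from email import charset
from email.mime.text import MIMEText

text = "Grüße aus Köln"

# Default registration: UTF-8 bodies are base64 encoded.
before = MIMEText(text, "plain", "utf-8")

# Re-register UTF-8 for the whole process with quoted-printable bodies.
charset.add_charset("utf-8", charset.SHORTEST, charset.QP, "utf-8")
after = MIMEText(text, "plain", "utf-8")

print(before["Content-Transfer-Encoding"])
print(after["Content-Transfer-Encoding"])
print(after.get_payload())
```

Every message created after the add_charset() call is affected, including messages built by unrelated code in the same process.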

To enable this extension, you need to enable it in your configuration:

mail.utf8qp.on        = True

TurboMail Adapters

TurboMail comes with several so-called ‘adapters’ which ease the usage of TurboMail in a certain framework. For example an adapter will configure TurboMail in a way so that it can use your framework’s configuration mechanism.

TurboGears 1.x

TurboGears 1 uses setuptools to find ‘extensions’ (entry point ‘turbogears.extensions’). If you installed TurboMail correctly, TurboGears 1 will start and stop TurboMail automatically; you don’t need to write a single line of custom code. Just put TurboMail’s configuration in your app.cfg or dev.cfg/prod.cfg.

Pylons

Pylons tries to be as flexible as possible so there is no fixed ‘extension’ mechanism in place (as of 0.9.7). Therefore you need to add two lines to your startup configuration. While there are many places where you could add this code, we recommend doing it in the constructor of your Globals object (yourpackage.lib.app_globals):

def __init__(self):
    # ...
    from turbomail.adapters import tm_pylons
    tm_pylons.start_extension()

To configure TurboMail, just use the [DEFAULT] section in your development.ini/production.ini files.

As there is no shutdown hook either, TurboMail tries to use Python’s atexit module to do all the necessary cleanup tasks. Please note that atexit will not call the registered handlers in all cases.

Pylons’ configuration module does not convert the configuration values to Python types other than strings (as opposed to ConfigObj used in TurboGears 1). The Pylons adapter calls as_bool() on values for keys that end with ‘.on’. Configuration values which consist only of digits will be converted to int. Everything else is returned unmodified.

For Pylons you don’t have to quote strings in the configuration files. Strings surrounded with quotation marks will be returned unmodified, i.e. the quotation marks become part of the value:

# Wrong:
# mail.manager = 'demand'

# Good:
mail.manager = demand
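The conversion rules described above can be sketched in a few lines. Both functions are illustrative stand-ins: this as_bool() is a hypothetical re-implementation, and the exact set of accepted boolean words is an assumption.

```python
def as_bool(value):
    """Hypothetical stand-in for the as_bool() helper mentioned above."""
    text = str(value).strip().lower()
    if text in ("true", "yes", "on", "1"):
        return True
    if text in ("false", "no", "off", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def convert(key, value):
    """Apply the described rules to one string value from an INI file."""
    if key.endswith(".on"):
        return as_bool(value)
    if value.isdecimal():
        return int(value)
    return value


print(convert("mail.on", "True"))
print(convert("mail.demand.threads", "4"))
print(convert("mail.manager", "demand"))
print(convert("mail.manager", "'demand'"))
```

The last call shows why quoting is wrong here: the quotation marks stay in the value, so the manager name no longer matches.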

TurboGears 2

TurboGears 2 is based on Pylons so for now you can just use the Pylons adapter. When TurboGears Ticket 2206 is implemented, we can provide a custom adapter for TurboGears 2 that does not require any code modification.

Yet Another Python Web Framework (YAPWF)

YAPWF has native support for TurboMail. As with Pylons, define your mail configuration in your INI file, though use the [app:main] (or other relevant app: section) instead of [DEFAULT].

Others

We would like to add adapters for other popular frameworks like Trac and Django. However, we have not had the time to do that in a sane way so far. Feel free to send patches.

TurboMail from Scratch

If you don’t use a supported framework (that means one which has a TurboMail adapter), you have to start the TurboMail machinery yourself. You can even use TurboMail for Python command line scripts! One goal of TurboMail 3.0 was to get rid of the TurboGears dependency [6]. Therefore starting TurboMail is quite easy:

from turbomail.control import interface

turbomail_config = {
    'mail.on': True,
    'mail.transport': 'smtp',
    'mail.smtp.server': 'localhost',
}
# ...
interface.start(turbomail_config)
# now you can send messages
# ...
interface.stop()

If your framework has a sophisticated configuration mechanism you probably want to use this instead of hard coding all configuration options by hand. TurboMail just expects that the object used for the configuration provides a dict-like get() method which is how most configuration mechanisms work. If there were no TurboGears 1 adapter, you could write:

import turbogears
from turbomail.control import interface

interface.start(turbogears.config)
# ...
interface.stop()

Thereby you can leverage all the advantages of having different configurations: TurboGears has a general application and deployment/environment specific configurations which are merged at runtime. So you can change settings easily which change often (e.g. mail server credentials, logging) while leaving most of the other stuff untouched.
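Outside of TurboGears, a similar layering can be approximated with collections.ChainMap from the standard library, which offers the dict-like get() method mentioned above. The server name mail.example.com is made up, and whether a ChainMap satisfies everything TurboMail expects from a configuration object is an assumption based only on that description.

```python
from collections import ChainMap

# General application settings.
application_defaults = {
    "mail.on": True,
    "mail.transport": "smtp",
    "mail.smtp.server": "localhost",
}

# Settings that differ per deployment.
deployment_overrides = {
    "mail.smtp.server": "mail.example.com",
}

# Lookups check the deployment settings first, then fall back to defaults.
config = ChainMap(deployment_overrides, application_defaults)

print(config.get("mail.smtp.server"))
print(config.get("mail.transport"))
print(config.get("mail.smtp.username"))
```

Such an object could then be handed to interface.start() in place of a plain dictionary.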

Gotcha

Don’t forget to call interface.stop() before your application terminates. This is especially important if you use a manager which may send messages asynchronously like the DemandManager. If you don’t call interface.stop() and the DemandManager still has some mails in the queue when your app finishes, you will lose these mails!
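One way to make sure stop() is always called, even when the application fails, is a small context manager. This is a sketch: running() is a hypothetical helper, and RecordingInterface is only a stand-in that offers the same start()/stop() calls so the example runs without TurboMail.

```python
from contextlib import contextmanager


@contextmanager
def running(interface, config):
    """Start the given interface and always stop it again on exit."""
    interface.start(config)
    try:
        yield interface
    finally:
        interface.stop()


class RecordingInterface:
    """Stand-in with the same start()/stop() calls, used only for this demo."""

    def __init__(self):
        self.calls = []

    def start(self, config):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


demo = RecordingInterface()
try:
    with running(demo, {"mail.on": True}):
        raise RuntimeError("application crashed")
except RuntimeError:
    pass

print(demo.calls)
```

In real code you would pass the interface object imported from turbomail.control instead of the stand-in.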

[6]Don’t get us wrong: The authors of TurboMail rely on TurboGears a lot but we felt that TurboMail could be useful for more people than just for TurboGears developers.