Crossroads is a load balance and fail over utility for TCP based services. It is a daemon program running in user space, and features extensive configurability, polling of back ends using 'wakeup calls', detailed status reporting, 'hooks' for special actions when backend calls fail, and much more. Crossroads is service-independent: it is usable for HTTP/HTTPS, SSH, SMTP, DNS, etc. In the case of HTTP balancing, Crossroads can modify HTTP headers, e.g. to provide 'session stickiness' for back-end processes that need sessions, but aren't session-aware of other back-ends.
Using this approach, crossroads serves as load balancer and fail over utility. Crossroads will very likely not be as reliable as hardware based balancers, since it will always require a server to run on. This server, in turn, may become a new Single Point of Failure (SPOF). However, in situations where cost efficiency is an issue, crossroads may be a good choice. Furthermore, crossroads can be deployed in situations where hardware based balancing already exists and service reliability needs to be augmented. Or, crossroads may be run off a diskless system, which again improves reliability of the underlying hardware.
This document describes how to use crossroads, how to configure it in order to increase the reliability of your systems, and how to compile the program from its sources. This document is also available in PDF format.
As a quick reference, here are some important URLs for Crossroads:
Crossroads is distributed as-is, without assumptions of fitness or usability. You are free to use crossroads to your liking. It's free, and as with everything that's free: there's also no warranty.
You are allowed to make modifications to the source code of crossroads, and you are allowed to (re)distribute crossroads, as long as you include this text, all sources, and if applicable: all your modifications, with each distribution.
While you are allowed to make any and all changes to the sources, I would appreciate hearing about them. If the changes concern new functionality or bugfixes, then I'll include them in a next release, stating full credits. If you want to seriously contribute (to which you are heartily encouraged), then mail me and I'll get you access to the Crossroads SVN repository, so that you can update and commit as you like.
Throughout this document, the following terms are used: (Many more meanings of the terms will exist -- yes, I am aware of that. I'm using the terms here in a very strict sense.)
HTTP/1.0
500 Server Error
then crossroads will see this as a successful connection, though the user behind a browser may think otherwise.
As of version 0.26 the syntax of the configuration file has changed. In particular:
maxconnections is now used instead of maxclients;
connectiontimeout is now used instead of sessiontimeout.
Therefore when converting configuration files to the new syntax, the above keywords must be changed. (The reason for these changes is that 0.26 introduces sticky HTTP sessions that span multiple TCP connections, and the term session is used strictly in that sense -- and no longer for a TCP connection.)
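The keyword renames for 0.26 can be applied mechanically. The following is a minimal sketch in Python, not a tool shipped with Crossroads; the function name convert_keywords and the sample configuration text are made up for illustration. It renames whole words only, and it does not distinguish keywords from text inside comments or quoted strings, so review the result by hand.

```python
import re

# Old keyword -> new keyword, as listed for version 0.26.
RENAMES = {
    "maxclients": "maxconnections",
    "sessiontimeout": "connectiontimeout",
}

# Match the old keywords as whole words only.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def convert_keywords(config_text: str) -> str:
    """Return config_text with pre-0.26 keywords renamed (hypothetical helper)."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], config_text)


old = "service www { port 80; maxclients 100; sessiontimeout 300; }"
print(convert_keywords(old))
```

Run on the sample line, the two old keywords are replaced and everything else is left untouched.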
As of version 1.08, the following directives no longer are supported:
insertstickycookie was replaced by the more generic directive addclientheader. E.g., instead of insertstickycookie "XRID=100; Path=/"; use addclientheader "Set-Cookie: XRID=100; Path=/";
insertrealip was replaced by the more generic directive setserverheader. E.g., instead of insertrealip on; use setserverheader "XR-Real-IP: %r";
XR-Real-IP
).
crossroads/
.
svn://svn.e-tunity.com/crossroads
.
crossroads/trunk and the stable versions under crossroads/tags, e.g. crossroads/tags/release-1.00.
crossroads/tags/release-X.YY, where X.YY is a release ID. In the latter case, change-dir to crossroads/trunk.
make install. This installs the crossroads binary into /usr/local/bin/. If the compilation doesn't work on your system, check etc/Makefile.def for hints.
/etc/crossroads.conf. In it state something like:
    service www {
        port 80;
        revivinginterval 15;
        backend one {
            server 10.1.1.100:80;
        }
        backend two {
            server 10.1.1.101:80;
        }
    }
That's of course assuming that you want to balance HTTP on port 80 to two back ends at 10.1.1.100 and 10.1.1.101.
crossroads start.
crossroads status.
/etc/crossroads.conf (the default configuration file). It supports a number of flags (e.g., to overrule the location of the configuration file). The actual usage information is always obtained by typing crossroads without any arguments. Crossroads then displays the allowed arguments.
This section shows the most basic usage. As said above, start crossroads without arguments to view the full listing of options.
crossroads start and crossroads stop are typical actions that are run from system startup scripts. The meaning is self-explanatory.
crossroads restart is a combination of the former two. Beware that a restart may cause discontinuity in service; it is just a shorthand for typing the 'stop' and 'start' actions after one another.
crossroads status reports on each running service. Per service, the state of each back end is reported.
crossroads tell service backend state is a command line way of telling crossroads that a given back end, of a given service, is in a given state. Normally crossroads maintains state information itself, but by using crossroads tell, a back end can be e.g. taken 'off line' for servicing.
crossroads configtest tells you whether the configuration is syntactically correct.
crossroads services reports on the configured services. In contrast to crossroads status, this option only shows what's configured -- not what's up and running. Therefore, crossroads services doesn't report on back end states.
crossroads sampleconf shows a sample configuration on screen. A good way of quickly viewing the configuration file syntax, or of getting a start for your own configuration /etc/crossroads.conf.
Two 'flags' of Crossroads are specifically logging-related. This section elaborates on these flags.
First, there's flag -a. When present, the start and end of activity is logged using statements like
Similarly, there are 'ending' statements. Using this flag and scanning your logs for these statements may be helpful in quickly determining your system load.
Second, there's flag -l. This flag selects the 'facility' of logging and defaults to LOG_DAEMON. You can supply a number between 0 and 7 to flag -l to select LOG_LOCAL0 to LOG_LOCAL7. This would separate the Crossroads-related logging from other streams. Here's a very short guide; please read your Unix manpages of syslogd for more information.
/etc/syslog.conf and add a line:
    local7.* /var/log/crossroads.log
That instructs syslogd to send LOG_LOCAL7 requests to the logfile /var/log/crossroads.log.
syslogd. On most Unices that's done by issuing killall -1 syslogd. (As a side-note, I tried this once on a Bull/AIX system, and the box just shut down. The killall command killed every process...)
crossroads with the flag -l7.
/var/log/crossroads.log for Crossroads' messages.
/etc/crossroads.conf. This location can be overruled using the command line flag -c.
This section explains the syntax of the configuration file, and what all settings do.
This section describes the general elements of the crossroads configuration language.
4.1.1: Empty lines and comments
Empty lines are of course allowed in the configuration. Crossroads recognizes three formats of comment:
/* and */,
// and ending with the end of the text line;
# and ending with the end of the text line.
Simply choose your favorite editor and use the comment that 'looks best'. (I favor C or C++ comment. My favorite editor emacs can be put in cmode and nicely highlight what's comment and what's not. And as a bonus it will auto-indent the configuration!)
4.1.2: Keywords, numbers, identifiers, generic strings
In a configuration file, statements are identified by keywords, such as service, verbosity. These are reserved words.
Many keywords require an identifier as the argument. E.g., a service has a unique name, which must start with a letter or underscore, followed by zero or more letters, underscores, or digits. Therefore, in the statement service myservice, the keyword is service and the identifier is myservice.
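The identifier rule above maps directly onto a regular expression. The sketch below is an illustration in Python, not the parser that Crossroads uses; it assumes that 'letter' means an ASCII letter, which the text does not state explicitly, and the helper name is_identifier is made up.

```python
import re

# A letter or underscore first, then zero or more letters, underscores, or digits.
# Assumption: only ASCII letters count as letters.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(name: str) -> bool:
    """Return True when name satisfies the identifier rule (hypothetical helper)."""
    return IDENTIFIER.fullmatch(name) is not None


for candidate in ("myservice", "_web2", "2fast", "my-service"):
    print(candidate, is_identifier(candidate))
```

The first two candidates are accepted; the last two are rejected because of the leading digit and the hyphen.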
Other keywords require a numeric argument. Crossroads knows only non-negative integer numbers, as in port 8000. Here, port is the keyword and 8000 is the number.
Yet other keywords require 'generic strings', such as hostname specifications or system commands. Such generic strings contain any characters (including white space) up to the terminating statement character ;. If a string must contain a semicolon, then it must be enclosed in single or double quotes:
This is a string; is a string that starts at T and ends with g
"This is a string"; is the same, the double quotes are not necessary
"This is ; a string"; has double quotes to protect the inner ;
Finally, an argument can be a 'boolean' value. Crossroads knows true, false, yes, no, on, off. The keywords true, yes and on all mean the same and can be used interchangeably; as can the keywords false, no and off.
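As an illustration of the boolean keywords, here is a small Python sketch that maps the six words onto a truth value. It is not taken from the Crossroads sources; the helper name parse_boolean is hypothetical, and the sketch assumes that the keywords are matched exactly as written (lower case).

```python
TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


def parse_boolean(word: str) -> bool:
    """Map a boolean keyword to True or False (hypothetical helper)."""
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError("not a boolean keyword: " + repr(word))


print(parse_boolean("on"), parse_boolean("no"))
```

Anything outside the six keywords raises an error in this sketch; how Crossroads itself reports such input is not covered here.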
Service definitions are blocks in the configuration file that state the settings for each service. A service definition starts with service, followed by a unique identifier, and by statements in { and }. For example:
    // Definition of service 'www':
    service www {
        ...
        ...     // statements that define the
        ...     // service named 'www'
        ...
    }
The configuration file can contain many service blocks, as long as the identifying names differ. The following list shows possible statements. Each statement must end with a semicolon, except for the backend statement, which has its own block (more on this later).
any and http. The type any means that crossroads doesn't interpret the contents of a TCP stream, but only distributes streams over back ends. The type http means that crossroads has to analyze what's in the messages, does magical HTTP header tricks, and so on -- all to ensure that multiple connections are treated as one session, or that the back end is notified of the client's IP address. Unless you really need such special features, use the type any (the default), even for HTTP protocols.
type typespecifier ;
any or http
any
port 8000 says that this service will accept connections on port 8000.
port number ;
bindto 127.0.0.1 causes crossroads to 'bind' the service only to the local IP address. Network connections from other hosts won't be serviced. By default, crossroads binds a service to all presently active IP addresses at the invoking host.
bindto ip-address ;
192.168.1.45, or the keyword any
any
verbosity on or verbosity off. When 'on', log messages to /var/log/messages are generated that show what's going on. (Actually, the messages go to syslog(3), using facility LOG_DAEMON and priority LOG_INFO. In most (Linux) cases this will mean: output to /var/log/messages. On Mac OSX the messages go to /var/log/system.log.) The keyword verbose is an alias for verbosity.
verbosity setting ;
verbose setting ;
true, yes or on to turn verbosity on; or false, no, off to turn it off.
off.
The syntax is:
dispatchmode roundrobin: Simply the 'next in line' is chosen. E.g., when 3 back ends are active, then the usage series is 1, 2, 3, 1, 2, 3, and so on. Roundrobin dispatching is the default method, when no dispatchmode statement occurs.
dispatchmode random: Random selection. Probably only for stress testing, though when used with weights (see below) it is a good distributor of new connections too.
dispatchmode bysize [ over connections ]: The next back end is the one that has transferred the least number of bytes. This selection mechanism assumes that the more bytes, the heavier the load. The modifier over connections is optional. (The square brackets shown above are not part of the statement but indicate optionality.) When given, the load is computed as an average of the last stated number of connections. When this modifier is absent, then the load is computed over all connections since startup.
dispatchmode byduration [ over connections ]: The next back end is the one that served connections for the shortest time. This mechanism assumes that the longer the connection, the heavier the load.
dispatchmode byconnections: The next back end is the one with the least active connections. This mechanism assumes that each connection to a back end represents load. It is usable for e.g. database connections.
dispatchmode byorder: The first back end is selected every time, unless it's unavailable. In that case the second is taken, and so on.
The selection algorithm is only used when clients are serviced that aren't part of a sticky HTTP session. This is the case during:
any;
http.
When type http is in effect and a session is underway, then the previously used back end is always selected -- regardless of dispatching mode.
Your 'right' dispatch mode will depend on the type of service. Given the fact that crossroads doesn't know (and doesn't care) how to estimate load from a network traffic stream, you have to choose an appropriate dispatch mode to optimize load balancing. In most cases, roundrobin or byconnections will do the job just fine.
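To make the simpler dispatch modes concrete, the following Python sketch models roundrobin, byorder and byconnections as functions over a list of back ends. It is an illustration only, not the Crossroads implementation: the function signatures are made up, back ends are identified by their list index, and skipping unavailable back ends in roundrobin is an assumption.

```python
from typing import List, Optional


def roundrobin(available: List[bool], last: int) -> Optional[int]:
    """Next available back end after index 'last', wrapping around."""
    n = len(available)
    for step in range(1, n + 1):
        idx = (last + step) % n
        if available[idx]:
            return idx
    return None  # no back end is available


def byorder(available: List[bool]) -> Optional[int]:
    """The first back end, unless it is unavailable; then the second, and so on."""
    for idx, up in enumerate(available):
        if up:
            return idx
    return None


def byconnections(active: List[int], available: List[bool]) -> Optional[int]:
    """The available back end with the least active connections."""
    candidates = [i for i, up in enumerate(available) if up]
    if not candidates:
        return None
    return min(candidates, key=lambda i: active[i])


# Three active back ends, numbered 1..3 as in the text.
last, series = -1, []
for _ in range(6):
    last = roundrobin([True, True, True], last)
    series.append(last + 1)
print(series)  # [1, 2, 3, 1, 2, 3]
```

With all three back ends up, the roundrobin series matches the 1, 2, 3, 1, 2, 3 pattern described for that mode.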
An example of the definition is revivinginterval 10. When this reviving interval is given, crossroads will check every 10 seconds whether unavailable back ends have woken up yet. A back end is considered awake when a network connection to that back end can successfully be established.
revivinginterval number ;
maxconnections. There is one argument; the number of concurrent established connections that may be active within one service.
'Throttling' the number of connections is a way of preventing Denial of Service (DoS) attacks. Without a limit, numerous network connections may spawn so many server instances, that the service ultimately breaks down and becomes unavailable.
maxconnections number ;
backlog 5 to cause crossroads to have 5 waiting connections for 1 active connection. The backlog queue shouldn't be too high, or clients will experience timeouts before they can actually connect. The queue shouldn't be too small either, because clients would be simply rejected. Your mileage may vary.
backlog number ;
crossroads status must be able to get to the actual state information of all running services. This is internally implemented through shared memory, which is reserved using a key.
Normally crossroads will supply a shared memory key, based on the service port and bitwise or-ed with a magic number. In situations where this conflicts with existing keys (of other programs, having their own keys), you may supply a chosen value.
The syntax is e.g. shmkey 123456. The actual key value doesn't matter much, as long as it's unique and as long as each invocation of crossroads uses it.
shmkey number ;
connectiontimeout 300. This instructs crossroads to consider a connection where nothing has happened for 300 seconds as 'finished'. Crossroads will terminate the connection when this timeout is exceeded.
connectiontimeout number ;
Inside the service definitions as described in the previous section, backend definitions must also occur. Backend definitions are started by the keyword backend, followed by an identifier (the back end name), and statements inside { and }:
    service myservice {
        ...
        ...     // statements that define the
        ...     // service named 'myservice'
        ...
        backend mybackend {
            ...
            ...     // statements that define the
            ...     // backend named 'mybackend'
            ...
        }
    }
Each service definition must have at least one backend definition. There may be more (and probably will, if you want balancing and fail over) as long as the backend names differ. The statements in the backend definition blocks are described in the following sections.
4.3.1: General Backend Directives
The following directives are used in all types of services (any or http). HTTP-specific directives are shown in section 4.3.2.
server 10.1.1.23, or server web.mydomain.org. A TCP port specifier can follow the server name, as in server web.mydomain.org:80.
server servername ;
server servername:port ;
server specifier doesn't include a TCP port, then this statement is used to define the port at which the back end expects its traffic. There is one argument, the (numeric) port number.
port number ;
server setting or using the port specifier.
service specifications, a backend can have its own verbosity (on or off). When on, traffic to and from this back end is reported.
verbosity setting ;
verbose setting ;
true, yes or on to turn verbosity on; or false, no, off to turn it off.
off.
maxconnections statement for the overall service description. The difference is that a maxconnections statement at the level of a service description avoids too many hits from the outside (DoS prevention). A maxconnections statement at the level of a back end description makes sure that this particular back end doesn't get overloaded.
maxconnections number ;
The weighing mechanism only applies to the dispatch modes random, byconnections, bysize and byduration.
The weight is in fact a penalty factor. E.g., if backend A has weight 2 and backend B has weight 1, then backend B will be selected all the time, until its usage parameter is twice as large as the parameter of A. Think of it as a 'sluggishness' statement.
weight number ;
decay 10 makes sure that the load that crossroads computes for this back end (be it in seconds or in bytes) is decreased by 10% each time that another back end is hit. Decays are not applied to the count of concurrent connections.
This means that when a given back end is hit, then its usage data of the transferred bytes and the connection duration are updated using the actual number of bytes and actual duration. However, when a different back end is hit, then the usage data are decreased by the specified decay.
decay number ;
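The effect of a decay can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Crossroads code: the helper name apply_decay is made up, and the sketch assumes that each decay step multiplies the stored usage by (1 - decay/100), so that repeated decays compound.

```python
def apply_decay(usage: float, decay_percent: float) -> float:
    """Decrease a usage figure by decay_percent percent (hypothetical helper)."""
    return usage * (1.0 - decay_percent / 100.0)


# A back end with 'decay 10' and a usage figure of 100 (seconds or bytes).
# Each time another back end is hit, the figure shrinks by 10%.
usage = 100.0
for hit_elsewhere in range(3):
    usage = apply_decay(usage, 10)
    print(f"{usage:.2f}")
```

Under the compounding assumption, the figure drops from 100 to about 72.9 after three hits on other back ends.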
onfailure and onsuccess. The argument to the triggers is a system command that is executed when a connection with the back end either fails or succeeds.
onfailure commandline ;
and
onsuccess commandline ;
trafficlog and throughputlog.
trafficlog filename ;
throughputlog filename ;
The trafficlog statement causes all traffic to be logged in hexadecimal format. Each line is prefixed by B or C, depending on whether the information was received from the back end or from the client.
The throughputlog statement writes shorthand transmissions to its log, accompanied by timings.
4.3.2: HTTP-related Backend Directives
The following directives are specific for HTTP-type services; i.e., services with a specification type http.
It is inevitable that when Crossroads handles services of type http, more processing is necessary. Crossroads has to unpack the TCP payload in order to do its header magic, which has a performance impact.
stickycookie value causes Crossroads to unpack clients' requests, to check for value in the cookies. When found, the message is routed to the back end having the appropriate stickycookie directive.
E.g., consider the following configuration:
    service ... {
        ...
        backend one {
            ...
            stickycookie "BalancerID=first";
        }
        backend two {
            ...
            stickycookie "BalancerID=second";
        }
    }
When clients' messages contain cookies named BalancerID with the value first, then such messages are routed to backend one. When the value is second then they are routed to the backend two.
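The routing decision described above can be modelled as a lookup of the configured stickycookie values in the request's Cookie header. The Python sketch below is an illustration, not the Crossroads implementation: the helper name sticky_backend is made up, and matching on exact name=value pairs split at semicolons is an assumption about how the check is done.

```python
from typing import Dict, Optional

# Back end name -> stickycookie value, mirroring the configuration above.
STICKY = {"one": "BalancerID=first", "two": "BalancerID=second"}


def sticky_backend(cookie_header: str, sticky: Dict[str, str]) -> Optional[str]:
    """Return the back end whose stickycookie value occurs in the Cookie header."""
    pairs = {pair.strip() for pair in cookie_header.split(";")}
    for backend, value in sticky.items():
        if value in pairs:
            return backend
    return None  # no sticky cookie: fall back to the dispatch mode


print(sticky_backend("lang=en; BalancerID=second", STICKY))
print(sticky_backend("lang=en", STICKY))
```

A request that carries no matching cookie yields None here, which corresponds to the case where the normal dispatch mode selects the back end.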
There are basically two ways to provide such cookies to a browser. First, a back end can insert such a cookie into the HTTP response. E.g., the webserver of back end one might insert a cookie named BalancerID, having value first.
Second, Crossroads can insert such cookies using a carefully crafted directive addclientheader. See below.
addclientheader, appendclientheader, setclientheader, addserverheader, appendserverheader, setserverheader.
The directive names always consist of ActionDestinationheader, where:
add, append or set.
add adds a header, even when headers with the same name already are present in an HTTP message. Adding headers is useful for e.g. Set-Cookie headers; a message may contain several of such headers.
append adds a header if it isn't present yet in an HTTP message. If such a header is already present, then the value is appended to the pre-existing header. This is useful for e.g. Via headers. Imagine an HTTP message with a header Via: someproxy. Then the directive appendclientheader "Via: crossroads" will rewrite the header to Via: someproxy; crossroads.
set overwrites headers with the same name; or adds a new header if no pre-existing is found. This is useful for e.g. Host headers.
client or server. When the destination is server, then Crossroads will apply such directives to HTTP messages that originate from the browser and are being forwarded to back ends. When the destination is client, then Crossroads will apply such directives to backend responses that are shuttled to the browser.
The syntax of the directives is e.g. addclientheader "X-Processed-By: Crossroads";. The directives expect one argument; a string, consisting of a header name, a colon, and a header value. The directive ends with a semicolon.
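The difference between the add, append and set actions is easiest to see on a small header list. The Python sketch below models a message's headers as a list of (name, value) pairs; it is an illustration, not the Crossroads code. The function names are made up, the "; " separator follows the Via example above, and the handling of several pre-existing headers with the same name (append touches only the first, set removes all of them) is an assumption.

```python
from typing import List, Tuple

Headers = List[Tuple[str, str]]


def add_header(headers: Headers, name: str, value: str) -> None:
    """add: always add a new header, even if the name is already present."""
    headers.append((name, value))


def append_header(headers: Headers, name: str, value: str) -> None:
    """append: extend an existing header's value, or add the header if absent."""
    for i, (n, v) in enumerate(headers):
        if n.lower() == name.lower():
            headers[i] = (n, v + "; " + value)
            return
    headers.append((name, value))


def set_header(headers: Headers, name: str, value: str) -> None:
    """set: overwrite headers with the same name, or add a new one."""
    headers[:] = [(n, v) for n, v in headers if n.lower() != name.lower()]
    headers.append((name, value))


msg: Headers = [("Via", "someproxy"), ("Server", "Apache")]
append_header(msg, "Via", "crossroads")
set_header(msg, "Server", "WWW-Server")
add_header(msg, "Set-Cookie", "XRID=100; Path=/")
print(msg)
```

After the three calls the Via header reads someproxy; crossroads, the Server header has been replaced, and a Set-Cookie header has been added.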
The header value may contain one of the following formatting directives:
%r is expanded to the real IP address of a client;
%t is expanded to a timestamp of the local time;
%T is expanded to a timestamp of Greenwich Mean Time;
%v is expanded to the Crossroads version;
%x (where x is any other character) is expanded to x. E.g., %% is a literal % sign.
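The expansion rules above can be modelled with a small scanner. The following Python sketch is an illustration only: the function name expand, its parameters, and in particular the timestamp layout in TIMESTAMP_FORMAT are assumptions, since the text does not say how Crossroads formats %t and %T. A lone % at the very end of the value is kept as-is, which is also an assumption.

```python
import time
from typing import Optional

# Assumed timestamp layout; the real format used by Crossroads is not given here.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def expand(value: str, client_ip: str, version: str,
           now: Optional[float] = None) -> str:
    """Expand %r, %t, %T, %v and %x in a header value (hypothetical helper)."""
    now = time.time() if now is None else now
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "%" and i + 1 < len(value):
            code = value[i + 1]
            if code == "r":
                out.append(client_ip)
            elif code == "t":
                out.append(time.strftime(TIMESTAMP_FORMAT, time.localtime(now)))
            elif code == "T":
                out.append(time.strftime(TIMESTAMP_FORMAT, time.gmtime(now)))
            elif code == "v":
                out.append(version)
            else:
                out.append(code)  # %x expands to x, so %% gives a literal %
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand("X-Real-IP: %r", "192.0.2.7", "1.08"))
print(expand("X-Load: 100%%", "192.0.2.7", "1.08"))
```

The client address 192.0.2.7 and the version string 1.08 are sample inputs chosen for the demonstration.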
The following examples show common uses of header modifications.
stickycookie and addclientheader, HTTP session stickiness is enforced. Consider the following configuration:
    service ... {
        ...
        backend one {
            ...
            addclientheader "Set-Cookie: BalancerID=first; path=/";
            stickycookie "BalancerID=first";
        }
        backend two {
            ...
            addclientheader "Set-Cookie: BalancerID=second; path=/";
            stickycookie "BalancerID=second";
        }
    }
The first request of an HTTP session is balanced to either backend one or two. The server response is enriched using addclientheader with an appropriate cookie. A subsequent request from the same browser now has that cookie in place; and is therefore sent to the same back end where its predecessors went.
Server: Apache 1.27. This potentially provides information to attackers. The following configuration hides such information:
    service ... {
        ...
        backend one {
            ...
            setclientheader "Server: WWW-Server";
        }
    }
X-Real-IP:
    service ... {
        ...
        backend one {
            ...
            setserverheader "X-Real-IP: %r";
        }
    }
setclientheader and setserverheader also play a key role in downgrading Keep-Alive connections to 'single-shot'. E.g., the following configuration makes sure that no Keep-Alive connections occur.
    service ... {
        ...
        backend one {
            ...
            setserverheader "Connection: close";
            setclientheader "Connection: close";
        }
    }
In order to tune your load balancing, you'll need to understand how crossroads computes usage, how weighing works, and so on. In this section we'll focus on the dispatching modes bysize, byduration and byconnections only. The other dispatching types are self-explanatory.
5.1.1: Bysize, byduration or byconnections?
As stated before, crossroads doesn't know 'what a service does' and how to judge whether a given back end is very busy or not. You must therefore give the right hints:
byduration is appropriate here.
bysize is appropriate.
byduration can also be used when network latency is an issue. E.g., if your balancer has back ends that are geographically distributed, then byduration would be a good way to select best available back ends.
dispatchmode byduration is not usable for interactive processes such as SSH logins. Idle time of a login adds to the duration, while causing (almost) no load. Mode byduration should only be used for automated processes that don't wait for user interaction (e.g., SOAP calls and other HTTP requests).
byconnections can be used if you don't have other clues for load estimations.
E.g., consider a database connection. What's heavier on the back end, time-consuming connections, or connections where loads of bytes are transferred? Well, that depends. A tough select query that joins multiple tables can be very heavy on the back end, though the response set can be quite small - and hence the number of transferred bytes. That would suggest dispatching by duration. However, byduration balancing doesn't represent the true world, when interactive connections can occur where users have an idle TCP connection to the database: this consumes time, but no bytes (see the SSH login example above). In this case, the dispatch mode byconnections may be your best bet.
5.1.2: Averaging size and duration
The configuration statement dispatchmode bysize or byduration allows an optional modifier over number, where the stated number represents a connection count. When this modifier is present, then crossroads will use a moving average over the last n connections to compute duration and size figures.
In the real world you'll always want this modifier. E.g., consider two back ends that are running for years now, and one of them is suddenly overloaded and very busy (it experiences a 'spike' in activity). When the over modifier is absent, then the sudden load will hardly show up in the usage figures -- it will flatten out due to the large usage figures already stored in the years of service.
In contrast, when e.g. over 3 is in effect, then a sudden load does show up -- because it highly contributes to the average of three connections.
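The difference between averaging over all connections and averaging over the last few can be demonstrated numerically. The Python sketch below is a model, not the Crossroads code: the class name UsageAverage is made up, and treating the figure without the over modifier as a plain average over all connections since startup is an assumption made so that both figures are comparable.

```python
from collections import deque
from typing import Optional


class UsageAverage:
    """Average duration (or size) per connection, optionally over the last n."""

    def __init__(self, over: Optional[int] = None) -> None:
        # With 'over n', only the last n samples are kept.
        self.window = deque(maxlen=over) if over else None
        self.total = 0.0
        self.count = 0

    def record(self, value: float) -> None:
        if self.window is not None:
            self.window.append(value)
        self.total += value
        self.count += 1

    def load(self) -> float:
        if self.window is not None:
            return sum(self.window) / len(self.window) if self.window else 0.0
        return self.total / self.count if self.count else 0.0


all_time = UsageAverage()        # no 'over' modifier
windowed = UsageAverage(over=3)  # 'over 3'

# A long quiet history of 1-second connections, then a spike of three 30-second ones.
for duration in [1.0] * 1000 + [30.0] * 3:
    all_time.record(duration)
    windowed.record(duration)

print(round(all_time.load(), 3), windowed.load())
```

The all-time figure barely moves after the spike, while the figure over the last three connections jumps to 30 seconds, which is the behavior that makes the over modifier useful.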
Decays are also only relevant when crossroads computes the 'next best back end' by size (bytes) or duration (seconds). E.g., imagine two back ends A and B, both averaged over say 3 connections.
Now when back end A is suddenly hit by a spike, its average would go up accordingly. But the back end would never again be used, unless B also received a similar spike, because A's 'usage data' over its last three connections would forever be larger than B's data.
For that reason, you should in real situations probably always specify a decay, so that the backend selection algorithm recovers from spikes. Note that the usage data of the back end where a decay is specified, decay when other back ends are hit. The decay parameter is like specifying how fast your body regenerates when someone else does the work.
The below configuration illustrates this:
    /* Definition of the service */
    service soap {
        /* Local TCP port */
        port 8080;
        /* We'll select back ends by the processing
         * duration */
        dispatchmode byduration over 3;
        /* First back end: */
        backend A {
            /* Back end IP address and port */
            server 10.1.1.1:8080;
            /* When this back end is NOT hit because
             * the other one was less busy, then the
             * usage parameters decay 10% per connection */
            decay 10;
        }
        /* Second back end: */
        backend B {
            server 10.1.1.2:8080;
            decay 10;
        }
    }
The back end modifier weight is useful in situations where your back ends differ in respect to performance. E.g., your back ends may be geographically distributed, and you know that a given back end is difficult to reach and often experiences network lag.
Or you may have one primary back end, a system with a fast CPU and enough memory, and a small fall-back back end, with a slow CPU and short on memory. In that case you know in advance that the second back end should be used only rarely. Most requests should go to the big server, up to a certain load.
In such cases you will know in advance that the best performing back ends should be selected the most often. Here's where the weight statement comes in: you can simply increase the weight of the back ends with the least performance, so that they are selected less frequently.
E.g., consider the following configuration:
    service soap {
        port 8080;
        dispatchmode byduration over 3;
        backend A {
            server 10.1.1.1:8080;
            decay 20;
        }
        backend B {
            server 10.1.1.2:8080;
            weight 2;
            decay 10;
        }
        backend C {
            server 10.1.1.3:8080;
            weight 4;
            decay 5;
        }
    }
This will cause crossroads to select back ends by the processing time, averaging over the last three connections. However, backend B will kick in only when its usage is half of the usage of A (back end B is probably only half as fast as A). Backend C will kick in only when its usage is a quarter of the usage of A, that is, half of the usage of B (back end C is probably very weak, and just a fall-back system in case both A and B crash). Note also that A's usage data decay much faster than B's and C's: we're assuming that this big server recovers quicker than its smaller siblings.
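The penalty effect of weights can be modelled by multiplying each back end's usage figure by its weight and selecting the lowest product. The Python sketch below follows that reading of the text; it is not taken from the Crossroads sources. The function name pick_backend is made up, a back end without a weight statement is assumed to have weight 1, and ties are resolved in favor of the back end listed first.

```python
from typing import Dict

# Weights from the configuration above; A has no weight statement (assumed 1).
WEIGHTS = {"A": 1, "B": 2, "C": 4}


def pick_backend(usage: Dict[str, float], weight: Dict[str, int]) -> str:
    """Pick the back end with the lowest usage after applying the weight penalty."""
    return min(usage, key=lambda name: usage[name] * weight.get(name, 1))


# B is selected only once its usage is below half of A's usage.
print(pick_backend({"A": 10.0, "B": 6.0, "C": 3.0}, WEIGHTS))  # A
print(pick_backend({"A": 10.0, "B": 4.0, "C": 3.0}, WEIGHTS))  # B
# C is selected only once its usage is below a quarter of A's usage.
print(pick_backend({"A": 10.0, "B": 6.0, "C": 2.0}, WEIGHTS))  # C
```

The usage values are sample durations in seconds; the same comparison applies to byte counts in bysize mode.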
5.1.5: Throttling the number of concurrent connections
If you suspect that your service may occasionally receive 'spikes' of activity (which you should always assume), then it might be a good idea to protect your service by specifying a maximum number of concurrent connections. This protection can be specified on two levels:
maxconnections 100; states that the service as a whole will never service more than 100 concurrent connections. This means that all your back ends and the crossroads balancer itself will be protected from being overloaded.
maxconnections 10; states that this particular back end will never have more than 10 concurrent connections; regardless of the overall setting on the service level. This means that this particular back end will be protected from being overloaded (regardless of what other back ends may experience).
The maxconnections statement, combined with a back end selection algorithm, allows very fine granularity. The maxconnections statement on the back end level is like a hand brake: even when you specify a back end algorithm that would protect a given back end from being used too much, a situation may occur where that back end is about to be hit. A maxconnections statement on the level of that back end may then protect it.
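The two levels of throttling can be pictured as two checks that a new connection has to pass. The Python sketch below is a model under assumptions, not the Crossroads logic: the function names are made up, a missing limit is represented by None and treated as unlimited, and what Crossroads does with a connection that passes neither check is not modelled.

```python
from typing import Dict, List, Optional, Tuple


def within_limit(active: int, limit: Optional[int]) -> bool:
    """True when one more connection fits under the limit (None: no limit)."""
    return limit is None or active < limit


def eligible_backends(service_active: int, service_max: Optional[int],
                      backends: Dict[str, Tuple[int, Optional[int]]]) -> List[str]:
    """Back ends that may take a new connection.

    backends maps a back end name to (active connections, maxconnections).
    """
    if not within_limit(service_active, service_max):
        return []  # the service-level limit protects everything behind it
    return [name for name, (active, limit) in backends.items()
            if within_limit(active, limit)]


# Service limit 100; back end 'one' is capped at 10 and already full.
state = {"one": (10, 10), "two": (35, None)}
print(eligible_backends(45, 100, state))   # ['two']
print(eligible_backends(100, 100, state))  # []
```

The dispatch mode would then choose among the eligible back ends only.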
This section focuses on HTTP session stickiness. This term refers to the ability of a balancer to route a conversation between browser and a backend farm always to the same back end. In other words: once a back end is selected by the balancer, it will remain the back end of choice, even for subsequent connections.
The rule of thumb as far as the balancer is concerned, is: Do not use HTTP session stickiness unless you really have to. Enabling session stickiness hampers failover, balancing and performance:
There is a number of measures that you can take to avoid using session stickiness. E.g., session data can be 'shared' between web back ends. PHP offers functionality to store session data in a database, so that all PHP applications have access to these data. Application servers such as Websphere can be configured to replicate session data between nodes.
However, if you must use session stickiness, then proceed as follows:
service description, set the type to http.
stickycookie and addclientheader directives.
Once crossroads sees that, it will examine each HTTP message that it shuttles between client and back end:
Set-Cookie directive.
Below is a short example of a configuration.
    service www {
        port 80;
        type http;
        revivinginterval 15;
        dispatchmode byconnections;
        backend one {
            server 10.1.1.100:80;
            stickycookie XRID=100;
            addclientheader "Set-Cookie: XRID=100; Path=/";
        }
        backend two {
            server 10.1.1.101:80;
            stickycookie XRID=101;
            addclientheader "Set-Cookie: XRID=101; Path=/";
        }
    }
Note how the cookie names and values in the directives stickycookie and addclientheader match. That is obviously a prerequisite for stickiness.
Since Crossroads just shuttles bytes to and fro, meta-information of network connections is lost. As far as the back ends are concerned, their connections originate at the Crossroads junction. For example, standard Apache access logs will show the IP address of Crossroads.
In order to compensate for this, Crossroads can insert a special header in HTTP connections, to inform the back end of the original client's IP address. In order to enable this, the Crossroads configuration must state the following:
The type must be http, and not any;
The directive addserverheader "X-Real-IP: %r"; must be stated. X-Real-IP is a common name for this purpose.
After this, HTTP traffic that arrives at the back ends has a new header, X-Real-IP, holding the client's IP address.
Note that once the type is set to http, Crossroads' performance will be hampered: all passing messages will have to be unpacked and analyzed.
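To make the effect of such a header directive concrete, here is a minimal Python sketch of what inserting the header into a request could look like. It assumes that %r expands to the client's IP address, as described above; the function name add_server_header and the position of the new header (directly after the request line) are assumptions for illustration, not a description of Crossroads' internals.

```python
def add_server_header(request: bytes, template: str, client_ip: str) -> bytes:
    """Insert one header line right after the request line of an HTTP request."""
    # Expand the %r placeholder to the client's address (assumed meaning).
    header = template.replace("%r", client_ip).encode("ascii")
    request_line, separator, rest = request.partition(b"\r\n")
    if not separator:
        # No complete request line yet; pass the bytes through untouched.
        return request
    return request_line + b"\r\n" + header + b"\r\n" + rest
```

The sketch also hints at why type http costs performance: the message has to be split and reassembled instead of being copied through as opaque bytes.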
5.3.1: Sample Crossroads configuration
The below sample configuration shows two HTTP back ends that receive the client's IP address:
    service www {
        port 80;
        type http;
        revivinginterval 5;
        dispatchmode roundrobin;

        backend one {
            server 10.1.1.100:80;
            addserverheader "X-Real-IP: %r";
        }
        backend two {
            server 10.1.1.200:80;
            addserverheader "X-Real-IP: %r";
        }
    }
5.3.2: Sample Apache configuration
The method by which each back end analyzes the header X-Real-IP will obviously differ per server implementation. However, a common method with the Apache webserver is to log the client's IP address into the access log.
Often this is accomplished using the log format common, defined as follows:
    LogFormat "%h %l %u %t %D \"%r\" %>s %b" common
    CustomLog logs/access_log common
The first line defines the format common, with the remote host specified by %h. The second line sends access information to a log file logs/access_log, using the previously defined format common.
Fortunately, Apache's LogFormat allows one to log contents of headers. By replacing the %h with %{X-Real-IP}i, the desired information is sent to the log. Therefore, normally you can simply redefine the common format to:
LogFormat "%{X-Real-IP}i %l %u %t %D \"%r\" %>s %b" common
In case the traffic between client and back end must be debugged, the statement trafficlog filename can be issued. This causes the traffic to be dumped in hexadecimal format to the stated filename.
Traffic sent by the client is prefixed by a C, traffic sent by the back end is prefixed by a B. Below is a sample traffic dump of a browser trying to get an HTML page. The server replies that the page was not modified.
    C 0000 47 45 54 20 68 74 74 70 3a 2f 2f 77 77 77 2e 63  GET http://www.c
    C 0010 73 2e 68 65 6c 73 69 6e 6b 69 2e 66 69 2f 6c 69  s.helsinki.fi/li
    C 0020 6e 75 78 2f 6c 69 6e 75 78 2d 6b 65 72 6e 65 6c  nux/linux-kernel
    C 0030 2f 32 30 30 31 2d 34 37 2f 30 34 31 37 2e 68 74  /2001-47/0417.ht
    C 0040 6d 6c 20 48 54 54 50 2f 31 2e 31 0d 0a 43 6f 6e  ml HTTP/1.1..Con
    C 0050 6e 65 63 74 69 6f 6e 3a 20 63 6c 6f 73 65 0d 0a  nection: close..
    .
    . etcetera
    .
    B 0000 48 54 54 50 2f 31 2e 30 20 33 30 34 20 4e 6f 74  HTTP/1.0 304 Not
    B 0010 20 4d 6f 64 69 66 69 65 64 0d 0a 44 61 74 65 3a   Modified..Date:
    B 0020 20 54 75 65 2c 20 31 32 20 4a 75 6c 20 32 30 30   Tue, 12 Jul 200
    B 0030 35 20 30 39 3a 34 39 3a 34 37 20 47 4d 54 0d 0a  5 09:49:47 GMT..
    B 0040 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 65  Content-Type: te
    B 0050 78 74 2f 68 74 6d 6c 3b 20 63 68 61 72 73 65 74  xt/html; charset
    .
    . etcetera
    .
Turning on traffic dumps will significantly slow down crossroads.
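When a dump has to be inspected in bulk, the hexadecimal columns can be turned back into the original byte streams. The following Python sketch assumes the layout seen in the sample (a C or B marker, an offset, then up to 16 hexadecimal byte values followed by a printable rendering); the function name decode_dump is hypothetical. A final line with fewer than 16 bytes may be ambiguous if its printable part happens to look like hexadecimal, so treat the result as a debugging aid only.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def decode_dump(lines):
    """Rebuild the client ('C') and back end ('B') byte streams from dump lines."""
    streams = {"C": bytearray(), "B": bytearray()}
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] not in streams:
            continue  # skip blank lines and anything that is not a dump line
        # tokens[1] is the offset; at most 16 byte values follow it.
        for token in tokens[2:18]:
            if not HEX_BYTE.match(token):
                break
            streams[tokens[0]].append(int(token, 16))
    return bytes(streams["C"]), bytes(streams["B"])
```

Feeding it the lines of a traffic log yields two byte strings: everything the client sent and everything the back end sent.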
Besides trafficlog, there is also a directive throughputlog. This directive also takes one argument, a filename. The file is appended to, and the following information is logged:
As an example, consider the following (the lines are shortened for brevity and prefixed by line numbers for clarity):
    1 0000594 0.000001 C GET http://public.e-tunity.com/index.html...
    2 0000594 0.173713 B HTTP/1.0 200 OK..Date: Fri, 18 Nov 2005 0...
    3 0000594 0.278125 B width="100" bgcolor="#e0e0e0" valign="to...
    4 0000595 0.000001 C GET http://public.e-tunity.com/css/style/...
    5 0000594 0.944339 B /a></td>.. </tr>.</table>.</td><td class...
    6 0000594 0.946356 B smallboxdownl">Download</td>.. <td class...
    7 0000594 0.961102 B td><td class="smallboxodd" valign="top"><...
    8 0000595 0.698215 B HTTP/1.0 304 Not Modified..Date: Fri, 18 ...
This tells us that:
index.html requested in line 1.
It is also worthwhile remembering that the start time of a C request is the time that crossroads sees the activity. Any latency between the true client and crossroads is obviously not included. This is illustrated by the simple ASCII art below:
    client ---->---->---->--->*crossroads ====>====>====> \
                                                            back end
                                                          /
    client ----<----<----<---< crossroads ====<====<====<
This simple picture shows a typical HTTP request that originates at a client, travels to crossroads, and is relayed via the back end. The C entry in a throughput log is the time when crossroads sees the request, indicated by an asterisk. The B entries are the times that it takes the back end to answer, indicated by === style lines. Therefore, the true roundtrip time will be longer than the number of seconds that are logged in the throughput log: the latency between client and crossroads isn't included in that measurement.
Summarizing, the throughput times of a client-back end connection can be analyzed using the directive throughputlog. In a real-world analysis, you'd probably want to write up a script to analyze the output and to compute round trip times. Such scripts are not (yet) included in Crossroads.
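As a starting point for such a script, here is a minimal Python sketch. It is not part of Crossroads and rests on assumptions read off the sample above: each log line (without the line numbers that were added for clarity) starts with an identifier that groups the entries of one connection, followed by a time in seconds, a C or B marker, and the data. The function name summarize is hypothetical.

```python
from collections import defaultdict


def summarize(lines):
    """Map each connection id to (first reply time, last reply time) in seconds."""
    replies = defaultdict(list)
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3 or parts[2] not in ("C", "B"):
            continue  # not a throughput entry in the assumed layout
        try:
            seconds = float(parts[1])
        except ValueError:
            continue
        if parts[2] == "B":
            replies[parts[0]].append(seconds)
    return {conn: (min(times), max(times)) for conn, times in replies.items()}
```

The first value per connection approximates the time until the back end starts answering, the second the time until its last chunk was seen. Neither includes the latency between client and crossroads.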
As a general hint, use crossroads sampleconf to view the most up-to-date examples of configurations. The description below shows a few examples too.
5.5.1: A load balancer for three webserver back ends
The following configuration example binds crossroads to port 8000 of the current server, and distributes the load over three back ends. This configuration shows most of the possible settings.
    service www {
        /* We don't need session stickiness. */
        type any;

        /* Port on which we'll listen in this service: required. */
        port 8000;

        /* What IP address should this service listen on? Default is 'any'.
         * Alternatively you can state an explicit IP address, such as
         * 127.0.0.1; that would bind the service only to 'localhost'. */
        bindto any;

        /* Verbose reporting or not. Default is off. */
        verbosity on;

        /* Dispatching mode, or: How to select a back end for an incoming
         * request. Possible values:
         * roundrobin: just the next back end in line
         * random: like roundrobin, but at random to make things more
         *    confusing. Probably only good for testing.
         * bysize: The backend that transferred the least nr of bytes
         *    is the next in line. As a modifier you can say e.g.
         *    bysize over 10, meaning that the 10 last connections will
         *    be used to compute the transfer size, instead of all
         *    transfers.
         * byduration: The backend that was active for the shortest time
         *    is the next in line. As a modifier you can say e.g.
         *    byduration over 10 to compute over the last 10 connections.
         * byconnections: The back end with the least active connections
         *    is the next in line.
         * byorder: The first available back end is always taken. */
        dispatchmode byduration over 5;

        /* Interval at which we'll check whether a temporarily unavailable
         * backend has woken up. */
        revivinginterval 5;

        /* TCP backlog of connections. Default is 0 (no backlog, one
         * connection may be active). */
        backlog 5;

        /* For status reporting: a shared memory key. Default is the same
         * as the port number, OR-ed with a magic number. */
        shmkey 8000;

        /* This controls when crossroads should consider a connection as
         * finished even when the TCP sockets weren't closed. This is to
         * avoid hanging connections that don't do anything. NOTE THAT when
         * crossroads cuts off a connection because the timeout is exceeded,
         * this is not marked as a failure, but as a success.
         * Default is 0: no timeout. */
        connectiontimeout 300;

        /* The max number of allowed client connections. When present, connections
         * won't be accepted if the max is about to be exceeded. When
         * absent, all connections will be accepted, which might be misused
         * for a DoS attack. */
        maxconnections 300;

        /* Now let's define a couple of back ends. Number 1: */
        backend www_backend_1 {
            /* The server and its port, the minimum configuration. */
            server httpserver1;
            port 9010;

            /* The 'decay' of usage data of this back end. Only relevant
             * when the whole service has 'dispatchmode bysize' or
             * 'byduration'. The number is a percentage by which the usage
             * parameter is decreased upon each connection of another back
             * end. */
            decay 10;

            /* To see what's happening in /var/log/messages: */
            verbosity on;
        }

        /* The second one: */
        backend www_backend_2 {
            /* Server and port */
            server httpserver2;
            port 9011;

            /* Verbosity of reporting when this back end is active */
            verbosity on;

            /* Decay */
            decay 10;

            /* This back end is twice as weak as the first one */
            weight 2;

            /* Event triggers for system commands upon successful activation
             * and upon failure. */
            onsuccess echo 'success on backend 2' | mail root;
            onfailure echo 'failure on backend 2' | mail root;
        }

        /* And yet another one.. this time we will dump the traffic
         * to a trace file. Furthermore we don't want more than 10 concurrent
         * connections here. Note that there's also a total maxconnections for the
         * whole service. */
        backend www_backend_3 {
            server httpserver3;
            port 9000;
            verbosity on;
            decay 10;
            trafficlog /tmp/backend.3.log;
            maxconnections 10;
        }
    }
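The dispatch modes described in the comments of this configuration can be illustrated with a small Python sketch. It only mirrors the descriptions of byorder, byconnections and roundrobin given above; the class Backend and the three functions are hypothetical names, and weights, decay and the bysize/byduration bookkeeping are deliberately left out.

```python
class Backend:
    """Minimal stand-in for a configured back end (illustrative only)."""

    def __init__(self, name, available=True, connections=0):
        self.name = name
        self.available = available
        self.connections = connections  # currently active connections


def by_order(backends):
    """byorder: always take the first available back end."""
    return next((b for b in backends if b.available), None)


def by_connections(backends):
    """byconnections: the available back end with the fewest active connections."""
    live = [b for b in backends if b.available]
    return min(live, key=lambda b: b.connections) if live else None


def round_robin(backends, last_index):
    """roundrobin: index of the next available back end after last_index."""
    count = len(backends)
    for step in range(1, count + 1):
        index = (last_index + step) % count
        if backends[index].available:
            return index
    return None  # nothing is available
```

All three skip back ends that are marked unavailable, which is how a balancer of this kind combines load distribution with fail over.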
5.5.2: An HTTP forwarder when travelling
As another example, here's my crossroads.conf that I use on my Unix laptop. The problem that I face is that I need many HTTP proxy configurations (at home, at customers' sites and so on) but I'm too lazy to reconfigure browsers all the time.
Here's how it used to be before crossroads:
http://localhost:3128.
http://localhost:3129.
http://10.120.34.113:8080, because they have configured it so.
http://localhost:8888.
Here's how it works with a crossroads configuration:
http://localhost:8080 as the proxy. For all situations.
dispatchmode byorder. This makes sure that once crossroads determines which backend works, it will stick to it. This usage of crossroads doesn't need to balance over more than one back end.
bindto 127.0.0.1 makes sure that requests from other interfaces than loopback won't get serviced.
    service HttpProxy {
        port 8080;
        bindto 127.0.0.1;
        verbosity on;
        dispatchmode byorder;
        revivinginterval 15;

        backend Charles {
            server localhost:8888;
            verbosity on;
        }
        backend CustomerProxy {
            server 10.120.34.113:8080;
            verbosity on;
        }
        backend SshTunnel {
            server localhost:3129;
        }
        backend LocalSquid {
            server localhost:3128;
        }
    }
As a final note, the command-line argument tell can be used to influence crossroads' own detection of back end availability. E.g., if in the above example the back ends SshTunnel and LocalSquid are both active, then crossroads tell httpproxy sshtunnel down will 'take down' the back end SshTunnel, and will automatically cause crossroads to switch to LocalSquid.
5.5.3: SSH login with enforced idle logout
The following example shows how crossroads 'throttles' SSH logins. Connections are accepted on port 22 (the normal SSH port) and forwarded to the actual SSH daemon which is running on port 2222.
Note the usage of the connectiontimeout directive. This makes sure that users are logged out after 10 minutes of inactivity. Note also the maxconnections setting, which makes sure that no more than 10 concurrent logins occur.
    service Ssh {
        port 22;
        backlog 5;
        maxconnections 10;
        connectiontimeout 600;

        backend TrueSshDaemon {
            server localhost:2222;
        }
    }
The benchmark was run on a system where the following was varied:
The crossroads configuration of the second alternative is shown below:
    service HttpProxy {
        port 8080;
        verbosity on;

        backend LocalSquid {
            server 127.0.0.1;
            port 3128;
            verbosity on;
        }
    }
The results of this test are that crossroads causes a negligible delay, if it is statistically relevant at all. Without crossroads, the timing results are:
    real    0m8.146s
    user    0m0.130s
    sys     0m0.253s
When using crossroads as a middle station, the results are:
    real    0m9.481s
    user    0m0.141s
    sys     0m0.230s
The results shown above are quite favorable to crossroads. However, one should know that situations will exist where crossroads leans towards the 'worst case' scenario, causing up to 50% delay.
E.g., imagine a test where a wget command retrieves an HTML document from an Apache server on localhost. Now we have (almost) no overhead due to network throttling, hostname lookups and so on. If this test were run either with or without crossroads in between, then theoretically, crossroads would cause a much larger delay, because it has to read from the server, and then write the same information to wget. Each read/write occurs twice when crossroads sits in between.
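The doubling of reads and writes is easy to see in a user-space relay loop. The Python sketch below is a generic illustration of such a loop, not Crossroads' source code; the function name relay and the buffer size are assumptions.

```python
import socket


def relay(source, destination, bufsize=4096):
    """Copy bytes from one connected socket to another until EOF; return the count."""
    total = 0
    while True:
        chunk = source.recv(bufsize)    # extra read: data enters the relay's buffer
        if not chunk:
            break                       # the peer closed its side
        destination.sendall(chunk)      # extra write: data leaves the relay again
        total += len(chunk)
    return total
```

Every byte that the back end writes is read once by the relay and written once more before the client can read it, which is the overhead that a kernel-level forwarder avoids.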
This worst case scenario will however (fortunately) occur only very seldom in the real world:
LVS is a kernel-based balancer that acts like a masquerading firewall: TCP packets that arrive at the balancer are sent to one of the configured back ends. LVS has the advantage over crossroads that there is no stop-and-go in the transmission; in contrast, crossroads needs to send data via an internal buffer. Crossroads has the advantage that it offers instantaneous failover because it tries to contact the back end upon each new TCP connection; in contrast, LVS isn't aware of downtime of back ends (unless one implements an external heartbeat). Also, crossroads offers more complex balancing than LVS.
On the balancer, LVS was run on port 80, its forwarding set up for two equally weighted back ends, using ipvsadm:
    ipvsadm -a -t 192.168.1.250:http -r 10.1.1.100:http -m -w 1
    ipvsadm -a -t 192.168.1.250:http -r 10.1.1.101:http -m -w 1
Crossroads was run on port 81. The configuration file is shown below:
    service http {
        port 81;
        dispatchmode roundrobin;
        revivinginterval 5;

        backend one {
            server 10.1.1.100;
            port 80;
        }
        backend two {
            server 10.1.1.101;
            port 80;
        }
    }
In the first test, ports 80 and 81 on the balancer were 'bombed' with 50 concurrent clients, each requesting a small page 50 times. The following timings were measured:
The results of this test were:
In this setup there seems to be no difference between the performance of LVS and crossroads!
In a second test, the size of the retrieved page was varied from 2,000 to 2,000,000 bytes. This test was taken to see whether crossroads would show performance degradation when transferring larger amounts of data.
For each page size, 30 concurrent clients were started, each of which retrieved the page 50 times. Again, the connect times and processing times were recorded.
The results of the total time (connect time + retrieval time) are shown in the table below:
Bytes | LVS timing | Crossroads timing |
2000 | 0.130741688 | 0.12739582 |
20000 | 0.490916224 | 0.50376901 |
200000 | 3.799440328 | 4.33125273 |
2000000 | 45.25090855 | 45.9600728 |
Again, the results show that crossroads performs just as effectively as LVS, even with large data chunks!
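For readers who want to put a number on the comparison, the relative difference per page size can be computed directly from the table. The short Python sketch below does only that arithmetic; the timings are copied from the table, and the names TIMINGS and relative_difference are illustrative.

```python
# (LVS timing, Crossroads timing) per page size in bytes, copied from the table.
TIMINGS = {
    2000: (0.130741688, 0.12739582),
    20000: (0.490916224, 0.50376901),
    200000: (3.799440328, 4.33125273),
    2000000: (45.25090855, 45.9600728),
}


def relative_difference(lvs, crossroads):
    """Crossroads timing relative to LVS, in percent (positive means slower)."""
    return (crossroads - lvs) / lvs * 100.0


for size, (lvs, crossroads) in TIMINGS.items():
    print(f"{size:>8} bytes: {relative_difference(lvs, crossroads):+.1f}%")
```

A single run per page size says little about variance, so the percentages are best read as an order of magnitude rather than as a precise ranking.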
The creation of crossroads requires:
sed, awk, Perl (5.00 or better);
Basically a Linux or Apple MacOSX box will do nicely. To compile and install crossroads, follow these steps.
Obtain the archive crossroads-type.tar.gz, where type is stable or devel.
Unpack the archive with tar xzf crossroads-X.YY.tar.gz. The contents spill into a subdirectory crossroads-X.YY/.
Edit etc/Makefile.def and verify that all compilation settings are to your liking. The settings are explained in the file. Note that the default distribution of Makefile.def is suited for Linux or Apple MacOSX systems. On other Unices, or on non-Unix systems, you must particularly pay attention to SET_PROC_TITLE_BY.... When in doubt, comment out all SET_PROC_TITLE... settings. Crossroads will work nevertheless, but it won't show nice titles in ps listings. Also there's a macro EXTRA_LIBS to add linkage flags (an example for a Solaris build is included).
Run make local followed by make install. The latter step may have to be done by the user root if the BINDIR setting of etc/Makefile.def points to a root-owned directory.
cp doc/crossroads.html htmldirectory/; where htmldirectory is the destination directory for your HTML manuals;
cp doc/crossroads.pdf pdfdirectory/; where pdfdirectory is the destination directory for your PDF manuals;
cp doc/crossroads.man manualdirectory/crossroads.1, where manualdirectory is e.g. /usr/man/man1, /usr/share/man1, /usr/local/man/man1, /usr/local/share/man1. Any possibility is valid, as long as manualdirectory is one of the directories where manual pages are stored;
gzip manualdirectory/crossroads.1.
Now that the binary is available on your system, you need to create a suitable /etc/crossroads.conf. Use this manual or the output of crossroads sampleconf to get started.
Once you have the configuration ready, start crossroads with crossroads start. Test the availability of your services and back ends. Monitor how crossroads is doing with:
    while [ 1 ] ; do
        tput clear
        crossroads status
        sleep 3
    done
Note that depending on your system you might need sleep 3s, i.e., with an s appended.
    while [ 1 ] ; do
        tput clear
        ps ax | grep crossroads | grep -v grep
        sleep 3
    done
Note that depending on your system you might need ps -ef instead of ps ax.
tail -f /var/log/messages (supply the appropriate system log file if /var/log/messages doesn't work for you).
Now thoroughly test the availability of your back ends through crossroads. The status display will show an updated view of which back ends are selected and how busy they are. The process list will show which crossroads daemons are running. Finally, the tailing of /var/log/messages shows what's going on, especially if you have verbosity on statements in the configuration.
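Part of this testing can be scripted. The following Python sketch checks whether a TCP port accepts connections, which is a quick way to confirm that a crossroads service port, or a back end behind it, is reachable. It is a generic helper and not a Crossroads tool; the function name port_is_open and the default timeout are assumptions.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For instance, port_is_open("localhost", 8000) probes the service port of the earlier three-back-end example. Stopping one back end and probing again shows whether crossroads keeps accepting connections on behalf of the remaining ones.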
Finally, you may want to create a boot-time startup script. The exact procedure depends on the used Unix flavor.
On SysV style systems, there's a startup script directory /etc/init.d where boot scripts for all utilities are located. You may have the chkconfig utility to automate the task of inserting scripts into the boot sequence, but otherwise the steps will resemble the following.
Create a script crossroads in /etc/init.d similar to the following:
    #!/bin/sh
    /usr/local/bin/crossroads -v "$@"
The stated directory /usr/local/bin must correspond with the installation path. The flag -v causes the startup to be more 'verbose'. However, once daemonized, the verbosity is controlled by the appropriate statements in the configuration.
    root> cd /etc/rc.d/rc3.d
    root> ln -s /etc/init.d/crossroads S99crossroads
    root> ln -s /etc/init.d/crossroads K99crossroads
This creates startup (S*) and stop (K*) links that will be run when the system enters or leaves a given runlevel. If your runlevel is 5, then the right cd command is to /etc/rc.d/rc5.d. Alternatively, you can create the symlinks in both runlevel directories.
On BSD style systems, daemons are booted directly from /etc/rc and related scripts. In case you have a file /etc/rc.local, edit it, and add the statement:
/usr/local/bin/crossroads start
If your BSD system lacks /etc/rc.local, then you may need to start Crossroads from /etc/rc. Your mileage may vary.