Network Working Group                                            G. Chen
Internet-Draft                                                   H. Deng
Intended status: Informational                                   B. Zhou
Expires: January 12, 2010                                     CMCC, Inc.
                                                                   M. Xu
                                                                  D. Huo
                                                                  Y. Cao
                                                     Tsinghua University
                                                           July 11, 2009

An Incremental Deployable Mapping Service for Scalable Routing Architecture
draft-chen-lisp-er-mo-01

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 12, 2010.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Chen, et al.            Expires January 12, 2010                [Page 1]
Internet-Draft                    ER+MO                        July 2009

Abstract

This document describes a mechanism for providing a mapping service for LISP-like architectures.
The mapping service consists of an EID Router (ER) mechanism and a supplementary DHT Mapping Overlay (MO). The ER mechanism reduces the forwarding entries in routers while driving packets to the destination through tunnels, and the DHT MO serves as a supplement that provides specific mappings to reduce the number of tunnels. ISPs can deploy the mechanism flexibly, since it costs little and can be introduced step by step.

Table of Contents

1. Introduction
2. Definition of Terms
3. Overview
4. When an ITR meets packets
5. Utilization of current BGP system
   5.1. Automatic Mapping obtainment and storage
   5.2. Mapping propagation by BGP
6. EID Router mechanism
   6.1. Address aggregation policy
   6.2. EID Router
   6.3. When an ER meets packets
7. Supplementary DHT Mapping Overlay (MO)
   7.1. Mapping Node (MN) and Mapping Server (MS)
   7.2. MNID Assignment and K-bucket Table
   7.3. LOOKUP Process
   7.4. Security Consideration of Mapping Storage
   7.5. Self-adaptive Capability
   7.6. Dynamic Adjustment of K value and m value
   7.7. Mapping Storing and Exchanging in Multi-homing Scenario
8. Incremental Deployment
9. Acknowledgements
10. Security Considerations
11. IANA Considerations
12. References
   12.1. Normative References
   12.2. Informative References
Authors' Addresses

1. Introduction

LISP [I-D.farinacci-lisp] is an architecture for scalable routing. It defines two address spaces: Routing Locators (RLOCs) and Endpoint Identifiers (EIDs). LISP uses EIDs as lookup keys for a new EID-to-RLOC mapping database, on top of which several mapping services have been built, such as [I-D.fuller-lisp-alt] and [I-D.meyer-lisp-cons]. In these solutions, different kinds of overlays are designed and built as databases that store mapping information and answer mapping queries.

The problem they share is that a packet for which the current ITR holds no cached mapping must either wait for the reply to the mapping lookup query or simply be dropped by that ITR. One solution is to send the data packet itself to the Mapping Overlay (MO) as a query, instead of a separate lookup query (e.g., the "Data Probe" in [I-D.fuller-lisp-alt]). The packet is then forwarded within the MO to the final ETR attached to the site in which the destination EID resides. However, a packet that travels through the MO usually suffers long latency, which becomes a significant problem.

In this draft we describe an incrementally deployable mapping service for LISP. This mapping service consists of an EID Router (ER) mechanism and a supplementary DHT Mapping Overlay (MO). The ER mechanism is designed to reduce forwarding entries in routers while driving packets to the destination through tunnels. The DHT MO serves as a supplement that provides specific mappings to reduce the number of tunnels along the path to the destination.
Note that an ER can be deployed unilaterally in an AS for that AS's own benefit. The DHT MO is built jointly among ASes, but joining the MO is not compulsory: an AS that does not join can still benefit from deploying the ER.

The remainder of this document is organized as follows: Section 2 defines the terms used in this document. Section 3 sketches an overview of the mapping service. Section 4 describes how an ITR handles packets. Section 5 describes how the mapping service utilizes the current BGP system. Section 6 describes how the EID Router mechanism works, and Section 7 describes how to build the DHT Mapping Overlay and how to retrieve mappings from it. Section 8 shows the steps for deploying the mapping service incrementally.

2. Definition of Terms

Mapping: an EID-to-RLOC mapping.

EID aggregated prefix: an aggregated prefix which covers some EID blocks.

EID+RLOC aggregated prefix: an aggregated prefix which covers some EID block(s) and RLOC(s).

EID Router (ER): a newly introduced router which keeps entries for all EID aggregated prefixes.

Mapping Node (MN): an entity used for storing a mapping. Each MN holds exactly one mapping, and each mapping is related to only one MN. An MN can be implemented as a process in an MS, with a data structure to store the mapping and the ability to manage and retrieve mappings.

Master Mapping Node (MMN): a Mapping Node chosen as the representative among redundant MNs. It is in charge of initiating mapping queries and exchanging mappings.

Mapping Server (MS): a server dedicated to physically storing mappings. Each MS can hold more than one Mapping Node.

Mapping Overlay (MO): a DHT overlay designed for storing the distributed mapping information. Only one MO exists among the ISPs in the Internet.

3. Overview

The mechanism described in this draft aims:

o to eliminate all forwarding entries to distant customer ASes in P routers;

o to eliminate, in the border routers, the forwarding entries targeted to distant customer ASes that are not behind those border routers;

o to be deployed incrementally;

o to help reduce the number of tunnels.

To achieve these four aims, the mechanism consists mainly of the following two parts:

o the EID Router (ER) mechanism for tunneling non-cached packets, and

o the DHT Mapping Overlay (MO) as a supplement, which provides specific mappings to reduce the tunneling cost.

The EID Router mechanism is designed for the first three aims, and the DHT MO is designed for the last aim.

In the EID Router mechanism, the default route is set, manually or automatically, to an ER (each AS has at least one ER). As a result, all forwarding entries to distant customer ASes in P routers, and part of the forwarding entries in the border routers (those targeted to distant customer ASes not behind the border routers), can be eliminated. The currently running Border Gateway Protocol [RFC4271] is used to propagate mappings through the existing BGP speaking system. The most important reason for using the existing BGP speaking system is to keep the deployment backward compatible, so that incremental deployment can be achieved.

The DHT Mapping Overlay helps reduce the number of tunnels that result from deploying the ER mechanism. It is optional for ISPs and requires only a small investment.

4. When an ITR meets packets

When an ITR receives a packet originated from a customer site, it first checks whether a copy of the mapping exists in its cache. If the mapping exists, the ITR encapsulates the packet in a LISP header, using the RLOC extracted from the mapping as the outer destination address and one of the ITR's own RLOCs as the outer source address.
Otherwise, on a cache miss (i.e., no relevant copy of the mapping exists in the ITR), two concurrent events occur:

o Data Plane Traffic: the packet simply follows a default route, preset manually or automatically, to an ER in the current AS. Since the ER holds the global mapping information, it can forward every packet to the right ETR by encapsulating the packet in a LISP header with the ITR's RLOC as the outer source address and the ETR's RLOC as the outer destination address.

o Control Plane Traffic: the ITR sends a Mapping Query to its default Mapping Server (MS) in the AS. The Master Mapping Node (MMN) of the ITR then launches a mapping LOOKUP process in the Mapping Overlay (MO) (details of the lookup process are given in Section 7). After the MMN receives a copy of the queried mapping from the MO, it returns the copy to the ITR that initiated the Mapping Query, where it is cached for a period of time.

5. Utilization of current BGP system

BGP is an inter-Autonomous System routing protocol. The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the list of ASes that the reachability information traverses. This information is sufficient for constructing a graph of AS connectivity for this reachability, and it is also essential for automatically constructing the mappings from EIDs onto RLOCs.

Moreover, incremental deployment requires that ASes which have deployed the new mechanism work alongside those which have not. It is therefore necessary to design the mapping service so that it is inherently compatible with the currently running BGP system (i.e., the BGP system used for basic routing and forwarding today).

BGP has two functions in the mapping service: to obtain the mappings automatically, and to propagate mappings to ERs in other ASes.
Both functions rely on the currently running BGP system.

5.1. Automatic Mapping obtainment and storage

When a customer AS advertises a BGP UPDATE message to a provider AS to which it is homed (whether single-homed or multi-homed), and that provider AS has deployed the DHT mapping server described in Section 7, the provider AS sets or updates the relevant mapping information according to the advertised route to the customer AS. The announced prefix is treated as the EID in the mapping, and the address of the ETR which directly receives the BGP announcement from the customer AS is chosen as the RLOC.

This mapping can be stored in both an MN (Mapping Node) and an ER (EID Router) concurrently. In the former case, one mapping corresponds to one MN and vice versa, as described in Section 7. In the latter case, the mapping is not only stored in the ER of the current provider AS but also propagated to distant provider ASes by BGP advertisements and stored in the ERs of those ASes.

Note that the mappings obtained so far are original specific mappings. In the DHT MO, these original specific mappings are stored on MNs and their granularity does not change. In the ER mechanism, however, the mapping granularity changes during propagation by BGP whenever a prefix aggregation occurs in an AS (details are given in Section 5.2).

5.2. Mapping propagation by BGP

BGP speakers operate as they do today, except that mapping information is attached to the BGP UPDATE message. Each BGP speaker on the route SHALL keep the originality of the mappings (i.e., the mappings stay untouched during propagation), unless it aggregates some prefixes into one. A new mapping SHOULD be formed when such aggregation occurs, in which case both the EID and the RLOC in the mapping are updated: the EID is set to the new aggregated EID block, which covers more prefixes, while the RLOC is set to the address of either the ER (if an ER is deployed) or the border router (if no ER is deployed) in the current AS.
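The aggregation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the mechanism: the function name `propagate_mappings`, its parameters, and all RLOC addresses are hypothetical, and the sketch assumes that the AS aggregates whenever the standard `ipaddress.collapse_addresses` call can merge prefixes, whereas a real AS would apply its own aggregation policy.

```python
import ipaddress
from typing import Dict, Optional


def propagate_mappings(
    mappings: Dict[str, str],
    er_rloc: Optional[str],
    border_rloc: str,
) -> Dict[str, str]:
    """Return the EID-prefix -> RLOC mappings an AS would pass on.

    Hypothetical helper: a mapping whose prefix is not aggregated is
    passed on untouched; prefixes merged into a larger block yield a
    new mapping whose RLOC is the local ER if one is deployed, or the
    local border router otherwise.
    """
    by_net = {ipaddress.ip_network(p): rloc for p, rloc in mappings.items()}
    advertised: Dict[str, str] = {}
    for block in ipaddress.collapse_addresses(by_net):
        covered = sum(net.subnet_of(block) for net in by_net)
        if block in by_net and covered == 1:
            # No aggregation happened: keep the original mapping.
            advertised[str(block)] = by_net[block]
        else:
            # Aggregation happened: form a new mapping for the block.
            advertised[str(block)] = er_rloc if er_rloc is not None else border_rloc
    return advertised


if __name__ == "__main__":
    received = {
        "166.111.8.0/24": "192.0.2.1",
        "166.111.9.0/24": "192.0.2.2",
        "198.51.100.0/24": "192.0.2.3",
    }
    print(propagate_mappings(received, er_rloc="203.0.113.1",
                             border_rloc="203.0.113.254"))
```

In this sketch the two adjacent /24 blocks collapse into one /23 that points at the local ER, while the unrelated /24 keeps its original RLOC.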
Note that since aggregation is permitted during mapping propagation, the number of mappings stored on the ERs would be far smaller than the number of mappings stored in the MO.

6. EID Router mechanism

6.1. Address aggregation policy

All addresses from edge customer ASes can be seen as EIDs. EID prefixes can be aggregated into an EID aggregated prefix. Moreover, we allow EIDs to be aggregated with RLOCs into an EID+RLOC aggregated prefix. For example, suppose two EID blocks, 166.111.8/24 and 166.111.9/24, belong to two customer ASes homed to a provider AS whose RLOCs range from 166.111.10/24 to 166.111.11/24. The provider AS can aggregate either to the EID aggregated prefix 166.111.8/23 or to the EID+RLOC aggregated prefix 166.111.8/22.

6.2. EID Router

An EID Router is no different from a legacy router, except that a special configuration is applied. It is configured to act as an eBGP speaker, and it only loads the forwarding entries for all EID aggregated prefixes. Note that the EID+RLOC aggregated prefixes do not have to be loaded in EID Routers, since the RLOCs in the EID+RLOC aggregated prefixes are supposed to be reachable (i.e., forwarding entries for these prefixes should be preserved in the P routers). So the ideal situation becomes:

o the EID Routers load the forwarding entries for all EID aggregated prefixes,

o the P routers load the forwarding entries for all RLOCs and all EID+RLOC aggregated prefixes, and

o the border routers load the forwarding entries for all RLOCs and for the prefixes (i.e., EID aggregated prefixes and EID+RLOC aggregated prefixes) of the distant ASes behind the border routers.

By deploying the EID Router mechanism, P routers and border routers can therefore reduce the size of their FIB (Forwarding Information Base).

6.3. When an ER meets packets

When an ER receives a packet, it matches the destination address against the entries in its forwarding table (which can be seen as the mapping table). Since the ER holds the whole mapping table (from its own point of view), the packet can be encapsulated in a LISP header and sent out. The tunnel end point may be one of the following four kinds of routers:

o the border router of the peering AS on the path to the destination, in which case aggregation occurred in this peering AS or this peering AS did not pass the mapping information to the current AS.

o the border router of a non-peering AS on the path to the destination, in which case aggregation occurred in this non-peering AS.

o the EID Router of a distant AS (either peering or non-peering) on the path to the destination, in which case the downstream AS did not pass the mapping information to this distant AS, so the ER in this distant AS created a new mapping (with the ER's RLOC set in the mapping).

o the destination ETR, in which case the originality of the mapping has been maintained.

7. Supplementary DHT Mapping Overlay (MO)

The DHT Mapping Overlay (MO) is based on [Kademlia], a highly efficient Distributed Hash Table (DHT) overlay protocol for Peer-to-Peer networks, which uses XOR as the metric for measuring distance. In the MO, it is adapted to meet the requirements below:

o MO should be scalable;

o MO should provide good redundancy;

o MO should be self-adaptive to the addition or failure of mappings;

o MO should be flexible in balancing performance and overhead;

o MO should support the multi-homing scenario.
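As background for the subsections that follow, the XOR metric of Kademlia can be illustrated with a short Python sketch. The function names `xor_distance` and `row_index` are hypothetical; the row computation reflects how Kademlia groups contacts by distance range, which Section 7.2 adopts for the K-bucket table.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR as an integer."""
    return a ^ b


def row_index(a: int, b: int) -> int:
    """Return i such that 2**i <= xor_distance(a, b) < 2**(i + 1)."""
    d = xor_distance(a, b)
    if d == 0:
        raise ValueError("identical IDs are at distance 0 and belong to no row")
    return d.bit_length() - 1


if __name__ == "__main__":
    a, b = 0b1010, 0b0110
    print(xor_distance(a, b))  # distance between the two IDs
    print(row_index(a, b))     # row of a's table in which b would be kept
```

The metric is symmetric, and IDs that share a longer common prefix are closer to each other, which is what lets a lookup converge on the target ID step by step.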
The benefit of deploying the MO is that it provides specific mappings, since it does not aggregate prefixes (i.e., the mappings stored in the MO have the finest granularity: each mapping refers to one relation between a customer AS and one of its provider sites). Because the number of such fine-grained mappings is large, the MO should be scalable and capable of redundancy. A DHT is therefore chosen as the means of distributing the mappings.

7.1. Mapping Node (MN) and Mapping Server (MS)

As described in Section 5.1, a mapping is automatically obtained from the BGP advertisement through the ETR. After that, it is sent to an MS in the current provider AS and then stored in a newly created MN (or set manually on the MN). Note that each mapping can only be initially stored on one MN in the MO, and each MS can accommodate more than one MN. For example, suppose an ISP serves 5 customer ASes labeled a, b, c, d, e, whose corresponding EIDs are v, w, x, y, z respectively. These five EID prefixes are mapped one-to-one, forming five MNs that physically exist on one or more MSes administered by the ISP.

7.2. MNID Assignment and K-bucket Table

In the MO, each MN is assigned a 160-bit ID. The DHT MO uses the numerically highest IP address reserved in the customer AS as the MNID. For example, assume a customer AS with the prefix 162.137.2/24 is mapped to the RLOC 134.121.3.56. The lower 32 bits of the MNID of the corresponding Master Mapping Node (MMN) are 0xA28902FE (i.e., 162.137.2.254), and the remaining 128 bits are all 0. The mapping will be stored on this MMN and on several (at least one) other MNs whose MNIDs are closest to the MNID 0xA28902FE.

Each MN manages its own K-bucket table, which keeps the information on how it can reach other MNs (i.e., the RLOCs of the MSes on which these MNs reside). Each MN's reachability information is stored on a node in a K-bucket.
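The MNID construction and the choice of storage nodes described in this section can be sketched as follows. This is a minimal Python illustration with hypothetical names (`mnid_from_prefix`, `closest_mnids`). It assumes IPv4 prefixes and interprets "the numerically highest IP address reserved" as the address just below the broadcast address, which is what the 162.137.2.254 example suggests; the draft itself does not spell this out.

```python
import ipaddress
from typing import Iterable, List


def mnid_from_prefix(prefix: str) -> int:
    """Derive a 160-bit MNID from a customer IPv4 prefix.

    Assumption: the MNID's lower 32 bits are the address one below the
    broadcast address; the upper 128 bits are zero, so the integer
    value of that address is already the full MNID.
    """
    net = ipaddress.ip_network(prefix)  # needs the full form, e.g. "162.137.2.0/24"
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4 prefixes")
    return int(net.broadcast_address) - 1


def closest_mnids(target: int, candidates: Iterable[int], count: int) -> List[int]:
    """Return the `count` candidate MNIDs closest to `target` by XOR distance."""
    return sorted(candidates, key=lambda mnid: mnid ^ target)[:count]


if __name__ == "__main__":
    mmn_id = mnid_from_prefix("162.137.2.0/24")
    print(hex(mmn_id))
    others = [0xA28902FF, 0xA2890200, 0x0A000001]
    print([hex(m) for m in closest_mnids(mmn_id, others, 2)])
```

Selecting replicas by XOR distance to the MNID is what allows any MN to find the nodes holding a mapping knowing only the destination prefix.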
The table of an MN N consists of 160 rows, in which the i-th row (0 <= i < 160) preserves the reachability information of some MNs (i.e., the RLOCs of the MSes on which these MNs reside) whose distance from N lies in the range from 2^i to 2^(i+1). When i is large, many MNs can fall into this range, so the number of nodes that the i-th row preserves is limited to at most K.

7.3. LOOKUP Process

The LOOKUP process calls FIND_MAP with the MNID of the destination MN as its parameter. The FIND_MAP procedure is as follows (MN B is the destination MN):

1. MN A calculates the distance D from A to B (D = A XOR B);

2. MN A fetches m MNs from the row of its K-bucket table that corresponds to D and queries them (i.e., calls FIND_MAP on each of these m MNs);

3. MN A sets a timer for the reply from each MN on which it called FIND_MAP. If a timer expires, MN A deletes the information of the corresponding MN from its K-bucket table.

4. Each MN that receives a FIND_MAP call checks whether it is one of the MNs closest to B. If so, it returns the mapping to MN A; otherwise, as in steps 1 and 2, it calculates the distance D, fetches m closer MNs, and returns them to MN A.

5. MN A continues to send FIND_MAP calls to the returned MNs until the mapping is returned or the K closest MNs have been found without a mapping (which means that no such mapping exists).

7.4. Security Consideration of Mapping Storage

In native Kademlia, any MN can initiate a STORE call to put a (key, value) pair on the K closest other nodes. Because this could cause security problems, for instance a malicious MN storing a wrong mapping on other MNs, a mapping can only initially be stored on one or more MNs (among which an MMN is chosen) that are under the supervision of the ISP which in fact controls this mapping. Only the MMN is authorized to call STORE. After the system has been running for some hours, MNs in other autonomous systems may keep a cached copy of the mapping.

7.5. Self-adaptive Capability

Compared with non-DHT mapping systems, the DHT MO adapts better to MN failures and to MNs joining dynamically.
Assume an ISP deploys multiple MSes for the address block of a customer AS in one or more provider ASes that it administers. When some of the MNs go down, the mapping service can still be provided normally without manual configuration, as long as at least one MN is healthy. Even if all of them are temporarily down, the mapping information cached on other MNs remains available for a period of time (the cache updating period).

When a new customer site connects to an ISP, a new mapping must be added to the MO. This requires adding a new MN u to the MO and putting the mapping in MN u. First, an existing MN w in the MO must be known, and w is put into u's K-bucket table. Then a LOOKUP process is performed with u's MNID as its parameter. In the end, the K-bucket table of MN u is built up, and other MNs update their K-bucket tables as well during the LOOKUP process.

7.6. Dynamic Adjustment of K value and m value

After a LOOKUP, if the time it took is greater than a threshold t (manually configured by the ISP), which implies that the LOOKUP took too long, K is increased by 1. At the same time, if 2m < K then m is set to 2m; otherwise m is increased by 1. Consequently, more queries will be sent to MNs during the LOOKUP process. If the time of the LOOKUP is no greater than t, the K value and the m value remain unchanged.

When congestion occurs in an AS, the K value and the m value are both decreased by 1 to suppress the number of updates used to keep in touch with other MNs.

7.7. Mapping Storing and Exchanging in Multi-homing Scenario

Consider a scenario in which a customer site connects to more than one ISP, which is called multi-homing. When a new MMN x puts the new mapping into the mapping system, another MMN y with the same MNID will be discovered in the MO. Unlike in the native Kademlia protocol, no "ID Collision Error" occurs. Instead, x tells y the new mapping and at the same time obtains the mapping information that already exists.
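The exchange between x and y can be sketched as a simple merge. This Python illustration is only an assumption about the data model: the class `MasterMappingNode`, the function `exchange_mappings`, and the idea that a multi-homed EID prefix is represented as a set of RLOCs (one per provider) are hypothetical and not defined by the draft.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class MasterMappingNode:
    """Hypothetical MMN state: one EID prefix and the RLOCs known for it."""
    mnid: int
    eid_prefix: str
    rlocs: Set[str] = field(default_factory=set)


def exchange_mappings(x: MasterMappingNode, y: MasterMappingNode) -> None:
    """Merge the mapping information of two MMNs that share one MNID.

    Instead of reporting an ID collision, each node learns the RLOCs
    known to the other, so both end up with the full set.
    """
    if x.mnid != y.mnid:
        raise ValueError("only MMNs with the same MNID exchange mappings")
    merged = x.rlocs | y.rlocs
    x.rlocs = set(merged)
    y.rlocs = set(merged)


if __name__ == "__main__":
    y = MasterMappingNode(0xA28902FE, "162.137.2.0/24", {"134.121.3.56"})
    x = MasterMappingNode(0xA28902FE, "162.137.2.0/24", {"198.51.100.7"})
    exchange_mappings(x, y)
    print(sorted(x.rlocs), sorted(y.rlocs))
```

Keeping separate copies of the merged set on each node mirrors the fact that x and y are independent MMNs operated by different ISPs.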
In the end, x and y both know all the mapping information on how to reach the customer AS. x and y also probe each other periodically to ensure availability.

8. Incremental Deployment

This mechanism is practical for incremental deployment, since it introduces no major changes to existing routers. Instead of requiring a mandatory third-party infrastructure over the current Internet, an ISP that wants to benefit from the DHT MO only needs to put one or more MSes in its domain and configure them to join the MO.

An ISP could start by deploying an ER in its domain. This reduces the number of entries in the other routers of the domain, but the length of the intra-domain route grows. It is up to each ISP to decide whether to tolerate this path stretch in exchange for a smaller FIB (Forwarding Information Base).

As time goes by, suppose more and more ISPs have deployed ERs. Some of them may then deploy the DHT MO to benefit from specific mappings (which can decrease the number of tunnels needed in each data transmission), simply by putting MSes in their ASes and letting them join the MO automatically as described in Section 7. No new special-purpose devices or functions are required to support backward compatibility.

9. Acknowledgements

10. Security Considerations

The ERs can apply any existing security mechanisms for BGP to enhance security. For the DHT MO, existing authentication methods for DHTs (especially for Kademlia) can be adapted to enhance its security. Further security enhancements to support the mechanism in this draft are expected to be designed in the future.

11. IANA Considerations

12. References

12.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

12.2. Informative References

[I-D.farinacci-lisp] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-farinacci-lisp-12 (work in progress), March 2009.

[I-D.fuller-lisp-alt] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "LISP Alternative Topology (LISP+ALT)", draft-fuller-lisp-alt-05 (work in progress), February 2009.

[I-D.meyer-lisp-cons] Brim, S., "LISP-CONS: A Content distribution Overlay Network Service for LISP", draft-meyer-lisp-cons-04 (work in progress), April 2008.

[Kademlia] Maymounkov, P. and D. Mazieres, "Kademlia: A Peer-to-peer Information System Based on the XOR Metric", IPTPS'02, Boston, 2002.

Authors' Addresses

Gang Chen
CMCC, Inc.
53A, Xibianmennei Ave., Xuanwu District
Beijing 100053
P.R.China
Phone: +86-10-1391-071-0674
Email: phdgang@gmail.com

Hui Deng
CMCC, Inc.
53A, Xibianmennei Ave., Xuanwu District
Beijing 100053
P.R.China
Phone: +86-10-1391-075-0201
Email: denghui02@gmail.com

Bo Zhou
CMCC, Inc.
53A, Xibianmennei Ave., Xuanwu District
Beijing 100053
P.R.China
Phone: +86-10-1381-194-8723
Email: zhouboyj@chinamobile.com

Mingwei Xu
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R.China
Phone: +86-10-6278-5822
Email: xmw@csnet1.cs.tsinghua.edu.cn

Dong Huo
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R.China
Phone: +86-10-6278-5822
Email: dhuo.thu@gmail.com

Yu Cao
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R.China
Phone: +86-10-6278-5822
Email: cyanalyst@126.com