Wednesday, March 16, 2016

Network namespaces and NetworkManager

This post documents the process of implementing support for network namespaces in NetworkManager. The code described in the post can be found on GitHub. My personal motivation for adding namespace support to NetworkManager was to be able to support provisioning domains as specified by the IETF MIF WG. However, it also benefits existing users, because it allows different applications to be isolated in different network namespaces. The best example is VPN connection isolation. With network namespaces, certain applications can be started in a namespace whose only network connectivity is the VPN. Those applications can access VPN resources. All other applications are in different network namespaces, so they will not see the VPN connection and cannot use it. An additional benefit comes from using multiple connections, as described in the MIF Architecture RFC.

Note that after I started working on this, Thomas Haller implemented basic support for namespaces in NetworkManager. His implementation differs from mine in some details.

The idea


Support for network namespaces was meant to isolate applications so that each one uses specific network connections. Network namespaces in the Linux kernel have some properties that anyone who wants to use them must be aware of. These are documented in a separate post.

In particular, another application cannot move an application between network namespaces. Only the application itself can change its network namespace, and only if it has the appropriate privileges.

The idea is as follows. An application is started in some network namespace. This is easy to do (e.g. with the 'ip netns exec' command). Then this network namespace is manipulated in one of two ways. The application itself can do it, if it is aware of NM's support for network namespaces. Alternatively, a third-party application such as nmcli can do it. Manipulation means sending requests to NetworkManager via D-Bus to change the network namespace, for example to activate or deactivate certain connections. NetworkManager decides what to do based on those requests and on the specifics of the connections and of the devices they are bound to. For example, it can create a virtual device in the network namespace, or move a physical device into it. The application does not need to care about this part. All that matters to it is that it is assigned the requested connections.

Implementation


The following changes were made to NetworkManager in order to introduce support for network namespaces:

  1. Created a new object, NMNetnsController, which manages all network namespaces controlled by NetworkManager. Its interface, org.freedesktop.NetworkManager.NetworkNamespacesController, can create a new network namespace or remove an existing one. It can also return a list of the existing network namespaces.
     
  2. Created a new object, NMNetns, which represents a single network namespace. Whenever a new network namespace is created, a new NMNetns object is created and exposed on D-Bus. This object lets you manipulate the network namespace via the interface org.freedesktop.NetworkManager.NetNsInstance. Through it you can list all devices within the network namespace, take a device from another network namespace, and activate a connection.
     
  3. NMSettings is now a singleton object. This wasn't a significant change, because there was only one object of this type before as well. Now it is just explicitly exposed as such.
     
  4. NMPlatform, NMDefaultRouteManager and NMRouteManager aren't singleton objects any more. They are now instantiated for each new network namespace that is created.
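Points 1, 2 and 4 describe three pieces: one controller, one object per namespace, and per-namespace instances of what used to be singletons. A minimal C sketch of the bookkeeping can make this split concrete. The sketch is hypothetical. NetnsController, Netns, Platform and the controller_* functions are illustrative names, not NetworkManager's actual GObject classes or D-Bus methods. The Platform struct only stands in for NMPlatform and the route managers.

```c
#include <stdlib.h>
#include <string.h>

#define NETNS_NAME_MAX 64

/* Hypothetical stand-in for per-namespace managers (NMPlatform, route managers). */
typedef struct {
    int netns_fd;              /* namespace this instance operates in */
    unsigned n_routes;         /* placeholder for per-namespace routing state */
} Platform;

/* Hypothetical analogue of NMNetns: one object per network namespace. */
typedef struct Netns {
    char name[NETNS_NAME_MAX];
    Platform platform;         /* owned by this namespace, not a global singleton */
    struct Netns *next;
} Netns;

/* Hypothetical analogue of NMNetnsController: tracks every namespace. */
typedef struct {
    Netns *head;
} NetnsController;

/* Create and register a namespace object; NULL on failure or duplicate name. */
Netns *controller_add(NetnsController *c, const char *name, int netns_fd)
{
    if (strlen(name) >= NETNS_NAME_MAX)
        return NULL;
    for (Netns *n = c->head; n; n = n->next)
        if (strcmp(n->name, name) == 0)
            return NULL;                   /* names must be unique */

    Netns *n = calloc(1, sizeof *n);       /* zeroed: fresh per-namespace state */
    if (!n)
        return NULL;
    strcpy(n->name, name);
    n->platform.netns_fd = netns_fd;
    n->next = c->head;
    c->head = n;
    return n;
}

/* Unregister and free a namespace object; 0 on success, -1 if unknown. */
int controller_remove(NetnsController *c, const char *name)
{
    for (Netns **pp = &c->head; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            Netns *dead = *pp;
            *pp = dead->next;
            free(dead);                    /* its Platform state goes with it */
            return 0;
        }
    }
    return -1;
}

/* Number of namespaces currently managed (the "list" operation, as a count). */
size_t controller_count(const NetnsController *c)
{
    size_t count = 0;
    for (const Netns *n = c->head; n; n = n->next)
        count++;
    return count;
}
```

Each namespace created through the controller gets its own Platform instance, so per-namespace state such as routes is never shared. Removing the namespace object drops that state with it.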


VPN isolation


VPN isolation was implemented as the first user of the network namespace support. It was easier than other connections because of one assumption: a VPN connection lives in only a single network namespace and is the only connection available there.

At the beginning, it was unclear where to keep the knowledge of the network namespace. There were two candidate classes, NMActiveConnection and NMVPNConnection. NMActiveConnection is the base class of NMVPNConnection. Modifying NMVPNConnection is the better approach, because the plan was to add new parameters to the configuration file of a VPN connection. These parameters specify that isolation is required, along with some additional behaviors:
  • netns-isolate

    Boolean parameter (yes/no) that defines whether the VPN connection should be isolated within a network namespace or not. For backwards compatibility reasons, the default is no.
     
  • netns-persistent

    Whether the network namespace should be persistent (yes) or not (no). A persistent namespace is retained when the VPN connection is terminated, while a non-persistent one is removed.
     
  • netns-name

    The network namespace name. The special value uuid means that the connection's UUID is used. The special value name means that the connection's name is used. Any other string is used as-is as the network namespace name.
     
  • netns-timeout

    How long to wait (in milliseconds) for the device to appear in the target namespace.
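The netns-name rule above can be expressed as a small C function. This is a sketch under assumptions. resolve_netns_name and its parameters are hypothetical. It also rejects empty names, ".", ".." and names containing '/'. That check is not described above. It is added on the assumption that the name ends up as a file name under /var/run/netns, as with iproute2.

```c
#include <stdio.h>
#include <string.h>

/* Turn the netns-name setting into a namespace name:
 *   "uuid" -> the connection's UUID
 *   "name" -> the connection's name
 *   other  -> the setting itself, unchanged
 * Returns 0 on success, -1 if the result is unusable or does not fit. */
int resolve_netns_name(const char *setting, const char *conn_name,
                       const char *conn_uuid, char *out, size_t out_len)
{
    const char *src;

    if (strcmp(setting, "uuid") == 0)
        src = conn_uuid;
    else if (strcmp(setting, "name") == 0)
        src = conn_name;
    else
        src = setting;

    /* Extra assumption: the result is used as a file name (as under
     * /var/run/netns), so reject values that cannot be one. */
    if (src[0] == '\0' || strcmp(src, ".") == 0 || strcmp(src, "..") == 0 ||
        strchr(src, '/') != NULL)
        return -1;

    int n = snprintf(out, out_len, "%s", src);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```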
The implementation works as follows. When the VPN device appears in the root network namespace, it is taken from there and moved into the target namespace. This uses the TakeDevice method, called directly instead of via D-Bus. When the device appears in the target network namespace, network parameters are assigned to the interface. This was tested with the OpenVPN VPN type.
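The netns-timeout behavior can be sketched as a wait loop. This is a simplification with assumptions. wait_for_interface is a hypothetical helper that polls with if_nametoindex(3) every 10 ms. A daemon like NetworkManager would more naturally react to RTM_NEWLINK notifications from netlink. The helper checks the calling thread's current network namespace, so it would have to run in the target namespace.

```c
#define _GNU_SOURCE
#include <net/if.h>
#include <time.h>

/* Current time in milliseconds from a monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait up to timeout_ms for interface ifname to exist in the calling
 * thread's network namespace. Returns its ifindex, or 0 on timeout. */
unsigned wait_for_interface(const char *ifname, long timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;
    struct timespec pause = { 0, 10 * 1000000L };   /* 10 ms between checks */

    for (;;) {
        unsigned idx = if_nametoindex(ifname);      /* 0 if not (yet) present */
        if (idx != 0)
            return idx;
        if (now_ms() >= deadline)
            return 0;
        nanosleep(&pause, NULL);
    }
}
```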

The implementation has two problems. First, it does not handle VPN connections that don't create virtual devices but only modify packet processing rules in the Linux kernel (i.e. XFRM). Second, hostname and name resolution parameters aren't assigned, because the necessary infrastructure is missing.

Conclusion


The initial goal of adding network namespace support to NetworkManager was achieved. Some functionality is still missing, such as isolation of arbitrary connections, hostname handling and DNS resolution handling. These will have to be resolved in the future.

NetworkManager and multiple provisioning domains

The goal of this post is to list different options for introducing PvDs into NetworkManager. That means what should be changed in NetworkManager, and how it should handle explicit and implicit PvDs. We'll start with the definition of a Provisioning Domain and the objects that could potentially store or represent Provisioning Domains. The implementation this post refers to can be found on GitHub.

The term Provisioning Domain (PvD) is defined and clarified in RFC7556 as:
A consistent set of network configuration information. Classically, all of the configuration information available on a single interface is provided by a single source (such as a network administrator) and can therefore be treated as a single provisioning domain.  In modern IPv6 networks, multihoming can result in more than one provisioning domain being present on a single link.  In some scenarios, it is also possible for elements of the same PvD to be present on multiple links.
Basically it is a set of configuration information that should be treated as a single unit. Here are some examples of such units of configuration data:
  1. Static IPv4 configuration provided by a user for a server or for a network without DHCP.
  2. Data handed over to a client by a DHCP server.
  3. Configuration data sent in RAs by the single router of an IPv6-enabled local network to the nodes attached to it.
  4. Configuration data sent by a VPN gateway upon successful connection of a client.
In all these cases we have implicit PvDs. The sets of configuration data are bound together only implicitly, and nothing explicitly indicates that they should be treated as a single unit. Explicit PvDs are different. They are sets of configuration data bound together by some explicit mechanism and associated with a PvD identifier, which is sent to the client in some way. At the time of writing, explicit PvDs don't exist yet. The IETF MIF working group is trying to define the mechanisms needed to support them, as well as what exactly the IDs should look like.

Note that apart from explicit and implicit PvDs, we also differentiate between a PvD and a PvD instance. A PvD groups the PvD instances on a local network by the configuration they have in common. A PvD instance is valid for only a single host on that network. In other words, a PvD includes the network prefix and mask, while a PvD instance also includes host addresses. Interestingly, router advertisements communicate PvDs, while DHCP communicates PvD instances.
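For IPv6, the PvD versus PvD instance distinction can be made concrete with a prefix mask. Take a host address and prefix length, which stand for a PvD instance. Masking off the host part yields the prefix shared by all hosts, which stands for the PvD. The sketch assumes that a PvD instance is represented only by an address and a prefix length, which is a simplification. pvd_prefix_from_instance is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Reduce a PvD instance (host address + prefix length) to its PvD prefix by
 * clearing the host bits. out must have room for INET6_ADDRSTRLEN bytes.
 * Returns 0 on success, -1 on invalid input. */
int pvd_prefix_from_instance(const char *addr, unsigned prefix_len, char *out)
{
    struct in6_addr a;

    if (prefix_len > 128 || inet_pton(AF_INET6, addr, &a) != 1)
        return -1;

    for (unsigned i = 0; i < 16; i++) {
        /* Network bits remaining from byte i onward. */
        unsigned bits = prefix_len > i * 8 ? prefix_len - i * 8 : 0;
        if (bits < 8)   /* partial byte keeps its top bits; host byte becomes 0 */
            a.s6_addr[i] &= (unsigned char)(0xFFu << (8 - bits));
    }
    return inet_ntop(AF_INET6, &a, out, INET6_ADDRSTRLEN) != NULL ? 0 : -1;
}
```

Two hosts configured from the same RA, such as 2001:db8:1::42/64 and 2001:db8:1:0:a00:27ff:fe4e:66a1/64, are different PvD instances. Both map to the same PvD prefix, 2001:db8:1::.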

How to implement PvDs in NetworkManager

As always, the same goal can be achieved in multiple ways, so here are the options for implementing PvDs within NM. There are two main approaches. Either existing objects are enhanced so that they can represent PvDs, or a completely new object is introduced.

Using NMSettingsConnection object to store PvD and PvD instance


Each network connection (which is not the same as a PvD or PvD instance) is stored in an NMSettingsConnection object. These objects are generated from static files, or dynamically while NetworkManager runs. NMSettingsConnection objects are initialized from the following sources:
  1. Distribution configuration files. NM reads system-dependent network configuration files (e.g. /etc/sysconfig/network-scripts on RHEL-based systems) via plugins and creates NMSettingsConnection objects from them.
     
  2. NetworkManager-specific configuration. NetworkManager has its own configuration files, stored in /etc/NetworkManager/system-connections/.
  3. Dynamically created configurations. While running, NetworkManager allows new configurations to be created via its D-Bus interface.
Note that NetworkManager has a concept of profiles, which are used for wired networks. These are settings that are not bound to any specific network interface. Profiles can have 802.1x credentials assigned to them.

In this approach, a new NMSettingsConnection object is created for each new PvD or PvD instance, and NMSettingsConnection is extended with a PvD ID parameter.

There are several potential problems with this approach:
  1. An NMSettingsConnection is not the same thing as a PvD or a PvD instance. For example, an NMSettingsConnection may define a network connection that is configured using DHCP. In that case the NMSettingsConnection is neither a PvD nor a PvD instance. On the other hand, an NMSettingsConnection can be the same as a PvD instance. This is the case with static IPv4 configurations, where a user specifies concrete IP addresses. Finally, an NMSettingsConnection can be a PvD only in the IPv6 case, when the host part is generated from the MAC address.
  2. PvDs and PvD instances are valid only for the interface on which they are received. However, a user can request that any NMSettingsConnection object be activated on any interface, which is not possible for a PvD.
  3. This can also create confusion. Take, for example, a preconfigured NMSettingsConnection that is now treated as a PvD with a specific PvD ID and is defined to use DHCP for configuration. This PvD ID is clearly expected to be valid on a certain interface at a specific attachment point. But the interface is configured with DHCP, so the connection can actually be activated on any interface on any network that supports DHCP. A user might therefore activate this NMSettingsConnection on the "wrong" network by mistake and believe that the PvD is active when in reality it is not.

    Note that even NMSettingsConnection objects that contain credentials aren't guaranteed to retrieve the same PvD every time the connection is made. Some AAA servers and infrastructures allow clients with the same credentials to connect to multiple networks, so these clients can potentially receive multiple PvDs.
     
  4. Finally, only one NMSettingsConnection can be activated on a single network interface, which prevents having multiple PvDs on a single interface.
These problems are not unsolvable. They could be solved by modifying certain aspects of NetworkManager in general, and of NMSettingsConnection in particular.

Treating NMActiveConnection object as PvD instance and PvD


Whenever a connection is made in NetworkManager, an object is created to represent it. There are two classes for this object, both inheriting from the NMActiveConnection base class. Which class is used depends on the type of the connection. VPN connections are represented by NMVPNConnection objects, and all other connections by NMActRequest objects. The main task of NMActiveConnection is to bind an NMSettingsConnection to an NMDevice object.

The idea in this case is to treat NMActiveConnection as a PvD or a PvD instance. That is, a new NMActiveConnection is created for each new PvD or PvD instance received.

But, there are still some problems:
  1. Since NMActiveConnection objects are transient, there would be no history of the PvDs used. Whether this is a problem depends on whether we need this history.

    History would be necessary in two cases. The first is if we cache some information for the next time we connect to the given PvD. The second is if processes are still using a PvD through the API, so the information about the PvD must live until those processes exit. Note that this latter problem could be solved by delaying the removal of NMActiveConnections. Another option is an asynchronous mechanism that informs applications when a specific NMActiveConnection is no longer available.
     
  2. The second problem is whether two NMActiveConnection objects can be created from the same NMSettingsConnection object, i.e. whether NMSettingsConnections can be shared.
     
  3. The third problem is that two or more PvDs may be received within a single NMActiveConnection. This would require NMActiveConnection to act as a factory for itself.

Using NMIP4Config and NMIP6Config objects for PvDs and PvD instances


NetworkManager has objects/classes for storing IPv4 (libnm-core/nm-setting-ip4-config.c) and IPv6 (libnm-core/nm-setting-ip6-config.c) settings. More precisely, those objects expose the network settings of devices to the rest of NetworkManager. So, in a way, they are PvDs, in the sense that each of them contains enough information to connect to the network.

The problem is that internally NetworkManager keeps a single IPv4/IPv6 configuration object per device and in addition it merges all received configuration data on a single interface.

Specifically, configuration data received in RAs is kept in the NMRdisc object defined in src/rdisc/nm-rdisc.h, which contains arrays of the received configuration data. NetworkManager assumes that a single router sends all the configuration data. This assumption is not valid on a multihomed network, or on a network that can send multiple provisioning domains within each RA. This structure would need to change so that configuration data is kept separately for each router and provisioning domain.
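Keeping data separately for each router and provisioning domain could take the form of a table keyed by the pair (RA source address, PvD ID). The C sketch below is hypothetical and simplified. It is not the actual NMRdisc definition. It uses fixed-size arrays, stores only prefixes, and represents an implicit PvD with an all-zero ID.

```c
#include <netinet/in.h>
#include <string.h>

#define MAX_SOURCES  16   /* (router, PvD) pairs tracked per interface */
#define MAX_PREFIXES 8    /* prefixes remembered per pair */

/* Data learned from one router for one PvD, never merged with other routers. */
typedef struct {
    struct in6_addr router;                 /* source address of the RA */
    unsigned char pvd_id[16];               /* all zero for an implicit PvD */
    struct in6_addr prefixes[MAX_PREFIXES];
    unsigned prefix_len[MAX_PREFIXES];
    unsigned n_prefixes;
} RaSource;

typedef struct {
    RaSource sources[MAX_SOURCES];
    unsigned n_sources;
} RaTable;

/* Return the entry for (router, pvd_id), creating it if needed; NULL if full. */
static RaSource *ra_table_entry(RaTable *t, const struct in6_addr *router,
                                const unsigned char pvd_id[16])
{
    for (unsigned i = 0; i < t->n_sources; i++) {
        RaSource *s = &t->sources[i];
        if (memcmp(&s->router, router, sizeof *router) == 0 &&
            memcmp(s->pvd_id, pvd_id, 16) == 0)
            return s;
    }
    if (t->n_sources == MAX_SOURCES)
        return NULL;
    RaSource *s = &t->sources[t->n_sources++];
    memset(s, 0, sizeof *s);
    s->router = *router;
    memcpy(s->pvd_id, pvd_id, 16);
    return s;
}

/* Record a prefix option from an RA sent by router for pvd_id.
 * Duplicates within the same entry are ignored. Returns 0, or -1 if full. */
int ra_table_add_prefix(RaTable *t, const struct in6_addr *router,
                        const unsigned char pvd_id[16],
                        const struct in6_addr *prefix, unsigned plen)
{
    RaSource *s = ra_table_entry(t, router, pvd_id);
    if (s == NULL)
        return -1;
    for (unsigned i = 0; i < s->n_prefixes; i++)
        if (s->prefix_len[i] == plen &&
            memcmp(&s->prefixes[i], prefix, sizeof *prefix) == 0)
            return 0;
    if (s->n_prefixes == MAX_PREFIXES)
        return -1;
    s->prefixes[s->n_prefixes] = *prefix;
    s->prefix_len[s->n_prefixes] = plen;
    s->n_prefixes++;
    return 0;
}
```

If two routers announce the same prefix, it ends up in two entries. A repeated RA from the same router for the same PvD does not create a new entry.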

The problems in this case are:
  1. NMIPxConfig objects were not intended to keep information about available IPv4 and IPv6 addresses. They were meant to expose the addresses configured on a device. Using them for PvDs reverses their purpose, which is not a welcome change.
  2. Again, these are transient objects, so there is no history. It is possible to keep every object alive, but NM isn't designed to behave that way.
  3. It seems that in libnm there is no way to obtain a list of IPv4 and IPv6 objects.

Having separate PvD structures


This is the final and most intrusive alternative. Settings, active connections, and the IPv4 and IPv6 objects/classes stay as they are. Instead, a new PvD data structure is created whenever a new connection is established. The PvD is either inferred from the configuration settings or received explicitly by NetworkManager.

This would solve the problem that the same settings might yield different PvDs, which isn't known until the connection is established. For example, if we use DHCP to configure the interface, the PvD received depends on the point of attachment (PoA).

It would also solve the problem that the user might try to instantiate one PvD while some other one is actually in use. With this approach, once the connection is established, the appropriate PvD is looked up, or a new one is created.

This is the most intrusive change. It would require API changes and thus break compatibility with existing applications (or require a completely new API).

Current PvD Support Implementation


The first implementation of PvDs used NMIP6Config as a PvD container. Before describing the implementation, note that RA messages are currently the only mechanism able to carry PvDs. NMIP6Config objects are extended with a PvD ID field. At first, different types of PvD IDs were supported, and the first implemented type was a UUID stored in ASCII format. Later in the development process, PvD ID types were removed, and UUID is now the only possible type. This does not seem to make the implementation less flexible, and at the same time it substantially reduces complexity.
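The PvD ID is a UUID kept in ASCII form, so comparing IDs is simplest after converting them to 16 bytes of binary. This sketch assumes the standard 36-character textual UUID layout from RFC 4122 (8-4-4-4-12 hex digits). pvd_id_parse is a hypothetical helper, not a NetworkManager function.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse a textual UUID (8-4-4-4-12 hex digits) into 16 bytes.
 * Returns 0 on success, -1 if text is not a well-formed UUID. */
int pvd_id_parse(const char *text, unsigned char out[16])
{
    size_t i = 0;

    if (strlen(text) != 36)
        return -1;
    for (int n = 0; n < 16; n++) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (text[i] != '-')
                return -1;
            i++;                                /* skip the group separator */
        }
        int hi = hexval(text[i]), lo = hexval(text[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[n] = (unsigned char)(hi << 4 | lo);
        i += 2;
    }
    return 0;
}
```

The parser accepts both upper- and lower-case hex digits. Two spellings of the same ID therefore give identical binary forms and compare equal with memcmp.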

When an RA is received, it is first processed as usual, and then a new implicit PvD is created from its data. If two or more routers on the local network each send their own configuration data, a separate PvD is created for each RA. Also, if an RA contains a PvD container option, the option is parsed and an additional PvD is created from that data.

This information is then handed to the NMDevice object. NMDevice merges data from implicit PvDs, as in the unmodified version, but now also keeps a hash table with the set of PvDs received on the given interface. This information is then exposed through the NMActiveConnection object.



Friday, February 12, 2016

Few notes about network namespaces in Linux

For some time now I've been working with network namespaces as implemented in the Linux kernel. Here I'll collect some notes about their implementation, behavior and usage, and anything else I learn while using them.

Kernel API for NETNS


The kernel offers two system calls for managing network namespaces. The first one, unshare(2), creates a new network namespace. This system call can also create other types of namespaces, but here we are interested only in network namespaces. To create a new network namespace, call it like this:
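#define _GNU_SOURCE                     /* needed for unshare() and CLONE_NEWNET */
#include <sched.h>
#include <stdio.h>
...
    if (unshare(CLONE_NEWNET) == -1)    /* requires CAP_SYS_ADMIN */
        perror("unshare");
...
This creates a new network namespace and moves the calling process into it.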

Other processes can use that network namespace in two ways. The first is for the process that created the new network namespace to fork other processes. Each forked child inherits the parent's network namespace. The namespace is also kept across exec.
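You can check inheritance across fork without any privileges by comparing namespace identities. namespaces(7) documents that two processes are in the same namespace when the device and inode numbers of their /proc/PID/ns/net files match. The helper names below are hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A network namespace is identified by the (device, inode) pair of its file. */
typedef struct { dev_t dev; ino_t ino; } NetnsId;

static int current_netns_id(NetnsId *id)
{
    struct stat st;
    if (stat("/proc/self/ns/net", &st) == -1)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/* Fork a child and report whether it is in the parent's network namespace.
 * Returns 1 if same, 0 if different, -1 on error. */
int child_shares_netns(void)
{
    NetnsId parent, child;
    int fds[2];

    if (current_netns_id(&parent) == -1 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: report its own namespace */
        NetnsId mine;
        close(fds[0]);
        int ok = current_netns_id(&mine) == 0 &&
                 write(fds[1], &mine, sizeof mine) == (ssize_t)sizeof mine;
        _exit(ok ? 0 : 1);
    }

    close(fds[1]);                           /* parent: read the child's answer */
    ssize_t n = read(fds[0], &child, sizeof child);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof child)
        return -1;
    return parent.dev == child.dev && parent.ino == child.ino;
}
```

If the parent calls unshare(CLONE_NEWNET) before child_shares_netns(), the result is still 1, because the child is created in the parent's new namespace.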

The second system call the kernel offers is setns(2). To use it, you need a file descriptor that refers to the network namespace you want to join. There are two ways to obtain such a file descriptor.

The first approach requires knowing a process that currently lives in the required network namespace. Let's say that the PID of that process is $PID. Open the file /proc/$PID/ns/net to obtain the file descriptor. Then pass it to the setns(2) system call to switch network namespaces. This works for any network namespace that contains at least one process, provided you are permitted to access that process's /proc entry.
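Here is a minimal C version of this approach. open_netns_of_pid and enter_netns_of_pid are hypothetical names. Opening another process's /proc/PID/ns/net requires permission to inspect that process, and setns(2) itself requires CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/PID/ns/net; returns a file descriptor or -1 (errno set). */
int open_netns_of_pid(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/ns/net", (long)pid);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Switch the calling thread into the network namespace of pid.
 * Returns 0 on success, -1 with errno set (EPERM without CAP_SYS_ADMIN). */
int enter_netns_of_pid(pid_t pid)
{
    int fd = open_netns_of_pid(pid);
    if (fd == -1)
        return -1;                        /* no such process, or access denied */

    int rc = setns(fd, CLONE_NEWNET);     /* CLONE_NEWNET: fd must be a net namespace */
    int saved = errno;
    close(fd);                            /* the thread stays in the namespace */
    errno = saved;
    return rc;
}
```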

The second approach works only with iproute2-compatible tools. When the ip command creates a new network namespace, it creates a file in the /var/run/netns directory and bind mounts the new network namespace onto that file. So if you know the name of the network namespace you want to access (let's say NAME), just open(2) the corresponding file, /var/run/netns/NAME, to obtain the file descriptor.
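Combining this with setns(2) gives roughly what 'ip netns exec NAME command' does. This is a simplified sketch with a hypothetical function name. The real ip command also creates a private mount namespace, remounts /sys and applies /etc/netns/NAME overrides. None of that is shown here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Join the namespace bind-mounted at /var/run/netns/NAME, then exec argv.
 * Only returns on error (-1, errno set). setns() needs CAP_SYS_ADMIN. */
int exec_in_named_netns(const char *name, char *const argv[])
{
    char path[256];
    int n = snprintf(path, sizeof path, "/var/run/netns/%s", name);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;                       /* e.g. ENOENT: no such named namespace */
    if (setns(fd, CLONE_NEWNET) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    execvp(argv[0], argv);               /* the namespace is kept across exec */
    return -1;
}
```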

Note that no system call removes an existing network namespace. A network namespace exists as long as at least one process uses it, it is bind mounted somewhere, or an open file descriptor refers to it.

Two final remarks for this section. First, no system call allows one process to move another process into a different network namespace! Second, you need appropriate privileges (CAP_SYS_ADMIN) to use these system calls, so regular user processes can't switch namespaces.

Socket API behavior


The next question is how Socket API behaves when network namespaces are used, and things here are quite interesting.

First, each socket you create is bound to whichever network namespace was active when it was created. For example, you can make one network namespace active (say NS1), create a socket, and then immediately make another network namespace active (NS2). The socket stays bound to NS1 regardless of which network namespace is active, and it can be used normally. In other words, you don't need to activate the socket's own network namespace before operating on it (bind, connect, anything)!
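This property makes it practical for one program to open sockets in several namespaces. Switch to the target namespace, create the socket, and switch back. Below is a minimal sketch with socket_in_netns as a hypothetical helper. It saves the current namespace through /proc/thread-self/ns/net (available since Linux 3.17), because the namespace is tracked per thread. If it cannot switch back, it aborts rather than leave the thread in the wrong namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket in the network namespace named by ns_path (for example
 * "/var/run/netns/NAME" or "/proc/PID/ns/net"). The calling thread is then
 * returned to its original namespace, while the socket stays bound to the
 * target namespace. Returns the socket fd, or -1 with errno set. */
int socket_in_netns(const char *ns_path, int domain, int type, int protocol)
{
    int err = 0, sock = -1;
    int orig = open("/proc/thread-self/ns/net", O_RDONLY | O_CLOEXEC);
    if (orig == -1)
        return -1;

    int target = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (target == -1) {
        err = errno;
        close(orig);
        errno = err;
        return -1;
    }

    if (setns(target, CLONE_NEWNET) == -1) {
        err = errno;                            /* e.g. EPERM without CAP_SYS_ADMIN */
    } else {
        sock = socket(domain, type, protocol);  /* created in the target namespace */
        if (sock == -1)
            err = errno;
        if (setns(orig, CLONE_NEWNET) == -1)    /* switch this thread back */
            abort();                            /* never continue in the wrong ns */
    }

    close(target);
    close(orig);
    if (sock == -1)
        errno = err;
    return sock;
}
```

The returned descriptor can then be passed to bind(2) or connect(2) from any namespace, as described above.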

Also note that the network namespace is a per-thread setting. Switching the network namespace in one thread has no impact on the other threads of the process.

Command line tools


There are two command line tools for manipulating network namespaces. The first one is nsenter(1), which isn't specific to networking. It can start a process within an existing network namespace. The second is the ip command from the iproute2 package. It manages network namespaces and can also move network interfaces between namespaces.

NETLINK behavior

Moving a device from one network namespace to another requires NETLINK. I found references to /sys files somewhere, but at least on my system they don't appear to exist.
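Moving an interface is done with an RTM_NEWLINK request. The request carries the IFLA_NET_NS_FD attribute, which is a file descriptor of the target namespace (e.g. an opened /var/run/netns/NAME). IFLA_NET_NS_PID is an alternative. The C sketch below builds such a request by hand and waits for the kernel's acknowledgement. build_move_request, move_link_to_netns and struct move_req are hypothetical names, and sending the request requires CAP_NET_ADMIN.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Request layout: netlink header, link message, one u32 attribute. */
struct move_req {
    struct nlmsghdr nh;
    struct ifinfomsg ifi;
    char attrbuf[RTA_SPACE(sizeof(uint32_t))];
};

/* Fill req with an RTM_NEWLINK request that moves interface ifindex into
 * the network namespace referred to by netns_fd. */
void build_move_request(struct move_req *req, int ifindex, int netns_fd)
{
    memset(req, 0, sizeof *req);
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type = RTM_NEWLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;   /* ask for an ACK */
    req->nh.nlmsg_seq = 1;
    req->ifi.ifi_family = AF_UNSPEC;
    req->ifi.ifi_index = ifindex;                       /* interface to move */

    /* Append IFLA_NET_NS_FD right after the aligned ifinfomsg. */
    struct rtattr *rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
    uint32_t fd = (uint32_t)netns_fd;
    rta->rta_type = IFLA_NET_NS_FD;
    rta->rta_len = RTA_LENGTH(sizeof fd);
    memcpy(RTA_DATA(rta), &fd, sizeof fd);
    req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

/* Send the request to the kernel and wait for the ACK.
 * Returns 0 on success, -1 with errno set on failure (e.g. EPERM, ENODEV). */
int move_link_to_netns(int ifindex, int netns_fd)
{
    struct move_req req;
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };  /* nl_pid 0 = kernel */
    uint32_t reply[256];                                       /* 4-byte aligned */
    int err = 0;

    int sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
    if (sock == -1)
        return -1;

    build_move_request(&req, ifindex, netns_fd);
    if (sendto(sock, &req, req.nh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof kernel) == -1) {
        err = errno;
    } else {
        ssize_t len = recv(sock, reply, sizeof reply, 0);
        struct nlmsghdr *nh = (struct nlmsghdr *)reply;
        if (len == -1)
            err = errno;
        else if ((size_t)len < NLMSG_LENGTH(sizeof(struct nlmsgerr)) ||
                 nh->nlmsg_type != NLMSG_ERROR)
            err = EPROTO;                                      /* unexpected reply */
        else
            err = -((struct nlmsgerr *)NLMSG_DATA(nh))->error; /* 0 means ACK */
    }
    close(sock);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```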

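One interesting fact concerns interface indexes. Each network namespace has its own loopback interface with index 1, and other indexes are allocated per namespace. However, when an interface is moved to another network namespace, the kernel keeps its index unless that index is already taken there. So if you create an interface in one network namespace and it gets index N, it will usually still have index N after you move it to another network namespace.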

TBD.
