Merged
2 changes: 1 addition & 1 deletion .github/workflows/integration_tests.yml
@@ -49,7 +49,7 @@ jobs:
run: sudo apt-get install -y --no-install-recommends build-essential patchelf pkg-config net-tools

- name: Install libkrunfw
run: curl -L -o /tmp/libkrunfw-4.9.0-x86_64.tgz https://github.com/containers/libkrunfw/releases/download/v4.9.0/libkrunfw-4.9.0-x86_64.tgz && mkdir tmp && tar xf /tmp/libkrunfw-4.9.0-x86_64.tgz -C tmp && sudo mv tmp/lib64/* /lib/x86_64-linux-gnu
run: curl -L -o /tmp/libkrunfw-5.0.0-x86_64.tgz https://github.com/containers/libkrunfw/releases/download/v5.0.0/libkrunfw-5.0.0-x86_64.tgz && mkdir tmp && tar xf /tmp/libkrunfw-5.0.0-x86_64.tgz -C tmp && sudo mv tmp/lib64/* /lib/x86_64-linux-gnu

- name: Integration tests
run: RUST_LOG=trace KRUN_ENOMEM_WORKAROUND=1 KRUN_NO_UNSHARE=1 make test
43 changes: 40 additions & 3 deletions README.md
@@ -58,11 +58,48 @@ Each variant generates a dynamic library with a different name (and ```soname```

## Networking

In ```libkrun```, networking is provided by two different, mutually exclusive techniques:
In ```libkrun```, networking is provided by two different, mutually exclusive techniques: **virtio-vsock + TSI** and **virtio-net + passt/gvproxy**.

- **virtio-vsock + TSI**: A novel technique called **Transparent Socket Impersonation** which allows the VM to have network connectivity without a virtual interface. This technique supports both outgoing and incoming connections. It's possible for userspace applications running in the VM to transparently connect to endpoints outside the VM and receive connections from the outside to ports listening inside the VM. Requires a custom kernel (like the one bundled in **libkrunfw**) and it's limited to AF_INET SOCK_DGRAM and SOCK_STREAM sockets.
### virtio-vsock + TSI

- **virtio-net + passt/gvproxy**: A conventional virtual interface that allows the guest to communicate with the outside through the VMM using a supporting application like [passt](https://passt.top/passt/about/) or [gvproxy](https://github.com/containers/gvisor-tap-vsock).
This is a novel technique called **Transparent Socket Impersonation** which allows the VM to have network connectivity without a virtual interface. This technique supports both outgoing and incoming connections. It's possible for userspace applications running in the VM to transparently connect to endpoints outside the VM and receive connections from the outside to ports listening inside the VM.

#### Enabling TSI

TSI for AF_INET and AF_INET6 is automatically enabled when no network interface is added to the VM. TSI for AF_UNIX is enabled when, in addition to the previous condition, `krun_set_root` has been used to set `/` as root filesystem.
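
The enabling rule above can be sketched as a small pure function. This is only an illustration of the stated conditions, not libkrun's actual implementation: `VmNetConfig` and `tsi_flags` are hypothetical names, and the returned pair merely mirrors the `enable_tsi` / `enable_tsi_unix` flags that this change passes to the vsock device.

```rust
/// Hypothetical summary of a VM configuration (not a libkrun type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VmNetConfig {
    /// Number of virtio-net interfaces added to the VM
    /// (e.g. through `krun_add_net_unixstream` or `krun_add_net_unixdgram`).
    pub net_interfaces: usize,
    /// Path given to `krun_set_root`, if it was called.
    pub root_path: Option<String>,
}

/// Returns `(enable_tsi, enable_tsi_unix)` following the rule stated in the README:
/// - TSI for AF_INET/AF_INET6 is on when no network interface was added;
/// - TSI for AF_UNIX additionally requires `/` as the root filesystem.
pub fn tsi_flags(cfg: &VmNetConfig) -> (bool, bool) {
    let enable_tsi = cfg.net_interfaces == 0;
    let enable_tsi_unix = enable_tsi && cfg.root_path.as_deref() == Some("/");
    (enable_tsi, enable_tsi_unix)
}
```

Under these assumptions, a VM with no network interface and `/` as root gets both flags, while adding a single virtio-net interface turns both off.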

#### Known limitations

- Requires a custom kernel (like the one bundled in **libkrunfw**).
- It's limited to SOCK_DGRAM and SOCK_STREAM sockets and AF_INET, AF_INET6 and AF_UNIX address families (for instance, raw sockets aren't supported).
- Listening on SOCK_DGRAM sockets from the guest is not supported.
- When TSI is enabled for AF_UNIX sockets, only absolute paths are supported as addresses.
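
As a rough illustration of the limitations listed above, the check below decides whether a guest socket falls within what TSI is described as supporting. It is a sketch under stated assumptions: `tsi_can_handle` is a hypothetical helper, not part of libkrun, and the constants are the usual Linux numeric values for the address families and socket types.

```rust
// Linux numeric values for address families and socket types.
pub const AF_UNIX: u16 = 1;
pub const AF_INET: u16 = 2;
pub const AF_INET6: u16 = 10;
pub const SOCK_STREAM: u16 = 1;
pub const SOCK_DGRAM: u16 = 2;
pub const SOCK_RAW: u16 = 3;

/// Hypothetical helper: returns `true` when a guest socket fits the
/// documented TSI limitations.
///
/// - `listening`: the guest wants to receive incoming traffic on this socket.
/// - `unix_path`: the address used for AF_UNIX sockets, if any.
pub fn tsi_can_handle(
    family: u16,
    sock_type: u16,
    listening: bool,
    unix_path: Option<&str>,
) -> bool {
    // Only stream and datagram sockets are covered, and datagram
    // sockets cannot listen from the guest.
    let type_ok = match sock_type {
        SOCK_STREAM => true,
        SOCK_DGRAM => !listening,
        _ => false,
    };
    // AF_UNIX addresses must be absolute paths.
    let family_ok = match family {
        AF_INET | AF_INET6 => true,
        AF_UNIX => unix_path.map_or(false, |p| p.starts_with('/')),
        _ => false,
    };
    type_ok && family_ok
}
```

For example, a raw AF_INET socket and an AF_UNIX socket addressed by a relative path are both rejected by this check.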

### virtio-net + passt/gvproxy

A conventional virtual interface that allows the guest to communicate with the outside through the VMM using a supporting application like [passt](https://passt.top/passt/about/) or [gvproxy](https://github.com/containers/gvisor-tap-vsock).

#### Enabling virtio-net

Use `krun_add_net_unixstream` and/or `krun_add_net_unixdgram` to add a virtio-net interface connected to the userspace network proxy.

## Security model

The libkrun security model is primarily defined by the consideration that both the guest and the VMM belong to the same security context. For many operations, the VMM acts as a proxy for the guest within the host. Host resources that are accessible to the VMM can potentially be accessed by the guest through it.

When defining the security implementation of your environment, you should think of the guest and the VMM as a single entity. To prevent the guest from accessing the host's resources, you need to use the host OS's security features to run the VMM inside an isolated context. On Linux, the primary mechanism to be used for this purpose is namespaces. Single-user systems may have a more relaxed security policy and just ensure the VMM runs with a particular UID/GID.

While most virtio devices allow the guest to access resources from the host, two of them require special consideration when used: virtio-fs and virtio-vsock+TSI.

### virtio-fs

When exposing a directory in a filesystem from the host to the guest through virtio-fs devices configured with `krun_set_root` and/or `krun_add_virtiofs`, libkrun **does not** provide any protection against the guest attempting to access other directories in the same filesystem, or even other filesystems in the host.

A mount point isolation mechanism from the host should be used in combination with virtio-fs.
Comment on lines +97 to +98
Contributor

Suggested change
A mount point isolation mechanism from the host should be used in combination with virtio-fs.
A mount point isolation mechanism from the host should be used in combination with virtio-fs.
In addition, when using virtio-fs, a guest may exhaust filesystem resources such as inode limits and disk capacity. Controls should be implemented on the host to mitigate this.

Collaborator Author

Oops... sorry, this came in literally one minute after I merged the PR. Could you please create another PR with the additional text?

Contributor

I created #453.


### virtio-vsock + TSI

When TSI is enabled, the VMM acts as a proxy for AF_INET, AF_INET6 and AF_UNIX sockets, for both incoming and outgoing connections. For all practical purposes, the VMM and the guest should be considered to be running in the same network context. As such, you should apply to the VMM whatever restrictions you want to apply to the guest.

## Building and installing

43 changes: 30 additions & 13 deletions src/devices/src/virtio/vsock/device.rs
@@ -52,6 +52,8 @@ impl Vsock {
host_port_map: Option<HashMap<u16, u16>>,
queues: Vec<VirtQueue>,
unix_ipc_port_map: Option<HashMap<u32, (PathBuf, bool)>>,
enable_tsi: bool,
enable_tsi_unix: bool,
) -> super::Result<Vsock> {
let mut queue_events = Vec::new();
for _ in 0..queues.len() {
@@ -64,7 +66,13 @@ impl Vsock {

Ok(Vsock {
cid,
muxer: VsockMuxer::new(cid, host_port_map, unix_ipc_port_map),
muxer: VsockMuxer::new(
cid,
host_port_map,
unix_ipc_port_map,
enable_tsi,
enable_tsi_unix,
),
queue_rx,
queue_tx,
queues,
@@ -82,12 +90,21 @@ impl Vsock {
cid: u64,
host_port_map: Option<HashMap<u16, u16>>,
unix_ipc_port_map: Option<HashMap<u32, (PathBuf, bool)>>,
enable_tsi: bool,
enable_tsi_unix: bool,
) -> super::Result<Vsock> {
let queues: Vec<VirtQueue> = defs::QUEUE_SIZES
.iter()
.map(|&max_size| VirtQueue::new(max_size))
.collect();
Self::with_queues(cid, host_port_map, queues, unix_ipc_port_map)
Self::with_queues(
cid,
host_port_map,
queues,
unix_ipc_port_map,
enable_tsi,
enable_tsi_unix,
)
}

pub fn id(&self) -> &str {
@@ -102,7 +119,7 @@ impl Vsock {
/// have pending. Return `true` if descriptors have been added to the used ring, and `false`
/// otherwise.
pub fn process_stream_rx(&mut self) -> bool {
debug!("vsock: process_stream_rx()");
debug!("process_stream_rx()");
let mem = match self.device_state {
DeviceState::Activated(ref mem, _) => mem,
// This should never happen, it's been already validated in the event handler.
@@ -111,10 +128,10 @@ impl Vsock {

let mut have_used = false;

debug!("vsock: process_rx before while");
debug!("process_rx before while");
let mut queue_rx = self.queue_rx.lock().unwrap();
while let Some(head) = queue_rx.pop(mem) {
debug!("vsock: process_rx inside while");
debug!("process_rx inside while");
let used_len = match VsockPacket::from_rx_virtq_head(&head) {
Ok(mut pkt) => {
if self.muxer.recv_pkt(&mut pkt).is_ok() {
@@ -127,12 +144,12 @@ impl Vsock {
}
}
Err(e) => {
warn!("vsock: RX queue error: {e:?}");
warn!("RX queue error: {e:?}");
0
}
};

debug!("vsock: process_rx: something to queue");
debug!("process_rx: something to queue");
have_used = true;
if let Err(e) = queue_rx.add_used(mem, head.index, used_len) {
error!("failed to add used elements to the queue: {e:?}");
@@ -145,7 +162,7 @@ impl Vsock {
/// Walk the driver-provided TX queue buffers, package them up as vsock packets, and process
/// them. Return `true` if descriptors have been added to the used ring, and `false` otherwise.
pub fn process_stream_tx(&mut self) -> bool {
debug!("vsock::process_stream_tx()");
debug!("process_stream_tx()");
let mem = match self.device_state {
DeviceState::Activated(ref mem, _) => mem,
// This should never happen, it's been already validated in the event handler.
@@ -159,7 +176,7 @@ impl Vsock {
let pkt = match VsockPacket::from_tx_virtq_head(&head) {
Ok(pkt) => pkt,
Err(e) => {
error!("vsock: error reading TX packet: {e:?}");
error!("error reading TX packet: {e:?}");
have_used = true;
if let Err(e) = queue_tx.add_used(mem, head.index, 0) {
error!("failed to add used elements to the queue: {e:?}");
@@ -169,13 +186,13 @@ impl Vsock {
};

if pkt.type_() == uapi::VSOCK_TYPE_DGRAM {
debug!("vsock::process_stream_tx() is DGRAM");
debug!("process_stream_tx() is DGRAM");
if self.muxer.send_dgram_pkt(&pkt).is_err() {
queue_tx.undo_pop();
break;
}
} else {
debug!("vsock::process_stream_tx() is STREAM");
debug!("process_stream_tx() is STREAM");
if self.muxer.send_stream_pkt(&pkt).is_err() {
queue_tx.undo_pop();
break;
@@ -235,7 +252,7 @@ impl VirtioDevice for Vsock {
byte_order::write_le_u32(data, ((self.cid() >> 32) & 0xffff_ffff) as u32)
}
_ => warn!(
"vsock: virtio-vsock received invalid read request of {} bytes at offset {}",
"virtio-vsock received invalid read request of {} bytes at offset {}",
data.len(),
offset
),
Expand All @@ -244,7 +261,7 @@ impl VirtioDevice for Vsock {

fn write_config(&mut self, offset: u64, data: &[u8]) {
warn!(
"vsock: guest driver attempted to write device config (offset={:x}, len={:x})",
"guest driver attempted to write device config (offset={:x}, len={:x})",
offset,
data.len()
);
16 changes: 8 additions & 8 deletions src/devices/src/virtio/vsock/event_handler.rs
@@ -15,11 +15,11 @@ use crate::virtio::VirtioDevice;

impl Vsock {
pub(crate) fn handle_rxq_event(&mut self, event: &EpollEvent) -> bool {
debug!("vsock: RX queue event");
debug!("RX queue event");

let event_set = event.event_set();
if event_set != EventSet::IN {
warn!("vsock: rxq unexpected event {event_set:?}");
warn!("rxq unexpected event {event_set:?}");
return false;
}

@@ -33,11 +33,11 @@ impl Vsock {
}

pub(crate) fn handle_txq_event(&mut self, event: &EpollEvent) -> bool {
debug!("vsock: TX queue event");
debug!("TX queue event");

let event_set = event.event_set();
if event_set != EventSet::IN {
warn!("vsock: txq unexpected event {event_set:?}");
warn!("txq unexpected event {event_set:?}");
return false;
}

@@ -57,11 +57,11 @@ impl Vsock {
}

fn handle_evq_event(&mut self, event: &EpollEvent) -> bool {
debug!("vsock: event queue event");
debug!("event queue event");

let event_set = event.event_set();
if event_set != EventSet::IN {
warn!("vsock: evq unexpected event {event_set:?}");
warn!("evq unexpected event {event_set:?}");
return false;
}

@@ -72,7 +72,7 @@ impl Vsock {
}

fn handle_activate_event(&self, event_manager: &mut EventManager) {
debug!("vsock: activate event");
debug!("activate event");
if let Err(e) = self.activate_evt.read() {
error!("Failed to consume vsock activate event: {e:?}");
}
@@ -147,7 +147,7 @@ impl Subscriber for Vsock {
self.device_state.signal_used_queue();
}
} else {
warn!("Vsock: The device is not yet activated. Spurious event received: {source:?}");
warn!("The device is not yet activated. Spurious event received: {source:?}");
}
}

9 changes: 7 additions & 2 deletions src/devices/src/virtio/vsock/mod.rs
@@ -14,10 +14,10 @@ mod muxer_thread;
mod packet;
mod proxy;
mod reaper;
mod tcp;
#[cfg(target_os = "macos")]
mod timesync;
mod udp;
mod tsi_dgram;
mod tsi_stream;
mod unix;

pub use self::defs::uapi::VIRTIO_ID_VSOCK as TYPE_VSOCK;
@@ -59,6 +59,11 @@ mod defs {
pub const TSI_ACCEPT: u32 = 1030;
pub const TSI_PROXY_RELEASE: u32 = 1031;

// Linux definitions that we need for cross-platform compatibility.
pub const LINUX_AF_UNIX: u16 = 1;
pub const LINUX_AF_INET: u16 = 2;
pub const LINUX_AF_INET6: u16 = 10;

pub mod uapi {

/// Virtio feature flags.