Description
When accessing an OpenZFS-on-OSX file system via an NFSv3 mount on an OSX client, creating a new file for writing by calling open(2) with (O_WRONLY|O_CREAT|O_EXCL) fails and sets errno to EIO ("Input/output error"). Also, after the failing open(2) call, a file can be found to exist (with all its permission bits zeroed).
Further, when the NFSv3 client is Linux or FreeBSD, the same failure occurs for the combination (O_WRONLY|O_CREAT|O_EXCL). Additionally, with other combinations of open(2) flags that create a new file, writing a small amount of data to the newly created file appears never to store data in the file; the attempt always eventually fails and sets errno to EACCES ("Permission denied").
Reproduction steps below. The identical reproduction does not show any unexpected failures when NFSv3 mounting an exported APFS file system from the same OSX host.
Expected behavior: when a pathname does not name an existing file, I'd expect opening with O_CREAT|O_EXCL to succeed (modulo permission checks, inode/storage availability, and so forth). To the best of my knowledge, there are no relevant permissions restrictions or resource exhaustion issues in my experiments.
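To make that expectation concrete, here is a minimal sketch of the semantics I'd expect from any file system, local or NFS-mounted. The helper name create_exclusive and the SKETCH_MAIN guard are mine, made up for illustration; they are not part of the reproduction further down.

```c
/* excl_sketch.c: expected O_CREAT|O_EXCL behavior (illustrative only). */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to create `path` exclusively.
   Returns 0 on success, otherwise the errno value from the
   failing open(2) or close(2). */
int create_exclusive(const char *path) {
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0660);
    if (fd == -1)
        return errno;
    if (close(fd) == -1)
        return errno;
    return 0;
}

#ifdef SKETCH_MAIN /* build with: cc -DSKETCH_MAIN -o excl excl_sketch.c */
int main(int argc, char **argv) {
    int first, second;

    if (argc != 2) {
        fprintf(stderr, "usage: %s PATH\n", argv[0]);
        return 2;
    }
    /* PATH must not exist beforehand. */
    first = create_exclusive(argv[1]);
    second = create_exclusive(argv[1]);
    printf("first:  %s\n", first ? strerror(first) : "created");
    printf("second: %s\n", second ? strerror(second) : "created");
    /* Expected: first call creates the file, second fails with EEXIST. */
    return (first == 0 && second == EEXIST) ? 0 : 1;
}
#endif
```

On a path that does not yet exist, I'd expect the first call to return 0 and the second to return EEXIST. The sketch leaves the file in place afterwards so that it can be inspected.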
Additional desired behavior: given the nature of networked file systems, I'm comfortable with the idea that file handles can become stale or otherwise unusable in case of network outages, server restarts, host reboots, and the like. To the best of my knowledge, none of those is occurring during my experiments. So ISTM that writing to newly created files' handles ought to work when nothing else shows signs of failure.
I'd be happy to provide further information and/or run any further reproductions that might be helpful here. (I'll confess, I'm out of practice at capturing NFS network traffic. In case that's needed, please point me in the direction of the best way to get it nowadays.)
NFS server host information
In all cases, I'm running nfsd on a system with the following OS and zfs extension info:
System Software Overview:
System Version: macOS 10.15.7 (19H1323)
Kernel Version: Darwin 19.6.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: <redacted>
User Name: <redacted>
Secure Virtual Memory: Enabled
System Integrity Protection: Disabled
Time since boot: 22:00
zfs:
Version: 2.1.0
Last Modified: 8/19/21, 9:28 PM
Bundle ID: org.openzfsonosx.zfs
Notarized: Yes
Loaded: Yes
Obtained from: Identified Developer
Kind: Intel
Architectures: x86_64
64-Bit (Intel): Yes
Location: /Library/Extensions/zfs.kext
Kext Version: 2.1.0
Load Address: 18446743521869111000
Loadable: Yes
Dependencies: Satisfied
Signed by: Developer ID Application: Joergen Lundman (735AM5QEU3), Developer ID Certification Authority, Apple Root CA
NFSv3 clients
I've exercised the reproduction using the following NFSv3 clients:
2a. a different OSX system (running a very vanilla Catalina installation); that host's system info is:
System Software Overview:
System Version: macOS 10.15.7 (19H15)
Kernel Version: Darwin 19.6.0
Boot Volume: Macintosh HD - Data
Boot Mode: Normal
Computer Name: <redacted>
User Name: <redacted>
Secure Virtual Memory: Enabled
System Integrity Protection: Disabled
Time since boot: 5 minutes
2b. a fairly vanilla FreeBSD 13.0 system running on another host on the local network. Here's some info about that system:
$ uname -a
FreeBSD <redacted> 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr 9 04:24:09 UTC 2021 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
2c. a Linux system, running as a QEMU guest of the OSX running nfsd. Here's some info about that Linux system:
$ uname -a
Linux debian 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
$ dpkg -l | grep nfs
ii libnfsidmap2:amd64 0.25-5.1 amd64 NFS idmapping library
ii nfs-common 1:1.3.4-2.5+deb10u1 amd64 NFS support files common to client and server
Software setup
3a. On the OSX NFS server, first create an APFS export, then an OpenZFS-on-OSX export. My local network is 10.0.0.0/24, and my normal unprivileged user account's uid on my network is 1001, so I'll set the uid of the exported directory to that uid.
# mkdir /var/lib/apfs-export
# echo '/private/var/lib/apfs-export -network 10.0.0.0 -mask 255.255.255.0' >> /etc/exports
# nfsd update
# chown 1001 /var/lib/apfs-export
# dd if=/dev/zero of=/var/lib/zfs.bin bs=$((1024*1024)) count=64
64+0 records in
64+0 records out
67108864 bytes transferred in 0.095889 secs (699859212 bytes/sec)
# zpool create nfs-test /var/lib/zfs.bin
# zfs create nfs-test/zfs-export
# echo '/Volumes/nfs-test/zfs-export -network 10.0.0.0 -mask 255.255.255.0' >> /etc/exports
# nfsd update
# chown 1001 /Volumes/nfs-test/zfs-export
3b. Setup on the NFS client. This appears to work equivalently across OSX, Linux, FreeBSD.
# mkdir /tmp/apfs-mount /tmp/zfs-mount
# mount -t nfs <nfsd-host>:/private/var/lib/apfs-export /tmp/apfs-mount
# mount -t nfs <nfsd-host>:/Volumes/nfs-test/zfs-export /tmp/zfs-mount
File writing program
I tried to come up with a reproduction using only shell-level utilities, but the shell and utilities are inconsistent as to whether, when, and how honestly they report syscall errors (certain utilities produce error messages different from what syscalls set errno to, it seems). So here's a file writer. It takes a pathname and a string to write into the file; and its open(2) flags are configured by option switches. This can be compiled with cc -o writef writef.c.
/* writef.c: open a file for writing, write into it. Written to
   exercise unexpected behavior on NFS-mounted ZFS file systems. */
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

char *progname;    /* Shared among various output routines. */
extern int optind; /* For getopt. */

/* Print a progress message immediately, so that it appears before
   any later error output. */
void report(char *msg) {
    printf("%s", msg); fflush(stdout);
}

/* Report the current errno and exit non-zero. */
void lose() {
    fprintf(stderr, "%s: %s\n", progname, strerror(errno));
    exit(1);
}

void usage() {
    fprintf(stderr, "usage: %s [-acet] PATH STRING\n", progname);
    exit(1);
}

int main (int argc, char **argv) {
    char *fn, *str;            /* Filename and text to write. */
    int fd, len;               /* File descriptor, string length. */
    int open_flags = O_WRONLY; /* 2nd open(2) argument */
    int ch;                    /* For getopt. */

    progname = strdup(basename(argv[0]));

    /* Turn switches into open_flags. */
    while ((ch = getopt(argc, argv, "acet")) != -1) {
        switch (ch) {
        case 'a':
            open_flags |= O_APPEND;
            break;
        case 'c':
            open_flags |= O_CREAT;
            break;
        case 'e':
            open_flags |= O_EXCL;
            break;
        case 't':
            open_flags |= O_TRUNC;
            break;
        case '?':
        default:
            usage();
        }
    }
    argc -= optind;
    argv += optind;
    if (argc < 2)
        usage();
    fn = argv[0];
    str = argv[1];
    len = strlen(str);

    if ((fd = open(fn, open_flags, 0660)) == -1)
        lose();
    report("open okay\n");

    if (write(fd, str, len) < len)
        lose();
    /* For eyeball debugging. */
    if (write(fd, "\n", 1) < 1)
        lose();
    report("write okay\n");

    if (close(fd) == -1)
        lose();
    report("close okay\n");
    return 0;
}
Driver for writef
This tests a handful of open(2) flag combinations. It expects the writef binary to be in the current working directory. Note that in all cases, the function testit first checks that there's no file at the specified path, so the first open(2) in each case should always be trying to create a new file. If this file is saved as run-test.sh beside writef, then it can be run as, e.g., sh ./run-test.sh /tmp/apfs-mount/test.out and sh ./run-test.sh /tmp/zfs-mount/test.out
#!/bin/sh
set -u
path="$1"
# This takes one or more triples as arguments. Each triple consists of
# an option flag for writef (q.v.), the expected exit status from
# writef given the flag and expected file system state, and an error
# message to write if writef didn't exit with the expected status.
testit() (
    set +v # see 'set -v' below
    ! test -e "$path" || { echo "$path exists" > /dev/stderr; exit 1; }
    while test $# -gt 0; do
        ./writef "$1" "$path" "$(date)"
        ret=$?
        # Show numeric userids in order to avoid any ambiguity about
        # UIDs on the server and the client.
        ls -ln "$path"
        # For "eyeball" verification that the file contains what's
        # expected.
        cat "$path"
        status=0
        if test $ret -ne $2; then
            echo "$3" > /dev/stderr
            status=1
            break
        fi
        # In order to see the effect of truncating writes, sleep a
        # second between loops
        sleep 1
        shift 3
    done
    rm -f "$path"
    exit $status
)
# Show what we're doing as we go.
set -v
# (0) Show our user id, and show the userid of the target directory.
id -u
ls -lnd "$(dirname "$path")"
# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-t 0 "(O_TRUNC) on an existing file failed"
# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-a 0 "(O_APPEND) on an existing file failed"
Expected Results
These are the results when running on an NFS-mounted APFS export. These are the expected results of the test script, including the error message in case (3) (i.e., attempting to open an existing file with O_CREAT|O_EXCL).
# (0) Show our user id, and show the userid of the target directory.
id -u
1001
ls -lnd "$(dirname "$path")"
dirname "$path"
drwxr-xr-x 2 1001 0 64 Sep 29 11:02 /tmp/apfs-mount
# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:24 EDT 2021
# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:27 EDT 2021
# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:28 EDT 2021
writef: File exists
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:28 EDT 2021
# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-t 0 "(O_TRUNC) on an existing file failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:31 EDT 2021
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:32 EDT 2021
# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-a 0 "(O_APPEND) on an existing file failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:33 EDT 2021
open okay
write okay
close okay
-rw-r----- 1 1001 0 58 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:33 EDT 2021
Wed Sep 29 11:30:34 EDT 2021
Results when running on an NFS-mounted ZFS export
The following are the results when the NFSv3 client is an OSX host. Case (2) is the unexpected result.
# (0) Show our user id, and show the userid of the target directory.
id -u
1001
ls -lnd "$(dirname "$path")"
dirname "$path"
drwxr-xr-x 3 1001 0 5 Sep 29 11:04 /tmp/zfs-mount
# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:48 EDT 2021
# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
writef: Input/output error
---------- 1 1001 0 0 Sep 29 11:30 /tmp/zfs-mount/test.out
cat: /tmp/zfs-mount/test.out: Permission denied
(O_WRONLY|O_CREAT|O_EXCL) failed
# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:50 EDT 2021
writef: File exists
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:50 EDT 2021
# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-t 0 "(O_TRUNC) on an existing file failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:54 EDT 2021
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:55 EDT 2021
# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-a 0 "(O_APPEND) on an existing file failed"
open okay
write okay
close okay
-rw-r----- 1 1001 0 29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:56 EDT 2021
open okay
write okay
close okay
-rw-r----- 1 1001 0 58 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:56 EDT 2021
Wed Sep 29 11:30:57 EDT 2021
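Case (2) in the run above leaves behind a zero-length file with no permission bits set. For anyone who wants to check for that state programmatically rather than by reading ls output, here is a minimal sketch using stat(2). The helper name is_zero_mode_leftover and the SKETCH_MAIN guard are mine, made up for illustration. On an NFS mount the attributes may come from the client's attribute cache, so treat the answer as a hint rather than proof of server-side state.

```c
/* leftover_sketch.c: detect the mode-000, size-0 file seen in case (2). */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if `path` is a regular file of
   size 0 with every permission bit clear, 0 if it exists but looks
   otherwise, and -1 if stat(2) fails (errno is left set). */
int is_zero_mode_leftover(const char *path) {
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) == -1)
        return -1;
    return S_ISREG(st.st_mode) && st.st_size == 0
        && (st.st_mode & 07777) == 0;
}

#ifdef SKETCH_MAIN /* build with: cc -DSKETCH_MAIN -o leftover leftover_sketch.c */
int main(int argc, char **argv) {
    int r;

    if (argc != 2) {
        fprintf(stderr, "usage: %s PATH\n", argv[0]);
        return 2;
    }
    r = is_zero_mode_leftover(argv[1]);
    if (r == -1) {
        perror(argv[1]);
        return 1;
    }
    printf("%s: %s\n", argv[1],
           r ? "zero-mode, zero-length leftover" : "not a leftover");
    return 0;
}
#endif
```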
The following are the results when the NFSv3 client is a FreeBSD host; the results on a Linux host are identical (modulo timestamps, of course).
Note that case (2) shows the same unexpected behavior as for an OSX NFSv3 client. In addition, cases (1) and (3) through (5) all fail on the newly created file: open(2) and write(2) report success, but close(2) then fails with "Permission denied", and cat shows no data in the file. (I observe that in those cases, even after the close(2) in writef fails, the file's size shows up as 29 bytes from ls. I'm not sure if that size is coming from the NFS server or the NFS client, so the number might be a red herring.)
# (0) Show our user id, and show the userid of the target directory.
id -u
1001
ls -ld "$(dirname "$path")"
drwxr-xr-x 3 kreuter wheel 5 Sep 29 11:47 /tmp/zfs-mount
# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
open okay
write okay
writef: Permission denied
-rw-r----- 1 1001 0 29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed
# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
writef: Input/output error
---------- 1 1001 0 0 Sep 29 11:50 /tmp/zfs-mount/test.out
cat: /tmp/zfs-mount/test.out: Permission denied
(O_WRONLY|O_CREAT|O_EXCL) failed
# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
open okay
write okay
writef: Permission denied
-rw-r----- 1 1001 0 29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed
# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-t 0 "(O_TRUNC) on an existing file failed"
open okay
write okay
writef: Permission denied
-rw-r----- 1 1001 0 29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed
# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
-a 0 "(O_APPEND) on an existing file failed"
open okay
write okay
writef: Permission denied
-rw-r----- 1 1001 0 29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed
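In the FreeBSD and Linux runs above, write(2) reports success and the error only shows up at close(2). NFS clients commonly buffer writes and send them to the server later, so a refusal by the server may not be visible until fsync(2) or close(2). I have not confirmed that this is what's happening here. As a sketch of how one might narrow it down, the following variant of writef's core calls fsync(2) before close(2) and reports which step failed first. The names write_synced and enum step, and the SKETCH_MAIN guard, are mine, made up for illustration.

```c
/* synced_sketch.c: like writef's core, but fsync(2) before close(2). */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum step { STEP_OK, STEP_OPEN, STEP_WRITE, STEP_FSYNC, STEP_CLOSE };

/* Hypothetical helper.  Writes `str` to `path` using `open_flags` and
   returns the first step that failed (STEP_OK if none did).  *err
   receives the errno of the failing call, or 0 on success or on a
   short write that did not set errno. */
enum step write_synced(const char *path, int open_flags,
                       const char *str, int *err) {
    size_t len = strlen(str);
    ssize_t n;
    int fd;

    assert(path != NULL && err != NULL);
    *err = 0;
    if ((fd = open(path, open_flags, 0660)) == -1) {
        *err = errno;
        return STEP_OPEN;
    }
    n = write(fd, str, len);
    if (n == -1 || (size_t)n < len) {
        if (n == -1)
            *err = errno;
        close(fd);
        return STEP_WRITE;
    }
    /* Ask for buffered data to be flushed now, so that a deferred
       error is attributed to this step rather than to close(2). */
    if (fsync(fd) == -1) {
        *err = errno;
        close(fd);
        return STEP_FSYNC;
    }
    if (close(fd) == -1) {
        *err = errno;
        return STEP_CLOSE;
    }
    return STEP_OK;
}

#ifdef SKETCH_MAIN /* build with: cc -DSKETCH_MAIN -o synced synced_sketch.c */
int main(int argc, char **argv) {
    static const char *names[] = { "ok", "open", "write", "fsync", "close" };
    enum step s;
    int err;

    if (argc != 3) {
        fprintf(stderr, "usage: %s PATH STRING\n", argv[0]);
        return 2;
    }
    s = write_synced(argv[1], O_WRONLY | O_CREAT, argv[2], &err);
    printf("first failing step: %s (%s)\n", names[s],
           err ? strerror(err) : "no errno");
    return s == STEP_OK ? 0 : 1;
}
#endif
```

If the failure moves from close(2) to fsync(2) with this variant, that would suggest the server is rejecting the write itself rather than anything specific to closing the file.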