Unable to create and/or write to newly created files on ZFS via several NFSv3 clients. #104

@kreuter

Description

When an OpenZFS-on-OSX file system is accessed via an NFSv3 mount on an OSX client, creating a new file for writing by calling open(2) with (O_WRONLY|O_CREAT|O_EXCL) fails and sets errno to EIO ("Input/output error"). After the failing open(2) call, a file nonetheless exists at the path, with all of its permission bits zeroed.

Further, when the NFSv3 client is Linux or FreeBSD, the same failure occurs for (O_WRONLY|O_CREAT|O_EXCL). Additionally, with other combinations of open(2) flags that create a new file, writing a small amount of data to the newly created file appears never to store the data in the file: write(2) reports success, but the operation always eventually fails (in the runs below, at close(2)) and sets errno to the error with message "Permission denied".

Reproduction steps below. The identical reproduction does not show any unexpected failures when NFSv3 mounting an exported APFS file system from the same OSX host.

Expected behavior: when a pathname does not name an existing file, I'd expect opening with O_CREAT|O_EXCL to succeed (modulo permission checks, inode/storage availability, and so forth). To the best of my knowledge, there are no relevant permissions restrictions or resource exhaustion issues in my experiments.
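For anyone who wants to check only the exclusive-create case without the full reproduction below, here is a minimal sketch. The function name probe_excl_create is made up for illustration and is not part of writef.c. It attempts the (O_WRONLY|O_CREAT|O_EXCL) open, reports the errno if that fails, and then stats the path to show the mode bits of whatever was left behind.

```c
/* excl_probe.c: hypothetical helper, not part of writef.c. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try an exclusive create of `path`. Returns 0 on success, otherwise
   the errno value from open(2). Either way, report the mode bits and
   size of whatever file exists at `path` afterwards. */
int probe_excl_create(const char *path) {
  struct stat st;
  int fd, err = 0;

  assert(path != NULL);

  fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0660);
  if (fd == -1) {
    err = errno;
    fprintf(stderr, "open: %s (errno %d)\n", strerror(err), err);
  } else {
    close(fd);
  }

  /* On the ZFS export, the report above suggests a mode of 0000 here
     after a failed open; that is an observation, not a guarantee. */
  if (stat(path, &st) == 0)
    printf("%s: mode %04o, size %lld\n", path,
           (unsigned)(st.st_mode & 07777), (long long)st.st_size);
  else
    printf("%s: no file left behind (%s)\n", path, strerror(errno));

  return err;
}
```

Calling it on a path that does not yet exist should return 0 on a healthy file system; calling it a second time on the same path should return EEXIST.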

Additional desired behavior: given the nature of networked file systems, I'm comfortable with the idea that file handles can become stale or otherwise unusable in case of network outage, server restarts, host reboots, etc. To the best of my knowledge, none of those is occurring during my experiments. So ISTM that writing to newly created files' handles ought to work when nothing else shows signs of failure.

I'd be happy to provide further information and/or run any further reproductions that might be helpful here. (I'll confess, I'm out of practice at capturing NFS network traffic. In case that's needed, please point me in the direction of the best way to get it nowadays.)

NFS server host information

In all cases, I'm running nfsd on a system with the following OS and zfs extension info:

System Software Overview:

  System Version:	macOS 10.15.7 (19H1323)
  Kernel Version:	Darwin 19.6.0
  Boot Volume:	Macintosh HD
  Boot Mode:	Normal
  Computer Name:	<redacted>
  User Name:	<redacted>
  Secure Virtual Memory:	Enabled
  System Integrity Protection:	Disabled
  Time since boot:	22:00
zfs:

  Version:	2.1.0
  Last Modified:	8/19/21, 9:28 PM
  Bundle ID:	org.openzfsonosx.zfs
  Notarized:	Yes
  Loaded:	Yes
  Obtained from:	Identified Developer
  Kind:	Intel
  Architectures:	x86_64
  64-Bit (Intel):	Yes
  Location:	/Library/Extensions/zfs.kext
  Kext Version:	2.1.0
  Load Address:	18446743521869111000
  Loadable:	Yes
  Dependencies:	Satisfied
  Signed by:	Developer ID Application: Joergen  Lundman (735AM5QEU3), Developer ID Certification Authority, Apple Root CA

NFSv3 clients

I've exercised the reproduction using the following NFSv3 clients:

2a. a different OSX system (running a very vanilla Catalina installation); that host's system info is:

System Software Overview:

  System Version:	macOS 10.15.7 (19H15)
  Kernel Version:	Darwin 19.6.0
  Boot Volume:	Macintosh HD - Data
  Boot Mode:	Normal
  Computer Name:	<redacted>
  User Name:	<redacted>
  Secure Virtual Memory:	Enabled
  System Integrity Protection:	Disabled
  Time since boot:	5 minutes

2b. a fairly vanilla FreeBSD 13.0 system running on another host on the local network. Here's some info about that system:

$ uname -a
FreeBSD <redacted> 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr  9 04:24:09 UTC 2021     [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

2c. a Linux system, running as a QEMU guest of the OSX running nfsd. Here's some info about that Linux system:

$ uname -a
Linux debian 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
$ dpkg -l | grep nfs
ii  libnfsidmap2:amd64               0.25-5.1                            amd64        NFS idmapping library
ii  nfs-common                       1:1.3.4-2.5+deb10u1                 amd64        NFS support files common to client and server

Software setup

3a. On the OSX NFS server, first create an APFS export, then an OpenZFS-on-OSX export. My local network is 10.0.0.0/24, and my normal unprivileged user account's uid on my network is 1001, so I'll set the uid of the exported directory to that uid.

# mkdir /var/lib/apfs-export
# echo '/private/var/lib/apfs-export -network 10.0.0.0 -mask 255.255.255.0' >> /etc/exports
# nfsd update
# chown 1001 /var/lib/apfs-export
# dd if=/dev/zero of=/var/lib/zfs.bin bs=$((1024*1024)) count=64
64+0 records in
64+0 records out
67108864 bytes transferred in 0.095889 secs (699859212 bytes/sec)
# zpool create nfs-test /var/lib/zfs.bin
# zfs create nfs-test/zfs-export
# echo '/Volumes/nfs-test/zfs-export -network 10.0.0.0 -mask 255.255.255.0' >> /etc/exports
# nfsd update
# chown 1001 /Volumes/nfs-test/zfs-export

3b. Setup on the NFS client. This appears to work equivalently across OSX, Linux, and FreeBSD.

# mkdir /tmp/apfs-mount /tmp/zfs-mount
# mount -t nfs <nfsd-host>:/private/var/lib/apfs-export /tmp/apfs-mount
# mount -t nfs <nfsd-host>:/Volumes/nfs-test/zfs-export /tmp/zfs-mount

File writing program

I tried to come up with a reproduction using only shell-level utilities, but the shell and utilities are inconsistent as to whether, when, and how honestly they report syscall errors (certain utilities produce error messages different from what syscalls set errno to, it seems). So here's a file writer. It takes a pathname and a string to write into the file; and its open(2) flags are configured by option switches. This can be compiled with cc -o writef writef.c.

/* writef.c: open a file for writing, write into it. Written to
   exercise unexpected behavior on NFS-mounted ZFS file systems. */

#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

char *progname; /* Shared among various output routines. */

extern int optind; /* For getopt. */ 

void report(char *msg) {
  printf("%s", msg); fflush(stdout);
}

void lose() {
  fprintf(stderr, "%s: %s\n", progname, strerror(errno));
  exit(1);
}

void usage() {
  fprintf(stderr, "usage: %s [-acet] PATH STRING\n", progname);
  exit(1);
}

int main (int argc, char **argv) {
  char *fn, *str;            /* Filename and text to write. */
  int fd, len;               /* File descriptor, string length. */
  int open_flags = O_WRONLY; /* 2nd open(2) argument */
  int ch;                    /* For getopt. */
  
  progname = strdup(basename(argv[0]));

  /* Turn switches into open_flags. */
  while  ((ch = getopt(argc, argv, "acet")) != -1) {
    switch (ch) {
    case 'a':
      open_flags |= O_APPEND;
      break;
    case 'c':
      open_flags |= O_CREAT;
      break;
    case 'e':
      open_flags |= O_EXCL;
      break;
    case 't':
      open_flags |= O_TRUNC;
      break;
    case '?':
    default:
      usage();
    }
  }
  argc -= optind;
  argv += optind;
  
  if (argc < 2)
    usage();
  
  fn = argv[0];
  str = argv[1];
  len = strlen(str);
  
  if ((fd = open(fn, open_flags, 0660)) == -1)
    lose();
  report("open okay\n");

  if (write(fd, str, len) < len)
    lose();
  /* For eyeball debugging. */
  if (write(fd, "\n", 1) < 1)
    lose();
  report("write okay\n");

  if (close(fd) == -1)
    lose();
  report("close okay\n");
  
  return 0;
}
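
writef only learns about a failed write when close(2) returns. NFS clients commonly buffer writes, so a server-side error may not surface until fsync(2) or close(2). To narrow down which step actually fails, a variant of the write path with an explicit fsync(2) might help. The following is a sketch under that assumption; write_and_sync is a hypothetical name and the step numbering is arbitrary.

```c
/* Hypothetical variant of the write path in writef.c. */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create (if needed) and write `str` to `path`, forcing the data out
   with fsync(2) before close(2). Returns 0 on success, or the number
   of the step that failed: 1 open, 2 write, 3 fsync, 4 close. */
int write_and_sync(const char *path, const char *str) {
  size_t len;
  ssize_t n;
  int fd;

  assert(path != NULL && str != NULL);
  len = strlen(str);

  if ((fd = open(path, O_WRONLY | O_CREAT, 0660)) == -1) {
    perror("open");
    return 1;
  }

  n = write(fd, str, len);
  if (n == -1 || (size_t)n < len) {
    if (n == -1)
      perror("write");
    else
      fprintf(stderr, "write: short write\n");
    close(fd);
    return 2;
  }

  /* Ask the client to push the data to the server now, so that a
     server-side error is reported here rather than at close(2). */
  if (fsync(fd) == -1) {
    perror("fsync");
    close(fd);
    return 3;
  }

  if (close(fd) == -1) {
    perror("close");
    return 4;
  }
  return 0;
}
```

If the "Permission denied" error moves from close to fsync on the ZFS mount, that would suggest the server is rejecting the WRITE or COMMIT rather than anything specific to closing the file.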

Driver for writef

This tests a handful of open(2) flag combinations. It expects the writef binary to be in the current working directory. Note that in all cases, the function testit first ensures that there's no file at the specified path, so the first open(2) in each test should always be trying to create a new file. If this script is saved as run-test.sh beside writef, then it can be run as, e.g., sh ./run-test.sh /tmp/apfs-mount/test.out and sh ./run-test.sh /tmp/zfs-mount/test.out

#!/bin/sh

set -u

path="$1"

# This takes one or more triples as arguments. Each triple consists of
# an option flag for writef (q.v.), the expected exit status from
# writef given the flag and expected file system state, and an error
# message to write if writef didn't exit with the expected status.
testit() (
    set +v # see 'set -v' below
    ! test -e "$path" || { echo "$path exists" > /dev/stderr; exit 1; }
    while test $# -gt 0; do
	./writef "$1" "$path" "$(date)"
	ret=$?
	# Show numeric userids in order to avoid any ambiguity about
	# UIDs on the server and the client.
	ls -ln "$path"
	# For "eyeball" verification that the file contains what's
	# expected.
	cat "$path"
	status=0;
	if test $ret -ne $2; then
	    echo "$3" > /dev/stderr
	    status=1
	    break
	fi
	# In order to see the effect of truncating writes, sleep a
	# second between loops
	sleep 1
	shift 3
    done
    rm -f "$path"
    exit $status
)

# Show what we're doing as we go.
set -v

# (0) Show our user id, and show the userid of the target directory.
id -u
ls -lnd "$(dirname "$path")"

# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"

# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"

# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c  0 "(O_WRONLY|O_CREAT) failed" \
       -ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"

# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -t 0 "(O_TRUNC) on an existing file failed"

# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -a 0 "(O_APPEND) on an existing file failed"

Expected Results

These are the results when running on an NFS-mounted APFS export. These are the expected results of the test script, including the error message in case (3) (i.e., attempting to open an existing file with O_CREAT|O_EXCL).

# (0) Show our user id, and show the userid of the target directory.
id -u
1001
ls -lnd "$(dirname "$path")"
dirname "$path"
drwxr-xr-x  2 1001  0  64 Sep 29 11:02 /tmp/apfs-mount

# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:24 EDT 2021

# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:27 EDT 2021

# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c  0 "(O_WRONLY|O_CREAT) failed" \
       -ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:28 EDT 2021
writef: File exists
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:28 EDT 2021

# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -t 0 "(O_TRUNC) on an existing file failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:31 EDT 2021
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:32 EDT 2021

# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -a 0 "(O_APPEND) on an existing file failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:33 EDT 2021
open okay
write okay
close okay
-rw-r-----  1 1001  0  58 Sep 29 11:30 /tmp/apfs-mount/test.out
Wed Sep 29 11:30:33 EDT 2021
Wed Sep 29 11:30:34 EDT 2021

Results when running on an NFS-mounted ZFS export

The following are the results when the NFSv3 client is an OSX host. Case (2) is the unexpected result.

# (0) Show our user id, and show the userid of the target directory.
id -u
1001
ls -lnd "$(dirname "$path")"
dirname "$path"
drwxr-xr-x  3 1001  0  5 Sep 29 11:04 /tmp/zfs-mount

# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:48 EDT 2021

# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
writef: Input/output error
----------  1 1001  0  0 Sep 29 11:30 /tmp/zfs-mount/test.out
cat: /tmp/zfs-mount/test.out: Permission denied
(O_WRONLY|O_CREAT|O_EXCL) failed

# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c  0 "(O_WRONLY|O_CREAT) failed" \
       -ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:50 EDT 2021
writef: File exists
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:50 EDT 2021

# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -t 0 "(O_TRUNC) on an existing file failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:54 EDT 2021
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:55 EDT 2021

# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -a 0 "(O_APPEND) on an existing file failed"
open okay
write okay
close okay
-rw-r-----  1 1001  0  29 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:56 EDT 2021
open okay
write okay
close okay
-rw-r-----  1 1001  0  58 Sep 29 11:30 /tmp/zfs-mount/test.out
Wed Sep 29 11:30:56 EDT 2021
Wed Sep 29 11:30:57 EDT 2021

The following are the results when the NFSv3 client is a FreeBSD host; the results on a Linux host are identical (modulo timestamps, of course).

Note that case (2) shows the same unexpected behavior as for an OSX NFSv3 client; cases (1) and (3) through (5) all demonstrate failure to write to the newly created file, each failing at the initial (O_WRONLY|O_CREAT) step. (I observe that in these cases, even after the close(2) in writef fails, the file's size shows up as 29 bytes from ls. I'm not sure if that size is coming from the NFS server or the NFS client, so the number might be a red herring.)

# (0) Show our user id, and show the userid of the target directory.
id -u
1001
ls -ld "$(dirname "$path")"
drwxr-xr-x  3 kreuter  wheel  5 Sep 29 11:47 /tmp/zfs-mount

# (1) Create the file with (O_WRONLY|O_CREAT).
testit -c 0 "(O_WRONLY|O_CREAT) failed"
open okay
write okay
writef: Permission denied
-rw-r-----  1 1001  0  29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed

# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
writef: Input/output error
----------  1 1001  0  0 Sep 29 11:50 /tmp/zfs-mount/test.out
cat: /tmp/zfs-mount/test.out: Permission denied
(O_WRONLY|O_CREAT|O_EXCL) failed

# (3) Create the file, then check if (O_CREAT|O_EXCL) prevents opening
# it a second time. Note that the (O_CREAT|O_EXCL) case is supposed to
# exit non-zero.
testit -c  0 "(O_WRONLY|O_CREAT) failed" \
       -ce 1 "(O_CREAT|O_EXCL) on an existing file opened something"
open okay
write okay
writef: Permission denied
-rw-r-----  1 1001  0  29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed

# (4) Create the file, then try to truncate it.
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -t 0 "(O_TRUNC) on an existing file failed"
open okay
write okay
writef: Permission denied
-rw-r-----  1 1001  0  29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed

# (5) Create the file, then try to append to it
testit -c 0 "(O_WRONLY|O_CREAT) failed" \
       -a 0 "(O_APPEND) on an existing file failed"
open okay
write okay
writef: Permission denied
-rw-r-----  1 1001  0  29 Sep 29 11:50 /tmp/zfs-mount/test.out
(O_WRONLY|O_CREAT) failed
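
In the FreeBSD results above, ls reports 29 bytes while cat prints nothing. One way to make that discrepancy explicit is to compare the size reported by stat(2) with the number of bytes that can actually be read back. The following is a rough sketch; readable_bytes is a hypothetical helper. Running it on the client does not by itself tell whether the 29 comes from the client's attribute cache or from the server, so running it on the server against the /Volumes/nfs-test/zfs-export path as well would probably be more informative.

```c
/* Hypothetical helper for comparing st_size with readable data. */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Store the size reported by stat(2) in *reported and return the
   number of bytes read(2) delivers before EOF, or -1 on error. */
long long readable_bytes(const char *path, long long *reported) {
  struct stat st;
  char buf[4096];
  long long total = 0;
  ssize_t n;
  int fd;

  assert(path != NULL && reported != NULL);

  if (stat(path, &st) == -1) {
    perror("stat");
    return -1;
  }
  *reported = (long long)st.st_size;

  if ((fd = open(path, O_RDONLY)) == -1) {
    perror("open");
    return -1;
  }
  while ((n = read(fd, buf, sizeof buf)) > 0)
    total += n;
  if (n == -1) {
    perror("read");
    close(fd);
    return -1;
  }
  close(fd);
  return total;
}
```

On a healthy file the two numbers agree; a reported size of 29 with 0 readable bytes would match what ls and cat show above.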
