Commits · cbb52bb2f0141802bd9a056ae220dacb849e96f3 · Verlässliche Systemsoftware / projects / osv

Oct 13, 2013
- zfs: disable jail ioctls · cbb52bb2
  Avi Kivity authored 11 years ago
  
  Not supported under osv. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  cbb52bb2
- bsd: implement CTASSERT() · 1ad0a6c2
  Avi Kivity authored 11 years ago
  
  Wanted by zfs_ioctl.c. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  1ad0a6c2
- zfs: prepare zfs_ioctl for inclusion in the build · 052123a1
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  052123a1
- libzpool: build adjustments · e23ab6be
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  e23ab6be
- build: build libzpool · 6ca2acc4
  Avi Kivity authored 11 years ago
  
  Make it part of libzfs, we don't need it separately anyway. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  6ca2acc4
- cpuset: add missing #include · 930d471a
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  930d471a
- netport: allow writing to 'physmem' · c24d2b81
  Avi Kivity authored 11 years ago
  
  Strangely, it still works even though it's not initialized. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  c24d2b81
- build: wire up libzfs · 7b006413
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  7b006413
- param.h: include <limits.h> · 71ec36ff
  Avi Kivity authored 11 years ago
  
  Fixes MAXPATHLEN. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  71ec36ff
- mount: add MS_FORCE define · b8cfd3ce
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  b8cfd3ce
- netport: add some more module stubs · 6206564e
  Avi Kivity authored 11 years ago
  
  Needed by libzfs. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  6206564e
- libzfs: add missing includes · 15f4d1aa
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  15f4d1aa
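
The `bsd: implement CTASSERT()` commit above (1ad0a6c2) does not show its implementation. As a hedged illustration only, a compile-time assertion of that kind is commonly built from a typedef whose array size goes negative when the condition is false. The names `MY_CTASSERT`, `CT_CONCAT`, and `demo_header` below are hypothetical and not taken from the OSv tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compile-time assertion (not the OSv source): a false
 * condition yields an array of size -1, which the compiler rejects. */
#define CT_CONCAT_(a, b) a##b
#define CT_CONCAT(a, b) CT_CONCAT_(a, b)
#define MY_CTASSERT(cond) \
    typedef char CT_CONCAT(my_ctassert_line_, __LINE__)[(cond) ? 1 : -1]

/* Illustrative structure whose layout we want to pin down. */
struct demo_header {
    uint32_t magic;
    uint32_t length;
};

/* Checked at compile time; changing the struct breaks the build. */
MY_CTASSERT(sizeof(struct demo_header) == 8);

/* Runtime view of the same size, handy for a sanity check. */
unsigned demo_header_size(void)
{
    return (unsigned)sizeof(struct demo_header);
}
```

Changing the condition to something false, such as `sizeof(struct demo_header) == 4`, should make compilation fail at that line rather than at run time.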
Oct 10, 2013

build: define _KERNEL everywhere · 95ce17e3

Avi Kivity authored 11 years ago

We have _KERNEL defines scattered throughout the code, which makes
understanding it difficult.

Define it just once, and adjust the source to build.

We define it in an overridable variable, so that non-kernel imported code
can undo it.

95ce17e3

bsd: import libumem · 0228c7cc

Avi Kivity authored 11 years ago


Imported from FreeBSD 245655, no changes.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

0228c7cc

bsd: random imports · 5fca31d5

Avi Kivity authored 11 years ago


Import FreeBSD files, changeset 245655.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

5fca31d5

bsd: import libuutil · f3d263d1

Avi Kivity authored 11 years ago


Imported with no change from FreeBSD 245655.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

f3d263d1

zfs: import zpool and zfs command-line utilities · 90f19a2a

Avi Kivity authored 11 years ago


Imported with no change from FreeBSD 245655.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

90f19a2a

Oct 03, 2013

sys/socket.h: rename bsd_sockaddr using functions into bsd_* · fec150f3
Benoît Canet authored 11 years ago

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

fec150f3

net/if.h: reconcile osv and bsd headers · 341373c8

Benoît Canet authored 11 years ago


Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

341373c8

bsd: rename osv ticks macro to bsd_ticks to avoid conflict with boost ticks usage · 77a4dd1b
Benoît Canet authored 11 years ago

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

77a4dd1b

bsd: rename ifconf · 877771dc

Avi Kivity authored 11 years ago


bsd's ifconf conflicts with osv's; rename it.

We use the bsd version in <osv/ioctl.h>, since we currently don't support
the Linux-ABI variants of these ioctls.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>

877771dc

bsd: rename ifaddr · 53cf9be0

Avi Kivity authored 11 years ago


The bsd ifaddr struct conflicts with the osv ifaddr struct, which is a
public interface.  Rename the bsd struct to avoid conflict.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>

53cf9be0

arpa/inet.h: reconcile osv and bsd headers · defedb39

Avi Kivity authored 11 years ago


Work around a byteorder function conflict, and reconcile a declaration.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>

defedb39

netinet/in.h: reconcile osv and bsd headers · 6a87d1a3

Avi Kivity authored 11 years ago


Some structures are duplicated; move the duplicates to a common header
<netinet/__in.h>.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>

6a87d1a3

sys/socket.h: reconcile bsd and osv variants · 9798c432

Avi Kivity authored 11 years ago


Some structures are duplicated; deduplicate them.

A few are source-compatible but not binary-compatible; use the ones from
<bits/socket.h>.

Others are both source- and binary- compatible; put them in a new header
<sys/__socket.h> which is included from both.

Work around a problem with the byteorder functions/macros.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>

9798c432

Sep 26, 2013

zfs: Update ->va_nlink in zfs_getattr() · 95f77770

Raphael S. Carvalho authored 11 years ago

Update ->va_nlink in zfs_getattr() in preparation for sys_link().

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

95f77770

zfs: Wire up VOP_LINK · 5caf1af3

Raphael S. Carvalho authored 11 years ago


Wire up the VOP_LINK vnode operation for ZFS in preparation for
sys_link().

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
[ penberg: drop FIGNORE, cleanup, split ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

5caf1af3

Sep 20, 2013

pcpu: register cpu id in our glue code · e1d1c009

Glauber Costa authored 11 years ago


All the other fields of the pcpu structure that BSD expects are initialized by
the event channel, except for the cpu id, which the code expects to be already
initialized.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

e1d1c009

Sep 17, 2013
- Remove magic number from msleep · 9826d074
  Or Cohen authored 11 years ago
  
  9826d074
Sep 16, 2013

XenStore: Support for devices without virtual-device property · 865308e8
Dmitry Fleytman authored 11 years ago

865308e8

routing: provide a valid MTU in route messages · 295d80ff

Glauber Costa authored 11 years ago

Right now we send route messages with MTUs zeroed out. This can lead
to the following assert in ip_output.c (~line 308) triggering:

    KASSERT(mtu > 0, ("%s: mtu %d <= 0, rte=%p (rt_flags=0x%08x) ifp=%p",
        __func__, mtu, rte, (rte != NULL) ? rte->rt_flags : 0, ifp));

This happens because the code assumes that a valid route always has a valid
MTU, and in that case it always uses the route MTU instead of the interface
one.

When we allocate the route it has a valid MTU. But when we send the route
message, we will overwrite it with the value we see in the route message.  This
is done in rtsock.c:rt_setmetrics.

With this patch, the assertion stops triggering.

A note: this has not been seen in local installations, only on EC2. Looking at
it, there is nothing Xen-specific. The reason it does not happen locally is
that local traffic does not go through the default route, but rather through
the local 192.168.100.0/24 route. That one seems to take a different
configuration path, and thus sets the MTU correctly.

295d80ff
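
The failure mode described in 295d80ff can be modelled in a few lines. This is a toy sketch, not the FreeBSD code: every `toy_` type and function is hypothetical, and filling a zero MTU from the interface is only one plausible way to "provide a valid MTU", assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the interface, route, and route message (hypothetical). */
struct toy_ifnet { int if_mtu; };
struct toy_route { int rt_mtu; };
struct toy_rtmsg { int rmx_mtu; };

/* Models the overwrite: the route takes whatever MTU the message carries. */
void toy_set_metrics(struct toy_route *rt, const struct toy_rtmsg *msg)
{
    rt->rt_mtu = msg->rmx_mtu;
}

/* Output path: if a route exists, its MTU is trusted over the interface's. */
int toy_output_mtu(const struct toy_route *rt, const struct toy_ifnet *ifp)
{
    assert(ifp != NULL);
    return (rt != NULL) ? rt->rt_mtu : ifp->if_mtu;
}

/* Assumed fix: never send a message with a zeroed MTU. */
void toy_fill_msg_mtu(struct toy_rtmsg *msg, const struct toy_ifnet *ifp)
{
    if (msg->rmx_mtu == 0)
        msg->rmx_mtu = ifp->if_mtu;
}
```

With a zeroed message, `toy_output_mtu()` returns 0, which is the condition the KASSERT guards against; after `toy_fill_msg_mtu()` the route keeps the interface MTU.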

Sep 15, 2013

Add copyright statements in bsd/ · fe9e6a82

Nadav Har'El authored 11 years ago

Added our copyright statements to some of the files in the top bsd/
directory, and in bsd/porting.

I only added our copyright to files that were written entirely by us. I did
not attempt to hunt down which bsd or solaris files we modified in order to
add our copyright to them; I don't think this is important (or we can do this
later).

I also found one header file (uma_stub.h) that had large chunks copied from
freebsd, so I added both the freebsd copyright and ours.

fe9e6a82

trivial: small fixes to blkfront · 21285383

Glauber Costa authored 11 years ago

* %lu => %u when reading flush. The barrier flag had the same bug, but I
ended up recreating it for flush.
* Move check of xb_flags to after we check sc->flags != NULL, as spotted by
Dima

21285383

Sep 14, 2013

xenfront: guest-side implementation of flush · 6842d8ce

Glauber Costa authored 11 years ago

Not all Xen versions implement the barrier feature in their pv disks. Such is
the case in Amazon EC2. For those, we should interpret the flush request and
wait until all outstanding requests are handled. Also, we can still try to use
either barriers or flush-disk operations before we resort to the guest-side
software implementation. Our driver currently only tests for one of them, so
this patch also implements flush requests.

Luckily, the implementation of xb_dump (which we don't currently use) needs to
do that as well, and this is also quite well isolated in xb_quiesce(). All we need
to do is call xb_quiesce() if flushing is not available in our backend.

6842d8ce
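
The fallback described in 6842d8ce can be sketched as a small selection function. This is an illustration under assumptions: the `toy_` names are hypothetical, and the preference of barriers over flush-disk is a guess, since the commit message does not state an order between the two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How a flush request would be carried out (hypothetical enum). */
enum toy_flush_method {
    TOY_USE_BARRIER,  /* backend supports barriers */
    TOY_USE_FLUSH,    /* backend supports flush-disk operations */
    TOY_USE_QUIESCE   /* guest-side: wait for outstanding requests to drain */
};

/* Features advertised by the backend (hypothetical struct). */
struct toy_backend {
    bool has_barrier;
    bool has_flush;
};

/* Prefer a backend-provided mechanism; fall back to quiescing in the guest. */
enum toy_flush_method toy_pick_flush_method(const struct toy_backend *be)
{
    assert(be != NULL);
    if (be->has_barrier)
        return TOY_USE_BARRIER;
    if (be->has_flush)
        return TOY_USE_FLUSH;
    return TOY_USE_QUIESCE;
}
```

A backend with neither feature, as the commit reports for Amazon EC2 barriers, ends up on the `TOY_USE_QUIESCE` path, which corresponds to calling xb_quiesce() in the driver.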

Do not overwrite the buffer on writes. · 0e62d585

Glauber Costa authored 11 years ago

Even Lords make brown paper bag mistakes. This is leftover code from my
initial testing, where the buffers were set with pre-existing values to make
sure they were going through. I forgot to remove it. As a result reads were
fine, but writes would just wipe the previous data from the buffer.
Incidentally, the "write-then-read-the-data-back" test I was doing would also
obviously pass, so I hadn't noticed this so far.

Fix is to just leave the buffer alone.

0e62d585

Change "hz" to fix poll() premature timeout · 26a30376

Nadav Har'El authored 11 years ago

msleep() measures time in units of 1/hz seconds. We had hz = 1,000,000,
which gives excellent resolution (microsecond) but a terrible range
(limits msleep()'s timeout to 35 minutes).

We had a program (Cassandra) doing poll() with a timeout of 2 hours,
which caused msleep to think we gave a negative timeout.

This patch reduces hz to 1,000, i.e., makes msleep() operate in the same units
as poll(). Looking at the code, I don't believe this change will have any
ill effects: we don't need higher resolution (freebsd code is used to
hz=1,000, which is the default there), and the code converts time units to
ticks correctly, always using the hz macro. The allowed range for timeouts will
grow to over 24 days and match poll()'s allowed range.

26a30376
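
The ranges quoted in 26a30376 are consistent with a tick count held in a signed 32-bit integer. That storage width is an assumption inferred from the 35-minute and 24-day figures, not something the commit states. Under that assumption the arithmetic can be checked with a short sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Longest timeout in whole seconds, assuming ticks fit in a signed 32-bit int. */
int64_t max_timeout_seconds(int64_t hz)
{
    assert(hz > 0);
    return (int64_t)INT32_MAX / hz;
}

/* True if a timeout of `seconds` can be expressed in ticks without overflow. */
bool timeout_fits(int64_t seconds, int64_t hz)
{
    assert(hz > 0 && seconds >= 0);
    return seconds <= max_timeout_seconds(hz);
}
```

With hz = 1,000,000 the limit is 2,147 seconds (about 35 minutes), so a 2-hour poll() timeout of 7,200 seconds does not fit. With hz = 1,000 the limit is 2,147,483 seconds (a little over 24 days), and the same timeout fits.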

Sep 12, 2013

Support for Xen w/o vector callbacks · 1d3e336c

Dmitry Fleytman authored 11 years ago

This patch implements GSI interrupt support for Xen bus.
Needed in Xen environments w/o vector callbacks for HVM.
One example of such an environment is Amazon EC2.

1d3e336c

Sep 05, 2013

blkfront: mark device ready earlier · 7b0354b9

Glauber Costa authored 11 years ago

We cannot read the partition table from the device if the device is not marked
as ready, since all IO will stall. I believe it should be fine to just mark the
device ready before we mark our state as connected. With that change, it all
proceeds normally.

7b0354b9

Aug 28, 2013

mbufs: use an entire page for jumbop zone allocations · 0d466fab

Glauber Costa authored 11 years ago

Xen has hard requirements on page transfers and on how to feed the grant tables.
The addresses need to be page-aligned, since pfns and not addresses are used,
and we need to provide at least a full page per buffer, since the hypervisor is
free to fill any data within the page.

To achieve that, the netfront driver will use m_cljget to attach an extended
buffer to the mbuf, from the jumbop zone, since they are page-sized. However,
two problems arise from this:

1) Allocating a page goes through malloc_large. Our implementation of malloc_large
is currently terribly inefficient, and that creates a very heavy contention site.

What I am doing with this patch is to switch our uma implementation to
alloc_page / free_page instead of malloc if the caller of zcreate so specified
(and then of course, specify it for the jumbop cache)

2) The refcount that is attached at the end of the buffer would either extend the
buffer to 4100 bytes, defeating our purpose, or the buffer would have to be
PAGE_SIZE - 4 to accommodate the refcount. But since the hypervisor will write
to the whole page, it will eventually overwrite the refcount.

To address that, I am allocating an external reference counter. BSD already
has some infrastructure to do that, and I am taking advantage of it.
However, I have found no way of implementing this such that the
reference count can be easily deduced from the address of the extended
buffer, without having the supporting mbuf to start from. Any external data
structure such as a hash would probably make freeing way too slow. Thankfully,
uma_find_refcnt and UMA_ZONE_REFCNT seem to be used mostly in the
setup/destruction phase (the mbuf refcount is used directly, open coded). So my
proposal here is to remove UMA_ZONE_REFCNT for that zone.

0d466fab
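
The layout argued for in 0d466fab, a full page-aligned page per buffer with the reference count kept outside the page, can be sketched in portable C. This is not the uma or mbuf code: the `toy_` names are hypothetical, and `aligned_alloc` (C11) stands in for the alloc_page / free_page calls mentioned in the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

/* A page-sized buffer whose refcount is stored outside the page, so a
 * writer that fills the whole page cannot clobber the counter. */
struct toy_page_buf {
    void *page;        /* page-aligned, all TOY_PAGE_SIZE bytes usable */
    uint32_t *refcnt;  /* external reference counter */
};

/* Returns 0 on success, -1 on allocation failure. */
int toy_page_buf_alloc(struct toy_page_buf *b)
{
    assert(b != NULL);
    b->page = aligned_alloc(TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    b->refcnt = malloc(sizeof(*b->refcnt));
    if (b->page == NULL || b->refcnt == NULL) {
        free(b->page);
        free(b->refcnt);
        b->page = NULL;
        b->refcnt = NULL;
        return -1;
    }
    *b->refcnt = 1;
    return 0;
}

/* Drops one reference; frees the page and the counter when it reaches zero. */
void toy_page_buf_release(struct toy_page_buf *b)
{
    assert(b != NULL && b->refcnt != NULL && *b->refcnt > 0);
    if (--*b->refcnt == 0) {
        free(b->page);
        free(b->refcnt);
        b->page = NULL;
        b->refcnt = NULL;
    }
}
```

Note that the counter here is reached through the owning structure, mirroring the commit's point that the refcount is found via the supporting mbuf rather than deduced from the buffer address.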

Aug 26, 2013

zfs: Fix GPF in zfs_rmnode() · 3d3c65b3

Pekka Enberg authored 11 years ago

If a crashed OSv guest is restarted, ZFS mount causes a GPF in early
startup:

  VFS: mounting zfs at /usr
  zfs: mounting osv/usr from device /dev/vblk1
  Aborted

GDB backtrace points finger at zfs_rmnode():

  #0  processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
  #1  0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
  #2  0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
  #3  0x000000000021cda2 in abort () at ../../runtime.cc:86
  #4  0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
  #5  0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
  #6  0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
  #7  <signal handler called>
  #8  0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
  #9  0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
  #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
  #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
  #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr")
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
  #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
  #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb)
      at ../../fs/vfs/vfs_mount.c:171
  #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
  #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
  #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
  #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...)
      at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
  #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038)
      at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
  #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
  #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
  #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101

The problem is that ZFS tries to check if the znode is an attribute
directory and trips over zp->z_vnode being NULL.  However, as explained
in commit b7ee91ef ("zfs: port vop_lookup"), we don't even support
extended attributes, so drop the check completely for OSv.

3d3c65b3