This is supposed to be a dtrace_id_t, which is a uint32_t, while id_t is
a uint64_t. sdt.h avoids depending on dtrace.h so we can't use
dtrace_id_t directly.
Bump __FreeBSD_version since the layout of structures in the SDT probe
linker set has changed.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
There is a pool feature, dynamic_gang_header, that is enabled by default
in new pools. When this feature is active, gang headers may be larger
than 512 bytes. The loader needs to be taught to cope with that.
Try using the vdev ashift to pick the gang block header size. If the
checksum fails, fall back to the old gang block header size.
This is based on a patch by Paul Dagnelie, with testing, bug-fixing and
some simplifications from me.
PR: 289690
Co-authored by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D53578
LINKER_EACH_FUNCTION_NAMEVAL() stops processing the symbol table if a
callback function returns a non-zero value.
The fbt_provide_module_function() callback should not return 1 when
ignoring symbols. Instead, always return 0, as in dtrace/x86.
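The difference between the two return values can be sketched as follows. This is a standalone illustration, not the kernel linker code: each_symbol(), provide_buggy() and provide_fixed() are hypothetical names, and the "dtrace_" prefix check merely stands in for whatever rule causes a symbol to be ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: a non-zero return stops the iteration. */
typedef int (*sym_cb_t)(const char *name, void *arg);

/* Walk the symbol names, stopping at the first non-zero callback return. */
static int
each_symbol(const char *const *names, size_t n, sym_cb_t cb, void *arg)
{
	int error;

	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		error = cb(names[i], arg);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Buggy: returning 1 for an ignored symbol aborts the whole walk. */
static int
provide_buggy(const char *name, void *arg)
{
	if (strncmp(name, "dtrace_", 7) == 0)
		return (1);
	(*(int *)arg)++;
	return (0);
}

/* Fixed: skip the ignored symbol but keep iterating. */
static int
provide_fixed(const char *name, void *arg)
{
	if (strncmp(name, "dtrace_", 7) == 0)
		return (0);
	(*(int *)arg)++;
	return (0);
}
```

With the buggy callback, every symbol that follows the first ignored one is silently skipped, so no probes are created for them.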
Fixes: 30b68ecda8 ("Changes that improve DTrace FBT reliability on freebsd/arm64:")
Reviewed by: markj, oshogbo
Approved by: oshogbo (mentor)
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D53399
dtrace_xcall() is just a thin wrapper around smp_rendezvous_cpus().
There's no need for six identical implementations to live in MD layers.
No functional change intended.
MFC after: 2 weeks
printm is specific to the FreeBSD dtrace port. I believe it's
effectively the same as tracemem(), though printm apparently predates
it. It stores the size of the buffer of traced data inline. Currently
it represents that size using a uintptr_t, which isn't really right and
poses challenges when porting to CHERI because
`DTRACE_STORE(uintptr_t, ...` requires the destination to be suitably
aligned, but this isn't necessary since we're just storing a size.
Convert to using a size_t. This should be a no-op since
sizeof(uintptr_t) == sizeof(size_t) on non-CHERI platforms (and besides
that I don't see a reason to use printm() when tracemem() is available
and is simpler to use.)
Reviewed by: Domagoj Stolfa, avg
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D52055
Fix two problems with 6dd0803ffd. The first problem is that the case where
a newer label was read before a stale one was handled differently from the
reverse-order case. The second problem is that vdev_free() would free a
fully initialized leaf vdev that carried a stale label. When a vdev
carries a stale label but is still referenced by a different label with
a new configuration, we don't want to free it, but rather insert it into
the new configuration.
o Provide a helper function nvlist_find_vdev_guid() that checks for the
presence of a given GUID in a label.
o In the top-level vdev, store the GUID of the vdev used to instantiate it.
o Cover all possible cases in the block in vdev_probe() where we encounter
a known configuration. Make the diagnostic print more informative and
identical regardless of probe order. Make this whole block easier to
read by removing one level of indentation, at the price of a single
extra comparison at runtime.
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D51913
Fixes: 6dd0803ffd
Before this change, vdev_insert() would avoid inserting a duplicate
vdev into the list of children. However, the duplicate, although unlinked
from the parent, was still stored on the global list with an initialized
v_guid. Such a leaked duplicate could later be returned by vdev_find().
After 6dd0803ffd, such a leaked vdev may be freed or point to a freed
parent, which leads to a loader crash. Note that the leak was there
before 6dd0803ffd.
First, in vdev_insert(), free the conflicting vdev and return the existing
one. Update callers accordingly. Only one caller can actually encounter
this condition.
Second, eliminate the global list of vdevs and make vdev_find() work
recursively on a tree that the caller must provide. Of course, the chance
of a GUID collision between members of different pools is extremely low.
The main motivation here is to increase code robustness, fully isolate
the data structures of different pools being tasted by the loader, and
make bugs like the one being fixed easier to debug.
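A tree-scoped lookup of this kind can be sketched as below. The struct is a pared-down, hypothetical stand-in for the loader's vdev, and the vdev_find() signature shown is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down vdev: a GUID plus child and sibling links. */
struct vdev {
	uint64_t	 v_guid;
	struct vdev	*v_child;	/* first child */
	struct vdev	*v_sibling;	/* next child of the same parent */
};

/* Depth-first search for a GUID within the tree rooted at 'top'. */
static struct vdev *
vdev_find(struct vdev *top, uint64_t guid)
{
	struct vdev *kid, *found;

	assert(top != NULL);
	if (top->v_guid == guid)
		return (top);
	for (kid = top->v_child; kid != NULL; kid = kid->v_sibling) {
		found = vdev_find(kid, guid);
		if (found != NULL)
			return (found);
	}
	return (NULL);
}
```

Because the search starts from a caller-supplied root, a vdev with the same GUID in another pool's tree can never be returned.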
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D51912
Fixes: 6dd0803ffd
The shift used to calculate uberblock location depends both
on minimum size (UBERBLOCK_SHIFT) and MAX_UBERBLOCK_SHIFT.
Since makefs defaults to use ashift 12, it incidentally
does get the correct size, but ashift 9 does not work with
current code.
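The clamping can be sketched as follows. The constant values are quoted from memory of the OpenZFS headers and should be treated as assumptions, and the function names are hypothetical rather than taken from makefs.

```c
#include <assert.h>
#include <stdint.h>

/* Values as recalled from the OpenZFS headers; treat as assumptions. */
#define UBERBLOCK_SHIFT		10		/* minimum uberblock size: 1KB */
#define MAX_UBERBLOCK_SHIFT	13		/* maximum uberblock size: 8KB */
#define VDEV_UBERBLOCK_RING	(128 << 10)	/* 128KB uberblock ring */

/* Clamp the vdev ashift into [UBERBLOCK_SHIFT, MAX_UBERBLOCK_SHIFT]. */
static unsigned int
uberblock_shift(unsigned int ashift)
{
	if (ashift < UBERBLOCK_SHIFT)
		return (UBERBLOCK_SHIFT);
	if (ashift > MAX_UBERBLOCK_SHIFT)
		return (MAX_UBERBLOCK_SHIFT);
	return (ashift);
}

/* Number of uberblock slots in the ring for a given ashift. */
static unsigned int
uberblock_count(unsigned int ashift)
{
	return (VDEV_UBERBLOCK_RING >> uberblock_shift(ashift));
}

/* Byte offset of uberblock slot 'n' within the ring. */
static uint64_t
uberblock_offset(unsigned int ashift, unsigned int n)
{
	assert(n < uberblock_count(ashift));
	return ((uint64_t)n << uberblock_shift(ashift));
}
```

With ashift 12 the clamp is a no-op, which is why using the raw ashift happened to work. With ashift 9 the slots must still be 1KB apart.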
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D51860
Suppose a kernel module A defines an SDT provider and probes, and kernel
linker file B, dependent on A, contains tracepoints for those probes.
When sdt.ko is loaded, it iterates over all loaded KLDs to initialize
probe structures and register them with dtrace. In particular it uses
linker_file_foreach(), which is not sorted; in the above scenario, B may
be visited before A. Thus, it's possible for sdt_kld_load_probes() to
try to add tracepoints to an uninitialized SDT probe.
An example of the above arises when pfsync, pf, and sdt are loaded in
that exact order after commit 4bb3b36577.
Fix this by initializing probe structures in the first pass over loaded
KLDs. Then, the second pass can safely add tracepoints to any probe
structure.
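The two-pass scheme can be sketched in isolation as below. All of the types and function names here are hypothetical stand-ins. The real code operates on linker files and SDT linker sets rather than on these toy structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an SDT probe and a loaded linker file. */
struct probe {
	bool	initialized;
	int	ntracepoints;
};

struct kld {
	struct probe	*defines;	/* probe defined here, or NULL */
	struct probe	*traces;	/* probe traced from here, or NULL */
};

/* Pass 1: initialize the probe structures defined by a file. */
static void
load_probes(struct kld *k)
{
	if (k->defines != NULL) {
		k->defines->initialized = true;
		k->defines->ntracepoints = 0;
	}
}

/* Pass 2: attach tracepoints; every probe must be initialized by now. */
static void
load_tracepoints(struct kld *k)
{
	if (k->traces != NULL) {
		assert(k->traces->initialized);
		k->traces->ntracepoints++;
	}
}

/* Visit the files twice, in whatever order they happen to be listed. */
static void
load_all(struct kld *files, size_t n)
{
	for (size_t i = 0; i < n; i++)
		load_probes(&files[i]);
	for (size_t i = 0; i < n; i++)
		load_tracepoints(&files[i]);
}
```

Since no tracepoint is attached until every file has been visited once, the visiting order within each pass no longer matters.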
Note that the scenario where B and A are loaded after sdt.ko is already
handled properly, as there, the kld_load eventhandler is responsible for
registering probes with dtrace, and that eventhandler fires for
dependencies before it does for the dependent KLD. This presumes,
however, that there are no cycles in the dependency graph.
Reported by: jenkins
MFC after: 2 weeks
Before this change, the first probed member of a pool would initialize
the vdev tree for the pool. Now, imagine a machine with a disk that has
been removed from the pool, but whose ZFS label was not erased. That's a
typical scenario: a disk goes offline and is replaced with a spare, so no
further changes are written to the missing disk. Then the disk reappears
at boot time and is the first one to be probed by the loader. It has the
same pool GUID as all other members, so a naive loader would not see a
conflict. The disk would then be used as the source of truth to read the
bootenv.
To fix that, provide vdev_free(), which allows rolling back the already
built vdev tree so that a new one can be built from scratch. Upon
encountering a newer configuration for an already known top-level part of
a pool, call vdev_free() and let vdev_probe() build a new one.
The change has been tested with loader_lua and userboot.so, but it should
have the same effect on the legacy boot1 loader.
Reviewed by: tsoome, mav, imp
Differential Revision: https://reviews.freebsd.org/D51219
Switch to using sys/stdarg.h for va_list type and va_* builtins.
Make an attempt to insert the include in a sensible place. Where
style(9) was followed this is easy, where it was ignored, aim for the
first block of sys/*.h headers and don't get too fussy or try to fix
other style bugs.
Reviewed by: imp
Exp-run by: antoine (PR 286274)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
It's unused, and the naked strcpy() was susceptible to buffer overflow
if one creates, say, a probe called "profile-2000000000ns".
Reported by: CHERI
MFC after: 1 week
Sponsored by: Innovate UK
Otherwise there's nothing preventing reordering of memory accesses with
respect to these flags being set or cleared. Since their most common
use is to create a section in dtrace probe context where memory faults
are intercepted by dtrace_trap(), such reordering can turn dtrace errors
into fatal exceptions.
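The ordering requirement can be sketched in portable C11 as follows, assuming a compiler-level barrier is what is needed. The flag name, the cpu_flags variable and the helper functions are hypothetical, and atomic_signal_fence() stands in for whatever barrier primitive the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CPU_NOFAULT	0x0001	/* hypothetical "faults are intercepted" flag */

/*
 * volatile orders accesses to the flags word itself, but not ordinary
 * loads and stores around it; hence the explicit fences below.
 */
static volatile uint16_t cpu_flags;

static inline void
cpuflag_set(uint16_t flag)
{
	cpu_flags = (uint16_t)(cpu_flags | flag);
	/* Later accesses must not be hoisted above the flag store. */
	atomic_signal_fence(memory_order_seq_cst);
}

static inline void
cpuflag_clear(uint16_t flag)
{
	/* Earlier accesses must not sink below the flag store. */
	atomic_signal_fence(memory_order_seq_cst);
	cpu_flags = (uint16_t)(cpu_flags & ~flag);
}

/* Perform a load inside the section bracketed by the flag. */
static uint64_t
guarded_load(const uint64_t *addr)
{
	uint64_t val;

	assert(addr != NULL);
	cpuflag_set(CPU_NOFAULT);
	val = *addr;
	cpuflag_clear(CPU_NOFAULT);
	return (val);
}
```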
MFC after: 2 weeks
Sponsored by: Innovate UK
This routine returns a monotonic count of the number of nanoseconds elapsed
since the previous call. On arm64 it uses the generic system timer. The
implementation multiplies the counter value by 10**9 then divides by the counter
frequency, but this multiplication can overflow. This can result in trace
records with non-monotonic timestamps, which breaks libdtrace's temporal
ordering algorithm.
An easy fix is to reverse the order of operations, since the counter frequency
will in general be smaller than 10**9. (In fact, it's mandated to be 1GHz in
ARMv9, which makes life simple.) However, this can give a fair bit of error.
Adopt the calculation used on amd64, with tweaks to handle frequencies as low as
1MHz: the ARM generic timer documentation suggests that ARMv8 timers are
typically in the 1MHz-50MHz range, which is true on arm64 systems that I have
access to.
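The overflow itself is easy to reproduce in userspace. The sketch below contrasts the naive conversion with a simple quotient/remainder split. Note that this split is not the scaled-multiply calculation adopted from amd64, and it costs extra divisions. It is shown only to make the wraparound concrete, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC	1000000000ULL

/* Naive conversion: count * NANOSEC wraps once count exceeds 2^64 / 10^9. */
static uint64_t
ticks_to_ns_naive(uint64_t count, uint64_t freq)
{
	assert(freq != 0);
	return (count * NANOSEC / freq);
}

/*
 * Split the count into whole seconds and a sub-second remainder.  The
 * remainder is below freq, so rem * NANOSEC cannot wrap for any freq
 * below roughly 18GHz.
 */
static uint64_t
ticks_to_ns_split(uint64_t count, uint64_t freq)
{
	uint64_t secs, rem;

	assert(freq != 0);
	secs = count / freq;
	rem = count % freq;
	return (secs * NANOSEC + rem * NANOSEC / freq);
}
```

At 24MHz the naive version wraps after roughly 768 seconds of uptime, and at 1GHz after about 18 seconds.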
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D49244
The change is clearly wrong as it removes a dereference of the pointer
into the stack. Just revert for now.
This reverts commit 70c0670622.
Reported by: jrtc27
On arm64, the FBT provider treats tail calls as return probes. Ignoring
the question of whether this is really correct, the implementation is
wrong: instr is a pointer to uint32_t, so the removed multiplication by
the instruction size is wrong. As a result, FBT would create return
probes for intra-function branches.
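The double scaling can be demonstrated with plain pointer arithmetic. The function names below are hypothetical, and the offset is assumed to be expressed in instructions rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_SIZE	sizeof(uint32_t)	/* arm64 instructions are 4 bytes */

/* Correct: arithmetic on a uint32_t pointer already scales by 4 bytes. */
static const uint32_t *
branch_target(const uint32_t *instr, ptrdiff_t insn_offset)
{
	assert(instr != NULL);
	return (instr + insn_offset);
}

/* Buggy: scaling by the instruction size a second time overshoots. */
static const uint32_t *
branch_target_buggy(const uint32_t *instr, ptrdiff_t insn_offset)
{
	assert(instr != NULL);
	return (instr + insn_offset * (ptrdiff_t)INSN_SIZE);
}
```

An overshooting target can land outside the function's bounds, which is how an intra-function branch comes to look like a tail call.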
MFC after: 2 weeks
Sponsored by: Innovate UK
The use of memcpy here is redundant, and also incorrect since memcpy()
might be instrumented by fbt or kinst. dtrace_bcopy() exists, but we
don't need it.
MFC after: 2 weeks
Sponsored by: Innovate UK
It serves no purpose after commit 82283cad12. No functional change
intended.
Fixes: 82283cad12 ("dtrace: Avoid including dtrace_isa.c directly into dtrace.c")
MFC after: 2 weeks
This eases porting of DTrace to CHERI, where uintptr_t and size_t aren't
interchangeable.
No functional change intended.
Reviewed by: Domagoj Stolfa <domagoj.stolfa@gmail.com>
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D48625
This type is used only to store PC values corresponding to a thread
stack trace, so a pointer type is not quite right. Switch to
vm_offset_t, as in struct stack, to simplify a port of DTrace to CHERI.
No functional change intended.
MFC after: 1 week
Sponsored by: Innovate UK
- Don't allow FBT and kinst to instrument the KMSAN runtime.
- When fetching data from the traced thread's stack, mark it as
initialized. It may well be uninitialized, but as dtrace permits
arbitrary inspection of kernel memory, it isn't very useful to raise
KMSAN reports.
- Mark data copied in from userspace as initialized, as we do for
copyin() etc. using interceptors.
MFC after: 2 weeks
We were previously allocating MAXCPU structures for several purposes,
but this is generally unnecessary and is quite excessive, especially
after MAXCPU was bumped to 1024 on amd64 and arm64. We already are
careful to allocate only as many per-CPU tracing buffers as are needed;
extend this to other allocations.
For example, in a 2-vCPU VM, the size of a consumer state structure
drops from 64KB to 128B. The size of the per-consumer `dts_buffer` and
`dts_aggbuffer` arrays shrink similarly. Ditto for pre-allocations of
local and global D variable storage space.
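The arithmetic behind those figures can be sketched as below. The 64-byte per-CPU element size is inferred from the numbers above rather than taken from the source, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXCPU	1024	/* compile-time maximum on amd64 and arm64 */

/* Hypothetical per-CPU state; 64 bytes is inferred from the figures above. */
struct percpu_state {
	unsigned char	pad[64];
};

/* Bytes needed for an array with one element per CPU. */
static size_t
percpu_bytes(size_t ncpus)
{
	assert(ncpus > 0 && ncpus <= MAXCPU);
	return (ncpus * sizeof(struct percpu_state));
}

/* Allocate zeroed state for the CPUs actually present. */
static struct percpu_state *
percpu_alloc(size_t ncpus)
{
	assert(ncpus > 0 && ncpus <= MAXCPU);
	return (calloc(ncpus, sizeof(struct percpu_state)));
}
```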
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47667
Previously only providers in kernel modules were removed, leaving
dangling pointers to tracepoints, etc. in unloaded kernel modules.
PR: 281825
Reported by: Sony Arpita Das <sonyarpitad@chelsio.com>
Reviewed by: markj
Fixes: ddf0ed09bd sdt: Implement SDT probes using hot-patching
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D46890
It is not needed after commit 7e80fd5ef397. No functional change
intended.
Reviewed by: avg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46675
FBT refuses to create probes in modules which depend on dtrace(all), but
dtrace_test is a convenient place to add functions specifically for
testing dtrace.
The dependency on dtraceall is not needed, so just remove it. In fact,
it can be useful to test SDT probe creation by loading dtrace_test with
and without dtraceall loaded.
Reviewed by: avg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46673
dtrace_getarg() previously walked the call stack looking for a frame
matching the dtrace_invop_callsite symbol, in order to look for a
trapframe corresponding to an invop (i.e., FBT or kinst) probe. Commit
3ba8e9dc4a broke this in some cases by breaking the expected alignment
of the dtrace_invop_callsite symbol.
Rather than groveling around the stack to find invop probe arguments,
simply use the trapframe reference saved by dtrace_invop(). This is
simpler and less fragile.
Reported by: avg
Reviewed by: avg
MFC after: 2 weeks
Fixes: 3ba8e9dc4a ("dtrace/amd64: Implement emulation of call instructions")
Differential Revision: https://reviews.freebsd.org/D46672
Notable upstream pull request merges:
#15892 -multiple Fast Dedup: Introduce the FDT on-disk format and feature flag
#15893 -multiple Fast Dedup: “flat” DDT entry format
#15895 -multiple Fast Dedup: FDT-log feature
#16239 6be8bf555 zpool: Provide GUID to zpool-reguid(8) with -g
#16277 -multiple Fast Dedup: prune unique entries
#16316 5807de90a Fix null ptr deref when renaming a zvol with snaps and snapdev=visible
#16343 77a797a38 Enable L2 cache of all (MRU+MFU) metadata but MFU data only
#16446 83f359245 FreeBSD: fix build without kernel option MAC
#16449 963e6c9f3 Fix incorrect error report on vdev attach/replace
#16505 b10992582 spa_prop_get: require caller to supply output nvlist
Obtained from: OpenZFS
OpenZFS commit: b109925820
- Modify PHOLD() to no longer fault in the process.
- Remove _PHOLD_LITE(), which is now the same as _PHOLD(), fix up
consumers.
- Remove faultin() and its callees.
Tested by: pho
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D46114
This was done in the original DTrace import, presumably because that
made it a bit easier to handle includes. However, this can cause
dtrace_getpcstack() to be inlined into dtrace_probe(), resulting in a
missing frame in stack traces since dtrace_getpcstack() takes care to
bump "aframes" to account for its own stack frame.
To avoid this, compile dtrace_isa.c separately on all platforms. Add
requisite includes.
MFC after: 2 weeks
Sponsored by: Innovate UK
DTrace probes have an "aframes" attribute, used when unwinding the stack
from dtrace_probe(). It counts the number of leading frames to skip
when returning a stack trace, thus is used to hide internal functions.
Commit ddf0ed09bd set the aframes value for SDT probes to 0, which was
correct for an earlier iteration of the patch, but now doesn't take
sdt_probe()/sdt_probe6() into account.
Fix the aframes definition for SDT probes. Also try to improve
lockstat(1) output by adding an additional aframe for lockstat probes,
which otherwise show internal mtx(9), rwlock(9), etc. functions as the
probe "caller". This is not quite correct as the number of frames to
skip may differ depending on the lock type and kernel configuration (see
e.g., the MUTEX_NOINLINE kernel option), but this is not a new problem.
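The effect of aframes on a reported stack can be sketched with a toy trace. trim_stack() is a hypothetical helper, not the DTrace unwinder, and the frame values used with it are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a raw stack trace into 'out', dropping the first 'aframes'
 * entries.  Returns the number of frames written.
 */
static size_t
trim_stack(const uintptr_t *raw, size_t nraw, size_t aframes,
    uintptr_t *out, size_t limit)
{
	size_t n = 0;

	assert(raw != NULL && out != NULL);
	for (size_t i = aframes; i < nraw && n < limit; i++)
		out[n++] = raw[i];
	return (n);
}
```

If aframes is too small, internal frames leak into the output. If it is too large, the real caller is dropped.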
Reported by: mjg
Fixes: ddf0ed09bd ("sdt: Implement SDT probes using hot-patching")
For invop providers (i.e., fbt and kinst) we can simply reach into the
invop trapframe to fetch argument registers for arguments 0-7; for
argument 8 and beyond we have to read the value off of the stack.
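The lookup can be sketched as below for an ABI with eight argument registers. The trapframe layout is a hypothetical simplification, and the stack is assumed to hold one 8-byte slot per additional integer argument starting at the saved stack pointer.

```c
#include <assert.h>
#include <stdint.h>

#define NREGARGS	8	/* arguments 0-7 are passed in registers */

/* Hypothetical trapframe: argument registers plus the saved stack pointer. */
struct trapframe {
	uint64_t	 tf_x[NREGARGS];
	const uint64_t	*tf_sp;
};

/* Fetch probe argument 'n' from the registers or, failing that, the stack. */
static uint64_t
getarg(const struct trapframe *tf, int n)
{
	assert(n >= 0);
	if (n < NREGARGS)
		return (tf->tf_x[n]);
	return (tf->tf_sp[n - NREGARGS]);
}
```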
Reviewed by: Domagoj Stolfa, avg
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D45649
SDT calls dtrace_probe() directly, and this can be used to pass up to
five probe arguments directly. To pass the sixth argument (SDT
currently doesn't support more than this), we use a hack: just add
additional parameters to the call and cast dtrace_probe accordingly.
This happens to work on amd64, but doesn't work in general.
Modify SDT to call dtrace_probe() after storing arguments beyond the
first five in thread-local storage. Implement sdt_getargval() to fetch
extra argument values this way. An alternative would be to use invop
handlers instead and make sdt_probe_func point to a breakpoint
instruction, so that one can extract arguments using the breakpoint
exception trapframe, but this makes the providers more expensive when
enabled and doesn't seem justified. This approach works well unless we
want to add more than one or two more parameters to SDT probes, which
seems unlikely at present.
In particular, this fixes fetching the last argument of most ip and tcp
probes on arm64.
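A userspace analog of the spill-to-thread-local scheme is sketched below. C11 _Thread_local stands in for the kernel's per-thread storage, and probe(), probe6() and getargval() are hypothetical names whose signatures are assumptions, not the actual SDT interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT	5	/* arguments passed directly to the probe function */
#define NEXTRA	1	/* arguments spilled to thread-local storage */

/* Userspace analog of per-thread storage for the spilled arguments. */
static _Thread_local uintptr_t extra_args[NEXTRA];

/* Fetch an argument that was not passed directly. */
static uintptr_t
getargval(int argno)
{
	assert(argno >= NDIRECT && argno < NDIRECT + NEXTRA);
	return (extra_args[argno - NDIRECT]);
}

/* Sum of the six arguments seen by the most recent probe firing. */
static uintptr_t last_sum;

/* Stand-in for the probe function: only five arguments arrive directly. */
static void
probe(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3, uintptr_t a4)
{
	last_sum = a0 + a1 + a2 + a3 + a4 + getargval(5);
}

/* Six-argument entry point: spill the sixth, then fire the probe. */
static void
probe6(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3,
    uintptr_t a4, uintptr_t a5)
{
	extra_args[0] = a5;
	probe(a0, a1, a2, a3, a4);
}
```

Unlike casting the probe function to a wider prototype, this does not depend on how a given ABI passes a sixth argument.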
Reported by: rwatson
Reviewed by: Domagoj Stolfa
MFC after: 1 month
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D45648
The idea here is to avoid a memory access and conditional branch per
probe site. Instead, the probe is represented by an "unreachable"
unconditional function call. asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record. Each SDT probe carries a list
of tracepoints.
When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label. The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general. The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.
Per gallatin@ in D43504, this approach has less overhead when probes are
disabled. To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case. It could be re-added later if need be.
The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
names. The SDT macros let the programmer specify the function and
module names, but this is really a bug and shouldn't have been
allowed. The intent was to be able to have the same probe in
multiple functions and to let the user restrict which probes actually
get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
to include blocks of code in the out-of-line path. For example:
    if (SDT_PROBES_ENABLED()) {
        int reason = CLD_EXITED;
        if (WCOREDUMP(signo))
            reason = CLD_DUMPED;
        else if (WIFSIGNALED(signo))
            reason = CLD_KILLED;
        SDT_PROBE1(proc, , , exit, reason);
    }
could be written
    SDT_PROBE1_EXT(proc, , , exit, reason,
        int reason;
        reason = CLD_EXITED;
        if (WCOREDUMP(signo))
            reason = CLD_DUMPED;
        else if (WIFSIGNALED(signo))
            reason = CLD_KILLED;
    );
In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.
Reviewed by: Domagoj Stolfa, avg
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D44483
LLD has the -zbti-report=error argument to check if the BTI note is
present when linking. To allow for this to be used when linking the
kernel and modules:
- Add the BTI note to the remaining assembly files
- Mark ptrauth.c as protected by BTI
- Disable -zbti-report for vmm hypervisor switching code as it's not
used there.
The linux64 module doesn't build with the flag as it includes vdso code
that doesn't include the note.
Reviewed by: imp, kib, emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45466
This reverts commit e92491d95f.
The general idea looked good to me. In particular, it saved some memory
and avoided memory allocation failures when a large buffer size was
requested along with ring and fill policies.
But I didn't take into account that the second, supposedly unused
buffer, was actually used as the scratch buffer. The scratch buffer is
used as a temporary space for DTrace subroutines like copyin, copyinstr,
and alloca.
I think that the change can be fixed by allocating a separate smaller
buffer for the scratch buffer, but that fix would require more work than
I am able to do now. Hence the revert.
Reported by: Domagoj Stolfa
Diagnosed by: Domagoj Stolfa, markj
MFC after: immediately
No functional change, but this reduces diffs with CheriBSD downstream.
Reviewed by: andrew
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D44344
No functional change, but this reduces diffs with CheriBSD downstream.
Reviewed by: andrew
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D44342