commit b70bfeb98635040588883503d2760e0f46231491 Author: Greg Kroah-Hartman Date: Sat Oct 29 10:20:36 2022 +0200 Linux 5.4.221 Link: https://lore.kernel.org/r/20221027165049.817124510@linuxfoundation.org Tested-by: Sudip Mukherjee Tested-by: Jon Hunter Tested-by: Linux Kernel Functional Testing Tested-by: Florian Fainelli Tested-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 6bb8769326c46db3058780c0640dcc49d8187b24 Author: Seth Jenkins Date: Thu Oct 27 11:36:52 2022 -0400 mm: /proc/pid/smaps_rollup: fix no vma's null-deref Commit 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") introduced a null-deref if there are no vma's in the task in show_smaps_rollup. Fixes: 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") Signed-off-by: Seth Jenkins Reviewed-by: Alexey Dobriyan Tested-by: Alexey Dobriyan Signed-off-by: Greg Kroah-Hartman commit a351077e589d69f83d0b624543d399c131e09e6c Author: Gaurav Kohli Date: Wed Oct 5 22:52:59 2022 -0700 hv_netvsc: Fix race between VF offering and VF association message from host commit 365e1ececb2905f94cc10a5817c5b644a32a3ae2 upstream. During vm boot, there might be possibility that vf registration call comes before the vf association from host to vm. And this might break netvsc vf path, To prevent the same block vf registration until vf bind message comes from host. Cc: stable@vger.kernel.org Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number") Reviewed-by: Haiyang Zhang Signed-off-by: Gaurav Kohli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2f1b3377b6fc5e30d340bb7392dc7e0ea38fb8a6 Author: Nick Desaulniers Date: Mon Oct 24 13:34:14 2022 -0700 Makefile.debug: re-enable debug info for .S files This is _not_ an upstream commit and just for 5.4.y only. It is based on commit 32ef9e5054ec0321b9336058c58ec749e9c6b0fe upstream. Alexey reported that the fraction of unknown filename instances in kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down to assembler defined symbols, which regressed as a result of: commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") In that commit, I allude to restoring debug info for assembler defined symbols in a follow up patch, but it seems I forgot to do so in commit a66049e2cf0e ("Kbuild: make DWARF version a choice") Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") Signed-off-by: Nick Desaulniers Signed-off-by: Greg Kroah-Hartman commit 9220881831c3832dc23db387c1fedcca04e81855 Author: Werner Sembach Date: Wed Oct 26 17:22:46 2022 +0200 ACPI: video: Force backlight native for more TongFang devices commit 3dbc80a3e4c55c4a5b89ef207bed7b7de36157b4 upstream. This commit is very different from the upstream commit! It fixes the same issue by adding more quirks, rather then the general fix from the 6.1 kernel, because the general fix from the 6.1 kernel is part of a larger refactoring of the backlight code which is not suitable for the stable series. As described in "ACPI: video: Drop NL5x?U, PF4NU1F and PF5?U?? acpi_backlight=native quirks" (10212754a0d2) the upstream commit "ACPI: video: Make backlight class device registration a separate step (v2)" (3dbc80a3e4c5) makes these quirks unnecessary. However as mentioned in this bugtracker ticket https://bugzilla.kernel.org/show_bug.cgi?id=215683#c17 the upstream fix is part of a larger patchset that is overall too complex for stable. The TongFang GKxNRxx, GMxNGxx, GMxZGxx, and GMxRGxx / TUXEDO Stellaris/Polaris Gen 1-4, have the same problem as the Clevo NL5xRU and NL5xNU / TUXEDO Aura 15 Gen1 and Gen2: They have a working native and video interface for screen backlight. However the default detection mechanism first registers the video interface before unregistering it again and switching to the native interface during boot. This results in a dangling SBIOS request for backlight change for some reason, causing the backlight to switch to ~2% once per boot on the first power cord connect or disconnect event. Setting the native interface explicitly circumvents this buggy behaviour by avoiding the unregistering process. Reviewed-by: Hans de Goede Signed-off-by: Werner Sembach Signed-off-by: Greg Kroah-Hartman commit 8ad8fc82eee8a7ced44fcefd993a7a18f84d773a Author: Conor Dooley Date: Wed Oct 19 13:52:10 2022 +0100 riscv: topology: fix default topology reporting commit fbd92809997a391f28075f1c8b5ee314c225557c upstream. RISC-V has no sane defaults to fall back on where there is no cpu-map in the devicetree. Without sane defaults, the package, core and thread IDs are all set to -1. This causes user-visible inaccuracies for tools like hwloc/lstopo which rely on the sysfs cpu topology files to detect a system's topology. On a PolarFire SoC, which should have 4 harts with a thread each, lstopo currently reports: Machine (793MB total) Package L#0 NUMANode L#0 (P#0 793MB) Core L#0 L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0) L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1) L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2) L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3) Adding calls to store_cpu_topology() in {boot,smp} hart bringup code results in the correct topolgy being reported: Machine (793MB total) Package L#0 NUMANode L#0 (P#0 793MB) L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) CC: stable@vger.kernel.org # 456797da792f: arm64: topology: move store_cpu_topology() to shared code Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot.") Reported-by: Brice Goglin Link: https://github.com/open-mpi/hwloc/issues/536 Reviewed-by: Sudeep Holla Reviewed-by: Atish Patra Signed-off-by: Conor Dooley Signed-off-by: Greg Kroah-Hartman commit 60dd3dc2acc4088d717f452b2094c45734c52ce8 Author: Conor Dooley Date: Wed Oct 19 13:52:09 2022 +0100 arm64: topology: move store_cpu_topology() to shared code commit 456797da792fa7cbf6698febf275fe9b36691f78 upstream. arm64's method of defining a default cpu topology requires only minimal changes to apply to RISC-V also. The current arm64 implementation exits early in a uniprocessor configuration by reading MPIDR & claiming that uniprocessor can rely on the default values. This is appears to be a hangover from prior to '3102bc0e6ac7 ("arm64: topology: Stop using MPIDR for topology information")', because the current code just assigns default values for multiprocessor systems. With the MPIDR references removed, store_cpu_topolgy() can be moved to the common arch_topology code. Reviewed-by: Sudeep Holla Acked-by: Catalin Marinas Reviewed-by: Atish Patra Signed-off-by: Conor Dooley Signed-off-by: Greg Kroah-Hartman commit 724483b585a1b1e063d42ac5aa835707ff2ec165 Author: Jerry Snitselaar Date: Wed Oct 19 08:44:47 2022 +0800 iommu/vt-d: Clean up si_domain in the init_dmars() error path [ Upstream commit 620bf9f981365c18cc2766c53d92bf8131c63f32 ] A splat from kmem_cache_destroy() was seen with a kernel prior to commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool") when there was a failure in init_dmars(), because the iommu_domain cache still had objects. While the mempool code is now gone, there still is a leak of the si_domain memory if init_dmars() fails. So clean up si_domain in the init_dmars() error path. Cc: Lu Baolu Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit dfc0337c6dceb6449403b33ecb141f4a1458a1e9 Author: Yang Yingliang Date: Tue Oct 18 20:24:51 2022 +0800 net: hns: fix possible memory leak in hnae_ae_register() [ Upstream commit ff2f5ec5d009844ec28f171123f9e58750cef4bf ] Inject fault while probing module, if device_register() fails, but the refcount of kobject is not decreased to 0, the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that name can be freed in callback function kobject_cleanup(). unreferenced object 0xffff00c01aba2100 (size 128): comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s) hex dump (first 32 bytes): 68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff hnae0....!...... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0 [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0 [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390 [<000000006c0ffb13>] kvasprintf+0x8c/0x118 [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8 [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0 [<000000000b87affc>] dev_set_name+0x7c/0xa0 [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae] [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf] [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf] Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support") Signed-off-by: Yang Yingliang Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit bc8301ea7e7f1bb9d2ba2fcdf7b5ec2f0792b47e Author: Zhengchao Shao Date: Tue Oct 18 14:31:59 2022 +0800 net: sched: cake: fix null pointer access issue when cake_init() fails [ Upstream commit 51f9a8921ceacd7bf0d3f47fa867a64988ba1dcb ] When the default qdisc is cake, if the qdisc of dev_queue fails to be inited during mqprio_init(), cake_reset() is invoked to clear resources. In this case, the tins is NULL, and it will cause gpf issue. The process is as follows: qdisc_create_dflt() cake_init() q->tins = kvcalloc(...) --->failed, q->tins is NULL ... qdisc_put() ... cake_reset() ... cake_dequeue_one() b = &q->tins[...] --->q->tins is NULL The following is the Call Trace information: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:cake_dequeue_one+0xc9/0x3c0 Call Trace: cake_reset+0xb1/0x140 qdisc_reset+0xed/0x6f0 qdisc_destroy+0x82/0x4c0 qdisc_put+0x9e/0xb0 qdisc_create_dflt+0x2c3/0x4a0 mqprio_init+0xa71/0x1760 qdisc_create+0x3eb/0x1000 tc_modify_qdisc+0x408/0x1720 rtnetlink_rcv_msg+0x38e/0xac0 netlink_rcv_skb+0x12d/0x3a0 netlink_unicast+0x4a2/0x740 netlink_sendmsg+0x826/0xcc0 sock_sendmsg+0xc5/0x100 ____sys_sendmsg+0x583/0x690 ___sys_sendmsg+0xe8/0x160 __sys_sendmsg+0xbf/0x160 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f89e5122d04 Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao Acked-by: Toke Høiland-Jørgensen Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit b87f88d58f1b404769fb6999e965beeee6958027 Author: Harini Katakam Date: Fri Oct 14 12:17:35 2022 +0530 net: phy: dp83867: Extend RX strap quirk for SGMII mode [ Upstream commit 0c9efbd5c50c64ead434960a404c9c9a097b0403 ] When RX strap in HW is not set to MODE 3 or 4, bit 7 and 8 in CF4 register should be set. The former is already handled in dp83867_config_init; add the latter in SGMII specific initialization. Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy") Signed-off-by: Harini Katakam Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 6453077a00c1b2f068a1b6e184601d50eff653ea Author: Xiaobo Liu Date: Fri Oct 14 10:05:40 2022 +0800 net/atm: fix proc_mpc_write incorrect return value [ Upstream commit d8bde3bf7f82dac5fc68a62c2816793a12cafa2a ] Then the input contains '\0' or '\n', proc_mpc_write has read them, so the return value needs +1. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xiaobo Liu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 4258c473ee03199ec18d9667e2f99abf72ec6e2f Author: José Expósito Date: Sun Oct 9 20:27:47 2022 +0200 HID: magicmouse: Do not set BTN_MOUSE on double report [ Upstream commit bb5f0c855dcfc893ae5ed90e4c646bde9e4498bf ] Under certain conditions the Magic Trackpad can group 2 reports in a single packet. The packet is split and the raw event function is invoked recursively for each part. However, after processing each part, the BTN_MOUSE status is updated, sending multiple click events. [1] Return after processing double reports to avoid this issue. Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/811 # [1] Fixes: a462230e16ac ("HID: magicmouse: enable Magic Trackpad support") Reported-by: Nulo Signed-off-by: José Expósito Signed-off-by: Benjamin Tissoires Link: https://lore.kernel.org/r/20221009182747.90730-1-jose.exposito89@gmail.com Signed-off-by: Sasha Levin commit 567f8de358b61015dcfb8878a1f06c5369a45f54 Author: Alexander Potapenko Date: Wed Oct 12 17:25:14 2022 +0200 tipc: fix an information leak in tipc_topsrv_kern_subscr [ Upstream commit 777ecaabd614d47c482a5c9031579e66da13989a ] Use a 8-byte write to initialize sub.usr_handle in tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized when issuing setsockopt(..., SOL_TIPC, ...). This resulted in an infoleak reported by KMSAN when the packet was received: ===================================================== BUG: KMSAN: kernel-infoleak in copyout+0xbc/0x100 lib/iov_iter.c:169 instrument_copy_to_user ./include/linux/instrumented.h:121 copyout+0xbc/0x100 lib/iov_iter.c:169 _copy_to_iter+0x5c0/0x20a0 lib/iov_iter.c:527 copy_to_iter ./include/linux/uio.h:176 simple_copy_to_iter+0x64/0xa0 net/core/datagram.c:513 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419 skb_copy_datagram_iter+0x58/0x200 net/core/datagram.c:527 skb_copy_datagram_msg ./include/linux/skbuff.h:3903 packet_recvmsg+0x521/0x1e70 net/packet/af_packet.c:3469 ____sys_recvmsg+0x2c4/0x810 net/socket.c:? ___sys_recvmsg+0x217/0x840 net/socket.c:2743 __sys_recvmsg net/socket.c:2773 __do_sys_recvmsg net/socket.c:2783 __se_sys_recvmsg net/socket.c:2780 __x64_sys_recvmsg+0x364/0x540 net/socket.c:2780 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 ... Uninit was stored to memory at: tipc_sub_subscribe+0x42d/0xb50 net/tipc/subscr.c:156 tipc_conn_rcv_sub+0x246/0x620 net/tipc/topsrv.c:375 tipc_topsrv_kern_subscr+0x2e8/0x400 net/tipc/topsrv.c:579 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 tipc_sk_join+0x2a8/0x770 net/tipc/socket.c:3084 tipc_setsockopt+0xae5/0xe40 net/tipc/socket.c:3201 __sys_setsockopt+0x87f/0xdc0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 __se_sys_setsockopt net/socket.c:2260 __x64_sys_setsockopt+0xe0/0x160 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 Local variable sub created at: tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 Bytes 84-87 of 88 are uninitialized Memory access of size 88 starts at ffff88801ed57cd0 Data copied to user address 0000000020000400 ... ===================================================== Signed-off-by: Alexander Potapenko Fixes: 026321c6d056a5 ("tipc: rename tipc_server to tipc_topsrv") Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 27ee73c1199e0a8fec0784f4aa333ebb022e7348 Author: Mark Tomlinson Date: Mon Oct 10 15:46:13 2022 +1300 tipc: Fix recognition of trial period [ Upstream commit 28be7ca4fcfd69a2d52aaa331adbf9dbe91f9e6e ] The trial period exists until jiffies is after addr_trial_end. But as jiffies will eventually overflow, just using time_after will eventually give incorrect results. As the node address is set once the trial period ends, this can be used to know that we are not in the trial period. Fixes: e415577f57f4 ("tipc: correct discovery message handling during address trial period") Signed-off-by: Mark Tomlinson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit fa0676d94fa4e91e08d30f25145883ffcdd54ab9 Author: Tony Luck Date: Mon Oct 10 13:34:23 2022 -0700 ACPI: extlog: Handle multiple records [ Upstream commit f6ec01da40e4139b41179f046044ee7c4f6370dc ] If there is no user space consumer of extlog_mem trace records, then Linux properly handles multiple error records in an ELOG block extlog_print() print_extlog_rcd() __print_extlog_rcd() cper_estatus_print() apei_estatus_for_each_section() But the other code path hard codes looking for a single record to output a trace record. Fix by using the same apei_estatus_for_each_section() iterator to step over all records. Fixes: 2dfb7d51a61d ("trace, RAS: Add eMCA trace event interface") Signed-off-by: Tony Luck Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin commit 13a2719ec89fd421661624d28de699d8f29b996a Author: Filipe Manana Date: Tue Oct 11 13:16:52 2022 +0100 btrfs: fix processing of delayed tree block refs during backref walking [ Upstream commit 943553ef9b51db303ab2b955c1025261abfdf6fb ] During backref walking, when processing a delayed reference with a type of BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there: 1) We are accessing the delayed references extent_op, and its key, without the protection of the delayed ref head's lock; 2) If there's no extent op for the delayed ref head, we end up with an uninitialized key in the stack, variable 'tmp_op_key', and then pass it to add_indirect_ref(), which adds the reference to the indirect refs rb tree. This is wrong, because indirect references should have a NULL key when we don't have access to the key, and in that case they should be added to the indirect_missing_keys rb tree and not to the indirect rb tree. This means that if have BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting from freeing an extent buffer, therefore with a count of -1, it will not cancel out the corresponding reference we have in the extent tree (with a count of 1), since both references end up in different rb trees. When using fiemap, where we often need to check if extents are shared through shared subtrees resulting from snapshots, it means we can incorrectly report an extent as shared when it's no longer shared. However this is temporary because after the transaction is committed the extent is no longer reported as shared, as running the delayed reference results in deleting the tree block reference from the extent tree. Outside the fiemap context, the result is unpredictable, as the key was not initialized but it's used when navigating the rb trees to insert and search for references (prelim_ref_compare()), and we expect all references in the indirect rb tree to have valid keys. The following reproducer triggers the second bug: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount -o compress $DEV $MNT # With a compressed 128M file we get a tree height of 2 (level 1 root). xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo btrfs subvolume snapshot $MNT $MNT/snap # Fiemap should output 0x2008 in the flags column. # 0x2000 means shared extent # 0x8 means encoded extent (because it's compressed) echo echo "fiemap after snapshot, range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo # Overwrite one extent and fsync to flush delalloc and COW a new path # in the snapshot's tree. # # After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type # BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent # buffer in the path. # # In the extent tree we have inline references of type # BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent # buffers, so they should cancel each other, and the extent buffers in # the fs tree should no longer be considered as shared. # echo "Overwriting file range [120M, 120M + 128K)..." xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo xfs_io -c "fsync" $MNT/snap/foo # Fiemap should output 0x8 in the flags column. The extent in the range # [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs # tree. echo echo "fiemap after overwrite range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo umount $MNT Running it before this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 The extent in the range [120M, 120M + 128K) is still reported as shared (0x2000 bit set) after overwriting that range and flushing delalloc, which is not correct - an entire path was COWed in the snapshot's tree and the extent is now only referenced by the original fs tree. Running it after this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x8 Now the extent is not reported as shared anymore. So fix this by passing a NULL key pointer to add_indirect_ref() when processing a delayed reference for a tree block if there's no extent op for our delayed ref head with a defined key. Also access the extent op only after locking the delayed ref head's lock. The reproducer will be converted later to a test case for fstests. Fixes: 86d5f994425252 ("btrfs: convert prelimary reference tracking to use rbtrees") Fixes: a6dbceafb915e8 ("btrfs: Remove unused op_key var from add_delayed_refs") Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sasha Levin commit b397ce3477754b7562c7a82510ef24d99418ea0c Author: Filipe Manana Date: Tue Oct 11 13:16:51 2022 +0100 btrfs: fix processing of delayed data refs during backref walking [ Upstream commit 4fc7b57228243d09c0d878873bf24fa64a90fa01 ] When processing delayed data references during backref walking and we are using a share context (we are being called through fiemap), whenever we find a delayed data reference for an inode different from the one we are interested in, then we immediately exit and consider the data extent as shared. This is wrong, because: 1) This might be a DROP reference that will cancel out a reference in the extent tree; 2) Even if it's an ADD reference, it may be followed by a DROP reference that cancels it out. In either case we should not exit immediately. Fix this by never exiting when we find a delayed data reference for another inode - instead add the reference and if it does not cancel out other delayed reference, we will exit early when we call extent_is_shared() after processing all delayed references. If we find a drop reference, then signal the code that processes references from the extent tree (add_inline_refs() and add_keyed_refs()) to not exit immediately if it finds there a reference for another inode, since we have delayed drop references that may cancel it out. In this later case we exit once we don't have references in the rb trees that cancel out each other and have two references for different inodes. Example reproducer for case 1): $ cat test-1.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 64K" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar echo echo "fiemap after cloning:" xfs_io -c "fiemap -v" $MNT/foo rm -f $MNT/bar echo echo "fiemap after removing file bar:" xfs_io -c "fiemap -v" $MNT/foo umount $MNT Running it before this patch, the extent is still listed as shared, it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set: $ ./test-1.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 Example reproducer for case 2): $ cat test-2.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 64K" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar # Flush delayed references to the extent tree and commit current # transaction. sync echo echo "fiemap after cloning:" xfs_io -c "fiemap -v" $MNT/foo rm -f $MNT/bar echo echo "fiemap after removing file bar:" xfs_io -c "fiemap -v" $MNT/foo umount $MNT Running it before this patch, the extent is still listed as shared, it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set: $ ./test-2.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 After this patch, after deleting bar in both tests, the extent is not reported with the 0x2000 flag anymore, it gets only the flag 0x1 (which is FIEMAP_EXTENT_LAST): $ ./test-1.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x1 $ ./test-2.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x1 These tests will later be converted to a test case for fstests. Fixes: dc046b10c8b7d4 ("Btrfs: make fiemap not blow when you have lots of snapshots") Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sasha Levin commit 96894a4fe6b0d066763a936aa549074c92208062 Author: Jean-Francois Le Fillatre Date: Wed Aug 24 21:14:36 2022 +0200 r8152: add PID for the Lenovo OneLink+ Dock commit 1bd3a383075c64d638e65d263c9267b08ee7733c upstream. The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as a broken CDC device by default. Add the custom Lenovo PID to the r8152 driver to support it properly. Also, systems compatible with this dock provide a BIOS option to enable MAC address passthrough (as per Lenovo document "ThinkPad Docking Solutions 2017"). Add the custom PID to the MAC passthrough list too. Tested on a ThinkPad 13 1st gen with the expected results: passthrough disabled: Invalid header when reading pass-thru MAC addr passthrough enabled: Using pass-thru MAC addr XX:XX:XX:XX:XX:XX Signed-off-by: Jean-Francois Le Fillatre Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7f6d2188ec3357d93238d1a1cf509451fc44eb85 Author: James Morse Date: Thu Jul 14 17:15:23 2022 +0100 arm64: errata: Remove AES hwcap for COMPAT tasks commit 44b3834b2eed595af07021b1c64e6f9bc396398b upstream. Cortex-A57 and Cortex-A72 have an erratum where an interrupt that occurs between a pair of AES instructions in aarch32 mode may corrupt the ELR. The task will subsequently produce the wrong AES result. The AES instructions are part of the cryptographic extensions, which are optional. User-space software will detect the support for these instructions from the hwcaps. If the platform doesn't support these instructions a software implementation should be used. Remove the hwcap bits on affected parts to indicate user-space should not use the AES instructions. Acked-by: Ard Biesheuvel Signed-off-by: James Morse Link: https://lore.kernel.org/r/20220714161523.279570-3-james.morse@arm.com Signed-off-by: Will Deacon [florian: resolved conflicts in arch/arm64/tools/cpucaps and cpu_errata.c] Signed-off-by: Florian Fainelli Signed-off-by: Greg Kroah-Hartman commit aae35081633f897092bbc3bfd3c3aa58510c6394 Author: Bryan O'Donoghue Date: Tue Jul 26 04:14:54 2022 +0200 media: venus: dec: Handle the case where find_format fails commit 06a2da340f762addc5935bf851d95b14d4692db2 upstream. Debugging the decoder on msm8916 I noticed the vdec probe was crashing if the fmt pointer was NULL. A similar fix from Colin Ian King found by Coverity was implemented for the encoder. Implement the same fix on the decoder. Fixes: 7472c1c69138 ("[media] media: venus: vdec: add video decoder files") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Bryan O'Donoghue Signed-off-by: Stanimir Varbanov Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit fd596e7371acc0c42b941036755c0314eac091ba Author: Eric Ren Date: Sat Oct 15 11:19:28 2022 +0800 KVM: arm64: vgic: Fix exit condition in scan_its_table() commit c000a2607145d28b06c697f968491372ea56c23a upstream. With some PCIe topologies, restoring a guest fails while parsing the ITS device tables. Reproducer hints: 1. Create ARM virt VM with pxb-pcie bus which adds extra host bridges, with qemu command like: ``` -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.x \ ... -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.y \ ... ``` 2. Ensure the guest uses 2-level device table 3. Perform VM migration which calls save/restore device tables In that setup, we get a big "offset" between 2 device_ids, which makes unsigned "len" round up a big positive number, causing the scan loop to continue with a bad GPA. For example: 1. L1 table has 2 entries; 2. and we are now scanning at L2 table entry index 2075 (pointed to by L1 first entry) 3. if next device id is 9472, we will get a big offset: 7397; 4. with unsigned 'len', 'len -= offset * esz', len will underflow to a positive number, mistakenly into next iteration with a bad GPA; (It should break out of the current L2 table scanning, and jump into the next L1 table entry) 5. that bad GPA fails the guest read. Fix it by stopping the L2 table scan when the next device id is outside of the current table, allowing the scan to continue from the next L1 table entry. Thanks to Eric Auger for the fix suggestion. Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for tableookup") Suggested-by: Eric Auger Signed-off-by: Eric Ren [maz: commit message tidy-up] Signed-off-by: Marc Zyngier Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/d9c3a564af9e2c5bf63f48a7dcbf08cd593c5c0b.1665802985.git.renzhengeek@gmail.com Signed-off-by: Greg Kroah-Hartman commit 383b7c50f5445ff8dbbf03080905648d6980c39d Author: Kai-Heng Feng Date: Tue Oct 11 10:46:17 2022 +0800 ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS commit 1e41e693f458eef2d5728207dbd327cd3b16580a upstream. UBSAN complains about array-index-out-of-bounds: [ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41 [ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]' [ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic #25-Ubuntu [ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010 [ 1.980718] kernel: Call Trace: [ 1.980721] kernel: [ 1.980723] kernel: show_stack+0x52/0x58 [ 1.980729] kernel: dump_stack_lvl+0x4a/0x5f [ 1.980734] kernel: dump_stack+0x10/0x12 [ 1.980736] kernel: ubsan_epilogue+0x9/0x45 [ 1.980739] kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49 [ 1.980742] kernel: ahci_qc_issue+0x166/0x170 [libahci] [ 1.980748] kernel: ata_qc_issue+0x135/0x240 [ 1.980752] kernel: ata_exec_internal_sg+0x2c4/0x580 [ 1.980754] kernel: ? vprintk_default+0x1d/0x20 [ 1.980759] kernel: ata_exec_internal+0x67/0xa0 [ 1.980762] kernel: sata_pmp_read+0x8d/0xc0 [ 1.980765] kernel: sata_pmp_read_gscr+0x3c/0x90 [ 1.980768] kernel: sata_pmp_attach+0x8b/0x310 [ 1.980771] kernel: ata_eh_revalidate_and_attach+0x28c/0x4b0 [ 1.980775] kernel: ata_eh_recover+0x6b6/0xb30 [ 1.980778] kernel: ? ahci_do_hardreset+0x180/0x180 [libahci] [ 1.980783] kernel: ? ahci_stop_engine+0xb0/0xb0 [libahci] [ 1.980787] kernel: ? ahci_do_softreset+0x290/0x290 [libahci] [ 1.980792] kernel: ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0 [ 1.980795] kernel: sata_pmp_eh_recover.isra.0+0x214/0x560 [ 1.980799] kernel: sata_pmp_error_handler+0x23/0x40 [ 1.980802] kernel: ahci_error_handler+0x43/0x80 [libahci] [ 1.980806] kernel: ata_scsi_port_error_handler+0x2b1/0x600 [ 1.980810] kernel: ata_scsi_error+0x9c/0xd0 [ 1.980813] kernel: scsi_error_handler+0xa1/0x180 [ 1.980817] kernel: ? scsi_unjam_host+0x1c0/0x1c0 [ 1.980820] kernel: kthread+0x12a/0x150 [ 1.980823] kernel: ? set_kthread_struct+0x50/0x50 [ 1.980826] kernel: ret_from_fork+0x22/0x30 [ 1.980831] kernel: This happens because sata_pmp_init_links() initialize link->pmp up to SATA_PMP_MAX_PORTS while em_priv is declared as 8 elements array. I can't find the maximum Enclosure Management ports specified in AHCI spec v1.3.1, but "12.2.1 LED message type" states that "Port Multiplier Information" can utilize 4 bits, which implies it can support up to 16 ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the issue. BugLink: https://bugs.launchpad.net/bugs/1970074 Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng Signed-off-by: Damien Le Moal Signed-off-by: Greg Kroah-Hartman commit da979315029793377d2d140c2e5801bca2e9602d Author: Alexander Stein Date: Wed Oct 12 15:11:05 2022 +0200 ata: ahci-imx: Fix MODULE_ALIAS commit 979556f1521a835a059de3b117b9c6c6642c7d58 upstream. 'ahci:' is an invalid prefix, preventing the module from autoloading. Fix this by using the 'platform:' prefix and DRV_NAME. Fixes: 9e54eae23bc9 ("ahci_imx: add ahci sata support on imx platforms") Cc: stable@vger.kernel.org Signed-off-by: Alexander Stein Reviewed-by: Fabio Estevam Signed-off-by: Damien Le Moal Signed-off-by: Greg Kroah-Hartman commit c00cdfc9bd767ee743ad3a4054de17aeb0afcbca Author: Zhang Rui Date: Fri Oct 14 17:01:45 2022 +0800 hwmon/coretemp: Handle large core ID value commit 7108b80a542b9d65e44b36d64a700a83658c0b73 upstream. The coretemp driver supports up to a hard-coded limit of 128 cores. Today, the driver can not support a core with an ID above that limit. Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they may be sparse and they may be large. Update the driver to map arbitrary core ID numbers into appropriate array indexes so that 128 cores can be supported, no matter the encoding of core ID's. Signed-off-by: Zhang Rui Signed-off-by: Dave Hansen Acked-by: Len Brown Acked-by: Guenter Roeck Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com Signed-off-by: Greg Kroah-Hartman commit 3ea7da6a97d505f323d46523d2ee97bd1120c16f Author: Borislav Petkov Date: Wed Oct 5 12:00:08 2022 +0200 x86/microcode/AMD: Apply the patch early on every logical thread commit e7ad18d1169c62e6c78c01ff693fd362d9d65278 upstream. Currently, the patch application logic checks whether the revision needs to be applied on each logical CPU (SMT thread). Therefore, on SMT designs where the microcode engine is shared between the two threads, the application happens only on one of them as that is enough to update the shared microcode engine. However, there are microcode patches which do per-thread modification, see Link tag below. Therefore, drop the revision check and try applying on each thread. This is what the BIOS does too so this method is very much tested. Btw, change only the early paths. On the late loading paths, there's no point in doing per-thread modification because if is it some case like in the bugzilla below - removing a CPUID flag - the kernel cannot go and un-use features it has detected are there early. For that, one should use early loading anyway. [ bp: Fixes does not contain the oldest commit which did check for equality but that is good enough. ] Fixes: 8801b3fcb574 ("x86/microcode/AMD: Rework container parsing") Reported-by: Ștefan Talpalaru Signed-off-by: Borislav Petkov Tested-by: Ștefan Talpalaru Cc: Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211 Signed-off-by: Greg Kroah-Hartman commit 3064c74198cf65c5b0ff3e61cc0080cc9dd16c5b Author: Joseph Qi Date: Mon Oct 17 21:02:26 2022 +0800 ocfs2: fix BUG when iput after ocfs2_mknod fails commit 759a7c6126eef5635506453e9b9d55a6a3ac2084 upstream. Commit b1529a41f777 "ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed inode if __ocfs2_mknod_locked() fails later. But this introduce a race, the freed bit may be reused immediately by another thread, which will update dinode, e.g. i_generation. Then iput this inode will lead to BUG: inode->i_generation != le32_to_cpu(fe->i_generation) We could make this inode as bad, but we did want to do operations like wipe in some cases. Since the claimed inode bit can only affect that an dinode is missing and will return back after fsck, it seems not a big problem. So just leave it as is by revert the reclaim logic. Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error") Signed-off-by: Joseph Qi Reported-by: Yan Wang Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit c2489774a2f0b5cb83d446b7a312d0cad50850a6 Author: Joseph Qi Date: Mon Oct 17 21:02:27 2022 +0800 ocfs2: clear dinode links count in case of error commit 28f4821b1b53e0649706912e810c6c232fc506f9 upstream. In ocfs2_mknod(), if error occurs after dinode successfully allocated, ocfs2 i_links_count will not be 0. So even though we clear inode i_nlink before iput in error handling, it still won't wipe inode since we'll refresh inode from dinode during inode lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as well. Also do the same change for ocfs2_symlink(). Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi Reported-by: Yan Wang Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 6391ed32b101e7e489553d136d6eac180db13352 Author: Dave Chinner Date: Wed Oct 26 11:58:43 2022 +0530 xfs: fix use-after-free on CIL context on shutdown commit c7f87f3984cfa1e6d32806a715f35c5947ad9c09 upstream. xlog_wait() on the CIL context can reference a freed context if the waiter doesn't get scheduled before the CIL context is freed. This can happen when a task is on the hard throttle and the CIL push aborts due to a shutdown. This was detected by generic/019: thread 1 thread 2 __xfs_trans_commit xfs_log_commit_cil xlog_wait schedule xlog_cil_push_work wake_up_all xlog_cil_committed kmem_free remove_wait_queue spin_lock_irqsave --> UAF Fix it by moving the wait queue to the CIL rather than keeping it in in the CIL context that gets freed on push completion. Because the wait queue is now independent of the CIL context and we might have multiple contexts in flight at once, only wake the waiters on the push throttle when the context we are pushing is over the hard throttle size threshold. Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push") Reported-by: Yu Kuai Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit ac055fee2544ecbb3a79643d6b16b838b839d917 Author: Darrick J. Wong Date: Wed Oct 26 11:58:42 2022 +0530 xfs: move inode flush to the sync workqueue commit f0f7a674d4df1510d8ca050a669e1420cf7d7fab upstream. [ Modify fs/xfs/xfs_super.c to include the changes at locations suitable for 5.4-lts kernel ] Move the inode dirty data flushing to a workqueue so that multiple threads can take advantage of a single thread's flushing work. The ratelimiting technique used in bdd4ee4 was not successful, because threads that skipped the inode flush scan due to ratelimiting would ENOSPC early, which caused occasional (but noticeable) changes in behavior and sporadic fstest regressions. Therefore, make all the writer threads wait on a single inode flush, which eliminates both the stampeding hordes of flushers and the small window in which a write could fail with ENOSPC because it lost the ratelimit race after even another thread freed space. Fixes: c6425702f21e ("xfs: ratelimit inode flush on buffered write ENOSPC") Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit d3eb14b8ea2682677cbafc79c122de014bc71440 Author: Christoph Hellwig Date: Wed Oct 26 11:58:41 2022 +0530 xfs: reflink should force the log out if mounted with wsync commit 5833112df7e9a306af9af09c60127b92ed723962 upstream. Reflink should force the log out to disk if the filesystem was mounted with wsync, the same as most other operations in xfs. [Note: XFS_MOUNT_WSYNC is set when the admin mounts the filesystem with either the 'wsync' or 'sync' mount options, which effectively means that we're classifying reflink/dedupe as IO operations and making them synchronous when required.] Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range") Signed-off-by: Christoph Hellwig Reviewed-by: Brian Foster [darrick: add more to the changelog] Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 05e2b279ead434521da2e9304944111f865e26b7 Author: Christoph Hellwig Date: Wed Oct 26 11:58:40 2022 +0530 xfs: factor out a new xfs_log_force_inode helper commit 54fbdd1035e3a4e4f4082c335b095426cdefd092 upstream. Create a new helper to force the log up to the last LSN touching an inode. Signed-off-by: Christoph Hellwig Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit f1172b08bb8ecb7282e262e5a509c7455a50e0cd Author: Brian Foster Date: Wed Oct 26 11:58:39 2022 +0530 xfs: trylock underlying buffer on dquot flush commit 8d3d7e2b35ea7d91d6e085c93b5efecfb0fba307 upstream. A dquot flush currently blocks on the buffer lock for the underlying dquot buffer. In turn, this causes xfsaild to block rather than continue processing other items in the meantime. Update xfs_qm_dqflush() to trylock the buffer, similar to how inode buffers are handled, and return -EAGAIN if the lock fails. Fix up any callers that don't currently handle the error properly. Signed-off-by: Brian Foster Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 890d7dfff79d0c6960818653261a962debc862af Author: Darrick J. Wong Date: Wed Oct 26 11:58:38 2022 +0530 xfs: don't write a corrupt unmount record to force summary counter recalc commit 5cc3c006eb45524860c4d1dd4dd7ad4a506bf3f5 upstream. [ Modify fs/xfs/xfs_log.c to include the changes at locations suitable for 5.4-lts kernel ] In commit f467cad95f5e3, I added the ability to force a recalculation of the filesystem summary counters if they seemed incorrect. This was done (not entirely correctly) by tweaking the log code to write an unmount record without the UMOUNT_TRANS flag set. At next mount, the log recovery code will fail to find the unmount record and go into recovery, which triggers the recalculation. What actually gets written to the log is what ought to be an unmount record, but without any flags set to indicate what kind of record it actually is. This worked to trigger the recalculation, but we shouldn't write bogus log records when we could simply write nothing. Fixes: f467cad95f5e3 ("xfs: force summary counter recalc at next mount") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Brian Foster Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 8ebd3ba932df28095deb4c10a78c5950a3fd958b Author: Dave Chinner Date: Wed Oct 26 11:58:37 2022 +0530 xfs: tail updates only need to occur when LSN changes commit 8eb807bd839938b45bf7a97f0568d2a845ba6929 upstream. We currently wake anything waiting on the log tail to move whenever the log item at the tail of the log is removed. Historically this was fine behaviour because there were very few items at any given LSN. But with delayed logging, there may be thousands of items at any given LSN, and we can't move the tail until they are all gone. Hence if we are removing them in near tail-first order, we might be waking up processes waiting on the tail LSN to change (e.g. log space waiters) repeatedly without them being able to make progress. This also occurs with the new sync push waiters, and can result in thousands of spurious wakeups every second when under heavy direct reclaim pressure. To fix this, check that the tail LSN has actually changed on the AIL before triggering wakeups. This will reduce the number of spurious wakeups when doing bulk AIL removal and make this code much more efficient. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Brian Foster Reviewed-by: Allison Collins Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 87b8a7fb626340dd7d22b5361b0afc999b4ddf73 Author: Dave Chinner Date: Wed Oct 26 11:58:36 2022 +0530 xfs: factor common AIL item deletion code commit 4165994ac9672d91134675caa6de3645a9ace6c8 upstream. Factor the common AIL deletion code that does all the wakeups into a helper so we only have one copy of this somewhat tricky code to interface with all the wakeups necessary when the LSN of the log tail changes. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Allison Collins Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 4202b103d382f3e338e39608462386eeff45bc56 Author: Dave Chinner Date: Wed Oct 26 11:58:35 2022 +0530 xfs: Throttle commits on delayed background CIL push commit 0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b upstream. In certain situations the background CIL push can be indefinitely delayed. While we have workarounds from the obvious cases now, it doesn't solve the underlying issue. This issue is that there is no upper limit on the CIL where we will either force or wait for a background push to start, hence allowing the CIL to grow without bound until it consumes all log space. To fix this, add a new wait queue to the CIL which allows background pushes to wait for the CIL context to be switched out. This happens when the push starts, so it will allow us to block incoming transaction commit completion until the push has started. This will only affect processes that are running modifications, and only when the CIL threshold has been significantly overrun. This has no apparent impact on performance, and doesn't even trigger until over 45 million inodes had been created in a 16-way fsmark test on a 2GB log. That was limiting at 64MB of log space used, so the active CIL size is only about 3% of the total log in that case. The concurrent removal of those files did not trigger the background sleep at all. Signed-off-by: Dave Chinner Reviewed-by: Allison Collins Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 7a8f95bfb9e39d037229c6e2fbe8ea96ee6ca073 Author: Dave Chinner Date: Wed Oct 26 11:58:34 2022 +0530 xfs: Lower CIL flush limit for large logs commit 108a42358a05312b2128533c6462a3fdeb410bdf upstream. The current CIL size aggregation limit is 1/8th the log size. This means for large logs we might be aggregating at least 250MB of dirty objects in memory before the CIL is flushed to the journal. With CIL shadow buffers sitting around, this means the CIL is often consuming >500MB of temporary memory that is all allocated under GFP_NOFS conditions. Flushing the CIL can take some time to do if there is other IO ongoing, and can introduce substantial log force latency by itself. It also pins the memory until the objects are in the AIL and can be written back and reclaimed by shrinkers. Hence this threshold also tends to determine the minimum amount of memory XFS can operate in under heavy modification without triggering the OOM killer. Modify the CIL space limit to prevent such huge amounts of pinned metadata from aggregating. We can have 2MB of log IO in flight at once, so limit aggregation to 16x this size. This threshold was chosen as it little impact on performance (on 16-way fsmark) or log traffic but pins a lot less memory on large logs especially under heavy memory pressure. An aggregation limit of 8x had 5-10% performance degradation and a 50% increase in log throughput for the same workload, so clearly that was too small for highly concurrent workloads on large logs. This was found via trace analysis of AIL behaviour. e.g. insertion from a single CIL flush: xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL $ grep xfs_ail_insert /mnt/scratch/s.t |grep "new lsn 1/3033090" |wc -l 1721823 $ So there were 1.7 million objects inserted into the AIL from this CIL checkpoint, the first at 2323.392108, the last at 2325.667566 which was the end of the trace (i.e. it hadn't finished). Clearly a major problem. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Reviewed-by: Allison Collins Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit f43ff28b01834cabfbe587bec01a259db215e3ac Author: Darrick J. Wong Date: Wed Oct 26 11:58:33 2022 +0530 xfs: preserve default grace interval during quotacheck commit 5885539f0af371024d07afd14974bfdc3fff84c5 upstream. When quotacheck runs, it zeroes all the timer fields in every dquot. Unfortunately, it also does this to the root dquot, which erases any preconfigured grace intervals and warning limits that the administrator may have set. Worse yet, the incore copies of those variables remain set. This cache coherence problem manifests itself as the grace interval mysteriously being reset back to the defaults at the /next/ mount. Fix it by not resetting the root disk dquot's timer and warning fields. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Reviewed-by: Christoph Hellwig Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 553e5c8031f572848a03a033afff211d75a4dd1a Author: Brian Foster Date: Wed Oct 26 11:58:32 2022 +0530 xfs: fix unmount hang and memory leak on shutdown during quotaoff commit 8a62714313391b9b2297d67c341b35edbf46c279 upstream. AIL removal of the quotaoff start intent and free of both quotaoff intents is currently limited to the ->iop_committed() handler of the end intent. This executes when the end intent is committed to the on-disk log and marks the completion of the operation. The problem with this is it assumes the success of the operation. If a shutdown or other error occurs during the quotaoff, it's possible for the quotaoff task to exit without removing the start intent from the AIL. This results in an unmount hang as the AIL cannot be emptied. Further, no other codepath frees the intents and so this is also a memory leak vector. First, update the high level quotaoff error path to directly remove and free the quotaoff start intent if it still exists in the AIL at the time of the error. Next, update both of the start and end quotaoff intents with an ->iop_release() callback to properly handle transaction abort. This means that If the quotaoff start transaction aborts, it frees the start intent in the transaction commit path. If the filesystem shuts down before the end transaction allocates, the quotaoff sequence removes and frees the start intent. If the end transaction aborts, it removes the start intent and frees both. This ensures that a shutdown does not result in a hung unmount and that memory is not leaked regardless of when a quotaoff error occurs. Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 835306dd3f0cacd44a7c3d7b702d985b2373300d Author: Brian Foster Date: Wed Oct 26 11:58:31 2022 +0530 xfs: factor out quotaoff intent AIL removal and memory free commit 854f82b1f6039a418b7d1407513f8640e05fd73f upstream. AIL removal of the quotaoff start intent and free of both intents is hardcoded to the ->iop_committed() handler of the end intent. Factor out the start intent handling code so it can be used in a future patch to properly handle quotaoff errors. Use xfs_trans_ail_remove() instead of the _delete() variant to acquire the AIL lock and also handle cases where an intent might not reside in the AIL at the time of a failure. Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit a1e03f16001948b144a13ecf076e448d324a2db9 Author: Pavel Reichl Date: Wed Oct 26 11:58:30 2022 +0530 xfs: Replace function declaration by actual definition commit 1cc95e6f0d7cfd61c9d3c5cdd4e7345b173f764f upstream. Signed-off-by: Pavel Reichl Reviewed-by: Darrick J. Wong [darrick: fix typo in subject line] Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit fdce40c8fd92cdb561941bfbed2d4ea804f50a54 Author: Pavel Reichl Date: Wed Oct 26 11:58:29 2022 +0530 xfs: remove the xfs_qoff_logitem_t typedef commit d0bdfb106907e4a3ef4f25f6d27e392abf41f3a0 upstream. Signed-off-by: Pavel Reichl Reviewed-by: Darrick J. Wong [darrick: fix a comment] Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 926ddf7846ee9a1c61060b7a6e014829f60395a0 Author: Pavel Reichl Date: Wed Oct 26 11:58:28 2022 +0530 xfs: remove the xfs_dq_logitem_t typedef commit fd8b81dbbb23d4a3508cfac83256b4f5e770941c upstream. Signed-off-by: Pavel Reichl Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 80f78aa76a176740c8e87dbcfba49c4067e9fd47 Author: Pavel Reichl Date: Wed Oct 26 11:58:27 2022 +0530 xfs: remove the xfs_disk_dquot_t and xfs_dquot_t commit aefe69a45d84901c702f87672ec1e93de1d03f73 upstream. Signed-off-by: Pavel Reichl Reviewed-by: Darrick J. Wong [darrick: fix some of the comments] Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 4776ae328ccb568ac5023d7d0e7331b66bb03913 Author: Takashi Iwai Date: Wed Oct 26 11:58:26 2022 +0530 xfs: Use scnprintf() for avoiding potential buffer overflow commit 17bb60b74124e9491d593e2601e3afe14daa2f57 upstream. Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 2f55a03891543ffffceddfc6b2104af3cb79af92 Author: Darrick J. Wong Date: Wed Oct 26 11:58:25 2022 +0530 xfs: check owner of dir3 blocks commit 1b2c1a63b678d63e9c98314d44413f5af79c9c80 upstream. Check the owner field of dir3 block headers. If it's corrupt, release the buffer and return EFSCORRUPTED. All callers handle this properly. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 15b0651f383f738c2e1cabeec7c9862ebc04d11d Author: Darrick J. Wong Date: Wed Oct 26 11:58:24 2022 +0530 xfs: check owner of dir3 data blocks commit a10c21ed5d5241d11cf1d5a4556730840572900b upstream. [Slightly edit xfs_dir3_data_read() to work with existing mapped_bno argument instead of flag values introduced in later kernels] Check the owner field of dir3 data block headers. If it's corrupt, release the buffer and return EFSCORRUPTED. All callers handle this properly. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit bc013efdcf1705317e77801eee41d747fc0e461a Author: Darrick J. Wong Date: Wed Oct 26 11:58:23 2022 +0530 xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails commit ce99494c9699df58b31d0a839e957f86cd58c755 upstream. xfs_verifier_error is supposed to be called on a corrupt metadata buffer from within a buffer verifier function, whereas xfs_buf_mark_corrupt is the function to be called when a piece of code has read a buffer and catches something that a read verifier cannot. The first function sets b_error anticipating that the low level buffer handling code will see the nonzero b_error and clear XBF_DONE on the buffer, whereas the second function does not. Since xfs_dir3_free_header_check examines fields in the dir free block header that require more context than can be provided to read verifiers, we must call xfs_buf_mark_corrupt when it finds a problem. Switching the calls has a secondary effect that we no longer corrupt the buffer state by setting b_error and leaving XBF_DONE set. When /that/ happens, we'll trip over various state assertions (most commonly the b_error check in xfs_buf_reverify) on a subsequent attempt to read the buffer. Fixes: bc1a09b8e334bf5f ("xfs: refactor verifier callers to print address of failing check") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 6e204b9e67f397ac03247a4371257d99e72aa5ee Author: Darrick J. Wong Date: Wed Oct 26 11:58:22 2022 +0530 xfs: xfs_buf_corruption_error should take __this_address commit e83cf875d67a6cb9ddfaa8b45d2fa93d12b5c66f upstream. Add a xfs_failaddr_t parameter to this function so that callers can potentially pass in (and therefore report) the exact point in the code where we decided that a metadata buffer was corrupt. This enables us to wire it up to checking functions that have to run outside of verifiers. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 0213ee5f4c937890a7d4d86064b5be47ecb3ed3b Author: Darrick J. Wong Date: Wed Oct 26 11:58:21 2022 +0530 xfs: add a function to deal with corrupt buffers post-verifiers commit 8d57c21600a514d7a9237327c2496ae159bab5bb upstream. Add a helper function to get rid of buffers that we have decided are corrupt after the verifiers have run. This function is intended to handle metadata checks that can't happen in the verifiers, such as inter-block relationship checking. Note that we now mark the buffer stale so that it will not end up on any LRU and will be purged on release. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 3c88c3c00c9730e6d9044931417946d8ccb2c1c1 Author: Brian Foster Date: Wed Oct 26 11:58:20 2022 +0530 xfs: rework collapse range into an atomic operation commit 211683b21de959a647de74faedfdd8a5d189327e upstream. The collapse range operation uses a unique transaction and ilock cycle for the hole punch and each extent shift iteration of the overall operation. While the hole punch is safe as a separate operation due to the iolock, cycling the ilock after each extent shift is risky w.r.t. concurrent operations, similar to insert range. To avoid this problem, make collapse range atomic with respect to ilock. Hold the ilock across the entire operation, replace the individual transactions with a single rolling transaction sequence and finish dfops on each iteration to perform pending frees and roll the transaction. Remove the unnecessary quota reservation as collapse range can only ever merge extents (and thus remove extent records and potentially free bmap blocks). The dfops call automatically relogs the inode to keep it moving in the log. This guarantees that nothing else can change the extent mapping of an inode while a collapse range operation is in progress. Signed-off-by: Brian Foster Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 3602df3f1f5ffaf59cff842e904c42bfb0234d43 Author: Brian Foster Date: Wed Oct 26 11:58:19 2022 +0530 xfs: rework insert range into an atomic operation commit dd87f87d87fa4359a54e7b44549742f579e3e805 upstream. The insert range operation uses a unique transaction and ilock cycle for the extent split and each extent shift iteration of the overall operation. While this works, it is risks racing with other operations in subtle ways such as COW writeback modifying an extent tree in the middle of a shift operation. To avoid this problem, make insert range atomic with respect to ilock. Hold the ilock across the entire operation, replace the individual transactions with a single rolling transaction sequence and relog the inode to keep it moving in the log. This guarantees that nothing else can change the extent mapping of an inode while an insert range operation is in progress. Signed-off-by: Brian Foster Reviewed-by: Allison Collins Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 7cd181cb2333280d0365fbe04ebd0e20602b8877 Author: Brian Foster Date: Wed Oct 26 11:58:18 2022 +0530 xfs: open code insert range extent split helper commit b73df17e4c5ba977205253fb7ef54267717a3cba upstream. The insert range operation currently splits the extent at the target offset in a separate transaction and lock cycle from the one that shifts extents. In preparation for reworking insert range into an atomic operation, lift the code into the caller so it can be easily condensed to a single rolling transaction and lock cycle and eliminate the helper. No functional changes. Signed-off-by: Brian Foster Reviewed-by: Allison Collins Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Acked-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman