commit 3cfdd7252e00ecf3f3dc9508525acf1e67b7d343 Author: Greg Kroah-Hartman Date: Wed Aug 4 12:47:57 2021 +0200 Linux 5.13.8 Link: https://lore.kernel.org/r/20210802134344.028226640@linuxfoundation.org Tested-by: Justin M. Forbes Tested-by: Guenter Roeck Tested-by: Jon Hunter Tested-by: Linux Kernel Functional Testing Tested-by: Fox Chen Signed-off-by: Greg Kroah-Hartman commit d92d15c288413661e43542f0c2ac478e9d8e5de6 Author: Sunil Goutham Date: Thu Jul 22 18:15:51 2021 +0530 octeontx2-af: Remove unnecessary devm_kfree commit d72e91efcae12f2f24ced984d00d60517c677857 upstream. Remove devm_kfree of memory where VLAN entry to RVU PF mapping info is saved. This will be freed anyway at driver exit. Having this could result in warning from devm_kfree() if the memory is not allocated due to errors in rvu_nix_block_init() before nix_setup_txvlan(). Fixes: 9a946def264d ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries") Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 187463c4a262fb010d6b240c09cedfe780aac580 Author: John Garry Date: Tue Jul 20 23:10:19 2021 +0800 perf pmu: Fix alias matching commit c07d5c9226980ca5ae21c6a2714baa95be2ce164 upstream. Commit c47a5599eda324ba ("perf tools: Fix pattern matching for same substring in different PMU type"), may have fixed some alias matching, but has broken some others. Firstly it cannot handle the simple scenario of PMU name in form pmu_name{digits} - it can only handle pmu_name_{digits}. Secondly it cannot handle more complex matching in the case where we have multiple tokens. In this scenario, the code failed to realise that we may examine multiple substrings in the PMU name. Fix in two ways: - Change perf_pmu__valid_suffix() to accept a PMU name without '_' in the suffix - Only pay attention to perf_pmu__valid_suffix() for the final token Also add const qualifiers as necessary to avoid casting. Fixes: c47a5599eda324ba ("perf tools: Fix pattern matching for same substring in different PMU type") Signed-off-by: John Garry Tested-by: Jin Yao Cc: Alexander Shishkin Cc: Ian Rogers Cc: Jiri Olsa Cc: Kajol Jain Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lore.kernel.org/lkml/1626793819-79090-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 5e1fc537c1be332aef9621ca9146aeb3ba59522f Author: Oleksij Rempel Date: Wed Jul 14 13:16:02 2021 +0200 can: j1939: j1939_session_deactivate(): clarify lifetime of session object commit 0c71437dd50dd687c15d8ca80b3b68f10bb21d63 upstream. The j1939_session_deactivate() is decrementing the session ref-count and potentially can free() the session. This would cause use-after-free situation. However, the code calling j1939_session_deactivate() does always hold another reference to the session, so that it would not be free()ed in this code path. This patch adds a comment to make this clear and a WARN_ON, to ensure that future changes will not violate this requirement. Further this patch avoids dereferencing the session pointer as a precaution to avoid use-after-free if the session is actually free()ed. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/r/20210714111602.24021-1-o.rempel@pengutronix.de Reported-by: Xiaochen Zou Signed-off-by: Oleksij Rempel Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit f27deb33bbdbc9d990a3ce08c334c7747d3216f2 Author: Lukasz Cieplicki Date: Mon May 31 16:55:49 2021 +0000 i40e: Add additional info to PHY type error commit dc614c46178b0b89bde86ac54fc687a28580d2b7 upstream. In case of PHY type error occurs, the message was too generic. Add additional info to PHY type error indicating that it can be wrong cable connected. Fixes: 124ed15bf126 ("i40e: Add dual speed module support") Signed-off-by: Lukasz Cieplicki Signed-off-by: Michal Maloszewski Tested-by: Tony Brelinski Signed-off-by: Tony Nguyen Signed-off-by: Greg Kroah-Hartman commit e95d994d2f8fcff9549433fdcd569300fa3dd123 Author: Jens Axboe Date: Mon Jul 26 10:42:56 2021 -0600 io_uring: fix race in unified task_work running commit 110aa25c3ce417a44e35990cf8ed22383277933a upstream. We use a bit to manage if we need to add the shared task_work, but a list + lock for the pending work. Before aborting a current run of the task_work we check if the list is empty, but we do so without grabbing the lock that protects it. This can lead to races where we think we have nothing left to run, where in practice we could be racing with a task adding new work to the list. If we do hit that race condition, we could be left with work items that need processing, but the shared task_work is not active. Ensure that we grab the lock before checking if the list is empty, so we know if it's safe to exit the run or not. Link: https://lore.kernel.org/io-uring/c6bd5987-e9ae-cd02-49d0-1b3ac1ef65b1@tnonline.net/ Cc: stable@vger.kernel.org # 5.11+ Reported-by: Forza Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit ee6d50cb1c239e15b6b6ac486fc6ed844955d441 Author: Arnaldo Carvalho de Melo Date: Fri Jul 30 18:26:22 2021 -0300 Revert "perf map: Fix dso->nsinfo refcounting" commit 9bac1bd6e6d36459087a728a968e79e37ebcea1a upstream. This makes 'perf top' abort in some cases, and the right fix will involve surgery that is too much to do at this stage, so revert for now and fix it in the next merge window. This reverts commit 2d6b74baa7147251c30a46c4996e8cc224aa2dc5. Cc: Riccardo Mancini Cc: Ian Rogers Cc: Jiri Olsa Cc: Krister Johansen Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 9755a447ec42dacef80c201039935dbcb57e88b0 Author: Srikar Dronamraju Date: Thu Jul 29 11:34:49 2021 +0530 powerpc/pseries: Fix regression while building external modules commit 333cf507465fbebb3727f5b53e77538467df312a upstream. With commit c9f3401313a5 ("powerpc: Always enable queued spinlocks for 64s, disable for others") CONFIG_PPC_QUEUED_SPINLOCKS is always enabled on ppc64le, external modules that use spinlock APIs are failing. ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor' Before the above commit, modules were able to build without any issues. Also this problem is not seen on other architectures. This problem can be workaround if CONFIG_UNINLINE_SPIN_UNLOCK is enabled in the config. However CONFIG_UNINLINE_SPIN_UNLOCK is not enabled by default and only enabled in certain conditions like CONFIG_DEBUG_SPINLOCKS is set in the kernel config. #include spinlock_t spLock; static int __init spinlock_test_init(void) { spin_lock_init(&spLock); spin_lock(&spLock); spin_unlock(&spLock); return 0; } static void __exit spinlock_test_exit(void) { printk("spinlock_test unloaded\n"); } module_init(spinlock_test_init); module_exit(spinlock_test_exit); MODULE_DESCRIPTION ("spinlock_test"); MODULE_LICENSE ("non-GPL"); MODULE_AUTHOR ("Srikar Dronamraju"); Given that spin locks are one of the basic facilities for module code, this effectively makes it impossible to build/load almost any non GPL modules on ppc64le. This was first reported at https://github.com/openzfs/zfs/issues/11172 Currently shared_processor is exported as GPL only symbol. Fix this for parity with other architectures by exposing shared_processor to non-GPL modules too. Fixes: 14c73bd344da ("powerpc/vcpu: Assume dedicated processors as non-preempt") Cc: stable@vger.kernel.org # v5.5+ Reported-by: marc.c.dionne@gmail.com Signed-off-by: Srikar Dronamraju Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210729060449.292780-1-srikar@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman commit c73256979654a8dfd707e26a7abee840f66c4b1d Author: Michael Ellerman Date: Thu Jul 29 22:56:36 2021 +1000 powerpc/vdso: Don't use r30 to avoid breaking Go lang commit a88603f4b92ecef9e2359e40bcb99ad399d85dd7 upstream. The Go runtime uses r30 for some special value called 'g'. It assumes that value will remain unchanged even when calling VDSO functions. Although r30 is non-volatile across function calls, the callee is free to use it, as long as the callee saves the value and restores it before returning. It used to be true by accident that the VDSO didn't use r30, because the VDSO was hand-written asm. When we switched to building the VDSO from C the compiler started using r30, at least in some builds, leading to crashes in Go. eg: ~/go/src$ ./all.bash Building Go cmd/dist using /usr/lib/go-1.16. (go1.16.2 linux/ppc64le) Building Go toolchain1 using /usr/lib/go-1.16. go build os/exec: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: segmentation fault go build reflect: /usr/lib/go-1.16/pkg/tool/linux_ppc64le/compile: signal: segmentation fault go tool dist: FAILED: /usr/lib/go-1.16/bin/go install -gcflags=-l -tags=math_big_pure_go compiler_bootstrap bootstrap/cmd/...: exit status 1 There are patches in flight to fix Go[1], but until they are released and widely deployed we can workaround it in the VDSO by avoiding use of r30. Note this only works with GCC, clang does not support -ffixed-rN. 1: https://go-review.googlesource.com/c/go/+/328110 Fixes: ab037dd87a2f ("powerpc/vdso: Switch VDSO to generic C implementation.") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Paul Menzel Tested-by: Paul Menzel Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210729131244.2595519-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman commit 52e9158959d287b68e8fed9fd2f03d6c850263d4 Author: Steve French Date: Fri Jul 23 18:35:15 2021 -0500 SMB3: fix readpage for large swap cache commit f2a26a3cff27dfa456fef386fe5df56dcb4b47b6 upstream. readpage was calculating the offset of the page incorrectly for the case of large swapcaches. loff_t offset = (loff_t)page->index << PAGE_SHIFT; As pointed out by Matthew Wilcox, this needs to use page_file_offset() to calculate the offset instead. Pages coming from the swap cache have page->index set to their index within the swapcache, not within the backing file. For a sufficiently large swapcache, we could have overlapping values of page->index within the same backing file. Suggested by: Matthew Wilcox (Oracle) Cc: # v5.7+ Reviewed-by: Ronnie Sahlberg Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 19f60bf08ee3734f78b3cd5eb0b4b53acb553d6b Author: Daniel Borkmann Date: Fri Jul 16 09:18:21 2021 +0000 bpf: Fix pointer arithmetic mask tightening under state pruning commit e042aa532c84d18ff13291d00620502ce7a38dda upstream. In 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask") we narrowed the offset mask for unprivileged pointer arithmetic in order to mitigate a corner case where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of- bounds in order to leak kernel memory via side-channel to user space. The verifier's state pruning for scalars leaves one corner case open where in the first verification path R_x holds an unknown scalar with an aux->alu_limit of e.g. 7, and in a second verification path that same register R_x, here denoted as R_x', holds an unknown scalar which has tighter bounds and would thus satisfy range_within(R_x, R_x') as well as tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3: Given the second path fits the register constraints for pruning, the final generated mask from aux->alu_limit will remain at 7. While technically not wrong for the non-speculative domain, it would however be possible to craft similar cases where the mask would be too wide as in 7fedb63a8307. One way to fix it is to detect the presence of unknown scalar map pointer arithmetic and force a deeper search on unknown scalars to ensure that we do not run into a masking mismatch. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 8595837e9df51e5f5af78d748bc16ce5dec10edd Author: Lorenz Bauer Date: Thu Apr 29 14:46:56 2021 +0100 bpf: verifier: Allocate idmap scratch in verifier env commit c9e73e3d2b1eb1ea7ff068e05007eec3bd8ef1c9 upstream. func_states_equal makes a very short lived allocation for idmap, probably because it's too large to fit on the stack. However the function is called quite often, leading to a lot of alloc / free churn. Replace the temporary allocation with dedicated scratch space in struct bpf_verifier_env. Signed-off-by: Lorenz Bauer Signed-off-by: Alexei Starovoitov Acked-by: Edward Cree Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com Signed-off-by: Greg Kroah-Hartman commit 738ab7d5e554c9edba38344bdc8ffd03040a4f41 Author: Daniel Borkmann Date: Tue Jun 29 09:39:15 2021 +0000 bpf: Remove superfluous aux sanitation on subprog rejection commit 59089a189e3adde4cf85f2ce479738d1ae4c514d upstream. Follow-up to fe9a5ca7e370 ("bpf: Do not mark insn as seen under speculative path verification"). The sanitize_insn_aux_data() helper does not serve a particular purpose in today's code. The original intention for the helper was that if function-by-function verification fails, a given program would be cleared from temporary insn_aux_data[], and then its verification would be re-attempted in the context of the main program a second time. However, a failure in do_check_subprogs() will skip do_check_main() and propagate the error to the user instead, thus such situation can never occur. Given its interaction is not compatible to the Spectre v1 mitigation (due to comparing aux->seen with env->pass_cnt), just remove sanitize_insn_aux_data() to avoid future bugs in this area. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 0b27bdf02c400684225ee5ee99970bcbf5082282 Author: Daniel Borkmann Date: Tue Jul 13 08:18:31 2021 +0000 bpf: Fix leakage due to insufficient speculative store bypass mitigation [ Upstream commit 2039f26f3aca5b0e419b98f65dd36481337b86ee ] Spectre v4 gadgets make use of memory disambiguation, which is a set of techniques that execute memory access instructions, that is, loads and stores, out of program order; Intel's optimization manual, section 2.4.4.5: A load instruction micro-op may depend on a preceding store. Many microarchitectures block loads until all preceding store addresses are known. The memory disambiguator predicts which loads will not depend on any previous stores. When the disambiguator predicts that a load does not have such a dependency, the load takes its data from the L1 data cache. Eventually, the prediction is verified. If an actual conflict is detected, the load and all succeeding instructions are re-executed. af86ca4e3088 ("bpf: Prevent memory disambiguation attack") tried to mitigate this attack by sanitizing the memory locations through preemptive "fast" (low latency) stores of zero prior to the actual "slow" (high latency) store of a pointer value such that upon dependency misprediction the CPU then speculatively executes the load of the pointer value and retrieves the zero value instead of the attacker controlled scalar value previously stored at that location, meaning, subsequent access in the speculative domain is then redirected to the "zero page". The sanitized preemptive store of zero prior to the actual "slow" store is done through a simple ST instruction based on r10 (frame pointer) with relative offset to the stack location that the verifier has been tracking on the original used register for STX, which does not have to be r10. Thus, there are no memory dependencies for this store, since it's only using r10 and immediate constant of zero; hence af86ca4e3088 /assumed/ a low latency operation. However, a recent attack demonstrated that this mitigation is not sufficient since the preemptive store of zero could also be turned into a "slow" store and is thus bypassed as well: [...] // r2 = oob address (e.g. scalar) // r7 = pointer to map value 31: (7b) *(u64 *)(r10 -16) = r2 // r9 will remain "fast" register, r10 will become "slow" register below 32: (bf) r9 = r10 // JIT maps BPF reg to x86 reg: // r9 -> r15 (callee saved) // r10 -> rbp // train store forward prediction to break dependency link between both r9 // and r10 by evicting them from the predictor's LRU table. 33: (61) r0 = *(u32 *)(r7 +24576) 34: (63) *(u32 *)(r7 +29696) = r0 35: (61) r0 = *(u32 *)(r7 +24580) 36: (63) *(u32 *)(r7 +29700) = r0 37: (61) r0 = *(u32 *)(r7 +24584) 38: (63) *(u32 *)(r7 +29704) = r0 39: (61) r0 = *(u32 *)(r7 +24588) 40: (63) *(u32 *)(r7 +29708) = r0 [...] 543: (61) r0 = *(u32 *)(r7 +25596) 544: (63) *(u32 *)(r7 +30716) = r0 // prepare call to bpf_ringbuf_output() helper. the latter will cause rbp // to spill to stack memory while r13/r14/r15 (all callee saved regs) remain // in hardware registers. rbp becomes slow due to push/pop latency. below is // disasm of bpf_ringbuf_output() helper for better visual context: // // ffffffff8117ee20: 41 54 push r12 // ffffffff8117ee22: 55 push rbp // ffffffff8117ee23: 53 push rbx // ffffffff8117ee24: 48 f7 c1 fc ff ff ff test rcx,0xfffffffffffffffc // ffffffff8117ee2b: 0f 85 af 00 00 00 jne ffffffff8117eee0 <-- jump taken // [...] // ffffffff8117eee0: 49 c7 c4 ea ff ff ff mov r12,0xffffffffffffffea // ffffffff8117eee7: 5b pop rbx // ffffffff8117eee8: 5d pop rbp // ffffffff8117eee9: 4c 89 e0 mov rax,r12 // ffffffff8117eeec: 41 5c pop r12 // ffffffff8117eeee: c3 ret 545: (18) r1 = map[id:4] 547: (bf) r2 = r7 548: (b7) r3 = 0 549: (b7) r4 = 4 550: (85) call bpf_ringbuf_output#194288 // instruction 551 inserted by verifier \ 551: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here // storing map value pointer r7 at fp-16 | since value of r10 is "slow". 552: (7b) *(u64 *)(r10 -16) = r7 / // following "fast" read to the same memory location, but due to dependency // misprediction it will speculatively execute before insn 551/552 completes. 553: (79) r2 = *(u64 *)(r9 -16) // in speculative domain contains attacker controlled r2. in non-speculative // domain this contains r7, and thus accesses r7 +0 below. 554: (71) r3 = *(u8 *)(r2 +0) // leak r3 As can be seen, the current speculative store bypass mitigation which the verifier inserts at line 551 is insufficient since /both/, the write of the zero sanitation as well as the map value pointer are a high latency instruction due to prior memory access via push/pop of r10 (rbp) in contrast to the low latency read in line 553 as r9 (r15) which stays in hardware registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally, fp-16 can still be r2. Initial thoughts to address this issue was to track spilled pointer loads from stack and enforce their load via LDX through r10 as well so that /both/ the preemptive store of zero /as well as/ the load use the /same/ register such that a dependency is created between the store and load. However, this option is not sufficient either since it can be bypassed as well under speculation. An updated attack with pointer spill/fills now _all_ based on r10 would look as follows: [...] // r2 = oob address (e.g. scalar) // r7 = pointer to map value [...] // longer store forward prediction training sequence than before. 2062: (61) r0 = *(u32 *)(r7 +25588) 2063: (63) *(u32 *)(r7 +30708) = r0 2064: (61) r0 = *(u32 *)(r7 +25592) 2065: (63) *(u32 *)(r7 +30712) = r0 2066: (61) r0 = *(u32 *)(r7 +25596) 2067: (63) *(u32 *)(r7 +30716) = r0 // store the speculative load address (scalar) this time after the store // forward prediction training. 2068: (7b) *(u64 *)(r10 -16) = r2 // preoccupy the CPU store port by running sequence of dummy stores. 2069: (63) *(u32 *)(r7 +29696) = r0 2070: (63) *(u32 *)(r7 +29700) = r0 2071: (63) *(u32 *)(r7 +29704) = r0 2072: (63) *(u32 *)(r7 +29708) = r0 2073: (63) *(u32 *)(r7 +29712) = r0 2074: (63) *(u32 *)(r7 +29716) = r0 2075: (63) *(u32 *)(r7 +29720) = r0 2076: (63) *(u32 *)(r7 +29724) = r0 2077: (63) *(u32 *)(r7 +29728) = r0 2078: (63) *(u32 *)(r7 +29732) = r0 2079: (63) *(u32 *)(r7 +29736) = r0 2080: (63) *(u32 *)(r7 +29740) = r0 2081: (63) *(u32 *)(r7 +29744) = r0 2082: (63) *(u32 *)(r7 +29748) = r0 2083: (63) *(u32 *)(r7 +29752) = r0 2084: (63) *(u32 *)(r7 +29756) = r0 2085: (63) *(u32 *)(r7 +29760) = r0 2086: (63) *(u32 *)(r7 +29764) = r0 2087: (63) *(u32 *)(r7 +29768) = r0 2088: (63) *(u32 *)(r7 +29772) = r0 2089: (63) *(u32 *)(r7 +29776) = r0 2090: (63) *(u32 *)(r7 +29780) = r0 2091: (63) *(u32 *)(r7 +29784) = r0 2092: (63) *(u32 *)(r7 +29788) = r0 2093: (63) *(u32 *)(r7 +29792) = r0 2094: (63) *(u32 *)(r7 +29796) = r0 2095: (63) *(u32 *)(r7 +29800) = r0 2096: (63) *(u32 *)(r7 +29804) = r0 2097: (63) *(u32 *)(r7 +29808) = r0 2098: (63) *(u32 *)(r7 +29812) = r0 // overwrite scalar with dummy pointer; same as before, also including the // sanitation store with 0 from the current mitigation by the verifier. 2099: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here 2100: (7b) *(u64 *)(r10 -16) = r7 | since store unit is still busy. // load from stack intended to bypass stores. 2101: (79) r2 = *(u64 *)(r10 -16) 2102: (71) r3 = *(u8 *)(r2 +0) // leak r3 [...] Looking at the CPU microarchitecture, the scheduler might issue loads (such as seen in line 2101) before stores (line 2099,2100) because the load execution units become available while the store execution unit is still busy with the sequence of dummy stores (line 2069-2098). And so the load may use the prior stored scalar from r2 at address r10 -16 for speculation. The updated attack may work less reliable on CPU microarchitectures where loads and stores share execution resources. This concludes that the sanitizing with zero stores from af86ca4e3088 ("bpf: Prevent memory disambiguation attack") is insufficient. Moreover, the detection of stack reuse from af86ca4e3088 where previously data (STACK_MISC) has been written to a given stack slot where a pointer value is now to be stored does not have sufficient coverage as precondition for the mitigation either; for several reasons outlined as follows: 1) Stack content from prior program runs could still be preserved and is therefore not "random", best example is to split a speculative store bypass attack between tail calls, program A would prepare and store the oob address at a given stack slot and then tail call into program B which does the "slow" store of a pointer to the stack with subsequent "fast" read. From program B PoV such stack slot type is STACK_INVALID, and therefore also must be subject to mitigation. 2) The STACK_SPILL must not be coupled to register_is_const(&stack->spilled_ptr) condition, for example, the previous content of that memory location could also be a pointer to map or map value. Without the fix, a speculative store bypass is not mitigated in such precondition and can then lead to a type confusion in the speculative domain leaking kernel memory near these pointer types. While brainstorming on various alternative mitigation possibilities, we also stumbled upon a retrospective from Chrome developers [0]: [...] For variant 4, we implemented a mitigation to zero the unused memory of the heap prior to allocation, which cost about 1% when done concurrently and 4% for scavenging. Variant 4 defeats everything we could think of. We explored more mitigations for variant 4 but the threat proved to be more pervasive and dangerous than we anticipated. For example, stack slots used by the register allocator in the optimizing compiler could be subject to type confusion, leading to pointer crafting. Mitigating type confusion for stack slots alone would have required a complete redesign of the backend of the optimizing compiler, perhaps man years of work, without a guarantee of completeness. [...] From BPF side, the problem space is reduced, however, options are rather limited. One idea that has been explored was to xor-obfuscate pointer spills to the BPF stack: [...] // preoccupy the CPU store port by running sequence of dummy stores. [...] 2106: (63) *(u32 *)(r7 +29796) = r0 2107: (63) *(u32 *)(r7 +29800) = r0 2108: (63) *(u32 *)(r7 +29804) = r0 2109: (63) *(u32 *)(r7 +29808) = r0 2110: (63) *(u32 *)(r7 +29812) = r0 // overwrite scalar with dummy pointer; xored with random 'secret' value // of 943576462 before store ... 2111: (b4) w11 = 943576462 2112: (af) r11 ^= r7 2113: (7b) *(u64 *)(r10 -16) = r11 2114: (79) r11 = *(u64 *)(r10 -16) 2115: (b4) w2 = 943576462 2116: (af) r2 ^= r11 // ... and restored with the same 'secret' value with the help of AX reg. 2117: (71) r3 = *(u8 *)(r2 +0) [...] While the above would not prevent speculation, it would make data leakage infeasible by directing it to random locations. In order to be effective and prevent type confusion under speculation, such random secret would have to be regenerated for each store. The additional complexity involved for a tracking mechanism that prevents jumps such that restoring spilled pointers would not get corrupted is not worth the gain for unprivileged. Hence, the fix in here eventually opted for emitting a non-public BPF_ST | BPF_NOSPEC instruction which the x86 JIT translates into a lfence opcode. Inserting the latter in between the store and load instruction is one of the mitigations options [1]. The x86 instruction manual notes: [...] An LFENCE that follows an instruction that stores to memory might complete before the data being stored have become globally visible. [...] The latter meaning that the preceding store instruction finished execution and the store is at minimum guaranteed to be in the CPU's store queue, but it's not guaranteed to be in that CPU's L1 cache at that point (globally visible). The latter would only be guaranteed via sfence. So the load which is guaranteed to execute after the lfence for that local CPU would have to rely on store-to-load forwarding. [2], in section 2.3 on store buffers says: [...] For every store operation that is added to the ROB, an entry is allocated in the store buffer. This entry requires both the virtual and physical address of the target. Only if there is no free entry in the store buffer, the frontend stalls until there is an empty slot available in the store buffer again. Otherwise, the CPU can immediately continue adding subsequent instructions to the ROB and execute them out of order. On Intel CPUs, the store buffer has up to 56 entries. [...] One small upside on the fix is that it lifts constraints from af86ca4e3088 where the sanitize_stack_off relative to r10 must be the same when coming from different paths. The BPF_ST | BPF_NOSPEC gets emitted after a BPF_STX or BPF_ST instruction. This happens either when we store a pointer or data value to the BPF stack for the first time, or upon later pointer spills. The former needs to be enforced since otherwise stale stack data could be leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST | BPF_NOSPEC mapping is currently optimized away, but others could emit a speculation barrier as well if necessary. For real-world unprivileged programs e.g. generated by LLVM, pointer spill/fill is only generated upon register pressure and LLVM only tries to do that for pointers which are not used often. The program main impact will be the initial BPF_ST | BPF_NOSPEC sanitation for the STACK_INVALID case when the first write to a stack slot occurs e.g. upon map lookup. In future we might refine ways to mitigate the latter cost. [0] https://arxiv.org/pdf/1902.05178.pdf [1] https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/ [2] https://arxiv.org/pdf/1905.05725.pdf Fixes: af86ca4e3088 ("bpf: Prevent memory disambiguation attack") Fixes: f7cf25b2026d ("bpf: track spill/fill of constants") Co-developed-by: Piotr Krysiuk Co-developed-by: Benedict Schlueter Signed-off-by: Daniel Borkmann Signed-off-by: Piotr Krysiuk Signed-off-by: Benedict Schlueter Acked-by: Alexei Starovoitov Signed-off-by: Sasha Levin commit ddab060f996e17b38bb181c5fd11a83fd1bfa0df Author: Daniel Borkmann Date: Tue Jul 13 08:18:31 2021 +0000 bpf: Introduce BPF nospec instruction for mitigating Spectre v4 [ Upstream commit f5e81d1117501546b7be050c5fbafa6efd2c722c ] In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk Co-developed-by: Benedict Schlueter Signed-off-by: Daniel Borkmann Signed-off-by: Piotr Krysiuk Signed-off-by: Benedict Schlueter Acked-by: Alexei Starovoitov Signed-off-by: Sasha Levin commit 9ec54436991f443cbe75882dfca8ff9801f754f8 Author: Dan Carpenter Date: Thu Jul 29 17:12:46 2021 +0300 can: hi311x: fix a signedness bug in hi3110_cmd() [ Upstream commit f6b3c7848e66e9046c8a79a5b88fd03461cc252b ] The hi3110_cmd() is supposed to return zero on success and negative error codes on failure, but it was accidentally declared as a u8 when it needs to be an int type. Fixes: 57e83fb9b746 ("can: hi311x: Add Holt HI-311x CAN driver") Link: https://lore.kernel.org/r/20210729141246.GA1267@kili Signed-off-by: Dan Carpenter Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit dda5c13325f1a0b05905c72e9f45b53e4022724c Author: Wang Hai Date: Wed Jul 28 20:11:07 2021 +0800 sis900: Fix missing pci_disable_device() in probe and remove [ Upstream commit 89fb62fde3b226f99b7015280cf132e2a7438edf ] Replace pci_enable_device() with pcim_enable_device(), pci_disable_device() and pci_release_regions() will be called in release automatically. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot Signed-off-by: Wang Hai Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 6cbc642e6f82c87198146bc9b35f01b720b36f36 Author: Wang Hai Date: Wed Jul 28 15:43:13 2021 +0800 tulip: windbond-840: Fix missing pci_disable_device() in probe and remove [ Upstream commit 76a16be07b209a3f507c72abe823bd3af1c8661a ] Replace pci_enable_device() with pcim_enable_device(), pci_disable_device() and pci_release_regions() will be called in release automatically. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot Signed-off-by: Wang Hai Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit c7d5458d5589e226a7ac7ef33bb7296f06e1db66 Author: Marcelo Ricardo Leitner Date: Tue Jul 27 23:40:54 2021 -0300 sctp: fix return value check in __sctp_rcv_asconf_lookup [ Upstream commit 557fb5862c9272ad9b21407afe1da8acfd9b53eb ] As Ben Hutchings noticed, this check should have been inverted: the call returns true in case of success. Reported-by: Ben Hutchings Fixes: 0c5dc070ff3d ("sctp: validate from_addr_param return") Signed-off-by: Marcelo Ricardo Leitner Reviewed-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit fc553003e36191f0959e33006e17736baba0a2e2 Author: Christoph Hellwig Date: Thu Jul 22 09:53:54 2021 +0200 block: delay freeing the gendisk [ Upstream commit 340e84573878b2b9d63210482af46883366361b9 ] blkdev_get_no_open acquires a reference to the block_device through the block device inode and then tries to acquire a device model reference to the gendisk. But at this point the disk migh already be freed (although the race is free). Fix this by only freeing the gendisk from the whole device bdevs ->free_inode callback as well. Fixes: 22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get") Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20210722075402.983367-2-hch@lst.de Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit 926fa6598cddfe73877e1f2ab2cdee68cb3e79a6 Author: Chris Mi Date: Mon Apr 26 11:06:37 2021 +0800 net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32 [ Upstream commit 740452e09cf5fc489ce60831cf11abef117b5d26 ] The offending refactor commit uses u16 chain wrongly. Actually, it should be u32. Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure") CC: Ariel Levkovich Signed-off-by: Chris Mi Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 601c356d1e0a0a2ff80f74fdaf63342de7f76eef Author: Dima Chumak Date: Mon Apr 26 15:16:26 2021 +0300 net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() [ Upstream commit b1c2f6312c5005c928a72e668bf305a589d828d4 ] The result of __dev_get_by_index() is not checked for NULL and then gets dereferenced immediately. Also, __dev_get_by_index() must be called while holding either RTNL lock or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or its callers. This makes the underlying hlist_for_each_entry() loop not safe, and can have adverse effects in itself. Fix by using dev_get_by_index() and handling nullptr return value when ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to check for possible PTR_ERR() result. Fixes: 77ab67b7f0f9 ("net/mlx5e: Basic setup of hairpin object") Addresses-Coverity: ("Dereference null return value") Signed-off-by: Dima Chumak Reviewed-by: Vlad Buslov Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 6b35ae3f6b429812d2c265162d6fb8f2df1300e5 Author: Aya Levin Date: Wed Jun 16 19:11:03 2021 +0300 net/mlx5: Unload device upon firmware fatal error [ Upstream commit 7f331bf0f060c2727e36d64f9b098b4ee5f3dfad ] When fw_fatal reporter reports an error, the firmware in not responding. Unload the device to ensure that the driver closes all its resources, even if recovery is not due (user disabled auto-recovery or reporter is in grace period). On successful recovery the device is loaded back up. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 9bf4345430b4ae08a5e1fa2662ac69a0298561a7 Author: Aya Levin Date: Tue Jun 22 10:24:17 2021 +0300 net/mlx5e: Fix page allocation failure for ptp-RQ over SF [ Upstream commit 678b1ae1af4aef488fcc42baa663e737b9a531ba ] Set the correct pci-device pointer to the ptp-RQ. This allows access to dma_mask and avoids allocation request with wrong pci-device. Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel") Signed-off-by: Aya Levin Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 09f2d23a618e107b55b4891a6952574528447f93 Author: Aya Levin Date: Mon Jun 21 18:04:07 2021 +0300 net/mlx5e: Fix page allocation failure for trap-RQ over SF [ Upstream commit 497008e783452a2ec45c7ec5835cfe6950dcb097 ] Set the correct device pointer to the trap-RQ, to allow access to dma_mask and avoid allocation request with the wrong pci-dev. WARNING: CPU: 1 PID: 12005 at kernel/dma/mapping.c:151 dma_map_page_attrs+0x139/0x1c0 ... all Trace: ? __page_pool_alloc_pages_slow+0x5a/0x210 mlx5e_post_rx_wqes+0x258/0x400 [mlx5_core] mlx5e_trap_napi_poll+0x44/0xc0 [mlx5_core] __napi_poll+0x24/0x150 net_rx_action+0x22b/0x280 __do_softirq+0xc7/0x27e do_softirq+0x61/0x80 __local_bh_enable_ip+0x4b/0x50 mlx5e_handle_action_trap+0x2dd/0x4d0 [mlx5_core] blocking_notifier_call_chain+0x5a/0x80 mlx5_devlink_trap_action_set+0x8b/0x100 [mlx5_core] Fixes: 5543e989fe5e ("net/mlx5e: Add trap entity to ETH driver") Signed-off-by: Aya Levin Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit e6eaea0980ee1e9565557cae505cd3f51aaa3f3b Author: Maxim Mikityanskiy Date: Wed Jun 30 13:33:31 2021 +0300 net/mlx5e: Add NETIF_F_HW_TC to hw_features when HTB offload is available [ Upstream commit 9841d58f3550d11c6181424427e8ad8c9c80f1b6 ] If a feature flag is only present in features, but not in hw_features, the user can't reset it. Although hw_features may contain NETIF_F_HW_TC by the point where the driver checks whether HTB offload is supported, this flag is controlled by another condition that may not hold. Set it explicitly to make sure the user can disable it. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit b0ba8a145d8db5080e4dc4d986fa114956335bf9 Author: Tariq Toukan Date: Wed Jun 30 13:45:05 2021 +0300 net/mlx5e: RX, Avoid possible data corruption when relaxed ordering and LRO combined [ Upstream commit e2351e517068718724f1d3b4010e2a41ec91fa76 ] When HW aggregates packets for an LRO session, it writes the payload of two consecutive packets of a flow contiguously, so that they usually share a cacheline. The first byte of a packet's payload is written immediately after the last byte of the preceding packet. In this flow, there are two consecutive write requests to the shared cacheline: 1. Regular write for the earlier packet. 2. Read-modify-write for the following packet. In case of relaxed-ordering on, these two writes might be re-ordered. Using the end padding optimization (to avoid partial write for the last cacheline of a packet) becomes problematic if the two writes occur out-of-order, as the padding would overwrite payload that belongs to the following packet, causing data corruption. Avoid this by disabling the end padding optimization when both LRO and relaxed-ordering are enabled. Fixes: 17347d5430c4 ("net/mlx5e: Add support for PCI relaxed ordering") Signed-off-by: Tariq Toukan Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 4d253ea99fbac316cbe4bafaf36087b592c2cd49 Author: Roi Dayan Date: Wed Jun 2 14:17:07 2021 +0300 net/mlx5: E-Switch, handle devcom events only for ports on the same device [ Upstream commit dd3fddb82780bfa24124834edd90bbc63bd689cc ] This is the same check as LAG mode checks if to enable lag. This will fix adding peer miss rules if lag is not supported and even an incorrect rules in socket direct mode. Also fix the incorrect comment on mlx5_get_next_phys_dev() as flow #1 doesn't exists. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Roi Dayan Reviewed-by: Maor Dickman Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 0b26a4e2d5dd23775bc484b7eeacd8cb9d9266d9 Author: Maor Dickman Date: Tue Jun 22 17:07:02 2021 +0300 net/mlx5: E-Switch, Set destination vport vhca id only when merged eswitch is supported [ Upstream commit c671972534c6f7fce789ac8156a2bc3bd146f806 ] Destination vport vhca id is valid flag is set only merged eswitch isn't supported. Change destination vport vhca id value to be set also only when merged eswitch is supported. Fixes: e4ad91f23f10 ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring") Signed-off-by: Maor Dickman Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 1dc7f1219c13bc6e07513ff6ca1f1a0fbec64abf Author: Maor Dickman Date: Thu Jul 8 15:24:58 2021 +0300 net/mlx5e: Disable Rx ntuple offload for uplink representor [ Upstream commit 90b22b9bcd242a3ba238f2c6f7eab771799001f8 ] Rx ntuple offload is not supported in switchdev mode. Tryng to enable it cause kernel panic. BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000001065a5067 P4D 80000001065a5067 PUD 106594067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 1089 Comm: ethtool Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_23_16_44 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_arfs_enable+0x70/0xd0 [mlx5_core] Code: 44 24 10 00 00 00 00 48 c7 44 24 18 00 00 00 00 49 63 c4 48 89 e2 44 89 e6 48 69 c0 20 08 00 00 48 89 ef 48 03 85 68 ac 00 00 <48> 8b 40 08 48 89 44 24 08 e8 d2 aa fd ff 48 83 05 82 96 18 00 01 RSP: 0018:ffff8881047679e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000004000000000 RCX: 0000004000000000 RDX: ffff8881047679e0 RSI: 0000000000000000 RDI: ffff888115100880 RBP: ffff888115100880 R08: ffffffffa00f6cb0 R09: ffff888104767a18 R10: ffff8881151000a0 R11: ffff888109479540 R12: 0000000000000000 R13: ffff888104767bb8 R14: ffff888115100000 R15: ffff8881151000a0 FS: 00007f41a64ab740(0000) GS:ffff8882f5dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000104cbc005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: set_feature_arfs+0x1e/0x40 [mlx5_core] mlx5e_handle_feature+0x43/0xa0 [mlx5_core] mlx5e_set_features+0x139/0x1b0 [mlx5_core] __netdev_update_features+0x2b3/0xaf0 ethnl_set_features+0x176/0x3a0 ? __nla_parse+0x22/0x30 genl_family_rcv_msg_doit+0xe2/0x140 genl_rcv_msg+0xde/0x1d0 ? features_reply_size+0xe0/0xe0 ? genl_get_cmd+0xd0/0xd0 netlink_rcv_skb+0x4e/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2b0 netlink_sendmsg+0x225/0x450 sock_sendmsg+0x33/0x40 __sys_sendto+0xd4/0x120 ? __sys_recvmsg+0x4e/0x90 ? exc_page_fault+0x219/0x740 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f41a65b0cba Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007ffd8d688358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000010f42a0 RCX: 00007f41a65b0cba RDX: 0000000000000058 RSI: 00000000010f43b0 RDI: 0000000000000003 RBP: 000000000047ae60 R08: 00007f41a667c000 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 00000000010f4340 R13: 00000000010f4350 R14: 00007ffd8d688400 R15: 00000000010f42a0 Modules linked in: mlx5_vdpa vhost_iotlb vdpa xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core ptp pps_core fuse CR2: 0000000000000008 ---[ end trace c66523f2aba94b43 ]--- Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Maor Dickman Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 644c3c58ec77cdbdd28e84a7947201f06d8e58d8 Author: Maor Gottlieb Date: Mon Jul 26 09:20:14 2021 +0300 net/mlx5: Fix flow table chaining [ Upstream commit 8b54874ef1617185048029a3083d510569e93751 ] Fix a bug when flow table is created in priority that already has other flow tables as shown in the below diagram. If the new flow table (FT-B) has the lowest level in the priority, we need to connect the flow tables from the previous priority (p0) to this new table. In addition when this flow table is destroyed (FT-B), we need to connect the flow tables from the previous priority (p0) to the next level flow table (FT-C) in the same priority of the destroyed table (if exists). --------- |root_ns| --------- | -------------------------------- | | | ---------- ---------- --------- |p(prio)-x| | p-y | | p-n | ---------- ---------- --------- | | ---------------- ------------------ |ns(e.g bypass)| |ns(e.g. kernel) | ---------------- ------------------ | | | ------- ------ ---- | p0 | | p1 | |p2| ------- ------ ---- | | \ -------- ------- ------ | FT-A | |FT-B | |FT-C| -------- ------- ------ Fixes: f90edfd279f3 ("net/mlx5_core: Connect flow tables") Signed-off-by: Maor Gottlieb Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Sasha Levin commit 951e41ddd2142fa533ee7413263b9a0fe107ef3c Author: John Fastabend Date: Tue Jul 27 09:04:58 2021 -0700 bpf, sockmap: Zap ingress queues after stopping strparser [ Upstream commit 343597d558e79fe704ba8846b5b2ed24056b89c2 ] We don't want strparser to run and pass skbs into skmsg handlers when the psock is null. We just sk_drop them in this case. When removing a live socket from map it means extra drops that we do not need to incur. Move the zap below strparser close to avoid this condition. This way we stop the stream parser first stopping it from processing packets and then delete the psock. Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down") Signed-off-by: John Fastabend Signed-off-by: Andrii Nakryiko Acked-by: Jakub Sitnicki Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20210727160500.1713554-2-john.fastabend@gmail.com Signed-off-by: Sasha Levin commit 0664f9acc5bfc36c6a2d1d313f394d4596cbad9c Author: David Matlack Date: Tue Jul 13 22:09:56 2021 +0000 KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing [ Upstream commit 15b7b737deb30e1f8f116a08e723173b55ebd2f3 ] There is a missing break statement which causes a fallthrough to the next statement where optarg will be null and a segmentation fault will be generated. Fixes: 9e965bb75aae ("KVM: selftests: Add backing src parameter to dirty_log_perf_test") Reviewed-by: Ben Gardon Signed-off-by: David Matlack Message-Id: <20210713220957.3493520-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin commit 9e27f578a40373ea359c8d3402916c48be1bf4f3 Author: Bjorn Andersson Date: Wed Jul 21 19:44:34 2021 -0700 drm/msm/dp: Initialize the INTF_CONFIG register [ Upstream commit f9a39932fa54b6421e751ada7a285da809146421 ] Some bootloaders set the widebus enable bit in the INTF_CONFIG register, but configuration of widebus isn't yet supported ensure that the register has a known value, with widebus disabled. Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support") Signed-off-by: Bjorn Andersson Reviewed-by: Stephen Boyd Link: https://lore.kernel.org/r/20210722024434.3313167-1-bjorn.andersson@linaro.org Signed-off-by: Rob Clark Signed-off-by: Sasha Levin commit c122e9371bd63a2ddec486f4f3a3d07452367b82 Author: Kuogee Hsieh Date: Tue Jul 13 08:54:01 2021 -0700 drm/msm/dp: use dp_ctrl_off_link_stream during PHY compliance test run [ Upstream commit 7591c532b818ef4b8e3e635d842547c08b3a32b4 ] DP cable should always connect to DPU during the entire PHY compliance testing run. Since DP PHY compliance test is executed at irq_hpd event context, dp_ctrl_off_link_stream() should be used instead of dp_ctrl_off(). dp_ctrl_off() is used for unplug event which is triggered when DP cable is dis connected. Changes in V2: -- add fixes statement Fixes: f21c8a276c2d ("drm/msm/dp: handle irq_hpd with sink_count = 0 correctly") Signed-off-by: Kuogee Hsieh Reviewed-by: Stephen Boyd Link: https://lore.kernel.org/r/1626191647-13901-2-git-send-email-khsieh@codeaurora.org Signed-off-by: Rob Clark Signed-off-by: Sasha Levin commit 58389fac95fbf21ba799918616cfe7bf931ca659 Author: Robert Foss Date: Mon Jun 28 10:50:33 2021 +0200 drm/msm/dpu: Fix sm8250_mdp register length [ Upstream commit b910a0206b59eb90ea8ff76d146f4c3156da61e9 ] The downstream dts lists this value as 0x494, and not 0x45c. Fixes: af776a3e1c30 ("drm/msm/dpu: add SM8250 to hw catalog") Signed-off-by: Robert Foss Reviewed-by: Dmitry Baryshkov Reviewed-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20210628085033.9905-1-robert.foss@linaro.org Signed-off-by: Rob Clark Signed-off-by: Sasha Levin commit 5e8c20b001e8ba3e935bc95c8632427397076196 Author: Pavel Skripkin Date: Sun Jul 25 00:11:59 2021 +0300 net: llc: fix skb_over_panic [ Upstream commit c7c9d2102c9c098916ab9e0ab248006107d00d6c ] Syzbot reported skb_over_panic() in llc_pdu_init_as_xid_cmd(). The problem was in wrong LCC header manipulations. Syzbot's reproducer tries to send XID packet. llc_ui_sendmsg() is doing following steps: 1. skb allocation with size = len + header size len is passed from userpace and header size is 3 since addr->sllc_xid is set. 2. skb_reserve() for header_len = 3 3. filling all other space with memcpy_from_msg() Ok, at this moment we have fully loaded skb, only headers needs to be filled. Then code comes to llc_sap_action_send_xid_c(). This function pushes 3 bytes for LLC PDU header and initializes it. Then comes llc_pdu_init_as_xid_cmd(). It initalizes next 3 bytes *AFTER* LLC PDU header and call skb_push(skb, 3). This looks wrong for 2 reasons: 1. Bytes rigth after LLC header are user data, so this function was overwriting payload. 2. skb_push(skb, 3) call can cause skb_over_panic() since all free space was filled in llc_ui_sendmsg(). (This can happen is user passed 686 len: 686 + 14 (eth header) + 3 (LLC header) = 703. SKB_DATA_ALIGN(703) = 704) So, in this patch I added 2 new private constansts: LLC_PDU_TYPE_U_XID and LLC_PDU_LEN_U_XID. LLC_PDU_LEN_U_XID is used to correctly reserve header size to handle LLC + XID case. LLC_PDU_TYPE_U_XID is used by llc_pdu_header_init() function to push 6 bytes instead of 3. And finally I removed skb_push() call from llc_pdu_init_as_xid_cmd(). This changes should not affect other parts of LLC, since after all steps we just transmit buffer. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+5e5a981ad7cc54c4b2b4@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 40e79954edce82cceac5f121b5e84a85393f88ec Author: Vitaly Kuznetsov Date: Thu Jul 22 14:30:18 2021 +0200 KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access [ Upstream commit 0a31df6823232516f61f174907e444f710941dfe ] MSR_KVM_ASYNC_PF_ACK MSR is part of interrupt based asynchronous page fault interface and not the original (deprecated) KVM_FEATURE_ASYNC_PF. This is stated in Documentation/virt/kvm/msr.rst. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Vitaly Kuznetsov Reviewed-by: Maxim Levitsky Reviewed-by: Oliver Upton Message-Id: <20210722123018.260035-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin commit 9eb2c41471e6aa07592872b11d017cc7dbc7f755 Author: Rodrigo Vivi Date: Fri Jul 23 05:52:25 2021 -0400 drm/i915/bios: Fix ports mask [ Upstream commit d7f237df53457cf0cbdb9943b9b7c93a05e2fdb6 ] PORT_A to PORT_F are regular integers defined in the enum port, while for_each_port_masked requires a bit mask for the ports. Current given mask: 0b111 Desired mask: 0b111111 I noticed this while Christoph was reporting a bug found on headless GVT configuration which bisect blamed commit 3ae04c0c7e63 ("drm/i915/bios: limit default outputs to ports A through F") v2: Avoid unnecessary line continuations as pointed by CI and Christoph Cc: Christoph Hellwig Fixes: 3ae04c0c7e63 ("drm/i915/bios: limit default outputs to ports A through F") Cc: Lucas De Marchi Cc: Ville Syrjälä Cc: Jani Nikula Signed-off-by: Rodrigo Vivi Reviewed-by: José Roberto de Souza Reviewed-by: Lucas De Marchi Tested-by: Christoph Hellwig Link: https://patchwork.freedesktop.org/patch/msgid/20210723095225.562913-1-rodrigo.vivi@intel.com (cherry picked from commit 9b52aa720168859526bf90d77fa210fc0336f170) Signed-off-by: Rodrigo Vivi Signed-off-by: Sasha Levin commit 4689d61012a6d0aeff8f53702a73bd11800ffdb8 Author: Jagan Teki Date: Sun Jul 25 23:17:37 2021 +0530 drm/panel: panel-simple: Fix proper bpc for ytc700tlag_05_201c [ Upstream commit 44379b986424b02acfa6e8c85ec5d68d89d3ccc4 ] ytc700tlag_05_201c panel support 8 bpc not 6 bpc as per recent testing in i.MX8MM platform. Fix it. Fixes: 7a1f4fa4a629 ("drm/panel: simple: Add YTC700TLAG-05-201C") Signed-off-by: Jagan Teki Signed-off-by: Sam Ravnborg Link: https://patchwork.freedesktop.org/patch/msgid/20210725174737.891106-1-jagan@amarulasolutions.com Signed-off-by: Sasha Levin commit 7d93d6111d0e3903a248a934c50c7c7668e34018 Author: Jiapeng Chong Date: Fri Jul 23 18:36:09 2021 +0800 mlx4: Fix missing error code in mlx4_load_one() [ Upstream commit 7e4960b3d66d7248b23de3251118147812b42da2 ] The error code is missing in this code scenario, add the error code '-EINVAL' to the return value 'err'. Eliminate the follow smatch warning: drivers/net/ethernet/mellanox/mlx4/main.c:3538 mlx4_load_one() warn: missing error code 'err'. Reported-by: Abaci Robot Fixes: 7ae0e400cd93 ("net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs") Signed-off-by: Jiapeng Chong Reviewed-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 58b30f9e1a488e50d39846a8c4e4945b15614932 Author: Kevin Lo Date: Fri Jul 23 21:59:27 2021 +0800 net: phy: broadcom: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY [ Upstream commit ad4e1e48a6291f7fb53fbef38ca264966ffd65c9 ] Restore PHY_ID_BCM54811 accidently removed by commit 5d4358ede8eb. Fixes: 5d4358ede8eb ("net: phy: broadcom: Allow BCM54210E to configure APD") Signed-off-by: Kevin Lo Acked-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 0379d6b0118a7b367a0c11168c5a58cc4bed5ee8 Author: Hariprasad Kelam Date: Sun Jul 25 13:29:37 2021 +0530 octeontx2-pf: Dont enable backpressure on LBK links [ Upstream commit 4c85e57575fb9e6405d02d55aef8025c60abb824 ] Avoid configure backpressure for LBK links as they don't support it and enable lmacs before configuration pause frames. Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool") Signed-off-by: Geetha sowjanya Signed-off-by: Hariprasad Kelam Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 4182c0d6663932107ab95086daf44439c166d53d Author: Geetha sowjanya Date: Sun Jul 25 13:29:03 2021 +0530 octeontx2-pf: Fix interface down flag on error [ Upstream commit 69f0aeb13bb548e2d5710a350116e03f0273302e ] In the existing code while changing the number of TX/RX queues using ethtool the PF/VF interface resources are freed and reallocated (otx2_stop and otx2_open is called) if the device is in running state. If any resource allocation fails in otx2_open, driver free already allocated resources and return. But again, when the number of queues changes as the device state still running oxt2_stop is called. In which we try to free already freed resources leading to driver crash. This patch fixes the issue by setting the INTF_DOWN flag on error and free the resources in otx2_stop only if the flag is not set. Fixes: 50fe6c02e5ad ("octeontx2-pf: Register and handle link notifications") Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Kovvuri Goutham Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit b8a071889fb3d44f75c620bb86c2bdbcb3b1157a Author: Xin Long Date: Fri Jul 23 18:46:01 2021 -0400 tipc: do not write skb_shinfo frags when doing decrytion [ Upstream commit 3cf4375a090473d240281a0d2b04a3a5aaeac34b ] One skb's skb_shinfo frags are not writable, and they can be shared with other skbs' like by pskb_copy(). To write the frags may cause other skb's data crash. So before doing en/decryption, skb_cow_data() should always be called for a cloned or nonlinear skb if req dst is using the same sg as req src. While at it, the likely branch can be removed, as it will be covered by skb_cow_data(). Note that esp_input() has the same issue, and I will fix it in another patch. tipc_aead_encrypt() doesn't have this issue, as it only processes linear data in the unlikely branch. Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: Shuang Li Signed-off-by: Xin Long Acked-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 0e99b794c0bcd08111430bca26cf0d1e2995d048 Author: Marc Kleine-Budde Date: Sat Apr 24 16:20:39 2021 +0200 can: mcp251xfd: mcp251xfd_irq(): stop timestamping worker in case error in IRQ [ Upstream commit ef68a717960658e6a1e5f08adb0574326e9a12c2 ] In case an error occurred in the IRQ handler, the chip status is dumped via devcoredump and all IRQs are disabled, but the chip stays powered for further analysis. The chip is in an undefined state and will not receive any CAN frames, so shut down the timestamping worker, which reads the TBC register regularly, too. This avoids any CRC read error messages if there is a communication problem with the chip. Fixes: efd8d98dfb90 ("can: mcp251xfd: add HW timestamp infrastructure") Link: https://lore.kernel.org/r/20210724155131.471303-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit d6892195dfbe4f1bb2f2c15b19540ee46611c9a8 Author: Shannon Nelson Date: Fri Jul 23 11:02:49 2021 -0700 ionic: count csum_none when offload enabled [ Upstream commit f07f9815b7046e25cc32bf8542c9c0bbc5eb6e0e ] Be sure to count the csum_none cases when csum offload is enabled. Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 70da7c5042367fe7789baf0a0be0b9fdb5453c7d Author: Shannon Nelson Date: Fri Jul 23 11:02:48 2021 -0700 ionic: fix up dim accounting for tx and rx [ Upstream commit 76ed8a4a00b484dcccef819ef2618bcf8e46f560 ] We need to count the correct Tx and/or Rx packets for dynamic interrupt moderation, depending on which we're processing on the queue interrupt. Fixes: 04a834592bf5 ("ionic: dynamic interrupt moderation") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit e8927398faa142ee2061e6b8c07e2619d944fdaa Author: Shannon Nelson Date: Fri Jul 23 11:02:47 2021 -0700 ionic: remove intr coalesce update from napi [ Upstream commit a6ff85e0a2d9d074a4b4c291ba9ec1e5b0aba22b ] Move the interrupt coalesce value update out of the napi thread and into the dim_work thread and set it only when it has actually changed. Fixes: 04a834592bf5 ("ionic: dynamic interrupt moderation") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit b367a9a2cebec942482d5d18faae1a82c0ea687c Author: Shannon Nelson Date: Fri Jul 23 11:02:46 2021 -0700 ionic: catch no ptp support earlier [ Upstream commit f79eef711eb57d56874b08ea11db69221de54a6d ] If PTP configuration is attempted on ports that don't support it, such as VF ports, the driver will return an error status -95, or EOPNOSUPP and print an error message enp98s0: hwstamp set failed: -95 Because some daemons can retry every few seconds, this can end up filling the dmesg log and pushing out other more useful messages. We can catch this issue earlier in our handling and return the error without a log message. Fixes: 829600ce5e4e ("ionic: add ts_config replay") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 12e88273896d89d210958fbe7cc26f1b26bc01cc Author: Shannon Nelson Date: Fri Jul 23 11:02:45 2021 -0700 ionic: make all rx_mode work threadsafe [ Upstream commit 6840e17b8ea992453e2d6f460d403cb05d194e76 ] Move the bulk of the code from ionic_set_rx_mode(), which can be called from atomic context, into ionic_lif_rx_mode() which is a safe context. A call from the stack will get pushed off into a work thread, but it is also possible to simultaneously have a call driven by a queue reconfig request from an ethtool command or fw recovery event. We add a mutex around the rx_mode work to be sure they don't collide. Fixes: 81dbc24147f9 ("ionic: change set_rx_mode from_ndo to can_sleep") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 2e618cf6db69b782ac6304c89e5e9f47c8fb29cc Author: Pavel Skripkin Date: Fri Jul 23 18:31:32 2021 +0300 net: qrtr: fix memory leaks [ Upstream commit 52f3456a96c06760b9bfae460e39596fec7af22e ] Syzbot reported memory leak in qrtr. The problem was in unputted struct sock. qrtr_local_enqueue() function calls qrtr_port_lookup() which takes sock reference if port was found. Then there is the following check: if (!ipc || &ipc->sk == skb->sk) { ... return -ENODEV; } Since we should drop the reference before returning from this function and ipc can be non-NULL inside this if, we should add qrtr_port_put() inside this if. The similar corner case is in qrtr_endpoint_post() as Manivannan reported. In case of sock_queue_rcv_skb() failure we need to put port reference to avoid leaking struct sock pointer. Fixes: e04df98adf7d ("net: qrtr: Remove receive worker") Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-and-tested-by: syzbot+35a511c72ea7356cdcf3@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin Reviewed-by: Manivannan Sadhasivam Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 56a5e590b31eec87c20763d331d88b0e36c564cd Author: Tetsuo Handa Date: Tue Jul 6 23:40:34 2021 +0900 loop: reintroduce global lock for safe loop_validate_file() traversal [ Upstream commit 3ce6e1f662a910970880188ea7bfd00542bd3934 ] Commit 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock") re-opened a race window for NULL pointer dereference at loop_validate_file() where commit 310ca162d779efee ("block/loop: Use global lock for ioctl() operation.") has closed. Although we need to guarantee that other loop devices will not change during traversal, we can't take remote "struct loop_device"->lo_mutex inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore, introduce a global lock dedicated for loop_validate_file() which is conditionally taken before local "struct loop_device"->lo_mutex is taken. Signed-off-by: Tetsuo Handa Fixes: 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock") Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit fcc99d41954f8deba6863326db0d5bfeee797547 Author: Vladimir Oltean Date: Thu Jul 22 16:05:51 2021 +0300 net: dsa: mv88e6xxx: silently accept the deletion of VID 0 too [ Upstream commit c92c74131a84b508aa8f079a25d7bbe10748449e ] The blamed commit modified the driver to accept the addition of VID 0 without doing anything, but deleting that VID still fails: [ 32.080780] mv88e6085 d0032004.mdio-mii:10 lan8: failed to kill vid 0081/0 Modify mv88e6xxx_port_vlan_leave() to do the same thing as the addition. Fixes: b8b79c414eca ("net: dsa: mv88e6xxx: Fix adding vlan 0") Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit a6964b4c65c754eea889d455430f608f6bc33df4 Author: Gilad Naaman Date: Thu Jul 22 20:01:28 2021 +0300 net: Set true network header for ECN decapsulation [ Upstream commit 227adfb2b1dfbc53dfc53b9dd7a93a6298ff7c56 ] In cases where the header straight after the tunnel header was another ethernet header (TEB), instead of the network header, the ECN decapsulation code would treat the ethernet header as if it was an IP header, resulting in mishandling and possible wrong drops or corruption of the IP header. In this case, ECT(1) is sent, so IP_ECN_decapsulate tries to copy it to the inner IPv4 header, and correct its checksum. The offset of the ECT bits in an IPv4 header corresponds to the lower 2 bits of the second octet of the destination MAC address in the ethernet header. The IPv4 checksum corresponds to end of the source address. In order to reproduce: $ ip netns add A $ ip netns add B $ ip -n A link add _v0 type veth peer name _v1 netns B $ ip -n A link set _v0 up $ ip -n A addr add dev _v0 10.254.3.1/24 $ ip -n A route add default dev _v0 scope global $ ip -n B link set _v1 up $ ip -n B addr add dev _v1 10.254.1.6/24 $ ip -n B route add default dev _v1 scope global $ ip -n B link add gre1 type gretap local 10.254.1.6 remote 10.254.3.1 key 0x49000000 $ ip -n B link set gre1 up # Now send an IPv4/GRE/Eth/IPv4 frame where the outer header has ECT(1), # and the inner header has no ECT bits set: $ cat send_pkt.py #!/usr/bin/env python3 from scapy.all import * pkt = IP(b'E\x01\x00\xa7\x00\x00\x00\x00@/`%\n\xfe\x03\x01\n\xfe\x01\x06 \x00eXI\x00' b'\x00\x00\x18\xbe\x92\xa0\xee&\x18\xb0\x92\xa0l&\x08\x00E\x00\x00}\x8b\x85' b'@\x00\x01\x01\xe4\xf2\x82\x82\x82\x01\x82\x82\x82\x02\x08\x00d\x11\xa6\xeb' b'3\x1e\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12' b'\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234' b'56789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ') send(pkt) $ sudo ip netns exec B tcpdump -neqlllvi gre1 icmp & ; sleep 1 $ sudo ip netns exec A python3 send_pkt.py In the original packet, the source/destinatio MAC addresses are dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26 In the received packet, they are dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27 Thanks to Lahav Schlesinger and Isaac Garzon for helping me pinpoint the origin. Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040") Cc: David S. Miller Cc: Hideaki YOSHIFUJI Cc: David Ahern Cc: Jakub Kicinski Cc: Toke Høiland-Jørgensen Signed-off-by: Gilad Naaman Acked-by: Toke Høiland-Jørgensen Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 7b3f85278d90410788480257f010a03dd82c391a Author: Hoang Le Date: Fri Jul 23 09:25:34 2021 +0700 tipc: fix sleeping in tipc accept routine [ Upstream commit d237a7f11719ff9320721be5818352e48071aab6 ] The release_sock() is blocking function, it would change the state after sleeping. In order to evaluate the stated condition outside the socket lock context, switch to use wait_woken() instead. Fixes: 6398e23cdb1d8 ("tipc: standardize accept routine") Acked-by: Jon Maloy Signed-off-by: Hoang Le Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit a73d03806166582f280d502ac4c4cbe77b192409 Author: Xin Long Date: Thu Jul 22 12:05:41 2021 -0400 tipc: fix implicit-connect for SYN+ [ Upstream commit f8dd60de194817c86bf812700980762bb5a8d9a4 ] For implicit-connect, when it's either SYN- or SYN+, an ACK should be sent back to the client immediately. It's not appropriate for the client to enter established state only after receiving data from the server. On client side, after the SYN is sent out, tipc_wait_for_connect() should be called to wait for the ACK if timeout is set. This patch also restricts __tipc_sendstream() to call __sendmsg() only when it's in TIPC_OPEN state, so that the client can program in a single loop doing both connecting and data sending like: for (...) sendmsg(dest, buf); This makes the implicit-connect more implicit. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Xin Long Acked-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 457202b9132f0d05a4361c31ff456f9cdc98eec7 Author: Jedrzej Jagielski Date: Fri Jun 18 08:49:49 2021 +0000 i40e: Fix log TC creation failure when max num of queues is exceeded [ Upstream commit ea52faae1d17cd3048681d86d2e8641f44de484d ] Fix missing failed message if driver does not have enough queues to complete TC command. Without this fix no message is displayed in dmesg. Fixes: a9ce82f744dc ("i40e: Enable 'channel' mode in mqprio for TC configs") Signed-off-by: Grzegorz Szczurek Signed-off-by: Jedrzej Jagielski Tested-by: Imam Hassan Reza Biswas Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit a7ce70625f445e8180946ab6a87fbe5f17b6e8fc Author: Jedrzej Jagielski Date: Wed Jun 2 00:47:03 2021 +0000 i40e: Fix queue-to-TC mapping on Tx [ Upstream commit 89ec1f0886c127c7e41ac61a6b6d539f4fb2510b ] In SW DCB mode the packets sent receive incorrect UP tags. They are constructed correctly and put into tx_ring, but UP is later remapped by HW on the basis of TCTUPR register contents according to Tx queue selected, and BW used is consistent with the new UP values. This is caused by Tx queue selection in kernel not taking into account DCB configuration. This patch fixes the issue by implementing the ndo_select_queue NDO callback. Fixes: fd0a05ce74ef ("i40e: transmit, receive, and NAPI") Signed-off-by: Arkadiusz Kubalewski Signed-off-by: Jedrzej Jagielski Tested-by: Imam Hassan Reza Biswas Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit 644575296661f6bf8abbf9fffe8ccd22f0285735 Author: Arkadiusz Kubalewski Date: Fri May 21 18:41:26 2021 +0200 i40e: Fix firmware LLDP agent related warning [ Upstream commit 71d6fdba4b2d82fdd883fec31dee77fbcf59773a ] Make warning meaningful for the user. Previously the trace: "Starting FW LLDP agent failed: error: I40E_ERR_ADMIN_QUEUE_ERROR, I40E_AQ_RC_EAGAIN" was produced when user tried to start Firmware LLDP agent, just after it was stopped with sequence: ethtool --set-priv-flags disable-fw-lldp on ethtool --set-priv-flags disable-fw-lldp off (without any delay between the commands) At that point the firmware is still processing stop command, the behavior is expected. Fixes: c1041d070437 ("i40e: Missing response checks in driver when starting/stopping FW LLDP") Signed-off-by: Aleksandr Loktionov Signed-off-by: Arkadiusz Kubalewski Tested-by: Imam Hassan Reza Biswas Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit 79c71f5168c5388ee253c5ec907bba4149f6cc8b Author: Arkadiusz Kubalewski Date: Thu Apr 29 19:49:47 2021 +0200 i40e: Fix logic of disabling queues [ Upstream commit 65662a8dcdd01342b71ee44234bcfd0162e195af ] Correct the message flow between driver and firmware when disabling queues. Previously in case of PF reset (due to required reinit after reconfig), the error like: "VSI seid 397 Tx ring 60 disable timeout" could show up occasionally. The error was not a real issue of hardware or firmware, it was caused by wrong sequence of messages invoked by the driver. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Aleksandr Loktionov Signed-off-by: Arkadiusz Kubalewski Tested-by: Tony Brelinski Signed-off-by: Tony Nguyen Signed-off-by: Sasha Levin commit 367bec7665d14f0cc5329c0137257604876d1ae1 Author: Pablo Neira Ayuso Date: Tue Jul 20 18:22:50 2021 +0200 netfilter: nft_nat: allow to specify layer 4 protocol NAT only [ Upstream commit a33f387ecd5aafae514095c2c4a8c24f7aea7e8b ] nft_nat reports a bogus EAFNOSUPPORT if no layer 3 information is specified. Fixes: d07db9884a5f ("netfilter: nf_tables: introduce nft_validate_register_load()") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin commit 62659ab3563af310faac0f44356e991235385763 Author: Florian Westphal Date: Sun Jul 18 18:36:00 2021 +0200 netfilter: conntrack: adjust stop timestamp to real expiry value [ Upstream commit 30a56a2b881821625f79837d4d968c679852444e ] In case the entry is evicted via garbage collection there is delay between the timeout value and the eviction event. This adjusts the stop value based on how much time has passed. Fixes: b87a2f9199ea82 ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin commit 525e6eb9258cae124c5dc7634c9f1a5b393594a5 Author: Felix Fietkau Date: Fri Jul 2 07:01:11 2021 +0200 mac80211: fix enabling 4-address mode on a sta vif after assoc [ Upstream commit a5d3cbdb09ff1f52cbe040932e06c8b9915c6dad ] Notify the driver about the 4-address mode change and also send a nulldata packet to the AP to notify it about the change Fixes: 1ff4e8f2dec8 ("mac80211: notify the driver when a sta uses 4-address mode") Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20210702050111.47546-1-nbd@nbd.name Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 13b8ab2f6f84369dd6228ef0020893ef1230615d Author: Lorenz Bauer Date: Mon Jul 19 09:51:34 2021 +0100 bpf: Fix OOB read when printing XDP link fdinfo [ Upstream commit d6371c76e20d7d3f61b05fd67b596af4d14a8886 ] We got the following UBSAN report on one of our testing machines: ================================================================================ UBSAN: array-index-out-of-bounds in kernel/bpf/syscall.c:2389:24 index 6 is out of range for type 'char *[6]' CPU: 43 PID: 930921 Comm: systemd-coredum Tainted: G O 5.10.48-cloudflare-kasan-2021.7.0 #1 Hardware name: Call Trace: dump_stack+0x7d/0xa3 ubsan_epilogue+0x5/0x40 __ubsan_handle_out_of_bounds.cold+0x43/0x48 ? seq_printf+0x17d/0x250 bpf_link_show_fdinfo+0x329/0x380 ? bpf_map_value_size+0xe0/0xe0 ? put_files_struct+0x20/0x2d0 ? __kasan_kmalloc.constprop.0+0xc2/0xd0 seq_show+0x3f7/0x540 seq_read_iter+0x3f8/0x1040 seq_read+0x329/0x500 ? seq_read_iter+0x1040/0x1040 ? __fsnotify_parent+0x80/0x820 ? __fsnotify_update_child_dentry_flags+0x380/0x380 vfs_read+0x123/0x460 ksys_read+0xed/0x1c0 ? __x64_sys_pwrite64+0x1f0/0x1f0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ================================================================================ ================================================================================ UBSAN: object-size-mismatch in kernel/bpf/syscall.c:2384:2 From the report, we can infer that some array access in bpf_link_show_fdinfo at index 6 is out of bounds. The obvious candidate is bpf_link_type_strs[BPF_LINK_TYPE_XDP] with BPF_LINK_TYPE_XDP == 6. It turns out that BPF_LINK_TYPE_XDP is missing from bpf_types.h and therefore doesn't have an entry in bpf_link_type_strs: pos: 0 flags: 02000000 mnt_id: 13 link_type: (null) link_id: 4 prog_tag: bcf7977d3b93787c prog_id: 4 ifindex: 1 Fixes: aa8d3a716b59 ("bpf, xdp: Add bpf_link-based XDP attachment API") Signed-off-by: Lorenz Bauer Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20210719085134.43325-2-lmb@cloudflare.com Signed-off-by: Sasha Levin commit 467c905bb613b88e9744757e0e9b2790e4a2ff9a Author: Dongliang Mu Date: Wed Jul 14 11:27:03 2021 +0800 netfilter: nf_tables: fix audit memory leak in nf_tables_commit [ Upstream commit cfbe3650dd3ef2ea9a4420ca89d9a4df98af3fb6 ] In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not free the adp variable. Fix this by adding nf_tables_commit_audit_free which frees the linked list with the head node adl. backtrace: kmalloc include/linux/slab.h:591 [inline] kzalloc include/linux/slab.h:721 [inline] nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline] nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508 nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:702 [inline] sock_sendmsg+0x56/0x80 net/socket.c:722 Reported-by: syzbot Reported-by: kernel test robot Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table") Signed-off-by: Dongliang Mu Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin commit f7046443c8acf96732ccdab615801ee1ee1758fc Author: Bob Pearson Date: Mon Jul 5 11:41:54 2021 -0500 RDMA/rxe: Fix memory leak in error path code [ Upstream commit b18c7da63fcb46e2f9a093cc18d7c219e13a887c ] In rxe_mr_init_user() at the third error the driver fails to free the memory at mr->map. This patch adds code to do that. This error only occurs if page_address() fails to return a non zero address which should never happen for 64 bit architectures. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20210705164153.17652-1-rpearsonhpe@gmail.com Reported by: Haakon Bugge Signed-off-by: Bob Pearson Reviewed-by: Zhu Yanjun Reviewed-by: Håkon Bugge Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit 1544d2b86fa7fb4409625a6f90ffcd513897a962 Author: Yang Yingliang Date: Thu Jul 15 15:43:27 2021 +0800 platform/x86: amd-pmc: Fix missing unlock on error in amd_pmc_send_cmd() [ Upstream commit 95edbbf78c3bdbd1daa921dd4a2e61c751e469ba ] Add the missing unlock before return from function amd_pmc_send_cmd() in the error handling case. Fixes: 95e1b60f8dc8 ("platform/x86: amd-pmc: Fix command completion code") Reported-by: Hulk Robot Signed-off-by: Yang Yingliang Link: https://lore.kernel.org/r/20210715074327.1966083-1-yangyingliang@huawei.com Signed-off-by: Hans de Goede Signed-off-by: Sasha Levin commit d23677f3da7ae3bbd6b5bf2ea31ee7f44d6a86fa Author: Shyam Sundar S K Date: Tue Jun 29 14:17:58 2021 +0530 platform/x86: amd-pmc: Fix SMU firmware reporting mechanism [ Upstream commit 4c06d35dfedf4c1fd03702e0f05292a69d020e21 ] It was lately understood that the current mechanism available in the driver to get SMU firmware info works only on internal SMU builds and there is a separate way to get all the SMU logging counters (addressed in the next patch). Hence remove all the smu info shown via debugfs as it is no more useful. Fixes: 156ec4731cb2 ("platform/x86: amd-pmc: Add AMD platform support for S2Idle") Signed-off-by: Shyam Sundar S K Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20210629084803.248498-3-Shyam-sundar.S-k@amd.com Signed-off-by: Hans de Goede Signed-off-by: Sasha Levin commit 91f5c8fb6d8e000dcbb9c897fdc04cba0ded462d Author: Shyam Sundar S K Date: Tue Jun 29 14:17:57 2021 +0530 platform/x86: amd-pmc: Fix command completion code [ Upstream commit 95e1b60f8dc8f225b14619e9aca9bdd7d99167db ] The protocol to submit a job request to SMU is to wait for AMD_PMC_REGISTER_RESPONSE to return 1,meaning SMU is ready to take requests. PMC driver has to make sure that the response code is always AMD_PMC_RESULT_OK before making any command submissions. When we submit a message to SMU, we have to wait until it processes the request. Adding a read_poll_timeout() check as this was missing in the existing code. Also, add a mutex to protect amd_pmc_send_cmd() calls to SMU. Fixes: 156ec4731cb2 ("platform/x86: amd-pmc: Add AMD platform support for S2Idle") Signed-off-by: Shyam Sundar S K Acked-by: Raul E Rangel Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20210629084803.248498-2-Shyam-sundar.S-k@amd.com Signed-off-by: Hans de Goede Signed-off-by: Sasha Levin commit 7113367b22294c16cd74cfe56f76c1cf15ea67c6 Author: Naresh Kumar PBS Date: Sun Jul 11 06:31:36 2021 -0700 RDMA/bnxt_re: Fix stats counters [ Upstream commit 0c23af52ccd1605926480b5dfd1dd857ef604611 ] Statistical counters are not incrementing in some adapter versions with newer FW. This is due to the stats context length mismatch between FW and driver. Since the L2 driver updates the length correctly, use the stats length from L2 driver while allocating the DMA'able memory and creating the stats context. Fixes: 9d6b648c3112 ("bnxt_en: Update firmware interface spec to 1.10.1.65.") Link: https://lore.kernel.org/r/1626010296-6076-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS Signed-off-by: Selvin Xavier Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit d68acf3537656959493419acaf0c66fb6242f21f Author: Nguyen Dinh Phi Date: Mon Jun 28 21:23:34 2021 +0800 cfg80211: Fix possible memory leak in function cfg80211_bss_update commit f9a5c358c8d26fed0cc45f2afc64633d4ba21dff upstream. When we exceed the limit of BSS entries, this function will free the new entry, however, at this time, it is the last door to access the inputed ies, so these ies will be unreferenced objects and cause memory leak. Therefore we should free its ies before deallocating the new entry, beside of dropping it from hidden_list. Signed-off-by: Nguyen Dinh Phi Link: https://lore.kernel.org/r/20210628132334.851095-1-phind.uet@gmail.com Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit a8d4169f924ad53ab10b9d4762a274ab241c16d8 Author: Hao Xu Date: Wed Jul 28 11:03:22 2021 +0800 io_uring: fix poll requests leaking second poll entries commit a890d01e4ee016978776e45340e521b3bbbdf41f upstream. For pure poll requests, it doesn't remove the second poll wait entry when it's done, neither after vfs_poll() or in the poll completion handler. We should remove the second poll wait entry. And we use io_poll_remove_double() rather than io_poll_remove_waitqs() since the latter has some redundant logic. Fixes: 88e41cf928a6 ("io_uring: add multishot mode for IORING_OP_POLL_ADD") Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Hao Xu Link: https://lore.kernel.org/r/20210728030322.12307-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 5db0ca0fbebf1c502f8a2e948ddc54436ad19cf6 Author: Jens Axboe Date: Tue Jul 27 10:50:31 2021 -0600 io_uring: don't block level reissue off completion path commit ef04688871f3386b6d40ade8f5c664290420f819 upstream. Some setups, like SCSI, can throw spurious -EAGAIN off the softirq completion path. Normally we expect this to happen inline as part of submission, but apparently SCSI has a weird corner case where it can happen as part of normal completions. This should be solved by having the -EAGAIN bubble back up the stack as part of submission, but previous attempts at this failed and we're not just quite there yet. Instead we currently use REQ_F_REISSUE to handle this case. For now, catch it in io_rw_should_reissue() and prevent a reissue from a bogus path. Cc: stable@vger.kernel.org Reported-by: Fabian Ebner Tested-by: Fabian Ebner Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 5bb49c88472f532d96ce2e18f178c34df62484f6 Author: Pavel Begunkov Date: Mon Jul 26 14:14:31 2021 +0100 io_uring: fix io_prep_async_link locking commit 44eff40a32e8f5228ae041006352e32638ad2368 upstream. io_prep_async_link() may be called after arming a linked timeout, automatically making it unsafe to traverse the linked list. Guard with completion_lock if there was a linked timeout. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/93f7c617e2b4f012a2a175b3dab6bc2f27cebc48.1627304436.git.asml.silence@gmail.com Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit ca324a215bf9f421f36f9634057458557c66ca7b Author: Krzysztof Kozlowski Date: Wed Jul 28 08:49:09 2021 +0200 nfc: nfcsim: fix use after free during module unload commit 5e7b30d24a5b8cb691c173b45b50e3ca0191be19 upstream. There is a use after free memory corruption during module exit: - nfcsim_exit() - nfcsim_device_free(dev0) - nfc_digital_unregister_device() This iterates over command queue and frees all commands, - dev->up = false - nfcsim_link_shutdown() - nfcsim_link_recv_wake() This wakes the sleeping thread nfcsim_link_recv_skb(). - nfcsim_link_recv_skb() Wake from wait_event_interruptible_timeout(), call directly the deb->cb callback even though (dev->up == false), - digital_send_cmd_complete() Dereference of "struct digital_cmd" cmd which was freed earlier by nfc_digital_unregister_device(). This causes memory corruption shortly after (with unrelated stack trace): nfc nfc0: NFC: nfcsim_recv_wq: Device is down llcp: nfc_llcp_recv: err -19 nfc nfc1: NFC: nfcsim_recv_wq: Device is down BUG: unable to handle page fault for address: ffffffffffffffed Call Trace: fsnotify+0x54b/0x5c0 __fsnotify_parent+0x1fe/0x300 ? vfs_write+0x27c/0x390 vfs_write+0x27c/0x390 ksys_write+0x63/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae KASAN report: BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50 Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71 Workqueue: events nfcsim_recv_wq [nfcsim] Call Trace:  dump_stack_lvl+0x45/0x59  print_address_description.constprop.0+0x21/0x140  ? digital_send_cmd_complete+0x16/0x50  ? digital_send_cmd_complete+0x16/0x50  kasan_report.cold+0x7f/0x11b  ? digital_send_cmd_complete+0x16/0x50  ? digital_dep_link_down+0x60/0x60  digital_send_cmd_complete+0x16/0x50  nfcsim_recv_wq+0x38f/0x3d5 [nfcsim]  ? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim]  ? lock_is_held_type+0x98/0x110  ? finish_wait+0x110/0x110  ? rcu_read_lock_sched_held+0x9c/0xd0  ? rcu_read_lock_bh_held+0xb0/0xb0  ? lockdep_hardirqs_on_prepare+0x12e/0x1f0 This flow of calling digital_send_cmd_complete() callback on driver exit is specific to nfcsim which implements reading and sending work queues. Since the NFC digital device was unregistered, the callback should not be called. Fixes: 204bddcb508f ("NFC: nfcsim: Make use of the Digital layer") Cc: Signed-off-by: Krzysztof Kozlowski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit caed0df2e52daff8f7a3152bba9eca9fbf20e8eb Author: Tejun Heo Date: Tue Jul 27 14:38:09 2021 -1000 blk-iocost: fix operation ordering in iocg_wake_fn() commit 5ab189cf3abbc9994bae3be524c5b88589ed56e2 upstream. iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it wants the wq_entry to be always removed whether it ended up waking the task or not. finish_wait() tests whether wq_entry needs removal without grabbing the wait_queue lock and expects the waker to use list_del_init_careful() after all waking operations are complete, which iocg_wake_fn() didn't do. The operation order was wrong and the regular list_del_init() was used. The result is that if a waiter wakes up racing the waker, it can free pop the wq_entry off stack before the waker is still looking at it, which can lead to a backtrace like the following. [7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP ... [7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0 ... [7312084.858314] Call Trace: [7312084.863548] _raw_spin_lock_irqsave+0x22/0x30 [7312084.872605] try_to_wake_up+0x4c/0x4f0 [7312084.880444] iocg_wake_fn+0x71/0x80 [7312084.887763] __wake_up_common+0x71/0x140 [7312084.895951] iocg_kick_waitq+0xe8/0x2b0 [7312084.903964] ioc_rqos_throttle+0x275/0x650 [7312084.922423] __rq_qos_throttle+0x20/0x30 [7312084.930608] blk_mq_make_request+0x120/0x650 [7312084.939490] generic_make_request+0xca/0x310 [7312084.957600] submit_bio+0x173/0x200 [7312084.981806] swap_readpage+0x15c/0x240 [7312084.989646] read_swap_cache_async+0x58/0x60 [7312084.998527] swap_cluster_readahead+0x201/0x320 [7312085.023432] swapin_readahead+0x2df/0x450 [7312085.040672] do_swap_page+0x52f/0x820 [7312085.058259] handle_mm_fault+0xa16/0x1420 [7312085.066620] do_page_fault+0x2c6/0x5c0 [7312085.074459] page_fault+0x2f/0x40 Fix it by switching to list_del_init_careful() and putting it at the end. Signed-off-by: Tejun Heo Reported-by: Rik van Riel Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 749abc8d274f76de3c927e62abf5631712f0d9c9 Author: Jiri Kosina Date: Thu Jun 24 13:20:21 2021 +0200 drm/amdgpu: Fix resource leak on probe error path commit d47255d3f87338164762ac56df1f28d751e27246 upstream. This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980. It is not true (as stated in the reverted commit changelog) that we never unmap the BAR on failure; it actually does happen properly on amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() -> amdgpu_device_fini() error path. What's worse, this commit actually completely breaks resource freeing on probe failure (like e.g. failure to load microcode), as amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too early, leaving all the resources that'd normally be freed in amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading to all sorts of oopses when someone tries to, for example, access the sysfs and procfs resources which are still around while the driver is gone. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Reported-by: Vojtech Pavlik Signed-off-by: Jiri Kosina Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 070f46bcf6b4e76b129e7b8316d6a68a7fd521a1 Author: Jiri Kosina Date: Thu Jun 24 13:11:36 2021 +0200 drm/amdgpu: Avoid printing of stack contents on firmware load error commit 6aade587d329ebe88319dfdb8e8c7b6aede80417 upstream. In case when psp_init_asd_microcode() fails to load ASD microcode file, psp_v12_0_init_microcode() tries to print the firmware filename that failed to load before bailing out. This is wrong because: - the firmware filename it would want it print is an incorrect one as psp_init_asd_microcode() and psp_v12_0_init_microcode() are loading different filenames - it tries to print fw_name, but that's not yet been initialized by that time, so it prints random stack contents, e.g. amdgpu 0000:04:00.0: Direct firmware load for amdgpu/renoir_asd.bin failed with error -2 amdgpu 0000:04:00.0: amdgpu: fail to initialize asd microcode amdgpu 0000:04:00.0: amdgpu: psp v12.0: Failed to load firmware "\xfeTO\x8e\xff\xff" Fix that by bailing out immediately, instead of priting the bogus error message. Reported-by: Vojtech Pavlik Signed-off-by: Jiri Kosina Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 4e7961b3d5fd9e05a24487b2e81624f74b65a7dc Author: Pratik Vishwakarma Date: Fri Jul 23 18:08:40 2021 +0530 drm/amdgpu: Check pmops for desired suspend state commit 91e273712ab8dd8c31924ac7714b21e011137e98 upstream. [Why] User might change the suspend behaviour from OS. [How] Check with pm for target suspend state and set s0ix flag only for s2idle state. v2: User might change default suspend state, use target state v3: squash in build fix Suggested-by: Lijo Lazar Signed-off-by: Pratik Vishwakarma Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 0652b1eade5311f1bd7f9961ae93aed2b660a3d1 Author: Dale Zhao Date: Fri Jul 16 09:38:17 2021 +0800 drm/amd/display: ensure dentist display clock update finished in DCN20 commit b53e041d8e4308f7324999398aec092dbcb130f5 upstream. [Why] We don't check DENTIST_DISPCLK_CHG_DONE to ensure dentist display clockis updated to target value. In some scenarios with large display clock margin, it will deliver unfinished display clock and cause issues like display black screen. [How] Checking DENTIST_DISPCLK_CHG_DONE to ensure display clock has been update to target value before driver do other clock related actions. Reviewed-by: Cyr Aric Acked-by: Solomon Chiu Signed-off-by: Dale Zhao Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 9c2cae70e3a0065f7136acd2c3de6748eafaefd6 Author: Paul Jakma Date: Fri Jul 23 16:13:04 2021 +0100 NIU: fix incorrect error return, missed in previous revert commit 15bbf8bb4d4ab87108ecf5f4155ec8ffa3c141d6 upstream. Commit 7930742d6, reverting 26fd962, missed out on reverting an incorrect change to a return value. The niu_pci_vpd_scan_props(..) == 1 case appears to be a normal path - treating it as an error and return -EINVAL was breaking VPD_SCAN and causing the driver to fail to load. Fix, so my Neptune card works again. Cc: Kangjie Lu Cc: Shannon Nelson Cc: David S. Miller Cc: Greg Kroah-Hartman Cc: stable Fixes: 7930742d ('Revert "niu: fix missing checks of niu_pci_eeprom_read"') Signed-off-by: Paul Jakma Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 633799ddcff43ab48678949f0b87965391d7da6e Author: Mohammad Athari Bin Ismail Date: Mon Jul 26 10:20:20 2021 +0800 net: stmmac: add est_irq_status callback function for GMAC 4.10 and 5.10 commit 94cbe7db7d757c2d481c3617ab5579a28cfc2175 upstream. Assign dwmac5_est_irq_status to est_irq_status callback function for GMAC 4.10 and 5.10. With this, EST related interrupts could be handled properly. Fixes: e49aa315cb01 ("net: stmmac: EST interrupts handling and error reporting") Cc: # 5.13.x Signed-off-by: Mohammad Athari Bin Ismail Acked-by: Wong Vee Khee Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fa1c5eff378fbe2416dafe42f7c98271e444d995 Author: Jason Gerecke Date: Mon Jul 19 13:55:28 2021 -0700 HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT commit 6ca2350e11f09d5d3e53777d1eff8ff6d300ed93 upstream. Commit 670e90924bfe ("HID: wacom: support named keys on older devices") added support for sending named events from the soft buttons on the 24HDT and 27QHDT. In the process, however, it inadvertantly disabled the touchscreen of the 24HDT and 27QHDT by default. The `wacom_set_shared_values` function would normally enable touch by default but because it checks the state of the non-shared `has_mute_touch_switch` flag and `wacom_setup_touch_input_capabilities` sets the state of the /shared/ version, touch ends up being disabled by default. This patch sets the non-shared flag, letting `wacom_set_shared_values` take care of copying the value over to the shared version and setting the default touch state to "on". Fixes: 670e90924bfe ("HID: wacom: support named keys on older devices") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Jason Gerecke Reviewed-by: Ping Cheng Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit 892ced352e05b5f42bdc8184401a32b088af6858 Author: Mike Rapoport Date: Tue Jul 27 23:38:24 2021 +0300 alpha: register early reserved memory in memblock commit 640b7ea5f888b521dcf28e2564ce75d08a783fd7 upstream. The memory reserved by console/PALcode or non-volatile memory is not added to memblock.memory. Since commit fa3354e4ea39 (mm: free_area_init: use maximal zone PFNs rather than zone sizes) the initialization of the memory map relies on the accuracy of memblock.memory to properly calculate zone sizes. The holes in memblock.memory caused by absent regions reserved by the firmware cause incorrect initialization of struct pages which leads to BUG() during the initial page freeing: BUG: Bad page state in process swapper pfn:2ffc53 page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26 fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0 fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0 0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618 fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80 fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80 fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0 Trace: [] bad_page+0x168/0x1b0 [] free_pcp_prepare+0x1e0/0x290 [] free_unref_page+0x2c/0xa0 [] cmp_ex_sort+0x0/0x30 [] cmp_ex_sort+0x0/0x30 [] _stext+0x1c/0x20 Fix this by registering the reserved ranges in memblock.memory. Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge Fixes: fa3354e4ea39 ("mm: free_area_init: use maximal zone PFNs rather than zone sizes") Reported-by: Matt Turner Cc: Signed-off-by: Mike Rapoport Signed-off-by: Matt Turner Signed-off-by: Greg Kroah-Hartman commit a63d311c966cb022e167ef391c59d94f86185bb7 Author: Pavel Skripkin Date: Tue Jul 27 20:00:46 2021 +0300 can: esd_usb2: fix memory leak commit 928150fad41ba16df7fcc9f7f945747d0f56cbb6 upstream. In esd_usb2_setup_rx_urbs() MAX_RX_URBS coherent buffers are allocated and there is nothing, that frees them: 1) In callback function the urb is resubmitted and that's all 2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER is not set (see esd_usb2_setup_rx_urbs) and this flag cannot be used with coherent buffers. So, all allocated buffers should be freed with usb_free_coherent() explicitly. Side note: This code looks like a copy-paste of other can drivers. The same patch was applied to mcba_usb driver and it works nice with real hardware. There is no change in functionality, only clean-up code for coherent buffers. Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device") Link: https://lore.kernel.org/r/b31b096926dcb35998ad0271aac4b51770ca7cc8.1627404470.git.paskripkin@gmail.com Cc: linux-stable Signed-off-by: Pavel Skripkin Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit d23e7c014cc4e3bcbb62676f783e749b7896bdb1 Author: Pavel Skripkin Date: Tue Jul 27 20:00:33 2021 +0300 can: ems_usb: fix memory leak commit 9969e3c5f40c166e3396acc36c34f9de502929f6 upstream. In ems_usb_start() MAX_RX_URBS coherent buffers are allocated and there is nothing, that frees them: 1) In callback function the urb is resubmitted and that's all 2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER is not set (see ems_usb_start) and this flag cannot be used with coherent buffers. So, all allocated buffers should be freed with usb_free_coherent() explicitly. Side note: This code looks like a copy-paste of other can drivers. The same patch was applied to mcba_usb driver and it works nice with real hardware. There is no change in functionality, only clean-up code for coherent buffers. Fixes: 702171adeed3 ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Link: https://lore.kernel.org/r/59aa9fbc9a8cbf9af2bbd2f61a659c480b415800.1627404470.git.paskripkin@gmail.com Cc: linux-stable Signed-off-by: Pavel Skripkin Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 62365842aed3d0991b3e4987b59f6bf9153afebd Author: Pavel Skripkin Date: Tue Jul 27 19:59:57 2021 +0300 can: usb_8dev: fix memory leak commit 0e865f0c31928d6a313269ef624907eec55287c4 upstream. In usb_8dev_start() MAX_RX_URBS coherent buffers are allocated and there is nothing, that frees them: 1) In callback function the urb is resubmitted and that's all 2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER is not set (see usb_8dev_start) and this flag cannot be used with coherent buffers. So, all allocated buffers should be freed with usb_free_coherent() explicitly. Side note: This code looks like a copy-paste of other can drivers. The same patch was applied to mcba_usb driver and it works nice with real hardware. There is no change in functionality, only clean-up code for coherent buffers. Fixes: 0024d8ad1639 ("can: usb_8dev: Add support for USB2CAN interface from 8 devices") Link: https://lore.kernel.org/r/d39b458cd425a1cf7f512f340224e6e9563b07bd.1627404470.git.paskripkin@gmail.com Cc: linux-stable Signed-off-by: Pavel Skripkin Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 78673a83947b8b445357ece9929fe1e329ac6d05 Author: Pavel Skripkin Date: Sun Jul 25 13:36:30 2021 +0300 can: mcba_usb_start(): add missing urb->transfer_dma initialization commit fc43fb69a7af92839551f99c1a96a37b77b3ae7a upstream. Yasushi reported, that his Microchip CAN Analyzer stopped working since commit 91c02557174b ("can: mcba_usb: fix memory leak in mcba_usb"). The problem was in missing urb->transfer_dma initialization. In my previous patch to this driver I refactored mcba_usb_start() code to avoid leaking usb coherent buffers. To archive it, I passed local stack variable to usb_alloc_coherent() and then saved it to private array to correctly free all coherent buffers on ->close() call. But I forgot to initialize urb->transfer_dma with variable passed to usb_alloc_coherent(). All of this was causing device to not work, since dma addr 0 is not valid and following log can be found on bug report page, which points exactly to problem described above. | DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set Fixes: 91c02557174b ("can: mcba_usb: fix memory leak in mcba_usb") Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990850 Link: https://lore.kernel.org/r/20210725103630.23864-1-paskripkin@gmail.com Cc: linux-stable Reported-by: Yasushi SHOJI Signed-off-by: Pavel Skripkin Tested-by: Yasushi SHOJI [mkl: fixed typos in commit message - thanks Yasushi SHOJI] Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 87d268fe1bdb77abfe84d379ff81a8c8929de229 Author: Stephane Grosjean Date: Fri Jun 25 15:09:29 2021 +0200 can: peak_usb: pcan_usb_handle_bus_evt(): fix reading rxerr/txerr values commit 590eb2b7d8cfafb27e8108d52d4bf4850626d31d upstream. This patch fixes an incorrect way of reading error counters in messages received for this purpose from the PCAN-USB interface. These messages inform about the increase or decrease of the error counters, whose values are placed in bytes 1 and 2 of the message data (not 0 and 1). Fixes: ea8b33bde76c ("can: pcan_usb: add support of rxerr/txerr counters") Link: https://lore.kernel.org/r/20210625130931.27438-4-s.grosjean@peak-system.com Cc: linux-stable Signed-off-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit aec236c7147ae1c3dd3186b1eda949894f9a80fa Author: Ziyang Xuan Date: Thu Jul 22 15:08:19 2021 +0800 can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF commit 54f93336d000229f72c26d8a3f69dd256b744528 upstream. We get a bug during ltp can_filter test as following. =========================================== [60919.264984] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [60919.265223] PGD 8000003dda726067 P4D 8000003dda726067 PUD 3dda727067 PMD 0 [60919.265443] Oops: 0000 [#1] SMP PTI [60919.265550] CPU: 30 PID: 3638365 Comm: can_filter Kdump: loaded Tainted: G W 4.19.90+ #1 [60919.266068] RIP: 0010:selinux_socket_sock_rcv_skb+0x3e/0x200 [60919.293289] RSP: 0018:ffff8d53bfc03cf8 EFLAGS: 00010246 [60919.307140] RAX: 0000000000000000 RBX: 000000000000001d RCX: 0000000000000007 [60919.320756] RDX: 0000000000000001 RSI: ffff8d5104a8ed00 RDI: ffff8d53bfc03d30 [60919.334319] RBP: ffff8d9338056800 R08: ffff8d53bfc29d80 R09: 0000000000000001 [60919.347969] R10: ffff8d53bfc03ec0 R11: ffffb8526ef47c98 R12: ffff8d53bfc03d30 [60919.350320] perf: interrupt took too long (3063 > 2500), lowering kernel.perf_event_max_sample_rate to 65000 [60919.361148] R13: 0000000000000001 R14: ffff8d53bcf90000 R15: 0000000000000000 [60919.361151] FS: 00007fb78b6b3600(0000) GS:ffff8d53bfc00000(0000) knlGS:0000000000000000 [60919.400812] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [60919.413730] CR2: 0000000000000010 CR3: 0000003e3f784006 CR4: 00000000007606e0 [60919.426479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [60919.439339] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [60919.451608] PKRU: 55555554 [60919.463622] Call Trace: [60919.475617] [60919.487122] ? update_load_avg+0x89/0x5d0 [60919.498478] ? update_load_avg+0x89/0x5d0 [60919.509822] ? account_entity_enqueue+0xc5/0xf0 [60919.520709] security_sock_rcv_skb+0x2a/0x40 [60919.531413] sk_filter_trim_cap+0x47/0x1b0 [60919.542178] ? kmem_cache_alloc+0x38/0x1b0 [60919.552444] sock_queue_rcv_skb+0x17/0x30 [60919.562477] raw_rcv+0x110/0x190 [can_raw] [60919.572539] can_rcv_filter+0xbc/0x1b0 [can] [60919.582173] can_receive+0x6b/0xb0 [can] [60919.591595] can_rcv+0x31/0x70 [can] [60919.600783] __netif_receive_skb_one_core+0x5a/0x80 [60919.609864] process_backlog+0x9b/0x150 [60919.618691] net_rx_action+0x156/0x400 [60919.627310] ? sched_clock_cpu+0xc/0xa0 [60919.635714] __do_softirq+0xe8/0x2e9 [60919.644161] do_softirq_own_stack+0x2a/0x40 [60919.652154] [60919.659899] do_softirq.part.17+0x4f/0x60 [60919.667475] __local_bh_enable_ip+0x60/0x70 [60919.675089] __dev_queue_xmit+0x539/0x920 [60919.682267] ? finish_wait+0x80/0x80 [60919.689218] ? finish_wait+0x80/0x80 [60919.695886] ? sock_alloc_send_pskb+0x211/0x230 [60919.702395] ? can_send+0xe5/0x1f0 [can] [60919.708882] can_send+0xe5/0x1f0 [can] [60919.715037] raw_sendmsg+0x16d/0x268 [can_raw] It's because raw_setsockopt() concurrently with unregister_netdevice_many(). Concurrent scenario as following. cpu0 cpu1 raw_bind raw_setsockopt unregister_netdevice_many unlist_netdevice dev_get_by_index raw_notifier raw_enable_filters ...... can_rx_register can_rcv_list_find(..., net->can.rx_alldev_list) ...... sock_close raw_release(sock_a) ...... can_receive can_rcv_filter(net->can.rx_alldev_list, ...) raw_rcv(skb, sock_a) BUG After unlist_netdevice(), dev_get_by_index() return NULL in raw_setsockopt(). Function raw_enable_filters() will add sock and can_filter to net->can.rx_alldev_list. Then the sock is closed. Followed by, we sock_sendmsg() to a new vcan device use the same can_filter. Protocol stack match the old receiver whose sock has been released on net->can.rx_alldev_list in can_rcv_filter(). Function raw_rcv() uses the freed sock. UAF BUG is triggered. We can find that the key issue is that net_device has not been protected in raw_setsockopt(). Use rtnl_lock to protect net_device in raw_setsockopt(). Fixes: c18ce101f2e4 ("[CAN]: Add raw protocol") Link: https://lore.kernel.org/r/20210722070819.1048263-1-william.xuanziyang@huawei.com Cc: linux-stable Signed-off-by: Ziyang Xuan Acked-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit ea9e6fc2bc5d0c19b069373376222a83bfd0d53d Author: Zhang Changzhong Date: Tue Jul 6 19:00:08 2021 +0800 can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms commit c6eea1c8bda56737752465a298dc6ce07d6b8ce3 upstream. For receive side, the max time interval between two consecutive TP.DT should be 750ms. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/r/1625569210-47506-1-git-send-email-zhangchangzhong@huawei.com Cc: linux-stable Signed-off-by: Zhang Changzhong Acked-by: Oleksij Rempel Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 9293727af5390d3c94e640ea4b6e023be07ef8ff Author: Wang Hai Date: Thu Jul 29 14:53:54 2021 -0700 mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() commit 121dffe20b141c9b27f39d49b15882469cbebae7 upstream. When I use kfree_rcu() to free a large memory allocated by kmalloc_node(), the following dump occurs. BUG: kernel NULL pointer dereference, address: 0000000000000020 [...] Oops: 0000 [#1] SMP [...] Workqueue: events kfree_rcu_work RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline] RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline] RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363 [...] Call Trace: kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293 kfree_bulk include/linux/slab.h:413 [inline] kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300 process_one_work+0x207/0x530 kernel/workqueue.c:2276 worker_thread+0x320/0x610 kernel/workqueue.c:2422 kthread+0x13d/0x160 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 When kmalloc_node() a large memory, page is allocated, not slab, so when freeing memory via kfree_rcu(), this large memory should not be used by memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for slab. Using page_objcgs_check() instead of page_objcgs() in memcg_slab_free_hook() to fix this bug. Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data") Signed-off-by: Wang Hai Reviewed-by: Shakeel Butt Acked-by: Michal Hocko Acked-by: Roman Gushchin Reviewed-by: Kefeng Wang Reviewed-by: Muchun Song Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Vlastimil Babka Cc: Johannes Weiner Cc: Alexei Starovoitov Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 87370a9d413acac889089b2022971abd260ccb5b Author: Johannes Weiner Date: Thu Jul 29 14:53:44 2021 -0700 mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code commit 30def93565e5ba08676aa2b9083f253fc586dbed upstream. Dan Carpenter reports: The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr 29, 2021, leads to the following static checker warning: kernel/cgroup/rstat.c:200 cgroup_rstat_flush() warn: sleeping in atomic context mm/memcontrol.c 3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) 3573 { 3574 unsigned long val; 3575 3576 if (mem_cgroup_is_root(memcg)) { 3577 cgroup_rstat_flush(memcg->css.cgroup); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is from static analysis and potentially a false positive. The problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold() which holds an rcu_read_lock(). And the cgroup_rstat_flush() function can sleep. 3578 val = memcg_page_state(memcg, NR_FILE_PAGES) + 3579 memcg_page_state(memcg, NR_ANON_MAPPED); 3580 if (swap) 3581 val += memcg_page_state(memcg, MEMCG_SWAP); 3582 } else { 3583 if (!swap) 3584 val = page_counter_read(&memcg->memory); 3585 else 3586 val = page_counter_read(&memcg->memsw); 3587 } 3588 return val; 3589 } __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the thresholding code is invoked during stat changes, and those contexts have irqs disabled as well. If the lock breaking occurs inside the flush function, it will result in a sleep from an atomic context. Use the irqsafe flushing variant in mem_cgroup_usage() to fix this. Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat") Signed-off-by: Johannes Weiner Reported-by: Dan Carpenter Acked-by: Chris Down Reviewed-by: Rik van Riel Acked-by: Michal Hocko Reviewed-by: Shakeel Butt Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 3df2bd9978b1448edab1c2a9f3551d96fe8b8bae Author: Junxiao Bi Date: Thu Jul 29 14:53:41 2021 -0700 ocfs2: issue zeroout to EOF blocks commit 9449ad33be8480f538b11a593e2dda2fb33ca06d upstream. For punch holes in EOF blocks, fallocate used buffer write to zero the EOF blocks in last cluster. But since ->writepage will ignore EOF pages, those zeros will not be flushed. This "looks" ok as commit 6bba4471f0cc ("ocfs2: fix data corruption by fallocate") will zero the EOF blocks when extend the file size, but it isn't. The problem happened on those EOF pages, before writeback, those pages had DIRTY flag set and all buffer_head in them also had DIRTY flag set, when writeback run by write_cache_pages(), DIRTY flag on the page was cleared, but DIRTY flag on the buffer_head not. When next write happened to those EOF pages, since buffer_head already had DIRTY flag set, it would not mark page DIRTY again. That made writeback ignore them forever. That will cause data corruption. Even directio write can't work because it will fail when trying to drop pages caches before direct io, as it found the buffer_head for those pages still had DIRTY flag set, then it will fall back to buffer io mode. To make a summary of the issue, as writeback ingores EOF pages, once any EOF page is generated, any write to it will only go to the page cache, it will never be flushed to disk even file size extends and that page is not EOF page any more. The fix is to avoid zero EOF blocks with buffer write. The following code snippet from qemu-img could trigger the corruption. 656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11 ... 660 fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 660 fallocate(11, 0, 2275868672, 327680) = 0 658 pwrite64(11, " Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit c9302ab319ed6df6565ac90d64153336063b17ae Author: Junxiao Bi Date: Thu Jul 29 14:53:38 2021 -0700 ocfs2: fix zero out valid data commit f267aeb6dea5e468793e5b8eb6a9c72c0020d418 upstream. If append-dio feature is enabled, direct-io write and fallocate could run in parallel to extend file size, fallocate used "orig_isize" to record i_size before taking "ip_alloc_sem", when ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed out. Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com Fixes: 6bba4471f0cc ("ocfs2: fix data corruption by fallocate") Signed-off-by: Junxiao Bi Reviewed-by: Joseph Qi Cc: Changwei Ge Cc: Gang He Cc: Joel Becker Cc: Jun Piao Cc: Mark Fasheh Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a9f2d0884d700130b3e2a62ca00822efc0006cce Author: Paolo Bonzini Date: Tue Jul 27 08:43:10 2021 -0400 KVM: add missing compat KVM_CLEAR_DIRTY_LOG commit 8750f9bbda115f3f79bfe43be85551ee5e12b6ff upstream. The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer, therefore it needs a compat ioctl implementation. Otherwise, 32-bit userspace fails to invoke it on 64-bit kernels; for x86 it might work fine by chance if the padding is zero, but not on big-endian architectures. Reported-by: Thomas Sattler Cc: stable@vger.kernel.org Fixes: 2a31b9db1535 ("kvm: introduce manual dirty log reprotect") Reviewed-by: Peter Xu Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit a80e3243e92413c64af871b16914523975658387 Author: Juergen Gross Date: Thu Jul 1 17:41:00 2021 +0200 x86/kvm: fix vcpu-id indexed array sizes commit 76b4f357d0e7d8f6f0013c733e6cba1773c266d3 upstream. KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1 elements. Note that this is currently no real problem, as KVM_MAX_VCPU_ID is an odd number, resulting in always enough padding being available at the end of those arrays. Nevertheless this should be fixed in order to avoid rare problems in case someone is using an even number for KVM_MAX_VCPU_ID. Signed-off-by: Juergen Gross Message-Id: <20210701154105.23215-2-jgross@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 3c82e27986afe89ec475e0664123eb33b1141471 Author: Srinivas Pandruvada Date: Tue Jul 27 09:18:24 2021 -0700 ACPI: DPTF: Fix reading of attributes commit 41a8457f3f6f829be1f8f8fa7577a46b9b7223ef upstream. The current assumption that methods to read PCH FIVR attributes will return integer, is not correct. There is no good way to return integer as negative numbers are also valid. These read methods return a package of integers. The first integer returns status, which is 0 on success and any other value for failure. When the returned status is zero, then the second integer returns the actual value. This change fixes this issue by replacing acpi_evaluate_integer() with acpi_evaluate_object() and use acpi_extract_package() to extract results. Fixes: 2ce6324eadb01 ("ACPI: DPTF: Add PCH FIVR participant driver") Signed-off-by: Srinivas Pandruvada Cc: 5.10+ # 5.10+ Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit cf90e1c4ad57454048fe56c663d2ece12bc6a985 Author: Hui Wang Date: Wed Jul 28 23:19:58 2021 +0800 Revert "ACPI: resources: Add checks for ACPI IRQ override" commit e0eef3690dc66b3ecc6e0f1267f332403eb22bea upstream. The commit 0ec4e55e9f57 ("ACPI: resources: Add checks for ACPI IRQ override") introduces regression on some platforms, at least it makes the UART can't get correct irq setting on two different platforms, and it makes the kernel can't bootup on these two platforms. This reverts commit 0ec4e55e9f571f08970ed115ec0addc691eda613. Regression-discuss: https://bugzilla.kernel.org/show_bug.cgi?id=213031 Reported-by: PGNd Cc: 5.4+ # 5.4+ Signed-off-by: Hui Wang Acked-by: Greg Kroah-Hartman Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 1d381aca0d9ceb06a15c07560c306976ee36ee8a Author: Goldwyn Rodrigues Date: Fri Jul 9 11:29:22 2021 -0500 btrfs: mark compressed range uptodate only if all bio succeed commit 240246f6b913b0c23733cfd2def1d283f8cc9bbe upstream. In compression write endio sequence, the range which the compressed_bio writes is marked as uptodate if the last bio of the compressed (sub)bios is completed successfully. There could be previous bio which may have failed which is recorded in cb->errors. Set the writeback range as uptodate only if cb->errors is zero, as opposed to checking only the last bio's status. Backporting notes: in all versions up to 4.4 the last argument is always replaced by "!cb->errors". CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Goldwyn Rodrigues Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit c543bced163b7c225e4bd9e7bf3d810acd1b2d46 Author: Desmond Cheong Zhi Xi Date: Tue Jul 27 15:13:03 2021 +0800 btrfs: fix rw device counting in __btrfs_free_extra_devids commit b2a616676839e2a6b02c8e40be7f886f882ed194 upstream. When removing a writeable device in __btrfs_free_extra_devids, the rw device count should be decremented. This error was caught by Syzbot which reported a warning in close_fs_devices: WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168 Modules linked in: CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168 RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293 RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128 R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000 R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a FS: 00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180 open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693 btrfs_fill_super fs/btrfs/super.c:1382 [inline] btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1498 fc_mount fs/namespace.c:993 [inline] vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023 btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1498 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae Because fs_devices->rw_devices was not 0 after closing all devices. Here is the call trace that was observed: btrfs_mount_root(): btrfs_scan_one_device(): device_list_add(); <---------------- device added btrfs_open_devices(): open_fs_devices(): btrfs_open_one_device(); <-------- writable device opened, rw device count ++ btrfs_fill_super(): open_ctree(): btrfs_free_extra_devids(): __btrfs_free_extra_devids(); <--- writable device removed, rw device count not decremented fail_tree_roots: btrfs_close_devices(): close_fs_devices(); <------- rw device count off by 1 As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device"), rw_devices was decremented on removing a writable device in __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit was not set for the device. However, this check does not need to be reinstated as it is now redundant and incorrect. In __btrfs_free_extra_devids, we skip removing the device if it is the target for replacement. This is done by checking whether device->devid == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check, and so it's redundant to test for that bit. Additionally, following commit 82372bc816d7 ("Btrfs: make the logic of source device removing more clear"), rw_devices is incremented whenever a writeable device is added to the alloc list (including the target device in btrfs_dev_replace_finishing), so all removals of writable devices from the alloc list should also be accompanied by a decrement to rw_devices. Reported-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device") CC: stable@vger.kernel.org # 5.10+ Tested-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com Reviewed-by: Anand Jain Signed-off-by: Desmond Cheong Zhi Xi Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 9e4417af187e0b7eb44b188207fb675a1e2f235c Author: Filipe Manana Date: Tue Jul 27 11:24:43 2021 +0100 btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction commit ecc64fab7d49c678e70bd4c35fe64d2ab3e3d212 upstream. When checking if we need to log the new name of a renamed inode, we are checking if the inode and its parent inode have been logged before, and if not we don't log the new name. The check however is buggy, as it directly compares the logged_trans field of the inodes versus the ID of the current transaction. The problem is that logged_trans is a transient field, only stored in memory and never persisted in the inode item, so if an inode was logged before, evicted and reloaded, its logged_trans field is set to a value of 0, meaning the check will return false and the new name of the renamed inode is not logged. If the old parent directory was previously fsynced and we deleted the logged directory entries corresponding to the old name, we end up with a log that when replayed will delete the renamed inode. The following example triggers the problem: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ mkdir /mnt/A $ mkdir /mnt/B $ echo -n "hello world" > /mnt/A/foo $ sync # Add some new file to A and fsync directory A. $ touch /mnt/A/bar $ xfs_io -c "fsync" /mnt/A # Now trigger inode eviction. We are only interested in triggering # eviction for the inode of directory A. $ echo 2 > /proc/sys/vm/drop_caches # Move foo from directory A to directory B. # This deletes the directory entries for foo in A from the log, and # does not add the new name for foo in directory B to the log, because # logged_trans of A is 0, which is less than the current transaction ID. $ mv /mnt/A/foo /mnt/B/foo # Now make an fsync to anything except A, B or any file inside them, # like for example create a file at the root directory and fsync this # new file. This syncs the log that contains all the changes done by # previous rename operation. $ touch /mnt/baz $ xfs_io -c "fsync" /mnt/baz # Mount the filesystem and replay the log. $ mount /dev/sdc /mnt # Check the filesystem content. $ ls -1R /mnt /mnt/: A B baz /mnt/A: bar /mnt/B: $ # File foo is gone, it's neither in A/ nor in B/. Fix this by using the inode_logged() helper at btrfs_log_new_name(), which safely checks if an inode was logged before in the current transaction. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 89e34995bdd78f798021adb95a57b10164d8231f Author: Javier Pello Date: Wed Jul 14 18:54:48 2021 +0200 fs/ext2: Avoid page_address on pages returned by ext2_get_page commit 728d392f8a799f037812d0f2b254fb3b5e115fcf upstream. Commit 782b76d7abdf02b12c46ed6f1e9bf715569027f7 ("fs/ext2: Replace kmap() with kmap_local_page()") replaced the kmap/kunmap calls in ext2_get_page/ext2_put_page with kmap_local_page/kunmap_local for efficiency reasons. As a necessary side change, the commit also made ext2_get_page (and ext2_find_entry and ext2_dotdot) return the mapping address along with the page itself, as it is required for kunmap_local, and converted uses of page_address on such pages to use the newly returned address instead. However, uses of page_address on such pages were missed in ext2_check_page and ext2_delete_entry, which triggers oopses if kmap_local_page happens to return an address from high memory. Fix this now by converting the remaining uses of page_address to use the right address, as returned by kmap_local_page. Link: https://lore.kernel.org/r/20210714185448.8707ac239e9f12b3a7f5b9f9@urjc.es Reviewed-by: Ira Weiny Signed-off-by: Javier Pello Fixes: 782b76d7abdf ("fs/ext2: Replace kmap() with kmap_local_page()") Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit f0aa1bc37e9a9e48923a7e04dba7da2b59654ec8 Author: Linus Torvalds Date: Fri Jul 30 15:42:34 2021 -0700 pipe: make pipe writes always wake up readers commit 3a34b13a88caeb2800ab44a4918f230041b37dd9 upstream. Since commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic") we have sanitized the pipe write logic, and would only try to wake up readers if they needed it. In particular, if the pipe already had data in it before the write, there was no point in trying to wake up a reader, since any existing readers must have been aware of the pre-existing data already. Doing extraneous wakeups will only cause potential thundering herd problems. However, it turns out that some Android libraries have misused the EPOLL interface, and expected "edge triggered" be to "any new write will trigger it". Even if there was no edge in sight. Quoting Sandeep Patil: "The commit 1b6b26ae7053 ('pipe: fix and clarify pipe write wakeup logic') changed pipe write logic to wakeup readers only if the pipe was empty at the time of write. However, there are libraries that relied upon the older behavior for notification scheme similar to what's described in [1] One such library 'realm-core'[2] is used by numerous Android applications. The library uses a similar notification mechanism as GNU Make but it never drains the pipe until it is full. When Android moved to v5.10 kernel, all applications using this library stopped working. The library has since been fixed[3] but it will be a while before all applications incorporate the updated library" Our regression rule for the kernel is that if applications break from new behavior, it's a regression, even if it was because the application did something patently wrong. Also note the original report [4] by Michal Kerrisk about a test for this epoll behavior - but at that point we didn't know of any actual broken use case. So add the extraneous wakeup, to approximate the old behavior. [ I say "approximate", because the exact old behavior was to do a wakeup not for each write(), but for each pipe buffer chunk that was filled in. The behavior introduced by this change is not that - this is just "every write will cause a wakeup, whether necessary or not", which seems to be sufficient for the broken library use. ] It's worth noting that this adds the extraneous wakeup only for the write side, while the read side still considers the "edge" to be purely about reading enough from the pipe to allow further writes. See commit f467a6a66419 ("pipe: fix and clarify pipe read wakeup logic") for the pipe read case, which remains that "only wake up if the pipe was full, and we read something from it". Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1] Link: https://github.com/realm/realm-core [2] Link: https://github.com/realm/realm-core/issues/4666 [3] Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4] Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/ Reported-by: Sandeep Patil Cc: Michael Kerrisk Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 5a5aaf4177dad7dc35612ac74dd51ad8d1f040d0 Author: Greg Kroah-Hartman Date: Wed Jul 28 13:51:58 2021 +0200 selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c When backporting 0db282ba2c12 ("selftest: use mmap instead of posix_memalign to allocate memory") to this stable branch, I forgot a { breaking the build. Signed-off-by: Greg Kroah-Hartman