/[packages]/backports/8/kernel/current/SOURCES/mm-gup-fix-foll_force-cow-security-issue-and-remove-foll_cow.patch
Revision 1880454 - Fri Aug 26 04:48:43 2022 UTC by tmb
File size: 12199 byte(s)
- update to 5.19.4
  * drop merged patches
- add current -stable queue


From david@redhat.com Thu Aug 25 12:40:27 2022
From: David Hildenbrand <david@redhat.com>
Date: Wed, 24 Aug 2022 21:23:33 +0200
Subject: mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@redhat.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Axel Rasmussen <axelrasmussen@google.com>, Nadav Amit <nadav.amit@gmail.com>, Peter Xu <peterx@redhat.com>, Hugh Dickins <hughd@google.com>, Andrea Arcangeli <aarcange@redhat.com>, Matthew Wilcox <willy@infradead.org>, Vlastimil Babka <vbabka@suse.cz>, John Hubbard <jhubbard@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>, David Laight <David.Laight@ACULAB.COM>, stable@vger.kernel.org
Message-ID: <20220824192333.287405-1-david@redhat.com>

From: David Hildenbrand <david@redhat.com>

commit 5535be3099717646781ce1540cf725965d680e7b upstream.

Ever since the Dirty COW (CVE-2016-5195) security issue happened, we know
that FOLL_FORCE can be possibly dangerous, especially if there are races
that can be exploited by user space.

Right now, it would be sufficient to have some code that sets a PTE of a
R/O-mapped shared page dirty, in order for it to erroneously become
writable by FOLL_FORCE. The implications of setting a write-protected PTE
dirty might not be immediately obvious to everyone.

And in fact ever since commit 9ae0f87d009c ("mm/shmem: unconditionally set
pte dirty in mfill_atomic_install_pte"), we can use UFFDIO_CONTINUE to map
a shmem page R/O while marking the pte dirty. This can be used by
unprivileged user space to modify tmpfs/shmem file content even if the
user does not have write permissions to the file, and to bypass memfd
write sealing -- Dirty COW restricted to tmpfs/shmem (CVE-2022-2590).

To fix such security issues for good, the insight is that we really only
need that fancy retry logic (FOLL_COW) for COW mappings that are not
writable (!VM_WRITE). And in a COW mapping, we really only broke COW if
we have an exclusive anonymous page mapped. If we have something else
mapped, or the mapped anonymous page might be shared (!PageAnonExclusive),
we have to trigger a write fault to break COW. If we don't find an
exclusive anonymous page when we retry, we have to trigger COW breaking
once again because something intervened.

Let's move away from this mandatory-retry + dirty handling and rely on our
PageAnonExclusive() flag for making a similar decision, to use the same
COW logic as in other kernel parts here as well. In case we stumble over
a PTE in a COW mapping that does not map an exclusive anonymous page, COW
was not properly broken and we have to trigger a fake write-fault to break
COW.

Just like we do in can_change_pte_writable() added via commit 64fe24a3e05e
("mm/mprotect: try avoiding write faults for exclusive anonymous pages
when changing protection") and commit 76aefad628aa ("mm/mprotect: fix
soft-dirty check in can_change_pte_writable()"), take care of softdirty
and uffd-wp manually.

For example, a write() via /proc/self/mem to a uffd-wp-protected range has
to fail instead of silently granting write access and bypassing the
userspace fault handler. Note that FOLL_FORCE is not only used for debug
access, but also triggered by applications without debug intentions, for
example, when pinning pages via RDMA.
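
Not part of the upstream changelog, purely as an illustration of how unprivileged
code reaches this FOLL_FORCE path: the sketch below writes through /proc/self/mem
into the process's own read-only private anonymous mapping, which the kernel
services via GUP with FOLL_FORCE | FOLL_WRITE. The mapping setup and the written
string are arbitrary choices for the demo.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);

	/* Private, read-only anonymous mapping: a plain store would segfault. */
	char *p = mmap(NULL, page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/*
	 * Writes to /proc/self/mem are serviced with FOLL_FORCE, so they may
	 * succeed despite the R/O protection. With this patch the kernel only
	 * treats the unwritable PTE as writable once COW really left an
	 * exclusive anonymous page behind; otherwise it takes the write fault.
	 */
	int fd = open("/proc/self/mem", O_RDWR);
	if (fd < 0)
		return 1;
	ssize_t ret = pwrite(fd, "hello", 6, (off_t)(uintptr_t)p);
	printf("pwrite returned %zd, mapping now contains \"%s\"\n", ret, p);
	close(fd);
	return 0;
}

Before this fix the same kind of access could also be granted based on the
FOLL_COW + pte_dirty() heuristic, which is what made the dirty-but-unwritable
PTEs described above exploitable.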

This fixes CVE-2022-2590. Note that only x86_64 and aarch64 are
affected, because only those support CONFIG_HAVE_ARCH_USERFAULTFD_MINOR.

Fortunately, FOLL_COW is no longer required to handle FOLL_FORCE. So
let's just get rid of it.

Thanks to Nadav Amit for pointing out that the pte_dirty() check in
FOLL_FORCE code is problematic and might be exploitable.

Note 1: We don't check for the PTE being dirty because it doesn't matter
        for making a "was COWed" decision anymore, and whoever modifies the
        page has to set the page dirty either way.

Note 2: Kernels before extended uffd-wp support and before
        PageAnonExclusive (< 5.19) can simply revert the problematic
        commit instead and be safe regarding UFFDIO_CONTINUE. A backport to
        v5.19 requires minor adjustments due to lack of
        vma_soft_dirty_enabled().

Link: https://lkml.kernel.org/r/20220809205640.70916-1-david@redhat.com
Fixes: 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: <stable@vger.kernel.org> [5.16]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/mm.h |    1 
 mm/gup.c           |   69 ++++++++++++++++++++++++++++++++++++-----------------
 mm/huge_memory.c   |   65 +++++++++++++++++++++++++++++++++----------------
 3 files changed, 91 insertions(+), 44 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2939,7 +2939,6 @@ struct page *follow_page(struct vm_area_
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
-#define FOLL_COW	0x4000	/* internal GUP flag */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -478,14 +478,43 @@ static int follow_pfn_pte(struct vm_area
 	return -EEXIST;
 }
 
-/*
- * FOLL_FORCE can write to even unwritable pte's, but only
- * after we've gone through a COW cycle and they are dirty.
- */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+/* FOLL_FORCE can write to even unwritable PTEs in COW mappings. */
+static inline bool can_follow_write_pte(pte_t pte, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
 {
-	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+	/* If the pte is writable, we can write to the page. */
+	if (pte_write(pte))
+		return true;
+
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) &&
+	    !(vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
+		return false;
+	return !userfaultfd_pte_wp(vma, pte);
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -528,12 +557,19 @@ retry:
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
-		pte_unmap_unlock(ptep, ptl);
-		return NULL;
-	}
 
 	page = vm_normal_page(vma, address, pte);
+
+	/*
+	 * We only care about anon pages in can_follow_write_pte() and don't
+	 * have to worry about pte_devmap() because they are never anon.
+	 */
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pte(pte, page, vma, flags)) {
+		page = NULL;
+		goto out;
+	}
+
 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
@@ -967,17 +1003,6 @@ static int faultin_page(struct vm_area_s
 		return -EBUSY;
 	}
 
-	/*
-	 * The VM_FAULT_WRITE bit tells us that do_wp_page has broken COW when
-	 * necessary, even if maybe_mkwrite decided not to set pte_write. We
-	 * can thus safely do subsequent page lookups as if they were reads.
-	 * But only do so when looping for pte_write is futile: in some cases
-	 * userspace may also be wanting to write to the gotten user page,
-	 * which a read fault here might prevent (a readonly page might get
-	 * reCOWed by userspace write).
-	 */
-	if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
-		*flags |= FOLL_COW;
 	return 0;
 }
 
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -978,12 +978,6 @@ struct page *follow_devmap_pmd(struct vm
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	/*
-	 * When we COW a devmap PMD entry, we split it into PTEs, so we should
-	 * not be in this function with `flags & FOLL_COW` set.
-	 */
-	WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
-
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
@@ -1349,14 +1343,43 @@ fallback:
 	return VM_FAULT_FALLBACK;
 }
 
-/*
- * FOLL_FORCE can write to even unwritable pmd's, but only
- * after we've gone through a COW cycle and they are dirty.
- */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
+static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
 {
-	return pmd_write(pmd) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	/* If the pmd is writable, we can write to the page. */
+	if (pmd_write(pmd))
+		return true;
+
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) &&
+	    !(vma->vm_flags & VM_SOFTDIRTY) && !pmd_soft_dirty(pmd))
+		return false;
+	return !userfaultfd_huge_pmd_wp(vma, pmd);
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1365,12 +1388,16 @@ struct page *follow_trans_huge_pmd(struc
 					  unsigned int flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct page *page = NULL;
+	struct page *page;
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
-		goto out;
+	page = pmd_page(*pmd);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pmd(*pmd, page, vma, flags))
+		return NULL;
 
 	/* Avoid dumping huge zero page */
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
@@ -1378,10 +1405,7 @@ struct page *follow_trans_huge_pmd(struc
 
 	/* Full NUMA hinting faults to serialise migration in fault paths */
 	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
-		goto out;
-
-	page = pmd_page(*pmd);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+		return NULL;
 
 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
@@ -1398,7 +1422,6 @@ struct page *follow_trans_huge_pmd(struc
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
 
-out:
 	return page;
 }
 
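
As a reading aid rather than part of the patch: the policy that both
can_follow_write_pte() and can_follow_write_pmd() implement above can be
restated as the small standalone predicate below. All fields are stand-ins for
state the kernel derives from the PTE/PMD and the vma; the struct and function
names are invented for this sketch and do not exist in the kernel.

#include <stdbool.h>

struct gup_write_state {
	bool entry_writable;		/* pte_write() / pmd_write() */
	bool foll_force;		/* gup_flags contain FOLL_FORCE */
	bool shared_mapping;		/* vma has VM_MAYSHARE or VM_SHARED */
	bool may_write;			/* vma has VM_MAYWRITE */
	bool vm_write;			/* vma has VM_WRITE */
	bool exclusive_anon_page;	/* PageAnon() && PageAnonExclusive() */
	bool softdirty_fault_needed;	/* soft-dirty tracking still wants a fault */
	bool uffd_wp_armed;		/* userfaultfd write-protect is armed */
};

/* Returns true if GUP may treat the unwritable entry as writable. */
bool can_follow_write(const struct gup_write_state *s)
{
	if (s->entry_writable)
		return true;		/* already writable, nothing to override */
	if (!s->foll_force)
		return false;		/* no override requested */
	if (s->shared_mapping || !s->may_write || s->vm_write)
		return false;		/* FOLL_FORCE only helps R/O private COW mappings */
	if (!s->exclusive_anon_page)
		return false;		/* COW not (yet) broken: take the write fault */
	if (s->softdirty_fault_needed || s->uffd_wp_armed)
		return false;		/* a write fault is still required */
	return true;
}

The ordering mirrors the diff: vma-level checks first, then the per-page
exclusivity check, then the soft-dirty and uffd-wp exceptions.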
