Analyzing the Linux Kernel vmsplice Exploit

Zero-day emerges

On February 9, 2008, zero-day exploit code [1] was posted on the milw0rm site.
It exploits a vulnerability in Linux kernel versions 2.6.17 through 2.6.24.1
that allows an unprivileged local user to gain root privileges. The
vulnerability was assigned CVE-2008-0600.
There are reports that the exploit is reliable and actively used in the wild.
Its inner workings are quite interesting from a technical point of view;
let’s have a look.

Details on the vulnerability and methods of exploitation

The vulnerability lies in the get_iovec_page_array function
(in fs/splice.c; line numbers are from the 2.6.23.1-42.fc8 kernel),
which is reachable from the vmsplice() system call:

1286:       if (unlikely(!len)) /* "len" is under the user's control */
1287:               break;
...
1296:       off = (unsigned long) base & ~PAGE_MASK;
...
1306:       npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
1307:       if (npages > PIPE_BUFFERS - buffers)
1308:               npages = PIPE_BUFFERS - buffers;
1309:
1310:       error = get_user_pages(current, current->mm,
1311:                              (unsigned long) base, npages, 0, 0,
1312:                              &pages[buffers], NULL);

The get_user_pages function expects its fourth argument (the
number of page descriptors to fill, which also limits the return value) to be
at least 1. The preceding code assumes that npages is at least 1: len must be
nonzero, so the off + len + PAGE_SIZE - 1 expression should be greater than or
equal to PAGE_SIZE. However, if len is close to UINT32_MAX, the
off + len + PAGE_SIZE - 1 computation wraps around, and npages can be zero.
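
The wrap is easy to reproduce in user space. The following is a minimal sketch, assuming 32-bit unsigned arithmetic and 4 KB pages; the MODEL_* constants and both function names are hypothetical and do not appear in the kernel. The second function shows one possible way to reject a wrapping length before the computation; it is an illustration only, not the fix that was applied upstream.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_SHIFT and PAGE_SIZE (4 KB pages assumed). */
#define MODEL_PAGE_SHIFT 12u
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)

/* Mirrors the arithmetic of line 1306 with 32-bit unsigned values. */
static uint32_t npages_unchecked(uint32_t base, uint32_t len)
{
    assert(len != 0); /* the kernel code bails out earlier when len is zero */
    uint32_t off = base & (MODEL_PAGE_SIZE - 1);
    /* Unsigned arithmetic wraps modulo 2^32, so a huge len can yield 0. */
    return (off + len + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Illustrative guard: returns -1 if the sum would wrap, 0 otherwise. */
static int npages_checked(uint32_t base, uint32_t len, uint32_t *out)
{
    uint32_t off = base & (MODEL_PAGE_SIZE - 1);
    if (len == 0 || len > UINT32_MAX - off - (MODEL_PAGE_SIZE - 1))
        return -1;
    *out = (off + len + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
    return 0;
}
```

With base = 0, a len of 1 gives one page as expected, while a len of UINT32_MAX makes npages_unchecked return 0 and npages_checked return -1.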

As a result, get_user_pages may return more than
PIPE_BUFFERS entries, and the pages array will
overflow. However, the overflow payload is not controlled by the attacker,
so it would be difficult to turn this overflow into reliable code execution.

Reliable exploitation is made possible by the subsequent loop:

1320:       for (i = 0; i < error; i++) {
1321:               const int plen = min_t(size_t, len, PAGE_SIZE - off);
1322:
1323:               partial[buffers].offset = off;
1324:               partial[buffers].len = plen;
1325:
1326:               off = 0;
1327:               len -= plen;
1328:               buffers++;
1329:       }

Here, the partial array, which is also PIPE_BUFFERS
elements long, is overflowed with (off=0, plen=0x1000) pairs. Depending on the
variable layout chosen by the compiler, various data structures that follow
the partial array can be overwritten with zeros. In the most common case, the
pages array is located right after the partial array. The pages array contains
pointers, so after the preceding loop it will contain NULL pointers.

Normally, when the kernel tries to access a NULL pointer, it will result in an
exception and the process will be terminated. However, the attacker can map
memory pages at address zero, and store arbitrary data there. In such a scenario,
when the kernel dereferences pointers from the pages array,
attacker-controlled data will be processed, which may result in arbitrary
code execution in the kernel context. In our case, the convenient technique is
to make an entry in the pages array look like a compound page
descriptor, which results in a function call to an attacker-controlled
address in user space:

static void put_compound_page(struct page *page) /* attacker controls arg */
{
        page = (struct page *)page_private(page);
        if (put_page_testzero(page)) {
                void (*dtor)(struct page *page);

                dtor = (void (*)(struct page *))page[1].lru.next;
                (*dtor)(page); /* so attacker controls the target of the call */
        }
}
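
The control flow of this function can be modeled in ordinary user-space C. The sketch below is a simplified model under stated assumptions: struct page_model and struct list_model are hypothetical stand-ins with an invented layout (the real struct page differs), and the "destructor" is a harmless function that only counts its calls. It shows just the mechanism: whoever fills in the descriptor decides which function gets called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for kernel types (layout is assumed). */
struct list_model { void *next, *prev; };

struct page_model {
    int count;                /* models the page reference count */
    struct page_model *head;  /* models page_private(): pointer to head page */
    struct list_model lru;    /* head[1].lru.next holds the destructor */
};

static int dtor_calls = 0;

/* Harmless destructor used by the model; it only records that it ran. */
static void recording_dtor(struct page_model *page)
{
    (void)page;
    dtor_calls++;
}

/* Same shape as put_compound_page, operating on the model types. */
static void put_compound_page_model(struct page_model *page)
{
    assert(page != NULL);
    page = page->head;
    if (--page->count == 0) { /* models put_page_testzero() */
        void (*dtor)(struct page_model *);
        dtor = (void (*)(struct page_model *))(uintptr_t)page[1].lru.next;
        dtor(page); /* target chosen by whoever built the descriptor */
    }
}

/* Builds a two-entry descriptor and returns how often the dtor ran. */
static int run_model(void)
{
    struct page_model fake[2] = { { 0 } };
    fake[0].count = 1;
    fake[0].head = &fake[0];
    fake[1].lru.next = (void *)(uintptr_t)recording_dtor;
    put_compound_page_model(&fake[0]);
    return dtor_calls;
}
```

Calling run_model() once drives the reference count to zero and invokes recording_dtor through the pointer stored in the second descriptor entry.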

To sum up, the exploitation involves:

  • integer overflow
  • buffer overflow
  • mapping the zero address to allow NULL dereference

Workarounds

Upgrading the kernel is the preferred solution; if that is not feasible, there
are workarounds.

A simple kernel module, which disables the sys_vmsplice system
call, has been posted [2].

The exploit we’ve discussed relies heavily on the ability to map memory at
address zero. Starting with kernel 2.6.23, there is a mechanism to forbid such
mappings via procfs. The echo 65536 > /proc/sys/vm/mmap_min_addr
command sets the lowest mappable address to 64K. Note that:

  • SELinux must be enabled (in enforcing mode) for this command to take effect.
  • Although this setting certainly makes the current exploit fail, there is a nonzero probability that the vulnerability can be exploited without mapping the zero address. I know of no code capable of such exploitation; however, it cannot be ruled out.
  • This setting may prevent exploitation of future NULL pointer dereference vulnerabilities. Very few programs make legitimate use of mapping the zero address.
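
To verify that the setting is in effect, one can check from an unprivileged account whether the zero page is still mappable. The following is a minimal sketch, assuming a Linux system with procfs mounted; the two function names are hypothetical helpers, while mmap, munmap, and sysconf are the standard POSIX calls.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the value of /proc/sys/vm/mmap_min_addr, or -1 if unreadable. */
static long read_mmap_min_addr(void)
{
    long value = -1;
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns 1 if this process can map the page at address zero, else 0. */
static int zero_page_mappable(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    void *p = mmap((void *)0, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return 0; /* the kernel refused the mapping */
    munmap(p, (size_t)page_size);
    return 1;
}
```

If the mitigation is active, zero_page_mappable() should return 0 for an unprivileged user and read_mmap_min_addr() should report the configured value. Run the check as the same kind of user an attacker would have, since privileged processes may still be allowed to create low mappings.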
