100 changes: 100 additions & 0 deletions pocs/linux/kernelctf/CVE-2023-4623_mitigation/docs/exploit.md
@@ -0,0 +1,100 @@
## Setup

To trigger the vulnerability, we need to create the following configuration on the loopback interface:

```
qdisc hfsc 1: dev lo root refcnt 2 default 10
class hfsc 1: root
class hfsc 1:1 parent 1: rt m1 2Kbit d 8us m2 800bit
class hfsc 1:10 parent 1:1 ls m1 2Kbit d 8us m2 800bit
class hfsc 1:2 parent 1: ls m1 2Kbit d 8us m2 800bit
```

Classes marked 'ls' are link-sharing classes (they have the HFSC_FSC flag set). Class 1:1 is marked 'rt' only, so it does not have that flag.

The last class (1:2) is not required to trigger the bug, but it helps with exploitation, as explained below.
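
As a defensive aid, the shape of this configuration can be recognized from the class listing itself. The following is a minimal sketch, assuming the text format shown in the listing above; the helper names `parse_classes` and `risky_pairs` are hypothetical, and excluding the root class from the check is an assumption of this sketch.

```python
import re

# Sample input copied from the class listing above.
TC_OUTPUT = """\
class hfsc 1: root
class hfsc 1:1 parent 1: rt m1 2Kbit d 8us m2 800bit
class hfsc 1:10 parent 1:1 ls m1 2Kbit d 8us m2 800bit
class hfsc 1:2 parent 1: ls m1 2Kbit d 8us m2 800bit
"""

# Matches "class hfsc <id> root" or "class hfsc <id> parent <pid> <curves...>".
CLASS_RE = re.compile(r"^class hfsc (\S+) (?:root|parent (\S+))(.*)$")


def parse_classes(text):
    """Return {classid: (parent_or_None, has_link_sharing)}."""
    classes = {}
    for line in text.splitlines():
        m = CLASS_RE.match(line.strip())
        if not m:
            continue
        classid, parent, rest = m.groups()
        # Only the 'ls' keyword from the listing above is recognized here.
        classes[classid] = (parent, "ls" in rest.split())
    return classes


def risky_pairs(classes):
    """List (child, parent) pairs where a link-sharing child sits under a
    non-root parent that has no link-sharing curve."""
    pairs = []
    for classid, (parent, has_ls) in classes.items():
        if parent is None or parent not in classes:
            continue
        grandparent, parent_has_ls = classes[parent]
        parent_is_root = grandparent is None
        if has_ls and not parent_is_root and not parent_has_ls:
            pairs.append((classid, parent))
    return pairs


if __name__ == "__main__":
    print(risky_pairs(parse_classes(TC_OUTPUT)))
```

For the listing above, the only pair reported is class 1:10 under class 1:1, which is the relationship the trigger relies on.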

## Triggering the vulnerability

The first step is to send a packet on the loopback interface. There are no filters, so the packet is classified to the default class 1:10 and enqueued there.
This causes class 1:10 to be inserted into the vttree of 1:1, and class 1:1 to be inserted into the vttree of the root class.
Then hfsc_dequeue() is called. It calls update_vf(), which removes class 1:10 from the vttree of 1:1 but skips 1:1 (it has no HFSC_FSC flag), so 1:1 remains in the vttree of the root class.


The second step is to remove classes 1:10 and 1:1.
Their removal will trigger freeing of the associated qdisc objects (after an RCU delay).

We then reclaim the freed qdisc memory using a netlink allocation primitive.
When a message is sent on a netlink socket (or any socket, for that matter), a buffer for the message data is allocated using kmalloc().
This primitive has the advantage of not reserving any space at the beginning of the buffer, which is important to us because a qdisc object looks like this:
```
struct Qdisc {
	int                 (*enqueue)(struct sk_buff *, struct Qdisc *, struct sk_buff * *); /* 0    0x8 */
	struct sk_buff *    (*dequeue)(struct Qdisc *);                                        /* 0x8  0x8 */
	unsigned int        flags;                                                             /* 0x10 0x4 */
	...
```


We need to control the first 16 bytes, which hold the enqueue and dequeue function pointers, to get easy code execution.

Next, we delete class 1:2 and add a new 1:10 class.

Then we send another packet. It will get enqueued to the newly created 1:10 class and hfsc_dequeue() will be called.
hfsc_dequeue() calls vttree_get_minvt() to select the class for the packet to be dequeued from.
vttree_get_minvt() traverses the vttree starting from the qdisc's root and will find a pointer to the old 1:10 class that was freed.
The contents of that memory were not changed and it still has a pointer to the old qdisc, which is now replaced by our fake object.

That's why we needed the 1:2 class to be deleted - without this step, the new 1:10 class would get the same memory as the previously freed 1:10, fixing the dangling pointer.

## Getting RIP control

After vttree_get_minvt() returns the pointer to the freed class, qdisc_dequeue_peeked() is called with the pointer to our fake object, and eventually the ->dequeue() function pointer is called.

## Pivot to ROP

When ->dequeue() is called, RDI contains a pointer to the qdisc object, which is under our control.

The following gadgets are used to transfer control to our ROP chain:

```
mov rax, qword ptr [rdi]
mov rbx, rdi
call __x86_indirect_thunk_rax
```


```
lea rsi, [rbx + 0x48]
test eax, eax
jg 0xffffffff81204d3a
mov rax, qword ptr [rbx + 0x30]
lea rdi, [rsp + 8]
call __x86_indirect_thunk_rax
```


```
push rsi
jmp qword ptr [rsi + 0x66]
```

and finally

```
pop rsp
ret
```

### Second pivot

At this point we have full ROP, but there is not much space left, because most of our 512-byte buffer is taken up by the skb_shared_info placed at its end.

To solve this we choose an unused read/write area in the kernel and use copy_user_generic_string() to copy the second stage ROP from userspace to that area.
Then we use a `pop rsp ; ret` gadget to pivot there.

## Privilege escalation

The second stage of the ROP chain runs the standard `commit_creds(init_cred); switch_task_namespaces(pid, init_nsproxy);` sequence and then returns to userspace.


@@ -0,0 +1,26 @@
## Requirements to trigger the vulnerability

- CAP_NET_ADMIN in a namespace is required
- Kernel configuration: CONFIG_NET_SCH_HFSC
- User namespaces required: Yes
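
Whether the kernel configuration requirement is met on a given system can be checked from the kernel config text. This is a minimal sketch; the function name `hfsc_enabled` is hypothetical, and it assumes the usual `CONFIG_NAME=y|m` line format of kernel config files.

```python
def hfsc_enabled(config_text):
    """Return True if CONFIG_NET_SCH_HFSC is built in (=y) or modular (=m)."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_NET_SCH_HFSC="):
            return line.split("=", 1)[1] in ("y", "m")
    # A missing line or a "# CONFIG_NET_SCH_HFSC is not set" comment
    # both mean the scheduler is not available.
    return False


if __name__ == "__main__":
    sample = "CONFIG_NET_SCHED=y\nCONFIG_NET_SCH_HFSC=m\n"
    print(hfsc_enabled(sample))
```

The config text would typically come from a file such as `/boot/config-<release>`, where the distribution ships one. If the option is disabled, the vulnerable code is not reachable.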

## Commit which fixed the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3d26c5702c7d6c45456326e56d2ccf3f103e60f

## Affected kernel versions

Introduced in 2.6.3. Fixed in 6.1.52 and other stable trees.
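
For triage, a release string can be compared against the 6.1.52 fix named above. This is a rough sketch only: the helper names are hypothetical, it covers just the 6.1 stable line because the other fixed stable versions are not listed here, and distribution kernels often backport fixes without changing the release string, so a result of `True` is a hint rather than proof.

```python
import re


def parse_version(release):
    """Turn a release string such as '6.1.51-generic' into (6, 1, 51)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def predates_6_1_fix(release):
    """True if a 6.1.y release is older than 6.1.52, False if not,
    None if the release is not on the 6.1 line (unknown to this sketch)."""
    major, minor, patch = parse_version(release)
    if (major, minor) != (6, 1):
        return None
    return patch < 52


if __name__ == "__main__":
    for rel in ("6.1.51", "6.1.52", "6.5.0-generic"):
        print(rel, predates_6_1_fix(rel))
```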

## Affected component, subsystem

net/sched: sch_hfsc

## Description

HFSC is a classful scheduler and its classes can be created with different flags affecting the scheduler behaviour.

When a packet is enqueued to a class with the HFSC_FSC (link-sharing enabled) flag, the class is inserted into its parent's class tree in init_vf()/vttree_insert().
Normally, the parent also has the HFSC_FSC flag, and the class is removed from the parent's tree in update_vf()/vttree_remove() during the packet dequeue operation.
However, if the parent has no link-sharing flag, it is skipped during the tree traversal in update_vf() and stays referenced in the tree of its own parent.
If an attacker then deletes these classes, a use-after-free condition can be triggered during enqueue/dequeue operations.
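
The bookkeeping mismatch can be illustrated with a toy model. This is a simplified sketch, not the kernel code: `ToyClass`, the `fsc` attribute, and the use of a Python set in place of the vttree are all assumptions made for illustration, and the functions only borrow the names init_vf() and update_vf() from the text above.

```python
class ToyClass:
    """Stand-in for an HFSC class; `fsc` models the HFSC_FSC flag."""

    def __init__(self, name, parent=None, fsc=True):
        self.name = name
        self.parent = parent
        self.fsc = fsc
        self.vttree = set()  # names of children currently linked here


def init_vf(cl):
    """Enqueue side: link every class on the path into its parent's tree."""
    while cl.parent is not None:
        cl.parent.vttree.add(cl.name)
        cl = cl.parent


def update_vf(cl):
    """Dequeue side: unlink on the way up, skipping classes without the flag."""
    while cl.parent is not None:
        if cl.fsc:
            cl.parent.vttree.discard(cl.name)
        cl = cl.parent


def leftover_after_one_packet():
    """Build root -> 1:1 (no flag) -> 1:10 (flag) and run one enqueue/dequeue."""
    root = ToyClass("1:", fsc=False)
    c1 = ToyClass("1:1", parent=root, fsc=False)
    c10 = ToyClass("1:10", parent=c1, fsc=True)
    init_vf(c10)
    update_vf(c10)
    return root.vttree, c1.vttree


if __name__ == "__main__":
    print(leftover_after_one_packet())
```

In this model the tree of class 1:1 is empty after the dequeue, while the root's tree still names class 1:1. That leftover entry is the stale reference that becomes dangling once the classes are deleted.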
@@ -0,0 +1,9 @@
INCLUDES = -I/usr/include/libnl3
LIBS = -L. -pthread -lnl-cli-3 -lnl-route-3 -lnl-3 -ldl
CFLAGS = -fomit-frame-pointer -static -fcf-protection=none

exploit: exploit.c
	gcc -o $@ exploit.c $(INCLUDES) $(CFLAGS) $(LIBS)

prerequisites:
	sudo apt-get install libnl-cli-3-dev libnl-route-3-dev
Binary file not shown.