thr3ads.net - Linux Virtualization - [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle

Quan Xu

2017-Nov-14 08:15 UTC

[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

On 2017/11/14 15:12, Wanpeng Li wrote:
> 2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0 at gmail.com>:
>>
>> On 2017/11/13 18:53, Juergen Gross wrote:
>>> On 13/11/17 11:06, Quan Xu wrote:
>>>> From: Quan Xu <quan.xu0 at gmail.com>
>>>>
>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>> in idle path which will poll for a while before we enter the real idle
>>>> state.
>>>>
>>>> In virtualization, idle path includes several heavy operations
>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>> hurt performance especially for latency intensive workload like message
>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>> context switch between virtual machine and hypervisor. Our solution is
>>>> to poll for a while and do not enter real idle path if we can get the
>>>> schedule event during polling.
>>>>
>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>> reduce the useless poll.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
>>>> Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
>>>> Cc: Juergen Gross <jgross at suse.com>
>>>> Cc: Alok Kataria <akataria at vmware.com>
>>>> Cc: Rusty Russell <rusty at rustcorp.com.au>
>>>> Cc: Thomas Gleixner <tglx at linutronix.de>
>>>> Cc: Ingo Molnar <mingo at redhat.com>
>>>> Cc: "H. Peter Anvin" <hpa at zytor.com>
>>>> Cc: x86 at kernel.org
>>>> Cc: virtualization at lists.linux-foundation.org
>>>> Cc: linux-kernel at vger.kernel.org
>>>> Cc: xen-devel at lists.xenproject.org
>>> Hmm, is the idle entry path really so critical to performance that a new
>>> pvops function is necessary?
>> Juergen, Here is the data we get when running benchmark netperf:
>>   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>      29031.6 bit/s -- 76.1 %CPU
>>
>>   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>      35787.7 bit/s -- 129.4 %CPU
>>
>>   3. w/ kvm dynamic poll:
>>      35735.6 bit/s -- 200.0 %CPU
> Actually we can reduce the CPU utilization by sleeping a period of
> time as what has already been done in the poll logic of IO subsystem,
> then we can improve the algorithm in kvm instead of introducing another
> duplicate one in the kvm guest.

We really appreciate upstream's kvm dynamic poll mechanism, which is
really helpful in a lot of scenarios.

However, as the patch description says, in virtualization the idle path
includes several heavy operations, including timer access (LAPIC timer
or TSC deadline timer), which hurt performance, especially for
latency-sensitive workloads such as message-passing tasks. The cost comes
mainly from the vmexit, which is a hardware context switch between the
virtual machine and the hypervisor.

As for upstream's kvm dynamic poll mechanism: even if you could provide a
better algorithm, how could you bypass the timer access (LAPIC timer or
TSC deadline timer) or the hardware context switch between the virtual
machine and the hypervisor? I know there is a tradeoff.

Furthermore, here is the data we get when running the contextswitch
benchmark to measure latency (lower is better):

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
   3402.9 ns/ctxsw -- 199.8 %CPU

2. w/ patch and disable kvm dynamic poll:
   1163.5 ns/ctxsw -- 205.5 %CPU

3. w/ kvm dynamic poll:
   2280.6 ns/ctxsw -- 199.5 %CPU

So these two solutions are quite similar, but not duplicates.

That is also why we add a generic idle poll before entering the real idle
path. When a reschedule event is pending, we can bypass the real idle path.
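
The poll-before-idle idea described above can be illustrated with a small
userspace simulation. This is only a sketch under stated assumptions, not
code from the patch: struct idle_sim, need_resched_sim(),
poll_before_idle() and idle_enter() are hypothetical names, time is a
simulated counter, and the "halt" counter stands in for the real idle path
that would cost a vmexit in a guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for kernel state; not taken from the patch. */
struct idle_sim {
    uint64_t now_ns;        /* simulated clock */
    uint64_t wakeup_at_ns;  /* time at which a reschedule event becomes pending */
    uint64_t halts;         /* how often the "real idle" path was entered */
};

/* Simulated equivalent of checking for a pending reschedule event. */
static bool need_resched_sim(const struct idle_sim *s)
{
    return s->now_ns >= s->wakeup_at_ns;
}

/* Spin for at most poll_ns; return true if work arrived in that window. */
static bool poll_before_idle(struct idle_sim *s, uint64_t poll_ns,
                             uint64_t step_ns)
{
    uint64_t deadline = s->now_ns + poll_ns;

    assert(step_ns > 0);
    while (s->now_ns < deadline) {
        if (need_resched_sim(s))
            return true;
        s->now_ns += step_ns;   /* stands in for one spin iteration */
    }
    return need_resched_sim(s);
}

/* One pass through idle entry: poll first, halt only if polling failed. */
static void idle_enter(struct idle_sim *s, uint64_t poll_ns)
{
    if (poll_before_idle(s, poll_ns, 100))
        return;                 /* work is pending: bypass the real idle path */

    s->halts++;                 /* real idle path (a vmexit in a guest) */
    if (s->now_ns < s->wakeup_at_ns)
        s->now_ns = s->wakeup_at_ns;    /* sleep until the wakeup */
}
```

With a 10000 ns poll window, a wakeup that arrives after 5000 ns is caught
while polling and no halt is recorded; a wakeup after 50000 ns exhausts the
window and falls through to the halt path.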


Quan
Alibaba Cloud



> Regards,
> Wanpeng Li
>
>>   4. w/patch and w/ kvm dynamic poll:
>>      42225.3 bit/s -- 198.7 %CPU
>>
>>   5. idle=poll
>>      37081.7 bit/s -- 998.1 %CPU
>>
>>
>>
>>   w/ this patch, we will improve performance by 23%.. even we could improve
>>   performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>>   cost of CPU is much lower than 'idle=poll' case..
>>
>>> Wouldn't a function pointer, maybe guarded
>>> by a static key, be enough? A further advantage would be that this would
>>> work on other architectures, too.
>>
>> I assume this feature will be ported to other archs.. a new pvops makes code
>> clean and easy to maintain. also I tried to add it into existed pvops, but
>> it doesn't match.
>>
>>
>>
>> Quan
>> Alibaba Cloud
>>>
>>> Juergen
>>>
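
The 23% and 45.4% figures quoted in this message can be reproduced from the
netperf numbers with a relative-gain calculation against case 1. The
snippet below is a minimal sketch of that arithmetic; pct_gain() and
report_netperf() are hypothetical helper names, and the throughput values
are copied from the quoted benchmark results.

```c
#include <assert.h>
#include <stdio.h>

/* Relative improvement of `measured` over `baseline`, in percent. */
static double pct_gain(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (measured - baseline) / baseline * 100.0;
}

/* Print the gains for the netperf cases quoted in the thread. */
static void report_netperf(void)
{
    const double base = 29031.6;          /* case 1: w/o patch, halt_poll_ns=0 */
    const double patch_only = 35787.7;    /* case 2: w/ patch, halt_poll_ns=0 */
    const double patch_and_kvm = 42225.3; /* case 4: w/ patch and kvm dynamic poll */

    printf("patch only:       %.1f%%\n", pct_gain(base, patch_only));
    printf("patch + kvm poll: %.1f%%\n", pct_gain(base, patch_and_kvm));
}
```

Case 2 against case 1 gives roughly 23.3%, and case 4 against case 1 gives
roughly 45.4%, which matches the numbers stated in the message.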

Wanpeng Li

2017-Nov-14 08:22 UTC

[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-14 16:15 GMT+08:00 Quan Xu <quan.xu0 at gmail.com>:
>
>
> On 2017/11/14 15:12, Wanpeng Li wrote:
>>
>> 2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0 at gmail.com>:
>>>
>>>
>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>>
>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>>
>>>>> From: Quan Xu <quan.xu0 at gmail.com>
>>>>>
>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>>> in idle path which will poll for a while before we enter the real idle
>>>>> state.
>>>>>
>>>>> In virtualization, idle path includes several heavy operations
>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>> hurt performance especially for latency intensive workload like message
>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>> context switch between virtual machine and hypervisor. Our solution is
>>>>> to poll for a while and do not enter real idle path if we can get the
>>>>> schedule event during polling.
>>>>>
>>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>>> reduce the useless poll.
>>>>>
>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
>>>>> Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
>>>>> Cc: Juergen Gross <jgross at suse.com>
>>>>> Cc: Alok Kataria <akataria at vmware.com>
>>>>> Cc: Rusty Russell <rusty at rustcorp.com.au>
>>>>> Cc: Thomas Gleixner <tglx at linutronix.de>
>>>>> Cc: Ingo Molnar <mingo at redhat.com>
>>>>> Cc: "H. Peter Anvin" <hpa at zytor.com>
>>>>> Cc: x86 at kernel.org
>>>>> Cc: virtualization at lists.linux-foundation.org
>>>>> Cc: linux-kernel at vger.kernel.org
>>>>> Cc: xen-devel at lists.xenproject.org
>>>>
>>>> Hmm, is the idle entry path really so critical to performance that a new
>>>> pvops function is necessary?
>>>
>>> Juergen, Here is the data we get when running benchmark netperf:
>>>   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>      29031.6 bit/s -- 76.1 %CPU
>>>
>>>   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>      35787.7 bit/s -- 129.4 %CPU
>>>
>>>   3. w/ kvm dynamic poll:
>>>      35735.6 bit/s -- 200.0 %CPU
>>
>> Actually we can reduce the CPU utilization by sleeping a period of
>> time as what has already been done in the poll logic of IO subsystem,
>> then we can improve the algorithm in kvm instead of introducing another
>> duplicate one in the kvm guest.
>
> We really appreciate upstream's kvm dynamic poll mechanism, which is
> really helpful in a lot of scenarios.
>
> However, as the patch description says, in virtualization the idle path
> includes several heavy operations, including timer access (LAPIC timer
> or TSC deadline timer), which hurt performance, especially for
> latency-sensitive workloads such as message-passing tasks. The cost comes
> mainly from the vmexit, which is a hardware context switch between the
> virtual machine and the hypervisor.
>
> As for upstream's kvm dynamic poll mechanism: even if you could provide a
> better algorithm, how could you bypass the timer access (LAPIC timer or
> TSC deadline timer) or the hardware context switch between the virtual
> machine and the hypervisor? I know there is a tradeoff.
>
> Furthermore, here is the data we get when running the contextswitch
> benchmark to measure latency (lower is better):
>
> 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>   3402.9 ns/ctxsw -- 199.8 %CPU
>
> 2. w/ patch and disable kvm dynamic poll:
>   1163.5 ns/ctxsw -- 205.5 %CPU
>
> 3. w/ kvm dynamic poll:
>   2280.6 ns/ctxsw -- 199.5 %CPU
>
> So these two solutions are quite similar, but not duplicates.
>
> That is also why we add a generic idle poll before entering the real idle
> path. When a reschedule event is pending, we can bypass the real idle path.
>
There is similar logic in the idle governor/driver, so how does this
patchset influence the decision in the idle governor/driver when
running on bare metal (power management is not exposed to the guest, so
we will not enter the idle driver in the guest)?

Regards,
Wanpeng Li

Quan Xu

2017-Nov-14 10:23 UTC

[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

On 2017/11/14 16:22, Wanpeng Li wrote:
> 2017-11-14 16:15 GMT+08:00 Quan Xu <quan.xu0 at gmail.com>:
>>
>> On 2017/11/14 15:12, Wanpeng Li wrote:
>>> 2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0 at gmail.com>:
>>>>
>>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>>> From: Quan Xu <quan.xu0 at gmail.com>
>>>>>>
>>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>>>> in idle path which will poll for a while before we enter the real idle
>>>>>> state.
>>>>>>
>>>>>> In virtualization, idle path includes several heavy operations
>>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>>> hurt performance especially for latency intensive workload like message
>>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>>> context switch between virtual machine and hypervisor. Our solution is
>>>>>> to poll for a while and do not enter real idle path if we can get the
>>>>>> schedule event during polling.
>>>>>>
>>>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>>>> reduce the useless poll.
>>>>>>
>>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
>>>>>> Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
>>>>>> Cc: Juergen Gross <jgross at suse.com>
>>>>>> Cc: Alok Kataria <akataria at vmware.com>
>>>>>> Cc: Rusty Russell <rusty at rustcorp.com.au>
>>>>>> Cc: Thomas Gleixner <tglx at linutronix.de>
>>>>>> Cc: Ingo Molnar <mingo at redhat.com>
>>>>>> Cc: "H. Peter Anvin" <hpa at zytor.com>
>>>>>> Cc: x86 at kernel.org
>>>>>> Cc: virtualization at lists.linux-foundation.org
>>>>>> Cc: linux-kernel at vger.kernel.org
>>>>>> Cc: xen-devel at lists.xenproject.org
>>>>> Hmm, is the idle entry path really so critical to performance that a new
>>>>> pvops function is necessary?
>>>> Juergen, Here is the data we get when running benchmark netperf:
>>>>    1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>       29031.6 bit/s -- 76.1 %CPU
>>>>
>>>>    2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>       35787.7 bit/s -- 129.4 %CPU
>>>>
>>>>    3. w/ kvm dynamic poll:
>>>>       35735.6 bit/s -- 200.0 %CPU
>>> Actually we can reduce the CPU utilization by sleeping a period of
>>> time as what has already been done in the poll logic of IO subsystem,
>>> then we can improve the algorithm in kvm instead of introducing another
>>> duplicate one in the kvm guest.
>> We really appreciate upstream's kvm dynamic poll mechanism, which is
>> really helpful in a lot of scenarios.
>>
>> However, as the patch description says, in virtualization the idle path
>> includes several heavy operations, including timer access (LAPIC timer
>> or TSC deadline timer), which hurt performance, especially for
>> latency-sensitive workloads such as message-passing tasks. The cost comes
>> mainly from the vmexit, which is a hardware context switch between the
>> virtual machine and the hypervisor.
>>
>> As for upstream's kvm dynamic poll mechanism: even if you could provide a
>> better algorithm, how could you bypass the timer access (LAPIC timer or
>> TSC deadline timer) or the hardware context switch between the virtual
>> machine and the hypervisor? I know there is a tradeoff.
>>
>> Furthermore, here is the data we get when running the contextswitch
>> benchmark to measure latency (lower is better):
>>
>> 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>    3402.9 ns/ctxsw -- 199.8 %CPU
>>
>> 2. w/ patch and disable kvm dynamic poll:
>>    1163.5 ns/ctxsw -- 205.5 %CPU
>>
>> 3. w/ kvm dynamic poll:
>>    2280.6 ns/ctxsw -- 199.5 %CPU
>>
>> So these two solutions are quite similar, but not duplicates.
>>
>> That is also why we add a generic idle poll before entering the real idle
>> path. When a reschedule event is pending, we can bypass the real idle path.
>>
> There is similar logic in the idle governor/driver, so how does this
> patchset influence the decision in the idle governor/driver when
> running on bare metal (power management is not exposed to the guest, so
> we will not enter the idle driver in the guest)?
>
This is expected to take effect only when running as a virtual machine with
the proper CONFIG_* options enabled. It cannot work on bare metal, even with
the proper CONFIG_* options enabled.

Quan
Alibaba Cloud
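
One way to picture "takes effect only when running as a virtual machine" is
an optional hook that stays unset on bare metal, along the lines of the
function pointer suggested earlier in the thread. The following is a
userspace sketch only: idle_poll_hook, guest_idle_poll(), idle_hook_init()
and idle_entry() are hypothetical names and are not taken from the patch.
In the kernel, the NULL check could presumably be replaced by a static key
(DEFINE_STATIC_KEY_FALSE() with static_branch_unlikely()), which is an
assumption about how this might be wired up rather than what the series
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*idle_poll_fn)(void);

/* Hypothetical hook: stays NULL on bare metal, set once when a guest boots. */
static idle_poll_fn idle_poll_hook;

/* Counts how often the guest poll routine ran (for illustration only). */
static unsigned int poll_calls;

static void guest_idle_poll(void)
{
    poll_calls++;   /* a real implementation would spin on need_resched here */
}

/* Install the hook only when the platform was detected as a guest. */
static void idle_hook_init(bool running_as_guest)
{
    idle_poll_hook = running_as_guest ? guest_idle_poll : NULL;
}

/* Idle entry: poll only if a hook is installed, then continue to real idle. */
static void idle_entry(void)
{
    if (idle_poll_hook)
        idle_poll_hook();
    /* the real idle path would follow here */
}
```

On bare metal the hook is never installed, so idle_entry() costs one
pointer test and the existing idle governor/driver behaviour is unchanged.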
