MESSAGE
DATE | 2020-05-14
FROM | Dave Bort
SUBJECT | [Hangout - NYLXS] How to tell if an emulated aarch64 CPU has stopped doing work?
Date: Thu, 14 May 2020 18:14:00 -0700
To: qemu-discuss-at-nongnu.org
We use qemu (4.0.0, about to flip the switch to 5.0.0) to test our aarch64 images, running in Linux containers on x86_64 alongside other workloads.
We've recently run into issues where it looks like an emulated CPU (out of four) sometimes stops making progress for ten or more seconds, and we're trying to characterize the problem. When this happens, the other emulated CPUs run just fine, though sometimes two will stall out at the same time.
Any suggestions for how to tell if an emulated CPU stopped doing work?
Based on our experiments, the guest-visible clocks and cycle counters continue to run when a qemu CPU thread is suspended, so it's hard to tell whether the emulation paused, or if our code is spinning with interrupts disabled (though evidence is mounting that that's not the case). We're adding a bunch more instrumentation to our code, but maybe qemu has some features that will help us out.
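One host-side signal that does not depend on guest-visible clocks is the kernel's per-thread CPU accounting. The sketch below is a minimal example under stated assumptions: QEMU runs multi-threaded TCG (one host thread per vCPU), and it was started with `-name <name>,debug-threads=on`, which we understand gives TCG vCPU threads names like `CPU 0/TCG`. The `prefix` match and the helper names are hypothetical. It reads utime and stime (fields 14 and 15 of `/proc/<pid>/task/<tid>/stat`, per proc(5)) and reports vCPU threads whose CPU time did not move between two snapshots.

```python
import os
from typing import Dict, List, Tuple

# (thread name, scheduler state letter, utime + stime in clock ticks)
TaskInfo = Tuple[str, str, int]

def parse_task_stat(text: str) -> TaskInfo:
    """Parse one /proc/<pid>/task/<tid>/stat line (field layout from proc(5))."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # e.g. "CPU 0/TCG", so locate it by the first '(' and the LAST ')'.
    comm = text[text.index("(") + 1:text.rindex(")")]
    rest = text[text.rindex(")") + 2:].split()  # rest[0] is field 3
    state = rest[0]                              # 'R' runnable, 'S' sleeping, ...
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return comm, state, utime + stime

def snapshot(pid: int) -> Dict[int, TaskInfo]:
    """Read name, state and CPU ticks for every thread of process `pid`."""
    snap: Dict[int, TaskInfo] = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                snap[int(tid)] = parse_task_stat(f.read())
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
    return snap

def idle_vcpu_threads(before: Dict[int, TaskInfo], after: Dict[int, TaskInfo],
                      prefix: str = "CPU ") -> List[Tuple[int, str, str]]:
    """vCPU-named threads whose CPU time did not advance between snapshots.

    Assumption: vCPU threads are named "CPU <n>/..." (debug-threads=on).
    """
    idle = []
    for tid, (comm, state, ticks) in after.items():
        if comm.startswith(prefix) and tid in before and ticks == before[tid][2]:
            idle.append((tid, comm, state))
    return sorted(idle)
```

Usage: take `before = snapshot(pid)`, sleep a second or so (the counters advance in clock ticks, typically 100 per second; `os.sysconf("SC_CLK_TCK")` reports the rate), take `after`, and log any result. The state letter is only an instantaneous reading, but `R` with no new ticks suggests the thread wanted host CPU and did not get it, while `S` means it was blocked, which for a vCPU can be as benign as the guest idling in WFI.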
I tried to find a way to count the number of TBs (translation blocks) executed by an emulated core over time, but I didn't see a cheap way to do that with the plugin APIs.
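Whatever per-vCPU progress counter ends up being available (a translation-block count, a heartbeat counter that each guest CPU bumps, or host CPU time for each vCPU thread), finding stalls in periodic samples of it is cheap bookkeeping. A minimal sketch; the sample format and the `find_stalls` name are hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample format: (timestamp in seconds, progress counter per vCPU).
Sample = Tuple[float, Sequence[int]]

def find_stalls(samples: List[Sample],
                min_stall_s: float) -> Dict[int, List[Tuple[float, float]]]:
    """Per vCPU index, the (first, last) sample times of every run in which
    its counter stayed unchanged for at least min_stall_s seconds."""
    stalls: Dict[int, List[Tuple[float, float]]] = {}
    if not samples:
        return stalls
    for cpu in range(len(samples[0][1])):
        run_start = run_end = samples[0][0]
        value = samples[0][1][cpu]
        found: List[Tuple[float, float]] = []
        for t, counts in samples[1:]:
            if counts[cpu] == value:
                run_end = t          # still no progress: extend the flat run
                continue
            if run_end - run_start >= min_stall_s:
                found.append((run_start, run_end))
            run_start = run_end = t  # progress seen: start a new run
            value = counts[cpu]
        if run_end - run_start >= min_stall_s:  # run still flat at the end
            found.append((run_start, run_end))
        if found:
            stalls[cpu] = found
    return stalls
```

The reported durations are lower bounds, accurate to one sampling interval, so sampling once per second is plenty for catching ten-second stalls without tracing every instruction.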
We could maybe turn on instruction tracing, but this problem happens pretty rarely (<1%), we don't have a repro case yet, and we can't really afford the cost of slowing down every test run. There's a decent chance that this is caused by an overloaded host, but our host-side investigations haven't turned up anything concrete either.
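The overloaded-host hypothesis can be tested per vCPU thread with the kernel's scheduler statistics: `/proc/<pid>/task/<tid>/schedstat` holds time spent on the CPU, time spent waiting on a runqueue (both in nanoseconds), and the number of timeslices, as described in the kernel's sched-stats documentation. The file depends on kernel configuration; we assume the host kernels provide it, which is worth checking. A minimal sketch with hypothetical helper names:

```python
from typing import Tuple

# (ns running on a CPU, ns waiting on a runqueue, number of timeslices)
SchedStat = Tuple[int, int, int]

def parse_schedstat(text: str) -> SchedStat:
    """Parse the three integers in /proc/<pid>/task/<tid>/schedstat."""
    run_ns, wait_ns, slices = (int(x) for x in text.split()[:3])
    return run_ns, wait_ns, slices

def read_schedstat(pid: int, tid: int) -> SchedStat:
    # Assumes the host kernel exposes per-task schedstat files.
    with open(f"/proc/{pid}/task/{tid}/schedstat") as f:
        return parse_schedstat(f.read())

def wait_fraction(before: SchedStat, after: SchedStat) -> float:
    """Share of the interval's runnable time spent queued rather than running."""
    d_run = after[0] - before[0]
    d_wait = after[1] - before[1]
    runnable = d_run + d_wait
    return d_wait / runnable if runnable else 0.0
```

If a vCPU thread's wait fraction is close to 1.0 during a stall window, it was runnable but queued behind other host work, which points at the host. If both deltas are near zero, the thread was not runnable at all, and the stall is more likely inside QEMU or the guest.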
Any advice?
--dbort
_______________________________________________ Hangout mailing list Hangout-at-nylxs.com http://lists.mrbrklyn.com/mailman/listinfo/hangout