MESSAGE
DATE | 2016-10-31 |
FROM | Christopher League |
SUBJECT | Re: [Learn] cuda kernels |
Ruben Safir writes:
> The video says that the below lines ask for 64 copies of the kernel on 64
> threads? I don't see that
>
>     cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
>     // launch kernel
>     square<<<1, ARRAY_SIZE>>>(d_in, d_out);
>     cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);
I don't really know anything about CUDA yet, but ARRAY_SIZE was 64. And this syntax, `square<<<1, ARRAY_SIZE>>>`, is NOT valid C/C++ syntax. So I assume the specialized GPU compiler interprets it as a request to distribute the computation of the `square` function across 64 GPU threads...?
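For what it's worth, here is a minimal sketch of what I'd guess the complete program looks like. The kernel body and the ARRAY_SIZE/ARRAY_BYTES definitions are my assumptions, reconstructed from the launch call; nvcc is the compiler that accepts the <<<...>>> launch syntax a plain C/C++ compiler rejects.

    #include <stdio.h>

    #define ARRAY_SIZE 64
    #define ARRAY_BYTES (ARRAY_SIZE * sizeof(float))

    // __global__ marks a kernel: a function compiled for the GPU and
    // launched from host code. Each GPU thread runs its own copy of it.
    __global__ void square(float *d_in, float *d_out) {
        int idx = threadIdx.x;       // this thread's index within the block
        float f = d_in[idx];
        d_out[idx] = f * f;          // each thread squares one element
    }

    int main(void) {
        float h_in[ARRAY_SIZE], h_out[ARRAY_SIZE];
        for (int i = 0; i < ARRAY_SIZE; i++)
            h_in[i] = (float) i;

        // allocate device (GPU) memory for input and output
        float *d_in, *d_out;
        cudaMalloc((void **) &d_in, ARRAY_BYTES);
        cudaMalloc((void **) &d_out, ARRAY_BYTES);

        cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
        // launch kernel: <<<1, ARRAY_SIZE>>> = 1 block of 64 threads
        square<<<1, ARRAY_SIZE>>>(d_in, d_out);
        cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

        for (int i = 0; i < ARRAY_SIZE; i++)
            printf("%f\n", h_out[i]);

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

If that reading is right, <<<1, ARRAY_SIZE>>> asks for one block of ARRAY_SIZE (64) threads, and that one-block-of-64-threads launch would be the "64 copies of the kernel" the video mentions.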
BTW, I'm pretty sure "kernel" here doesn't refer to "the Linux kernel."
CL