MESSAGE
DATE: 2016-12-08
FROM: ruben safir
SUBJECT: [Learn] Fwd: Re: png data format
To: learn-at-nylxs.com
Date: Thu, 8 Dec 2016 21:49:55 -0500
Endian Discussion

From: ruben safir
Newsgroups: comp.lang.c++
Subject: Re: png data format
Date: Tue, 6 Dec 2016 17:14:24 -0500
On 12/06/2016 03:41 PM, David Brown wrote:
> x86 uses little endian format, so 13 is stored as 0d 00 00 00 as a
> 32-bit integer. PNG, like many network-related formats, uses big
> endian. So it stores 32-bit 13 as 00 00 00 0d. (Incidentally, use hex
> for this sort of thing - octal had no place in computing outside of
> "chmod" since the 1970's.)
>
> Assuming you are trying to learn and understand this, rather than
> copy-and-paste working code, then this should be enough to get you going.
Thanks, excellent. What I don't understand, though, is why the order is correct when I set up a loop and read the data byte by byte. I get 00 00 00 and then 0d (which is 13).

From: Jorgen Grahn
Newsgroups: comp.lang.c++
Subject: Re: png data format
Date: 6 Dec 2016 23:24:44 GMT
On Tue, 2016-12-06, ruben safir wrote:
> On 12/06/2016 03:41 PM, David Brown wrote:
>> [...]
>
> thanks, excellent. What I don't understand though is why when I set up
> a loop and take it by the byte that the order is correct. It gets 00 00
> 00 and then 0d
I can't answer that, and I didn't read the code. However, treating the file as a series of bytes /is/ the right thing to do, so it doesn't surprise me if the result is correct. If the file looks like
    f0 0d 12 34 00 00 00 0d 47 11
                -----------
and the file format specification says "we store an integer in big-endian form in the marked area", I'd read it using a function similar to this one:
    #include <stdint.h>

    /* Assemble a 32-bit value from four bytes, most significant first. */
    static unsigned get_bigendian32(const uint8_t* p)
    {
        unsigned n = 0;
        n = (n<<8) | *p++;
        n = (n<<8) | *p++;
        n = (n<<8) | *p++;
        n = (n<<8) | *p++;
        return n;
    }
(You also have to watch out for buffer overflows.)
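
One way to act on the buffer-overflow caveat is to pass the buffer size along with the position and refuse to read past the end. The following is only a sketch: the name get_bigendian32_checked and its (buffer, size, offset, out) interface are illustrative assumptions, not something from the thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): read a big-endian 32-bit
// value starting at `offset` in a buffer of `size` bytes.
// Returns false, leaving *out untouched, if fewer than four bytes remain.
inline bool get_bigendian32_checked(const std::uint8_t* buf, std::size_t size,
                                    std::size_t offset, std::uint32_t* out)
{
    if (offset > size || size - offset < 4)
        return false;                   // the read would run past the end
    const std::uint8_t* p = buf + offset;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | p[i];            // most significant byte first
    *out = n;
    return true;
}
```

With the ten example bytes shown above, a read at offset 4 succeeds and yields 13, while a read at offset 7 is rejected because only three bytes remain.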
/Jorgen
-- // Jorgen Grahn \X/ snipabacken.se> O o .

From: scott-at-slp53.sl.home (Scott Lurndal)
Newsgroups: comp.lang.c++
Subject: Re: png data format
Date: Wed, 07 Dec 2016 13:35:24 GMT
Jorgen Grahn writes:
>[...]
>and the file format specification says "we store an integer in
>big-endian form in the marked area", I'd read it using a function
>similar to this one:
>
>  static unsigned get_bigendian32(const uint8_t* p)
>[...]
>
>(You also have to watch out for buffer overflows.)
I'd read it as a 32-bit int then byteswap it:
    static inline uint32
    swap32(uint32 value)
    {
        __asm__ __volatile__ ("bswap %0": "=a"(value): "0"(value));
        return value;
    }

or

    static inline uint32
    swap32(uint32 value)
    {
        return __builtin_bswap32(value);
    }

From: David Brown
Newsgroups: comp.lang.c++
Subject: Re: png data format
Date: Thu, 8 Dec 2016 00:04:36 +0100
On 07/12/16 14:35, Scott Lurndal wrote:
> [...]
>
> I'd read it as a 32-bit int
That /might/ be acceptable, assuming you have control of aligned or unaligned accesses, as well as aliasing issues.
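
The alignment and aliasing concerns can usually be sidestepped by copying the bytes with memcpy instead of casting the byte pointer to an integer pointer. A minimal sketch, assuming GCC or Clang for __builtin_bswap32 (the builtin quoted in this thread); the names load_native32, host_is_little_endian and read_be32_via_swap are hypothetical:

```cpp
#include <cstdint>
#include <cstring>

// Copy four bytes into a uint32_t. Unlike *(const uint32_t*)p, this is
// valid for any address and does not break the aliasing rules.
inline std::uint32_t load_native32(const std::uint8_t* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Runtime check: is the least significant byte stored first on this host?
inline bool host_is_little_endian()
{
    const std::uint32_t one = 1;
    std::uint8_t first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Read a big-endian value by loading natively and swapping only when the
// host order differs. __builtin_bswap32 is GCC/Clang specific.
inline std::uint32_t read_be32_via_swap(const std::uint8_t* p)
{
    const std::uint32_t v = load_native32(p);
    return host_is_little_endian() ? __builtin_bswap32(v) : v;
}
```

The endianness check is what keeps the swap from being applied on a big-endian host, where the loaded value is already correct.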
> then byteswap it:
>
> static inline uint32
> swap32(uint32 value)
> {
>     __asm__ __volatile__ ("bswap %0": "=a"(value): "0"(value));
>     return value;
> }
>
> or
>
> static inline uint32
> swap32(uint32 value)
> {
>     return __builtin_bswap32(value);
> }
>
That's okay when you have something working and are looking for greater optimisation - or if you are familiar enough with the endianness issues that you are happy to jump straight to an endianness swap routine. But one step at a time is best until the OP is confident in what he is doing here.

From: Jorgen Grahn
Newsgroups: comp.lang.c++
Subject: Re: png data format
Date: 8 Dec 2016 11:41:23 GMT
On Wed, 2016-12-07, David Brown wrote:
> On 07/12/16 14:35, Scott Lurndal wrote:
>> [...]
>> I'd read it as a 32-bit int
>
> That /might/ be acceptable, assuming you have control of aligned or
> unaligned accesses, as well as aliasing issues.
>
>> then byteswap it:
Here he also has to know that he's on a little-endian machine, and that he's using one of certain compilers. IMO, that's a high price to pay to avoid byte-level reads.
There is ntohl() if you're on Unix.
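
A sketch of the ntohl() route, assuming a POSIX system where it is declared in <arpa/inet.h>; the wrapper name read_be32_ntohl is hypothetical. ntohl() converts from network (big-endian) order to host order, so the caller does not need to know the host's endianness:

```cpp
#include <arpa/inet.h>   // ntohl(); POSIX header (assumed available)
#include <cstdint>
#include <cstring>

// Hypothetical wrapper: interpret four file bytes as a big-endian value.
inline std::uint32_t read_be32_ntohl(const std::uint8_t* p)
{
    std::uint32_t net;
    std::memcpy(&net, p, sizeof net);   // bytes exactly as stored in the file
    return ntohl(net);                  // network (big-endian) to host order
}
```

On a big-endian host ntohl() leaves the value unchanged; on a little-endian host it performs the swap.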
>> static inline uint32
>> swap32(uint32 value)
>> {
>>     __asm__ __volatile__ ("bswap %0": "=a"(value): "0"(value));
>>     return value;
>> }
>>
>> or
>>
>> static inline uint32
>> swap32(uint32 value)
>> {
>>     return __builtin_bswap32(value);
>> }
>>
>
> That's okay when you have something working and are looking for greater
> optimisation - or if you are familiar enough with the endianness issues
> that you are happy to jump straight to an endianness swap routine. But
> one step at a time is best until the OP is confident in what he is doing
> here.
Yes. Part of the point with my get_bigendian32() above is that it shows[0] that there /are/ no 32-bit integers in binary files, not in the C++ sense. There are just bytes, and your code is responsible for the conversion (according to the rules set by whoever created the file format).
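
To make the "just bytes" view concrete for PNG: according to the PNG specification (not stated in this thread), every chunk begins with a four-byte big-endian length followed by a four-byte type code, and the 00 00 00 0d discussed here is consistent with the 13-byte IHDR chunk. A sketch of decoding such a header purely from bytes follows; the ChunkHeader type and parse_chunk_header function are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical type for illustration: the first eight bytes of a PNG chunk.
struct ChunkHeader {
    std::uint32_t length;   // size of the chunk data, stored big-endian
    std::string   type;     // four ASCII characters, e.g. "IHDR"
};

// Decode a chunk header from raw bytes. No integer is ever read "as an
// int" from the buffer: the value is built up byte by byte according to
// the format's rule. Returns false if fewer than eight bytes are given.
inline bool parse_chunk_header(const std::uint8_t* buf, std::size_t size,
                               ChunkHeader* out)
{
    if (size < 8)
        return false;
    out->length = (std::uint32_t(buf[0]) << 24) | (std::uint32_t(buf[1]) << 16)
                | (std::uint32_t(buf[2]) << 8)  |  std::uint32_t(buf[3]);
    out->type.assign(reinterpret_cast<const char*>(buf + 4), 4);
    return true;
}
```

Because the value is assembled from individual bytes, the same code gives the same answer on little-endian and big-endian hosts.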
/Jorgen
[0] Slightly exaggerated, perhaps.

From: BGB
Newsgroups: comp.lang.c++
Subject: Re: png data format
Date: Thu, 8 Dec 2016 12:24:42 -0600
On 12/8/2016 5:41 AM, Jorgen Grahn wrote:
> [...]
>
> Here he also has to know he's on a little-endian machine, and with
> certain compilers. IMO, a high price to pay to avoid byte-level
> reads.
>
> There is ntohl() if you're on Unix.
>
It also exists on Windows if you are using Winsock, though it is not as good there: it is a function call into a DLL, so faster options are possible.
>>> static inline uint32
>>> swap32(uint32 value)
>>> [...]
>>
>> That's okay when you have something working and are looking for greater
>> optimisation - or if you are familiar enough with the endianness issues
>> that you are happy to jump straight to an endianness swap routine.
>> [...]
>
Those are not exactly portable options, though.

Also possible would be:

    uint32 bswap32(uint32 v0)
    {
        uint32 v1, v2;
        /* swap the bytes within each 16-bit half, then swap the halves */
        v1=((v0&0xFF00FF00U)>> 8)|((v0&0x00FF00FFU)<< 8);
        v2=((v1&0xFFFF0000U)>>16)|((v1&0x0000FFFFU)<<16);
        return(v2);
    }

with more specialized options based on the combination of arch and compiler.
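
One possible shape for such compiler-specific selection is a preprocessor dispatch with the portable shift-and-mask version as the fallback. This is a sketch under assumptions: the name bswap32_any is hypothetical, __builtin_bswap32 is the GCC/Clang builtin quoted earlier, and _byteswap_ulong is the MSVC intrinsic declared in <stdlib.h>.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>     // _byteswap_ulong (MSVC)
#endif

// Hypothetical wrapper: use a compiler intrinsic where one is known,
// otherwise fall back to portable shifts and masks.
inline std::uint32_t bswap32_any(std::uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#elif defined(_MSC_VER)
    return _byteswap_ulong(v);
#else
    v = ((v & 0xFF00FF00u) >> 8) | ((v & 0x00FF00FFu) << 8);  // swap bytes in each half
    return (v >> 16) | (v << 16);                             // swap the halves
#endif
}
```

For example, bswap32_any(0x0000000d) gives 0x0d000000, and applying it twice returns the original value.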
> Yes. Part of the point with my get_bigendian32() above is that it
> shows[0] that there /are/ no 32-bit integers in binary files, not in
> the C++ sense. There are just bytes, and your code is responsible for
> the conversion (according to the rules set by whoever created the file
> format).
>
yep.
File formats get fun: endianness is variable, non-power-of-2 integer sizes are common, bitstreams are also common (with multiple variations thereof), ...
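
A byte-at-a-time reader extends naturally to variable endianness and odd field widths. A sketch, assuming fields of 1 to 8 whole bytes; the ByteOrder enum and read_uint function are hypothetical names, and bit-level streams are not covered:

```cpp
#include <cstddef>
#include <cstdint>

enum class ByteOrder { Big, Little };   // hypothetical, for illustration

// Read an unsigned integer of `nbytes` bytes (1..8, e.g. 3 for a 24-bit
// field) stored in the given byte order. The caller must ensure that
// `nbytes` bytes are readable at `p`.
inline std::uint64_t read_uint(const std::uint8_t* p, std::size_t nbytes,
                               ByteOrder order)
{
    std::uint64_t n = 0;
    for (std::size_t i = 0; i < nbytes; ++i) {
        // Always consume the most significant remaining byte first.
        const std::uint8_t byte =
            (order == ByteOrder::Big) ? p[i] : p[nbytes - 1 - i];
        n = (n << 8) | byte;
    }
    return n;
}
```

For the three bytes 01 02 03, this yields 0x010203 when read as big-endian and 0x030201 when read as little-endian.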
I wrote up various things about having multiple variations of an LZ-compressed bitstream format intended for large numbers of small buffers (where minimizing constant factors becomes a bigger issue), but decided to leave that out, as it drifts a bit far from the topic at hand.
but, yes, entropy coded bitstreams are also fun...
_______________________________________________ Learn mailing list Learn-at-nylxs.com http://lists.mrbrklyn.com/mailman/listinfo/learn