Date: 2010-02-02
From: Ruben Safir
Subject: [NYLXS - HANGOUT] C++ Workshop I datatypes cont..
We discussed that everything in our computer is data. Let's expand on this concept, create some sample programs that demonstrate it, and flesh out the data types in more detail as they specifically relate to C++ programming. C++ inherits much of its core datatypes from C. This has been a general problem for C++ texts, because the authors of most of these books simply don't want to take the necessary time to discuss the data types as inherited from C; they flub it, with the assumption that the reader is already familiar with the C language specifications. I'd like to correct that and give a more uniform and coherent look at the C++ datatypes, which in reality is completely necessary for understanding all the more advanced features of C++ and object design.
In C++ there are three categories of datatypes. There are the built-in datatypes and their built-in aggregates. There are the standard C++ library class-based datatypes. And then there are user-defined or created datatypes.
A) Built-in datatypes and their aggregates
B) Standard library C++ datatypes
C) User-created datatypes
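To make the three categories concrete, here is a minimal sketch that uses one of each; the struct name Point and the values are just made up for illustration:

    #include <iostream>
    #include <string>   // brings in a standard library class type

    // A user-created datatype (hypothetical example)
    struct Point {
        int x;   // built-in datatype used as a member
        int y;
    };

    int main()
    {
        int count = 3;                   // A) built-in datatype
        std::string label = "workshop";  // B) standard library C++ datatype
        Point p = {2, 5};                // C) user-created datatype

        std::cout << label << " " << count
                  << " point(" << p.x << "," << p.y << ")\n";
        return 0;
    }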
Previously I discussed that when it comes to a computer, and any programming language, everything is data. The code you write is data, the information your program retrieves and manipulates as a target is data, and the addressing of the storage locations of your data is data. In order to think about this in a simplified way, we often view a computer as a process manipulation system. Mathematically and theoretically this involves set theory and Turing machine principles, which I am not going to cover, but try to remember when dealing with your computer, and when programming your computer, that the system exists in the real world. It is easy to forget this, but strangely enough, computers aren't just good at virtual reality, they actually work in reality...period. A computer is a machine. And as such it has inherent properties, many of which we take for granted in the real world. Scientists and engineers don't take them for granted, but when we interact with our digital systems we forget, and then we suffer from poor coding habits and computers that perplex us.
As real world machines, computers deal with processes. And these processes we try to control and shape to our needs. All processes, and this isn't negotiable but a matter of fact, have three basic elements. They have input, they have output, and they have side effects. In the physical world, for example, all processes result in an increase in heat. The input is usually some kind of fuel. And the side effect is pretty much what we find useful. A simple machine like a gasoline powered automobile takes in gas, air and electricity; outputs burnt fuel, gases like carbon dioxide, and HEAT; and the side effect is controlled movement.
Computers and coding are a little more complex, but the principles remain intact. Your data, in the form of code, is sent into the CPU along with other inputs and data, the CPU processes that information and returns its "state", and all the pretty pictures, sounds, printed materials, whatever your hardware allows, are the side effects.
Now that is very internal to your system, but another useful process to examine, and to be aware of when working with your computer, is how information flows through your computer and involves the end user. On the standard computer model, real world process control is built into your digital computer. Information (data) enters your computer from the STANDARD INPUT DEVICE (your keyboard), the computer does some work and displays results on your STANDARD OUTPUT DEVICE, and the side effects can be varied. But this is an artificial construction; otherwise the computer would be useless, because the interaction of your computer with the real physical world would be incomprehensible to the stupid humans that are using it. So while for your CPU creating recognizable images on the screen is really a side effect, from an information theory perspective, and from a practical programming perspective, it is the output. And most of your job as a programmer is to bridge the gap between your CPU, whose output is a return of "state" information, and your program's side effects, so that the information processing generates correct and usable responses for the humans communicating with it.
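As a minimal sketch of that flow, a program that reads from the standard input device and writes to the standard output device might look like this (the prompt text and variable name are just made up):

    #include <iostream>
    #include <string>

    int main()
    {
        std::string name;
        std::cout << "Who is at the keyboard? ";  // written to standard output
        std::cin >> name;                         // read from standard input
        std::cout << "Hello, " << name << "\n";   // the useful "side effect" for the human
        return 0;
    }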
Now if I haven't yet confused you (and go back and reread this if I did), data must be understood by your computer and by you. The data that goes into your program must be understood by your computer, and the data that is output must be understood by you. And the side effects of this process have to be correct as well. In fact, more often than not, the side effects are everything, because the only information your program might be returning to you is, "OK - I'm finished - awaiting more instructions". We'll return to these principles over and over again.
In order to help bridge the human digital divide, so to speak, your computer handles data, and in modern machines that data is based on the 8 bit byte. The simplest computer that individuals come in contact with is the simple light switch. It does everything your computer does. You have a simple switch with a bulb attached to it. You turn the switch on (standard input). Your switch allows electricity to flow to the bulb and creates light and heat (side effect), and the switch records its "state", which is ON...otherwise it wouldn't be very useful. And as long as the system is working, it retains its state until you give it more information (turn it off in this case). Now in this system, you have one bit of information that can be processed. The light is either ON or OFF. There is nothing else you can do. But if we have 2 switches connected to a light bulb, we can engineer a system with up to 4 states: Off, On Dim, On Brighter, On Brightest, each one representing a different "state". With eight switches, you can represent 256 states...and THAT is exactly what your digital computer does. Your hardware and program determine what each of those 256 states can mean.
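A rough sketch of the same idea in code: an 8 bit byte can hold 2^8 = 256 distinct patterns, and std::bitset makes it easy to see one such pattern of "switches". The value 165 is just an arbitrary example:

    #include <iostream>
    #include <bitset>

    int main()
    {
        // one switch: 2 states, eight switches: 2^8 = 256 states
        std::cout << "states in 1 bit:  " << (1 << 1) << "\n";
        std::cout << "states in 8 bits: " << (1 << 8) << "\n";

        std::bitset<8> pattern(165);           // one particular "state" of 8 switches
        std::cout << "165 as switches:  " << pattern << "\n";   // prints 10100101
        return 0;
    }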
What if you need more than 256 states to implement a side effect? Let's say we have a byte in our hardware that gives our standard output device a color. 256 colors is nice. The billions of colors that the human eye can distinguish is nicer. But then we need to string together multiple bytes to implement that scheme. Perhaps two, or even 4, bytes. The same holds true for numbers. If we are using one byte, that will only allow us to describe 256 numbers, and in this case, to allow for positive and negative numbers, one of the bits in our byte usually identifies the number as positive or negative: roughly -127 to 127 (the exact low end, -127 or -128, depends on how the hardware represents negative numbers). That is called a signed one byte integer. An unsigned integer can't represent negative numbers (no bit in the byte is used to describe negative or positive state), and it can represent 0-255.
Now that isn't good enough for banking, science, and most other applications, although it is fine if we are making a card game or a dice game, so integers are most often represented by at least 2 bytes, and often 4 or more. And the ability to do so has to be available in the hardware and written into the language libraries.
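A sketch showing what those one byte ranges actually are on your own machine, using the standard numeric_limits template (the cast to int is only there so the values print as numbers rather than characters):

    #include <iostream>
    #include <limits>

    int main()
    {
        std::cout << "signed char:   "
                  << static_cast<int>(std::numeric_limits<signed char>::min()) << " to "
                  << static_cast<int>(std::numeric_limits<signed char>::max()) << "\n";
        std::cout << "unsigned char: "
                  << static_cast<int>(std::numeric_limits<unsigned char>::min()) << " to "
                  << static_cast<int>(std::numeric_limits<unsigned char>::max()) << "\n";
        return 0;
    }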
So what data types does C++ give us by default? Well, not colors; we would have to create those ourselves. It gives us the following default data types, which I'll try to explain.
char - Characters. A representation of standard characters as a single byte, which is more than adequate for Latin based languages like English. A data input that is a character is stored in a single byte, which C++ will automatically translate into characters like A-Z, a-z and others as defined in ASCII.
http://www.asciitable.com/
As computers developed, character sets grew beyond ASCII (extended sets and, later, multi-byte encodings), but in C++ a char is a SINGLE BYTE.
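A small sketch: a char is just a one byte number that C++ prints as a character; casting it shows the underlying ASCII code.

    #include <iostream>

    int main()
    {
        char c = 'A';
        std::cout << "character:      " << c << "\n";                     // prints A
        std::cout << "ASCII code:     " << static_cast<int>(c) << "\n";   // prints 65
        std::cout << "size of a char: " << sizeof(char) << " byte\n";     // always 1
        return 0;
    }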
int - A one word integer. Storage is hardware dependent, most normally in something called a "word", which on standard 32 bit hardware and operating systems is 4 bytes (hence 32 bits). Machines are increasingly built on 64 bit words and the operating systems are catching up; in practice most 64 bit systems still keep int at 4 bytes and widen long to 8 bytes instead, but check what your own system does.
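Rather than assuming, you can ask your own machine. A sketch using sizeof; the comment values are what typical GNU/Linux boxes report, not a guarantee:

    #include <iostream>

    int main()
    {
        std::cout << "bytes in a short int: " << sizeof(short) << "\n";
        std::cout << "bytes in an int:      " << sizeof(int)   << "\n";
        std::cout << "bytes in a long int:  " << sizeof(long)  << "\n";
        // A typical 32 bit GNU/Linux box prints 2, 4, 4;
        // a typical 64 bit box prints 2, 4, 8.
        return 0;
    }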
Before going forward, it is worth stating that there is a standard for the C programming language called the ANSI C standard. You have to buy the standard if you want it, and it has rules for how to define the limits of datatypes like integers. These limits are recorded in your C programming environment in a file called limits.h, which on my system is located at:
/usr/include/limits.h
Integer sizes aren't nearly as well defined as I would like. The standard is a typical compromise of different corporate and academic interests. In the KN King text, on page 111, the following limits are outlined for 32 bit systems.
Integer data types can be defined as extended types with different sizes as follows:
short int:           -32,768 to 32,767
unsigned short int:  0 to 65,535
int:                 -2,147,483,648 to 2,147,483,647
unsigned int:        0 to 4,294,967,295
long int:            -2,147,483,648 to 2,147,483,647
unsigned long int:   0 to 4,294,967,295
These ranges follow from the byte sizes (65,535 is the largest unsigned value that fits in 2 bytes, and 4,294,967,295 is the largest that fits in 4 bytes).
If you have a 64 bit system, see your limits.h file for the definition on your box.
You can drop the word "int" for longs and shorts.
long int x; is syntactically the same as long x;
although we haven't yet discussed any syntax.
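As a sketch, both spellings declare the same type, and the constants from limits.h (pulled in through the C++ wrapper header <climits>) give you the exact ranges on your own system:

    #include <iostream>
    #include <climits>   // the C++ wrapper for limits.h

    int main()
    {
        long int x = 0;   // these two declarations
        long y = 0;       // create the same type of variable
        x = y;            // so assigning one to the other is trivial

        std::cout << "int:      " << INT_MIN  << " to " << INT_MAX  << "\n";
        std::cout << "long:     " << LONG_MIN << " to " << LONG_MAX << "\n";
        std::cout << "unsigned: 0 to " << UINT_MAX << "\n";
        return 0;
    }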
Computers have a special chip to work with fractional numbers in decimal notation (not an easy piece of engineering, in my opinion). C++ has built in data types to cope with them, which have the following minimum and maximum representations on 32 bit systems:
float        1.17 x 10^-38 to 3.40 x 10^38, with 6 digit precision
double       2.22 x 10^-308 to 1.80 x 10^308, with 15 digit precision
long double  very hardware specific
As I understand it, the numbers are actually stored in a binary form of scientific notation (a sign, an exponent, and a mantissa), which explains why a range up to 10^308 is only expressed in about 15 accurate digits. These are engineering definitions, laid out in an electronics standard called IEEE 754...more information than we usually need. But if you need to know exactly what your limits are, see the float.h file, which on my system is located at:
/usr/include/c++/4.4/tr1/float.h and /usr/lib/gcc/i586-suse-linux/4.4/include/float.h
There are also manual pages on your Linux system: man float.h
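A sketch that pulls those limits straight from <cfloat> (the C++ wrapper for float.h) on whatever machine you run it on:

    #include <iostream>
    #include <cfloat>   // the C++ wrapper for float.h

    int main()
    {
        std::cout << "float:  " << FLT_MIN << " to " << FLT_MAX
                  << ", " << FLT_DIG << " digits of precision\n";
        std::cout << "double: " << DBL_MIN << " to " << DBL_MAX
                  << ", " << DBL_DIG << " digits of precision\n";
        return 0;
    }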
C++ adds one additional data type, called a Boolean type, to store true or false. In C (and in C++), processes that need to test for true or false treat 0 as false and anything else as true. But there are issues with that: ints are typically 4 bytes and signed, and chars can actually be signed or unsigned as well...so C++ adds
bool - true or false
which is really like some kind of enumerated type (which hasn't been introduced yet).
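A small sketch of bool in use; note that a bool prints as 1 or 0 unless you ask for the words with std::boolalpha. The variable names are just made up:

    #include <iostream>

    int main()
    {
        bool done = false;
        int  attempts = 3;

        done = (attempts > 0);   // any comparison yields a bool

        std::cout << done << "\n";                    // prints 1
        std::cout << std::boolalpha << done << "\n";  // prints true
        std::cout << "size of a bool: " << sizeof(bool) << " byte(s)\n";
        return 0;
    }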
--
http://www.mrbrklyn.com - Interesting Stuff
http://www.nylxs.com - Leadership Development in Free Software
So many immigrant groups have swept through our town that Brooklyn, like Atlantis, reaches mythological proportions in the mind of the world - RI Safir 1998
http://fairuse.nylxs.com DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
"Yeah - I write Free Software...so SUE ME"
"The tremendous problem we face is that we are becoming sharecroppers to our own cultural heritage -- we need the ability to participate in our own society."
"> I'm an engineer. I choose the best tool for the job, politics be damned.< You must be a stupid engineer then, because politcs and technology have been attached at the hip since the 1st dynasty in Ancient Egypt. I guess you missed that one."
© Copyright for the Digital Millennium