MESSAGE
DATE | 2010-02-03 |
FROM | Ruben Safir
|
SUBJECT | Re: [NYLXS - HANGOUT] C++ Workshop I datatypes cont..
|
In C and C++, there are three other data types that need to be discussed. All three are essential for advanced programming, and all three are low level. They are the Pointer, the Reference (which exists only in C++ and not in C) and the Function (or Method, in object-oriented parlance). Let's first look at the pointer.
What is Memory? What is a Variable? What is an Address?
The pointer is a variable that stores a memory address. In a simplified picture, imagine that your computer has one megabyte of RAM. Each location in RAM has an address that the CPU understands, and you can ask the CPU to go to any specific location in RAM and read the data at that point. RAM doesn't care what the data stored at any memory location represents. To the RAM and the CPU, it is just binary bytes of information. The CPU can read a byte regardless of what that data represents to the user, to other parts of the computer, or to other programs. It just reads bytes. And after it reads a Word of data at some specific address, it can easily read the next address, and the next, and so on. This is the case all the time, in any programming language. We're not going to get into the details of all this now. We'll wait for the Assembly Language Workshop for a discussion of how a CPU fetches and processes data from RAM. But we can understand that all the memory in a computer is mapped, and that addresses themselves are data that has to be stored somewhere in the hardware for use by programs and by the Operating System.
So as programmers, not just in C++ but in any language, how do we instruct the CPU to go out into RAM and acquire some data? And more importantly, how do we instruct the CPU to take some data from a source and SEND it to RAM for storage? Well, we can give the CPU the exact memory address of our data for storage or retrieval, and in fact, Assembly Language does nearly that. But that is really hard to write and nearly impossible to debug. Instead, our programming languages give us SYMBOLIC VARIABLES. A symbolic variable is a symbol that gives us access to a machine language memory address, so that we can retrieve and store data in that location. So our programs have these symbols, and the data associated with them sits in RAM. Forgetting C++ for a moment, let's make up our own language. In this language we create variable symbols like this:
MYVAR("This is a string of data");
In our imaginary language, this creates a symbol MYVAR, which is stored in the computer for later use. Our language, with the help of the CPU, then allocates some finite memory in RAM and stores the address for that data in a very fast lookup table, in association with the symbol "MYVAR". And then we put the string "This is a string of data", all 24 bytes of data by my count, in RAM starting at the memory address that is associated with MYVAR. Now I chose the syntax
MYVAR("This is a string of data");
I could have used any other syntax rules that I think might be useful and understandable to programmers (and in fact, not enough thought, IMO, is given to this aspect of language design), and we could have created a syntax that looks like this:
MYVAR = "This is a string of data";
or
MYVAR{:This is a string of data:}
or
MYVAR := "This is a string of data";
or
MYVAR<'This is a string of data'>
or
Let String MYVAR eq [This is a string of data]
are all possible syntax rules in our own made up programming language. I bring this up because the fluidity of syntax rules becomes an important concept in Object Oriented programming languages like C++. But I also want to make the point that no matter the syntax, the resulting effect and the internal sequence of low level events need to be the same. A symbol is stored. An address is associated with the symbol. And data is stored at that address.
Now what if we took a shortcut? Instead of creating a symbol that the CPU associates with some address in RAM, and then storing human useful data at that address, what if we ignored the useful human data and just associated the memory address with the symbol? Or, alternately, what if we stored, at the address associated with the symbol, yet another machine language memory address?
This can give us several advantages, creates several potential dangers and pitfalls, and, even if it seems to be driving us closer to assembly language like code, can actually simplify our coding. If we can avoid having to write real binary addressing code, this gives us a decent level of flexibility. We can indirectly point our symbol at any kind of data, for one thing, although limited by the rules of C++ syntax. Certain operations, like incrementing serially through a segment of memory, can be sped up, since the CPU is engineered to handle binary memory address operations very efficiently. We can pass access to very large segments of memory from one variable to another without having to copy the whole memory segment to a new location. And remember that CODE itself is actually data, and we can gain access to that code and pass it around like chars, ints and longs.
But in order to retrieve the actually useful information that the stored memory address points at, we need to take an extra step. First we have to read the address attached to our variable. Then we have to fetch the data at that location in RAM. And since that data is itself a machine language memory address, and not otherwise useful data, we then need to take the address that was stored in the symbol's RAM location and dereference it to reach the useful data that we ultimately want to retrieve.
Does this seem confusing? It is and it isn't. Students new to C and C++ choke on this all the time, and yet understanding this concept is absolutely the key to understanding how to read, program, design and analyze good C++ code. What leads students astray is that, because this is a computer specific abstraction, they overthink it. A pointer is really quite simple. It is a symbolic variable that stores a memory address. What the heck that stored address is pointing at, though, is where real world programming gets interesting and often confusing. C++ and C have tools to help make this a bit easier. One of them is that C++ is what is called a TYPED language, and it has keywords, like const, to help keep you from screwing yourself up. We will look at all of this as we go forward.
One thing you will note is that I haven't actually written any C++ syntax yet. That isn't an accident. I want to teach the concepts first, before we look at the implementations. I believe that often, especially at the beginning, trying to teach the syntax simultaneously with the concepts is a big teaching mistake.
Next - References (I hate references).