
Programming Language Hierarchies

Computer languages are often divided into groups, such as "3GL" and "4GL" — third- and fourth-generation languages. The divisions aren't perfect, and don't always represent a chronological evolution. Trends that seemed to be advancing programming have been abandoned when they proved overly complex or simply unnecessary. Human languages experience similar trends. Artificial attempts to control languages, human and computer, often fail because the users of the languages have more influence than the "gatekeepers" wish.

A few languages are better "controlled" than others, but in the end the people seeking to communicate will do whatever they believe is needed. I've heard compiler designers complain about software programmers "abusing the language" just as grammarians complain about emerging writers. One difference, and it is a serious one, is that computer languages rely on strictly defined compilers. Curiously, French is controlled by a strict committee, too.

(I would love to compose an academic article — or book — on how computer languages are human languages, and not simply human-created artificial languages.)

In this post, I want to explain some of the groupings of computer languages. Later, I'll explain families of languages within the 3GL category, which is where most modern programming languages reside.

Low-Level (usually hardware specific)

In the ancient days of computing, we all had to "write" machine code, also known as machine language or binary coding: the programmer translated ideas into the series of bits and bytes computers could understand. Today's computer hardware is still binary, though some chips have higher-level languages "in the silicon." Even a chip with higher-level language skills uses binary in the end.

Most computing occurs at the byte or word level. Registers, little "short-term memory" spaces, tend to be sized in bytes, which is why we discuss 8-bit (1 byte), 16-bit (2 bytes), 32-bit (4 bytes), and 64-bit (8 bytes) computing.
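
To make those sizes concrete, here is a minimal C sketch (assuming a C99 compiler with the standard <stdint.h> header) that prints the storage behind 8-, 16-, 32-, and 64-bit values:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Fixed-width integer types correspond to the register sizes
           discussed above: 1, 2, 4, and 8 bytes. */
        printf("uint8_t  : %zu byte(s)\n", sizeof(uint8_t));
        printf("uint16_t : %zu byte(s)\n", sizeof(uint16_t));
        printf("uint32_t : %zu byte(s)\n", sizeof(uint32_t));
        printf("uint64_t : %zu byte(s)\n", sizeof(uint64_t));
        return 0;
    }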

A good example of how bits have altered computing can be found in how we represent human languages. The earliest systems used seven bits to store characters; seven bits can hold 128 different combinations. The American Standard Code for Information Interchange (ASCII) assigned "Latin" characters and symbols, and some special computing actions, to each of the 128 values.

A 256-character (one full byte) table is used by the various flavors of DOS and Microsoft Windows, known as the "PC-Extended" character set. On Apple's Macintosh systems, the "Mac-Roman" character set uses a similar table. Today, we have Unicode to store characters; 110,182 human symbols and computing actions have been assigned to its table. (See http://www.alanwood.net/unicode/index.html) More bytes, more possibilities.
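
A small C sketch of the same idea, assuming an ASCII-based system: each character is just a number, and the number of bits determines how many distinct characters are possible.

    #include <stdio.h>

    int main(void)
    {
        char letter = 'A';

        /* Each character is stored as a small number (65 for 'A' in ASCII). */
        printf("'%c' is stored as %d (0x%X)\n", letter, letter, (unsigned)letter);

        /* Seven bits allow 128 combinations; a full byte allows 256. */
        printf("7-bit combinations: %d\n", 1 << 7);
        printf("8-bit combinations: %d\n", 1 << 8);
        return 0;
    }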

Machine programming is very, very basic. And today, computers still work this way:

  1. Place data in a register.
  2. Tell the computer what to do with the data.
  3. Read the results from a register.

The machine commands that do something are called "op-codes," short for "operation codes." Generally, the number of op-codes is limited by how many bits an instruction sets aside for them, so older systems had far fewer op-codes than modern processors. Larger "brains" and more memory allow computers to "understand" a more complex, though still simplified, machine language.

I could dig into the joys of machine language, but most people find it pretty boring. Actually, outside compiler creation and chip design, few people need to know about machine code. Still, I'll indulge a bit and explain how amazing what we do with computers can be. To multiply or divide by powers of two, computers simply "shift" the bits in a byte. It is impressive.

As a human-readable version, here is how to multiply 4 by 2 (a C sketch of the same trick follows the steps):

  1. Load the main register (a.k.a. the "accumulator") with the number "4" (0100 binary).
  2. Tell the computer to "shift the bits" left one place, changing the byte to 1000 binary.
  3. Output the results.
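
Here is that minimal C sketch; the left-shift operator does exactly what the "shift the bits" step describes:

    #include <stdio.h>

    int main(void)
    {
        unsigned char accumulator = 4;     /* 0000 0100 in binary */

        /* Shifting left by one place doubles the value: 0000 1000. */
        accumulator = accumulator << 1;

        printf("Result: %d\n", accumulator);  /* prints 8 */
        return 0;
    }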

Machine languages are specific to computer chips. The 6502 and the 8088 had different "languages" — you can't take machine language for one chip and run it on another. Many of the basics are the same because chip developers don't reinvent the system with each chip, but there are significant differences based on design philosophies and the purposes of chips.

Values in a processor, whether it is the central processing unit (CPU) or a specialized processor, are moved between registers, acted upon, and moved to various forms of memory. Technically, what you see on screen is nothing more than a viewport into a chunk of memory. When you move a "window" on screen, the computer copies what was "hidden" by the window back into the display memory. When I used to code video games, I relied on "paging" through memory, similar to "flip-book" animation.

It's hard to remember that what you see on a screen is as simple as "copy this byte to video memory" repeated several thousand times. But that's all it is. Thankfully, the days of hand-coded "bit-mapping" are behind us. Most systems today use a high-level graphics application programming interface (API) to draw images to a screen canvas. Yet, for all the abstraction benefiting programmers, the computer is merely moving bits and bytes from one place in memory to another.
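
As a rough sketch only, here is the bit-mapped approach in C, with a made-up resolution and an array standing in for video memory (real hardware and operating systems differ):

    #include <stdint.h>
    #include <string.h>

    #define SCREEN_WIDTH  320   /* hypothetical resolution, for illustration */
    #define SCREEN_HEIGHT 200

    /* Stands in for the memory the display hardware actually reads. */
    static uint8_t framebuffer[SCREEN_WIDTH * SCREEN_HEIGHT];

    /* Copy a prepared "page" of pixels into the visible buffer:
       the flip-book paging described above. */
    static void show_page(const uint8_t *page)
    {
        memcpy(framebuffer, page, sizeof framebuffer);
    }

    int main(void)
    {
        static uint8_t page[SCREEN_WIDTH * SCREEN_HEIGHT];
        memset(page, 0xFF, sizeof page);  /* "draw" by setting bytes */
        show_page(page);                  /* make the page visible   */
        return 0;
    }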

Second Generation

Even the "bare-metal" programmers I know at least work in "assembly language" and not machine code. I learned coding on a Commodore VIC-20, which was based on a 6502 chip. To develop complex software, you needed to use machine language. However, I never had to learn machine language for the x86 chips of IBM-compatible PCs, because by then I was using the Microsoft Macro Assembler.

According to the computer science books, assembly languages are the "second generation" of computer programming. Really, assembly code is merely a slightly (very slightly) more readable way to write machine-specific code. Instead of using 0xED (hex) for "input" with the 8086, an assembler accepts "IN" and converts the abbreviation to machine code.

See:
http://en.wikipedia.org/wiki/X86_instruction_listings
http://www.mlsite.net/8086/
http://ece425web.groups.et.byu.net/stable/labs/8086Assembly.html

Assembly language is easier to code than machine language, at least for most mere mortals. There aren't many commands, but it is easier to remember them as something approaching "words" and "phrases" than to memorize which number corresponds to an op-code.
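
As a toy illustration (not a real assembler, and the table is far from complete), here is how the mnemonic-to-op-code mapping might look in C; the 0xED value echoes the "IN" example above, and a real assembler also has to encode operands:

    #include <stdio.h>
    #include <string.h>

    /* A tiny lookup table: the essence of what an assembler does. */
    struct opcode { const char *mnemonic; unsigned char code; };

    static const struct opcode table[] = {
        { "IN",  0xED },   /* the "input" example from the text above */
        { "NOP", 0x90 },
        { "HLT", 0xF4 },
    };

    static int assemble(const char *mnemonic)
    {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(table[i].mnemonic, mnemonic) == 0)
                return table[i].code;
        return -1;  /* unknown mnemonic */
    }

    int main(void)
    {
        printf("IN assembles to 0x%02X\n", (unsigned)assemble("IN"));
        return 0;
    }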

Macro Assembly

Computer programming involves reusing the same basic procedures, over and over again. Instead of coding the same dozen lines each time you want to display a character, it is more convenient to reuse and recycle assembly code. A "macro" assembler (such as MASM, the tool I learned years ago) allows programmers to create and reuse "macros" of code.
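
MASM syntax is beyond this post, but C's preprocessor offers a rough analogy, a sketch of the reuse idea rather than real macro assembly: the "macro" is pasted in wherever it is invoked, instead of the programmer retyping the same lines.

    #include <stdio.h>

    /* A crude analogy to an assembler macro: the preprocessor expands
       this block at every invocation. */
    #define PRINT_CHAR_TWICE(c)  do { putchar(c); putchar(c); } while (0)

    int main(void)
    {
        PRINT_CHAR_TWICE('A');   /* expands to two putchar calls */
        putchar('\n');
        return 0;
    }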

High-Level

As mentioned earlier, high-level programming languages are categorized as third- or fourth-generation. Most of us code in 3GLs. For a time in the 1980s and early 1990s, it appeared that 4GLs would be popular, but few of them have managed to thrive "in the wild" with developers.

Third Generation (3GL)

Like 99.99 percent of developers, I prefer to code in various 3GLs. The magic is that a 3GL allows developers to write one program that will work on multiple processors. When you write a program in C, you then input the code into a processor-specific compiler. The compiler generates machine code that is specific to the hardware. You don't need to know the differences between an 8086 and a 6502 to write a program that will work on both CPUs.
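
A minimal sketch of that portability: nothing in this C source mentions a particular processor, so each target's compiler can turn the same file into its own machine code.

    #include <stdio.h>

    /* Plain, portable C. An x86 compiler, an ARM compiler, or (long ago)
       a 6502 C compiler would each emit its own machine code from this
       same source file. */
    static int double_it(int value)
    {
        return value * 2;
    }

    int main(void)
    {
        printf("4 doubled is %d\n", double_it(4));
        return 0;
    }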

Examples of third-generation languages include BASIC, Pascal, Fortran, COBOL, and C. There are significant differences among 3GLs, both theoretical and practical. Regardless of these differences, 3GLs are converted to op-codes for a processor in one of three ways:

1) Interpreter: An "interpreter" converts the human-readable program to op-codes in "real time," parsing each line of code as the program runs. Interpreters are slow, since they translate the program to processor commands every time it is executed: the interpreter is a program translating yet another program. Old-style BASIC was usually interpreted, and many Web "scripting languages" still are. (A toy interpreter sketch follows this list.)

2) Compiler: A compiler is a program that takes human-readable code and converts it to a lower-level of code that executes more efficiently. The best compilers can evaluate code with multiple passes, removing unused code and adjusting remaining code. These "optimizing compilers" produce code rivaling hand-coded machine language. Most C compilers have optimization settings.

There are "pseudo-code" compilers that create "byte-code" or "p-code" that is faster than purely interpreted code, but it still requires a special "run-time" engine that helps execute programs. Most older Pascal compilers relied on p-code with a run-time. Today, Java and C# are compiled to intermediate code. There are benefits to this approach, especially if programs have to run on various processors.

3) Intermediate Translator: A translator converts a program from one human-readable programming language to another programming language that can be compiled or interpreted. There are many "X to C" examples of this, such as BASIC to C or Fortran to C. The resulting C code is seldom "ideal" but it allows the use of a common compiler.
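
To illustrate item 1, here is a toy interpreter sketch in C. The three-command "language" is made up for this example; the point is that each line is parsed and acted on every time the program runs.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A toy interpreter: parse each line of a made-up language and act
       on it immediately. Real interpreters are far more elaborate, but
       the re-parsing cost on every run is the same in principle. */
    int main(void)
    {
        const char *program[] = { "LOAD 4", "ADD 6", "PRINT" };
        int accumulator = 0;

        for (size_t i = 0; i < sizeof program / sizeof program[0]; i++) {
            const char *line = program[i];
            if (strncmp(line, "LOAD ", 5) == 0)
                accumulator = atoi(line + 5);
            else if (strncmp(line, "ADD ", 4) == 0)
                accumulator += atoi(line + 4);
            else if (strcmp(line, "PRINT") == 0)
                printf("%d\n", accumulator);   /* prints 10 */
        }
        return 0;
    }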

While it would be ideal to select the best language for a specific task, that's seldom how it works in the "real world" with tool choices dictated by platform, clients, employers, and our own biases.

Curiously, languages are still associated with specific hardware — but not because they are technically linked to the hardware. Instead, today's "lock-in" is at the operating system level. Many "C-family" or "curly-brace" languages are associated with specific platforms: C# on Microsoft Windows, C++ on Linux, and Objective-C on Apple's OS X / iOS platforms. Sadly, you cannot "write once, run everywhere" with these popular programming languages.

In a future post, I'll explore the differences of languages in more detail. In particular, to study Mac programming I'll need to address structured versus object-oriented code.

Fourth Generation (4GL)

Fourth-generation languages are often abstracted beyond code. They usually work within databases, which makes them special-purpose. The interpreted languages within databases, like the "xBase" dialects originating with the dBase system, are the most familiar 4GLs. Other familiar 4GLs are the variations of Structured Query Language (SQL) used by most relational database engines.

Generally, 4GLs are interpreted. Developers often use 4GLs within 3GLs. How is that possible? We generally include SQL code within our 3GL applications, sending the SQL commands to a server and then acting on the resulting data. This gives you the speed of the 3GL with the flexibility of the interpreted 4GL. I like SQL, despite some minor grammar oddities, and believe it is one of the most valuable skills to obtain.
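
Here is a hedged sketch of mixing the two generations, assuming the SQLite C library is installed (link with -lsqlite3); the database file, table, and column names are made up for illustration. The 3GL (C) handles the control flow while the embedded 4GL (SQL) describes the data work.

    #include <stdio.h>
    #include <sqlite3.h>

    /* Called once per result row; prints the first column. */
    static int print_row(void *unused, int argc, char **argv, char **names)
    {
        (void)unused; (void)names;
        if (argc > 0)
            printf("%s\n", argv[0] ? argv[0] : "NULL");
        return 0;
    }

    int main(void)
    {
        sqlite3 *db = NULL;

        /* Hypothetical database name, purely for illustration. */
        if (sqlite3_open("example.db", &db) != SQLITE_OK)
            return 1;

        /* The 4GL part: a SQL statement embedded as a string in the 3GL. */
        sqlite3_exec(db, "SELECT name FROM customers;", print_row, NULL, NULL);

        sqlite3_close(db);
        return 0;
    }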

Corporations need 4GL experts. You can earn a good living analyzing data, but it isn't glamorous work. Knowing SAS, SPSS, SQL, RPG, and other languages provides a solid career path. Because 4GLs tend to be data-centric, they are poorly suited for other tasks. If you want to analyze data, you'll be using a 4GL. If you want to write a great video game, consider learning a C-family language.

In the end, the more languages you know, the better. Even if you master only one or two, you will be able to transfer those skills to other languages when necessary.


I like comments in computer programming source code. I've never been the programmer to claim, "My code doesn't need comments." Maybe it is because I've always worked on so many projects that I need comments  to remind me what I was thinking when I entered the source code into the text editor. Most programmers end up in a similar situation. They look at a function and wonder, "Why did I do it this way?" Tangent : I also like comments in my "human" writing projects. One of the sad consequences of moving to digital media is that we might lose all the little marginalia authors and editors leave on manuscript drafts. That thought, the desire to preserve my notes, is worthy of its own blog post — so watch for a post on writing software and notes. Here are my rules for comments: Source code files should begin with identifying comments and an update log. Functions, subroutines, and blocks of code should have at least one descriptive comment.