An Introduction to and Overview of Computer Programming Languages
INTRODUCTION
In this section, programming languages will be discussed, and the evolution of the various languages and programming paradigms will be traced. The way in which computers are programmed has a direct influence on how computation is thought about. Consequently, the fundamental concepts of programming languages constitute an essential body of knowledge in computer science, and these principles are essential to programmers, software specialists, computer scientists, and, to a degree, to humanities scholars who employ programming tools and languages in their work (Louden & Lambert, 2011).
Operating Systems
Before proceeding, however, the important concept of an operating system must be discussed. Most users are familiar with the general principles of operating systems. They are aware that Microsoft Windows or Mac OS X are such systems. They also know that these systems control the overall operation of the computer system. Specifically, the operating system (OS) of a computer is the central system software in the computer system. The OS is the primary way in which the system “communicates” with users. It starts the system software and applications as needed and/or requested by the user. Frequently, users access the OS through a graphical user interface (GUI), a visual interface to the operating system or to other system software/applications. Microsoft Windows is an example of a GUI-based operating system. The OS also allows
I/O systems to communicate with various devices (Schneider & Gersting, 2018).
System commands are user instructions specifying actions to be taken by the computer. One may consider the OS as the “receptionist” and “dispatcher” in a computer system. From user commands, system software is scheduled and executed. Often, users employ text-based system commands that are entered at a prompt at a terminal or computer screen. Such operations can, for instance, be entered through the command prompt accessible from Microsoft Windows (by entering cmd at the Windows prompt). The command language implemented by an OS must be learned by users, and, consequently, many users employ GUI-based system commands through a visual/mouse interface (Schneider & Gersting, 2018).
Operating systems also provide system security and protection. They permit only authorized access to resources. The OS therefore acts as a “security guard”. For individual users, access is protected via usernames and passwords. Additionally, sensitive data, such as passwords, are encrypted (coded into a format unreadable to potential threats) to increase security. The OS also maintains authorization lists for folders (directories) and files. File permissions for which authorization can be granted include reading a file, adding to, or modifying a file, and deleting a file (Schneider & Gersting, 2018).
The OS is also responsible for the efficient allocation of resources. The goal is to utilize the CPU as much as possible and to ensure the smooth progress of all running programs. In modern-day computer systems, including laptops and small mobile devices, several programs may be executing simultaneously. Consequently, the OS directs access to system resources, such as memory or CPU time, for these programs. Every program is in one of three states: the running state, in which the program is currently using the CPU; the waiting state, in which the program is waiting for I/O operations to complete before it can proceed; and the ready state, in which the program is able to run and is waiting only for the CPU to become available. Consequently, the OS maintains a queue of ready programs and enforces sharing of system resources (Schneider & Gersting, 2018).
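To make the idea of a ready queue concrete, the following minimal sketch in Python (the scripting language introduced later in this section) pictures it as a simple first-in, first-out structure. The program names and the number of scheduling decisions are hypothetical, and the sketch illustrates the concept only; it is not how any real OS scheduler is implemented.

```python
from collections import deque

# A minimal sketch of a ready queue: programs that are ready to run wait in a
# first-in, first-out queue, and the OS repeatedly gives the program at the
# front a slice of CPU time before returning it to the back of the queue.
ready = deque(["editor", "browser", "backup"])   # hypothetical program names

for _ in range(6):                 # six scheduling decisions, for illustration
    program = ready.popleft()      # the program at the head enters the running state
    print(f"running {program}")
    ready.append(program)          # after its time slice, it re-enters the ready queue
```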
Operating systems in common use include Microsoft Windows, which also provides a command prompt (available through the cmd command) offering functionality similar to the older MS-DOS command-based operating system used prior to the introduction of Windows in the mid-1980s. Apple computers employ the Mac OS X operating system. Larger (and sometimes older) mainframe and minicomputers run proprietary operating systems provided by the system manufacturer. For large-scale, multi-user, and high-performance systems, the UNIX system, introduced in 1971 by AT&T Bell Labs, is frequently employed. A newer operating system similar to UNIX, named Linux, was first released in 1991. Linux is the operating system on a variety of computer systems, including personal computers (PCs) and high-performance computing systems. It is widely deployed on the supercomputers and distributed parallel computing clusters of Compute Canada, a Canada-wide organization that provides high-performance advanced research computing services and infrastructure to Canadian researchers.
Programming Languages
A programming language is defined as “a notation for communicating to a computer what is to be done” (Louden & Lambert, 2011). There is not just one programming language but a wide variety of them. Each language has particular strengths in performing specific computational tasks. Such tasks include complex mathematical computations on real (floating-point) numbers, detailed page layout, or database interactions. Therefore, developers and computer scientists select a language based on what computations and tasks are to be performed. Programming languages are also chosen based on their approach to computation, or their paradigm. A particularly useful group of languages are the procedural (or imperative) languages, that is, languages that employ the procedural, or imperative, programming paradigm. In the procedural paradigm, the programmer explicitly specifies an algorithm, or step-by-step instructions describing how variables, or data in memory locations in the computer, are manipulated (Schneider & Gersting, 2018). Several popular programming languages belong to the procedural paradigm and share the same underlying model.
Programs themselves are sequences of statements. Several different programming languages are described below. These languages vary in syntax, or how statements are written, and semantics, which denotes the meanings of those statements.
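To make the procedural style concrete, the short sketch below is written in Python, the scripting language discussed later in this section. It is an illustrative example with arbitrary values, not drawn from any particular source.

```python
# A minimal procedural (imperative) sketch: the programmer spells out each
# step that manipulates data held in named variables.
total = 0                      # a variable: a named location holding data
for value in [76, 28, 15]:     # step through the data one item at a time
    total = total + value      # explicitly update the variable at each step
print(total)                   # prints 119
```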
In the very early era of computing, prior to the mid-1940s, operators programmed computers by setting switches to adjust the internal wiring of a computer to perform the required computational tasks. The operator would enter a series of binary codes to organize the basic hardware operations to solve more specific problems. Operators could flip switches to enter these machine language codes into memory. However, the 1940s was also the era of “languages without machines” (Louden & Lambert, 2011). The mathematical basis for programming languages (specifically, functional languages, described below) was provided by the lambda calculus, a mathematical logic for expressing computations introduced by American mathematician Alonzo Church (1903 – 1995) in the 1930s. In another development, in 1945, Konrad Zuse (1910 – 1995), a German civil engineer, proposed Plankalkül (“planning calculus”), a computer language intended for engineering purposes. However, although Plankalkül was designed and developed, it was not implemented.
Programming languages allowed computer users to solve problems without having to reconfigure, and possibly rewire, the hardware. Recall that John von Neumann proposed that computers should be permanently hardwired with a small set of general-purpose operations. Program execution begins with the first line of code entered. Code is fetched from memory, decoded (interpreted), and executed. Control then moves to the next line of code and continues until a halt instruction is reached, or until an error condition is encountered.
Recall that machine language is a low-level language that operates directly on the computer’s hardware, particularly the CPU. Machine language is specific to each type of processor. Therefore, if a programmer were working on several different computer systems, it was likely that the programmer would employ several different machine languages, each specific to a system’s architecture and microprocessor. Furthermore, machine language was difficult to learn, to understand, and to interpret because the very low-level instructions (the instructions executed by the processor) often did not correspond directly to a computational task. For example, a simple operation, such as setting a variable x to the sum of two numbers, say, 76 and 28, could not simply be written as x = 76 + 28, but needed to be specified as a sequence of processor operations: load the data into registers, execute the op code to perform an addition, store the result in a register, locate the memory address of x, and store the result into that memory location. All memory addresses and register addresses were also encoded, and it was the responsibility of the programmer to keep track of these addresses and to encode them properly. This sequence does not even include outputting x, the sum; I/O was particularly difficult in machine language because of the different subsystems involved. Additionally, machine language instructions were encoded in a binary representation. Consequently, machine code often appeared as a string of 0’s and 1’s.
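To visualize this sequence, the sketch below simulates the load, add, and store steps in Python (the scripting language discussed later in this section). The register names and memory addresses are hypothetical, and the sketch is only an illustration of the bookkeeping a machine-language programmer had to perform by hand.

```python
# A simplified, hypothetical simulation of the machine-level steps hidden
# behind the single high-level statement x = 76 + 28.
registers = {"R1": 0, "R2": 0, "R3": 0}
memory = {0x10: 76, 0x11: 28, 0x12: 0}   # hypothetical addresses; 0x12 is reserved for x

registers["R1"] = memory[0x10]           # LOAD the first operand into a register
registers["R2"] = memory[0x11]           # LOAD the second operand into a register
registers["R3"] = registers["R1"] + registers["R2"]   # ADD the two registers
memory[0x12] = registers["R3"]           # STORE the result at the address of x

print(memory[0x12])                      # prints 104
```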
To make machine language at least somewhat more understandable and interpretable, assembly language (or, more precisely, several assembly languages) was introduced in the 1950s. Assembly language, in general terms, is a set of mnemonic symbols for instruction codes and memory locations. In the early years of computing, assembly language code was entered using input devices such as keypunch machines and card readers. Assembly language replaced operations represented by binary strings with human-readable codes, such as ADD for addition, LD for loading data into a register, or MOV for moving data from one location to another. As another example, consider the following line of code:
LD R1, FIRST
This code indicates that the value of the memory location specified by FIRST is loaded into the register designated by R1. The value of FIRST (i.e., the memory location) and the location of R1 must be explicitly specified by the programmer. The line of code is a direct “translation” from machine code.
In fact, assembly language is machine language, except that human-readable codes were substituted for the binary codes used in machine language. That is, assembly language has a one-to-one correspondence with machine language, where each assembly language operation maps to a corresponding machine language operation. Consequently, the mnemonic symbols provided by assembly language signified a definite improvement over binary-encoded machine language.
After assembly code has been written, for example, in a standard text editor, an assembler program translates the symbolic assembly language code to binary machine code. To execute (run) the code on the CPU, a loader program loads the machine code into computer memory.
In addition to its obvious advantages, assembly language possesses some drawbacks. It is still very far from the natural languages by which human beings communicate. The programmer is still responsible for managing the movement of data among memory locations and registers. Like machine language, assembly language provides the programmer with only a low-level (that is, close to the hardware) view of the computational task to be performed, and it lacks the abstraction of readily understood conventional mathematical notation. Because of the one-to-one correspondence of assembly language to machine language, assembly language is also machine specific. Each type of computer hardware architecture has its own set of machine language instructions, and therefore requires its own dialect or version of assembly language. Hence, there are several assembly languages, just as there are several machine languages. However, because assembly languages consist of human-readable codes, there are greater similarities among the various assembly languages than among different machine languages. Examples of assembly language code are readily available on the Internet.
It must be emphasized that although assembly languages are still difficult to write and interpret, they remain very “close” to the hardware and have direct access to the computer’s CPU. Consequently, they are still used to this day for low-level system tools, for manual optimization, and wherever the highest level of performance is required. They are used extensively for writing driver programs (software that communicates with and controls peripheral devices), components of operating systems, safety-critical systems, and embedded, real-time systems, where high performance is mandated. As a testament to its continued popularity, assembly language (specifically, the generic category of assembly languages) was listed as the #8 programming language on the Tiobe Index, a monthly programming community-based index providing metrics for the popularity of programming languages (December 2021).
As the next step in this progression, high-level programming languages were developed to improve on assembly language and to address some of its shortcomings.
One specific improvement is removing the requirement that the programmer explicitly manage data in memory. Very importantly, high-level languages allowed programmers to express computations in a more natural way, using notation closer to natural language. The resulting programs were therefore much more readable and interpretable, in addition to being easier to write. Expectations for high-level languages included the following features (Schneider & Gersting, 2018), which are illustrated in the short sketch after the list:
- Programmers declare names and (sometimes) data types for variables.
- The program manages the movement of data associated with a given variable name.
- The language allows the programmer to take a higher-level view of tasks (e.g., x = 76 + 28).
- Languages provide statements for high-level mathematics; that is, computations are expressed in mathematical notation.
- Low-level details of conditional statements and iterative, repeating loops are abstracted away from the programmer.
- Programs are generally portable from one computer to another.
- Programming languages are standardized, and generally independent of the specific CPU on which they run.
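The following minimal sketch, written in Python (discussed later in this section), illustrates several of these expectations; it is an illustrative example with arbitrary values, not drawn from any particular source.

```python
# A high-level view of a task: no registers, no memory addresses.
x = 76 + 28            # computation expressed in familiar mathematical notation
if x > 100:            # a conditional statement without low-level jump instructions
    print("large sum")
for i in range(3):     # an iterative loop whose bookkeeping the language manages
    print(x * i)
```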
Many high-level languages require compilers that are written for a particular computer system. Compilers are programs that translate human-readable programs written in high-level languages into the machine language that executes on the processor. Hence, a compiler can be considered a “translator” for high-level languages, converting source code, or code written by a (human) programmer as text, into assembly code or a similar low-level language. The assembly code is then converted to an intermediate object code. Object code still cannot be executed directly on a CPU. Therefore, in a subsequent step, the object code is integrated with code libraries that contain object code for useful tools, such as I/O routines. This process of combining the object code from the source code with the object code from the code libraries is called linking. The link step uses a special systems program called the linker. The linker integrates multiple files of object code to create an executable module, which runs on the CPU.
Other high-level languages, like Python, are interpreted languages that require an interpreter program on each machine on which the code is executed. The main difference between compiled and interpreted languages is that in the former, an entire program is translated (possibly after a number of intermediate steps) into machine language, which executes directly on the CPU. In other words, the CPU is, in a sense, running a machine language program. In interpreted languages, in contrast, each line is translated into machine code, one line at a time, and immediately executed by the CPU. Compiled code is therefore generally more efficient than interpreted code. However, as explained in a subsequent section, interpreted languages have several advantages, especially for users in the digital humanities, for users who are not computer scientists or computing professionals, and in cases where high performance is not stringently required.
The following subsections describe a very broad genealogy of high-level programming languages and their applications. The dates for the subsection titles indicate only the general time frame in which the language was initially developed. Many of the early programming languages described here still enjoy widespread usage in the present day.
1950s
One of the first high-level languages, and one that is still widely used in the present day, is FORTRAN (or Fortran). FORTRAN, the FORmula TRANslation (or FORmula TRANslator) language, was developed by American computer scientist John Backus in the 1950s. The language was originally developed for scientific, engineering, and mathematical applications, which were the main source of demand for computation. The groundbreaking innovation of FORTRAN was that the language employed algebraic notation while still reflecting the architecture of the particular type of machine being programmed. Although it lacked some of the structured control statements and data structures available in subsequent high-level languages, FORTRAN became extremely popular with scientists and engineers for its support for algebraic notation and floating-point numbers. FORTRAN has undergone many revisions since its inception. Major versions include FORTRAN II and FORTRAN III (both in 1958), FORTRAN IV (1961), FORTRAN 66 (1966), FORTRAN 77 (1977), Fortran 90 (1990), and Fortran 95, which introduced many extensions, including high-performance computing features. The current version of Fortran in use as of August 2021 is Fortran 2018.
FORTRAN is a highly efficient language, and therefore the language, which now supports parallel computation, is still used for high-performance computing applications. As of December 2021, it held position #17 on the Tiobe Index.
The COBOL language, the COmmon Business-Oriented Language, was developed in 1959-1960 by a group of computer professionals called the Conference on Data Systems Languages (CODASYL), an information technology industry consortium formed to guide the development of a standard programming language. Admiral Grace Hopper (1906 – 1992) participated in this group. COBOL was the first programming language whose use was mandated by the United States Department of Defense. COBOL remains one of the most widely used programming languages for business applications, mostly because of the large amount of legacy software originally written in COBOL and still updated and used in the present. Many current systems can be considered legacy applications whose code base may have been developed several decades ago. COBOL programmers are therefore in demand to maintain these large software systems. This fact was highlighted in the late 1990s with the realization that COBOL programs that encoded the last two digits of years (e.g., coding 1984 as 84) to save space (which was at a premium in the early systems on which these COBOL programs ran) were not able to handle years in the new millennium. This problem, affecting many large-scale computer systems written in a variety of languages, and not just COBOL, became known as the Y2K problem. Consequently, programmers needed to undertake a painstakingly detailed examination of a vast number of large systems written in COBOL to correct the problem. As of December 2021, COBOL is listed at position #23 in the Tiobe Index.
LISP (1956-1962), or the List Processing language, was the first functional programming language. Functional languages were designed for symbolic manipulation, in contrast to numerical computation (Louden & Lambert, 2011). Its syntax was markedly different from that of FORTRAN or COBOL. In this language, and in fact in most functional programming languages, the clear expression of ideas and concepts takes precedence over efficiency. LISP is still heavily used in present-day artificial intelligence (AI) research and applications. A dialect of LISP is also incorporated into the widely used AutoCAD (AutoDesk, Inc.) software system for computer-assisted design, for the purpose of automating drawing and analysis tasks. It is the #31 language on the Tiobe Index as of December 2021.
ALGOL 60, the ALGOrithmic Language, was a general-purpose programming language developed in the years 1958-1960. It was an expressive language for describing algorithms. It was, and to a degree still is, widely used in academic institutions in Europe and North America.
1960s
Other early high-level programming languages include PL/I (Programming Language I), developed in 1963-1964. PL/I was intended to be a “universal language”. Although the language had forward-looking features, it is not generally considered to be successful, as its goals were somewhat ambitious and the language itself was too large (Louden & Lambert, 2011).
Algol68 was a theoretical triumph, and quite interesting from a research standpoint, but impractical for general usage. BASIC, or the Beginner's All-purpose Symbolic Instruction Code, was developed in 1964. It distilled the most elementary ideas of programming for the simplest computers, and the language was widely distributed on the first home PCs. Microsoft's Visual Basic language remains very popular among programmers and computer enthusiasts; Visual Basic holds position #6 on the Tiobe Index (December 2021). Simula67 (1967) is considered to be the first object-oriented (OO) programming language, in which real-world objects and abstract computations are modeled as “objects” in the language. Objects communicate through message passing and contain both functionality and data. Simula67 was inefficient, but it nevertheless formed the foundation on which subsequent OO languages, such as Java and C++, were built (Louden & Lambert, 2011).
1970s
The decade of the 1970s was a time of simplicity and abstraction in programming languages. Languages became more “high-level” and easier to use, while maintaining powerful, low-level features (i.e., features that utilize CPU functions directly). Abstraction in the current context refers to programmers thinking more about the computational problem to be solved than about specific implementation details. In other words, low-level technical details were “abstracted” away from programmers, allowing them to concentrate on solving problems at a higher level. These programming languages were significantly simpler than the languages of the 1960s, but they combined new data structures (as in COBOL) with structured control (as in Algol60). However, no major new concepts were added to language design (Louden & Lambert, 2011).
The Prolog (Programming in Logic) language was developed in the early 1970s, with compilers developed in the mid to late 1970s. Prolog is used primarily as a programming language for AI and non-numerical programming. Prolog is a popular programming language and occupies a relatively high position in the Tiobe Index (#25 as of December 2021).
A major innovation in programming languages was the introduction of the C language in 1972. C was designed at Bell Laboratories by American computer scientist Dennis Ritchie. C is a small, compact, yet very powerful general-purpose language. It was originally intended as a systems language to build operating systems, drivers, and other programs that interface with hardware. However, its utility became recognized for a variety of other applications. One of the benefits of C is its ability to interact with low-level hardware, thereby being able to perform some of the operations formerly assigned to assembly language programs. C, however, is considered by some to be a difficult language, primarily because of its use of pointers, or variables that allow direct access to memory locations and to computer hardware. C is arguably the most widely employed high-level programming language in the present day. The code produced by the C compiler is highly efficient, and therefore the language is used in safety-critical systems, embedded systems, and high-performance computing software. It occupies the #2 position in the Tiobe Index (December 2021), having previously held the top spot on this index.
The Pascal language, named after the French philosopher and mathematician Blaise Pascal (1623 – 1662), was developed by Swiss computer scientist Niklaus Wirth in 1971. Wirth also designed several other programming languages. Primarily intended as a pedagogical language to teach good programming skills, Pascal was used in several applications in the 1970s and 1980s, although its use is more limited in the current day. Smalltalk was developed at Xerox PARC in the early 1970s for graphics workstations. It is considered to be instrumental in the development of the object-oriented paradigm, and the language influenced most of the modern object-oriented languages. The Scheme language was introduced in 1975. It is a functional programming language and is related to LISP (Louden & Lambert, 2011).
1980s
The 1980s witnessed the increasing influence of object orientation. Ada was introduced in 1980. While not strictly an object-oriented language, Ada represented a serious attempt at a universal language. The language was originally designed by a team under contract to the United States Department of Defense (DoD). It was to be the default programming language for embedded systems, although other languages, including C, eventually proved more popular (Louden & Lambert, 2011). Ada occupies position #30 on the Tiobe Index (December 2021), although it held the #2 position in 1986 and was #5 in 1991.
Smalltalk-80 (1980) was an advancement of OO programming (Louden & Lambert, 2011). However, it can be argued that OO became a mainstream paradigm with the introduction of C++ by Danish computer scientist Bjarne Stroustrup in 1980. Originally named “C with Classes”, C++ extended the C language with OO concepts. The design of C++ was focused on considerations important for system programming, embedded and resource-constrained software, and other large systems: performance, efficiency, and flexibility (Louden & Lambert, 2011). As the C language is renowned for its efficiency, C++ demonstrated that OO programming can also be efficient. C++ has an extremely large community of users and programmers. It is widely used in video game development, commercial systems, servers and databases, and performance-critical applications. C++ occupies the #4 position on the Tiobe Index (December 2021).
1990s
The 1990s witnessed several important technological innovations in programming languages. During this period, there was a recognition of the increasing need for large, powerful software libraries and application programming interfaces (APIs). Arguably, the most important language of the 1990s was Java, designed and introduced by Sun Microsystems in the early 1990s. The language was originally developed for small, embedded systems. However, it eventually became the “default” language for Internet and Web applications. Important design considerations included platform independence, reliability, security (an increasingly important concern starting in the 1990s), graphical user interfaces, and, very importantly, strong integration with web browsers, which were gaining prominence during the decade (Schneider & Gersting, 2018).
Like C++, Java is an object-oriented language. However, it can be considered to be more “purely” OO than C++, as all computations in Java are encapsulated in objects. There are two main categories of Java programs. Applications are stand-alone programs, while applets (for “small applications”) are programs that run through a webpage. A key feature of Java is its portability. Applications and applets run on most platforms and through most browsers. There are some differences between the way C++ and Java generate machine code. While C++ requires a compiler (as does C) to produce machine code, with Java, source code (code written in a computer language as text in some type of editor prior to compilation and other processing) compiles to platform-independent bytecode (or byte code). Bytecode is generic, low-level code, but cannot be executed directly by the CPU. Instead, bytecode is executed by an interpreter known as a virtual machine. The virtual machine software system is different for each hardware architecture. That is, while bytecode is machine-independent, the virtual machine is machine-dependent. Large parts of the Perseus digital library system, since Version 4.0 was introduced in 2005, are written in Java. The shift to Java has increased the interoperability of the system. The object-orientation paradigm employed in Java has made the system more transparent and easier to maintain. In addition, Java has a well-developed and well-documented application programming interface (API), as well as a vast expertise base. Jython, formerly known as JPython, implements the Python scripting language (arguably the most widely-used in the digital humanities) on Java platforms, and was one of the tools used in the development of newer versions of the Perseus system (see Perseus 4.0).
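Java is not the only language that follows this bytecode-plus-virtual-machine model: standard Python also compiles source code to bytecode that its own virtual machine executes. The following minimal sketch, offered as an illustration rather than anything specific to Java or Perseus, uses Python's built-in dis module to display the bytecode generated for a small function.

```python
import dis

# Python compiles this function to bytecode; the Python virtual machine
# (not the CPU directly) executes that bytecode, much as the Java virtual
# machine executes Java bytecode.
def add(a, b):
    return a + b

dis.dis(add)   # prints the bytecode instructions for the function
```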
Also in this decade, script programming became a large, active area with the introduction of interpreted scripting languages such as Perl, Tcl, Javascript, VBScript, Python, and PHP. Scripting languages integrate utilities, library components, and operating system commands into complete programs. Python is the most popular language for digital humanities applications and will be discussed in detail in subsequent sections. In the current context, Javascript merits special mention. It must first be pointed out that Javascript (sometimes written as JavaScript) is not directly related to Java, described above; still less is it the “script version” of Java. Javascript is, however, a lightweight scripting language (that is, it does not consume an excessive amount of memory, is easy to implement, and is minimalistic in its syntax) for active or interactive webpages. Javascript code is embedded in the HTML for the webpage, and Javascript is one of the prime drivers of webpage interaction. Perl was originally designed as a scripting language for facilitating report processing, and it initially ran on the UNIX operating system. It is now, however, a general-purpose language. One of its applications is scripting for CGI (the Common Gateway Interface), which enables web servers to execute external programs in response to users’ requests. Perl was also employed in the development of the data management system of the Perseus Digital Library, and early versions of that system were written primarily in Perl. However, the Perl version of Perseus was phased out as of 2006 [http://www.perseus.tufts.edu/hopper/help/archived/perseus4.0.ann.full.html]. PHP is also a general-purpose scripting language, used primarily for web development. It is particularly useful for interfacing web applications (for example, those written in Javascript) with database management systems.
Special-purpose programming languages are languages designed for a specific purpose or application and are not typically considered suitable for general-purpose programming. A prime example of such a language is SQL, the Structured Query Language. SQL is a standard that is implemented in virtually all modern-day database systems (specifically, relational database management systems), complex software systems that store data and facilitate rapid access and updating of that data. SQL was developed at IBM in the 1970s and became a formal standard in 1986. As its name implies, SQL enables users to “ask questions”, or formulate queries, about data in a database. SQL is a declarative language, meaning that users specify what task is to be accomplished, not the details of how that task is to be accomplished. This is in contrast to procedural, imperative languages, such as FORTRAN, COBOL, and C, and object-oriented languages, such as C++ and Java, where the details of a computation must be explicitly coded by the programmer. In the case of SQL, a query describes what information the user wants, not how to find it or how to access it in the database.
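The declarative character of SQL can be illustrated with a short sketch. The example below uses Python's built-in sqlite3 module purely as a convenient way to run SQL; the table, columns, and data are hypothetical.

```python
import sqlite3

# Create a small in-memory database with a hypothetical table of authors.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE authors (name TEXT, born INTEGER)")
cur.executemany("INSERT INTO authors VALUES (?, ?)",
                [("Austen", 1775), ("Woolf", 1882)])

# The query states WHAT is wanted (authors born after 1800),
# not HOW the database should locate the rows.
cur.execute("SELECT name FROM authors WHERE born > 1800")
print(cur.fetchall())   # prints [('Woolf',)]
conn.close()
```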
HTML, or the HyperText Markup Language, describes the formatting of webpages. Many users are familiar with the basics of HTML, especially if they have designed web sites. Strictly speaking, HTML is not a language in the proper sense. It does not support control structures, looping, or conditional expressions. It is simply an encoding for web pages. However, as mentioned, languages with these structures, such as Java and Javascript, can be incorporated into web pages to enhance their functionality.
The decade also saw the rise of the functional programming paradigm. In addition to LISP and Scheme, which were introduced earlier, newer functional languages include ML (1978-1988) and Haskell (1989-1998). Haskell, in particular, although not currently in wide usage, has interesting features valued by programmers and is poised to become more popular in the future (Louden & Lambert, 2011).
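Although ML and Haskell have their own syntax, the flavor of the functional style can be suggested in Python, which supports functional constructs. The sketch below is an illustration of the paradigm only; it is not ML or Haskell code.

```python
# Functional style: computations are expressed by composing functions,
# rather than by updating variables step by step.
squares = list(map(lambda n: n * n, range(5)))        # apply a function to each element
evens = list(filter(lambda n: n % 2 == 0, squares))   # keep only the even squares
print(squares, evens)   # prints [0, 1, 4, 9, 16] [0, 4, 16]
```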
2000s – Present
A major aspiration among computer scientists in the 1960s was the design and development of a single universal programming language to meet all needs. This goal has not been realized, and it is arguably no longer desirable. In the late 1970s and early 1980s, computer scientists worked towards specification languages that would, at least in theory, allow users to define their computational needs and then generate the code to execute these tasks. The logic programming languages, such as Prolog, were thought to assist in this endeavor. However, it has become clear that programming has not become obsolete. It is also clear that new languages are arising, and will continue to arise, to support new and emerging technologies. One such development is a class of multiparadigm languages that support several programming paradigms, including the procedural, functional, and object-oriented paradigms. A recent example is the C# language, developed in 2000 by Microsoft. Used as the main language for Microsoft's Xbox gaming system, C# supports all three main paradigms. Another Microsoft initiative, F#, also supports all three paradigms, but it is primarily a functional language. Other notable examples of newer languages include Scala, Forth, R, Go, Erlang, D, Ruby, and Swift. The R language is particularly important for the digital humanities. R, which first appeared in 1993, is a free programming language and software environment for statistical computing and graphics. Holding position #11 on the Tiobe Index (December 2021), R has generated a large community of digital humanities users in areas as diverse as digital history, statistical methods for analyzing literature, and text mining, as well as in data science and visual analytics for the digital humanities (see Research Guides).
Julia, introduced in 2012, is another example of a new multiparadigm language. Like R and Python, it is free and open source, with a liberal license. It is a general-purpose language with a minimalistic core that is increasingly used for scientific applications, numerical analysis, and mathematical programming. Julia can call Python functions, and C/Fortran functions can be called directly with no wrappers or special application programming interfaces (APIs). Julia features integrated linear algebra, random number generation, and string processing. It also supports UTF-8 (Unicode Transformation Format, 8-bit), a variable-width character encoding for electronic communication, which allows mathematical notation to appear directly in source code and facilitates the translation of mathematical algorithms into high-efficiency code. Although it has the feel of an interpreted language, Julia employs “just-in-time” (JIT) compilation to generate high-efficiency code. It also has built-in support for different types of parallel computation and has gained recognition in the high-performance computing community.
The Development of Programs
The programming process is generally consistent across the range of programming languages and paradigms and employs a correspondingly consistent terminology.
Coding refers to the process of translating algorithms, very high-level “sketches” or “drafts” of working programs (sometimes called pseudocode), and designs into working source code. In general, care taken during the design stage translates into easier coding. However, as programs are written by human beings, they are susceptible to errors, which must subsequently be fixed. This process of correcting program errors is known as debugging. There are three main types of errors. Syntax errors result from code whose statements violate the grammar of the language. Syntax errors are generally reported by the compiler during compilation, or by the interpreter for interpreted languages. Programs do not usually pass beyond the compilation stage if the source code contains syntax errors. Runtime errors are usually the result of illegal operations, such as dividing an integer by zero. Such errors usually result in a program aborting, or abending (abnormally ending). Source code that results in runtime errors typically does not violate the syntax or grammar of the language, and therefore does not cause compiler errors. As the name indicates, these errors occur when the program is run, that is, at runtime. Logic errors are errors in the algorithm itself, and, since they are neither grammatical or syntax errors nor the result of an illegal operation, they do not generally cause the program to abend (unless the logic error subsequently causes a runtime error). However, with logic errors, the results of the program will be incorrect or unreliable.
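A short, hypothetical Python sketch can make the three error types concrete; the statements that would fail are shown as comments so that the example runs as given.

```python
# 1. Syntax error: the statement violates the grammar and is rejected
#    before the program runs.
#    print("total is" total)          # missing comma

# 2. Runtime error: grammatically valid, but an illegal operation at run time.
count = 0
# average = 100 / count               # raises ZeroDivisionError when executed

# 3. Logic error: the program runs to completion but computes the wrong answer.
values = [76, 28]
average = sum(values) / (len(values) + 1)   # should divide by len(values)
print(average)   # prints about 34.67, not the intended 52.0
```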