C0 | Documentation
About Downloads Tutorial References Courses
C0 Tutorial C0 and other languages Other references

Strings

Strings, used for input and output, are sequences of characters. To mention strings in program they must be enclosed in double quotes, such as: "Hello World!\n". \n is a newline character, written as a so-called escape sequence. There is an empty string, written as "" which consists of 0 characters.

In C0, strings are immutable, which means we cannot modify a string once it has been constructed. There are a few special cases: a string may not contain the NUL character \0, and a double quote character must be preceded by a backslash \ so it is not mistaken for the end-of-string. This in turn mean that a backslash itself must be preceded by a backslash, so it is not mistaken for the first character of an escape sequence.

To illustrate, we start coin with the conio and string libraries by invoking it with two -l specifications. conio provides the print function which takes a string as an argument.

% coin -l conio -l string
Coin 0.2.6 "Penny" (r2087, Fri May 27 12:10:21 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> print("You have to escape backslash (\\) and doublequote (\") but not single quote (')\n");
You have to escape backslash (\) and doublequote (") but not single quote (')
--> print("Write \\\\ for \\ and \\\" for \" inside strings\n");
Write \\ for \ and \" for " inside strings
-->

The C0 language does not provide any direct means of manipulating strings, but the conio (console input/output) and string libraries provides some basic functionality.

conio Library

The conio library is included in C0 source files with #use <conio> at the top, or with -l conio when invoking coin. It provides the following functions related to strings.

void print(string s);  /* print s to standard output (console) */
void println(string s);  /* print s to standard output, with a trailing newline */
string readline();  /* read a line from standard input (console); */
                    /* return it without the trailing newline character */

string Library

The string library has a few simple primitives for manipulating strings. In addition, it provides a general and efficient way to convert between character arrays and strings. Here are some of the salient functions; you should consult the library specification for more.

int string_length(string s);  /* number of characters in s */
string string_join(string a, string b);  /* concatenate a and b */
string string_sub(string a, int start, int end)  /* substring a[start..end) */
  //@requires 0 <= start && start <= end && end <= string_length(a);
  ;

/* lexicographic comparison; less than (-1), equal (0), greater than (1) */
int string_compare(string a, string b)
  //@ensures -1 <= \result && \result <= 1;
  ;

string string_fromint(int i);
string string_frombool(bool b);
string string_fromchar(char c)
  //@requires c != '\0';
  ;

Finally, the general conversions between strings and character arrays. In deference to the C conventions, a character array we would like to treat as a string has to have the NUL character (\0), which is not included in string itself but is seen to terminate it. This means that a character array will hold one more character than the string. This is a common source of errors in C programming, so you have to remind yourself of this.

char[] string_to_chararray(string s)  /* characters of s, adding a trailing '\0' */
  //@ensures \length(\result) == string_length(s) + 1;
  ;

string string_from_chararray(char[] A) /* characters of A up to (but not including) '\0' */
  //@ensures string_length(\result) <= \length(A) - 1;
  ;

Sample Code: string_lower

Here is some sample code that converts uppercase characters A-Z to their lowercase equivalent a-z. We start with a function to convert individual characters, using the char_ord and char_chr functions to convert between characters and their ASCII code. We exploit that characters in the range A-Z have consecutive codes, as do a-z.

#use <string>

char char_tolower(char c) {
  int ccode = char_ord(c);
  if (char_ord('A') <= ccode && ccode <= char_ord('Z')) {
    int shift = char_ord('a') - char_ord('A');
    return char_chr(ccode + shift);
  } else {
    return c;
  }
}

Now we use this in a function that takes a string and constructs a new string. It views the input string as a character array A, and constructs the output in a new character array B which is turned into a string just before we return. We state in the postcondition that the length of the output string in the same as the input. Of course, this is not a full specification, but states a useful invariant property.

string string_lower(string s)
//@ensures string_length(s) == string_length(\result);
{
  int len = string_length(s);
  char[] A = string_to_chararray(s);
  char[] B = alloc_array(char, len+1);
  for (int i = 0; i < len; i++)
    B[i] = char_tolower(A[i]);
  B[len] = '\0';  /* redundant */
  return string_from_chararray(B);
}

Note that B (and also A) are arrays of length len+1, to account for the trailing '\0'. Because '\0' is the default value of type char, the array B will be initialized with all NUL characters in all entries. This makes the last assignment redundant, strictly speaking. However, the code is much clearer, and the role of the extra character in the array B explicated, so it is good style to write the code as above. A slightly less perspicuous alternative would be

string string_lower(string s)
//@ensures string_length(s) == string_length(\result);
{
  int len = string_length(s);
  char[] A = string_to_chararray(s);
  char[] B = alloc_array(char, len+1);
  for (int i = 0; i < len; i++)
    B[i] = char_tolower(A[i]);
  //@assert B[len] == '\0';
  return string_from_chararray(B);
}

where the assertion reminds the reader that B[len] will have been initialized to '\0', and that this is important.