Computers can process not only numbers but also non-numerical data, known as character data. In computer terminology, a sequence of characters is called a string.
Basic terminology
Every programming language is built on a character set. A character set contains letters, digits and special characters. A finite sequence of zero or more characters from the character set is called a string.
A string is written by enclosing its characters in quotation marks.
A string with length 0 is called an empty string.
Let A and B be two strings. The string consisting of the characters of A followed by the characters of B is called the concatenation of A and B. It is often written as A//B.
Suppose A = ‘Hello’ and B = ‘World’; then A//B is ‘HelloWorld’.
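The concatenation above can be sketched in Python, where the `+` operator plays the role of the `//` notation. The helper name `concat` is hypothetical and introduced only for illustration.

```python
def concat(a: str, b: str) -> str:
    """Return the characters of a followed by the characters of b."""
    return a + b


A = "Hello"
B = "World"
result = concat(A, B)
print(result)  # HelloWorld

# The length of a concatenation is the sum of the two lengths,
# and concatenating with the empty string changes nothing.
print(len(result) == len(A) + len(B))  # True
print(concat(A, "") == A)              # True
```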
A string Q is called a substring of a string S, if there exist strings P and R such that S=P//Q//R.
Suppose S = ‘Evening’; then ‘Ev’, ‘ni’ and ‘ng’ are some of the substrings of S.
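The definition can be checked directly: for a substring Q of S, there must be strings P and R with S = P//Q//R. Below is a minimal sketch in Python; the function name `split_around` is a hypothetical helper, and it reports only the first occurrence of Q.

```python
def split_around(s: str, q: str):
    """Return (P, R) such that s == P + q + R, or None if q is not a substring of s."""
    i = s.find(q)          # index of the first occurrence, or -1
    if i == -1:
        return None
    return s[:i], s[i + len(q):]


S = "Evening"
for Q in ("Ev", "ni", "ng"):
    P, R = split_around(S, Q)
    assert P + Q + R == S  # the defining property of a substring
    print(repr(P), repr(Q), repr(R))

print(split_around(S, "xy"))  # None
```

Note that P or R may be the empty string, as happens for ‘Ev’ (P is empty) and ‘ng’ (R is empty).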
String storage
Generally, three types of structures are used to store strings.
- Fixed-length storage: In a fixed-length structure, each line of print is viewed as a record, and all records have the same length.
- Variable-length storage: A variable-length string can be stored in memory cells either by placing a marker such as $$ or \0 at the end of the string, or by listing the length of the string as an additional element in a pointer array.
- Linked storage: The string is stored in a linked list, which makes it easy to insert, delete and change words, sentences and even paragraphs in the text. A linked list is an ordered sequence of nodes in which each node contains a link holding the address of the next node in the list. Each node is assigned one character or a fixed number of characters.
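The three structures can be illustrated with a rough Python sketch. It is only a model of the ideas, not how any particular language stores strings: the record length of 8, the `\0` marker, and all names (`to_fixed_length`, `to_marked`, `from_marked`, `Node`, `to_linked`, `from_linked`, `insert_after`) are assumptions made for this example. For variable-length storage, only the end-marker variant is shown.

```python
from dataclasses import dataclass
from typing import Optional

RECORD_LENGTH = 8   # assumed record size for the fixed-length example
END_MARKER = "\0"   # assumed end-of-string marker


def to_fixed_length(s: str, length: int = RECORD_LENGTH) -> str:
    """Fixed-length storage: pad s with blanks so every record has the same length."""
    if len(s) > length:
        raise ValueError("string does not fit in one record")
    return s.ljust(length)


def to_marked(s: str) -> str:
    """Variable-length storage: append a marker that signals the end of the string."""
    return s + END_MARKER


def from_marked(cells: str) -> str:
    """Read characters up to, but not including, the end marker."""
    return cells[:cells.index(END_MARKER)]


@dataclass
class Node:
    chars: str                      # one character or a fixed number of characters
    next: Optional["Node"] = None   # link to the next node in the list


def to_linked(s: str, chunk: int = 1) -> Optional[Node]:
    """Linked storage: build a list whose nodes each hold `chunk` characters."""
    head = tail = None
    for i in range(0, len(s), chunk):
        node = Node(s[i:i + chunk])
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def from_linked(head: Optional[Node]) -> str:
    """Follow the links and collect the characters in order."""
    parts = []
    while head is not None:
        parts.append(head.chars)
        head = head.next
    return "".join(parts)


def insert_after(node: Node, chars: str) -> None:
    """Insert a new node after `node` by relinking; no other node moves."""
    node.next = Node(chars, node.next)


print(repr(to_fixed_length("Hello")))      # 'Hello   '
print(from_marked(to_marked("Evening")))   # Evening

head = to_linked("Helo")
insert_after(head.next, "l")               # insert 'l' after the node holding 'e'
print(from_linked(head))                   # Hello
```

The insertion at the end shows why linked storage suits editing: only one link is changed, whereas the other two layouts would require shifting the characters that follow.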