Appearance, Rendering,
and the Abstract Intention With the Text.
Karsten Kynde
Søren Kierkegaard Forskningscenteret ved Københavns Universitet
Key Words: Text Signals, Electronic Text, Text Encoding.
Abstract: Letters constitute words that constitute lines that constitute pages that constitute a text. When the text appears, however, letters, lines etc. are interpreted together with several other signals important to the reader. Other concrete renderings of the text might convey the same meaning. We shall term the union of such renderings the abstract intention with the text, and show that it is more comprehensive than any single rendering, including the one delivered by the author of the text. The paper addresses the issue of how to encode and store a computer representation of a text, whether it serves as the basis of a printed book or of an electronic version.
About the Søren Kierkegaard Research Centre: The Søren Kierkegaard Forskningscenter was founded in 1994. The interdisciplinary research activity is concentrated in two main groups, and is therefore best described as an ellipse with two foci. One of these consists of researchers whose work is centred in the field of Kierkegaard Studies; the other consists of philological researchers whose primary aim is the establishment of the new critical edition of Søren Kierkegaards Skrifter. The centre is housed at the Faculty of Theology of Copenhagen University, Denmark.
About the Author: Karsten Kynde was born in 1950 and graduated as cand.scient. in Computer Science from Copenhagen University in 1979. At the Søren Kierkegaard Forskningscenter he works on text encoding and the computer registration of the new critical edition.
When we as readers consider a text, we unconsciously distinguish between at least two distinct levels. Firstly and immediately, the text consists of dots, strokes, and little curved and straight lines, forming letters that constitute lines and pages. I suggest that in the following we term this point of view the rendering of the text. The rendering is concrete and dependent on the physical nature of the medium. Here are some examples of renderings of text:
A a a A
There exists, however, an intention with the rendering that is independent of the medium. The intention is abstract. It is usually evident from the text, sometimes only after a certain interpretation of the explicit text as well as of its implicit signals. If asked, any skilled reader will state that the renderings above, although some of them are shaped rather differently, all represent the Latin letter a. On further questioning, we may concede a slight qualification: on closer inspection we would expect the capital form to be reserved for special occasions, the italic for others, and so on. Such interpretation may be of a somewhat subjective nature. I suggest defining text appearance as the abstract intention with the rendering of the text. This is, furthermore, what we want preserved whatever rendering the text is given, whether handwritten, printed or computerised(1).
So, we could say that whereas the rendering is the input to the text, i.e. what the printer or the writer puts into the text, the appearance is the output from the text to the reader. The strange thing is that the output in some respects is seemingly more comprehensive than is the input, as if the text and the context add information to the single rendered symbols conveying the content to the reader in a (usually) convincing and unmistakable way.
These considerations become relevant when publishing a text, say the writings of Søren Kierkegaard, the text to which these ideas will be applied and which will serve as our example throughout this presentation. Such writings could be, and often are, published as a facsimile of earlier prints. If so, many problems vanish; it is, furthermore, the only satisfactory solution to some problems(2). Having set aside this uncompromising solution, however, we find ourselves forced to define and register the abstract and subjective intention mentioned above in a cruelly objective and concrete manner. In order to write, i.e. to determine what goes into the text, we must know what the reader is supposed to get out of the text. For reasons which should become clear in the sequel, a computerised version of the text in particular will benefit from such clarification.
Throughout this presentation we shall use two examples of a base text, viz. B1: the first printing of a published Kierkegaard book or writing, and B2: a handwritten Kierkegaard journal or writing which was not published in print during Kierkegaard's lifetime. These examples are obviously not arbitrarily chosen; a thorough discussion is given elsewhere and shall not be repeated here(3).
Taking a character, i.e. a letter, a digit or a punctuation mark, to be the elementary unit of the text, the appearance of the base text can seemingly be registered as a sequence of characters. Take again the letter a. The base text shows a given distribution of ink on a piece of paper. The reader interprets it as the first letter of the Latin alphabet, and we may encode or represent it as such: a number from 1 through 29 (the Danish alphabet having 29 letters). Soon we would find further encoding necessary: the letter may be a or A, both recognised as the first letter of the alphabet. In B1 we may distinguish between typefaces, Blackletter or Roman(4), and further between type styles: ordinary, bold or spaced out. We shall not for the present elaborate further on the rendering of a single a spaced out, but promise to return to this matter later!
The handwritten B2 is a simpler, yet more complicated example. The author with his pen has no access to a wide variety of typefaces or the like, although he may still use different character styles such as underlining. On the other hand, the lack of uniformity in handwriting calls for increased interpretation. We can easily imagine a word in B2 where the unbiased reader would read Regina, Latin for "Queen", but any learned Kierkegaardian would read Regine, the name of the woman to whom Kierkegaard was briefly engaged and to whom he remained devoted all his life. Furthermore, the latter would claim his interpretation to be the correct one! Such a statement brings us to the brink of a discussion which we shall not pursue; we merely note that we have reached two partial conclusions leading us on to the next issue:
1.1 A character cannot adequately be encoded by its place in the alphabet alone; it possesses in addition qualities such as capitalisation, typeface, type style and probably many more properties.
1.2 Even a single letter is, in the act of reading, an object of interpretation.
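Conclusion 1.1 can be made concrete with a small sketch in Python. The record below is hypothetical (the names `EncodedChar` and `encode` are ours, not part of any Kierkegaard tool): a character is stored as its position in the 29-letter Danish alphabet together with further qualities, so that two a's in different typefaces share a position but are still distinguishable. The interpretation step of 1.2 is assumed to have happened before `encode` is called.

```python
from dataclasses import dataclass

# The Danish alphabet has 29 letters: a-z followed by æ, ø and å.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


@dataclass(frozen=True)
class EncodedChar:
    """Hypothetical record: alphabet position plus further qualities (cf. 1.1)."""
    position: int                  # 1 through 29
    capital: bool = False
    typeface: str = "blackletter"  # or "roman"
    style: str = "ordinary"        # "ordinary", "bold" or "spaced"

    def letter(self) -> str:
        """Recover the plain letter, discarding typeface and style."""
        base = DANISH_ALPHABET[self.position - 1]
        return base.upper() if self.capital else base


def encode(ch: str, typeface: str = "blackletter", style: str = "ordinary") -> EncodedChar:
    """Encode one already-interpreted letter; raises ValueError outside the alphabet."""
    position = DANISH_ALPHABET.index(ch.lower()) + 1
    return EncodedChar(position, ch.isupper(), typeface, style)
```

For instance, `encode("a")` and `encode("a", typeface="roman")` both have position 1 but compare unequal, which is exactly the extra information 1.1 asks us to keep.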
In the text the letters are collected forming words. But words are not merely sequences of letters. If we observe, say, misspelling or mistyping we will interpret the letters differently in the context of the word. In the printed example of B1, this may be considered dubious and, from a philological point of view, one may argue that this should not happen. At least we should distinguish between the silent and the commented correction. Once more we refer to the discussion elsewhere(5). In B2, however, we must insist that interpretations of this kind happen constantly. Some letters simply cannot be read alone or only so under the assumption that neighbouring letters are interpreted in a certain way.
In the text the words are collected to form lines. The division of the text into lines, however, is merely a practical consequence of the fact that hardly any prose text can be exhaustively expressed in one line. Only in exceptional cases does a new line convey a meaningful signal in the text. So the rendering of the text does have properties not belonging to the abstract intention. They do, however, reflect an additional property of the text: normally a new line in the middle of a word causes that word to be divided, so that its first part is printed with a hyphen at the end of the line. This hyphen is as meaningless as the line break itself. Now, words containing a hyphen do exist, and are especially frequent in 19th-century Danish literature, e.g. Forfatter-Virksomhed (authorship). Ordinary typography does not distinguish between hyphens used in compound words and hyphens used for the occasional division of syllables, nor is the hyphen doubled when a word is divided at its hyphen. Consequently you cannot immediately tell whether a hyphen at the end of a line is meaningful, the principal example being
Roman-
og Novelle-Litteratur,(6)
(novel and short-story literature), where the line breaks after Roman- and does not designate a word *Romanog. Of course, if the word is hyphenated according to normal orthography, the distinction can be made on closer inspection. Coming to our standard examples, however, we should consider that no such thing as an official Danish orthography existed until the late 19th century, i.e. after these texts were written(7), which introduces the rather volatile concept of "orthographic variance". In B1 we must be prepared to find the above Forfatter-Virksomhed spelled Forfattervirksomhed(8). If we want to assign a spelling to the word independently of the line break, we must follow Kierkegaard's usual spelling, if we can find at least one other instance, or else use the contemporary standard usage(9). Hence we must further conclude:
3.1 The rendering of the text does contain signals not belonging to the abstract intention of that text.
3.2 The rendering of the text does contain cases where the abstract intention cannot be told unambiguously. In such cases interpretation must be stretched into an artificial reconstruction of the intention.
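The procedure for a hyphen at the end of a line can be sketched as a decision function. This is a hypothetical illustration, not the Centre's actual procedure: `corpus_words` stands in for Kierkegaard's other usage and `dictionary` for a contemporary standard such as Molbech, and the majority vote is our own assumed rule. When neither source decides, the function returns `None`, leaving the case to the editor as 3.2 demands.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def resolve_line_end_hyphen(first: str, second: str,
                            corpus_words: Iterable[str],
                            dictionary: Iterable[str] = ()) -> Optional[str]:
    """Decide whether 'first-' + line break + 'second' is a hyphenated compound
    or a word divided only by the line break.

    Hypothetical rule: prefer the author's majority spelling elsewhere in the
    corpus; otherwise accept a form found in a contemporary dictionary;
    otherwise return None (an ambiguity for the editor, cf. 3.2).
    """
    stem = first[:-1] if first.endswith("-") else first
    # Compare case-insensitively, since capitalisation of compounds varied.
    hyphenated = f"{stem}-{second}".casefold()
    solid = f"{stem}{second}".casefold()

    counts: Counter = Counter()
    spelling: Dict[str, str] = {}
    for word in corpus_words:
        key = word.casefold()
        counts[key] += 1
        spelling.setdefault(key, word)  # remember the first attested spelling

    if counts[hyphenated] != counts[solid]:
        winner = hyphenated if counts[hyphenated] > counts[solid] else solid
        return spelling[winner]

    known = {w.casefold(): w for w in dictionary}
    matches = [known[k] for k in (hyphenated, solid) if k in known]
    return matches[0] if len(matches) == 1 else None
```

For Roman- / og neither joined form is attested, so the sketch returns `None`; recognising the suspended compound in Roman- og Novelle-Litteratur still needs the closer inspection described above.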
Readers who approach the text from a literary point of view may have expected a section on sentences as the natural next item after characters and words. This would indeed have introduced another interesting problem, as the sentence is yet another example of a text signal that appears only implicitly from the text(10). In that respect the present paper can be said to take a typographical point of view.
Returning to the problem of the spaced-out a: this is obviously another case confirming the last conclusion. When we read the text i s a m m e Ø i e b l i k (11) (at the same moment), with samme and Øieblik spaced out, is i then spaced out as well?
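Why the single i cannot be decided from the rendering alone can be shown with a small sketch. The inputs are hypothetical: `gaps` stands for a measured mean gap between the letters of each word (in units of the normal letter gap) and the threshold of 1.5 is an arbitrary assumption. A one-letter word has no inner gap, so its flag is `None`; the neighbour rule resolves it only when both neighbours agree.

```python
from typing import List, Optional

Flag = Optional[bool]  # True = spaced out, False = not, None = cannot tell


def spacing_flags(words: List[str], gaps: List[Optional[float]],
                  threshold: float = 1.5) -> List[Flag]:
    """Decide per word whether it is spaced out.

    gaps[i] is a hypothetical measured mean gap between the letters of
    words[i]. A one-letter word has no inner gap, so the rendering alone
    cannot tell, and its flag is None.
    """
    return [None if len(w) < 2 or g is None else g > threshold
            for w, g in zip(words, gaps)]


def fill_from_neighbours(flags: List[Flag]) -> List[Flag]:
    """Resolve an undecidable flag only when both neighbours agree; else keep None."""
    resolved = list(flags)
    for i in range(1, len(flags) - 1):
        left, right = flags[i - 1], flags[i + 1]
        if flags[i] is None and left is not None and left == right:
            resolved[i] = left
    return resolved
```

If the word before i is printed normally and samme is spaced out, the neighbours disagree and i stays `None`: the rendering leaves the question open, and an editor must choose.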
We may also very well find that centring a text on the line carries a meaningful signal to the reader. So does indenting it. If, however, a text is indented half an inch and ends exactly half an inch from the right margin, is it then indented or centred?
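The indentation question can be phrased as a classification that sometimes has to answer "both". In the sketch below the margins are hypothetical measurements in inches, `indent` is an assumed usual indentation width for the text at hand, and `tol` an assumed measuring tolerance; none of these come from the Kierkegaard edition.

```python
def classify_line(left: float, right: float,
                  indent: float = 0.5, tol: float = 0.02) -> str:
    """Classify a short line from its distances (in inches) to the left and
    right margins of the type area.

    Hypothetical rule: 'indent' is the usual indentation width in this text;
    a line is a centring candidate when both distances are equal and non-zero.
    When both readings fit, the rendering alone cannot decide.
    """
    looks_indented = abs(left - indent) <= tol
    looks_centred = left > tol and abs(left - right) <= tol
    if looks_indented and looks_centred:
        return "ambiguous"
    if looks_centred:
        return "centred"
    if looks_indented:
        return "indented"
    return "flush" if left <= tol else "other"
```

A line with half an inch on both sides comes out as "ambiguous"; resolving it means comparing with similar text types, not measuring more precisely.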
Usually such ambiguities are easily resolved by looking at other, preceding signals of the text. Comparing spellings has already been mentioned; indentations should be compared with those of similar text types. Only on rare occasions are we forced to give up and guess or choose freely. We then also realise that ambiguous signals usually have little importance. Of course ambiguity may be cultivated as a style, but usually either author or typesetter will try to avoid it by various means. To separate deliberate new lines from arbitrary ones, a short indentation is frequently used, perhaps combined with slightly increased vertical spacing. These arrangements are part of the typographical style, which is time-bound and typically not a matter the author may decide alone.
It is now fair to ask: to whom is this relevant? The answer is that it is relevant when reproducing a text in an alternative rendering which, we claim, has the same intention. That intention must be abstract: one cannot say that the original text was unintentionally published in octavo and in blackletter, yet that concrete intention was bound to contemporary customs, style and the like, of which the modern reader declares himself independent. This follows naturally from setting aside a facsimile print.
Summing up our four concluding remarks 1.1 through 3.2, we could say that behind the given basic text there exists an intention containing a larger information set, induced from the basic text by interpretation, guesses and arbitrary choices. This is equally true when re-creating the text in a different rendering: it will again contain less information than the intention behind it, only it will typically be other pieces of information that disappear than was the case for the basic text. Words will be hyphenated differently. Considering spacing-out old-fashioned, we may want to express the same emphasis with italics. In that case we may find it necessary to interpret single characters, such as the i above. The interpretation now goes into the rendering of the new text, whose information set has thereby increased(!). A line that was clearly indented in the basic text may, in a new edition with another line length, suddenly look centred.
Furthermore, any rendering has a certain, finite quality. Depending on the anticipated use of the text, we may consider certain signals irrelevant and not reproduce them at all. Such a rendering can be said to be adequate for some but hardly all purposes. A computer screen is not a page in a book, and its quality is very different. Here, as always, quality will prove to be limited by the resources at hand. Economic common sense is fine, and low-cost quality is not necessarily fatal; it merely limits the application.
However dubious, we may consider it a necessary evil to reconstruct the intention from the basic text. It is unjustifiable, though, and moreover completely unnecessary, to reconstruct it from a new edition. Whenever in doubt, we obviously must return to the basic text when available. But for yet another edition it seems most feasible to reuse the interpretation we made last time.
It appears ideal to register the intention of the text and then, as a second step, to choose a rendering of a quality fitted to the actual needs and resources. This is also what usually happens: the registration takes place partly in the mind of the editor, partly in explicit text-critical remarks where necessary. On a computer we are fortunate enough to be able to separate the registration from the rendering of the text. We should therefore register the abstract intention of the text and render it concretely in variations according to the medium.
This phenomenon may be exemplified by ordinary modern word processing on a computer. The intention of the author is registered through keystrokes and stored in a file. From there the intention is rendered as characters on a screen. Certain keystrokes will not be shown explicitly on the screen but will be registered in the file, e.g. the justification of the margin; it appears only implicitly, through the upper limit on the length of each displayed line. The screen will treat certain signals differently depending on its quality. A colour screen may render a word in red; on a black-and-white printer the same word may be rendered in italics. We may consider this an imperfection. We may point to the red text on the screen and claim that it is "really" italics. We may use a graphical screen to show italics or a colour printer to print red. But this should not veil the fact that in every case we face a rendering, of finite quality, of one and the same intention.
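The separation of registration and rendering can be sketched in a few lines. The mark name `"emph"` and the three renderer functions are hypothetical illustrations: one registered intention, the spaced-out i s a m m e Ø i e b l i k with i registered as unmarked (itself an interpretive choice), is rendered for three media. Only the renderers know about spacing, italics or red; the registration does not.

```python
from html import escape
from typing import List, Tuple

# One registered intention: (mark, text) spans; "emph" is a hypothetical
# abstract mark for emphasis, "" means unmarked text.
Span = Tuple[str, str]


def render_spaced(spans: List[Span]) -> str:
    """19th-century print: emphasis by spacing the letters out."""
    return "".join(" ".join(t) if m == "emph" else t for m, t in spans)


def render_html(spans: List[Span]) -> str:
    """Graphical screen or modern print: emphasis as italics."""
    return "".join(f"<i>{escape(t)}</i>" if m == "emph" else escape(t)
                   for m, t in spans)


def render_ansi_red(spans: List[Span]) -> str:
    """Colour terminal: emphasis in red via ANSI escape codes."""
    return "".join(f"\x1b[31m{t}\x1b[0m" if m == "emph" else t for m, t in spans)


intention = [("", "i "), ("emph", "samme"), ("", " "), ("emph", "Øieblik")]
```

`render_spaced(intention)` reproduces the printed i s a m m e Ø i e b l i k, while `render_html(intention)` yields italics; neither rendering is "the" text, both are finite-quality renderings of the same registration.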
The appearance is stored in the file, encoded in what we term the format of the file. This is the meta-language for the intention, and it limits the information we may store. One popular word processor is Word Perfect(12). This system stores its documents in a proprietary file format, which is not intended for public inspection. The interface to the user (author, writer) is the computer screen, Word Perfect being what is termed a WYSIWYG word processor, an acronym for What You See Is What You Get. The idea is that the author, during the act of writing, wants to see the final result before him, i.e. the rendering on paper, rather than the meta-language of the file format. Thus the paper becomes the authoritative medium. This is consistent with the way the format is composed, as we may confirm using the Word Perfect function "Reveal Codes", which nevertheless seems to be a necessary picklock for the user wanting to see what is really written. Here we meet codes like <italics>, <hard carriage return> and <soft carriage return>, i.e. codes dealing with the rendering on paper, yet explicitly expressed.
Turning to another kind of text processing, such as LaTeX(13) or SGML(14), we observe a quite different approach. Here the marked-up text file itself is the ultimate interface to the human writer. Although WYSIWYG interfaces do exist for these systems, using them is regarded as cheating, as it tends to focus attention on a concrete rendering of the text and away from the intentions behind it. This is reflected in the markup tags of the format. We deal with terms like highlighting 1, which may be implemented as italics in one possible rendering and as bold type in another. SGML is not even a format but rather a template for a wide variety of formats. It allows us to go even deeper into the matter of the text than we have dared so far, as can be seen from projects transcribing primary textual sources, such as Chaucer texts(15) or Ludwig Wittgenstein's papers(16).
To conclude so far, we face the task of choosing a registration format that can adequately encode any relevant appearance of any relevant character (1.1). In Kierkegaard's case this implies a full Greek as well as a Hebrew character set. We must do all the necessary interpretation from the chosen base text (1.2) and resolve ambiguities to a chosen level (3.2). For example, if we want to view the text as literature and do syntactical analysis on sentences, we must resolve ambiguities connected with sentence separation; otherwise we may neglect that point. We should avoid registering arbitrary choices (3.1).
Since the format depends on the base text in question, we cannot expect to find a standard representation format fitted to our needs. In the Kierkegaard case we have therefore chosen to tailor a convention, termed the Kierkegaard Normal Format, for our purpose. In addition to the demands mentioned above, the format meets several practical and technical demands, such as
7.1 It should be easy to convert the text to any other format for word processing, typesetting, information retrieval or any other feasible purpose.
7.2 It should be possible to process the text by means traditionally available on most computers, irrespective of country, operating systems, etc.
7.3 It should be flexible towards new, as yet unknown information.
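Demand 7.2 essentially asks for plain 7-bit text. A minimal sketch of that idea, assuming a tag table limited to the curly-bracket letter tags that appear in this paper's examples ({O/} for Ø, {o/} for ø, {ae} for æ): national letters are replaced by tags, and anything else outside 7-bit ASCII is rejected rather than silently passed on. The function names are hypothetical, and ASCII is used here only as an approximation of the ISO reference set.

```python
# Unicode letter -> tag; entries from this paper's examples, anything
# further would be an assumption.
LETTER_TAGS = {"Ø": "{O/}", "ø": "{o/}", "æ": "{ae}"}


def to_7bit(text: str) -> str:
    """Replace tagged national letters; raise if anything outside 7-bit ASCII remains."""
    encoded = "".join(LETTER_TAGS.get(ch, ch) for ch in text)
    untagged = sorted({ch for ch in encoded if ord(ch) > 127})
    if untagged:
        raise ValueError(f"no tag defined for: {''.join(untagged)}")
    return encoded


def from_7bit(encoded: str) -> str:
    """Inverse of to_7bit for the letters in LETTER_TAGS."""
    for ch, tag in LETTER_TAGS.items():
        encoded = encoded.replace(tag, ch)
    return encoded
```

A literal curly bracket in the text itself would need an escape of its own, which this sketch does not cover.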
The result is an SGML-inspired format of text files using only the standard ISO IRF(17) character encoding. Thus all Greek and even the Danish national letters are explicitly encoded as tags enclosed in curly brackets, e.g.
I slige {O/}ieblikke indeslutter han sig da taus og
hemmelighedsfuld i sit {%aduton,}(18)
(In such moments he shuts himself up, silent and secretive, in his adyton.) Here {O/} stands for the Danish letter Ø, and {%...} encloses a transliteration of Greek letters. In the following example
{!|<III>}
{><}{#g}{~}Forord.{~\}{#\}
{!/!/}
{#i}{#k}F{#\}{#\}orord bryder ingen Tr{ae}tte, pleier man at sige;

{|} designates a new page. The Roman numeral following it refers to the page of the 1st printing; the angle brackets show that the page number is not printed on the page but extrapolated from adjacent pages. The exclamation mark signals that the page break is meaningful and should be reproduced. {/} similarly designates a line break; doubled, it means an entirely blank line. The heading Forord (Preface) is centred {><} and set in Gothic letters {#g}, which are furthermore spaced out {~}. A backslash marks the end of a typeface. {#i} and {#k} designate an initial letter and a contour letter, respectively. As may be seen, we have chosen to mirror the original setting rather closely. We have not thereby presupposed any specific rendering by a modern typesetter, merely noted that the basic text contained typographical signals at this point. On the other hand, we do not confine ourselves to noting that this is in fact the heading of a new section, leaving the details to the publisher. That would mean a harmonisation of the text, calling for a uniformity that corresponds badly to the text left by the author.
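The page- and line-break tags just described can be read by a small parser. This is a sketch under assumptions: the grammar (an optional !, then | for a page or / for a line, then an optional page number, in angle brackets when extrapolated) is inferred from the two examples above, not taken from a specification of the Kierkegaard Normal Format, and `Break` and `parse_breaks` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One break inside a tag body: optional '!', then '|' (page) or '/' (line),
# optionally followed by a page number, in angle brackets if extrapolated.
BREAK = re.compile(r"(!?)([|/])(?:(<?)([IVXLCDM0-9]+)>?)?")


@dataclass
class Break:
    kind: str                   # "page" or "line"
    meaningful: bool            # '!' prefix: the break should be reproduced
    page: Optional[str] = None  # page number of the 1st print, if given
    extrapolated: bool = False  # number in <...>: not printed on the page itself


def parse_breaks(tag_body: str) -> List[Break]:
    """Parse a break tag body such as '!|<III>' or '!/!/' (grammar inferred)."""
    breaks: List[Break] = []
    pos = 0
    while pos < len(tag_body):
        m = BREAK.match(tag_body, pos)
        if m is None:
            raise ValueError(f"not a break tag: {tag_body!r}")
        bang, symbol, angle, page = m.groups()
        breaks.append(Break("page" if symbol == "|" else "line",
                            bang == "!", page, angle == "<"))
        pos = m.end()
    return breaks
```

Parsing `!/!/` yields two meaningful line breaks, i.e. a blank line that should be reproduced, while a bare `/` is an ordinary break a new rendering may discard.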
From the opposite end of the complexity scale, we pick the following sample from Kierkegaard's diaries:
hvor vi gaae ligesaa n{o/}gne ud af vor Selvbetragtning{@mn357b}, som fordum af Modersliv. {_}
{!/}{>]}d. 8. Feb: 39.{@D18390208}{!/}(19)

(where we leave our introspection* as naked as we left the womb) Here {@...} designates a reference. The additional code D says that the reference is a date, followed by the date in normalised form. This is not meant for printing but may prove useful for certain analyses of the text. mn refers to a note that has been written in the margin of the diary. The note itself is encoded enclosed in {@@...}, thus:
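The reference tags lend themselves to simple extraction, e.g. for the date analyses mentioned above. In this sketch the reading of the convention is assumed: `D` followed by eight digits is taken as a normalised date YYYYMMDD, and any other `{@...}` body (such as `mn357b`) as a note identifier; the margin-note body tags `{@@...}` are deliberately excluded.

```python
import re
from datetime import date
from typing import List, Union

# {@...} reference tags, excluding the margin-note body tags {@@...}.
REF = re.compile(r"\{@([^@{}][^{}]*)\}")


def parse_reference(body: str) -> Union[date, str]:
    """Interpret a reference body.

    Assumed reading of the convention: 'D' followed by YYYYMMDD is a
    normalised date; anything else (e.g. 'mn357b') is the id of a note.
    """
    if re.fullmatch(r"D\d{8}", body):
        return date(int(body[1:5]), int(body[5:7]), int(body[7:9]))
    return body


def references(text: str) -> List[Union[date, str]]:
    """All references in a passage, in reading order."""
    return [parse_reference(m.group(1)) for m in REF.finditer(text)]
```

Applied to the diary sample, this returns the note id `mn357b` followed by the date 8 February 1839, which the printed d. 8. Feb: 39. only implies.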
{@@mn357b} men det er ogsaa n{o/}dvendigt for at Gud kan skabe {X}{<f}Noget {x-}ad{x\}af{X\} os; (...) {@@\}
(but this is indeed necessary if God is to create something out of us; (...)) The tags {X}...{X\} enclose an altered passage in the manuscript. It appears that the author was about to write Noget ad, which is a wrong expression; he then immediately deleted the last word, here enclosed in {x-}...{x\}, and corrected it to af. To complete the picture, {_} designates a dash, as opposed to a hyphen, {>]} marks a right-justified line, and {<f} is the editor's comment marking an immediately corrected text.
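From such markup a plain reading text can be derived mechanically. The sketch below rests on stated assumptions: tag bodies contain no nested curly brackets, `{x-}...{x\}` marks deleted text (as the Noget ad example suggests), and only the letter tags shown in this paper are decoded, while every other tag is simply dropped. The helper names are hypothetical.

```python
import re
from typing import Dict, List

LETTER_TAGS = {"O/": "Ø", "o/": "ø", "ae": "æ"}            # tags seen in this paper
NOTE = re.compile(r"\{@@([^{}\\]+)\}(.*?)\{@@\\\}", re.S)  # {@@id} ... {@@\}
DELETION = re.compile(r"\{x-\}(.*?)\{x\\\}", re.S)         # {x-} ... {x\}
ANY_TAG = re.compile(r"\{[^{}]*\}")


def margin_notes(text: str) -> Dict[str, str]:
    """Map each margin-note id to the raw encoded body of the note."""
    return {m.group(1): m.group(2).strip() for m in NOTE.finditer(text)}


def deletions(fragment: str) -> List[str]:
    """Words the author deleted, e.g. the 'ad' in 'Noget ad'."""
    return DELETION.findall(fragment)


def reading_text(fragment: str) -> str:
    """Final reading: drop deleted text, decode letter tags, drop all other tags.

    Dropping every other tag (alteration brackets, editor's comments,
    Greek passages, ...) is a simplification made for this sketch.
    """
    without_deleted = DELETION.sub("", fragment)
    return ANY_TAG.sub(lambda m: LETTER_TAGS.get(m.group(0)[1:-1], ""), without_deleted)
```

Because the deletion is kept in the registration, a critical rendering can still show Noget ad struck through, while a reading edition uses `reading_text` and prints Noget af.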
Admittedly this format looks somewhat encrypted, but this is the price of making explicit the signals that are otherwise "invisible" yet nevertheless present.
Notes
1. In his book Concise Survey of Computer Methods, Studentlitteratur, Lund 1974, Peter Naur writes: "Those of us who have been able to read since early childhood may tend to find the relation of data to what they represent trivial." As a reminder that it is not, Naur quotes a passage from Maxim Gorki's autobiography, telling about the fisherman Isot and his reaction on picking up the art of reading:
"'Explain to me, brother, how does the thing happen, when all is said? You look at little lines, these shape themselves into words, and I know them, living words, our own words! How do I know them? No one whispers them to me (...) it seems as if the very thoughts are printed, how is that? (...)'
How could I answer this sort of question? My 'I don't know' would discourage him.
'Magic!' he said with a sigh, and held the pages of the book toward the light."
2. A facsimile may occasionally be an indispensable supplement for the understanding of the text. For the overall "normal" reading of the text, however, the typographical habits of 19th-century printing and, in particular, handwriting are considered too high a hermeneutic threshold for modern readers. In addition, an opportunity is desired to supply text-critical and editorial comments. Reasons for critical editions can be found elsewhere and are considered outside the scope of this paper.
3. Johnny Kondrup: Hvilken grundtekst skal vi vælge and Forslag vedr. Samlede Værker (internal notes), 29.12.93, with further reference to Erik Dal: Udgiverbemærkninger til H.C.Andersens Eventyr bd.I, 1963, or Herbert Kraft: Editionsphilologie.
4. In accordance with the fashion of the time, Kierkegaard's books were set in blackletter, or German, type. Latin words and phrases, however, were consistently set in Roman type.
5. Johnny Kondrup: "Tekstkritisk Regulativ" in Kierkegaard Studies. Yearbook, Verlag Walter de Gruyter 1996.
6. Example from Søren Kierkegaard: Af en endnu Levendes Papirer, København 1838 p.7.
7. Svend Grundtvig: Dansk Haandordbog, 1872 was the first official Danish orthographical dictionary.
8. E.g. in "Derved bliver det!" in the newspaper Fædrelandet No. 304, 1854.12.30, probably following the newspaper's spelling standard.
9. In his journals (Noget om min Interpunktion, Pap. VIII.1 A 33, 1847) Kierkegaard actually declares a contemporary dictionary an authority, viz. Chr. Molbech: Dansk Ordbog, Gyldendal, København 1833. Compared with a modern writer, however, Kierkegaard by no means follows it consistently.
10. The first electronic version of Kierkegaard's texts is due to Alastair McKinnon and contains explicit characters delimiting sentences. This is the basis for McKinnon's work on the lexical structure of the texts, e.g. Mapping the Dimensions of a Literary Corpus (Literary and Linguistic Computing, Vol. 4, No. 2), Oxford University Press 1989.
11. Example from Søren Kierkegaard: "En Leiligheds-Tale" in Opbyggelige Taler i forskjellig Aand, 1847, p.93.
12. Word Perfect is a registered trademark of Novell Inc.
13. Leslie Lamport: LaTeX: A Document Preparation System: User's Guide and Reference Manual, Addison-Wesley, Reading, Mass. 1994.
14. Charles F. Goldfarb: The SGML Handbook, Clarendon Press, Oxford 1990.
15. Peter Robinson: The Transcription of Primary Textual Sources Using SGML (Office for Humanities Communication Publications Number 6), Oxford 1994.
16. The MECS variant MECS-WITT does go beyond the limits of SGML. Claus Huitfeldt: Computerizing Wittgenstein. The Wittgenstein Archives at the University of Bergen, in Wittgenstein and Norway (ed. Johannesen, Larsen, Åmås), Oslo 1994.
17. ISO (International Organization for Standardization) International Reference Format, almost identical to 7-bit ASCII.
18. Af en endnu Levendes Papirer, p. VI
19. From a notebook marked EE, 1839, Pap. II A 357.