[A paper presented at Qualico-97]
Three Laws of Fiction Prosaic Texts Organization
Yu. K. Krylov
We studied the lengths of marinons, integral units obtained by segmenting complete (whole) prose fiction texts (stories, short novels, novels). Syllables, phonetic words, syntagms, sentences, paragraphs, subchapters, chapters, parts of a text, etc. were considered as natural marinons of different mesoscopic levels of text organization. The set of programs used made it possible to analyse the statistical regularities of text organization not only in the traditional grapheme representation but also in phonetic transcription. The phonetic word was taken as the main rhythmic, sense-organizing unit and defined as follows: (1) any full (non-syntactic) word in its transcribed form, or (2) a combination of a full word with a syntactic word attached to it. In both cases there must be one and only one phonetic accent within the unit. The overall corpus of texts considered (Russian prose fiction) contains more than two million graphic running words. The research revealed three main regularities:
1) the law of the optimal relation between the numbers of elements within each submicroscopic level;
2) the law of free combination of phonetic words;
3) the law of fractal likeness of the distributions of marinon lengths at different levels of whole-text organization.
According to the first law, the optimal relation between the overall numbers of phonemes and spaces is characterized by two universal linguistic constants:
The first of them determines the correspondence between the number of spaces in a text and the number of rhythmons (the number of vowel phones together with spaces).
The second constant determines the correspondence between the number of consonant phonemes and the number of spaces (phonetic words). C1 and C2 turn out to be inversely related:
C1 * C2 = 1, which reflects the equality of the numbers of consonant phonemes and rhythmons in the ideal case.
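The relation between the two constants can be illustrated with a minimal sketch. It assumes C1 is taken as spaces per rhythmon and C2 as consonants per space, a reading chosen so that their product is the ratio of consonants to rhythmons. The function name and the toy counts are hypothetical and are not data from the paper.

```python
def rhythm_constants(n_vowels, n_consonants, n_spaces):
    """Return (C1, C2) under the assumed definitions:
    C1 = spaces / rhythmons, C2 = consonants / spaces,
    where rhythmons = vowels + spaces."""
    n_rhythmons = n_vowels + n_spaces
    if n_rhythmons == 0 or n_spaces == 0:
        raise ValueError("counts must include at least one space")
    c1 = n_spaces / n_rhythmons
    c2 = n_consonants / n_spaces
    return c1, c2


# Toy counts (hypothetical), built so that consonants == rhythmons.
c1, c2 = rhythm_constants(n_vowels=300, n_consonants=400, n_spaces=100)
product = c1 * c2  # algebraically equal to consonants / rhythmons
print(c1, c2, product)
```

With these definitions the product reduces to consonants divided by rhythmons, so it equals 1 exactly when the two counts coincide.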
The theoretical correspondences (1) were obtained by solving an optimization problem. Testing them against voluminous empirical data showed that they hold with a precision characterized by agreement to the third significant figure.
Investigation of the combinatorial features of phonetic words made it possible to establish that the probability of a word beginning with a vowel (or, conversely, with a consonant) does not depend on the kind of phoneme that ends the previous word.
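A minimal sketch of how such independence might be checked on a transcribed text. It assumes each phonetic word has already been reduced to a string of 'V' (vowel) and 'C' (consonant) symbols. The function names and the toy sequence are hypothetical, and the paper does not state which procedure was actually used.

```python
from collections import Counter


def boundary_table(words):
    """Count (last class of word i, first class of word i+1) pairs.

    Each word is a non-empty string over 'V' (vowel) and 'C' (consonant).
    """
    table = Counter()
    for prev, nxt in zip(words, words[1:]):
        table[(prev[-1], nxt[0])] += 1
    return table


def p_vowel_start(table, end_class):
    """Estimate P(next word starts with a vowel | previous word ends in end_class)."""
    total = table[(end_class, "V")] + table[(end_class, "C")]
    if total == 0:
        raise ValueError("no word pairs with this ending class")
    return table[(end_class, "V")] / total


# Toy sequence of phonetic words (hypothetical, NOT data from the paper).
toy_words = ["CV", "VC", "CV", "CVC", "VC"]
toy_table = boundary_table(toy_words)
p_after_vowel = p_vowel_start(toy_table, "V")
p_after_consonant = p_vowel_start(toy_table, "C")
print(p_after_vowel, p_after_consonant)
```

If the law of free combination holds, the two estimated probabilities should differ only by sampling noise on a large corpus.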
Together, the two laws formulated above make it possible to calculate purely theoretically (without any adjustable parameter) the elements of the transition matrix of a first-order Markov chain describing the alternation of elements at the microscopic level, and to test numerous consequences of the proposed theory. For instance, it follows immediately from the relations given above that the average number of consonants contained in one rhythmon equals 1. Taking into account that the number of syllables in any phonetic word equals the total number of rhythmons minus 1, we arrive at the Menzerath-Altmann law:
L(g) = 1+1/g
Here L is the average length of a syllable (in phonemes) and g is the length of a phonetic word in syllables.
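The formula can be tabulated with a short sketch. It checks only the arithmetic identity (g + 1)/g = 1 + 1/g suggested by the rhythmon count, namely that a phonetic word of g syllables contains g + 1 rhythmons. Reading the derivation step this way is an assumption, and the function names are hypothetical.

```python
def syllable_length(g):
    """L(g) = 1 + 1/g for a phonetic word of g syllables (g >= 1)."""
    if g < 1:
        raise ValueError("g must be at least 1")
    return 1 + 1 / g


def rhythmons_in_word(g):
    """Rhythmons in a phonetic word of g syllables: g vowels plus one space."""
    return g + 1


# L(g) decreases towards 1 as the phonetic word grows longer.
for g in range(1, 6):
    print(g, rhythmons_in_word(g), round(syllable_length(g), 3))
```

The decreasing values illustrate the Menzerath-Altmann tendency: the longer the phonetic word, the shorter its syllables on average.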
The formation of a coherent text is not bound to any one specific mesoscopic level of text organization. Semantic correlations pervade the whole text. The formation of the text as a structural whole is determined by the interdependence of all its constituent levels. This led to the assumption that some structural likeness should be observed between different mesoscopic levels. Since 1/C1 equals the average length of phonetic words measured in rhythmons, it is natural to suppose that analogous relations should also hold for the numbers of marinons at any two adjacent mesoscopic levels of whole-text organization: i.e., the relations between the numbers of syntagms and phonetic words, of sentences and syntagms, of paragraphs and sentences, etc., should also be characterized by the same universal constant.
Experimental testing of the hypothesis formulated above showed that it is indeed confirmed with a very high degree of precision. Moreover, analysis of marinon distributions for any two levels i and j made it possible to establish that the distributions correspond very closely for any levels whose ordinal numbers differ by the same amount, i - j = Constant. Thus the length distribution of syntagms expressed in rhythmons differs only by chance fluctuations from the distributions of sentences (expressed in phonetic words) and of paragraphs (expressed in syntagms). This fact is the principal content of the law of fractal likeness.
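One way to quantify how closely two such length distributions agree is the two-sample Kolmogorov-Smirnov statistic, sketched below. This choice of measure is an assumption, since the paper does not say which comparison was used. The sample values are hypothetical, not data from the corpus.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so check each of them.
    for v in sorted(set(a) | set(b)):
        fa = bisect_right(a, v) / len(a)  # share of a that is <= v
        fb = bisect_right(b, v) / len(b)  # share of b that is <= v
        d = max(d, abs(fa - fb))
    return d


# Hypothetical lengths: syntagms in rhythmons, sentences in phonetic words.
syntagm_lengths = [3, 5, 4, 6, 5, 7, 4, 5]
sentence_lengths = [4, 5, 3, 6, 5, 8, 4, 5]
print(ks_statistic(syntagm_lengths, sentence_lengths))
```

A value near 0 indicates closely matching distributions and a value near 1 indicates almost no overlap. Deciding whether a given gap is only a chance fluctuation would additionally require a significance threshold that depends on the sample sizes.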