[A paper presented at Qualico-97]
On the Nature of Koehler's Effect
J.K. Krylov
In his monograph [1], R. Koehler described a very interesting effect: the average length of words oscillates as a function of their frequency of use (according to data from a representative frequency dictionary). Applying the approach to data analysis suggested by R. Koehler (calculating the gliding average not only for the function but also for the argument), we established that oscillations of the conditional averages are also observed in other distributions.
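The averaging procedure described above, with gliding means taken over the argument as well as the function, can be sketched as follows. This is a minimal illustration, not Koehler's own implementation: the function name `gliding_conditional_means`, the window size, and breaking ties in frequency by sorting whole (x, y) pairs are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def gliding_conditional_means(
    x: Sequence[float], y: Sequence[float], window: int
) -> List[Tuple[float, float]]:
    """Hypothetical helper: sort (x, y) pairs by x, then replace every run of
    `window` consecutive pairs by (mean of x, mean of y).

    Averaging the argument x as well as the function y is the point of the
    approach: each output point sits at the mean frequency of the words it
    summarizes, not at a fixed grid position.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    pairs = sorted(zip(x, y))  # ties in x are broken by y (assumption)
    result = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        mean_x = sum(p[0] for p in chunk) / window
        mean_y = sum(p[1] for p in chunk) / window
        result.append((mean_x, mean_y))
    return result


# Toy data (not from the paper): word frequencies and word lengths.
print(gliding_conditional_means([1, 2, 3, 4], [8, 6, 7, 5], window=2))
# [(1.5, 7.0), (2.5, 6.5), (3.5, 6.0)]
```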
They occur, for instance, in the dependence of the average polysemy of words on their frequency of use. They are also observed in the regression lines of bimodal length distributions at different levels of coherent text organization. In what follows, Koehler's effect is therefore defined as the polymodal character of the dependence of the conditional average in two-dimensional distributions of any linguistic nature.
The main problem in studying this effect is to explain the mechanism that gives rise to the observed non-monotonic dependences. It is known that smoothing a time series changes its structure and can produce "low-frequency" oscillations even in a purely random series: the so-called Slutsky-Yule effect. Consequently, the oscillations contained in the empirical data must be separated from those generated by the calculation process, which can be done by using different averaging algorithms. That is why, in revealing the empirical dependences, we used not only various algorithms for computing gliding means but also smoothing with the geometric mean (averaging the logarithms of the original values), and even robust algorithms that do not rely on a fixed averaging-window scale.
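The Slutsky-Yule effect can be checked numerically. The sketch below is an illustration only: it assumes Gaussian white noise and a 5-term arithmetic gliding mean, both arbitrary choices, and the function names are hypothetical. It shows that smoothing by itself makes neighbouring values strongly correlated, which is what turns a patternless series into one that appears to oscillate slowly; for a w-term mean of independent values the theoretical lag-1 autocorrelation is (w - 1)/w. It also includes a geometric gliding mean computed by averaging logarithms, as described in the text.

```python
import math
import random
from typing import List, Sequence


def moving_mean(values: Sequence[float], window: int) -> List[float]:
    """Arithmetic gliding mean over `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def geometric_moving_mean(values: Sequence[float], window: int) -> List[float]:
    """Gliding geometric mean: average the logarithms, then exponentiate.
    All values must be positive."""
    logs = [math.log(v) for v in values]
    return [math.exp(m) for m in moving_mean(logs, window)]


def lag1_autocorrelation(values: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation coefficient of a series."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(1997)  # fixed seed so the run is reproducible
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # purely random series
smoothed = moving_mean(noise, 5)

print(round(lag1_autocorrelation(noise), 2))     # close to 0
print(round(lag1_autocorrelation(smoothed), 2))  # close to (5 - 1) / 5 = 0.8
```

Comparing the arithmetic and geometric variants on the same data is one simple way to see whether an oscillation survives a change of averaging algorithm.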
Considering the dependence of lexeme length L on frequency of use F for the vocabularies of M. Lermontov and V. Shukshin showed that, when the frequency spectrum is normalized by F1 (the frequency of the first-rank word), the position and width of the main maximum in these vocabularies coincide with the corresponding characteristics in R. Koehler's study [1]. Using frequency dictionaries of text conglomerates, we established that the amplitude of the observed oscillations decreases as the variety of the texts and the length of the fragments used to compile a frequency dictionary increase. Analysis of the trend L = L(F) for F. Dostojevski's novel "Demons", M. Bulgakov's "The Master and Margarita", V. Belov's "Customary Case", I. Turgenev's "Asya" and other whole texts showed that the trend is satisfactorily described by the logarithmic dependence L = a ln F + b. Here the coefficient b increases significantly with the length of the text considered. The latter fact indicates that the "saturation" of a text with longer words occurs not locally in the low-frequency range but more or less evenly across the whole frequency range.
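The logarithmic trend L = a ln F + b can be fitted by ordinary least squares with ln F as the regressor. The paper does not state its fitting method, so least squares is an assumption here, as are the helper names `normalize_by_top_frequency` and `fit_log_trend`. The check below uses synthetic data generated from a known trend, not data from the texts discussed.

```python
import math
from typing import List, Sequence, Tuple


def normalize_by_top_frequency(freqs: Sequence[float]) -> List[float]:
    """Divide every frequency by F1, the frequency of the most frequent word."""
    top = max(freqs)
    return [f / top for f in freqs]


def fit_log_trend(freqs: Sequence[float],
                  lengths: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of L = a * ln(F) + b; returns (a, b)."""
    xs = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(lengths) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, lengths))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic check: lengths generated exactly from L = -0.5 ln F + 7.
freqs = [1, 10, 100, 1000]
lengths = [7 - 0.5 * math.log(f) for f in freqs]
a, b = fit_log_trend(freqs, lengths)
print(round(a, 6), round(b, 6))  # -0.5 7.0
```

Normalizing by F1 leaves the slope a unchanged and only shifts the intercept: a ln(F/F1) + b' equals a ln F + b when b' = b + a ln F1. Whether b is reported for raw or normalized frequencies therefore matters when comparing texts of different length.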
Smoothing the residuals obtained by subtracting the trend from the corresponding empirical dependence revealed several stable maxima. These maxima appear when separate continuous parts of a fiction text are processed, and they correlate with the oscillations observed when the whole text is processed in the same way.
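The residual analysis described above (subtract the trend, smooth, locate maxima) can be sketched as follows. The paper does not specify the smoothing window or the criterion for a maximum, so this sketch assumes an odd-length arithmetic gliding mean and a simple neighbour comparison; all function names are hypothetical. Peak positions are reported as indices into the original series, taking the centre of each averaging window.

```python
from typing import List, Sequence


def detrended_residuals(values: Sequence[float],
                        trend: Sequence[float]) -> List[float]:
    """Residuals: empirical values minus the fitted trend, point by point."""
    return [v - t for v, t in zip(values, trend)]


def moving_mean(values: Sequence[float], window: int) -> List[float]:
    """Arithmetic gliding mean over `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices i with values[i-1] < values[i] >= values[i+1]
    (a strict rise followed by no further rise)."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def smoothed_residual_peaks(values: Sequence[float],
                            trend: Sequence[float],
                            window: int) -> List[int]:
    """Locate maxima of the smoothed residuals, mapped back to the centre
    index of each averaging window in the original series."""
    if window % 2 == 0:
        raise ValueError("use an odd window so each mean has a centre point")
    smoothed = moving_mean(detrended_residuals(values, trend), window)
    half = window // 2
    return [j + half for j in local_maxima(smoothed)]


# Toy series (not from the paper) with bumps at positions 2 and 6, flat trend.
print(smoothed_residual_peaks([0, 1, 4, 1, 0, 1, 4, 1, 0], [0] * 9, 3))
# [2, 6]
```

Repeating this on separate parts of a text and on the whole text, then comparing the peak positions, corresponds to the stability check described in the paragraph above.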
However, Koehler's effect manifested itself most clearly when we considered the behaviour of the average word length across the whole text. Using length data (measured both in graphemes and in running words), we were able not only to reveal numerous maxima but also to relate the extrema of the observed curve to the typical scales of units at different levels of the whole text (sentences, paragraphs, episodes, chapters).
All this allows us to put forward the hypothesis that the main complex of causes of Koehler's effect is connected with the hierarchy of, and interlevel interaction between, the structural elements of whole texts. At the same time, it is precisely the Slutsky-Yule effect (appearing, however, at a deep level of text generation, as a consequence of averaging the parameters of lower-level units within a larger fragment) that gives rise to interdependent oscillations in the number of elements within classes at different mesoscopic levels of text organization. This ultimately manifests itself as Koehler's effect when the corresponding distributions are studied. The main goal of further research is to relate the locations of the observed oscillations to the typical scales of units at different levels of text segmentation.
References
[1] Koehler R. Zur linguistischen Synergetik: Struktur und Dynamik der Lexik. Bochum: Brockmeyer, 1986.