eLIBRARY ID: 8377
ISSN: 2074-1588
This paper explores the potential of modern information resources such as supracorpora databases for the multidimensional study of punctuation. On the one hand, although the repertoire of punctuation marks and their graphic forms tend to coincide across natural languages, there may be zones of functional divergence, so that the rules for placing the same punctuation mark differ from one language to another. Awareness of these interlingual discrepancies is fundamentally important both for human translators and for training machine translation systems; without it, a translation may significantly distort the semantic content of its source text. Some of these differences were recorded in the pre-corpus era. More can be revealed with the aid of supracorpora databases, modern information resources created through the joint efforts of computer science, computational linguistics, and corpus-based translation studies; they help not only to verify existing knowledge against a large body of texts but also to extend it. On the other hand, punctuation has traditionally been regarded as an area of language that is fairly well studied, tightly regulated, and therefore least susceptible to change and innovation. However, supracorpora databases make it possible to identify new functional-semantic features of a given punctuation mark that are not yet described in the normative literature. Today, the development of AI-based technologies, in particular voice assistants, makes thorough research into the functional semantics of punctuation marks especially important. The paper demonstrates the opportunities that supracorpora databases provide for punctuation studies, using the example of the exclamation mark in Russian and French.
The article presents corpus-assisted research into the functional potential of the colon in the Russian-French language pair. A three-part structure was adopted for describing the data. Following this structural model, a contrastive analysis of the colon was performed to uncover and describe its functional potential in the studied language pair, which made it possible to accomplish the following tasks: 1) to calculate the frequency of the colon in the compared languages; 2) to clarify its functional potential in each of them; 3) to identify the zones of functional symmetry and asymmetry. The main information tool is a supracorpora database that stores parallel texts from the Russian National Corpus. It supports search queries that target the punctuation component of parallel texts. Corpus-based contrastive studies of punctuation are directly related to current challenges in computer science. They are essential for designing a range of modern AI-based information products, particularly next-generation voice assistants. Their findings will also contribute to improving machine translation technologies by helping to calibrate and refine the training of machine translation systems so that they account for punctuation differences between source and target languages. Moreover, these results will be of use in advancing AI-driven subtitling technologies.
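Tasks 1) and 3) above can be illustrated with a minimal sketch. It assumes the parallel texts are already available as aligned (Russian, French) sentence pairs in memory; the function name `colon_profile`, the per-1,000-words normalisation, and the three-way symmetric / Russian-only / French-only split are illustrative assumptions, not the query interface of the supracorpora database or the authors' own procedure. The sample pairs are invented for the illustration.

```python
from collections import Counter


def count_words(sentence):
    # Count only tokens containing a letter or digit, so that a free-standing
    # French " : " (written with a preceding space) is not counted as a word.
    return sum(any(ch.isalnum() for ch in tok) for tok in sentence.split())


def colon_profile(pairs):
    """Hypothetical helper: pairs is an iterable of aligned (ru, fr) sentences.

    Returns (colons per 1,000 words for each language,
             counts of sentence pairs by colon alignment).
    """
    colons, words, alignment = Counter(), Counter(), Counter()
    for ru, fr in pairs:
        ru_n, fr_n = ru.count(":"), fr.count(":")
        colons["ru"] += ru_n
        colons["fr"] += fr_n
        words["ru"] += count_words(ru)
        words["fr"] += count_words(fr)
        if ru_n and fr_n:
            alignment["symmetric"] += 1   # colon on both sides
        elif ru_n:
            alignment["ru_only"] += 1     # asymmetry: colon only in Russian
        elif fr_n:
            alignment["fr_only"] += 1     # asymmetry: colon only in French
    per_1000 = {
        lang: 1000 * colons[lang] / words[lang] if words[lang] else 0.0
        for lang in ("ru", "fr")
    }
    return per_1000, dict(alignment)


if __name__ == "__main__":
    # Invented toy data, not taken from the Russian National Corpus.
    sample = [
        ("Он сказал: приходи завтра.", "Il a dit : viens demain."),
        ("Я знаю, что он прав.", "Je le sais : il a raison."),
        ("Она ушла.", "Elle est partie."),
    ]
    print(colon_profile(sample))
```

Such counts only locate candidate zones of symmetry and asymmetry; clarifying the function of the colon in each pair (task 2) still requires the kind of linguistic annotation the article describes.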