eLIBRARY ID: 8377
ISSN: 2074-1588

Supracorpora database as a tool for studying punctuation

Received: 03/29/2024

Accepted: 06/28/2024

Published: 11/25/2024

Keywords: corpus resources; annotation; punctuation; contrastive studies; translation; asymmetry between languages; corpus-based translation studies; database

DOI Number: 10.55959/MSU-2074-1588-19-27-4-11

Available online: 25.11.2024

To cite this article

Nuriev V.A., Ignatova S.D. Supracorpora database as a tool for studying punctuation // Moscow University Bulletin. Series 19. Linguistics and Intercultural Communication. 2024. Vol. 27. Issue 4. pp. 147–158. https://doi.org/10.55959/MSU-2074-1588-19-27-4-11

Issue 4, 2024

Abstract

This paper explores the potential of modern information resources such as supracorpora databases for the multidimensional study of punctuation. On the one hand, although the repertoire of punctuation marks and their graphic representations tend to coincide across natural languages, there may be zones of functional divergence, so that the rules for placing the same punctuation mark differ from one language to another. Knowing these interlingual discrepancies is fundamentally important both for human translators and for training machine translation systems; otherwise, a translation may significantly distort the semantic content of its source text. Some of these differences were recorded in the pre-corpus era. More of them can be revealed with the aid of supracorpora databases, modern information resources created through the joint efforts of computer science, computational linguistics, and corpus-based translation studies; they help not only to verify existing knowledge against a large body of texts but also to extend it. On the other hand, punctuation has traditionally been regarded as an area of language that is fairly well studied, tightly regulated, and therefore least susceptible to change and innovation. However, supracorpora databases make it possible to identify new functional-semantic features of a given punctuation mark's use that are not yet described in the normative literature. Today, the development of technologies based on artificial intelligence, notably voice assistants, makes thorough research into the functional semantics of punctuation marks particularly important. The paper illustrates the opportunities that supracorpora databases offer for punctuation studies, using the example of the exclamation mark in Russian and French.
