Shedding light on linguistic diversity and its evolution

Linguists and computer scientists collaborate to publish a large global Open Access lexical database

2022-06-16 Max Planck Institute

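Researchers at the Max Planck Institute for Evolutionary Anthropology in Germany and the University of Auckland in New Zealand have built a new global repository of linguistic data. The project aims to facilitate new insights into the evolution of the words and sounds of the languages spoken around the world today. The Lexibank database contains standardized lexical data for more than 2000 languages. It is the most extensive publicly available collection compiled to date.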

<Related information>

Shedding light on linguistic diversity and its evolution

Lexibank: Linguists and computer scientists collaborate to publish a large global Open Access lexical database

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features - Scientific Data


Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

Johann-Mattis List, Robert Forkel, Simon J. Greenhill, Christoph Rzymski, Johannes Englisch & Russell D. Gray

Scientific Data Published: 16 June 2022

DOI: https://doi.org/10.1038/s41597-022-01432-0

Abstract

The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization which makes their comparison difficult. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.

Measurement(s)	expressions
Technology Type(s)	data aggregation
Factor Type(s)	none
Sample Characteristic – Organism	human language
Sample Characteristic – Location	global scale
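
To make the idea of automatically inferred lexical features more concrete, here is a minimal Python sketch. It assumes wordlist rows with the column names `Language_ID`, `Parameter_ID`, and `Form`, as used in Cross-Linguistic Data Formats wordlists. The language IDs, concept labels, and word forms are invented for illustration and are not taken from Lexibank. The helper `colexifies` is hypothetical and is not the authors' workflow. It simply checks whether a language uses the same form for two concepts, which is one simple kind of feature that could be derived from standardized wordlists.

```python
from collections import defaultdict

# Toy rows shaped like a CLDF wordlist (Language_ID, Parameter_ID, Form).
# All values below are invented for illustration only.
FORMS = [
    {"Language_ID": "lang_a", "Parameter_ID": "ARM", "Form": "taba"},
    {"Language_ID": "lang_a", "Parameter_ID": "HAND", "Form": "taba"},
    {"Language_ID": "lang_b", "Parameter_ID": "ARM", "Form": "kelu"},
    {"Language_ID": "lang_b", "Parameter_ID": "HAND", "Form": "mino"},
    {"Language_ID": "lang_c", "Parameter_ID": "ARM", "Form": "suro"},
]


def colexifies(rows, concept_a, concept_b):
    """Hypothetical helper: per language, report whether the two concepts
    share at least one word form.

    Returns a dict mapping each language to True (shared form), False
    (both concepts attested, no shared form), or None (missing data).
    """
    # language -> concept -> set of attested forms
    forms = defaultdict(lambda: defaultdict(set))
    for row in rows:
        forms[row["Language_ID"]][row["Parameter_ID"]].add(row["Form"])

    result = {}
    for language, by_concept in forms.items():
        forms_a = by_concept.get(concept_a)
        forms_b = by_concept.get(concept_b)
        if not forms_a or not forms_b:
            result[language] = None  # feature cannot be computed
        else:
            result[language] = bool(forms_a & forms_b)
    return result


feature = colexifies(FORMS, "ARM", "HAND")
print(feature)
```

Because every row uses the same column names, the same function can run unchanged over rows aggregated from many datasets, which is the practical benefit of standardization that the abstract describes.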
