Orphan articles: the 'dark matter' of Wikipedia


2024-05-17 Swiss Federal Institute of Technology Lausanne (EPFL)

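Wikipedia has 60 million articles in more than 300 languages, and about 200,000 new articles are added each month. Some of them, however, are "orphan articles": articles that no other Wikipedia article links to. Researchers at EPFL and the Wikimedia Foundation studied orphan articles across all 319 language versions of Wikipedia. They found that about 15% of all articles, roughly 9 million, are orphans. Orphan articles have low visibility and receive few pageviews. The researchers proposed a cross-lingual remedy: when a link to the article exists in another language version, translate that link and add it as an incoming link to the orphan article. They showed that this approach can suggest links for more than 63% of orphan articles. Work is also under way on an AI-based link recommendation tool to support editors.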
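The cross-lingual idea described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation. It assumes each article is identified by a language-independent concept ID (the `Q1`, `Q2`, ... values are made up), that each language edition is given as a set of articles plus a set of `(source, target)` links, and the function names `find_orphans` and `suggest_incoming_links` are hypothetical.

```python
def find_orphans(articles, links):
    """Return the articles that no *other* article links to."""
    # Self-links do not count as incoming links.
    linked = {dst for src, dst in links if src != dst}
    return set(articles) - linked


def suggest_incoming_links(target_lang, editions):
    """Suggest incoming links for orphans in one language edition.

    editions maps a language code to (articles, links), where links is a
    set of (source, target) pairs of language-independent concept IDs
    (an assumption made for this sketch).
    """
    articles, links = editions[target_lang]
    suggestions = {}
    for orphan in find_orphans(articles, links):
        candidates = set()
        for lang, (_, other_links) in editions.items():
            if lang == target_lang:
                continue
            for src, dst in other_links:
                # A link src -> orphan exists in another language, and the
                # source article also exists in the target language, so the
                # link could be carried over.
                if dst == orphan and src != orphan and src in articles:
                    candidates.add(src)
        suggestions[orphan] = candidates
    return suggestions


# Toy data: two editions sharing concept IDs.
editions = {
    "ja": ({"Q1", "Q2", "Q3"}, {("Q1", "Q2")}),
    "en": ({"Q1", "Q2", "Q3", "Q4"},
           {("Q1", "Q3"), ("Q4", "Q3"), ("Q2", "Q1")}),
}

ja_orphans = find_orphans(*editions["ja"])
ja_suggestions = suggest_incoming_links("ja", editions)
```

In this toy data, `Q1` and `Q3` are orphans in the `ja` edition. The `en` edition links `Q2` to `Q1` and `Q1` to `Q3`, so those two links are suggested. The `en` link from `Q4` to `Q3` is skipped because `Q4` has no `ja` article.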

<Related information>

Orphan Articles: The Dark Matter of Wikipedia

Akhil Arora, Robert West, Martin Gerlach
arXiv  Submitted on 6 Jun 2023
DOI:https://doi.org/10.48550/arXiv.2306.03940

Abstract

With 60M articles in more than 300 language versions, Wikipedia is the largest platform for open and freely accessible knowledge. While the available content has been growing continuously at a rate of around 200K new articles each month, very little attention has been paid to the accessibility of the content. One crucial aspect of accessibility is the integration of hyperlinks into the network so the articles are visible to readers navigating Wikipedia. In order to understand this phenomenon, we conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles, across 319 different language versions of Wikipedia. We find that a surprisingly large extent of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia, and thus, rightfully term orphan articles as the dark matter of Wikipedia. We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase of their visibility in terms of the number of pageviews. We further highlight the challenges faced by editors for de-orphanizing articles, demonstrate the need to support them in addressing this issue, and provide potential solutions for developing automated tools based on cross-lingual approaches. Overall, our work not only unravels a key limitation in the link structure of Wikipedia and quantitatively assesses its impact, but also provides a new perspective on the challenges of maintenance associated with content creation at scale in Wikipedia.
