Difference between revisions of "Common Voice"

Common Voice
Developer	Mozilla Foundation
Initial release	June 19, 2017 (7 years ago)
Repository	https://github.com/mozilla/voice-web
Languages	Multilingual (list of languages)
License	Creative Commons CC0
Website	commonvoice.mozilla.org


Revision as of 15:16, May 3, 2021 (Mon)

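Common Voice is a crowdsourcing project launched by Mozilla to build a free database for developing speech recognition software. The project is supported by volunteers, who use their own microphones to record themselves reading sample sentences and who also verify the accuracy of recordings made by other users. The recordings are collected into a speech database available under the CC0 public domain license. Under this license, developers can use the database in speech recognition applications without restrictions or royalties. An unofficial Android app is available.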

Purpose

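Common Voice aims to provide a diverse set of voice samples. According to Mozilla's Katharina Borchert, many existing projects relied on datasets built from public radio. Because radio presenters are disproportionately male and speak with standard pronunciation, such datasets tended to be skewed, with few samples of women's voices or of speakers with distinctive pronunciations or accents.^[1]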

Speech database

The English Common Voice database is the second-largest freely accessible speech database, after LibriSpeech. When the first data was released on November 29, 2017, more than 20,000 users worldwide had contributed 400,000 validated recordings totaling 500 hours.^[2]^[3]

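In February 2019, the first version of the corpus was released, containing speech in 18 languages. Alongside English, French, German, and Chinese, it includes minority languages such as Welsh and Kabyle. In total, it contains about 1,400 hours of recorded speech from more than 42,000 contributors.^[4]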

As of December 2020, the database held 9,283 hours of recordings in 60 languages, of which 7,335 hours had been validated by volunteers.^[5]

Footnotes

[1] “Why do we gender AI? Voice tech firms move to be more inclusive”. The Guardian. (January 11, 2020). Retrieved April 19, 2020.
[2] “Announcing the Initial Release of Mozilla’s Open Source Speech Recognition Model and Voice Dataset”. blog.mozilla.org (November 29, 2017).
[3] “Announcing the Initial Release of Mozilla’s Open Source Speech Recognition Model and Voice Dataset”. blog.mozilla.org (November 29, 2017).
[4] “Mozilla updates Common Voice dataset with 1,400 hours of speech across 18 languages”. VentureBeat (February 28, 2019).
[5] “Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing”. VentureBeat (July 1, 2020). Archived from the original on March 10, 2021. Retrieved April 1, 2021.


@@ Line 29: / Line 29: @@
 == Footnotes ==
-# [https://www.theguardian.com/technology/2020/jan/11/why-do-we-gender-ai-voice-tech-firms-move-to-be-more-inclusive "Why do we gender AI? Voice tech firms move to be more inclusive"]. ''The Guardian (January 11, 2020). Retrieved May 3, 2021.''
-# [https://blog.mozilla.org/blog/2017/11/29/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset/ "Announcing the Initial Release of Mozilla's Open Source Speech Recognition Model and Voice Dataset"]. ''blog mozilla.org (November 29, 2017). Retrieved May 3, 2021.''.
-# [https://venturebeat.com/2019/02/28/mozilla-updates-common-voice-dataset-with-1400-hours-of-speech-across-19-languages/ "Mozilla updates Common Voice dataset with 1,400 hours of speech across 18 languages"]. ''VentureBeat (February 28, 2019)''. Retrieved May 3, 2021.
-# https://commonvoice.mozilla.org/ja/datasets
