Natural Language Processing: BERT

What is BERT?

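Bidirectional Encoder Representations from Transformers (better known as BERT) is a groundbreaking paper from Google that advanced the state of the art on a range of NLP tasks and became a stepping stone for many other innovative architectures.
It is no exaggeration to say that BERT set a new direction for the whole field. It demonstrates the clear benefit of taking a model pre-trained on a huge dataset and transferring what it has learned to downstream tasks, regardless of what the specific task is.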

What is Transformers?

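Transformers is a library published by Hugging Face that provides implementations of state-of-the-art NLP models together with pre-trained models. With Transformers, anyone can use BERT and its derivative models (82 kinds as of November 2021!) easily and free of charge.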

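Implementations of typical NLP tasks (document classification, text generation, question answering, summarization, named entity recognition, and so on) are also provided, so combining them with BERT or another pre-trained model makes it easy to build a model.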
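As a rough illustration of how the tasks listed above map onto the library, the sketch below pairs each task name from the text with a Transformers `pipeline` task identifier. The names `TASK_IDS`, `task_id`, and `build_pipeline` are hypothetical helpers introduced here for illustration, not part of the library. `transformers` is imported lazily, so the lookup helpers run even where the library is not installed.

```python
from typing import Optional

# Task names from the text, mapped to Transformers pipeline task identifiers.
TASK_IDS = {
    "document classification": "text-classification",
    "text generation": "text-generation",
    "question answering": "question-answering",
    "summarization": "summarization",
    "named entity recognition": "token-classification",
}


def task_id(name: str) -> str:
    """Return the pipeline identifier for a task name (case-insensitive)."""
    key = name.strip().lower()
    if key not in TASK_IDS:
        raise ValueError(f"unknown task: {name!r}")
    return TASK_IDS[key]


def build_pipeline(name: str, model: Optional[str] = None):
    """Create a pipeline for the named task (hypothetical convenience wrapper).

    With model=None the library chooses its own default checkpoint and
    downloads it on first use, so this call needs network access.
    """
    from transformers import pipeline  # lazy import: only needed here

    return pipeline(task_id(name), model=model)
```

For example, `build_pipeline("question answering")` should return a callable that accepts `question=` and `context=` arguments. The default checkpoints are generally trained on English, so Japanese text would likely need an explicit `model` name.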

Transformers can be used anywhere a Python runtime is available, and it installs easily with the pip or conda command. PyTorch, TensorFlow, and JAX can be used as the machine learning framework.
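Since Transformers relies on one of these frameworks being present, it can help to check which ones the current environment offers before installing. The following is a minimal sketch using only the standard library. `BACKENDS` and `available_backends` are names made up for this example.

```python
from importlib.util import find_spec

# Import names of the three frameworks mentioned in the text.
BACKENDS = ("torch", "tensorflow", "jax")


def available_backends() -> dict:
    """Report which frameworks can be found, without importing them."""
    return {name: find_spec(name) is not None for name in BACKENDS}


print(available_backends())
```

`find_spec` only locates a package and does not import it, so the check stays fast even when a large framework is installed.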

 

amaru-ai.com
www.softbanktech.co.jp
https://www.gifu-nct.ac.jp/elec/deguchi/sotsuron/goto.pdf

 

pip install "transformers[ja]"
Requirement already satisfied: transformers[ja] in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (4.24.0)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: requests in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (2.23.0)
Requirement already satisfied: filelock in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (3.0.12)
Requirement already satisfied: packaging>=20.0 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (21.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (0.10.1)
Requirement already satisfied: importlib-metadata in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (1.6.0)
Requirement already satisfied: numpy>=1.17 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (1.21.6)
Requirement already satisfied: regex!=2019.12.17 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (2020.5.14)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (0.13.2)
Requirement already satisfied: pyyaml>=5.1 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (5.3.1)
Requirement already satisfied: tqdm>=4.27 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from transformers[ja]) (4.46.0)
Collecting ipadic<2.0,>=1.0.0
  Downloading ipadic-1.0.0.tar.gz (13.4 MB)
     --------------------------------------- 13.4/13.4 MB 13.9 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting sudachidict-core>=20220729
  Downloading SudachiDict-core-20221021.tar.gz (9.0 kB)
  Preparing metadata (setup.py): started

  Preparing metadata (setup.py): still running...
  Preparing metadata (setup.py): finished with status 'done'
Collecting pyknp>=0.6.1
  Downloading pyknp-0.6.1-py3-none-any.whl (42 kB)
     -------------------------------------- 42.5/42.5 kB 511.4 kB/s eta 0:00:00
Collecting fugashi>=1.0
  Downloading fugashi-1.2.0-cp37-cp37m-win_amd64.whl (498 kB)
     -------------------------------------- 498.9/498.9 kB 5.2 MB/s eta 0:00:00
Collecting unidic-lite>=1.0.7
  Downloading unidic-lite-1.0.8.tar.gz (47.4 MB)
     ---------------------------------------- 47.4/47.4 MB 7.6 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting sudachipy>=0.6.6
  Downloading SudachiPy-0.6.6-cp37-cp37m-win_amd64.whl (1.0 MB)
     ---------------------------------------- 1.0/1.0 MB 8.2 MB/s eta 0:00:00
Collecting unidic>=1.0.2
  Downloading unidic-1.1.0.tar.gz (7.7 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: typing-extensions>=3.7.4.3 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from huggingface-hub<1.0,>=0.10.0->transformers[ja]) (4.4.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from packaging>=20.0->transformers[ja]) (2.4.7)
Requirement already satisfied: six in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from pyknp>=0.6.1->transformers[ja]) (1.14.0)
Collecting wasabi<1.0.0,>=0.6.0
  Downloading wasabi-0.10.1-py3-none-any.whl (26 kB)
Collecting plac<2.0.0,>=1.1.3
  Downloading plac-1.3.5-py2.py3-none-any.whl (22 kB)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from requests->transformers[ja]) (1.25.9)
Requirement already satisfied: certifi>=2017.4.17 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from requests->transformers[ja]) (2020.4.5.1)
Requirement already satisfied: chardet<4,>=3.0.2 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from requests->transformers[ja]) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from requests->transformers[ja]) (2.9)
Requirement already satisfied: zipp>=0.5 in g:\winpython\wpy64-3771\python-3.7.7.amd64\lib\site-packages (from importlib-metadata->transformers[ja]) (3.1.0)
Building wheels for collected packages: ipadic, sudachidict-core, unidic, unidic-lite
  Building wheel for ipadic (setup.py): started
  Building wheel for ipadic (setup.py): finished with status 'done'
  Created wheel for ipadic: filename=ipadic-1.0.0-py3-none-any.whl size=13556708 sha256=17a9c685f95095bd2365b817191af55123cbc6a24472d966b14bf99ec9f96c5f
  Stored in directory: c:\users\**\appdata\local\pip\cache\wheels\33\8b\99\cf0d27191876637cd3639a560f93aa982d7855ce826c94348b
  Building wheel for sudachidict-core (setup.py): started
  Building wheel for sudachidict-core (setup.py): finished with status 'done'
  Created wheel for sudachidict-core: filename=SudachiDict_core-20221021-py3-none-any.whl size=71574769 sha256=9ec9fc29eb73208b35f75da5e73435908cdb1bba2cdd4f19e2578c324b8e1013
  Stored in directory: c:\users\**\appdata\local\pip\cache\wheels\66\a9\e1\bde612c31f0ae6877e7e39f278076befd399c488cba80292b6
  Building wheel for unidic (setup.py): started
  Building wheel for unidic (setup.py): finished with status 'done'
  Created wheel for unidic: filename=unidic-1.1.0-py3-none-any.whl size=7414 sha256=cbc7dea6b993813c33c99d75ef38ace036aeb3b3a069db57641adb0aca84052e
  Stored in directory: c:\users\**\appdata\local\pip\cache\wheels\ce\4d\f1\170bb74b559ca338113c0315c9805e16dfd0a12411ec6b1122
  Building wheel for unidic-lite (setup.py): started
  Building wheel for unidic-lite (setup.py): finished with status 'done'
  Created wheel for unidic-lite: filename=unidic_lite-1.0.8-py3-none-any.whl size=47658824 sha256=3e78686ca65934d0632a679808c5dce46a2d0c6f976074d30c70305f55f48b16
  Stored in directory: c:\users\**\appdata\local\pip\cache\wheels\de\69\b1\112140b599f2b13f609d485a99e357ba68df194d2079c5b1a2
Successfully built ipadic sudachidict-core unidic unidic-lite
Installing collected packages: wasabi, unidic-lite, sudachipy, plac, ipadic, sudachidict-core, pyknp, fugashi, unidic
Successfully installed fugashi-1.2.0 ipadic-1.0.0 plac-1.3.5 pyknp-0.6.1 sudachidict-core-20221021 sudachipy-0.6.6 unidic-1.1.0 unidic-lite-1.0.8 wasabi-0.10.1
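After an install like the one logged above, one way to confirm that the Japanese extras are present is to query the installed distributions. This is a sketch under assumptions: `JA_PACKAGES` and `installed_versions` are hypothetical names, the package list is copied from the log rather than from the library's own metadata, and `importlib.metadata` requires Python 3.8 or later (the Python 3.7 environment in the log would need the `importlib_metadata` backport instead).

```python
from importlib import metadata

# Distributions pulled in by the "ja" extra, as listed in the install log.
JA_PACKAGES = (
    "fugashi",
    "ipadic",
    "unidic-lite",
    "unidic",
    "sudachipy",
    "sudachidict-core",
    "pyknp",
)


def installed_versions(names=JA_PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

A `None` entry means the distribution was not found in the active environment, which usually indicates that the command ran against a different interpreter or that the kernel has not been restarted yet.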