
Hugging Face Transformers is currently one of the most popular deep learning libraries and provides a rich collection of pretrained models for NLP. The library is developed by the company Hugging Face and was released alongside the Transformers paper, which is why it is known as Hugging Face Transformers. So how is it used? Follow along with Zhanzhang Baike to find out.

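I. Installing a lightweight Transformers

1. Basic installation (good for beginners who want to get started quickly)

Open a command line and run the following to install a lightweight version of the Transformers library (inside a Jupyter or Colab notebook cell, prefix the command with "!"):

pip install transformers

Once it is installed, import it in Python: "import transformers"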
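To confirm that the installation worked without importing the whole library, something like the following could be used. This is a minimal sketch based only on the Python standard library; `installed_version` is a hypothetical helper name, not part of Transformers.

```python
import importlib.util
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module).
        return "unknown"


print("transformers:", installed_version("transformers"))
```

If the printed value is None, the install step above has not taken effect in the Python environment that is currently running.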

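2. Advanced installation (more practical features included)

To unlock more functionality (for example, tokenization and text generation with models that rely on SentencePiece tokenizers), install this variant instead, which also pulls in the SentencePiece dependencies:

pip install "transformers[sentencepiece]"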

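II. What Transformers can do

1. The pipeline function

The most convenient feature of Transformers is the "pipeline()" function, which bundles a model together with its text pre-processing and post-processing steps. The first time you use it, it downloads the model and tokenizer and caches them locally, so later runs do not have to wait for the download.

Popular tasks currently supported include:

feature-extraction: represent a piece of text as a vector;
fill-mask: mask out parts of a text and let the model fill in the blanks;
ner (named entity recognition): identify named entities in the text, such as person and place names;
question-answering: given a passage and a question about it, extract the answer from the passage;
sentiment-analysis: determine whether a text expresses positive or negative sentiment;
summarization: generate a short summary of a long text;
text-generation: given a prompt, let the model continue the text;
translation: translate text from one language into another.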
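As a minimal sketch of how one of these tasks is invoked, the helper below wraps the sentiment-analysis pipeline. `classify_sentiment` is a hypothetical name chosen for illustration, and the model that gets downloaded is whatever default the installed library version selects for this task.

```python
def classify_sentiment(texts):
    """Run the default sentiment-analysis pipeline on a list of strings."""
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    # Downloads and caches a default model and tokenizer on first use.
    classifier = pipeline("sentiment-analysis")
    return classifier(texts)
```

Calling classify_sentiment(["I hate this so much!"]) returns a list with one dictionary per input text, each holding a "label" and a "score". The other task names in the list above are passed to pipeline() in the same way.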

2. Division of labor among Transformer models

Model	Examples	Tasks
Encoder models	ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa	Best suited to tasks that require understanding the full sentence, such as sentence classification, named entity recognition (and word classification more generally), and extractive question answering.
Decoder models	CTRL, GPT, GPT-2, Transformer XL	Text generation. Pretraining usually revolves around predicting the next word in a sentence, so these models are best suited to tasks that involve generating text.
Encoder-decoder (sequence-to-sequence) models	BART, T5, Marian, mBART	Best suited to tasks that generate new sentences from a given input, such as summarization, translation, or generative question answering.

III. Using Transformers: the workflow

1. What happens behind pipeline

Step 1: the tokenizer turns text into numbers

A Transformer model cannot read raw text; the text first has to be split into tokens and then converted into numbers. For example, load the tokenizer that matches a checkpoint with "AutoTokenizer":

from transformers import AutoTokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

padding pads shorter sentences to a common length, and truncation cuts off sentences that are too long:

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

The output is a dictionary of numbers like the one below: "input_ids" holds the ID of each token, and "attention_mask" marks which positions are real tokens (1) and which are padding (0).

{
    'input_ids': tensor([
        [ 101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
        [ 101, 1045, 5223, 2023, 2061, 2172, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0]
    ]),
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}
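The relationship between padding and the attention mask can be sketched in plain Python. This is a simplified illustration, not the library's implementation: `pad_and_truncate` is a hypothetical helper, the pad ID of 0 is taken from the output above, and truncation here simply slices the list, whereas a real tokenizer also keeps the closing special token.

```python
def pad_and_truncate(batch, pad_id=0, max_length=None):
    """Bring every sequence in `batch` to the same length and build the mask."""
    if max_length is not None:
        batch = [seq[:max_length] for seq in batch]  # cut sequences that are too long
    longest = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks a padding position.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
]
encoded = pad_and_truncate(batch)
print(encoded["input_ids"][1])
print(encoded["attention_mask"][1])
```

The second sequence has 8 real tokens, so its mask has 8 ones followed by 8 zeros, one zero for each padding position that was added.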

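Step 2: the model processes the numbers

Load the model with "AutoModel" and pass in the numbers from the previous step:

from transformers import AutoModel
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

The output has shape [2, 16, 768]: 2 sentences, 16 tokens per sentence, and a 768-dimensional feature vector per token:

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([2, 16, 768])

Different tasks have dedicated model classes, for example classes ending in "ForSequenceClassification" for classification and "ForQuestionAnswering" for question answering; choose the one that fits the task.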

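Step 3: post-processing

The model outputs raw scores (logits), which have to be passed through softmax to become probabilities (for example, the probabilities of "positive" and "negative" in sentiment analysis). Logits come from a model with a classification head, so the checkpoint is loaded with "AutoModelForSequenceClassification" rather than the bare "AutoModel":

import torch
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)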
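What softmax does to the logits can be sketched without any library. The logits below are the ones printed in this article's sentiment-analysis example; the label order (NEGATIVE first, POSITIVE second) is an assumption about the checkpoint's configuration and should be checked against the model's own label mapping.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-2.7276, 2.8789]         # logits reported in the article's sentiment example
labels = ["NEGATIVE", "POSITIVE"]  # assumed label order for this checkpoint

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best], round(probs[best], 4))
```

Because the second logit is far larger than the first, nearly all of the probability mass lands on the second label.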

2. Working with models: creating, loading, and saving

Build a model yourself:

from transformers import BertConfig, BertModel
# Build the config
config = BertConfig()
# Build the model from the config (the weights are randomly initialized)
model = BertModel(config)

Load a pretrained model:

from transformers import BertModel
model = BertModel.from_pretrained("bert-base-cased")

Save the model locally:

model.save_pretrained("directory_on_my_computer")

Use the Transformer model:

import torch
sequences = ["Hello!", "Cool.", "Nice!"]
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]
# The model accepts tensors, so convert the lists of IDs first
model_inputs = torch.tensor(encoded_sequences)
output = model(model_inputs)

3. Tokenizer tips

Loading and saving:

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
tokenizer("Using a Transformer network is simple")
# Output:
# {'input_ids': [101, 7993, 170, 11303, 1200, 2443, 1110, 3014, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
# Save
tokenizer.save_pretrained("directory_on_my_computer")

4. Processing text in batches

A model can process a batch of texts at once, but the sentences must be brought to the same length first (short ones are padded, long ones are truncated). Before that, each sentence is split into tokens and the tokens are mapped to input IDs:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "Using a Transformer network is simple"
tokens = tokenizer.tokenize(sequence)
print(tokens)  # Output: ['Using', 'a', 'transform', '##er', 'network', 'is', 'simple']
# From tokens to input IDs
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)  # Output: [7993, 170, 11303, 1200, 2443, 1110, 3014]
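The "##er" piece in the output above comes from sub-word splitting. A rough sketch of the greedy longest-match-first idea behind WordPiece-style tokenizers follows. It is a toy under stated assumptions: `wordpiece_split` is a hypothetical helper, the vocabulary contains only the tokens and IDs printed above, and the input uses a lowercase "transformer" so that it matches the "transform" entry.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # try a shorter piece
        if match is None:
            return [unk_token]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary: tokens and IDs copied from the example output above.
vocab = {"Using": 7993, "a": 170, "transform": 11303, "##er": 1200,
         "network": 2443, "is": 1110, "simple": 3014}

tokens = []
for word in "Using a transformer network is simple".split():
    tokens.extend(wordpiece_split(word, vocab))
ids = [vocab[token] for token in tokens]  # the lookup step, as in convert_tokens_to_ids
print(tokens)
print(ids)
```

A real tokenizer works with a vocabulary of tens of thousands of entries and handles casing, punctuation, and unknown tokens; the sketch covers only the splitting and lookup steps.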

5. Loading a sentiment analysis model

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
sequence = "I've been waiting for a HuggingFace course my whole life."
tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
# Wrap the IDs in a list to add the batch dimension the model expects
input_ids = torch.tensor([ids])
print("Input IDs:", input_ids)
output = model(input_ids)
print("Logits:", output.logits)
# Output:
# Input IDs: [[ 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]]
# Logits: [[-2.7276, 2.8789]]
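The Input IDs printed here contain 14 values, while the tokenizer output for the same sentence earlier in the article has 16. The difference is the pair of special tokens, 101 at the start and 102 at the end, which the tokenizer adds when it is called directly but which tokenize() followed by convert_tokens_to_ids() does not. A minimal sketch of that step, with `add_special_tokens` as a hypothetical helper name:

```python
CLS_ID, SEP_ID = 101, 102  # IDs that open and close each row in the tokenizer output shown earlier


def add_special_tokens(ids):
    """Wrap a list of token IDs with the start and end markers."""
    return [CLS_ID] + list(ids) + [SEP_ID]


ids = [1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]
full_ids = add_special_tokens(ids)
print(len(ids), len(full_ids))
print(full_ids)
```

The result matches the first row of "input_ids" in the tokenizer output shown in Step 1.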

6. Handling single sentences and padded batches

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
# The shorter sequence is padded so that both rows have the same length
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]
print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
print(model(torch.tensor(batched_ids)).logits)
# Output:
# tensor([[ 1.5694, -1.3895]], grad_fn=<AddmmBackward>)
# tensor([[ 0.5803, -0.4125]], grad_fn=<AddmmBackward>)
# tensor([[ 1.5694, -1.3895], [ 1.3373, -1.2163]], grad_fn=<AddmmBackward>)
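In the output above, the padded row of the batch does not reproduce the logits of the two-token sequence run on its own, because no attention mask was passed and the padding token was treated as a real token. A rough sketch of the mechanism: attention weights come from a softmax over per-token scores, and an unmasked padding token takes a share of that weight. The scores below are hypothetical and `masked_softmax` is an illustrative helper, not a Transformers function; in the library itself the mask is passed to the model through the attention_mask argument.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` that gives zero weight to positions where mask is 0."""
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


real_scores = [2.0, 1.0]             # hypothetical scores for two real tokens
padded_scores = real_scores + [0.5]  # hypothetical score produced by a padding token

alone = masked_softmax(real_scores, [1, 1])
without_mask = masked_softmax(padded_scores, [1, 1, 1])
with_mask = masked_softmax(padded_scores, [1, 1, 0])

print(alone)
print(without_mask)
print(with_mask)
```

With the mask, the weights on the real tokens are the same as when the sequence is processed alone; without it, the padding position absorbs part of the weight and the result shifts.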