The following examples demonstrate common NLP tasks using the pipeline function from 🤗 Transformers:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

Output:
[{'label': 'POSITIVE', 'score': 0.9598047137260437}]

classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

Output:
[{'label': 'POSITIVE', 'score': 0.9598047137260437},
 {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
The zero-shot-classification pipeline lets you choose the labels used for classification, without any fine-tuning on those labels.
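Under the hood, zero-shot classification typically scores each candidate label with a natural language inference model and normalizes the entailment scores across labels. Here is a minimal NumPy sketch of that final scoring step, assuming the per-label entailment logits have already been computed (the values below are made up for illustration):

```python
import numpy as np

def rank_labels(entailment_logits, labels):
    """Turn per-label entailment logits into a ranked label/score list."""
    logits = np.asarray(entailment_logits, dtype=float)
    # Softmax over the candidate labels so the scores sum to 1
    exp = np.exp(logits - logits.max())
    scores = exp / exp.sum()
    order = np.argsort(scores)[::-1]  # highest score first
    return {
        "labels": [labels[i] for i in order],
        "scores": [float(scores[i]) for i in order],
    }

# Hypothetical entailment logits for three candidate labels
result = rank_labels([2.1, -0.3, 0.5], ["education", "politics", "business"])
```

The returned dict mirrors the shape of the zero-shot pipeline's output: labels sorted by score, with scores summing to 1.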
The text-generation pipeline uses an input prompt to generate text.
Here is an example using the distilgpt2 model:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
Output (only the first of the two returned sequences is shown):
[{'generated_text': 'In this course, we will teach you how to understand and use '
'data flow and data interchange when handling user data. We '
'will be working with one or more of the most commonly used '
'data flows — data flows of various types, as seen by the '
'HTTP'}]
Google Translate uses Neural Machine Translation (NMT), primarily powered by the Transformer architecture, for accurate and fluent translations. The system works as follows:

Preprocessing: The input text is tokenized, language-detected, and normalized.
Translation via NMT: An encoder-decoder framework processes input in the source language and generates output in the target language. The self-attention mechanism ensures the model captures context across sentences, and beam search selects the most probable translation.
Post-processing: Detokenization and formatting ensure natural output.

Key technologies:
Transformers: Efficiently handle context, replacing older models like RNNs.
Multilingual training: Enables translations between language pairs even without direct bilingual data.
Custom hardware: Google uses TPUs for efficient model training and inference.

Benefits: Context-aware, fluent translations, with continuous improvement via user feedback and updated training data.

Limitations: Struggles with idiomatic expressions, low-resource languages, and complex grammar.

In essence, Google Translate is a state-of-the-art application of deep learning and NLP, designed for large-scale, efficient translation services.
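The beam-search step described above can be sketched in plain Python: at each decoding step, the decoder keeps the k most probable partial outputs instead of committing to a single greedy choice. A toy version, assuming next-token log-probabilities come from a hypothetical lookup table rather than a real NMT decoder:

```python
import math

# Toy "model": log-probabilities of the next token given the previous one.
# A real NMT decoder would condition on the full encoder/decoder state.
LOG_PROBS = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.7), "dog": math.log(0.3)},
    "a":   {"cat": math.log(0.2), "dog": math.log(0.8)},
    "cat": {"</s>": 0.0},
    "dog": {"</s>": 0.0},
}

def beam_search(beam_width=2, max_len=4):
    # Each hypothesis is (tokens, cumulative log-probability)
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "</s>":
                # Finished hypotheses are carried over unchanged
                candidates.append((tokens, score))
                continue
            for token, log_prob in LOG_PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + log_prob))
        # Keep only the beam_width most probable hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_tokens, best_score = beam_search()[0]
```

With a beam width of 1 this reduces to greedy decoding; widening the beam lets the search recover sequences whose first token is not locally optimal.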
Switching tasks: the snippet below post-processes the raw logits of an extractive question-answering model, searching the n-best start/end positions for the highest-scoring valid answer span in each example.

import numpy as np

n_best = 20
max_answer_length = 30
predicted_answers = []

for example in small_eval_set:
    example_id = example["id"]
    context = example["context"]
    answers = []
    for feature_index in example_to_features[example_id]:
        start_logit = start_logits[feature_index]
        end_logit = end_logits[feature_index]
        offsets = eval_set["offset_mapping"][feature_index]

        start_indexes = np.argsort(start_logit)[-1 : -n_best - 1 : -1].tolist()
        end_indexes = np.argsort(end_logit)[-1 : -n_best - 1 : -1].tolist()
        for start_index in start_indexes:
            for end_index in end_indexes:
                # Skip answers that are not fully in the context
                if offsets[start_index] is None or offsets[end_index] is None:
                    continue
                # Skip answers with a length that is either < 0 or > max_answer_length
                if (
                    end_index < start_index
                    or end_index - start_index + 1 > max_answer_length
                ):
                    continue
                answers.append(
                    {
                        "text": context[offsets[start_index][0] : offsets[end_index][1]],
                        "logit_score": start_logit[start_index] + end_logit[end_index],
                    }
                )

    best_answer = max(answers, key=lambda x: x["logit_score"])
    predicted_answers.append({"id": example_id, "prediction_text": best_answer["text"]})
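The slice np.argsort(...)[-1 : -n_best - 1 : -1] used above picks the indices of the n_best highest logits in descending order: argsort is ascending, so the slice walks backwards from the end. A small self-contained check with made-up logits:

```python
import numpy as np

n_best = 3
logits = np.array([0.1, 2.5, -1.0, 1.7, 0.3])

# argsort is ascending; slicing from the end backwards yields the
# indices of the n_best largest values, best first
top_indexes = np.argsort(logits)[-1 : -n_best - 1 : -1].tolist()
# top_indexes is [1, 3, 4]: the positions of 2.5, 1.7, and 0.3
```

This is why the nested loops over start_indexes and end_indexes only examine n_best * n_best candidate spans rather than every possible pair of positions.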