– Large language models (LLMs) are dominant in NLP.
– GPT-3 can be prompted with natural language text to shape predictions.
– GPT-3’s reliability can be improved through effective prompts.
– GPT-3 outperforms smaller-scale supervised models in terms of reliability.

– The study provides practical recommendations for users of GPT-3.
– It aims to inspire future work that examines more facets of reliability and applies prompting methods to real-world applications.

– The study focuses on four facets of reliability: generalizability, social biases, calibration, and factuality.

– GPT-3 is better calibrated than supervised DPR-BERT.
– Increasing the number of examples in the prompt improves accuracy.
– GPT-3 has similar calibration regardless of the source of examples.
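– GPT-3’s confidence scores are more discriminative than those of the supervised DPR-BERT baseline: its most confident predictions are correct more often.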
– Selective prediction based on GPT-3 confidence scores is effective.
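
The calibration and selective-prediction points above can be made concrete with a small sketch. It assumes each prediction comes with a confidence in [0, 1] and a correctness flag. The function names, the equal-width binning, and the toy data are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins: List[list] = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - mean_conf)
    return ece


def selective_accuracy(
    confidences: Sequence[float], correct: Sequence[bool], coverage: float
) -> float:
    """Accuracy on the `coverage` fraction of most confident predictions."""
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    k = max(1, int(round(coverage * len(ranked))))
    kept = ranked[:k]
    return sum(1 for _, ok in kept if ok) / k


# Toy data (hypothetical, for illustration only).
toy_confidences = [0.95, 0.85, 0.65, 0.35]
toy_correct = [True, True, False, False]

ece = expected_calibration_error(toy_confidences, toy_correct)
top_half_accuracy = selective_accuracy(toy_confidences, toy_correct, coverage=0.5)
full_accuracy = selective_accuracy(toy_confidences, toy_correct, coverage=1.0)
```

A lower calibration error means confidence tracks accuracy more closely. Selective prediction is effective when accuracy on the most confident subset is clearly higher than accuracy over all predictions.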

– Large language models (LLMs) are powerful tools for understanding and generating text.
– Reliability includes factors like generalizability, social biases, calibration, and factuality.
