– Large language models (LLMs) are dominant in NLP.
– GPT-3 is a popular, flexible LLM that is easy to use.
– GPT-3 can be prompted with natural-language instructions and in-context examples to steer its predictions (a minimal prompting sketch appears after this list).
– GPT-3’s reliability can be improved through effective prompts.
– With effective prompts, GPT-3 can outperform smaller-scale supervised models on reliability.
– The paper provides practical recommendations for GPT-3 users.
– It aims to inspire future work on examining more facets of reliability and on applying prompting methods to real-world applications.
– The paper explores how to improve the reliability of GPT-3.
– It focuses on four facets of reliability: generalizability, social biases, calibration, and factuality.
– With effective prompting strategies, GPT-3 outperforms supervised models on multiple of these reliability facets.
– GPT-3 is better calibrated than a supervised DPR-BERT baseline.
– Increasing the number of in-context examples in the prompt improves accuracy.
– GPT-3's calibration is similar regardless of the source of the in-context examples.
– GPT-3's confidence scores are more discriminative than the supervised baseline's, i.e., they better separate correct from incorrect predictions.
– Selective prediction based on GPT-3's confidence scores is effective: abstaining on low-confidence predictions trades coverage for accuracy (see the selective-prediction sketch after this list).
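
To make the prompting setup above concrete, here is a minimal few-shot prompting sketch using the legacy `openai` Python client (pre-1.0 `Completion` endpoint). The instruction text, QA-style format, the `build_prompt` helper, and the `text-davinci-002` model name are illustrative assumptions, not the paper's exact prompt.

```python
# Minimal few-shot prompting sketch; assumes the legacy openai client
# (e.g. `pip install "openai<1.0"`) with the API key in OPENAI_API_KEY.
import openai

def build_prompt(demos, question, instruction="Answer the question."):
    """Concatenate an instruction, the in-context demonstrations, and the test question."""
    parts = [instruction, ""]
    for q, a in demos:
        parts.append(f"Q: {q}\nA: {a}\n")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

# Illustrative in-context examples (the "demonstrations" in the prompt).
demos = [
    ("Who wrote Pride and Prejudice?", "Jane Austen"),
    ("What is the capital of France?", "Paris"),
]
prompt = build_prompt(demos, "Who painted the Mona Lisa?")

# Greedy decoding (temperature=0) keeps the output deterministic; logprobs
# exposes per-token log-probabilities for use as confidence scores later.
response = openai.Completion.create(
    model="text-davinci-002",  # assumption: any GPT-3 completion model works here
    prompt=prompt,
    max_tokens=16,
    temperature=0,
    logprobs=1,
    stop=["\n"],
)
print(response["choices"][0]["text"].strip())
```

With greedy decoding the output is deterministic, and the returned log-probabilities are what the selective-prediction sketch below reuses as confidence scores.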
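Building on the sketch above, here is a minimal sketch of selective prediction from GPT-3's token log-probabilities: the mean token probability of the generated answer serves as a confidence score, and the model abstains when it falls below a threshold. The confidence measure, the 0.8 threshold, and the reuse of `response` from the previous sketch are illustrative choices, not the paper's exact formulation.

```python
import math

def answer_confidence(choice):
    """Mean per-token probability of the generated answer tokens."""
    token_logprobs = choice["logprobs"]["token_logprobs"]
    # Defensively skip any None entries (the API reports None for the
    # first token when the prompt is echoed back).
    probs = [math.exp(lp) for lp in token_logprobs if lp is not None]
    return sum(probs) / len(probs) if probs else 0.0

def selective_predict(choice, threshold=0.8):
    """Return the answer only if confidence clears the threshold; otherwise abstain (None)."""
    conf = answer_confidence(choice)
    answer = choice["text"].strip()
    return (answer if conf >= threshold else None), conf

# `response` is the Completion result from the previous sketch.
answer, conf = selective_predict(response["choices"][0], threshold=0.8)
if answer is None:
    print(f"Abstained (confidence {conf:.2f} below threshold).")
else:
    print(f"Predicted: {answer} (confidence {conf:.2f})")
```

Sweeping the threshold trades coverage for accuracy: a higher threshold answers fewer questions, but a larger fraction of the answered ones are correct.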