2210.09150.pdf

– The paper focuses on improving the reliability of GPT-3.
– Reliability is decomposed into four facets: generalizability, social biases, calibration, and factuality.
– The authors establish simple and effective prompts that improve GPT-3’s reliability on each facet.
– With appropriate prompts, GPT-3 is more reliable than smaller-scale supervised models.

– The paper systematically studies the reliability of GPT-3 from four key facets: generalizability, fairness (social biases), calibration, and factuality.
– Effective prompting strategies are developed to make GPT-3 outperform supervised models on these facets.
– The paper provides practical recommendations for users of GPT-3.
– The work reveals new insights into large language models (LLMs).
– The paper suggests future work on examining more facets of reliability, applying prompting methods to real-world applications, and exploring more effective prompting strategies.