SciBERT: A Pretrained Language Model for Scientific Text

- SciBERT is a pretrained language model based on BERT.
- It addresses the lack of high-quality, large-scale labeled scientific data.
- SciBERT improves performance on downstream scientific NLP tasks.
- It achieves new state-of-the-art results on several tasks.
- The code and pretrained models are available at https://github.com/allenai/scibert/; a loading sketch follows below.
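
The paper itself only points to the GitHub repository above; as a minimal sketch of how the released weights are commonly used, the example below loads them through the Hugging Face transformers library and encodes one sentence. The checkpoint name allenai/scibert_scivocab_uncased and the example sentence are assumptions for illustration, not details taken from this summary.

```python
# Minimal sketch: load SciBERT weights and encode one scientific sentence.
# Checkpoint name is assumed (uncased, scientific-vocabulary variant).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

sentence = "The patients were randomized to receive 50 mg of propranolol daily."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per wordpiece token (hidden size 768 for a BERT-Base-sized model).
print(outputs.last_hidden_state.shape)
```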

- NLP is important for extracting knowledge from scientific publications.
- Training deep neural models requires large amounts of labeled data.
- Annotated data in scientific domains is difficult and expensive to collect.
- Unsupervised pretraining of language models improves performance on NLP tasks.
- SciBERT is a pretrained language model, based on BERT, trained on a large corpus of scientific text.

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
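
To make the "downstream tasks" point concrete, the sketch below runs one fine-tuning step of the pretrained encoder on a sentence-classification task (e.g. citation intent). The checkpoint name, label count, example text, and label id are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal fine-tuning sketch: one gradient step of sentence classification
# on top of the pretrained SciBERT encoder. All task specifics are assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=3
)

texts = ["We follow the evaluation protocol of Smith et al. (2015)."]
labels = torch.tensor([1])  # hypothetical class id for this example

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over the label set
loss.backward()
optimizer.step()
```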

- SciBERT is a pretrained language model for scientific text based on BERT.
- SciBERT outperforms BERT-Base and achieves new state-of-the-art results on several tasks.
- Future work includes releasing a version of SciBERT analogous to BERT-Large.