Generate Summaries using Google’s Pegasus library
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models
PEGASUS stands for Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models. It uses self-supervised objective Gap Sentences Generation (GSG) to train a transformer encoder-decoder model. The paper can be found on arXiv. In this article, we will only focus on generating state of the art abstractive summaries using Google’s Pegasus library.
As of now, there is no easy way to generate the summaries using Pegasus library. However, Hugging Face is already working on implementing this and they are expecting to release it around September 2020. In the meantime, we can try to follow the steps mentioned Pegasus Github repository and explore Pegasus. So let’s get started.
Based on the request from many readers, I have added the full code at the end of the article. Hope this helps you!!
This step will clone the library on GitHub, create /content/pegasus folder, and install requirements.
Next, follow the instructions to install gsutil. The below steps worked well for me in Colab.
This will create a folder named ckpt under /content/pegasus/ and then download all the necessary files (fine-tuned models, vocab etc.) from Google Cloud to /content/pegasus/ckpt.