Generate Summaries using Google’s Pegasus library

Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

Chetan Ambi
Aug 5, 2020

Photo by Sudan Ouyang on Unsplash

PEGASUS stands for Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models. It pre-trains a Transformer encoder-decoder model with a self-supervised objective called Gap Sentences Generation (GSG): whole sentences are removed from a document, and the model learns to generate them from the remaining text. The paper can be found on arXiv. In this article, we will focus only on generating state-of-the-art abstractive summaries using Google’s Pegasus library.
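Although this article focuses on generating summaries, a toy version of the GSG objective may make the pre-training idea more concrete. The sketch below is not the library's code. It scores each sentence independently against the rest of the document, using a simplified unigram F1 as a stand-in for ROUGE-1. It then masks the top-scoring sentences and uses them as the target. The 30% gap ratio, the whitespace tokenizer, and the "[MASK1]" placeholder are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Assumed placeholder standing in for a dedicated sentence-mask token.
MASK_TOKEN = "[MASK1]"


def rouge1_f1(candidate: List[str], reference: List[str]) -> float:
    """Unigram-overlap F1: a simplified ROUGE-1 (no stemming, no stopword removal)."""
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def gap_sentence_example(sentences: List[str], gap_ratio: float = 0.3) -> Tuple[str, str]:
    """Build one (input, target) pair by masking the highest-scoring sentences."""
    tokens = [s.lower().split() for s in sentences]  # naive whitespace tokenizer
    scores = []
    for i, sent in enumerate(tokens):
        # Score each sentence independently against the rest of the document.
        rest = [tok for j, other in enumerate(tokens) if j != i for tok in other]
        scores.append(rouge1_f1(sent, rest))
    # Number of gap sentences to mask (at least one).
    m = max(1, round(gap_ratio * len(sentences)))
    # Take the top-m sentences by score (stable sort keeps ties in document order),
    # then restore document order for the target.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:m])
    source = " ".join(MASK_TOKEN if i in selected else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in selected)
    return source, target
```

For example, with the sentences "the cat sat on the mat", "the cat is on the mat", and "dogs bark loudly", the first two tie on score, so the first one is masked and becomes the target.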

As of now, there is no easy way to generate summaries with the Pegasus library. However, Hugging Face is already working on an implementation, which they expect to release around September 2020. In the meantime, we can follow the steps described in the Pegasus GitHub repository to explore Pegasus. So let’s get started.

Update: 15-Sep-2020
Based on requests from many readers, I have added the full code at the end of the article. I hope it helps!

This step clones the library from GitHub, creates the /content/pegasus folder, and installs the requirements.
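A minimal Python sketch of this step follows. It assumes a Google Colab-style environment where /content is the working directory, and the public repository at https://github.com/google-research/pegasus with a requirements.txt at its root. The function name setup_pegasus and its dry_run flag are hypothetical helpers for illustration, not part of the Pegasus library.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumption: the public google-research/pegasus repository.
REPO_URL = "https://github.com/google-research/pegasus"


def setup_pegasus(workdir: str = "/content", dry_run: bool = False) -> List[List[str]]:
    """Clone Pegasus into <workdir>/pegasus and install its requirements (hypothetical helper).

    Returns the commands it runs, or would run when dry_run=True.
    """
    repo_dir = Path(workdir) / "pegasus"
    commands = [
        ["git", "clone", REPO_URL, str(repo_dir)],
        # Use the current interpreter's pip so packages land in the active environment.
        [sys.executable, "-m", "pip", "install", "-r", str(repo_dir / "requirements.txt")],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError and stops at the first failing command.
            subprocess.run(cmd, check=True)
    return commands
```

Calling setup_pegasus(dry_run=True) returns the commands without running them, which is a cheap way to check the paths before cloning.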

Data Science | Data Engineering | Big Data | Python | PySpark | Azure. Visit https://pythonsimplified.com