If you’ve ever had to read through hundreds of pages of research only to find out that you need to be reading something else, you’ll see the appeal of text summarization. But that’s just one simple example. Text summarization is the task of condensing a piece of text into a shorter version while preserving its key information and meaning. Other applications of text summarization include help desk and customer support, legal documents, financial research and more.
This blog will focus on text summarization for medical research papers. Keep reading to learn about the value of text summarization, and for a step-by-step guide to using cnvrg.io’s pre-trained text summarization model and PubMed connector.
Why text summarization for medical papers?
Scientific research is complex and requires a thorough understanding of the latest advances in the field of study. Researchers face the difficult task of reading through hundreds of pages of research papers, and often the papers they read turn out not to be directly related to their topic. Because medical research and diagnosis can be time-sensitive, it is important to speed up the process by prioritizing the relevant papers and texts.
cnvrg.io’s ready-to-use text summarization model
A useful tool in this scenario is a summarizer that specializes in scientific research papers. Instead of reading entire papers, researchers can read the summaries and decide whether to dive deeper into the full paper. This saves both time and cost on research.
cnvrg.io created a text summarization pipeline for scientific research papers. It supports a comprehensive list of data connectors that let you transfer data from sources such as PubMed, PLOS, Azure storage and S3 storage to cnvrg.io for summarization. The most important connector for this task is the PubMed connector, which lets you enter a research query and fetch the top N research papers from PubMed, where N is a value you set. You can then summarize these research papers directly.
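To make the idea of "a query plus the top N results" concrete, here is a minimal Python sketch that builds a search request against NCBI's public E-utilities ESearch endpoint for PubMed. This is an illustration only: it is an assumption that a PubMed query can be expressed this way for your use case, and it is not a description of how the cnvrg.io PubMed connector works internally. The function name build_pubmed_search_url and the example email address are hypothetical.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (not part of cnvrg.io).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(query, max_results, email):
    """Build an ESearch URL asking PubMed for the top `max_results` matches.

    Hypothetical helper for illustration; the cnvrg.io connector may
    work differently under the hood.
    """
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": query,          # free-text research query
        "retmax": max_results,  # the "N" in "top N papers"
        "retmode": "json",      # ask for a JSON response
        "sort": "relevance",    # best matches first
        "email": email,         # contact address, as NCBI asks of API users
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(build_pubmed_search_url("Covid cases in Europe", 10, "you@example.com"))
```

The sketch only constructs the request URL, so it can be run without network access. Fetching the URL would return the identifiers of matching articles, which would still need to be resolved to full-text papers before summarization.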
Step-by-step guide to querying PubMed and summarizing the retrieved research papers with cnvrg.io
1. Go to the Blueprints marketplace on the cnvrg.io platform and search for PubMed in the search bar. Select the PubMed Summarization Batch blueprint.
2. Next, click the Use Blueprint button. This will create a new project for you.
3. Click continue and you will get a flow like the one shown below.
4. The first task is the PubMed connector, which gets data from PubMed. The next is the Summarizer, which summarizes the PDFs retrieved from PubMed.
5. In order to change the parameters of any task, click on it and select the options that suit you best.
6. We are going to query “Covid cases in Europe”, retrieve the top 10 papers that best match this query from PubMed and then summarize them. To provide this query, click on the PubMed Connector task and enter the query in the field named query. Change max_results to 10 and enter your email address.
7. Next, you can change compute to CPU or GPU depending on your needs in the advanced tab above. You can similarly make changes to the summarizer compute if needed.
8. Click on the Save Changes button below.
9. Click on the Run button at the top right and then click on Run again to confirm.
10. Your experiments will start running. Once they are complete, the flow will look like this.
11. You can review the papers that were retrieved from PubMed by clicking on the PubMed task, selecting Artifacts and scrolling down to Output Artifacts.
12. Similarly, you can review the generated summaries by clicking on the Summarizer task, selecting Artifacts and scrolling down to Output Artifacts.
13. The summaries will be available in a file called results.json. In this JSON file, each key is the name of a PDF file from the PubMed Output Artifacts, and the corresponding value is the summary of that PDF file.
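Once you have downloaded results.json, a short Python script can help you skim the summaries. The sketch below assumes the layout described in step 13 (a single JSON object mapping each PDF file name to its summary, with summaries stored as plain strings) and that the file sits in the current working directory. The function names load_summaries and iter_previews are hypothetical helpers, not part of cnvrg.io.

```python
import json
from pathlib import Path


def load_summaries(path):
    """Load results.json and return a dict of {pdf_file_name: summary}."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumed layout: one top-level JSON object keyed by PDF file name.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping file names to summaries")
    return data


def iter_previews(summaries, width=200):
    """Yield (pdf_file_name, truncated_summary) pairs, sorted by file name."""
    for pdf_name in sorted(summaries):
        # str() guards against non-string values; truncate for quick skimming.
        yield pdf_name, str(summaries[pdf_name])[:width]


if __name__ == "__main__":
    results_path = Path("results.json")  # assumed download location
    if results_path.exists():
        for pdf_name, preview in iter_previews(load_summaries(results_path)):
            print(f"{pdf_name}: {preview}")
    else:
        print("results.json not found; download it from Output Artifacts first.")
```

Truncating each summary keeps the printout short enough to scan, so you can decide which papers deserve a full read.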
Instead of using the PubMed connector, users can upload their own papers to cnvrg.io and use them directly with the summarizer task, or use the PLOS, AWS or Azure connectors if their papers are stored elsewhere. Here is an example pipeline that uses the PLOS connector (all you need to do is select the PLOS connector from the task list).
In conclusion, the ability to summarize text quickly and accurately allows businesses and organizations to improve efficiency and productivity while gaining valuable insights from their data. With cnvrg.io’s ready-to-use text summarization model and PubMed connector, doctors, researchers and other professionals can save valuable time, improve the accuracy of medical decision making and share knowledge more easily with other healthcare professionals.