In today's digital era, eBooks have become a popular and convenient way to access a vast library of literature. Reading them is no longer the only option, however. With advances in natural language processing and generative AI, we can now analyze an eBook's contents and query them using tools like LangChain.
In this blog post, we show how generative AI can read an eBook and build a custom knowledge base from it, using the short story The Monkey's Paw by W. W. Jacobs as our example.
The Monkey's Paw is a captivating short story written by W. W. Jacobs. The story revolves around a mystical talisman, the monkey's paw, which grants its owner three wishes. However, as the characters soon discover, every wish comes at a price, leading them down a path of unforeseen and chilling events. With its gripping narrative and thought-provoking moral dilemmas, "The Monkey's Paw" continues to captivate readers with its timeless appeal.
Prerequisites
- Introduction to BluetickPDF
- Comparing BluetickPDF with Other Popular Tools
- Analyzing eBook Contents
This blog post builds on our previous exploration of using generative AI to analyze PDFs and extract knowledge. If you haven't read our previous blog, "PDF Analysis and Querying with Generative AI," we encourage you to check it out. It introduces BluetickPDF, a new open-source tool for reading and analyzing PDF documents with generative AI.
In our ongoing quest to enhance PDF analysis with generative AI, we also compared BluetickPDF with two other prominent tools: Humata and ChatPDF.
To discover more about the results and insights from that comparison, we invite you to read our blog post titled "The Ultimate PDF Analyzer Showdown: Humata vs. ChatPDF vs. BluetickPDF."
Let's start analyzing BluetickEBOOKS!
To begin our exploration, let's consider an example eBook called "The Monkey's Paw." We start by importing the necessary libraries and loading the eBook file:
import ebooklib
from ebooklib import epub

file_name = "The Monkey's Paw.epub"
# Read the EPUB and list its document items (the XHTML content)
book = epub.read_epub(file_name)
items = list(book.get_items_of_type(ebooklib.ITEM_DOCUMENT))
By accessing the book's items, we can extract individual chapters or sections for further analysis. We collect all the chapters into a list:
chapters = []
for item in book.get_items():
    # Keep only the document items; skip images, stylesheets, and so on
    if item.get_type() == ebooklib.ITEM_DOCUMENT:
        chapters.append(item.get_content())
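For readers curious about what those items are: an EPUB file is a ZIP archive whose chapters are stored as XHTML documents. The sketch below uses only the standard library and a tiny in-memory archive with made-up file names (`OEBPS/chapter1.xhtml` and so on are assumptions for illustration, and the archive is not a valid EPUB). It also picks documents by file extension, which is a simplification; a real reader such as ebooklib follows the package manifest instead.

```python
import io
import zipfile

# Build a tiny in-memory archive that mimics the layout of an EPUB.
# All file names and contents here are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter1.xhtml", "<html><body><p>Part one.</p></body></html>")
    archive.writestr("OEBPS/chapter2.xhtml", "<html><body><p>Part two.</p></body></html>")
    archive.writestr("OEBPS/cover.png", b"not a real image")

# Reopen the archive and keep only the XHTML/HTML documents.
with zipfile.ZipFile(buffer) as archive:
    document_names = [
        name for name in archive.namelist() if name.endswith((".xhtml", ".html"))
    ]
    chapters_raw = [archive.read(name) for name in document_names]

print(document_names)
```

Each entry of `chapters_raw` holds raw bytes of markup, much like the chapter contents collected above.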
Next, we convert each chapter's HTML content into plain text for easier processing:
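from bs4 import BeautifulSoup

def chapter_to_str(chapter):
    # Parse the chapter's HTML and keep only the paragraph text
    soup = BeautifulSoup(chapter, 'html.parser')
    text = [para.get_text() for para in soup.find_all('p')]
    return ' '.join(text)

texts = ""
for c in chapters:
    raw_text = chapter_to_str(c)
    # Replace newlines with spaces so words on adjacent lines do not run together
    texts += raw_text.replace("\n", " ") + " "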
Now we have all the text from the eBook concatenated into a single string, ready for analysis.
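To see what this conversion does without needing an eBook at hand, here is a minimal sketch of the same idea using only Python's standard library. It is a stand-in for the BeautifulSoup version, not a replacement for it: the class `ParagraphExtractor`, the helper `html_to_text`, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # greater than 0 while inside a <p> element
        self.paragraphs = []  # finished paragraphs
        self._buffer = []     # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace (including newlines) into single spaces.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def html_to_text(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)


# Hypothetical chapter markup: a heading plus two paragraphs.
sample_html = (
    "<html><body><h1>Chapter 1</h1>"
    "<p>First line\nwraps here.</p>"
    "<p>Second <i>styled</i> paragraph.</p>"
    "</body></html>"
)

plain_text = html_to_text(sample_html)
print(plain_text)
```

The heading is dropped because only paragraph text is kept, and the line break inside the first paragraph becomes a single space.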
Querying eBook Contents
To enable querying over the eBook's contents, we use LangChain, a framework that integrates generative AI models with other tools. First, we split the eBook into smaller documents using LangChain's text splitter:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(separator=".", chunk_size=2000, chunk_overlap=200, length_function=len)
pages = text_splitter.create_documents([texts])
num_documents = len(pages)
print(f"Now our book is split up into {num_documents} documents")
print(pages[0])
By splitting the eBook into smaller documents, we can perform more efficient and targeted queries.
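To build some intuition for `separator`, `chunk_size`, and `chunk_overlap`, here is a simplified sketch of separator-based chunking with overlap. It is not LangChain's implementation, and the function name `split_with_overlap` is made up for this example. It greedily packs pieces into a chunk until the size limit would be exceeded, then carries the trailing pieces over into the next chunk as long as they fit within the overlap budget.

```python
def split_with_overlap(text, separator=".", chunk_size=2000, chunk_overlap=200):
    """Greedy separator-based chunking with overlap (simplified sketch)."""
    joiner = separator + " "
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks = []
    current = []  # pieces in the chunk being built

    def joined_len(parts):
        return len(joiner.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(joiner.join(current))
            # Drop leading pieces until the carry-over fits the overlap budget
            # and still leaves room for the next piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(joiner.join(current))
    return chunks


# Tiny limits so the overlap is easy to see.
chunks = split_with_overlap("One. Two. Three. Four. Five.", chunk_size=12, chunk_overlap=5)
for chunk in chunks:
    print(chunk)
```

With these toy settings each chunk repeats the last sentence of the previous one, which is the point of the overlap: a sentence that sits on a chunk boundary still appears with some of its neighbouring context.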
Next, we use LangChain's embeddings and vector stores to enable similarity search and question answering. We use OpenAI's embeddings and Pinecone as the vector store:
import os
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))
# Initialize Pinecone
pinecone.init(api_key=os.environ.get("PINECONE_API_KEY"), environment=os.environ.get("PINECONE_API_ENV"))
index_name = "the-monkeys-paw"
# Embed the chunks and store them in the Pinecone index
docsearch = Pinecone.from_texts([t.page_content for t in pages], embeddings, index_name=index_name)
We have now set up a vector store index, allowing us to perform similarity searches and retrieve relevant documents based on queries.
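Conceptually, a similarity search embeds the query, compares it with the stored vectors, and returns the closest chunks. The sketch below illustrates that idea with cosine similarity in plain Python. The `toy_embed` function is a deliberately crude, hypothetical stand-in that counts words; it is nothing like OpenAI's embeddings, and the sample sentences are made up for the example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    words = "".join(ch.lower() if ch.isalnum() else " " for ch in text).split()
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query, documents, k=3):
    """Return the k documents most similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Made-up sample chunks.
documents = [
    "The paw grants three wishes to its owner.",
    "Father and son played chess by the fire.",
    "The visitor brought news from the factory.",
]

top = similarity_search("How many wishes does the paw grant?", documents, k=1)
print(top[0])
```

A real embedding model places texts with similar meaning close together even when they share no words, which word counts cannot do. The ranking step, however, follows the same pattern.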
Finally, we can use the generative AI models available through LangChain to answer questions about the eBook's contents. We use the ChatOpenAI model for this purpose:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(temperature=0, max_tokens=1000, model_name='gpt-3.5-turbo', openai_api_key=os.environ.get("OPENAI_API_KEY"))

# Connect to the index populated in the previous step
index_name = "the-monkeys-paw"
text_field = "text"
index = pinecone.Index(index_name)
vectorstore = Pinecone(
    index, embeddings.embed_query, text_field
)

query = "Who is the author of The Monkey's Paw?"
# Inspect the chunks most similar to the query
docs = vectorstore.similarity_search(query, k=3)

# Build a question-answering chain on top of the vector store
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)
output = qa.run(query)
print(output)
The author of The Monkey's Paw is W. W. Jacobs.
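The queries in this post use two `chain_type` values, "stuff" and "refine", which control how the retrieved chunks reach the model. As a rough sketch of the idea (not LangChain's actual implementation): "stuff" places every retrieved chunk into one prompt, while "refine" drafts an answer from the first chunk and then revises it once for each remaining chunk. The prompt wording and the `fake_llm` stand-in below are hypothetical.

```python
def stuff_answer(llm, question, docs):
    """'Stuff' strategy: put every retrieved chunk into a single prompt."""
    context = "\n\n".join(docs)
    prompt = (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)


def refine_answer(llm, question, docs):
    """'Refine' strategy: draft from the first chunk, then revise once per extra chunk."""
    if not docs:
        raise ValueError("at least one document is required")
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(
            f"Question: {question}\nExisting answer: {answer}\n"
            f"Refine the answer using this extra context:\n{doc}"
        )
    return answer


calls = []


def fake_llm(prompt):
    """Hypothetical stand-in for a chat model: records the prompt, returns a label."""
    calls.append(prompt)
    return f"answer-{len(calls)}"


docs = ["chunk A", "chunk B", "chunk C"]

stuff_result = stuff_answer(fake_llm, "Who wrote it?", docs)
stuff_calls = len(calls)

calls.clear()
refine_result = refine_answer(fake_llm, "Who wrote it?", docs)
refine_calls = len(calls)

print(f"stuff: {stuff_calls} model call, refine: {refine_calls} model calls")
```

The trade-off is visible in the call counts: "stuff" needs a single model call but every chunk must fit into one prompt, whereas "refine" makes one call per chunk and can therefore work through more text at the cost of extra calls.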
Here are some of the other questions we asked BluetickEBOOK about The Monkey's Paw, along with its answers:
query = "What is the monkey's paw?"
docs = vectorstore.similarity_search(query, k=3)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="refine",
    retriever=vectorstore.as_retriever()
)
output = qa.run(query)
print(output)
The monkey's paw is a magical object mentioned in the short story "The Monkey's Paw" by W.W. Jacobs. It is introduced by a visitor, Sergeant-Major Morris, who describes it as a dried-up paw of a monkey that has the power to grant three wishes to the person who possesses it. The paw is initially dismissed as nonsense by the family of the protagonist, Mr. White, but they become increasingly curious about it. After Mr. White acquires the paw from Sergeant-Major Morris, he makes a wish for two hundred pounds, and the wish is granted in a way that brings about tragic consequences. The paw is a central element of the story and serves as a cautionary tale about the dangers of greed and the unintended consequences of making wishes.
query = "Where did Sergeant-Major Morris find the Monkey's paw?"
docs = vectorstore.similarity_search(query, k=3)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="refine",
    retriever=vectorstore.as_retriever()
)
output = qa.run(query)
print(output)
According to the given context, Sergeant-Major Morris obtained the Monkey's paw from an old fakir who put a spell on it, allowing three separate men to each have three wishes from it. The first man had his three wishes, and the third wish was for death. The sergeant-major obtained the paw after that wish was granted. He had considered selling it but decided against it due to the mischief it had already caused. He ultimately threw it into the fire. However, there is no information about where he found the Monkey's paw.
query = "What is the moral of the story?"
docs = vectorstore.similarity_search(query, k=3)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)
output = qa.run(query)
print(output)
The moral of the story is that one should be careful what they wish for, as the consequences of their wishes may not be what they expect and can lead to unforeseen and tragic outcomes.
query = "Provide the summary of this story"
docs = vectorstore.similarity_search(query, k=20)
qa = RetrievalQA.from_chain_type(
    llm=llm,  # reuse the ChatOpenAI model defined earlier
    chain_type="refine",
    retriever=vectorstore.as_retriever()
)
output = qa.run(query)
print(output)
The story is about an old couple who come into possession of a monkey's paw that grants three wishes. The first wish brings tragedy upon them, and they bury their son. Later, a visitor comes to their home to inform them that their son was caught in machinery and has died. The old woman becomes obsessed with using the paw to bring him back to life and eventually convinces her husband to use the second wish. However, their decision leads to a terrifying consequence, and they realize that they should have left their son to rest in peace. In the end, the old man finds the monkey's paw and frantically makes his third and final wish just as their son, who has become a terrifying corpse, knocks on the door. The knocking stops, and the old couple hears their son's long, loud wail of disappointment and misery. The story highlights the consequences of greed and the dangers of meddling with fate.
"Generative AI and eBooks: Where fiction meets friction and imagination gets algorithmically adventurous!"