We completed two major projects during the Machine Learning course:
- Exploring machine learning models for numeric problem statements. We chose to address the real-world problem of homelessness. See the Jupyter notebook NumericProject_HomelessnessPrediction.ipynb.
- Exploring LLMs and related libraries and tools. See the notebook TeamDuos_NLPProblemStatements.ipynb. More information is given below.
Performed by Team Members:
- Sakshi Kekre
- Poojan Shah
Decoding Kipling: NLP Assignments in Language Wizardry
This document summarizes the Natural Language Processing (NLP) assignments completed during the second half of our course. The tasks involved scraping poems by several poets and articles related to them, then processing and visualizing the text to uncover facts and insights.
Part-Of-Speech tagging, Tokenization, Topic Modeling, Fine-Tuning, Retrieval Augmentation, Knowledge Graph, Poem
Link: NLP POS Substitutions and Tones for Poets Notebook
2. Gold Standard identification and comparison for Pushcart vs. Sample Short Text using sentiments, topics, POS, and KGs (Week 9)
Link: Gold Standard Notebook
Link: Fine-tune LLM Notebook
Link: Retrieval Augmented Generation Notebook
Link: Mining News Articles Notebook
To explore the beauty of poetry and dive into its linguistic nuance through NLP techniques.
Poetry of Rudyard Kipling and Alfred Lord Tennyson.
- Scraped poems from allpoetry.com using the BeautifulSoup library.
- Explored POS and embeddings to identify stylistic similarities between poets.
- Transposed POS between poets based on semantic similarity.
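
The POS transposition step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `transpose_pos`, the tag labels, the sample line, the donor vocabulary, and the two-dimensional "embeddings" are all hypothetical toy values. In practice the tags would come from a POS tagger and the vectors from a trained embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transpose_pos(tagged_line, donor_vocab, embeddings, target_tag):
    """Swap each word tagged `target_tag` for the most similar donor word with the same tag."""
    result = []
    for word, tag in tagged_line:
        # Only donor words that have an embedding can be compared.
        candidates = [c for c in donor_vocab.get(tag, []) if c in embeddings]
        if tag == target_tag and word in embeddings and candidates:
            word = max(candidates, key=lambda c: cosine(embeddings[word], embeddings[c]))
        result.append(word)
    return result


# Hypothetical toy data: a tagged line from one poet and a vocabulary from another.
kipling_line = [("the", "DET"), ("jungle", "NOUN"), ("sleeps", "VERB")]
tennyson_vocab = {"NOUN": ["forest", "sea"], "VERB": ["rests"]}
toy_embeddings = {
    "jungle": [0.9, 0.1],
    "forest": [0.8, 0.2],
    "sea": [0.1, 0.9],
    "sleeps": [0.5, 0.5],
    "rests": [0.5, 0.4],
}

new_line = transpose_pos(kipling_line, tennyson_vocab, toy_embeddings, "NOUN")
print(" ".join(new_line))
```

Restricting the swap to a single tag keeps the grammatical skeleton of the original line intact while borrowing the other poet's word choices.
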
Summarization, topic modeling, and insights from BERTopic.
Evaluating poetic similarity and condensing text for clarity.
Utilize NLP techniques to compare Pushcart-nominated poems with non-nominated ones.
Datasets of Pushcart-nominated and non-nominated poems.
- Web scraping for data collection.
- Plotting POS distributions and generating knowledge graphs.
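
As a rough sketch of the comparison behind the POS distribution plots, the snippet below turns tagged tokens into relative tag frequencies and reports the per-tag difference between two groups. The helper names and the tiny tagged samples are hypothetical. The real analysis would tag the scraped poems with an NLP library and plot the resulting frequencies.

```python
from collections import Counter


def pos_distribution(tagged_tokens):
    """Return the relative frequency of each POS tag."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def distribution_gap(dist_a, dist_b):
    """Per-tag difference (a minus b) over the union of tags in both distributions."""
    tags = sorted(set(dist_a) | set(dist_b))
    return {tag: dist_a.get(tag, 0.0) - dist_b.get(tag, 0.0) for tag in tags}


# Hypothetical tagged samples standing in for the two groups of poems.
nominated = [("river", "NOUN"), ("bends", "VERB"), ("slow", "ADJ"), ("light", "NOUN")]
non_nominated = [("i", "PRON"), ("walk", "VERB"), ("home", "NOUN"), ("again", "ADV")]

gap = distribution_gap(pos_distribution(nominated), pos_distribution(non_nominated))
for tag, diff in gap.items():
    print(f"{tag}: {diff:+.2f}")
```

Using relative rather than absolute counts keeps the comparison fair when the two groups differ in size.
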
POS distributions, knowledge graph visualization, and narrative insights.
Distinctions between nominated and non-nominated poems.
Understand the fine-tuning process of a Language Model and apply it to generate content in a poet's style.
Poems of Rudyard Kipling.
- Fine-tuning the model with Kipling's poems.
- Training the model and evaluating results.
Generated poems from the fine-tuned model.
Model performance and limitations.
Improve the quality of responses from a fine-tuned model using retrieval-augmented generation (RAG).
Fine-tuned model from the previous assignment.
- Implementing RAG on the model with supplemental information.
- Comparing responses with and without RAG.
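
The retrieval half of the RAG comparison can be illustrated with a small sketch. It assumes a plain bag-of-words cosine ranking and a simple prompt template. The passages, the question, and the helper names `retrieve` and `build_prompt` are hypothetical, and the call to the fine-tuned model is deliberately left out. The notebook may well use a dedicated retriever or vector store instead.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = bow(question)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]


def build_prompt(question, context=None):
    """Build a prompt, prepending retrieved context when it is supplied."""
    if not context:
        return f"Question: {question}\nAnswer:"
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


# Hypothetical supplemental passages and question.
passages = [
    "Rudyard Kipling was born in Bombay in 1865.",
    "Alfred Lord Tennyson was Poet Laureate of the United Kingdom.",
]
question = "Where was Kipling born?"

plain_prompt = build_prompt(question)                                # without RAG
rag_prompt = build_prompt(question, retrieve(question, passages))    # with RAG
print(rag_prompt)
```

Sending both prompts to the same model gives the side-by-side responses that the comparison relies on.
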
Responses from the model with and without RAG.
Impact of RAG on model responses.
Determine the similarity between a poem and an article using knowledge graphs.
A poem by Rudyard Kipling and a related article.
- Scraping and preprocessing data.
- Generating and analyzing knowledge graphs.
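
One simple way to compare two knowledge graphs is sketched below. It assumes each graph is a list of (subject, relation, object) triples and that similarity is measured as the cosine between node-frequency vectors. The triples are hypothetical, and the notebook may compute its similarity differently (for example, over embeddings of the graph text).

```python
import math
from collections import Counter


def node_vector(triples):
    """Count how often each entity appears as a subject or object."""
    counts = Counter()
    for subj, _relation, obj in triples:
        counts[subj] += 1
        counts[obj] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[n] * b[n] for n in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical triples extracted from a poem and from a related article.
poem_graph = [("man", "keeps", "head"), ("man", "trusts", "himself")]
article_graph = [("poem", "advises", "man"), ("man", "keeps", "composure")]

similarity = cosine(node_vector(poem_graph), node_vector(article_graph))
print(f"graph similarity: {similarity:.3f}")
```

This variant ignores relation labels, so two graphs that mention the same entities score highly even if they connect those entities differently.
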
Visualization of knowledge graphs and cosine similarity analysis.
Effectiveness of knowledge graphs in determining similarity.