
SakshiKekre/AI-ML-PythonProjects


Two major projects were completed during the Machine Learning course:

  1. Exploring machine learning models for numeric problem statements. We chose to tackle a real-world problem: homelessness. See the Jupyter notebook NumericProject_HomelessnessPrediction.ipynb.

  2. Exploring LLMs and related libraries and tools. See the notebook TeamDuos_NLPProblemStatements.ipynb; more details are given below.

Natural Language Processing Projects

Performed by Team Members:

  • Sakshi Kekre
  • Poojan Shah

Project Title:

Decoding Kipling: NLP Assignments in Language Wizardry

Project Description:

This document summarizes the Natural Language Processing (NLP) assignments completed during the second half of our course. The tasks involved scraping poems and articles written by, or related to, several poets, then processing and visualizing that text to uncover facts and insights.

Keywords:

Part-of-Speech Tagging, Tokenization, Topic Modeling, Fine-Tuning, Retrieval-Augmented Generation, Knowledge Graphs, Poetry

Assignments Performed:

1. NLP POS Substitutions and Tones for Poets (Week 7)

Link: NLP POS Substitutions and Tones for Poets Notebook

2. Gold Standard identification and comparison for Pushcart vs. Sample Short Text using sentiments, topics, POS and KGs (Week 9)

Link: Gold Standard Notebook

3. Fine-tune an LLM for your Poet (Week 10)

Link: Fine-tune LLM Notebook

4. Retrieval Augmented Generation (Week 11)

Link: Retrieval Augmented Generation Notebook

5. Mining News Articles and Assembling a Knowledge Graph (Week 13)

Link: Mining News Articles Notebook

1. NLP POS Substitutions and Tones for Poets

Objective:

To explore the beauty of poetry and examine its linguistic nuances through NLP techniques.

Input:

Poetry of Rudyard Kipling and Alfred, Lord Tennyson.

Experiments:

  • Scraped poems from allpoetry.com using the BeautifulSoup library.
  • Explored POS tags and embeddings to identify stylistic similarities between the poets.
  • Substituted words of the same POS between the poets' poems, choosing replacements by semantic similarity.
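The POS substitution step can be sketched as follows. This is a minimal, hypothetical illustration and not the notebook's code. The tagged tokens are hand-written; in practice they would come from a tagger such as NLTK's `nltk.pos_tag`. The three-dimensional `EMBEDDINGS` table is made up and stands in for real pretrained word vectors. Each noun in one poet's text is replaced by whichever noun in the other poet's vocabulary has the highest cosine similarity.

```python
import math

# Hypothetical toy embeddings; a real run would use pretrained vectors.
EMBEDDINGS = {
    "sea":    [0.9, 0.1, 0.0],
    "ocean":  [0.85, 0.15, 0.05],
    "jungle": [0.1, 0.9, 0.2],
    "forest": [0.15, 0.85, 0.25],
    "king":   [0.2, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def substitute(tagged_source, target_vocab, pos="NN"):
    """Replace each source word tagged `pos` with the most similar word
    carrying the same tag in the other poet's (word, tag) vocabulary.
    Words without an embedding, or with a different tag, are kept."""
    candidates = [w for w, t in target_vocab if t == pos and w in EMBEDDINGS]
    out = []
    for word, tag in tagged_source:
        if tag == pos and word in EMBEDDINGS and candidates:
            best = max(candidates, key=lambda c: cosine(EMBEDDINGS[word], EMBEDDINGS[c]))
            out.append(best)
        else:
            out.append(word)
    return out

# Hypothetical hand-tagged inputs (Penn Treebank tags).
kipling = [("the", "DT"), ("jungle", "NN"), ("is", "VBZ"), ("dark", "JJ")]
tennyson = [("forest", "NN"), ("ocean", "NN"), ("king", "NN"), ("sleeps", "VBZ")]
print(substitute(kipling, tennyson))  # ['the', 'forest', 'is', 'dark']
```

Restricting candidates to the same tag keeps the substituted line grammatical. The embedding similarity then decides which same-POS word fits best semantically.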

Output:

Summarization, topic modeling, and insights from BERTopic.

Insights/Conclusion:

Evaluating poetic similarity and condensing text for clarity.

2. Gold Standard for Pushcart Poems

Objective:

Use NLP techniques to compare Pushcart-nominated poems with non-nominated ones.

Input:

Datasets of Pushcart-nominated and non-nominated poems.

Experiments:

  • Web scraping for data collection.
  • Plotting POS distributions and generating knowledge graphs.
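Comparing POS distributions comes down to normalized tag counts. This is a minimal sketch that assumes each poem has already been tagged into `(word, tag)` pairs. The two tiny tagged samples are hypothetical, and the plotting step is left out.

```python
from collections import Counter

def pos_distribution(tagged_tokens):
    """Relative frequency of each POS tag in a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def compare(dist_a, dist_b):
    """Per-tag difference (a - b) over the union of tags, sorted by tag;
    a positive value means the tag is relatively more frequent in a."""
    tags = sorted(set(dist_a) | set(dist_b))
    return {t: dist_a.get(t, 0.0) - dist_b.get(t, 0.0) for t in tags}

# Hypothetical hand-tagged samples.
nominated = [("grief", "NN"), ("folds", "VBZ"), ("quiet", "JJ"), ("light", "NN")]
sample = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("fast", "RB")]
print(compare(pos_distribution(nominated), pos_distribution(sample)))
# {'DT': -0.25, 'JJ': 0.25, 'NN': 0.25, 'RB': -0.25, 'VBZ': 0.0}
```

Normalizing by token count lets poems of different lengths be compared directly. The per-tag differences are what a bar chart of the two distributions would show.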

Output:

POS distributions, knowledge graph visualizations, and narrative insights.

Insights/Conclusion:

Distinctions between nominated and non-nominated poems.

3. Fine-tune an LLM for your Poet

Objective:

Understand the process of fine-tuning a language model and apply it to generate content in a poet's style.

Input:

Poems of Rudyard Kipling.

Experiments:

  • Fine-tuning the model with Kipling's poems.
  • Training the model and evaluating results.
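The README does not describe the exact training setup. When fine-tuning a causal language model on a small poetry corpus, a common preprocessing step is to concatenate the tokenized poems and cut them into fixed-length blocks. The labels are set equal to the inputs, because causal-LM heads such as those in Hugging Face Transformers shift the labels internally. This is a minimal sketch of that step. `toy_tokenize` is a hypothetical whitespace tokenizer that stands in for the model's real tokenizer.

```python
def toy_tokenize(poems, eos="<eos>"):
    """Hypothetical whitespace tokenizer: maps each word to an integer id
    and appends an end-of-poem marker so poem boundaries survive concatenation."""
    vocab = {}
    ids = []
    for poem in poems:
        words = poem.split() + [eos]
        ids.append([vocab.setdefault(w, len(vocab)) for w in words])
    return ids, vocab

def group_into_blocks(token_ids_per_poem, block_size):
    """Concatenate tokenized poems and split them into equal-length blocks.
    The trailing remainder shorter than block_size is dropped."""
    flat = [tok for poem in token_ids_per_poem for tok in poem]
    usable = (len(flat) // block_size) * block_size
    blocks = [flat[i:i + block_size] for i in range(0, usable, block_size)]
    return [{"input_ids": b, "labels": list(b)} for b in blocks]

ids, vocab = toy_tokenize(["If you can keep your head", "when all about you"])
blocks = group_into_blocks(ids, block_size=4)
print([b["input_ids"] for b in blocks])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 1, 6]]
```

Packing text into full blocks avoids wasting compute on padding, which matters when the corpus is only a few dozen poems.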

Output:

Generated poems from the fine-tuned model.

Insights/Conclusion:

Model performance and limitations.

4. Retrieval Augmented Generation

Objective:

Improve the quality of responses from a fine-tuned model using retrieval-augmented generation (RAG).

Input:

Fine-tuned model from the previous assignment.

Experiments:

  • Implementing RAG on the model with supplemental information.
  • Comparing responses with and without RAG.
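The retrieval half of RAG can be sketched as follows. This is a minimal illustration and not the notebook's retriever. It ranks passages by TF-IDF cosine similarity, a simple stand-in for the embedding-based retrieval that RAG pipelines often use, and pastes the top passages into a prompt. The passages and the prompt template are hypothetical. The call to the fine-tuned model is omitted.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf_vectors(docs):
    """Sparse TF-IDF vector per document plus the smoothed IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    vecs, idf = tf_idf_vectors(passages)
    # Query terms never seen in the passages get zero weight.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(passages)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [passages[i] for i in ranked[:k]]

def build_prompt(question, passages, k=2):
    """Hypothetical prompt template: retrieved context followed by the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {question}\nAnswer:"

PASSAGES = [
    "Kipling was born in Bombay in 1865.",
    "The Jungle Book was published in 1894.",
    "Kipling received the Nobel Prize in Literature in 1907.",
]
print(retrieve("When did Kipling win the Nobel Prize?", PASSAGES, k=1))
# ['Kipling received the Nobel Prize in Literature in 1907.']
```

To compare responses with and without RAG, send the same question to the model twice: once as-is and once wrapped by `build_prompt`.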

Output:

Responses from the model with and without RAG.

Insights/Conclusion:

Impact of RAG on model responses.

5. Mining News Articles and Assembling a Knowledge Graph

Objective:

Determine the similarity between a poem and an article using knowledge graphs.

Input:

Poem by Rudyard Kipling and related article.

Experiments:

  • Scraping and preprocessing data.
  • Generating and analyzing knowledge graphs.
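The README does not say which tools were used to scrape the articles. This is a minimal preprocessing sketch that relies only on the standard library's `html.parser`, so it runs without extra dependencies; with BeautifulSoup, `soup.find_all("p")` is the usual shorthand. It keeps the text inside `<p>` elements and collapses whitespace. Fetching the page, for example with `urllib.request.urlopen`, is omitted, and the sample HTML is hypothetical.

```python
import re
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            # Collapse runs of whitespace and drop empty paragraphs.
            text = re.sub(r"\s+", " ", "".join(self._buf)).strip()
            if text:
                self.paragraphs.append(text)
            self.in_p = False

    def handle_data(self, data):
        # Text from nested inline tags (e.g. <b>) is kept while inside <p>.
        if self.in_p:
            self._buf.append(data)

def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs

SAMPLE_HTML = ("<html><body><h1>Title</h1><p>Kipling  wrote <b>If</b>.</p>"
               "<script>x=1</script><p> </p><p>Second\nparagraph.</p></body></html>")
print(extract_paragraphs(SAMPLE_HTML))  # ['Kipling wrote If.', 'Second paragraph.']
```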

Output:

Visualization of knowledge graphs and cosine similarity analysis.
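The README states that cosine similarity was used but not how the graphs were turned into vectors. One simple option is sketched below, under that assumption. Each graph is treated as a list of (subject, relation, object) triples and reduced to a bag of node and edge labels, and the two count vectors are then compared with cosine similarity. The triples are hypothetical. Extracting them from the poem and the article (for example with a dependency parser) is not shown.

```python
import math
from collections import Counter

def graph_features(triples):
    """Bag-of-labels features for a knowledge graph of (subject, relation, object)
    triples: every node and relation occurrence adds 1 to its label's count."""
    feats = Counter()
    for subj, rel, obj in triples:
        feats["node:" + subj.lower()] += 1
        feats["node:" + obj.lower()] += 1
        feats["edge:" + rel.lower()] += 1
    return feats

def cosine_similarity(a, b):
    """Cosine similarity between two Counter feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical triples for a poem and a related article.
poem_kg = [("soldier", "serves", "queen"), ("soldier", "fights_in", "war")]
article_kg = [("soldier", "fights_in", "war"), ("war", "ends_in", "treaty")]
print(round(cosine_similarity(graph_features(poem_kg), graph_features(article_kg)), 3))  # 0.625
```

This bag-of-labels view ignores which nodes each edge connects. Comparing whole triples instead would be stricter, since the score would then reward shared facts rather than only shared vocabulary.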

Insights/Conclusion:

Effectiveness of knowledge graphs in determining similarity.
