GitHub - siddhi1201-dev/DSA

Plagiarism Detector in C (Using Custom Hashing)

*Overview
This is a lightweight plagiarism detection tool written in C. It compares multiple text files and identifies similarities using a custom hashing technique. Instead of a standard hashing algorithm, it uses a simple, efficient bit-shift-based hash function (e.g., using << 5) to generate hash values for words or n-grams. This lets it detect overlapping content quickly and with low computational overhead.
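The README does not show the hash function itself. As a minimal sketch, assuming it resembles the well-known djb2 string hash (which multiplies by 33 via a left shift by 5 plus an add), a token hash could look like the following. The name hash_token and the seed 5381 are illustrative assumptions, not taken from project.c.

```c
/* Hypothetical helper: a djb2-style string hash. The README only says the
 * hash uses a left shift by 5; the seed 5381 and the "+ hash" step that
 * turns the shift into a multiply-by-33 are borrowed from djb2 (assumed). */
unsigned long hash_token(const char *s)
{
    unsigned long hash = 5381; /* djb2 starting value */
    int c;
    while ((c = (unsigned char)*s++) != 0)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}
```

The shift-and-add form avoids an explicit multiplication and mixes every character into the result. Identical tokens always produce identical hashes, which is what the comparison step relies on.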

*How It Works
- Reads each input file and tokenizes the text into words or n-grams of adjustable size.
- Applies a custom hash function that uses bitwise operations (such as shifting bits left by 5) to convert tokens into hash values.
- Stores these hashes in a hash table to track occurrences across documents.
- Compares documents by counting the number of shared hashes and calculating a similarity score from the overlapping content.
- Outputs similarity scores that help identify potential plagiarism or content reuse.
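The steps above can be sketched as follows. This is a hypothetical illustration, not the code in project.c: it hashes single lowercase words, stores the distinct hashes in a fixed-size open-addressing table, and, because the README does not give the exact formula, scores similarity as Jaccard overlap (shared hashes divided by the size of the union). The names TABLE_SIZE, HashSet, set_add, add_words and similarity are all made up for this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 1024 /* hypothetical fixed capacity; must be a power of two */

/* Hypothetical set of token hashes (open addressing, linear probing). */
typedef struct {
    unsigned long slots[TABLE_SIZE]; /* stored token hashes */
    unsigned char used[TABLE_SIZE];  /* 1 if the slot holds a hash */
    size_t count;                    /* number of distinct hashes stored */
} HashSet;

/* Assumed djb2-style hash: hash * 33 + c, computed as (hash << 5) + hash. */
unsigned long hash_token(const char *s)
{
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*s++) != 0)
        hash = ((hash << 5) + hash) + c;
    return hash;
}

void set_init(HashSet *set) { memset(set, 0, sizeof *set); }

/* Adds h; returns 1 if newly added, 0 if already present or the table is full. */
int set_add(HashSet *set, unsigned long h)
{
    size_t i = (size_t)(h & (TABLE_SIZE - 1));
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        if (!set->used[i]) {
            set->used[i] = 1;
            set->slots[i] = h;
            set->count++;
            return 1;
        }
        if (set->slots[i] == h)
            return 0;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}

/* Returns 1 if h is stored in the set, 0 otherwise. */
int set_contains(const HashSet *set, unsigned long h)
{
    size_t i = (size_t)(h & (TABLE_SIZE - 1));
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        if (!set->used[i])
            return 0;
        if (set->slots[i] == h)
            return 1;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}

/* Splits text into lowercase alphanumeric words (truncated to 63 chars)
 * and adds each word's hash to the set. */
void add_words(HashSet *set, const char *text)
{
    char word[64];
    size_t len = 0;
    for (const char *p = text;; p++) {
        unsigned char c = (unsigned char)*p;
        if (c != '\0' && isalnum(c)) {
            if (len < sizeof word - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {
                word[len] = '\0';
                set_add(set, hash_token(word));
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
}

/* Jaccard similarity: shared / (|A| + |B| - shared); 0.0 for two empty sets. */
double similarity(const HashSet *a, const HashSet *b)
{
    size_t shared = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (a->used[i] && set_contains(b, a->slots[i]))
            shared++;
    size_t union_size = a->count + b->count - shared;
    return union_size == 0 ? 0.0 : (double)shared / (double)union_size;
}
```

With this scoring, "The cat sat on the mat" and "the cat sat" share 3 of 5 distinct words, giving 0.6. Because only hashes are compared, two different words whose hashes collide would be counted as shared, and this fixed-size table silently drops new tokens once it is full; a fuller implementation would resize it.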

*Features
- Fast, lightweight file comparison using a custom bit-shift-based hashing method.
- Adjustable n-gram size for fine-tuning similarity detection.
- Simple command-line interface.
- Clear similarity scores for each pair of files.
- Scales efficiently to multiple files.
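Adjustable n-gram size can be illustrated by hashing every window of n consecutive words instead of single words. The sketch below is an assumption about how this might work, not the repository's code: split_words, hash_ngrams, hash_extend, MAX_WORDS and MAX_WORD_LEN are hypothetical names, and the djb2-style mixing step is assumed. Each window's words are joined with a space before hashing, so n = 1 reduces to plain word hashing and larger n only matches longer runs of identical wording.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_WORDS 256   /* hypothetical limit on words per document */
#define MAX_WORD_LEN 32 /* hypothetical limit; longer words are truncated */

/* Continues an assumed djb2-style hash (hash * 33 + c) over the bytes of s. */
unsigned long hash_extend(unsigned long hash, const char *s)
{
    int c;
    while ((c = (unsigned char)*s++) != 0)
        hash = ((hash << 5) + hash) + c;
    return hash;
}

/* Splits text into lowercase alphanumeric words; returns the word count
 * (at most max_words; extra words are ignored). */
size_t split_words(const char *text, char words[][MAX_WORD_LEN], size_t max_words)
{
    size_t count = 0, len = 0;
    for (const char *p = text;; p++) {
        unsigned char c = (unsigned char)*p;
        if (c != '\0' && isalnum(c)) {
            if (count < max_words && len < MAX_WORD_LEN - 1)
                words[count][len++] = (char)tolower(c);
        } else {
            if (len > 0) {
                words[count][len] = '\0';
                count++;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return count;
}

/* Writes one hash per window of n consecutive words into out (which must
 * hold count - n + 1 entries); returns the number of n-grams, or 0 if
 * n is 0 or the text has fewer than n words. */
size_t hash_ngrams(char words[][MAX_WORD_LEN], size_t count, size_t n,
                   unsigned long *out)
{
    if (n == 0 || count < n)
        return 0;
    for (size_t i = 0; i + n <= count; i++) {
        unsigned long h = 5381;
        for (size_t j = 0; j < n; j++) {
            if (j > 0)
                h = hash_extend(h, " "); /* separator: "ab c" != "a bc" */
            h = hash_extend(h, words[i + j]);
        }
        out[i] = h;
    }
    return count - n + 1;
}
```

For "To be, or not to be" with n = 2 this yields five bigram hashes, and the first and last ("to be") are equal. A larger n makes matches stricter and reduces false positives from common words, at the cost of missing lightly reworded passages.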

Repository files (5 commits):
- README.md
- mainproject.c
- project.c
- project.h
