MStream

Here is the code of MStream and MStreamF. The code is written by python3.6 with no dependence.

Usage

Modify the hyper-parameters in main.py.
Run code with python main.py.

File Structure

/data/: We provide two datasets: News, Tweets and two processed datasets: News-T, Tweets-T. The datasets are in format of JSON like follows:

{"Id": "006218", "clusterNo": 8, "textCleaned": "scottish independence alistair carmichael issue complacency warning"}
{"Id": "000071", "clusterNo": 40, "textCleaned": "doggie scofflaw bagged dna testing"}

/result/ The outputs will be in the result folder and the result of each batch is saved separately in each folder.
- ClusteringResult.txt: You can check the clustering results of each folder which are in format like follows: (The former is ID of each document and the latter is predicted cluster of each document.)
```
    006218 31
    000071 2
```
- PhiWordsInTopics.txt: You can check the topic words of each folder which are in format like follows:
```
    Topic 0:
        pandemic	0.03410629654705485
        captured	0.03410629654705485
        unsure	0.025643195666892347
        higher	0.025643195666892347
        thought	0.025643195666892347
    Topic 2:
        favorite	0.03323826239578762
        nj	0.03323826239578762
        globe	0.02226853883282141
        won	0.02226853883282141
        seduced	0.02226853883282141
```
- SizeOfEachCluster.txt: You can check the size of each cluster of each folder which are in format like follows: (The former is ID of each cluster and the latter is document number of each cluster.)
```
    0:9,
    2:5,
```

Citation

@inproceedings{yin2018model,
  title={Model-based clustering of short text streams},
  author={Yin, Jianhua and Chao, Daren and Liu, Zhongkun and Zhang, Wei and Yu, Xiaohui and Wang, Jianyong},
  booktitle={Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery \& data mining},
  pages={2634--2642},
  year={2018}
}

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
MStream		MStream
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MStream

Usage

File Structure

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MStream

Usage

File Structure

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages