Welcome to the
Data Mining Lab



About

We conduct innovative research on all aspects of knowledge discovery and data mining, ranging from theoretical foundations to novel models and algorithms for data mining problems in science, business, medicine, and engineering.

Lab Mandate
  • Conduct basic research and development in the area of knowledge discovery and data mining.
  • Advance the "science" of knowledge discovery and data mining by supporting training programs and computer science courses.
  • Equip students with both theoretical knowledge and practical experience in the area of knowledge discovery and data mining.
  • Provide an environment for students that fosters the exchange of ideas and collaborations with industry and academic partners, so they can grow as scientists and researchers.
Research Areas
  • data mining
  • machine learning
  • graph mining
  • natural language processing
  • big data analytics
  • knowledge discovery
  • data visualization
  • data mining applications

People

The Data Mining Lab's achievements are the direct result of the talent and dedication of our people.

Faculty Members

Aijun An

Professor

Nick Cercone

(Late) Professor

Manos Papagelis

Assistant Professor

Current and Former Postdoctoral Fellows (PDF)
Doctoral Students/Candidates (PhD)
Master's Students (MSc)
Undergraduate research assistants/interns
  • Mahmoud Alsaeed (scalable analytics of object intersection problems, SU19)
  • Zhiyuan Cao (dynamic network representation learning, SU19)
  • Kenneth Tjhia (trajectory and network representation learning, SU19)
Staff / Research Assistants
  • Bon Ryu (distributed deep learning, machine learning)
Alumni (in reverse chronological order)
  • Heidar Davoudi (PhD, 2018, User Acquisition and Engagement in Digital News Media)
  • Nima Shahbazi (PhD, 2018, Discovery and Effective Use of Frequent Item-sets and Association Rules in Datasets (co-supervised with Jarek Gryz))
  • Emad Gohari (MSc, 2018, Interactive Question Answering Using Frame-based Knowledge Representation)
  • Forouq Khonsari (MSc, 2018, Mining Large-Scale News Articles for Predicting Forced Migration via Machine Learning Techniques)
  • Ameeta Agrawal (PhD, 2018, Enriching Affect Analysis through Emotion and Sarcasm Detection)
  • Morteza Zihayat (PhD, 2016, Mining High Utility Patterns over Data Streams)
  • Yan (Jason) Chen (MSc, 2015, Approximate Parallel High Utility Itemset Mining)
  • Elnaz Delpisheh (PhD, 2015, Extending Topic Models with Syntax and Semantics Relationships)
  • Mehdi Kargar (PhD, 2013, Keyword Search in Graphs, Relational Databases and Social Networks)
  • Martin Dimkovski (MSc, 2012, A Novel Computational Model of Neocortical Columns with Glia as Learning Agent)
  • Ameeta Agrawal (MSc, 2011, Unsupervised Emotion Detection from Text Using Semantic and Syntactic Relations)
  • Bahareh Sarrafzadeh (MSc, 2011, Cross-lingual Word Sense Disambiguation for Languages with Scarce Resources)
  • Hashmat Rohian (MSc, 2011, Discovering Temporal Associations among Significant Changes (co-supervised with Jimmy Huang))
  • Damon Sotoudeh-Hosseini (MSc, 2010, Detecting Partial Drifts Using a Rule Induction Framework)
  • Qian Wan (PhD, 2009, Contrast and Compact Data Mining: Discovering Novel and Useful Patterns from Large Databases)
  • Miro Kuc (MSc, 2009, Cluster Validation Indices: Sensitivity to Distance between Clusters and Affinity to Concurrency)
  • Yang Liu (PhD, 2009, Review Mining from Online Media (co-supervised with Jimmy Huang))
  • Vlad Gerchikov (MSc, 2008, AV-Space with Paging and Performance Comparison)
  • Qinsong Yao (PhD, 2006, Discovering and Using Database User Access Patterns)
  • Bill Andreopoulos (PhD, 2006, Clustering Algorithms for Categorical Data (co-supervised with Steven Wang))
  • Linyan Wang (MSc, 2006, AV-Space for Efficiently Learning Classification Rules from Large Data Sets)
  • Yu Li (MSc, 2005, Integrating XML Data for Virtual OLAP using XML Schemas and UML)
  • Yang Liu (MSc, 2004, Markov Model-based Methods for Web User Clustering and Surfing Recommendation (co-supervised with Jimmy Huang))
  • Ying Zou (MSc, 2004, A Comparison and Selection of Methods for Handling Missing Data in Data Mining)
  • Zhirong Tao (MSc, 2004, Scalable-CLUES: A Scalable Non-parametric Clustering Method Based on Local Shrinking)
  • Qian Wan (MSc, 2003, Efficient Mining of Indirect Associations Using HI-mine)
  • Leah Spo (MSc)
  • Xiangdong An (PhD)
  • Serene Wong (PhD)
  • Kayvan Tirdad (PhD Candidate)
Join Us!

We are looking for bright and hard-working domestic or international students at all levels (Postdoc, PhD, MSc, Senior undergrad). If you are interested in conducting research in the area of data mining and machine learning, we would love to hear from you.

Research

Our research builds upon a foundation of academic and industry collaborations that aim to transfer knowledge and have a global impact on academia and industry.

Active

Reinforcement Learning

Active

Deep Learning

Active

Text Mining

Active

Streaming Graph Mining

Active

Trajectory Data Mining

Active

Network Representation Learning

Publications (2010 - present)

Our research is published in peer-reviewed conferences and journals, ensuring that the impact of our work reaches the data mining community at large.

2019
  • A Versatile Computational Framework for Group Pattern Mining of Pedestrian Trajectories. A. Sawas, A. Abuolaim, M. Afifi, M. Papagelis. GeoInformatica (Vol. X, No. X, 2019)
  • A Utility-based News Recommendation System. M. Zihayat, A. Ayanso, X. Zhao, H. Davoudi and A. An. Decision Support Systems, Vol. 117, February 2019, pp.14-27. (DSS).
  • Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning. X. Zhao, A. An, J. Liu and B. X. Chen. Proceedings of the 39th IEEE International Conference on Distributed Computing Systems (IEEE ICDCS 2019).
  • Content-based Dwell Time Engagement Prediction Model for News Articles. H. Davoudi, A. An and G. Edall. Proceedings of the 17th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019).
2018
  • dynnode2vec: Scalable Dynamic Network Embedding. S. Mahdavi, S. Khoshraftar and A. An. Proceedings of the 2018 IEEE International Conference on Big Data (BigData 2018).
  • Adaptive Paywall Mechanism for Digital News Media. H. Davoudi, A. An, M. Zihayat and G. Edall. Proceedings of the 2018 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2018).
  • Affective Representations for Sarcasm Detection. A. Agrawal and A. An. Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018).
  • Decoupling the Layers in Residual Networks. R. Fok, A. An, Z. Rashidi, X. Wang. Proceedings of the 6th International Conference on Learning Representations (ICLR 2018).
  • Improving Real-time Pedestrian Detection using Adaptive Confidence Thresholding and Inter-Frame Correlation. M. Al-Shatnawi, V. Movahedi, A. Asif, A. An. Proceedings of the 20th IEEE International Workshop on Multimedia Signal Processing (IEEE MMSP 2018).
  • Learning Emotion-enriched Word Representations. A. Agrawal, A. An, M. Papagelis. Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018).
  • Scene Classification in Indoor Environments for Robots using Word Embeddings. B. X. Chen, R. Sahdev, D. Wu, X. Zhao, M. Papagelis, and J. K. Tsotsos. Proceedings of the International Conference on Robotics and Automation 2018 Workshop on Multimodal Robot Perception (ICRA MRP 2018).
  • Tensor Methods for Group Pattern Discovery of Pedestrian Trajectories. A. Sawas, A. Abuolaim, M. Afifi, M. Papagelis. Proceedings of the 19th IEEE International Conference on Mobile Data Management (IEEE MDM 2018, best paper award).
  • Trajectolizer: Interactive Analysis and Exploration of Trajectory Group Dynamics. A. Sawas, A. Abuolaim, M. Afifi, M. Papagelis. Proceedings of the 19th IEEE International Conference on Mobile Data Management (IEEE MDM 2018, demo).
  • EvoNRL: Evolving Network Representation Learning based on Random Walks. F. Heidari, M. Papagelis. Proceedings of the 7th International Conference on Complex Networks and Their Applications (Complex Networks 2018).
  • Fast and Accurate Mining of Node Importance in Trajectory Networks. T. Pechlivanoglou, M. Papagelis. Proceedings of the 6th IEEE International Conference on Big Data (IEEE Big Data 2018).
2017
  • Memory-Adaptive High Utility Sequential Pattern Mining over Data Streams. M. Zihayat, Y. Chen and A. An. Machine Learning, 106(6), 799-836, 2017.
  • Efficiently Mining High Utility Sequential Patterns in Static and Streaming Data. M. Zihayat, C-W. Wu, A. An and V. S. Tseng. Intelligent Data Analysis, Vol.21, No.S1, pp.S103-S135, 2017.
  • Geodesic and Contour Optimization Using Conformal Mapping. R. Fok, A. An and X. Wang. Journal of Global Optimization, 69(1): 23-44 (2017).
  • Mining significant high utility gene regulation sequential patterns. Morteza Zihayat, Heidar Davoudi, Aijun An. BMC Systems Biology 11(6): 109:1-109:14 (2017).
  • Mining Evolving Data Streams with Particle Filters. R. Fok, A. An and X. Wang. Computational Intelligence, 33(2): 147-180 (2017).
  • BIM-based Collaborative Design and Socio-technical Analytics of Green Buildings. T. El-Diraby, T. Krijnen, M. Papagelis. Automation in Construction (AiC, Vol. 82, NO. 10, 2017)
  • Contrast Pattern based Collaborative Behavior Recommendation System for Life Improvement. Y. Chen, M. L. Yann, H. Davoudi, J. Choi, A. An and Z. Mei. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, South Korea, May 23-26, 2017.
  • Time-Aware Subscription Prediction Model for User Acquisition in Digital News Media. Heidar Davoudi, Morteza Zihayat and Aijun An. Proceedings of the 2017 SIAM International Conference on Data Mining (SDM'17), Houston, Texas, USA, April 27-29, 2017.
  • Authority-based Team Discovery in Social Networks. Morteza Zihayat, Aijun An, Lukasz Golab, Mehdi Kargar and Jaroslaw Szlichta. Proceedings of the 20th International Conference on Extending Database Technology (EDBT'17), Venice, Italy, March 21-24, 2017.
2016
  • Approximate Parallel High Utility Itemset Mining. Y. Chen and A. An. Big Data Research, Vol. 6: 26-42 (2016). (Source code for PHUI-Miner).
  • Time aware topic based recommender system. E. Delpisheh, A. An, H. Davoudi and E. Gohari. Big Data and Information Analytics (BDIA), Vol. 1, No. 2/3, 261-274, 2016.
  • Top-k Utility-based Gene Regulation Sequential Pattern. M. Zihayat, H. Davoudi, and A. An. Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2016), Shenzhen, China, December 15-18, 2016.
  • Distributed and Parallel High Utility Sequential Pattern Mining. M. Zihayat, Z. Z. Hu, A. An, and Y. Hu. Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), Washington D.C., USA, December 5-8, 2016.
  • Deep Parallelization of Parallel FP-Growth Using Parent-Child MapReduce. A. Makanju, Z. Farzanyar, A. An, N. Cercone, Z. Hu, and Y. Hu. Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), Washington D.C., USA, December 5-8, 2016.
  • Selective Co-occurrences for Word-Emotion Association. A. Agrawal and A. An. Proceedings of the 26th International Conference on Computational Linguistics (COLING'16), Osaka, Japan, December 11-16, 2016.
  • Detecting the Magnitude of Events from News Articles. A. Agrawal, R. Sahdev, H. Davoudi, F. Khonsari, A. An and S. McGrath. Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, October 13-16, 2016.
  • Computational Role of Astrocytes in Bayesian Inference and Probability Distribution Encoding. M. Dimkovski and A. An. Proceedings of the 2016 International Conference on Brain Informatics & Health (BIH'16), Omaha, Nebraska, USA, October 13-16, 2016.
  • Ranking Documents through Stochastic Sampling on Bayesian Network-based Models: A Pilot Study. X. Tan, J. X. Huang and A. An. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'16), Pisa, Italy, July 17-21, 2016. 961-964.
  • Building FP-Tree on the Fly: Single-Pass Frequent Itemset Mining. N. Shahbazi, R. Soltani, J. Gryz, A. An. Proceedings of the 12th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM 2016), New York, United States, July 16-21, 2016. 387-400.
  • Green2.0: Enabling Complex Interactions Between Buildings and People. M. Papagelis, T. F. Krijnen, M. Elshenawy, T. Konomi, R. Fang, T. El-Diraby. Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016)
2015
  • Refining Social Graph Connectivity via Shortcut Edge Addition. M. Papagelis. ACM Transactions on Knowledge Discovery from Data (ACM TKDD, Vol. 10, NO. 2, 2015)
  • A Bayesian Model for Canonical Circuits in the Neocortex for Parallelized and Incremental Learning of Symbol Representations. M. Dimkovski and A. An. Neurocomputing, 149: 1270-1279, 2015.
  • Finding Top-k r-cliques for Keyword Search from Graphs in Polynomial Delay. M. Kargar and A. An. Knowledge and Information Systems (KAIS), 43(2): 249-280, 2015.
  • Mining high utility sequential patterns from evolving data streams. M. Zihayat, C.-W. Wu, A. An, and V. S. Tseng. Proceedings of the Fifth ASE International Conference on Big Data (BigData 2015), Kaohsiung, Taiwan, pages 52:1-52:6, 2015.
  • Ontology-Based Topic Labeling and Quality Prediction. H. Davoudi and A. An. Proceedings of International Symposium on Methodologies for Intelligent Systems (ISMIS 2015), Lyon, France, October 2015.
  • Meaningful Keyword Search in Relational Databases with Large and Complex Schema. M. Kargar, A. An, N. Cercone, P. Godfrey, J. Szlichta and X. Yu. Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE'15), Seoul, Korea, April 13-17, 2015. 411-422.
2014
  • Mining Top-k High Utility Patterns Over Data Streams. M. Zihayat and A. An. Information Sciences, 285, 2014 (IS).
  • Efficient Duplication Free and Minimal Keyword Search in Graphs. M. Kargar, A. An and X. Yu. IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(7): 1657-1669, 2014.
  • Topic Modeling using Collapsed Typed Dependency Relations. E. Delpisheh, A. An. Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2014).
  • MeanKS: Meaningful Keyword Search in Relational Databases with Complex Schema. M. Kargar, A. An, N. Cercone, P. Godfrey, J. Szlichta and X. Yu. Proceedings of the 2014 ACM International Conference on Management of Data (ACM SIGMOD 2014, demo).
  • Two-Phase Pareto Set Discovery for Team Formation in Social Networks. M. Zihayat, M. Kargar and A. An. Proceedings of the 2014 IEEE/WIC/ACM International Conference on Web Intelligence (WI'14), Warsaw, Poland, August 11-14, 2014. 304-311.
2013
  • Sampling Online Social Networks. M. Papagelis, G. Das, N. Koudas. IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE, Vol. 25, NO. 3, Mar 2013)
  • Detection of Malicious and Non-malicious Website Visitors Using Unsupervised Neural Network Learning. D. Stevanovic, N. Vlajic and A. An. Applied Soft Computing, Elsevier, 13(1): 698-708, 2013.
  • Riding the Tide of Sentiment Change: Sentiment Analysis with Evolving Online Reviews. Y. Liu, X. Yu, X. Huang and A. An. World Wide Web Journal, 16(4), 477-496, 2013.
  • Signal detection in genome sequences using complexity based features. M. Kargar, A. An, N. Cercone, K. Tirdad, M. Zihayat. Proceedings of the 12th International Workshop on Data Mining in Bioinformatics (BioKDD 2013), Chicago, IL, USA, August 2013. 25-33.
  • Finding Affordable and Collaborative Teams from a Network of Experts. M. Kargar, M. Zihayat and A. An. Proceedings of the 2013 SIAM International Conference on Data Mining (SDM'13), Austin, Texas, USA, May, 2013. 587-595.
2012
  • Feature Evaluation for Web Crawler Detection with Data Mining Techniques. D. Stevanovic, A. An and N. Vlajic. Expert Systems with Applications, 39(10): 8707-8717, 2012.
  • Mining Online Reviews for Predicting Sales Performance: A Case Study in the Movie Domain. Yu, X., Liu, Y., Huang, X. and An, A. IEEE Transactions on Knowledge and Data Engineering (TKDE), 24(4): 720-734, 2012.
  • Unsupervised Emotion Detection from Text using Semantic and Syntactic Relations. A. Agrawal and A. An. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI'12), Macau, China, December 4-7, 2012. 346-353.
  • Efficient Bi-objective Team Formation in Social Networks. M. Kargar, A. An and M. Zihayat. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD'12), Bristol, U.K., September 2012. 483-498.
  • Efficient Top-k Keyword Search in Graphs with Polynomial Delay. M. Kargar and A. An. Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE'12 demos), Washington D.C, April 1-5, 2012. 1269-1272.
2011
  • Finding Best Evidence for Evidence-based Best Practice Recommendations in Health Care: the Initial Decision Support System Design. N. Cercone, X. An, J. Li, Z. Gu, and A. An. Knowledge and Information Systems: an International Journal (KAIS), Vol.29, No.1, 159-201, 2011.
  • Combining Integrated Sampling with SVM Ensembles for Learning from Imbalanced Datasets. Y. Liu, X. Yu, X. Huang and A. An. Information Processing & Management (IPM), Vol.47, No.4, 617-631, 2011.
  • TeamExp: Top-k Team Formation in Social Networks. M. Kargar and A. An. Proceedings of the Workshops for the 2011 IEEE International Conference on Data Mining (ICDM'11 demos), Vancouver, Canada, December 2011. 1231-1234.
  • Discovering Top-k Teams of Experts with/without a Leader in Social Networks. M. Kargar and A. An. Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM'11), Glasgow, U.K., October 24-28, 2011. 985-994.
  • Keyword Search in Graphs: Finding r-cliques. M. Kargar and A. An. Proceedings of the VLDB Endowment, Vol.4, No.10, 2011. pp.681-692. Full paper for the 37th International Conference on Very Large Data Bases (VLDB'11), Seattle, WA, 2011.
  • Unsupervised Clustering of Web Sessions to Detect Malicious and Non-malicious Website Users. D. Stevanovic, N. Vlajic, A. An. Proceedings of the 2nd International Conference on Ambient Systems, Networks and Technologies, Niagara Falls, Canada, September 2011. 123-131.
  • Detecting Web Crawlers from Web Server Access Logs with Data Mining Classifiers. D. Stevanovic, A. An, and N. Vlajic. Proceedings of the 19th International Symposium on Methodologies for intelligent Systems (ISMIS'11), Warsaw, Poland, June 28-30, 2011.
  • Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian. B. Sarrafzadeh, N. Yakovets, N. Cercone and A. An. Proceedings of the 19th International Symposium on Methodologies for intelligent Systems (ISMIS'11), Warsaw, Poland, June 28-30, 2011. 449-455.
  • Cross Lingual Word Sense Disambiguation for Languages with Scarce Resources. B. Sarrafzadeh, N. Yakovets, N. Cercone and A. An. Proceedings of the 24th Canadian Conference on Artificial Intelligence (AI'11), St. John's, Newfoundland and Labrador, Canada, May 25-27, 2011. 347-358.
  • Suggesting Ghost Edges for a Smaller World. M. Papagelis, F. Bonchi, A. Gionis. Proceedings of the 20th ACM Conference on Information and Knowledge Management (ACM CIKM 2011).
  • Individual Behavior and Social Influence in Online Social Systems. M. Papagelis, V. Murdock, R. van Zwol. Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (ACM Hypertext 2011).
2010
  • Active Media Technology, Proceedings of AMT 2010, Lecture Notes in Computer Science 6335. A. An, P. Lingras, S. Petty and R. Huang. Springer, 2010.
  • Partial Drift Detection Using a Rule Induction Framework. D. Sotoudeh and A. An. Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM'10), Toronto, Canada, October 26-30, 2010.
  • Evaluation of Different Complexity Measures for Signal Detection in Genome Sequences. M. Kargar and A. An. Proceedings of the 2010 ACM International Conference On Bioinformatics and Computational Biology (ACM-BCB'10), Niagara Falls, NY, August 2-4, 2010.
  • The Effect of Sequence Complexity on the Construction of Protein-Protein Interaction Networks. M. Kargar and A. An. Proceedings of the 2010 International Conference on Brain Informatics (BI'10), Toronto, Canada, August 28-30, 2010.
  • S-PLSA+: Adaptive Sentiment Analysis with Application to Sales Performance Prediction. X. Yu, Y. Liu, X. Huang, A. An. Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'10), Geneva, Switzerland, July 19-23, 2010. 873-874.
  • A Quality-Aware Model for Sales Prediction Using Reviews. X. Yu, Y. Liu, X. Huang, A. An. Proceedings of the 19th International World Wide Web Conference (WWW 2010), Raleigh, North Carolina, April 26-30, 2010. 1217-1218.

Courses

Members of the Data Mining Lab teach the following courses related to data mining, graph mining, big data analytics, and data visualization.

Contact Us

The Data Mining Lab is hosted by the Department of Electrical Engineering and Computer Science (EECS) at the Lassonde School of Engineering, York University.

Visit Us

Rooms 2057 & 3057
Lassonde Building
York University
Toronto, ON, Canada.
