MACHINE LEARNING ALGORITHMS FOR INSIGHTFUL ANALYSIS OF COMPLEX DATA STRUCTURES
The project is supported by
Croatian Science Foundation
Contract number: 9623
Starting date: August 16th 2014
Finished: August 15th 2017
Project summary
Induction is a process of knowledge extraction from information contained in data.
In our work we will concentrate on descriptive induction, whose goal is to construct knowledge that
enables human understanding of the data. Included are techniques for construction of user-interpretable models,
segmentation of the corpus of examples, and detection of outliers.
The methodology is relevant for the computer science fields known as intelligent data analysis,
knowledge discovery from data, and data mining. At the Rudjer Boskovic Institute we have been developing machine
learning algorithms for more than 15 years, and we have successfully applied them in various domains
including chemistry, biology, medicine, social sciences, economics, and manufacturing.
With this project we want to extend the existing methodology and to implement novel techniques
able to cope with data contained in complex structures.
The main topic will be spatio-temporal structures, but we will also work with networks of data,
relational databases, and data contained in ontologies. Previous experience clearly demonstrates
that complete transformation of information contained in structured data into a form that may enter
the induction process is not a simple task. Our goal is to develop and implement systematic
and general approaches for this conversion. The consequence will be an explosion of
generated data that must enter into the process of induction. Therefore, the second goal will
be the implementation of efficient algorithms for descriptive induction.
The work includes development of novel algorithms for clustering and outlier detection in sets of
unclassified examples and implementation of rule learning algorithms in hardware.
The third goal is the application and evaluation of the implemented algorithms in various real-life domains.
The success of the complete project will be measured by the quality and usefulness of knowledge obtained
in these applications.
Members
- Strahil Ristov, PhD - www (head since 1.1.2017.)
- Dragan Gamberger, PhD - www (head till 31.12.2016.)
- Tomislav Šmuc, PhD - www
- Ivan Michieli, PhD - www
- Branka Medved Rogina, PhD - www
- Peter Škoda, PhD - www (till 15.9.2016.)
- Damir Korenčić, PhD student - www (till 13.12.2016.)
- Matija Piškorec, PhD student - www
- Nino Antulov-Fantulin, PhD - www
- Maria Brbić, PhD student - www (since 2.1.2015.)
- Dijana Tolić, PhD - www (since 1.1.2017.)
Technical support
Activities
- November 2016, M. Piškorec gave the talk "Modeling peer and external influence in online social networks".
- October 2016, accepted paper: Matej Mihelčić, Sašo Džeroski, Nada Lavrač, Tomislav Šmuc,
A framework for redescription set construction, Expert Systems with Applications.
- October 2016, accepted paper: Maria Brbić, Matija Piškorec, Vedrana Vidulin, Anita Kriško, Tomislav Šmuc, Fran Supek,
The landscape of microbial phenotypic traits and their genetic determinants, Nucleic Acids Research.
- September 2016, T. Šmuc and M. Brbić participated as speaker and student, respectively, at the Summer School on mining big and complex data.
- July 2016, D. Korenčić participated in the conference PolText 2016, International Conference on the Advances in Computational Analysis of Political Text, Dubrovnik, 14.-16.07.2016.
- March 2016, Nino Antulov-Fantulin started a postdoc at ETH Zurich, Switzerland.
- June 2015, M. Piškorec participated with the poster "Modeling peer and external influence in online social network" at the conference Network Science 2015, Zaragoza, Spain, 1.-5.6.2015.
- May 2015, a group of researchers led by M. Brbić participated in the PAKDD'15 Data Mining Competition. The team ranked 12th and was rewarded with an invitation to give the final presentation at the Workshop.
- April 2015, together with colleagues from the Jozef Stefan Institute, Ljubljana, we organized a two-day international Workshop on Knowledge Technologies 2015.
- April 2015, Nino Antulov-Fantulin defended his PhD thesis titled: Statistical inference algorithms for epidemic processes on complex networks.
- March 2015, the first-year review of the Maestra FP7 project took place in Zagreb (photo).
- January 2015, M. Brbić started as a PhD student.
- October 8-10, N. Antulov-Fantulin, D. Gamberger, and T. Šmuc participated in the Discovery Science 2014 conference, Bled, Slovenia.
- July 18, Lecture on methods of outlier detection in transactional databases. Press report.
- June 18-20, second Maestra meeting in Porto, Portugal. Project meeting.
- Department members on the first day of the project (photo).
- In February 2014, work started on the EU FP7 project Maestra: Learning from Massive, Incompletely Annotated, and Structured Data.
- In November 2013, work started on the EU FP7 project MULTIPLEX: Foundational Research on Multilevel Complex Networks and Systems.
Published papers in year 2017
- Gamberger, D., Lavrac, N., Srivatsa, S., Tanzi, R. E., Doraiswamy, P. M. (2017) Identification of clusters of rapid and slow decliners among subjects at risk for Alzheimer’s disease, Scientific Reports 7:6763-1-6763-12.
- Mihelcic M., Lavrac N., Dzeroski S., Smuc T. (2017) A framework for redescription set construction. Expert Systems With Applications 68:196–215.
- Mihelcic M., Dzeroski S., Lavrac N., Smuc T. (2017) Redescription mining augmented with random forest of predictive clustering trees, Journal of Intelligent Information Systems, 1-34.
- Brbic, M., Kopriva, I. (2017) Multi-view Low-rank Sparse Subspace Clustering. Pattern Recognition, In press.
- Gamberger, D., Zenko, B., Lavrac, N. (2017) Exploratory Clustering for Patient Subpopulation Discovery. In Proc. of EFMI 2017: Informatics for Health: Connected Citizen-Led Wellness and Population Health, 101-105.
- Ristov S., Vaser R., Sikic M. (2017) Trade-offs in query and target indexing for the selection of candidates in protein homology searches. Proceedings of The Prague Stringology Conference 2017, Jan Holub and Jan Zdarek (eds.). Prague, Czech Technical University in Prague.
- Brbic, M., Piskorec, M., Vidulin, V., Krisko, A., Smuc, T., Supek, F. (2017) Phenotype Inference from Text and Genomic Data. Accepted at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD).
Published papers in year 2016
- Gamberger, D., Zenko, B., Mitelpunkt, A., Lavrac, N. (2016) Homogeneous clusters of Alzheimer’s disease patient population.
BioMedical Engineering OnLine, 15(Suppl 1):78.
- Ristov, S. (2016) A Fast and Simple Pattern Matching with Hamming Distance on Large Alphabets.
Journal of Computational Biology, web publication.
- Gamberger, D., Zenko, B., Mitelpunkt, A., Schachar, N., Lavrac, N. (2016) Clusters of male and female Alzheimer’s disease patients in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database.
Brain Informatics, web publication.
- Skoda, P., Medved Rogina B. (2016) FPGA Kernels for Classification Rule Induction.
MIPRO 2016/DC VIS pp. 353-358.
- Ristov S., Brajkovic V., Cubric-Curik V., Michieli I., Curik I. (2016) MaGelLAn 1.0: a software to facilitate quantitative and population genetic analysis of maternal inheritance by combination of molecular and pedigree information. Genetics Selection Evolution 48:65.
- Brbic, M., Piskorec, M., Vidulin, V., Krisko, A., Smuc, T., Supek, F. (2016) The Landscape of Microbial Phenotypic Traits and Associated Genes. Nucleic Acids Research 44:10074–10090.
- Vidulin, V., Brbic, M., Supek, F., Smuc, T. (2016) Evaluation of Fusion Approaches in Large-scale Bio-annotation Setting. 4th Workshop on Machine Learning in Life Science at ECML PKDD 2016, Riva del Garda, Italy, 37-51.
- Mihelcic M., Smuc T. (2016) InterSet: Interactive Redescription Set Exploration, Proc. of 19th International Conference on Discovery Science, Bari, Italy, October 19-21, 2016, Lecture Notes in Computer Science, Volume 9956 LNAI, 35-50.
Published papers in year 2015
- Piskorec, M., Sluban, B., Smuc, T. (2015) MultiNets: Web-based multilayer network visualization. Proceedings of ECML/PKDD III, pp. 298-302.
- Korencic, D., Ristov, S., Snajder, J. (2015) Getting the agenda right: Measuring media agenda using topic models. Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications ACM 2015, pp. 61-66.
- Gamberger, D., Zenko, B., Mitelpunkt, A., Lavrac, N. (2015) Identification of gender specific biomarkers for Alzheimer's disease. In Proc. of Brain Informatics and Health, BIH 2015, pp. 57-66.
- Antulov-Fantulin, N., Lancic, A., Smuc, T., Stefancic, H., Sikic, M. (2015) Identification of patient zero in static and temporal networks: Robustness and Limitations. Physical Review Letters. Vol. 114.
- Brbic, M., Warnecke, T., Krisko, A., Supek, F. (2015) Global shifts in genome and proteome composition are very tightly coupled. Genome Biology and Evolution. Vol. 7:6 pp. 1519-1532.
- Gamberger, D., Zenko, B., Mitelpunkt, A., Lavrac, N. (2015) Multilayer clustering: Biomarker driven segmentation of Alzheimer's disease patient population. Proceedings of Int. Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2015, pp. 134-145.
Published papers in year 2014
- Antulov-Fantulin, N., Lancic, A., Stefancic, H., Sikic, M., Smuc, T. (2014) Statistical inference framework for source detection of contagion
processes on arbitrary network structures. Proceedings of 2014 IEEE Eighth International Conference on Self-Adaptive and Self-Organizing Systems Workshops, pp. 78-83,
London, 8.-12.9.2014.
- Rios-Morales, R., Gamberger, D., Brennan, L., Sweitzer, M. (2014) Ex-ante assessment of an EU-China free trade agreement.
In Proceedings, Vrontis, Weber, Tsoukatos (Eds.), 7th EuroMed Conference - The Future of Entrepreneurship,
Kristiansand, Norway. EuroMed Press.
- Gamberger, D., Mihelcic, M., Lavrac, N. (2014) Multilayer clustering: A discovery experiment
on country level trading data.
Discovery Science 2014 Conference Proceedings, Springer, pp. 87-98.
- Antulov-Fantulin, N., Bosnjak, M., Zlatic, V., Grcar, M., Smuc, T. (2014) Synthetic Sequence Generator for Recommender Systems - Memory Biased Random Walk on a Sequence Multilayer Network.
Discovery Science 2014 Conference Proceedings, Springer, pp. 25-36.
- Sluban, B., Gamberger, D., Lavrac, N. (2014) Ensemble-based noise detection: noise ranking and visual performance evaluation.
Data Mining and Knowledge Discovery, 28:265-303.