R4tings Recommender is an open source recommendation engine based on Java and Apache Spark. The project provides the core code that implements the recommender system, together with a workbook and accompanying code examples.
- Core Code : The core code implementing traditional statistical and machine learning based recommendation techniques and procedures is provided as open source, allowing anyone to freely extend and improve the project source code.
- Workbook and Code Examples : The workbook and code examples help you learn and understand recommendation systems and provide a basis for developing prototypes and testing new techniques. They can also serve as a starting point for implementing a commercial-level recommendation system, and they encourage community participation and contribution.
Introduction
Open source recommender systems can be used for a variety of purposes, but in real application areas they can show drawbacks in large data processing and scalability, in modifiability, and in the transparency of their models.
- Large data processing : The implementation language and methods may limit the ability to handle large data, and performance problems may occur with large data sets.
- Limited modifiability : Many open source recommendation systems are built around their own models and focus on implementing algorithms for specific formulas. This can make it difficult to modify a formula or extend the model for commercialization or for a particular application area.
- Black box model limitations : Some recommender systems treat the model's internal behavior as a black box, making it difficult for users to understand, modify or extend the model's behavior.
The R4tings Recommender project aims to develop an open source recommendation engine and build an open source ecosystem that can be maintained and developed by the community.
- Open Source Recommendation Engine Package
  - Enables parallel processing of large data sets for recommendation systems.
  - Breaks the recommendation process down into stages and provides them as components, so that the internal flow is easy to follow.
  - Provides internal algorithms as higher-order functions, so that the recommendation model can be extended flexibly.
- Open Source Ecosystem
  - Provides workbooks to aid learning and understanding of recommendation systems.
  - Allows users with different backgrounds and interests to participate and try out new ideas for recommendation systems.
  - Lets contributors share the results of testing new ideas, recommendation techniques, and modified or extended functionality with the project in the form of plugins.
The goals of this project are to:
- Support learning and understanding of recommendation systems by implementing traditional statistical and machine learning based recommendation models and providing a workbook.
- Validate theory and performance for academic research purposes through simulators and prototyping.
- Investigate the feasibility of implementing a commercial-level recommendation system.
Components
R4tings Recommender provides rating normalization and similarity calculations for recommendations, as well as implementations of traditional rating prediction and item recommendation methods such as collaborative filtering and content-based filtering.
Rating normalization
Normalization is the process of converting values that have different ranges or scales in a data set to a common range. Users apply different criteria when they rate items, so the same rating value can express different levels of preference. Rating normalization adjusts for these differences in users' rating criteria and item preferences. Each method can be applied over the whole data set, per user, or per item, as noted in parentheses.
- Mean-Centering (whole/user/item)
- Z-Score (whole/user/item)
- Min-Max (whole/user/item)
- Decimal Scaling (whole)
- Binary Thresholding (whole)
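The five methods listed above can be sketched in plain Java for a single group of ratings (for example, all ratings given by one user). This is a minimal illustration, not the project's Spark-based implementation: the class and method names are hypothetical, the z-score uses the sample standard deviation, and the handling of degenerate inputs (identical ratings, a single rating) is an assumption.

```java
import java.util.Arrays;

// Illustrative helper only; not part of the R4tings Recommender API.
final class RatingNormalizationSketch {

    // Mean-centering: subtract the mean so that 0 means "average for this group".
    static double[] meanCenter(double[] ratings) {
        double mean = Arrays.stream(ratings).average().orElse(0.0);
        return Arrays.stream(ratings).map(r -> r - mean).toArray();
    }

    // Z-score: mean-center, then divide by the sample standard deviation.
    // Assumption: fewer than two ratings, or identical ratings, map to 0.
    static double[] zScore(double[] ratings) {
        double[] centered = meanCenter(ratings);
        double sumSq = Arrays.stream(centered).map(c -> c * c).sum();
        double sd = ratings.length > 1 ? Math.sqrt(sumSq / (ratings.length - 1)) : 0.0;
        return Arrays.stream(centered).map(c -> sd == 0.0 ? 0.0 : c / sd).toArray();
    }

    // Min-max: rescale linearly to [0, 1]. Assumption: identical ratings map to 0.
    static double[] minMax(double[] ratings) {
        double min = Arrays.stream(ratings).min().orElse(0.0);
        double max = Arrays.stream(ratings).max().orElse(0.0);
        return Arrays.stream(ratings)
                .map(r -> max == min ? 0.0 : (r - min) / (max - min))
                .toArray();
    }

    // Decimal scaling: divide by the smallest power of ten that brings every |rating| below 1.
    static double[] decimalScale(double[] ratings) {
        double maxAbs = Arrays.stream(ratings).map(Math::abs).max().orElse(0.0);
        double divisor = 1.0;
        while (maxAbs / divisor >= 1.0) {
            divisor *= 10.0;
        }
        final double d = divisor;
        return Arrays.stream(ratings).map(r -> r / d).toArray();
    }

    // Binary thresholding: 1 if the rating reaches the threshold, otherwise 0.
    static double[] binaryThreshold(double[] ratings, double threshold) {
        return Arrays.stream(ratings).map(r -> r >= threshold ? 1.0 : 0.0).toArray();
    }

    public static void main(String[] args) {
        double[] userRatings = {1.0, 3.0, 5.0};
        System.out.println(Arrays.toString(meanCenter(userRatings)));           // [-2.0, 0.0, 2.0]
        System.out.println(Arrays.toString(zScore(userRatings)));               // [-1.0, 0.0, 1.0]
        System.out.println(Arrays.toString(minMax(userRatings)));               // [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(decimalScale(userRatings)));
        System.out.println(Arrays.toString(binaryThreshold(userRatings, 3.0))); // [0.0, 1.0, 1.0]
    }
}
```

Applying the same functions to all ratings, to one user's ratings, or to one item's ratings corresponds to the whole, user, and item variants.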
Similarity calculations
A similarity calculation measures how alike two data points are in a multidimensional space. Here it is used to measure the similarity or distance between users or between items, based on the ratings that users have given to items, and to understand the relationships in the data.
- Cosine Similarity (user/item)
- Pearson correlation coefficient and similarity (user/item)
- Euclidean distance and similarity (user/item)
- Similarity with binary attributes (user/item)
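As a rough illustration of the first three measures, the following plain Java sketch compares two rating vectors. It is not the project's API: the names are hypothetical, both arrays are assumed to hold ratings for the same co-rated items in the same order, and converting Euclidean distance to a similarity as 1 / (1 + d) is only one common choice.

```java
// Illustrative helper only; not part of the R4tings Recommender API.
// Assumption: both arrays hold ratings for the same co-rated items, in the same order.
final class SimilaritySketch {

    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Assumption: a zero vector has similarity 0 with everything.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Pearson correlation: cosine similarity of the mean-centered vectors.
    static double pearson(double[] a, double[] b) {
        return cosine(center(a), center(b));
    }

    // Euclidean distance turned into a similarity in (0, 1]; identical vectors give 1.
    static double euclideanSimilarity(double[] a, double[] b) {
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumSq += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    private static double[] center(double[] v) {
        double mean = 0.0;
        for (double x : v) {
            mean += x;
        }
        mean /= v.length;
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - mean;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] u1 = {1.0, 2.0, 3.0};
        double[] u2 = {3.0, 2.0, 1.0};
        System.out.println(cosine(u1, u2));
        System.out.println(pearson(u1, u2)); // opposite tastes: -1.0
        System.out.println(euclideanSimilarity(u1, u2));
    }
}
```

Passing two users' ratings gives a user-user similarity; passing two items' ratings from the same users gives an item-item similarity.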
Rating Predictions and Item Recommendations
- Neighborhood-based collaborative filtering recommendation
- Singular value decomposition-based collaborative filtering recommendation
- TF-IDF content-based filtering recommendation
- Frequent rule-based recommendation
Neighborhood-based collaborative filtering recommendation
This is a memory-based collaborative filtering recommendation model that uses k-nearest neighbors (k-NN), a traditional collaborative filtering algorithm. It recommends items based on similarities between users or between items.
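One common way to turn neighbor similarities into a predicted rating is a similarity-weighted average over the k most similar neighbors who rated the target item. The sketch below shows that formula in plain Java. The names are hypothetical and the aggregation used by the project may differ (for example, it may work on normalized ratings).

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative helper only; not part of the R4tings Recommender API.
final class KnnPredictionSketch {

    // sims[i]    : similarity between the target user and neighbor i
    // ratings[i] : rating that neighbor i gave to the target item
    // Returns the similarity-weighted average over the k most similar neighbors,
    // or NaN when no neighbor carries any weight.
    static double predict(double[] sims, double[] ratings, int k) {
        Integer[] order = new Integer[sims.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort neighbor indices by similarity, highest first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> sims[i]).reversed());

        double weighted = 0.0;
        double weightSum = 0.0;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            int i = order[n];
            weighted += sims[i] * ratings[i];
            weightSum += Math.abs(sims[i]);
        }
        return weightSum == 0.0 ? Double.NaN : weighted / weightSum;
    }

    public static void main(String[] args) {
        double[] sims = {0.9, 0.1, 0.5};
        double[] ratings = {4.0, 1.0, 2.0};
        // With k = 2 only the neighbors with similarity 0.9 and 0.5 contribute.
        System.out.println(predict(sims, ratings, 2)); // roughly 3.29
    }
}
```

The same routine works for item-based filtering if the neighbors are items similar to the target item and the ratings are the target user's ratings of those items.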
Singular value decomposition-based collaborative filtering recommendation
This is a model-based collaborative filtering recommendation model that uses singular value decomposition (SVD), a matrix factorization algorithm. It first computes a baseline estimate for each rating, then applies a truncated SVD to the residuals (the observed ratings minus their baseline estimates) and recommends items based on the reconstructed ratings.
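The steps can be written out as follows. This is a sketch under assumptions, not the project's exact formulation: the baseline is taken to be the common bias model (global mean plus user and item offsets), the symbols are chosen here for illustration, and unobserved entries of the residual matrix are assumed to be filled with 0 before decomposition.

```latex
\begin{aligned}
b_{ui} &= \mu + b_u + b_i
  && \text{baseline estimate for user } u \text{ and item } i \\
e_{ui} &= r_{ui} - b_{ui}
  && \text{residual of an observed rating } r_{ui} \\
E &\approx U_k \Sigma_k V_k^{\top}
  && \text{truncated SVD keeping the } k \text{ largest singular values} \\
\hat{r}_{ui} &= b_{ui} + \left( U_k \Sigma_k V_k^{\top} \right)_{ui}
  && \text{predicted rating}
\end{aligned}
```

Items are then ranked for each user by the predicted rating $\hat{r}_{ui}$.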
TF-IDF content-based filtering recommendation
Content-based filtering recommendation is a memory-based recommendation model that recommends items whose features, or content, are similar to those of items the user prefers. It computes a TF-IDF vector for each item and recommends items by calculating the cosine similarity between the TF-IDF vectors of the items the user prefers and those of the other items.
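A minimal plain Java sketch of this idea follows. It assumes each item is described by a list of terms (such as genres or tags), uses raw term counts for TF and ln(N / df) for IDF, and assumes every item being vectorized is itself part of the corpus. TF-IDF has several variants, and the names here are hypothetical rather than the project's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper only; not part of the R4tings Recommender API.
final class TfIdfSketch {

    // TF-IDF vector of one item. Assumptions: raw term counts as TF, ln(N / df) as IDF,
    // and the item's own term list is part of the corpus (so df >= 1).
    static Map<String, Double> tfIdf(List<String> terms, List<List<String>> corpus) {
        Map<String, Double> vector = new HashMap<>();
        for (String term : terms) {
            vector.merge(term, 1.0, Double::sum); // term frequency
        }
        for (Map.Entry<String, Double> entry : vector.entrySet()) {
            long df = corpus.stream().filter(doc -> doc.contains(entry.getKey())).count();
            entry.setValue(entry.getValue() * Math.log((double) corpus.size() / df));
        }
        return vector;
    }

    // Cosine similarity between two sparse vectors keyed by term.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        List<String> liked = Arrays.asList("action", "scifi");
        List<String> candidateA = Arrays.asList("action", "scifi", "space");
        List<String> candidateB = Arrays.asList("romance", "drama");
        List<List<String>> corpus = Arrays.asList(liked, candidateA, candidateB);

        Map<String, Double> profile = tfIdf(liked, corpus);
        // candidateA shares terms with the liked item, candidateB shares none.
        System.out.println(cosine(profile, tfIdf(candidateA, corpus)));
        System.out.println(cosine(profile, tfIdf(candidateB, corpus))); // 0.0
    }
}
```

Candidates would then be ranked by their similarity to the items the user prefers, highest first.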
Frequent rule-based recommendation
Frequent rule-based recommendation, also known as association rule-based recommendation, is a memory-based recommendation model that recommends items by analyzing the associations between items, expressed as rules about which items appear together with a given item. It recommends items by calculating the support and confidence of the associated rules, which serve as interestingness measures.
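Support and confidence can be illustrated with a small plain Java sketch. It assumes each transaction is the set of items one user interacted with and uses the standard definitions: support is the fraction of transactions containing all items of an item set, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X). The names are hypothetical, not the project's API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper only; not part of the R4tings Recommender API.
final class AssociationRuleSketch {

    // Support: fraction of transactions that contain every item in the item set.
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return transactions.isEmpty() ? 0.0 : (double) hits / transactions.size();
    }

    // Confidence of the rule antecedent -> consequent.
    // Assumption: a rule whose antecedent never occurs has confidence 0.
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        double antecedentSupport = support(transactions, antecedent);
        return antecedentSupport == 0.0
                ? 0.0
                : support(transactions, both) / antecedentSupport;
    }

    private static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = Arrays.asList(
                setOf("a", "b"), setOf("a", "b", "c"), setOf("a", "c"), setOf("b"));
        System.out.println(support(transactions, setOf("a", "b")));           // 0.5
        System.out.println(confidence(transactions, setOf("a"), setOf("b"))); // 2 of 3 "a" baskets
    }
}
```

For a user who has item "a", candidate items would be ranked by the confidence (and support) of the rules whose antecedent is "a".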
MISCELLANEOUS
Technology Stack
- Programming language : Java 100% (JDK 8 / 11 compatible)
- Development environment
  - Build tool : Gradle 8.3
  - IDE : IntelliJ IDEA Community Edition
- Libraries and frameworks
  - Apache Spark 3.5.0 (Scala 2.12)
  - JUnit
  - Logback
  - Project Lombok
  - Zip4j
- Source code management : Git / GitHub
- Dataset
  - Example (r4tings) Dataset - 30 ratings
  - Book-Crossing Dataset - 1,149,780 ratings
  - MovieLens Dataset - 27,753,444 ratings
Additional Resources
- Source code repository(GitHub)
- Core Code
- Code Examples
- Workbook
- API Documentation(API Docs)
Feedback and Contributions
- If you find a functional error or have a suggestion for improvement, please open an issue in ISSUES or provide feedback through DISCUSSIONS.
- Project participation and contributions are always welcome. More information can be found in CONTRIBUTORS.
License
- Dual License
  - Core Code and Code Examples : Apache License 2.0
  - Workbook : Creative Commons BY-NC-SA 4.0
References
- Recommender systems handbook. Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor. (2011).
- Recommender Systems - The Textbook. Charu C. Aggarwal. (2016).
- Introduction to Data Mining, 2nd edition. P. Tan, M. Steinbach, A. Karpatne, Vipin Kumar. (2018).
- recommenderlab: An R framework for developing and testing recommendation algorithms. Michael Hahsler. (2022).
- Recommender Systems Specialization. Coursera.
- Apache Spark. The Apache Software Foundation.
Contact
If you have any questions, suggestions, or need to get in touch with us about the project, please contact us at dongsup.kim@r4tings.com.
News & Updates
[2023-12-01] 17th Open SW Development Competition, Silver Award (President's Award of the Korean Institute of Information Scientists and Engineers). Open Source SW Festival 2023. Ministry of Science and ICT (2023).
R4tings Recommender - Open source recommendation engine implemented with Java language and Apache Spark library.