Parallel Processing
Overview
This project can earn you up to a 10% bonus to your final grade.
In this assignment, you will design and implement a high-performance concurrent log analysis engine in Java. The system must process large log files using multiple threads, safely aggregate results, and demonstrate measurable performance improvements over a sequential baseline.
The log analyzer:
- Accepts one or more large log files as input
- Splits the workload into parallel tasks
- Processes log entries concurrently
- Aggregates results in a thread-safe manner
- Outputs structured statistics and analysis results
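The split, process, and aggregate steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not a required design: the class name `ChunkedLevelCounter`, its method names, and the choice of counting entries per log level are all hypothetical, and the lines are assumed to already be in memory. Each worker builds its own private map, and the partial maps are merged on the calling thread, so no shared mutable state is touched concurrently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: counts entries per log level using a fixed thread pool.
class ChunkedLevelCounter {

    /** Counts entries per level in one chunk; each call uses its own private map. */
    static Map<String, Long> countChunk(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            // Assumed layout: "<timestamp> <level> <rest of line>"
            String[] parts = line.split(" ", 3);
            if (parts.length >= 2) {
                counts.merge(parts[1], 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Splits the lines into roughly equal chunks, one task per chunk, then merges. */
    static Map<String, Long> countLevels(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunkSize = Math.max(1, (lines.size() + threads - 1) / threads);
            List<Future<Map<String, Long>>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, lines.size());
                List<String> chunk = lines.subList(start, end);
                futures.add(pool.submit(() -> countChunk(chunk)));
            }
            // Merging happens only on the calling thread, after each task finishes.
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> future : futures) {
                future.get().forEach((level, n) -> total.merge(level, n, Long::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A worker task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Merging per-task results avoids lock contention at the cost of holding one partial map per task. A shared concurrent map is the usual alternative when partial results would be large.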
Requirements
Input
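Each input to the program is a plain-text log file in which every line is one log entry containing the following fields, in order:
- Timestamp (ISO 8601, in UTC)
- Log level (INFO, WARN, ERROR, DEBUG)
- Source (for example, the name of the module that produced the entry)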
- Message
You can assume that the entries appear in chronological order by timestamp.
Example log file content:
2025-02-14T10:32:01Z ERROR AuthService Failed login attempt
2025-02-14T10:32:05Z INFO AuthService Login successful
2025-02-14T10:33:11Z WARN PaymentService Slow response detected
2025-02-14T10:33:15Z ERROR PaymentService Transaction failed
2025-02-14T10:34:02Z DEBUG InventoryService Cache miss
2025-02-14T10:34:45Z INFO InventoryService Item restocked
2025-02-14T10:35:01Z ERROR AuthService Token validation failed
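One possible way to parse a line in the format shown above is sketched here. The `LogEntry` record and its field names are illustrative assumptions, as is the rule that the first three fields are separated by single spaces and the message is the remainder of the line. Records require Java 16 or later.

```java
import java.time.Instant;

/** One parsed log line; the record and its field names are hypothetical. */
record LogEntry(Instant timestamp, String level, String source, String message) {

    /**
     * Parses a line such as
     * "2025-02-14T10:32:01Z ERROR AuthService Failed login attempt".
     * Splitting with a limit of 4 keeps the spaces inside the message intact.
     */
    static LogEntry parse(String line) {
        String[] parts = line.split(" ", 4);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Malformed log line: " + line);
        }
        // Instant.parse accepts ISO 8601 timestamps ending in 'Z'.
        return new LogEntry(Instant.parse(parts[0]), parts[1], parts[2], parts[3]);
    }
}
```

For example, `LogEntry.parse("2025-02-14T10:33:11Z WARN PaymentService Slow response detected")` yields an entry whose `source()` is `PaymentService` and whose `message()` is `Slow response detected`.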
Analyses
Your analyzer must compute at least four of the following:
- Count of log entries per log level
- Count of unique sources
- Top-N most frequent error messages
- Time-based aggregation (for example, errors per minute)
- Regex-based pattern detection
- Identification of anomalous bursts
- Other analyses, if approved by the instructor
The analysis to run in a given execution of the program should be specified as a command-line argument.
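As an illustration of one of the analyses listed above, the following sketch counts errors per minute using a shared, thread-safe map. It is only one possible approach: the class name `ErrorsPerMinute` is hypothetical, the lines are assumed to be in memory, and a parallel stream stands in for whatever threading model you choose.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: time-based aggregation of ERROR entries into one-minute buckets.
class ErrorsPerMinute {

    static Map<Instant, Long> count(List<String> lines) {
        // ConcurrentHashMap plus LongAdder lets many threads increment safely.
        ConcurrentHashMap<Instant, LongAdder> buckets = new ConcurrentHashMap<>();
        lines.parallelStream().forEach(line -> {
            // Assumed layout: "<timestamp> <level> <rest of line>"
            String[] parts = line.split(" ", 3);
            if (parts.length >= 2 && parts[1].equals("ERROR")) {
                Instant minute = Instant.parse(parts[0]).truncatedTo(ChronoUnit.MINUTES);
                buckets.computeIfAbsent(minute, k -> new LongAdder()).increment();
            }
        });
        // Copy into a sorted map so the report lists minutes chronologically.
        Map<Instant, Long> sorted = new TreeMap<>();
        buckets.forEach((minute, adder) -> sorted.put(minute, adder.sum()));
        return sorted;
    }
}
```

`computeIfAbsent` on a `ConcurrentHashMap` is atomic, so two threads that see the same minute for the first time still share a single counter.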
Implementation
The project must include both a sequential baseline implementation and a concurrent implementation. The code should clearly separate the following stages:
- Parsing
- Processing
- Aggregation
- Reporting
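One way to keep these four stages separate is to give each its own small interface and wire them together in a driver. The sketch below is an assumption-laden illustration rather than a prescribed architecture: every interface and class name in it is hypothetical. It shows a sequential driver only. A concurrent driver could reuse the same stages by calling the processor once per chunk and handing all partial results to the aggregator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stage interfaces: E = parsed entry, P = partial result, R = final result.
interface EntryParser<E> { E parse(String line); }
interface ChunkProcessor<E, P> { P process(List<E> entries); }
interface ResultAggregator<P, R> { R combine(List<P> partials); }
interface Reporter<R> { String render(R result); }

/** Sequential baseline driver: one chunk, hence one partial result. */
final class SequentialPipeline<E, P, R> {
    private final EntryParser<E> parser;
    private final ChunkProcessor<E, P> processor;
    private final ResultAggregator<P, R> aggregator;
    private final Reporter<R> reporter;

    SequentialPipeline(EntryParser<E> parser, ChunkProcessor<E, P> processor,
                       ResultAggregator<P, R> aggregator, Reporter<R> reporter) {
        this.parser = parser;
        this.processor = processor;
        this.aggregator = aggregator;
        this.reporter = reporter;
    }

    String run(List<String> lines) {
        List<E> entries = new ArrayList<>();
        for (String line : lines) {
            entries.add(parser.parse(line));          // Parsing
        }
        P partial = processor.process(entries);       // Processing
        R result = aggregator.combine(List.of(partial)); // Aggregation
        return reporter.render(result);               // Reporting
    }
}
```

Because each stage depends only on the interface of the next, an analysis can be swapped by replacing the processor and aggregator while the parser and reporter plumbing stays unchanged.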
Turning in the Assignment
For this assignment, you must turn in a zip file of a directory named bonus containing the following:
- Java project directory structure and source files
- Makefile
- Any programs/scripts used to generate test data
- README
Submit the zip file to the appropriate folder on D2L.
Grading Rubric (100 points)
Program Correctness (30 points)
This evaluates whether your program works as intended. It is based on the results of automated tests that check if your code implements all required functionality correctly. This category also includes any additional issues not explicitly tested but that might lead to incorrect behavior.
Program Design (30 points)
This evaluates the overall structure and organization of your code. A well-designed program will typically have:
- Small, self-contained functions that perform specific tasks and are easy to test
- Common code factored out into reusable functions, avoiding repetition
- Flexibility to handle different use cases
- Low coupling between components (such as functions and classes), where each component only interacts with others through their public interfaces, not their internal details
Code Readability & Style (5 points)
This evaluates how easy it is to read and understand your code, as well as how well you adhere to the style guide. The key points to consider are:
- Clarity: Code should be clear and easy to follow. Avoid unclear variable names, magic numbers, and convoluted logic.
- Style Consistency: While it’s not necessary to follow every single aspect of the style guide, your code should maintain consistency and follow the general principles outlined.
Documentation & Comments (5 points)
This measures how well your code is documented. At a minimum, your code should:
- Adhere to the department documentation standards.
- For each file, begin with a header comment that includes your name, file name, and a description of the file’s purpose. You may also include other details like the date written, your approach, or any references to resources you used.
- Contain comments within the code, especially for complex or unclear sections. Try to comment on parts of the code you may not understand when revisiting it in the future. However, avoid excessive comments—focus on clarity, not redundancy.
README (30 points)
The README file must contain a technical report that covers the following:
- Instructions for running the project
- Discussion of the system architecture
- Explanation of the concurrency model
- Description of the task partitioning strategy
- Detailed performance evaluation
- Discussion of trade-offs and limitations
The README file must be written in coherent, grammatically correct English.
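For the performance evaluation, a simple timing helper along the following lines may be a useful starting point. It is a rough sketch with hypothetical names (`SpeedupTimer`, `bestTimeNanos`, `speedup`). It reports speedup as sequential time divided by concurrent time, and it does not account for JIT warm-up beyond taking the best of several runs.

```java
// Hypothetical helper for comparing the sequential baseline with the concurrent version.
class SpeedupTimer {

    /** Runs the task the given number of times and returns the best wall-clock time in ns. */
    static long bestTimeNanos(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    /** Speedup = sequential time / concurrent time; values above 1.0 mean faster. */
    static double speedup(long sequentialNanos, long concurrentNanos) {
        return (double) sequentialNanos / concurrentNanos;
    }
}
```

For instance, a sequential run of 2,000 ms against a concurrent run of 500 ms corresponds to a speedup of 4.0. Reporting such figures for several thread counts and input sizes makes the evaluation easier to interpret.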