CPSC 558 Final Project Part 2
Deliverables
Your submission must include all of the following components. Each item should be clearly named and organized so that a reviewer can easily reproduce your work.
Data Files
Submit the dataset as modified during the data processing and analysis phases, in either of these forms:
- One or more compressed ARFF files (.arff.gz, .zip, or similar), or
- Other tool-specific data files (e.g., CSV, Weka, R, or Python-compatible formats).
These files should reflect the dataset as actually used in your analysis (i.e., after cleaning, transformation, and feature engineering).
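If you prepare your data in Python, one possible way to produce the compressed ARFF form is sketched below. It is a minimal illustration rather than a required tool: the function name `write_arff_gz` and the example dataset are hypothetical, and the sketch does not quote attribute names or values that contain spaces or commas.

```python
import gzip
import os
import tempfile

def write_arff_gz(path, relation, attributes, rows):
    """Write rows to a gzip-compressed ARFF file (hypothetical helper).

    attributes: list of (name, spec) pairs, where spec is "numeric" or a
    list of nominal values. Missing values (None) are written as "?".
    Names or values containing spaces or commas would need quoting,
    which this sketch does not handle.
    """
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(f"@relation {relation}\n\n")
        for name, spec in attributes:
            kind = spec if spec == "numeric" else "{" + ",".join(spec) + "}"
            f.write(f"@attribute {name} {kind}\n")
        f.write("\n@data\n")
        for row in rows:
            f.write(",".join("?" if v is None else str(v) for v in row) + "\n")

# Hypothetical cleaned dataset, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "churn_clean.arff.gz")
write_arff_gz(
    path,
    "churn_clean",
    [("age", "numeric"), ("region", ["north", "south"]), ("churn", ["no", "yes"])],
    [[34, "north", "no"], [None, "south", "yes"]],
)

# Read the file back to confirm it round-trips through gzip.
with gzip.open(path, "rt", encoding="utf-8") as f:
    text = f.read()
print(text)
```

Declaring nominal values explicitly in the header, rather than leaving them as free strings, keeps the file consistent with the attribute types you report in the Data Source section.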
Code and Scripts
- All scripts used for data preprocessing, feature engineering, model training, or analysis.
- Include comments or minimal documentation so that another student could understand and rerun your analysis.
Technical Report (PDF)
Submit a single PDF document that serves as a formal technical report. The report must include your name and should be written clearly, concisely, and professionally.
You may reorder sections if it improves the narrative flow, but all required content must be present.
Introduction
- Introduce the project and the problem domain.
- Provide necessary background and motivation.
- Clearly state what the reader should expect from the remainder of the report.
Data Source
- Identify the source of the dataset.
- Include relevant links, citations, or references (a URL in the report is sufficient).
- If the raw dataset is small enough, it may be included in the submission archive.
- Provide summary statistics about the dataset, including (as applicable):
  - Number of instances
  - Number of attributes
  - Attribute types (numeric, categorical, text, etc.)
  - Presence of missing values
  - Any class imbalance or notable characteristics
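The summary statistics listed above can be gathered with a short script. The sketch below is one possible approach using only the Python standard library; it assumes each instance is a dictionary with `None` marking a missing value, and the helper names and toy data are hypothetical. Its type inference is deliberately simple (numeric versus everything else), so text attributes would need a separate check.

```python
from collections import Counter

def infer_type(values):
    """Call an attribute numeric if every non-missing value is an int or float."""
    present = [v for v in values if v is not None]
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return "numeric"
    return "categorical"

def summarize(rows, class_attr):
    """Collect instance/attribute counts, types, missing values, and class balance.

    The class attribute is included in the attribute count; state in the
    report whether your own count includes it.
    """
    attributes = list(rows[0].keys())
    class_counts = Counter(r[class_attr] for r in rows if r[class_attr] is not None)
    return {
        "instances": len(rows),
        "attributes": len(attributes),
        "types": {a: infer_type([r.get(a) for r in rows]) for a in attributes},
        "missing": {a: sum(r.get(a) is None for r in rows) for a in attributes},
        "class_counts": dict(class_counts),
        # Majority-to-minority ratio; 1.0 means perfectly balanced.
        "imbalance_ratio": max(class_counts.values()) / min(class_counts.values()),
    }

# Hypothetical toy data; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "region": "north", "churn": "no"},
    {"age": None, "income": 61000.0, "region": "south", "churn": "no"},
    {"age": 45, "income": None, "region": "north", "churn": "yes"},
    {"age": 29, "income": 48000.0, "region": None, "churn": "no"},
]
s = summarize(rows, class_attr="churn")
for key, value in s.items():
    print(key, value)
```

On real data, the same figures are typically available from your tool of choice (for example, Weka's Preprocess panel); the point is to report them explicitly.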
Goal
- Clearly state the intended goal of your analysis.
- Indicate whether this work extends prior analyses or constitutes a new analysis.
- If the dataset has been used previously (e.g., in published studies or public benchmarks), briefly describe:
  - What has been done before
  - How your analysis differs or adds value
Value Proposition
- Explain how the results of your analysis could be used in:
  - A commercial, industrial, or organizational context, and/or
  - A research or academic setting
- Focus on practical or scientific impact rather than just technical novelty.
Data Processing
- Describe all steps taken to prepare the data for analysis, including:
  - Data cleaning
  - Handling missing or inconsistent values
  - Normalization/scaling
  - Feature selection or feature construction
- Discuss any problems encountered and how they were addressed.
- Be specific and justify key decisions.
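As an illustration of documenting and justifying preprocessing decisions, the sketch below handles missing numeric values by mean imputation and then applies min-max scaling. These are only two of many valid choices, and the function names are hypothetical. The design point worth noting is that the mean, minimum, and maximum are learned from the training data only and then reused on the test data, so no information from the test set leaks into preprocessing.

```python
def fit_imputer_scaler(train_values):
    """Learn the mean (for imputation) and min/max (for scaling) from training data only."""
    present = [v for v in train_values if v is not None]
    return {
        "mean": sum(present) / len(present),
        "min": min(present),
        "max": max(present),
    }

def apply_imputer_scaler(values, params):
    """Replace missing values with the training mean, then min-max scale."""
    span = params["max"] - params["min"]
    scaled = []
    for v in values:
        v = params["mean"] if v is None else v
        # A constant training column has span 0; map it to 0.0 instead of dividing by zero.
        scaled.append((v - params["min"]) / span if span else 0.0)
    return scaled

train = [10.0, None, 30.0, 20.0]
test = [None, 40.0]
params = fit_imputer_scaler(train)
train_scaled = apply_imputer_scaler(train, params)
test_scaled = apply_imputer_scaler(test, params)
print(train_scaled)  # [0.0, 0.5, 1.0, 0.5]
print(test_scaled)   # [0.5, 1.5]
```

The test value 1.5 falls outside [0, 1] because scaling uses the training range; behavior like this is the kind of problem the report should mention along with how it was addressed.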
Data Analysis
- Describe the machine learning and modeling techniques used.
- Report classification and/or regression results using appropriate evaluation metrics.
- Clearly identify:
  - Algorithms used
  - Parameter choices
  - Filtering or transformation steps
- Justify methodological choices.
- Discuss challenges, limitations, or unexpected outcomes.
- Requirement: Your analysis must include at least one technique not used in Assignments 1–3.
- This section should clearly tell the story of your analysis, focusing on what you did and why.
Results
- Evaluate whether you achieved your stated goal.
- Explain how the results support, refute, or partially address your objectives.
- Include quantitative results (e.g., accuracy, precision and recall, RMSE, or ROC analysis) and explain their significance.
- Explicitly connect outcomes back to the goals stated earlier in the report.
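Explaining the significance of a metric often means comparing it against a simple baseline. The sketch below computes accuracy, precision, recall, and F1 for a chosen positive class, plus RMSE for regression, using only the standard library. The labels and predictions are hypothetical and serve only to show why accuracy alone can mislead when classes are imbalanced.

```python
import math

def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def rmse(y_true, y_pred):
    """Root mean squared error for a regression task."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = ["yes", "no", "no", "yes", "no", "no"]
model = ["yes", "no", "yes", "no", "no", "no"]  # hypothetical classifier output
baseline = ["no"] * 6                           # always predicts the majority class
model_scores = classification_metrics(y_true, model, positive="yes")
baseline_scores = classification_metrics(y_true, baseline, positive="yes")
print(model_scores)
print(baseline_scores)
print(rmse([3.0, 5.0], [2.0, 7.0]))
```

Both prediction sets reach the same accuracy (4 of 6), yet only the hypothetical model finds any positive instances (recall 0.5 versus 0.0). Reporting precision and recall alongside accuracy makes that difference visible.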
Conclusion
- Summarize key findings and contributions.
- Reflect on strengths and limitations of your approach.
- Discuss possible future work, extensions, or improvements.
Grading Rubric: Technical Report (100 points)
Introduction & Problem Context (10 points)
- Excellent (9–10): Clear, concise introduction with strong motivation and well-defined problem context.
- Good (7–8): Introduction is clear but may lack depth or strong motivation.
- Fair (5–6): Introduction present but vague, unfocused, or missing context.
- Poor (0–4): Introduction unclear or missing.
Data Source & Dataset Description (15 points)
- Excellent (14–15): Data source clearly identified and cited; dataset characteristics thoroughly summarized (instances, attributes, types, issues).
- Good (11–13): Data source and summary provided, minor details missing.
- Fair (8–10): Basic description but incomplete or superficial.
- Poor (0–7): Data source unclear or poorly described.
Goal & Value Proposition (15 points)
- Excellent (14–15): Clear, well-justified goal; strong explanation of novelty and real-world or research value.
- Good (11–13): Goal stated and reasonable value proposition, but limited depth.
- Fair (8–10): Goal vague or value proposition weak.
- Poor (0–7): Goal unclear or missing; little to no value discussion.
Data Processing & Preparation (15 points)
- Excellent (14–15): Data processing steps clearly documented; challenges and decisions well explained and justified.
- Good (11–13): Major steps described; some justification missing.
- Fair (8–10): Basic steps mentioned, limited detail.
- Poor (0–7): Processing steps poorly documented or missing.
Data Analysis & Methodology (20 points)
- Excellent (18–20): Appropriate techniques used and justified; includes at least one new method beyond Assignments 1–3; clear, logical analysis narrative.
- Good (15–17): Correct techniques used with reasonable justification.
- Fair (11–14): Limited or weak methodological reasoning.
- Poor (0–10): Inappropriate methods, poor justification, or missing analysis.
Results & Interpretation (15 points)
- Excellent (14–15): Results clearly presented and directly tied to goals; metrics correctly interpreted.
- Good (11–13): Results presented clearly; interpretation mostly accurate.
- Fair (8–10): Results shown but weak explanation or unclear relevance.
- Poor (0–7): Results missing, incorrect, or not interpreted.
Conclusion & Future Work (5 points)
- Excellent (5): Clear summary and thoughtful discussion of future directions.
- Good (4): Conclusion present but limited insight.
- Fair (2–3): Minimal conclusion.
- Poor (0–1): Conclusion missing.
Writing Quality & Professionalism (5 points)
- Excellent (5): Clear, professional writing; well-organized; minimal errors.
- Good (4): Minor issues but readable and professional.
- Fair (2–3): Writing or organization problems affect clarity.
- Poor (0–1): Poor writing quality or formatting.
Note: All deliverables are required. Failure to submit any component (e.g., data files or scripts) will result in a grade penalty, regardless of the quality of the report.