CPSC 558 Final Project Part 1

Overview

The final for the course is a project. The project requirements are:

  1. Pick a data source and a dataset that interests you. You can use an existing ARFF file or CSV file for which there are NO POSTED SOLUTIONS, or you can use a data source that requires you to clean and format the data for your tool of choice. You can use Weka, or another open source tool that I can run. I must approve your tool choice if it is not Weka, and you must include a link to your data source and any related documentation. Using a data source for which there are posted solutions, or omitting the link to your data source, earns 0%. Get my approval on the data source and dataset as early as possible.

  2. Analyze that dataset using techniques we have learned this semester. You must find at least one pattern in the data that does not use tagged attributes for non-target attributes. You can use tagged attributes as non-target attributes during the initial phase. You must eliminate tagged attributes except for the target attribute for the final, data-based analysis. The target attribute may be a tagged attribute (likely). This is up to you.

  3. Document your analysis stages and results similar in format to my solution handouts. Document your work. This must be a PDF paper. You do not need to use the Q1 .. Qn format, although you can. I want to see what you tried. I want to see what approaches worked, why they worked, and what they found. This is the main point. Find at least one non-obvious pattern / correlation, and explain why it is significant. I want to see what approaches did not work, and why. You get credit for trying things. Include a summary of how your findings would be relevant in an industrial or research application of data analytics.
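The attribute elimination in requirement 2 can be sketched in a few lines of Python. This is only an illustration, not a required approach: the function name `drop_tagged`, the column names, and the sample rows are all hypothetical, and it assumes you have already decided which columns count as tagged and which one is your target.

```python
import csv
import io

def drop_tagged(csv_text, tagged, target):
    """Return CSV text without the columns named in `tagged`, always keeping `target`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep the target even if it is itself a tagged attribute.
    keep = [n for n in reader.fieldnames if n == target or n not in tagged]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `keep` are silently ignored
    return out.getvalue()

# Hypothetical data: mood_tag and genre are tagged; genre is the target.
sample = "tempo,loudness,mood_tag,genre\n120,0.8,happy,pop\n90,0.4,sad,folk\n"
final = drop_tagged(sample, tagged={"mood_tag", "genre"}, target="genre")
print(final)
```

The same result can be had inside your analysis tool by removing the attributes there; the point is only that the final dataset keeps the target and no other tagged attribute.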

Part 1

Identify a dataset and goal for the project, obtain the data, check and clean it as necessary, and document your goals, your steps, and the relevance of the project to a commercial or research application. The dataset must be approved by the course instructor. Identify in your documentation whether this is a fresh dataset with no prior analysis, or whether you are extending an existing analysis by confirming or refuting parts of it with data modeling techniques not used in the original. Data cleaning may just be a matter of converting a comma-separated-value (CSV) file into ARFF format and using AddExpression or similar filters to create derived attributes.
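As a rough illustration of that simplest case, the following Python sketch converts CSV text to ARFF text and adds one derived attribute. It is a sketch under stated assumptions, not a definitive converter: the names `csv_to_arff` and `is_number`, the sample data, and the `derived` parameter are hypothetical. The `derived` function only stands in for what a filter such as AddExpression would do inside Weka; it is not Weka's API. The sketch also assumes attribute names and values contain no spaces, commas, or quotes, which would need quoting in a real ARFF file.

```python
import csv
import io

def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def csv_to_arff(csv_text, relation, derived=None):
    """Convert CSV text (header row first) to ARFF text.

    A column is declared numeric only if every non-empty value parses as a
    float; otherwise it becomes a nominal attribute listing its distinct
    values. `derived` is an optional (name, function) pair that computes one
    extra attribute from each row dictionary.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    names = list(reader.fieldnames)
    rows = list(reader)
    if derived is not None:
        new_name, func = derived
        for row in rows:
            row[new_name] = str(func(row))
        names.append(new_name)
    lines = ["@relation " + relation, ""]
    for name in names:
        values = [row[name] for row in rows if row[name] != ""]
        if values and all(is_number(v) for v in values):
            lines.append("@attribute " + name + " numeric")
        else:
            nominal = ",".join(sorted(set(values)))
            lines.append("@attribute " + name + " {" + nominal + "}")
    lines += ["", "@data"]
    for row in rows:
        # ARFF marks a missing value with "?".
        lines.append(",".join(row[n] if row[n] != "" else "?" for n in names))
    return "\n".join(lines) + "\n"

# Hypothetical data with a derived attribute area = width * height.
sample = "width,height,label\n2,3,small\n4,5,large\n"
arff = csv_to_arff(sample, "shapes",
                   derived=("area", lambda r: float(r["width"]) * float(r["height"])))
print(arff)
```

Whatever tool you use for the conversion, record each cleaning step in your documentation so that it can be repeated.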

Deliverables:

Writing Guidelines

  1. Focus Your Topic. Do not write about your hopes and dreams. Write about the matter at hand. Know who your readers are and write for this intended audience.

  2. Outline. Create the document outline, including headings and perhaps subheadings, to organize the document before you write. In a previous team course, each team had a major heading, each team member had a subheading, and each team member organized that subsection using sub-subheadings. The details of this section-subsection organization vary according to the purpose of the project and assignment.

  3. Establish Context for Your Readers. Warm up your readers. Avoid the common error of launching into low-level technical detail in the first paragraph. Readers need to know how your proposal or specification fits into the system being proposed. Introduce how your part fits into the whole, and then describe your part in appropriate detail.

  4. Put Some Meat on the Bones. You need to say something concrete. Do not write a proposal, a specification, or a manual that is vague. If you will hand a specification off to someone who will perform the next stage of work, your specification should contain enough concrete information so that the next person will get a clear idea of what to do next. You do not need to get into a level of detail that the readers do not need, and you should never bloat the document with useless verbiage. Strive to be concrete in your writing. Pay attention to minimum page count requirements when I give them.

  5. Illustrate. One picture is worth a thousand words. Capture your key illustrations, then write text that describes the structure and dynamics of the illustrations. Use good illustrations to guide your writing. Use Adobe Illustrator or Google draw to create illustrations that you then import into your Word document.

  6. Write an Initial Draft. Overgenerate the first draft. Get your ideas onto paper, and then tune the focus and presentation. Use complete sentences. A complete sentence contains a subject and a verb at a minimum. Use active sentences. For example, prefer: “The students love the software engineering course.” Do not prefer: “The course was loved (by the students).”

  7. Edit. Use Word comments and change tracking to record suggestions and changes. Do not use long, run-on sentences. Use spell checking and possibly grammar checking before handing a document to someone else. Do not use contractions such as “don’t” in a technical document. Use capitalization consistently. All product names should be in upper case. Supply references, particularly URLs for products. Place them in your text or in footnotes as they fit. Edit down to concise prose.

  8. Use Peer Review. Ask a peer to review your writing and make suggestions about form or clarifications. In our class we will perform in-team peer review before turning in assignments. Use peer suggestions that you feel are appropriate.