You Are Required to Carry Out a Series of Analyses on Publicly Accessible Datasets: Programming for Big Data Report, NCI
University: National College of Ireland (NCI)
Subject: Programming for Big Data
Assignment Description
You are required to carry out a series of analyses on publicly accessible datasets using the R programming language taught in this module and programming environments suitable for the task. It is recommended that you use at least two separate datasets. For each of the chosen datasets, you are required to compile a report of your analysis. Each dataset should have at least 1,000 records (rows). If you are unsure whether your datasets are appropriate, please check with your lecturer. You must provide evidence in your report that you are authorized to use the datasets that you have chosen.
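The 1,000-record minimum can be checked in R before any analysis begins. The following is a minimal sketch, not a prescribed method: `meets_size_requirement` is a hypothetical helper name, and the built-in `quakes` data frame from base R's `datasets` package only stands in for a dataset of your own choosing.

```r
# Hypothetical helper: TRUE when a data frame has at least `min_rows` rows.
meets_size_requirement <- function(df, min_rows = 1000) {
  stopifnot(is.data.frame(df))
  nrow(df) >= min_rows
}

# `quakes` ships with base R and is used here purely as a stand-in dataset.
print(nrow(quakes))
print(meets_size_requirement(quakes))

# A ten-row extract fails the same check.
print(meets_size_requirement(head(quakes, 10)))
```

Running a check like this at the top of each script makes the size requirement visible to whoever executes the code during assessment.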
The main deliverable is a report that provides significant insights into the datasets that you have chosen to analyze. Your report should provide at least four unique insights based on your data analysis. Examples of insights might include relationships, trends/patterns, correlations, models based on the data, visuals, and statistical analyses.
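To make the kinds of insight listed above more concrete, the sketch below derives a correlation, a simple model, and a grouped summary using base R only. It is an illustration under stated assumptions rather than a template: the built-in `quakes` data frame stands in for your own data, and the magnitude bands are arbitrary cut points chosen for this example.

```r
# Stand-in data: a copy of base R's `quakes` (earthquakes near Fiji).
eq <- quakes

# Insight 1 (correlation): magnitude versus number of reporting stations.
mag_station_cor <- cor(eq$mag, eq$stations)

# Insight 2 (model): linear regression of stations on magnitude.
fit <- lm(stations ~ mag, data = eq)
slope <- unname(coef(fit)["mag"])

# Insight 3 (pattern): median depth within illustrative magnitude bands.
eq$mag_band <- cut(eq$mag, breaks = c(4, 5, 6, 7), right = FALSE)
depth_by_band <- aggregate(depth ~ mag_band, data = eq, FUN = median)

print(mag_station_cor)
print(slope)
print(depth_by_band)
```

Each result would still need interpretation in the report; a number on its own does not count as an insight until it is tied back to an objective.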
All deliverables should be compiled into a project report document for submission, with all programming code elements included in an appendix. Please submit your report via the Turnitin upload link in Moodle. R scripts and additional files are to be uploaded to a separate link in Moodle. Your project report should discuss the challenges that you encountered while handling your chosen datasets and the mechanisms you implemented to overcome them. The word count for your report should be between 2,000 and 2,500 words (not counting R code).
Your submission will be graded on the following elements:
- Description of the objective(s) of the analysis, with reference to the basic domain literature to explain the domain purpose of the analyses
- Description of the underlying dataset including an assessment of the data types present, with an emphasis on the data that is actually used in the analytical processes
- Approach to the analysis, aided by visuals such as diagrams, flowcharts and tables where appropriate
- R code demonstrating at least four unique insights. R scripts will be executed as part of the assessment process. It is expected that scripts are fully working, efficient, commented clearly, and do not contain excess code
- Project report structure, presentation and discussion of challenges.
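The assessment of data types mentioned in the list above can be produced directly in R and pasted into the report as a table. This is a minimal sketch assuming a data frame is already loaded; `quakes` is again only a stand-in, and the columns of `type_summary` are one possible choice among many.

```r
# Build a one-row-per-column overview of a data frame's contents.
type_summary <- data.frame(
  column    = names(quakes),
  class     = vapply(quakes, function(x) class(x)[1], character(1)),
  n_missing = vapply(quakes, function(x) sum(is.na(x)), integer(1)),
  n_unique  = vapply(quakes, function(x) length(unique(x)), integer(1)),
  row.names = NULL
)

print(type_summary)
```

Restricting the table to the columns actually used in the analysis keeps the emphasis where the brief asks for it.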
Objectives and Literature Review
As with every piece of data analysis, you should ideally have a question or set of questions you expect your work to answer; these are your objectives. They will be graded for realism, imagination, ambition and clarity of expression.
Dataset description
Your chosen datasets should be included in their original form as ancillary files. If they are prohibitively large, you should include a well-chosen, representative subset. Where and how the datasets were located and downloaded should be clearly shown. They will be graded for richness, depth and interest factor.
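If a dataset is too large to submit in full, one way to draw a subset is a seeded simple random sample, so that the same rows are selected every time the script runs. The sketch below assumes simple random sampling is adequate for your data, which may not hold if important groups are rare; `take_sample` is a hypothetical helper, and the sample size and file name are arbitrary.

```r
# Hypothetical helper: simple random sample of at most `n` rows.
take_sample <- function(df, n) {
  idx <- sample.int(nrow(df), size = min(n, nrow(df)))
  df[idx, , drop = FALSE]
}

set.seed(42)  # fixed seed so the subset is reproducible
subset_df <- take_sample(quakes, 200)

# Compare a key variable in the full data and in the subset.
print(summary(quakes$mag))
print(summary(subset_df$mag))

# Write the subset to a temporary location as an ancillary file.
subset_path <- file.path(tempdir(), "quakes_subset.csv")
write.csv(subset_df, subset_path, row.names = FALSE)
```

Comparing summaries of the full data and the subset, as done here for `mag`, gives some evidence that the subset is representative.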
Analysis approach
Your data analysis should be designed in advance and the design documented via description and visual aids such as tables, flowcharts, and other appropriate schematics. Please note that screenshots of your code do not count as such in the general case and should be avoided unless there is a specific reason why they are appropriate.
Three established approaches to data analysis are:
- Cross-Industry Standard Process for Data Mining (CRISP-DM)
- Knowledge Discovery in Databases (KDD)
- Sample, Explore, Modify, Model and Assess (SEMMA)
Analysis results and presentation
The results of your analysis should go as far as possible towards reaching your previously stated objectives (i.e. answering the questions you were hoping to answer). Note that a robust conclusion that the dataset or the analysis is insufficient to reach a specific conclusion is not a failure but is, in fact, a positive result!
R code
You are required to use R to an extent that showcases your aptitude with the most important operations learned during the course (file I/O, control structures, functions, etc.). A substantial amount of code is expected.
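The three operations named above can be combined in a few lines. The following is a minimal sketch under stated assumptions: `classify_magnitude` and its thresholds are hypothetical and chosen only for illustration, and the built-in `quakes` data frame is written to a temporary CSV so that there is a file to read back.

```r
# Function: label a single magnitude value (illustrative thresholds).
classify_magnitude <- function(mag) {
  if (mag < 5) {
    "light"
  } else if (mag < 6) {
    "moderate"
  } else {
    "strong"
  }
}

# File I/O: write the stand-in data to a temporary CSV, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(quakes, csv_path, row.names = FALSE)
eq <- read.csv(csv_path)

# Control structure: explicit loop over rows with a pre-allocated result.
labels <- character(nrow(eq))
for (i in seq_len(nrow(eq))) {
  labels[i] <- classify_magnitude(eq$mag[i])
}
eq$class <- labels

# The same result without an explicit loop, often preferred in R.
eq$class_vec <- vapply(eq$mag, classify_magnitude, character(1))

print(table(eq$class))
```

The loop and the `vapply` call produce identical labels; showing both in a script demonstrates control structures while also showing awareness of the more idiomatic form.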
Project report
Your ultimate project report, encompassing all the above, will be graded for structure, presentation and quality of its discussion of challenges. There is no requirement to structure your report as a scientific paper, though you are free to do so if you prefer.