
The various stages of a data analytics project (using the KDD or CRISP-DM methodology) in order to extract knowledge from data

The goal of this project is to understand and apply the various stages of a data analytics project (using the KDD or CRISP-DM methodology) in order to extract knowledge from data. You should also be able to appropriately evaluate the performance of the algorithms applied, as well as insightfully consider the results and limitations of your analyses.

In order to successfully complete the project, the following seven (7) steps should be followed.

1) Select at least three (3) related datasets appropriate to the FinTech domain. Each dataset should be suitably large (at least 10,000 rows and at least 10 columns).

Possible sources of datasets include, but are not limited to:

  • Statista: https://www.statista.com
  • European Data Portal, EU Open Data Portal, and others: http://data.europa.eu/
  • UK’s open government data repository: http://data.gov.uk
  • Central Statistics Office, Ireland: http://www.cso.ie
  • Kaggle: http://www.kaggle.com
  • Run My Code: http://www.runmycode.org/
  • Amazon’s public dataset repository: https://aws.amazon.com/datasets
  • Google’s Public Data Directory: http://www.google.com/publicdata/directory
  • The UCI machine learning repository: http://archive.ics.uci.edu/ml/
  • Google Data Search: https://toolbox.google.com/datasetsearch
  • Zenodo: https://zenodo.org
  • Dublinked: https://data.smartdublin.ie
  • Data.gov: https://www.data.gov/
  • Quandl: https://www.quandl.com
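
The size requirement in step 1 (at least 10,000 rows and at least 10 columns) can be checked before committing to a dataset. The following is a minimal sketch using only the Python standard library; the function name `meets_size_requirement` and the in-memory sample are illustrative assumptions, and it presumes a CSV file with a single header row.

```python
import csv
import io

MIN_ROWS = 10_000  # thresholds taken from step 1 of the brief
MIN_COLS = 10


def meets_size_requirement(csv_file, min_rows=MIN_ROWS, min_cols=MIN_COLS):
    """Return (ok, n_rows, n_cols) for an open CSV file with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])        # first line is assumed to be the header
    n_cols = len(header)
    n_rows = sum(1 for _ in reader)  # data rows only, header excluded
    return (n_rows >= min_rows and n_cols >= min_cols, n_rows, n_cols)


# Usage with a tiny in-memory stand-in for a real file (hypothetical data)
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
print(meets_size_requirement(sample))  # (False, 2, 3)
```

For a file on disk, the same function can be called as `meets_size_requirement(open(path, newline=""))`, where `path` is whatever file was downloaded.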

2) Produce a Data Quality Report and perform any necessary pre-processing, e.g. transformation, imputation, feature engineering, etc. Compute summary statistics on the data and describe what these statistics say about the data.
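
One possible starting point for step 2 is sketched below with pandas. The brief does not prescribe a tool or a report layout, so the column choices, the helper names `data_quality_report` and `impute_simple`, the median/mode imputation strategy, and the toy `loans` data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: type, missing count and share, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(dropna=True),
    })


def impute_simple(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the median, other columns with the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().all():
            continue  # nothing to learn a fill value from
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical toy data standing in for a real FinTech dataset
loans = pd.DataFrame({
    "amount": [1000.0, 2500.0, np.nan, 4000.0],
    "currency": ["EUR", None, "EUR", "USD"],
})

report = data_quality_report(loans)
clean = impute_simple(loans)
print(report)
print(clean.describe(include="all"))  # summary statistics after imputation
```

Median and mode imputation are only one option. Whatever strategy is chosen, the report should justify it, since imputation changes the summary statistics that are then interpreted.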

3) Implement four (4) different visualisations on the data. You can implement these on the individual datasets, a combination of the datasets or both.

4) Describe what these visualisations say about the data.

5) Implement four (4) different data mining algorithms. State why you have chosen these algorithms, and what you have found (i.e. knowledge extracted) using them.

6) Describe how well these algorithms perform using measures of performance including (but not limited to) R², MSE, Accuracy, Precision, AUC, RMSE, F-measure and MAPE.
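
To make the measures named in step 6 concrete, the sketch below computes several of them from their standard definitions in plain Python. The function names and the example values are illustrative assumptions; AUC is omitted because it needs predicted scores rather than hard labels, and in practice a library such as scikit-learn would normally be used instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, R^2 and MAPE for paired lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)               # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = sse / n
    # MAPE is undefined if any true value is zero
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - sse / sst, "MAPE": mape}


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure (F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"Accuracy": correct / len(pairs), "Precision": precision,
            "Recall": recall, "F1": f1}


# Hypothetical predictions, for illustration only
reg = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.0, 4.0, 6.0, 10.0])
clf = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(reg)
print(clf)
```

Regression measures (R², MSE, RMSE, MAPE) and classification measures (Accuracy, Precision, F-measure, AUC) answer different questions, so each algorithm should be evaluated with the measures that match its task.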

7) Write a report using the IEEE template. The report must not exceed eight (8) pages, including references. Papers over the page limit, even by a single word, will be subject to a penalty of 5 percentage points, i.e. the maximum mark for the paper will be 95%.

The report should contain the following sections:

  • Abstract
  • Introduction (which contains your objectives and motivation)
  • Data Quality, Data Pre-processing and Summary Statistics
  • Visualisations
  • Data Mining Algorithms, Results and Performance Evaluation
  • Conclusions & Future Work
  • References
