INFS 5102: Unsupervised Methods in Analytics

Assignment

Task Description

In this assignment, you are given the prices of 5 US stocks between 2016 and 2018. We have deliberately changed the prices of one particular stock, mimicking a scenario in which the stock has been controlled or hacked maliciously by a third party. You are asked to examine these stock prices and identify which stock is most likely to have been hacked. You will then write a report describing your analysis.

Things to notice:

Your goal is to find unnatural fluctuations in stock prices. The hacked stock is likely to show abnormal fluctuations more frequently than the others.

  1. One way to identify unnatural fluctuation is to compare the stock prices on consecutive days.
  2. For each stock, you should identify how unnatural/abnormal each daily price is. Therefore, for each stock, you will compute an outlier score for each trading day in the period between 2016 and 2018.
  3. Then you need to compare the stocks to determine whether the outlier scores of one particular stock are significantly larger than those of the other stocks. You can find some information on how to compare groups (each stock is a group) of outlier scores in the reference materials.
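
As a rough illustration of step 3, the sketch below summarizes each stock's outlier scores by the fraction of days whose score exceeds a cutoff, then ranks the stocks by that fraction. The function names, the cutoff of 3.0, and the toy scores are all assumptions made for illustration. This is one simple possibility, not the method described in the reference materials.

```python
def fraction_flagged(scores, threshold=3.0):
    """Share of days whose outlier score exceeds the (assumed) threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def rank_stocks(scores_by_stock, threshold=3.0):
    """Return (stock, fraction flagged) pairs, most suspicious first."""
    summary = {
        name: fraction_flagged(scores, threshold)
        for name, scores in scores_by_stock.items()
    }
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-day outlier scores for three made-up stocks.
example_scores = {
    "A": [0.2, 0.5, 4.1, 0.3, 3.6],
    "B": [0.1, 0.4, 0.2, 3.2, 0.3],
    "C": [0.3, 0.2, 0.1, 0.4, 0.2],
}

ranking = rank_stocks(example_scores)
for name, fraction in ranking:
    print(name, fraction)
```

A single summary number per stock is easy to compare but discards information, so a statistical comparison of the full score distributions may be more convincing in the report.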

The following are some questions that you need to answer:

  • How will you define/measure the outlier scores? You can find some ideas in the URL link https://au.mathworks.com/help/matlab/ref/isoutlier.html.
  • Here is a tip. Suppose the following are the prices of two stocks:

Stock 1: [100, 100, 100, 110, 200, 110, 100, 100, 90]

Stock 2: [1, 1, 1, 1.1, 2, 1.1, 1, 1, 0.9]

For each stock, you should compute 9 outlier scores (one for each day). Your outlier scores should reflect that there is an abnormal price fluctuation on the 5th day.

Also, if you compare the prices of the two stocks, they differ only by a constant factor (100). Therefore, your definition should give the two stocks the same outlier scores. One way to achieve this is to normalize your data properly. You need to think about how you should normalize your data.
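
The tip above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed solution. It normalizes prices by taking the relative change between consecutive days, assigns the first day a change of 0 because it has no previous day, and scores each day by its distance from the median in units of scaled median absolute deviation (MAD), similar in spirit to the default method described on the isoutlier page. All function names are hypothetical.

```python
from statistics import median


def daily_returns(prices):
    """Relative change from the previous day; the first day is set to 0.0 (assumption)."""
    returns = [0.0]
    for previous, current in zip(prices, prices[1:]):
        returns.append((current - previous) / previous)
    return returns


def mad_scores(values):
    """Absolute distance from the median, in units of scaled MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for normal data
    if scale == 0:
        # Degenerate case: more than half the values equal the median.
        return [0.0 if v == center else float("inf") for v in values]
    return [abs(v - center) / scale for v in values]


def outlier_scores(prices):
    """One outlier score per day, unaffected by a constant price factor."""
    return mad_scores(daily_returns(prices))


stock_1 = [100, 100, 100, 110, 200, 110, 100, 100, 90]
stock_2 = [1, 1, 1, 1.1, 2, 1.1, 1, 1, 0.9]

scores_1 = outlier_scores(stock_1)
scores_2 = outlier_scores(stock_2)

# Index of the day with the largest score (0-based, so 4 means the 5th day).
most_abnormal_day = max(range(len(scores_1)), key=lambda i: scores_1[i])
print(most_abnormal_day + 1)
```

Because relative changes cancel any constant factor, both stocks receive the same scores up to floating-point rounding, and the 5th day receives the largest score. Note that the drop back to 110 on the 6th day also scores highly under this definition, which is worth discussing when you justify your choice.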

Deliverables

You need to submit a report and a presentation.

Part 1 is the report. Please use the provided template for the report. The report may be at most 8 pages in total, excluding the appendix. There is no page limit on the appendix.

Section 1: Introduction to outlier detection

  • What is outlier detection?
  • Why do we need outlier detection?
  • Examples of outlier detection applications
  • How outlier detection is relevant to detecting abnormal stock behavior.

Section 2: Literature review of outlier detection methods

  • A literature review of different outlier detection methods, including different ways to define outlier scores. You should try to include references for outlier detection in time series.
  • Include all works that you have reviewed in the reference section.

Section 3: Define outlier scores

  • With respect to our application (detecting abnormal stock behavior), you need to propose an outlier score that determines, for a given stock, whether its price on a given day is normal or not. Explain clearly how you compute the outlier score.
  • Justify your choices.
  • You are encouraged to propose more than one outlier score. Bonus marks will be given if you propose multiple.

Section 4: Compute outlier score of a group/entity of records — Identification of suspicious stocks

  • Review the Konijn paper (up to and including Section 4.1) for how to assign outlier scores for a group of records.
  • Describe how you evaluate whether a stock has been manipulated or not.
  • Evaluation of your proposed algorithms – pros and cons, limitations, etc.

Section 5: Conclusion and future work

  • Summary
  • Areas of improvement or future work