Financial data analysis is as broad a field as finance itself. You can use it to manage or mitigate different types of financial risk, make investment decisions, manage portfolios, value assets and so on. Below are a few beginner-level projects you can try.
1- Build a Credit Scorecard Model - Credit scorecards are used to assess the creditworthiness of customers. Use the German Loan data set (publicly available credit data) to build a credit scorecard. The data set has historical data on the default status of 1000 customers, along with factors that may be correlated with a customer's chance of defaulting, such as salary, age and marital status, and attributes of the loan contract, such as term and APR. Build a classification model (using techniques like Logistic Regression, LDA, Decision Trees, Random Forest, Boosting, Bagging) to separate good customers from bad ones (non-defaulters from defaulters). Then use the model to score new customers and lend only to those above a minimum score. Credit scorecards are heavily used in the industry for decisions on granting credit, monitoring portfolios, calculating expected loss etc.
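To make the scoring step concrete, here is a minimal sketch using only NumPy. It does not load the German Loan data; it generates synthetic customers with two made-up standardized features, fits a logistic regression by gradient descent, and converts the predicted log-odds into a score. The names `fit_logistic`, `score`, `base_score`, `pdo` and the cutoff of 600 are assumptions chosen for illustration, not part of any standard library.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of default
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def score(X, w, b, base_score=600.0, pdo=50.0):
    """Map log-odds of being a good customer to a score.

    base_score is the score at 1:1 odds; every `pdo` points doubles the odds.
    Both constants are hypothetical scaling choices.
    """
    log_odds_good = -(X @ w + b)  # the model predicts default, so flip the sign
    return base_score + (pdo / np.log(2)) * log_odds_good

# Synthetic data: column 0 ~ standardized income, column 1 ~ standardized loan term.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
true_logit = -1.0 - 1.5 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)  # 1 = default

w, b = fit_logistic(X, y)
scores = score(X, w, b)

cutoff = 600.0  # hypothetical minimum score for lending
approved = scores >= cutoff
default_rate_approved = y[approved].mean()
default_rate_declined = y[~approved].mean()
print(f"Default rate, approved: {default_rate_approved:.2%}")
print(f"Default rate, declined: {default_rate_declined:.2%}")
```

In a real project you would replace the synthetic data with the actual data set, hold out a test sample, and compare several of the techniques listed above before settling on a cutoff.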
2- Build a Stock Price Forecasting Model - These models are used to predict the price of a stock or an index for a given period in the future. You can download the stock prices of any publicly listed company, such as Apple, Microsoft, Facebook or Google, from Yahoo Finance. Such data is known as univariate time series data. You can use the ARIMA family of models (AR, MA, ARMA, ARIMA) or Exponential Smoothing models.
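As a starting point, simple exponential smoothing (the most basic member of the Exponential Smoothing family) fits in a few lines of plain Python. The prices below are made up for illustration, and `simple_exp_smoothing` is a hypothetical helper, not a library function. A fuller project would use a statistics package and also try the ARIMA models.

```python
def simple_exp_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    alpha close to 1 reacts quickly to recent prices; alpha close to 0 smooths heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical daily closing prices, for illustration only.
closing_prices = [150.0, 152.0, 151.0, 155.0, 157.0, 156.0]
forecast = simple_exp_smoothing(closing_prices, alpha=0.5)
print(f"Next-period forecast: {forecast:.2f}")
```

Simple exponential smoothing produces a flat forecast, so it suits series without a clear trend or seasonality.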
3- Portfolio Optimization Problem - Assume you are working as an adviser to a high net worth individual who wants to diversify 1 million in cash across 20 different stocks. How would you advise him? You can use a correlation matrix to find the 20 least correlated stocks (low correlation between holdings reduces portfolio risk) and then use optimization algorithms (operations research algorithms) to decide how to distribute the 1 million among these 20 stocks.
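One simple way to do the allocation step is the global minimum-variance portfolio, which has a closed-form solution and needs only NumPy. This is a sketch under assumptions: the returns are simulated rather than downloaded, the stock selection step is skipped, and `min_variance_weights` is a hypothetical helper. The closed form allows negative weights (short positions); a long-only portfolio would need a constrained optimizer instead.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights: solve cov @ x = 1, then normalize to sum to 1."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

# Simulated daily returns for 20 uncorrelated stocks with different volatilities.
rng = np.random.default_rng(42)
n_days, n_stocks = 500, 20
vols = rng.uniform(0.01, 0.03, size=n_stocks)
returns = rng.normal(0.0, vols, size=(n_days, n_stocks))

cov = np.cov(returns, rowvar=False)  # sample covariance matrix, 20 x 20
weights = min_variance_weights(cov)
allocation = weights * 1_000_000  # cash amount per stock

# Compare against the naive equal-weight portfolio.
equal = np.full(n_stocks, 1.0 / n_stocks)
min_var = weights @ cov @ weights
equal_var = equal @ cov @ equal
print(f"Minimum-variance portfolio variance: {min_var:.3e}")
print(f"Equal-weight portfolio variance:     {equal_var:.3e}")
```

The comparison with equal weights is a quick sanity check: the optimized portfolio should never have a higher in-sample variance than the naive one.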
4- Segmentation modelling - Financial services are increasingly tailor-made, which helps banks target customers more efficiently. How do banks do this? They use segmentation modelling to serve different segments of customers differently. You need historical data on customer attributes and on financial products/services to build a segmentation model. Techniques such as Decision Trees and Clustering are used to build segmentation models.
5- Revenue Forecasting - Revenue forecasting can also be done using statistical analysis (apart from the conventional accounting practices that companies follow). You can take data on the factors affecting the revenue of a company or a group of companies over equally spaced periods (monthly, quarterly, half-yearly, annual) and build a regression model. Make sure you correct for autocorrelation: the data has a time series component, so the errors are likely to be correlated, which violates an assumption of regression analysis.
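To illustrate the autocorrelation point, the sketch below simulates revenue with AR(1) errors, checks the OLS residuals with the Durbin-Watson statistic (values near 2 suggest no first-order autocorrelation), and applies one Cochrane-Orcutt step as a correction. Cochrane-Orcutt is one possible correction chosen here for illustration, not the only one. The data, the variable names and the helper functions are all assumptions.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares; returns (coefficients, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 means little first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Simulated data: revenue driven by one factor plus AR(1) errors (rho = 0.8).
rng = np.random.default_rng(1)
n = 300
driver = rng.normal(size=n)
shocks = rng.normal(size=n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.8 * errors[t - 1] + shocks[t]
revenue = 2.0 + 3.0 * driver + errors

# Step 1: plain OLS, then test the residuals.
X = np.column_stack([np.ones(n), driver])
beta_ols, resid = ols(X, revenue)
dw_before = durbin_watson(resid)

# Step 2: estimate rho from the residuals and quasi-difference the data.
rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
y_star = revenue[1:] - rho * revenue[:-1]
X_star = X[1:] - rho * X[:-1]  # intercept column becomes (1 - rho)

# Step 3: OLS on the transformed data; coefficients are on the original scale.
beta_co, resid_star = ols(X_star, y_star)
dw_after = durbin_watson(resid_star)

print(f"Durbin-Watson before correction: {dw_before:.2f}")
print(f"Durbin-Watson after correction:  {dw_after:.2f}")
```

In practice the Cochrane-Orcutt step is usually iterated until the estimate of rho stabilizes.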
6- Pricing Financial Products - You can build models to price financial products such as mortgages, auto loans, credit card transactions etc. Pricing in this case means charging the right interest rate: one that accounts for the risk involved and earns a profit on the contract, yet stays competitive in the market. You can also build models to price forwards, futures, options and swaps (these are relatively more complicated though).
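For loans, one common starting point is a cost-plus (risk-based) rate: add up the funding cost, the expected credit loss, the operating cost and a profit margin. The sketch below shows that arithmetic together with the standard annuity formula for the monthly payment. All input figures are made up, and `risk_based_rate` and `monthly_payment` are hypothetical helpers. It ignores competition, prepayment and the time value of losses.

```python
def risk_based_rate(cost_of_funds, prob_default, loss_given_default,
                    operating_cost, profit_margin):
    """Annual rate = funding cost + expected loss + operating cost + margin."""
    expected_loss = prob_default * loss_given_default
    return cost_of_funds + expected_loss + operating_cost + profit_margin

def monthly_payment(principal, annual_rate, n_months):
    """Level monthly payment on a fully amortizing loan (annuity formula)."""
    r = annual_rate / 12
    if r == 0:
        return principal / n_months
    return principal * r / (1 - (1 + r) ** -n_months)

# Hypothetical auto loan: 3% funding cost, 2% default probability,
# 50% loss given default, 1% operating cost, 1.5% target margin.
rate = risk_based_rate(0.03, 0.02, 0.50, 0.01, 0.015)
payment = monthly_payment(20_000, rate, 60)
print(f"Quoted annual rate: {rate:.2%}")
print(f"Monthly payment on 20,000 over 60 months: {payment:.2f}")
```

A riskier customer (higher default probability) gets a higher quoted rate, which is the link between this project and the credit scorecard in project 1.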
7- Prepayment models - Prepayment is a problem for banks in loan contracts. Use loan data to predict which customers are likely to prepay. In parallel, you can build another model to estimate, for a customer who does prepay, when in the lifetime of the loan they are likely to do so (time to prepay). You may also build a model to estimate how much loss the company would incur if a section of the customer portfolio prepays in the future.
8- Fraud Model - These models are used to tell whether a particular transaction is fraudulent. Historical data with details of fraudulent and non-fraudulent transactions can be used to build a classification model that predicts the chance of fraud in a transaction. Since we normally have a high volume of data, you should try not just relatively simple models like Logistic Regression or Decision Trees but also more sophisticated ensemble models.