Data Science Projects
1. Predicting Stock Performance with Machine Learning
Project type: Group project, University of Liverpool
Using the Python language and packages
Aim of the project: Maximize shareholder returns by predicting the price of Bitcoin using machine learning and deep learning techniques
2. Credit Card Fraud Prediction using Machine Learning and Artificial Intelligence
Currently in progress.
BIG-Biological Data Projects
1. Rheumatoid Arthritis: Data Analysis using gwascat
Using the R language
# GWAS analysis requires the BiocManager package to be installed
install.packages("BiocManager")
BiocManager::install("ggbio")
BiocManager::install("gwascat")
BiocManager::install("Homo.sapiens")
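library(gwascat)
# List the objects exported by gwascat
objects("package:gwascat")
# Load the ebicat38 catalog snapshot bundled with gwascat
data(ebicat38)
# Most frequently reported traits in the catalog
topTraits(ebicat38)
# Rheumatoid arthritis records (first 294 records)
subsetByTraits(ebicat38, tr = "Rheumatoid arthritis")[1:294]
df_main <- data.frame(subsetByTraits(ebicat38, tr = "Rheumatoid arthritis")[1:294])
# Show the working directory, which is where the CSV file is written
getwd()
write.csv(df_main, "Gwas_RA_ALL.csv", row.names = FALSE)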
# Basic Manhattan plot
gwtrunc = ebicat38
requireNamespace("S4Vectors")
mcols = S4Vectors::mcols
# Cap -log10(p) at 1000 so extreme values do not dominate the plot
mlpv = mcols(ebicat38)$PVALUE_MLOG
mlpv = ifelse(mlpv > 1000, 1000, mlpv)
S4Vectors::mcols(gwtrunc)$PVALUE_MLOG = mlpv
library(GenomeInfoDb)
# Use UCSC-style chromosome names ("chr1", "chr2", ...)
seqlevelsStyle(gwtrunc) = "UCSC"
# Keep only the records on chromosome 1
gwlit = gwtrunc[which(as.character(seqnames(gwtrunc)) %in% c("chr1"))]
library(ggbio)
# Cap -log10(p) again, at 550, for the chromosome 1 subset
mlpv = mcols(gwlit)$PVALUE_MLOG
mlpv = ifelse(mlpv > 550, 550, mlpv)
S4Vectors::mcols(gwlit)$PVALUE_MLOG = mlpv
methods:::cbind2(FALSE)
autoplot(gwlit, geom = "point", aes(y = PVALUE_MLOG), xlab = "chr1")
Running this code produces the Manhattan plot for chromosome 1 shown below.
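The capping step in the Manhattan plot code (limiting PVALUE_MLOG to a ceiling so that a few extreme values do not compress the y axis) can be tried out on a plain numeric vector. This is a minimal sketch in base R: the values in mlog_p are made up for illustration and are not taken from ebicat38, and pmin() is shown only as a possible shorter alternative to the ifelse() call used above.

```r
# Hypothetical -log10 p-values, including one extreme value and one missing value
mlog_p <- c(7.3, 12.1, 320.5, 1450.0, NA, 8.9)
cap <- 550

# Capping as written in the script above
capped_ifelse <- ifelse(mlog_p > cap, cap, mlog_p)

# Element-wise minimum against the cap; NA values stay NA
capped_pmin <- pmin(mlog_p, cap)

# Compare the two results
all.equal(capped_ifelse, capped_pmin)
```

Both forms leave values at or below the cap unchanged, so the choice between them is mostly a matter of readability.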
2. Plotting Biological Data on the World Map: Density-based Mapping
# First install the maps package
install.packages("maps")
library(maps)
library(ggplot2)
world_map <- map_data("world")
p <- ggplot() + coord_fixed() +
  xlab("") + ylab("")
# Add map to base plot
base_world_messy <- p + geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
                                     colour = "darkslategrey", fill = "white")
base_world_messy
# Strip the map down so it looks super clean (and beautiful!)
cleanup <-
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "darkslategrey", colour = "darkslategrey"),
        axis.line = element_line(colour = "white"), legend.position = "none",
        axis.ticks = element_blank(), axis.text.x = element_blank(),
        axis.text.y = element_blank())
base_world <- base_world_messy + cleanup
base_world
# Read the file with coordinate data (the columns long, lat and value are used below)
df <- read.csv("C:/Users/workc/OneDrive/Desktop/RA_map_cord.csv")
# Draw one point per row, sized by the value column
map_data_sized <-
  base_world +
  geom_point(data = df,
             aes(x = long, y = lat, size = value), colour = "black",
             fill = "deeppink", pch = 21, alpha = 1)
map_data_sized
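The point layer above reads three columns from the coordinate file: long, lat and value. Since the CSV itself is not included here, the following base R sketch shows one way to check that a file has the expected shape before plotting. The data frame df_example and its numbers are hypothetical and exist only to illustrate the column layout.

```r
# Hypothetical coordinates and values, only to illustrate the expected columns
df_example <- data.frame(
  long  = c(-2.98, 77.21, 139.69),
  lat   = c(53.41, 28.61, 35.68),
  value = c(10, 25, 15)
)

# Stop early if a column used by aes() is missing
required_cols <- c("long", "lat", "value")
missing_cols <- setdiff(required_cols, names(df_example))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Round-trip through a temporary CSV, mirroring the read.csv() step
csv_path <- tempfile(fileext = ".csv")
write.csv(df_example, csv_path, row.names = FALSE)
df_check <- read.csv(csv_path)

# Flag rows whose coordinates fall outside valid longitude/latitude ranges
in_range <- with(df_check, long >= -180 & long <= 180 & lat >= -90 & lat <= 90)
df_check[!in_range, ]
```

A data frame that passes these checks can be passed to geom_point() in place of df.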
Disease mapping on the world map by country
(An improved version of the map above)
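The code for the country-level map is not shown in the source, so the following is only a sketch of one common way to build such a map with ggplot2: join per-country values onto the polygons returned by map_data("world") and fill each country by value. The data frame country_values, its numbers and the colour scale are assumptions made for illustration. Region names must match those used by the maps package (for example "UK" and "USA").

```r
library(ggplot2)  # map_data("world") also needs the maps package installed

world_map <- map_data("world")

# Hypothetical per-country values; not the author's data
country_values <- data.frame(
  region = c("UK", "USA", "India", "Japan"),
  value  = c(12, 30, 18, 9)
)

# Left join keeps every polygon vertex; countries without data get NA
world_joined <- merge(world_map, country_values, by = "region", all.x = TRUE)

# merge() reorders rows, so restore the drawing order of the polygon vertices
world_joined <- world_joined[order(world_joined$group, world_joined$order), ]

country_map <- ggplot(world_joined,
                      aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon(colour = "darkslategrey") +
  coord_fixed() +
  scale_fill_gradient(low = "mistyrose", high = "deeppink", na.value = "white")
country_map
```

Filling whole countries avoids the overlapping points that a sized-point map can produce when several locations sit close together.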
Data Visualisation Gallery
Data visualisation outputs are included below.
Using Power BI
Using Power BI
1. Protein Database analysis
Generated visualisations are given below:
Created using the R programming language
Created using the R programming language
Created using the R programming language
Created using Power BI
Created using Power BI
Created using Power BI
Created using Power BI