Data Science Projects
Project type: Group project, University of Liverpool
Using the Python language and its packages
Aim of the project: Maximize shareholder returns by predicting the price of Bitcoin using machine learning and deep learning techniques
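As a rough illustration of what a price-prediction pipeline of this kind involves, here is a minimal sketch. It is not the project's actual model: it fits a lag-1 linear regression (a stand-in for the machine and deep learning models) on a synthetic random-walk series, since no real Bitcoin data is included here, and compares it with a naive baseline on a chronological hold-out. The helper names make_synthetic_prices, fit_lag1 and mean_abs_error are hypothetical and exist only for this example.

```python
import random


def make_synthetic_prices(n=200, start=100.0, seed=0):
    """Generate a synthetic random-walk price series (NOT real Bitcoin data)."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        # Each step multiplies the last price by a small random daily return.
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0005, 0.02)))
    return prices


def fit_lag1(prices):
    """Fit next_price = a + b * current_price by ordinary least squares."""
    x = prices[:-1]
    y = prices[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_abs_error(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - q) for p, q in zip(actual, predicted)) / len(actual)


prices = make_synthetic_prices()
split = int(len(prices) * 0.8)  # chronological split, no shuffling
train, test = prices[:split], prices[split:]

a, b = fit_lag1(train)

# One-step-ahead inputs: the price at t-1 is used to predict the price at t.
inputs = prices[split - 1:-1]
model_pred = [a + b * p for p in inputs]
naive_pred = inputs  # baseline: tomorrow's price equals today's

model_mae = mean_abs_error(test, model_pred)
naive_mae = mean_abs_error(test, naive_pred)
print(f"lag-1 regression MAE: {model_mae:.3f}")
print(f"naive baseline MAE:   {naive_mae:.3f}")
```

The split is chronological rather than random so that the model never sees prices from the period it is evaluated on. Comparing against the naive baseline is a useful sanity check, because on a series that behaves like a random walk a more complex model often fails to beat it.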
2. Credit Card Fraud Prediction using Machine Learning and Artificial Intelligence
This project is currently in progress.
BIG-Biological Data Projects
1. Rheumatoid Arthritis: Data Analysis using gwascat
Using the R language
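# GWAS analysis: gwascat and its companion packages come from Bioconductor,
# so BiocManager must be installed first
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ggbio")
BiocManager::install("gwascat")
BiocManager::install("Homo.sapiens")
library(gwascat)
# List the objects exported by gwascat
objects("package:gwascat")
# Load the ebicat38 GWAS Catalog snapshot shipped with gwascat
data(ebicat38)
# Most frequently reported traits in the catalog
topTraits(ebicat38)
# Keep the first 294 records annotated with the trait "Rheumatoid arthritis"
ra_hits <- subsetByTraits(ebicat38, tr = "Rheumatoid arthritis")[1:294]
ra_hits
df_main <- data.frame(ra_hits)
# Check the working directory, then save the records there as a CSV file
getwd()
write.csv(df_main, "Gwas_RA_ALL.csv", row.names = FALSE)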
# Basic Manhattan plot
gwtrunc = ebicat38
requireNamespace("S4Vectors")
mcols = S4Vectors::mcols
# Cap -log10(p) at 1000 so extreme values do not dominate the scale
mlpv = mcols(ebicat38)$PVALUE_MLOG
mlpv = ifelse(mlpv > 1000, 1000, mlpv)
S4Vectors::mcols(gwtrunc)$PVALUE_MLOG = mlpv
library(GenomeInfoDb)
# Switch to UCSC-style sequence names ("chr1", "chr2", ...)
seqlevelsStyle(gwtrunc) = "UCSC"
# Keep only the records on chromosome 1
gwlit = gwtrunc[ which(as.character(seqnames(gwtrunc)) %in% c("chr1")) ]
library(ggbio)
# Cap -log10(p) again, at 550, for the chr1 plot
mlpv = mcols(gwlit)$PVALUE_MLOG
mlpv = ifelse(mlpv > 550, 550, mlpv)
S4Vectors::mcols(gwlit)$PVALUE_MLOG = mlpv
# Plot -log10(p) against genomic position on chr1
autoplot(gwlit, geom="point", aes(y=PVALUE_MLOG), xlab="chr1" )
Running this code produces the chr1 Manhattan plot shown below.
2. Plotting Biological Data on the World Map: Density-based Mapping
# First install the maps package, which map_data() needs for the world outlines
install.packages("maps")
library(maps)
library(ggplot2)
world_map <- map_data("world")
# Empty base plot with a fixed aspect ratio and no axis titles
p <- ggplot() + coord_fixed() +
  xlab("") + ylab("")
# Add map to base plot
base_world_messy <- p + geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
                                     colour = "darkslategrey", fill = "white")
base_world_messy
# Strip the map down so it looks super clean (and beautiful!)
cleanup <-
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "darkslategrey", colour = "darkslategrey"),
        axis.line = element_line(colour = "white"), legend.position = "none",
        axis.ticks = element_blank(), axis.text.x = element_blank(),
        axis.text.y = element_blank())
base_world <- base_world_messy + cleanup
base_world
# Read the file with coordinate data (the plot below uses its long, lat and value columns)
df <- read.csv("C:/Users/workc/OneDrive/Desktop/RA_map_cord.csv")
# Overlay one point per record, sized by value
map_data_sized <-
  base_world +
  geom_point(data = df,
             aes(x = long, y = lat, size = value), colour = "black",
             fill = "deeppink", pch = 21, alpha = I(1))
map_data_sized
Disease mapping on the world map by countries
(Improved version of the map above)
Data Visualisation Gallery
Data visualisation outputs are included below.
Using Power BI
1. Protein Database analysis
Generated visualisations are given below:
Created using the R programming language (3 visualisations)
Created using Power BI (4 visualisations)