Data Science Projects
The data science and data visualization projects I have completed are listed below:
Data Science Projects
1. Predicting Stock Performance with Machine Learning
Project type: Group project, University of Liverpool
Tools: Python and its packages
Aim of the project: Maximize shareholder returns by predicting the price of Bitcoin using machine learning and deep learning techniques


2. Credit Card Fraud Prediction using Machine Learning and Artificial Intelligence
Status: currently in progress
BIG-Biological Data Projects
1. Rheumatoid Arthritis: Data Analysis using gwascat
Using the R language
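# GWAS analysis needs the BiocManager package, which installs Bioconductor packages
install.packages("BiocManager")
BiocManager::install("ggbio")
BiocManager::install("gwascat")
BiocManager::install("Homo.sapiens")
library(gwascat)
# List the objects exported by gwascat
objects("package:gwascat")
# Load the ebicat38 GWAS Catalog data set
data(ebicat38)
# Show the most frequently reported traits
topTraits(ebicat38)
# Print the first 294 records for rheumatoid arthritis, then store them as a data frame
subsetByTraits(ebicat38, tr="Rheumatoid arthritis")[1:294]
df_main <- data.frame(subsetByTraits(ebicat38, tr="Rheumatoid arthritis")[1:294])
# Check the working directory, then save the subset there as a CSV file
getwd()
write.csv(df_main, "Gwas_RA_ALL.csv", row.names = FALSE)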
# Basic Manhattan plot
# Work on a copy and cap the -log10(p) values at 1000
gwtrunc = ebicat38
requireNamespace("S4Vectors")
mcols = S4Vectors::mcols
mlpv = mcols(ebicat38)$PVALUE_MLOG
mlpv = ifelse(mlpv > 1000, 1000, mlpv)
S4Vectors::mcols(gwtrunc)$PVALUE_MLOG = mlpv
# Switch to UCSC chromosome names ("chr1", "chr2", ...) and keep chromosome 1 only
library(GenomeInfoDb)
seqlevelsStyle(gwtrunc) = "UCSC"
gwlit = gwtrunc[ which(as.character(seqnames(gwtrunc)) %in% c("chr1")) ]
# Cap the values again, this time at 550, before plotting
library(ggbio)
mlpv = mcols(gwlit)$PVALUE_MLOG
mlpv = ifelse(mlpv > 550, 550, mlpv)
S4Vectors::mcols(gwlit)$PVALUE_MLOG = mlpv
methods:::cbind2(FALSE)
autoplot(gwlit, geom="point", aes(y=PVALUE_MLOG), xlab="chr1" )
Running this code produces the Manhattan plot for chromosome 1 shown below.
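The capping step in the code above, ifelse(mlpv > 550, 550, mlpv), can also be written with base R's pmin(). The short sketch below compares the two forms on a small vector. The numbers are made up for illustration and are not taken from ebicat38.

```r
# Illustrative -log10(p) values; these numbers are invented for the example
mlpv <- c(3.2, 48.0, 612.5, 1500.0, NA)

# Form used in the analysis code: replace anything above 550 with 550
capped_ifelse <- ifelse(mlpv > 550, 550, mlpv)

# Shorter alternative: element-wise minimum of each value and 550
capped_pmin <- pmin(mlpv, 550)

# Both forms agree, and missing values stay missing
stopifnot(isTRUE(all.equal(capped_ifelse, capped_pmin)))
print(capped_pmin)
```

Either form leaves values at or below the cap unchanged, so the choice is mostly a matter of readability.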

2. Plotting Biological Data on the World Map: Density-based Mapping
# First install the maps package
install.packages("maps")
library(maps)
library(ggplot2)
# Get the world map outline as a data frame of polygon coordinates
world_map <- map_data("world")
# Empty base plot with a fixed aspect ratio and no axis labels
p <- ggplot() + coord_fixed() +
  xlab("") + ylab("")
# Add map to base plot
base_world_messy <- p + geom_polygon(data=world_map, aes(x=long, y=lat, group=group),
                                     colour="darkslategrey", fill="white")
base_world_messy

#Strip the map down so it looks super clean (and beautiful!)
cleanup <-
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = 'darkslategrey', colour = 'darkslategrey'),
        axis.line = element_line(colour = "white"), legend.position="none",
        axis.ticks=element_blank(), axis.text.x=element_blank(),
        axis.text.y=element_blank())
base_world <- base_world_messy + cleanup
base_world

# Read the file with coordinate data (columns used below: long, lat, value)
df <- read.csv("C:/Users/workc/OneDrive/Desktop/RA_map_cord.csv")
# Draw one point per row, sized by the value column
map_data_sized <-
  base_world +
  geom_point(data=df,
             aes(x=long, y=lat, size=value), colour="Black",
             fill="Deep Pink", pch=21, alpha=I(1))
map_data_sized
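The CSV file read above is not included on this page. Judging from the aesthetics passed to geom_point, it needs at least the columns long, lat and value. The sketch below builds a small stand-in with that layout. The file name RA_map_cord_example.csv and every row in it are hypothetical placeholders, not the project's data.

```r
# Hypothetical stand-in for RA_map_cord.csv; the rows are illustrative
# placeholders, not the project's data
df <- data.frame(
  long  = c(-0.13, 77.21, -77.04),  # longitude in decimal degrees
  lat   = c(51.51, 28.61, 38.91),   # latitude in decimal degrees
  value = c(10, 25, 40)             # quantity mapped to point size
)

# Confirm that the columns the plotting code relies on exist and are numeric
required <- c("long", "lat", "value")
stopifnot(all(required %in% names(df)))
stopifnot(all(vapply(df[required], is.numeric, logical(1))))

# Write the data in the format read.csv() expects, then read it back
out_file <- file.path(tempdir(), "RA_map_cord_example.csv")
write.csv(df, out_file, row.names = FALSE)
df_check <- read.csv(out_file)
```

A data frame shaped like df_check can be passed to geom_point in place of df to try the map code without the original file.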

Disease mapping on the world map by countries
(An improved version of the map above)

Data Visualisation Gallery
Data visualization outputs are included below.
Using Power BI

1. Protein Database analysis
Generated visualisations are given below:
Created using R programming language

Created using R programming language

Created using R programming language

Created using Power BI

Created using Power BI

Created using Power BI

Created using Power BI
