Data Science Projects

The projects I have completed in data science and data visualization are listed below:

Data Science Projects

1. Predicting Stock Performance with Machine Learning

Project type: Group project, University of Liverpool

Tools: Python and its packages

Aim of the project: Maximize shareholder returns by predicting the price of Bitcoin using machine learning and deep learning techniques.

2. Credit Card Fraud Prediction using Machine Learning and Artificial Intelligence

Status: in progress.

BIG-Biological Data Projects

1. Rheumatoid Arthritis: Data Analysis using gwascat

Using the R language

# GWAS analysis needs the BiocManager package to install the Bioconductor packages
install.packages("BiocManager")

BiocManager::install("ggbio")
BiocManager::install("gwascat")
BiocManager::install("Homo.sapiens")

library(gwascat)
objects("package:gwascat")   # list what the package provides
data(ebicat38)               # GWAS catalog data shipped with gwascat
topTraits(ebicat38)          # most frequently reported traits

# Subset the rheumatoid arthritis records (first 294) once and reuse the result
ra_hits <- subsetByTraits(ebicat38, tr = "Rheumatoid arthritis")[1:294]
ra_hits
df_main <- data.frame(ra_hits)

# Export to the current working directory
getwd()
write.csv(df_main, "Gwas_RA_ALL.csv", row.names = FALSE)

# Basic Manhattan plot
gwtrunc <- ebicat38
requireNamespace("S4Vectors")
mcols <- S4Vectors::mcols

# Cap very large -log10 p-values
mlpv <- mcols(ebicat38)$PVALUE_MLOG
mlpv <- ifelse(mlpv > 1000, 1000, mlpv)
S4Vectors::mcols(gwtrunc)$PVALUE_MLOG <- mlpv

# Switch to UCSC-style names ("chr1") and keep chromosome 1 only
library(GenomeInfoDb)
seqlevelsStyle(gwtrunc) <- "UCSC"
gwlit <- gwtrunc[which(as.character(seqnames(gwtrunc)) %in% c("chr1"))]

# Cap again at 550 so extreme values do not dominate the y-axis
library(ggbio)
mlpv <- mcols(gwlit)$PVALUE_MLOG
mlpv <- ifelse(mlpv > 550, 550, mlpv)
S4Vectors::mcols(gwlit)$PVALUE_MLOG <- mlpv
autoplot(gwlit, geom = "point", aes(y = PVALUE_MLOG), xlab = "chr1")

Running this code produces the Manhattan plot for chr1 shown below.
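
The plotting code caps very large -log10 p-values before drawing, so that a few extreme associations do not compress the rest of the y-axis. The capping step on its own can be sketched in base R as follows. The values are made up for illustration and are not taken from ebicat38, and `mlog_p` and `cap_values` are hypothetical names.

```r
# Hypothetical -log10(p) values for illustration; not real catalog data.
mlog_p <- c(7.3, 12.0, 48.5, 610.2, 1250.0, NA)

# Cap values at an upper limit. pmin() is vectorised and leaves NA as NA,
# which matches ifelse(x > limit, limit, x).
cap_values <- function(x, limit) {
  pmin(x, limit)
}

capped <- cap_values(mlog_p, 550)
print(capped)

# Compare with the ifelse() form used in the plotting code
same_as_ifelse <- identical(capped, ifelse(mlog_p > 550, 550, mlog_p))
print(same_as_ifelse)
```

Either form works. `pmin()` is a little shorter when the only goal is an upper bound.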

2. Plotting Biological Data on the World Map: Density-based Mapping

# First install the maps package
install.packages("maps")

library(maps)
library(ggplot2)

world_map <- map_data("world")
p <- ggplot() + coord_fixed() +
  xlab("") + ylab("")

#Add map to base plot
base_world_messy <- p + geom_polygon(data=world_map, aes(x=long, y=lat, group=group), 
                                     colour="darkslategrey", fill="white")

base_world_messy

#Strip the map down so it looks super clean (and beautiful!)
cleanup <- 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'darkslategrey', colour = 'darkslategrey'), 
        axis.line = element_line(colour = "white"), legend.position="none",
        axis.ticks=element_blank(), axis.text.x=element_blank(),
        axis.text.y=element_blank())

base_world <- base_world_messy + cleanup

base_world

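# Read the file with coordinate data (the plot uses its long, lat and value columns)
df <- read.csv("C:/Users/workc/OneDrive/Desktop/RA_map_cord.csv")

# Overlay one point per location, sized by value
map_data_sized <- 
  base_world +
  geom_point(data = df, 
             aes(x = long, y = lat, size = value), colour = "black", 
             fill = "deeppink", shape = 21, alpha = 1)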

map_data_sized
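
The point layer relies on the CSV providing `long`, `lat` and `value` columns, since those are the names passed to `aes()`. The contents of `RA_map_cord.csv` are not shown here, so the sketch below builds a small stand-in file with invented rows and checks for the required columns before any plotting. The coordinates and values are placeholders, not study data, and `example_points`, `df_example` and `required_cols` are hypothetical names.

```r
# Placeholder rows for illustration only; not the real RA_map_cord.csv.
example_points <- data.frame(
  long  = c(-2.98, 77.21, -77.04),
  lat   = c(53.41, 28.61, 38.91),
  value = c(10, 25, 40)
)

# Write to a temporary file and read it back with read.csv()
tmp_file <- tempfile(fileext = ".csv")
write.csv(example_points, tmp_file, row.names = FALSE)
df_example <- read.csv(tmp_file)

# Check the columns the plotting code relies on
required_cols <- c("long", "lat", "value")
missing_cols <- setdiff(required_cols, names(df_example))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

str(df_example)
```

A check like this fails early with a clear message, rather than surfacing later as an error inside `geom_point()`.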

Disease mapping on the world map by country

(An improved version of the map above)

Data Visualisation Gallery

Data visualization outputs are included below.

Using Power BI

1. Protein Database analysis

Generated visualisations are given below:

Created using R programming language

Created using R programming language

Created using R programming language

Created using Power BI

Created using Power BI

Created using Power BI

Created using Power BI

