Title: | 'Pubmed' Word Clouds |
---|---|
Description: | Create a word cloud using the abstract of publications from 'Pubmed'. |
Authors: | Felix Yanhui Fan <[email protected]> |
Maintainer: | Felix Yanhui Fan <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.3.6 |
Built: | 2025-01-16 03:37:15 UTC |
Source: | https://github.com/felixfan/pubmedwordcloud |
Clean abstracts: remove punctuation, remove numbers, convert characters to lower or upper case, remove stopwords, remove user-specified words, and stem words.
cleanAbstracts(abstracts, rmNum = TRUE, tolw = TRUE, toup = FALSE, rmWords = TRUE, yrWords = NULL, stemDoc = FALSE)
abstracts | Output of getAbstracts, or a paragraph of text. |
rmNum | Whether to remove numbers from the text. |
tolw | Whether to convert characters to lower case. |
toup | Whether to convert characters to upper case. |
rmWords | Whether to remove a set of English stopwords (e.g., 'the'). |
yrWords | A character vector of user-specified words to remove. |
stemDoc | Whether to stem words using Porter's stemming algorithm. |
# Abs=getAbstracts(c("22693232", "22564732"))
# cleanAbs=cleanAbstracts(Abs)
# text="Jobs received a number of honors and public recognition."
# cleanD=cleanAbstracts(text)
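The cleaning steps listed above can be approximated in base R. This is a minimal sketch and not the package's implementation: `sketch_clean` and its four-word stopword list are hypothetical, stemming is left out, and the word/freq data frame it returns is an assumption based on the input that plotWordCloud accepts.

```r
# Base-R approximation of the cleaning steps (hypothetical helper, not cleanAbstracts itself).
sketch_clean <- function(text, rmNum = TRUE, tolw = TRUE,
                         stopwords = c("a", "and", "of", "the"),  # toy list, assumption
                         yrWords = NULL) {
  x <- paste(text, collapse = " ")
  x <- gsub("[[:punct:]]+", " ", x)             # remove punctuation
  if (rmNum) x <- gsub("[[:digit:]]+", " ", x)  # remove numbers
  if (tolw) x <- tolower(x)                     # convert to lower case
  words <- strsplit(x, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]                 # drop empty tokens
  drop <- tolower(c(stopwords, yrWords))        # stopwords plus user-specified words
  words <- words[!(tolower(words) %in% drop)]
  counts <- table(words)
  df <- data.frame(word = as.character(names(counts)),
                   freq = as.integer(counts),
                   stringsAsFactors = FALSE)
  df[order(-df$freq, df$word), , drop = FALSE]  # most frequent first
}

cleanD <- sketch_clean("Jobs received a number of honors and public recognition.")
```

With the toy stopword list, the sentence above yields six words, each with frequency 1.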
Color sets for plotting.
colSets(type)
type | Palette name; one of Accent, Dark2, Pastel1, Pastel2, Paired, Set1, Set2, Set3. |
# colors= colSets(type="Accent")
# colors= colSets(type="Paired")
# colors= colSets(type="Set3")
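The palette names above match qualitative palettes in the RColorBrewer package. As a rough sketch of how such a color set might be obtained, the hypothetical helper `sketch_col_set` below returns every color of a named palette. Whether colSets returns the full palette or a subset is an assumption here, not something this page states.

```r
# Hypothetical helper, not the package's colSets(): all colors of one qualitative palette.
sketch_col_set <- function(type = "Dark2") {
  allowed <- c("Accent", "Dark2", "Pastel1", "Pastel2",
               "Paired", "Set1", "Set2", "Set3")
  type <- match.arg(type, allowed)                 # reject names outside the list
  if (!requireNamespace("RColorBrewer", quietly = TRUE)) {
    stop("RColorBrewer is required for this sketch")
  }
  n <- RColorBrewer::brewer.pal.info[type, "maxcolors"]  # palette size
  RColorBrewer::brewer.pal(n, type)                # character vector of hex colors
}
```

The result is a character vector of hex color codes that could be passed as the `colors` argument of plotWordCloud.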
Add two sets of PMIDs together, or exclude one set of PMIDs from another set of PMIDs.
editPMIDs(x, y, method = c("add", "exclude"))
x | Output of getPMIDs, or a set of PMIDs. |
y | Output of getPMIDs, or a set of PMIDs. |
method | Either 'add' (default) or 'exclude'. See details. |
When method is 'add', the PMIDs in 'x' and 'y' are combined. When method is 'exclude', the PMIDs in 'y' are excluded from 'x'.
# pmid1=getPMIDs(author="Yan-Hui Fan",dFrom=2007,dTo=2013,n=10)
# rm1="22698742"
# pmids1=editPMIDs(x=pmid1,y=rm1,method="exclude")
# pmid2=getPMIDs(author="Yanhui Fan",dFrom=2007,dTo=2013,n=10)
# rm2="20576513"
# pmids2=editPMIDs(x=pmid2,y=rm2,method="exclude")
# pmids=editPMIDs(x=pmids1,y=pmids2,method="add")
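The 'add' and 'exclude' behaviour corresponds to set union and set difference. The sketch below illustrates this with the hypothetical function `sketch_edit_pmids` and runs without network access. Dropping duplicates on 'add' is an assumption, since this page does not say how editPMIDs treats repeated PMIDs.

```r
# Hypothetical stand-in for editPMIDs(), using base-R set operations.
sketch_edit_pmids <- function(x, y, method = c("add", "exclude")) {
  method <- match.arg(method)          # defaults to "add"
  x <- as.character(x)
  y <- as.character(y)
  if (method == "add") {
    union(x, y)                        # combine, dropping duplicates (assumption)
  } else {
    setdiff(x, y)                      # keep PMIDs in x that are not in y
  }
}

a <- c("22693232", "22564732", "22301463")
b <- c("22301463", "22015308")
added    <- sketch_edit_pmids(a, b, method = "add")      # four distinct PMIDs
excluded <- sketch_edit_pmids(a, b, method = "exclude")  # "22693232" "22564732"
```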
Retrieve abstracts of the specified PMIDs from PubMed.
getAbstracts(pmid, https = TRUE, s = 100)
pmid | A set of PMIDs. |
https | Whether to use https instead of http. |
s | Number of PMIDs to download per request. |
# pmids=c("22693232", "22564732", "22301463", "22015308", "21283797", "19412437")
# abstracts=getAbstracts(pmids)
# pmid="22693232"
# abstract=getAbstracts(pmid)
# pmids=getPMIDs(author="Yan-Hui Fan",dFrom=2007,dTo=2013,n=10)
# abstracts=getAbstracts(pmids)
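The `s` argument implies that a long PMID vector is downloaded in batches. A minimal sketch of such batching follows. `sketch_batches` is a hypothetical helper, and both the deduplication and the comma-separated ID strings are assumptions about how a request might be assembled, not the package's documented behaviour.

```r
# Hypothetical helper: split PMIDs into batches of at most `s` IDs each.
sketch_batches <- function(pmid, s = 100) {
  pmid <- unique(as.character(pmid))
  if (length(pmid) == 0) return(list())
  unname(split(pmid, ceiling(seq_along(pmid) / s)))
}

batches <- sketch_batches(as.character(22000001:22000250), s = 100)
sizes <- lengths(batches)  # 100 100 50

# One comma-separated ID string per batch, e.g. for one request each (assumption).
id_strings <- vapply(batches, paste, character(1), collapse = ",")
```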
Retrieve PMIDs (each PMID is 8 digits long) from PubMed for an author within the specified date range.
getPMIDs(author, dFrom, dTo, n = 500, https = TRUE)
author | Author's name. |
dFrom | Start year. |
dTo | End year. |
n | Maximum number of articles to retrieve. |
https | Whether to use https instead of http. |
# getPMIDs(author="Yan-Hui Fan",dFrom=2007,dTo=2013,n=10)
# getPMIDs(author="Yanhui Fan",dFrom=2007,dTo=2013,n=10)
Retrieve PMIDs (each PMID is 8 digits long) from PubMed for a specific journal, keywords, and date range.
getPMIDsByKeyWords(keys = NULL, journal = NULL, dFrom = NULL, dTo = NULL, n = 10000, https = TRUE)
keys | Keywords. |
journal | Journal name. |
dFrom | Start year. |
dTo | End year. |
n | Maximum number of articles to retrieve. |
https | Whether to use https instead of http. |
# getPMIDsByKeyWords(keys="breast cancer", journal="science",dTo=2013)
# getPMIDsByKeyWords(keys="breast cancer", journal="science")
# getPMIDsByKeyWords(keys="breast cancer",dFrom=2012,dTo=2013)
# getPMIDsByKeyWords(journal="science",dFrom=2012,dTo=2013)
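Because every search argument is optional, the query has to be assembled from whichever arguments are supplied. The sketch below shows one way to do that and builds the URL without sending a request. `sketch_term` and `sketch_esearch_url` are hypothetical. The NCBI E-utilities ESearch endpoint, the PubMed field tags, and the placeholder bounds 1000 and 3000 for an open-ended date range are assumptions, not the query this package is documented to send.

```r
# Hypothetical helper: build a PubMed search term from the supplied arguments only.
sketch_term <- function(keys = NULL, journal = NULL, dFrom = NULL, dTo = NULL) {
  parts <- character(0)
  if (!is.null(keys))    parts <- c(parts, sprintf("%s[Title/Abstract]", keys))
  if (!is.null(journal)) parts <- c(parts, sprintf("%s[Journal]", journal))
  if (!is.null(dFrom) || !is.null(dTo)) {
    from <- if (is.null(dFrom)) 1000 else dFrom  # placeholder lower bound (assumption)
    to   <- if (is.null(dTo))   3000 else dTo    # placeholder upper bound (assumption)
    parts <- c(parts, sprintf("%s:%s[dp]", from, to))
  }
  paste(parts, collapse = " AND ")
}

# Hypothetical helper: wrap the term in an ESearch URL (no request is made).
sketch_esearch_url <- function(term, n = 10000, https = TRUE) {
  scheme <- if (https) "https" else "http"
  sprintf("%s://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=%d&term=%s",
          scheme, as.integer(n), utils::URLencode(term, reserved = TRUE))
}

term <- sketch_term(keys = "breast cancer", dFrom = 2012, dTo = 2013)
url  <- sketch_esearch_url(term)
```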
Plot a PubMed word cloud.
plotWordCloud(abs, scale = c(3, 0.3), min.freq = 1, max.words = 100, random.order = FALSE, rot.per = 0.35, use.r.layout = FALSE, colors = brewer.pal(8, "Dark2"))
abs | Output of cleanAbstracts, or a data frame with one column named 'word' and one column named 'freq'. |
scale | A vector of length 2 indicating the range of word sizes. |
min.freq | Words with frequency below min.freq will not be plotted. |
max.words | Maximum number of words to be plotted; the least frequent terms are dropped. |
random.order | Whether to plot words in random order. If FALSE, they are plotted in decreasing frequency. |
rot.per | Proportion of words with 90 degree rotation. |
use.r.layout | If FALSE, C++ code is used for collision detection; otherwise R is used. |
colors | Colors for words, from least to most frequent. |
This function simply calls 'wordcloud' from the wordcloud package. See the wordcloud package for more details about the parameters.
# text="Jobs received a number of honors and public recognition."
# cleanD=cleanAbstracts(text)
# plotWordCloud(cleanD,min.freq=1,scale=c(2,1))
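To make the effect of `min.freq` and `max.words` concrete, the sketch below applies the same filtering to a toy word/freq data frame in base R, then passes the table to wordcloud::wordcloud if that package is installed. `sketch_select_words` and the toy data are hypothetical, and the filtering only mirrors the parameter descriptions above rather than the internals of wordcloud.

```r
# Hypothetical helper: which rows survive min.freq and max.words.
sketch_select_words <- function(abs, min.freq = 1, max.words = 100) {
  stopifnot(all(c("word", "freq") %in% names(abs)))
  abs <- abs[abs$freq >= min.freq, , drop = FALSE]  # drop rare words
  abs <- abs[order(-abs$freq), , drop = FALSE]      # most frequent first
  head(abs, max.words)                              # least frequent terms dropped
}

toy <- data.frame(word = c("gene", "cancer", "risk", "cohort"),
                  freq = c(12, 30, 7, 1),
                  stringsAsFactors = FALSE)
kept <- sketch_select_words(toy, min.freq = 2, max.words = 2)  # cancer, gene

# Plot with the defaults shown in the usage line, only if both packages are available.
if (requireNamespace("wordcloud", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  wordcloud::wordcloud(words = toy$word, freq = toy$freq, scale = c(3, 0.3),
                       min.freq = 1, max.words = 100, random.order = FALSE,
                       rot.per = 0.35, use.r.layout = FALSE,
                       colors = RColorBrewer::brewer.pal(8, "Dark2"))
}
```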