2017-12-27

How to split text into columns with correct column names

I'm trying to use tidyr::separate() to split some free (fake) text into different columns.

The input (a data frame, called Mypath below):

structure(list(PathReportWhole = c("SP-37-2784518\nHospital: Random NHS Foundation Trust\nHospital Number: J6044658\nPatient Name: Jargon, Victoria\nDOB: 1943-10-13\nGeneral Practitioner: Dr. Martin, Marche\nDate received: 2009-11-11\nClinical Details: Ongoing antral gastritis despite treatment with PPI,Reflux sx.,High dyshagia OGD - fundic gastritis.,Chronic diarrhoea/colonic biopsies,Currently on steriod for IgG4 disease.,Food bolus obstruction.\n4 specimen. Nature of specimen: Nature of specimen as stated on pot = 'proximal body lesser curve polyps x4 ',Specimen A- Nature of specimen as stated on request form = 'GREATER CURVE ',Nature of specimen as stated on request form = 'Gastric polyp '\nMacroscopic description: 3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm\nHistology: Two biopsies consist of small bowel mucosa and are within normal histological limits.\nDiagnosis: Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia.,Ileo-caecal valve, biopsies:.,Stomach antrum biopsies:- normal mucosa.,- Up to 34 eosinophils per high power field,Stomach, biopsy - Mild chronic inflammation.", 
"SP-50-4495875\nHospital: Random NHS Foundation Trust\nHospital Number: Y6417773\nPatient Name: Powell, Destiny\nDOB: 1946-12-29\nGeneral Practitioner: Dr. al-Safi, Lutfiyya\nDate received: 2008-06-15\nClinical Details: Quadrantic biopsies were taken at.,OGD - only 3cm sliding hiatus.\n7 specimen. Nature of specimen: Nature of specimen as stated on pot = 'RECTAL POLYPS X3 ',Nature of specimen as stated on pot = 'fundus polyps x4 ',Nature of specimen as stated on request form = 'DUODENAL BX ',Nature of specimen as stated on pot = 'Papilloma at 36 cm oesophagus ',a) Nature of specimen as stated on request form = 'D2 bx x 2' ,Nature of specimen as stated on pot = 'Oesophagus 26 cm '\nMacroscopic description: 4 specimens collected the largest measuring 4 x 4 x 4 mm and the smallest 5 x 3 x 1 mm\nHistology: modified giemsa stain.,These are biopsies of gastric mucosa ,There is no evidence of coeliac disease.,The nuclei are hyperchromatic,.,There is no granulomatous inflammation.,The appearances are in keeping with a reactive/chemical gastritis,features including basal layer hyperplasia and reactive nucelar changes with underlying.,These are two biopsies of squamous epithelium within normal limits,fibromuscularisation of the lamina propria and mild chronic inflammation.,These biopsies of columnar mucosa show focal acute inflammation, moderate chronic inflammation.\nDiagnosis: Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, biopsies - within normal histological limits.,B GI biopsy - DISTAL OESOPHAGUS X2, MID OESO X3, PROX OESO X2.,Oesophagus, biopsies : - Minimal chronic inflammation,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia,Oesophagus polyps biopsies:- 2 x papillomas.,Duodenum biopsies:- normal." 
)), .Names = "PathReportWhole", row.names = 1:2, class = "data.frame") 

At the moment I'm using a character vector of delimiters like this:

mywords <- c("Hospital Number", "Patient Name:", "DOB:", "General Practitioner:",
             "Date received:", "Clinical Details:", "Macroscopic description:",
             "Histology:", "Diagnosis:")

And then the function below (wrapped in a custom function because I may later want to clean up certain columns separately):

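library(tidyr)  # separate(); tidyr also re-exports %>%

Extractor2 <- function(dataframeIn, Column, delim) {
    Column <- rlang::sym(Column)
    dataframeIn <- data.frame(dataframeIn)
    # Split on any of the delimiters and name each piece after one of them
    dataframeIn <- dataframeIn %>%
        tidyr::separate(!!Column, into = delim,
                        sep = paste(delim, collapse = "|"))
    return(dataframeIn)
}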

Extractor2(Mypath,"PathReportWhole",mywords) 

The output:

'data.frame': 2 obs. of 18 variables: 
$ Hospital Number   : chr "SP-37-2784518\nHospital: Random NHS Foundation Trust\n" "SP-50-4495875\nHospital: Random NHS Foundation Trust\n" 
$ Patient Name:   : chr ": J6044658\n" ": Y6417773\n" 
$ DOB:     : chr " Jargon, Victoria\n" " Powell, Destiny\n" 
$ General Practitioner: : chr " 1943-10-13\n" " 1946-12-29\n" 
$ Date received:   : chr " Dr. Martin, Marche\n" " Dr. al-Safi, Lutfiyya\n" 
$ Clinical Details:  : chr " 2009-11-11\n" " 2008-06-15\n" 
$ Macroscopic description:: chr " Ongoing antral gastritis despite treatment with PPI,Reflux sx.,High dyshagia OGD - fundic gastritis.,Chronic "| __truncated__ " Quadrantic biopsies were taken at.,OGD - only 3cm sliding hiatus.\n7 specimen. Nature of specimen: Nature of"| __truncated__ 
$ Histology:    : chr " 3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm\n" " 4 specimens collected the largest measuring 4 x 4 x 4 mm and the smallest 5 x 3 x 1 mm\n" 
$ Diagnosis:    : chr " Two biopsies consist of small bowel mucosa and are within normal histological limits.\n" " modified giemsa stain.,These are biopsies of gastric mucosa ,There is no evidence of coeliac disease.,The nuc"| __truncated__ 
$ HospitalNumber   : chr " J6044658\n" " Y6417773\n" 
$ PatientName    : chr " Jargon, Victoria\n" " Powell, Destiny\n" 
$ DOB      : chr " 1943-10-13\n" " 1946-12-29\n" 
$ GeneralPractitioner  : chr " Dr\n Martin, Marche\n" " Dr\n al-Safi, Lutfiyya\n" 
$ Dateofprocedure   : Date, format: "2009-11-11" "2008-06-15" 
$ ClinicalDetails   : chr " Ongoing antral gastritis despite treatment with PPI,Reflux sx\n,High dyshagia OGD - fundic gastritis\n,Chroni"| __truncated__ " Quadrantic biopsies were taken at\n,OGD - only 3cm sliding hiatus\n\n7 specimen\n Nature of specimen: Nature"| __truncated__ 
$ Macroscopicdescription : chr " 3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm\n" " 4 specimens collected the largest measuring 4 x 4 x 4 mm and the smallest 5 x 3 x 1 mm\n" 
$ Histology    : chr " Two biopsies consist of small bowel mucosa and are within normal histological limits\n\n" " modified giemsa stain\n,These are biopsies of gastric mucosa ,There is no evidence of coeliac disease\n,The n"| __truncated__ 
$ Diagnosis    : chr " Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia\n,Ileo-caecal valve, biopsies:\"| __truncated__ " Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, "| __truncated__ 
NULL 

The problem:

The columns seem to be shifted by one: for example, the data under Patient Name: should be under Hospital Number, the data under DOB: should be under Patient Name:, and so on. I think this is because each column is named after the delimiter that ends its text (the 'stop' delimiter) rather than the one that starts it. How can I fix this without adding messy code that shifts the column names around?
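A minimal base-R sketch of where the off-by-one comes from, using a made-up toy string rather than the real reports. Splitting on a pattern built from n delimiters gives n + 1 pieces, and the first piece is whatever precedes the first delimiter. tidyr::separate() splits the same way when sep is a regular expression; with more pieces than names it fills the names from the left and (by default, with a warning) drops the extra pieces on the right, which is why every name lands one column too early.

```r
# Toy report (hypothetical): text before the first label, then two labelled fields
report <- "ID-1 Name: Smith DOB: 1950-01-01"
delims <- c("Name:", "DOB:")

# Same alternation pattern that Extractor2() passes to sep
pattern <- paste(delims, collapse = "|")

pieces <- strsplit(report, pattern)[[1]]
pieces
# 3 pieces for 2 delimiters: "ID-1 ", " Smith ", " 1950-01-01"

# Each piece belongs to the delimiter that PRECEDES it, so one extra
# name is needed for the leading piece
names(pieces) <- c("added_name", delims)
```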

Answer


I think adding an extra placeholder column name in front of your delimiter names might work:

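library(tidyr)  # separate(); tidyr also re-exports %>%

Extractor2 <- function(dataframeIn, Column, delim) {
    Column <- rlang::sym(Column)
    dataframeIn <- data.frame(dataframeIn)
    # "added_name" receives the text before the first delimiter, so each
    # later piece is named after the delimiter that precedes it
    dataframeIn <- dataframeIn %>%
        tidyr::separate(!!Column, into = c("added_name", delim),
                        sep = paste(delim, collapse = "|"))
    return(dataframeIn)
}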
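If the leading piece (the report ID and hospital line) is not needed, tidyr::separate() also accepts NA in into, which omits that piece instead of keeping a placeholder column. A sketch of that variant, assuming a tidyr version that still ships separate() (it is superseded, not removed, in tidyr 1.3); Extractor3 is a hypothetical name and the toy data frame is made up for illustration.

```r
library(tidyr)  # separate(); tidyr also re-exports %>%

# Hypothetical variant of Extractor2(): NA as the first `into` name tells
# separate() to drop the text before the first delimiter
Extractor3 <- function(dataframeIn, Column, delim) {
    Column <- rlang::sym(Column)
    dataframeIn <- data.frame(dataframeIn)
    dataframeIn %>%
        tidyr::separate(!!Column, into = c(NA, delim),
                        sep = paste(delim, collapse = "|"))
}

# Usage on a toy data frame (not the real reports)
toy <- data.frame(txt = "ID-1 Name: Smith DOB: 1950-01-01",
                  stringsAsFactors = FALSE)
out <- Extractor3(toy, "txt", c("Name:", "DOB:"))
```

The placeholder approach keeps the ID line available for later cleaning; the NA approach is simpler when that text is never used.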

This gives:

'data.frame': 2 obs. of 10 variables: 
$ added_name    : chr "SP-37-2784518\nHospital: Random NHS Foundation Trust\n" "SP-50-4495875\nHospital: Random NHS Foundation Trust\n" 
$ Hospital Number   : chr ": J6044658\n" ": Y6417773\n" 
$ Patient Name:   : chr " Jargon, Victoria\n" " Powell, Destiny\n" 
$ DOB:     : chr " 1943-10-13\n" " 1946-12-29\n" 
$ General Practitioner: : chr " Dr. Martin, Marche\n" " Dr. al-Safi, Lutfiyya\n" 
$ Date received:   : chr " 2009-11-11\n" " 2008-06-15\n" 
$ Clinical Details:  : chr " Ongoing antral gastritis despite treatment with PPI,Reflux sx.,High dyshagia OGD - fundic gastritis.,Chronic "| __truncated__ " Quadrantic biopsies were taken at.,OGD - only 3cm sliding hiatus.\n7 specimen. Nature of specimen: Nature of"| __truncated__ 
$ Macroscopic description:: chr " 3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm\n" " 4 specimens collected the largest measuring 4 x 4 x 4 mm and the smallest 5 x 3 x 1 mm\n" 
$ Histology:    : chr " Two biopsies consist of small bowel mucosa and are within normal histological limits.\n" " modified giemsa stain.,These are biopsies of gastric mucosa ,There is no evidence of coeliac disease.,The nuc"| __truncated__ 
$ Diagnosis:    : chr " Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia.,Ileo-caecal valve, biopsies:.,"| __truncated__ " Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, "| __truncated__ 
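The split values still carry separator debris: a leading ": " in the Hospital Number column (that delimiter in mywords has no colon), plus leading spaces and trailing newlines everywhere. A minimal base-R cleanup sketch for that step; clean_fields is a hypothetical helper, and collapsing internal newlines to single spaces is an assumption about what "clean" should mean here.

```r
# Hypothetical helper: tidy every character column produced by the split
clean_fields <- function(df) {
    for (nm in names(df)) {
        if (is.character(df[[nm]])) {
            x <- df[[nm]]
            x <- sub("^\\s*:", "", x)  # leftover ':' when a delimiter lacks its colon
            x <- gsub("\\s+", " ", x)  # collapse newlines / runs of spaces to one space
            df[[nm]] <- trimws(x)      # strip leading and trailing whitespace
        }
    }
    # Drop the trailing ':' from names such as "Patient Name:"
    names(df) <- trimws(sub(":$", "", names(df)))
    df
}

# Usage on a toy data frame mimicking two of the columns above
toy <- data.frame(`Hospital Number` = ": J6044658\n",
                  `Patient Name:` = " Jargon, Victoria\n",
                  check.names = FALSE, stringsAsFactors = FALSE)
cleaned <- clean_fields(toy)
```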