Data names: B - 02-1 ~ G - 11-9 (.csv)
Subgroups: B - 02 ~ B - 11, G - 02 ~ G - 11
Goal: check nuclear translocation of the red signal (nucleus vs. cytosol).
The green signal labels only the cells that express the protein.
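To make the naming scheme concrete, here is a minimal sketch of how a file name maps to a sample name, a subgroup, and a channel. The helper `parse_name` and the example file names are hypothetical; the sketch assumes each file carries a `-Green` or `-Red` suffix before `.csv`, which is what the sorting code further down expects.

```python
def parse_name(fn):
    """Split a file name such as 'B - 02-1-Red.csv' into its parts (hypothetical helper)."""
    parts = fn.split('-')
    # First three '-'-separated fields form the sample name; spaces are dropped
    sample = '-'.join(parts[0:3]).replace(' ', '')
    # The subgroup is the sample name without the trailing replicate number
    subgroup = '-'.join(sample.split('-')[0:2])
    # The fourth field holds the channel, followed by the extension
    channel = parts[3].split('.')[0]
    return sample, subgroup, channel

print(parse_name('B - 02-1-Red.csv'))  # ('B-02-1', 'B-02', 'Red')
```

File names that contain additional hyphens would break this split, so the scheme only works if the naming convention is followed strictly.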
!rm *.xlsx
!rm *.csv
## File upload
from google.colab import files
uploaded = files.upload()
Input
Green_background=1895
Red_background=680
green_threshold=400
ratio_threshold=1
File_name='231028'
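As a rough illustration of what these parameters do, the sketch below applies them to three made-up cells in plain Python. The intensity values and the `cells` list are hypothetical, not real measurements; the logic assumes the same steps as the notebook (background subtraction, optional green filter, nucleus/cytosol ratio, percentage of cells above `ratio_threshold`).

```python
Green_background = 1895
Red_background = 680
green_threshold = 400
ratio_threshold = 1

# Hypothetical raw intensities for three cells (illustration only)
cells = [
    {'green': 2500, 'nucleus': 1680, 'band': 1180},
    {'green': 2000, 'nucleus': 1480, 'band': 1680},
    {'green': 3000, 'nucleus': 1080, 'band': 1480},
]

kept = []
for cell in cells:
    green = cell['green'] - Green_background
    if green <= green_threshold:
        continue  # not green-positive, so the cell is dropped
    nucleus = cell['nucleus'] - Red_background
    cytosol = cell['band'] - Red_background
    kept.append(nucleus / cytosol)

# Percentage of kept cells whose nucleus/cytosol ratio exceeds the threshold
ratio_positive = 100 * sum(r > ratio_threshold for r in kept) / len(kept)
print(kept, ratio_positive)  # [2.0, 0.5] 50.0
```

The second cell is excluded because its background-corrected green value (105) is below `green_threshold`.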
Data sorting
import pandas as pd
import numpy as np

# n: nuclear red, c: cytosolic red ('Band'), r: nucleus/cytosol ratio, g: green
n, c, r, g = {}, {}, {}, {}
namelist2 = []
df = {}
for fn in uploaded.keys():
    # The first three '-'-separated fields form the sample name: 'B - 02-1' -> 'B-02-1'
    namelistb = fn.split('-')[0:3]
    nameb = '-'.join(namelistb)
    name = nameb.replace(" ", "")
    namelist2.append(name)
    # The fourth field is the channel ('Green' or 'Red')
    channel = fn.split('-')[3].split('.')[0]
    if channel == 'Green':
        g[name] = pd.read_csv(fn)['Mean'] - Green_background
    if channel == 'Red':
        red = pd.read_csv(fn)  # read once, use both columns
        n[name] = red['Nucleus'] - Red_background
        c[name] = red['Band'] - Red_background
namelist = set(namelist2)
for k in namelist:
    r[k] = n[k] / c[k]
    df[k] = pd.concat([g[k], n[k], c[k], r[k]], axis=1)
    df[k].columns = ['Green', 'Nucleus', 'Cytosol', 'Ratio']
(Optional) Green-positive cells only
for k in namelist:
    df[k] = df[k][df[k]['Green'] > green_threshold].reset_index(drop=True)
Organize all data --> Excel export
variables = ['Nucleus', 'Green', 'Cytosol', 'Ratio', 'Nucleus_Mean', 'Green_Mean', 'Cytosol_Mean', 'Ratio_Mean', 'Number', 'Ratio_Positive']
result = {}
result_df = {}
result_s = {}
new_series = {}
dff = {}
for v in variables:
    result[v] = {}
for k in namelist:
    # Per-cell values for each sample
    for v in variables[0:4]:
        result[v][k] = df[k][v]
    # Per-sample means of Nucleus, Green, Cytosol
    for v in range(4, 7):
        vkey = variables[v-4]
        result[variables[v]][k] = df[k][vkey].mean()
    # Ratio_Mean is stored as a percentage
    result[variables[7]][k] = df[k][variables[3]].mean()*100
    # Number of cells in the sample
    result[variables[8]][k] = len(df[k])
    # Percentage of cells whose ratio exceeds ratio_threshold
    if len(result['Ratio'][k]) > 0:
        result['Ratio_Positive'][k] = len(result['Ratio'][k][result['Ratio'][k] > ratio_threshold])/len(result['Ratio'][k])*100
# Sort by sample name
sorted_dict = {}
for v in variables:
    sorted_keys = sorted(result[v].keys())
    sorted_dict[v] = {key: result[v][key] for key in sorted_keys}
# Convert to DataFrames
for v in variables[0:4]:
    result_df[v] = pd.concat(sorted_dict[v], axis=1)
for v in variables[4:]:
    result_s[v] = pd.Series(sorted_dict[v])
    # Select the indices that belong to each subgroup
    new_series[v] = {}
    for k in namelist:
        new_k = k.split('-')[0]+'-'+k.split('-')[1]
        # Match on 'B-02-' rather than 'B-02' so that a longer subgroup name cannot match by prefix
        selected_result_s = result_s[v][result_s[v].index.str.startswith(new_k+'-')]
        new_indices = selected_result_s.index.str.split('-').str[2]
        new_series[v][new_k] = pd.Series(selected_result_s.values, index=new_indices)
group = []
for k in namelist:
    group.append(k.split('-')[0]+'-'+k.split('-')[1])
group = set(group)
for v in variables[4:]:
    dff[v] = pd.DataFrame(new_series[v])
    result_df[v] = dff[v].sort_index(axis=1)
with pd.ExcelWriter(File_name+'.xlsx') as writer:
    for v in variables:
        result_df[v].to_excel(writer, sheet_name=v)
files.download(File_name+'.xlsx')
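The summary sheets end up with one column per subgroup and one row per replicate number. The sketch below shows that regrouping step in isolation with plain dictionaries; the `ratio_mean` values are hypothetical, and `rsplit` is used here as a simplified stand-in for the `startswith`/`split` logic above.

```python
# Hypothetical per-sample summary values (illustration only)
ratio_mean = {'B-02-2': 95.0, 'G-02-1': 80.0, 'B-02-1': 120.0}

table = {}
for name in sorted(ratio_mean):
    # 'B-02-1' -> subgroup 'B-02', replicate '1'
    group, rep = name.rsplit('-', 1)
    table.setdefault(group, {})[rep] = ratio_mean[name]

print(table)
# {'B-02': {'1': 120.0, '2': 95.0}, 'G-02': {'1': 80.0}}
```

Passing a nested dictionary like `table` to `pd.DataFrame` uses the outer keys as columns and the inner keys as the row index, which matches the layout written to Excel; replicates missing from a subgroup become NaN.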