[GA] Google Data Analytics Program Course 4 Summary

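Course 3 covered SQL in general, starting step by step with BigQuery. Even so, it seems best suited to beginners: because it teaches broadly, the pace feels slower than the SQL videos available in Korea.

Next, I plan to take the Datarian (데이터리안) SQL videos alongside it. I highly recommend this program, since you can practice your English while studying data.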

Terminology Definitions

Population	The entire group that you are interested in for your study. For example, if you are surveying people in your company, the population would be all the employees in your company.
Sample	A subset of your population. Just like a food sample, it is called a sample because it is only a taste. So if your company is too large to survey every individual, you can survey a representative sample of your population.
Margin of error	Since a sample is used to represent a population, the sample’s results are expected to differ from what the result would have been if you had surveyed the entire population. This difference is called the margin of error. The smaller the margin of error, the closer the results of the sample are to what the result would have been if you had surveyed the entire population.
Confidence level	How confident you are in the survey results. For example, a 95% confidence level means that if you were to run the same survey 100 times, about 95 of the 100 resulting confidence intervals would contain the true population value. Confidence level is targeted before you start your study because it will affect how big your margin of error is at the end of your study.
Confidence interval	The range of possible values that the population’s result would be at the confidence level of the study. This range is the sample result +/- the margin of error.
Statistical significance	The determination of whether your result could be due to random chance or not. The greater the significance, the less due to chance.
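To make "confidence interval = sample result +/- margin of error" concrete, here is a minimal Python sketch for a survey proportion. It assumes the normal approximation for a proportion (reasonable for a large enough sample), and the survey numbers (120 of 200 employees answering "yes") are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation)."""
    # Two-sided critical value: about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat, n, confidence=0.95):
    """Confidence interval = sample result +/- margin of error."""
    moe = margin_of_error(p_hat, n, confidence)
    return p_hat - moe, p_hat + moe


# Hypothetical survey: 120 of 200 sampled employees answered "yes"
p_hat = 120 / 200
low, high = confidence_interval(p_hat, 200)
print(f"sample result {p_hat:.2f}, 95% CI {low:.3f} to {high:.3f}")
```

With these numbers the interval is roughly 0.532 to 0.668. Surveying more people, or accepting a lower confidence level, narrows it.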

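1) Statistics

1 By the CLT (central limit theorem), start with a sample size of at least 30 (a common rule of thumb).

2 Confidence level (how confident you can be that the sample result reflects the population): 95% is the usual target, but 90% is also acceptable in some cases.

1) For a higher confidence level, use a larger sample size.

2) To reduce the margin of error (how far the sample result may be from the result you would get by surveying the whole population), use a larger sample size.

3) For higher statistical significance (a lower chance that the result arose from randomness), use a larger sample size.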
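The "use a larger sample size" rules above can be checked with a small calculator. This is a minimal sketch, assuming the standard normal-approximation formula for a proportion, n = z² · p(1 − p) / E², with an optional finite population correction; it is one common approach, not necessarily the exact formula behind the course's sample size calculator. The function name `required_sample_size` is my own.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Smallest sample size whose margin of error for a proportion is <= margin.

    p=0.5 is the most conservative guess (it maximizes p * (1 - p)).
    If population is given, the finite population correction shrinks n.
    """
    # Two-sided critical value for the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


# Higher confidence or a smaller margin of error both demand more respondents
for confidence in (0.90, 0.95, 0.99):
    for margin in (0.05, 0.03):
        n = required_sample_size(margin, confidence)
        print(f"confidence {confidence:.0%}, margin ±{margin:.0%}: n = {n}")
```

At 95% confidence and a ±5% margin this gives the familiar 385; if the whole population is only 1,000 people, the correction lowers it to 278.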

2) A Walkthrough of Statistical Hypothesis Testing

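1 Before starting the study, define the null hypothesis (H0: there is no effect).

- For example, "the two variables have no relationship."

2 The null hypothesis is tested by calculating a p-value.

- The p-value is the probability of observing a result at least as extreme as yours if the null hypothesis were true.

- A p-value is always interpreted together with the confidence level, through the significance level.

- The significance level is written as alpha (α) and equals 1 minus the confidence level; 5% (0.05) is the most common choice.

If the p-value is less than 0.05: statistically significant (reject H0; the samples likely come from different distributions).

If the p-value is greater than 0.05: not statistically significant (fail to reject H0; no evidence that the distributions differ).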

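3 Type I and Type II Errors

- Type I error: rejecting H0 when it is actually true, called a false positive (for example, a p-value that is small only by chance).

- Type II error: failing to reject H0 when it is actually false, called a false negative (for example, a p-value that stays large even though a real effect exists).

Low statistical power: large risk of committing a Type II error

High statistical power: small risk of committing a Type II error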

Source:

https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/
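A minimal Python sketch of the testing steps above. It assumes a two-proportion z-test with the normal approximation as the test (the notes don't name a specific one), and the click counts are hypothetical. `approximate_power` is a rough textbook approximation of my own, included only to show how sample size relates to the risk of a Type II error.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: groups A and B share the same underlying proportion."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # proportion estimate if H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return z, p_value


def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level alpha (1 - confidence level)."""
    if p_value < alpha:
        return "statistically significant: reject H0"
    return "not statistically significant: fail to reject H0"


def approximate_power(p_a, p_b, n_per_group, alpha=0.05):
    """Rough power (1 - P(Type II error)) for true proportions p_a and p_b.

    Uses the unpooled standard error and ignores rejections in the wrong direction.
    """
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sqrt(p_a * (1 - p_a) / n_per_group + p_b * (1 - p_b) / n_per_group)
    return STANDARD_NORMAL.cdf(abs(p_a - p_b) / se - z_crit)


# Hypothetical A/B data: 60 of 200 vs 40 of 200 users clicked
z, p_value = two_proportion_z_test(60, 200, 40, 200)
print(f"z = {z:.2f}, p-value = {p_value:.3f} -> {decide(p_value)}")

# Larger samples -> higher power -> lower risk of a Type II error
for n in (50, 200, 500):
    print(f"n = {n} per group: power ≈ {approximate_power(0.30, 0.20, n):.2f}")
```

Here the p-value is about 0.02, below α = 0.05, so H0 is rejected. With only 50 users per group, the same true difference would usually go undetected (low power, high Type II risk).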

3) Spreadsheet Functions Summary

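1 left(A2,5): the first 5 characters of A2

2 right(A2,5): the last 5 characters of A2

3 mid(D2,4,2): 2 characters of D2, starting at the 4th character

4 concatenate(A2,A3): A2 and A3 joined into one string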
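For readers who also use Python, here is a rough sketch of what these four spreadsheet functions return. The helpers `left`, `right`, `mid`, and `concatenate` are hypothetical names of my own, not a library API, and the cell values are made up. Note that MID counts positions from 1, while Python slices count from 0.

```python
def left(text, n):
    """Like LEFT(text, n): the first n characters."""
    return text[:n]


def right(text, n):
    """Like RIGHT(text, n): the last n characters (empty string when n is 0)."""
    return text[-n:] if n > 0 else ""


def mid(text, start, n):
    """Like MID(text, start, n): n characters beginning at 1-based position start."""
    return text[start - 1:start - 1 + n]


def concatenate(*parts):
    """Like CONCATENATE(a, b, ...): join the values end to end."""
    return "".join(parts)


cell_a2 = "Seoul12345"  # hypothetical cell values
cell_d2 = "ID-20-X"
print(left(cell_a2, 5))                # Seoul
print(right(cell_a2, 5))               # 12345
print(mid(cell_d2, 4, 2))              # 20
print(concatenate(cell_a2, "-Korea"))  # Seoul12345-Korea
```

The guard in `right` exists because `text[-0:]` would return the whole string instead of an empty one.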

CURIOUS ABOUT WHATEVER DATA YOU ARE GIVEN

그렉
