📚 Probability and Statistics: Week 9 Study Guide
Course: ISE 205 Probability and Statistics (2025-2026 Fall)
Instructor: Dr. Burcu ÇARKLI YAVUZ (bcarkli@sakarya.edu.tr)
Sources: This study material is compiled from the provided lecture text and audio transcript.
🎯 Introduction to Statistics
Statistics is a fundamental field that transforms raw data into meaningful information. It is an applied branch of mathematics, drawing principles from probability theory to evaluate and interpret data. This process is crucial for informed decision-making across various sectors.
📊 What is Statistics?
Statistics can be defined as the comprehensive process of converting data into information. This process involves several key stages:
- Collection: Gathering raw data.
- Organization: Structuring the collected data.
- Summarization: Condensing data into understandable forms (e.g., tables, charts).
- Analysis: Applying methods to extract insights.
- Interpretation: Understanding the meaning of the analysis results.
- Presentation: Communicating findings effectively.
Statistics provides methods for compiling, categorizing, and summarizing data; designing experiments; establishing observation principles; and examining, interpreting, and generalizing sample information.
🌍 Where is Statistics Used?
Effective management and decision-making heavily rely on the correct understanding and use of statistical data. Statistics plays an active role in:
- Financial Analysis: Analyzing stock performance.
- Economics: Forecasting economic trends and understanding current conditions.
- Public Opinion: Conducting pre-election polls.
- Medical Research: Evaluating treatment effectiveness and disease patterns.
- Quality Control: Making decisions related to production and service quality.
- Marketing: Understanding consumer behavior and market trends.
- Business Operations: Informing purchasing and sales decisions based on inventory assessments.
💡 Example: Oakland Athletics (Moneyball). The Oakland Athletics, an American baseball team, achieved historic success by using statistical analysis to acquire undervalued players at low cost. General Manager Billy Beane, with the help of a Harvard-educated analyst, broke with traditional baseball conventions by employing inferential and relational analysis methods. This approach helped the Athletics win 20 consecutive games in 2002, an American League record at the time. This computer-aided statistical player selection method is now widely adopted across teams and was the subject of the movie "Moneyball."
📚 Fundamental Concepts in Statistics
Data vs. Information
- Data: Raw facts or values collected to form information. On their own, data points do not convey meaning.
- Example: Numbers like "55" or "8" are raw data.
- Information: Meaningful values transformed from raw data. Data becomes information when context and meaning are added.
- Example: "The average student score in the class is 55" or "The train departs at 8 AM" are pieces of information.
Essentially, statistics is the art of making sense of a data series.
Variables, Values, and Data
- Variable: A characteristic feature of the situation being studied. Variables are typically denoted by letters (e.g., x, y, z).
- Values: The possible outcomes a variable can take within a certain range.
- Data: The actual observed values obtained from observations of the same variable.
💡 Example:
- Variable: Student grades
- Values: Any grade from 0 to 100
- Data: Observed grades {54, 67, 87, 99}
Types of Data
Understanding data categories is crucial for selecting appropriate summarization and analysis methods. Data can be broadly classified into two main types:
- Qualitative (Categorical) Data: Describes qualities or characteristics that cannot be measured numerically.
- Nominal Data: Categories without any inherent order or ranking. Arithmetic operations are not meaningful.
- Examples: Hair color (black, brown, gray), Marital status (single, married, divorced), License plate codes (54, 34, 10).
- Ordinal Data: Categories with a meaningful order or ranking, but the magnitude between consecutive values is unknown. Arithmetic operations are not meaningful.
- Examples: Course rating (Bad, Acceptable, Good, Very Good), Academic grades (AA, BA, BB), Customer satisfaction (0: not satisfied, 4: very satisfied).
- Quantitative (Numerical) Data: Measurable and expressed with numerical values, allowing for arithmetic operations.
- Discrete Data: Can only take whole number values, typically obtained by counting.
- Examples: Number of students in a class (20 students), Number of rooms in a house (3 rooms), Number on a die roll (1, 2, 3, 4, 5, 6).
- Continuous Data: Can take any value within a given range, including decimal values, typically obtained through measurement.
- Examples: A person's height (1.75 m), A product's weight (2.3 kg), Room temperature (22.5 °C).
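The link between data type and summarization method can be sketched in a few lines of Python. This is only an illustration: the observations below are invented, and using the middle category for ordinal data is one common convention, not something prescribed by the lecture.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations, one list per data type.
hair_colors = ["black", "brown", "brown", "gray", "brown"]           # nominal
course_ratings = ["Good", "Bad", "Very Good", "Good", "Acceptable"]  # ordinal
rooms_per_house = [3, 2, 4, 3, 3]                                    # discrete
heights_m = [1.75, 1.62, 1.80, 1.68]                                 # continuous

# Nominal: categories can only be counted, so report the most frequent one.
most_common_hair = Counter(hair_colors).most_common(1)[0][0]

# Ordinal: categories can be ranked, so the middle category is meaningful,
# but the gaps between ranks are unknown, so no mean is computed.
rating_order = ["Bad", "Acceptable", "Good", "Very Good"]
ranked = sorted(course_ratings, key=rating_order.index)
median_rating = ranked[len(ranked) // 2]

# Quantitative (discrete or continuous): arithmetic such as the mean is meaningful.
mean_rooms = mean(rooms_per_house)
mean_height = mean(heights_m)

print(most_common_hair, median_rating, mean_rooms, mean_height)
```

Note that the code never averages hair colors or ratings: the data type decides which summaries make sense.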
Quantitative data can also be further classified by scale:
- Interval Scale: Values have an order, and differences between values are meaningful, but there is no true zero point. Ratios are not meaningful.
- Example: Temperature in Celsius. 20°C is 5 degrees higher than 15°C, but 10°C is not twice as hot as 5°C because 0°C does not represent an absence of temperature.
- Ratio Scale: Values have an order, differences are meaningful, and there is a true zero point. Both differences and ratios are meaningful.
- Example: Weight. A person weighing 90 kg is 30 kg heavier than someone weighing 60 kg, and also twice as heavy as someone weighing 45 kg.
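One way to check the interval versus ratio distinction is to change the unit of measurement and see which comparisons survive. The sketch below assumes the standard Celsius-to-Kelvin conversion (add 273.15); the helper name `celsius_to_kelvin` is chosen here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin, which has a true zero."""
    return celsius + 273.15

# Interval scale: Celsius temperatures from the example above.
t_low_c, t_high_c = 5.0, 10.0
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)
ratio_c = t_high_c / t_low_c
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

# Ratio scale: weights, expressed in kilograms and then in grams.
w_heavy_kg, w_light_kg = 90.0, 45.0
ratio_kg = w_heavy_kg / w_light_kg
ratio_g = (w_heavy_kg * 1000) / (w_light_kg * 1000)

print(f"temperature difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"temperature ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
print(f"weight ratio:           {ratio_kg:.3f} (kg) vs {ratio_g:.3f} (g)")
```

The temperature difference is the same in both units, but the apparent "twice as hot" ratio in Celsius disappears in Kelvin. The weight ratio is unchanged by the unit, which is what a true zero point guarantees.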
Beyond these, there are also:
- Time Series Data: Observes the change of a variable over time (e.g., sales volume in Ankara over several years).
- Cross-Sectional Data: Observes a variable across different units at a single point in time (e.g., sales volume in 4 different locations in a given year).
📈 Population, Sample, Parameter, and Statistic
- Population (Ana Kütle): All possible values related to a subject of study. Often too large or infinite to access entirely.
- Sample (Örneklem): A subset of the population selected for analysis. Used to make inferences about the entire population when a census is impractical.
- Example: For research on Istanbul voters (population = 10 million), a sample might be 895 randomly selected individuals.
- Parameter: A characteristic feature of a population. Requires all population data for its calculation.
- Statistic: A characteristic feature of a sample. Numerical summaries calculated from sample data, used to estimate population parameters.
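The difference between a parameter and a statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 10,000 exam grades is synthetic, and the seed and sample size of 200 are arbitrary choices.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: exam grades (0 to 100) of 10,000 students.
population = [rng.randint(0, 100) for _ in range(10_000)]

# Parameter: computed from every value in the population.
parameter_mean = mean(population)

# Statistic: computed from a random sample only, used to estimate the parameter.
sample = rng.sample(population, 200)
statistic_mean = mean(sample)

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

In practice the population list is usually unavailable, which is why the statistic is the only number that can actually be computed.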
📝 Data Collection Strategies
Data collection is a fundamental stage of statistical analysis.
Census vs. Sampling
- Census (Tamsayım): Involves reaching every single value within the population relevant to the analysis (e.g., a national population census). Often impractical due to time and cost.
- Sampling (Örnekleme): Selecting a representative subset of the population for analysis. Most statistical analyses are performed on samples.
Data Collection Methods
- Observation (Gözlem): Systematically recording outcomes using sensory organs or tools (e.g., meters, telescopes).
- ✅ Pros: Less manipulation, less bias in natural settings.
- ⚠️ Cons: Costly, time-consuming, susceptible to observer inexperience or sensory limitations.
- Example: Asking individuals about their aspirin use and heart attack history to study aspirin's effect on heart attack risk.
- Experiments (Deneyler): Systematically recording outcomes under different controlled conditions. Often preferred by scientists.
- ✅ Pros: More reliable data due to controlled conditions.
- ⚠️ Cons: Expensive, requires scientific expertise.
- Example: Randomly assigning two groups, giving one aspirin for two years and the other a placebo, then comparing heart attack rates.
- 💡 Insight: Experimental data is generally more reliable than observational data, but also more demanding and costly to collect.
- Surveys (Araştırma): A common method, especially in social studies, to determine preferences and behaviors.
- Personal Interview (Mülakat): Direct interaction with respondents.
- ✅ Pros: High response rates, minimizes misunderstandings, considered most accurate.
- Telephone Interview (Telefonla Görüşme): Conducted over the phone.
- ✅ Pros: Cost-effective.
- ⚠️ Cons: Lowest response rates, less personal interaction.
- Questionnaire (Anket): Written set of questions.
- ✅ Pros: Low cost, can reach a large number of subjects.
- ⚠️ Cons: Low response rates, high potential for misinterpreting questions due to lack of direct communication.
💡 Questionnaire Design Guidelines
When preparing a questionnaire, consider these points:
- Brevity and Simplicity: Keep it short and simple to encourage completion.
- Clarity: Use easily understandable, well-phrased, short questions.
- Demographic Start: Begin with demographic questions to ease respondents into the survey.
- Multiple-Choice: Prefer multiple-choice questions for ease of analysis.
- Open-Ended Questions: Limit open-ended questions as they are difficult to analyze, though valuable for detailed insights.
- Avoid Leading Questions: Do not phrase questions in a way that suggests a preferred answer (e.g., "Do you agree that the Statistics exam was difficult?").
- Pilot Testing: Conduct a preliminary test with a small group to identify and correct errors (typographical, unclear, leading questions).
- Variable Planning: Carefully conceptualize variables and potential analyses before designing questions to ensure relevant data collection.
📝 Sampling: Techniques and Planning
Sampling allows making inferences about a population by studying a part of it.
Why Sample?
- Cost-effectiveness: Cheaper to survey a sample than an entire population.
- Time Efficiency/Practicality: Faster to collect data from a sample (e.g., crash testing a few cars instead of all).
Sampling Error
- Sampling Error: The natural difference between a sample statistic and the corresponding population parameter. It is inherent in sampling.
- ⚠️ To avoid sampling error, a census is required, though even a census can have measurement errors.
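The behavior of sampling error can be sketched with a simulation. The population of heights below is synthetic, and the helper names `sampling_error` and `average_error` are introduced here for illustration only. Measurement error is not modeled, so the census case shows zero error.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: heights in cm of 5,000 people.
population = [rng.gauss(170, 10) for _ in range(5_000)]
mu = mean(population)

def sampling_error(n):
    """Absolute difference between one sample mean and the population mean."""
    sample = rng.sample(population, n)
    return abs(mean(sample) - mu)

def average_error(n, repeats=200):
    """Average sampling error over many independent samples of size n."""
    return mean(sampling_error(n) for _ in range(repeats))

avg_err_small = average_error(10)
avg_err_large = average_error(1000)
census_error = sampling_error(len(population))  # the "sample" is the whole population

print(f"average error, n=10:   {avg_err_small:.3f}")
print(f"average error, n=1000: {avg_err_large:.3f}")
print(f"census error:          {census_error:.3f}")
```

On average, larger samples land closer to the population mean, and only the census removes sampling error entirely.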
Sampling Framework and Plan
- Sampling Frame (Örnekleme Çerçevesi): A comprehensive list of all units within the research universe (e.g., a list of lawyers from the bar association).
- Sampling Plan (Örnekleme Planı): Outlines the methodology and procedures for drawing a sample, covering the target population, data collection method, and sampling type.
Steps to Create a Sampling Plan
- Define Research and Sampling Objective: Crucial first step; errors here invalidate the plan.
- Determine Population/Research Universe: Clearly identify the target group.
- Identify and Define Variables: Specify the characteristics to be compared or investigated.
- Choose Data Collection Method: Select observation, experiment, questionnaire, or interview based on time, cost, and population size.
- Calculate Sample Size: Determine the appropriate number of units for the sample.
- Select Sampling Technique: Choose from probability or non-probability methods.
- Design Sampling Process: Organize the field research.
- Systematic Data Recording: Ensure collected data is systematically recorded.
Sampling Techniques
Sampling techniques are categorized into probability-based and non-probability-based methods.
1️⃣ Probability Sampling Techniques
Every unit in the research universe has a known, non-zero probability of being included in the sample. These techniques tend to yield representative samples.
- Simple Random Sampling (Basit Tesadüfi Örnekleme): Every unit has an equal and independent chance of selection.
- Example: Randomly drawing 200 student names from a list of all university students.
- Stratified Sampling (Tabakalı Örnekleme): Used when proportional representation of a variable from the population needs to be maintained in the sample. The population is divided into strata (subgroups), and samples are drawn from each stratum.
- Example: If a population consists of 800 engineering students and 200 business students, a sample of 100 might include 80 engineers and 20 business students to maintain the 80:20 ratio.
- Cluster Sampling (Küme Örnekleme): Groups (clusters) of units are randomly selected, rather than individual units. Often used when the population is geographically dispersed.
- Example: To survey 200 employees at Sedaş, where each payment unit has 20 people, randomly select 10 payment units and survey every employee in them (10 × 20 = 200).
- ⚠️ Note: Can lead to higher sampling error because units within the same cluster tend to resemble one another.
- Systematic Sampling (Sistematik Örnekleme): Selects every k-th unit from an ordered list.
- Example: From a list of 5000 people, to sample 500, select every 10th person starting from a random point (e.g., 3rd, 13th, 23rd...).
- ⚠️ Note: Can introduce bias if the population list has a specific periodic order.
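The four probability techniques above can be sketched with Python's standard `random` module. This is a minimal illustration, not a production sampler: the sampling frame reuses the 800 engineering and 200 business students from the stratified example, and the function names, the seed, and the choice to form clusters from consecutive frame entries are assumptions made here.

```python
import random

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical sampling frame: 800 engineering and 200 business students.
frame = [("eng", i) for i in range(800)] + [("bus", i) for i in range(200)]

def simple_random(frame, n):
    """Every unit has the same chance of being selected."""
    return rng.sample(frame, n)

def stratified(frame, n):
    """Proportional allocation: each stratum contributes its population share.
    Rounding can make the total differ slightly from n for uneven shares."""
    strata = {}
    for unit in frame:
        strata.setdefault(unit[0], []).append(unit)
    sample = []
    for units in strata.values():
        k = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, k))
    return sample

def cluster(frame, cluster_size, n_clusters):
    """Randomly select whole clusters and keep every unit inside them."""
    clusters = [frame[i:i + cluster_size]
                for i in range(0, len(frame), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [unit for group in chosen for unit in group]

def systematic(frame, n):
    """Select every k-th unit, starting from a random position below k."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

strat = stratified(frame, 100)
print(sum(1 for faculty, _ in strat if faculty == "eng"), "engineering students")
print(len(cluster(frame, 20, 10)), "units from 10 clusters of 20")
```

Because this frame is ordered by faculty, the systematic sample also happens to preserve the 80:20 ratio. With a periodically ordered list, the same fixed step is what introduces the bias noted above.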
2️⃣ Non-Probability Sampling Techniques
These techniques rely on the researcher's judgment. They are typically used for exploratory studies and do not allow results to be generalized to the population.
- Convenience Sampling (Kolayda Örnekleme): Subjects are chosen based on their easy accessibility.
- Example: Administering surveys to people encountered at a specific location or online (e.g., internet survey forms).
- Purposive (Judgmental) Sampling (Kasti Örnekleme): The researcher selects subjects believed to be most relevant or knowledgeable for the study, based on their judgment.
- Example: Distributing a survey about football fanaticism outside a stadium, assuming people there are more likely to be fanatics.
- Quota Sampling (Kota Örneklemesi): A non-probability equivalent of stratified sampling. The researcher defines strata and selects a quota from each based on judgment. Offers flexibility but increases sampling bias.
- Example: For a fanaticism study, if the researcher divides the population into 5 age groups and aims for 500 respondents, they might select 100 people from each group based on convenience or judgment.
- Snowball Sampling (Kartopu Örneklemesi): Used when the target population is difficult to reach. Initial subjects refer other potential participants, allowing the sample to grow.
- Example: Researching the early challenges of a 50-year-old holding company by starting with 1-2 retired employees who then refer others.
- ⚠️ Note: Only used when reaching the entire research universe is impossible.
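Snowball sampling can be viewed as following referrals outward from a few starting subjects. The sketch below models this as a breadth-first walk over a referral network. The names, the `referrals` dictionary, and the `snowball_sample` function are all hypothetical.

```python
from collections import deque

# Hypothetical referral network: each person lists the people they can refer.
referrals = {
    "Ayse": ["Mehmet", "Fatma"],
    "Mehmet": ["Ali"],
    "Fatma": ["Ali", "Zeynep"],
    "Ali": [],
    "Zeynep": ["Ayse"],
    "Can": ["Ayse"],  # nobody refers Can, so the sample never reaches him
}

def snowball_sample(seeds, referrals, max_size):
    """Grow a sample by following referrals, starting from the seed subjects."""
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # include each person only once
                seen.add(referred)
                queue.append(referred)
    return sample

sample = snowball_sample(["Ayse"], referrals, max_size=10)
print(sample)
```

The sample contains only people connected to the starting subjects, which is one reason results from snowball sampling cannot be generalized to the whole population.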
✅ Key Learnings from This Week
- Definition of Statistics: The process of transforming data into information.
- Data Types: Understanding nominal, ordinal, interval, and ratio scales.
- Core Concepts: Differentiating between population, sample, parameter, and statistic.
- Data Collection Methods: Exploring observation, experiments, and surveys.
- Sampling Techniques: Distinguishing between probability and non-probability sampling methods.
🔜 Upcoming Topics
In the coming weeks, we will delve into the different types of statistics:
- Descriptive (Betimsel) Statistics: Focuses on collecting, describing, and presenting data.
- Inferential (Yorumlayıcı) Statistics: Involves making decisions and drawing conclusions about a population based on sample data.








