
Scalar Types in Protocol Buffers

DodaTech Updated 2026-06-28 4 min read

In this tutorial, you will learn how the Protocol Buffers scalar types are encoded on the wire, how they map to language types, and how to choose between them.

Protocol Buffers provides a set of scalar value types that map to corresponding types in each programming language. Choosing the right scalar type impacts serialized size, performance, and cross-language compatibility.

What You'll Learn

  • All protobuf scalar types and their wire formats
  • Language-specific type mappings
  • Signed vs unsigned vs fixed-width integer types
  • Choosing the right scalar for your data

Why It Matters

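The wrong scalar type can inflate your message size or cause overflow and precision problems in some languages. For example, a negative number stored in an int32 or int64 field always takes 10 bytes on the wire, while the same value in a sint32 or sint64 field usually takes only 1 or 2 bytes.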
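To see where those bytes go, here is a minimal pure-Python sketch of the base-128 varint encoding used by int32, int64, uint32, uint64, and bool. The helper name `encode_varint` is our own, not part of the protobuf library. It assumes, as the wire format specifies, that negative int32/int64 values are sign-extended to 64 bits before encoding.

```python
def encode_varint(value: int) -> bytes:
    """Encode an int as a protobuf base-128 varint (hypothetical helper).

    Negative values (int32/int64) are treated as 64-bit two's complement,
    which is why they always take 10 bytes on the wire.
    """
    if value < 0:
        value += 1 << 64  # sign-extend to 64 bits
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


for v in (1, 150, 2**31 - 1, -1):
    encoded = encode_varint(v)
    print(f"{v:>12} -> {len(encoded):2} byte(s): {encoded.hex(' ')}")
```

The encoded size depends only on the value, not on whether the field is declared int32 or int64: a small positive value costs the same in both, while any negative value costs 10 bytes.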

Real-World Use

APIs commonly use sint32 for fields that may be negative, uint64 for resource IDs that are never negative, and fixed32 for values such as hashes that usually span the full 32-bit range.

flowchart LR
    ProtoType[Proto Type] --> WireType[Wire Type]
    ProtoType --> LanguageMapping[Language Mapping]
    WireType --> Varint[Varint: int32, int64, uint32, uint64, sint32, sint64, bool, enum]
    WireType --> Fixed[Fixed 64-bit: fixed64, sfixed64, double]
    WireType --> Fixed32[Fixed 32-bit: fixed32, sfixed32, float]
    WireType --> Length[Length-delimited: string, bytes]
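Every field on the wire is preceded by a key that packs the field number together with one of the wire types from the diagram: `(field_number << 3) | wire_type`. The sketch below shows that computation in plain Python. The dictionary and function names are hypothetical helpers; the numeric wire-type IDs (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit) come from the protobuf encoding documentation.

```python
# Wire-type IDs from the protobuf encoding spec.
VARINT, I64, LEN, I32 = 0, 1, 2, 5

# Which wire type each scalar type uses (mirrors the flowchart).
WIRE_TYPE = {
    **dict.fromkeys(
        ["int32", "int64", "uint32", "uint64", "sint32", "sint64", "bool", "enum"],
        VARINT,
    ),
    **dict.fromkeys(["fixed64", "sfixed64", "double"], I64),
    **dict.fromkeys(["fixed32", "sfixed32", "float"], I32),
    **dict.fromkeys(["string", "bytes"], LEN),
}


def field_key(field_number: int, proto_type: str) -> int:
    """Return the key (tag) value written before a field's payload."""
    return (field_number << 3) | WIRE_TYPE[proto_type]


print(field_key(1, "double"))   # e.g. a double field numbered 1
print(field_key(3, "int32"))    # e.g. an int32 field numbered 3
print(field_key(11, "string"))  # e.g. a string field numbered 11
```

The key is itself written as a varint, so field numbers 1 through 15 fit in a single byte; that is why frequently used fields are usually given low numbers.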

Teacher Mindset

When in doubt, use int32 for whole numbers (int64 if values can exceed about 2.1 billion), double for floating point, string for text, and bytes for binary data. Switch to specialized types such as sint32, uint64, or fixed32 only when you have measured the need.

Code Examples

// Example 1: All scalar types in one message (scalar_demo.proto)
syntax = "proto3";

message ScalarDemo {
  double   pi = 1;             // 3.14159
  float    temperature = 2;    // 23.5
  int32    age = 3;            // 30
  int64    timestamp = 4;      // 1700000000
  uint32   positive_id = 5;    // 42
  uint64   large_id = 6;       // 18446744073709551615 (max uint64)
  sint32   signed_temp = 7;    // -10 (ZigZag: efficient for negatives)
  fixed32  hash32 = 8;         // 4042322161 (always 4 bytes)
  fixed64  hash64 = 9;         // always 8 bytes
  bool     is_active = 10;     // true
  string   name = 11;          // "Alice" (must be valid UTF-8)
  bytes    image = 12;         // binary data
  sint64   balance_delta = 13; // -5000 (ZigZag, 64-bit)
  sfixed32 offset32 = 14;      // -1 (always 4 bytes)
  sfixed64 offset64 = 15;      // always 8 bytes
}

// Example 2: Type selection guidance
message OptimizationDemo {
  // sint32/sint64 for values that are often negative
  sint32 change = 1;

  // uint32/uint64 for values that are never negative
  uint32 product_count = 2;

  // fixed32 for values often >= 2^28 (5 bytes as a varint) or when encoding speed matters
  fixed32 serial_number = 3;

  // bytes for raw binary, string for UTF-8 text
  bytes thumbnail = 4;
  string display_name = 5;
}

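# Example 3: Language mapping for Python
# Generate scalar_demo_pb2 first: protoc --python_out=. scalar_demo.proto
from scalar_demo_pb2 import ScalarDemo

msg = ScalarDemo()
msg.pi = 3.14159            # Python float -> protobuf double
msg.temperature = 23.5      # Python float -> protobuf float (32-bit on the wire)
msg.age = 30                # Python int -> protobuf int32 (range-checked)
msg.large_id = 2**64 - 1    # Python int -> protobuf uint64 (no overflow in Python)
msg.name = "Alice"          # Python str -> protobuf string
msg.image = b'\x89PNG'      # Python bytes -> protobuf bytes
msg.is_active = True        # Python bool -> protobuf bool

data = msg.SerializeToString()  # wire-format bytes
print(len(data), "bytes")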
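One consequence of the float mapping in Example 3: Python has a single float type (a 64-bit double), so a value assigned to a protobuf `float` field is narrowed to 32 bits on the wire. The sketch below reproduces that narrowing with the standard `struct` module instead of generated protobuf code; the helper names are hypothetical, and it assumes round-to-nearest IEEE 754 conversion.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a Python float through 4 little-endian bytes (protobuf `float`)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_double(x: float) -> float:
    """Round-trip a Python float through 8 little-endian bytes (protobuf `double`)."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


for value in (23.5, 3.14159):
    print(f"{value}: float -> {as_float32(value)!r}, double -> {as_double(value)!r}")
```

23.5 is exactly representable in 32 bits and survives unchanged, while 3.14159 comes back slightly altered from a `float` field, so a constant like `pi` is better stored in a `double` field.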

Common Mistakes

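  • Using int64 for values that can never be negative; uint64 documents that constraint and doubles the positive range (small positive values encode to the same size in both)
  • Using int32 for frequently negative values: every negative int32 takes 10 bytes on the wire, while sint32 uses ZigZag encoding to keep small negatives to 1 or 2 bytes
  • Treating bool as an int32: it is a distinct type that maps to each language's native boolean, even though both are varint-encoded
  • Using string for binary data instead of bytes (proto3 string fields must contain valid UTF-8)
  • Assuming all languages handle 64-bit integers identically: JavaScript numbers lose precision above 2^53, which is why the proto3 JSON mapping writes 64-bit integers as strings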
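The JavaScript pitfall comes from JavaScript's `Number` type, an IEEE 754 double with a 53-bit significand. Python's `float` is the same kind of double, so the sketch below uses it to emulate reading a 64-bit protobuf integer into a plain JavaScript number. It is an illustration, not JavaScript runtime code; the helper name is hypothetical.

```python
# Python float and JavaScript Number are both IEEE 754 doubles (53-bit significand).
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def through_double(n: int) -> int:
    """Emulate storing a 64-bit integer in a double and reading it back."""
    return int(float(n))


for n in (MAX_SAFE_INTEGER, 2**53 + 1, 18446744073709551615):
    back = through_double(n)
    status = "ok" if back == n else f"LOST PRECISION -> {back}"
    print(f"{n}: {status}")
```

Every integer up to 2^53 - 1 survives, but above that neighbouring values collapse together, and the maximum uint64 value rounds up to 2^64, which does not even fit in the original type.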

Practice

  1. Create a message with one field of each scalar type.
  2. Serialize the message and observe the byte size of different integer types.
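  3. Serialize the value -5 in an int64 field and in a sint64 field and compare the byte counts; then explain why a uint64 field cannot hold it.
  4. Generate code for your language of choice and check which native type each scalar field maps to.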
  5. Challenge: Create a benchmark that serializes 10000 messages with different scalar choices and measures size and speed.

FAQ

What is the difference between int32 and sint32?

sint32 uses ZigZag encoding, which maps small negative numbers to small unsigned values, so -1 takes 1 byte instead of the 10 bytes it takes as an int32. Use sint32 when values are frequently negative; use int32 when values are mostly non-negative.
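ZigZag interleaves signed values so that numbers close to zero, positive or negative, get small varints: 0, -1, 1, -2, 2 become 0, 1, 2, 3, 4. Here is a minimal sketch of the 32-bit variant; the function names are our own, and the size comparison assumes int32 negatives are sign-extended to 64 bits as the wire format specifies.

```python
def zigzag32(n: int) -> int:
    """ZigZag-encode a signed 32-bit integer (what sint32 puts on the wire)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def unzigzag32(z: int) -> int:
    """Invert zigzag32."""
    return (z >> 1) ^ -(z & 1)


def varint_size(u: int) -> int:
    """Number of bytes a non-negative value needs as a base-128 varint."""
    size = 1
    while u >= 0x80:
        u >>= 7
        size += 1
    return size


for n in (0, -1, 1, -2, -10, 1000):
    int32_bytes = varint_size(n if n >= 0 else n + (1 << 64))  # int32: sign-extended
    sint32_bytes = varint_size(zigzag32(n))
    print(f"{n:>5}: int32 {int32_bytes:2} bytes, sint32 {sint32_bytes} bytes")
```

For non-negative values, sint32 costs about the same as int32 (ZigZag doubles the value, so it occasionally needs one extra byte), which is why int32 remains the better default for values that are rarely negative.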

When should I use fixed32 instead of int32?

Use fixed32 when values are typically larger than 2^28 or when encoding speed is more important than space. fixed32 always uses 4 bytes.
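The 2^28 threshold falls out of the varint format: each varint byte carries 7 payload bits, so a value needs 5 bytes once it reaches 2^28 (4 × 7 = 28 bits), one more than fixed32's constant 4. The sketch below compares the two encodings, using the standard `struct` module for the fixed little-endian layout; the helper names are hypothetical.

```python
import struct


def encode_uint32_varint(value: int) -> bytes:
    """uint32 on the wire: base-128 varint, 1 to 5 bytes."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # 7 payload bits + continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_fixed32(value: int) -> bytes:
    """fixed32 on the wire: always 4 bytes, little-endian."""
    return struct.pack("<I", value)


for v in (42, 2**21, 2**28 - 1, 2**28, 4042322161):
    print(f"{v:>10}: uint32 {len(encode_uint32_varint(v))} bytes, "
          f"fixed32 {len(encode_fixed32(v))} bytes")
```

Hash values are spread roughly uniformly over the 32-bit range, so about 15 in 16 of them are at least 2^28 and would need 5 varint bytes; for those, fixed32 is both smaller and simpler to encode.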

What is the maximum size of a string field?

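The language guide caps string and bytes values at 2^32 bytes (about 4 GB), but protobuf implementations do not support serialized messages larger than 2 GiB, so the effective limit is lower, and practical limits are much lower still.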

How does the bytes type work?

bytes stores arbitrary binary data. It is serialized as a length-delimited field. Use it for images, encrypted data, or any non-UTF-8 content.
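Length-delimited means the payload is preceded by the field key and then its byte length, both written as varints. The sketch below hand-encodes an `image` field numbered 12, as in Example 1, to make that layout visible. The helper names are hypothetical; real code should call the generated message's `SerializeToString()` instead.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


LEN = 2  # wire type for string, bytes, and embedded messages


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Key, then length, then the raw payload bytes."""
    key = encode_varint((field_number << 3) | LEN)
    return key + encode_varint(len(payload)) + payload


encoded = encode_bytes_field(12, b"\x89PNG")
print(encoded.hex(" "))
```

A string field uses exactly the same layout; the only difference is that its payload must be valid UTF-8.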

Can I use enums as field types?

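Yes. Enum fields are encoded as varints, exactly like int32, so negative enum values cost 10 bytes. Generated code gives you named constants, but in proto3 the first enum value must be 0 and enums are open: a parser keeps unrecognized numeric values instead of rejecting them, so your code should handle values it does not know.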

Mini Project

Design a message for storing product metrics that uses appropriate scalar types: sint32 for price changes, uint64 for SKU numbers, fixed32 for category IDs (assume they are 32-bit hashes), double for ratings, and bytes for product images.

What's Next

Next, you will learn about enum types in Protocol Buffers for defining sets of named values.
