Benchmarks July 3, 2025 8 min read

Pick Rate Benchmarks for Collaborative Robot Arms: What the Numbers Actually Mean

Published pick rate specs from robot arm manufacturers often assume ideal conditions — single-SKU, known orientation, clean grasp. Real warehouse environments are different. Here's a framework for estimating real-world throughput.

Automcore Engineering

Performance data dashboard showing pick rate metrics across robot arm types

Robot arm manufacturers publish pick rate and motion specifications in their product datasheets. A UR10e is rated at up to 1.0 m/s TCP speed. A FANUC CR-14iA has a repeatability of ±0.02 mm. These are real numbers, but they describe the arm under controlled conditions that don't resemble a working warehouse picking environment.

Understanding what the published specs measure, and how to translate them into realistic throughput estimates, prevents the budget and timeline surprises that come from discovering the gap at commissioning.

What manufacturer specs measure

Most published pick rate specs are measured in cycles per hour under the following conditions: single item type with known geometry, fixed-position item presentation (conveyor with consistent spacing), pre-defined pick and place positions with no search, and a clean end effector matched to the test item. The cycle is defined as pick from A, place at B, return to home.

In these conditions, a collaborative arm running at safety-rated speed (typically 250 mm/s TCP at the workspace boundary) can execute 400–600 pick-place cycles per hour for a simple carton. Faster non-collaborative arms in fixed-cell configurations can reach 800–1,200 cycles per hour under similar test conditions. The repeatability figure (±0.02 mm for the CR-14iA) measures arm positioning accuracy at a fixed, known target — it says nothing about the system's ability to locate a variable-orientation item on a pallet and plan a grasp to it.

What changes in a real warehouse environment

Four factors reduce real-world throughput from the benchmark figure.

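Item variation. Mixed-SKU picking requires pose estimation on each item before the grasp. A 3D vision pipeline adds 200–400 ms per cycle for point cloud processing and grasp point selection. At a 400-cycle/hr baseline (9.0 s per cycle), 400 ms of overhead stretches the cycle to 9.4 s and cuts throughput to about 383 cycles/hr, a loss of roughly 17 cycles per hour. The penalty grows as the base cycle shortens: on a 1,200-cycle/hr baseline, the same 400 ms costs about 140 cycles per hour. This is the single largest throughput reducer in mixed-SKU environments and the one most often missing from pre-deployment models.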
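To see how a fixed per-cycle overhead translates into lost throughput, here is a minimal Python sketch. It assumes the vision overhead is fully serial, meaning it is not overlapped with arm motion; pipelining image capture and pose estimation with the previous place motion would shrink the penalty. The function name is illustrative.

```python
def throughput_with_overhead(baseline_cycles_per_hr: float, overhead_s: float) -> float:
    """Return cycles/hr after adding a fixed serial overhead to every cycle.

    Assumption: the overhead (e.g. point cloud processing) runs in series with
    arm motion, so it adds directly to the cycle time.
    """
    base_cycle_s = 3600.0 / baseline_cycles_per_hr
    return 3600.0 / (base_cycle_s + overhead_s)


if __name__ == "__main__":
    for baseline in (400, 600, 1200):
        for overhead_ms in (200, 400):
            rate = throughput_with_overhead(baseline, overhead_ms / 1000.0)
            print(f"{baseline:>5} cycles/hr + {overhead_ms} ms -> "
                  f"{rate:6.0f} cycles/hr (loss {baseline - rate:4.0f})")
```

Under this assumption, 400 ms costs about 17 cycles/hr at a 400 cycles/hr baseline and about 141 cycles/hr at 1,200 cycles/hr: the same latency is a much larger fraction of a short cycle.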

Grasp reliability and retry rate. Items that are not presented at a consistent orientation (piled in a bin, partially occluded, or deformable) have higher grasp failure rates. A 5% failure rate with a retry sequence adds an average of about 0.3 seconds per attempt, reducing net throughput by 5–8% on affected SKUs. Items with lower initial training confidence scores have higher in-operation retry rates. This is one reason the 92–95% confidence deployment threshold matters operationally: it is not an arbitrary quality gate but the confidence level at which retry rates stay within the throughput budget.
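The retry cost can be modeled with a geometric distribution: if each grasp attempt fails independently with probability p, a successful pick needs on average p / (1 - p) failed attempts first. The sketch below assumes independent failures and a fixed time penalty per failure; both are simplifications, and the 6 s penalty in the example is a hypothetical value standing in for a re-scan plus re-approach.

```python
def net_rate_with_retries(cycle_s: float, fail_prob: float, retry_penalty_s: float) -> float:
    """Successful picks/hr when each attempt fails independently with fail_prob.

    Expected failures per successful pick = fail_prob / (1 - fail_prob)
    (mean of a geometric distribution); each failure adds retry_penalty_s.
    """
    if not 0.0 <= fail_prob < 1.0:
        raise ValueError("fail_prob must be in [0, 1)")
    expected_failures = fail_prob / (1.0 - fail_prob)
    return 3600.0 / (cycle_s + expected_failures * retry_penalty_s)


if __name__ == "__main__":
    clean = net_rate_with_retries(cycle_s=6.0, fail_prob=0.0, retry_penalty_s=6.0)
    messy = net_rate_with_retries(cycle_s=6.0, fail_prob=0.05, retry_penalty_s=6.0)
    print(f"{clean:.0f} -> {messy:.0f} picks/hr ({1 - messy / clean:.1%} reduction)")
```

With a 6.0 s base cycle and the hypothetical 6 s penalty, a 5% failure rate averages 0.3 s per attempt and trims throughput from 600 to 570 picks/hr, a 5% reduction. Shorter base cycles push the percentage higher, because the same retry time is a larger share of each cycle.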

Collaborative speed limits. Cobots operating in shared workspaces with human workers run at safety-rated speeds under ISO/TS 15066 speed-and-separation monitoring or power-and-force limiting (PFL). PFL mode limits TCP speed to 250 mm/s or less in the collaborative zone. This is roughly 40–60% of the arm's maximum programmed speed, and it applies during the portions of the pick cycle closest to the shared workspace boundary. The effect on throughput depends on workspace geometry: an arm where most of the pick-place motion happens within the collaborative zone will be more constrained than one where the approach to the item is outside the monitored zone.
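A rough way to see why workspace geometry matters is to split a motion path into the portion inside the collaborative zone and the portion outside it. The sketch below is a constant-speed approximation that ignores acceleration, blending, and separation-distance logic. The 1,000 mm/s free-zone speed borrows the UR10e's rated TCP speed quoted earlier, and the path length and zone fractions are hypothetical.

```python
def path_time_s(path_mm: float, collab_fraction: float,
                collab_speed_mm_s: float = 250.0,
                free_speed_mm_s: float = 1000.0) -> float:
    """Time to traverse a path when collab_fraction of it lies in the
    speed-limited collaborative zone (constant speeds, no acceleration)."""
    if not 0.0 <= collab_fraction <= 1.0:
        raise ValueError("collab_fraction must be in [0, 1]")
    collab_mm = path_mm * collab_fraction
    free_mm = path_mm - collab_mm
    return collab_mm / collab_speed_mm_s + free_mm / free_speed_mm_s


if __name__ == "__main__":
    for frac in (0.0, 0.2, 0.8):
        print(f"{frac:.0%} of a 1,000 mm path in zone: {path_time_s(1000.0, frac):.2f} s")
```

In this simplified model, moving from 20% to 80% of a 1,000 mm path inside the zone more than doubles the motion time (1.6 s to 3.4 s), which is why cell layout can matter as much as the arm's rated speed.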

End effector selection. An arm running a single vacuum cup optimized for flat cardboard will fail on flexible packaging. Hybrid end effectors combining vacuum and mechanical gripping handle more SKU types, but they add weight to the arm's payload budget and introduce switching logic that extends cycle time on SKU transitions. Taking the UR10e's 10 kg payload rating, a 2.8 kg hybrid end effector leaves an effective picking payload of 7.2 kg: adequate for most consumer goods, but worth confirming against the heavier items in the catalog.
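A payload check against the catalog is easy to automate. The sketch subtracts end effector mass from the rated payload and flags heavy SKUs. It assumes the 10 kg rating implied by the 7.2 kg figure above and ignores the derating for center-of-gravity offset that real payload charts apply, so treat its output as an optimistic upper bound. The SKU names, masses, and 0.5 kg safety margin are hypothetical.

```python
RATED_PAYLOAD_KG = 10.0   # assumption: UR10e rating implied by the 7.2 kg figure
END_EFFECTOR_KG = 2.8     # hybrid vacuum + mechanical gripper

# Hypothetical catalog: SKU -> gross item mass in kg.
CATALOG = {
    "water_case_6x1.5L": 9.3,
    "detergent_jug_4L": 4.4,
    "boxed_kettle": 1.9,
}


def effective_payload_kg(rated_kg: float, end_effector_kg: float) -> float:
    """Payload left for the item after the tool's own mass (ignores CoG derating)."""
    return rated_kg - end_effector_kg


def skus_over_budget(catalog: dict[str, float], rated_kg: float,
                     end_effector_kg: float, margin_kg: float = 0.5) -> list[str]:
    """Return SKUs heavier than the effective payload minus a safety margin."""
    limit = effective_payload_kg(rated_kg, end_effector_kg) - margin_kg
    return [sku for sku, mass in catalog.items() if mass > limit]


if __name__ == "__main__":
    payload = effective_payload_kg(RATED_PAYLOAD_KG, END_EFFECTOR_KG)
    print(f"effective payload: {payload:.1f} kg")
    print("over budget:", skus_over_budget(CATALOG, RATED_PAYLOAD_KG, END_EFFECTOR_KG))
```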

Realistic mixed-SKU throughput ranges

Based on mixed-SKU warehouse deployments with collaborative robot arms, realistic per-arm throughput ranges by SKU category are:

Rigid consumer goods (consistent geometry): 580–640 picks/arm/hr
Canned/bottled beverage: 500–560 picks/arm/hr
Flexible packaging: 360–420 picks/arm/hr
Irregular shapes under 2 kg: 310–390 picks/arm/hr
Heavy items 2–15 kg: 280–360 picks/arm/hr

A fleet picking a mixed catalog across all five categories will average somewhere between 380 and 550 picks/arm/hr depending on the SKU distribution. A facility heavy on consumer boxed goods will trend toward the upper end; a 3PL handling diverse client products will trend lower.

These ranges are not from a single controlled study — they're composites from operational warehouse environments with different arm models, end effectors, and pick station geometries. Individual deployments will fall outside these ranges in both directions. A facility with unusually clean item presentation and consistent pallet organization will outperform the upper end. One with high levels of deformable packaging and variable pallet stacking will underperform the lower end on the relevant categories.

Why model training quality affects real-world throughput

There's a throughput variable that doesn't appear in most benchmark discussions: the quality of the trained grasp model for each SKU. An arm running a well-trained model for a given item — high pose estimation accuracy, correct end effector selection, grasp point refined over multiple training iterations — will pick that item at the upper end of its category range. An arm running a marginal model that was approved at the minimum confidence threshold will have higher retry rates, more force-torque anomaly flags, and lower net throughput on that SKU.

This matters for fleet propagation specifically. When a model trained on one arm propagates to 11 others, the training quality of that source session determines the throughput baseline for the entire fleet on that SKU. A model trained to 94% confidence on the source arm that propagates cleanly will give all 12 arms a 540 picks/hr performance floor on a rigid consumer goods item. A model that barely cleared the 92% threshold on the source arm may propagate at 87–89% confidence on some destination arms — below deployment threshold, triggering secondary training queues before those arms can run the SKU in production. Training quality on the source arm is a fleet-wide throughput variable, not just a single-arm quality metric.
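The fleet-level consequence can be expressed as a simple gate: after propagation, any destination arm whose confidence falls below the deployment threshold is queued for secondary training and contributes no production capacity on that SKU until it clears. This is a hypothetical sketch; the arm IDs and confidence values are illustrative, the 92% threshold and 540 picks/hr floor come from the example above, and crediting every passing arm with the same floor is a simplification.

```python
DEPLOY_THRESHOLD = 0.92      # minimum confidence for production use
FLOOR_PICKS_PER_HR = 540.0   # per-arm floor for a well-trained rigid SKU

# Hypothetical confidence of one propagated SKU model on each destination arm.
PROPAGATED = {"arm01": 0.94, "arm02": 0.93, "arm03": 0.88, "arm04": 0.92, "arm05": 0.87}


def arms_needing_retraining(conf_by_arm: dict[str, float],
                            threshold: float = DEPLOY_THRESHOLD) -> list[str]:
    """Arms below the threshold go to the secondary training queue."""
    return sorted(arm for arm, conf in conf_by_arm.items() if conf < threshold)


def sku_capacity_floor(conf_by_arm: dict[str, float],
                       threshold: float = DEPLOY_THRESHOLD) -> float:
    """Picks/hr floor on this SKU from the arms that can run it today."""
    ready = len(conf_by_arm) - len(arms_needing_retraining(conf_by_arm, threshold))
    return ready * FLOOR_PICKS_PER_HR


if __name__ == "__main__":
    print("retrain:", arms_needing_retraining(PROPAGATED))
    print("capacity floor:", sku_capacity_floor(PROPAGATED), "picks/hr")
```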

How to build a realistic throughput model

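Start with your current SKU catalog and classify items by the categories above. Weight each category by its share of your daily pick volume, apply the throughput range for each category to get a blended per-arm rate, and multiply by your fleet size. Blend by time rather than by rate: a slow category consumes more arm-hours per pick, so a simple volume-weighted average of the category rates overstates throughput. That gives you a projected picks/hr range for your specific mix.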

Build the conservative case (lower-end throughput for each category) and the expected case (mid-range). Use the conservative case for capacity planning and the expected case for ROI modeling. Avoid using the manufacturer's benchmark spec as the expected case: the gap between lab conditions and live warehouse operations is consistent enough that a benchmark-based forecast will undermine your deployment timeline and budget projections.
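Here is a minimal sketch of that model, using the category ranges above. One detail matters: blend the categories by time, not by averaging their rates. Slow categories consume more arm-hours per pick, so the blended rate is total picks divided by total arm-hours, which is a volume-weighted harmonic mean. The catalog mix and fleet size in the example are hypothetical.

```python
# Category ranges from the list above: (low, high) picks/arm/hr.
RANGES = {
    "rigid_consumer": (580, 640),
    "beverage": (500, 560),
    "flexible": (360, 420),
    "irregular_under_2kg": (310, 390),
    "heavy_2_15kg": (280, 360),
}


def blended_rate(mix: dict[str, float], rates: dict[str, float]) -> float:
    """Picks/arm/hr for a pick-volume mix: total picks / total arm-hours."""
    total_share = sum(mix.values())
    hours_per_pick = sum(share / rates[cat] for cat, share in mix.items())
    return total_share / hours_per_pick


def fleet_cases(mix: dict[str, float], n_arms: int) -> tuple[float, float]:
    """(conservative, expected) fleet picks/hr from low-end and mid-range rates."""
    low = {cat: lo for cat, (lo, hi) in RANGES.items()}
    mid = {cat: (lo + hi) / 2 for cat, (lo, hi) in RANGES.items()}
    return n_arms * blended_rate(mix, low), n_arms * blended_rate(mix, mid)


if __name__ == "__main__":
    # Hypothetical daily pick-volume shares.
    mix = {"rigid_consumer": 0.5, "beverage": 0.2, "flexible": 0.2, "heavy_2_15kg": 0.1}
    conservative, expected = fleet_cases(mix, n_arms=12)
    print(f"conservative: {conservative:,.0f} picks/hr, expected: {expected:,.0f} picks/hr")
```

For this mix the per-arm rates come out near 460 (conservative) and 494 (expected). A plain volume-weighted average of the low-end rates would report 490 instead of about 460, and that overstatement widens as the catalog mixes fast and slow categories.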

A useful additional check: break your catalog by grasp difficulty tier (rigid uniform / rigid irregular / flexible / very small), estimate the fraction of daily picks in each tier, and build the throughput model against the tier distribution rather than a single average. A catalog where 70% of picks are rigid consumer goods looks very different in throughput terms from one where 40% are flexible packaging — even if the total SKU count is similar. The throughput model that accounts for the mix distribution will be materially more accurate than one that applies a single average pick rate to the full catalog.
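The difference between a mix-aware model and a single average is easy to demonstrate. This sketch compares two hypothetical catalogs, one dominated by rigid consumer goods and one with 40% flexible packaging, using the mid-range category rates above. The "single average" baseline of 465 picks/arm/hr is an assumption chosen as the midpoint of the 380–550 fleet range.

```python
# Mid-range rates (picks/arm/hr) from the category list above.
MID_RATES = {"rigid_consumer": 610.0, "flexible": 390.0, "irregular_under_2kg": 350.0}
SINGLE_AVERAGE = 465.0  # midpoint of the 380-550 fleet range, used as a naive model


def blended_rate(mix: dict[str, float], rates: dict[str, float]) -> float:
    """Time-weighted blend: total picks divided by total arm-hours."""
    return sum(mix.values()) / sum(share / rates[cat] for cat, share in mix.items())


# Hypothetical catalogs with similar SKU counts but different pick-volume mixes.
CATALOGS = {
    "boxed_goods_heavy": {"rigid_consumer": 0.7, "flexible": 0.3},
    "flexible_heavy": {"rigid_consumer": 0.3, "flexible": 0.4, "irregular_under_2kg": 0.3},
}

if __name__ == "__main__":
    for name, mix in CATALOGS.items():
        rate = blended_rate(mix, MID_RATES)
        print(f"{name}: {rate:.0f} picks/arm/hr "
              f"(naive average error {SINGLE_AVERAGE - rate:+.0f})")
```

The single average understates the boxed-goods catalog by roughly 57 picks/arm/hr and overstates the flexible-heavy one by about 44, errors in opposite directions that a fleet-wide figure hides.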

Cycle time decomposition: where the time actually goes

A useful diagnostic tool for understanding real-world throughput is decomposing the pick cycle into its component phases and timing each. A complete pick-place cycle in a mixed-SKU warehouse environment breaks into roughly six phases: (1) vision acquisition — point cloud capture and transfer to the processing node; (2) pose estimation — identifying the target item and estimating its 6-DOF pose; (3) grasp planning — selecting the grasp point and approach vector, collision-checking against the scene; (4) arm motion to pick position; (5) grasp execution and lift — including force-torque monitoring; (6) transport and place, return to home position.

In a UR10e running mixed-SKU depalletizing with a hybrid end effector and a standard 3D vision setup, the typical phase timing looks roughly like this: vision acquisition (80–120 ms), pose estimation (150–250 ms), grasp planning (60–120 ms), arm motion to pick (400–700 ms depending on distance), grasp execution and lift (300–500 ms with force-torque monitoring), transport and place (500–800 ms). Total cycle: 1.5–2.5 seconds, corresponding to 1,440–2,400 cycles per hour at the theoretical maximum — but the upper end of that range is never achieved in practice because of the queuing time between cycles, the occasional retry, and the variation in item position that forces longer approach paths on some picks.
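Summing the phase ranges reproduces the total cycle figure. The sketch below uses the phase timings quoted above for the UR10e depalletizing example and assumes the phases run strictly back to back with no idle time, which is what makes the result a theoretical ceiling rather than an operating rate.

```python
# Phase timings (seconds) from the UR10e mixed-SKU example: (fast, slow).
PHASES_S = {
    "vision_acquisition": (0.08, 0.12),
    "pose_estimation": (0.15, 0.25),
    "grasp_planning": (0.06, 0.12),
    "motion_to_pick": (0.40, 0.70),
    "grasp_and_lift": (0.30, 0.50),
    "transport_and_place": (0.50, 0.80),
}
COMPUTE_PHASES = ("vision_acquisition", "pose_estimation", "grasp_planning")


def cycle_bounds_s(phases: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Fastest and slowest cycle if every phase runs back to back."""
    fast = sum(lo for lo, _ in phases.values())
    slow = sum(hi for _, hi in phases.values())
    return fast, slow


if __name__ == "__main__":
    fast, slow = cycle_bounds_s(PHASES_S)
    print(f"cycle {fast:.2f}-{slow:.2f} s -> "
          f"{3600 / slow:,.0f}-{3600 / fast:,.0f} cycles/hr ceiling")
    compute = sum(PHASES_S[p][1] for p in COMPUTE_PHASES)
    print(f"vision + planning (slow case): {compute:.2f} s of {slow:.2f} s")
```

The phase sums come to 1.49–2.49 s, which the text rounds to 1.5–2.5 s. Even in the slow case, vision and planning account for under half a second; most of the cycle is arm motion and grasp execution.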

The practical operating range of 380–550 picks/arm/hr corresponds to a per-cycle time of 6.5–9.5 seconds, which is longer than the sum of the phase timings above. The difference is consumed by the time between cycles — waiting for the downstream conveyor, confirming place position, clearing the pick zone, handling any out-of-band events — and the occasional slow cycle (retry, long approach path, end effector switching) that pulls the average down from the median. Decomposing the cycle in your actual deployment and measuring where time is going is the most direct path to informed throughput improvement, because the bottleneck phase is not always the one intuition suggests.
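Measuring this in your own deployment starts with logging each cycle's duration. The hypothetical sketch below performs the comparison described above: given logged cycle times and the sum of measured phase timings, it reports the average rate, the median and mean cycle, and the time not explained by the six phases. The cycle log and the 2.0 s phase sum are illustrative values, not measurements.

```python
from statistics import mean, median


def decompose(cycle_times_s: list[float], phase_sum_s: float) -> dict[str, float]:
    """Compare logged cycle times with the summed pick-phase durations."""
    avg = mean(cycle_times_s)
    return {
        "picks_per_hr": 3600.0 / avg,
        "median_cycle_s": median(cycle_times_s),
        "mean_cycle_s": avg,
        # Time outside the six phases: queuing, confirmation, events, retries.
        "unexplained_s": avg - phase_sum_s,
    }


if __name__ == "__main__":
    # Hypothetical log: four normal cycles and one retry cycle.
    log = [7.0, 7.2, 6.9, 7.1, 14.0]
    for key, value in decompose(log, phase_sum_s=2.0).items():
        print(f"{key}: {value:.2f}")
```

A single 14 s retry cycle lifts the mean cycle to 8.44 s against a 7.1 s median, which is exactly the slow-cycle effect that pulls the average rate below the typical one.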

Throughput vs. utilization: the distinction that matters for capacity planning

Throughput — picks per arm per hour — is the right metric for comparing system options. Utilization — what fraction of the operational window the arm is actively picking — is the right metric for capacity planning. The two are related but not the same, and conflating them leads to capacity models that don't match operational reality.

An arm that picks at 480 picks/hr during active picking but has 15% planned non-picking time (end-of-pallet changeover, pallet replenishment, minor adjustments) has a net throughput of 480 × 0.85 = 408 picks/hr averaged across the operating window. For a 10-hour shift, that's 4,080 picks per arm per shift — not the 4,800 the nominal pick rate suggests. At fleet scale, the difference between 480 and 408 picks/hr across 10 arms is 7,200 picks per shift — a meaningful capacity planning gap if the model uses the nominal rate.

Build planned downtime and changeover time into throughput models from the start. The utilization discount is fairly consistent across arm types in similar pick applications — 10–20% of the operating window is typically consumed by non-picking activities in depalletizing applications. Use a 15% utilization discount as a default in your throughput models unless you have facility-specific data suggesting otherwise.
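The utilization discount is a one-line adjustment, but it is worth making explicit in any capacity model. This sketch reproduces the worked example above and falls back to the 15% default; the function and parameter names are illustrative.

```python
def shift_capacity(nominal_rate: float, n_arms: int, shift_hours: float,
                   non_picking_fraction: float = 0.15) -> tuple[float, float]:
    """Return (net picks/arm/hr, fleet picks per shift) after the utilization discount."""
    net_rate = nominal_rate * (1.0 - non_picking_fraction)
    return net_rate, net_rate * shift_hours * n_arms


if __name__ == "__main__":
    net_rate, fleet_shift = shift_capacity(480.0, n_arms=10, shift_hours=10.0)
    nominal_shift = 480.0 * 10.0 * 10
    print(f"net rate: {net_rate:.0f} picks/arm/hr")
    print(f"fleet per shift: {fleet_shift:,.0f} vs nominal {nominal_shift:,.0f} "
          f"(gap {nominal_shift - fleet_shift:,.0f})")
```

With the worked example's inputs it returns 408 picks/arm/hr and 40,800 picks per 10-arm shift, 7,200 short of the 48,000 the nominal rate implies.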

What higher throughput numbers don't account for

We're not saying the manufacturer benchmark specs are meaningless — they're accurate for what they measure. A UR10e genuinely can execute the motion profile at those speeds and that repeatability. The benchmark number is the arm's physical ceiling under controlled conditions, not a figure the manufacturer invented. What the benchmark doesn't include is the full pick system: vision latency, grasp planning computation time, collaborative speed enforcement, end effector switching overhead, retry handling, and the unavoidable variation in item presentation that defines real warehouse operations. The delta between benchmark spec and operational throughput is not a failure of the hardware — it's the cost of operating in an uncontrolled environment, which is the only environment that matters.