hivemind.client

This module lets you connect to a distributed Mixture-of-Experts or to individual experts hosted on remote machines.

class hivemind.client.RemoteExpert(uid, endpoint: str)[source]

A simple module that runs the forward/backward passes of an expert hosted on a remote machine. Works seamlessly with pytorch autograd (this is essentially a simple RPC function).

Warning: RemoteExpert currently assumes that you provide it with correct input shapes. Sending wrong input shapes can cause RemoteExpert to freeze indefinitely due to a runtime error.

Parameters:
  • uid – unique expert identifier
  • endpoint – network endpoint of a server that services that expert, e.g. “201.123.321.99:1337” or “[::]:8080”
forward(*args, **kwargs)[source]

Call RemoteExpert for the specified inputs and return its output(s). Compatible with pytorch.autograd.

class hivemind.client.RemoteMixtureOfExperts(*, in_features, grid_size: Tuple[int, ...], dht: hivemind.dht.DHT, k_best: int, k_min: int = 1, forward_timeout: Optional[float] = None, timeout_after_k_min: Optional[float] = None, backward_k_min: int = 1, backward_timeout: Optional[float] = None, uid_prefix='', allow_broadcasting=True, loop: Optional[asyncio.base_events.BaseEventLoop] = None)[source]

A torch module that performs mixture of experts inference with a local gating function and multiple remote experts. Natively supports pytorch autograd.

Note:

By default, not all experts are guaranteed to perform a forward pass. Moreover, not all experts that ran the forward pass are guaranteed to perform the backward pass. In the latter case, the gradient is averaged over the experts that did respond, without the missing ones.
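The averaging fallback described in the note can be sketched in pure Python (this is an illustration of the behavior, not the hivemind implementation, which operates on torch tensors):

```python
def average_responses(outputs):
    # Sketch of the fallback described above: average elementwise over the
    # experts that actually delivered an output; experts that failed to
    # respond (None here) are dropped from the mean rather than counted as
    # zeros. Pure-Python stand-in, not the hivemind API.
    delivered = [out for out in outputs if out is not None]
    n = len(delivered)
    return [sum(vals) / n for vals in zip(*delivered)]
```

For example, if three experts were queried but only two returned outputs, the result is the mean of those two outputs alone.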

Parameters:
  • in_features – common input size for experts and gating function
  • grid_size – hivemind dimensions that form expert uid (see below)
  • uid_prefix – common prefix for all expert uids; each expert uid follows the pattern {uid_prefix}.{0…grid_size[0]}.{0…grid_size[1]}…{0…grid_size[-1]}
  • dht – DHT where the experts reside
  • k_best – queries this many experts with highest scores
  • k_min – makes sure at least this many experts returned output
  • timeout_after_k_min – waits for this many seconds after k_min experts returned results. Any expert that didn’t manage to return output after that delay is considered unavailable
  • allow_broadcasting – if RemoteMixtureOfExperts is fed an input with more than 2 dimensions, allow_broadcasting=True will flatten the first d-1 input dimensions, apply RemoteMixtureOfExperts, and un-flatten the output again; allow_broadcasting=False will raise an error instead
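The uid pattern above can be made concrete with a short sketch. Whether uid_prefix itself ends with a separator is an assumption here (the pattern only specifies dot-joined grid indices after the prefix):

```python
from itertools import product

def expert_uid(uid_prefix, indices):
    # Build one uid following the documented pattern:
    # {uid_prefix}{i0}.{i1}...{i_last}, one index per grid dimension.
    # Assumes uid_prefix already carries any trailing separator.
    return uid_prefix + ".".join(str(i) for i in indices)

def all_expert_uids(uid_prefix, grid_size):
    # Enumerate the full grid of expert uids, one per point in the
    # product space range(grid_size[0]) x ... x range(grid_size[-1]).
    return [expert_uid(uid_prefix, idx)
            for idx in product(*(range(dim) for dim in grid_size))]
```

With grid_size=(2, 3) this yields six uids, e.g. "ffn.0.0" through "ffn.1.2" for uid_prefix="ffn.".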
forward(input: torch.Tensor, *args, **kwargs)[source]

Choose the k best experts with beam search, then call the chosen experts and average their outputs. The input tensor is averaged over all dimensions except the first and last (we assume that extra dimensions represent sequence length or image dimensions).

Parameters:
  • input – a tensor of values that are used to estimate gating function, batch-first.
  • args – extra positional parameters that will be passed to each expert after input, batch-first
  • kwargs – extra keyword parameters that will be passed to each expert, batch-first
Returns:

averaged predictions of all experts that delivered results on time, as a batch-first nested structure
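The allow_broadcasting behavior from the parameters above can be sketched for the 3-d case using nested lists in place of tensors (a simplified stand-in; the real module does this with tensor reshapes):

```python
def forward_with_broadcasting(moe_fn, batch):
    # batch: a 3-d input, i.e. a list (dim 0) of lists (dim 1) of feature
    # vectors. allow_broadcasting flattens the leading dimensions into a
    # single batch axis, applies the batch-first mixture, then restores
    # the original nesting. moe_fn stands in for the mixture call.
    flat = [vec for row in batch for vec in row]
    out = moe_fn(flat)
    width = len(batch[0])
    return [out[i * width:(i + 1) * width] for i in range(len(batch))]
```

This is why allow_broadcasting=True lets you feed, say, [batch, seq_len, in_features] inputs to a module that only understands [batch, in_features].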

compute_expert_scores(grid_scores: List[torch.Tensor], batch_experts: List[List[hivemind.client.expert.RemoteExpert]]) → torch.Tensor[source]

Compute scores for each expert by adding up grid scores, autograd-friendly.

Parameters:
  • grid_scores – list of torch tensors; the i-th tensor contains scores for the i-th grid dimension
  • batch_experts – list (batch) of lists (k) of up to k experts selected for this batch
Returns:

a tensor of scores, float32[batch_size, k]

Note: if some rows in the batch have fewer than the maximum number of experts, their scores are padded with -inf
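The score computation and the -inf padding can be illustrated in pure Python. Here expert_indices stands in for batch_experts (index tuples instead of RemoteExpert objects), an assumption made to keep the sketch self-contained:

```python
import math

def compute_expert_scores(grid_scores, expert_indices, k):
    # grid_scores[i][b][j]: score of index j along grid dimension i for
    # sample b. expert_indices[b]: up to k index-tuples, one per chosen
    # expert, for sample b. An expert's score is the sum of its per-
    # dimension grid scores; rows with fewer than k experts are padded
    # with -inf, mirroring the note above. Pure-Python stand-in for the
    # autograd-friendly torch version.
    batch_size = len(expert_indices)
    scores = [[-math.inf] * k for _ in range(batch_size)]
    for b, row in enumerate(expert_indices):
        for slot, idx in enumerate(row):
            scores[b][slot] = sum(grid_scores[dim][b][j]
                                  for dim, j in enumerate(idx))
    return scores
```

Padding with -inf (rather than zero) keeps absent experts out of any subsequent softmax over the scores.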

Find and return the k best experts in the grid using (exact) beam search over the product space.

Parameters:
  • grid_scores (a sequence of tensors of shape [batch_size, self.grid_size[i]]) – scores predicted for each dimension in the grid
  • k_best – how many of the top experts participate in the computation
  • kwargs – extra keyword parameters passed to self.dht.first_k_active
Returns:

a list of batch_size lists, one per sample; each inner list contains RemoteExpert instances for up to k_best chosen experts
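The objective this search maximizes can be shown with a brute-force sketch over one sample's grid scores. The real method prunes this search with beam search and consults dht.first_k_active so that only experts currently present in the DHT are returned; this exhaustive version is only practical for tiny grids:

```python
from itertools import product

def top_k_expert_indices(grid_scores_row, k_best):
    # grid_scores_row[i][j]: score of index j along grid dimension i for a
    # single sample. Score every index combination in the product space by
    # summing its per-dimension scores, then keep the k_best highest.
    combos = product(*(range(len(dim)) for dim in grid_scores_row))
    ranked = sorted(combos,
                    key=lambda idx: sum(dim[i] for dim, i
                                        in zip(grid_scores_row, idx)),
                    reverse=True)
    return ranked[:k_best]
```

Because the per-expert score is a sum over dimensions, beam search can explore the grid one dimension at a time instead of enumerating the whole product space, which is what makes large grids tractable.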