
Software Engineer – Inference Serving

At Taalas we believe that fundamental progress is achieved by those who are willing to
understand and assail a problem end-to-end, without regard for commonly accepted
abstractions and boundaries.

We are building a team of hands-on technologists who dislike overspecialization and seek to
excel in both depth and breadth.

In this position, the successful candidate will build software infrastructure for an inference
serving cluster built around Taalas’ hardcore AI model chips.

JOB RESPONSIBILITIES

  • Adapt open-source inference servers like vLLM and Punica to interface with Taalas’ hardcore AI models
  • Implement a highly efficient LoRA swapping solution for multi-tenant, multi-LoRA environments
  • Build and test a scalable inference serving cluster using Kubernetes and Traefik (or similar)

QUALIFICATIONS

  • Bachelor’s or higher degree in Computer Science or Electrical/Computer Engineering
  • Experience with Kubernetes, HTTP load balancers, and web servers
  • Good knowledge of computer architecture and low-level programming: Linux virtual memory and page table management, direct memory access, CUDA
  • Familiarity with ML, Python, and PyTorch

Interested in joining our team? Submit your resume to careers@taalas.com to be
considered for this exciting opportunity!

