Vitold Radkevich

About me

Hi, I'm Vitold. I build and operate distributed systems in production environments.

With 7+ years of experience, I focus on system reliability, observability, and performance under real-world production load. I take ownership of backend and platform systems and ensure they remain stable, scalable, and observable in production.

My work sits at the intersection of backend engineering and SRE - building systems that not only work, but remain resilient under failures, high load, and real operational pressure.

I'm particularly interested in combining SRE practices with AI - using LLM-based systems to automate incident analysis, reduce alert fatigue, and improve on-call efficiency.

What I do

I design and operate backend and platform systems in production environments, including architecture decisions, scalability, observability, and incident response.

My focus includes:

Reliability engineering and system stability under production load

Incident response and on-call (SRE practices)

Observability design (metrics, logs, tracing, alerting)

Building internal engineering platforms

Automating operational workflows and reducing MTTR

Technical focus

Reliability & SRE:
Incident Commander experience, SLO / SLI design, Chaos Engineering (Chaos Mesh), production incident response.

Platform Engineering:
Internal engineering platforms for incident management, load testing (Gatling-based systems), and system reliability tooling.

Backend & Distributed Systems:
Java / Kotlin, Spring ecosystem, microservices, event-driven systems (Kafka, RabbitMQ).

Cloud & DevOps:
AWS / Azure / GCP, Kubernetes, Docker, CI/CD pipelines, production monitoring systems.

AI for Engineering:
LLM-based automation for incident analysis, AI-assisted SRE workflows, reducing operational load through automation.

Selected impact

I have experience acting as Incident Commander during production outages, working to reduce MTTR, improve system observability, and build internal SRE tools for incident management and chaos testing.

I have contributed to systems that improve reliability, automate operational workflows, and support engineering teams in high-pressure production environments.

How I work

I take full ownership of systems in production and focus on reliability, clarity under pressure, and reducing operational risk through automation and observability.

I'm used to working in international Agile teams and collaborating closely with engineers, product managers, and stakeholders in high-responsibility environments.

Certifications

AWS Certified Cloud Practitioner

Microsoft Certified - Career Essentials in Generative AI

AI Devs Certification (LLM systems & engineering applications)

Languages

English - B2

Polish - B2

Russian - Native

Outside of work

I enjoy an active lifestyle - skiing, hiking, cycling, swimming, and traveling. I'm also passionate about cars and modern technologies.

Commercial projects

Roles	Backend & SRE Engineer
Team	4 members (3 programmers, 1 project manager)
Technologies	Kotlin, Spring Boot, Gatling, Kubernetes, CI/CD (GitHub Actions), Testing Tools (ADHOC, SharedGE), Jira, Git
Responsibilities	Managed user access and simulation scheduling, ensuring fair distribution of system resources. Designed and optimized load test scenarios for system resilience and failover validation. Reviewed and validated simulation scenarios from cross-functional teams (Pull Request review). Made architectural decisions to improve system efficiency, scalability, and reliability. Monitored system performance, configured alerts, and analyzed test results to propose improvements.
Duration	16 months

Roles	Backend & SRE Engineer
Team	4 members (3 programmers, 1 project manager)
Technologies	Kotlin, Hexagonal Architecture, Chaos Mesh, Kubernetes, CI/CD (GitHub Actions), REST, MongoDB, Monitoring Tools, Backstage, React, TypeScript, Jira
Responsibilities	Designed and implemented core backend functionality for orchestrating chaos experiments across multiple environments. Developed mechanisms for Canary testing and automated rollback in case of failures. Ensured data consistency and accurate logging for all maintenance activities. Collaborated with SRE and development teams to define safe failure scenarios and assess business impact. Ensured the system could safely run randomized chaos experiments without affecting critical services. Managed and reduced technical debt, improving code quality, maintainability, and reliability of the system.
Duration	16 months

Roles	Backend & SRE Engineer
Team	4 members (3 programmers, 1 project manager)
Technologies	Kotlin, Spring Boot, REST APIs, Slack API, Kubernetes, CI/CD (GitHub Actions), Monitoring & Alerting, MongoDB, Jira, AI, n8n
Responsibilities	Designed and developed Slack-based workflows for automated incident creation, coordination, and status tracking. Implemented mechanisms for automatic creation of dedicated Slack channels for incidents, enabling structured communication and faster response. Integrated the platform with internal monitoring and alerting systems to trigger incident workflows automatically. Built functionality for generating postmortems, ensuring consistent incident documentation and knowledge sharing. Implemented incident data collection and analysis to identify recurring patterns and improve system reliability. Contributed to early incident detection and anomaly analysis based on historical data and system signals. Improved incident response processes by reducing manual actions and human error during high-pressure situations. Collaborated closely with SREs, engineers, and on-call teams to refine incident handling practices and tooling. Ensured high availability, reliability, and maintainability of the platform used during critical system failures.
Duration	16 months

Roles	Backend & SRE Engineer
Team	9 members (7 programmers, 1 tech lead, 1 project manager)
Technologies	Java, Spring Boot, REST APIs, Kubernetes, Monitoring & Alerting Systems, Metrics (Prometheus / internal metrics), MongoDB, CI/CD (GitHub Actions), React, Jira
Responsibilities	Designed and developed backend services powering a real-time status dashboard for core Allegro components. Integrated the platform with monitoring, alerting, and event systems to collect and correlate live system health data. Implemented logic for tracking outages, incidents, alerts, and system events with historical context. Built functionality to browse incidents and outages by week, month, and year, supporting operational reviews. Designed and generated reliability reports and summaries for engineering leadership and executive stakeholders (GMV). Enabled data-driven discussions around system stability, reliability trends, and operational risks. Ensured high performance and reliability of the platform used during incidents and operational reviews. Collaborated with SREs, engineers, and stakeholders to define meaningful reliability metrics and reporting formats. Improved observability and transparency across the organization by centralizing system health information.
Duration	16 months

Roles	Backend & SRE Engineer
Team	4 members (3 programmers, 1 project manager, team lead)
Technologies	Java, Spring (MVC, Boot, Data), Angular (frontend), REST, Kubernetes, CI/CD (GitHub Actions), MongoDB, Gradle, Git, GitHub, Jira
Responsibilities	Developed the backend and frontend of an application for tracking scheduled changes. Implemented reliable monitoring and alerting for planned system changes. Integrated with internal tools for automated notifications and incident tracking. Ensured data consistency and accurate logging for all maintenance activities. Collaborated with cross-functional teams to optimize workflows for SRE operations. Managed and reduced technical debt, improving code quality, maintainability, and reliability of the system.
Duration	16 months

Roles	Backend developer
Team	9 members (7 programmers, 1 tech lead, 1 project manager)
Technologies	Spring, Microservices, Hibernate, Git, JUnit, MongoDB, DynamoDB, MySQL, Stripe, AWS, Docker, Maven, GitHub, Jira, Confluence
Responsibilities	Designed and developed a module for integrating with the Odoo system using a microservices architecture. Built and maintained microservices in Java. Configured and managed MongoDB using MongoDB Atlas on AWS. Integrated AWS SQS for messaging and AWS SNS for notifications. Developed RESTful APIs. Tested and debugged the integration module. Coordinated with cross-functional teams for deployment.
Duration	4 months

Roles	Backend developer/Tech Lead/DevOps
Team	6 members (3 programmers, 1 QA, 1 project manager, 1 BA)
Technologies	Spring, Hibernate, Git, JUnit, MySQL, MailJet, Stripe, AWS, Google API, JasperReports, GoDaddy, Docker, Linux, Liquibase, Maven, GitHub, Jira, Slack
Responsibilities	Designed and developed a comprehensive project from scratch, including detailed UML diagrams and database schemas to outline the system's structure and data flow. Created and implemented APIs for calculating business parameters, including processing large Excel datasets with over 140,000 records and efficiently storing data in the database. Set up and configured AWS services including EC2, ASG, ELB, S3, Route 53, CloudFront, Beanstalk, VPC, and RDS for full project deployment and management. Oversaw the setup, maintenance, and support of test and production environments to ensure reliable deployment and operation of the application. Led technical discussions and decisions, guiding the development team through architectural and operational challenges. Collaborated with developers, QA engineers, and other stakeholders to align on project goals and deliverables. Monitored system performance, making necessary adjustments to optimize reliability and efficiency.
Duration	20 months

Roles	Backend developer/DevOps
Team	6 members (4 programmers, 1 QA, 1 project manager)
Technologies	Spring MVC, Spring, JPA, Hibernate, Git, JUnit, MySQL, MailJet, Gradle, Google API, JasperReports, Zapier API, Ionos, Tomcat, Linux, Liquibase, GitHub, Jira, Slack
Responsibilities	Rewrote the project from scratch, moving from the old system that used Spring MVC with an embedded React app to a separate Spring REST API and a standalone React front end. Designed and implemented the new architecture with a clear separation between the backend API and the frontend application to improve maintainability and scalability. Managed the hosting and deployment of the application, including setting up and configuring servers and environments to ensure stable and reliable operation. Oversaw the setup, maintenance, and support of test and production environments, resolving issues as they arose in either environment. Collaborated with frontend developers to integrate the backend API with the frontend application, providing technical support and resolving integration issues. Provided technical leadership and guidance throughout the project, ensuring adherence to best practices and supporting the development team in achieving project goals.
Duration	28 months

Roles	Full stack developer
Team	12 members (8 programmers, 1 QA, 2 BA, 1 project manager)
Technologies	Spring, JPA, Hibernate, RabbitMQ, Git, JUnit, MS SQL Server, Microservices, Jenkins, AWS, GraphQL, React, Flyway, Maven, GitHub, Jira, Slack
Responsibilities	Designed and implemented a parcel delivery module integrated with AliExpress, supporting international cargo workflows. Designed and developed a new microservice using Spring and Hibernate as part of a microservice-based architecture. Contributed to system architecture decisions, focusing on scalability, performance, and integration reliability. Integrated backend services with GraphQL and external systems to enable real-time data exchange. Automated build and deployment pipelines using Jenkins, improving release stability and deployment speed. Collaborated with backend, frontend, and business teams to align technical solutions with logistics requirements. Worked in an outstaff model with an Estonia-based team, aligning architecture and integration requirements with external stakeholders.
Duration	4 months