to work on enterprise-grade data ingestion and document processing solutions. The ideal candidate will have strong hands-on experience with the Unstructured.io framework, data transformation pipelines, and integration with LLM / Vector DB / Search platforms. In this role, you will develop and optimize workflows for parsing, cleaning, and indexing complex enterprise documents.
Key Responsibilities
Develop and enhance data processing pipelines using Unstructured.io to convert unstructured data (PDF, DOCX, HTML, emails, scans) into structured formats.
Integrate extracted data with Vector Databases or Search Indexing workflows for LLM/RAG applications.
Optimize parsing performance, accuracy, and consistency across various document formats.
Work with Python-based microservices, APIs, and orchestration frameworks.
Collaborate with Data Engineering, ML, and Product teams to design scalable ingestion architectures.
Implement best practices for scalable, reusable pipeline components.
Monitor, debug, and resolve pipeline issues across staging and production environments.
Required Skills & Experience
Overall IT Experience: 8+ Years
3+ years of hands-on experience implementing Unstructured.io in production environments.
Strong experience with Python, including parsing, data transformation, and API development.
Experience building RAG (Retrieval-Augmented Generation) or Document AI workflows.
Hands-on experience with Vector Databases (Pinecone, Weaviate, Chroma, FAISS, Milvus, etc.).
Familiarity with Cloud Platforms (AWS preferred).
Experience with Docker, Git, and CI/CD pipelines.
Nice to Have
Experience with frameworks like LangChain / LlamaIndex.
Knowledge of NLP, embeddings, and tokenization.
Experience integrating with LLM providers (OpenAI, Anthropic, Azure OpenAI, etc.).
Familiarity with document OCR tools (Tesseract, Azure Form Recognizer, AWS Textract).