How to Convert Doc to XML via VB.NET without Installing Microsoft Word
Sometimes we need to change a document's format, whether to display its contents elsewhere or to meet a special requirement. Word documents and Excel files are among the most commonly converted, because both formats are widely used and both applications can save directly to other formats such as HTML and XML.
Ordinary users can use Save As in Word or Excel to save a document in another format. Programmers, however, usually need to perform the conversion in code, using C# or VB.NET.
Several professional programmer forums exist where developers discuss their problems and ask for help, and many useful methods for converting formats can be found there.
Recently, I found a good method for converting doc to XML. I want to introduce it here in the hope that it will be useful for programmers who do not have Microsoft Word installed but need to convert a document to XML through VB.NET.
The following code shows how to convert doc to XML with VB.NET. The example builds a table of countries and their populations. Note that Spire.Doc must be installed on the computer.
'Namespaces used by this snippet (place them at the top of the file)
Imports System.Drawing
Imports Spire.Doc
Imports Spire.Doc.Documents
Imports Spire.Doc.Fields

'Create word document
Dim document_Renamed As New Document()
Dim section As Section = document_Renamed.AddSection()

'Table contents: one header row and one row per country
Dim header As String() = {"Name", "Capital", "Continent", "Area", "Population"}
Dim data As String()() = { _
    New String() {"Argentina", "Buenos Aires", "South America", "2777815", "32300003"}, _
    New String() {"Bolivia", "La Paz", "South America", "1098575", "7300000"}, _
    New String() {"Brazil", "Brasilia", "South America", "8511196", "150400000"}, _
    New String() {"Canada", "Ottawa", "North America", "9976147", "26500000"}, _
    New String() {"Chile", "Santiago", "South America", "756943", "13200000"}, _
    New String() {"Colombia", "Bogota", "South America", "1138907", "33000000"}, _
    New String() {"Cuba", "Havana", "North America", "114524", "10600000"}, _
    New String() {"Ecuador", "Quito", "South America", "455502", "10600000"}, _
    New String() {"El Salvador", "San Salvador", "North America", "20865", "5300000"}, _
    New String() {"Guyana", "Georgetown", "South America", "214969", "800000"}, _
    New String() {"Jamaica", "Kingston", "North America", "11424", "2500000"}, _
    New String() {"Mexico", "Mexico City", "North America", "1967180", "88600000"}, _
    New String() {"Nicaragua", "Managua", "North America", "139000", "3900000"}, _
    New String() {"Paraguay", "Asuncion", "South America", "406576", "4660000"}, _
    New String() {"Peru", "Lima", "South America", "1285215", "21600000"}, _
    New String() {"United States of America", "Washington", "North America", "9363130", "249200000"}, _
    New String() {"Uruguay", "Montevideo", "South America", "176140", "3002000"}, _
    New String() {"Venezuela", "Caracas", "South America", "912047", "19700000"} _
}

'Add a table with one row for the header plus one row per data record
Dim table As Spire.Doc.Table = section.AddTable()
table.ResetCells(data.Length + 1, header.Length)

' ***************** First Row *************************
Dim row As TableRow = table.Rows(0)
row.IsHeader = True
row.Height = 20 'unit: point, 1point = 0.3528 mm
row.HeightType = TableRowHeightType.Exactly
row.RowFormat.BackColor = Color.Gray
For i As Integer = 0 To header.Length - 1
    row.Cells(i).CellFormat.VerticalAlignment = VerticalAlignment.Middle
    Dim p As Paragraph = row.Cells(i).AddParagraph()
    p.Format.HorizontalAlignment = Spire.Doc.Documents.HorizontalAlignment.Center
    Dim txtRange As TextRange = p.AppendText(header(i))
    txtRange.CharacterFormat.Bold = True
Next

' ***************** Data Rows *************************
For r As Integer = 0 To data.Length - 1
    Dim dataRow As TableRow = table.Rows(r + 1)
    dataRow.Height = 20
    dataRow.HeightType = TableRowHeightType.Exactly
    dataRow.RowFormat.BackColor = Color.Empty
    For c As Integer = 0 To data(r).Length - 1
        dataRow.Cells(c).CellFormat.VerticalAlignment = VerticalAlignment.Middle
        dataRow.Cells(c).AddParagraph().AppendText(data(r)(c))
    Next
Next

'Save xml file.
document_Renamed.SaveToFile("Sample.xml", FileFormat.Xml)
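The code above builds a new document in memory before saving it. When the Word file already exists on disk, the same save call can follow a load call instead. The lines below are only a minimal sketch under that assumption: "Input.doc" and "Output.xml" are placeholder file names, and Spire.Doc is assumed to be referenced by the project.

```vbnet
Imports Spire.Doc

Module DocToXmlSketch
    Sub Main()
        ' "Input.doc" and "Output.xml" are placeholder names, not files from the article.
        Dim doc As New Document()

        ' Read the existing Word file; Microsoft Word itself is not involved.
        doc.LoadFromFile("Input.doc")

        ' Write the document out as XML, as in the example above.
        doc.SaveToFile("Output.xml", FileFormat.Xml)
    End Sub
End Module
```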
Conversion between Word and XML with VB.NET
When talking about XML, we may think of HTML. The two languages are similar: both are based on markup and both are used to deliver information on the Web. Unlike HTML, however, XML does not define its tags in advance.
An XML file consists of two parts: the prolog and the root element. The prolog includes the XML declaration, processing instructions and the document type declaration. The root element is the main part of the XML file and holds the data together with the information that describes the data's structure. In addition, XML can contain comments, which explain the XML source.
Sometimes we need to import data into XML files from a database or from other file formats that store data. We may also convert other files to XML, for example Word to XML.
The easiest way to convert Word to XML is to save an existing Word document as an XML file. Developers, however, often need to perform the conversion in code, and VB.NET offers a simple way to do it.
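To make these parts concrete, here is a small sketch that assembles an XML document in VB.NET with the standard System.Xml.Linq classes. The element names and the file names "countries.dtd" and "countries.xsl" are invented for illustration only and do not come from any real schema.

```vbnet
Imports System
Imports System.Xml.Linq

Module XmlStructureSketch
    Sub Main()
        ' All names below (countries, country, countries.dtd, countries.xsl) are hypothetical.
        Dim doc As New XDocument(
            New XDeclaration("1.0", "utf-8", Nothing),
            New XProcessingInstruction("xml-stylesheet", "type=""text/xsl"" href=""countries.xsl"""),
            New XDocumentType("countries", Nothing, "countries.dtd", Nothing),
            New XComment("A comment explaining the data"),
            New XElement("countries",
                New XElement("country",
                    New XAttribute("name", "Canada"),
                    New XElement("capital", "Ottawa"),
                    New XElement("population", "26500000"))))

        ' The XML declaration is kept separately from the rest of the tree.
        Console.WriteLine(doc.Declaration)

        ' Prints the processing instruction, document type, comment and root element.
        Console.WriteLine(doc)
    End Sub
End Module
```

Everything before the countries element belongs to the prolog; the countries element itself is the root element that carries the data.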
"http://www.infinity-loop.de/DTD/upcast/4.0/upcast.dtd">
xmlns:html="http://www.w3.org/HTML/1998/html4"
xml:lang="en"
style="widows: 0; orphans: 0; word-break-inside: normal; -ilx-block-border-mode: merge;">
// Each hyperlink in the Word document can be converted into the XML document.
xlink:show="replace"
xlink:actuate="onRequest"
xlink:href="mailto:simpson@polaris.net">
...
// Each bookmark in the Word document can be converted into the XML document.
xlink:href="#theThirdItem" ...>3
// Each image in the Word document can be converted into the XML document.
Sometimes we need to convert a large number of Word documents to XML files. To convert them as quickly as possible, we can use a third-party component. One I know of is Spire.Doc, which provides Word functionality without Microsoft Word being installed and supports conversion between Word and XML, PDF or HTML.
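For a whole folder of documents, the load and save calls can be placed in a loop. This is only a sketch under assumptions: the folder paths are placeholders, Spire.Doc is assumed to be referenced by the project, and error handling is left out.

```vbnet
Imports System.IO
Imports Spire.Doc

Module BatchDocToXmlSketch
    Sub Main()
        ' Placeholder folders; replace them with real paths.
        Dim inputFolder As String = "C:\Docs"
        Dim outputFolder As String = "C:\Docs\Xml"

        ' Make sure the output folder exists before writing to it.
        Directory.CreateDirectory(outputFolder)

        ' Convert every .doc file found directly in the input folder.
        For Each docPath As String In Directory.GetFiles(inputFolder, "*.doc")
            Dim doc As New Document()
            doc.LoadFromFile(docPath)

            ' Keep the original file name and swap the extension for .xml.
            Dim xmlName As String = Path.GetFileNameWithoutExtension(docPath) & ".xml"
            doc.SaveToFile(Path.Combine(outputFolder, xmlName), FileFormat.Xml)
        Next
    End Sub
End Module
```

Creating a new Document object for each file keeps the conversions independent of one another, so a problem with one file does not leave stale content behind for the next.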
