VBNet - Форум - LZW - его эффективная реализация

Вопрос: LZW - его эффективная реализация Добавлено: 30.01.08 08:19

Автор вопроса: xuser

Реализация LZW (или же прочих LZ-алгоритмов) с использованием строчных функций поиска и сравнения - работает, мягко говоря, - очень медленно! Есть соображения по этому поводу?
Ответить

Ответы Всего ответов: 15

Номер ответа: 1 Автор ответа: HACKER Разработчик Offline Client Вопросов: 236 Ответов: 8362	Профиль \| \| #1	Добавлено: 30.01.08 23:10
	Использовать другие языки программирования. Хотя имхо LZW в дотнете должен быть реализован. Ответить

Номер ответа: 2 Автор ответа: xuser Вопросов: 1 Ответов: 20	Профиль \| \| #2	Добавлено: 31.01.08 03:50
	В дотнете 2.0 реализовано, однако не то, что нужно. И, хотелось бы, что это работало, например и под net framework 1.0. Подсказка была бы полезной на любом языке программирования. Ответить

Номер ответа: 3 Автор ответа: xuser Вопросов: 1 Ответов: 20	Профиль \| \| #3	Добавлено: 31.01.08 03:58
	Есть, правда, точка с массой полезной информации. www.compression.ru Ответить

Номер ответа: 4 Автор ответа: HACKER Разработчик Offline Client Вопросов: 236 Ответов: 8362	Профиль \| \| #4	Добавлено: 01.02.08 02:19
	Ну так какие проблемы? Береш готовый код на си, компилиш в дллку, и вызываешь из дотнета... Ответить

Номер ответа: 5 Автор ответа: xuser Вопросов: 1 Ответов: 20	Профиль \| \| #5	Добавлено: 01.02.08 04:35
	В том и проблема, что не люблю пользоваться готовым кодом, и зачем моему творению таскать за собой левую библиотеку на C. Меня же интересует реализация лишь (своя модификация LZ для хозяйских целей) функций поиска и сравнения на уровне цепочек байт в байтовом масиве - но это однозначно не должна быть реализация перебором и сравнением отдельных байт в цикле. Например, как найти последовательность байт (100 8 5 9 допустим) в массиве байт (не прибегая к проверке и сопоставлению их в цикле) - то есь нечто подобное строчным функциям InStr, StrComp? Ты должен знать, как это проделать. ... Ответить

Номер ответа: 6 Автор ответа: Фeнягz Вопросов: 2 Ответов: 62	Web-сайт: atauenis.narod.ru Профиль \| \| #6	Добавлено: 14.10.08 07:01
	Option Strict On Option Explicit On Public Class LZW ' see http://marknelson.us/1989/10/01/lzw-data-compression/ ' this is a VB.NET conversion port of mark's C program. ' Please refer to that program prior to modifying this one. Private BITS As Integer = 14 Private HASHING_SHIFT As Integer = 4 Private MAX_VALUE As Integer = (1 << BITS) - 1 Private MAX_CODE As Integer = MAX_VALUE - 1 'Private Const TABLE_SIZE As Integer = 5021 ' 12 bits 'Private Const TABLE_SIZE As Integer = 9029 ' 13 bits Private Const TABLE_SIZE As Integer = 18041 ' 14 bits Private EOF As Integer = -1 Public brInput As IO.BinaryReader = Nothing Public bwOutput As IO.BinaryWriter = Nothing Private iaCode_Value(TABLE_SIZE) As Integer Private iaPrefix_Code(TABLE_SIZE) As Integer Private baAppend_Character(TABLE_SIZE) As Byte ' This is the compression routine. The code should be a fairly close ' match to the algorithm accompanying the article. Public Sub compress() Dim iNextCode As Integer = 0 Dim iCharacter As Integer = 0 Dim iStringCode As Integer = 0 Dim iIndex As Integer = 0 iNextCode = 256 ' Next code is the next available string code For i As Integer = 0 To TABLE_SIZE - 1 ' Clear out the string table before starting iaCode_Value(i) = -1 Next i ' Get the first iCharacter. Assuming it to be 0 - 255 ' Hence only valid for ASCII text files / iStringCode = ReadByte() '* This is the main loop where it all happens. This loop runs util all of ' the Input has been exhausted. Note that it stops adding codes to the ' table after all of the possible codes have been defined. iCharacter = ReadByte() While iCharacter <> -1 iIndex = find_match(iStringCode, iCharacter) ' See if the string is in / If (iaCode_Value(iIndex) <> -1) Then ' the table. If it is, / iStringCode = iaCode_Value(iIndex) ' get the code value. If / Else ' the string is not in the table, try to add it. / If (iNextCode <= MAX_CODE) Then iaCode_Value(iIndex) = iNextCode iNextCode += 1 iaPrefix_Code(iIndex) = iStringCode baAppend_Character(iIndex) = CByte(iCharacter) End If output_code(iStringCode) ' When a string is found / iStringCode = iCharacter ' that is not in the table / End If ' after adding the new one / iCharacter = ReadByte() End While ' End of the main loop. output_code(iStringCode) ' Output the last code output_code(MAX_VALUE) ' Output the end of buffer code / output_code(0) ' This code flushes the Output buffer*/ End Sub ' This is the hashing routine. It tries to find a match for the prefix+char ' string in the string table. If it finds it, the iIndex is returned. If ' the string is not found, the first available iIndex in the string table is ' returned instead. Private Function find_match(ByVal iHashPrefix As Integer, ByVal iHashCharacter As Integer) As Integer Dim iIndex As Integer = 0 Dim iOffset As Integer = 0 iIndex = CInt((iHashCharacter << HASHING_SHIFT) Xor iHashPrefix) If (iIndex = 0) Then iOffset = 1 Else iOffset = TABLE_SIZE - iIndex End If While (True) If iaCode_Value(iIndex) = -1 Then Return iIndex End If If (iaPrefix_Code(iIndex) = iHashPrefix) And (baAppend_Character(iIndex) = iHashCharacter) Then Return iIndex End If iIndex -= iOffset If (iIndex < 0) Then iIndex += TABLE_SIZE End If End While End Function ' The following routine is used to output variable length ' codes. It is written strictly for clarity, and is not ' particularly efficient. Private Sub output_code(ByVal code As Integer) Static output_bit_count As Integer = 0 Static output_bit_buffer As Long = 0 output_bit_buffer = output_bit_buffer Or (code << (32 - BITS - output_bit_count)) output_bit_count += BITS While output_bit_count >= 8 WriteByte(CByte((output_bit_buffer >> 24) And 255)) output_bit_buffer <<= 8 output_bit_count -= 8 End While End Sub ' This is the expansion routine. It takes an LZW format file, and expands ' it to an bwOutput file. The code here should be a fairly close match to ' the algorithm in the accompanying article. Public Sub expand() Dim baDecode_Stack(TABLE_SIZE) As Byte Dim iNextCode As Integer Dim iNewCode As Integer Dim iOldCode As Integer Dim bCharacter As Byte Dim iCurrCode As Integer Dim i As Integer 'This is the next available code to define. iNextCode = 256 ' Read in the first code, initialize the ' character variable, and send the first ' code to the output file. iOldCode = input_code() bCharacter = CType(iOldCode, Byte) WriteByte(CByte(iOldCode)) ' This is the main expansion loop. It reads in characters from the LZW file ' until it sees the special code used to inidicate the end of the data. iNewCode = input_code() While (iNewCode <> MAX_VALUE) If iNewCode >= iNextCode Then ' This code checks for the special STRING+CHARACTER+STRING+CHARACTER+STRING ' case which generates an undefined code. It handles it by decoding ' the last code, and adding a single character to the end of the decode string. baDecode_Stack(0) = bCharacter i = 1 iCurrCode = iOldCode Else ' Otherwise we do a straight decode of the new code. i = 0 iCurrCode = iNewCode End If While iCurrCode > 255 ' This routine simply decodes a string from the string table, storing ' it in a buffer. The buffer can then be output in reverse order by ' the expansion program. baDecode_Stack(i) = baAppend_Character(iCurrCode) i = i + 1 If i >= MAX_CODE Then Throw New ApplicationException("Fatal error during iCurrCode expansion." End If iCurrCode = iaPrefix_Code(iCurrCode) End While baDecode_Stack(i) = CType(iCurrCode, Byte) bCharacter = baDecode_Stack(i) 'Now we output the decoded string in reverse order. While i >= 0 WriteByte(baDecode_Stack(i)) i = i - 1 End While ' Finally, if possible, add a new code to the string table. If (iNextCode <= MAX_CODE) Then iaPrefix_Code(iNextCode) = iOldCode baAppend_Character(iNextCode) = bCharacter iNextCode += 1 End If iOldCode = iNewCode iNewCode = input_code() End While End Sub ' The following routine is used to input variable length ' codes. It is written strictly for clarity, and is not ' particularly efficient. Private Function input_code() As Integer Dim return_value As Long Static input_bit_count As Integer = 0 Static input_bit_buffer As Long = 0 Static Mask32 As Long = CLng(2 ^ 32) - 1 While input_bit_count <= 24 input_bit_buffer = (input_bit_buffer Or _ ReadByte() << (24 - input_bit_count)) And Mask32 input_bit_count += 8 End While return_value = (input_bit_buffer >> 32 - BITS) And Mask32 input_bit_buffer = (input_bit_buffer << BITS) And Mask32 input_bit_count -= BITS Return CInt(return_value) End Function Private Sub WriteByte(ByVal b As Byte) bwOutput.Write(b) End Sub Private Function ReadByte() As Integer Dim ba(1) As Byte Dim iResult As Integer iResult = brInput.Read(ba, 0, 1) If iResult = 0 Then Return -1 End If Return ba(0) End Function End Class Ответить

Номер ответа: 7 Автор ответа: Artyom Разработчик Вопросов: 130 Ответов: 6602	Профиль \| \| #7	Добавлено: 28.10.08 01:32
	Реализация LZW (или же прочих LZ-алгоритмов) с использованием строчных функций поиска и сравнения - работает, мягко говоря, - очень медленно! Есть соображения по этому поводу? Ну скажем так, строковые функции в .NET это далеко не самый оптимальный способ работы с данными. Тебе тут нужен Byte() и соответственно работать с массивом байт, в .NET еще быстрее ручками вряд ли получится сделать. А впрочем, ручками и не надо: http://www.codeproject.com/info/search.aspx?artkw=lzw&sbo=kw Ответить

Номер ответа: 8 Автор ответа: Фeнягz Вопросов: 2 Ответов: 62	Web-сайт: atauenis.narod.ru Профиль \| \| #8	Добавлено: 29.10.08 01:25
	Так я оттуда и брал вышеприведенный код. Кажется. Хороший CodeProject ресурс (бывает больше 10000 участников одновременно). И не Byte() для работы нужен, а MemorySream (получение массива байт - лишние ненужные операции). Ответить

Номер ответа: 9 Автор ответа: Artyom Разработчик Вопросов: 130 Ответов: 6602	Профиль \| \| #9	Добавлено: 29.10.08 02:59
	MemoryStream не может работать быстрее чем Byte(), так как под MemoryStream лежит Byte() Это просто обертка для возможности потоковой работы. Ответить

Номер ответа: 10 Автор ответа: Фeнягz Вопросов: 2 Ответов: 62	Web-сайт: atauenis.narod.ru Профиль \| \| #10	Добавлено: 29.10.08 07:41
	Понимаю. Хотя обёртка скорее не для байтового массива, а блока памяти (как и Byte()), динамически переразмеряемого. Однако в работе может понадобится проведение нескольких действий (и та её сжать, и эдак :D), и тогда MemoryStream - привлекательнее и быстрее. Ответить

Номер ответа: 11 Автор ответа: Artyom Разработчик Вопросов: 130 Ответов: 6602	Профиль \| \| #11	Добавлено: 29.10.08 08:15
	Понимаю. Хотя обёртка скорее не для байтового массива, а блока памяти (как и Byte()), динамически переразмеряемого. Здесь нет места догадкам, ЯЯЯ. Если вы сомневаетесь, посмотрите с помощью программы Reflector исходный код класса MemoryStream и убедитесь что он работает непосредственно с Byte(). Однако в работе может понадобится проведение нескольких действий (и та её сжать, и эдак :D), и тогда MemoryStream - привлекательнее и быстрее. Повторюсь, MemoryStream не может иметь преимуществ перед Byte() по скорости и это должно быть очевидно, выше я объяснял почему. У Byte() есть неоспоримое преимущество - возможность выполнения как последовательного, так и произвольного доступа (но нужно пользоваться ею с пониманием - при произвольном доступе кеш-память процессора может оказаться неэфективной) Ответить

Номер ответа: 12 Автор ответа: Фeнягz Вопросов: 2 Ответов: 62	Web-сайт: atauenis.narod.ru Профиль \| \| #12	Добавлено: 30.10.08 01:14
	http://igorr.zx6.ru/files/byteVSmemstreamVB9.zip Byte(), правда, - эффективней, если работа ведётся с массивом фиксированного размера. Ответить

Номер ответа: 13 Автор ответа: Artyom Разработчик Вопросов: 130 Ответов: 6602	Профиль \| \| #13	Добавлено: 30.10.08 04:03
	А в .NET есть только массивы фиксированного размера. Операции изменения размера массива типа Redim Preserve на самом деле просто создают новый массив нужного размера и выполняют копирование в него данных из старого массива. MemoryStream работает аналогично. При заполнении буфера, создается новый буфер в 2 раза большего размера и в него копируются данные. Если интересно, бери Reflector и смотри исходный код класса MemoryStream (функции WriteByte, EnsureCapacity, свойство Capacity) Ответить

Номер ответа: 14 Автор ответа: Фeнягz Вопросов: 2 Ответов: 62	Web-сайт: atauenis.narod.ru Профиль \| \| #14	Добавлено: 30.10.08 06:31
	В условиях проведения различного рода манипуляций с данными, частыми операциями переразмера - то есть в моей конкретной ситуации - MemoryStream эффективней. Я учту естественно скорость доступа к байтам в байтовом массиве и по возможности буду комбинировать это дело между собой для повышения эффективности, а от услуг Marshal'а - откажусь скорее всего :d. За MemoryStream я уже просветился в MSDN The MemoryStream class creates streams that have memory as a backing store instead of a disk or a network connection. MemoryStream encapsulates data stored as an unsigned byte array that is initialized upon creation of a MemoryStream object, or the array can be created as empty. The encapsulated data is directly accessible in memory. если работа ведётся с массивом фиксированного размера. Имел ввиду массив с заведомо известным колличеством элементов. Ответить

Номер ответа: 15 Автор ответа: Artyom Разработчик Вопросов: 130 Ответов: 6602	Профиль \| \| #15	Добавлено: 30.10.08 06:54
	Не знаю, есть ли смысл что-то добавлять, по-моему налицо полное непонимание принципов работы. Повторно отправляю к и исходному классу MemoryStream Ответить

Страница: 1 |